SANER 2018

2018 IEEE 25th International Conference on Software Analysis, Evolution, and Reengineering (SANER), March 20-23, 2018, Campobasso, Italy

Desktop Layout

Defect Prediction
Technical Research Papers
Room 2
Cross-Version Defect Prediction via Hybrid Active Learning with Kernel Principal Component Analysis
Zhou Xu, Jin Liu, Xiapu Luo, and Tao Zhang
(Wuhan University, China; Hong Kong Polytechnic University, China; Harbin Engineering University, China)
Abstract: As defects in software modules may cause product failure and financial loss, it is critical to utilize defect prediction methods to effectively identify the potentially defective modules for a thorough inspection, especially in the early stage of software development lifecycle. For an upcoming version of a software project, it is practical to employ the historical labeled defect data of the prior versions within the same project to conduct defect prediction on the current version, i.e., Cross-Version Defect Prediction (CVDP). However, software development is a dynamic evolution process that may cause the data distribution (such as defect characteristics) to vary across versions. Furthermore, the raw features usually may not well reveal the intrinsic structure information behind the data. Therefore, it is challenging to perform effective CVDP. In this paper, we propose a two-phase CVDP framework that combines Hybrid Active Learning and Kernel PCA (HALKP) to address these two issues. In the first stage, HALKP uses a hybrid active learning method to select some informative and representative unlabeled modules from the current version for querying their labels, then merges them into the labeled modules of the prior version to form an enhanced training set. In the second stage, HALKP employs a non-linear mapping method, kernel PCA, to extract representative features by embedding the original data of two versions into a high-dimension space. We evaluate the HALKP framework on 31 versions of 10 projects with three prevalent performance indicators. The experimental results indicate that HALKP achieves encouraging results with average F-measure, g-mean and Balance of 0.480, 0.592 and 0.580, respectively and significantly outperforms nearly all baseline methods.


Time stamp: 2019-06-24T14:28:25+02:00