2015 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2015), August 30 – September 4, 2015, Bergamo, Italy

Desktop Layout

Measurement and Metric
Research Papers
Alabastro A, Chair: Kathryn Stolee
A Method to Identify and Correct Problematic Software Activity Data: Exploiting Capacity Constraints and Data Redundancies
Qimu Zheng, Audris Mockus, and Minghui Zhou
(Peking University, China; University of Tennessee, USA)
Publisher's Version
Abstract: Mining software repositories to understand and improve software development is a common approach in research and practice. The operational data obtained from these repositories often do not faithfully represent the intended aspects of software development and, therefore, may jeopardize the conclusions derived from it. We propose an approach to identify problematic values based on the constraints of software development and to correct such values using data redundancies. We investigate the approach using issue and commit data of Mozilla project. In particular, we identified problematic data in four types of events and found the fraction of problematic values to exceed 10% and rapidly rising. We found the corrected values to be 50% closer to the most accurate estimate of task completion time. Finally, we found that the models of time until fix changed substantially when data were corrected, with the corrected data providing a 20% better fit. We discuss how the approach may be generalized to other types of operational data to increase fidelity of software measurement in practice and in research.


Time stamp: 2019-07-19T12:25:45+02:00