SANER 2018 Workshops
Workshops of the 2018 IEEE 25th International Conference on Software Analysis, Evolution, and Reengineering (SANER)

2018 IEEE International Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), March 20, 2018, Campobasso, Italy

MaLTeSQuE 2018 – Proceedings


2018 IEEE International Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)

Frontmatter

Title Page


Message from the Chairs
Welcome to the MaLTeSQuE 2018 workshop, focused on machine learning techniques for software quality evaluation, held in Campobasso, Italy, on March 20, 2018, and co-located with the SANER 2018 conference.

Defects and Anomalies

Varying Defect Prediction Approaches during Project Evolution: A Preliminary Investigation
Salvatore Geremia and Damian A. Tamburri
(University of Molise, Italy; Eindhoven University of Technology, Netherlands)
Defect prediction approaches use various features of the software product or process to prioritize testing, analysis, and general quality assurance activities. Such approaches require the availability of a project's historical data, making them inapplicable in the early phases of a project. To cope with this problem, researchers have proposed cross-project and even cross-company prediction models, which use training material from other projects to build the model. Despite such advances, there is limited knowledge of how long, as a project evolves, it remains convenient to keep using data from other projects, and when, instead, it becomes convenient to switch to a local prediction model. This paper empirically investigates, using historical data from four open source projects, how the performance of various kinds of defect prediction approaches - within-project prediction, local and global cross-project prediction, and mixed (injected local cross) prediction - varies over time. The results of the study are part of a long-term investigation towards supporting the customization of defect prediction models over a project's history.
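For illustration, a minimal sketch of the kind of comparison described in this abstract, assuming per-release data frames of code metrics with a binary "buggy" label (all column and variable names are hypothetical, not the authors' pipeline); it contrasts a within-project model trained on the local history accumulated so far with a global cross-project model trained on other projects' data:

```python
# Illustrative sketch only (not the authors' pipeline): compare within-project
# vs. cross-project defect prediction as a project accumulates history.
# Assumes per-release DataFrames with code-metric columns and a 'buggy' label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def evaluate(train_df, test_df, features):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train_df[features], train_df["buggy"])
    probs = clf.predict_proba(test_df[features])[:, 1]
    return roc_auc_score(test_df["buggy"], probs)

def compare_over_time(local_releases, cross_project_pool, features):
    """local_releases: chronologically ordered DataFrames for one project;
    cross_project_pool: concatenated DataFrame of the other projects' data."""
    results = []
    for i in range(1, len(local_releases)):
        test = local_releases[i]
        local_history = pd.concat(local_releases[:i])   # within-project training data
        results.append({
            "release": i,
            "within_auc": evaluate(local_history, test, features),
            "cross_auc": evaluate(cross_project_pool, test, features),
        })
    return pd.DataFrame(results)
```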

The Role of Meta-Learners in the Adaptive Selection of Classifiers
Dario Di Nucci and Andrea De Lucia
(University of Salerno, Italy; Vrije Universiteit Brussel, Belgium)
The use of machine learning techniques to classify source code components as defective or not has received a lot of attention from the research community in the last decades. Previous studies indicated that no machine learning classifier is capable of providing the best accuracy in every context, highlighting interesting complementarity among them. For these reasons, ensemble methods, which combine several classifier models, have been proposed. Among these is ASCI (Adaptive Selection of Classifiers in bug predIction), an adaptive method able to dynamically select, from a set of machine learning classifiers, the one that best predicts the bug proneness of a class based on its characteristics. In summary, ASCI evaluates each classifier on the training set and then uses a meta-learner (e.g., Random Forest) to select the most suitable classifier for each test set instance. In this work, we conduct an empirical investigation on 21 open source software systems with the aim of analyzing the performance of several classifiers used as the meta-learner in combination with ASCI. The results show that the choice of the meta-learner does not strongly influence the results achieved by ASCI in the context of within-project bug prediction; hence, the use of lightweight classifiers such as Naive Bayes or Logistic Regression is suggested.
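As a rough illustration of the adaptive-selection idea summarized above (a simplified sketch, not the authors' ASCI implementation), the meta-learner below is trained to predict, for each instance, which base classifier to trust; the chosen base learners and the meta-labeling heuristic are assumptions:

```python
# Simplified sketch of adaptive classifier selection in the spirit of ASCI
# (not the authors' implementation). X_* are NumPy feature matrices, y_train a
# NumPy label vector; the meta-labeling heuristic below is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

base_learners = [LogisticRegression(max_iter=1000), GaussianNB(), DecisionTreeClassifier()]

def fit_asci_like(X_train, y_train):
    preds = []
    for clf in base_learners:
        clf.fit(X_train, y_train)
        preds.append(clf.predict(X_train))
    preds = np.array(preds)                      # shape: (n_classifiers, n_samples)
    correct = preds == y_train                   # which classifiers get each sample right
    # Meta-label: index of the first classifier that classifies the sample
    # correctly (fall back to classifier 0 when none is correct).
    meta_labels = np.where(correct.any(axis=0), correct.argmax(axis=0), 0)
    meta = RandomForestClassifier(n_estimators=100, random_state=0)
    meta.fit(X_train, meta_labels)
    return meta

def predict_asci_like(meta, X_test):
    chosen = meta.predict(X_test)                # classifier index per test instance
    return np.array([base_learners[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(chosen, X_test)])
```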

Machine Learning-Based Run-Time Anomaly Detection in Software Systems: An Industrial Evaluation
Fabian Huch, Mojdeh Golagha, Ana Petrovska, and Alexander Krauss
(TU Munich, Germany; QAware, Germany)
Anomalies are an inevitable occurrence while operating enterprise software systems. Traditionally, anomalies are detected by threshold-based alarms on critical metrics or by health-probing requests. However, fully automated detection in complex systems is challenging, since it is very difficult to distinguish truly anomalous behavior from normal operation. To this end, the traditional approaches may not be sufficient. Thus, we propose machine learning classifiers to predict the system's health status. We evaluated our approach in an industrial case study, on a large, real-world dataset of 7.5 × 10⁶ data points with 231 features. Our results show that recurrent neural networks with long short-term memory (LSTM) are more effective in detecting anomalies and health issues than other classifiers. We achieved an area under the precision-recall curve of 0.44. At the default threshold, we can automatically detect 70% of the anomalies. Despite the low precision of 31%, the rate at which false positives occur is only 4%.
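A minimal sketch of the kind of LSTM classifier described above, assuming the metrics are already grouped into fixed-length windows with binary health labels (layer sizes and hyperparameters are placeholders, not the paper's configuration):

```python
# Hedged sketch (assumed setup, not the paper's exact model): an LSTM classifier
# over sliding windows of system metrics, predicting anomalous vs. healthy.
from tensorflow import keras

def build_lstm(window_len, n_features):
    model = keras.Sequential([
        keras.layers.Input(shape=(window_len, n_features)),
        keras.layers.LSTM(64),                          # long short-term memory layer
        keras.layers.Dense(1, activation="sigmoid"),    # health-status probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(curve="PR", name="pr_auc")])
    return model

# X: array of shape (n_windows, window_len, n_features); y: 0/1 health labels.
# model = build_lstm(window_len=30, n_features=231)
# model.fit(X, y, epochs=10, validation_split=0.2)
```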

Change Analysis and Testing

How High Will It Be? Using Machine Learning Models to Predict Branch Coverage in Automated Testing
Giovanni Grano, Timofey V. Titov, Sebastiano Panichella, and Harald C. Gall
(University of Zurich, Switzerland)
Software testing is a crucial component of modern continuous integration development environments. Ideally, at every commit, all of the system's test cases should be executed and, moreover, new test cases should be generated for the new code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the continuous integration pipeline. Furthermore, developers want to achieve a minimum level of coverage for every build of their systems. Since neither executing all the test cases nor generating new ones for all the classes at every commit is feasible, developers have to select which subset of classes to test. In this context, knowing a priori the branch coverage that can be achieved with test data generation tools might give some useful indications for answering this question. In this paper, we take the first steps towards the definition of machine learning models to predict the branch coverage achieved by test data generation tools. We conduct a preliminary study considering well-known code metrics as features. Despite the simplicity of these features, our results show that using machine learning to predict branch coverage in automated testing is a viable and feasible option.
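A minimal sketch of such a coverage-prediction model, assuming a table with one row per class containing static code metrics and the branch coverage achieved by a test data generation tool (the metric and column names are hypothetical):

```python
# Illustrative sketch (feature names are hypothetical): predicting the branch
# coverage a test data generation tool would reach on a class from code metrics.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

FEATURES = ["loc", "cyclomatic_complexity", "n_methods", "cbo", "nesting_depth"]

def train_coverage_model(df: pd.DataFrame):
    """df: one row per class, with code metrics and the observed branch coverage."""
    X, y = df[FEATURES], df["branch_coverage"]           # coverage in [0, 1]
    model = GradientBoostingRegressor(random_state=0)
    mae = -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_absolute_error").mean()
    print(f"Cross-validated MAE: {mae:.3f}")
    return model.fit(X, y)
```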

Ensemble Techniques for Software Change Prediction: A Preliminary Investigation
Gemma Catolino and Filomena Ferrucci
(University of Salerno, Italy)
Predicting the classes more likely to change in the future helps developers focus on the more critical parts of a software system, with the aim of preventively improving its maintainability. The research community has devoted a lot of effort to the definition of change prediction models, i.e., models exploiting a machine learning classifier to relate a set of independent variables to the change-proneness of classes. Beyond the good performance of such models, key results of previous studies highlight that classifiers tend to perform similarly even though they correctly predict the change-proneness of different code elements, possibly indicating some complementarity among them. In this paper, we analyze the extent to which ensemble methodologies, i.e., machine learning techniques that combine multiple classifiers, can improve the performance of change prediction models. Specifically, we empirically compared the performance of three ensemble techniques (i.e., Boosting, Random Forest, and Bagging) with that of standard machine learning classifiers (i.e., Logistic Regression and Naive Bayes). The study was conducted on eight open source systems, and the results showed that ensemble techniques, in some cases, perform better than standard machine learning approaches, even if the differences among them are small. This highlights the need for further research aimed at devising effective methodologies for ensembling different classifiers.
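For illustration, a sketch of the kind of comparison described above between the three ensemble techniques and the two standard classifiers, assuming a feature matrix of class-level metrics and a binary change-proneness label (the evaluation metric and cross-validation setup are assumptions, not the paper's protocol):

```python
# Sketch of a change-proneness comparison (data layout and scoring are assumed):
# ensemble techniques vs. standard classifiers evaluated with cross-validation.
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

models = {
    "Boosting": AdaBoostClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Bagging": BaggingClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}

def compare(X, y):
    # X: class-level metrics; y: 1 if the class changed in the following release.
    for name, clf in models.items():
        f1 = cross_val_score(clf, X, y, cv=10, scoring="f1").mean()
        print(f"{name:20s} F1 = {f1:.3f}")
```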

Co-evolution Analysis of Production and Test Code by Learning Association Rules of Changes
László Vidács and Martin Pinzger
(University of Szeged, Hungary; University of Klagenfurt, Austria)
Many modern software systems come with automated tests. While these tests help to maintain code quality by providing early feedback after modifications, they also need to be maintained. In this paper, we replicate a recent pattern mining experiment to find patterns in how production and test code co-evolve over time. Understanding co-evolution patterns may directly affect the quality of the tests and thus the quality of the whole system. The analysis takes into account fine-grained changes in both types of code. Since the full list of fine-grained changes cannot be inspected directly, association rules are learned from the history to extract co-change patterns. We analyzed the occurrence of 6 patterns throughout almost 2500 versions of a Java system and found that the patterns are present, but supported by weaker links than previously reported. Hence, we experimented with weighting methods and investigated the composition of commits.
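As a self-contained illustration of mining co-change association rules (a toy sketch, not the replication's tooling; the change-type item names are invented), each commit is treated as a transaction of fine-grained production- and test-code change items:

```python
# Hedged sketch: mining simple association rules between production-code and
# test-code change types from commit "transactions". Item names are illustrative.
from collections import Counter
from itertools import combinations

commits = [
    {"prod:method_added", "test:test_method_added"},
    {"prod:statement_changed"},
    {"prod:method_added", "test:assertion_added", "test:test_method_added"},
]

def association_rules(transactions, min_support=0.3, min_confidence=0.5):
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):            # candidate rule: lhs -> rhs
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

for lhs, rhs, sup, conf in association_rules(commits):
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```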

User-Oriented Machine Learning for Software Quality Assessment

ConfigFile++: Automatic Comment Enhancement for Misconfiguration Prevention
Yuanliang Zhang, Shanshan Li, Xiangyang Xu, Xiangke Liao, Shazhou Yang, and Yun Xiong
(National University of Defense Technology, China; Fudan University, China)
Nowadays, misconfiguration has become one of the key factors leading to system problems. Most current research on the topic explores misconfiguration diagnosis, but is less concerned with educating users on how to configure software correctly in order to prevent misconfiguration before it happens. In this paper, we manually study 22 open source software projects and summarize several observations on the comments in their configuration files, most of which lack sufficient information and are poorly formatted. Based on these observations and the general process of misconfiguration diagnosis, we design and implement a tool called ConfigFile++ that automatically enhances the comments in configuration files. By using name-based analysis and machine learning, ConfigFile++ extracts guiding information about each configuration option from the user manual and source code, and inserts it into the configuration files. The format of the inserted comments is also designed to keep the enhanced comments concise and clear. We use real-world examples of misconfigurations to evaluate our tool. The results show that ConfigFile++ can prevent 33 out of 50 misconfigurations.

Investigating Type Declaration Mismatches in Python
Luca Pascarella, Achyudh Ram, Azqa Nadeem, Dinesh Bisesser, Norman Knyazev, and Alberto Bacchelli
(Delft University of Technology, Netherlands; University of Zurich, Switzerland)
Past research provided evidence that developers making code changes sometimes neglect to update the related documentation, thus creating inconsistencies that may contribute to faults and crashes. In dynamically typed languages, such as Python, an inconsistency in the documentation may lead to a mismatch in type declarations that is only visible at runtime. With our study, we investigate how often the documentation is inconsistent in a sample of 239 methods from five open-source Python software projects. Our results highlight that more than 20% of the comments are either partially defined or entirely missing and that almost 1% of the methods in the analyzed projects contain type inconsistencies. Based on these results, we create a tool, PyID, to detect type mismatches in Python documentation early, and we evaluate its performance against our oracle.
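A minimal illustrative sketch (not PyID itself) of detecting one kind of mismatch: a type declared in a Sphinx-style docstring that disagrees with the function's annotation; the regex and the example function are assumptions:

```python
# Toy example: flag mismatches between types declared in a Sphinx-style
# docstring (":type name: sometype") and the function's own annotations.
import inspect
import re

TYPE_RE = re.compile(r":type\s+(\w+):\s*(\w+)")

def docstring_type_mismatches(func):
    declared = dict(TYPE_RE.findall(inspect.getdoc(func) or ""))
    mismatches = []
    for name, param in inspect.signature(func).parameters.items():
        ann = param.annotation
        if name in declared and ann is not inspect.Parameter.empty:
            if getattr(ann, "__name__", str(ann)) != declared[name]:
                mismatches.append((name, declared[name], ann))
    return mismatches

def scale(x: float, factor: int) -> float:
    """Scale a value.

    :param x: the value
    :type x: int
    :param factor: multiplier
    :type factor: int
    """
    return x * factor

print(docstring_type_mismatches(scale))  # [('x', 'int', <class 'float'>)]
```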

User-Perceived Reusability Estimation Based on Analysis of Software Repositories
Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis, and Andreas Symeonidis
(University of Thessaloniki, Greece)
The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Assessing the quality, and specifically the reusability potential, of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed in this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms in order to construct models for reusability estimation at both the class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.
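For illustration only, a sketch under assumed definitions: a reusability proxy combining log-scaled stars and forks (the paper's actual score formulation may differ), and a regressor that estimates it from static metrics (the metric column names are hypothetical):

```python
# Hedged sketch: an illustrative reusability proxy built from GitHub stars and
# forks (not necessarily the paper's score), plus a regressor that estimates it
# from static code metrics at class/package level.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def reusability_proxy(stars: pd.Series, forks: pd.Series) -> pd.Series:
    # Log-scale popularity signals so a handful of very popular repositories
    # does not dominate the target.
    return 0.5 * np.log1p(stars) + 0.5 * np.log1p(forks)

METRICS = ["wmc", "dit", "cbo", "lcom", "loc"]   # hypothetical static-metric columns

def train_estimator(df: pd.DataFrame):
    y = reusability_proxy(df["stars"], df["forks"])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    return model.fit(df[METRICS], y)
```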
