MaLTeSQuE 2022 – Proceedings

Message from the Chairs
Welcome to the 6th edition of the workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE 2022), held in Singapore on November 18th, 2022, and co-located with the 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022). MaLTeSQuE received six submissions from around the world, five of which were included in the program. The program also features two keynotes, by Yuriy Brun and Mike Papadakis, on the promises, dangers, and best practices of working at the intersection of machine learning and software engineering.

Keynote

The Promise and Perils of Using Machine Learning When Engineering Software (Keynote Paper)
Yuriy Brun
(University of Massachusetts at Amherst, USA)
Machine learning has radically changed what computing can accomplish, including the limits of what software engineering can do. I will discuss recent software engineering advances that machine learning has enabled, from automatically repairing software bugs to data-driven software systems that automatically learn to make decisions. Unfortunately, with the promises of these new technologies come serious perils. For example, automatically generated program patches can break as much functionality as they repair. And self-learning, data-driven software can make decisions that result in unintended consequences, including unsafe, racist, or sexist behavior. But to build solutions to these shortcomings, we may need to look no further than machine learning itself. I will introduce multiple ways machine learning can help verify software properties, leading to higher-quality systems.

Machine Learning for Code Assessment

Neural Language Models for Code Quality Identification
Srinivasan Sengamedu and Hangqi Zhao
(Amazon, USA; Twitter, USA)
Neural language models for code have led to interesting applications such as code completion and bug-fix generation. Another type of code-related application is the identification of code quality issues such as repetitive code and unnatural code. Neural language models contain implicit knowledge about such aspects. We propose a framework to detect code quality issues using neural language models. To handle repository-specific conventions, we use local, or repository-specific, models. The models are successful in detecting real-world code quality issues with a low false positive rate.
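To make the idea of scoring code "naturalness" concrete, here is a minimal sketch in Python. It is not the authors' framework: it swaps the neural language model for a simple add-one-smoothed bigram model trained only on one repository's own lines (mirroring the idea of repository-specific models), and flags lines whose average per-token surprisal exceeds a threshold. The tokenizer regex, `BigramModel`, `flag_unnatural_lines`, and the threshold value are all hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line: str) -> list[str]:
    return TOKEN_RE.findall(line)


class BigramModel:
    """Add-one-smoothed bigram model; a stand-in for a neural code language model."""

    def __init__(self, lines: list[str]) -> None:
        self.unigrams: Counter[str] = Counter()
        self.bigrams: Counter[tuple[str, str]] = Counter()
        for line in lines:
            toks = ["<s>"] + tokenize(line)  # <s> marks the start of a line
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab_size = len(self.unigrams) + 1  # +1 leaves room for unseen tokens

    def surprisal(self, line: str) -> float:
        """Average negative log2-probability per token (higher = less natural)."""
        toks = ["<s>"] + tokenize(line)
        if len(toks) < 2:
            return 0.0
        total = 0.0
        for prev, cur in zip(toks, toks[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + self.vocab_size)
            total -= math.log2(p)
        return total / (len(toks) - 1)


def flag_unnatural_lines(model: BigramModel, lines: list[str], threshold: float) -> list[str]:
    """Return the lines whose surprisal exceeds the (hypothetical) threshold."""
    return [line for line in lines if model.surprisal(line) > threshold]


# Usage: train a repository-local model, then score candidate lines.
repo_lines = ["x = x + 1", "y = y + 1", "x = x + 1", "z = z + 1"] * 5
model = BigramModel(repo_lines)
flagged = flag_unnatural_lines(model, ["x = x + 1", "1 + + = = x"], threshold=3.0)
```

Training on a single repository means that conventions common in that repository score as natural even if they would look unusual elsewhere, which is the motivation for local models.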

Are Machine Programming Systems using Right Source-Code Measures to Select Code Repositories?
Niranjan Hasabnis
(Intel Corporation, USA)
Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing, and it aims to assist software and hardware engineers, among other applications. Along with powerful compute resources, MP systems often rely on vast amounts of open-source code to learn interesting properties of code and programming, and to solve problems in areas such as debugging, code recommendation, and auto-completion. Unfortunately, several existing MP systems either do not consider the quality of the code repositories they select, or select them using quality measures that differ from those typically used in the software engineering community. As such, the impact of repository quality on the performance of these systems needs to be studied.
In this preliminary paper, we evaluate the impact of repository quality on the performance of a candidate MP system, ControlFlag. Towards that objective, we develop a framework, named GitRank, that ranks open-source repositories by quality, maintainability, and popularity, building on existing research on this topic. We then apply GitRank to evaluate the correlation between the quality measures used by the candidate MP system and those used by our framework. Our preliminary results reveal some correlation between the quality measures used in GitRank and ControlFlag's performance, suggesting that some of the measures used in GitRank are applicable to ControlFlag. However, the results also raise questions about which quality measures are right for selecting the code repositories used in MP systems. We believe that our findings also offer interesting insights into the code quality measures that affect the performance of MP systems.
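The abstract does not say how GitRank aggregates its measures or which correlation coefficient it uses. As a hedged illustration only, the sketch below assumes a hypothetical composite score (the mean of min-max-normalized per-repository measures) and compares it with a per-repository performance figure using Spearman rank correlation. `composite_scores`, `spearman`, and the measure names are assumptions, not part of GitRank.

```python
from __future__ import annotations


def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(repos: list[dict[str, float]], keys: list[str]) -> list[float]:
    """Hypothetical GitRank-style score: mean of the min-max-normalized measures."""
    columns = {k: min_max([r[k] for r in repos]) for k in keys}
    return [sum(columns[k][i] for k in keys) / len(keys) for i in range(len(repos))]


def rank(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(rank(x), rank(y))
```

For example, `spearman(composite_scores(repos, ["stars", "maintainability"]), performance)` with one performance figure per repository (both keys hypothetical); a value near +1 would mean that higher-scored repositories go with better MP-system performance. Rank correlation is used here because it only assumes a monotonic, not linear, relationship.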

Machine Learning for Software Processes

On the Application of Machine Learning Models to Assess and Predict Software Reusability
Matthew Yit Hang Yeow, Chun Yong Chong, and Mei Kuan Lim
(Monash University, Malaysia)
Software reuse has proven to be an effective strategy for developers to significantly increase software quality, reduce costs, and increase the effectiveness of software development. Research in software reuse typically addresses two main hurdles: reducing the time and effort required to identify reusable candidates, and avoiding the selection of low-quality software components that may lead to higher development costs (e.g., fixing bugs and errors, refactoring). Human judgment is inherently limited in reliability and effectiveness. Hence, this paper investigates the applicability of Machine Learning (ML) algorithms to assessing software reuse. We collected more than 32k open-source projects and used GitHub forks as the ground truth for their reuse. We developed ML classification pipelines based on both internal and external software metrics to predict software reuse. Our best-performing ML classification model achieved an accuracy of 86%, outperforming existing research in both prediction performance and data coverage. Subsequently, we leverage our results to identify the key software characteristics that make software highly reusable. Our results show that size-related metrics (i.e., the number of setters, methods, and attributes) contribute most to the reuse of software.
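The abstract names neither the classifiers nor the fork threshold used to turn fork counts into labels. The following minimal sketch assumes a hypothetical rule (a project counts as reused if it has at least `threshold` forks), trains a plain logistic regression by batch gradient descent on standardized metrics as a stand-in for the paper's ML pipelines, and reads the magnitude of each standardized coefficient as a rough indicator of how strongly a metric relates to reuse. All names and the toy data are hypothetical.

```python
from __future__ import annotations

import math


def label_by_forks(forks: list[int], threshold: int) -> list[int]:
    """Hypothetical ground truth: 1 ('reused') if a project has >= threshold forks."""
    return [1 if f >= threshold else 0 for f in forks]


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column so that coefficient magnitudes are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X: list[list[float]], y: list[int], w: list[float], b: float) -> float:
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Toy data (hypothetical): [number of methods, number of attributes] per project.
metrics = [[2, 1], [4, 2], [6, 3], [8, 4], [22, 4], [24, 3], [26, 2], [28, 1]]
forks = [0, 3, 1, 2, 50, 12, 40, 90]
y = label_by_forks(forks, threshold=10)
X = standardize(metrics)
w, b = train_logistic(X, y)
```

Here `accuracy(X, y, w, b)` is measured on the training data itself, which only checks that the sketch fits the toy set; a real study would evaluate on held-out projects.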

Using Machine Learning to Guide the Application of Software Refactorings: A Preliminary Exploration
Nikolaos Nikolaidis, Dimitrios Zisis, Apostolos Ampatzoglou, Nikolaos Mittas, and Alexander Chatzigeorgiou
(University of Macedonia, Greece; Accenture, Greece; International Hellenic University, Greece)
Refactorings constitute the most direct and comprehensible approach for addressing software quality issues, stemming directly from identified code smells. Nevertheless, despite their popularity in both the research and industrial communities: (a) a refactoring is not guaranteed to be successful; and (b) the plethora of available refactoring opportunities does not allow their comprehensive application. Thus, there is a need for guidance on when to apply a refactoring opportunity and when the development team should postpone it. The notion of interest forms one of the major pillars of the Technical Debt metaphor, expressing the additional maintenance effort that will be required because of the accumulated debt. To assess the benefits of refactorings and guide when a refactoring should take place, we first present the results of an empirical study assessing and quantifying the impact of various refactorings on Technical Debt interest (building a real-world training set), and then use machine learning approaches to guide the application of future refactorings. To estimate interest, we rely on the FITTED framework, which, for each object-oriented class, assesses its distance from the best-quality peer; the refactorings applied throughout the history of a software project are extracted with the RefactoringMiner tool. The dataset of this study involves 4,166 refactorings applied across 26,058 revisions of 10 Apache projects. The results suggest that the majority of refactorings reduce Technical Debt interest; however, considering all refactoring applications, it cannot be claimed that the mean impact differs from zero, confirming the results of previous studies that highlight mixed effects from the application of refactorings. To alleviate this problem, we have built an adequately accurate (~70%) model for predicting whether or not a refactoring should take place in order to reduce Technical Debt interest.
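FITTED's actual interest computation is more detailed than the one-line description in this abstract. Purely as an assumption-laden sketch, the code below treats a class's interest as the Euclidean distance between its (hypothetical, normalized) metric vector and that of its best-quality peer, labels a refactoring as worth applying when it lowers that distance, and averages the change in interest across refactorings. `ClassMetrics`, the peer-selection rule, and the metric values are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassMetrics:
    name: str
    metrics: tuple[float, ...]  # hypothetical normalized metrics; lower is better


def best_peer(peers: list[ClassMetrics]) -> ClassMetrics:
    # Hypothetical rule: the best-quality peer is the one with the smallest metric sum.
    return min(peers, key=lambda c: sum(c.metrics))


def interest(cls: ClassMetrics, peers: list[ClassMetrics]) -> float:
    """Stand-in for FITTED interest: distance from the best-quality peer."""
    return math.dist(cls.metrics, best_peer(peers).metrics)


def should_refactor(before: ClassMetrics, after: ClassMetrics, peers: list[ClassMetrics]) -> int:
    """Training label: 1 if the refactoring lowered interest, else 0."""
    return 1 if interest(after, peers) < interest(before, peers) else 0


def mean_impact(deltas: list[float]) -> float:
    """Mean change in interest (after minus before); negative means interest went down."""
    return sum(deltas) / len(deltas)


peers = [ClassMetrics("A", (0.2, 0.1)), ClassMetrics("B", (0.8, 0.6))]
before = ClassMetrics("C", (0.5, 0.5))
after = ClassMetrics("C", (0.3, 0.2))
label = should_refactor(before, after, peers)
```

With toy deltas such as `[-0.2, -0.1, -0.3, 0.7]`, three of four refactorings reduce interest, yet `mean_impact` is slightly positive (about 0.025), which illustrates how a majority of beneficial refactorings can coexist with a mean impact near zero.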

DeepCrash: Deep Metric Learning for Crash Bucketing Based on Stack Trace
Liu Chao, Xie Qiaoluan, Li Yong, Xu Yang, and Choi Hyun-Deok
(SAP Labs, China; SAP Labs, South Korea)
Some software projects collect vast numbers of crash reports from testing and from end users, and then organize them into groups so that bugs can be fixed efficiently. This task is called crash report bucketing. In particular, a crash similarity measurement approach that is both highly precise and fast is critical for large-scale crash bucketing. In this paper, we propose a deep learning-based crash bucketing method that maps stack traces to feature vectors and groups these feature vectors into buckets. First, we develop a frame tokenization method for stack traces, called frame2vec, which extracts frame representations based on frame segmentation. Second, we propose a deep metric model that maps the sequential stack trace representations into feature vectors whose similarity represents the similarity of crashes. Third, a clustering algorithm rapidly groups similar feature vectors into the same buckets to produce the final result. Additionally, we evaluate our approach against seven competing methods on both private and public data sets. The results reveal that our method can speed up clustering while maintaining highly competitive precision.
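The three-step pipeline in this abstract (frame tokenization, embedding, clustering) can be illustrated with a much simpler stand-in. This sketch is not DeepCrash: instead of frame2vec and a learned deep metric model, it segments frame names into sub-tokens, builds depth-weighted bag-of-token vectors (assuming frames nearer the top of the trace matter more), and groups traces with greedy leader clustering on cosine similarity. The weighting scheme, `bucket`, and the 0.8 threshold are hypothetical; the abstract does not name the clustering algorithm.

```python
from __future__ import annotations

import math
import re

# Sub-tokens of a frame name: camelCase words, or capital runs such as "HTTP".
# Digits (e.g. line numbers) are dropped.
SUBTOKEN_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")


def segment_frame(frame: str) -> list[str]:
    """Split e.g. 'com.acme.HTTPServer.handleEvent' into lowercase sub-tokens."""
    return [t.lower() for t in SUBTOKEN_RE.findall(frame)]


def embed(trace: list[str]) -> dict[str, float]:
    """Bag of sub-tokens, weighting the frame at depth d by 1 / (d + 1)."""
    vec: dict[str, float] = {}
    for depth, frame in enumerate(trace):
        for tok in segment_frame(frame):
            vec[tok] = vec.get(tok, 0.0) + 1.0 / (depth + 1)
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def bucket(traces: list[list[str]], threshold: float = 0.8) -> list[int]:
    """Greedy leader clustering: join the most similar bucket at or above threshold."""
    leaders: list[dict[str, float]] = []
    assignment: list[int] = []
    for trace in traces:
        vec = embed(trace)
        best, best_sim = -1, threshold
        for i, leader in enumerate(leaders):
            sim = cosine(vec, leader)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best == -1:  # no bucket is similar enough: this trace leads a new one
            leaders.append(vec)
            best = len(leaders) - 1
        assignment.append(best)
    return assignment


traces = [
    ["com.acme.db.ConnectionPool.acquire", "com.acme.db.Repository.save", "com.acme.api.OrderController.create"],
    ["com.acme.db.ConnectionPool.acquire", "com.acme.db.Repository.save", "com.acme.api.UserController.update"],
    ["com.acme.ui.RenderLoop.draw", "com.acme.ui.Widget.paint", "com.acme.ui.Window.show"],
]
buckets = bucket(traces)
```

Leader clustering makes one pass over the traces and compares each trace only with existing bucket leaders, which keeps the sketch fast; the trade-off is that results depend on the order in which traces arrive.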

6th International Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE 2022)
