4th ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE 2020), November 13, 2020, Virtual, USA
Frontmatter
Welcome from the Chairs
Welcome to the fourth edition of the Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE 2020), held virtually on November 13, 2020, and co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020).
Anti-patterns
A Preliminary Study on the Adequacy of Static Analysis Warnings with Respect to Code Smell Prediction
Savanna Lujan, Fabiano Pecorelli, Fabio Palomba, Andrea De Lucia, and Valentina Lenarduzzi
(Tampere University, Finland; University of Salerno, Italy; LUT University, Finland)
Code smells are poor implementation choices applied during software evolution that can affect source code maintainability. While several heuristic-based detection approaches have been proposed in the past, machine learning solutions have recently gained attention, since they may address some limitations of the state of the art. Unfortunately, machine learning-based code smell detectors still suffer from low accuracy. In this paper, we aim to advance knowledge in the field by investigating the role of static analysis warnings as features of machine learning models for the detection of three code smell types. We first verify the potential contribution of these features; then, we build code smell prediction models that exploit the most relevant features emerging from the first analysis. The main finding of the study is that the warnings given by the considered tools drastically increase the performance of code smell prediction models with respect to what has been reported by previous research in the field.
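To make the setup the abstract describes concrete, here is a minimal sketch, not the authors' implementation: per-class static analysis warning counts serve as the feature vector of a classifier that predicts whether a class is smelly. The feature layout, data, and model choice are illustrative assumptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical per-class feature vectors: counts of warnings raised by
# static analysis tools, one column per warning category.
X = [
    [12, 3, 0, 5],   # class A: warning counts across four categories
    [0, 1, 0, 0],    # class B
    [7, 8, 2, 4],    # class C
    [1, 0, 0, 1],    # class D
]
y = [1, 0, 1, 0]     # 1 = smelly (e.g., God Class), 0 = clean

model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=2)  # toy data, toy cross-validation
print(scores)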
@InProceedings{MaLTeSQuE20p1,
author = {Savanna Lujan and Fabiano Pecorelli and Fabio Palomba and Andrea De Lucia and Valentina Lenarduzzi},
title = {A Preliminary Study on the Adequacy of Static Analysis Warnings with Respect to Code Smell Prediction},
booktitle = {Proc.\ MaLTeSQuE},
publisher = {ACM},
pages = {1--6},
doi = {10.1145/3416505.3423559},
year = {2020},
}
DeepIaC: Deep Learning-Based Linguistic Anti-pattern Detection in IaC
Nemania Borovits, Indika Kumara, Parvathy Krishnan, Stefano Dalla Palma, Dario Di Nucci, Fabio Palomba, Damian A. Tamburri, and Willem-Jan van den Heuvel
(Tilburg University, Netherlands; JADS, Netherlands; Eindhoven University of Technology, Netherlands; University of Salerno, Italy)
Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in infrastructure as code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build and use the abstract syntax tree of IaC code units to create their code embeddings. Our experiments with a dataset systematically extracted from open source repositories show that our approach yields an accuracy between 0.785 and 0.915 in detecting inconsistencies.
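As a simplified illustration of the idea, and not the DeepIaC implementation, the following sketch trains token embeddings over toy IaC token streams and compares a unit's name tokens against its body tokens with cosine similarity, whereas the paper feeds such embeddings to a deep learning model. The corpus and token lists are hypothetical.

import numpy as np
from gensim.models import Word2Vec

# Hypothetical token sequences extracted from IaC units (e.g., via an AST walk).
corpus = [
    ["install", "package", "nginx", "state", "present"],
    ["remove", "package", "apache2", "state", "absent"],
    ["start", "service", "nginx", "enabled", "true"],
]
model = Word2Vec(corpus, vector_size=32, min_count=1, seed=0)

def mean_vector(tokens):
    # Average the embeddings of all in-vocabulary tokens.
    return np.mean([model.wv[t] for t in tokens if t in model.wv], axis=0)

name_vec = mean_vector(["install", "nginx"])              # tokens from the unit name
body_vec = mean_vector(["remove", "package", "apache2"])  # tokens from the body
cos = np.dot(name_vec, body_vec) / (np.linalg.norm(name_vec) * np.linalg.norm(body_vec))
print(f"name/body cosine similarity: {cos:.2f}")  # low similarity hints at an inconsistency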
@InProceedings{MaLTeSQuE20p7,
author = {Nemania Borovits and Indika Kumara and Parvathy Krishnan and Stefano Dalla Palma and Dario Di Nucci and Fabio Palomba and Damian A. Tamburri and Willem-Jan van den Heuvel},
title = {DeepIaC: Deep Learning-Based Linguistic Anti-pattern Detection in IaC},
booktitle = {Proc.\ MaLTeSQuE},
publisher = {ACM},
pages = {7--12},
doi = {10.1145/3416505.3423564},
year = {2020},
}
Speeding Up the Data Extraction of Machine Learning Approaches: A Distributed Framework
Martin Steinhauer and Fabio Palomba
(University of Salerno, Italy)
In the last decade, mining software repositories (MSR) has become one of the most important sources for feeding machine learning models. Open-source projects on platforms like GitHub, in particular, provide a tremendous amount of data and make it easily accessible. Nevertheless, there is still a lack of standardized pipelines for extracting data in an automated and fast way. Even though several frameworks and tools exist that fulfill specific tasks or parts of the data extraction process, none of them allows building an automated mining pipeline, nor do they support full parallelization. As a consequence, researchers interested in mining software repositories to feed machine learning models are often forced to re-implement commonly used tasks, leading to additional development time and sub-optimally integrated libraries.
This preliminary study aims to demonstrate the current limitations of existing tools, and of Git itself, that threaten the prospects of standardization and parallelization. We also introduce the multi-dimensionality aspects of a Git repository and how they affect computation time. As a proof of concept, we define an exemplary pipeline for predicting refactoring operations and assess its performance. Finally, we discuss the limitations of the pipeline and further optimizations to be done.
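A minimal sketch of the kind of parallelization the paper discusses, assuming a local clone at a hypothetical path: per-commit extraction jobs are independent read-only git invocations, so they can be fanned out across worker processes. The extracted "metric" (number of files touched per commit) is illustrative only.

import subprocess
from multiprocessing import Pool

REPO = "/path/to/repo"  # hypothetical local clone

def list_commits(repo):
    # All commit hashes reachable from HEAD.
    out = subprocess.run(["git", "-C", repo, "rev-list", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

def files_touched(sha):
    # One independent git call per commit; safe to run in parallel workers.
    out = subprocess.run(
        ["git", "-C", REPO, "show", "--pretty=format:", "--name-only", sha],
        capture_output=True, text=True, check=True)
    return sha, len([f for f in out.stdout.split("\n") if f])

if __name__ == "__main__":
    with Pool() as pool:
        for sha, n in pool.map(files_touched, list_commits(REPO)):
            print(sha[:8], n)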
@InProceedings{MaLTeSQuE20p13,
author = {Martin Steinhauer and Fabio Palomba},
title = {Speeding Up the Data Extraction of Machine Learning Approaches: A Distributed Framework},
booktitle = {Proc.\ MaLTeSQuE},
publisher = {ACM},
pages = {13--18},
doi = {10.1145/3416505.3423562},
year = {2020},
}
Anomalies and Defects
RARE: A Labeled Dataset for Cloud-Native Memory Anomalies
Francesco Lomio, Diego Martínez Baselga, Sergio Moreschini, Heikki Huttunen, and Davide Taibi
(Tampere University, Finland)
Anomaly detection has been attracting interest from both industry and the research community for many years, as the number of published papers and adopted services has grown exponentially over the last decade.
One of the reasons behind this is the wide adoption of cloud systems by the majority of players in multiple industries, such as online shopping, advertisement, or remote computing.
In this work we propose a Dataset foR cloud-nAtive memoRy anomaliEs: RARE. It includes labeled anomaly time-series data comprising over 900 unique metrics.
This dataset was generated using a microservice that injects an artificial byte stream to overload the nodes, provoking memory anomalies which in some cases resulted in a crash. The system was built using a Kafka server deployed on Kubernetes, and we used Prometheus to access and download the metrics related to the server.
In this paper we present a dataset that can be used together with machine learning algorithms for detecting anomalies in a cloud-based system. The dataset will be available as a CSV file through an online repository. We also include an example application that uses a Random Forest algorithm to classify the data as anomalous or not. The goal of the RARE dataset is to help the development of more accurate and reliable machine learning methods for anomaly detection in cloud-based systems.
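The following is a minimal sketch of the kind of example application the abstract mentions; the CSV file name and the label column name are assumptions, not the dataset's documented schema.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("rare.csv")      # hypothetical file name for the dataset
X = df.drop(columns=["label"])    # one column per collected metric
y = df["label"]                   # 1 = anomalous window, 0 = normal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))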
@InProceedings{MaLTeSQuE20p19,
author = {Francesco Lomio and Diego Martínez Baselga and Sergio Moreschini and Heikki Huttunen and Davide Taibi},
title = {RARE: A Labeled Dataset for Cloud-Native Memory Anomalies},
booktitle = {Proc.\ MaLTeSQuE},
publisher = {ACM},
pages = {19--24},
doi = {10.1145/3416505.3423560},
year = {2020},
}
TraceSim: A Method for Calculating Stack Trace Similarity
Roman Vasiliev, Dmitrij Koznov, George Chernishev, Aleksandr Khvorov, Dmitry Luciv, and Nikita Povarov
(JetBrains, Russia; Saint-Petersburg State University, Russia; ITMO University, Russia)
Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products with large user bases, which is why there are many different approaches to automating this task.
Moreover, it is important to improve the quality of triaging, given the large volume of reports that needs to be processed properly. Therefore, even a relatively small improvement could play a significant role in the overall accuracy of report bucketing.
The majority of existing studies use some kind of stack trace similarity metric, based either on information retrieval techniques or on string matching methods. However, the quality of triaging is still insufficient.
In this paper, we describe TraceSim, a novel approach to this problem that combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented in an industrial-grade report triaging system. An evaluation on a manually labeled dataset shows significantly better results compared to baseline approaches.
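A minimal sketch in the spirit of the abstract, not the TraceSim algorithm itself: stack traces are modeled as frame sequences, and an edit distance weights each frame by its inverse document frequency so that rare frames contribute more to the distance. The corpus is hypothetical, and the learned weighting of the actual method is omitted.

import math
from collections import Counter

corpus = [  # hypothetical corpus of stack traces (lists of frame names)
    ["main", "parse", "read"],
    ["main", "parse", "decode"],
    ["main", "render", "draw"],
]
df = Counter(frame for trace in corpus for frame in set(trace))
N = len(corpus)

def idf(frame):
    # Smoothed inverse document frequency of a frame across all traces.
    return math.log((N + 1) / (df.get(frame, 0) + 1)) + 1.0

def weighted_edit_distance(a, b):
    # Standard Levenshtein DP, with per-frame IDF as the edit cost.
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + idf(a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + idf(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else max(idf(a[i - 1]), idf(b[j - 1]))
            d[i][j] = min(d[i - 1][j] + idf(a[i - 1]),   # delete frame from a
                          d[i][j - 1] + idf(b[j - 1]),   # insert frame from b
                          d[i - 1][j - 1] + sub)         # match or substitute
    return d[m][n]

print(weighted_edit_distance(corpus[0], corpus[1]))  # similar traces -> small distance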
@InProceedings{MaLTeSQuE20p25,
author = {Roman Vasiliev and Dmitrij Koznov and George Chernishev and Aleksandr Khvorov and Dmitry Luciv and Nikita Povarov},
title = {TraceSim: A Method for Calculating Stack Trace Similarity},
booktitle = {Proc.\ MaLTeSQuE},
publisher = {ACM},
pages = {25--30},
doi = {10.1145/3416505.3423561},
year = {2020},
}
Singling the Odd Ones Out: A Novelty Detection Approach to Find Defects in Infrastructure-as-Code
Stefano Dalla Palma, Majid Mohammadi, Dario Di Nucci, and Damian A. Tamburri
(Tilburg University, Netherlands; JADS, Netherlands; Eindhoven University of Technology, Netherlands)
Infrastructure-as-Code (IaC) is increasingly adopted. However, little is known about how to best maintain and evolve it. Previous studies focused on defining machine learning models to predict defect-prone blueprints using supervised binary classification. This class of techniques uses both defective and non-defective instances in the training phase. Furthermore, the high imbalance between defective and non-defective samples makes training more difficult and leads to unreliable classifiers. In this work, we tackle the defect-prediction problem from a different perspective using novelty detection: we evaluate three techniques, namely OneClassSVM, LocalOutlierFactor, and IsolationForest, and compare them with a baseline RandomForest binary classifier. Such models are trained using only non-defective samples: defective data points are treated as novelties because the number of defective samples is too small compared to non-defective ones. We conduct an empirical study on an extremely imbalanced dataset consisting of 85 real-world Ansible projects containing only a small number of defective instances. We found that novelty detection techniques can recognize defects with a high level of precision and recall, an AUC-PR up to 0.86, and an MCC up to 0.31. We believe our results can influence current trends in defect detection and put forward a new research path toward dealing with this problem.
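As a minimal sketch of this training setup on toy data (not the paper's Ansible dataset), one of the three techniques, IsolationForest, can be fit on non-defective samples only and then used to flag unseen defective points as novelties.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 5))  # non-defective training samples only
unseen = np.vstack([rng.normal(0.0, 1.0, size=(5, 5)),   # likely inliers
                    rng.normal(6.0, 1.0, size=(5, 5))])  # likely "defective" novelties

model = IsolationForest(random_state=0).fit(clean)
print(model.predict(unseen))  # +1 = inlier, -1 = novelty (predicted defect)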
@InProceedings{MaLTeSQuE20p31,
author = {Stefano Dalla Palma and Majid Mohammadi and Dario Di Nucci and Damian A. Tamburri},
title = {Singling the Odd Ones Out: A Novelty Detection Approach to Find Defects in Infrastructure-as-Code},
booktitle = {Proc.\ MaLTeSQuE},
publisher = {ACM},
pages = {31--36},
doi = {10.1145/3416505.3423563},
year = {2020},
}