SANER 2020 Workshops
Workshops of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering (SANER)

2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF), February 18, 2020, London, ON, Canada

IBF 2020 – Proceedings





Message from the Chairs
Welcome to the Second International Workshop on Intelligent Bug Fixing (IBF 2020), co-located with the 27th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2020), held in London, ON, Canada.

Patches and Testing

Exploring the Differences between Plausible and Correct Patches at Fine-Grained Level
Bo Yang and Jinqiu Yang
(Concordia University, Canada)
Test-based automated program repair techniques use test cases to validate the correctness of automatically generated patches. However, insufficient test cases lead to the generation of incorrect patches, i.e., patches that pass all the test cases but are nevertheless incorrect. In this work, we present an exploratory study to understand which runtime behaviours are modified by automatically generated plausible patches, and how such modifications differ from those made by correct patches. We utilized an off-the-shelf invariant generation tool to infer an abstraction of runtime behaviours and computed the modified runtime behaviours at the abstraction level. Our exploratory study shows that the majority of the studied plausible patches (92/96) expose different modifications of runtime behaviours (i.e., as captured by the invariant generation tool) compared to correct patches.
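
As a rough illustration of comparing modified runtime behaviours at the abstraction level, the sketch below treats each program version as a set of inferred invariants and computes what a patch removes and adds. The invariant strings and the function name `modified_behaviours` are hypothetical; the abstract does not name the invariant generation tool or describe its output format.

```python
def modified_behaviours(before, after):
    """Return the invariants a patch removes and adds, as two sets."""
    before, after = set(before), set(after)
    return {"removed": before - after, "added": after - before}


# Hypothetical invariants inferred for the buggy program and two patched versions.
buggy = {"x >= 0", "result != null", "size == 3"}
plausible = {"x >= 0", "size == 3"}
correct = {"x >= 0", "result != null", "size >= 3"}

plausible_delta = modified_behaviours(buggy, plausible)
correct_delta = modified_behaviours(buggy, correct)

# A plausible patch "exposes a different modification" when its delta
# is not the same as the delta of the correct patch.
differs = plausible_delta != correct_delta
```

Here the plausible patch only drops an invariant, while the correct patch replaces one invariant with another, so the two deltas differ.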

Can This Fault Be Detected by Automated Test Generation: A Preliminary Study
Hangyuan Cheng, Ping Ma, Jingxuan Zhang, and Jifeng Xuan
(Wuhan University, China; Nanjing University of Aeronautics and Astronautics, China)
Automated test generation can reduce the manual effort needed to improve software quality. A test generation method employs code coverage, such as the widely used branch coverage, to guide the inference of test cases. These test cases can be used to detect hidden faults. An automated tool takes a specific type of code coverage as a configurable parameter. Given an automated test generation tool, a fault may be detected under one type of code coverage but missed under another. In frequently released software projects, the time budget for testing is limited. Configuring code coverage for a testing tool can effectively improve the quality of projects. In this paper, we conduct a preliminary study on whether a fault can be detected by a specific type of code coverage in automated test generation. We build predictive models with 60 metrics of faulty source code to identify detectable faults under eight types of code coverage, such as branch coverage. In the experiment, an off-the-shelf tool, EvoSuite, is used to generate test data. Experimental results show that different types of code coverage result in the detection of different faults. The extracted metrics of faulty source code can be used to predict whether a fault can be detected with the given code coverage; all studied types of code coverage can increase the number of detected faults that are missed by the widely used branch coverage. This study can be viewed as a preliminary result to support the configuration of code coverage in the application of automated test generation.

Utilizing Source Code Embeddings to Identify Correct Patches
Viktor Csuvik, Dániel Horváth, Ferenc Horváth, and László Vidács
(University of Szeged, Hungary)
The so-called Generate-and-Validate approach of Automatic Program Repair consists of two main activities: the generate activity, which produces candidate solutions to the problem, and the validate activity, which checks the correctness of the generated solutions. The latter, however, might not give a reliable result, since most techniques establish the correctness of the solutions by (re-)running the available test cases. A program is marked as a possible fix if it passes all the available test cases. Although tests can be run automatically, in real-life applications the problem of over- and underfitting often occurs, resulting in inadequate patches. At this point, manual investigation of repair candidates is needed even though they passed the tests. Our goal is to investigate ways to predict correct patches. The core idea is to exploit textual and structural similarity between the original (buggy) program and the generated patches. To do so, we apply the Doc2vec and BERT embedding methods to source code. So far, APR tools generate mostly one-line fixes, leaving most of the original source code intact. Our observation was that patches which bring in new variables or make larger changes in the code are usually the incorrect ones. The proposed approach was evaluated on the QuixBugs dataset, consisting of 40 bugs and their fixes. Our approach successfully filtered out 45% of the incorrect patches.
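
The similarity idea can be sketched without any embedding model. The example below is not the Doc2vec or BERT approach from the abstract; it substitutes a plain token-sequence ratio from Python's standard `difflib` module as a stand-in, only to show how candidate patches might be ranked by closeness to the original program. The function names and the sample programs are assumptions made for illustration.

```python
import difflib
import re


def tokenize(code):
    """Split source text into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def similarity(original, patched):
    """Token-level similarity in [0, 1]; a stand-in for embedding similarity."""
    matcher = difflib.SequenceMatcher(
        None, tokenize(original), tokenize(patched), autojunk=False
    )
    return matcher.ratio()


def rank_patches(original, patches):
    """Order candidate patches from most to least similar to the original."""
    return sorted(patches, key=lambda p: similarity(original, p), reverse=True)


# Hypothetical buggy program and two candidate patches.
original = "def gcd(a, b):\n    if b == 0:\n        return a\n    return gcd(a % b, b)"
small_patch = "def gcd(a, b):\n    if b == 0:\n        return a\n    return gcd(b, a % b)"
large_patch = (
    "def gcd(a, b):\n    tmp = 0\n    while b != 0:\n"
    "        tmp = a % b\n        a = b\n        b = tmp\n    return a"
)

ranked = rank_patches(original, [large_patch, small_patch])
```

Under this stand-in measure, the one-line fix ranks above the patch that introduces a new variable and restructures the function, which mirrors the observation that larger changes tend to be the incorrect ones.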


Quality and Bugs

An Empirical Study of High-Impact Factors for Machine Learning-Based Vulnerability Detection
Wei Zheng, Jialiang Gao, Xiaoxue Wu, Yuxing Xun, Guoliang Liu, and Xiang Chen
(Northwestern Polytechnical University, China; Nantong University, China)
Vulnerability detection is an important topic in software engineering. To improve the effectiveness and efficiency of vulnerability detection, many traditional machine learning-based and deep learning-based vulnerability detection methods have been proposed. However, the impact of different factors on vulnerability detection is unknown. For example, classification models and vectorization methods can directly affect the detection results, and code replacement can affect the features of vulnerability detection. We conduct a comparative study to evaluate the impact of different classification algorithms, vectorization methods, and the replacement of user-defined variable and function names. In this paper, we collected three different vulnerability code datasets. These datasets correspond to different types of vulnerabilities and have different proportions of source code. In addition, we extract and analyze the features of the vulnerability code datasets to explain some experimental results. Our findings from the experimental results can be summarized as follows: (i) deep learning performs better than traditional machine learning, and BLSTM achieves the best performance. (ii) CountVectorizer can improve the performance of traditional machine learning. (iii) Different vulnerability types and different code sources generate different features. We use the Random Forest algorithm to generate the features of the vulnerability code datasets. These generated features include system-related functions, syntax keywords, and user-defined names. (iv) Datasets without the replacement of user-defined variable and function names achieve better vulnerability detection results.
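
To make the name-replacement factor concrete, the sketch below shows one simple way user-defined names in C-like code could be mapped to placeholders. It is an assumption-laden illustration, not the procedure used in the study: the placeholder scheme (`FUN1`, `VAR1`), the tiny `KEYWORDS` and `LIBRARY` sets, and the regex heuristic are all hypothetical, and string literals or comments are not handled.

```python
import re

# Small, incomplete sets used only for this illustration.
KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof"}
LIBRARY = {"strcpy", "memcpy", "malloc", "free", "printf"}


def normalize_names(code):
    """Replace user-defined identifiers with FUNn / VARn placeholders."""
    mapping = {}
    counters = {"FUN": 0, "VAR": 0}

    def replace(match):
        name = match.group(0)
        if name in KEYWORDS or name in LIBRARY:
            return name  # keep syntax keywords and system-related functions
        if name not in mapping:
            # Heuristic: an identifier directly followed by "(" is a function.
            rest = match.string[match.end():].lstrip()
            kind = "FUN" if rest.startswith("(") else "VAR"
            counters[kind] += 1
            mapping[name] = f"{kind}{counters[kind]}"
        return mapping[name]

    return re.sub(r"\b[A-Za-z_]\w*\b", replace, code)


sample = "void copy(char *dst, char *src) { strcpy(dst, src); }"
normalized = normalize_names(sample)
```

After normalization, only keywords and library calls such as `strcpy` keep their original spelling, which is why this step removes the user-defined names that the abstract lists among the generated features.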

An Empirical Study of Bug Bounty Programs
Thomas Walshe and Andrew Simpson
(University of Oxford, UK)
The task of identifying vulnerabilities is commonly outsourced to hackers participating in bug bounty programs. As of July 2019, bug bounty platforms such as HackerOne have over 200 publicly listed programs, with programs listed on HackerOne being responsible for the discovery of tens of thousands of vulnerabilities since 2013. We report the results of an empirical analysis that was undertaken using the data available from two bug bounty platforms to understand the costs and benefits of bug bounty programs both to participants and to organisations. We consider the economics of bug bounty programs, investigating the costs and benefits to those running such programs and the hackers that participate in finding vulnerabilities. We find that the average cost of operating a bug bounty program for a year is now less than the cost of hiring two additional software engineers.

Why Is My Bug Wontfix?
Qingye Wang
(Zhejiang University, China)
Developers often use bug reports to triage and fix bugs. However, not every bug can be fixed eventually. To understand the underlying reasons why bugs are marked wontfix, we conduct an empirical study on three open source projects (i.e., Mozilla, Eclipse and Apache OpenOffice) in Bugzilla. First, we manually analyzed 600 wontfix bug reports. Second, we used the open card sorting approach to label the reasons why these bug reports were marked wontfix, and we summarized 12 categories of reasons. Next, we further studied the frequency distribution of the categories across projects. We found that Not Support bug reports make up the majority of the wontfix bug reports. Moreover, the frequency distribution of wontfix bug reports across the 12 categories is broadly similar for the three open source projects.

Blve: Should the Current Software Version Be Suitable for Release?
Wei Zheng, Xiaojun Chen, Manqing Zhang, Zhao Shi, Junzheng Chen, and Xiang Chen
(Northwestern Polytechnical University, China; Nantong University, China)
Recently, agile development has become a popular software development method, and many version iterations occur during agile development. It is very important to ensure the quality of each software version. However, in actual development, it is difficult to keep track of every stage or version of a large-scale software project. That means developers do not know exactly which version the current project corresponds to. At the same time, there are many necessary requirements for software release in actual development. When we know exactly which version the current project corresponds to, we can know whether the current software version meets the release requirements. Therefore, we need a good software version division method. This paper presents a novel software version division method, Blve, based on machine learning. We construct an accurate division model trained with the Support Vector Regression (SVR) method to divide software versions by processing the data commonly recorded in bug lists. Then, we process the results of the regression and use classification indicators for evaluation. In addition, we propose a slope-based approach to optimize the model, and this optimization can improve accuracy to about 95%.

