2018 IEEE 25th International Conference on Software Analysis, Evolution, and Reengineering (SANER),
March 20-23, 2018,
Campobasso, Italy
Technical Research Papers
Program Analysis
Wed, Mar 21, 10:30 - 11:30, Aula Magna
Context Is King: The Developer Perspective on the Usage of Static Analysis Tools
Carmine Vassallo, Sebastiano Panichella, Fabio Palomba, Sebastian Proksch, Andy Zaidman, and Harald C. Gall
(University of Zurich, Switzerland; Delft University of Technology, Netherlands)
Automatic static analysis tools (ASATs) support the automatic code quality evaluation of software systems with the aim of (i) avoiding and/or removing bugs and (ii) spotting design issues. Hindering their widespread acceptance are their (i) high false positive rates and (ii) the low comprehensibility of the generated warnings. Researchers and ASAT vendors have proposed solutions to prioritize such warnings with the aim of guiding developers toward the most severe ones. However, none of the proposed solutions considers the development context in which an ASAT is being used to further improve the selection of relevant warnings. To shed light on the impact of such contexts on warning configuration, usage, and prioritization strategies, we surveyed 42 developers (69% in industry and 31% in open source projects) and interviewed 11 industrial experts who integrate ASATs in their workflow. While we can confirm previous findings on the reluctance of developers to configure ASATs, our study highlights that (i) 71% of developers do pay attention to different warning categories depending on the development context, and (ii) 63% of our respondents rely on specific factors (e.g., team policies and composition) when prioritizing warnings to fix during their programming. Our results clearly indicate ways to better assist developers by improving existing warning selection and prioritization strategies.
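The survey's central finding, that warning relevance depends on development context, suggests prioritization schemes that re-weight warning categories per context. A minimal sketch of such a scheme follows; the contexts, categories, and weights are hypothetical illustrations, not values from the paper.

```python
# Hypothetical sketch: prioritize static-analysis warnings by development context.
# Contexts, categories, and weights are illustrative, not taken from the study.
CONTEXT_WEIGHTS = {
    "local-programming":       {"style": 1.0, "logic": 3.0, "security": 2.0},
    "code-review":             {"style": 2.0, "logic": 2.0, "security": 3.0},
    "continuous-integration":  {"style": 0.5, "logic": 3.0, "security": 3.0},
}

def prioritize(warnings, context):
    """Sort (category, message, severity) warnings by context-weighted severity."""
    weights = CONTEXT_WEIGHTS[context]
    return sorted(warnings,
                  key=lambda w: weights.get(w[0], 1.0) * w[2],
                  reverse=True)

warnings = [("style", "unused import", 1),
            ("security", "SQL built from user input", 3),
            ("logic", "possible null dereference", 2)]
ranked = prioritize(warnings, "continuous-integration")
```

In a CI context the security warning outranks the logic one, while the style warning drops to the bottom; switching the context string reorders the same list.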
@InProceedings{SANER18p38,
author = {Carmine Vassallo and Sebastiano Panichella and Fabio Palomba and Sebastian Proksch and Andy Zaidman and Harald C. Gall},
title = {Context Is King: The Developer Perspective on the Usage of Static Analysis Tools},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {38--49},
doi = {},
year = {2018},
}
Micro-clones in Evolving Software
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
Detection, tracking, and refactoring of code clones (i.e., identical or nearly similar code fragments in the code-base of a software system) have been extensively investigated by a great many studies. Code clones have often been considered bad smells. While clone refactoring is important for removing code clones from the code-base, clone tracking is important for consistently updating code clones that are not suitable for refactoring. In this research we investigate the importance of micro-clones (i.e., code clones of fewer than five lines of code) in consistent updating of the code-base. While existing clone detectors and trackers have ignored micro-clones, our investigation of thousands of commits from six subject systems implies that around 80% of all consistent updates during system evolution occur in micro-clones. The percentage of consistent updates occurring in micro-clones is significantly higher than that in regular clones according to our statistical significance tests. Also, the consistent updates occurring in micro-clones can be up to 23% of all updates during the whole period of evolution. According to our manual analysis, around 83% of the consistent updates in micro-clones are non-trivial. As micro-clones also require consistent updates like regular clones, tracking or refactoring micro-clones can considerably reduce the effort of consistently updating such clones. Thus, micro-clones should also be taken into proper consideration when making clone management decisions.
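As an illustration of what detecting micro-clones involves, the following sketch finds identical fragments of fewer than five (normalized) lines shared by two pieces of code. Real clone detectors also normalize identifiers and tolerate small edits, which this toy version does not.

```python
# Toy micro-clone detector: report identical line windows of length 1..4
# shared by two code fragments, after trivial whitespace normalization.
def windows(lines, n):
    return {tuple(lines[i:i + n]) for i in range(len(lines) - n + 1)}

def micro_clones(code_a, code_b, max_len=4):
    norm = lambda src: [l.strip() for l in src.splitlines() if l.strip()]
    a, b = norm(code_a), norm(code_b)
    clones = set()
    for n in range(1, max_len + 1):
        clones |= windows(a, n) & windows(b, n)   # windows present in both
    return clones

left = "x = load()\nif x is None:\n    return\nprocess(x)"
right = "y = fetch()\nif x is None:\n    return\nemit(y)"
found = micro_clones(left, right)
```

The two-line guard (`if x is None:` / `return`) is reported as a shared micro-clone even though the surrounding code differs, which is exactly the kind of fragment the paper argues needs consistent updating.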
@InProceedings{SANER18p50,
author = {Manishankar Mondal and Chanchal K. Roy and Kevin A. Schneider},
title = {Micro-clones in Evolving Software},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {50--60},
doi = {},
year = {2018},
}
Software Logging
Wed, Mar 21, 11:45 - 12:45, Aula Magna
SMARTLOG: Place Error Log Statement by Deep Understanding of Log Intention
Zhouyang Jia, Shanshan Li, Xiaodong Liu, Xiangke Liao, and Yunhuai Liu
(National University of Defense Technology, China; Peking University, China)
Failure-diagnosis logs can dramatically reduce the system recovery time when software systems fail. Log automation tools can assist developers in writing high-quality log code. Traditional log automation tools define log placement rules by extracting syntax features or summarizing code patterns. These approaches are, however, limited, since actual log placements go far beyond such rules and instead follow the intention of the software code. To overcome these limitations, we design and implement SmartLog, an intention-aware log automation tool. To describe the intention of log statements, we propose the Intention Description Model (IDM). SmartLog then explores the intention of existing logs and mines log rules from equivalent intentions. We conduct experiments on six real-world open-source projects. Experimental results show that SmartLog improves the accuracy of log placement by 43% and 16% compared with two state-of-the-art works. Of 86 real-world patches aimed at adding logs, 57% can be covered by SmartLog, while the overhead of all additional logs is less than 1%.
@InProceedings{SANER18p61,
author = {Zhouyang Jia and Shanshan Li and Xiaodong Liu and Xiangke Liao and Yunhuai Liu},
title = {SMARTLOG: Place Error Log Statement by Deep Understanding of Log Intention},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {61--71},
doi = {},
year = {2018},
}
Testing
Wed, Mar 21, 13:45 - 14:45, Aula Magna
Exploring the Integration of User Feedback in Automated Testing of Android Applications
Giovanni Grano, Adelina Ciurumelea, Sebastiano Panichella, Fabio Palomba, and Harald C. Gall
(University of Zurich, Switzerland)
The intense competition characterizing mobile application marketplaces forces developers to create and maintain high-quality mobile apps in order to ensure their commercial success and acquire new users. This motivated the research community to propose solutions that automate the testing process of mobile apps. However, the main problem of current testing tools is that they generate redundant and random inputs that are insufficient to properly simulate human behavior, thus leaving feature and crash bugs undetected until they are encountered by users. To cope with this problem, we conjecture that information available in user reviews---which previous work showed to be effective for maintenance and evolution problems---can be successfully exploited to identify the main issues users experience while using mobile applications, e.g., GUI problems and crashes.
In this paper we provide initial insights into this direction, investigating (i) what type of user feedback can actually be exploited for testing purposes, (ii) how complementary user feedback and automated testing tools are when detecting crash bugs or errors, and (iii) whether an automated system able to monitor crash-related information reported in user feedback is sufficiently accurate. Results of our study, involving 11,296 reviews of 8 mobile applications, show that user feedback can be exploited to provide contextual details about errors or exceptions detected by automated testing tools. Moreover, it also helps detect bugs that would remain uncovered when relying on testing tools alone. Finally, the accuracy of the proposed automated monitoring system demonstrates the feasibility of our vision, i.e., integrating user feedback into the testing process.
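The envisioned monitoring system must first recognize crash-related feedback among arbitrary reviews. A deliberately simple keyword-based sketch of that step is shown below; the patterns are invented and much cruder than the classifier the study evaluates.

```python
# Hypothetical keyword-based filter for crash-related user reviews.
import re

CRASH_PATTERNS = [r"\bcrash(es|ed|ing)?\b",
                  r"\bforce[- ]?close",
                  r"\bfreez(e|es|ing)\b",
                  r"\bexception\b"]

def is_crash_related(review):
    """True if the review text matches any crash-related pattern."""
    text = review.lower()
    return any(re.search(p, text) for p in CRASH_PATTERNS)

reviews = ["App crashes every time I open the camera",
           "Love the new dark theme!",
           "Force close on startup after the update"]
crash_reports = [r for r in reviews if is_crash_related(r)]
```

The flagged reviews would then be cross-referenced with stack traces from automated test runs to add the contextual detail the paper describes.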
@InProceedings{SANER18p72,
author = {Giovanni Grano and Adelina Ciurumelea and Sebastiano Panichella and Fabio Palomba and Harald C. Gall},
title = {Exploring the Integration of User Feedback in Automated Testing of Android Applications},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {72--83},
doi = {},
year = {2018},
}
Structured Random Differential Testing of Instruction Decoders
Nathan Jay and Barton P. Miller
(University of Wisconsin-Madison, USA)
Decoding binary executable files is a critical facility for software analysis, including debugging, performance monitoring, malware detection, cyber forensics, and sandboxing, among other techniques. As a foundational capability, binary decoding must be consistently correct for the techniques that rely on it to be viable. Unfortunately, modern instruction sets are huge and their encodings are complex; as a result, modern binary decoders are buggy. In this paper, we present a testing methodology that automatically infers structural information for an instruction set and uses the inferred structure to efficiently generate structured-random test cases independent of the instruction set being tested. Our testing methodology includes automatic output verification using differential analysis and reassembly to generate error reports. This testing methodology requires little instruction-set-specific knowledge, allowing rapid testing of decoders for new architectures and extensions to existing ones. We have implemented our testing procedure in a tool named Fleece and used it to test multiple binary decoders (Intel XED, libopcodes, LLVM, Dyninst, and Capstone) on multiple architectures (x86, ARM, and PowerPC). Our testing efficiently covered thousands of instruction format variations for each instruction set and uncovered decoding bugs in every decoder we tested.
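The core differential idea, decoding the same bytes with several decoders and flagging disagreements, can be sketched as follows. The two decoders here are hypothetical stubs (one with a deliberate bug); Fleece drives real decoders such as XED or LLVM, and generates structured-random rather than exhaustive inputs.

```python
# Differential testing sketch: run every input through all decoders and
# record inputs on which their outputs disagree (candidate decoder bugs).
def decoder_a(data):
    # Stub decoder: treats 0x90 as "nop", everything else as unknown.
    return "nop" if data[0] == 0x90 else "unknown"

def decoder_b(data):
    # Stub decoder with a deliberate bug: also accepts 0x91 as "nop".
    return "nop" if data[0] in (0x90, 0x91) else "unknown"

def differential_test(decoders):
    disagreements = []
    for value in range(256):        # exhaustive one-byte inputs; Fleece
        data = bytes([value])       # uses structured-random generation
        outputs = {d(data) for d in decoders}
        if len(outputs) > 1:        # decoders disagree: report for triage
            disagreements.append(data)
    return disagreements

bugs = differential_test([decoder_a, decoder_b])
```

The harness needs no knowledge of which decoder is right; disagreement alone is enough to produce an error report for manual or reassembly-based verification.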
@InProceedings{SANER18p84,
author = {Nathan Jay and Barton P. Miller},
title = {Structured Random Differential Testing of Instruction Decoders},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {84--94},
doi = {},
year = {2018},
}
Clustering Support for Inadequate Test Suite Reduction
Carmen Coviello, Simone Romano, Giuseppe Scanniello, Alessandro Marchetto, Giuliano Antoniol, and Anna Corazza
(University of Basilicata, Italy; Polytechnique Montréal, Canada; Federico II University of Naples, Italy)
Regression testing is an important activity that can be expensive (e.g., for large test suites). Test suite reduction approaches speed up regression testing by removing redundant test cases. These approaches can be classified as adequate or inadequate. Adequate approaches reduce test suites so that they completely preserve the test requirements (e.g., code coverage) of the original test suites. Inadequate approaches produce reduced test suites that only partially preserve the test requirements. An inadequate approach is appealing when it leads to a greater reduction in test suite size at the expense of a small loss in fault-detection capability. We investigate a clustering-based approach for inadequate test suite reduction and compare it with well-known adequate approaches. Our investigation is founded on a public dataset and allows an exploration of trade-offs in test suite reduction. Together with the guidelines defined in this research, our results support a more informed decision when balancing size, coverage, and fault-detection loss of reduced test suites when using clustering.
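To illustrate the clustering idea, the sketch below groups test cases by the similarity of their coverage sets and keeps one representative per cluster, yielding an inadequate (coverage-losing) reduction. The greedy clustering and Jaccard threshold are simplifications chosen for brevity, not the paper's exact method.

```python
# Clustering-based (inadequate) test suite reduction: tests with similar
# coverage are assumed redundant, so one representative per cluster is kept.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def reduce_suite(coverage, threshold=0.5):
    """coverage: test name -> set of covered requirements (e.g., line ids)."""
    clusters = []
    for test, cov in coverage.items():
        for cluster in clusters:
            # Greedy: join the first cluster whose seed test is similar enough.
            if jaccard(cov, coverage[cluster[0]]) >= threshold:
                cluster.append(test)
                break
        else:
            clusters.append([test])
    return [cluster[0] for cluster in clusters]   # one representative each

suite = {"t1": {1, 2, 3}, "t2": {1, 2, 4}, "t3": {7, 8}, "t4": {7, 8, 9}}
reduced = reduce_suite(suite)
```

Here `t2` and `t4` are dropped as near-duplicates of `t1` and `t3`; requirement 4 is no longer covered, which is precisely the coverage-versus-size trade-off the study explores.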
@InProceedings{SANER18p95,
author = {Carmen Coviello and Simone Romano and Giuseppe Scanniello and Alessandro Marchetto and Giuliano Antoniol and Anna Corazza},
title = {Clustering Support for Inadequate Test Suite Reduction},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {95--105},
doi = {},
year = {2018},
}
Program Repair
Wed, Mar 21, 15:00 - 16:00, Aula Magna
Automatically Repairing Dependency-Related Build Breakage
Christian Macho, Shane McIntosh, and Martin Pinzger
(University of Klagenfurt, Austria; McGill University, Canada)
Build systems are widely used in today’s software projects to automate integration and build processes. Similar to source code, build specifications need to be maintained to avoid outdated specifications and, as a consequence, build breakage. Recent work indicates that neglected build maintenance is one of the most frequently occurring reasons why open source and proprietary builds break. In this paper, we propose BuildMedic, an approach to automatically repair Maven builds that break due to dependency-related issues. Based on a manual investigation of 37 broken Maven builds in 23 open source Java projects, we derive three repair strategies to automatically repair the build, namely Version Update, Delete Dependency, and Add Repository. We evaluate the three strategies on 84 additional broken builds from the 23 studied projects in order to demonstrate the applicability of our approach. The evaluation shows that BuildMedic can automatically repair 45 of these broken builds (54%). Furthermore, in 36% of the successfully repaired build breakages, BuildMedic outputs at least one repair candidate that is considered a correct repair. Moreover, 76% of them could be repaired with only a single dependency correction.
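The three strategies can be illustrated on a toy dependency model: try each repair in turn and keep the first one that makes the build pass. The data model and breakage predicate below are invented; BuildMedic operates on real Maven POMs.

```python
# Toy illustration of the three repair strategies (Version Update,
# Delete Dependency, Add Repository) applied in order until the build passes.
def try_repairs(build, breaks):
    """Return the first strategy whose application makes the build pass."""
    strategies = [
        ("version update", lambda b: {**b, "version": b["candidates"][0]}),
        ("delete dependency", lambda b: {**b, "deleted": True}),
        ("add repository", lambda b: {**b, "repos": b["repos"] + ["central"]}),
    ]
    for name, apply_fix in strategies:
        candidate = apply_fix(build)
        if not breaks(candidate):
            return name, candidate
    return None, build

# Invented breakage: the pinned version is unresolvable unless updated.
build = {"artifact": "example:lib", "version": "1.0",
         "candidates": ["1.1"], "repos": []}
breaks = lambda b: (not b.get("deleted")) and b["version"] == "1.0"
strategy, fixed = try_repairs(build, breaks)
```

Ordering matters: updating the version is attempted before the more destructive deletion, mirroring the intuition that the least invasive repair should win when several would fix the build.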
@InProceedings{SANER18p106,
author = {Christian Macho and Shane McIntosh and Martin Pinzger},
title = {Automatically Repairing Dependency-Related Build Breakage},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {106--117},
doi = {},
year = {2018},
}
Mining StackOverflow for Program Repair
Xuliang Liu and Hao Zhong
(Shanghai Jiao Tong University, China)
In recent years, automatic program repair has been a hot research topic in the software engineering community, and many approaches have been proposed. Although these approaches produce promising results, some researchers criticize that existing approaches are still limited in their repair capability, due to their limited repair templates. Indeed, it is quite difficult to design effective repair templates. An award-winning paper analyzes thousands of manual bug fixes but summarizes only ten repair templates. Although more bugs are thus repaired, recent studies show that such repair templates are still insufficient.
We notice that programmers often refer to Stack Overflow when they repair bugs. With years of accumulation, Stack Overflow has millions of posts that are potentially useful for repairing many bugs. This observation motivates our work toward mining repair templates from Stack Overflow. In this paper, we propose a novel approach, called SOFIX, that extracts code samples from Stack Overflow and mines repair patterns from the extracted code samples. Based on our mined repair patterns, we derived 13 repair templates. We implemented these repair templates in SOFIX and conducted evaluations on the widely used benchmark Defects4J. Our results show that SOFIX repaired 23 bugs, more than existing approaches. After comparing repaired bugs and templates, we find that SOFIX repairs more bugs because it has more repair templates. In addition, our results also reveal the urgent need for better fault localization techniques.
@InProceedings{SANER18p118,
author = {Xuliang Liu and Hao Zhong},
title = {Mining StackOverflow for Program Repair},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {118--129},
doi = {},
year = {2018},
}
Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J
Victor Sobreira, Thomas Durieux, Fernanda Madeiral, Martin Monperrus, and Marcelo de Almeida Maia
(Federal University of Uberlândia, Brazil; Inria, France; University of Lille, France; KTH, Sweden)
Well-designed and publicly available datasets of bugs are an invaluable asset to advance research fields such as fault localization and program repair, as they allow direct and fair comparison between competing techniques as well as the replication of experiments. These datasets need to be deeply understood by researchers: the answers to questions like “which bugs can my technique handle?” and “for which bugs is my technique effective?” depend on the comprehension of properties related to bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we deeply study 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually extracted using a thematic-analysis-based approach. We found that 1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only additions of lines; 2) 92% of the patches change only one file, and 38% have no spreading at all; 3) the top-3 most applied repair actions are additions of method calls, conditionals, and assignments, occurring in 77% of the patches; and 4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, is on conditional blocks. These results are useful for researchers to perform advanced analysis on their techniques’ results based on Defects4J. Moreover, our set of properties can be used to characterize and compare different bug datasets.
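Two of the quantitative properties, patch size (changed lines) and spreading (number of change chunks), can be computed directly from a unified diff, as the sketch below shows; the paper's exact definitions are somewhat richer.

```python
# Compute patch size (added/removed lines) and a simple spreading measure
# (number of contiguous change chunks) from a unified diff.
def patch_properties(diff_text):
    size, chunks, in_change = 0, 0, False
    for line in diff_text.splitlines():
        changed = (line.startswith("+") and not line.startswith("+++")) or \
                  (line.startswith("-") and not line.startswith("---"))
        if changed:
            size += 1
            if not in_change:       # a new chunk starts here
                chunks += 1
            in_change = True
        else:
            in_change = False
    return {"size": size, "chunks": chunks}

diff = """--- a/Foo.java
+++ b/Foo.java
@@ -1,4 +1,4 @@
 int x = 0;
-if (x > 0) {
+if (x >= 0) {
     use(x);
+    log(x);
"""
props = patch_properties(diff)
```

This toy patch changes three lines in two separate chunks of one file, the kind of raw numbers from which the paper's size and spreading statistics are aggregated.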
@InProceedings{SANER18p130,
author = {Victor Sobreira and Thomas Durieux and Fernanda Madeiral and Martin Monperrus and Marcelo de Almeida Maia},
title = {Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {130--140},
doi = {},
year = {2018},
}
Mobile Development
Wed, Mar 21, 16:30 - 17:30, Aula Magna
Detecting Third-Party Libraries in Android Applications with High Precision and Recall
Yuan Zhang, Jiarun Dai, Xiaohan Zhang, Sirong Huang, Zhemin Yang, Min Yang, and Hao Chen
(Fudan University, China; Shanghai Institute of Intelligent Electronics and Systems, China; Shanghai Institute for Advanced Communication and Data Science, China; University of California at Davis, USA)
Third-party libraries are widely used in Android applications to ease development and enhance functionalities. However, the incorporated libraries also bring new security and privacy issues to the host application, and blur the accounting of application code versus library code. In this situation, a precise and reliable library detector is highly desirable. In practice, library code may be customized by developers during integration, and dead library code may be eliminated by code obfuscators during the application build process. Existing research on library detection has not gracefully handled these problems, and thus faces severe limitations in practice.
In this paper, we propose LibPecker, an obfuscation-resilient, highly precise and reliable library detector for Android applications. LibPecker adopts signature matching to give a similarity score between a given library and an application. By fully utilizing the internal class dependencies inside a library, LibPecker generates a strict signature for each class. To tolerate library code customization and elimination as much as possible, LibPecker introduces an adaptive class similarity threshold and a weighted class similarity score when calculating library similarity. To quantitatively evaluate the precision and recall of LibPecker, we perform the first such experiment (to the best of our knowledge) with a large number of libraries and applications. Results show that LibPecker significantly outperforms the state-of-the-art tool in both recall and precision (91% and 98.1%, respectively).
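The weighted-similarity idea can be sketched as follows: match per-class signatures and weight each matched class by its size when scoring a candidate library against an app. The string signatures and weights below are placeholders; LibPecker derives strict signatures from internal class dependencies.

```python
# Sketch of weighted class-similarity scoring for library detection.
# Signatures are plain strings here purely for illustration.
def class_similarity(sig_a, sig_b):
    """Crude member-overlap similarity between two class signatures."""
    a, b = set(sig_a.split()), set(sig_b.split())
    return len(a & b) / len(a | b) if a | b else 1.0

def library_similarity(lib_classes, app_classes, threshold=0.8):
    """lib_classes/app_classes: class name -> (signature, weight)."""
    matched = total = 0.0
    for name, (sig, weight) in lib_classes.items():
        total += weight
        app = app_classes.get(name)
        if app is not None and class_similarity(sig, app[0]) >= threshold:
            matched += weight            # larger classes count for more
    return matched / total if total else 0.0

lib = {"A": ("init read close", 3.0), "B": ("helper", 1.0)}
app = {"A": ("init read close", 3.0), "B": ("renamed", 1.0)}
score = library_similarity(lib, app)
```

Class `B` fails to match (simulating customization or obfuscation), so the score degrades gracefully to 0.75 instead of dropping to zero, which is the tolerance property the paper's adaptive thresholding aims for.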
@InProceedings{SANER18p141,
author = {Yuan Zhang and Jiarun Dai and Xiaohan Zhang and Sirong Huang and Zhemin Yang and Min Yang and Hao Chen},
title = {Detecting Third-Party Libraries in Android Applications with High Precision and Recall},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {141--152},
doi = {},
year = {2018},
}
Software Quality
Thu, Mar 22, 10:30 - 11:30, Aula Magna
How Do Developers Fix Issues and Pay Back Technical Debt in the Apache Ecosystem?
Georgios Digkas, Mircea Lungu, Paris Avgeriou, Alexander Chatzigeorgiou, and Apostolos Ampatzoglou
(University of Groningen, Netherlands; University of Macedonia, Greece)
During software evolution, technical debt (TD) follows a constant ebb and flow, being incurred and paid back, sometimes in the same day and sometimes ten years later. There have been several studies in the literature investigating how technical debt in source code accumulates over time and the consequences of this accumulation for software maintenance. However, to the best of our knowledge there are no large-scale studies that focus on the types of issues that are fixed and the amount of TD that is paid back during software evolution. In this paper we present the results of a case study, in which we analyzed the evolution of fifty-seven Java open-source software projects of the Apache Software Foundation at the temporal granularity of weekly snapshots. In particular, we focus on the amount of technical debt that is paid back and the types of issues that are fixed. The findings reveal that a small subset of all issue types is responsible for the largest percentage of TD repayment; thus, by targeting particular violations, a development team can achieve higher benefits.
@InProceedings{SANER18p153,
author = {Georgios Digkas and Mircea Lungu and Paris Avgeriou and Alexander Chatzigeorgiou and Apostolos Ampatzoglou},
title = {How Do Developers Fix Issues and Pay Back Technical Debt in the Apache Ecosystem?},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {153--163},
doi = {},
year = {2018},
}
How Good Is Your Puppet? An Empirically Defined and Validated Quality Model for Puppet
Eduard van der Bent, Jurriaan Hage, Joost Visser, and Georgios Gousios
(Utrecht University, Netherlands; Software Improvement Group, Netherlands; Delft University of Technology, Netherlands)
Puppet is a declarative language for configuration management that has rapidly gained popularity in recent years. Numerous organizations now rely on Puppet code for deploying their software systems onto cloud infrastructures. In this paper we provide a definition of code quality for Puppet code and an automated technique for measuring and rating Puppet code quality. To this end, we first explore the notion of code quality as it applies to Puppet code by performing a survey among Puppet developers. Second, we develop a measurement model for the maintainability aspect of Puppet code quality. To arrive at this measurement model, we derive appropriate quality metrics from our survey results and from existing software quality models. We implemented the Puppet code quality model in a software analysis tool. We validate our definition of Puppet code quality and the measurement model by a structured interview with Puppet experts and by comparing the tool results with quality judgments of those experts. The validation shows that the measurement model and tool provide quality judgments of Puppet code that closely match the judgments of experts. Also, the experts deem the model appropriate and usable in practice. The Software Improvement Group (SIG) has started using the model in its consultancy practice.
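A metrics-and-thresholds quality rating of the general kind described, measure a few maintainability metrics, count threshold violations, and aggregate to a rating, might be sketched as below. The metric names and thresholds are invented, not those of the validated model.

```python
# Hypothetical metrics-based quality rating sketch: count how many
# maintainability thresholds a Puppet file exceeds and map to 1..4 stars.
THRESHOLDS = {
    "file_length": 200,      # lines per manifest (invented limit)
    "complexity": 10,        # decision points (invented limit)
    "exec_resources": 2,     # shell-out exec resources (invented limit)
}

def rate(measurements):
    """measurements: metric -> observed value. Returns a 1..4 star rating."""
    violations = sum(1 for metric, limit in THRESHOLDS.items()
                     if measurements.get(metric, 0) > limit)
    return max(1, 4 - violations)

stars = rate({"file_length": 250, "complexity": 4, "exec_resources": 1})
```

One violated threshold (file length) costs one star; the real model instead derives metrics and weights empirically from the survey and expert validation.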
@InProceedings{SANER18p164,
author = {Eduard van der Bent and Jurriaan Hage and Joost Visser and Georgios Gousios},
title = {How Good Is Your Puppet? An Empirically Defined and Validated Quality Model for Puppet},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {164--174},
doi = {},
year = {2018},
}
Behavior and Runtime Analysis
Thu, Mar 22, 10:30 - 11:30, Room 2
Maintaining Behaviour Driven Development Specifications: Challenges and Opportunities
Leonard Peter Binamungu, Suzanne M. Embury, and Nikolaos Konstantinou
(University of Manchester, UK)
In Behaviour-Driven Development (BDD) the behaviour of a software system is specified as a set of example interactions with the system using a "Given-When-Then" structure. These examples are expressed in high level domain-specific terms, and are executable. They thus act both as a specification of requirements and as tests that can verify whether the current system implementation provides the desired behaviour or not. This approach has many advantages but also presents some problems. When the number of examples grows, BDD specifications can become costly to maintain and extend. Some teams find that parts of the system are effectively frozen due to the challenges of finding and modifying the examples associated with them. We surveyed 75 BDD practitioners from 26 countries to understand the extent of BDD use, its benefits and challenges, and specifically the challenges of maintaining BDD specifications in practice. We found that BDD is in active use amongst respondents, and that the use of domain specific terms, improving communication among stakeholders, the executable nature of BDD specifications, and facilitating comprehension of code intentions are the main benefits of BDD. The results also showed that BDD specifications suffer the same maintenance challenges found in automated test suites more generally. We map the survey results to the literature, and propose 10 research opportunities in this area.
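For readers unfamiliar with BDD, the executable nature of Given-When-Then examples comes from binding each step phrase to code via pattern matching, roughly as in this sketch; the step table and scenario are invented, in the style of frameworks such as Cucumber.

```python
# Minimal Given-When-Then runner: steps are bound to functions by regex.
import re

STEP_DEFINITIONS = []

def step(pattern):
    def register(fn):
        STEP_DEFINITIONS.append((re.compile(pattern), fn))
        return fn
    return register

@step(r"Given a balance of (\d+)")
def given_balance(ctx, amount): ctx["balance"] = int(amount)

@step(r"When I withdraw (\d+)")
def when_withdraw(ctx, amount): ctx["balance"] -= int(amount)

@step(r"Then the balance is (\d+)")
def then_balance(ctx, amount): assert ctx["balance"] == int(amount)

def run_scenario(lines):
    ctx = {}
    for line in lines:
        for pattern, fn in STEP_DEFINITIONS:
            m = pattern.fullmatch(line)
            if m:
                fn(ctx, *m.groups())   # bound step executes against ctx
                break
        else:
            raise ValueError(f"unmatched step: {line}")
    return ctx

ctx = run_scenario(["Given a balance of 100",
                    "When I withdraw 30",
                    "Then the balance is 70"])
```

The maintenance problem the paper surveys arises exactly here: as scenarios multiply, finding which step definitions a change affects (and which examples overlap) becomes costly.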
@InProceedings{SANER18p175,
author = {Leonard Peter Binamungu and Suzanne M. Embury and Nikolaos Konstantinou},
title = {Maintaining Behaviour Driven Development Specifications: Challenges and Opportunities},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {175--184},
doi = {},
year = {2018},
}
Recursion Aware Modeling and Discovery for Hierarchical Software Event Log Analysis
Maikel Leemans, Wil M. P. van der Aalst, and Mark G. J. van den Brand
(Eindhoven University of Technology, Netherlands)
This paper presents 1) a novel hierarchy and recursion extension to the process tree model and 2) the first recursion-aware process model discovery technique that leverages hierarchical information in event logs, typically available for software systems. This technique allows us to analyze the operational processes of software systems under real-life conditions at multiple levels of granularity. The work can be positioned in between reverse engineering and process mining.
An implementation of the proposed approach is available as a ProM plugin. Experimental results based on real-life (software) event logs demonstrate the feasibility and usefulness of the approach and show the huge potential to speed up discovery by exploiting the available hierarchy.
@InProceedings{SANER18p185,
author = {Maikel Leemans and Wil M. P. van der Aalst and Mark G. J. van den Brand},
title = {Recursion Aware Modeling and Discovery for Hierarchical Software Event Log Analysis},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {185--196},
doi = {},
year = {2018},
}
Design Analysis
Thu, Mar 22, 11:45 - 12:45, Aula Magna
Automatically Exploiting Implicit Design Knowledge When Solving the Class Responsibility Assignment Problem
Yongrui Xu, Peng Liang, and Muhammad Ali Babar
(Wuhan University, China; University of Adelaide, Australia)
Assigning responsibilities to classes is vital not only during the initial software analysis/design phases in object-oriented analysis and design (OOAD), but also during maintenance and evolution, when new responsibilities have to be assigned to classes or existing responsibilities have to be changed. Class Responsibility Assignment (CRA) is one of the most complex tasks in OOAD, as it heavily relies on designers’ judgment and implicit design knowledge (DK) about design problems. Since CRA is highly dependent on the successful use of implicit DK, (semi-)automated approaches that help designers assign responsibilities to classes should make implicit DK explicit and exploit it effectively. In this paper, we propose a learning-based approach to the CRA problem. A learning mechanism is introduced into a Genetic Algorithm (GA) to extract the implicit DK about which responsibilities have a high probability of being assigned to the same class; the extracted DK is then employed automatically to improve the design quality of the generated solutions. The proposed approach has been evaluated through an experimental study with three cases. Comparing the solutions obtained from the proposed approach with those of existing approaches shows that it significantly improves the design quality of the generated solutions to the CRA problem, and that its solutions are more likely to be accepted by developers from a practical perspective.
@InProceedings{SANER18p197,
author = {Yongrui Xu and Peng Liang and Muhammad Ali Babar},
title = {Automatically Exploiting Implicit Design Knowledge When Solving the Class Responsibility Assignment Problem},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {197--208},
doi = {},
year = {2018},
}
Defect Prediction
Thu, Mar 22, 11:45 - 12:45, Room 2
Cross-Version Defect Prediction via Hybrid Active Learning with Kernel Principal Component Analysis
Zhou Xu, Jin Liu, Xiapu Luo, and Tao Zhang
(Wuhan University, China; Hong Kong Polytechnic University, China; Harbin Engineering University, China)
As defects in software modules may cause product failure and financial loss, it is critical to utilize defect prediction methods to effectively identify potentially defective modules for a thorough inspection, especially in the early stages of the software development lifecycle. For an upcoming version of a software project, it is practical to employ the historical labeled defect data of prior versions within the same project to conduct defect prediction on the current version, i.e., Cross-Version Defect Prediction (CVDP). However, software development is a dynamic evolution process that may cause the data distribution (such as defect characteristics) to vary across versions. Furthermore, the raw features usually do not well reveal the intrinsic structure information behind the data. Therefore, it is challenging to perform effective CVDP. In this paper, we propose a two-phase CVDP framework that combines Hybrid Active Learning and Kernel PCA (HALKP) to address these two issues. In the first stage, HALKP uses a hybrid active learning method to select some informative and representative unlabeled modules from the current version for querying their labels, then merges them into the labeled modules of the prior version to form an enhanced training set. In the second stage, HALKP employs a non-linear mapping method, kernel PCA, to extract representative features by embedding the original data of the two versions into a high-dimensional space. We evaluate the HALKP framework on 31 versions of 10 projects with three prevalent performance indicators. The experimental results indicate that HALKP achieves encouraging results with average F-measure, g-mean, and Balance of 0.480, 0.592, and 0.580, respectively, and significantly outperforms nearly all baseline methods.
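The active-learning half of the first stage can be illustrated by a simple scoring rule that prefers modules that are both uncertain (predicted defect probability near 0.5) and representative (dense in the unlabeled pool). The scoring below is a hypothetical simplification of the paper's hybrid criteria.

```python
# Sketch of hybrid active learning: rank unlabeled modules by a product of
# uncertainty and representativeness, then query labels for the top k.
def select_for_labeling(candidates, k=2):
    """candidates: module name -> (predicted_defect_prob, density)."""
    def score(name):
        prob, density = candidates[name]
        uncertainty = 1.0 - abs(prob - 0.5) * 2   # 1 at p=0.5, 0 at p in {0,1}
        return uncertainty * density
    return sorted(candidates, key=score, reverse=True)[:k]

modules = {"m1": (0.51, 0.9),   # very uncertain, very representative
           "m2": (0.95, 0.8),   # confident prediction: not worth querying
           "m3": (0.45, 0.4),   # uncertain but isolated
           "m4": (0.50, 0.7)}
queried = select_for_labeling(modules)
```

The queried modules would be labeled and merged into the prior version's training set before the kernel-PCA feature extraction of the second stage.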
@InProceedings{SANER18p209,
author = {Zhou Xu and Jin Liu and Xiapu Luo and Tao Zhang},
title = {Cross-Version Defect Prediction via Hybrid Active Learning with Kernel Principal Component Analysis},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {209--220},
doi = {},
year = {2018},
}
Using a Probabilistic Model to Predict Bug Fixes
Mauricio Soto and Claire Le Goues
(Carnegie Mellon University, USA)
Automatic Program Repair (APR) has significant potential to reduce software maintenance costs by reducing the human effort required to localize and fix bugs. State-of-the-art generate-and-validate APR techniques select between and instantiate various mutation operators to construct candidate patches, informed largely by heuristic probability distributions. This may reduce effectiveness in terms of both efficiency and output quality. In practice, human developers have many options in terms of how to edit code to fix bugs, some of which are far more common than others (e.g., deleting a line of code is more common than adding a new class). We mined the most recent 100 bug-fixing commits from each of the 500 most popular Java projects in GitHub (the largest dataset to date) to create a probabilistic model describing edit distributions. We categorize, compare and evaluate the different mutation operators used in state-of-the-art approaches. We find that a probabilistic model-based APR approach patches bugs more quickly in the majority of bugs studied, and that the resulting patches are of higher quality than those produced by previous approaches. Finally, we mine association rules for multi-edit source code changes, an understudied but important problem. We validate the association rules by analyzing how much of our corpus can be built from them. Our evaluation indicates that 84.6% of the multi-edit patches from the corpus can be built from the association rules, while maintaining 90% confidence.
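The association-rule mining over multi-edit changes can be pictured with a small sketch. This is a toy single-antecedent miner in Python; the function name `mine_rules`, the encoding of commits as sets of edit-operation labels, and the thresholds are illustrative assumptions, not the paper's setup.

```python
from itertools import combinations
from collections import Counter

def mine_rules(transactions, min_support=2, min_conf=0.9):
    """Mine single-antecedent association rules (A -> B) between edit
    operations that co-occur in multi-edit bug-fixing commits.
    Each transaction is the set of edit operations in one commit."""
    pair_counts = Counter()
    item_counts = Counter()
    for ops in transactions:
        for op in set(ops):
            item_counts[op] += 1
        for a, b in combinations(sorted(set(ops)), 2):
            pair_counts[(a, b)] += 1
    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_support:
            continue
        # Confidence of A -> B is support(A, B) / support(A).
        if n / item_counts[a] >= min_conf:
            rules.append((a, b, n / item_counts[a]))
        if n / item_counts[b] >= min_conf:
            rules.append((b, a, n / item_counts[b]))
    return rules
```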
@InProceedings{SANER18p221,
author = {Mauricio Soto and Claire Le Goues},
title = {Using a Probabilistic Model to Predict Bug Fixes},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {221--231},
doi = {},
year = {2018},
}
Connecting Software Metrics across Versions to Predict Defects
Yibin Liu, Yanhui Li, Jianbo Guo, Yuming Zhou, and Baowen Xu
(Nanjing University, China; Tsinghua University, China)
Accurate software defect prediction could help software practitioners allocate test resources to defect-prone modules effectively and efficiently. In the last decades, much effort has been devoted to build accurate defect prediction models, including developing quality defect predictors and modeling techniques. However, current widely used defect predictors such as code metrics and process metrics could not well describe how software modules change over the project evolution, which we believe is important for defect prediction. In order to deal with this problem, in this paper, we propose to use the Historical Version Sequence of Metrics (HVSM) in continuous software versions as defect predictors. Furthermore, we leverage Recurrent Neural Network (RNN), a popular modeling technique, to take HVSM as the input to build software prediction models. The experimental results show that, in most cases, the proposed HVSM-based RNN model has significantly better effort-aware ranking effectiveness than the commonly used baseline models.
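As a rough illustration of feeding a Historical Version Sequence of Metrics into a recurrent model, the forward pass of a minimal vanilla RNN can be written in plain Python. The weights and the sigmoid readout below are placeholders for illustration only; the paper's actual architecture and training procedure are not reproduced here.

```python
import math

def rnn_score(metric_seq, Wx, Wh, w_out, b=0.0):
    """Minimal vanilla-RNN forward pass over a module's Historical
    Version Sequence of Metrics (one metric vector per version).
    Wx maps the current metric vector, Wh the previous hidden state."""
    h = [0.0] * len(Wh)
    for x in metric_seq:
        h = [math.tanh(sum(Wx[i][j] * x[j] for j in range(len(x)))
                       + sum(Wh[i][k] * h[k] for k in range(len(h))))
             for i in range(len(Wh))]
    # Sigmoid readout turns the final hidden state into a defect score.
    z = sum(w * hi for w, hi in zip(w_out, h)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

With all-zero weights the score is a neutral 0.5; a trained model would learn weights that push the score toward 1 for version histories resembling past defective modules.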
@InProceedings{SANER18p232,
author = {Yibin Liu and Yanhui Li and Jianbo Guo and Yuming Zhou and Baowen Xu},
title = {Connecting Software Metrics across Versions to Predict Defects},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {232--243},
doi = {},
year = {2018},
}
APIs
Thu, Mar 22, 13:45 - 14:45, Aula Magna
Classifying Stack Overflow Posts on API Issues
Md Ahasanuzzaman,
Muhammad Asaduzzaman, Chanchal K. Roy, and Kevin A. Schneider
(Queen's University, Canada; University of Saskatchewan, Canada)
The design and maintenance of APIs are complex tasks due to the constantly changing requirements of their users. Despite the efforts of their designers, APIs may suffer from a number of issues (such as incomplete or erroneous documentation, poor performance, and backward incompatibility). To maintain a healthy client base, API designers must learn about these issues in order to fix them. Question answering sites, such as Stack Overflow (SO), have become a popular place for discussing API issues. These posts about API issues are invaluable to API designers, not only because they can help in understanding the problem but also because they can reveal the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make detecting SO posts concerning API issues a challenging task. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We use this information together with different features of posts and the experience of users to build a technique, called CAPS, that can classify SO posts concerning API issues. Evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches we consider in this study. We also conduct studies to test the generalizability of CAPS results and to understand the effects of different sources of information on it.
@InProceedings{SANER18p244,
author = {Md Ahasanuzzaman and Muhammad Asaduzzaman and Chanchal K. Roy and Kevin A. Schneider},
title = {Classifying Stack Overflow Posts on API Issues},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {244--254},
doi = {},
year = {2018},
}
Why and How Java Developers Break APIs
Aline Brito, Laerte Xavier,
Andre Hora, and
Marco Tulio Valente
(Federal University of Minas Gerais, Brazil; Federal University of Mato Grosso do Sul, Brazil)
Modern software development depends on APIs to reuse code and increase productivity. Like most software systems, these libraries and frameworks also evolve, which may break existing clients. However, the main reasons for introducing breaking changes in APIs are unclear. Therefore, in this paper, we report the results of an almost 4-month-long field study with the developers of 400 popular Java libraries and frameworks. We configured an infrastructure to observe all changes in these libraries and to detect breaking changes shortly after their introduction in the code. After identifying breaking changes, we asked the developers to explain the reasons behind their decision to change the APIs. During the study, we identified 59 breaking changes, confirmed by the developers of 19 projects. By analyzing the developers' answers, we report that breaking changes are mostly motivated by the need to implement new features, by the desire to make the APIs simpler and with fewer elements, and by the need to improve maintainability. We conclude by providing suggestions to language designers, tool builders, software engineering researchers and API developers.
@InProceedings{SANER18p255,
author = {Aline Brito and Laerte Xavier and Andre Hora and Marco Tulio Valente},
title = {Why and How Java Developers Break APIs},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {255--265},
doi = {},
year = {2018},
}
Mining Accurate Message Formats for Service APIs
Md Arafat Hossain, Steve Versteeg, Jun Han, Muhammad Ashad Kabir, Jiaojiao Jiang, and Jean-Guy Schneider
(Swinburne University of Technology, Australia; CA Technologies, Australia)
APIs play a significant role in the sharing, utilization and integration of information and service assets for enterprises, delivering significant business value. However, the documentation of service APIs can often be incomplete, ambiguous, or even nonexistent, hindering API-based application development efforts. In this paper, we introduce an approach to automatically mine the fine-grained message formats required in defining the APIs of services and applications from their interaction traces, without assuming any prior knowledge. Our approach includes three major steps with corresponding techniques: (1) classifying the interaction messages of a service into clusters corresponding to message types, (2) identifying the keywords of messages in each cluster, and (3) extracting the format of each message type. We have applied our approach to network traces collected from four real services which used the following application protocols: REST, SOAP, LDAP and SIP. The results show that our approach achieves much greater accuracy in extracting message formats for service APIs than current state-of-the-art approaches.
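Step (3), extracting a format from a cluster of messages, can be pictured as keeping the tokens shared by every message of a type and replacing the varying positions with wildcards. The sketch below is a deliberately naive whitespace-tokenized version for illustration; the paper's technique is considerably more refined.

```python
def common_format(messages, wildcard="<*>"):
    """Derive a message-type format from a cluster of similar messages:
    positions where all messages agree become literal keywords, positions
    that vary become wildcard (parameter) slots."""
    split = [m.split() for m in messages]
    length = min(len(t) for t in split)
    out = []
    for i in range(length):
        vals = {t[i] for t in split}
        out.append(vals.pop() if len(vals) == 1 else wildcard)
    return " ".join(out)
```

For example, a cluster of REST requests differing only in the resource id would collapse to a single template with one wildcard slot.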
@InProceedings{SANER18p266,
author = {Md Arafat Hossain and Steve Versteeg and Jun Han and Muhammad Ashad Kabir and Jiaojiao Jiang and Jean-Guy Schneider},
title = {Mining Accurate Message Formats for Service APIs},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {266--276},
doi = {},
year = {2018},
}
Exploring Code Bases
Thu, Mar 22, 15:00 - 16:00, Aula Magna
Mining Framework Usage Graphs from App Corpora
Sergio Mover, Sriram Sankaranarayanan, Rhys Braginton Pettee Olsen, and
Bor-Yuh Evan Chang
(University of Colorado at Boulder, USA)
We investigate the problem of mining graph-based usage patterns for large, object-oriented frameworks like Android—revisiting previous approaches based on graph-based object usage models (groums). Groums are a promising approach to represent usage patterns for object-oriented libraries because they simultaneously describe control flow and data dependencies between methods of multiple interacting object types. However, this expressivity comes at a cost: mining groums requires solving a subgraph isomorphism problem that is well known to be expensive. This cost limits the applicability of groum mining to large API frameworks.
In this paper, we employ groum mining to learn usage patterns for object-oriented frameworks from program corpora. The central challenge is to scale groum mining so that it is sensitive to usages horizontally across programs from arbitrarily many developers (as opposed to simply usages vertically within the program of a single developer). To address this challenge, we develop a novel groum mining algorithm that scales on a large corpus of programs. We first use frequent itemset mining to restrict the search for groums to smaller subsets of methods in the given corpus. Then, we pose the subgraph isomorphism as a SAT problem and apply efficient pre-processing algorithms to rule out fruitless comparisons ahead of time. Finally, we identify containment relationships between clusters of groums to characterize popular usage patterns in the corpus (as well as classify less popular patterns as possible anomalies). We find that our approach scales on a corpus of over five hundred open source Android applications, effectively mining obligatory and best-practice usage patterns.
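The first pruning step, using frequent itemset mining to restrict groum mining to method sets that co-occur often enough across the corpus, can be sketched with a level-wise (Apriori-style) miner. The function below is an illustrative simplification, not the authors' algorithm: each "transaction" is simply the set of framework methods one program calls.

```python
from collections import Counter

def frequent_itemsets(corpus, min_support, max_size=3):
    """Level-wise mining of method sets that co-occur in at least
    `min_support` programs; only these sets would need the expensive
    groum / subgraph-isomorphism step downstream."""
    def support(s):
        return sum(1 for methods in corpus if s <= set(methods))

    frequent = {}
    # Level 1: frequent single methods.
    counts = Counter(m for methods in corpus for m in set(methods))
    level = {frozenset([m]) for m, c in counts.items() if c >= min_support}
    size = 1
    while level and size <= max_size:
        for s in level:
            frequent[s] = support(s)
        # Candidate generation: join frequent sets differing by one element,
        # then keep only candidates that are themselves frequent.
        level = {a | b for a in level for b in level if len(a | b) == size + 1}
        level = {s for s in level if support(s) >= min_support}
        size += 1
    return frequent
```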
@InProceedings{SANER18p277,
author = {Sergio Mover and Sriram Sankaranarayanan and Rhys Braginton Pettee Olsen and Bor-Yuh Evan Chang},
title = {Mining Framework Usage Graphs from App Corpora},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {277--287},
doi = {},
year = {2018},
}
A Generalized Model for Visualizing Library Popularity, Adoption, and Diffusion within a Software Ecosystem
Raula Gaikovina Kula, Coen De Roover, Daniel M. German, Takashi Ishio, and
Katsuro Inoue
(NAIST, Japan; Vrije Universiteit Brussel, Belgium; University of Victoria, Canada; Osaka University, Japan)
The popularity of super repositories such as Maven Central and the CRAN is a testament to software reuse activities in both open-source and commercial projects alike. However, several studies have highlighted the risks and dangers brought about by application developers keeping dependencies on outdated library versions. Intelligent mining of super repositories could reveal hidden trends within the corresponding software ecosystem and thereby provide valuable insights for such dependency-related decisions. In this paper, we propose the Software Universe Graph (SUG) Model as a structured abstraction of the evolution of software systems and their library dependencies over time. To demonstrate the SUG's usefulness, we conduct an empirical study using 6,374 Maven artifacts and over 6,509 CRAN packages mined from their real-world ecosystems. Visualizations of the SUG model such as `library coexistence pairings' and `dependents diffusion' uncover popularity, adoption and diffusion patterns within each software ecosystem. Results show the Maven ecosystem as having a more conservative approach to dependency updating than the CRAN ecosystem.
@InProceedings{SANER18p288,
author = {Raula Gaikovina Kula and Coen De Roover and Daniel M. German and Takashi Ishio and Katsuro Inoue},
title = {A Generalized Model for Visualizing Library Popularity, Adoption, and Diffusion within a Software Ecosystem},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {288--299},
doi = {},
year = {2018},
}
Supporting Exploratory Code Search with Differencing and Visualization
Wenjian Liu, Xin Peng, Zhenchang Xing, Junyi Li, Bing Xie, and Wenyun Zhao
(Fudan University, China; Shanghai Institute of Intelligent Electronics and Systems, China; Australian National University, Australia; Peking University, China)
Searching and reusing online code has become a common practice in software development. Two important characteristics of online code have not been carefully considered in current tool support. First, many pieces of online code are largely similar but subtly different. Second, several pieces of code may form complex relations through their differences. These two characteristics make it difficult to properly rank online code to a search query and reduce the efficiency of examining search results. In this paper, we present an exploratory online code search approach that explicitly takes into account the above two characteristics of online code. Given a list of methods returned for a search query, our approach uses clone detection and code differencing techniques to analyze both commonalities and differences among the methods in the search results. It then produces an exploration graph that visualizes the method differences and the relationships of methods through their differences. The exploration graph allows developers to explore search results in a structured view of different method groups present in the search results, and turns implicit code differences into visual cues to help developers navigate the search results. We implement our approach in a web-based tool called CodeNuance. We conduct experiments to evaluate the effectiveness of our CodeNuance tool for search results examination, compared with ranked-list and code-clustering based search results examination. We also compare the performance and user behavior differences in using our tool and other exploratory code search tools.
@InProceedings{SANER18p300,
author = {Wenjian Liu and Xin Peng and Zhenchang Xing and Junyi Li and Bing Xie and Wenyun Zhao},
title = {Supporting Exploratory Code Search with Differencing and Visualization},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {300--310},
doi = {},
year = {2018},
}
Language Models
Thu, Mar 22, 16:30 - 17:15, Aula Magna
Syntax and Sensibility: Using Language Models to Detect and Correct Syntax Errors
Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, and José Nelson Amaral
(University of Alberta, Canada)
Syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of experience that help them quickly resolve these frustrating errors. Standard LR parsers are of little help, typically resolving syntax errors and their precise location poorly. We propose a methodology that locates where syntax errors occur and suggests possible changes to the token stream that can fix the error identified. This methodology finds syntax errors by using language models trained on correct source code to find tokens that seem out of place. Fixes are synthesized by consulting the language models to determine which tokens are more likely at the estimated error location. We compare n-gram and LSTM (long short-term memory) language models for this task, each trained on a large corpus of Java code collected from GitHub. Unlike prior work, our methodology does not assume that the problem source code comes from the same domain as the training data. We evaluated against a repository of real student mistakes. Our tool is able to find a syntactically valid fix within its top-2 suggestions, often producing the exact fix that the student used to resolve the error. The results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated with incomprehensible syntax errors.
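The core idea, flagging the token the language model finds least likely, can be illustrated with a toy bigram model. The n-gram order, the add-one smoothing, and the function names below are illustrative choices, not the paper's configuration.

```python
from collections import Counter

def train_bigram(corpus):
    """Train a bigram token model on (assumed syntactically valid)
    token streams; a toy stand-in for the paper's n-gram/LSTM models."""
    bigrams, unigrams = Counter(), Counter()
    for tokens in corpus:
        toks = ["<s>"] + tokens
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return bigrams, unigrams

def most_surprising(tokens, model, vocab_size):
    """Return the index of the token with the lowest add-one-smoothed
    bigram probability -- the most likely syntax-error location."""
    bigrams, unigrams = model
    toks = ["<s>"] + tokens
    probs = [(bigrams[(p, t)] + 1) / (unigrams[p] + vocab_size)
             for p, t in zip(toks, toks[1:])]
    return probs.index(min(probs))
```

A fix suggester would then rank, at the flagged position, the tokens the model considers most probable after the preceding context.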
@InProceedings{SANER18p311,
author = {Eddie Antonio Santos and Joshua Charles Campbell and Dhvani Patel and Abram Hindle and José Nelson Amaral},
title = {Syntax and Sensibility: Using Language Models to Detect and Correct Syntax Errors},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {311--322},
doi = {},
year = {2018},
}
A Deep Neural Network Language Model with Contexts for Source Code
Anh Tuan Nguyen, Trong Duc Nguyen, Hung Dang Phan, and Tien N. Nguyen
(Iowa State University, USA; University of Texas at Dallas, USA)
Statistical language models (LMs) have been applied in several software engineering applications. However, they have issues in dealing with ambiguities in the names of program and API elements (classes and method calls). In this paper, inspired by the success of Deep Neural Networks (DNNs) in natural language processing, we present DNN4C, a DNN language model that complements the local context of lexical code elements with both syntactic and type contexts. We designed a context-incorporating method that uses syntactic and type annotations for source code in order to learn to distinguish lexical tokens in different syntactic and type contexts. Our empirical evaluation on code completion for real-world projects shows that DNN4C improves top-1 accuracy by 11.6%, 16.3%, 27.1%, and 44.7% relative to the state-of-the-art language models for source code used with the same features: RNN LM, DNN LM, SLAMC, and n-gram LM, respectively. For another application, we show that DNN4C helps improve accuracy over the n-gram LM when migrating source code from Java to C# with a machine translation model.
@InProceedings{SANER18p323,
author = {Anh Tuan Nguyen and Trong Duc Nguyen and Hung Dang Phan and Tien N. Nguyen},
title = {A Deep Neural Network Language Model with Contexts for Source Code},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {323--334},
doi = {},
year = {2018},
}
Binary Analysis
Thu, Mar 22, 16:30 - 17:15, Room 2
Efficient Features for Function Matching between Binary Executables
Chariton Karamitas and Athanasios Kehagias
(CENSUS, Greece; University of Thessaloniki, Greece)
Binary diffing is the process of reverse engineering two programs, when source code is not available, in order to study their syntactic and semantic differences. For large programs, binary diffing can be performed by function matching which, in turn, is reduced to a graph isomorphism problem between the compared programs' CFGs (Control Flow Graphs) and/or CGs (Call Graphs). In this paper we provide a set of carefully chosen features, extracted from a binary's CG and CFG, which can be used by BinDiff algorithm variants to, first, build a set of initial exact matches with minimal false positives (by scanning for unique perfect matches) and, second, propagate approximate matching information using, for example, a nearest-neighbor scheme. Furthermore, we investigate the benefits of applying Markov lumping techniques to function CFGs (to our knowledge, this technique has not been previously studied). The proposed function features are evaluated in a series of experiments on various versions of the Linux kernel (Intel64), the OpenSSH server (Intel64) and Firefox's xul.dll (IA-32). Our prototype system is also compared to Diaphora, the current state-of-the-art binary diffing software.
@InProceedings{SANER18p335,
author = {Chariton Karamitas and Athanasios Kehagias},
title = {Efficient Features for Function Matching between Binary Executables},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {335--345},
doi = {},
year = {2018},
}
Using Recurrent Neural Networks for Decompilation
Deborah S. Katz, Jason Ruchti, and Eric Schulte
(Carnegie Mellon University, USA; GrammaTech, USA)
Decompilation, recovering source code from binary, is useful in many situations where it is necessary to analyze or understand software for which source code is not available. Source code is much easier for humans to read than binary code, and there are many tools available to analyze source code. Existing decompilation techniques often generate source code that is difficult for humans to understand because the generated code often does not use the coding idioms that programmers use. Differences from human-written code also reduce the effectiveness of analysis tools on the decompiled source code.
To address the problem of differences between decompiled code and human-written code, we present a novel technique for decompiling binary code snippets using a model based on Recurrent Neural Networks. The model learns properties and patterns that occur in source code and uses them to produce decompilation output. We train and evaluate our technique on snippets of binary machine code compiled from C source code. The general approach we outline in this paper is not language-specific and requires little or no domain knowledge of a language and its properties or how a compiler operates, making the approach easily extensible to new languages and constructs. Furthermore, the technique can be extended and applied in situations to which traditional decompilers are not targeted, such as for decompilation of isolated binary snippets; fast, on-demand decompilation; domain-specific learned decompilation; optimizing for readability of decompilation; and recovering control flow constructs, comments, and variable or function names. We show that the translations produced by this technique are often accurate or close and can provide a useful picture of the snippet's behavior.
@InProceedings{SANER18p346,
author = {Deborah S. Katz and Jason Ruchti and Eric Schulte},
title = {Using Recurrent Neural Networks for Decompilation},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {346--356},
doi = {},
year = {2018},
}
Developers' Collaboration
Fri, Mar 23, 10:30 - 11:30, Aula Magna
How Do Developers Discuss Rationale?
Rana Alkadhi, Manuel Nonnenmacher, Emitza Guzman, and Bernd Bruegge
(TU Munich, Germany; University of Zurich, Switzerland)
Developers make various decisions during software development.
The rationale behind these decisions is of great importance during software evolution of long living software systems.
However, current practices for documenting rationale often fall short and rationale remains hidden in the heads of developers or embedded in development artifacts.
Capturing rationale is even more challenging in OSS projects, in which developers are geographically distributed and rely mostly on written communication channels to support and coordinate their activities.
In this paper, we present an empirical study to understand how OSS developers discuss rationale in IRC channels and explore the possibility of automatic extraction of rationale elements by analyzing IRC messages of development teams.
To achieve this, we manually analyzed 7,500 messages of three large OSS projects and identified all fine-grained elements of rationale.
We evaluated various machine learning algorithms for automatically detecting and classifying rationale in IRC messages.
Our results show that 1) rationale is discussed on average in 25% of IRC messages, 2) code committers contributed on average 54% of the discussed rationale, and 3) machine learning algorithms can detect rationale with 0.76 precision and 0.79 recall, and classify messages into finer-grained rationale elements with an average of 0.45 precision and 0.43 recall.
@InProceedings{SANER18p357,
author = {Rana Alkadhi and Manuel Nonnenmacher and Emitza Guzman and Bernd Bruegge},
title = {How Do Developers Discuss Rationale?},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {357--367},
doi = {},
year = {2018},
}
Automated Quality Assessment for Crowdsourced Test Reports of Mobile Applications
Xin Chen, He Jiang, Xiaochen Li, Tieke He, and Zhenyu Chen
(Dalian University of Technology, China; Nanjing University, China)
In crowdsourced mobile application testing, crowd workers help developers perform testing and submit test reports for unexpected behaviors. These submitted test reports usually provide critical information for developers to understand and reproduce the bugs. However, due to the poor performance of workers and the inconvenience of editing on mobile devices, the quality of test reports may vary sharply. At times developers have to spend a significant portion of their available resources to handle the low-quality test reports, thus heavily decreasing their efficiency. In this paper, to help developers predict whether a test report should be selected for inspection within limited resources, we propose a new framework named TERQAF to automatically model the quality of test reports. TERQAF defines a series of quantifiable indicators to measure the desirable properties of test reports and aggregates the numerical values of all indicators to determine the quality of test reports by using step transformation functions. Experiments conducted over five crowdsourced test report datasets of mobile applications show that TERQAF can correctly predict the quality of test reports with accuracy of up to 88.06% and outperform baselines by up to 23.06%. Meanwhile, the experimental results also demonstrate that the four categories of measurable indicators have positive impacts on TERQAF in evaluating the quality of test reports.
@InProceedings{SANER18p368,
author = {Xin Chen and He Jiang and Xiaochen Li and Tieke He and Zhenyu Chen},
title = {Automated Quality Assessment for Crowdsourced Test Reports of Mobile Applications},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {368--379},
doi = {},
year = {2018},
}
Refactoring
Fri, Mar 23, 11:45 - 12:45, Aula Magna
The Impact of Refactoring Changes on the SZZ Algorithm: An Empirical Study
Edmilson Campos Neto, Daniel Alencar da Costa, and Uirá Kulesza
(Federal University of Rio Grande do Norte, Brazil; Instituto Federal do Rio Grande do Norte, Brazil; Queen's University, Canada)
SZZ is a widely used algorithm in the software engineering community to identify changes that are likely to introduce bugs (i.e., bug-introducing changes). Despite its wide adoption, SZZ still has room for improvements. For example, current SZZ implementations may still flag refactoring changes as bug-introducing. Refactorings should be disregarded as bug-introducing because they do not change the system behaviour. In this paper, we empirically investigate how refactorings impact both the input (bug-fix changes) and the output (bug-introducing changes) of the SZZ algorithm. We analyse 31,518 issues of ten Apache projects with 20,298 bug-introducing changes. We use an existing tool that automatically detects refactorings in code changes. We observe that 6.5% of lines that are flagged as bug-introducing changes by SZZ are in fact refactoring changes. Regarding bug-fix changes, we observe that 19.9% of lines that are removed during a fix are related to refactorings and, therefore, their respective inducing changes are false positives. We then incorporate the refactoring-detection tool in our Refactoring Aware SZZ Implementation (RA-SZZ). Our results reveal that RA-SZZ reduces 20.8% of the lines that are flagged as bug-introducing changes compared to the state-of-the-art SZZ implementations. Finally, we perform a manual analysis to identify change patterns that are not captured by the refactoring identification tool used in our study. Our results reveal that 47.95% of the analyzed bug-introducing changes contain additional change patterns that RA-SZZ should not flag as bug-introducing.
@InProceedings{SANER18p380,
author = {Edmilson Campos Neto and Daniel Alencar da Costa and Uirá Kulesza},
title = {The Impact of Refactoring Changes on the SZZ Algorithm: An Empirical Study},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {380--390},
doi = {},
year = {2018},
}
An Extensible Approach for Taming the Challenges of JavaScript Dead Code Elimination
Niels Groot Obbink, Ivano Malavolta,
Gian Luca Scoccia, and Patricia Lago
(VU University Amsterdam, Netherlands; Gran Sasso Science Institute, Italy)
JavaScript is becoming the de-facto programming language of the Web. Large-scale web applications (web apps) written in JavaScript are commonplace nowadays, with big technology players (e.g., Google, Facebook) using it in their core flagship products. Today, it is common practice to reuse existing JavaScript code, usually in the form of third-party libraries and frameworks. While this practice helps speed up development, it comes with the risk of bringing in dead code, i.e., JavaScript code that is never executed but is still downloaded from the network and parsed in the browser. This overhead can negatively impact the overall performance and energy consumption of the web app.
In this paper we present Lacuna, an approach for JavaScript dead code elimination that applies existing JavaScript analysis techniques in combination. The proposed approach supports both static and dynamic analyses, is extensible, and is independent of the specificities of the JavaScript analysis techniques used. Lacuna can be applied to any JavaScript code base, without imposing any constraints on the developer, e.g., on her coding style or on the use of some specific JavaScript feature (e.g., modules).
Lacuna has been evaluated on a suite of 29 publicly-available web apps, composed of 15,946 JavaScript functions, and built with different JavaScript frameworks (e.g., Angular, Vue.js, jQuery). Despite being a prototype, Lacuna obtained promising results in terms of analysis execution time and precision.
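At its simplest, the static side of dead-function detection is a reachability (mark-and-sweep) pass over a call graph: any function not reachable from the app's entry points is a candidate for elimination. The sketch below assumes a precomputed call graph and ignores JavaScript's dynamic features (eval, reflective calls, event handlers), which is exactly what makes the real problem hard and why Lacuna combines multiple analyses.

```python
def reachable(call_graph, entry_points):
    """Depth-first marking of every function reachable from the entry
    points; call_graph maps a function name to the names it calls."""
    seen, stack = set(), list(entry_points)
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, []))
    return seen

def dead_functions(call_graph, entry_points):
    """Everything never marked is a dead-code-elimination candidate."""
    all_fns = set(call_graph) | {f for cs in call_graph.values() for f in cs}
    return all_fns - reachable(call_graph, entry_points)
```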
@InProceedings{SANER18p391,
author = {Niels Groot Obbink and Ivano Malavolta and Gian Luca Scoccia and Patricia Lago},
title = {An Extensible Approach for Taming the Challenges of JavaScript Dead Code Elimination},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {391--401},
doi = {},
year = {2018},
}
Automated Refactoring of Client-Side JavaScript Code to ES6 Modules
Aikaterini Paltoglou, Vassilis E. Zafeiris, E. A. Giakoumakis, and N. A. Diamantidis
(Athens University of Economics and Business, Greece)
JavaScript (JS) is a dynamic, weakly-typed and object-based programming language that expanded its reach, in recent years, from the desktop web browser to a wide range of runtime platforms in embedded, mobile and server hosts. Moreover, the scope of functionality implemented in JS scaled from DOM manipulation in dynamic HTML pages to full-scale applications for various domains, stressing the need for code reusability and maintainability. Towards this direction, the ECMAScript 6 (ES6) revision of the language standardized the syntax for class and module definitions, streamlining the encapsulation of data and functionality at various levels of granularity.
This work focuses on refactoring client-side web applications for the elimination of code smells, relevant to global variables and functions that are declared in JS files linked to a web page. These declarations “pollute” the global namespace at runtime and often lead to name conflicts with undesired effects. We propose a method for the encapsulation of global declarations through automated refactoring to ES6 modules. Our approach transforms each linked JS script of a web application to an ES6 module with appropriate import and export declarations that are inferred through static analysis. A prototype implementation of the proposed method, based on WALA libraries, has been evaluated on a set of open source projects. The evaluation results support the applicability and runtime efficiency of the proposed method.
@InProceedings{SANER18p402,
author = {Aikaterini Paltoglou and Vassilis E. Zafeiris and E. A. Giakoumakis and N. A. Diamantidis},
title = {Automated Refactoring of Client-Side JavaScript Code to ES6 Modules},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {402--412},
doi = {},
year = {2018},
}
Recommender Systems
Fri, Mar 23, 13:45 - 14:45, Aula Magna
Improving Developers Awareness of the Exception Handling Policy
Taiza Montenegro, Hugo Melo,
Roberta Coelho, and Eiji Barbosa
(Federal University of Rio Grande do Norte, Brazil)
The exception handling policy of a system comprises the set of design rules that specify its exception handling behavior (how exceptions should be handled and thrown in a system). Such a policy is usually undocumented and implicitly defined by the system architect. Developers are usually unaware of such rules and may think that by just sprinkling the code with catch-blocks they can adequately deal with the exceptional conditions of a system. As a consequence, the exception handling code, once designed to make the program more reliable, may become a source of faults (e.g., uncaught exceptions are one of the main causes of crashes in current Java applications). To mitigate this problem, we propose Exception Policy Expert (EPE), a tool embedded in the Eclipse IDE that warns developers about policy violations related to the code being edited. A case study performed in a real development context showed that the tool could indeed make the exception handling policy explicit to developers during development.
@InProceedings{SANER18p413,
author = {Taiza Montenegro and Hugo Melo and Roberta Coelho and Eiji Barbosa},
title = {Improving Developers Awareness of the Exception Handling Policy},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {413--422},
doi = {},
year = {2018},
}
Detecting Faulty Empty Cells in Spreadsheets
Liang Xu, Shuo Wang, Wensheng Dou, Bo Yang, Chushu Gao, Jun Wei, and Tao Huang
(University of Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China; North China University of Technology, China)
Spreadsheets play an important role in various business tasks, such as financial reporting and data analysis. In spreadsheets, empty cells are widely used for different purposes, e.g., separating different tables or representing the default value “0”. However, a user may delete a formula unintentionally and leave a cell empty. Such an ad hoc modification may introduce a faulty empty cell that should have contained a formula.
We observe that the context of an empty cell can help determine whether the empty cell is faulty. For example, is the empty cell next to a cell array in which all cells share the same semantics? Does the empty cell have headers similar to those of other non-empty cells? In this paper, we propose EmptyCheck to detect faulty empty cells in spreadsheets. By analyzing the context of an empty cell, EmptyCheck validates whether the cell belongs to a cell array. If it does, the empty cell is faulty, since it should contain a formula but does not. We evaluate EmptyCheck on 100 randomly sampled EUSES spreadsheets. The experimental results show that EmptyCheck can detect faulty empty cells with high precision (75.00%) and recall (87.04%). Existing techniques can detect only 4.26% of the true faulty empty cells that EmptyCheck detects.
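The core cell-array intuition can be sketched in a few lines. This is a deliberate simplification of EmptyCheck: the paper infers cell arrays from formula semantics and headers, whereas the sketch below merely compares R1C1-style formula strings along one row and flags empty cells inside a row dominated by one shared formula.

```typescript
type Cell = { formula: string | null }; // null models an empty cell

// Return the indices of empty cells that sit inside a row whose non-empty
// cells predominantly share a single formula pattern (a "cell array").
function faultyEmptyCells(row: Cell[]): number[] {
  // Count occurrences of each formula pattern among non-empty cells.
  const counts = new Map<string, number>();
  for (const c of row) {
    if (c.formula !== null) {
      counts.set(c.formula, (counts.get(c.formula) ?? 0) + 1);
    }
  }
  // Find the dominant pattern; require at least two occurrences to
  // consider the row a cell array at all.
  let best = 0;
  for (const n of counts.values()) if (n > best) best = n;
  if (best < 2) return [];
  // Every empty cell inside such a row is suspicious: it likely lost
  // the formula shared by its neighbors.
  return row.flatMap((c, i) => (c.formula === null ? [i] : []));
}
```

A real detector would additionally weigh header similarity and neighboring-table boundaries before reporting, which is where the precision/recall trade-off reported in the abstract comes from.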
@InProceedings{SANER18p423,
author = {Liang Xu and Shuo Wang and Wensheng Dou and Bo Yang and Chushu Gao and Jun Wei and Tao Huang},
title = {Detecting Faulty Empty Cells in Spreadsheets},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {423--433},
doi = {},
year = {2018},
}
Software Security
Fri, Mar 23, 15:00 - 16:00, Aula Magna
Detection of Protection-Impacting Changes during Software Evolution
Marc-André Laverdière and
Ettore Merlo
(Tata Consultancy Services, Canada; Polytechnique Montréal, Canada)
Role-Based Access Control (RBAC) is often used in web applications to restrict operations and protect security-sensitive information and resources. Web applications regularly undergo maintenance and evolution, and their security may be affected by source code changes between releases. To prevent security regressions and vulnerabilities, developers have to take re-validation actions before deploying new releases. This may become a significant undertaking, especially when quick and repeated releases are sought.
We define protection-impacting changes as those changed statements during evolution that alter the privilege protection of some code. We propose an automated method that identifies protection-impacting changes among all changed statements between two versions. The proposed approach compares statically computed security protection models and repository information corresponding to different releases of a system to identify protection-impacting changes.
Our experiments examine the occurrence of protection-impacting changes over 210 release pairs of WordPress, a PHP content management web application. First, we show that only 41% of the release pairs present protection-impacting changes. Second, for these affected release pairs, protection-impacting changes can be identified and represent a median of 47.00 lines of code, that is, 27.41% of the total changed lines of code. Over all investigated releases of WordPress, protection-impacting changes amounted to 10.89% of the changed lines of code. Conversely, an average of about 89% of changed source code has no impact on RBAC security and thus needs neither re-validation nor investigation. The proposed method reduces the number of candidate causes of protection changes that developers need to investigate. This information could help developers re-validate application security, identify causes of negative security changes, and perform repairs in a more effective way.
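The comparison step can be pictured with a small sketch. Under the (simplifying) assumption that each release's security protection model maps a statement to the set of privileges guarding it, a changed statement is protection-impacting when that set differs across releases; names like `protectionImpacting` are illustrative, not from the paper.

```typescript
// statement id -> set of privileges that must hold to reach it
type Protection = Map<string, Set<string>>;

// Of the statements changed between two releases, keep those whose
// privilege protection differs, i.e. the protection-impacting changes.
function protectionImpacting(
  before: Protection,
  after: Protection,
  changed: Set<string>
): string[] {
  const sameSet = (a?: Set<string>, b?: Set<string>) =>
    (a?.size ?? 0) === (b?.size ?? 0) &&
    [...(a ?? [])].every((p) => b?.has(p));
  return [...changed].filter(
    (stmt) => !sameSet(before.get(stmt), after.get(stmt))
  );
}
```

Filtering the change set this way is what lets roughly 89% of changed code be set aside as irrelevant to RBAC re-validation in the WordPress study.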
@InProceedings{SANER18p434,
author = {Marc-André Laverdière and Ettore Merlo},
title = {Detection of Protection-Impacting Changes during Software Evolution},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {434--444},
doi = {},
year = {2018},
}
Mining Sandboxes: Are We There Yet?
Lingfeng Bao, Tien-Duy B. Le, and
David Lo
(Singapore Management University, Singapore)
The popularity of the Android platform on mobile devices has attracted much attention from developers and researchers, as well as malware writers. Recently, Jamrozik et al. proposed a technique to secure Android applications, referred to as mining sandboxes. They used an automated test case generation technique to explore the behavior of the app under test and extracted the set of sensitive APIs that were called. Based on the extracted sensitive APIs, they built a sandbox that blocks access to APIs not used during testing. However, they only evaluated the proposed technique on benign apps and did not investigate whether it is effective in detecting the malicious behavior of malware that infects benign apps. Furthermore, they only investigated one test case generation tool (i.e., Droidmate) to build the sandbox, while many others have been proposed in the literature.
In this work, we complement Jamrozik et al.’s work in two ways: (1) we evaluate the effectiveness of mining sandboxes in detecting malicious behaviors; (2) we investigate the effectiveness of multiple automated test case generation tools for mining sandboxes. To investigate the effectiveness of mining sandboxes in detecting malicious behaviors, we make use of pairs consisting of a benign app and the malware that infects it. We build a sandbox based on the sensitive APIs called by the benign app and check whether it can identify the malicious behaviors in the corresponding malware. To generate inputs to apps, we investigate five popular test case generation tools: Monkey, Droidmate, Droidbot, GUIRipper, and PUMA. We conduct two experiments to evaluate the effectiveness and efficiency of these test case generation tools in detecting malicious behavior. In the first experiment, we select 10 apps and allow the test case generation tools to run for one hour; in the second experiment, we select 102 pairs of apps and allow the tools to run for one minute. Our experiments highlight that 75.5%–77.2% of the malware in our dataset can be uncovered by mining sandboxes, showing its power to protect Android apps. We also find that Droidbot performs best in generating test cases for mining sandboxes, and that its effectiveness can be further boosted when coupled with other test case generation tools.
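The sandbox-mining idea itself is compact enough to sketch. Assuming the set of sensitive API calls observed while testing the benign app is available, the mined sandbox is simply an allow-list; a call outside it is blocked, and in the benign/malware pair setup such a call signals malicious behavior. The class below is a toy model with illustrative names, not any of the tools discussed.

```typescript
// A mined sandbox: the allow-list of sensitive APIs observed during
// automated testing of the benign app.
class MinedSandbox {
  private constructor(private readonly allowed: Set<string>) {}

  // Mine a sandbox from the sensitive-API calls logged while the test
  // case generation tool exercised the app.
  static mine(observedCalls: string[]): MinedSandbox {
    return new MinedSandbox(new Set(observedCalls));
  }

  // At runtime, only APIs seen during mining are permitted.
  permits(api: string): boolean {
    return this.allowed.has(api);
  }

  // A set of runtime calls "escapes" the sandbox if any call is outside
  // the allow-list -- the detection signal used for the malware pairs.
  detectsEscape(runtimeCalls: string[]): boolean {
    return runtimeCalls.some((api) => !this.permits(api));
  }
}
```

This also makes the paper's experimental variable concrete: a better test case generation tool observes more of the benign app's legitimate sensitive calls, yielding an allow-list with fewer false alarms while still exposing the malware's extra calls.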
@InProceedings{SANER18p445,
author = {Lingfeng Bao and Tien-Duy B. Le and David Lo},
title = {Mining Sandboxes: Are We There Yet?},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {445--455},
doi = {},
year = {2018},
}
DeepWeak: Reasoning Common Software Weaknesses via Knowledge Graph Embedding
Zhuobing Han, Xiaohong Li, Hongtao Liu, Zhenchang Xing, and Zhiyong Feng
(Tianjin University, China; Australian National University, Australia)
Common software weaknesses, such as improper input validation and integer overflow, can harm system security directly or indirectly, causing adverse effects such as denial of service or execution of unauthorized code. The Common Weakness Enumeration (CWE) maintains a standard list and classification of common software weaknesses. Although CWE contains rich information about software weaknesses, including textual descriptions, common consequences, and relations between software weaknesses, its current data representation, i.e., hyperlinked documents, does not support advanced reasoning tasks on software weaknesses, such as the prediction of missing relations and common consequences of CWEs. Such reasoning tasks become critical to managing and analyzing large numbers of common software weaknesses and their relations. In this paper, we propose to represent common software weaknesses and their relations as a knowledge graph, and we develop a translation-based, description-embodied knowledge representation learning method to embed both software weaknesses and their relations in the knowledge graph into a semantic vector space. The vector representations (i.e., embeddings) of software weaknesses and their relations can be exploited for knowledge acquisition and inference. We conduct extensive experiments to evaluate the performance of software weakness and relation embeddings on three reasoning tasks: CWE link prediction, CWE triple classification, and common consequence prediction. Our knowledge graph embedding approach outperforms other description- and/or structure-based representation learning methods.
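The translation-based scoring at the heart of such methods (in the style of TransE, which the paper builds on) can be shown directly: a triple (head, relation, tail) is plausible when head + relation lands near tail in the embedding space, i.e. ||h + r − t|| is small. The sketch below uses tiny hand-made vectors; the paper learns the embeddings (including description embeddings) from CWE data.

```typescript
// Score a (head, relation, tail) triple by the L2 distance between
// (h + r) and t in the embedding space; lower means more plausible.
function transEScore(h: number[], r: number[], t: number[]): number {
  let sum = 0;
  for (let i = 0; i < h.length; i++) {
    const d = h[i] + r[i] - t[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}
```

Link prediction then ranks candidate tails (e.g. candidate related CWEs) by this score, and triple classification thresholds it; common consequence prediction works analogously with consequence embeddings.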
@InProceedings{SANER18p456,
author = {Zhuobing Han and Xiaohong Li and Hongtao Liu and Zhenchang Xing and Zhiyong Feng},
title = {DeepWeak: Reasoning Common Software Weaknesses via Knowledge Graph Embedding},
booktitle = {Proc.\ SANER},
publisher = {IEEE},
pages = {456--466},
doi = {},
year = {2018},
}