ASE 2017
2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017)
Powered by
Conference Publishing Consulting

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017), October 30 – November 3, 2017, Urbana-Champaign, IL, USA

ASE 2017 – Proceedings

Contents - Abstracts - Authors


Title Page

Message from the Chairs

ASE 2017 Organization

ASE 2017 Sponsors


Cobra - An Interactive Static Code Analyzer
Gerard Holzmann
(Nimble Research, USA)
Sadly we know that virtually all software of any significance has residual errors. Some of those errors can be traced back to requirements flaws or faulty design assumptions; others are just plain coding mistakes. Static analyzers have become quite good at spotting these types of errors, but they don’t scale very well. If, for instance, you need to check a code base of a few million lines you better be prepared to wait for the result; sometimes hours. Eyeballing a large code base to find flaws is clearly not an option, so what is missing is a static analysis capability that can be used to answer common types of queries interactively, even for large code bases. I will describe the design and use of such a tool in this talk.
Article Search
Mining Structures from Massive Text Data: Will It Help Software Engineering?
Jiawei Han
(University of Illinois at Urbana-Champaign, USA)
The real-world big data are largely unstructured, interconnected text data. One of the grand challenges is to turn such massive unstructured text data into structured, actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive text data. We show quality phrases can be mined from such massive text data, types can be extracted from massive text data with distant supervision, and entities/attributes/values can be discovered by meta-path directed pattern discovery. We show text-rich and structure-rich networks can be constructed from massive unstructured data. Finally, we speculate whether such a paradigm could be useful for turning massive software repositories into multi-dimensional structures to help searching and mining software repositories.
Article Search
Software Engineering without Borders
Arie van Deursen
(Delft University of Technology, Netherlands)
DevOps approaches software engineering by advocating the removal of borders between development and operations. DevOps emphasizes operational resilience, continuous feedback from operations back to development, and rapid deployment of features developed. In this talk we will look at selected (automation) aspects related to DevOps, based on our collaborations with various industrial partners. For example, we will explore (automated) methods for analyzing log data to support deployments and monitor REST API integrations, (search-based) test input generation for reproducing crashes and testing complex database queries, and zero downtime database schema evolution and deployment. We will close by looking at borders beyond those between development and operations, in order to see whether there are other borders we need to remove in order to strengthen the impact of software engineering research.
Article Search

Technical Research

Test Generation

Systematically Testing Background Services of Mobile Apps
Li Lyna Zhang, Chieh-Jan Mike Liang, Yunxin Liu, and Enhong Chen
(University of Science and Technology of China, China; Microsoft Research, China)
Contrary to popular belief, mobile apps can spend a large fraction of time running "hidden" as background services. And, bugs in services can translate into crashes, energy depletion, device slow-down, etc. Unfortunately, without necessary testing tools, developers can only resort to telemetries from user devices in the wild. To this end, Snowdrop is a testing framework that systematically identifies and automates background services in Android apps. Snowdrop realizes a service-oriented approach that does not assume all inter-component communication messages are explicitly coded in the app bytecode. Furthermore, to improve the completeness of test inputs generated, Snowdrop infers field values by exploiting the similarity in how developers name variables. We evaluate Snowdrop by testing 848 commercially available mobile apps. Empirical results show that Snowdrop can achieve 20.91% more code path coverage than pathwise test input generators, and 64.11% more coverage than random test input generators.
Article Search
Crowd Intelligence Enhances Automated Mobile Testing
Ke Mao, Mark Harman, and Yue Jia
(University College London, UK; Facebook, UK)
We show that information extracted from crowd-based testing can enhance automated mobile testing. We introduce Polariz, which generates replicable test scripts from crowd-based testing, extracting cross-app äóÖmotifäó» events: automatically inferred reusable higher-level event sequences composed of lower-level observed event actions. Our empirical study used 434 crowd workers from 24 countries to perform 1,350 testing tasks on 9 popular Google Play apps, each with at least 1 million user installs. The findings reveal that the crowd was able to achieve 60.5% unique activity coverage and proved to be complementary to automated search-based testing in 5 out of the 9 subjects studied. Our leave-one-out evaluation demonstrates that coverage attainment can be improved (6 out of 9 cases, with no disimprovement on the remaining 3) by combining crowd-based and search-based testing.
Article Search
EHBDroid: Beyond GUI Testing for Android Applications
Wei Song, Xiangxing Qian, and Jeff Huang
(Nanjing University of Science and Technology, China; Texas A&M University, USA)
With the prevalence of Android-based mobile devices, automated testing for Android apps has received increasing attention. However, owing to the large variety of events that Android supports, test input generation is a challenging task. In this paper, we present a novel approach and an open source tool called EHBDroid for testing Android apps. In contrast to conventional GUI testing approaches, a key novelty of EHBDroid is that it does not generate events from the GUI, but directly invokes callbacks of event handlers. By doing so, EHBDroid can efficiently simulate a large number of events that are difficult to generate by traditional UI-based approaches. We have evaluated EHBDroid on a collection of 35 real-world large-scale Android apps and compared its performance with two state-of-the-art UI-based approaches, Monkey and Dynodroid. Our experimental results show that EHBDroid is significantly more effective and efficient than Monkey and Dynodroid: in a much shorter time, EHBDroid achieves as much as 22.3% higher statement coverage (11.1% on average) than the other two approaches, and found 12 bugs in these benchmarks, including 5 new bugs that the other two failed to find.
Article Search Info
Sketch-Guided GUI Test Generation for Mobile Applications
Chucheng Zhang, Haoliang Cheng, Enyi Tang, Xin Chen, Lei Bu, and Xuandong Li
(Nanjing University, China)
Mobile applications with complex GUIs are very popular today. However, generating test cases for these applications is often tedious professional work. On the one hand, manually designing and writing elaborate GUI scripts requires expertise. On the other hand, generating GUI scripts with record and playback techniques usually depends on repetitive work that testers need to interact with the application over and over again, because only one path is recorded in an execution. Automatic GUI testing focuses on exploring combinations of GUI events. As the number of combinations is huge, it is still necessary to introduce a test interface for testers to reduce its search space. This paper presents a sketch-guided GUI test generation approach for testing mobile applications, which provides a simple but expressive interface for testers to specify their testing purposes. Testers just need to draw a few simple strokes on the screenshots. Then our approach translates the strokes to a testing model and initiates a model-based automatic GUI testing. We evaluate our sketch-guided approach on a few real-world Android applications collected from the literature. The results show that our approach can achieve higher coverage than existing automatic GUI testing techniques with just 10-minute sketching for an application.
Article Search
Saying ’Hi!’ Is Not Enough: Mining Inputs for Effective Test Generation
Luca Della Toffola, Cristian Alexandru Staicu, and Michael Pradel
(ETH Zurich, Switzerland; TU Darmstadt, Germany)
Automatically generating unit tests is a powerful approach to exercise complex software. Unfortunately, current techniques often fail to provide relevant input values, such as strings that bypass domain-specific sanity checks. As a result, state-of-the-art techniques are effective for generic classes, such as collections, but less successful for domain-specific software. This paper presents TestMiner, the first technique for mining a corpus of existing tests for input values to be used by test generators for effectively testing software not in the corpus. The main idea is to extract literals from thousands of tests and to adapt information retrieval techniques to find values suitable for a particular domain. Evaluating the approach with 40 Java classes from 18 different projects shows that TestMiner improves test coverage by 21% over an existing test generator. The approach can be integrated into various test generators in a straightforward way, increasing their effectiveness on previously difficult-to-test classes.
Article Search Info
Learn&Fuzz: Machine Learning for Input Fuzzing
Patrice Godefroid, Hila Peleg, and Rishabh Singh
(Microsoft Research, USA; Technion, Israel)
Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft's new Edge browser. We discuss and measure the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.
Article Search

Developers’ Practice and Behavior

The Impact of Continuous Integration on Other Software Development Practices: A Large-Scale Empirical Study
Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, Vladimir Filkov, and Bogdan Vasilescu
(Nanjing University, China; Eindhoven University of Technology, Netherlands; University of California at Davis, USA; Carnegie Mellon University, USA)
Continuous Integration (CI) has become a disruptive innovation in software development: with proper tool support and adoption, positive effects have been demonstrated for pull request throughput and scaling up of project sizes. As any other innovation, adopting CI implies adapting existing practices in order to take full advantage of its potential, and "best practices" to that end have been proposed. Here we study the adaptation and evolution of code writing and submission, issue and pull request closing, and testing practices as Travis CI is adopted by hundreds of established projects on GitHub. To help essentialize the quantitative results, we also survey a sample of GitHub developers about their experiences with adopting Travis CI. Our findings suggest a more nuanced picture of how GitHub teams are adapting to, and benefiting from, continuous integration technology than suggested by prior work.
Article Search
Perceived Language Complexity in GitHub Issue Discussions and Their Effect on Issue Resolution
David Kavaler, Sasha Sirovica, Vincent Hellendoorn, Raul Aranovich, and Vladimir Filkov
(University of California at Davis, USA)
Modern software development is increasingly collaborative. Open Source Software (OSS) are the bellwether; they support dynamic teams, with tools for code sharing, communication, and issue tracking. The success of an OSS project is reliant on team communication. E.g., in issue discussions, individuals rely on rhetoric to argue their position, but also maintain technical relevancy. Rhetoric and technical language are on opposite ends of a language complexity spectrum: the former is stylistically natural; the latter is terse and concise. Issue discussions embody this duality, as developers use rhetoric to describe technical issues. The style mix in any discussion can define group culture and affect performance, e.g., issue resolution times may be longer if discussion is imprecise. Using GitHub, we studied issue discussions to understand whether project-specific language differences exist, and to what extent users conform to a language norm. We built project-specific and overall GitHub language models to study the effect of perceived language complexity on multiple responses. We find that experienced users conform to project-specific language norms, popular individuals use overall GitHub language rather than project-specific language, and conformance to project-specific language norms reduces issue resolution times. We also provide a tool to calculate project-specific perceived language complexity.
Article Search
Can Automated Pull Requests Encourage Software Developers to Upgrade Out-of-Date Dependencies?
Samim Mirhosseini and Chris Parnin
(North Carolina State University, USA)
Developers neglect to update legacy software dependencies, resulting in buggy and insecure software. One explanation for this neglect is the difficulty of constantly checking for the availability of new software updates, verifying their safety, and addressing any migration efforts needed when upgrading a dependency. Emerging tools attempt to address this problem by introducing automated pull requests and project badges to inform the developer of stale dependencies. To understand whether these tools actually help developers, we analyzed 7,470 GitHub projects that used these notification mechanisms to identify any change in upgrade behavior. Our results find that, on average, projects that use pull request notifications upgraded 1.6x as often as projects that did not use any tools. Badge notifications were slightly less effective: users upgraded 1.4x more frequently. Unfortunately, although pull request notifications are useful, developers are often overwhelmed by notifications: only a third of pull requests were actually merged. Through a survey, 62 developers indicated that their most significant concerns are breaking changes, understanding the implications of changes, and migration effort. The implications of our work suggests ways in which notifications can be improved to better align with developers' expectations and the need for new mechanisms to reduce notification fatigue and improve confidence in automated pull requests.
Article Search
Are Developers Aware of the Architectural Impact of Their Changes?
Matheus Paixao, Jens Krinke, DongGyun Han, Chaiyong Ragkhitwetsagul, and Mark Harman
(University College London, UK)
Although considered one of the most important decisions in a software development lifecycle, empirical evidence on how developers perform and perceive architectural changes is still scarce. Given the large implications of architectural decisions, we do not know whether developers are aware of their changes' impact on the software's architecture, whether awareness leads to better changes, and whether automatically making developers aware would preventdegradation. Therefore, we use code review data of 4 open source systems to investigate the intent and awareness of developers when performing changes. We extracted 8,900 reviews for which the commits are available. 2,152 of the commits have changes in their computed architectural metrics, and 338 present significant changes to the architecture. We manually inspected all reviews for commits with significant changes and found that only in 38% of the time developers are discussing the impact of their changes on the architectural structure, suggesting a lack of awareness. Finally, we observed that developers tend to be more aware of the architectural impact of their changes when the architectural structure is improved, suggesting that developers should be automatically made aware when their changes degrade the architectural structure.
Article Search Info
SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions
Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi
(Bangladesh University of Engineering and Technology, Bangladesh; Southern Illinois University at Carbondale, USA)
Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.
Article Search


Detecting Fragile Comments
Inderjot Kaur Ratol and Martin P. Robillard
(McGill University, Canada)
Refactoring is a common software development practice and many simple refactorings can be performed automatically by tools. Identifier renaming is a widely used refactoring. With tool support, rename refactorings can rely on the program structure to ensure correctness of the code transformation. Unfortunately, the textual references to the renamed identifier present in the unstructured comment text cannot be formally detected through the syntax of the language, and are thus fragile with respect to identifier renaming. We designed a new rule-based approach to detect fragile comments. Our approach, called Fraco, takes into account the type of identifier, its morphology, the scope of the identifier and the location of comments. We evaluated the approach by comparing its precision and recall against hand-annotated benchmarks created for six Java systems and compared the results against the performance of Eclipse's automated in-comment identifier replacement feature. Fraco performed with near-optimal precision and recall on most components of our evaluation dataset, and generally outperformed the baseline Eclipse feature. As part of our evaluation, we also noted that more than half of the total number of identifiers in our data set had fragile comments after renaming, which further motivates the need for research on automatic comment refactoring.
Article Search Info
Improving Software Text Retrieval using Conceptual Knowledge in Source Code
Zeqi Lin, Yanzhen Zou, Junfeng Zhao, and Bing Xie
(Peking University, China)
A large software project usually has lots of various textual learning resources about its API, such as tutorials, mailing lists, user forums, etc. Text retrieval technology allows developers to search these API learning resources for related documents using free-text queries, but it suffers from the lexical gap between search queries and documents. In this paper, we propose a novel re-ranking approach for improving the retrieval of API learning resources through leveraging software-specific conceptual knowledge in software source code. The basic idea behind this approach is that the semantic relatedness between queries and documents could be measured according to software-specific concepts involved in them, and software source code contains a large amount of software-specific conceptual knowledge. In detail, firstly we extract an API graph from software source code and use it as software-specific conceptual knowledge. Then we discover API entities involved in queries and documents, and infer semantic document relatedness through analyzing structural relationships between these API entities. We evaluate our approach in three popular open source software projects. Comparing to the state-of-the-art text retrieval approaches, our approach lead to at least 13.77% improvement with respect to mean average precision (MAP).
Article Search
Automatically Generating Commit Messages from Diffs using Neural Machine Translation
Siyuan Jiang, Ameer Armaly, and Collin McMillan
(University of Notre Dame, USA)
Commit messages are a valuable resource in comprehension of software evolution, since they provide a record of changes such as feature additions and bug repairs. Unfortunately, programmers often neglect to write good commit messages. Different techniques have been proposed to help programmers by automatically writing these messages. These techniques are effective at describing what changed, but are often verbose and lack context for understanding the rationale behind a change. In contrast, humans write messages that are short and summarize the high level rationale. In this paper, we adapt Neural Machine Translation (NMT) to automatically ``translate'' diffs into commit messages. We trained an NMT algorithm using a corpus of diffs and human-written commit messages from the top 1k Github projects. We designed a filter to help ensure that we only trained the algorithm on higher-quality commit messages. Our evaluation uncovered a pattern in which the messages we generate tend to be either very high or very low quality. Therefore, we created a quality-assurance filter to detect cases in which we are unable to produce good messages, and return a warning instead.
Article Search Info
Improving Missing Issue-Commit Link Recovery using Positive and Unlabeled Data
Yan Sun, Celia Chen, Qing Wang, and Barry Boehm
(University at Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China; Occidental College, USA; University of Southern California, USA)
Links between issue reports and corresponding fix commits are widely used in software maintenance. The quality of links directly affects maintenance costs. Currently, such links are mainly maintained by error-prone manual efforts, which may result in missing links. To tackle this problem, automatic link recovery approaches have been proposed by building traditional classifiers with positive and negative links. However, these traditional classifiers may not perform well due to the inherent characteristics of missing links. Positive links, which can be used to build link recovery model, are quite limited as the result of missing links. Since the construction of negative links depends on the number of positive links in many existing approaches, the available negative links also become restricted. In this paper, we point out that it is better to consider the missing link problem as a model learning problem by using positive and unlabeled data, rather than the construction of traditional classifier. We propose PULink, an approach that constructs the link recovery model with positive and unlabeled links. Our experiment results show that compared to existing state-of-the-art technologies built on traditional classifier, PULink can achieve competitive performance by utilizing only 70% positive links that are used in those approaches.
Article Search
APIBot: Question Answering Bot for API Documentation
Yuan Tian, Ferdian Thung, Abhishek Sharma, and David Lo
(Singapore Management University, Singapore)
As the carrier of Application Programming Interfaces (APIs) knowledge, API documentation plays a crucial role in how developers learn and use an API. It is also a valuable information resource for answering API-related questions, especially when developers cannot find reliable answers to their questions online/offline. However, finding answers to API-related questions from API documentation might not be easy because one may have to manually go through multiple pages before reaching the relevant page, and then read and understand the information inside the relevant page to figure out the answers. To deal with this challenge, we develop APIBot, a bot that can answer API questions given API documentation as an input. APIBot is built on top of SiriusQA, the QA system from Sirius, a state of the art intelligent personal assistant. To make SiriusQA work well under software engineering scenario, we make several modifications over SiriusQA by injecting domain specific knowledge. We evaluate APIBot on 92 API questions, answers of which are known to be present in Java 8 documentation. Our experiment shows that APIBot can achieve a Hit@5 score of 0.706.
Article Search
Automatic Summarization of API Reviews
Gias Uddin and Foutse Khomh
(McGill University, Canada; Polytechnique Montréal, Canada)
With the proliferation of online developer forums as informal documentation, developers often share their opinions about the APIs they use. However, given the plethora of opinions available for an API in various online developer forums, it can be challenging for a developer to make informed decisions about the APIs. While automatic summarization of opinions have been explored for other domains (e.g., cameras, cars), we found little research that investigates the benefits of summaries of public API reviews. In this paper, we present two algorithms (statistical and aspect-based) to summarize opinions about APIs. To investigate the usefulness of the techniques, we developed, Opiner, an online opinion summarization engine that presents summaries of opinions using both our proposed techniques and existing six off-the-shelf techniques. We investigated the usefulness of Opiner using two case studies, both involving professional software engineers. We found that developers were interested to use our proposed summaries much more frequently than other summaries (daily vs once a year) and that while combined with Stack Overflow, Opiner helped developers to make the right decision with more accuracy and confidence and in less time.
Article Search

Formal Verification

iCoq: Regression Proof Selection for Large-Scale Verification Projects
Ahmet Celik, Karl Palmskog, and Milos Gligoric
(University of Texas at Austin, USA; University of Illinois at Urbana-Champaign, USA)
Proof assistants such as Coq are used to construct and check formal proofs in many large-scale verification projects. As proofs grow in number and size, the need for tool support to quickly find failing proofs after revising a project increases. We present a technique for large-scale regression proof selection, suitable for use in continuous integration services, e.g., Travis CI. We instantiate the technique in a tool dubbed iCoq . iCoq tracks fine-grained dependencies between Coq definitions, propositions, and proofs, and only checks those proofs affected by changes between two revisions. iCoq additionally saves time by ignoring changes with no impact on semantics. We applied iCoq to track dependencies across many revisions in several large Coq projects and measured the time savings compared to proof checking from scratch and when using Coq's timestamp-based toolchain for incremental checking. Our results show that proof checking with iCoq is up to 10 times faster than the former and up to 3 times faster than the latter.
Article Search
More Effective Interpolations in Software Model Checking
Cong Tian, Zhao Duan, Zhenhua Duan, and C.-H. Luke Ong
(Xidian University, China; University of Oxford, UK)
An approach to CEGAR-based model checking which has proved to be successful on large models employs Craig interpolation to efficiently construct parsimonious abstractions. Following this design, we introduce new applications, universal safety interpolant and existential error interpolant, of Craig interpolation that can systematically reduce the program state space to be explored for safety verification. Whenever the universal safety interpolant is implied by the current path, all paths emanating from that location are guaranteed to be safe. Dually whenever the existential error interpolant is implied by the current path, there is guaranteed to be an unsafe path from the location. We show how these interpolants are computed and applied in safety verification. We have implemented our approach in a tool named InterpChecker by building on an open source software model checker. Experiments on a large number of benchmark programs show that both the interpolations and the auxiliary optimization strategies are effective in improving scalability of software model checking.
Article Search
Proof-Based Coverage Metrics for Formal Verification
Elaheh Ghassabani, Andrew Gacek, Michael W. Whalen, Mats P. E. Heimdahl, and Lucas Wagner
(University of Minnesota, USA; Rockwell Collins, USA)
When using formal verification on critical software, an important question involves whether the properties used for analysis are sufficient to adequately constrain the behavior of an implementation model. To address this question, coverage metrics for property-based formal verification have been proposed. Existing metrics are usually based on mutation, where the implementation model is repeatedly modified and re-analyzed to determine whether mutant models are ``killed'' by the property set. These metrics tend to be very expensive to compute, as they involve many additional verification problems. This paper proposes an alternate family of metrics that can be computed using the recently introduced idea of Inductive Validity Cores (IVCs). IVCs determine a minimal set of model elements necessary to establish a proof. One of the proposed metrics is both rigorous and substantially cheaper to compute than mutation-based metrics. In addition, unlike the mutation-based techniques, the design elements marked as necessary by the metric are guaranteed to preserve provability. We demonstrate the metrics on a large corpus of examples.
Article Search Info
Model Checker Execution Reports
Rodrigo Castaño, Víctor Braberman, Diego Garbervetsky, and Sebastian Uchitel
(University of Buenos Aires, Argentina; CONICET, Argentina)
Software model checking constitutes an undecidable problem and, as such, even an ideal tool will in some cases fail to give a conclusive answer. In practice, software model checkers fail often and usually do not provide any information on what was effectively checked. The purpose of this work is to provide a conceptual framing to extend software model checkers in a way that allows users to access information about incomplete checks. We characterize the information that model checkers themselves can provide, in terms of analyzed traces, i.e. sequences of statements, and safe cones, and present the notion of execution reports (ERs), which we also formalize. We instantiate these concepts for a family of techniques based on Abstract Reachability Trees and implement the approach using the software model checker CPAchecker. We evaluate our approach empirically and provide examples to illustrate the execution reports produced and the information that can be extracted.
Article Search
Modular Verification of Interrupt-Driven Software
Chungha Sung, Markus Kusano, and Chao Wang
(University of Southern California, USA; Virginia Tech, USA)
Interrupts have been widely used in safety-critical computer systems to handle outside stimuli and interact with the hardware, but reasoning about interrupt-driven software remains a difficult task. Although a number of static verification techniques have been proposed for interrupt-driven software, they often rely on constructing a monolithic verification model. Furthermore, they do not precisely capture the complete execution semantics of interrupts such as nested invocations of interrupt handlers. To overcome these limitations, we propose an abstract interpretation framework for static verification of interrupt-driven software that first analyzes each interrupt handler in isolation as if it were a sequential program, and then propagates the result to other interrupt handlers. This iterative process continues until results from all interrupt handlers reach a fixed point. Since our method never constructs the global model, it avoids the up-front blowup in model construction that hampers existing, non-modular, verification techniques. We have evaluated our method on 35 interrupt-driven applications with a total of 22,541 lines of code. Our results show the method is able to quickly and more accurately analyze the behavior of interrupts.
Article Search
BProVe: A Formal Verification Framework for Business Process Models
Flavio Corradini, Fabrizio Fornari, Andrea Polini, Barbara Re, Francesco Tiezzi, and Andrea Vandin
(University of Camerino, Italy; DTU, Denmark)
Business Process Modelling has acquired increasing relevance in software development. Available notations, such as BPMN, permit to describe activities of complex organisations. On the one hand, this shortens the communication gap between domain experts and IT specialists. On the other hand, this permits to clarify the characteristics of software systems introduced to provide automatic support for such activities. Nevertheless, the lack of formal semantics hinders the automatic verification of relevant properties. This paper presents a novel verification framework for BPMN 2.0, called BProVe. It is based on an operational semantics, implemented using MAUDE, devised to make the verification general and effective. A complete tool chain, based on the Eclipse modelling environment, allows for rigorous modelling and analysis of Business Processes. The approach has been validated using more than one thousand models available on a publicly accessible repository. Besides showing the performance of BProVe, this validation demonstrates its practical benefits in identifying correctness issues in real models.
Article Search


Static Detection of Asymptotic Resource Side-Channel Vulnerabilities in Web Applications
Jia Chen, Oswaldo Olivo, Isil Dillig, and Calvin Lin
(University of Texas at Austin, USA)
Web applications can leak confidential user information due to the presence of unintended side-channel vulnerabilities in code. One particularly subtle class of side-channel vulnerabilities arises due to resource usage imbalances along different execution paths of a program. Such side-channel vulnerabilities are especially severe if the resource usage imbalance is asymptotic. In particular, if the resource usage is constant along one branch but asymptotically dependent on user input along another branch, the attacker can arbitrarily inflate the program's input to amplify differences in resource usage, with the goal of inferring confidential user data. This paper formalizes the notion of asymptotic resource side-channels and presents a lightweight static analysis algorithm for automatically detecting such vulnerabilities in web applications. Based on these ideas, we have developed a tool called SCANNER for detecting resource-related side-channel vulnerabilities in PHP applications. SCANNER has found 18 zero-day security vulnerabilities in 10 different web applications and reports only 2 false positives. The vulnerabilities uncovered by SCANNER can be exploited using cross-site search attacks to extract various kinds of confidential information, such as a user's medications or purchase history.
Article Search
PAD: Programming Third-Party Web Advertisement Censorship
Weihang Wang, Yonghwi Kwon, Yunhui Zheng, Yousra Aafer, I.-Luk Kim, Wen-Chuan Lee, Yingqi Liu, Weijie Meng, Xiangyu Zhang, and Patrick Eugster
(Purdue University, USA; IBM Research, USA; TU Darmstadt, Germany)
In the current online advertisement delivery, an ad slot on a publisher's website may go through multiple layers of bidding and reselling until the final ad content is delivered. The publishers have little control on the ads being displayed on their web pages. As a result, website visitors may suffer from unwanted ads such as malvertising, intrusive ads, and information disclosure ads. Unfortunately, the visitors often blame the publisher for their unpleasant experience and switch to competitor websites. In this paper, we propose a novel programming support system for ad delivery, called PAD, for publisher programmers, who specify their policies on regulating third-party ads shown on their websites. PAD features an expressive specification language and a novel persistent policy enforcement runtime that can self-install and self-protect throughout the entire ad delegation chain. It also provides an ad-specific memory protection scheme that prevents malvertising by corrupting malicious payloads. Our experiments show that PAD has negligible runtime overhead. It effectively suppresses a set of malvertising cases and unwanted ad behaviors reported in the real world, without affecting normal functionalities and regular ads.
Article Search
All about Activity Injection: Threats, Semantics, and Detection
Sungho Lee, Sungjae Hwang, and Sukyoung Ryu
(KAIST, South Korea; LG Electronics, South Korea)
Android supports seamless user experience by maintaining activities from different apps in the same activity stack. While such close inter-app communication is essential in the Android framework, the powerful inter-app communication contains vulnerabilities that can inject malicious activities into a victim app's activity stack to hijack user interaction flows. In this paper, we demonstrate activity injection attacks with a simple malware, and formally specify the activity activation mechanism using operational semantics. Based on the operational semantics, we develop a static analysis tool, which analyzes Android apps to detect activity injection attacks. Our tool is fast enough to analyze real-world Android apps in 6~seconds on average, and our experiments found that 1,761 apps out of 129,756 real-world Android apps inject their activities into other apps' tasks.
Article Search
Detecting Information Flow by Mutating Input Data
Björn Mathis, Vitalii Avdiienko, Ezekiel O. Soremekun, Marcel Böhme, and Andreas Zeller
(Saarland University, Germany; National University of Singapore, Singapore)
Analyzing information flow is central in assessing the security of applications. However, static and dynamic analyses of information flow are easily challenged by non-available or obscure code. We present a lightweight mutation-based analysis that systematically mutates dynamic values returned by sensitive sources to assess whether the mutation changes the values passed to sensitive sinks. If so, we found a flow between source and sink. In contrast to existing techniques, mutation-based flow analysis does not attempt to identify the specific path of the flow and is thus resilient to obfuscation. In its evaluation, our MUTAFLOW prototype for Android programs showed that mutation-based flow analysis is a lightweight yet effective complement to existing tools. Compared to the popular FLOWDROID static analysis tool, MUTAFLOW requires less than 10% of source code lines but has similar accuracy; on 20 tested real-world apps, it is able to detect 75 flows that FLOWDROID misses.
Article Search
Automatically Assessing Crashes from Heap Overflows
Liang He, Yan Cai, Hong Hu, Purui Su, Zhenkai Liang, Yi Yang, Huafeng Huang, Jia Yan, Xiangkun Jia, and Dengguo Feng
(Institute of Software at Chinese Academy of Sciences, China; National University of Singapore, Singapore)
Heap overflow is one of the most widely exploited vulnerabilities, with a large number of heap overflow instances reported every year. It is important to decide whether a crash caused by heap overflow can be turned into an exploit. Efficient and effective assessment of exploitability of crashes facilitates to identify severe vulnerabilities and thus prioritize resources. In this paper, we propose the first metrics to assess heap overflow crashes based on both the attack aspect and the feasibility aspect. We further present HCSIFTER, a novel solution to automatically assess the exploitability of heap overflow instances under our metrics. Given a heap-based crash, HCSIFTER accurately detects heap overflows through dynamic execution without any source code or debugging information. Then it uses several novel methods to extract program execution information needed to quantify the severity of the heap overflow using our metrics. We have implemented a prototype HCSIFTER and applied it to assess nine programs with heap overflow vulnerabilities. HCSIFTER successfully reports that five heap overflow vulnerabilities are highly exploitable and two overflow vulnerabilities are unlikely exploitable. It also gave quantitatively assessments for other two programs. On average, it only takes about two minutes to assess one heap overflow crash. The evaluation result demonstrates both effectiveness and efficiency of HCSIFTER.
Article Search
Learning to Share: Engineering Adaptive Decision-Support for Online Social Networks
Yasmin Rafiq, Luke Dickens, Alessandra Russo, Arosha K. Bandara, Mu Yang, Avelie Stuart, Mark Levine, Gul Calikli, Blaine A. Price, and Bashar Nuseibeh
(Imperial College London, UK; University College London, UK; Open University, UK; University of Southampton, UK; University of Exeter, UK; Chalmers University of Technology, Sweden; University of Gothenburg, Sweden; Lero, Ireland)

Some online social networks (OSNs) allow users to define friendship-groups as reusable shortcuts for sharing information with multiple contacts. Posting exclusively to a friendship-group gives some privacy control, while supporting communication with (and within) this group. However, recipients of such posts may want to reuse content for their own social advantage, and can bypass existing controls by copy-pasting into a new post; this cross-posting poses privacy risks.

This paper presents a learning to share approach that enables the incorporation of more nuanced privacy controls into OSNs. Specifically, we propose a reusable, adaptive software architecture that uses rigorous runtime analysis to help OSN users to make informed decisions about suitable audiences for their posts. This is achieved by supporting dynamic formation of recipient-groups that benefit social interactions while reducing privacy risks. We exemplify the use of our approach in the context of Facebook.

Article Search

Mobile Development

UI Driven Android Application Reduction
Jianjun Huang, Yousra Aafer, David Perry, Xiangyu Zhang, and Chen Tian
(Purdue University, USA; Huawei, USA)
While smartphones and mobile apps have been an integral part of our life, modern mobile apps tend to contain a lot of rarely used functionalities. For example, applications contain advertisements and offer extra features such as recommended news stories in weather apps. While these functionalities are not essential to an app, they nonetheless consume power, CPU cycles and bandwidth. In this paper, we design a UI driven approach that allows customizing an Android app by removing its unwanted functionalities. In particular, our technique displays the UI and allows the user to select elements denoting functionalities that she wants to remove. Using this information, our technique automatically removes all the code elements related to the selected functionalities, including all the relevant background tasks. The underlying analysis is a type system, in which each code element is tagged with a type indicating if it should be removed. From the UI hints, our technique infers types for all other code elements and reduces the app accordingly. We implement a prototype and evaluate it on 10 real-world Android apps. The results show that our approach can accurately discover the removable code elements and lead to substantial resource savings in the reduced apps.
Article Search
SimplyDroid: Efficient Event Sequence Simplification for Android Application
Bo Jiang, Yuxuan Wu, Teng Li, and W. K. Chan
(Beihang University, China; City University of Hong Kong, China)
To ensure the quality of Android applications, many automatic test case generation techniques have been proposed. Among them, the Monkey fuzz testing tool and its variants are simple, effective and widely applicable. However, one major drawback of the family of Monkey tools is that they often generate a large number of events in a failure-inducing input trace, which makes the follow-up debugging activities hard to apply. It is desirable to simplify or reduce the input event sequence while triggering the same failure. In this paper, we propose an efficient event trace representation and the SimplyDroid tool with three hierarchical delta-debugging algorithms each operating on this trace representation to simplify crash traces. We have evaluated SimplyDroid on a suite of real-life Android applications with 92 crash traces. The empirical result shows that our new algorithms in SimplyDroid are both efficient and effective in reducing these event traces.
Article Search
Automated Cross-Platform Inconsistency Detection for Mobile Apps
Mattia Fazzini and Alessandro Orso
(Georgia Institute of Technology, USA)
Testing of Android apps is particularly challenging due to the fragmentation of the Android ecosystem in terms of both devices and operating system versions. Developers must in fact ensure not only that their apps behave as expected, but also that the apps’ behavior is consistent across platforms. To support this task, we propose DiffDroid, a new technique that helps developers automatically find cross-platform inconsistencies (CPIs) in mobile apps. DiffDroid combines input generation and differential testing to compare the behavior of an app on different platforms and identify possible inconsistencies. Given an app, DiffDroid (1) generates test inputs for the app, (2) runs the app with these inputs on a reference device and builds a model of the app behavior, (3) runs the app with the same inputs on a set of other devices, and (4) compares the behavior of the app on these different devices with the model of its behavior on the reference device. We implemented DiffDroid and performed an evaluation of our approach on 5 benchmarks and over 130 platforms. Our results show that DiffDroid can identify CPIs on real apps efficiently and with a limited number of false positives. DiffDroid and our experimental infrastructure are publicly available.
Article Search

Binary Analysis

In-Memory Fuzzing for Binary Code Similarity Analysis
Shuai Wang and Dinghao Wu
(Pennsylvania State University, USA)
Detecting similar functions in binary executables serves as a foundation for many binary code analysis and reuse tasks. By far, recognizing similar components in binary code remains a challenge. Existing research employs either static or dynamic approaches to capture program syntax or semantics-level features for comparison. However, there exist multiple design limitations in previous work, which result in relatively high cost, low accuracy and scalability, and thus severely impede their practical use. In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis. Our prototype tool IMF-SIM applies in-memory fuzzing to launch analysis towards every function and collect traces of different kinds of program behaviors. The similarity score of two behavior traces is computed according to their longest common subsequence. To compare two functions, a feature vector is generated, whose elements are the similarity scores of the behavior trace-level comparisons. We train a machine learning model through labeled feature vectors; later, for a given feature vector by comparing two functions, the trained model gives a final score, representing the similarity score of the two functions. We evaluate IMF-SIM against binaries compiled by different compilers, optimizations, and commonly-used obfuscation methods, in total over one thousand binary executables. Our evaluation shows that IMF-SIM notably outperforms existing tools with higher accuracy and broader application scopes.
Article Search
DSIbin: Identifying Dynamic Data Structures in C/C++ Binaries
Thomas Rupprecht, Xi Chen, David H. White, Jan H. Boockmann, Gerald Lüttgen, and Herbert Bos
(University of Bamberg, Germany; Microsoft, Canada; VU University Amsterdam, Netherlands)
Reverse engineering binary code is notoriously difficult and, especially, understanding a binary’s dynamic data structures. Existing data structure analyzers are limited wrt. program comprehension: they do not detect complex structures such as skip lists, or lists running through nodes of different types such as in the Linux kernel’s cyclic doubly-linked list. They also do not reveal complex parent-child relationships between structures. The tool DSI remedies these shortcomings but requires source code, where type information on heap nodes is available. We present DSIbin, a combination of DSI and the type excavator Howard for the inspection of C/C++ binaries. While a naive combination already improves upon related work, its precision is limited because Howard’s inferred types are often too coarse. To address this we auto-generate candidates of refined types based on speculative nested-struct detection and type merging; the plausibility of these hypotheses is then validated by DSI. We demonstrate via benchmarking that DSIbin detects data structures with high precision.
Article Search
Towards Robust Instruction-Level Trace Alignment of Binary Code
Ulf Kargén and Nahid Shahmehri
(Linköping University, Sweden)
Program trace alignment is the process of establishing a correspondence between dynamic instruction instances in executions of two semantically similar but syntactically different programs. In this paper we present what is, to the best of our knowledge, the first method capable of aligning realistically long execution traces of real programs. To maximize generality, our method works entirely on the machine code level, i.e. it does not require access to source code. Moreover, the method is based entirely on dynamic analysis, which avoids the many challenges associated with static analysis of binary code, and which additionally makes our approach inherently resilient to e.g. static code obfuscation. Therefore, we believe that our trace alignment method could prove to be a useful aid in many program analysis tasks, such as debugging, reverse-engineering, investigating plagiarism, and malware analysis. We empirically evaluate our method on 11 popular Linux programs, and show that it is capable of producing meaningful alignments in the presence of various code transformations such as optimization or obfuscation, and that it easily scales to traces with tens of millions of instructions.
Article Search
Testing Intermediate Representations for Binary Analysis
Soomin Kim, Markus Faerevaag, Minkyu Jung, SeungIl Jung, DongYeop Oh, JongHyup Lee, and Sang Kil Cha
(KAIST, South Korea; Gachon University, South Korea)
Binary lifting, which is to translate a binary executable to a high-level intermediate representation, is a primary step in binary analysis. Despite its importance, there are only few existing approaches to testing the correctness of binary lifters. Furthermore, the existing approaches suffer from low test coverage, because they largely depend on random test case generation. In this paper, we present the design and implementation of the first systematic approach to testing binary lifters. We have evaluated the proposed system on 3 state-of-the-art binary lifters, and found 24 previously unknown semantic bugs. Our result demonstrates that writing a precise binary lifter is extremely difficult even for those heavily tested projects.
Article Search

From Failures to Faults

Comprehensive Failure Characterization
Mitchell J. Gerrard and Matthew B. Dwyer
(University of Nebraska-Lincoln, USA)
There is often more than one way to trigger a fault. Standard static and dynamic approaches focus on exhibiting a single witness for a failing execution. In this paper, we study the problem of computing a comprehensive characterization which safely bounds all failing program behavior while exhibiting a diversity of witnesses for those failures. This information can be used to facilitate software engineering tasks ranging from fault localization and repair to quantitative program analysis for reliability. Our approach combines the results of overapproximating and underapproximating static analyses in an alternating iterative framework to produce upper and lower bounds on the failing input space of a program, which we call a comprehensive failure characterization (CFC). We evaluated a prototype implementation of this alternating framework on a set of 168 C programs from the SVCOMP benchmarks, and the data indicate that it is possible to efficiently, accurately, and safely characterize failure spaces.
Article Search Info
TrEKer: Tracing Error Propagation in Operating System Kernels
Nicolas Coppik, Oliver Schwahn, Stefan Winter, and Neeraj Suri
(TU Darmstadt, Germany)
Modern operating systems (OSs) consist of numerous interacting components, many of which are developed and maintained independently of one another. In monolithic systems, the boundaries of and interfaces between such components are not strictly enforced at runtime. Therefore, faults in individual components may directly affect other parts of the system in various ways. Software fault injection (SFI) is a testing technique to assess the resilience of a software system in the presence of faulty components. Unfortunately, SFI tests of OSs are inconclusive if they do not lead to observable failures, as corruptions of the internal software state may not be visible at its interfaces and, yet, affect the subsequent execution of the OS beyond the duration of the test. In this paper we present TrEKer, a fully automated approach for identifying how faulty OS components affect other parts of the system. TrEKer combines static and dynamic analyses to achieve efficient tracing on the granularity of memory accesses. We demonstrate TrEKer's ability to support SFI oracles by accurately tracing the effects of faults injected into three widely used Linux kernel modules.
Article Search
RuntimeSearch: Ctrl+F for a Running Program
Matúš Sulír and Jaroslav Porubän
(Technical University of Košice, Slovakia)
Developers often try to find occurrences of a certain term in a software system. Traditionally, a text search is limited to static source code files. In this paper, we introduce a simple approach, RuntimeSearch, where the given term is searched in the values of all string expressions in a running program. When a match is found, the program is paused and its runtime properties can be explored with a traditional debugger. The feasibility and usefulness of RuntimeSearch is demonstrated on a medium-sized Java project.
Article Search

Program Comprehension

Mining Implicit Design Templates for Actionable Code Reuse
Yun Lin, Guozhu Meng, Yinxing Xue, Zhenchang Xing, Jun Sun, Xin Peng, Yang Liu, Wenyun Zhao, and Jinsong Dong
(National University of Singapore, Singapore; Nanyang Technological University, Singapore; Australian National University, Australia; Singapore University of Technology and Design, Singapore; Fudan University, China; Griffith University, Australia)
In this paper, we propose an approach to detecting project-specific recurring designs in code base and abstracting them into design templates as reuse opportunities. The mined templates allow programmers to make further customization for generating new code. The generated code involves the code skeleton of recurring design as well as the semi-implemented code bodies annotated with comments to remind programmers of necessary modification. We implemented our approach as an Eclipse plugin called MICoDe. We evaluated our approach with a reuse simulation experiment and a user study involving 16 participants. The results of our simulation experiment on 10 open source Java projects show that, to create a new similar feature with a design template, (1) on average 69% of the elements in the template can be reused and (2) on average 60% code of the new feature can be adopted from the template. Our user study further shows that, compared to the participants adopting the copy-paste-modify strategy, the ones using MICoDe are more effective to understand a big design picture and more efficient to accomplish the code reuse task.
Article Search Video
Exploring Regular Expression Comprehension
Carl Chapman, Peipei Wang, and Kathryn T. Stolee
(Sandia National Laboratories, USA; North Carolina State University, USA)
The regular expression (regex) is a powerful tool employed in a large variety of software engineering tasks. However, prior work has shown that regexes can be very complex and that it could be difficult for developers to compose and understand them. This work seeks to identify code smells that impact comprehension. We conduct an empirical study on 42 of pairs of behaviorally equivalent but syntactically different regexes using 180 participants and evaluated the understandability of various regex language features. We further analyzed regexes in GitHub to find the community standards or the common usages of various features. We found that some regex expression representations are more understandable than others. For example, using a range (e.g., [0-9]) is often more understandable than a default character class (e.g., [d]). We also found that the DFA size of a regex significantly affects comprehension for the regexes studied. The larger the DFA of a regex (up to size eight), the more understandable it was. Finally, we identify smelly and non-smelly regex representations based on a combination of community standards and understandability metrics.
Article Search Info
Automatically Assessing Code Understandability: How Far Are We?
Simone Scalabrino, Gabriele Bavota, Christopher Vendome, Mario Linares-Vásquez, Denys Poshyvanyk, and Rocco Oliveto
(University of Molise, Italy; University of Lugano, Switzerland; College of William and Mary, USA; Universidad de los Andes, Colombia)
Program understanding plays a pivotal role in software maintenance and evolution: a deep understanding of code is the stepping stone for most software-related activities, such as bug fixing or testing. Being able to measure the understandability of a piece of code might help in estimating the effort required for a maintenance activity, in comparing the quality of alternative implementations, or even in predicting bugs. Unfortunately, there are no existing metrics specifically designed to assess the understandability of a given code snippet. In this paper, we perform a first step in this direction, by studying the extent to which several types of metrics computed on code, documentation, and developers correlate with code understandability. To perform such an investigation we ran a study with 46 participants who were asked to understand eight code snippets each. We collected a total of 324 evaluations aiming at assessing the perceived understandability, the actual level of understanding, and the time needed to understand a code snippet. Our results demonstrate that none of the (existing and new) metrics we considered is able to capture code understandability, not even the ones assumed to assess quality attributes strongly related with it, such as code readability and complexity.
Article Search Info
Improved Query Reformulation for Concept Location using CodeRank and Document Structures
Mohammad Masudur Rahman and Chanchal K. Roy
(University of Saskatchewan, Canada)
During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique --ACER-- that takes an initial query, identifies appropriate search terms from the source code using a novel term weight --CodeRank, and then suggests effective reformulation to the initial query by exploiting the source document structures, query quality analysis and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.
Article Search Info
Understanding Feature Requests by Leveraging Fuzzy Method and Linguistic Analysis
Lin Shi, Celia Chen, Qing Wang, Shoubin Li, and Barry Boehm
(Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; University of Southern California, USA; Occidental College, USA)
In open software development environment, a large number of feature requests with mixed quality are often posted by stakeholders and usually managed in issue tracking systems. Thoroughly understanding and analyzing the real intents that feature requests imply is a labor-intensive and challenging task. In this paper, we introduce an approach to understand feature requests automatically. We generate a set of fuzzy rules based on natural language processing techniques that classify each sentence in feature requests into a set of categories: Intent, Explanation, Benefit, Drawback, Example and Trivia. Consequently, the feature requests can be automatically structured based on the classification results. We conduct experiments on 2,112 sentences taken from 602 feature requests of nine popular open source projects. The results show that our method can reach a high performance on classifying sentences from feature requests. Moreover, when applying fuzzy rules on machine learning methods, the performance can be improved significantly.
Article Search Info


O2O Service Composition with Social Collaboration
Wenyi Qian, Xin Peng, Jun Sun, Yijun Yu, Bashar Nuseibeh, and Wenyun Zhao
(Fudan University, China; Singapore University of Technology and Design, Singapore; Open University, UK; Lero, Ireland)
In Online-to-Offline (O2O) commerce, customer services may need to be composed from online and offline services. Such composition is challenging, as it requires effective selection of appropriate services that, in turn, support optimal combination of both online and offline services. In this paper, we address this challenge by proposing an approach to O2O service composition which combines offline route planning and social collaboration to optimize service selection. We frame general O2O service composition problems using timed automata and propose an optimization procedure that incorporates: (1) a Markov Chain Monte Carlo (MCMC) algorithm to stochastically select a concrete composite service, and (2) a model checking approach to searching for an optimal collaboration plan with the lowest cost given certain time constraint. Our procedure has been evaluated using the simulation of a rich scenario on effectiveness and scalability.
Article Search
Gremlin-ATL: A Scalable Model Transformation Framework
Gwendal Daniel, Frédéric Jouault, Gerson Sunyé, and Jordi Cabot
(AtlanMod, France; Inria, France; Groupe ESEO, France; ICREA, Spain; Open University of Catalonia, Spain)
Industrial use of Model Driven Engineering techniques has emphasized the need for efficiently store, access, and transform very large models. While scalable persistence frameworks, typically based on some kind of NoSQL database, have been proposed to solve the model storage issue, the same level of performance improvement has not been achieved for the model transformation problem. Existing model transformation tools (such as the well-known ATL) often require the input models to be loaded in memory prior to the start of the transformation and are not optimized to benefit from lazy-loading mechanisms, mainly due to their dependency on current low-level APIs offered by the most popular modeling frameworks nowadays. In this paper we present Gremlin-ATL, a scalable and efficient model-to-model transformation framework that translates ATL transformations into Gremlin, a query language supported by several NoSQL databases. With Gremlin-ATL, the transformation is computed within the database itself, bypassing the modeling framework limitations and improving its performance both in terms of execution time and memory consumption. Tool support is available online.
Article Search
Diagnosing Assumption Problems in Safety-Critical Products
Mona Rahimi, Wandi Xiong, Jane Cleland-Huang, and Robyn Lutz
(DePaul University, USA; Iowa State University, USA; University of Notre Dame, USA)
Problems with the correctness and completeness of environmental assumptions contribute to many accidents in safety-critical systems. The problem is exacerbated when products are modified in new releases or in new products of a product line. In such cases existing sets of environmental assumptions are often carried forward without sufficiently rigorous analysis. This paper describes a new technique that exploits the traceability required by many certifying bodies to reason about the likelihood that environmental assumptions are omitted or incorrectly retained in new products. An analysis of over 150 examples of environmental assumptions in historical systems informs the approach. In an evaluation on three safety-related product lines the approach caught all but one of the assumption-related problems. It also provided clearly defined steps for mitigating the identified issues. The contribution of the work is to arm the safety analyst with useful information for assessing the validity of environmental assumptions for a new product.
Article Search
Software Performance Self-Adaptation through Efficient Model Predictive Control
Emilio Incerto, Mirco Tribastone, and Catia Trubiani
(Gran Sasso Science Institute, Italy; IMT School for Advanced Studies Lucca, Italy)
A key challenge in software systems that are exposed to runtime variabilities, such as workload fluctuations and service degradation, is to continuously meet performance requirements. In this paper we present an approach that allows performance self-adaptation using a system model based on queuing networks (QNs), a well-assessed formalism for software performance engineering. Software engineers can select the adaptation knobs of a QN (routing probabilities, service rates, and concurrency level) and we automatically derive a Model Predictive Control (MPC) formulation suitable to continuously configure the selected knobs and track the desired performance requirements. Previous MPC approaches have two main limitations: i) high computational cost of the optimization, due to nonlinearity of the models; ii) focus on long-run performance metrics only, due to the lack of tractable representations of the QN's time-course evolution. As a consequence, these limitations allow adaptations with coarse time granularities, neglecting the system's transient behavior. Our MPC adaptation strategy is efficient since it is based on mixed integer programming, which uses a compact representation of a QN with ordinary differential equations. An extensive evaluation on an implementation of a load balancer demonstrates the effectiveness of the adaptation and compares it with traditional methods based on probabilistic model checking.
Article Search
Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis
Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian Kästner, Akshay Patel, and Yuvraj Agarwal
(Carnegie Mellon University, USA; Bauhaus-University Weimar, Germany)
Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.
Article Search Info

Reliability and Bugs

A Comprehensive Study of Real-World Numerical Bug Characteristics
Anthony Di Franco, Hui Guo, and Cindy Rubio-González
(University of California at Davis, USA)
Numerical software is used in a wide variety of applications including safety-critical systems, which have stringent correctness requirements, and whose failures have catastrophic consequences that endanger human life. Numerical bugs are known to be particularly difficult to diagnose and fix, largely due to the use of approximate representations of numbers such as floating point. Understanding the characteristics of numerical bugs is the first step to combat them more effectively. In this paper, we present the first comprehensive study of real-world numerical bugs. Specifically, we identify and carefully examine 269 numerical bugs from five widely-used numerical software libraries: NumPy, SciPy, LAPACK, GNU Scientific Library, and Elemental. We propose a categorization of numerical bugs, and discuss their frequency, symptoms and fixes. Our study opens new directions in the areas of program analysis, testing, and automated program repair of numerical software, and provides a collection of real-world numerical bugs.
Article Search
A Comprehensive Study on Real World Concurrency Bugs in Node.js
Jie Wang, Wensheng Dou, Yu Gao, Chushu Gao, Feng Qin, Kang Yin, and Jun Wei
(Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; Ohio State University, USA)
Node.js becomes increasingly popular in building server-side JavaScript applications. It adopts an event-driven model, which supports asynchronous I/O and non-deterministic event processing. This asynchrony and non-determinism can introduce intricate concurrency bugs, and leads to unpredictable behaviors. An in-depth understanding of real world concurrency bugs in Node.js applications will significantly promote effective techniques in bug detection, testing and fixing for Node.js. In this paper, we present NodeCB, a comprehensive study on real world concurrency bugs in Node.js applications. Specifically, we have carefully studied 57 real bug cases from open-source Node.js applications, and have analyzed their bug characteristics, e.g., bug patterns and root causes, bug impacts, bug manifestation, and fix strategies. Through this study, we obtain several interesting findings, which may open up many new research directions in combating concurrency bugs in Node.js. For example, one finding is that two thirds of the bugs are caused by atomicity violation. However, due to lack of locks and transaction mechanism, Node.js cannot easily express and guarantee the atomic intention.
Article Search

Source Code Analysis

Generating Simpler AST Edit Scripts by Considering Copy-and-Paste
Yoshiki Higo, Akio Ohtani, and Shinji Kusumoto
(Osaka University, Japan)
In software development, there are many situations in which developers need to understand given source code changes in detail. Until now, a variety of techniques have been proposed to support understanding source code changes. Tree-based differencing techniques are expected to have better understandability than text-based ones, which are widely used nowadays (e.g., diff in Unix). In this paper, we propose to consider copy-and-paste as a kind of editing action forming tree-based edit script, which is an editing sequence that transforms a tree to another one. Software developers often perform copy-and-paste when they are writing source code. Introducing copy-and-paste action into edit script contributes to not only making simpler (more easily understandable) edit scripts but also making edit scripts closer to developers' actual editing sequences. We conducted experiments on an open dataset. As a result, we confirmed that our technique made edit scripts shorter for 18% of the code changes with a little more computational time. For the other 82% code changes, our technique generated the same edit scripts as an existing technique. We also confirmed that our technique provided more helpful visualizations. Keywords: Understanding code changes, Edit scripts, Copy-and-paste
Article Search
Renaming and Shifted Code in Structured Merging: Looking Ahead for Precision and Performance
Olaf Leßenich, Sven Apel, Christian Kästner, Georg Seibt, and Janet Siegmund
(University of Passau, Germany; Carnegie Mellon University, USA)
Diffing and merging of source-code artifacts is an essential task when integrating changes in software versions. While state-of-the-art line-based tools (e.g., git merge) are fast and independent of the programming language used, they have only a low precision. Recently, it has been shown that the precision of merging can be substantially improved by using a language- aware, structured approach that works on abstract syntax trees. But, precise structured merging is NP hard, especially, when considering the notoriously difficult scenarios of renamings and shifted code. To address these scenarios without compromising scalability, we propose a syntax-aware, heuristic optimization for structured merging that employs a lookahead mechanism during tree matching. The key idea is that renamings and shifted code are not arbitrarily distributed but their occurrence follows patterns, which we address with a syntax-specific lookahead. Our experiments with 48 real-world open-source projects (4878 merge scenarios with over 400 million lines of code) demonstrate that we can significantly improve matching precision in 28 percent while maintaining performance.
Article Search
Semantics-Assisted Code Review: An Efficient Toolchain and a User Study
Massimiliano Menarini, Yan Yan, and William G. Griswold
(University of California at San Diego, USA)
Code changes are often reviewed before they are deployed. Popular source control systems aid code review by presenting textual differences between old and new versions of the code, leaving developers with the difficult task of determining whether the differences actually produced the desired behavior. Fortunately, we can mine such information from code repositories. We propose aiding code review with inter-version semantic differential analysis. During review of a new commit, a developer is presented with summaries of both code differences and behavioral differences, which are expressed as diffs of likely invariants extracted by running the system's test cases. As a result, developers can more easily determine that the code changes produced the desired effect. We created an invariant-mining tool chain, GETTY, to support our concept of semantically-assisted code review. To validate our approach, 1) we applied GETTY to the commits of 6 popular open source projects, 2) we assessed the performance and cost of running GETTY in different configurations, and 3) we performed a comparative user study with 18 developers. Our results demonstrate that semantically-assisted code review is feasible, effective, and that real programmers can leverage it to improve the quality of their reviews.
Article Search
Detecting Unknown Inconsistencies in Web Applications
Frolin S. Ocariza, Jr., Karthik Pattabiraman, and Ali Mesbah
(University of British Columbia, Canada)
Although there has been increasing demand for more reliable web applications, JavaScript bugs abound in web applications. In response to this issue, researchers have proposed automated fault detection tools, which statically analyze the web application code to find bugs. While useful, these tools either only target a limited set of bugs based on predefined rules, or they do not detect bugs caused by cross-language interactions, which occur frequently in web application code. To address this problem, we present an anomaly-based inconsistency detection approach, implemented in a tool called Holocron. The main novelty of our approach is that it does not look for hard-coded inconsistency classes. Instead, it applies subtree pattern matching to infer inconsistency classes and association rule mining to detect inconsistencies that occur both within a single language, and between two languages. We evaluated Holocron, and it successfully detected 51 previously unreported inconsistencies - including 18 bugs and 33 code smells - in 12 web applications.
Article Search
Why and How JavaScript Developers Use Linters
Kristín Fjóla Tómasdóttir, Maurício Aniche, and Arie van Deursen
(Delft University of Technology, Netherlands)
Automatic static analysis tools help developers to automatically spot code issues in their software. They can be of extreme value in languages with dynamic characteristics, such as JavaScript, where developers can easily introduce mistakes which can go unnoticed for a long time, e.g., a simple syntactic or spelling mistake. Although research has already shown how developers perceive such tools for strongly-typed languages such as Java, little is known about their perceptions when it comes to dynamic languages. In this paper, we investigate what motivates and how developers make use of such tools in JavaScript projects. To that goal, we apply a qualitative research method to conduct and analyze a series of 15 interviews with developers responsible for the linter configuration in reputable OSS JavaScript projects that apply the most commonly used linter, ESLint. The results describe the benefits that developers obtain when using ESLint, the different ways one can configure the tool and prioritize its rules, and the existing challenges in applying linters in the real world. These results have direct implications for developers, tool makers, and researchers, such as tool improvements, and a research agenda that aims to increase our knowledge about the usefulness of such analyzers.
Article Search

Symbolic Execution

Automatic Testing of Symbolic Execution Engines via Program Generation and Differential Testing
Timotej Kapus and Cristian Cadar
(Imperial College London, UK)
Symbolic execution has attracted significant attention in recent years, with applications in software testing, security, networking and more. Symbolic execution tools, like CREST, KLEE, FuzzBALL, and Symbolic PathFinder, have enabled researchers and practitioners to experiment with new ideas, scale the technique to larger applications and apply it to new application domains. Therefore, the correctness of these tools is of critical importance. In this paper, we present our experience extending compiler testing techniques to find errors in both the concrete and symbolic execution components of symbolic execution engines. The approach used relies on a novel way to create program versions, in three different testing modes---concrete, single-path and multi-path---each exercising different features of symbolic execution engines. When combined with existing program generation techniques and appropriate oracles, this approach enables differential testing within a single symbolic execution engine. We have applied our approach to the KLEE, CREST and FuzzBALL symbolic execution engines, where it has discovered 20 different bugs exposing a variety of important errors having to do with the handling of structures, division, modulo, casting, vector instructions and more, as well as issues related to constraint solving, compiler optimisations and test input replay.
Article Search
Floating-Point Symbolic Execution: A Case Study in N-Version Programming
Daniel Liew, Daniel Schemmel, Cristian Cadar, Alastair F. Donaldson, Rafael Zähl, and Klaus Wehrle
(Imperial College London, UK; RWTH Aachen University, Germany)
Symbolic execution is a well-known program analysis technique for testing software, which makes intensive use of constraint solvers. Recent support for floating-point constraint solving has made it feasible to support floating-point reasoning in symbolic execution tools. In this paper, we present the experience of two research teams that independently added floating-point support to KLEE, a popular symbolic execution engine. Since the two teams independently developed their extensions, this created the rare opportunity to conduct a rigorous comparison between the two implementations, essentially a modern case study on N-version programming. As part of our comparison, we report on the different design and implementation decisions taken by each team, and show their impact on a rigorously assembled and tested set of benchmarks, itself a contribution of the paper.
Article Search
Rethinking Pointer Reasoning in Symbolic Execution
Emilio Coppa, Daniele Cono D’Elia, and Camil Demetrescu
(Sapienza University of Rome, Italy)
Symbolic execution is a popular program analysis technique that allows seeking for bugs by reasoning over multiple alternative execution states at once. As the number of states to explore may grow exponentially, a symbolic executor may quickly run out of space. For instance, a memory access to a symbolic address may potentially reference the entire address space, leading to a combinatorial explosion of the possible resulting execution states. To cope with this issue, state-of-the-art executors concretize symbolic addresses that span memory intervals larger than some threshold. Unfortunately, this could result in missing interesting execution states, e.g., where a bug arises. In this paper we introduce MEMSIGHT, a new approach to symbolic memory that reduces the need for concretization, hence offering the opportunity for broader state explorations and more precise pointer reasoning. Rather than mapping address instances to data as previous tools do, our technique maps symbolic address expressions to data, maintaining the possible alternative states resulting from the memory referenced by a symbolic address in a compact, implicit form. A preliminary experimental investigation on prominent benchmarks from the DARPA Cyber Grand Challenge shows that MEMSIGHT enables the exploration of states unreachable by previous techniques.
Article Search Info
Leveraging Abstract Interpretation for Efficient Dynamic Symbolic Execution
Eman Alatawi, Harald Søndergaard, and Tim Miller
(University of Melbourne, Australia)
Dynamic Symbolic Execution (DSE) is a technique to automatically generate test inputs by executing a program with concrete and symbolic values simultaneously. A key challenge in DSE is scalability; executing all feasible program paths is not possible, owing to the potentially exponential or infinite number of paths. Loops are a main source of path explosion, in particular where the number of iterations depends on a program's input. Problems arise because DSE maintains symbolic values that capture only the dependencies on symbolic inputs. This ignores control dependencies, including loop dependencies that depend indirectly on the inputs. We propose a method to increase the coverage achieved by DSE in the presence of input-data dependent loops and loop dependent branches. We combine DSE with abstract interpretation to find indirect control dependencies, including loop and branch indirect dependencies. Preliminary results show that this results in better coverage, within considerably less time compared to standard DSE.
Article Search

Program Repair

Tortoise: Interactive System Configuration Repair
Aaron Weiss, Arjun Guha, and Yuriy Brun
(Northeastern University, USA; University of Massachusetts at Amherst, USA)
System configuration languages provide powerful abstractions that simplify managing large-scale, networked systems. Thousands of organizations now use configuration languages, such as Puppet. However, specifications written in configuration languages can have bugs and the shell remains the simplest way to debug a misconfigured system. Unfortunately, it is unsafe to use the shell to fix problems when a system configuration language is in use: a fix applied from the shell may cause the system to drift from the state specified by the configuration language. Thus, despite their advantages, configuration languages force system administrators to give up the simplicity and familiarity of the shell. This paper presents a synthesis-based technique that allows administrators to use configuration languages and the shell in harmony. Administrators can fix errors using the shell and the technique automatically repairs the higher-level specification written in the configuration language. The approach (1) produces repairs that are consistent with the fix made using the shell; (2) produces repairs that are maintainable by minimizing edits made to the original specification; (3) ranks and presents multiple repairs when relevant; and (4) supports all shells the administrator may wish to use. We implement our technique for Puppet, a widely used system configuration language, and evaluate it on a suite of benchmarks under 42 repair scenarios. The top-ranked repair is selected by humans 76% of the time and the human-equivalent repair is ranked 1.31 on average.
Article Search
Contract-Based Program Repair without the Contracts
Liushan Chen, Yu Pei, and Carlo A. Furia
(Hong Kong Polytechnic University, China; Chalmers University of Technology, Sweden)
Automated program repair (APR) is a promising approach to automatically fixing software bugs. Most APR techniques use tests to drive the repair process; this makes them readily applicable to realistic code bases, but also brings the risk of generating spurious repairs that overfit the available tests. Some techniques addressed the overfitting problem by targeting code using contracts (such as pre- and postconditions), which provide additional information helpful to characterize the states of correct and faulty computations; unfortunately, mainstream programming languages do not normally include contract annotations, which severely limits the applicability of such contract-based techniques. This paper presents JAID, a novel APR technique for Java programs, which is capable of constructing detailed state abstractions---similar to those employed by contract-based techniques---that are derived from regular Java code without any special annotations. Grounding the repair generation and validation processes on rich state abstractions mitigates the overfitting problem, and helps extend APR’s applicability: in experiments with the DEFECTS4J benchmark, a prototype implementation of JAID produced genuinely correct repairs, equivalent to those written by programmers, for 25 bugs---improving over the state of the art of comparable Java APR techniques in the number and kinds of correct fixes.
Article Search Info
ELIXIR: Effective Object Oriented Program Repair
Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida, and Mukul R. Prasad
(Fujitsu Labs, USA; University of Southern California, USA)
This work is motivated by the pervasive use of method invocations in object-oriented (OO) programs, and indeed their prevalence in patches of OO-program bugs. We propose a generate-and-validate repair technique, called ELIXIR designed to be able to generate such patches. ELIXIR aggressively uses method calls, on par with local variables, fields, or constants, to construct more expressive repair-expressions, that go into synthesizing patches. The ensuing enlargement of the repair space, on account of the wider use of method calls, is effectively tackled by using a machine-learnt model to rank concrete repairs. The machine-learnt model relies on four features derived from the program context, i.e., the code surrounding the potential repair location, and the bug report. We implement ELIXIR and evaluate it on two datasets, the popular Defects4J dataset and a new dataset Bugs.jar created by us, and against 2 baseline versions of our technique, and 5 other techniques representing the state of the art in program repair. Our evaluation shows that ELIXIR is able to increase the number of correctly repaired bugs in Defects4J by 85% (from 14 to 26) and by 57% in Bugs.jar (from 14 to 22), while also significantly out-performing other state-of-the-art repair techniques including ACS, HD-Repair, NOPOL, PAR, and jGenProg.
Article Search
Leveraging Syntax-Related Code for Automated Program Repair
Qi Xin and Steven P. Reiss
(Brown University, USA)
We present our automated program repair technique ssFix which leverages existing code (from a code database) that is syntax-related to the context of a bug to produce patches for its repair. Given a faulty program and a fault-exposing test suite, ssFix does fault localization to identify suspicious statements that are likely to be faulty. For each such statement, ssFix identifies a code chunk (or target chunk) including the statement and its local context. ssFix works on the target chunk to produce patches. To do so, it first performs syntactic code search to find candidate code chunks that are syntax-related, i.e., structurally similar and conceptually related, to the target chunk from a code database (or codebase) consisting of the local faulty program and an external code repository. ssFix assumes the correct fix to be contained in the candidate chunks, and it leverages each candidate chunk to produce patches for the target chunk. To do so, ssFix translates the candidate chunk by unifying the names used in the candidate chunk with those in the target chunk; matches the chunk components (expressions and statements) between the translated candidate chunk and the target chunk; and produces patches for the target chunk based on the syntactic differences that exist between the matched components and in the unmatched components. ssFix finally validates the patched programs generated against the test suite and reports the first one that passes the test suite. We evaluated ssFix on 357 bugs in the Defects4J bug dataset. Our results show that ssFix successfully repaired 20 bugs with valid patches generated and that it outperformed five other repair techniques for Java.
Article Search Info

Recommender Systems

Boosting Complete-Code Tool for Partial Program
Hao Zhong and Xiaoyin Wang
(Shanghai Jiao Tong University, China; University of Texas at San Antonio, USA)
To improve software quality, researchers and practitioners have proposed various static tools, for various purposes (e.g., detecting bugs, anomalies, and vulnerabilities). Although many such tools are quite powerful, they typically need complete code where all the code names are known. In many scenarios, researchers have to analyze partial code in bug fixes/reports, tutorials, and forums. Partial code is a subset of complete code, and many code names of partial code may be unknown. As a result, although partial code is often syntactically correct, existing complete-code tools cannot analyze partial code. To automate the analysis on partial code, some tools have been implemented. However, due to various limitations, tools for partial code are limited in both their number and analysis capability. Instead of proposing another tool for partial code analysis, we propose a general approach, called GRAPA, that boosts existing tools for complete code to analyze partial code. Our major insight is that after unknown bindings are resolved, tools for complete code can analyze partial code with minor modifications. In particular, GRAPA locates Java archive files to resolve unknown bindings, and resolves the remaining unknown bindings from resolved bindings. To illustrate GRAPA, we implement a tool that leverages the state-of-the-art tool, WALA, to analyze Java partial code. We thus implemented the first tool that is able to build system dependency graphs for partial code, complementing existing tools for partial code analysis. We conduct an evaluation on 8,198 partial-code commits from four popular open source projects. Our results show that GRAPA fully resolved unknown code names for 98.5% bug fixes, with an accuracy of 96.1% in total. Furthermore, our results show the significance of GRAPA’s internal techniques, which provides insights on how to integrate with more complete-code tools to analyze partial code.
Article Search
A Language Model for Statements of Software Code
Yixiao Yang, Yu Jiang, Ming Gu, Jiaguang Sun, Jian Gao, and Han Liu
(Tsinghua University, China)
Building language models for source code enables a large set of improvements on traditional software engineering tasks. One promising application is automatic code completion. State-of-the-art techniques capture code regularities at token level with lexical information. Such language models are more suitable for predicting short token sequences, but become less effective with respect to long statement level predictions. In this paper, we have proposed PCC to optimize the token level based language modeling. Specifically, PCC introduced an intermediate representation (IR) for source code, which puts tokens into groups using lexeme and variable relative order. In this way, PCC is able to handle long token sequences, i.e., group sequences, to suggest a complete statement with the precise synthesizer. Further more, PCC employed a fuzzy matching technique which combined genetic and longest common sub-sequence algorithms to make the prediction more accurate. We have implemented a code completion plugin for Eclipse and evaluated it on open-source Java projects. The results have demonstrated the potential of PCC in generating precise long statement level predictions. In 30%-60% of the cases, it can correctly suggest the complete statement with only six candidates, and 40%-90% of the cases with ten candidates.
Article Search
Context-Aware Integrated Development Environment Command Recommender Systems
Marko Gasparic, Tural Gurbanov, and Francesco Ricci
(Free University of Bolzano, Italy)
Integrated development environments (IDEs) are complex applications that integrate multiple tools for creating and manipulating software project artifacts. To improve users’ knowledge and the effectiveness of usage of the available functionality, the inclusion of recommender systems into IDEs has been proposed. We present a novel IDE command recommendation algorithm that, by taking into account the contexts in which a developer works and in which different commands are usually executed, is able to provide relevant recommendations. We performed an empirical comparison of the proposed algorithm with state-of-the-art IDE command recommenders on a real-world data set. The algorithms were evaluated in terms of precision, recall, F1, k-tail, and with a new evaluation metric that is specifically measuring the usefulness of contextual recommendations. The experiments revealed that in terms of the contextual relevance and usefulness of recommendations the proposed algorithm outperforms existing algorithms.
Article Search
Predicting Relevance of Change Recommendations
Thomas Rolfsnes, Leon Moonen, and David Binkley
(Simula Research Laboratory, Norway; Loyola University Maryland, USA)

Software change recommendation seeks to suggest artifacts (e.g., files or methods) that are related to changes made by a developer, and thus identifies possible omissions or next steps. While one obvious challenge for recommender systems is to produce accurate recommendations, a complimentary challenge is to rank recommendations based on their relevance. In this paper, we address this challenge for recommendation systems that are based on evolutionary coupling. Such systems use targeted association-rule mining to identify relevant patterns in a software system’s change history. Traditionally, this process involves ranking artifacts using interestingness measures such as confidence and support. However, these measures often fall short when used to assess recommendation relevance.

We propose the use of random forest classification models to assess recommendation relevance. This approach improves on past use of various interestingness measures by learning from previous change recommendations. We empirically evaluate our approach on fourteen open source systems and two systems from our industry partners. Furthermore, we consider complimenting two mining algorithms: Co-Change and Tarmaq. The results find that random forest classification significantly outperforms previous approaches, receives lower Brier scores, and has superior trade-off between precision and recall. The results are consistent across software system and mining algorithm.

Article Search Info
AnswerBot: Automated Generation of Answer Summary to Developers’ Technical Questions
Bowen Xu, Zhenchang Xing, Xin Xia, and David Lo
(Zhejiang University, China; Singapore Management University, Singapore; Australian National University, Australia; University of British Columbia, Canada)
The prevalence of questions and answers on domain-specific Q&A sites like Stack Overflow constitutes a core knowledge asset for software engineering domain. Although search engines can return a list of questions relevant to a user query of some technical question, the abundance of relevant posts and the sheer amount of information in them makes it difficult for developers to digest them and find the most needed answers to their questions. In this work, we aim to help developers who want to quickly capture the key points of several answer posts relevant to a technical question before they read the details of the posts. We formulate our task as a query-focused multi-answer-posts summarization task for a given technical question. Our proposed approach AnswerBot contains three main steps : 1) relevant question retrieval, 2) useful answer paragraph selection, 3) diverse answer summary generation. To evaluate our approach, we build a repository of 228,817 Java questions and their corresponding answers from Stack Overflow. We conduct user studies with 100 randomly selected Java questions (not in the question repository) to evaluate the quality of the answer summaries generated by our approach and the effectiveness of its relevant question retrieval and answer paragraph selection components. Our evaluation shows that answer summaries generated by our approach are relevant, useful and diverse to developers’ technical questions, and its components can effectively retrieve relevant questions and select salient answer paragraphs for summarization.
Article Search
Recommending Crowdsourced Software Developers in Consideration of Skill Improvement
Zizhe Wang, Hailong Sun, Yang Fu, and Luting Ye
(Beihang University, China)
Finding suitable developers for a given task is critical and challenging for successful crowdsourcing software development. In practice, the development skills will be improved as developers conduct more development tasks. Prior studies on crowdsourcing developer recommendation do not consider the changing of skills, which can underestimate developers' skill to fulfill a task. In this work, we first conducted an empirical study of the performance of 7,620 developers on TopCoder. With a difficulty-weighted algorithm, we re-compute the scores of each developer by eliminating the effect of task difficulty from the performance. We find out that the skill improvement of TopCoder developers can be fitted well with the negative exponential learning curve model. Second, we design a skill prediction method based on the learning curve. Then we propose a skill improvement aware framework for recommending developers for crowdsourcing software development.
Article Search
The Rise of the (Modelling) Bots: Towards Assisted Modelling via Social Networks
Sara Pérez-Soler, Esther Guerra, Juan de Lara, and Francisco Jurado
(Autonomous University of Madrid, Spain)
We are witnessing a rising role of mobile computing and social networks to perform all sorts of tasks. This way, social networks like Twitter or Telegram are used for leisure, and they frequently serve as a discussion media for work-related activities. In this paper, we propose taking advantage of social networks to enable the collaborative creation of models by groups of users. The process is assisted by modelling bots that orchestrate the collaboration and interpret the users' inputs (in natural language) to incrementally build a (meta-)model. The advantages of this modelling approach include ubiquity of use, automation, assistance, natural user interaction, traceability of design decisions, possibility to incorporate coordination protocols, and seamless integration with the user's normal daily usage of social networks. We present a prototype implementation called SOCIO, able to work over several social networks like Twitter and Telegram, and a preliminary evaluation showing promising results.
Article Search Info


UNDEAD: Detecting and Preventing Deadlocks in Production Software
Jinpeng Zhou, Sam Silvestro, Hongyu Liu, Yan Cai, and Tongping Liu
(University of Texas at San Antonio, USA; Institute of Software at Chinese Academy of Sciences, China)
Deadlocks are critical problems afflicting parallel applications, causing software to hang with no further progress. Existing detection tools suffer not only from significant recording performance overhead, but also from excessive memory and/or storage overhead. In addition, they may generate numerous false alarms. Subsequently, after problems have been reported, tremendous manual effort is required to confirm and fix these deadlocks. This paper designs a novel system, UNDEAD, that helps defeat deadlocks in production software. Different from existing detection tools, UNDEAD imposes negligible runtime performance overhead (less than 3% on average) and small memory overhead (around 6%), without any storage consumption. After detection, UNDEAD automatically strengthens erroneous programs to prevent future occurrences of both existing and potential deadlocks, which is similar to the existing work—Dimmunix. However, UNDEAD exceeds Dimmunix with several orders of magnitude lower performance overhead, while eliminating numerous false positives. Extremely low runtime and memory overhead, convenience, and automatic prevention make UNDEAD an always-on detection tool, and a “band-aid” prevention system for production software.
Article Search
Promoting Secondary Orders of Event Pairs in Randomized Scheduling using a Randomized Stride
Mahmoud Abdelrasoul
(North Carolina State University, USA)
Because of the wide use of randomized scheduling in concurrency testing research, it is important to understand randomized scheduling and its limitations. This work analyzes how randomized scheduling discovers concurrency bugs by focusing on the probabilities of the two possible orders of a pair of events. Analysis shows that the disparity between probabilities can be large for programs that encounter a large number of events during execution. Because sets of ordered event pairs define conditions for discovering concurrency bugs, this disparity can make some concurrency bugs highly unlikely. The complementary nature of the two possible orders also indicates a potential trade-off between the probability of discovering frequently-occurring and infrequently-occurring concurrency bugs. To help address this trade-off in a more balanced way, randomized-stride scheduling is proposed, where scheduling granularity for each thread is adjusted using a randomized stride calculated based on thread length. With some assumptions, strides can be calculated to allow covering the least likely event pair orders. Experiments confirm the analysis results and also suggest that randomized-stride scheduling is more effective for discovering concurrency bugs compared to the original randomized scheduling implementation, and compared to other algorithms in recent literature.
Article Search
Parallel Bug-Finding in Concurrent Programs via Reduced Interleaving Instances
Truc L. Nguyen, Peter Schrammel, Bernd Fischer, Salvatore La Torre, and Gennaro Parlato
(University of Southampton, UK; University of Sussex, UK; Stellenbosch University, South Africa; University of Salerno, Italy)
Concurrency poses a major challenge for program verification, but it can also offer an opportunity to scale when subproblems can be analysed in parallel. We exploit this opportunity here and use a parametrizable code-to-code translation to generate a set of simpler program instances, each capturing a reduced set of the original program’s interleavings. These instances can then be checked independently in parallel. Our approach does not depend on the tool that is chosen for the final analysis, is compatible with weak memory models, and amplifies the effectiveness of existing tools, making them find bugs faster and with fewer resources. We use Lazy-CSeq as an off-the-shelf final verifier to demonstrate that our approach is able, already with a small number of cores, to find bugs in the hardest known concurrency benchmarks in a matter of minutes, whereas other dynamic and static tools fail to do so in hours.
Article Search Info
Understanding and Overcoming Parallelism Bottlenecks in ForkJoin Applications
Gustavo Pinto, Anthony Canino, Fernando Castor, Guoqing Xu, and Yu David Liu
(Federal University of Paríç, Brazil; SUNY Binghamton, USA; Federal University of Pernambuco, Brazil; University of California at Irvine, USA)
ForkJoin framework is a widely used parallel programming framework upon which both core concurrency libraries and real-world applications are built. Beneath its simple and user-friendly APIs, ForkJoin is a sophisticated managed parallel runtime unfamiliar to many application programmers: the framework core is a *work-stealing* scheduler, handles *fine-grained* tasks, and sustains the pressure from *automatic* memory management. ForkJoin poses a unique gap in the compute stack between high-level software engineering and low-level system optimization. Understanding and bridging this gap is crucial for the future of parallelism support in JVM-supported applications. This paper describes a comprehensive study on parallelism bottlenecks in ForkJoin applications, with a unique focus on how they interact with underlying system-level features, such as work stealing and memory management. We identify 6 bottlenecks, and found that refactoring them can significantly improve performance and energy efficiency. Our field study includes an in-depth analysis of Akka --- a real-world actor framework --- and 30 additional open-source ForkJoin projects. We sent our patches to the developers of 15 projects, and 7 out of the 9 projects that replied to our patches have accepted them.
Article Search
Quick Verification of Concurrent Programs by Iteratively Relaxed Scheduling
Patrick Metzler, Habib Saissi, Péter Bokor, and Neeraj Suri
(TU Darmstadt, Germany)

The most prominent advantage of software verification over testing is a rigorous check of every possible software behavior. However, large state spaces of concurrent systems, due to non-deterministic scheduling, result in a slow automated verification process. Therefore, verification introduces a large delay between completion and deployment of concurrent software.

This paper introduces a novel iterative approach to verification of concurrent programs that drastically reduces this delay. By restricting the execution of concurrent programs to a small set of admissible schedules, verification complexity and time is drastically reduced. Iteratively adding admissible schedules after their verification eventually restores non-deterministic scheduling. Thereby, our framework allows to find a sweet spot between a low verification delay and sufficient execution time performance. Our evaluation of a prototype implementation on well-known benchmark programs shows that after verifying only few schedules of the program, execution time overhead is competitive to existing deterministic multi-threading frameworks.

Article Search

Program Synthesis

Automatic Loop-Invariant Generation and Refinement through Selective Sampling
Jiaying Li, Jun Sun, Li Li, Quang Loc Le, and Shang-Wei Lin
(Singapore University of Technology and Design, Singapore; Teesside University, UK; Nanyang Technological University, Singapore)
Automatic loop-invariant generation is important in program analysis and verification. In this paper, we propose to generate loop-invariants automatically through learning and verification. Given a Hoare triple of a program containing a loop, we start with randomly testing the program, collect program states at run-time and categorize them based on whether they satisfy the invariant to be discovered. Next, classification techniques are employed to generate a candidate loop-invariant automatically. Afterwards, we refine the candidate through selective sampling so as to overcome the lack of sufficient test cases. Only after a candidate invariant cannot be improved further through selective sampling, we verify whether it can be used to prove the Hoare triple. If it cannot, the generated counterexamples are added as new tests and we repeat the above process. Furthermore, we show that by introducing a path-sensitive learning, i.e., partitioning the program states according to program locations they visit and classifying each partition separately, we are able to learn disjunctive loop-invariants. In order to evaluate our idea, a prototype tool has been developed and the experiment results show that our approach complements existing approaches.
Article Search
FiB: Squeezing Loop Invariants by Interpolation between Forward/Backward Predicate Transformers
Shang-Wei Lin, Jun Sun, Hao Xiao, Yang Liu, David Sanán, and Henri Hansen
(Nanyang Technological University, Singapore; Singapore University of Technology and Design, Singapore; Tampere University of Technology, Finland)
Loop invariant generation is a fundamental problem in program analysis and verification. In this work, we propose a new approach to automatically constructing inductive loop invariants. The key idea is to aggressively squeeze an inductive invariant based on Craig interpolants between forward and backward reachability analysis. We have evaluated our approach by a set of loop benchmarks, and experimental results show that our approach is promising.
Article Search
SymInfer: Inferring Program Invariants using Symbolic States
ThanhVu Nguyen, Matthew B. Dwyer, and Willem Visser
(University of Nebraska-Lincoln, USA; Stellenbosch University, South Africa)
We introduce a new technique for inferring program invariants that uses symbolic states generated by symbolic execution. Symbolic states, which consist of path conditions and constraints on local variables, are a compact description of sets of concrete program states and they can be used for both invariant inference and invariant verification. Our technique uses a counterexample-based algorithm that creates concrete states from symbolic states, infers candidate invariants from concrete states, and then verifies or refutes candidate invariants using symbolic states. The refutation case produces concrete counterexamples that prevent spurious results and allow the technique to obtain more precise invariants. This process stops when the algorithm reaches a stable set of invariants. We present SymInfer, a tool that implements these ideas to automatically generate invariants at arbitrary locations in a Java program. The tool obtains symbolic states from Symbolic PathFinder and uses existing algorithms to infer complex (potentially nonlinear) numerical invariants. Our preliminary results show that SymInfer is effective in using symbolic states to generate precise and useful invariants for proving program safety and analyzing program runtime complexity. We also show that SymInfer outperforms existing invariant generation systems.
Article Search Info
Parsimony: An IDE for Example-Guided Synthesis of Lexers and Parsers
Alan Leung and Sorin Lerner
(University of California at San Diego, USA)
We present Parsimony, a programming-by-example development environment for synthesizing lexers and parsers by example. Parsimony provides a graphical interface in which the user presents examples simply by selecting and labeling sample text in a text editor. An underlying synthesis engine then constructs syntactic rules to solve the system of constraints induced by the supplied examples. Parsimony is more expressive and usable than prior programming-by-example systems for parsers in several ways: Parsimony can (1) synthesize lexer rules in addition to productions, (2) solve for much larger constraint systems over multiple examples, rather than handling examples one-at-a-time, and (3) infer much more complex sets of productions, such as entire algebraic expression grammars, by detecting instances of well-known grammar design patterns. The results of a controlled user study across 18 participants show that users are able to perform lexing and parsing tasks faster and with fewer mistakes when using Parsimony as compared to a traditional parsing workflow.
Article Search
Mining Constraints for Event-based Monitoring in Systems of Systems
Thomas Krismayer, Rick Rabiser, and Paul Grünbacher
(JKU Linz, Austria)
The full behavior of software-intensive systems of systems (SoS) emerges during operation only. Runtime monitoring approaches have thus been proposed to detect deviations from the expected behavior. They commonly rely on temporal logic or domain-specific languages to formally define requirements, which are then checked by analyzing the stream of monitored events and event data. Some approaches also allow developers to generate constraints from declarative specifications of the expected behavior. However, independent of the approach, deep domain knowledge is required to specify the desired behavior. This knowledge is often not accessible in SoS environments with multiple development teams independently working on different, heterogeneous systems. In this New Ideas Paper we thus describe an approach that automatically mines constraints for runtime monitoring from event logs recorded in SoS. Our approach builds on ideas from specification mining, process mining, and machine learning to mine different types of constraints on event occurrence, event timing, and event data. The approach further presents the mined constraints to users in an existing constraint language and it ranks the constraints using different criteria. We demonstrate the feasibility of our approach by applying it to event logs from a real-world industrial SoS.
Article Search
Programming Bots by Synthesizing Natural Language Expressions into API Invocations
Shayan Zamanirad, Boualem Benatallah, Moshe Chai Barukh, Fabio Casati, and Carlos Rodriguez
(UNSW, Australia; University of Trento, Italy; Tomsk Polytechnic University, Russia)
At present, bots are still in their preliminary stages of development. Many are relatively simple, or developed ad-hoc for a very specific use-case. For this reason, they are typically programmed manually, or utilize machine-learning classifiers to interpret a fixed set of user utterances. In reality, real world conversations with humans require support for dynamically cap- turing users expressions. Moreover, bots will derive immeasurable value by programming them to invoke APIs for their results. Today, within the Web and Mobile development community, complex applications are being stringed together with a few lines of code – all made possible by APIs. Yet, developers today are not as empowered to program bots in much the same way. To overcome this, we introduce BotBase, a bot programming platform that dynamically synthesizes natural language user expressions into API invocations. Our solution is two faceted: Firstly, we construct an API knowledge graph to encode and evolve APIs; secondly, leveraging the above we apply techniques in NLP, ML and Entity Recognition to perform the required synthesis from natural language user expressions into API calls.
Article Search


Test Suite Parallelization in Open-Source Projects: A Study on Its Usage and Impact
Jeanderson Candido, Luis Melo, and Marcelo d’Amorim
(Federal University of Pernambuco, Brazil)
Dealing with high testing costs remains an important problem in Software Engineering. Test suite parallelization is an important approach to address this problem. This paper reports our findings on the usage and impact of test suite parallelization in open-source projects. It provides recommendations to practitioners and tool developers to speed up test execution. Considering a set of 468 popular Java projects we analyzed, we found that 24% of the projects contain costly test suites but parallelization features still seem underutilized in practice — only 19.1% of costly projects use parallelization. The main reported reason for adoption resistance was the concern to deal with concurrency issues. Results suggest that, on average, developers prefer high predictability than high performance in running tests.
Article Search Info
Systematic Reduction of GUI Test Sequences
Lin Cheng, Zijiang Yang, and Chao Wang
(Western Michigan University, USA; University of Southern California, USA)
Graphic user interface (GUI) is an integral part of many software applications. However, GUI testing remains a challenging task. The main problem is to generate a set of high-quality test cases, i.e., sequences of user events to cover the often large input space. Since manually crafting event sequences is labor-intensive and automated testing tools often have poor performance, we propose a new GUI testing framework to efficiently generate progressively longer event sequences while avoiding redundant sequences. Our technique for identifying the redundancy among these sequences relies on statically checking a set of simple and syntactic-level conditions, whose reduction power matches and sometimes exceeds that of classic techniques based on partial order reduction. We have evaluated our method on 17 Java Swing applications. Our experimental results show the new technique, while being sound and systematic, can achieve more than 10X reduction in the number of test sequences compared to the state-of-the-art GUI testing tools.
Article Search
Automatically Reducing Tree-Structured Test Inputs
Satia Herfert, Jibesh Patra, and Michael Pradel
(TU Darmstadt, Germany)
Reducing the test input given to a program while preserving some property of interest is important, e.g., to localize faults or to reduce test suites. The well-known delta debugging algorithm and its derivatives automate this task by repeatedly reducing a given input. Unfortunately, these approaches are limited to blindly removing parts of the input and cannot reduce the input by restructuring it. This paper presents the Generalized Tree Reduction (GTR) algorithm, an effective and efficient technique to reduce arbitrary test inputs that can be represented as a tree, such as program code, PDF files, and XML documents. The algorithm combines tree transformations with delta debugging and a greedy backtracking algorithm. To reduce the size of the considered search space, the approach automatically specializes the tree transformations applied by the algorithm based on examples of input trees. We evaluate GTR by reducing Python files that cause interpreter crashes, JavaScript files that cause browser inconsistencies, PDF documents with malicious content, and XML files used to tests an XML validator. The GTR algorithm reduces the trees of these files to 45.3%, 3.6%, 44.2%, and 1.3% of the original size, respectively, outperforming both delta debugging and another state-of-the-art algorithm.
Article Search
Synthetic Data Generation for Statistical Testing
Ghanem Soltana, Mehrdad Sabetzadeh, and Lionel C. Briand
(University of Luxembourg, Luxembourg)
Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of simultaneously fulfilling the statistical representativeness and logical validity requirements.
Article Search Info

Tool Demonstrations

Visualization, Models, and Synthesis

SEALANT: A Detection and Visualization Tool for Inter-app Security Vulnerabilities in Android
Youn Kyu Lee, Peera Yoodee, Arman Shahbazian, Daye Nam, and Nenad Medvidovic
(University of Southern California, USA)
Android’s flexible communication model allows interactions among third-party apps, but it also leads to inter-app security vulnerabilities. Specifically, malicious apps can eavesdrop on interactions between other apps or exploit the functionality of those apps, which can expose a user’s sensitive information to attackers. While the state-of-the-art tools have focused on detecting inter-app vulnerabilities in Android, they neither accurately analyze realistically large numbers of apps nor effectively deliver the identified issues to users. This paper presents SEALANT, a novel tool that combines static analysis and visualization techniques that, together, enable accurate identification of inter-app vulnerabilities as well as their systematic visualization. SEALANT statically analyzes architectural information of a given set of apps, infers vulnerable communication channels where inter-app attacks can be launched, and visualizes the identified information in a compositional representation. SEALANT has been demonstrated to accurately identify inter-app vulnerabilities from hundreds of real-world Android apps and to effectively deliver the identified information to users. (Demo Video:
Article Search Video
Visualization Support for Requirements Monitoring in Systems of Systems
Lisa Maria Kritzinger, Thomas Krismayer, Michael Vierhauser, Rick Rabiser, and Paul Grünbacher
(JKU Linz, Austria; University of Notre Dame, USA)
Industrial software systems are often systems of systems (SoS) whose full behavior only emerges at runtime. The systems and their interactions thus need to be continuously monitored and checked during operation to determine compliance with requirements. Many requirements monitoring approaches have been proposed. However, only few of these come with tools that present and visualize monitoring results and details on requirements violations to end users such as industrial engineers. In this tool demo paper we present visualization capabilities we have been developing motivated by industrial scenarios. Our tool complements REMINDS, an existing requirements monitoring framework, which supports collecting, aggregating, and analyzing events and event data in architecturally heterogeneous SoS. Our visualizations support a 'drill-down' scenario for monitoring and diagnosis: starting from a graphical status overview of the monitored systems and their relations, engineers can view trends and statistics about performed analyses and diagnose the root cause of problems by inspecting the events and event data that led to a specific violation. Initial industry feedback we received confirms the usefulness of our tool support. Demo video:
Article Search
A Demonstration of Simultaneous Execution and Editing in a Development Environment
Steven P. Reiss and Qi Xin
(Brown University, USA)
We introduce a tool within the Code Bubbles development environment that allows for continuous execution as the programmer edits. The tool, SEEDE, shows both the intermediate and final results of execution in terms of variables, control flow, output, and graphics. These results are updated as the user edits. The user can explore the execution to find or fix bugs or use the intermediate values to help write appropriate code. A demonstration video is available at
Article Search Video
TREM: A Tool for Mining Timed Regular Specifications from System Traces
Lukas Schmidt, Apurva Narayan, and Sebastian Fischmeister
(University of Waterloo, Canada)
Software specifications are useful for software validation, model checking, runtime verification, debugging, monitoring, etc. In context of safety-critical real-time systems, temporal properties play an important role. However, temporal properties are rarely present due to the complexity and evolutionary nature of software systems. We propose Timed Regular Expression Mining (TREM) a hosted tool for specification mining using timed regular expressions (TREs). It is designed for easy and robust mining of dominant temporal properties. TREM uses an abstract structure of the property; the framework constructs a finite state machine to serve as an acceptor. TREM is scalable, easy to access/use, and platform independent specification mining framework. The tool is tested on industrial strength software system traces such as the QNX real-time operating system using traces with more than 1.5 Million entries. The tool demonstration video can be accessed here:
Article Search Video
ModelWriter: Text and Model-Synchronized Document Engineering Platform
Ferhat Erata, Claire Gardent, Bikash Gyawali, Anastasia Shimorina, Yvan Lussaud, Bedir Tekinerdogan, Geylani Kardas, and Anne Monceaux
(Wageningen University and Research, Netherlands; UNIT Information Technologies, Turkey; CNRS, France; OBEO, France; Ege University, Turkey; KoçSistem Information and Communication Services, Turkey; Airbus Group Innovations, France)
The ModelWriter platform provides a generic framework for automated traceability analysis. In this paper, we demonstrate how this framework can be used to trace the consistency and completeness of technical documents that consist of a set of System Installation Design Principles used by Airbus to ensure the correctness of aircraft system installation. We show in particular, how the platform allows the integration of two types of reasoning: reasoning about the meaning of text using semantic parsing and description logic theorem proving; and reasoning about document structure using first-order relational logic and finite model finding for traceability analysis.
Article Search Video Info
Incrementally Slicing Editable Submodels
Christopher Pietsch, Manuel Ohrndorf, Udo Kelter, and Timo Kehrer
(University of Siegen, Germany; Humboldt University of Berlin, Germany)
Model slicers are tools which provide two services: (a)finding parts of interest in a model and (b) displaying these parts somehow or extract these parts as a new, autonomous model, which is referred to as slice or sub-model. This paper focuses on the creation of editable slices, which can be processed by model editors, analysis tools, model management tools etc. Slices are useful if, e.g., only a part of a large model shall be analyzed, compared or processed by time-consuming algorithms, or if sub-models shall be modified independently. We present a new generic incremental slicer which can slice models of arbitrary type and which creates slices which are consistent in the sense that they are editable by standard editors. It is built on top of a model differencing framework and does not require additional configuration data beyond those available in the differencing framework. The slicer can incrementally extend or reduce an existing slice if model elements shall be added or removed, even if the slice has been edited meanwhile. We demonstrate the usefulness of our slicer in several scenarios using a large UML model. A screencast of the demonstrated scenarios is provided at
Article Search Video Info
DSSynth: An Automated Digital Controller Synthesis Tool for Physical Plants
Alessandro Abate, Iury Bessa, Dario Cattaruzza, Lennon Chaves, Lucas Cordeiro, Cristina David, Pascal Kesseli, Daniel Kroening, and Elizabeth Polgreen
(University of Oxford, UK; Federal University of Amazonas, Brazil)
We present an automated MATLAB Toolbox, named DSSynth (Digital-System Synthesizer), to synthesize sound digital controllers for physical plants that are represented as linear time-invariant systems with single input and output. In particular, DSSynth synthesizes digital controllers that are sound w.r.t. stability and safety specifications. DSSynth considers the complete range of approximations, including time discretization, quantization effects and finite-precision arithmetic (and its rounding errors). We demonstrate the practical value of this toolbox by automatically synthesizing stable and safe controllers for intricate physical plant models from the digital control literature. The resulting toolbox enables the application of program synthesis to real-world control engineering problems.
Article Search Video Info

Analysis and Testing

A Static Analysis Tool with Optimizations for Reachability Determination
Yuexing Wang, Min Zhou, Yu Jiang, Xiaoyu Song, Ming Gu, and Jiaguang Sun
(Tsinghua University, China; Portland State University, USA)
To reduce the false positives of static analysis, many tools collect path constraints and integrate SMT solvers to filter unreachable execution paths. However, the accumulated calling and computing of SMT solvers are time and resource consuming. This paper presents TsmartLW, an alternate static analysis tool in which we implement a path constraint solving engine to speed up reachability determination. Within the engine, typical types of constraint-patterns are firstly defined based on an empirical study of a large number of code repositories. For each pattern, a constraint solving algorithm is designed and implemented. For each program, the engine predicts the most suitable strategy and then applies the strategy to solve path constraints. The experimental results on some well-known benchmarks and real-world applications show that TsmartLW is faster than some state-of-the-art static analysis tools. For example, it is 1.32x faster than CPAchecker and our engine is 369x faster than SMT solvers in solving path constraints.
Article Search Video
CogniCrypt: Supporting Developers in Using Cryptography
Stefan Krüger, Sarah Nadi, Michael Reif, Karim Ali, Mira Mezini, Eric Bodden, Florian Göpfert, Felix Günther, Christian Weinert, Daniel Demmler, and Ram Kamath
(University of Paderborn, Germany; University of Alberta, Canada; TU Darmstadt, Germany; Fraunhofer IEM, Germany)
Previous research suggests that developers often struggle using low-level cryptographic APIs and, as a result, produce insecure code. When asked, developers desire, among other things, more tool support to help them use such APIs. In this paper, we present CogniCrypt, a tool that supports developers with the use of cryptographic APIs. CogniCrypt assists the developer in two ways. First, for a number of common cryptographic tasks, CogniCrypt generates code that implements the respective task in a secure manner. Currently, CogniCrypt supports tasks such as data encryption, communication over secure channels, and long-term archiving. Second, CogniCrypt continuously runs static analyses in the background to ensure a secure integration of the generated code into the developer’s workspace. This video demo showcases the main features of CogniCrypt:
Article Search Video
BProVe: Tool Support for Business Process Verification
Flavio Corradini, Fabrizio Fornari, Andrea Polini, Barbara Re, Francesco Tiezzi, and Andrea Vandin
(University of Camerino, Italy; DTU, Denmark)
This demo introduces BProVe, a tool supporting automated verification of Business Process models. BProVe analysis is based on a formal operational semantics defined for the BPMN 2.0 modelling language, and is provided as a freely accessible service that uses open standard formats as input data. Furthermore a plug-in for the Eclipse platform has been developed making available a tool chain supporting users in modelling and visualising, in a friendly manner, the results of the verification. Finally we have conducted a validation through more than one thousand models, showing the effectiveness of our verification tool in practice. (Demo video:
Article Search Video
taco: A Tool to Generate Tensor Algebra Kernels
Fredrik Kjolstad, Stephen Chou, David Lugato, Shoaib Kamil, and Saman Amarasinghe
(Massachusetts Institute of Technology, USA; CEA, France; Adobe, USA)
Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. However, the number of tensor expressions is unbounded, which makes it hard to develop and optimize libraries. Furthermore, the tensors are often sparse (most components are zero), which means the code has to traverse compressed formats. To support programmers we have developed taco, a code generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions. This paper describes the taco web and command-line tools and discusses the benefits of a code generator over a traditional library. See also the demo video at
Article Search Video Info
STARTS: STAtic Regression Test Selection
Owolabi Legunsen, August Shi, and Darko Marinov
(University of Illinois at Urbana-Champaign, USA)
Regression testing is an important part of software development, but it can be very time consuming. Regression test selection (RTS) aims to speed up regression testing by running only impacted tests—the subset of tests that can change behavior due to code changes. We present STARTS, a tool for STAtic Regression Test Selection. Unlike dynamic RTS, STARTS requires no code instrumentation or runtime information to find impacted tests; instead, STARTS uses only compile-time information. Specifically, STARTS builds a dependency graph of program types and finds, as impacted, tests that can reach some changed type in the transitive closure of the dependency graph. STARTS is a Maven plugin that can be easily integrated into any Maven- based Java project. We find that STARTS selects on average 35.2% of tests, leading to an end-to-end runtime that is on average 81.0% of running all the tests. A video demo of STARTS can be found at
Article Search
EventFlowSlicer: A Tool for Generating Realistic Goal-Driven GUI Tests
Jonathan A. Saddler and Myra B. Cohen
(University of Nebraska-Lincoln, USA)
Most automated testing techniques for graphical user interfaces (GUIs) produce test cases that are only concerned with covering the elements (widgets, menus, etc.) on the interface, or the underlying program code, with little consideration of test case semantics. This is effective for functional testing where the aim is to find as many faults as possible. However, when one wants to mimic a real user for evaluating usability, or when it is necessary to extensively test important end-user tasks of a system, or to generate examples of how to use an interface, this generation approach fails. Capture and replay techniques can be used, however there are often multiple ways to achieve a particular goal, and capturing all of these is usually too time consuming and unrealistic. Prior work on human performance regression testing introduced a constraint based method to filter test cases created by a functional test case generator, however that work did not capture the specifications, or directly generate only the required tests and considered only a single type of test goal. In this paper we present EventFlowSlicer, a tool that allows the GUI tester to specify and generate all realistic test cases relevant to achieve a stated goal. The user first captures relevant events on the interface, then adds constraints to provide restrictions on the task. An event flow graph is extracted containing only the widgets of interest for that goal. Next all test cases are generated for edges in the graph which respect the constraints. The test cases can then be replayed using a modified version of GUITAR. A video demonstration of EventFlowSlicer can be found at
Article Search Video
ANDROFLEET: Testing WiFi Peer-to-Peer Mobile Apps in the Large
Lakhdar Meftah, Maria Gomez, Romain Rouvoy, and Isabelle Chrisment
(Inria, France; University of Lille, France; Saarland University, Germany; Institut Universitaire de France, France; Telecom Nancy, France)
WiFi P2P allows mobile apps to connect to each other via WiFi without an intermediate access point. This communication mode is widely used by mobile apps to support interactions with one or more devices simultaneously. However, testing such P2P apps remains a challenge for app developers as i) existing testing frameworks lack support for WiFi P2P, and ii) WiFi P2P testing fails to scale when considering a deployment on more than two devices. In this paper, we therefore propose an acceptance testing framework, named ANDROFLEET, to automate testing of WiFi P2P mobile apps at scale. Beyond the capability of testing point-to-point interactions under various conditions, ANDROFLEET supports the deployment and the emulation of a fleet of mobile devices as part of an alpha testing phase in order to assess the robustness of a WiFi P2P app once deployed in the field. To validate ANDROFLEET, we demonstrate the detection of failing black-box acceptance tests for WiFi P2P apps and we capture the conditions under which such a mobile app can correctly work in the field. The demo video of ANDROFLEET is made available from
Article Search Video Info

Search and Editing

FEMIR: A Tool for Recommending Framework Extension Examples
Muhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and Daqing Hou
(University of Saskatchewan, Canada; Clarkson University, USA)
Software frameworks enable developers to reuse existing well tested functionalities instead of taking the burden of implementing everything from scratch. However, to meet application specific requirements, the frameworks need to be customized via extension points. This is often done by passing a framework related object as an argument to an API call. To enable such customizations, the object can be created by extending a framework class, implementing an interface, or changing the properties of the object via API calls. However, it is both a common and non-trivial task to find all the details related to the customizations. In this paper, we present a tool, called FEMIR, that utilizes partial program analysis and graph mining technique to detect, group, and rank framework extension examples. The tool extends existing code completion infrastructure to inform developers about customization choices, enabling them to browse through extension points of a framework, and frequent usages of each point in terms of code examples. A video demo is made available at
Article Search
TiQi: A Natural Language Interface for Querying Software Project Data
Jinfeng Lin, Yalin Liu, Jin Guo, Jane Cleland-Huang, William Goss, Wenchuang Liu, Sugandha Lohar, Natawut Monaikul, and Alexander Rasin
(University of Notre Dame, USA; DePaul University, USA)
Abstract—Software projects produce large quantities of data such as feature requests, requirements, design artifacts, source code, tests, safety cases, release plans, and bug reports. If leveraged effectively, this data can be used to provide project intelligence that supports diverse software engineering activities such as release planning, impact analysis, and software analytics. However, project stakeholders often lack skills to formulate complex queries needed to retrieve, manipulate, and display the data in meaningful ways. To address these challenges we introduce TiQi, a natural language interface, which allows users to express software-related queries verbally or written in natural language. TiQi is a web-based tool. It visualizes available project data as a prompt to the user, accepts Natural Language (NL) queries, transforms those queries into SQL, and then executes the queries against a centralized or distributed database. Raw data is stored either directly in the database or retrieved dynamically at runtime from case tools and repositories such as Github and Jira. The transformed query is visualized back to the user as SQL and augmented UML, and raw data results are returned. Our tool demo can be found on YouTube at the following link: Keywords-Natural Language Interface, Project Data, Query
Article Search Video
Opiner: An Opinion Search and Summarization Engine for APIs
Gias Uddin and Foutse Khomh
(McGill University, Canada; Polytechnique Montréal, Canada)
Opinions are key determinants to many of the activities related to software development. The perceptions of developers about an API, and the choices they make about whether and how they should use it, may, to a considerable degree, be conditioned upon how other developers see and evaluate the API. Given the plethora of APIs available for a given development task and the advent of developer forums as the media to share opinions about those APIs, it can be challenging for a developer to make informed decisions about an API to support the task. We introduce Opiner, our opinion search and summarization engine for API reviews. The server side of Opiner collects and summarizes opinions about APIs by crawling online developer forums and by associating the opinions found in the forum posts to the APIs discussed in the posts. The client side of Opiner is a Website that presents different summarized viewpoints of the opinions about the APIs in an online search engine. We evaluated Opiner by asking Industrial developers to select APIs for two development tasks. We found that developers were interested to use our proposed summaries of API reviews and that while combined with Stack Overflow, Opiner helped developers to make the right decision with more accuracy and confidence. The Opiner online search engine is available at: A video demo is available at:
Article Search
Defaultification Refactoring: A Tool for Automatically Converting Java Methods to Default
Raffi Khatchadourian and Hidehiko Masuhara
(City University of New York, USA; Tokyo Institute of Technology, Japan)

Enabling interfaces to declare (instance) method implementations, Java 8 default methods can be used as a substitute for the ubiquitous skeletal implementation software design pattern. Performing this transformation on legacy software manually, though, may be non-trivial. The refactoring requires analyzing complex type hierarchies, resolving multiple implementation inheritance issues, reconciling differences between class and interface methods, and analyzing tie-breakers (dispatch precedence) with overriding class methods. All of this is necessary to preserve type-correctness and confirm semantics preservation. We demonstrate an automated refactoring tool called Migrate Skeletal Implementation to Interface for transforming legacy Java code to use the new default construct. The tool, implemented as an Eclipse plug-in, is driven by an efficient, fully-automated, type constraint-based refactoring approach. It features an extensive rule set covering various corner-cases where default methods cannot be used. The resulting code is semantically equivalent to the original, more succinct, easier to comprehend, less complex, and exhibits increased modularity. A demonstration can be found at

Article Search Video Info
Kobold: Web Usability as a Service
Julián Grigera, Alejandra Garrido, and Gustavo Rossi
(Universidad Nacional de La Plata, Argentina; CIC, Argentina; CONICET, Argentina)
While Web applications have become pervasive in today’s business, social interaction and information exchange, their usability is often deficient, even being a key factor for a website success. Usability problems repeat across websites, and many of them have been catalogued, but usability evaluation and repair still remains expensive. There are efforts from both the academy and industry to automate usability testing or to provide automatic statistics, but they rarely offer concrete solutions. These solutions appear as guidelines or patterns that developers can follow manually. This paper presents Kobold, a tool that detects usability problems from real user interaction (UI) events and repairs them automatically when possible, at least suggesting concrete solutions. By using the refactoring technique and its associated concept of bad smell, Kobold mines UI events to detect usability smells and applies usability refactorings on the client to correct them. The purpose of Kobold is to deliver usability advice and solutions as a service (SaaS) for developers, allowing them to respond to feedback of the real use of their applications and improve usability incrementally, even when there are no usability experts on the team. Kobold is available at: A screencast is available at
Article Search Video Info
IntPTI: Automatic Integer Error Repair with Proper-Type Inference
Xi Cheng, Min Zhou, Xiaoyu Song, Ming Gu, and Jiaguang Sun
(Tsinghua University, China; Portland State University, USA)
Integer errors in C/C++ are caused by arithmetic operations yielding results which are unrepresentable in certain type. They can lead to serious safety and security issues. Due to the complicated semantics of C/C++ integers, integer errors are widely harbored in real-world programs and it is error-prone to repair them even for experts. An automatic tool is desired to 1) automatically generate fixes which assist developers to correct the buggy code, and 2) provide sufficient hints to help developers review the generated fixes and better understand integer types in C/C++. In this paper, we present a tool IntPTI that implements the desired functionalities for C programs. IntPTI infers appropriate types for variables and expressions to eliminate representation issues, and then utilizes the derived types with fix patterns codified from the successful human-written patches. IntPTI provides a user-friendly web interface which allows users to review and manage the fixes. We evaluate IntPTI on 7 real-world projects and the results show its competitive repair accuracy and its scalability on large code bases. The demo video for IntPTI is available at:
Article Search Video Info

Doctoral Symposium

Learning Effective Changes for Software Projects
Rahul Krishna
(North Carolina State University, USA)
The primary motivation of much of software analytics is decision making. How to make these decisions? Should one make decisions based on lessons that arise from within a particular project? Or should one generate these decisions from across multiple projects? This work is an attempt to answer these questions. Our work was motivated by a realization that much of the current generation software analytics tools focus primarily on prediction. Indeed prediction is a useful task, but it is usually followed by ``planning'' about what actions need to be taken. This research seeks to address the planning task by seeking methods that support actionable analytics by offering clear guidance on what to do. Specifically, we propose XTREE and BELLTREE algorithms for generating a set of actionable plans within and across projects. Each of these plans, if followed will improve the quality of the software project.
Article Search
Characterizing and Taming Non-deterministic Bugs in JavaScript Applications
Jie Wang
(Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China)
JavaScript has become one of the most popular programming languages for both client-side and server-side applications. In JavaScript applications, events may be generated, triggered and consumed non-deterministically. Thus, JavaScript applications may suffer from non-deterministic bugs, when events are triggered and consumed in an unexpected order. In this proposal, we aim to characterize and combat non-deterministic bugs in JavaScript applications. Specifically, we first perform a comprehensive study about real-world non-deterministic bugs in server-side JavaScript applications. In order to facilitate bug diagnosis, we further propose approaches to isolate the necessary events that are responsible for the occurrence of a failure. We also plan to design new techniques in detecting non-deterministic bugs in JavaScript applications.
Article Search
Towards API-Specific Automatic Program Repair
Sebastian Nielebock
(Otto von Guericke University Magdeburg, Germany)
The domain of Automatic Program Repair (APR) had many research contributions in recent years. So far, most approaches target fixing generic bugs in programs (e.g., off-by-one errors). Nevertheless, recent studies reveal that about 50% of real bugs require API-specific fixes (e.g., adding missing API method calls or correcting method ordering), for which existing APR approaches are not designed. In this paper, we address this problem and introduce the notion of an API-specific program repair mechanism. This mechanism detects erroneous code in a similar way to existing APR approaches. However, to fix such bugs, it uses API-specific information from the erroneous code to search for API usage patterns in other software, with which we could fix the bug. We provide first insights on the applicability of this mechanism and discuss upcoming research challenges.
Article Search
Managing Software Evolution through Semantic History Slicing
Yi Li
(University of Toronto, Canada)
Software change histories are results of incremental updates made by developers. As a side-effect of the software development process, version history is a surprisingly useful source of information for understanding, maintaining and reusing software. However, traditional commit-based sequential organization of version histories lacks semantic structure and thus are insufficient for many development tasks that require high-level, semantic understanding of program functionality, such as locating feature implementations and porting hot fixes. In this work, we propose to use well-organized unit tests as identifiers for corresponding software functionalities. We then present a family of automated techniques which analyze the semantics of historical changes and assist developers in many everyday practical settings. For validation, we evaluate our approaches on a benchmark of developer-annotated version history instances obtained from real-world open source software projects on GitHub.
Article Search
Towards the Automatic Classification of Traceability Links
Chris Mills
(Florida State University, USA)
A wide range of text-based artifacts contribute to software projects (e.g., source code, test cases, use cases, project requirements, interaction diagrams, etc.). Traceability Link Recovery (TLR) is the software task in which relevant documents in these various sets are linked to one another, uncovering information about the project that is not available when considering only the documents themselves. This information is helpful for enabling other tasks such as improving test coverage, impact analysis, and ensuring that system or regulatory requirements are met. However, while traceability links are useful, performing TLR manually is time consuming and fraught with error. Previous work has applied Information Retrieval (IR) and other techniques to reduce the human effort involved; however, that effort remains significant. In this research we seek to take the next step in reducing it by using machine learning (ML) classification models to predict whether a candidate link is valid or invalid without human oversight. Preliminary results show that this approach has promise for accurately recommending valid links; however, there are several challenges that still must be addressed in order to achieve a technique with high enough performance to consider it a viable, completely automated solution.
Article Search
Towards a Software Vulnerability Prediction Model using Traceable Code Patterns and Software Metrics
Kazi Zakia Sultana
(Mississippi State University, USA)
Software security is an important aspect of ensuring software quality. The goal of this study is to help developers evaluate software security using traceable patterns and software metrics during development. The concept of traceable patterns is similar to design patterns but they can be automatically recognized and extracted from source code. If these patterns can better predict vulnerable code compared to traditional software metrics, they can be used in developing a vulnerability prediction model to classify code as vulnerable or not. By analyzing and comparing the performance of traceable patterns with metrics, we propose a vulnerability prediction model. This study explores the performance of some code patterns in vulnerability prediction and compares them with traditional software metrics. We use the findings to build an effective vulnerability prediction model. We evaluate security vulnerabilities reported for Apache Tomcat, Apache CXF and three stand-alone Java web applications. We use machine learning and statistical techniques for predicting vulnerabilities using traceable patterns and metrics as features. We found that patterns have a lower false negative rate and higher recall in detecting vulnerable code than the traditional software metrics.
Article Search
Towards Search-Based Modelling and Analysis of Requirements and Architecture Decisions
Saheed A. Busari
(University College London, UK)
Many requirements engineering and software architecture decisions are complicated by uncertainty and multiple conflicting stakeholders' objectives. Using quantitative decision models helps clarify these decisions and allows the use of multi-objective simulation optimisation techniques in analysing the impact of decisions on objectives. Existing requirements and architecture decision support methods that use quantitative decision models are limited by the difficulty in elaborating problem-specific decision models and/or lack integrated tool support for automated decision analysis under uncertainty. To address these problems and facilitate requirements and architecture decision analysis, this research proposes a novel modelling language and automated decision analysis technique, implemented in a tool called RADAR. The modelling language is a simplified version of quantitative AND/OR goal models used in requirements engineering and similar to feature models used in software product lines. This research involves developing the RADAR tool and evaluating the tool's applicability, usefulness and scalability on a set of real-world examples.
Article Search
Privacy-Aware Data-Intensive Applications
Michele Guerriero
(Politecnico di Milano, Italy)
The rise of Big Data is leading to an increasing demand for data-intensive applications (DIAs), which, in many cases, are expected to process massive amounts of sensitive data. In this context, ensuring data privacy becomes paramount. While the way we design and develop DIAs has radically changed over the last few years in order to deal with Big Data, there has been relatively little effort to make such design privacy-aware. As a result, enforcing privacy policies in large-scale data processing is currently an open research problem. This thesis proposal makes one step towards this investigation: after identifying the dataflow model as the reference computational model for large-scale DIAs, (1) we propose a novel language for specifying privacy policies on dataflow applications along with (2) a dataflow re-writing mechanism to enforce such policies during DIA execution. Although a systematic evaluation still needs to be carried out, preliminary results are promising. We plan to implement our approach within a model-driven solution to ultimately simplify the design and development of privacy-aware DIAs, i.e. DIAs that ensure privacy policies at runtime.
Article Search

proc time: 0.1