2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017),
October 30 – November 3, 2017,
Urbana-Champaign, IL, USA
Technical Research
Test Generation
Tue, Oct 31, 10:30 - 12:30, Illini Room A (Chair: Andreas Zeller)
Systematically Testing Background Services of Mobile Apps
Li Lyna Zhang,
Chieh-Jan Mike Liang, Yunxin Liu, and Enhong Chen
(University of Science and Technology of China, China; Microsoft Research, China)
Contrary to popular belief, mobile apps can spend a large fraction of time running "hidden" as background services. Bugs in these services can translate into crashes, energy depletion, device slow-down, etc. Unfortunately, without the necessary testing tools, developers can only resort to telemetry from user devices in the wild. To this end, Snowdrop is a testing framework that systematically identifies and automates background services in Android apps. Snowdrop realizes a service-oriented approach that does not assume all inter-component communication messages are explicitly coded in the app bytecode. Furthermore, to improve the completeness of test inputs generated, Snowdrop infers field values by exploiting the similarity in how developers name variables. We evaluate Snowdrop by testing 848 commercially available mobile apps. Empirical results show that Snowdrop can achieve 20.91% more code path coverage than pathwise test input generators, and 64.11% more coverage than random test input generators.
@InProceedings{ASE17p4,
author = {Li Lyna Zhang and Chieh-Jan Mike Liang and Yunxin Liu and Enhong Chen},
title = {Systematically Testing Background Services of Mobile Apps},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {4--15},
doi = {},
year = {2017},
}
Crowd Intelligence Enhances Automated Mobile Testing
Ke Mao,
Mark Harman, and Yue Jia
(University College London, UK; Facebook, UK)
We show that information extracted from crowd-based testing can enhance automated mobile testing. We introduce Polariz, which generates replicable test scripts from crowd-based testing, extracting cross-app 'motif' events: automatically inferred reusable higher-level event sequences composed of lower-level observed event actions. Our empirical study used 434 crowd workers from 24 countries to perform 1,350 testing tasks on 9 popular Google Play apps, each with at least 1 million user installs. The findings reveal that the crowd was able to achieve 60.5% unique activity coverage and proved to be complementary to automated search-based testing in 5 out of the 9 subjects studied. Our leave-one-out evaluation demonstrates that coverage attainment can be improved (6 out of 9 cases, with no disimprovement on the remaining 3) by combining crowd-based and search-based testing.
@InProceedings{ASE17p16,
author = {Ke Mao and Mark Harman and Yue Jia},
title = {Crowd Intelligence Enhances Automated Mobile Testing},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {16--26},
doi = {},
year = {2017},
}
EHBDroid: Beyond GUI Testing for Android Applications
Wei Song, Xiangxing Qian, and Jeff Huang
(Nanjing University of Science and Technology, China; Texas A&M University, USA)
With the prevalence of Android-based mobile devices, automated testing for Android apps has received increasing attention. However, owing to the large variety of events that Android supports, test input generation is a challenging task. In this paper, we present a novel approach and an open source tool called EHBDroid for testing Android apps. In contrast to conventional GUI testing approaches, a key novelty of EHBDroid is that it does not generate events from the GUI, but directly invokes callbacks of event handlers. By doing so, EHBDroid can efficiently simulate a large number of events that are difficult to generate by traditional UI-based approaches. We have evaluated EHBDroid on a collection of 35 real-world large-scale Android apps and compared its performance with two state-of-the-art UI-based approaches, Monkey and Dynodroid. Our experimental results show that EHBDroid is significantly more effective and efficient than Monkey and Dynodroid: in a much shorter time, EHBDroid achieves as much as 22.3% higher statement coverage (11.1% on average) than the other two approaches, and found 12 bugs in these benchmarks, including 5 new bugs that the other two failed to find.
@InProceedings{ASE17p27,
author = {Wei Song and Xiangxing Qian and Jeff Huang},
title = {EHBDroid: Beyond GUI Testing for Android Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {27--37},
doi = {},
year = {2017},
}
Sketch-Guided GUI Test Generation for Mobile Applications
Chucheng Zhang, Haoliang Cheng,
Enyi Tang, Xin Chen, Lei Bu, and Xuandong Li
(Nanjing University, China)
Mobile applications with complex GUIs are very popular today. However, generating test cases for these applications is often tedious professional work. On the one hand, manually designing and writing elaborate GUI scripts requires expertise. On the other hand, generating GUI scripts with record-and-playback techniques usually involves repetitive work: because only one path is recorded per execution, testers need to interact with the application over and over again. Automatic GUI testing focuses on exploring combinations of GUI events. As the number of combinations is huge, it is still necessary to introduce a test interface for testers to reduce the search space.
This paper presents a sketch-guided GUI test generation approach for testing mobile applications, which provides a simple but expressive interface for testers to specify their testing purposes. Testers just need to draw a few simple strokes on the screenshots. Then our approach translates the strokes to a testing model and initiates a model-based automatic GUI testing. We evaluate our sketch-guided approach on a few real-world Android applications collected from the literature. The results show that our approach can achieve higher coverage than existing automatic GUI testing techniques with just 10-minute sketching for an application.
@InProceedings{ASE17p38,
author = {Chucheng Zhang and Haoliang Cheng and Enyi Tang and Xin Chen and Lei Bu and Xuandong Li},
title = {Sketch-Guided GUI Test Generation for Mobile Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {38--43},
doi = {},
year = {2017},
}
Saying 'Hi!' Is Not Enough: Mining Inputs for Effective Test Generation
Luca Della Toffola,
Cristian Alexandru Staicu, and Michael Pradel
(ETH Zurich, Switzerland; TU Darmstadt, Germany)
Automatically generating unit tests is a powerful approach to exercise complex software. Unfortunately, current techniques often fail to provide relevant input values, such as strings that bypass domain-specific sanity checks. As a result, state-of-the-art techniques are effective for generic classes, such as collections, but less successful for domain-specific software. This paper presents TestMiner, the first technique for mining a corpus of existing tests for input values to be used by test generators for effectively testing software not in the corpus. The main idea is to extract literals from thousands of tests and to adapt information retrieval techniques to find values suitable for a particular domain. Evaluating the approach with 40 Java classes from 18 different projects shows that TestMiner improves test coverage by 21% over an existing test generator. The approach can be integrated into various test generators in a straightforward way, increasing their effectiveness on previously difficult-to-test classes.
@InProceedings{ASE17p44,
author = {Luca Della Toffola and Cristian Alexandru Staicu and Michael Pradel},
title = {Saying 'Hi!' Is Not Enough: Mining Inputs for Effective Test Generation},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {44--49},
doi = {},
year = {2017},
}
Learn&Fuzz: Machine Learning for Input Fuzzing
Patrice Godefroid, Hila Peleg, and Rishabh Singh
(Microsoft Research, USA; Technion, Israel)
Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely the PDF parser embedded in Microsoft's new Edge browser. We discuss and measure the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.
@InProceedings{ASE17p50,
author = {Patrice Godefroid and Hila Peleg and Rishabh Singh},
title = {Learn&Fuzz: Machine Learning for Input Fuzzing},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {50--59},
doi = {},
year = {2017},
}
Developers’ Practice and Behavior
Tue, Oct 31, 10:30 - 12:30, Illini Room B (Chair: Sven Apel)
The Impact of Continuous Integration on Other Software Development Practices: A Large-Scale Empirical Study
Yangyang Zhao,
Alexander Serebrenik, Yuming Zhou,
Vladimir Filkov, and
Bogdan Vasilescu
(Nanjing University, China; Eindhoven University of Technology, Netherlands; University of California at Davis, USA; Carnegie Mellon University, USA)
Continuous Integration (CI) has become a disruptive innovation in software development: with proper tool support and adoption, positive effects have been demonstrated for pull request throughput and scaling up of project sizes. As with any other innovation, adopting CI implies adapting existing practices in order to take full advantage of its potential, and "best practices" to that end have been proposed. Here we study the adaptation and evolution of code writing and submission, issue and pull request closing, and testing practices as Travis CI is adopted by hundreds of established projects on GitHub. To help essentialize the quantitative results, we also survey a sample of GitHub developers about their experiences with adopting Travis CI. Our findings suggest a more nuanced picture of how GitHub teams are adapting to, and benefiting from, continuous integration technology than suggested by prior work.
@InProceedings{ASE17p60,
author = {Yangyang Zhao and Alexander Serebrenik and Yuming Zhou and Vladimir Filkov and Bogdan Vasilescu},
title = {The Impact of Continuous Integration on Other Software Development Practices: A Large-Scale Empirical Study},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {60--71},
doi = {},
year = {2017},
}
Perceived Language Complexity in GitHub Issue Discussions and Their Effect on Issue Resolution
David Kavaler, Sasha Sirovica, Vincent Hellendoorn, Raul Aranovich, and
Vladimir Filkov
(University of California at Davis, USA)
Modern software development is increasingly collaborative. Open Source Software (OSS) projects are the bellwether; they support dynamic teams, with tools for code sharing, communication, and issue tracking. The success of an OSS project is reliant on team communication. For example, in issue discussions, individuals rely on rhetoric to argue their position, but also maintain technical relevancy. Rhetoric and technical language are on opposite ends of a language complexity spectrum: the former is stylistically natural; the latter is terse and concise. Issue discussions embody this duality, as developers use rhetoric to describe technical issues. The style mix in any discussion can define group culture and affect performance, e.g., issue resolution times may be longer if discussion is imprecise.
Using GitHub, we studied issue discussions to understand whether project-specific language differences exist, and to what extent users conform to a language norm. We built project-specific and overall GitHub language models to study the effect of perceived language complexity on multiple responses. We find that experienced users conform to project-specific language norms, popular individuals use overall GitHub language rather than project-specific language, and conformance to project-specific language norms reduces issue resolution times. We also provide a tool to calculate project-specific perceived language complexity.
@InProceedings{ASE17p72,
author = {David Kavaler and Sasha Sirovica and Vincent Hellendoorn and Raul Aranovich and Vladimir Filkov},
title = {Perceived Language Complexity in GitHub Issue Discussions and Their Effect on Issue Resolution},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {72--83},
doi = {},
year = {2017},
}
Can Automated Pull Requests Encourage Software Developers to Upgrade Out-of-Date Dependencies?
Samim Mirhosseini and
Chris Parnin
(North Carolina State University, USA)
Developers neglect to update legacy software dependencies, resulting in buggy and insecure software. One explanation for this neglect is the difficulty of constantly checking for the availability of new software updates, verifying their safety, and addressing any migration efforts needed when upgrading a dependency. Emerging tools attempt to address this problem by introducing automated pull requests and project badges to inform the developer of stale dependencies. To understand whether these tools actually help developers, we analyzed 7,470 GitHub projects that used these notification mechanisms to identify any change in upgrade behavior. Our results find that, on average, projects that use pull request notifications upgraded 1.6x as often as projects that did not use any tools. Badge notifications were slightly less effective: users upgraded 1.4x more frequently. Unfortunately, although pull request notifications are useful, developers are often overwhelmed by notifications: only a third of pull requests were actually merged. Through a survey, 62 developers indicated that their most significant concerns are breaking changes, understanding the implications of changes, and migration effort. The implications of our work suggest ways in which notifications can be improved to better align with developers' expectations, as well as the need for new mechanisms to reduce notification fatigue and improve confidence in automated pull requests.
@InProceedings{ASE17p84,
author = {Samim Mirhosseini and Chris Parnin},
title = {Can Automated Pull Requests Encourage Software Developers to Upgrade Out-of-Date Dependencies?},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {84--94},
doi = {},
year = {2017},
}
Are Developers Aware of the Architectural Impact of Their Changes?
Matheus Paixao,
Jens Krinke,
DongGyun Han,
Chaiyong Ragkhitwetsagul, and
Mark Harman
(University College London, UK)
Although considered one of the most important decisions in a software development lifecycle, empirical evidence on how developers perform and perceive architectural changes is still scarce. Given the large implications of architectural decisions, we do not know whether developers are aware of their changes' impact on the software's architecture, whether awareness leads to better changes, and whether automatically making developers aware would prevent degradation. Therefore, we use code review data of 4 open source systems to investigate the intent and awareness of developers when performing changes. We extracted 8,900 reviews for which the commits are available. 2,152 of the commits have changes in their computed architectural metrics, and 338 present significant changes to the architecture. We manually inspected all reviews for commits with significant changes and found that developers discuss the impact of their changes on the architectural structure in only 38% of the cases, suggesting a lack of awareness. Finally, we observed that developers tend to be more aware of the architectural impact of their changes when the architectural structure is improved, suggesting that developers should be automatically made aware when their changes degrade the architectural structure.
@InProceedings{ASE17p95,
author = {Matheus Paixao and Jens Krinke and DongGyun Han and Chaiyong Ragkhitwetsagul and Mark Harman},
title = {Are Developers Aware of the Architectural Impact of Their Changes?},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {95--105},
doi = {},
year = {2017},
}
SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions
Toufique Ahmed,
Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi
(Bangladesh University of Engineering and Technology, Bangladesh; Southern Illinois University at Carbondale, USA)
Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. Toward this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performance of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.
@InProceedings{ASE17p106,
author = {Toufique Ahmed and Amiangshu Bosu and Anindya Iqbal and Shahram Rahimi},
title = {SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {106--111},
doi = {},
year = {2017},
}
Documentation
Tue, Oct 31, 13:30 - 15:30, Illini Room A (Chair: Paul Grünbacher)
Detecting Fragile Comments
Inderjot Kaur Ratol and
Martin P. Robillard
(McGill University, Canada)
Refactoring is a common software development practice and many simple refactorings can be performed automatically by tools. Identifier renaming is a widely used refactoring. With tool support, rename refactorings can rely on the program structure to ensure correctness of the code transformation. Unfortunately, the textual references to the renamed identifier present in the unstructured comment text cannot be formally detected through the syntax of the language, and are thus fragile with respect to identifier renaming. We designed a new rule-based approach to detect fragile comments. Our approach, called Fraco, takes into account the type of identifier, its morphology, the scope of the identifier and the location of comments. We evaluated the approach by comparing its precision and recall against hand-annotated benchmarks created for six Java systems and compared the results against the performance of Eclipse's automated in-comment identifier replacement feature. Fraco performed with near-optimal precision and recall on most components of our evaluation dataset, and generally outperformed the baseline Eclipse feature. As part of our evaluation, we also noted that more than half of the total number of identifiers in our data set had fragile comments after renaming, which further motivates the need for research on automatic comment refactoring.
@InProceedings{ASE17p112,
author = {Inderjot Kaur Ratol and Martin P. Robillard},
title = {Detecting Fragile Comments},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {112--122},
doi = {},
year = {2017},
}
Improving Software Text Retrieval using Conceptual Knowledge in Source Code
Zeqi Lin, Yanzhen Zou, Junfeng Zhao, and Bing Xie
(Peking University, China)
A large software project usually has lots of various textual learning resources about its API, such as tutorials, mailing lists, user forums, etc. Text retrieval technology allows developers to search these API learning resources for related documents using free-text queries, but it suffers from the lexical gap between search queries and documents. In this paper, we propose a novel re-ranking approach for improving the retrieval of API learning resources through leveraging software-specific conceptual knowledge in software source code. The basic idea behind this approach is that the semantic relatedness between queries and documents can be measured according to the software-specific concepts involved in them, and software source code contains a large amount of software-specific conceptual knowledge. In detail, we first extract an API graph from software source code and use it as software-specific conceptual knowledge. Then we discover API entities involved in queries and documents, and infer semantic document relatedness through analyzing structural relationships between these API entities. We evaluate our approach in three popular open source software projects. Compared to state-of-the-art text retrieval approaches, our approach leads to at least a 13.77% improvement with respect to mean average precision (MAP).
@InProceedings{ASE17p123,
author = {Zeqi Lin and Yanzhen Zou and Junfeng Zhao and Bing Xie},
title = {Improving Software Text Retrieval using Conceptual Knowledge in Source Code},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {123--134},
doi = {},
year = {2017},
}
Automatically Generating Commit Messages from Diffs using Neural Machine Translation
Siyuan Jiang, Ameer Armaly, and Collin McMillan
(University of Notre Dame, USA)
Commit messages are a valuable resource in comprehension of software evolution, since they provide a record of changes such as feature additions and bug repairs. Unfortunately, programmers often neglect to write good commit messages. Different techniques have been proposed to help programmers by automatically writing these messages. These techniques are effective at describing what changed, but are often verbose and lack context for understanding the rationale behind a change. In contrast, humans write messages that are short and summarize the high level rationale. In this paper, we adapt Neural Machine Translation (NMT) to automatically ``translate'' diffs into commit messages. We trained an NMT algorithm using a corpus of diffs and human-written commit messages from the top 1k Github projects. We designed a filter to help ensure that we only trained the algorithm on higher-quality commit messages. Our evaluation uncovered a pattern in which the messages we generate tend to be either very high or very low quality. Therefore, we created a quality-assurance filter to detect cases in which we are unable to produce good messages, and return a warning instead.
@InProceedings{ASE17p135,
author = {Siyuan Jiang and Ameer Armaly and Collin McMillan},
title = {Automatically Generating Commit Messages from Diffs using Neural Machine Translation},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {135--146},
doi = {},
year = {2017},
}
Improving Missing Issue-Commit Link Recovery using Positive and Unlabeled Data
Yan Sun, Celia Chen, Qing Wang, and Barry Boehm
(University at Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China; Occidental College, USA; University of Southern California, USA)
Links between issue reports and corresponding fix commits are widely used in software maintenance. The quality of links directly affects maintenance costs. Currently, such links are mainly maintained by error-prone manual efforts, which may result in missing links. To tackle this problem, automatic link recovery approaches have been proposed by building traditional classifiers with positive and negative links. However, these traditional classifiers may not perform well due to the inherent characteristics of missing links. Positive links, which can be used to build a link recovery model, are quite limited as a result of missing links. Since the construction of negative links depends on the number of positive links in many existing approaches, the available negative links also become restricted. In this paper, we point out that it is better to consider the missing link problem as a model learning problem using positive and unlabeled data, rather than as the construction of a traditional classifier. We propose PULink, an approach that constructs the link recovery model with positive and unlabeled links. Our experiment results show that, compared to existing state-of-the-art technologies built on traditional classifiers, PULink can achieve competitive performance by utilizing only 70% of the positive links used in those approaches.
@InProceedings{ASE17p147,
author = {Yan Sun and Celia Chen and Qing Wang and Barry Boehm},
title = {Improving Missing Issue-Commit Link Recovery using Positive and Unlabeled Data},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {147--152},
doi = {},
year = {2017},
}
APIBot: Question Answering Bot for API Documentation
Yuan Tian, Ferdian Thung, Abhishek Sharma, and
David Lo
(Singapore Management University, Singapore)
As the carrier of Application Programming Interfaces (APIs) knowledge, API documentation plays a crucial role in how developers learn and use an API. It is also a valuable information resource for answering API-related questions, especially when developers cannot find reliable answers to their questions online/offline. However, finding answers to API-related questions from API documentation might not be easy because one may have to manually go through multiple pages before reaching the relevant page, and then read and understand the information inside the relevant page to figure out the answers. To deal with this challenge, we develop APIBot, a bot that can answer API questions given API documentation as an input. APIBot is built on top of SiriusQA, the QA system from Sirius, a state-of-the-art intelligent personal assistant. To make SiriusQA work well in software engineering scenarios, we make several modifications to SiriusQA by injecting domain-specific knowledge. We evaluate APIBot on 92 API questions, answers of which are known to be present in the Java 8 documentation. Our experiment shows that APIBot can achieve a Hit@5 score of 0.706.
@InProceedings{ASE17p153,
author = {Yuan Tian and Ferdian Thung and Abhishek Sharma and David Lo},
title = {APIBot: Question Answering Bot for API Documentation},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {153--158},
doi = {},
year = {2017},
}
Automatic Summarization of API Reviews
Gias Uddin and
Foutse Khomh
(McGill University, Canada; Polytechnique Montréal, Canada)
With the proliferation of online developer forums as informal documentation, developers often share their opinions about the APIs they use. However, given the plethora of opinions available for an API in various online developer forums, it can be challenging for a developer to make informed decisions about the APIs. While automatic summarization of opinions has been explored for other domains (e.g., cameras, cars), we found little research that investigates the benefits of summaries of public API reviews. In this paper, we present two algorithms (statistical and aspect-based) to summarize opinions about APIs. To investigate the usefulness of the techniques, we developed Opiner, an online opinion summarization engine that presents summaries of opinions using both our proposed techniques and six existing off-the-shelf techniques. We investigated the usefulness of Opiner using two case studies, both involving professional software engineers. We found that developers were interested in using our proposed summaries much more frequently than other summaries (daily vs. once a year) and that, when combined with Stack Overflow, Opiner helped developers to make the right decision with more accuracy and confidence and in less time.
@InProceedings{ASE17p159,
author = {Gias Uddin and Foutse Khomh},
title = {Automatic Summarization of API Reviews},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {159--170},
doi = {},
year = {2017},
}
Formal Verification
Tue, Oct 31, 13:30 - 15:30, Illini Room B (Chair: Bernd Fischer)
iCoq: Regression Proof Selection for Large-Scale Verification Projects
Ahmet Celik, Karl Palmskog, and
Milos Gligoric
(University of Texas at Austin, USA; University of Illinois at Urbana-Champaign, USA)
Proof assistants such as Coq are used to construct and check formal proofs in many large-scale verification projects. As proofs grow in number and size, the need for tool support to quickly find failing proofs after revising a project increases. We present a technique for large-scale regression proof selection, suitable for use in continuous integration services, e.g., Travis CI. We instantiate the technique in a tool dubbed iCoq. iCoq tracks fine-grained dependencies between Coq definitions, propositions, and proofs, and only checks those proofs affected by changes between two revisions. iCoq additionally saves time by ignoring changes with no impact on semantics. We applied iCoq to track dependencies across many revisions in several large Coq projects and measured the time savings compared to proof checking from scratch and when using Coq's timestamp-based toolchain for incremental checking. Our results show that proof checking with iCoq is up to 10 times faster than the former and up to 3 times faster than the latter.
@InProceedings{ASE17p171,
author = {Ahmet Celik and Karl Palmskog and Milos Gligoric},
title = {iCoq: Regression Proof Selection for Large-Scale Verification Projects},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {171--182},
doi = {},
year = {2017},
}
More Effective Interpolations in Software Model Checking
Cong Tian, Zhao Duan, Zhenhua Duan, and C.-H. Luke Ong
(Xidian University, China; University of Oxford, UK)
An approach to CEGAR-based model checking which has proved successful on large models employs Craig interpolation to efficiently construct parsimonious abstractions. Following this design, we introduce two new applications of Craig interpolation, universal safety interpolants and existential error interpolants, that can systematically reduce the program state space to be explored for safety verification. Whenever the universal safety interpolant is implied by the current path, all paths emanating from that location are guaranteed to be safe. Dually, whenever the existential error interpolant is implied by the current path, there is guaranteed to be an unsafe path from the location. We show how these interpolants are computed and applied in safety verification. We have implemented our approach in a tool named InterpChecker by building on an open source software model checker. Experiments on a large number of benchmark programs show that both the interpolations and the auxiliary optimization strategies are effective in improving the scalability of software model checking.
@InProceedings{ASE17p183,
author = {Cong Tian and Zhao Duan and Zhenhua Duan and C.-H. Luke Ong},
title = {More Effective Interpolations in Software Model Checking},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {183--193},
doi = {},
year = {2017},
}
Proof-Based Coverage Metrics for Formal Verification
Elaheh Ghassabani, Andrew Gacek, Michael W. Whalen, Mats P. E. Heimdahl, and Lucas Wagner
(University of Minnesota, USA; Rockwell Collins, USA)
When using formal verification on critical software, an important question involves whether the properties used for analysis are sufficient to adequately constrain the behavior of an implementation model. To address this question, coverage metrics for property-based formal verification have been proposed. Existing metrics are usually based on mutation, where the implementation model is repeatedly modified and re-analyzed to determine whether mutant models are ``killed'' by the property set. These metrics tend to be very expensive to compute, as they involve many additional verification problems. This paper proposes an alternate family of metrics that can be computed using the recently introduced idea of Inductive Validity Cores (IVCs). IVCs determine a minimal set of model elements necessary to establish a proof. One of the proposed metrics is both rigorous and substantially cheaper to compute than mutation-based metrics. In addition, unlike the mutation-based techniques, the design elements marked as necessary by the metric are guaranteed to preserve provability. We demonstrate the metrics on a large corpus of examples.
@InProceedings{ASE17p194,
author = {Elaheh Ghassabani and Andrew Gacek and Michael W. Whalen and Mats P. E. Heimdahl and Lucas Wagner},
title = {Proof-Based Coverage Metrics for Formal Verification},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {194--199},
doi = {},
year = {2017},
}
Model Checker Execution Reports
Rodrigo Castaño, Víctor Braberman, Diego Garbervetsky, and Sebastian Uchitel
(University of Buenos Aires, Argentina; CONICET, Argentina)
Software model checking constitutes an undecidable problem and, as such, even an ideal tool will in some cases fail to give a conclusive answer. In practice, software model checkers fail often and usually do not provide any information on what was effectively checked. The purpose of this work is to provide a conceptual framing for extending software model checkers in a way that allows users to access information about incomplete checks. We characterize the information that model checkers themselves can provide, in terms of analyzed traces, i.e., sequences of statements, and safe cones, and present the notion of execution reports (ERs), which we also formalize. We instantiate these concepts for a family of techniques based on Abstract Reachability Trees and implement the approach using the software model checker CPAchecker. We evaluate our approach empirically and provide examples to illustrate the execution reports produced and the information that can be extracted.
@InProceedings{ASE17p200,
author = {Rodrigo Castaño and Víctor Braberman and Diego Garbervetsky and Sebastian Uchitel},
title = {Model Checker Execution Reports},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {200--205},
doi = {},
year = {2017},
}
Modular Verification of Interrupt-Driven Software
Chungha Sung, Markus Kusano, and Chao Wang
(University of Southern California, USA; Virginia Tech, USA)
Interrupts have been widely used in safety-critical computer systems to handle outside stimuli and interact with the hardware, but reasoning about interrupt-driven software remains a difficult task. Although a number of static verification techniques have been proposed for interrupt-driven software, they often rely on constructing a monolithic verification model. Furthermore, they do not precisely capture the complete execution semantics of interrupts such as nested invocations of interrupt handlers. To overcome these limitations, we propose an abstract interpretation framework for static verification of interrupt-driven software that first analyzes each interrupt handler in isolation as if it were a sequential program, and then propagates the result to other interrupt handlers. This iterative process continues until results from all interrupt handlers reach a fixed point. Since our method never constructs the global model, it avoids the up-front blowup in model construction that hampers existing, non-modular, verification techniques. We have evaluated our method on 35 interrupt-driven applications with a total of 22,541 lines of code. Our results show the method is able to quickly and more accurately analyze the behavior of interrupts.
@InProceedings{ASE17p206,
author = {Chungha Sung and Markus Kusano and Chao Wang},
title = {Modular Verification of Interrupt-Driven Software},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {206--216},
doi = {},
year = {2017},
}
BProVe: A Formal Verification Framework for Business Process Models
Flavio Corradini, Fabrizio Fornari, Andrea Polini, Barbara Re, Francesco Tiezzi, and Andrea Vandin
(University of Camerino, Italy; DTU, Denmark)
Business Process Modelling has acquired increasing relevance in software development. Available notations, such as BPMN, make it possible to describe the activities of complex organisations. On the one hand, this shortens the communication gap between domain experts and IT specialists. On the other hand, it clarifies the characteristics of software systems introduced to provide automatic support for such activities. Nevertheless, the lack of formal semantics hinders the automatic verification of relevant properties.
This paper presents a novel verification framework for BPMN 2.0, called BProVe. It is based on an operational semantics, implemented using MAUDE, devised to make the verification general and effective. A complete tool chain, based on the Eclipse modelling environment, allows for rigorous modelling and analysis of Business Processes. The approach has been validated using more than one thousand models available on a publicly accessible repository. Besides showing the performance of BProVe, this validation demonstrates its practical benefits in identifying correctness issues in real models.
@InProceedings{ASE17p217,
author = {Flavio Corradini and Fabrizio Fornari and Andrea Polini and Barbara Re and Francesco Tiezzi and Andrea Vandin},
title = {BProVe: A Formal Verification Framework for Business Process Models},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {217--228},
doi = {},
year = {2017},
}
Security
Tue, Oct 31, 13:30 - 15:30, Illini Room C (Chair: Jeff Huang)
Static Detection of Asymptotic Resource Side-Channel Vulnerabilities in Web Applications
Jia Chen, Oswaldo Olivo, Isil Dillig, and Calvin Lin
(University of Texas at Austin, USA)
Web applications can leak confidential user information due to the presence of unintended side-channel vulnerabilities in code. One particularly subtle class of side-channel vulnerabilities arises due to resource usage imbalances along different execution paths of a program. Such side-channel vulnerabilities are especially severe if the resource usage imbalance is asymptotic. In particular, if the resource usage is constant along one branch but asymptotically dependent on user input along another branch, the attacker can arbitrarily inflate the program's input to amplify differences in resource usage, with the goal of inferring confidential user data. This paper formalizes the notion of asymptotic resource side-channels and presents a lightweight static analysis algorithm for automatically detecting such vulnerabilities in web applications. Based on these ideas, we have developed a tool called SCANNER for detecting resource-related side-channel vulnerabilities in PHP applications. SCANNER has found 18 zero-day security vulnerabilities in 10 different web applications and reports only 2 false positives. The vulnerabilities uncovered by SCANNER can be exploited using cross-site search attacks to extract various kinds of confidential information, such as a user's medications or purchase history.
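The asymptotic imbalance the abstract describes can be illustrated with a toy handler; the function below is a hypothetical illustration, not code from SCANNER or any audited application. One branch does constant work, while the other does work proportional to the size of the private data, so an attacker who can inflate the data or the query observes an amplified resource difference.

```python
def lookup(user_query, private_records):
    """Toy request handler (hypothetical): if the query matches a private
    record, the response is assembled from every record (cost grows with
    the data); otherwise the work is constant. The step counter stands in
    for the observable resource (time, response size)."""
    steps = 0
    if any(user_query in r for r in private_records):
        body = []
        for r in private_records:   # O(n) branch: cost depends on record count
            body.append(r)
            steps += 1
        return "found", steps
    return "none", steps + 1        # O(1) branch: constant cost

records = ["aspirin", "ibuprofen"] * 50       # 100 private records
print(lookup("aspirin", records)[1])  # prints 100: cost tracks the data size
print(lookup("zzz", records)[1])      # prints 1: constant cost
```

Doubling the record set doubles the cost of the matching branch while the non-matching branch stays at one step, which is exactly the asymptotic gap a cross-site search attack can measure.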
@InProceedings{ASE17p229,
author = {Jia Chen and Oswaldo Olivo and Isil Dillig and Calvin Lin},
title = {Static Detection of Asymptotic Resource Side-Channel Vulnerabilities in Web Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {229--239},
doi = {},
year = {2017},
}
PAD: Programming Third-Party Web Advertisement Censorship
Weihang Wang, Yonghwi Kwon, Yunhui Zheng, Yousra Aafer, I.-Luk Kim, Wen-Chuan Lee, Yingqi Liu, Weijie Meng, Xiangyu Zhang, and Patrick Eugster
(Purdue University, USA; IBM Research, USA; TU Darmstadt, Germany)
In current online advertisement delivery, an ad slot on a publisher's website may go through multiple layers of bidding and reselling until the final ad content is delivered. Publishers have little control over the ads being displayed on their web pages. As a result, website visitors may suffer from unwanted ads such as malvertising, intrusive ads, and information disclosure ads. Unfortunately, visitors often blame the publisher for their unpleasant experience and switch to competitor websites. In this paper, we propose a novel programming support system for ad delivery, called PAD, for publisher programmers, who specify their policies for regulating third-party ads shown on their websites. PAD features an expressive specification language and a novel persistent policy enforcement runtime that can self-install and self-protect throughout the entire ad delegation chain. It also provides an ad-specific memory protection scheme that prevents malvertising by corrupting malicious payloads. Our experiments show that PAD has negligible runtime overhead. It effectively suppresses a set of malvertising cases and unwanted ad behaviors reported in the real world, without affecting normal functionalities and regular ads.
@InProceedings{ASE17p240,
author = {Weihang Wang and Yonghwi Kwon and Yunhui Zheng and Yousra Aafer and I.-Luk Kim and Wen-Chuan Lee and Yingqi Liu and Weijie Meng and Xiangyu Zhang and Patrick Eugster},
title = {PAD: Programming Third-Party Web Advertisement Censorship},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {240--251},
doi = {},
year = {2017},
}
All about Activity Injection: Threats, Semantics, and Detection
Sungho Lee, Sungjae Hwang, and Sukyoung Ryu
(KAIST, South Korea; LG Electronics, South Korea)
Android supports a seamless user experience by maintaining activities from different apps in the same activity stack. While such close inter-app communication is essential in the Android framework, it also opens vulnerabilities that allow malicious activities to be injected into a victim app's activity stack to hijack user interaction flows. In this paper, we demonstrate activity injection attacks with a simple malware, and formally specify the activity activation mechanism using operational semantics. Based on the operational semantics, we develop a static analysis tool, which analyzes Android apps to detect activity injection attacks. Our tool is fast enough to analyze real-world Android apps in 6 seconds on average, and our experiments found that 1,761 out of 129,756 real-world Android apps inject their activities into other apps' tasks.
@InProceedings{ASE17p252,
author = {Sungho Lee and Sungjae Hwang and Sukyoung Ryu},
title = {All about Activity Injection: Threats, Semantics, and Detection},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {252--262},
doi = {},
year = {2017},
}
Detecting Information Flow by Mutating Input Data
Björn Mathis, Vitalii Avdiienko, Ezekiel O. Soremekun, Marcel Böhme, and Andreas Zeller
(Saarland University, Germany; National University of Singapore, Singapore)
Analyzing information flow is central to assessing the security of applications. However, static and dynamic analyses of information flow are easily challenged by unavailable or obscure code. We present a lightweight mutation-based analysis that systematically mutates dynamic values returned by sensitive sources to assess whether the mutation changes the values passed to sensitive sinks. If so, we have found a flow between source and sink. In contrast to existing techniques, mutation-based flow analysis does not attempt to identify the specific path of the flow and is thus resilient to obfuscation.
In its evaluation, our MUTAFLOW prototype for Android programs showed that mutation-based flow analysis is a lightweight yet effective complement to existing tools. Compared to the popular FLOWDROID static analysis tool, MUTAFLOW requires less than 10% of source code lines but has similar accuracy; on 20 tested real-world apps, it is able to detect 75 flows that FLOWDROID misses.
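The core idea, mutating a source value and observing whether the sink value changes, can be sketched in a few lines. Everything below (`detect_flow`, the `leaky`/`clean` toy programs) is a hypothetical illustration of the principle, not MUTAFLOW's actual instrumentation, which operates on Android apps.

```python
def detect_flow(program, source_value, mutants):
    """Mutation-based flow analysis (sketch): report a flow from source to
    sink when some mutation of the source value changes the value observed
    at the sink. No path through the program is ever inspected."""
    baseline = program(source_value)                 # value reaching the sink
    return any(program(m) != baseline for m in mutants)

# Toy sinks: one leaks part of the device ID, one emits a constant.
leaky = lambda device_id: "payload:" + device_id[-4:]
clean = lambda device_id: "payload:constant"

print(detect_flow(leaky, "8675309", ["0000000", "1111111"]))  # prints True
print(detect_flow(clean, "8675309", ["0000000", "1111111"]))  # prints False
```

Because only input/output behavior is compared, the verdict is unaffected by how the value travels through the program, which is why the approach tolerates obfuscated or unavailable code.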
@InProceedings{ASE17p263,
author = {Björn Mathis and Vitalii Avdiienko and Ezekiel O. Soremekun and Marcel Böhme and Andreas Zeller},
title = {Detecting Information Flow by Mutating Input Data},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {263--273},
doi = {},
year = {2017},
}
Automatically Assessing Crashes from Heap Overflows
Liang He, Yan Cai, Hong Hu, Purui Su, Zhenkai Liang, Yi Yang, Huafeng Huang, Jia Yan, Xiangkun Jia, and Dengguo Feng
(Institute of Software at Chinese Academy of Sciences, China; National University of Singapore, Singapore)
Heap overflow is one of the most widely exploited vulnerabilities, with a large number of heap overflow instances reported every year. It is important to decide whether a crash caused by heap overflow can be turned into an exploit. Efficient and effective assessment of the exploitability of crashes helps identify severe vulnerabilities and thus prioritize resources. In this paper, we propose the first metrics to assess heap overflow crashes based on both the attack aspect and the feasibility aspect. We further present HCSIFTER, a novel solution to automatically assess the exploitability of heap overflow instances under our metrics. Given a heap-based crash, HCSIFTER accurately detects heap overflows through dynamic execution without any source code or debugging information. It then uses several novel methods to extract the program execution information needed to quantify the severity of the heap overflow using our metrics. We have implemented a prototype of HCSIFTER and applied it to assess nine programs with heap overflow vulnerabilities. HCSIFTER reports that five heap overflow vulnerabilities are highly exploitable and two are unlikely to be exploitable. It also gives quantitative assessments for the other two programs. On average, it takes only about two minutes to assess one heap overflow crash. The evaluation results demonstrate both the effectiveness and efficiency of HCSIFTER.
@InProceedings{ASE17p274,
author = {Liang He and Yan Cai and Hong Hu and Purui Su and Zhenkai Liang and Yi Yang and Huafeng Huang and Jia Yan and Xiangkun Jia and Dengguo Feng},
title = {Automatically Assessing Crashes from Heap Overflows},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {274--279},
doi = {},
year = {2017},
}
Learning to Share: Engineering Adaptive Decision-Support for Online Social Networks
Yasmin Rafiq, Luke Dickens, Alessandra Russo, Arosha K. Bandara, Mu Yang, Avelie Stuart, Mark Levine, Gul Calikli, Blaine A. Price, and Bashar Nuseibeh
(Imperial College London, UK; University College London, UK; Open University, UK; University of Southampton, UK; University of Exeter, UK; Chalmers University of Technology, Sweden; University of Gothenburg, Sweden; Lero, Ireland)
Some online social networks (OSNs) allow users to define friendship-groups as reusable shortcuts for sharing information with multiple contacts. Posting exclusively to a friendship-group gives some privacy control, while supporting communication with (and within) this group. However, recipients of such posts may want to reuse content for their own social advantage, and can bypass existing controls by copy-pasting into a new post; this cross-posting poses privacy risks.
This paper presents a learning to share approach that enables the incorporation of more nuanced privacy controls into OSNs. Specifically, we propose a reusable, adaptive software architecture that uses rigorous runtime analysis to help OSN users to make informed decisions about suitable audiences for their posts. This is achieved by supporting dynamic formation of recipient-groups that benefit social interactions while reducing privacy risks. We exemplify the use of our approach in the context of Facebook.
@InProceedings{ASE17p280,
author = {Yasmin Rafiq and Luke Dickens and Alessandra Russo and Arosha K. Bandara and Mu Yang and Avelie Stuart and Mark Levine and Gul Calikli and Blaine A. Price and Bashar Nuseibeh},
title = {Learning to Share: Engineering Adaptive Decision-Support for Online Social Networks},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {280--285},
doi = {},
year = {2017},
}
Mobile Development
Tue, Oct 31, 16:00 - 17:30, Illini Room A (Chair: Mario Linares-Vásquez)
UI Driven Android Application Reduction
Jianjun Huang, Yousra Aafer, David Perry, Xiangyu Zhang, and Chen Tian
(Purdue University, USA; Huawei, USA)
While smartphones and mobile apps have been an integral part of our life, modern mobile apps tend to contain a lot of rarely used functionalities. For example, applications contain advertisements and offer extra features such as recommended news stories in weather apps. While these functionalities are not essential to an app, they nonetheless consume power, CPU cycles, and bandwidth. In this paper, we design a UI driven approach that allows customizing an Android app by removing its unwanted functionalities. In particular, our technique displays the UI and allows the user to select elements denoting functionalities that she wants to remove. Using this information, our technique automatically removes all the code elements related to the selected functionalities, including all the relevant background tasks. The underlying analysis is a type system, in which each code element is tagged with a type indicating if it should be removed. From the UI hints, our technique infers types for all other code elements and reduces the app accordingly. We implement a prototype and evaluate it on 10 real-world Android apps. The results show that our approach can accurately discover the removable code elements and lead to substantial resource savings in the reduced apps.
@InProceedings{ASE17p286,
author = {Jianjun Huang and Yousra Aafer and David Perry and Xiangyu Zhang and Chen Tian},
title = {UI Driven Android Application Reduction},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {286--296},
doi = {},
year = {2017},
}
SimplyDroid: Efficient Event Sequence Simplification for Android Application
Bo Jiang, Yuxuan Wu, Teng Li, and W. K. Chan
(Beihang University, China; City University of Hong Kong, China)
To ensure the quality of Android applications, many automatic test case generation techniques have been proposed. Among them, the Monkey fuzz testing tool and its variants are simple, effective, and widely applicable. However, one major drawback of the family of Monkey tools is that they often generate a large number of events in a failure-inducing input trace, which makes follow-up debugging activities hard to apply. It is desirable to simplify or reduce the input event sequence while still triggering the same failure. In this paper, we propose an efficient event trace representation and the SimplyDroid tool, with three hierarchical delta-debugging algorithms, each operating on this trace representation, to simplify crash traces. We have evaluated SimplyDroid on a suite of real-life Android applications with 92 crash traces. The empirical results show that our new algorithms in SimplyDroid are both efficient and effective in reducing these event traces.
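SimplyDroid's algorithms build on delta debugging; the classic flat ddmin loop they refine can be sketched as follows. This is the textbook algorithm over an abstract crash predicate, assuming nothing about SimplyDroid's hierarchical trace representation; the toy `crash` predicate is an illustration only.

```python
def ddmin(trace, triggers_crash):
    """Classic delta debugging (ddmin): repeatedly try removing chunks of
    the event trace, keeping any reduced trace that still triggers the
    crash, until no chunk can be removed."""
    n = 2
    while len(trace) >= 2:
        chunk = max(1, len(trace) // n)
        subsets = [trace[i:i + chunk] for i in range(0, len(trace), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Complement of subset i: the trace with that chunk removed.
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if triggers_crash(complement):
                trace, n = complement, max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(trace):      # already at single-event granularity
                break
            n = min(len(trace), n * 2)
    return trace

# Toy failure: the crash occurs whenever events 3 and 7 both appear.
crash = lambda t: 3 in t and 7 in t
print(ddmin(list(range(10)), crash))  # prints [3, 7]
```

Each call to `triggers_crash` corresponds to a replay of the reduced event sequence on the device, which is why reducing the number of attempts (as SimplyDroid's hierarchical variants do) matters in practice.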
@InProceedings{ASE17p297,
author = {Bo Jiang and Yuxuan Wu and Teng Li and W. K. Chan},
title = {SimplyDroid: Efficient Event Sequence Simplification for Android Application},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {297--307},
doi = {},
year = {2017},
}
Automated Cross-Platform Inconsistency Detection for Mobile Apps
Mattia Fazzini and Alessandro Orso
(Georgia Institute of Technology, USA)
Testing of Android apps is particularly challenging due to the fragmentation of the Android ecosystem in terms of both devices and operating system versions. Developers must in fact ensure not only that their apps behave as expected, but also that the apps’ behavior is consistent across platforms. To support this task, we propose DiffDroid, a new technique that helps developers automatically find cross-platform inconsistencies (CPIs) in mobile apps. DiffDroid combines input generation and differential testing to compare the behavior of an app on different platforms and identify possible inconsistencies. Given an app, DiffDroid (1) generates test inputs for the app, (2) runs the app with these inputs on a reference device and builds a model of the app behavior, (3) runs the app with the same inputs on a set of other devices, and (4) compares the behavior of the app on these different devices with the model of its behavior on the reference device. We implemented DiffDroid and performed an evaluation of our approach on 5 benchmarks and over 130 platforms. Our results show that DiffDroid can identify CPIs on real apps efficiently and with a limited number of false positives. DiffDroid and our experimental infrastructure are publicly available.
@InProceedings{ASE17p308,
author = {Mattia Fazzini and Alessandro Orso},
title = {Automated Cross-Platform Inconsistency Detection for Mobile Apps},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {308--318},
doi = {},
year = {2017},
}
Binary Analysis
Thu, Nov 2, 13:30 - 15:30, Illini Room A (Chair: Cristian Cadar)
In-Memory Fuzzing for Binary Code Similarity Analysis
Shuai Wang and Dinghao Wu
(Pennsylvania State University, USA)
Detecting similar functions in binary executables serves as a foundation for many binary code analysis and reuse tasks. To date, recognizing similar components in binary code remains a challenge. Existing research employs either static or dynamic approaches to capture syntax- or semantics-level program features for comparison. However, there are multiple design limitations in previous work, which result in relatively high cost, low accuracy, and poor scalability, and thus severely impede practical use.
In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis. Our prototype tool IMF-SIM applies in-memory fuzzing to every function and collects traces of different kinds of program behaviors. The similarity score of two behavior traces is computed from their longest common subsequence. To compare two functions, a feature vector is generated whose elements are the similarity scores of the behavior trace-level comparisons. We train a machine learning model on labeled feature vectors; later, given the feature vector from comparing two functions, the trained model produces a final score representing the similarity of the two functions. We evaluate IMF-SIM against binaries compiled by different compilers, optimization levels, and commonly used obfuscation methods, in total over one thousand binary executables. Our evaluation shows that IMF-SIM notably outperforms existing tools with higher accuracy and a broader application scope.
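The trace-level comparison described above reduces to a longest-common-subsequence computation. A minimal sketch, with made-up behavior traces and a max-length normalization that is an assumption (the paper may normalize differently):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j],
                                                               dp[i][j - 1])
    return dp[len(a)][len(b)]

def trace_similarity(t1, t2):
    """Similarity of two behavior traces: LCS length normalized by the
    longer trace (normalization choice is ours, not the paper's)."""
    return lcs_len(t1, t2) / max(len(t1), len(t2)) if t1 or t2 else 1.0

# Hypothetical behavior traces collected from two functions under fuzzing.
trace_a = ["load", "cmp", "store", "ret"]
trace_b = ["load", "store", "call", "ret"]
print(trace_similarity(trace_a, trace_b))  # prints 0.75
```

In the full pipeline, one such score is computed per behavior kind, and the resulting vector of scores is fed to the trained classifier.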
@InProceedings{ASE17p319,
author = {Shuai Wang and Dinghao Wu},
title = {In-Memory Fuzzing for Binary Code Similarity Analysis},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {319--330},
doi = {},
year = {2017},
}
DSIbin: Identifying Dynamic Data Structures in C/C++ Binaries
Thomas Rupprecht, Xi Chen, David H. White, Jan H. Boockmann, Gerald Lüttgen, and Herbert Bos
(University of Bamberg, Germany; Microsoft, Canada; VU University Amsterdam, Netherlands)
Reverse engineering binary code is notoriously difficult, and understanding a binary’s dynamic data structures is especially so. Existing data structure analyzers are limited with respect to program comprehension: they do not detect complex structures such as skip lists, or lists running through nodes of different types such as in the Linux kernel’s cyclic doubly-linked list. They also do not reveal complex parent-child relationships between structures. The tool DSI remedies these shortcomings but requires source code, where type information on heap nodes is available.
We present DSIbin, a combination of DSI and the type excavator Howard for the inspection of C/C++ binaries. While a naive combination already improves upon related work, its precision is limited because Howard’s inferred types are often too coarse. To address this we auto-generate candidates of refined types based on speculative nested-struct detection and type merging; the plausibility of these hypotheses is then validated by DSI. We demonstrate via benchmarking that DSIbin detects data structures with high precision.
@InProceedings{ASE17p331,
author = {Thomas Rupprecht and Xi Chen and David H. White and Jan H. Boockmann and Gerald Lüttgen and Herbert Bos},
title = {DSIbin: Identifying Dynamic Data Structures in C/C++ Binaries},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {331--341},
doi = {},
year = {2017},
}
Towards Robust Instruction-Level Trace Alignment of Binary Code
Ulf Kargén and Nahid Shahmehri
(Linköping University, Sweden)
Program trace alignment is the process of establishing a correspondence between dynamic instruction instances in executions of two semantically similar but syntactically different programs. In this paper we present what is, to the best of our knowledge, the first method capable of aligning realistically long execution traces of real programs. To maximize generality, our method works entirely at the machine code level, i.e., it does not require access to source code. Moreover, the method is based entirely on dynamic analysis, which avoids the many challenges associated with static analysis of binary code and additionally makes our approach inherently resilient to, e.g., static code obfuscation. Therefore, we believe that our trace alignment method could prove to be a useful aid in many program analysis tasks, such as debugging, reverse-engineering, investigating plagiarism, and malware analysis. We empirically evaluate our method on 11 popular Linux programs, and show that it is capable of producing meaningful alignments in the presence of various code transformations such as optimization or obfuscation, and that it easily scales to traces with tens of millions of instructions.
@InProceedings{ASE17p342,
author = {Ulf Kargén and Nahid Shahmehri},
title = {Towards Robust Instruction-Level Trace Alignment of Binary Code},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {342--352},
doi = {},
year = {2017},
}
Testing Intermediate Representations for Binary Analysis
Soomin Kim, Markus Faerevaag, Minkyu Jung, SeungIl Jung, DongYeop Oh, JongHyup Lee, and Sang Kil Cha
(KAIST, South Korea; Gachon University, South Korea)
Binary lifting, which translates a binary executable to a high-level intermediate representation, is a primary step in binary analysis. Despite its importance, there are only a few existing approaches to testing the correctness of binary lifters. Furthermore, the existing approaches suffer from low test coverage, because they largely depend on random test case generation. In this paper, we present the design and implementation of the first systematic approach to testing binary lifters. We have evaluated the proposed system on 3 state-of-the-art binary lifters and found 24 previously unknown semantic bugs. Our result demonstrates that writing a precise binary lifter is extremely difficult, even for heavily tested projects.
@InProceedings{ASE17p353,
author = {Soomin Kim and Markus Faerevaag and Minkyu Jung and SeungIl Jung and DongYeop Oh and JongHyup Lee and Sang Kil Cha},
title = {Testing Intermediate Representations for Binary Analysis},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {353--364},
doi = {},
year = {2017},
}
From Failures to Faults
Tue, Oct 31, 16:00 - 17:30, Illini Room B (Chair: Marcelo d'Amorim)
Comprehensive Failure Characterization
Mitchell J. Gerrard and Matthew B. Dwyer
(University of Nebraska-Lincoln, USA)
There is often more than one way to trigger a fault. Standard static and dynamic approaches focus on exhibiting a single witness for a failing execution. In this paper, we study the problem of computing a comprehensive characterization which safely bounds all failing program behavior while exhibiting a diversity of witnesses for those failures. This information can be used to facilitate software engineering tasks ranging from fault localization and repair to quantitative program analysis for reliability. Our approach combines the results of overapproximating and underapproximating static analyses in an alternating iterative framework to produce upper and lower bounds on the failing input space of a program, which we call a comprehensive failure characterization (CFC). We evaluated a prototype implementation of this alternating framework on a set of 168 C programs from the SV-COMP benchmarks, and the data indicate that it is possible to efficiently, accurately, and safely characterize failure spaces.
@InProceedings{ASE17p365,
author = {Mitchell J. Gerrard and Matthew B. Dwyer},
title = {Comprehensive Failure Characterization},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {365--376},
doi = {},
year = {2017},
}
TrEKer: Tracing Error Propagation in Operating System Kernels
Nicolas Coppik, Oliver Schwahn, Stefan Winter, and Neeraj Suri
(TU Darmstadt, Germany)
Modern operating systems (OSs) consist of numerous interacting components, many of which are developed and maintained independently of one another. In monolithic systems, the boundaries of and interfaces between such components are not strictly enforced at runtime. Therefore, faults in individual components may directly affect other parts of the system in various ways. Software fault injection (SFI) is a testing technique to assess the resilience of a software system in the presence of faulty components. Unfortunately, SFI tests of OSs are inconclusive if they do not lead to observable failures, as corruptions of the internal software state may not be visible at its interfaces and, yet, affect the subsequent execution of the OS beyond the duration of the test. In this paper we present TrEKer, a fully automated approach for identifying how faulty OS components affect other parts of the system. TrEKer combines static and dynamic analyses to achieve efficient tracing on the granularity of memory accesses. We demonstrate TrEKer's ability to support SFI oracles by accurately tracing the effects of faults injected into three widely used Linux kernel modules.
@InProceedings{ASE17p377,
author = {Nicolas Coppik and Oliver Schwahn and Stefan Winter and Neeraj Suri},
title = {TrEKer: Tracing Error Propagation in Operating System Kernels},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {377--387},
doi = {},
year = {2017},
}
RuntimeSearch: Ctrl+F for a Running Program
Matúš Sulír and Jaroslav Porubän
(Technical University of Košice, Slovakia)
Developers often try to find occurrences of a certain term in a software system. Traditionally, a text search is limited to static source code files. In this paper, we introduce a simple approach, RuntimeSearch, where the given term is searched for in the values of all string expressions in a running program. When a match is found, the program is paused and its runtime properties can be explored with a traditional debugger. The feasibility and usefulness of RuntimeSearch are demonstrated on a medium-sized Java project.
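The idea of searching runtime string values rather than source text can be sketched with Python's tracing hook. RuntimeSearch itself instruments string expressions in Java and pauses in a debugger; the sketch below only samples a function's local string variables on each traced line, and `greet` is a made-up subject program.

```python
import sys

def runtime_search(term, fn, *args):
    """RuntimeSearch-style sketch: run fn under a trace hook and record
    every (function, variable) whose string value contains the term."""
    hits = []
    def tracer(frame, event, arg):
        if event == "line":
            for name, value in frame.f_locals.items():
                if isinstance(value, str) and term in value:
                    hit = (frame.f_code.co_name, name)
                    if hit not in hits:
                        hits.append(hit)
        return tracer          # keep tracing line events in this frame
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)     # always uninstall the hook
    return hits

def greet(user):
    message = "hello, " + user
    return message.upper()

print(runtime_search("alice", greet, "alice"))
```

A real implementation would pause at the first hit and hand control to a debugger instead of merely recording matches.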
@InProceedings{ASE17p388,
author = {Matúš Sulír and Jaroslav Porubän},
title = {RuntimeSearch: Ctrl+F for a Running Program},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {388--393},
doi = {},
year = {2017},
}
Program Comprehension
Wed, Nov 1, 10:30 - 12:30, Illini Room A (Chair: Chris Parnin)
Mining Implicit Design Templates for Actionable Code Reuse
Yun Lin, Guozhu Meng, Yinxing Xue, Zhenchang Xing, Jun Sun, Xin Peng, Yang Liu, Wenyun Zhao, and Jinsong Dong
(National University of Singapore, Singapore; Nanyang Technological University, Singapore; Australian National University, Australia; Singapore University of Technology and Design, Singapore; Fudan University, China; Griffith University, Australia)
In this paper, we propose an approach to detecting project-specific recurring designs in a code base and abstracting them into design templates as reuse opportunities. The mined templates allow programmers to make further customizations for generating new code. The generated code includes the code skeleton of the recurring design as well as semi-implemented code bodies annotated with comments to remind programmers of necessary modifications. We implemented our approach as an Eclipse plugin called MICoDe. We evaluated our approach with a reuse simulation experiment and a user study involving 16 participants. The results of our simulation experiment on 10 open source Java projects show that, to create a new similar feature with a design template, (1) on average 69% of the elements in the template can be reused and (2) on average 60% of the code of the new feature can be adopted from the template. Our user study further shows that, compared to participants adopting the copy-paste-modify strategy, those using MICoDe are more effective at understanding the overall design and more efficient at accomplishing the code reuse task.
@InProceedings{ASE17p394,
author = {Yun Lin and Guozhu Meng and Yinxing Xue and Zhenchang Xing and Jun Sun and Xin Peng and Yang Liu and Wenyun Zhao and Jinsong Dong},
title = {Mining Implicit Design Templates for Actionable Code Reuse},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {394--404},
doi = {},
year = {2017},
}
Video
Exploring Regular Expression Comprehension
Carl Chapman, Peipei Wang, and
Kathryn T. Stolee
(Sandia National Laboratories, USA; North Carolina State University, USA)
The regular expression (regex) is a powerful tool employed in a large variety of software engineering tasks. However, prior work has shown that regexes can be very complex and that it can be difficult for developers to compose and understand them. This work seeks to identify code smells that impact comprehension. We conduct an empirical study on 42 pairs of behaviorally equivalent but syntactically different regexes with 180 participants, evaluating the understandability of various regex language features. We further analyze regexes in GitHub to find community standards, i.e., the common usages of various features. We found that some regex representations are more understandable than others. For example, using a range (e.g., [0-9]) is often more understandable than a default character class (e.g., \d). We also found that the DFA size of a regex significantly affects comprehension for the regexes studied. The larger the DFA of a regex (up to size eight), the more understandable it was. Finally, we identify smelly and non-smelly regex representations based on a combination of community standards and understandability metrics.
@InProceedings{ASE17p405,
author = {Carl Chapman and Peipei Wang and Kathryn T. Stolee},
title = {Exploring Regular Expression Comprehension},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {405--416},
doi = {},
year = {2017},
}
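A small Python sketch of the kind of pair the study compares: two syntactically different regexes that accept exactly the same strings (our own example, not taken from the paper's dataset):

```python
import re

# Behaviorally equivalent but syntactically different regexes:
# an explicit range vs. the shorthand digit class.
# re.ASCII pins \d to exactly [0-9]; without it, \d also matches
# non-ASCII Unicode digits.
range_form = re.compile(r"^[0-9]+$")
shorthand_form = re.compile(r"^\d+$", re.ASCII)

# The two forms agree on every input tried here.
agree = all(
    bool(range_form.match(s)) == bool(shorthand_form.match(s))
    for s in ["123", "42a", "", "7", "00"]
)
```

The study's comprehension question is which of the two equivalent forms participants read more accurately, not whether they match differently.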
Info
Automatically Assessing Code Understandability: How Far Are We?
Simone Scalabrino,
Gabriele Bavota ,
Christopher Vendome,
Mario Linares-Vásquez ,
Denys Poshyvanyk , and
Rocco Oliveto
(University of Molise, Italy; University of Lugano, Switzerland; College of William and Mary, USA; Universidad de los Andes, Colombia)
Program understanding plays a pivotal role in software maintenance and evolution: a deep understanding of code is the stepping stone for most software-related activities, such as bug fixing or testing. Being able to measure the understandability of a piece of code might help in estimating the effort required for a maintenance activity, in comparing the quality of alternative implementations, or even in predicting bugs. Unfortunately, there are no existing metrics specifically designed to assess the understandability of a given code snippet. In this paper, we perform a first step in this direction by studying the extent to which several types of metrics computed on code, documentation, and developers correlate with code understandability. To perform such an investigation we ran a study with 46 participants who were asked to understand eight code snippets each. We collected a total of 324 evaluations aiming at assessing the perceived understandability, the actual level of understanding, and the time needed to understand a code snippet. Our results demonstrate that none of the (existing and new) metrics we considered is able to capture code understandability, not even the ones assumed to assess quality attributes strongly related to it, such as code readability and complexity.
@InProceedings{ASE17p417,
author = {Simone Scalabrino and Gabriele Bavota and Christopher Vendome and Mario Linares-Vásquez and Denys Poshyvanyk and Rocco Oliveto},
title = {Automatically Assessing Code Understandability: How Far Are We?},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {417--427},
doi = {},
year = {2017},
}
Info
Improved Query Reformulation for Concept Location using CodeRank and Document Structures
Mohammad Masudur Rahman and Chanchal K. Roy
(University of Saskatchewan, Canada)
During software maintenance, developers deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a. concept location). Unfortunately, studies suggest that developers often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique, ACER, that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests an effective reformulation of the initial query by exploiting source document structures, query quality analysis, and machine learning.
Experiments with 1,675 baseline queries from eight subject systems show that our technique can improve 71% of the baseline queries, which is highly promising. A comparison with five closely related existing query reformulation techniques not only validates our empirical findings but also demonstrates the superiority of our technique.
@InProceedings{ASE17p428,
author = {Mohammad Masudur Rahman and Chanchal K. Roy},
title = {Improved Query Reformulation for Concept Location using CodeRank and Document Structures},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {428--439},
doi = {},
year = {2017},
}
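The abstract describes CodeRank only as a graph-based term weight; as an assumption about its general flavor, the following generic PageRank-style sketch shows how terms can be weighted by their connectivity in a small co-occurrence graph (the graph and damping value are invented for illustration, and this is not the paper's actual CodeRank definition):

```python
# Generic PageRank-style term weighting over a toy term graph:
# each term distributes its weight evenly to the terms it links to.
graph = {
    "parse": ["token", "tree"],
    "token": ["parse"],
    "tree":  ["parse", "token"],
}

d = 0.85                                   # standard damping factor
rank = {t: 1.0 / len(graph) for t in graph}
for _ in range(50):                        # power iteration to convergence
    rank = {
        t: (1 - d) / len(graph)
           + d * sum(rank[u] / len(graph[u]) for u in graph if t in graph[u])
        for t in graph
    }

# "parse" is linked to by both other terms, so it ranks highest and would
# be the strongest candidate search term.
top = max(rank, key=rank.get)
```

In the reformulation setting, the highest-ranked terms drawn from candidate source documents would be appended to the developer's initial query.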
Info
Understanding Feature Requests by Leveraging Fuzzy Method and Linguistic Analysis
Lin Shi, Celia Chen, Qing Wang, Shoubin Li, and Barry Boehm
(Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; University of Southern California, USA; Occidental College, USA)
In open software development environments, a large number of feature requests of mixed quality are posted by stakeholders and usually managed in issue tracking systems. Thoroughly understanding and analyzing the real intents that feature requests imply is a labor-intensive and challenging task. In this paper, we introduce an approach to understanding feature requests automatically. We generate a set of fuzzy rules based on natural language processing techniques that classify each sentence in a feature request into one of six categories: Intent, Explanation, Benefit, Drawback, Example, and Trivia. Consequently, feature requests can be automatically structured based on the classification results. We conduct experiments on 2,112 sentences taken from 602 feature requests of nine popular open-source projects. The results show that our method achieves high performance in classifying sentences from feature requests. Moreover, when fuzzy rules are applied on top of machine learning methods, the performance improves significantly.
@InProceedings{ASE17p440,
author = {Lin Shi and Celia Chen and Qing Wang and Shoubin Li and Barry Boehm},
title = {Understanding Feature Requests by Leveraging Fuzzy Method and Linguistic Analysis},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {440--450},
doi = {},
year = {2017},
}
Info
Models
Wed, Nov 1, 10:30 - 12:30, Illini Room B (Chair: Lars Grunske)
O2O Service Composition with Social Collaboration
Wenyi Qian, Xin Peng
, Jun Sun, Yijun Yu, Bashar Nuseibeh, and Wenyun Zhao
(Fudan University, China; Singapore University of Technology and Design, Singapore; Open University, UK; Lero, Ireland)
In Online-to-Offline (O2O) commerce, customer services may need to be composed from online and offline services. Such composition is challenging, as it requires effective selection of appropriate services that, in turn, supports an optimal combination of both online and offline services. In this paper, we address this challenge by proposing an approach to O2O service composition that combines offline route planning and social collaboration to optimize service selection. We frame general O2O service composition problems using timed automata and propose an optimization procedure that incorporates: (1) a Markov Chain Monte Carlo (MCMC) algorithm to stochastically select a concrete composite service, and (2) a model checking approach to searching for an optimal collaboration plan with the lowest cost under a given time constraint. Our procedure has been evaluated for effectiveness and scalability using the simulation of a rich scenario.
@InProceedings{ASE17p451,
author = {Wenyi Qian and Xin Peng and Jun Sun and Yijun Yu and Bashar Nuseibeh and Wenyun Zhao},
title = {O2O Service Composition with Social Collaboration},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {451--461},
doi = {},
year = {2017},
}
Gremlin-ATL: A Scalable Model Transformation Framework
Gwendal Daniel, Frédéric Jouault
, Gerson Sunyé, and
Jordi Cabot
(AtlanMod, France; Inria, France; Groupe ESEO, France; ICREA, Spain; Open University of Catalonia, Spain)
Industrial use of Model-Driven Engineering techniques has emphasized the need to efficiently store, access, and transform very large models. While scalable persistence frameworks, typically based on some kind of NoSQL database, have been proposed to solve the model storage issue, the same level of performance improvement has not been achieved for the model transformation problem. Existing model transformation tools (such as the well-known ATL) often require the input models to be loaded in memory prior to the start of the transformation and are not optimized to benefit from lazy-loading mechanisms, mainly due to their dependency on the low-level APIs offered by today's most popular modeling frameworks.
In this paper we present Gremlin-ATL, a scalable and efficient model-to-model transformation framework that translates ATL transformations into Gremlin, a query language supported by several NoSQL databases. With Gremlin-ATL, the transformation is computed within the database itself, bypassing the modeling framework limitations and improving its performance both in terms of execution time and memory consumption. Tool support is available online.
@InProceedings{ASE17p462,
author = {Gwendal Daniel and Frédéric Jouault and Gerson Sunyé and Jordi Cabot},
title = {Gremlin-ATL: A Scalable Model Transformation Framework},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {462--472},
doi = {},
year = {2017},
}
Diagnosing Assumption Problems in Safety-Critical Products
Mona Rahimi, Wandi Xiong, Jane Cleland-Huang, and
Robyn Lutz
(DePaul University, USA; Iowa State University, USA; University of Notre Dame, USA)
Problems with the correctness and completeness of environmental assumptions contribute to many accidents in safety-critical systems. The problem is exacerbated when products are modified in new releases or in new products of a product line. In such cases existing sets of environmental assumptions are often carried forward without sufficiently rigorous analysis. This paper describes a new technique that exploits the traceability required by many certifying bodies to reason about the likelihood that environmental assumptions are omitted or incorrectly retained in new products. An analysis of over 150 examples of environmental assumptions in historical systems informs the approach. In an evaluation on three safety-related product lines the approach caught all but one of the assumption-related problems. It also provided clearly defined steps for mitigating the identified issues. The contribution of the work is to arm the safety analyst with useful information for assessing the validity of environmental assumptions for a new product.
@InProceedings{ASE17p473,
author = {Mona Rahimi and Wandi Xiong and Jane Cleland-Huang and Robyn Lutz},
title = {Diagnosing Assumption Problems in Safety-Critical Products},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {473--484},
doi = {},
year = {2017},
}
Software Performance Self-Adaptation through Efficient Model Predictive Control
Emilio Incerto,
Mirco Tribastone , and
Catia Trubiani
(Gran Sasso Science Institute, Italy; IMT School for Advanced Studies Lucca, Italy)
A key challenge in software systems that are exposed to runtime variabilities, such as workload fluctuations and service degradation, is to continuously meet performance requirements. In this paper we present an approach that allows performance self-adaptation using a system model based on queuing networks (QNs), a well-assessed formalism for software performance engineering. Software engineers can select the adaptation knobs of a QN (routing probabilities, service rates, and concurrency level), and we automatically derive a Model Predictive Control (MPC) formulation suitable for continuously configuring the selected knobs and tracking the desired performance requirements. Previous MPC approaches have two main limitations: (i) high computational cost of the optimization, due to nonlinearity of the models; and (ii) a focus on long-run performance metrics only, due to the lack of tractable representations of the QN's time-course evolution. As a consequence, these limitations restrict adaptation to coarse time granularities and neglect the system's transient behavior. Our MPC adaptation strategy is efficient since it is based on mixed integer programming over a compact representation of a QN with ordinary differential equations. An extensive evaluation on an implementation of a load balancer demonstrates the effectiveness of the adaptation and compares it with traditional methods based on probabilistic model checking.
@InProceedings{ASE17p485,
author = {Emilio Incerto and Mirco Tribastone and Catia Trubiani},
title = {Software Performance Self-Adaptation through Efficient Model Predictive Control},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {485--496},
doi = {},
year = {2017},
}
Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis
Pooyan Jamshidi, Norbert Siegmund,
Miguel Velez,
Christian Kästner , Akshay Patel, and Yuvraj Agarwal
(Carnegie Mellon University, USA; Bauhaus-University Weimar, Germany)
Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the high-dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising for learning more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key pieces of knowledge that can be exploited for transfer learning. Our results show that for small environmental changes (e.g., a homogeneous workload change), applying a linear transformation to the performance model lets us understand the performance behavior of the target environment, while for severe environmental changes (e.g., a drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.
@InProceedings{ASE17p497,
author = {Pooyan Jamshidi and Norbert Siegmund and Miguel Velez and Christian Kästner and Akshay Patel and Yuvraj Agarwal},
title = {Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {497--508},
doi = {},
year = {2017},
}
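The linear-transformation idea can be sketched in a few lines of Python: if target-environment performance relates linearly to source-environment performance, a handful of paired measurements suffice to calibrate the transfer. The numbers below are synthetic stand-ins, not data from the study:

```python
# Hypothetical performance measurements (e.g., latency in ms) of the same
# four configurations in a source and a target environment.
source = [10.0, 20.0, 30.0, 40.0]
target = [25.0, 45.0, 65.0, 85.0]   # synthetic: target = 2*source + 5

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

a, b = fit_linear(source, target)
# Predict target-environment performance of an unmeasured configuration
# directly from its cheap source-environment measurement.
predicted = a * 25.0 + b
```

The study's point is that such a cheap calibration works only for small environmental changes; under drastic changes the linear relationship breaks down and only sampling knowledge transfers.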
Info
Reliability and Bugs
Wed, Nov 1, 13:30 - 15:30, Illini Room A (Chair: Michael Whalen)
A Comprehensive Study of Real-World Numerical Bug Characteristics
Anthony Di Franco,
Hui Guo, and Cindy Rubio-González
(University of California at Davis, USA)
Numerical software is used in a wide variety of applications including safety-critical systems, which have stringent correctness requirements, and whose failures have catastrophic consequences that endanger human life. Numerical bugs are known to be particularly difficult to diagnose and fix, largely due to the use of approximate representations of numbers such as floating point. Understanding the characteristics of numerical bugs is the first step to combat them more effectively. In this paper, we present the first comprehensive study of real-world numerical bugs. Specifically, we identify and carefully examine 269 numerical bugs from five widely-used numerical software libraries: NumPy, SciPy, LAPACK, GNU Scientific Library, and Elemental. We propose a categorization of numerical bugs, and discuss their frequency, symptoms, and fixes. Our study opens new directions in the areas of program analysis, testing, and automated program repair of numerical software, and provides a collection of real-world numerical bugs.
@InProceedings{ASE17p509,
author = {Anthony Di Franco and Hui Guo and Cindy Rubio-González},
title = {A Comprehensive Study of Real-World Numerical Bug Characteristics},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {509--519},
doi = {},
year = {2017},
}
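To give a flavor of the class of bugs such a study covers, here is a classic floating-point pitfall, catastrophic cancellation, in a generic Python illustration (not one of the 269 studied bugs):

```python
# Catastrophic cancellation: the naive one-pass variance formula
# E[x^2] - E[x]^2 subtracts two nearly equal huge numbers, so the true
# answer is lost to rounding; the two-pass formula stays accurate.
data = [1e8 + 1.0, 1e8 + 2.0, 1e8 + 3.0]   # true variance is 2/3

n = len(data)
mean = sum(data) / n
naive_var = sum(x * x for x in data) / n - mean * mean
stable_var = sum((x - mean) ** 2 for x in data) / n

# At this magnitude the naive result collapses to 0.0 in double
# precision, while the two-pass result is correct to machine precision.
```

Bugs of exactly this shape (symptom: wrong or impossible results; fix: an algebraically equivalent but numerically stable reformulation) recur throughout numerical libraries.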
A Comprehensive Study on Real World Concurrency Bugs in Node.js
Jie Wang,
Wensheng Dou, Yu Gao, Chushu Gao,
Feng Qin, Kang Yin, and Jun Wei
(Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; Ohio State University, USA)
Node.js is becoming increasingly popular for building server-side JavaScript applications. It adopts an event-driven model, which supports asynchronous I/O and non-deterministic event processing. This asynchrony and non-determinism can introduce intricate concurrency bugs and lead to unpredictable behaviors. An in-depth understanding of real-world concurrency bugs in Node.js applications will significantly promote effective techniques for bug detection, testing, and fixing in Node.js.
In this paper, we present NodeCB, a comprehensive study of real-world concurrency bugs in Node.js applications. Specifically, we have carefully studied 57 real bug cases from open-source Node.js applications and have analyzed their characteristics, e.g., bug patterns and root causes, bug impacts, bug manifestation, and fix strategies. Through this study, we obtain several interesting findings, which may open up many new research directions for combating concurrency bugs in Node.js. For example, one finding is that two-thirds of the bugs are caused by atomicity violations. However, due to the lack of locks and transaction mechanisms, Node.js cannot easily express and guarantee atomicity intentions.
@InProceedings{ASE17p520,
author = {Jie Wang and Wensheng Dou and Yu Gao and Chushu Gao and Feng Qin and Kang Yin and Jun Wei},
title = {A Comprehensive Study on Real World Concurrency Bugs in Node.js},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {520--531},
doi = {},
year = {2017},
}
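The atomicity-violation pattern the study highlights arises in any event-driven runtime. Since Node.js itself is JavaScript, the sketch below is a hypothetical Python asyncio analogue of the same bug shape: a check-then-act sequence broken by an interleaving at an await point, the event-loop counterpart of a yield to another handler:

```python
import asyncio

balance = 100

async def withdraw(amount):
    global balance
    if balance >= amount:           # check
        await asyncio.sleep(0)      # yields to the event loop: another
                                    # handler may run here, as in Node.js
        balance -= amount           # act: the check may now be stale

async def main():
    # Both withdrawals pass the check before either one acts.
    await asyncio.gather(withdraw(80), withdraw(80))
    return balance

final = asyncio.run(main())
# final is -60: the check-then-act pair was not atomic, so the account
# was overdrawn even though each handler checked the balance first.
```

With no locks around the await, the only fixes are restructuring the code so the check and act share one uninterrupted turn of the event loop, or adding an explicit synchronization mechanism.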
Source Code Analysis
Wed, Nov 1, 13:30 - 15:30, Illini Room B (Chair: Mark Hills)
Generating Simpler AST Edit Scripts by Considering Copy-and-Paste
Yoshiki Higo , Akio Ohtani, and Shinji Kusumoto
(Osaka University, Japan)
In software development, there are many situations in which developers need to understand given source code changes in detail. A variety of techniques have been proposed to support understanding source code changes. Tree-based differencing techniques are expected to offer better understandability than the text-based ones widely used today (e.g., diff in Unix). In this paper, we propose to consider copy-and-paste as a kind of editing action forming a tree-based edit script, i.e., an editing sequence that transforms one tree into another. Software developers often perform copy-and-paste when writing source code. Introducing a copy-and-paste action into edit scripts contributes not only to making edit scripts simpler (more easily understandable) but also to making them closer to developers' actual editing sequences. We conducted experiments on an open dataset. As a result, we confirmed that our technique made edit scripts shorter for 18% of the code changes, with slightly more computational time. For the other 82% of code changes, our technique generated the same edit scripts as an existing technique. We also confirmed that our technique provided more helpful visualizations.
Keywords: understanding code changes, edit scripts, copy-and-paste
@InProceedings{ASE17p532,
author = {Yoshiki Higo and Akio Ohtani and Shinji Kusumoto},
title = {Generating Simpler AST Edit Scripts by Considering Copy-and-Paste},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {532--542},
doi = {},
year = {2017},
}
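The intuition for why a copy action shortens edit scripts can be seen on a toy token sequence in Python; this illustrates the counting argument only, not the paper's tree-differencing algorithm, and the action names are invented:

```python
# A pasted-and-tweaked block costs one action per token when the script
# may only insert tokens, but collapses to one copy plus small updates
# when the script may also copy an existing region.
before = ["def", "f", "(", ")", ":", "return", "1"]
after = before + ["def", "g", "(", ")", ":", "return", "2"]

insert_only_script = [("insert", tok) for tok in after[len(before):]]
copy_aware_script = [
    ("copy", 0, len(before)),   # duplicate the whole original block
    ("update", "f", "g"),       # then patch the two tokens that differ
    ("update", "1", "2"),
]
# 7 insert actions vs. 3 copy-aware actions for the same change -- and
# the shorter script also mirrors what the developer actually did.
```

On real ASTs the saving applies to whole copied subtrees, which is where the 18% shorter scripts come from.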
Renaming and Shifted Code in Structured Merging: Looking Ahead for Precision and Performance
Olaf Leßenich,
Sven Apel,
Christian Kästner , Georg Seibt, and
Janet Siegmund
(University of Passau, Germany; Carnegie Mellon University, USA)
Diffing and merging of source-code artifacts is an essential task when integrating changes across software versions. While state-of-the-art line-based tools (e.g., git merge) are fast and independent of the programming language used, they have only a low precision. Recently, it has been shown that the precision of merging can be substantially improved by using a language-aware, structured approach that works on abstract syntax trees. However, precise structured merging is NP-hard, especially when considering the notoriously difficult scenarios of renamings and shifted code. To address these scenarios without compromising scalability, we propose a syntax-aware, heuristic optimization for structured merging that employs a lookahead mechanism during tree matching. The key idea is that renamings and shifted code are not arbitrarily distributed: their occurrence follows patterns, which we address with a syntax-specific lookahead. Our experiments with 48 real-world open-source projects (4878 merge scenarios with over 400 million lines of code) demonstrate that we can significantly improve matching precision in 28 percent of the scenarios while maintaining performance.
@InProceedings{ASE17p543,
author = {Olaf Leßenich and Sven Apel and Christian Kästner and Georg Seibt and Janet Siegmund},
title = {Renaming and Shifted Code in Structured Merging: Looking Ahead for Precision and Performance},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {543--553},
doi = {},
year = {2017},
}
Semantics-Assisted Code Review: An Efficient Toolchain and a User Study
Massimiliano Menarini, Yan Yan, and
William G. Griswold
(University of California at San Diego, USA)
Code changes are often reviewed before they are deployed. Popular source control systems aid code review by presenting textual differences between old and new versions of the code, leaving developers with the difficult task of determining whether the differences actually produced the desired behavior. Fortunately, we can mine such information from code repositories. We propose aiding code review with inter-version semantic differential analysis. During review of a new commit, a developer is presented with summaries of both code differences and behavioral differences, which are expressed as diffs of likely invariants extracted by running the system's test cases. As a result, developers can more easily determine that the code changes produced the desired effect. We created an invariant-mining toolchain, GETTY, to support our concept of semantically-assisted code review. To validate our approach, 1) we applied GETTY to the commits of 6 popular open source projects, 2) we assessed the performance and cost of running GETTY in different configurations, and 3) we performed a comparative user study with 18 developers. Our results demonstrate that semantically-assisted code review is feasible, effective, and that real programmers can leverage it to improve the quality of their reviews.
@InProceedings{ASE17p554,
author = {Massimiliano Menarini and Yan Yan and William G. Griswold},
title = {Semantics-Assisted Code Review: An Efficient Toolchain and a User Study},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {554--565},
doi = {},
year = {2017},
}
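The "diffs of likely invariants" presented to the reviewer can be pictured as a set difference between the invariants mined from each version. The invariants below are hypothetical illustrations, not GETTY output:

```python
# Likely invariants mined from test runs of two versions of a method.
old_invariants = {"balance >= 0", "result == balance - amount"}
new_invariants = {"result == balance - amount", "amount > 0"}

# The behavioral diff shown alongside the textual diff during review:
dropped = old_invariants - new_invariants      # no longer observed to hold
introduced = new_invariants - old_invariants   # newly observed to hold
# A reviewer seeing "balance >= 0" dropped can ask whether overdrafts
# were intended by the change.
```

The point of the presentation is that a behavioral regression can be visible in the invariant diff even when the textual diff looks innocuous.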
Detecting Unknown Inconsistencies in Web Applications
Frolin S. Ocariza, Jr.,
Karthik Pattabiraman , and
Ali Mesbah
(University of British Columbia, Canada)
Although there has been increasing demand for more reliable web applications, JavaScript bugs abound in web applications. In response to this issue, researchers have proposed automated fault detection tools, which statically analyze the web application code to find bugs. While useful, these tools either only target a limited set of bugs based on predefined rules, or they do not detect bugs caused by cross-language interactions, which occur frequently in web application code. To address this problem, we present an anomaly-based inconsistency detection approach, implemented in a tool called Holocron. The main novelty of our approach is that it does not look for hard-coded inconsistency classes. Instead, it applies subtree pattern matching to infer inconsistency classes and association rule mining to detect inconsistencies that occur both within a single language and between two languages. We evaluated Holocron, and it successfully detected 51 previously unreported inconsistencies (18 bugs and 33 code smells) in 12 web applications.
@InProceedings{ASE17p566,
author = {Frolin S. Ocariza, Jr. and Karthik Pattabiraman and Ali Mesbah},
title = {Detecting Unknown Inconsistencies in Web Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {566--577},
doi = {},
year = {2017},
}
Why and How JavaScript Developers Use Linters
Kristín Fjóla Tómasdóttir,
Maurício Aniche, and Arie van Deursen
(Delft University of Technology, Netherlands)
Automatic static analysis tools help developers to automatically spot code issues in their software. They can be of extreme value in languages with dynamic characteristics, such as JavaScript, where developers can easily introduce mistakes that go unnoticed for a long time, e.g., a simple syntactic or spelling mistake. Although research has already shown how developers perceive such tools for strongly typed languages such as Java, little is known about their perceptions when it comes to dynamic languages. In this paper, we investigate what motivates developers to use such tools in JavaScript projects and how they do so. To that end, we apply a qualitative research method to conduct and analyze a series of 15 interviews with the developers responsible for the linter configuration in reputable OSS JavaScript projects that apply the most commonly used linter, ESLint. The results describe the benefits developers obtain when using ESLint, the different ways one can configure the tool and prioritize its rules, and the existing challenges of applying linters in the real world. These results have direct implications for developers, tool makers, and researchers, such as tool improvements and a research agenda that aims to increase our knowledge about the usefulness of such analyzers.
@InProceedings{ASE17p578,
author = {Kristín Fjóla Tómasdóttir and Maurício Aniche and Arie van Deursen},
title = {Why and How JavaScript Developers Use Linters},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {578--589},
doi = {},
year = {2017},
}
Symbolic Execution
Wed, Nov 1, 16:00 - 17:30, Illini Room A (Chair: Marsha Chechik)
Automatic Testing of Symbolic Execution Engines via Program Generation and Differential Testing
Timotej Kapus and
Cristian Cadar
(Imperial College London, UK)
Symbolic execution has attracted significant attention in recent years, with applications in software testing, security, networking and more. Symbolic execution tools, like CREST, KLEE, FuzzBALL, and Symbolic PathFinder, have enabled researchers and practitioners to experiment with new ideas, scale the technique to larger applications and apply it to new application domains. Therefore, the correctness of these tools is of critical importance.
In this paper, we present our experience extending compiler testing techniques to find errors in both the concrete and symbolic execution components of symbolic execution engines. Our approach relies on a novel way to create program versions in three different testing modes (concrete, single-path, and multi-path), each exercising different features of symbolic execution engines. When combined with existing program generation techniques and appropriate oracles, this approach enables differential testing within a single symbolic execution engine.
We have applied our approach to the KLEE, CREST and FuzzBALL symbolic execution engines, where it has discovered 20 different bugs exposing a variety of important errors having to do with the handling of structures, division, modulo, casting, vector instructions and more, as well as issues related to constraint solving, compiler optimisations and test input replay.
@InProceedings{ASE17p590,
author = {Timotej Kapus and Cristian Cadar},
title = {Automatic Testing of Symbolic Execution Engines via Program Generation and Differential Testing},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {590--600},
doi = {},
year = {2017},
}
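Stripped to its core, differential testing compares computations that must agree on every input and flags any divergence without needing to know which side is correct. A minimal, generic Python sketch of that oracle (the two functions here are invented stand-ins, not the paper's engines or generated programs):

```python
import random

def reference(x):
    """Trusted implementation: plain squaring."""
    return x * x

def buggy(x):
    """A deliberately faulty variant: wrong for negative inputs."""
    return x * abs(x)

# Drive both implementations with the same generated inputs and collect
# every input on which they disagree.
random.seed(0)
inputs = [random.randint(-100, 100) for _ in range(1000)]
mismatches = [x for x in inputs if reference(x) != buggy(x)]
# Every negative input exposes the divergence; the harness reports the
# failing inputs without an external specification of correct behavior.
```

In the paper's setting, the "two sides" are, e.g., the concrete and symbolic execution of the same generated program, which is what makes differential testing possible within a single engine.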
Floating-Point Symbolic Execution: A Case Study in N-Version Programming
Daniel Liew,
Daniel Schemmel,
Cristian Cadar ,
Alastair F. Donaldson , Rafael Zähl, and Klaus Wehrle
(Imperial College London, UK; RWTH Aachen University, Germany)
Symbolic execution is a well-known program analysis technique for testing software, which makes intensive use of constraint solvers. Recent support for floating-point constraint solving has made it feasible to support floating-point reasoning in symbolic execution tools. In this paper, we present the experience of two research teams that independently added floating-point support to KLEE, a popular symbolic execution engine. Since the two teams independently developed their extensions, this created the rare opportunity to conduct a rigorous comparison between the two implementations, essentially a modern case study on N-version programming. As part of our comparison, we report on the different design and implementation decisions taken by each team, and show their impact on a rigorously assembled and tested set of benchmarks, itself a contribution of the paper.
@InProceedings{ASE17p601,
author = {Daniel Liew and Daniel Schemmel and Cristian Cadar and Alastair F. Donaldson and Rafael Zähl and Klaus Wehrle},
title = {Floating-Point Symbolic Execution: A Case Study in N-Version Programming},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {601--612},
doi = {},
year = {2017},
}
Rethinking Pointer Reasoning in Symbolic Execution
Emilio Coppa,
Daniele Cono D’Elia, and
Camil Demetrescu
(Sapienza University of Rome, Italy)
Symbolic execution is a popular program analysis technique that searches for bugs by reasoning over multiple alternative execution states at once. As the number of states to explore may grow exponentially, a symbolic executor may quickly run out of space. For instance, a memory access to a symbolic address may potentially reference the entire address space, leading to a combinatorial explosion of the possible resulting execution states. To cope with this issue, state-of-the-art executors concretize symbolic addresses that span memory intervals larger than some threshold. Unfortunately, this can result in missing interesting execution states, e.g., ones where a bug arises. In this paper we introduce MemSight, a new approach to symbolic memory that reduces the need for concretization, hence offering the opportunity for broader state explorations and more precise pointer reasoning. Rather than mapping address instances to data as previous tools do, our technique maps symbolic address expressions to data, maintaining the possible alternative states resulting from the memory referenced by a symbolic address in a compact, implicit form. A preliminary experimental investigation on prominent benchmarks from the DARPA Cyber Grand Challenge shows that MemSight enables the exploration of states unreachable with previous techniques.
@InProceedings{ASE17p613,
author = {Emilio Coppa and Daniele Cono D’Elia and Camil Demetrescu},
title = {Rethinking Pointer Reasoning in Symbolic Execution},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {613--618},
doi = {},
year = {2017},
}
Leveraging Abstract Interpretation for Efficient Dynamic Symbolic Execution
Eman Alatawi, Harald Søndergaard, and Tim Miller
(University of Melbourne, Australia)
Dynamic Symbolic Execution (DSE) is a technique to automatically generate test inputs by executing a program with concrete and symbolic values simultaneously. A key challenge in DSE is scalability; executing all feasible program paths is not possible, owing to the potentially exponential or infinite number of paths. Loops are a main source of path explosion, in particular where the number of iterations depends on a program's input. Problems arise because DSE maintains symbolic values that capture only the dependencies on symbolic inputs. This ignores control dependencies, including loop dependencies that depend indirectly on the inputs. We propose a method to increase the coverage achieved by DSE in the presence of input-data dependent loops and loop dependent branches. We combine DSE with abstract interpretation to find indirect control dependencies, including loop and branch indirect dependencies. Preliminary results show that this results in better coverage, within considerably less time compared to standard DSE.
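How abstract interpretation can summarize an input-dependent loop that plain DSE would explore one trip count at a time can be sketched with a tiny interval-domain fixpoint (an illustrative toy, not the paper's tool):

```python
# Toy interval-domain fixpoint (illustrative only; not the paper's analysis).
# Abstractly executes:  i = 0; while i < n: i += 1   with n in [0, n_max],
# inferring a loop-head bound on i without enumerating one path per trip count.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def join(self, other):
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

def analyze_loop(n_max):
    """Return the abstract value of i once the loop stabilizes (n_max >= 1)."""
    i = Interval(0, 0)
    while True:
        # The body is entered only when i < n, so i <= n_max - 1 there;
        # the body then increments i.
        body = Interval(i.lo + 1, min(i.hi, n_max - 1) + 1)
        joined = i.join(body)
        if (joined.lo, joined.hi) == (i.lo, i.hi):
            return joined          # fixpoint reached
        i = joined

print(analyze_loop(8))   # -> [0, 8]
```

A plain DSE run would instead generate one path condition per feasible iteration count; the abstract bound is what lets the combined technique cover loop-dependent branches without that explosion.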
@InProceedings{ASE17p619,
author = {Eman Alatawi and Harald Søndergaard and Tim Miller},
title = {Leveraging Abstract Interpretation for Efficient Dynamic Symbolic Execution},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {619--624},
doi = {},
year = {2017},
}
Program Repair
Wed, Nov 1, 16:00 - 17:30, Illini Room B (Chair: Kathryn T. Stolee)
Tortoise: Interactive System Configuration Repair
Aaron Weiss, Arjun Guha, and Yuriy Brun
(Northeastern University, USA; University of Massachusetts at Amherst, USA)
System configuration languages provide powerful abstractions that simplify managing large-scale, networked systems. Thousands of organizations now use configuration languages, such as Puppet. However, specifications written in configuration languages can have bugs, and the shell remains the simplest way to debug a misconfigured system. Unfortunately, it is unsafe to use the shell to fix problems when a system configuration language is in use: a fix applied from the shell may cause the system to drift from the state specified by the configuration language. Thus, despite their advantages, configuration languages force system administrators to give up the simplicity and familiarity of the shell.
This paper presents a synthesis-based technique that allows administrators to use configuration languages and the shell in harmony. Administrators can fix errors using the shell, and the technique automatically repairs the higher-level specification written in the configuration language. The approach (1) produces repairs that are consistent with the fix made using the shell; (2) produces repairs that are maintainable by minimizing edits made to the original specification; (3) ranks and presents multiple repairs when relevant; and (4) supports all shells the administrator may wish to use. We implement our technique for Puppet, a widely used system configuration language, and evaluate it on a suite of benchmarks under 42 repair scenarios. The top-ranked repair is selected by humans 76% of the time, and the human-equivalent repair is ranked 1.31 on average.
@InProceedings{ASE17p625,
author = {Aaron Weiss and Arjun Guha and Yuriy Brun},
title = {Tortoise: Interactive System Configuration Repair},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {625--636},
doi = {},
year = {2017},
}
Contract-Based Program Repair without the Contracts
Liushan Chen, Yu Pei, and Carlo A. Furia
(Hong Kong Polytechnic University, China; Chalmers University of Technology, Sweden)
Automated program repair (APR) is a promising approach to automatically fixing software bugs. Most APR techniques use tests to drive the repair process; this makes them readily applicable to realistic code bases, but also brings the risk of generating spurious repairs that overfit the available tests. Some techniques addressed the overfitting problem by targeting code using contracts (such as pre- and postconditions), which provide additional information helpful to characterize the states of correct and faulty computations; unfortunately, mainstream programming languages do not normally include contract annotations, which severely limits the applicability of such contract-based techniques.
This paper presents JAID, a novel APR technique for Java programs, which is capable of constructing detailed state abstractions---similar to those employed by contract-based techniques---that are derived from regular Java code without any special annotations. Grounding the repair generation and validation processes on rich state abstractions mitigates the overfitting problem, and helps extend APR’s applicability: in experiments with the DEFECTS4J benchmark, a prototype implementation of JAID produced genuinely correct repairs, equivalent to those written by programmers, for 25 bugs---improving over the state of the art of comparable Java APR techniques in the number and kinds of correct fixes.
@InProceedings{ASE17p637,
author = {Liushan Chen and Yu Pei and Carlo A. Furia},
title = {Contract-Based Program Repair without the Contracts},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {637--647},
doi = {},
year = {2017},
}
ELIXIR: Effective Object Oriented Program Repair
Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida, and Mukul R. Prasad
(Fujitsu Labs, USA; University of Southern California, USA)
This work is motivated by the pervasive use of method invocations in object-oriented (OO) programs, and indeed their prevalence in patches of OO-program bugs. We propose a generate-and-validate repair technique, called ELIXIR, designed to be able to generate such patches. ELIXIR aggressively uses method calls, on par with local variables, fields, and constants, to construct more expressive repair expressions that go into synthesizing patches. The ensuing enlargement of the repair space, on account of the wider use of method calls, is effectively tackled by using a machine-learnt model to rank concrete repairs. The machine-learnt model relies on four features derived from the program context, i.e., the code surrounding the potential repair location, and the bug report. We implement ELIXIR and evaluate it on two datasets, the popular Defects4J dataset and a new dataset, Bugs.jar, created by us, against 2 baseline versions of our technique and 5 other techniques representing the state of the art in program repair. Our evaluation shows that ELIXIR increases the number of correctly repaired bugs in Defects4J by 85% (from 14 to 26) and in Bugs.jar by 57% (from 14 to 22), while also significantly outperforming other state-of-the-art repair techniques including ACS, HD-Repair, NOPOL, PAR, and jGenProg.
@InProceedings{ASE17p648,
author = {Ripon K. Saha and Yingjun Lyu and Hiroaki Yoshida and Mukul R. Prasad},
title = {ELIXIR: Effective Object Oriented Program Repair},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {648--659},
doi = {},
year = {2017},
}
Leveraging Syntax-Related Code for Automated Program Repair
Qi Xin and Steven P. Reiss
(Brown University, USA)
We present our automated program repair technique ssFix, which leverages existing code (from a code database) that is syntax-related to the context of a bug to produce patches for its repair.
Given a faulty program and a fault-exposing test suite, ssFix performs fault localization to identify suspicious statements that are likely to be faulty. For each such statement, ssFix identifies a code chunk (or target chunk) including the statement and its local context, and works on the target chunk to produce patches. To do so, it first performs syntactic code search to find candidate code chunks that are syntax-related, i.e., structurally similar and conceptually related, to the target chunk, from a code database (or codebase) consisting of the local faulty program and an external code repository. ssFix assumes the correct fix to be contained in the candidate chunks, and it leverages each candidate chunk to produce patches for the target chunk: it translates the candidate chunk by unifying the names used in the candidate chunk with those in the target chunk; matches the chunk components (expressions and statements) between the translated candidate chunk and the target chunk; and produces patches for the target chunk based on the syntactic differences that exist between the matched components and in the unmatched components. ssFix finally validates the generated patched programs against the test suite and reports the first one that passes it.
We evaluated ssFix on 357 bugs in the Defects4J bug dataset. Our results show that ssFix successfully repaired 20 bugs with valid patches and that it outperformed five other repair techniques for Java.
@InProceedings{ASE17p660,
author = {Qi Xin and Steven P. Reiss},
title = {Leveraging Syntax-Related Code for Automated Program Repair},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {660--670},
doi = {},
year = {2017},
}
Recommender Systems
Thu, Nov 2, 10:30 - 12:30, Illini Room A (Chair: Bogdan Vasilescu)
Boosting Complete-Code Tool for Partial Program
Hao Zhong and Xiaoyin Wang
(Shanghai Jiao Tong University, China; University of Texas at San Antonio, USA)
To improve software quality, researchers and practitioners have proposed various static tools for various purposes (e.g., detecting bugs, anomalies, and vulnerabilities). Although many such tools are quite powerful, they typically need complete code where all the code names are known. In many scenarios, researchers have to analyze partial code in bug fixes/reports, tutorials, and forums. Partial code is a subset of complete code, and many code names in partial code may be unknown. As a result, although partial code is often syntactically correct, existing complete-code tools cannot analyze it. To automate the analysis of partial code, some tools have been implemented. However, due to various limitations, tools for partial code are limited in both their number and their analysis capability. Instead of proposing another tool for partial code analysis, we propose a general approach, called GRAPA, that boosts existing tools for complete code to analyze partial code. Our major insight is that after unknown bindings are resolved, tools for complete code can analyze partial code with minor modifications. In particular, GRAPA locates Java archive files to resolve unknown bindings, and resolves the remaining unknown bindings from already-resolved ones. To illustrate GRAPA, we implement a tool that leverages the state-of-the-art tool WALA to analyze partial Java code. We thus implemented the first tool that is able to build system dependency graphs for partial code, complementing existing tools for partial code analysis. We conducted an evaluation on 8,198 partial-code commits from four popular open source projects. Our results show that GRAPA fully resolved unknown code names for 98.5% of bug fixes, with an overall accuracy of 96.1%. Furthermore, our results show the significance of GRAPA’s internal techniques, which provides insight into how to integrate further complete-code tools to analyze partial code.
@InProceedings{ASE17p671,
author = {Hao Zhong and Xiaoyin Wang},
title = {Boosting Complete-Code Tool for Partial Program},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {671--681},
doi = {},
year = {2017},
}
A Language Model for Statements of Software Code
Yixiao Yang, Yu Jiang, Ming Gu, Jiaguang Sun, Jian Gao, and Han Liu
(Tsinghua University, China)
Building language models for source code enables a large set of improvements on traditional software engineering tasks. One promising application is automatic code completion. State-of-the-art techniques capture code regularities at the token level with lexical information. Such language models are more suitable for predicting short token sequences, but become less effective for long statement-level predictions. In this paper, we propose PCC to optimize token-level language modeling. Specifically, PCC introduces an intermediate representation (IR) for source code, which puts tokens into groups using lexemes and the relative order of variables. In this way, PCC is able to handle long token sequences, i.e., group sequences, to suggest a complete statement with the precise synthesizer. Furthermore, PCC employs a fuzzy-matching technique that combines genetic and longest-common-subsequence algorithms to make the prediction more accurate. We have implemented a code completion plugin for Eclipse and evaluated it on open-source Java projects. The results demonstrate the potential of PCC in generating precise statement-level predictions: in 30%-60% of the cases, it correctly suggests the complete statement with only six candidates, and in 40%-90% of the cases with ten candidates.
@InProceedings{ASE17p682,
author = {Yixiao Yang and Yu Jiang and Ming Gu and Jiaguang Sun and Jian Gao and Han Liu},
title = {A Language Model for Statements of Software Code},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {682--687},
doi = {},
year = {2017},
}
Context-Aware Integrated Development Environment Command Recommender Systems
Marko Gasparic, Tural Gurbanov, and Francesco Ricci
(Free University of Bolzano, Italy)
Integrated development environments (IDEs) are complex applications that integrate multiple tools for creating and manipulating software project artifacts. To improve users’ knowledge and the effectiveness of usage of the available functionality, the inclusion of recommender systems into IDEs has been proposed. We present a novel IDE command recommendation algorithm that, by taking into account the contexts in which a developer works and in which different commands are usually executed, is able to provide relevant recommendations. We performed an empirical comparison of the proposed algorithm with state-of-the-art IDE command recommenders on a real-world data set. The algorithms were evaluated in terms of precision, recall, F1, k-tail, and with a new evaluation metric that is specifically measuring the usefulness of contextual recommendations. The experiments revealed that in terms of the contextual relevance and usefulness of recommendations the proposed algorithm outperforms existing algorithms.
@InProceedings{ASE17p688,
author = {Marko Gasparic and Tural Gurbanov and Francesco Ricci},
title = {Context-Aware Integrated Development Environment Command Recommender Systems},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {688--693},
doi = {},
year = {2017},
}
Predicting Relevance of Change Recommendations
Thomas Rolfsnes, Leon Moonen, and David Binkley
(Simula Research Laboratory, Norway; Loyola University Maryland, USA)
Software change recommendation seeks to suggest artifacts (e.g., files or methods) that are related to changes made by a developer, and thus identifies possible omissions or next steps. While one obvious challenge for recommender systems is to produce accurate recommendations, a complementary challenge is to rank recommendations based on their relevance. In this paper, we address this challenge for recommendation systems that are based on evolutionary coupling. Such systems use targeted association-rule mining to identify relevant patterns in a software system’s change history. Traditionally, this process involves ranking artifacts using interestingness measures such as confidence and support. However, these measures often fall short when used to assess recommendation relevance.
We propose the use of random forest classification models to assess recommendation relevance. This approach improves on past use of various interestingness measures by learning from previous change recommendations. We empirically evaluate our approach on fourteen open source systems and two systems from our industry partners. Furthermore, we consider two complementary mining algorithms: Co-Change and Tarmaq. The results show that random forest classification significantly outperforms previous approaches, receives lower Brier scores, and has a superior trade-off between precision and recall. The results are consistent across software systems and mining algorithms.
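The interestingness-measure baseline that the classifier improves on can be sketched in a few lines (toy code with made-up history; not the paper's Co-Change or Tarmaq implementations): score each rule {changed} -> f by its confidence and support over the co-change history.

```python
# Toy support/confidence scoring over a co-change history (illustrative
# sketch; the file names and history below are hypothetical). For a rule
# {changed} -> f:
#   confidence = |commits containing changed and f| / |commits containing changed|
#   support    = |commits containing changed and f| / |all commits|

def recommend(history, changed, top=3):
    changed = set(changed)
    n_antecedent = sum(1 for commit in history if changed <= commit)
    counts = {}
    for commit in history:
        if changed <= commit:
            for f in commit - changed:
                counts[f] = counts.get(f, 0) + 1
    ranked = sorted(
        ((f, c / n_antecedent, c / len(history)) for f, c in counts.items()),
        key=lambda t: (-t[1], -t[2]))
    return ranked[:top]

history = [{"a.c", "a.h"}, {"a.c", "a.h", "test.c"}, {"a.c", "b.c"}, {"b.c"}]
for f, conf, supp in recommend(history, ["a.c"]):
    print(f, round(conf, 2), round(supp, 2))
```

Ranking purely by such measures is exactly what the paper shows to be insufficient: the random forest instead learns relevance from features of past recommendations.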
@InProceedings{ASE17p694,
author = {Thomas Rolfsnes and Leon Moonen and David Binkley},
title = {Predicting Relevance of Change Recommendations},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {694--705},
doi = {},
year = {2017},
}
AnswerBot: Automated Generation of Answer Summary to Developers’ Technical Questions
Bowen Xu, Zhenchang Xing, Xin Xia, and David Lo
(Zhejiang University, China; Singapore Management University, Singapore; Australian National University, Australia; University of British Columbia, Canada)
The prevalence of questions and answers on domain-specific Q&A sites like Stack Overflow constitutes a core knowledge asset for the software engineering domain. Although search engines can return a list of questions relevant to a user query of some technical question, the abundance of relevant posts and the sheer amount of information in them make it difficult for developers to digest them and find the most needed answers to their questions. In this work, we aim to help developers who want to quickly capture the key points of several answer posts relevant to a technical question before they read the details of the posts. We formulate our task as a query-focused multi-answer-posts summarization task for a given technical question. Our proposed approach AnswerBot contains three main steps: 1) relevant question retrieval, 2) useful answer paragraph selection, and 3) diverse answer summary generation. To evaluate our approach, we build a repository of 228,817 Java questions and their corresponding answers from Stack Overflow. We conduct user studies with 100 randomly selected Java questions (not in the question repository) to evaluate the quality of the answer summaries generated by our approach and the effectiveness of its relevant question retrieval and answer paragraph selection components. Our evaluation shows that the answer summaries generated by our approach are relevant, useful, and diverse with respect to developers’ technical questions, and that its components can effectively retrieve relevant questions and select salient answer paragraphs for summarization.
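The "relevant yet diverse" selection step can be illustrated with a standard MMR-style greedy heuristic (an illustrative sketch with made-up data, not AnswerBot's actual scoring, which combines several relevance and diversity signals):

```python
# MMR-style selection sketch (a standard heuristic, not AnswerBot's actual
# algorithm; the query and answer paragraphs below are made up). Greedily
# pick the paragraph most similar to the query, penalizing similarity to
# paragraphs already selected, so the summary stays diverse.

import math
from collections import Counter

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values())) *
           math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def select_diverse(query, paragraphs, k=2, lam=0.7):
    q = Counter(query.lower().split())
    docs = [Counter(p.lower().split()) for p in paragraphs]
    chosen = []
    while len(chosen) < min(k, len(paragraphs)):
        def score(i):
            redundancy = max((cosine(docs[i], docs[j]) for j in chosen),
                             default=0.0)
            return lam * cosine(docs[i], q) - (1 - lam) * redundancy
        best = max((i for i in range(len(docs)) if i not in chosen), key=score)
        chosen.append(best)
    return [paragraphs[i] for i in chosen]

answers = [
    "use Collections.sort to sort a java list",
    "Collections.sort sorts a java list in place",
    "my cat sat on the mat",
]
print(select_diverse("how to sort a list in java", answers))
```

The `lam` parameter trades relevance against redundancy; with `lam=1` the selection degenerates into plain similarity ranking.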
@InProceedings{ASE17p706,
author = {Bowen Xu and Zhenchang Xing and Xin Xia and David Lo},
title = {AnswerBot: Automated Generation of Answer Summary to Developers’ Technical Questions},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {706--716},
doi = {},
year = {2017},
}
Recommending Crowdsourced Software Developers in Consideration of Skill Improvement
Zizhe Wang, Hailong Sun, Yang Fu, and Luting Ye
(Beihang University, China)
Finding suitable developers for a given task is critical and challenging for successful crowdsourcing software development. In practice, development skills improve as developers complete more development tasks. Prior studies on crowdsourcing developer recommendation do not consider this change in skills, which can lead to underestimating a developer's ability to fulfill a task. In this work, we first conducted an empirical study of the performance of 7,620 developers on TopCoder. With a difficulty-weighted algorithm, we re-compute the scores of each developer by eliminating the effect of task difficulty from the performance. We find that the skill improvement of TopCoder developers is fitted well by the negative exponential learning curve model. Second, we design a skill prediction method based on the learning curve. We then propose a skill-improvement-aware framework for recommending developers for crowdsourcing software development.
@InProceedings{ASE17p717,
author = {Zizhe Wang and Hailong Sun and Yang Fu and Luting Ye},
title = {Recommending Crowdsourced Software Developers in Consideration of Skill Improvement},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {717--722},
doi = {},
year = {2017},
}
The Rise of the (Modelling) Bots: Towards Assisted Modelling via Social Networks
Sara Pérez-Soler, Esther Guerra, Juan de Lara, and Francisco Jurado
(Autonomous University of Madrid, Spain)
We are witnessing a rising role of mobile computing and social networks in performing all sorts of tasks. Social networks like Twitter and Telegram are used for leisure, and they frequently serve as a discussion medium for work-related activities.
In this paper, we propose taking advantage of social networks to enable the collaborative creation of models by groups of users. The process is assisted by modelling bots that orchestrate the collaboration and interpret the users' inputs (in natural language) to incrementally build a (meta-)model. The advantages of this modelling approach include ubiquity of use, automation, assistance, natural user interaction, traceability of design decisions, possibility to incorporate coordination protocols, and seamless integration with the user's normal daily usage of social networks.
We present a prototype implementation called SOCIO, able to work over several social networks like Twitter and Telegram, and a preliminary evaluation showing promising results.
@InProceedings{ASE17p723,
author = {Sara Pérez-Soler and Esther Guerra and Juan de Lara and Francisco Jurado},
title = {The Rise of the (Modelling) Bots: Towards Assisted Modelling via Social Networks},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {723--728},
doi = {},
year = {2017},
}
Concurrency
Thu, Nov 2, 10:30 - 12:30, Illini Room B (Chair: Darko Marinov)
UNDEAD: Detecting and Preventing Deadlocks in Production Software
Jinpeng Zhou, Sam Silvestro, Hongyu Liu, Yan Cai, and Tongping Liu
(University of Texas at San Antonio, USA; Institute of Software at Chinese Academy of Sciences, China)
Deadlocks are critical problems afflicting parallel applications, causing software to hang with no further progress. Existing detection tools suffer not only from significant recording performance overhead, but also from excessive memory and/or storage overhead. In addition, they may generate numerous false alarms. Subsequently, after problems have been reported, tremendous manual effort is required to confirm and fix these deadlocks.
This paper presents a novel system, UNDEAD, that helps defeat deadlocks in production software. Unlike existing detection tools, UNDEAD imposes negligible runtime performance overhead (less than 3% on average) and small memory overhead (around 6%), without any storage consumption. After detection, UNDEAD automatically strengthens erroneous programs to prevent future occurrences of both existing and potential deadlocks, similar to the existing work Dimmunix. However, UNDEAD improves on Dimmunix with several orders of magnitude lower performance overhead, while eliminating numerous false positives. Extremely low runtime and memory overhead, convenience, and automatic prevention make UNDEAD an always-on detection tool and a “band-aid” prevention system for production software.
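The classic foundation of such detectors, building a lock-order graph and looking for cycles, can be sketched as follows (an illustrative toy, not UNDEAD's actual algorithm or data structures):

```python
# Lock-order-graph check behind many deadlock detectors (illustrative sketch;
# not UNDEAD's actual algorithm). Record, per acquisition, which lock was
# already held when another was acquired; a cycle among locks signals a
# potential deadlock, such as the ABBA pattern below.

def build_lock_graph(acquisitions):
    """acquisitions: (thread, held_lock, acquired_lock) events."""
    graph = {}
    for _thread, held, acquired in acquisitions:
        graph.setdefault(held, set()).add(acquired)
    return graph

def has_cycle(graph):
    visiting, done = set(), set()
    def dfs(node):
        if node in visiting:
            return True              # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(succ) for succ in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(node) for node in list(graph))

abba = [("T1", "A", "B"), ("T2", "B", "A")]     # potential deadlock
safe = [("T1", "A", "B"), ("T2", "A", "B")]     # consistent lock order
print(has_cycle(build_lock_graph(abba)))   # -> True
print(has_cycle(build_lock_graph(safe)))   # -> False
```

The engineering challenge the paper addresses is doing this kind of bookkeeping with negligible runtime and memory overhead, so it can stay enabled in production.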
@InProceedings{ASE17p729,
author = {Jinpeng Zhou and Sam Silvestro and Hongyu Liu and Yan Cai and Tongping Liu},
title = {UNDEAD: Detecting and Preventing Deadlocks in Production Software},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {729--740},
doi = {},
year = {2017},
}
Promoting Secondary Orders of Event Pairs in Randomized Scheduling using a Randomized Stride
Mahmoud Abdelrasoul
(North Carolina State University, USA)
Because of the wide use of randomized scheduling in concurrency testing research, it is important to understand randomized scheduling and its limitations. This work analyzes how randomized scheduling discovers concurrency bugs by focusing on the probabilities of the two possible orders of a pair of events. Analysis shows that the disparity between probabilities can be large for programs that encounter a large number of events during execution. Because sets of ordered event pairs define conditions for discovering concurrency bugs, this disparity can make some concurrency bugs highly unlikely. The complementary nature of the two possible orders also indicates a potential trade-off between the probability of discovering frequently-occurring and infrequently-occurring concurrency bugs. To help address this trade-off in a more balanced way, randomized-stride scheduling is proposed, where scheduling granularity for each thread is adjusted using a randomized stride calculated based on thread length. With some assumptions, strides can be calculated to allow covering the least likely event pair orders. Experiments confirm the analysis results and also suggest that randomized-stride scheduling is more effective for discovering concurrency bugs compared to the original randomized scheduling implementation, and compared to other algorithms in recent literature.
@InProceedings{ASE17p741,
author = {Mahmoud Abdelrasoul},
title = {Promoting Secondary Orders of Event Pairs in Randomized Scheduling using a Randomized Stride},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {741--752},
doi = {},
year = {2017},
}
Parallel Bug-Finding in Concurrent Programs via Reduced Interleaving Instances
Truc L. Nguyen, Peter Schrammel, Bernd Fischer, Salvatore La Torre, and Gennaro Parlato
(University of Southampton, UK; University of Sussex, UK; Stellenbosch University, South Africa; University of Salerno, Italy)
Concurrency poses a major challenge for program verification, but it can also offer an opportunity to scale when subproblems can be analysed in parallel. We exploit this opportunity here and use a parametrizable code-to-code translation to generate a set of simpler program instances, each capturing a reduced set of the original program’s interleavings. These instances can then be checked independently in parallel. Our approach does not depend on the tool that is chosen for the final analysis, is compatible with weak memory models, and amplifies the effectiveness of existing tools, making them find bugs faster and with fewer resources. We use Lazy-CSeq as an off-the-shelf final verifier to demonstrate that our approach is able, already with a small number of cores, to find bugs in the hardest known concurrency benchmarks in a matter of minutes, whereas other dynamic and static tools fail to do so in hours.
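The idea of carving the interleaving space into independently checkable pieces can be illustrated on a toy scale (the paper's approach is a code-to-code translation over real programs, not an enumeration over event lists like this):

```python
# Illustrative sketch only: enumerate interleavings of two straight-line
# threads and bucket them by context-switch count, so each bucket could be
# handed to a separate verifier run in parallel. The paper's technique
# achieves a similar partitioning via a parametrizable code-to-code
# translation rather than explicit enumeration.

from itertools import combinations

def interleavings(t1, t2):
    n, m = len(t1), len(t2)
    for slots in combinations(range(n + m), n):   # positions taken by t1
        seq, i, j = [], 0, 0
        for k in range(n + m):
            if k in slots:
                seq.append(t1[i]); i += 1
            else:
                seq.append(t2[j]); j += 1
        yield tuple(seq)

def context_switches(seq, t1_events):
    return sum(1 for a, b in zip(seq, seq[1:])
               if (a in t1_events) != (b in t1_events))

t1, t2 = ("a1", "a2"), ("b1", "b2")
buckets = {}
for s in interleavings(t1, t2):
    buckets.setdefault(context_switches(s, set(t1)), []).append(s)
print({k: len(v) for k, v in sorted(buckets.items())})   # -> {1: 2, 2: 2, 3: 2}
```

Each bucket captures a reduced set of the original program's interleavings, which is what lets the instances be analysed independently and in parallel.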
@InProceedings{ASE17p753,
author = {Truc L. Nguyen and Peter Schrammel and Bernd Fischer and Salvatore La Torre and Gennaro Parlato},
title = {Parallel Bug-Finding in Concurrent Programs via Reduced Interleaving Instances},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {753--764},
doi = {},
year = {2017},
}
Understanding and Overcoming Parallelism Bottlenecks in ForkJoin Applications
Gustavo Pinto, Anthony Canino, Fernando Castor, Guoqing Xu, and Yu David Liu
(Federal University of Pará, Brazil; SUNY Binghamton, USA; Federal University of Pernambuco, Brazil; University of California at Irvine, USA)
The ForkJoin framework is a widely used parallel programming framework upon which both core concurrency libraries and real-world applications are built. Beneath its simple and user-friendly APIs, ForkJoin is a sophisticated managed parallel runtime unfamiliar to many application programmers: the framework core is a work-stealing scheduler, handles fine-grained tasks, and sustains the pressure of automatic memory management. ForkJoin poses a unique gap in the compute stack between high-level software engineering and low-level system optimization. Understanding and bridging this gap is crucial for the future of parallelism support in JVM-based applications. This paper describes a comprehensive study of parallelism bottlenecks in ForkJoin applications, with a unique focus on how they interact with underlying system-level features such as work stealing and memory management. We identify 6 bottlenecks and find that refactoring them can significantly improve performance and energy efficiency. Our field study includes an in-depth analysis of Akka, a real-world actor framework, and 30 additional open-source ForkJoin projects. We sent our patches to the developers of 15 projects, and 7 of the 9 projects that replied have accepted them.
@InProceedings{ASE17p765,
author = {Gustavo Pinto and Anthony Canino and Fernando Castor and Guoqing Xu and Yu David Liu},
title = {Understanding and Overcoming Parallelism Bottlenecks in ForkJoin Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {765--775},
doi = {},
year = {2017},
}
Quick Verification of Concurrent Programs by Iteratively Relaxed Scheduling
Patrick Metzler, Habib Saissi, Péter Bokor, and Neeraj Suri
(TU Darmstadt, Germany)
The most prominent advantage of software verification over testing is a rigorous check of every possible software behavior. However, large state spaces of concurrent systems, due to non-deterministic scheduling, result in a slow automated verification process. Therefore, verification introduces a large delay between completion and deployment of concurrent software.
This paper introduces a novel iterative approach to the verification of concurrent programs that drastically reduces this delay. By restricting the execution of concurrent programs to a small set of admissible schedules, verification complexity and time are drastically reduced. Iteratively adding admissible schedules after their verification eventually restores non-deterministic scheduling. Thereby, our framework makes it possible to find a sweet spot between a low verification delay and sufficient execution-time performance. Our evaluation of a prototype implementation on well-known benchmark programs shows that, after verifying only a few schedules of the program, the execution-time overhead is competitive with existing deterministic multi-threading frameworks.
@InProceedings{ASE17p776,
author = {Patrick Metzler and Habib Saissi and Péter Bokor and Neeraj Suri},
title = {Quick Verification of Concurrent Programs by Iteratively Relaxed Scheduling},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {776--781},
doi = {},
year = {2017},
}
Program Synthesis
Tue, Oct 31, 10:30 - 12:30, Illini Room C (Chair: Antonio Filieri)
Automatic Loop-Invariant Generation and Refinement through Selective Sampling
Jiaying Li, Jun Sun, Li Li, Quang Loc Le, and Shang-Wei Lin
(Singapore University of Technology and Design, Singapore; Teesside University, UK; Nanyang Technological University, Singapore)
Automatic loop-invariant generation is important in program analysis and verification. In this paper, we propose to generate loop invariants automatically through learning and verification. Given a Hoare triple of a program containing a loop, we start by randomly testing the program, collecting program states at run time, and categorizing them based on whether they satisfy the invariant to be discovered. Next, classification techniques are employed to generate a candidate loop invariant automatically. Afterwards, we refine the candidate through selective sampling so as to overcome the lack of sufficient test cases. Only when a candidate invariant can no longer be improved through selective sampling do we verify whether it can be used to prove the Hoare triple. If it cannot, the generated counterexamples are added as new tests and the above process is repeated. Furthermore, we show that by introducing path-sensitive learning, i.e., partitioning the program states according to the program locations they visit and classifying each partition separately, we are able to learn disjunctive loop invariants. To evaluate our idea, we developed a prototype tool; the experimental results show that our approach complements existing approaches.
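The guess-and-check core, sampling run-time states and keeping only the candidate predicates consistent with all of them, can be sketched in miniature (an illustrative toy with a fixed template set; the paper instead learns candidates with classification and refines them via selective sampling):

```python
# Toy guess-and-check sketch of learning a candidate loop invariant from
# sampled run-time states (illustrative only; the template set and the
# example program below are made up for this sketch).

import random

def run_loop(n):
    """Program under analysis: s = sum(0..n-1); record (i, s) at the loop head."""
    states, s = [], 0
    for i in range(n):
        states.append((i, s))
        s += i
    return states

TEMPLATES = {
    "s == i*(i-1)//2": lambda i, s: s == i * (i - 1) // 2,
    "s >= i":          lambda i, s: s >= i,   # spurious: killed by samples
    "i >= 0":          lambda i, s: i >= 0,
}

def candidate_invariant(samples):
    """Keep only the templates consistent with every observed state."""
    return [name for name, pred in TEMPLATES.items()
            if all(pred(i, s) for i, s in samples)]

random.seed(0)
samples = [st for _ in range(20) for st in run_loop(random.randint(2, 10))]
print(candidate_invariant(samples))   # -> ['s == i*(i-1)//2', 'i >= 0']
```

In the paper's loop, a surviving candidate is then handed to a verifier; any counterexample it returns becomes a new sample, shrinking the candidate set further.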
@InProceedings{ASE17p782,
author = {Jiaying Li and Jun Sun and Li Li and Quang Loc Le and Shang-Wei Lin},
title = {Automatic Loop-Invariant Generation and Refinement through Selective Sampling},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {782--792},
doi = {},
year = {2017},
}
FiB: Squeezing Loop Invariants by Interpolation between Forward/Backward Predicate Transformers
Shang-Wei Lin, Jun Sun, Hao Xiao, Yang Liu, David Sanán, and Henri Hansen
(Nanyang Technological University, Singapore; Singapore University of Technology and Design, Singapore; Tampere University of Technology, Finland)
Loop invariant generation is a fundamental problem in program analysis and verification. In this work, we propose a new approach to automatically constructing inductive loop invariants. The key idea is to aggressively squeeze an inductive invariant, based on Craig interpolants, between forward and backward reachability analyses. We have evaluated our approach on a set of loop benchmarks, and the experimental results show that it is promising.
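On a toy 16-state transition system (our own illustration; the Craig-interpolation step that "squeezes" an invariant between the two analyses is elided here), forward and backward reachability yield the two sets between which such an invariant must lie:

```python
# Toy system: x' = (x + 2) % 16, initial state 0, bad state 7.
step = lambda x: (x + 2) % 16
init, bad = {0}, {7}

# Forward predicate transformer: states reachable from init.
F = set(init)
while True:
    nxt = F | {step(x) for x in F}
    if nxt == F:
        break
    F = nxt

# Backward predicate transformer: states from which bad is reachable.
B = set(bad)
while True:
    nxt = B | {x for x in range(16) if step(x) in B}
    if nxt == B:
        break
    B = nxt

# Any set S with F included in S and S disjoint from B is a safe
# inductive invariant candidate; FiB would interpolate between the two.
assert F & B == set()                # the program is safe
assert {step(x) for x in F} <= F     # F itself is inductive
```

Here F is the even states and B the odd ones, so "x is even" is an inductive invariant separating init from bad.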
@InProceedings{ASE17p793,
author = {Shang-Wei Lin and Jun Sun and Hao Xiao and Yang Liu and David Sanán and Henri Hansen},
title = {FiB: Squeezing Loop Invariants by Interpolation between Forward/Backward Predicate Transformers},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {793--803},
doi = {},
year = {2017},
}
SymInfer: Inferring Program Invariants using Symbolic States
ThanhVu Nguyen, Matthew B. Dwyer, and Willem Visser
(University of Nebraska-Lincoln, USA; Stellenbosch University, South Africa)
We introduce a new technique for inferring program invariants that uses symbolic states generated by symbolic execution. Symbolic states, which consist of path conditions and constraints on local variables, are a compact description of sets of concrete program states and they can be used for both invariant inference and invariant verification. Our technique uses a counterexample-based algorithm that creates concrete states from symbolic states, infers candidate invariants from concrete states, and then verifies or refutes candidate invariants using symbolic states. The refutation case produces concrete counterexamples that prevent spurious results and allow the technique to obtain more precise invariants. This process stops when the algorithm reaches a stable set of invariants.
We present SymInfer, a tool that implements these ideas to automatically generate invariants at arbitrary locations in a Java program. The tool obtains symbolic states from Symbolic PathFinder and uses existing algorithms to infer complex (potentially nonlinear) numerical invariants. Our preliminary results show that SymInfer is effective in using symbolic states to generate precise and useful invariants for proving program safety and analyzing program runtime complexity. We also show that SymInfer outperforms existing invariant generation systems.
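The counterexample-based loop can be sketched as follows, with a trivial candidate shape x <= c and a brute-force enumeration standing in for checking candidates against symbolic states (the state space and helper names are our own illustration):

```python
def infer(states):
    """Infer a candidate invariant 'x <= c' from concrete states."""
    return max(states)

def check(c, reachable):
    """Stand-in for verification against symbolic states: return a
    reachable state violating x <= c, or None if the candidate holds."""
    for x in reachable:
        if x > c:
            return x
    return None

reachable = range(10)    # states a symbolic execution would describe
states = [0, 3]          # concrete states from initial test runs
while True:
    c = infer(states)
    cex = check(c, reachable)
    if cex is None:      # stable set reached: x <= c is verified
        break
    states.append(cex)   # refutation yields a concrete counterexample
```

Each refuted candidate contributes a concrete counterexample, so the loop converges to the tightest bound that the "symbolic" check cannot refute.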
@InProceedings{ASE17p804,
author = {ThanhVu Nguyen and Matthew B. Dwyer and Willem Visser},
title = {SymInfer: Inferring Program Invariants using Symbolic States},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {804--814},
doi = {},
year = {2017},
}
Parsimony: An IDE for Example-Guided Synthesis of Lexers and Parsers
Alan Leung and Sorin Lerner
(University of California at San Diego, USA)
We present Parsimony, a programming-by-example development environment for synthesizing lexers and parsers by example. Parsimony provides a graphical interface in which the user presents examples simply by selecting and labeling sample text in a text editor. An underlying synthesis engine then constructs syntactic rules to solve the system of constraints induced by the supplied examples. Parsimony is more expressive and usable than prior programming-by-example systems for parsers in several ways: Parsimony can (1) synthesize lexer rules in addition to productions, (2) solve for much larger constraint systems over multiple examples, rather than handling examples one-at-a-time, and (3) infer much more complex sets of productions, such as entire algebraic expression grammars, by detecting instances of well-known grammar design patterns. The results of a controlled user study across 18 participants show that users are able to perform lexing and parsing tasks faster and with fewer mistakes when using Parsimony as compared to a traditional parsing workflow.
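A toy flavor of synthesizing a lexer rule from labeled examples (the candidate character classes and the specific-to-general search order are invented for illustration; Parsimony's actual engine solves constraint systems over many examples at once):

```python
def infer_token_rule(examples):
    """Return the most specific candidate character-class rule that
    matches every labeled example text, or None if none does."""
    candidates = [
        ("[0-9]+", str.isdigit),
        ("[a-z]+", lambda s: s.isalpha() and s.islower()),
        ("[a-z0-9]+", lambda s: all(c.islower() or c.isdigit() for c in s)),
    ]
    for rule, matches in candidates:  # ordered from specific to general
        if all(matches(ex) for ex in examples):
            return rule
    return None
```

For instance, selecting and labeling "12" and "345" as one token kind would yield the digit rule, while mixed examples fall through to a broader class.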
@InProceedings{ASE17p815,
author = {Alan Leung and Sorin Lerner},
title = {Parsimony: An IDE for Example-Guided Synthesis of Lexers and Parsers},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {815--825},
doi = {},
year = {2017},
}
Mining Constraints for Event-based Monitoring in Systems of Systems
Thomas Krismayer,
Rick Rabiser, and
Paul Grünbacher
(JKU Linz, Austria)
The full behavior of software-intensive systems of systems (SoS) emerges during operation only. Runtime monitoring approaches have thus been proposed to detect deviations from the expected behavior. They commonly rely on temporal logic or domain-specific languages to formally define requirements, which are then checked by analyzing the stream of monitored events and event data. Some approaches also allow developers to generate constraints from declarative specifications of the expected behavior. However, independent of the approach, deep domain knowledge is required to specify the desired behavior. This knowledge is often not accessible in SoS environments with multiple development teams independently working on different, heterogeneous systems. In this New Ideas Paper we thus describe an approach that automatically mines constraints for runtime monitoring from event logs recorded in SoS. Our approach builds on ideas from specification mining, process mining, and machine learning to mine different types of constraints on event occurrence, event timing, and event data. The approach further presents the mined constraints to users in an existing constraint language and it ranks the constraints using different criteria. We demonstrate the feasibility of our approach by applying it to event logs from a real-world industrial SoS.
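A minimal example of the kind of mining involved, on a hypothetical two-event log: an occurrence constraint (every req is answered) and a timing constraint (the observed maximum latency) are extracted directly from timestamped events:

```python
# Hypothetical event log: (timestamp, event) pairs recorded at runtime.
log = [(0, "req"), (2, "ack"), (10, "req"), (13, "ack"),
       (20, "req"), (22, "ack")]

gaps, pending = [], None
for t, e in log:
    if e == "req":
        pending = t
    elif e == "ack" and pending is not None:
        gaps.append(t - pending)    # req -> ack latency
        pending = None

req_always_answered = pending is None   # candidate occurrence constraint
max_latency = max(gaps)                 # candidate timing constraint
# A mined constraint could then be rendered in a constraint language,
# e.g. "ack occurs within 3 time units after req", and ranked.
```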
@InProceedings{ASE17p826,
author = {Thomas Krismayer and Rick Rabiser and Paul Grünbacher},
title = {Mining Constraints for Event-based Monitoring in Systems of Systems},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {826--831},
doi = {},
year = {2017},
}
Programming Bots by Synthesizing Natural Language Expressions into API Invocations
Shayan Zamanirad, Boualem Benatallah, Moshe Chai Barukh, Fabio Casati, and
Carlos Rodriguez
(UNSW, Australia; University of Trento, Italy; Tomsk Polytechnic University, Russia)
At present, bots are still in their preliminary stages of development. Many are relatively simple, or developed ad hoc for a very specific use case. For this reason, they are typically programmed manually, or utilize machine-learning classifiers to interpret a fixed set of user utterances. In reality, real-world conversations with humans require support for dynamically capturing users' expressions. Moreover, bots derive immense value from being programmed to invoke APIs for their results. Today, within the Web and mobile development community, complex applications are being strung together with a few lines of code, all made possible by APIs. Yet developers are not empowered to program bots in the same way. To overcome this, we introduce BotBase, a bot programming platform that dynamically synthesizes natural language user expressions into API invocations. Our solution is two-faceted: firstly, we construct an API knowledge graph to encode and evolve APIs; secondly, leveraging the above, we apply techniques from NLP, machine learning, and entity recognition to perform the required synthesis from natural language user expressions into API calls.
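A deliberately naive sketch of mapping an utterance to an API invocation (the miniature "knowledge graph", endpoints, and stopword filtering are all hypothetical; BotBase uses NLP, ML, and entity recognition rather than keyword lookup):

```python
# Hypothetical miniature API knowledge graph: keyword -> (endpoint, params).
APIS = {
    "weather": ("GET /weather", ["city"]),
    "translate": ("POST /translate", ["text", "target_lang"]),
}

STOPWORDS = {"the", "in", "to", "what", "whats", "is"}

def synthesize(utterance):
    """Map a natural-language expression to an API invocation by keyword
    lookup, binding the remaining words to the API's parameters; a crude
    stand-in for BotBase's synthesis pipeline."""
    words = utterance.lower().split()
    for keyword, (endpoint, params) in APIS.items():
        if keyword in words:
            args = [w for w in words if w != keyword and w not in STOPWORDS]
            return endpoint, dict(zip(params, args))
    return None
```

For example, "weather in Paris" resolves to the weather endpoint with the leftover word bound to the city parameter.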
@InProceedings{ASE17p832,
author = {Shayan Zamanirad and Boualem Benatallah and Moshe Chai Barukh and Fabio Casati and Carlos Rodriguez},
title = {Programming Bots by Synthesizing Natural Language Expressions into API Invocations},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {832--837},
doi = {},
year = {2017},
}
Testing
Thu, Nov 2, 13:30 - 15:30, Illini Room B (Chair: Milos Gligoric)
Test Suite Parallelization in Open-Source Projects: A Study on Its Usage and Impact
Jeanderson Candido, Luis Melo, and
Marcelo d’Amorim
(Federal University of Pernambuco, Brazil)
Dealing with high testing costs remains an important problem in software engineering. Test suite parallelization is an important approach to address this problem. This paper reports our findings on the usage and impact of test suite parallelization in open-source projects, and provides recommendations to practitioners and tool developers to speed up test execution. Considering a set of 468 popular Java projects we analyzed, we found that 24% of the projects contain costly test suites, but parallelization features still seem underutilized in practice: only 19.1% of costly projects use parallelization. The main reported reason for resistance to adoption was concern about dealing with concurrency issues. Results suggest that, on average, developers prefer high predictability over high performance in running tests.
@InProceedings{ASE17p838,
author = {Jeanderson Candido and Luis Melo and Marcelo d’Amorim},
title = {Test Suite Parallelization in Open-Source Projects: A Study on Its Usage and Impact},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {838--848},
doi = {},
year = {2017},
}
Systematic Reduction of GUI Test Sequences
Lin Cheng,
Zijiang Yang, and Chao Wang
(Western Michigan University, USA; University of Southern California, USA)
A graphical user interface (GUI) is an integral part of many software applications. However, GUI testing remains a challenging task. The main problem is to generate a set of high-quality test cases, i.e., sequences of user events that cover the often large input space. Since manually crafting event sequences is labor-intensive and automated testing tools often perform poorly, we propose a new GUI testing framework that efficiently generates progressively longer event sequences while avoiding redundant sequences. Our technique for identifying redundancy among these sequences relies on statically checking a set of simple, syntactic-level conditions, whose reduction power matches and sometimes exceeds that of classic techniques based on partial-order reduction. We have evaluated our method on 17 Java Swing applications. Our experimental results show that the new technique, while being sound and systematic, can achieve more than a 10X reduction in the number of test sequences compared to state-of-the-art GUI testing tools.
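The flavor of the redundancy check can be sketched as follows (the events and the widget "footprint" map are hypothetical): two adjacent events that touch disjoint widgets commute, so sequences are canonicalized by sorting independent neighbors, and only one representative per equivalence class is kept:

```python
from itertools import permutations

# Hypothetical widget footprint of each user event.
touches = {"A": {"w1"}, "B": {"w2"}, "C": {"w1", "w2"}}

def independent(e1, e2):
    """Syntactic-level condition: disjoint footprints mean the events
    commute, so swapping them yields an equivalent sequence."""
    return touches[e1].isdisjoint(touches[e2])

def canonical(seq):
    """Swap adjacent independent events into a fixed order; sequences
    with the same canonical form are redundant."""
    s, changed = list(seq), True
    while changed:
        changed = False
        for i in range(len(s) - 1):
            if independent(s[i], s[i + 1]) and s[i] > s[i + 1]:
                s[i], s[i + 1] = s[i + 1], s[i]
                changed = True
    return tuple(s)

candidates = set(permutations("ABC"))      # 6 candidate event sequences
kept = {canonical(s) for s in candidates}  # one per equivalence class
```

Here A and B commute (disjoint widgets) while C conflicts with both, so the 6 permutations collapse to 4 non-redundant sequences.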
@InProceedings{ASE17p849,
author = {Lin Cheng and Zijiang Yang and Chao Wang},
title = {Systematic Reduction of GUI Test Sequences},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {849--860},
doi = {},
year = {2017},
}
Automatically Reducing Tree-Structured Test Inputs
Satia Herfert, Jibesh Patra, and Michael Pradel
(TU Darmstadt, Germany)
Reducing the test input given to a program while preserving some property of interest is important, e.g., to localize faults or to reduce test suites. The well-known delta debugging algorithm and its derivatives automate this task by repeatedly reducing a given input. Unfortunately, these approaches are limited to blindly removing parts of the input and cannot reduce the input by restructuring it. This paper presents the Generalized Tree Reduction (GTR) algorithm, an effective and efficient technique to reduce arbitrary test inputs that can be represented as a tree, such as program code, PDF files, and XML documents. The algorithm combines tree transformations with delta debugging and a greedy backtracking algorithm. To reduce the size of the considered search space, the approach automatically specializes the tree transformations applied by the algorithm based on examples of input trees. We evaluate GTR by reducing Python files that cause interpreter crashes, JavaScript files that cause browser inconsistencies, PDF documents with malicious content, and XML files used to test an XML validator. The GTR algorithm reduces the trees of these files to 45.3%, 3.6%, 44.2%, and 1.3% of the original size, respectively, outperforming both delta debugging and another state-of-the-art algorithm.
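A condensed sketch of tree reduction with two of the transformations such an algorithm generalizes over, subtree deletion and lifting a child over its parent (the [label, *children] encoding and the "still contains the bug node" property are our own stand-ins):

```python
def subtrees(tree, path=()):
    """Yield (path, node) for every node, root first."""
    yield path, tree
    for i, child in enumerate(tree[1:], start=1):
        yield from subtrees(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + [replace(tree[i], path[1:], new)] + tree[i + 1:]

def delete(tree, path):
    *parent, i = path
    node = get(tree, parent)
    return replace(tree, parent, node[:i] + node[i + 1:])

def reduce_tree(tree, holds):
    """Greedily apply 'delete subtree' and 'lift child over parent',
    keeping any candidate on which the property still holds. Lifting is
    a restructuring step that plain delta debugging cannot perform."""
    changed = True
    while changed:
        changed = False
        for path, node in list(subtrees(tree)):
            if path:
                cand = delete(tree, path)
                if holds(cand):
                    tree, changed = cand, True
                    break
            for child in node[1:]:
                cand = replace(tree, path, child)
                if holds(cand):
                    tree, changed = cand, True
                    break
            if changed:
                break
    return tree

def contains(t, label):
    return t[0] == label or any(contains(c, label) for c in t[1:])

tree = ["root", ["a", ["bug"]], ["b"], ["c", ["d"]]]
reduced = reduce_tree(tree, lambda t: contains(t, "bug"))
```

Lifting repeatedly hoists the failure-inducing subtree to the root, shrinking the six-node tree to the single node that still triggers the property.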
@InProceedings{ASE17p861,
author = {Satia Herfert and Jibesh Patra and Michael Pradel},
title = {Automatically Reducing Tree-Structured Test Inputs},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {861--871},
doi = {},
year = {2017},
}
Synthetic Data Generation for Statistical Testing
Ghanem Soltana,
Mehrdad Sabetzadeh, and Lionel C. Briand
(University of Luxembourg, Luxembourg)
Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of simultaneously fulfilling the statistical representativeness and logical validity requirements.
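A miniature version of generate-then-tweak (the citizen-record schema, the age marginal, and the validity constraint are hypothetical; the real approach iteratively guides the tweaks so that the statistical targets are maintained):

```python
import random
random.seed(1)

def sample_record():
    """Draw a record matching the desired statistical characteristics."""
    age = max(0, int(random.gauss(40, 15)))
    is_minor = random.random() < 0.2   # sampled independently, may clash
    return {"age": age, "is_minor": is_minor}

def valid(r):
    """Logical validity constraint linking the two fields."""
    return r["is_minor"] == (r["age"] < 18)

def tweak(r):
    """Minimal repair: fix the dependent flag rather than resampling age,
    so the age marginal distribution is left untouched."""
    return {"age": r["age"], "is_minor": r["age"] < 18}

population = [sample_record() for _ in range(1000)]
population = [r if valid(r) else tweak(r) for r in population]
```

After the tweak pass, every record satisfies the constraint while the statistically generated age values are preserved unchanged.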
@InProceedings{ASE17p872,
author = {Ghanem Soltana and Mehrdad Sabetzadeh and Lionel C. Briand},
title = {Synthetic Data Generation for Statistical Testing},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {872--882},
doi = {},
year = {2017},
}