ESEC/FSE 2018 – Author Index |
Adamsen, Christoffer Quist |
ESEC/FSE '18: "Practical AJAX Race Detection ..."
Practical AJAX Race Detection for JavaScript Web Applications
Christoffer Quist Adamsen, Anders Møller, Saba Alimadadi, and Frank Tip (Aarhus University, Denmark; Northeastern University, USA) Asynchronous client-server communication is a common source of errors in JavaScript web applications. Such errors are difficult to detect using ordinary testing because of the nondeterministic scheduling of AJAX events. Existing automated event race detectors are generally too imprecise or too inefficient to be practically useful. To address this problem, we present a new approach based on a lightweight combination of dynamic analysis and controlled execution that directly targets identification of harmful AJAX event races. We experimentally demonstrate using our implementation, AjaxRacer, that this approach is capable of automatically detecting harmful AJAX event races in many websites, and producing informative error messages that support diagnosis and debugging. Among 20 widely used web pages that use AJAX, AjaxRacer discovers harmful AJAX races in 12 of them, with a total of 72 error reports, and with very few false positives. @InProceedings{ESEC/FSE18p38, author = {Christoffer Quist Adamsen and Anders Møller and Saba Alimadadi and Frank Tip}, title = {Practical AJAX Race Detection for JavaScript Web Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {38--48}, doi = {10.1145/3236024.3236038}, year = {2018}, } Publisher's Version Info Artifacts Reusable Artifacts Functional |
|
Afzal, Afsoon |
ESEC/FSE '18-DOC: "Quality Assurance Automation ..."
Quality Assurance Automation in Autonomous Systems
Afsoon Afzal (Carnegie Mellon University, USA) Robots and autonomous systems are finding their way to interact with the public, and failures in these systems could be extremely expensive, even deadly. However, low-cost software-based simulation could be a promising approach to systematically test robotics systems and prevent failures as early as possible. In our early work, we showed that the majority of bugs could actually be reproduced and discovered using a low-fidelity simulation environment. We created a high-level framework for automated testing of popular ArduPilot systems. In this work, I propose novel approaches to automatically infer powerful representations of system models, and generate test suites with the purpose of enhancing automated fault localization performance and describing the root cause of failures. Finally, I propose to use those novel approaches to inform the construction of automated program repair techniques for autonomous systems. @InProceedings{ESEC/FSE18p948, author = {Afsoon Afzal}, title = {Quality Assurance Automation in Autonomous Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {948--951}, doi = {10.1145/3236024.3275429}, year = {2018}, } Publisher's Version |
|
Albarghouthi, Aws |
ESEC/FSE '18: "Syntax-Guided Synthesis of ..."
Syntax-Guided Synthesis of Datalog Programs
Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik (University of Pennsylvania, USA; Hanyang University, South Korea; University of Wisconsin-Madison, USA) Datalog has witnessed promising applications in a variety of domains. We propose a programming-by-example system, ALPS, to synthesize Datalog programs from input-output examples. Scaling synthesis to realistic programs in this manner is challenging due to the rich expressivity of Datalog. We present a syntax-guided synthesis approach that prunes the search space by exploiting the observation that in practice Datalog programs comprise rules that have similar latent syntactic structure. We evaluate ALPS on a suite of 34 benchmarks from three domains—knowledge discovery, program analysis, and database queries. The evaluation shows that ALPS can synthesize 33 of these benchmarks, and outperforms the state-of-the-art tools Metagol and Zaatar, which can synthesize only up to 10 of the benchmarks. @InProceedings{ESEC/FSE18p515, author = {Xujie Si and Woosuk Lee and Richard Zhang and Aws Albarghouthi and Paraschos Koutris and Mayur Naik}, title = {Syntax-Guided Synthesis of Datalog Programs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {515--527}, doi = {10.1145/3236024.3236034}, year = {2018}, } Publisher's Version Artifacts Functional ESEC/FSE '18: "Neural-Augmented Static Analysis ..." Neural-Augmented Static Analysis of Android Communication Jinman Zhao, Aws Albarghouthi, Vaibhav Rastogi, Somesh Jha, and Damien Octeau (University of Wisconsin-Madison, USA; Google, USA) We address the problem of discovering communication links between applications in the popular Android mobile operating system, an important problem for security and privacy in Android. Any scalable static analysis in this complex setting is bound to produce an excessive amount of false-positives, rendering it impractical. To improve precision, we propose to augment static analysis with a trained neural-network model that estimates the probability that a communication link truly exists. We describe a neural-network architecture that encodes abstractions of communicating objects in two applications and estimates the probability with which a link indeed exists. At the heart of our architecture are type-directed encoders (TDE), a general framework for elegantly constructing encoders of a compound data type by recursively composing encoders for its constituent types. We evaluate our approach on a large corpus of Android applications, and demonstrate that it achieves very high accuracy. Further, we conduct thorough interpretability studies to understand the internals of the learned neural networks. @InProceedings{ESEC/FSE18p342, author = {Jinman Zhao and Aws Albarghouthi and Vaibhav Rastogi and Somesh Jha and Damien Octeau}, title = {Neural-Augmented Static Analysis of Android Communication}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {342--353}, doi = {10.1145/3236024.3236066}, year = {2018}, } Publisher's Version |
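As a concrete illustration of the programming-by-example setting ALPS addresses, the Python sketch below evaluates a candidate Datalog-style program (transitive closure over an edge relation) bottom-up and checks it against an input-output example. The relations, rules, and example data are invented for illustration; ALPS itself searches over syntactically constrained rule templates as described in the abstract.

```python
# Illustrative sketch (not ALPS): check a candidate Datalog-style program
# against an input-output example by naive bottom-up evaluation.
# The 'edge'/'path' relations and the candidate rules are hypothetical.

def eval_path(edges):
    """Evaluate path(X,Y) :- edge(X,Y).  path(X,Y) :- edge(X,Z), path(Z,Y)."""
    path = set(edges)                      # first (non-recursive) rule
    changed = True
    while changed:                         # fixpoint for the recursive rule
        changed = False
        for (x, z) in edges:
            for (z2, y) in list(path):
                if z == z2 and (x, y) not in path:
                    path.add((x, y))
                    changed = True
    return path

# Input-output example: a consistent candidate must reproduce 'expected'.
edges = {(1, 2), (2, 3), (3, 4)}
expected = {(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)}
assert eval_path(edges) == expected        # candidate program is consistent
```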
|
Alimadadi, Saba |
ESEC/FSE '18: "Practical AJAX Race Detection ..."
Practical AJAX Race Detection for JavaScript Web Applications
Christoffer Quist Adamsen, Anders Møller, Saba Alimadadi, and Frank Tip (Aarhus University, Denmark; Northeastern University, USA) Asynchronous client-server communication is a common source of errors in JavaScript web applications. Such errors are difficult to detect using ordinary testing because of the nondeterministic scheduling of AJAX events. Existing automated event race detectors are generally too imprecise or too inefficient to be practically useful. To address this problem, we present a new approach based on a lightweight combination of dynamic analysis and controlled execution that directly targets identification of harmful AJAX event races. We experimentally demonstrate using our implementation, AjaxRacer, that this approach is capable of automatically detecting harmful AJAX event races in many websites, and producing informative error messages that support diagnosis and debugging. Among 20 widely used web pages that use AJAX, AjaxRacer discovers harmful AJAX races in 12 of them, with a total of 72 error reports, and with very few false positives. @InProceedings{ESEC/FSE18p38, author = {Christoffer Quist Adamsen and Anders Møller and Saba Alimadadi and Frank Tip}, title = {Practical AJAX Race Detection for JavaScript Web Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {38--48}, doi = {10.1145/3236024.3236038}, year = {2018}, } Publisher's Version Info Artifacts Reusable Artifacts Functional |
|
Alipour, Mohammad Amin |
ESEC/FSE '18-NIER: "On the Naturalness of Proofs ..."
On the Naturalness of Proofs
Vincent J. Hellendoorn, Premkumar T. Devanbu, and Mohammad Amin Alipour (University of California at Davis, USA; University of Houston, USA) Proofs play a key role in reasoning about programs and verification of properties of systems. Mechanized proof assistants help users in developing and checking the consistency of proofs using the proof language developed by the systems; but even then writing proofs is tedious and could benefit from automated insight. In this paper, we analyze proofs in two different proof assistant systems (Coq and HOL Light) to investigate if there is evidence of "naturalness" in these proofs: viz., recurring linguistic patterns that are amenable to language models, in the way that programming languages are known to be. Such models could be used to find errors, rewrite proofs, help suggest dependencies, and perhaps even synthesize (steps of) proofs. We apply state-of-the-art language models to large corpora of proofs to show that this is indeed the case: proofs are remarkably predictable, much like other programming languages. Code completion tools for Coq proofs could save over 60% of typing effort. As proofs have become increasingly central to writing provably correct, large programs (such as the CompCert C compiler), our demonstration that they are amenable to general statistical models unlocks a range of linguistics-inspired tool support. @InProceedings{ESEC/FSE18p724, author = {Vincent J. Hellendoorn and Premkumar T. Devanbu and Mohammad Amin Alipour}, title = {On the Naturalness of Proofs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {724--728}, doi = {10.1145/3236024.3264832}, year = {2018}, } Publisher's Version |
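A minimal way to see what "naturalness" means operationally: train a simple language model on token sequences and check how predictable the next token is. The sketch below uses a bigram model over invented proof-like token streams; the paper applies far stronger state-of-the-art language models to real Coq and HOL Light corpora.

```python
# Toy illustration of "naturalness": fit a bigram model on token sequences
# and measure how often the most likely next token is the actual one.
# The token sequences below are invented stand-ins for tokenized proofs.
from collections import Counter, defaultdict

corpus = [
    ["intros", "x", ".", "induction", "x", ".", "auto", ".", "simpl", ".", "auto", "."],
    ["intros", "x", "y", ".", "induction", "x", ".", "auto", ".", "simpl", ".", "rewrite", "IHx", ".", "auto", "."],
]

bigrams = defaultdict(Counter)
for seq in corpus:
    for prev, nxt in zip(seq, seq[1:]):
        bigrams[prev][nxt] += 1

def predict(prev):
    return bigrams[prev].most_common(1)[0][0] if bigrams[prev] else None

hits = total = 0
for seq in corpus:
    for prev, nxt in zip(seq, seq[1:]):
        hits += predict(prev) == nxt
        total += 1
print(f"top-1 next-token accuracy: {hits / total:.0%}")
```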
|
Allamanis, Miltiadis |
ESEC/FSE '18: "RefiNym: Using Names to Refine ..."
RefiNym: Using Names to Refine Types
Santanu Kumar Dash, Miltiadis Allamanis, and Earl T. Barr (University College London, UK; Microsoft Research, UK) Source code is bimodal: it combines a formal, algorithmic channel and a natural language channel of identifiers and comments. In this work, we model the bimodality of code with name flows, an assignment flow graph augmented to track identifier names. Conceptual types are logically distinct types that do not always coincide with program types. Passwords and URLs are example conceptual types that can share the program type string. Our tool, RefiNym, is an unsupervised method that mines a lattice of conceptual types from name flows and reifies them into distinct nominal types. For string, RefiNym finds and splits conceptual types originally merged into a single type, reducing the number of same-type variables per scope from 8.7 to 2.2 while eliminating 21.9% of scopes that have more than one same-type variable in scope. This makes the code more self-documenting and frees the type system to prevent a developer from inadvertently assigning data across conceptual types. @InProceedings{ESEC/FSE18p107, author = {Santanu Kumar Dash and Miltiadis Allamanis and Earl T. Barr}, title = {RefiNym: Using Names to Refine Types}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {107--117}, doi = {10.1145/3236024.3236042}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Deep Learning Type Inference ..." Deep Learning Type Inference Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis (University of California at Davis, USA; Microsoft Research, USA; University College London, UK; Microsoft Research, UK) Dynamically typed languages such as JavaScript and Python are increasingly popular, yet static typing has not been totally eclipsed: Python now supports type annotations and languages like TypeScript offer a middle-ground for JavaScript: a strict superset of JavaScript, to which it transpiles, coupled with a type system that permits partially typed programs. However, static typing has a cost: adding annotations, reading the added syntax, and wrestling with the type system to fix type errors. Type inference can ease the transition to more statically typed code and unlock the benefits of richer compile-time information, but is limited in languages like JavaScript as it cannot soundly handle duck-typing or runtime evaluation via eval. We propose DeepTyper, a deep learning model that understands which types naturally occur in certain contexts and relations and can provide type suggestions, which can often be verified by the type checker, even if it could not infer the type initially. DeepTyper, leverages an automatically aligned corpus of tokens and types to accurately predict thousands of variable and function type annotations. Furthermore, we demonstrate that context is key in accurately assigning these types and introduce a technique to reduce overfitting on local cues while highlighting the need for further improvements. Finally, we show that our model can interact with a compiler to provide more than 4,000 additional type annotations with over 95% precision that could not be inferred without the aid of DeepTyper. @InProceedings{ESEC/FSE18p152, author = {Vincent J. Hellendoorn and Christian Bird and Earl T. Barr and Miltiadis Allamanis}, title = {Deep Learning Type Inference}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {152--162}, doi = {10.1145/3236024.3236051}, year = {2018}, } Publisher's Version Info |
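RefiNym's end goal, splitting one program type into several conceptual types, can be previewed with nominal wrapper types. The sketch below is a Python analogue (RefiNym targets C# code and mines the split automatically from name flows); the Password and Url names are hypothetical examples of conceptual types sharing the program type string.

```python
# Illustration of reifying conceptual types into distinct nominal types --
# the end state RefiNym aims for. Names here are hypothetical.
from typing import NewType

Password = NewType("Password", str)   # conceptual type 1, program type str
Url = NewType("Url", str)             # conceptual type 2, program type str

def store_secret(p: Password) -> None:
    print("storing", len(p), "characters")

u = Url("https://example.org/login")
# store_secret(u)                     # a static checker (e.g., mypy) now flags this
store_secret(Password("hunter2"))     # intended use passes the checker
```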
|
Amar, Hen |
ESEC/FSE '18: "Using Finite-State Models ..."
Using Finite-State Models for Log Differencing
Hen Amar, Lingfeng Bao, Nimrod Busany, David Lo, and Shahar Maoz (Tel Aviv University, Israel; Zhejiang University, China; Singapore Management University, Singapore) Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. Then, the difference between the logs is the main target of interest. In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails. We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study including 60 participants demonstrating the effectiveness of the approach in log differencing tasks. We have further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available in a prototype web-application for experiments. @InProceedings{ESEC/FSE18p49, author = {Hen Amar and Lingfeng Bao and Nimrod Busany and David Lo and Shahar Maoz}, title = {Using Finite-State Models for Log Differencing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {49--59}, doi = {10.1145/3236024.3236069}, year = {2018}, } Publisher's Version |
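The k-sequence comparison at the heart of 2KDiff can be illustrated directly: collect all windows of k consecutive events per log and diff the two sets. This is a toy sketch with invented logs, not the authors' algorithm, which additionally builds and highlights k-Tails models.

```python
# Toy sketch of the k-sequence comparison underlying 2KDiff (not the
# authors' implementation, which builds finite-state k-Tails models).

def k_sequences(log, k):
    """All length-k windows of consecutive events over all traces in a log."""
    return {tuple(trace[i:i + k]) for trace in log for i in range(len(trace) - k + 1)}

log_a = [["login", "query", "logout"], ["login", "query", "query", "logout"]]
log_b = [["login", "logout"]]

k = 2
only_in_a = k_sequences(log_a, k) - k_sequences(log_b, k)
only_in_b = k_sequences(log_b, k) - k_sequences(log_a, k)
print("only in A:", sorted(only_in_a))   # ('login','query'), ('query','logout'), ('query','query')
print("only in B:", sorted(only_in_b))   # ('login','logout')
```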
|
Anand, Vaastav |
ESEC/FSE '18-SRC: "Dara: Hybrid Model Checking ..."
Dara: Hybrid Model Checking of Distributed Systems
Vaastav Anand (University of British Columbia, Canada) Building correct implementations of distributed systems continues to elude us. Existing solutions include abstract modeling languages such as TLA+ and PlusCal, which specify models of systems; tools like Coq and SPIN, which verify the correctness of models but require a considerable amount of effort; and transparent model checkers like MODIST, CMC, and CHESS, which suffer from state-space explosion and are too slow to be practical. We propose Dara, a novel hybrid technique that combines the speed of abstract model checkers with the correctness and ease-of-use of transparent model checkers. Dara utilizes tests as well as a transparent model checker to generate logs from real executions of the system. The generated logs are analyzed to infer a model of the system, which is model-checked by SPIN to verify user-provided invariants. Invariant violations are reported as likely bug traces. These traces are then passed to a replay engine, which tries to replay the traces as real executions of the system to remove false positives. We are currently evaluating Dara's efficiency and usability. @InProceedings{ESEC/FSE18p977, author = {Vaastav Anand}, title = {Dara: Hybrid Model Checking of Distributed Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {977--979}, doi = {10.1145/3236024.3275438}, year = {2018}, } Publisher's Version |
|
Ang, Ming Yi |
ESEC/FSE '18-IND: "Efficient Static Checking ..."
Efficient Static Checking of Library Updates
Darius Foo, Hendy Chua, Jason Yeo, Ming Yi Ang, and Asankhaya Sharma (CA Technologies, Singapore) Software engineering practices have evolved to the point where a developer writing a new application today doesn’t start from scratch, but reuses a number of open source libraries and components. These third-party libraries evolve independently of the applications in which they are used, and may not maintain stable interfaces as bugs and vulnerabilities in them are fixed. This in turn causes API incompatibilities in downstream applications which must be manually resolved. Oversight here may manifest in many ways, from test failures to crashes at runtime. To address this problem, we present a static analysis for automatically and efficiently checking if a library upgrade introduces an API incompatibility. Our analysis does not rely on reported version information from library developers, and instead computes the actual differences between methods in libraries across different versions. The analysis is scalable, enabling real-time diff queries involving arbitrary pairs of library versions. It supports a vulnerability remediation product which suggests library upgrades automatically and is lightweight enough to be part of a continuous integration/delivery (CI/CD) pipeline. To evaluate the effectiveness of our approach, we determine semantic versioning adherence of a corpus of open source libraries taken from Maven Central, PyPI, and RubyGems. We find that on average, 26% of library versions are in violation of semantic versioning. We also analyze a collection of popular open source projects from GitHub to determine if we can automatically update libraries in them without causing API incompatibilities. Our results indicate that we can suggest upgrades automatically for 10% of the libraries. @InProceedings{ESEC/FSE18p791, author = {Darius Foo and Hendy Chua and Jason Yeo and Ming Yi Ang and Asankhaya Sharma}, title = {Efficient Static Checking of Library Updates}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {791--796}, doi = {10.1145/3236024.3275535}, year = {2018}, } Publisher's Version |
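A minimal sketch of the kind of API diff described above: compare exported method signatures across two library versions and flag a semantic-versioning violation when a breaking change ships without a major version bump. The signature maps and version numbers are hypothetical inputs, not output of the authors' analysis.

```python
# Hedged sketch of an API-compatibility check between two library versions.
# Real tools extract signatures from the libraries themselves; here the
# signature maps and version strings are invented for illustration.

def breaking_changes(old_api, new_api):
    removed = {m for m in old_api if m not in new_api}
    changed = {m for m in old_api if m in new_api and old_api[m] != new_api[m]}
    return removed | changed

def violates_semver(old_version, new_version, old_api, new_api):
    old_major = int(old_version.split(".")[0])
    new_major = int(new_version.split(".")[0])
    # Breaking changes are only allowed when the major version is bumped.
    return bool(breaking_changes(old_api, new_api)) and new_major == old_major

old_api = {"parse": "(text: str) -> Doc", "dump": "(doc: Doc) -> str"}
new_api = {"parse": "(text: str, strict: bool) -> Doc"}   # 'dump' removed, 'parse' changed

print(breaking_changes(old_api, new_api))                    # {'dump', 'parse'}
print(violates_semver("1.4.2", "1.5.0", old_api, new_api))   # True: breaking change in a minor bump
```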
|
Angell, Rico |
ESEC/FSE '18-DEMO: "Themis: Automatically Testing ..."
Themis: Automatically Testing Software for Discrimination
Rico Angell, Brittany Johnson, Yuriy Brun, and Alexandra Meliou (University of Massachusetts at Amherst, USA) Bias in decisions made by modern software is becoming a common and serious problem. We present Themis, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior. We explain how Themis can measure discrimination and aid its debugging, describe a set of optimizations Themis uses to reduce test suite size, and demonstrate Themis' effectiveness on open-source software. Themis is open-source and all our evaluation data are available at http://fairness.cs.umass.edu/. See a video of Themis in action: https://youtu.be/brB8wkaUesY @InProceedings{ESEC/FSE18p871, author = {Rico Angell and Brittany Johnson and Yuriy Brun and Alexandra Meliou}, title = {Themis: Automatically Testing Software for Discrimination}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {871--875}, doi = {10.1145/3236024.3264590}, year = {2018}, } Publisher's Version Video Info |
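One of the two discrimination measures mentioned above, causal discrimination, asks whether flipping only a sensitive input changes the decision. The sketch below estimates that rate for a deliberately biased toy classifier; it illustrates the measure, not the Themis tool, whose test-generation and pruning optimizations are described in the paper.

```python
import random

# Toy decision procedure under test; 'gender' is the sensitive input.
def approve_loan(income, gender):
    return income > 50000 or (gender == "male" and income > 40000)

def causal_discrimination_rate(trials=10000, seed=0):
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        income = rng.randint(20000, 80000)
        gender = rng.choice(["male", "female"])
        other = "female" if gender == "male" else "male"
        # Count inputs whose outcome changes when only the sensitive attribute flips.
        if approve_loan(income, gender) != approve_loan(income, other):
            flips += 1
    return flips / trials

print(f"causal discrimination: {causal_discrimination_rate():.1%}")
```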
|
Aniche, Maurício |
ESEC/FSE '18-DEMO: "PyDriller: Python Framework ..."
PyDriller: Python Framework for Mining Software Repositories
Davide Spadini, Maurício Aniche, and Alberto Bacchelli (Delft University of Technology, Netherlands; University of Zurich, Switzerland) Software repositories contain historical and valuable information about the overall development of software systems. Mining software repositories (MSR) is nowadays considered one of the most interesting growing fields within software engineering. MSR focuses on extracting and analyzing data available in software repositories to uncover interesting, useful, and actionable information about the system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repository. In this paper, we present PyDriller, a Python Framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python Framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% less LOC and significantly lower complexity. URL: https://github.com/ishepard/pydriller Materials: https://doi.org/10.5281/zenodo.1327363 Pre-print: https://doi.org/10.5281/zenodo.1327411 @InProceedings{ESEC/FSE18p908, author = {Davide Spadini and Maurício Aniche and Alberto Bacchelli}, title = {PyDriller: Python Framework for Mining Software Repositories}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {908--911}, doi = {10.1145/3236024.3264598}, year = {2018}, } Publisher's Version Video Info |
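A typical PyDriller session looks roughly like the sketch below, based on the documentation linked above. The class and attribute names follow the 2018-era API (e.g., RepositoryMining, commit.modifications) and have been renamed in later releases, so treat the exact identifiers as assumptions.

```python
# Illustrative PyDriller usage (API names per the 2018-era documentation;
# later releases renamed some classes, e.g. RepositoryMining -> Repository).
from pydriller import RepositoryMining

for commit in RepositoryMining("path/to/repo").traverse_commits():
    print(commit.hash, commit.author.name, commit.msg)
    for modification in commit.modifications:
        # Per-file change data mined from Git, e.g. churn per file.
        print("  ", modification.filename, "+", modification.added, "-", modification.removed)
```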
|
Armaly, Ameer |
ESEC/FSE '18: "Detecting Speech Act Types ..."
Detecting Speech Act Types in Developer Question/Answer Conversations during Bug Repair
Andrew Wood, Paige Rodeghero, Ameer Armaly, and Collin McMillan (University of Notre Dame, USA; Clemson University, USA; Google, USA) This paper targets the problem of speech act detection in conversations about bug repair. We conduct a ``Wizard of Oz'' experiment with 30 professional programmers, in which the programmers fix bugs for two hours, and use a simulated virtual assistant for help. Then, we use an open coding manual annotation procedure to identify the speech act types in the conversations. Finally, we train and evaluate a supervised learning algorithm to automatically detect the speech act types in the conversations. In 30 two-hour conversations, we made 2459 annotations and uncovered 26 speech act types. Our automated detection achieved 69% precision and 50% recall. The key application of this work is to advance the state of the art for virtual assistants in software engineering. Virtual assistant technology is growing rapidly, though applications in software engineering are behind those in other areas, largely due to a lack of relevant data and experiments. This paper targets this problem in the area of developer Q/A conversations about bug repair. @InProceedings{ESEC/FSE18p491, author = {Andrew Wood and Paige Rodeghero and Ameer Armaly and Collin McMillan}, title = {Detecting Speech Act Types in Developer Question/Answer Conversations during Bug Repair}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {491--502}, doi = {10.1145/3236024.3236031}, year = {2018}, } Publisher's Version |
|
Aydin, Abdulbaki |
ESEC/FSE '18: "Parameterized Model Counting ..."
Parameterized Model Counting for String and Numeric Constraints
Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu (Microsoft, USA; University of California at Santa Barbara, USA; National Chengchi University, Taiwan) Recently, symbolic program analysis techniques have been extended to quantitative analyses using model counting constraint solvers. Given a constraint and a bound, a model counting constraint solver computes the number of solutions for the constraint within the bound. We present a parameterized model counting constraint solver for string and numeric constraints. We first construct a multi-track deterministic finite state automaton that accepts all solutions to the given constraint. We limit the numeric constraints to linear integer arithmetic, and for non-regular string constraints we over-approximate the solution set. Counting the number of accepting paths in the generated automaton solves the model counting problem. Our approach is parameterized in the sense that, we do not assume a finite domain size during automata construction, resulting in a potentially infinite set of solutions, and our model counting approach works for arbitrarily large bounds. We experimentally demonstrate the effectiveness of our approach on a large set of string and numeric constraints extracted from software applications. We experimentally compare our tool to five existing model counting constraint solvers for string and numeric constraints and demonstrate that our tool is as efficient and as or more precise than other solvers. Moreover, our tool can handle mixed constraints with string and integer variables that no other tool can. @InProceedings{ESEC/FSE18p400, author = {Abdulbaki Aydin and William Eiers and Lucas Bang and Tegan Brennan and Miroslav Gavrilov and Tevfik Bultan and Fang Yu}, title = {Parameterized Model Counting for String and Numeric Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {400--410}, doi = {10.1145/3236024.3236064}, year = {2018}, } Publisher's Version Artifacts Functional |
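The reduction described above, model counting as counting accepting paths within a bound, can be shown with a small dynamic program. The DFA below (strings over {a, b} containing "ab") is a hand-made stand-in for the multi-track constraint automata the tool constructs.

```python
# Count strings of length <= bound accepted by a DFA via dynamic programming
# over accepting paths. The DFA is a toy stand-in for the constraint
# automata described in the paper.
dfa = {
    "states": {"q0", "q1", "q2"},
    "start": "q0",
    "accept": {"q2"},
    "delta": {("q0", "a"): "q1", ("q0", "b"): "q0",
              ("q1", "a"): "q1", ("q1", "b"): "q2",
              ("q2", "a"): "q2", ("q2", "b"): "q2"},
}

def model_count(dfa, bound):
    counts = {s: 0 for s in dfa["states"]}
    counts[dfa["start"]] = 1                 # one path of length 0
    total = sum(counts[s] for s in dfa["accept"])
    for _ in range(bound):
        nxt = {s: 0 for s in dfa["states"]}
        for (state, _symbol), target in dfa["delta"].items():
            nxt[target] += counts[state]     # extend every path by one symbol
        counts = nxt
        total += sum(counts[s] for s in dfa["accept"])
    return total

print(model_count(dfa, 3))   # 5 strings of length <= 3 over {a,b} contain "ab"
```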
|
Bacchelli, Alberto |
ESEC/FSE '18-DEMO: "PyDriller: Python Framework ..."
PyDriller: Python Framework for Mining Software Repositories
Davide Spadini, Maurício Aniche, and Alberto Bacchelli (Delft University of Technology, Netherlands; University of Zurich, Switzerland) Software repositories contain historical and valuable information about the overall development of software systems. Mining software repositories (MSR) is nowadays considered one of the most interesting growing fields within software engineering. MSR focuses on extracting and analyzing data available in software repositories to uncover interesting, useful, and actionable information about the system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repository. In this paper, we present PyDriller, a Python Framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python Framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% less LOC and significantly lower complexity. URL: https://github.com/ishepard/pydriller Materials: https://doi.org/10.5281/zenodo.1327363 Pre-print: https://doi.org/10.5281/zenodo.1327411 @InProceedings{ESEC/FSE18p908, author = {Davide Spadini and Maurício Aniche and Alberto Bacchelli}, title = {PyDriller: Python Framework for Mining Software Repositories}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {908--911}, doi = {10.1145/3236024.3264598}, year = {2018}, } Publisher's Version Video Info ESEC/FSE '18: "What Makes a Code Change Easier ..." What Makes a Code Change Easier to Review: An Empirical Investigation on Code Change Reviewability Achyudh Ram, Anand Ashok Sawant, Marco Castelluccio, and Alberto Bacchelli (University of Waterloo, Canada; Delft University of Technology, Netherlands; Mozilla, UK; Federico II University of Naples, Italy; University of Zurich, Switzerland) Peer code review is a practice widely adopted in software projects to improve the quality of code. In current code review practices, code changes are manually inspected by developers other than the author before these changes are integrated into a project or put into production. We conducted a study to obtain an empirical understanding of what makes a code change easier to review. To this end, we surveyed published academic literature and sources from gray literature (blogs and white papers), we interviewed ten professional developers, and we designed and deployed a reviewability evaluation tool that professional developers used to rate the reviewability of 98 changes. We find that reviewability is defined through several factors, such as the change description, size, and coherent commit history. We provide recommendations for practitioners and researchers. Public preprint [https://doi.org/10.5281/zenodo.1323659]; data and materials [https://doi.org/10.5281/zenodo.1323659]. @InProceedings{ESEC/FSE18p201, author = {Achyudh Ram and Anand Ashok Sawant and Marco Castelluccio and Alberto Bacchelli}, title = {What Makes a Code Change Easier to Review: An Empirical Investigation on Code Change Reviewability}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {201--212}, doi = {10.1145/3236024.3236080}, year = {2018}, } Publisher's Version |
|
Bagheri, Hamid |
ESEC/FSE '18-IND: "Modeling and Testing a Family ..."
Modeling and Testing a Family of Surgical Robots: An Experience Report
Niloofar Mansoor, Jonathan A. Saddler, Bruno Silva, Hamid Bagheri, Myra B. Cohen, and Shane Farritor (University of Nebraska-Lincoln, USA) Safety-critical applications often use dependability cases to validate that specified properties are invariant, or to demonstrate a counter example showing how that property might be violated. However, most dependability cases are written with a single product in mind. At the same time, software product lines (families of related software products) have been studied with the goal of modeling variability and commonality, and building family based techniques for both analysis and testing. However, there has been little work on building an end to end dependability case for a software product line (where a property is modeled, a counter example is found and then validated as a true positive via testing), and none that we know of in an emerging safety-critical domain, that of robotic surgery. In this paper, we study a family of surgical robots, that combine hardware and software, and are highly configurable, representing over 1300 unique robots. At the same time, they are considered safety-critical and should have associated dependability cases. We perform a case study to understand how we can bring together lightweight formal analysis, feature modeling, and testing to provide an end to end pipeline to find potential violations of important safety properties. In the process, we learned that there are some interesting and open challenges for the research community, which if solved will lead towards more dependable safety-critical cyber-physical systems. @InProceedings{ESEC/FSE18p785, author = {Niloofar Mansoor and Jonathan A. Saddler and Bruno Silva and Hamid Bagheri and Myra B. Cohen and Shane Farritor}, title = {Modeling and Testing a Family of Surgical Robots: An Experience Report}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {785--790}, doi = {10.1145/3236024.3275534}, year = {2018}, } Publisher's Version |
|
Baldi, Pierre |
ESEC/FSE '18: "Oreo: Detection of Clones ..."
Oreo: Detection of Clones in the Twilight Zone
Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. Lopes (University of California at Irvine, USA) Source code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. In between Type-3 and Type-4, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect – the Twilight Zone. Most clone detectors reported in the literature fail to operate in this zone. We present Oreo, a novel approach to source code clone detection that not only detects Type-1 to Type-3 clones accurately, but is also capable of detecting harder-to-detect clones in the Twilight Zone. Oreo is built using a combination of machine learning, information retrieval, and software metrics. We evaluate the recall of Oreo on BigCloneBench, and perform manual evaluation for precision. Oreo has both high recall and precision. More importantly, it pushes the boundary in detection of clones with moderate to weak syntactic similarity in a scalable manner @InProceedings{ESEC/FSE18p354, author = {Vaibhav Saini and Farima Farmahinifarahani and Yadong Lu and Pierre Baldi and Cristina V. Lopes}, title = {Oreo: Detection of Clones in the Twilight Zone}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {354--365}, doi = {10.1145/3236024.3236026}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Baltes, Sebastian |
ESEC/FSE '18: "Towards a Theory of Software ..."
Towards a Theory of Software Development Expertise
Sebastian Baltes and Stephan Diehl (University of Trier, Germany) Software development includes diverse tasks such as implementing new features, analyzing requirements, and fixing bugs. Being an expert in those tasks requires a certain set of skills, knowledge, and experience. Several studies investigated individual aspects of software development expertise, but what is missing is a comprehensive theory. We present a first conceptual theory of software development expertise that is grounded in data from a mixed-methods survey with 335 software developers and in literature on expertise and expert performance. Our theory currently focuses on programming, but already provides valuable insights for researchers, developers, and employers. The theory describes important properties of software development expertise and which factors foster or hinder its formation, including how developers' performance may decline over time. Moreover, our quantitative results show that developers' expertise self-assessments are context-dependent and that experience is not necessarily related to expertise. @InProceedings{ESEC/FSE18p187, author = {Sebastian Baltes and Stephan Diehl}, title = {Towards a Theory of Software Development Expertise}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {187--200}, doi = {10.1145/3236024.3236061}, year = {2018}, } Publisher's Version Info Artifacts Reusable Artifacts Functional |
|
Bang, Lucas |
ESEC/FSE '18: "Parameterized Model Counting ..."
Parameterized Model Counting for String and Numeric Constraints
Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu (Microsoft, USA; University of California at Santa Barbara, USA; National Chengchi University, Taiwan) Recently, symbolic program analysis techniques have been extended to quantitative analyses using model counting constraint solvers. Given a constraint and a bound, a model counting constraint solver computes the number of solutions for the constraint within the bound. We present a parameterized model counting constraint solver for string and numeric constraints. We first construct a multi-track deterministic finite state automaton that accepts all solutions to the given constraint. We limit the numeric constraints to linear integer arithmetic, and for non-regular string constraints we over-approximate the solution set. Counting the number of accepting paths in the generated automaton solves the model counting problem. Our approach is parameterized in the sense that, we do not assume a finite domain size during automata construction, resulting in a potentially infinite set of solutions, and our model counting approach works for arbitrarily large bounds. We experimentally demonstrate the effectiveness of our approach on a large set of string and numeric constraints extracted from software applications. We experimentally compare our tool to five existing model counting constraint solvers for string and numeric constraints and demonstrate that our tool is as efficient and as or more precise than other solvers. Moreover, our tool can handle mixed constraints with string and integer variables that no other tool can. @InProceedings{ESEC/FSE18p400, author = {Abdulbaki Aydin and William Eiers and Lucas Bang and Tegan Brennan and Miroslav Gavrilov and Tevfik Bultan and Fang Yu}, title = {Parameterized Model Counting for String and Numeric Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {400--410}, doi = {10.1145/3236024.3236064}, year = {2018}, } Publisher's Version Artifacts Functional |
|
Bao, Lingfeng |
ESEC/FSE '18-DEMO: "VT-Revolution: Interactive ..."
VT-Revolution: Interactive Programming Tutorials Made Possible
Lingfeng Bao, Zhenchang Xing, Xin Xia, David Lo, and Shanping Li (Zhejiang University, China; Australian National University, Australia; Monash University, Australia; Singapore Management University, Singapore) Programming video tutorials showcase programming tasks and associated workflows. Although video tutorials are easy to create, it is often difficult to explore the captured workflows and interact with the programs in the videos. In this work, we propose a tool named VTRevolution -- an interactive programming video tutorial authoring system. VTRevolution has two components: 1) a tutorial authoring system leverages operating system level instrumentation to log workflow history while tutorial authors are creating programming video tutorials; 2) a tutorial watching system enhances the learning experience of video tutorials by providing operation history and timeline-based browsing interactions. Our tutorial authoring system does not require any special recording tools or instrumentation of target applications. Neither does it incur any additional burden on tutorial authors to add interactions to video tutorials. Given a video tutorial enriched with synchronously-logged workflow history, our tutorial watching system allows tutorial watchers to explore the captured workflows and interact with files and code in a way that is impossible for video data alone. We conduct a user study of 90 developers to evaluate the design and effectiveness of our system in helping developers learn programming knowledge in video tutorials. @InProceedings{ESEC/FSE18p924, author = {Lingfeng Bao and Zhenchang Xing and Xin Xia and David Lo and Shanping Li}, title = {VT-Revolution: Interactive Programming Tutorials Made Possible}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {924--927}, doi = {10.1145/3236024.3264587}, year = {2018}, } Publisher's Version ESEC/FSE '18-DEMO: "DSM: A Specification Mining ..." DSM: A Specification Mining Tool using Recurrent Neural Network Based Language Model Tien-Duy B. Le, Lingfeng Bao, and David Lo (Singapore Management University, Singapore; Zhejiang University, China) Formal specifications are important but often unavailable. Furthermore, writing these specifications is time-consuming and requires skills from developers. In this work, we present Deep Specification Miner (DSM), an automated tool that applies deep learning to mine finite-state automaton (FSA) based specifications. DSM accepts as input a set of execution traces to train a Recurrent Neural Network Language Model (RNNLM). From the input traces, DSM creates a Prefix Tree Acceptor (PTA) and leverages the inferred RNNLM to extract many features. These features are then forwarded to clustering algorithms for merging similar automata states in the PTA for assembling a number of FSAs. Next, our tool performs a model selection heuristic to approximate F-measure of FSAs, and outputs the one with the highest estimated F-measure. Noticeably, our implementation of DSM provides several options that allows users to optimize quality of resultant FSAs. Our video demonstration on the performance of DSM is publicly available at https://goo.gl/Ju4yFS. @InProceedings{ESEC/FSE18p896, author = {Tien-Duy B. Le and Lingfeng Bao and David Lo}, title = {DSM: A Specification Mining Tool using Recurrent Neural Network Based Language Model}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {896--899}, doi = {10.1145/3236024.3264597}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Using Finite-State Models ..." 
Using Finite-State Models for Log Differencing Hen Amar, Lingfeng Bao, Nimrod Busany, David Lo, and Shahar Maoz (Tel Aviv University, Israel; Zhejiang University, China; Singapore Management University, Singapore) Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. Then, the difference between the logs is the main target of interest. In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails. We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study including 60 participants demonstrating the effectiveness of the approach in log differencing tasks. We have further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available in a prototype web-application for experiments. @InProceedings{ESEC/FSE18p49, author = {Hen Amar and Lingfeng Bao and Nimrod Busany and David Lo and Shahar Maoz}, title = {Using Finite-State Models for Log Differencing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {49--59}, doi = {10.1145/3236024.3236069}, year = {2018}, } Publisher's Version |
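For the DSM tool above, the starting point is a prefix tree acceptor (PTA) built from the input traces, which the RNNLM-guided clustering then merges into candidate FSAs. The sketch below builds only the PTA step from invented traces; the state-merging and model-selection stages of DSM are not shown.

```python
# Build a prefix tree acceptor (PTA) from execution traces -- the starting
# point DSM refines by merging similar states. Traces here are hypothetical.
def build_pta(traces):
    transitions = {}          # (state, event) -> state
    next_state = [1]          # state 0 is the root
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_state[0]
                next_state[0] += 1
            state = transitions[key]
    return transitions

traces = [["open", "read", "close"], ["open", "write", "close"]]
for (state, event), target in sorted(build_pta(traces).items()):
    print(f"{state} --{event}--> {target}")
```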
|
Barik, Titus |
ESEC/FSE '18: "How Should Compilers Explain ..."
How Should Compilers Explain Problems to Developers?
Titus Barik, Denae Ford, Emerson Murphy-Hill, and Chris Parnin (Microsoft, USA; North Carolina State University, USA) Compilers primarily give feedback about problems to developers through the use of error messages. Unfortunately, developers routinely find these messages to be confusing and unhelpful. In this paper, we postulate that because error messages present poor explanations, theories of explanation---such as Toulmin's model of argument---can be applied to improve their quality. To understand how compilers should present explanations to developers, we conducted a comparative evaluation with 68 professional software developers and an empirical study of compiler error messages found in Stack Overflow questions across seven different programming languages. Our findings suggest that, given a pair of error messages, developers significantly prefer the error message that employs proper argument structure over a deficient argument structure when neither offers a resolution---but will accept a deficient argument structure if it provides a resolution to the problem. Human-authored explanations on Stack Overflow converge to one of the three argument structures: those that provide a resolution to the error, simple arguments, and extended arguments that provide additional evidence for the problem. Finally, we contribute three practical design principles to inform the design and evaluation of compiler error messages. @InProceedings{ESEC/FSE18p633, author = {Titus Barik and Denae Ford and Emerson Murphy-Hill and Chris Parnin}, title = {How Should Compilers Explain Problems to Developers?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {633--643}, doi = {10.1145/3236024.3236040}, year = {2018}, } Publisher's Version |
|
Barr, Earl T. |
ESEC/FSE '18: "RefiNym: Using Names to Refine ..."
RefiNym: Using Names to Refine Types
Santanu Kumar Dash, Miltiadis Allamanis, and Earl T. Barr (University College London, UK; Microsoft Research, UK) Source code is bimodal: it combines a formal, algorithmic channel and a natural language channel of identifiers and comments. In this work, we model the bimodality of code with name flows, an assignment flow graph augmented to track identifier names. Conceptual types are logically distinct types that do not always coincide with program types. Passwords and URLs are example conceptual types that can share the program type string. Our tool, RefiNym, is an unsupervised method that mines a lattice of conceptual types from name flows and reifies them into distinct nominal types. For string, RefiNym finds and splits conceptual types originally merged into a single type, reducing the number of same-type variables per scope from 8.7 to 2.2 while eliminating 21.9% of scopes that have more than one same-type variable in scope. This makes the code more self-documenting and frees the type system to prevent a developer from inadvertently assigning data across conceptual types. @InProceedings{ESEC/FSE18p107, author = {Santanu Kumar Dash and Miltiadis Allamanis and Earl T. Barr}, title = {RefiNym: Using Names to Refine Types}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {107--117}, doi = {10.1145/3236024.3236042}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Darwinian Data Structure Selection ..." Darwinian Data Structure Selection Michail Basios, Lingbo Li, Fan Wu, Leslie Kanthan, and Earl T. Barr (University College London, UK) Data structure selection and tuning is laborious but can vastly improve an application’s performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce ARTEMIS a multi-objective, cloud-based search-based optimisation framework that automatically finds optimal, tuned DDS modulo a test suite, then changes an application to use that DDS. ARTEMIS achieves substantial performance improvements for every project in 5 Java projects from DaCapo benchmark, 8 popular projects and 30 uniformly sampled projects from GitHub. For execution time, CPU usage, and memory consumption, ARTEMIS finds at least one solution that improves all measures for 86% (37/43) of the projects. The median improvement across the best solutions is 4.8%, 10.1%, 5.1% for runtime, memory and CPU usage. These aggregate results understate ARTEMIS’s potential impact. Some of the benchmarks it improves are libraries or utility functions. Two examples are gson, a ubiquitous Java serialization framework, and xalan, Apache’s XML transformation tool. ARTEMIS improves gson by 16.5%, 1% and 2.2% for memory, runtime, and CPU; ARTEMIS improves xalan’s memory consumption by 23.5%. Every client of these projects will benefit from these performance improvements. @InProceedings{ESEC/FSE18p118, author = {Michail Basios and Lingbo Li and Fan Wu and Leslie Kanthan and Earl T. Barr}, title = {Darwinian Data Structure Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {118--128}, doi = {10.1145/3236024.3236043}, year = {2018}, } Publisher's Version Info ESEC/FSE '18: "Deep Learning Type Inference ..." Deep Learning Type Inference Vincent J. Hellendoorn, Christian Bird, Earl T. 
Barr, and Miltiadis Allamanis (University of California at Davis, USA; Microsoft Research, USA; University College London, UK; Microsoft Research, UK) Dynamically typed languages such as JavaScript and Python are increasingly popular, yet static typing has not been totally eclipsed: Python now supports type annotations and languages like TypeScript offer a middle-ground for JavaScript: a strict superset of JavaScript, to which it transpiles, coupled with a type system that permits partially typed programs. However, static typing has a cost: adding annotations, reading the added syntax, and wrestling with the type system to fix type errors. Type inference can ease the transition to more statically typed code and unlock the benefits of richer compile-time information, but is limited in languages like JavaScript as it cannot soundly handle duck-typing or runtime evaluation via eval. We propose DeepTyper, a deep learning model that understands which types naturally occur in certain contexts and relations and can provide type suggestions, which can often be verified by the type checker, even if it could not infer the type initially. DeepTyper, leverages an automatically aligned corpus of tokens and types to accurately predict thousands of variable and function type annotations. Furthermore, we demonstrate that context is key in accurately assigning these types and introduce a technique to reduce overfitting on local cues while highlighting the need for further improvements. Finally, we show that our model can interact with a compiler to provide more than 4,000 additional type annotations with over 95% precision that could not be inferred without the aid of DeepTyper. @InProceedings{ESEC/FSE18p152, author = {Vincent J. Hellendoorn and Christian Bird and Earl T. Barr and Miltiadis Allamanis}, title = {Deep Learning Type Inference}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {152--162}, doi = {10.1145/3236024.3236051}, year = {2018}, } Publisher's Version Info |
|
Basios, Michail |
ESEC/FSE '18: "Darwinian Data Structure Selection ..."
Darwinian Data Structure Selection
Michail Basios, Lingbo Li, Fan Wu, Leslie Kanthan, and Earl T. Barr (University College London, UK) Data structure selection and tuning is laborious but can vastly improve an application’s performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce ARTEMIS a multi-objective, cloud-based search-based optimisation framework that automatically finds optimal, tuned DDS modulo a test suite, then changes an application to use that DDS. ARTEMIS achieves substantial performance improvements for every project in 5 Java projects from DaCapo benchmark, 8 popular projects and 30 uniformly sampled projects from GitHub. For execution time, CPU usage, and memory consumption, ARTEMIS finds at least one solution that improves all measures for 86% (37/43) of the projects. The median improvement across the best solutions is 4.8%, 10.1%, 5.1% for runtime, memory and CPU usage. These aggregate results understate ARTEMIS’s potential impact. Some of the benchmarks it improves are libraries or utility functions. Two examples are gson, a ubiquitous Java serialization framework, and xalan, Apache’s XML transformation tool. ARTEMIS improves gson by 16.5%, 1% and 2.2% for memory, runtime, and CPU; ARTEMIS improves xalan’s memory consumption by 23.5%. Every client of these projects will benefit from these performance improvements. @InProceedings{ESEC/FSE18p118, author = {Michail Basios and Lingbo Li and Fan Wu and Leslie Kanthan and Earl T. Barr}, title = {Darwinian Data Structure Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {118--128}, doi = {10.1145/3236024.3236043}, year = {2018}, } Publisher's Version Info |
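The survival-of-the-fittest idea behind Darwinian data structures can be illustrated with a tiny selection loop: benchmark interchangeable implementations on the same usage pattern and keep the fastest. This toy sketch is not ARTEMIS, which also tunes parameters, optimizes memory and CPU usage alongside runtime, and rewrites the application to use the winning structure.

```python
# Toy Darwinian selection over interchangeable container implementations:
# time each candidate on a representative workload and keep the fastest.
import timeit
from collections import deque

def workload(make_container):
    c = make_container()
    for i in range(10000):
        c.append(i)
    for _ in range(5000):
        c.pop(0) if isinstance(c, list) else c.popleft()   # front removal

candidates = {"list": list, "deque": deque}
timings = {name: timeit.timeit(lambda m=make: workload(m), number=5)
           for name, make in candidates.items()}
best = min(timings, key=timings.get)
print(timings, "-> selected:", best)    # deque wins on this front-heavy workload
```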
|
Berger, Thorsten |
ESEC/FSE '18: "Model Transformation Languages ..."
Model Transformation Languages under a Magnifying Glass: A Controlled Experiment with Xtend, ATL, and QVT
Regina Hebig, Christoph Seidl, Thorsten Berger, John Kook Pedersen, and Andrzej Wąsowski (Chalmers University of Technology, Sweden; University of Gothenburg, Sweden; TU Braunschweig, Germany; IT University of Copenhagen, Denmark) In Model-Driven Software Development, models are automatically processed to support the creation, build, and execution of systems. A large variety of dedicated model-transformation languages exists, promising to efficiently realize the automated processing of models. To investigate the actual benefit of using such specialized languages, we performed a large-scale controlled experiment in which over 78 subjects solve 231 individual tasks using three languages. The experiment sheds light on commonalities and differences between model transformation languages (ATL, QVT-O) and on benefits of using them in common development tasks (comprehension, change, and creation) against a modern general-purpose language (Xtend). Our results show no statistically significant benefit of using a dedicated transformation language over a modern general-purpose language. However, we were able to identify several aspects of transformation programming where domain-specific transformation languages do appear to help, including copying objects, context identification, and conditioning the computation on types. @InProceedings{ESEC/FSE18p445, author = {Regina Hebig and Christoph Seidl and Thorsten Berger and John Kook Pedersen and Andrzej Wąsowski}, title = {Model Transformation Languages under a Magnifying Glass: A Controlled Experiment with Xtend, ATL, and QVT}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {445--455}, doi = {10.1145/3236024.3236046}, year = {2018}, } Publisher's Version Info |
|
Beschastnikh, Ivan |
ESEC/FSE '18: "Performance Comprehension ..."
Performance Comprehension at WiredTiger
Alexandra Fedorova, Craig Mustard, Ivan Beschastnikh, Julia Rubin, Augustine Wong, Svetozar Miucin, and Louis Ye (University of British Columbia, Canada) Software debugging is a time-consuming and challenging process. Supporting debugging has been a focus of the software engineering field since its inception with numerous empirical studies, theories, and tools to support developers in this task. Performance bugs and performance debugging is a sub-genre of debugging that has received less attention. In this paper we contribute an empirical case study of performance bug diagnosis in the WiredTiger project, the default database engine behind MongoDB. We perform an in-depth analysis of 44 Jira tickets documenting WiredTiger performance-related issues. We investigate how developers diagnose performance bugs: what information they collect, what tools they use, and what processes they follow. Our findings show that developers spend the majority of their performance debugging time chasing outlier events, such as latency spikes and throughput drops. Yet, they are not properly supported by existing performance debugging tools in this task. We also observe that developers often use tools without knowing in advance whether the obtained information will be relevant to debugging the problem. Therefore, we believe developers can benefit from tools that can be used for unstructured exploration of performance data, rather than for answering specific questions. @InProceedings{ESEC/FSE18p83, author = {Alexandra Fedorova and Craig Mustard and Ivan Beschastnikh and Julia Rubin and Augustine Wong and Svetozar Miucin and Louis Ye}, title = {Performance Comprehension at WiredTiger}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {83--94}, doi = {10.1145/3236024.3236081}, year = {2018}, } Publisher's Version |
|
Bhowmik, Tanmay |
ESEC/FSE '18-NIER: "Towards Data-Driven Vulnerability ..."
Towards Data-Driven Vulnerability Prediction for Requirements
Sayem Mohammad Imtiaz and Tanmay Bhowmik (Mississippi State University, USA) Due to the abundance of security breaches we continue to see, the software development community is recently paying attention to a more proactive approach towards security. This includes predicting vulnerability before exploitation employing static code analysis and machine learning techniques. Such mechanisms, however, are designed to detect post-implementation vulnerabilities. As the root of a vulnerability can often be traced back to the requirement specification, and vulnerability discovered later in the development life cycle is more expensive to fix, we need additional preventive mechanisms capable of predicting vulnerability at a much earlier stage. In this paper, we propose a novel framework providing an automated support to predict vulnerabilities for a requirement as early as during requirement engineering. We further present a preliminary demonstration of our framework and the promising results we observe clearly indicate the value of this new research idea. @InProceedings{ESEC/FSE18p744, author = {Sayem Mohammad Imtiaz and Tanmay Bhowmik}, title = {Towards Data-Driven Vulnerability Prediction for Requirements}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {744--748}, doi = {10.1145/3236024.3264836}, year = {2018}, } Publisher's Version |
|
Bian, Pan |
ESEC/FSE '18: "NAR-Miner: Discovering Negative ..."
NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection
Pan Bian, Bin Liang, Wenchang Shi, Jianjun Huang, and Yan Cai (Renmin University of China, China; Institute of Software at Chinese Academy of Sciences, China) Inferring programming rules from source code based on data mining techniques has been proven to be effective to detect software bugs. Existing studies focus on discovering positive rules in the form of A ⇒ B, indicating that when operation A appears, operation B should also be here. Unfortunately, the negative rules (A ⇒ ¬ B), indicating the mutual suppression or conflict relationships among program elements, have not gotten the attention they deserve. In fact, violating such negative rules can also result in serious bugs. In this paper, we propose a novel method called NAR-Miner to automatically extract negative association programming rules from large-scale systems, and detect their violations to find bugs. However, mining negative rules faces a more serious rule explosion problem than mining positive ones. Most of the obtained negative rules are uninteresting and can lead to unacceptable false alarms. To address the issue, we design a semantics-constrained mining algorithm to focus rule mining on the elements with strong semantic relationships. Furthermore, we introduce information entropy to rank candidate negative rules and highlight the interesting ones. Consequently, we effectively mitigate the rule explosion problem. We implement NAR-Miner and apply it to a Linux kernel (v4.12-rc6). The experiments show that the uninteresting rules are dramatically reduced and 17 detected violations have been confirmed as real bugs and patched by kernel community. We also apply NAR-Miner to PostgreSQL, OpenSSL and FFmpeg and discover six real bugs. @InProceedings{ESEC/FSE18p411, author = {Pan Bian and Bin Liang and Wenchang Shi and Jianjun Huang and Yan Cai}, title = {NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {411--422}, doi = {10.1145/3236024.3236032}, year = {2018}, } Publisher's Version |
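The following sketch is only meant to illustrate the flavor of negative association rule mining described above, not NAR-Miner itself: it treats each caller's set of called functions as a transaction and reports pairs A => NOT B whose co-occurrence is rare relative to A's support. The transactions, thresholds, and ranking by confidence are simplifying assumptions; the paper's semantics-constrained mining and information-entropy ranking are omitted.

from itertools import combinations

# Toy "transactions": the set of functions called inside each caller,
# standing in for the program elements mined from a code base.
transactions = [
    {"lock", "unlock", "read"},
    {"lock", "unlock", "write"},
    {"spin_lock", "spin_unlock"},
    {"mutex_lock", "mutex_unlock", "read"},
    {"rcu_read_lock", "rcu_read_unlock"},
    {"spin_lock", "spin_unlock", "write"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

MIN_SUP = 0.2   # only pair up individually frequent items
items = sorted({f for t in transactions for f in t})
frequent = [f for f in items if support({f}) >= MIN_SUP]

# A negative rule A => NOT B says: where A appears, B tends to be absent.
rules = []
for a, b in combinations(frequent, 2):
    for lhs, rhs in ((a, b), (b, a)):
        conf = 1.0 - support({lhs, rhs}) / support({lhs})
        if conf >= 0.9:                      # strong mutual-suppression signal
            rules.append((conf, lhs, rhs))

for conf, lhs, rhs in sorted(rules, reverse=True):
    print(f"{lhs} => NOT {rhs}   (confidence {conf:.2f})")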
|
Bird, Christian |
ESEC/FSE '18: "Deep Learning Type Inference ..."
Deep Learning Type Inference
Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis (University of California at Davis, USA; Microsoft Research, USA; University College London, UK; Microsoft Research, UK) Dynamically typed languages such as JavaScript and Python are increasingly popular, yet static typing has not been totally eclipsed: Python now supports type annotations and languages like TypeScript offer a middle-ground for JavaScript: a strict superset of JavaScript, to which it transpiles, coupled with a type system that permits partially typed programs. However, static typing has a cost: adding annotations, reading the added syntax, and wrestling with the type system to fix type errors. Type inference can ease the transition to more statically typed code and unlock the benefits of richer compile-time information, but is limited in languages like JavaScript as it cannot soundly handle duck-typing or runtime evaluation via eval. We propose DeepTyper, a deep learning model that understands which types naturally occur in certain contexts and relations and can provide type suggestions, which can often be verified by the type checker, even if it could not infer the type initially. DeepTyper leverages an automatically aligned corpus of tokens and types to accurately predict thousands of variable and function type annotations. Furthermore, we demonstrate that context is key in accurately assigning these types and introduce a technique to reduce overfitting on local cues while highlighting the need for further improvements. Finally, we show that our model can interact with a compiler to provide more than 4,000 additional type annotations with over 95% precision that could not be inferred without the aid of DeepTyper. @InProceedings{ESEC/FSE18p152, author = {Vincent J. Hellendoorn and Christian Bird and Earl T. Barr and Miltiadis Allamanis}, title = {Deep Learning Type Inference}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {152--162}, doi = {10.1145/3236024.3236051}, year = {2018}, } Publisher's Version Info |
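As a rough illustration of the intuition that types naturally occur in certain contexts, the toy sketch below predicts a type for an identifier from its neighboring tokens using plain co-occurrence counts. It is a deliberately crude stand-in for the paper's deep sequence model; the corpus, token contexts, and type names are all made up.

from collections import Counter, defaultdict

# Tiny "aligned corpus": token sequences with a type label for each
# identifier (None for non-identifiers). Purely illustrative data.
corpus = [
    [("let", None), ("n", "number"), ("=", None), ("xs", "number[]"),
     (".", None), ("length", "number")],
    [("let", None), ("msg", "string"), ("=", None), ("name", "string"),
     ("+", None), ("'!'", None)],
]

# Count how often each (previous token, next token) context co-occurs
# with each type -- the crudest possible version of "context predicts type".
context_types = defaultdict(Counter)
for tokens in corpus:
    for i, (tok, ty) in enumerate(tokens):
        if ty is None:
            continue
        prev_tok = tokens[i - 1][0] if i > 0 else "<s>"
        next_tok = tokens[i + 1][0] if i + 1 < len(tokens) else "</s>"
        context_types[(prev_tok, next_tok)][ty] += 1

def suggest(prev_tok, next_tok):
    """Return ranked type suggestions for an identifier in this context."""
    counts = context_types.get((prev_tok, next_tok), Counter())
    return [ty for ty, _ in counts.most_common()] or ["any"]

print(suggest("let", "="))   # e.g. ['number', 'string']
print(suggest("(", ")"))     # unseen context -> ['any']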
|
Bissyandé, Tegawendé F. |
ESEC/FSE '18: "FraudDroid: Automated Ad Fraud ..."
FraudDroid: Automated Ad Fraud Detection for Android Apps
Feng Dong, Haoyu Wang, Li Li, Yao Guo, Tegawendé F. Bissyandé, Tianming Liu, Guoai Xu, and Jacques Klein (Beijing University of Posts and Telecommunications, China; Monash University, Australia; Peking University, China; University of Luxembourg, Luxembourg) Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widely spread in apps, have not yet been explored or addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffic, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (∼ 93%) and recall (∼ 92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection. @InProceedings{ESEC/FSE18p257, author = {Feng Dong and Haoyu Wang and Li Li and Yao Guo and Tegawendé F. Bissyandé and Tianming Liu and Guoai Xu and Jacques Klein}, title = {FraudDroid: Automated Ad Fraud Detection for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--268}, doi = {10.1145/3236024.3236045}, year = {2018}, } Publisher's Version |
|
Bodden, Eric |
ESEC/FSE '18: "Do Android Taint Analysis ..."
Do Android Taint Analysis Tools Keep Their Promises?
Felix Pauck, Eric Bodden, and Heike Wehrheim (University of Paderborn, Germany) In recent years, researchers have developed a number of tools to conduct taint analysis of Android applications. While all the respective papers aim at providing a thorough empirical evaluation, comparability is hindered by varying or unclear evaluation targets. Sometimes, the apps used for evaluation are not precisely described. In other cases, authors use an established benchmark but cover it only partially. In yet other cases, the evaluations differ in terms of the data leaks searched for, or lack a ground truth to compare against. All those limitations make it impossible to truly compare the tools based on those published evaluations. We thus present ReproDroid, a framework allowing the accurate comparison of Android taint analysis tools. ReproDroid supports researchers in inferring the ground truth for data leaks in apps, in automatically applying tools to benchmarks, and in evaluating the obtained results. We use ReproDroid to comparatively evaluate on equal grounds the six prominent taint analysis tools Amandroid, DIALDroid, DidFail, DroidSafe, FlowDroid and IccTA. The results are largely positive although four tools violate some promises concerning features and accuracy. Finally, we contribute to the area of unbiased benchmarking with a new and improved version of the open test suite DroidBench. @InProceedings{ESEC/FSE18p331, author = {Felix Pauck and Eric Bodden and Heike Wehrheim}, title = {Do Android Taint Analysis Tools Keep Their Promises?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {331--341}, doi = {10.1145/3236024.3236029}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional ESEC/FSE '18-NIER: "Gamifying Static Analysis ..." Gamifying Static Analysis Lisa Nguyen Quang Do and Eric Bodden (University of Paderborn, Germany; Fraunhofer IEM, Germany) In the past decades, static code analysis has become a prevalent means to detect bugs and security vulnerabilities in software systems. As software becomes more complex, analysis tools also report lists of increasingly complex warnings that developers need to address on a daily basis. The novel insight we present in this work is that static analysis tools and video games both require users to take on repetitive and challenging tasks. Importantly, though, while good video games manage to keep players engaged, static analysis tools are notorious for their lacking user experience, which prevents developers from using them to their full potential, frequently resulting in dissatisfaction and even tool abandonment. We show parallels between gaming and using static analysis tools, and advocate that the user-experience issues of analysis tools can be addressed by looking at the analysis tooling system as a whole, and by integrating gaming elements that keep users engaged, such as providing immediate and clear feedback, collaborative problem solving, or motivators such as points and badges. @InProceedings{ESEC/FSE18p714, author = {Lisa Nguyen Quang Do and Eric Bodden}, title = {Gamifying Static Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {714--718}, doi = {10.1145/3236024.3264830}, year = {2018}, } Publisher's Version Info |
|
Böhme, Marcel |
ESEC/FSE '18: "Verifying the Long-Run Behavior ..."
Verifying the Long-Run Behavior of Probabilistic System Models in the Presence of Uncertainty
Yamilet R. Serrano Llerena, Marcel Böhme, Marc Brünink, Guoxin Su, and David S. Rosenblum (National University of Singapore, Singapore; Monash University, Australia; University of Wollongong, Australia) Verifying that a stochastic system is in a certain state when it has reached equilibrium has important applications. For instance, the probabilistic verification of the long-run behavior of a safety-critical system enables assessors to check whether it accepts a human abort-command at any time with a probability that is sufficiently high. The stochastic system is represented as probabilistic model, a long-run property is asserted and a probabilistic verifier checks the model against the property. However, existing probabilistic verifiers do not account for the imprecision of the probabilistic parameters in the model. Due to uncertainty, the probability of any state transition may be subject to small perturbations which can have direct consequences for the veracity of the verification result. In reality, the safety-critical system may accept the abort-command with an insufficient probability. In this paper, we introduce the first probabilistic verification technique that accounts for uncertainty on the verification of long-run properties of a stochastic system. We present a mathematical framework for the asymptotic analysis of the stationary distribution of a discrete-time Markov chain, making no assumptions about the distribution of the perturbations. Concretely, our novel technique computes upper and lower bounds on the long-run probability, given a certain degree of uncertainty about the stochastic system. @InProceedings{ESEC/FSE18p587, author = {Yamilet R. Serrano Llerena and Marcel Böhme and Marc Brünink and Guoxin Su and David S. Rosenblum}, title = {Verifying the Long-Run Behavior of Probabilistic System Models in the Presence of Uncertainty}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {587--597}, doi = {10.1145/3236024.3236078}, year = {2018}, } Publisher's Version |
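To make the setting concrete, the sketch below computes the stationary distribution of a small discrete-time Markov chain and then brute-forces how far a long-run probability can drift when every transition probability is perturbed by up to delta. This is only a numerical illustration of the problem the paper addresses; the paper derives analytic bounds rather than sampling, and the chain and delta here are invented.

import numpy as np

rng = np.random.default_rng(0)

# A 3-state discrete-time Markov chain (rows sum to 1). State 2 plays the
# role of the "accepts abort command" state whose long-run probability we care about.
P = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.20, 0.10, 0.70]])

def stationary(P):
    """Stationary distribution pi with pi P = pi, via the eigenvector of P^T."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

pi = stationary(P)
print("nominal long-run prob of state 2:", round(pi[2], 4))

# Brute-force look at perturbations: jitter each entry by up to delta,
# renormalize the rows, then recompute the stationary distribution.
delta, lo, hi = 0.02, 1.0, 0.0
for _ in range(2000):
    Q = np.clip(P + rng.uniform(-delta, delta, P.shape), 1e-9, None)
    Q /= Q.sum(axis=1, keepdims=True)
    p2 = stationary(Q)[2]
    lo, hi = min(lo, p2), max(hi, p2)

print(f"observed bounds under delta={delta}: [{lo:.4f}, {hi:.4f}]")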
|
Braberman, Víctor |
ESEC/FSE '18-IND: "Testing and Validating End ..."
Testing and Validating End User Programmed Calculated Fields
Víctor Braberman, Diego Garbervetsky, Javier Godoy, Sebastian Uchitel, Guido de Caso, Ignacio Perez, and Santiago Perez (University of Buenos Aires, Argentina; CONICET, Argentina; Imperial College London, UK; Medallia, USA) This paper reports on an approach for systematically generating test data from production databases for end user calculated field programs via a novel combination of symbolic execution and database queries. We also discuss the opportunities and challenges that this specific domain poses for symbolic execution and show how database queries can help complement some of symbolic execution's weaknesses, namely in the treatment of loops and also of path conditions that exceed SMT solver capabilities. @InProceedings{ESEC/FSE18p827, author = {Víctor Braberman and Diego Garbervetsky and Javier Godoy and Sebastian Uchitel and Guido de Caso and Ignacio Perez and Santiago Perez}, title = {Testing and Validating End User Programmed Calculated Fields}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {827--832}, doi = {10.1145/3236024.3275531}, year = {2018}, } Publisher's Version |
|
Brennan, Tegan |
ESEC/FSE '18: "Parameterized Model Counting ..."
Parameterized Model Counting for String and Numeric Constraints
Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu (Microsoft, USA; University of California at Santa Barbara, USA; National Chengchi University, Taiwan) Recently, symbolic program analysis techniques have been extended to quantitative analyses using model counting constraint solvers. Given a constraint and a bound, a model counting constraint solver computes the number of solutions for the constraint within the bound. We present a parameterized model counting constraint solver for string and numeric constraints. We first construct a multi-track deterministic finite state automaton that accepts all solutions to the given constraint. We limit the numeric constraints to linear integer arithmetic, and for non-regular string constraints we over-approximate the solution set. Counting the number of accepting paths in the generated automaton solves the model counting problem. Our approach is parameterized in the sense that, we do not assume a finite domain size during automata construction, resulting in a potentially infinite set of solutions, and our model counting approach works for arbitrarily large bounds. We experimentally demonstrate the effectiveness of our approach on a large set of string and numeric constraints extracted from software applications. We experimentally compare our tool to five existing model counting constraint solvers for string and numeric constraints and demonstrate that our tool is as efficient and as or more precise than other solvers. Moreover, our tool can handle mixed constraints with string and integer variables that no other tool can. @InProceedings{ESEC/FSE18p400, author = {Abdulbaki Aydin and William Eiers and Lucas Bang and Tegan Brennan and Miroslav Gavrilov and Tevfik Bultan and Fang Yu}, title = {Parameterized Model Counting for String and Numeric Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {400--410}, doi = {10.1145/3236024.3236064}, year = {2018}, } Publisher's Version Artifacts Functional |
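A minimal, single-track illustration of the counting step: once a constraint is compiled into an automaton, counting accepting paths up to a length bound counts the constraint's models within that bound. The DFA below (strings over {a, b} containing "ab") and the plain dynamic program are assumptions for the demo; the paper's multi-track construction and parameterized bounds are not shown.

# Minimal DFA for strings over {a, b} that contain the substring "ab";
# counting its accepting paths of length <= k counts the models of the
# constraint within that bound.
states = [0, 1, 2]            # 0: nothing seen, 1: saw 'a', 2: saw "ab" (accepting)
accepting = {2}
delta = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}

def count_models(bound):
    """Number of strings of length <= bound accepted by the DFA."""
    # counts[q] = number of length-n strings driving the DFA from state 0 to q
    counts = {q: 0 for q in states}
    counts[0] = 1
    total = sum(counts[q] for q in accepting)      # length 0
    for _ in range(bound):
        nxt = {q: 0 for q in states}
        for (q, _sym), r in delta.items():
            nxt[r] += counts[q]
        counts = nxt
        total += sum(counts[q] for q in accepting)
    return total

for k in range(1, 6):
    print(k, count_models(k))   # 0, 1, 5, 16, 42: strings of length <= k containing "ab"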
|
Brimble, Lance |
ESEC/FSE '18-IND: "Building Lean Continuous Integration ..."
Building Lean Continuous Integration and Delivery Pipelines by Applying DevOps Principles: A Case Study at Varidesk
Vidroha Debroy, Senecca Miller, and Lance Brimble (Varidesk, USA) Continuous Integration (CI) and Continuous Delivery (CD) are widely considered to be best practices in software development. Studies have shown, however, that adopting these practices can be challenging, and there are many barriers that engineers may face, such as overly long build times, lack of support for desired workflows, issues with configuration, etc. At Varidesk, we recently began shifting our primary web application from a monolithic to a micro-services-based architecture and also adapted our software development practices to aim for more effective CI/CD. In doing so, we also ran into some of the same afore-mentioned barriers. In this paper we focus on two specific challenges that we faced: long wait times for builds/releases to be queued and completed, and the lack of support for tooling, especially from a cross-cloud perspective. We then present the solutions that we came up with, which involved re-thinking DevOps as it applied to us, and re-building our own CI/CD pipelines based on DevOps-supporting approaches such as containerization, infrastructure-as-code, and orchestration. Our re-designed pipelines have led us to see speed increases, in terms of total build/release time, in the range of 330x-1110x and have enabled us to seamlessly move from a single-cloud to a multi-cloud environment, with no architectural changes to any apps. @InProceedings{ESEC/FSE18p851, author = {Vidroha Debroy and Senecca Miller and Lance Brimble}, title = {Building Lean Continuous Integration and Delivery Pipelines by Applying DevOps Principles: A Case Study at Varidesk}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {851--856}, doi = {10.1145/3236024.3275528}, year = {2018}, } Publisher's Version |
|
Brindescu, Caius |
ESEC/FSE '18-DOC: "How Do Developers Resolve ..."
How Do Developers Resolve Merge Conflicts? An Investigation into the Processes, Tools, and Improvements
Caius Brindescu (Oregon State University, USA) Most software development is done in teams. When more than one developer is modifying the source code, there is a chance that their changes will conflict. When this happens, developers have to interrupt their workflow in order to resolve the merge conflict. This interruption can lead to frustration and lost productivity. This makes collaboration, and the problems associated with it, an important aspect of software development. Merge conflicts are some of the more difficult issues that arise when working in a team. We plan to bring in more information about the strategies developers use when resolving merge conflicts. We will gather information through in-situ observations and interviews of developers resolving conflicts when working on real development tasks, combined with analytical methods. The information obtained can then be used to improve the existing tools and make it easier for developers when working in a collaborative environment. @InProceedings{ESEC/FSE18p952, author = {Caius Brindescu}, title = {How Do Developers Resolve Merge Conflicts? An Investigation into the Processes, Tools, and Improvements}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {952--955}, doi = {10.1145/3236024.3275430}, year = {2018}, } Publisher's Version |
|
Brun, Yuriy |
ESEC/FSE '18-DEMO: "Themis: Automatically Testing ..."
Themis: Automatically Testing Software for Discrimination
Rico Angell, Brittany Johnson, Yuriy Brun, and Alexandra Meliou (University of Massachusetts at Amherst, USA) Bias in decisions made by modern software is becoming a common and serious problem. We present Themis, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior. We explain how Themis can measure discrimination and aid its debugging, describe a set of optimizations Themis uses to reduce test suite size, and demonstrate Themis' effectiveness on open-source software. Themis is open-source and all our evaluation data are available at http://fairness.cs.umass.edu/. See a video of Themis in action: https://youtu.be/brB8wkaUesY @InProceedings{ESEC/FSE18p871, author = {Rico Angell and Brittany Johnson and Yuriy Brun and Alexandra Meliou}, title = {Themis: Automatically Testing Software for Discrimination}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {871--875}, doi = {10.1145/3236024.3264590}, year = {2018}, } Publisher's Version Video Info ESEC/FSE '18-NIER: "Software Fairness ..." Software Fairness Yuriy Brun and Alexandra Meliou (University of Massachusetts at Amherst, USA) A goal of software engineering research is advancing software quality and the success of the software engineering process. However, while recent studies have demonstrated a new kind of defect in software related to its ability to operate in a fair and unbiased manner, software engineering has not yet wholeheartedly tackled these new kinds of defects, thus leaving software vulnerable. This paper outlines a vision for how software engineering research can help reduce fairness defects and represents a call to action by the software engineering research community to reify that vision. Modern software is riddled with examples of biased behavior, from automated translation injecting gender stereotypes, to vision systems failing to see faces of certain races, to the US criminal justice system relying on biased computational assessments of crime recidivism. While systems may learn bias from biased data, bias can also emerge from ambiguous or incomplete requirement specification, poor design, implementation bugs, and unintended component interactions. We argue that software fairness is analogous to software quality, and that numerous software engineering challenges in the areas of requirements, specification, design, testing, and verification need to be tackled to solve this problem. @InProceedings{ESEC/FSE18p754, author = {Yuriy Brun and Alexandra Meliou}, title = {Software Fairness}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {754--759}, doi = {10.1145/3236024.3264838}, year = {2018}, } Publisher's Version Info |
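The sketch below illustrates the causal notion of discrimination that Themis measures: generate random inputs, flip only the sensitive attribute, and count how often the decision changes. The decision function, attributes, and thresholds are hypothetical, and the random generator stands in for Themis' optimized test-suite generation.

import random

random.seed(1)

def loan_decision(age, income, gender):
    """Hypothetical decision function under test (deliberately biased)."""
    score = 0.5 * income + 2 * age
    if gender == "female":          # injected bias for the demo
        score -= 40
    return score > 150

def causal_discrimination_rate(software, trials=10_000):
    """Fraction of random individuals whose outcome flips when only the
    sensitive attribute (gender) is changed -- a Themis-style measure."""
    flips = 0
    for _ in range(trials):
        age = random.randint(18, 80)
        income = random.uniform(0, 400)
        a = software(age, income, "male")
        b = software(age, income, "female")
        flips += a != b
    return flips / trials

print(f"causal discrimination w.r.t. gender: "
      f"{causal_discrimination_rate(loan_decision):.2%}")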
|
Brünink, Marc |
ESEC/FSE '18: "Verifying the Long-Run Behavior ..."
Verifying the Long-Run Behavior of Probabilistic System Models in the Presence of Uncertainty
Yamilet R. Serrano Llerena, Marcel Böhme, Marc Brünink, Guoxin Su, and David S. Rosenblum (National University of Singapore, Singapore; Monash University, Australia; University of Wollongong, Australia) Verifying that a stochastic system is in a certain state when it has reached equilibrium has important applications. For instance, the probabilistic verification of the long-run behavior of a safety-critical system enables assessors to check whether it accepts a human abort-command at any time with a probability that is sufficiently high. The stochastic system is represented as probabilistic model, a long-run property is asserted and a probabilistic verifier checks the model against the property. However, existing probabilistic verifiers do not account for the imprecision of the probabilistic parameters in the model. Due to uncertainty, the probability of any state transition may be subject to small perturbations which can have direct consequences for the veracity of the verification result. In reality, the safety-critical system may accept the abort-command with an insufficient probability. In this paper, we introduce the first probabilistic verification technique that accounts for uncertainty on the verification of long-run properties of a stochastic system. We present a mathematical framework for the asymptotic analysis of the stationary distribution of a discrete-time Markov chain, making no assumptions about the distribution of the perturbations. Concretely, our novel technique computes upper and lower bounds on the long-run probability, given a certain degree of uncertainty about the stochastic system. @InProceedings{ESEC/FSE18p587, author = {Yamilet R. Serrano Llerena and Marcel Böhme and Marc Brünink and Guoxin Su and David S. Rosenblum}, title = {Verifying the Long-Run Behavior of Probabilistic System Models in the Presence of Uncertainty}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {587--597}, doi = {10.1145/3236024.3236078}, year = {2018}, } Publisher's Version |
|
Bultan, Tevfik |
ESEC/FSE '18: "Parameterized Model Counting ..."
Parameterized Model Counting for String and Numeric Constraints
Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu (Microsoft, USA; University of California at Santa Barbara, USA; National Chengchi University, Taiwan) Recently, symbolic program analysis techniques have been extended to quantitative analyses using model counting constraint solvers. Given a constraint and a bound, a model counting constraint solver computes the number of solutions for the constraint within the bound. We present a parameterized model counting constraint solver for string and numeric constraints. We first construct a multi-track deterministic finite state automaton that accepts all solutions to the given constraint. We limit the numeric constraints to linear integer arithmetic, and for non-regular string constraints we over-approximate the solution set. Counting the number of accepting paths in the generated automaton solves the model counting problem. Our approach is parameterized in the sense that, we do not assume a finite domain size during automata construction, resulting in a potentially infinite set of solutions, and our model counting approach works for arbitrarily large bounds. We experimentally demonstrate the effectiveness of our approach on a large set of string and numeric constraints extracted from software applications. We experimentally compare our tool to five existing model counting constraint solvers for string and numeric constraints and demonstrate that our tool is as efficient and as or more precise than other solvers. Moreover, our tool can handle mixed constraints with string and integer variables that no other tool can. @InProceedings{ESEC/FSE18p400, author = {Abdulbaki Aydin and William Eiers and Lucas Bang and Tegan Brennan and Miroslav Gavrilov and Tevfik Bultan and Fang Yu}, title = {Parameterized Model Counting for String and Numeric Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {400--410}, doi = {10.1145/3236024.3236064}, year = {2018}, } Publisher's Version Artifacts Functional |
|
Busany, Nimrod |
ESEC/FSE '18: "Using Finite-State Models ..."
Using Finite-State Models for Log Differencing
Hen Amar, Lingfeng Bao, Nimrod Busany, David Lo, and Shahar Maoz (Tel Aviv University, Israel; Zhejiang University, China; Singapore Management University, Singapore) Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. Then, the difference between the logs is the main target of interest. In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails. We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study including 60 participants demonstrating the effectiveness of the approach in log differencing tasks. We have further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available in a prototype web application for experiments. @InProceedings{ESEC/FSE18p49, author = {Hen Amar and Lingfeng Bao and Nimrod Busany and David Lo and Shahar Maoz}, title = {Using Finite-State Models for Log Differencing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {49--59}, doi = {10.1145/3236024.3236069}, year = {2018}, } Publisher's Version |
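A tiny set-difference sketch of the 2KDiff idea: collect the length-k event sequences occurring in each log and report those present in one log but not the other. The traces and k are invented, and the finite-state (k-Tails) model construction that the paper builds on top of these sequences is omitted.

def k_grams(trace, k):
    """All length-k event sequences occurring in a single trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

def log_k_grams(log, k):
    return set().union(*(k_grams(t, k) for t in log))

# Two logs from different versions of the same system (toy data).
log_a = [["login", "query", "logout"],
         ["login", "query", "query", "logout"]]
log_b = [["login", "query", "error", "logout"],
         ["login", "logout"]]

k = 2
only_a = log_k_grams(log_a, k) - log_k_grams(log_b, k)
only_b = log_k_grams(log_b, k) - log_k_grams(log_a, k)

print("k-sequences only in log A:", sorted(only_a))
print("k-sequences only in log B:", sorted(only_b))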
|
Cai, Yan |
ESEC/FSE '18: "NAR-Miner: Discovering Negative ..."
NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection
Pan Bian, Bin Liang, Wenchang Shi, Jianjun Huang, and Yan Cai (Renmin University of China, China; Institute of Software at Chinese Academy of Sciences, China) Inferring programming rules from source code based on data mining techniques has been proven to be effective to detect software bugs. Existing studies focus on discovering positive rules in the form of A ⇒ B, indicating that when operation A appears, operation B should also be here. Unfortunately, the negative rules (A ⇒ ¬ B), indicating the mutual suppression or conflict relationships among program elements, have not gotten the attention they deserve. In fact, violating such negative rules can also result in serious bugs. In this paper, we propose a novel method called NAR-Miner to automatically extract negative association programming rules from large-scale systems, and detect their violations to find bugs. However, mining negative rules faces a more serious rule explosion problem than mining positive ones. Most of the obtained negative rules are uninteresting and can lead to unacceptable false alarms. To address the issue, we design a semantics-constrained mining algorithm to focus rule mining on the elements with strong semantic relationships. Furthermore, we introduce information entropy to rank candidate negative rules and highlight the interesting ones. Consequently, we effectively mitigate the rule explosion problem. We implement NAR-Miner and apply it to a Linux kernel (v4.12-rc6). The experiments show that the uninteresting rules are dramatically reduced and 17 detected violations have been confirmed as real bugs and patched by kernel community. We also apply NAR-Miner to PostgreSQL, OpenSSL and FFmpeg and discover six real bugs. @InProceedings{ESEC/FSE18p411, author = {Pan Bian and Bin Liang and Wenchang Shi and Jianjun Huang and Yan Cai}, title = {NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {411--422}, doi = {10.1145/3236024.3236032}, year = {2018}, } Publisher's Version |
|
Canino, Anthony |
ESEC/FSE '18: "Stochastic Energy Optimization ..."
Stochastic Energy Optimization for Mobile GPS Applications
Anthony Canino, Yu David Liu, and Hidehiko Masuhara (SUNY Binghamton, USA; Tokyo Institute of Technology, Japan) Mobile applications regularly interact with their noisy and ever-changing physical environment. The fundamentally uncertain nature of such interactions leads to significant challenges in energy optimization, a crucial goal of software engineering on mobile devices. This paper presents Aeneas, a novel energy optimization framework for Android in the presence of uncertainty. Aeneas provides a minimalistic programming model where acceptable program behavioral settings are abstracted as knobs and application-specific optimization goals — such as meeting an energy budget — are crystallized as rewards, both of which are directly programmable. At its heart, Aeneas is endowed with a stochastic optimizer to adaptively and intelligently select the reward-optimal knob setting through a form of reinforcement learning. We evaluate Aeneas on mobile GPS applications built over Google LocationService API. Through an in-field case study that covers approximately 6500 miles and 150 hours of driving as well as 20 hours of biking and hiking, we find that Aeneas can effectively and resiliently meet programmer-specified energy budgets in uncertain physical environments where individual GPS readings undergo significant fluctuation. Compared with non-stochastic approaches such as profile-guided optimization, Aeneas produces significantly more stable results across runs. @InProceedings{ESEC/FSE18p703, author = {Anthony Canino and Yu David Liu and Hidehiko Masuhara}, title = {Stochastic Energy Optimization for Mobile GPS Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {703--713}, doi = {10.1145/3236024.3236076}, year = {2018}, } Publisher's Version |
|
Castelluccio, Marco |
ESEC/FSE '18: "What Makes a Code Change Easier ..."
What Makes a Code Change Easier to Review: An Empirical Investigation on Code Change Reviewability
Achyudh Ram, Anand Ashok Sawant, Marco Castelluccio, and Alberto Bacchelli (University of Waterloo, Canada; Delft University of Technology, Netherlands; Mozilla, UK; Federico II University of Naples, Italy; University of Zurich, Switzerland) Peer code review is a practice widely adopted in software projects to improve the quality of code. In current code review practices, code changes are manually inspected by developers other than the author before these changes are integrated into a project or put into production. We conducted a study to obtain an empirical understanding of what makes a code change easier to review. To this end, we surveyed published academic literature and sources from gray literature (blogs and white papers), we interviewed ten professional developers, and we designed and deployed a reviewability evaluation tool that professional developers used to rate the reviewability of 98 changes. We find that reviewability is defined through several factors, such as the change description, size, and coherent commit history. We provide recommendations for practitioners and researchers. Public preprint [https://doi.org/10.5281/zenodo.1323659]; data and materials [https://doi.org/10.5281/zenodo.1323659]. @InProceedings{ESEC/FSE18p201, author = {Achyudh Ram and Anand Ashok Sawant and Marco Castelluccio and Alberto Bacchelli}, title = {What Makes a Code Change Easier to Review: An Empirical Investigation on Code Change Reviewability}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {201--212}, doi = {10.1145/3236024.3236080}, year = {2018}, } Publisher's Version |
|
Celik, Ahmet |
ESEC/FSE '18-IND: "Regression Test Selection ..."
Regression Test Selection for TizenRT
Ahmet Celik, Young Chul Lee, and Milos Gligoric (University of Texas at Austin, USA; Samsung Electronics, South Korea) Regression testing - running tests after code modifications - is widely practiced in industry, including at Samsung. Regression Test Selection (RTS) optimizes regression testing by skipping tests that are not affected by recent code changes. Recent work has developed robust RTS tools, which mostly target managed languages, e.g., Java and C#, and thus are not applicable to large C projects, e.g., TizenRT, a lightweight RTOS-based platform. We present Selfection, an RTS tool for projects written in C; we discuss the key challenges to develop Selfection and our design decisions. Selfection uses the objdump and readelf tools to statically build a dependency graph of functions from binaries and detect modified code elements. We integrated Selfection in TizenRT and evaluated its benefits if tests are run in an emulator and on a supported hardware platform (ARTIK 053). We used the latest 150 revisions of TizenRT available on GitHub. We measured the benefits of Selfection as the reduction in the number of tests and reduction in test execution time over running all tests at each revision (i.e., RetestAll). Our results show that Selfection can reduce, on average, the number of tests to 4.95% and end-to-end execution time to 7.04% when tests are executed in the emulator, and to 5.74% and 26.82% when tests are executed on the actual hardware. Our results also show that the time taken to maintain the dependency graph and detect modified functions is negligible. @InProceedings{ESEC/FSE18p845, author = {Ahmet Celik and Young Chul Lee and Milos Gligoric}, title = {Regression Test Selection for TizenRT}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {845--850}, doi = {10.1145/3236024.3275527}, year = {2018}, } Publisher's Version |
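The selection step can be pictured as reachability over a function dependency graph: a test is re-run only if it can transitively reach a modified function. The sketch below hard-codes a toy call graph and modified set instead of recovering them from binaries with objdump/readelf, so all names are hypothetical and this is only an illustration of the idea.

# Toy static call graph (caller -> callees), hand-written for illustration.
call_graph = {
    "test_wifi":   ["wifi_init", "wifi_send"],
    "test_fs":     ["fs_mount", "fs_read"],
    "wifi_send":   ["buf_alloc"],
    "fs_read":     ["buf_alloc"],
    "fs_mount":    [],
    "wifi_init":   [],
    "buf_alloc":   [],
}
tests = ["test_wifi", "test_fs"]
modified = {"buf_alloc"}          # functions whose bodies changed this revision

def reachable(entry, graph):
    """All functions transitively callable from `entry`."""
    seen, stack = set(), [entry]
    while stack:
        f = stack.pop()
        if f in seen:
            continue
        seen.add(f)
        stack.extend(graph.get(f, []))
    return seen

selected = [t for t in tests if reachable(t, call_graph) & modified]
print("tests to re-run:", selected)          # both tests reach buf_alloc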
|
Chen, Bihuan |
ESEC/FSE '18-DEMO: "FOT: A Versatile, Configurable, ..."
FOT: A Versatile, Configurable, Extensible Fuzzing Framework
Hongxu Chen, Yuekang Li, Bihuan Chen, Yinxing Xue, and Yang Liu (Nanyang Technological University, Singapore; Fudan University, China; University of Science and Technology of China, China) Greybox fuzzing is one of the most effective approaches for detecting software vulnerabilities. Various new techniques have been continuously emerging to enhance the effectiveness and/or efficiency by incorporating novel ideas into different components of a greybox fuzzer. However, there is no modularized fuzzing framework that can easily plug in new techniques and hence facilitate the reuse, integration, and comparison of different techniques. To address this problem, we propose a fuzzing framework, namely Fuzzing Orchestration Toolkit (FOT). FOT is designed to be versatile, configurable and extensible. With FOT and its extensions, we have found 111 new bugs from 11 projects. Among these bugs, 18 CVEs have been assigned. Video link: https://youtu.be/O6Qu7BJ8RP0. @InProceedings{ESEC/FSE18p867, author = {Hongxu Chen and Yuekang Li and Bihuan Chen and Yinxing Xue and Yang Liu}, title = {FOT: A Versatile, Configurable, Extensible Fuzzing Framework}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {867--870}, doi = {10.1145/3236024.3264593}, year = {2018}, } Publisher's Version Video Info |
|
Chen, Di |
ESEC/FSE '18: "Applications of Psychological ..."
Applications of Psychological Science for Actionable Analytics
Di Chen, Wei Fu, Rahul Krishna, and Tim Menzies (North Carolina State University, USA) According to psychological scientists, humans understand models that most match their own internal models, which they characterize as lists of "heuristics" (i.e., lists of very succinct rules). One such heuristic rule generator is the Fast-and-Frugal Trees (FFT) preferred by psychological scientists. Despite their successful use in many applied domains, FFTs have not been applied in software analytics. Accordingly, this paper assesses FFTs for software analytics. We find that FFTs are remarkably effective in that their models are very succinct (5 lines or less describing a binary decision tree) while also outperforming results from very recent, top-level conference papers. Also, when we restrict training data to operational attributes (i.e., those attributes that are frequently changed by developers), the performance of FFTs is not affected (while the performance of other learners can vary wildly). Our conclusions are two-fold. Firstly, there is much that the software analytics community could learn from psychological science. Secondly, proponents of complex methods should always baseline those methods against simpler alternatives. For example, FFTs could be used as a standard baseline learner against which other software analytics tools are compared. @InProceedings{ESEC/FSE18p456, author = {Di Chen and Wei Fu and Rahul Krishna and Tim Menzies}, title = {Applications of Psychological Science for Actionable Analytics}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {456--467}, doi = {10.1145/3236024.3236050}, year = {2018}, } Publisher's Version |
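For readers unfamiliar with fast-and-frugal trees, the sketch below shows how succinct such a model is: a short chain of binary cues, each of which either exits with a decision or defers to the next cue. The cues and thresholds are invented for illustration and are not the trees learned in the paper.

def fft_defect_predictor(metrics):
    """A fast-and-frugal tree: each cue either exits with a decision or
    defers to the next cue. Thresholds are made up for illustration."""
    if metrics["loc"] > 1000:            # cue 1: very large file
        return True                      #   -> predict defective, exit
    if metrics["churn"] < 5:             # cue 2: barely touched recently
        return False                     #   -> predict clean, exit
    if metrics["complexity"] > 20:       # cue 3: high cyclomatic complexity
        return True
    return metrics["developers"] > 3     # cue 4: final cue decides

for m in [{"loc": 1500, "churn": 2,  "complexity": 5,  "developers": 1},
          {"loc": 300,  "churn": 40, "complexity": 25, "developers": 2},
          {"loc": 200,  "churn": 10, "complexity": 8,  "developers": 5}]:
    print(m, "->", "defective" if fft_defect_predictor(m) else "clean")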
|
Chen, Dongjie |
ESEC/FSE '18: "Testing Multithreaded Programs ..."
Testing Multithreaded Programs via Thread Speed Control
Dongjie Chen, Yanyan Jiang, Chang Xu, Xiaoxing Ma, and Jian Lu (Nanjing University, China) A multithreaded program's interleaving space is discrete and astronomically large, making effectively sampling thread schedules for manifesting concurrency bugs a challenging task. Observing that concurrency bugs can be manifested by adjusting thread relative speeds, this paper presents the new concept of speed space in which each vector denotes a family of thread schedules. A multithreaded program's speed space is approximately continuous, easy-to-sample, and preserves certain categories of concurrency bugs. We discuss the design, implementation, and evaluation of our speed-controlled scheduler for exploring adversarial/abnormal schedules. The experimental results confirm that our technique is effective in sampling diverse schedules. Our implementation also found previously unknown concurrency bugs in real-world multithreaded programs. @InProceedings{ESEC/FSE18p15, author = {Dongjie Chen and Yanyan Jiang and Chang Xu and Xiaoxing Ma and Jian Lu}, title = {Testing Multithreaded Programs via Thread Speed Control}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {15--25}, doi = {10.1145/3236024.3236077}, year = {2018}, } Publisher's Version Info |
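A toy way to picture the speed space: give each thread of a fixed program a per-step delay and observe that different delay (speed) vectors yield different interleavings of the same code. The sketch below does exactly that with two workers appending to a shared log; it is an illustration of the idea, not the paper's speed-controlled scheduler.

import threading, time, random

def run_with_speeds(speeds, rounds=5):
    """Run two workers whose relative progress is throttled by `speeds`
    (delay per step, in seconds); return the observed interleaving."""
    log, lock = [], threading.Lock()

    def worker(name, delay):
        for i in range(rounds):
            time.sleep(delay)              # speed control: slow this thread down
            with lock:
                log.append(f"{name}{i}")

    ts = [threading.Thread(target=worker, args=(n, d))
          for n, d in zip("AB", speeds)]
    for t in ts: t.start()
    for t in ts: t.join()
    return log

random.seed(0)
# Sample a few points of the (approximately continuous) speed space and
# observe that different speed vectors schedule the same program differently.
for _ in range(3):
    speeds = [random.uniform(0.001, 0.01), random.uniform(0.001, 0.01)]
    print([round(s, 4) for s in speeds], "->", "".join(run_with_speeds(speeds)))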
|
Chen, Hongxu |
ESEC/FSE '18-DEMO: "FOT: A Versatile, Configurable, ..."
FOT: A Versatile, Configurable, Extensible Fuzzing Framework
Hongxu Chen, Yuekang Li, Bihuan Chen, Yinxing Xue, and Yang Liu (Nanyang Technological University, Singapore; Fudan University, China; University of Science and Technology of China, China) Greybox fuzzing is one of the most effective approaches for detecting software vulnerabilities. Various new techniques have been continuously emerging to enhance the effectiveness and/or efficiency by incorporating novel ideas into different components of a greybox fuzzer. However, there is no modularized fuzzing framework that can easily plug in new techniques and hence facilitate the reuse, integration, and comparison of different techniques. To address this problem, we propose a fuzzing framework, namely Fuzzing Orchestration Toolkit (FOT). FOT is designed to be versatile, configurable and extensible. With FOT and its extensions, we have found 111 new bugs from 11 projects. Among these bugs, 18 CVEs have been assigned. Video link: https://youtu.be/O6Qu7BJ8RP0. @InProceedings{ESEC/FSE18p867, author = {Hongxu Chen and Yuekang Li and Bihuan Chen and Yinxing Xue and Yang Liu}, title = {FOT: A Versatile, Configurable, Extensible Fuzzing Framework}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {867--870}, doi = {10.1145/3236024.3264593}, year = {2018}, } Publisher's Version Video Info |
|
Chen, Jia |
ESEC/FSE '18: "Singularity: Pattern Fuzzing ..."
Singularity: Pattern Fuzzing for Worst Case Complexity
Jiayi Wei, Jia Chen, Yu Feng, Kostas Ferles, and Isil Dillig (University of Texas at Austin, USA) We describe a new blackbox complexity testing technique for determining the worst-case asymptotic complexity of a given application. The key idea is to look for an input pattern, rather than a concrete input, that maximizes the asymptotic resource usage of the target program. Because input patterns can be described concisely as programs in a restricted language, our method transforms the complexity testing problem to optimal program synthesis. In particular, we express these input patterns using a new model of computation called Recurrent Computation Graph (RCG) and solve the optimal synthesis problem by developing a genetic programming algorithm that operates on RCGs. We have implemented the proposed ideas in a tool called Singularity and evaluate it on a diverse set of benchmarks. Our evaluation shows that Singularity can effectively discover the worst-case complexity of various algorithms and that it is more scalable compared to existing state-of-the-art techniques. Furthermore, our experiments also corroborate that Singularity can discover previously unknown performance bugs and availability vulnerabilities in real-world applications such as Google Guava and JGraphT. @InProceedings{ESEC/FSE18p213, author = {Jiayi Wei and Jia Chen and Yu Feng and Kostas Ferles and Isil Dillig}, title = {Singularity: Pattern Fuzzing for Worst Case Complexity}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {213--223}, doi = {10.1145/3236024.3236039}, year = {2018}, } Publisher's Version Results Replicated |
|
Chen, Junjie |
ESEC/FSE '18: "Optimizing Test Prioritization ..."
Optimizing Test Prioritization via Test Distribution Analysis
Junjie Chen, Yiling Lou, Lingming Zhang, Jianyi Zhou, Xiaoleng Wang, Dan Hao, and Lu Zhang (Peking University, China; University of Texas at Dallas, USA; Baidu, China) Test prioritization aims to detect regression faults faster via reordering test executions, and a large number of test prioritization techniques have been proposed accordingly. However, test prioritization effectiveness is usually measured in terms of the average percentage of faults detected with respect to the number of test executions, rather than the actual regression testing time, making it unclear which technique is optimal in actual regression testing time. To answer this question, this paper first conducts an empirical study to investigate the actual regression testing time of various prioritization techniques. The results reveal a number of practical guidelines. In particular, no prioritization technique can always perform optimally in practice. To achieve the optimal prioritization effectiveness for any given project in practice, based on the findings of this study, we design learning-based Predictive Test Prioritization (PTP). PTP predicts the optimal prioritization technique for a given project based on the test distribution analysis (i.e., the distribution of test coverage, testing time, and coverage per unit time). The results show that PTP correctly predicts the optimal prioritization technique for 46 out of 50 open-source projects from GitHub, outperforming state-of-the-art techniques significantly in regression testing time, e.g., 43.16% to 94.92% improvement in detecting the first regression fault. Furthermore, PTP has been successfully integrated into the practical testing infrastructure of Baidu (a search service provider with over 600M monthly active users), and received positive feedback from the testing team of this company, e.g., saving more than 2X testing costs with negligible overheads. @InProceedings{ESEC/FSE18p656, author = {Junjie Chen and Yiling Lou and Lingming Zhang and Jianyi Zhou and Xiaoleng Wang and Dan Hao and Lu Zhang}, title = {Optimizing Test Prioritization via Test Distribution Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {656--667}, doi = {10.1145/3236024.3236053}, year = {2018}, } Publisher's Version |
|
Chen, Quan |
ESEC/FSE '18-NIER: "DLFuzz: Differential Fuzzing ..."
DLFuzz: Differential Fuzzing Testing of Deep Learning Systems
Jianmin Guo, Yu Jiang, Yue Zhao, Quan Chen, and Jiaguang Sun (Tsinghua University, China; Shanghai Jiao Tong University, China) Deep learning (DL) systems are increasingly applied to safety-critical domains such as autonomous driving cars. It is of significant importance to ensure the reliability and robustness of DL systems. Existing testing methodologies always fail to include rare inputs in the testing dataset and exhibit low neuron coverage. In this paper, we propose DLFuzz, the first differential fuzzing testing framework to guide DL systems exposing incorrect behaviors. DLFuzz keeps minutely mutating the input to maximize the neuron coverage and the prediction difference between the original input and the mutated input, without manual labeling effort or cross-referencing oracles from other DL systems with the same functionality. We present empirical evaluations on two well-known datasets to demonstrate its efficiency. Compared with DeepXplore, the state-of-the-art DL whitebox testing framework, DLFuzz does not require extra efforts to find similar functional DL systems for cross-referencing check, but could generate 338.59% more adversarial inputs with 89.82% smaller perturbations, obtain 2.86% higher neuron coverage on average, and save 20.11% time consumption. @InProceedings{ESEC/FSE18p739, author = {Jianmin Guo and Yu Jiang and Yue Zhao and Quan Chen and Jiaguang Sun}, title = {DLFuzz: Differential Fuzzing Testing of Deep Learning Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {739--743}, doi = {10.1145/3236024.3264835}, year = {2018}, } Publisher's Version |
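The loop below sketches the shape of the objective: mutate an input and keep the mutant if it increases a weighted sum of prediction difference and newly activated "neurons". The model here is a trivial stub and the mutation is random rather than gradient-guided, so this is only a schematic of the search DLFuzz performs, with all names and weights invented.

import random

random.seed(42)

def model(x):
    """Stub 'DL system': returns (prediction scores, activated 'neurons')."""
    s = sum(x)
    scores = [s, 1 - s]                       # two-class toy scores
    neurons = {i for i, v in enumerate(x) if v > 0.5}
    return scores, neurons

def pred_diff(a, b):
    return abs(a[0] - b[0])                   # how far the mutant's output drifted

seed_input = [0.2, 0.4, 0.1, 0.3]
orig_scores, covered = model(seed_input)

best, best_obj = list(seed_input), 0.0
for _ in range(500):
    mutant = [min(1.0, max(0.0, v + random.uniform(-0.1, 0.1))) for v in best]
    scores, neurons = model(mutant)
    new_cov = len(neurons - covered)          # neuron-coverage gain
    obj = pred_diff(scores, orig_scores) + 0.5 * new_cov
    if obj > best_obj:                        # keep mutants that improve the objective
        best, best_obj = mutant, obj
        covered |= neurons

print("final objective:", round(best_obj, 3))
print("mutated input:  ", [round(v, 2) for v in best])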
|
Chen, Sen |
ESEC/FSE '18-IND: "Are Mobile Banking Apps Secure? ..."
Are Mobile Banking Apps Secure? What Can Be Improved?
Sen Chen, Ting Su, Lingling Fan, Guozhu Meng, Minhui Xue, Yang Liu, and Lihua Xu (East China Normal University, China; Nanyang Technological University, Singapore; Chinese Academy of Sciences, China; Macquarie University, Australia; New York University Shanghai, China) Mobile banking apps, as one of the most contemporary FinTechs, have been widely adopted by banking entities to provide instant financial services. However, our recent work discovered thousands of vulnerabilities in 693 banking apps, which indicates these apps are not as secure as we expected. This motivates us to conduct this study to understand their current security status. First, we take 6 months to track the reporting and patching procedure of these vulnerabilities. Second, we audit 4 state-of-the-art vulnerability detection tools on those patched vulnerabilities. Third, we discuss with 7 banking entities via in-person or online meetings and conduct an online survey to gain more feedback from financial app developers. Through this study, we reveal that (1) people may have inconsistent understandings of the vulnerabilities and different criteria for rating severity; (2) state-of-the-art tools are not effective in detecting vulnerabilities that the banking entities most concern; and (3) more effort should be invested in different aspects to secure banking apps. We believe our study can help bridge the existing gaps, and further motivate different parties, including banking entities, researchers and policy makers, to better tackle security issues altogether. @InProceedings{ESEC/FSE18p797, author = {Sen Chen and Ting Su and Lingling Fan and Guozhu Meng and Minhui Xue and Yang Liu and Lihua Xu}, title = {Are Mobile Banking Apps Secure? What Can Be Improved?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {797--802}, doi = {10.1145/3236024.3275523}, year = {2018}, } Publisher's Version |
|
Chen, Xiangping |
ESEC/FSE '18-NIER: "Salient-Class Location: Help ..."
Salient-Class Location: Help Developers Understand Code Change in Code Review
Yuan Huang, Nan Jia, Xiangping Chen, Kai Hong, and Zibin Zheng (Sun Yat-sen University, China; Hebei GEO University, China) Code review involves a significant amount of human effort to understand the code change, because the information required to inspect code changes may be distributed across multiple files that reviewers are not familiar with. Code changes are often organized as commits for review. In this paper, we found that most of the commits contain a salient class, which is saliently modified and causes the modification of the rest of the classes in a commit. Our user studies confirmed that identifying the salient class in a commit can facilitate reviewers in understanding code change. We model salient class identification as a binary classification problem and extract a number of discriminative features from commit to characterize the salience of a class. The initial experimental results show that the proposed approach can improve the efficiency of reviewers understanding code changes in code review. @InProceedings{ESEC/FSE18p770, author = {Yuan Huang and Nan Jia and Xiangping Chen and Kai Hong and Zibin Zheng}, title = {Salient-Class Location: Help Developers Understand Code Change in Code Review}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {770--774}, doi = {10.1145/3236024.3264841}, year = {2018}, } Publisher's Version |
|
Chen, Yuanliang |
ESEC/FSE '18-IND: "PAFL: Extend Fuzzing Optimizations ..."
PAFL: Extend Fuzzing Optimizations of Single Mode to Industrial Parallel Mode
Jie Liang, Yu Jiang, Yuanliang Chen, Mingzhe Wang, Chijin Zhou, and Jiaguang Sun (Tsinghua University, China) Researchers have proposed many optimizations to improve the efficiency of fuzzing, and most optimized strategies work very well on their targets when running in single mode, instantiating one fuzzer instance. However, in real industrial practice, most fuzzers run in parallel mode, instantiating multiple fuzzer instances, and those optimizations unfortunately fail to maintain the efficiency improvements. In this paper, we present PAFL, a framework that utilizes efficient guiding information synchronization and task division to extend those existing fuzzing optimizations of single mode to industrial parallel mode. With an additional data structure to store the guiding information, the synchronization ensures the information is shared and updated among different fuzzer instances timely. Then, the task division promotes the diversity of fuzzer instances by splitting the fuzzing task into several sub-tasks based on branch bitmap. We first evaluate PAFL using 12 different real-world programs from Google fuzzer-test-suite. Results show that in parallel mode, the two AFL improvers AFLFast and FairFuzz do not outperform AFL, which is different from the case in single mode. However, when augmented with PAFL, the performance of AFLFast and FairFuzz in parallel mode improves. They cover 8% and 17% more branches, and trigger 79% and 52% more unique crashes. For further evaluation on more widely-used software systems from GitHub, optimized fuzzers augmented with PAFL find more real bugs, and 25 of which are security-critical vulnerabilities registered as CVEs in the US National Vulnerability Database. @InProceedings{ESEC/FSE18p809, author = {Jie Liang and Yu Jiang and Yuanliang Chen and Mingzhe Wang and Chijin Zhou and Jiaguang Sun}, title = {PAFL: Extend Fuzzing Optimizations of Single Mode to Industrial Parallel Mode}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {809--814}, doi = {10.1145/3236024.3275525}, year = {2018}, } Publisher's Version |
|
Cheung, Alvin |
ESEC/FSE '18-DEMO: "PowerStation: Automatically ..."
PowerStation: Automatically Detecting and Fixing Inefficiencies of Database-Backed Web Applications in IDE
Junwen Yang, Cong Yan, Pranav Subramaniam, Shan Lu, and Alvin Cheung (University of Chicago, USA; University of Washington, USA) Modern web applications are built using a myriad of software components, and each of them exposes different programming models (e.g., application logic expressed in an imperative language, database queries expressed using declarative SQL). To improve programmer productivity, Object Relational Mapping (ORM) frameworks have been developed to allow developers to build web applications in an object-oriented manner. Despite such frameworks, prior work has found that developers still struggle in developing performant ORM-based web applications. This paper presents PowerStation, a RubyMine IDE plugin for optimizing web applications developed using the Ruby on Rails ORM. Using automated static analysis, PowerStation detects ORM-related inefficiency problems and suggests fixes to developers. Our evaluation using 12 real-world applications shows that PowerStation can automatically detect 1221 performance issues across them. A tutorial on using PowerStation can be found at https://youtu.be/rAV8CGuSj6k. @InProceedings{ESEC/FSE18p884, author = {Junwen Yang and Cong Yan and Pranav Subramaniam and Shan Lu and Alvin Cheung}, title = {PowerStation: Automatically Detecting and Fixing Inefficiencies of Database-Backed Web Applications in IDE}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {884--887}, doi = {10.1145/3236024.3264589}, year = {2018}, } Publisher's Version |
|
Cheung, Shing-Chi |
ESEC/FSE '18: "Do the Dependency Conflicts ..."
Do the Dependency Conflicts in My Project Matter?
Ying Wang, Ming Wen, Zhenwei Liu, Rongxin Wu, Rui Wang, Bo Yang, Hai Yu, Zhiliang Zhu, and Shing-Chi Cheung (Northeastern University, China; Hong Kong University of Science and Technology, China) Intensive dependencies of a Java project on third-party libraries can easily lead to the presence of multiple library or class versions on its classpath. When this happens, the JVM will load one version and shadow the others. Dependency conflict (DC) issues occur when the loaded version fails to cover a required feature (e.g., method) referenced by the project, thus causing runtime exceptions. However, the warnings of duplicate classes or libraries detected by existing build tools such as Maven can be benign since not all instances of duplication will induce runtime exceptions, and hence are often ignored by developers. In this paper, we conducted an empirical study on real-world DC issues collected from large open source projects. We studied the manifestation and fixing patterns of DC issues. Based on our findings, we designed Decca, an automated detection tool that assesses DC issues' severity and filters out the benign ones. Our evaluation results on 30 projects show that Decca achieves a precision of 0.923 and recall of 0.766 in detecting high-severity DC issues. Decca also detected new DC issues in these projects. Subsequently, 20 DC bug reports were filed, and 11 of them were confirmed by developers. Issues in 6 reports were fixed with our suggested patches. @InProceedings{ESEC/FSE18p319, author = {Ying Wang and Ming Wen and Zhenwei Liu and Rongxin Wu and Rui Wang and Bo Yang and Hai Yu and Zhiliang Zhu and Shing-Chi Cheung}, title = {Do the Dependency Conflicts in My Project Matter?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {319--330}, doi = {10.1145/3236024.3236056}, year = {2018}, } Publisher's Version Info Artifacts Reusable Artifacts Functional ESEC/FSE '18: "Which Generated Test Failures ..." Which Generated Test Failures Are Fault Revealing? Prioritizing Failures Based on Inferred Precondition Violations using PAF Mijung Kim, Shing-Chi Cheung, and Sunghun Kim (Hong Kong University of Science and Technology, China) Automated unit testing tools, such as Randoop, have been developed to produce failing tests as a means of finding faults. However, these tools often produce false alarms, so are not widely used in practice. The main reason for a false alarm is that the generated failing test violates an implicit precondition of the method under test, such as that a field should not be null at the entry of the method. This condition is not explicitly programmed or documented but implicitly assumed by developers. To address this limitation, we propose a technique called PAF to cluster generated test failures due to the same cause and reorder them based on their likelihood of violating an implicit precondition of the method under test. From various test executions, PAF observes their dataflows to the variables whose values are used when the program fails. Based on the dataflow similarity and where these values originated, PAF clusters failures and determines their likelihood of being fault revealing. We integrated PAF into Randoop. Our empirical results on open-source projects show that PAF effectively clusters fault revealing tests arising from the same fault and successfully prioritizes the fault-revealing ones. @InProceedings{ESEC/FSE18p679, author = {Mijung Kim and Shing-Chi Cheung and Sunghun Kim}, title = {Which Generated Test Failures Are Fault Revealing? Prioritizing Failures Based on Inferred Precondition Violations using PAF}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {679--690}, doi = {10.1145/3236024.3236058}, year = {2018}, } Publisher's Version |
|
Chintalapati, Murali |
ESEC/FSE '18: "Predicting Node Failure in ..."
Predicting Node Failure in Cloud Service Systems
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, and Dongmei Zhang (Microsoft Research, China; Microsoft, USA; University of Newcastle, Australia) In recent years, many traditional software systems have migrated to cloud computing platforms and are provided as online services. The service quality matters because system failures could seriously affect business and user experience. A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability. Predicting node failure in cloud service systems is challenging, because a node failure could be caused by a variety of reasons and reflected by many temporal and spatial signals. Furthermore, the failure data is highly imbalanced. To tackle these challenges, we propose MING, a novel technique that combines: 1) a LSTM model to incorporate the temporal data, 2) a Random Forest model to incorporate spatial data; 3) a ranking model that embeds the intermediate results of the two models as feature inputs and ranks the nodes by their failure-proneness, 4) a cost-sensitive function to identify the optimal threshold for selecting the faulty nodes. We evaluate our approach using real-world data collected from a cloud service system. The results confirm the effectiveness of the proposed approach. We have also successfully applied the proposed approach in real industrial practice. @InProceedings{ESEC/FSE18p480, author = {Qingwei Lin and Ken Hsieh and Yingnong Dang and Hongyu Zhang and Kaixin Sui and Yong Xu and Jian-Guang Lou and Chenggang Li and Youjiang Wu and Randolph Yao and Murali Chintalapati and Dongmei Zhang}, title = {Predicting Node Failure in Cloud Service Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {480--490}, doi = {10.1145/3236024.3236060}, year = {2018}, } Publisher's Version |
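The cost-sensitive thresholding step of MING (component 4 above) can be made concrete with a small sketch. This is not the authors' code: the scores, labels, and cost values are hypothetical, and the LSTM/Random Forest components are assumed to have already produced per-node failure probabilities.

```python
# Minimal sketch (not the authors' code): rank nodes by predicted
# failure-proneness and pick a cost-sensitive decision threshold.
# The probabilities, labels, and cost values below are made up.

def expected_cost(probs, labels, threshold, c_fp, c_fn):
    """Cost of flagging nodes with prob >= threshold, on labeled validation data."""
    cost = 0.0
    for p, faulty in zip(probs, labels):
        flagged = p >= threshold
        if flagged and not faulty:
            cost += c_fp          # needless VM migration off a healthy node
        elif not flagged and faulty:
            cost += c_fn          # missed a node that later failed
    return cost

def pick_threshold(probs, labels, c_fp=1.0, c_fn=10.0):
    """Scan candidate thresholds and keep the one with minimal expected cost."""
    candidates = sorted(set(probs))
    return min(candidates, key=lambda t: expected_cost(probs, labels, t, c_fp, c_fn))

# Hypothetical validation data: model scores and whether the node actually failed.
scores = [0.91, 0.75, 0.40, 0.22, 0.88, 0.10, 0.66]
failed = [True, True, False, False, True, False, False]

t = pick_threshold(scores, failed)
ranking = sorted(zip(scores, failed), reverse=True)   # nodes ranked by failure-proneness
print("chosen threshold:", t)
print("flagged nodes:", [s for s, _ in ranking if s >= t])
```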
|
Chua, Hendy |
ESEC/FSE '18-IND: "Efficient Static Checking ..."
Efficient Static Checking of Library Updates
Darius Foo, Hendy Chua, Jason Yeo, Ming Yi Ang, and Asankhaya Sharma (CA Technologies, Singapore) Software engineering practices have evolved to the point where a developer writing a new application today doesn’t start from scratch, but reuses a number of open source libraries and components. These third-party libraries evolve independently of the applications in which they are used, and may not maintain stable interfaces as bugs and vulnerabilities in them are fixed. This in turn causes API incompatibilities in downstream applications which must be manually resolved. Oversight here may manifest in many ways, from test failures to crashes at runtime. To address this problem, we present a static analysis for automatically and efficiently checking if a library upgrade introduces an API incompatibility. Our analysis does not rely on reported version information from library developers, and instead computes the actual differences between methods in libraries across different versions. The analysis is scalable, enabling real-time diff queries involving arbitrary pairs of library versions. It supports a vulnerability remediation product which suggests library upgrades automatically and is lightweight enough to be part of a continuous integration/delivery (CI/CD) pipeline. To evaluate the effectiveness of our approach, we determine semantic versioning adherence of a corpus of open source libraries taken from Maven Central, PyPI, and RubyGems. We find that on average, 26% of library versions are in violation of semantic versioning. We also analyze a collection of popular open source projects from GitHub to determine if we can automatically update libraries in them without causing API incompatibilities. Our results indicate that we can suggest upgrades automatically for 10% of the libraries. @InProceedings{ESEC/FSE18p791, author = {Darius Foo and Hendy Chua and Jason Yeo and Ming Yi Ang and Asankhaya Sharma}, title = {Efficient Static Checking of Library Updates}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {791--796}, doi = {10.1145/3236024.3275535}, year = {2018}, } Publisher's Version |
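The semantic-versioning adherence check described above can be illustrated with a minimal sketch. The version numbers and API symbol sets below are hypothetical; a real tool would compare extracted method signatures across library versions rather than bare names.

```python
# Minimal sketch (assumed data, not the tool described above): flag a
# semantic-versioning violation when a release removes API symbols
# without bumping the major version.

def parse_semver(v):
    major, minor, patch = (int(x) for x in v.split("."))
    return major, minor, patch

def violates_semver(old_version, old_api, new_version, new_api):
    """API removals are only allowed when the major version increases."""
    removed = old_api - new_api
    old_major = parse_semver(old_version)[0]
    new_major = parse_semver(new_version)[0]
    return bool(removed) and new_major <= old_major, removed

# Hypothetical method sets extracted from two versions of a library.
api_1_4_0 = {"connect", "send", "close", "set_timeout"}
api_1_5_0 = {"connect", "send", "close"}            # set_timeout silently dropped

broken, removed = violates_semver("1.4.0", api_1_4_0, "1.5.0", api_1_5_0)
print("semver violation:", broken, "removed symbols:", removed)
```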
|
Cimatti, Alessandro |
ESEC/FSE '18: "Symbolic Execution with Existential ..."
Symbolic Execution with Existential Second-Order Constraints
Sergey Mechtaev, Alberto Griggio, Alessandro Cimatti, and Abhik Roychoudhury (National University of Singapore, Singapore; Fondazione Bruno Kessler, Italy) Symbolic execution systematically explores program paths by solving path conditions --- formulas over symbolic variables. Typically, the symbolic variables range over numbers, arrays and strings. We introduce symbolic execution with existential second-order constraints --- an extension of traditional symbolic execution that allows symbolic variables to range over functions whose interpretations are restricted by a user-defined language. The aims of this new technique are twofold. First, it offers a general analysis framework that can be applied in multiple domains such as program repair and library modelling. Secondly, it addresses the path explosion problem of traditional first-order symbolic execution in certain applications. To realize this technique, we integrate symbolic execution with program synthesis. Specifically, we propose a method of second-order constraint solving that provides efficient proofs of unsatisfiability, which is critical for the performance of symbolic execution. Our evaluation shows that the proposed technique (1) helps to repair programs with loops by mitigating the path explosion, (2) can enable analysis of applications written against unavailable libraries by modelling these libraries from the usage context. @InProceedings{ESEC/FSE18p389, author = {Sergey Mechtaev and Alberto Griggio and Alessandro Cimatti and Abhik Roychoudhury}, title = {Symbolic Execution with Existential Second-Order Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {389--399}, doi = {10.1145/3236024.3236049}, year = {2018}, } Publisher's Version |
|
Coghlan, Christy A. |
ESEC/FSE '18: "The Impact of Regular Expression ..."
The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale
James C. Davis, Christy A. Coghlan, Francisco Servant, and Dongyoon Lee (Virginia Tech, USA) Regular expressions (regexes) are a popular and powerful means of automatically manipulating text. Regexes are also an understudied denial of service vector (ReDoS). If a regex has super-linear worst-case complexity, an attacker may be able to trigger this complexity, exhausting the victim’s CPU resources and causing denial of service. Existing research has shown how to detect these superlinear regexes, and practitioners have identified super-linear regex anti-pattern heuristics that may lead to such complexity. In this paper, we empirically study three major aspects of ReDoS that have hitherto been unexplored: the incidence of super-linear regexes in practice, how they can be prevented, and how they can be repaired. In the ecosystems of two of the most popular programming languages — JavaScript and Python – we detected thousands of super-linear regexes affecting over 10,000 modules across diverse application domains. We also found that the conventional wisdom for super-linear regex anti-patterns has few false negatives but many false positives; these anti-patterns appear to be necessary, but not sufficient, signals of super-linear behavior. Finally, we found that when faced with a super-linear regex, developers favor revising it over truncating input or developing a custom parser, regardless of whether they had been shown examples of all three fix strategies. These findings motivate further research into ReDoS, since many modules are vulnerable to it and existing mechanisms to avoid it are insufficient. We believe that ReDoS vulnerabilities are a larger threat in practice than might have been guessed. @InProceedings{ESEC/FSE18p246, author = {James C. Davis and Christy A. Coghlan and Francisco Servant and Dongyoon Lee}, title = {The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {246--256}, doi = {10.1145/3236024.3236027}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
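For readers unfamiliar with ReDoS, the following self-contained snippet (not drawn from the paper's corpus) shows how a regex with a nested quantifier exhibits super-linear, here exponential, matching time on a near-miss input, and how a linear-time rewrite of the same language avoids the problem.

```python
# Illustrative ReDoS demonstration (not from the paper's corpus). The
# nested quantifier in ^(a+)+$ causes exponential backtracking in
# Python's backtracking regex engine when the input almost matches but
# is rejected by the trailing "!".

import re
import time

SUPER_LINEAR = re.compile(r"^(a+)+$")

for n in (16, 18, 20, 22):
    attack = "a" * n + "!"          # near-miss input that forces backtracking
    start = time.perf_counter()
    SUPER_LINEAR.match(attack)      # always returns None, but slowly
    elapsed = time.perf_counter() - start
    print(f"n={n:2d}  match time: {elapsed:.3f}s")   # grows exponentially with n

# A linear-time rewrite accepting the same language (one or more 'a'):
SAFE = re.compile(r"^a+$")
```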
|
Cohen, Myra B. |
ESEC/FSE '18-IND: "Modeling and Testing a Family ..."
Modeling and Testing a Family of Surgical Robots: An Experience Report
Niloofar Mansoor, Jonathan A. Saddler, Bruno Silva, Hamid Bagheri, Myra B. Cohen, and Shane Farritor (University of Nebraska-Lincoln, USA) Safety-critical applications often use dependability cases to validate that specified properties are invariant, or to demonstrate a counter example showing how that property might be violated. However, most dependability cases are written with a single product in mind. At the same time, software product lines (families of related software products) have been studied with the goal of modeling variability and commonality, and building family based techniques for both analysis and testing. However, there has been little work on building an end to end dependability case for a software product line (where a property is modeled, a counter example is found and then validated as a true positive via testing), and none that we know of in an emerging safety-critical domain, that of robotic surgery. In this paper, we study a family of surgical robots, that combine hardware and software, and are highly configurable, representing over 1300 unique robots. At the same time, they are considered safety-critical and should have associated dependability cases. We perform a case study to understand how we can bring together lightweight formal analysis, feature modeling, and testing to provide an end to end pipeline to find potential violations of important safety properties. In the process, we learned that there are some interesting and open challenges for the research community, which if solved will lead towards more dependable safety-critical cyber-physical systems. @InProceedings{ESEC/FSE18p785, author = {Niloofar Mansoor and Jonathan A. Saddler and Bruno Silva and Hamid Bagheri and Myra B. Cohen and Shane Farritor}, title = {Modeling and Testing a Family of Surgical Robots: An Experience Report}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {785--790}, doi = {10.1145/3236024.3275534}, year = {2018}, } Publisher's Version |
|
Cordeiro, Lucas C. |
ESEC/FSE '18-NIER: "Towards Counterexample-Guided ..."
Towards Counterexample-Guided k-Induction for Fast Bug Detection
Mikhail R. Gadelha, Felipe R. Monteiro, Lucas C. Cordeiro, and Denis A. Nicole (University of Southampton, UK; Federal University of Amazonas, Brazil; University of Manchester, UK) Recently, the k-induction algorithm has proven to be a successful approach for both finding bugs and proving correctness. However, since the algorithm is an incremental approach, it might waste resources trying to prove incorrect programs. In this paper, we extend the k-induction algorithm to shorten the number of steps required to find a property violation. We convert the algorithm into a meet-in-the-middle bidirectional search algorithm, using the counterexample produced from over-approximating the program. The main advantage is in the reduction of the state explosion by reducing the maximum required steps from k to ⌊k/2 + 1⌋. @InProceedings{ESEC/FSE18p765, author = {Mikhail R. Gadelha and Felipe R. Monteiro and Lucas C. Cordeiro and Denis A. Nicole}, title = {Towards Counterexample-Guided k-Induction for Fast Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {765--769}, doi = {10.1145/3236024.3264840}, year = {2018}, } Publisher's Version Video Info |
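As background for the extension proposed above, a plain (unidirectional) k-induction loop can be sketched over a toy transition system using the z3 SMT solver. The counter system and property are made up, and the paper's bidirectional, counterexample-guided search is not reproduced here.

```python
# Minimal k-induction sketch over a toy transition system, using the z3
# SMT solver (pip install z3-solver). This shows the plain algorithm the
# paper extends, not the bidirectional variant.

from z3 import Int, Solver, And, Not, unsat, sat

def init(s):        # initial states: x = 0
    return s == 0

def trans(s, s2):   # transition relation: x' = x + 1
    return s2 == s + 1

def prop(s):        # safety property to prove: x >= 0
    return s >= 0

def k_induction(max_k=10):
    states = [Int(f"x{i}") for i in range(max_k + 1)]
    for k in range(1, max_k + 1):
        # Base case: no property violation is reachable within k steps.
        base = Solver()
        base.add(init(states[0]))
        base.add(*[trans(states[i], states[i + 1]) for i in range(k)])
        base.add(Not(And(*[prop(states[i]) for i in range(k + 1)])))
        if base.check() == sat:
            return f"counterexample within {k} steps: {base.model()}"
        # Inductive step: k consecutive good states force a (k+1)-th good state.
        step = Solver()
        step.add(*[prop(states[i]) for i in range(k)])
        step.add(*[trans(states[i], states[i + 1]) for i in range(k)])
        step.add(Not(prop(states[k])))
        if step.check() == unsat:
            return f"property proved by {k}-induction"
    return "inconclusive up to max_k"

print(k_induction())
```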
|
Da Costa, Daniel Alencar |
ESEC/FSE '18: "Winning the App Production ..."
Winning the App Production Rally
Ehsan Noei, Daniel Alencar Da Costa, and Ying Zou (Queen's University, Canada) When a user looks for an Android app in Google Play Store, a number of apps appear in a specific rank. Mobile apps with higher ranks are more likely to be noticed and downloaded by users. The goal of this work is to understand the evolution of ranks and identify the variables that share a strong relationship with ranks. We explore 900 apps with a total of 4,878,011 user-reviews in 30 app development areas. We discover 13 clusters of rank trends. We observe that the majority of the subject apps (i.e., 61%) dropped in the rankings over the two years of our study. By applying a regression model, we find the variables that statistically significantly explain the rank trends, such as the number of releases. Moreover, we build a mixed effects model to study the changes in ranks across apps and various versions of each app. We find that not all the variables that common-wisdom would deem important have a significant relationship with ranks. Furthermore, app developers should not be afraid of a late entry into the market as new apps can achieve higher ranks than existing apps. Finally, we present the findings to 51 developers. According to the feedback, the findings can help app developers to achieve better ranks in Google Play Store. @InProceedings{ESEC/FSE18p283, author = {Ehsan Noei and Daniel Alencar Da Costa and Ying Zou}, title = {Winning the App Production Rally}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {283--294}, doi = {10.1145/3236024.3236044}, year = {2018}, } Publisher's Version |
|
Daian, Philip |
ESEC/FSE '18-DEMO: "A Formal Verification Tool ..."
A Formal Verification Tool for Ethereum VM Bytecode
Daejun Park, Yi Zhang, Manasvi Saxena, Philip Daian, and Grigore Roşu (University of Illinois at Urbana-Champaign, USA; Runtime Verification, USA; Cornell Tech, USA) In this paper, we present a formal verification tool for the Ethereum Virtual Machine (EVM) bytecode. To precisely reason about all possible behaviors of the EVM bytecode, we adopted KEVM, a complete formal semantics of the EVM, and instantiated the K-framework's reachability logic theorem prover to generate a correct-by-construction deductive verifier for the EVM. We further optimized the verifier by introducing EVM-specific abstractions and lemmas to improve its scalability. Our EVM verifier has been used to verify various high-profile smart contracts including the ERC20 token, Ethereum Casper, and DappHub MakerDAO contracts. @InProceedings{ESEC/FSE18p912, author = {Daejun Park and Yi Zhang and Manasvi Saxena and Philip Daian and Grigore Roşu}, title = {A Formal Verification Tool for Ethereum VM Bytecode}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {912--915}, doi = {10.1145/3236024.3264591}, year = {2018}, } Publisher's Version Video Info |
|
Dang, Yingnong |
ESEC/FSE '18: "Predicting Node Failure in ..."
Predicting Node Failure in Cloud Service Systems
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, and Dongmei Zhang (Microsoft Research, China; Microsoft, USA; University of Newcastle, Australia) In recent years, many traditional software systems have migrated to cloud computing platforms and are provided as online services. The service quality matters because system failures could seriously affect business and user experience. A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability. Predicting node failure in cloud service systems is challenging, because a node failure could be caused by a variety of reasons and reflected by many temporal and spatial signals. Furthermore, the failure data is highly imbalanced. To tackle these challenges, we propose MING, a novel technique that combines: 1) a LSTM model to incorporate the temporal data, 2) a Random Forest model to incorporate spatial data; 3) a ranking model that embeds the intermediate results of the two models as feature inputs and ranks the nodes by their failure-proneness, 4) a cost-sensitive function to identify the optimal threshold for selecting the faulty nodes. We evaluate our approach using real-world data collected from a cloud service system. The results confirm the effectiveness of the proposed approach. We have also successfully applied the proposed approach in real industrial practice. @InProceedings{ESEC/FSE18p480, author = {Qingwei Lin and Ken Hsieh and Yingnong Dang and Hongyu Zhang and Kaixin Sui and Yong Xu and Jian-Guang Lou and Chenggang Li and Youjiang Wu and Randolph Yao and Murali Chintalapati and Dongmei Zhang}, title = {Predicting Node Failure in Cloud Service Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {480--490}, doi = {10.1145/3236024.3236060}, year = {2018}, } Publisher's Version |
|
Dash, Santanu Kumar |
ESEC/FSE '18: "RefiNym: Using Names to Refine ..."
RefiNym: Using Names to Refine Types
Santanu Kumar Dash, Miltiadis Allamanis, and Earl T. Barr (University College London, UK; Microsoft Research, UK) Source code is bimodal: it combines a formal, algorithmic channel and a natural language channel of identifiers and comments. In this work, we model the bimodality of code with name flows, an assignment flow graph augmented to track identifier names. Conceptual types are logically distinct types that do not always coincide with program types. Passwords and URLs are example conceptual types that can share the program type string. Our tool, RefiNym, is an unsupervised method that mines a lattice of conceptual types from name flows and reifies them into distinct nominal types. For string, RefiNym finds and splits conceptual types originally merged into a single type, reducing the number of same-type variables per scope from 8.7 to 2.2 while eliminating 21.9% of scopes that have more than one same-type variable in scope. This makes the code more self-documenting and frees the type system to prevent a developer from inadvertently assigning data across conceptual types. @InProceedings{ESEC/FSE18p107, author = {Santanu Kumar Dash and Miltiadis Allamanis and Earl T. Barr}, title = {RefiNym: Using Names to Refine Types}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {107--117}, doi = {10.1145/3236024.3236042}, year = {2018}, } Publisher's Version |
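Although RefiNym targets C# code, the end result of reifying conceptual types into distinct nominal types can be illustrated in a few lines; the Password/Url names below are hypothetical examples, not output of the tool.

```python
# Illustrative sketch of the idea (the paper's tool targets C#): two
# conceptual types that share the program type `str` are reified into
# distinct nominal types, so a static checker such as mypy can catch
# accidental cross-assignments. Names below are hypothetical.

from typing import NewType

Password = NewType("Password", str)
Url = NewType("Url", str)

def login(endpoint: Url, secret: Password) -> None:
    print(f"POST {endpoint} with a {len(secret)}-character password")

endpoint = Url("https://example.com/login")
secret = Password("hunter2")

login(endpoint, secret)        # fine
# login(secret, endpoint)      # same runtime type, but mypy rejects the swap
```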
|
Davis, James C. |
ESEC/FSE '18: "The Impact of Regular Expression ..."
The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale
James C. Davis, Christy A. Coghlan, Francisco Servant, and Dongyoon Lee (Virginia Tech, USA) Regular expressions (regexes) are a popular and powerful means of automatically manipulating text. Regexes are also an understudied denial of service vector (ReDoS). If a regex has super-linear worst-case complexity, an attacker may be able to trigger this complexity, exhausting the victim’s CPU resources and causing denial of service. Existing research has shown how to detect these superlinear regexes, and practitioners have identified super-linear regex anti-pattern heuristics that may lead to such complexity. In this paper, we empirically study three major aspects of ReDoS that have hitherto been unexplored: the incidence of super-linear regexes in practice, how they can be prevented, and how they can be repaired. In the ecosystems of two of the most popular programming languages — JavaScript and Python – we detected thousands of super-linear regexes affecting over 10,000 modules across diverse application domains. We also found that the conventional wisdom for super-linear regex anti-patterns has few false negatives but many false positives; these anti-patterns appear to be necessary, but not sufficient, signals of super-linear behavior. Finally, we found that when faced with a super-linear regex, developers favor revising it over truncating input or developing a custom parser, regardless of whether they had been shown examples of all three fix strategies. These findings motivate further research into ReDoS, since many modules are vulnerable to it and existing mechanisms to avoid it are insufficient. We believe that ReDoS vulnerabilities are a larger threat in practice than might have been guessed. @InProceedings{ESEC/FSE18p246, author = {James C. Davis and Christy A. Coghlan and Francisco Servant and Dongyoon Lee}, title = {The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {246--256}, doi = {10.1145/3236024.3236027}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Debroy, Vidroha |
ESEC/FSE '18-IND: "Building Lean Continuous Integration ..."
Building Lean Continuous Integration and Delivery Pipelines by Applying DevOps Principles: A Case Study at Varidesk
Vidroha Debroy, Senecca Miller, and Lance Brimble (Varidesk, USA) Continuous Integration (CI) and Continuous Delivery (CD) are widely considered to be best practices in software development. Studies have shown however, that adopting these practices can be challenging and there are many barriers that engineers may face, such as – overly long build times, lack of support for desired workflows, issues with configuration, etc. At Varidesk, we recently began shifting our primary web application (from a monolithic) to a micro-services-based architecture and also adapted our software development practices to aim for more effective CI/CD. In doing so, we also ran into some of the same afore-mentioned barriers. In this paper we focus on two specific challenges that we faced – long wait times for builds/releases to be queued and completed, and the lack of support for tooling, especially from a cross-cloud perspective. We then present the solutions that we came up with, which involved re-thinking DevOps as it applied to us, and re-building our own CI/CD pipelines based on DevOps-supporting approaches such as containerization, infrastructure-as-code, and orchestration. Our re-designed pipelines have led us to see speed increases, in terms of total build/release time, in the range of 330x-1110x and have enabled us to seamlessly move from a single-cloud to a multi- cloud environment, with no architectural changes to any apps. @InProceedings{ESEC/FSE18p851, author = {Vidroha Debroy and Senecca Miller and Lance Brimble}, title = {Building Lean Continuous Integration and Delivery Pipelines by Applying DevOps Principles: A Case Study at Varidesk}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {851--856}, doi = {10.1145/3236024.3275528}, year = {2018}, } Publisher's Version |
|
De Caso, Guido |
ESEC/FSE '18-IND: "Testing and Validating End ..."
Testing and Validating End User Programmed Calculated Fields
Víctor Braberman, Diego Garbervetsky, Javier Godoy, Sebastian Uchitel, Guido de Caso, Ignacio Perez, and Santiago Perez (University of Buenos Aires, Argentina; CONICET, Argentina; Imperial College London, UK; Medallia, USA) This paper reports on an approach for systematically generating test data from production databases for end user programmed calculated fields via a novel combination of symbolic execution and database queries. We also discuss the opportunities and challenges that this specific domain poses for symbolic execution and show how database queries can help complement some of symbolic execution's weaknesses, namely in the treatment of loops and also of path conditions that exceed SMT solver capabilities. @InProceedings{ESEC/FSE18p827, author = {Víctor Braberman and Diego Garbervetsky and Javier Godoy and Sebastian Uchitel and Guido de Caso and Ignacio Perez and Santiago Perez}, title = {Testing and Validating End User Programmed Calculated Fields}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {827--832}, doi = {10.1145/3236024.3275531}, year = {2018}, } Publisher's Version |
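The key idea of answering a path condition with a database query rather than an SMT call can be sketched as follows; the schema, data, and path condition are invented for illustration and are not Medallia's.

```python
# Minimal sketch of discharging a path condition with a database query
# instead of an SMT solver (schema and data are hypothetical). A row
# returned by the query is concrete test data that drives the
# calculated field down the corresponding path.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER, score INTEGER, region TEXT)")
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)",
                 [(1, 9, "EU"), (2, 4, "US"), (3, 10, "US")])

# Path condition collected symbolically from the calculated field,
# e.g. the branch taken when: score > 8 AND region = 'US'
path_condition_sql = "score > 8 AND region = ?"

witness = conn.execute(
    f"SELECT * FROM responses WHERE {path_condition_sql} LIMIT 1", ("US",)
).fetchone()

print("test input covering this path:", witness)   # -> (3, 10, 'US')
```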
|
DeFreez, Daniel |
ESEC/FSE '18: "Path-Based Function Embedding ..."
Path-Based Function Embedding and Its Application to Error-Handling Specification Mining
Daniel DeFreez, Aditya V. Thakur, and Cindy Rubio-González (University of California at Davis, USA) Identifying relationships among program elements is useful for program understanding, debugging, and analysis. One such kind of relationship is synonymy. Function synonyms are functions that play a similar role in code; examples include functions that perform initialization for different device drivers, and functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents Func2vec, a technique that learns an embedding mapping each function to a vector in a continuous vector space such that vectors for function synonyms are in close proximity. We compute the function embedding by training a neural network on sentences generated using random walks over the interprocedural control-flow graph. We show the effectiveness of Func2vec at identifying function synonyms in the Linux kernel. Finally, we apply Func2vec to the problem of mining error-handling specifications in Linux file systems and drivers. We show that the function synonyms identified by Func2vec result in error-handling specifications with high support. @InProceedings{ESEC/FSE18p423, author = {Daniel DeFreez and Aditya V. Thakur and Cindy Rubio-González}, title = {Path-Based Function Embedding and Its Application to Error-Handling Specification Mining}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {423--433}, doi = {10.1145/3236024.3236059}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional ESEC/FSE '18-SRC: "Mining Error-Handling Specifications ..." Mining Error-Handling Specifications for Systems Software Daniel DeFreez (University of California at Davis, USA) This paper presents a technique for mining error-handling specifications from systems software. It presents a static analysis for detecting error handlers in low-level code, and it shows how function synonyms can be used to mine for error-handling specifications with only a few supporting examples. @InProceedings{ESEC/FSE18p983, author = {Daniel DeFreez}, title = {Mining Error-Handling Specifications for Systems Software}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {983--985}, doi = {10.1145/3236024.3275440}, year = {2018}, } Publisher's Version |
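The sentence-generation step behind Func2vec can be illustrated with a toy example: random walks over a small labeled flow graph yield token sequences that a word-embedding model such as word2vec would then be trained on. The graph, labels, and walk parameters below are made up.

```python
# Toy sketch of the sentence-generation step: random walks over a
# labeled (interprocedural control-flow-like) graph produce token
# sequences for an off-the-shelf word-embedding model. The graph is made up.

import random

GRAPH = {
    "driver_init":   ["alloc_buf", "register_dev"],
    "alloc_buf":     ["check_null"],
    "register_dev":  ["check_err"],
    "check_null":    ["handle_error", "driver_ready"],
    "check_err":     ["handle_error", "driver_ready"],
    "handle_error":  [],
    "driver_ready":  [],
}

def random_walk(start, length, rng):
    walk = [start]
    while len(walk) < length and GRAPH[walk[-1]]:
        walk.append(rng.choice(GRAPH[walk[-1]]))
    return walk

rng = random.Random(0)
sentences = [random_walk("driver_init", 5, rng) for _ in range(4)]
for s in sentences:
    print(" ".join(s))
# These "sentences" would be fed to a skip-gram model so that functions
# playing similar roles (e.g. the two error checks) end up with nearby vectors.
```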
|
Devanbu, Premkumar T. |
ESEC/FSE '18-NIER: "On the Naturalness of Proofs ..."
On the Naturalness of Proofs
Vincent J. Hellendoorn, Premkumar T. Devanbu, and Mohammad Amin Alipour (University of California at Davis, USA; University of Houston, USA) Proofs play a key role in reasoning about programs and verification of properties of systems. Mechanized proof assistants help users in developing and checking the consistency of proofs using the proof language developed by the systems; but even then writing proofs is tedious and could benefit from automated insight. In this paper, we analyze proofs in two different proof assistant systems (Coq and HOL Light) to investigate if there is evidence of "naturalness" in these proofs: viz., recurring linguistic patterns that are amenable to language models, in the way that programming languages are known to be. Such models could be used to find errors, rewrite proofs, help suggest dependencies, and perhaps even synthesize (steps of) proofs. We apply state-of-the-art language models to large corpora of proofs to show that this is indeed the case: proofs are remarkably predictable, much like other programming languages. Code completion tools for Coq proofs could save over 60% of typing effort. As proofs have become increasingly central to writing provably correct, large programs (such as the CompCert C compiler), our demonstration that they are amenable to general statistical models unlocks a range of linguistics-inspired tool support. @InProceedings{ESEC/FSE18p724, author = {Vincent J. Hellendoorn and Premkumar T. Devanbu and Mohammad Amin Alipour}, title = {On the Naturalness of Proofs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {724--728}, doi = {10.1145/3236024.3264832}, year = {2018}, } Publisher's Version |
|
Diehl, Stephan |
ESEC/FSE '18: "Towards a Theory of Software ..."
Towards a Theory of Software Development Expertise
Sebastian Baltes and Stephan Diehl (University of Trier, Germany) Software development includes diverse tasks such as implementing new features, analyzing requirements, and fixing bugs. Being an expert in those tasks requires a certain set of skills, knowledge, and experience. Several studies investigated individual aspects of software development expertise, but what is missing is a comprehensive theory. We present a first conceptual theory of software development expertise that is grounded in data from a mixed-methods survey with 335 software developers and in literature on expertise and expert performance. Our theory currently focuses on programming, but already provides valuable insights for researchers, developers, and employers. The theory describes important properties of software development expertise and which factors foster or hinder its formation, including how developers' performance may decline over time. Moreover, our quantitative results show that developers' expertise self-assessments are context-dependent and that experience is not necessarily related to expertise. @InProceedings{ESEC/FSE18p187, author = {Sebastian Baltes and Stephan Diehl}, title = {Towards a Theory of Software Development Expertise}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {187--200}, doi = {10.1145/3236024.3236061}, year = {2018}, } Publisher's Version Info Artifacts Reusable Artifacts Functional |
|
Dillig, Isil |
ESEC/FSE '18: "Singularity: Pattern Fuzzing ..."
Singularity: Pattern Fuzzing for Worst Case Complexity
Jiayi Wei, Jia Chen, Yu Feng, Kostas Ferles, and Isil Dillig (University of Texas at Austin, USA) We describe a new blackbox complexity testing technique for determining the worst-case asymptotic complexity of a given application. The key idea is to look for an input pattern —rather than a concrete input— that maximizes the asymptotic resource usage of the target program. Because input patterns can be described concisely as programs in a restricted language, our method transforms the complexity testing problem to optimal program synthesis. In particular, we express these input patterns using a new model of computation called Recurrent Computation Graph (RCG) and solve the optimal synthesis problem by developing a genetic programming algorithm that operates on RCGs. We have implemented the proposed ideas in a tool called Singularity and evaluate it on a diverse set of benchmarks. Our evaluation shows that Singularity can effectively discover the worst-case complexity of various algorithms and that it is more scalable compared to existing state-of-the-art techniques. Furthermore, our experiments also corroborate that Singularity can discover previously unknown performance bugs and availability vulnerabilities in real-world applications such as Google Guava and JGraphT. @InProceedings{ESEC/FSE18p213, author = {Jiayi Wei and Jia Chen and Yu Feng and Kostas Ferles and Isil Dillig}, title = {Singularity: Pattern Fuzzing for Worst Case Complexity}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {213--223}, doi = {10.1145/3236024.3236039}, year = {2018}, } Publisher's Version Results Replicated |
|
Dolan-Gavitt, Brendan |
ESEC/FSE '18: "Bug Synthesis: Challenging ..."
Bug Synthesis: Challenging Bug-Finding Tools with Deep Faults
Subhajit Roy, Awanish Pandey, Brendan Dolan-Gavitt, and Yu Hu (IIT Kanpur, India; New York University, USA) In spite of decades of research in bug detection tools, there is a surprising dearth of ground-truth corpora that can be used to evaluate the efficacy of such tools. Recently, systems such as LAVA and EvilCoder have been proposed to automatically inject bugs into software to quickly generate large bug corpora, but the bugs created so far differ from naturally occurring bugs in a number of ways. In this work, we propose a new automated bug injection system, Apocalypse, that uses formal techniques—symbolic execution, constraint-based program synthesis and model counting—to automatically inject fair (can potentially be discovered by current bug-detection tools), deep (requiring a long sequence of dependencies to be satisfied to fire), uncorrelated (each bug behaving independent of others), reproducible (a trigger input being available) and rare (can be triggered by only a few program inputs) bugs in large software code bases. In our evaluation, we inject bugs into thirty Coreutils programs as well as the TCAS test suite. We find that bugs synthesized by Apocalypse are highly realistic under a variety of metrics, that they do not favor a particular bug-finding strategy (unlike bugs produced by LAVA), and that they are more difficult to find than manually injected bugs, requiring up around 240× more tests to discover with a state-of-the-art symbolic execution tool. @InProceedings{ESEC/FSE18p224, author = {Subhajit Roy and Awanish Pandey and Brendan Dolan-Gavitt and Yu Hu}, title = {Bug Synthesis: Challenging Bug-Finding Tools with Deep Faults}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {224--234}, doi = {10.1145/3236024.3236084}, year = {2018}, } Publisher's Version |
|
Dong, Feng |
ESEC/FSE '18: "FraudDroid: Automated Ad Fraud ..."
FraudDroid: Automated Ad Fraud Detection for Android Apps
Feng Dong, Haoyu Wang, Li Li, Yao Guo, Tegawendé F. Bissyandé, Tianming Liu, Guoai Xu, and Jacques Klein (Beijing University of Posts and Telecommunications, China; Monash University, Australia; Peking University, China; University of Luxembourg, Luxembourg) Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widely spread in apps, have not yet been explored nor addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose, FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffics, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (∼ 93%) and recall (∼ 92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection. @InProceedings{ESEC/FSE18p257, author = {Feng Dong and Haoyu Wang and Li Li and Yao Guo and Tegawendé F. Bissyandé and Tianming Liu and Guoai Xu and Jacques Klein}, title = {FraudDroid: Automated Ad Fraud Detection for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--268}, doi = {10.1145/3236024.3236045}, year = {2018}, } Publisher's Version |
|
Dou, Liang |
ESEC/FSE '18-IND: "FACTS: Automated Black-Box ..."
FACTS: Automated Black-Box Testing of FinTech Systems
Qingshun Wang, Lintao Gu, Minhui Xue, Lihua Xu, Wenyu Niu, Liang Dou, Liang He, and Tao Xie (East China Normal University, China; Macquarie University, Australia; New York University Shanghai, China; CFETS Information Technology, China; University of Illinois at Urbana-Champaign, USA) FinTech, short for "financial technology," has advanced the process of transforming financial business from a traditional manual-process-driven to an automation-driven model by providing various software platforms. However, the current FinTech industry still heavily depends on manual testing, which becomes the bottleneck of FinTech industry development. To automate the testing process, we propose an approach of black-box testing for a FinTech system with effective tool support for both test generation and test oracles. For test generation, we first extract input categories from business-logic specifications, and then mutate real data collected from system logs with values randomly picked from each extracted input category. For test oracles, we propose a new technique of priority differential testing where we evaluate execution results of system-test inputs on the system's head (i.e., latest) version in the version repository (1) against the last legacy version in the version repository (only when the executed test inputs are on new, not-yet-deployed services) and (2) against both the currently-deployed version and the last legacy version (only when the test inputs are on existing, deployed services). When we rank the behavior-inconsistency results for developers to inspect, for the latter case, we give the currently-deployed version as a higher-priority source of behavior to check. We apply our approach to the CSTP subsystem, one of the largest data processing and forwarding modules of the China Foreign Exchange Trade System (CFETS) platform, whose annual total transaction volume reaches 150 trillion US dollars. Extensive experimental results show that our approach can substantially boost the branch coverage by approximately 40%, and is also efficient in identifying common faults in the FinTech system. @InProceedings{ESEC/FSE18p839, author = {Qingshun Wang and Lintao Gu and Minhui Xue and Lihua Xu and Wenyu Niu and Liang Dou and Liang He and Tao Xie}, title = {FACTS: Automated Black-Box Testing of FinTech Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {839--844}, doi = {10.1145/3236024.3275533}, year = {2018}, } Publisher's Version |
|
Dou, Wensheng |
ESEC/FSE '18: "An Empirical Study on Crash ..."
An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems
Yu Gao, Wensheng Dou, Feng Qin, Chushu Gao, Dong Wang, Jun Wei, Ruirui Huang, Li Zhou, and Yongming Wu (Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Ohio State University, USA; Alibaba Group, China) In large-scale distributed systems, node crashes are inevitable, and can happen at any time. As such, distributed systems are usually designed to be resilient to these node crashes via various crash recovery mechanisms, such as write-ahead logging in HBase and hinted handoffs in Cassandra. However, faults in crash recovery mechanisms and their implementations can introduce intricate crash recovery bugs, and lead to severe consequences. In this paper, we present CREB, the most comprehensive study on 103 Crash REcovery Bugs from four popular open-source distributed systems, including ZooKeeper, Hadoop MapReduce, Cassandra and HBase. For all the studied bugs, we analyze their root causes, triggering conditions, bug impacts and fixing. Through this study, we obtain many interesting findings that can open up new research directions for combating crash recovery bugs. @InProceedings{ESEC/FSE18p539, author = {Yu Gao and Wensheng Dou and Feng Qin and Chushu Gao and Dong Wang and Jun Wei and Ruirui Huang and Li Zhou and Yongming Wu}, title = {An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {539--550}, doi = {10.1145/3236024.3236030}, year = {2018}, } Publisher's Version ESEC/FSE '18-NIER: "How Are Spreadsheet Templates ..." How Are Spreadsheet Templates Used in Practice: A Case Study on Enron Liang Xu, Wensheng Dou, Jiaxin Zhu, Chushu Gao, Jun Wei, and Tao Huang (University of Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China) To reduce the effort of creating similar spreadsheets, end users may create expected spreadsheets from some predesigned templates, which contain necessary table layouts (e.g., headers and styles) and formulas, other than from scratch. When there are no explicitly predesigned spreadsheet templates, end users often take an existing spreadsheet as the instance template to create a new spreadsheet. However, improper template design and usage can introduce various issues. For example, a formula error in the template can be easily propagated to all its instances without users’ noticing. Since template design and usage are rarely documented in literature and practice, practitioners and researchers lack understanding of them to achieve effective improvement. In this paper, we conduct the first empirical study on the design and the usage of spreadsheet templates based on 47 predesigned templates (490 instances in total), and 21 instance template groups (168 template and instance pairs in total), extracted from the Enron corpus. Our study reveals a number of spreadsheet template design and usage issues in practice, and also sheds lights on several interesting research directions. @InProceedings{ESEC/FSE18p734, author = {Liang Xu and Wensheng Dou and Jiaxin Zhu and Chushu Gao and Jun Wei and Tao Huang}, title = {How Are Spreadsheet Templates Used in Practice: A Case Study on Enron}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {734--738}, doi = {10.1145/3236024.3264834}, year = {2018}, } Publisher's Version |
|
Dutta, Saikat |
ESEC/FSE '18: "Testing Probabilistic Programming ..."
Testing Probabilistic Programming Systems
Saikat Dutta, Owolabi Legunsen, Zixin Huang, and Sasa Misailovic (University of Illinois at Urbana-Champaign, USA) Probabilistic programming systems (PP systems) allow developers to model stochastic phenomena and perform efficient inference on the models. The number and adoption of probabilistic programming systems is growing significantly. However, there is no prior study of bugs in these systems and no methodology for systematically testing PP systems. Yet, testing PP systems is highly non-trivial, especially when they perform approximate inference. In this paper, we characterize 118 previously reported bugs in three open-source PP systems—Edward, Pyro and Stan—and propose ProbFuzz, an extensible system for testing PP systems. ProbFuzz allows a developer to specify templates of probabilistic models, from which it generates concrete probabilistic programs and data for testing. ProbFuzz uses language-specific translators to generate these concrete programs, which use the APIs of each PP system. ProbFuzz finds potential bugs by checking the output from running the generated programs against several oracles, including an accuracy checker. Using ProbFuzz, we found 67 previously unknown bugs in recent versions of these PP systems. Developers already accepted 51 bug fixes that we submitted to the three PP systems, and their underlying systems, PyTorch and TensorFlow. @InProceedings{ESEC/FSE18p574, author = {Saikat Dutta and Owolabi Legunsen and Zixin Huang and Sasa Misailovic}, title = {Testing Probabilistic Programming Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {574--586}, doi = {10.1145/3236024.3236057}, year = {2018}, } Publisher's Version |
|
Dyer, Robert |
ESEC/FSE '18-NIER: "Software Engineering Collaboratories ..."
Software Engineering Collaboratories (SEClabs) and Collaboratories as a Service (CaaS)
Elena Sherman and Robert Dyer (Boise State University, USA; Bowling Green State University, USA) Novel research ideas require strong evaluations. Modern software engineering research evaluation typically requires a set of benchmark programs. Open source software repositories have provided a great opportunity for researchers to find such programs for use in their evaluations. Many tools/techniques have been developed to help automate the curation of open source software. There has also been encouragement for researchers to provide their research artifacts so that other researchers can easily reproduce the results. We argue that these two trends (i.e., curating open source software for research evaluation and the provision of research artifacts) drive the need for Software Engineering Collaboratories (SEClabs). We envision research communities coming together to create SEClab instances, where research artifacts can be made publicly available to other researchers. The community can then vet such artifacts and make them available as a service, thus turning the collaboratory into a Collaboratory as a Service (CaaS). If our vision is realized, the speed and transparency of research will drastically increase. @InProceedings{ESEC/FSE18p760, author = {Elena Sherman and Robert Dyer}, title = {Software Engineering Collaboratories (SEClabs) and Collaboratories as a Service (CaaS)}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {760--764}, doi = {10.1145/3236024.3264839}, year = {2018}, } Publisher's Version |
|
Eiers, William |
ESEC/FSE '18: "Parameterized Model Counting ..."
Parameterized Model Counting for String and Numeric Constraints
Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu (Microsoft, USA; University of California at Santa Barbara, USA; National Chengchi University, Taiwan) Recently, symbolic program analysis techniques have been extended to quantitative analyses using model counting constraint solvers. Given a constraint and a bound, a model counting constraint solver computes the number of solutions for the constraint within the bound. We present a parameterized model counting constraint solver for string and numeric constraints. We first construct a multi-track deterministic finite state automaton that accepts all solutions to the given constraint. We limit the numeric constraints to linear integer arithmetic, and for non-regular string constraints we over-approximate the solution set. Counting the number of accepting paths in the generated automaton solves the model counting problem. Our approach is parameterized in the sense that, we do not assume a finite domain size during automata construction, resulting in a potentially infinite set of solutions, and our model counting approach works for arbitrarily large bounds. We experimentally demonstrate the effectiveness of our approach on a large set of string and numeric constraints extracted from software applications. We experimentally compare our tool to five existing model counting constraint solvers for string and numeric constraints and demonstrate that our tool is as efficient and as or more precise than other solvers. Moreover, our tool can handle mixed constraints with string and integer variables that no other tool can. @InProceedings{ESEC/FSE18p400, author = {Abdulbaki Aydin and William Eiers and Lucas Bang and Tegan Brennan and Miroslav Gavrilov and Tevfik Bultan and Fang Yu}, title = {Parameterized Model Counting for String and Numeric Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {400--410}, doi = {10.1145/3236024.3236064}, year = {2018}, } Publisher's Version Artifacts Functional |
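The final counting step, counting accepting paths in the constructed automaton, can be sketched with a small dynamic program. The DFA below (strings over {a, b} containing "ab") is a hypothetical stand-in for an automaton compiled from a constraint, and the bound handling is simplified to exact-length counting.

```python
# Minimal sketch of the model-counting step: count the strings of each
# length accepted by a DFA by dynamic programming over its transition
# function. The DFA below accepts strings over {a, b} that contain "ab".

DFA = {
    "states": ["q0", "q1", "q2"],
    "start": "q0",
    "accepting": {"q2"},
    "delta": {
        ("q0", "a"): "q1", ("q0", "b"): "q0",
        ("q1", "a"): "q1", ("q1", "b"): "q2",
        ("q2", "a"): "q2", ("q2", "b"): "q2",
    },
}

def count_models(dfa, bound):
    """Number of accepted strings of length exactly `bound`."""
    counts = {s: 0 for s in dfa["states"]}
    counts[dfa["start"]] = 1                     # one empty path at the start state
    for _ in range(bound):
        nxt = {s: 0 for s in dfa["states"]}
        for (state, _sym), target in dfa["delta"].items():
            nxt[target] += counts[state]         # extend every path along every edge
        counts = nxt
    return sum(counts[s] for s in dfa["accepting"])

for k in range(1, 7):
    print(k, count_models(DFA, k))   # 0, 1, 4, 11, 26, 57 strings contain "ab"
```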
|
Elbaum, Sebastian |
ESEC/FSE '18: "Phys: Probabilistic Physical ..."
Phys: Probabilistic Physical Unit Assignment and Inconsistency Detection
Sayali Kate, John-Paul Ore, Xiangyu Zhang, Sebastian Elbaum, and Zhaogui Xu (Purdue University, USA; University of Nebraska-Lincoln, USA; Nanjing University, China) Program variables used in robotic and cyber-physical systems often have implicit physical units that cannot be determined from their variable types. Inferring an abstract physical unit type for variables and checking their physical unit type consistency is of particular importance for validating the correctness of such systems. For instance, a variable with the unit of ‘meter’ should not be assigned to another variable with the unit of ‘degree-per-second’. Existing solutions have various limitations such as requiring developers to annotate variables with physical units and only handling variables that are directly or transitively used in popular robotic libraries with known physical unit information. We observe that there are many physical unit hints in these systems, such as variable names and specific forms of expressions. These hints have uncertainty as developers may not respect conventions. We propose to model them with probability distributions and conduct probabilistic inference. At the end, our technique produces a unit distribution for each variable. Unit inconsistencies can then be detected using the highly probable unit assignments. Experimental results on 30 programs show that our technique can infer units for 159.3% more variables compared to the state-of-the-art with more than 88.7% true positives, and inconsistency detection on 90 programs shows that our technique reports 103.3% more inconsistencies with 85.3% true positives. @InProceedings{ESEC/FSE18p563, author = {Sayali Kate and John-Paul Ore and Xiangyu Zhang and Sebastian Elbaum and Zhaogui Xu}, title = {Phys: Probabilistic Physical Unit Assignment and Inconsistency Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {563--573}, doi = {10.1145/3236024.3236035}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
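The kind of inconsistency Phys reports can be illustrated with a minimal unit representation: units as exponent vectors over base dimensions, with assignments required to preserve the vector. The variable names and "inferred" units below are hypothetical, not output of the tool.

```python
# Tiny illustration of the inconsistency Phys targets: represent a unit
# as exponents over base dimensions and reject assignments between
# incompatible units. Variable names and inferred units are hypothetical.

METER = {"m": 1}
SECOND = {"s": 1}
DEGREE_PER_SECOND = {"deg": 1, "s": -1}

def same_unit(u, v):
    keys = set(u) | set(v)
    return all(u.get(k, 0) == v.get(k, 0) for k in keys)

def check_assignment(lhs_name, lhs_unit, rhs_name, rhs_unit):
    if not same_unit(lhs_unit, rhs_unit):
        print(f"unit inconsistency: {lhs_name} {lhs_unit} = {rhs_name} {rhs_unit}")

# units inferred (hypothetically, with high probability) from names and usage:
inferred = {"distance_to_goal": METER, "turn_rate": DEGREE_PER_SECOND}

# flagged: a 'meter' variable assigned from a 'degree-per-second' variable
check_assignment("distance_to_goal", inferred["distance_to_goal"],
                 "turn_rate", inferred["turn_rate"])
```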
|
Erata, Ferhat |
ESEC/FSE '18-DEMO: "AlloyInEcore: Embedding of ..."
AlloyInEcore: Embedding of First-Order Relational Logic into Meta-Object Facility for Automated Model Reasoning
Ferhat Erata, Arda Goknil, Ivan Kurtev, and Bedir Tekinerdogan (UNIT Information Technologies, Turkey; University of Luxembourg, Luxembourg; Altran, Netherlands; Wageningen University, Netherlands) We present AlloyInEcore, a tool for specifying metamodels with their static semantics to facilitate automated, formal reasoning on models. Software development projects require that software systems be specified in various models (e.g., requirements models, architecture models, test models, and source code). It is crucial to reason about those models to ensure the correct and complete system specifications. AlloyInEcore allows the user to specify metamodels with their static semantics, while, using the semantics, it automatically detects inconsistent models, and completes partial models. It has been evaluated on three industrial case studies in the automotive domain (https://modelwriter.github.io/AlloyInEcore/). @InProceedings{ESEC/FSE18p920, author = {Ferhat Erata and Arda Goknil and Ivan Kurtev and Bedir Tekinerdogan}, title = {AlloyInEcore: Embedding of First-Order Relational Logic into Meta-Object Facility for Automated Model Reasoning}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {920--923}, doi = {10.1145/3236024.3264588}, year = {2018}, } Publisher's Version Video Info |
|
Erofeeva, Irina |
ESEC/FSE '18-SRC: "How Dance(Sport) Can Help ..."
How Dance(Sport) Can Help to Produce Better Software
Irina Erofeeva (Innopolis University, Russia) Dancing and dancesport have a long tradition of instructions and development. Furthermore, there are many aspects of them that resemble the software development process. This work analyses their features with the intent to explore what of them is already applied in software development and what could be borrowed and applied in the future. Additionally, an investigation of the associated brain activities could be performed to gather a deeper understanding of analogies and differences. @InProceedings{ESEC/FSE18p971, author = {Irina Erofeeva}, title = {How Dance(Sport) Can Help to Produce Better Software}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {971--973}, doi = {10.1145/3236024.3275436}, year = {2018}, } Publisher's Version |
|
Fakhoury, Sarah |
ESEC/FSE '18-DOC: "Moving towards Objective Measures ..."
Moving towards Objective Measures of Program Comprehension
Sarah Fakhoury (Washington State University, USA) Traditionally, program comprehension research relies heavily on indirect measures of comprehension, where subjects report on their own comprehension levels or summarize part of an artifact so that researchers can instead deduce the level of comprehension. However, there are several potential issues that can result from using these indirect measures because they are prone to participant biases and implicitly deduce comprehension based on various factors. The proposed research presents a framework to move towards more objective measures of program comprehension through the use of brain imaging and eye tracking technology. We aim to shed light on how the human brain processes comprehension tasks, specifically what aspects of the source code cause measurable increases in the cognitive load of developers in both bug localization tasks, as well as code reviews. We discuss the proposed methodology, preliminary results, and overall contributions of the work @InProceedings{ESEC/FSE18p936, author = {Sarah Fakhoury}, title = {Moving towards Objective Measures of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {936--939}, doi = {10.1145/3236024.3275426}, year = {2018}, } Publisher's Version |
|
Fan, Lingling |
ESEC/FSE '18-IND: "Are Mobile Banking Apps Secure? ..."
Are Mobile Banking Apps Secure? What Can Be Improved?
Sen Chen, Ting Su, Lingling Fan, Guozhu Meng, Minhui Xue, Yang Liu, and Lihua Xu (East China Normal University, China; Nanyang Technological University, Singapore; Chinese Academy of Sciences, China; Macquarie University, Australia; New York University Shanghai, China) Mobile banking apps, as one of the most contemporary FinTechs, have been widely adopted by banking entities to provide instant financial services. However, our recent work discovered thousands of vulnerabilities in 693 banking apps, which indicates these apps are not as secure as we expected. This motivates us to conduct this study to understand their current security status. First, we take 6 months to track the reporting and patching procedure of these vulnerabilities. Second, we audit 4 state-of-the-art vulnerability detection tools on those patched vulnerabilities. Third, we discuss with 7 banking entities via in-person or online meetings and conduct an online survey to gain more feedback from financial app developers. Through this study, we reveal that (1) people may have inconsistent understandings of the vulnerabilities and different criteria for rating severity; (2) state-of-the-art tools are not effective in detecting the vulnerabilities that most concern the banking entities; and (3) more effort should be devoted, in several respects, to securing banking apps. We believe our study can help bridge the existing gaps, and further motivate different parties, including banking entities, researchers and policy makers, to better tackle security issues altogether. @InProceedings{ESEC/FSE18p797, author = {Sen Chen and Ting Su and Lingling Fan and Guozhu Meng and Minhui Xue and Yang Liu and Lihua Xu}, title = {Are Mobile Banking Apps Secure? What Can Be Improved?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {797--802}, doi = {10.1145/3236024.3275523}, year = {2018}, } Publisher's Version |
|
Farmahinifarahani, Farima |
ESEC/FSE '18: "Oreo: Detection of Clones ..."
Oreo: Detection of Clones in the Twilight Zone
Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. Lopes (University of California at Irvine, USA) Source code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. In between Type-3 and Type-4, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect – the Twilight Zone. Most clone detectors reported in the literature fail to operate in this zone. We present Oreo, a novel approach to source code clone detection that not only detects Type-1 to Type-3 clones accurately, but is also capable of detecting harder-to-detect clones in the Twilight Zone. Oreo is built using a combination of machine learning, information retrieval, and software metrics. We evaluate the recall of Oreo on BigCloneBench, and perform manual evaluation for precision. Oreo has both high recall and precision. More importantly, it pushes the boundary in detection of clones with moderate to weak syntactic similarity in a scalable manner. @InProceedings{ESEC/FSE18p354, author = {Vaibhav Saini and Farima Farmahinifarahani and Yadong Lu and Pierre Baldi and Cristina V. Lopes}, title = {Oreo: Detection of Clones in the Twilight Zone}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {354--365}, doi = {10.1145/3236024.3236026}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Farritor, Shane |
ESEC/FSE '18-IND: "Modeling and Testing a Family ..."
Modeling and Testing a Family of Surgical Robots: An Experience Report
Niloofar Mansoor, Jonathan A. Saddler, Bruno Silva, Hamid Bagheri, Myra B. Cohen, and Shane Farritor (University of Nebraska-Lincoln, USA) Safety-critical applications often use dependability cases to validate that specified properties are invariant, or to demonstrate a counter example showing how that property might be violated. However, most dependability cases are written with a single product in mind. At the same time, software product lines (families of related software products) have been studied with the goal of modeling variability and commonality, and building family based techniques for both analysis and testing. However, there has been little work on building an end to end dependability case for a software product line (where a property is modeled, a counter example is found and then validated as a true positive via testing), and none that we know of in an emerging safety-critical domain, that of robotic surgery. In this paper, we study a family of surgical robots, that combine hardware and software, and are highly configurable, representing over 1300 unique robots. At the same time, they are considered safety-critical and should have associated dependability cases. We perform a case study to understand how we can bring together lightweight formal analysis, feature modeling, and testing to provide an end to end pipeline to find potential violations of important safety properties. In the process, we learned that there are some interesting and open challenges for the research community, which if solved will lead towards more dependable safety-critical cyber-physical systems. @InProceedings{ESEC/FSE18p785, author = {Niloofar Mansoor and Jonathan A. Saddler and Bruno Silva and Hamid Bagheri and Myra B. Cohen and Shane Farritor}, title = {Modeling and Testing a Family of Surgical Robots: An Experience Report}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {785--790}, doi = {10.1145/3236024.3275534}, year = {2018}, } Publisher's Version |
|
Fazzini, Mattia |
ESEC/FSE '18-DOC: "Automated Support for Mobile ..."
Automated Support for Mobile Application Testing and Maintenance
Mattia Fazzini (Georgia Institute of Technology, USA) Mobile applications are an essential part of our daily life. In fact, they can be used for tasks that range from reading the news to performing bank transactions. Considering the impact that mobile applications have in our lives, it is important for developers to test them and gain confidence that they behave as expected. However, testing mobile applications proves to be challenging. In fact, mobile companies report that they do not have enough time and the right methods to test. In addition, in the case of Android applications, the situation is further complicated by the "fragmentation" of the ecosystem. Developers not only need to ensure that an application behaves as expected but also need to make sure that the application does so on a multitude of different devices. Finally, because it is virtually impossible to release a bug-free application, developers also need to quickly react to bug reports and release a fixed version of the application before customer loss. The research plan proposed in this paper aims to provide novel techniques to automate the support for mobile application testing and maintenance. Specifically, it proposes techniques to: test apps more effectively and efficiently, tackle the problems caused by the "fragmentation" of the Android ecosystem, and help developers in quickly handling bug reports. @InProceedings{ESEC/FSE18p932, author = {Mattia Fazzini}, title = {Automated Support for Mobile Application Testing and Maintenance}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {932--935}, doi = {10.1145/3236024.3275425}, year = {2018}, } Publisher's Version |
|
Fedorova, Alexandra |
ESEC/FSE '18: "Performance Comprehension ..."
Performance Comprehension at WiredTiger
Alexandra Fedorova, Craig Mustard, Ivan Beschastnikh, Julia Rubin, Augustine Wong, Svetozar Miucin, and Louis Ye (University of British Columbia, Canada) Software debugging is a time-consuming and challenging process. Supporting debugging has been a focus of the software engineering field since its inception, with numerous empirical studies, theories, and tools to support developers in this task. Performance bugs and performance debugging form a sub-genre of debugging that has received less attention. In this paper we contribute an empirical case study of performance bug diagnosis in the WiredTiger project, the default database engine behind MongoDB. We perform an in-depth analysis of 44 Jira tickets documenting WiredTiger performance-related issues. We investigate how developers diagnose performance bugs: what information they collect, what tools they use, and what processes they follow. Our findings show that developers spend the majority of their performance debugging time chasing outlier events, such as latency spikes and throughput drops. Yet, they are not properly supported by existing performance debugging tools in this task. We also observe that developers often use tools without knowing in advance whether the obtained information will be relevant to debugging the problem. Therefore, we believe developers can benefit from tools that can be used for unstructured exploration of performance data, rather than for answering specific questions. @InProceedings{ESEC/FSE18p83, author = {Alexandra Fedorova and Craig Mustard and Ivan Beschastnikh and Julia Rubin and Augustine Wong and Svetozar Miucin and Louis Ye}, title = {Performance Comprehension at WiredTiger}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {83--94}, doi = {10.1145/3236024.3236081}, year = {2018}, } Publisher's Version |
|
Feng, Xiaobing |
ESEC/FSE '18: "CloudRaid: Hunting Concurrency ..."
CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining
Jie Lu, Feng Li, Lian Li, and Xiaobing Feng (Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Institute of Information Engineering at Chinese Academy of Sciences, China) Cloud systems suffer from distributed concurrency bugs, which are notoriously difficult to detect and often lead to data loss and service outage. This paper presents CloudRaid, a new effective tool to battle distributed concurrency bugs. CloudRaid automatically detects concurrency bugs in cloud systems, by analyzing and testing those message orderings that are likely to expose errors. We observe that large-scale online cloud applications process millions of user requests per second, exercising many permutations of message orderings extensively. Those already sufficiently-tested message orderings are unlikely to expose errors. Hence, CloudRaid mines logs from previous executions to uncover those message orderings which are feasible, but not sufficiently tested. Specifically, CloudRaid tries to flip the order of a pair of messages <S,P> if they may happen in parallel, but S always arrives before P in existing logs, i.e., exercising the order P ↣ S. The log-based approach makes it suitable for live systems. We have applied CloudRaid to automatically test four representative distributed systems: Apache Hadoop2/Yarn, HBase, HDFS and Cassandra. CloudRaid can automatically test 40 different versions of the 4 systems (10 versions per system) in 35 hours, and can successfully trigger 28 concurrency bugs, including 8 new bugs that have never been found before. The 8 new bugs have all been confirmed by their original developers, and 3 of them are considered as critical bugs that have already been fixed. @InProceedings{ESEC/FSE18p3, author = {Jie Lu and Feng Li and Lian Li and Xiaobing Feng}, title = {CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {3--14}, doi = {10.1145/3236024.3236071}, year = {2018}, } Publisher's Version |
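To make the log-mining idea above concrete, the following is a minimal Python sketch (not the CloudRaid implementation) of how one might mine candidate message pairs <S, P> that always appear as S before P in existing logs yet are not causally ordered, so the flipped order P before S is worth testing. The log format and the `may_happen_in_parallel` predicate are simplifying assumptions.

```python
from collections import defaultdict

def observed_orders(logs):
    """logs: list of executions, each a list of message ids in arrival order."""
    before = defaultdict(set)   # before[a] = messages seen after a at least once
    pairs = set()
    for run in logs:
        for i, a in enumerate(run):
            for b in run[i + 1:]:
                if a == b:
                    continue
                before[a].add(b)
                pairs.add(frozenset((a, b)))
    return before, pairs

def candidate_flips(logs, may_happen_in_parallel):
    """Return (S, P) pairs always observed as S before P but not causally ordered."""
    before, pairs = observed_orders(logs)
    candidates = []
    for pair in pairs:
        a, b = tuple(pair)
        if b in before[a] and a not in before[b] and may_happen_in_parallel(a, b):
            candidates.append((a, b))   # worth testing the flipped order b -> a
        elif a in before[b] and b not in before[a] and may_happen_in_parallel(b, a):
            candidates.append((b, a))
    return candidates

# Toy example: "commit" and "heartbeat" always arrive in this order in the logs,
# but nothing forces that order, so the flipped schedule is a test candidate.
logs = [["register", "commit", "heartbeat"], ["register", "commit", "heartbeat"]]
print(candidate_flips(logs, lambda s, p: {s, p} == {"commit", "heartbeat"}))
```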
|
Feng, Yu |
ESEC/FSE '18: "Singularity: Pattern Fuzzing ..."
Singularity: Pattern Fuzzing for Worst Case Complexity
Jiayi Wei, Jia Chen, Yu Feng, Kostas Ferles, and Isil Dillig (University of Texas at Austin, USA) We describe a new blackbox complexity testing technique for determining the worst-case asymptotic complexity of a given application. The key idea is to look for an input pattern —rather than a concrete input— that maximizes the asymptotic resource usage of the target program. Because input patterns can be described concisely as programs in a restricted language, our method transforms the complexity testing problem to optimal program synthesis. In particular, we express these input patterns using a new model of computation called Recurrent Computation Graph (RCG) and solve the optimal synthesis problem by developing a genetic programming algorithm that operates on RCGs. We have implemented the proposed ideas in a tool called Singularity and evaluate it on a diverse set of benchmarks. Our evaluation shows that Singularity can effectively discover the worst-case complexity of various algorithms and that it is more scalable compared to existing state-of-the-art techniques. Furthermore, our experiments also corroborate that Singularity can discover previously unknown performance bugs and availability vulnerabilities in real-world applications such as Google Guava and JGraphT. @InProceedings{ESEC/FSE18p213, author = {Jiayi Wei and Jia Chen and Yu Feng and Kostas Ferles and Isil Dillig}, title = {Singularity: Pattern Fuzzing for Worst Case Complexity}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {213--223}, doi = {10.1145/3236024.3236039}, year = {2018}, } Publisher's Version Results Replicated |
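The following minimal Python sketch illustrates the idea behind pattern fuzzing, without Singularity's RCG model or genetic search: instead of hunting for one worst-case input, one scores input generators (patterns) by how fast the target's cost grows with input size, estimated as the slope of a log-log fit. The target function and the two candidate patterns are illustrative assumptions.

```python
import math

def growth_score(gen, cost, sizes=(100, 200, 400, 800)):
    """Slope of log(cost) vs log(size): ~1 means linear, ~2 quadratic, ..."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(max(cost(gen(n)), 1)) for n in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

def insertion_sort_cost(items):
    """Toy target program: count comparisons made by insertion sort."""
    comparisons, a = 0, list(items)
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return comparisons

sorted_pattern = lambda n: list(range(n))            # near-best case: ~linear
reversed_pattern = lambda n: list(range(n, 0, -1))   # worst case: ~quadratic
print(growth_score(sorted_pattern, insertion_sort_cost))    # close to 1
print(growth_score(reversed_pattern, insertion_sort_cost))  # close to 2
```

A search (genetic or otherwise) over such generators would keep the pattern with the highest growth score, which is the role the RCG synthesis plays in the paper.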
|
Ferles, Kostas |
ESEC/FSE '18: "Singularity: Pattern Fuzzing ..."
Singularity: Pattern Fuzzing for Worst Case Complexity
Jiayi Wei, Jia Chen, Yu Feng, Kostas Ferles, and Isil Dillig (University of Texas at Austin, USA) We describe a new blackbox complexity testing technique for determining the worst-case asymptotic complexity of a given application. The key idea is to look for an input pattern —rather than a concrete input— that maximizes the asymptotic resource usage of the target program. Because input patterns can be described concisely as programs in a restricted language, our method transforms the complexity testing problem to optimal program synthesis. In particular, we express these input patterns using a new model of computation called Recurrent Computation Graph (RCG) and solve the optimal synthesis problem by developing a genetic programming algorithm that operates on RCGs. We have implemented the proposed ideas in a tool called Singularity and evaluate it on a diverse set of benchmarks. Our evaluation shows that Singularity can effectively discover the worst-case complexity of various algorithms and that it is more scalable compared to existing state-of-the-art techniques. Furthermore, our experiments also corroborate that Singularity can discover previously unknown performance bugs and availability vulnerabilities in real-world applications such as Google Guava and JGraphT. @InProceedings{ESEC/FSE18p213, author = {Jiayi Wei and Jia Chen and Yu Feng and Kostas Ferles and Isil Dillig}, title = {Singularity: Pattern Fuzzing for Worst Case Complexity}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {213--223}, doi = {10.1145/3236024.3236039}, year = {2018}, } Publisher's Version Results Replicated |
|
Filkov, Vladimir |
ESEC/FSE '18: "One Size Does Not Fit All: ..."
One Size Does Not Fit All: An Empirical Study of Containerized Continuous Deployment Workflows
Yang Zhang, Bogdan Vasilescu, Huaimin Wang, and Vladimir Filkov (National University of Defense Technology, China; Carnegie Mellon University, USA; University of California at Davis, USA) Continuous deployment (CD) is a software development practice aimed at automating delivery and deployment of a software product, following any changes to its code. If properly implemented, CD together with other automation in the development process can bring numerous benefits, including higher control and flexibility over release schedules, lower risks, fewer defects, and easier on-boarding of new developers. Here we focus on the (r)evolution in CD workflows caused by containerization, the virtualization technology that enables packaging an application together with all its dependencies and execution environment in a light-weight, self-contained unit, of which Docker has become the de-facto industry standard. There are many available choices for containerized CD workflows, some more appropriate than others for a given project. Owing to cross-listing of GitHub projects on Docker Hub, in this paper we report on a mixed-methods study to shed light on developers' experiences and expectations with containerized CD workflows. Starting from a survey, we explore the motivations, specific workflows, needs, and barriers with containerized CD. We find two prominent workflows, based on the automated builds feature on Docker Hub or continuous integration services, with different trade-offs. We then propose hypotheses and test them in a large-scale quantitative study. @InProceedings{ESEC/FSE18p295, author = {Yang Zhang and Bogdan Vasilescu and Huaimin Wang and Vladimir Filkov}, title = {One Size Does Not Fit All: An Empirical Study of Containerized Continuous Deployment Workflows}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {295--306}, doi = {10.1145/3236024.3236033}, year = {2018}, } Publisher's Version |
|
Foo, Darius |
ESEC/FSE '18-IND: "Efficient Static Checking ..."
Efficient Static Checking of Library Updates
Darius Foo, Hendy Chua, Jason Yeo, Ming Yi Ang, and Asankhaya Sharma (CA Technologies, Singapore) Software engineering practices have evolved to the point where a developer writing a new application today doesn’t start from scratch, but reuses a number of open source libraries and components. These third-party libraries evolve independently of the applications in which they are used, and may not maintain stable interfaces as bugs and vulnerabilities in them are fixed. This in turn causes API incompatibilities in downstream applications which must be manually resolved. Oversight here may manifest in many ways, from test failures to crashes at runtime. To address this problem, we present a static analysis for automatically and efficiently checking if a library upgrade introduces an API incompatibility. Our analysis does not rely on reported version information from library developers, and instead computes the actual differences between methods in libraries across different versions. The analysis is scalable, enabling real-time diff queries involving arbitrary pairs of library versions. It supports a vulnerability remediation product which suggests library upgrades automatically and is lightweight enough to be part of a continuous integration/delivery (CI/CD) pipeline. To evaluate the effectiveness of our approach, we determine semantic versioning adherence of a corpus of open source libraries taken from Maven Central, PyPI, and RubyGems. We find that on average, 26% of library versions are in violation of semantic versioning. We also analyze a collection of popular open source projects from GitHub to determine if we can automatically update libraries in them without causing API incompatibilities. Our results indicate that we can suggest upgrades automatically for 10% of the libraries. @InProceedings{ESEC/FSE18p791, author = {Darius Foo and Hendy Chua and Jason Yeo and Ming Yi Ang and Asankhaya Sharma}, title = {Efficient Static Checking of Library Updates}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {791--796}, doi = {10.1145/3236024.3275535}, year = {2018}, } Publisher's Version |
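The following is a minimal Python sketch of the core check described above, not the tool itself: diff the set of public function signatures between two versions of a library and flag removals or signature changes as potentially breaking. Real library versions would be extracted from compiled artifacts; here each "version" is a plain dict, which is an assumption for illustration.

```python
def api_surface(version):
    """version: mapping of fully-qualified function name -> list of parameter types."""
    return {name: tuple(params) for name, params in version.items()}

def breaking_changes(old_version, new_version):
    old, new = api_surface(old_version), api_surface(new_version)
    removed = sorted(set(old) - set(new))
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return removed, changed

# Hypothetical API surfaces of two versions of the same library.
old = {"http.get": ["str"], "http.post": ["str", "dict"], "util.retry": ["int"]}
new = {"http.get": ["str", "dict"], "http.post": ["str", "dict"]}
removed, changed = breaking_changes(old, new)
print("removed:", removed)    # ['util.retry'] -> callers of it will break
print("changed:", changed)    # ['http.get']   -> signature incompatibility
```

An upgrade suggestion would only be emitted automatically when both lists are empty for the application's actually-used API subset.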
|
Ford, Denae |
ESEC/FSE '18: "How Should Compilers Explain ..."
How Should Compilers Explain Problems to Developers?
Titus Barik, Denae Ford, Emerson Murphy-Hill, and Chris Parnin (Microsoft, USA; North Carolina State University, USA) Compilers primarily give feedback about problems to developers through the use of error messages. Unfortunately, developers routinely find these messages to be confusing and unhelpful. In this paper, we postulate that because error messages present poor explanations, theories of explanation---such as Toulmin's model of argument---can be applied to improve their quality. To understand how compilers should present explanations to developers, we conducted a comparative evaluation with 68 professional software developers and an empirical study of compiler error messages found in Stack Overflow questions across seven different programming languages. Our findings suggest that, given a pair of error messages, developers significantly prefer the error message that employs proper argument structure over a deficient argument structure when neither offers a resolution---but will accept a deficient argument structure if it provides a resolution to the problem. Human-authored explanations on Stack Overflow converge to one of the three argument structures: those that provide a resolution to the error, simple arguments, and extended arguments that provide additional evidence for the problem. Finally, we contribute three practical design principles to inform the design and evaluation of compiler error messages. @InProceedings{ESEC/FSE18p633, author = {Titus Barik and Denae Ford and Emerson Murphy-Hill and Chris Parnin}, title = {How Should Compilers Explain Problems to Developers?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {633--643}, doi = {10.1145/3236024.3236040}, year = {2018}, } Publisher's Version |
|
Fox, Armando |
ESEC/FSE '18-NIER: "Towards Quantifying the Development ..."
Towards Quantifying the Development Value of Code Contributions
Jinglei Ren, Hezheng Yin, Qingda Hu, Armando Fox, and Wojciech Koszek (Microsoft Research, China; University of California at Berkeley, USA; Tsinghua University, China; FreeBSD Project, USA) Quantifying the value of developers’ code contributions to a software project requires more than simply counting lines of code or commits. We define the development value of code as a combination of its structural value (the effect of code reuse) and its non-structural value (the impact on development). We propose techniques to automatically calculate both components of development value and combine them using Learning to Rank. Our preliminary empirical study shows that our analysis yields richer results than those obtained by human assessment or simple counting methods and demonstrates the potential of our approach. @InProceedings{ESEC/FSE18p775, author = {Jinglei Ren and Hezheng Yin and Qingda Hu and Armando Fox and Wojciech Koszek}, title = {Towards Quantifying the Development Value of Code Contributions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {775--779}, doi = {10.1145/3236024.3264842}, year = {2018}, } Publisher's Version |
|
Fu, Wei |
ESEC/FSE '18: "Applications of Psychological ..."
Applications of Psychological Science for Actionable Analytics
Di Chen, Wei Fu, Rahul Krishna, and Tim Menzies (North Carolina State University, USA) According to psychological scientists, humans understand models that most match their own internal models, which they characterize as lists of "heuristics" (i.e., lists of very succinct rules). One such heuristic rule generator is the Fast-and-Frugal Trees (FFT) preferred by psychological scientists. Despite their successful use in many applied domains, FFTs have not been applied in software analytics. Accordingly, this paper assesses FFTs for software analytics. We find that FFTs are remarkably effective in that their models are very succinct (5 lines or less describing a binary decision tree) while also outperforming results from very recent, top-level conference papers. Also, when we restrict training data to operational attributes (i.e., those attributes that are frequently changed by developers), the performance of FFTs is not affected (while the performance of other learners can vary wildly). Our conclusions are two-fold. Firstly, there is much that the software analytics community could learn from psychological science. Secondly, proponents of complex methods should always baseline those methods against simpler alternatives. For example, FFTs could be used as a standard baseline learner against which other software analytics tools are compared. @InProceedings{ESEC/FSE18p456, author = {Di Chen and Wei Fu and Rahul Krishna and Tim Menzies}, title = {Applications of Psychological Science for Actionable Analytics}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {456--467}, doi = {10.1145/3236024.3236050}, year = {2018}, } Publisher's Version |
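To show how succinct such a model is, here is a minimal Python sketch of classifying with a Fast-and-Frugal Tree of the general kind the paper uses as a defect predictor. The cues, metric names, and thresholds below are made up for illustration, not learned from the paper's data; each level tests one metric and may exit immediately, and only the last level decides both ways.

```python
def fft_predict(metrics, cues):
    """cues: ordered list of (metric_name, threshold, decision_if_above)."""
    for i, (name, threshold, decision) in enumerate(cues):
        above = metrics[name] > threshold
        last = i == len(cues) - 1
        if above:
            return decision          # early exit on this cue
        if last:
            return not decision      # the final cue decides both outcomes
    return False

# A 4-cue tree over common code metrics (thresholds are hypothetical).
cues = [("loc", 500, True),          # very large file    -> predict defective
        ("churn", 100, True),        # heavy recent churn -> predict defective
        ("num_devs", 8, True),       # many developers    -> predict defective
        ("complexity", 15, True)]    # otherwise decided by complexity alone
print(fft_predict({"loc": 120, "churn": 30, "num_devs": 3, "complexity": 22}, cues))  # True
print(fft_predict({"loc": 120, "churn": 30, "num_devs": 3, "complexity": 5}, cues))   # False
```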
|
Fu, Ying |
ESEC/FSE '18-IND: "VulSeeker-Pro: Enhanced Semantic ..."
VulSeeker-Pro: Enhanced Semantic Learning Based Binary Vulnerability Seeker with Emulation
Jian Gao, Xin Yang, Ying Fu, Yu Jiang, Heyuan Shi, and Jiaguang Sun (Tsinghua University, China) Learning-based clone detection is widely exploited for binary vulnerability search. Although such approaches alleviate the high time overhead of traditional dynamic and static search approaches to some extent, their accuracy is limited, and engineers need to manually identify the true positive cases among the top-M search results in industrial practice. This paper presents VulSeeker-Pro, an enhanced binary vulnerability seeker that integrates function semantic emulation at the back end of semantic learning, to relieve engineers of this manual identification work. It first uses the semantic learning based predictor to quickly predict the top-M candidate functions which are the most similar to the vulnerability from the target binary. Then the top-M candidates are fed to the emulation engine to be re-sorted, and more accurate top-N candidate functions are obtained. With fast filtering of semantic learning and dynamic trace generation of function semantic emulation, VulSeeker-Pro can achieve higher search accuracy with little time overhead. The experimental results on 15 known CVE vulnerabilities involving 6 widely used industrial programs show that VulSeeker-Pro significantly outperforms the state-of-the-art approaches in terms of accuracy. In a total of 45 searches, VulSeeker-Pro finds 40 and 43 real vulnerabilities in the top-1 and top-5 candidate functions, which are 12.33× and 2.58× more than the most recent and related work Gemini. In terms of efficiency, it takes 0.22 seconds on average to determine whether the target binary function contains a known vulnerability or not. @InProceedings{ESEC/FSE18p803, author = {Jian Gao and Xin Yang and Ying Fu and Yu Jiang and Heyuan Shi and Jiaguang Sun}, title = {VulSeeker-Pro: Enhanced Semantic Learning Based Binary Vulnerability Seeker with Emulation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {803--808}, doi = {10.1145/3236024.3275524}, year = {2018}, } Publisher's Version |
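A minimal Python sketch of the two-stage pipeline described above: a cheap learned similarity score picks the top-M candidate functions, and a more expensive emulation-based score re-ranks only those M to produce the top-N. The function names and scores below are illustrative stand-ins; the real system learns embeddings and emulates binary code.

```python
def two_stage_search(candidates, cheap_score, precise_score, m=50, n=5):
    """candidates: iterable of function identifiers in the target binary."""
    top_m = sorted(candidates, key=cheap_score, reverse=True)[:m]   # fast filter
    return sorted(top_m, key=precise_score, reverse=True)[:n]       # re-sort top-M

# Toy scores: pretend "f_17" is the truly vulnerable function, which the cheap
# model ranks only third but the emulation-based score ranks first.
cheap = {"f_02": 0.91, "f_44": 0.88, "f_17": 0.86, "f_09": 0.52}
precise = {"f_02": 0.40, "f_44": 0.35, "f_17": 0.97, "f_09": 0.10}
print(two_stage_search(cheap, cheap.get, precise.get, m=3, n=1))  # ['f_17']
```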
|
Gadelha, Mikhail R. |
ESEC/FSE '18-NIER: "Towards Counterexample-Guided ..."
Towards Counterexample-Guided k-Induction for Fast Bug Detection
Mikhail R. Gadelha, Felipe R. Monteiro, Lucas C. Cordeiro, and Denis A. Nicole (University of Southampton, UK; Federal University of Amazonas, Brazil; University of Manchester, UK) Recently, the k-induction algorithm has proven to be a successful approach for both finding bugs and proving correctness. However, since the algorithm is an incremental approach, it might waste resources trying to prove incorrect programs. In this paper, we extend the k-induction algorithm to shorten the number of steps required to find a property violation. We convert the algorithm into a meet-in-the-middle bidirectional search algorithm, using the counterexample produced from over-approximating the program. The main advantage is in the reduction of the state explosion by reducing the maximum required steps from k to ⌊k/2 + 1⌋. @InProceedings{ESEC/FSE18p765, author = {Mikhail R. Gadelha and Felipe R. Monteiro and Lucas C. Cordeiro and Denis A. Nicole}, title = {Towards Counterexample-Guided k-Induction for Fast Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {765--769}, doi = {10.1145/3236024.3264840}, year = {2018}, } Publisher's Version Video Info |
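For readers unfamiliar with the baseline being extended, here is a minimal explicit-state Python sketch of plain k-induction, not the paper's counterexample-guided bidirectional variant: the base case searches up to depth k for an unsafe state, and the step case asks whether k consecutive safe states can ever be followed by an unsafe one. The transition system is a toy counter used purely as an illustration.

```python
def k_induction(initial, step, safe, states, max_k=10):
    for k in range(1, max_k + 1):
        # Base case: breadth-first search up to depth k for an unsafe state.
        frontier, seen = set(initial), set(initial)
        for depth in range(k + 1):
            if any(not safe(s) for s in frontier):
                return f"bug found at depth {depth}"
            frontier = {step(s) for s in frontier} - seen
            seen |= frontier
        # Step case: do k consecutive safe states force the next state safe?
        inductive = True
        for start in states:
            path = [start]
            for _ in range(k - 1):
                path.append(step(path[-1]))
            if all(safe(s) for s in path) and not safe(step(path[-1])):
                inductive = False
                break
        if inductive:
            return f"proved safe with k = {k}"
    return "unknown"

# Toy deterministic system: a counter modulo 8 starting at 0; the property
# "the counter never equals 5" is violated, and the base case finds the bug.
print(k_induction(initial={0}, step=lambda s: (s + 1) % 8,
                  safe=lambda s: s != 5, states=range(8)))
```

The paper's contribution is, roughly, to also search backwards from the over-approximated counterexample so the violation is reached in about half the unrolling depth; the sketch above only shows the forward direction.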
|
Gao, Chushu |
ESEC/FSE '18: "An Empirical Study on Crash ..."
An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems
Yu Gao, Wensheng Dou, Feng Qin, Chushu Gao, Dong Wang, Jun Wei, Ruirui Huang, Li Zhou, and Yongming Wu (Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Ohio State University, USA; Alibaba Group, China) In large-scale distributed systems, node crashes are inevitable, and can happen at any time. As such, distributed systems are usually designed to be resilient to these node crashes via various crash recovery mechanisms, such as write-ahead logging in HBase and hinted handoffs in Cassandra. However, faults in crash recovery mechanisms and their implementations can introduce intricate crash recovery bugs, and lead to severe consequences. In this paper, we present CREB, the most comprehensive study on 103 Crash REcovery Bugs from four popular open-source distributed systems, including ZooKeeper, Hadoop MapReduce, Cassandra and HBase. For all the studied bugs, we analyze their root causes, triggering conditions, impacts, and fixes. Through this study, we obtain many interesting findings that can open up new research directions for combating crash recovery bugs. @InProceedings{ESEC/FSE18p539, author = {Yu Gao and Wensheng Dou and Feng Qin and Chushu Gao and Dong Wang and Jun Wei and Ruirui Huang and Li Zhou and Yongming Wu}, title = {An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {539--550}, doi = {10.1145/3236024.3236030}, year = {2018}, } Publisher's Version ESEC/FSE '18-NIER: "How Are Spreadsheet Templates ..." How Are Spreadsheet Templates Used in Practice: A Case Study on Enron Liang Xu, Wensheng Dou, Jiaxin Zhu, Chushu Gao, Jun Wei, and Tao Huang (University of Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China) To reduce the effort of creating similar spreadsheets, end users may create expected spreadsheets from some predesigned templates, which contain necessary table layouts (e.g., headers and styles) and formulas, rather than from scratch. When there are no explicitly predesigned spreadsheet templates, end users often take an existing spreadsheet as the instance template to create a new spreadsheet. However, improper template design and usage can introduce various issues. For example, a formula error in the template can be easily propagated to all its instances without users’ noticing. Since template design and usage are rarely documented in literature and practice, practitioners and researchers lack understanding of them to achieve effective improvement. In this paper, we conduct the first empirical study on the design and the usage of spreadsheet templates based on 47 predesigned templates (490 instances in total), and 21 instance template groups (168 template and instance pairs in total), extracted from the Enron corpus. Our study reveals a number of spreadsheet template design and usage issues in practice, and also sheds light on several interesting research directions. @InProceedings{ESEC/FSE18p734, author = {Liang Xu and Wensheng Dou and Jiaxin Zhu and Chushu Gao and Jun Wei and Tao Huang}, title = {How Are Spreadsheet Templates Used in Practice: A Case Study on Enron}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {734--738}, doi = {10.1145/3236024.3264834}, year = {2018}, } Publisher's Version |
|
Gao, Cuiyun |
ESEC/FSE '18-DEMO: "INFAR: Insight Extraction ..."
INFAR: Insight Extraction from App Reviews
Cuiyun Gao, Jichuan Zeng, David Lo, Chin-Yew Lin, Michael R. Lyu, and Irwin King (Chinese University of Hong Kong, China; Singapore Management University, Singapore; Microsoft Research, China) App reviews play an essential role for users to convey their feedback about using the app. The critical information contained in app reviews can assist app developers in maintaining and updating mobile apps. However, the noisy nature and large quantity of daily generated app reviews make it difficult to understand the essential information carried in app reviews. Several prior studies have proposed methods that can automatically classify or cluster user reviews into a few app topics (e.g., security). These methods usually act on a static collection of user reviews. However, due to the dynamic nature of user feedback (i.e., reviews keep coming as new users register or new app versions are released) and multiple analysis dimensions (e.g., review quantity and user rating), developers still need to spend substantial effort in extracting contrastive information that can only be teased out by comparing data from multiple time periods or analysis dimensions. This is needed to answer questions such as: What kind of issues are users experiencing most? Is there an unexpected rise in a particular kind of issue? To address this need, in this paper, we introduce INFAR, a tool that automatically extracts INsights From App Reviews across time periods and analysis dimensions, and presents them in natural language supported by an interactive chart. The insights INFAR extracts include several perspectives: (1) salient topics (i.e., issue topics with significantly lower ratings), (2) abnormal topics (i.e., issue topics that experience a rapid rise in volume during a time period), (3) correlations between two topics, and (4) causal factors behind rating or review quantity changes. To evaluate our tool, we conduct an empirical evaluation involving six popular apps and 12 industrial practitioners, and 92% (11/12) of them approve the practical usefulness of the insights summarized by INFAR. Demo Tool Website: https://remine-lab.github.io/paper/infar.html Demo Video: https://youtu.be/MjcoiyjA5TE @InProceedings{ESEC/FSE18p904, author = {Cuiyun Gao and Jichuan Zeng and David Lo and Chin-Yew Lin and Michael R. Lyu and Irwin King}, title = {INFAR: Insight Extraction from App Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {904--907}, doi = {10.1145/3236024.3264595}, year = {2018}, } Publisher's Version |
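A minimal Python sketch of two of the insight types listed above, salient and abnormal topics. The thresholds and the simple mean/standard-deviation test are illustrative choices, not the tool's actual statistics, and the review data is made up.

```python
from statistics import mean, pstdev

def salient_topics(reviews_by_topic, margin=0.5):
    """reviews_by_topic: topic -> list of star ratings; flag topics rated well below average."""
    overall = mean(r for ratings in reviews_by_topic.values() for r in ratings)
    return [t for t, ratings in reviews_by_topic.items()
            if mean(ratings) < overall - margin]

def abnormal_topics(volume_by_topic, z=2.0):
    """volume_by_topic: topic -> weekly review counts (latest last); flag sudden volume jumps."""
    flagged = []
    for topic, counts in volume_by_topic.items():
        history, latest = counts[:-1], counts[-1]
        sigma = pstdev(history) or 1.0
        if latest > mean(history) + z * sigma:
            flagged.append(topic)
    return flagged

ratings = {"crash": [1, 2, 1, 2], "ui": [4, 5, 4], "battery": [3, 4, 3]}
volumes = {"crash": [5, 6, 4, 30], "ui": [10, 12, 11, 12]}
print(salient_topics(ratings))    # ['crash']
print(abnormal_topics(volumes))   # ['crash']
```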
|
Gao, Jian |
ESEC/FSE '18-IND: "VulSeeker-Pro: Enhanced Semantic ..."
VulSeeker-Pro: Enhanced Semantic Learning Based Binary Vulnerability Seeker with Emulation
Jian Gao, Xin Yang, Ying Fu, Yu Jiang, Heyuan Shi, and Jiaguang Sun (Tsinghua University, China) Learning-based clone detection is widely exploited for binary vulnerability search. Although such approaches alleviate the high time overhead of traditional dynamic and static search approaches to some extent, their accuracy is limited, and engineers need to manually identify the true positive cases among the top-M search results in industrial practice. This paper presents VulSeeker-Pro, an enhanced binary vulnerability seeker that integrates function semantic emulation at the back end of semantic learning, to relieve engineers of this manual identification work. It first uses the semantic learning based predictor to quickly predict the top-M candidate functions which are the most similar to the vulnerability from the target binary. Then the top-M candidates are fed to the emulation engine to be re-sorted, and more accurate top-N candidate functions are obtained. With fast filtering of semantic learning and dynamic trace generation of function semantic emulation, VulSeeker-Pro can achieve higher search accuracy with little time overhead. The experimental results on 15 known CVE vulnerabilities involving 6 widely used industrial programs show that VulSeeker-Pro significantly outperforms the state-of-the-art approaches in terms of accuracy. In a total of 45 searches, VulSeeker-Pro finds 40 and 43 real vulnerabilities in the top-1 and top-5 candidate functions, which are 12.33× and 2.58× more than the most recent and related work Gemini. In terms of efficiency, it takes 0.22 seconds on average to determine whether the target binary function contains a known vulnerability or not. @InProceedings{ESEC/FSE18p803, author = {Jian Gao and Xin Yang and Ying Fu and Yu Jiang and Heyuan Shi and Jiaguang Sun}, title = {VulSeeker-Pro: Enhanced Semantic Learning Based Binary Vulnerability Seeker with Emulation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {803--808}, doi = {10.1145/3236024.3275524}, year = {2018}, } Publisher's Version |
|
Gao, Yu |
ESEC/FSE '18: "An Empirical Study on Crash ..."
An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems
Yu Gao, Wensheng Dou, Feng Qin, Chushu Gao, Dong Wang, Jun Wei, Ruirui Huang, Li Zhou, and Yongming Wu (Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Ohio State University, USA; Alibaba Group, China) In large-scale distributed systems, node crashes are inevitable, and can happen at any time. As such, distributed systems are usually designed to be resilient to these node crashes via various crash recovery mechanisms, such as write-ahead logging in HBase and hinted handoffs in Cassandra. However, faults in crash recovery mechanisms and their implementations can introduce intricate crash recovery bugs, and lead to severe consequences. In this paper, we present CREB, the most comprehensive study on 103 Crash REcovery Bugs from four popular open-source distributed systems, including ZooKeeper, Hadoop MapReduce, Cassandra and HBase. For all the studied bugs, we analyze their root causes, triggering conditions, impacts, and fixes. Through this study, we obtain many interesting findings that can open up new research directions for combating crash recovery bugs. @InProceedings{ESEC/FSE18p539, author = {Yu Gao and Wensheng Dou and Feng Qin and Chushu Gao and Dong Wang and Jun Wei and Ruirui Huang and Li Zhou and Yongming Wu}, title = {An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {539--550}, doi = {10.1145/3236024.3236030}, year = {2018}, } Publisher's Version |
|
Garbervetsky, Diego |
ESEC/FSE '18-IND: "Testing and Validating End ..."
Testing and Validating End User Programmed Calculated Fields
Víctor Braberman, Diego Garbervetsky, Javier Godoy, Sebastian Uchitel, Guido de Caso, Ignacio Perez, and Santiago Perez (University of Buenos Aires, Argentina; CONICET, Argentina; Imperial College London, UK; Medallia, USA) This paper reports on an approach for systematically generating test data from production databases for end user calculated field programs via a novel combination of symbolic execution and database queries. We also discuss the opportunities and challenges that this specific domain poses for symbolic execution and show how database queries can help complement some of symbolic execution's weaknesses, namely in the treatment of loops and also of path conditions that exceed SMT solver capabilities. @InProceedings{ESEC/FSE18p827, author = {Víctor Braberman and Diego Garbervetsky and Javier Godoy and Sebastian Uchitel and Guido de Caso and Ignacio Perez and Santiago Perez}, title = {Testing and Validating End User Programmed Calculated Fields}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {827--832}, doi = {10.1145/3236024.3275531}, year = {2018}, } Publisher's Version |
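A minimal Python sketch of the combination just described: each path condition collected by symbolic execution is turned into a WHERE clause and the production database is queried for a record that satisfies it, yielding concrete test data per path. The condition encoding, the schema, and the data are made up for illustration; the real approach handles far richer constraints.

```python
import sqlite3

def path_condition_to_sql(table, conditions):
    """conditions: list of (column, operator, constant) triples from one symbolic path."""
    where = " AND ".join(f"{col} {op} ?" for col, op, _ in conditions)
    params = [value for _, _, value in conditions]
    return f"SELECT * FROM {table} WHERE {where} LIMIT 1", params

# Toy "production" table and two symbolic paths of a calculated field like
# "discount = IF amount > 1000 AND region = 'EU' THEN ... ELSE ...".
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, 200.0, "EU"), (2, 5000.0, "EU"), (3, 800.0, "US")])
paths = [[("amount", ">", 1000), ("region", "=", "EU")],
         [("amount", "<=", 1000)]]
for conditions in paths:
    query, params = path_condition_to_sql("orders", conditions)
    print(conditions, "->", db.execute(query, params).fetchone())
```

Querying real data instead of asking a solver to invent values is what lets the approach sidestep loops and path conditions that would otherwise exceed SMT solver capabilities.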
|
Gavrilov, Miroslav |
ESEC/FSE '18: "Parameterized Model Counting ..."
Parameterized Model Counting for String and Numeric Constraints
Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu (Microsoft, USA; University of California at Santa Barbara, USA; National Chengchi University, Taiwan) Recently, symbolic program analysis techniques have been extended to quantitative analyses using model counting constraint solvers. Given a constraint and a bound, a model counting constraint solver computes the number of solutions for the constraint within the bound. We present a parameterized model counting constraint solver for string and numeric constraints. We first construct a multi-track deterministic finite state automaton that accepts all solutions to the given constraint. We limit the numeric constraints to linear integer arithmetic, and for non-regular string constraints we over-approximate the solution set. Counting the number of accepting paths in the generated automaton solves the model counting problem. Our approach is parameterized in the sense that, we do not assume a finite domain size during automata construction, resulting in a potentially infinite set of solutions, and our model counting approach works for arbitrarily large bounds. We experimentally demonstrate the effectiveness of our approach on a large set of string and numeric constraints extracted from software applications. We experimentally compare our tool to five existing model counting constraint solvers for string and numeric constraints and demonstrate that our tool is as efficient and as or more precise than other solvers. Moreover, our tool can handle mixed constraints with string and integer variables that no other tool can. @InProceedings{ESEC/FSE18p400, author = {Abdulbaki Aydin and William Eiers and Lucas Bang and Tegan Brennan and Miroslav Gavrilov and Tevfik Bultan and Fang Yu}, title = {Parameterized Model Counting for String and Numeric Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {400--410}, doi = {10.1145/3236024.3236064}, year = {2018}, } Publisher's Version Artifacts Functional |
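A minimal Python sketch of the counting step only, not the paper's full parameterized solver: once a constraint's solutions are represented as a DFA, model counting within a length bound reduces to counting accepting paths, which a simple dynamic program over path length computes. The toy DFA below encodes a made-up string constraint; the paper's multi-track automata and symbolic bounds are not modeled here.

```python
def count_models(transitions, start, accepting, bound):
    """transitions: dict state -> list of successor states (one per alphabet symbol)."""
    counts = {s: 0 for s in transitions}
    counts[start] = 1                       # one way to be at the start with length 0
    total = counts[start] if start in accepting else 0
    for _ in range(bound):
        nxt = {s: 0 for s in transitions}
        for state, ways in counts.items():  # extend every path by one symbol
            for succ in transitions[state]:
                nxt[succ] += ways
        counts = nxt
        total += sum(counts[s] for s in accepting)
    return total                            # accepted strings of length <= bound

# Toy DFA over {a, b} accepting strings that contain at least one 'a':
# state 0 = no 'a' seen yet, state 1 = accepting.
dfa = {0: [1, 0],    # on 'a' -> state 1, on 'b' -> stay in state 0
       1: [1, 1]}    # once accepting, always accepting
print(count_models(dfa, start=0, accepting={1}, bound=3))   # 11 strings
```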
|
Gligoric, Milos |
ESEC/FSE '18-IND: "Regression Test Selection ..."
Regression Test Selection for TizenRT
Ahmet Celik, Young Chul Lee, and Milos Gligoric (University of Texas at Austin, USA; Samsung Electronics, South Korea) Regression testing - running tests after code modifications - is widely practiced in industry, including at Samsung. Regression Test Selection (RTS) optimizes regression testing by skipping tests that are not affected by recent code changes. Recent work has developed robust RTS tools, which mostly target managed languages, e.g., Java and C#, and thus are not applicable to large C projects, e.g., TizenRT, a lightweight RTOS-based platform. We present Selfection, an RTS tool for projects written in C; we discuss the key challenges to develop Selfection and our design decisions. Selfection uses the objdump and readelf tools to statically build a dependency graph of functions from binaries and detect modified code elements. We integrated Selfection in TizenRT and evaluated its benefits if tests are run in an emulator and on a supported hardware platform (ARTIK 053). We used the latest 150 revisions of TizenRT available on GitHub. We measured the benefits of Selfection as the reduction in the number of tests and reduction in test execution time over running all tests at each revision (i.e., RetestAll). Our results show that Selfection can reduce, on average, the number of tests to 4.95% and end-to-end execution time to 7.04% when tests are executed in the emulator, and to 5.74% and 26.82% when tests are executed on the actual hardware. Our results also show that the time taken to maintain the dependency graph and detect modified functions is negligible. @InProceedings{ESEC/FSE18p845, author = {Ahmet Celik and Young Chul Lee and Milos Gligoric}, title = {Regression Test Selection for TizenRT}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {845--850}, doi = {10.1145/3236024.3275527}, year = {2018}, } Publisher's Version |
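A minimal Python sketch of the static RTS idea described above, not Selfection itself (which extracts this information from ELF binaries via objdump and readelf): given a call graph among functions, per-function hashes from the old and new builds, and each test's entry function, keep only the tests that can reach a changed function. The graph and hashes below are illustrative stand-ins.

```python
def reachable(call_graph, entry):
    """All functions transitively callable from the test's entry function."""
    seen, stack = set(), [entry]
    while stack:
        f = stack.pop()
        if f in seen:
            continue
        seen.add(f)
        stack.extend(call_graph.get(f, ()))
    return seen

def select_tests(call_graph, tests, old_hashes, new_hashes):
    changed = {f for f in new_hashes if old_hashes.get(f) != new_hashes[f]}
    return [t for t, entry in tests.items()
            if reachable(call_graph, entry) & changed]

call_graph = {"test_wifi": ["wifi_connect"], "test_fs": ["fs_open"],
              "wifi_connect": ["net_init"], "fs_open": []}
tests = {"test_wifi": "test_wifi", "test_fs": "test_fs"}
old = {"net_init": "aa11", "wifi_connect": "bb22", "fs_open": "cc33"}
new = {"net_init": "aa99", "wifi_connect": "bb22", "fs_open": "cc33"}  # net_init changed
print(select_tests(call_graph, tests, old, new))   # ['test_wifi']
```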
|
Glondu, Stéphane |
ESEC/FSE '18-INV: "Ten Years of Hunting for Similar ..."
Ten Years of Hunting for Similar Code for Fun and Profit (Keynote)
Stéphane Glondu, Lingxiao Jiang, and Zhendong Su (Inria, France; Singapore Management University, Singapore; ETH Zurich, Switzerland) In 2007, the Deckard paper was published at ICSE. Since its publication, it has led to much follow-up research and applications. The paper made two core contributions: a novel vector embedding of structured code for fast similarity detection, and an application of the embedding for clone detection, resulting in the Deckard tool. The vector embedding is simple and easy to adapt. Similar code detection is also fundamental for a range of classical and emerging problems in software engineering, security, and computer science education (e.g., code reuse, refactoring, porting, translation, synthesis, program repair, malware detection, and feedback generation). Both have buttressed the paper’s influence. In 2018, the Deckard paper received the ACM SIGSOFT Impact Paper award. In this keynote, we take the opportunity to review the work’s inception, evolution and impact on its subsequent work and applications, and to share our thoughts on exciting ongoing and future developments. @InProceedings{ESEC/FSE18p2, author = {Stéphane Glondu and Lingxiao Jiang and Zhendong Su}, title = {Ten Years of Hunting for Similar Code for Fun and Profit (Keynote)}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {2--2}, doi = {10.1145/3236024.3280856}, year = {2018}, } Publisher's Version |
|
Godoy, Javier |
ESEC/FSE '18-IND: "Testing and Validating End ..."
Testing and Validating End User Programmed Calculated Fields
Víctor Braberman, Diego Garbervetsky, Javier Godoy, Sebastian Uchitel, Guido de Caso, Ignacio Perez, and Santiago Perez (University of Buenos Aires, Argentina; CONICET, Argentina; Imperial College London, UK; Medallia, USA) This paper reports on an approach for systematically generating test data from production databases for end user calculated field programs via a novel combination of symbolic execution and database queries. We also discuss the opportunities and challenges that this specific domain poses for symbolic execution and show how database queries can help complement some of symbolic execution's weaknesses, namely in the treatment of loops and also of path conditions that exceed SMT solver capabilities. @InProceedings{ESEC/FSE18p827, author = {Víctor Braberman and Diego Garbervetsky and Javier Godoy and Sebastian Uchitel and Guido de Caso and Ignacio Perez and Santiago Perez}, title = {Testing and Validating End User Programmed Calculated Fields}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {827--832}, doi = {10.1145/3236024.3275531}, year = {2018}, } Publisher's Version |
|
Goknil, Arda |
ESEC/FSE '18-DEMO: "AlloyInEcore: Embedding of ..."
AlloyInEcore: Embedding of First-Order Relational Logic into Meta-Object Facility for Automated Model Reasoning
Ferhat Erata, Arda Goknil, Ivan Kurtev, and Bedir Tekinerdogan (UNIT Information Technologies, Turkey; University of Luxembourg, Luxembourg; Altran, Netherlands; Wageningen University, Netherlands) We present AlloyInEcore, a tool for specifying metamodels with their static semantics to facilitate automated, formal reasoning on models. Software development projects require that software systems be specified in various models (e.g., requirements models, architecture models, test models, and source code). It is crucial to reason about those models to ensure the correct and complete system specifications. AlloyInEcore allows the user to specify metamodels with their static semantics, while, using the semantics, it automatically detects inconsistent models, and completes partial models. It has been evaluated on three industrial case studies in the automotive domain (https://modelwriter.github.io/AlloyInEcore/). @InProceedings{ESEC/FSE18p920, author = {Ferhat Erata and Arda Goknil and Ivan Kurtev and Bedir Tekinerdogan}, title = {AlloyInEcore: Embedding of First-Order Relational Logic into Meta-Object Facility for Automated Model Reasoning}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {920--923}, doi = {10.1145/3236024.3264588}, year = {2018}, } Publisher's Version Video Info |
|
Grama, Ananth |
ESEC/FSE '18: "MODE: Automated Neural Network ..."
MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection
Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, and Ananth Grama (Purdue University, USA) Artificial intelligence models are becoming an integral part of modern computing systems. Just like software inevitably has bugs, models have bugs too, leading to poor classification/prediction accuracy. Unlike software bugs, model bugs cannot be easily fixed by directly modifying models. Existing solutions work by providing additional training inputs. However, they have limited effectiveness due to the lack of understanding of model misbehaviors and hence the incapability of selecting proper inputs. Inspired by software debugging, we propose a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing. Our evaluation results on 29 different models for 6 different applications show that our technique can fix model bugs effectively and efficiently without introducing new bugs. For simple applications (e.g., digit recognition), MODE improves the test accuracy from 75% to 93% on average whereas the state-of-the-art can only improve to 85% with 11 times more training time. For complex applications and models (e.g., object recognition), MODE is able to improve the accuracy from 75% to over 91% in minutes to a few hours, whereas state-of-the-art fails to fix the bug or even degrades the test accuracy. @InProceedings{ESEC/FSE18p175, author = {Shiqing Ma and Yingqi Liu and Wen-Chuan Lee and Xiangyu Zhang and Ananth Grama}, title = {MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {175--186}, doi = {10.1145/3236024.3236082}, year = {2018}, } Publisher's Version |
|
Griggio, Alberto |
ESEC/FSE '18: "Symbolic Execution with Existential ..."
Symbolic Execution with Existential Second-Order Constraints
Sergey Mechtaev, Alberto Griggio, Alessandro Cimatti, and Abhik Roychoudhury (National University of Singapore, Singapore; Fondazione Bruno Kessler, Italy) Symbolic execution systematically explores program paths by solving path conditions --- formulas over symbolic variables. Typically, the symbolic variables range over numbers, arrays and strings. We introduce symbolic execution with existential second-order constraints --- an extension of traditional symbolic execution that allows symbolic variables to range over functions whose interpretations are restricted by a user-defined language. The aims of this new technique are twofold. First, it offers a general analysis framework that can be applied in multiple domains such as program repair and library modelling. Secondly, it addresses the path explosion problem of traditional first-order symbolic execution in certain applications. To realize this technique, we integrate symbolic execution with program synthesis. Specifically, we propose a method of second-order constraint solving that provides efficient proofs of unsatisfiability, which is critical for the performance of symbolic execution. Our evaluation shows that the proposed technique (1) helps to repair programs with loops by mitigating the path explosion, (2) can enable analysis of applications written against unavailable libraries by modelling these libraries from the usage context. @InProceedings{ESEC/FSE18p389, author = {Sergey Mechtaev and Alberto Griggio and Alessandro Cimatti and Abhik Roychoudhury}, title = {Symbolic Execution with Existential Second-Order Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {389--399}, doi = {10.1145/3236024.3236049}, year = {2018}, } Publisher's Version |
|
Gu, Lintao |
ESEC/FSE '18-IND: "FACTS: Automated Black-Box ..."
FACTS: Automated Black-Box Testing of FinTech Systems
Qingshun Wang, Lintao Gu, Minhui Xue, Lihua Xu, Wenyu Niu, Liang Dou, Liang He, and Tao Xie (East China Normal University, China; Macquarie University, Australia; New York University Shanghai, China; CFETS Information Technology, China; University of Illinois at Urbana-Champaign, USA) FinTech, short for "financial technology," has advanced the process of transforming financial business from a traditional manual-process-driven to an automation-driven model by providing various software platforms. However, the current FinTech industry still heavily depends on manual testing, which becomes the bottleneck of FinTech industry development. To automate the testing process, we propose an approach of black-box testing for a FinTech system with effective tool support for both test generation and test oracles. For test generation, we first extract input categories from business-logic specifications, and then mutate real data collected from system logs with values randomly picked from each extracted input category. For test oracles, we propose a new technique of priority differential testing where we evaluate execution results of system-test inputs on the system's head (i.e., latest) version in the version repository (1) against the last legacy version in the version repository (only when the executed test inputs are on new, not-yet-deployed services) and (2) against both the currently-deployed version and the last legacy version (only when the test inputs are on existing, deployed services). When we rank the behavior-inconsistency results for developers to inspect, for the latter case, we give the currently-deployed version as a higher-priority source of behavior to check. We apply our approach to the CSTP subsystem, one of the largest data processing and forwarding modules of the China Foreign Exchange Trade System (CFETS) platform, whose annual total transaction volume reaches 150 trillion US dollars. Extensive experimental results show that our approach can substantially boost the branch coverage by approximately 40%, and is also efficient at identifying common faults in the FinTech system. @InProceedings{ESEC/FSE18p839, author = {Qingshun Wang and Lintao Gu and Minhui Xue and Lihua Xu and Wenyu Niu and Liang Dou and Liang He and Tao Xie}, title = {FACTS: Automated Black-Box Testing of FinTech Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {839--844}, doi = {10.1145/3236024.3275533}, year = {2018}, } Publisher's Version |
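A minimal Python sketch of the priority differential oracle just described. The three "versions" are plain functions standing in for deployed services, and the fee calculation is a made-up example; the point is only that inconsistencies against the currently-deployed version are ranked above those against the legacy version.

```python
def differential_check(inputs, head, deployed=None, legacy=None):
    """Compare the head version's outputs against deployed and legacy versions."""
    reports = []
    for x in inputs:
        h = head(x)
        if deployed is not None and h != deployed(x):    # existing, deployed service
            reports.append((0, x, "head vs deployed"))    # higher priority
        if legacy is not None and h != legacy(x):
            reports.append((1, x, "head vs legacy"))
    return sorted(reports)                                # priority 0 reported first

# Toy example: the fee rate changed in the head version.
legacy_fee   = lambda amount: round(amount * 0.0010, 2)
deployed_fee = lambda amount: round(amount * 0.0010, 2)
head_fee     = lambda amount: round(amount * 0.0012, 2)
for priority, x, kind in differential_check([1000, 250000], head_fee,
                                             deployed_fee, legacy_fee):
    print(priority, x, kind)
```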
|
Gulzar, Muhammad Ali |
ESEC/FSE '18-DEMO: "BigSift: Automated Debugging ..."
BigSift: Automated Debugging of Big Data Analytics in Data-Intensive Scalable Computing
Muhammad Ali Gulzar, Siman Wang, and Miryung Kim (University of California at Los Angeles, USA; Hunan University, China) Developing Big Data Analytics often involves trial and error debugging, due to the unclean nature of datasets or wrong assumptions made about data. When errors (e.g. program crash, outlier results, etc.) arise, developers are often interested in pinpointing the root cause of errors. To address this problem, BigSift takes an Apache Spark program, a user-defined test oracle function, and a dataset as input and outputs a minimum set of input records that reproduces the same test failure by combining the insights from delta debugging with data provenance. The technical contribution of BigSift is the design of systems optimizations that bring automated debugging closer to a reality for data intensive scalable computing. BigSift exposes an interactive web interface where a user can monitor a big data analytics job running remotely on the cloud, write a user-defined test oracle function, and then trigger the automated debugging process. BigSift also provides a set of predefined test oracle functions, which can be used for explaining common types of anomalies in big data analytics--for example, finding the origin of the output value that is more than k standard deviations away from the median. The demonstration video is available at https://youtu.be/jdBsCd61a1Q. @InProceedings{ESEC/FSE18p863, author = {Muhammad Ali Gulzar and Siman Wang and Miryung Kim}, title = {BigSift: Automated Debugging of Big Data Analytics in Data-Intensive Scalable Computing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {863--866}, doi = {10.1145/3236024.3264586}, year = {2018}, } Publisher's Version |
|
Guo, Jianmin |
ESEC/FSE '18-NIER: "DLFuzz: Differential Fuzzing ..."
DLFuzz: Differential Fuzzing Testing of Deep Learning Systems
Jianmin Guo, Yu Jiang, Yue Zhao, Quan Chen, and Jiaguang Sun (Tsinghua University, China; Shanghai Jiao Tong University, China) Deep learning (DL) systems are increasingly applied to safety-critical domains such as autonomous driving cars. It is of significant importance to ensure the reliability and robustness of DL systems. Existing testing methodologies always fail to include rare inputs in the testing dataset and exhibit low neuron coverage. In this paper, we propose DLFuzz, the first differential fuzzing testing framework to guide DL systems exposing incorrect behaviors. DLFuzz keeps minutely mutating the input to maximize the neuron coverage and the prediction difference between the original input and the mutated input, without manual labeling effort or cross-referencing oracles from other DL systems with the same functionality. We present empirical evaluations on two well-known datasets to demonstrate its efficiency. Compared with DeepXplore, the state-of-the-art DL whitebox testing framework, DLFuzz does not require extra effort to find functionally similar DL systems for cross-referencing checks, but could generate 338.59% more adversarial inputs with 89.82% smaller perturbations, obtain 2.86% higher neuron coverage on average, and save 20.11% of the time consumption. @InProceedings{ESEC/FSE18p739, author = {Jianmin Guo and Yu Jiang and Yue Zhao and Quan Chen and Jiaguang Sun}, title = {DLFuzz: Differential Fuzzing Testing of Deep Learning Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {739--743}, doi = {10.1145/3236024.3264835}, year = {2018}, } Publisher's Version |
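A gradient-free Python sketch of the joint objective behind this kind of fuzzing (the actual tool ascends the gradient of a similar objective with respect to the input): reward mutations that push the prediction away from the original class and that activate previously uncovered neurons. The `model` callable, returning class probabilities and neuron activations, is an assumption of this sketch.

```python
import random

def objective(probs, orig_label, activations, covered, lam=0.5, threshold=0.5):
    pred_diff = 1.0 - probs[orig_label]               # push away from original class
    new_neurons = sum(1 for i, a in enumerate(activations)
                      if a > threshold and i not in covered)
    return pred_diff + lam * new_neurons              # joint objective

def fuzz_like(model, seed, steps=200, eps=0.05):
    probs, acts = model(seed)
    orig_label = max(range(len(probs)), key=probs.__getitem__)
    covered = {i for i, a in enumerate(acts) if a > 0.5}
    best, best_score = list(seed), objective(probs, orig_label, acts, covered)
    for _ in range(steps):
        cand = [x + random.uniform(-eps, eps) for x in best]   # minute mutation
        probs, acts = model(cand)
        score = objective(probs, orig_label, acts, covered)
        if score > best_score:                                  # keep improving mutants
            best, best_score = cand, score
            covered |= {i for i, a in enumerate(acts) if a > 0.5}
    return best
```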
|
Guo, Shengjian |
ESEC/FSE '18: "Adversarial Symbolic Execution ..."
Adversarial Symbolic Execution for Detecting Concurrency-Related Cache Timing Leaks
Shengjian Guo, Meng Wu, and Chao Wang (Virginia Tech, USA; University of Southern California, USA) The timing characteristics of cache, a high-speed storage between the fast CPU and the slow memory, may reveal sensitive information of a program, thus allowing an adversary to conduct side-channel attacks. Existing methods for detecting timing leaks either ignore cache all together or focus only on passive leaks generated by the program itself, without considering leaks that are made possible by concurrently running some other threads. In this work, we show that timing-leak-freedom is not a compositional property: a program that is not leaky when running alone may become leaky when interleaved with other threads. Thus, we develop a new method, named adversarial symbolic execution, to detect such leaks. It systematically explores both the feasible program paths and their interleavings while modeling the cache, and leverages an SMT solver to decide if there are timing leaks. We have implemented our method in LLVM and evaluated it on a set of real-world ciphers with 14,455 lines of C code in total. Our experiments demonstrate both the efficiency of our method and its effectiveness in detecting side-channel leaks. @InProceedings{ESEC/FSE18p377, author = {Shengjian Guo and Meng Wu and Chao Wang}, title = {Adversarial Symbolic Execution for Detecting Concurrency-Related Cache Timing Leaks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {377--388}, doi = {10.1145/3236024.3236028}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Guo, Yao |
ESEC/FSE '18: "FraudDroid: Automated Ad Fraud ..."
FraudDroid: Automated Ad Fraud Detection for Android Apps
Feng Dong, Haoyu Wang, Li Li, Yao Guo, Tegawendé F. Bissyandé, Tianming Liu, Guoai Xu, and Jacques Klein (Beijing University of Posts and Telecommunications, China; Monash University, Australia; Peking University, China; University of Luxembourg, Luxembourg) Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widely spread in apps, have not yet been explored nor addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose, FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffics, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (∼ 93%) and recall (∼ 92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection. @InProceedings{ESEC/FSE18p257, author = {Feng Dong and Haoyu Wang and Li Li and Yao Guo and Tegawendé F. Bissyandé and Tianming Liu and Guoai Xu and Jacques Klein}, title = {FraudDroid: Automated Ad Fraud Detection for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--268}, doi = {10.1145/3236024.3236045}, year = {2018}, } Publisher's Version |
|
Gusmanov, Kamill |
ESEC/FSE '18-SRC: "On the Adoption of Neural ..."
On the Adoption of Neural Networks in Modeling Software Reliability
Kamill Gusmanov (Innopolis University, Russia) This work models the reliability of software systems using recurrent neural networks with long short-term memory (LSTM) units trained with truncated backpropagation, as well as an encoder-decoder LSTM architecture. It further proposes two variants: an LSTM that uses software reliability functions as activation functions, and an LSTM whose input features are the outputs of software reliability functions. An initial evaluation on data coming from 4 industrial projects is also provided. @InProceedings{ESEC/FSE18p962, author = {Kamill Gusmanov}, title = {On the Adoption of Neural Networks in Modeling Software Reliability}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {962--964}, doi = {10.1145/3236024.3275433}, year = {2018}, } Publisher's Version |
|
Han, Shi |
ESEC/FSE '18-IND: "Automated Refactoring of Nested-IF ..."
Automated Refactoring of Nested-IF Formulae in Spreadsheets
Jie Zhang, Shi Han, Dan Hao, Lu Zhang, and Dongmei Zhang (Peking University, China; Microsoft Research, China) Spreadsheets are the most popular end-user programming software, where formulae act like programs and also have smells. One well-recognized smell is the use of nested-IF expressions, which have low readability and high cognitive cost for users, and are error-prone during reuse or maintenance. End users usually lack essential programming language knowledge to tackle or even realize this problem, yet no automatic approaches are currently available. This paper presents the first exploration of nested-IF usage in two large-scale spreadsheet corpora containing over 80,000 industry-level spreadsheets. It turns out that the use of nested-IF expressions is surprisingly common among end users. We then present an approach to tackling this problem through automatic formula refactoring. The general idea of the automatic approach is two-fold. First, we detect and remove logic redundancy based on the AST of a formula. Second, we identify higher-level semantics that have been represented with fragmented and scattered syntax, and reassemble the syntax using concise built-in functions. A comprehensive evaluation with over 28 million nested-IF formulae reveals that the approach is able to relieve the smell of over 90% of nested-IF formulae. @InProceedings{ESEC/FSE18p833, author = {Jie Zhang and Shi Han and Dan Hao and Lu Zhang and Dongmei Zhang}, title = {Automated Refactoring of Nested-IF Formulae in Spreadsheets}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {833--838}, doi = {10.1145/3236024.3275532}, year = {2018}, } Publisher's Version |
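The refactoring described above rewrites deeply nested IF formulae into flatter, more readable equivalents. The toy Python sketch below illustrates the flavor of such a rewrite only: it knows nothing about real spreadsheet ASTs, but shows how a chain modeled as nested (condition, then, else) triples can be flattened into an ordered list of condition/value pairs, the shape that a single IFS-style call would take. The data structure and names are invented for illustration.

```python
# A nested-IF chain modeled as ("IF", condition, then_value, else_branch),
# where else_branch is either another such tuple or a plain value.
NESTED = ("IF", 'A1>=90', '"A"',
          ("IF", 'A1>=80', '"B"',
           ("IF", 'A1>=70', '"C"', '"F"')))

def flatten(node):
    """Turn a nested-IF chain into an ordered list of (condition, value) pairs
    plus a default value, i.e., the information a single IFS-style call needs."""
    pairs = []
    while isinstance(node, tuple) and node[0] == "IF":
        _, cond, then_value, node = node
        pairs.append((cond, then_value))
    return pairs, node            # node is now the final default value

def to_ifs(pairs, default):
    args = [p for cond, val in pairs for p in (cond, val)]
    args += ["TRUE", default]     # trailing catch-all branch
    return "IFS(" + ", ".join(args) + ")"

if __name__ == "__main__":
    pairs, default = flatten(NESTED)
    print(to_ifs(pairs, default))
    # -> IFS(A1>=90, "A", A1>=80, "B", A1>=70, "C", TRUE, "F")
```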
|
Hao, Dan |
ESEC/FSE '18-IND: "Automated Refactoring of Nested-IF ..."
Automated Refactoring of Nested-IF Formulae in Spreadsheets
Jie Zhang, Shi Han, Dan Hao, Lu Zhang, and Dongmei Zhang (Peking University, China; Microsoft Research, China) Spreadsheets are the most popular end-user programming software, where formulae act like programs and also have smells. One well-recognized smell is the use of nested-IF expressions, which have low readability and high cognitive cost for users, and are error-prone during reuse or maintenance. End users usually lack essential programming language knowledge to tackle or even realize this problem, yet no automatic approaches are currently available. This paper presents the first exploration of nested-IF usage in two large-scale spreadsheet corpora containing over 80,000 industry-level spreadsheets. It turns out that the use of nested-IF expressions is surprisingly common among end users. We then present an approach to tackling this problem through automatic formula refactoring. The general idea of the automatic approach is two-fold. First, we detect and remove logic redundancy based on the AST of a formula. Second, we identify higher-level semantics that have been represented with fragmented and scattered syntax, and reassemble the syntax using concise built-in functions. A comprehensive evaluation with over 28 million nested-IF formulae reveals that the approach is able to relieve the smell of over 90% of nested-IF formulae. @InProceedings{ESEC/FSE18p833, author = {Jie Zhang and Shi Han and Dan Hao and Lu Zhang and Dongmei Zhang}, title = {Automated Refactoring of Nested-IF Formulae in Spreadsheets}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {833--838}, doi = {10.1145/3236024.3275532}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Optimizing Test Prioritization ..." Optimizing Test Prioritization via Test Distribution Analysis Junjie Chen, Yiling Lou, Lingming Zhang, Jianyi Zhou, Xiaoleng Wang, Dan Hao, and Lu Zhang (Peking University, China; University of Texas at Dallas, USA; Baidu, China) Test prioritization aims to detect regression faults faster via reordering test executions, and a large number of test prioritization techniques have been proposed accordingly. However, test prioritization effectiveness is usually measured in terms of the average percentage of faults detected with respect to the number of test executions, rather than the actual regression testing time, making it unclear which technique is optimal in actual regression testing time. To answer this question, this paper first conducts an empirical study to investigate the actual regression testing time of various prioritization techniques. The results reveal a number of practical guidelines. In particular, no prioritization technique can always perform optimally in practice. To achieve the optimal prioritization effectiveness for any given project in practice, based on the findings of this study, we design learning-based Predictive Test Prioritization (PTP). PTP predicts the optimal prioritization technique for a given project based on the test distribution analysis (i.e., the distribution of test coverage, testing time, and coverage per unit time). The results show that PTP correctly predicts the optimal prioritization technique for 46 out of 50 open-source projects from GitHub, outperforming state-of-the-art techniques significantly in regression testing time, e.g., 43.16% to 94.92% improvement in detecting the first regression fault.
Furthermore, PTP has been successfully integrated into the practical testing infrastructure of Baidu (a search service provider with over 600M monthly active users), and received positive feedback from the company's testing team, e.g., saving more than 2X testing costs with negligible overhead. @InProceedings{ESEC/FSE18p656, author = {Junjie Chen and Yiling Lou and Lingming Zhang and Jianyi Zhou and Xiaoleng Wang and Dan Hao and Lu Zhang}, title = {Optimizing Test Prioritization via Test Distribution Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {656--667}, doi = {10.1145/3236024.3236053}, year = {2018}, } Publisher's Version |
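PTP, as described above, chooses among existing prioritization techniques based on distributions of coverage, testing time, and coverage per unit time. As background for readers unfamiliar with those underlying techniques, here is a small, self-contained Python sketch of one classic candidate, greedy "additional coverage per unit time" prioritization; the test data are made up and this is not PTP itself.

```python
# Each test: name, set of covered statements, and execution time in seconds (made-up data).
TESTS = {
    "t1": ({1, 2, 3, 4}, 2.0),
    "t2": ({3, 4, 5},    0.5),
    "t3": ({6, 7},       1.0),
    "t4": ({1, 2},       0.2),
}

def prioritize(tests):
    """Greedy additional-coverage-per-unit-time ordering."""
    remaining = dict(tests)
    covered, order = set(), []
    while remaining:
        def gain(item):
            name, (cov, secs) = item
            return len(cov - covered) / secs
        best = max(remaining.items(), key=gain)
        name, (cov, _) = best
        if not cov - covered:        # nothing new left: append the rest in a fixed order
            order.extend(sorted(remaining))
            break
        covered |= cov
        order.append(name)
        del remaining[name]
    return order

if __name__ == "__main__":
    print(prioritize(TESTS))         # -> ['t4', 't2', 't3', 't1']
```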
|
Hashimoto, Masatomo |
ESEC/FSE '18: "Automated Patch Extraction ..."
Automated Patch Extraction via Syntax- and Semantics-Aware Delta Debugging on Source Code Changes
Masatomo Hashimoto, Akira Mori, and Tomonori Izumida (Chiba Institute of Technology, Japan; AIST, Japan; IIJ Innovation Institute, Japan) Delta debugging (DD) is an approach to automating the debugging activities based on systematic testing. DD algorithms find the cause of a regression of a program by minimizing the changes applied between a working version and a faulty version of the program. However, it is still an open problem to minimize a huge set of changes while avoiding any invalid subsets that do not result in testable programs, especially in case that no software configuration management system is available. In this paper, we propose a rule-based approach to syntactic and semantic decomposition of changes into independent components to facilitate DD on source code changes, and hence to extract patches automatically. For analyzing changes, we make use of tree differencing on abstract syntax trees instead of common differencing on plain texts. We have developed an experimental implementation for Java programs and applied it to 194 bug fixes from Defects4J and 8 real-life regression bugs from 6 open source Java projects. Compared to a DD tool based on plain text differencing, it extracted patches whose size is reduced by 50% at the cost of 5% more test executions for the former dataset and by 73% at the cost of 40% more test executions for the latter, both on average. @InProceedings{ESEC/FSE18p598, author = {Masatomo Hashimoto and Akira Mori and Tomonori Izumida}, title = {Automated Patch Extraction via Syntax- and Semantics-Aware Delta Debugging on Source Code Changes}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {598--609}, doi = {10.1145/3236024.3236047}, year = {2018}, } Publisher's Version Info |
|
He, Liang |
ESEC/FSE '18-IND: "FACTS: Automated Black-Box ..."
FACTS: Automated Black-Box Testing of FinTech Systems
Qingshun Wang, Lintao Gu, Minhui Xue, Lihua Xu, Wenyu Niu, Liang Dou, Liang He, and Tao Xie (East China Normal University, China; Macquarie University, Australia; New York University Shanghai, China; CFETS Information Technology, China; University of Illinois at Urbana-Champaign, USA) FinTech, short for ``financial technology,'' has advanced the process of transforming financial business from a traditional manual-process-driven to an automation-driven model by providing various software platforms. However, the current FinTech-industry still heavily depends on manual testing, which becomes the bottleneck of FinTech industry development. To automate the testing process, we propose an approach of black-box testing for a FinTech system with effective tool support for both test generation and test oracles. For test generation, we first extract input categories from business-logic specifications, and then mutate real data collected from system logs with values randomly picked from each extracted input category. For test oracles, we propose a new technique of priority differential testing where we evaluate execution results of system-test inputs on the system's head (i.e., latest) version in the version repository (1) against the last legacy version in the version repository (only when the executed test inputs are on new, not-yet-deployed services) and (2) against both the currently-deployed version and the last legacy version (only when the test inputs are on existing, deployed services). When we rank the behavior-inconsistency results for developers to inspect, for the latter case, we give the currently-deployed version as a higher-priority source of behavior to check. We apply our approach to the CSTP subsystem, one of the largest data processing and forwarding modules of the China Foreign Exchange Trade System (CFETS) platform, whose annual total transaction volume reaches 150 trillion US dollars. Extensive experimental results show that our approach can substantially boost the branch coverage by approximately 40%, and is also efficient to identify common faults in the FinTech system. @InProceedings{ESEC/FSE18p839, author = {Qingshun Wang and Lintao Gu and Minhui Xue and Lihua Xu and Wenyu Niu and Liang Dou and Liang He and Tao Xie}, title = {FACTS: Automated Black-Box Testing of FinTech Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {839--844}, doi = {10.1145/3236024.3275533}, year = {2018}, } Publisher's Version |
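The priority differential testing described above compares the head version's behavior against the currently-deployed and last legacy versions and ranks behavior inconsistencies for inspection. A minimal, generic Python sketch of that comparison step follows; the three "versions" are stand-in functions and the ranking rule is a simplification of the idea in the abstract, not the FACTS implementation.

```python
# Stand-in "versions" of one service endpoint (purely illustrative).
def head_version(x):     return x * 2 if x != 3 else -1    # new behavior, with a seeded bug
def deployed_version(x): return x * 2
def legacy_version(x):   return x * 2

def differential_test(inputs, deployed_exists=True):
    """Run each input on all versions and rank behavior inconsistencies.

    Disagreements with the currently-deployed version are reported with
    higher priority than disagreements with the legacy version only."""
    reports = []
    for x in inputs:
        h = head_version(x)
        if deployed_exists and h != deployed_version(x):
            reports.append((0, x, "differs from deployed version"))   # priority 0: inspect first
        elif h != legacy_version(x):
            reports.append((1, x, "differs from legacy version only"))
    return sorted(reports)

if __name__ == "__main__":
    for priority, x, msg in differential_test(range(6)):
        print(f"[P{priority}] input={x}: {msg}")
```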
|
He, Shilin |
ESEC/FSE '18: "Identifying Impactful Service ..."
Identifying Impactful Service System Problems via Log Analysis
Shilin He, Qingwei Lin, Jian-Guang Lou, Hongyu Zhang, Michael R. Lyu, and Dongmei Zhang (Chinese University of Hong Kong, China; Microsoft Research, China; University of Newcastle, Australia) Logs are often used for troubleshooting in large-scale software systems. For a cloud-based online system that provides 24/7 service, a huge number of logs could be generated every day. However, these logs are highly imbalanced in general, because most logs indicate normal system operations, and only a small percentage of logs reveal impactful problems. Problems that lead to the decline of system KPIs (Key Performance Indicators) are impactful and should be fixed by engineers with a high priority. Furthermore, there are various types of system problems, which are hard to distinguish manually. In this paper, we propose Log3C, a novel clustering-based approach to promptly and precisely identify impactful system problems, by utilizing both log sequences (a sequence of log events) and system KPIs. More specifically, we design a novel cascading clustering algorithm, which greatly reduces clustering time while maintaining high accuracy by iteratively sampling, clustering, and matching log sequences. We then identify the impactful problems by correlating the clusters of log sequences with system KPIs. Log3C is evaluated on real-world log data collected from an online service system at Microsoft, and the results confirm its effectiveness and efficiency. Furthermore, our approach has been successfully applied in industrial practice. @InProceedings{ESEC/FSE18p60, author = {Shilin He and Qingwei Lin and Jian-Guang Lou and Hongyu Zhang and Michael R. Lyu and Dongmei Zhang}, title = {Identifying Impactful Service System Problems via Log Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {60--70}, doi = {10.1145/3236024.3236083}, year = {2018}, } Publisher's Version |
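Log3C's cascading clustering, as summarized above, iteratively samples log sequences, clusters the sample, and matches the remaining sequences to the discovered clusters to save time. The Python sketch below mimics that sample-cluster-match loop on toy "log sequences" (sets of event IDs) using Jaccard similarity; it illustrates the control flow only, with the data, similarity measure, and threshold all invented rather than taken from the paper.

```python
import random

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def cascading_cluster(sequences, sample_size=3, threshold=0.6, seed=0):
    """Iteratively: sample unclustered sequences, cluster the sample, then match
    all remaining sequences against the discovered cluster representatives."""
    random.seed(seed)
    unclustered = list(sequences)
    clusters = []                                   # each cluster: (representative, members)
    while unclustered:
        sample = random.sample(unclustered, min(sample_size, len(unclustered)))
        for seq in sample:                          # simple leader clustering within the sample
            for rep, members in clusters:
                if jaccard(seq, rep) >= threshold:
                    break
            else:
                clusters.append((seq, []))
        still_unmatched = []
        for seq in unclustered:                     # match everything against current representatives
            for rep, members in clusters:
                if jaccard(seq, rep) >= threshold:
                    members.append(seq)
                    break
            else:
                still_unmatched.append(seq)
        unclustered = still_unmatched
    return clusters

if __name__ == "__main__":
    logs = [{"E1", "E2"}, {"E1", "E2", "E3"}, {"E7", "E8"}, {"E7", "E8", "E9"}, {"E1", "E2"}]
    for rep, members in cascading_cluster(logs):
        print("cluster rep:", sorted(rep), "size:", len(members))
```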
|
He, Xiao |
ESEC/FSE '18: "Putback-Based Bidirectional ..."
Putback-Based Bidirectional Model Transformations
Xiao He and Zhenjiang Hu (University of Science and Technology Beijing, China; NII, Japan; University of Tokyo, Japan) Bidirectional model transformation (BX) plays a vital role in Model-Driven Engineering. A major challenge in conventional relational and bidirectionalization-based BX approaches is the ambiguity issue, i.e., the backward transformation may not be uniquely determined by the consistency relation or the forward transformation. A promising solution to the ambiguity issue is to adopt putback-based bidirectional programming, which realizes a BX by specifying the backward transformation. However, existing putback-based approaches do not support multiple conversions of the same node (namely a shared node). Since a model is a graph, shared nodes are very common and inevitable. Consequently, existing putback-based approaches cannot be directly applied to bidirectional model transformation. This paper proposes a novel approach to BX. We define a new model-merging-based BX combinator, which can combine two BXs that own shared nodes into a well-behaved composite BX. Afterwards, we propose a putback-based BX language, XMU, built on the model-merging-based BX combinator, to address the ambiguity issue. We present the formal semantics of XMU, which can be proven well-behaved. Finally, tool support is also introduced to illustrate the usefulness of our approach. @InProceedings{ESEC/FSE18p434, author = {Xiao He and Zhenjiang Hu}, title = {Putback-Based Bidirectional Model Transformations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {434--444}, doi = {10.1145/3236024.3236070}, year = {2018}, } Publisher's Version Info |
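Putback-based bidirectional programming, as used by XMU above, derives the forward transformation from a user-written backward (put) function and requires the pair to be well-behaved. The short Python sketch below shows the two standard round-trip laws (GetPut and PutGet) checked on a trivial lens over a dictionary "model"; it illustrates the laws only and has no connection to XMU's actual semantics or combinators.

```python
# A trivial lens: the "view" is the value of one field of a dict-shaped source model.
FIELD = "name"

def get(source):
    return source[FIELD]

def put(source, view):
    updated = dict(source)        # keep every other field of the source unchanged
    updated[FIELD] = view
    return updated

def well_behaved(source, view):
    get_put = put(source, get(source)) == source   # GetPut: putting back the current view changes nothing
    put_get = get(put(source, view)) == view       # PutGet: the new view is exactly what we just put
    return get_put and put_get

if __name__ == "__main__":
    model = {"name": "Order", "abstract": False}   # hypothetical model element
    print(well_behaved(model, "Invoice"))          # -> True
```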
|
Hebig, Regina |
ESEC/FSE '18: "Model Transformation Languages ..."
Model Transformation Languages under a Magnifying Glass: A Controlled Experiment with Xtend, ATL, and QVT
Regina Hebig, Christoph Seidl, Thorsten Berger, John Kook Pedersen, and Andrzej Wąsowski (Chalmers University of Technology, Sweden; University of Gothenburg, Sweden; TU Braunschweig, Germany; IT University of Copenhagen, Denmark) In Model-Driven Software Development, models are automatically processed to support the creation, build, and execution of systems. A large variety of dedicated model-transformation languages exists, promising to efficiently realize the automated processing of models. To investigate the actual benefit of using such specialized languages, we performed a large-scale controlled experiment in which over 78 subjects solve 231 individual tasks using three languages. The experiment sheds light on commonalities and differences between model transformation languages (ATL, QVT-O) and on benefits of using them in common development tasks (comprehension, change, and creation) against a modern general-purpose language (Xtend). Our results show no statistically significant benefit of using a dedicated transformation language over a modern general-purpose language. However, we were able to identify several aspects of transformation programming where domain-specific transformation languages do appear to help, including copying objects, context identification, and conditioning the computation on types. @InProceedings{ESEC/FSE18p445, author = {Regina Hebig and Christoph Seidl and Thorsten Berger and John Kook Pedersen and Andrzej Wąsowski}, title = {Model Transformation Languages under a Magnifying Glass: A Controlled Experiment with Xtend, ATL, and QVT}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {445--455}, doi = {10.1145/3236024.3236046}, year = {2018}, } Publisher's Version Info |
|
Hellendoorn, Vincent J. |
ESEC/FSE '18: "Deep Learning Type Inference ..."
Deep Learning Type Inference
Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis (University of California at Davis, USA; Microsoft Research, USA; University College London, UK; Microsoft Research, UK) Dynamically typed languages such as JavaScript and Python are increasingly popular, yet static typing has not been totally eclipsed: Python now supports type annotations and languages like TypeScript offer a middle-ground for JavaScript: a strict superset of JavaScript, to which it transpiles, coupled with a type system that permits partially typed programs. However, static typing has a cost: adding annotations, reading the added syntax, and wrestling with the type system to fix type errors. Type inference can ease the transition to more statically typed code and unlock the benefits of richer compile-time information, but is limited in languages like JavaScript as it cannot soundly handle duck-typing or runtime evaluation via eval. We propose DeepTyper, a deep learning model that understands which types naturally occur in certain contexts and relations and can provide type suggestions, which can often be verified by the type checker, even if it could not infer the type initially. DeepTyper, leverages an automatically aligned corpus of tokens and types to accurately predict thousands of variable and function type annotations. Furthermore, we demonstrate that context is key in accurately assigning these types and introduce a technique to reduce overfitting on local cues while highlighting the need for further improvements. Finally, we show that our model can interact with a compiler to provide more than 4,000 additional type annotations with over 95% precision that could not be inferred without the aid of DeepTyper. @InProceedings{ESEC/FSE18p152, author = {Vincent J. Hellendoorn and Christian Bird and Earl T. Barr and Miltiadis Allamanis}, title = {Deep Learning Type Inference}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {152--162}, doi = {10.1145/3236024.3236051}, year = {2018}, } Publisher's Version Info ESEC/FSE '18-NIER: "On the Naturalness of Proofs ..." On the Naturalness of Proofs Vincent J. Hellendoorn, Premkumar T. Devanbu, and Mohammad Amin Alipour (University of California at Davis, USA; University of Houston, USA) Proofs play a key role in reasoning about programs and verification of properties of systems. Mechanized proof assistants help users in developing and checking the consistency of proofs using the proof language developed by the systems; but even then writing proofs is tedious and could benefit from automated insight. In this paper, we analyze proofs in two different proof assistant systems (Coq and HOL Light) to investigate if there is evidence of "naturalness" in these proofs: viz., recurring linguistic patterns that are amenable to language models, in the way that programming languages are known to be. Such models could be used to find errors, rewrite proofs, help suggest dependencies, and perhaps even synthesize (steps of) proofs. We apply state-of-the-art language models to large corpora of proofs to show that this is indeed the case: proofs are remarkably predictable, much like other programming languages. Code completion tools for Coq proofs could save over 60% of typing effort. As proofs have become increasingly central to writing provably correct, large programs (such as the CompCert C compiler), our demonstration that they are amenable to general statistical models unlocks a range of linguistics-inspired tool support. 
@InProceedings{ESEC/FSE18p724, author = {Vincent J. Hellendoorn and Premkumar T. Devanbu and Mohammad Amin Alipour}, title = {On the Naturalness of Proofs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {724--728}, doi = {10.1145/3236024.3264832}, year = {2018}, } Publisher's Version |
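The naturalness result above rests on standard statistical language models over proof tokens. For readers who want a concrete feel for such a model, here is a minimal bigram model with add-one smoothing in Python, trained on a few made-up token sequences; the real study uses far larger corpora and stronger models, so this is illustrative only.

```python
import math
from collections import Counter

def train_bigrams(sequences):
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        toks = ["<s>"] + seq
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def log_prob(seq, unigrams, bigrams):
    """Add-one smoothed log-probability of a token sequence under the bigram model."""
    vocab = len(unigrams)
    toks = ["<s>"] + seq
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
               for a, b in zip(toks, toks[1:]))

if __name__ == "__main__":
    # Made-up "proof scripts" as token lists (illustrative only).
    corpus = [["intros", ".", "apply", "H", "."], ["intros", ".", "reflexivity", "."]]
    uni, bi = train_bigrams(corpus)
    seen   = ["intros", ".", "apply", "H", "."]
    unseen = ["destruct", "x", ";", "auto", "."]
    # The in-corpus sequence gets a higher (less negative) log-probability: it is more "natural".
    print(log_prob(seen, uni, bi), log_prob(unseen, uni, bi))
```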
|
Henkel, Jordan |
ESEC/FSE '18: "Code Vectors: Understanding ..."
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, and Thomas Reps (University of Wisconsin-Madison, USA; Microsoft, USA; GrammaTech, USA) With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions. @InProceedings{ESEC/FSE18p163, author = {Jordan Henkel and Shuvendu K. Lahiri and Ben Liblit and Thomas Reps}, title = {Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {163--174}, doi = {10.1145/3236024.3236085}, year = {2018}, } Publisher's Version Info Artifacts Functional |
|
Herbsleb, James |
ESEC/FSE '18: "Ecosystem-Level Determinants ..."
Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem
Marat Valiev, Bogdan Vasilescu, and James Herbsleb (Carnegie Mellon University, USA) Open-source projects do not exist in a vacuum. They benefit from reusing other projects and are themselves reused by others, creating complex networks of interdependencies, i.e., software ecosystems. Therefore, the sustainability of projects comprising ecosystems may no longer be determined solely by factors internal to the project, but rather by the ecosystem context as well. In this paper we report on a mixed-methods study of ecosystem-level factors affecting the sustainability of open-source Python projects. Quantitatively, using historical data from 46,547 projects in the PyPI ecosystem, we modeled the chances of project development entering a period of dormancy (limited activity) as a function of the projects' position in their dependency networks, organizational support, and other factors. Qualitatively, we triangulated the revealed effects and further expanded on our models through interviews with project maintainers. Results show that the number of project ties and the relative position in the dependency network have significant impact on sustained project activity, with nuanced effects early in a project's life cycle and later on. @InProceedings{ESEC/FSE18p644, author = {Marat Valiev and Bogdan Vasilescu and James Herbsleb}, title = {Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {644--655}, doi = {10.1145/3236024.3236062}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Hong, Kai |
ESEC/FSE '18-NIER: "Salient-Class Location: Help ..."
Salient-Class Location: Help Developers Understand Code Change in Code Review
Yuan Huang, Nan Jia, Xiangping Chen, Kai Hong, and Zibin Zheng (Sun Yat-sen University, China; Hebei GEO University, China) Code review involves a significant amount of human effort to understand the code change, because the information required to inspect code changes may distribute across multiple files that reviewers are not familiar with. Code changes are often organized as commits for review. In this paper, we found that most of the commits contain a salient class, which is saliently modified and causes the modification of the rest classes in a commit. Our user studies confirmed that identifying the salient class in a commit can facilitate reviewers in understanding code change. We model the salient class identification as a binary classification problem and extract a number of discriminative features from commit to characterize the salience of a class. The initial experiment result shows that the proposed approach can improve the efficiency of reviewers understanding code changes in code review. @InProceedings{ESEC/FSE18p770, author = {Yuan Huang and Nan Jia and Xiangping Chen and Kai Hong and Zibin Zheng}, title = {Salient-Class Location: Help Developers Understand Code Change in Code Review}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {770--774}, doi = {10.1145/3236024.3264841}, year = {2018}, } Publisher's Version |
|
Hong, Seongjoon |
ESEC/FSE '18: "MemFix: Static Analysis-Based ..."
MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C
Junhee Lee, Seongjoon Hong, and Hakjoo Oh (Korea University, South Korea) We present MemFix, an automated technique for fixing memory deallocation errors in C programs. MemFix aims to fix memory-leak, double-free, and use-after-free errors, which occur when developers fail to properly deallocate memory objects. MemFix attempts to fix these errors by finding a set of free-statements that correctly deallocate all allocated objects without causing double-frees and use-after-frees. The key insight behind MemFix is that finding such a set of deallocation statements corresponds to solving an exact cover problem derived from a variant of typestate static analysis. We formally present the technique and experimentally show that MemFix is able to fix real errors found in open-source programs. Because MemFix is based on a sound static analysis, the generated patches are guaranteed to fix the original errors without introducing new ones. @InProceedings{ESEC/FSE18p95, author = {Junhee Lee and Seongjoon Hong and Hakjoo Oh}, title = {MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--106}, doi = {10.1145/3236024.3236079}, year = {2018}, } Publisher's Version |
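MemFix's key insight above is that choosing free-statements reduces to an exact cover problem. The sketch below solves a toy instance of exact cover with a plain recursive search in Python: each candidate free-statement "covers" a set of allocated objects, and a solution covers every object exactly once. The candidates, objects, and program points are invented; this shows the underlying combinatorial problem, not MemFix itself.

```python
def exact_cover(objects, candidates, chosen=None):
    """Return a list of candidate names whose covered objects partition `objects`,
    or None if no exact cover exists. `candidates` maps name -> frozenset of objects."""
    chosen = chosen or []
    if not objects:
        return list(chosen)
    target = min(objects)                           # branch on one uncovered object
    for name, covers in candidates.items():
        if target in covers and covers <= objects:  # never cover an object twice (no double-free)
            result = exact_cover(objects - covers, candidates, chosen + [name])
            if result is not None:
                return result
    return None

if __name__ == "__main__":
    # Hypothetical allocated objects and candidate free-statements (program points).
    objects = frozenset({"p", "q", "r"})
    candidates = {
        "free(p) at L10":         frozenset({"p"}),
        "free(p),free(q) at L20": frozenset({"p", "q"}),
        "free(q) at L25":         frozenset({"q"}),
        "free(r) at L30":         frozenset({"r"}),
    }
    print(exact_cover(objects, candidates))
```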
|
Hosseini, Mitra Bokaei |
ESEC/FSE '18-DOC: "Semantic Inference from Natural ..."
Semantic Inference from Natural Language Privacy Policies and Android Code
Mitra Bokaei Hosseini (University of Texas at San Antonio, USA) Mobile apps collect different categories of personal information to provide users with various services. Companies use privacy policies containing critical requirements to inform users about their data practices. With the growing access to personal information and the scale of mobile app deployment, traceability of links between privacy policy requirements and app code is increasingly important. Automated traceability can be achieved using natural language processing and code analysis techniques. However, such techniques must address two main challenges: ambiguity in privacy policy terminology and unbounded information types provided by users through input fields in GUI. In this work, we propose approaches to interpret abstract terms in privacy policies, identify information types in Android layout code, and create a mapping between them using natural language processing techniques. @InProceedings{ESEC/FSE18p940, author = {Mitra Bokaei Hosseini}, title = {Semantic Inference from Natural Language Privacy Policies and Android Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {940--943}, doi = {10.1145/3236024.3275427}, year = {2018}, } Publisher's Version |
|
Hsieh, Ken |
ESEC/FSE '18: "Predicting Node Failure in ..."
Predicting Node Failure in Cloud Service Systems
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, and Dongmei Zhang (Microsoft Research, China; Microsoft, USA; University of Newcastle, Australia) In recent years, many traditional software systems have migrated to cloud computing platforms and are provided as online services. The service quality matters because system failures could seriously affect business and user experience. A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability. Predicting node failure in cloud service systems is challenging, because a node failure could be caused by a variety of reasons and reflected by many temporal and spatial signals. Furthermore, the failure data is highly imbalanced. To tackle these challenges, we propose MING, a novel technique that combines: 1) a LSTM model to incorporate the temporal data, 2) a Random Forest model to incorporate spatial data; 3) a ranking model that embeds the intermediate results of the two models as feature inputs and ranks the nodes by their failure-proneness, 4) a cost-sensitive function to identify the optimal threshold for selecting the faulty nodes. We evaluate our approach using real-world data collected from a cloud service system. The results confirm the effectiveness of the proposed approach. We have also successfully applied the proposed approach in real industrial practice. @InProceedings{ESEC/FSE18p480, author = {Qingwei Lin and Ken Hsieh and Yingnong Dang and Hongyu Zhang and Kaixin Sui and Yong Xu and Jian-Guang Lou and Chenggang Li and Youjiang Wu and Randolph Yao and Murali Chintalapati and Dongmei Zhang}, title = {Predicting Node Failure in Cloud Service Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {480--490}, doi = {10.1145/3236024.3236060}, year = {2018}, } Publisher's Version |
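Component 4 of MING above selects a decision threshold with a cost-sensitive function. The snippet below shows the general idea in isolation: given predicted failure probabilities, hypothetical per-node costs for missed failures and for unnecessary migrations, pick the threshold that minimizes expected cost on labeled history. All numbers and cost values are invented, and this is not MING's actual function.

```python
def expected_cost(scores, labels, threshold, miss_cost=10.0, false_alarm_cost=1.0):
    """Cost of predicting 'will fail' whenever score >= threshold."""
    cost = 0.0
    for score, failed in zip(scores, labels):
        predicted = score >= threshold
        if failed and not predicted:
            cost += miss_cost          # failure we did not act on
        elif predicted and not failed:
            cost += false_alarm_cost   # unnecessary VM migration
    return cost

def best_threshold(scores, labels):
    candidates = sorted(set(scores)) + [1.1]       # 1.1 means "predict no failures at all"
    return min(candidates, key=lambda t: expected_cost(scores, labels, t))

if __name__ == "__main__":
    # Hypothetical node failure probabilities and ground-truth outcomes.
    scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
    labels = [True, True, False, True, False, False]
    t = best_threshold(scores, labels)
    print("chosen threshold:", t, "cost:", expected_cost(scores, labels, t))
```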
|
Hu, Gang |
ESEC/FSE '18: "AppFlow: Using Machine Learning ..."
AppFlow: Using Machine Learning to Synthesize Robust, Reusable UI Tests
Gang Hu, Linjie Zhu, and Junfeng Yang (Columbia University, USA) UI testing is known to be difficult, especially as today’s development cycles become faster. Manual UI testing is tedious, costly and error-prone. Automated UI tests are costly to write and maintain. This paper presents AppFlow, a system for synthesizing highly robust, highly reusable UI tests. It leverages machine learning to automatically recognize common screens and widgets, relieving developers from writing ad hoc, fragile logic to use them in tests. It enables developers to write a library of modular tests for the main functionality of an app category (e.g., an “add to cart” test for shopping apps). It can then quickly test a new app in the same category by synthesizing full tests from the modular ones in the library. By focusing on the main functionality, AppFlow provides “smoke testing” requiring little manual work. Optionally, developers can customize AppFlow by adding app-specific tests for completeness. We evaluated AppFlow on 60 popular apps in the shopping and the news category, two case studies on the BBC news app and the JackThreads shopping app, and a user study of 15 subjects on the Wish shopping app. Results show that AppFlow accurately recognizes screens and widgets, synthesizes highly robust and reusable tests, covers 46.6% of all automatable tests for Jackthreads with the tests it synthesizes, and reduces the effort to test a new app by up to 90%. Interestingly, it found eight bugs in the evaluated apps, including seven functionality bugs, even though they were publicly released and supposedly went through thorough testing. @InProceedings{ESEC/FSE18p269, author = {Gang Hu and Linjie Zhu and Junfeng Yang}, title = {AppFlow: Using Machine Learning to Synthesize Robust, Reusable UI Tests}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {269--282}, doi = {10.1145/3236024.3236055}, year = {2018}, } Publisher's Version |
|
Hu, Qingda |
ESEC/FSE '18-NIER: "Towards Quantifying the Development ..."
Towards Quantifying the Development Value of Code Contributions
Jinglei Ren, Hezheng Yin, Qingda Hu, Armando Fox, and Wojciech Koszek (Microsoft Research, China; University of California at Berkeley, USA; Tsinghua University, China; FreeBSD Project, USA) Quantifying the value of developers’ code contributions to a software project requires more than simply counting lines of code or commits. We define the development value of code as a combination of its structural value (the effect of code reuse) and its non-structural value (the impact on development). We propose techniques to automatically calculate both components of development value and combine them using Learning to Rank. Our preliminary empirical study shows that our analysis yields richer results than those obtained by human assessment or simple counting methods and demonstrates the potential of our approach. @InProceedings{ESEC/FSE18p775, author = {Jinglei Ren and Hezheng Yin and Qingda Hu and Armando Fox and Wojciech Koszek}, title = {Towards Quantifying the Development Value of Code Contributions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {775--779}, doi = {10.1145/3236024.3264842}, year = {2018}, } Publisher's Version |
|
Hu, Yu |
ESEC/FSE '18: "Bug Synthesis: Challenging ..."
Bug Synthesis: Challenging Bug-Finding Tools with Deep Faults
Subhajit Roy, Awanish Pandey, Brendan Dolan-Gavitt, and Yu Hu (IIT Kanpur, India; New York University, USA) In spite of decades of research in bug detection tools, there is a surprising dearth of ground-truth corpora that can be used to evaluate the efficacy of such tools. Recently, systems such as LAVA and EvilCoder have been proposed to automatically inject bugs into software to quickly generate large bug corpora, but the bugs created so far differ from naturally occurring bugs in a number of ways. In this work, we propose a new automated bug injection system, Apocalypse, that uses formal techniques—symbolic execution, constraint-based program synthesis and model counting—to automatically inject fair (can potentially be discovered by current bug-detection tools), deep (requiring a long sequence of dependencies to be satisfied to fire), uncorrelated (each bug behaving independently of others), reproducible (a trigger input being available) and rare (can be triggered by only a few program inputs) bugs in large software code bases. In our evaluation, we inject bugs into thirty Coreutils programs as well as the TCAS test suite. We find that bugs synthesized by Apocalypse are highly realistic under a variety of metrics, that they do not favor a particular bug-finding strategy (unlike bugs produced by LAVA), and that they are more difficult to find than manually injected bugs, requiring around 240× more tests to discover with a state-of-the-art symbolic execution tool. @InProceedings{ESEC/FSE18p224, author = {Subhajit Roy and Awanish Pandey and Brendan Dolan-Gavitt and Yu Hu}, title = {Bug Synthesis: Challenging Bug-Finding Tools with Deep Faults}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {224--234}, doi = {10.1145/3236024.3236084}, year = {2018}, } Publisher's Version |
|
Hu, Zhenjiang |
ESEC/FSE '18: "Putback-Based Bidirectional ..."
Putback-Based Bidirectional Model Transformations
Xiao He and Zhenjiang Hu (University of Science and Technology Beijing, China; NII, Japan; University of Tokyo, Japan) Bidirectional model transformation (BX) plays a vital role in Model-Driven Engineering. A major challenge in conventional relational and bidirectionalization-based BX approaches is the ambiguity issue, i.e., the backward transformation may not be uniquely determined by the consistency relation or the forward transformation. A promising solution to the ambiguity issue is to adopt putback-based bidirectional programming, which realizes a BX by specifying the backward transformation. However, existing putback-based approaches do not support multiple conversions of the same node (namely a shared node). Since a model is a graph, shared nodes are very common and inevitable. Consequently, existing putback-based approaches cannot be directly applied to bidirectional model transformation. This paper proposes a novel approach to BX. We define a new model-merging-based BX combinator, which can combine two BXs that own shared nodes into a well-behaved composite BX. Afterwards, we propose a putback-based BX language, XMU, built on the model-merging-based BX combinator, to address the ambiguity issue. We present the formal semantics of XMU, which can be proven well-behaved. Finally, tool support is also introduced to illustrate the usefulness of our approach. @InProceedings{ESEC/FSE18p434, author = {Xiao He and Zhenjiang Hu}, title = {Putback-Based Bidirectional Model Transformations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {434--444}, doi = {10.1145/3236024.3236070}, year = {2018}, } Publisher's Version Info |
|
Hua, Jinru |
ESEC/FSE '18-DEMO: "SketchFix: A Tool for Automated ..."
SketchFix: A Tool for Automated Program Repair Approach using Lazy Candidate Generation
Jinru Hua, Mengshi Zhang, Kaiyuan Wang, and Sarfraz Khurshid (University of Texas at Austin, USA) Manually locating and removing bugs in a faulty program is often tedious and error-prone. A common automated program repair approach called generate-and-validate (G&V) iteratively creates candidate fixes, compiles them, and runs these candidates against the given tests. This approach can be costly due to a large number of re-compilations and re-executions of the program. To tackle this limitation, recent work introduced the SketchFix approach that tightly integrates the generation and validation phases, and utilizes runtime behaviors to substantially prune a large amount of repair candidates. This tool paper describes our Java implementation of SketchFix, which is an open-source library that we released on GitHub. Our experimental evaluation using the Defects4J benchmark shows that SketchFix can significantly reduce the number of re-compilations and re-executions compared to other approaches and work particularly well in repairing expression manipulation at the AST node-level granularity. The demo video is at: https://youtu.be/AO-YCH8vGzQ. @InProceedings{ESEC/FSE18p888, author = {Jinru Hua and Mengshi Zhang and Kaiyuan Wang and Sarfraz Khurshid}, title = {SketchFix: A Tool for Automated Program Repair Approach using Lazy Candidate Generation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {888--891}, doi = {10.1145/3236024.3264600}, year = {2018}, } Publisher's Version Video |
|
Huang, Jeff |
ESEC/FSE '18: "DeepSim: Deep Learning Code ..."
DeepSim: Deep Learning Code Functional Similarity
Gang Zhao and Jeff Huang (Texas A&M University, USA) Measuring code similarity is fundamental for many software engineering tasks, e.g., code search, refactoring and reuse. However, most existing techniques focus on code syntactical similarity only, while measuring code functional similarity remains a challenging problem. In this paper, we propose a novel approach that encodes code control flow and data flow into a semantic matrix in which each element is a high dimensional sparse binary feature vector, and we design a new deep learning model that measures code functional similarity based on this representation. By concatenating hidden representations learned from a code pair, this new model transforms the problem of detecting functionally similar code to binary classification, which can effectively learn patterns between functionally similar code with very different syntactics. We have implemented our approach, DeepSim, for Java programs and evaluated its recall, precision and time performance on two large datasets of functionally similar code. The experimental results show that DeepSim significantly outperforms existing state-of-the-art techniques, such as DECKARD, RtvNN, CDLH, and two baseline deep neural networks models. @InProceedings{ESEC/FSE18p141, author = {Gang Zhao and Jeff Huang}, title = {DeepSim: Deep Learning Code Functional Similarity}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {141--151}, doi = {10.1145/3236024.3236068}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Concurrency Verification with ..." Concurrency Verification with Maximal Path Causality Qiuping Yi and Jeff Huang (Texas A&M University, USA) We present a technique that systematically explores the state spaces of concurrent programs across both the schedule space and the input space. The cornerstone is a new model called Maximal Path Causality (MPC), which captures all combinations of thread schedules and program inputs that reach the same path as one equivalency class, and generates a unique schedule+input combination to explore each path. Moreover, the exploration for different paths can be easily parallelized. Our extensive evaluation on both popular concurrency benchmarks and real-world C/C++ applications shows that MPC significantly improves the performance of existing techniques. @InProceedings{ESEC/FSE18p366, author = {Qiuping Yi and Jeff Huang}, title = {Concurrency Verification with Maximal Path Causality}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {366--376}, doi = {10.1145/3236024.3236048}, year = {2018}, } Publisher's Version |
|
Huang, Jianjun |
ESEC/FSE '18: "NAR-Miner: Discovering Negative ..."
NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection
Pan Bian, Bin Liang, Wenchang Shi, Jianjun Huang, and Yan Cai (Renmin University of China, China; Institute of Software at Chinese Academy of Sciences, China) Inferring programming rules from source code based on data mining techniques has been proven to be effective in detecting software bugs. Existing studies focus on discovering positive rules in the form of A ⇒ B, indicating that when operation A appears, operation B should also appear. Unfortunately, the negative rules (A ⇒ ¬ B), indicating the mutual suppression or conflict relationships among program elements, have not received the attention they deserve. In fact, violating such negative rules can also result in serious bugs. In this paper, we propose a novel method called NAR-Miner to automatically extract negative association programming rules from large-scale systems, and detect their violations to find bugs. However, mining negative rules faces a more serious rule explosion problem than mining positive ones. Most of the obtained negative rules are uninteresting and can lead to unacceptable false alarms. To address the issue, we design a semantics-constrained mining algorithm to focus rule mining on the elements with strong semantic relationships. Furthermore, we introduce information entropy to rank candidate negative rules and highlight the interesting ones. Consequently, we effectively mitigate the rule explosion problem. We implement NAR-Miner and apply it to a Linux kernel (v4.12-rc6). The experiments show that the uninteresting rules are dramatically reduced and 17 detected violations have been confirmed as real bugs and patched by kernel community. We also apply NAR-Miner to PostgreSQL, OpenSSL and FFmpeg and discover six real bugs. @InProceedings{ESEC/FSE18p411, author = {Pan Bian and Bin Liang and Wenchang Shi and Jianjun Huang and Yan Cai}, title = {NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {411--422}, doi = {10.1145/3236024.3236032}, year = {2018}, } Publisher's Version |
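NAR-Miner, summarized above, mines rules of the form A ⇒ ¬B and ranks them. A stripped-down Python sketch of the mining core follows: from a set of "functions" (transactions of performed operations), it reports pairs where A is frequent but B almost never co-occurs with A. The thresholds and toy data are invented, and the entropy-based ranking and semantic constraints of the real tool are omitted.

```python
from itertools import permutations

def mine_negative_rules(transactions, min_support=3, min_confidence=0.9):
    """Find rules A => not B: A appears often, but B rarely appears alongside A."""
    items = {op for t in transactions for op in t}
    rules = []
    for a, b in permutations(sorted(items), 2):
        with_a = [t for t in transactions if a in t]
        if len(with_a) < min_support:
            continue
        without_b = sum(1 for t in with_a if b not in t)
        confidence = without_b / len(with_a)
        # b must still be globally common enough for the rule to be interesting.
        if confidence >= min_confidence and sum(1 for t in transactions if b in t) >= min_support:
            rules.append((a, b, confidence))
    return rules

if __name__ == "__main__":
    # Toy "functions", each a set of operations it performs (illustrative only).
    funcs = [
        {"lock", "do_work", "unlock"}, {"lock", "unlock"}, {"lock", "do_work", "unlock"},
        {"spin_lock", "do_work", "spin_unlock"}, {"spin_lock", "spin_unlock"},
        {"spin_lock", "do_work", "spin_unlock"},
    ]
    for a, b, conf in mine_negative_rules(funcs):
        print(f"{a} => NOT {b}   (confidence {conf:.2f})")
```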
|
Huang, Ruirui |
ESEC/FSE '18: "An Empirical Study on Crash ..."
An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems
Yu Gao, Wensheng Dou, Feng Qin, Chushu Gao, Dong Wang, Jun Wei, Ruirui Huang, Li Zhou, and Yongming Wu (Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Ohio State University, USA; Alibaba Group, China) In large-scale distributed systems, node crashes are inevitable, and can happen at any time. As such, distributed systems are usually designed to be resilient to these node crashes via various crash recovery mechanisms, such as write-ahead logging in HBase and hinted handoffs in Cassandra. However, faults in crash recovery mechanisms and their implementations can introduce intricate crash recovery bugs, and lead to severe consequences. In this paper, we present CREB, the most comprehensive study on 103 Crash REcovery Bugs from four popular open-source distributed systems, including ZooKeeper, Hadoop MapReduce, Cassandra and HBase. For all the studied bugs, we analyze their root causes, triggering conditions, bug impacts and fixing. Through this study, we obtain many interesting findings that can open up new research directions for combating crash recovery bugs. @InProceedings{ESEC/FSE18p539, author = {Yu Gao and Wensheng Dou and Feng Qin and Chushu Gao and Dong Wang and Jun Wei and Ruirui Huang and Li Zhou and Yongming Wu}, title = {An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {539--550}, doi = {10.1145/3236024.3236030}, year = {2018}, } Publisher's Version |
|
Huang, Tao |
ESEC/FSE '18-NIER: "How Are Spreadsheet Templates ..."
How Are Spreadsheet Templates Used in Practice: A Case Study on Enron
Liang Xu, Wensheng Dou, Jiaxin Zhu, Chushu Gao, Jun Wei, and Tao Huang (University of Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China) To reduce the effort of creating similar spreadsheets, end users may create expected spreadsheets from some predesigned templates, which contain necessary table layouts (e.g., headers and styles) and formulas, rather than from scratch. When there are no explicitly predesigned spreadsheet templates, end users often take an existing spreadsheet as the instance template to create a new spreadsheet. However, improper template design and usage can introduce various issues. For example, a formula error in the template can be easily propagated to all its instances without users noticing. Since template design and usage are rarely documented in literature and practice, practitioners and researchers lack understanding of them to achieve effective improvement. In this paper, we conduct the first empirical study on the design and the usage of spreadsheet templates based on 47 predesigned templates (490 instances in total), and 21 instance template groups (168 template and instance pairs in total), extracted from the Enron corpus. Our study reveals a number of spreadsheet template design and usage issues in practice, and also sheds light on several interesting research directions. @InProceedings{ESEC/FSE18p734, author = {Liang Xu and Wensheng Dou and Jiaxin Zhu and Chushu Gao and Jun Wei and Tao Huang}, title = {How Are Spreadsheet Templates Used in Practice: A Case Study on Enron}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {734--738}, doi = {10.1145/3236024.3264834}, year = {2018}, } Publisher's Version |
|
Huang, Yuan |
ESEC/FSE '18-NIER: "Salient-Class Location: Help ..."
Salient-Class Location: Help Developers Understand Code Change in Code Review
Yuan Huang, Nan Jia, Xiangping Chen, Kai Hong, and Zibin Zheng (Sun Yat-sen University, China; Hebei GEO University, China) Code review involves a significant amount of human effort to understand the code change, because the information required to inspect code changes may distribute across multiple files that reviewers are not familiar with. Code changes are often organized as commits for review. In this paper, we found that most of the commits contain a salient class, which is saliently modified and causes the modification of the rest classes in a commit. Our user studies confirmed that identifying the salient class in a commit can facilitate reviewers in understanding code change. We model the salient class identification as a binary classification problem and extract a number of discriminative features from commit to characterize the salience of a class. The initial experiment result shows that the proposed approach can improve the efficiency of reviewers understanding code changes in code review. @InProceedings{ESEC/FSE18p770, author = {Yuan Huang and Nan Jia and Xiangping Chen and Kai Hong and Zibin Zheng}, title = {Salient-Class Location: Help Developers Understand Code Change in Code Review}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {770--774}, doi = {10.1145/3236024.3264841}, year = {2018}, } Publisher's Version |
|
Huang, Zixin |
ESEC/FSE '18: "Testing Probabilistic Programming ..."
Testing Probabilistic Programming Systems
Saikat Dutta, Owolabi Legunsen, Zixin Huang, and Sasa Misailovic (University of Illinois at Urbana-Champaign, USA) Probabilistic programming systems (PP systems) allow developers to model stochastic phenomena and perform efficient inference on the models. The number and adoption of probabilistic programming systems are growing significantly. However, there is no prior study of bugs in these systems and no methodology for systematically testing PP systems. Yet, testing PP systems is highly non-trivial, especially when they perform approximate inference. In this paper, we characterize 118 previously reported bugs in three open-source PP systems—Edward, Pyro and Stan—and propose ProbFuzz, an extensible system for testing PP systems. ProbFuzz allows a developer to specify templates of probabilistic models, from which it generates concrete probabilistic programs and data for testing. ProbFuzz uses language-specific translators to generate these concrete programs, which use the APIs of each PP system. ProbFuzz finds potential bugs by checking the output from running the generated programs against several oracles, including an accuracy checker. Using ProbFuzz, we found 67 previously unknown bugs in recent versions of these PP systems. Developers already accepted 51 bug fixes that we submitted to the three PP systems, and their underlying systems, PyTorch and TensorFlow. @InProceedings{ESEC/FSE18p574, author = {Saikat Dutta and Owolabi Legunsen and Zixin Huang and Sasa Misailovic}, title = {Testing Probabilistic Programming Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {574--586}, doi = {10.1145/3236024.3236057}, year = {2018}, } Publisher's Version |
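ProbFuzz, described above, generates probabilistic programs plus data from templates and checks the inference results against oracles such as an accuracy checker. The Python snippet below illustrates only the accuracy-oracle idea on a hand-rolled example: synthesize data with a known parameter, run a trivial stand-in "inference" (the sample mean), and flag the result if it drifts too far from the ground truth. It uses no real PP system and is not ProbFuzz's generator; the tolerance and parameters are invented.

```python
import random
import statistics

def generate_data(true_mu, true_sigma, n=1000, seed=1):
    rng = random.Random(seed)
    return [rng.gauss(true_mu, true_sigma) for _ in range(n)]

def infer_mu(data):
    """Stand-in for a PP system's posterior mean of mu under a normal model."""
    return statistics.mean(data)

def accuracy_oracle(true_mu, estimated_mu, tolerance=0.2):
    """Flag a potential inference bug when the estimate is far from the ground truth."""
    return abs(estimated_mu - true_mu) <= tolerance

if __name__ == "__main__":
    true_mu, true_sigma = 3.0, 1.5
    data = generate_data(true_mu, true_sigma)
    est = infer_mu(data)
    print(f"estimated mu = {est:.3f}, oracle passed = {accuracy_oracle(true_mu, est)}")
```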
|
Hussain, Waqar |
ESEC/FSE '18-NIER: "Operationalizing Human Values ..."
Operationalizing Human Values in Software: A Research Roadmap
Davoud Mougouei, Harsha Perera, Waqar Hussain, Rifat Shams, and Jon Whittle (Monash University, Australia) Software influences several aspects of people's lives and therefore needs to reflect their values. However, existing software engineering methods fail to account for human values, which may result in breaching those values in software and, therefore, dissatisfaction of users and loss of profit and reputation. To avoid such negative consequences, human values need to be integrated -- in a verifiable way -- into software. We refer to this as Operationalizing Human Values in Software. But this is not easy to achieve due to three main obstacles: first, human values are hard to define in a way that can be put into practice; second, existing software design decisions are mainly ignorant of values; finally, values are hard to determine and quantify in software. This paper aims to establish a research roadmap for overcoming these obstacles. The proposed roadmap focuses on (i) establishing practical definitions for human values, (ii) integrating values into software design, and (iii) measuring values in the software development life cycle. @InProceedings{ESEC/FSE18p780, author = {Davoud Mougouei and Harsha Perera and Waqar Hussain and Rifat Shams and Jon Whittle}, title = {Operationalizing Human Values in Software: A Research Roadmap}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {780--784}, doi = {10.1145/3236024.3264843}, year = {2018}, } Publisher's Version |
|
Imtiaz, Sayem Mohammad |
ESEC/FSE '18-NIER: "Towards Data-Driven Vulnerability ..."
Towards Data-Driven Vulnerability Prediction for Requirements
Sayem Mohammad Imtiaz and Tanmay Bhowmik (Mississippi State University, USA) Due to the abundance of security breaches we continue to see, the software development community is recently paying attention to a more proactive approach towards security. This includes predicting vulnerability before exploitation employing static code analysis and machine learning techniques. Such mechanisms, however, are designed to detect post-implementation vulnerabilities. As the root of a vulnerability can often be traced back to the requirement specification, and vulnerability discovered later in the development life cycle is more expensive to fix, we need additional preventive mechanisms capable of predicting vulnerability at a much earlier stage. In this paper, we propose a novel framework providing an automated support to predict vulnerabilities for a requirement as early as during requirement engineering. We further present a preliminary demonstration of our framework and the promising results we observe clearly indicate the value of this new research idea. @InProceedings{ESEC/FSE18p744, author = {Sayem Mohammad Imtiaz and Tanmay Bhowmik}, title = {Towards Data-Driven Vulnerability Prediction for Requirements}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {744--748}, doi = {10.1145/3236024.3264836}, year = {2018}, } Publisher's Version |
|
Ivanov, Vladimir |
ESEC/FSE '18-IND: "Design and Validation of Precooked ..."
Design and Validation of Precooked Developer Dashboards
Vladimir Ivanov, Vladislav Pischulin, Alan Rogers, Giancarlo Succi, Jooyong Yi, and Vasilii Zorin (Innopolis University, Russia) Despite the increasing popularity of developer dashboards, the effectiveness of dashboards is still in question. In order to design a dashboard that is effective and useful for developers, it is important to know (a) what information developers need to see in a dashboard, and (b) how developers want to use a dashboard with that necessary information. To answer these questions, we conducted two series of face-to-face individual interviews with developers. In the first step, we analyzed the answers, built a Goal-Question-Metric (GQM) model, and designed a precooked developer dashboard. Then, during the second separate series of interviews, we validated the GQM and derived feedback on the designed dashboard. Given that the cost of dashboard customization prevents developers from utilizing dashboards, we believe that our findings can provide a solid starting point to build precooked developer dashboards that can be readily utilized by software companies. @InProceedings{ESEC/FSE18p821, author = {Vladimir Ivanov and Vladislav Pischulin and Alan Rogers and Giancarlo Succi and Jooyong Yi and Vasilii Zorin}, title = {Design and Validation of Precooked Developer Dashboards}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {821--826}, doi = {10.1145/3236024.3275530}, year = {2018}, } Publisher's Version |
|
Izumida, Tomonori |
ESEC/FSE '18: "Automated Patch Extraction ..."
Automated Patch Extraction via Syntax- and Semantics-Aware Delta Debugging on Source Code Changes
Masatomo Hashimoto, Akira Mori, and Tomonori Izumida (Chiba Institute of Technology, Japan; AIST, Japan; IIJ Innovation Institute, Japan) Delta debugging (DD) is an approach to automating debugging activities based on systematic testing. DD algorithms find the cause of a regression of a program by minimizing the changes applied between a working version and a faulty version of the program. However, it is still an open problem to minimize a huge set of changes while avoiding any invalid subsets that do not result in testable programs, especially when no software configuration management system is available. In this paper, we propose a rule-based approach to syntactic and semantic decomposition of changes into independent components to facilitate DD on source code changes, and hence to extract patches automatically. For analyzing changes, we make use of tree differencing on abstract syntax trees instead of conventional differencing on plain text. We have developed an experimental implementation for Java programs and applied it to 194 bug fixes from Defects4J and 8 real-life regression bugs from 6 open source Java projects. Compared to a DD tool based on plain text differencing, it extracted patches whose size is reduced by 50% at the cost of 5% more test executions for the former dataset and by 73% at the cost of 40% more test executions for the latter, both on average. @InProceedings{ESEC/FSE18p598, author = {Masatomo Hashimoto and Akira Mori and Tomonori Izumida}, title = {Automated Patch Extraction via Syntax- and Semantics-Aware Delta Debugging on Source Code Changes}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {598--609}, doi = {10.1145/3236024.3236047}, year = {2018}, } Publisher's Version Info |
|
Jamshidi, Pooyan |
ESEC/FSE '18: "Learning to Sample: Exploiting ..."
Learning to Sample: Exploiting Similarities across Environments to Learn Performance Models for Configurable Systems
Pooyan Jamshidi, Miguel Velez, Christian Kästner, and Norbert Siegmund (University of South Carolina, USA; Carnegie Mellon University, USA; Bauhaus-University Weimar, Germany) Most software systems provide options that allow users to tailor the system in terms of functionality and qualities. The increased flexibility raises challenges for understanding the configuration space and the effects of options and their interactions on performance and other non-functional properties. To identify how options and interactions affect the performance of a system, several sampling and learning strategies have been recently proposed. However, existing approaches usually assume a fixed environment (hardware, workload, software release) such that learning has to be repeated once the environment changes. Repeating learning and measurement for each environment is expensive and often practically infeasible. Instead, we pursue a strategy that transfers knowledge across environments but sidesteps heavyweight and expensive transfer-learning strategies. Based on empirical insights about common relationships regarding (i) influential options, (ii) their interactions, and (iii) their performance distributions, our approach, L2S (Learning to Sample), selects better samples in the target environment based on information from the source environment. It progressively shrinks and adaptively concentrates on interesting regions of the configuration space. With both synthetic benchmarks and several real systems, we demonstrate that L2S outperforms state of the art performance learning and transfer-learning approaches in terms of measurement effort and learning accuracy. @InProceedings{ESEC/FSE18p71, author = {Pooyan Jamshidi and Miguel Velez and Christian Kästner and Norbert Siegmund}, title = {Learning to Sample: Exploiting Similarities across Environments to Learn Performance Models for Configurable Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {71--82}, doi = {10.1145/3236024.3236074}, year = {2018}, } Publisher's Version |
|
Jha, Somesh |
ESEC/FSE '18: "Neural-Augmented Static Analysis ..."
Neural-Augmented Static Analysis of Android Communication
Jinman Zhao, Aws Albarghouthi, Vaibhav Rastogi, Somesh Jha, and Damien Octeau (University of Wisconsin-Madison, USA; Google, USA) We address the problem of discovering communication links between applications in the popular Android mobile operating system, an important problem for security and privacy in Android. Any scalable static analysis in this complex setting is bound to produce an excessive amount of false-positives, rendering it impractical. To improve precision, we propose to augment static analysis with a trained neural-network model that estimates the probability that a communication link truly exists. We describe a neural-network architecture that encodes abstractions of communicating objects in two applications and estimates the probability with which a link indeed exists. At the heart of our architecture are type-directed encoders (TDE), a general framework for elegantly constructing encoders of a compound data type by recursively composing encoders for its constituent types. We evaluate our approach on a large corpus of Android applications, and demonstrate that it achieves very high accuracy. Further, we conduct thorough interpretability studies to understand the internals of the learned neural networks. @InProceedings{ESEC/FSE18p342, author = {Jinman Zhao and Aws Albarghouthi and Vaibhav Rastogi and Somesh Jha and Damien Octeau}, title = {Neural-Augmented Static Analysis of Android Communication}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {342--353}, doi = {10.1145/3236024.3236066}, year = {2018}, } Publisher's Version |
|
Jia, Nan |
ESEC/FSE '18-NIER: "Salient-Class Location: Help ..."
Salient-Class Location: Help Developers Understand Code Change in Code Review
Yuan Huang, Nan Jia, Xiangping Chen, Kai Hong, and Zibin Zheng (Sun Yat-sen University, China; Hebei GEO University, China) Code review involves a significant amount of human effort to understand the code change, because the information required to inspect code changes may be distributed across multiple files that reviewers are not familiar with. Code changes are often organized as commits for review. In this paper, we found that most of the commits contain a salient class, which is saliently modified and causes the modification of the remaining classes in a commit. Our user studies confirmed that identifying the salient class in a commit can facilitate reviewers in understanding the code change. We model salient class identification as a binary classification problem and extract a number of discriminative features from each commit to characterize the salience of a class. The initial experimental results show that the proposed approach can improve reviewers' efficiency in understanding code changes during code review. @InProceedings{ESEC/FSE18p770, author = {Yuan Huang and Nan Jia and Xiangping Chen and Kai Hong and Zibin Zheng}, title = {Salient-Class Location: Help Developers Understand Code Change in Code Review}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {770--774}, doi = {10.1145/3236024.3264841}, year = {2018}, } Publisher's Version |
|
Jiang, Lingxiao |
ESEC/FSE '18-INV: "Ten Years of Hunting for Similar ..."
Ten Years of Hunting for Similar Code for Fun and Profit (Keynote)
Stéphane Glondu, Lingxiao Jiang, and Zhendong Su (Inria, France; Singapore Management University, Singapore; ETH Zurich, Switzerland) In 2007, the Deckard paper was published at ICSE. Since its publication, it has led to much follow-up research and applications. The paper made two core contributions: a novel vector embedding of structured code for fast similarity detection, and an application of the embedding for clone detection, resulting in the Deckard tool. The vector embedding is simple and easy to adapt. Similar code detection is also fundamental for a range of classical and emerging problems in software engineering, security, and computer science education (e.g., code reuse, refactoring, porting, translation, synthesis, program repair, malware detection, and feedback generation). Both have buttressed the paper’s influence. In 2018, the Deckard paper received the ACM SIGSOFT Impact Paper award. In this keynote, we take the opportunity to review the work’s inception, evolution and impact on its subsequent work and applications, and to share our thoughts on exciting ongoing and future developments. @InProceedings{ESEC/FSE18p2, author = {Stéphane Glondu and Lingxiao Jiang and Zhendong Su}, title = {Ten Years of Hunting for Similar Code for Fun and Profit (Keynote)}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {2--2}, doi = {10.1145/3236024.3280856}, year = {2018}, } Publisher's Version |
|
Jiang, Yanyan |
ESEC/FSE '18: "Testing Multithreaded Programs ..."
Testing Multithreaded Programs via Thread Speed Control
Dongjie Chen, Yanyan Jiang, Chang Xu, Xiaoxing Ma, and Jian Lu (Nanjing University, China) A multithreaded program's interleaving space is discrete and astronomically large, making effectively sampling thread schedules for manifesting concurrency bugs a challenging task. Observing that concurrency bugs can be manifested by adjusting thread relative speeds, this paper presents the new concept of speed space in which each vector denotes a family of thread schedules. A multithreaded program's speed space is approximately continuous, easy-to-sample, and preserves certain categories of concurrency bugs. We discuss the design, implementation, and evaluation of our speed-controlled scheduler for exploring adversarial/abnormal schedules. The experimental results confirm that our technique is effective in sampling diverse schedules. Our implementation also found previously unknown concurrency bugs in real-world multithreaded programs. @InProceedings{ESEC/FSE18p15, author = {Dongjie Chen and Yanyan Jiang and Chang Xu and Xiaoxing Ma and Jian Lu}, title = {Testing Multithreaded Programs via Thread Speed Control}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {15--25}, doi = {10.1145/3236024.3236077}, year = {2018}, } Publisher's Version Info |
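As a rough illustration of the speed-space idea (our own toy example, not the authors' scheduler), the sketch below treats a vector of per-thread delays as one point in speed space; sampling different vectors steers two Python threads toward different schedule families, one of which exposes a classic check-then-act atomicity bug.

```python
# Toy sketch of "thread speed control": a speed vector of per-thread delays
# selects a family of schedules. Not the paper's scheduler implementation.
import threading
import time

balance = 100  # shared state; the check-then-act below is unprotected

def withdraw(amount, delay):
    global balance
    if balance >= amount:   # check
        time.sleep(delay)   # speed knob: slow this thread's critical window
        balance -= amount   # act (may rely on a stale check)

def run_with_speeds(speed_vector):
    """Run two withdrawing threads whose relative speeds are fixed by
    the per-thread delays in speed_vector (one point in speed space)."""
    global balance
    balance = 100
    threads = [threading.Thread(target=withdraw, args=(100, d))
               for d in speed_vector]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance

# Sampling a few speed vectors explores different schedule families;
# a final balance of -100 manifests the atomicity violation.
for speeds in [(0.0, 0.2), (0.2, 0.0), (0.05, 0.05)]:
    print(speeds, "->", run_with_speeds(speeds))
```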
|
Jiang, Yu |
ESEC/FSE '18-DEMO: "EClone: Detect Semantic Clones ..."
EClone: Detect Semantic Clones in Ethereum via Symbolic Transaction Sketch
Han Liu, Zhiqiang Yang, Chao Liu, Yu Jiang, Wenqi Zhao, and Jiaguang Sun (Tsinghua University, China; Peking University, China; Ant Financial, China) The Ethereum ecosystem has created a wealth of smart contract applications in public blockchains, with transparent, traceable and programmable transactions. However, the flexibility of allowing anyone to write and deploy smart contracts on Ethereum results in a large collection of similar contracts, i.e., clones. In practice, smart contract clones may amplify severe threats like security attacks, resource waste, etc. In this paper, we have developed EClone, a semantic clone detector for Ethereum. The key insight of our clone detection is the Symbolic Transaction Sketch, i.e., a set of critical semantic properties generated from symbolic transactions. Sketches of two smart contracts are normalized into numeric vectors of the same length. Then, the clone detection problem is modeled as a similarity computation process where sketches and other syntactic information are combined. We have applied EClone in identifying semantic clones of deployed Ethereum smart contracts and achieved an accuracy of 93.27%. A demo video of EClone is at https://youtu.be/IRasOVv6vyc. @InProceedings{ESEC/FSE18p900, author = {Han Liu and Zhiqiang Yang and Chao Liu and Yu Jiang and Wenqi Zhao and Jiaguang Sun}, title = {EClone: Detect Semantic Clones in Ethereum via Symbolic Transaction Sketch}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {900--903}, doi = {10.1145/3236024.3264596}, year = {2018}, } Publisher's Version ESEC/FSE '18-IND: "PAFL: Extend Fuzzing Optimizations ..." PAFL: Extend Fuzzing Optimizations of Single Mode to Industrial Parallel Mode Jie Liang, Yu Jiang, Yuanliang Chen, Mingzhe Wang, Chijin Zhou, and Jiaguang Sun (Tsinghua University, China) Researchers have proposed many optimizations to improve the efficiency of fuzzing, and most optimized strategies work very well on their targets when running in single mode with one fuzzer instance. However, in real industrial practice, most fuzzers run in parallel mode with multiple fuzzer instances, and those optimizations unfortunately fail to maintain the efficiency improvements. In this paper, we present PAFL, a framework that utilizes efficient guiding information synchronization and task division to extend those existing fuzzing optimizations of single mode to industrial parallel mode. With an additional data structure to store the guiding information, the synchronization ensures that the information is shared and updated among different fuzzer instances in a timely manner. Then, the task division promotes the diversity of fuzzer instances by splitting the fuzzing task into several sub-tasks based on the branch bitmap. We first evaluate PAFL using 12 different real-world programs from the Google fuzzer-test-suite. Results show that in parallel mode, two AFL improvers, AFLFast and FairFuzz, do not outperform AFL, which is different from the case in single mode. However, when augmented with PAFL, the performance of AFLFast and FairFuzz in parallel mode improves. They cover 8% and 17% more branches, and trigger 79% and 52% more unique crashes. In a further evaluation on more widely used software systems from GitHub, optimized fuzzers augmented with PAFL find more real bugs, 25 of which are security-critical vulnerabilities registered as CVEs in the US National Vulnerability Database. 
@InProceedings{ESEC/FSE18p809, author = {Jie Liang and Yu Jiang and Yuanliang Chen and Mingzhe Wang and Chijin Zhou and Jiaguang Sun}, title = {PAFL: Extend Fuzzing Optimizations of Single Mode to Industrial Parallel Mode}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {809--814}, doi = {10.1145/3236024.3275525}, year = {2018}, } Publisher's Version ESEC/FSE '18-IND: "VulSeeker-Pro: Enhanced Semantic ..." VulSeeker-Pro: Enhanced Semantic Learning Based Binary Vulnerability Seeker with Emulation Jian Gao, Xin Yang, Ying Fu, Yu Jiang, Heyuan Shi, and Jiaguang Sun (Tsinghua University, China) Learning-based clone detection is widely exploited for binary vulnerability search. Although such approaches mitigate the high time overhead of traditional dynamic and static search approaches to some extent, their accuracy is limited, and engineers need to manually identify the true positive cases among the top-M search results in industrial practice. This paper presents VulSeeker-Pro, an enhanced binary vulnerability seeker that integrates function semantic emulation at the back end of semantic learning, to free engineers from this manual identification work. It first uses the semantic-learning-based predictor to quickly predict, from the target binary, the top-M candidate functions that are most similar to the vulnerability. Then the top-M candidates are fed to the emulation engine to be re-sorted, and more accurate top-N candidate functions are obtained. With the fast filtering of semantic learning and the dynamic trace generation of function semantic emulation, VulSeeker-Pro can achieve higher search accuracy with little time overhead. The experimental results on 15 known CVE vulnerabilities involving 6 widely used industrial programs show that VulSeeker-Pro significantly outperforms the state-of-the-art approaches in terms of accuracy. In a total of 45 searches, VulSeeker-Pro finds 40 and 43 real vulnerabilities in the top-1 and top-5 candidate functions, which are 12.33× and 2.58× more than the most recent related work, Gemini. In terms of efficiency, it takes 0.22 seconds on average to determine whether the target binary function contains a known vulnerability or not. @InProceedings{ESEC/FSE18p803, author = {Jian Gao and Xin Yang and Ying Fu and Yu Jiang and Heyuan Shi and Jiaguang Sun}, title = {VulSeeker-Pro: Enhanced Semantic Learning Based Binary Vulnerability Seeker with Emulation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {803--808}, doi = {10.1145/3236024.3275524}, year = {2018}, } Publisher's Version ESEC/FSE '18-NIER: "DLFuzz: Differential Fuzzing ..." DLFuzz: Differential Fuzzing Testing of Deep Learning Systems Jianmin Guo, Yu Jiang, Yue Zhao, Quan Chen, and Jiaguang Sun (Tsinghua University, China; Shanghai Jiao Tong University, China) Deep learning (DL) systems are increasingly applied to safety-critical domains such as autonomous driving. It is of significant importance to ensure the reliability and robustness of DL systems. Existing testing methodologies always fail to include rare inputs in the testing dataset and exhibit low neuron coverage. In this paper, we propose DLFuzz, the first differential fuzzing testing framework to guide DL systems to expose incorrect behaviors. DLFuzz keeps applying minute mutations to the input to maximize the neuron coverage and the prediction difference between the original input and the mutated input, without manual labeling effort or cross-referencing oracles from other DL systems with the same functionality. 
We present empirical evaluations on two well-known datasets to demonstrate its efficiency. Compared with DeepXplore, the state-of-the-art DL whitebox testing framework, DLFuzz does not require extra effort to find functionally similar DL systems for cross-referencing checks, but can generate 338.59% more adversarial inputs with 89.82% smaller perturbations, obtain 2.86% higher neuron coverage on average, and reduce time consumption by 20.11%. @InProceedings{ESEC/FSE18p739, author = {Jianmin Guo and Yu Jiang and Yue Zhao and Quan Chen and Jiaguang Sun}, title = {DLFuzz: Differential Fuzzing Testing of Deep Learning Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {739--743}, doi = {10.1145/3236024.3264835}, year = {2018}, } Publisher's Version |
|
Johnson, Brittany |
ESEC/FSE '18-DEMO: "Themis: Automatically Testing ..."
Themis: Automatically Testing Software for Discrimination
Rico Angell, Brittany Johnson, Yuriy Brun, and Alexandra Meliou (University of Massachusetts at Amherst, USA) Bias in decisions made by modern software is becoming a common and serious problem. We present Themis, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior. We explain how Themis can measure discrimination and aid its debugging, describe a set of optimizations Themis uses to reduce test suite size, and demonstrate Themis' effectiveness on open-source software. Themis is open-source and all our evaluation data are available at http://fairness.cs.umass.edu/. See a video of Themis in action: https://youtu.be/brB8wkaUesY @InProceedings{ESEC/FSE18p871, author = {Rico Angell and Brittany Johnson and Yuriy Brun and Alexandra Meliou}, title = {Themis: Automatically Testing Software for Discrimination}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {871--875}, doi = {10.1145/3236024.3264590}, year = {2018}, } Publisher's Version Video Info |
|
Kanthan, Leslie |
ESEC/FSE '18: "Darwinian Data Structure Selection ..."
Darwinian Data Structure Selection
Michail Basios, Lingbo Li, Fan Wu, Leslie Kanthan, and Earl T. Barr (University College London, UK) Data structure selection and tuning is laborious but can vastly improve an application’s performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce ARTEMIS, a multi-objective, cloud-based, search-based optimisation framework that automatically finds optimal, tuned DDS modulo a test suite, then changes an application to use that DDS. ARTEMIS achieves substantial performance improvements for every project in 5 Java projects from the DaCapo benchmark, 8 popular projects, and 30 uniformly sampled projects from GitHub. For execution time, CPU usage, and memory consumption, ARTEMIS finds at least one solution that improves all measures for 86% (37/43) of the projects. The median improvement across the best solutions is 4.8%, 10.1%, and 5.1% for runtime, memory, and CPU usage. These aggregate results understate ARTEMIS’s potential impact. Some of the benchmarks it improves are libraries or utility functions. Two examples are gson, a ubiquitous Java serialization framework, and xalan, Apache’s XML transformation tool. ARTEMIS improves gson by 16.5%, 1%, and 2.2% for memory, runtime, and CPU; ARTEMIS improves xalan’s memory consumption by 23.5%. Every client of these projects will benefit from these performance improvements. @InProceedings{ESEC/FSE18p118, author = {Michail Basios and Lingbo Li and Fan Wu and Leslie Kanthan and Earl T. Barr}, title = {Darwinian Data Structure Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {118--128}, doi = {10.1145/3236024.3236043}, year = {2018}, } Publisher's Version Info |
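To make the "shared interface, competing implementations" idea concrete, here is a small Python sketch (our own, unrelated to ARTEMIS's Java benchmarks) that races two interchangeable queue implementations on the same workload; in Java the analogous pair would be, e.g., LinkedList versus ArrayDeque behind a common Deque interface.

```python
# Tiny Darwinian-data-structure illustration (not an ARTEMIS benchmark):
# two queue implementations compete on the same drain workload.
from collections import deque
import timeit

IMPLS = {
    "list":  (list,  lambda q: q.pop(0)),     # O(n) removal from the front
    "deque": (deque, lambda q: q.popleft()),  # O(1) removal from the front
}

def drain(factory, pop_front, n=10_000):
    q = factory()
    for i in range(n):
        q.append(i)
    while q:
        pop_front(q)

for name, (factory, pop_front) in IMPLS.items():
    secs = timeit.timeit(lambda: drain(factory, pop_front), number=3)
    print(f"{name:5s} {secs:.3f} s")
```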
|
Kästner, Christian |
ESEC/FSE '18: "Learning to Sample: Exploiting ..."
Learning to Sample: Exploiting Similarities across Environments to Learn Performance Models for Configurable Systems
Pooyan Jamshidi, Miguel Velez, Christian Kästner, and Norbert Siegmund (University of South Carolina, USA; Carnegie Mellon University, USA; Bauhaus-University Weimar, Germany) Most software systems provide options that allow users to tailor the system in terms of functionality and qualities. The increased flexibility raises challenges for understanding the configuration space and the effects of options and their interactions on performance and other non-functional properties. To identify how options and interactions affect the performance of a system, several sampling and learning strategies have been recently proposed. However, existing approaches usually assume a fixed environment (hardware, workload, software release) such that learning has to be repeated once the environment changes. Repeating learning and measurement for each environment is expensive and often practically infeasible. Instead, we pursue a strategy that transfers knowledge across environments but sidesteps heavyweight and expensive transfer-learning strategies. Based on empirical insights about common relationships regarding (i) influential options, (ii) their interactions, and (iii) their performance distributions, our approach, L2S (Learning to Sample), selects better samples in the target environment based on information from the source environment. It progressively shrinks and adaptively concentrates on interesting regions of the configuration space. With both synthetic benchmarks and several real systems, we demonstrate that L2S outperforms state of the art performance learning and transfer-learning approaches in terms of measurement effort and learning accuracy. @InProceedings{ESEC/FSE18p71, author = {Pooyan Jamshidi and Miguel Velez and Christian Kästner and Norbert Siegmund}, title = {Learning to Sample: Exploiting Similarities across Environments to Learn Performance Models for Configurable Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {71--82}, doi = {10.1145/3236024.3236074}, year = {2018}, } Publisher's Version ESEC/FSE '18-NIER: "Beyond Testing Configurable ..." Beyond Testing Configurable Systems: Applying Variational Execution to Automatic Program Repair and Higher Order Mutation Testing Chu-Pan Wong, Jens Meinicke, and Christian Kästner (Carnegie Mellon University, USA; University of Magdeburg, Germany) Generate-and-validate automatic program repair and higher order mutation testing often use search-based techniques to find optimal or good enough solutions in huge search spaces. As search spaces continue to grow, finding solutions that require interactions of multiple changes can become challenging. To tackle the huge search space, we propose to use variational execution. Variational execution has been shown to be effective in exhaustively exploring variations and identifying interactions in a huge but often finite configuration space. The key idea is to encode alternatives in the search space as variations and use variational execution as a black-box technique to generate useful insights so that existing search heuristics can be informed. We show that this idea is promising and identify criteria for problems in which variational execution is a promising tool, which may be useful to identify further applications. 
@InProceedings{ESEC/FSE18p749, author = {Chu-Pan Wong and Jens Meinicke and Christian Kästner}, title = {Beyond Testing Configurable Systems: Applying Variational Execution to Automatic Program Repair and Higher Order Mutation Testing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {749--753}, doi = {10.1145/3236024.3264837}, year = {2018}, } Publisher's Version |
|
Kate, Sayali |
ESEC/FSE '18: "Phys: Probabilistic Physical ..."
Phys: Probabilistic Physical Unit Assignment and Inconsistency Detection
Sayali Kate, John-Paul Ore, Xiangyu Zhang, Sebastian Elbaum, and Zhaogui Xu (Purdue University, USA; University of Nebraska-Lincoln, USA; Nanjing University, China) Program variables used in robotic and cyber-physical systems often have implicit physical units that cannot be determined from their variable types. Inferring an abstract physical unit type for variables and checking their physical unit type consistency is of particular importance for validating the correctness of such systems. For instance, a variable with the unit of ‘meter’ should not be assigned to another variable with the unit of ‘degree-per-second’. Existing solutions have various limitations such as requiring developers to annotate variables with physical units and only handling variables that are directly or transitively used in popular robotic libraries with known physical unit information. We observe that there are many physical unit hints in such software, e.g., variable names and specific forms of expressions. These hints have uncertainty as developers may not respect conventions. We propose to model them with probability distributions and conduct probabilistic inference. In the end, our technique produces a unit distribution for each variable. Unit inconsistencies can then be detected using the highly probable unit assignments. Experimental results on 30 programs show that our technique can infer units for 159.3% more variables compared to the state-of-the-art with more than 88.7% true positives, and inconsistency detection on 90 programs shows that our technique reports 103.3% more inconsistencies with 85.3% true positives. @InProceedings{ESEC/FSE18p563, author = {Sayali Kate and John-Paul Ore and Xiangyu Zhang and Sebastian Elbaum and Zhaogui Xu}, title = {Phys: Probabilistic Physical Unit Assignment and Inconsistency Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {563--573}, doi = {10.1145/3236024.3236035}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Ketkar, Ameya |
ESEC/FSE '18-SRC: "Type Migration in Large-Scale ..."
Type Migration in Large-Scale Code Bases
Ameya Ketkar (Oregon State University, USA) Type migration is a frequent refactoring activity in which an existing type is replaced with another one throughout the source code. Recent studies have shown that type migration is more frequent in larger codebases. The state-of-the-art type migration tools cannot scale to large projects. Moreover, these tools do not fit into modern software development workflows, e.g., in Continuous Integration. This paper presents an IDE-independent type migration technique that scales to ultra-large-scale codebases through a MapReduce parallel and distributed process. We have implemented our approach in a tool called T2R. We evaluated it on codebases as large as 790 KLOC for specializing functional interfaces. Our results show that T2R is safe, scalable and useful. Open source developers accepted 70 migration patches spanning over 202 files. @InProceedings{ESEC/FSE18p965, author = {Ameya Ketkar}, title = {Type Migration in Large-Scale Code Bases}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {965--967}, doi = {10.1145/3236024.3275434}, year = {2018}, } Publisher's Version |
|
Khurshid, Sarfraz |
ESEC/FSE '18-DEMO: "SketchFix: A Tool for Automated ..."
SketchFix: A Tool for Automated Program Repair Approach using Lazy Candidate Generation
Jinru Hua, Mengshi Zhang, Kaiyuan Wang, and Sarfraz Khurshid (University of Texas at Austin, USA) Manually locating and removing bugs in a faulty program is often tedious and error-prone. A common automated program repair approach called generate-and-validate (G&V) iteratively creates candidate fixes, compiles them, and runs these candidates against the given tests. This approach can be costly due to a large number of re-compilations and re-executions of the program. To tackle this limitation, recent work introduced the SketchFix approach that tightly integrates the generation and validation phases, and utilizes runtime behaviors to substantially prune a large number of repair candidates. This tool paper describes our Java implementation of SketchFix, which is an open-source library that we released on GitHub. Our experimental evaluation using the Defects4J benchmark shows that SketchFix can significantly reduce the number of re-compilations and re-executions compared to other approaches and work particularly well in repairing expression manipulation at the AST node-level granularity. The demo video is at: https://youtu.be/AO-YCH8vGzQ. @InProceedings{ESEC/FSE18p888, author = {Jinru Hua and Mengshi Zhang and Kaiyuan Wang and Sarfraz Khurshid}, title = {SketchFix: A Tool for Automated Program Repair Approach using Lazy Candidate Generation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {888--891}, doi = {10.1145/3236024.3264600}, year = {2018}, } Publisher's Version Video ESEC/FSE '18-DEMO: "ASketch: A Sketching Framework ..." ASketch: A Sketching Framework for Alloy Kaiyuan Wang, Allison Sullivan, Darko Marinov, and Sarfraz Khurshid (University of Texas at Austin, USA; University of Illinois at Urbana-Champaign, USA) Alloy is a declarative modeling language that supports first-order logic with transitive closure. Alloy has been used in a variety of domains to model software systems and find design deficiencies. However, it is often challenging to make an Alloy model correct or to debug a faulty Alloy model. ASketch is a sketching/synthesis technique that can help users write correct Alloy models. ASketch allows users to provide a partial Alloy model with holes, a generator that specifies candidate fragments to be considered for each hole, and a set of tests that capture the desired model properties. Then, the tool completes the holes such that all tests for the completed model pass. ASketch uses tests written for the recently introduced AUnit framework, which provides a foundation of testing (unit tests, test execution, and model coverage) for Alloy models in the spirit of traditional unit testing. This paper describes our Java implementation of ASketch, which is a command-line tool, released as an open-source project on GitHub. Our experimental results show that ASketch can handle partial Alloy models with multiple holes and a large search space. The demo video for ASketch can be found at https://youtu.be/T5NIVsV329E. @InProceedings{ESEC/FSE18p916, author = {Kaiyuan Wang and Allison Sullivan and Darko Marinov and Sarfraz Khurshid}, title = {ASketch: A Sketching Framework for Alloy}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {916--919}, doi = {10.1145/3236024.3264594}, year = {2018}, } Publisher's Version |
|
Kim, Mijung |
ESEC/FSE '18: "Which Generated Test Failures ..."
Which Generated Test Failures Are Fault Revealing? Prioritizing Failures Based on Inferred Precondition Violations using PAF
Mijung Kim, Shing-Chi Cheung, and Sunghun Kim (Hong Kong University of Science and Technology, China) Automated unit testing tools, such as Randoop, have been developed to produce failing tests as means of finding faults. However, these tools often produce false alarms, so are not widely used in practice. The main reason for a false alarm is that the generated failing test violates an implicit precondition of the method under test, such as a field should not be null at the entry of the method. This condition is not explicitly programmed or documented but implicitly assumed by developers. To address this limitation, we propose a technique called PAF to cluster generated test failures due to the same cause and reorder them based on their likelihood of violating an implicit precondition of the method under test. From various test executions, PAF observes their dataflows to the variables whose values are used when the program fails. Based on the dataflow similarity and where these values are originated, PAF clusters failures and determines their likelihood of being fault revealing. We integrated PAF into Randoop. Our empirical results on open-source projects show that PAF effectively clusters fault revealing tests arising from the same fault and successfully prioritizes the fault-revealing ones. @InProceedings{ESEC/FSE18p679, author = {Mijung Kim and Shing-Chi Cheung and Sunghun Kim}, title = {Which Generated Test Failures Are Fault Revealing? Prioritizing Failures Based on Inferred Precondition Violations using PAF}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {679--690}, doi = {10.1145/3236024.3236058}, year = {2018}, } Publisher's Version |
|
Kim, Miryung |
ESEC/FSE '18-DEMO: "BigSift: Automated Debugging ..."
BigSift: Automated Debugging of Big Data Analytics in Data-Intensive Scalable Computing
Muhammad Ali Gulzar, Siman Wang, and Miryung Kim (University of California at Los Angeles, USA; Hunan University, China) Developing Big Data Analytics often involves trial and error debugging, due to the unclean nature of datasets or wrong assumptions made about data. When errors (e.g. program crash, outlier results, etc.) arise, developers are often interested in pinpointing the root cause of errors. To address this problem, BigSift takes an Apache Spark program, a user-defined test oracle function, and a dataset as input and outputs a minimum set of input records that reproduces the same test failure by combining the insights from delta debugging with data provenance. The technical contribution of BigSift is the design of systems optimizations that bring automated debugging closer to a reality for data intensive scalable computing. BigSift exposes an interactive web interface where a user can monitor a big data analytics job running remotely on the cloud, write a user-defined test oracle function, and then trigger the automated debugging process. BigSift also provides a set of predefined test oracle functions, which can be used for explaining common types of anomalies in big data analytics--for example, finding the origin of the output value that is more than k standard deviations away from the median. The demonstration video is available at https://youtu.be/jdBsCd61a1Q. @InProceedings{ESEC/FSE18p863, author = {Muhammad Ali Gulzar and Siman Wang and Miryung Kim}, title = {BigSift: Automated Debugging of Big Data Analytics in Data-Intensive Scalable Computing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {863--866}, doi = {10.1145/3236024.3264586}, year = {2018}, } Publisher's Version ESEC/FSE '18-DEMO: "Augmenting Stack Overflow ..." Augmenting Stack Overflow with API Usage Patterns Mined from GitHub Anastasia Reinhardt, Tianyi Zhang, Mihir Mathur, and Miryung Kim (George Fox University, USA; University of California at Los Angeles, USA) Programmers often consult Q&A websites such as Stack Overflow (SO) to learn new APIs. However, online code snippets are not always complete or reliable in terms of API usage. To assess online code snippets, we build a Chrome extension, ExampleCheck that detects API usage violations in SO posts using API usage patterns mined from 380K GitHub projects. It quantifies how many GitHub examples follow common API usage and illustrates how to remedy the detected violation in a given SO snippet. With ExampleCheck, programmers can easily identify the pitfalls of a given SO snippet and learn how much it deviates from common API usage patterns in GitHub. The demo video is at https://youtu.be/WOnN-wQZsH0. @InProceedings{ESEC/FSE18p880, author = {Anastasia Reinhardt and Tianyi Zhang and Mihir Mathur and Miryung Kim}, title = {Augmenting Stack Overflow with API Usage Patterns Mined from GitHub}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {880--883}, doi = {10.1145/3236024.3264585}, year = {2018}, } Publisher's Version Video Info |
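For context, the delta-debugging core that BigSift builds on can be sketched in a few lines; the simplified, complement-only ddmin below (our own sketch, not BigSift's provenance-assisted Spark implementation; the records and oracle are hypothetical) shrinks a list of input records to a small subset that still trips a user-defined test oracle.

```python
# Simplified complement-only delta debugging (ddmin) sketch -- not BigSift's
# Spark implementation, just the underlying idea: shrink `records` while a
# deterministic oracle still reports failure.
def ddmin(records, failing):
    n = 2
    while len(records) >= 2:
        chunk = len(records) // n
        reduced = False
        for i in range(n):
            lo = i * chunk
            hi = len(records) if i == n - 1 else (i + 1) * chunk
            complement = records[:lo] + records[hi:]
            if failing(complement):          # failure still reproduces
                records = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(records):
                break
            n = min(len(records), n * 2)     # refine the partition
    return records

# Hypothetical oracle: the job "fails" whenever a malformed record is present.
records = [f"id={i}" for i in range(100)] + ["id=oops"] + [f"id={i}" for i in range(100, 150)]
print(ddmin(records, lambda rs: any("oops" in r for r in rs)))  # ['id=oops']
```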
|
Kim, Sunghun |
ESEC/FSE '18: "Which Generated Test Failures ..."
Which Generated Test Failures Are Fault Revealing? Prioritizing Failures Based on Inferred Precondition Violations using PAF
Mijung Kim, Shing-Chi Cheung, and Sunghun Kim (Hong Kong University of Science and Technology, China) Automated unit testing tools, such as Randoop, have been developed to produce failing tests as means of finding faults. However, these tools often produce false alarms, so are not widely used in practice. The main reason for a false alarm is that the generated failing test violates an implicit precondition of the method under test, such as a field should not be null at the entry of the method. This condition is not explicitly programmed or documented but implicitly assumed by developers. To address this limitation, we propose a technique called PAF to cluster generated test failures due to the same cause and reorder them based on their likelihood of violating an implicit precondition of the method under test. From various test executions, PAF observes their dataflows to the variables whose values are used when the program fails. Based on the dataflow similarity and where these values are originated, PAF clusters failures and determines their likelihood of being fault revealing. We integrated PAF into Randoop. Our empirical results on open-source projects show that PAF effectively clusters fault revealing tests arising from the same fault and successfully prioritizes the fault-revealing ones. @InProceedings{ESEC/FSE18p679, author = {Mijung Kim and Shing-Chi Cheung and Sunghun Kim}, title = {Which Generated Test Failures Are Fault Revealing? Prioritizing Failures Based on Inferred Precondition Violations using PAF}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {679--690}, doi = {10.1145/3236024.3236058}, year = {2018}, } Publisher's Version |
|
King, Irwin |
ESEC/FSE '18-DEMO: "INFAR: Insight Extraction ..."
INFAR: Insight Extraction from App Reviews
Cuiyun Gao, Jichuan Zeng, David Lo, Chin-Yew Lin, Michael R. Lyu, and Irwin King (Chinese University of Hong Kong, China; Singapore Management University, Singapore; Microsoft Research, China) App reviews play an essential role for users to convey their feedback about using the app. The critical information contained in app reviews can assist app developers for maintaining and updating mobile apps. However, the noisy nature and large-quantity of daily generated app reviews make it difficult to understand essential information carried in app reviews. Several prior studies have proposed methods that can automatically classify or cluster user reviews into a few app topics (e.g., security). These methods usually act on a static collection of user reviews. However, due to the dynamic nature of user feedback (i.e., reviews keep coming as new users register or new app versions being released) and multiple analysis dimensions (e.g., review quantity and user rating), developers still need to spend substantial effort in extracting contrastive information that can only be teased out by comparing data from multiple time periods or analysis dimensions. This is needed to answer questions such as: what kind of issues users are experiencing most? is there an unexpected rise in a particular kind of issue? etc. To address this need, in this paper, we introduce INFAR, a tool that automatically extracts INsights From App Reviews across time periods and analysis dimensions, and presents them in natural language supported by an interactive chart. The insights INFAR extracts include several perspectives: (1) salient topics (i.e., issue topics with significantly lower ratings), (2) abnormal topics (i.e., issue topics that experience a rapid rise in volume during a time period), (3) correlations between two topics, and (4) causal factors to rating or review quantity changes. To evaluate our tool, we conduct an empirical evaluation by involving six popular apps and 12 industrial practitioners, and 92% (11/12) of them approve the practical usefulness of the insights summarized by INFAR. Demo Tool Website: https://remine-lab.github.io/paper/infar.html Demo Video: https://youtu.be/MjcoiyjA5TE @InProceedings{ESEC/FSE18p904, author = {Cuiyun Gao and Jichuan Zeng and David Lo and Chin-Yew Lin and Michael R. Lyu and Irwin King}, title = {INFAR: Insight Extraction from App Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {904--907}, doi = {10.1145/3236024.3264595}, year = {2018}, } Publisher's Version |
|
Kini, Dileep |
ESEC/FSE '18: "Data Race Detection on Compressed ..."
Data Race Detection on Compressed Traces
Dileep Kini, Umang Mathur, and Mahesh Viswanathan (Akuna Capital, USA; University of Illinois at Urbana-Champaign, USA) We consider the problem of detecting data races in program traces that have been compressed using straight line programs (SLP), which are special context-free grammars that generate exactly one string, namely the trace that they represent. We consider two classical approaches to race detection --- using the happens-before relation and the lockset discipline. We present algorithms for both these methods that run in time that is linear in the size of the compressed, SLP representation. Typical program executions almost always exhibit patterns that lead to significant compression. Thus, our algorithms are expected to result in large speedups when compared with analyzing the uncompressed trace. Our experimental evaluation of these new algorithms on standard benchmarks confirms this observation. @InProceedings{ESEC/FSE18p26, author = {Dileep Kini and Umang Mathur and Mahesh Viswanathan}, title = {Data Race Detection on Compressed Traces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {26--37}, doi = {10.1145/3236024.3236025}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
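To illustrate what an SLP-compressed trace looks like (a toy sketch with a made-up event alphabet, not the paper's race-detection algorithms), the grammar below derives exactly one trace, and the per-event counting pass touches each rule once instead of expanding the trace.

```python
# Toy SLP sketch (hypothetical event alphabet; not the paper's algorithms):
# each nonterminal has exactly one rule, so the grammar derives one trace.
from collections import Counter

slp = {
    "S": ["A", "A", "A"],             # the trace is A repeated three times
    "A": ["acq", "wr", "rd", "rel"],  # a critical-section pattern
}

def expand(sym, rules):
    """Fully expand the trace -- only for comparison; can blow up."""
    if sym not in rules:
        return [sym]
    return [e for child in rules[sym] for e in expand(child, rules)]

def event_counts(rules, start="S"):
    """Analyze the compressed form directly: memoized over nonterminals,
    so each rule is processed once (linear in the size of the SLP)."""
    memo = {}
    def count(sym):
        if sym not in rules:
            return Counter([sym])
        if sym not in memo:
            total = Counter()
            for child in rules[sym]:
                total += count(child)
            memo[sym] = total
        return memo[sym]
    return count(start)

print(len(expand("S", slp)))  # 12 events when fully expanded
print(event_counts(slp))      # same counts, computed on the compressed form
```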
|
Klein, Jacques |
ESEC/FSE '18: "FraudDroid: Automated Ad Fraud ..."
FraudDroid: Automated Ad Fraud Detection for Android Apps
Feng Dong, Haoyu Wang, Li Li, Yao Guo, Tegawendé F. Bissyandé, Tianming Liu, Guoai Xu, and Jacques Klein (Beijing University of Posts and Telecommunications, China; Monash University, Australia; Peking University, China; University of Luxembourg, Luxembourg) Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widely spread in apps, have not yet been explored nor addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose, FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffics, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (∼ 93%) and recall (∼ 92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection. @InProceedings{ESEC/FSE18p257, author = {Feng Dong and Haoyu Wang and Li Li and Yao Guo and Tegawendé F. Bissyandé and Tianming Liu and Guoai Xu and Jacques Klein}, title = {FraudDroid: Automated Ad Fraud Detection for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--268}, doi = {10.1145/3236024.3236045}, year = {2018}, } Publisher's Version |
|
Koszek, Wojciech |
ESEC/FSE '18-NIER: "Towards Quantifying the Development ..."
Towards Quantifying the Development Value of Code Contributions
Jinglei Ren, Hezheng Yin, Qingda Hu, Armando Fox, and Wojciech Koszek (Microsoft Research, China; University of California at Berkeley, USA; Tsinghua University, China; FreeBSD Project, USA) Quantifying the value of developers’ code contributions to a software project requires more than simply counting lines of code or commits. We define the development value of code as a combination of its structural value (the effect of code reuse) and its non-structural value (the impact on development). We propose techniques to automatically calculate both components of development value and combine them using Learning to Rank. Our preliminary empirical study shows that our analysis yields richer results than those obtained by human assessment or simple counting methods and demonstrates the potential of our approach. @InProceedings{ESEC/FSE18p775, author = {Jinglei Ren and Hezheng Yin and Qingda Hu and Armando Fox and Wojciech Koszek}, title = {Towards Quantifying the Development Value of Code Contributions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {775--779}, doi = {10.1145/3236024.3264842}, year = {2018}, } Publisher's Version |
|
Koutris, Paraschos |
ESEC/FSE '18: "Syntax-Guided Synthesis of ..."
Syntax-Guided Synthesis of Datalog Programs
Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik (University of Pennsylvania, USA; Hanyang University, South Korea; University of Wisconsin-Madison, USA) Datalog has witnessed promising applications in a variety of domains. We propose a programming-by-example system, ALPS, to synthesize Datalog programs from input-output examples. Scaling synthesis to realistic programs in this manner is challenging due to the rich expressivity of Datalog. We present a syntax-guided synthesis approach that prunes the search space by exploiting the observation that in practice Datalog programs comprise rules that have similar latent syntactic structure. We evaluate ALPS on a suite of 34 benchmarks from three domains—knowledge discovery, program analysis, and database queries. The evaluation shows that ALPS can synthesize 33 of these benchmarks, and outperforms the state-of-the-art tools Metagol and Zaatar, which can synthesize only up to 10 of the benchmarks. @InProceedings{ESEC/FSE18p515, author = {Xujie Si and Woosuk Lee and Richard Zhang and Aws Albarghouthi and Paraschos Koutris and Mayur Naik}, title = {Syntax-Guided Synthesis of Datalog Programs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {515--527}, doi = {10.1145/3236024.3236034}, year = {2018}, } Publisher's Version Artifacts Functional |
|
Krishna, Rahul |
ESEC/FSE '18: "Applications of Psychological ..."
Applications of Psychological Science for Actionable Analytics
Di Chen, Wei Fu, Rahul Krishna, and Tim Menzies (North Carolina State University, USA) According to psychological scientists, humans understand models that most match their own internal models, which they characterize as lists of "heuristics" (i.e., lists of very succinct rules). One such heuristic rule generator is the Fast-and-Frugal Trees (FFT) preferred by psychological scientists. Despite their successful use in many applied domains, FFTs have not been applied in software analytics. Accordingly, this paper assesses FFTs for software analytics. We find that FFTs are remarkably effective in that their models are very succinct (5 lines or fewer describing a binary decision tree) while also outperforming results from very recent, top-level conference papers. Also, when we restrict training data to operational attributes (i.e., those attributes that are frequently changed by developers), the performance of FFTs is not affected (while the performance of other learners can vary wildly). Our conclusions are two-fold. Firstly, there is much that the software analytics community could learn from psychological science. Secondly, proponents of complex methods should always baseline those methods against simpler alternatives. For example, FFTs could be used as a standard baseline learner against which other software analytics tools are compared. @InProceedings{ESEC/FSE18p456, author = {Di Chen and Wei Fu and Rahul Krishna and Tim Menzies}, title = {Applications of Psychological Science for Actionable Analytics}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {456--467}, doi = {10.1145/3236024.3236050}, year = {2018}, } Publisher's Version |
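As a flavour of how succinct such a model is (the attribute names, thresholds, and exit decisions below are invented for illustration, not taken from the paper's datasets), a fast-and-frugal tree is essentially an ordered list of one-attribute rules, each of which may exit with a decision:

```python
# Illustrative fast-and-frugal tree: cues, thresholds, and decisions are
# made up; the point is the shape of the model (a short list of rules).
FFT = [
    ("loc",        lambda v: v > 1000, True),   # very large module -> defective
    ("churn",      lambda v: v < 5,    False),  # barely touched    -> clean
    ("developers", lambda v: v > 10,   True),   # many authors      -> defective
]
DEFAULT = False  # fall-through decision

def predict(module):
    """Apply the cues in order; the first cue that fires decides."""
    for attr, test, decision in FFT:
        if test(module[attr]):
            return decision
    return DEFAULT

print(predict({"loc": 1500, "churn": 2, "developers": 3}))  # True  (first cue)
print(predict({"loc": 200,  "churn": 2, "developers": 3}))  # False (second cue)
```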
|
Kurtev, Ivan |
ESEC/FSE '18-DEMO: "AlloyInEcore: Embedding of ..."
AlloyInEcore: Embedding of First-Order Relational Logic into Meta-Object Facility for Automated Model Reasoning
Ferhat Erata, Arda Goknil, Ivan Kurtev, and Bedir Tekinerdogan (UNIT Information Technologies, Turkey; University of Luxembourg, Luxembourg; Altran, Netherlands; Wageningen University, Netherlands) We present AlloyInEcore, a tool for specifying metamodels with their static semantics to facilitate automated, formal reasoning on models. Software development projects require that software systems be specified in various models (e.g., requirements models, architecture models, test models, and source code). It is crucial to reason about those models to ensure the correct and complete system specifications. AlloyInEcore allows the user to specify metamodels with their static semantics, while, using the semantics, it automatically detects inconsistent models, and completes partial models. It has been evaluated on three industrial case studies in the automotive domain (https://modelwriter.github.io/AlloyInEcore/). @InProceedings{ESEC/FSE18p920, author = {Ferhat Erata and Arda Goknil and Ivan Kurtev and Bedir Tekinerdogan}, title = {AlloyInEcore: Embedding of First-Order Relational Logic into Meta-Object Facility for Automated Model Reasoning}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {920--923}, doi = {10.1145/3236024.3264588}, year = {2018}, } Publisher's Version Video Info |
|
Lahiri, Shuvendu K. |
ESEC/FSE '18: "Code Vectors: Understanding ..."
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, and Thomas Reps (University of Wisconsin-Madison, USA; Microsoft, USA; GrammaTech, USA) With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions. @InProceedings{ESEC/FSE18p163, author = {Jordan Henkel and Shuvendu K. Lahiri and Ben Liblit and Thomas Reps}, title = {Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {163--174}, doi = {10.1145/3236024.3236085}, year = {2018}, } Publisher's Version Info Artifacts Functional |
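The "traces as sentences, abstracted events as words" setup can be mimicked with off-the-shelf word2vec; the sketch below (our own, with made-up trace tokens, not the authors' abstraction or training pipeline; assumes gensim 4.x) learns embeddings over a handful of toy abstracted traces.

```python
# Toy sketch: treat abstracted symbolic traces as sentences and learn word
# embeddings over their events. Tokens are invented; this is not the paper's
# abstraction pipeline. Assumes gensim >= 4.0 (vector_size keyword).
from gensim.models import Word2Vec

traces = [
    ["call:open", "ret:error", "call:perror", "call:exit"],
    ["call:open", "ret:ok", "call:read", "call:close"],
    ["call:malloc", "ret:null", "call:exit"],
    ["call:malloc", "ret:ok", "call:memcpy", "call:free"],
] * 50  # repeat the tiny corpus so the model has something to learn from

model = Word2Vec(sentences=traces, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=20, seed=7)

# Events that play similar roles in traces should land near each other.
print(model.wv.most_similar("ret:error", topn=3))
```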
|
Le, Tien-Duy B. |
ESEC/FSE '18-DEMO: "DSM: A Specification Mining ..."
DSM: A Specification Mining Tool using Recurrent Neural Network Based Language Model
Tien-Duy B. Le, Lingfeng Bao, and David Lo (Singapore Management University, Singapore; Zhejiang University, China) Formal specifications are important but often unavailable. Furthermore, writing these specifications is time-consuming and requires skills from developers. In this work, we present Deep Specification Miner (DSM), an automated tool that applies deep learning to mine finite-state automaton (FSA) based specifications. DSM accepts as input a set of execution traces to train a Recurrent Neural Network Language Model (RNNLM). From the input traces, DSM creates a Prefix Tree Acceptor (PTA) and leverages the inferred RNNLM to extract many features. These features are then forwarded to clustering algorithms that merge similar automaton states in the PTA to assemble a number of FSAs. Next, our tool applies a model selection heuristic to approximate the F-measure of the FSAs, and outputs the one with the highest estimated F-measure. Notably, our implementation of DSM provides several options that allow users to optimize the quality of the resultant FSAs. Our video demonstration on the performance of DSM is publicly available at https://goo.gl/Ju4yFS. @InProceedings{ESEC/FSE18p896, author = {Tien-Duy B. Le and Lingfeng Bao and David Lo}, title = {DSM: A Specification Mining Tool using Recurrent Neural Network Based Language Model}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {896--899}, doi = {10.1145/3236024.3264597}, year = {2018}, } Publisher's Version |
|
Lee, Dongyoon |
ESEC/FSE '18: "The Impact of Regular Expression ..."
The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale
James C. Davis, Christy A. Coghlan, Francisco Servant, and Dongyoon Lee (Virginia Tech, USA) Regular expressions (regexes) are a popular and powerful means of automatically manipulating text. Regexes are also an understudied denial of service vector (ReDoS). If a regex has super-linear worst-case complexity, an attacker may be able to trigger this complexity, exhausting the victim’s CPU resources and causing denial of service. Existing research has shown how to detect these superlinear regexes, and practitioners have identified super-linear regex anti-pattern heuristics that may lead to such complexity. In this paper, we empirically study three major aspects of ReDoS that have hitherto been unexplored: the incidence of super-linear regexes in practice, how they can be prevented, and how they can be repaired. In the ecosystems of two of the most popular programming languages — JavaScript and Python – we detected thousands of super-linear regexes affecting over 10,000 modules across diverse application domains. We also found that the conventional wisdom for super-linear regex anti-patterns has few false negatives but many false positives; these anti-patterns appear to be necessary, but not sufficient, signals of super-linear behavior. Finally, we found that when faced with a super-linear regex, developers favor revising it over truncating input or developing a custom parser, regardless of whether they had been shown examples of all three fix strategies. These findings motivate further research into ReDoS, since many modules are vulnerable to it and existing mechanisms to avoid it are insufficient. We believe that ReDoS vulnerabilities are a larger threat in practice than might have been guessed. @InProceedings{ESEC/FSE18p246, author = {James C. Davis and Christy A. Coghlan and Francisco Servant and Dongyoon Lee}, title = {The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {246--256}, doi = {10.1145/3236024.3236027}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
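The textbook shape of such a super-linear regex (a classic illustration, not one of the regexes harvested in the study) is a nested quantifier anchored so that a non-matching suffix forces exhaustive backtracking; in Python the blow-up is easy to observe:

```python
# Classic catastrophic-backtracking illustration (not a regex from the study):
# nested quantifiers plus an anchor make the engine try every way to split
# the 'a's, so a failing input of n characters costs roughly 2^n steps.
import re
import time

evil = re.compile(r"^(a+)+$")

for n in (18, 20, 22, 24):
    payload = "a" * n + "!"                  # trailing '!' guarantees failure
    start = time.perf_counter()
    evil.match(payload)
    print(f"n={n}: {time.perf_counter() - start:.3f} s")
# The time roughly quadruples with every two extra characters, which is why a
# short attacker-controlled string can pin a CPU (the ReDoS vector above).
```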
|
Lee, Junhee |
ESEC/FSE '18: "MemFix: Static Analysis-Based ..."
MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C
Junhee Lee, Seongjoon Hong, and Hakjoo Oh (Korea University, South Korea) We present MemFix, an automated technique for fixing memory deallocation errors in C programs. MemFix aims to fix memory-leak, double-free, and use-after-free errors, which occur when developers fail to properly deallocate memory objects. MemFix attempts to fix these errors by finding a set of free-statements that correctly deallocate all allocated objects without causing double-frees and use-after-frees. The key insight behind MemFix is that finding such a set of deallocation statements corresponds to solving an exact cover problem derived from a variant of typestate static analysis. We formally present the technique and experimentally show that MemFix is able to fix real errors found in open-source programs. Because MemFix is based on a sound static analysis, the generated patches are guaranteed to fix the original errors without introducing new ones. @InProceedings{ESEC/FSE18p95, author = {Junhee Lee and Seongjoon Hong and Hakjoo Oh}, title = {MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--106}, doi = {10.1145/3236024.3236079}, year = {2018}, } Publisher's Version |
|
Lee, Wen-Chuan |
ESEC/FSE '18: "MODE: Automated Neural Network ..."
MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection
Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, and Ananth Grama (Purdue University, USA) Artificial intelligence models are becoming an integral part of modern computing systems. Just like software inevitably has bugs, models have bugs too, leading to poor classification/prediction accuracy. Unlike software bugs, model bugs cannot be easily fixed by directly modifying models. Existing solutions work by providing additional training inputs. However, they have limited effectiveness due to the lack of understanding of model misbehaviors and hence the incapability of selecting proper inputs. Inspired by software debugging, we propose a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing. Our evaluation results on 29 different models for 6 different applications show that our technique can fix model bugs effectively and efficiently without introducing new bugs. For simple applications (e.g., digit recognition), MODE improves the test accuracy from 75% to 93% on average whereas the state-of-the-art can only improve to 85% with 11 times more training time. For complex applications and models (e.g., object recognition), MODE is able to improve the accuracy from 75% to over 91% in minutes to a few hours, whereas state-of-the-art fails to fix the bug or even degrades the test accuracy. @InProceedings{ESEC/FSE18p175, author = {Shiqing Ma and Yingqi Liu and Wen-Chuan Lee and Xiangyu Zhang and Ananth Grama}, title = {MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {175--186}, doi = {10.1145/3236024.3236082}, year = {2018}, } Publisher's Version |
|
Lee, Woosuk |
ESEC/FSE '18: "Syntax-Guided Synthesis of ..."
Syntax-Guided Synthesis of Datalog Programs
Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik (University of Pennsylvania, USA; Hanyang University, South Korea; University of Wisconsin-Madison, USA) Datalog has witnessed promising applications in a variety of domains. We propose a programming-by-example system, ALPS, to synthesize Datalog programs from input-output examples. Scaling synthesis to realistic programs in this manner is challenging due to the rich expressivity of Datalog. We present a syntax-guided synthesis approach that prunes the search space by exploiting the observation that in practice Datalog programs comprise rules that have similar latent syntactic structure. We evaluate ALPS on a suite of 34 benchmarks from three domains—knowledge discovery, program analysis, and database queries. The evaluation shows that ALPS can synthesize 33 of these benchmarks, and outperforms the state-of-the-art tools Metagol and Zaatar, which can synthesize only up to 10 of the benchmarks. @InProceedings{ESEC/FSE18p515, author = {Xujie Si and Woosuk Lee and Richard Zhang and Aws Albarghouthi and Paraschos Koutris and Mayur Naik}, title = {Syntax-Guided Synthesis of Datalog Programs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {515--527}, doi = {10.1145/3236024.3236034}, year = {2018}, } Publisher's Version Artifacts Functional |
|
Lee, Young Chul |
ESEC/FSE '18-IND: "Regression Test Selection ..."
Regression Test Selection for TizenRT
Ahmet Celik, Young Chul Lee, and Milos Gligoric (University of Texas at Austin, USA; Samsung Electronics, South Korea) Regression testing - running tests after code modifications - is widely practiced in industry, including at Samsung. Regression Test Selection (RTS) optimizes regression testing by skipping tests that are not affected by recent code changes. Recent work has developed robust RTS tools, which mostly target managed languages, e.g., Java and C#, and thus are not applicable to large C projects, e.g., TizenRT, a lightweight RTOS-based platform. We present Selfection, an RTS tool for projects written in C; we discuss the key challenges to develop Selfection and our design decisions. Selfection uses the objdump and readelf tools to statically build a dependency graph of functions from binaries and detect modified code elements. We integrated Selfection in TizenRT and evaluated its benefits if tests are run in an emulator and on a supported hardware platform (ARTIK 053). We used the latest 150 revisions of TizenRT available on GitHub. We measured the benefits of Selfection as the reduction in the number of tests and reduction in test execution time over running all tests at each revision (i.e., RetestAll). Our results show that Selfection can reduce, on average, the number of tests to 4.95% and end-to-end execution time to 7.04% when tests are executed in the emulator, and to 5.74% and 26.82% when tests are executed on the actual hardware. Our results also show that the time taken to maintain the dependency graph and detect modified functions is negligible. @InProceedings{ESEC/FSE18p845, author = {Ahmet Celik and Young Chul Lee and Milos Gligoric}, title = {Regression Test Selection for TizenRT}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {845--850}, doi = {10.1145/3236024.3275527}, year = {2018}, } Publisher's Version |
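The selection step itself reduces to reachability over a function-level dependency graph. A small Python sketch of that idea follows; the graph, test names, and functions are hypothetical, and the real tool derives its graph from binaries via objdump and readelf:

```python
# Sketch of dependency-graph-based regression test selection: rerun only the
# tests whose entry function can transitively reach a modified function.
def affected_tests(call_graph, test_entries, modified):
    selected = set()
    for test, entry in test_entries.items():
        stack, seen = [entry], set()
        while stack:
            fn = stack.pop()
            if fn in modified:
                selected.add(test)
                break
            if fn in seen:
                continue
            seen.add(fn)
            stack.extend(call_graph.get(fn, ()))
    return selected

call_graph = {                            # hypothetical caller -> callees edges
    "test_wifi_main": ["wifi_connect"],
    "wifi_connect": ["net_init", "log_msg"],
    "test_fs_main": ["fs_mount"],
    "fs_mount": ["log_msg"],
}
tests = {"tc_wifi": "test_wifi_main", "tc_fs": "test_fs_main"}
print(affected_tests(call_graph, tests, modified={"net_init"}))
# -> {'tc_wifi'}; tc_fs is skipped because it cannot reach the changed code.
```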
|
Legunsen, Owolabi |
ESEC/FSE '18: "Testing Probabilistic Programming ..."
Testing Probabilistic Programming Systems
Saikat Dutta, Owolabi Legunsen, Zixin Huang, and Sasa Misailovic (University of Illinois at Urbana-Champaign, USA) Probabilistic programming systems (PP systems) allow developers to model stochastic phenomena and perform efficient inference on the models. The number and adoption of probabilistic programming systems are growing significantly. However, there is no prior study of bugs in these systems and no methodology for systematically testing PP systems. Yet, testing PP systems is highly non-trivial, especially when they perform approximate inference. In this paper, we characterize 118 previously reported bugs in three open-source PP systems—Edward, Pyro and Stan—and propose ProbFuzz, an extensible system for testing PP systems. ProbFuzz allows a developer to specify templates of probabilistic models, from which it generates concrete probabilistic programs and data for testing. ProbFuzz uses language-specific translators to generate these concrete programs, which use the APIs of each PP system. ProbFuzz finds potential bugs by checking the output from running the generated programs against several oracles, including an accuracy checker. Using ProbFuzz, we found 67 previously unknown bugs in recent versions of these PP systems. Developers already accepted 51 bug fixes that we submitted to the three PP systems, and their underlying systems, PyTorch and TensorFlow. @InProceedings{ESEC/FSE18p574, author = {Saikat Dutta and Owolabi Legunsen and Zixin Huang and Sasa Misailovic}, title = {Testing Probabilistic Programming Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {574--586}, doi = {10.1145/3236024.3236057}, year = {2018}, } Publisher's Version |
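The template-instantiation idea is straightforward to sketch. The snippet below generates toy (model text, data) pairs by filling a template with randomly chosen distributions and parameters; the template grammar and pseudo model syntax are invented, and the real tool emits programs through per-system translators for Edward, Pyro, and Stan:

```python
# Sketch of template-based generation of probabilistic programs plus data.
import random

DISTRIBUTIONS = {
    "normal": {
        "params": lambda rng: (rng.uniform(-5, 5), rng.uniform(0.1, 3)),
        "sample": lambda rng, p: rng.normalvariate(p[0], p[1]),
    },
    "exponential": {
        "params": lambda rng: (rng.uniform(0.1, 2),),
        "sample": lambda rng, p: rng.expovariate(p[0]),
    },
}

def instantiate(rng, n_data=5):
    name = rng.choice(sorted(DISTRIBUTIONS))
    spec = DISTRIBUTIONS[name]
    params = spec["params"](rng)
    args = ", ".join(f"{p:.2f}" for p in params)
    model = f"y ~ {name}({args});"                    # concrete model text
    data = [round(spec["sample"](rng, params), 3) for _ in range(n_data)]
    return model, data

rng = random.Random(42)
for _ in range(3):
    model, data = instantiate(rng)
    print(model, data)
# Each generated (program, data) pair would then be translated to a concrete
# PP system, executed, and checked against oracles such as an accuracy checker.
```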
|
Lehmann, Daniel |
ESEC/FSE '18: "Feedback-Directed Differential ..."
Feedback-Directed Differential Testing of Interactive Debuggers
Daniel Lehmann and Michael Pradel (TU Darmstadt, Germany) To understand, localize, and fix programming errors, developers often rely on interactive debuggers. However, as debuggers are software, they may themselves have bugs, which can make debugging unnecessarily hard or even cause developers to reason about bugs that do not actually exist in their code. This paper presents the first automated testing technique for interactive debuggers. The problem of testing debuggers is fundamentally different from the well-studied problem of testing compilers because debuggers are interactive and because they lack a specification of expected behavior. Our approach, called DBDB, generates debugger actions to exercise the debugger and records traces that summarize the debugger's behavior. By comparing traces of multiple debuggers with each other, we find diverging behavior that points to bugs and other noteworthy differences. We evaluate DBDB on the JavaScript debuggers of Firefox and Chromium, finding 19 previously unreported bugs, eight of which are already fixed by the developers. @InProceedings{ESEC/FSE18p610, author = {Daniel Lehmann and Michael Pradel}, title = {Feedback-Directed Differential Testing of Interactive Debuggers}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {610--620}, doi = {10.1145/3236024.3236037}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
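At its core, the comparison step is a diff over recorded traces. A minimal Python sketch follows, with invented trace entries standing in for the summarized debugger behavior:

```python
# Report the first point at which two debuggers' recorded traces diverge.
def first_divergence(trace_a, trace_b):
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i, a, b
    if len(trace_a) != len(trace_b):          # one trace is a prefix of the other
        i = min(len(trace_a), len(trace_b))
        longer = trace_a if len(trace_a) > len(trace_b) else trace_b
        return i, longer[i], "<missing>"
    return None                               # traces agree

debugger_a = [("pause", "line", 3), ("step", "line", 4), ("eval x", 7)]
debugger_b = [("pause", "line", 3), ("step", "line", 5), ("eval x", 7)]
print(first_divergence(debugger_a, debugger_b))
# -> (1, ('step', 'line', 4), ('step', 'line', 5)): a candidate debugger bug,
#    or at least a noteworthy behavioral difference to triage.
```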
|
Li, Chenggang |
ESEC/FSE '18: "Predicting Node Failure in ..."
Predicting Node Failure in Cloud Service Systems
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, and Dongmei Zhang (Microsoft Research, China; Microsoft, USA; University of Newcastle, Australia) In recent years, many traditional software systems have migrated to cloud computing platforms and are provided as online services. The service quality matters because system failures could seriously affect business and user experience. A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability. Predicting node failure in cloud service systems is challenging, because a node failure could be caused by a variety of reasons and reflected by many temporal and spatial signals. Furthermore, the failure data is highly imbalanced. To tackle these challenges, we propose MING, a novel technique that combines: 1) an LSTM model to incorporate the temporal data; 2) a Random Forest model to incorporate spatial data; 3) a ranking model that embeds the intermediate results of the two models as feature inputs and ranks the nodes by their failure-proneness; and 4) a cost-sensitive function to identify the optimal threshold for selecting the faulty nodes. We evaluate our approach using real-world data collected from a cloud service system. The results confirm the effectiveness of the proposed approach. We have also successfully applied the proposed approach in real industrial practice. @InProceedings{ESEC/FSE18p480, author = {Qingwei Lin and Ken Hsieh and Yingnong Dang and Hongyu Zhang and Kaixin Sui and Yong Xu and Jian-Guang Lou and Chenggang Li and Youjiang Wu and Randolph Yao and Murali Chintalapati and Dongmei Zhang}, title = {Predicting Node Failure in Cloud Service Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {480--490}, doi = {10.1145/3236024.3236060}, year = {2018}, } Publisher's Version |
|
Li, Feng |
ESEC/FSE '18: "CloudRaid: Hunting Concurrency ..."
CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining
Jie Lu, Feng Li, Lian Li, and Xiaobing Feng (Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Institute of Information Engineering at Chinese Academy of Sciences, China) Cloud systems suffer from distributed concurrency bugs, which are notoriously difficult to detect and often lead to data loss and service outage. This paper presents CloudRaid, a new effective tool to battle distributed concurrency bugs. CloudRaid automatically detects concurrency bugs in cloud systems, by analyzing and testing those message orderings that are likely to expose errors. We observe that large-scale online cloud applications process millions of user requests per second, exercising many permutations of message orderings extensively. Those already sufficiently-tested message orderings are unlikely to expose errors. Hence, CloudRaid mines logs from previous executions to uncover those message orderings which are feasible, but not sufficiently tested. Specifically, CloudRaid tries to flip the order of a pair of messages <S,P> if they may happen in parallel, but S always arrives before P from existing logs, i.e., exercising the order P ↣ S. The log-based approach makes it suitable for live systems. We have applied CloudRaid to automatically test four representative distributed systems: Apache Hadoop2/Yarn, HBase, HDFS and Cassandra. CloudRaid can automatically test 40 different versions of the 4 systems (10 versions per system) in 35 hours, and can successfully trigger 28 concurrency bugs, including 8 new bugs that have never been found before. The 8 new bugs have all been confirmed by their original developers, and 3 of them are considered as critical bugs that have already been fixed. @InProceedings{ESEC/FSE18p3, author = {Jie Lu and Feng Li and Lian Li and Xiaobing Feng}, title = {CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {3--14}, doi = {10.1145/3236024.3236071}, year = {2018}, } Publisher's Version |
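The mining step can be pictured with a compact sketch: among message pairs that may happen in parallel (taken as input here; the tool computes this with a separate analysis), keep those observed in only one order across all logs, since the unobserved order is the under-tested one worth exercising. The logs and message names below are invented:

```python
from itertools import combinations

def undertested_pairs(logs, may_happen_in_parallel):
    observed = {}                          # unordered pair -> set of observed orders
    for log in logs:
        position = {msg: i for i, msg in enumerate(log)}
        for s, p in combinations(position, 2):
            pair = frozenset((s, p))
            if pair not in may_happen_in_parallel:
                continue
            order = (s, p) if position[s] < position[p] else (p, s)
            observed.setdefault(pair, set()).add(order)
    # Pairs seen in a single order: flip that order in a controlled re-execution.
    return [next(iter(orders)) for orders in observed.values() if len(orders) == 1]

logs = [
    ["RegisterNode", "AllocateContainer", "Heartbeat"],
    ["RegisterNode", "Heartbeat", "AllocateContainer"],
]
mhp = {frozenset(("RegisterNode", "AllocateContainer")),
       frozenset(("AllocateContainer", "Heartbeat"))}
print(undertested_pairs(logs, mhp))
# -> [('RegisterNode', 'AllocateContainer')]: S always arrived before P, so the
#    flipped order P -> S is the candidate schedule to test.
```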
|
Li, Li |
ESEC/FSE '18: "FraudDroid: Automated Ad Fraud ..."
FraudDroid: Automated Ad Fraud Detection for Android Apps
Feng Dong, Haoyu Wang, Li Li, Yao Guo, Tegawendé F. Bissyandé, Tianming Liu, Guoai Xu, and Jacques Klein (Beijing University of Posts and Telecommunications, China; Monash University, Australia; Peking University, China; University of Luxembourg, Luxembourg) Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widely spread in apps, have not yet been explored nor addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose, FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffics, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (∼ 93%) and recall (∼ 92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection. @InProceedings{ESEC/FSE18p257, author = {Feng Dong and Haoyu Wang and Li Li and Yao Guo and Tegawendé F. Bissyandé and Tianming Liu and Guoai Xu and Jacques Klein}, title = {FraudDroid: Automated Ad Fraud Detection for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--268}, doi = {10.1145/3236024.3236045}, year = {2018}, } Publisher's Version |
|
Li, Lian |
ESEC/FSE '18: "CloudRaid: Hunting Concurrency ..."
CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining
Jie Lu, Feng Li, Lian Li, and Xiaobing Feng (Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Institute of Information Engineering at Chinese Academy of Sciences, China) Cloud systems suffer from distributed concurrency bugs, which are notoriously difficult to detect and often lead to data loss and service outage. This paper presents CloudRaid, a new effective tool to battle distributed concurrency bugs. CloudRaid automatically detects concurrency bugs in cloud systems, by analyzing and testing those message orderings that are likely to expose errors. We observe that large-scale online cloud applications process millions of user requests per second, exercising many permutations of message orderings extensively. Those already sufficiently-tested message orderings are unlikely to expose errors. Hence, CloudRaid mines logs from previous executions to uncover those message orderings which are feasible, but not sufficiently tested. Specifically, CloudRaid tries to flip the order of a pair of messages <S,P> if they may happen in parallel, but S always arrives before P from existing logs, i.e., exercising the order P ↣ S. The log-based approach makes it suitable for live systems. We have applied CloudRaid to automatically test four representative distributed systems: Apache Hadoop2/Yarn, HBase, HDFS and Cassandra. CloudRaid can automatically test 40 different versions of the 4 systems (10 versions per system) in 35 hours, and can successfully trigger 28 concurrency bugs, including 8 new bugs that have never been found before. The 8 new bugs have all been confirmed by their original developers, and 3 of them are considered as critical bugs that have already been fixed. @InProceedings{ESEC/FSE18p3, author = {Jie Lu and Feng Li and Lian Li and Xiaobing Feng}, title = {CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {3--14}, doi = {10.1145/3236024.3236071}, year = {2018}, } Publisher's Version |
|
Li, Lingbo |
ESEC/FSE '18: "Darwinian Data Structure Selection ..."
Darwinian Data Structure Selection
Michail Basios, Lingbo Li, Fan Wu, Leslie Kanthan, and Earl T. Barr (University College London, UK) Data structure selection and tuning is laborious but can vastly improve an application’s performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce ARTEMIS, a multi-objective, cloud-based search-based optimisation framework that automatically finds optimal, tuned DDS modulo a test suite, then changes an application to use that DDS. ARTEMIS achieves substantial performance improvements for every project in a corpus of 5 Java projects from the DaCapo benchmark, 8 popular projects, and 30 uniformly sampled projects from GitHub. For execution time, CPU usage, and memory consumption, ARTEMIS finds at least one solution that improves all measures for 86% (37/43) of the projects. The median improvement across the best solutions is 4.8%, 10.1%, and 5.1% for runtime, memory, and CPU usage. These aggregate results understate ARTEMIS’s potential impact. Some of the benchmarks it improves are libraries or utility functions. Two examples are gson, a ubiquitous Java serialization framework, and xalan, Apache’s XML transformation tool. ARTEMIS improves gson by 16.5%, 1% and 2.2% for memory, runtime, and CPU; ARTEMIS improves xalan’s memory consumption by 23.5%. Every client of these projects will benefit from these performance improvements. @InProceedings{ESEC/FSE18p118, author = {Michail Basios and Lingbo Li and Fan Wu and Leslie Kanthan and Earl T. Barr}, title = {Darwinian Data Structure Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {118--128}, doi = {10.1145/3236024.3236043}, year = {2018}, } Publisher's Version Info |
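The underlying search evaluates interchangeable implementations of one interface against the application's own workload. A minimal Python sketch of that measurement loop with an invented FIFO workload (the real framework rewrites Java programs and optimizes several objectives at once):

```python
# Time two "Darwinian" implementations of the same queue interface and keep
# the fittest one for this workload.
import timeit
from collections import deque

def fifo_workload(queue_type, n=10_000):
    q = queue_type()
    for i in range(n):
        q.append(i)
    while q:
        q.popleft() if isinstance(q, deque) else q.pop(0)

for name, impl in {"list": list, "deque": deque}.items():
    t = timeit.timeit(lambda: fifo_workload(impl), number=5)
    print(f"{name:6s}: {t:.3f}s")
# deque wins decisively for FIFO use, so it would be selected here; a
# LIFO-heavy workload could favor the plain list instead.
```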
|
Li, Shanping |
ESEC/FSE '18-DEMO: "VT-Revolution: Interactive ..."
VT-Revolution: Interactive Programming Tutorials Made Possible
Lingfeng Bao, Zhenchang Xing, Xin Xia, David Lo, and Shanping Li (Zhejiang University, China; Australian National University, Australia; Monash University, Australia; Singapore Management University, Singapore) Programming video tutorials showcase programming tasks and associated workflows. Although video tutorials are easy to create, it is often difficult to explore the captured workflows and interact with the programs in the videos. In this work, we propose a tool named VTRevolution -- an interactive programming video tutorial authoring system. VTRevolution has two components: 1) a tutorial authoring system leverages operating system level instrumentation to log workflow history while tutorial authors are creating programming video tutorials; 2) a tutorial watching system enhances the learning experience of video tutorials by providing operation history and timeline-based browsing interactions. Our tutorial authoring system does not require any special recording tools or instrumentation of target applications. Neither does it incur any additional burden on tutorial authors to add interactions to video tutorials. Given a video tutorial enriched with synchronously-logged workflow history, our tutorial watching system allows tutorial watchers to explore the captured workflows and interact with files and code in a way that is impossible for video data alone. We conduct a user study of 90 developers to evaluate the design and effectiveness of our system in helping developers learn programming knowledge in video tutorials. @InProceedings{ESEC/FSE18p924, author = {Lingfeng Bao and Zhenchang Xing and Xin Xia and David Lo and Shanping Li}, title = {VT-Revolution: Interactive Programming Tutorials Made Possible}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {924--927}, doi = {10.1145/3236024.3264587}, year = {2018}, } Publisher's Version |
|
Li, Yue |
ESEC/FSE '18: "Scalability-First Pointer ..."
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity
Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis (Aarhus University, Denmark; University of Athens, Greece) Context-sensitivity is important in pointer analysis to ensure high precision, but existing techniques suffer from unpredictable scalability. Many variants of context-sensitivity exist, and it is difficult to choose one that leads to reasonable analysis time and obtains high precision, without running the analysis multiple times. We present the Scaler framework that addresses this problem. Scaler efficiently estimates the amount of points-to information that would be needed to analyze each method with different variants of context-sensitivity. It then selects an appropriate variant for each method so that the total amount of points-to information is bounded, while utilizing the available space to maximize precision. Our experimental results demonstrate that Scaler achieves predictable scalability for all the evaluated programs (e.g., speedups can reach 10x for 2-object-sensitivity), while providing a precision that matches or even exceeds that of the best alternative techniques. @InProceedings{ESEC/FSE18p129, author = {Yue Li and Tian Tan and Anders Møller and Yannis Smaragdakis}, title = {Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--140}, doi = {10.1145/3236024.3236041}, year = {2018}, } Publisher's Version Info Artifacts Functional |
|
Li, Yuekang |
ESEC/FSE '18-DEMO: "FOT: A Versatile, Configurable, ..."
FOT: A Versatile, Configurable, Extensible Fuzzing Framework
Hongxu Chen, Yuekang Li, Bihuan Chen, Yinxing Xue, and Yang Liu (Nanyang Technological University, Singapore; Fudan University, China; University of Science and Technology of China, China) Greybox fuzzing is one of the most effective approaches for detecting software vulnerabilities. Various new techniques have been continuously emerging to enhance the effectiveness and/or efficiency by incorporating novel ideas into different components of a greybox fuzzer. However, there is no modularized fuzzing framework into which new techniques can easily be plugged, which would facilitate the reuse, integration, and comparison of different techniques. To address this problem, we propose a fuzzing framework, namely Fuzzing Orchestration Toolkit (FOT). FOT is designed to be versatile, configurable and extensible. With FOT and its extensions, we have found 111 new bugs from 11 projects. Among these bugs, 18 CVEs have been assigned. Video link: https://youtu.be/O6Qu7BJ8RP0. @InProceedings{ESEC/FSE18p867, author = {Hongxu Chen and Yuekang Li and Bihuan Chen and Yinxing Xue and Yang Liu}, title = {FOT: A Versatile, Configurable, Extensible Fuzzing Framework}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {867--870}, doi = {10.1145/3236024.3264593}, year = {2018}, } Publisher's Version Video Info |
|
Liang, Bin |
ESEC/FSE '18: "NAR-Miner: Discovering Negative ..."
NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection
Pan Bian, Bin Liang, Wenchang Shi, Jianjun Huang, and Yan Cai (Renmin University of China, China; Institute of Software at Chinese Academy of Sciences, China) Inferring programming rules from source code based on data mining techniques has been proven to be effective to detect software bugs. Existing studies focus on discovering positive rules in the form of A ⇒ B, indicating that when operation A appears, operation B should also be here. Unfortunately, the negative rules (A ⇒ ¬ B), indicating the mutual suppression or conflict relationships among program elements, have not gotten the attention they deserve. In fact, violating such negative rules can also result in serious bugs. In this paper, we propose a novel method called NAR-Miner to automatically extract negative association programming rules from large-scale systems, and detect their violations to find bugs. However, mining negative rules faces a more serious rule explosion problem than mining positive ones. Most of the obtained negative rules are uninteresting and can lead to unacceptable false alarms. To address the issue, we design a semantics-constrained mining algorithm to focus rule mining on the elements with strong semantic relationships. Furthermore, we introduce information entropy to rank candidate negative rules and highlight the interesting ones. Consequently, we effectively mitigate the rule explosion problem. We implement NAR-Miner and apply it to a Linux kernel (v4.12-rc6). The experiments show that the uninteresting rules are dramatically reduced and 17 detected violations have been confirmed as real bugs and patched by kernel community. We also apply NAR-Miner to PostgreSQL, OpenSSL and FFmpeg and discover six real bugs. @InProceedings{ESEC/FSE18p411, author = {Pan Bian and Bin Liang and Wenchang Shi and Jianjun Huang and Yan Cai}, title = {NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {411--422}, doi = {10.1145/3236024.3236032}, year = {2018}, } Publisher's Version |
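The core mining step can be sketched in a few lines: treat each function as a transaction of program elements and report pairs that are individually frequent but almost never co-occur. The functions and element names below are invented, and the sketch omits the paper's semantic constraints and entropy-based ranking:

```python
# Toy negative association rule mining: A => NOT B when A and B are frequent
# on their own but rarely appear together.
from itertools import permutations

def negative_rules(transactions, min_support=0.3, min_confidence=0.9):
    n = len(transactions)
    items = {i for t in transactions for i in t}
    support = {i: sum(i in t for t in transactions) / n for i in items}
    rules = []
    for a, b in permutations(items, 2):
        if support[a] < min_support or support[b] < min_support:
            continue
        both = sum(a in t and b in t for t in transactions) / n
        confidence = 1 - both / support[a]          # P(not B | A)
        if confidence >= min_confidence:
            rules.append((a, b, confidence))
    return sorted(rules, key=lambda r: -r[2])

functions = [                        # program elements used in each function
    {"spin_lock", "spin_unlock"},
    {"spin_lock", "spin_unlock", "kmalloc"},
    {"mutex_lock", "mutex_unlock", "kmalloc"},
    {"mutex_lock", "mutex_unlock"},
]
for a, b, c in negative_rules(functions):
    print(f"{a} => NOT {b}   (confidence {c:.2f})")
# A violation of, e.g., spin_lock => NOT mutex_lock (taking a sleeping mutex
# while holding a spinlock) is the kind of candidate bug such rules flag.
```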
|
Liang, Jie |
ESEC/FSE '18-IND: "PAFL: Extend Fuzzing Optimizations ..."
PAFL: Extend Fuzzing Optimizations of Single Mode to Industrial Parallel Mode
Jie Liang, Yu Jiang, Yuanliang Chen, Mingzhe Wang, Chijin Zhou, and Jiaguang Sun (Tsinghua University, China) Researchers have proposed many optimizations to improve the efficiency of fuzzing, and most optimized strategies work very well on their targets when running in single mode with instantiating one fuzzer instance. However, in real industrial practice, most fuzzers run in parallel mode with instantiating multiple fuzzer instances, and those optimizations unfortunately fail to maintain the efficiency improvements. In this paper, we present PAFL, a framework that utilizes efficient guiding information synchronization and task division to extend those existing fuzzing optimizations of single mode to industrial parallel mode. With an additional data structure to store the guiding information, the synchronization ensures the information is shared and updated among different fuzzer instances timely. Then, the task division promotes the diversity of fuzzer instances by splitting the fuzzing task into several sub-tasks based on branch bitmap. We first evaluate PAFL using 12 different real-world programs from Google fuzzer-test-suite. Results show that in parallel mode, two AFL improvers–AFLFast and FairFuzz do not outperform AFL, which is different from the case in single mode. However, when augmented with PAFL, the performance of AFLFast and FairFuzz in parallel mode improves. They cover 8% and 17% more branches, trigger 79% and 52% more unique crashes. For further evaluation on more widely-used software systems from GitHub, optimized fuzzers augmented with PAFL find more real bugs, and 25 of which are security-critical vulnerabilities registered as CVEs in the US National Vulnerability Database. @InProceedings{ESEC/FSE18p809, author = {Jie Liang and Yu Jiang and Yuanliang Chen and Mingzhe Wang and Chijin Zhou and Jiaguang Sun}, title = {PAFL: Extend Fuzzing Optimizations of Single Mode to Industrial Parallel Mode}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {809--814}, doi = {10.1145/3236024.3275525}, year = {2018}, } Publisher's Version |
|
Liblit, Ben |
ESEC/FSE '18: "Code Vectors: Understanding ..."
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, and Thomas Reps (University of Wisconsin-Madison, USA; Microsoft, USA; GrammaTech, USA) With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions. @InProceedings{ESEC/FSE18p163, author = {Jordan Henkel and Shuvendu K. Lahiri and Ben Liblit and Thomas Reps}, title = {Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {163--174}, doi = {10.1145/3236024.3236085}, year = {2018}, } Publisher's Version Info Artifacts Functional |
|
Lima, Bruno |
ESEC/FSE '18-SRC: "Automated Scenario-Based Integration ..."
Automated Scenario-Based Integration Testing of Distributed Systems
Bruno Lima (University of Porto, Portugal; INESC TEC, Portugal) In a growing number of domains, the provisioning of end-to-end services to the users depends on the proper interoperation of multiple systems, forming a new distributed system, often subject to timing constraints. To ensure interoperability and integrity, it is important to conduct integration tests that verify the interactions with the environment and between the system components in key scenarios. To tackle test automation challenges, we propose algorithms for decentralized conformance checking and test input generation, and for checking and enforcing the conditions (local observability and controllability) that allow decentralized test execution. @InProceedings{ESEC/FSE18p956, author = {Bruno Lima}, title = {Automated Scenario-Based Integration Testing of Distributed Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {956--958}, doi = {10.1145/3236024.3275431}, year = {2018}, } Publisher's Version |
|
Lin, Chin-Yew |
ESEC/FSE '18-DEMO: "INFAR: Insight Extraction ..."
INFAR: Insight Extraction from App Reviews
Cuiyun Gao, Jichuan Zeng, David Lo, Chin-Yew Lin, Michael R. Lyu, and Irwin King (Chinese University of Hong Kong, China; Singapore Management University, Singapore; Microsoft Research, China) App reviews play an essential role for users to convey their feedback about using the app. The critical information contained in app reviews can assist app developers for maintaining and updating mobile apps. However, the noisy nature and large-quantity of daily generated app reviews make it difficult to understand essential information carried in app reviews. Several prior studies have proposed methods that can automatically classify or cluster user reviews into a few app topics (e.g., security). These methods usually act on a static collection of user reviews. However, due to the dynamic nature of user feedback (i.e., reviews keep coming as new users register or new app versions being released) and multiple analysis dimensions (e.g., review quantity and user rating), developers still need to spend substantial effort in extracting contrastive information that can only be teased out by comparing data from multiple time periods or analysis dimensions. This is needed to answer questions such as: what kind of issues users are experiencing most? is there an unexpected rise in a particular kind of issue? etc. To address this need, in this paper, we introduce INFAR, a tool that automatically extracts INsights From App Reviews across time periods and analysis dimensions, and presents them in natural language supported by an interactive chart. The insights INFAR extracts include several perspectives: (1) salient topics (i.e., issue topics with significantly lower ratings), (2) abnormal topics (i.e., issue topics that experience a rapid rise in volume during a time period), (3) correlations between two topics, and (4) causal factors to rating or review quantity changes. To evaluate our tool, we conduct an empirical evaluation by involving six popular apps and 12 industrial practitioners, and 92% (11/12) of them approve the practical usefulness of the insights summarized by INFAR. Demo Tool Website: https://remine-lab.github.io/paper/infar.html Demo Video: https://youtu.be/MjcoiyjA5TE @InProceedings{ESEC/FSE18p904, author = {Cuiyun Gao and Jichuan Zeng and David Lo and Chin-Yew Lin and Michael R. Lyu and Irwin King}, title = {INFAR: Insight Extraction from App Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {904--907}, doi = {10.1145/3236024.3264595}, year = {2018}, } Publisher's Version |
|
Lin, Qingwei |
ESEC/FSE '18: "Predicting Node Failure in ..."
Predicting Node Failure in Cloud Service Systems
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, and Dongmei Zhang (Microsoft Research, China; Microsoft, USA; University of Newcastle, Australia) In recent years, many traditional software systems have migrated to cloud computing platforms and are provided as online services. The service quality matters because system failures could seriously affect business and user experience. A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability. Predicting node failure in cloud service systems is challenging, because a node failure could be caused by a variety of reasons and reflected by many temporal and spatial signals. Furthermore, the failure data is highly imbalanced. To tackle these challenges, we propose MING, a novel technique that combines: 1) an LSTM model to incorporate the temporal data; 2) a Random Forest model to incorporate spatial data; 3) a ranking model that embeds the intermediate results of the two models as feature inputs and ranks the nodes by their failure-proneness; and 4) a cost-sensitive function to identify the optimal threshold for selecting the faulty nodes. We evaluate our approach using real-world data collected from a cloud service system. The results confirm the effectiveness of the proposed approach. We have also successfully applied the proposed approach in real industrial practice. @InProceedings{ESEC/FSE18p480, author = {Qingwei Lin and Ken Hsieh and Yingnong Dang and Hongyu Zhang and Kaixin Sui and Yong Xu and Jian-Guang Lou and Chenggang Li and Youjiang Wu and Randolph Yao and Murali Chintalapati and Dongmei Zhang}, title = {Predicting Node Failure in Cloud Service Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {480--490}, doi = {10.1145/3236024.3236060}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Identifying Impactful Service ..." Identifying Impactful Service System Problems via Log Analysis Shilin He, Qingwei Lin, Jian-Guang Lou, Hongyu Zhang, Michael R. Lyu, and Dongmei Zhang (Chinese University of Hong Kong, China; Microsoft Research, China; University of Newcastle, Australia) Logs are often used for troubleshooting in large-scale software systems. For a cloud-based online system that provides 24/7 service, a huge number of logs could be generated every day. However, these logs are highly imbalanced in general, because most logs indicate normal system operations, and only a small percentage of logs reveal impactful problems. Problems that lead to the decline of system KPIs (Key Performance Indicators) are impactful and should be fixed by engineers with a high priority. Furthermore, there are various types of system problems, which are hard to distinguish manually. In this paper, we propose Log3C, a novel clustering-based approach to promptly and precisely identify impactful system problems, by utilizing both log sequences (a sequence of log events) and system KPIs. More specifically, we design a novel cascading clustering algorithm, which can greatly save the clustering time while keeping high accuracy by iteratively sampling, clustering, and matching log sequences. We then identify the impactful problems by correlating the clusters of log sequences with system KPIs. Log3C is evaluated on real-world log data collected from an online service system at Microsoft, and the results confirm its effectiveness and efficiency. Furthermore, our approach has been successfully applied in industrial practice. @InProceedings{ESEC/FSE18p60, author = {Shilin He and Qingwei Lin and Jian-Guang Lou and Hongyu Zhang and Michael R. Lyu and Dongmei Zhang}, title = {Identifying Impactful Service System Problems via Log Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {60--70}, doi = {10.1145/3236024.3236083}, year = {2018}, } Publisher's Version |
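The cascading-clustering idea can be sketched compactly: repeatedly sample a small batch of log sequences, cluster only the sample, then match the remaining sequences against the cluster representatives so they never need full clustering. The distance function, threshold, and log sequences below are invented:

```python
# Toy cascading clustering over log event sequences.
import random

def jaccard_distance(a, b):
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

def cascading_cluster(sequences, sample_size=5, threshold=0.5, seed=0):
    rng = random.Random(seed)
    remaining = list(sequences)
    clusters = []                         # list of (representative, members)
    while remaining:
        rng.shuffle(remaining)
        sample, remaining = remaining[:sample_size], remaining[sample_size:]
        for seq in sample:                # cluster only the small sample
            for rep, members in clusters:
                if jaccard_distance(seq, rep) <= threshold:
                    members.append(seq)
                    break
            else:
                clusters.append((seq, [seq]))
        matched = []
        for seq in remaining:             # cheap matching pass for the rest
            for rep, members in clusters:
                if jaccard_distance(seq, rep) <= threshold:
                    members.append(seq)
                    matched.append(seq)
                    break
        remaining = [s for s in remaining if s not in matched]
    return clusters

logs = [("login", "query", "logout")] * 8 + [("login", "timeout", "retry")] * 3
for rep, members in cascading_cluster(logs):
    print(len(members), rep)
# The resulting cluster sizes would then be correlated with KPI movements to
# single out the impactful problem types.
```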
|
Liu, Chao |
ESEC/FSE '18-DEMO: "EClone: Detect Semantic Clones ..."
EClone: Detect Semantic Clones in Ethereum via Symbolic Transaction Sketch
Han Liu, Zhiqiang Yang, Chao Liu, Yu Jiang, Wenqi Zhao, and Jiaguang Sun (Tsinghua University, China; Peking University, China; Ant Financial, China) The Ethereum ecosystem has fostered a proliferation of smart contract applications in public blockchains, with transparent, traceable and programmable transactions. However, the flexibility of allowing everybody to write and deploy smart contracts on Ethereum has produced a large collection of similar contracts, i.e., clones. In practice, smart contract clones may amplify severe threats such as security attacks and resource waste. In this paper, we have developed EClone, a semantic clone detector for Ethereum. The key insight behind our clone detection is the Symbolic Transaction Sketch, i.e., a set of critical semantic properties generated from a symbolic transaction. The sketches of two smart contracts are normalized into numeric vectors of the same length. Then, the clone detection problem is modeled as a similarity computation process in which sketches and other syntactic information are combined. We have applied EClone to identify semantic clones of deployed Ethereum smart contracts and achieved an accuracy of 93.27%. A demo video of EClone is at https://youtu.be/IRasOVv6vyc. @InProceedings{ESEC/FSE18p900, author = {Han Liu and Zhiqiang Yang and Chao Liu and Yu Jiang and Wenqi Zhao and Jiaguang Sun}, title = {EClone: Detect Semantic Clones in Ethereum via Symbolic Transaction Sketch}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {900--903}, doi = {10.1145/3236024.3264596}, year = {2018}, } Publisher's Version |
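Once the sketches have been normalized to fixed-length numeric vectors, clone detection reduces to a similarity check. A minimal Python illustration with made-up feature values (the real sketch properties come from symbolic transactions and are combined with syntactic information):

```python
# Cosine similarity between normalized sketch vectors of two contracts.
import math

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

sketch_a = [12, 3, 0, 7, 1, 4]           # hypothetical sketch of contract A
sketch_b = [11, 3, 0, 8, 1, 4]           # near-identical contract B
sketch_c = [0, 9, 5, 0, 6, 0]            # unrelated contract C

print(cosine_similarity(sketch_a, sketch_b))   # close to 1.0 -> likely clone
print(cosine_similarity(sketch_a, sketch_c))   # much lower -> not a clone
```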
|
Liu, Han |
ESEC/FSE '18-DEMO: "EClone: Detect Semantic Clones ..."
EClone: Detect Semantic Clones in Ethereum via Symbolic Transaction Sketch
Han Liu, Zhiqiang Yang, Chao Liu, Yu Jiang, Wenqi Zhao, and Jiaguang Sun (Tsinghua University, China; Peking University, China; Ant Financial, China) The Ethereum ecosystem has fostered a proliferation of smart contract applications in public blockchains, with transparent, traceable and programmable transactions. However, the flexibility of allowing everybody to write and deploy smart contracts on Ethereum has produced a large collection of similar contracts, i.e., clones. In practice, smart contract clones may amplify severe threats such as security attacks and resource waste. In this paper, we have developed EClone, a semantic clone detector for Ethereum. The key insight behind our clone detection is the Symbolic Transaction Sketch, i.e., a set of critical semantic properties generated from a symbolic transaction. The sketches of two smart contracts are normalized into numeric vectors of the same length. Then, the clone detection problem is modeled as a similarity computation process in which sketches and other syntactic information are combined. We have applied EClone to identify semantic clones of deployed Ethereum smart contracts and achieved an accuracy of 93.27%. A demo video of EClone is at https://youtu.be/IRasOVv6vyc. @InProceedings{ESEC/FSE18p900, author = {Han Liu and Zhiqiang Yang and Chao Liu and Yu Jiang and Wenqi Zhao and Jiaguang Sun}, title = {EClone: Detect Semantic Clones in Ethereum via Symbolic Transaction Sketch}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {900--903}, doi = {10.1145/3236024.3264596}, year = {2018}, } Publisher's Version |
|
Liu, Tianming |
ESEC/FSE '18: "FraudDroid: Automated Ad Fraud ..."
FraudDroid: Automated Ad Fraud Detection for Android Apps
Feng Dong, Haoyu Wang, Li Li, Yao Guo, Tegawendé F. Bissyandé, Tianming Liu, Guoai Xu, and Jacques Klein (Beijing University of Posts and Telecommunications, China; Monash University, Australia; Peking University, China; University of Luxembourg, Luxembourg) Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widely spread in apps, have not yet been explored nor addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose, FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffics, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (∼ 93%) and recall (∼ 92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection. @InProceedings{ESEC/FSE18p257, author = {Feng Dong and Haoyu Wang and Li Li and Yao Guo and Tegawendé F. Bissyandé and Tianming Liu and Guoai Xu and Jacques Klein}, title = {FraudDroid: Automated Ad Fraud Detection for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--268}, doi = {10.1145/3236024.3236045}, year = {2018}, } Publisher's Version |
|
Liu, Yang |
ESEC/FSE '18-DEMO: "FOT: A Versatile, Configurable, ..."
FOT: A Versatile, Configurable, Extensible Fuzzing Framework
Hongxu Chen, Yuekang Li, Bihuan Chen, Yinxing Xue, and Yang Liu (Nanyang Technological University, Singapore; Fudan University, China; University of Science and Technology of China, China) Greybox fuzzing is one of the most effective approaches for detecting software vulnerabilities. Various new techniques have been continuously emerging to enhance the effectiveness and/or efficiency by incorporating novel ideas into different components of a greybox fuzzer. However, there is no modularized fuzzing framework into which new techniques can easily be plugged, which would facilitate the reuse, integration, and comparison of different techniques. To address this problem, we propose a fuzzing framework, namely Fuzzing Orchestration Toolkit (FOT). FOT is designed to be versatile, configurable and extensible. With FOT and its extensions, we have found 111 new bugs from 11 projects. Among these bugs, 18 CVEs have been assigned. Video link: https://youtu.be/O6Qu7BJ8RP0. @InProceedings{ESEC/FSE18p867, author = {Hongxu Chen and Yuekang Li and Bihuan Chen and Yinxing Xue and Yang Liu}, title = {FOT: A Versatile, Configurable, Extensible Fuzzing Framework}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {867--870}, doi = {10.1145/3236024.3264593}, year = {2018}, } Publisher's Version Video Info ESEC/FSE '18-IND: "Are Mobile Banking Apps Secure? ..." Are Mobile Banking Apps Secure? What Can Be Improved? Sen Chen, Ting Su, Lingling Fan, Guozhu Meng, Minhui Xue, Yang Liu, and Lihua Xu (East China Normal University, China; Nanyang Technological University, Singapore; Chinese Academy of Sciences, China; Macquarie University, Australia; New York University Shanghai, China) Mobile banking apps, as one of the most contemporary FinTechs, have been widely adopted by banking entities to provide instant financial services. However, our recent work discovered thousands of vulnerabilities in 693 banking apps, which indicates these apps are not as secure as we expected. This motivates us to conduct this study to understand their current security status. First, we take 6 months to track the reporting and patching procedure of these vulnerabilities. Second, we audit 4 state-of-the-art vulnerability detection tools on those patched vulnerabilities. Third, we discuss with 7 banking entities via in-person or online meetings and conduct an online survey to gain more feedback from financial app developers. Through this study, we reveal that (1) people may have inconsistent understandings of the vulnerabilities and different criteria for rating severity; (2) state-of-the-art tools are not effective in detecting vulnerabilities that the banking entities are most concerned about; and (3) more effort should be devoted, across different aspects, to securing banking apps. We believe our study can help bridge the existing gaps, and further motivate different parties, including banking entities, researchers and policy makers, to better tackle security issues altogether. @InProceedings{ESEC/FSE18p797, author = {Sen Chen and Ting Su and Lingling Fan and Guozhu Meng and Minhui Xue and Yang Liu and Lihua Xu}, title = {Are Mobile Banking Apps Secure? What Can Be Improved?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {797--802}, doi = {10.1145/3236024.3275523}, year = {2018}, } Publisher's Version |
|
Liu, Yingqi |
ESEC/FSE '18: "MODE: Automated Neural Network ..."
MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection
Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, and Ananth Grama (Purdue University, USA) Artificial intelligence models are becoming an integral part of modern computing systems. Just like software inevitably has bugs, models have bugs too, leading to poor classification/prediction accuracy. Unlike software bugs, model bugs cannot be easily fixed by directly modifying models. Existing solutions work by providing additional training inputs. However, they have limited effectiveness due to the lack of understanding of model misbehaviors and hence the incapability of selecting proper inputs. Inspired by software debugging, we propose a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing. Our evaluation results on 29 different models for 6 different applications show that our technique can fix model bugs effectively and efficiently without introducing new bugs. For simple applications (e.g., digit recognition), MODE improves the test accuracy from 75% to 93% on average whereas the state-of-the-art can only improve to 85% with 11 times more training time. For complex applications and models (e.g., object recognition), MODE is able to improve the accuracy from 75% to over 91% in minutes to a few hours, whereas state-of-the-art fails to fix the bug or even degrades the test accuracy. @InProceedings{ESEC/FSE18p175, author = {Shiqing Ma and Yingqi Liu and Wen-Chuan Lee and Xiangyu Zhang and Ananth Grama}, title = {MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {175--186}, doi = {10.1145/3236024.3236082}, year = {2018}, } Publisher's Version |
|
Liu, Yu David |
ESEC/FSE '18: "Stochastic Energy Optimization ..."
Stochastic Energy Optimization for Mobile GPS Applications
Anthony Canino, Yu David Liu, and Hidehiko Masuhara (SUNY Binghamton, USA; Tokyo Institute of Technology, Japan) Mobile applications regularly interact with their noisy and ever-changing physical environment. The fundamentally uncertain nature of such interactions leads to significant challenges in energy optimization, a crucial goal of software engineering on mobile devices. This paper presents Aeneas, a novel energy optimization framework for Android in the presence of uncertainty. Aeneas provides a minimalistic programming model where acceptable program behavioral settings are abstracted as knobs and application-specific optimization goals — such as meeting an energy budget — are crystallized as rewards, both of which are directly programmable. At its heart, Aeneas is endowed with a stochastic optimizer to adaptively and intelligently select the reward-optimal knob setting through a form of reinforcement learning. We evaluate Aeneas on mobile GPS applications built over Google LocationService API. Through an in-field case study that covers approximately 6500 miles and 150 hours of driving as well as 20 hours of biking and hiking, we find that Aeneas can effectively and resiliently meet programmer-specified energy budgets in uncertain physical environments where individual GPS readings undergo significant fluctuation. Compared with non-stochastic approaches such as profile-guided optimization, Aeneas produces significantly more stable results across runs. @InProceedings{ESEC/FSE18p703, author = {Anthony Canino and Yu David Liu and Hidehiko Masuhara}, title = {Stochastic Energy Optimization for Mobile GPS Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {703--713}, doi = {10.1145/3236024.3236076}, year = {2018}, } Publisher's Version |
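The knob/reward abstraction maps naturally onto a bandit-style learner. The sketch below uses an epsilon-greedy strategy over hypothetical GPS sampling intervals with a made-up energy model and reward; it illustrates the kind of reinforcement learning involved, not Aeneas' actual optimizer or API:

```python
# Epsilon-greedy selection of a sampling knob under an energy budget.
import random

KNOBS = [1, 5, 15, 60]                    # hypothetical GPS intervals (seconds)
BUDGET = 2.0                              # hypothetical energy budget per window

def measure_energy(interval, rng):
    return 10.0 / interval + rng.gauss(0, 0.2)   # made-up noisy energy model

def reward(interval, energy):
    accuracy = 1.0 / interval                    # proxy: frequent fixes = accurate
    overrun = max(0.0, energy - BUDGET)
    return accuracy - 5.0 * overrun              # meet the budget first, then accuracy

def select_knob(episodes=2000, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    value = {k: 0.0 for k in KNOBS}
    count = {k: 0 for k in KNOBS}
    for _ in range(episodes):
        knob = (rng.choice(KNOBS) if rng.random() < epsilon
                else max(KNOBS, key=value.get))
        r = reward(knob, measure_energy(knob, rng))
        count[knob] += 1
        value[knob] += (r - value[knob]) / count[knob]   # incremental mean
    return max(KNOBS, key=value.get), value

best, values = select_knob()
print("selected knob:", best)
print({k: round(v, 3) for k, v in values.items()})
# Converges to the most accurate sampling interval that still stays within the
# (invented) energy budget above.
```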
|
Liu, Zhenwei |
ESEC/FSE '18: "Do the Dependency Conflicts ..."
Do the Dependency Conflicts in My Project Matter?
Ying Wang, Ming Wen, Zhenwei Liu, Rongxin Wu, Rui Wang, Bo Yang, Hai Yu, Zhiliang Zhu, and Shing-Chi Cheung (Northeastern University, China; Hong Kong University of Science and Technology, China) Intensive dependencies of a Java project on third-party libraries can easily lead to the presence of multiple library or class versions on its classpath. When this happens, JVM will load one version and shadows the others. Dependency conflict (DC) issues occur when the loaded version fails to cover a required feature (e.g., method) referenced by the project, thus causing runtime exceptions. However, the warnings of duplicate classes or libraries detected by existing build tools such as Maven can be benign since not all instances of duplication will induce runtime exceptions, and hence are often ignored by developers. In this paper, we conducted an empirical study on real-world DC issues collected from large open source projects. We studied the manifestation and fixing patterns of DC issues. Based on our findings, we designed Decca, an automated detection tool that assesses DC issues' severity and filters out the benign ones. Our evaluation results on 30 projects show that Decca achieves a precision of 0.923 and recall of 0.766 in detecting high-severity DC issues. Decca also detected new DC issues in these projects. Subsequently, 20 DC bug reports were filed, and 11 of them were confirmed by developers. Issues in 6 reports were fixed with our suggested patches. @InProceedings{ESEC/FSE18p319, author = {Ying Wang and Ming Wen and Zhenwei Liu and Rongxin Wu and Rui Wang and Bo Yang and Hai Yu and Zhiliang Zhu and Shing-Chi Cheung}, title = {Do the Dependency Conflicts in My Project Matter?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {319--330}, doi = {10.1145/3236024.3236056}, year = {2018}, } Publisher's Version Info Artifacts Reusable Artifacts Functional |
|
Llerena, Yamilet R. Serrano |
ESEC/FSE '18: "Verifying the Long-Run Behavior ..."
Verifying the Long-Run Behavior of Probabilistic System Models in the Presence of Uncertainty
Yamilet R. Serrano Llerena, Marcel Böhme, Marc Brünink, Guoxin Su, and David S. Rosenblum (National University of Singapore, Singapore; Monash University, Australia; University of Wollongong, Australia) Verifying that a stochastic system is in a certain state when it has reached equilibrium has important applications. For instance, the probabilistic verification of the long-run behavior of a safety-critical system enables assessors to check whether it accepts a human abort-command at any time with a probability that is sufficiently high. The stochastic system is represented as probabilistic model, a long-run property is asserted and a probabilistic verifier checks the model against the property. However, existing probabilistic verifiers do not account for the imprecision of the probabilistic parameters in the model. Due to uncertainty, the probability of any state transition may be subject to small perturbations which can have direct consequences for the veracity of the verification result. In reality, the safety-critical system may accept the abort-command with an insufficient probability. In this paper, we introduce the first probabilistic verification technique that accounts for uncertainty on the verification of long-run properties of a stochastic system. We present a mathematical framework for the asymptotic analysis of the stationary distribution of a discrete-time Markov chain, making no assumptions about the distribution of the perturbations. Concretely, our novel technique computes upper and lower bounds on the long-run probability, given a certain degree of uncertainty about the stochastic system. @InProceedings{ESEC/FSE18p587, author = {Yamilet R. Serrano Llerena and Marcel Böhme and Marc Brünink and Guoxin Su and David S. Rosenblum}, title = {Verifying the Long-Run Behavior of Probabilistic System Models in the Presence of Uncertainty}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {587--597}, doi = {10.1145/3236024.3236078}, year = {2018}, } Publisher's Version |
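The phenomenon the paper bounds analytically is easy to observe numerically: perturbing a DTMC's transition probabilities shifts its stationary distribution and hence any long-run probability computed from it. The brute-force sampling below, with an invented three-state chain, only visualizes this effect; it is not the paper's asymptotic analysis:

```python
# How much can a long-run probability move under bounded perturbation?
import numpy as np

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

P = np.array([[0.90, 0.08, 0.02],        # e.g., states: Active, Idle, Abort
              [0.20, 0.75, 0.05],
              [0.50, 0.00, 0.50]])
print("nominal long-run P(Abort):", stationary(P)[2])

rng = np.random.default_rng(0)
samples = []
for _ in range(2000):
    noise = rng.uniform(-0.01, 0.01, size=P.shape)   # bounded perturbation
    Q = np.clip(P + noise, 0, None)
    Q /= Q.sum(axis=1, keepdims=True)                # keep rows stochastic
    samples.append(stationary(Q)[2])
print("observed range under perturbation:", min(samples), max(samples))
```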
|
Lo, David |
ESEC/FSE '18-DEMO: "VT-Revolution: Interactive ..."
VT-Revolution: Interactive Programming Tutorials Made Possible
Lingfeng Bao, Zhenchang Xing, Xin Xia, David Lo, and Shanping Li (Zhejiang University, China; Australian National University, Australia; Monash University, Australia; Singapore Management University, Singapore) Programming video tutorials showcase programming tasks and associated workflows. Although video tutorials are easy to create, it is often difficult to explore the captured workflows and interact with the programs in the videos. In this work, we propose a tool named VTRevolution -- an interactive programming video tutorial authoring system. VTRevolution has two components: 1) a tutorial authoring system that leverages operating system level instrumentation to log workflow history while tutorial authors are creating programming video tutorials; 2) a tutorial watching system that enhances the learning experience of video tutorials by providing operation history and timeline-based browsing interactions. Our tutorial authoring system does not require any special recording tools or instrumentation of target applications. Neither does it incur any additional burden on tutorial authors to add interactions to video tutorials. Given a video tutorial enriched with synchronously-logged workflow history, our tutorial watching system allows tutorial watchers to explore the captured workflows and interact with files and code in a way that is impossible for video data alone. We conduct a user study of 90 developers to evaluate the design and effectiveness of our system in helping developers learn programming knowledge in video tutorials. @InProceedings{ESEC/FSE18p924, author = {Lingfeng Bao and Zhenchang Xing and Xin Xia and David Lo and Shanping Li}, title = {VT-Revolution: Interactive Programming Tutorials Made Possible}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {924--927}, doi = {10.1145/3236024.3264587}, year = {2018}, } Publisher's Version ESEC/FSE '18-DEMO: "DSM: A Specification Mining ..." DSM: A Specification Mining Tool using Recurrent Neural Network Based Language Model Tien-Duy B. Le, Lingfeng Bao, and David Lo (Singapore Management University, Singapore; Zhejiang University, China) Formal specifications are important but often unavailable. Furthermore, writing these specifications is time-consuming and requires skills from developers. In this work, we present Deep Specification Miner (DSM), an automated tool that applies deep learning to mine finite-state automaton (FSA) based specifications. DSM accepts as input a set of execution traces to train a Recurrent Neural Network Language Model (RNNLM). From the input traces, DSM creates a Prefix Tree Acceptor (PTA) and leverages the inferred RNNLM to extract many features. These features are then forwarded to clustering algorithms to merge similar automata states in the PTA and assemble a number of FSAs. Next, our tool performs a model selection heuristic to approximate the F-measure of the FSAs, and outputs the one with the highest estimated F-measure. Notably, our implementation of DSM provides several options that allow users to optimize the quality of the resultant FSAs. Our video demonstration on the performance of DSM is publicly available at https://goo.gl/Ju4yFS. @InProceedings{ESEC/FSE18p896, author = {Tien-Duy B. Le and Lingfeng Bao and David Lo}, title = {DSM: A Specification Mining Tool using Recurrent Neural Network Based Language Model}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {896--899}, doi = {10.1145/3236024.3264597}, year = {2018}, } Publisher's Version ESEC/FSE '18-DEMO: "INFAR: Insight Extraction ..."
INFAR: Insight Extraction from App Reviews Cuiyun Gao, Jichuan Zeng, David Lo, Chin-Yew Lin, Michael R. Lyu, and Irwin King (Chinese University of Hong Kong, China; Singapore Management University, Singapore; Microsoft Research, China) App reviews play an essential role for users to convey their feedback about using the app. The critical information contained in app reviews can assist app developers for maintaining and updating mobile apps. However, the noisy nature and large-quantity of daily generated app reviews make it difficult to understand essential information carried in app reviews. Several prior studies have proposed methods that can automatically classify or cluster user reviews into a few app topics (e.g., security). These methods usually act on a static collection of user reviews. However, due to the dynamic nature of user feedback (i.e., reviews keep coming as new users register or new app versions being released) and multiple analysis dimensions (e.g., review quantity and user rating), developers still need to spend substantial effort in extracting contrastive information that can only be teased out by comparing data from multiple time periods or analysis dimensions. This is needed to answer questions such as: what kind of issues users are experiencing most? is there an unexpected rise in a particular kind of issue? etc. To address this need, in this paper, we introduce INFAR, a tool that automatically extracts INsights From App Reviews across time periods and analysis dimensions, and presents them in natural language supported by an interactive chart. The insights INFAR extracts include several perspectives: (1) salient topics (i.e., issue topics with significantly lower ratings), (2) abnormal topics (i.e., issue topics that experience a rapid rise in volume during a time period), (3) correlations between two topics, and (4) causal factors to rating or review quantity changes. To evaluate our tool, we conduct an empirical evaluation by involving six popular apps and 12 industrial practitioners, and 92% (11/12) of them approve the practical usefulness of the insights summarized by INFAR. Demo Tool Website: https://remine-lab.github.io/paper/infar.html Demo Video: https://youtu.be/MjcoiyjA5TE @InProceedings{ESEC/FSE18p904, author = {Cuiyun Gao and Jichuan Zeng and David Lo and Chin-Yew Lin and Michael R. Lyu and Irwin King}, title = {INFAR: Insight Extraction from App Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {904--907}, doi = {10.1145/3236024.3264595}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Using Finite-State Models ..." Using Finite-State Models for Log Differencing Hen Amar, Lingfeng Bao, Nimrod Busany, David Lo, and Shahar Maoz (Tel Aviv University, Israel; Zhejiang University, Singapore; Singapore Management University, Singapore) Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. Then, the difference between the logs is the main target of interest. In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. 
Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails. We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study including 60 participants demonstrating the effectiveness of the approach in log differencing tasks. We have further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available in a prototype web-application, for experiments. @InProceedings{ESEC/FSE18p49, author = {Hen Amar and Lingfeng Bao and Nimrod Busany and David Lo and Shahar Maoz}, title = {Using Finite-State Models for Log Differencing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {49--59}, doi = {10.1145/3236024.3236069}, year = {2018}, } Publisher's Version |
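The core signal that 2KDiff surfaces can be approximated in a few lines of Python: the length-k event sequences that occur in one log but not in the other. The sketch below computes that difference for k = 2 over hypothetical traces; the actual algorithm highlights these sequences on top of k-Tails-style finite-state models rather than reporting them as raw sets:

    # Simplified sketch of the signal behind 2KDiff: sequences of k consecutive
    # events that appear in one log but not in the other.
    def k_sequences(log, k=2):
        """All length-k windows over every trace in a log (a list of event lists)."""
        return {tuple(trace[i:i + k]) for trace in log for i in range(len(trace) - k + 1)}

    def k_diff(log_a, log_b, k=2):
        a, b = k_sequences(log_a, k), k_sequences(log_b, k)
        return a - b, b - a

    # Hypothetical traces from two sets of runs.
    old_runs = [["login", "query", "logout"], ["login", "query", "query", "logout"]]
    new_runs = [["login", "query", "retry", "logout"]]
    only_old, only_new = k_diff(old_runs, new_runs, k=2)
    print("only in old runs:", sorted(only_old))
    print("only in new runs:", sorted(only_new))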
|
Long, Yuheng |
ESEC/FSE '18: "Large-Scale Study of Substitutability ..."
Large-Scale Study of Substitutability in the Presence of Effects
Jackson Maddox, Yuheng Long, and Hridesh Rajan (Iowa State University, USA; Google, USA) A majority of modern software is constructed using languages that compute by producing side-effects such as reading/writing from/to files, throwing exceptions, acquiring locks, etc. To understand a piece of software, e.g. a class, it is important for a developer to understand its side-effects. Similarly, to replace a class with another, it is important to understand whether the replacement is a safe substitution for the former in terms of its behavior, a property known as substitutability, because mismatch may lead to bugs. The problem is especially severe for superclass-subclass pairs since at runtime an instance of the subclass may be used in the client code where a superclass is mentioned. Despite the importance of this property, we do not yet know whether substitutability w.r.t. effects between subclass and superclass is preserved in the wild, and if not what sorts of substitutability violations are common and what is the impact of such violations. This paper conducts a large scale study on over 20 million Java classes, in order to compare the effects of the methods of subclasses and superclasses in practice. Our comprehensive study considers the exception, synchronization, I/O, and method call effects. It reveals that in pairs with effects, only 8-24% have the same effects, and 31-56% of submethods have more effects, and the effects of a large percentage of submethods cannot be inferred from the supermethod. @InProceedings{ESEC/FSE18p528, author = {Jackson Maddox and Yuheng Long and Hridesh Rajan}, title = {Large-Scale Study of Substitutability in the Presence of Effects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {528--538}, doi = {10.1145/3236024.3236075}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
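The kind of mismatch the study counts is easy to picture with a toy example. The snippet below is a Python illustration of the general idea only; the study itself analyses Java classes and infers effects statically. It shows a subclass method with an extra exception effect that code written against the superclass never anticipates:

    # Toy illustration (the study itself analyses Java): a submethod with an extra
    # effect, here raising an exception, that clients of the superclass do not expect.
    class Store:
        def save(self, key, value):
            # Effect: writes to a local file.
            with open(f"{key}.txt", "w") as f:
                f.write(value)

    class QuotaStore(Store):
        def save(self, key, value):
            # Additional effect: may throw, which Store.save never does.
            if len(value) > 10:
                raise ValueError("quota exceeded")
            super().save(key, value)

    def client(store: Store):
        # Written against Store's behaviour; passing a QuotaStore can now fail.
        store.save("greeting", "hello world, this is long")

    client(Store())            # fine
    try:
        client(QuotaStore())   # fails: the subclass has effects the superclass lacks
    except ValueError as e:
        print("substitutability violation observed:", e)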
|
Lopes, Cristina V. |
ESEC/FSE '18: "Oreo: Detection of Clones ..."
Oreo: Detection of Clones in the Twilight Zone
Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. Lopes (University of California at Irvine, USA) Source code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. In between Type-3 and Type-4, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect – the Twilight Zone. Most clone detectors reported in the literature fail to operate in this zone. We present Oreo, a novel approach to source code clone detection that not only detects Type-1 to Type-3 clones accurately, but is also capable of detecting harder-to-detect clones in the Twilight Zone. Oreo is built using a combination of machine learning, information retrieval, and software metrics. We evaluate the recall of Oreo on BigCloneBench, and perform manual evaluation for precision. Oreo has both high recall and precision. More importantly, it pushes the boundary in detection of clones with moderate to weak syntactic similarity in a scalable manner. @InProceedings{ESEC/FSE18p354, author = {Vaibhav Saini and Farima Farmahinifarahani and Yadong Lu and Pierre Baldi and Cristina V. Lopes}, title = {Oreo: Detection of Clones in the Twilight Zone}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {354--365}, doi = {10.1145/3236024.3236026}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
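A rough sketch of the software-metrics ingredient that Oreo combines with machine learning and information retrieval: represent each method as a small vector of metrics and compare vectors, so that clones with weak textual overlap can still look similar. The metrics and the two method strings below are illustrative only, not Oreo's actual feature set:

    # Illustrative only: compare two methods by a crude metric vector rather than
    # by their text, the intuition behind metrics-based clone detection.
    import math
    import re

    def metrics(src):
        tokens = re.findall(r"[A-Za-z_]\w*|\S", src)
        idents = [t for t in tokens if re.match(r"[A-Za-z_]\w*$", t)]
        return [
            len(tokens),                 # size
            len(set(idents)),            # distinct identifiers
            src.count("("),              # rough call/expression count
            src.count("if") + src.count("for") + src.count("while"),  # branching
        ]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    m1 = "int sum(int[] xs){int s=0; for(int x: xs) s+=x; return s;}"
    m2 = "int total(int[] values){int acc=0; for(int v: values) acc+=v; return acc;}"
    print("metric similarity:", round(cosine(metrics(m1), metrics(m2)), 3))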
|
Lou, Jian-Guang |
ESEC/FSE '18: "Predicting Node Failure in ..."
Predicting Node Failure in Cloud Service Systems
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, and Dongmei Zhang (Microsoft Research, China; Microsoft, USA; University of Newcastle, Australia) In recent years, many traditional software systems have migrated to cloud computing platforms and are provided as online services. The service quality matters because system failures could seriously affect business and user experience. A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability. Predicting node failure in cloud service systems is challenging, because a node failure could be caused by a variety of reasons and reflected by many temporal and spatial signals. Furthermore, the failure data is highly imbalanced. To tackle these challenges, we propose MING, a novel technique that combines: 1) an LSTM model to incorporate the temporal data, 2) a Random Forest model to incorporate spatial data, 3) a ranking model that embeds the intermediate results of the two models as feature inputs and ranks the nodes by their failure-proneness, and 4) a cost-sensitive function to identify the optimal threshold for selecting the faulty nodes. We evaluate our approach using real-world data collected from a cloud service system. The results confirm the effectiveness of the proposed approach. We have also successfully applied the proposed approach in real industrial practice. @InProceedings{ESEC/FSE18p480, author = {Qingwei Lin and Ken Hsieh and Yingnong Dang and Hongyu Zhang and Kaixin Sui and Yong Xu and Jian-Guang Lou and Chenggang Li and Youjiang Wu and Randolph Yao and Murali Chintalapati and Dongmei Zhang}, title = {Predicting Node Failure in Cloud Service Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {480--490}, doi = {10.1145/3236024.3236060}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Identifying Impactful Service ..." Identifying Impactful Service System Problems via Log Analysis Shilin He, Qingwei Lin, Jian-Guang Lou, Hongyu Zhang, Michael R. Lyu, and Dongmei Zhang (Chinese University of Hong Kong, China; Microsoft Research, China; University of Newcastle, Australia) Logs are often used for troubleshooting in large-scale software systems. For a cloud-based online system that provides 24/7 service, a huge number of logs could be generated every day. However, these logs are highly imbalanced in general, because most logs indicate normal system operations, and only a small percentage of logs reveal impactful problems. Problems that lead to the decline of system KPIs (Key Performance Indicators) are impactful and should be fixed by engineers with a high priority. Furthermore, there are various types of system problems, which are hard to be distinguished manually. In this paper, we propose Log3C, a novel clustering-based approach to promptly and precisely identify impactful system problems, by utilizing both log sequences (a sequence of log events) and system KPIs.
More specifically, we design a novel cascading clustering algorithm, which can greatly save the clustering time while keeping high accuracy by iteratively sampling, clustering, and matching log sequences. We then identify the impactful problems by correlating the clusters of log sequences with system KPIs. Log3C is evaluated on real-world log data collected from an online service system at Microsoft, and the results confirm its effectiveness and efficiency. Furthermore, our approach has been successfully applied in industrial practice. @InProceedings{ESEC/FSE18p60, author = {Shilin He and Qingwei Lin and Jian-Guang Lou and Hongyu Zhang and Michael R. Lyu and Dongmei Zhang}, title = {Identifying Impactful Service System Problems via Log Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {60--70}, doi = {10.1145/3236024.3236083}, year = {2018}, } Publisher's Version |
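The cascading idea in Log3C can be pictured as a sample-cluster-match loop. The Python below is a heavy simplification: a Jaccard distance and a fixed threshold stand in for the paper's sequence representation and matching step, and it is meant only to show why the expensive clustering never has to touch the whole log at once:

    # Highly simplified sketch of cascading clustering: repeatedly cluster a small
    # sample, then match the remaining log sequences against the clusters found so
    # far; only the unmatched ones flow into the next iteration.
    import random

    def distance(a, b):
        """Jaccard distance between two log sequences treated as sets of event ids."""
        sa, sb = set(a), set(b)
        if not sa and not sb:
            return 0.0
        return 1.0 - len(sa & sb) / len(sa | sb)

    def cascading_cluster(sequences, sample_size=50, threshold=0.5, seed=0):
        rng = random.Random(seed)
        remaining = list(sequences)
        representatives, clusters = [], []
        while remaining:
            # 1) sample: cluster only a small random subset
            sample = rng.sample(remaining, min(sample_size, len(remaining)))
            for seq in sample:
                for rep, members in zip(representatives, clusters):
                    if distance(seq, rep) <= threshold:
                        members.append(seq)
                        break
                else:                       # no close cluster: start a new one
                    representatives.append(seq)
                    clusters.append([seq])
            # 2) match: assign the rest to existing clusters if close enough
            sampled = {id(s) for s in sample}
            still_unmatched = []
            for seq in remaining:
                if id(seq) in sampled:
                    continue                # already handled above
                for rep, members in zip(representatives, clusters):
                    if distance(seq, rep) <= threshold:
                        members.append(seq)
                        break
                else:
                    still_unmatched.append(seq)
            remaining = still_unmatched
        return clusters

    logs = [["start", "read", "ok"], ["start", "read", "ok"], ["start", "timeout", "retry"]]
    print([len(c) for c in cascading_cluster(logs, sample_size=2, threshold=0.5)])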
|
Lou, Yiling |
ESEC/FSE '18: "Optimizing Test Prioritization ..."
Optimizing Test Prioritization via Test Distribution Analysis
Junjie Chen, Yiling Lou, Lingming Zhang, Jianyi Zhou, Xiaoleng Wang, Dan Hao, and Lu Zhang (Peking University, China; University of Texas at Dallas, USA; Baidu, China) Test prioritization aims to detect regression faults faster via reordering test executions, and a large number of test prioritization techniques have been proposed accordingly. However, test prioritization effectiveness is usually measured in terms of the average percentage of faults detected concerned with the number of test executions, rather than the actual regression testing time, making it unclear which technique is optimal in actual regression testing time. To answer this question, this paper first conducts an empirical study to investigate the actual regression testing time of various prioritization techniques. The results reveal a number of practical guidelines. In particular, no prioritization technique can always perform optimal in practice. To achieve the optimal prioritization effectiveness for any given project in practice, based on the findings of this study, we design learning-based Predictive Test Prioritization (PTP). PTP predicts the optimal prioritization technique for a given project based on the test distribution analysis (i.e., the distribution of test coverage, testing time, and coverage per unit time). The results show that PTP correctly predicts the optimal prioritization technique for 46 out of 50 open-source projects from GitHub, outperforming state-of-the-art techniques significantly in regression testing time, e.g., 43.16% to 94.92% improvement in detecting the first regression fault. Furthermore, PTP has been successfully integrated into the practical testing infrastructure of Baidu (a search service provider with over 600M monthly active users), and received positive feedbacks from the testing team of this company, e.g., saving beyond 2X testing costs with negligible overheads. @InProceedings{ESEC/FSE18p656, author = {Junjie Chen and Yiling Lou and Lingming Zhang and Jianyi Zhou and Xiaoleng Wang and Dan Hao and Lu Zhang}, title = {Optimizing Test Prioritization via Test Distribution Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {656--667}, doi = {10.1145/3236024.3236053}, year = {2018}, } Publisher's Version |
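One of the classic strategies whose wall-clock behaviour the study measures is greedy prioritization by additional coverage per unit time; PTP's contribution is predicting, from the distribution of coverage and cost, which such strategy will be fastest for a given project. A minimal sketch of that cost-aware greedy strategy with made-up test data:

    # Greedy "additional coverage per unit time" prioritization, one of the classic
    # techniques a predictor like PTP chooses among. Test data below is invented.
    def prioritize(tests):
        """tests: dict name -> (covered elements as a set, running time in seconds)."""
        remaining = dict(tests)
        covered, order = set(), []
        while remaining:
            def gain_per_second(item):
                name, (cov, secs) = item
                return len(cov - covered) / secs
            name, (cov, _) = max(remaining.items(), key=gain_per_second)
            order.append(name)
            covered |= cov
            del remaining[name]
        return order

    suite = {
        "t1": ({"a", "b", "c"}, 3.0),
        "t2": ({"a"}, 0.2),
        "t3": ({"c", "d"}, 1.0),
    }
    print(prioritize(suite))   # ['t2', 't3', 't1']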
|
Lu, Jian |
ESEC/FSE '18: "Testing Multithreaded Programs ..."
Testing Multithreaded Programs via Thread Speed Control
Dongjie Chen, Yanyan Jiang, Chang Xu, Xiaoxing Ma, and Jian Lu (Nanjing University, China) A multithreaded program's interleaving space is discrete and astronomically large, making effectively sampling thread schedules for manifesting concurrency bugs a challenging task. Observing that concurrency bugs can be manifested by adjusting thread relative speeds, this paper presents the new concept of speed space in which each vector denotes a family of thread schedules. A multithreaded program's speed space is approximately continuous, easy-to-sample, and preserves certain categories of concurrency bugs. We discuss the design, implementation, and evaluation of our speed-controlled scheduler for exploring adversarial/abnormal schedules. The experimental results confirm that our technique is effective in sampling diverse schedules. Our implementation also found previously unknown concurrency bugs in real-world multithreaded programs. @InProceedings{ESEC/FSE18p15, author = {Dongjie Chen and Yanyan Jiang and Chang Xu and Xiaoxing Ma and Jian Lu}, title = {Testing Multithreaded Programs via Thread Speed Control}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {15--25}, doi = {10.1145/3236024.3236077}, year = {2018}, } Publisher's Version Info |
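The intuition that relative thread speeds, rather than enumerated discrete schedules, can manifest certain bugs is easy to demonstrate with a toy lost-update race. In the sketch below (our own illustration, not the paper's scheduler), the same unprotected counter update succeeds or loses an update depending on the speed vector assigned to the two threads:

    # Sketch of the idea behind speed control: bias each thread with a relative
    # delay ("speed") and sample speed vectors. The unprotected read-delay-write
    # below loses an update only for some speed assignments.
    import threading
    import time

    counter = 0

    def worker(speed):
        """Smaller 'speed' means a larger offset before the unprotected update."""
        global counter
        time.sleep(0.05 / speed)     # speed-controlled offset
        local = counter              # read
        time.sleep(0.01)             # fixed window inside the unprotected update
        counter = local + 1          # write (may overwrite a concurrent update)

    def run_with_speeds(speeds):
        global counter
        counter = 0
        threads = [threading.Thread(target=worker, args=(s,)) for s in speeds]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return counter

    for speeds in [(1.0, 1.0), (1.0, 5.0), (0.2, 5.0)]:
        print(speeds, "->", run_with_speeds(speeds), "(2 means no lost update)")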
|
Lu, Jie |
ESEC/FSE '18: "CloudRaid: Hunting Concurrency ..."
CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining
Jie Lu, Feng Li, Lian Li, and Xiaobing Feng (Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Institute of Information Engineering at Chinese Academy of Sciences, China) Cloud systems suffer from distributed concurrency bugs, which are notoriously difficult to detect and often lead to data loss and service outage. This paper presents CloudRaid, a new effective tool to battle distributed concurrency bugs. CloudRaid automatically detects concurrency bugs in cloud systems, by analyzing and testing those message orderings that are likely to expose errors. We observe that large-scale online cloud applications process millions of user requests per second, exercising many permutations of message orderings extensively. Those already sufficiently-tested message orderings are unlikely to expose errors. Hence, CloudRaid mines logs from previous executions to uncover those message orderings which are feasible, but not sufficiently tested. Specifically, CloudRaid tries to flip the order of a pair of messages <S,P> if they may happen in parallel, but S always arrives before P from existing logs, i.e., exercising the order P ↣ S. The log-based approach makes it suitable for live systems. We have applied CloudRaid to automatically test four representative distributed systems: Apache Hadoop2/Yarn, HBase, HDFS and Cassandra. CloudRaid can automatically test 40 different versions of the 4 systems (10 versions per system) in 35 hours, and can successfully trigger 28 concurrency bugs, including 8 new bugs that have never been found before. The 8 new bugs have all been confirmed by their original developers, and 3 of them are considered as critical bugs that have already been fixed. @InProceedings{ESEC/FSE18p3, author = {Jie Lu and Feng Li and Lian Li and Xiaobing Feng}, title = {CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {3--14}, doi = {10.1145/3236024.3236071}, year = {2018}, } Publisher's Version |
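The log-mining step can be approximated as: find message pairs <S, P> that co-occur in past traces with S always observed before P, since the flipped order P before S is then untested. The sketch below computes only these always-ordered candidates over hypothetical traces; CloudRaid additionally checks that the pair can really happen in parallel before attempting to flip it:

    # Sketch of the mining step: message pairs consistently ordered in every
    # observed trace are candidates whose reverse order is untested.
    def always_before(traces):
        """traces: list of message-id sequences from previous executions."""
        candidates = None
        for trace in traces:
            pos = {}
            for i, m in enumerate(trace):
                pos.setdefault(m, i)                  # first occurrence of each message id
            ordered = {(a, b) for a in pos for b in pos if a != b and pos[a] < pos[b]}
            # keep only pairs ordered the same way in every trace seen so far
            candidates = ordered if candidates is None else candidates & ordered
        return candidates or set()

    traces = [
        ["register", "heartbeat", "assign", "commit"],
        ["register", "assign", "heartbeat", "commit"],
    ]
    for s, p in sorted(always_before(traces)):
        print(f"always {s} -> {p}: try exercising {p} before {s}")

Note that "heartbeat" and "assign" already appear in both orders in the example logs, so they drop out: only orderings never exercised before survive as candidates.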
|
Lu, Shan |
ESEC/FSE '18-DEMO: "PowerStation: Automatically ..."
PowerStation: Automatically Detecting and Fixing Inefficiencies of Database-Backed Web Applications in IDE
Junwen Yang, Cong Yan, Pranav Subramaniam, Shan Lu, and Alvin Cheung (University of Chicago, USA; University of Washington, USA) Modern web applications are built using a myriad of software components, and each of them exposes different programming models (e.g., application logic expressed in an imperative language, database queries expressed using declarative SQL). To improve programmer productivity, Object Relational Mapping (ORM) frameworks have been developed to allow developers to build web applications in an object-oriented manner. Despite such frameworks, prior work has found that developers still struggle to develop performant ORM-based web applications. This paper presents PowerStation, a RubyMine IDE plugin for optimizing web applications developed using the Ruby on Rails ORM. Using automated static analysis, PowerStation detects ORM-related inefficiency problems and suggests fixes to developers. Our evaluation using 12 real-world applications shows that PowerStation can automatically detect 1221 performance issues across them. A tutorial on using PowerStation can be found at https://youtu.be/rAV8CGuSj6k. @InProceedings{ESEC/FSE18p884, author = {Junwen Yang and Cong Yan and Pranav Subramaniam and Shan Lu and Alvin Cheung}, title = {PowerStation: Automatically Detecting and Fixing Inefficiencies of Database-Backed Web Applications in IDE}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {884--887}, doi = {10.1145/3236024.3264589}, year = {2018}, } Publisher's Version |
|
Lu, Yadong |
ESEC/FSE '18: "Oreo: Detection of Clones ..."
Oreo: Detection of Clones in the Twilight Zone
Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. Lopes (University of California at Irvine, USA) Source code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. In between Type-3 and Type-4, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect – the Twilight Zone. Most clone detectors reported in the literature fail to operate in this zone. We present Oreo, a novel approach to source code clone detection that not only detects Type-1 to Type-3 clones accurately, but is also capable of detecting harder-to-detect clones in the Twilight Zone. Oreo is built using a combination of machine learning, information retrieval, and software metrics. We evaluate the recall of Oreo on BigCloneBench, and perform manual evaluation for precision. Oreo has both high recall and precision. More importantly, it pushes the boundary in detection of clones with moderate to weak syntactic similarity in a scalable manner. @InProceedings{ESEC/FSE18p354, author = {Vaibhav Saini and Farima Farmahinifarahani and Yadong Lu and Pierre Baldi and Cristina V. Lopes}, title = {Oreo: Detection of Clones in the Twilight Zone}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {354--365}, doi = {10.1145/3236024.3236026}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Lyu, Michael R. |
ESEC/FSE '18-DEMO: "INFAR: Insight Extraction ..."
INFAR: Insight Extraction from App Reviews
Cuiyun Gao, Jichuan Zeng, David Lo, Chin-Yew Lin, Michael R. Lyu, and Irwin King (Chinese University of Hong Kong, China; Singapore Management University, Singapore; Microsoft Research, China) App reviews play an essential role for users to convey their feedback about using the app. The critical information contained in app reviews can assist app developers for maintaining and updating mobile apps. However, the noisy nature and large-quantity of daily generated app reviews make it difficult to understand essential information carried in app reviews. Several prior studies have proposed methods that can automatically classify or cluster user reviews into a few app topics (e.g., security). These methods usually act on a static collection of user reviews. However, due to the dynamic nature of user feedback (i.e., reviews keep coming as new users register or new app versions being released) and multiple analysis dimensions (e.g., review quantity and user rating), developers still need to spend substantial effort in extracting contrastive information that can only be teased out by comparing data from multiple time periods or analysis dimensions. This is needed to answer questions such as: what kind of issues users are experiencing most? is there an unexpected rise in a particular kind of issue? etc. To address this need, in this paper, we introduce INFAR, a tool that automatically extracts INsights From App Reviews across time periods and analysis dimensions, and presents them in natural language supported by an interactive chart. The insights INFAR extracts include several perspectives: (1) salient topics (i.e., issue topics with significantly lower ratings), (2) abnormal topics (i.e., issue topics that experience a rapid rise in volume during a time period), (3) correlations between two topics, and (4) causal factors to rating or review quantity changes. To evaluate our tool, we conduct an empirical evaluation by involving six popular apps and 12 industrial practitioners, and 92% (11/12) of them approve the practical usefulness of the insights summarized by INFAR. Demo Tool Website: https://remine-lab.github.io/paper/infar.html Demo Video: https://youtu.be/MjcoiyjA5TE @InProceedings{ESEC/FSE18p904, author = {Cuiyun Gao and Jichuan Zeng and David Lo and Chin-Yew Lin and Michael R. Lyu and Irwin King}, title = {INFAR: Insight Extraction from App Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {904--907}, doi = {10.1145/3236024.3264595}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Identifying Impactful Service ..." Identifying Impactful Service System Problems via Log Analysis Shilin He, Qingwei Lin, Jian-Guang Lou, Hongyu Zhang, Michael R. Lyu, and Dongmei Zhang (Chinese University of Hong Kong, China; Microsoft Research, China; University of Newcastle, Australia) Logs are often used for troubleshooting in large-scale software systems. For a cloud-based online system that provides 24/7 service, a huge number of logs could be generated every day. However, these logs are highly imbalanced in general, because most logs indicate normal system operations, and only a small percentage of logs reveal impactful problems. Problems that lead to the decline of system KPIs (Key Performance Indicators) are impactful and should be fixed by engineers with a high priority. Furthermore, there are various types of system problems, which are hard to be distinguished manually. 
In this paper, we propose Log3C, a novel clustering-based approach to promptly and precisely identify impactful system problems, by utilizing both log sequences (a sequence of log events) and system KPIs. More specifically, we design a novel cascading clustering algorithm, which can greatly save the clustering time while keeping high accuracy by iteratively sampling, clustering, and matching log sequences. We then identify the impactful problems by correlating the clusters of log sequences with system KPIs. Log3C is evaluated on real-world log data collected from an online service system at Microsoft, and the results confirm its effectiveness and efficiency. Furthermore, our approach has been successfully applied in industrial practice. @InProceedings{ESEC/FSE18p60, author = {Shilin He and Qingwei Lin and Jian-Guang Lou and Hongyu Zhang and Michael R. Lyu and Dongmei Zhang}, title = {Identifying Impactful Service System Problems via Log Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {60--70}, doi = {10.1145/3236024.3236083}, year = {2018}, } Publisher's Version |
|
Ma, Shiqing |
ESEC/FSE '18: "MODE: Automated Neural Network ..."
MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection
Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, and Ananth Grama (Purdue University, USA) Artificial intelligence models are becoming an integral part of modern computing systems. Just like software inevitably has bugs, models have bugs too, leading to poor classification/prediction accuracy. Unlike software bugs, model bugs cannot be easily fixed by directly modifying models. Existing solutions work by providing additional training inputs. However, they have limited effectiveness due to the lack of understanding of model misbehaviors and hence the incapability of selecting proper inputs. Inspired by software debugging, we propose a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing. Our evaluation results on 29 different models for 6 different applications show that our technique can fix model bugs effectively and efficiently without introducing new bugs. For simple applications (e.g., digit recognition), MODE improves the test accuracy from 75% to 93% on average whereas the state-of-the-art can only improve to 85% with 11 times more training time. For complex applications and models (e.g., object recognition), MODE is able to improve the accuracy from 75% to over 91% in minutes to a few hours, whereas state-of-the-art fails to fix the bug or even degrades the test accuracy. @InProceedings{ESEC/FSE18p175, author = {Shiqing Ma and Yingqi Liu and Wen-Chuan Lee and Xiangyu Zhang and Ananth Grama}, title = {MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input Selection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {175--186}, doi = {10.1145/3236024.3236082}, year = {2018}, } Publisher's Version |
|
Ma, Xiaoxing |
ESEC/FSE '18: "Testing Multithreaded Programs ..."
Testing Multithreaded Programs via Thread Speed Control
Dongjie Chen, Yanyan Jiang, Chang Xu, Xiaoxing Ma, and Jian Lu (Nanjing University, China) A multithreaded program's interleaving space is discrete and astronomically large, making effectively sampling thread schedules for manifesting concurrency bugs a challenging task. Observing that concurrency bugs can be manifested by adjusting thread relative speeds, this paper presents the new concept of speed space in which each vector denotes a family of thread schedules. A multithreaded program's speed space is approximately continuous, easy-to-sample, and preserves certain categories of concurrency bugs. We discuss the design, implementation, and evaluation of our speed-controlled scheduler for exploring adversarial/abnormal schedules. The experimental results confirm that our technique is effective in sampling diverse schedules. Our implementation also found previously unknown concurrency bugs in real-world multithreaded programs. @InProceedings{ESEC/FSE18p15, author = {Dongjie Chen and Yanyan Jiang and Chang Xu and Xiaoxing Ma and Jian Lu}, title = {Testing Multithreaded Programs via Thread Speed Control}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {15--25}, doi = {10.1145/3236024.3236077}, year = {2018}, } Publisher's Version Info |
|
Maddox, Jackson |
ESEC/FSE '18: "Large-Scale Study of Substitutability ..."
Large-Scale Study of Substitutability in the Presence of Effects
Jackson Maddox, Yuheng Long, and Hridesh Rajan (Iowa State University, USA; Google, USA) A majority of modern software is constructed using languages that compute by producing side-effects such as reading/writing from/to files, throwing exceptions, acquiring locks, etc. To understand a piece of software, e.g. a class, it is important for a developer to understand its side-effects. Similarly, to replace a class with another, it is important to understand whether the replacement is a safe substitution for the former in terms of its behavior, a property known as substitutability, because mismatch may lead to bugs. The problem is especially severe for superclass-subclass pairs since at runtime an instance of the subclass may be used in the client code where a superclass is mentioned. Despite the importance of this property, we do not yet know whether substitutability w.r.t. effects between subclass and superclass is preserved in the wild, and if not what sorts of substitutability violations are common and what is the impact of such violations. This paper conducts a large scale study on over 20 million Java classes, in order to compare the effects of the methods of subclasses and superclasses in practice. Our comprehensive study considers the exception, synchronization, I/O, and method call effects. It reveals that in pairs with effects, only 8-24% have the same effects, and 31-56% of submethods have more effects, and the effects of a large percentage of submethods cannot be inferred from the supermethod. @InProceedings{ESEC/FSE18p528, author = {Jackson Maddox and Yuheng Long and Hridesh Rajan}, title = {Large-Scale Study of Substitutability in the Presence of Effects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {528--538}, doi = {10.1145/3236024.3236075}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
|
Mansoor, Niloofar |
ESEC/FSE '18-IND: "Modeling and Testing a Family ..."
Modeling and Testing a Family of Surgical Robots: An Experience Report
Niloofar Mansoor, Jonathan A. Saddler, Bruno Silva, Hamid Bagheri, Myra B. Cohen, and Shane Farritor (University of Nebraska-Lincoln, USA) Safety-critical applications often use dependability cases to validate that specified properties are invariant, or to demonstrate a counter example showing how that property might be violated. However, most dependability cases are written with a single product in mind. At the same time, software product lines (families of related software products) have been studied with the goal of modeling variability and commonality, and building family based techniques for both analysis and testing. However, there has been little work on building an end to end dependability case for a software product line (where a property is modeled, a counter example is found and then validated as a true positive via testing), and none that we know of in an emerging safety-critical domain, that of robotic surgery. In this paper, we study a family of surgical robots, that combine hardware and software, and are highly configurable, representing over 1300 unique robots. At the same time, they are considered safety-critical and should have associated dependability cases. We perform a case study to understand how we can bring together lightweight formal analysis, feature modeling, and testing to provide an end to end pipeline to find potential violations of important safety properties. In the process, we learned that there are some interesting and open challenges for the research community, which if solved will lead towards more dependable safety-critical cyber-physical systems. @InProceedings{ESEC/FSE18p785, author = {Niloofar Mansoor and Jonathan A. Saddler and Bruno Silva and Hamid Bagheri and Myra B. Cohen and Shane Farritor}, title = {Modeling and Testing a Family of Surgical Robots: An Experience Report}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {785--790}, doi = {10.1145/3236024.3275534}, year = {2018}, } Publisher's Version |
|
Maoz, Shahar |
ESEC/FSE '18: "Using Finite-State Models ..."
Using Finite-State Models for Log Differencing
Hen Amar, Lingfeng Bao, Nimrod Busany, David Lo, and Shahar Maoz (Tel Aviv University, Israel; Zhejiang University, Singapore; Singapore Management University, Singapore) Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. Then, the difference between the logs is the main target of interest. In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails. We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study including 60 participants demonstrating the effectiveness of the approach in log differencing tasks. We have further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available in a prototype web-application, for experiments. @InProceedings{ESEC/FSE18p49, author = {Hen Amar and Lingfeng Bao and Nimrod Busany and David Lo and Shahar Maoz}, title = {Using Finite-State Models for Log Differencing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {49--59}, doi = {10.1145/3236024.3236069}, year = {2018}, } Publisher's Version ESEC/FSE '18: "Modify, Enhance, Select: Co-Evolution ..." Modify, Enhance, Select: Co-Evolution of Combinatorial Models and Test Plans Rachel Tzoref-Brill and Shahar Maoz (Tel Aviv University, Israel; IBM, Israel) The evolution of software introduces many challenges to its testing. Considerable test maintenance efforts are dedicated to the adaptation of the tests to the changing software. As a result, over time, the test repository may inflate and drift away from an optimal test plan for the software version at hand. Combinatorial Testing (CT) is a well-known test design technique to achieve a small and effective test plan. It requires a manual definition of the test space in the form of a combinatorial model, and then automatically generates a test plan design, which maximizes the added value of each of the tests. CT is considered a best practice, however its applicability to evolving software is hardly explored. In this work, we introduce a first co-evolution approach for combinatorial models and test plans. By combining three building blocks, to minimally modify existing tests, to enhance them, and to select from them, we provide five alternatives for co-evolving the test plan with the combinatorial model, considering tradeoffs between maximizing fine-grained reuse and minimizing total test plan size, all while meeting the required combinatorial coverage. 
We use our solution to co-evolve test plans of 48 real-world industrial models with 68 version commits. The results demonstrate the need for co-evolution as well as the efficiency and effectiveness of our approach and its implementation. We further report on an industrial project that found our co-evolution solution necessary to enable adoption of CT with an agile development process. @InProceedings{ESEC/FSE18p235, author = {Rachel Tzoref-Brill and Shahar Maoz}, title = {Modify, Enhance, Select: Co-Evolution of Combinatorial Models and Test Plans}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {235--245}, doi = {10.1145/3236024.3236067}, year = {2018}, } Publisher's Version Info |
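The coverage criterion the co-evolved plans must keep meeting is easy to check mechanically. The sketch below, with an invented model and plan, enumerates the required 2-way parameter-value combinations and reports which ones a candidate plan still misses:

    # Sketch of the combinatorial (2-way) coverage check a co-evolved plan must pass.
    from itertools import combinations, product

    model = {"os": ["linux", "windows"], "db": ["mysql", "postgres"], "tls": ["on", "off"]}

    def required_pairs(model):
        pairs = set()
        for p1, p2 in combinations(sorted(model), 2):
            for v1, v2 in product(model[p1], model[p2]):
                pairs.add(((p1, v1), (p2, v2)))
        return pairs

    def covered_pairs(plan):
        covered = set()
        for test in plan:                       # test: dict parameter -> value
            for a, b in combinations(sorted(test.items()), 2):
                covered.add((a, b))
        return covered

    plan = [
        {"os": "linux", "db": "mysql", "tls": "on"},
        {"os": "windows", "db": "postgres", "tls": "off"},
        {"os": "linux", "db": "postgres", "tls": "off"},
        {"os": "windows", "db": "mysql", "tls": "on"},
    ]
    missing = required_pairs(model) - covered_pairs(plan)
    print("missing 2-way combinations:", sorted(missing) or "none")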
|
Marinov, Darko |
ESEC/FSE '18-DEMO: "ASketch: A Sketching Framework ..."
ASketch: A Sketching Framework for Alloy
Kaiyuan Wang, Allison Sullivan, Darko Marinov, and Sarfraz Khurshid (University of Texas at Austin, USA; University of Illinois at Urbana-Champaign, USA) Alloy is a declarative modeling language that supports first-order logic with transitive closure. Alloy has been used in a variety of domains to model software systems and find design deficiencies. However, it is often challenging to make an Alloy model correct or to debug a faulty Alloy model. ASketch is a sketching/synthesis technique that can help users write correct Alloy models. ASketch allows users to provide a partial Alloy model with holes, a generator that specifies candidate fragments to be considered for each hole, and a set of tests that capture the desired model properties. Then, the tool completes the holes such that all tests for the completed model pass. ASketch uses tests written for the recently introduced AUnit framework, which provides a foundation of testing (unit tests, test execution, and model coverage) for Alloy models in the spirit of traditional unit testing. This paper describes our Java implementation of ASketch, which is a command-line tool, released as an open-source project on GitHub. Our experimental results show that ASketch can handle partial Alloy models with multiple holes and a large search space. The demo video for ASketch can be found at https://youtu.be/T5NIVsV329E. @InProceedings{ESEC/FSE18p916, author = {Kaiyuan Wang and Allison Sullivan and Darko Marinov and Sarfraz Khurshid}, title = {ASketch: A Sketching Framework for Alloy}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {916--919}, doi = {10.1145/3236024.3264594}, year = {2018}, } Publisher's Version |
|
Masuhara, Hidehiko |
ESEC/FSE '18: "Stochastic Energy Optimization ..."
Stochastic Energy Optimization for Mobile GPS Applications
Anthony Canino, Yu David Liu, and Hidehiko Masuhara (SUNY Binghamton, USA; Tokyo Institute of Technology, Japan) Mobile applications regularly interact with their noisy and ever-changing physical environment. The fundamentally uncertain nature of such interactions leads to significant challenges in energy optimization, a crucial goal of software engineering on mobile devices. This paper presents Aeneas, a novel energy optimization framework for Android in the presence of uncertainty. Aeneas provides a minimalistic programming model where acceptable program behavioral settings are abstracted as knobs and application-specific optimization goals — such as meeting an energy budget — are crystallized as rewards, both of which are directly programmable. At its heart, Aeneas is endowed with a stochastic optimizer to adaptively and intelligently select the reward-optimal knob setting through a form of reinforcement learning. We evaluate Aeneas on mobile GPS applications built over Google LocationService API. Through an in-field case study that covers approximately 6500 miles and 150 hours of driving as well as 20 hours of biking and hiking, we find that Aeneas can effectively and resiliently meet programmer-specified energy budgets in uncertain physical environments where individual GPS readings undergo significant fluctuation. Compared with non-stochastic approaches such as profile-guided optimization, Aeneas produces significantly more stable results across runs. @InProceedings{ESEC/FSE18p703, author = {Anthony Canino and Yu David Liu and Hidehiko Masuhara}, title = {Stochastic Energy Optimization for Mobile GPS Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {703--713}, doi = {10.1145/3236024.3236076}, year = {2018}, } Publisher's Version |
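The knob-and-reward programming model can be pictured as a small bandit problem. The sketch below is generic and not Aeneas's API or optimizer: the knob names, energy costs, budget, and epsilon-greedy learner are all stand-ins. Each knob setting is an arm, the reward penalizes exceeding the per-step energy budget, and the selection adapts as rewards fluctuate:

    # Generic knob/reward sketch: epsilon-greedy selection over knob settings.
    import random

    KNOBS = ["high_accuracy_gps", "balanced", "low_power"]
    ENERGY_COST = {"high_accuracy_gps": 5.0, "balanced": 2.5, "low_power": 1.0}   # made-up units

    def reward(knob, budget_per_step):
        """Programmer-defined reward: utility of the setting minus budget violations."""
        utility = {"high_accuracy_gps": 1.0, "balanced": 0.7, "low_power": 0.4}[knob]
        penalty = max(0.0, ENERGY_COST[knob] - budget_per_step)
        return utility - penalty

    def choose(values, counts, eps, rng):
        if rng.random() < eps or not any(counts.values()):
            return rng.choice(KNOBS)                                            # explore
        return max(KNOBS, key=lambda k: values[k] / max(counts[k], 1))          # exploit

    rng = random.Random(1)
    values = {k: 0.0 for k in KNOBS}
    counts = {k: 0 for k in KNOBS}
    for step in range(200):
        knob = choose(values, counts, eps=0.1, rng=rng)
        r = reward(knob, budget_per_step=3.0) + rng.gauss(0, 0.1)   # noisy environment
        values[knob] += r
        counts[knob] += 1
    print("preferred knob:", max(KNOBS, key=lambda k: values[k] / max(counts[k], 1)))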
|
Mathur, Mihir |
ESEC/FSE '18-DEMO: "Augmenting Stack Overflow ..."
Augmenting Stack Overflow with API Usage Patterns Mined from GitHub
Anastasia Reinhardt, Tianyi Zhang, Mihir Mathur, and Miryung Kim (George Fox University, USA; University of California at Los Angeles, USA) Programmers often consult Q&A websites such as Stack Overflow (SO) to learn new APIs. However, online code snippets are not always complete or reliable in terms of API usage. To assess online code snippets, we build a Chrome extension, ExampleCheck that detects API usage violations in SO posts using API usage patterns mined from 380K GitHub projects. It quantifies how many GitHub examples follow common API usage and illustrates how to remedy the detected violation in a given SO snippet. With ExampleCheck, programmers can easily identify the pitfalls of a given SO snippet and learn how much it deviates from common API usage patterns in GitHub. The demo video is at https://youtu.be/WOnN-wQZsH0. @InProceedings{ESEC/FSE18p880, author = {Anastasia Reinhardt and Tianyi Zhang and Mihir Mathur and Miryung Kim}, title = {Augmenting Stack Overflow with API Usage Patterns Mined from GitHub}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {880--883}, doi = {10.1145/3236024.3264585}, year = {2018}, } Publisher's Version Video Info |
|
Mathur, Umang |
ESEC/FSE '18: "Data Race Detection on Compressed ..."
Data Race Detection on Compressed Traces
Dileep Kini, Umang Mathur, and Mahesh Viswanathan (Akuna Capital, USA; University of Illinois at Urbana-Champaign, USA) We consider the problem of detecting data races in program traces that have been compressed using straight line programs (SLP), which are special context-free grammars that generate exactly one string, namely the trace that they represent. We consider two classical approaches to race detection --- using the happens-before relation and the lockset discipline. We present algorithms for both these methods that run in time that is linear in the size of the compressed, SLP representation. Typical program executions almost always exhibit patterns that lead to significant compression. Thus, our algorithms are expected to result in large speedups when compared with analyzing the uncompressed trace. Our experimental evaluation of these new algorithms on standard benchmarks confirms this observation. @InProceedings{ESEC/FSE18p26, author = {Dileep Kini and Umang Mathur and Mahesh Viswanathan}, title = {Data Race Detection on Compressed Traces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {26--37}, doi = {10.1145/3236024.3236025}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
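A straight line program is simply a grammar that derives exactly one string, which is why repetitive traces compress so well. The sketch below only shows the representation and a naive expansion over a hypothetical trace; the paper's point is to run the happens-before and lockset analyses directly on the grammar, in time linear in its size, without ever expanding it:

    # An SLP maps each nonterminal to a sequence of symbols and derives one string.
    def expand(symbol, rules, cache=None):
        """Expand an SLP nonterminal into the (single) trace it represents."""
        cache = {} if cache is None else cache
        if symbol not in rules:            # terminal event, e.g. "rd(x,t1)"
            return [symbol]
        if symbol not in cache:
            out = []
            for s in rules[symbol]:
                out.extend(expand(s, rules, cache))
            cache[symbol] = out
        return cache[symbol]

    # Hypothetical compressed trace: repeated lock-protected accesses compress well.
    rules = {
        "S": ["A", "A", "A", "wr(x,t2)"],
        "A": ["acq(l,t1)", "wr(x,t1)", "rel(l,t1)"],
    }
    print(expand("S", rules))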
|
McMillan, Collin |
ESEC/FSE '18: "Detecting Speech Act Types ..."
Detecting Speech Act Types in Developer Question/Answer Conversations during Bug Repair
Andrew Wood, Paige Rodeghero, Ameer Armaly, and Collin McMillan (University of Notre Dame, USA; Clemson University, USA; Google, USA) This paper targets the problem of speech act detection in conversations about bug repair. We conduct a ``Wizard of Oz'' experiment with 30 professional programmers, in which the programmers fix bugs for two hours, and use a simulated virtual assistant for help. Then, we use an open coding manual annotation procedure to identify the speech act types in the conversations. Finally, we train and evaluate a supervised learning algorithm to automatically detect the speech act types in the conversations. In 30 two-hour conversations, we made 2459 annotations and uncovered 26 speech act types. Our automated detection achieved 69% precision and 50% recall. The key application of this work is to advance the state of the art for virtual assistants in software engineering. Virtual assistant technology is growing rapidly, though applications in software engineering are behind those in other areas, largely due to a lack of relevant data and experiments. This paper targets this problem in the area of developer Q/A conversations about bug repair. @InProceedings{ESEC/FSE18p491, author = {Andrew Wood and Paige Rodeghero and Ameer Armaly and Collin McMillan}, title = {Detecting Speech Act Types in Developer Question/Answer Conversations during Bug Repair}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {491--502}, doi = {10.1145/3236024.3236031}, year = {2018}, } Publisher's Version |
|
McNamara, Andrew |
ESEC/FSE '18-NIER: "Does ACM’s Code of Ethics ..."
Does ACM’s Code of Ethics Change Ethical Decision Making in Software Development?
Andrew McNamara, Justin Smith, and Emerson Murphy-Hill (North Carolina State University, USA) Ethical decisions in software development can substantially impact end-users, organizations, and our environment, as is evidenced by recent ethics scandals in the news. Organizations, like the ACM, publish codes of ethics to guide software-related ethical decisions. In fact, the ACM has recently demonstrated renewed interest in its code of ethics and made updates for the first time since 1992. To better understand how the ACM code of ethics changes software-related decisions, we replicated a prior behavioral ethics study with 63 software engineering students and 105 professional software developers, measuring their responses to 11 ethical vignettes. We found that explicitly instructing participants to consider the ACM code of ethics in their decision making had no observed effect when compared with a control group. Our findings suggest a challenge to the research community: if not a code of ethics, what techniques can improve ethical decision making in software engineering? @InProceedings{ESEC/FSE18p729, author = {Andrew McNamara and Justin Smith and Emerson Murphy-Hill}, title = {Does ACM’s Code of Ethics Change Ethical Decision Making in Software Development?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {729--733}, doi = {10.1145/3236024.3264833}, year = {2018}, } Publisher's Version |
|
Mechtaev, Sergey |
ESEC/FSE '18: "Symbolic Execution with Existential ..."
Symbolic Execution with Existential Second-Order Constraints
Sergey Mechtaev, Alberto Griggio, Alessandro Cimatti, and Abhik Roychoudhury (National University of Singapore, Singapore; Fondazione Bruno Kessler, Italy) Symbolic execution systematically explores program paths by solving path conditions --- formulas over symbolic variables. Typically, the symbolic variables range over numbers, arrays and strings. We introduce symbolic execution with existential second-order constraints --- an extension of traditional symbolic execution that allows symbolic variables to range over functions whose interpretations are restricted by a user-defined language. The aims of this new technique are twofold. First, it offers a general analysis framework that can be applied in multiple domains such as program repair and library modelling. Secondly, it addresses the path explosion problem of traditional first-order symbolic execution in certain applications. To realize this technique, we integrate symbolic execution with program synthesis. Specifically, we propose a method of second-order constraint solving that provides efficient proofs of unsatisfiability, which is critical for the performance of symbolic execution. Our evaluation shows that the proposed technique (1) helps to repair programs with loops by mitigating the path explosion, (2) can enable analysis of applications written against unavailable libraries by modelling these libraries from the usage context. @InProceedings{ESEC/FSE18p389, author = {Sergey Mechtaev and Alberto Griggio and Alessandro Cimatti and Abhik Roychoudhury}, title = {Symbolic Execution with Existential Second-Order Constraints}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {389--399}, doi = {10.1145/3236024.3236049}, year = {2018}, } Publisher's Version |
|
Meijer, Erik |
ESEC/FSE '18-INV: "Behind Every Great Deep Learning ..."
Behind Every Great Deep Learning Framework Is an Even Greater Programming Languages Concept (Keynote)
Erik Meijer (Delft University of Technology, Netherlands; Facebook, USA) In many areas, such as image recognition, natural language processing, search, recommendation, autonomous cars, systems software and infrastructure, and even Software Engineering tools themselves, Software 2.0 (= programming using learned models) is quickly swallowing Software 1.0 (= programming using handcrafted algorithms). Where the Software 1.0 Engineer formally specifies their problem, carefully designs algorithms, composes systems out of subsystems or decomposes complex systems into smaller components, the Software 2.0 Engineer amasses training data and simply feeds it into an ML algorithm that will synthesize an approximation of the function whose partial extensional definition is that training data. Instead of code as the artifact of interest, in Software 2.0 it is all about the data where compilation of source code is replaced by training models with data. This new style of programming has far-reaching consequences for traditional software engineering practices. Everything we have learned about life cycle models, project planning and estimation, requirements analysis, program design, construction, debugging, testing, maintenance and implementation, … runs the danger of becoming obsolete. One way to try to prepare for the new realities of software engineering is not to zero in on the differences between Software 1.0 and Software 2.0 but instead focus on their similarities. If you carefully look at what a neural net actually represents, you realize that in essence it is a pure function, from multi-dimensional arrays of floating point numbers to multi-dimensional arrays of floating point numbers (tensors). What is special about these functions is that they are differentiable (yes, exactly as you remember from middle school calculus), which allows them to be trained using back propagation. The programming language community has also discovered that there is a deep connection between back propagation and continuations. Moreover, when you look closely at how Software 2.0 Engineers construct complex neural nets like CNNs, RNNs, LSTMs, … you recognize they are (implicitly) using high-order combinators like map, fold, zip, scan, recursion, conditionals, function composition, … to compose complex neural network architectures out of simple building blocks. Constructing neural networks using pure and higher-order differentiable functions and training them using reverse-mode automatic differentiation is unsurprisingly called Differentiable Programming. This talk will illustrate the deep programming language principles behind Differentiable Programming, which will hopefully inspire the working Software 1.0 engineer to pay serious attention to the threats and opportunities of Software 2.0. @InProceedings{ESEC/FSE18p1, author = {Erik Meijer}, title = {Behind Every Great Deep Learning Framework Is an Even Greater Programming Languages Concept (Keynote)}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {1--1}, doi = {10.1145/3236024.3280855}, year = {2018}, } Publisher's Version |
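The talk's central observation, that a neural network is a composed, differentiable pure function trained by reverse-mode automatic differentiation, fits in a few dozen lines. The toy below is written in the spirit of that argument and mirrors no particular framework's API:

    # Minimal reverse-mode automatic differentiation over a tiny expression graph.
    import math

    class Value:
        def __init__(self, data, parents=(), backprop=lambda: None):
            self.data, self.grad = data, 0.0
            self._parents, self._backprop = parents, backprop

        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data, (self, other))
            def backprop():
                self.grad += out.grad
                other.grad += out.grad
            out._backprop = backprop
            return out

        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data, (self, other))
            def backprop():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backprop = backprop
            return out

        def tanh(self):
            t = math.tanh(self.data)
            out = Value(t, (self,))
            def backprop():
                self.grad += (1 - t * t) * out.grad
            out._backprop = backprop
            return out

        def backward(self):
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for p in v._parents:
                        visit(p)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):      # propagate gradients output-to-input
                v._backprop()

    # A one-neuron "network" is just a composed, differentiable pure function.
    w, x, b = Value(0.5), Value(2.0), Value(-1.0)
    y = (w * x + b).tanh()
    y.backward()
    print(round(y.data, 4), "dy/dw =", round(w.grad, 4))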
|
Meinicke, Jens |
ESEC/FSE '18-NIER: "Beyond Testing Configurable ..."
Beyond Testing Configurable Systems: Applying Variational Execution to Automatic Program Repair and Higher Order Mutation Testing
Chu-Pan Wong, Jens Meinicke, and Christian Kästner (Carnegie Mellon University, USA; University of Magdeburg, Germany) Generate-and-validate automatic program repair and higher order mutation testing often use search-based techniques to find optimal or good enough solutions in huge search spaces. As search spaces continue to grow, finding solutions that require interactions of multiple changes can become challenging. To tackle the huge search space, we propose to use variational execution. Variational execution has been shown to be effective in exhaustively exploring variations and identifying interactions in a huge but often finite configuration space. The key idea is to encode alternatives in the search space as variations and use variational execution as a black-box technique to generate useful insights so that existing search heuristics can be informed. We show that this idea is promising and identify criteria for problems in which variational execution is a promising tool, which may be useful to identify further applications. @InProceedings{ESEC/FSE18p749, author = {Chu-Pan Wong and Jens Meinicke and Christian Kästner}, title = {Beyond Testing Configurable Systems: Applying Variational Execution to Automatic Program Repair and Higher Order Mutation Testing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {749--753}, doi = {10.1145/3236024.3264837}, year = {2018}, } Publisher's Version |
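A minimal sketch of the underlying idea, under the assumption that candidate edits can be modelled as boolean options: a "variational value" keeps one result per option combination, so a single run reveals which combinations of patches make a test pass. The real technique shares state far more aggressively; the option names and toy oracle below are hypothetical.

    # Toy variational value: one value per configuration of candidate edits.
    from itertools import product

    OPTIONS = ['patch_a', 'patch_b']           # hypothetical candidate edits
    CONFIGS = [dict(zip(OPTIONS, bits))
               for bits in product([False, True], repeat=len(OPTIONS))]

    class VValue:
        """Maps each configuration (frozenset of enabled options) to a value."""
        def __init__(self, fn):
            self.vals = {frozenset(k for k, v in c.items() if v): fn(c) for c in CONFIGS}
        def map2(self, other, op):
            out = VValue.__new__(VValue)
            out.vals = {c: op(v, other.vals[c]) for c, v in self.vals.items()}
            return out

    # Program under repair: result = base + (1 if patch_a) + (2 if patch_b)
    base = VValue(lambda c: 3)
    a = VValue(lambda c: 1 if c['patch_a'] else 0)
    b = VValue(lambda c: 2 if c['patch_b'] else 0)
    result = base.map2(a, lambda x, y: x + y).map2(b, lambda x, y: x + y)

    # Toy oracle: the expected output is 6, which needs both edits together.
    for cfg, value in sorted(result.vals.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(cfg), 'passes' if value == 6 else 'fails')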
|
Meliou, Alexandra |
ESEC/FSE '18-DEMO: "Themis: Automatically Testing ..."
Themis: Automatically Testing Software for Discrimination
Rico Angell, Brittany Johnson, Yuriy Brun, and Alexandra Meliou (University of Massachusetts at Amherst, USA) Bias in decisions made by modern software is becoming a common and serious problem. We present Themis, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior. We explain how Themis can measure discrimination and aid its debugging, describe a set of optimizations Themis uses to reduce test suite size, and demonstrate Themis' effectiveness on open-source software. Themis is open-source and all our evaluation data are available at http://fairness.cs.umass.edu/. See a video of Themis in action: https://youtu.be/brB8wkaUesY @InProceedings{ESEC/FSE18p871, author = {Rico Angell and Brittany Johnson and Yuriy Brun and Alexandra Meliou}, title = {Themis: Automatically Testing Software for Discrimination}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {871--875}, doi = {10.1145/3236024.3264590}, year = {2018}, } Publisher's Version Video Info ESEC/FSE '18-NIER: "Software Fairness ..." Software Fairness Yuriy Brun and Alexandra Meliou (University of Massachusetts at Amherst, USA) A goal of software engineering research is advancing software quality and the success of the software engineering process. However, while recent studies have demonstrated a new kind of defect in software related to its ability to operate in a fair and unbiased manner, software engineering has not yet wholeheartedly tackled these new kinds of defects, thus leaving software vulnerable. This paper outlines a vision for how software engineering research can help reduce fairness defects and represents a call to action by the software engineering research community to reify that vision. Modern software is riddled with examples of biased behavior, from automated translation injecting gender stereotypes, to vision systems failing to see faces of certain races, to the US criminal justice system relying on biased computational assessments of crime recidivism. While systems may learn bias from biased data, bias can also emerge from ambiguous or incomplete requirement specification, poor design, implementation bugs, and unintended component interactions. We argue that software fairness is analogous to software quality, and that numerous software engineering challenges in the areas of requirements, specification, design, testing, and verification need to be tackled to solve this problem. @InProceedings{ESEC/FSE18p754, author = {Yuriy Brun and Alexandra Meliou}, title = {Software Fairness}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {754--759}, doi = {10.1145/3236024.3264838}, year = {2018}, } Publisher's Version Info |
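A minimal sketch of causal discrimination testing in the spirit of Themis, not its actual implementation: generate random inputs, flip only the sensitive attribute, and measure how often the decision changes. The decision function and attribute ranges below are invented for illustration.

    # Causal discrimination score: fraction of inputs whose decision flips when
    # only the sensitive attribute is changed.
    import random

    def decision(applicant):                   # hypothetical software under test
        score = applicant['income'] / 1000 + applicant['credit'] / 100
        if applicant['gender'] == 'female':    # injected bias, for illustration
            score -= 1
        return score > 75

    def causal_discrimination(software, sensitive, values, trials=10000, seed=0):
        rng = random.Random(seed)
        flips = 0
        for _ in range(trials):
            applicant = {'income': rng.uniform(0, 100000),
                         'credit': rng.uniform(300, 850),
                         sensitive: rng.choice(values)}
            flipped = dict(applicant)
            flipped[sensitive] = [v for v in values if v != applicant[sensitive]][0]
            if software(applicant) != software(flipped):
                flips += 1
        return flips / trials

    print('causal discrimination score:',
          causal_discrimination(decision, 'gender', ['female', 'male']))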
|
Meng, Guozhu |
ESEC/FSE '18-IND: "Are Mobile Banking Apps Secure? ..."
Are Mobile Banking Apps Secure? What Can Be Improved?
Sen Chen, Ting Su, Lingling Fan, Guozhu Meng, Minhui Xue, Yang Liu, and Lihua Xu (East China Normal University, China; Nanyang Technological University, Singapore; Chinese Academy of Sciences, China; Macquarie University, Australia; New York University Shanghai, China) Mobile banking apps, as one of the most contemporary FinTechs, have been widely adopted by banking entities to provide instant financial services. However, our recent work discovered thousands of vulnerabilities in 693 banking apps, which indicates these apps are not as secure as we expected. This motivates us to conduct this study to understand their current security status. First, we spend 6 months tracking the reporting and patching process of these vulnerabilities. Second, we audit 4 state-of-the-art vulnerability detection tools on those patched vulnerabilities. Third, we hold in-person or online meetings with 7 banking entities and conduct an online survey to gain further feedback from financial app developers. Through this study, we reveal that (1) people may have inconsistent understandings of the vulnerabilities and different criteria for rating severity; (2) state-of-the-art tools are not effective in detecting the vulnerabilities that most concern the banking entities; and (3) more effort, on several fronts, is needed to secure banking apps. We believe our study can help bridge the existing gaps, and further motivate different parties, including banking entities, researchers, and policy makers, to better tackle security issues together. @InProceedings{ESEC/FSE18p797, author = {Sen Chen and Ting Su and Lingling Fan and Guozhu Meng and Minhui Xue and Yang Liu and Lihua Xu}, title = {Are Mobile Banking Apps Secure? What Can Be Improved?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {797--802}, doi = {10.1145/3236024.3275523}, year = {2018}, } Publisher's Version |
|
Menzies, Tim |
ESEC/FSE '18: "Applications of Psychological ..."
Applications of Psychological Science for Actionable Analytics
Di Chen, Wei Fu, Rahul Krishna, and Tim Menzies (North Carolina State University, USA) According to psychological scientists, humans understand models that most match their own internal models, which they characterize as lists of "heuristics" (i.e., lists of very succinct rules). One such heuristic rule generator is the Fast-and-Frugal Trees (FFT) preferred by psychological scientists. Despite their successful use in many applied domains, FFTs have not been applied in software analytics. Accordingly, this paper assesses FFTs for software analytics. We find that FFTs are remarkably effective in that their models are very succinct (5 lines or less describing a binary decision tree) while also outperforming results from very recent, top-level conference papers. Also, when we restrict training data to operational attributes (i.e., those attributes that are frequently changed by developers), the performance of FFTs is not affected (while the performance of other learners can vary wildly). Our conclusions are twofold. First, there is much that the software analytics community could learn from psychological science. Second, proponents of complex methods should always baseline those methods against simpler alternatives. For example, FFTs could be used as a standard baseline learner against which other software analytics tools are compared. @InProceedings{ESEC/FSE18p456, author = {Di Chen and Wei Fu and Rahul Krishna and Tim Menzies}, title = {Applications of Psychological Science for Actionable Analytics}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {456--467}, doi = {10.1145/3236024.3236050}, year = {2018}, } Publisher's Version |
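For readers unfamiliar with fast-and-frugal trees, the sketch below shows one as a short decision list for defect prediction; the cues and thresholds are made up and are not those learned in the paper.

    # A fast-and-frugal tree: a handful of cues, each with exactly one exit.
    FFT = [
        ('loc',            lambda v: v > 1000, 'defective'),   # exit if large file
        ('recent_changes', lambda v: v > 10,   'defective'),   # exit if churny
        ('num_authors',    lambda v: v <= 1,   'clean'),       # exit if single owner
    ]
    DEFAULT = 'clean'

    def classify(module):
        for cue, test, exit_label in FFT:
            if test(module[cue]):
                return exit_label      # the first satisfied cue decides immediately
        return DEFAULT

    print(classify({'loc': 2500, 'recent_changes': 3, 'num_authors': 4}))  # defective
    print(classify({'loc': 120,  'recent_changes': 2, 'num_authors': 1}))  # clean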
|
Mesbah, Ali |
ESEC/FSE '18-DEMO: "Vista: Web Test Repair using ..."
Vista: Web Test Repair using Computer Vision
Andrea Stocco, Rahulkrishna Yandrapally, and Ali Mesbah (University of British Columbia, Canada) Repairing broken web element locators represents the major maintenance cost of web test cases. To detect possible repairs, testers typically inspect the tests’ interactions with the application under test through the GUI. Existing automated test repair techniques focus instead on the code and ignore visual aspects of the application. In this demo paper, we give an overview of Vista, a novel test repair technique that leverages computer vision and local crawling to automatically suggest and apply repairs to broken web tests. URL: https://github.com/saltlab/Vista @InProceedings{ESEC/FSE18p876, author = {Andrea Stocco and Rahulkrishna Yandrapally and Ali Mesbah}, title = {Vista: Web Test Repair using Computer Vision}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {876--879}, doi = {10.1145/3236024.3264592}, year = {2018}, } Publisher's Version Video ESEC/FSE '18: "Visual Web Test Repair ..." Visual Web Test Repair Andrea Stocco, Rahulkrishna Yandrapally, and Ali Mesbah (University of British Columbia, Canada) Web tests are prone to break frequently as the application under test evolves, causing much maintenance effort in practice. To detect the root causes of a test breakage, developers typically inspect the test's interactions with the application through the GUI. Existing automated test repair techniques focus instead on the code and entirely ignore visual aspects of the application. We propose a test repair technique that is informed by a visual analysis of the application. Our approach captures relevant visual information from test executions and analyzes it through a fast image processing pipeline to visually validate test cases as they are re-executed for regression purposes. Then, it reports the occurrences of breakages and potential fixes to the testers. Our approach is also equipped with a local crawling mechanism to handle non-trivial breakage scenarios such as those that require repairing the test's workflow. We implemented our approach in a tool called Vista. Our empirical evaluation on 2,672 test cases spanning 86 releases of four web applications shows that Vista is able to repair, on average, 81% of the breakages, a 41% increment with respect to existing techniques. @InProceedings{ESEC/FSE18p503, author = {Andrea Stocco and Rahulkrishna Yandrapally and Ali Mesbah}, title = {Visual Web Test Repair}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {503--514}, doi = {10.1145/3236024.3236063}, year = {2018}, } Publisher's Version Video Info |
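A toy sketch of the visual-validation step only (not Vista's pipeline, which also localizes breakages and crawls for repairs): diff two screenshots and report the changed region. It assumes the Pillow imaging library; the synthetic images stand in for before/after screenshots.

    # Flag the screen region that changed between two renderings of a page.
    from PIL import Image, ImageChops

    def changed_region(before, after):
        if before.size != after.size:
            return ('layout changed', before.size, after.size)
        diff = ImageChops.difference(before.convert('RGB'), after.convert('RGB'))
        bbox = diff.getbbox()          # None means the screenshots are identical
        return ('no visual change',) if bbox is None else ('changed region', bbox)

    # In practice these would be screenshots captured during test execution;
    # here two synthetic images stand in for them.
    before = Image.new('RGB', (200, 100), 'white')
    after = before.copy()
    after.paste((255, 0, 0), (50, 20, 80, 40))   # e.g. a button that changed
    print(changed_region(before, after))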
|
Miller, Senecca |
ESEC/FSE '18-IND: "Building Lean Continuous Integration ..."
Building Lean Continuous Integration and Delivery Pipelines by Applying DevOps Principles: A Case Study at Varidesk
Vidroha Debroy, Senecca Miller, and Lance Brimble (Varidesk, USA) Continuous Integration (CI) and Continuous Delivery (CD) are widely considered to be best practices in software development. Studies have shown, however, that adopting these practices can be challenging and that engineers may face many barriers, such as overly long build times, lack of support for desired workflows, and issues with configuration. At Varidesk, we recently began shifting our primary web application from a monolithic to a micro-services-based architecture and also adapted our software development practices to aim for more effective CI/CD. In doing so, we also ran into some of the same aforementioned barriers. In this paper we focus on two specific challenges that we faced – long wait times for builds/releases to be queued and completed, and the lack of support for tooling, especially from a cross-cloud perspective. We then present the solutions that we came up with, which involved re-thinking DevOps as it applied to us, and re-building our own CI/CD pipelines based on DevOps-supporting approaches such as containerization, infrastructure-as-code, and orchestration. Our re-designed pipelines have led us to see speed increases, in terms of total build/release time, in the range of 330x-1110x and have enabled us to seamlessly move from a single-cloud to a multi-cloud environment, with no architectural changes to any apps. @InProceedings{ESEC/FSE18p851, author = {Vidroha Debroy and Senecca Miller and Lance Brimble}, title = {Building Lean Continuous Integration and Delivery Pipelines by Applying DevOps Principles: A Case Study at Varidesk}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {851--856}, doi = {10.1145/3236024.3275528}, year = {2018}, } Publisher's Version |
|
Minku, Leandro L. |
ESEC/FSE '18: "A Novel Automated Approach ..."
A Novel Automated Approach for Software Effort Estimation Based on Data Augmentation
Liyan Song, Leandro L. Minku, and Xin Yao (Southern University of Science and Technology, China; University of Birmingham, UK) Software effort estimation (SEE) usually suffers from a data scarcity problem due to the expensive or long process of data collection. As a result, companies usually have limited projects for effort estimation, causing unsatisfactory prediction performance. Few studies have investigated strategies to generate additional SEE data to aid such learning. We aim to propose a synthetic data generator to address the data scarcity problem of SEE. Our synthetic generator enlarges the SEE data set size by slightly displacing some randomly chosen training examples. It can be used with any SEE method as a data preprocessor. Its effectiveness is demonstrated with 6 state-of-the-art SEE models across 14 SEE data sets. We also compare our data generator against the only existing approach in the SEE literature. Experimental results show that our synthetic projects can significantly improve the performance of some SEE methods, especially when the training data is insufficient. When they cannot significantly improve the prediction performance, they are not detrimental either. Besides, our synthetic data generator is significantly superior to, or performs similarly to, its competitor in the SEE literature. Our data generator thus has a non-harmful, if not significantly beneficial, effect on the SEE methods investigated in this paper and is therefore helpful in addressing the data scarcity problem of SEE. @InProceedings{ESEC/FSE18p468, author = {Liyan Song and Leandro L. Minku and Xin Yao}, title = {A Novel Automated Approach for Software Effort Estimation Based on Data Augmentation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {468--479}, doi = {10.1145/3236024.3236052}, year = {2018}, } Publisher's Version |
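A minimal sketch of the augmentation idea described above, with an assumed noise scale rather than the authors' exact displacement scheme: copy randomly chosen training projects and jitter their numeric features slightly before handing the enlarged set to any SEE model.

    # Enlarge a small effort data set by displacing random training examples.
    import random

    def augment(projects, n_synthetic, noise=0.05, seed=1):
        rng = random.Random(seed)
        synthetic = []
        for _ in range(n_synthetic):
            base = dict(rng.choice(projects))
            for key, value in base.items():
                jitter = 1 + rng.uniform(-noise, noise)   # small relative displacement
                base[key] = value * jitter
            synthetic.append(base)
        return projects + synthetic

    train = [{'size_kloc': 12.0, 'team': 4, 'effort_pm': 30.0},
             {'size_kloc': 3.5,  'team': 2, 'effort_pm': 8.0}]
    print(len(augment(train, n_synthetic=20)))   # 22 examples to feed any SEE model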
|
Misailovic, Sasa |
ESEC/FSE '18: "Testing Probabilistic Programming ..."
Testing Probabilistic Programming Systems
Saikat Dutta, Owolabi Legunsen, Zixin Huang, and Sasa Misailovic (University of Illinois at Urbana-Champaign, USA) Probabilistic programming systems (PP systems) allow developers to model stochastic phenomena and perform efficient inference on the models. The number and adoption of probabilistic programming systems is growing significantly. However, there is no prior study of bugs in these systems and no methodology for systematically testing PP systems. Yet, testing PP systems is highly non-trivial, especially when they perform approximate inference. In this paper, we characterize 118 previously reported bugs in three open-source PP systems—Edward, Pyro and Stan—and propose ProbFuzz, an extensible system for testing PP systems. ProbFuzz allows a developer to specify templates of probabilistic models, from which it generates concrete probabilistic programs and data for testing. ProbFuzz uses language-specific translators to generate these concrete programs, which use the APIs of each PP system. ProbFuzz finds potential bugs by checking the output from running the generated programs against several oracles, including an accuracy checker. Using ProbFuzz, we found 67 previously unknown bugs in recent versions of these PP systems. Developers already accepted 51 bug fixes that we submitted to the three PP systems, and their underlying systems, PyTorch and TensorFlow. @InProceedings{ESEC/FSE18p574, author = {Saikat Dutta and Owolabi Legunsen and Zixin Huang and Sasa Misailovic}, title = {Testing Probabilistic Programming Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {574--586}, doi = {10.1145/3236024.3236057}, year = {2018}, } Publisher's Version |
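A rough sketch of the template idea, not ProbFuzz itself: instantiate a probabilistic-model template with concrete parameters and data, then apply an accuracy oracle that compares the inferred posterior mean against the closed-form conjugate answer. The run_inference callable is a hypothetical stand-in for a PP-system-specific translator and backend.

    # Template instantiation plus an accuracy oracle for a normal-normal model.
    import random

    TEMPLATE = """
    mu ~ Normal({prior_mu}, {prior_sigma})
    for x in data:
        x ~ Normal(mu, {likelihood_sigma})
    """

    def generate_program(rng):
        params = {'prior_mu': round(rng.uniform(-5, 5), 2),
                  'prior_sigma': 1.0,
                  'likelihood_sigma': 1.0}
        data = [rng.gauss(params['prior_mu'], 1.0) for _ in range(50)]
        return TEMPLATE.format(**params), params, data

    def analytic_posterior_mean(params, data):
        # Conjugate normal-normal update, used as the ground-truth oracle.
        n, s2, p2 = len(data), params['likelihood_sigma'] ** 2, params['prior_sigma'] ** 2
        return (params['prior_mu'] / p2 + sum(data) / s2) / (1 / p2 + n / s2)

    def check(run_inference, tolerance=0.3, seed=7):
        rng = random.Random(seed)
        program, params, data = generate_program(rng)
        estimate = run_inference(program, data)   # would call a PP system here
        return abs(estimate - analytic_posterior_mean(params, data)) <= tolerance

    # For illustration only, a fake backend that returns the sample mean:
    print('accuracy oracle passed:', check(lambda program, data: sum(data) / len(data)))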
|
Miucin, Svetozar |
ESEC/FSE '18: "Performance Comprehension ..."
Performance Comprehension at WiredTiger
Alexandra Fedorova, Craig Mustard, Ivan Beschastnikh, Julia Rubin, Augustine Wong, Svetozar Miucin, and Louis Ye (University of British Columbia, Canada) Software debugging is a time-consuming and challenging process. Supporting debugging has been a focus of the software engineering field since its inception with numerous empirical studies, theories, and tools to support developers in this task. Performance bugs and performance debugging is a sub-genre of debugging that has received less attention. In this paper we contribute an empirical case study of performance bug diagnosis in the WiredTiger project, the default database engine behind MongoDB. We perform an in-depth analysis of 44 Jira tickets documenting WiredTiger performance-related issues. We investigate how developers diagnose performance bugs: what information they collect, what tools they use, and what processes they follow. Our findings show that developers spend the majority of their performance debugging time chasing outlier events, such as latency spikes and throughput drops. Yet, they are not properly supported by existing performance debugging tools in this task. We also observe that developers often use tools without knowing in advance whether the obtained information will be relevant to debugging the problem. Therefore, we believe developers can benefit from tools that can be used for unstructured exploration of performance data, rather than for answering specific questions. @InProceedings{ESEC/FSE18p83, author = {Alexandra Fedorova and Craig Mustard and Ivan Beschastnikh and Julia Rubin and Augustine Wong and Svetozar Miucin and Louis Ye}, title = {Performance Comprehension at WiredTiger}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {83--94}, doi = {10.1145/3236024.3236081}, year = {2018}, } Publisher's Version |
|
Møller, Anders |
ESEC/FSE '18: "Practical AJAX Race Detection ..."
Practical AJAX Race Detection for JavaScript Web Applications
Christoffer Quist Adamsen, Anders Møller, Saba Alimadadi, and Frank Tip (Aarhus University, Denmark; Northeastern University, USA) Asynchronous client-server communication is a common source of errors in JavaScript web applications. Such errors are difficult to detect using ordinary testing because of the nondeterministic scheduling of AJAX events. Existing automated event race detectors are generally too imprecise or too inefficient to be practically useful. To address this problem, we present a new approach based on a lightweight combination of dynamic analysis and controlled execution that directly targets identification of harmful AJAX event races. We experimentally demonstrate using our implementation, AjaxRacer, that this approach is capable of automatically detecting harmful AJAX event races in many websites, and producing informative error messages that support diagnosis and debugging. Among 20 widely used web pages that use AJAX, AjaxRacer discovers harmful AJAX races in 12 of them, with a total of 72 error reports, and with very few false positives. @InProceedings{ESEC/FSE18p38, author = {Christoffer Quist Adamsen and Anders Møller and Saba Alimadadi and Frank Tip}, title = {Practical AJAX Race Detection for JavaScript Web Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {38--48}, doi = {10.1145/3236024.3236038}, year = {2018}, } Publisher's Version Info Artifacts Reusable Artifacts Functional ESEC/FSE '18: "Scalability-First Pointer ..." Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis (Aarhus University, Denmark; University of Athens, Greece) Context-sensitivity is important in pointer analysis to ensure high precision, but existing techniques suffer from unpredictable scalability. Many variants of context-sensitivity exist, and it is difficult to choose one that leads to reasonable analysis time and obtains high precision, without running the analysis multiple times. We present the Scaler framework that addresses this problem. Scaler efficiently estimates the amount of points-to information that would be needed to analyze each method with different variants of context-sensitivity. It then selects an appropriate variant for each method so that the total amount of points-to information is bounded, while utilizing the available space to maximize precision. Our experimental results demonstrate that Scaler achieves predictable scalability for all the evaluated programs (e.g., speedups can reach 10x for 2-object-sensitivity), while providing a precision that matches or even exceeds that of the best alternative techniques. @InProceedings{ESEC/FSE18p129, author = {Yue Li and Tian Tan and Anders Møller and Yannis Smaragdakis}, title = {Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--140}, doi = {10.1145/3236024.3236041}, year = {2018}, } Publisher's Version Info Artifacts Functional |
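For the Scaler paper above, here is a much-simplified sketch of the selection step (the cost estimates, method names, and greedy order are all invented, not Scaler's actual algorithm): pick, per method, the most precise context-sensitivity variant whose estimated points-to contribution still fits a global budget.

    # Budget-bounded, per-method choice of context-sensitivity variants.
    VARIANTS = ['2obj', '2type', '1type', 'insensitive']   # most to least precise

    # Estimated points-to facts each method would contribute under each variant.
    ESTIMATES = {
        'Parser.parse': {'2obj': 9_000_000, '2type': 600_000, '1type': 90_000, 'insensitive': 20_000},
        'Util.copy':    {'2obj': 40_000,    '2type': 25_000,  '1type': 12_000, 'insensitive': 6_000},
        'Main.run':     {'2obj': 300_000,   '2type': 120_000, '1type': 50_000, 'insensitive': 15_000},
    }

    def choose_variants(estimates, budget):
        total, choice = 0, {}
        # Assign cheap methods first so one huge method cannot starve the rest.
        for method in sorted(estimates, key=lambda m: estimates[m]['2obj']):
            for variant in VARIANTS:
                cost = estimates[method][variant]
                if total + cost <= budget or variant == 'insensitive':
                    choice[method], total = variant, total + cost
                    break
        return choice, total

    print(choose_variants(ESTIMATES, budget=1_000_000))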
|
Monteiro, Felipe R. |
ESEC/FSE '18-NIER: "Towards Counterexample-Guided ..."
Towards Counterexample-Guided k-Induction for Fast Bug Detection
Mikhail R. Gadelha, Felipe R. Monteiro, Lucas C. Cordeiro, and Denis A. Nicole (University of Southampton, UK; Federal University of Amazonas, Brazil; University of Manchester, UK) Recently, the k-induction algorithm has proven to be a successful approach for both finding bugs and proving correctness. However, since the algorithm is an incremental approach, it might waste resources trying to prove incorrect programs. In this paper, we extend the k-induction algorithm to shorten the number of steps required to find a property violation. We convert the algorithm into a meet-in-the-middle bidirectional search algorithm, using the counterexample produced from over-approximating the program. The main advantage is in the reduction of the state explosion by reducing the maximum required steps from k to ⌊k/2 + 1⌋. @InProceedings{ESEC/FSE18p765, author = {Mikhail R. Gadelha and Felipe R. Monteiro and Lucas C. Cordeiro and Denis A. Nicole}, title = {Towards Counterexample-Guided k-Induction for Fast Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {765--769}, doi = {10.1145/3236024.3264840}, year = {2018}, } Publisher's Version Video Info |
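The snippet below illustrates only the meet-in-the-middle intuition on an explicit-state toy system, not the paper's k-induction algorithm or its counterexample-guided construction: search forward from the initial states and backward from the violating states for roughly half the bound each, and report a violation as soon as the two frontiers meet.

    # Meet-in-the-middle reachability on a toy transition system.
    def step(state):                 # toy transition relation: a counter mod 100
        return [(state + 1) % 100, (state + 3) % 100]

    def pre(state):                  # backward image of the same relation
        return [(state - 1) % 100, (state - 3) % 100]

    def meet_in_the_middle(inits, bad, k):
        forward, backward = set(inits), set(bad)
        for depth in range(k // 2 + 1):
            if forward & backward:
                return f'violation reachable within {k} steps (frontiers met at depth {depth})'
            forward |= {s2 for s in forward for s2 in step(s)}
            backward |= {s2 for s in backward for s2 in pre(s)}
        return f'no violation found up to bound {k}'

    print(meet_in_the_middle(inits=[0], bad=[97], k=40))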
|
Mori, Akira |
ESEC/FSE '18: "Automated Patch Extraction ..."
Automated Patch Extraction via Syntax- and Semantics-Aware Delta Debugging on Source Code Changes
Masatomo Hashimoto, Akira Mori, and Tomonori Izumida (Chiba Institute of Technology, Japan; AIST, Japan; IIJ Innovation Institute, Japan) Delta debugging (DD) is an approach to automating the debugging activities based on systematic testing. DD algorithms find the cause of a regression of a program by minimizing the changes applied between a working version and a faulty version of the program. However, it is still an open problem to minimize a huge set of changes while avoiding any invalid subsets that do not result in testable programs, especially in case that no software configuration management system is available. In this paper, we propose a rule-based approach to syntactic and semantic decomposition of changes into independent components to facilitate DD on source code changes, and hence to extract patches automatically. For analyzing changes, we make use of tree differencing on abstract syntax trees instead of common differencing on plain texts. We have developed an experimental implementation for Java programs and applied it to 194 bug fixes from Defects4J and 8 real-life regression bugs from 6 open source Java projects. Compared to a DD tool based on plain text differencing, it extracted patches whose size is reduced by 50% at the cost of 5% more test executions for the former dataset and by 73% at the cost of 40% more test executions for the latter, both on average. @InProceedings{ESEC/FSE18p598, author = {Masatomo Hashimoto and Akira Mori and Tomonori Izumida}, title = {Automated Patch Extraction via Syntax- and Semantics-Aware Delta Debugging on Source Code Changes}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {598--609}, doi = {10.1145/3236024.3236047}, year = {2018}, } Publisher's Version Info |
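Assuming the paper's syntax- and semantics-aware decomposition has already split a diff into independently applicable components, a complement-only variant of the classic ddmin loop over those components looks roughly as follows; the fails predicate is a hypothetical build-and-test step, and the toy oracle is invented.

    # Minimize a failure-inducing set of change components (complement-only ddmin).
    def ddmin(components, fails):
        n = 2
        while len(components) >= 2:
            chunk = max(1, len(components) // n)
            subsets = [components[i:i + chunk] for i in range(0, len(components), chunk)]
            reduced = False
            for subset in subsets:
                complement = [c for c in components if c not in subset]
                if complement and fails(complement):
                    # The regression still occurs without this subset: drop it.
                    components, n, reduced = complement, max(n - 1, 2), True
                    break
            if not reduced:
                if n >= len(components):
                    break              # finest granularity reached: 1-minimal
                n = min(len(components) * 2, 2 * n)
        return components

    changes = [f'hunk_{i}' for i in range(8)]
    # Toy oracle: the regression appears whenever hunks 2 and 5 are both applied.
    print(ddmin(changes, lambda cs: 'hunk_2' in cs and 'hunk_5' in cs))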
|
Mougouei, Davoud |
ESEC/FSE '18-NIER: "Operationalizing Human Values ..."
Operationalizing Human Values in Software: A Research Roadmap
Davoud Mougouei, Harsha Perera, Waqar Hussain, Rifat Shams, and Jon Whittle (Monash University, Australia) Software influences several aspects of people's lives and therefore needs to reflect their values. However, existing software engineering methods fail to account for human values, which may result in breaching those values in software and, therefore, dissatisfaction of users and loss of profit and reputation. To avoid such negative consequences, human values need to be integrated -- in a verifiable way -- into software. We refer to this as Operationalizing Human Values in Software. But this is not easy to achieve due to three main obstacles: first, human values are hard to define in a way that can be put into practice; second, existing software design decisions are mainly ignorant of values; finally, values are hard to determine and quantify in software. This paper aims to establish a research roadmap for overcoming these obstacles. The proposed roadmap focuses on (i) establishing practical definitions for human values, (ii) integrating values into software design, and (iii) measuring values in the software development life cycle. @InProceedings{ESEC/FSE18p780, author = {Davoud Mougouei and Harsha Perera and Waqar Hussain and Rifat Shams and Jon Whittle}, title = {Operationalizing Human Values in Software: A Research Roadmap}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {780--784}, doi = {10.1145/3236024.3264843}, year = {2018}, } Publisher's Version |
|
Murphy-Hill, Emerson |
ESEC/FSE '18: "How Should Compilers Explain ..."
How Should Compilers Explain Problems to Developers?
Titus Barik, Denae Ford, Emerson Murphy-Hill, and Chris Parnin (Microsoft, USA; North Carolina State University, USA) Compilers primarily give feedback about problems to developers through the use of error messages. Unfortunately, developers routinely find these messages to be confusing and unhelpful. In this paper, we postulate that because error messages present poor explanations, theories of explanation---such as Toulmin's model of argument---can be applied to improve their quality. To understand how compilers should present explanations to developers, we conducted a comparative evaluation with 68 professional software developers and an empirical study of compiler error messages found in Stack Overflow questions across seven different programming languages. Our findings suggest that, given a pair of error messages, developers significantly prefer the error message that employs proper argument structure over a deficient argument structure when neither offers a resolution---but will accept a deficient argument structure if it provides a resolution to the problem. Human-authored explanations on Stack Overflow converge to one of the three argument structures: those that provide a resolution to the error, simple arguments, and extended arguments that provide additional evidence for the problem. Finally, we contribute three practical design principles to inform the design and evaluation of compiler error messages. @InProceedings{ESEC/FSE18p633, author = {Titus Barik and Denae Ford and Emerson Murphy-Hill and Chris Parnin}, title = {How Should Compilers Explain Problems to Developers?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {633--643}, doi = {10.1145/3236024.3236040}, year = {2018}, } Publisher's Version ESEC/FSE '18-NIER: "Does ACM’s Code of Ethics ..." Does ACM’s Code of Ethics Change Ethical Decision Making in Software Development? Andrew McNamara, Justin Smith, and Emerson Murphy-Hill (North Carolina State University, USA) Ethical decisions in software development can substantially impact end-users, organizations, and our environment, as is evidenced by recent ethics scandals in the news. Organizations, like the ACM, publish codes of ethics to guide software-related ethical decisions. In fact, the ACM has recently demonstrated renewed interest in its code of ethics and made updates for the first time since 1992. To better understand how the ACM code of ethics changes software-related decisions, we replicated a prior behavioral ethics study with 63 software engineering students and 105 professional software developers, measuring their responses to 11 ethical vignettes. We found that explicitly instructing participants to consider the ACM code of ethics in their decision making had no observed effect when compared with a control group. Our findings suggest a challenge to the research community: if not a code of ethics, what techniques can improve ethical decision making in software engineering? @InProceedings{ESEC/FSE18p729, author = {Andrew McNamara and Justin Smith and Emerson Murphy-Hill}, title = {Does ACM’s Code of Ethics Change Ethical Decision Making in Software Development?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {729--733}, doi = {10.1145/3236024.3264833}, year = {2018}, } Publisher's Version |
|
Mustard, Craig |
ESEC/FSE '18: "Performance Comprehension ..."
Performance Comprehension at WiredTiger
Alexandra Fedorova, Craig Mustard, Ivan Beschastnikh, Julia Rubin, Augustine Wong, Svetozar Miucin, and Louis Ye (University of British Columbia, Canada) Software debugging is a time-consuming and challenging process. Supporting debugging has been a focus of the software engineering field since its inception with numerous empirical studies, theories, and tools to support developers in this task. Performance bugs and performance debugging is a sub-genre of debugging that has received less attention. In this paper we contribute an empirical case study of performance bug diagnosis in the WiredTiger project, the default database engine behind MongoDB. We perform an in-depth analysis of 44 Jira tickets documenting WiredTiger performance-related issues. We investigate how developers diagnose performance bugs: what information they collect, what tools they use, and what processes they follow. Our findings show that developers spend the majority of their performance debugging time chasing outlier events, such as latency spikes and throughput drops. Yet, they are not properly supported by existing performance debugging tools in this task. We also observe that developers often use tools without knowing in advance whether the obtained information will be relevant to debugging the problem. Therefore, we believe developers can benefit from tools that can be used for unstructured exploration of performance data, rather than for answering specific questions. @InProceedings{ESEC/FSE18p83, author = {Alexandra Fedorova and Craig Mustard and Ivan Beschastnikh and Julia Rubin and Augustine Wong and Svetozar Miucin and Louis Ye}, title = {Performance Comprehension at WiredTiger}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {83--94}, doi = {10.1145/3236024.3236081}, year = {2018}, } Publisher's Version |
|
Naik, Mayur |
ESEC/FSE '18: "Syntax-Guided Synthesis of ..."
Syntax-Guided Synthesis of Datalog Programs
Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik (University of Pennsylvania, USA; Hanyang University, South Korea; University of Wisconsin-Madison, USA) Datalog has witnessed promising applications in a variety of domains. We propose a programming-by-example system, ALPS, to synthesize Datalog programs from input-output examples. Scaling synthesis to realistic programs in this manner is challenging due to the rich expressivity of Datalog. We present a syntax-guided synthesis approach that prunes the search space by exploiting the observation that in practice Datalog programs comprise rules that have similar latent syntactic structure. We evaluate ALPS on a suite of 34 benchmarks from three domains—knowledge discovery, program analysis, and database queries. The evaluation shows that ALPS can synthesize 33 of these benchmarks, and outperforms the state-of-the-art tools Metagol and Zaatar, which can synthesize only up to 10 of the benchmarks. @InProceedings{ESEC/FSE18p515, author = {Xujie Si and Woosuk Lee and Richard Zhang and Aws Albarghouthi and Paraschos Koutris and Mayur Naik}, title = {Syntax-Guided Synthesis of Datalog Programs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {515--527}, doi = {10.1145/3236024.3236034}, year = {2018}, } Publisher's Version Artifacts Functional |
|
Nguyen, Anh Tuan |
ESEC/FSE '18: "Complementing Global and Local ..."
Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks
Thanh Nguyen, Ngoc Tran, Hung Phan, Trong Nguyen, Linh Truong, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen (Iowa State University, USA; University of Texas at Dallas, USA; Axon, USA) When being trained on API documentation and tutorials, Word2vec produces vector representations to estimate the relevance between texts and API elements. However, existing Word2vec-based approaches to measure document similarities aggregate Word2vec vectors of individual words or APIs to build the representation of a document as if the words are independent. Thus, the semantics of API descriptions or code fragments are not well represented. In this work, we introduce D2Vec, a new model that fits with API documentation better than Word2vec. D2Vec is a neural network model that considers two complementary contexts to better capture the semantics of API documentation. We first connect the global context of the current API topic under description to all the text phrases within the description of that API. Second, the local orders of words and API elements in the text phrases are maintained in computing the vector representations for the APIs. We conducted an experiment to verify two intrinsic properties of D2Vec's vectors: 1) similar words and relevant API elements are projected into nearby locations; and 2) some vector operations carry semantics. We demonstrate the usefulness and good performance of D2Vec in three applications: API code search (text-to-code retrieval), API tutorial fragment search (code-to-text retrieval), and mining API mappings between software libraries (code-to-code retrieval). Finally, we provide actionable insights and implications for researchers in using our model in other applications with other types of documents. @InProceedings{ESEC/FSE18p551, author = {Thanh Nguyen and Ngoc Tran and Hung Phan and Trong Nguyen and Linh Truong and Anh Tuan Nguyen and Hoan Anh Nguyen and Tien N. Nguyen}, title = {Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {551--562}, doi = {10.1145/3236024.3236036}, year = {2018}, } Publisher's Version |
|
Nguyen, Hoan Anh |
ESEC/FSE '18: "Complementing Global and Local ..."
Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks
Thanh Nguyen, Ngoc Tran, Hung Phan, Trong Nguyen, Linh Truong, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen (Iowa State University, USA; University of Texas at Dallas, USA; Axon, USA) When being trained on API documentation and tutorials, Word2vec produces vector representations to estimate the relevance between texts and API elements. However, existing Word2vec-based approaches to measure document similarities aggregate Word2vec vectors of individual words or APIs to build the representation of a document as if the words are independent. Thus, the semantics of API descriptions or code fragments are not well represented. In this work, we introduce D2Vec, a new model that fits with API documentation better than Word2vec. D2Vec is a neural network model that considers two complementary contexts to better capture the semantics of API documentation. We first connect the global context of the current API topic under description to all the text phrases within the description of that API. Second, the local orders of words and API elements in the text phrases are maintained in computing the vector representations for the APIs. We conducted an experiment to verify two intrinsic properties of D2Vec's vectors: 1) similar words and relevant API elements are projected into nearby locations; and 2) some vector operations carry semantics. We demonstrate the usefulness and good performance of D2Vec in three applications: API code search (text-to-code retrieval), API tutorial fragment search (code-to-text retrieval), and mining API mappings between software libraries (code-to-code retrieval). Finally, we provide actionable insights and implications for researchers in using our model in other applications with other types of documents. @InProceedings{ESEC/FSE18p551, author = {Thanh Nguyen and Ngoc Tran and Hung Phan and Trong Nguyen and Linh Truong and Anh Tuan Nguyen and Hoan Anh Nguyen and Tien N. Nguyen}, title = {Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {551--562}, doi = {10.1145/3236024.3236036}, year = {2018}, } Publisher's Version |
|
Nguyen, Son |
ESEC/FSE '18-SRC: "Feature-Interaction Aware ..."
Feature-Interaction Aware Configuration Prioritization
Son Nguyen (University of Texas at Dallas, USA) Unexpected interactions among features induce most bugs in a configurable software system. Exhaustively analyzing the exponential number of possible configurations is prohibitively costly. Thus, various sampling methods have been proposed to systematically narrow down the exponential number of configurations to be tested. Since testing all selected configurations can require a huge amount of effort, fault-based configuration prioritization, which helps detect bugs earlier, can yield practical benefits in quality assurance. In this paper, we propose CoPo, a novel formulation of feature-interaction bugs via common program entities enabled/disabled by the features. Leveraging that formulation, we develop an efficient feature-interaction-aware configuration prioritization technique for configurable systems by ranking configurations according to their total number of potential bugs. We conducted several experiments to evaluate CoPo on a public benchmark. We found that CoPo outperforms the state-of-the-art configuration prioritization methods. Interestingly, it is able to detect 17 not-yet-discovered feature-interaction bugs. @InProceedings{ESEC/FSE18p974, author = {Son Nguyen}, title = {Feature-Interaction Aware Configuration Prioritization}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {974--976}, doi = {10.1145/3236024.3275437}, year = {2018}, } Publisher's Version |
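A sketch of the ranking idea with a simplified score (not CoPo's actual bug estimate): configurations are ordered by how many program entities are enabled by two or more of their selected features, since such shared entities are where interaction bugs tend to hide. Feature names and entity sets are hypothetical.

    # Prioritize configurations by the number of shared, feature-enabled entities.
    ENTITIES = {                                  # hypothetical static-analysis output
        'encryption':  {'buffer.c:alloc', 'io.c:write', 'crypto.c:pad'},
        'compression': {'buffer.c:alloc', 'io.c:write'},
        'logging':     {'io.c:write', 'log.c:rotate'},
    }

    def score(config):
        enabled = [f for f, on in config.items() if on]
        counts = {}
        for feature in enabled:
            for entity in ENTITIES[feature]:
                counts[entity] = counts.get(entity, 0) + 1
        # Entities touched by two or more enabled features signal potential interactions.
        return sum(1 for c in counts.values() if c >= 2)

    sampled = [
        {'encryption': True,  'compression': True,  'logging': False},
        {'encryption': True,  'compression': False, 'logging': False},
        {'encryption': True,  'compression': True,  'logging': True},
    ]
    for config in sorted(sampled, key=score, reverse=True):
        print(score(config), config)   # test the highest-scoring configurations first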
|
Nguyen, Thanh |
ESEC/FSE '18: "Complementing Global and Local ..."
Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks
Thanh Nguyen, Ngoc Tran, Hung Phan, Trong Nguyen, Linh Truong, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen (Iowa State University, USA; University of Texas at Dallas, USA; Axon, USA) When being trained on API documentation and tutorials, Word2vec produces vector representations to estimate the relevance between texts and API elements. However, existing Word2vec-based approaches to measure document similarities aggregate Word2vec vectors of individual words or APIs to build the representation of a document as if the words are independent. Thus, the semantics of API descriptions or code fragments are not well represented. In this work, we introduce D2Vec, a new model that fits with API documentation better than Word2vec. D2Vec is a neural network model that considers two complementary contexts to better capture the semantics of API documentation. We first connect the global context of the current API topic under description to all the text phrases within the description of that API. Second, the local orders of words and API elements in the text phrases are maintained in computing the vector representations for the APIs. We conducted an experiment to verify two intrinsic properties of D2Vec's vectors: 1) similar words and relevant API elements are projected into nearby locations; and 2) some vector operations carry semantics. We demonstrate the usefulness and good performance of D2Vec in three applications: API code search (text-to-code retrieval), API tutorial fragment search (code-to-text retrieval), and mining API mappings between software libraries (code-to-code retrieval). Finally, we provide actionable insights and implications for researchers in using our model in other applications with other types of documents. @InProceedings{ESEC/FSE18p551, author = {Thanh Nguyen and Ngoc Tran and Hung Phan and Trong Nguyen and Linh Truong and Anh Tuan Nguyen and Hoan Anh Nguyen and Tien N. Nguyen}, title = {Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {551--562}, doi = {10.1145/3236024.3236036}, year = {2018}, } Publisher's Version |
|
Nguyen, Tien N. |
ESEC/FSE '18: "Complementing Global and Local ..."
Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks
Thanh Nguyen, Ngoc Tran, Hung Phan, Trong Nguyen, Linh Truong, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen (Iowa State University, USA; University of Texas at Dallas, USA; Axon, USA) When being trained on API documentation and tutorials, Word2vec produces vector representations to estimate the relevance between texts and API elements. However, existing Word2vec-based approaches to measure document similarities aggregate Word2vec vectors of individual words or APIs to build the representation of a document as if the words are independent. Thus, the semantics of API descriptions or code fragments are not well represented. In this work, we introduce D2Vec, a new model that fits with API documentation better than Word2vec. D2Vec is a neural network model that considers two complementary contexts to better capture the semantics of API documentation. We first connect the global context of the current API topic under description to all the text phrases within the description of that API. Second, the local orders of words and API elements in the text phrases are maintained in computing the vector representations for the APIs. We conducted an experiment to verify two intrinsic properties of D2Vec's vectors: 1) similar words and relevant API elements are projected into nearby locations; and 2) some vector operations carry semantics. We demonstrate the usefulness and good performance of D2Vec in three applications: API code search (text-to-code retrieval), API tutorial fragment search (code-to-text retrieval), and mining API mappings between software libraries (code-to-code retrieval). Finally, we provide actionable insights and implications for researchers in using our model in other applications with other types of documents. @InProceedings{ESEC/FSE18p551, author = {Thanh Nguyen and Ngoc Tran and Hung Phan and Trong Nguyen and Linh Truong and Anh Tuan Nguyen and Hoan Anh Nguyen and Tien N. Nguyen}, title = {Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {551--562}, doi = {10.1145/3236024.3236036}, year = {2018}, } Publisher's Version |
|
Nguyen, Trong |
ESEC/FSE '18: "Complementing Global and Local ..."
Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks
Thanh Nguyen, Ngoc Tran, Hung Phan, Trong Nguyen, Linh Truong, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen (Iowa State University, USA; University of Texas at Dallas, USA; Axon, USA) When being trained on API documentation and tutorials, Word2vec produces vector representations to estimate the relevance between texts and API elements. However, existing Word2vec-based approaches to measure document similarities aggregate Word2vec vectors of individual words or APIs to build the representation of a document as if the words are independent. Thus, the semantics of API descriptions or code fragments are not well represented. In this work, we introduce D2Vec, a new model that fits with API documentation better than Word2vec. D2Vec is a neural network model that considers two complementary contexts to better capture the semantics of API documentation. We first connect the global context of the current API topic under description to all the text phrases within the description of that API. Second, the local orders of words and API elements in the text phrases are maintained in computing the vector representations for the APIs. We conducted an experiment to verify two intrinsic properties of D2Vec's vectors: 1) similar words and relevant API elements are projected into nearby locations; and 2) some vector operations carry semantics. We demonstrate the usefulness and good performance of D2Vec in three applications: API code search (text-to-code retrieval), API tutorial fragment search (code-to-text retrieval), and mining API mappings between software libraries (code-to-code retrieval). Finally, we provide actionable insights and implications for researchers in using our model in other applications with other types of documents. @InProceedings{ESEC/FSE18p551, author = {Thanh Nguyen and Ngoc Tran and Hung Phan and Trong Nguyen and Linh Truong and Anh Tuan Nguyen and Hoan Anh Nguyen and Tien N. Nguyen}, title = {Complementing Global and Local Contexts in Representing API Descriptions to Improve API Retrieval Tasks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {551--562}, doi = {10.1145/3236024.3236036}, year = {2018}, } Publisher's Version |
|
Nguyen Quang Do, Lisa |
ESEC/FSE '18-NIER: "Gamifying Static Analysis ..."
Gamifying Static Analysis
Lisa Nguyen Quang Do and Eric Bodden (University of Paderborn, Germany; Fraunhofer IEM, Germany) In the past decades, static code analysis has become a prevalent means to detect bugs and security vulnerabilities in software systems. As software becomes more complex, analysis tools also report lists of increasingly complex warnings that developers need to address on a daily basis. The novel insight we present in this work is that static analysis tools and video games both require users to take on repetitive and challenging tasks. Importantly, though, while good video games manage to keep players engaged, static analysis tools are notorious for their lacking user experience, which prevents developers from using them to their full potential, frequently resulting in dissatisfaction and even tool abandonment. We show parallels between gaming and using static analysis tools, and advocate that the user-experience issues of analysis tools can be addressed by looking at the analysis tooling system as a whole, and by integrating gaming elements that keep users engaged, such as providing immediate and clear feedback, collaborative problem solving, or motivators such as points and badges. @InProceedings{ESEC/FSE18p714, author = {Lisa Nguyen Quang Do and Eric Bodden}, title = {Gamifying Static Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {714--718}, doi = {10.1145/3236024.3264830}, year = {2018}, } Publisher's Version Info |
|
Nicole, Denis A. |
ESEC/FSE '18-NIER: "Towards Counterexample-Guided ..."
Towards Counterexample-Guided k-Induction for Fast Bug Detection
Mikhail R. Gadelha, Felipe R. Monteiro, Lucas C. Cordeiro, and Denis A. Nicole (University of Southampton, UK; Federal University of Amazonas, Brazil; University of Manchester, UK) Recently, the k-induction algorithm has proven to be a successful approach for both finding bugs and proving correctness. However, since the algorithm is an incremental approach, it might waste resources trying to prove incorrect programs. In this paper, we extend the k-induction algorithm to shorten the number of steps required to find a property violation. We convert the algorithm into a meet-in-the-middle bidirectional search algorithm, using the counterexample produced from over-approximating the program. The main advantage is in the reduction of the state explosion by reducing the maximum required steps from k to ⌊k/2 + 1⌋. @InProceedings{ESEC/FSE18p765, author = {Mikhail R. Gadelha and Felipe R. Monteiro and Lucas C. Cordeiro and Denis A. Nicole}, title = {Towards Counterexample-Guided k-Induction for Fast Bug Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {765--769}, doi = {10.1145/3236024.3264840}, year = {2018}, } Publisher's Version Video Info |
|
Niu, Wenyu |
ESEC/FSE '18-IND: "FACTS: Automated Black-Box ..."
FACTS: Automated Black-Box Testing of FinTech Systems
Qingshun Wang, Lintao Gu, Minhui Xue, Lihua Xu, Wenyu Niu, Liang Dou, Liang He, and Tao Xie (East China Normal University, China; Macquarie University, Australia; New York University Shanghai, China; CFETS Information Technology, China; University of Illinois at Urbana-Champaign, USA) FinTech, short for ``financial technology,'' has advanced the process of transforming financial business from a traditional manual-process-driven to an automation-driven model by providing various software platforms. However, the current FinTech industry still heavily depends on manual testing, which has become a bottleneck for its development. To automate the testing process, we propose an approach of black-box testing for a FinTech system with effective tool support for both test generation and test oracles. For test generation, we first extract input categories from business-logic specifications, and then mutate real data collected from system logs with values randomly picked from each extracted input category. For test oracles, we propose a new technique of priority differential testing where we evaluate execution results of system-test inputs on the system's head (i.e., latest) version in the version repository (1) against the last legacy version in the version repository (only when the executed test inputs are on new, not-yet-deployed services) and (2) against both the currently-deployed version and the last legacy version (only when the test inputs are on existing, deployed services). When ranking the behavior-inconsistency results for developers to inspect, in the latter case we treat the currently-deployed version as the higher-priority source of behavior to check. We apply our approach to the CSTP subsystem, one of the largest data processing and forwarding modules of the China Foreign Exchange Trade System (CFETS) platform, whose annual total transaction volume reaches 150 trillion US dollars. Extensive experimental results show that our approach can substantially boost branch coverage by approximately 40%, and is also effective at identifying common faults in the FinTech system. @InProceedings{ESEC/FSE18p839, author = {Qingshun Wang and Lintao Gu and Minhui Xue and Lihua Xu and Wenyu Niu and Liang Dou and Liang He and Tao Xie}, title = {FACTS: Automated Black-Box Testing of FinTech Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {839--844}, doi = {10.1145/3236024.3275533}, year = {2018}, } Publisher's Version |
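A sketch of the priority differential oracle described above, not the FACTS implementation: the head version's output is compared against the currently-deployed and last legacy versions, and inconsistencies against the deployed version are ranked higher. The toy fee-calculation services and priority labels are invented.

    # Priority differential oracle over three versions of one service.
    def differential_check(test_input, head, deployed=None, legacy=None):
        reports = []
        out_head = head(test_input)
        if deployed is not None:                      # existing, deployed service
            if out_head != deployed(test_input):
                reports.append(('HIGH', 'head vs deployed', test_input))
        if legacy is not None:
            if out_head != legacy(test_input):
                priority = 'MEDIUM' if deployed is not None else 'HIGH'
                reports.append((priority, 'head vs legacy', test_input))
        return reports

    # Toy versions of one service (real inputs would be mutated from system logs).
    legacy_fn   = lambda amount: round(amount * 0.0015, 2)   # old fee rule
    deployed_fn = lambda amount: round(amount * 0.0010, 2)   # current fee rule
    head_fn     = lambda amount: round(amount * 0.0010, 3)   # candidate release

    for report in differential_check(12_345, head_fn, deployed_fn, legacy_fn):
        print(report)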
|
Nocera, Francesco |
ESEC/FSE '18-SRC: "Reshaping Distributed Agile ..."
Reshaping Distributed Agile and Adaptive Development Environment
Francesco Nocera (Polytechnic University of Bari, Italy) Motivated by the interest of (Collaborative Networked) Organizations in adopting emerging technologies to support the communication, collaboration, and monitoring needs of their Distributed Agile and Adaptive Development Environment (DADE), we present a tool based on the emerging Liquid Multi-Device Software paradigm. @InProceedings{ESEC/FSE18p968, author = {Francesco Nocera}, title = {Reshaping Distributed Agile and Adaptive Development Environment}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {968--970}, doi = {10.1145/3236024.3275435}, year = {2018}, } Publisher's Version |
|
Noei, Ehsan |
ESEC/FSE '18: "Winning the App Production ..."
Winning the App Production Rally
Ehsan Noei, Daniel Alencar Da Costa, and Ying Zou (Queen's University, Canada) When a user looks for an Android app in Google Play Store, a number of apps appear in a specific rank. Mobile apps with higher ranks are more likely to be noticed and downloaded by users. The goal of this work is to understand the evolution of ranks and identify the variables that share a strong relationship with ranks. We explore 900 apps with a total of 4,878,011 user-reviews in 30 app development areas. We discover 13 clusters of rank trends. We observe that the majority of the subject apps (i.e., 61%) dropped in the rankings over the two years of our study. By applying a regression model, we find the variables that statistically significantly explain the rank trends, such as the number of releases. Moreover, we build a mixed effects model to study the changes in ranks across apps and various versions of each app. We find that not all the variables that common-wisdom would deem important have a significant relationship with ranks. Furthermore, app developers should not be afraid of a late entry into the market as new apps can achieve higher ranks than existing apps. Finally, we present the findings to 51 developers. According to the feedback, the findings can help app developers to achieve better ranks in Google Play Store. @InProceedings{ESEC/FSE18p283, author = {Ehsan Noei and Daniel Alencar Da Costa and Ying Zou}, title = {Winning the App Production Rally}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {283--294}, doi = {10.1145/3236024.3236044}, year = {2018}, } Publisher's Version |
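As a hedged sketch of the kind of mixed effects model mentioned above (variable names, synthetic data, and model form are assumptions, not the paper's exact setup), one could fit a random intercept per app with statsmodels:

    # Linear mixed-effects model: rank ~ release activity, random intercept per app.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    rows = []
    for app in range(30):
        app_effect = rng.normal(0, 15)                 # per-app baseline rank
        for version in range(10):
            releases = rng.integers(1, 20)
            rank = 200 + app_effect - 3.0 * releases + rng.normal(0, 10)
            rows.append({'app': f'app{app}', 'num_releases': releases, 'rank': rank})
    df = pd.DataFrame(rows)

    model = smf.mixedlm('rank ~ num_releases', data=df, groups=df['app'])
    result = model.fit()
    print(result.params['num_releases'])   # fixed effect of release count on rank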
|
Octeau, Damien |
ESEC/FSE '18: "Neural-Augmented Static Analysis ..."
Neural-Augmented Static Analysis of Android Communication
Jinman Zhao, Aws Albarghouthi, Vaibhav Rastogi, Somesh Jha, and Damien Octeau (University of Wisconsin-Madison, USA; Google, USA) We address the problem of discovering communication links between applications in the popular Android mobile operating system, an important problem for security and privacy in Android. Any scalable static analysis in this complex setting is bound to produce an excessive amount of false-positives, rendering it impractical. To improve precision, we propose to augment static analysis with a trained neural-network model that estimates the probability that a communication link truly exists. We describe a neural-network architecture that encodes abstractions of communicating objects in two applications and estimates the probability with which a link indeed exists. At the heart of our architecture are type-directed encoders (TDE), a general framework for elegantly constructing encoders of a compound data type by recursively composing encoders for its constituent types. We evaluate our approach on a large corpus of Android applications, and demonstrate that it achieves very high accuracy. Further, we conduct thorough interpretability studies to understand the internals of the learned neural networks. @InProceedings{ESEC/FSE18p342, author = {Jinman Zhao and Aws Albarghouthi and Vaibhav Rastogi and Somesh Jha and Damien Octeau}, title = {Neural-Augmented Static Analysis of Android Communication}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {342--353}, doi = {10.1145/3236024.3236066}, year = {2018}, } Publisher's Version |
|
Oh, Hakjoo |
ESEC/FSE '18: "MemFix: Static Analysis-Based ..."
MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C
Junhee Lee, Seongjoon Hong, and Hakjoo Oh (Korea University, South Korea) We present MemFix, an automated technique for fixing memory deallocation errors in C programs. MemFix aims to fix memory-leak, double-free, and use-after-free errors, which occur when developers fail to properly deallocate memory objects. MemFix attempts to fix these errors by finding a set of free-statements that correctly deallocate all allocated objects without causing double-frees and use-after-frees. The key insight behind MemFix is that finding such a set of deallocation statements corresponds to solving an exact cover problem derived from a variant of typestate static analysis. We formally present the technique and experimentally show that MemFix is able to fix real errors found in open-source programs. Because MemFix is based on a sound static analysis, the generated patches are guaranteed to fix the original errors without introducing new ones. @InProceedings{ESEC/FSE18p95, author = {Junhee Lee and Seongjoon Hong and Hakjoo Oh}, title = {MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--106}, doi = {10.1145/3236024.3236079}, year = {2018}, } Publisher's Version |
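A brute-force sketch of the exact-cover view (MemFix derives the candidates from a typestate static analysis and solves the problem far more cleverly): each candidate free-statement deallocates a set of objects, and a patch is any set of candidates that frees every object exactly once. Object and candidate names are hypothetical.

    # Enumerate candidate free-statement sets that cover every object exactly once.
    from itertools import combinations

    ALLOCATED = {'p@L10', 'q@L14'}               # hypothetical heap objects

    # Candidate insertion points and the objects each would deallocate.
    CANDIDATES = {
        'free(p) after L20': {'p@L10'},
        'free(p) after L35': {'p@L10'},
        'free(q) after L35': {'q@L14'},
        'free(p); free(q) in cleanup()': {'p@L10', 'q@L14'},
    }

    def exact_covers(allocated, candidates):
        names = list(candidates)
        for r in range(1, len(names) + 1):
            for combo in combinations(names, r):
                freed = [obj for name in combo for obj in candidates[name]]
                # Every object freed exactly once: no leak, no double-free.
                if sorted(freed) == sorted(allocated):
                    yield combo

    for patch in exact_covers(ALLOCATED, CANDIDATES):
        print(patch)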
|
Ore, John-Paul |
ESEC/FSE '18: "Phys: Probabilistic Physical ..."
Phys: Probabilistic Physical Unit Assignment and Inconsistency Detection
Sayali Kate, John-Paul Ore, Xiangyu Zhang, Sebastian Elbaum, and Zhaogui Xu (Purdue University, USA; University of Nebraska-Lincoln, USA; Nanjing University, China) Program variables used in robotic and cyber-physical systems often have implicit physical units that cannot be determined from their variable types. Inferring an abstract physical unit type for variables and checking their physical unit type consistency is of particular importance for validating the correctness of such systems. For instance, a variable with the unit of ‘meter’ should not be assigned to another variable with the unit of ‘degree-per-second’. Existing solutions have various limitations, such as requiring developers to annotate variables with physical units and only handling variables that are directly or transitively used in popular robotic libraries with known physical unit information. We observe that there are many physical unit hints in such software, for example variable names and specific forms of expressions. These hints carry uncertainty, as developers may not respect conventions. We propose to model them with probability distributions and conduct probabilistic inference. In the end, our technique produces a unit distribution for each variable. Unit inconsistencies can then be detected using the highly probable unit assignments. Experimental results on 30 programs show that our technique can infer units for 159.3% more variables compared to the state-of-the-art with more than 88.7% true positives, and inconsistency detection on 90 programs shows that our technique reports 103.3% more inconsistencies with 85.3% true positives. @InProceedings{ESEC/FSE18p563, author = {Sayali Kate and John-Paul Ore and Xiangyu Zhang and Sebastian Elbaum and Zhaogui Xu}, title = {Phys: Probabilistic Physical Unit Assignment and Inconsistency Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {563--573}, doi = {10.1145/3236024.3236035}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
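A toy sketch of the probabilistic flavour of this approach, assuming invented name-to-unit hint tables and thresholds; it is not the Phys tool itself:

```python
# Variable-name hints give a distribution over units; an assignment between
# variables whose most probable units disagree with high confidence is flagged.
NAME_HINTS = {            # hypothetical P(unit | name contains hint)
    "dist":  {"meter": 0.80, "degree": 0.10, "meter_per_sec": 0.10},
    "angle": {"meter": 0.05, "degree": 0.85, "meter_per_sec": 0.10},
    "vel":   {"meter": 0.05, "degree": 0.05, "meter_per_sec": 0.90},
}

def unit_distribution(var_name):
    # Combine all matching hints by multiplying and renormalising.
    dist = {"meter": 1.0, "degree": 1.0, "meter_per_sec": 1.0}
    for hint, probs in NAME_HINTS.items():
        if hint in var_name:
            for u in dist:
                dist[u] *= probs[u]
    total = sum(dist.values())
    return {u: p / total for u, p in dist.items()}

def check_assignment(lhs, rhs, threshold=0.6):
    dl, dr = unit_distribution(lhs), unit_distribution(rhs)
    ul, pl = max(dl.items(), key=lambda kv: kv[1])
    ur, pr = max(dr.items(), key=lambda kv: kv[1])
    if ul != ur and pl > threshold and pr > threshold:
        print(f"possible unit inconsistency: {lhs} ({ul}) = {rhs} ({ur})")

check_assignment("target_dist", "yaw_angle")   # flagged
check_assignment("target_dist", "prev_dist")   # not flagged
```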
|
Pandey, Awanish |
ESEC/FSE '18: "Bug Synthesis: Challenging ..."
Bug Synthesis: Challenging Bug-Finding Tools with Deep Faults
Subhajit Roy, Awanish Pandey, Brendan Dolan-Gavitt, and Yu Hu (IIT Kanpur, India; New York University, USA) In spite of decades of research in bug detection tools, there is a surprising dearth of ground-truth corpora that can be used to evaluate the efficacy of such tools. Recently, systems such as LAVA and EvilCoder have been proposed to automatically inject bugs into software to quickly generate large bug corpora, but the bugs created so far differ from naturally occurring bugs in a number of ways. In this work, we propose a new automated bug injection system, Apocalypse, that uses formal techniques—symbolic execution, constraint-based program synthesis and model counting—to automatically inject fair (can potentially be discovered by current bug-detection tools), deep (requiring a long sequence of dependencies to be satisfied to fire), uncorrelated (each bug behaving independently of the others), reproducible (a trigger input being available) and rare (can be triggered by only a few program inputs) bugs in large software code bases. In our evaluation, we inject bugs into thirty Coreutils programs as well as the TCAS test suite. We find that bugs synthesized by Apocalypse are highly realistic under a variety of metrics, that they do not favor a particular bug-finding strategy (unlike bugs produced by LAVA), and that they are more difficult to find than manually injected bugs, requiring around 240× more tests to discover with a state-of-the-art symbolic execution tool. @InProceedings{ESEC/FSE18p224, author = {Subhajit Roy and Awanish Pandey and Brendan Dolan-Gavitt and Yu Hu}, title = {Bug Synthesis: Challenging Bug-Finding Tools with Deep Faults}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {224--234}, doi = {10.1145/3236024.3236084}, year = {2018}, } Publisher's Version |
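A loose illustration of the "deep, rare, reproducible" criteria above, assuming a hand-written guard over input bytes and a sampling-based rarity estimate in place of model counting; this is only a toy, not Apocalypse's constraint-based synthesis:

```python
# A synthetic bug guard chains several input-dependent conditions (deep), is
# satisfied by very few inputs (rare), and comes with a concrete triggering
# input (reproducible). Guard, constants, and trigger are invented.
import random

def deep_guard(data: bytes) -> bool:
    return (len(data) >= 4
            and data[0] ^ data[1] == 0x5A
            and (data[1] + data[2]) % 7 == 3
            and data[3] == data[0])

def estimated_trigger_rate(guard, samples=200_000, length=4):
    # Crude Monte Carlo stand-in for model counting; for a rate around 1e-5
    # the estimate may well be zero at this sample size, which is the point.
    hits = sum(guard(bytes(random.randrange(256) for _ in range(length)))
               for _ in range(samples))
    return hits / samples

# A reproducing input built by hand from the constraints:
trigger = bytes([0x10, 0x4A, 0x0D, 0x10])  # 0x10^0x4A=0x5A, (0x4A+0x0D)%7==3, data[3]==data[0]
assert deep_guard(trigger)
print("estimated trigger rate:", estimated_trigger_rate(deep_guard))
```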
|
Park, Daejun |
ESEC/FSE '18-DEMO: "A Formal Verification Tool ..."
A Formal Verification Tool for Ethereum VM Bytecode
Daejun Park, Yi Zhang, Manasvi Saxena, Philip Daian, and Grigore Roşu (University of Illinois at Urbana-Champaign, USA; Runtime Verification, USA; Cornell Tech, USA) In this paper, we present a formal verification tool for the Ethereum Virtual Machine (EVM) bytecode. To precisely reason about all possible behaviors of the EVM bytecode, we adopted KEVM, a complete formal semantics of the EVM, and instantiated the K-framework's reachability logic theorem prover to generate a correct-by-construction deductive verifier for the EVM. We further optimized the verifier by introducing EVM-specific abstractions and lemmas to improve its scalability. Our EVM verifier has been used to verify various high-profile smart contracts including the ERC20 token, Ethereum Casper, and DappHub MakerDAO contracts. @InProceedings{ESEC/FSE18p912, author = {Daejun Park and Yi Zhang and Manasvi Saxena and Philip Daian and Grigore Roşu}, title = {A Formal Verification Tool for Ethereum VM Bytecode}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {912--915}, doi = {10.1145/3236024.3264591}, year = {2018}, } Publisher's Version Video Info |
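As a plain-Python illustration of the kind of functional-correctness property such a verifier checks for ERC20 contracts (here, conservation of total supply under transfer), assuming a toy contract model rather than EVM bytecode or the K framework:

```python
# Toy model of ERC20 transfer semantics and a property check over it; the real
# tool reasons about EVM bytecode with reachability logic, not Python objects.
class ERC20Model:
    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, sender, to, value):
        if self.balances.get(sender, 0) < value:
            return False                      # revert: insufficient balance
        self.balances[sender] -= value
        self.balances[to] = self.balances.get(to, 0) + value
        return True

def check_transfer_property(balances, sender, to, value):
    before = ERC20Model(balances)
    after = ERC20Model(balances)
    ok = after.transfer(sender, to, value)
    # Conservation: total supply never changes, whether or not the call succeeds.
    assert sum(after.balances.values()) == sum(before.balances.values())
    if ok and sender != to:
        assert after.balances[sender] == before.balances[sender] - value
        assert after.balances[to] == before.balances.get(to, 0) + value

check_transfer_property({"alice": 10, "bob": 2}, "alice", "bob", 3)
check_transfer_property({"alice": 10, "bob": 2}, "alice", "bob", 99)  # reverts, still conserved
```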
|
Parnin, Chris |
ESEC/FSE '18: "How Should Compilers Explain ..."
How Should Compilers Explain Problems to Developers?
Titus Barik, Denae Ford, Emerson Murphy-Hill, and Chris Parnin (Microsoft, USA; North Carolina State University, USA) Compilers primarily give feedback about problems to developers through the use of error messages. Unfortunately, developers routinely find these messages to be confusing and unhelpful. In this paper, we postulate that because error messages present poor explanations, theories of explanation---such as Toulmin's model of argument---can be applied to improve their quality. To understand how compilers should present explanations to developers, we conducted a comparative evaluation with 68 professional software developers and an empirical study of compiler error messages found in Stack Overflow questions across seven different programming languages. Our findings suggest that, given a pair of error messages, developers significantly prefer the error message that employs proper argument structure over a deficient argument structure when neither offers a resolution---but will accept a deficient argument structure if it provides a resolution to the problem. Human-authored explanations on Stack Overflow converge to one of the three argument structures: those that provide a resolution to the error, simple arguments, and extended arguments that provide additional evidence for the problem. Finally, we contribute three practical design principles to inform the design and evaluation of compiler error messages. @InProceedings{ESEC/FSE18p633, author = {Titus Barik and Denae Ford and Emerson Murphy-Hill and Chris Parnin}, title = {How Should Compilers Explain Problems to Developers?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {633--643}, doi = {10.1145/3236024.3236040}, year = {2018}, } Publisher's Version |
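A small sketch of the design principle above expressed as a data structure, with field names of our own choosing (claim, evidence, resolution) rather than anything prescribed by the paper:

```python
# An error message with an explicit claim, supporting evidence (grounds), and
# an optional resolution, echoing the argument structures developers preferred.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DiagnosticMessage:
    claim: str                        # what is wrong
    evidence: Optional[str] = None    # why the compiler believes it
    resolution: Optional[str] = None  # how the developer can fix it

    def render(self) -> str:
        parts = [f"error: {self.claim}"]
        if self.evidence:
            parts.append(f"  note: {self.evidence}")
        if self.resolution:
            parts.append(f"  help: {self.resolution}")
        return "\n".join(parts)

msg = DiagnosticMessage(
    claim="cannot assign a String to a variable of type int",
    evidence="`count` was declared as int on line 12",
    resolution="declare `count` as String, or convert the value with Integer.parseInt",
)
print(msg.render())
```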
|
Pauck, Felix |
ESEC/FSE '18: "Do Android Taint Analysis ..."
Do Android Taint Analysis Tools Keep Their Promises?
Felix Pauck, Eric Bodden, and Heike Wehrheim (University of Paderborn, Germany) In recent years, researchers have developed a number of tools to conduct taint analysis of Android applications. While all the respective papers aim at providing a thorough empirical evaluation, comparability is hindered by varying or unclear evaluation targets. Sometimes, the apps used for evaluation are not precisely described. In other cases, authors use an established benchmark but cover it only partially. In yet other cases, the evaluations differ in terms of the data leaks searched for, or lack a ground truth to compare against. All those limitations make it impossible to truly compare the tools based on those published evaluations. We thus present ReproDroid, a framework allowing the accurate comparison of Android taint analysis tools. ReproDroid supports researchers in inferring the ground truth for data leaks in apps, in automatically applying tools to benchmarks, and in evaluating the obtained results. We use ReproDroid to comparatively evaluate on equal grounds the six prominent taint analysis tools Amandroid, DIALDroid, DidFail, DroidSafe, FlowDroid and IccTA. The results are largely positive although four tools violate some promises concerning features and accuracy. Finally, we contribute to the area of unbiased benchmarking with a new and improved version of the open test suite DroidBench. @InProceedings{ESEC/FSE18p331, author = {Felix Pauck and Eric Bodden and Heike Wehrheim}, title = {Do Android Taint Analysis Tools Keep Their Promises?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {331--341}, doi = {10.1145/3236024.3236029}, year = {2018}, } Publisher's Version Artifacts Reusable Artifacts Functional |
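A minimal sketch of the evaluation step described above, computing precision and recall of a tool's reported data leaks against a ground truth; the app names and leak tuples below are invented:

```python
# Compare one tool's reported leaks (source/sink pairs per app) with the
# inferred ground truth; not ReproDroid's implementation, just the metric.
def precision_recall(reported, ground_truth):
    tp = len(reported & ground_truth)
    precision = tp / len(reported) if reported else 1.0
    recall = tp / len(ground_truth) if ground_truth else 1.0
    return precision, recall

ground_truth = {("MainActivity", "getDeviceId", "sendTextMessage"),
                ("MainActivity", "getLastKnownLocation", "Log.d")}
reported_by_tool = {("MainActivity", "getDeviceId", "sendTextMessage"),
                    ("SettingsActivity", "getSimSerialNumber", "Log.d")}

p, r = precision_recall(reported_by_tool, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")   # 0.50 and 0.50 on this toy data
```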
|
Pedersen, John Kook |
ESEC/FSE '18: "Model Transformation Languages ..."
Model Transformation Languages under a Magnifying Glass: A Controlled Experiment with Xtend, ATL, and QVT
Regina Hebig, Christoph Seidl, Thorsten Berger, John Kook Pedersen, and Andrzej Wąsowski (Chalmers University of Technology, Sweden; University of Gothenburg, Sweden; TU Braunschweig, Germany; IT University of Copenhagen, Denmark) In Model-Driven Software Development, models are automatically processed to support the creation, build, and execution of systems. A large variety of dedicated model-transformation languages exists, promising to efficiently realize the automated processing of models. To investigate the actual benefit of using such specialized languages, we performed a large-scale controlled experiment in which over 78 subjects solve 231 individual tasks using three languages. The experiment sheds light on commonalities and differences between model transformation languages (ATL, QVT-O) and on benefits of using them in common development tasks (comprehension, change, and creation) against a modern general-purpose language (Xtend). Our results show no statistically significant benefit of using a dedicated transformation language over a modern general-purpose language. However, we were able to identify several aspects of transformation programming where domain-specific transformation languages do appear to help, including copying objects, context identification, and conditioning the computation on types. @InProceedings{ESEC/FSE18p445, author = {Regina Hebig and Christoph Seidl and Thorsten Berger and John Kook Pedersen and Andrzej Wąsowski}, title = {Model Transformation Languages under a Magnifying Glass: A Controlled Experiment with Xtend, ATL, and QVT}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {445--455}, doi = {10.1145/3236024.3236046}, year = {2018}, } Publisher's Version Info |
|
Perera, Harsha |
ESEC/FSE '18-NIER: "Operationalizing Human Values ..."
Operationalizing Human Values in Software: A Research Roadmap
Davoud Mougouei, Harsha Perera, Waqar Hussain, Rifat Shams, and Jon Whittle (Monash University, Australia) Software influences several aspects of people's lives and therefore needs to reflect their values. However, existing software engineering methods fail to account for human values, which may result in breaching those values in software and, therefore, dissatisfaction of users and loss of profit and reputation. To avoid such negative consequences, human values need to be integrated -- in a verifiable way -- into software. We refer to this as Operationalizing Human Values in Software. But this is not easy to achieve due to three main obstacles: first, human values are hard to define in a way that can be put into practice; second, existing software design decisions are mainly ignorant of values; finally, values are hard to determine and quantify in software. This paper aims to establish a research roadmap for overcoming these obstacles. The proposed roadmap focuses on (i) establishing practical definitions for human values, (ii) integrating values into software design, and (iii) measuring values in the software development life cycle. @InProceedings{ESEC/FSE18p780, author = {Davoud Mougouei and Harsha Perera and Waqar Hussain and Rifat Shams and Jon Whittle}, title = {Operationalizing Human Values in Software: A Research Roadmap}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {780--784}, doi = {10.1145/3236 |