ESEC/FSE 2017 – Author Index
Aafer, Yousra |
ESEC/FSE '17: "LAMP: Data Provenance for ..."
LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation
Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, and Xiangyu Zhang (Purdue University, USA; Nanjing University, China) Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation to different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program dependence based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering. @InProceedings{ESEC/FSE17p786, author = {Shiqing Ma and Yousra Aafer and Zhaogui Xu and Wen-Chuan Lee and Juan Zhai and Yingqi Liu and Xiangyu Zhang}, title = {LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {786--797}, doi = {}, year = {2017}, } |
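LAMP's core idea, quantifying an input's importance for an output as a partial derivative, can be illustrated with a minimal sketch. Finite differences stand in here for LAMP's automatic-differentiation machinery, and the toy `score` function and its coefficients are hypothetical:

```python
def importance(f, xs, i, eps=1e-6):
    """Estimate the partial derivative of f with respect to input i
    by central finite differences (a stand-in for the automatic
    differentiation that LAMP is inspired by)."""
    hi = list(xs); hi[i] += eps
    lo = list(xs); lo[i] -= eps
    return (f(hi) - f(lo)) / (2 * eps)

# Toy "machine learning output": a weighted score over three inputs.
def score(xs):
    return 3.0 * xs[0] + 0.01 * xs[1] + xs[2] ** 2

# Rank inputs by importance at the point (1.0, 1.0, 2.0):
scores = [importance(score, [1.0, 1.0, 2.0], i) for i in range(3)]
# xs[0] (derivative 3.0) matters far more than xs[1] (derivative 0.01);
# xs[2] contributes through the derivative of x**2, i.e. 2*x = 4.0.
```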
|
Abdalkareem, Rabe |
ESEC/FSE '17: "Why Do Developers Use Trivial ..."
Why Do Developers Use Trivial Packages? An Empirical Case Study on npm
Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (Concordia University, Canada) Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call `trivial packages'. A recent incident where a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix made it imperative to question the growing use of trivial packages. Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons for and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about maintenance effort and the risk of breakage due to the extra dependencies trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers' beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be `deployment tested' and to have similar test, usage, and community interest to non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use. @InProceedings{ESEC/FSE17p385, author = {Rabe Abdalkareem and Olivier Nourry and Sultan Wehaibi and Suhaib Mujahid and Emad Shihab}, title = {Why Do Developers Use Trivial Packages? An Empirical Case Study on npm}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {385--395}, doi = {}, year = {2017}, } |
|
Ahmed, Umair Z. |
ESEC/FSE '17: "A Feasibility Study of Using ..."
A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments
Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roychoudhury (Innopolis University, Russia; IIT Kanpur, India; National University of Singapore, Singapore) Despite the fact that intelligent tutoring systems for programming (ITSP) education have long attracted interest, their widespread use has been hindered by the difficulty of generating personalized feedback automatically. Meanwhile, automated program repair (APR) is an emerging new technology that automatically fixes software bugs, and it has been shown that APR can fix the bugs of large real-world software. In this paper, we study the feasibility of marrying intelligent programming tutoring and APR. We perform our feasibility study with four state-of-the-art APR tools (GenProg, AE, Angelix, and Prophet), and 661 programs written by the students taking an introductory programming course. We found that when APR tools are used out of the box, only about 30% of the programs in our dataset are repaired. This low repair rate is largely due to the student programs often being significantly incorrect — in contrast, professional software for which APR was successfully applied typically fails only a small portion of tests. To bridge this gap, we adopt in APR a new repair policy akin to the hint generation policy employed in existing ITSPs. This new repair policy admits partial repairs that address a subset of the failing tests, which results in an 84% improvement in repair rate. We also performed a user study with 263 novice students and 37 graders, and identified an understudied problem: while novice students do not seem to know how to effectively make use of generated repairs as hints, the graders do seem to gain benefits from repairs. @InProceedings{ESEC/FSE17p740, author = {Jooyong Yi and Umair Z. Ahmed and Amey Karkare and Shin Hwei Tan and Abhik Roychoudhury}, title = {A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {740--751}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Albarghouthi, Aws |
ESEC/FSE '17: "Discovering Relational Specifications ..."
Discovering Relational Specifications
Calvin Smith, Gabriel Ferns, and Aws Albarghouthi (University of Wisconsin-Madison, USA) Formal specifications of library functions play a critical role in a number of program analysis and development tasks. We present Bach, a technique for discovering likely relational specifications from data describing input–output behavior of a set of functions comprising a library or a program. Relational specifications correlate different executions of different functions; for instance, commutativity, transitivity, equivalence of two functions, etc. Bach combines novel insights from program synthesis and databases to discover a rich array of specifications. We apply Bach to learn specifications from data generated for a number of standard libraries. Our experimental evaluation demonstrates Bach’s ability to learn useful and deep specifications in a small amount of time. @InProceedings{ESEC/FSE17p616, author = {Calvin Smith and Gabriel Ferns and Aws Albarghouthi}, title = {Discovering Relational Specifications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {616--626}, doi = {}, year = {2017}, } Best-Paper Award |
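The kind of relational specification Bach discovers, commutativity for instance, can be sketched with a naive data-driven check over input-output samples. This is a toy illustration of the "likely specification" idea, not Bach's synthesis-and-databases algorithm:

```python
import itertools
import operator

def likely_commutative(f, samples):
    """Propose f(x, y) == f(y, x) as a likely specification if no
    sampled input pair refutes it on the observed input-output data."""
    return all(f(x, y) == f(y, x)
               for x, y in itertools.product(samples, repeat=2))

likely_commutative(operator.add, range(-5, 6))  # True: addition commutes
likely_commutative(operator.sub, range(-5, 6))  # False: x - y != y - x
```

As with Bach, a specification that survives all samples is only *likely*: it is consistent with the data, not proven.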
|
Aliabadi, Maryam Raiyat |
ESEC/FSE '17: "ARTINALI: Dynamic Invariant ..."
ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security
Maryam Raiyat Aliabadi, Amita Ajith Kamath, Julien Gascon-Samson, and Karthik Pattabiraman (University of British Columbia, Canada; National Institute of Technology Karnataka, India) Cyber-Physical Systems (CPSes) are being widely deployed in security-critical scenarios such as smart homes and medical devices. Unfortunately, the connectedness of these systems and their relative lack of security measures makes them ripe targets for attacks. Specification-based Intrusion Detection Systems (IDSes) have been shown to be effective for securing CPSes. Unfortunately, deriving invariants that capture the specifications of CPSes is a tedious and error-prone process. Therefore, it is important to dynamically monitor the CPS to learn its common behaviors and formulate invariants for detecting security attacks. Existing techniques for invariant mining only incorporate data and events, but not time. However, time is central to most CPSes, and hence incorporating time in addition to data and events is essential for achieving low false positives and false negatives. This paper proposes ARTINALI, which mines dynamic system properties by incorporating time as a first-class property of the system. We build ARTINALI-based Intrusion Detection Systems (IDSes) for two CPSes, namely smart meters and smart medical devices, and measure their efficacy. We find that the ARTINALI-based IDSes significantly reduce the ratio of false positives and false negatives by 16 to 48% (average 30.75%) and 89 to 95% (average 93.4%), respectively, over other dynamic invariant detection tools. @InProceedings{ESEC/FSE17p349, author = {Maryam Raiyat Aliabadi and Amita Ajith Kamath and Julien Gascon-Samson and Karthik Pattabiraman}, title = {ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {349--361}, doi = {}, year = {2017}, } |
|
Alrajeh, Dalal |
ESEC/FSE '17: "On Evidence Preservation Requirements ..."
On Evidence Preservation Requirements for Forensic-Ready Systems
Dalal Alrajeh, Liliana Pasquale, and Bashar Nuseibeh (Imperial College London, UK; University College Dublin, Ireland; Open University, UK; Lero, Ireland) Forensic readiness denotes the capability of a system to support digital forensic investigations of potential, known incidents by preserving in advance data that could serve as evidence explaining how an incident occurred. Given the increasing rate at which (potentially criminal) incidents occur, designing software systems that are forensic-ready can facilitate and reduce the costs of digital forensic investigations. However, to date, little or no attention has been given to how forensic-ready software systems can be designed systematically. In this paper, we propose to explicitly represent evidence preservation requirements prescribing preservation of the minimal amount of data that would be relevant to a future digital investigation. We formalise evidence preservation requirements and propose an approach for synthesising specifications for systems to meet these requirements. We present our prototype implementation—based on a satisfiability solver and a logic-based learner—which we use to evaluate our approach, applying it to two digital forensic corpora. Our evaluation suggests that our approach preserves relevant data that could support hypotheses of potential incidents. Moreover, it enables a significant reduction in the volume of data that would need to be examined during an investigation. @InProceedings{ESEC/FSE17p559, author = {Dalal Alrajeh and Liliana Pasquale and Bashar Nuseibeh}, title = {On Evidence Preservation Requirements for Forensic-Ready Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {559--569}, doi = {}, year = {2017}, } |
|
Amann, Sven |
ESEC/FSE '17: "CodeMatch: Obfuscation Won't ..."
CodeMatch: Obfuscation Won't Conceal Your Repackaged App
Leonid Glanz, Sven Amann, Michael Eichberg, Michael Reif, Ben Hermann, Johannes Lerch, and Mira Mezini (TU Darmstadt, Germany) An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps whose obfuscated code bases are very similar need not be repackages of each other. To reliably detect repackaged apps, we propose a two-step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app’s library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages. @InProceedings{ESEC/FSE17p638, author = {Leonid Glanz and Sven Amann and Michael Eichberg and Michael Reif and Ben Hermann and Johannes Lerch and Mira Mezini}, title = {CodeMatch: Obfuscation Won't Conceal Your Repackaged App}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {638--648}, doi = {}, year = {2017}, } Info |
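The fuzzy-matching step can be approximated with Jaccard similarity over token n-grams. This is a simple stand-in for the fuzzy hashing CodeMatch applies to abstracted app code, with hypothetical function names and toy "code" strings:

```python
def ngrams(tokens, n=3):
    """All contiguous token n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(code_a, code_b, n=3):
    """Jaccard similarity over token n-grams: close to 1.0 for clones,
    close to 0.0 for unrelated code. A small local edit only removes
    the n-grams that overlap it, so near-clones still score high."""
    a, b = ngrams(code_a.split(), n), ngrams(code_b.split(), n)
    return len(a & b) / len(a | b) if a | b else 1.0

similarity("load url ; show ad ; render", "load url ; show ad ; render")  # 1.0
```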
|
Amidon, Peter |
ESEC/FSE '17: "Automatic Inference of Code ..."
Automatic Inference of Code Transforms for Patch Generation
Fan Long, Peter Amidon, and Martin Rinard (Massachusetts Institute of Technology, USA; University of California at San Diego, USA) We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects. To the best of our knowledge, Genesis is the first system to automatically infer patch generation transforms or candidate patch search spaces from previous successful patches. @InProceedings{ESEC/FSE17p727, author = {Fan Long and Peter Amidon and Martin Rinard}, title = {Automatic Inference of Code Transforms for Patch Generation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {727--739}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Antonopoulos, Timos |
ESEC/FSE '17: "Counterexample-Guided Approach ..."
Counterexample-Guided Approach to Finding Numerical Invariants
ThanhVu Nguyen, Timos Antonopoulos, Andrew Ruef, and Michael Hicks (University of Nebraska-Lincoln, USA; Yale University, USA; University of Maryland, USA) Numerical invariants, e.g., relationships among numerical variables in a program, represent a useful class of properties for analyzing programs. General polynomial invariants represent more complex numerical relations and are often required in many scientific and engineering applications. We present NumInv, a tool that implements a counterexample-guided invariant generation (CEGIR) technique to automatically discover numerical invariants, which are polynomial equality and inequality relations among numerical variables. This CEGIR technique infers candidate invariants from program traces and then checks them against the program source code using the KLEE test-input generation tool. If the invariants are incorrect, KLEE returns counterexample traces, which help the dynamic inference obtain better results. Existing CEGIR approaches often require sound invariants; NumInv, in contrast, sacrifices soundness and produces results that KLEE cannot refute within certain time bounds. This design and the use of KLEE as a verifier allow NumInv to discover useful and important numerical invariants for many challenging programs. Preliminary results show that NumInv generates required invariants for understanding and verifying the correctness of programs involving complex arithmetic. We also show that NumInv discovers polynomial invariants that capture precise complexity bounds of programs used to benchmark existing static complexity analysis techniques. Finally, we show that NumInv performs competitively compared with state-of-the-art numerical invariant analysis tools. @InProceedings{ESEC/FSE17p605, author = {ThanhVu Nguyen and Timos Antonopoulos and Andrew Ruef and Michael Hicks}, title = {Counterexample-Guided Approach to Finding Numerical Invariants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {605--615}, doi = {}, year = {2017}, } |
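The counterexample-guided loop can be sketched for the simplest case, a linear equality invariant y = a*x + b. The bounded exhaustive `check` below stands in for KLEE, and all function names are hypothetical:

```python
def cegir(program, check, max_iters=10):
    """Infer a candidate invariant y = a*x + b from traces, ask the
    checker for a counterexample, and refine until none is found."""
    traces = [(x, program(x)) for x in (0, 1)]   # initial trace data
    for _ in range(max_iters):
        (x0, y0), (x1, y1) = traces[-2], traces[-1]
        a = (y1 - y0) / (x1 - x0)                # fit a line to the traces
        b = y0 - a * x0
        cex = check(program, a, b)
        if cex is None:
            return a, b                          # candidate survived all checks
        traces.append((cex, program(cex)))       # counterexample refines the fit
    return None

def check(program, a, b):
    """Bounded exhaustive checker standing in for KLEE: return an input
    violating the candidate invariant, or None if none is found."""
    return next((x for x in range(-100, 100) if program(x) != a * x + b), None)

cegir(lambda x: 2 * x + 3, check)  # converges to (2.0, 3.0)
```

As in NumInv, a candidate the checker cannot refute is accepted without a soundness proof.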
|
Apel, Sven |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info ESEC/FSE '17: "Attributed Variability Models: ..." 
Attributed Variability Models: Outside the Comfort Zone Norbert Siegmund, Stefan Sobernig, and Sven Apel (Bauhaus-University Weimar, Germany; WU Vienna, Austria; University of Passau, Germany) Variability models are often enriched with attributes, such as performance, that encode the influence of features on the respective attribute. In spite of their importance, there are only a few attributed variability models available that have attribute values obtained from empirical, real-world observations and that cover interactions between features. But what does it mean for research and practice when staying in the comfort zone of developing algorithms and tools in a setting where artificial attribute values are used and where interactions are neglected? This is the central question that we want to answer here. To leave the comfort zone, we use a combination of kernel density estimation and a genetic algorithm to rescale a given (real-world) attribute-value profile to a given variability model. To demonstrate the influence and relevance of realistic attribute values and interactions, we present a replication of a widely recognized, third-party study, into which we introduce realistic attribute values and interactions. We found statistically significant differences between the original study and the replication. We infer lessons learned for conducting experiments that involve attributed variability models. We also provide the accompanying tool Thor for generating attribute values, including interactions. Our solution is shown to be agnostic about the given input distribution and to scale to large variability models. @InProceedings{ESEC/FSE17p268, author = {Norbert Siegmund and Stefan Sobernig and Sven Apel}, title = {Attributed Variability Models: Outside the Comfort Zone}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {268--278}, doi = {}, year = {2017}, } Info ESEC/FSE '17: "Using Bad Learners to Find ..." 
Using Bad Learners to Find Good Configurations Vivek Nair, Tim Menzies, Norbert Siegmund, and Sven Apel (North Carolina State University, USA; Bauhaus-University Weimar, Germany; University of Passau, Germany) Finding the optimally performing configuration of a software system for a given setting is often challenging. Recent approaches address this challenge by learning performance models based on a sample set of configurations. However, building an accurate performance model can be very expensive (and is often infeasible in practice). The central insight of this paper is that exact performance values (e.g., the response time of a software system) are not required to rank configurations and to identify the optimal one. As shown by our experiments, performance models that are cheap to learn but inaccurate (with respect to the difference between actual and predicted performance) can still be used to rank configurations and hence find the optimal configuration. This novel rank-based approach allows us to significantly reduce the cost (in terms of the number of sample configurations measured) as well as the time required to build performance models. We evaluate our approach with 21 scenarios based on 9 software systems and demonstrate that our approach is beneficial in 16 scenarios; for the remaining 5 scenarios, an accurate model can be built by using very few samples anyway, without the need for a rank-based approach. @InProceedings{ESEC/FSE17p257, author = {Vivek Nair and Tim Menzies and Norbert Siegmund and Sven Apel}, title = {Using Bad Learners to Find Good Configurations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--267}, doi = {}, year = {2017}, } |
|
Atlee, Joanne M. |
ESEC/FSE '17: "Continuous Variable-Specific ..."
Continuous Variable-Specific Resolutions of Feature Interactions
M. Hadi Zibaeenejad, Chi Zhang, and Joanne M. Atlee (University of Waterloo, Canada) Systems that are assembled from independently developed features suffer from feature interactions, in which features affect one another’s behaviour in surprising ways. The Feature Interaction Problem results from trying to implement an appropriate resolution for each interaction within each possible context, because the number of possible contexts to consider increases exponentially with the number of features in the system. Resolution strategies aim to combat the Feature Interaction Problem by offering default strategies that resolve entire classes of interactions, thereby reducing the work needed to resolve a large number of interactions. However, most such approaches employ coarse-grained resolution strategies (e.g., feature priority) or a centralized arbitrator. Our work focuses on employing variable-specific default-resolution strategies that aim to resolve at runtime features’ conflicting actions on a system’s outputs. In this paper, we extend prior work to enable co-resolution of interactions on coupled output variables and to promote smooth continuous resolutions over execution paths. We implemented our approach within the PreScan simulator and performed a case study involving 15 automotive features; this entailed our devising and implementing three resolution strategies for three output variables. The results of the case study show that the approach produces smooth and continuous resolutions of interactions throughout interesting scenarios. @InProceedings{ESEC/FSE17p408, author = {M. Hadi Zibaeenejad and Chi Zhang and Joanne M. Atlee}, title = {Continuous Variable-Specific Resolutions of Feature Interactions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {408--418}, doi = {}, year = {2017}, } Info |
|
Aydin, Abdulbaki |
ESEC/FSE '17: "Constraint Normalization and ..."
Constraint Normalization and Parameterized Caching for Quantitative Program Analysis
Tegan Brennan, Nestan Tsiskaridze, Nicolás Rosner, Abdulbaki Aydin, and Tevfik Bultan (University of California at Santa Barbara, USA) Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial for the efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure under which the cardinality of the solution set of a constraint, but not necessarily the solution set itself, is preserved. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. A group-theoretic framework which generalizes earlier results on constraint normalization is used to express our normalization techniques. We also present a parameterized caching approach where, in addition to storing the result of a model-counting query, we also store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew can normalize the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieve a 10x speedup on the SMC/Kaluza-Big dataset, and an average 3x speedup in our SPF-based side-channel analysis experiments. @InProceedings{ESEC/FSE17p535, author = {Tegan Brennan and Nestan Tsiskaridze and Nicolás Rosner and Abdulbaki Aydin and Tevfik Bultan}, title = {Constraint Normalization and Parameterized Caching for Quantitative Program Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {535--546}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Bagherzadeh, Mojtaba |
ESEC/FSE '17: "Model-Level, Platform-Independent ..."
Model-Level, Platform-Independent Debugging in the Context of the Model-Driven Development of Real-Time Systems
Mojtaba Bagherzadeh, Nicolas Hili, and Juergen Dingel (Queen's University, Canada) Providing proper support for debugging models at model-level is one of the main barriers to a broader adoption of Model Driven Development (MDD). In this paper, we focus on the use of MDD for the development of real-time embedded systems (RTE). We introduce a new platform-independent approach to implement model-level debuggers. We describe how to realize support for model-level debugging entirely in terms of the modeling language and show how to implement this support in terms of a model-to-model transformation. Key advantages of the approach over existing work are that (1) it does not require a program debugger for the code generated from the model, and that (2) any changes to, e.g., the code generator, the target language, or the hardware platform leave the debugger completely unaffected. We also describe an implementation of the approach in the context of Papyrus-RT, an open source MDD tool based on the modeling language UML-RT. We summarize the results of the use of our model-based debugger on several use cases to determine its overhead in terms of size and performance. Despite being a prototype, the performance overhead is in the order of microseconds, while the size overhead is comparable with that of GDB, the GNU Debugger. @InProceedings{ESEC/FSE17p419, author = {Mojtaba Bagherzadeh and Nicolas Hili and Juergen Dingel}, title = {Model-Level, Platform-Independent Debugging in the Context of the Model-Driven Development of Real-Time Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {419--430}, doi = {}, year = {2017}, } Video Info Artifacts Functional |
|
Batory, Don |
ESEC/FSE '17: "Finding Near-Optimal Configurations ..."
Finding Near-Optimal Configurations in Product Lines by Random Sampling
Jeho Oh, Don Batory, Margaret Myers, and Norbert Siegmund (University of Texas at Austin, USA; Bauhaus-University Weimar, Germany) Software Product Lines (SPLs) are highly configurable systems. This raises the challenge to find optimal performing configurations for an anticipated workload. As SPL configuration spaces are huge, it is infeasible to benchmark all configurations to find an optimal one. Prior work focused on building performance models to predict and optimize SPL configurations. Instead, we randomly sample and recursively search a configuration space directly to find near-optimal configurations without constructing a prediction model. Our algorithms are simpler and have higher accuracy and efficiency. @InProceedings{ESEC/FSE17p61, author = {Jeho Oh and Don Batory and Margaret Myers and Norbert Siegmund}, title = {Finding Near-Optimal Configurations in Product Lines by Random Sampling}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {61--71}, doi = {}, year = {2017}, } |
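The model-free idea, benchmark only random samples and recursively narrow the search space around the best one, can be sketched for a numeric configuration space. This is a hypothetical illustration in the spirit of the paper, not the authors' exact algorithm:

```python
import random

def sample_search(lo, hi, measure, budget=10, rounds=6):
    """Benchmark only `budget` random configurations per round, keep the
    best seen so far, and shrink the search box around it. No performance
    prediction model is ever built."""
    best = None
    for _ in range(rounds):
        pts = [tuple(random.uniform(l, h) for l, h in zip(lo, hi))
               for _ in range(budget)]
        if best is not None:
            pts.append(best)                  # the incumbent never gets lost
        best = min(pts, key=measure)          # "benchmark" only the sample
        new_lo, new_hi = [], []
        for l, h, b in zip(lo, hi, best):
            w = (h - l) / 4                   # halve each dimension's range
            new_lo.append(max(l, b - w))
            new_hi.append(min(h, b + w))
        lo, hi = new_lo, new_hi
    return best

random.seed(1)
# minimize a made-up response-time surrogate over two numeric option values
cfg = sample_search([0.0, 0.0], [10.0, 10.0],
                    lambda c: (c[0] - 3) ** 2 + (c[1] - 7) ** 2)
```

With 6 rounds of 10 samples each, only 60 configurations are ever measured, yet the returned configuration lands near the optimum at (3, 7).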
|
Bavota, Gabriele |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped in 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identified a set of 38 mutation operators, and implemented an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info ESEC/FSE '17: "Detecting Missing Information ..." 
Detecting Missing Information in Bug Descriptions Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus, Gabriele Bavota, and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve bug description quality by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
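The pattern-based detection can be sketched with a few regular expressions. These toy patterns are hypothetical stand-ins; the paper derives 154 discourse patterns empirically from nearly 3k reports:

```python
import re

# Hypothetical, simplified discourse patterns for each content type.
EB_PATTERNS = [r"\bshould\b", r"\bexpected\b", r"\binstead of\b"]
S2R_PATTERNS = [r"\bsteps? to reproduce\b", r"^\s*\d+[.)]", r"\bclick(ed)? on\b"]

def has_pattern(text, patterns):
    """True if any discourse pattern matches the report text."""
    return any(re.search(p, text, re.IGNORECASE | re.MULTILINE)
               for p in patterns)

report = "The dialog crashes. 1. Open settings 2. Click on Save"
missing = [name for name, pats in [("EB", EB_PATTERNS), ("S2R", S2R_PATTERNS)]
           if not has_pattern(report, pats)]
# the report matches an S2R pattern ("Click on") but no EB pattern,
# so it would be flagged as missing the Expected Behavior
```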
|
Begel, Andrew |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Bernal-Cárdenas, Carlos |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped in 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identified a set of 38 mutation operators, and implemented an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info |
|
Bethmann, Anja |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Bianchi, Francesco A. |
ESEC/FSE '17: "Reproducing Concurrency Failures ..."
Reproducing Concurrency Failures from Crash Stacks
Francesco A. Bianchi, Mauro Pezzè, and Valerio Terragni (University of Lugano, Switzerland) Reproducing field failures is the first essential step for understanding, localizing and removing faults. Reproducing concurrency field failures is hard due to the need of synthesizing a test code jointly with a thread interleaving that induces the failure in the presence of limited information from the field. Current techniques for reproducing concurrency failures focus on identifying failure-inducing interleavings, leaving largely open the problem of synthesizing the test code that manifests such interleavings. In this paper, we present ConCrash, a technique to automatically generate test codes that reproduce concurrency failures that violate thread-safety from crash stacks, which commonly summarize the conditions of field failures. ConCrash efficiently explores the huge space of possible test codes to identify a failure-inducing one by using a suitable set of search pruning strategies. Combined with existing techniques for exploring interleavings, ConCrash automatically reproduces a given concurrency failure that violates the thread-safety of a class by identifying both a failure-inducing test code and corresponding interleaving. In the paper, we define the ConCrash approach, present a prototype implementation of ConCrash, and discuss the experimental results that we obtained on a known set of ten field failures that witness the effectiveness of the approach. @InProceedings{ESEC/FSE17p705, author = {Francesco A. Bianchi and Mauro Pezzè and Valerio Terragni}, title = {Reproducing Concurrency Failures from Crash Stacks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {705--716}, doi = {}, year = {2017}, } |
|
Binkley, David |
ESEC/FSE '17: "Generalized Observational ..."
Generalized Observational Slicing for Tree-Represented Modelling Languages
Nicolas E. Gold, David Binkley, Mark Harman , Syed Islam, Jens Krinke, and Shin Yoo (University College London, UK; Loyola University Maryland, USA; University of East London, UK; KAIST, South Korea) Model-driven software engineering raises the abstraction level making complex systems easier to understand than if written in textual code. Nevertheless, large complicated software systems can have large models, motivating the need for slicing techniques that reduce the size of a model. We present a generalization of observation-based slicing that allows the criterion to be defined using a variety of kinds of observable behavior and does not require any complex dependence analysis. We apply our implementation of generalized observational slicing for tree-structured representations to Simulink models. The resulting slice might be the subset of the original model responsible for an observed failure or simply the sub-model semantically related to a classic slicing criterion. Unlike its predecessors, the algorithm is also capable of slicing embedded Stateflow state machines. A study of nine real-world models drawn from four different application domains demonstrates the effectiveness of our approach at dramatically reducing Simulink model sizes for realistic observation scenarios: for 9 out of 20 cases, the resulting model has fewer than 25% of the original model's elements. @InProceedings{ESEC/FSE17p547, author = {Nicolas E. Gold and David Binkley and Mark Harman and Syed Islam and Jens Krinke and Shin Yoo}, title = {Generalized Observational Slicing for Tree-Represented Modelling Languages}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {547--558}, doi = {}, year = {2017}, } |
|
Böhme, Marcel |
ESEC/FSE '17: "Where Is the Bug and How Is ..."
Where Is the Bug and How Is It Fixed? An Experiment with Practitioners
Marcel Böhme, Ezekiel O. Soremekun, Sudipta Chattopadhyay, Emamurho Ugherughe, and Andreas Zeller (National University of Singapore, Singapore; Saarland University, Germany; Singapore University of Technology and Design, Singapore; SAP, Germany) Research has produced many approaches to automatically locate, explain, and repair software bugs. But do these approaches relate to the way practitioners actually locate, understand, and fix bugs? To help answer this question, we have collected a dataset named DBGBENCH --- the correct fault locations, bug diagnoses, and software patches of 27 real errors in open-source C projects that were consolidated from hundreds of debugging sessions of professional software engineers. Moreover, we shed light on the entire debugging process, from constructing a hypothesis to submitting a patch, and how debugging time, difficulty, and strategies vary across practitioners and types of errors. Most notably, DBGBENCH can serve as a reality check for novel automated debugging and repair techniques. @InProceedings{ESEC/FSE17p117, author = {Marcel Böhme and Ezekiel O. Soremekun and Sudipta Chattopadhyay and Emamurho Ugherughe and Andreas Zeller}, title = {Where Is the Bug and How Is It Fixed? An Experiment with Practitioners}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {117--128}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Brechmann, André |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Brennan, Tegan |
ESEC/FSE '17: "Constraint Normalization and ..."
Constraint Normalization and Parameterized Caching for Quantitative Program Analysis
Tegan Brennan, Nestan Tsiskaridze, Nicolás Rosner, Abdulbaki Aydin, and Tevfik Bultan (University of California at Santa Barbara, USA) Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial for efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure under which the cardinality of the solution set of a constraint, but not necessarily the solution set itself, is preserved. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. A group-theoretic framework which generalizes earlier results on constraint normalization is used to express our normalization techniques. We also present a parameterized caching approach where, in addition to storing the result of a model-counting query, we also store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew can normalize the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieve a 10x speedup on the SMC/Kaluza-Big dataset, and an average 3x speedup in our SPF-based side-channel analysis experiments. 
@InProceedings{ESEC/FSE17p535, author = {Tegan Brennan and Nestan Tsiskaridze and Nicolás Rosner and Abdulbaki Aydin and Tevfik Bultan}, title = {Constraint Normalization and Parameterized Caching for Quantitative Program Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {535--546}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Brown, David Bingham |
ESEC/FSE '17: "The Care and Feeding of Wild-Caught ..."
The Care and Feeding of Wild-Caught Mutants
David Bingham Brown, Michael Vaughn, Ben Liblit, and Thomas Reps (University of Wisconsin-Madison, USA) Mutation testing of a test suite and a program provides a way to measure the quality of the test suite. In essence, mutation testing is a form of sensitivity testing: by running mutated versions of the program against the test suite, mutation testing measures the suite’s sensitivity for detecting bugs that a programmer might introduce into the program. This paper introduces a technique to improve mutation testing that we call wild-caught mutants; it provides a method for creating potential faults that are more closely coupled with changes made by actual programmers. This technique allows the mutation tester to have more certainty that the test suite is sensitive to the kind of changes that have been observed to have been made by programmers in real-world cases. @InProceedings{ESEC/FSE17p511, author = {David Bingham Brown and Michael Vaughn and Ben Liblit and Thomas Reps}, title = {The Care and Feeding of Wild-Caught Mutants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {511--522}, doi = {}, year = {2017}, } Video Info Artifacts Reusable |
|
Brun, Yuriy |
ESEC/FSE '17: "Fairness Testing: Testing ..."
Fairness Testing: Testing Software for Discrimination
Sainyam Galhotra, Yuriy Brun , and Alexandra Meliou (University of Massachusetts at Amherst, USA) This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination. @InProceedings{ESEC/FSE17p498, author = {Sainyam Galhotra and Yuriy Brun and Alexandra Meliou}, title = {Fairness Testing: Testing Software for Discrimination}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {498--510}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Bultan, Tevfik |
ESEC/FSE '17: "Constraint Normalization and ..."
Constraint Normalization and Parameterized Caching for Quantitative Program Analysis
Tegan Brennan, Nestan Tsiskaridze, Nicolás Rosner, Abdulbaki Aydin, and Tevfik Bultan (University of California at Santa Barbara, USA) Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial for efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure under which the cardinality of the solution set of a constraint, but not necessarily the solution set itself, is preserved. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. A group-theoretic framework which generalizes earlier results on constraint normalization is used to express our normalization techniques. We also present a parameterized caching approach where, in addition to storing the result of a model-counting query, we also store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew can normalize the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieve a 10x speedup on the SMC/Kaluza-Big dataset, and an average 3x speedup in our SPF-based side-channel analysis experiments. 
@InProceedings{ESEC/FSE17p535, author = {Tegan Brennan and Nestan Tsiskaridze and Nicolás Rosner and Abdulbaki Aydin and Tevfik Bultan}, title = {Constraint Normalization and Parameterized Caching for Quantitative Program Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {535--546}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Cai, Yan |
ESEC/FSE '17: "AtexRace: Across Thread and ..."
AtexRace: Across Thread and Execution Sampling for In-House Race Detection
Yu Guo, Yan Cai, and Zijiang Yang (Western Michigan University, USA; Institute of Software at Chinese Academy of Sciences, China) Data race is a major source of concurrency bugs. Dynamic data race detection tools (e.g., FastTrack) monitor the executions of a program to report data races occurring at runtime. However, such tools incur significant overhead that slows down and perturbs executions. To address the issue, the state-of-the-art dynamic data race detection tools (e.g., LiteRace) apply sampling techniques to selectively monitor memory accesses. Although they reduce overhead, they also miss many data races as confirmed by existing studies. Thus, practitioners face a dilemma on whether to use FastTrack, which detects more data races but is much slower, or LiteRace, which is faster but detects fewer data races. In this paper, we propose a new sampling approach to address the major limitations of current sampling techniques, which ignore the facts that a data race involves two threads and that a program under testing is repeatedly executed. We develop a tool called AtexRace to sample memory accesses across both threads and executions. By selectively monitoring the pairs of memory accesses that have not been frequently observed in current and previous executions, AtexRace detects as many data races as FastTrack at a cost as low as LiteRace. We have compared AtexRace against FastTrack and LiteRace on both the Parsec benchmark suite and a large-scale real-world MySQL Server with 223 test cases. The experiments confirm that AtexRace can be a replacement of FastTrack and LiteRace. @InProceedings{ESEC/FSE17p315, author = {Yu Guo and Yan Cai and Zijiang Yang}, title = {AtexRace: Across Thread and Execution Sampling for In-House Race Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {315--325}, doi = {}, year = {2017}, } ESEC/FSE '17: "Adaptively Generating High ..." 
Adaptively Generating High Quality Fixes for Atomicity Violations Yan Cai, Lingwei Cao, and Jing Zhao (Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Harbin Engineering University, China) It is difficult to fix atomicity violations correctly. The existing gate lock algorithm (GLA) simply inserts gate locks to serialize executions, which may introduce performance bugs and deadlocks. Synthesized context-aware gate locks (by Grail) require complex source code synthesis. We propose Fixer to adaptively fix atomicity violations. It firstly analyses the lock acquisitions of an atomicity violation. Then it either adjusts the existing lock scope or inserts a gate lock. The former addresses cases where some locks are used but fail to provide atomic accesses. For the latter, it infers the visibility (being global or a field of a class/struct) of the gate lock such that the lock only protects related accesses. For both cases, Fixer further eliminates new lock orders to avoid introducing deadlocks. Of course, Fixer can produce both kinds of fixes on atomicity violations with locks. The experimental results on 15 previously used atomicity violations show that Fixer correctly fixed all 15 atomicity violations without introducing deadlocks. However, GLA and Grail both introduced 5 deadlocks. HFix (which only targets fixing certain types of atomicity violations) only fixed 2 atomicity violations and introduced 4 deadlocks. Fixer also provides an alternative way to insert gate locks (by inserting gate locks with proper visibility) considering fix acceptance. @InProceedings{ESEC/FSE17p303, author = {Yan Cai and Lingwei Cao and Jing Zhao}, title = {Adaptively Generating High Quality Fixes for Atomicity Violations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {303--314}, doi = {}, year = {2017}, } |
|
Cao, Lingwei |
ESEC/FSE '17: "Adaptively Generating High ..."
Adaptively Generating High Quality Fixes for Atomicity Violations
Yan Cai, Lingwei Cao, and Jing Zhao (Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Harbin Engineering University, China) It is difficult to fix atomicity violations correctly. The existing gate lock algorithm (GLA) simply inserts gate locks to serialize executions, which may introduce performance bugs and deadlocks. Synthesized context-aware gate locks (by Grail) require complex source code synthesis. We propose Fixer to adaptively fix atomicity violations. It firstly analyses the lock acquisitions of an atomicity violation. Then it either adjusts the existing lock scope or inserts a gate lock. The former addresses cases where some locks are used but fail to provide atomic accesses. For the latter, it infers the visibility (being global or a field of a class/struct) of the gate lock such that the lock only protects related accesses. For both cases, Fixer further eliminates new lock orders to avoid introducing deadlocks. Of course, Fixer can produce both kinds of fixes on atomicity violations with locks. The experimental results on 15 previously used atomicity violations show that Fixer correctly fixed all 15 atomicity violations without introducing deadlocks. However, GLA and Grail both introduced 5 deadlocks. HFix (which only targets fixing certain types of atomicity violations) only fixed 2 atomicity violations and introduced 4 deadlocks. Fixer also provides an alternative way to insert gate locks (by inserting gate locks with proper visibility) considering fix acceptance. @InProceedings{ESEC/FSE17p303, author = {Yan Cai and Lingwei Cao and Jing Zhao}, title = {Adaptively Generating High Quality Fixes for Atomicity Violations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {303--314}, doi = {}, year = {2017}, } |
|
Cappos, Justin |
ESEC/FSE '17: "Understanding Misunderstandings ..."
Understanding Misunderstandings in Source Code
Dan Gopstein, Jake Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, Martin K.-C. Yeh, and Justin Cappos (New York University, USA; Pennsylvania State University, USA; University of Colorado at Colorado Springs, USA) Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers we extracted a preliminary set of `atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback. @InProceedings{ESEC/FSE17p129, author = {Dan Gopstein and Jake Iannacone and Yu Yan and Lois DeLong and Yanyan Zhuang and Martin K.-C. Yeh and Justin Cappos}, title = {Understanding Misunderstandings in Source Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--139}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Casalnuovo, Casey |
ESEC/FSE '17: "Recovering Clear, Natural ..."
Recovering Clear, Natural Identifiers from Obfuscated JS Names
Bogdan Vasilescu, Casey Casalnuovo, and Premkumar Devanbu (Carnegie Mellon University, USA; University of California at Davis, USA) Well-chosen variable names are critical to source code readability, reusability, and maintainability. Unfortunately, in deployed JavaScript code (which is ubiquitous on the web) the identifier names are frequently minified and overloaded. This is done both for efficiency and also to protect potentially proprietary intellectual property. In this paper, we describe an approach based on statistical machine translation (SMT) that recovers some of the original names from the JavaScript programs minified by the very popular UglifyJS. This simple tool, Autonym, performs comparably to the best currently available deobfuscator for JavaScript, JSNice, which uses sophisticated static analysis. In fact, Autonym is quite complementary to JSNice, performing well when it does not, and vice versa. We also introduce a new tool, JSNaughty, which blends Autonym and JSNice, and significantly outperforms both at identifier name recovery, while remaining just as easy to use as JSNice. JSNaughty is available online at http://jsnaughty.org. @InProceedings{ESEC/FSE17p683, author = {Bogdan Vasilescu and Casey Casalnuovo and Premkumar Devanbu}, title = {Recovering Clear, Natural Identifiers from Obfuscated JS Names}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {683--693}, doi = {}, year = {2017}, } |
|
Castelluccio, Marco |
ESEC/FSE '17: "Automatically Analyzing Groups ..."
Automatically Analyzing Groups of Crashes for Finding Correlations
Marco Castelluccio, Carlo Sansone, Luisa Verdoliva, and Giovanni Poggi (Federico II University of Naples, Italy; Mozilla, UK) We devised an algorithm, inspired by contrast-set mining algorithms such as STUCCO, to automatically find statistically significant properties (correlations) in crash groups. Many earlier works focused on improving the clustering of crashes but, to the best of our knowledge, the problem of automatically describing properties of a cluster of crashes is so far unexplored. This means developers currently spend a fair amount of time analyzing the groups themselves, which in turn means that a) they are not spending their time actually developing a fix for the crash; and b) they might miss something in their exploration of the crash data (there is a large number of attributes in crash reports and it is hard and error-prone to manually analyze everything). Our algorithm helps developers and release managers understand crash reports more easily and in an automated way, helping in pinpointing the root cause of the crash. The tool implementing the algorithm has been deployed on Mozilla's crash reporting service. @InProceedings{ESEC/FSE17p717, author = {Marco Castelluccio and Carlo Sansone and Luisa Verdoliva and Giovanni Poggi}, title = {Automatically Analyzing Groups of Crashes for Finding Correlations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {717--726}, doi = {}, year = {2017}, } |
|
Cedrim, Diego |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to more frequently introduce long-living smells instead of eliminating existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases. 
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Celik, Ahmet |
ESEC/FSE '17: "Regression Test Selection ..."
Regression Test Selection Across JVM Boundaries
Ahmet Celik, Marko Vasic, Aleksandar Milicevic, and Milos Gligoric (University of Texas at Austin, USA; Microsoft, USA) Modern software development processes recommend that changes be integrated into the main development line of a project multiple times a day. Before a new revision may be integrated, developers practice regression testing to ensure that the latest changes do not break any previously established functionality. The cost of regression testing is high, due to an increase in the number of revisions that are introduced per day, as well as the number of tests developers write per revision. Regression test selection (RTS) optimizes regression testing by skipping tests that are not affected by recent project changes. Existing dynamic RTS techniques support only projects written in a single programming language, which is unfortunate given that an open-source project is, on average, written in several programming languages. We present the first dynamic RTS technique that does not stop at predefined language boundaries. Our technique dynamically detects, at the operating system level, all file artifacts a test depends on. Our technique is, hence, oblivious to the specific means the test uses to actually access the files: be it through spawning a new process, invoking a system call, invoking a library written in a different language, invoking a library that spawns a process which makes a system call, etc. We also provide a set of extension points which allow for a smooth integration with testing frameworks and build systems. We implemented our technique in a tool called RTSLinux as a loadable Linux kernel module and evaluated it on 21 Java projects that escape the JVM by spawning new processes or invoking native code, totaling 2,050,791 lines of code. Our results show that RTSLinux, on average, skips 74.17% of tests and saves 52.83% of test execution time compared to executing all tests.
@InProceedings{ESEC/FSE17p809, author = {Ahmet Celik and Marko Vasic and Aleksandar Milicevic and Milos Gligoric}, title = {Regression Test Selection Across JVM Boundaries}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {809--820}, doi = {}, year = {2017}, } |
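The selection step on top of the file artifacts that the abstract describes can be sketched as follows: record which files each test touched in the previous run, checksum them, and re-run only the tests whose dependencies changed. This is a minimal, hypothetical sketch of the selection logic only; RTSLinux itself collects the dependencies in the kernel, across process and language boundaries.

```python
import hashlib
import os

def file_digest(path):
    """Checksum one file artifact incrementally."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def select_tests(deps, old_digests):
    """deps: test name -> file paths the test touched in the last run.
    old_digests: path -> digest recorded after the last run.
    Returns the tests that must be re-run; all others are safely skipped."""
    changed = set()
    for path in {p for paths in deps.values() for p in paths}:
        digest = file_digest(path) if os.path.exists(path) else None
        if old_digests.get(path) != digest:
            changed.add(path)
    return sorted(t for t, paths in deps.items() if changed & set(paths))
```

A test whose recorded dependencies are all byte-identical to the previous revision cannot observe the change, so skipping it is safe under this dependency model.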
|
Chandramohan, Mahinthan |
ESEC/FSE '17: "Steelix: Program-State Based ..."
Steelix: Program-State Based Binary Fuzzing
Yuekang Li , Bihuan Chen, Mahinthan Chandramohan, Shang-Wei Lin, Yang Liu , and Alwen Tiu (Nanyang Technological University, Singapore; Fudan University, China) Coverage-based fuzzing is one of the most effective techniques to find vulnerabilities, bugs or crashes. However, existing techniques suffer from the difficulty in exercising the paths that are protected by magic bytes comparisons (e.g., string equality comparisons). Several approaches have been proposed to use heavy-weight program analysis to break through magic bytes comparisons, and hence are less scalable. In this paper, we propose a program-state based binary fuzzing approach, named Steelix, which improves the penetration power of a fuzzer at the cost of an acceptable slow down of the execution speed. In particular, we use light-weight static analysis and binary instrumentation to provide not only coverage information but also comparison progress information to a fuzzer. Such program state information informs a fuzzer about where the magic bytes are located in the test input and how to perform mutations to match the magic bytes efficiently. We have implemented Steelix and evaluated it on three datasets: LAVA-M dataset, DARPA CGC sample binaries and five real-life programs. The results show that Steelix has better code coverage and bug detection capability than the state-of-the-art fuzzers. Moreover, we found one CVE and nine new bugs. @InProceedings{ESEC/FSE17p627, author = {Yuekang Li and Bihuan Chen and Mahinthan Chandramohan and Shang-Wei Lin and Yang Liu and Alwen Tiu}, title = {Steelix: Program-State Based Binary Fuzzing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {627--637}, doi = {}, year = {2017}, } |
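The "comparison progress" feedback the abstract describes can be illustrated with a toy: if instrumentation reports how many bytes of a magic-byte comparison currently match, a fuzzer can fix one byte at a time instead of guessing the whole sequence at once. This is an invented miniature, not Steelix's actual instrumentation or mutation strategy.

```python
def match_progress(data, magic, offset):
    """Instrumentation feedback: number of leading magic bytes matched."""
    n = 0
    for i, m in enumerate(magic):
        if offset + i >= len(data) or data[offset + i] != m:
            break
        n += 1
    return n

def solve_magic(seed, magic, offset):
    """Mutate one byte at a time, keeping only mutations that increase the
    comparison progress, until all magic bytes are matched."""
    data = bytearray(seed)
    best = match_progress(data, magic, offset)
    while best < len(magic):
        pos = offset + best          # progress pinpoints the byte to mutate
        for v in range(256):
            data[pos] = v
            p = match_progress(data, magic, offset)
            if p > best:
                best = p
                break
    return bytes(data)
```

With plain coverage feedback the fuzzer would need on the order of 256^n tries for an n-byte magic value; with progress feedback it needs at most 256·n.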
|
Chaparro, Oscar |
ESEC/FSE '17: "Detecting Missing Information ..."
Detecting Missing Information in Bug Descriptions
Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus , Gabriele Bavota , and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve bug descriptions quality by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
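The detection side of this approach can be sketched with a few discourse patterns encoded as regexes: if no pattern for Expected Behavior (EB) or Steps to Reproduce (S2R) fires, that content is flagged as missing. The patterns below are hypothetical examples for illustration; the paper catalogs 154 patterns and evaluates several classifier configurations.

```python
import re

# Illustrative discourse patterns (invented), lowercase-matched.
EB_PATTERNS = [r"\bshould\b", r"\bexpected\b", r"\bsupposed to\b", r"\binstead of\b"]
S2R_PATTERNS = [r"\bsteps to reproduce\b", r"^\s*\d+[.)]\s", r"\bclick(ed)? on\b"]

def missing_info(description):
    """Return which of EB / S2R appear to be absent from a bug description."""
    text = description.lower()
    def present(patterns):
        return any(re.search(p, text, re.MULTILINE) for p in patterns)
    return {"EB": not present(EB_PATTERNS), "S2R": not present(S2R_PATTERNS)}
```

A reporting form could use such a check to alert the reporter before the bug is submitted, which is exactly the intended use the abstract mentions.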
|
Chattopadhyay, Sudipta |
ESEC/FSE '17: "Where Is the Bug and How Is ..."
Where Is the Bug and How Is It Fixed? An Experiment with Practitioners
Marcel Böhme, Ezekiel O. Soremekun, Sudipta Chattopadhyay, Emamurho Ugherughe, and Andreas Zeller (National University of Singapore, Singapore; Saarland University, Germany; Singapore University of Technology and Design, Singapore; SAP, Germany) Research has produced many approaches to automatically locate, explain, and repair software bugs. But do these approaches relate to the way practitioners actually locate, understand, and fix bugs? To help answer this question, we have collected a dataset named DBGBENCH --- the correct fault locations, bug diagnoses, and software patches of 27 real errors in open-source C projects that were consolidated from hundreds of debugging sessions of professional software engineers. Moreover, we shed light on the entire debugging process, from constructing a hypothesis to submitting a patch, and how debugging time, difficulty, and strategies vary across practitioners and types of errors. Most notably, DBGBENCH can serve as reality check for novel automated debugging and repair techniques. @InProceedings{ESEC/FSE17p117, author = {Marcel Böhme and Ezekiel O. Soremekun and Sudipta Chattopadhyay and Emamurho Ugherughe and Andreas Zeller}, title = {Where Is the Bug and How Is It Fixed? An Experiment with Practitioners}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {117--128}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Chaudhuri, Swarat |
ESEC/FSE '17: "Bayesian Specification Learning ..."
Bayesian Specification Learning for Finding API Usage Errors
Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine (Rice University, USA) We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a "reference distribution" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches. @InProceedings{ESEC/FSE17p151, author = {Vijayaraghavan Murali and Swarat Chaudhuri and Chris Jermaine}, title = {Bayesian Specification Learning for Finding API Usage Errors}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {151--162}, doi = {}, year = {2017}, } |
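The quantitative framing in this abstract, anomaly as a distance between the model's reference distribution over behaviors and the program's observed distribution, can be instantiated with KL divergence. This is one plausible choice for illustration; the distributions here are invented and Salento's actual model and distance are more involved.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) over a shared support; eps smooths unseen behaviors."""
    support = set(p) | set(q)
    return sum(p[x] * math.log(p[x] / (q.get(x, 0.0) + eps))
               for x in support if p.get(x, 0.0) > 0)

def anomaly_score(reference, observed):
    """Higher score = the program's API-usage behavior diverges more from
    what the learned specification model expects, hence likelier a bug."""
    return kl_divergence(observed, reference)
```

A program that emits a call sequence the reference model assigns (near-)zero probability, such as a file opened and read but never closed, scores far higher than one whose behavior the model expects.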
|
Chávez, Alexander |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to more frequently introduce long-living smells instead of eliminating existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring created the smell Speculative Generality in 68% of the cases.
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Chen, Bihuan |
ESEC/FSE '17: "Loopster: Static Loop Termination ..."
Loopster: Static Loop Termination Analysis
Xiaofei Xie, Bihuan Chen, Liang Zou, Shang-Wei Lin, Yang Liu, and Xiaohong Li (Tianjin University, China; Nanyang Technological University, Singapore) Loop termination is an important problem for proving the correctness of a system and ensuring that the system always reacts. Existing loop termination analysis techniques mainly depend on the synthesis of ranking functions, which is often expensive. In this paper, we present a novel approach, named Loopster, which performs an efficient static analysis to decide the termination of loops based on path termination analysis and path dependency reasoning. Loopster adopts a divide-and-conquer approach: (1) we extract individual paths from a target multi-path loop and analyze the termination of each path, (2) analyze the dependencies between each pair of paths, and then (3) determine the overall termination of the target loop based on the relations among paths. We evaluate Loopster by applying it to the loop termination competition benchmark and three real-world projects. The results show that Loopster is effective in a majority of loops, with better accuracy and a 20×+ performance improvement compared to the state-of-the-art tools. @InProceedings{ESEC/FSE17p84, author = {Xiaofei Xie and Bihuan Chen and Liang Zou and Shang-Wei Lin and Yang Liu and Xiaohong Li}, title = {Loopster: Static Loop Termination Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {84--94}, doi = {}, year = {2017}, } ESEC/FSE '17: "Steelix: Program-State Based ..." Steelix: Program-State Based Binary Fuzzing Yuekang Li, Bihuan Chen, Mahinthan Chandramohan, Shang-Wei Lin, Yang Liu, and Alwen Tiu (Nanyang Technological University, Singapore; Fudan University, China) Coverage-based fuzzing is one of the most effective techniques to find vulnerabilities, bugs or crashes. However, existing techniques suffer from the difficulty in exercising the paths that are protected by magic bytes comparisons (e.g., string equality comparisons).
Several approaches have been proposed to use heavy-weight program analysis to break through magic bytes comparisons, and hence are less scalable. In this paper, we propose a program-state based binary fuzzing approach, named Steelix, which improves the penetration power of a fuzzer at the cost of an acceptable slow down of the execution speed. In particular, we use light-weight static analysis and binary instrumentation to provide not only coverage information but also comparison progress information to a fuzzer. Such program state information informs a fuzzer about where the magic bytes are located in the test input and how to perform mutations to match the magic bytes efficiently. We have implemented Steelix and evaluated it on three datasets: LAVA-M dataset, DARPA CGC sample binaries and five real-life programs. The results show that Steelix has better code coverage and bug detection capability than the state-of-the-art fuzzers. Moreover, we found one CVE and nine new bugs. @InProceedings{ESEC/FSE17p627, author = {Yuekang Li and Bihuan Chen and Mahinthan Chandramohan and Shang-Wei Lin and Yang Liu and Alwen Tiu}, title = {Steelix: Program-State Based Binary Fuzzing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {627--637}, doi = {}, year = {2017}, } |
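The per-path step of Loopster's divide-and-conquer scheme can be illustrated on the simplest case: a path whose body adds a constant to the loop variable terminates exactly when the update drives the variable toward violating the guard. The shapes and the all-paths combination below are a drastic simplification invented for illustration; the real analysis also reasons about dependencies between paths.

```python
def path_terminates(delta, op, bound):
    """One extracted path of the shape `while x {op} bound: x += delta`
    over the integers. It terminates from every initial x iff each
    iteration moves x strictly toward violating the guard (a ranking
    argument); `bound` itself does not affect the verdict."""
    if op == "<":
        return delta > 0
    if op == ">":
        return delta < 0
    raise ValueError(f"unsupported guard {op!r}")

def loop_terminates(paths):
    """Grossly simplified combination step: require every path of the
    multi-path loop to terminate on its own. Loopster's path-dependency
    reasoning additionally rules out non-terminating interleavings."""
    return all(path_terminates(d, op, b) for d, op, b in paths)
```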
|
Chen, Qingying |
ESEC/FSE '17: "On the Scalability of Linux ..."
On the Scalability of Linux Kernel Maintainers' Work
Minghui Zhou, Qingying Chen, Audris Mockus, and Fengguang Wu (Peking University, China; University of Tennessee, USA; Intel, China) Open source software ecosystems evolve ways to balance the workload among groups of participants ranging from core groups to peripheral groups. As ecosystems grow, it is not clear whether the mechanisms that previously made them work will continue to be relevant or whether new mechanisms will need to evolve. The impact of failure for critical ecosystems such as Linux is enormous, yet the understanding of why they function and are effective is limited. We, therefore, aim to understand how the Linux kernel sustains its growth, how to characterize the workload of maintainers, and whether or not the existing mechanisms are scalable. We quantify maintainers’ work through the files that are maintained, and the change activity and the numbers of contributors in those files. We find systematic differences among modules; these differences are stable over time, which suggests that certain architectural features, commercial interests, or module-specific practices lead to distinct sustainable equilibria. We find that most of the modules have not grown appreciably over the last decade; most growth has been absorbed by a few modules. We also find that the effort per maintainer does not increase, even though the community has hypothesized that required effort might increase. However, the distribution of work among maintainers is highly unbalanced, suggesting that a few maintainers may experience increasing workload. We find that the practice of assigning multiple maintainers to a file yields only a square-root (power of 1/2) increase in productivity. We expect that our proposed framework to quantify maintainer practices will help clarify the factors that allow rapidly growing ecosystems to be sustainable.
@InProceedings{ESEC/FSE17p27, author = {Minghui Zhou and Qingying Chen and Audris Mockus and Fengguang Wu}, title = {On the Scalability of Linux Kernel Maintainers' Work}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {27--37}, doi = {}, year = {2017}, } Info |
|
Chen, Yuting |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments and are developed under the time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17–31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
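Test generation from a stochastic GUI model, the core of phase (2) above, amounts to a weighted random walk over states and events. The model shape below (state → weighted outgoing events) is an invented miniature; Stoat additionally mutates the weights via Gibbs sampling, which this sketch omits.

```python
import random

def sample_test(model, start, length, rng=random.Random(0)):
    """Sample one GUI test (an event sequence) from a stochastic model.
    model[state] = list of (event, next_state, weight)."""
    state, events = start, []
    for _ in range(length):
        choices = model.get(state)
        if not choices:
            break                        # no outgoing events: stop the walk
        total = sum(w for _, _, w in choices)
        r = rng.uniform(0, total)        # weighted choice over events
        for event, nxt, w in choices:
            r -= w
            if r <= 0:
                events.append(event)
                state = nxt
                break
    return events
```

Raising the weight of a rarely-exercised event biases future sampled tests toward it, which is how weight mutation can steer generation toward unexplored behavior.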
|
Cheung, Shing-Chi |
ESEC/FSE '17: "OASIS: Prioritizing Static ..."
OASIS: Prioritizing Static Analysis Warnings for Android Apps Based on App User Reviews
Lili Wei, Yepang Liu, and Shing-Chi Cheung (Hong Kong University of Science and Technology, China) Lint is a widely-used static analyzer for detecting bugs/issues in Android apps. However, it can generate many false warnings. One existing solution to this problem is to leverage project history data (e.g., bug fixing statistics) for warning prioritization. Unfortunately, such techniques are biased toward a project’s archived warnings and can easily miss new issues. Another weakness is that developers cannot readily relate the warnings to the impacts perceivable by users. To overcome these weaknesses, in this paper, we propose a semantics-aware approach, OASIS, to prioritizing Lint warnings by leveraging app user reviews. OASIS combines program analysis and NLP techniques to recover the intrinsic links between the Lint warnings for a given app and the user complaints on the app problems caused by the issues of concern. OASIS leverages the strength of such links to prioritize warnings. We evaluated OASIS on six popular and large-scale open-source Android apps. The results show that OASIS can effectively prioritize Lint warnings and help identify new issues that are previously unknown to app developers. @InProceedings{ESEC/FSE17p672, author = {Lili Wei and Yepang Liu and Shing-Chi Cheung}, title = {OASIS: Prioritizing Static Analysis Warnings for Android Apps Based on App User Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {672--682}, doi = {}, year = {2017}, } |
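The warning-to-review linking can be approximated very crudely with bag-of-words cosine similarity: a warning whose text resembles some user complaint ranks above one that matches no review. The texts and threshold below are invented, and this stands in for, but is much weaker than, OASIS's combination of program analysis and NLP.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def prioritize(warnings, reviews, threshold=0.2):
    """Rank warnings by their best-matching user complaint; drop warnings
    no review appears to complain about."""
    review_vecs = [bow(r) for r in reviews]
    scored = sorted(((max((cosine(bow(w), rv) for rv in review_vecs),
                          default=0.0), w) for w in warnings), reverse=True)
    return [w for score, w in scored if score >= threshold]
```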
|
Christakis, Maria |
ESEC/FSE '17: "Failure-Directed Program Trimming ..."
Failure-Directed Program Trimming
Kostas Ferles , Valentin Wüstholz, Maria Christakis, and Isil Dillig (University of Texas at Austin, USA; University of Kent, UK) This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P′ such that P and P′ are equi-safe (i.e., P′ has a bug if and only if P has a bug), but P′ has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques. @InProceedings{ESEC/FSE17p174, author = {Kostas Ferles and Valentin Wüstholz and Maria Christakis and Isil Dillig}, title = {Failure-Directed Program Trimming}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {174--185}, doi = {}, year = {2017}, } |
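The equi-safety invariant above can be made concrete in miniature: represent a program as a set of paths, each a guard plus an assertion, and trim every path that analysis proves cannot fail. Here the "analysis" is exhaustive checking over a small finite input domain, purely for illustration; the paper's point is to achieve this with a lightweight static analysis instead.

```python
def trim(paths, domain):
    """Equi-safe trimming: drop every path that provably cannot violate
    its assertion on any input in `domain`. Each path is a pair
    (path_condition, assertion), both predicates over one input.
    The trimmed program has a bug iff the original one does."""
    def may_fail(path_cond, assertion):
        return any(path_cond(x) and not assertion(x) for x in domain)
    return [p for p in paths if may_fail(*p)]
```

A downstream checker (abstract interpreter, symbolic executor) then explores only the surviving paths, which is where the scalability benefit comes from.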
|
Chu, Duc-Hiep |
ESEC/FSE '17: "S3: Syntax- and Semantic-Guided ..."
S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser (Singapore Management University, Singapore; IST Austria, Austria; Carnegie Mellon University, USA; Stellenbosch University, South Africa) A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with a difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) A systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) An efficient enumeration-based search strategy over the constrained search space, and (3) A number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3’s repair effectiveness with state-of-the-art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs. @InProceedings{ESEC/FSE17p593, author = {Xuan-Bach D. Le and Duc-Hiep Chu and David Lo and Claire Le Goues and Willem Visser}, title = {S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {593--604}, doi = {}, year = {2017}, } |
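The three ingredients above, a constrained syntactic space, enumerative search, and distance-based ranking, can be shown together in miniature: enumerate arithmetic expressions over a tiny invented DSL, keep those consistent with the input-output examples, and prefer the candidate syntactically closest to the buggy expression. The DSL, depth bound, and character-level distance are all illustrative simplifications of S3's design.

```python
import itertools

def synthesize(examples, buggy_expr, max_depth=2):
    """Enumerative, example-guided repair over a toy integer DSL."""
    leaves = ["x", "0", "1", "2"]

    def exprs(depth):
        if depth == 0:
            yield from leaves
            return
        yield from exprs(depth - 1)
        for op in "+-*":
            for a, b in itertools.product(list(exprs(depth - 1)), repeat=2):
                yield f"({a} {op} {b})"

    def consistent(e):
        return all(eval(e, {"x": i}) == o for i, o in examples)

    candidates = [e for e in exprs(max_depth) if consistent(e)]

    def distance(e):  # crude syntactic distance to the original expression
        return abs(len(e) - len(buggy_expr)) + \
               sum(a != b for a, b in zip(e, buggy_expr))

    return min(candidates, key=distance)
```

Ranking by distance to the buggy expression encodes the intuition that a minimal edit is less likely to overfit the sparse specification than an arbitrary consistent expression.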
|
Coelho, Jailton |
ESEC/FSE '17: "Why Modern Open Source Projects ..."
Why Modern Open Source Projects Fail
Jailton Coelho and Marco Tulio Valente (Federal University of Minas Gerais, Brazil) Open source is experiencing a renaissance period, due to the appearance of modern platforms and workflows for developing and maintaining public code. As a result, developers are creating open source software at speeds never seen before. Consequently, these projects are also facing unprecedented mortality rates. To better understand the reasons for the failure of modern open source projects, this paper describes the results of a survey with the maintainers of 104 popular GitHub systems that have been deprecated. We provide a set of nine reasons for the failure of these open source projects. We also show that some maintenance practices---specifically the adoption of contributing guidelines and continuous integration---have an important association with a project failure or success. Finally, we discuss and reveal the principal strategies developers have tried to overcome the failure of the studied projects. @InProceedings{ESEC/FSE17p186, author = {Jailton Coelho and Marco Tulio Valente}, title = {Why Modern Open Source Projects Fail}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {186--196}, doi = {}, year = {2017}, } |
|
Danas, Natasha |
ESEC/FSE '17: "The Power of "Why" ..."
The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance
Tim Nelson, Natasha Danas, Daniel J. Dougherty, and Shriram Krishnamurthi (Brown University, USA; Worcester Polytechnic Institute, USA) Scenario-finding tools like the Alloy Analyzer are widely used in numerous concrete domains like security, network analysis, UML analysis, and so on. They can help to verify properties and, more generally, aid in exploring a system's behavior. While scenario finders are valuable for their ability to produce concrete examples, individual scenarios only give insight into what is possible, leaving the user to make their own conclusions about what might be necessary. This paper enriches scenario finding by allowing users to ask ``why?'' and ``why not?'' questions about the examples they are given. We show how to distinguish parts of an example that cannot be consistently removed (or changed) from those that merely reflect underconstraint in the specification. In the former case we show how to determine which elements of the specification and which other components of the example together explain the presence of such facts. This paper formalizes the act of computing provenance in scenario-finding. We present Amalgam, an extension of the popular Alloy scenario-finder, which implements these foundations and provides interactive exploration of examples. We also evaluate Amalgam's algorithmics on a variety of both textbook and real-world examples. @InProceedings{ESEC/FSE17p106, author = {Tim Nelson and Natasha Danas and Daniel J. Dougherty and Shriram Krishnamurthi}, title = {The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {106--116}, doi = {}, year = {2017}, } Info Artifacts Reusable Best-Paper Award |
|
D'Antoni, Loris |
ESEC/FSE '17: "NoFAQ: Synthesizing Command ..."
NoFAQ: Synthesizing Command Repairs from Examples
Loris D'Antoni , Rishabh Singh, and Michael Vaughn (University of Wisconsin-Madison, USA; Microsoft Research, USA) Command-line tools are confusing and hard to use due to their cryptic error messages and lack of documentation. Novice users often resort to online help-forums for finding corrections to their buggy commands, but have a hard time in searching precisely for posts that are relevant to their problem and then applying the suggested solutions to their buggy command. We present NoFAQ, a tool that uses a set of rules to suggest possible fixes when users write buggy commands that trigger commonly occurring errors. The rules are expressed in a language called FIXIT and each rule pattern-matches against the user's buggy command and corresponding error message, and uses these inputs to produce a possible fixed command. NoFAQ automatically learns FIXIT rules from examples of buggy and repaired commands. We evaluate NoFAQ on two fronts. First, we use 92 benchmark problems drawn from an existing tool and show that NoFAQ is able to synthesize rules for 81 benchmark problems in real time using just 2 to 5 input-output examples for each rule. Second, we run our learning algorithm on the examples obtained through a crowd-sourcing interface and show that the learning algorithm scales to large sets of examples. @InProceedings{ESEC/FSE17p582, author = {Loris D'Antoni and Rishabh Singh and Michael Vaughn}, title = {NoFAQ: Synthesizing Command Repairs from Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {582--592}, doi = {}, year = {2017}, } |
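A FIXIT-style rule, as the abstract describes, pattern-matches both the buggy command and its error message, binds variables, and instantiates a fix template. The rule below is a hypothetical hand-written example (NoFAQ learns such rules automatically from buggy/repaired command pairs, and FIXIT's matching is richer than these regexes).

```python
import re

# One illustrative rule: (command pattern, error pattern, fix template).
RULES = [
    (re.compile(r"^git (\S+)$"),
     re.compile(r"most similar command is\s+(\S+)"),
     "git {suggestion}"),
]

def repair(command, error):
    """Return a suggested fixed command, or None if no rule matches."""
    for cmd_pat, err_pat, template in RULES:
        cmd_m, err_m = cmd_pat.match(command), err_pat.search(error)
        if cmd_m and err_m:
            return template.format(suggestion=err_m.group(1))
    return None
```

Because the fix template reuses text bound from the error message, one rule generalizes over every mistyped subcommand that triggers this error shape.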
|
Davidson, Drew |
ESEC/FSE '17: "Cimplifier: Automatically ..."
Cimplifier: Automatically Debloating Containers
Vaibhav Rastogi, Drew Davidson, Lorenzo De Carli, Somesh Jha, and Patrick McDaniel (University of Wisconsin-Madison, USA; Tala Security, USA; Colorado State University, USA; Pennsylvania State University, USA) Application containers, such as those provided by Docker, have recently gained popularity as a solution for agile and seamless software deployment. These light-weight virtualization environments run applications that are packed together with their resources and configuration information, and thus can be deployed across various software platforms. Unfortunately, the ease with which containers can be created is oftentimes a double-edged sword, encouraging the packaging of logically distinct applications, and the inclusion of a significant amount of unnecessary components, within a single container. These practices needlessly increase the container size—sometimes by orders of magnitude. They also decrease the overall security, as each included component—necessary or not—may bring in security issues of its own, and there is no isolation between multiple applications packaged within the same container image. We propose algorithms and a tool called Cimplifier, which address these concerns: given a container and simple user-defined constraints, our tool partitions it into simpler containers, which (i) are isolated from each other, only communicating as necessary, and (ii) only include enough resources to perform their functionality. Our evaluation on real-world containers demonstrates that Cimplifier preserves the original functionality, leads to reduction in image size of up to 95%, and processes even large containers in under thirty seconds. @InProceedings{ESEC/FSE17p476, author = {Vaibhav Rastogi and Drew Davidson and Lorenzo De Carli and Somesh Jha and Patrick McDaniel}, title = {Cimplifier: Automatically Debloating Containers}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {476--486}, doi = {}, year = {2017}, } |
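The partitioning step can be sketched as grouping: processes that share resources (observed at runtime) land in the same slim container, optional user constraints merge groups, and everything a group never touches is left out. The union-find grouping below is an invented stand-in for Cimplifier's partitioning algorithms, which also insert communication glue between the resulting containers.

```python
def partition(process_resources, user_constraints=()):
    """process_resources: process name -> set of resource paths it uses.
    user_constraints: pairs of processes the user wants co-located.
    Returns sorted groups of processes; each group becomes one container
    holding only the union of its members' resources."""
    parents = {p: p for p in process_resources}

    def find(p):
        while parents[p] != p:
            parents[p] = parents[parents[p]]   # path halving
            p = parents[p]
        return p

    def union(a, b):
        parents[find(a)] = find(b)

    owner = {}
    for proc, resources in process_resources.items():
        for r in resources:
            if r in owner:
                union(proc, owner[r])          # shared resource: same group
            owner[r] = proc
    for a, b in user_constraints:
        union(a, b)

    groups = {}
    for p in process_resources:
        groups.setdefault(find(p), set()).add(p)
    return sorted(map(sorted, groups.values()))
```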
|
De Carli, Lorenzo |
ESEC/FSE '17: "Cimplifier: Automatically ..."
Cimplifier: Automatically Debloating Containers
Vaibhav Rastogi, Drew Davidson, Lorenzo De Carli, Somesh Jha, and Patrick McDaniel (University of Wisconsin-Madison, USA; Tala Security, USA; Colorado State University, USA; Pennsylvania State University, USA) Application containers, such as those provided by Docker, have recently gained popularity as a solution for agile and seamless software deployment. These light-weight virtualization environments run applications that are packed together with their resources and configuration information, and thus can be deployed across various software platforms. Unfortunately, the ease with which containers can be created is oftentimes a double-edged sword, encouraging the packaging of logically distinct applications, and the inclusion of significant amounts of unnecessary components, within a single container. These practices needlessly increase the container size—sometimes by orders of magnitude. They also decrease the overall security, as each included component—necessary or not—may bring in security issues of its own, and there is no isolation between multiple applications packaged within the same container image. We propose algorithms and a tool called Cimplifier, which address these concerns: given a container and simple user-defined constraints, our tool partitions it into simpler containers, which (i) are isolated from each other, only communicating as necessary, and (ii) only include enough resources to perform their functionality. Our evaluation on real-world containers demonstrates that Cimplifier preserves the original functionality, leads to reduction in image size of up to 95%, and processes even large containers in under thirty seconds. @InProceedings{ESEC/FSE17p476, author = {Vaibhav Rastogi and Drew Davidson and Lorenzo De Carli and Somesh Jha and Patrick McDaniel}, title = {Cimplifier: Automatically Debloating Containers}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {476--486}, doi = {}, year = {2017}, } |
|
DeLong, Lois |
ESEC/FSE '17: "Understanding Misunderstandings ..."
Understanding Misunderstandings in Source Code
Dan Gopstein, Jake Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, Martin K.-C. Yeh, and Justin Cappos (New York University, USA; Pennsylvania State University, USA; University of Colorado at Colorado Springs, USA) Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers we extracted a preliminary set of `atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback. @InProceedings{ESEC/FSE17p129, author = {Dan Gopstein and Jake Iannacone and Yu Yan and Lois DeLong and Yanyan Zhuang and Martin K.-C. Yeh and Justin Cappos}, title = {Understanding Misunderstandings in Source Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--139}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
De Mello, Rafael |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to more frequently introduce long-living smells instead of eliminating existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases. 
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Devanbu, Premkumar |
ESEC/FSE '17: "Are Deep Neural Networks the ..."
Are Deep Neural Networks the Best Choice for Modeling Source Code?
Vincent J. Hellendoorn and Premkumar Devanbu (University of California at Davis, USA) Current statistical language modeling techniques, including deep-learning based models, have proven to be quite effective for source code. We argue here that the special properties of source code can be exploited for further improvements. In this work, we enhance established language modeling approaches to handle the special challenges of modeling source code, such as: frequent changes, larger, changing vocabularies, deeply nested scopes, etc. We present a fast, nested language modeling toolkit specifically designed for software, with the ability to add & remove text, and mix & swap out many models. Specifically, we improve upon prior cache-modeling work and present a model with a much more expansive, multi-level notion of locality that we show to be well-suited for modeling software. We present results on varying corpora in comparison with traditional N-gram, as well as RNN, and LSTM deep-learning language models, and release all our source code for public use. Our evaluations suggest that carefully adapting N-gram models for source code can yield performance that surpasses even RNN and LSTM based deep-learning models. @InProceedings{ESEC/FSE17p763, author = {Vincent J. Hellendoorn and Premkumar Devanbu}, title = {Are Deep Neural Networks the Best Choice for Modeling Source Code?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {763--773}, doi = {}, year = {2017}, } Info ESEC/FSE '17: "Recovering Clear, Natural ..." Recovering Clear, Natural Identifiers from Obfuscated JS Names Bogdan Vasilescu, Casey Casalnuovo, and Premkumar Devanbu (Carnegie Mellon University, USA; University of California at Davis, USA) Well-chosen variable names are critical to source code readability, reusability, and maintainability. Unfortunately, in deployed JavaScript code (which is ubiquitous on the web) the identifier names are frequently minified and overloaded. 
This is done both for efficiency and also to protect potentially proprietary intellectual property. In this paper, we describe an approach based on statistical machine translation (SMT) that recovers some of the original names from the JavaScript programs minified by the very popular UglifyJS. This simple tool, Autonym, performs comparably to the best currently available deobfuscator for JavaScript, JSNice, which uses sophisticated static analysis. In fact, Autonym is quite complementary to JSNice, performing well when it does not, and vice versa. We also introduce a new tool, JSNaughty, which blends Autonym and JSNice, and significantly outperforms both at identifier name recovery, while remaining just as easy to use as JSNice. JSNaughty is available online at http://jsnaughty.org. @InProceedings{ESEC/FSE17p683, author = {Bogdan Vasilescu and Casey Casalnuovo and Premkumar Devanbu}, title = {Recovering Clear, Natural Identifiers from Obfuscated JS Names}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {683--693}, doi = {}, year = {2017}, } |
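The multi-level cache-modeling idea in the Hellendoorn and Devanbu entry above can be sketched in its simplest form. This is a hypothetical illustration, not the authors' toolkit: a bigram model whose probability estimate interpolates a global corpus count with a small cache of recently seen tokens, the core of cache-based locality in language models of code. The class and parameter names (`CachedBigramModel`, `lam`, `cache_size`) are invented for the sketch.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: bigram model with a local token cache.
class CachedBigramModel:
    def __init__(self, lam=0.5, cache_size=100):
        self.bigrams = defaultdict(Counter)  # context token -> next-token counts
        self.lam = lam                       # weight of the cache component
        self.cache = []                      # recently seen tokens (locality)
        self.cache_size = cache_size

    def train(self, tokens):
        # Count bigrams over the training corpus.
        for prev, cur in zip(tokens, tokens[1:]):
            self.bigrams[prev][cur] += 1

    def observe(self, token):
        # Record a token from the file currently being modeled.
        self.cache.append(token)
        self.cache = self.cache[-self.cache_size:]

    def prob(self, prev, cur):
        # Interpolate the global corpus estimate with the local cache estimate.
        counts = self.bigrams[prev]
        total = sum(counts.values())
        global_p = counts[cur] / total if total else 0.0
        cache_p = self.cache.count(cur) / len(self.cache) if self.cache else 0.0
        return (1 - self.lam) * global_p + self.lam * cache_p
```

The effect the paper exploits is visible even here: once a token has been seen locally, its estimated probability rises above the global corpus estimate.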
|
Dietsch, Daniel |
ESEC/FSE '17: "Craig vs. Newton in Software ..."
Craig vs. Newton in Software Model Checking
Daniel Dietsch, Matthias Heizmann, Betim Musa, Alexander Nutz, and Andreas Podelski (University of Freiburg, Germany) Ever since the seminal work on SLAM and BLAST, software model checking with counterexample-guided abstraction refinement (CEGAR) has been an active topic of research. The crucial procedure here is to analyze a sequence of program statements (the counterexample) to find building blocks for the overall proof of the program. We can distinguish two approaches (which we name Craig and Newton) to implement the procedure. The historically first approach, Newton (named after the tool from the SLAM toolkit), is based on symbolic execution. The second approach, Craig, is based on Craig interpolation. It was widely believed that Craig is substantially more effective than Newton. In fact, 12 out of the 15 CEGAR-based tools in SV-COMP are based on Craig. Advances in software model checkers based on Craig, however, can go only lockstep with advances in SMT solvers with Craig interpolation. It may be time to revisit Newton and ask whether Newton can be as effective as Craig. We have implemented a total of 11 variants of Craig and Newton in two different state-of-the-art software model checking tools and present the outcome of our experimental comparison. @InProceedings{ESEC/FSE17p487, author = {Daniel Dietsch and Matthias Heizmann and Betim Musa and Alexander Nutz and Andreas Podelski}, title = {Craig vs. Newton in Software Model Checking}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {487--497}, doi = {}, year = {2017}, } |
|
Dig, Danny |
ESEC/FSE '17: "Trade-Offs in Continuous Integration: ..."
Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility
Michael Hilton, Nicholas Nelson, Timothy Tunnell, Darko Marinov, and Danny Dig (Oregon State University, USA; University of Illinois at Urbana-Champaign, USA) Continuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI being a widely used activity in software engineering, we do not know what motivates developers to use CI, and what barriers and unmet needs they face. Without such knowledge, developers make easily avoidable errors, tool builders invest in the wrong direction, and researchers miss opportunities for improving the practice of CI. We present a qualitative study of the barriers and needs developers face when using CI. We conduct semi-structured interviews with developers from different industries and development scales. We triangulate our findings by running two surveys. We find that developers face trade-offs between speed and certainty (Assurance), between better access and information security (Security), and between more configuration options and greater ease of use (Flexibility). We present implications of these trade-offs for developers, tool builders, and researchers. @InProceedings{ESEC/FSE17p197, author = {Michael Hilton and Nicholas Nelson and Timothy Tunnell and Darko Marinov and Danny Dig}, title = {Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {197--207}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Dillig, Isil |
ESEC/FSE '17: "Failure-Directed Program Trimming ..."
Failure-Directed Program Trimming
Kostas Ferles, Valentin Wüstholz, Maria Christakis, and Isil Dillig (University of Texas at Austin, USA; University of Kent, UK) This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P′ such that P and P′ are equi-safe (i.e., P′ has a bug if and only if P has a bug), but P′ has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques. @InProceedings{ESEC/FSE17p174, author = {Kostas Ferles and Valentin Wüstholz and Maria Christakis and Isil Dillig}, title = {Failure-Directed Program Trimming}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {174--185}, doi = {}, year = {2017}, } |
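The equi-safety idea behind program trimming can be illustrated abstractly. This is a hypothetical sketch, not the Trimmer tool: here a "program" is reduced to a set of execution paths, each abstracted to an interval guard on a single input plus the interval on which its assertion holds. A cheap sufficient check discharges paths that can never fail; removing them preserves the existence of a bug, since a failing path remains exactly when one existed originally. All names below are invented for the sketch.

```python
# Hypothetical sketch of the program-trimming idea (not the Trimmer tool).
# A path is a tuple (guard_lo, guard_hi, assert_lo, assert_hi): the path is
# feasible for inputs x in [guard_lo, guard_hi], and its assertion holds
# for x in [assert_lo, assert_hi].

def path_provably_safe(guard_lo, guard_hi, assert_lo, assert_hi):
    # Sufficient condition: the guard interval lies entirely inside the
    # assertion interval, so no input on this path can violate the assertion.
    return assert_lo <= guard_lo and guard_hi <= assert_hi

def trim(paths):
    # Keep only paths the cheap analysis cannot prove safe. The trimmed
    # "program" has a failing path iff the original did (equi-safety).
    return [p for p in paths if not path_provably_safe(*p)]
```

A downstream analyzer would then only explore the surviving paths, which is where the scalability benefit comes from.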
|
Dingel, Juergen |
ESEC/FSE '17: "Model-Level, Platform-Independent ..."
Model-Level, Platform-Independent Debugging in the Context of the Model-Driven Development of Real-Time Systems
Mojtaba Bagherzadeh, Nicolas Hili, and Juergen Dingel (Queen's University, Canada) Providing proper support for debugging models at model-level is one of the main barriers to a broader adoption of Model Driven Development (MDD). In this paper, we focus on the use of MDD for the development of real-time embedded systems (RTE). We introduce a new platform-independent approach to implement model-level debuggers. We describe how to realize support for model-level debugging entirely in terms of the modeling language and show how to implement this support in terms of a model-to-model transformation. Key advantages of the approach over existing work are that (1) it does not require a program debugger for the code generated from the model, and that (2) any changes to, e.g., the code generator, the target language, or the hardware platform leave the debugger completely unaffected. We also describe an implementation of the approach in the context of Papyrus-RT, an open source MDD tool based on the modeling language UML-RT. We summarize the results of the use of our model-based debugger on several use cases to determine its overhead in terms of size and performance. Despite being a prototype, the performance overhead is in the order of microseconds, while the size overhead is comparable with that of GDB, the GNU Debugger. @InProceedings{ESEC/FSE17p419, author = {Mojtaba Bagherzadeh and Nicolas Hili and Juergen Dingel}, title = {Model-Level, Platform-Independent Debugging in the Context of the Model-Driven Development of Real-Time Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {419--430}, doi = {}, year = {2017}, } Video Info Artifacts Functional |
|
Di Penta, Massimiliano |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped in 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identified a set of 38 mutation operators, and implemented an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info ESEC/FSE '17: "Detecting Missing Information ..." 
Detecting Missing Information in Bug Descriptions Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus, Gabriele Bavota, and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve bug descriptions quality by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
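The discourse-pattern detection described in the Chaparro et al. entry above can be approximated with a handful of regular expressions. This is a toy sketch only: the paper identifies 154 discourse patterns and uses a far richer approach; the pattern lists and function names below are invented for illustration.

```python
import re

# Hypothetical sketch: flag whether a bug description contains wording
# typical of Expected Behavior (EB) or Steps to Reproduce (S2R).
EB_PATTERNS = [r"\bshould\b", r"\bexpected\b", r"\binstead of\b"]
S2R_PATTERNS = [r"\bsteps to reproduce\b", r"^\s*\d+[\.\)]", r"\bclick(ed)? on\b"]

def mentions(text, patterns):
    # Case-insensitive search; MULTILINE lets "^" match numbered-step lines.
    return any(re.search(p, text, re.IGNORECASE | re.MULTILINE) for p in patterns)

def missing_sections(report):
    # Report which of EB / S2R appears to be absent from the description.
    missing = []
    if not mentions(report, EB_PATTERNS):
        missing.append("EB")
    if not mentions(report, S2R_PATTERNS):
        missing.append("S2R")
    return missing
```

In the paper's setting such a detector runs at reporting time, so the reporter can be alerted before submitting an incomplete description.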
|
Donaldson, Alastair F. |
ESEC/FSE '17: "Cooperative Kernels: GPU Multitasking ..."
Cooperative Kernels: GPU Multitasking for Blocking Algorithms
Tyler Sorensen, Hugues Evrard, and Alastair F. Donaldson (Imperial College London, UK) There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking. Our prototype exploits no vendor-specific hardware, driver or compiler support, thus our results provide a lower-bound on the efficiency with which cooperative kernels can be implemented in practice. @InProceedings{ESEC/FSE17p431, author = {Tyler Sorensen and Hugues Evrard and Alastair F. Donaldson}, title = {Cooperative Kernels: GPU Multitasking for Blocking Algorithms}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {431--441}, doi = {}, year = {2017}, } Best-Paper Award |
|
Dotzler, Georg |
ESEC/FSE '17: "More Accurate Recommendations ..."
More Accurate Recommendations for Method-Level Changes
Georg Dotzler, Marius Kamp, Patrick Kreutzer, and Michael Philippsen (Friedrich-Alexander University Erlangen-Nürnberg, Germany) During the life span of large software projects, developers often apply the same code changes to different code locations in slight variations. Since the application of these changes to all locations is time-consuming and error-prone, tools exist that learn change patterns from input examples, search for possible pattern applications, and generate corresponding recommendations. In many cases, the generated recommendations are syntactically or semantically wrong due to code movements in the input examples. Thus, they are of low accuracy and developers cannot directly copy them into their projects without adjustments. We present the Accurate REcommendation System (ARES) that achieves a higher accuracy than other tools because its algorithms take care of code movements when creating patterns and recommendations. On average, the recommendations by ARES have an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives. At the same time ARES achieves precision and recall values that are on par with other tools. @InProceedings{ESEC/FSE17p798, author = {Georg Dotzler and Marius Kamp and Patrick Kreutzer and Michael Philippsen}, title = {More Accurate Recommendations for Method-Level Changes}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {798--808}, doi = {}, year = {2017}, } Info |
|
Dougherty, Daniel J. |
ESEC/FSE '17: "The Power of "Why" ..."
The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance
Tim Nelson, Natasha Danas, Daniel J. Dougherty, and Shriram Krishnamurthi (Brown University, USA; Worcester Polytechnic Institute, USA) Scenario-finding tools like the Alloy Analyzer are widely used in numerous concrete domains like security, network analysis, UML analysis, and so on. They can help to verify properties and, more generally, aid in exploring a system's behavior. While scenario finders are valuable for their ability to produce concrete examples, individual scenarios only give insight into what is possible, leaving the user to make their own conclusions about what might be necessary. This paper enriches scenario finding by allowing users to ask ``why?'' and ``why not?'' questions about the examples they are given. We show how to distinguish parts of an example that cannot be consistently removed (or changed) from those that merely reflect underconstraint in the specification. In the former case we show how to determine which elements of the specification and which other components of the example together explain the presence of such facts. This paper formalizes the act of computing provenance in scenario-finding. We present Amalgam, an extension of the popular Alloy scenario-finder, which implements these foundations and provides interactive exploration of examples. We also evaluate Amalgam's algorithmics on a variety of both textbook and real-world examples. @InProceedings{ESEC/FSE17p106, author = {Tim Nelson and Natasha Danas and Daniel J. Dougherty and Shriram Krishnamurthi}, title = {The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {106--116}, doi = {}, year = {2017}, } Info Artifacts Reusable Best-Paper Award |
|
Eden, Anthony |
ESEC/FSE '17: "CodeCarbonCopy ..."
CodeCarbonCopy
Stelios Sidiroglou-Douskos, Eric Lahtinen, Anthony Eden, Fan Long, and Martin Rinard (Massachusetts Institute of Technology, USA) We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obtain a translation between the data representation and name space of the recipient and the data representation and name space of the donor. It also implements a static analysis that identifies and removes irrelevant functionality useful in the donor but not in the recipient. We evaluate CCC on eight transfers between six applications. Our results show that CCC can successfully transfer donor functionality into recipient applications. @InProceedings{ESEC/FSE17p95, author = {Stelios Sidiroglou-Douskos and Eric Lahtinen and Anthony Eden and Fan Long and Martin Rinard}, title = {CodeCarbonCopy}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--105}, doi = {}, year = {2017}, } |
|
Eichberg, Michael |
ESEC/FSE '17: "CodeMatch: Obfuscation Won't ..."
CodeMatch: Obfuscation Won't Conceal Your Repackaged App
Leonid Glanz, Sven Amann, Michael Eichberg, Michael Reif, Ben Hermann, Johannes Lerch, and Mira Mezini (TU Darmstadt, Germany) An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps, where the obfuscated code bases are very similar, do not have to be repackages of each other. To reliably detect repackaged apps, we propose a two step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app’s library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used. This makes it possible to identify repackaged apps. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages. @InProceedings{ESEC/FSE17p638, author = {Leonid Glanz and Sven Amann and Michael Eichberg and Michael Reif and Ben Hermann and Johannes Lerch and Mira Mezini}, title = {CodeMatch: Obfuscation Won't Conceal Your Repackaged App}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {638--648}, doi = {}, year = {2017}, } Info |
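The final comparison step in the CodeMatch entry above can be approximated very roughly. This is a hypothetical stand-in for fuzzy hashing, not the LibDetect/CodeMatch pipeline: assuming library code has already been stripped and identifiers abstracted, two apps' remaining token streams are compared by Jaccard similarity over token n-grams. The function names are invented for the sketch.

```python
# Hypothetical sketch: n-gram set similarity as a crude proxy for the
# fuzzy-hash comparison of abstracted app code.
def ngrams(tokens, n=3):
    # The set of all contiguous token n-grams in the stream.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(app_a, app_b, n=3):
    # Jaccard similarity: |intersection| / |union| of the two n-gram sets.
    a, b = ngrams(app_a, n), ngrams(app_b, n)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

A high similarity score between two apps whose library code has been removed would then flag a likely repackaging for manual review.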
|
Evrard, Hugues |
ESEC/FSE '17: "Cooperative Kernels: GPU Multitasking ..."
Cooperative Kernels: GPU Multitasking for Blocking Algorithms
Tyler Sorensen, Hugues Evrard, and Alastair F. Donaldson (Imperial College London, UK) There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking. Our prototype exploits no vendor-specific hardware, driver or compiler support, thus our results provide a lower-bound on the efficiency with which cooperative kernels can be implemented in practice. @InProceedings{ESEC/FSE17p431, author = {Tyler Sorensen and Hugues Evrard and Alastair F. Donaldson}, title = {Cooperative Kernels: GPU Multitasking for Blocking Algorithms}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {431--441}, doi = {}, year = {2017}, } Best-Paper Award |
|
Ferles, Kostas |
ESEC/FSE '17: "Failure-Directed Program Trimming ..."
Failure-Directed Program Trimming
Kostas Ferles, Valentin Wüstholz, Maria Christakis, and Isil Dillig (University of Texas at Austin, USA; University of Kent, UK) This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P′ such that P and P′ are equi-safe (i.e., P′ has a bug if and only if P has a bug), but P′ has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques. @InProceedings{ESEC/FSE17p174, author = {Kostas Ferles and Valentin Wüstholz and Maria Christakis and Isil Dillig}, title = {Failure-Directed Program Trimming}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {174--185}, doi = {}, year = {2017}, } |
|
Ferns, Gabriel |
ESEC/FSE '17: "Discovering Relational Specifications ..."
Discovering Relational Specifications
Calvin Smith, Gabriel Ferns, and Aws Albarghouthi (University of Wisconsin-Madison, USA) Formal specifications of library functions play a critical role in a number of program analysis and development tasks. We present Bach, a technique for discovering likely relational specifications from data describing input–output behavior of a set of functions comprising a library or a program. Relational specifications correlate different executions of different functions; for instance, commutativity, transitivity, equivalence of two functions, etc. Bach combines novel insights from program synthesis and databases to discover a rich array of specifications. We apply Bach to learn specifications from data generated for a number of standard libraries. Our experimental evaluation demonstrates Bach’s ability to learn useful and deep specifications in a small amount of time. @InProceedings{ESEC/FSE17p616, author = {Calvin Smith and Gabriel Ferns and Aws Albarghouthi}, title = {Discovering Relational Specifications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {616--626}, doi = {}, year = {2017}, } Best-Paper Award |
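The kind of relational specification the Smith et al. entry above describes can be illustrated with the simplest case, commutativity. This is a hypothetical sketch, not the Bach tool: from observed input-output pairs of a binary function, conjecture commutativity unless some observation refutes it. The function names are invented for the sketch.

```python
from itertools import product

# Hypothetical sketch: data-driven discovery of a relational specification.
def likely_commutative(observations):
    """observations: dict mapping (x, y) -> f(x, y).
    Conjecture commutativity if no observed pair refutes f(x, y) == f(y, x)."""
    for (x, y), out in observations.items():
        flipped = observations.get((y, x))
        if flipped is not None and flipped != out:
            return False  # refuted by the data
    return True  # consistent with every observation (a conjecture, not a proof)

def observe(f, domain):
    # Collect input-output behavior of f over all pairs from the domain.
    return {(x, y): f(x, y) for x, y in product(domain, repeat=2)}
```

A conjecture that survives the data is only *likely* to hold, which is why such specifications are usually handed to a verifier or a human for confirmation.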
|
Filieri, Antonio |
ESEC/FSE '17: "Automated Control of Multiple ..."
Automated Control of Multiple Software Goals using Multiple Actuators
Martina Maggio, Alessandro Vittorio Papadopoulos, Antonio Filieri, and Henry Hoffmann (Lund University, Sweden; Mälardalen University, Sweden; Imperial College London, UK; University of Chicago, USA) Modern software should satisfy multiple goals simultaneously: it should provide predictable performance, be robust to failures, handle peak loads and deal seamlessly with unexpected conditions and changes in the execution environment. For this to happen, software designs should account for the possibility of runtime changes and provide formal guarantees of the software's behavior. Control theory is one of the possible design drivers for runtime adaptation, but adopting control theoretic principles often requires additional, specialized knowledge. To overcome this limitation, automated methodologies have been proposed to extract the necessary information from experimental data and design a control system for runtime adaptation. These proposals, however, only process one goal at a time, creating a chain of controllers. In this paper, we propose and evaluate the first automated strategy that takes into account multiple goals without separating them into multiple control strategies. Avoiding the separation allows us to tackle a larger class of problems and provide stronger guarantees. We test our methodology's generality with three case studies that demonstrate its broad applicability in meeting performance, reliability, quality, security, and energy goals despite environmental or requirements changes. @InProceedings{ESEC/FSE17p373, author = {Martina Maggio and Alessandro Vittorio Papadopoulos and Antonio Filieri and Henry Hoffmann}, title = {Automated Control of Multiple Software Goals using Multiple Actuators}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {373--384}, doi = {}, year = {2017}, } Info |
|
Fonseca, Baldoino |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to introduce long-living smells more often than they eliminate existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases.
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Fu, Wei |
ESEC/FSE '17: "Revisiting Unsupervised Learning ..."
Revisiting Unsupervised Learning for Defect Prediction
Wei Fu and Tim Menzies (North Carolina State University, USA) Collecting quality data from software projects can be time-consuming and expensive. Hence, some researchers explore “unsupervised” approaches to quality prediction that do not require labelled data. An alternate technique is to use “supervised” approaches that learn models from project data labelled with, say, “defective” or “not-defective”. Most researchers use these supervised models since, it is argued, they can exploit more knowledge of the projects. At FSE’16, Yang et al. reported startling results where unsupervised defect predictors outperformed supervised predictors for effort-aware just-in-time defect prediction. If confirmed, these results would lead to a dramatic simplification of a seemingly complex task (data mining) that is widely explored in the software engineering literature. This paper repeats and refutes those results as follows. (1) There is much variability in the efficacy of the Yang et al. predictors, so even with their approach, some supervised data is required to prune weaker predictors away. (2) Their findings were grouped across N projects. When we repeat their analysis on a project-by-project basis, supervised predictors are seen to work better. Even though this paper rejects the specific conclusions of Yang et al., we still endorse their general goal. In our experiments, supervised predictors did not perform outstandingly better than unsupervised ones for effort-aware just-in-time defect prediction. Hence, there may indeed be some combination of unsupervised learners that achieves performance comparable to supervised ones. We therefore encourage others to work in this promising area. @InProceedings{ESEC/FSE17p72, author = {Wei Fu and Tim Menzies}, title = {Revisiting Unsupervised Learning for Defect Prediction}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {72--83}, doi = {}, year = {2017}, } ESEC/FSE '17: "Easy over Hard: A Case Study ..." 
Easy over Hard: A Case Study on Deep Learning Wei Fu and Tim Menzies (North Carolina State University, USA) While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost. This is particularly important for deep learning since these learners need hours (to weeks) to train the model. Such long training time limits the ability of (a) a researcher to test the stability of their conclusion via repeated runs with different random seeds; and (b) other researchers to repeat, improve, or even refute that original work. For example, recently, deep learning was used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That deep learning system took 14 hours to execute. We show here that applying a very simple optimizer called DE to fine-tune an SVM can achieve similar (and sometimes better) results. The DE approach terminated in 10 minutes, i.e., 84 times faster than the deep learning method. We offer these results as a cautionary tale to the software analytics community and suggest that not every new innovation should be applied without critical analysis. If researchers deploy some new and expensive process, that work should be baselined against some simpler and faster alternatives. @InProceedings{ESEC/FSE17p49, author = {Wei Fu and Tim Menzies}, title = {Easy over Hard: A Case Study on Deep Learning}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {49--60}, doi = {}, year = {2017}, } |
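For readers unfamiliar with DE (differential evolution), the following is a minimal, self-contained sketch of the classic DE/rand/1/bin scheme applied to a toy tuning objective. The paper's actual setup (tuning an SVM on Stack Overflow data) is not reproduced here; the function names, bounds, and constants are illustrative assumptions:

```python
import random

def differential_evolution(loss, bounds, pop_size=10, f=0.8, cr=0.9, gens=60, seed=0):
    """Minimal DE/rand/1/bin optimizer over box-constrained parameters.

    Stands in for tuning learner hyperparameters (e.g., an SVM's C and gamma).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [loss(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][d])
            s = loss(trial)
            if s <= scores[i]:  # greedy replacement keeps the better candidate
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Toy "validation loss" whose optimum sits at (C, gamma) = (1.0, 0.1).
loss = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 0.1) ** 2
params, score = differential_evolution(loss, bounds=[(0.01, 10.0), (0.001, 1.0)])
print(params, score)  # converges near [1.0, 0.1]
```

In a real tuning run, `loss` would train and evaluate the learner on held-out data, which is where DE's small evaluation budget pays off.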
|
Galhotra, Sainyam |
ESEC/FSE '17: "Fairness Testing: Testing ..."
Fairness Testing: Testing Software for Discrimination
Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou (University of Massachusetts at Amherst, USA) This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination. @InProceedings{ESEC/FSE17p498, author = {Sainyam Galhotra and Yuriy Brun and Alexandra Meliou}, title = {Fairness Testing: Testing Software for Discrimination}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {498--510}, doi = {}, year = {2017}, } Info Best-Paper Award |
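The black-box causal test at the heart of this style of fairness testing — perturb only the sensitive attribute and watch whether the decision flips — can be sketched as follows. This is a simplified illustration, not the Themis tool itself; the schema, decision functions, and `causal_discrimination_rate` helper are hypothetical:

```python
import random

def causal_discrimination_rate(decide, schema, sensitive, trials=2000, seed=1):
    """Estimate how often flipping only the sensitive attribute changes the decision.

    `schema` maps attribute names to lists of valid values; `decide` is the
    system under test, treated as a black box. No oracle is needed: the
    system's own output on the unperturbed input is the reference.
    """
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        inp = {k: rng.choice(v) for k, v in schema.items()}
        alt = dict(inp)
        alt[sensitive] = rng.choice([v for v in schema[sensitive] if v != inp[sensitive]])
        if decide(inp) != decide(alt):
            flips += 1
    return flips / trials

schema = {"gender": ["f", "m"], "income": list(range(0, 100, 10))}

fair = lambda p: p["income"] >= 50                                   # ignores gender
biased = lambda p: p["income"] >= (50 if p["gender"] == "m" else 60)  # higher bar for "f"

print(causal_discrimination_rate(fair, schema, "gender"))    # 0.0
print(causal_discrimination_rate(biased, schema, "gender"))  # > 0
```

Themis's contribution beyond this naive loop includes pruning and caching optimizations that make such measurement efficient on real systems.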
|
Garbervetsky, Diego |
ESEC/FSE '17: "Toward Full Elasticity in ..."
Toward Full Elasticity in Distributed Static Analysis: The Case of Callgraph Analysis
Diego Garbervetsky, Edgardo Zoppi, and Benjamin Livshits (University of Buenos Aires, Argentina; Imperial College London, UK) In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository such as GitHub, BitBucket, or Visual Studio Online will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of both synthetic and real benchmarks. To show scalability, we demonstrate how the analysis presented in this paper is able to handle inputs that are almost 10 million lines of code (LOC) in size, without running out of memory. Our results show that the analysis scales well in terms of memory pressure independently of the input size, as we add more virtual machines (VMs). As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms. @InProceedings{ESEC/FSE17p442, author = {Diego Garbervetsky and Edgardo Zoppi and Benjamin Livshits}, title = {Toward Full Elasticity in Distributed Static Analysis: The Case of Callgraph Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {442--453}, doi = {}, year = {2017}, } |
|
Garcia, Alessandro |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to introduce long-living smells more often than they eliminate existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases.
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Garcia, Joshua |
ESEC/FSE '17: "Automatic Generation of Inter-Component ..."
Automatic Generation of Inter-Component Communication Exploits for Android Applications
Joshua Garcia, Mahmoud Hammad, Negar Ghorbani, and Sam Malek (University of California at Irvine, USA) Although a wide variety of approaches identify vulnerabilities in Android apps, none attempt to determine exploitability of those vulnerabilities. Exploitability can aid in reducing false positives of vulnerability analysis, and can help engineers triage bugs. Specifically, one of the main attack vectors of Android apps is their inter-component communication interface, where apps may receive messages called Intents. In this paper, we provide the first approach for automatically generating exploits for Android apps, called LetterBomb, relying on a combined path-sensitive symbolic execution-based static analysis, and the use of software instrumentation and test oracles. We run LetterBomb on 10,000 Android apps from Google Play, where we identify 181 exploits from 835 vulnerable apps. Compared to a state-of-the-art detection approach for three ICC-based vulnerabilities, LetterBomb obtains 33%-60% more vulnerabilities while running 6.66 to 7 times faster. @InProceedings{ESEC/FSE17p661, author = {Joshua Garcia and Mahmoud Hammad and Negar Ghorbani and Sam Malek}, title = {Automatic Generation of Inter-Component Communication Exploits for Android Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {661--671}, doi = {}, year = {2017}, } Info |
|
Gascon-Samson, Julien |
ESEC/FSE '17: "ARTINALI: Dynamic Invariant ..."
ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security
Maryam Raiyat Aliabadi, Amita Ajith Kamath, Julien Gascon-Samson, and Karthik Pattabiraman (University of British Columbia, Canada; National Institute of Technology Karnataka, India) Cyber-Physical Systems (CPSes) are being widely deployed in security critical scenarios such as smart homes and medical devices. Unfortunately, the connectedness of these systems and their relative lack of security measures makes them ripe targets for attacks. Specification-based Intrusion Detection Systems (IDSes) have been shown to be effective for securing CPSes. Unfortunately, deriving invariants for capturing the specifications of CPSes is a tedious and error-prone process. Therefore, it is important to dynamically monitor a CPS to learn its common behaviors and formulate invariants for detecting security attacks. Existing techniques for invariant mining only incorporate data and events, but not time. However, time is central to most CPSes, and hence incorporating time, in addition to data and events, is essential for achieving low false positives and false negatives. This paper proposes ARTINALI, which mines dynamic system properties by incorporating time as a first-class property of the system. We build ARTINALI-based Intrusion Detection Systems (IDSes) for two CPSes, namely smart meters and smart medical devices, and measure their efficacy. We find that the ARTINALI-based IDSes significantly reduce the ratio of false positives and false negatives by 16 to 48% (average 30.75%) and 89 to 95% (average 93.4%) respectively over other dynamic invariant detection tools. @InProceedings{ESEC/FSE17p349, author = {Maryam Raiyat Aliabadi and Amita Ajith Kamath and Julien Gascon-Samson and Karthik Pattabiraman}, title = {ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {349--361}, doi = {}, year = {2017}, } |
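A drastically simplified illustration of why time matters for such invariants: learn the largest benign inter-event gap per event pair from training traces, then flag slower transitions at detection time. This toy models only the time dimension, not ARTINALI's full data-event-time mining, and all trace contents here are made up:

```python
def mine_latency_invariants(traces):
    """Learn, per (event, next_event) pair, the largest inter-event gap seen
    in benign traces. Each trace is a list of (timestamp, event) tuples.
    """
    bounds = {}
    for trace in traces:
        for (t0, e0), (t1, e1) in zip(trace, trace[1:]):
            key = (e0, e1)
            bounds[key] = max(bounds.get(key, 0.0), t1 - t0)
    return bounds

def violations(trace, bounds, slack=1.0):
    """Flag transitions slower than anything observed during training,
    and transitions never observed at all."""
    out = []
    for (t0, e0), (t1, e1) in zip(trace, trace[1:]):
        limit = bounds.get((e0, e1))
        if limit is None or t1 - t0 > limit * slack:
            out.append((e0, e1, t1 - t0))
    return out

benign = [[(0.0, "read"), (0.1, "compute"), (0.3, "transmit")],
          [(0.0, "read"), (0.2, "compute"), (0.5, "transmit")]]
bounds = mine_latency_invariants(benign)

attack = [(0.0, "read"), (2.0, "compute"), (2.1, "transmit")]  # stalled sensor read
print(violations(attack, bounds))  # [('read', 'compute', 2.0)]
```

A purely data/event-based miner would accept the attack trace above, since the event sequence itself is unchanged; only the timing invariant exposes it.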
|
Gazzillo, Paul |
ESEC/FSE '17: "Kmax: Finding All Configurations ..."
Kmax: Finding All Configurations of Kbuild Makefiles Statically
Paul Gazzillo (Yale University, USA) Feature-oriented software design is a useful paradigm for building and reasoning about highly-configurable software. By making variability explicit, feature-oriented tools and languages make program analysis tasks easier, such as bug-finding, maintenance, and more. But critical software, such as Linux, coreboot, and BusyBox, relies instead on brittle tools, such as Makefiles, to encode variability, impeding variability-aware tool development. Summarizing Makefile behavior for all configurations is difficult, because Makefiles have unusual semantics, and exhaustive enumeration of all configurations is intractable in practice. Existing approaches use ad-hoc heuristics, missing much of the encoded variability in Makefiles. We present Kmax, a new static analysis algorithm and tool for Kbuild Makefiles. It is a family-based variability analysis algorithm, where paths are Boolean expressions of configuration options, called reaching configurations, and its abstract state enumerates string values for all configurations. Kmax localizes configuration explosion to the statement level, making precise analysis tractable. The implementation analyzes Makefiles from the Kbuild build system used by several low-level systems projects. Evaluation of Kmax on the Linux and BusyBox build systems shows it to be accurate, precise, and fast. It is the first tool to collect all source files and their configurations from Linux. Compared to previous approaches, Kmax is far more accurate and precise, performs with little overhead, and scales better. @InProceedings{ESEC/FSE17p279, author = {Paul Gazzillo}, title = {Kmax: Finding All Configurations of Kbuild Makefiles Statically}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {279--290}, doi = {}, year = {2017}, } Info |
|
Gheyi, Rohit |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to introduce long-living smells more often than they eliminate existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases.
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Ghezzi, Carlo |
ESEC/FSE '17: "Modeling and Verification ..."
Modeling and Verification of Evolving Cyber-Physical Spaces
Christos Tsigkanos, Timo Kehrer, and Carlo Ghezzi (Politecnico di Milano, Italy) We increasingly live in cyber-physical spaces -- spaces that are both physical and digital, and where the two aspects are intertwined. Such spaces are highly dynamic and typically undergo continuous change. Software engineering can have a profound impact in this domain, by defining suitable modeling and specification notations as well as supporting design-time formal verification. In this paper, we present a methodology and a technical framework which support modeling of evolving cyber-physical spaces and reasoning about their spatio-temporal properties. We utilize a discrete, graph-based formalism for modeling cyber-physical spaces as well as primitives of change, giving rise to a reactive system consisting of rewriting rules with both local and global application conditions. Formal reasoning facilities are implemented by adopting logic-based specifications of properties and corresponding model checking procedures, in both spatial and temporal fragments. We evaluate our approach using a case study of a disaster scenario in a smart city. @InProceedings{ESEC/FSE17p38, author = {Christos Tsigkanos and Timo Kehrer and Carlo Ghezzi}, title = {Modeling and Verification of Evolving Cyber-Physical Spaces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {38--48}, doi = {}, year = {2017}, } |
|
Ghorbani, Negar |
ESEC/FSE '17: "Automatic Generation of Inter-Component ..."
Automatic Generation of Inter-Component Communication Exploits for Android Applications
Joshua Garcia, Mahmoud Hammad, Negar Ghorbani, and Sam Malek (University of California at Irvine, USA) Although a wide variety of approaches identify vulnerabilities in Android apps, none attempt to determine exploitability of those vulnerabilities. Exploitability can aid in reducing false positives of vulnerability analysis, and can help engineers triage bugs. Specifically, one of the main attack vectors of Android apps is their inter-component communication interface, where apps may receive messages called Intents. In this paper, we provide the first approach for automatically generating exploits for Android apps, called LetterBomb, relying on a combined path-sensitive symbolic execution-based static analysis, and the use of software instrumentation and test oracles. We run LetterBomb on 10,000 Android apps from Google Play, where we identify 181 exploits from 835 vulnerable apps. Compared to a state-of-the-art detection approach for three ICC-based vulnerabilities, LetterBomb obtains 33%-60% more vulnerabilities while running 6.66 to 7 times faster. @InProceedings{ESEC/FSE17p661, author = {Joshua Garcia and Mahmoud Hammad and Negar Ghorbani and Sam Malek}, title = {Automatic Generation of Inter-Component Communication Exploits for Android Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {661--671}, doi = {}, year = {2017}, } Info |
|
Glanz, Leonid |
ESEC/FSE '17: "CodeMatch: Obfuscation Won't ..."
CodeMatch: Obfuscation Won't Conceal Your Repackaged App
Leonid Glanz, Sven Amann, Michael Eichberg, Michael Reif, Ben Hermann, Johannes Lerch, and Mira Mezini (TU Darmstadt, Germany) An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps whose obfuscated code bases are very similar do not have to be repackages of each other. To reliably detect repackaged apps, we propose a two-step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app’s library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages. @InProceedings{ESEC/FSE17p638, author = {Leonid Glanz and Sven Amann and Michael Eichberg and Michael Reif and Ben Hermann and Johannes Lerch and Mira Mezini}, title = {CodeMatch: Obfuscation Won't Conceal Your Repackaged App}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {638--648}, doi = {}, year = {2017}, } Info |
|
Gligoric, Milos |
ESEC/FSE '17: "Regression Test Selection ..."
Regression Test Selection Across JVM Boundaries
Ahmet Celik, Marko Vasic, Aleksandar Milicevic, and Milos Gligoric (University of Texas at Austin, USA; Microsoft, USA) Modern software development processes recommend that changes be integrated into the main development line of a project multiple times a day. Before a new revision may be integrated, developers practice regression testing to ensure that the latest changes do not break any previously established functionality. The cost of regression testing is high, due to an increase in the number of revisions that are introduced per day, as well as the number of tests developers write per revision. Regression test selection (RTS) optimizes regression testing by skipping tests that are not affected by recent project changes. Existing dynamic RTS techniques support only projects written in a single programming language, which is unfortunate given that an open-source project is, on average, written in several programming languages. We present the first dynamic RTS technique that does not stop at predefined language boundaries. Our technique dynamically detects, at the operating system level, all file artifacts a test depends on. Our technique is, hence, oblivious to the specific means the test uses to actually access the files: be it through spawning a new process, invoking a system call, invoking a library written in a different language, invoking a library that spawns a process which makes a system call, etc. We also provide a set of extension points which allow for a smooth integration with testing frameworks and build systems. We implemented our technique in a tool called RTSLinux as a loadable Linux kernel module and evaluated it on 21 Java projects that escape the JVM by spawning new processes or invoking native code, totaling 2,050,791 lines of code. Our results show that RTSLinux, on average, skips 74.17% of tests and saves 52.83% of test execution time compared to executing all tests.
@InProceedings{ESEC/FSE17p809, author = {Ahmet Celik and Marko Vasic and Aleksandar Milicevic and Milos Gligoric}, title = {Regression Test Selection Across JVM Boundaries}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {809--820}, doi = {}, year = {2017}, } |
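The selection step itself — skip any test whose recorded file dependencies are unchanged — can be sketched in a few lines. The OS-level dependency capture that is the paper's actual contribution is assumed as given here; the file names and the `select_tests` helper are hypothetical:

```python
def select_tests(deps, checksums_old, checksums_new):
    """Pick the tests whose recorded file dependencies changed.

    `deps` maps each test to the set of files it touched on the last run
    (captured at the OS level in the paper, so it also covers files read by
    spawned processes and native code); the checksum dicts map file paths
    to content hashes before and after the change.
    """
    changed = {path for path in set(checksums_old) | set(checksums_new)
               if checksums_old.get(path) != checksums_new.get(path)}
    return sorted(t for t, files in deps.items() if files & changed)

deps = {
    "test_parser": {"src/parser.java", "grammar.g"},
    "test_cli":    {"src/cli.java", "scripts/run.sh"},  # crosses the JVM boundary
    "test_util":   {"src/util.java"},
}
old = {"src/parser.java": "a1", "grammar.g": "b1", "src/cli.java": "c1",
       "scripts/run.sh": "d1", "src/util.java": "e1"}
new = dict(old)
new["scripts/run.sh"] = "d2"  # only the shell script changed

print(select_tests(deps, old, new))  # ['test_cli']
```

Because the dependency sets are file-based rather than bytecode-based, a change to `scripts/run.sh` correctly re-selects `test_cli` even though no Java code changed, which is exactly the cross-language case JVM-level RTS tools miss.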
|
Gold, Nicolas E. |
ESEC/FSE '17: "Generalized Observational ..."
Generalized Observational Slicing for Tree-Represented Modelling Languages
Nicolas E. Gold, David Binkley, Mark Harman, Syed Islam, Jens Krinke, and Shin Yoo (University College London, UK; Loyola University Maryland, USA; University of East London, UK; KAIST, South Korea) Model-driven software engineering raises the abstraction level, making complex systems easier to understand than if written in textual code. Nevertheless, large complicated software systems can have large models, motivating the need for slicing techniques that reduce the size of a model. We present a generalization of observation-based slicing that allows the criterion to be defined using a variety of kinds of observable behavior and does not require any complex dependence analysis. We apply our implementation of generalized observational slicing for tree-structured representations to Simulink models. The resulting slice might be the subset of the original model responsible for an observed failure or simply the sub-model semantically related to a classic slicing criterion. Unlike its predecessors, the algorithm is also capable of slicing embedded Stateflow state machines. A study of nine real-world models drawn from four different application domains demonstrates the effectiveness of our approach at dramatically reducing Simulink model sizes for realistic observation scenarios: for 9 out of 20 cases, the resulting model has fewer than 25% of the original model's elements. @InProceedings{ESEC/FSE17p547, author = {Nicolas E. Gold and David Binkley and Mark Harman and Syed Islam and Jens Krinke and Shin Yoo}, title = {Generalized Observational Slicing for Tree-Represented Modelling Languages}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {547--558}, doi = {}, year = {2017}, } |
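The deletion-based core of observational slicing can be illustrated on a flat list of statements: try deleting each element, re-run, and keep the deletion whenever the observed value at the criterion is unchanged. This is a toy stand-in only; the paper slices tree-represented Simulink models, and this sketch is not the authors' algorithm:

```python
def observational_slice(stmts, criterion_var):
    """Deletion-based slicing on a list of Python statements.

    Keep a deletion whenever the program still runs and the observed value
    of `criterion_var` is unchanged; iterate to a fixed point.
    """
    def observe(program):
        env = {}
        try:
            exec("\n".join(program), {}, env)
        except Exception:
            return None  # deletion broke the program; reject it
        return env.get(criterion_var)

    reference = observe(stmts)
    sliced = list(stmts)
    changed = True
    while changed:
        changed = False
        # Walk backwards so earlier indices stay valid after a deletion.
        for i in range(len(sliced) - 1, -1, -1):
            candidate = sliced[:i] + sliced[i + 1:]
            if observe(candidate) == reference:
                sliced = candidate
                changed = True
    return sliced

program = [
    "a = 2",
    "b = 10",      # only feeds c
    "c = b * 3",   # not observed at the criterion
    "out = a + 5",
]
print(observational_slice(program, "out"))  # ['a = 2', 'out = a + 5']
```

Note that no dependence analysis is involved: the slicer only ever runs the program and compares observations, which is what lets the approach generalize across languages and model kinds.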
|
Gopstein, Dan |
ESEC/FSE '17: "Understanding Misunderstandings ..."
Understanding Misunderstandings in Source Code
Dan Gopstein, Jake Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, Martin K.-C. Yeh, and Justin Cappos (New York University, USA; Pennsylvania State University, USA; University of Colorado at Colorado Springs, USA) Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers we extracted a preliminary set of `atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback. @InProceedings{ESEC/FSE17p129, author = {Dan Gopstein and Jake Iannacone and Yu Yan and Lois DeLong and Yanyan Zhuang and Martin K.-C. Yeh and Justin Cappos}, title = {Understanding Misunderstandings in Source Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--139}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Goues, Claire Le |
ESEC/FSE '17: "S3: Syntax- and Semantic-Guided ..."
S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser (Singapore Management University, Singapore; IST Austria, Austria; Carnegie Mellon University, USA; Stellenbosch University, South Africa) A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with a difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) A systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) An efficient enumeration-based search strategy over the constrained search space, and (3) A number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3’s repair effectiveness with state-of-the-art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs. @InProceedings{ESEC/FSE17p593, author = {Xuan-Bach D. Le and Duc-Hiep Chu and David Lo and Claire Le Goues and Willem Visser}, title = {S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {593--604}, doi = {}, year = {2017}, } |
|
Guo, Shengjian |
ESEC/FSE '17: "Symbolic Execution of Programmable ..."
Symbolic Execution of Programmable Logic Controller Code
Shengjian Guo, Meng Wu, and Chao Wang (Virginia Tech, USA; University of Southern California, USA) Programmable logic controllers (PLCs) are specialized computers for automating a wide range of cyber-physical systems. Since these systems are often safety-critical, software running on PLCs needs to be free of programming errors. However, automated tools for testing PLC software are lacking despite the pervasive use of PLCs in industry. We propose a symbolic execution based method, named SymPLC, for automatically testing PLC software written in programming languages specified in the IEC 61131-3 standard. SymPLC takes the PLC source code as input and translates it into C before applying symbolic execution, to systematically generate test inputs that cover both paths in each periodic task and interleavings of these tasks. Toward this end, we propose a number of PLC-specific reduction techniques for identifying and eliminating redundant interleavings. We have evaluated SymPLC on a large set of benchmark programs with both single and multiple tasks. Our experiments show that SymPLC can handle these programs efficiently, and for multi-task PLC programs, our new reduction techniques outperform the state-of-the-art partial order reduction technique by more than two orders of magnitude. @InProceedings{ESEC/FSE17p326, author = {Shengjian Guo and Meng Wu and Chao Wang}, title = {Symbolic Execution of Programmable Logic Controller Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {326--336}, doi = {}, year = {2017}, } |
|
Guo, Yu |
ESEC/FSE '17: "AtexRace: Across Thread and ..."
AtexRace: Across Thread and Execution Sampling for In-House Race Detection
Yu Guo, Yan Cai, and Zijiang Yang (Western Michigan University, USA; Institute of Software at Chinese Academy of Sciences, China) Data race is a major source of concurrency bugs. Dynamic data race detection tools (e.g., FastTrack) monitor the executions of a program to report data races occurring at runtime. However, such tools incur significant overhead that slows down and perturbs executions. To address the issue, the state-of-the-art dynamic data race detection tools (e.g., LiteRace) apply sampling techniques to selectively monitor memory accesses. Although they reduce overhead, they also miss many data races as confirmed by existing studies. Thus, practitioners face a dilemma on whether to use FastTrack, which detects more data races but is much slower, or LiteRace, which is faster but detects fewer data races. In this paper, we propose a new sampling approach to address the major limitations of current sampling techniques, which ignore the facts that a data race involves two threads and a program under testing is repeatedly executed. We develop a tool called AtexRace to sample memory accesses across both threads and executions. By selectively monitoring the pairs of memory accesses that have not been frequently observed in current and previous executions, AtexRace detects as many data races as FastTrack at a cost as low as LiteRace. We have compared AtexRace against FastTrack and LiteRace on both the Parsec benchmark suite and a large-scale real-world MySQL Server with 223 test cases. The experiments confirm that AtexRace can be a replacement for FastTrack and LiteRace. @InProceedings{ESEC/FSE17p315, author = {Yu Guo and Yan Cai and Zijiang Yang}, title = {AtexRace: Across Thread and Execution Sampling for In-House Race Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {315--325}, doi = {}, year = {2017}, } |
|
Hammad, Mahmoud |
ESEC/FSE '17: "Automatic Generation of Inter-Component ..."
Automatic Generation of Inter-Component Communication Exploits for Android Applications
Joshua Garcia, Mahmoud Hammad, Negar Ghorbani, and Sam Malek (University of California at Irvine, USA) Although a wide variety of approaches identify vulnerabilities in Android apps, none attempt to determine exploitability of those vulnerabilities. Exploitability can aid in reducing false positives of vulnerability analysis, and can help engineers triage bugs. Specifically, one of the main attack vectors of Android apps is their inter-component communication interface, where apps may receive messages called Intents. In this paper, we provide the first approach for automatically generating exploits for Android apps, called LetterBomb, relying on a combined path-sensitive symbolic execution-based static analysis, and the use of software instrumentation and test oracles. We run LetterBomb on 10,000 Android apps from Google Play, where we identify 181 exploits from 835 vulnerable apps. Compared to a state-of-the-art detection approach for three ICC-based vulnerabilities, LetterBomb obtains 33%-60% more vulnerabilities at a 6.66 to 7 times faster speed. @InProceedings{ESEC/FSE17p661, author = {Joshua Garcia and Mahmoud Hammad and Negar Ghorbani and Sam Malek}, title = {Automatic Generation of Inter-Component Communication Exploits for Android Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {661--671}, doi = {}, year = {2017}, } Info |
|
Harman, Mark |
ESEC/FSE '17: "Generalized Observational ..."
Generalized Observational Slicing for Tree-Represented Modelling Languages
Nicolas E. Gold, David Binkley, Mark Harman, Syed Islam, Jens Krinke, and Shin Yoo (University College London, UK; Loyola University Maryland, USA; University of East London, UK; KAIST, South Korea) Model-driven software engineering raises the abstraction level, making complex systems easier to understand than if written in textual code. Nevertheless, large complicated software systems can have large models, motivating the need for slicing techniques that reduce the size of a model. We present a generalization of observation-based slicing that allows the criterion to be defined using a variety of kinds of observable behavior and does not require any complex dependence analysis. We apply our implementation of generalized observational slicing for tree-structured representations to Simulink models. The resulting slice might be the subset of the original model responsible for an observed failure or simply the sub-model semantically related to a classic slicing criterion. Unlike its predecessors, the algorithm is also capable of slicing embedded Stateflow state machines. A study of nine real-world models drawn from four different application domains demonstrates the effectiveness of our approach at dramatically reducing Simulink model sizes for realistic observation scenarios: for 9 out of 20 cases, the resulting model has fewer than 25% of the original model's elements. @InProceedings{ESEC/FSE17p547, author = {Nicolas E. Gold and David Binkley and Mark Harman and Syed Islam and Jens Krinke and Shin Yoo}, title = {Generalized Observational Slicing for Tree-Represented Modelling Languages}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {547--558}, doi = {}, year = {2017}, } |
|
Heizmann, Matthias |
ESEC/FSE '17: "Craig vs. Newton in Software ..."
Craig vs. Newton in Software Model Checking
Daniel Dietsch, Matthias Heizmann, Betim Musa, Alexander Nutz, and Andreas Podelski (University of Freiburg, Germany) Ever since the seminal work on SLAM and BLAST, software model checking with counterexample-guided abstraction refinement (CEGAR) has been an active topic of research. The crucial procedure here is to analyze a sequence of program statements (the counterexample) to find building blocks for the overall proof of the program. We can distinguish two approaches (which we name Craig and Newton) to implement the procedure. The historically first approach, Newton (named after the tool from the SLAM toolkit), is based on symbolic execution. The second approach, Craig, is based on Craig interpolation. It was widely believed that Craig is substantially more effective than Newton. In fact, 12 out of the 15 CEGAR-based tools in SV-COMP are based on Craig. Advances in software model checkers based on Craig, however, can only go in lockstep with advances in SMT solvers with Craig interpolation. It may be time to revisit Newton and ask whether Newton can be as effective as Craig. We have implemented a total of 11 variants of Craig and Newton in two different state-of-the-art software model checking tools and present the outcome of our experimental comparison. @InProceedings{ESEC/FSE17p487, author = {Daniel Dietsch and Matthias Heizmann and Betim Musa and Alexander Nutz and Andreas Podelski}, title = {Craig vs. Newton in Software Model Checking}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {487--497}, doi = {}, year = {2017}, } |
|
Hellendoorn, Vincent J. |
ESEC/FSE '17: "Are Deep Neural Networks the ..."
Are Deep Neural Networks the Best Choice for Modeling Source Code?
Vincent J. Hellendoorn and Premkumar Devanbu (University of California at Davis, USA) Current statistical language modeling techniques, including deep-learning based models, have proven to be quite effective for source code. We argue here that the special properties of source code can be exploited for further improvements. In this work, we enhance established language modeling approaches to handle the special challenges of modeling source code, such as: frequent changes, larger, changing vocabularies, deeply nested scopes, etc. We present a fast, nested language modeling toolkit specifically designed for software, with the ability to add & remove text, and mix & swap out many models. Specifically, we improve upon prior cache-modeling work and present a model with a much more expansive, multi-level notion of locality that we show to be well-suited for modeling software. We present results on varying corpora in comparison with traditional N-gram, as well as RNN, and LSTM deep-learning language models, and release all our source code for public use. Our evaluations suggest that carefully adapting N-gram models for source code can yield performance that surpasses even RNN and LSTM based deep-learning models. @InProceedings{ESEC/FSE17p763, author = {Vincent J. Hellendoorn and Premkumar Devanbu}, title = {Are Deep Neural Networks the Best Choice for Modeling Source Code?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {763--773}, doi = {}, year = {2017}, } Info |
|
Hermann, Ben |
ESEC/FSE '17: "CodeMatch: Obfuscation Won't ..."
CodeMatch: Obfuscation Won't Conceal Your Repackaged App
Leonid Glanz, Sven Amann, Michael Eichberg, Michael Reif, Ben Hermann, Johannes Lerch, and Mira Mezini (TU Darmstadt, Germany) An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps, where the obfuscated code bases are very similar, do not have to be repackages of each other. To reliably detect repackaged apps, we propose a two step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app’s library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used. This makes it possible to identify repackaged apps. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages. @InProceedings{ESEC/FSE17p638, author = {Leonid Glanz and Sven Amann and Michael Eichberg and Michael Reif and Ben Hermann and Johannes Lerch and Mira Mezini}, title = {CodeMatch: Obfuscation Won't Conceal Your Repackaged App}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {638--648}, doi = {}, year = {2017}, } Info |
|
Hicks, Michael |
ESEC/FSE '17: "Counterexample-Guided Approach ..."
Counterexample-Guided Approach to Finding Numerical Invariants
ThanhVu Nguyen, Timos Antonopoulos, Andrew Ruef, and Michael Hicks (University of Nebraska-Lincoln, USA; Yale University, USA; University of Maryland, USA) Numerical invariants, e.g., relationships among numerical variables in a program, represent a useful class of properties to analyze programs. General polynomial invariants represent more complex numerical relations, but they are often required in many scientific and engineering applications. We present NumInv, a tool that implements a counterexample-guided invariant generation (CEGIR) technique to automatically discover numerical invariants, which are polynomial equality and inequality relations among numerical variables. This CEGIR technique infers candidate invariants from program traces and then checks them against the program source code using the KLEE test-input generation tool. If the invariants are incorrect, KLEE returns counterexample traces, which help the dynamic inference obtain better results. Existing CEGIR approaches often require sound invariants; NumInv, however, sacrifices soundness and produces results that KLEE cannot refute within certain time bounds. This design and the use of KLEE as a verifier allow NumInv to discover useful and important numerical invariants for many challenging programs. Preliminary results show that NumInv generates required invariants for understanding and verifying correctness of programs involving complex arithmetic. We also show that NumInv discovers polynomial invariants that capture precise complexity bounds of programs used to benchmark existing static complexity analysis techniques. Finally, we show that NumInv performs competitively compared to state-of-the-art numerical invariant analysis tools.
@InProceedings{ESEC/FSE17p605, author = {ThanhVu Nguyen and Timos Antonopoulos and Andrew Ruef and Michael Hicks}, title = {Counterexample-Guided Approach to Finding Numerical Invariants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {605--615}, doi = {}, year = {2017}, } |
|
Hili, Nicolas |
ESEC/FSE '17: "Model-Level, Platform-Independent ..."
Model-Level, Platform-Independent Debugging in the Context of the Model-Driven Development of Real-Time Systems
Mojtaba Bagherzadeh, Nicolas Hili, and Juergen Dingel (Queen's University, Canada) Providing proper support for debugging models at model-level is one of the main barriers to a broader adoption of Model Driven Development (MDD). In this paper, we focus on the use of MDD for the development of real-time embedded systems (RTE). We introduce a new platform-independent approach to implement model-level debuggers. We describe how to realize support for model-level debugging entirely in terms of the modeling language and show how to implement this support in terms of a model-to-model transformation. Key advantages of the approach over existing work are that (1) it does not require a program debugger for the code generated from the model, and that (2) any changes to, e.g., the code generator, the target language, or the hardware platform leave the debugger completely unaffected. We also describe an implementation of the approach in the context of Papyrus-RT, an open source MDD tool based on the modeling language UML-RT. We summarize the results of the use of our model-based debugger on several use cases to determine its overhead in terms of size and performance. Despite being a prototype, the performance overhead is in the order of microseconds, while the size overhead is comparable with that of GDB, the GNU Debugger. @InProceedings{ESEC/FSE17p419, author = {Mojtaba Bagherzadeh and Nicolas Hili and Juergen Dingel}, title = {Model-Level, Platform-Independent Debugging in the Context of the Model-Driven Development of Real-Time Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {419--430}, doi = {}, year = {2017}, } Video Info Artifacts Functional |
|
Hilton, Michael |
ESEC/FSE '17: "Trade-Offs in Continuous Integration: ..."
Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility
Michael Hilton, Nicholas Nelson, Timothy Tunnell, Darko Marinov, and Danny Dig (Oregon State University, USA; University of Illinois at Urbana-Champaign, USA) Continuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI being a widely used activity in software engineering, we do not know what motivates developers to use CI, and what barriers and unmet needs they face. Without such knowledge, developers make easily avoidable errors, tool builders invest in the wrong direction, and researchers miss opportunities for improving the practice of CI. We present a qualitative study of the barriers and needs developers face when using CI. We conduct semi-structured interviews with developers from different industries and development scales. We triangulate our findings by running two surveys. We find that developers face trade-offs between speed and certainty (Assurance), between better access and information security (Security), and between more configuration options and greater ease of use (Flexibility). We present implications of these trade-offs for developers, tool builders, and researchers. @InProceedings{ESEC/FSE17p197, author = {Michael Hilton and Nicholas Nelson and Timothy Tunnell and Darko Marinov and Danny Dig}, title = {Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {197--207}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Hoek, André van der |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Support for Iteration on Code Search
Lee Martie, André van der Hoek, and Thomas Kwak (University of California at Irvine, USA) Sometimes, when programmers use a search engine they know more or less what they need. Other times, programmers use the search engine to look around and generate possible ideas for the programming problem they are working on. The key insight we explore in this paper is that the results found in the latter case tend to serve as inspiration or triggers for the next queries issued. We introduce two search engines, CodeExchange and CodeLikeThis, both of which are specifically designed to enable the user to directly leverage the results in formulating the next query. CodeExchange does this with a set of four features supporting the programmer to use characteristics of the results to find other code with or without those characteristics. CodeLikeThis supports simply selecting an entire result to find code that is analogous, to some degree, to that result. We evaluated how these approaches were used along with two approaches not explicitly supporting iteration, a baseline and Google, in a user study among 24 developers. We find that search engines that support using results to form the next query can improve the programmers’ search experience and different approaches to iteration can provide better experiences depending on the task. @InProceedings{ESEC/FSE17p774, author = {Lee Martie and André van der Hoek and Thomas Kwak}, title = {Understanding the Impact of Support for Iteration on Code Search}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {774--785}, doi = {}, year = {2017}, } |
|
Hoffmann, Henry |
ESEC/FSE '17: "Automated Control of Multiple ..."
Automated Control of Multiple Software Goals using Multiple Actuators
Martina Maggio, Alessandro Vittorio Papadopoulos, Antonio Filieri, and Henry Hoffmann (Lund University, Sweden; Mälardalen University, Sweden; Imperial College London, UK; University of Chicago, USA) Modern software should satisfy multiple goals simultaneously: it should provide predictable performance, be robust to failures, handle peak loads and deal seamlessly with unexpected conditions and changes in the execution environment. For this to happen, software designs should account for the possibility of runtime changes and provide formal guarantees of the software's behavior. Control theory is one of the possible design drivers for runtime adaptation, but adopting control theoretic principles often requires additional, specialized knowledge. To overcome this limitation, automated methodologies have been proposed to extract the necessary information from experimental data and design a control system for runtime adaptation. These proposals, however, only process one goal at a time, creating a chain of controllers. In this paper, we propose and evaluate the first automated strategy that takes into account multiple goals without separating them into multiple control strategies. Avoiding the separation allows us to tackle a larger class of problems and provide stronger guarantees. We test our methodology's generality with three case studies that demonstrate its broad applicability in meeting performance, reliability, quality, security, and energy goals despite environmental or requirements changes. @InProceedings{ESEC/FSE17p373, author = {Martina Maggio and Alessandro Vittorio Papadopoulos and Antonio Filieri and Henry Hoffmann}, title = {Automated Control of Multiple Software Goals using Multiple Actuators}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {373--384}, doi = {}, year = {2017}, } Info |
|
Hofmeister, Johannes |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Holmes, Reid |
ESEC/FSE '17: "Measuring the Cost of Regression ..."
Measuring the Cost of Regression Testing in Practice: A Study of Java Projects using Continuous Integration
Adriaan Labuschagne, Laura Inozemtseva, and Reid Holmes (University of Waterloo, Canada; University of British Columbia, Canada) Software defects cost time and money to diagnose and fix. Consequently, developers use a variety of techniques to avoid introducing defects into their systems. However, these techniques have costs of their own; the benefit of using a technique must outweigh the cost of applying it. In this paper we investigate the costs and benefits of automated regression testing in practice. Specifically, we studied 61 projects that use Travis CI, a cloud-based continuous integration tool, in order to examine real test failures that were encountered by the developers of those projects. We determined how the developers resolved the failures they encountered and used this information to classify the failures as being caused by a flaky test, by a bug in the system under test, or by a broken or obsolete test. We consider that test failures caused by bugs represent a benefit of the test suite, while failures caused by broken or obsolete tests represent a test suite maintenance cost. We found that 18% of test suite executions fail and that 13% of these failures are flaky. Of the non-flaky failures, only 74% were caused by a bug in the system under test; the remaining 26% were due to incorrect or obsolete tests. In addition, we found that, in the failed builds, only 0.38% of the test case executions failed and 64% of failed builds contained more than one failed test. Our findings contribute to a wider understanding of the unforeseen costs that can impact the overall cost effectiveness of regression testing in practice. They can also inform research into test case selection techniques, as we have provided an approximate empirical bound on the practical value that could be extracted from such techniques. 
This value appears to be large, as the 61 systems under study contained nearly 3 million lines of test code and yet over 99% of test case executions could have been eliminated with a perfect oracle. @InProceedings{ESEC/FSE17p821, author = {Adriaan Labuschagne and Laura Inozemtseva and Reid Holmes}, title = {Measuring the Cost of Regression Testing in Practice: A Study of Java Projects using Continuous Integration}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {821--830}, doi = {}, year = {2017}, } Info |
|
Iannacone, Jake |
ESEC/FSE '17: "Understanding Misunderstandings ..."
Understanding Misunderstandings in Source Code
Dan Gopstein, Jake Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, Martin K.-C. Yeh, and Justin Cappos (New York University, USA; Pennsylvania State University, USA; University of Colorado at Colorado Springs, USA) Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers, we extracted a preliminary set of 'atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback. @InProceedings{ESEC/FSE17p129, author = {Dan Gopstein and Jake Iannacone and Yu Yan and Lois DeLong and Yanyan Zhuang and Martin K.-C. Yeh and Justin Cappos}, title = {Understanding Misunderstandings in Source Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--139}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Inozemtseva, Laura |
ESEC/FSE '17: "Measuring the Cost of Regression ..."
Measuring the Cost of Regression Testing in Practice: A Study of Java Projects using Continuous Integration
Adriaan Labuschagne, Laura Inozemtseva, and Reid Holmes (University of Waterloo, Canada; University of British Columbia, Canada) Software defects cost time and money to diagnose and fix. Consequently, developers use a variety of techniques to avoid introducing defects into their systems. However, these techniques have costs of their own; the benefit of using a technique must outweigh the cost of applying it. In this paper we investigate the costs and benefits of automated regression testing in practice. Specifically, we studied 61 projects that use Travis CI, a cloud-based continuous integration tool, in order to examine real test failures that were encountered by the developers of those projects. We determined how the developers resolved the failures they encountered and used this information to classify the failures as being caused by a flaky test, by a bug in the system under test, or by a broken or obsolete test. We consider that test failures caused by bugs represent a benefit of the test suite, while failures caused by broken or obsolete tests represent a test suite maintenance cost. We found that 18% of test suite executions fail and that 13% of these failures are flaky. Of the non-flaky failures, only 74% were caused by a bug in the system under test; the remaining 26% were due to incorrect or obsolete tests. In addition, we found that, in the failed builds, only 0.38% of the test case executions failed and 64% of failed builds contained more than one failed test. Our findings contribute to a wider understanding of the unforeseen costs that can impact the overall cost effectiveness of regression testing in practice. They can also inform research into test case selection techniques, as we have provided an approximate empirical bound on the practical value that could be extracted from such techniques. 
This value appears to be large, as the 61 systems under study contained nearly 3 million lines of test code and yet over 99% of test case executions could have been eliminated with a perfect oracle. @InProceedings{ESEC/FSE17p821, author = {Adriaan Labuschagne and Laura Inozemtseva and Reid Holmes}, title = {Measuring the Cost of Regression Testing in Practice: A Study of Java Projects using Continuous Integration}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {821--830}, doi = {}, year = {2017}, } Info |
|
Islam, Syed |
ESEC/FSE '17: "Generalized Observational ..."
Generalized Observational Slicing for Tree-Represented Modelling Languages
Nicolas E. Gold, David Binkley, Mark Harman, Syed Islam, Jens Krinke, and Shin Yoo (University College London, UK; Loyola University Maryland, USA; University of East London, UK; KAIST, South Korea) Model-driven software engineering raises the abstraction level, making complex systems easier to understand than if written in textual code. Nevertheless, large complicated software systems can have large models, motivating the need for slicing techniques that reduce the size of a model. We present a generalization of observation-based slicing that allows the criterion to be defined using a variety of kinds of observable behavior and does not require any complex dependence analysis. We apply our implementation of generalized observational slicing for tree-structured representations to Simulink models. The resulting slice might be the subset of the original model responsible for an observed failure or simply the sub-model semantically related to a classic slicing criterion. Unlike its predecessors, the algorithm is also capable of slicing embedded Stateflow state machines. A study of nine real-world models drawn from four different application domains demonstrates the effectiveness of our approach at dramatically reducing Simulink model sizes for realistic observation scenarios: for 9 out of 20 cases, the resulting model has fewer than 25% of the original model's elements. @InProceedings{ESEC/FSE17p547, author = {Nicolas E. Gold and David Binkley and Mark Harman and Syed Islam and Jens Krinke and Shin Yoo}, title = {Generalized Observational Slicing for Tree-Represented Modelling Languages}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {547--558}, doi = {}, year = {2017}, } |
|
Jabbarvand, Reyhaneh |
ESEC/FSE '17: "µDroid: An Energy-Aware Mutation ..."
µDroid: An Energy-Aware Mutation Testing Framework for Android
Reyhaneh Jabbarvand and Sam Malek (University of California at Irvine, USA) The rising popularity of mobile apps deployed on battery-constrained devices underlines the need for effectively evaluating their energy properties. However, currently there is a lack of testing tools for evaluating the energy properties of apps. As a result, for energy testing, developers are relying on tests intended for evaluating the functional correctness of apps. Such tests may not be adequate for revealing energy defects and inefficiencies in apps. This paper presents an energy-aware mutation testing framework, called µDroid, that can be used by developers to assess the adequacy of their test suite for revealing energy-related defects. µDroid implements fifty energy-aware mutation operators and relies on a novel, automatic oracle to determine if a mutant can be killed by a test. Our evaluation on real-world Android apps shows the ability of the proposed mutation operators to evaluate the utility of tests in revealing energy defects. Moreover, our automated oracle can detect whether tests kill the energy mutants with an overall accuracy of 94%, thereby making it possible to apply µDroid automatically. @InProceedings{ESEC/FSE17p208, author = {Reyhaneh Jabbarvand and Sam Malek}, title = {µDroid: An Energy-Aware Mutation Testing Framework for Android}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {208--219}, doi = {}, year = {2017}, } ESEC/FSE '17: "PATDroid: Permission-Aware ..." PATDroid: Permission-Aware GUI Testing of Android Alireza Sadeghi, Reyhaneh Jabbarvand, and Sam Malek (University of California at Irvine, USA) The recent introduction of a dynamic permission system in Android, allowing the users to grant and revoke permissions after the installation of an app, has made it harder to properly test apps. Since an app's behavior may change depending on the granted permissions, it needs to be tested under a wide range of permission combinations. 
At the state-of-the-art, in the absence of any automated tool support, a developer needs to either manually determine the interaction of tests and app permissions, or exhaustively re-execute tests for all possible permission combinations, thereby increasing the time and resources required to test apps. This paper presents an automated approach, called PATDroid, for efficiently testing an Android app while taking the impact of permissions on its behavior into account. PATDroid performs a hybrid program analysis on both an app under test and its test suite to determine which tests should be executed on what permission combinations. Our experimental results show that PATDroid significantly reduces the testing effort, yet achieves comparable code coverage and fault detection capability as exhaustively testing an app under all permission combinations. @InProceedings{ESEC/FSE17p220, author = {Alireza Sadeghi and Reyhaneh Jabbarvand and Sam Malek}, title = {PATDroid: Permission-Aware GUI Testing of Android}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {220--232}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Jermaine, Chris |
ESEC/FSE '17: "Bayesian Specification Learning ..."
Bayesian Specification Learning for Finding API Usage Errors
Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine (Rice University, USA) We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a "reference distribution" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches. @InProceedings{ESEC/FSE17p151, author = {Vijayaraghavan Murali and Swarat Chaudhuri and Chris Jermaine}, title = {Bayesian Specification Learning for Finding API Usage Errors}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {151--162}, doi = {}, year = {2017}, } |
|
Jha, Somesh |
ESEC/FSE '17: "Cimplifier: Automatically ..."
Cimplifier: Automatically Debloating Containers
Vaibhav Rastogi, Drew Davidson, Lorenzo De Carli, Somesh Jha, and Patrick McDaniel (University of Wisconsin-Madison, USA; Tala Security, USA; Colorado State University, USA; Pennsylvania State University, USA) Application containers, such as those provided by Docker, have recently gained popularity as a solution for agile and seamless software deployment. These light-weight virtualization environments run applications that are packed together with their resources and configuration information, and thus can be deployed across various software platforms. Unfortunately, the ease with which containers can be created is oftentimes a double-edged sword, encouraging the packaging of logically distinct applications, and the inclusion of a significant amount of unnecessary components, within a single container. These practices needlessly increase the container size—sometimes by orders of magnitude. They also decrease the overall security, as each included component—necessary or not—may bring in security issues of its own, and there is no isolation between multiple applications packaged within the same container image. We propose algorithms and a tool called Cimplifier, which address these concerns: given a container and simple user-defined constraints, our tool partitions it into simpler containers, which (i) are isolated from each other, only communicating as necessary, and (ii) only include enough resources to perform their functionality. Our evaluation on real-world containers demonstrates that Cimplifier preserves the original functionality, leads to a reduction in image size of up to 95%, and processes even large containers in under thirty seconds. @InProceedings{ESEC/FSE17p476, author = {Vaibhav Rastogi and Drew Davidson and Lorenzo De Carli and Somesh Jha and Patrick McDaniel}, title = {Cimplifier: Automatically Debloating Containers}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {476--486}, doi = {}, year = {2017}, } |
|
Kamath, Amita Ajith |
ESEC/FSE '17: "ARTINALI: Dynamic Invariant ..."
ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security
Maryam Raiyat Aliabadi, Amita Ajith Kamath, Julien Gascon-Samson, and Karthik Pattabiraman (University of British Columbia, Canada; National Institute of Technology Karnataka, India) Cyber-Physical Systems (CPSes) are being widely deployed in security-critical scenarios such as smart homes and medical devices. Unfortunately, the connectedness of these systems and their relative lack of security measures make them ripe targets for attacks. Specification-based Intrusion Detection Systems (IDS) have been shown to be effective for securing CPSes. Unfortunately, deriving invariants for capturing the specifications of CPSes is a tedious and error-prone process. Therefore, it is important to dynamically monitor the CPS to learn its common behaviors and formulate invariants for detecting security attacks. Existing techniques for invariant mining only incorporate data and events, but not time. However, time is central to most CPSes, and hence incorporating time, in addition to data and events, is essential for achieving low false positives and false negatives. This paper proposes ARTINALI, which mines dynamic system properties by incorporating time as a first-class property of the system. We build ARTINALI-based Intrusion Detection Systems (IDSes) for two CPSes, namely smart meters and smart medical devices, and measure their efficacy. We find that the ARTINALI-based IDSes significantly reduce the ratio of false positives and false negatives by 16 to 48% (average 30.75%) and 89 to 95% (average 93.4%) respectively over other dynamic invariant detection tools. @InProceedings{ESEC/FSE17p349, author = {Maryam Raiyat Aliabadi and Amita Ajith Kamath and Julien Gascon-Samson and Karthik Pattabiraman}, title = {ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {349--361}, doi = {}, year = {2017}, } |
|
Kamp, Marius |
ESEC/FSE '17: "More Accurate Recommendations ..."
More Accurate Recommendations for Method-Level Changes
Georg Dotzler, Marius Kamp, Patrick Kreutzer, and Michael Philippsen (Friedrich-Alexander University Erlangen-Nürnberg, Germany) During the life span of large software projects, developers often apply the same code changes to different code locations in slight variations. Since the application of these changes to all locations is time-consuming and error-prone, tools exist that learn change patterns from input examples, search for possible pattern applications, and generate corresponding recommendations. In many cases, the generated recommendations are syntactically or semantically wrong due to code movements in the input examples. Thus, they are of low accuracy and developers cannot directly copy them into their projects without adjustments. We present the Accurate REcommendation System (ARES) that achieves a higher accuracy than other tools because its algorithms take care of code movements when creating patterns and recommendations. On average, the recommendations by ARES have an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives. At the same time ARES achieves precision and recall values that are on par with other tools. @InProceedings{ESEC/FSE17p798, author = {Georg Dotzler and Marius Kamp and Patrick Kreutzer and Michael Philippsen}, title = {More Accurate Recommendations for Method-Level Changes}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {798--808}, doi = {}, year = {2017}, } Info |
|
Karkare, Amey |
ESEC/FSE '17: "A Feasibility Study of Using ..."
A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments
Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roychoudhury (Innopolis University, Russia; IIT Kanpur, India; National University of Singapore, Singapore) Despite the fact that an intelligent tutoring system for programming (ITSP) has long attracted interest, its widespread use has been hindered by the difficulty of generating personalized feedback automatically. Meanwhile, automated program repair (APR) is an emerging new technology that automatically fixes software bugs, and it has been shown that APR can fix the bugs of large real-world software. In this paper, we study the feasibility of marrying intelligent programming tutoring and APR. We perform our feasibility study with four state-of-the-art APR tools (GenProg, AE, Angelix, and Prophet), and 661 programs written by the students taking an introductory programming course. We found that when APR tools are used out of the box, only about 30% of the programs in our dataset are repaired. This low repair rate is largely due to the student programs often being significantly incorrect — in contrast, professional software for which APR was successfully applied typically fails only a small portion of tests. To bridge this gap, we adopt in APR a new repair policy akin to the hint generation policy employed in the existing ITSP. This new repair policy admits partial repairs that address part of the failing tests, which results in an 84% improvement of the repair rate. We also performed a user study with 263 novice students and 37 graders, and identified an understudied problem: while novice students do not seem to know how to effectively make use of generated repairs as hints, the graders do seem to gain benefits from repairs. @InProceedings{ESEC/FSE17p740, author = {Jooyong Yi and Umair Z. Ahmed and Amey Karkare and Shin Hwei Tan and Abhik Roychoudhury}, title = {A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {740--751}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Kästner, Christian |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Kehrer, Timo |
ESEC/FSE '17: "Modeling and Verification ..."
Modeling and Verification of Evolving Cyber-Physical Spaces
Christos Tsigkanos, Timo Kehrer, and Carlo Ghezzi (Politecnico di Milano, Italy) We increasingly live in cyber-physical spaces -- spaces that are both physical and digital, and where the two aspects are intertwined. Such spaces are highly dynamic and typically undergo continuous change. Software engineering can have a profound impact in this domain, by defining suitable modeling and specification notations as well as supporting design-time formal verification. In this paper, we present a methodology and a technical framework which support modeling of evolving cyber-physical spaces and reasoning about their spatio-temporal properties. We utilize a discrete, graph-based formalism for modeling cyber-physical spaces as well as primitives of change, giving rise to a reactive system consisting of rewriting rules with both local and global application conditions. Formal reasoning facilities are implemented adopting logic-based specification of properties and corresponding model checking procedures, in both spatial and temporal fragments. We evaluate our approach using a case study of a disaster scenario in a smart city. @InProceedings{ESEC/FSE17p38, author = {Christos Tsigkanos and Timo Kehrer and Carlo Ghezzi}, title = {Modeling and Verification of Evolving Cyber-Physical Spaces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {38--48}, doi = {}, year = {2017}, } |
|
Knüppel, Alexander |
ESEC/FSE '17: "Is There a Mismatch between ..."
Is There a Mismatch between Real-World Feature Models and Product-Line Research?
Alexander Knüppel, Thomas Thüm, Stephan Mennicke, Jens Meinicke, and Ina Schaefer (TU Braunschweig, Germany; University of Magdeburg, Germany) Feature modeling has emerged as the de-facto standard to compactly capture the variability of a software product line. Multiple feature modeling languages have been proposed that evolved over the last decades to manage industrial-size product lines. However, less expressive languages, solely permitting require and exclude constraints, are permanently and carelessly used in product-line research. We address the problem whether those less expressive languages are sufficient for industrial product lines. We developed an algorithm to eliminate complex cross-tree constraints in a feature model, enabling the combination of tools and algorithms working with different feature model dialects in a plug-and-play manner. However, the scope of our algorithm is limited. Our evaluation on large feature models, including the Linux kernel, gives evidence that require and exclude constraints are not sufficient to express real-world feature models. Hence, we promote that research on feature models needs to consider arbitrary propositional formulas as cross-tree constraints prospectively. @InProceedings{ESEC/FSE17p291, author = {Alexander Knüppel and Thomas Thüm and Stephan Mennicke and Jens Meinicke and Ina Schaefer}, title = {Is There a Mismatch between Real-World Feature Models and Product-Line Research?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {291--302}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Kreutzer, Patrick |
ESEC/FSE '17: "More Accurate Recommendations ..."
More Accurate Recommendations for Method-Level Changes
Georg Dotzler, Marius Kamp, Patrick Kreutzer, and Michael Philippsen (Friedrich-Alexander University Erlangen-Nürnberg, Germany) During the life span of large software projects, developers often apply the same code changes to different code locations in slight variations. Since the application of these changes to all locations is time-consuming and error-prone, tools exist that learn change patterns from input examples, search for possible pattern applications, and generate corresponding recommendations. In many cases, the generated recommendations are syntactically or semantically wrong due to code movements in the input examples. Thus, they are of low accuracy and developers cannot directly copy them into their projects without adjustments. We present the Accurate REcommendation System (ARES) that achieves a higher accuracy than other tools because its algorithms take care of code movements when creating patterns and recommendations. On average, the recommendations by ARES have an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives. At the same time ARES achieves precision and recall values that are on par with other tools. @InProceedings{ESEC/FSE17p798, author = {Georg Dotzler and Marius Kamp and Patrick Kreutzer and Michael Philippsen}, title = {More Accurate Recommendations for Method-Level Changes}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {798--808}, doi = {}, year = {2017}, } Info |
|
Krinke, Jens |
ESEC/FSE '17: "Generalized Observational ..."
Generalized Observational Slicing for Tree-Represented Modelling Languages
Nicolas E. Gold, David Binkley, Mark Harman, Syed Islam, Jens Krinke, and Shin Yoo (University College London, UK; Loyola University Maryland, USA; University of East London, UK; KAIST, South Korea) Model-driven software engineering raises the abstraction level, making complex systems easier to understand than if written in textual code. Nevertheless, large complicated software systems can have large models, motivating the need for slicing techniques that reduce the size of a model. We present a generalization of observation-based slicing that allows the criterion to be defined using a variety of kinds of observable behavior and does not require any complex dependence analysis. We apply our implementation of generalized observational slicing for tree-structured representations to Simulink models. The resulting slice might be the subset of the original model responsible for an observed failure or simply the sub-model semantically related to a classic slicing criterion. Unlike its predecessors, the algorithm is also capable of slicing embedded Stateflow state machines. A study of nine real-world models drawn from four different application domains demonstrates the effectiveness of our approach at dramatically reducing Simulink model sizes for realistic observation scenarios: for 9 out of 20 cases, the resulting model has fewer than 25% of the original model's elements. @InProceedings{ESEC/FSE17p547, author = {Nicolas E. Gold and David Binkley and Mark Harman and Syed Islam and Jens Krinke and Shin Yoo}, title = {Generalized Observational Slicing for Tree-Represented Modelling Languages}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {547--558}, doi = {}, year = {2017}, } |
|
Krishnamurthi, Shriram |
ESEC/FSE '17: "The Power of "Why" ..."
The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance
Tim Nelson, Natasha Danas, Daniel J. Dougherty, and Shriram Krishnamurthi (Brown University, USA; Worcester Polytechnic Institute, USA) Scenario-finding tools like the Alloy Analyzer are widely used in numerous concrete domains like security, network analysis, UML analysis, and so on. They can help to verify properties and, more generally, aid in exploring a system's behavior. While scenario finders are valuable for their ability to produce concrete examples, individual scenarios only give insight into what is possible, leaving the user to make their own conclusions about what might be necessary. This paper enriches scenario finding by allowing users to ask ``why?'' and ``why not?'' questions about the examples they are given. We show how to distinguish parts of an example that cannot be consistently removed (or changed) from those that merely reflect underconstraint in the specification. In the former case we show how to determine which elements of the specification and which other components of the example together explain the presence of such facts. This paper formalizes the act of computing provenance in scenario-finding. We present Amalgam, an extension of the popular Alloy scenario-finder, which implements these foundations and provides interactive exploration of examples. We also evaluate Amalgam's algorithmics on a variety of both textbook and real-world examples. @InProceedings{ESEC/FSE17p106, author = {Tim Nelson and Natasha Danas and Daniel J. Dougherty and Shriram Krishnamurthi}, title = {The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {106--116}, doi = {}, year = {2017}, } Info Artifacts Reusable Best-Paper Award |
|
Kusano, Markus |
ESEC/FSE '17: "Thread-Modular Static Analysis ..."
Thread-Modular Static Analysis for Relaxed Memory Models
Markus Kusano and Chao Wang (Virginia Tech, USA; University of Southern California, USA) We propose a memory-model-aware static program analysis method for accurately analyzing the behavior of concurrent software running on processors with weak consistency models such as x86-TSO, SPARC-PSO, and SPARC-RMO. At the center of our method is a unified framework for deciding the feasibility of inter-thread interferences to avoid propagating spurious data flows during static analysis and thus boost the performance of the static analyzer. We formulate the checking of interference feasibility as a set of Datalog rules which are both efficiently solvable and general enough to capture a range of hardware-level memory models. Compared to existing techniques, our method can significantly reduce the number of bogus alarms as well as unsound proofs. We implemented the method and evaluated it on a large set of multithreaded C programs. Our experiments show the method significantly outperforms state-of-the-art techniques in terms of accuracy with only moderate runtime overhead. @InProceedings{ESEC/FSE17p337, author = {Markus Kusano and Chao Wang}, title = {Thread-Modular Static Analysis for Relaxed Memory Models}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {337--348}, doi = {}, year = {2017}, } |
|
Kuvent, Aviv |
ESEC/FSE '17: "A Symbolic Justice Violations ..."
A Symbolic Justice Violations Transition System for Unrealizable GR(1) Specifications
Aviv Kuvent, Shahar Maoz, and Jan Oliver Ringert (Tel Aviv University, Israel) One of the main challenges of reactive synthesis, an automated procedure to obtain a correct-by-construction reactive system, is to deal with unrealizable specifications. Existing approaches to deal with unrealizability, in the context of GR(1), an expressive assume-guarantee fragment of LTL that enables efficient synthesis, include the generation of concrete counter-strategies and the computation of an unrealizable core. Although correct, such approaches produce large and complicated counter-strategies, often containing thousands of states. This hinders their use by engineers. In this work we present the Justice Violations Transition System (JVTS), a novel symbolic representation of counter-strategies for GR(1). The JVTS is much smaller and simpler than its corresponding concrete counter-strategy. Moreover, it is annotated with invariants that explain how the counter-strategy forces the system to violate the specification. We compute the JVTS symbolically, and thus more efficiently, without the expensive enumeration of concrete states. Finally, we provide the JVTS with an on-demand interactive concrete and symbolic play. We implemented our work, validated its correctness, and evaluated it on 14 unrealizable specifications of autonomous Lego robots as well as on benchmarks from the literature. The evaluation shows not only that the JVTS is in most cases much smaller than the corresponding concrete counter-strategy, but also that its computation is faster. @InProceedings{ESEC/FSE17p362, author = {Aviv Kuvent and Shahar Maoz and Jan Oliver Ringert}, title = {A Symbolic Justice Violations Transition System for Unrealizable GR(1) Specifications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {362--372}, doi = {}, year = {2017}, } Info |
|
Kwak, Thomas |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Support for Iteration on Code Search
Lee Martie, André van der Hoek, and Thomas Kwak (University of California at Irvine, USA) Sometimes, when programmers use a search engine, they know more or less what they need. Other times, programmers use the search engine to look around and generate possible ideas for the programming problem they are working on. The key insight we explore in this paper is that the results found in the latter case tend to serve as inspiration or triggers for the next queries issued. We introduce two search engines, CodeExchange and CodeLikeThis, both of which are specifically designed to enable the user to directly leverage the results in formulating the next query. CodeExchange does this with a set of four features supporting the programmer to use characteristics of the results to find other code with or without those characteristics. CodeLikeThis supports simply selecting an entire result to find code that is analogous, to some degree, to that result. We evaluated how these approaches were used along with two approaches not explicitly supporting iteration, a baseline and Google, in a user study among 24 developers. We find that search engines that support using results to form the next query can improve the programmers’ search experience, and different approaches to iteration can provide better experiences depending on the task. @InProceedings{ESEC/FSE17p774, author = {Lee Martie and André van der Hoek and Thomas Kwak}, title = {Understanding the Impact of Support for Iteration on Code Search}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {774--785}, doi = {}, year = {2017}, } |
|
Labuschagne, Adriaan |
ESEC/FSE '17: "Measuring the Cost of Regression ..."
Measuring the Cost of Regression Testing in Practice: A Study of Java Projects using Continuous Integration
Adriaan Labuschagne, Laura Inozemtseva, and Reid Holmes (University of Waterloo, Canada; University of British Columbia, Canada) Software defects cost time and money to diagnose and fix. Consequently, developers use a variety of techniques to avoid introducing defects into their systems. However, these techniques have costs of their own; the benefit of using a technique must outweigh the cost of applying it. In this paper we investigate the costs and benefits of automated regression testing in practice. Specifically, we studied 61 projects that use Travis CI, a cloud-based continuous integration tool, in order to examine real test failures that were encountered by the developers of those projects. We determined how the developers resolved the failures they encountered and used this information to classify the failures as being caused by a flaky test, by a bug in the system under test, or by a broken or obsolete test. We consider that test failures caused by bugs represent a benefit of the test suite, while failures caused by broken or obsolete tests represent a test suite maintenance cost. We found that 18% of test suite executions fail and that 13% of these failures are flaky. Of the non-flaky failures, only 74% were caused by a bug in the system under test; the remaining 26% were due to incorrect or obsolete tests. In addition, we found that, in the failed builds, only 0.38% of the test case executions failed and 64% of failed builds contained more than one failed test. Our findings contribute to a wider understanding of the unforeseen costs that can impact the overall cost effectiveness of regression testing in practice. They can also inform research into test case selection techniques, as we have provided an approximate empirical bound on the practical value that could be extracted from such techniques. 
This value appears to be large, as the 61 systems under study contained nearly 3 million lines of test code and yet over 99% of test case executions could have been eliminated with a perfect oracle. @InProceedings{ESEC/FSE17p821, author = {Adriaan Labuschagne and Laura Inozemtseva and Reid Holmes}, title = {Measuring the Cost of Regression Testing in Practice: A Study of Java Projects using Continuous Integration}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {821--830}, doi = {}, year = {2017}, } Info |
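The flaky-vs-genuine failure classification described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' tooling: the record format and the rule (a test that both fails and passes on the same commit is flaky) are simplifying assumptions.

```python
# Sketch: classify CI test failures as flaky vs. genuine, in the spirit of
# the Travis CI study above. Record format and names are invented.
from collections import defaultdict

def classify_failures(runs):
    """runs: list of (commit, test, outcome) with outcome in {"pass", "fail"}.
    A failing test that also passed on the same commit is treated as flaky."""
    outcomes = defaultdict(set)
    for commit, test, outcome in runs:
        outcomes[(commit, test)].add(outcome)
    flaky, genuine = [], []
    for key, seen in outcomes.items():
        if "fail" not in seen:
            continue
        (flaky if "pass" in seen else genuine).append(key)
    return flaky, genuine

runs = [
    ("c1", "testA", "fail"), ("c1", "testA", "pass"),  # flaky
    ("c1", "testB", "fail"), ("c1", "testB", "fail"),  # genuine failure
]
flaky, genuine = classify_failures(runs)
print(flaky)    # [('c1', 'testA')]
print(genuine)  # [('c1', 'testB')]
```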
|
Lahtinen, Eric |
ESEC/FSE '17: "CodeCarbonCopy ..."
CodeCarbonCopy
Stelios Sidiroglou-Douskos, Eric Lahtinen, Anthony Eden, Fan Long, and Martin Rinard (Massachusetts Institute of Technology, USA) We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obtain a translation between the data representation and name space of the recipient and the data representation and name space of the donor. It also implements a static analysis that identifies and removes irrelevant functionality useful in the donor but not in the recipient. We evaluate CCC on eight transfers between six applications. Our results show that CCC can successfully transfer donor functionality into recipient applications. @InProceedings{ESEC/FSE17p95, author = {Stelios Sidiroglou-Douskos and Eric Lahtinen and Anthony Eden and Fan Long and Martin Rinard}, title = {CodeCarbonCopy}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--105}, doi = {}, year = {2017}, } |
|
Le, Xuan-Bach D. |
ESEC/FSE '17: "S3: Syntax- and Semantic-Guided ..."
S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser (Singapore Management University, Singapore; IST Austria, Austria; Carnegie Mellon University, USA; Stellenbosch University, South Africa) A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with a difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) A systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) An efficient enumeration-based search strategy over the constrained search space, and (3) A number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3’s repair effectiveness with state-of-the-art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs. @InProceedings{ESEC/FSE17p593, author = {Xuan-Bach D. Le and Duc-Hiep Chu and David Lo and Claire Le Goues and Willem Visser}, title = {S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {593--604}, doi = {}, year = {2017}, } |
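S3's ranking idea, preferring candidates syntactically close to the buggy program among those consistent with the examples, can be illustrated with a toy example. This is not S3's DSL or feature set; the candidates, the distance measure (difflib ratio), and the examples are all invented for illustration.

```python
# Toy illustration of example-consistent ranking by syntactic distance:
# keep candidates satisfying all input-output examples, then prefer the
# one closest (by difflib similarity) to the original buggy expression.
import difflib

def rank_candidates(buggy_src, candidates, examples):
    consistent = [
        (src, fn) for src, fn in candidates
        if all(fn(x) == y for x, y in examples)
    ]
    # Higher similarity ratio = smaller syntactic distance to the original.
    return sorted(
        consistent,
        key=lambda c: difflib.SequenceMatcher(None, buggy_src, c[0]).ratio(),
        reverse=True,
    )

buggy = "x - 1"
candidates = [("x + 1", lambda x: x + 1),
              ("2 * x - (x - 1)", lambda x: 2 * x - (x - 1))]
examples = [(1, 2), (5, 6)]  # both candidates are consistent with these
best_src, _ = rank_candidates(buggy, candidates, examples)[0]
print(best_src)  # x + 1  (the smaller, closer edit wins)
```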
|
Lee, Wen-Chuan |
ESEC/FSE '17: "LAMP: Data Provenance for ..."
LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation
Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, and Xiangyu Zhang (Purdue University, USA; Nanjing University, China) Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation to different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program dependence based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering. @InProceedings{ESEC/FSE17p786, author = {Shiqing Ma and Yousra Aafer and Zhaogui Xu and Wen-Chuan Lee and Juan Zhai and Yingqi Liu and Xiangyu Zhang}, title = {LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {786--797}, doi = {}, year = {2017}, } |
|
Lerch, Johannes |
ESEC/FSE '17: "CodeMatch: Obfuscation Won't ..."
CodeMatch: Obfuscation Won't Conceal Your Repackaged App
Leonid Glanz, Sven Amann, Michael Eichberg, Michael Reif, Ben Hermann, Johannes Lerch, and Mira Mezini (TU Darmstadt, Germany) An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps, where the obfuscated code bases are very similar, do not have to be repackages of each other. To reliably detect repackaged apps, we propose a two-step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app’s library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used. This makes it possible to identify repackaged apps. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages. @InProceedings{ESEC/FSE17p638, author = {Leonid Glanz and Sven Amann and Michael Eichberg and Michael Reif and Ben Hermann and Johannes Lerch and Mira Mezini}, title = {CodeMatch: Obfuscation Won't Conceal Your Repackaged App}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {638--648}, doi = {}, year = {2017}, } Info |
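The abstraction step that makes such comparison resilient to renaming can be sketched on source text (the real tool works on abstracted bytecode and fuzzy hashing; the regexes and the difflib-based similarity below are stand-ins chosen for illustration).

```python
# Sketch: abstract identifiers and literals to placeholders so that renaming
# obfuscation does not affect similarity, then fuzzy-compare the results.
import difflib
import re

def abstract(code):
    code = re.sub(r'"[^"]*"', "STR", code)            # string literals
    code = re.sub(r"\b\d+\b", "NUM", code)            # numeric literals
    # identifiers (but keep the STR/NUM placeholders we just introduced)
    return re.sub(r"\b(?!STR\b|NUM\b)[A-Za-z_]\w*\b", "ID", code)

def similarity(a, b):
    return difflib.SequenceMatcher(None, abstract(a), abstract(b)).ratio()

original   = 'int total = price * 3; log("sum");'
obfuscated = 'int a = b * 7; c("x");'   # same shape, renamed and re-literaled
print(similarity(original, obfuscated))  # 1.0 after abstraction
```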
|
Li, Xiaohong |
ESEC/FSE '17: "Loopster: Static Loop Termination ..."
Loopster: Static Loop Termination Analysis
Xiaofei Xie, Bihuan Chen, Liang Zou, Shang-Wei Lin, Yang Liu, and Xiaohong Li (Tianjin University, China; Nanyang Technological University, Singapore) Loop termination is an important problem for proving the correctness of a system and ensuring that the system always reacts. Existing loop termination analysis techniques mainly depend on the synthesis of ranking functions, which is often expensive. In this paper, we present a novel approach, named Loopster, which performs an efficient static analysis to decide the termination for loops based on path termination analysis and path dependency reasoning. Loopster adopts a divide-and-conquer approach: (1) we extract individual paths from a target multi-path loop and analyze the termination of each path, (2) analyze the dependencies between each two paths, and then (3) determine the overall termination of the target loop based on the relations among paths. We evaluate Loopster by applying it on the loop termination competition benchmark and three real-world projects. The results show that Loopster is effective in a majority of loops with better accuracy and 20×+ performance improvement compared to the state-of-the-art tools. @InProceedings{ESEC/FSE17p84, author = {Xiaofei Xie and Bihuan Chen and Liang Zou and Shang-Wei Lin and Yang Liu and Xiaohong Li}, title = {Loopster: Static Loop Termination Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {84--94}, doi = {}, year = {2017}, } |
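The per-path analysis in step (1) can be illustrated on the simplest case: a path whose guard compares a counter to a constant and whose body adds a constant step. This toy decision procedure is far narrower than Loopster (no inter-path dependencies, only linear counter updates); the encoding is invented for illustration.

```python
# Toy per-path termination check: for guard "x < bound" (or "x > bound")
# and body "x += step", the path alone terminates iff the step drives x
# toward violating the guard.
def path_terminates(guard_op, step):
    """guard_op: '<' or '>' comparing counter x against a constant bound."""
    if guard_op == "<":
        return step > 0   # x grows until the guard x < bound fails
    if guard_op == ">":
        return step < 0   # x shrinks until the guard x > bound fails
    raise ValueError(f"unsupported guard {guard_op!r}")

print(path_terminates("<", 1))   # True:  while (x < n) x += 1;
print(path_terminates("<", -2))  # False: while (x < n) x -= 2;
```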
|
Li, Yuekang |
ESEC/FSE '17: "Steelix: Program-State Based ..."
Steelix: Program-State Based Binary Fuzzing
Yuekang Li , Bihuan Chen, Mahinthan Chandramohan, Shang-Wei Lin, Yang Liu , and Alwen Tiu (Nanyang Technological University, Singapore; Fudan University, China) Coverage-based fuzzing is one of the most effective techniques to find vulnerabilities, bugs or crashes. However, existing techniques suffer from the difficulty in exercising the paths that are protected by magic bytes comparisons (e.g., string equality comparisons). Several approaches have been proposed to use heavy-weight program analysis to break through magic bytes comparisons, and hence are less scalable. In this paper, we propose a program-state based binary fuzzing approach, named Steelix, which improves the penetration power of a fuzzer at the cost of an acceptable slow down of the execution speed. In particular, we use light-weight static analysis and binary instrumentation to provide not only coverage information but also comparison progress information to a fuzzer. Such program state information informs a fuzzer about where the magic bytes are located in the test input and how to perform mutations to match the magic bytes efficiently. We have implemented Steelix and evaluated it on three datasets: LAVA-M dataset, DARPA CGC sample binaries and five real-life programs. The results show that Steelix has better code coverage and bug detection capability than the state-of-the-art fuzzers. Moreover, we found one CVE and nine new bugs. @InProceedings{ESEC/FSE17p627, author = {Yuekang Li and Bihuan Chen and Mahinthan Chandramohan and Shang-Wei Lin and Yang Liu and Alwen Tiu}, title = {Steelix: Program-State Based Binary Fuzzing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {627--637}, doi = {}, year = {2017}, } |
|
Liblit, Ben |
ESEC/FSE '17: "The Care and Feeding of Wild-Caught ..."
The Care and Feeding of Wild-Caught Mutants
David Bingham Brown, Michael Vaughn, Ben Liblit, and Thomas Reps (University of Wisconsin-Madison, USA) Mutation testing of a test suite and a program provides a way to measure the quality of the test suite. In essence, mutation testing is a form of sensitivity testing: by running mutated versions of the program against the test suite, mutation testing measures the suite’s sensitivity for detecting bugs that a programmer might introduce into the program. This paper introduces a technique to improve mutation testing that we call wild-caught mutants; it provides a method for creating potential faults that are more closely coupled with changes made by actual programmers. This technique allows the mutation tester to have more certainty that the test suite is sensitive to the kind of changes that have been observed to have been made by programmers in real-world cases. @InProceedings{ESEC/FSE17p511, author = {David Bingham Brown and Michael Vaughn and Ben Liblit and Thomas Reps}, title = {The Care and Feeding of Wild-Caught Mutants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {511--522}, doi = {}, year = {2017}, } Video Info Artifacts Reusable |
|
Lin, Shang-Wei |
ESEC/FSE '17: "Loopster: Static Loop Termination ..."
Loopster: Static Loop Termination Analysis
Xiaofei Xie, Bihuan Chen, Liang Zou, Shang-Wei Lin, Yang Liu, and Xiaohong Li (Tianjin University, China; Nanyang Technological University, Singapore) Loop termination is an important problem for proving the correctness of a system and ensuring that the system always reacts. Existing loop termination analysis techniques mainly depend on the synthesis of ranking functions, which is often expensive. In this paper, we present a novel approach, named Loopster, which performs an efficient static analysis to decide the termination for loops based on path termination analysis and path dependency reasoning. Loopster adopts a divide-and-conquer approach: (1) we extract individual paths from a target multi-path loop and analyze the termination of each path, (2) analyze the dependencies between each two paths, and then (3) determine the overall termination of the target loop based on the relations among paths. We evaluate Loopster by applying it on the loop termination competition benchmark and three real-world projects. The results show that Loopster is effective in a majority of loops with better accuracy and 20×+ performance improvement compared to the state-of-the-art tools. @InProceedings{ESEC/FSE17p84, author = {Xiaofei Xie and Bihuan Chen and Liang Zou and Shang-Wei Lin and Yang Liu and Xiaohong Li}, title = {Loopster: Static Loop Termination Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {84--94}, doi = {}, year = {2017}, } ESEC/FSE '17: "Steelix: Program-State Based ..." Steelix: Program-State Based Binary Fuzzing Yuekang Li, Bihuan Chen, Mahinthan Chandramohan, Shang-Wei Lin, Yang Liu, and Alwen Tiu (Nanyang Technological University, Singapore; Fudan University, China) Coverage-based fuzzing is one of the most effective techniques to find vulnerabilities, bugs or crashes. However, existing techniques suffer from the difficulty in exercising the paths that are protected by magic bytes comparisons (e.g., string equality comparisons).
Several approaches have been proposed to use heavy-weight program analysis to break through magic bytes comparisons, and hence are less scalable. In this paper, we propose a program-state based binary fuzzing approach, named Steelix, which improves the penetration power of a fuzzer at the cost of an acceptable slow down of the execution speed. In particular, we use light-weight static analysis and binary instrumentation to provide not only coverage information but also comparison progress information to a fuzzer. Such program state information informs a fuzzer about where the magic bytes are located in the test input and how to perform mutations to match the magic bytes efficiently. We have implemented Steelix and evaluated it on three datasets: LAVA-M dataset, DARPA CGC sample binaries and five real-life programs. The results show that Steelix has better code coverage and bug detection capability than the state-of-the-art fuzzers. Moreover, we found one CVE and nine new bugs. @InProceedings{ESEC/FSE17p627, author = {Yuekang Li and Bihuan Chen and Mahinthan Chandramohan and Shang-Wei Lin and Yang Liu and Alwen Tiu}, title = {Steelix: Program-State Based Binary Fuzzing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {627--637}, doi = {}, year = {2017}, } |
|
Linares-Vásquez, Mario |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped in 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identify a set of 38 mutation operators, and implement an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info |
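An Android-specific mutation operator of the kind the taxonomy motivates can be sketched as a simple source transformation. The operator below (swapping an Intent action constant) is invented for illustration and is not claimed to be one of MDroid+'s 38 operators.

```python
# Hypothetical Android-flavored mutation operator: flip the Intent action
# in source code to seed a fault a developer might realistically introduce.
import re

def mutate_intent_action(java_src):
    """Swap Intent.ACTION_VIEW for Intent.ACTION_EDIT wherever it occurs."""
    return re.sub(r"Intent\.ACTION_VIEW", "Intent.ACTION_EDIT", java_src)

src = "startActivity(new Intent(Intent.ACTION_VIEW, uri));"
print(mutate_intent_action(src))
# startActivity(new Intent(Intent.ACTION_EDIT, uri));
```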
|
Liu, Yang |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments and are developed under the time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17%–31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed.
@InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } ESEC/FSE '17: "Loopster: Static Loop Termination ..." Loopster: Static Loop Termination Analysis Xiaofei Xie, Bihuan Chen, Liang Zou, Shang-Wei Lin, Yang Liu, and Xiaohong Li (Tianjin University, China; Nanyang Technological University, Singapore) Loop termination is an important problem for proving the correctness of a system and ensuring that the system always reacts. Existing loop termination analysis techniques mainly depend on the synthesis of ranking functions, which is often expensive. In this paper, we present a novel approach, named Loopster, which performs an efficient static analysis to decide the termination for loops based on path termination analysis and path dependency reasoning. Loopster adopts a divide-and-conquer approach: (1) we extract individual paths from a target multi-path loop and analyze the termination of each path, (2) analyze the dependencies between each two paths, and then (3) determine the overall termination of the target loop based on the relations among paths. We evaluate Loopster by applying it on the loop termination competition benchmark and three real-world projects. The results show that Loopster is effective in a majority of loops with better accuracy and 20×+ performance improvement compared to the state-of-the-art tools. @InProceedings{ESEC/FSE17p84, author = {Xiaofei Xie and Bihuan Chen and Liang Zou and Shang-Wei Lin and Yang Liu and Xiaohong Li}, title = {Loopster: Static Loop Termination Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {84--94}, doi = {}, year = {2017}, } ESEC/FSE '17: "Steelix: Program-State Based ..."
Steelix: Program-State Based Binary Fuzzing Yuekang Li , Bihuan Chen, Mahinthan Chandramohan, Shang-Wei Lin, Yang Liu , and Alwen Tiu (Nanyang Technological University, Singapore; Fudan University, China) Coverage-based fuzzing is one of the most effective techniques to find vulnerabilities, bugs or crashes. However, existing techniques suffer from the difficulty in exercising the paths that are protected by magic bytes comparisons (e.g., string equality comparisons). Several approaches have been proposed to use heavy-weight program analysis to break through magic bytes comparisons, and hence are less scalable. In this paper, we propose a program-state based binary fuzzing approach, named Steelix, which improves the penetration power of a fuzzer at the cost of an acceptable slow down of the execution speed. In particular, we use light-weight static analysis and binary instrumentation to provide not only coverage information but also comparison progress information to a fuzzer. Such program state information informs a fuzzer about where the magic bytes are located in the test input and how to perform mutations to match the magic bytes efficiently. We have implemented Steelix and evaluated it on three datasets: LAVA-M dataset, DARPA CGC sample binaries and five real-life programs. The results show that Steelix has better code coverage and bug detection capability than the state-of-the-art fuzzers. Moreover, we found one CVE and nine new bugs. @InProceedings{ESEC/FSE17p627, author = {Yuekang Li and Bihuan Chen and Mahinthan Chandramohan and Shang-Wei Lin and Yang Liu and Alwen Tiu}, title = {Steelix: Program-State Based Binary Fuzzing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {627--637}, doi = {}, year = {2017}, } |
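The stochastic model-based generation described in the Stoat entry above can be sketched as sampling event sequences from a weighted GUI model. The model, states, and events below are invented; the real system reverse engineers the model from the app and refines the weights via Gibbs sampling, both of which are elided here.

```python
# Sketch: sample a GUI test sequence from a stochastic model whose states
# are screens and whose weighted transitions are UI events (model invented).
import random

MODEL = {
    "home":    [("tap_search", "search", 0.7), ("tap_menu", "menu", 0.3)],
    "search":  [("type_query", "results", 1.0)],
    "results": [("tap_back", "home", 1.0)],
    "menu":    [("tap_back", "home", 1.0)],
}

def sample_test(model, start="home", length=4, seed=1):
    rng = random.Random(seed)
    state, events = start, []
    for _ in range(length):
        actions = model[state]
        weights = [w for _, _, w in actions]
        event, state, _ = rng.choices(actions, weights=weights)[0]
        events.append(event)
    return events

print(sample_test(MODEL))  # e.g. a 4-event sequence starting from "home"
```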
|
Liu, Yepang |
ESEC/FSE '17: "OASIS: Prioritizing Static ..."
OASIS: Prioritizing Static Analysis Warnings for Android Apps Based on App User Reviews
Lili Wei, Yepang Liu, and Shing-Chi Cheung (Hong Kong University of Science and Technology, China) Lint is a widely-used static analyzer for detecting bugs/issues in Android apps. However, it can generate many false warnings. One existing solution to this problem is to leverage project history data (e.g., bug fixing statistics) for warning prioritization. Unfortunately, such techniques are biased toward a project’s archived warnings and can easily miss new issues. Another weakness is that developers cannot readily relate the warnings to the impacts perceivable by users. To overcome these weaknesses, in this paper, we propose a semantics-aware approach, OASIS, to prioritizing Lint warnings by leveraging app user reviews. OASIS combines program analysis and NLP techniques to recover the intrinsic links between the Lint warnings for a given app and the user complaints on the app problems caused by the issues of concern. OASIS leverages the strength of such links to prioritize warnings. We evaluated OASIS on six popular and large-scale open-source Android apps. The results show that OASIS can effectively prioritize Lint warnings and help identify new issues that are previously unknown to app developers. @InProceedings{ESEC/FSE17p672, author = {Lili Wei and Yepang Liu and Shing-Chi Cheung}, title = {OASIS: Prioritizing Static Analysis Warnings for Android Apps Based on App User Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {672--682}, doi = {}, year = {2017}, } |
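The warning-to-review linking step can be caricatured with bag-of-words overlap. OASIS combines program analysis with much richer NLP; the Jaccard-style score, threshold, and example texts below are all invented to show only the shape of the idea.

```python
# Sketch: score each Lint warning by its word overlap with user complaints
# and surface the warnings with the strongest links first (toy NLP only).
def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))   # Jaccard similarity

def prioritize(warnings, reviews, threshold=0.1):
    scored = [(max(overlap(w, r) for r in reviews), w) for w in warnings]
    return [w for s, w in sorted(scored, reverse=True) if s >= threshold]

warnings = [
    "wakelock not released in onPause causing battery drain",
    "unused resource icon_old.png",
]
reviews = ["this app causes huge battery drain on my phone"]
print(prioritize(warnings, reviews))  # only the battery-related warning links
```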
|
Liu, Yingqi |
ESEC/FSE '17: "LAMP: Data Provenance for ..."
LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation
Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, and Xiangyu Zhang (Purdue University, USA; Nanjing University, China) Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation to different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program dependence based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering. @InProceedings{ESEC/FSE17p786, author = {Shiqing Ma and Yousra Aafer and Zhaogui Xu and Wen-Chuan Lee and Juan Zhai and Yingqi Liu and Xiangyu Zhang}, title = {LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {786--797}, doi = {}, year = {2017}, } |
|
Liu, Yuefei |
ESEC/FSE '17: "Better Test Cases for Better ..."
Better Test Cases for Better Automated Program Repair
Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan (University of Waterloo, Canada) Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches due to inadequate test cases. Such overfitted patches are incorrect patches, which only make all given test cases pass, but fail to fix the bugs. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test oracles (crash and memory-safety) to enhance validity checking of automatically-generated patches. Opad also uses a novel metric (named O-measure) for deciding whether automatically-generated patches overfit. Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically-generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to crash and memory-safety oracles employed by Opad). @InProceedings{ESEC/FSE17p831, author = {Jinqiu Yang and Alexey Zhikhartsev and Yuefei Liu and Lin Tan}, title = {Better Test Cases for Better Automated Program Repair}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {831--841}, doi = {}, year = {2017}, } |
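Opad's filtering principle, rejecting a patch that misbehaves on extra generated inputs where the original does not, can be shown with a crash oracle on toy functions. The fuzz-testing and O-measure components are simplified away; the functions and inputs are invented.

```python
# Sketch: flag a candidate patch as overfitted if it crashes (our only
# oracle here) on extra inputs where the original program does not.
def crashes(fn, x):
    try:
        fn(x)
        return False
    except Exception:
        return True

def overfitted(original, patch, extra_inputs):
    return any(crashes(patch, x) and not crashes(original, x)
               for x in extra_inputs)

original  = lambda x: 100 // (x + 1)   # buggy only at x == -1
bad_patch = lambda x: 100 // x         # "fixes" x == -1 but breaks x == 0
print(overfitted(original, bad_patch, extra_inputs=range(0, 10)))  # True
```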
|
Livshits, Benjamin |
ESEC/FSE '17: "Toward Full Elasticity in ..."
Toward Full Elasticity in Distributed Static Analysis: The Case of Callgraph Analysis
Diego Garbervetsky , Edgardo Zoppi, and Benjamin Livshits (University of Buenos Aires, Argentina; Imperial College London, UK) In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository such as GitHub, BitBucket, or Visual Studio Online will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of both synthetic and real benchmarks. To show scalability, we demonstrate how the analysis presented in this paper is able to handle inputs that are almost 10 million lines of code (LOC) in size, without running out of memory. Our results show that the analysis scales well in terms of memory pressure independently of the input size, as we add more virtual machines (VMs). As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms. @InProceedings{ESEC/FSE17p442, author = {Diego Garbervetsky and Edgardo Zoppi and Benjamin Livshits}, title = {Toward Full Elasticity in Distributed Static Analysis: The Case of Callgraph Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {442--453}, doi = {}, year = {2017}, } |
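The sharded, message-passing design described above can be sketched in a single process: each "worker" owns a shard of the methods, and the analysis asks the owning worker to resolve a method's callees. The program, the sharding function, and the worklist loop are invented simplifications of the actor-based cloud deployment.

```python
# Sketch: partition call-graph construction across worker shards, echoing
# the distributed/actor design (single-process simulation, invented program).
from collections import deque

CODE = {  # method -> methods it calls
    "main": ["parse", "run"],
    "parse": ["lex"],
    "run": ["parse"],
    "lex": [],
}

def shard_of(method, n_workers):
    return hash(method) % n_workers   # each worker owns a shard of methods

def build_callgraph(entry, n_workers=3):
    workers = [dict() for _ in range(n_workers)]
    for m, callees in CODE.items():
        workers[shard_of(m, n_workers)][m] = callees
    graph, todo, seen = {}, deque([entry]), {entry}
    while todo:
        m = todo.popleft()
        callees = workers[shard_of(m, n_workers)][m]  # "message" to the owner
        graph[m] = callees
        for c in callees:
            if c not in seen:
                seen.add(c)
                todo.append(c)
    return graph

print(build_callgraph("main"))
# {'main': ['parse', 'run'], 'parse': ['lex'], 'run': ['parse'], 'lex': []}
```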
|
Llerena, Yamilet R. Serrano |
ESEC/FSE '17: "Probabilistic Model Checking ..."
Probabilistic Model Checking of Perturbed MDPs with Applications to Cloud Computing
Yamilet R. Serrano Llerena, Guoxin Su, and David S. Rosenblum (National University of Singapore, Singapore; University of Wollongong, Australia) Probabilistic model checking is a formal verification technique that has been applied successfully in a variety of domains, providing identification of system errors through quantitative verification of stochastic system models. One domain that can benefit from probabilistic model checking is cloud computing, which must provide highly reliable and secure computational and storage services to large numbers of mission-critical software systems. For real-world domains like cloud computing, external system factors and environmental changes must be estimated accurately in the form of probabilities in system models; inaccurate estimates for the model probabilities can lead to invalid verification results. To address the effects of uncertainty in probability estimates, in previous work we have developed a variety of techniques for perturbation analysis of discrete- and continuous-time Markov chains (DTMCs and CTMCs). These techniques determine the consequences of the uncertainty on verification of system properties. In this paper, we present the first approach for perturbation analysis of Markov decision processes (MDPs), a stochastic formalism that is especially popular due to the significant expressive power it provides through the combination of both probabilistic and nondeterministic choice. Our primary contribution is a novel technique for efficiently analyzing the effects of perturbations of model probabilities on verification of reachability properties of MDPs. The technique heuristically explores the space of adversaries of an MDP, which encode the different ways of resolving the MDP’s nondeterministic choices. We demonstrate the practical effectiveness of our approach by applying it to two case studies of cloud systems.
@InProceedings{ESEC/FSE17p454, author = {Yamilet R. Serrano Llerena and Guoxin Su and David S. Rosenblum}, title = {Probabilistic Model Checking of Perturbed MDPs with Applications to Cloud Computing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {454--464}, doi = {}, year = {2017}, } |
|
Lo, David |
ESEC/FSE '17: "S3: Syntax- and Semantic-Guided ..."
S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser (Singapore Management University, Singapore; IST Austria, Austria; Carnegie Mellon University, USA; Stellenbosch University, South Africa) A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with a difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) A systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) An efficient enumeration-based search strategy over the constrained search space, and (3) A number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3’s repair effectiveness with state-of-the-art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs. @InProceedings{ESEC/FSE17p593, author = {Xuan-Bach D. Le and Duc-Hiep Chu and David Lo and Claire Le Goues and Willem Visser}, title = {S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {593--604}, doi = {}, year = {2017}, } |
|
Long, Fan |
ESEC/FSE '17: "Automatic Inference of Code ..."
Automatic Inference of Code Transforms for Patch Generation
Fan Long, Peter Amidon, and Martin Rinard (Massachusetts Institute of Technology, USA; University of California at San Diego, USA) We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects. To the best of our knowledge, Genesis is the first system to automatically infer patch generation transforms or candidate patch search spaces from previous successful patches. @InProceedings{ESEC/FSE17p727, author = {Fan Long and Peter Amidon and Martin Rinard}, title = {Automatic Inference of Code Transforms for Patch Generation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {727--739}, doi = {}, year = {2017}, } Info Artifacts Functional ESEC/FSE '17: "CodeCarbonCopy ..." CodeCarbonCopy Stelios Sidiroglou-Douskos, Eric Lahtinen, Anthony Eden, Fan Long, and Martin Rinard (Massachusetts Institute of Technology, USA) We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obtain a translation between the data representation and name space of the recipient and the data representation and name space of the donor. It also implements a static analysis that identifies and removes irrelevant functionality useful in the donor but not in the recipient. We evaluate CCC on eight transfers between six applications. Our results show that CCC can successfully transfer donor functionality into recipient applications. 
@InProceedings{ESEC/FSE17p95, author = {Stelios Sidiroglou-Douskos and Eric Lahtinen and Anthony Eden and Fan Long and Martin Rinard}, title = {CodeCarbonCopy}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--105}, doi = {}, year = {2017}, } |
|
Lu, Jing |
ESEC/FSE '17: "Detecting Missing Information ..."
Detecting Missing Information in Bug Descriptions
Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus, Gabriele Bavota, and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve the quality of bug descriptions by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
|
Ma, Shiqing |
ESEC/FSE '17: "LAMP: Data Provenance for ..."
LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation
Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, and Xiangyu Zhang (Purdue University, USA; Nanjing University, China) Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation to different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program dependence based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering. @InProceedings{ESEC/FSE17p786, author = {Shiqing Ma and Yousra Aafer and Zhaogui Xu and Wen-Chuan Lee and Juan Zhai and Yingqi Liu and Xiangyu Zhang}, title = {LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {786--797}, doi = {}, year = {2017}, } |
|
Maggio, Martina |
ESEC/FSE '17: "Automated Control of Multiple ..."
Automated Control of Multiple Software Goals using Multiple Actuators
Martina Maggio, Alessandro Vittorio Papadopoulos, Antonio Filieri, and Henry Hoffmann (Lund University, Sweden; Mälardalen University, Sweden; Imperial College London, UK; University of Chicago, USA) Modern software should satisfy multiple goals simultaneously: it should provide predictable performance, be robust to failures, handle peak loads and deal seamlessly with unexpected conditions and changes in the execution environment. For this to happen, software designs should account for the possibility of runtime changes and provide formal guarantees of the software's behavior. Control theory is one of the possible design drivers for runtime adaptation, but adopting control theoretic principles often requires additional, specialized knowledge. To overcome this limitation, automated methodologies have been proposed to extract the necessary information from experimental data and design a control system for runtime adaptation. These proposals, however, only process one goal at a time, creating a chain of controllers. In this paper, we propose and evaluate the first automated strategy that takes into account multiple goals without separating them into multiple control strategies. Avoiding the separation allows us to tackle a larger class of problems and provide stronger guarantees. We test our methodology's generality with three case studies that demonstrate its broad applicability in meeting performance, reliability, quality, security, and energy goals despite environmental or requirements changes. @InProceedings{ESEC/FSE17p373, author = {Martina Maggio and Alessandro Vittorio Papadopoulos and Antonio Filieri and Henry Hoffmann}, title = {Automated Control of Multiple Software Goals using Multiple Actuators}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {373--384}, doi = {}, year = {2017}, } Info |
|
Malek, Sam |
ESEC/FSE '17: "µDroid: An Energy-Aware Mutation ..."
µDroid: An Energy-Aware Mutation Testing Framework for Android
Reyhaneh Jabbarvand and Sam Malek (University of California at Irvine, USA) The rising popularity of mobile apps deployed on battery-constrained devices underlines the need for effectively evaluating their energy properties. However, currently there is a lack of testing tools for evaluating the energy properties of apps. As a result, for energy testing, developers are relying on tests intended for evaluating the functional correctness of apps. Such tests may not be adequate for revealing energy defects and inefficiencies in apps. This paper presents an energy-aware mutation testing framework, called μDROID, that can be used by developers to assess the adequacy of their test suite for revealing energy-related defects. μDROID implements fifty energy-aware mutation operators and relies on a novel, automatic oracle to determine if a mutant can be killed by a test. Our evaluation on real-world Android apps shows the ability of the proposed mutation operators to evaluate the utility of tests in revealing energy defects. Moreover, our automated oracle can detect whether tests kill the energy mutants with an overall accuracy of 94%, thereby making it possible to apply μDROID automatically. @InProceedings{ESEC/FSE17p208, author = {Reyhaneh Jabbarvand and Sam Malek}, title = {µDroid: An Energy-Aware Mutation Testing Framework for Android}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {208--219}, doi = {}, year = {2017}, } ESEC/FSE '17: "Automatic Generation of Inter-Component ..." Automatic Generation of Inter-Component Communication Exploits for Android Applications Joshua Garcia, Mahmoud Hammad, Negar Ghorbani, and Sam Malek (University of California at Irvine, USA) Although a wide variety of approaches identify vulnerabilities in Android apps, none attempt to determine the exploitability of those vulnerabilities. Exploitability can aid in reducing false positives of vulnerability analysis, and can help engineers triage bugs.
Specifically, one of the main attack vectors of Android apps is their inter-component communication interface, where apps may receive messages called Intents. In this paper, we provide the first approach for automatically generating exploits for Android apps, called LetterBomb, relying on a combined path-sensitive symbolic execution-based static analysis, and the use of software instrumentation and test oracles. We run LetterBomb on 10,000 Android apps from Google Play, where we identify 181 exploits from 835 vulnerable apps. Compared to a state-of-the-art detection approach for three ICC-based vulnerabilities, LetterBomb identifies 33%-60% more vulnerabilities while running 6.66 to 7 times faster. @InProceedings{ESEC/FSE17p661, author = {Joshua Garcia and Mahmoud Hammad and Negar Ghorbani and Sam Malek}, title = {Automatic Generation of Inter-Component Communication Exploits for Android Applications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {661--671}, doi = {}, year = {2017}, } Info ESEC/FSE '17: "PATDroid: Permission-Aware ..." PATDroid: Permission-Aware GUI Testing of Android Alireza Sadeghi, Reyhaneh Jabbarvand, and Sam Malek (University of California at Irvine, USA) The recent introduction of a dynamic permission system in Android, allowing the users to grant and revoke permissions after the installation of an app, has made it harder to properly test apps. Since an app's behavior may change depending on the granted permissions, it needs to be tested under a wide range of permission combinations. At the state-of-the-art, in the absence of any automated tool support, a developer needs to either manually determine the interaction of tests and app permissions, or exhaustively re-execute tests for all possible permission combinations, thereby increasing the time and resources required to test apps.
This paper presents an automated approach, called PATDroid, for efficiently testing an Android app while taking the impact of permissions on its behavior into account. PATDroid performs a hybrid program analysis on both an app under test and its test suite to determine which tests should be executed on what permission combinations. Our experimental results show that PATDroid significantly reduces the testing effort, yet achieves comparable code coverage and fault detection capability as exhaustively testing an app under all permission combinations. @InProceedings{ESEC/FSE17p220, author = {Alireza Sadeghi and Reyhaneh Jabbarvand and Sam Malek}, title = {PATDroid: Permission-Aware GUI Testing of Android}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {220--232}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Maoz, Shahar |
ESEC/FSE '17: "A Symbolic Justice Violations ..."
A Symbolic Justice Violations Transition System for Unrealizable GR(1) Specifications
Aviv Kuvent, Shahar Maoz, and Jan Oliver Ringert (Tel Aviv University, Israel) One of the main challenges of reactive synthesis, an automated procedure to obtain a correct-by-construction reactive system, is to deal with unrealizable specifications. Existing approaches to deal with unrealizability, in the context of GR(1), an expressive assume-guarantee fragment of LTL that enables efficient synthesis, include the generation of concrete counter-strategies and the computation of an unrealizable core. Although correct, such approaches produce large and complicated counter-strategies, often containing thousands of states. This hinders their use by engineers. In this work we present the Justice Violations Transition System (JVTS), a novel symbolic representation of counter-strategies for GR(1). The JVTS is much smaller and simpler than its corresponding concrete counter-strategy. Moreover, it is annotated with invariants that explain how the counter-strategy forces the system to violate the specification. We compute the JVTS symbolically, and thus more efficiently, without the expensive enumeration of concrete states. Finally, we provide the JVTS with an on-demand interactive concrete and symbolic play. We implemented our work, validated its correctness, and evaluated it on 14 unrealizable specifications of autonomous Lego robots as well as on benchmarks from the literature. The evaluation shows not only that the JVTS is in most cases much smaller than the corresponding concrete counter-strategy, but also that its computation is faster. @InProceedings{ESEC/FSE17p362, author = {Aviv Kuvent and Shahar Maoz and Jan Oliver Ringert}, title = {A Symbolic Justice Violations Transition System for Unrealizable GR(1) Specifications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {362--372}, doi = {}, year = {2017}, } Info |
|
Marcus, Andrian |
ESEC/FSE '17: "Detecting Missing Information ..."
Detecting Missing Information in Bug Descriptions
Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus, Gabriele Bavota, and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve the quality of bug descriptions by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
|
Mariani, Leonardo |
ESEC/FSE '17: "BDCI: Behavioral Driven Conflict ..."
BDCI: Behavioral Driven Conflict Identification
Fabrizio Pastore, Leonardo Mariani, and Daniela Micucci (University of Milano-Bicocca, Italy) Source Code Management (SCM) systems support software evolution by providing features, such as version control, branching, and conflict detection. Despite the presence of these features, support to parallel software development is often limited. SCM systems can only address a subset of the conflicts that might be introduced by developers when concurrently working on multiple parallel branches. In fact, SCM systems can detect textual conflicts, which are generated by the concurrent modification of the same program locations, but they are unable to detect higher-order conflicts, which are generated by the concurrent modification of different program locations that generate program misbehaviors once merged. Higher-order conflicts are painful to detect and expensive to fix because they may originate from the interference of apparently unrelated changes. In this paper we present Behavioral Driven Conflict Identification (BDCI), a novel approach to conflict detection. BDCI moves the analysis of conflicts from the source code level to the level of program behavior by generating and comparing behavioral models. The analysis based on behavioral models can reveal interfering changes as soon as they are introduced in the SCM system, even if they do not introduce any textual conflict. To evaluate the effectiveness and the cost of the proposed approach, we developed BDCIf, a specific instance of BDCI dedicated to the detection of higher-order conflicts related to the functional behavior of a program. The evidence collected by analyzing multiple versions of Git and Redis suggests that BDCIf can effectively detect higher-order conflicts and report how changes might interfere.
@InProceedings{ESEC/FSE17p570, author = {Fabrizio Pastore and Leonardo Mariani and Daniela Micucci}, title = {BDCI: Behavioral Driven Conflict Identification}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {570--581}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Marinov, Darko |
ESEC/FSE '17: "Trade-Offs in Continuous Integration: ..."
Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility
Michael Hilton, Nicholas Nelson, Timothy Tunnell, Darko Marinov, and Danny Dig (Oregon State University, USA; University of Illinois at Urbana-Champaign, USA) Continuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI being a widely used activity in software engineering, we do not know what motivates developers to use CI, and what barriers and unmet needs they face. Without such knowledge, developers make easily avoidable errors, tool builders invest in the wrong direction, and researchers miss opportunities for improving the practice of CI. We present a qualitative study of the barriers and needs developers face when using CI. We conduct semi-structured interviews with developers from different industries and development scales. We triangulate our findings by running two surveys. We find that developers face trade-offs between speed and certainty (Assurance), between better access and information security (Security), and between more configuration options and greater ease of use (Flexibility). We present implications of these trade-offs for developers, tool builders, and researchers. @InProceedings{ESEC/FSE17p197, author = {Michael Hilton and Nicholas Nelson and Timothy Tunnell and Darko Marinov and Danny Dig}, title = {Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {197--207}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Martie, Lee |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Support for Iteration on Code Search
Lee Martie, André van der Hoek, and Thomas Kwak (University of California at Irvine, USA) Sometimes, when programmers use a search engine they know more or less what they need. Other times, programmers use the search engine to look around and generate possible ideas for the programming problem they are working on. The key insight we explore in this paper is that the results found in the latter case tend to serve as inspiration or triggers for the next queries issued. We introduce two search engines, CodeExchange and CodeLikeThis, both of which are specifically designed to enable the user to directly leverage the results in formulating the next query. CodeExchange does this with a set of four features supporting the programmer to use characteristics of the results to find other code with or without those characteristics. CodeLikeThis supports simply selecting an entire result to find code that is analogous, to some degree, to that result. We evaluated how these approaches were used along with two approaches not explicitly supporting iteration, a baseline and Google, in a user study among 24 developers. We find that search engines that support using results to form the next query can improve the programmers’ search experience and different approaches to iteration can provide better experiences depending on the task. @InProceedings{ESEC/FSE17p774, author = {Lee Martie and André van der Hoek and Thomas Kwak}, title = {Understanding the Impact of Support for Iteration on Code Search}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {774--785}, doi = {}, year = {2017}, } |
|
McDaniel, Patrick |
ESEC/FSE '17: "Cimplifier: Automatically ..."
Cimplifier: Automatically Debloating Containers
Vaibhav Rastogi, Drew Davidson, Lorenzo De Carli, Somesh Jha, and Patrick McDaniel (University of Wisconsin-Madison, USA; Tala Security, USA; Colorado State University, USA; Pennsylvania State University, USA) Application containers, such as those provided by Docker, have recently gained popularity as a solution for agile and seamless software deployment. These light-weight virtualization environments run applications that are packed together with their resources and configuration information, and thus can be deployed across various software platforms. Unfortunately, the ease with which containers can be created is oftentimes a double-edged sword, encouraging the packaging of logically distinct applications, and the inclusion of a significant amount of unnecessary components, within a single container. These practices needlessly increase the container size—sometimes by orders of magnitude. They also decrease the overall security, as each included component—necessary or not—may bring in security issues of its own, and there is no isolation between multiple applications packaged within the same container image. We propose algorithms and a tool called Cimplifier, which address these concerns: given a container and simple user-defined constraints, our tool partitions it into simpler containers, which (i) are isolated from each other, only communicating as necessary, and (ii) only include enough resources to perform their functionality. Our evaluation on real-world containers demonstrates that Cimplifier preserves the original functionality, leads to reduction in image size of up to 95%, and processes even large containers in under thirty seconds. @InProceedings{ESEC/FSE17p476, author = {Vaibhav Rastogi and Drew Davidson and Lorenzo De Carli and Somesh Jha and Patrick McDaniel}, title = {Cimplifier: Automatically Debloating Containers}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {476--486}, doi = {}, year = {2017}, } |
|
Meinicke, Jens |
ESEC/FSE '17: "Is There a Mismatch between ..."
Is There a Mismatch between Real-World Feature Models and Product-Line Research?
Alexander Knüppel, Thomas Thüm, Stephan Mennicke, Jens Meinicke, and Ina Schaefer (TU Braunschweig, Germany; University of Magdeburg, Germany) Feature modeling has emerged as the de-facto standard to compactly capture the variability of a software product line. Multiple feature modeling languages have been proposed that evolved over the last decades to manage industrial-size product lines. However, less expressive languages, solely permitting require and exclude constraints, are permanently and carelessly used in product-line research. We address the problem whether those less expressive languages are sufficient for industrial product lines. We developed an algorithm to eliminate complex cross-tree constraints in a feature model, enabling the combination of tools and algorithms working with different feature model dialects in a plug-and-play manner. However, the scope of our algorithm is limited. Our evaluation on large feature models, including the Linux kernel, gives evidence that require and exclude constraints are not sufficient to express real-world feature models. Hence, we promote that research on feature models needs to consider arbitrary propositional formulas as cross-tree constraints prospectively. @InProceedings{ESEC/FSE17p291, author = {Alexander Knüppel and Thomas Thüm and Stephan Mennicke and Jens Meinicke and Ina Schaefer}, title = {Is There a Mismatch between Real-World Feature Models and Product-Line Research?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {291--302}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Meliou, Alexandra |
ESEC/FSE '17: "Fairness Testing: Testing ..."
Fairness Testing: Testing Software for Discrimination
Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou (University of Massachusetts at Amherst, USA) This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination. @InProceedings{ESEC/FSE17p498, author = {Sainyam Galhotra and Yuriy Brun and Alexandra Meliou}, title = {Fairness Testing: Testing Software for Discrimination}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {498--510}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Meng, Guozhu |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments and are developed under the time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17-31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
|
Mennicke, Stephan |
ESEC/FSE '17: "Is There a Mismatch between ..."
Is There a Mismatch between Real-World Feature Models and Product-Line Research?
Alexander Knüppel, Thomas Thüm, Stephan Mennicke, Jens Meinicke, and Ina Schaefer (TU Braunschweig, Germany; University of Magdeburg, Germany) Feature modeling has emerged as the de-facto standard to compactly capture the variability of a software product line. Multiple feature modeling languages have been proposed that evolved over the last decades to manage industrial-size product lines. However, less expressive languages, solely permitting require and exclude constraints, are permanently and carelessly used in product-line research. We address the question of whether those less expressive languages are sufficient for industrial product lines. We developed an algorithm to eliminate complex cross-tree constraints in a feature model, enabling the combination of tools and algorithms working with different feature model dialects in a plug-and-play manner. However, the scope of our algorithm is limited. Our evaluation on large feature models, including the Linux kernel, gives evidence that require and exclude constraints are not sufficient to express real-world feature models. Hence, we argue that future research on feature models should consider arbitrary propositional formulas as cross-tree constraints. @InProceedings{ESEC/FSE17p291, author = {Alexander Knüppel and Thomas Thüm and Stephan Mennicke and Jens Meinicke and Ina Schaefer}, title = {Is There a Mismatch between Real-World Feature Models and Product-Line Research?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {291--302}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Menzies, Tim |
ESEC/FSE '17: "Revisiting Unsupervised Learning ..."
Revisiting Unsupervised Learning for Defect Prediction
Wei Fu and Tim Menzies (North Carolina State University, USA) Collecting quality data from software projects can be time-consuming and expensive. Hence, some researchers explore “unsupervised” approaches to quality prediction that do not require labelled data. An alternate technique is to use “supervised” approaches that learn models from project data labelled with, say, “defective” or “not-defective”. Most researchers use these supervised models since, it is argued, they can exploit more knowledge of the projects. At FSE’16, Yang et al. reported startling results where unsupervised defect predictors outperformed supervised predictors for effort-aware just-in-time defect prediction. If confirmed, these results would lead to a dramatic simplification of a seemingly complex task (data mining) that is widely explored in the software engineering literature. This paper repeats and refutes those results as follows. (1) There is much variability in the efficacy of the Yang et al. predictors, so even with their approach, some supervised data is required to prune weaker predictors away. (2) Their findings were grouped across N projects. When we repeat their analysis on a project-by-project basis, supervised predictors are seen to work better. Even though this paper rejects the specific conclusions of Yang et al., we still endorse their general goal. In our experiments, supervised predictors did not perform outstandingly better than unsupervised ones for effort-aware just-in-time defect prediction. Hence, there may indeed be some combination of unsupervised learners that achieves comparable performance to supervised ones. We therefore encourage others to work in this promising area. @InProceedings{ESEC/FSE17p72, author = {Wei Fu and Tim Menzies}, title = {Revisiting Unsupervised Learning for Defect Prediction}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {72--83}, doi = {}, year = {2017}, } ESEC/FSE '17: "Using Bad Learners to Find ..."
Using Bad Learners to Find Good Configurations Vivek Nair, Tim Menzies, Norbert Siegmund, and Sven Apel (North Carolina State University, USA; Bauhaus-University Weimar, Germany; University of Passau, Germany) Finding the optimally performing configuration of a software system for a given setting is often challenging. Recent approaches address this challenge by learning performance models based on a sample set of configurations. However, building an accurate performance model can be very expensive (and is often infeasible in practice). The central insight of this paper is that exact performance values (e.g., the response time of a software system) are not required to rank configurations and to identify the optimal one. As shown by our experiments, performance models that are cheap to learn but inaccurate (with respect to the difference between actual and predicted performance) can still be used to rank configurations and hence find the optimal configuration. This novel rank-based approach allows us to significantly reduce the cost (in terms of the number of measurements of sample configurations) as well as the time required to build performance models. We evaluate our approach with 21 scenarios based on 9 software systems and demonstrate that our approach is beneficial in 16 scenarios; for the remaining 5 scenarios, an accurate model can be built by using very few samples anyway, without the need for a rank-based approach. @InProceedings{ESEC/FSE17p257, author = {Vivek Nair and Tim Menzies and Norbert Siegmund and Sven Apel}, title = {Using Bad Learners to Find Good Configurations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--267}, doi = {}, year = {2017}, } ESEC/FSE '17: "Easy over Hard: A Case Study ..." Easy over Hard: A Case Study on Deep Learning Wei Fu and Tim Menzies (North Carolina State University, USA) While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost.
This is particularly important for deep learning since these learners need hours (to weeks) to train the model. Such long training time limits the ability of (a)~a researcher to test the stability of their conclusion via repeated runs with different random seeds; and (b)~other researchers to repeat, improve, or even refute that original work. For example, recently, deep learning was used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That deep learning system took 14 hours to execute. We show here that a very simple optimizer called DE, applied to fine-tune an SVM, can achieve similar (and sometimes better) results. The DE approach terminated in 10 minutes, i.e., 84 times faster than the deep learning method. We offer these results as a cautionary tale to the software analytics community and suggest that not every new innovation should be applied without critical analysis. If researchers deploy some new and expensive process, that work should be baselined against some simpler and faster alternatives. @InProceedings{ESEC/FSE17p49, author = {Wei Fu and Tim Menzies}, title = {Easy over Hard: A Case Study on Deep Learning}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {49--60}, doi = {}, year = {2017}, } |
|
Mezini, Mira |
ESEC/FSE '17: "CodeMatch: Obfuscation Won't ..."
CodeMatch: Obfuscation Won't Conceal Your Repackaged App
Leonid Glanz, Sven Amann, Michael Eichberg, Michael Reif, Ben Hermann, Johannes Lerch, and Mira Mezini (TU Darmstadt, Germany) An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps, where the obfuscated code bases are very similar, do not have to be repackages of each other. To reliably detect repackaged apps, we propose a two-step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app’s library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used. This makes it possible to identify repackaged apps. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages. @InProceedings{ESEC/FSE17p638, author = {Leonid Glanz and Sven Amann and Michael Eichberg and Michael Reif and Ben Hermann and Johannes Lerch and Mira Mezini}, title = {CodeMatch: Obfuscation Won't Conceal Your Repackaged App}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {638--648}, doi = {}, year = {2017}, } Info |
|
Micucci, Daniela |
ESEC/FSE '17: "BDCI: Behavioral Driven Conflict ..."
BDCI: Behavioral Driven Conflict Identification
Fabrizio Pastore, Leonardo Mariani, and Daniela Micucci (University of Milano-Bicocca, Italy) Source Code Management (SCM) systems support software evolution by providing features, such as version control, branching, and conflict detection. Despite the presence of these features, support to parallel software development is often limited. SCM systems can only address a subset of the conflicts that might be introduced by developers when concurrently working on multiple parallel branches. In fact, SCM systems can detect textual conflicts, which are generated by the concurrent modification of the same program locations, but they are unable to detect higher-order conflicts, which are generated by the concurrent modification of different program locations that generate program misbehaviors once merged. Higher-order conflicts are painful to detect and expensive to fix because they might be originated by the interference of apparently unrelated changes. In this paper we present Behavioral Driven Conflict Identification (BDCI), a novel approach to conflict detection. BDCI moves the analysis of conflicts from the source code level to the level of program behavior by generating and comparing behavioral models. The analysis based on behavioral models can reveal interfering changes as soon as they are introduced in the SCM system, even if they do not introduce any textual conflict. To evaluate the effectiveness and the cost of the proposed approach, we developed BDCIf, a specific instance of BDCI dedicated to the detection of higher-order conflicts related to the functional behavior of a program. The evidence collected by analyzing multiple versions of Git and Redis suggests that BDCIf can effectively detect higher-order conflicts and report how changes might interfere. 
@InProceedings{ESEC/FSE17p570, author = {Fabrizio Pastore and Leonardo Mariani and Daniela Micucci}, title = {BDCI: Behavioral Driven Conflict Identification}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {570--581}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Milicevic, Aleksandar |
ESEC/FSE '17: "Regression Test Selection ..."
Regression Test Selection Across JVM Boundaries
Ahmet Celik, Marko Vasic, Aleksandar Milicevic, and Milos Gligoric (University of Texas at Austin, USA; Microsoft, USA) Modern software development processes recommend that changes be integrated into the main development line of a project multiple times a day. Before a new revision may be integrated, developers practice regression testing to ensure that the latest changes do not break any previously established functionality. The cost of regression testing is high, due to an increase in the number of revisions that are introduced per day, as well as the number of tests developers write per revision. Regression test selection (RTS) optimizes regression testing by skipping tests that are not affected by recent project changes. Existing dynamic RTS techniques support only projects written in a single programming language, which is unfortunate, given that an open-source project is on average written in several programming languages. We present the first dynamic RTS technique that does not stop at predefined language boundaries. Our technique dynamically detects, at the operating system level, all file artifacts a test depends on. Our technique is, hence, oblivious to the specific means the test uses to actually access the files: be it through spawning a new process, invoking a system call, invoking a library written in a different language, invoking a library that spawns a process which makes a system call, etc. We also provide a set of extension points which allow for a smooth integration with testing frameworks and build systems. We implemented our technique in a tool called RTSLinux as a loadable Linux kernel module and evaluated it on 21 Java projects that escape the JVM by spawning new processes or invoking native code, totaling 2,050,791 lines of code. Our results show that RTSLinux, on average, skips 74.17% of tests and saves 52.83% of test execution time compared to executing all tests.
@InProceedings{ESEC/FSE17p809, author = {Ahmet Celik and Marko Vasic and Aleksandar Milicevic and Milos Gligoric}, title = {Regression Test Selection Across JVM Boundaries}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {809--820}, doi = {}, year = {2017}, } |
|
Mockus, Audris |
ESEC/FSE '17: "On the Scalability of Linux ..."
On the Scalability of Linux Kernel Maintainers' Work
Minghui Zhou , Qingying Chen, Audris Mockus, and Fengguang Wu (Peking University, China; University of Tennessee, USA; Intel, China) Open source software ecosystems evolve ways to balance the workload among groups of participants ranging from core groups to peripheral groups. As ecosystems grow, it is not clear whether the mechanisms that previously made them work will continue to be relevant or whether new mechanisms will need to evolve. The impact of failure for critical ecosystems such as Linux is enormous, yet the understanding of why they function and are effective is limited. We, therefore, aim to understand how the Linux kernel sustains its growth, how to characterize the workload of maintainers, and whether or not the existing mechanisms are scalable. We quantify maintainers’ work through the files that are maintained, and the change activity and the numbers of contributors in those files. We find systematic differences among modules; these differences are stable over time, which suggests that certain architectural features, commercial interests, or module-specific practices lead to distinct sustainable equilibria. We find that most of the modules have not grown appreciably over the last decade; most growth has been absorbed by a few modules. We also find that the effort per maintainer does not increase, even though the community has hypothesized that required effort might increase. However, the distribution of work among maintainers is highly unbalanced, suggesting that a few maintainers may experience increasing workload. We find that the practice of assigning multiple maintainers to a file yields only a power of 1/2 increase in productivity. We expect that our proposed framework to quantify maintainer practices will help clarify the factors that allow rapidly growing ecosystems to be sustainable. 
@InProceedings{ESEC/FSE17p27, author = {Minghui Zhou and Qingying Chen and Audris Mockus and Fengguang Wu}, title = {On the Scalability of Linux Kernel Maintainers' Work}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {27--37}, doi = {}, year = {2017}, } Info |
|
Mongiovi, Melina |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to more frequently introduce long-living smells instead of eliminating existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases.
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Moran, Kevin |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped into 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identified a set of 38 mutation operators, and implemented an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info |
|
Moreno, Laura |
ESEC/FSE '17: "Detecting Missing Information ..."
Detecting Missing Information in Bug Descriptions
Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus , Gabriele Bavota , and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve bug descriptions quality by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
|
Mujahid, Suhaib |
ESEC/FSE '17: "Why Do Developers Use Trivial ..."
Why Do Developers Use Trivial Packages? An Empirical Case Study on npm
Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (Concordia University, Canada) Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call `trivial packages'. A recent incident where a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix made it imperative to question the growing use of trivial packages. Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about maintenance and the risks of breakage due to the extra dependencies trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers' beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be `deployment tested' and to have similar test, usage, and community interest as non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use. @InProceedings{ESEC/FSE17p385, author = {Rabe Abdalkareem and Olivier Nourry and Sultan Wehaibi and Suhaib Mujahid and Emad Shihab}, title = {Why Do Developers Use Trivial Packages? An Empirical Case Study on npm}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {385--395}, doi = {}, year = {2017}, } |
|
Murali, Vijayaraghavan |
ESEC/FSE '17: "Bayesian Specification Learning ..."
Bayesian Specification Learning for Finding API Usage Errors
Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine (Rice University, USA) We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a "reference distribution" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches. @InProceedings{ESEC/FSE17p151, author = {Vijayaraghavan Murali and Swarat Chaudhuri and Chris Jermaine}, title = {Bayesian Specification Learning for Finding API Usage Errors}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {151--162}, doi = {}, year = {2017}, } |
|
Musa, Betim |
ESEC/FSE '17: "Craig vs. Newton in Software ..."
Craig vs. Newton in Software Model Checking
Daniel Dietsch, Matthias Heizmann, Betim Musa, Alexander Nutz, and Andreas Podelski (University of Freiburg, Germany) Ever since the seminal work on SLAM and BLAST, software model checking with counterexample-guided abstraction refinement (CEGAR) has been an active topic of research. The crucial procedure here is to analyze a sequence of program statements (the counterexample) to find building blocks for the overall proof of the program. We can distinguish two approaches (which we name Craig and Newton) to implement the procedure. The historically first approach, Newton (named after the tool from the SLAM toolkit), is based on symbolic execution. The second approach, Craig, is based on Craig interpolation. It was widely believed that Craig is substantially more effective than Newton. In fact, 12 out of the 15 CEGAR-based tools in SV-COMP are based on Craig. Advances in software model checkers based on Craig, however, can advance only in lockstep with advances in SMT solvers with Craig interpolation. It may be time to revisit Newton and ask whether Newton can be as effective as Craig. We have implemented a total of 11 variants of Craig and Newton in two different state-of-the-art software model checking tools and present the outcome of our experimental comparison. @InProceedings{ESEC/FSE17p487, author = {Daniel Dietsch and Matthias Heizmann and Betim Musa and Alexander Nutz and Andreas Podelski}, title = {Craig vs. Newton in Software Model Checking}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {487--497}, doi = {}, year = {2017}, } |
|
Myers, Margaret |
ESEC/FSE '17: "Finding Near-Optimal Configurations ..."
Finding Near-Optimal Configurations in Product Lines by Random Sampling
Jeho Oh, Don Batory, Margaret Myers, and Norbert Siegmund (University of Texas at Austin, USA; Bauhaus-University Weimar, Germany) Software Product Lines (SPLs) are highly configurable systems. This raises the challenge to find optimal performing configurations for an anticipated workload. As SPL configuration spaces are huge, it is infeasible to benchmark all configurations to find an optimal one. Prior work focused on building performance models to predict and optimize SPL configurations. Instead, we randomly sample and recursively search a configuration space directly to find near-optimal configurations without constructing a prediction model. Our algorithms are simpler and have higher accuracy and efficiency. @InProceedings{ESEC/FSE17p61, author = {Jeho Oh and Don Batory and Margaret Myers and Norbert Siegmund}, title = {Finding Near-Optimal Configurations in Product Lines by Random Sampling}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {61--71}, doi = {}, year = {2017}, } |
|
Nagarakatte, Santosh |
ESEC/FSE '17: "A Fast Causal Profiler for ..."
A Fast Causal Profiler for Task Parallel Programs
Adarsh Yoga and Santosh Nagarakatte (Rutgers University, USA) This paper proposes TASKPROF, a profiler that identifies parallelism bottlenecks in task parallel programs. It leverages the structure of a task parallel execution to perform fine-grained attribution of work to various parts of the program. TASKPROF’s use of hardware performance counters to perform fine-grained measurements minimizes perturbation. TASKPROF’s profile execution runs in parallel using multiple cores. TASKPROF’s causal profile enables users to estimate improvements in parallelism when a region of code is optimized even when concrete optimizations are not yet known. We have used TASKPROF to isolate parallelism bottlenecks in twenty-three applications that use the Intel Threading Building Blocks library. We have designed parallelization techniques in five applications to increase parallelism by an order of magnitude using TASKPROF. Our user study indicates that developers are able to isolate performance bottlenecks with ease using TASKPROF. @InProceedings{ESEC/FSE17p15, author = {Adarsh Yoga and Santosh Nagarakatte}, title = {A Fast Causal Profiler for Task Parallel Programs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {15--26}, doi = {}, year = {2017}, } Artifacts Functional |
|
Nair, Vivek |
ESEC/FSE '17: "Using Bad Learners to Find ..."
Using Bad Learners to Find Good Configurations
Vivek Nair, Tim Menzies, Norbert Siegmund, and Sven Apel (North Carolina State University, USA; Bauhaus-University Weimar, Germany; University of Passau, Germany) Finding the optimally performing configuration of a software system for a given setting is often challenging. Recent approaches address this challenge by learning performance models based on a sample set of configurations. However, building an accurate performance model can be very expensive (and is often infeasible in practice). The central insight of this paper is that exact performance values (e.g., the response time of a software system) are not required to rank configurations and to identify the optimal one. As shown by our experiments, performance models that are cheap to learn but inaccurate (with respect to the difference between actual and predicted performance) can still be used to rank configurations and hence find the optimal configuration. This novel rank-based approach allows us to significantly reduce the cost (in terms of the number of measurements of sample configurations) as well as the time required to build performance models. We evaluate our approach with 21 scenarios based on 9 software systems and demonstrate that our approach is beneficial in 16 scenarios; for the remaining 5 scenarios, an accurate model can be built by using very few samples anyway, without the need for a rank-based approach. @InProceedings{ESEC/FSE17p257, author = {Vivek Nair and Tim Menzies and Norbert Siegmund and Sven Apel}, title = {Using Bad Learners to Find Good Configurations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--267}, doi = {}, year = {2017}, } |
|
Nam, Jaechang |
ESEC/FSE '17: "QTEP: Quality-Aware Test Case ..."
QTEP: Quality-Aware Test Case Prioritization
Song Wang, Jaechang Nam, and Lin Tan (University of Waterloo, Canada) Test case prioritization (TCP) is a practical activity in software testing for exposing faults earlier. Researchers have proposed many TCP techniques to reorder test cases. Among them, coverage-based TCPs have been widely investigated. Specifically, coverage-based TCP approaches leverage coverage information between source code and test cases, i.e., static code coverage and dynamic code coverage, to schedule test cases. Existing coverage-based TCP techniques mainly focus on maximizing coverage but often do not consider the likely distribution of faults in source code. However, software faults are often not equally distributed in source code, e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults. In this paper, we present a quality-aware test case prioritization technique, QTEP, to address the limitation of existing coverage-based TCP algorithms. In QTEP, we leverage code inspection techniques, i.e., a typical statistical defect prediction model and a typical static bug finder, to detect fault-prone source code and then adapt existing coverage-based TCP algorithms by considering the weighted source code in terms of fault-proneness. Our evaluation with 16 variant QTEP techniques on 33 different versions of 7 open source Java projects shows that QTEP could improve existing coverage-based TCP techniques for both regression and new test cases. Specifically, the improvement of the best variant of QTEP for regression test cases could be up to 15.0% and on average 7.6%, and for all test cases (both regression and new test cases), the improvement could be up to 10.0% and on average 5.0%.
@InProceedings{ESEC/FSE17p523, author = {Song Wang and Jaechang Nam and Lin Tan}, title = {QTEP: Quality-Aware Test Case Prioritization}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {523--534}, doi = {}, year = {2017}, } Info |
|
Nelson, Nicholas |
ESEC/FSE '17: "Trade-Offs in Continuous Integration: ..."
Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility
Michael Hilton, Nicholas Nelson, Timothy Tunnell, Darko Marinov, and Danny Dig (Oregon State University, USA; University of Illinois at Urbana-Champaign, USA) Continuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI being a widely used activity in software engineering, we do not know what motivates developers to use CI, and what barriers and unmet needs they face. Without such knowledge, developers make easily avoidable errors, tool builders invest in the wrong direction, and researchers miss opportunities for improving the practice of CI. We present a qualitative study of the barriers and needs developers face when using CI. We conduct semi-structured interviews with developers from different industries and development scales. We triangulate our findings by running two surveys. We find that developers face trade-offs between speed and certainty (Assurance), between better access and information security (Security), and between more configuration options and greater ease of use (Flexibility). We present implications of these trade-offs for developers, tool builders, and researchers. @InProceedings{ESEC/FSE17p197, author = {Michael Hilton and Nicholas Nelson and Timothy Tunnell and Darko Marinov and Danny Dig}, title = {Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {197--207}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Nelson, Tim |
ESEC/FSE '17: "The Power of "Why" ..."
The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance
Tim Nelson, Natasha Danas, Daniel J. Dougherty, and Shriram Krishnamurthi (Brown University, USA; Worcester Polytechnic Institute, USA) Scenario-finding tools like the Alloy Analyzer are widely used in numerous concrete domains like security, network analysis, UML analysis, and so on. They can help to verify properties and, more generally, aid in exploring a system's behavior. While scenario finders are valuable for their ability to produce concrete examples, individual scenarios only give insight into what is possible, leaving the user to make their own conclusions about what might be necessary. This paper enriches scenario finding by allowing users to ask ``why?'' and ``why not?'' questions about the examples they are given. We show how to distinguish parts of an example that cannot be consistently removed (or changed) from those that merely reflect underconstraint in the specification. In the former case we show how to determine which elements of the specification and which other components of the example together explain the presence of such facts. This paper formalizes the act of computing provenance in scenario-finding. We present Amalgam, an extension of the popular Alloy scenario-finder, which implements these foundations and provides interactive exploration of examples. We also evaluate Amalgam's algorithmics on a variety of both textbook and real-world examples. @InProceedings{ESEC/FSE17p106, author = {Tim Nelson and Natasha Danas and Daniel J. Dougherty and Shriram Krishnamurthi}, title = {The Power of "Why" and "Why Not": Enriching Scenario Exploration with Provenance}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {106--116}, doi = {}, year = {2017}, } Info Artifacts Reusable Best-Paper Award |
|
Ng, Vincent |
ESEC/FSE '17: "Detecting Missing Information ..."
Detecting Missing Information in Bug Descriptions
Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus, Gabriele Bavota, and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve bug description quality by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
|
Nguyen, ThanhVu |
ESEC/FSE '17: "Counterexample-Guided Approach ..."
Counterexample-Guided Approach to Finding Numerical Invariants
ThanhVu Nguyen, Timos Antonopoulos, Andrew Ruef, and Michael Hicks (University of Nebraska-Lincoln, USA; Yale University, USA; University of Maryland, USA) Numerical invariants, e.g., relationships among numerical variables in a program, represent a useful class of properties to analyze programs. General polynomial invariants represent more complex numerical relations, and they are often required in many scientific and engineering applications. We present NumInv, a tool that implements a counterexample-guided invariant generation (CEGIR) technique to automatically discover numerical invariants, which are polynomial equality and inequality relations among numerical variables. This CEGIR technique infers candidate invariants from program traces and then checks them against the program source code using the KLEE test-input generation tool. If the invariants are incorrect, KLEE returns counterexample traces, which help the dynamic inference obtain better results. Existing CEGIR approaches often require sound invariants; NumInv, however, sacrifices soundness and produces results that KLEE cannot refute within certain time bounds. This design and the use of KLEE as a verifier allow NumInv to discover useful and important numerical invariants for many challenging programs. Preliminary results show that NumInv generates required invariants for understanding and verifying correctness of programs involving complex arithmetic. We also show that NumInv discovers polynomial invariants that capture precise complexity bounds of programs used to benchmark existing static complexity analysis techniques. Finally, we show that NumInv performs competitively compared to state-of-the-art numerical invariant analysis tools.
@InProceedings{ESEC/FSE17p605, author = {ThanhVu Nguyen and Timos Antonopoulos and Andrew Ruef and Michael Hicks}, title = {Counterexample-Guided Approach to Finding Numerical Invariants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {605--615}, doi = {}, year = {2017}, } |
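The CEGIR loop the abstract describes can be sketched in miniature, with random testing standing in for KLEE as the (deliberately unsound) checker; the toy program and the hand-written candidate templates are illustrative assumptions, not NumInv's actual inference:

```python
import random

random.seed(0)

def program(x):
    return x * x  # the hidden relation at this program point: out == x**2

# Candidate invariant templates over (x, out); a real tool would enumerate
# polynomial terms rather than rely on hand-written guesses.
candidates = {
    "out == x": lambda x, out: out == x,
    "out == 2*x": lambda x, out: out == 2 * x,
    "out == x**2": lambda x, out: out == x ** 2,
}

# 1. Infer: keep the candidates consistent with an initial set of traces.
traces = [(x, program(x)) for x in (0, 1)]
alive = {name: f for name, f in candidates.items()
         if all(f(x, out) for x, out in traces)}

# 2. Check: hunt for counterexample traces that refute surviving candidates
#    (random testing plays the role that KLEE plays in the paper).
for _ in range(100):
    x = random.randint(-50, 50)
    cex = (x, program(x))
    alive = {name: f for name, f in alive.items() if f(*cex)}

print(sorted(alive))  # candidates the checker could not refute
```

As in NumInv, what survives is not proved sound: it is merely whatever the checker failed to refute within its budget.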
|
Nourry, Olivier |
ESEC/FSE '17: "Why Do Developers Use Trivial ..."
Why Do Developers Use Trivial Packages? An Empirical Case Study on npm
Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (Concordia University, Canada) Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call `trivial packages'. A recent incident where a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix made it imperative to question the growing use of trivial packages. Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about maintenance and the risk of breakage due to the extra dependencies that trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers' beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be `deployment tested' and to have similar test, usage, and community interest as non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use. @InProceedings{ESEC/FSE17p385, author = {Rabe Abdalkareem and Olivier Nourry and Sultan Wehaibi and Suhaib Mujahid and Emad Shihab}, title = {Why Do Developers Use Trivial Packages? 
An Empirical Case Study on npm}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {385--395}, doi = {}, year = {2017}, } |
|
Nuseibeh, Bashar |
ESEC/FSE '17: "On Evidence Preservation Requirements ..."
On Evidence Preservation Requirements for Forensic-Ready Systems
Dalal Alrajeh, Liliana Pasquale, and Bashar Nuseibeh (Imperial College London, UK; University College Dublin, Ireland; Open University, UK; Lero, Ireland) Forensic readiness denotes the capability of a system to support digital forensic investigations of potential, known incidents by preserving in advance data that could serve as evidence explaining how an incident occurred. Given the increasing rate at which (potentially criminal) incidents occur, designing software systems that are forensic-ready can facilitate and reduce the costs of digital forensic investigations. However, to date, little or no attention has been given to how forensic-ready software systems can be designed systematically. In this paper we propose to explicitly represent evidence preservation requirements prescribing preservation of the minimal amount of data that would be relevant to a future digital investigation. We formalise evidence preservation requirements and propose an approach for synthesising specifications for systems to meet these requirements. We present our prototype implementation—based on a satisfiability solver and a logic-based learner—which we use to evaluate our approach, applying it to two digital forensic corpora. Our evaluation suggests that our approach preserves relevant data that could support hypotheses of potential incidents. Moreover, it enables significant reduction in the volume of data that would need to be examined during an investigation. @InProceedings{ESEC/FSE17p559, author = {Dalal Alrajeh and Liliana Pasquale and Bashar Nuseibeh}, title = {On Evidence Preservation Requirements for Forensic-Ready Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {559--569}, doi = {}, year = {2017}, } |
|
Nutz, Alexander |
ESEC/FSE '17: "Craig vs. Newton in Software ..."
Craig vs. Newton in Software Model Checking
Daniel Dietsch, Matthias Heizmann, Betim Musa, Alexander Nutz, and Andreas Podelski (University of Freiburg, Germany) Ever since the seminal work on SLAM and BLAST, software model checking with counterexample-guided abstraction refinement (CEGAR) has been an active topic of research. The crucial procedure here is to analyze a sequence of program statements (the counterexample) to find building blocks for the overall proof of the program. We can distinguish two approaches (which we name Craig and Newton) to implement the procedure. The historically first approach, Newton (named after the tool from the SLAM toolkit), is based on symbolic execution. The second approach, Craig, is based on Craig interpolation. It was widely believed that Craig is substantially more effective than Newton. In fact, 12 out of the 15 CEGAR-based tools in SV-COMP are based on Craig. Advances in software model checkers based on Craig, however, can go only in lockstep with advances in SMT solvers with Craig interpolation. It may be time to revisit Newton and ask whether Newton can be as effective as Craig. We have implemented a total of 11 variants of Craig and Newton in two different state-of-the-art software model checking tools and present the outcome of our experimental comparison. @InProceedings{ESEC/FSE17p487, author = {Daniel Dietsch and Matthias Heizmann and Betim Musa and Alexander Nutz and Andreas Podelski}, title = {Craig vs. Newton in Software Model Checking}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {487--497}, doi = {}, year = {2017}, } |
|
Oh, Jeho |
ESEC/FSE '17: "Finding Near-Optimal Configurations ..."
Finding Near-Optimal Configurations in Product Lines by Random Sampling
Jeho Oh, Don Batory, Margaret Myers, and Norbert Siegmund (University of Texas at Austin, USA; Bauhaus-University Weimar, Germany) Software Product Lines (SPLs) are highly configurable systems. This raises the challenge to find optimal performing configurations for an anticipated workload. As SPL configuration spaces are huge, it is infeasible to benchmark all configurations to find an optimal one. Prior work focused on building performance models to predict and optimize SPL configurations. Instead, we randomly sample and recursively search a configuration space directly to find near-optimal configurations without constructing a prediction model. Our algorithms are simpler and have higher accuracy and efficiency. @InProceedings{ESEC/FSE17p61, author = {Jeho Oh and Don Batory and Margaret Myers and Norbert Siegmund}, title = {Finding Near-Optimal Configurations in Product Lines by Random Sampling}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {61--71}, doi = {}, year = {2017}, } |
|
Papadopoulos, Alessandro Vittorio |
ESEC/FSE '17: "Automated Control of Multiple ..."
Automated Control of Multiple Software Goals using Multiple Actuators
Martina Maggio, Alessandro Vittorio Papadopoulos, Antonio Filieri, and Henry Hoffmann (Lund University, Sweden; Mälardalen University, Sweden; Imperial College London, UK; University of Chicago, USA) Modern software should satisfy multiple goals simultaneously: it should provide predictable performance, be robust to failures, handle peak loads and deal seamlessly with unexpected conditions and changes in the execution environment. For this to happen, software designs should account for the possibility of runtime changes and provide formal guarantees of the software's behavior. Control theory is one of the possible design drivers for runtime adaptation, but adopting control theoretic principles often requires additional, specialized knowledge. To overcome this limitation, automated methodologies have been proposed to extract the necessary information from experimental data and design a control system for runtime adaptation. These proposals, however, only process one goal at a time, creating a chain of controllers. In this paper, we propose and evaluate the first automated strategy that takes into account multiple goals without separating them into multiple control strategies. Avoiding the separation allows us to tackle a larger class of problems and provide stronger guarantees. We test our methodology's generality with three case studies that demonstrate its broad applicability in meeting performance, reliability, quality, security, and energy goals despite environmental or requirements changes. @InProceedings{ESEC/FSE17p373, author = {Martina Maggio and Alessandro Vittorio Papadopoulos and Antonio Filieri and Henry Hoffmann}, title = {Automated Control of Multiple Software Goals using Multiple Actuators}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {373--384}, doi = {}, year = {2017}, } Info |
|
Parnin, Chris |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Pasquale, Liliana |
ESEC/FSE '17: "On Evidence Preservation Requirements ..."
On Evidence Preservation Requirements for Forensic-Ready Systems
Dalal Alrajeh, Liliana Pasquale, and Bashar Nuseibeh (Imperial College London, UK; University College Dublin, Ireland; Open University, UK; Lero, Ireland) Forensic readiness denotes the capability of a system to support digital forensic investigations of potential, known incidents by preserving in advance data that could serve as evidence explaining how an incident occurred. Given the increasing rate at which (potentially criminal) incidents occur, designing software systems that are forensic-ready can facilitate and reduce the costs of digital forensic investigations. However, to date, little or no attention has been given to how forensic-ready software systems can be designed systematically. In this paper we propose to explicitly represent evidence preservation requirements prescribing preservation of the minimal amount of data that would be relevant to a future digital investigation. We formalise evidence preservation requirements and propose an approach for synthesising specifications for systems to meet these requirements. We present our prototype implementation—based on a satisfiability solver and a logic-based learner—which we use to evaluate our approach, applying it to two digital forensic corpora. Our evaluation suggests that our approach preserves relevant data that could support hypotheses of potential incidents. Moreover, it enables significant reduction in the volume of data that would need to be examined during an investigation. @InProceedings{ESEC/FSE17p559, author = {Dalal Alrajeh and Liliana Pasquale and Bashar Nuseibeh}, title = {On Evidence Preservation Requirements for Forensic-Ready Systems}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {559--569}, doi = {}, year = {2017}, } |
|
Pastore, Fabrizio |
ESEC/FSE '17: "BDCI: Behavioral Driven Conflict ..."
BDCI: Behavioral Driven Conflict Identification
Fabrizio Pastore, Leonardo Mariani, and Daniela Micucci (University of Milano-Bicocca, Italy) Source Code Management (SCM) systems support software evolution by providing features, such as version control, branching, and conflict detection. Despite the presence of these features, support for parallel software development is often limited. SCM systems can only address a subset of the conflicts that might be introduced by developers when concurrently working on multiple parallel branches. In fact, SCM systems can detect textual conflicts, which are generated by the concurrent modification of the same program locations, but they are unable to detect higher-order conflicts, which are generated by the concurrent modification of different program locations that generate program misbehaviors once merged. Higher-order conflicts are painful to detect and expensive to fix because they might originate from the interference of apparently unrelated changes. In this paper we present Behavioral Driven Conflict Identification (BDCI), a novel approach to conflict detection. BDCI moves the analysis of conflicts from the source code level to the level of program behavior by generating and comparing behavioral models. The analysis based on behavioral models can reveal interfering changes as soon as they are introduced in the SCM system, even if they do not introduce any textual conflict. To evaluate the effectiveness and the cost of the proposed approach, we developed BDCIf, a specific instance of BDCI dedicated to the detection of higher-order conflicts related to the functional behavior of a program. The evidence collected by analyzing multiple versions of Git and Redis suggests that BDCIf can effectively detect higher-order conflicts and report how changes might interfere.
@InProceedings{ESEC/FSE17p570, author = {Fabrizio Pastore and Leonardo Mariani and Daniela Micucci}, title = {BDCI: Behavioral Driven Conflict Identification}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {570--581}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Pattabiraman, Karthik |
ESEC/FSE '17: "ARTINALI: Dynamic Invariant ..."
ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security
Maryam Raiyat Aliabadi, Amita Ajith Kamath, Julien Gascon-Samson, and Karthik Pattabiraman (University of British Columbia, Canada; National Institute of Technology Karnataka, India) Cyber-Physical Systems (CPSes) are being widely deployed in security-critical scenarios such as smart homes and medical devices. Unfortunately, the connectedness of these systems and their relative lack of security measures make them ripe targets for attacks. Specification-based Intrusion Detection Systems (IDS) have been shown to be effective for securing CPSes. Unfortunately, deriving invariants for capturing the specifications of CPSes is a tedious and error-prone process. Therefore, it is important to dynamically monitor a CPS to learn its common behaviors and formulate invariants for detecting security attacks. Existing techniques for invariant mining only incorporate data and events, but not time. However, time is central to most CPSes, and hence incorporating time, in addition to data and events, is essential for achieving low false positives and false negatives. This paper proposes ARTINALI, which mines dynamic system properties by incorporating time as a first-class property of the system. We build ARTINALI-based Intrusion Detection Systems (IDSes) for two CPSes, namely smart meters and smart medical devices, and measure their efficacy. We find that the ARTINALI-based IDSes significantly reduce the ratio of false positives and false negatives by 16 to 48% (average 30.75%) and 89 to 95% (average 93.4%) respectively over other dynamic invariant detection tools. @InProceedings{ESEC/FSE17p349, author = {Maryam Raiyat Aliabadi and Amita Ajith Kamath and Julien Gascon-Samson and Karthik Pattabiraman}, title = {ARTINALI: Dynamic Invariant Detection for Cyber-Physical System Security}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {349--361}, doi = {}, year = {2017}, } |
|
Peitek, Norman |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Pezzè, Mauro |
ESEC/FSE '17: "Reproducing Concurrency Failures ..."
Reproducing Concurrency Failures from Crash Stacks
Francesco A. Bianchi, Mauro Pezzè, and Valerio Terragni (University of Lugano, Switzerland) Reproducing field failures is the first essential step for understanding, localizing and removing faults. Reproducing concurrency field failures is hard due to the need to synthesize a test code jointly with a thread interleaving that induces the failure in the presence of limited information from the field. Current techniques for reproducing concurrency failures focus on identifying failure-inducing interleavings, leaving largely open the problem of synthesizing the test code that manifests such interleavings. In this paper, we present ConCrash, a technique to automatically generate test codes that reproduce concurrency failures that violate thread-safety from crash stacks, which commonly summarize the conditions of field failures. ConCrash efficiently explores the huge space of possible test codes to identify a failure-inducing one by using a suitable set of search pruning strategies. Combined with existing techniques for exploring interleavings, ConCrash automatically reproduces a given concurrency failure that violates the thread-safety of a class by identifying both a failure-inducing test code and corresponding interleaving. In the paper, we define the ConCrash approach, present a prototype implementation of ConCrash, and discuss the experimental results that we obtained on a known set of ten field failures that witness the effectiveness of the approach. @InProceedings{ESEC/FSE17p705, author = {Francesco A. Bianchi and Mauro Pezzè and Valerio Terragni}, title = {Reproducing Concurrency Failures from Crash Stacks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {705--716}, doi = {}, year = {2017}, } |
|
Philippsen, Michael |
ESEC/FSE '17: "More Accurate Recommendations ..."
More Accurate Recommendations for Method-Level Changes
Georg Dotzler, Marius Kamp, Patrick Kreutzer, and Michael Philippsen (Friedrich-Alexander University Erlangen-Nürnberg, Germany) During the life span of large software projects, developers often apply the same code changes to different code locations in slight variations. Since the application of these changes to all locations is time-consuming and error-prone, tools exist that learn change patterns from input examples, search for possible pattern applications, and generate corresponding recommendations. In many cases, the generated recommendations are syntactically or semantically wrong due to code movements in the input examples. Thus, they are of low accuracy and developers cannot directly copy them into their projects without adjustments. We present the Accurate REcommendation System (ARES) that achieves a higher accuracy than other tools because its algorithms take care of code movements when creating patterns and recommendations. On average, the recommendations by ARES have an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives. At the same time ARES achieves precision and recall values that are on par with other tools. @InProceedings{ESEC/FSE17p798, author = {Georg Dotzler and Marius Kamp and Patrick Kreutzer and Michael Philippsen}, title = {More Accurate Recommendations for Method-Level Changes}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {798--808}, doi = {}, year = {2017}, } Info |
|
Podelski, Andreas |
ESEC/FSE '17: "Craig vs. Newton in Software ..."
Craig vs. Newton in Software Model Checking
Daniel Dietsch, Matthias Heizmann, Betim Musa, Alexander Nutz, and Andreas Podelski (University of Freiburg, Germany) Ever since the seminal work on SLAM and BLAST, software model checking with counterexample-guided abstraction refinement (CEGAR) has been an active topic of research. The crucial procedure here is to analyze a sequence of program statements (the counterexample) to find building blocks for the overall proof of the program. We can distinguish two approaches (which we name Craig and Newton) to implement the procedure. The historically first approach, Newton (named after the tool from the SLAM toolkit), is based on symbolic execution. The second approach, Craig, is based on Craig interpolation. It was widely believed that Craig is substantially more effective than Newton. In fact, 12 out of the 15 CEGAR-based tools in SV-COMP are based on Craig. Advances in software model checkers based on Craig, however, can go only lockstep with advances in SMT solvers with Craig interpolation. It may be time to revisit Newton and ask whether Newton can be as effective as Craig. We have implemented a total of 11 variants of Craig and Newton in two different state-of-the-art software model checking tools and present the outcome of our experimental comparison. @InProceedings{ESEC/FSE17p487, author = {Daniel Dietsch and Matthias Heizmann and Betim Musa and Alexander Nutz and Andreas Podelski}, title = {Craig vs. Newton in Software Model Checking}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {487--497}, doi = {}, year = {2017}, } |
|
Poggi, Giovanni |
ESEC/FSE '17: "Automatically Analyzing Groups ..."
Automatically Analyzing Groups of Crashes for Finding Correlations
Marco Castelluccio, Carlo Sansone, Luisa Verdoliva, and Giovanni Poggi (Federico II University of Naples, Italy; Mozilla, UK) We devised an algorithm, inspired by contrast-set mining algorithms such as STUCCO, to automatically find statistically significant properties (correlations) in crash groups. Many earlier works focused on improving the clustering of crashes but, to the best of our knowledge, the problem of automatically describing properties of a cluster of crashes is so far unexplored. This means developers currently spend a fair amount of time analyzing the groups themselves, which in turn means that a) they are not spending their time actually developing a fix for the crash; and b) they might miss something in their exploration of the crash data (there is a large number of attributes in crash reports and it is hard and error-prone to manually analyze everything). Our algorithm helps developers and release managers understand crash reports more easily and in an automated way, helping in pinpointing the root cause of the crash. The tool implementing the algorithm has been deployed on Mozilla's crash reporting service. @InProceedings{ESEC/FSE17p717, author = {Marco Castelluccio and Carlo Sansone and Luisa Verdoliva and Giovanni Poggi}, title = {Automatically Analyzing Groups of Crashes for Finding Correlations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {717--726}, doi = {}, year = {2017}, } |
|
Poshyvanyk, Denys |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped in 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identify a set of 38 mutation operators and implement an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated, as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info |
|
Pu, Geguang |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments, and are developed under time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show that (1) the models produced by Stoat cover 17-31% more code than those by existing modeling tools; and (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
|
Rajamani, Sriram |
ESEC/FSE '17: "A Compiler and Verifier for ..."
A Compiler and Verifier for Page Access Oblivious Computation
Rohit Sinha, Sriram Rajamani , and Sanjit A. Seshia (University of California at Berkeley, USA; Microsoft Research, India) Trusted hardware primitives such as Intel's SGX instructions provide applications with a protected address space, called an enclave, for trusted code and data. However, building enclaves that preserve confidentiality of sensitive data continues to be a challenge. The developer must not only avoid leaking secrets via the enclave's outputs but also prevent leaks via side channels induced by interactions with the untrusted platform. Recent attacks have demonstrated that simply observing the page faults incurred during an enclave's execution can reveal its secrets if the enclave makes data accesses or control flow decisions based on secret values. To address this problem, a developer needs compilers to automatically produce confidential programs, and verification tools to certify the absence of secret-dependent page access patterns (a property that we formalize as page-access obliviousness). To that end, we implement an efficient compiler for a type and memory-safe language, a compiler pass that enforces page-access obliviousness with low runtime overheads, and an automatic, modular verifier that certifies page-access obliviousness at the machine-code level, thus removing the compiler from our trusted computing base. We evaluate this toolchain on several machine learning algorithms and image processing routines that we run within SGX enclaves. @InProceedings{ESEC/FSE17p649, author = {Rohit Sinha and Sriram Rajamani and Sanjit A. Seshia}, title = {A Compiler and Verifier for Page Access Oblivious Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {649--660}, doi = {}, year = {2017}, } |
|
Rastogi, Vaibhav |
ESEC/FSE '17: "Cimplifier: Automatically ..."
Cimplifier: Automatically Debloating Containers
Vaibhav Rastogi, Drew Davidson, Lorenzo De Carli, Somesh Jha, and Patrick McDaniel (University of Wisconsin-Madison, USA; Tala Security, USA; Colorado State University, USA; Pennsylvania State University, USA) Application containers, such as those provided by Docker, have recently gained popularity as a solution for agile and seamless software deployment. These light-weight virtualization environments run applications that are packaged together with their resources and configuration information, and thus can be deployed across various software platforms. Unfortunately, the ease with which containers can be created is oftentimes a double-edged sword, encouraging the packaging of logically distinct applications, and the inclusion of significant amounts of unnecessary components, within a single container. These practices needlessly increase the container size—sometimes by orders of magnitude. They also decrease the overall security, as each included component—necessary or not—may bring in security issues of its own, and there is no isolation between multiple applications packaged within the same container image. We propose algorithms and a tool called Cimplifier, which address these concerns: given a container and simple user-defined constraints, our tool partitions it into simpler containers, which (i) are isolated from each other, only communicating as necessary, and (ii) only include enough resources to perform their functionality. Our evaluation on real-world containers demonstrates that Cimplifier preserves the original functionality, leads to reductions in image size of up to 95%, and processes even large containers in under thirty seconds. @InProceedings{ESEC/FSE17p476, author = {Vaibhav Rastogi and Drew Davidson and Lorenzo De Carli and Somesh Jha and Patrick McDaniel}, title = {Cimplifier: Automatically Debloating Containers}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {476--486}, doi = {}, year = {2017}, } |
|
Ray, Baishakhi |
ESEC/FSE '17: "Automatically Diagnosing and ..."
Automatically Diagnosing and Repairing Error Handling Bugs in C
Yuchi Tian and Baishakhi Ray (University of Virginia, USA) Correct error handling is essential for building reliable and secure systems. Unfortunately, low-level languages like C often do not support any error handling primitives and leave it up to the developers to create their own mechanisms for error propagation and handling. However, in practice, the developers often make mistakes while writing the repetitive and tedious error handling code and inadvertently introduce bugs. Such error handling bugs often have severe consequences undermining the security and reliability of the affected systems. Fixing these bugs is also tiring—they are repetitive and cumbersome to implement. Therefore, it is crucial to develop tool support for automatically detecting and fixing error handling bugs. To understand the nature of error handling bugs that occur in widely used C programs, we conduct a comprehensive study of real world error handling bugs and their fixes. Leveraging this knowledge, we then design, implement, and evaluate ErrDoc, a tool that not only detects and characterizes different types of error handling bugs but also automatically fixes them. Our evaluation on five open-source projects shows that ErrDoc can detect error handling bugs with 84% to 100% precision and around 95% recall, and categorize them with 83% to 96% precision and above 90% recall. Thus, ErrDoc improves precision by up to 5 percentage points, and recall by up to 44 percentage points, w.r.t. the state-of-the-art. We also demonstrate that ErrDoc can fix the bugs with high accuracy. @InProceedings{ESEC/FSE17p752, author = {Yuchi Tian and Baishakhi Ray}, title = {Automatically Diagnosing and Repairing Error Handling Bugs in C}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {752--762}, doi = {}, year = {2017}, } Best-Paper Award |
|
Reif, Michael |
ESEC/FSE '17: "CodeMatch: Obfuscation Won't ..."
CodeMatch: Obfuscation Won't Conceal Your Repackaged App
Leonid Glanz, Sven Amann, Michael Eichberg, Michael Reif, Ben Hermann, Johannes Lerch, and Mira Mezini (TU Darmstadt, Germany) An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps whose obfuscated code bases are very similar do not have to be repackages of each other. To reliably detect repackaged apps, we propose a two-step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app’s library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used. Using our approach, we found that ≈ 15% of all apps in Android app stores are repackages. @InProceedings{ESEC/FSE17p638, author = {Leonid Glanz and Sven Amann and Michael Eichberg and Michael Reif and Ben Hermann and Johannes Lerch and Mira Mezini}, title = {CodeMatch: Obfuscation Won't Conceal Your Repackaged App}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {638--648}, doi = {}, year = {2017}, } Info |
|
Reps, Thomas |
ESEC/FSE '17: "The Care and Feeding of Wild-Caught ..."
The Care and Feeding of Wild-Caught Mutants
David Bingham Brown, Michael Vaughn, Ben Liblit, and Thomas Reps (University of Wisconsin-Madison, USA) Mutation testing of a test suite and a program provides a way to measure the quality of the test suite. In essence, mutation testing is a form of sensitivity testing: by running mutated versions of the program against the test suite, mutation testing measures the suite’s sensitivity for detecting bugs that a programmer might introduce into the program. This paper introduces a technique to improve mutation testing that we call wild-caught mutants; it provides a method for creating potential faults that are more closely coupled with changes made by actual programmers. This technique allows the mutation tester to have more certainty that the test suite is sensitive to the kind of changes that have been observed to have been made by programmers in real-world cases. @InProceedings{ESEC/FSE17p511, author = {David Bingham Brown and Michael Vaughn and Ben Liblit and Thomas Reps}, title = {The Care and Feeding of Wild-Caught Mutants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {511--522}, doi = {}, year = {2017}, } Video Info Artifacts Reusable |
|
Ribeiro, Márcio |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to introduce long-living smells more frequently than they eliminate existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases. @InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Rinard, Martin |
ESEC/FSE '17: "Automatic Inference of Code ..."
Automatic Inference of Code Transforms for Patch Generation
Fan Long, Peter Amidon, and Martin Rinard (Massachusetts Institute of Technology, USA; University of California at San Diego, USA) We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects. To the best of our knowledge, Genesis is the first system to automatically infer patch generation transforms or candidate patch search spaces from previous successful patches. @InProceedings{ESEC/FSE17p727, author = {Fan Long and Peter Amidon and Martin Rinard}, title = {Automatic Inference of Code Transforms for Patch Generation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {727--739}, doi = {}, year = {2017}, } Info Artifacts Functional ESEC/FSE '17: "CodeCarbonCopy ..." CodeCarbonCopy Stelios Sidiroglou-Douskos, Eric Lahtinen, Anthony Eden, Fan Long, and Martin Rinard (Massachusetts Institute of Technology, USA) We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obtain a translation between the data representation and name space of the recipient and the data representation and name space of the donor. It also implements a static analysis that identifies and removes irrelevant functionality useful in the donor but not in the recipient. We evaluate CCC on eight transfers between six applications. Our results show that CCC can successfully transfer donor functionality into recipient applications. @InProceedings{ESEC/FSE17p95, author = {Stelios Sidiroglou-Douskos and Eric Lahtinen and Anthony Eden and Fan Long and Martin Rinard}, title = {CodeCarbonCopy}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--105}, doi = {}, year = {2017}, } |
|
Ringert, Jan Oliver |
ESEC/FSE '17: "A Symbolic Justice Violations ..."
A Symbolic Justice Violations Transition System for Unrealizable GR(1) Specifications
Aviv Kuvent, Shahar Maoz , and Jan Oliver Ringert (Tel Aviv University, Israel) One of the main challenges of reactive synthesis, an automated procedure to obtain a correct-by-construction reactive system, is to deal with unrealizable specifications. Existing approaches to deal with unrealizability, in the context of GR(1), an expressive assume-guarantee fragment of LTL that enables efficient synthesis, include the generation of concrete counter-strategies and the computation of an unrealizable core. Although correct, such approaches produce large and complicated counter-strategies, often containing thousands of states. This hinders their use by engineers. In this work we present the Justice Violations Transition System (JVTS), a novel symbolic representation of counter-strategies for GR(1). The JVTS is much smaller and simpler than its corresponding concrete counter-strategy. Moreover, it is annotated with invariants that explain how the counter-strategy forces the system to violate the specification. We compute the JVTS symbolically, and thus more efficiently, without the expensive enumeration of concrete states. Finally, we provide the JVTS with an on-demand interactive concrete and symbolic play. We implemented our work, validated its correctness, and evaluated it on 14 unrealizable specifications of autonomous Lego robots as well as on benchmarks from the literature. The evaluation shows not only that the JVTS is in most cases much smaller than the corresponding concrete counter-strategy, but also that its computation is faster. @InProceedings{ESEC/FSE17p362, author = {Aviv Kuvent and Shahar Maoz and Jan Oliver Ringert}, title = {A Symbolic Justice Violations Transition System for Unrealizable GR(1) Specifications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {362--372}, doi = {}, year = {2017}, } Info |
|
Rosenblum, David S. |
ESEC/FSE '17: "Probabilistic Model Checking ..."
Probabilistic Model Checking of Perturbed MDPs with Applications to Cloud Computing
Yamilet R. Serrano Llerena, Guoxin Su, and David S. Rosenblum (National University of Singapore, Singapore; University of Wollongong, Australia) Probabilistic model checking is a formal verification technique that has been applied successfully in a variety of domains, providing identification of system errors through quantitative verification of stochastic system models. One domain that can benefit from probabilistic model checking is cloud computing, which must provide highly reliable and secure computational and storage services to large numbers of mission-critical software systems. For real-world domains like cloud computing, external system factors and environmental changes must be estimated accurately in the form of probabilities in system models; inaccurate estimates for the model probabilities can lead to invalid verification results. To address the effects of uncertainty in probability estimates, in previous work we have developed a variety of techniques for perturbation analysis of discrete- and continuous-time Markov chains (DTMCs and CTMCs). These techniques determine the consequences of the uncertainty on verification of system properties. In this paper, we present the first approach for perturbation analysis of Markov decision processes (MDPs), a stochastic formalism that is especially popular due to the significant expressive power it provides through the combination of both probabilistic and nondeterministic choice. Our primary contribution is a novel technique for efficiently analyzing the effects of perturbations of model probabilities on verification of reachability properties of MDPs. The technique heuristically explores the space of adversaries of an MDP, which encode the different ways of resolving the MDP’s nondeterministic choices. We demonstrate the practical effectiveness of our approach by applying it to two case studies of cloud systems. @InProceedings{ESEC/FSE17p454, author = {Yamilet R. Serrano Llerena and Guoxin Su and David S. Rosenblum}, title = {Probabilistic Model Checking of Perturbed MDPs with Applications to Cloud Computing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {454--464}, doi = {}, year = {2017}, } |
|
Rosner, Nicolás |
ESEC/FSE '17: "Constraint Normalization and ..."
Constraint Normalization and Parameterized Caching for Quantitative Program Analysis
Tegan Brennan, Nestan Tsiskaridze, Nicolás Rosner, Abdulbaki Aydin, and Tevfik Bultan (University of California at Santa Barbara, USA) Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial for efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure under which the cardinality of the solution set of a constraint, but not necessarily the solution set itself, is preserved. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. A group-theoretic framework which generalizes earlier results on constraint normalization is used to express our normalization techniques. We also present a parameterized caching approach where, in addition to storing the result of a model-counting query, we also store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew can normalize the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieve a 10x speedup on the SMC/Kaluza-Big dataset, and an average 3x speedup in our SPF-based side-channel analysis experiments. @InProceedings{ESEC/FSE17p535, author = {Tegan Brennan and Nestan Tsiskaridze and Nicolás Rosner and Abdulbaki Aydin and Tevfik Bultan}, title = {Constraint Normalization and Parameterized Caching for Quantitative Program Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {535--546}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Roy, Subhajit |
ESEC/FSE '17: "Synergistic Debug-Repair of ..."
Synergistic Debug-Repair of Heap Manipulations
Sahil Verma and Subhajit Roy (IIT Kanpur, India) We present Wolverine, an integrated Debug-Repair environment for heap manipulating programs. Wolverine facilitates stepping through a concrete program execution, provides visualizations of the abstract program states (as box-and-arrow diagrams) and integrates a novel, proof-directed repair algorithm to synthesize repair patches. To provide a seamless environment, Wolverine supports "hot-patching" of the generated repair patches, enabling the programmer to continue the debug session without requiring an abort-compile-debug cycle. We also propose new debug-repair possibilities, "specification refinement" and "specification slicing" made possible by Wolverine. We evaluate our framework on 1600 buggy programs (generated using fault injection) on a variety of data-structures like singly, doubly and circular linked-lists, Binary Search Trees, AVL trees, Red-Black trees and Splay trees; Wolverine could repair all the buggy instances within reasonable time (less than 5 sec in most cases). We also evaluate Wolverine on 247 (buggy) student submissions; Wolverine could repair more than 80% of programs where the student had made a reasonable attempt. @InProceedings{ESEC/FSE17p163, author = {Sahil Verma and Subhajit Roy}, title = {Synergistic Debug-Repair of Heap Manipulations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {163--173}, doi = {}, year = {2017}, } |
|
Roychoudhury, Abhik |
ESEC/FSE '17: "A Feasibility Study of Using ..."
A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments
Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roychoudhury (Innopolis University, Russia; IIT Kanpur, India; National University of Singapore, Singapore) Despite the fact that intelligent tutoring systems for programming (ITSP) education have long attracted interest, their widespread use has been hindered by the difficulty of generating personalized feedback automatically. Meanwhile, automated program repair (APR) is an emerging new technology that automatically fixes software bugs, and it has been shown that APR can fix the bugs of large real-world software. In this paper, we study the feasibility of marrying intelligent programming tutoring and APR. We perform our feasibility study with four state-of-the-art APR tools (GenProg, AE, Angelix, and Prophet), and 661 programs written by the students taking an introductory programming course. We found that when APR tools are used out of the box, only about 30% of the programs in our dataset are repaired. This low repair rate is largely due to the student programs often being significantly incorrect — in contrast, professional software for which APR was successfully applied typically fails only a small portion of tests. To bridge this gap, we adopt in APR a new repair policy akin to the hint generation policy employed in existing ITSP. This new repair policy admits partial repairs that address part of the failing tests, which results in an 84% improvement of the repair rate. We also performed a user study with 263 novice students and 37 graders, and identified an understudied problem: while novice students do not seem to know how to effectively make use of generated repairs as hints, the graders do seem to gain benefits from repairs. @InProceedings{ESEC/FSE17p740, author = {Jooyong Yi and Umair Z. Ahmed and Amey Karkare and Shin Hwei Tan and Abhik Roychoudhury}, title = {A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {740--751}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Ruef, Andrew |
ESEC/FSE '17: "Counterexample-Guided Approach ..."
Counterexample-Guided Approach to Finding Numerical Invariants
ThanhVu Nguyen , Timos Antonopoulos , Andrew Ruef, and Michael Hicks (University of Nebraska-Lincoln, USA; Yale University, USA; University of Maryland, USA) Numerical invariants, e.g., relationships among numerical variables in a program, represent a useful class of properties to analyze programs. General polynomial invariants represent more complex numerical relations, but they are often required in many scientific and engineering applications. We present NumInv, a tool that implements a counterexample-guided invariant generation (CEGIR) technique to automatically discover numerical invariants, which are polynomial equality and inequality relations among numerical variables. This CEGIR technique infers candidate invariants from program traces and then checks them against the program source code using the KLEE test-input generation tool. If the invariants are incorrect KLEE returns counterexample traces, which help the dynamic inference obtain better results. Existing CEGIR approaches often require sound invariants, however NumInv sacrifices soundness and produces results that KLEE cannot refute within certain time bounds. This design and the use of KLEE as a verifier allow NumInv to discover useful and important numerical invariants for many challenging programs. Preliminary results show that NumInv generates required invariants for understanding and verifying correctness of programs involving complex arithmetic. We also show that NumInv discovers polynomial invariants that capture precise complexity bounds of programs used to benchmark existing static complexity analysis techniques. Finally, we show that NumInv performs competitively comparing to state of the art numerical invariant analysis tools. 
@InProceedings{ESEC/FSE17p605, author = {ThanhVu Nguyen and Timos Antonopoulos and Andrew Ruef and Michael Hicks}, title = {Counterexample-Guided Approach to Finding Numerical Invariants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {605--615}, doi = {}, year = {2017}, } |
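The CEGIR loop the abstract describes can be sketched in miniature: infer candidate invariants that hold on the observed traces, ask a checker for counterexamples, and feed any counterexamples back into the inference. The sketch below is a deliberate simplification: the program, the small-coefficient linear templates, and the random-testing "checker" (standing in for KLEE) are all invented for illustration.

```python
import random

def program(x):
    # toy program under analysis; the hidden invariant is y == 2*x + 3
    return 2 * x + 3

def infer(traces):
    # dynamic-inference stand-in: enumerate small-coefficient templates
    # a*x + b*y + c == 0 that hold on every observed trace
    return [(a, b, c)
            for a in range(-3, 4) for b in range(-3, 4) for c in range(-5, 6)
            if (a, b, c) != (0, 0, 0)
            and all(a * x + b * y + c == 0 for x, y in traces)]

def check(cand, trials=500):
    # verifier stand-in (NumInv uses KLEE): return a counterexample or None
    a, b, c = cand
    for _ in range(trials):
        x = random.randint(-100, 100)
        y = program(x)
        if a * x + b * y + c != 0:
            return (x, y)
    return None

# CEGIR loop: refute candidates with counterexamples until none remain
traces = [(x, program(x)) for x in range(3)]
while True:
    cexs = [cx for k in infer(traces) if (cx := check(k))]
    if not cexs:
        break
    traces.extend(cexs)
confirmed = infer(traces)
```

With the seed traces x = 0, 1, 2, the surviving candidates all encode y == 2*x + 3 (up to scaling).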
|
Sadeghi, Alireza |
ESEC/FSE '17: "PATDroid: Permission-Aware ..."
PATDroid: Permission-Aware GUI Testing of Android
Alireza Sadeghi, Reyhaneh Jabbarvand, and Sam Malek (University of California at Irvine, USA) Recent introduction of a dynamic permission system in Android, allowing the users to grant and revoke permissions after the installation of an app, has made it harder to properly test apps. Since an app's behavior may change depending on the granted permissions, it needs to be tested under a wide range of permission combinations. At the state-of-the-art, in the absence of any automated tool support, a developer needs to either manually determine the interaction of tests and app permissions, or exhaustively re-execute tests for all possible permission combinations, thereby increasing the time and resources required to test apps. This paper presents an automated approach, called PATDroid, for efficiently testing an Android app while taking the impact of permissions on its behavior into account. PATDroid performs a hybrid program analysis on both an app under test and its test suite to determine which tests should be executed on what permission combinations. Our experimental results show that PATDroid significantly reduces the testing effort, yet achieves comparable code coverage and fault detection capability as exhaustively testing an app under all permission combinations. @InProceedings{ESEC/FSE17p220, author = {Alireza Sadeghi and Reyhaneh Jabbarvand and Sam Malek}, title = {PATDroid: Permission-Aware GUI Testing of Android}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {220--232}, doi = {}, year = {2017}, } Info Artifacts Functional |
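The saving PATDroid aims for can be illustrated with a back-of-the-envelope sketch. The test names, permissions, and the test-to-permission mapping below are invented; PATDroid derives that mapping with its hybrid program analysis rather than by hand.

```python
from itertools import chain, combinations

def powerset(perms):
    # all subsets of a permission set
    return [set(c) for c in chain.from_iterable(
        combinations(sorted(perms), r) for r in range(len(perms) + 1))]

PERMS = {"CAMERA", "LOCATION", "CONTACTS"}

# which permissions each test actually exercises (hypothetical mapping,
# which PATDroid would compute via hybrid analysis of app and test suite)
uses = {
    "test_take_photo": {"CAMERA"},
    "test_geotag":     {"CAMERA", "LOCATION"},
    "test_about_page": set(),
}

# exhaustive: every test under every permission combination
exhaustive = {(t, frozenset(c)) for t in uses for c in powerset(PERMS)}
# reduced: each test only under combinations of the permissions it uses
reduced = {(t, frozenset(c)) for t, ps in uses.items() for c in powerset(ps)}

assert len(exhaustive) == 24   # 3 tests x 2^3 permission combinations
assert len(reduced) == 7       # 2 + 4 + 1 runs suffice
```

Even in this three-permission toy, the analysis cuts 24 test executions down to 7; real apps with more permissions benefit correspondingly more.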
|
Sansone, Carlo |
ESEC/FSE '17: "Automatically Analyzing Groups ..."
Automatically Analyzing Groups of Crashes for Finding Correlations
Marco Castelluccio, Carlo Sansone, Luisa Verdoliva, and Giovanni Poggi (Federico II University of Naples, Italy; Mozilla, UK) We devised an algorithm, inspired by contrast-set mining algorithms such as STUCCO, to automatically find statistically significant properties (correlations) in crash groups. Many earlier works focused on improving the clustering of crashes but, to the best of our knowledge, the problem of automatically describing the properties of a cluster of crashes is so far unexplored. This means developers currently spend a fair amount of time analyzing the groups themselves, which in turn means that a) they are not spending their time actually developing a fix for the crash; and b) they might miss something in their exploration of the crash data (crash reports contain a large number of attributes, and manually analyzing all of them is hard and error-prone). Our algorithm helps developers and release managers understand crash reports more easily and in an automated way, helping to pinpoint the root cause of the crash. The tool implementing the algorithm has been deployed on Mozilla's crash reporting service. @InProceedings{ESEC/FSE17p717, author = {Marco Castelluccio and Carlo Sansone and Luisa Verdoliva and Giovanni Poggi}, title = {Automatically Analyzing Groups of Crashes for Finding Correlations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {717--726}, doi = {}, year = {2017}, } |
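The core idea of contrast-set mining over crash groups can be sketched in a few lines: flag attribute=value pairs whose relative frequency in one crash group differs notably from the remaining crashes. This sketch uses a plain support-difference threshold, whereas STUCCO additionally applies a chi-square significance test; the crash attributes below are invented.

```python
from collections import Counter

def contrast_sets(group, rest, min_diff=0.2):
    # count attribute=value pairs inside the group and in all other crashes,
    # then report the pairs whose relative frequencies diverge
    g, r = Counter(), Counter()
    for crash in group:
        g.update(crash.items())
    for crash in rest:
        r.update(crash.items())
    return [(key, g[key] / len(group), r[key] / len(rest))
            for key in sorted(set(g) | set(r))
            if abs(g[key] / len(group) - r[key] / len(rest)) >= min_diff]

# hypothetical crash reports: attribute dicts, grouped by crash signature
group = [{"gpu": "VendorA", "os": "10.0"}] * 8 + [{"gpu": "VendorB", "os": "10.0"}] * 2
rest  = [{"gpu": "VendorA", "os": "10.0"}] * 3 + [{"gpu": "VendorB", "os": "10.0"}] * 7

hits = contrast_sets(group, rest)
```

Here `("gpu", "VendorA")` stands out (80% inside the group versus 30% elsewhere), while `("os", "10.0")` is filtered out because it is equally common in both populations.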
|
Schaefer, Ina |
ESEC/FSE '17: "Is There a Mismatch between ..."
Is There a Mismatch between Real-World Feature Models and Product-Line Research?
Alexander Knüppel, Thomas Thüm, Stephan Mennicke, Jens Meinicke, and Ina Schaefer (TU Braunschweig, Germany; University of Magdeburg, Germany) Feature modeling has emerged as the de-facto standard to compactly capture the variability of a software product line. Multiple feature modeling languages have been proposed that evolved over the last decades to manage industrial-size product lines. However, less expressive languages, solely permitting require and exclude constraints, are routinely, and often carelessly, used in product-line research. We address the question of whether these less expressive languages are sufficient for industrial product lines. We developed an algorithm to eliminate complex cross-tree constraints in a feature model, enabling the combination of tools and algorithms working with different feature model dialects in a plug-and-play manner. However, the scope of our algorithm is limited. Our evaluation on large feature models, including the Linux kernel, gives evidence that require and exclude constraints are not sufficient to express real-world feature models. Hence, we argue that research on feature models needs to consider arbitrary propositional formulas as cross-tree constraints in the future. @InProceedings{ESEC/FSE17p291, author = {Alexander Knüppel and Thomas Thüm and Stephan Mennicke and Jens Meinicke and Ina Schaefer}, title = {Is There a Mismatch between Real-World Feature Models and Product-Line Research?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {291--302}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Seshia, Sanjit A. |
ESEC/FSE '17: "A Compiler and Verifier for ..."
A Compiler and Verifier for Page Access Oblivious Computation
Rohit Sinha, Sriram Rajamani , and Sanjit A. Seshia (University of California at Berkeley, USA; Microsoft Research, India) Trusted hardware primitives such as Intel's SGX instructions provide applications with a protected address space, called an enclave, for trusted code and data. However, building enclaves that preserve confidentiality of sensitive data continues to be a challenge. The developer must not only avoid leaking secrets via the enclave's outputs but also prevent leaks via side channels induced by interactions with the untrusted platform. Recent attacks have demonstrated that simply observing the page faults incurred during an enclave's execution can reveal its secrets if the enclave makes data accesses or control flow decisions based on secret values. To address this problem, a developer needs compilers to automatically produce confidential programs, and verification tools to certify the absence of secret-dependent page access patterns (a property that we formalize as page-access obliviousness). To that end, we implement an efficient compiler for a type and memory-safe language, a compiler pass that enforces page-access obliviousness with low runtime overheads, and an automatic, modular verifier that certifies page-access obliviousness at the machine-code level, thus removing the compiler from our trusted computing base. We evaluate this toolchain on several machine learning algorithms and image processing routines that we run within SGX enclaves. @InProceedings{ESEC/FSE17p649, author = {Rohit Sinha and Sriram Rajamani and Sanjit A. Seshia}, title = {A Compiler and Verifier for Page Access Oblivious Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {649--660}, doi = {}, year = {2017}, } |
|
Shihab, Emad |
ESEC/FSE '17: "Why Do Developers Use Trivial ..."
Why Do Developers Use Trivial Packages? An Empirical Case Study on npm
Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (Concordia University, Canada) Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call `trivial packages'. A recent incident where a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix made it imperative to question the growing use of trivial packages. Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about the maintenance effort and the risks of breakage due to the extra dependencies trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers' beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be `deployment tested' and to have test, usage, and community interest similar to non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use. @InProceedings{ESEC/FSE17p385, author = {Rabe Abdalkareem and Olivier Nourry and Sultan Wehaibi and Suhaib Mujahid and Emad Shihab}, title = {Why Do Developers Use Trivial Packages? An Empirical Case Study on npm}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {385--395}, doi = {}, year = {2017}, } |
|
Sidiroglou-Douskos, Stelios |
ESEC/FSE '17: "CodeCarbonCopy ..."
CodeCarbonCopy
Stelios Sidiroglou-Douskos, Eric Lahtinen, Anthony Eden, Fan Long, and Martin Rinard (Massachusetts Institute of Technology, USA) We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with donor functionality, identified by the developer, and an insertion point in the recipient, also identified by the developer. CCC uses paired executions of the donor and recipient on the same input file to obtain a translation between the data representation and name space of the recipient and those of the donor. It also implements a static analysis that identifies and removes functionality that is useful in the donor but irrelevant in the recipient. We evaluate CCC on eight transfers between six applications. Our results show that CCC can successfully transfer donor functionality into recipient applications. @InProceedings{ESEC/FSE17p95, author = {Stelios Sidiroglou-Douskos and Eric Lahtinen and Anthony Eden and Fan Long and Martin Rinard}, title = {CodeCarbonCopy}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {95--105}, doi = {}, year = {2017}, } |
|
Siegmund, Janet |
ESEC/FSE '17: "Measuring Neural Efficiency ..."
Measuring Neural Efficiency of Program Comprehension
Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann (University of Passau, Germany; Leibniz Institute for Neurobiology, Germany; North Carolina State University, USA; Carnegie Mellon University, USA; Microsoft Research, USA) Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as ``beacons,'' for locating relevant code, and ``plans,'' for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension. @InProceedings{ESEC/FSE17p140, author = {Janet Siegmund and Norman Peitek and Chris Parnin and Sven Apel and Johannes Hofmeister and Christian Kästner and Andrew Begel and Anja Bethmann and André Brechmann}, title = {Measuring Neural Efficiency of Program Comprehension}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {140--150}, doi = {}, year = {2017}, } Info |
|
Siegmund, Norbert |
ESEC/FSE '17: "Finding Near-Optimal Configurations ..."
Finding Near-Optimal Configurations in Product Lines by Random Sampling
Jeho Oh, Don Batory, Margaret Myers, and Norbert Siegmund (University of Texas at Austin, USA; Bauhaus-University Weimar, Germany) Software Product Lines (SPLs) are highly configurable systems. This raises the challenge to find optimal performing configurations for an anticipated workload. As SPL configuration spaces are huge, it is infeasible to benchmark all configurations to find an optimal one. Prior work focused on building performance models to predict and optimize SPL configurations. Instead, we randomly sample and recursively search a configuration space directly to find near-optimal configurations without constructing a prediction model. Our algorithms are simpler and have higher accuracy and efficiency. @InProceedings{ESEC/FSE17p61, author = {Jeho Oh and Don Batory and Margaret Myers and Norbert Siegmund}, title = {Finding Near-Optimal Configurations in Product Lines by Random Sampling}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {61--71}, doi = {}, year = {2017}, } ESEC/FSE '17: "Attributed Variability Models: ..." Attributed Variability Models: Outside the Comfort Zone Norbert Siegmund, Stefan Sobernig, and Sven Apel (Bauhaus-University Weimar, Germany; WU Vienna, Austria; University of Passau, Germany) Variability models are often enriched with attributes, such as performance, that encode the influence of features on the respective attribute. In spite of their importance, there are only few attributed variability models available that have attribute values obtained from empirical, real-world observations and that cover interactions between features. But, what does it mean for research and practice when staying in the comfort zone of developing algorithms and tools in a setting where artificial attribute values are used and where interactions are neglected? This is the central question that we want to answer here. 
To leave the comfort zone, we use a combination of kernel density estimation and a genetic algorithm to rescale a given (real-world) attribute-value profile to a given variability model. To demonstrate the influence and relevance of realistic attribute values and interactions, we present a replication of a widely recognized, third-party study, into which we introduce realistic attribute values and interactions. We found statistically significant differences between the original study and the replication. We infer lessons learned to conduct experiments that involve attributed variability models. We also provide the accompanying tool Thor for generating attribute values including interactions. Our solution is shown to be agnostic about the given input distribution and to scale to large variability models. @InProceedings{ESEC/FSE17p268, author = {Norbert Siegmund and Stefan Sobernig and Sven Apel}, title = {Attributed Variability Models: Outside the Comfort Zone}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {268--278}, doi = {}, year = {2017}, } Info ESEC/FSE '17: "Using Bad Learners to Find ..." Using Bad Learners to Find Good Configurations Vivek Nair, Tim Menzies, Norbert Siegmund, and Sven Apel (North Carolina State University, USA; Bauhaus-University Weimar, Germany; University of Passau, Germany) Finding the optimally performing configuration of a software system for a given setting is often challenging. Recent approaches address this challenge by learning performance models based on a sample set of configurations. However, building an accurate performance model can be very expensive (and is often infeasible in practice). The central insight of this paper is that exact performance values (e.g., the response time of a software system) are not required to rank configurations and to identify the optimal one. 
As shown by our experiments, performance models that are cheap to learn but inaccurate (with respect to the difference between actual and predicted performance) can still be used to rank configurations and hence find the optimal configuration. This novel rank-based approach allows us to significantly reduce the cost (in terms of the number of measurements of sample configurations) as well as the time required to build performance models. We evaluate our approach with 21 scenarios based on 9 software systems and demonstrate that our approach is beneficial in 16 scenarios; for the remaining 5 scenarios, an accurate model can be built by using very few samples anyway, without the need for a rank-based approach. @InProceedings{ESEC/FSE17p257, author = {Vivek Nair and Tim Menzies and Norbert Siegmund and Sven Apel}, title = {Using Bad Learners to Find Good Configurations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {257--267}, doi = {}, year = {2017}, } |
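The "sample and recursively search" idea behind finding near-optimal configurations without a prediction model can be sketched as follows. This is a deliberately simplified, one-dimensional stand-in: configurations are plain integers rather than feature selections, and `perf` is a hypothetical benchmark function (lower is better), not a real SPL measurement.

```python
import random

def find_config(space_size, perf, budget=50, rounds=6, seed=1):
    # sample configurations uniformly at random, then repeatedly shrink the
    # sampled region around the best configuration seen so far and resample;
    # no performance model is ever constructed
    rng = random.Random(seed)
    best = min((rng.randrange(space_size) for _ in range(budget)), key=perf)
    lo, hi = 0, space_size
    for _ in range(rounds):
        width = max((hi - lo) // 4, 1)
        lo, hi = max(best - width, 0), min(best + width, space_size)
        cand = min((rng.randrange(lo, hi) for _ in range(budget)), key=perf)
        if perf(cand) < perf(best):
            best = cand
    return best

# toy benchmark: the (unknown to the search) optimum is configuration 123456
best = find_config(10**6, lambda c: abs(c - 123456))
```

With a few hundred benchmarked samples out of a million "configurations", the search lands close to the optimum, illustrating why exhaustive benchmarking is unnecessary for near-optimality.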
|
Singh, Rishabh |
ESEC/FSE '17: "NoFAQ: Synthesizing Command ..."
NoFAQ: Synthesizing Command Repairs from Examples
Loris D'Antoni, Rishabh Singh, and Michael Vaughn (University of Wisconsin-Madison, USA; Microsoft Research, USA) Command-line tools are confusing and hard to use due to their cryptic error messages and lack of documentation. Novice users often resort to online help-forums for finding corrections to their buggy commands, but have a hard time searching precisely for posts that are relevant to their problem and then applying the suggested solutions to their buggy command. We present NoFAQ, a tool that uses a set of rules to suggest possible fixes when users write buggy commands that trigger commonly occurring errors. The rules are expressed in a language called FIXIT and each rule pattern-matches against the user's buggy command and corresponding error message, and uses these inputs to produce a possible fixed command. NoFAQ automatically learns FIXIT rules from examples of buggy and repaired commands. We evaluate NoFAQ on two fronts. First, we use 92 benchmark problems drawn from an existing tool and show that NoFAQ is able to synthesize rules for 81 benchmark problems in real time using just 2 to 5 input-output examples for each rule. Second, we run our learning algorithm on the examples obtained through a crowd-sourcing interface and show that the learning algorithm scales to large sets of examples. @InProceedings{ESEC/FSE17p582, author = {Loris D'Antoni and Rishabh Singh and Michael Vaughn}, title = {NoFAQ: Synthesizing Command Repairs from Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {582--592}, doi = {}, year = {2017}, } |
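The shape of a FIXIT rule can be sketched concretely: match the buggy command and its error message, and reuse captured fragments in the proposed fix. The rule below is hand-written for illustration (NoFAQ learns such rules automatically from example pairs), and the specific command/error pair is an assumed scenario.

```python
import re

def fix(cmd, err):
    # one FIXIT-style rule: `java Foo.java` with a class-not-found error
    # should be repaired to `java Foo`; the class name is captured from the
    # buggy command and substituted into the fixed command
    m = re.fullmatch(r"java (\w+)\.java", cmd)
    if m and "Could not find or load main class" in err:
        return f"java {m.group(1)}"
    return None  # this rule does not apply

assert fix("java Main.java",
           "Error: Could not find or load main class Main.java") == "java Main"
assert fix("ls -l", "ls: cannot access 'x'") is None
```

A full system would hold many such rules and try each against the (command, error) pair, returning every fix that applies.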
|
Sinha, Rohit |
ESEC/FSE '17: "A Compiler and Verifier for ..."
A Compiler and Verifier for Page Access Oblivious Computation
Rohit Sinha, Sriram Rajamani , and Sanjit A. Seshia (University of California at Berkeley, USA; Microsoft Research, India) Trusted hardware primitives such as Intel's SGX instructions provide applications with a protected address space, called an enclave, for trusted code and data. However, building enclaves that preserve confidentiality of sensitive data continues to be a challenge. The developer must not only avoid leaking secrets via the enclave's outputs but also prevent leaks via side channels induced by interactions with the untrusted platform. Recent attacks have demonstrated that simply observing the page faults incurred during an enclave's execution can reveal its secrets if the enclave makes data accesses or control flow decisions based on secret values. To address this problem, a developer needs compilers to automatically produce confidential programs, and verification tools to certify the absence of secret-dependent page access patterns (a property that we formalize as page-access obliviousness). To that end, we implement an efficient compiler for a type and memory-safe language, a compiler pass that enforces page-access obliviousness with low runtime overheads, and an automatic, modular verifier that certifies page-access obliviousness at the machine-code level, thus removing the compiler from our trusted computing base. We evaluate this toolchain on several machine learning algorithms and image processing routines that we run within SGX enclaves. @InProceedings{ESEC/FSE17p649, author = {Rohit Sinha and Sriram Rajamani and Sanjit A. Seshia}, title = {A Compiler and Verifier for Page Access Oblivious Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {649--660}, doi = {}, year = {2017}, } |
|
Smith, Calvin |
ESEC/FSE '17: "Discovering Relational Specifications ..."
Discovering Relational Specifications
Calvin Smith, Gabriel Ferns, and Aws Albarghouthi (University of Wisconsin-Madison, USA) Formal specifications of library functions play a critical role in a number of program analysis and development tasks. We present Bach, a technique for discovering likely relational specifications from data describing input–output behavior of a set of functions comprising a library or a program. Relational specifications correlate different executions of different functions; for instance, commutativity, transitivity, equivalence of two functions, etc. Bach combines novel insights from program synthesis and databases to discover a rich array of specifications. We apply Bach to learn specifications from data generated for a number of standard libraries. Our experimental evaluation demonstrates Bach’s ability to learn useful and deep specifications in a small amount of time. @InProceedings{ESEC/FSE17p616, author = {Calvin Smith and Gabriel Ferns and Aws Albarghouthi}, title = {Discovering Relational Specifications}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {616--626}, doi = {}, year = {2017}, } Best-Paper Award |
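A minimal example makes "relational specification" concrete: a property such as commutativity relates two executions of the same function. The sketch below conjectures commutativity purely from input-output data; like Bach's output, the result is a *likely* specification supported by the observations, not a proof.

```python
from itertools import product

def likely_commutative(fn, samples):
    # conjecture the relational spec f(x, y) == f(y, x) by checking it on
    # all observed input pairs
    return all(fn(x, y) == fn(y, x) for x, y in product(samples, repeat=2))

samples = list(range(-5, 6))
assert likely_commutative(lambda a, b: a + b, samples)       # plus commutes
assert likely_commutative(max, samples)                      # so does max
assert not likely_commutative(lambda a, b: a - b, samples)   # minus does not
```

Other relational specs mentioned in the abstract (transitivity, equivalence of two functions) fit the same mold: each is a template checked against the input-output data of one or more functions.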
|
Sobernig, Stefan |
ESEC/FSE '17: "Attributed Variability Models: ..."
Attributed Variability Models: Outside the Comfort Zone
Norbert Siegmund, Stefan Sobernig, and Sven Apel (Bauhaus-University Weimar, Germany; WU Vienna, Austria; University of Passau, Germany) Variability models are often enriched with attributes, such as performance, that encode the influence of features on the respective attribute. In spite of their importance, there are only few attributed variability models available that have attribute values obtained from empirical, real-world observations and that cover interactions between features. But, what does it mean for research and practice when staying in the comfort zone of developing algorithms and tools in a setting where artificial attribute values are used and where interactions are neglected? This is the central question that we want to answer here. To leave the comfort zone, we use a combination of kernel density estimation and a genetic algorithm to rescale a given (real-world) attribute-value profile to a given variability model. To demonstrate the influence and relevance of realistic attribute values and interactions, we present a replication of a widely recognized, third-party study, into which we introduce realistic attribute values and interactions. We found statistically significant differences between the original study and the replication. We infer lessons learned to conduct experiments that involve attributed variability models. We also provide the accompanying tool Thor for generating attribute values including interactions. Our solution is shown to be agnostic about the given input distribution and to scale to large variability models. @InProceedings{ESEC/FSE17p268, author = {Norbert Siegmund and Stefan Sobernig and Sven Apel}, title = {Attributed Variability Models: Outside the Comfort Zone}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {268--278}, doi = {}, year = {2017}, } Info |
|
Soremekun, Ezekiel O. |
ESEC/FSE '17: "Where Is the Bug and How Is ..."
Where Is the Bug and How Is It Fixed? An Experiment with Practitioners
Marcel Böhme, Ezekiel O. Soremekun, Sudipta Chattopadhyay, Emamurho Ugherughe, and Andreas Zeller (National University of Singapore, Singapore; Saarland University, Germany; Singapore University of Technology and Design, Singapore; SAP, Germany) Research has produced many approaches to automatically locate, explain, and repair software bugs. But do these approaches relate to the way practitioners actually locate, understand, and fix bugs? To help answer this question, we have collected a dataset named DBGBENCH --- the correct fault locations, bug diagnoses, and software patches of 27 real errors in open-source C projects that were consolidated from hundreds of debugging sessions of professional software engineers. Moreover, we shed light on the entire debugging process, from constructing a hypothesis to submitting a patch, and how debugging time, difficulty, and strategies vary across practitioners and types of errors. Most notably, DBGBENCH can serve as a reality check for novel automated debugging and repair techniques. @InProceedings{ESEC/FSE17p117, author = {Marcel Böhme and Ezekiel O. Soremekun and Sudipta Chattopadhyay and Emamurho Ugherughe and Andreas Zeller}, title = {Where Is the Bug and How Is It Fixed? An Experiment with Practitioners}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {117--128}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Sorensen, Tyler |
ESEC/FSE '17: "Cooperative Kernels: GPU Multitasking ..."
Cooperative Kernels: GPU Multitasking for Blocking Algorithms
Tyler Sorensen, Hugues Evrard, and Alastair F. Donaldson (Imperial College London, UK) There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking. Our prototype exploits no vendor-specific hardware, driver or compiler support, thus our results provide a lower bound on the efficiency with which cooperative kernels can be implemented in practice. @InProceedings{ESEC/FSE17p431, author = {Tyler Sorensen and Hugues Evrard and Alastair F. Donaldson}, title = {Cooperative Kernels: GPU Multitasking for Blocking Algorithms}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {431--441}, doi = {}, year = {2017}, } Best-Paper Award |
|
Sousa, Leonardo |
ESEC/FSE '17: "Understanding the Impact of ..."
Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez (PUC-Rio, Brazil; Federal University of Campina Grande, Brazil; Federal University of Alagoas, Brazil) Code smells in a program represent indications of structural quality problems, which can be addressed by software refactoring. However, refactoring intends to achieve different goals in practice, and its application may not reduce smelly structures. Developers may neglect or end up creating new code smells through refactoring. Unfortunately, little has been reported about the beneficial and harmful effects of refactoring on code smells. This paper reports a longitudinal study intended to address this gap. We analyze how often commonly-used refactoring types affect the density of 13 types of code smells along the version histories of 23 projects. Our findings are based on the analysis of 16,566 refactorings distributed in 10 different types. Even though 79.4% of the refactorings touched smelly elements, 57% did not reduce their occurrences. Surprisingly, only 9.7% of refactorings removed smells, while 33.3% induced the introduction of new ones. More than 95% of such refactoring-induced smells were not removed in successive commits, which suggests that refactorings tend to introduce long-living smells more often than they eliminate existing ones. We also characterized and quantified typical refactoring-smell patterns, and observed that harmful patterns are frequent, including: (i) approximately 30% of the Move Method and Pull Up Method refactorings induced the emergence of God Class, and (ii) the Extract Superclass refactoring creates the smell Speculative Generality in 68% of the cases.
@InProceedings{ESEC/FSE17p465, author = {Diego Cedrim and Alessandro Garcia and Melina Mongiovi and Rohit Gheyi and Leonardo Sousa and Rafael de Mello and Baldoino Fonseca and Márcio Ribeiro and Alexander Chávez}, title = {Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {465--475}, doi = {}, year = {2017}, } Info |
|
Su, Guoxin |
ESEC/FSE '17: "Probabilistic Model Checking ..."
Probabilistic Model Checking of Perturbed MDPs with Applications to Cloud Computing
Yamilet R. Serrano Llerena, Guoxin Su, and David S. Rosenblum (National University of Singapore, Singapore; University of Wollongong, Australia) Probabilistic model checking is a formal verification technique that has been applied successfully in a variety of domains, providing identification of system errors through quantitative verification of stochastic system models. One domain that can benefit from probabilistic model checking is cloud computing, which must provide highly reliable and secure computational and storage services to large numbers of mission-critical software systems. For real-world domains like cloud computing, external system factors and environmental changes must be estimated accurately in the form of probabilities in system models; inaccurate estimates for the model probabilities can lead to invalid verification results. To address the effects of uncertainty in probability estimates, in previous work we have developed a variety of techniques for perturbation analysis of discrete- and continuous-time Markov chains (DTMCs and CTMCs). These techniques determine the consequences of the uncertainty on verification of system properties. In this paper, we present the first approach for perturbation analysis of Markov decision processes (MDPs), a stochastic formalism that is especially popular due to the significant expressive power it provides through the combination of both probabilistic and nondeterministic choice. Our primary contribution is a novel technique for efficiently analyzing the effects of perturbations of model probabilities on verification of reachability properties of MDPs. The technique heuristically explores the space of adversaries of an MDP, which encode the different ways of resolving the MDP’s nondeterministic choices. We demonstrate the practical effectiveness of our approach by applying it to two case studies of cloud systems. @InProceedings{ESEC/FSE17p454, author = {Yamilet R. Serrano Llerena and Guoxin Su and David S. Rosenblum}, title = {Probabilistic Model Checking of Perturbed MDPs with Applications to Cloud Computing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {454--464}, doi = {}, year = {2017}, } |
|
Su, Ting |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments and are developed under time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17-31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 already fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
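The stochastic GUI model that the abstract describes can be pictured as a weighted event graph from which test sequences are sampled. A minimal illustrative sketch (the states, events, and weights are invented, and Stoat's actual model and Gibbs-sampling mutation are far richer):

```python
import random

# Hypothetical stochastic GUI model: state -> [(event, next_state, weight)]
model = {
    "Main":   [("tap_search", "Search", 0.7), ("open_menu", "Menu", 0.3)],
    "Search": [("type_query", "Search", 0.5), ("back", "Main", 0.5)],
    "Menu":   [("back", "Main", 1.0)],
}

def sample_sequence(start, length, rng):
    """Sample one test sequence by a weighted random walk over the model."""
    state, seq = start, []
    for _ in range(length):
        events = model[state]
        weights = [w for _, _, w in events]
        event, state, _ = rng.choices(events, weights=weights)[0]
        seq.append(event)
    return seq

print(sample_sequence("Main", 5, random.Random(42)))
```

Refining the model then amounts to adjusting these weights so that sampling favors sequences with high coverage and diversity.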
|
Su, Zhendong |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments and are developed under time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17-31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 already fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
|
Tan, Lin |
ESEC/FSE '17: "Better Test Cases for Better ..."
Better Test Cases for Better Automated Program Repair
Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan (University of Waterloo, Canada) Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches due to the inadequacy of test cases. Such overfitted patches are incorrect patches, which only make all given test cases pass, but fail to fix the bugs. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test oracles (crash and memory-safety) to enhance validity checking of automatically-generated patches. Opad also uses a novel metric (named O-measure) for deciding whether automatically-generated patches overfit. Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically-generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to crash and memory-safety oracles employed by Opad). @InProceedings{ESEC/FSE17p831, author = {Jinqiu Yang and Alexey Zhikhartsev and Yuefei Liu and Lin Tan}, title = {Better Test Cases for Better Automated Program Repair}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {831--841}, doi = {}, year = {2017}, } ESEC/FSE '17: "QTEP: Quality-Aware Test Case ..." QTEP: Quality-Aware Test Case Prioritization Song Wang, Jaechang Nam, and Lin Tan (University of Waterloo, Canada) Test case prioritization (TCP) is a practical activity in software testing for exposing faults earlier. 
Researchers have proposed many TCP techniques to reorder test cases. Among them, coverage-based TCPs have been widely investigated. Specifically, coverage-based TCP approaches leverage coverage information between source code and test cases, i.e., static code coverage and dynamic code coverage, to schedule test cases. Existing coverage-based TCP techniques mainly focus on maximizing coverage while often do not consider the likely distribution of faults in source code. However, software faults are not often equally distributed in source code, e.g., around 80% faults are located in about 20% source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults. In this paper, we present a quality-aware test case prioritization technique, QTEP, to address the limitation of existing coverage-based TCP algorithms. In QTEP, we leverage code inspection techniques, i.e., a typical statistic defect prediction model and a typical static bug finder, to detect fault-prone source code and then adapt existing coverage-based TCP algorithms by considering the weighted source code in terms of fault-proneness. Our evaluation with 16 variant QTEP techniques on 33 different versions of 7 open source Java projects shows that QTEP could improve existing coverage-based TCP techniques for both regression and new test cases. Specifically, the improvement of the best variant of QTEP for regression test cases could be up to 15.0% and on average 7.6%, and for all test cases (both regression and new test cases), the improvement could be up to 10.0% and on average 5.0%. @InProceedings{ESEC/FSE17p523, author = {Song Wang and Jaechang Nam and Lin Tan}, title = {QTEP: Quality-Aware Test Case Prioritization}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {523--534}, doi = {}, year = {2017}, } Info |
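The QTEP abstract above describes weighting covered code by fault-proneness and adapting coverage-based prioritization accordingly. A minimal sketch of that idea (the statements, weights, and the greedy strategy are illustrative placeholders, not QTEP's actual variants):

```python
# Hypothetical fault-proneness weights, e.g. from a defect prediction model
# or a static bug finder.
fault_proneness = {"s1": 0.9, "s2": 0.1, "s3": 0.5}
coverage = {            # test -> statements it covers
    "t1": {"s2"},
    "t2": {"s1", "s3"},
    "t3": {"s2", "s3"},
}

def prioritize(coverage, weight):
    """Greedily order tests by the weighted, not-yet-covered code they add."""
    remaining, order, covered = dict(coverage), [], set()
    while remaining:
        best = max(remaining,
                   key=lambda t: sum(weight[s] for s in remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

print(prioritize(coverage, fault_proneness))  # ['t2', 't1', 't3']
```

With uniform weights this degenerates to classic additional-coverage prioritization; the fault-proneness weights are what pull tests covering likely-buggy code to the front.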
|
Tan, Shin Hwei |
ESEC/FSE '17: "A Feasibility Study of Using ..."
A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments
Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roychoudhury (Innopolis University, Russia; IIT Kanpur, India; National University of Singapore, Singapore) Despite the fact that an intelligent tutoring system for programming (ITSP) education has long attracted interest, its widespread use has been hindered by the difficulty of generating personalized feedback automatically. Meanwhile, automated program repair (APR) is an emerging new technology that automatically fixes software bugs, and it has been shown that APR can fix the bugs of large real-world software. In this paper, we study the feasibility of marrying intelligent programming tutoring and APR. We perform our feasibility study with four state-of-the-art APR tools (GenProg, AE, Angelix, and Prophet), and 661 programs written by the students taking an introductory programming course. We found that when APR tools are used out of the box, only about 30% of the programs in our dataset are repaired. This low repair rate is largely due to the student programs often being significantly incorrect — in contrast, professional software for which APR was successfully applied typically fails only a small portion of tests. To bridge this gap, we adopt in APR a new repair policy akin to the hint generation policy employed in the existing ITSP. This new repair policy admits partial repairs that address part of the failing tests, which results in 84% improvement in repair rate. We also performed a user study with 263 novice students and 37 graders, and identified an understudied problem: while novice students do not seem to know how to effectively make use of generated repairs as hints, the graders do seem to gain benefits from repairs. @InProceedings{ESEC/FSE17p740, author = {Jooyong Yi and Umair Z. 
Ahmed and Amey Karkare and Shin Hwei Tan and Abhik Roychoudhury}, title = {A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {740--751}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Terragni, Valerio |
ESEC/FSE '17: "Reproducing Concurrency Failures ..."
Reproducing Concurrency Failures from Crash Stacks
Francesco A. Bianchi, Mauro Pezzè, and Valerio Terragni (University of Lugano, Switzerland) Reproducing field failures is the first essential step for understanding, localizing and removing faults. Reproducing concurrency field failures is hard due to the need of synthesizing a test code jointly with a thread interleaving that induces the failure in the presence of limited information from the field. Current techniques for reproducing concurrency failures focus on identifying failure-inducing interleavings, leaving largely open the problem of synthesizing the test code that manifests such interleavings. In this paper, we present ConCrash, a technique to automatically generate test codes that reproduce concurrency failures that violate thread-safety from crash stacks, which commonly summarize the conditions of field failures. ConCrash efficiently explores the huge space of possible test codes to identify a failure-inducing one by using a suitable set of search pruning strategies. Combined with existing techniques for exploring interleavings, ConCrash automatically reproduces a given concurrency failure that violates the thread-safety of a class by identifying both a failure-inducing test code and corresponding interleaving. In the paper, we define the ConCrash approach, present a prototype implementation of ConCrash, and discuss the experimental results that we obtained on a known set of ten field failures that witness the effectiveness of the approach. @InProceedings{ESEC/FSE17p705, author = {Francesco A. Bianchi and Mauro Pezzè and Valerio Terragni}, title = {Reproducing Concurrency Failures from Crash Stacks}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {705--716}, doi = {}, year = {2017}, } |
|
Thüm, Thomas |
ESEC/FSE '17: "Is There a Mismatch between ..."
Is There a Mismatch between Real-World Feature Models and Product-Line Research?
Alexander Knüppel, Thomas Thüm, Stephan Mennicke, Jens Meinicke, and Ina Schaefer (TU Braunschweig, Germany; University of Magdeburg, Germany) Feature modeling has emerged as the de-facto standard to compactly capture the variability of a software product line. Multiple feature modeling languages have been proposed that evolved over the last decades to manage industrial-size product lines. However, less expressive languages, solely permitting require and exclude constraints, are permanently and carelessly used in product-line research. We address the problem whether those less expressive languages are sufficient for industrial product lines. We developed an algorithm to eliminate complex cross-tree constraints in a feature model, enabling the combination of tools and algorithms working with different feature model dialects in a plug-and-play manner. However, the scope of our algorithm is limited. Our evaluation on large feature models, including the Linux kernel, gives evidence that require and exclude constraints are not sufficient to express real-world feature models. Hence, we promote that research on feature models needs to consider arbitrary propositional formulas as cross-tree constraints prospectively. @InProceedings{ESEC/FSE17p291, author = {Alexander Knüppel and Thomas Thüm and Stephan Mennicke and Jens Meinicke and Ina Schaefer}, title = {Is There a Mismatch between Real-World Feature Models and Product-Line Research?}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {291--302}, doi = {}, year = {2017}, } Info Artifacts Reusable |
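The abstract above argues that require and exclude constraints are strictly weaker than arbitrary propositional cross-tree constraints. A tiny sketch of one reason why (the encoding is illustrative and not the paper's elimination algorithm): every require or exclude constraint is vacuously satisfied by the empty configuration, whereas a disjunction is not.

```python
# Configurations are dicts mapping feature names to selected/deselected.
requires = lambda m: (not m["A"]) or m["B"]   # "A requires B"
excludes = lambda m: not (m["A"] and m["B"])  # "A excludes B"
either   = lambda m: m["A"] or m["B"]         # needs a general propositional formula

empty = {"A": False, "B": False}
print(requires(empty), excludes(empty), either(empty))  # True True False
```

Since any conjunction of require/exclude constraints admits the all-deselected configuration, a cross-tree constraint like "A or B" cannot be expressed with them alone, which is the kind of mismatch the paper's evaluation on real feature models exposes.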
|
Tian, Yuchi |
ESEC/FSE '17: "Automatically Diagnosing and ..."
Automatically Diagnosing and Repairing Error Handling Bugs in C
Yuchi Tian and Baishakhi Ray (University of Virginia, USA) Correct error handling is essential for building reliable and secure systems. Unfortunately, low-level languages like C often do not support any error handling primitives and leave it up to the developers to create their own mechanisms for error propagation and handling. However, in practice, the developers often make mistakes while writing the repetitive and tedious error handling code and inadvertently introduce bugs. Such error handling bugs often have severe consequences undermining the security and reliability of the affected systems. Fixing these bugs is also tiring—they are repetitive and cumbersome to implement. Therefore, it is crucial to develop tool supports for automatically detecting and fixing error handling bugs. To understand the nature of error handling bugs that occur in widely used C programs, we conduct a comprehensive study of real world error handling bugs and their fixes. Leveraging the knowledge, we then design, implement, and evaluate ErrDoc, a tool that not only detects and characterizes different types of error handling bugs but also automatically fixes them. Our evaluation on five open-source projects shows that ErrDoc can detect error handling bugs with 100% to 84% precision and around 95% recall, and categorize them with 83% to 96% precision and above 90% recall. Thus, ErrDoc improves precision up to 5 percentage points, and recall up to 44 percentage points w.r.t. the state-of-the-art. We also demonstrate that ErrDoc can fix the bugs with high accuracy. @InProceedings{ESEC/FSE17p752, author = {Yuchi Tian and Baishakhi Ray}, title = {Automatically Diagnosing and Repairing Error Handling Bugs in C}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {752--762}, doi = {}, year = {2017}, } Best-Paper Award |
|
Tiu, Alwen |
ESEC/FSE '17: "Steelix: Program-State Based ..."
Steelix: Program-State Based Binary Fuzzing
Yuekang Li, Bihuan Chen, Mahinthan Chandramohan, Shang-Wei Lin, Yang Liu, and Alwen Tiu (Nanyang Technological University, Singapore; Fudan University, China) Coverage-based fuzzing is one of the most effective techniques to find vulnerabilities, bugs or crashes. However, existing techniques suffer from the difficulty in exercising the paths that are protected by magic bytes comparisons (e.g., string equality comparisons). Several approaches have been proposed to use heavy-weight program analysis to break through magic bytes comparisons, and hence are less scalable. In this paper, we propose a program-state based binary fuzzing approach, named Steelix, which improves the penetration power of a fuzzer at the cost of an acceptable slow down of the execution speed. In particular, we use light-weight static analysis and binary instrumentation to provide not only coverage information but also comparison progress information to a fuzzer. Such program state information informs a fuzzer about where the magic bytes are located in the test input and how to perform mutations to match the magic bytes efficiently. We have implemented Steelix and evaluated it on three datasets: LAVA-M dataset, DARPA CGC sample binaries and five real-life programs. The results show that Steelix has better code coverage and bug detection capability than the state-of-the-art fuzzers. Moreover, we found one CVE and nine new bugs. @InProceedings{ESEC/FSE17p627, author = {Yuekang Li and Bihuan Chen and Mahinthan Chandramohan and Shang-Wei Lin and Yang Liu and Alwen Tiu}, title = {Steelix: Program-State Based Binary Fuzzing}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {627--637}, doi = {}, year = {2017}, } |
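Comparison-progress feedback, as described in the abstract above, can be illustrated with a toy byte-at-a-time solver (the magic value and the feedback function are invented; Steelix obtains this information via binary instrumentation, not a Python oracle):

```python
MAGIC = b"FUZZ"  # hypothetical magic-bytes guard in the target program

def progress(data):
    """Instrumentation-style feedback: length of the matching prefix of MAGIC."""
    n = 0
    for a, b in zip(data, MAGIC):
        if a != b:
            break
        n += 1
    return n

def solve_magic(seed):
    """Mutate only the first mismatching byte until the comparison passes."""
    data = bytearray(seed)
    while progress(data) < len(MAGIC):
        i = progress(data)
        for v in range(256):
            data[i] = v
            if progress(data) > i:  # this byte now matches; move on
                break
    return bytes(data)

print(solve_magic(b"\x00\x00\x00\x00"))  # finds b'FUZZ' in at most 4*256 tries
```

Without the progress signal, a random fuzzer would need on the order of 256^4 attempts to pass the same four-byte comparison, which is the penetration-power gap the paper targets.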
|
Tsigkanos, Christos |
ESEC/FSE '17: "Modeling and Verification ..."
Modeling and Verification of Evolving Cyber-Physical Spaces
Christos Tsigkanos, Timo Kehrer, and Carlo Ghezzi (Politecnico di Milano, Italy) We increasingly live in cyber-physical spaces -- spaces that are both physical and digital, and where the two aspects are intertwined. Such spaces are highly dynamic and typically undergo continuous change. Software engineering can have a profound impact in this domain, by defining suitable modeling and specification notations as well as supporting design-time formal verification. In this paper, we present a methodology and a technical framework which support modeling of evolving cyber-physical spaces and reasoning about their spatio-temporal properties. We utilize a discrete, graph-based formalism for modeling cyber-physical spaces as well as primitives of change, giving rise to a reactive system consisting of rewriting rules with both local and global application conditions. Formal reasoning facilities are implemented adopting logic-based specification of properties and according model checking procedures, in both spatial and temporal fragments. We evaluate our approach using a case study of a disaster scenario in a smart city. @InProceedings{ESEC/FSE17p38, author = {Christos Tsigkanos and Timo Kehrer and Carlo Ghezzi}, title = {Modeling and Verification of Evolving Cyber-Physical Spaces}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {38--48}, doi = {}, year = {2017}, } |
|
Tsiskaridze, Nestan |
ESEC/FSE '17: "Constraint Normalization and ..."
Constraint Normalization and Parameterized Caching for Quantitative Program Analysis
Tegan Brennan, Nestan Tsiskaridze, Nicolás Rosner, Abdulbaki Aydin, and Tevfik Bultan (University of California at Santa Barbara, USA) Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial for efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure under which the cardinality of the solution set of a constraint, but not necessarily the solution set itself, is preserved. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. A group-theoretic framework which generalizes earlier results on constraint normalization is used to express our normalization techniques. We also present a parameterized caching approach where, in addition to storing the result of a model-counting query, we also store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew can normalize the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieve a 10x speedup on the SMC/Kaluza-Big dataset, and an average 3x speedup in our SPF-based side-channel analysis experiments. 
@InProceedings{ESEC/FSE17p535, author = {Tegan Brennan and Nestan Tsiskaridze and Nicolás Rosner and Abdulbaki Aydin and Tevfik Bultan}, title = {Constraint Normalization and Parameterized Caching for Quantitative Program Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {535--546}, doi = {}, year = {2017}, } Info Artifacts Reusable |
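The cardinality-preserving normalization described in the entry above can be sketched with a deliberately simple normal form (canonical variable renaming only; Cashew's group-theoretic normalization handles far more, and the model counts below are stubbed, not computed by a real solver):

```python
def normalize(constraint):
    """Rename variables to v0, v1, ... in order of first appearance."""
    mapping, out = {}, []
    for tok in constraint.split():
        if tok.isidentifier():
            mapping.setdefault(tok, f"v{len(mapping)}")
            out.append(mapping[tok])
        else:
            out.append(tok)
    return " ".join(out)

cache, solver_calls = {}, 0

def model_count(constraint, count_fn):
    """Cache counts keyed by normal form; the count survives renaming."""
    global solver_calls
    key = normalize(constraint)
    if key not in cache:
        solver_calls += 1           # only a cache miss pays for the solver
        cache[key] = count_fn(constraint)
    return cache[key]

stub_counter = lambda c: 5          # stand-in for a model-counting solver
print(model_count("x < 5", stub_counter), model_count("y < 5", stub_counter))
print(solver_calls)  # 1: the second query hits the cache via its normal form
```

The key invariant is that normalization may change the solution set (x's models are not y's) while preserving its cardinality, which is all a model-counting cache needs.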
|
Tufano, Michele |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped in 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identified a set of 38 mutation operators, and implemented an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info |
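The mutation-testing workflow the abstract relies on can be shown in miniature (the program, the relational-operator mutation, and the test suite below are all invented; MDroid+'s 38 operators are Android-specific, e.g. targeting intents and lifecycle code):

```python
def original(x):
    return x > 0    # predicate under test

def mutant(x):
    return x >= 0   # seeded fault: relational operator ">" mutated to ">="

def suite_kills(program):
    """A mutant is 'killed' when at least one test observes a wrong result."""
    tests = [(1, True), (-1, False), (0, False)]  # (input, expected)
    return any(program(x) != expected for x, expected in tests)

print(suite_kills(original), suite_kills(mutant))  # False True
```

If the suite lacked the boundary test at 0, the mutant would survive, signaling a gap in the suite's fault-detection power; that sensitivity measurement is what mutation operators exist to drive.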
|
Tunnell, Timothy |
ESEC/FSE '17: "Trade-Offs in Continuous Integration: ..."
Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility
Michael Hilton, Nicholas Nelson, Timothy Tunnell, Darko Marinov, and Danny Dig (Oregon State University, USA; University of Illinois at Urbana-Champaign, USA) Continuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI being a widely used activity in software engineering, we do not know what motivates developers to use CI, and what barriers and unmet needs they face. Without such knowledge, developers make easily avoidable errors, tool builders invest in the wrong direction, and researchers miss opportunities for improving the practice of CI. We present a qualitative study of the barriers and needs developers face when using CI. We conduct semi-structured interviews with developers from different industries and development scales. We triangulate our findings by running two surveys. We find that developers face trade-offs between speed and certainty (Assurance), between better access and information security (Security), and between more configuration options and greater ease of use (Flexibility). We present implications of these trade-offs for developers, tool builders, and researchers. @InProceedings{ESEC/FSE17p197, author = {Michael Hilton and Nicholas Nelson and Timothy Tunnell and Darko Marinov and Danny Dig}, title = {Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {197--207}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Ugherughe, Emamurho |
ESEC/FSE '17: "Where Is the Bug and How Is ..."
Where Is the Bug and How Is It Fixed? An Experiment with Practitioners
Marcel Böhme, Ezekiel O. Soremekun, Sudipta Chattopadhyay, Emamurho Ugherughe, and Andreas Zeller (National University of Singapore, Singapore; Saarland University, Germany; Singapore University of Technology and Design, Singapore; SAP, Germany) Research has produced many approaches to automatically locate, explain, and repair software bugs. But do these approaches relate to the way practitioners actually locate, understand, and fix bugs? To help answer this question, we have collected a dataset named DBGBENCH --- the correct fault locations, bug diagnoses, and software patches of 27 real errors in open-source C projects that were consolidated from hundreds of debugging sessions of professional software engineers. Moreover, we shed light on the entire debugging process, from constructing a hypothesis to submitting a patch, and how debugging time, difficulty, and strategies vary across practitioners and types of errors. Most notably, DBGBENCH can serve as reality check for novel automated debugging and repair techniques. @InProceedings{ESEC/FSE17p117, author = {Marcel Böhme and Ezekiel O. Soremekun and Sudipta Chattopadhyay and Emamurho Ugherughe and Andreas Zeller}, title = {Where Is the Bug and How Is It Fixed? An Experiment with Practitioners}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {117--128}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Valente, Marco Tulio |
ESEC/FSE '17: "Why Modern Open Source Projects ..."
Why Modern Open Source Projects Fail
Jailton Coelho and Marco Tulio Valente (Federal University of Minas Gerais, Brazil) Open source is experiencing a renaissance period, due to the appearance of modern platforms and workflows for developing and maintaining public code. As a result, developers are creating open source software at speeds never seen before. Consequently, these projects are also facing unprecedented mortality rates. To better understand the reasons for the failure of modern open source projects, this paper describes the results of a survey with the maintainers of 104 popular GitHub systems that have been deprecated. We provide a set of nine reasons for the failure of these open source projects. We also show that some maintenance practices---specifically the adoption of contributing guidelines and continuous integration---have an important association with a project failure or success. Finally, we discuss and reveal the principal strategies developers have tried to overcome the failure of the studied projects. @InProceedings{ESEC/FSE17p186, author = {Jailton Coelho and Marco Tulio Valente}, title = {Why Modern Open Source Projects Fail}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {186--196}, doi = {}, year = {2017}, } |
|
Vasic, Marko |
ESEC/FSE '17: "Regression Test Selection ..."
Regression Test Selection Across JVM Boundaries
Ahmet Celik, Marko Vasic, Aleksandar Milicevic, and Milos Gligoric (University of Texas at Austin, USA; Microsoft, USA) Modern software development processes recommend that changes be integrated into the main development line of a project multiple times a day. Before a new revision may be integrated, developers practice regression testing to ensure that the latest changes do not break any previously established functionality. The cost of regression testing is high, due to an increase in the number of revisions that are introduced per day, as well as the number of tests developers write per revision. Regression test selection (RTS) optimizes regression testing by skipping tests that are not affected by recent project changes. Existing dynamic RTS techniques support only projects written in a single programming language, which is unfortunate knowing that an open-source project is on average written in several programming languages. We present the first dynamic RTS technique that does not stop at predefined language boundaries. Our technique dynamically detects, at the operating system level, all file artifacts a test depends on. Our technique is, hence, oblivious to the specific means the test uses to actually access the files: be it through spawning a new process, invoking a system call, invoking a library written in a different language, invoking a library that spawns a process which makes a system call, etc. We also provide a set of extension points which allow for a smooth integration with testing frameworks and build systems. We implemented our technique in a tool called RTSLinux as a loadable Linux kernel module and evaluated it on 21 Java projects that escape JVM by spawning new processes or invoking native code, totaling 2,050,791 lines of code. Our results show that RTSLinux, on average, skips 74.17% of tests and saves 52.83% of test execution time compared to executing all tests. 
@InProceedings{ESEC/FSE17p809, author = {Ahmet Celik and Marko Vasic and Aleksandar Milicevic and Milos Gligoric}, title = {Regression Test Selection Across JVM Boundaries}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {809--820}, doi = {}, year = {2017}, } |
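The file-level selection that RTSLinux performs can be sketched in a few lines (the test names and dependency sets are fabricated; the real tool records accessed files at the OS level, so the mapping is language- and process-agnostic):

```python
# Hypothetical per-test file dependencies recorded during the previous run.
deps = {
    "test_parser": {"parser.py", "grammar.txt"},
    "test_cli":    {"cli.py", "helper.sh"},
    "test_native": {"native.so", "bridge.py"},
}

def select(deps, changed_files):
    """Rerun exactly the tests whose recorded files intersect the change set."""
    return sorted(t for t, files in deps.items() if files & changed_files)

print(select(deps, {"cli.py"}))                   # ['test_cli']
print(select(deps, {"grammar.txt", "native.so"})) # ['test_native', 'test_parser']
```

Because dependencies are files rather than in-language entities, a test that shells out to `helper.sh` or loads `native.so` is still selected when those artifacts change, which is exactly the cross-JVM-boundary case the paper targets.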
|
Vasilescu, Bogdan |
ESEC/FSE '17: "Recovering Clear, Natural ..."
Recovering Clear, Natural Identifiers from Obfuscated JS Names
Bogdan Vasilescu, Casey Casalnuovo, and Premkumar Devanbu (Carnegie Mellon University, USA; University of California at Davis, USA) Well-chosen variable names are critical to source code readability, reusability, and maintainability. Unfortunately, in deployed JavaScript code (which is ubiquitous on the web) the identifier names are frequently minified and overloaded. This is done both for efficiency and also to protect potentially proprietary intellectual property. In this paper, we describe an approach based on statistical machine translation (SMT) that recovers some of the original names from the JavaScript programs minified by the very popular UglifyJS. This simple tool, Autonym, performs comparably to the best currently available deobfuscator for JavaScript, JSNice, which uses sophisticated static analysis. In fact, Autonym is quite complementary to JSNice, performing well when it does not, and vice versa. We also introduce a new tool, JSNaughty, which blends Autonym and JSNice, and significantly outperforms both at identifier name recovery, while remaining just as easy to use as JSNice. JSNaughty is available online at http://jsnaughty.org. @InProceedings{ESEC/FSE17p683, author = {Bogdan Vasilescu and Casey Casalnuovo and Premkumar Devanbu}, title = {Recovering Clear, Natural Identifiers from Obfuscated JS Names}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {683--693}, doi = {}, year = {2017}, } |
|
Vaughn, Michael |
ESEC/FSE '17: "The Care and Feeding of Wild-Caught ..."
The Care and Feeding of Wild-Caught Mutants
David Bingham Brown, Michael Vaughn, Ben Liblit, and Thomas Reps (University of Wisconsin-Madison, USA) Mutation testing of a test suite and a program provides a way to measure the quality of the test suite. In essence, mutation testing is a form of sensitivity testing: by running mutated versions of the program against the test suite, mutation testing measures the suite’s sensitivity for detecting bugs that a programmer might introduce into the program. This paper introduces a technique to improve mutation testing that we call wild-caught mutants; it provides a method for creating potential faults that are more closely coupled with changes made by actual programmers. This technique allows the mutation tester to have more certainty that the test suite is sensitive to the kind of changes that have been observed to have been made by programmers in real-world cases. @InProceedings{ESEC/FSE17p511, author = {David Bingham Brown and Michael Vaughn and Ben Liblit and Thomas Reps}, title = {The Care and Feeding of Wild-Caught Mutants}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {511--522}, doi = {}, year = {2017}, } Video Info Artifacts Reusable ESEC/FSE '17: "NoFAQ: Synthesizing Command ..." NoFAQ: Synthesizing Command Repairs from Examples Loris D'Antoni , Rishabh Singh, and Michael Vaughn (University of Wisconsin-Madison, USA; Microsoft Research, USA) Command-line tools are confusing and hard to use due to their cryptic error messages and lack of documentation. Novice users often resort to online help-forums for finding corrections to their buggy commands, but have a hard time in searching precisely for posts that are relevant to their problem and then applying the suggested solutions to their buggy command. We present NoFAQ, a tool that uses a set of rules to suggest possible fixes when users write buggy commands that trigger commonly occurring errors. 
The rules are expressed in a language called FIXIT and each rule pattern-matches against the user's buggy command and corresponding error message, and uses these inputs to produce a possible fixed command. NoFAQ automatically learns FIXIT rules from examples of buggy and repaired commands. We evaluate NoFAQ on two fronts. First, we use 92 benchmark problems drawn from an existing tool and show that NoFAQ is able to synthesize rules for 81 benchmark problems in real time using just 2 to 5 input-output examples for each rule. Second, we run our learning algorithm on the examples obtained through a crowd-sourcing interface and show that the learning algorithm scales to large sets of examples. @InProceedings{ESEC/FSE17p582, author = {Loris D'Antoni and Rishabh Singh and Michael Vaughn}, title = {NoFAQ: Synthesizing Command Repairs from Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {582--592}, doi = {}, year = {2017}, } |
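A repair rule of the kind described — pattern-match the buggy command and its error message, then splice captured fragments into a fixed command — can be sketched as follows. The rule below is a made-up example in the spirit of FIXIT (the real DSL and its learned rules differ):

```python
import re

# One illustrative rule (hypothetical):
#   buggy:  java Foo.java    error: Could not find or load main class ...
#   fixed:  java Foo
RULES = [
    (re.compile(r"java (?P<cls>\w+)\.java"),       # matches the buggy command
     re.compile(r"Could not find or load main class"),  # matches the error
     "java {cls}")                                  # fix template with variables
]

def suggest_fix(cmd, err):
    """Return a fixed command if some rule matches both cmd and err."""
    for cmd_pat, err_pat, template in RULES:
        m = cmd_pat.fullmatch(cmd)
        if m and err_pat.search(err):
            return template.format(**m.groupdict())
    return None
```

NoFAQ's contribution is learning such rules automatically from 2 to 5 buggy/repaired example pairs, rather than writing them by hand as above.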
|
Vendome, Christopher |
ESEC/FSE '17: "Enabling Mutation Testing ..."
Enabling Mutation Testing for Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, and Denys Poshyvanyk (Universidad de los Andes, Colombia; University of Lugano, Switzerland; College of William and Mary, USA; University of Sannio, Italy) Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for programming languages to be augmented with operators specific to an application domain and/or technology. This paper proposes MDroid+, a framework for effective mutation testing of Android apps. First, we systematically devise a taxonomy of 262 types of Android faults grouped in 14 categories by manually analyzing 2,023 software artifacts from different sources (e.g., bug reports, commits). Then, we identify a set of 38 mutation operators, and implement an infrastructure to automatically seed mutations in Android apps with 35 of the identified operators. The taxonomy and the proposed operators have been evaluated in terms of stillborn/trivial mutants generated as compared to well-known mutation tools, and their capacity to represent real faults in Android apps. @InProceedings{ESEC/FSE17p233, author = {Mario Linares-Vásquez and Gabriele Bavota and Michele Tufano and Kevin Moran and Massimiliano Di Penta and Christopher Vendome and Carlos Bernal-Cárdenas and Denys Poshyvanyk}, title = {Enabling Mutation Testing for Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {233--244}, doi = {}, year = {2017}, } Info |
|
Verdoliva, Luisa |
ESEC/FSE '17: "Automatically Analyzing Groups ..."
Automatically Analyzing Groups of Crashes for Finding Correlations
Marco Castelluccio, Carlo Sansone, Luisa Verdoliva, and Giovanni Poggi (Federico II University of Naples, Italy; Mozilla, UK) We devised an algorithm, inspired by contrast-set mining algorithms such as STUCCO, to automatically find statistically significant properties (correlations) in crash groups. Many earlier works focused on improving the clustering of crashes but, to the best of our knowledge, the problem of automatically describing properties of a cluster of crashes is so far unexplored. This means developers currently spend a fair amount of time analyzing the groups themselves, which in turn means that a) they are not spending their time actually developing a fix for the crash; and b) they might miss something in their exploration of the crash data (there is a large number of attributes in crash reports and it is hard and error-prone to manually analyze everything). Our algorithm helps developers and release managers understand crash reports more easily and in an automated way, helping in pinpointing the root cause of the crash. The tool implementing the algorithm has been deployed on Mozilla's crash reporting service. @InProceedings{ESEC/FSE17p717, author = {Marco Castelluccio and Carlo Sansone and Luisa Verdoliva and Giovanni Poggi}, title = {Automatically Analyzing Groups of Crashes for Finding Correlations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {717--726}, doi = {}, year = {2017}, } |
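The contrast-set idea described above — find attribute values that are statistically over-represented in a crash group relative to all crash reports — can be sketched with a simple two-proportion test. This is an illustrative simplification, not the STUCCO-inspired algorithm deployed at Mozilla; thresholds and report fields are assumptions.

```python
from collections import Counter
from math import sqrt

def contrast_attributes(group, background, min_diff=0.2, z_thresh=2.0):
    """Flag (attribute, value) pairs over-represented in a crash group
    relative to the background of all crash reports (reports are dicts)."""
    n_g, n_b = len(group), len(background)
    g_counts = Counter(kv for r in group for kv in r.items())
    b_counts = Counter(kv for r in background for kv in r.items())
    flagged = []
    for kv, c in g_counts.items():
        p_g = c / n_g                      # support inside the group
        p_b = b_counts[kv] / n_b           # support overall
        p = (c + b_counts[kv]) / (n_g + n_b)   # pooled proportion
        se = sqrt(p * (1 - p) * (1 / n_g + 1 / n_b)) or 1e-9
        z = (p_g - p_b) / se               # two-proportion z statistic
        if p_g - p_b >= min_diff and z >= z_thresh:
            flagged.append((kv, round(p_g - p_b, 2)))
    return sorted(flagged, key=lambda t: -t[1])
```

A developer would then read output like `(("gpu_vendor", "X"), 0.8)` as "this crash group is dominated by reports with GPU vendor X", pointing at a likely root cause.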
|
Verma, Sahil |
ESEC/FSE '17: "Synergistic Debug-Repair of ..."
Synergistic Debug-Repair of Heap Manipulations
Sahil Verma and Subhajit Roy (IIT Kanpur, India) We present Wolverine, an integrated Debug-Repair environment for heap manipulating programs. Wolverine facilitates stepping through a concrete program execution, provides visualizations of the abstract program states (as box-and-arrow diagrams) and integrates a novel, proof-directed repair algorithm to synthesize repair patches. To provide a seamless environment, Wolverine supports "hot-patching" of the generated repair patches, enabling the programmer to continue the debug session without requiring an abort-compile-debug cycle. We also propose new debug-repair possibilities, "specification refinement" and "specification slicing" made possible by Wolverine. We evaluate our framework on 1600 buggy programs (generated using fault injection) on a variety of data-structures like singly, doubly and circular linked-lists, Binary Search Trees, AVL trees, Red-Black trees and Splay trees; Wolverine could repair all the buggy instances within reasonable time (less than 5 sec in most cases). We also evaluate Wolverine on 247 (buggy) student submissions; Wolverine could repair more than 80% of programs where the student had made a reasonable attempt. @InProceedings{ESEC/FSE17p163, author = {Sahil Verma and Subhajit Roy}, title = {Synergistic Debug-Repair of Heap Manipulations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {163--173}, doi = {}, year = {2017}, } |
|
Visser, Willem |
ESEC/FSE '17: "S3: Syntax- and Semantic-Guided ..."
S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo , Claire Le Goues , and Willem Visser (Singapore Management University, Singapore; IST Austria, Austria; Carnegie Mellon University, USA; Stellenbosch University, South Africa) A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with a difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) A systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) An efficient enumeration- based search strategy over the constrained search space, and (3) A number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3’s repair effectiveness with state-of-the-art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs. @InProceedings{ESEC/FSE17p593, author = {Xuan-Bach D. Le and Duc-Hiep Chu and David Lo and Claire Le Goues and Willem Visser}, title = {S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {593--604}, doi = {}, year = {2017}, } |
|
Wang, Chao |
ESEC/FSE '17: "Thread-Modular Static Analysis ..."
Thread-Modular Static Analysis for Relaxed Memory Models
Markus Kusano and Chao Wang (Virginia Tech, USA; University of Southern California, USA) We propose a memory-model-aware static program analysis method for accurately analyzing the behavior of concurrent software running on processors with weak consistency models such as x86-TSO, SPARC-PSO, and SPARC-RMO. At the center of our method is a unified framework for deciding the feasibility of inter-thread interferences to avoid propagating spurious data flows during static analysis and thus boost the performance of the static analyzer. We formulate the checking of interference feasibility as a set of Datalog rules which are both efficiently solvable and general enough to capture a range of hardware-level memory models. Compared to existing techniques, our method can significantly reduce the number of bogus alarms as well as unsound proofs. We implemented the method and evaluated it on a large set of multithreaded C programs. Our experiments show the method significantly outperforms state-of-the-art techniques in terms of accuracy with only moderate runtime overhead. @InProceedings{ESEC/FSE17p337, author = {Markus Kusano and Chao Wang}, title = {Thread-Modular Static Analysis for Relaxed Memory Models}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {337--348}, doi = {}, year = {2017}, } ESEC/FSE '17: "Symbolic Execution of Programmable ..." Symbolic Execution of Programmable Logic Controller Code Shengjian Guo, Meng Wu, and Chao Wang (Virginia Tech, USA; University of Southern California, USA) Programmable logic controllers (PLCs) are specialized computers for automating a wide range of cyber-physical systems. Since these systems are often safety-critical, software running on PLCs needs to be free of programming errors. However, automated tools for testing PLC software are lacking despite the pervasive use of PLCs in industry. 
We propose a symbolic execution based method, named SymPLC, for automatically testing PLC software written in programming languages specified in the IEC 61131-3 standard. SymPLC takes the PLC source code as input and translates it into C before applying symbolic execution, to systematically generate test inputs that cover both paths in each periodic task and interleavings of these tasks. Toward this end, we propose a number of PLC-specific reduction techniques for identifying and eliminating redundant interleavings. We have evaluated SymPLC on a large set of benchmark programs with both single and multiple tasks. Our experiments show that SymPLC can handle these programs efficiently, and for multi-task PLC programs, our new reduction techniques outperform the state-of-the-art partial order reduction technique by more than two orders of magnitude. @InProceedings{ESEC/FSE17p326, author = {Shengjian Guo and Meng Wu and Chao Wang}, title = {Symbolic Execution of Programmable Logic Controller Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {326--336}, doi = {}, year = {2017}, } ESEC/FSE '17: "DESCRY: Reproducing System-Level ..." DESCRY: Reproducing System-Level Concurrency Failures Tingting Yu, Tarannum S. Zaman, and Chao Wang (University of Kentucky, USA; University of Southern California, USA) Concurrent systems may fail in the field due to various elusive faults such as race conditions. Reproducing such failures is hard because (1) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for reproducing intra-process (thread-level) failures; (2) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers; and (3) the debugging environment may differ from the deployed environment, which further complicates failure reproduction. 
To address these problems, we present DESCRY, the first fully automated tool for reproducing system-level concurrency failures based only on default log messages collected from the field. DESCRY uses a combination of static and dynamic analysis techniques, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. We have evaluated DESCRY on 22 real-world multi-process Linux applications with a total of 236,875 lines of code to demonstrate both its effectiveness and its efficiency in reproducing failures that no other tool can reproduce. @InProceedings{ESEC/FSE17p694, author = {Tingting Yu and Tarannum S. Zaman and Chao Wang}, title = {DESCRY: Reproducing System-Level Concurrency Failures}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {694--704}, doi = {}, year = {2017}, } |
|
Wang, Song |
ESEC/FSE '17: "QTEP: Quality-Aware Test Case ..."
QTEP: Quality-Aware Test Case Prioritization
Song Wang, Jaechang Nam, and Lin Tan (University of Waterloo, Canada) Test case prioritization (TCP) is a practical activity in software testing for exposing faults earlier. Researchers have proposed many TCP techniques to reorder test cases. Among them, coverage-based TCPs have been widely investigated. Specifically, coverage-based TCP approaches leverage coverage information between source code and test cases, i.e., static code coverage and dynamic code coverage, to schedule test cases. Existing coverage-based TCP techniques mainly focus on maximizing coverage while often failing to consider the likely distribution of faults in the source code. However, software faults are often not equally distributed in source code; e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults. In this paper, we present a quality-aware test case prioritization technique, QTEP, to address the limitation of existing coverage-based TCP algorithms. In QTEP, we leverage code inspection techniques, i.e., a typical statistical defect prediction model and a typical static bug finder, to detect fault-prone source code and then adapt existing coverage-based TCP algorithms by considering the weighted source code in terms of fault-proneness. Our evaluation with 16 variant QTEP techniques on 33 different versions of 7 open source Java projects shows that QTEP could improve existing coverage-based TCP techniques for both regression and new test cases. Specifically, the improvement of the best variant of QTEP for regression test cases could be up to 15.0% and on average 7.6%, and for all test cases (both regression and new test cases), the improvement could be up to 10.0% and on average 5.0%. 
@InProceedings{ESEC/FSE17p523, author = {Song Wang and Jaechang Nam and Lin Tan}, title = {QTEP: Quality-Aware Test Case Prioritization}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {523--534}, doi = {}, year = {2017}, } Info |
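The quality-aware idea above amounts to a small change to the classic greedy "additional coverage" strategy: weight each code element by its predicted fault-proneness instead of counting every element equally. A minimal sketch (element names and weights are hypothetical; QTEP's actual weighting scheme is richer):

```python
def prioritize(tests, fault_proneness):
    """Greedy 'additional' prioritization: repeatedly pick the test whose
    not-yet-covered elements carry the most predicted fault-proneness.

    tests:           test name -> set of covered code elements
    fault_proneness: code element -> weight from a defect predictor/bug finder
    """
    remaining = dict(tests)
    covered = set()
    order = []
    while remaining:
        def gain(t):
            # weight of elements this test would newly cover
            return sum(fault_proneness.get(e, 0.0) for e in remaining[t] - covered)
        best = max(remaining, key=gain)
        order.append(best)
        covered |= remaining.pop(best)
    return order
```

With uniform weights this degenerates to ordinary additional-coverage prioritization; with predictor weights, a test covering one likely-buggy element can outrank a test covering many likely-clean ones.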
|
Wehaibi, Sultan |
ESEC/FSE '17: "Why Do Developers Use Trivial ..."
Why Do Developers Use Trivial Packages? An Empirical Case Study on npm
Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (Concordia University, Canada) Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call `trivial packages'. A recent incident where a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix made it imperative to question the growing use of trivial packages. Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about maintenance and the risks of breakage due to the extra dependencies trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers' beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be `deployment tested' and to have similar test, usage and community interest as non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use. @InProceedings{ESEC/FSE17p385, author = {Rabe Abdalkareem and Olivier Nourry and Sultan Wehaibi and Suhaib Mujahid and Emad Shihab}, title = {Why Do Developers Use Trivial Packages? 
An Empirical Case Study on npm}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {385--395}, doi = {}, year = {2017}, } |
|
Wei, Lili |
ESEC/FSE '17: "OASIS: Prioritizing Static ..."
OASIS: Prioritizing Static Analysis Warnings for Android Apps Based on App User Reviews
Lili Wei, Yepang Liu, and Shing-Chi Cheung (Hong Kong University of Science and Technology, China) Lint is a widely-used static analyzer for detecting bugs/issues in Android apps. However, it can generate many false warnings. One existing solution to this problem is to leverage project history data (e.g., bug fixing statistics) for warning prioritization. Unfortunately, such techniques are biased toward a project’s archived warnings and can easily miss new issues. Another weakness is that developers cannot readily relate the warnings to the impacts perceivable by users. To overcome these weaknesses, in this paper, we propose a semantics-aware approach, OASIS, to prioritizing Lint warnings by leveraging app user reviews. OASIS combines program analysis and NLP techniques to recover the intrinsic links between the Lint warnings for a given app and the user complaints on the app problems caused by the issues of concern. OASIS leverages the strength of such links to prioritize warnings. We evaluated OASIS on six popular and large-scale open-source Android apps. The results show that OASIS can effectively prioritize Lint warnings and help identify new issues that are previously unknown to app developers. @InProceedings{ESEC/FSE17p672, author = {Lili Wei and Yepang Liu and Shing-Chi Cheung}, title = {OASIS: Prioritizing Static Analysis Warnings for Android Apps Based on App User Reviews}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {672--682}, doi = {}, year = {2017}, } |
|
Wu, Fengguang |
ESEC/FSE '17: "On the Scalability of Linux ..."
On the Scalability of Linux Kernel Maintainers' Work
Minghui Zhou , Qingying Chen, Audris Mockus, and Fengguang Wu (Peking University, China; University of Tennessee, USA; Intel, China) Open source software ecosystems evolve ways to balance the workload among groups of participants ranging from core groups to peripheral groups. As ecosystems grow, it is not clear whether the mechanisms that previously made them work will continue to be relevant or whether new mechanisms will need to evolve. The impact of failure for critical ecosystems such as Linux is enormous, yet the understanding of why they function and are effective is limited. We, therefore, aim to understand how the Linux kernel sustains its growth, how to characterize the workload of maintainers, and whether or not the existing mechanisms are scalable. We quantify maintainers’ work through the files that are maintained, and the change activity and the numbers of contributors in those files. We find systematic differences among modules; these differences are stable over time, which suggests that certain architectural features, commercial interests, or module-specific practices lead to distinct sustainable equilibria. We find that most of the modules have not grown appreciably over the last decade; most growth has been absorbed by a few modules. We also find that the effort per maintainer does not increase, even though the community has hypothesized that required effort might increase. However, the distribution of work among maintainers is highly unbalanced, suggesting that a few maintainers may experience increasing workload. We find that the practice of assigning multiple maintainers to a file yields only a power of 1/2 increase in productivity. We expect that our proposed framework to quantify maintainer practices will help clarify the factors that allow rapidly growing ecosystems to be sustainable. 
@InProceedings{ESEC/FSE17p27, author = {Minghui Zhou and Qingying Chen and Audris Mockus and Fengguang Wu}, title = {On the Scalability of Linux Kernel Maintainers' Work}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {27--37}, doi = {}, year = {2017}, } Info |
|
Wu, Ke |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments and are developed under the time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17~31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
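A stochastic GUI model of the kind described is essentially a Markov chain over app states, with each transition labeled by a UI event and a probability. The sketch below shows only how test sequences are sampled from such a model (Stoat's second phase additionally mutates the probabilities via Gibbs sampling, which is omitted here); states and events are hypothetical.

```python
import random

def sample_sequence(model, start, length, rng=random.Random(0)):
    """Sample one event sequence (a test) from a stochastic GUI model.

    model: state -> list of (event, probability_weight, next_state)
    """
    state, events = start, []
    for _ in range(length):
        choices = model.get(state)
        if not choices:           # state with no outgoing transitions
            break
        weights = [w for _, w, _ in choices]
        event, _, state = rng.choices(choices, weights=weights)[0]
        events.append(event)
    return events
```

Raising the weight of rarely exercised transitions and re-sampling is what steers generation toward higher model and code coverage.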
|
Wu, Meng |
ESEC/FSE '17: "Symbolic Execution of Programmable ..."
Symbolic Execution of Programmable Logic Controller Code
Shengjian Guo, Meng Wu, and Chao Wang (Virginia Tech, USA; University of Southern California, USA) Programmable logic controllers (PLCs) are specialized computers for automating a wide range of cyber-physical systems. Since these systems are often safety-critical, software running on PLCs needs to be free of programming errors. However, automated tools for testing PLC software are lacking despite the pervasive use of PLCs in industry. We propose a symbolic execution based method, named SymPLC, for automatically testing PLC software written in programming languages specified in the IEC 61131-3 standard. SymPLC takes the PLC source code as input and translates it into C before applying symbolic execution, to systematically generate test inputs that cover both paths in each periodic task and interleavings of these tasks. Toward this end, we propose a number of PLC-specific reduction techniques for identifying and eliminating redundant interleavings. We have evaluated SymPLC on a large set of benchmark programs with both single and multiple tasks. Our experiments show that SymPLC can handle these programs efficiently, and for multi-task PLC programs, our new reduction techniques outperform the state-of-the-art partial order reduction technique by more than two orders of magnitude. @InProceedings{ESEC/FSE17p326, author = {Shengjian Guo and Meng Wu and Chao Wang}, title = {Symbolic Execution of Programmable Logic Controller Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {326--336}, doi = {}, year = {2017}, } |
|
Wüstholz, Valentin |
ESEC/FSE '17: "Failure-Directed Program Trimming ..."
Failure-Directed Program Trimming
Kostas Ferles , Valentin Wüstholz, Maria Christakis, and Isil Dillig (University of Texas at Austin, USA; University of Kent, UK) This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P′ such that P and P′ are equi-safe (i.e., P′ has a bug if and only if P has a bug), but P′ has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques. @InProceedings{ESEC/FSE17p174, author = {Kostas Ferles and Valentin Wüstholz and Maria Christakis and Isil Dillig}, title = {Failure-Directed Program Trimming}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {174--185}, doi = {}, year = {2017}, } |
|
Xie, Xiaofei |
ESEC/FSE '17: "Loopster: Static Loop Termination ..."
Loopster: Static Loop Termination Analysis
Xiaofei Xie, Bihuan Chen, Liang Zou, Shang-Wei Lin, Yang Liu, and Xiaohong Li (Tianjin University, China; Nanyang Technological University, Singapore) Loop termination is an important problem for proving the correctness of a system and ensuring that the system always reacts. Existing loop termination analysis techniques mainly depend on the synthesis of ranking functions, which is often expensive. In this paper, we present a novel approach, named Loopster, which performs an efficient static analysis to decide the termination for loops based on path termination analysis and path dependency reasoning. Loopster adopts a divide-and-conquer approach: (1) we extract individual paths from a target multi-path loop and analyze the termination of each path, (2) analyze the dependencies between each two paths, and then (3) determine the overall termination of the target loop based on the relations among paths. We evaluate Loopster by applying it on the loop termination competition benchmark and three real-world projects. The results show that Loopster is effective in a majority of loops with better accuracy and 20×+ performance improvement compared to the state-of-the-art tools. @InProceedings{ESEC/FSE17p84, author = {Xiaofei Xie and Bihuan Chen and Liang Zou and Shang-Wei Lin and Yang Liu and Xiaohong Li}, title = {Loopster: Static Loop Termination Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {84--94}, doi = {}, year = {2017}, } |
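Per-path termination analysis is easiest to see on the simplest possible path shape. The toy decision procedure below handles only loops of the form `while x <op> bound: x += delta` with a constant integer step — a deliberately tiny instance of step (1) of the approach; Loopster itself handles far richer path bodies and, crucially, the dependencies between paths.

```python
def path_terminates(op, delta):
    """Decide termination of a single loop path of the shape
        while x <op> bound: x += delta
    for any integer start value and bound, given constant integer delta.
    """
    if delta == 0:
        return False              # x never moves: diverges whenever entered
    if op in ("<", "<="):
        return delta > 0          # x must grow toward the upper bound
    if op in (">", ">="):
        return delta < 0          # x must shrink toward the lower bound
    raise ValueError(f"unsupported guard operator: {op}")
```

No ranking function is synthesized: the sign of the step relative to the guard's direction already decides the question, which is why this style of analysis can be much cheaper than ranking-function synthesis.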
|
Xu, Zhaogui |
ESEC/FSE '17: "LAMP: Data Provenance for ..."
LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation
Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, and Xiangyu Zhang (Purdue University, USA; Nanjing University, China) Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation to different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program dependence based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering. @InProceedings{ESEC/FSE17p786, author = {Shiqing Ma and Yousra Aafer and Zhaogui Xu and Wen-Chuan Lee and Juan Zhai and Yingqi Liu and Xiangyu Zhang}, title = {LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {786--797}, doi = {}, year = {2017}, } |
|
Yan, Yu |
ESEC/FSE '17: "Understanding Misunderstandings ..."
Understanding Misunderstandings in Source Code
Dan Gopstein, Jake Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, Martin K.-C. Yeh, and Justin Cappos (New York University, USA; Pennsylvania State University, USA; University of Colorado at Colorado Springs, USA) Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers, we extracted a preliminary set of 'atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback. @InProceedings{ESEC/FSE17p129, author = {Dan Gopstein and Jake Iannacone and Yu Yan and Lois DeLong and Yanyan Zhuang and Martin K.-C. Yeh and Justin Cappos}, title = {Understanding Misunderstandings in Source Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--139}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Yang, Jinqiu |
ESEC/FSE '17: "Better Test Cases for Better ..."
Better Test Cases for Better Automated Program Repair
Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan (University of Waterloo, Canada) Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches due to the inadequacies of test cases. Such overfitted patches are incorrect patches, which only make all given test cases pass but fail to fix the bugs. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test oracles (crash and memory-safety) to enhance validity checking of automatically-generated patches. Opad also uses a novel metric (named O-measure) for deciding whether automatically-generated patches overfit. Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) of the overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically-generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to the crash and memory-safety oracles employed by Opad). @InProceedings{ESEC/FSE17p831, author = {Jinqiu Yang and Alexey Zhikhartsev and Yuefei Liu and Lin Tan}, title = {Better Test Cases for Better Automated Program Repair}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {831--841}, doi = {}, year = {2017}, } |
|
Yang, Weiming |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments, and are developed under time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17–31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
|
Yang, Zijiang |
ESEC/FSE '17: "AtexRace: Across Thread and ..."
AtexRace: Across Thread and Execution Sampling for In-House Race Detection
Yu Guo, Yan Cai, and Zijiang Yang (Western Michigan University, USA; Institute of Software at Chinese Academy of Sciences, China) Data races are a major source of concurrency bugs. Dynamic data race detection tools (e.g., FastTrack) monitor the executions of a program to report data races occurring at runtime. However, such tools incur significant overhead that slows down and perturbs executions. To address the issue, state-of-the-art dynamic data race detection tools (e.g., LiteRace) apply sampling techniques to selectively monitor memory accesses. Although they reduce overhead, they also miss many data races, as confirmed by existing studies. Thus, practitioners face a dilemma: whether to use FastTrack, which detects more data races but is much slower, or LiteRace, which is faster but detects fewer data races. In this paper, we propose a new sampling approach to address the major limitations of current sampling techniques, which ignore the facts that a data race involves two threads and that a program under testing is repeatedly executed. We develop a tool called AtexRace to sample memory accesses across both threads and executions. By selectively monitoring the pairs of memory accesses that have not been frequently observed in current and previous executions, AtexRace detects as many data races as FastTrack at a cost as low as LiteRace's. We have compared AtexRace against FastTrack and LiteRace on both the Parsec benchmark suite and a large-scale real-world MySQL Server with 223 test cases. The experiments confirm that AtexRace can be a replacement for FastTrack and LiteRace. @InProceedings{ESEC/FSE17p315, author = {Yu Guo and Yan Cai and Zijiang Yang}, title = {AtexRace: Across Thread and Execution Sampling for In-House Race Detection}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {315--325}, doi = {}, year = {2017}, } |
|
Yao, Yao |
ESEC/FSE '17: "Guided, Stochastic Model-Based ..."
Guided, Stochastic Model-Based GUI Testing of Android Apps
Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su (East China Normal University, China; Nanyang Technological University, Singapore; Shanghai Jiao Tong University, China; University of California at Davis, USA) Mobile apps are ubiquitous, operate in complex environments, and are developed under time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17–31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed. @InProceedings{ESEC/FSE17p245, author = {Ting Su and Guozhu Meng and Yuting Chen and Ke Wu and Weiming Yang and Yao Yao and Geguang Pu and Yang Liu and Zhendong Su}, title = {Guided, Stochastic Model-Based GUI Testing of Android Apps}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {245--256}, doi = {}, year = {2017}, } |
|
Yeh, Martin K.-C. |
ESEC/FSE '17: "Understanding Misunderstandings ..."
Understanding Misunderstandings in Source Code
Dan Gopstein, Jake Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, Martin K.-C. Yeh, and Justin Cappos (New York University, USA; Pennsylvania State University, USA; University of Colorado at Colorado Springs, USA) Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers, we extracted a preliminary set of 'atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback. @InProceedings{ESEC/FSE17p129, author = {Dan Gopstein and Jake Iannacone and Yu Yan and Lois DeLong and Yanyan Zhuang and Martin K.-C. Yeh and Justin Cappos}, title = {Understanding Misunderstandings in Source Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--139}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Yi, Jooyong |
ESEC/FSE '17: "A Feasibility Study of Using ..."
A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments
Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roychoudhury (Innopolis University, Russia; IIT Kanpur, India; National University of Singapore, Singapore) Although intelligent tutoring systems for programming (ITSP) education have long attracted interest, their widespread use has been hindered by the difficulty of generating personalized feedback automatically. Meanwhile, automated program repair (APR) is an emerging technology that automatically fixes software bugs, and it has been shown that APR can fix bugs in large real-world software. In this paper, we study the feasibility of marrying intelligent programming tutoring and APR. We perform our feasibility study with four state-of-the-art APR tools (GenProg, AE, Angelix, and Prophet) and 661 programs written by students taking an introductory programming course. We found that when APR tools are used out of the box, only about 30% of the programs in our dataset are repaired. This low repair rate is largely due to the student programs often being significantly incorrect; in contrast, professional software for which APR has been successfully applied typically fails only a small portion of tests. To bridge this gap, we adopt in APR a new repair policy akin to the hint generation policy employed in existing ITSP. This new repair policy admits partial repairs that address part of the failing tests, which results in an 84% improvement in repair rate. We also performed a user study with 263 novice students and 37 graders, and identified an understudied problem: while novice students do not seem to know how to effectively make use of generated repairs as hints, the graders do seem to gain benefits from repairs. @InProceedings{ESEC/FSE17p740, author = {Jooyong Yi and Umair Z. Ahmed and Amey Karkare and Shin Hwei Tan and Abhik Roychoudhury}, title = {A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {740--751}, doi = {}, year = {2017}, } Info Artifacts Functional |
|
Yoga, Adarsh |
ESEC/FSE '17: "A Fast Causal Profiler for ..."
A Fast Causal Profiler for Task Parallel Programs
Adarsh Yoga and Santosh Nagarakatte (Rutgers University, USA) This paper proposes TASKPROF, a profiler that identifies parallelism bottlenecks in task parallel programs. It leverages the structure of a task parallel execution to perform fine-grained attribution of work to various parts of the program. TASKPROF’s use of hardware performance counters to perform fine-grained measurements minimizes perturbation. TASKPROF’s profile execution runs in parallel using multi-cores. TASKPROF’s causal profile enables users to estimate improvements in parallelism when a region of code is optimized, even when concrete optimizations are not yet known. We have used TASKPROF to isolate parallelism bottlenecks in twenty-three applications that use the Intel Threading Building Blocks library. Using TASKPROF, we have designed parallelization techniques in five applications to increase parallelism by an order of magnitude. Our user study indicates that developers are able to isolate performance bottlenecks with ease using TASKPROF. @InProceedings{ESEC/FSE17p15, author = {Adarsh Yoga and Santosh Nagarakatte}, title = {A Fast Causal Profiler for Task Parallel Programs}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {15--26}, doi = {}, year = {2017}, } Artifacts Functional |
|
Yoo, Shin |
ESEC/FSE '17: "Generalized Observational ..."
Generalized Observational Slicing for Tree-Represented Modelling Languages
Nicolas E. Gold, David Binkley, Mark Harman, Syed Islam, Jens Krinke, and Shin Yoo (University College London, UK; Loyola University Maryland, USA; University of East London, UK; KAIST, South Korea) Model-driven software engineering raises the abstraction level, making complex systems easier to understand than if written in textual code. Nevertheless, large complicated software systems can have large models, motivating the need for slicing techniques that reduce the size of a model. We present a generalization of observation-based slicing that allows the criterion to be defined using a variety of kinds of observable behavior and does not require any complex dependence analysis. We apply our implementation of generalized observational slicing for tree-structured representations to Simulink models. The resulting slice might be the subset of the original model responsible for an observed failure or simply the sub-model semantically related to a classic slicing criterion. Unlike its predecessors, the algorithm is also capable of slicing embedded Stateflow state machines. A study of nine real-world models drawn from four different application domains demonstrates the effectiveness of our approach at dramatically reducing Simulink model sizes for realistic observation scenarios: for 9 out of 20 cases, the resulting model has fewer than 25% of the original model's elements. @InProceedings{ESEC/FSE17p547, author = {Nicolas E. Gold and David Binkley and Mark Harman and Syed Islam and Jens Krinke and Shin Yoo}, title = {Generalized Observational Slicing for Tree-Represented Modelling Languages}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {547--558}, doi = {}, year = {2017}, } |
|
Yu, Tingting |
ESEC/FSE '17: "DESCRY: Reproducing System-Level ..."
DESCRY: Reproducing System-Level Concurrency Failures
Tingting Yu, Tarannum S. Zaman, and Chao Wang (University of Kentucky, USA; University of Southern California, USA) Concurrent systems may fail in the field due to various elusive faults such as race conditions. Reproducing such failures is hard because (1) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for reproducing intra-process (thread-level) failures; (2) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers; and (3) the debugging environment may differ from the deployed environment, which further complicates failure reproduction. To address these problems, we present DESCRY, the first fully automated tool for reproducing system-level concurrency failures based only on default log messages collected from the field. DESCRY uses a combination of static and dynamic analysis techniques, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. We have evaluated DESCRY on 22 real-world multi-process Linux applications with a total of 236,875 lines of code to demonstrate both its effectiveness and its efficiency in reproducing failures that no other tool can reproduce. @InProceedings{ESEC/FSE17p694, author = {Tingting Yu and Tarannum S. Zaman and Chao Wang}, title = {DESCRY: Reproducing System-Level Concurrency Failures}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {694--704}, doi = {}, year = {2017}, } |
|
Zaman, Tarannum S. |
ESEC/FSE '17: "DESCRY: Reproducing System-Level ..."
DESCRY: Reproducing System-Level Concurrency Failures
Tingting Yu, Tarannum S. Zaman, and Chao Wang (University of Kentucky, USA; University of Southern California, USA) Concurrent systems may fail in the field due to various elusive faults such as race conditions. Reproducing such failures is hard because (1) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for reproducing intra-process (thread-level) failures; (2) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers; and (3) the debugging environment may differ from the deployed environment, which further complicates failure reproduction. To address these problems, we present DESCRY, the first fully automated tool for reproducing system-level concurrency failures based only on default log messages collected from the field. DESCRY uses a combination of static and dynamic analysis techniques, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. We have evaluated DESCRY on 22 real-world multi-process Linux applications with a total of 236,875 lines of code to demonstrate both its effectiveness and its efficiency in reproducing failures that no other tool can reproduce. @InProceedings{ESEC/FSE17p694, author = {Tingting Yu and Tarannum S. Zaman and Chao Wang}, title = {DESCRY: Reproducing System-Level Concurrency Failures}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {694--704}, doi = {}, year = {2017}, } |
|
Zampetti, Fiorella |
ESEC/FSE '17: "Detecting Missing Information ..."
Detecting Missing Information in Bug Descriptions
Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus, Gabriele Bavota, and Vincent Ng (University of Texas at Dallas, USA; University of Sannio, Italy; Colorado State University, USA; University of Lugano, Switzerland) Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, although such information is considered extremely useful, reporters often omit it from bug reports and, to date, there is no effective way to automatically check and enforce its presence. We manually analyzed nearly 3k bug reports to understand to what extent OB, EB, and S2R are reported in bug reports and what discourse patterns reporters use to describe such information. We found that (i) while most reports contain OB (i.e., 93.5%), only 35.2% and 51.4% explicitly describe EB and S2R, respectively; and (ii) reporters recurrently use 154 discourse patterns to describe such content. Based on these findings, we designed and evaluated an automated approach to detect the absence (or presence) of EB and S2R in bug descriptions. With its best setting, our approach is able to detect missing EB (S2R) with 85.9% (69.2%) average precision and 93.2% (83%) average recall. Our approach intends to improve the quality of bug descriptions by alerting reporters about missing EB and S2R at reporting time. @InProceedings{ESEC/FSE17p396, author = {Oscar Chaparro and Jing Lu and Fiorella Zampetti and Laura Moreno and Massimiliano Di Penta and Andrian Marcus and Gabriele Bavota and Vincent Ng}, title = {Detecting Missing Information in Bug Descriptions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {396--407}, doi = {}, year = {2017}, } |
|
Zeller, Andreas |
ESEC/FSE '17: "Where Is the Bug and How Is ..."
Where Is the Bug and How Is It Fixed? An Experiment with Practitioners
Marcel Böhme, Ezekiel O. Soremekun, Sudipta Chattopadhyay, Emamurho Ugherughe, and Andreas Zeller (National University of Singapore, Singapore; Saarland University, Germany; Singapore University of Technology and Design, Singapore; SAP, Germany) Research has produced many approaches to automatically locate, explain, and repair software bugs. But do these approaches relate to the way practitioners actually locate, understand, and fix bugs? To help answer this question, we have collected a dataset named DBGBENCH: the correct fault locations, bug diagnoses, and software patches of 27 real errors in open-source C projects, consolidated from hundreds of debugging sessions of professional software engineers. Moreover, we shed light on the entire debugging process, from constructing a hypothesis to submitting a patch, and how debugging time, difficulty, and strategies vary across practitioners and types of errors. Most notably, DBGBENCH can serve as a reality check for novel automated debugging and repair techniques. @InProceedings{ESEC/FSE17p117, author = {Marcel Böhme and Ezekiel O. Soremekun and Sudipta Chattopadhyay and Emamurho Ugherughe and Andreas Zeller}, title = {Where Is the Bug and How Is It Fixed? An Experiment with Practitioners}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {117--128}, doi = {}, year = {2017}, } Info Artifacts Reusable |
|
Zhai, Juan |
ESEC/FSE '17: "LAMP: Data Provenance for ..."
LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation
Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, and Xiangyu Zhang (Purdue University, USA; Nanjing University, China) Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation into separate processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real-world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program-dependence-based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering. @InProceedings{ESEC/FSE17p786, author = {Shiqing Ma and Yousra Aafer and Zhaogui Xu and Wen-Chuan Lee and Juan Zhai and Yingqi Liu and Xiangyu Zhang}, title = {LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {786--797}, doi = {}, year = {2017}, } |
|
Zhang, Chi |
ESEC/FSE '17: "Continuous Variable-Specific ..."
Continuous Variable-Specific Resolutions of Feature Interactions
M. Hadi Zibaeenejad, Chi Zhang, and Joanne M. Atlee (University of Waterloo, Canada) Systems that are assembled from independently developed features suffer from feature interactions, in which features affect one another’s behaviour in surprising ways. The Feature Interaction Problem results from trying to implement an appropriate resolution for each interaction within each possible context, because the number of possible contexts to consider increases exponentially with the number of features in the system. Resolution strategies aim to combat the Feature Interaction Problem by offering default strategies that resolve entire classes of interactions, thereby reducing the work needed to resolve lots of interactions. However, most such approaches employ coarse-grained resolution strategies (e.g., feature priority) or a centralized arbitrator. Our work focuses on employing variable-specific default-resolution strategies that aim to resolve, at runtime, features’ conflicting actions on a system’s outputs. In this paper, we extend prior work to enable co-resolution of interactions on coupled output variables and to promote smooth, continuous resolutions over execution paths. We implemented our approach within the PreScan simulator and performed a case study involving 15 automotive features; this entailed devising and implementing three resolution strategies for three output variables. The results of the case study show that the approach produces smooth and continuous resolutions of interactions throughout interesting scenarios. @InProceedings{ESEC/FSE17p408, author = {M. Hadi Zibaeenejad and Chi Zhang and Joanne M. Atlee}, title = {Continuous Variable-Specific Resolutions of Feature Interactions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {408--418}, doi = {}, year = {2017}, } Info |
|
Zhang, Xiangyu |
ESEC/FSE '17: "LAMP: Data Provenance for ..."
LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation
Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, and Xiangyu Zhang (Purdue University, USA; Nanjing University, China) Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation into separate processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real-world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program-dependence-based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering. @InProceedings{ESEC/FSE17p786, author = {Shiqing Ma and Yousra Aafer and Zhaogui Xu and Wen-Chuan Lee and Juan Zhai and Yingqi Liu and Xiangyu Zhang}, title = {LAMP: Data Provenance for Graph Based Machine Learning Algorithms through Derivative Computation}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {786--797}, doi = {}, year = {2017}, } |
|
Zhao, Jing |
ESEC/FSE '17: "Adaptively Generating High ..."
Adaptively Generating High Quality Fixes for Atomicity Violations
Yan Cai, Lingwei Cao, and Jing Zhao (Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Harbin Engineering University, China) It is difficult to fix atomicity violations correctly. The existing gate lock algorithm (GLA) simply inserts gate locks to serialize executions, which may introduce performance bugs and deadlocks. Synthesized context-aware gate locks (by Grail) require complex source code synthesis. We propose Fixer to adaptively fix atomicity violations. It first analyses the lock acquisitions of an atomicity violation. Then it either adjusts the existing lock scope or inserts a gate lock. The former addresses cases where some locks are used but fail to provide atomic accesses. For the latter, it infers the visibility (being global or a field of a class/struct) of the gate lock such that the lock only protects related accesses. For both cases, Fixer further eliminates new lock orders to avoid introducing deadlocks. Of course, Fixer can produce both kinds of fixes on atomicity violations with locks. The experimental results on 15 previously used atomicity violations show that Fixer correctly fixed all 15 atomicity violations without introducing deadlocks. However, GLA and Grail both introduced 5 deadlocks. HFix (which only targets fixing certain types of atomicity violations) fixed only 2 atomicity violations and introduced 4 deadlocks. Fixer also provides an alternative way to insert gate locks (by inserting gate locks with proper visibility) considering fix acceptance. @InProceedings{ESEC/FSE17p303, author = {Yan Cai and Lingwei Cao and Jing Zhao}, title = {Adaptively Generating High Quality Fixes for Atomicity Violations}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {303--314}, doi = {}, year = {2017}, } |
|
Zhikhartsev, Alexey |
ESEC/FSE '17: "Better Test Cases for Better ..."
Better Test Cases for Better Automated Program Repair
Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan (University of Waterloo, Canada) Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches due to the inadequacies of test cases. Such overfitted patches are incorrect patches, which only make all given test cases pass but fail to fix the bugs. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test oracles (crash and memory-safety) to enhance validity checking of automatically-generated patches. Opad also uses a novel metric (named O-measure) for deciding whether automatically-generated patches overfit. Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) of the overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically-generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to the crash and memory-safety oracles employed by Opad). @InProceedings{ESEC/FSE17p831, author = {Jinqiu Yang and Alexey Zhikhartsev and Yuefei Liu and Lin Tan}, title = {Better Test Cases for Better Automated Program Repair}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {831--841}, doi = {}, year = {2017}, } |
|
Zhou, Minghui |
ESEC/FSE '17: "On the Scalability of Linux ..."
On the Scalability of Linux Kernel Maintainers' Work
Minghui Zhou, Qingying Chen, Audris Mockus, and Fengguang Wu (Peking University, China; University of Tennessee, USA; Intel, China) Open source software ecosystems evolve ways to balance the workload among groups of participants ranging from core groups to peripheral groups. As ecosystems grow, it is not clear whether the mechanisms that previously made them work will continue to be relevant or whether new mechanisms will need to evolve. The impact of failure for critical ecosystems such as Linux is enormous, yet the understanding of why they function and are effective is limited. We, therefore, aim to understand how the Linux kernel sustains its growth, how to characterize the workload of maintainers, and whether or not the existing mechanisms are scalable. We quantify maintainers’ work through the files that are maintained, and the change activity and the numbers of contributors in those files. We find systematic differences among modules; these differences are stable over time, which suggests that certain architectural features, commercial interests, or module-specific practices lead to distinct sustainable equilibria. We find that most of the modules have not grown appreciably over the last decade; most growth has been absorbed by a few modules. We also find that the effort per maintainer does not increase, even though the community has hypothesized that required effort might increase. However, the distribution of work among maintainers is highly unbalanced, suggesting that a few maintainers may experience increasing workload. We find that the practice of assigning multiple maintainers to a file yields only a square-root (power of 1/2) increase in productivity. We expect that our proposed framework to quantify maintainer practices will help clarify the factors that allow rapidly growing ecosystems to be sustainable. 
@InProceedings{ESEC/FSE17p27, author = {Minghui Zhou and Qingying Chen and Audris Mockus and Fengguang Wu}, title = {On the Scalability of Linux Kernel Maintainers' Work}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {27--37}, doi = {}, year = {2017}, } Info |
|
Zhuang, Yanyan |
ESEC/FSE '17: "Understanding Misunderstandings ..."
Understanding Misunderstandings in Source Code
Dan Gopstein, Jake Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, Martin K.-C. Yeh, and Justin Cappos (New York University, USA; Pennsylvania State University, USA; University of Colorado at Colorado Springs, USA) Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers we extracted a preliminary set of `atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback. @InProceedings{ESEC/FSE17p129, author = {Dan Gopstein and Jake Iannacone and Yu Yan and Lois DeLong and Yanyan Zhuang and Martin K.-C. Yeh and Justin Cappos}, title = {Understanding Misunderstandings in Source Code}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {129--139}, doi = {}, year = {2017}, } Info Best-Paper Award |
|
Zibaeenejad, M. Hadi |
ESEC/FSE '17: "Continuous Variable-Specific ..."
Continuous Variable-Specific Resolutions of Feature Interactions
M. Hadi Zibaeenejad, Chi Zhang, and Joanne M. Atlee (University of Waterloo, Canada) Systems that are assembled from independently developed features suffer from feature interactions, in which features affect one another’s behaviour in surprising ways. The Feature Interaction Problem results from trying to implement an appropriate resolution for each interaction within each possible context, because the number of possible contexts to consider increases exponentially with the number of features in the system. Resolution strategies aim to combat the Feature Interaction Problem by offering default strategies that resolve entire classes of interactions, thereby reducing the work needed to resolve many interactions. However, most such approaches employ coarse-grained resolution strategies (e.g., feature priority) or a centralized arbitrator. Our work focuses on employing variable-specific default-resolution strategies that aim to resolve at runtime features’ conflicting actions on a system’s outputs. In this paper, we extend prior work to enable co-resolution of interactions on coupled output variables and to promote smooth continuous resolutions over execution paths. We implemented our approach within the PreScan simulator and performed a case study involving 15 automotive features; this entailed our devising and implementing three resolution strategies for three output variables. The results of the case study show that the approach produces smooth and continuous resolutions of interactions throughout interesting scenarios. @InProceedings{ESEC/FSE17p408, author = {M. Hadi Zibaeenejad and Chi Zhang and Joanne M. Atlee}, title = {Continuous Variable-Specific Resolutions of Feature Interactions}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {408--418}, doi = {}, year = {2017}, } Info |
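A minimal sketch of variable-specific resolution, under assumptions of ours: the output variables, the resolver rules, and every name below are illustrative, not taken from the paper. Each output variable carries its own rule for merging the conflicting values that independent features request at runtime:

```python
# Each output variable gets its own resolution rule, rather than a
# single global priority scheme or centralized arbitrator.
def resolve(requests, resolvers):
    """requests:  {variable: [values requested by different features]}
    resolvers: {variable: function merging a list of requested values}"""
    return {var: resolvers[var](vals) for var, vals in requests.items()}

# Hypothetical rules: braking is safety-critical, so honor the strongest
# request; target speed takes the most conservative (lowest) request.
resolvers = {
    "brake_force": max,
    "target_speed": min,
}
requests = {
    "brake_force": [0.2, 0.8],   # e.g. cruise control vs. emergency braking
    "target_speed": [110, 80],   # e.g. driver setting vs. speed-limit feature
}
actions = resolve(requests, resolvers)
# actions == {"brake_force": 0.8, "target_speed": 80}
```

The paper's contribution goes further, co-resolving coupled variables and smoothing resolutions over execution paths, which this per-variable sketch does not capture.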
|
Zoppi, Edgardo |
ESEC/FSE '17: "Toward Full Elasticity in ..."
Toward Full Elasticity in Distributed Static Analysis: The Case of Callgraph Analysis
Diego Garbervetsky, Edgardo Zoppi, and Benjamin Livshits (University of Buenos Aires, Argentina; Imperial College London, UK) In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository such as GitHub, BitBucket, or Visual Studio Online will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of both synthetic and real benchmarks. To show scalability, we demonstrate how the analysis presented in this paper is able to handle inputs that are almost 10 million lines of code (LOC) in size, without running out of memory. Our results show that the analysis scales well in terms of memory pressure independently of the input size, as we add more virtual machines (VMs). As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms. @InProceedings{ESEC/FSE17p442, author = {Diego Garbervetsky and Edgardo Zoppi and Benjamin Livshits}, title = {Toward Full Elasticity in Distributed Static Analysis: The Case of Callgraph Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {442--453}, doi = {}, year = {2017}, } |
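The core of a call graph analysis like the one distributed here can be sketched with a sequential worklist (our simplification: the paper partitions this work across actor-based workers in the cloud, and the `calls` map below stands in for the per-method resolution of call sites a real analysis performs):

```python
# Worklist construction of a call graph from an entry point: analyze
# each reachable method once, recording caller -> callee edges.
def build_callgraph(entry, calls):
    edges, worklist, seen = set(), [entry], {entry}
    while worklist:
        caller = worklist.pop()
        for callee in calls.get(caller, ()):
            edges.add((caller, callee))
            if callee not in seen:      # each method analyzed at most once
                seen.add(callee)
                worklist.append(callee)
    return edges

calls = {"main": ["parse", "run"], "run": ["parse", "emit"], "dead": ["emit"]}
cg = build_callgraph("main", calls)
# "dead" is unreachable from main, so its edge never appears in cg.
```

In the distributed setting, the worklist becomes messages between actors, each owning a partition of the methods, which is what lets the analysis scale out across VMs.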
|
Zou, Liang |
ESEC/FSE '17: "Loopster: Static Loop Termination ..."
Loopster: Static Loop Termination Analysis
Xiaofei Xie, Bihuan Chen, Liang Zou, Shang-Wei Lin, Yang Liu, and Xiaohong Li (Tianjin University, China; Nanyang Technological University, Singapore) Loop termination is an important problem for proving the correctness of a system and ensuring that the system always reacts. Existing loop termination analysis techniques mainly depend on the synthesis of ranking functions, which is often expensive. In this paper, we present a novel approach, named Loopster, which performs an efficient static analysis to decide the termination of loops based on path termination analysis and path dependency reasoning. Loopster adopts a divide-and-conquer approach: (1) we extract individual paths from a target multi-path loop and analyze the termination of each path, (2) analyze the dependencies between each two paths, and then (3) determine the overall termination of the target loop based on the relations among paths. We evaluate Loopster by applying it to the loop termination competition benchmark and three real-world projects. The results show that Loopster is effective on a majority of loops, with better accuracy and a more than 20× performance improvement compared to the state-of-the-art tools. @InProceedings{ESEC/FSE17p84, author = {Xiaofei Xie and Bihuan Chen and Liang Zou and Shang-Wei Lin and Yang Liu and Xiaohong Li}, title = {Loopster: Static Loop Termination Analysis}, booktitle = {Proc.\ ESEC/FSE}, publisher = {ACM}, pages = {84--94}, doi = {}, year = {2017}, } |
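Step (1) of the approach, per-path termination analysis, can be illustrated with a toy check (a drastic simplification of ours, not Loopster's algorithm: it omits the path-dependency reasoning of steps (2) and (3), and all names are assumptions): if every path through the loop body strictly decreases a nonnegative integer measure, the loop must terminate.

```python
# Toy per-path termination check: each entry is the change a loop path
# applies to a nonnegative measure (e.g. n - x for `while x < n`);
# a negative change means that path makes progress toward the exit.
def all_paths_decrease(path_effects):
    return all(delta < 0 for delta in path_effects)

# while x < n:  path A does x += 1  -> measure n - x changes by -1
#               path B does x += 2  -> measure changes by -2
assert all_paths_decrease([-1, -2])      # every path makes progress: terminates

# A path that leaves the measure unchanged breaks this simple argument,
# which is where reasoning about dependencies between paths comes in.
assert not all_paths_decrease([-1, 0])
```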
264 authors