Powered by
Conference Publishing Consulting

36th International Conference on Software Engineering (ICSE 2014), May 31 – June 7, 2014, Hyderabad, India

ICSE 2014 – Proceedings

Contents - Abstracts - Authors
Online Calendar - iCal File
Twitter: https://twitter.com/ICSEconf


Title Page

Message from the Chairs
Welcome to the 36th International Conference on Software Engineering, set in Hyderabad, India. On behalf of the entire Organizing Committee, it is our distinct pleasure to invite you to participate, not just in the official program of the conference and its full assortment of activities, but also in enjoying the beautiful history, customs, and surroundings of the city and people of Hyderabad.


Technical Research

Perspectives on Software Engineering

Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development?
Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan
(North Carolina State University, USA; Microsoft Research, USA)
Video games make up an important part of the software industry, yet the software engineering community rarely studies video games. This imbalance is a problem if video game development differs from general software development, as some game experts suggest. In this paper we describe a study with 14 interviewees and 364 survey respondents. The study elicited substantial differences between video game development and other software development. For example, in game development, “cowboy coders” are necessary to cope with the continuous interplay between creative desires and technical constraints. Consequently, game developers are hesitant to use automated testing because of these tests’ rapid obsolescence in the face of shifting creative desires of game designers. These differences between game and non-game development have implications for research, industry, and practice. For instance, as a starting point for impacting game development, researchers could create testing tools that enable game developers to create tests that assert flexible behavior with little up-front investment.
Publisher's Version Article Search
Analyze This! 145 Questions for Data Scientists in Software Engineering
Andrew Begel and Thomas Zimmermann
(Microsoft Research, USA)
In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like data scientists to investigate about software, about software processes and practices, and about software engineers. Our analyses resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate these 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also saw opposition to questions that assess the performance of individual employees or compare them with one another. Our categorization and catalog of 145 questions can help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.
Publisher's Version Article Search Info
The Dimensions of Software Engineering Success
Paul Ralph and Paul Kelly
(Lancaster University, UK)
Software engineering research and practice are hampered by the lack of a well-understood, top-level dependent variable. Recent initiatives on General Theory of Software Engineering suggest a multifaceted variable – Software Engineering Success. However, its exact dimensions are unknown. This paper investigates the dimensions (not causes) of software engineering success. An interdisciplinary sample of 191 design professionals (68 in the software industry) were interviewed concerning their perceptions of success. Non-software designers (e.g. architects) were included to increase the breadth of ideas and facilitate comparative analysis. Transcripts were subjected to supervised, semi-automated semantic content analysis, including a software developer vs. other professionals comparison. Findings suggest that participants view their work as time-constrained projects with explicit clients and other stakeholders. Success depends on stakeholder impacts – financial, social, physical and emotional – and is understood through feedback. Concern with meeting explicit requirements is peculiar to software engineering and design is not equated with aesthetics in many other fields. Software engineering success is a complex multifaceted variable, which cannot sufficiently be explained by traditional dimensions including user satisfaction, profitability or meeting requirements, budgets and schedules. A proto-theory of success is proposed, which models success as the net impact on a particular stakeholder at a particular time. Stakeholder impacts are driven by project efficiency, artifact quality and market performance. Success is not additive, e.g., ‘low’ success for clients does not average with ‘high’ success for developers to make ‘moderate’ success overall; rather, a project may be simultaneously successful and unsuccessful from different perspectives.
Publisher's Version Article Search
How Do Professionals Perceive Legacy Systems and Software Modernization?
Ravi Khadka, Belfrit V. Batlajery, Amir M. Saeidi, Slinger Jansen, and Jurriaan Hage
(Utrecht University, Netherlands)
Existing research in legacy system modernization has traditionally focused on technical challenges, and takes the standpoint that legacy systems are obsolete, yet crucial for an organization's operation. Nonetheless, it remains unclear whether practitioners in the industry also share this perception. This paper describes the outcome of an exploratory study in which 26 industrial practitioners were interviewed on what makes a software system a legacy system, what the main drivers are that lead to the modernization of such systems, and what challenges are faced during the modernization process. The findings of the interviews have been validated by means of a survey with 198 respondents. The results show that practitioners value their legacy systems highly, the challenges they face are not just technical, but also include business and organizational aspects.
Publisher's Version Article Search

Testing 1

SimRT: An Automated Framework to Support Regression Testing for Data Races
Tingting Yu, Witawas Srisa-an, and Gregg Rothermel
(University of Nebraska-Lincoln, USA)
Concurrent programs are prone to various classes of difficult-to-detect faults, of which data races are particularly prevalent. Prior work has attempted to increase the cost-effectiveness of approaches for testing for data races by employing race detection techniques, but to date, no work has considered cost-effective approaches for re-testing for races as programs evolve. In this paper we present SimRT, an automated regression testing framework for use in detecting races introduced by code modifications. SimRT employs a regression test selection technique, focused on sets of program elements related to race detection, to reduce the number of test cases that must be run on a changed program to detect races that occur due to code modifications, and it employs a test case prioritization technique to improve the rate at which such races are detected. Our empirical study of SimRT reveals that it is more efficient and effective for revealing races than other approaches, and that its constituent test selection and prioritization components each contribute to its performance.
Publisher's Version Article Search
Performance Regression Testing Target Prioritization via Performance Risk Analysis
Peng Huang, Xiao Ma, Dongcai Shen, and Yuanyuan Zhou
(University of California at San Diego, USA; University of Illinois at Urbana-Champaign, USA)
As software evolves, problematic changes can significantly degrade software performance, i.e., introducing performance regression. Performance regression testing is an effective way to reveal such issues in early stages. Yet because of its high overhead, this activity is usually performed infrequently. Consequently, when performance regression issue is spotted at a certain point, multiple commits might have been merged since last testing. Developers have to spend extra time and efforts narrowing down which commit caused the problem. Existing efforts try to improve performance regression testing efficiency through test case reduction or prioritization. In this paper, we propose a new lightweight and white-box approach, performance risk analysis (PRA), to improve performance regression testing efficiency via testing target prioritization. The analysis statically evaluates a given source code commit's risk in introducing performance regression. Performance regression testing can leverage the analysis result to test commits with high risks first while delaying or skipping testing on low-risk commits. To validate this idea's feasibility, we conduct a study on 100 real-world performance regression issues from three widely used, open-source software. Guided by insights from the study, we design PRA and build a tool, PerfScope. Evaluation on the examined problematic commits shows our tool can successfully alarm 91% of them. Moreover, on 600 randomly picked new commits from six large-scale software, with our tool, developers just need to test only 14-22% of the 600 commits and will still be able to alert 87-95% of the commits with performance regression.
Publisher's Version Article Search Video Info
Code Coverage for Suite Evaluation by Developers
Rahul Gopinath, Carlos Jensen, and Alex Groce
(Oregon State University, USA)
One of the key challenges of developers testing code is determining a test suite's quality -- its ability to find faults. The most common approach is to use code coverage as a measure for test suite quality, and diminishing returns in coverage or high absolute coverage as a stopping rule. In testing research, suite quality is often evaluated by a suite's ability to kill mutants (artificially seeded potential faults). Determining which criteria best predict mutation kills is critical to practical estimation of test suite quality. Previous work has only used small sets of programs, and usually compares multiple suites for a single program. Practitioners, however, seldom compare suites --- they evaluate one suite. Using suites (both manual and automatically generated) from a large set of real-world open-source projects shows that evaluation results differ from those for suite-comparison: statement (not block, branch, or path) coverage predicts mutation kills best.
Publisher's Version Article Search
Time Pressure: A Controlled Experiment of Test Case Development and Requirements Review
Mika V. Mäntylä, Kai Petersen, Timo O. A. Lehtinen, and Casper Lassenius
(Aalto University, Finland; Blekinge Institute of Technology, Sweden)
Time pressure is prevalent in the software industry in which shorter and shorter deadlines and high customer demands lead to increasingly tight deadlines. However, the effects of time pressure have received little attention in software engineering research. We performed a controlled experiment on time pressure with 97 observations from 54 subjects. Using a two-by-two crossover design, our subjects performed requirements review and test case development tasks. We found statistically significant evidence that time pressure increases efficiency in test case development (high effect size Cohen’s d=1.279) and in requirements review (medium effect size Cohen’s d=0.650). However, we found no statistically significant evidence that time pressure would decrease effectiveness or cause adverse effects on motivation, frustration or perceived performance. We also investigated the role of knowledge but found no evidence of the mediating role of knowledge in time pressure as suggested by prior work, possibly due to our subjects. We conclude that applying moderate time pressure for limited periods could be used to increase efficiency in software engineering tasks that are well structured and straight forward.
Publisher's Version Article Search Info


Verifying Component and Connector Models against Crosscutting Structural Views
Shahar Maoz, Jan Oliver Ringert, and Bernhard Rumpe
(Tel Aviv University, Israel; RWTH Aachen University, Germany)
The structure of component and connector (C&C) models, which are used in many application domains of software engineering, consists of components at different containment levels, their typed input and output ports, and the connectors between them. C&C views, which we have presented at FSE'13, can be used to specify structural properties of C&C models in an expressive and intuitive way. In this work we address the verification of a C&C model against a C&C view and present efficient (polynomial) algorithms to decide satisfaction. A unique feature of our work, not present in existing approaches to checking structural properties of C&C models, is the generation of witnesses for satisfaction/non-satisfaction and of short natural-language texts, which serve to explain and formally justify the verification results and point the engineer to its causes. A prototype tool and an evaluation over four example systems with multiple views, performance and scalability experiments, as well as a user study of the usefulness of the witnesses for engineers, demonstrate the contribution of our work to the state-of-the-art in component and connector modeling and analysis.
Publisher's Version Article Search
TradeMaker: Automated Dynamic Analysis of Synthesized Tradespaces
Hamid Bagheri, Chong Tang, and Kevin Sullivan
(George Mason University, USA; University of Virginia, USA)
System designers today are focusing less on point solutions for complex systems and more on design spaces, often with a focus on understanding tradeoffs among non-functional properties across such spaces. This shift places a premium on the efficient comparative evaluation of non-functional properties of designs in such spaces. While static analysis of designs will sometimes suffice, often one must run designs dynamically, under comparable loads, to determine properties and tradeoffs. Yet variant designs often present variant interfaces, requiring that common loads be specialized to many interfaces. The main contributions of this paper are a mathematical framework, architecture, and tool for specification-driven synthesis of design spaces and common loads specialized to individual designs for dynamic tradeoff analysis of non-functional properties in large design spaces. To test our approach we used it to run an experiment to test the validity of static metrics for object-relational database mappings, requiring design space and load synthesis for, and dynamic analysis of, hundreds of database designs.
Publisher's Version Article Search
Lifting Model Transformations to Product Lines
Rick Salay, Michalis Famelis, Julia Rubin, Alessio Di Sandro, and Marsha Chechik
(University of Toronto, Canada)
Software product lines and model transformations are two techniques used in industry for managing the development of highly complex software. Product line approaches simplify the handling of software variants while model transformations automate software manipulations such as refactoring, optimization, code generation, etc. While these techniques are well understood independently, combining them to get the benefit of both poses a challenge because most model transformations apply to individual models while model-level product lines represent sets of models. In this paper, we address this challenge by providing an approach for automatically ``lifting'' model transformations so that they can be applied to product lines. We illustrate our approach using a case study and evaluate it through a set of experiments.
Publisher's Version Article Search
Automated Goal Operationalisation Based on Interpolation and SAT Solving
Renzo Degiovanni, Dalal Alrajeh, Nazareno Aguirre, and Sebastian Uchitel
(Universidad Nacional de Río Cuarto, Argentina; Imperial College London, UK; Universidad de Buenos Aires, Argentina)
Goal oriented methods have been successfully employed for eliciting and elaborating software requirements. When goals are assigned to an agent, they have to be operationalised: the agent’s operations have to be refined, by equipping them with appropriate enabling and triggering conditions, so that the goals are fulfilled. Goal operationalisation generally demands a significant effort of the engineer. Although there exist approaches that tackle this problem, they are either informal or at most semi automated, requiring the engineer to assist in the process. In this paper, we present an approach for goal operationalisation that automatically computes required preconditions and required triggering conditions for operations, so that the resulting operations establish the goals. The process is iterative, is able to deal with safety goals and particular kinds of liveness goals, and is based on the use of interpolation and SAT solving.
Publisher's Version Article Search Video

Configuration, Variability, and Clones

Mining Configuration Constraints: Static Analyses and Empirical Results
Sarah Nadi, Thorsten Berger, Christian Kästner, and Krzysztof Czarnecki
(University of Waterloo, Canada; IT University of Copenhagen, Denmark; Carnegie Mellon University, USA)
Highly-configurable systems allow users to tailor the software to their specific needs. Not all combinations of configuration options are valid though, and constraints arise for technical or non-technical reasons. Explicitly describing these constraints in a variability model allows reasoning about the supported configurations. To automate creating variability models, we need to identify the origin of such configuration constraints. We propose an approach which uses build-time errors and a novel feature-effect heuristic to automatically extract configuration constraints from C code. We conduct an empirical study on four highly-configurable open-source systems with existing variability models having three objectives in mind: evaluate the accuracy of our approach, determine the recoverability of existing variability-model constraints using our analysis, and classify the sources of variability-model constraints. We find that both our extraction heuristics are highly accurate (93% and 77% respectively), and that we can recover 19% of the existing variability-models using our approach. However, we find that many of the remaining constraints require expert knowledge or more expensive analyses. We argue that our approach, tooling, and experimental results support researchers and practitioners working on variability model re-engineering, evolution, and consistency-checking techniques.
Publisher's Version Article Search Info
Which Configuration Option Should I Change?
Sai Zhang and Michael D. Ernst
(University of Washington, USA)
Modern software often exposes configuration options that enable users to customize its behavior. During software evolution, developers may change how the configuration options behave. When upgrading to a new software version, users may need to re-configure the software by changing the values of certain configuration options. This paper addresses the following question during the evolution of a configurable software system: which configuration options should a user change to maintain the software's desired behavior? This paper presents a technique (and its tool implementation, called ConfSuggester) to troubleshoot configuration errors caused by software evolution. ConfSuggester uses dynamic profiling, execution trace comparison, and static analysis to link the undesired behavior to its root cause - a configuration option whose value can be changed to produce desired behavior from the new software version. We evaluated ConfSuggester on 8 configuration errors from 6 configurable software systems written in Java. For 6 errors, the rootcause configuration option was ConfSuggester's first suggestion. For 1 error, the root cause was ConfSuggester's third suggestion. The root cause of the remaining error was ConfSuggester's sixth suggestion. Overall, ConfSuggester produced significantly better results than two existing techniques. ConfSuggester runs in just a few minutes, making it an attractive alternative to manual debugging.
Publisher's Version Article Search
Detecting Differences across Multiple Instances of Code Clones
Yun Lin, Zhenchang Xing, Yinxing Xue, Yang Liu, Xin Peng, Jun Sun, and Wenyun Zhao
(Fudan University, China; Nanyang Technological University, Singapore; National University of Singapore, Singapore; Singapore University of Technology and Design, Singapore)
Clone detectors find similar code fragments (i.e., instances of code clones) and report large numbers of them for industrial systems. To maintain or manage code clones, developers often have to investigate differences of multiple cloned code fragments. However,existing program differencing techniques compare only two code fragments at a time. Developers then have to manually combine several pairwise differencing results. In this paper, we present an approach to automatically detecting differences across multiple clone instances. We have implemented our approach as an Eclipse plugin and evaluated its accuracy with three Java software systems. Our evaluation shows that our algorithm has precision over 97.66% and recall over 95.63% in three open source Java projects. We also conducted a user study of 18 developers to evaluate the usefulness of our approach for eight clone-related refactoring tasks. Our study shows that our approach can significantly improve developers’performance in refactoring decisions, refactoring details, and task completion time on clone-related refactoring tasks. Automatically detecting differences across multiple clone instances also opens opportunities for building practical applications of code clones in software maintenance, such as auto-generation of application skeleton, intelligent simultaneous code editing.
Publisher's Version Article Search
Achieving Accuracy and Scalability Simultaneously in Detecting Application Clones on Android Markets
Kai Chen, Peng Liu, and Yingjun Zhang
(Pennsylvania State University, USA; Institute of Information Engineering at Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China)
Besides traditional problems such as potential bugs, (smartphone) application clones on Android markets bring new threats. That is, attackers clone the code from legitimate Android applications, assemble it with malicious code or advertisements, and publish these ``purpose-added" app clones on the same or other markets for benefits. Three inherent and unique characteristics make app clones difficult to detect by existing techniques: a billion opcode problem caused by cross-market publishing, gap between code clones and app clones, and prevalent Type 2 and Type 3 clones. Existing techniques achieve either accuracy or scalability, but not both. To achieve both goals, we use a geometry characteristic, called centroid, of dependency graphs to measure the similarity between methods (code fragments) in two apps. Then we synthesize the method-level similarities and draw a Y/N conclusion on app (core functionality) cloning. The observed ``centroid effect" and the inherent ``monotonicity" property enable our approach to achieve both high accuracy and scalability. We implemented the app clone detection system and evaluated it on five whole Android markets (including 150,145 apps, 203 million methods and 26 billion opcodes). It takes less than one hour to perform cross-market app clone detection on the five markets after generating centroids only once.
Publisher's Version Article Search

Social Aspects of Software Engineering

Two's Company, Three's a Crowd: A Case Study of Crowdsourcing Software Development
Klaas-Jan Stol and Brian Fitzgerald
(Lero, Ireland; University of Limerick, Ireland)
Crowdsourcing is an emerging and promising approach which involves delegating a variety of tasks to an unknown workforce - the crowd. Crowdsourcing has been applied quite successfully in various contexts from basic tasks on Amazon Mechanical Turk to solving complex industry problems, e.g. InnoCentive. Companies are increasingly using crowdsourcing to accomplish specific software development tasks. However, very little research exists on this specific topic. This paper presents an in-depth industry case study of crowdsourcing software development at a multinational corporation. Our case study highlights a number of challenges that arise when crowdsourcing software development. For example, the crowdsourcing development process is essentially a waterfall model and this must eventually be integrated with the agile approach used by the company. Crowdsourcing works better for specific software development tasks that are less complex and stand-alone without interdependencies. The development cost was much greater than originally expected, overhead in terms of company effort to prepare specifications and answer crowdsourcing community queries was much greater, and the time-scale to complete contests, review submissions and resolve quality issues was significant. Finally, quality issues were pushed later in the lifecycle given the lengthy process necessary to identify and resolve quality issues. Given the emphasis in software engineering on identifying bugs as early as possible, this is quite problematic.
Publisher's Version Article Search Info
Does Latitude Hurt while Longitude Kills? Geographical and Temporal Separation in a Large Scale Software Development Project
Patrick Wagstrom and Subhajit Datta
(IBM Research, USA; Singapore University of Technology and Design, Singapore)
Distributed software development allows firms to leverage cost advantages and place work near centers of competency. This distribution comes at a cost -- distributed teams face challenges from differing cultures, skill levels, and a lack of shared working hours. In this paper we examine whether and how geographic and temporal separation in a large scale distributed software development influences developer interactions. We mine the work item trackers for a large commercial software project with a globally distributed development team. We examine both the time to respond and the propensity of individuals to respond and find that when taken together, geographic distance has little effect, while temporal separation has a significant negative impact on the time to respond. However, both have little impact on the social network of individuals in the organization. These results suggest that while temporally distributed teams do communicate, it is at a slower rate, and firms may wish to locate partner teams in similar time zones for maximal performance.
Publisher's Version Article Search
Software Engineering at the Speed of Light: How Developers Stay Current using Twitter
Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey
(University of Victoria, Canada; Federal University of Rio Grande do Norte, Brazil)
The microblogging service Twitter has over 500 million users posting over 500 million tweets daily. Research has established that software developers use Twitter in their work, but this has not yet been examined in detail. Twitter is an important medium in some software engineering circles—understanding its use could lead to improved support, and learning more about the reasons for non-adoption could inform the design of improved tools. In a qualitative study, we surveyed 271 and interviewed 27 developers active on GitHub. We find that Twitter helps them keep up with the fast-paced development landscape. They use it to stay aware of industry changes, for learning, and for building relationships. We discover the challenges they experience and extract their coping strategies. Some developers do not want to or cannot embrace Twitter for their work—we show their reasons and alternative channels. We validate our findings in a follow-up survey with more than 1,200 respondents.
Publisher's Version Article Search Info
Building It Together: Synchronous Development in OSS
Qi Xuan and Vladimir Filkov
(Zhejiang University of Technology, China; University of California at Davis, USA)
In distributed software development synchronized actions are important for completion of complex, interleaved tasks that require the abilities of multiple people. Synchronous development is manifested when file commits by two developers are close together in time and modify the same files. Here we propose quantitative methods for identifying synchronized activities in OSS projects, and use them to relate developer synchronization with effective productivity and communication. In particular, we define co-commit bursts and communication bursts, as intervals of time rich in co-commit and correspondence activities, respectively, and construct from them smoothed time series which can be, subsequently, correlated to discover synchrony. We find that synchronized co-commits between developers are associated with their effective productivity and coordination: during co-commit bursts, vs. at other times, the project size grows faster even though the overall coding effort slows down. We also find strong correlation between synchronized co-commits and communication, that is, for pairs of developers, more co-commit bursts are accompanied with more communication bursts, and their relationship follows closely a linear model. In addition, synchronized co-commits and communication activities occur very close together in time, thus, they can also be thought of as synchronizing each other. This study can help with better understanding collaborative mechanisms in OSS and the role communication plays in distributed software engineering.
Publisher's Version Article Search


A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair
Martin Monperrus
(University of Lille, France; INRIA, France)
At ICSE'2013, there was the first session ever dedicated to automatic program repair. In this session, Kim et al. presented PAR, a novel template-based approach for fixing Java bugs. We strongly disagree with key points of this paper. Our critical review has two goals. First, we aim at explaining why we disagree with Kim and colleagues and why the reasons behind this disagreement are important for research on automatic software repair in general. Second, we aim at contributing to the field with a clarification of the essential ideas behind automatic software repair. In particular we discuss the main evaluation criteria of automatic software repair: understandability, correctness and completeness. We show that depending on how one sets up the repair scenario, the evaluation goals may be contradictory. Eventually, we discuss the nature of fix acceptability and its relation to the notion of software correctness.
Publisher's Version Article Search
Data-Guided Repair of Selection Statements
Divya Gopinath, Sarfraz Khurshid, Diptikalyan Saha, and Satish Chandra
(University of Texas at Austin, USA; IBM Research, India; Samsung Electronics, USA)
Database-centric programs form the backbone of many enterprise systems. Fixing defects in such programs takes much human effort due to the interplay between imperative code and database-centric logic. This paper presents a novel data-driven approach for automated fixing of bugs in the selection condition of database statements (e.g., WHERE clause of SELECT statements) – a common form of bugs in such programs. Our key observation is that in real-world data, there is information latent in the distribution of data that can be useful to repair selection conditions efficiently. Given a faulty database program and input data, only a part of which induces the defect, our novelty is in determining the correct behavior for the defect-inducing data by taking advantage of the information revealed by the rest of the data. We accomplish this by employing semi-supervised learning to predict the correct behavior for defect-inducing data and by patching up any inaccuracies in the prediction by a SAT-based combinatorial search. Next, we learn a compact decision tree for the correct behavior, including the correct behavior on the defect-inducing data. This tree suggests a plausible fix to the selection condition. We demonstrate the feasibility of our approach on seven realworld examples.
Publisher's Version Article Search
The Strength of Random Search on Automated Program Repair
Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang
(National University of Defense Technology, China)
Automated program repair recently received considerable attentions, and many techniques on this research area have been proposed. Among them, two genetic-programming-based techniques, GenProg and Par, have shown the promising results. In particular, GenProg has been used as the baseline technique to check the repair effectiveness of new techniques in much literature. Although GenProg and Par have shown their strong ability of fixing real-life bugs in nontrivial programs, to what extent GenProg and Par can benefit from genetic programming, used by them to guide the patch search process, is still unknown. To address the question, we present a new automated repair technique using random search, which is commonly considered much simpler than genetic programming, and implement a prototype tool called RSRepair. Experiment on 7 programs with 24 versions shipping with real-life bugs suggests that RSRepair, in most cases (23/24), outperforms GenProg in terms of both repair effectiveness (requiring fewer patch trials) and efficiency (requiring fewer test case executions), justifying the stronger strength of random search over genetic programming. According to experimental results, we suggest that every proposed technique using optimization algorithm should check its effectiveness by comparing it with random search.
Publisher's Version Article Search
MintHint: Automated Synthesis of Repair Hints
Shalini Kaleeswaran, Varun Tulsian, Aditya Kanade, and Alessandro Orso
(Indian Institute of Science, India; Georgia Tech, USA)
Being able to automatically repair programs is at the same time a very compelling vision and an extremely challenging task. In this paper, we present MintHint, a novel technique for program repair that is a departure from most of today’s approaches. Instead of trying to fully automate program repair, which is often an unachievable goal, MintHint performs statistical correlation analysis to identify expressions that are likely to occur in the repaired code and generates, using pattern-matching based synthesis, repair hints from these expressions. Intuitively, these hints suggest how to rectify a faulty statement and help developers find a complete, actual repair. We also present an empirical evaluation of MintHint in two parts. The first part is a user study that shows that, when debugging, developers’ productivity improved manyfold with the use of repair hints—instead of traditional fault localization information alone. The second part consists of applying MintHint to several faults in Unix utilities to further assess the effectiveness of the approach. Our results show that MintHint performs well even in common situations where (1) the repair space searched does not contain the exact repair, and (2) the operational specification obtained from the test cases for repair is incomplete or even imprecise, which can be challenging for approaches aiming at fully automated repair.
Publisher's Version Article Search Video

Formal Analysis

Mining Behavior Models from User-Intensive Web Applications
Carlo Ghezzi, Mauro Pezzè, Michele Sama, and Giordano Tamburrelli
(Politecnico di Milano, Italy; University of Lugano, Switzerland; Touchtype, UK)
Many modern user-intensive applications, such as Web applications, must satisfy the interaction requirements of thousands if not millions of users, which can be hardly fully understood at design time. Designing applications that meet user behaviors, by efficiently supporting the prevalent navigation patterns, and evolving with them requires new approaches that go beyond classic software engineering solutions. We present a novel approach that automates the acquisition of user-interaction requirements in an incremental and reflective way. Our solution builds upon inferring a set of probabilistic Markov models of the users' navigational behaviors, dynamically extracted from the interaction history given in the form of a log file. We annotate and analyze the inferred models to verify quantitative properties by means of probabilistic model checking. The paper investigates the advantages of the approach referring to a Web application currently in use.
Publisher's Version Article Search
Reviser: Efficiently Updating IDE-/IFDS-Based Data-Flow Analyses in Response to Incremental Program Changes
Steven Arzt and Eric Bodden
(TU Darmstadt, Germany; Fraunhofer SIT, Germany)
Most application code evolves incrementally, and especially so when being maintained after the applications have been deployed. Yet, most data-flow analyses do not take advantage of this fact. Instead they require clients to recompute the entire analysis even if little code has changed—a time consuming undertaking, especially with large libraries or when running static analyses often, e.g., on a continuous-integration server. In this work, we present Reviser, a novel approach for automatically and efficiently updating inter-procedural dataflow analysis results in response to incremental program changes. Reviser follows a clear-and-propagate philosophy, aiming at clearing and recomputing analysis information only where required, thereby greatly reducing the required computational effort. The Reviser algorithm is formulated as an extension to the IDE framework for Inter-procedural Finite Distributed Environment problems and automatically updates arbitrary IDE-based analyses. We have implemented Reviser as an open-source extension to the Heros IFDS/IDE solver and the Soot program-analysis framework. An evaluation of Reviser on various client analyses and target programs shows performance gains of up to 80% in comparison to a full recomputation. The experiments also show Reviser to compute the same results as a full recomputation on all instances tested.
Publisher's Version Article Search Info
Automated Design of Self-Adaptive Software with Control-Theoretical Formal Guarantees
Antonio Filieri, Henry Hoffmann, and Martina Maggio
(University of Stuttgart, Germany; University of Chicago, USA; Lund University, Sweden)
Self-adaptation enables software to execute successfully in dynamic, unpredictable, and uncertain environments. Control theory provides a broad set of mathematically grounded techniques for adapting the behavior of dynamic systems. While it has been applied to specific software control problems, it has proved difficult to define methodologies allowing non-experts to systematically apply control techniques to create adaptive software. These difficulties arise because computer systems are usually non-linear, with varying workloads and heterogeneous components, making it difficult to model software as a dynamic system; i.e., by means of differential or difference equations. This paper proposes a broad scope methodology for automatically constructing both an approximate dynamic model of a software system and a suitable controller for managing its non-functional requirements. Despite its generality, this methodology provides formal guarantees concerning the system's dynamic behavior by keeping its model continuously updated to compensate for changes in the execution environment and effects of the initial approximation. We apply the methodology to three case studies, demonstrating its generality by tackling different domains (and different non-functional requirements) with the same approach. Being broadly applicable and fully automated, this methodology may allow the adoption of control theoretical solutions (and their formal properties) for a wide range of software adaptation problems.
Publisher's Version Article Search
Perturbation Analysis of Stochastic Systems with Empirical Distribution Parameters
Guoxin Su and David S. Rosenblum
(National University of Singapore, Singapore)
Probabilistic model checking is a quantitative verification technology for computer systems and has been the focus of intense research for over a decade. While in many circumstances of probabilistic model checking it is reasonable to anticipate a possible discrepancy between a stochastic model and a real-world system it represents, the state-of-the-art provides little account for the effects of this discrepancy on verification results. To address this problem, we present a perturbation approach in which quantities such as transition probabilities in the stochastic model are allowed to be perturbed from their measured values. We present a rigorous mathematical characterization for variations that can occur to verification results in the presence of model perturbations. The formal treatment is based on the analysis of a parametric variant of discrete-time Markov chains, called parametric Markov chains (PMCs), which are equipped with a metric to measure their perturbed vector variables. We employ an asymptotic method from perturbation theory to compute two forms of perturbation bounds, namely condition numbers and quadratic bounds, for automata-based verification of PMCs. We also evaluate our approach with case studies on variant models for three widely studied systems, the Zeroconf protocol, the Leader Election Protocol and the NAND Multiplexer.
Publisher's Version Article Search

Configuration Management

How Do Centralized and Distributed Version Control Systems Impact Software Changes?
Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig
(Oregon State University, USA)
Distributed Version Control Systems (DVCS) have seen an increase in popularity relative to traditional Centralized Version Control Systems (CVCS). Yet we know little on whether developers are benefitting from the extra power of DVCS. Without such knowledge, researchers, developers, tool builders, and team managers are in the danger of making wrong assumptions. In this paper we present the first in-depth, large scale empirical study that looks at the influence of DVCS on the practice of splitting, grouping, and committing changes. We recruited 820 participants for a survey that sheds light into the practice of using DVCS. We also analyzed 409M lines of code changed by 358300 commits, made by 5890 developers, in 132 repositories containing a total of 73M LOC. Using this data, we uncovered some interesting facts. For example, (i) commits made in distributed repositories were 32% smaller than the centralized ones, (ii) developers split commits more often in DVCS, and (iii) DVCS commits are more likely to have references to issue tracking labels.
Publisher's Version Article Search Info
Transition from Centralized to Decentralized Version Control Systems: A Case Study on Reasons, Barriers, and Outcomes
Kıvanç Muşlu, Christian Bird, Nachiappan Nagappan, and Jacek Czerwonka
(University of Washington, USA; Microsoft Research, USA; Microsoft, USA)
In recent years, software development has started to transition from centralized version control systems (CVCSs) to decentralized version control systems (DVCSs). Although CVCSs and DVCSs have been studied extensively, there has been little research on the transition across these systems. This paper investigates the transition process, from the developer’s view, in a large company. The paper captures the transition reasons, barriers, and outcomes through 10 developer interviews, and investigates these findings through a survey, participated by 70 developers. The paper identifies that the majority of the developers need to work incrementally and offline, and manage multiple contexts efficiently. DVCSs fulfill these developer needs; however the transition comes with a cost depending on the previous development workflow. The paper discusses the transition reasons, barriers and outcomes, and provides recommendations for teams planning such a transition. The paper shows that lightweight branches, and local and incremental commits were the main reasons for developers wanting to move to a DVCS. Further, the paper identifies the main problems with the transition process as: steep DVCS learning curve; incomplete DVCS integration with the rest of the development workflow; and DVCS scaling issues.
Publisher's Version Article Search
An Exploratory Study of the Pull-Based Software Development Model
Georgios Gousios, Martin Pinzger, and Arie van Deursen
(Delft University of Technology, Netherlands; University of Klagenfurt, Austria)
The advent of distributed version control systems has led to the development of a new paradigm for distributed software development; instead of pushing changes to a central repository, developers pull them from other repositories and merge them locally. Various code hosting sites, notably Github, have tapped on the opportunity to facilitate pull-based development by offering workflow support tools, such as code reviewing systems and integrated issue trackers. In this work, we explore how pull-based software development works, first on the GHTorrent corpus and then on a carefully selected sample of 291 projects. We find that the pull request model offers fast turnaround, increased opportunities for community engagement and decreased time to incorporate contributions. We show that a relatively small number of factors affect both the decision to merge a pull request and the time to process it. We also examine the reasons for pull request rejection and find that technical ones are only a small minority.
Publisher's Version Article Search
Influence of Social and Technical Factors for Evaluating Contribution in GitHub
Jason Tsay, Laura Dabbish, and James Herbsleb
(Carnegie Mellon University, USA)
Open source software is commonly portrayed as a meritocracy, where decisions are based solely on their technical merit. However, literature on open source suggests a complex social structure underlying the meritocracy. Social work environments such as GitHub make the relationships between users and between users and work artifacts transparent. This transparency enables developers to better use information such as technical value and social connections when making work decisions. We present a study on open source software contribution in GitHub that focuses on the task of evaluating pull requests, which are one of the primary methods for contributing code in GitHub. We analyzed the association of various technical and social measures with the likelihood of contribution acceptance. We found that project managers made use of information signaling both good technical contribution practices for a pull request and the strength of the social connection between the submitter and project manager when evaluating pull requests. Pull requests with many comments were much less likely to be accepted, moderated by the submitter's prior interaction in the project. Well-established projects were more conservative in accepting pull requests. These findings provide evidence that developers use both technical and social information when evaluating potential contributions to open source software projects.
Publisher's Version Article Search

Software Understanding

Understanding JavaScript Event-Based Interactions
Saba Alimadadi, Sheldon Sequeira, Ali Mesbah, and Karthik Pattabiraman
(University of British Columbia, Canada)
Web applications have become one of the fastest growing types of software systems today. Despite their popularity, understanding the behaviour of modern web applications is still a challenging endeavour for developers during development and maintenance tasks. The challenges mainly stem from the dynamic, event-driven, and asynchronous nature of the JavaScript language. We propose a generic technique for capturing low-level event-based interactions in a web application and mapping those to a higher-level behavioural model. This model is then transformed into an interactive visualization, representing episodes of triggered causal and temporal events, related JavaScript code executions, and their impact on the dynamic DOM state. Our approach, implemented in a tool called Clematis, allows developers to easily understand the complex dynamic behaviour of their application at three different semantic levels of granularity. The results of our industrial controlled experiment show that Clematis is capable of improving the task accuracy by 61%, while reducing the task completion time by 47%.
Publisher's Version Article Search
Understanding Understanding Source Code with Functional Magnetic Resonance Imaging
Janet Siegmund, Christian Kästner, Sven Apel, Chris Parnin, Anja Bethmann, Thomas Leich, Gunter Saake, and André Brechmann
(University of Passau, Germany; Carnegie Mellon University, USA; Georgia Tech, USA; Leibniz Institute for Neurobiology, Germany; Metop Research Institute, Germany; University of Magdeburg, Germany)
Program comprehension is an important cognitive process that inherently eludes direct measurement. Thus, researchers are struggling with providing suitable programming languages, tools, or coding conventions to support developers in their everyday work. In this paper, we explore whether functional magnetic resonance imaging (fMRI), which is well established in cognitive neuroscience, is feasible to soundly measure program comprehension. In a controlled experiment, we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing---all processes that fit well to our understanding of program comprehension. Our results encourage us and, hopefully, other researchers to use fMRI in future studies to measure program comprehension and, in the long run, answer questions, such as: Can we predict whether someone will be an excellent programmer? How effective are new languages and tools for program understanding? How should we train programmers?
Publisher's Version Article Search
Improving Automated Source Code Summarization via an Eye-Tracking Study of Programmers
Paige Rodeghero, Collin McMillan, Paul W. McBurney, Nigel Bosch, and Sidney D'Mello
(University of Notre Dame, USA)
Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the process of selecting the subset: a high-quality selection would contain the same statements and keywords that a programmer would choose. Unfortunately, little evidence exists about the statements and keywords that programmers view as important when they summarize source code. In this paper, we present an eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods. We apply the findings to build a novel summarization tool. Then, we evaluate this tool and provide evidence to support the development of source code summarization systems.
Publisher's Version Article Search
Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development
Thomas Fritz, Andrew Begel, Sebastian C. Müller, Serap Yigit-Elliott, and Manuela Züger
(University of Zurich, Switzerland; Microsoft Research, USA; Exponent, USA)
Software developers make programming mistakes that cause serious bugs for their customers. Existing work to detect problematic software focuses mainly on post hoc identification of correlations between bug fixes and code. We propose a new approach to address this problem --- detect when software developers are experiencing difficulty while they work on their programming tasks, and stop them before they can introduce bugs into the code. In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult. We can predict nominal task difficulty (easy/difficult) for a new developer with 64.99% precision and 64.58% recall, and for a new task with 84.38% precision and 69.79% recall. We can improve the Naive Bayes classifier's performance if we trained it on just the eye-tracking data over the entire dataset, or by using a sliding window data collection schema with a 55 second time window. Our work brings the community closer to a viable and reliable measure of task difficulty that could power the next generation of programming support tools.
Publisher's Version Article Search


Dictionary Learning Based Software Defect Prediction
Xiao-Yuan Jing, Shi Ying, Zhi-Wu Zhang, Shan-Shan Wu, and Jin Liu
(Wuhan University, China; Nanjing University of Posts and Telecommunications, China)
In order to improve the quality of a software system, software defect prediction aims to automatically identify defective software modules for efficient software test. To predict software defect, those classification methods with static code attributes have attracted a great deal of attention. In recent years, machine learning techniques have been applied to defect prediction. Due to the fact that there exists the similarity among different software modules, one software module can be approximately represented by a small proportion of other modules. And the representation coefficients over the pre-defined dictionary, which consists of historical software module data, are generally sparse. In this paper, we propose to use the dictionary learning technique to predict software defect. By using the characteristics of the metrics mined from the open source software, we learn multiple dictionaries (including defective module and defective-free module sub-dictionaries and the total dictionary) and sparse representation coefficients. Moreover, we take the misclassification cost issue into account because the misclassification of defective modules generally incurs much higher risk cost than that of defective-free ones. We thus propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software defect classification and prediction. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art defect prediction methods.
Publisher's Version Article Search
Comparing Static Bug Finders and Statistical Prediction
Foyzur Rahman, Sameer Khatri, Earl T. Barr, and Premkumar Devanbu
(University of California at Davis, USA; University College London, UK)
The all-important goal of delivering better software at lower cost has led to a vital, enduring quest for ways to find and remove defects efficiently and accurately. To this end, two parallel lines of research have emerged over the last years. Static analysis seeks to find defects using algorithms that process well-defined semantic abstractions of code. Statistical defect prediction uses historical data to estimate parameters of statistical formulae modeling the phenomena thought to govern defect occurrence and predict where defects are likely to occur. These two approaches have emerged from distinct intellectual traditions and have largely evolved independently, in “splendid isolation”. In this paper, we evaluate these two (largely) disparate approaches on a similar footing. We use historical defect data to apprise the two approaches, compare them, and seek synergies. We find that under some accounting principles, they provide comparable benefits; we also find that in some settings, the performance of certain static bug-finders can be enhanced using information provided by statistical defect prediction.
Publisher's Version Article Search
Coverage Is Not Strongly Correlated with Test Suite Effectiveness
Laura Inozemtseva and Reid Holmes
(University of Waterloo, Canada)
The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are are rare in practice, so the results may not generalize to typical test suites. We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness. We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
Publisher's Version Article Search Video Info
How to Make Best Use of Cross-Company Data in Software Effort Estimation?
Leandro L. Minku and Xin Yao
(University of Birmingham, UK)
Previous works using Cross-Company (CC) data for making Within-Company (WC) Software Effort Estimation (SEE) try to use CC data or models directly to provide predictions in the WC context. So, these data or models are only helpful when they match the WC context well. When they do not, a fair amount of WC training data, which are usually expensive to acquire, are still necessary to achieve good performance. We investigate how to make best use of CC data, so that we can reduce the amount of WC data while maintaining or improving performance in comparison to WC SEE models. This is done by proposing a new framework to learn the relationship between CC and WC projects explicitly, allowing CC models to be mapped to the WC context. Such mapped models can be useful even when the CC models themselves do not match the WC context directly. Our study shows that a new approach instantiating this framework is able not only to use substantially less WC data than a corresponding WC model, but also to achieve similar/better performance. This approach can also be used to provide insight into the behaviour of a company in comparison to others.
Publisher's Version Article Search


CARE: Cache Guided Deterministic Replay for Concurrent Java Programs
Yanyan Jiang, Tianxiao Gu, Chang Xu, Xiaoxing Ma, and Jian Lu
(Nanjing University, China)
Deterministic replay tools help programmers debug concurrent programs. However, for long-running programs, a replay tool may generate huge log of shared memory access dependences. In this paper, we present CARE, an application-level deterministic record and replay technique to reduce the log size. The key idea of CARE is logging read-write dependences only at per-thread value prediction cache misses. This strategy records only a subset of all exact read-write dependences, and reduces synchronizations protecting memory reads in the instrumented code. Realizing that such record strategy provides only value-deterministic replay, CARE also adopts variable grouping and action prioritization heuristics to synthesize sequentially consistent executions at replay in linear time. We implemented CARE in Java and experimentally evaluated it with recognized benchmarks. Results showed that CARE successfully resolved all missing read-write dependences, producing sequentially consistent replay for all benchmarks. CARE exhibited 1.7--40X (median 3.4X) smaller runtime overhead, and 1.1--309X (median 7.0X) smaller log size against state-of-the-art technique LEAP.
Publisher's Version Article Search
Inferring Models of Concurrent Systems from Logs of Their Behavior with CSight
Ivan Beschastnikh, Yuriy Brun, Michael D. Ernst, and Arvind Krishnamurthy
(University of British Columbia, Canada; University of Massachusetts, USA; University of Washington, USA)
Concurrent systems are notoriously difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process, and documentation is often incomplete and out of sync with the implementation. To provide developers with more insight into concurrent systems, we developed CSight. CSight mines logs of a system's executions to infer a concise and accurate model of that system's behavior, in the form of a communicating finite state machine (CFSM). Engineers can use the inferred CFSM model to understand complex behavior, detect anomalies, debug, and increase confidence in the correctness of their implementations. CSight's only requirement is that the logged events have vector timestamps. We provide a tool that automatically adds vector timestamps to system logs. Our tool prototypes are available at http://synoptic.googlecode.com/. This paper presents algorithms for inferring CFSM models from traces of concurrent systems, proves them correct, provides an implementation, and evaluates the implementation in two ways: by running it on logs from three different networked systems and via a user study that focused on bug finding. Our evaluation finds that CSight infers accurate models that can help developers find bugs.
Publisher's Version Article Search Info
Unleashing Concurrency for Irregular Data Structures
Peng Liu and Charles Zhang
(Wuhan University, China; Hong Kong University of Science and Technology, China)
To implement the atomicity in accessing the irregular data structure, developers often use the coarse-grained locking because the hierarchical nature of the data structure makes the reasoning of fine-grained locking difficult and error-prone for the update of an ancestor field in the data structure may affect its descendants. The coarse-grained locking disallows the concurrent accesses to the entire data structure and leads to a low degree of concurrency. We propose an approach, built upon the Multiple Granularity Lock (MGL), that replaces the coarse-grained locks to unleash more concurrency for irregular data structures. Our approach is widely applicable and does not require the data structures to have special shapes. We produce the MGL locks through reasoning about the hierarchy of the data structure and the accesses to it. According to the evaluation results on widely used applications, our optimization brings the significant speedup, e.g., at least 7%-20% speedup and up to 2X speedup.
Publisher's Version Article Search
ConLock: A Constraint-Based Approach to Dynamic Checking on Deadlocks in Multithreaded Programs
Yan Cai, Shangru Wu, and W. K. Chan
(City University of Hong Kong, China)
Many predictive deadlock detection techniques analyze multithreaded programs to suggest potential deadlocks (referred to as cycles or deadlock warnings). Nonetheless, many of such cycles are false positives. On checking these cycles, existing dynamic deadlock confirmation techniques may frequently encounter thrashing or result in a low confirmation probability. This paper presents a novel technique entitled ConLock to address these problems. ConLock firstly analyzes a given cycle and the execution trace that produces the cycle. It identifies a set of thread scheduling constraints based on a novel should-happen-before relation. ConLock then manipulates a confirmation run with the aim to not violate a reduced set of scheduling constraints and to trigger an occurrence of the deadlock if the cycle is a real deadlock. If the cycle is a false positive, ConLock reports scheduling violations. We have validated ConLock using a suite of real-world programs with 11 deadlocks. The result shows that among all 741 cycles reported by Magiclock, ConLock confirms all 11 deadlocks with a probability of 71%−100%. On the remaining 730 cycles, ConLock reports scheduling violations on each. We have systematically sampled 87 out of the 730 cycles and confirmed that all these cycles are false positives.
Publisher's Version Article Search

Apps and Energy

SEEDS: A Software Engineer's Energy-Optimization Decision Support Framework
Irene Manotas, Lori Pollock, and James Clause
(University of Delaware, USA)
Reducing the energy usage of software is becoming more important in many environments, in particular, battery-powered mobile devices, embedded systems and data centers. Recent empirical studies indicate that software engineers can support the goal of reducing energy usage by making design and implementation decisions in ways that take into consideration how such decisions impact the energy usage of an application. However, the large number of possible choices and the lack of feedback and information available to software engineers necessitates some form of automated decision-making support. This paper describes the first known automated support for systematically optimizing the energy usage of applications by making code-level changes. It is effective at reducing energy usage while freeing developers from needing to deal with the low-level, tedious tasks of applying changes and monitoring the resulting impacts to the energy usage of their application. We present a general framework, SEEDS, as well as an instantiation of the framework that automatically optimizes Java applications by selecting the most energy-efficient library implementations for Java's Collections API. Our empirical evaluation of the framework and instantiation show that it is possible to improve the energy usage of an application in a fully automated manner for a reasonable cost.
Publisher's Version Article Search
APE: An Annotation Language and Middleware for Energy-Efficient Mobile Application Development
Nima Nikzad, Octav Chipara, and William G. Griswold
(University of California at San Diego, USA; University of Iowa, USA)
Energy-efficiency is a key concern in continuously-running mobile applications, such as those for health and context monitoring. Unfortunately, developers must implement complex and customized power-management policies for each application. This involves the use of complex primitives and writing error-prone multithreaded code to monitor hardware state. To address this problem, we present APE, an annotation language and middleware service that eases the development of energy-efficient Android applications. APE annotations are used to demarcate a power-hungry code segment whose execution is deferred until the device enters a state that minimizes the cost of that operation. The execution of power-hungry operations is coordinated across applications by the APE middleware. Several examples show the expressive power of our approach. A case study of using APE annotations in a real mobile sensing application shows that annotations can cleanly specify a power management policy and reduce the complexity of its implementation. An empirical evaluation of the middleware shows that APE introduces negligible overhead and equals hand-tuned code in energy savings, in this case achieving 63.4% energy savings compared to the case when there is no coordination.
Publisher's Version Article Search
Making Web Applications More Energy Efficient for OLED Smartphones
Ding Li, Angelica Huyen Tran, and William G. J. Halfond
(University of Southern California, USA)
A smartphone’s display is one of its most energy consuming components. Modern smartphones use OLED displays that consume more energy when displaying light colors as op- posed to dark colors. This is problematic as many popular mobile web applications use large light colored backgrounds. To address this problem we developed an approach for auto- matically rewriting web applications so that they generate more energy efficient web pages. Our approach is based on program analysis of the structure of the web application im- plementation. In the evaluation of our approach we show that it can achieve a 40% reduction in display power con- sumption. A user study indicates that the transformed web pages are acceptable to users with over 60% choosing to use the transformed pages for normal usage.
Publisher's Version Article Search

Testing 2

Micro Execution
Patrice Godefroid
(Microsoft Research, USA)
Micro execution is the ability to execute any code fragment without a user-provided test driver or input data. The user simply identifies a function or code location in an exe or dll. A runtime Virtual Machine (VM) customized for testing purposes then starts executing the code at that location, catches all memory operations before they occur, allocates memory on-the-fly in order to perform those read/write memory operations, and provides input values according to a customizable memory policy, which defines what read memory accesses should be treated as inputs. MicroX is a first prototype VM allowing micro execution of x86 binary code. No test driver, no input data, no source code, no debug symbols are required: MicroX automatically discovers dynamically the Input/Output interface of the code being run. Input values are provided as needed along the execution and can be generated in various ways, e.g., randomly or using some other test-generation tool. To our knowledge, MicroX is the first VM designed for test isolation and generation purposes. This paper introduces micro execution and discusses how to implement it, strengths and limitations, applications, related work and long-term goals.
Publisher's Version Article Search
Unit Test Virtualization with VMVM
Jonathan Bell and Gail Kaiser
(Columbia University, USA)
Testing large software packages can become very time intensive. To address this problem, researchers have investigated techniques such as Test Suite Minimization. Test Suite Minimization reduces the number of tests in a suite by removing tests that appear redundant, at the risk of a reduction in fault-finding ability since it can be difficult to identify which tests are truly redundant. We take a completely different approach to solving the same problem of long running test suites by instead reducing the time needed to execute each test, an approach that we call Unit Test Virtualization. With Unit Test Virtualization, we reduce the overhead of isolating each unit test with a lightweight virtualization container. We describe the empirical analysis that grounds our approach and provide an implementation of Unit Test Virtualization targeting Java applications. We evaluated our implementation, VMVM, using 20 real-world Java applications and found that it reduces test suite execution time by up to 97% (on average, 62%) when compared to traditional unit test execution. We also compared VMVM to a well known Test Suite Minimization technique, finding the reduction provided by VMVM to be four times greater, while still executing every test with no loss of fault-finding ability.
Publisher's Version Article Search Info
Interpolated N-Grams for Model Based Testing
Paolo Tonella, Roberto Tiella, and Cu Duy Nguyen
(Fondazione Bruno Kessler, Italy; University of Luxembourg, Luxembourg)
Models - in particular finite state machine models - provide an invaluable source of information for the derivation of effective test cases. However, models usually approximate part of the program semantics and capture only some of the relevant dependencies and constraints. As a consequence, some of the test cases that are derived from models are infeasible. In this paper, we propose a method, based on the computation of the N-gram statistics, to increase the likelihood of deriving feasible test cases from a model. Correspondingly, the level of model coverage is also expected to increase, because infeasible test cases do not contribute to coverage. While N-grams do improve existing test case derivation methods, they show limitations when the N-gram statistics is incomplete, which is expected to necessarily occur as N increases. Interpolated N-grams overcome such limitation and show the highest performance of all test case derivation methods compared in this work.
Publisher's Version Article Search Video
An Analysis of the Relationship between Conditional Entropy and Failed Error Propagation in Software Testing
Kelly Androutsopoulos, David Clark, Haitao Dan, Robert M. Hierons, and Mark Harman
(Middlesex University, UK; University College London, UK; Brunel University, UK)
Failed error propagation (FEP) is known to hamper software testing, yet it remains poorly understood. We introduce an information theoretic formulation of FEP that is based on measures of conditional entropy. This formulation considers the situation in which we are interested in the potential for an incorrect program state at statement s to fail to propagate to incorrect output. We define five metrics that differ in two ways: whether we only consider parts of the program that can be reached after executing s and whether we restrict attention to a single program path of interest .We give the results of experiments in which it was found that on average one in 10 tests suffered from FEP, earlier studies having shown that this figure can vary significantly between programs. The experiments also showed that our metrics are well-correlated with FEP. Our empirical study involved 30 programs, for which we executed a total of 7,140,000 test cases. The results reveal that the metrics differ in their performance but the Spearman rank correlation with failed error propagation is close to 0.95 for two of the metrics. These strong correlations in an experimental setting, in which all information about both FEP and conditional entropy is known, open up the possibility in the longer term of devising inexpensive information theory based metrics that allow us to minimise the effect of FEP.
Publisher's Version Article Search

Code Contracts, Invariants, and Robustness

Trading Robustness for Maintainability: An Empirical Study of Evolving C# Programs
Nélio Cacho, Thiago César, Thomas Filipe, Eliezio Soares, Arthur Cassio, Rafael Souza, Israel Garcia, Eiji Adachi Barbosa, and Alessandro Garcia
(Federal University of Rio Grande do Norte, Brazil; PUC-Rio, Brazil)
Mainstream programming languages provide built-in exception handling mechanisms to support robust and maintainable implementation of exception handling in software systems. Most of these modern languages, such as C#, Ruby, Python and many others, are often claimed to have more appropriated exception handling mechanisms. They reduce programming constraints on exception handling to favor agile changes in the source code. These languages provide what we call maintenance-driven exception handling mechanisms. It is expected that the adoption of these mechanisms improve software maintainability without hindering software robustness. However, there is still little empirical knowledge about the impact that adopting these mechanisms have on software robustness. This paper addressed this gap by conducting an empirical study aimed at understanding the relationship between changes in C# programs and their robustness. In particular, we evaluated how changes in the normal and exceptional code were related to exception handling faults. We applied a change impact analysis and a control flow analysis in 119 versions of 16 C# programs. The results showed that: (i) most of the problems hindering software robustness in those programs are caused by changes in the normal code, (ii) many potential faults were introduced even when improving exception handling in C# code, and (iii) faults are often facilitated by the maintenance-driven flexibility of the exception handling mechanism. Moreover, we present a series of change scenarios that decrease the program robustness.
Publisher's Version Article Search
Case Studies and Tools for Contract Specifications
Todd W. Schiller, Kellen Donohue, Forrest Coward, and Michael D. Ernst
(University of Washington, USA)
Contracts are a popular tool for specifying the functional behavior of software. This paper characterizes the contracts that developers write, the contracts that developers could write, and how a developer reacts when shown the difference. This paper makes three research contributions based on an investigation of open-source projects' use of Code Contracts. First, we characterize Code Contract usage in practice. For example, approximately three-fourths of the Code Contracts are basic checks for the presence of data. We discuss similarities and differences in usage across the projects, and we identify annotation burden, tool support, and training as possible explanations based on developer interviews. Second, based on contracts automatically inferred for four of the projects, we find that developers underutilize contracts for expressing state updates, object state indicators, and conditional properties. Third, we performed user studies to learn how developers decide which contracts to enforce. The developers used contract suggestions to support their existing use cases with more expressive contracts. However, the suggestions did not lead them to experiment with other use cases for which contracts are better-suited. In support of the research contributions, the paper presents two engineering contributions: (1) Celeriac, a tool for generating traces of .NET programs compatible with the Daikon invariant detection tool, and (2) Contract Inserter, a Visual Studio add-in for discovering and inserting likely invariants as Code Contracts.
Publisher's Version Article Search Info
Using Dynamic Analysis to Generate Disjunctive Invariants
ThanhVu Nguyen, Deepak Kapur, Westley Weimer, and Stephanie Forrest
(University of New Mexico, USA; University of Virginia, USA)
Program invariants are important for defect detection, program verification, and program repair. However, existing techniques have limited support for important classes of invariants such as disjunctions, which express the semantics of conditional statements. We propose a method for generating disjunctive invariants over numerical domains, which are inexpressible using classical convex polyhedra. Using dynamic analysis and reformulating the problem in non-standard ``max-plus'' and ``min-plus'' algebras, our method constructs hulls over program trace points. Critically, we introduce and infer a weak class of such invariants that balances expressive power against the computational cost of generating nonconvex shapes in high dimensions. Existing dynamic inference techniques often generate spurious invariants that fit some program traces but do not generalize. With the insight that generating dynamic invariants is easy, we propose to verify these invariants statically using k-inductive SMT theorem proving which allows us to validate invariants that are not classically inductive. Results on difficult kernels involving nonlinear arithmetic and abstract arrays suggest that this hybrid approach efficiently generates and proves correct program invariants.
Publisher's Version Article Search
Inductive Verification of Data Model Invariants for Web Applications
Ivan Bocić and Tevfik Bultan
(University of California at Santa Barbara, USA)
Modern software applications store their data in remote cloud servers. Users interact with these applications using web browsers or thin clients running on mobile devices. A key issue in dependability of these applications is the correctness of the actions that update the data store, which are triggered by user requests. In this paper, we present techniques for au- tomatically checking if the actions of an application preserve the data model invariants. Our approach first automatically extracts a data model specification, which we call an abstract data store, from a given application using instrumented exe- cution. The abstract data store identifies the sets of objects and relations (associations) used by the application, and the actions that update the data store by deleting or creating objects or by changing the relations among the objects. We show that checking invariants of an abstract data store corre- sponds to inductive invariant verification, and can be done using a mapping to First Order Logic (FOL) and using a FOL theorem prover. We implemented this approach for the Rails framework and applied it to three open source applications. We found four previously unknown bugs and reported them to the developers, who confirmed and imme- diately fixed two of them.
Publisher's Version Article Search

Search and APIs

How Do API Documentation and Static Typing Affect API Usability?
Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik
(University of Duisburg-Essen, Germany; University of Chile, Chile; University of Nevada at Las Vegas, USA)
When developers use Application Programming Interfaces (APIs), they often rely on documentation to assist their tasks. In previous studies, we reported evidence indicating that static type systems acted as a form of implicit documentation, benefiting developer productivity. Such implicit documentation is easier to maintain, given it is enforced by the compiler, but previous experiments tested users without any explicit documentation. In this paper, we report on a controlled experiment and an exploratory study comparing the impact of using documentation and a static or dynamic type system on a development task. Results of our study both confirm previous findings and show that the benefits of static typing are strengthened with explicit documentation, but that this was not as strongly felt with dynamically typed languages.
Publisher's Version Article Search
Live API Documentation
Siddharth Subramanian, Laura Inozemtseva, and Reid Holmes
(University of Waterloo, Canada)
Application Programming Interfaces (APIs) provide powerful abstraction mechanisms that enable complex functionality to be used by client programs. However, this abstraction does not come for free: understanding how to use an API can be difficult. While API documentation can help, it is often insufficient on its own. Online sites like Stack Overflow and Github Gists have grown to fill the gap between traditional API documentation and more example-based resources. Unfortunately, these two important classes of documentation are independent. In this paper we describe an iterative, deductive method of linking source code examples to API documentation. We also present an implementation of this method, called Baker, that is highly precise (0.97) and supports both Java and JavaScript. Baker can be used to enhance traditional API documentation with up-to-date source code examples; it can also be used to incorporate links to the API documentation into the code snippets that use the API.
Publisher's Version Article Search Video Info
CodeHint: Dynamic and Interactive Synthesis of Code Snippets
Joel Galenson, Philip Reames, Rastislav Bodik, Björn Hartmann, and Koushik Sen
(University of California at Berkeley, USA)
There are many tools that help programmers find code fragments, but most are inexpressive and rely on static information. We present a new technique for synthesizing code that is dynamic (giving accurate results and allowing programmers to reason about concrete executions), easy-to-use (supporting a wide range of correctness specifications), and interactive (allowing users to refine the candidate code snippets). Our implementation, which we call CodeHint, generates and evaluates code at runtime and hence can synthesize real-world Java code that involves I/O, reflection, native calls, and other advanced language features. We have evaluated CodeHint in two user studies and show that its algorithms are efficient and that it improves programmer productivity by more than a factor of two.
Publisher's Version Article Search Video Info
Spotting Working Code Examples
Iman Keivanloo, Juergen Rilling, and Ying Zou
(Queen's University, Canada; Concordia University, Canada)
Working code examples are useful resources for pragmatic reuse in software development. A working code example provides a solution to a specific programming problem. Earlier studies have shown that existing code search engines are not successful in finding working code examples. They fail in ranking high quality code examples at the top of the result set. To address this shortcoming, a variety of pattern-based solutions are proposed in the literature. However, these solutions cannot be integrated seamlessly in Internet-scale source code engines due to their high time complexity or query language restrictions. In this paper, we propose an approach for spotting working code examples which can be adopted by Internet-scale source code search engines. The time complexity of our approach is as low as the complexity of existing code search engines on the Internet and considerably lower than the pattern-based approaches supporting free-form queries. We study the performance of our approach using a representative corpus of 25,000 open source Java projects. Our findings support the feasibility of our approach for Internet-scale code search. We also found that our approach outperforms Ohloh Code search engine, previously known as Koders, in spotting working code examples.
Publisher's Version Article Search

Adaptive Systems

Self-Adaptation through Incremental Generative Model Transformations at Runtime
Bihuan Chen, Xin Peng, Yijun Yu, Bashar Nuseibeh, and Wenyun Zhao
(Fudan University, China; Open University, UK; University of Limerick, Ireland)
A self-adaptive system uses runtime models to adapt its architecture to the changing requirements and contexts. However, there is no one-to-one mapping between the requirements in the problem space and the architectural elements in the solution space. Instead, one refined requirement may crosscut multiple architectural elements, and its realization involves complex behavioral or structural interactions manifested as architectural design decisions. In this paper we propose to combine two kinds of self-adaptations: requirements-driven self-adaptation, which captures requirements as goal models to reason about the best plan within the problem space, and architecture-based self-adaptation, which captures architectural design decisions as decision trees to search for the best design for the desired requirements within the contextualized solution space. Following these adaptations, component-based architecture models are reconfigured using incremental and generative model transformations. Compared with requirements-driven or architecture-based approaches, the case study using an online shopping benchmark shows promise that our approach can further improve the effectiveness of adaptation (e.g. system throughput in this case study) and offer more adaptation flexibility.
Publisher's Version Article Search
Hope for the Best, Prepare for the Worst: Multi-tier Control for Adaptive Systems
Nicolas D'Ippolito, Víctor Braberman, Jeff Kramer, Jeff Magee, Daniel Sykes, and Sebastian Uchitel
(Imperial College London, UK; Universidad de Buenos Aires, Argentina)
Most approaches for adaptive systems rely on models, particularly behaviour or architecture models, which describe the system and the environment in which it operates. One of the difficulties in creating such models is uncertainty about the accuracy and completeness of the models. Engineers therefore make assumptions which may prove to be invalid at runtime. In this paper we introduce a rigorous, tiered framework for combining behaviour models, each with different associated assumptions and risks. These models are used to generate operational strategies, through techniques such controller synthesis, which are then executed concurrently at runtime. We show that our framework can be used to adapt the functional behaviour of the system: through graceful degradation when the assumptions of a higher level model are broken, and through progressive enhancement when those assumptions are satisfied or restored.
Publisher's Version Article Search Video
Brownout: Building More Robust Cloud Applications
Cristian Klein, Martina Maggio, Karl-Erik Årzén, and Francisco Hernández-Rodriguez
(Umeå University, Sweden; Lund University, Sweden)
Self-adaptation is a first class concern for cloud applications, which should be able to withstand diverse runtime changes. Variations are simultaneously happening both at the cloud infrastructure level - for example hardware failures - and at the user workload level - flash crowds. However, robustly withstanding extreme variability, requires costly hardware over-provisioning. In this paper, we introduce a self-adaptation programming paradigm called brownout. Using this paradigm, applications can be designed to robustly withstand unpredictable runtime variations, without over-provisioning. The paradigm is based on optional code that can be dynamically deactivated through decisions based on control theory. We modified two popular web application prototypes - RUBiS and RUBBoS - with less than 170 lines of code, to make them brownout-compliant. Experiments show that brownout self-adaptation dramatically improves the ability to withstand flash-crowds and hardware failures.
Publisher's Version Article Search Info
Integrating Adaptive User Interface Capabilities in Enterprise Applications
Pierre A. Akiki, Arosha K. Bandara, and Yijun Yu
(Open University, UK)
Many existing enterprise applications are at a mature stage in their development and are unable to easily benefit from the usability gains offered by adaptive user interfaces (UIs). Therefore, a method is needed for integrating adaptive UI capabilities into these systems without incurring a high cost or significantly disrupting the way they function. This paper presents a method for integrating adaptive UI behavior in enterprise applications based on CEDAR, a model-driven, service-oriented, and tool-supported architecture for devising adaptive enterprise application UIs. The proposed integration method is evaluated with a case study, which includes establishing and applying technical metrics to measure several of the method’s properties using the open-source enterprise application OFBiz as a test-case. The generality and flexibility of the integration method are also evaluated based on an interview and discussions with practitioners about their real-life projects.
Publisher's Version Article Search Video Info

Build and Package Management

Programmers' Build Errors: A Case Study (at Google)
Hyunmin Seo, Caitlin Sadowski, Sebastian Elbaum, Edward Aftandilian, and Robert Bowdidge
(Hong Kong University of Science and Technology, China; Google, USA; University of Nebraska-Lincoln, USA)
Building is an integral part of the software development process. However, little is known about the compiler errors that occur in this process. In this paper, we present an empirical study of 26.6 million builds produced during a period of nine months by thousands of developers. We describe the workflow through which those builds are generated, and we analyze failure frequency, compiler error types, and resolution efforts to fix those compiler errors. The results provide insights on how a large organization build process works, and pinpoints errors for which further developer support would be most effective.
Publisher's Version Article Search
Understanding and Improving Software Build Teams
Shaun Phillips, Thomas Zimmermann, and Christian Bird
(University of Calgary, Canada; Microsoft Research, USA)
Build, creating software from source code, is a fundamental activity in software development. Build teams manage this process and ensure builds are produced reliably and efficiently. This paper presents an exploration into the nature of build teams--how they form, work, and relate to other teams--through three multi-method studies conducted at Microsoft. We also consider build team effectiveness and find that many challenges are social, not technical: role ambiguity, knowledge sharing, communication, trust, and conflict. Our findings validate theories from group dynamics and organization science, and using a cross-discipline approach, we apply learnings from these fields to inform the design of engineering tools and practices to improve build team effectiveness
Publisher's Version Article Search
Towards Efficient Optimization in Package Management Systems
Alexey Ignatiev, Mikoláš Janota, and Joao Marques-Silva
(INESC-ID, Portugal; University College Dublin, Ireland)
Package management as a means of reuse of software artifacts has become extremely popular, most notably in Linux distributions. At the same time, successful package management brings about a number of computational challenges. Whenever a user requires a new package to be installed, a package manager not only installs the new package but it might also install other packages or uninstall some old ones in order to respect dependencies and conflicts of the packages. Coming up with a new configuration of packages is computationally challenging. It is in particular complex when we also wish to optimize for user preferences, such as that the resulting package configuration should not differ too much from the original one. A number of exact approaches for solving this problem have been proposed in recent years. These approaches, however, do not have guaranteed runtime due to the high computational complexity of the problem. This paper addresses this issue by devising a hybrid approach that integrates exact solving with approximate solving by invoking the approximate part whenever the solver is running out of time. Experimental evaluation shows that this approach enables returning high-quality package configurations with rapid response time.
Publisher's Version Article Search
Easing Software Component Repository Evolution
Jérôme Vouillon, Mehdi Dogguy, and Roberto Di Cosmo
(University Paris Diderot, France; CNRS, France; EDF, France; Debian, France; INRIA, France)
Modern software systems are built by composing components drawn from large repositories, whose size and complexity increase at a fast pace. Maintaining and evolving these software collections is a complex task, and a strict qualification process needs to be enforced. We studied in depth the Debian software repository, one of the largest and most complex existing ones, and we developed comigrate, an extremely efficient tool that is able to identify the largest sets of components that can migrate to the reference repository without violating its quality constraints. This tool outperforms significantly all existing tools, and provides detailed information that is crucial to understand the reasons why some components cannot migrate. Extensive validation on the Debian distribution has been performed. The core architecture of the tool is quite general, and can be easily adapted to other software repositories.
Publisher's Version Article Search Info


AR-Miner: Mining Informative Reviews for Developers from Mobile App Marketplace
Ning Chen, Jialiu Lin, Steven C. H. Hoi, Xiaokui Xiao, and Boshen Zhang
(Nanyang Technological University, Singapore; Carnegie Mellon University, USA)
With the popularity of smartphones and mobile devices, mobile application (a.k.a. “app”) markets have been growing exponentially in terms of number of users and downloads. App developers spend considerable effort on collecting and exploiting user feedback to improve user satisfaction, but suffer from the absence of effective user review analytics tools. To facilitate mobile app developers discover the most “informative” user reviews from a large and rapidly increasing pool of user reviews, we present “AR-Miner” — a novel computational framework for App Review Mining, which performs comprehensive analytics from raw user reviews by (i) first extracting informative user reviews by filtering noisy and irrelevant ones, (ii) then grouping the informative reviews automatically using topic modeling, (iii) further prioritizing the informative reviews by an effective review ranking scheme, (iv) and finally presenting the groups of most “informative” reviews via an intuitive visualization approach. We conduct extensive experiments and case studies on four popular Android apps to evaluate AR-Miner, from which the encouraging results indicate that AR-Miner is effective, efficient and promising for app developers.
Publisher's Version Article Search
Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features
Robert Dyer, Hridesh Rajan, Hoan Anh Nguyen, and Tien N. Nguyen
(Iowa State University, USA)
Programming languages evolve over time, adding additional language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has four editions and is currently drafting a fifth. While the addition of language features is driven by an assumed need by the community (often with direct requests for such features), there is little empirical evidence demonstrating how these new features are adopted by developers once released. In this paper, we analyze over 31k open-source Java projects representing over 9 million Java files, which when parsed contain over 18 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study gives interesting insights, such as: there are millions of places features could potentially be used but weren't; developers convert existing code to use new features; and we found thousands of instances of potential resource handling bugs.
Publisher's Version Article Search Info
Mining Interprocedural, Data-Oriented Usage Patterns in JavaScript Web Applications
Hung Viet Nguyen, Hoan Anh Nguyen, Anh Tuan Nguyen, and Tien N. Nguyen
(Iowa State University, USA)
A frequently occurring usage of program elements in a programming language and software libraries is called a usage pattern. In JavaScript (JS) Web applications, JS usage patterns in their source code have special characteristics that pose challenges in pattern mining. They involve nested data objects with no corresponding names or types. JS functions can be also used as data objects. JS usages are often cross-language, inter-procedural, and involve control and data flow dependencies among JS program entities and data objects whose data types are revealed only at run time due to dynamic typing in JS. This paper presents JSModel, a novel graph-based representation for JS usages, and JSMiner, a scalable approach to mine inter-procedural, data-oriented JS usage patterns. Our empirical evaluation on several Web programs shows that JSMiner efficiently detects more JS patterns with higher accuracy than a state-of-the-art approach. We conducted experiments to show JSModel's usefulness in two applications: detecting anti-patterns (buggy patterns) and documenting JS APIs via pattern skeletons. Our controlled experiment shows that the mined patterns are useful as JS documentation and code templates.
Publisher's Version Article Search
Mining Fine-Grained Code Changes to Detect Unknown Change Patterns
Stas Negara, Mihai Codoban, Danny Dig, and Ralph E. Johnson
(University of Illinois at Urbana-Champaign, USA; Oregon State University, USA)
Identifying repetitive code changes benefits developers, tool builders, and researchers. Tool builders can automate the popular code changes, thus improving the productivity of developers. Researchers can better understand the practice of code evolution, advancing existing code assistance tools and benefiting developers even further. Unfortunately, existing research either predominantly uses coarse-grained Version Control System (VCS) snapshots as the primary source of code evolution data or considers only a small subset of program transformations of a single kind - refactorings. We present the first approach that identifies previously unknown frequent code change patterns from a fine-grained sequence of code changes. Our novel algorithm effectively handles challenges that distinguish continuous code change pattern mining from the existing data mining techniques. We evaluated our algorithm on 1,520 hours of code development collected from 23 developers, and showed that it is effective, useful, and scales to large amounts of data. We analyzed some of the mined code change patterns and discovered ten popular kinds of high-level program transformations. More than half of our 420 survey participants acknowledged that eight out of ten transformations are relevant to their programming activities.
Publisher's Version Article Search

Automated Bug Detection and Repair

Detecting Memory Leaks through Introspective Dynamic Behavior Modeling using Machine Learning
Sangho Lee, Changhee Jung, and Santosh Pande
(Georgia Tech, USA; Virginia Tech, USA)
This paper expands staleness-based memory leak detection by presenting a machine learning-based framework. The proposed framework is based on an idea that object staleness can be better leveraged in regard to similarity of objects; i.e., an object is more likely to have leaked if it shows significantly high staleness not observed from other similar objects with the same allocation context. A central part of the proposed framework is the modeling of heap objects. To this end, the framework observes the staleness of objects during a representative run of an application. From the observed data, the framework generates training examples, which also contain instances of hypothetical leaks. Via machine learning, the proposed framework replaces the error-prone user-definable staleness predicates used in previous research with a model-based prediction. The framework was tested using both synthetic and real-world examples. Evaluation with synthetic leakage workloads of SPEC2006 benchmarks shows that the proposed method achieves the optimal accuracy permitted by staleness-based leak detection. Moreover, by incorporating allocation context into the model, the proposed method achieves higher accuracy than is possible with object staleness alone. Evaluation with real-world memory leaks demonstrates that the proposed method is effective for detecting previously reported bugs with high accuracy.
Publisher's Version Article Search
Automated Memory Leak Detection for Production Use
Changhee Jung, Sangho Lee, Easwaran Raman, and Santosh Pande
(Virginia Tech, USA; Georgia Tech, USA; Google, USA)
This paper presents Sniper, an automated memory leak detection tool for C/C++ production software. To track the staleness of allocated memory (which is a clue to potential leaks) with little overhead (mostly <3%), Sniper leverages instruction sampling using performance monitoring units available in commodity processors. It also offloads the time- and space-consuming analyses, and works on the original software without modifying the underlying memory allocator; it neither perturbs the application execution nor increases the heap size. The Sniper can even deal with multithreaded applications with very low overhead. In particular, it performs a statistical analysis, which views memory leaks as anomalies, for automated and systematic leak determination. Consequently, it accurately detected real-world memory leaks with no false positive, and achieved an F-measure of 81% on average for 17 benchmarks stress-tested with various memory leaks.
Publisher's Version Article Search
Vejovis: Suggesting Fixes for JavaScript Faults
Frolin S. Ocariza, Jr., Karthik Pattabiraman, and Ali Mesbah
(University of British Columbia, Canada)
JavaScript is used in web applications for achieving rich user interfaces and implementing core functionality. Unfortunately, JavaScript code is known to be prone to faults. In an earlier study, we found that over 65% of such faults are caused by the interaction of JavaScript code with the DOM at runtime (DOM-related faults). In this paper, we first perform an analysis of 190 bug reports to understand fixes commonly applied by programmers to these DOM-related faults; we observe that parameter replacements and DOM element validations are common fix categories. Based on these findings, we propose an automated technique and tool, called Vejovis, for suggesting repairs for DOM-based JavaScript faults. To evaluate Vejovis, we conduct a case study in which we subject Vejovis to 22 real-world bugs across 11 applications. We find that Vejovis accurately suggests repairs for 20 out of the 22 bugs, and in 13 of the 20 cases, the correct fix was the top ranked one.
Publisher's Version Article Search
Is Spreadsheet Ambiguity Harmful? Detecting and Repairing Spreadsheet Smells due to Ambiguous Computation
Wensheng Dou, Shing-Chi Cheung, and Jun Wei
(Institute of Software at Chinese Academy of Sciences, China; Hong Kong University of Science and Technology, China)
Spreadsheets are widely used by end users for numerical computation in their business. Spreadsheet cells whose computation is subject to the same semantics are often clustered in a row or column. When a spreadsheet evolves, these cell clusters can degenerate due to ad hoc modifications or undisciplined copy-and-pastes. Such degenerated clusters no longer keep cells prescribing the same computational semantics, and are said to exhibit ambiguous computation smells. Our empirical study finds that such smells are common and likely harmful. We propose AmCheck, a novel technique that automatically detects and repairs ambiguous computation smells by recovering their intended computational semantics. A case study using AmCheck suggests that it is useful for discovering and repairing real spreadsheet problems.
Publisher's Version Article Search


Us and Them: A Study of Privacy Requirements Across North America, Asia, and Europe
Swapneel Sheth, Gail Kaiser, and Walid Maalej
(Columbia University, USA; University of Hamburg, Germany)
Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. However, only a little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives.
Publisher's Version Article Search
Distilling Privacy Requirements for Mobile Applications
Keerthi Thomas, Arosha K. Bandara, Blaine A. Price, and Bashar Nuseibeh
(Open University, UK; University of Limerick, Ireland)
As mobile computing applications have become commonplace, it is increasingly important for them to address end-users’ privacy requirements. Privacy requirements depend on a number of contextual socio-cultural factors to which mobility adds another level of contextual variation. However, traditional requirements elicitation methods do not sufficiently account for contextual factors and therefore cannot be used effectively to represent and analyse the privacy requirements of mobile end users. On the other hand, methods that do investigate contextual factors tend to produce data that does not lend itself to the process of requirements extraction. To address this problem we have developed a Privacy Requirements Distillation approach that employs a problem analysis framework to extract and refine privacy requirements for mobile applications from raw data gathered through empirical studies involving end users. Our approach introduces privacy facets that capture patterns of privacy concerns which are matched against the raw data. We demonstrate and evaluate our approach using qualitative data from an empirical study of a mobile social networking application.
Publisher's Version Article Search
Uncertainty, Risk, and Information Value in Software Requirements and Architecture
Emmanuel Letier, David Stefan, and Earl T. Barr
(University College London, UK)
Uncertainty complicates early requirements and architecture decisions and may expose a software project to significant risk. Yet software architects lack support for evaluating uncertainty, its impact on risk, and the value of reducing uncertainty before making critical decisions. We propose to apply decision analysis and multi-objective optimisation techniques to provide such support. We present a systematic method allowing software architects to describe uncertainty about the impact of alternatives on stakeholders' goals; to calculate the consequences of uncertainty through Monte-Carlo simulation; to shortlist candidate architectures based on expected costs, benefits and risks; and to assess the value of obtaining additional information before deciding. We demonstrate our method on the design of a system for coordinating emergency response teams. Our approach highlights the need for requirements engineering and software cost estimation methods to disclose uncertainty instead of hiding it.
Publisher's Version Article Search
Requirements Fixation
Rahul Mohanani, Paul Ralph, and Ben Shreeve
(Lancaster University, UK)
There is a broad consensus that understanding system desiderata (requirements) and design creativity are both important for software engineering success. However, little research has addressed the relationship between design creativity and the way requirements are framed or presented. This paper therefore aims to investigate the possibility that the way desiderata are framed or presented can affect design creativity. Forty two participants took part in a randomized control trial where one group received desiderata framed as “requirements” while the other received desiderata framed as “ideas”. Participants produced design concepts which were judged for originality. Participants who received requirements framing produced significantly less original designs than participants who received ideas framing (Mann-Whitney U=116.5, p=0.004). We conclude that framing desiderata as “requirements” may cause requirements fixation where designers’ preoccupation with satisfying explicit requirements inhibits their creativity.
Publisher's Version Article Search Video

Testing and Conformance Verification

Exploring Variability-Aware Execution for Testing Plugin-Based Web Applications
Hung Viet Nguyen, Christian Kästner, and Tien N. Nguyen
(Iowa State University, USA; Carnegie Mellon University, USA)
In plugin-based systems, plugin conflicts may occur when two or more plugins interfere with one another, changing their expected behaviors. It is highly challenging to detect plugin conflicts due to the exponential explosion of the combinations of plugins (i.e., configurations). In this paper, we address the challenge of executing a test case over many configurations. Leveraging the fact that many executions of a test are similar, our variability-aware execution runs common code once. Only when encountering values that are different depending on specific configurations will the execution split to run for each of them. To evaluate the scalability of variability-aware execution on a large real-world setting, we built a prototype PHP interpreter called Varex and ran it on the popular WordPress blogging Web application. The results show that while plugin interactions exist, there is a significant amount of sharing that allows variability-aware execution to scale to 2^50 configurations within seven minutes of running time. During our study, with Varex, we were able to detect two plugin conflicts: one was recently reported on WordPress forum and another one was not previously discovered.
Publisher's Version Article Search
A Study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence
Xiangjuan Yao, Mark Harman, and Yue Jia
(China University of Mining and Technology, China; University College London, UK)
Though mutation testing has been widely studied for more than thirty years, the prevalence and properties of equivalent mutants remain largely unknown. We report on the causes and prevalence of equivalent mutants and their relationship to stubborn mutants (those that remain undetected by a high quality test suite, yet are non-equivalent). Our results, based on manual analysis of 1,230 mutants from 18 programs, reveal a highly uneven distribution of equivalence and stubbornness. For example, the ABS class and half UOI class generate many equivalent and almost no stubborn mutants, while the LCR class generates many stubborn and few equivalent mutants. We conclude that previous test effectiveness studies based on fault seeding could be skewed, while developers of mutation testing tools should prioritise those operators that we found generate disproportionately many stubborn (and few equivalent) mutants.
Publisher's Version Article Search Info
Cross-Checking Oracles from Intrinsic Software Redundancy
Antonio Carzaniga, Alberto Goffi, Alessandra Gorla, Andrea Mattavelli, and Mauro Pezzè
(University of Lugano, Switzerland; Saarland University, Germany; University of Milano-Bicocca, Italy)
Despite the recent advances in automatic test generation, testers must still write test oracles manually. If formal specifications are available, it might be possible to use decision procedures derived from those specifications. We present a technique that is based on a form of specification but also leverages more information from the system under test. We assume that the system under test is somewhat redundant, in the sense that some operations are designed to behave like others but their executions are different. Our experience in this and previous work indicates that this redundancy exists and is easily documented. We then generate oracles by cross-checking the execution of a test with the same test in which we replace some operations with redundant ones. We develop this notion of cross-checking oracles into a generic technique to automatically insert oracles into unit tests. An experimental evaluation shows that cross-checking oracles, used in combination with automatic test generation techniques, can be very effective in revealing faults, and that they can even improve good hand-written test suites.
Publisher's Version Article Search Video
Mind the Gap: Assessing the Conformance of Software Traceability to Relevant Guidelines
Patrick Rempel, Patrick Mäder, Tobias Kuschke, and Jane Cleland-Huang
(TU Ilmenau, Germany; DePaul University, USA)
Many guidelines for safety-critical industries such as aeronautics, medical devices, and railway communications, specify that traceability must be used to demonstrate that a rigorous process has been followed and to provide evidence that the system is safe for use. In practice, there is a gap between what is prescribed by guidelines and what is implemented in practice, making it difficult for organizations and certifiers to fully evaluate the safety of the software system. In this paper we present an approach, which parses a guideline to extract a Traceability Model depicting software artifact types and their prescribed traces. It then analyzes the traceability data within a project to identify areas of traceability failure. Missing traceability paths, redundant and/or inconsistent data, and other problems are highlighted. We used our approach to evaluate the traceability of seven safety-critical software systems and found that none of the evaluated projects contained traceability that fully conformed to its relevant guidelines.
Publisher's Version Article Search Video

Modeling and Interfaces

Effects of Using Examples on Structural Model Comprehension: A Controlled Experiment
Dina Zayan, Michał Antkiewicz, and Krzysztof Czarnecki
(University of Waterloo, Canada)
We present a controlled experiment for the empirical evaluation of Example-Driven Modeling (EDM), an approach that systematically uses examples for model comprehension and domain knowledge transfer. We conducted the experiment with 26 graduate and undergraduate students from electrical and computer engineering (ECE), computer science (CS), and software engineering (SE) programs at the University of Waterloo. The experiment involves a domain model, with UML class diagrams representing the domain abstractions and UML object diagrams representing examples of using these abstractions. The goal is to provide empirical evidence of the effects of suitable examples in model comprehension, compared to having model abstractions only, by having the participants perform model comprehension tasks. Our results show that EDM is superior to having model abstractions only, with an improvement of 39% for diagram completeness, 30% for questions completeness, 71% for efficiency, and a reduction of 80% for the number of mistakes. We provide qualitative results showing that participants receiving model abstractions augmented with examples experienced lower perceived difficulty in performing the comprehension tasks, higher perceived confidence in their tasks' solutions, and asked fewer clarifying domain questions, a reduction of 90%. We also present participants' feedback regarding the usefulness of the provided examples, their number and types, as well as, the use of partial examples.
Publisher's Version Article Search Info
Design Rule Spaces: A New Form of Architecture Insight
Lu Xiao, Yuanfang Cai, and Rick Kazman
(Drexel University, USA; University of Hawaii, USA; SEI, USA)
In this paper, we investigate software architecture as a set of overlapping design rule spaces, formed by one or more structural or evolutionary relationships and clustered using our design rule hierarchy algorithm. Considering evolutionary coupling as a special type of relationship, we investigated (1) whether design rule spaces can reveal structural relations among error-prone files; (2) whether design rule spaces can reveal structural problems contributing to error-proneness.We studied three large-scale open source projects and found that error-prone files can be captured by just a few design rule sub-spaces. Supported by our tool, Titan, we are able to flexibly visualize design rule spaces formed by different types of relationships, including evolutionary dependencies. This way, we are not only able to visualize which error-prone files belong to which design rule spaces, but also to visualize the structural problems that give insight into why these files are error prone. Design rule spaces provide valuable direction on which parts of the architecture are problematic, and on why, when, and how to refactor.
Publisher's Version Article Search
Controlled Modeling Environment using Flexibly-Formatted Spreadsheets
Hisashi Miyashita, Hideki Tai, and Shunichi Amano
(Cybernet Systems, Japan; IBM Research, Japan)
As modeling in software and system development becomes increasingly prevalent, many engineers need to collaboratively develop models spanning many disciplines such as requirements management, system design, software, etc. However, integrating modeling languages for various disciplines is challenging, because UML and SysML are too complex for many engineers to understand. Therefore, in complicated engineering processes, engineers with different areas of expertise often find it difficult to access the same information in different domain-specific modeling environments. Our approach to address this problem is to share and edit the models as task-oriented spreadsheets, using a unified model (in UML or SysML) and a unified user interface (in the spreadsheet program). The formats of the spreadsheets are optimized for various tasks while the target models remain in a unified modeling language. Since the transformation between the spreadsheets and the models is automated and transparent, users do not have to be skilled with the modeling languages to edit the spreadsheets. Using our novel approach, we were able to reduce the errors and time, and also the difficulty for each task without providing specialized training for the engineers. A preliminary user study showed that, by applying the spreadsheet-based approach, we could reduce the number of errors with less time for typical systems engineering tasks.
Publisher's Version Article Search
Feature Maintenance with Emergent Interfaces
Márcio Ribeiro, Paulo Borba, and Christian Kästner
(Federal University of Alagoas, Brazil; Federal University of Pernambuco, Brazil; Carnegie Mellon University, USA)
Hidden code dependencies are responsible for many complications in maintenance tasks. With the introduction of variable features in configurable systems, dependencies may even cross feature boundaries, causing problems that are prone to be detected late. Many current implementation techniques for product lines lack proper interfaces, which could make such dependencies explicit. As alternative to changing the implementation approach, we provide a tool-based solution to support developers in recognizing and dealing with feature dependencies: emergent interfaces. Emergent interfaces are inferred on demand, based on feature-sensitive intraprocedural and interprocedural data-flow analysis. They emerge in the IDE and emulate modularity benefits not available in the host language. To evaluate the potential of emergent interfaces, we conducted and replicated a controlled experiment, and found, in the studied context, that emergent interfaces can improve performance of code change tasks by up to 3 times while also reducing the number of errors.
Publisher's Version Article Search Info

Apps and Verification

Detecting Performance Anti-patterns for Applications Developed using Object-Relational Mapping
Parminder Flora, Weiyi Shang, Zhen Ming Jiang, and Ahmed E. Hassan
(Queen's University, Canada; York University, Canada; BlackBerry, Canada)
Object-Relational Mapping (ORM) provides developers a conceptual abstraction for mapping the application code to the underlying databases. ORM is widely used in industry due to its convenience; permitting developers to focus on developing the business logic without worrying too much about the database access details. However, developers often write ORM code without considering the impact of such code on database performance, leading to cause transactions with timeouts or hangs in large-scale systems. Unfortunately, there is little support to help developers automatically detect suboptimal database accesses. In this paper, we propose an automated framework to detect ORM performance anti-patterns. Our framework automatically flags performance anti-patterns in the source code. Furthermore, as there could be hundreds or even thousands of instances of anti-patterns, our framework provides sup- port to prioritize performance bug fixes based on a statistically rigorous performance assessment. We have successfully evaluated our framework on two open source and one large-scale industrial systems. Our case studies show that our framework can detect new and known real-world performance bugs and that fixing the detected performance anti- patterns can improve the system response time by up to 98%.
Publisher's Version Article Search
Characterizing and Detecting Performance Bugs for Smartphone Applications
Yepang Liu, Chang Xu, and Shing-Chi Cheung
(Hong Kong University of Science and Technology, China; Nanjing University, China)
Smartphone applications’ performance has a vital impact on user experience. However, many smartphone applications suffer from bugs that cause significant performance degradation, thereby losing their competitive edge. Unfortunately, people have little understanding of these performance bugs. They also lack effective techniques to fight with such bugs. To bridge this gap, we conducted a study of 70 real-world performance bugs collected from eight large-scale and popular Android applications. We studied the characteristics (e.g., bug types and how they manifested) of these bugs and identified their common patterns. These findings can support follow-up research on performance bug avoidance, testing, debugging and analysis for smartphone applications. To demonstrate the usefulness of our findings, we implemented a static code analyzer, PerfChecker, to detect our identified performance bug patterns. We experimentally evaluated PerfChecker by applying it to 29 popular Android applications, which comprise 1.1 million lines of Java code. PerfChecker successfully detected 126 matching instances of our performance bug patterns. Among them, 68 were quickly confirmed by developers as previously-unknown issues that affect application performance, and 20 were fixed soon afterwards by following our optimization suggestions.
Publisher's Version Article Search Info
Checking App Behavior Against App Descriptions
Alessandra Gorla, Ilaria Tavecchia, Florian Gross, and Andreas Zeller
(Saarland University, Germany)
How do we know a program does what it claims to do? After clustering Android apps by their description topics, we identify outliers in each cluster with respect to their API usage. A "weather" app that sends messages thus becomes an anomaly; likewise, a "messaging" app would typically not be expected to access the current location. Applied on a set of 22,500+ Android applications, our CHABADA prototype identified several anomalies; additionally, it flagged 56% of novel malware as such, without requiring any known malware patterns.
Publisher's Version Article Search
AsDroid: Detecting Stealthy Behaviors in Android Applications by User Interface and Program Behavior Contradiction
Jianjun Huang, Xiangyu Zhang, Lin Tan, Peng Wang, and Bin Liang
(Purdue University, USA; University of Waterloo, Canada; Renmin University of China, China)
Android smartphones are becoming increasingly popular. The open nature of Android allows users to install miscellaneous applications, including the malicious ones, from third-party marketplaces without rigorous sanity checks. A large portion of existing malwares perform stealthy operations such as sending short messages, making phone calls and HTTP connections, and installing additional malicious components. In this paper, we propose a novel technique to detect such stealthy behavior. We model stealthy behavior as the program behavior that mismatches with user interface, which denotes the user's expectation of program behavior. We use static program analysis to attribute a top level function that is usually a user interaction function with the behavior it performs. Then we analyze the text extracted from the user interface component associated with the top level function. Semantic mismatch of the two indicates stealthy behavior. To evaluate AsDroid, we download a pool of 182 apps that are potentially problematic by looking at their permissions. Among the 182 apps, AsDroid reports stealthy behaviors in 113 apps, with 28 false positives and 11 false negatives.
Publisher's Version Article Search

Symbolic Execution

Patch Verification via Multiversion Interprocedural Control Flow Graphs
Wei Le and Shannon D. Pattison
(Rochester Institute of Technology, USA)
Software development is inherently incremental; however, it is challenging to correctly introduce changes on top of existing code. Recent studies show that 15%-24% of the bug fixes are incorrect, and the most important yet hard-to-acquire information for programming changes is whether this change breaks any code elsewhere. This paper presents a framework, called Hydrogen, for patch verification. Hydrogen aims to automatically determine whether a patch correctly fixes a bug, a new bug is introduced in the change, a bug can impact multiple software releases, and the patch is applicable for all the impacted releases. Hydrogen consists of a novel program representation, namely multiversion interprocedural control flow graph (MVICFG), that integrates and compares control flow of multiple versions of programs, and a demand-driven, path-sensitive symbolic analysis that traverses the MVICFG for detecting bugs related to software changes and versions. In this paper, we present the definition, construction and applications of MVICFGs. Our experimental results show that Hydrogen correctly builds desired MVICFGs and is scalable to real-life programs such as libpng, tightvnc and putty. We experimentally demonstrate that MVICFGs can enable efficient patch verification. Using the results generated by Hydrogen, we have found a few documentation errors related to patches for a set of open-source programs.
Publisher's Version Article Search
Property Differencing for Incremental Checking
Guowei Yang, Sarfraz Khurshid, Suzette Person, and Neha Rungta
(Texas State University, USA; University of Texas at Austin, USA; NASA Langley Research Center, USA; NASA Ames Research Center, USA)
This paper introduces iProperty, a novel approach that facilitates incremental checking of programs based on a property differencing technique. Specifically, iProperty aims to reduce the cost of checking properties as they are initially developed and as they co-evolve with the program. The key novelty of iProperty is to compute the differences between the new and old versions of expected properties to reduce the number and size of the properties that need to be checked during the initial development of the properties. Furthermore, property differencing is used in synergy with program behavior differencing techniques to optimize common regression scenarios, such as detecting regression errors or checking feature additions for conformance to new expected properties. Experimental results in the context of symbolic execution of Java programs annotated with properties written as assertions show the effectiveness of iProperty in utilizing change information to enable more efficient checking.
Publisher's Version Article Search
Symbolic Assume-Guarantee Reasoning through BDD Learning
Fei He, Bow-Yaw Wang, Liangze Yin, and Lei Zhu
(Tsinghua University, China; Academia Sinica, Taiwan)
Both symbolic model checking and assume-guarantee reasoning aim to circumvent the state explosion problem. Symbolic model checking explores many states simultaneously and reports numerous erroneous traces. Automated assume-guarantee reasoning, on the other hand, infers contextual assumptions by inspecting spurious erroneous traces. One would expect that their integration could further improve the capacity of model checking. Yet examining numerous erroneous traces to deduce contextual assumptions can be very time-consuming. The integration of symbolic model checking and assume-guarantee reasoning is thus far from clear. In this paper, we present a progressive witness analysis algorithm for automated assume-guarantee reasoning to exploit a multitude of traces from BDD-based symbolic model checkers. Our technique successfully integrates symbolic model checking with automated assume-guarantee reasoning by directly inferring BDD's as implicit assumptions. It outperforms monolithic symbolic model checking in four benchmark problems and an industrial case study in experiments.
Publisher's Version Article Search
Enhancing Symbolic Execution with Veritesting
Thanassis Avgerinos, Alexandre Rebert, Sang Kil Cha, and David Brumley
(Carnegie Mellon University, USA)
We present MergePoint, a new binary-only symbolic execution system for large-scale and fully unassisted testing of commodity off-the-shelf (COTS) software. MergePoint introduces veritesting, a new technique that employs static symbolic execution to amplify the effect of dynamic symbolic execution. Veritesting allows MergePoint to find twice as many bugs, explore orders of magnitude more paths, and achieve higher code coverage than previous dynamic symbolic execution systems. MergePoint is currently running daily on a 100 node cluster analyzing 33,248 Linux binaries; has generated more than 15 billion SMT queries, 200 million test cases, 2,347,420 crashes, and found 11,687 bugs in 4,379 distinct applications.
Publisher's Version Article Search

Refactoring and Reverse Engineering

Manual Refactoring Changes with Automated Refactoring Validation
Xi Ge and Emerson Murphy-Hill
(North Carolina State University, USA)
Refactoring, the practice of applying behavior-preserving changes to existing code, can enhance the quality of software systems. Refactoring tools can automatically perform and check the correctness of refactorings. However, even when developers have these tools, they still perform about 90% of refactorings manually, which is error-prone. To address this problem, we propose a technique called GhostFactor separating transformation and correctness checking: we allow the developer to transform code manually, but check the correctness of her transformation automatically. We implemented our technique as a Visual Studio plugin, then evaluated it with a human study of eight software developers; GhostFactor improved the correctness of manual refactorings by 67%.
Publisher's Version Article Search
Alternate Refactoring Paths Reveal Usability Problems
Mohsen Vakilian and Ralph E. Johnson
(University of Illinois at Urbana-Champaign, USA)
Modern Integrated Development Environments (IDEs) support many refactorings. Yet, programmers greatly underuse automated refactorings. Recent studies have applied traditional usability testing methodologies such as surveys, lab studies, and interviews to find the usability problems of refactoring tools. However, these methodologies can identify only certain kinds of usability problems. The critical incident technique (CIT) is a general methodology that uncovers usability problems by analyzing troubling user interactions. We adapt CIT to refactoring tools and show that alternate refactoring paths are indicators of the usability problems of refactoring tools. We define an alternate refactoring path as a sequence of user interactions that contains cancellations, reported messages, or repeated invocations of the refactoring tool. We evaluated our method on a large corpus of refactoring usage data, which we collected during a field study on 36 programmers over three months. This method revealed 15 usability problems, 13 of which were previously unknown. We reported these problems and proposed design improvements to Eclipse developers. The developers acknowledged all of the problems and have already fixed four of them. This result suggests that analyzing alternate paths is effective at discovering the usability problems of interactive program transformation (IPT) tools.
Publisher's Version Article Search Info
A Study and Toolkit for Asynchronous Programming in C#
Semih Okur, David L. Hartveld, Danny Dig, and Arie van Deursen
(University of Illinois at Urbana-Champaign, USA; Delft University of Technology, Netherlands; Oregon State University, USA)
Asynchronous programming is in demand today, because responsiveness is increasingly important on all modern devices. Yet, we know little about how developers use asynchronous programming in practice. Without such knowledge, developers, researchers, language and library designers, and tool providers can make wrong assumptions. We present the first study that analyzes the usage of asynchronous programming in a large experiment. We analyzed 1378 open source Windows Phone (WP) apps, comprising 12M SLOC, produced by 3376 developers. Using this data, we answer 2 research questions about use and misuse of asynchronous constructs. Inspired by these findings, we developed (i) Asyncifier, an automated refactoring tool that converts callback-based asynchronous code to use async/await; (ii) Corrector, a tool that finds and corrects common misuses of async/await. Our empirical evaluation shows that these tools are (i) applicable and (ii) efficient. Developers accepted 314 patches generated by our tools.
Publisher's Version Article Search Info
Reuse-Oriented Reverse Engineering of Functional Components from X86 Binaries
Dohyeong Kim, William N. Sumner, Xiangyu Zhang, Dongyan Xu, and Hira Agrawal
(Purdue University, USA; Simon Fraser University, Canada; Applied Communications Sciences, USA)
Locating, extracting, and reusing the implementation of a feature within an existing binary program is challenging. This paper proposes a novel algorithm to identify modular functions corresponding to such features and to provide usable interfaces for the extracted functions. We provide a way to represent a desired feature with two executions that both execute the feature but with different inputs. Instead of reverse engineering the interface of a function, we wrap the existing interface and provide a simpler and more intuitive interface for the function through concretization and redirection. Experiments show that our technique can be applied to extract varied features from several real world applications including a malicious application.
Publisher's Version Article Search

proc time: 0.25