Powered by
Conference Publishing Consulting

2015 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2015), August 30 – September 4, 2015, Bergamo, Italy

ESEC/FSE 2015 – Proceedings

Contents - Abstracts - Authors
Twitter: https://twitter.com/FSEconf


Title Page

Welcome from the Chairs



Research Papers

Adaptive Systems

Proactive Self-Adaptation under Uncertainty: A Probabilistic Model Checking Approach
Gabriel A. Moreno, Javier Cámara, David Garlan, and Bradley Schmerl
(SEI, USA; Carnegie Mellon University, USA)
Self-adaptive systems tend to be reactive and myopic, adapting in response to changes without anticipating what the subsequent adaptation needs will be. Adapting reactively can result in inefficiencies due to the system performing a suboptimal sequence of adaptations. Furthermore, when adaptations have latency, and take some time to produce their effect, they have to be started with sufficient lead time so that they complete by the time their effect is needed. Proactive latency-aware adaptation addresses these issues by making adaptation decisions with a look-ahead horizon and taking adaptation latency into account. In this paper we present an approach for proactive latency-aware adaptation under uncertainty that uses probabilistic model checking for adaptation decisions. The key idea is to use a formal model of the adaptive system in which the adaptation decision is left underspecified through nondeterminism, and have the model checker resolve the nondeterministic choices so that the accumulated utility over the horizon is maximized. The adaptation decision is optimal over the horizon, and takes into account the inherent uncertainty of the environment predictions needed for looking ahead. Our results show that the decision based on a look-ahead horizon, and the factoring of both tactic latency and environment uncertainty, considerably improve the effectiveness of adaptation decisions.
Publisher's Version Article Search
Automated Multi-objective Control for Self-Adaptive Software Design
Antonio Filieri, Henry Hoffmann, and Martina Maggio
(University of Stuttgart, Germany; University of Chicago, USA; Lund University, Sweden)
While software is becoming more complex everyday, the requirements on its behavior are not getting any easier to satisfy. An application should offer a certain quality of service, adapt to the current environmental conditions and withstand runtime variations that were simply unpredictable during the design phase. To tackle this complexity, control theory has been proposed as a technique for managing software's dynamic behavior, obviating the need for human intervention. Control-theoretical solutions, however, are either tailored for the specific application or do not handle the complexity of multiple interacting components and multiple goals. In this paper, we develop an automated control synthesis methodology that takes, as input, the configurable software components (or knobs) and the goals to be achieved. Our approach automatically constructs a control system that manages the specified knobs and guarantees the goals are met. These claims are backed up by experimental studies on three different software applications, where we show how the proposed automated approach handles the complexity of multiple knobs and objectives.
Publisher's Version Article Search
Detecting Event Anomalies in Event-Based Systems
Gholamreza Safi, Arman Shahbazian, William G. J. Halfond, and Nenad Medvidovic
(University of Southern California, USA)
Event-based interaction is an attractive paradigm because its use can lead to highly flexible and adaptable systems. One problem in this paradigm is that events are sent, received, and processed nondeterministically, due to the systems’ reliance on implicit invocation and implicit concurrency. This nondeterminism can lead to event anomalies, which occur when an event-based system receives multiple events that lead to the write of a shared field or memory location. Event anomalies can lead to unreliable, error-prone, and hard to debug behavior in an event-based system. To detect these anomalies, this paper presents a new static analysis technique, DEvA, for automatically detecting event anomalies. DEvA has been evaluated on a set of open-source event-based systems against a state-of-the-art technique for detecting data races in multithreaded systems, and a recent technique for solving a similar problem with event processing in Android applications. DEvA exhibited high precision with respect to manually constructed ground truths, and was able to locate event anomalies that had not been detected by the existing solutions.
Publisher's Version Article Search Video Info

Software Quality

Suggesting Accurate Method and Class Names
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton
(University of Edinburgh, UK; University College London, UK; Microsoft Research, USA)
Descriptive names are a vital part of readable, and hence maintainable, code. Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names. However, suggesting names for methods and classes is much more difficult. This is because good method and class names need to be functionally descriptive, but suggesting such names requires that the model goes beyond local context. We introduce a neural probabilistic language model for source code that is specifically designed for the method naming problem. Our model learns which names are semantically similar by assigning them to locations, called embeddings, in a high-dimensional continuous space, in such a way that names with similar embeddings tend to be used in similar contexts. These embeddings seem to contain semantic information about tokens, even though they are learned only from statistical co-occurrences of tokens. Furthermore, we introduce a variant of our model that is, to our knowledge, the first that can propose neologisms, names that have not appeared in the training corpus. We obtain state of the art results on the method, class, and even the simpler variable naming tasks. More broadly, the continuous embeddings that are learned by our model have the potential for wide application within software engineering.
Publisher's Version Article Search Info
Measure It? Manage It? Ignore It? Software Practitioners and Technical Debt
Neil A. Ernst, Stephany Bellomo, Ipek Ozkaya, Robert L. Nord, and Ian Gorton
The technical debt metaphor is widely used to encapsulate numerous software quality problems. The metaphor is attractive to practitioners as it communicates to both technical and nontechnical audiences that if quality problems are not addressed, things may get worse. However, it is unclear whether there are practices that move this metaphor beyond a mere communication mechanism. Existing studies of technical debt have largely focused on code metrics and small surveys of developers. In this paper, we report on our survey of 1,831 participants, primarily software engineers and architects working in long-lived, software-intensive projects from three large organizations, and follow-up interviews of seven software engineers. We analyzed our data using both nonparametric statistics and qualitative text analysis. We found that architectural decisions are the most important source of technical debt. Furthermore, while respondents believe the metaphor is itself important for communication, existing tools are not currently helpful in managing the details. We use our results to motivate a technical debt timeline to focus management and tooling approaches.
Publisher's Version Article Search Info
Automatically Computing Path Complexity of Programs
Lucas Bang, Abdulbaki Aydin, and Tevfik Bultan
(University of California at Santa Barbara, USA)
Recent automated software testing techniques concentrate on achieving path coverage. We present a complexity measure that provides an upper bound for the number of paths in a program, and hence, can be used for assessing the difficulty of achieving path coverage for a given method. We define the path complexity of a program as a function that takes a depth bound as input and returns the number of paths in the control flow graph that are within that bound. We show how to automatically compute the path complexity function in closed form, and the asymptotic path complexity which identifies the dominant term in the path complexity function. Our results demonstrate that path complexity can be computed efficiently, and it is a better complexity measure for path coverage compared to cyclomatic complexity and NPATH complexity.
Publisher's Version Article Search Info

Synthesis and Search-Based Approaches for Reactive Systems

Systematic Testing of Asynchronous Reactive Systems
Ankush Desai, Shaz Qadeer, and Sanjit A. Seshia
(University of California at Berkeley, USA; Microsoft Research, USA)

We introduce the concept of a delaying explorer with the goal of performing prioritized exploration of the behaviors of an asynchronous reactive program. A delaying explorer stratifies the search space using a custom strategy, and a delay operation that allows deviation from that strategy. We show that prioritized search with a delaying explorer performs significantly better than existing prioritization techniques. We also demonstrate empirically the need for writing different delaying explorers for scalable systematic testing and hence, present a flexible delaying explorer interface. We introduce two new techniques to improve the scalability of search based on delaying explorers. First, we present an algorithm for stratified exhaustive search and use efficient state caching to avoid redundant exploration of schedules. We provide soundness and termination guarantees for our algorithm. Second, for the cases where the state of the system cannot be captured or there are resource constraints, we present an algorithm to randomly sample any execution from the stratified search space. This algorithm guarantees that any such execution that requires d delay operations is sampled with probability at least 1/Ld, where L is the maximum number of program steps. We have implemented our algorithms and evaluated them on a collection of real-world fault-tolerant distributed protocols.

Publisher's Version Article Search Info
Effective Test Suites for Mixed Discrete-Continuous Stateflow Controllers
Reza Matinnejad, Shiva Nejati, Lionel C. Briand, and Thomas Bruckmann
(University of Luxembourg, Luxembourg; Delphi Automotive Systems, Luxembourg)
Modeling mixed discrete-continuous controllers using Stateflow is common practice and has a long tradition in the embedded software system industry. Testing Stateflow models is complicated by expensive and manual test oracles that are not amenable to full automation due to the complex continuous behaviors of such models. In this paper, we reduce the cost of manual test oracles by providing test case selection algorithms that help engineers develop small test suites with high fault revealing power for Stateflow models. We present six test selection algorithms for discrete-continuous Stateflows: An adaptive random test selection algorithm that diversifies test inputs, two white-box coverage-based algorithms, a black-box algorithm that diversifies test outputs, and two search-based black-box algorithms that aim to maximize the likelihood of presence of continuous output failure patterns. We evaluate and compare our test selection algorithms, and find that our three output-based algorithms consistently outperform the coverage- and input-based algorithms in revealing faults in discrete-continuous Stateflow models. Further, we show that our output-based algorithms are complementary as the two search-based algorithms perform best in revealing specific failures with small test suites, while the output diversity algorithm is able to identify different failure types better than other algorithms when test suites are above a certain size.
Publisher's Version Article Search
GR(1) Synthesis for LTL Specification Patterns
Shahar Maoz and Jan Oliver Ringert
(Tel Aviv University, Israel)
Reactive synthesis is an automated procedure to obtain a correct-by-construction reactive system from its temporal logic specification. Two of the main challenges in bringing reactive synthesis to software engineering practice are its very high worst-case complexity -- for linear temporal logic (LTL) it is double exponential in the length of the formula, and the difficulty of writing declarative specifications using basic LTL operators. To address the first challenge, Piterman et al. have suggested the General Reactivity of Rank 1 (GR(1)) fragment of LTL, which has an efficient polynomial time symbolic synthesis algorithm. To address the second challenge, Dwyer et al. have identified 55 LTL specification patterns, which are common in industrial specifications and make writing specifications easier. In this work we show that almost all of the 55 LTL specification patterns identified by Dwyer et al. can be expressed as assumptions and guarantees in the GR(1) fragment of LTL. Specifically, we present an automated, sound and complete translation of the patterns to the GR(1) form, which effectively results in an efficient reactive synthesis procedure for any specification that is written using the patterns. We have validated the correctness of the catalog of GR(1) templates we have created. The work is implemented in our reactive synthesis environment. It provides positive, promising evidence, for the potential feasibility of using reactive synthesis in practice.
Publisher's Version Article Search

Testing I

Modeling Readability to Improve Unit Tests
Ermira Daka, José Campos, Gordon Fraser, Jonathan Dorn, and Westley Weimer
(University of Sheffield, UK; University of Virginia, USA)
Writing good unit tests can be tedious and error prone, but even once they are written, the job is not done: Developers need to reason about unit tests throughout software development and evolution, in order to diagnose test failures, maintain the tests, and to understand code written by other developers. Unreadable tests are more difficult to maintain and lose some of their value to developers. To overcome this problem, we propose a domain-specific model of unit test readability based on human judgements, and use this model to augment automated unit test generation. The resulting approach can automatically generate test suites with both high coverage and also improved readability. In human studies users prefer our improved tests and are able to answer maintenance questions about them 14% more quickly at the same level of accuracy.
Publisher's Version Article Search
Improving Model-Based Test Generation by Model Decomposition
Paolo Arcaini, Angelo Gargantini, and Elvinia Riccobene
(Charles University in Prague, Czech Republic; University of Bergamo, Italy; University of Milan, Italy)
One of the well-known techniques for model-based test generation exploits the capability of model checkers to return counterexamples upon property violations. However, this approach is not always optimal in practice due to the required time and memory, or even not feasible due to the state explosion problem of model checking. A way to mitigate these limitations consists in decomposing a system model into suitable subsystem models separately analyzable. In this paper, we show a technique to decompose a system model into subsystems by exploiting the model variables dependency, and then we propose a test generation approach which builds tests for the single subsystems and combines them later in order to obtain tests for the system as a whole. Such approach mitigates the exponential increase of the test generation time and memory consumption, and, compared with the same model-based test generation technique applied to the whole system, shows to be more efficient. We prove that, although not complete, the approach is sound.
Publisher's Version Article Search
Synthesizing Tests for Detecting Atomicity Violations
Malavika Samak and Murali Krishna Ramanathan
(Indian Institute of Science, India)
Using thread-safe libraries can help programmers avoid the complexities of multithreading. However, designing libraries that guarantee thread-safety can be challenging. Detecting and eliminating atomicity violations when methods in the libraries are invoked concurrently is vital in building reliable client applications that use the libraries. While there are dynamic analyses to detect atomicity violations, these techniques are critically dependent on effective multithreaded tests. Unfortunately, designing such tests is non-trivial. In this paper, we design a novel and scalable approach for synthesizing multithreaded tests that help detect atomicity violations. The input to the approach is the implementation of the library and a sequential seed testsuite that invokes every method in the library with random parameters. We analyze the execution of the sequential tests, generate variable lock dependencies and construct a set of three accesses which when interleaved suitably in a multithreaded execution can cause an atomicity violation. Subsequently, we identify pairs of method invocations that correspond to these accesses and invoke them concurrently from distinct threads with appropriate objects to help expose atomicity violations. We have incorporated these ideas in our tool, named Intruder, and applied it on multiple open-source Java multithreaded libraries. Intruder is able to synthesize 40 multithreaded tests across nine classes in less than two minutes to detect 79 harmful atomicity violations, including previously unknown violations in thread-safe classes. We also demonstrate the effectiveness of Intruder by comparing the results with other approaches designed for synthesizing multithreaded tests.
Publisher's Version Article Search

Search-Based Approaches to Testing, Repair, and Energy Optimisation

Optimizing Energy Consumption of GUIs in Android Apps: A Multi-objective Approach
Mario Linares-Vásquez, Gabriele Bavota, Carlos Eduardo Bernal Cárdenas, Rocco Oliveto, Massimiliano Di Penta, and Denys Poshyvanyk
(College of William and Mary, USA; Free University of Bolzano, Italy; University of Molise, Italy; University of Sannio, Italy)
The wide diffusion of mobile devices has motivated research towards optimizing energy consumption of software systems— including apps—targeting such devices. Besides efforts aimed at dealing with various kinds of energy bugs, the adoption of Organic Light-Emitting Diode (OLED) screens has motivated research towards reducing energy consumption by choosing an appropriate color palette. Whilst past research in this area aimed at optimizing energy while keeping an acceptable level of contrast, this paper proposes an approach, named GEMMA (Gui Energy Multi-objective optiMization for Android apps), for generating color palettes using a multi- objective optimization technique, which produces color solutions optimizing energy consumption and contrast while using consistent colors with respect to the original color palette. An empirical evaluation that we performed on 25 Android apps demonstrates not only significant improvements in terms of the three different objectives, but also confirmed that in most cases users still perceived the choices of colors as attractive. Finally, for several apps we interviewed the original developers, who in some cases expressed the intent to adopt the proposed choice of color palette, whereas in other cases pointed out directions for future improvements
Publisher's Version Article Search Info
Generating TCP/UDP Network Data for Automated Unit Test Generation
Andrea Arcuri, Gordon Fraser, and Juan Pablo Galeotti
(Scienta, Norway; University of Luxembourg, Luxembourg; University of Sheffield, UK; Saarland University, Germany)
Although automated unit test generation techniques can in principle generate test suites that achieve high code coverage, in practice this is often inhibited by the dependence of the code under test on external resources. In particular, a common problem in modern programming languages is posed by code that involves networking (e.g., opening a TCP listening port). In order to generate tests for such code, we describe an approach where we mock (simulate) the networking interfaces of the Java standard library, such that a search-based test generator can treat the network as part of the test input space. This not only has the benefit that it overcomes many limitations of testing networking code (e.g., different tests binding to the same local ports, and deterministic resolution of hostnames and ephemeral ports), it also substantially increases code coverage. An evaluation on 23,886 classes from 110 open source projects, totalling more than 6.6 million lines of Java code, reveals that network access happens in 2,642 classes (11%). Our implementation of the proposed technique as part of the EVOSUITE testing tool addresses the networking code contained in 1,672 (63%) of these classes, and leads to an increase of the average line coverage from 29.1% to 50.8%. On a manual selection of 42 Java classes heavily depending on networking, line coverage with EVOSUITE more than doubled with the use of network mocking, increasing from 31.8% to 76.6%.
Publisher's Version Article Search
Staged Program Repair with Condition Synthesis
Fan Long and Martin Rinard
(Massachusetts Institute of Technology, USA)
We present SPR, a new program repair system that combines staged program repair and condition synthesis. These techniques enable SPR to work productively with a set of parameterized transformation schemas to generate and efficiently search a rich space of program repairs. Together these techniques enable SPR to generate correct repairs for over five times as many defects as previous systems evaluated on the same benchmark set.
Publisher's Version Article Search Info

Empirical Studies of Software Developers I

When, How, and Why Developers (Do Not) Test in Their IDEs
Moritz Beller, Georgios Gousios, Annibale Panichella, and Andy Zaidman
(Delft University of Technology, Netherlands; Radboud University Nijmegen, Netherlands)
The research community in Software Engineering and Software Testing in particular builds many of its contributions on a set of mutually shared expectations. Despite the fact that they form the basis of many publications as well as open-source and commercial testing applications, these common expectations and beliefs are rarely ever questioned. For example, Frederic Brooks’ statement that testing takes half of the development time seems to have manifested itself within the community since he first made it in the “Mythical Man Month” in 1975. With this paper, we report on the surprising results of a large-scale field study with 416 software engineers whose development activity we closely monitored over the course of five months, resulting in over 13 years of recorded work time in their integrated development environments (IDEs). Our findings question several commonly shared assumptions and beliefs about testing and might be contributing factors to the observed bug proneness of software in practice: the majority of developers in our study does not test; developers rarely run their tests in the IDE; Test-Driven Development (TDD) is not widely practiced; and, last but not least, software developers only spend a quarter of their work time engineering tests, whereas they think they test half of their time.
Publisher's Version Article Search
How Developers Search for Code: A Case Study
Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum
(Google, USA; Iowa State University, USA; University of Nebraska-Lincoln, USA)
With the advent of large code repositories and sophisticated search capabilities, code search is increasingly becoming a key software development activity. In this work we shed some light into how developers search for code through a case study performed at Google, using a combination of survey and log-analysis methodologies. Our study provides insights into what developers are doing and trying to learn when per- forming a search, search scope, query properties, and what a search session under different contexts usually entails. Our results indicate that programmers search for code very frequently, conducting an average of five search sessions with 12 total queries each workday. The search queries are often targeted at a particular code location and programmers are typically looking for code with which they are somewhat familiar. Further, programmers are generally seeking answers to questions about how to use an API, what code does, why something is failing, or where code is located.
Publisher's Version Article Search
Tracing Software Developers' Eyes and Interactions for Change Tasks
Katja Kevic, Braden M. Walters, Timothy R. Shaffer, Bonita Sharif, David C. Shepherd, and Thomas Fritz
(University of Zurich, Switzerland; Youngstown State University, USA; ABB Research, USA)
What are software developers doing during a change task? While an answer to this question opens countless opportunities to support developers in their work, only little is known about developers' detailed navigation behavior for realistic change tasks. Most empirical studies on developers performing change tasks are limited to very small code snippets or are limited by the granularity or the detail of the data collected for the study. In our research, we try to overcome these limitations by combining user interaction monitoring with very fine granular eye-tracking data that is automatically linked to the underlying source code entities in the IDE. In a study with 12 professional and 10 student developers working on three change tasks from an open source system, we used our approach to investigate the detailed navigation of developers for realistic change tasks. The results of our study show, amongst others, that the eye tracking data does indeed capture different aspects than user interaction data and that developers focus on only small parts of methods that are often related by data flow. We discuss our findings and their implications for better developer tool support.
Publisher's Version Article Search Info

Testing II

Assertions Are Strongly Correlated with Test Suite Effectiveness
Yucheng Zhang and Ali Mesbah
(University of British Columbia, Canada)
Code coverage is a popular test adequacy criterion in practice. Code coverage, however, remains controversial as there is a lack of coherent empirical evidence for its relation with test suite effectiveness. More recently, test suite size has been shown to be highly correlated with effectiveness. However, previous studies treat test methods as the smallest unit of interest, and ignore potential factors influencing this relationship. We propose to go beyond test suite size, by investigating test assertions inside test methods. We empirically evaluate the relationship between a test suite’s effectiveness and the (1) number of assertions, (2) assertion coverage, and (3) different types of assertions. We compose 6,700 test suites in total, using 24,000 assertions of five real-world Java projects. We find that the number of assertions in a test suite strongly correlates with its effectiveness, and this factor directly influences the relationship between test suite size and effectiveness. Our results also indicate that assertion coverage is strongly correlated with effectiveness and different types of assertions can influence the effectiveness of their containing test suites.
Publisher's Version Article Search Info
Test Report Prioritization to Assist Crowdsourced Testing
Yang Feng, Zhenyu Chen, James A. Jones, Chunrong Fang, and Baowen Xu
(Nanjing University, China; University of California at Irvine, USA)
In crowdsourced testing, users can be incentivized to perform testing tasks and report their results, and because crowdsourced workers are often paid per task, there is a financial incentive to complete tasks quickly rather than well. These reports of the crowdsourced testing tasks are called "test reports" and are composed of simple natural language and screenshots. Back at the software-development organization, developers must manually inspect the test reports to judge their value for revealing faults. Due to the nature of crowdsourced work, the number of test reports are often difficult to comprehensively inspect and process. In order to help with this daunting task, we created the first technique of its kind, to the best of our knowledge, to prioritize test reports for manual inspection. Our technique utilizes two key strategies: (1) a diversity strategy to help developers inspect a wide variety of test reports and to avoid duplicates and wasted effort on falsely classified faulty behavior, and (2) a risk strategy to help developers identify test reports that may be more likely to be fault-revealing based on past observations. Together, these strategies form our DivRisk strategy to prioritize test reports in crowd- sourced testing. Three industrial projects have been used to evaluate the effectiveness of test report prioritization methods. The results of the empirical study show that: (1) DivRisk can significantly outperform random prioritization; (2) DivRisk can approximate the best theoretical result for a real-world industrial mobile application. In addition, we provide some practical guidelines of test report prioritization for crowdsourced testing based on the empirical study and our experiences.
Publisher's Version Article Search
Comparing and Combining Test-Suite Reduction and Regression Test Selection
August Shi, Tifany Yung, Alex Gyori, and Darko Marinov
(University of Illinois at Urbana-Champaign, USA)
Regression testing is widely used to check that changes made to software do not break existing functionality, but regression test suites grow, and running them fully can become costly. Researchers have proposed test-suite reduction and regression test selection as two approaches to reduce this cost by not running some of the tests from the test suite. However, previous research has not empirically evaluated how the two approaches compare to each other, and how well a combination of these approaches performs. We present the first extensive study that compares test-suite reduction and regression test selection approaches individually, and also evaluates a combination of the two approaches. We also propose a new criterion to measure the quality of tests with respect to software changes. Our experiments on 4,793 commits from 17 open-source projects show that regression test selection runs on average fewer tests (by 40.15pp) than test-suite reduction. However, test-suite reduction can have a high loss in fault-detection capability with respect to the changes, whereas a (safe) regression test selection has no loss. The experiments also show that a combination of the two approaches runs even fewer tests (on average 5.34pp) than regression test selection, but these tests still have a loss in fault-detection capability with respect to the changes.
Publisher's Version Article Search


Questions Developers Ask While Diagnosing Potential Security Vulnerabilities with Static Analysis
Justin Smith, Brittany Johnson, Emerson Murphy-Hill, Bill Chu, and Heather Richter Lipford
(North Carolina State University, USA; University of North Carolina at Charlotte, USA)
Security tools can help developers answer questions about potential vulnerabilities in their code. A better understanding of the types of questions asked by developers may help toolsmiths design more effective tools. In this paper, we describe how we collected and categorized these questions by conducting an exploratory study with novice and experienced software developers. We equipped them with Find Security Bugs, a security-oriented static analysis tool, and observed their interactions with security vulnerabilities in an open-source system that they had previously contributed to. We found that they asked questions not only about security vulnerabilities, associated attacks, and fixes, but also questions about the software itself, the social ecosystem that built the software, and related resources and tools. For example, when participants asked questions about the source of tainted data, their tools forced them to make imperfect tradeoffs between systematic and ad hoc program navigation strategies.
Publisher's Version Article Search Info
Quantifying Developers' Adoption of Security Tools
Jim Witschey, Olga Zielinska, Allaire Welk, Emerson Murphy-Hill, Chris Mayhorn, and Thomas Zimmermann
(North Carolina State University, USA; Microsoft Research, USA)
Security tools could help developers find critical vulnerabilities, yet such tools remain underused. We surveyed developers from 14 companies and 5 mailing lists about their reasons for using and not using security tools. The resulting thirty-nine predictors of security tool use provide both expected and unexpected insights. As we expected, developers who perceive security to be important are more likely to use security tools than those who do not. But that was not the strongest predictor of security tool use, it was instead developers' ability to observe their peers using security tools.
Publisher's Version Article Search
Auto-patching DOM-Based XSS at Scale
Inian Parameshwaran, Enrico Budianto, Shweta Shinde, Hung Dang, Atul Sadhu, and Prateek Saxena
(National University of Singapore, Singapore)
DOM-based cross-site scripting (XSS) is a client-side code injection vulnerability that results from unsafe dynamic code generation in JavaScript applications, and has few known practical defenses. We study dynamic code evaluation practices on nearly a quarter million URLs crawled starting from the the Alexa Top 1000 websites. Of 777,082 cases of dynamic HTML/JS code generation we observe, 13.3% use unsafe string interpolation for dynamic code generation — a well-known dangerous coding practice. To remedy this, we propose a technique to generate secure patches that replace unsafe string interpolation with safer code that utilizes programmatic DOM construction techniques. Our system transparently auto-patches the vulnerable site while incurring only 5.2 − 8.07% overhead. The patching mechanism requires no access to server-side code or modification to browsers, and thus is practical as a turnkey defense.
Publisher's Version Article Search Info

Configurable Systems

Performance-Influence Models for Highly Configurable Systems
Norbert Siegmund, Alexander Grebhahn, Sven Apel, and Christian Kästner
(University of Passau, Germany; Carnegie Mellon University, USA)
Almost every complex software system today is configurable. While configurability has many benefits, it challenges performance prediction, optimization, and debugging. Often, the influences of individual configuration options on performance are unknown. Worse, configuration options may interact, giving rise to a configuration space of possibly exponential size. Addressing this challenge, we propose an approach that derives a performance-influence model for a given configurable system, describing all relevant influences of configuration options and their interactions. Our approach combines machine-learning and sampling heuristics in a novel way. It improves over standard techniques in that it (1) represents influences of options and their interactions explicitly (which eases debugging), (2) smoothly integrates binary and numeric configuration options for the first time, (3) incorporates domain knowledge, if available (which eases learning and increases accuracy), (4) considers complex constraints among options, and (5) systematically reduces the solution space to a tractable size. A series of experiments demonstrates the feasibility of our approach in terms of the accuracy of the models learned as well as the accuracy of the performance predictions one can make with them.
Publisher's Version Article Search Info
Users Beware: Preference Inconsistencies Ahead
Farnaz Behrang, Myra B. Cohen, and Alessandro Orso
(Georgia Tech, USA; University of Nebraska-Lincoln, USA)
The structure of preferences for modern highly-configurable software systems has become extremely complex, usually consisting of multiple layers of access that go from the user interface down to the lowest levels of the source code. This complexity can lead to inconsistencies between layers, especially during software evolution. For example, there may be preferences that users can change through the GUI, but that have no effect on the actual behavior of the system because the related source code is not present or has been removed going from one version to the next. These inconsistencies may result in unexpected program behaviors, which range in severity from mild annoyances to more critical security or performance problems. To address this problem, we present SCIC (Software Configuration Inconsistency Checker), a static analysis technique that can automatically detect these kinds of inconsistencies. Unlike other configuration analysis tools, SCIC can handle software that (1) is written in multiple programming languages and (2) has a complex preference structure. In an empirical evaluation that we performed on 10 years worth of versions of both the widely used Mozilla Core and Firefox, SCIC was able to find 40 real inconsistencies (some determined as severe), whose lifetime spanned multiple versions, and whose detection required the analysis of code written in multiple languages.
Publisher's Version Article Search
Hey, You Have Given Me Too Many Knobs!: Understanding and Dealing with Over-Designed Configuration in System Software
Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker
(University of California at San Diego, USA; Huazhong University of Science and Technology, China; NetApp, USA)
Configuration problems are not only prevalent, but also severely impair the reliability of today's system software. One fundamental reason is the ever-increasing complexity of configuration, reflected by the large number of configuration parameters ("knobs"). With hundreds of knobs, configuring system software to ensure high reliability and performance becomes a daunting, error-prone task. This paper makes a first step in understanding a fundamental question of configuration design: "do users really need so many knobs?" To provide the quantitatively answer, we study the configuration settings of real-world users, including thousands of customers of a commercial storage system (Storage-A), and hundreds of users of two widely-used open-source system software projects. Our study reveals a series of interesting findings to motivate software architects and developers to be more cautious and disciplined in configuration design. Motivated by these findings, we provide a few concrete, practical guidelines which can significantly reduce the configuration space. Take Storage-A as an example, the guidelines can remove 51.9% of its parameters and simplify 19.7% of the remaining ones with little impact on existing users. Also, we study the existing configuration navigation methods in the context of "too many knobs" to understand their effectiveness in dealing with the over-designed configuration, and to provide practices for building navigation support in system software.
Publisher's Version Article Search Video Info


Crowd Debugging
Fuxiang Chen and Sunghun Kim
(Hong Kong University of Science and Technology, China)
Research shows that, in general, many people turn to QA sites to solicit answers to their problems. We observe in Stack Overflow a huge number of recurring questions, 1,632,590, despite mechanisms having been put into place to prevent these recurring questions. Recurring questions imply developers are facing similar issues in their source code. However, limitations exist in the QA sites. Developers need to visit them frequently and/or should be familiar with all the content to take advantage of the crowd's knowledge. Due to the large and rapid growth of QA data, it is difficult, if not impossible for developers to catch up. To address these limitations, we propose mining the QA site, Stack Overflow, to leverage the huge mass of crowd knowledge to help developers debug their code. Our approach reveals 189 warnings and 171 (90.5%) of them are confirmed by developers from eight high-quality and well-maintained projects. Developers appreciate these findings because the crowd provides solutions and comprehensive explanations to the issues. We compared the confirmed bugs with three popular static analysis tools (FindBugs, JLint and PMD). Of the 171 bugs identified by our approach, only FindBugs detected six of them whereas JLint and PMD detected none.
Publisher's Version Article Search
On the Use of Delta Debugging to Reduce Recordings and Facilitate Debugging of Web Applications
Mouna Hammoudi, Brian Burg, Gigon Bae, and Gregg Rothermel
(University of Nebraska-Lincoln, USA; University of Washington, USA)
Recording the sequence of events that lead to a failure of a web application can be an effective aid for debugging. Nevertheless, a recording of an event sequence may include many events that are not related to a failure, and this may render debugging more difficult. To address this problem, we have adapted Delta Debugging to function on recordings of web applications, in a manner that lets it identify and discard portions of those recordings that do not influence the occurrence of a failure. We present the results of three empirical studies that show that (1) recording reduction can achieve significant reductions in recording size and replay time on actual web applications obtained from developer forums, (2) reduced recordings do in fact help programmers locate faults significantly more efficiently as, and no less effectively than non-reduced recordings, and (3) recording reduction produces even greater reductions on larger, more complex applications.
Publisher's Version Article Search Info
MemInsight: Platform-Independent Memory Debugging for JavaScript
Simon Holm Jensen, Manu Sridharan, Koushik Sen, and Satish Chandra
(Snowflake Computing, USA; Samsung Research, USA; University of California at Berkeley, USA)
JavaScript programs often suffer from memory issues that can either hurt performance or eventually cause memory exhaustion. While existing snapshot-based profiling tools can be helpful, the information provided is limited to the coarse granularity at which snapshots can be taken. We present MemInsight, a tool that provides detailed, time-varying analysis of the memory behavior of JavaScript applications, including web applications. MemInsight is platform independent and runs on unmodified JavaScript engines. It employs tuned source-code instrumentation to generate a trace of memory allocations and accesses, and it leverages modern browser features to track precise information for DOM (document object model) objects. It also computes exact object lifetimes without any garbage collector assistance, and exposes this information in an easily-consumable manner for further analysis. We describe several client analyses built into MemInsight, including detection of possible memory leaks and opportunities for stack allocation and object inlining. An experimental evaluation showed that with no modifications to the runtime, MemInsight was able to expose memory issues in several real-world applications.
Publisher's Version Article Search

Web Applications

JITProf: Pinpointing JIT-Unfriendly JavaScript Code
Liang Gong, Michael Pradel, and Koushik Sen
(University of California at Berkeley, USA; TU Darmstadt, Germany)
Most modern JavaScript engines use just-in-time (JIT) compilation to translate parts of JavaScript code into efficient machine code at runtime. Despite the overall success of JIT compilers, programmers may still write code that uses the dynamic features of JavaScript in a way that prohibits profitable optimizations. Unfortunately, there currently is no way to measure how prevalent such JIT-unfriendly code is and to help developers detect such code locations. This paper presents JITProf, a profiling framework to dynamically identify code locations that prohibit profitable JIT optimizations. The key idea is to associate meta-information with JavaScript objects and code locations, to update this information whenever particular runtime events occur, and to use the meta-information to identify JIT-unfriendly operations. We use JITProf to analyze widely used JavaScript web applications and show that JIT-unfriendly code is prevalent in practice. Furthermore, we show how to use the approach as a profiling technique that finds optimization opportunities in a program. Applying the profiler to popular benchmark programs shows that refactoring these programs to avoid performance problems identified by JITProf leads to statistically significant performance improvements of up to 26.3% in 15 benchmarks.
Publisher's Version Article Search Info
Cross-Language Program Slicing for Dynamic Web Applications
Hung Viet Nguyen, Christian Kästner, and Tien N. Nguyen
(Iowa State University, USA; Carnegie Mellon University, USA)
During software maintenance, program slicing is a useful technique to assist developers in understanding the impact of their changes. While different program-slicing techniques have been proposed for traditional software systems, program slicing for dynamic web applications is challenging since the client-side code is generated from the server-side code and data entities are referenced across different languages and are often embedded in string literals in the server-side program. To address those challenges, we introduce WebSlice, an approach to compute program slices across different languages for web applications. We first identify data-flow dependencies among data entities for PHP code based on symbolic execution. We also compute SQL queries and a conditional DOM that represents client-code variations and construct the data flows for embedded languages: SQL, HTML, and JavaScript. Next, we connect the data flows across different languages and across PHP pages. Finally, we compute a program slice for a given entity based on the established data flows. Running WebSlice on five real-world, open-source PHP systems, we found that, out of 40,670 program slices, 10% cross languages, 38% cross files, and 13% cross string fragments, demonstrating the potential benefit of tool support for cross-language program slicing in dynamic web applications.
Publisher's Version Article Search
Detecting JavaScript Races That Matter
Erdal Mutlu, Serdar Tasiran, and Benjamin Livshits
(Koç University, Turkey; Microsoft Research, USA)
As JavaScript has become virtually omnipresent as the language for programming large and complex web applications in the last several years, we have seen an increase in interest in finding data races in client-side JavaScript. While JavaScript execution is single-threaded, there is still enough potential for data races, created largely by the non-determinism of the scheduler. Recently, several academic efforts have explored both static and run-time analysis approaches in an effort to find data races. However, despite this, we have not seen these analysis techniques deployed in practice and we have only seen scarce evidence that developers find and fix bugs related to data races in JavaScript. In this paper we argue for a different formulation of what it means to have a data race in a JavaScript application and distinguish between benign and harmful races, affecting persistent browser or server state. We further argue that while benign races — the subject of the majority of prior work — do exist, harmful races are exceedingly rare in practice (19 harmful vs. 621 benign). Our results shed a new light on the issues of data race prevalence and importance. To find races, we also propose a novel lightweight run-time symbolic exploration algorithm for finding races in traces of run-time execution. Our algorithm eschews schedule exploration in favor of smaller run-time overheads and thus can be used by beta testers or in crowd-sourced testing. In our experiments on 26 sites, we demonstrate that benign races are considerably more common than harmful ones.
Publisher's Version Article Search Info

Studies of Software Engineering Research and Practice

The Making of Cloud Applications: An Empirical Study on Software Development for the Cloud
Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall
(University of Zurich, Switzerland)
Cloud computing is gaining more and more traction as a deployment and provisioning model for software. While a large body of research already covers how to optimally operate a cloud system, we still lack insights into how professional software engineers actually use clouds, and how the cloud impacts development practices. This paper reports on the first systematic study on how software developers build applications for the cloud. We conducted a mixed-method study, consisting of qualitative interviews of 25 professional developers and a quantitative survey with 294 responses. Our results show that adopting the cloud has a profound impact throughout the software development process, as well as on how developers utilize tools and data in their daily work. Among other things, we found that (1) developers need better means to anticipate runtime problems and rigorously define metrics for improved fault localization and (2) the cloud offers an abundance of operational data, however, developers still often rely on their experience and intuition rather than utilizing metrics. From our findings, we extracted a set of guidelines for cloud development and identified challenges for researchers and tool vendors.
Publisher's Version Article Search
An Empirical Study of Goto in C Code from GitHub Repositories
Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Éric Tanter, Shane McIntosh, Audris Mockus, and Ahmed E. Hassan
(Rochester Institute of Technology, USA; University of Chile, Chile; Kyushu University, Japan; McGill University, Canada; University of Tennessee, USA; Queen's University, Canada)
It is nearly 50 years since Dijkstra argued that goto obscures the flow of control in program execution and urged programmers to abandon the goto statement. While past research has shown that goto is still in use, little is known about whether goto is used in the unrestricted manner that Dijkstra feared, and if it is ‘harmful’ enough to be a part of a post-release bug. We, therefore, conduct a two part empirical study - (1) qualitatively analyze a statistically rep- resentative sample of 384 files from a population of almost 250K C programming language files collected from over 11K GitHub repositories and find that developers use goto in C files for error handling (80.21±5%) and cleaning up resources at the end of a procedure (40.36 ± 5%); and (2) quantitatively analyze the commit history from the release branches of six OSS projects and find that no goto statement was re- moved/modified in the post-release phase of four of the six projects. We conclude that developers limit themselves to using goto appropriately in most cases, and not in an unrestricted manner like Dijkstra feared, thus suggesting that goto does not appear to be harmful in practice.
Publisher's Version Article Search
How Practitioners Perceive the Relevance of Software Engineering Research
David Lo, Nachiappan Nagappan, and Thomas Zimmermann
(Singapore Management University, Singapore; Microsoft Research, USA)
The number of software engineering research papers over the last few years has grown significantly. An important question here is: how relevant is software engineering research to practitioners in the field? To address this question, we conducted a survey at Microsoft where we invited 3,000 industry practitioners to rate the relevance of research ideas contained in 571 ICSE, ESEC/FSE and FSE papers that were published over a five year period. We received 17,913 ratings by 512 practitioners who labelled ideas as essential, worthwhile, unimportant, or unwise. The results from the survey suggest that practitioners are positive towards studies done by the software engineering research community: 71% of all ratings were essential or worthwhile. We found no correlation between the citation counts and the relevance scores of the papers. Through a qualitative analysis of free text responses, we identify several reasons why practitioners considered certain research ideas to be unwise. The survey approach described in this paper is lightweight: on average, a participant spent only 22.5 minutes to respond to the survey. At the same time, the results can provide useful insight to conference organizers, authors, and participating practitioners.
Publisher's Version Article Search


What Change History Tells Us about Thread Synchronization
Rui Gu, Guoliang Jin, Linhai Song, Linjie Zhu, and Shan Lu
(Columbia University, USA; North Carolina State University, USA; University of Wisconsin-Madison, USA; University of Chicago, USA)
Multi-threaded programs are pervasive, yet difficult to write. Missing proper synchronization leads to correctness bugs and over synchronization leads to performance problems. To improve the correctness and efficiency of multi-threaded software, we need a better understanding of synchronization challenges faced by real-world developers. This paper studies the code repositories of open-source multi-threaded software projects to obtain a broad and in- depth view of how developers handle synchronizations. We first examine how critical sections are changed when software evolves by checking over 250,000 revisions of four representative open-source software projects. The findings help us answer questions like how often synchronization is an afterthought for developers; whether it is difficult for devel- opers to decide critical section boundaries and lock variables; and what are real-world over-synchronization problems. We then conduct case studies to better understand (1) how critical sections are changed to solve performance prob- lems (i.e. over-synchronization issues) and (2) how soft- ware changes lead to synchronization-related correctness problems (i.e. concurrency bugs). This in-depth study shows that tool support is needed to help developers tackle over-synchronization problems; it also shows that concur- rency bug avoidance, detection, and testing can be improved through better awareness of code revision history.
Publisher's Version Article Search
Finding Schedule-Sensitive Branches
Jeff Huang and Lawrence Rauchwerger
(Texas A&M University, USA)
This paper presents an automated, precise technique, TAME, for identifying schedule-sensitive branches (SSBs) in concurrent programs, i.e., branches whose decision may vary depending on the actual scheduling of concurrent threads. The technique consists of 1) tracing events at fine-grained level; 2) deriving the constraints for each branch; and 3) invoking an SMT solver to find possible SSB, by trying to solve the negated branch condition. To handle the infeasibly huge number of computations that would be generated by the fine-grained tracing, TAME leverages concolic execution and implements several sound approximations to delimit the number of traces to analyse, yet without sacrificing precision. In addition, TAME implements a novel distributed trace partition approach distributing the analysis into smaller chunks. Evaluation on both popular benchmarks and real applications shows that TAME is effective in finding SSBs and has good scalability. TAME found a total of 34 SSBs, among which 17 are related to concurrency errors, and 9 are ad hoc synchronizations.
Publisher's Version Article Search
Effective and Precise Dynamic Detection of Hidden Races for Java Programs
Yan Cai and Lingwei Cao
(Institute of Software at Chinese Academy of Sciences, China)
Happens-before relation is widely used to detect data races dynami-cally. However, it could easily hide many data races as it is inter-leaving sensitive. Existing techniques based on randomized sched-uling are ineffective on detecting these hidden races. In this paper, we propose DrFinder, an effective and precise dynamic technique to detect hidden races. Given an execution, DrFinder firstly analyz-es the lock acquisitions in it and collects a set of "may-trigger" relations. Each may-trigger relation consists of a method and a type of a Java object. It indicates that, during execution, the method may directly or indirectly acquire a lock of the type. In the subsequent executions of the same program, DrFinder actively schedules the execution according to the set of collected may-trigger relations. It aims to reverse the set of happens-before relation that may exist in the previous executions so as to expose those hidden races. To effectively detect hidden races in each execution, DrFinder also collects a new set of may-trigger relation during its scheduling, which is used in its next scheduling. Our experiment on a suite of real-world Java multithreaded programs shows that DrFinder is effective to detect 89 new data races in 10 runs. Many of these races could not be detected by existing techniques (i.e., FastTrack, ConTest, and PCT) even in 100 runs.
Publisher's Version Article Search

Program Analysis I

A User-Guided Approach to Program Analysis
Ravi Mangal, Xin Zhang, Aditya V. Nori, and Mayur Naik
(Georgia Tech, USA; Microsoft Research, UK)
Program analysis tools often produce undesirable output due to various approximations. We present an approach and a system EUGENE that allows user feedback to guide such approximations towards producing the desired output. We formulate the problem of user-guided program analysis in terms of solving a combination of hard rules and soft rules: hard rules capture soundness while soft rules capture degrees of approximations and preferences of users. Our technique solves the rules using an off-the-shelf solver in a manner that is sound (satisfies all hard rules), optimal (maximally satisfies soft rules), and scales to real-world analyses and programs. We evaluate EUGENE on two different analyses with labeled output on a suite of seven Java programs of size 131–198 KLOC. We also report upon a user study involving nine users who employ EUGENE to guide an information-flow analysis on three Java micro-benchmarks. In our experiments, EUGENE significantly reduces misclassified reports upon providing limited amounts of feedback.
Publisher's Version Article Search
Hidden Truths in Dead Software Paths
Michael Eichberg, Ben Hermann, Mira Mezini, and Leonid Glanz
(TU Darmstadt, Germany)
Approaches and techniques for statically finding a multitude of issues in source code have been developed in the past. A core property of these approaches is that they are usually targeted towards finding only a very specific kind of issue and that the effort to develop such an analysis is significant. This strictly limits the number of kinds of issues that can be detected. In this paper, we discuss a generic approach based on the detection of infeasible paths in code that can discover a wide range of code smells ranging from useless code that hinders comprehension to real bugs. Code issues are identified by calculating the difference between the control-flow graph that contains all technically possible edges and the corresponding graph recorded while performing a more precise analysis using abstract interpretation. We have evaluated the approach using the Java Development Kit as well as the Qualitas Corpus (a curated collection of over 100 Java Applications) and were able to find thousands of issues across a wide range of categories.
Publisher's Version Article Search Info
P3: Partitioned Path Profiling
Mohammed Afraz, Diptikalyan Saha, and Aditya Kanade
(Indian Institute of Science, India; IBM Research, India)
Acyclic path profile is an abstraction of dynamic control flow paths of procedures and has been found to be useful in a wide spectrum of activities. Unfortunately, the runtime overhead of obtaining such a profile can be high, limiting its use in practice. In this paper, we present partitioned path profiling (P3) which runs K copies of the program in parallel, each with the same input but on a separate core, and collects the profile only for a subset of intra-procedural paths in each copy, thereby, distributing the overhead of profiling. P3 identifies “profitable” procedures and assigns disjoint subsets of paths of a profitable procedure to different copies for profiling. To obtain exact execution frequencies of a subset of paths, we design a new algorithm, called PSPP. All paths of an unprofitable procedure are assigned to the same copy. P3 uses the classic Ball-Larus algorithm for profiling unprofitable procedures. Further, P3 attempts to evenly distribute the profiling overhead across the copies. To the best of our knowledge, P3 is the first algorithm for parallel path profiling. We have applied P3 to profile several programs in the SPEC 2006 benchmark. Compared to sequential profiling, P3 substantially reduced the runtime overhead on these programs averaged across all benchmarks. The reduction was 23%, 43% and 56% on average for 2, 4 and 8 cores respectively. P3 also performed better than a coarse-grained approach that treats all procedures as unprofitable and distributes them across available cores. For 2 cores, the profiling overhead of P3 was on average 5% less compared to the coarse-grained approach across these programs. For 4 and 8 cores, it was respectively 18% and 25% less.
Publisher's Version Article Search

Prediction and Recommendation

Heterogeneous Cross-Company Defect Prediction by Unified Metric Representation and CCA-Based Transfer Learning
Xiaoyuan Jing, Fei Wu, Xiwei Dong, Fumin Qi, and Baowen Xu
(Wuhan University, China; Nanjing University of Posts and Telecommunications, China; Nanjing University, China)
Cross-company defect prediction (CCDP) learns a prediction model by using training data from one or multiple projects of a source company and then applies the model to the target company data. Existing CCDP methods are based on the assumption that the data of source and target companies should have the same software metrics. However, for CCDP, the source and target company data is usually heterogeneous, namely the metrics used and the size of metric set are different in the data of two companies. We call CCDP in this scenario as heterogeneous CCDP (HCCDP) task. In this paper, we aim to provide an effective solution for HCCDP. We propose a unified metric representation (UMR) for the data of source and target companies. The UMR consists of three types of metrics, i.e., the common metrics of the source and target companies, source-company specific metrics and target-company specific metrics. To construct UMR for source company data, the target-company specific metrics are set as zeros, while for UMR of the target company data, the source-company specific metrics are set as zeros. Based on the unified metric representation, we for the first time introduce canonical correlation analysis (CCA), an effective transfer learning method, into CCDP to make the data distributions of source and target companies similar. Experiments on 14 public heterogeneous datasets from four companies indicate that: 1) for HCCDP with partially different metrics, our approach significantly outperforms state-of-the-art CCDP methods; 2) for HCCDP with totally different metrics, our approach obtains comparable prediction performances in contrast with within-project prediction results. The proposed approach is effective for HCCDP.
Publisher's Version Article Search
Heterogeneous Defect Prediction
Jaechang Nam and Sunghun Kim
(Hong Kong University of Science and Technology, China)
Software defect prediction is one of the most active research areas in software engineering. We can build a prediction model with defect data collected from a software project and predict defects in the same project, i.e. within-project defect prediction (WPDP). Researchers also proposed cross-project defect prediction (CPDP) to predict defects for new projects lacking in defect data by using prediction models built by other projects. In recent studies, CPDP is proved to be feasible. However, CPDP requires projects that have the same metric set, meaning the metric sets should be identical between projects. As a result, current techniques for CPDP are difficult to apply across projects with heterogeneous metric sets. To address the limitation, we propose heterogeneous defect prediction (HDP) to predict defects across projects with heterogeneous metric sets. Our HDP approach conducts metric selection and metric matching to build a prediction model between projects with heterogeneous metric sets. Our empirical study on 28 subjects shows that about 68% of predictions using our approach outperform or are comparable to WPDP with statistical significance.
Publisher's Version Article Search
Clone-Based and Interactive Recommendation for Modifying Pasted Code
Yun Lin, Xin Peng, Zhenchang Xing, Diwen Zheng, and Wenyun Zhao
(Fudan University, China; Nanyang Technological University, Singapore)
Developers often need to modify pasted code when programming with copy-and-paste practice. Some modifications on pasted code could involve lots of editing efforts, and any missing or wrong edit could incur bugs. In this paper, we propose a clone-based and interactive approach to recommending where and how to modify the pasted code. In our approach, we regard clones of the pasted code as the results of historical copy-and-paste operations and their differences as historical modifications on the same piece of code. Our approach first retrieves clones of the pasted code from a clone repository and detects syntactically complete differences among them. Then our approach transfers each clone difference into a modification slot on the pasted code, suggests options for each slot, and further mines modifying regulations from the clone differences. Based on the mined modifying regulations, our approach dynamically updates the suggested options and their ranking in each slot according to developer's modifications on the pasted code. We implement a proof-of-concept tool CCDemon based on our approach and evaluate its effectiveness based on code clones detected from five open source projects. The results show that our approach can identify 96.9% of the to-be-modified positions in pasted code and suggest 75.0% of the required modifications. Our human study further confirms that CCDemon can help developers to accomplish their modifications of pasted code more efficiently.
Publisher's Version Article Search

Program Repair

Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair
Edward K. Smith, Earl T. Barr, Claire Le Goues, and Yuriy Brun
(University of Massachusetts at Amherst, USA; University College London, UK; Carnegie Mellon University, USA; University of Massachusetts, USA)
Automated program repair has shown promise for reducing the significant manual effort debugging requires. This paper addresses a deficit of earlier evaluations of automated repair techniques caused by repairing programs and evaluating generated patches' correctness using the same set of tests. Since tests are an imperfect metric of program correctness, evaluations of this type do not discriminate between correct patches and patches that overfit the available tests and break untested but desired functionality. This paper evaluates two well-studied repair tools, GenProg and TrpAutoRepair, on a publicly available benchmark of bugs, each with a human-written patch. By evaluating patches using tests independent from those used during repair, we find that the tools are unlikely to improve the proportion of independent tests passed, and that the quality of the patches is proportional to the coverage of the test suite used during repair. For programs that pass most tests, the tools are as likely to break tests as to fix them. However, novice developers also overfit, and automated repair performs no worse than these developers. In addition to overfitting, we measure the effects of test suite coverage, test suite provenance, and starting program quality, as well as the difference in quality between novice-developer-written and tool-generated patches when quality is assessed with a test suite independent from the one used for patch generation.
Publisher's Version Article Search
Responsive Designs in a Snap
Nishant Sinha and Rezwana Karim
(IBM Research, India; Rutgers University, USA)
With the massive adoption of mobile devices with different form- factors, UI designers face the challenge of designing responsive UIs which are visually appealing across a wide range of devices. De- signing responsive UIs requires a deep knowledge of HTML/CSS as well as responsive patterns - juggling through various design configurations and re-designing for multiple devices is laborious and time-consuming. We present DECOR, a recommendation tool for creating multi-device responsive UIs. Given an initial UI de- sign, user-specified design constraints and a list of devices, DECOR provides ranked, device-specific recommendations to the designer for approval. Design space exploration involves a combinatorial explosion: we formulate it as a design repair problem and devise several design space pruning techniques to enable efficient repair. An evaluation over real-life designs shows that DECOR is able to compute the desired recommendations, involving a variety of responsive design patterns, in less than a minute.
Publisher's Version Article Search
CLOTHO: Saving Programs from Malformed Strings and Incorrect String-Handling
Aritra Dhar, Rahul Purandare, Mohan Dhawan, and Suresh Rangaswamy
(Xerox Research Center, India; IIIT Delhi, India; IBM Research, India)
Software is susceptible to malformed data originating from untrusted sources. Occasionally the programming logic or constructs used are inappropriate to handle the varied constraints imposed by legal and well-formed data. Consequently, softwares may produce unexpected results or even crash. In this paper, we present CLOTHO, a novel hybrid approach that saves such softwares from crashing when failures originate from malformed strings or inappropriate handling of strings. CLOTHO statically analyses a program to identify statements that are vulnerable to failures related to associated string data. CLOTHO then generates patches that are likely to satisfy constraints on the data, and in case of failures produces program behavior which would be close to the expected. The precision of the patches is improved with the help of a dynamic analysis. We have implemented CLOTHO for the JAVA String API, and our evaluation based on several popular open-source libraries shows that CLOTHO generates patches that are semantically similar to the patches generated by the programmers in the later versions. Additionally, these patches are activated only when a failure is detected, and thus CLOTHO incurs no runtime overhead during normal execution, and negligible overhead in case of failures.
Publisher's Version Article Search Info

Information Retrieval

Query-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks
Laura Moreno, Gabriele Bavota, Sonia Haiduc, Massimiliano Di Penta, Rocco Oliveto, Barbara Russo, and Andrian Marcus
(University of Texas at Dallas, USA; Free University of Bolzano, Italy; Florida State University, USA; University of Sannio, Italy; University of Molise, Italy)
Text Retrieval (TR) approaches have been used to leverage the textual information contained in software artifacts to address a multitude of software engineering (SE) tasks. However, TR approaches need to be configured properly in order to lead to good results. Current approaches for automatic TR configuration in SE configure a single TR approach and then use it for all possible queries. In this paper, we show that such a configuration strategy leads to suboptimal results, and propose QUEST, the first approach bringing TR configuration selection to the query level. QUEST recommends the best TR configuration for a given query, based on a supervised learning approach that determines the TR configuration that performs the best for each query according to its properties. We evaluated QUEST in the context of feature and bug localization, using a data set with more than 1,000 queries. We found that QUEST is able to recommend one of the top three TR configurations for a query with a 69% accuracy, on average. We compared the results obtained with the configurations recommended by QUEST for every query with those obtained using a single TR configuration for all queries in a system and in the entire data set. We found that using QUEST we obtain better results than with any of the considered TR configurations.
Publisher's Version Article Search Info
Information Retrieval and Spectrum Based Bug Localization: Better Together
Tien-Duy B. Le, Richard J. Oentaryo, and David Lo
(Singapore Management University, Singapore)
Debugging often takes much effort and resources. To help developers debug, numerous information retrieval (IR)-based and spectrum-based bug localization techniques have been proposed. IR-based techniques process textual information in bug reports, while spectrum-based techniques process program spectra (i.e., a record of which program elements are executed for each test case). Both eventually generate a ranked list of program elements that are likely to contain the bug. However, these techniques only consider one source of information, either bug reports or program spectra, which is not optimal. To deal with the limitation of existing techniques, in this work, we propose a new multi-modal technique that considers both bug reports and program spectra to localize bugs. Our approach adaptively creates a bug-specific model to map a particular bug to its possible location, and introduces a novel idea of suspicious words that are highly associated to a bug. We evaluate our approach on 157 real bugs from four software systems, and compare it with a state-of-the-art IR-based bug localization method, a state-of-the-art spectrum-based bug localization method, and three state-of-the-art multi-modal feature location methods that are adapted for bug localization. Experiments show that our approach can outperform the baselines by at least 47.62%, 31.48%, 27.78%, and 28.80% in terms of number of bugs successfully localized when a developer inspects 1, 5, and 10 program elements (i.e., Top 1, Top 5, and Top 10), and Mean Average Precision (MAP) respectively.
Publisher's Version Article Search
Rule-Based Extraction of Goal-Use Case Models from Text
Tuong Huan Nguyen, John Grundy, and Mohamed Almorsy
(Swinburne University of Technology, Australia)
Goal and use case modeling has been recognized as a key approach for understanding and analyzing requirements. However, in practice, goals and use cases are often buried among other content in requirements specifications documents and written in unstructured styles. It is thus a time-consuming and error-prone process to identify such goals and use cases. In addition, having them embedded in natural language documents greatly limits the possibility of formally analyzing the requirements for problems. To address these issues, we have developed a novel rule-based approach to automatically extract goal and use case models from natural language requirements documents. Our approach is able to automatically categorize goals and ensure they are properly specified. We also provide automated semantic parameterization of artifact textual specifications to promote further analysis on the extracted goal-use case models. Our approach achieves 85% precision and 82% recall rates on average for model extraction and 88% accuracy for the automated parameterization.
Publisher's Version Article Search Info

Program Analysis II

Symbolic Execution of Programs with Heap Inputs
Pietro Braione, Giovanni Denaro, and Mauro Pezzè
(University of Milano-Bicocca, Italy; University of Lugano, Switzerland)
Symbolic analysis is a core component of many automatic test generation and program verication approaches. To verify complex software systems, test and analysis techniques shall deal with the many aspects of the target systems at different granularity levels. In particular, testing software programs that make extensive use of heap data structures at unit and integration levels requires generating suitable input data structures in the heap. This is a main challenge for symbolic testing and analysis techniques that work well when dealing with numeric inputs, but do not satisfactorily cope with heap data structures yet. In this paper we propose a language HEX to specify invariants of partially initialized data structures, and a decision procedure that supports the incremental evaluation of structural properties in HEX. Used in combination with the symbolic execution of heap manipulating programs, HEX prevents the exploration of invalid states, thus improving the eefficiency of program testing and analysis, and avoiding false alarms that negatively impact on verication activities. The experimental data conrm that HEX is an effective and efficient solution to the problem of testing and analyzing heap manipulating programs, and outperforms the alternative approaches that have been proposed so far.
Publisher's Version Article Search
Automatically Deriving Pointer Reference Expressions from Binary Code for Memory Dump Analysis
Yangchun Fu, Zhiqiang Lin, and David Brumley
(University of Texas at Dallas, USA; Carnegie Mellon University, USA)
Given a crash dump or a kernel memory snapshot, it is often desirable to have a capability that can traverse its pointers to locate the root cause of the crash, or check their integrity to detect the control flow hijacks. To achieve this, one key challenge lies in how to locate where the pointers are. While locating a pointer usually requires the data structure knowledge of the corresponding program, an important advance made by this work is that we show a technique of extracting address-independent data reference expressions for pointers through dynamic binary analysis. This novel pointer reference expression encodes how a pointer is accessed through the combination of a base address (usually a global variable) with certain offset and further pointer dereferences. We have applied our techniques to OS kernels, and our experimental results with a number of real world kernel malware show that we can correctly identify the hijacked kernel function pointers by locating them using the extracted pointer reference expressions when only given a memory snapshot.
Publisher's Version Article Search

Measurement and Metric

Summarizing and Measuring Development Activity
Christoph Treude, Fernando Figueira Filho, and Uirá Kulesza
(Federal University of Rio Grande do Norte, Brazil)
Software developers pursue a wide range of activities as part of their work, and making sense of what they did in a given time frame is far from trivial as evidenced by the large number of awareness and coordination tools that have been developed in recent years. To inform tool design for making sense of the information available about a developer's activity, we conducted an empirical study with 156 GitHub users to investigate what information they would expect in a summary of development activity, how they would measure development activity, and what factors influence how such activity can be condensed into textual summaries or numbers. We found that unexpected events are as important as expected events in summaries of what a developer did, and that many developers do not believe in measuring development activity. Among the factors that influence summarization and measurement of development activity, we identified development experience and programming languages.
Publisher's Version Article Search Info
A Method to Identify and Correct Problematic Software Activity Data: Exploiting Capacity Constraints and Data Redundancies
Qimu Zheng, Audris Mockus, and Minghui Zhou
(Peking University, China; University of Tennessee, USA)
Mining software repositories to understand and improve software development is a common approach in research and practice. The operational data obtained from these repositories often do not faithfully represent the intended aspects of software development and, therefore, may jeopardize the conclusions derived from it. We propose an approach to identify problematic values based on the constraints of software development and to correct such values using data redundancies. We investigate the approach using issue and commit data of Mozilla project. In particular, we identified problematic data in four types of events and found the fraction of problematic values to exceed 10% and rapidly rising. We found the corrected values to be 50% closer to the most accurate estimate of task completion time. Finally, we found that the models of time until fix changed substantially when data were corrected, with the corrected data providing a 20% better fit. We discuss how the approach may be generalized to other types of operational data to increase fidelity of software measurement in practice and in research.
Publisher's Version Article Search

Patterns and Coding Convention

Gamification for Enforcing Coding Conventions
Christian R. Prause and Matthias Jarke
(DLR, Germany; RWTH Aachen University, Germany)
Software is a knowledge intensive product, which can only evolve if there is effective and efficient information exchange between developers. Complying to coding conventions improves information exchange by improving the readability of source code. However, without some form of enforcement, compliance to coding conventions is limited. We look at the problem of information exchange in code and propose gamification as a way to motivate developers to invest in compliance. Our concept consists of a technical prototype and its integration into a Scrum environment. By means of two experiments with agile software teams and subsequent surveys, we show that gamification can effectively improve adherence to coding conventions.
Publisher's Version Article Search

Mobile Applications

String Analysis for Java and Android Applications
Ding Li, Yingjun Lyu, Mian Wan, and William G. J. Halfond
(University of Southern California, USA)
String analysis is critical for many verification techniques. However, accurately modeling string variables is a challeng- ing problem. Current approaches are generally customized for certain problem domains or have critical limitations in handling loops, providing context-sensitive inter-procedural analysis, and performing efficient analysis on complicated apps. To address these limitations, we propose a general framework, Violist, for string analysis that allows researchers to more flexibly choose how they will address each of these challenges by separating the representation and interpreta- tion of string operations. In our evaluation, we show that our approach can achieve high accuracy on both Java and Android apps in a reasonable amount of time. We also com- pared our approach with a popular and widely used string analyzer and found that our approach has higher precision and shorter execution time while maintaining the same level of recall.
Publisher's Version Article Search
Auto-completing Bug Reports for Android Applications
Kevin Moran, Mario Linares-Vásquez, Carlos Bernal-Cárdenas, and Denys Poshyvanyk
(College of William and Mary, USA)
The modern software development landscape has seen a shift in focus toward mobile applications as tablets and smartphones near ubiquitous adoption. Due to this trend, the complexity of these “apps” has been increasing, making development and maintenance challenging. Additionally, current bug tracking systems are not able to effectively support construction of reports with actionable information that directly lead to a bug’s resolution. To address the need for an improved reporting system, we introduce a novel solution, called FUSION, that helps users auto-complete reproduction steps in bug reports for mobile apps. FUSION links user-provided information to program artifacts extracted through static and dynamic analysis performed before testing or release. The approach that FUSION employs is generalizable to other current mobile software platforms, and constitutes a new method by which off-device bug reporting can be conducted for mobile software projects. In a study involving 28 participants we applied FUSION to support the maintenance tasks of reporting and reproducing defects from 15 real-world bugs found in 14 open source Android apps while qualitatively and qualitatively measuring the user experience of the system. Our results demonstrate that FUSION both effectively facilitates reporting and allows for more reliable reproduction of bugs from reports compared to traditional issue tracking systems by presenting more detailed contextual app information.
Publisher's Version Article Search Video Info
CLAPP: Characterizing Loops in Android Applications
Yanick Fratantonio, Aravind Machiry, Antonio Bianchi, Christopher Kruegel, and Giovanni Vigna
(University of California at Santa Barbara, USA)
When performing program analysis, loops are one of the most important aspects that needs to be taken into account. In the past, many approaches have been proposed to analyze loops to perform different tasks, ranging from compiler optimizations to Worst-Case Execution Time (WCET) analysis. While these approaches are powerful, they focus on tackling very specific categories of loops and known loop patterns, such as the ones for which the number of iterations can be statically determined. In this work, we developed a static analysis framework to characterize and analyze generic loops, without relying on techniques based on pattern matching. For this work, we focus on the Android platform, and we implemented a prototype, called CLAPP, that we used to perform the first large-scale empirical study of the usage of loops in Android applications. In particular, we used our tool to analyze a total of 4,110,510 loops found in 11,823 Android applications. As part of our evaluation, we provide the detailed results of our empirical study, we show how our analysis was able to determine that the execution of 63.28% of the loops is bounded, and we discuss several interesting insights related to the performance issues and security aspects associated with loops.
Publisher's Version Article Search Info

Search, Synthesis, and Verification

TLV: Abstraction through Testing, Learning, and Validation
Jun Sun, Hao Xiao, Yang Liu, Shang-Wei Lin, and Shengchao Qin
(Singapore University of Technology and Design, Singapore; Nanyang Technological University, Singapore; Teesside University, UK; Shenzhen University, China)
A (Java) class provides a service to its clients (i.e., programs which use the class). The service must satisfy certain specifications. Different specifications might be expected at different levels of abstraction depending on the client's objective. In order to effectively contrast the class against its specifications, whether manually or automatically, one essential step is to automatically construct an abstraction of the given class at a proper level of abstraction. The abstraction should be correct (i.e., over-approximating) and accurate (i.e., with few spurious traces). We present an automatic approach, which combines testing, learning, and validation, to constructing an abstraction. Our approach is designed such that a large part of the abstraction is generated based on testing and learning so as to minimize the use of heavy-weight techniques like symbolic execution. The abstraction is generated through a process of abstraction/refinement, with no user input, and converges to a specific level of abstraction depending on the usage context. The generated abstraction is guaranteed to be correct and accurate. We have implemented the proposed approach in a toolkit named TLV and evaluated TLV with a number of benchmark programs as well as three real-world ones. The results show that TLV generates abstraction for program analysis and verification more efficiently.
Publisher's Version Article Search Info
Mimic: Computing Models for Opaque Code
Stefan Heule, Manu Sridharan, and Satish Chandra
(Stanford University, USA; Samsung Research, USA)
Opaque code, which is executable but whose source is unavailable or hard to process, can be problematic in a number of scenarios, such as program analysis. Manual construction of models is often used to handle opaque code, but this process is tedious and error-prone. (In this paper, we use model to mean a representation of a piece of code suitable for program analysis.) We present a novel technique for automatic generation of models for opaque code, based on program synthesis. The technique intercepts memory accesses from the opaque code to client objects, and uses this information to construct partial execution traces. Then, it performs a heuristic search inspired by Markov Chain Monte Carlo techniques to discover an executable code model whose behavior matches the opaque code. Native execution, parallelization, and a carefully-designed fitness function are leveraged to increase the effectiveness of the search. We have implemented our technique in a tool Mimic for discovering models of opaque JavaScript functions, and used Mimic to synthesize correct models for a variety of array-manipulating routines.
Publisher's Version Article Search Info Artifacts Available
Witness Validation and Stepwise Testification across Software Verifiers
Dirk Beyer, Matthias Dangl, Daniel Dietsch, Matthias Heizmann, and Andreas Stahlbauer
(University of Passau, Germany; University of Freiburg, Germany)
It is commonly understood that a verification tool should provide a counterexample to witness a specification violation. Until recently, software verifiers dumped error witnesses in proprietary formats, which are often neither human- nor machine-readable, and an exchange of witnesses between different verifiers was impossible. To close this gap in software-verification technology, we have defined an exchange format for error witnesses that is easy to write and read by verification tools (for further processing, e.g., witness validation) and that is easy to convert into visualizations that conveniently let developers inspect an error path. To eliminate manual inspection of false alarms, we develop the notion of stepwise testification: in a first step, a verifier finds a problematic program path and, in addition to the verification result FALSE, constructs a witness for this path; in the next step, another verifier re-verifies that the witness indeed violates the specification. This process can have more than two steps, each reducing the state space around the error path, making it easier to validate the witness in a later step. An obvious application for testification is the setting where we have two verifiers: one that is efficient but imprecise and another one that is precise but expensive. We have implemented the technique of error-witness-driven program analysis in two state-of-the-art verification tools, CPAchecker and Ultimate Automizer, and show by experimental evaluation that the approach is applicable to a large set of verification tasks.
Publisher's Version Article Search Info Artifacts Available

Java and Object-Oriented Programming

Efficient and Reasonable Object-Oriented Concurrency
Scott West, Sebastian Nanz, and Bertrand Meyer
(Google, Switzerland; ETH Zurich, Switzerland)
Making threaded programs safe and easy to reason about is one of the chief difficulties in modern programming. This work provides an efficient execution model for SCOOP, a concurrency approach that provides not only data-race freedom but also pre/postcondition reasoning guarantees between threads. The extensions we propose influence both the underlying semantics to increase the amount of concurrent execution that is possible, exclude certain classes of deadlocks, and enable greater performance. These extensions are used as the basis of an efficient runtime and optimization pass that improve performance 15x over a baseline implementation. This new implementation of SCOOP is, on average, also 2x faster than other well-known safe concurrent languages. The measurements are based on both coordination-intensive and data-manipulation-intensive benchmarks designed to offer a mixture of workloads.
Publisher's Version Article Search
FlexJava: Language Support for Safe and Modular Approximate Programming
Jongse Park, Hadi Esmaeilzadeh, Xin Zhang, Mayur Naik, and William Harris
(Georgia Tech, USA)
Energy efficiency is a primary constraint in modern systems. Approximate computing is a promising approach that trades quality of result for gains in efficiency and performance. State- of-the-art approximate programming models require extensive manual annotations on program data and operations to guarantee safe execution of approximate programs. The need for extensive manual annotations hinders the practical use of approximation techniques. This paper describes FlexJava, a small set of language extensions, that significantly reduces the annotation effort, paving the way for practical approximate programming. These extensions enable programmers to annotate approximation-tolerant method outputs. The FlexJava compiler, which is equipped with an approximation safety analysis, automatically infers the operations and data that affect these outputs and selectively marks them approximable while giving safety guarantees. The automation and the language–compiler codesign relieve programmers from manually and explicitly an- notating data declarations or operations as safe to approximate. FlexJava is designed to support safety, modularity, generality, and scalability in software development. We have implemented FlexJava annotations as a Java library and we demonstrate its practicality using a wide range of Java applications and by con- ducting a user study. Compared to EnerJ, a recent approximate programming system, FlexJava provides the same energy savings with significant reduction (from 2× to 17×) in the number of annotations. In our user study, programmers spend 6× to 12× less time annotating programs using FlexJava than when using EnerJ.
Publisher's Version Article Search
Getting to Know You: Towards a Capability Model for Java
Ben Hermann, Michael Reif, Michael Eichberg, and Mira Mezini
(TU Darmstadt, Germany)
Developing software from reusable libraries lets developers face a security dilemma: Either be efficient and reuse libraries as they are or inspect them, know about their resource usage, but possibly miss deadlines as reviews are a time consuming process. In this paper, we propose a novel capability inference mechanism for libraries written in Java. It uses a coarse-grained capability model for system resources that can be presented to developers. We found that the capability inference agrees by 86.81% on expectations towards capabilities that can be derived from project documentation. Moreover, our approach can find capabilities that cannot be discovered using project documentation. It is thus a helpful tool for developers mitigating the aforementioned dilemma.
Publisher's Version Article Search Info

Testing III

Efficient Dependency Detection for Safe Java Test Acceleration
Jonathan Bell, Gail Kaiser, Eric Melski, and Mohan Dattatreya
(Columbia University, USA; Electric Cloud, USA)
Slow builds remain a plague for software developers. The frequency with which code can be built (compiled, tested and packaged) directly impacts the productivity of developers: longer build times mean a longer wait before determining if a change to the application being built was successful. We have discovered that in the case of some languages, such as Java, the majority of build time is spent running tests, where dependencies between individual tests are complicated to discover, making many existing test acceleration techniques unsound to deploy in practice. Without knowledge of which tests are dependent on others, we cannot safely parallelize the execution of the tests, nor can we perform incremental testing (i.e., execute only a subset of an application's tests for each build). The previous techniques for detecting these dependencies did not scale to large test suites: given a test suite that normally ran in two hours, the best-case running scenario for the previous tool would have taken over 422 CPU days to find dependencies between all test methods (and would not soundly find all dependencies) — on the same project the exhaustive technique (to find all dependencies) would have taken over 1e300 years. We present a novel approach to detecting all dependencies between test cases in large projects that can enable safe exploitation of parallelism and test selection with a modest analysis cost.
Publisher's Version Article Search
Turning Programs against Each Other: High Coverage Fuzz-Testing using Binary-Code Mutation and Dynamic Slicing
Ulf Kargén and Nahid Shahmehri
(Linköping University, Sweden)
Mutation-based fuzzing is a popular and widely employed black-box testing technique for finding security and robustness bugs in software. It owes much of its success to its simplicity; a well-formed seed input is mutated, e.g. through random bit-flipping, to produce test inputs. While reducing the need for human effort, and enabling security testing even of closed-source programs with undocumented input formats, the simplicity of mutation-based fuzzing comes at the cost of poor code coverage. Often millions of iterations are needed, and the results are highly dependent on configuration parameters and the choice of seed inputs. In this paper we propose a novel method for automated generation of high-coverage test cases for robustness testing. Our method is based on the observation that, even for closed-source programs with proprietary input formats, an implementation that can generate well-formed inputs to the program is typically available. By systematically mutating the program code of such generating programs, we leverage information about the input format encoded in the generating program to produce high-coverage test inputs, capable of reaching deep states in the program under test. Our method works entirely at the machine-code level, enabling use-cases similar to traditional black-box fuzzing. We have implemented the method in our tool MutaGen, and evaluated it on 7 popular Linux programs. We found that, for most programs, our method improves code coverage by one order of magnitude or more, compared to two well-known mutation-based fuzzers. We also found a total of 8 unique bugs.
Publisher's Version Article Search
Guided Differential Testing of Certificate Validation in SSL/TLS Implementations
Yuting Chen and Zhendong Su
(Shanghai Jiao Tong University, China; University of California at Davis, USA)
Certificate validation in SSL/TLS implementations is critical for Internet security. There is recent strong effort, namely frankencert, in automatically synthesizing certificates for stress-testing certificate validation. Despite its early promise, it remains a significant challenge to generate effective test certificates as they are structurally complex with intricate syntactic and semantic constraints. This paper tackles this challenge by introducing mucert, a novel, guided technique to much more effectively test real-world certificate validation code. Our core insight is to (1) leverage easily accessible Internet certificates as seed certificates, and (2) diversify them by adapting Markov Chain Monte Carlo (MCMC) sampling. The diversified certificates are then used to reveal discrepancies, thus potential flaws, among different certificate validation implementations. We have implemented mucert and extensively evaluated it against frankencert. Our experimental results show that mucert is significantly more cost-effective than frankencert. Indeed, 1K mucerts (i.e., mucert-mutated certificates) yield three times as many distinct discrepancies as 8M frankencerts (i.e., frankencert-synthesized certificates), and 200 mucerts can achieve higher code coverage than 100,000 frankencerts. This improvement is significant as it incurs much cost to test each generated certificate. We have analyzed and reported 20+ latent discrepancies (presumably missed by frankencert), and reported an additional 357 discrepancy-triggering certificates to SSL/TLS developers, who have already confirmed some of our reported issues and are investigating causes of all the reported discrepancies. In particular, our reports have led to bug fixes, active discussions in the community, and proposed changes to relevant IETF’s RFCs. We believe that mucert is practical and effective for helping improve the robustness of SSL/TLS implementations.
Publisher's Version Article Search Info

Empirical Studies of Software Developers II

Quality and Productivity Outcomes Relating to Continuous Integration in GitHub
Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov
(University of California at Davis, USA; National University of Defense Technology, China)
Software processes comprise many steps; coding is followed by building, integration testing, system testing, deployment, operations, among others. Software process integration and automation have been areas of key concern in software engineering, ever since the pioneering work of Osterweil; market pressures for Agility, and open, decentralized, software development have provided additional pressures for progress in this area. But do these innovations actually help projects? Given the numerous confounding factors that can influence project performance, it can be a challenge to discern the effects of process integration and automation. Software project ecosystems such as GitHub provide a new opportunity in this regard: one can readily find large numbers of projects in various stages of process integration and automation, and gather data on various influencing factors as well as productivity and quality outcomes. In this paper we use large, historical data on process metrics and outcomes in GitHub projects to discern the effects of one specific innovation in process automation: continuous integration. Our main finding is that continuous integration improves the productivity of project teams, who can integrate more outside contributions, without an observable diminishment in code quality.
Publisher's Version Article Search
Developer Onboarding in GitHub: The Role of Prior Social Links and Language Experience
Casey Casalnuovo, Bogdan Vasilescu, Premkumar Devanbu, and Vladimir Filkov
(University of California at Davis, USA)
The team aspects of software engineering have been a subject of great interest since early work by Fred Brooks and others: how well do people work together in teams? why do people join teams? what happens if teams are distributed? Recently, the emergence of project ecosystems such as GitHub have created an entirely new, higher level of organization. GitHub supports numerous teams; they share a common technical platform (for work activities) and a common social platform (via following, commenting, etc). We explore the GitHub evidence for socialization as a precursor to joining a project, and how the technical factors of past experience and social factors of past connections to team members of a project affect productivity both initially and in the long run. We find developers preferentially join projects in GitHub where they have pre-existing relationships; furthermore, we find that the presence of past social connections combined with prior experience in languages dominant in the project leads to higher productivity both initially and cumulatively. Interestingly, we also find that stronger social connections are associated with slightly less productivity initially, but slightly more productivity in the long run.
Publisher's Version Article Search
Impact of Developer Turnover on Quality in Open-Source Software
Matthieu Foucault, Marc Palyart, Xavier Blanc, Gail C. Murphy, and Jean-Rémy Falleri
(University of Bordeaux, France; University of British Columbia, Canada)
Turnover is the phenomenon of continuous influx and retreat of human resources in a team. Despite being well-studied in many settings, turnover has not been characterized for open-source software projects. We study the source code repositories of five open-source projects to characterize patterns of turnover and to determine the effects of turnover on software quality. We define the base concepts of both external and internal turnover, which are the mobility of developers in and out of a project, and the mobility of developers inside a project, respectively. We provide a qualitative analysis of turnover patterns. We also found, in a quantitative analysis, that the activity of external newcomers negatively impact software quality.
Publisher's Version Article Search Info

Symbolic Execution

MultiSE: Multi-path Symbolic Execution using Value Summaries
Koushik Sen, George Necula, Liang Gong, and Wontae Choi
(University of California at Berkeley, USA)
Dynamic symbolic execution (DSE) has been proposed to effectively generate test inputs for real-world programs. Unfortunately, DSE techniques do not scale well for large realistic programs, because often the number of feasible execution paths of a program increases exponentially with the increase in the length of an execution path. In this paper, we propose MultiSE, a new technique for merging states incrementally during symbolic execution, without using auxiliary variables. The key idea of MultiSE is based on an alternative representation of the state, where we map each variable, including the program counter, to a set of guarded symbolic expressions called a value summary. MultiSE has several advantages over conventional DSE and conventional state merging techniques: value summaries enable sharing of symbolic expressions and path constraints along multiple paths and thus avoid redundant execution. MultiSE does not introduce auxiliary symbolic variables, which enables it to 1) make progress even when merging values not supported by the constraint solver, 2) avoid expensive constraint solver calls when resolving function calls and jumps, and 3) carry out most operations concretely. Moreover, MultiSE updates value summaries incrementally at every assignment instruction, which makes it unnecessary to identify the join points and to keep track of variables to merge at join points. We have implemented MultiSE for JavaScript programs in a publicly available open-source tool. Our evaluation of MultiSE on several programs shows that 1) value summaries are an eective technique to take advantage of the sharing of value along multiple execution path, that 2) MultiSE can run significantly faster than traditional dynamic symbolic execution and, 3) MultiSE saves a substantial number of state merges compared to conventional state-merging techniques.
Publisher's Version Article Search
Assertion Guided Symbolic Execution of Multithreaded Programs
Shengjian Guo, Markus Kusano, Chao Wang, Zijiang Yang, and Aarti Gupta
(Virginia Tech, USA; Western Michigan University, USA; Princeton University, USA)
Symbolic execution is a powerful technique for systematic testing of sequential and multithreaded programs. However, its application is limited by the high cost of covering all feasible intra-thread paths and inter-thread interleavings. We propose a new assertion guided pruning framework that identifies executions guaranteed not to lead to an error and removes them during symbolic execution. By summarizing the reasons why previously explored executions cannot reach an error and using the information to prune redundant executions in the future, we can soundly reduce the search space. We also use static concurrent program slicing and heuristic minimization of symbolic constraints to further reduce the computational overhead. We have implemented our method in the Cloud9 symbolic execution tool and evaluated it on a large set of multithreaded C/C++ programs. Our experiments show that the new method can reduce the overall computational cost significantly.
Publisher's Version Article Search
Iterative Distribution-Aware Sampling for Probabilistic Symbolic Execution
Mateus Borges, Antonio Filieri, Marcelo D'Amorim, and Corina S. Păsăreanu
(University of Stuttgart, Germany; Federal University of Pernambuco, Brazil; Carnegie Mellon University, USA; NASA Ames Research Center, USA)
Probabilistic symbolic execution aims at quantifying the probability of reaching program events of interest assuming that program inputs follow given probabilistic distributions. The technique collects constraints on the inputs that lead to the target events and analyzes them to quantify how likely it is for an input to satisfy the constraints. Current techniques either handle only linear constraints or only support continuous distributions using a “discretization” of the input domain, leading to imprecise and costly results. We propose an iterative distribution-aware sampling approach to support probabilistic symbolic execution for arbitrarily complex mathematical constraints and continuous input distributions. We follow a compositional approach, where the symbolic constraints are decomposed into sub-problems whose solution can be solved independently. At each iteration the convergence rate of the com- putation is increased by automatically refocusing the analysis on estimating the sub-problems that mostly affect the accuracy of the results, as guided by three different ranking strategies. Experiments on publicly available benchmarks show that the proposed technique improves on previous approaches in terms of scalability and accuracy of the results.
Publisher's Version Article Search Info

New Ideas

Human Aspects of Software Engineering

Bespoke Tools: Adapted to the Concepts Developers Know
Brittany Johnson, Rahul Pandita, Emerson Murphy-Hill, and Sarah Heckman
(North Carolina State University, USA)
Even though different developers have varying levels of expertise, the tools in one developer's integrated development environment (IDE) behave the same as the tools in every other developers' IDE. In this paper, we propose the idea of automatically customizing development tools by modeling what a developer knows about software concepts. We then sketch three such ``bespoke'' tools and describe how development data can be used to infer what a developer knows about relevant concepts. Finally, we describe our ongoing efforts to make bespoke program analysis tools that customize their notifications to the developer using them.
Publisher's Version Article Search
I Heart Hacker News: Expanding Qualitative Research Findings by Analyzing Social News Websites
Titus Barik, Brittany Johnson, and Emerson Murphy-Hill
(ABB Research, USA; North Carolina State University, USA)
Grounded theory is an important research method in empirical software engineering, but it is also time consuming, tedious, and complex. This makes it difficult for researchers to assess if threats, such as missing themes or sample bias, have inadvertently materialized. To better assess such threats, our new idea is that we can automatically extract knowledge from social news websites, such as Hacker News, to easily replicate existing grounded theory research --- and then compare the results. We conduct a replication study on static analysis tool adoption using Hacker News. We confirm that even a basic replication and analysis using social news websites can offer additional insights to existing themes in studies, while also identifying new themes. For example, we identified that security was not a theme discovered in the original study on tool adoption. As a long-term vision, we consider techniques from the discipline of knowledge discovery to make this replication process more automatic.
Publisher's Version Article Search
GitSonifier: Using Sound to Portray Developer Conflict History
Kevin J. North, Shane Bolan, Anita Sarma, and Myra B. Cohen
(University of Nebraska-Lincoln, USA)
There are many tools that help software engineers analyze data about their software, projects, and teams. These tools primarily use visualizations to portray data in a concise and understandable way. However, software engineering tasks are often multi-dimensional and temporal, making some visualizations difficult to understand. An alternative for representing data, which can easily incorporate higher dimensionality and temporal information, is the use of sound. In this paper we propose the use of sonification to help portray collaborative development history. Our approach, GitSonifier, combines sound primitives to represent developers, days, and conflicts over the history of a program's development. In a formative user study on an open source project's data, we find that users can easily extract meaningful information from sound clips and differentiate users, passage of time, and development conflicts, suggesting that sonification has the potential to provide benefit in this context.
Publisher's Version Article Search
Automatically Recommending Test Code Examples to Inexperienced Developers
Raphael Pham, Yauheni Stoliar, and Kurt Schneider
(Leibniz Universität Hannover, Germany)
New graduates joining the software engineering workforce sometimes have trouble writing test code. Coming from university, they lack a hands-on approach to testing and have little experience with writing tests in a real-world setting. Software companies resort to costly training camps or mentoring initiatives. Not overcoming this lack of testing skills early on can hinder the newcomer’s professional progress in becoming a high-quality engineer. Studying open source developers, we found that they rely on a project’s pre-existing test code to learn how to write tests and adapt test code for their own use. We propose to strategically present useful and contextual test code examples from a project’s test suite to newcomers in order to facilitate learning and test writing. With an automatic suggestion mechanism for valuable test code, the newcomer is enabled to learn how senior developers write tests and copy it. Having access to suitable tests lowers the barrier for writing new tests.
Publisher's Version Article Search
Using Software Theater for the Demonstration of Innovative Ubiquitous Applications
Han Xu, Stephan Krusche, and Bernd Bruegge
(TU München, Germany)
Software development has to cope with uncertainties and changing requirements that constantly arise in the development process. Agile methods address this challenge by adopting an incremental development process and delivering working software frequently. However, current validation techniques used in sprint reviews are not sufficient for emerging applications based on ubiquitous technologies. To fill this gap, we propose a new way of demonstration called Software Theater. Based on ideas from theater plays, it aims at presenting scenario-based demonstration in a theatrical way to highlight new features, new user experience and new technical architecture in an integrated performance. We have used Software Theater in more than twenty projects and the result is overall positive.
Publisher's Version Article Search

Validation, Verification, and Testing

Behavioral Log Analysis with Statistical Guarantees
Nimrod Busany and Shahar Maoz
(Tel Aviv University, Israel)
Scalability is a major challenge for existing behavioral log analysis algorithms, which extract finite-state automaton models or temporal properties from logs generated by running systems. In this work we propose to address scalability using statistical tools. The key to our approach is to consider behavioral log analysis as a statistical experiment. Rather than analyzing the entire log, we suggest to analyze only a sample of traces from the log and, most importantly, provide means to compute statistical guarantees for the correctness of the analysis result. We present two example applications of our approach as well as initial evidence for its effectiveness.
Publisher's Version Article Search
Inner Oracles: Input-Specific Assertions on Internal States
Yingfei Xiong, Dan Hao, Lu Zhang, Tao Zhu, Muyao Zhu, and Tian Lan
(Peking University, China)
Traditional test oracles are defined on the outputs of test executions, and cannot assert internal states of executions. Traditional assertions are common to all test execution, and are usually more difficult to construct than on oracle for one test input. In this paper we propose the concept of inner oracles, which are assertions on internal states that are specific to one test input. We first motivate the necessity of inner oracles, and then show that it can be implemented easily using the available programming mechanisms. Next, we report two initial empirical studies on inner oracles, showing that inner oracles have a significant impact on both the fault-detection capability of tests and the performance of test suite reduction. Finally, we highlight the implications of inner oracles on several research and practical problems.
Publisher's Version Article Search Info
Targeted Program Transformations for Symbolic Execution
Cristian Cadar
(Imperial College London, UK)
Semantics-preserving program transformations, such as refactorings and optimisations, can have a significant impact on the effectiveness of symbolic execution testing and analysis. Furthermore, semantics-preserving transformations that increase the performance of native execution can in fact decrease the scalability of symbolic execution. Similarly, semantics-altering transformations, such as type changes and object size modifications, can often lead to substantial improvements in the testing effectiveness achieved by symbolic execution in the original program. As a result, we argue that one should treat program transformations as first-class ingredients of scalable symbolic execution, alongside widely-accepted aspects such as search heuristics and constraint solving optimisations. First, we propose to understand the impact of existing program transformations on symbolic execution, to increase scalability and improve experimental design and reproducibility. Second, we argue for the design of testability transformations specifically targeted toward more scalable symbolic execution.
Publisher's Version Article Search
Crash Reproduction via Test Case Mutation: Let Existing Test Cases Help
Jifeng Xuan, Xiaoyuan Xie, and Martin Monperrus
(Wuhan University, China; University of Lille, France; INRIA, France)
Developers reproduce crashes to understand root causes during software debugging. To reduce the manual effort by developers, automatic methods of crash reproduction generate new test cases for triggering crashes. However, due to the complex program structures, it is challenging to generate a test case to cover a specific program path. In this paper, we propose an approach to automatic crash reproduction via test case mutation, which updates existing test cases to trigger crashes rather than creating new test cases from scratch. This approach leverages major structures and objects in existing test cases and increases the chance of executing the specific path. Our preliminary result on 12 crashes in Apache Commons Collections shows that 7 crashes are reproduced by our approach of test case mutation.
Publisher's Version Article Search
RDIT: Race Detection from Incomplete Traces
Arun K. Rajagopalan and Jeff Huang
(Texas A&M University, USA)
We present RDIT, a novel dynamic algorithm to precisely detect data races in multi-threaded programs with incomplete trace information -- the presence of missing events. RDIT enhances the classical Happens-Before algorithm by relaxing the need to collect the full execution trace, while still guaranteeing full precision. The key idea behind RDIT is to abstract away the missing events by capturing the invocation data of the missing methods. This provides valuable information to approximate the possible synchronization behavior introduced by the missing events. By making the least conservative approximation that two missing methods introduce synchronization only when they access common data, RDIT guarantees to detect a maximal set of true races from the information available. We have conducted a preliminary study of RDIT on a real system and our results show that RDIT is promising; it detects no false positive when events are missed, whereas Happens-Before reports many.
Publisher's Version Article Search

Maintenance and Evolution

TACO: Test Suite Augmentation for Concurrent Programs
Tingting Yu
(University of Kentucky, USA)
The advent of multicore processors has greatly increased the prevalence of concurrent programs to achieve higher performance. As programs evolve, test suite augmentation techniques are used in regression testing to identify where new test cases are needed and then generate them. Prior work on test suite augmentation has focused on sequential software, but to date, no work has considered concurrent software systems for which regression testing is expensive due to large number of possible thread interleavings. In this paper, we present TACO, an automated test suite augmentation framework for concurrent programs in which our goal is not only to generate new inputs to exercise uncovered changed code but also to explore new thread interleavings induced by the changes. Our technique utilizes results from reuse of existing test inputs following random schedules, together with a predicative scheduling strategy and an incremental concolic testing algorithm to automatically generate new inputs that drive program through affected interleaving space so that it can effectively and efficiently validate changes that have not been exercised by existing test cases. Toward the end, we discuss several main challenges and opportunities of our approach.
Publisher's Version Article Search
Navigating through the Archipelago of Refactorings
Apostolos V. Zarras, Theofanis Vartziotis, and Panos Vassiliadis
(University of Ioannina, Greece)
The essence of refactoring is to improve software quality via the systematic combination of primitive refactorings. Yet, there are way too many refactorings. Choosing which refactorings to use, how to combine them and how to integrate them in more complex evolution tasks is really hard. Our vision is to provide the developer with a "trip advisor" for the archipelago of refactorings. The core idea of our approach is the map of the archipelago of refactorings, which identies the basic relations that guide the systematic and eective combination of refactorings. Based on the map, the trip advisor makes suggestions that allow the developer to decide how to start, assess the possible alternatives, have a clear picture of what has to be done before, during and after the refactorings and assess the possible implications.
Publisher's Version Article Search
Detecting Semantic Merge Conflicts with Variability-Aware Execution
Hung Viet Nguyen, My Huu Nguyen, Son Cuu Dang, Christian Kästner, and Tien N. Nguyen
(Iowa State University, USA; Ho Chi Minh City University of Science, Vietnam; University of Technology Sydney, Australia; Carnegie Mellon University, USA)
In collaborative software development, changes made in parallel by multiple developers may conflict. Previous research has shown that conflicts are common and occur as textual conflicts or semantic conflicts, which manifest as build or test failures. With many parallel changes, it is desirable to identify conflicts early and pinpoint the (minimum) set of changes involved. However, the costs of identifying semantic conflicts can be high because tests need to be executed on many merge scenarios. We propose Semex, a novel approach to detect semantic conflicts using variability-aware execution. We encode all parallel changes into a single program, in which "if" statements guard the alternative code fragments. Then, we run the test cases using variability-aware execution, exploring all possible concrete executions of the combined program with regard to all possible merge scenarios, while exploiting similarities among the executions to speed up the process. Variability-aware execution returns a formula describing all failing merge scenarios. In our preliminary experimental study on seven PHP programs with a total of 50 test cases and 19 semantic conflicts, Semex correctly detected all 19 conflicts.
Publisher's Version Article Search
Product Lines Can Jeopardize Their Trade Secrets
Mathieu Acher, Guillaume Bécan, Benoit Combemale, Benoit Baudry, and Jean-Marc Jézéquel
(University of Rennes 1, France; INRIA, France; IRISA, France)
What do you give for free to your competitor when you exhibit a product line? This paper addresses this question through several cases in which the discovery of trade secrets of a product line is possible and can lead to severe consequences. That is, we show that an outsider can understand the variability realization and gain either confidential business information or even some economical direct advantage. For instance, an attacker can identify hidden constraints and bypass the product line to get access to features or copyrighted data. This paper warns against possible naive modeling, implementation, and testing of variability leading to the existence of product lines that jeopardize their trade secrets. Our vision is that defensive methods and techniques should be developed to protect specifically variability – or at least further complicate the task of reverse engineering it.
Publisher's Version Article Search

Tool Demonstrations

JSketch: Sketching for Java
Jinseong Jeon, Xiaokang Qiu, Jeffrey S. Foster, and Armando Solar-Lezama
(University of Maryland, USA; Massachusetts Institute of Technology, USA)
Sketch-based synthesis, epitomized by the Sketch tool, lets developers synthesize software starting from a partial program, also called a sketch or template. This paper presents JSketch, a tool that brings sketch-based synthesis to Java. JSketch's input is a partial Java program that may include holes, which are unknown constants, expression generators, which range over sets of expressions, and class generators, which are partial classes. JSketch then translates the synthesis problem into a Sketch problem; this translation is complex because Sketch is not object-oriented. Finally, JSketch synthesizes an executable Java program by interpreting the output of Sketch.
Publisher's Version Article Search Info
Don't Panic: Reverse Debugging of Kernel Drivers
Pavel Dovgalyuk, Denis Dmitriev, and Vladimir Makarov
(Russian Academy of Sciences, Russia)
Debugging of device drivers' failures is a very tough task because of kernel panics, blue screens of death, hardware volatility, long periods of time required to expose the bug, perturbation of the drivers by the debugger, and non-determinism of multi-threaded environment. This paper shows how reverse debugging reduces the influence of these factors to the process of drivers debugging. We present reverse debugger as a practical tool, which was tested for i386, x86-64, and ARM platforms, for Windows and Linux guest operating systems. We show that our tool incurs very low overhead (about 10%), which allows using it for debugging of the time sensitive applications. The paper also presents the case study which demonstrates reverse debugging of the USB kernel drivers for Linux.
Publisher's Version Article Search Info
UMTG: A Toolset to Automatically Generate System Test Cases from Use Case Specifications
Chunhui Wang, Fabrizio Pastore, Arda Goknil, Lionel C. Briand, and Zohaib Iqbal
(University of Luxembourg, Luxembourg; National University of Computer and Emerging Sciences, Pakistan)
We present UMTG, a toolset for automatically generating executable and traceable system test cases from use case specifications. UMTG employs Natural Language Processing (NLP), a restricted form of use case specifications, and constraint solving. Use cases are expected to follow a template with restriction rules that reduce imprecision and enable NLP. NLP is used to capture the control flow implicitly described in use case specifications. Finally, to generate test input, constraint solving is applied to OCL constraints referring to the domain model of the system. UMTG is integrated with two tools that are widely adopted in industry, IBM Doors and Rhapsody. UMTG has been successfully evaluated on an industrial case study.
Publisher's Version Article Search
DexterJS: Robust Testing Platform for DOM-Based XSS Vulnerabilities
Inian Parameshwaran, Enrico Budianto, Shweta Shinde, Hung Dang, Atul Sadhu, and Prateek Saxena
(National University of Singapore, Singapore)
DOM-based cross-site scripting (XSS) is a client-side vulnerability that pervades JavaScript applications on the web, and has few known practical defenses. In this paper, we introduce DEXTERJS, a testing platform for detecting and validating DOM-based XSS vulnerabilities on web applications. DEXTERJS leverages source-to source rewriting to carry out character-precise taint tracking when executing in the browser context—thus being able to identify vulnerable information flows in a web page. By scanning a web page, DEXTERJS produces working exploits that validate DOM-based XSS vulnerability on the page. DEXTERJS is robust, has been tested on Alexa’s top 1000 sites, and has found a total of 820 distinct zero-day DOM-XSS confirmed exploits automatically.
Publisher's Version Article Search Video Info
T3i: A Tool for Generating and Querying Test Suites for Java
I. S. Wishnu B. Prasetya
(Utrecht University, Netherlands)
T3i is an automated unit-testing tool to test Java classes. To expose interactions T3i generates test-cases in the form of sequences of calls to the methods of the target class. What separates it from other testing tools is that it treats test suites as first class objects and allows users to e.g. combine, query, and filter them. With these operations, the user can construct a test suite with specific properties. Queries can be used to check correctness properties. Hoare triples, LTL formulas, and algebraic equations can be queried. T3i can be used interactively, thus facilitating more exploratory testing, as well as through a script. The familiar Java syntax can be used to control it, or alternatively one can use the much lighter Groovy syntax.
Publisher's Version Article Search Video Info
iTrace: Enabling Eye Tracking on Software Artifacts within the IDE to Support Software Engineering Tasks
Timothy R. Shaffer, Jenna L. Wise, Braden M. Walters, Sebastian C. Müller, Michael Falcone, and Bonita Sharif
(Youngstown State University, USA; University of Zurich, Switzerland)
The paper presents iTrace, an Eclipse plugin that implicitly records developers' eye movements while they work on change tasks. iTrace is the first eye tracking environment that makes it possible for researchers to conduct eye tracking studies on large software systems. An overview of the design and architecture is presented along with features and usage scenarios. iTrace is designed to support a variety of eye trackers. The design is flexible enough to record eye movements on various types of software artifacts (Java code, text/html/xml documents, diagrams), as well as IDE user interface elements. The plugin has been successfully used for software traceability tasks and program comprehension tasks. iTrace is also applicable to other tasks such as code summarization and code recommendations based on developer eye movements. A short video demonstration is available at https://youtu.be/3OUnLCX4dXo.
Publisher's Version Article Search Video Info
Nyx: A Display Energy Optimizer for Mobile Web Apps
Ding Li, Angelica Huyen Tran, and William G. J. Halfond
(University of Southern California, USA)
Energy is a critical resource for current mobile devices. In a smartphone, display is one of the most energy consuming components. Modern smartphones often use OLED screens, which consume much more energy when displaying light colors than displaying dark colors. In our previous study, we proposed a technique to reduce display energy of mo- bile web apps by changing the color scheme automatically. With this approach, we achieved a 40% reduction in display power consumption and 97% user acceptance of the new color scheme. In this tool paper, we describe Nyx, which implements our approach. Nyx is implemented as a self- contained executable file with which users can optimize en- ergy consumption of their web apps with a simple command.
Publisher's Version Article Search
NARCIA: An Automated Tool for Change Impact Analysis in Natural Language Requirements
Chetan Arora, Mehrdad Sabetzadeh, Arda Goknil, Lionel C. Briand, and Frank Zimmer
(University of Luxembourg, Luxembourg; SES TechCom, Luxembourg)
We present NARCIA, a tool for analyzing the impact of change in natural language requirements. For a given change in a requirements document, NARCIA calculates quantitative scores suggesting how likely each requirements statement in the document is to be impacted. These scores, computed using Natural Language Processing (NLP), are used for sorting the requirements statements, enabling the user to focus on statements that are most likely to be impacted. To increase the accuracy of change impact analysis, NARCIA provides a mechanism for making explicit the rationale behind changes. NARCIA has been empirically evaluated on two industrial case studies. The results of this evaluation are briefly highlighted.
Publisher's Version Article Search Info
Commit Guru: Analytics and Risk Prediction of Software Commits
Christoffer Rosen, Ben Grawi, and Emad Shihab
(Rochester Institute of Technology, USA; Concordia University, Canada)
Software quality is one of the most important research sub-areas of software engineering. Hence, a plethora of research has focused on the prediction of software quality. Much of the software analytics and prediction work has proposed metrics, models and novel approaches that can predict quality with high levels of accuracy. However, adoption of such techniques remain low; one of the reasons for this low adoption of the current analytics and prediction technique is the lack of actionable and publicly available tools. We present Commit Guru, a language agnostic analytics and prediction tool that identifies and predicts risky software commits. Commit Guru is publicly available and is able to mine any GIT SCM repository. Analytics are generated at both, the project and commit levels. In addition, Commit Guru automatically identifies risky (i.e., bug-inducing) commits and builds a prediction model that assess the likelihood of a recent commit introducing a bug in the future. Finally, to facilitate future research in the area, users of Commit Guru can download the data for any project that is processed by Commit Guru with a single click. Several large open source projects have been successfully processed using Commit Guru. Commit Guru is available online at commit.guru. Our source code is also released freely under the MIT license.
Publisher's Version Article Search Info
OSSMETER: A Software Measurement Platform for Automatically Analysing Open Source Software Projects
Davide Di Ruscio, Dimitris S. Kolovos, Ioannis Korkontzelos, Nicholas Matragkas, and Jurgen J. Vinju
(University of L'Aquila, Italy; University of York, UK; University of Manchester, UK; CWI, Netherlands)
Deciding whether an open source software (OSS) project meets the required standards for adoption in terms of quality, maturity, activity of development and user support is not a straightforward process as it involves exploring various sources of information. Such sources include OSS source code repositories, communication channels such as newsgroups, forums, and mailing lists, as well as issue tracking systems. OSSMETER is an extensible and scalable platform that can monitor and incrementally analyse a large number of OSS projects. The results of this analysis can be used to assess various aspects of OSS projects, and to directly compare different OSS projects with each other.
Publisher's Version Article Search
Comprehensive Service Matching with MatchBox
Paul Börding, Melanie Bruns, and Marie Christin Platenius
(University of Paderborn, Germany)
Nowadays, many service providers offer software components in the form of Software as a Service. Requesters that want to discover those services in order to use or to integrate them, need to find out which service satisfies their requirements best. For this purpose, service matching approaches determine how well the specifications of provided services satisfy their requirements (including structural, behavioral, and non-functional requirements). In this paper, we describe the tool-suite MatchBox that allows the integration of existing service matchers and their combination as part of flexibly configurable matching processes. Taking requirements and service specifications as an input, MatchBox is able to execute such matching processes and deliver rich matching results. In contrast to related tools, MatchBox allows users to take into account many different kinds of requirements, while it also provides the flexibility to control the matching process in many different ways.
Publisher's Version Article Search Video Info
UEDashboard: Awareness of Unusual Events in Commit Histories
Larissa Leite, Christoph Treude, and Fernando Figueira Filho
(Federal University of Rio Grande do Norte, Brazil)
To be able to respond to source code modifications with large impact or commits that necessitate further examination, developers and managers in a software development team need to be aware of anything unusual happening in their software projects. To address this need, we introduce UEDashboard, a tool which automatically detects unusual events in a commit history based on metrics and smells, and surfaces them in an event feed. Our preliminary evaluation with a team of professional software developers showed that our conceptualization of unusual correlates with developers' perceptions of task difficulty, and that UEDashboard could be useful in supporting development meetings and for pre-commit warnings.
Publisher's Version Article Search Info
MatrixMiner: A Red Pill to Architect Informal Product Descriptions in the Matrix
Sana Ben Nasr, Guillaume Bécan, Mathieu Acher, João Bosco Ferreira Filho, Benoit Baudry, Nicolas Sannier, and Jean-Marc Davril
(University of Rennes 1, France; INRIA, France; IRISA, France; University of Luxembourg, Luxembourg; University of Namur, Belgium)
Domain analysts, product managers, or customers aim to capture the important features and differences among a set of related products. A case-by-case reviewing of each product description is a laborious and time-consuming task that fails to deliver a condensed view of a product line. This paper introduces MatrixMiner: a tool for automatically synthesizing product comparison matrices (PCMs) from a set of product descriptions written in natural language. MatrixMiner is capable of identifying and organizing features and values in a PCM – despite the informality and absence of structure in the textual descriptions of products. Our empirical results of products mined from BestBuy show that the synthesized PCMs exhibit numerous quantitative, comparable information. Users can exploit MatrixMiner to visualize the matrix through a Web editor and review, refine, or complement the cell values thanks to the traceability with the original product descriptions and technical specifications.
Publisher's Version Article Search Video Info

Industry Papers


Predicting Field Reliability
Pete Rotella, Sunita Chulani, and Devesh Goyal
(Cisco Systems, USA)
The objective of the work described is to accurately predict, as early as possible in the software lifecycle, how reliably a new software release will behave in the field. The initiative is based on a set of innovative mathematical models that have consistently shown a high correlation between key in-process metrics and our primary customer experience metric, SWDPMH (Software Defects per Million Hours [usage] per Month). We have focused on the three primary dimensions of testing – incoming, fixed, and backlog bugs. All of the key predictive metrics described here are empirically-derived, and in specific quantitative terms have not previously been documented in the software engineering/quality literature. A key part of this work is the empirical determination of the precision of the measurements of the primary predictive variables, and the determination of the prediction (outcome) error. These error values enable teams to accurately gauge bug finding and fixing progress, week by week, during the primary test period.
Publisher's Version Article Search
REMI: Defect Prediction for Efficient API Testing
Mijung Kim, Jaechang Nam, Jaehyuk Yeon, Soonhwang Choi, and Sunghun Kim
(Hong Kong University of Science and Technology, China; Samsung Electronics, South Korea)
Quality assurance for common APIs is important since the the reliability of APIs affects the quality of other systems using the APIs. Testing is a common practice to ensure the quality of APIs, but it is a challenging and laborious task especially for industrial projects. Due to a large number of APIs with tight time constraints and limited resources, it is hard to write enough test cases for all APIs. To address these challenges, we present a novel technique, REMI that predicts high risk APIs in terms of producing potential bugs. REMI allows developers to write more test cases for the high risk APIs. We evaluate REMI on a real-world industrial project, Tizen-wearable, and apply REMI to the API development process at Samsung Electronics. Our evaluation results show that REMI predicts the bug-prone APIs with reasonable accuracy (0.681 f-measure on average). The results also show that applying REMI to the Tizen-wearable development process increases the number of bugs detected, and reduces the resources required for executing test cases.
Publisher's Version Article Search
OnSpot System: Test Impact Visibility during Code Edits in Real Software
Muhammad Umar Janjua
(Microsoft, USA)
For maintaining the quality of software updates to complex software products (e.g. Windows 7 OS), an extensive, broad level regression testing is conducted whenever releasing new code fixes or updates. Despite the huge cost and investment in the test infrastructure to execute these massive tests, the developer of the code fix has to wait for the regression test failures to be reported after checkin. These regression tests typically run way later from the code editing stage and consequently the developer has no test impact visibility while introducing the code changes at compile time or before checkin. We argue that it is valuable and practically feasible to tailor the entire development/testing process to provide valuable and actionable test feedback at the development/compilation stage as well. With this goal, this paper explores a system model that provides a near real-time test feedback based on regression tests while the code change is under development or as soon as it becomes compilable. OnSpot system dynamically overlays the results of tests on relevant source code lines in the development environment; thereby highlighting test failures akin to syntax failures enabling quicker correction and re-run at compile time rather than late when the damage is already done. We evaluate OnSpot system with the security fixes in Windows 7 while considering various factors like test feedback time, coverage ratio. We found out that on average nearly 40% of the automated Windows 7 regression test collateral could run under 30 seconds providing same level of coverage; thereby making OnSpot approach practically feasible and manageable during compile time
Publisher's Version Article Search

Software Process

Products, Developers, and Milestones: How Should I Build My N-Gram Language Model
Juliana Saraiva, Christian Bird, and Thomas Zimmermann
(Federal University of Pernambuco, Brazil; Microsoft Research, USA)
Recent work has shown that although programming languages enable source code to be rich and complex, most code tends to be repetitive and predictable. The use of natural language processing (NLP) techniques applied to source code such as n-gram language models show great promise in areas such as code completion, aiding impaired developers, and code search. In this paper, we address three questions related to different methods of constructing language models in an industrial context. Specifically, we ask: (1) Do application specific, but smaller language models perform better than language models across applications? (2) Are developer specific language models effective and do they differ depending on what parts of the codebase a developer is working in? (3) Finally, do language models change over time, i.e., does a language model from early development model change later on in development? The answers to these questions enable techniques that make use of programming language models in development to choose the model training corpus more effectively. We evaluate these questions by building 28 language models across developers, time periods, and applications within Microsoft Office and present the results in this paper. We find that developer and application specific language models perform better than models from the entire codebase, but that temporality has little to no effect on language model performance.
Publisher's Version Article Search
Evaluating a Formal Scenario-Based Method for the Requirements Analysis in Automotive Software Engineering
Joel Greenyer, Max Haase, Jörg Marhenke, and Rene Bellmer
(Leibniz Universität Hannover, Germany; IAV, Germany)
Automotive software systems often consist of multiple reactive components that must satisfy complex and safety-critical requirements. In automotive projects, the requirements are usually documented informally and are reviewed manually; this regularly causes inconsistencies to remain hidden until the integration phase, where their repair requires costly iterations. We therefore seek methods for the early automated requirement analysis and evaluated the scenario-based specification approach based on LSCs/MSDs; it promises to support an incremental and precise specification of requirements, and offers automated analysis through scenario execution and formal realizability checking. In a case study, we used ScenarioTools to model and analyze the requirements of a software to control a high-voltage coupling for electric vehicles. Our example contained 36 requirements and assumptions that we could successfully formalize, and we could successfully find specification defects by automated realizability checking. In this paper, we report on lessons learned, tool and method extensions we have introduced, and open challenges.
Publisher's Version Article Search
Barriers and Enablers for Shortening Software Development Lead-Time in Mechatronics Organizations: A Case Study
Mahshad M.Mahally, Miroslaw Staron, and Jan Bosch
(Volvo, Sweden; Chalmers University of Technology, Sweden; University of Gothenburg, Sweden)
The automotive industry adopts various approaches to reduce the production lead time in order to be competitive on the market. Due to the increasing amount of in-house software development, this industry gets new opportunities to decrease the software development lead-time. This can have a significant impact on decreasing time to market and fewer resources spent in projects. In this paper we present a study of software development areas where we perceived barriers for fast development and where we have identified enablers to overcome these barriers. We conducted a case study at one of the vehicle manufacturers in Sweden using structured interviews. Our results show that there are 21 barriers and 21 corresponding enablers spread over almost all phases of software development.
Publisher's Version Article Search

Requirements and Specification

Semantic Degrees for Industrie 4.0 Engineering: Deciding on the Degree of Semantic Formalization to Select Appropriate Technologies
Chih-Hong Cheng, Tuncay Guelfirat, Christian Messinger, Johannes O. Schmitt, Matthias Schnelte, and Peter Weber
(ABB Research, Germany)
Under the context of Industrie 4.0 (I4.0), future production systems provide balanced operations between manufacturing flexibility and efficiency, realized in an autonomous, horizontal, and decentralized item-level production control framework. Structured interoperability via precise formulations on an appropriate degree is crucial to achieve software engineering efficiency in the system life cycle. However, selecting the degree of formalization can be challenging, as it crucially depends on the desired common understanding (semantic degree) between multiple parties. In this paper, we categorize different semantic degrees and map a set of technologies in industrial automation to their associated degrees. Furthermore, we created guidelines to assist engineers selecting appropriate semantic degrees in their design. We applied these guidelines on publicly available scenarios to examine the validity of the approach, and identified semantic elements over internally developed use cases concerning plug-and-produce.
Publisher's Version Article Search
Towards Automating the Security Compliance Value Chain
Smita Ghaisas, Manish Motwani, Balaji Balasubramaniam, Anjali Gajendragadkar, Rahul Kelkar, and Harrick Vin
(Tata Consultancy Services, India)
Information security is of paramount importance in this digital era. While businesses strive to adopt industry-accepted system-hardening standards such as benchmarks recommended by the Center for Internet Security (CIS) to combat threats, they are confronted with an additional challenge of ever-evolving regulations that address security concerns. These create additional requirements, which must be incorporated into software systems. In this paper, we present a generic approach towards automating different activities of the Security Compliance Value Chain (SCVC) in organizations. We discuss the approach in the context of the Payment Card Industry Data Security Standard (PCI-DSS) regulations. Specifically, we present automation of (1) interpretation of PCI-DSS regulations to infer system requirements, (2) traceability of the inferred system requirements to CIS security controls (3) implementation of appropriate security controls, and finally, (4) verification and reporting of compliance.
Publisher's Version Article Search
Requirements, Architecture, and Quality in a Mission Critical System: 12 Lessons Learned
Aapo Koski and Tommi Mikkonen
(Insta DefSec, Finland; Tampere University of Technology, Finland)
Public tender processes typically start with a comprehensive specification phase, where representatives of the eventual owner of the system, usually together with a hired group of consultants, spend a considerable amount of time to determine the needs of the owner. For the company that implements the system, this setup introduces two major challenges: (1) the written down requirements can never truly describe to a person, at least to one external to the specification process, the true intent behind the requirement; (2) the vision of the future system, stemming from the original idea, will change during the specification process – over time simultaneously invalidating at least some of the requirements. This paper reflects the experiences encountered in a large-scale mission critical information system – ERICA, an information system for the emergency services in Finland – regarding design, implementation, and deployment. Based on the experiences we propose more dynamic ways of system specification, leading to simpler design, implementation, and deployment phases and finally to a better perceived quality.
Publisher's Version Article Search

Doctoral Symposium

Decentralized Self-Adaptation in Large-Scale Distributed Systems
Luca Florio
(Politecnico di Milano, Italy)
The evolution of technology is leading to a world where computational systems are made of a huge number of components spread over a logical network: these components operate in a highly dynamic and unpredictable environment, joining or leaving the system and creating connections between them at runtime. This scenario poses new challenges to software engineers that have to design and implement such complex systems. We want to address this problem, designing and developing an infrastructure, GRU, that uses self-adaptive decentralized techniques to manage large-scale distributed systems. GRU will help developers to focus on the functional part of their application instead of the needed self-adaptive infrastructure. We aim to evaluate our project with concrete case studies, providing evidence on the validity of our approach, and with the feedback provided by developers that will test our system. We believe this approach can contribute to fill the gap between the theoretical study of self-adaptive systems and their application in a production context.
Publisher's Version Article Search
Vehicle Level Continuous Integration in the Automotive Industry
Sebastian Vöst
(University of Stuttgart, Germany)
Embedded systems are omnipresent in the modern world. This naturally includes the automobile industry, where electronic functions are becoming prevalent. In the automotive domain, embedded systems today are highly distributed systems and manufactured in great numbers and variance. To ensure correct functionality, systematic integration and testing on the system level is key. In software engineering, continuous integration has been used with great success. In the automotive industry though, system tests are still performed in a big-bang integration style, which makes tracing and fixing errors very expensive and time-consuming. Thus, I want to investigate whether and how continuous integration can be applied to the automotive industry on the system level. Doing so, I present an adapted process of Continuous Integration including methods for test case specification and selection. I will apply this process as a pilot project in a production environment at BMW and evaluate the effectiveness by gathering both qualitative and quantitative data. From the gained experience, I will derive possible improvements to the process for future implementations and requirements on test hardware used for Continuous Integration.
Publisher's Version Article Search
Quantifying Architectural Debts
Lu Xiao
(Drexel University, USA)
In our prior research, we found that problematic architectural connections can propagate errors. We also found that among multiple files, the architectural connections that violate common design principles strongly correlate with the error-proneness of files. The flawed architectural connections, if not fixed properly and timely, can become debts that accumulate high interest in terms of maintenance costs over time. In this paper, we define architectural debts as clusters of files with problematic architectural connections among them, and their connections incur high maintenance costs over time. Our goal is to 1) precisely identify which and how many files are involved in architectural debts; 2) quantify the penalties of architectural debts in terms of maintenance costs; and 3) model the growth trend of penalties---maintenance costs---that accumulate due to architectural debts. We plan to provide a quantitative model for project managers and stakeholders as a reference in making decisions of whether, when and where to invest in refactoring.
Publisher's Version Article Search
User-Centric Security: Optimization of the Security-Usability Trade-Off
Denis Feth
(Fraunhofer IESE, Germany)
Security and usability are highly important and interdependent quality attributes of modern IT systems. However, it is often hard to fully meet both in practice. Security measures are complex by nature and often complicate work flows. Vice versa, insecure systems are typically not usable in practice. To tackle this, we aim at finding the best balance between usability and security in software engineering and administration. Our methodology is based on active involvement of large user groups and analyzes user feedback in order to optimize security mechanisms with respect to their user experience, with a focus on security awareness. It is applied during requirements elicitation and prototyping, and to dynamically adapt unsuited security policies at runtime.
Publisher's Version Article Search
Automated Unit Test Generation for Evolving Software
Sina Shamshiri
(University of Sheffield, UK)
As developers make changes to software programs, they want to ensure that the originally intended functionality of the software has not been affected. As a result, developers write tests and execute them after making changes. However, high quality tests are needed that can reveal unintended bugs, and not all developers have access to such tests. Moreover, since tests are written without the knowledge of future changes, sometimes new tests are needed to exercise such changes. While this problem has been well studied in the literature, the current approaches for automatically generating such tests either only attempt to reach the change and do not aim to propagate the infected state to the output, or may suffer from scalability issues, especially when a large sequence of calls is required for propagation. We propose a search-based approach that aims to automatically generate tests which can reveal functionality changes, given two versions of a program (e.g., pre-change and post-change). Developers can then use these tests to identify unintended functionality changes (i.e., bugs). Initial evaluation results show that our approach can be effective on detecting such changes, but there remain challenges in scaling up test generation and making the tests useful to developers, both of which we aim to overcome.
Publisher's Version Article Search

Student Research Competition

Increasing the Efficiency of Search-Based Unit Test Generation using Parameter Control
Thomas White
(University of Sheffield, UK)
Automatically generating test suites with high coverage is of great importance to software engineers, but this process is hindered by the vast amount of parameters the tools use to generate tests. Developers usually lack knowledge about the workings of the tools that generate test suites to set the parameters to optimal values, and the optimal values usually change during runtime. Parameter Control automatically adapts parameters during test generation, and has shown to help solve this problem in other areas. To investigate any improvements parameter control could have in search-based generation of test suites, we adapted multiple methods of controlling mutation and crossover rate in EvoSuite, a tool that automatically generates unit test suites. Upon evaluation, clear benefits to controlling parameters were found, but surprisingly, controlling some parameters can sometimes be more harmful to the search than beneficial through increased computation costs.
Publisher's Version Article Search
Enhancing Android Application Bug Reporting
Kevin Moran
(College of William and Mary, USA)
The modern software development landscape has seen a shift in focus toward mobile applications as smartphones and tablets near ubiquitous adoption. Due to this trend, the complexity of these “apps” has been increasing, making development and maintenance challenging. Current bug tracking systems do not effectively facilitate the creation of bug reports with useful information that will directly lead to a bug’s resolution. To address the need for an improved reporting system, we introduce a novel solution, called Fusion, that helps reporters auto-complete reproduction steps in bug reports for mobile apps by taking advantage of their GUI-centric nature. Fusion links information, that reporters provide, to program artifacts extracted through static and dynamic analysis performed beforehand. This allows our system to facilitate the reporting process for developers and testers, while generating more reproducible bug reports with immediately actionable information.
Publisher's Version Article Search Video Info
Improving Energy Consumption in Android Apps
Carlos Bernal-Cárdenas
(College of William and Mary, USA)
Mobile applications sometimes exhibit behaviors that can be attributed to energy bugs depending on developer implementation decisions. In other words, certain design decisions that are technically “correct” might affect the energy performance of applications. Such choices include selection of color palettes, libraries used, API usage and task scheduling order. We study the energy consumption of Android apps using a power model based on a multi-objective approach that minimizes the energy consumption, maximizes the contrast, and minimizes the distance between the chosen colors by comparing the new options to the original palette. In addition, the usage of unnecessary resources can also be a cause of energy bugs depending on whether or not these are implemented correctly. We present an opportunity for continuous investigation of energy bugs by analyzing components in the background during execution on Android applications. This includes a potential new taxonomy type that is not covered by state-of-the-art approaches.
Publisher's Version Article Search
Automated Generation of Programming Language Quizzes
Shuktika Jain
(IIIT Delhi, India)
Formation of quizzes is a vital problem as they are an important part of learning. To create a quiz on a particular topic, its related terms need to be identified for further use in extraction of questions on the topic. These terms are referred to as entities for the topic and the task of distinguishing entities from general purpose terms is termed entity discovery. We know that discussion forums and question-answer sites on software contain questions using programming terms in their posts. In this work, we mine patterns in user queries from such a forum and then automatically discover entities for programming languages using these patterns. We use these entities to extract questions related to the programming language and form automated quizzes using them.
Publisher's Version Article Search
Spotting Familiar Code Snippet Structures for Program Comprehension
Venkatesh Vinayakarao
(IIIT Delhi, India)
Developers deal with the persistent problem of understanding non-trivial code snippets. To understand the given implementation, its issues, and available choices, developers will benefit from reading relevant discussions and descriptions over the web. However, there is no easy way to know the relevant natural language terms so as to reach to such descriptions from a code snippet, especially if the documentation is inadequate and if the vocabulary used in the code is not helpful for web search. We propose an approach to solve this problem using a repository of topics and associated structurally variant snippets collected from a discussion forum. In this on-going work, we take Java methods from the code samples of three Java books, match them with the repository, and associate the topics with 76.9% precision and 66.7% recall.
Publisher's Version Article Search
Combining Eye Tracking with Navigation Paths for Identification of Cross-Language Code Dependencies
Martin Konopka
(Slovak University of Technology in Bratislava, Slovakia)
In recent years, fine-grained monitoring of software developers during software development and maintenance activities has increased in popularity, together with use of devices for eye tracking and recording developer’s biometric data. We look for everyday application of such data to support developers in their work. In this paper we discuss an approach to identify potential code dependencies in source code, even when written in different programming languages, by combining identification of areas-of-interest in source code using eye tracking with developer’s navigation paths. Our plan is to evaluate it with data of developers working on real development tasks.
Publisher's Version Article Search
A Textual Domain Specific Language for Requirement Modelling
Oyindamola Olajubu
(University of Northampton, UK)
Requirement specification is usually done with a combination of Natural Language (NL) and informal diagrams. Modeling approaches to support requirement engineering activities have involved a combination of text and graphical models. In this work, a textual domain specific modelling notation for requirement specification is presented. How certain requirement quality attributes are addressed using this notation is also demonstrated.
Publisher's Version Article Search
Automated Attack Surface Approximation
Christopher Theisen
(North Carolina State University, USA)
While software systems are being developed and released to consumers more rapidly than ever, security remains an important issue for developers. Shorter development cycles means less time for these critical security testing and review efforts. The attack surface of a system is the sum of all paths for untrusted data into and out of a system. Code that lies on the attack surface therefore contains code with actual exploitable vulnerabilities. However, identifying code that lies on the attack surface requires the same contested security resources from the secure testing efforts themselves. My research proposes an automated technique to approximate attack surfaces through the analysis of stack traces. We hypothesize that stack traces user crashes represent activity that puts the system under stress, and is therefore indicative of potential security vulnerabilities. The goal of this research is to aid software engineers in prioritizing security efforts by approximating the attack surface of a system via stack trace analysis. In a trial on Mozilla Firefox, the attack surface approximation selected 8.4% of files and contained 72.1% of known vulnerabilities. A similar trial was performed on the Windows 8 product.
Publisher's Version Article Search
Pockets: A Tool to Support Exploratory Programming for Novices and Educators
Erina Makihara
(NAIST, Japan)
Exploratory programming is one of the programming techniques, and it is considered to be an effective way to improve programming skills for novices. However, there is no existing system or programming environment educating exploratory programming for novices. Therefore, we have developed a tool, named as Pockets, to support novice's exploratory programming. Through Pockets, educators are able to identify where and when novices experience difficulties during exploratory programming. In addition, it is possible to assist educators' mentoring by referring collected logs through the proposed system. We have also conducted a case study and evaluated the usefulness of the tool. As a result, Pockets makes novices' exploratory programming more efficient, and also allows more accurate advice by educators.
Publisher's Version Article Search

proc time: 0.62