
2013 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), August 18–26, 2013, Saint Petersburg, Russia

ESEC/FSE 2013 – Proceedings


Preface

Title Page

Message from the Chairs
On behalf of the entire organizing team of ESEC/FSE 2013, it is a great honor and pleasure for us to welcome you to the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, held August 18-26, 2013 in Saint Petersburg.
Committees

Sponsors


Keynotes

A Logical Revolution (Keynote)
Moshe Y. Vardi
(Rice University, USA)
Mathematical logic was developed in an effort to provide formal foundations for mathematics. In this quest, which ultimately failed, logic begat computer science, yielding both computers and theoretical computer science. But then logic turned out to be a disappointment as a foundation for computer science, as almost all decision problems in logic are either undecidable or intractable. Starting from the mid-1970s, however, there has been a quiet revolution in logic in computer science, and problems that are theoretically undecidable or intractable were shown to be quite feasible in practice. This talk describes the rise, fall, and rise of logic in computer science, surveying several modern applications of logic to computing, including databases, hardware design, and software engineering.
Producing Software by Integration: Challenges and Research Directions (Keynote)
Paola Inverardi, Marco Autili, Davide Di Ruscio, Patrizio Pelliccione, and Massimo Tivoli
(University of L'Aquila, Italy)
Software is increasingly produced according to a certain goal and by integrating existing software produced by third parties, typically black-box, and often provided without machine-readable documentation. This implies that development processes of the near future will have to explicitly deal with an inherent incompleteness of information about existing software, notably about its behaviour. Therefore, on one side a software producer will know less and less about the precise behaviour of a third-party software service; on the other side she will need to use it to build her own application. In this paper we present an innovative development process to automatically produce dependable software systems by integrating existing services under uncertainty and according to the specified goal. Moreover, we (i) discuss important challenges that must be faced while producing the kind of systems we are targeting, (ii) give an overview of the state of the art related to the identified challenges, and finally (iii) provide research directions to address these challenges.
Software Engineering for Mathematics (Keynote)
Georges Gonthier
(Microsoft Research, UK)
Since Turing, we have wanted to use computers to store, process, and check mathematics. However, even with the assistance of modern software tools, the formalization of research-level mathematics remains a daunting task, not least because of the talent with which working mathematicians combine diverse theories to achieve their ends. By drawing on tools and techniques from type theory, language design, and software engineering, we captured enough of these practices to formalize the proof of the Odd Order theorem, a landmark result in Group Theory, which ultimately led to the monumental Classification of finite simple groups. This involved recasting the software component concept in the setting of higher-order, higher-kinded Type Theory to create a library of mathematical components covering most of the undergraduate Algebra and graduate Group Theory syllabus. This library then allowed us to write a formal proof comparable in size and abstraction level to the 250-page textbook proof of the Odd Order theorem.

Panels

Empirical Answers to Fundamental Software Engineering Problems (Panel)
Bertrand Meyer, Harald Gall, Mark Harman, and Giancarlo Succi
(ETH Zurich, Switzerland; ITMO, Russia; Eiffel Software, USA; University of Zurich, Switzerland; University College London, UK; Microsoft Research, UK; Free University of Bozen, Italy)
Can the methods of empirical software engineering give us answers to the truly important open questions in the field?
A Publication Culture in Software Engineering (Panel)
Steven Fraser, Luciano Baresi, Jane Cleland-Huang, Carlo A. Furia, Georges Gonthier, Paola Inverardi, and Moshe Y. Vardi
(CISCO, USA; Politecnico di Milano, Italy; DePaul University, USA; ETH Zurich, Switzerland; Microsoft Research, UK; University of L'Aquila, Italy; Rice University, USA)
This panel will discuss what characterizes the publication process in the software engineering community, debate how well it serves the needs of the community and whether it is fair (e.g., whether valuable work gets published and mediocre work rejected), and highlight the obstacles it poses for young scientists. The panel will conclude with a discussion of suggested next steps.

AEC Summary

Artifact Evaluation (Summary)
Alexandre Bergel and Lorenzo Bettini
(University of Chile, Chile; Università di Torino, Italy)
The artifact evaluation committee (AEC) is in charge of evaluating the data and software that accompany papers accepted at the ESEC/FSE'13 research track. Authors of more than 40% of the accepted papers submitted an artifact (22 out of the 51 accepted papers). The AEC positively evaluated more than 50% of the submitted artifacts: 12 out of the 22 artifacts were graded as "met expectations" or "exceeded expectations".

Technical Research

Testing I

Efficiency and Early Fault Detection with Lower and Higher Strength Combinatorial Interaction Testing
Justyna Petke, Shin Yoo, Myra B. Cohen, and Mark Harman
(University College London, UK; University of Nebraska-Lincoln, USA)
Combinatorial Interaction Testing (CIT) is important because it tests the interactions between the many features and parameters that make up the configuration space of software systems. However, in order to be practically applicable, it must be able to cater for soft and hard real-world constraints and should, ideally, report a test priority order that maximises earliest fault detection. We show that we can achieve the highest strength CIT in 5.65 minutes on average. This was previously thought to be too computationally expensive to be feasible. Furthermore, we show that higher strength suites find more faults, while prioritisations using lower strengths are no worse at achieving early fault revelation.
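As background for the lower-strength baseline discussed in this abstract, the following is a minimal greedy sketch of strength-2 (pairwise) test-suite construction, in the spirit of classic AETG-style heuristics. It is purely illustrative: the function names are invented for this example, and it is not the authors' tool.

```python
from itertools import combinations, product

def pairs_of(config, n):
    """All (option-index, value, option-index, value) pairs that one
    full configuration covers."""
    return {(i, config[i], j, config[j])
            for i, j in combinations(range(n), 2)}

def pairwise_suite(params):
    """Greedily build a strength-2 (pairwise) test suite.
    params: one list of possible values per configuration option."""
    n = len(params)
    # Every pair of option values that some test must exercise.
    uncovered = set()
    for (i, vi), (j, vj) in combinations(enumerate(params), 2):
        uncovered |= {(i, a, j, b) for a in vi for b in vj}
    suite = []
    while uncovered:
        # Exhaustive candidate scan: acceptable only for tiny spaces;
        # real CIT tools use far more scalable constraint-aware search.
        best = max(product(*params),
                   key=lambda c: len(pairs_of(c, n) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best, n)
    return suite
```

For three binary options this yields four tests instead of the eight exhaustive combinations; the higher strengths studied in the paper replace value pairs with t-tuples, which grows the covering obligations sharply.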
Con2colic Testing
Azadeh Farzan, Andreas Holzer, Niloofar Razavi, and Helmut Veith
(University of Toronto, Canada; Vienna University of Technology, Austria)
In this paper, we describe (con)2colic testing, a systematic testing approach for concurrent software. Based on concrete and symbolic executions of a concurrent program, (con)2colic testing derives inputs and schedules such that the execution space of the program under investigation is systematically explored. We introduce interference scenarios as the key concept in (con)2colic testing. Interference scenarios capture the flow of data among different threads and enable a unified representation of path and interference constraints. We have implemented a (con)2colic testing engine and demonstrate the effectiveness of our approach by experiments.
Boosting Concolic Testing via Interpolation
Joxan Jaffar, Vijayaraghavan Murali, and Jorge A. Navas
(National University of Singapore, Singapore; University of Melbourne, Australia)
Concolic testing has been very successful in automatically generating test inputs for programs. However, one of its major limitations is path explosion, which limits the generation of high-coverage inputs. Since its inception, several ideas have been proposed to attack this problem from various angles: defining search heuristics that increase coverage, caching function summaries, pruning paths using static/dynamic information, etc. We propose a new and complementary method based on interpolation that greatly mitigates path explosion by subsuming paths that can be guaranteed not to hit a bug. We discuss new challenges in using interpolation that arise specifically in the context of concolic testing. We experimentally evaluate our method with different search heuristics using Crest, a publicly available concolic tester.

Dynamic and Variable Software

Adequate Monitoring of Service Compositions
Antonia Bertolino, Eda Marchetti, and Andrea Morichetta
(ISTI-CNR, Italy)
Monitoring is essential to validate the runtime behaviour of dynamic distributed systems. However, while monitors can report relevant events as they occur, by their very nature they will not report the events that are not happening. In service-oriented applications it would be desirable to have means to assess the thoroughness of the interactions among the services being monitored. In case some events, message sequences, or interaction patterns have not been observed for a while, one could promptly check whether this is because something is going wrong. In this paper, we introduce the novel notion of monitoring adequacy, which is generic and can be defined on different entities. We then define two adequacy criteria for service compositions and implement a proof-of-concept adequate monitoring framework. We validate the approach on two case studies, the Travel Reservation System and the Future Market choreographies.
Prediction of Atomic Web Services Reliability Based on K-Means Clustering
Marin Silic, Goran Delac, and Sinisa Srbljic
(University of Zagreb, Croatia)
Contemporary web applications are often designed as composite services built by coordinating atomic services with the aim of providing the appropriate functionality. Although the functional properties of each atomic service assure correct functionality of the entire application, nonfunctional properties such as availability, reliability, or security might significantly influence the user-perceived quality of the application. In this paper, we present CLUS, a model for reliability prediction of atomic web services that improves on the state-of-the-art approaches used in modern recommendation systems. CLUS predicts the reliability for the ongoing service invocation using data collected from previous invocations. We improve the accuracy of current state-of-the-art prediction models by considering user-, service-, and environment-specific parameters of the invocation context. To address scalability and the associated computational performance issues, we aggregate the available previous invocation data using the K-means clustering algorithm. We evaluated our model by conducting experiments on services deployed in different regions of the Amazon cloud. The evaluation results suggest that our model improves both the performance and the accuracy of the prediction when compared to current state-of-the-art models.
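To illustrate the aggregation step this abstract describes, here is a minimal 1-D k-means sketch over historical per-invocation reliability samples. It is an invented toy (function names included): the actual CLUS model clusters over user-, service-, and environment-specific context parameters, not over reliability values alone.

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Plain 1-D k-means: compress many historical reliability samples
    into k cluster means that a predictor can look up cheaply."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)          # initial cluster centres
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:                       # assign to nearest centroid
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # recompute means; an empty cluster keeps its old centroid
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids
```

A prediction for an ongoing invocation would then reduce to selecting the cluster matching the invocation context and returning its mean reliability, rather than scanning every past observation.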
Scalable Analysis of Variable Software
Jörg Liebig, Alexander von Rhein, Christian Kästner, Sven Apel, Jens Dörre, and Christian Lengauer
(University of Passau, Germany; CMU, USA)
The advent of variability management and generator technology enables users to derive individual variants from a variable code base based on a selection of desired configuration options. This approach gives rise to the generation of possibly billions of variants that, however, cannot be efficiently analyzed for errors with classic analysis techniques. To address this issue, researchers and practitioners usually apply sampling heuristics. While sampling reduces the analysis effort significantly, the information obtained is necessarily incomplete, and it is unknown whether sampling heuristics scale to billions of variants. Recently, researchers have begun to develop variability-aware analyses that analyze the variable code base directly, exploiting the similarities among individual variants to reduce analysis effort. However, while promising, variability-aware analyses have so far been applied mostly to small academic systems. To learn about the mutual strengths and weaknesses of variability-aware and sampling-based analyses of software systems, we compared the two strategies by means of two concrete analysis implementations (type checking and liveness analysis) and applied them to three subject systems: Busybox, the x86 Linux kernel, and OpenSSL. Our key finding is that variability-aware analysis outperforms most sampling heuristics with respect to analysis time while preserving completeness.

Formal Reasoning

Bayesian Inference using Data Flow Analysis
Guillaume Claret, Sriram K. Rajamani, Aditya V. Nori, Andrew D. Gordon, and Johannes Borgström
(INRIA, France; Microsoft Research, India; Microsoft Research, UK; Uppsala University, Sweden)
We present a new algorithm for Bayesian inference over probabilistic programs, based on data flow analysis techniques from the program analysis community. Unlike existing techniques for Bayesian inference on probabilistic programs, our data flow analysis algorithm is able to perform inference directly on probabilistic programs with loops. Even for loop-free programs, we show that data flow analysis offers better precision and performance than existing techniques. We also describe heuristics that are crucial for our inference to scale, and present an empirical evaluation of our algorithm over a range of benchmarks.
Second-Order Constraints in Dynamic Invariant Inference
Kaituo Li, Christoph Reichenbach, Yannis Smaragdakis, and Michal Young
(University of Massachusetts at Amherst, USA; Goethe University Frankfurt, Germany; University of Athens, Greece; University of Oregon, USA)
The current generation of dynamic invariant detectors often produces invariants that are inconsistent with program semantics or programmer knowledge. We improve the consistency of dynamically discovered invariants by taking into account higher-level constraints. These constraints encode knowledge about invariants, even when the invariants themselves are unknown. For instance, even though the invariants describing the behavior of two functions f1 and f2 may be unknown, we may know that any valid input for f1 is also valid for f2, i.e., the precondition of f1 implies that of f2. We explore techniques for expressing and employing such consistency constraints to improve the quality of produced invariants. We further introduce techniques for dynamically discovering potential second-order constraints that the programmer can subsequently approve or reject. Our implementation builds on the Daikon tool, with a vocabulary of constraints that the programmer can use to enhance and constrain Daikon’s inference. We show that dynamic inference of second-order constraints together with minimal human effort can significantly influence the produced (first-order) invariants even in systems of substantial size, such as the Apache Commons Collections and the AspectJ compiler. We also find that 99% of the dynamically inferred second-order constraints we sampled are true.
Z3-str: A Z3-Based String Solver for Web Application Analysis
Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh
(Purdue University, USA; University of Waterloo, Canada)
Analyzing web applications requires reasoning about strings and non-strings cohesively. Existing string solvers either ignore non-string program behavior or support a limited set of string operations. In this paper, we develop a general-purpose string solver, called Z3-str, as an extension of the Z3 SMT solver through its plug-in interface. Z3-str treats strings as a primitive type, thus avoiding the inherent limitations observed in many existing solvers that encode strings in terms of other primitives. The logic of the plug-in has three sorts, namely, bool, int and string. The string-sorted terms include string constants and variables of arbitrary length, with functions such as concatenation, sub-string, and replace. The int-sorted terms are standard, with the exception of the length function over string terms. The atomic formulas are equations over string terms, and (in)-equalities over integer terms. Not only does our solver have features that enable whole-program symbolic, static, and dynamic analysis, but it also performs better than other solvers in our experiments. The application of Z3-str in remote code execution detection shows that its support of a wide spectrum of string operations is key to reducing false positives.

Empirical Studies I

An Empirical Analysis of the Co-evolution of Schema and Code in Database Applications
Dong Qiu, Bixin Li, and Zhendong Su
(Southeast University, China; UC Davis, USA)
Modern database applications are among the most widely used and complex software systems. They constantly evolve, responding to changes to data, database schemas, and code. It is challenging to manage these changes and ensure that everything co-evolves consistently. For example, when a database schema is modified, all the code that interacts with the database must be changed accordingly. Although database evolution and software evolution have been extensively studied in isolation, the co-evolution of schema and code has largely been unexplored. This paper presents the first comprehensive empirical analysis of the co-evolution of database schemas and code in ten popular large open-source database applications, totaling over 160K revisions. Our major findings include: 1) Database schemas evolve frequently during the application lifecycle, exhibiting a variety of change types with similar distributions across the studied applications; 2) Overall, schema changes induce significant code-level modifications, while certain change types have more impact on code than others; and 3) Co-change analyses can be viable to automate or assist with database application evolution. We have also observed that: 1) 80% of the schema changes happened in 20-30% of the tables, while nearly 40% of the tables did not change; and 2) Referential integrity constraints and stored procedures are rarely used in our studied subjects. We believe that our study reveals new insights into how database applications evolve and useful guidelines for designing assistive tools to aid their evolution.
Automated Oracles: An Empirical Study on Cost and Effectiveness
Cu D. Nguyen, Alessandro Marchetto, and Paolo Tonella
(Fondazione Bruno Kessler, Italy)
Software testing is an effective, yet expensive, method to improve software quality. Test automation, a potential way to reduce testing cost, has received enormous research attention recently, but the so-called “oracle problem” (how to decide the PASS/FAIL outcome of a test execution) is still a major obstacle to such cost reduction. We have extensively investigated state-of-the-art work that addresses this problem, from areas such as specification mining and model inference. In this paper, we compare three types of automated oracles: Data invariants, Temporal invariants, and Finite State Automata. More specifically, we study the training cost and the false positive rate, and we also evaluate their fault detection capability. Seven medium to large industrial application subjects and real faults have been used in our empirical investigation.
Sample Size vs. Bias in Defect Prediction
Foyzur Rahman, Daryl Posnett, Israel Herraiz, and Premkumar Devanbu
(UC Davis, USA; Universidad Politécnica de Madrid, Spain)
Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of studies based on biased datasets may be suspect. This issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor in these datasets that has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples could be smaller, or larger. Smaller datasets in general provide less reliable bases for estimating models, and thus could lead to inferior model performance. In this setting, we ask the question: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset which is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUCROC and F-score. This indicates that, at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further issues to be explored in the future.

Parallel, Concurrent, and Distributed Systems

Finding Incorrect Compositions of Atomicity
Peng Liu, Julian Dolby, and Charles Zhang
(Hong Kong University of Science and Technology, China; IBM Research, USA)
In object-oriented code, atomicity is ideally isolated in a library which encapsulates shared program state and provides atomic APIs for access. The library provides a convenient way for programmers to reason about the needed synchronization. However, as the library exports a limited set of APIs, it cannot satisfy every unplanned atomicity demand; therefore, clients may have to compose invocations of the library APIs to obtain new atomic functionality. This process is error-prone due to the complexity of reasoning required, hence tool support for uncovering incorrect compositions (i.e., atomic compositions that are implemented incorrectly) would be very helpful. A key difficulty is how to determine the intended atomic compositions, which are rarely documented. Existing inference techniques cannot be used to infer the atomic compositions because they cannot recognize the library and the client, which requires understanding the related program state. Even if extended to support the library/client, they lead to many false positives or false negatives because they miss the key program logic which reflects programmers’ coding paradigms for atomic compositions. We define a new inference technique which identifies intended atomic compositions using two key symptoms based on program dependence. We then check dynamically whether these atomic compositions are implemented incorrectly as non-atomic. Evaluation on thirteen large applications shows that our approach finds around 50 previously unknown incorrect compositions. Further study on Tomcat shows that almost half (5 out of 12) of discovered incorrect compositions are confirmed as bugs by the developers. Given that Tomcat is heavily used in 250,000 sites including Linkedin.com and Ebay.com, we believe finding multiple new bugs in it automatically with relatively few false positives supports our heuristics for determining intended atomicity.
Tightfit: Adaptive Parallelization with Foresight
Omer Tripp and Noam Rinetzky
(Tel Aviv University, Israel)
Irregular applications often exhibit data-dependent parallelism: Different inputs, and sometimes also different execution phases, enable different levels of parallelism. These changes in available parallelism have motivated work on adaptive concurrency control mechanisms. Existing adaptation techniques mostly learn about available parallelism indirectly, through runtime monitors that detect pathologies (e.g. excessive retries in speculation or high lock contention in mutual exclusion). We present a novel approach to adaptive parallelization, whereby the effective level of parallelism is predicted directly based on input features, rather than through circumstantial indicators over the execution environment (such as retry rate). This enables adaptation with foresight, based on the input data and not the run prefix. For this, the user specifies input features, which our system then correlates with the amount of available parallelism through offline learning. The resulting prediction rule serves in deployment runs to foresee the available parallelism for a given workload and tune the parallelization system accordingly. We have implemented our approach in TIGHTFIT, a general framework for input-centric offline adaptation. Our experimental evaluation of TIGHTFIT over two adaptive runtime systems and eight benchmarks provides positive evidence regarding TIGHTFIT's efficacy and accuracy.
Distributed Program Tracing
Diptikalyan Saha, Pankaj Dhoolia, and Gaurab Paul
(IBM Research, India; IIT Kharagpur, India)
Dynamic program analysis techniques depend on accurate program traces. Program instrumentation is commonly used to collect these traces, which adds overhead to the program execution. Various techniques have addressed this problem by minimizing the number of probes/witnesses used to collect traces. In this paper, we present a novel distributed trace collection framework wherein a program is executed multiple times with the same input for different sets of witnesses. The partial traces thus obtained are then merged to create the whole program trace. This divide-and-conquer strategy enables parallel collection of partial traces, thereby reducing the total collection time. The problem is particularly challenging, as an arbitrary distribution of witnesses cannot guarantee correct formation of traces. We provide and prove a necessary and sufficient condition for distributing the witnesses that ensures correct trace formation. Moreover, we describe witness distribution strategies that are suitable for parallel collection. We use the framework to collect traces of field SAP-ABAP programs using breakpoints as witnesses, since instrumentation cannot be performed due to practical constraints. To optimize such collection, we extend Ball and Larus's optimal edge-based profiling algorithm to an optimal node-based algorithm. We demonstrate the effectiveness of the framework for collecting traces of SAP-ABAP programs.

Software Development Activities

Will You Still Compile Me Tomorrow? Static Cross-Version Compiler Validation
Chris Hawblitzel, Shuvendu K. Lahiri, Kshama Pawar, Hammad Hashmi, Sedar Gokbulut, Lakshan Fernando, Dave Detlefs, and Scott Wadsworth
(Microsoft, USA)
This paper describes a cross-version compiler validator and measures its effectiveness on the CLR JIT compiler. The validator checks for semantically equivalent assembly language output from various versions of the compiler, including versions across a seven-month time period, across two architectures (x86 and ARM), across two compilation scenarios (JIT and MDIL), and across optimizations levels. For month-to-month comparisons, the validator achieves a false alarm rate of just 2.2%. To help understand reported semantic differences, the validator performs a root-cause analysis on the counterexample traces generated by the underlying automated theorem proving tools. This root-cause analysis groups most of the counterexamples into a small number of buckets, reducing the number of counterexamples analyzed by hand by anywhere from 53% to 96%. The validator ran on over 500,000 methods across a large suite of test programs, finding 12 previously unknown correctness and performance bugs in the CLR compiler.
Convergent Contemporary Software Peer Review Practices
Peter C. Rigby and Christian Bird
(Concordia University, Canada; Microsoft Research, USA)
Software peer review is practiced on a diverse set of software projects that have drastically different settings, cultures, incentive systems, and time pressures. In an effort to characterize and understand these differences we examine two Google-led projects, Android and Chromium OS, three Microsoft projects, Bing, Office, and MS SQL, and projects internal to AMD. We contrast our findings with data taken from traditional software inspection conducted on a Lucent project and from open source software peer review on six projects, including Apache, Linux, and KDE. Our measures of interest include the review interval, the number of developers involved in review, and proxy measures for the number of defects found during review. We find that despite differences among projects, many of the characteristics of the review process have independently converged to similar values which we think indicate general principles of code review practice. We also introduce a measure of the degree to which knowledge is shared during review. This is an aspect of review practice that has traditionally only had experiential support. Our knowledge sharing measure shows that conducting peer review increases the number of distinct files a developer knows about by 66% to 150% depending on the project. This paper is one of the first studies of contemporary review in software firms and the most diverse study of peer review to date.
Do All Task Dependencies Require Coordination? The Role of Task Properties in Identifying Critical Coordination Needs in Software Projects
Kelly Blincoe, Giuseppe Valetto, and Daniela Damian
(Drexel University, USA; University of Victoria, Canada)
Several methods exist to detect the coordination needs within software teams. Evidence exists that developers’ awareness about coordination needs improves work performance. Distinguishing with certainty between critical and trivial coordination needs and identifying and prioritizing which specific tasks a pair of developers should coordinate about remains an open problem. We investigate what work dependencies should be considered when establishing coordination needs within a development team. We use our conceptualization of work dependencies named Proximity and leverage machine learning techniques to analyze what additional task properties are indicative of coordination needs. In a case study of the Mylyn project, we were able to identify from all potential coordination requirements a subset of 17% that are most critical. We define critical coordination requirements as those that can cause the most disruption to task duration when left unmanaged. These results imply that coordination awareness tools could be enhanced to make developers aware of only the coordination needs that can bring about the highest performance benefit.

Testing II

Dynodroid: An Input Generation System for Android Apps
Aravind Machiry, Rohan Tahiliani, and Mayur Naik
(Georgia Tech, USA)
We present a system, Dynodroid, for generating relevant inputs to unmodified Android apps. Dynodroid views an app as an event-driven program that interacts with its environment by means of a sequence of events through the Android framework. By instrumenting the framework once and for all, Dynodroid monitors the reaction of an app upon each event in a lightweight manner, using it to guide the generation of the next event to the app. Dynodroid also allows interleaving events from machines, which are better at generating a large number of simple inputs, with events from humans, who are better at providing intelligent inputs. We evaluated Dynodroid on 50 open-source Android apps, and compared it with two prevalent approaches: users manually exercising apps, and Monkey, a popular fuzzing tool. Dynodroid, humans, and Monkey covered 55%, 60%, and 53%, respectively, of each app's Java source code on average. Monkey took 20X more events on average than Dynodroid. Dynodroid also found 9 bugs in 7 of the 50 apps, and 6 bugs in 5 of the top 1,000 free apps on Google Play.
Publisher's Version Article Search Info
KATCH: High-Coverage Testing of Software Patches
Paul Dan Marinescu and Cristian Cadar
(Imperial College London, UK)
One of the distinguishing characteristics of software systems is that they evolve: new patches are committed to software repositories and new versions are released to users on a continuous basis. Unfortunately, many of these changes bring unexpected bugs that break the stability of the system or affect its security. In this paper, we address this problem using a technique for automatically testing code patches. Our technique combines symbolic execution with several novel heuristics based on static and dynamic program analysis which allow it to quickly reach the code of the patch. We have implemented our approach in a tool called KATCH, which we have applied to all the patches written in a combined period of approximately six years for nineteen mature programs from the popular GNU diffutils, GNU binutils and GNU findutils utility suites, which are shipped with virtually all UNIX-based distributions. Our results show that KATCH can automatically synthesise inputs that significantly increase the patch coverage achieved by the existing manual test suites, and find bugs at the moment they are introduced.
Publisher's Version Article Search Info
Termination Proofs from Tests
Aditya V. Nori and Rahul Sharma
(Microsoft Research, India; Stanford University, USA)
We show how a test suite for a sequential program can be profitably used to construct a termination proof. In particular, we describe an algorithm, TpT, for proving termination of a program based on information derived from testing it. TpT iterates over two phases: (a) an infer phase and (b) a validate phase. In the infer phase, machine learning (in particular, linear regression) is used to efficiently compute a candidate loop bound for every loop in the program. These loop bounds are verified for correctness by an off-the-shelf safety checker. If a loop bound is invalid, the safety checker provides a test or a counterexample that is used to generate more data, which the next infer phase uses to compute better estimates for the loop bounds. On the other hand, if all loop bounds are valid, then we have a proof of termination. We also describe a simple extension to our approach that allows us to infer polynomial loop bounds automatically. We have evaluated TpT on two benchmark sets: micro-benchmarks obtained from recent literature on program termination, and Windows device drivers. Our results are promising -- on the micro-benchmarks, TpT proves termination on 15% more benchmarks than any previously known technique, and our evaluation on Windows device drivers demonstrates TpT's ability to analyze and scale to real-world applications.
Publisher's Version Article Search
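The infer/validate loop that TpT is built around can be sketched as follows. This is an illustrative approximation only, not the authors' implementation: the toy program, the least-squares fit, and the candidate-input "validation" loop are all stand-ins for the paper's instrumented programs and off-the-shelf safety checker.

```python
# Toy loop whose trip count we want to bound (stand-in for a program under analysis).
def observed_iterations(n):
    count, i = 0, 0
    while i < n:
        i += 2
        count += 1
    return count

def fit_linear_bound(samples):
    # Infer phase: least-squares slope over (input, iterations) pairs,
    # with the intercept lifted so the candidate bound holds on every sample.
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in samples) / var if var else 0.0
    b = max(y - a * x for x, y in samples)
    return a, b

def infer_validate(initial_inputs, candidate_inputs):
    samples = [(n, observed_iterations(n)) for n in initial_inputs]
    while True:
        a, b = fit_linear_bound(samples)
        # Validate phase stand-in: search the candidate inputs for a counterexample.
        cex = next((n for n in candidate_inputs
                    if observed_iterations(n) > a * n + b + 1e-9), None)
        if cex is None:
            return a, b  # bound holds on all candidates: evidence of termination
        samples.append((cex, observed_iterations(cex)))  # refine and re-infer
```

Because the intercept is lifted to cover every sample, each counterexample is a previously unseen input, so the loop terminates once the bound covers the finite candidate set.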

Dynamic Analysis

SPLat: Lightweight Dynamic Analysis for Reducing Combinatorics in Testing Configurable Systems
Chang Hwan Peter Kim, Darko Marinov, Sarfraz Khurshid, Don Batory, Sabrina Souto, Paulo Barros, and Marcelo d'Amorim
(University of Texas at Austin, USA; University of Illinois at Urbana-Champaign, USA; Groupon, USA; Federal University of Pernambuco, Brazil)
Many programs can be configured through dynamic and/or static selection of configuration variables. A software product line (SPL), for example, specifies a family of programs where each program is defined by a unique combination of features. Systematically testing SPL programs is expensive because it can require running each test against a combinatorial number of configurations. Fortunately, a test is often independent of many configuration variables and need not be run against every combination; configurations that are not required for a test can be pruned from execution. This paper presents SPLat, a new way to dynamically prune irrelevant configurations: the configurations to run for a test can be determined during test execution by monitoring accesses to configuration variables. SPLat achieves an optimal reduction in the number of configurations and is lightweight compared to prior work that used static analysis and heavyweight dynamic execution. Experimental results on 10 SPLs written in Java show that SPLat substantially reduces the total test execution time in many cases. Moreover, we demonstrate the scalability of SPLat by applying it to a large industrial code base written in Ruby on Rails.
Publisher's Version Article Search Info
Cachetor: Detecting Cacheable Data to Remove Bloat
Khanh Nguyen and Guoqing Xu
(UC Irvine, USA)
Modern object-oriented software commonly suffers from runtime bloat that significantly affects its performance and scalability. Studies have shown that one important pattern of bloat is work repeatedly done to compute the same data values. Often the cost of such computation is high, and it is thus beneficial to memoize the invariant data values for later use. While this is a common practice in real-world development, manually finding invariant data values is a daunting task during development and tuning. To help developers quickly find such optimization opportunities, we propose a novel run-time profiling tool, called Cachetor, which uses a combination of dynamic dependence profiling and value profiling to identify and report operations that keep generating identical data values. The major challenge in the design of Cachetor is that both dependence and value profiling are extremely expensive techniques that cannot scale to the large, real-world applications for which optimizations matter most. To overcome this challenge, we propose a series of novel abstractions that are applied to run-time instruction instances during profiling, yielding significantly improved analysis time and scalability. We have implemented Cachetor in the Jikes Research Virtual Machine and evaluated it on a set of 14 large Java applications. Our experimental results suggest that Cachetor is effective in exposing caching opportunities, and that substantial performance gains can be achieved by modifying a program to cache the reported data.
Publisher's Version Article Search
Effective Dynamic Detection of Alias Analysis Errors
Jingyue Wu, Gang Hu, Yang Tang, and Junfeng Yang
(Columbia University, USA)
Alias analysis is perhaps one of the most crucial and widely used analyses, and has attracted tremendous research effort over the years. Yet, advanced alias analyses are extremely difficult to get right, and bugs in these analyses are one key reason they have not been adopted by production compilers. This paper presents NeonGoby, a system for effectively detecting errors in alias analysis implementations, improving their correctness and hopefully widening their adoption. NeonGoby detects the worst type of bug, in which the alias analysis claims that two pointers never alias but they actually alias at runtime. NeonGoby works by dynamically observing pointer addresses during the execution of a test program and then checking these addresses against an alias analysis for errors. It is explicitly designed to (1) be agnostic to the alias analysis it checks, for maximum applicability and ease of use, and (2) detect alias analysis errors that manifest on real-world programs and workloads. It emits no false positives as long as test programs do not have undefined behavior per the ANSI C specification or call external functions that interfere with our detection algorithm. It reduces performance overhead using a practical selection of techniques. Evaluation on three popular alias analyses and the real-world programs Apache and MySQL shows that NeonGoby effectively finds 29 alias analysis bugs with zero false positives and reasonable overhead; the four most serious bugs have been patched by the developers. To enable alias analysis builders to start using NeonGoby today, we have released it open-source at https://github.com/columbia/neongoby, along with our error detection results and proposed patches.
Publisher's Version Article Search Info
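The core dynamic check — did two pointers the analysis claims never alias in fact hold the same address at runtime — can be sketched as follows. The trace format is hypothetical; the real tool instruments LLVM programs and also accounts for address reuse across allocations, which this sketch deliberately ignores:

```python
def check_no_alias_claims(no_alias_pairs, trace):
    """no_alias_pairs: pairs of pointer ids the analysis claims never alias.
    trace: list of (pointer_id, address) observations from a test run.
    Returns the claimed pairs whose pointers were seen holding the same address."""
    seen = {}
    for ptr, addr in trace:
        seen.setdefault(ptr, set()).add(addr)
    bugs = []
    for p, q in no_alias_pairs:
        if seen.get(p, set()) & seen.get(q, set()):
            bugs.append((p, q))  # analysis said NoAlias; the runtime disagrees
    return bugs
```

A pair reported here is strong evidence of an alias analysis bug, since a sound "never alias" claim can have no runtime witness.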

Models and Features

Feature Model Extraction from Large Collections of Informal Product Descriptions
Jean-Marc Davril, Edouard Delfosse, Negar Hariri, Mathieu Acher, Jane Cleland-Huang, and Patrick Heymans
(University of Namur, Belgium; DePaul University, USA; University of Rennes I, France; INRIA, France)
Feature Models (FMs) are used extensively in software product line engineering to help generate and validate individual product configurations and to provide support for domain analysis. As FM construction can be tedious and time-consuming, researchers have previously developed techniques for extracting FMs from sets of formally specified individual configurations, or from software requirements specifications for families of existing products. However, such artifacts are often not available. In this paper we present a novel, automated approach for constructing FMs from publicly available product descriptions found in online product repositories and marketing websites such as SoftPedia and CNET. While each individual product description provides only a partial view of features in the domain, a large set of descriptions can provide fairly comprehensive coverage. Our approach utilizes hundreds of partial product descriptions to construct an FM; we describe it in detail and evaluate it against antivirus product descriptions mined from SoftPedia.
Publisher's Version Article Search
N-Way Model Merging
Julia Rubin and Marsha Chechik
(IBM Research, Israel; University of Toronto, Canada)
Model merging is widely recognized as an essential step in a variety of software development activities. During the process of combining a set of related products into a product line or consolidating model views of multiple stakeholders, we need to merge multiple input models into one; yet, most existing approaches are applicable to merging only two models. In this paper, we define the n-way merge problem. We show that it can be reduced to the known and widely studied NP-hard problem of weighted set packing. Yet, the approximation algorithms for that problem do not scale to realistically sized software models. We thus evaluate alternative approaches that merge models incrementally, processing the input models in small subsets, and propose our own algorithm that considerably improves precision over such approaches without sacrificing performance.
Publisher's Version Article Search
Compiling Mockups to Flexible UIs
Nishant Sinha and Rezwana Karim
(IBM Research, India; Rutgers University, USA)
As the web becomes ubiquitous, developers are obliged to develop web applications for a variety of desktop and mobile platforms. Redesigning the user interface for every such platform is clearly cumbersome. We propose a new framework based on model-based compilation to assist the designer in solving this problem. Starting from an under-specified visual design mockup drawn by the designer, we show how faithful and flexible web pages can be obtained with virtually no manual effort. Our framework, in sharp contrast to existing web design tools, overcomes the tough challenges involved in mockup compilation by (a) employing combinatorial search to infer hierarchical layouts and (b) mechanizing ad hoc principles for CSS design into a modular, extensible rule-based architecture. We believe ours is the first disciplined effort to solve the problem and will inspire rapid, low-effort web design.
Publisher's Version Article Search Video Info

Test and Analysis

Making Offline Analyses Continuous
Kıvanç Muşlu, Yuriy Brun, Michael D. Ernst, and David Notkin
(University of Washington, USA; University of Massachusetts at Amherst, USA)
Developers use analysis tools to help write, debug, and understand software systems under development. A developer's change to the system source code may affect analysis results. Typically, to learn those effects, the developer must explicitly initiate the analysis. This may interrupt the developer's workflow and/or delay the moment the developer learns the implications of the change. The situation is even worse for impure analyses — ones that modify the code on which they run — because such analyses block the developer from working on the code. This paper presents Codebase Replication, a novel approach for easily converting an offline analysis — even an impure one — into a continuous analysis that informs the developer of the implications of recent changes as quickly as possible after the change is made. Codebase Replication copies the developer's codebase, incrementally keeps this copy in sync with the developer's codebase, makes the copy available for offline analyses to run without disturbing the developer and without the developer's changes disturbing the analyses, and makes analysis results available for presentation to the developer. We have implemented Codebase Replication in Solstice, an open-source, publicly available Eclipse plug-in. We have used Solstice to convert three offline analyses — FindBugs, PMD, and unit testing — into continuous ones. Each conversion required on average 436 NCSL and took, on average, 18 hours. Solstice-based analyses incur no more than 2.5 milliseconds of runtime overhead per developer action.
Publisher's Version Article Search
Regression Tests to Expose Change Interaction Errors
Marcel Böhme, Bruno C. d. S. Oliveira, and Abhik Roychoudhury
(National University of Singapore, Singapore)
Changes often introduce program errors, and hence recent software testing literature has focused on generating tests that stress changes. In this paper, we argue that changes cannot be treated as isolated program artifacts to be stressed via testing. Instead, it is the complex dependency across multiple changes that introduces subtle errors. Furthermore, the complex dependence structures that need to be exercised to expose such errors ensure that they remain undiscovered even in well-tested and deployed software. We motivate our work with empirical evidence from a well-tested and stable project - GNU Coreutils - where we found that, for the utilities we investigated, one third of the regressions take more than two (2) years to be fixed, and two thirds of such long-standing regressions are introduced due to change interactions. To combat change interaction errors, we first define a notion of change interaction in which several program changes are found to affect the result of a program statement via program dependencies. Based on this notion, we propose a change sequence graph (CSG) to summarize the control flow and dependencies across changes. The CSG is then used as a guide during program path exploration via symbolic execution - thereby efficiently producing test cases that witness change interaction errors. Our experimental infrastructure was deployed on various utilities of GNU Coreutils, which have been distributed with Linux for almost twenty years. Apart from finding five (5) previously unknown errors in the utilities, we found that only one in five generated test cases exercises a sequence that is critical to exposing a change-interaction error, while being an order of magnitude more likely to expose an error. On the other hand, stressing changes in isolation exposed only half of the change interaction errors. These results demonstrate the importance and difficulty of change-dependence-aware regression testing.
Publisher's Version Article Search
Differential Assertion Checking
Shuvendu K. Lahiri, Kenneth L. McMillan, Rahul Sharma, and Chris Hawblitzel
(Microsoft Research, USA; Stanford University, USA)
A previous version of a program can be a powerful enabler for program analysis, by defining new relative specifications and making the results of current program analysis more relevant. In this paper, we describe the approach of differential assertion checking (DAC) for comparing different versions of a program with respect to a set of assertions. DAC provides a natural way to write relative specifications over two programs. We introduce a novel modular approach to DAC by reducing it to safety checking of a composed program, which can be accomplished by standard program verifiers. In particular, we leverage automatic invariant generation to synthesize relative specifications for pairs of loops and procedures. We provide a preliminary evaluation of a prototype implementation within the SymDiff tool along two directions: (a) soundly verifying bug fixes in the presence of loops, and (b) providing a knob for suppressing alarms when checking a new version of a program.
Publisher's Version Article Search

Maintenance and Evolution

Preventing Database Deadlocks in Applications
Mark Grechanik, B. M. Mainul Hossain, Ugo Buy, and Haisheng Wang
(University of Illinois at Chicago, USA; Oracle, USA)
Many organizations deploy applications that use databases by sending Structured Query Language (SQL) statements to them and obtaining the data that result from executing those statements. Since applications often share the same databases concurrently, database deadlocks routinely occur, resulting in major performance degradation in these applications. Database engines do not prevent database deadlocks for the same reason that the schedulers of operating system kernels do not preempt processes in a way that avoids race conditions and deadlocks - it is not feasible to quickly find an optimal context-switching schedule for multiple processes (or SQL statements), and the overhead of doing so is prohibitive. We created a novel approach that combines run-time monitoring, which automatically prevents database deadlocks, with static analysis, which detects hold-and-wait cycles specifying how resources (e.g., database tables) are held in contention during executions of SQL statements. We rigorously evaluated our approach. For a realistic case of over 1,200 SQL statements, our algorithm detects all hold-and-wait cycles in less than two seconds. We built a toolset and experimented with three applications. Our tool prevented all existing database deadlocks in these applications and increased their throughput by up to three orders of magnitude.
Publisher's Version Article Search Info
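The static half of such an approach — finding hold-and-wait cycles among SQL statements — amounts to cycle detection in a graph whose edges say "statement s1 waits for a table that s2 holds". A minimal sketch, with a hypothetical input format rather than the authors' tool:

```python
def find_cycle(holds_waits):
    """holds_waits: {statement: (held_tables, waited_tables)}.
    Builds an edge s1 -> s2 whenever s1 waits on a table s2 holds,
    and returns one cycle of statements, or None if there is no cycle."""
    edges = {s: set() for s in holds_waits}
    for s1, (_, waits) in holds_waits.items():
        for s2, (held, _) in holds_waits.items():
            if s1 != s2 and waits & held:
                edges[s1].add(s2)

    # Depth-first search with the usual white/grey/black coloring.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {s: WHITE for s in edges}
    stack = []

    def dfs(s):
        color[s] = GREY
        stack.append(s)
        for t in edges[s]:
            if color[t] == GREY:            # back edge: cycle found
                return stack[stack.index(t):]
            if color[t] == WHITE:
                cycle = dfs(t)
                if cycle:
                    return cycle
        stack.pop()
        color[s] = BLACK
        return None

    for s in edges:
        if color[s] == WHITE:
            cycle = dfs(s)
            if cycle:
                return cycle
    return None
```

For example, a statement holding `orders` while waiting on `items`, concurrent with one holding `items` while waiting on `orders`, forms the classic two-statement cycle.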
Identifying Message Flow in Distributed Event-Based Systems
Joshua Garcia, Daniel Popescu, Gholamreza Safi, William G. J. Halfond, and Nenad Medvidovic
(University of Southern California, USA)
Distributed event-based (DEB) systems contain highly-decoupled components that interact by exchanging messages. This enables flexible system composition and adaptation, but also makes DEB systems difficult to maintain. Most existing program analysis techniques to support maintenance are not well suited to DEB systems, while those that are tend to suffer from inaccuracy or make assumptions that limit their applicability. This paper presents Eos, a static analysis technique that identifies message information useful for maintaining a DEB system, namely, message types and message flow within a system. Eos has been evaluated on six off-the-shelf DEB systems spanning five different middleware platforms, and has exhibited excellent accuracy and efficiency. Furthermore, a case study involving a range of maintenance activities undertaken on three existing DEB systems shows that, on average, Eos enables an engineer to identify the scope and impact of required changes more accurately than existing alternatives.
Publisher's Version Article Search Info
Improving Trace Accuracy through Data-Driven Configuration and Composition of Tracing Features
Sugandha Lohar, Sorawit Amornborvornwong, Andrea Zisman, and Jane Cleland-Huang
(DePaul University, USA; Open University, UK)
Software traceability is a sought-after, yet often elusive, quality in large software-intensive systems, primarily because the cost and effort of tracing can be overwhelming. State-of-the-art solutions address this problem by utilizing trace retrieval techniques to automate the process of creating and maintaining trace links. However, there is no simple one-size-fits-all solution to trace retrieval. As this paper will show, finding the right combination of tracing techniques can lead to significant improvements in the quality of generated links. We present a novel approach to trace retrieval in which the underlying infrastructure is configured at runtime to optimize trace quality. We utilize a machine-learning approach to search for the best configuration given an initial training set of validated trace links, a set of available tracing techniques specified in a feature model, and an architecture capable of instantiating all valid configurations of features. We evaluate our approach through a series of experiments using project data from the transportation, healthcare, and space exploration domains, and discuss its implementation in an industrial environment. Finally, we show how our approach can create a robust baseline against which new tracing techniques can be evaluated.
Publisher's Version Article Search

Formal Verification

Precision Reuse for Efficient Regression Verification
Dirk Beyer, Stefan Löwe, Evgeny Novikov, Andreas Stahlbauer, and Philipp Wendler
(University of Passau, Germany; ISP RAS, Russia)
Continuous testing during development is a well-established technique for software-quality assurance. Continuous model checking from revision to revision is not yet established as a standard practice, because its enormous resource consumption makes it impractical. Model checkers compute a large number of verification facts that are necessary for verifying whether a given specification holds. We have identified a category of such intermediate results that are easy to store and efficient to reuse: abstraction precisions. The precision of an abstract domain specifies the level of abstraction that the analysis works on. Precisions are thus a precious result of the verification effort, and it is a waste of resources to throw them away after each verification run. In particular, precisions are reasonably small and thus easy to store; they are easy to process and have a large impact on resource consumption. We experimentally show the impact of precision reuse on industrial verification problems created from 62 Linux kernel device drivers with 1119 revisions.
Publisher's Version Article Search Info
Cascading Verification: An Integrated Method for Domain-Specific Model Checking
Fokion Zervoudakis, David S. Rosenblum, Sebastian Elbaum, and Anthony Finkelstein
(University College London, UK; National University of Singapore, Singapore; University of Nebraska-Lincoln, USA)
Model checking is an established method for verifying behavioral properties of system models. But model checkers tend to support low-level modeling languages that require intricate models to represent even the simplest systems. Modeling complexity arises in part from the need to encode domain knowledge at relatively low levels of abstraction. In this paper, we demonstrate that formalized domain knowledge can be reused to raise the abstraction level of model and property specifications, and hence the effectiveness of model checking. We describe a novel method for domain-specific model checking called cascading verification that uses composite reasoning over high-level system specifications and formalized domain knowledge to synthesize both low-level system models and their behavioral properties for verification. In particular, model builders use a high-level domain-specific language (DSL) based on YAML to express system specifications that can be verified with probabilistic model checking. Domain knowledge is encoded in the Web Ontology Language (OWL), the Semantic Web Rule Language (SWRL) and Prolog, which are used in combination to overcome their individual limitations. A compiler then synthesizes models and properties for verification by the probabilistic model checker PRISM. We illustrate cascading verification for the domain of uninhabited aerial vehicles (UAVs), for which we have constructed a prototype implementation. An evaluation of this prototype reveals nontrivial reductions in the size and complexity of input specifications compared to the artifacts synthesized for PRISM.
Publisher's Version Article Search
Enhancing Symbolic Execution with Built-In Term Rewriting and Constrained Lazy Initialization
Pietro Braione, Giovanni Denaro, and Mauro Pezzè
(University of Milano-Bicocca, Italy; University of Lugano, Switzerland)
Symbolic execution suffers when analyzing programs that take complex data structures as inputs and make decisions over non-linear expressions. For such programs, symbolic execution may generate invalid inputs, follow infeasible traces without identifying them as such, and raise large numbers of false alarms. Some symbolic executors tackle these problems by introducing executable preconditions to exclude invalid inputs, and some solvers exploit rewrite rules to address non-linear problems. In this paper, we discuss the core limitations of executable preconditions, and address these limitations by proposing invariants specifically designed to harmonize with the lazy initialization algorithm. We exploit rewrite rules applied within the symbolic executor to simplify inverse relationships arising from either program-specific calculations or the logic of the verification tasks. We present a symbolic executor that integrates the two techniques, and validate our approach on the verification of a relevant set of properties of the Tactical Separation Assisted Flight Environment. The empirical data show that the integrated approach can improve the effectiveness of symbolic execution.
Publisher's Version Article Search Info

Model Inference and Synthesis

Mining Behavior Models from Enterprise Web Applications
Matthias Schur, Andreas Roth, and Andreas Zeller
(SAP, Germany; Saarland University, Germany)
Today's enterprise web applications demand very rapid release cycles---and consequently, frequent tests. Automating these tests typically requires a behavior model: a description of the states the application can be in, the transitions between these states, and the expected results. Furthermore, one needs scripts to make the abstract actions (transitions) in the model executable. As specifying such behavior models and writing the necessary scripts manually is a hard task, a possible alternative is to extract them from existing applications. However, mining such models can be a challenge, in particular because one needs to know when two states are equivalent, as well as how to reach a given state. We present ProCrawl (PROcess CRAWLer), a generic approach to mine behavior models from (multi-user) enterprise web applications. ProCrawl observes the behavior of the application through its user interface, and generates and executes tests to explore unobserved behavior. In our evaluation on three non-trivial web applications (an open-source shop system, an SAP product compliance application, and an open-source conference manager), ProCrawl produces models that precisely abstract application behavior and that can be directly used for effective model-based regression testing.
Publisher's Version Article Search
Incrementally Synthesizing Controllers from Scenario-Based Product Line Specifications
Joel Greenyer, Christian Brenner, Maxime Cordy, Patrick Heymans, and Erika Gressi
(Leibniz Universität Hannover, Germany; University of Paderborn, Germany; University of Namur, Belgium; Politecnico di Milano, Italy)
Many software-intensive systems consist of components that interact to fulfill complex functionality. Moreover, often many variants of such systems have to be designed at once. This adds complexity to the design task. Recently, we proposed a scenario-based approach to design product lines, which combines feature diagrams and Modal Sequence Diagrams. We proposed a consistency-checking technique based on a dedicated product line model checker. One limitation of this technique is that it is incomplete, i.e., it may fail to show the consistency of some consistent specifications. In this paper we propose a new game-based approach that overcomes this incompleteness and, in addition, automatically synthesizes controllers for the consistent product specifications. We exploit the fact that many variants are similar and efficiently synthesize product controllers incrementally. We provide a prototype tool and evaluate the efficiency of the approach.
Publisher's Version Article Search
Synthesis of Component and Connector Models from Crosscutting Structural Views
Shahar Maoz, Jan Oliver Ringert, and Bernhard Rumpe
(Tel Aviv University, Israel; RWTH Aachen University, Germany)
We present component and connector (C&C) views, which specify structural properties of component and connector models in an expressive and intuitive way. C&C views provide means to abstract away direct hierarchy, direct connectivity, port names and types, and thus can crosscut the traditional boundaries of the implementation-oriented hierarchical decomposition of systems and sub-systems, and reflect the partial knowledge available to different stakeholders involved in a system's design. As a primary application for C&C views we investigate the synthesis problem: given a C&C views specification, consisting of mandatory, alternative, and negative views, construct a concrete satisfying C&C model, if one exists. We show that the problem is NP-hard and solve it, in a bounded scope, using a reduction to SAT, via Alloy. We further extend the basic problem with support for library components, specification patterns, and architectural styles. The result of synthesis can be used for further exploration, simulation, and refinement of the C&C model or, as the complete, final model itself, for direct code generation. A prototype tool and an evaluation over four example systems with multiple specifications show promising results and suggest interesting future research directions towards a comprehensive development environment for the structure of component and connector designs.
Publisher's Version Article Search Info

Empirical Studies II

Searching for Better Configurations: A Rigorous Approach to Clone Evaluation
Tiantian Wang, Mark Harman, Yue Jia, and Jens Krinke
(Harbin Institute of Technology, China; University College London, UK)
Clone detection finds application in many software engineering activities such as comprehension and refactoring. However, the confounding configuration choice problem poses a widely-acknowledged threat to the validity of previous empirical analyses. We introduce desktop and parallelised cloud-deployed versions of a search based solution that finds suitable configurations for empirical studies. We evaluate our approach on 6 widely used clone detection tools applied to the Bellon suite of 8 subject systems. Our evaluation reports the results of 9.3 million total executions of a clone tool; the largest study yet reported. Our approach finds significantly better configurations (p < 0.05) than those currently used, providing evidence that our approach can ameliorate the confounding configuration choice problem.
Publisher's Version Article Search Info
Diversity in Software Engineering Research
Meiyappan Nagappan, Thomas Zimmermann, and Christian Bird
(Queen’s University, Canada; Microsoft Research, USA)
One of the goals of software engineering research is to achieve generality: Are the phenomena found in a few projects reflective of others? Will a technique perform as well on projects other than the ones it is evaluated on? While it is common sense to select a sample that is representative of a population, the importance of diversity is often overlooked, yet it is just as important. In this paper, we combine ideas from representativeness and diversity and introduce a measure called sample coverage, defined as the percentage of projects in a population that are similar to the given sample. We introduce algorithms to compute the sample coverage for a given set of projects and to select the projects that increase the coverage the most. We demonstrate our technique on research presented over the span of two years at ICSE and FSE with respect to a population of 20,000 active open-source projects monitored by Ohloh.net. Knowing the coverage of a sample enhances our ability to reason about the findings of a study. Furthermore, we propose reporting guidelines for research: in addition to coverage scores, papers should discuss the target population of the research (universe) and the dimensions that can potentially influence the outcomes of the research (space).
Publisher's Version Article Search Info
API Change and Fault Proneness: A Threat to the Success of Android Apps
Mario Linares-Vásquez, Gabriele Bavota, Carlos Bernal-Cárdenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk
(College of William and Mary, USA; University of Sannio, Italy; Universidad Nacional de Colombia, Colombia; University of Molise, Italy)
In recent years, the market of mobile software applications (apps) has maintained an impressive upward trajectory. Many small and large software development companies invest considerable resources to target available opportunities. As of today, the markets for such devices feature over 850K apps for Android and over 900K for iOS. Availability, cost, functionality, and usability are just some of the factors that determine the success or failure of a given app. Among the other factors, reliability is an important criterion: users easily get frustrated by repeated failures, crashes, and other bugs, and hence abandon some apps in favor of others. This paper reports a study analyzing how the fault- and change-proneness of APIs used by 7,097 (free) Android apps relates to the apps' lack of success, estimated from user ratings. Results of this study provide important insights into a crucial issue: making heavy use of fault- and change-prone APIs can negatively impact the success of these apps.
Publisher's Version Article Search Info

JavaScript

Jalangi: A Selective Record-Replay and Dynamic Analysis Framework for JavaScript
Koushik Sen, Swaroop Kalasapur, Tasneem Brutch, and Simon Gibbs
(UC Berkeley, USA; Samsung Research, USA)
JavaScript is widely used for writing client-side web applications and is getting increasingly popular for writing mobile applications. However, unlike for C, C++, and Java, few tools are available for analysis and testing of JavaScript applications. In this paper, we present a simple yet powerful framework, called Jalangi, for writing heavy-weight dynamic analyses. Our framework incorporates two key techniques: 1) selective record-replay, a technique which enables recording and faithfully replaying a user-selected part of the program, and 2) shadow values and shadow execution, which enable easy implementation of heavy-weight dynamic analyses. Our implementation makes no special assumptions about JavaScript, which makes it applicable to real-world JavaScript programs running on multiple platforms. We have implemented concolic testing, an analysis to track the origins of nulls and undefined values, a simple form of taint analysis, an analysis to detect likely type inconsistencies, and an object allocation profiler in Jalangi. Our evaluation of Jalangi on the SunSpider benchmark suite and on five web applications shows that Jalangi has an average slowdown of 26X during recording and 30X during replay and analysis. These slowdowns are comparable with those reported for similar tools, such as PIN and Valgrind for x86 binaries. We believe that the techniques proposed in this paper are applicable to other dynamic languages.
Publisher's Version Article Search
Practical Static Analysis of JavaScript Applications in the Presence of Frameworks and Libraries
Magnus Madsen, Benjamin Livshits, and Michael Fanning
(Aarhus University, Denmark; Microsoft Research, USA; Microsoft, USA)
JavaScript is a language that is widely used for both web-based and standalone applications, such as those in the upcoming Windows 8 operating system. Analysis of JavaScript has long been known to be challenging due to the language's dynamic nature. On top of that, most JavaScript applications rely on large and complex libraries and frameworks, often written in a combination of JavaScript and native code such as C and C++. Stubs have been commonly employed as a partial specification mechanism to address the library problem; however, they are tedious to write, incomplete, and occasionally incorrect. Yet the manner in which library code is used within applications often sheds light on what library APIs return or consume as parameters. In this paper, we propose a technique which combines pointer analysis with use analysis to handle many challenges posed by large JavaScript libraries. Our approach enables a variety of applications, ranging from call graph discovery to auto-complete to supporting runtime optimizations. Our techniques have been implemented and empirically validated on a set of 25 Windows 8 JavaScript applications, averaging 1,587 lines of code, demonstrating a combination of scalability and precision.
Publisher's Version Article Search
Server Interface Descriptions for Automated Testing of JavaScript Web Applications
Casper S. Jensen, Anders Møller, and Zhendong Su
(Aarhus University, Denmark; UC Davis, USA)
Automated testing of JavaScript web applications is complicated by the communication with servers. Specifically, it is difficult to test the JavaScript code in isolation from the server code and database contents. We present a practical solution to this problem. First, we demonstrate that formal server interface descriptions are useful in automated testing of JavaScript web applications for separating the concerns of the client and the server. Second, to support the construction of server interface descriptions for existing applications, we introduce an effective inference technique that learns communication patterns from sample data. By incorporating interface descriptions into the testing tool Artemis, our experimental results show that we increase the level of automation for high-coverage testing on a collection of JavaScript web applications that exchange JSON data between the clients and servers. Moreover, we demonstrate that the inference technique can quickly and accurately learn useful server interface descriptions.
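The kind of interface description that the inference technique learns can be approximated by deriving a type descriptor from sample JSON messages. This sketch is our own simplification (the function names and the merging rule are invented, not the format Artemis consumes), but it illustrates how communication patterns can be learned from sample data.

```python
import json

def infer_type(value):
    """Map a JSON value to a simple type descriptor; objects recurse."""
    if isinstance(value, dict):
        return {k: infer_type(v) for k, v in value.items()}
    if isinstance(value, list):
        # Assume homogeneous arrays; describe them by the first element.
        return [infer_type(value[0])] if value else []
    return type(value).__name__

def infer_interface(samples):
    """Merge type descriptors from several sample responses, requiring
    that all samples agree on the shape."""
    descriptors = [infer_type(json.loads(s)) for s in samples]
    assert all(d == descriptors[0] for d in descriptors), "samples disagree"
    return descriptors[0]
```

A real inference engine would of course tolerate optional fields and unions rather than demanding exact agreement.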
Publisher's Version Article Search Info

Source Code and Programming

Explaining Inconsistent Code
Martin Schäf, Daniel Schwartz-Narbonne, and Thomas Wies
(United Nations University, China; New York University, USA)
A code fragment is inconsistent if it is not part of any normally terminating execution. Examples of such inconsistencies include code that is unreachable, code that always fails due to a run-time error, and code that makes conflicting assumptions about the program state. In this paper, we consider the problem of automatically explaining inconsistent code. This problem is difficult because traditional fault localization techniques do not apply. Our solution relies on a novel algorithm that takes an infeasible code fragment as input and generates a so-called error invariant automaton. The error invariant automaton is an abstraction of the input code fragment that only mentions program statements and facts that are relevant for understanding the cause of the inconsistency. We conducted a preliminary usability study which demonstrated that error invariant automata can help programmers better understand inconsistencies in code taken from real-world programs. In particular, access to an error invariant automaton tripled the speed at which programmers could diagnose the cause of a code inconsistency.
Publisher's Version Article Search
A Statistical Semantic Language Model for Source Code
Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen
(Iowa State University, USA)
Recent research has successfully applied the statistical n-gram language model to show that source code exhibits a good level of repetition. The n-gram model is shown to have good predictive power in supporting code suggestion and completion. However, the state-of-the-art n-gram approach captures source code regularities/patterns based only on the lexical information in a local context of the code units. To improve predictability, we introduce SLAMC, a novel statistical semantic language model for source code. It incorporates semantic information into code tokens and models the regularities/patterns of such semantic annotations, called sememes, rather than their lexemes. It combines the local context in semantic n-grams with global technical concerns/functionality in an n-gram topic model, together with pairwise associations of program elements. Based on SLAMC, we developed a new code suggestion method, which an empirical evaluation on several projects shows to achieve 18-68% relatively higher accuracy than the state-of-the-art approach.
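As a rough illustration of the lexical baseline that SLAMC improves on, a toy bigram model over code tokens can already make next-token suggestions. This is a generic n-gram sketch, not SLAMC itself, and it ignores smoothing and the semantic annotations the paper adds.

```python
from collections import Counter, defaultdict

class NgramModel:
    """A toy bigram model over code tokens: count which token follows
    which, then suggest the most frequent successor."""
    def __init__(self):
        self.next_counts = defaultdict(Counter)

    def train(self, tokens):
        for prev, nxt in zip(tokens, tokens[1:]):
            self.next_counts[prev][nxt] += 1

    def suggest(self, prev_token):
        counts = self.next_counts.get(prev_token)
        return counts.most_common(1)[0][0] if counts else None
```

Trained on tokenized source, `suggest` proposes the statistically most likely next lexeme; SLAMC makes the same kind of prediction over sememes instead.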
Publisher's Version Article Search
Crossing the Gap from Imperative to Functional Programming through Refactoring
Alex Gyori, Lyle Franklin, Danny Dig, and Jan Lahoda
(University of Illinois, USA; Ball State University, USA; Oregon State University, USA; Oracle, Czech Republic)
Java 8 introduces two functional features: lambda expressions and functional operations like map or filter that apply a lambda expression over the elements of a Collection. Refactoring existing code to use these new features enables explicit but unobtrusive parallelism and makes the code more succinct. However, the refactoring is tedious: it requires changing many lines of code. It is also error-prone: the programmer must reason about control flow, data flow, and side effects. Fortunately, such refactorings can be automated. We designed and implemented LambdaFicator, a tool which automates two refactorings. The first refactoring converts anonymous inner classes to lambda expressions. The second refactoring converts for loops that iterate over Collections to functional operations that use lambda expressions. Using 9 open-source projects, we have applied these two refactorings 1263 and 1709 times, respectively. The results show that LambdaFicator is useful: (i) it is widely applicable, (ii) it reduces code bloat, (iii) it increases programmer productivity, and (iv) it is accurate.
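LambdaFicator targets Java 8, but the shape of its loop-to-functional refactoring can be illustrated with a Python analogue. The example below is hypothetical (not tool output): an explicit accumulation loop and the equivalent filter/map pipeline it would be rewritten into.

```python
# Before: an explicit loop that filters and transforms a collection.
def active_names_loop(users):
    result = []
    for u in users:
        if u["active"]:
            result.append(u["name"].upper())
    return result

# After: the same logic as a filter/map pipeline -- the shape the
# refactoring produces for Java for-loops over Collections.
def active_names_functional(users):
    return [u["name"].upper() for u in users if u["active"]]
```

The correctness obligation the tool must discharge is exactly that both forms agree for all inputs, which is where the reasoning about control flow, data flow, and side effects comes in.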
Publisher's Version Article Search Info

Bug Detection

Scalable and Incremental Software Bug Detection
Scott McPeak, Charles-Henri Gros, and Murali Krishna Ramanathan
(Coverity, USA; Indian Institute of Science, India)
An important, but often neglected, goal of static analysis for detecting bugs is the ability to show defects to the programmer quickly. Unfortunately, existing static analysis tools either scale very poorly or are shallow and cannot find complex interprocedural defects. Previous attempts at reducing the analysis time by adding more resources (CPU, memory) or by splitting the analysis into multiple sub-analyses based on defect detection capabilities resulted in limited or negligible improvements. We present a technique for parallel and incremental static analysis using top-down, bottom-up, and global specification inference, built around the concept of a work unit: a self-contained atom of analysis input that deterministically maps to its output. A work unit contains abstract and concrete syntax to analyze, a supporting fragment of the class hierarchy, summarized interprocedural behavior, and defect reporting information, factored to ensure a high level of reuse when analyzing successive versions incrementally. Work units are created and consumed by an analysis master process that coordinates the multiple analysis passes, the flow of information among them, and incrementalizes the computation. Meanwhile, multiple analysis worker processes use abstract interpretation to compute work unit results. Process management and interprocess communication are handled by a general-purpose computation distributor layer. We have implemented our approach, and our experimental results show that using eight processor cores, we can perform a complete analysis of code bases with millions of lines of code in a few hours, and even faster after incremental changes to that code. The analysis is thorough and accurate: it usually reports about one crash-causing defect per thousand lines of code, with a false positive rate of 10-20%.
Publisher's Version Article Search
Inferring Project-Specific Bug Patterns for Detecting Sibling Bugs
Guangtai Liang, Qianxiang Wang, Tao Xie, and Hong Mei
(Peking University, China; University of Illinois at Urbana-Champaign, USA)
Lightweight static bug-detection tools such as FindBugs, PMD, Jlint, and Lint4j detect bugs with the knowledge of generic bug patterns (e.g., objects of java.io.InputStream are not closed in time after use). Besides generic bug patterns, different projects under analysis may have some project-specific bug patterns. For example, in a revision of the Xerces project, the class field "fDTDHandler" is dereferenced without proper null-checks, while it could actually be null at runtime. We name such bug patterns directly related to objects instantiated in specific projects as Project-Specific Bug Patterns (PSBPs). Due to lack of such PSBP knowledge, existing tools usually fail to effectively detect most bugs of this kind. We name bugs belonging to the same project and sharing the same PSBP as sibling bugs. If some sibling bugs are fixed in a fix revision but some others remain, we treat such a fix as an incomplete fix. To address such incomplete fixes, we propose a PSBP-based approach for detecting sibling bugs and implement a tool called Sibling-Bug Detector (SBD). Given a fix revision, SBD first infers the PSBPs implied by the fix revision. Then, based on the inferred PSBPs, SBD detects their related sibling bugs in the same project. To evaluate SBD, we apply it to seven popular open-source projects. Among the 108 warnings reported by SBD, 63 have been confirmed as real bugs by the project developers, while two existing popular static detectors (FindBugs and PMD) cannot report most of them.
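A drastically simplified version of sibling-bug detection can be sketched as a textual scan for unguarded dereferences of the field named in a fix. The heuristic below is illustrative only (SBD's actual PSBP inference works on program structure, not regexes), and the `fDTDHandler` snippet echoes the paper's Xerces example.

```python
import re

def sibling_bug_candidates(source_lines, field):
    """Flag lines that dereference `field` without a preceding null check.
    A toy stand-in for PSBP-based sibling-bug detection."""
    guarded = False
    candidates = []
    for lineno, line in enumerate(source_lines, start=1):
        if re.search(rf"{field}\s*!=\s*null", line):
            guarded = True          # a null check guards what follows
        elif re.search(rf"{field}\s*\.", line):
            if not guarded:
                candidates.append(lineno)  # unguarded dereference
            guarded = False
    return candidates
```

Given the fix revision's guarded dereference, the remaining unguarded uses of the same field are the sibling-bug candidates.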
Publisher's Version Article Search Info
Mining Succinct Predicated Bug Signatures
Chengnian Sun and Siau-Cheng Khoo
(National University of Singapore, Singapore)

A bug signature is a set of program elements highlighting the cause or effect of a bug, and provides contextual information for debugging. In order to mine a signature for a buggy program, two sets of execution profiles of the program, one capturing a correct execution and the other capturing a faulty one, are examined to identify the program elements that distinguish the faulty execution from the correct one. Signatures consisting solely of control-flow transitions have been investigated via discriminative sequence and graph mining algorithms. Such signatures can be handicapped in cases where the effect of a bug is not manifested by any deviation in control-flow transitions. In this paper, we introduce the notion of a predicated bug signature, which aims to enhance the predictive power of bug signatures by utilizing both data predicates and control-flow information. We introduce a novel "discriminative itemset generator" mining technique to generate succinct signatures that do not contain redundant or irrelevant program elements. Our case studies demonstrate that predicated signatures can hint at more bug scenarios where traditional control-flow signatures fail.
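A toy version of discriminative itemset mining can rank sets of program elements by their support in failing versus passing profiles. This sketch omits the paper's succinctness guarantees and enumerates combinations naively, purely to show what "discriminative" means here.

```python
from itertools import combinations

def discriminative_itemsets(failing, passing, max_size=2):
    """Rank item combinations by support in failing runs minus support
    in passing runs. `failing`/`passing` are lists of sets of observed
    elements (predicates, branches). Illustrative only."""
    items = sorted({i for run in failing + passing for i in run})

    def support(itemset, runs):
        return sum(1 for run in runs if set(itemset) <= run) / len(runs)

    scored = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            score = support(combo, failing) - support(combo, passing)
            scored.append((score, combo))
    scored.sort(reverse=True)
    return scored
```

Elements that appear in every failing run but no passing run score 1.0 and surface at the top, which is the signal a bug signature is built from.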


Publisher's Version Article Search Info

Tool Demonstrations

Tool Demonstrations I

SocialCDE: A Social Awareness Tool for Global Software Teams
Fabio Calefato and Filippo Lanubile
(University of Bari, Italy)
We present SocialCDE, a tool that augments Application Lifecycle Management (ALM) platforms with social awareness. By disclosing developers’ personal interests and contextual information, it facilitates the establishment of interpersonal connections and increases the likelihood of successful interactions.
Publisher's Version Article Search Video Info
REDACT: Preventing Database Deadlocks from Application-Based Transactions
B. M. Mainul Hossain, Mark Grechanik, Ugo Buy, and Haisheng Wang
(University of Illinois at Chicago, USA; Oracle, USA)
In this demonstration, we present a system that prevents database deadlocks arising from application-based transactions. The system visualizes our algorithm for detecting hold-and-wait cycles, which specify how resources (e.g., database tables) are locked and waited on during executions of SQL statements, and uses the detected cycles to prevent database deadlocks automatically.
Publisher's Version Article Search Video Info
aPET: A Test Case Generation Tool for Concurrent Objects
Elvira Albert, Puri Arenas, Miguel Gómez-Zamalloa, and Peter Y. H. Wong
(Complutense University of Madrid, Spain; SLD Fredhopper, Netherlands)
We present the concepts, usage, and prototypical implementation of aPET, a test case generation tool for a distributed asynchronous language based on concurrent objects. The system receives as input a program, a selection of methods to be tested, and a set of parameters that includes a coverage criterion. It yields as output a set of test cases which guarantee that the selected coverage criterion is achieved. aPET is completely integrated within the language's Eclipse IDE. The generated test cases can be displayed in textual mode and, in addition, it is possible to generate ABSUnit code (i.e., code runnable in a simple framework, similar to JUnit, for writing repeatable tests). The information yielded by aPET can be relevant for spotting bugs during program development and also for performing regression testing.
Publisher's Version Article Search Info

Tool Demonstrations II

RUBRIC: A Flexible Tool for Automated Checking of Conformance to Requirement Boilerplates
Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, Frank Zimmer, and Raul Gnaga
(University of Luxembourg, Luxembourg; SES TechCom, Luxembourg)
Using requirement boilerplates is an effective way to mitigate many types of ambiguity in Natural Language (NL) requirements and to enable more automated transformation and analysis of these requirements. When requirements are expressed using boilerplates, one must check, as a first quality assurance measure, whether the requirements actually conform to the boilerplates. If done manually, boilerplate conformance checking can be laborious, particularly when requirements change frequently. We present RUBRIC (ReqUirements BoileRplate sanIty Checker), a flexible tool for automatically checking NL requirements against boilerplates for conformance. RUBRIC further provides a range of diagnostics to highlight potentially problematic syntactic constructs in NL requirement statements. RUBRIC is based on a Natural Language Processing (NLP) technique known as text chunking. A key advantage of RUBRIC is that it yields highly accurate results even in early stages of requirements writing, where a requirements glossary may be unavailable or only partially specified. RUBRIC is scalable and can be applied repeatedly to large sets of requirements as they evolve. The tool has been validated through an industrial case study which we outline briefly in the paper.
Publisher's Version Article Search Video Info
RiTHM: A Tool for Enabling Time-Triggered Runtime Verification for C Programs
Samaneh Navabpour, Yogi Joshi, Wallace Wu, Shay Berkovich, Ramy Medhat, Borzoo Bonakdarpour, and Sebastian Fischmeister
(University of Waterloo, Canada)
We introduce the tool RiTHM (Runtime Time-triggered Heterogeneous Monitoring). RiTHM takes a C program under inspection and a set of LTL properties as input and generates an instrumented C program that is verified at run time by a time-triggered monitor. RiTHM provides two techniques, based on static analysis and control theory, to minimize instrumentation of the input C program and monitoring intervention. The monitor's verification decision procedure is sound and complete and exploits GPU many-core technology to speed up and encapsulate monitoring tasks.
Publisher's Version Article Search Video Info
PoMMaDe: Pushdown Model-Checking for Malware Detection
Fu Song and Tayssir Touili
(East China Normal University, China; CNRS, France; University Paris Diderot, France)

We present PoMMaDe, a Pushdown Model-checking based Malware Detector. In PoMMaDe, a binary program is modeled as a pushdown system (PDS), which allows tracking the stack of the program, and malicious behaviors are specified in SCTPL or SLTPL, where SCTPL (resp. SLTPL) is an extension of CTL (resp. LTL) with variables, quantifiers, and predicates over the stack (needed for malware specification). The malware detection problem is then reduced to SCTPL/SLTPL model-checking for PDSs. PoMMaDe has allowed us to detect 600 real malware samples and 200 new samples generated by the two malware generators NGVCK and VCL32, and to prove benign programs benign. In particular, PoMMaDe was able to detect several malware samples that could not be detected by well-known anti-viruses such as Avira, Avast, Kaspersky, McAfee, AVG, BitDefender, Eset Nod32, F-Secure, Norton, Panda, Trend Micro, and Qihoo 360.


Publisher's Version Article Search Video Info
RADA: A Tool for Reasoning about Algebraic Data Types with Abstractions
Tuan-Hung Pham and Michael W. Whalen
(University of Minnesota, USA)
We present RADA, a portable, scalable tool for reasoning about formulas containing algebraic data types using catamorphism (fold) functions. It can work as a back-end for reasoning about recursive programs that manipulate algebraic types. RADA operates by successively unrolling catamorphisms and uses either CVC4 or Z3 as its reasoning engine. We have used RADA to reason about functional implementations of complex data structures and about guard applications that determine whether XML messages should be allowed to cross network security domains. Promising experimental results demonstrate that RADA can be used in several practical contexts.
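The catamorphisms that RADA unrolls are structural folds over algebraic data types. In a general-purpose language the idea looks like the sketch below; the binary-tree type and the `size`/`total` folds are our own illustrations, not RADA's input syntax.

```python
class Leaf:
    """Empty tree constructor."""
    pass

class Node:
    """Internal node: left subtree, value, right subtree."""
    def __init__(self, left, value, right):
        self.left, self.value, self.right = left, value, right

def cata(tree, leaf_case, node_case):
    """Catamorphism (structural fold): replace each constructor with the
    supplied function, recursing on subtrees."""
    if isinstance(tree, Leaf):
        return leaf_case()
    return node_case(cata(tree.left, leaf_case, node_case),
                     tree.value,
                     cata(tree.right, leaf_case, node_case))

# Two example catamorphisms over the same type.
size = lambda t: cata(t, lambda: 0, lambda l, v, r: l + 1 + r)
total = lambda t: cata(t, lambda: 0, lambda l, v, r: l + v + r)
```

A solver-backed tool unrolls such folds a bounded number of steps so that formulas mentioning `size(t)` or `total(t)` become decidable queries for CVC4 or Z3.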
Publisher's Version Article Search Video Info

Tool Demonstrations III

Jalangi: A Tool Framework for Concolic Testing, Selective Record-Replay, and Dynamic Analysis of JavaScript
Koushik Sen, Swaroop Kalasapur, Tasneem Brutch, and Simon Gibbs
(UC Berkeley, USA; Samsung Research, USA)
We describe a tool framework, called Jalangi, for dynamic analysis and concolic testing of JavaScript programs. The framework is written in JavaScript and allows implementation of various heavy-weight dynamic analyses for JavaScript. Jalangi incorporates two key techniques: 1) selective record-replay, a technique which enables recording and faithfully replaying a user-selected part of the program, and 2) shadow values and shadow execution, which enable easy implementation of heavy-weight dynamic analyses such as concolic testing and taint tracking. Jalangi works through source-code instrumentation, which makes it portable across platforms. Jalangi is available at https://github.com/SRA-SiliconValley/jalangi under the Apache 2.0 license. Our evaluation of Jalangi on the SunSpider benchmark suite and on five web applications shows that Jalangi has an average slowdown of 26X during recording and 30X during replay and analysis. These slowdowns are comparable with those reported for similar tools, such as PIN and Valgrind for x86 binaries.
Publisher's Version Article Search Video Info
RSA-MBT: A Test Tool for Generating Test Artifacts Based on Models
Andrew Diniz da Costa, Ricardo Venieris, Gustavo Carvalho, and Carlos José Pereira de Lucena
(PUC-Rio, Brazil)
Model-Based Testing (MBT) has attracted the attention of many industries and, hence, several MBT approaches have been reported in the literature. The Software Engineering Lab (LES) at the Pontifical Catholic University of Rio de Janeiro has worked extensively on coordinating and carrying out tests of large-scale software systems developed (for web and desktop) for different domains (e.g., petroleum, e-commerce, etc.). Based on this experience, an LES test group created a new test modeling language called UML Testing Profile for Coordination (UTP-C) to model relevant test data. However, to exploit the advantages of the new modeling language, an appropriate test-modeling tool became necessary. Thus, this paper presents RSA-MBT, a new plug-in developed for the Rational Software Architect (RSA) tool, to generate a set of test artifacts from UTP-C diagrams.
Publisher's Version Article Search Info
USMMC: A Self-Contained Model Checker for UML State Machines
Shuang Liu, Yang Liu, Jun Sun, Manchun Zheng, Bimlesh Wadhwa, and Jin Song Dong
(National University of Singapore, Singapore; Nanyang Technological University, Singapore; Singapore University of Technology and Design, Singapore)
UML diagrams are gaining increasing usage in object-oriented system design. UML state machines are specifically used for modeling the dynamic behaviors of classes. It is widely agreed that verifying system designs at an early stage dramatically reduces development cost. Tool support for verifying UML designs can also encourage consistent usage of UML diagrams throughout the software development process. In this work, we present a tool, named USMMC, which turns model checking of UML state machines into practice. USMMC is a self-contained toolkit that provides editing, interactive simulation, and powerful model checking support for UML state machines. Evaluation results show the effectiveness and scalability of our tool.
Publisher's Version Article Search Video Info

New Ideas

Analysis and Testing

Extracting URLs from JavaScript via Program Analysis
Qi Wang, Jingyu Zhou, Yuting Chen, Yizhou Zhang, and Jianjun Zhao
(Shanghai Jiao Tong University, China; Cornell University, USA)
With the extensive use of client-side JavaScript in web applications, web contents are becoming more dynamic than ever before. This poses significant challenges for search engines, because more web URLs are now embedded or hidden inside JavaScript code and most web crawlers are script-agnostic, significantly reducing the coverage of search engines. We present a hybrid approach that combines static analysis with dynamic execution, overcoming the weakness of a purely static or dynamic approach that either lacks accuracy or suffers from huge execution cost. We also propose to integrate program analysis techniques such as statement coverage and program slicing to improve the performance of URL mining.
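The static half of such a hybrid approach can be approximated by harvesting URL-like string literals from JavaScript source. The sketch below is naive on purpose: the paper's analysis additionally executes code (and applies slicing) to resolve URLs that are computed at run time, which a literal scan necessarily misses.

```python
import re

def static_urls(js_source):
    """Collect string literals in JavaScript source that look like URLs
    (absolute http(s) URLs or site-relative paths). A purely static,
    illustrative approximation."""
    literals = re.findall(r"""["']([^"']+)["']""", js_source)
    return [s for s in literals
            if s.startswith(("http://", "https://", "/"))]
```

A crawler would feed these candidates back into its frontier; the dynamic component then adds the URLs assembled by string concatenation that this pass cannot see.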
Publisher's Version Article Search
Data Debugging with Continuous Testing
Kıvanç Muşlu, Yuriy Brun, and Alexandra Meliou
(University of Washington, USA; University of Massachusetts at Amherst, USA)
Today, systems rely as heavily on data as on the software that manipulates those data. Errors in these systems are incredibly costly, annually resulting in multi-billion dollar losses, and, on multiple occasions, in death. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: discovering system errors caused by well-formed but incorrect data. In this paper, we propose continuous data testing: using otherwise-idle CPU cycles to run test queries, in the background, as a user or database administrator modifies a database. This technique notifies the user or administrator about a data bug as quickly as possible after that bug is introduced, leading to at least three benefits: (1) The bug is discovered quickly and can be fixed before it is likely to cause a problem. (2) The bug is discovered while the relevant change is fresh in the user's or administrator's mind, increasing the chance that the underlying cause of the bug, as opposed to only the discovered side-effect, is fixed. (3) When poor documentation or company policies contribute to bugs, discovering the bug quickly is likely to identify these contributing factors, facilitating updating documentation and policies to prevent similar bugs in the future. We describe the problem space and potential benefits of continuous data testing, our vision for the technique, challenges we encountered, and our prototype implementation for PostgreSQL. The prototype's low overhead shows promise that continuous data testing can address the important problem of data debugging.
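The core check of continuous data testing, re-running test queries as the database changes, can be sketched against SQLite. The paper's prototype targets PostgreSQL and runs on idle CPU cycles; the schema and the test query below are invented for illustration.

```python
import sqlite3

def run_test_queries(conn, tests):
    """Run each (description, query, expected) test and collect the
    descriptions of failing ones -- the check a continuous data-testing
    daemon would repeat in the background after each modification."""
    failures = []
    for description, query, expected in tests:
        (result,) = conn.execute(query).fetchone()
        if result != expected:
            failures.append(description)
    return failures
```

Run immediately after a modification, a failing test query flags the data bug while the offending change is still fresh in the administrator's mind.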
Publisher's Version Article Search
Iterative Test Suites Refinement for Elastic Computing Systems
Alessio Gambi, Antonio Filieri, and Schahram Dustdar
(University of Lugano, Switzerland; University of Stuttgart, Germany; Vienna University of Technology, Austria)
Elastic computing systems can dynamically scale to continuously and cost-effectively provide their required Quality of Service in the face of time-varying workloads, and they are usually implemented in the cloud. Despite their widespread adoption by industry, a formal definition of elasticity and suitable procedures for its assessment and verification are still missing. Both academia and industry are trying to adapt established testing procedures for functional and non-functional properties, with limited effectiveness with respect to elasticity. In this paper we propose a new methodology to automatically generate test suites for testing the elastic properties of systems. Elasticity, plasticity, and oscillations are first formalized through a convenient behavioral abstraction of the elastic system and then used to drive an iterative test-suite refinement process. The outcomes of our approach are a test suite tailored to the violation of elasticity properties and a human-readable abstraction of the system behavior that further supports diagnosis and fixing.
Publisher's Version Article Search
Using Fault History to Improve Mutation Reduction
Laura Inozemtseva, Hadi Hemmati, and Reid Holmes
(University of Waterloo, Canada; University of Manitoba, Canada)
Mutation testing can be used to measure test suite quality in two ways: by treating the kill score as a quality metric, or by treating each surviving, non-equivalent mutant as an indicator of an inadequacy in the test suite. The first technique relies on the assumption that the mutation score is highly correlated with the suite's real fault detection rate, which is not well supported by the literature. The second technique relies only on the weaker assumption that the "interesting" mutants (i.e., the ones that indicate an inadequacy in the suite) are in the set of surviving mutants. Using the second technique also makes improving the suite straightforward. Unfortunately, mutation testing has a performance problem. At least part of the test suite must be run on every mutant, meaning mutation testing can be too slow for practical use. Previous work has addressed this by reducing the number of mutants to evaluate in various ways, including selecting a random subset of them. However, reducing the set of mutants by random reduction is suboptimal for developers using the second technique described above, since random reduction will eliminate many of the interesting mutants. We propose a new reduction method that supports the use of the second technique by reducing the set of mutants to those generated by altering files that have contained many faults in the past. We performed a pilot study that suggests that this reduction method preferentially chooses mutants that will survive mutation testing; that is, it preserves a greater number of interesting mutants than random reduction does.
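The proposed reduction can be sketched as ranking mutants by the fault history of the files they mutate and keeping only the top of the ranking. The scoring below is a simplification of the paper's method, invented for illustration.

```python
def reduce_by_fault_history(mutants, fault_counts, budget):
    """Keep the `budget` mutants whose source files have the most
    recorded past faults. Mutants are (mutant_id, file) pairs;
    `fault_counts` maps file -> number of historical faults."""
    ranked = sorted(mutants,
                    key=lambda m: fault_counts.get(m[1], 0),
                    reverse=True)
    return ranked[:budget]
```

Compared with random reduction, this biases the retained set toward historically fault-prone code, which is where the "interesting" surviving mutants are hypothesized to live.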
Publisher's Version Article Search

Hunting Bugs

A Cost-Effectiveness Criterion for Applying Software Defect Prediction Models
Hongyu Zhang and S. C. Cheung
(Tsinghua University, China; ISCAS, China; Hong Kong University of Science and Technology, China)
Ideally, software defect prediction models should help organize software quality assurance (SQA) resources and reduce cost of finding defects by allowing the modules most likely to contain defects to be inspected first. In this paper, we study the cost-effectiveness of applying defect prediction models in SQA and propose a basic cost-effectiveness criterion. The criterion implies that defect prediction models should be applied with caution. We also propose a new metric FN/(FN+TN) to measure the cost-effectiveness of a defect prediction model.
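The proposed metric is straightforward to compute from a model's confusion matrix: among the modules predicted to be defect-free (FN + TN), it measures the fraction that actually contain defects. A minimal sketch:

```python
def cost_effectiveness(fn, tn):
    """The paper's FN/(FN+TN) metric: of the modules a defect prediction
    model passes over as clean, the fraction that are actually defective."""
    return fn / (fn + tn)
```

A lower value means fewer defects escape the inspection effort that the prediction model directs.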
Publisher's Version Article Search
BugMap: A Topographic Map of Bugs
Jiangtao Gong and Hongyu Zhang
(Tsinghua University, China; ISCAS, China)
A large and complex software system can contain a large number of bugs. It is desirable for developers to understand how these bugs are distributed across the system, so they can have a better overview of software quality. In this paper, we describe BugMap, a tool we developed for visualizing large-scale bug location information. Taking source code and bug data as input, BugMap displays bug locations on a topographic map. By examining the topographic map, developers can understand how components and files are affected by bugs. We apply this tool to visualize the distribution of Eclipse bugs across components/files. The results show that our tool is effective for understanding the overall quality status of a large-scale system and for identifying the problematic areas of the system.
Publisher's Version Article Search
Lexical Statistical Machine Translation for Language Migration
Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen
(Iowa State University, USA)
Prior research has shown that source code also exhibits naturalness, i.e., it is written by humans and is likely to be repetitive. The researchers also showed that the n-gram language model is useful in predicting the next token in a source file given a large corpus of existing source code. In this paper, we investigate how well statistical machine translation (SMT) models for natural languages could help in migrating source code from one programming language to another. We treat source code as a sequence of lexical tokens and apply a phrase-based SMT model to the lexemes of those tokens. Our empirical evaluation on migrating two Java projects to C# showed that lexical, phrase-based SMT can achieve high lexical translation accuracy (BLEU from 81.3-82.6%). Users would have to manually edit only 11.9-15.8% of the total number of tokens in the resulting code to correct it. However, a high percentage of the translated methods (49.5-58.6%) are syntactically incorrect. Therefore, our result calls for a more program-oriented SMT model capable of better integrating the syntactic and semantic information of a program to support language migration.
Publisher's Version Article Search
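To make the "phrase-based SMT over lexemes" idea concrete, here is a toy greedy decoder over an invented Java-to-C# phrase table (the table entries and the greedy strategy are our illustration only; a real phrase-based SMT system scores competing segmentations with a translation and language model):

```python
# Hypothetical phrase table mapping Java token sequences to C# token sequences.
PHRASE_TABLE = {
    ("System", ".", "out", ".", "println"): ("Console", ".", "WriteLine"),
    ("boolean",): ("bool",),
    ("final",): ("readonly",),
}

def translate(tokens):
    out, i = [], 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):  # try longest phrase first
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[phrase])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown token copied verbatim
            i += 1
    return out

java = ["final", "boolean", "ok", ";"]
print(translate(java))  # ['readonly', 'bool', 'ok', ';']
```

The sketch also hints at the paper's finding: token-level mappings can be lexically accurate while still producing output that no C# parser would accept, since no syntactic structure is consulted.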
Code Fragment Summarization
Annie T. T. Ying and Martin P. Robillard
(McGill University, Canada)
Current research in software engineering has mostly focused on the retrieval accuracy of code examples, with little attention to their presentation, e.g., how code examples are shown in a result page. We investigate the feasibility of summarizing code examples to present them better. Our machine-learning-based algorithm could approximate summaries in a manually generated human oracle with a precision of 0.71. This result is promising, as summaries at this level of precision achieved the same level of agreement as human annotators did with each other.
Publisher's Version Article Search Video

Understanding Software Development

Understanding Gamification Mechanisms for Software Development
Daniel J. Dubois and Giordano Tamburrelli
(Massachusetts Institute of Technology, USA; University of Lugano, Switzerland)
In this paper we outline the idea of adopting gamification techniques to engage, train, monitor, and motivate all the players involved in the development of complex software artifacts, from inception to deployment and maintenance. The paper introduces the concept of gamification and proposes a research approach to understand how its principles may be successfully applied to the process of software development. Applying gamification to software engineering is not as straightforward as it may appear, since it has to be cast to the peculiarities of this domain. Existing literature in the area has already recognized the possible use of such techniques in the context of software development; however, how to design and use gamification in this context is still an open question. This leads to several research challenges, which are organized into a fascinating research agenda that is part of the contribution of this paper. Finally, to support the proposed ideas we present a preliminary experiment that shows the effect of gamification on the performance of students involved in a software engineering project.
Publisher's Version Article Search
Toward Understanding the Causes of Unanswered Questions in Software Information Sites: A Case Study of Stack Overflow
Ripon K. Saha, Avigit K. Saha, and Dewayne E. Perry
(University of Texas at Austin, USA; University of Saskatchewan, Canada)
Stack Overflow is a highly successful question-answering website for the programming community, which not only provides quick solutions to programmers’ questions but also serves as a large repository of valuable software engineering knowledge. However, despite having a very engaged and active user community, Stack Overflow currently has more than 300K unanswered questions. In this paper, we perform an initial investigation to understand why these questions remain unanswered, applying a combination of statistical and data mining techniques. Our preliminary results indicate that although some topics were never answered, most questions remained unanswered because they apparently are of little interest to the user community.
Publisher's Version Article Search
Where Is the Business Logic?
Yael Dubinsky, Yishai Feldman, and Maayan Goldstein
(IBM Research, Israel)
One of the challenges in maintaining legacy systems is to be able to locate business logic in the code and isolate it for different purposes, including implementing requested changes, refactoring, eliminating duplication, unit testing, and extracting business logic into a rule engine. Our new idea is an iterative method to identify the business logic in the code and visualize this information, both to gain a better understanding of the logic distribution in the code and to develop a domain-specific business vocabulary. This new method combines and extends several existing technologies, including search, aggregation, and visualization. We evaluated the visualization method on a large-scale application and found that it yields useful results, provided an appropriate vocabulary is available.
Publisher's Version Article Search
Towards Emotional Awareness in Software Development Teams
Emitza Guzman and Bernd Bruegge
(TU Munich, Germany)
Emotions play an important role in determining work results and how team members collaborate within a project. When working in large, distributed teams, members can lose awareness of the emotional state of the project. We propose an approach to improve emotional awareness in software development teams by means of quantitative emotion summaries. Our approach automatically extracts and summarizes emotions expressed in collaboration artifacts by combining probabilistic topic modeling with lexical sentiment analysis techniques. We applied the approach to 1000 collaboration artifacts produced by three development teams over a three-month period. Interviews with the teams' project leaders suggest that the proposed emotion summaries correlate well with the emotional state of the project and could be useful for improving emotional awareness. However, the interviews also indicate that the current state of the summaries is not detailed enough and further improvements are needed.
Publisher's Version Article Search
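The lexical-sentiment half of the approach described above can be sketched in a few lines; the lexicon, its scores, and the example artifacts below are invented for illustration (the paper additionally applies probabilistic topic modeling, which this sketch omits):

```python
# Toy sentiment lexicon: word -> polarity score (invented values).
LEXICON = {"great": 1, "happy": 1, "broken": -1, "frustrating": -1}

def emotion_summary(artifacts):
    """Mean sentiment score per collaboration artifact."""
    scores = []
    for text in artifacts:
        words = text.lower().split()
        scores.append(sum(LEXICON.get(w, 0) for w in words))
    return sum(scores) / len(scores)

msgs = ["great progress today", "the build is broken", "broken and frustrating"]
print(emotion_summary(msgs))  # (1 - 1 - 2) / 3, i.e. about -0.67
```

A quantitative summary of this kind is what the project leaders were asked to compare against their own sense of the project's emotional state.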

Industrial Research

Implementing Sound Software-Engineering Practices in Companies

Precise Range Analysis on Large Industry Code
Shrawan Kumar, Bharti Chimdyalwar, and Ulka Shrotri
(Tata Consultancy Services, India)
Abstract interpretation is widely used to perform static code analysis with non-relational (interval) as well as relational (difference-bound matrices, polyhedral) domains. Analysis using non-relational domains is highly scalable but delivers imprecise results, whereas use of relational domains produces precise results but does not scale up. We have developed a tool that implements K-limited path-sensitive interval domain analysis to get precise results without losing scalability. The tool was able to successfully analyse 10 million lines of embedded code for different properties such as division by zero, array index out of bounds (AIOB), overflow-underflow and so on. This paper presents details of the tool and results of our experiments for detecting the AIOB property. A comparison with the existing tools in the market demonstrates that our tool is more precise and scales better.
Publisher's Version Article Search
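As a minimal sketch of the interval-domain reasoning behind an AIOB check (our own illustration; the tool's actual domain is K-limited and path-sensitive, which this sketch does not model):

```python
class Interval:
    """Non-relational interval abstract value [lo, hi]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def join(self, other):
        # Least upper bound, taken at control-flow merge points.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

def may_violate_bounds(index, size):
    """AIOB check: can the index interval fall outside [0, size-1]?"""
    return index.lo < 0 or index.hi > size - 1

# i is 0..3 on one path and 2..9 on another; after the merge it is 0..9.
i = Interval(0, 3).join(Interval(2, 9))
print(may_violate_bounds(i, size=8))  # True: i may reach 8 or 9
```

The join at merge points is exactly where a plain interval domain loses precision; keeping K paths separate, as the tool does, postpones such joins and avoids many false AIOB warnings.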
Agreements for Software Reuse in Corporations
Thijmen de Gooijer and Heiko Koziolek
(ABB Research, Sweden; ABB Research, Germany)
Agreements for sharing software between entities in a corporation have to be tailored to fit the situation. Such agreements are not legal documents and must address different issues than traditional software licenses. We found that these agreements should cover what is granted, payment, support, ownership, and liability. In a case study we learned that an agreement should list its assumptions about the structure and processes of the software organization. The presented work enables others to create guidelines for software sharing agreements tailored to their organization, and shares lessons about the differences between software product lines and corporate software sharing and reuse.
Publisher's Version Article Search
Good Technology Makes the Difficult Task Easy
Andrey Terekhov
(Saint-Petersburg State University, Russia)
A new language for chip design is presented. The main advantages of the language are explicit pipelining and parallelism features fully controlled by the author of the chip design. A non-trivial industrial example is discussed, along with run-time estimates and a comparison with traditional programming in C.
Publisher's Version Article Search

Approaches to Quality

ShAir: Extensible Middleware for Mobile Peer-to-Peer Resource Sharing
Daniel J. Dubois, Yosuke Bando, Konosuke Watanabe, and Henry Holtzman
(Massachusetts Institute of Technology, USA; Toshiba, Japan)
ShAir is a middleware infrastructure that allows mobile applications to share the resources of their devices (e.g., data, storage, connectivity, computation) in a transparent way. The goals of ShAir are: (i) abstracting the creation and maintenance of opportunistic delay-tolerant peer-to-peer networks; (ii) being decoupled from the actual hardware and network platform; (iii) extensibility in terms of supported hardware, protocols, and the types of resources that can be shared; (iv) being capable of self-adapting at run-time; (v) enabling the development of applications that are easier to design, test, and simulate. In this paper we discuss the design, extensibility, and maintainability of the ShAir middleware, and how to use it as a platform for collaborative resource-sharing applications. Finally, we share our experience in designing and testing a file-sharing application.
Publisher's Version Article Search Info
Risky Files: An Approach to Focus Quality Improvement Effort
Audris Mockus, Randy Hackbarth, and John Palframan
(Avaya Labs Research, USA)
As the development of software products frequently transitions among globally distributed teams, knowledge about the source code, design decisions, original requirements, and the history of troublesome areas gets lost. A new team faces tremendous challenges to regain that knowledge. In numerous projects we observed that only 1% of project files are involved in more than 60% of customer-found defects (CFDs); thus focusing quality improvement on such files can greatly reduce the risk of poor product quality. We describe a mostly automated approach that annotates the source code at the file and module level with historic information from multiple version control, issue tracking, and organizational directory systems. Risk factors (e.g., past changes and authors who left the project) are identified via a regression model, and the riskiest areas undergo a structured evaluation by experts. The results are presented via a web-based tool, and project experts are then trained to use the tool in conjunction with a checklist to determine risk remediation actions for each risky file. We have deployed the approach in seven projects at Avaya and are continuing deployment to the remaining projects as we evaluate the results of earlier deployments. The approach is particularly helpful for focusing quality improvement effort on new releases of deployed products in a resource-constrained environment.
Publisher's Version Article Search
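A toy version of the risk ranking described above might weight each file's history as follows; the weights, factor names, and file data are invented (in the paper, the weights come from a regression model fitted on project history):

```python
def risk_score(file_stats, weights):
    """Weighted risk score for one file from its history factors."""
    return sum(weights[k] * file_stats.get(k, 0) for k in weights)

# Hypothetical regression weights and per-file history.
weights = {"past_changes": 0.5, "departed_authors": 2.0, "past_defects": 3.0}
files = {
    "core/io.c": {"past_changes": 40, "departed_authors": 2, "past_defects": 5},
    "ui/menu.c": {"past_changes": 5,  "departed_authors": 0, "past_defects": 0},
}
ranked = sorted(files, key=lambda f: risk_score(files[f], weights), reverse=True)
print(ranked[0])  # the highest-scoring file goes to expert review first
```

Ranking files this way concentrates the structured expert evaluation on the small fraction of files that history suggests carry most of the defect risk.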
System Reliability Calculation Based on the Run-Time Analysis of Ladder Program
Yu Jiang, Hehua Zhang, Han Liu, Xiaoyu Song, William N. N. Hung, Ming Gu, and Jiaguang Sun
(Tsinghua University, China; Portland State University, USA)
A programmable logic controller (PLC) system is a typical kind of embedded system widely used in industry. The complexity of reliability analysis of safety-critical PLC systems arises in handling the temporal correlations among system components caused by the run-time execution logic of the embedded ladder program. In this paper, we propose a novel probabilistic model for the reliability analysis of PLC systems, called the run-time reliability model (RRM). It is constructed automatically, based on the structure and run-time execution of the embedded ladder program. We then present custom-made conditional probability distribution (CPD) tables according to the execution semantics of the RRM nodes, and insert the reliability probability of each system component referenced by a node into the corresponding CPD table. Experimental results show that the proposed model is accurate and fast compared to previous work.
Publisher's Version Article Search
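The core idea of conditioning component reliability on the ladder program's run-time branching can be shown with a tiny numeric example; all probabilities here are invented, and a real RRM composes many such conditional tables rather than one formula:

```python
# A scan cycle executes component A, then B or C depending on a rung
# condition (per-component reliabilities and branch probability invented).
p_A, p_B, p_C = 0.99, 0.95, 0.90
p_branch_B = 0.7  # probability the rung selects the B branch at run time

# System reliability, conditioned on which branch the ladder logic takes.
reliability = p_A * (p_branch_B * p_B + (1 - p_branch_B) * p_C)
print(round(reliability, 3))  # about 0.926
```

Ignoring the branch structure (e.g., multiplying all three component reliabilities) would misestimate the system figure, which is why the model follows the program's execution semantics.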

Effective Industry Use of Software-Engineering Tools

h-ubu: An Industrial-Strength Service-Oriented Component Framework for JavaScript Applications
Clement Escoffier, Philippe Lalanda, and Nicolas Rempulsky
(Grenoble University, France; Ubidreams, France)
In recent years, we have developed web applications requiring a large amount of JavaScript code. These web applications present adaptation requirements. In addition to platform-centric adaptation, applications have to react dynamically to external events like connectivity disruptions. Building such applications is complex, and we faced sharp maintainability challenges. This paper presents h-ubu, a service-oriented component framework for JavaScript that supports building adaptive applications. h-ubu is used in industrial web applications and mobile applications. h-ubu is available in open source, as part of the OW2 Nanoko project.
Publisher's Version Article Search
Design and Optimization of Multi-clocked Embedded Systems using Formal Technique
Yu Jiang, Zonghui Li, Hehua Zhang, Yangdong Deng, Xiaoyu Song, Ming Gu, and Jiaguang Sun
(Tsinghua University, China; Portland State University, USA)
Today’s system-on-chip and distributed systems are commonly equipped with multiple clocks. The key challenge in designing such systems is that heterogeneous control-oriented and data-oriented behaviors within one clock domain, and asynchronous communications between two clock domains, have to be captured and evaluated in a single framework. In this paper, we propose to use timed automata and synchronous dataflow to capture the dynamic behaviors of multi-clock embedded systems. A timed automata and synchronous dataflow based modeling and analysis framework is constructed to evaluate and optimize the performance of multi-clock embedded systems. Data-oriented behaviors are captured by synchronous dataflow, synchronous control-oriented behaviors are captured by timed automata, and inter-clock-domain asynchronous communication can be modeled by an interface timed automaton or a synchronous dataflow module with the CSP mechanism. The behaviors of synchronous dataflow are interpreted by equivalent timed automata to maintain the semantic consistency of the mixed model. Then, various functional properties can be simulated and verified within the framework. We apply this framework in the design process of a sub-system used in a real-world subway communication control system.
Publisher's Version Article Search
The Economics of Static Analysis Tools
Rahul Kumar and Aditya V. Nori
(Microsoft Research, India)
Static analysis tools have experienced a dichotomy over the span of the last decade. They have proven themselves to be useful in many domains, but at the same time have not (in general) experienced any notable concrete integration into a development environment. This is partly due to the inherent complexity of the tools themselves, as well as due to other intangible factors. Such factors usually tend to include questions about the return on investment of the tool and the value the tool provides in a development environment. In this paper, we present an empirical model for evaluating static analysis tools from the perspective of the economic value they provide. We further apply this model to a case study of the Static Driver Verifier (SDV) tool that ships with the Windows Driver Kit and show the usefulness of the model and the tool.
Publisher's Version Article Search
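A return-on-investment calculation of the kind the abstract alludes to could look like the following; the formula, parameters, and figures are our own illustration, not the paper's empirical model:

```python
def tool_roi(bugs_found, cost_per_escaped_bug, run_cost,
             triage_cost_per_report, reports):
    """Toy ROI: value of bugs caught pre-release vs. cost of running
    the tool and triaging its reports (including false positives)."""
    value = bugs_found * cost_per_escaped_bug
    cost = run_cost + reports * triage_cost_per_report
    return (value - cost) / cost

roi = tool_roi(bugs_found=12, cost_per_escaped_bug=5000, run_cost=2000,
               triage_cost_per_report=50, reports=400)
print(round(roi, 2))  # positive under these invented numbers
```

Even this toy version exposes the tension the paper studies: triage cost scales with the total report count, so a tool with many false positives can erase the value of the real bugs it finds.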

Doctoral Symposium

Doctoral Papers 1

Automatically Describing Software Faults
Nicholas DiGiuseppe
(UC Irvine, USA)
A developer's ability to successfully debug a fault is directly related to their ability to comprehend the fault. Notwithstanding improvements in software-maintenance automation, this fault comprehension task remains largely manual and time consuming. I propose an automated approach to describe software faults, thus ameliorating comprehension and reducing manual effort. My approach leverages dynamic analysis, fault localization, and source-code mining to produce a succinct, natural-language fault summary.
Publisher's Version Article Search
Fuzzy Service Matching in On-The-Fly Computing
Marie Christin Platenius
(University of Paderborn, Germany)
In the future vision of software engineering, services from world-wide markets are composed automatically in order to build custom-made systems. Supporting such scenarios requires an adequate service matching approach. Many existing approaches do not fulfill two key requirements of emerging concepts like On-The-Fly Computing, namely (1) comprehensiveness, i.e., the consideration of different service views that cover not only functional properties but also non-functional properties, and (2) fuzzy matching, i.e., the ability to deliver gradual results in order to cope with a certain extent of uncertainty, incompleteness, and tolerance ranges. In this paper, I present a fuzzy matching process that distinguishes between different fuzziness sources and leverages fuzziness in different matching steps, which consider several service views, e.g., behavior and quality properties.
Publisher's Version Article Search

Doctoral Papers 2

PHRT: A Model and Programmable Tool for Hardware Reengineering Automation
Oleg Nenashev
(Saint Petersburg State Polytechnical University, Russia)
Hardware reengineering is a highly resource-consuming part of the development cycle, so it is important to automate it in order to reduce costs and provide reusable solutions. There are many specialized electronic design automation (EDA) tools for specific cases, but only a few programmable tools support the implementation of user-specific reengineering operations. This paper presents PhD research aimed at the development of a Programmable Hardware Reengineering Tool (PHRT), which can be useful for small hardware-design companies and research groups that have specific recurrent tasks and cannot afford to develop automation tools “from scratch”. We propose an HDL-independent “hybrid” device representation model for automated analysis and transformation, which combines low-level structural descriptions (netlists) with features from high-level hardware description languages (HDLs). This model supports parallel analysis and transformation of multiple description layers at once. We present a PHRT prototype: an extensible core that provides basic functionality for the import/export, analysis, editing, and transformation of hybrid models. Its functionality can be extended through extensions and script programs. The PHRT prototype is currently being used by several Russian hardware-design companies. Test results have demonstrated the applicability of PHRT as a framework for user-specific reengineering cases such as testing instrumentation and reliability assurance (memory replacement, structural redundancy insertion, etc.).
Publisher's Version Article Search
Using Topic Models to Understand the Evolution of a Software Ecosystem
Nicolas Lopez
(UC Irvine, USA)
The development of a software system is now ever more frequently a part of a larger development effort, including multiple software systems that co-exist in the same environment: a software ecosystem. Though most studies of the evolution of software have focused on a single software system, there is much that we can learn from the analysis of a set of interrelated systems. Topic modeling techniques show promise for mining the data stored in software repositories to understand the evolution of a system. In my research I seek to explore how topic modeling techniques can aid in understanding the evolution of a software ecosystem. The results of this research have the potential to improve how topic modeling techniques are used to predict, plan, and understand the evolution of software, and will inform the design of tools that support software engineering activities such as feature location, expertise identification, and bug detection.
Publisher's Version Article Search

Doctoral Papers 3

Automotive Architecture Description and Its Quality
Yanja Dajsuren
(Eindhoven University of Technology, Netherlands)
This research is part of Hybrid Innovations for Trucks (HIT), an ongoing multi-disciplinary project with the objectives of CO2 emission reduction and fuel saving for long-haul vehicles. Achieving this goal necessitates the definition of a proper architecture and of quality techniques to enable the development of new, more efficient control software. Therefore, this research covers an automotive architecture description language and the quality of automotive software.
Publisher's Version Article Search
Towards Open Architecture System
Bahtijar Vogel
(Linnaeus University, Sweden)
The use of diverse standards while developing web and mobile technologies brings new challenges when it comes to the flexibility, interoperability, customizability, and extensibility of software systems. In addition, such systems are in most cases closed, thus making the development and customization process a challenging effort for system designers, developers, and end-users. All these developments require further research attention. This work addresses these challenges from an open system architecture perspective. The proposed approach is based on practical development efforts and on theoretical research, including state-of-the-art projects and definitions related to open architectures that we surveyed. The initial results indicate that a combination of service-oriented approaches with open source components and open standard data formats paves the way towards an open, extensible architecture. The core contributions of this research will be (a) an open architecture model, (b) the developed system itself based on the model, and (c) the benefits of applying open architecture approaches throughout the development process.
Publisher's Version Article Search

Doctoral Papers 4

A Framework for Defining the Dynamic Semantics of DSLs
Ulyana Tikhonova
(Eindhoven University of Technology, Netherlands)
In this research abstract we describe our project on a common reference framework for defining domain-specific languages (DSLs). The framework is meant for defining the dynamic semantics of DSLs and allows for mapping a DSL definition to various platforms, such as those for verification, validation, and simulation. The objectives of the project are to make a DSL's dynamic semantics definition explicit and to use this definition to bridge the technological diversity of the various platforms used in DSL development.
Publisher's Version Article Search
