2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE),
November 11-15, 2013,
Palo Alto, USA
Technical Research Track
Concurrency
Round-Up: Runtime Checking Quasi Linearizability of Concurrent Data Structures
Lu Zhang, Arijit Chattopadhyay, and Chao Wang
(Virginia Tech, USA)
We propose a new method for runtime checking of a relaxed consistency
property called quasi linearizability for concurrent data
structures. Quasi linearizability generalizes the standard notion of
linearizability by intentionally introducing nondeterminism
into the parallel computations and exploiting such nondeterminism to
improve performance. However, ensuring the quantitative aspects
of this correctness condition in low-level code is a
difficult task.
Our method is the first fully automated method for checking quasi
linearizability in the unmodified C/C++ code of concurrent data
structures. It guarantees that all the reported quasi linearizability
violations are real violations.
We have implemented our method in a software tool based on LLVM and a
concurrency testing tool called Inspect.
Our experimental evaluation shows that the new method is effective in
detecting quasi linearizability violations in the source code of
concurrent data structures.
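To make the relaxed property concrete, here is a minimal sketch of checking one common instance of quasi linearizability, a k-relaxed FIFO queue, against an observed dequeue order. The function name and history representation are illustrative only and not taken from Round-Up, which works on unmodified C/C++ executions:

```python
from collections import deque

def check_quasi_fifo(enqueued, dequeued, k):
    """Check a k-quasi-FIFO dequeue order: every dequeued element must be
    among the first k+1 elements still pending in enqueue order, i.e., no
    element may jump ahead of more than k older pending elements."""
    pending = deque(enqueued)  # reference FIFO order
    for x in dequeued:
        window = list(pending)[:k + 1]
        if x not in window:
            return False  # x overtook more than k older elements
        pending.remove(x)
    return True
```

For k = 0 this degenerates to the standard FIFO (linearizability) check; for k = 1 a dequeue may return either of the two oldest pending elements.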
@InProceedings{ASE13p4,
author = {Lu Zhang and Arijit Chattopadhyay and Chao Wang},
title = {Round-Up: Runtime Checking Quasi Linearizability of Concurrent Data Structures},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {4--14},
doi = {},
year = {2013},
}
Constraint-Based Automatic Symmetry Detection
Shao Jie Zhang, Jun Sun, Chengnian Sun, Yang Liu, Junwei Ma, and Jin Song Dong
(Singapore University of Technology and Design, Singapore; National University of Singapore, Singapore; Nanyang Technological University, Singapore)
We present an automatic approach to detecting symmetry relations for general concurrent models. Despite the success of symmetry reduction in mitigating the state explosion problem, one essential step towards its soundness and effectiveness, i.e., how to discover sufficient symmetries with the least human effort, is often either overlooked or oversimplified. In this work, we show how a concurrent model can be viewed as a constraint satisfaction problem (CSP), and present an algorithm capable of detecting symmetries arising from the CSP which induce automorphisms of the model. To the best of our knowledge, our method is the first approach that can automatically detect both process and data symmetries, as demonstrated on a number of systems.
@InProceedings{ASE13p15,
author = {Shao Jie Zhang and Jun Sun and Chengnian Sun and Yang Liu and Junwei Ma and Jin Song Dong},
title = {Constraint-Based Automatic Symmetry Detection},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {15--25},
doi = {},
year = {2013},
}
Proving MCAPI Executions Are Correct using SMT
Yu Huang, Eric Mercer, and Jay McCarthy
(Brigham Young University, USA)
Asynchronous message passing is an important paradigm in writing applications for embedded heterogeneous multicore systems. The Multicore Association (MCA), an industry consortium promoting
multicore technology, is working to standardize message passing into a single API, MCAPI, for bare-metal implementation and portability across platforms. Correctness in such an API is difficult to reason about manually, and testing against reference solutions is equally difficult, as reference solutions implement an unknown set of allowed behaviors and programmers have no way to directly control API internals to expose or reproduce errors. This paper provides a way to encode an MCAPI execution as a Satisfiability Modulo Theories (SMT) problem which, if satisfiable, yields a feasible execution schedule on the same trace that resolves non-determinism in the MCAPI runtime in a way that fails user-provided assertions. The paper proves the problem is NP-complete. The encoding is useful for testing, debugging, and verifying MCAPI program executions. The novelty in
the encoding is the direct use of match pairs (potential send and receive couplings). Match-pair encoding for MCAPI executions, when compared to other encoding strategies, is simpler to reason about, results in significantly fewer terms in the SMT problem, and captures feasible behaviors that are ignored in previously published techniques. Further, to our knowledge, this is the first SMT encoding that works under infinite-buffer semantics, meaning the runtime has unlimited internal buffering, as opposed to no internal buffering. Results demonstrate that the SMT encoding, restricted to zero-buffer semantics, uses fewer clauses when compared to another zero-buffer technique, and it runs faster and uses less memory. As a result, the encoding scales well for programs with high levels of non-determinism in how sends and receives may potentially match.
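To illustrate what a match pair is, the toy sketch below brute-forces feasible send/receive couplings for a message-passing trace, respecting per-channel FIFO (non-overtaking) order. These pairs are the core variables of the paper's SMT encoding; the brute-force search and the data representation here are illustrative assumptions, not the paper's solver-based method:

```python
from itertools import permutations

def feasible_matchings(sends, receives):
    """sends: list of (src, dest, value); receives: list of (dest,).
    Enumerate assignments of one distinct send to each receive such that
    endpoints agree and no send overtakes an earlier send on the same
    src->dest channel."""
    n = len(receives)
    found = []
    for perm in permutations(range(len(sends)), n):
        # endpoint of each matched send must be the receiving endpoint
        if any(sends[si][1] != receives[ri][0] for ri, si in enumerate(perm)):
            continue
        # per-channel FIFO: a later receive may not take an earlier send
        ok = all(not (sends[perm[ri]][:2] == sends[perm[rj]][:2]
                      and perm[ri] > perm[rj])
                 for ri in range(n) for rj in range(ri + 1, n))
        if ok:
            found.append(perm)
    return found
```

An SMT encoding replaces this enumeration with boolean match-pair variables plus ordering constraints, letting the solver search the same space symbolically.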
@InProceedings{ASE13p26,
author = {Yu Huang and Eric Mercer and Jay McCarthy},
title = {Proving MCAPI Executions Are Correct using SMT},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {26--36},
doi = {},
year = {2013},
}
Efficient Data Race Prediction with Incremental Reasoning on Time-Stamped Lock History
Malay K. Ganai
(NEC Labs, USA)
We present an efficient data race prediction algorithm that uses lock-reordering-based incremental search on time-stamped lock histories to detect multiple races effectively. We balance prediction accuracy, coverage, and performance with a specially designed pairwise reachability algorithm that can store and re-use past search results, thereby amortizing the cost of reasoning over redundant and overlapping search space. Compared to graph-based search algorithms, our algorithm incurs much smaller overhead due to amortization, and can potentially be used while a program under test is executing. To demonstrate such a possibility, we implemented our approach as a module (iPA) in a predictive testing framework. Our approach can handle traces with a few hundred to half a million events, and predicts known/unknown real data races with a performance penalty of less than 4
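For background, the simplest predictive race check over a trace is the classic lockset test sketched below; it is a deliberate simplification of the paper's time-stamped lock-history reasoning (which also handles reordering and amortizes repeated searches), and the event representation is an assumption for illustration:

```python
def predict_races(events):
    """events: list of (thread, addr, is_write, locks_held) tuples,
    with locks_held a frozenset. Report index pairs of accesses to the
    same address from different threads, at least one a write, whose
    locksets are disjoint (classic lockset-style prediction)."""
    races = []
    for i in range(len(events)):
        t1, a1, w1, l1 = events[i]
        for j in range(i + 1, len(events)):
            t2, a2, w2, l2 = events[j]
            if t1 != t2 and a1 == a2 and (w1 or w2) and not (l1 & l2):
                races.append((i, j))
    return races
```
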
@InProceedings{ASE13p37,
author = {Malay K. Ganai},
title = {Efficient Data Race Prediction with Incremental Reasoning on Time-Stamped Lock History},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {37--47},
doi = {},
year = {2013},
}
Dynamic Analysis
PIEtrace: Platform Independent Executable Trace
Yonghwi Kwon, Xiangyu Zhang, and Dongyan Xu
(Purdue University, USA)
To improve software dependability, a large number
of software engineering tools have been developed over the years.
Many of them are difficult to apply in practice because their
system and library requirements are incompatible with those of
the subject software. We propose a technique called platform
independent executable trace. Our technique traces and virtualizes
a regular program execution that is platform dependent, and
generates a stand-alone program called the trace program. Running
the trace program re-generates the original execution. More
importantly, trace program execution is completely independent
of the underlying operating system and libraries such that it can
be compiled and executed on arbitrary platforms. As such, it
can be analyzed by a third party tool on a platform preferred
by the tool. We have implemented the technique on x86 and
sensor platforms. We show that buggy executions of 10 real-world
Windows and sensor applications can be traced and virtualized,
and later analyzed by existing Linux tools. We also
demonstrate how the technique can be used in cross-platform
malware analysis.
@InProceedings{ASE13p48,
author = {Yonghwi Kwon and Xiangyu Zhang and Dongyan Xu},
title = {PIEtrace: Platform Independent Executable Trace},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {48--58},
doi = {},
year = {2013},
}
ACM SIGSOFT Distinguished Paper Award
Improving Efficiency of Dynamic Analysis with Dynamic Dependence Summaries
Vijay Krishna Palepu, Guoqing Xu, and James A. Jones
(University of California at Irvine, USA)
Modern applications make heavy use of third-party libraries and components, which poses new challenges for efficient dynamic analysis. To perform such analyses, transitively dependent components at all layers of the call stack must be monitored and analyzed, which may be prohibitively expensive for systems with large libraries and components. As an approach to address such expense, we record, summarize, and reuse dynamic dataflows between inputs and outputs of components, based on dynamic control and data traces. These summarized dataflows are computed at a fine-grained instruction level; the result of which we call “dynamic dependence summaries.” Although static summaries have been proposed, to the best of our knowledge, this work presents the first technique for dynamic dependence summaries. The efficiency benefits of such summarization may come at the cost of accuracy. As such, we evaluate the degree of accuracy loss and the degree of efficiency gain when using dynamic dependence summaries of library methods. On five large programs from the DaCapo benchmark (for which no existing whole-program dynamic dependence analyses have been shown to scale) and 21 versions of NANOXML, the summarized dependence analysis provided 90% accuracy and a speed-up of 100% (i.e., ×2), on average, when compared to traditional exhaustive dynamic dependence analysis.
@InProceedings{ASE13p59,
author = {Vijay Krishna Palepu and Guoqing Xu and James A. Jones},
title = {Improving Efficiency of Dynamic Analysis with Dynamic Dependence Summaries},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {59--69},
doi = {},
year = {2013},
}
Efficient Parametric Runtime Verification with Deterministic String Rewriting
Patrick Meredith and Grigore Roşu
(University of Illinois at Urbana-Champaign, USA)
Early efforts in runtime verification show
that parametric regular and temporal logic specifications
can be monitored efficiently. These approaches, however,
have limited expressiveness: their specifications always
reduce to monitors with finite state.
More recent developments
showed that parametric context-free properties can be efficiently
monitored with overheads generally lower than 12-15%.
While context-free grammars are more expressive than finite-state
languages, they still do not allow every computable safety property.
This paper presents a monitor synthesis algorithm for string rewriting
systems (SRS).
SRSs are well known to be Turing complete, allowing
for the formal specification of any computable safety property.
Earlier attempts at Turing-complete monitoring have been relatively
inefficient. This paper demonstrates that monitoring parametric
SRSs is practical. The presented algorithm uses
a modified version of Aho-Corasick string searching
for quick pattern matching with an incremental rewriting approach
that avoids reexamining parts of the string known to contain no
redexes.
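To illustrate the idea, a naive SRS monitor can rewrite the growing event string after each event and flag an error symbol; the sketch below is exactly that naive version (tokens are space-separated strings, a hypothetical representation), whereas the paper's contribution is doing the matching efficiently with Aho-Corasick search and incremental rewriting:

```python
def monitor(rules, events, error="ERROR", max_steps=1000):
    """Append each event to the trace string, rewrite with the first
    applicable rule until no redex remains (re-scanning from scratch
    each time, which the paper's incremental approach avoids), and
    report a violation if the error symbol ever appears."""
    s = []
    for e in events:
        s.append(e)
        changed, steps = True, 0
        while changed and steps < max_steps:
            changed = False
            steps += 1
            text = " ".join(s)
            for lhs, rhs in rules:
                if lhs in text:
                    text = text.replace(lhs, rhs, 1)
                    changed = True
                    break
            s = text.split()
        if error in s:
            return False  # safety property violated
    return True
```

For example, the rules [("open close", ""), ("close", "ERROR")] cancel matched open/close pairs and flag any unmatched close.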
@InProceedings{ASE13p70,
author = {Patrick Meredith and Grigore Roşu},
title = {Efficient Parametric Runtime Verification with Deterministic String Rewriting},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {70--80},
doi = {},
year = {2013},
}
Identifying Execution Points for Dynamic Analyses
William N. Sumner and Xiangyu Zhang
(Simon Fraser University, Canada; Purdue University, USA)
Dynamic analyses rely on the ability to identify points within or across
executions.
In spite of this being a core task for dynamic analyses, new solutions are
frequently developed without an awareness of existing solutions,
their strengths, their weaknesses, or their caveats.
This paper surveys the existing approaches for identifying execution points and
examines their analytical and empirical properties that researchers and
developers should be aware of when using them within an analysis.
In addition, based on limitations in precision, correctness, and efficiency
for techniques that identify corresponding execution points across multiple
executions, we designed and implemented a new technique, Precise Execution Point
IDs.
This technique avoids correctness and precision issues in prior
solutions, enabling analyses that use our approach to also produce more correct
results.
Empirical comparison with the surveyed techniques shows that our approach has
25% overhead on average, several times less than existing solutions.
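One family of schemes the survey covers identifies an execution point by its calling context plus an occurrence counter. The sketch below is a simplified version of that generic idea, not the paper's Precise Execution Point IDs technique; the class and method names are invented for illustration:

```python
from collections import defaultdict

class ExecutionPointIDs:
    """Identify each dynamic point by (calling context, statement,
    occurrence count), so re-executions of the same statement in the
    same context get distinct, reproducible IDs."""
    def __init__(self):
        self.stack = []                 # current call stack
        self.counts = defaultdict(int)  # occurrences per (context, stmt)

    def enter(self, func):
        self.stack.append(func)

    def leave(self):
        self.stack.pop()

    def point(self, stmt):
        key = (tuple(self.stack), stmt)
        self.counts[key] += 1
        return key + (self.counts[key],)
```

Counter-based IDs like this are cheap but, as the paper discusses, naive variants can fail to correspond across executions when control flow diverges.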
@InProceedings{ASE13p81,
author = {William N. Sumner and Xiangyu Zhang},
title = {Identifying Execution Points for Dynamic Analyses},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {81--91},
doi = {},
year = {2013},
}
Testing
Operator-Based and Random Mutant Selection: Better Together
Lingming Zhang, Milos Gligoric, Darko Marinov, and Sarfraz Khurshid
(University of Texas at Austin, USA; University of Illinois at Urbana-Champaign, USA)
Mutation testing is a powerful methodology for evaluating the quality of a test suite. However, the methodology is also very costly, as the test suite may have to be executed for each mutant. Selective mutation testing is a well-studied technique to reduce this cost by selecting a subset of all mutants, which would otherwise have to be considered in their entirety. Two common approaches are operator-based mutant selection, which only generates mutants using a subset of mutation operators, and random mutant selection, which selects a subset of mutants generated using all mutation operators. While each of the two approaches provides some reduction in the number of mutants to execute, applying either of the two to medium-sized, real-world programs can still generate a huge number of mutants, which makes their execution too expensive. This paper presents eight random sampling strategies defined on top of operator-based mutant selection, and empirically validates that operator-based selection and random selection can be applied in tandem to further reduce the cost of mutation testing. The experimental results show that even sampling only 5% of mutants generated by operator-based selection can still provide precise mutation testing results, while reducing the average mutation testing time to 6.54% (i.e., on average less than 5 minutes for this study).
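The two-step combination described above can be sketched in a few lines: filter mutants by the selected operators, then sample a fixed ratio of the survivors uniformly at random. This is one plausible reading of the simplest of the eight strategies; the mutant representation and function name are assumptions for illustration:

```python
import random

def sample_mutants(mutants, operators, ratio=0.05, seed=0):
    """Operator-based selection followed by random sampling: keep only
    mutants produced by the chosen operators, then draw a uniform
    random sample of the given ratio (at least one mutant)."""
    selected = [m for m in mutants if m["op"] in operators]
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    k = max(1, round(len(selected) * ratio))
    return rng.sample(selected, k)
```
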
@InProceedings{ASE13p92,
author = {Lingming Zhang and Milos Gligoric and Darko Marinov and Sarfraz Khurshid},
title = {Operator-Based and Random Mutant Selection: Better Together},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {92--102},
doi = {},
year = {2013},
}
Testing Properties of Dataflow Program Operators
Zhihong Xu, Martin Hirzel, Gregg Rothermel, and Kun-Lung Wu
(University of Nebraska-Lincoln, USA; IBM Research, USA)
Dataflow programming languages, which represent
programs as graphs of data streams and operators, are becoming
increasingly popular and being used to create a wide array of
commercial software applications. The dependability of programs
written in these languages, as well as the systems used to
compile and run these programs, hinges on the correctness of
the semantic properties associated with operators. Unfortunately,
these properties are often poorly defined and frequently go
unchecked, which can lead to a wide range of problems in the
programs that use the operators. In this paper we present an
approach for improving the dependability of dataflow programs
by checking operators for necessary properties. Our approach is
dynamic, and involves generating tests whose results are checked
to determine whether specific properties hold or not. We present
empirical data that shows that our approach is both effective
and efficient at assessing the status of properties.
@InProceedings{ASE13p103,
author = {Zhihong Xu and Martin Hirzel and Gregg Rothermel and Kun-Lung Wu},
title = {Testing Properties of Dataflow Program Operators},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {103--113},
doi = {},
year = {2013},
}
Bita: Coverage-Guided, Automatic Testing of Actor Programs
Samira Tasharofi, Michael Pradel, Yu Lin, and Ralph E. Johnson
(University of Illinois at Urbana-Champaign, USA; ETH Zurich, Switzerland)
Actor programs are concurrent programs where concurrent entities communicate
asynchronously by exchanging messages.
Testing actor programs is challenging because the order of message receives
depends on the non-deterministic scheduler and because exploring all
schedules does not scale to large programs.
This paper presents Bita, a scalable, automatic approach for testing
non-deterministic behavior of actor programs.
The key idea is to generate and explore schedules that are likely to reveal
concurrency bugs because these schedules increase the schedule coverage.
We present three schedule coverage criteria for actor programs, an algorithm
to generate feasible schedules that increase coverage, and a technique to
force a program to comply with a schedule.
Applying Bita to real-world actor programs implemented in Scala reveals
eight previously unknown concurrency bugs, of which six have already been
fixed by the developers.
Furthermore, we show that our approach finds bugs 122x faster than random
scheduling, on average.
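As a toy stand-in for Bita's coverage criteria (the paper defines three; this sketch invents a single simplified one), schedule coverage can be measured as the set of ordered pairs of messages received consecutively by the same actor, and the next schedule to explore chosen greedily to maximize newly covered pairs:

```python
def receive_pairs(schedule):
    """Coverage units: (actor, earlier_msg, later_msg) triples for
    messages received back-to-back by the same actor. The schedule is
    a list of (actor, message) receive events, an assumed encoding."""
    pairs, last = set(), {}
    for actor, msg in schedule:
        if actor in last:
            pairs.add((actor, last[actor], msg))
        last[actor] = msg
    return pairs

def pick_next(candidates, covered):
    """Greedily pick the candidate schedule covering the most
    not-yet-covered pairs."""
    return max(candidates, key=lambda s: len(receive_pairs(s) - covered))
```
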
@InProceedings{ASE13p114,
author = {Samira Tasharofi and Michael Pradel and Yu Lin and Ralph E. Johnson},
title = {Bita: Coverage-Guided, Automatic Testing of Actor Programs},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {114--124},
doi = {},
year = {2013},
}
SABRINE: State-Based Robustness Testing of Operating Systems
Domenico Cotroneo, Domenico Di Leo, Francesco Fucci, and Roberto Natella
(Università degli Studi di Napoli Federico II, Italy; Critiware, Italy)
The assessment of operating system robustness with respect to unexpected or anomalous events is a fundamental requirement for mission-critical systems. Robustness can be tested by deliberately exposing the system to erroneous events during its execution, and then analyzing the OS behavior to evaluate its ability to gracefully handle these events. Since OSs are complex and stateful systems, robustness testing needs to account for the timing of erroneous events, in order to evaluate the robust behavior of the OS under different states. This paper presents SABRINE (StAte-Based Robustness testIng of operatiNg systEms), an approach for state-aware robustness testing of OSs. SABRINE automatically extracts state models from execution traces, and generates a set of test cases that cover different OS states. We evaluate the approach on a Linux-based real-time operating system adopted in the avionic domain. Experimental results show that SABRINE can automatically identify relevant OS states and find robustness vulnerabilities while keeping the number of test cases low.
@InProceedings{ASE13p125,
author = {Domenico Cotroneo and Domenico Di Leo and Francesco Fucci and Roberto Natella},
title = {SABRINE: State-Based Robustness Testing of Operating Systems},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {125--135},
doi = {},
year = {2013},
}
Verification
Blitz: Compositional Bounded Model Checking for Real-World Programs
Chia Yuan Cho, Vijay D’Silva, and Dawn Song
(University of California at Berkeley, USA)
Bounded Model Checking (BMC) for software is a
precise bug-finding technique that builds upon the efficiency of
modern SAT and SMT solvers. BMC currently does not scale
to large programs because the size of the generated formulae
exceeds the capacity of existing solvers. We present a new,
compositional and property-sensitive algorithm that enables BMC
to automatically find bugs in large programs. A novel feature of
our technique is to decompose the behaviour of a program into
a sequence of BMC instances and use a combination of satisfying
assignments and unsatisfiability proofs to propagate information
across instances. A second novelty is to use the control- and
data-flow of the program as well as information from proofs
to prune the set of variables and procedures considered and
hence, generate smaller instances. Our tool BLITZ outperforms
existing tools and scales to programs with over 100,000 lines
of code. BLITZ automatically and efficiently discovers bugs
in widely deployed software including new vulnerabilities in
Internet infrastructure software.
@InProceedings{ASE13p136,
author = {Chia Yuan Cho and Vijay D’Silva and Dawn Song},
title = {Blitz: Compositional Bounded Model Checking for Real-World Programs},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {136--146},
doi = {},
year = {2013},
}
Ranger: Parallel Analysis of Alloy Models by Range Partitioning
Nicolás Rosner, Junaid H. Siddiqui, Nazareno Aguirre, Sarfraz Khurshid, and Marcelo F. Frias
(Universidad de Buenos Aires, Argentina; LUMS School of Science and Engineering, Pakistan; Universidad Nacional de Río Cuarto, Argentina; University of Texas at Austin, USA; Instituto Tecnológico de Buenos Aires, Argentina)
We present a novel approach for parallel analysis of models written in Alloy, a declarative extension of first-order logic based on relations. The Alloy language is supported by the fully automatic Alloy Analyzer, which translates models into propositional formulas and uses off-the-shelf SAT technology to solve them. Our key insight is that the underlying constraint satisfaction problem can be split into subproblems of lesser complexity by using ranges of candidate solutions, which partition the space of all candidate solutions. Conceptually, we define a total ordering among the candidate solutions, split this space of candidates into ranges, and let independent SAT searches take place within these ranges. Our tool, Ranger, embodies our insight. Experimental evaluation shows that Ranger provides substantial speedups (in several cases, superlinear ones) for a variety of hard-to-solve Alloy models, and that adding more hardware reduces analysis costs almost linearly.
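The partitioning idea can be sketched independently of SAT: order all candidate assignments by their integer encoding, cut the space into contiguous ranges, and search each range independently. The brute-force search below stands in for the per-range SAT calls that Ranger would dispatch to separate workers; all names are illustrative:

```python
def solve_in_ranges(num_vars, is_solution, num_ranges):
    """Split the 2^num_vars candidate assignments (encoded as integers)
    into contiguous ranges and search each range independently,
    returning the solutions found per range. In Ranger each range
    would be handed to an independent SAT search instead."""
    total = 1 << num_vars
    step = total // num_ranges
    results = []
    for r in range(num_ranges):
        lo = r * step
        hi = total if r == num_ranges - 1 else lo + step
        results.append([i for i in range(lo, hi) if is_solution(i)])
    return results
```

Because the ranges are disjoint and jointly exhaustive, the union of per-range solutions equals the full solution set, which is what makes the parallel decomposition sound.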
@InProceedings{ASE13p147,
author = {Nicolás Rosner and Junaid H. Siddiqui and Nazareno Aguirre and Sarfraz Khurshid and Marcelo F. Frias},
title = {Ranger: Parallel Analysis of Alloy Models by Range Partitioning},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {147--157},
doi = {},
year = {2013},
}
Automated Verification of Pattern-Based Interaction Invariants in Ajax Applications
Yuta Maezawa, Hironori Washizaki, Yoshinori Tanabe, and Shinichi Honiden
(University of Tokyo, Japan; Waseda University, Japan; National Institute of Informatics, Japan)
When developing asynchronous JavaScript and XML (Ajax) applications, developers implement Ajax design patterns to increase the usability of the applications. However, unpredictable contexts of running applications might conceal faults that will break the design patterns, which decreases usability. We propose a support tool called JSVerifier that automatically verifies interaction invariants: that the applications handle their interactions in an invariant occurrence and order. We also present a selective set of interaction invariants derived from Ajax design patterns, as input. If the application behavior breaks the design patterns, JSVerifier automatically outputs faulty execution paths for debugging. The results of our case studies show that JSVerifier can verify the interaction invariants in a feasible amount of time, and we conclude that it can help developers increase the usability of Ajax applications.
@InProceedings{ASE13p158,
author = {Yuta Maezawa and Hironori Washizaki and Yoshinori Tanabe and Shinichi Honiden},
title = {Automated Verification of Pattern-Based Interaction Invariants in Ajax Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {158--168},
doi = {},
year = {2013},
}
Software Model Checking for Distributed Systems with Selector-Based, Non-blocking Communication
Cyrille Artho, Masami Hagiya, Richard Potter, Yoshinori Tanabe, Franz Weitl, and Mitsuharu Yamamoto
(AIST, Japan; University of Tokyo, Japan; National Institute of Informatics, Japan; Chiba University, Japan)
Many modern software systems are implemented as client/server architectures, where a server handles multiple clients concurrently. Testing does not cover the outcomes of all possible thread and communication schedules reliably. Software model checking, on the other hand, covers all possible outcomes but is often limited to subsets of commonly used protocols and libraries.
Earlier work in cache-based software model checking handles implementations using socket-based TCP/IP networking, with one thread per client connection using blocking input/output. Recently, servers using non-blocking, selector-based input/output have become prevalent. This paper describes our work extending the Java PathFinder extension net-iocache to such software, and the application of our tool to modern server software.
@InProceedings{ASE13p169,
author = {Cyrille Artho and Masami Hagiya and Richard Potter and Yoshinori Tanabe and Franz Weitl and Mitsuharu Yamamoto},
title = {Software Model Checking for Distributed Systems with Selector-Based, Non-blocking Communication},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {169--179},
doi = {},
year = {2013},
}
Evolution
A Study of Repetitiveness of Code Changes in Software Evolution
Hoan Anh Nguyen, Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, and Hridesh Rajan
(Iowa State University, USA)
In this paper, we present a large-scale study of the repetitiveness of code changes in software evolution. We collected a large data set of 2,841 Java projects, with 1.7 billion source lines of code (SLOC) at the latest revisions, 1.8 million code change revisions (0.4 million fixes), 6.2 million changed files, and 2.5 billion changed SLOCs. A change is considered repeated within or across projects if it matches another change having occurred in the history of the same project or another project, respectively. We report the following important findings. First, repetitiveness of changes could be as high as 70-100% at small sizes and decreases exponentially as size increases. Second, repetitiveness is higher and more stable in the cross-project setting than in the within-project one. Third, fixing changes repeat similarly to general changes. Importantly, learning code changes and recommending them in software evolution is beneficial, with top-1 accuracy of over 30% and top-3 accuracy of nearly 35%. Repeated fixing changes could also be useful for automatic program repair.
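A rough sketch of how repetitiveness can be measured: fingerprint each change by hashing its normalized before/after lines, then count how often a fingerprint recurs. The whitespace-only normalization below is a simplifying assumption; the paper matches changes far more carefully and at much larger scale:

```python
import hashlib

def change_fingerprint(removed, added):
    """Hash a change's normalized before/after lines (collapsing runs
    of whitespace) so repeats can be found by set lookup."""
    norm = lambda lines: "\n".join(" ".join(l.split()) for l in lines)
    text = norm(removed) + "\n=>\n" + norm(added)
    return hashlib.sha1(text.encode()).hexdigest()

def repetitiveness(changes):
    """Fraction of (removed, added) changes whose fingerprint has
    occurred earlier in the sequence."""
    seen, repeated = set(), 0
    for removed, added in changes:
        fp = change_fingerprint(removed, added)
        repeated += fp in seen
        seen.add(fp)
    return repeated / len(changes) if changes else 0.0
```
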
@InProceedings{ASE13p180,
author = {Hoan Anh Nguyen and Anh Tuan Nguyen and Tung Thanh Nguyen and Tien N. Nguyen and Hridesh Rajan},
title = {A Study of Repetitiveness of Code Changes in Software Evolution},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {180--190},
doi = {},
year = {2013},
}
Consistency-Preserving Edit Scripts in Model Versioning
Timo Kehrer, Udo Kelter, and Gabriele Taentzer
(University of Siegen, Germany; Philipps-Universität Marburg, Germany)
In model-based software development, models are
iteratively
evolved. To optimally support
model evolution, developers need adequate tools for model
versioning tasks, including comparison, patching, and
merging of models.
A significant disadvantage of tools currently available
is that they display, and operate with, low-level model
changes which refer to internal model representations
and which can lead to intermediate inconsistent states.
Higher-level consistency-preserving edit operations
including refactorings are better suited to explain
changes or to resolve conflicts.
This paper presents an automatic procedure which
transforms a low-level difference into an executable
edit script which uses consistency-preserving
edit operations only.
Edit scripts support consistent model patching and
merging on a higher abstraction level.
Our approach to edit script generation has been evaluated in a
larger real-world case study.
@InProceedings{ASE13p191,
author = {Timo Kehrer and Udo Kelter and Gabriele Taentzer},
title = {Consistency-Preserving Edit Scripts in Model Versioning},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {191--201},
doi = {},
year = {2013},
}
JFlow: Practical Refactorings for Flow-Based Parallelism
Nicholas Chen and Ralph E. Johnson
(University of Illinois at Urbana-Champaign, USA)
Emerging applications in the domains of recognition, mining and synthesis (RMS); image and video processing; data warehousing; and automatic financial trading admit a particular style of parallelism termed flow-based parallelism. To help developers exploit flow-based parallelism, popular parallel libraries such as Groovy's GPars, Intel's TBB Flow Graph and Microsoft's TPL Dataflow have begun introducing many new and useful constructs. However, to reap the benefits of such constructs, developers must first use them. This involves refactoring their existing sequential code to incorporate these constructs – a manual process that overwhelms even experts. To alleviate this burden, we introduce a set of novel analyses and transformations targeting flow-based parallelism. We implemented these ideas in JFlow, an interactive refactoring tool integrated into the Eclipse IDE. We used JFlow to parallelize seven applications: four from a previously known benchmark and three from a suite of large open source projects. JFlow, with minimal interaction from the developer, can successfully parallelize applications from the aforementioned domains with good performance (offering up to 3.45x speedup on a 4-core machine) and is fast enough to be used interactively as part of a developer's workflow.
@InProceedings{ASE13p202,
author = {Nicholas Chen and Ralph E. Johnson},
title = {JFlow: Practical Refactorings for Flow-Based Parallelism},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {202--212},
doi = {},
year = {2013},
}
Automated Planning for Software Architecture Evolution
Jeffrey M. Barnes, Ashutosh Pandey, and David Garlan
(Carnegie Mellon University, USA)
In previous research, we have developed a theoretical framework to help software architects make better decisions when planning software evolution. Our approach is based on representation and analysis of candidate evolution paths--sequences of transitional architectures leading from the current system to a desired target architecture. One problem with this kind of approach is that it imposes a heavy burden on the software architect, who must explicitly define and model these candidate paths. In this paper, we show how automated planning techniques can be used to support automatic generation of evolution paths, relieving this burden on the architect. We illustrate our approach by applying it to a data migration scenario, showing how this architecture evolution problem can be translated into a planning problem and solved using existing automated planning tools.
@InProceedings{ASE13p213,
author = {Jeffrey M. Barnes and Ashutosh Pandey and David Garlan},
title = {Automated Planning for Software Architecture Evolution},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {213--223},
doi = {},
year = {2013},
}
Generation and Synthesis
Automatically Synthesizing SQL Queries from Input-Output Examples
Sai Zhang and Yuyin Sun
(University of Washington, USA)
Many computer end-users, such as research scientists and business analysts, need to frequently query a database, yet lack enough programming knowledge to write a correct SQL query. To alleviate this problem, we present a programming by example technique (and its tool implementation, called SQLSynthesizer) to help end-users automate such query tasks. SQLSynthesizer takes from users an example input and output of how the database should be queried, and then synthesizes a SQL query that reproduces the example output from the example input. If the synthesized SQL query is applied to another, potentially larger, database with a similar schema, the synthesized SQL query produces a corresponding result that is similar to the example output.
We evaluated SQLSynthesizer on 23 exercises from a classic database textbook and 5 forum questions about writing SQL queries. SQLSynthesizer synthesized correct answers for 15 textbook exercises and all 5 forum questions, and it did so from relatively small examples.
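The programming-by-example search can be illustrated with a toy synthesizer that enumerates only "SELECT cols WHERE col = const" queries over the example table until one reproduces the expected output. This is a drastically simplified version of SQLSynthesizer's search (the tool also handles joins, aggregation, and larger query shapes), with invented names:

```python
from itertools import combinations

def synthesize_query(table, columns, expected):
    """Enumerate (projection, filter column, filter constant) triples,
    smallest projections first, and return the first one whose result
    rows equal the expected example output."""
    rows = [dict(zip(columns, r)) for r in table]
    consts = {c: {r[c] for r in rows} for c in columns}  # constants seen per column
    for k in range(1, len(columns) + 1):
        for proj in combinations(columns, k):
            for fc in columns:
                for v in consts[fc]:
                    out = [tuple(r[c] for c in proj)
                           for r in rows if r[fc] == v]
                    if out == expected:
                        return proj, fc, v
    return None
```

Restricting filter constants to values appearing in the example input keeps the search space finite, one of the standard tricks in example-driven synthesis.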
@InProceedings{ASE13p224,
author = {Sai Zhang and Yuyin Sun},
title = {Automatically Synthesizing SQL Queries from Input-Output Examples},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {224--234},
doi = {},
year = {2013},
}
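The programming-by-example loop SQLSynthesizer automates can be pictured with a tiny enumerate-and-check sketch. This is an illustrative toy, not the tool itself: the candidate grammar (projections plus single equality predicates), the table, and the function names below are invented, and the real system handles joins, aggregation, and far larger query spaces.

```python
# Toy programming-by-example SQL synthesis: enumerate simple SELECT
# queries over the example input until one reproduces the example output.
import itertools
import sqlite3

def synthesize(columns, rows, expected):
    """Return the first candidate query whose result equals `expected`."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (%s)" % ", ".join(columns))
    db.executemany("INSERT INTO t VALUES (%s)" % ",".join("?" * len(columns)), rows)
    # Candidate space: column projections x (no predicate | one equality).
    values = sorted({v for row in rows for v in row}, key=repr)
    preds = [""] + [" WHERE %s = %r" % (c, v) for c in columns for v in values]
    for n in range(1, len(columns) + 1):
        for proj in itertools.combinations(columns, n):
            for pred in preds:
                q = "SELECT %s FROM t%s" % (", ".join(proj), pred)
                if db.execute(q).fetchall() == expected:
                    return q
    return None

cols = ["name", "dept", "salary"]
rows = [("ann", "cs", 90), ("bob", "ee", 80), ("eve", "cs", 70)]
# Example output: names of the employees in dept 'cs'.
print(synthesize(cols, rows, [("ann",), ("eve",)]))
```

Applying the synthesized query to a larger table with the same schema then mirrors the paper's intended workflow.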
SEDGE: Symbolic Example Data Generation for Dataflow Programs
Kaituo Li, Christoph Reichenbach, Yannis Smaragdakis, Yanlei Diao, and Christoph Csallner
(University of Massachusetts at Amherst, USA; Goethe University Frankfurt, Germany; University of Athens, Greece; University of Texas at Arlington, USA)
Exhaustive, automatic testing of dataflow (esp. map-reduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints, and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. ``concolic'') execution in a conventional programming language, adapted to the unique features of the dataflow domain.
In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach.
@InProceedings{ASE13p235,
author = {Kaituo Li and Christoph Reichenbach and Yannis Smaragdakis and Yanlei Diao and Christoph Csallner},
title = {SEDGE: Symbolic Example Data Generation for Dataflow Programs},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {235--245},
doi = {},
year = {2013},
}
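SEDGE's per-operator coverage goal (e.g., both passing and failing a filter) can be shown in miniature. The sketch below substitutes a brute-force search over a small domain for the SMT solving SEDGE actually performs; the predicate and all names are invented for illustration.

```python
# For each dataflow operator, produce example records covering both
# outcomes -- here, records that pass and fail a FILTER predicate.
def cover_filter(predicate, domain):
    """Return one record passing and one record failing `predicate`."""
    passing = next((x for x in domain if predicate(x)), None)
    failing = next((x for x in domain if not predicate(x)), None)
    return passing, failing

# Pig's  FILTER tuples BY age > 18 AND age < 65  as a Python predicate:
p, f = cover_filter(lambda age: 18 < age < 65, range(0, 100))
print(p, f)
```

A real solver-backed generator would handle predicates over arbitrary fields and compose constraints across chained operators; the coverage criterion is the same.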
Characteristic Studies of Loop Problems for Structural Test Generation via Symbolic Execution
Xusheng Xiao, Sihan Li, Tao Xie, and Nikolai Tillmann
(North Carolina State University, USA; University of Illinois at Urbana-Champaign, USA; Microsoft Research, USA)
Dynamic Symbolic Execution (DSE) is a state-of-the-art test-generation approach
that systematically explores program paths to generate high-covering tests.
In DSE, the presence of loops (especially unbound loops)
can cause an enormous or even infinite number of paths to be explored.
There exist techniques (such as bounded iteration, heuristics, and summarization)
that assist DSE in addressing loop problems.
However, there exists no literature-survey or empirical work that shows the pervasiveness of loop problems or
identifies challenges faced by these techniques on real-world open-source applications.
To fill this gap,
we provide characteristic studies to guide future research on addressing loop problems for DSE.
Our study methodology starts with a literature-survey study
investigating how technical problems such as loop problems
compromise automated software-engineering tasks such as test generation,
and which existing techniques have been proposed to deal with them.
It continues with an empirical study that applies these techniques to
real-world software applications, sampled based on the literature-survey
results and major open-source project hosting sites, investigating how
pervasive the technical problems are and how well existing techniques
address them in practice.
Based on this methodology,
our two-phase characteristic studies
identify that bounded iteration and heuristics are effective in addressing loop problems when used properly.
Our studies further identify challenges faced by these techniques
and provide guidelines for effectively addressing these challenges.
@InProceedings{ASE13p246,
author = {Xusheng Xiao and Sihan Li and Tao Xie and Nikolai Tillmann},
title = {Characteristic Studies of Loop Problems for Structural Test Generation via Symbolic Execution},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {246--256},
doi = {},
year = {2013},
}
Entropy-Based Test Generation for Improved Fault Localization
José Campos, Rui Abreu, Gordon Fraser, and Marcelo d'Amorim
(University of Porto, Portugal; University of Sheffield, UK; Federal University of Pernambuco, Brazil)
Spectrum-based Bayesian reasoning can effectively rank candidate fault locations based on passing/failing test cases, but the diagnostic quality highly depends on the size and diversity of the underlying test suite. As test suites in practice often do not exhibit the necessary properties, we present a technique to extend existing test suites with new test cases that optimize the diagnostic quality. We apply probability theory concepts to guide test case generation using entropy, such that the amount of uncertainty in the diagnostic ranking is minimized. Our EntBug prototype extends the search-based test generation tool EvoSuite to use entropy in the fitness function of its underlying genetic algorithm, and we applied it to seven real faults. Empirical results show that our approach reduces the entropy of the diagnostic ranking by 49% on average (compared to using the original test suite), leading to a 91% average reduction in the number of diagnosis candidates that must be inspected to find the true faulty one.
@InProceedings{ASE13p257,
author = {José Campos and Rui Abreu and Gordon Fraser and Marcelo d'Amorim},
title = {Entropy-Based Test Generation for Improved Fault Localization},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {257--267},
doi = {},
year = {2013},
}
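The entropy intuition behind EntBug is easy to demonstrate: a new test whose outcome discriminates among fault candidates lowers the Shannon entropy of the diagnostic probability distribution. The following is our own minimal illustration of that measure, not EntBug's actual fitness function.

```python
# Shannon entropy of a diagnostic ranking: lower entropy means the
# probability mass is concentrated on fewer fault candidates.
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely fault candidates: maximal uncertainty.
before = [0.25, 0.25, 0.25, 0.25]
# A failing test that covers only candidates 1 and 2 lets a Bayesian
# update concentrate the probability mass on those two.
after = [0.5, 0.5, 0.0, 0.0]
print(entropy(before), entropy(after))  # 2.0 -> 1.0
```

A generator that picks the candidate test minimizing expected post-run entropy is exactly the kind of guidance the paper builds into the genetic algorithm's fitness function.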
Recommendations
Detecting Bad Smells in Source Code using Change History Information
Fabio Palomba, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, Andrea De Lucia, and Denys Poshyvanyk
(University of Salerno, Italy; University of Sannio, Italy; University of Molise, Italy; College of William and Mary, USA)
Code smells represent symptoms of poor implementation choices. Previous studies found that these smells make source code more difficult to maintain, possibly also increasing its fault-proneness. There are several approaches that identify smells based on code analysis techniques. However, we observe that many code smells are intrinsically characterized by how code elements change over time. Thus, relying solely on structural information may not be sufficient to detect all the smells accurately.
We propose an approach to detect five different code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy, by exploiting change history information mined from versioning systems. We applied our approach, coined HIST (Historical Information for Smell deTection), to eight software projects written in Java and, wherever possible, compared it with existing state-of-the-art smell detectors based on source code analysis. The results indicate that HIST's precision ranges between 61% and 80%, and its recall ranges between 61% and 100%. More importantly, the results confirm that HIST is able to identify code smells that cannot be identified through approaches solely based on code analysis.
@InProceedings{ASE13p268,
author = {Fabio Palomba and Gabriele Bavota and Massimiliano Di Penta and Rocco Oliveto and Andrea De Lucia and Denys Poshyvanyk},
title = {Detecting Bad Smells in Source Code using Change History Information},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {268--278},
doi = {},
year = {2013},
}
ACM SIGSOFT Distinguished Paper Award
Personalized Defect Prediction
Tian Jiang, Lin Tan, and Sunghun Kim
(University of Waterloo, Canada; Hong Kong University of Science and Technology, China)
Many defect prediction techniques have been proposed. While they often take the author of the code into consideration, none of these techniques build a separate prediction model for each developer. Different developers have different coding styles, commit frequencies, and experience levels, causing different defect patterns. When the defects of different developers are combined, such differences are obscured, hurting prediction performance.
This paper proposes personalized defect prediction—building a separate prediction model for each developer to predict software defects. As a proof of concept, we apply our personalized defect prediction to classify defects at the file change level. We evaluate our personalized change classification technique on six large software projects written in C and Java—the Linux kernel, PostgreSQL, Xorg, Eclipse, Lucene and Jackrabbit. Our personalized approach can discover up to 155 more bugs than the traditional change classification (210 versus 55) if developers inspect the top 20% lines of code that are predicted buggy. In addition, our approach improves the F1-score by 0.01–0.06 compared to the traditional change classification.
@InProceedings{ASE13p279,
author = {Tian Jiang and Lin Tan and Sunghun Kim},
title = {Personalized Defect Prediction},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {279--289},
doi = {},
year = {2013},
}
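The core idea of personalized defect prediction, one model per developer, can be sketched with a deliberately tiny "model" (a per-feature defect rate). All names and the change history below are invented; the paper's classifiers are far richer than this stand-in.

```python
# One defect-rate model per developer: the same code feature can carry
# very different risk depending on who made the change.
from collections import defaultdict

class DefectModel:
    def __init__(self):
        self.buggy = defaultdict(int)
        self.total = defaultdict(int)
    def train(self, feature, is_buggy):
        self.total[feature] += 1
        self.buggy[feature] += is_buggy
    def risk(self, feature):
        n = self.total[feature]
        return self.buggy[feature] / n if n else 0.0

# changes: (developer, touched subsystem, introduced a bug?)
history = [("alice", "parser", 1), ("alice", "parser", 1),
           ("alice", "ui", 0), ("bob", "parser", 0),
           ("bob", "parser", 0), ("bob", "ui", 1)]

personal = defaultdict(DefectModel)
for dev, feat, bug in history:
    personal[dev].train(feat, bug)

# A global model would average these two signals away:
print(personal["alice"].risk("parser"), personal["bob"].risk("parser"))
```

A global model trained on the combined history would assign "parser" a risk of 0.5 for both developers, which is exactly the obscuring effect the abstract describes.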
Automatic Recommendation of API Methods from Feature Requests
Ferdian Thung, Shaowei Wang, David Lo, and Julia Lawall
(Singapore Management University, Singapore; Inria, France; Lip6, France)
Developers often receive many feature requests. To implement these features, developers can leverage various methods from third party libraries. In this work, we propose an automated approach that takes as input a textual description of a feature request. It then recommends methods in library APIs that developers can use to implement the feature. Our recommendation approach learns from records of other changes made to software systems, and compares the textual description of the requested feature with the textual descriptions of various API methods. We have evaluated our approach on more than 500 feature requests of Axis2/Java, CXF, Hadoop Common, HBase, and Struts 2. Our experiments show that our approach is able to recommend the right methods from 10 libraries with an average recall-rate@5 of 0.690 and recall-rate@10 of 0.779. We also show that the state-of-the-art approach by Chan et al., which recommends API methods based on precise text phrases, is unable to handle feature requests.
@InProceedings{ASE13p290,
author = {Ferdian Thung and Shaowei Wang and David Lo and Julia Lawall},
title = {Automatic Recommendation of API Methods from Feature Requests},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {290--300},
doi = {},
year = {2013},
}
Variability-Aware Performance Prediction: A Statistical Learning Approach
Jianmei Guo, Krzysztof Czarnecki, Sven Apel, Norbert Siegmund, and Andrzej Wąsowski
(University of Waterloo, Canada; University of Passau, Germany; IT University of Copenhagen, Denmark)
Configurable software systems allow stakeholders to derive program variants by selecting features. Understanding the correlation between feature selections and performance is important for stakeholders to be able to derive a program variant that meets their requirements. A major challenge in practice is to accurately predict performance based on a small sample of measured variants, especially when features interact. We propose a variability-aware approach to performance prediction via statistical learning. The approach works progressively with random samples, without additional effort to detect feature interactions. Empirical results on six real-world case studies demonstrate an average of 94% prediction accuracy based on small random samples. Furthermore, we investigate why the approach works by a comparative analysis of performance distributions. Finally, we compare our approach to an existing technique and guide users to choose one or the other in practice.
@InProceedings{ASE13p301,
author = {Jianmei Guo and Krzysztof Czarnecki and Sven Apel and Norbert Siegmund and Andrzej Wąsowski},
title = {Variability-Aware Performance Prediction: A Statistical Learning Approach},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {301--311},
doi = {},
year = {2013},
}
Security
A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling
Mahinthan Chandramohan, Hee Beng Kuan Tan, Lionel C. Briand, Lwin Khin Shar, and Bindu Madhavi Padmanabhuni
(Nanyang Technological University, Singapore; University of Luxembourg, Luxembourg)
In recent years, malware (malicious software) has greatly evolved and has become very sophisticated. The evolution of malware makes it difficult to detect using traditional signature-based malware detectors. Thus, researchers have proposed various behavior-based malware detection techniques to mitigate this problem. However, there are still serious shortcomings, related to scalability and computational complexity, in existing malware behavior modeling techniques. This raises questions about the practical applicability of these techniques.
This paper proposes and evaluates a bounded feature space behavior modeling (BOFM) framework for scalable malware detection. BOFM models the interactions between software (which can be malware or benign) and security-critical OS resources in a scalable manner. Information collected at run-time according to this model is then used by machine learning algorithms to learn how to accurately classify software as malware or benign. One of the key problems with simple malware behavior modeling (e.g., n-gram model) is that the number of malware features (i.e., signatures) grows proportionally to the size of execution traces, with a resulting malware feature space that is so large that it makes the detection process very challenging. On the other hand, in BOFM, the malware feature space is bounded by an upper limit N, a constant, and the results of our experiments show that its computation time and memory usage are vastly lower than in currently reported malware detection techniques, while preserving or even improving their high detection accuracy.
@InProceedings{ASE13p312,
author = {Mahinthan Chandramohan and Hee Beng Kuan Tan and Lionel C. Briand and Lwin Khin Shar and Bindu Madhavi Padmanabhuni},
title = {A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {312--322},
doi = {},
year = {2013},
}
Automatically Partition Software into Least Privilege Components using Dynamic Data Dependency Analysis
Yongzheng Wu, Jun Sun, Yang Liu, and Jin Song Dong
(Singapore University of Technology and Design, Singapore; Nanyang Technological University, Singapore; National University of Singapore, Singapore)
The principle of least privilege requires that software components should be
granted only necessary privileges, so that compromising one component does not
lead to compromising others. However, writing privilege separated software is
difficult and, as a result, much software is monolithic, i.e., it
runs as a whole without separation. Manually rewriting monolithic software
into privilege separated software requires significant effort and can be error
prone. We propose ProgramCutter, a novel approach to automatically
partitioning monolithic software using dynamic data dependency analysis.
ProgramCutter works by constructing a data dependency graph whose nodes are
functions and edges are data dependencies between functions. The graph is then
partitioned into subgraphs where each subgraph represents a least privilege
component. The privilege separated software runs each component in a separated
process with confined system privileges. We evaluate it by applying it on four
open source software. We can reduce the privileged part of the program from
100% to below 22%, while having a reasonable execution time overhead. Since
ProgramCutter does not require any expert knowledge of the software, it not
only can be used by its developers for software refactoring, but also by end
users or system administrators. Our contributions are threefold: (i) we define
a quantitative measure of the security and performance of privilege separation;
(ii) we propose a graph-based approach to compute the optimal separation based
on dynamic information flow analysis; and (iii) the separation process is
automatic and does not require expert knowledge of the software.
@InProceedings{ASE13p323,
author = {Yongzheng Wu and Jun Sun and Yang Liu and Jin Song Dong},
title = {Automatically Partition Software into Least Privilege Components using Dynamic Data Dependency Analysis},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {323--333},
doi = {},
year = {2013},
}
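ProgramCutter's partitioning step can be sketched as a graph computation: keep functions connected by heavy data dependencies in the same component as the privileged functions, and cut the light edges. The shape below (weighted edges, reachability over edges above a threshold) is our illustrative assumption, not the paper's actual algorithm, and all function names are invented.

```python
# Grow the least-privilege component from the privileged functions,
# following only data dependency edges whose shared-data volume makes
# them too costly to cut.
from collections import defaultdict, deque

def privileged_component(edges, privileged, threshold):
    """edges: {(f, g): bytes_shared}.  Returns the set of functions that
    stay in the privileged process: those reachable from `privileged`
    over edges with weight >= threshold."""
    adj = defaultdict(list)
    for (f, g), w in edges.items():
        if w >= threshold:
            adj[f].append(g)
            adj[g].append(f)
    seen, queue = set(privileged), deque(privileged)
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

deps = {("main", "parse"): 10, ("main", "open_socket"): 2000,
        ("open_socket", "auth"): 5000, ("parse", "render"): 8}
print(sorted(privileged_component(deps, {"open_socket"}, 1000)))
```

Raising the threshold shrinks the privileged component at the price of more cross-process data traffic, which is the security/performance trade-off the paper quantifies.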
Finding Architectural Flaws using Constraints
Radu Vanciu and Marwan Abi-Antoun
(Wayne State University, USA)
During Architectural Risk Analysis (ARA), security architects use a runtime architecture to look for security vulnerabilities that are architectural flaws rather than coding defects. The current ARA process, however, is mostly informal and manual. In this paper, we propose Scoria, a semi-automated approach for finding architectural flaws. Scoria uses a sound, hierarchical object graph with abstract objects and dataflow edges, where edges can refer to nodes in the graph. The architects can augment the object graph with security properties, which can express security information unavailable in code. Scoria allows architects to write queries on the graph in terms of the hierarchy, reachability, and provenance of a dataflow object. Based on the query results, the architects enhance their knowledge of the system security and write expressive constraints. The expressiveness is richer than previous approaches that check only for the presence or absence of communication or do not track a dataflow as an object. To evaluate Scoria, we apply these constraints to several extended examples adapted from the CERT standard for Java to confirm that Scoria can detect injected architectural flaws. Next, we write constraints to enforce an Android security policy and find one architectural flaw in one Android application.
@InProceedings{ASE13p334,
author = {Radu Vanciu and Marwan Abi-Antoun},
title = {Finding Architectural Flaws using Constraints},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {334--344},
doi = {},
year = {2013},
}
Debugging
Improving Bug Localization using Structured Information Retrieval
Ripon K. Saha, Matthew Lease, Sarfraz Khurshid, and Dewayne E. Perry
(University of Texas at Austin, USA)
Locating bugs is important, difficult, and expensive, particularly for large-scale systems. To address this, natural language information retrieval techniques are increasingly being used to suggest potential faulty source files given bug reports. While these techniques are very scalable, in practice their effectiveness remains low in accurately localizing bugs to a small number of files. Our key insight is that structured information retrieval based on code constructs, such as class and method names, enables more accurate bug localization. We present BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available. We build BLUiR on a proven, open source IR toolkit that anyone can use. Our work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice. We evaluate BLUiR on four open source projects with approximately 3,400 bugs. Results show that BLUiR matches or outperforms a current state-of-the-art tool across applications considered, even when BLUiR does not use bug similarity data used by the other tool.
@InProceedings{ASE13p345,
author = {Ripon K. Saha and Matthew Lease and Sarfraz Khurshid and Dewayne E. Perry},
title = {Improving Bug Localization using Structured Information Retrieval},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {345--355},
doi = {},
year = {2013},
}
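The structured-retrieval insight behind BLUiR, scoring bug-report terms against code constructs field by field rather than against a flat bag of words, can be reduced to a toy scorer. This is an assumed simplification for illustration only: the real tool builds on a full IR toolkit with tf-idf style weighting, and the file contents below are invented.

```python
# Score each source file by summing per-field matches, so a hit on a
# class or method name counts in addition to a hit in the file body.
def score(report_terms, file_fields):
    return sum(sum(1 for t in report_terms if t in terms)
               for terms in file_fields.values())

bug_report = {"crash", "parser", "null"}
files = {
    "Parser.java": {"class":   {"parser"},
                    "methods": {"parse", "reset"},
                    "body":    {"token", "null", "stream"}},
    "Ui.java":     {"class":   {"ui"},
                    "methods": {"draw"},
                    "body":    {"widget", "crash"}},
}
ranked = sorted(files, key=lambda f: -score(bug_report, files[f]))
print(ranked)
```

Because "parser" matches a class name, Parser.java outranks Ui.java even though both files match one body term at most, which is the structural signal a flat retrieval model would dilute.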
Leveraging Program Equivalence for Adaptive Program Repair: Models and First Results
Westley Weimer, Zachary P. Fry, and Stephanie Forrest
(University of Virginia, USA; University of New Mexico, USA)
Software bugs remain a compelling problem. Automated program repair is a
promising approach for reducing cost, and many methods have recently
demonstrated positive results. However, success on any particular
bug is variable, as is the cost to find a repair. This paper focuses
on generate-and-validate repair methods that enumerate candidate repairs
and use test cases to define correct behavior. We formalize repair cost in
terms of test executions, which dominate most test-based repair algorithms.
Insights from this model lead to a novel deterministic repair algorithm
that computes a patch quotient space with respect to an approximate
semantic equivalence relation. This allows syntactic and dataflow
analysis techniques to dramatically reduce the repair search space.
Generate-and-validate program repair is
shown to be a dual of mutation testing, suggesting several possible
cross-fertilizations. Evaluating on 105 real-world bugs in programs
totaling 5MLOC and involving 10,000 tests, our new algorithm requires an
order-of-magnitude fewer test evaluations than the previous
state-of-the-art and is over three times more efficient monetarily.
@InProceedings{ASE13p356,
author = {Westley Weimer and Zachary P. Fry and Stephanie Forrest},
title = {Leveraging Program Equivalence for Adaptive Program Repair: Models and First Results},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {356--366},
doi = {},
year = {2013},
}
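The quotient-space idea, validating one representative per equivalence class of candidate patches instead of every candidate, can be shown with a toy. Equivalence here is "same observable behavior on a few probe inputs", a crude stand-in for the paper's syntactic and dataflow analyses; the candidate patches are invented.

```python
# Group candidate patches into approximate semantic equivalence classes
# and keep one representative per class, so the test suite runs once per
# class instead of once per candidate.
def quotient(patches, probe_inputs):
    classes = {}
    for patch in patches:
        key = tuple(patch(x) for x in probe_inputs)
        classes.setdefault(key, patch)   # first patch seen becomes the rep
    return list(classes.values())

# Candidate "patches" for an abs() function, several of them equivalent:
candidates = [lambda x: x if x >= 0 else -x,
              lambda x: -x if x < 0 else x,   # same behavior as above
              lambda x: max(x, -x),           # also equivalent
              lambda x: x]                    # the unfixed original
reps = quotient(candidates, probe_inputs=[-3, 0, 5])
print(len(candidates), "candidates ->", len(reps), "test-suite runs")
```

Since test executions dominate repair cost in the paper's model, collapsing four candidates into two classes directly halves the validation work in this toy.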
Detecting and Characterizing Semantic Inconsistencies in Ported Code
Baishakhi Ray, Miryung Kim, Suzette Person, and Neha Rungta
(University of Texas at Austin, USA; NASA Langley Research Center, USA; NASA Ames Research Center, USA)
Adding similar features and bug fixes often requires porting program patches from reference implementations and adapting them to target implementations. Porting errors may result from faulty adaptations or inconsistent updates. This paper investigates (1) the types of porting errors found in practice, and (2) how to detect and characterize potential porting errors. Analyzing version histories, we define five categories of porting errors, including incorrect control- and data-flow, code redundancy, inconsistent identifier renaming, etc. Leveraging this categorization, we design a static control- and data-dependence analysis technique, SPA, to detect and characterize porting inconsistencies. Our evaluation on code from four open-source projects shows that SPA can detect porting inconsistencies with 65% to 73% precision and 90% recall, and identify inconsistency types with 58% to 63% precision and 92% to 100% recall. In a comparison with two existing error detection tools, SPA improves precision by 14 to 17 percentage points.
@InProceedings{ASE13p367,
author = {Baishakhi Ray and Miryung Kim and Suzette Person and Neha Rungta},
title = {Detecting and Characterizing Semantic Inconsistencies in Ported Code},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {367--377},
doi = {},
year = {2013},
}
Lightweight Control-Flow Instrumentation and Postmortem Analysis in Support of Debugging
Peter Ohmann and Ben Liblit
(University of Wisconsin-Madison, USA)
Debugging is difficult and costly. As a human programmer looks for a bug, it would be helpful to see a complete trace of events leading to the point of failure. Unfortunately, full tracing is simply too slow to use in deployment, and may even be impractical during testing.
We aid post-deployment debugging by giving programmers additional information about program activity shortly before failure. We use latent information in post-failure memory dumps, augmented by low-overhead, tunable run-time tracing. Our results with a realistically-tuned tracing scheme show low enough overhead (0-5%) to be used in production runs. We demonstrate several potential uses of this enhanced information, including a novel postmortem static slice restriction technique and a reduced view of potentially-executed code. Experimental evaluation shows our approach to be very effective, such as shrinking stack-sensitive interprocedural static slices by 49-78% in larger applications.
@InProceedings{ASE13p378,
author = {Peter Ohmann and Ben Liblit},
title = {Lightweight Control-Flow Instrumentation and Postmortem Analysis in Support of Debugging},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {378--388},
doi = {},
year = {2013},
}
ACM SIGSOFT Distinguished Paper Award
Resources
Characterizing and Detecting Resource Leaks in Android Applications
Chaorong Guo, Jian Zhang, Jun Yan, Zhiqiang Zhang, and Yanli Zhang
(Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)
Android phones come with a host of hardware
components embedded in them, such as Camera, Media Player,
and Sensor. Most of these components are exclusive resources
or consume more memory/energy than usual, and they should be
explicitly released by developers. Missing release operations
on these resources may cause serious problems such as
performance degradation or system crashes. These kinds of
defects are called resource leaks. This paper focuses on resource
leak problems in Android apps, and presents our lightweight
static analysis tool called Relda, which can automatically analyze
an application’s resource operations and locate the resource leaks.
We propose an automatic method for detecting resource leaks
based on a modified Function Call Graph, which handles the
features of event-driven mobile programming by analyzing the
callbacks defined in Android framework. Our experimental data
shows that Relda is effective in detecting resource leaks in real
Android apps.
@InProceedings{ASE13p389,
author = {Chaorong Guo and Jian Zhang and Jun Yan and Zhiqiang Zhang and Yanli Zhang},
title = {Characterizing and Detecting Resource Leaks in Android Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {389--398},
doi = {},
year = {2013},
}
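The kind of check Relda performs can be illustrated with a simplified path walk: track which resources are acquired along a callback path and report any that are never released before the path ends. The acquire/release tables and paths below are invented for illustration, not Relda's actual analysis, which works on a callback-aware function call graph.

```python
# Walk each possible callback path and report resources acquired but
# never released on it -- a resource leak in the paper's sense.
ACQUIRE = {"Camera.open": "camera", "MediaPlayer.create": "player"}
RELEASE = {"Camera.release": "camera", "MediaPlayer.release": "player"}

def leaks_on_path(path):
    held = set()
    for call in path:
        if call in ACQUIRE:
            held.add(ACQUIRE[call])
        elif call in RELEASE:
            held.discard(RELEASE[call])
    return held  # resources still held when the path ends

ok  = ["onResume", "Camera.open", "takePicture", "Camera.release"]
# An onPause path that forgets the release -> leak report:
bad = ["onResume", "Camera.open", "onPause"]
print(leaks_on_path(ok), leaks_on_path(bad))
```

The event-driven twist the paper addresses is enumerating those paths in the first place: Android callbacks like onPause can interleave with normal control flow, so the path set comes from the modified function call graph rather than from a single main routine.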
Dangling References in Multi-configuration and Dynamic PHP-Based Web Applications
Hung Viet Nguyen, Hoan Anh Nguyen, Tung Thanh Nguyen, Anh Tuan Nguyen, and Tien N. Nguyen
(Iowa State University, USA)
PHP is a dynamic language popularly used in Web development for writing server-side code to dynamically create multiple versions of client-side pages at run time for different configurations. A PHP program contains code to be executed or produced for multiple configurations/versions. That dynamism and multi-configuration nature leads to dangling references. Specifically, in the execution for a configuration, a reference to a variable or a call to a function is dangling if its corresponding declaration cannot be found. We conducted an exploratory study to confirm the existence of such dangling reference errors including dangling cross-language and embedded references in the client-side HTML/JavaScript code and in data-accessing SQL code that are embedded in scattered PHP code. Dangling references have caused run-time fatal failures and security vulnerabilities.
We developed DRC, a static analysis method to detect such dangling references. DRC uses symbolic execution to collect PHP declarations/references and to approximate all versions of the generated output, and then extracts embedded declarations/references. It associates each detected declaration/reference with a conditional constraint that represents the execution paths (i.e. configurations/versions) containing that declaration/reference. It then validates references against declarations via a novel dangling reference detection algorithm. Our empirical evaluation shows that DRC detects dangling references with high accuracy. It revealed 83 yet undiscovered defects caused by dangling references.
@InProceedings{ASE13p399,
author = {Hung Viet Nguyen and Hoan Anh Nguyen and Tung Thanh Nguyen and Anh Tuan Nguyen and Tien N. Nguyen},
title = {Dangling References in Multi-configuration and Dynamic PHP-Based Web Applications},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {399--409},
doi = {},
year = {2013},
}
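DRC's validation step can be modeled in miniature: tag every declaration and reference with the set of configurations in which it appears, and flag a reference as dangling in any configuration where it occurs without a matching declaration. This is our simplification of the paper's conditional constraints (which are path conditions from symbolic execution, not flat sets); the PHP function names are invented.

```python
# A reference dangles in every configuration where it appears but its
# declaration does not.
def dangling(decls, refs):
    issues = {}
    for name, ref_cfgs in refs.items():
        missing = ref_cfgs - decls.get(name, set())
        if missing:
            issues[name] = missing
    return issues

decls = {"render_header": {"desktop", "mobile"},
         "render_ads":    {"desktop"}}
refs  = {"render_header": {"desktop", "mobile"},
         "render_ads":    {"desktop", "mobile"}}   # also called on mobile!
print(dangling(decls, refs))
```

The report pinpoints both the dangling reference and the configuration that triggers it, which is the actionable output a developer needs for a multi-configuration code base.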
Dynamically Transforming Data Structures
Erik Österlund and Welf Löwe
(Linnaeus University, Sweden)
Fine-tuning which data structure implementation to use for a given problem is sometimes tedious work since the optimum solution depends on the context, i.e., on the operation sequences, actual parameters as well as on the hardware available at run time. Sometimes a data structure with higher asymptotic time complexity performs better in certain contexts because of lower constants. The optimal solution may not even be possible to determine at compile time.
We introduce transformation data structures that dynamically change their internal representation variant based on a possibly changing context. The most suitable variant is selected at run time rather than at compile time.
We demonstrate the effect on performance with a transformation ArrayList data structure using an array variant and a linked hash bag variant as alternative internal representations. Using our transformation ArrayList, the standard DaCapo benchmark suite shows a performance gain of 5.19% on average.
@InProceedings{ASE13p410,
author = {Erik Österlund and Welf Löwe},
title = {Dynamically Transforming Data Structures},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {410--420},
doi = {},
year = {2013},
}
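The idea of a transformation data structure, one that switches its internal representation at run time based on the observed operation mix, can be sketched in a few lines. This is a small illustration of the concept, not the authors' Java implementation: the class, threshold, and switching policy below are invented.

```python
# A collection that starts as a list (cheap appends and iteration) and
# transforms into a set once membership tests dominate the workload.
class AdaptiveBag:
    THRESHOLD = 10                       # contains-calls before switching
    def __init__(self):
        self.data, self.lookups = [], 0
    def add(self, x):
        if isinstance(self.data, list):
            self.data.append(x)
        else:
            self.data.add(x)
    def __contains__(self, x):
        self.lookups += 1
        if isinstance(self.data, list) and self.lookups >= self.THRESHOLD:
            self.data = set(self.data)   # one O(n) transformation
        return x in self.data
    def representation(self):
        return type(self.data).__name__

bag = AdaptiveBag()
for i in range(1000):
    bag.add(i)
print(bag.representation())              # still a plain list
for _ in range(20):
    _ = 999 in bag
print(bag.representation())              # switched to O(1) lookups
```

The one-time O(n) conversion cost is amortized over the subsequent cheap lookups, mirroring the paper's observation that the asymptotically "worse" variant can win or lose depending on the context.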
Towards Precise Metrics for Predicting Graph Query Performance
Benedek Izsó, Zoltán Szatmári, Gábor Bergmann, Ákos Horváth, and István Ráth
(Budapest University of Technology and Economics, Hungary)
Queries are the foundations of data intensive applications. In model-driven software engineering (MDSE), model queries are core technologies of tools and transformations. As software models are rapidly increasing in size and complexity, most MDSE tools frequently exhibit scalability issues that decrease developer productivity and increase costs. As a result, choosing the right model representation and query evaluation approach is a significant challenge for tool engineers. In the current paper, we aim to provide a benchmarking framework for the systematic investigation of query evaluation performance. More specifically, we experimentally evaluate (existing and novel) query and instance model metrics to highlight which provide sufficient performance estimates for different MDSE scenarios in various model query tools. For that purpose, we also present a comparative benchmark, which is designed to differentiate model representation and graph query evaluation approaches according to their performance when using large models and complex queries.
@InProceedings{ASE13p421,
author = {Benedek Izsó and Zoltán Szatmári and Gábor Bergmann and Ákos Horváth and István Ráth},
title = {Towards Precise Metrics for Predicting Graph Query Performance},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {421--431},
doi = {},
year = {2013},
}
Specification Mining
TzuYu: Learning Stateful Typestates
Hao Xiao, Jun Sun, Yang Liu, Shang-Wei Lin, and Chengnian Sun
(Nanyang Technological University, Singapore; Singapore University of Technology and Design, Singapore; National University of Singapore, Singapore)
Behavioral models are useful for various software engineering tasks. They are, however, often missing in practice. Thus, specification mining was proposed to tackle this problem. Existing work either focuses on learning simple behavioral models such as finite-state automata, or relies on techniques (e.g., symbolic execution) to infer finite-state machines equipped with data states, referred to as stateful typestates. The former is often inadequate as finite-state automata lack expressiveness in capturing behaviors of data-rich programs, whereas the latter is often not scalable. In this work, we propose a fully automated approach to learn stateful typestates by extending the classic active learning process to generate transition guards (i.e., propositions on data states). The proposed approach has been implemented in a tool called TzuYu and evaluated against a number of Java classes. The evaluation results show that TzuYu is capable of learning correct stateful typestates more efficiently.
@InProceedings{ASE13p432,
author = {Hao Xiao and Jun Sun and Yang Liu and Shang-Wei Lin and Chengnian Sun},
title = {TzuYu: Learning Stateful Typestates},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {432--442},
doi = {},
year = {2013},
}
Mining Branching-Time Scenarios
Dirk Fahland, David Lo, and Shahar Maoz
(Eindhoven University of Technology, Netherlands; Singapore Management University, Singapore; Tel Aviv University, Israel)
Specification mining extracts candidate specifications from existing systems, to
be used for downstream tasks such as testing and verification. Specifically, we
are interested in the extraction of behavior models from execution traces.
In this paper we introduce mining of branching-time scenarios in the form of
existential, conditional Live Sequence Charts, using a statistical
data-mining algorithm. We show the power of branching scenarios to reveal
alternative scenario-based behaviors, which could not be mined by previous
approaches.
The work contrasts and complements previous works on mining linear-time
scenarios. An implementation and evaluation over execution trace sets recorded
from several real-world applications shows the unique contribution of mining
branching-time scenarios to the state-of-the-art in specification mining.
@InProceedings{ASE13p443,
author = {Dirk Fahland and David Lo and Shahar Maoz},
title = {Mining Branching-Time Scenarios},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {443--453},
doi = {},
year = {2013},
}
Models and Complexity
Measuring the Structural Complexity of Feature Models
Richard Pohl, Vanessa Stricker, and Klaus Pohl
(University of Duisburg-Essen, Germany)
The automated analysis of feature models (FMs) is
based on SAT, BDD, and CSP – approaches built on known NP-complete problems.
Therefore, the analysis could have an exponential worst-case
execution time. However, for many practically relevant analysis
cases, state-of-the-art (SOTA) analysis tools quite successfully
master the problem of exponential worst-case execution time
based on heuristics. So far, however, very little is known about
the structural properties of FMs that cause the execution time
(hardness) of analyzing a given FM to increase unpredictably
for SOTA analysis tools. In this paper, we propose to use
width measures from graph theory to characterize the structural
complexity of FMs as a basis for an estimation of the hardness of
analysis operations on FMs with SOTA analysis tools. We present
an experiment that we use to analyze the suitability of graph
width measures as a metric for the structural complexity of FMs
and the hardness of FM analysis. Such a complexity metric can
be used as a basis for a unified method to systematically improve
SOTA analysis tools.
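The reduction of FM analysis to a satisfiability problem can be sketched in a few lines (a hypothetical toy model, not the paper's experiment): encode feature relationships as propositional constraints and enumerate the configurations that satisfy them.

```python
from itertools import product

# Illustrative sketch: a tiny feature model as propositional constraints.
# Features (hypothetical): car (root), engine (mandatory child),
# gps (optional child), and a cross-tree constraint gps -> screen.
features = ["car", "engine", "gps", "screen"]

def valid(cfg):
    car, engine, gps, screen = (cfg[f] for f in features)
    return (car                       # root is always selected
            and engine == car         # mandatory child
            and (not gps or car)      # optional child implies parent
            and (not gps or screen))  # cross-tree constraint

# Brute force over all assignments; real tools use SAT/BDD/CSP solvers,
# and it is exactly this search whose hardness the paper tries to predict.
configs = [dict(zip(features, bits))
           for bits in product([False, True], repeat=len(features))]
sound = [c for c in configs if valid(c)]
print(len(sound))  # 3
```

The brute-force loop is exponential in the number of features; width measures over the FM's constraint graph aim to predict when solvers escape that worst case.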
@InProceedings{ASE13p454,
author = {Richard Pohl and Vanessa Stricker and Klaus Pohl},
title = {Measuring the Structural Complexity of Feature Models},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {454--464},
doi = {},
year = {2013},
}
Scalable Product Line Configuration: A Straw to Break the Camel’s Back
Abdel Salam Sayyad, Joseph Ingram, Tim Menzies, and Hany Ammar
(West Virginia University, USA)
Software product lines are hard to configure. Techniques that work for medium-sized product lines
fail for much larger product lines such as the Linux kernel with 6000+ features. This paper presents simple heuristics that help the Indicator-Based Evolutionary Algorithm (IBEA) in finding sound and optimum configurations of very large variability models in the presence of competing objectives. We employ a combination of static and evolutionary learning of model structure, in addition to utilizing a pre-computed solution used as a “seed” in the midst of a randomly generated initial population. The seed solution works like a single straw that is enough to break the camel’s back – given that it is a feature-rich seed. We show promising results where we can find 30 sound solutions for configuring upward of 6000 features within 30 minutes.
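The seeding idea can be sketched minimally (an illustrative single-objective toy, not IBEA or the paper's variability models): inject one pre-computed, feature-rich solution into an otherwise random initial population.

```python
import random

# Illustrative sketch of a seeded initial population for an evolutionary
# algorithm. Individuals are feature-selection bit vectors; the toy
# objective simply counts selected features.
random.seed(0)
N_FEATURES, POP_SIZE = 20, 10

def random_individual():
    return [random.randint(0, 1) for _ in range(N_FEATURES)]

def fitness(ind):
    # Toy single objective: maximize the number of selected features.
    return sum(ind)

seed = [1] * N_FEATURES  # the pre-computed, feature-rich seed solution
population = [random_individual() for _ in range(POP_SIZE - 1)] + [seed]

# On this objective the seed dominates the random individuals from
# generation zero, which is what lets it guide the search.
best = max(population, key=fitness)
print(fitness(best))  # 20
```

In the paper's multi-objective setting the effect is analogous: the seed gives the population an immediately sound, feature-rich point to evolve from instead of waiting for one to emerge at random.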
@InProceedings{ASE13p465,
author = {Abdel Salam Sayyad and Joseph Ingram and Tim Menzies and Hany Ammar},
title = {Scalable Product Line Configuration: A Straw to Break the Camel’s Back},
booktitle = {Proc.\ ASE},
publisher = {IEEE},
pages = {465--474},
doi = {},
year = {2013},
}