2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM),
September 27-28, 2015,
Bremen, Germany
Frontmatter
Message from the Chairs
Willkommen in Bremen!
After 14 successful editions of SCAM, we are delighted to welcome you to Bremen, Germany, for another edition of the IEEE International Working Conference on Source Code Analysis and Manipulation, collocated with the 31st IEEE International Conference on Software Maintenance and Evolution (ICSME 2015).
SCAM promotes discussion and interaction among researchers and practitioners working on theory, techniques, and applications that concern the analysis and manipulation of the source code of computer systems. We started out as compiler hackers, but now we also do MSR-type analysis and even code comprehension studies using technologies like eye tracking. Software plays an essential role in our lives, in ways both obvious and subtle, and will continue to do so in the years to come. While much attention in the wider software engineering community is directed towards other aspects of systems development and evolution, such as specification, design and requirements engineering, it is the source code that contains the precise, and sometimes only, definitive description of the behavior of the system. SCAM focuses on the techniques and tools themselves – what they can achieve, how they can be improved, refined and combined.
Main Research
Empirical Studies I
Sun, Sep 27, 11:00 - 12:30, HS 1010 (Chair: Alexander Serebrenik)
ORBS and the Limits of Static Slicing
David Binkley, Nicolas Gold, Mark Harman, Syed Islam, Jens Krinke, and Shin Yoo
(Loyola University Maryland, USA; University College London, UK; University of East London, UK; KAIST, South Korea)
Observation-based slicing is a recently introduced, language-independent slicing technique based on the dependencies observable from program behaviour. Due to the well-known limits of dynamic analysis, we may only compute an under-approximation of the true observation-based slice. However, because the observation-based slice captures all possible dependences that can be observed, even such approximations can yield insight into the limitations of static slicing. For example, a static slice S that is strictly smaller than the corresponding observation-based slice is potentially unsafe. We present the results of three sets of experiments on 12 different programs, including benchmarks and larger programs, which investigate the relationship between static and observation-based slicing. We show that, in extreme cases, observation-based slices can find the true minimal static slice, where static techniques cannot. For more typical cases, our results illustrate the potential for observation-based slicing to highlight limitations in static slicers. Finally, we report on the sensitivity of observation-based slicing to test quality.
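To make the idea concrete, here is a minimal Python sketch of the deletion-based loop that underlies observation-based slicing; it is not the authors' ORBS implementation, and the oracle `behaviour_preserved` (compile, run, and compare the trajectory of the slicing criterion against the original run) is a hypothetical stand-in:

```python
# A deletion-based loop in the spirit of observation-based slicing (ORBS).
# `behaviour_preserved` is a hypothetical oracle supplied by the caller.
def observation_based_slice(lines, behaviour_preserved, max_window=3):
    changed = True
    while changed:                                  # iterate to a fixed point
        changed = False
        i = 0
        while i < len(lines):
            for w in range(max_window, 0, -1):      # try larger windows first
                candidate = lines[:i] + lines[i + w:]
                if behaviour_preserved(candidate):
                    lines = candidate               # keep the deletion
                    changed = True
                    break
            else:
                i += 1                              # nothing deletable here
    return lines
```

Trying larger deletion windows first lets the loop remove multi-line constructs that are only deletable as a unit.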
@InProceedings{SCAM15p1,
author = {David Binkley and Nicolas Gold and Mark Harman and Syed Islam and Jens Krinke and Shin Yoo},
title = {ORBS and the Limits of Static Slicing},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {1--10},
doi = {},
year = {2015},
}
Intent, Tests, and Release Dependencies: Pragmatic Recipes for Source Code Integration
Martin Brandtner, Philipp Leitner, and Harald C. Gall
(University of Zurich, Switzerland)
Continuous integration of source code changes, for example via pull-request driven contribution channels, has become standard in many software projects. However, the decision to integrate source code changes into a release is complex and has to be taken by a software manager. In this work, we identify a set of three pragmatic recipes plus variations to support the decision making of integrating code contributions into a release. These recipes cover the isolation of source code changes, the contribution of test code, and the linking of commits to issues. We analyze the development history of 21 open-source software projects to evaluate whether, and to what extent, those recipes are followed in open-source projects. The results of our analysis show that open-source projects largely follow these recipes, with a compliance level above 75%. Hence, we conclude that the identified recipes plus variations can be seen as widespread, relevant best practices for source code integration.
@InProceedings{SCAM15p11,
author = {Martin Brandtner and Philipp Leitner and Harald C. Gall},
title = {Intent, Tests, and Release Dependencies: Pragmatic Recipes for Source Code Integration},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {11--20},
doi = {},
year = {2015},
}
The Use of C++ Exception Handling Constructs: A Comprehensive Study
Rodrigo Bonifácio, Fausto Carvalho, Guilherme N. Ramos, Uirá Kulesza, and Roberta Coelho
(University of Brasília, Brazil; Federal University of Rio Grande do Norte, Brazil)
Exception handling (EH) is a well-known mechanism that aims at improving software reliability in a modular way---allowing a better separation between the code that deals with exceptional conditions and the code that deals with the normal control flow of a program. Although the exception handling mechanism was conceived almost 40 years ago, formulating a reasonable design of exception handling code is still considered a challenge, which might hinder its widespread use. This paper reports the results of an empirical study that uses a mixed-method approach to investigate the adoption of the exception handling mechanism in C++. Firstly, we carried out a static analysis investigation to understand how developers employ the exception handling constructs of C++, considering 65 open-source systems (which comprise 34 million lines of C++ code overall). Then, to better understand the findings from the static analysis phase, we conducted a survey involving 145 C++ developers who have contributed to the subject systems. Some of the findings consistently detected during this mixed-method study reveal that, for several projects, the use of exception handling constructs is scarce and developers favor the use of other strategies to deal with exceptional conditions. In addition, the survey respondents consider that incompatibility with existing C code and libraries, extra performance costs (in terms of response time and size of the compiled code), and lack of expertise to design an exception handling strategy are among the reasons for avoiding the use of exception handling constructs.
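As a rough illustration of the static-analysis phase, the following Python sketch counts occurrences of C++ exception-handling keywords in a source tree. A real study, like the one reported here, would use a proper parser rather than regular expressions, and the project path is hypothetical:

```python
import re
from pathlib import Path

# Count C++ exception-handling keywords per project. Regular expressions are
# a crude stand-in for parser-based analysis: they miss comments, strings,
# and macros. "some_cxx_project" is a hypothetical path.
EH_TOKENS = {
    "try":   re.compile(r"\btry\s*\{"),
    "catch": re.compile(r"\bcatch\s*\("),
    "throw": re.compile(r"\bthrow\b"),
}

def count_eh_constructs(root):
    totals = {name: 0 for name in EH_TOKENS}
    for path in Path(root).rglob("*.cpp"):
        text = path.read_text(errors="ignore")
        for name, pattern in EH_TOKENS.items():
            totals[name] += len(pattern.findall(text))
    return totals

print(count_eh_constructs("some_cxx_project"))
```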
@InProceedings{SCAM15p21,
author = {Rodrigo Bonifácio and Fausto Carvalho and Guilherme N. Ramos and Uirá Kulesza and Roberta Coelho},
title = {The Use of C++ Exception Handling Constructs: A Comprehensive Study},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {21--30},
doi = {},
year = {2015},
}
Multi-layer Software Configuration: Empirical Study on Wordpress
Mohammed Sayagh and Bram Adams
(Polytechnique Montréal, Canada)
Software can be adapted to different situations and platforms by changing its configuration. However, incorrect configurations can lead to configuration errors that are hard to resolve or understand, especially in the case of multi-layer architectures, where configuration options in each layer might contradict each other or be hard to trace to each other. Hence, this paper performs an empirical study on the occurrence of multi-layer configuration options across Wordpress (WP) plugins, WP, and the PHP engine. Our analyses show that WP and its plugins use on average 76 configuration options, a number that increases over time. We also find that each plugin uses on average 1.49% to 9.49% of all WP database options, and 1.38% to 15.18% of all WP configurable constants. 85.16% of all WP database options, 78.88% of all WP configurable constants, and 52 PHP configuration options are used by at least two plugins at the same time. Finally, we show how the latter options have a larger potential for questions and confusion amongst users.
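A minimal sketch of this kind of usage analysis, assuming a conventional WordPress plugin layout and matching only literal `get_option('...')` calls (`get_option` is the real WP API; everything else here is simplified):

```python
import re
from collections import defaultdict
from pathlib import Path

# Find WordPress database options read via get_option() in each plugin and
# report options shared by two or more plugins.
OPTION_CALL = re.compile(r"get_option\(\s*'([\w-]+)'")

def shared_options(plugins_dir):
    users = defaultdict(set)                 # option name -> plugins using it
    for plugin in Path(plugins_dir).iterdir():
        if plugin.is_dir():
            for php in plugin.rglob("*.php"):
                for opt in OPTION_CALL.findall(php.read_text(errors="ignore")):
                    users[opt].add(plugin.name)
    return {opt: ps for opt, ps in users.items() if len(ps) >= 2}
```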
@InProceedings{SCAM15p31,
author = {Mohammed Sayagh and Bram Adams},
title = {Multi-layer Software Configuration: Empirical Study on Wordpress},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {31--40},
doi = {},
year = {2015},
}
Code Search and Navigation
Sun, Sep 27, 16:00 - 18:00, HS 1010 (Chair: David Shepherd)
Can the Use of Types and Query Expansion Help Improve Large-Scale Code Search?
Otávio Augusto Lazzarini Lemos, Adriano Carvalho de Paula, Hitesh Sajnani, and Cristina V. Lopes
(Federal University of São Paulo, Brazil; University of California at Irvine, USA)
With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search, or IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects who produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.
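The contrast between KBCS and IDCS can be sketched as follows; the method-index record format and the tokenisation are invented for illustration:

```python
# Interface-driven matching: a candidate method matches a query when its
# return and parameter types agree and at least one keyword overlaps with
# the method's name tokens. Keyword-based search would use only the last check.
def matches(query, method):
    if query["return_type"] != method["return_type"]:
        return False
    if sorted(query["param_types"]) != sorted(method["param_types"]):
        return False
    return bool(query["keywords"] & method["name_tokens"])

query  = {"keywords": {"trim", "whitespace"},
          "return_type": "String", "param_types": ["String"]}
method = {"name_tokens": {"trim", "spaces"},
          "return_type": "String", "param_types": ["String"]}
print(matches(query, method))    # True: types align and 'trim' overlaps
```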
@InProceedings{SCAM15p41,
author = {Otávio Augusto Lazzarini Lemos and Adriano Carvalho de Paula and Hitesh Sajnani and Cristina V. Lopes},
title = {Can the Use of Types and Query Expansion Help Improve Large-Scale Code Search?},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {41--50},
doi = {},
year = {2015},
}
Using Changeset Descriptions as a Data Source to Assist Feature Location
Muslim Chochlov, Michael English, and Jim Buckley
(University of Limerick, Ireland)
Feature location attempts to assist developers in discovering functionality in source code. Many textual feature location techniques utilize information retrieval and rely on comments and identifiers of source code to describe software entities. An interesting alternative would be to employ the changeset descriptions of the code altered in that changeset as a data source to describe such software entities. To investigate this, we implement a technique utilizing changeset descriptions and conduct an empirical study to observe this technique's overall performance. Moreover, we study how the granularity (i.e., file or method level of software entities) and changeset range inclusion (i.e., most recent or all historical changesets) affect such an approach. The results of a preliminary study with the Rhino and Mylyn.Tasks systems suggest that the approach could lead to a potentially efficient feature location technique. They also suggest that configuring the technique at method-level granularity is advantageous in terms of effort, and that older changesets from older systems may reduce the effectiveness of the technique.
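A minimal sketch of the underlying retrieval step, using scikit-learn's TF-IDF vectorizer; the entity-to-changeset-description mapping is invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Describe each entity by the descriptions of the changesets that touched it,
# then rank entities against a feature query.
entity_docs = {
    "Parser.parse":  "fix operator precedence in expression parsing",
    "Editor.save":   "save dirty buffers before task switch",
    "Task.activate": "activate task context and restore editor state",
}
names = list(entity_docs)
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(entity_docs[n] for n in names)

def locate(query, top=2):
    scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
    return sorted(zip(names, scores), key=lambda p: -p[1])[:top]

print(locate("restore task context"))   # Task.activate should rank first
```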
@InProceedings{SCAM15p51,
author = {Muslim Chochlov and Michael English and Jim Buckley},
title = {Using Changeset Descriptions as a Data Source to Assist Feature Location},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {51--60},
doi = {},
year = {2015},
}
Automatically Identifying Focal Methods under Test in Unit Test Cases
Mohammad Ghafari, Carlo Ghezzi, and Konstantin Rubinov
(Politecnico di Milano, Italy; National University of Singapore, Singapore)
Modern iterative and incremental software development relies on continuous testing. The knowledge of test-to-code traceability links facilitates test-driven development and improves software evolution. Previous research identified traceability links between test cases and classes under test. Though this information is helpful, a finer granularity technique can provide more useful information beyond the knowledge of the class under test. In this paper, we focus on Java classes that instantiate stateful objects and propose an automated technique for precise detection of the focal methods under test in unit test cases. Focal methods represent the core of a test scenario inside a unit test case. Their main purpose is to affect an object's state that is then checked by other inspector methods whose purpose is ancillary and needs to be identified as such. Distinguishing focal from other (non-focal) methods is hard to accomplish manually. We propose an approach to detect focal methods under test automatically. An experimental assessment with real-world software shows that our approach identifies focal methods under test in more than 85% of cases, providing a basis for precise automatic recovery of test-to-code traceability links.
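A toy heuristic in the spirit of the problem (not the authors' algorithm): treat the last state-changing call before an assertion's inspector call as the focal method. The call trace and mutator/inspector labels are invented inputs:

```python
# Within one unit test, the last mutator call on the object under test before
# an inspector call (read inside an assertion) is taken as the focal method.
calls = [
    ("list.add",  "mutator"),
    ("list.add",  "mutator"),
    ("list.size", "inspector"),      # read inside the assertion
]

def focal_methods(calls):
    focal = []
    for i, (name, kind) in enumerate(calls):
        if kind == "inspector":
            mutators = [n for n, k in calls[:i] if k == "mutator"]
            if mutators:
                focal.append(mutators[-1])   # last mutator before the check
    return focal

print(focal_methods(calls))    # ['list.add']
```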
@InProceedings{SCAM15p61,
author = {Mohammad Ghafari and Carlo Ghezzi and Konstantin Rubinov},
title = {Automatically Identifying Focal Methods under Test in Unit Test Cases},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {61--70},
doi = {},
year = {2015},
}
Navigating Source Code with Words
Dawn Lawrie and David Binkley
(Loyola University Maryland, USA)
The hierarchical method of organizing information has proven beneficial in learning, in part because it maps well onto the human brain's memory. Exploiting this organizational strategy may help engineers cope with large software systems. In fact, such a strategy is already present in source code and is manifested in the class hierarchies of object-oriented programs. However, an engineer faced with fixing a bug or any similar need to locate the implementation of a particular feature in the code is less interested in the syntactic organization of the code and more interested in its conceptual organization. Therefore, a conceptual hierarchy would bring clear benefit. Fortunately, such a view can be extracted automatically from the source code. The hierarchy generating tool HierIT performs this task using an information-theoretic approach to identify "content-bearing" words and associate them hierarchically. The resulting hierarchy enables an engineer to better understand the concepts contained in a software system. An experiment was conducted to investigate, quantitatively and qualitatively, the value that such hierarchies bring. The quantitative evaluation first considers the Expected Mutual Information Measure (EMIM) between the set of topic words and natural language extracted from the source code. It then considers the Best Case Tree Walk (BCTW), which captures how "expensive" it is to find interesting documents. Finally, the hierarchies are considered qualitatively by investigating their perceived usefulness in a case study involving three engineers.
@InProceedings{SCAM15p71,
author = {Dawn Lawrie and David Binkley},
title = {Navigating Source Code with Words},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {71--80},
doi = {},
year = {2015},
}
Recommending Insightful Comments for Source Code using Crowdsourced Knowledge
Mohammad Masudur Rahman, Chanchal K. Roy, and Iman Keivanloo
(University of Saskatchewan, Canada; Queen's University, Canada)
Recently, automatic code comment generation has been proposed to facilitate program comprehension. Existing code comment generation techniques focus on describing the functionality of the source code. However, there are other aspects, such as insights about quality or issues of the code, which are overlooked by earlier approaches. In this paper, we describe a mining approach that recommends insightful comments about the quality, deficiencies, or potential for further improvement of the source code. First, we conduct an exploratory study that motivates crowdsourced knowledge from Stack Overflow discussions as a potential resource for source code comment recommendation. Second, based on the findings from the exploratory study, we propose a heuristic-based technique for mining insightful comments from the Stack Overflow Q&A site for source code comment recommendation. Experiments with 292 Stack Overflow code segments and 5,039 discussion comments show that our approach has a promising recall of 85.42%. We also conducted a complementary user study which confirms the accuracy and usefulness of the recommended comments.
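A crude sketch of a heuristic filter of the kind the paper motivates; the cue-word list and threshold are invented, and the paper's heuristics are richer:

```python
# Keep Stack Overflow comments that plausibly discuss quality or improvement
# of the attached code segment.
QUALITY_CUES = {"slow", "faster", "leak", "deprecated", "instead",
                "better", "avoid", "bug", "inefficient"}

def is_insightful(comment, min_hits=1):
    words = {w.strip(".,!?()").lower() for w in comment.split()}
    return len(words & QUALITY_CUES) >= min_hits

print(is_insightful("Use a StringBuilder instead, concatenation is slow."))
```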
@InProceedings{SCAM15p81,
author = {Mohammad Masudur Rahman and Chanchal K. Roy and Iman Keivanloo},
title = {Recommending Insightful Comments for Source Code using Crowdsourced Knowledge},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {81--90},
doi = {},
year = {2015},
}
Static Analysis
Mon, Sep 28, 09:00 - 10:30, HS 1010 (Chair: Paul Anderson)
Checking C++ Codes for Compatibility with Operator Overloading
Alexander Hück, Christian Bischof, and Jean Utke
(TU Darmstadt, Germany; Allstate Insurance Company, USA)
Operator overloading allows the semantic extension of existing code without the need for sweeping code changes. For example, automatic differentiation tools in C++ commonly use this feature to enhance the code with additional derivative computation. To this end, a floating point data type is changed to a complex user-defined type. While conceptually straightforward, this type change often leads to compilation errors that can be tedious to decipher and resolve. This is due to the fact that the built-in floating point types in C++ are treated differently than user-defined types, and code constructs that are legal for floating point types can be a violation of the C++ standard for complex user-defined types. We identify and classify such problematic code constructs and suggest how the code can be changed to avoid these errors, while still allowing the use of operator overloading. To automatically flag such occurrences, we developed a Clang-based tool for the static analysis of C++ code based on our assessment of constructs problematic in operator overloading for numeric types. It automatically finds instances of problematic code locations and prints Lint-like warning messages. To showcase the relevance of this topic and the usefulness of our tool, we consider the basic routines of the OpenFOAM CFD software package, consisting of 1,476 C++ source and header files, for a total of over 150,000 lines of code. Altogether, we found 74 distinct occurrences of problematic code constructs in 21 files. As some of these files are included in over 400 different locations in the OpenFOAM base, errors in these files create a torrent of error messages that often are difficult to comprehend. In summary, the classification of problematic instances aids developers in writing numerical code that is fit for operator overloading and the tool helps programmers that augment legacy code in spotting problematic code constructs.
@InProceedings{SCAM15p91,
author = {Alexander Hück and Christian Bischof and Jean Utke},
title = {Checking C++ Codes for Compatibility with Operator Overloading},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {91--100},
doi = {},
year = {2015},
}
Detecting Function Purity in JavaScript
Jens Nicolay, Carlos Noguera, Coen De Roover, and Wolfgang De Meuter
(Vrije Universiteit Brussel, Belgium)
We present an approach to detect function purity in JavaScript. A function is pure if none of its applications cause observable side-effects. The approach is based on a pushdown flow analysis that besides traditional control and value flow also keeps track of write effects. To increase the precision of our purity analysis, we combine it with an intraprocedural analysis to determine freshness of variables and object references. We formalize the core aspects of our analysis, and discuss our implementation used to analyze several common JavaScript benchmarks. Experiments show that our technique is capable of detecting function purity, even in the presence of higher-order functions, dynamic property expressions, and prototypal inheritance.
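Purity detection is language-agnostic in spirit. As an illustration only (Python's `ast` module in place of a JavaScript flow analysis, and far weaker than the paper's pushdown and freshness analyses), one can flag obvious write effects:

```python
import ast

# A function is flagged impure on an obvious write effect: a global
# declaration or an assignment to an object attribute. Aliasing, closures,
# and higher-order flows (which the paper handles) are ignored here.
def looks_pure(src):
    func = ast.parse(src).body[0]
    for node in ast.walk(func):
        if isinstance(node, ast.Global):
            return False
        if isinstance(node, ast.Assign):
            if any(isinstance(t, ast.Attribute) for t in node.targets):
                return False
    return True

print(looks_pure("def f(x):\n    return x * 2"))                 # True
print(looks_pure("def f(o):\n    o.count = 1\n    return o"))    # False
```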
@InProceedings{SCAM15p101,
author = {Jens Nicolay and Carlos Noguera and Coen De Roover and Wolfgang De Meuter},
title = {Detecting Function Purity in JavaScript},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {101--110},
doi = {},
year = {2015},
}
Data Tainting and Obfuscation: Improving Plausibility of Incorrect Taint
Sandrine Blazy, Stéphanie Riaud, and Thomas Sirvent
(University of Rennes 1, France; IRISA, France; DGA, France; INRIA, France)
Code obfuscation is designed to impede the reverse engineering of binary software. Dynamic data tainting is an analysis technique used to identify dependencies between data in a software system. Performing dynamic data tainting on obfuscated software usually yields hard-to-exploit results, due to over-tainted data. Such results are clearly identifiable as useless: an attacker will immediately discard them and opt for an alternative tool. In this paper, we present a code transformation technique meant to prevent the identification of useless results: a few lines of code are inserted into the obfuscated software, so that the results obtained by the dynamic data tainting approach appear acceptable. These results remain however wrong, and lead an attacker to waste enough time and resources trying to analyze incorrect data dependencies that he will usually decide to use less automated and advanced analysis techniques, and perhaps give up reverse engineering the binary software altogether. This improves the security of the software against malicious analysis.
@InProceedings{SCAM15p111,
author = {Sandrine Blazy and Stéphanie Riaud and Thomas Sirvent},
title = {Data Tainting and Obfuscation: Improving Plausibility of Incorrect Taint},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {111--120},
doi = {},
year = {2015},
}
A Grammar for Spreadsheet Formulas Evaluated on Two Large Datasets
Efthimia Aivaloglou, David Hoepelman, and Felienne Hermans
(Delft University of Technology, Netherlands)
Spreadsheets are ubiquitous in the industrial world and often perform a role similar to other computer programs, which makes them interesting research targets. However, there does not exist a reliable grammar that is concise enough to facilitate formula parsing and analysis and to support research on spreadsheet codebases. This paper presents a grammar for spreadsheet formulas that is compatible with the spreadsheet formula language, is compact enough to feasibly implement with a parser generator, and produces parse trees aimed at further manipulation and analysis. We evaluate the grammar against more than one million unique formulas extracted from the well known EUSES and Enron spreadsheet datasets, successfully parsing 99.99%. Additionally, we utilize the grammar to analyze these datasets and measure the frequency of usage of language features in spreadsheet formulas. Finally, we identify smelly constructs and uncommon cases in the syntax of formulas.
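To illustrate what "compact enough to feasibly implement" means, here is a toy recursive-descent parser for a tiny fragment of the formula language; the published grammar is far richer (functions, ranges, sheet references, and more):

```python
import re

# Tokens: numbers, cell references (e.g. A1), +, -, *, /, parentheses.
TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Z]+\d+|[()+\-*/])")

def parse(tokens):
    def expr():
        node = term()
        while tokens and tokens[0] in "+-":
            node = (tokens.pop(0), node, term())
        return node
    def term():
        node = factor()
        while tokens and tokens[0] in "*/":
            node = (tokens.pop(0), node, factor())
        return node
    def factor():
        tok = tokens.pop(0)
        if tok == "(":
            node = expr()
            tokens.pop(0)                    # consume ')'
            return node
        return ("cell", tok) if tok[0].isalpha() else ("num", float(tok))
    return expr()

print(parse(TOKEN.findall("A1+B2*2")))
# ('+', ('cell', 'A1'), ('*', ('cell', 'B2'), ('num', 2.0)))
```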
@InProceedings{SCAM15p121,
author = {Efthimia Aivaloglou and David Hoepelman and Felienne Hermans},
title = {A Grammar for Spreadsheet Formulas Evaluated on Two Large Datasets},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {121--130},
doi = {},
year = {2015},
}
Empirical Studies II
Mon, Sep 28, 11:00 - 12:15, HS 1010 (Chair: Chanchal K. Roy)
The Impact of Cross-Distribution Bug Duplicates, Empirical Study on Debian and Ubuntu
Vincent Boisselle and Bram Adams
(Polytechnique Montréal, Canada)
Although open source distributions like Debian and Ubuntu are closely related, sometimes a bug reported in the Debian bug repository is reported independently in the Ubuntu repository as well, without Ubuntu users or developers being aware. Such cases of undetected cross-distribution bug duplicates can cause developers and users to lose precious time working on a fix that already exists, or to work individually instead of collaborating to find a fix faster. We perform a case study on the Ubuntu and Debian bug repositories to measure the number of cross-distribution bug duplicates and estimate the amount of time lost. By adapting an existing within-project duplicate detection approach (achieving a similar recall of 60%), we find 821 cross-duplicates. The early detection of such duplicates could reduce the time lost by users waiting for a fix by a median of 38 days. Furthermore, we estimate that developers from the different distributions lose a median of 47 days in which they could have collaborated, had they been aware of the duplicates. These results show the need to detect and monitor cross-distribution duplicates.
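The core of such duplicate detection can be sketched as bag-of-words cosine similarity between report texts; the reports and threshold below are invented, and real detectors add stemming, field weighting, and tuning:

```python
from collections import Counter
from math import sqrt

# Pairs of reports whose textual similarity exceeds a threshold become
# cross-duplicate candidates.
def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm

debian = "apt crashes when cache file is corrupted"
ubuntu = "apt-get crashes on corrupted cache file"
print(cosine(vec(debian), vec(ubuntu)) > 0.5)   # candidate cross-duplicate
```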
@InProceedings{SCAM15p131,
author = {Vincent Boisselle and Bram Adams},
title = {The Impact of Cross-Distribution Bug Duplicates, Empirical Study on Debian and Ubuntu},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {131--140},
doi = {},
year = {2015},
}
Cumulative Code Churn: Impact on Maintainability
Csaba Faragó, Péter Hegedűs, and Rudolf Ferenc
(University of Szeged, Hungary)
It is a well-known phenomenon that the source code of software systems erodes during development, which results in higher maintenance costs in the long term. But can we somehow narrow down where exactly this erosion happens? Is it possible to infer future erosion from past code changes? Do modifications performed on frequently changing code have a worse effect on software maintainability than those affecting less frequently modified code?
In this study we investigated these questions, and the results indicate that code churn indeed increases the pace of code erosion. We calculated cumulative code churn values and maintainability changes for every version control commit of three open-source and one proprietary software system. With the help of the Wilcoxon rank test, we compared the cumulative code churn values of the files in commits that increased maintainability with those in commits that decreased it. In the case of three systems the test showed very strong significance, and in one case it showed strong significance (p-values 0.00235, 0.00436, 0.00018 and 0.03616). These results support our preliminary assumption that modifying high-churn code is more likely to decrease the overall maintainability of a software system, which can be seen as a generalization of the already known phenomenon that code churn results in a higher number of defects.
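One way to run such a comparison, using SciPy's Wilcoxon rank-sum test; the churn samples below are invented, not the study's data:

```python
from scipy.stats import ranksums

# Compare cumulative churn of files in maintainability-increasing commits
# with churn in maintainability-decreasing commits.
churn_increase = [3, 5, 2, 8, 4, 6, 3]        # commits improving maintainability
churn_decrease = [9, 14, 7, 11, 16, 10, 12]   # commits degrading it

stat, p = ranksums(churn_increase, churn_decrease)
print(f"p-value = {p:.5f}")     # a small p supports the churn/erosion link
```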
@InProceedings{SCAM15p141,
author = {Csaba Faragó and Péter Hegedűs and Rudolf Ferenc},
title = {Cumulative Code Churn: Impact on Maintainability},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {141--150},
doi = {},
year = {2015},
}
How Do Java Methods Grow?
Daniela Steidl and Florian Deissenboeck
(CQSE, Germany)
Overly long methods hamper the maintainability of software—they are hard to understand and to change, but also difficult to test, reuse, and profile. While technically there are many opportunities to refactor long methods, little is known about their origin and their evolution. It is unclear how much effort should be spent to refactor them and when this effort is spent best. To obtain a maintenance strategy, we need a better understanding of how software systems and their methods evolve. This paper presents an empirical case study on method growth in Java with nine open source and one industry system. We show that most methods do not increase their length significantly; in fact, about half of them remain unchanged after the initial commit. Instead, software systems grow by adding new methods rather than by modifying existing methods.
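A sketch of the classification underlying such growth measurements; the per-commit length history is an invented data structure, not the paper's infrastructure:

```python
# Classify each method by comparing its length (in lines) at the last
# analyzed commit with its length at the initial commit.
history = {
    "Parser.parse":  [12, 12, 12],
    "Editor.save":   [8, 8, 15, 22],
    "Task.activate": [30, 24],
}

def classify(lengths):
    if lengths[-1] == lengths[0]:
        return "unchanged"
    return "grown" if lengths[-1] > lengths[0] else "shrunk"

for method, lengths in history.items():
    print(method, classify(lengths))
```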
@InProceedings{SCAM15p151,
author = {Daniela Steidl and Florian Deissenboeck},
title = {How Do Java Methods Grow?},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {151--160},
doi = {},
year = {2015},
}
(Code, Memory, Performance) Smells
Mon, Sep 28, 13:30 - 15:30, HS 1010 (Chair: Jurgen Vinju)
On the Comprehension of Code Clone Visualizations: A Controlled Study using Eye Tracking
Md Sami Uddin, Varun Gaur, Carl Gutwin, and Chanchal K. Roy
(University of Saskatchewan, Canada)
Code clone visualizations (CCVs) are graphical representations of clone detection results provided by various state-of-the-art command line and graphical analysis tools. In order to properly analyze and manipulate code clones within a target system, these visualizations must be easily and efficiently comprehensible. We conducted an eye-tracking study with 20 participants (expert, intermediate, and novice) to assess how well people can comprehend visualizations such as Scatter plots, Treemaps, and Hierarchical Dependency Graphs provided by VisCad, a recent clone visualization tool. The goals of the study were to find out what elements of the visualizations (e.g., colors, shapes, object positions) are most important for comprehension, and to identify common usage patterns for different groups. Our results help us understand how developers with different levels of expertise explore and navigate through the visualizations while performing specific tasks. Distinctive patterns of eye movements for different visualizations were found depending on the expertise of the participants. Color, shape and position information were found to play vital roles in comprehension of CCVs. Our results provide recommendations that can improve the implementation of visualization techniques in VisCad and other clone visualization systems.
@InProceedings{SCAM15p161,
author = {Md Sami Uddin and Varun Gaur and Carl Gutwin and Chanchal K. Roy},
title = {On the Comprehension of Code Clone Visualizations: A Controlled Study using Eye Tracking},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {161--170},
doi = {},
year = {2015},
}
When Code Smells Twice as Much: Metric-Based Detection of Variability-Aware Code Smells
Wolfram Fenske, Sandro Schulze, Daniel Meyer, and Gunter Saake
(University of Magdeburg, Germany; TU Braunschweig, Germany)
Code smells are established, widely used characterizations of shortcomings in the design and implementation of software systems. As such, they have been subject to intensive research regarding their detection and impact on understandability and changeability of source code. However, current methods do not support highly configurable software systems, that is, systems that can be customized to fit a wide range of requirements or platforms. Such systems commonly owe their configurability to conditional compilation based on C preprocessor annotations (a.k.a. #ifdefs). Since annotations directly interact with the host language (e.g., C), they may have adverse effects on understandability and changeability of source code, referred to as variability-aware code smells. In this paper, we propose a metric- based method that integrates source code and C preprocessor annotations to detect such smells. We evaluate our method for one specific smell on five open-source systems of medium size, thus, demonstrating its general applicability. Moreover, we manually reviewed 100 instances of the smell and provide a qualitative analysis of the potential impact of variability-aware code smells as well as common causes for their occurrence.
@InProceedings{SCAM15p171,
author = {Wolfram Fenske and Sandro Schulze and Daniel Meyer and Gunter Saake},
title = {When Code Smells Twice as Much: Metric-Based Detection of Variability-Aware Code Smells},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {171--180},
doi = {},
year = {2015},
}
LeakTracer: Tracing Leaks along the Way
Hengyang Yu, Xiaohua Shi, and Wei Feng
(Beihang University, China)
Unnecessary references in managed languages, such as Java and C#, often cause memory leaks without any immediate symptoms. These leaks become manifest when the program has been running for a long time (usually several hours, days or even weeks). Garbage collectors cannot handle this situation, since they only reclaim objects that are no longer referenced. Consequently, when the number of leaked objects becomes large, garbage collection frequency increases and program performance degrades. Ultimately, the program will crash. This paper introduces LeakTracer, a tool that helps diagnose memory leaks in managed languages. The core of LeakTracer is a novel leak predictor, which not only considers object size and staleness as a whole to predict leaked objects, but also carefully adjusts their contributions to the leak likelihood of an object, based on careful observation of the activities of common objects during their lifetimes. We have implemented LeakTracer in two parts: (1) an online object events tracker in the Apache Harmony DRL virtual machine, and (2) an offline analyzer embedding our predictor. We have successfully used LeakTracer to find leaks in several real-world programs, and our case studies show that the leak predictor can pinpoint leaked objects with high accuracy.
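A sketch of a size-and-staleness predictor in the spirit described above; the weights and object records are invented, not LeakTracer's actual model:

```python
# Rank heap objects by a weighted mix of staleness (time since last use)
# and retained size.
def leak_score(obj, now, w_stale=0.7, w_size=0.3):
    staleness = now - obj["last_use"]               # seconds since last access
    return w_stale * staleness + w_size * obj["retained_bytes"] / 1024

objects = [
    {"id": 1, "last_use": 100.0, "retained_bytes": 4096},
    {"id": 2, "last_use": 990.0, "retained_bytes": 512},
]
ranked = sorted(objects, key=lambda o: -leak_score(o, now=1000.0))
print([o["id"] for o in ranked])    # object 1 is the prime suspect
```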
@InProceedings{SCAM15p181,
author = {Hengyang Yu and Xiaohua Shi and Wei Feng},
title = {LeakTracer: Tracing Leaks along the Way},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {181--190},
doi = {},
year = {2015},
}
Automated Memory Leak Diagnosis by Regression Testing
Mohammadreza Ghanavati and Artur Andrzejak
(University of Heidelberg, Germany)
Memory leaks are tedious to detect and require significant debugging effort to be reproduced and localized. In particular, many of such bugs escape classical testing processes used in software development. One of the reasons is that unit and integration tests run too short for leaks to manifest via memory bloat or degraded performance. Moreover, many of such defects are environment-sensitive and not triggered by a test suite. Consequently, leaks are frequently discovered in the production scenario, causing elevated costs.
In this paper we propose an approach for automated diagnosis of memory leaks during the development phase. Our technique is based on regression testing and exploits existing test suites. The key idea is to compare object (de-)allocation statistics (collected during unit/integration test executions) between a previous and the current software version. By grouping these statistics according to object creation sites we can detect anomalies and pinpoint the potential root causes of memory leaks. Such diagnosis can be completed before a visible memory bloat occurs, and in time proportional to the execution of test suite.
We evaluate our approach using real leaks found in 7 Java applications. Results show that our approach has sufficient detection accuracy and is effective in isolating the leaky allocation site: true defect locations rank relatively high in the lists of suspicious code locations if the tests trigger the leak pattern. Our prototypical system imposes an acceptable instrumentation and execution overhead for practical memory leak detection even in large software projects.
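A sketch of the version-comparison step; the allocation-site counts and the anomaly rule are invented for illustration:

```python
from collections import Counter

# Compare live-object counts per allocation site between the previous and
# the current version under the same test suite; sites whose counts grow
# disproportionately become leak suspects.
prev = Counter({"Cache.put:42": 120, "Parser.parse:7": 30})
curr = Counter({"Cache.put:42": 118, "Parser.parse:7": 3400})

def suspects(prev, curr, factor=10):
    return [site for site in curr
            if curr[site] > factor * max(prev.get(site, 0), 1)]

print(suspects(prev, curr))    # ['Parser.parse:7']
```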
@InProceedings{SCAM15p191,
author = {Mohammadreza Ghanavati and Artur Andrzejak},
title = {Automated Memory Leak Diagnosis by Regression Testing},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {191--200},
doi = {},
year = {2015},
}
Automating the Performance Deviation Analysis for Multiple System Releases: An Evolutionary Study
Felipe Pinto, Uirá Kulesza, and Christoph Treude
(Federal University of Rio Grande do Norte, Brazil; IFRN, Brazil)
This paper presents a scenario-based approach for the evaluation of the quality attribute of performance, measured in terms of execution time (response time). The approach is implemented by a framework that uses dynamic analysis and repository mining techniques to provide an automated way of revealing potential sources of performance degradation of scenarios between releases of a software system. The approach defines four phases: (i) preparation – choosing the scenarios and preparing the target releases; (ii) dynamic analysis – determining the performance of scenarios and methods by calculating their execution time; (iii) degradation analysis – processing and comparing the results of the dynamic analysis for different releases; and (iv) repository mining – identifying development issues and commits associated with performance deviation. The paper also describes an evolutionary study applying the approach to multiple releases of the Netty, Wicket and Jetty frameworks. The study analyzed seven releases of each system and addressed a total of 57 scenarios. Overall, we found 14 scenarios with significant performance deviation for Netty, 13 for Wicket, and 9 for Jetty, almost all of which could be attributed to a source code change. We also discuss feedback obtained from eight developers of Netty, Wicket and Jetty as a result of a questionnaire.
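A sketch of the degradation-analysis phase for one scenario, using a rank-based test from SciPy; the timing samples are invented, and the concrete test is one reasonable choice rather than necessarily the paper's:

```python
from statistics import mean
from scipy.stats import mannwhitneyu

# Compare one scenario's execution times (ms) on two releases and flag a
# statistically significant slowdown.
release_a = [102, 98, 105, 99, 101, 100, 103]
release_b = [131, 127, 135, 129, 133, 130, 128]

stat, p = mannwhitneyu(release_a, release_b, alternative="less")
if p < 0.05:
    print(f"slowdown: {mean(release_a):.0f} ms -> {mean(release_b):.0f} ms")
```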
@InProceedings{SCAM15p201,
author = {Felipe Pinto and Uirá Kulesza and Christoph Treude},
title = {Automating the Performance Deviation Analysis for Multiple System Releases: An Evolutionary Study},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {201--210},
doi = {},
year = {2015},
}
Code and API Transformation
Mon, Sep 28, 16:00 - 17:15, HS 1010 (Chair: Raghavan Komondoor)
From Preprocessor-Constrained Parse Graphs to Preprocessor-Constrained Control Flow
Dierk Lüdemann and Rainer Koschke
(University of Bremen, Germany)
Preprocessor-aware static analysis tools are needed for C code to gain sound knowledge about the interference among all conditionally compiled program parts. We provide formal descriptions and algorithms to construct a preprocessor-aware control-flow graph from the preprocessor-aware parse graphs of SuperC. Based on the structure of parse graphs capturing the syntax nodes constrained by preprocessor conditions, we show how to model, formalize, and compute preprocessor-aware intra-procedural control-flow graphs. Such preprocessor-aware control-flow graphs may serve as the basis for subsequent preprocessor-aware control- and data-flow analyses.
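A sketch of what "preprocessor-constrained" means operationally: edges carry presence conditions, and only non-contradictory paths survive. Conditions are modelled here as sets of literals, a strong simplification of the paper's formalism:

```python
# 'A' means compiled under #ifdef A, '!A' under its negation. A path is kept
# only if its accumulated condition contains no literal and its negation.
edges = {
    "entry":  [("stmt_a", {"A"}), ("stmt_b", {"!A"})],
    "stmt_a": [("exit", set())],
    "stmt_b": [("exit", set())],
}

def feasible(cond):
    return not any("!" + lit in cond for lit in cond if not lit.startswith("!"))

def paths(node, cond=frozenset(), trail=()):
    trail = trail + (node,)
    if node == "exit":
        yield trail, cond
        return
    for succ, c in edges.get(node, []):
        new = frozenset(cond | c)
        if feasible(new):
            yield from paths(succ, new, trail)

for trail, cond in paths("entry"):
    print(" -> ".join(trail), "under", set(cond) or "True")
```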
@InProceedings{SCAM15p211,
author = {Dierk Lüdemann and Rainer Koschke},
title = {From Preprocessor-Constrained Parse Graphs to Preprocessor-Constrained Control Flow},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {211--220},
doi = {},
year = {2015},
}
Recording and Replaying System Specific, Source Code Transformations
Gustavo Santos, Anne Etien, Nicolas Anquetil, Stéphane Ducasse, and Marco Tulio Valente
(INRIA, France; University of Lille, France; Federal University of Minas Gerais, Brazil)
During its lifetime, a software system is under continuous maintenance to remain useful. Maintenance can be achieved through activities such as adding new features, fixing bugs, improving the system's structure, or adapting to new APIs. In such cases, developers sometimes perform sequences of code changes in a systematic way. These sequences consist of small code changes (e.g., create a class, then extract a method to this class), which are applied to groups of related code entities (e.g., some of the methods of a class). This paper presents the design and a proof-of-concept implementation of a tool called MacroRecorder. This tool records a sequence of code changes, then allows the developer to generalize this sequence in order to apply it in other code locations. In this paper, we discuss MacroRecorder's approach, which is independent of both development and transformation tools. The evaluation is based on previous work on repetitive code changes related to rearchitecting. MacroRecorder was able to replay 92% of the examples, which involved up to seven code entities modified up to 66 times. The generation of a customizable, large-scale transformation operator has the potential to efficiently assist code maintenance.
@InProceedings{SCAM15p221,
author = {Gustavo Santos and Anne Etien and Nicolas Anquetil and Stéphane Ducasse and Marco Tulio Valente},
title = {Recording and Replaying System Specific, Source Code Transformations},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {221--230},
doi = {},
year = {2015},
}
Discovering Likely Mappings between APIs using Text Mining
Rahul Pandita, Raoul Praful Jetley, Sithu D Sudarsan, and Laurie Williams
(North Carolina State University, USA; ABB Corporate Research, India)
Developers often release different versions of their applications to support various platform/programming-language application programming interfaces (APIs). To migrate an application written using one API (source) to another API (target), a developer must know how the methods in the source API map to the methods in the target API. Given that a typical platform or language exposes a large number of API methods, manually writing API mappings is prohibitively resource-intensive and may be error-prone. Recently, researchers proposed to automate the mapping process by mining API mappings from existing code bases. However, these approaches require as input a manually ported (or at least functionally similar) code base across source and target APIs. To address this shortcoming, this paper proposes TMAP: a text mining based approach to discover likely API mappings using the similarity in the textual description of the source and target API documents. To evaluate our approach, we used TMAP to discover API mappings for 15 classes across: 1) the Java and C# APIs, and 2) the Java ME and Android APIs. We compared the discovered mappings with state-of-the-art source code analysis based approaches: Rosetta and StaMiner. Our results indicate that TMAP on average found relevant mappings for 57% more methods compared to previous approaches. Furthermore, our results also indicate that TMAP on average found exact mappings for 6.5 more methods per class, with a maximum of 21 additional exact mappings for a single class, as compared to previous approaches.
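A sketch of the document-similarity core of such an approach, using scikit-learn; the documentation strings below are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Pair each source-API method with the target-API method whose documentation
# text is most similar.
java = {"String.trim": "removes leading and trailing whitespace from this string"}
csharp = {
    "String.Trim":      "removes all leading and trailing whitespace characters",
    "String.Substring": "retrieves a substring from this instance",
}
names = list(csharp)
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(csharp[n] for n in names)

for src, doc in java.items():
    scores = cosine_similarity(vectorizer.transform([doc]), matrix)[0]
    print(src, "->", names[scores.argmax()])   # String.trim -> String.Trim
```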
@InProceedings{SCAM15p231,
author = {Rahul Pandita and Raoul Praful Jetley and Sithu D Sudarsan and Laurie Williams},
title = {Discovering Likely Mappings between APIs using Text Mining},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {231--240},
doi = {},
year = {2015},
}
Tool Demos
Sun, Sep 27, 13:30 - 15:30, GW2 B2890 (Chair: Felienne Hermans)
SimNav: Simulink Navigation of Model Clone Classes
Eric J. Rapos, Andrew Stevenson, Manar H. Alalfi, and James R. Cordy
(Queen's University, Canada)
SimNav is a graphical user interface designed for displaying and navigating clone classes of Simulink models detected by the model clone detector Simone. As an embedded Simulink interface tool, SimNav allows model developers to explore detected clones directly in their own model development environment rather than a separate research tool interface. SimNav allows users to open selected models for side-by-side comparison, in order to visually explore clone classes and view the differences in the clone instances, as well as to explore the context in which the clones exist. This tool paper describes the motivation, implementation, and use cases for SimNav.
@InProceedings{SCAM15p241,
author = {Eric J. Rapos and Andrew Stevenson and Manar H. Alalfi and James R. Cordy},
title = {SimNav: Simulink Navigation of Model Clone Classes},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {241--246},
doi = {},
year = {2015},
}
A Translation Validation Framework for Symbolic Value Propagation Based Equivalence Checking of FSMDAs
Kunal Banerjee, Chittaranjan Mandal, and Dipankar Sarkar
(IIT Kharagpur, India)
A compiler is a computer program which translates a source code into a target code, often with the objective of reducing execution time and/or saving critical resources. However, an error in the design or in the implementation of a compiler may result in software bugs in the target code obtained from that compiler. Translation validation is a formal verification approach for compilers whereby each individual translation is followed by a validation phase which verifies that the target code produced correctly implements the source code. In this paper, we present a tool for translation validation of optimizing transformations of programs; the original and the transformed programs are modeled as Finite State Machines with Datapath having Arrays (FSMDAs), and a symbolic value propagation (SVP) based equivalence checking strategy is applied over this model to determine the correctness of the applied transformations. The tool has been demonstrated to handle uniform and non-uniform code motions, including code motions across loops, along with transformations which result in modification of the control structures of programs. Moreover, arithmetic transformations such as associative, commutative, and distributive transformations, expression simplification, constant folding, etc., are also supported.
@InProceedings{SCAM15p247,
author = {Kunal Banerjee and Chittaranjan Mandal and Dipankar Sarkar},
title = {A Translation Validation Framework for Symbolic Value Propagation Based Equivalence Checking of FSMDAs},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {247--252},
doi = {},
year = {2015},
}
FaultBuster: An Automatic Code Smell Refactoring Toolset
Gábor Szőke, Csaba Nagy, Lajos Jenő Fülöp, Rudolf Ferenc, and Tibor Gyimóthy
(University of Szeged, Hungary)
One solution to prevent the quality erosion of a software product is to maintain its quality by continuous refactoring. However, refactoring is not always easy. Developers need to identify the piece of code that should be improved and decide how to rewrite it. Furthermore, refactoring can also be risky; that is, the modified code needs to be re-tested, so developers can see if they broke something. Many IDEs offer a range of refactorings to support so-called automatic refactoring, but tools which are really able to automatically refactor code smells are still under research.
In this paper we introduce FaultBuster, a refactoring toolset which is able to support automatic refactoring: identifying the problematic code parts via static code analysis, running automatic algorithms to fix selected code smells, and executing integrated testing tools. At the heart of the toolset lies a refactoring framework to control the analysis and the execution of the automatic algorithms. FaultBuster provides IDE plugins to interact with developers via popular IDEs (Eclipse, NetBeans and IntelliJ IDEA). All the tools were developed and tested in a 2-year project with 6 software development companies, where thousands of code smells were identified and fixed in 5 systems comprising altogether over 5 million lines of code.
@InProceedings{SCAM15p253,
author = {Gábor Szőke and Csaba Nagy and Lajos Jenő Fülöp and Rudolf Ferenc and Tibor Gyimóthy},
title = {FaultBuster: An Automatic Code Smell Refactoring Toolset},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {253--258},
doi = {},
year = {2015},
}
Improving Prioritization of Software Weaknesses using Security Models with AVUS
Stephan Renatus, Corrie Bartelheimer, and Jörn Eichler
(Fraunhofer AISEC, Germany)
Testing tools for application security have become an integral part of secure development life-cycles. Despite their ability to spot important software weaknesses, the high number of findings requires rigorous prioritization. Most testing tools provide generic ratings to support prioritization. Unfortunately, ratings from established tools lack context information, especially with regard to the security requirements of the respective components or source code. Thus, experts often spend a great deal of time re-assessing the prioritization provided by these tools. This paper introduces our lightweight tool AVUS, which adjusts context-free ratings of software weaknesses according to a user-defined security model. We also present a first evaluation, applying AVUS to a well-known open source project and the findings of a popular, commercially available application security testing tool.
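A sketch of context-aware re-rating; the component criticalities and the multiplicative rule are illustrative assumptions, not AVUS's actual model:

```python
# Scale a tool's generic severity by the criticality of the affected
# component in a user-defined security model.
criticality = {"payment": 1.0, "ui": 0.3, "logging": 0.1}

def adjusted(finding):
    weight = criticality.get(finding["component"], 0.5)   # default: unknown
    return finding["generic_severity"] * weight

findings = [
    {"id": "F1", "component": "payment", "generic_severity": 6.0},
    {"id": "F2", "component": "logging", "generic_severity": 9.0},
]
for f in sorted(findings, key=adjusted, reverse=True):
    print(f["id"], adjusted(f))    # F1 (6.0) now outranks F2 (0.9)
```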
@InProceedings{SCAM15p259,
author = {Stephan Renatus and Corrie Bartelheimer and Jörn Eichler},
title = {Improving Prioritization of Software Weaknesses using Security Models with AVUS},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {259--264},
doi = {},
year = {2015},
}
A Static Microcode Analysis Tool for Programmable Load Drivers
Luca Dariz, Massimiliano Ruggeri, and Michele Selvatici
(IMAMOTER - CNR, Italy)
The advances in control electronics, with the introduction of programmable load drivers, have changed the way in which actuators, both resistive and inductive, such as electric motors, injectors, and valves, are controlled. However, usually the only programming language available for these drivers is the native assembly-like microcode which, by allowing unstructured programming constructs, exposes developers to the risk of dangerous control-flow paths, like infinite loops or jumps to non-existent locations. In this paper an automatic static analyzer is presented, which reconstructs the control-flow graph of an application from the microcode source file and checks for infinite loops and undefined jumps caused by the corresponding jump register not being set for a particular path.
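Both checks reduce to simple graph analyses once the control-flow graph is recovered; here is a Python sketch over an invented microcode CFG, with cycle reporting as a simplified stand-in for the paper's loop check:

```python
# The microcode is modelled as a dict from location to successor locations.
cfg = {0: [1], 1: [2], 2: [1], 3: [99]}    # 2 -> 1 loops; 99 is undefined

def undefined_jumps(cfg):
    return [(src, dst) for src, dsts in cfg.items()
            for dst in dsts if dst not in cfg]

def potential_infinite_loops(cfg, start=0):
    """Report back edges reachable from start; each indicates a cycle that a
    deeper analysis would examine for a missing exit condition."""
    back, seen, onstack = [], set(), set()
    def dfs(loc):
        seen.add(loc)
        onstack.add(loc)
        for dst in cfg.get(loc, []):
            if dst in onstack:
                back.append((loc, dst))
            elif dst not in seen:
                dfs(dst)
        onstack.discard(loc)
    dfs(start)
    return back

print(undefined_jumps(cfg))             # [(3, 99)]
print(potential_infinite_loops(cfg))    # [(2, 1)]
```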
@InProceedings{SCAM15p265,
author = {Luca Dariz and Massimiliano Ruggeri and Michele Selvatici},
title = {A Static Microcode Analysis Tool for Programmable Load Drivers},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {265--270},
doi = {},
year = {2015},
}
CodeMetropolis: Eclipse over the City of Source Code
Gergő Balogh, Attila Szabolics, and Árpád Beszédes
(University of Szeged, Hungary)
Graphical representations of software (code visualization in particular) may provide both professional programmers and students learning only the basics with support in program comprehension. Among the numerous proposed approaches, our research applies the city metaphor to the visualization of code elements such as classes, functions, or attributes in the tool CodeMetropolis. It uses the game engine of Minecraft for the graphics, and is able to visualize various properties of the code based on structural metrics. In this work, we present our approach to integrating our visualization tool into the Eclipse IDE. Previously, only standalone usage was possible, but with this new version users can invoke the visualization directly from the IDE, and all the analysis is performed in the background. The new version of the tool now includes an Eclipse plug-in and a Minecraft modification, in addition to the analysis and visualization modules, which have also been extended with some new features. Possible use cases and a detailed scenario are presented.
@InProceedings{SCAM15p271,
author = {Gergő Balogh and Attila Szabolics and Árpád Beszédes},
title = {CodeMetropolis: Eclipse over the City of Source Code},
booktitle = {Proc.\ SCAM},
publisher = {IEEE},
pages = {271--276},
doi = {},
year = {2015},
}