
2014 Software Evolution Week --- IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), February 3-6, 2014, Antwerp, Belgium

CSMR-WCRE 2014 – Proceedings




Title Page

Message from the Chairs
Welcome to the first Software Evolution Week! – the joining of the Working Conference on Reverse Engineering (WCRE), the premier research conference on the theory and practice of recovering information from existing software and systems, and the European Conference on Software Maintenance and Reengineering (CSMR), the premier European conference on the theory and practice of maintenance, reengineering, and evolution of software systems. This is an exciting time in the history of both events as the two become one.
Software Evolution Week (SEW) will be held in Antwerp, Belgium at the University of Antwerp, from 3 February through 6 February 2014. In the tradition of WCRE, SEW encourages and welcomes papers and presentations that stir discussion and touch upon nontraditional topics. Attendees should expect to learn about exciting and upcoming research opportunities in software evolution. More importantly, they should expect to play a role in shaping the future of software evolution through interaction during and following SEW. To aid in this process, each session will include an open discussion of the ideas presented. This open forum provides an ideal place to contribute your ideas and become part of the SEW tradition!



Using Biology and Ecology as Inspiration for Software Maintenance? (Keynote Abstract)
Philippe Grosjean
(University of Mons, Belgium)
As a bioengineer and marine ecologist, I probably have a different view on software complexity and evolution than specialists in this field. The literature as well as discussion with colleagues suggests that there may well be "hidden gems" in traditional ecology for software engineers. In this presentation, I will compare a couple of biological and software (mostly Open Source) ecosystems and suggest a few ideas that may be useful for software maintenance research.
Two key aspects appeared to me when I started to work on Open Source software ecosystems: (1) the difference in terminology in biology and software engineering, and (2) the much more collaborative trends in software ecosystems, compared to biological ecosystems.
The first aspect is mostly a technical issue that unfortunately creates a strong barrier between software engineers and biologists. It would therefore be worth using the same or similar meanings for shared terms, such as ecosystem, resource, and consumer, in both disciplines.
The second aspect is much more interesting. Do software ecosystems really exhibit much more collaboration and much less competition than biological ecosystems? Since biologists consider competition one of the major driving forces of biological evolution (recall Darwin and his mechanism of natural selection through the struggle for existence), it is very clear that the fundamental rules driving biological and software ecosystems are completely different.

Mitigating the Risk of Software Change in Practice: Retrospective on More Than 50 Architecture Evaluations in Industry (Keynote Paper)
Jens Knodel and Matthias Naab
(Fraunhofer IESE, Germany)
Architecture evaluation has become a mature instrument to mitigate the risk of software change. It enables decision-making about software systems being changed or being prepared for change. While scientific literature on architecture evaluation approaches is available, publications on practical experiences are rather limited. In this paper, we share our experiences after having performed more than 50 architecture evaluations for industrial customers in the last decade. We compiled facts and consolidated our findings about the risk of software change and about architecture evaluation as a means to mitigate it. We highlight the role of reverse engineering in these projects. In addition, we share our lessons learned, provide data on common beliefs, and give examples of frequently observed misconceptions about the power of reverse engineering. This industrial and practical perspective allows practitioners to benefit from our experience in their daily architecture work, and the scientific community to focus their research on the generalizability of our findings.

The Vision of Software Clone Management: Past, Present, and Future (Keynote Paper)
Chanchal K. Roy, Minhaz F. Zibran, and Rainer Koschke
(University of Saskatchewan, Canada; University of Bremen, Germany)
Duplicated code or code clones are a kind of code smell that have both positive and negative impacts on the development and maintenance of software systems. Software clone research in the past mostly focused on the detection and analysis of code clones, while research in recent years extends to the whole spectrum of clone management. In the last decade, three surveys appeared in the literature, which cover the detection, analysis, and evolutionary characteristics of code clones. This paper presents a comprehensive survey on the state of the art in clone management, with in-depth investigation of clone management activities (e.g., tracing, refactoring, cost-benefit analysis) beyond the detection and analysis. This is the first survey on clone management, where we point to the achievements so far, and reveal avenues for further research necessary towards an integrated clone management system. We believe that we have done a good job in surveying the area of clone management and that this work may serve as a roadmap for future research in the area.


Main Research

Code Search

NL-Based Query Refinement and Contextualized Code Search Results: A User Study
Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet
(Montclair State University, USA)
As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Source code search tools match a developer's keyword-style or natural language query with comments and identifiers in the source code to identify relevant methods that may need to be changed or understood to complete the maintenance task. In this search process, the developer faces a number of challenges: (1) formulating a query, (2) determining if the results are relevant, and (3) if the results are not relevant, reformulating the query. In this paper, we present an NL-based results view for searching source code for maintenance that helps address these challenges by integrating multiple feedback mechanisms into the search results view: prevalence of the query words in the result set, results grouped by NL-based information alongside a plain result list, and suggested alternative query words. Our search technique is implemented as an Eclipse plug-in, CONQUER, and has been empirically validated by 18 Java developers. Our results show that users prefer CONQUER over a state-of-the-art search technique, requesting customization of the interface in future query reformulation techniques.

Automated Construction of a Software-Specific Word Similarity Database
Yuan Tian, David Lo, and Julia Lawall
(Singapore Management University, Singapore; INRIA, France; LIP6, France)
Many automated software engineering approaches, including code search, bug report categorization, and duplicate bug report detection, measure similarities between two documents by analyzing natural language contents. Often different words are used to express the same meaning and thus measuring similarities using exact matching of words is insufficient. To solve this problem, past studies have shown the need to measure the similarities between pairs of words. To meet this need, the natural language processing community has built WordNet which is a manually constructed lexical database that records semantic relations among words and can be used to measure how similar two words are. However, WordNet is a general purpose resource, and often does not contain software-specific words. Also, the meanings of words in WordNet are often different than when they are used in software engineering context. Thus, there is a need for a software-specific WordNet-like resource that can measure similarities of words.
In this work, we propose an automated approach that builds a software-specific WordNet-like resource, named WordSimSE, by leveraging the textual contents of posts in StackOverflow. Our approach measures the similarity of words by computing the similarities of the weighted co-occurrences of these words with three types of words in the textual corpus. We have evaluated our approach on a set of software-specific words and compared it with an existing WordNet-based technique (WordNet_res) in returning the top-k most similar words. Human judges were used to evaluate the effectiveness of the two techniques. We find that WordNet_res returns no result for 55% of the queries. For the remaining queries, WordNet_res returns significantly poorer results.
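To make the co-occurrence idea concrete, here is a minimal sketch (not the authors' actual algorithm; the toy corpus, window size, and raw-count weighting are assumptions) that estimates word similarity by comparing co-occurrence vectors with cosine similarity:

```python
import math
from collections import defaultdict

def cooccurrence_vectors(corpus, window=2):
    """Build a weighted co-occurrence vector per word (weight = raw count)."""
    vectors = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(v1, v2):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# hypothetical corpus standing in for StackOverflow post text
corpus = [
    "the method throws an exception on error",
    "the function throws an error on failure",
    "a list stores elements in order",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["method"], vecs["function"]), 6))  # → 1.0
print(round(cosine(vecs["method"], vecs["list"]), 6))      # → 0.0
```

In this toy corpus, "method" and "function" occur in identical contexts, so their similarity is maximal; WordSimSE's actual weighting over StackOverflow posts is richer than raw counts.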

A Case Study of Paired Interleaving for Evaluating Code Search Techniques
Kostadin Damevski, David Shepherd, and Lori Pollock
(Virginia State University, USA; ABB, USA; University of Delaware, USA)
Source code search tools are designed to help developers locate code relevant to their task. The effectiveness of a search technique often depends on properties of user queries, the code being searched, and the specific task at hand. Thus, new code search techniques should ideally be evaluated in realistic situations that closely reflect the complexity, purpose of use, and context encountered during actual search sessions. This paper explores what can be learned from using an online paired interleaving approach, originally used for evaluating internet search engines, to comparatively observe and assess the effectiveness of code search tools in the field. We present a case study in which we implemented online paired interleaving for code search, deployed the tool in an IDE for developers at multiple companies, and analyzed results from over 300 user queries during their daily software maintenance tasks. We leveraged the results to direct further improvement of a search technique, redeployed the tool and analyzed results from over 600 queries to validate that an improvement in search was achieved in the field. We also report on the characteristics of user queries collected during the study, which are significantly different than queries currently used in evaluations.
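For readers unfamiliar with paired interleaving, a minimal team-draft sketch (function names and data are illustrative assumptions, not the study's implementation) merges the rankings of two search techniques and credits each click to whichever technique contributed the clicked result:

```python
import random

def team_draft_interleave(run_a, run_b, rng=None):
    """Merge two ranked result lists, recording which technique filled each slot."""
    rng = rng or random.Random(0)
    interleaved, credit, seen = [], [], set()
    ia = ib = count_a = count_b = 0
    while True:
        # skip results already taken by the other team
        while ia < len(run_a) and run_a[ia] in seen:
            ia += 1
        while ib < len(run_b) and run_b[ib] in seen:
            ib += 1
        if ia >= len(run_a) and ib >= len(run_b):
            break
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        if ib >= len(run_b) or (a_turn and ia < len(run_a)):
            seen.add(run_a[ia]); interleaved.append(run_a[ia])
            credit.append("A"); count_a += 1
        else:
            seen.add(run_b[ib]); interleaved.append(run_b[ib])
            credit.append("B"); count_b += 1
    return interleaved, credit

def score_clicks(credit, clicked_positions):
    """Credit each clicked slot to the technique that contributed it."""
    wins = {"A": 0, "B": 0}
    for pos in clicked_positions:
        wins[credit[pos]] += 1
    return wins

results, credit = team_draft_interleave(["m1", "m2", "m3"], ["m2", "m4", "m5"])
print(results, score_clicks(credit, [0, 1]))
```

Over many queries, the technique whose results attract more clicks in the interleaved list is judged the more effective one.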


Software Evolution

Broken Promises: An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades
Jens Dietrich, Kamil Jezek, and Premek Brada
(Massey University, New Zealand; University of West Bohemia, Czech Republic)
It has become common practice to build programs by using libraries. While the benefits of reuse are well known, an often overlooked risk is system runtime failure due to API changes in libraries that evolve independently. Traditionally, the consistency between a program and the libraries it uses is checked at build time, when the entire system is compiled and tested. However, the trend towards partially upgrading systems by redeploying only evolved library versions results in situations where these crucial verification steps are skipped. For Java programs, partial upgrades create additional interesting problems, as the compiler and the virtual machine use different rule sets to enforce contracts between the providers and the consumers of APIs. We have studied the extent of the problem on the Qualitas Corpus, a data set consisting of Java open-source programs widely used in empirical studies. In this paper, we describe the study and report its key findings. We found that the above-mentioned issues do occur in practice, albeit not on a wide scale.

Detecting Infeasible Branches Based on Code Patterns
Sun Ding, Hongyu Zhang, and Hee Beng Kuan Tan
(Nanyang Technological University, Singapore; Tsinghua University, China)
Infeasible branches are program branches that can never be exercised regardless of the inputs of the program. Detecting infeasible branches is important to many software engineering tasks such as test case generation and test coverage measurement. Applying full-scale symbolic evaluation to infeasible branch detection can be very costly, especially for a large software system. In this work, we propose a code pattern based method for detecting infeasible branches. We first introduce two general patterns that characterize source code containing infeasible branches. We then develop a tool, called Pattern-based method for Infeasible branch Detection (PIND), to detect infeasible branches based on the discovered code patterns. PIND only performs symbolic evaluation for the branches that exhibit the identified code patterns, thereby significantly reducing the number of symbolic evaluations required. We evaluate PIND from two aspects: accuracy and efficiency. The experimental results show that PIND can effectively and efficiently detect infeasible branches in real-world Java and Android programs. We also explore the application of PIND in measuring test case coverage.

Web API Growing Pains: Stories from Client Developers and Their Code
Tiago Espinha, Andy Zaidman, and Hans-Gerhard Gross
(Delft University of Technology, Netherlands)
Web APIs provide a systematic and extensible approach for application-to-application interaction. Developers using web APIs are forced to accompany the API providers in their software evolution tasks. In order to understand the distress caused by this imposition on web API client developers we perform a semi-structured interview with six such developers. We also investigate how major web API providers organize their API evolution, and we explore how this affects source code changes of their clients. Our exploratory study of the Twitter, Google Maps, Facebook and Netflix web APIs analyzes the state of web API evolution practices and provides insight into the impact of service evolution on client software. Our study is complemented with a set of observations regarding best practices for web API evolution.


Software Clones

An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies
Shuai Xie, Foutse Khomh, Ying Zou, and Iman Keivanloo
(Queen's University, Canada; Polytechnique Montréal, Canada)
Copy-and-paste activities create clone groups in software systems. The evolution of a clone group across the history of a software system is termed a clone genealogy. During the evolution of a clone group, developers may change the location of the code fragments in the clone group. The type of the clone group may also change (e.g., from Type-1 to Type-2). These two phenomena are referred to as clone migration and clone mutation, respectively. Previous studies have found that clone migration occurs frequently in software systems, and suggested that clone migration can induce faults in a software system. In this paper, we examine how clone migration affects the risk for faults in clone segments, clone groups, and clone genealogies from three long-lived software systems: JBOSS, APACHE-ANT, and ARGOUML. Results show that: (1) migrated clone segments, clone groups, and clone genealogies are not equally fault-prone; (2) when a clone mutation occurs during a clone migration, the risk for faults in the migrated clone is increased; (3) migrating a clone that has not been changed for a long period of time is risky.

Unification and Refactoring of Clones
Giri Panamoottil Krishnan and Nikolaos Tsantalis
(Concordia University, Canada)
Code duplication has been recognized as a potentially serious problem having a negative impact on the maintainability, comprehensibility, and evolution of software systems. In the past, several techniques have been developed for the detection and management of software clones. Existing code duplication can be eliminated by extracting the common functionality into a single module. However, the unification and refactoring of software clones is a challenging problem, since clones usually go through several modifications after their initial introduction. In this paper we present an approach for the unification and refactoring of software clones that overcomes the limitations of previous approaches. More specifically, our approach is able to detect and parameterize non-trivial differences between the clones. Moreover, it can find an optimal mapping between the statements of the clones that minimizes the number of differences. We compared the proposed technique with a competitive clone refactoring tool and concluded that our approach is able to find a significantly larger number of refactorable clones.

Automatic Ranking of Clones for Refactoring through Mining Association Rules
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
In this paper, we present an in-depth empirical study on identifying clone fragments that can be important refactoring candidates. We mine association rules among clones in order to detect clone fragments that belong to the same clone class and have a tendency of changing together during software evolution. The idea is that if two or more clone fragments from the same class often change together (i.e., are likely to co-change) preserving their similarity, they might be important candidates for refactoring. Merging such clones into one (if possible) can potentially decrease future clone maintenance effort.
We define a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), and consider the cloned fragments that changed according to this pattern (i.e., the SPCP clones) as important candidates for refactoring. For the purpose of our study, we implement a prototype tool called MARC that identifies SPCP clones and mines association rules among these. The rules as well as the SPCP clones are ranked for refactoring on the basis of their change-proneness. We applied MARC on thirteen subject systems and retrieved the refactoring candidates for three types of clones (Type 1, Type 2, and Type 3) separately. Our experimental results show that SPCP clones can be considered important candidates for refactoring. Clones that do not follow SPCP either evolve independently or are rarely changed. By considering SPCP clones for refactoring we not only can minimize refactoring effort considerably but also can reduce the possibility of delayed synchronizations among clones and thus, can minimize inconsistencies in software systems.
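The style of association rule involved can be sketched as follows; the commit sets, thresholds, and restriction to pairwise rules are illustrative assumptions, not MARC's actual implementation:

```python
from itertools import combinations

def mine_pairwise_rules(transactions, min_support=0.5, min_confidence=0.66):
    """Mine rules 'if fragment x changes, fragment y co-changes', with
    support = P(x and y change together) and confidence = P(y changes | x changes)."""
    n = len(transactions)
    item_count, pair_count = {}, {}
    for changed in transactions:
        for item in changed:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(sorted(changed), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), together in pair_count.items():
        support = together / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = together / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# hypothetical commits: which clone fragments of a clone class changed together
commits = [
    {"cloneA", "cloneB"},
    {"cloneA", "cloneB", "cloneC"},
    {"cloneA", "cloneB"},
    {"cloneC"},
]
rules = mine_pairwise_rules(commits)
print(rules)  # → [('cloneA', 'cloneB', 0.75, 1.0), ('cloneB', 'cloneA', 0.75, 1.0)]
```

Here cloneA and cloneB always change together, so both directed rules survive the thresholds; ranking such co-changing fragments by change-proneness yields the refactoring candidates.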


Fault Understanding

Follow the Path: Debugging State Anomalies along Execution Histories
Michael Perscheid, Tim Felgentreff, and Robert Hirschfeld
(HPI, Germany)
To understand how observable failures come into being, back-in-time debuggers help developers by providing full access to past executions. However, such potentially large execution histories do not include any hints about failure causes. For that reason, developers are forced to ascertain unexpected state properties and wrong behavior completely on their own. Without deep program understanding, back-in-time debugging can end in countless and difficult questions about possible failure causes, which consume a lot of time when following failures back to their root causes.
In this paper, we present state navigation as a debugging guide that highlights unexpected state properties along execution histories. After deriving common object properties from the expected behavior of passing test cases, we generate likely invariants, compare them with the failing run, and map differences as state anomalies to the past execution. Thus, developers obtain a common thread through the large amount of run-time data that helps them answer what causes the observable failure. We implement our completely automatic state navigation as part of our test-driven fault navigation and its Path tools framework. To evaluate our approach, we observed eight developers as they debugged four non-trivial failures. As a result, we find that our state navigation is able to aid developers and to decrease the time required for localizing the root cause of a failure.
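One simple kind of likely invariant is the value range a variable exhibits across passing runs; the sketch below (hypothetical variable names and data; the Path tools derive richer invariants) flags failing-run states that fall outside those ranges:

```python
def derive_range_invariants(passing_states):
    """Likely invariants: the (min, max) observed for each variable in passing runs."""
    invariants = {}
    for state in passing_states:
        for var, value in state.items():
            low, high = invariants.get(var, (value, value))
            invariants[var] = (min(low, value), max(high, value))
    return invariants

def state_anomalies(invariants, failing_trace):
    """Flag every step of a failing execution whose state violates an invariant."""
    anomalies = []
    for step, state in enumerate(failing_trace):
        for var, value in state.items():
            if var in invariants:
                low, high = invariants[var]
                if not (low <= value <= high):
                    anomalies.append((step, var, value))
    return anomalies

# states recorded during two passing test runs, then a failing run's trace
passing = [{"balance": 100, "rate": 0.05}, {"balance": 250, "rate": 0.07}]
failing = [{"balance": 180, "rate": 0.06}, {"balance": -40, "rate": 0.06}]
inv = derive_range_invariants(passing)
print(state_anomalies(inv, failing))  # → [(1, 'balance', -40)]
```

The violation at step 1 gives the developer a concrete place in the execution history to start from, instead of inspecting every past state.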

Towards More Accurate Multi-label Software Behavior Learning
Xin Xia, Yang Feng, David Lo, Zhenyu Chen, and Xinyu Wang
(Zhejiang University, China; Nanjing University, China; Singapore Management University, Singapore)
In a modern software system, when a program fails, a crash report containing an execution trace is sent to the software vendor for diagnosis. A crash report corresponding to a failure could be caused by multiple types of faults simultaneously. Many large companies, such as Baidu, organize a team to analyze these failures and classify them into multiple labels (i.e., multiple types of faults). However, it is time-consuming and difficult for developers to manually analyze these failures and come up with appropriate fault labels. In this paper, we automatically classify a failure into multiple types of faults using a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs and show that MLL-GA achieves average F-measures of 0.6078 to 0.8665. We also compare our algorithm with Ml.KNN and show that, on average across the 6 datasets, MLL-GA improves the average F-measure of Ml.KNN by 14.43%.

An Empirical Study of Long Lived Bugs
Ripon K. Saha, Sarfraz Khurshid, and Dewayne E. Perry
(University of Texas at Austin, USA)
Bug fixing is a crucial part of software development and maintenance. A large number of bugs often indicates poor software quality, since buggy behavior not only causes failures that may be costly but also has a detrimental effect on the user's overall experience with the software product. The impact of long-lived bugs can be even more critical, since experiencing the same bug version after version can be particularly frustrating for users. While there are many studies that investigate factors affecting bug fixing time for entire bug repositories, to the best of our knowledge, none of them investigates the extent and reasons of long-lived bugs. In this paper, we analyze long-lived bugs from five different perspectives: their proportion, severity, assignment, reasons, and the nature of their fixes. Our study of four open-source projects shows that each system contains a considerable number of long-lived bugs and that over 90% of them adversely affect the user's experience. The reasons for these long-lived bugs are diverse, including long assignment times and a failure to recognize their importance in advance. However, many bug fixes were delayed without any specific reason. Our analysis of bug-fixing changes further shows that many long-lived bugs could be fixed quickly through careful prioritization. We believe our results will help both developers and researchers to better understand the factors behind delays, improve the overall bug-fixing process, and investigate analytical approaches for prioritizing bugs based on bug severity as well as expected bug-fixing effort.


Where the Faults Lie

Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson
(Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden)
Modern software development relies on incremental delivery to facilitate quick response to customers' requests. In this dynamic environment, the continuous modification of software code can create risks for software developers: when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, the complexity and the number of revisions of a source code file, can effectively enable identification and assessment of risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as release-wise. A measurement system for systematic risk assessment has been introduced at both companies.

Cross-Project Defect Prediction Models: L'Union Fait la Force
Annibale Panichella, Rocco Oliveto, and Andrea De Lucia
(University of Salerno, Italy; University of Molise, Italy)
Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear winner. In this paper we present an empirical study aiming at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined as CODEP (COmbined DEfect Predictor), that employs the classification provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems and in the context of cross-project defect prediction, that represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and they can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors.
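The abstract does not detail how CODEP combines its classifiers, so the weighted-vote sketch below is only a generic illustration of combining per-classifier defect probabilities; the probabilities, weights, and threshold are hand-picked assumptions (a search procedure would normally tune the weights):

```python
def combined_prediction(probabilities, weights, threshold=0.5):
    """Combine per-classifier defect probabilities into one weighted score."""
    score = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return score, score >= threshold

# three hypothetical classifiers disagree about one source code entity
probs = [0.8, 0.3, 0.6]     # e.g., logistic regression, decision tree, ...
weights = [2.0, 1.0, 1.0]   # trust the first classifier more
score, is_defect_prone = combined_prediction(probs, weights)
print(round(score, 3), is_defect_prone)  # → 0.625 True
```

The point of such a combination is that classifiers which err on different entities can compensate for each other, which is what the study's statistical analysis suggests.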

An Empirical Study of Bug Report Field Reassignment
Xin Xia, David Lo, Ming Wen, Emad Shihab, and Bo Zhou
(Zhejiang University, China; Singapore Management University, Singapore; Rochester Institute of Technology, USA)
A bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), and platform, which provide important information for the bug triaging and fixing process. It is important to ensure that this information is correct, since previous studies have shown that the wrong assignment of bug report fields can increase the bug fixing time and even delay the delivery of the software. In this paper, we perform an empirical study on bug report field reassignments in open-source software projects. To better understand why bug report fields are reassigned, we manually collected 99 recent bug reports that had their fields reassigned and emailed their reporters and developers asking why these fields were reassigned. Then, we performed a large-scale empirical study on 8 types of bug report field reassignments in 4 open-source software projects containing a total of 190,558 bug reports. In particular, we investigate 1) the number of bug reports whose fields get reassigned, 2) the difference in bug fixing time between bug reports whose fields get reassigned and those whose fields do not, 3) how long it takes for a field in a bug report to be reassigned, 4) the number of fields in a bug report that get reassigned, 5) the number of times a field in a bug report gets reassigned, and 6) whether the experience of bug reporters affects the reassignment of bug report fields. We find that a large number (approximately 80%) of bug reports have their fields reassigned, and that bug reports whose fields get reassigned require more time to be fixed than those without field reassignments.


Software Quality Improvement

Supporting Continuous Integration by Mashing-Up Software Quality Information
Martin Brandtner, Emanuel Giger, and Harald Gall
(University of Zurich, Switzerland)
Continuous Integration (CI) has become an established best practice of modern software development. Its philosophy of regularly integrating the changes of individual developers with the mainline code base saves the entire development team from descending into Integration Hell, a term coined in the field of extreme programming. In practice, CI is supported by automated tools that cope with this repeated integration of source code through automated builds, testing, and deployments. Currently available products, such as Jenkins-CI, SonarQube, or GitHub, allow for the implementation of a seamless CI process. One of the main problems, however, is that relevant information about the quality and health of a software system is scattered both across those tools and across multiple views. We address this challenging problem by raising awareness of quality aspects and tailoring this information to particular stakeholders, such as developers or testers. To that end, we present a quality awareness framework and platform called SQA-Mashup. It makes use of the service-based mashup paradigm and integrates information from the entire CI toolchain into a single service. To evaluate its usefulness we conducted a user study. It showed that SQA-Mashup's single point of access allows users to answer questions regarding the state of a system more quickly and accurately than standalone CI tools.

Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions
Mohammad Masudur Rahman, Shamima Yeasmin, and Chanchal K. Roy
(University of Saskatchewan, Canada)
Studies show that software developers spend about 19% of their time looking for information on the web during software development and maintenance. Traditional web search forces them to leave the working environment (e.g., the IDE) and look for information in a web browser. It also does not consider the context of the problems for which the developers search solutions. The frequent switching between the web browser and the IDE is both time-consuming and distracting, and keyword-based traditional web search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that exploits the APIs provided by three popular web search engines (Google, Yahoo, and Bing) and a popular programming Q&A site, StackOverflow, and captures the content relevance, context relevance, popularity, and search engine confidence of each candidate result against the encountered programming problem. Experiments with 75 programming errors and exceptions show that including different types of contextual information associated with a given exception can enhance the recommendation accuracy. Comparisons with two existing approaches and with existing web search engines confirm that our approach outperforms them in terms of recall, mean precision, and other performance measures, with little computational cost.

Test Suite Reduction for Fault Detection and Localization: A Combined Approach
László Vidács, Árpád Beszédes, Dávid Tengeri, István Siket, and Tibor Gyimóthy
(MTA-SZTE Research Group on Artificial Intelligence, Hungary; University of Szeged, Hungary)
The relation between test suites and actual faults in software is of critical importance for timely product release. Two properties of test suites are particularly critical to this end: fault localization capability, which characterizes the effort of finding the actually defective program elements, and fault detection capability, which measures how probable their manifestation and detection are in the first place. While there are well-established methods to predict fault detection capability (by measuring code coverage, for instance), the characterization of fault localization is an emerging research topic. In this work, we investigate the effect of different test reduction methods on the performance of fault localization and detection techniques. We also provide new combined methods that incorporate both localization and detection aspects. We empirically evaluate the methods, first by measuring detection and localization metrics of test suites at various reduction sizes, and then by examining how the reduced test suites perform with actual faults. We experiment with SIR programs traditionally used in fault localization research, and extend the case study with large industrial software systems including GCC and WebKit.

Article Search


In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria
Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, Andrian Marcus, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
(University of Salerno, Italy; University of Molise, Italy; Wayne State University, USA; Polytechnique Montréal, Canada)
Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increment in cohesion and the undesired increment in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer's point of view.

Article Search
Remodularization Analysis using Semantic Clustering
Gustavo Santos, Marco Tulio Valente, and Nicolas Anquetil
(UFMG, Brazil; INRIA, France)
In this paper, we report on our experience using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis. We evaluate our adaptation using six real-world remodularizations of four software systems. We report that Semantic Clustering and conceptual metrics can be used to express and explain the intention of the architects when performing common modularization operators, such as module decomposition.

Article Search Info
Mc2for: A Tool for Automatically Translating Matlab to Fortran 95
Xu Li and Laurie Hendren
(McGill University, Canada)
MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB's high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static languages such as Fortran for their final distributable code. Rather than rewriting the code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to an equivalent Fortran program. There are several important challenges for automatically translating MATLAB to Fortran, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to equivalent Fortran constructs.
In this paper, we introduce Mc2For, a tool which automatically translates MATLAB to Fortran 95. This tool consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shape of arrays and the range of scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated Fortran program. The second part is an extensible Fortran code generation framework automatically transforming MATLAB constructs to Fortran. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated Fortran code on a collection of MATLAB benchmarks.

Article Search

Empirical Investigation

Does Return Null Matter?
Shuhei Kimura, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, and Shinji Kusumoto
(Osaka University, Japan)
Developers often use null references as the returned values of methods (return null) in object-oriented languages. Although developers often use return null to indicate that a program does not satisfy some necessary conditions, it is generally felt that a method returning null is costly to maintain. One of the reasons for this is that when a method receives a value returned from a method invocation whose code includes return null, it is necessary to check whether the returned value is null or not (null check). As developers often forget to write null checks, null dereferences occur frequently. However, it has not been clarified to what degree return null affects software maintenance during software evolution. This paper shows the influence of return null by investigating return null and null check in the evolution of source code. Experiments conducted on 14 open source projects showed that developers modify return null more frequently than return statements that do not include null. This result indicates that return null has a negative effect on software maintenance. It was also found that the size and the development phases of projects have no effect on the frequency of modifications of return null and null check. In addition, we found that all the projects in this experiment had between one and four null checks per 100 lines.
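The two constructs the study counts (return null statements and null checks) can be approximated with simple pattern matching. A rough illustration, not the authors' tooling, using textual regexes over Java source rather than a real parser:

```python
import re

# Rough sketch: count "return null;" statements and null checks in Java
# source text, and normalize null checks per 100 lines as in the abstract.
def null_usage_stats(java_source):
    lines = java_source.splitlines()
    return_nulls = len(re.findall(r'\breturn\s+null\s*;', java_source))
    # Here a "null check" is any comparison against the null literal.
    null_checks = len(re.findall(r'[!=]=\s*null\b', java_source))
    per_100_lines = 100.0 * null_checks / max(len(lines), 1)
    return return_nulls, null_checks, per_100_lines
```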

Article Search
Extracting Relative Thresholds for Source Code Metrics
Paloma Oliveira, Marco Tulio Valente, and Fernando Paim Lima
(UFMG, Brazil; IFMG, Brazil)
Establishing credible thresholds is a central challenge for promoting source code metrics as an effective instrument to control the internal quality of software systems. To address this challenge, we propose the concept of relative thresholds for evaluating metrics data following heavy-tailed distributions. The proposed thresholds are relative because they assume that metric thresholds should be followed by most source code entities, but that it is also natural to have a number of entities in the "long-tail" that do not follow the defined limits. In the paper, we describe an empirical method for extracting relative thresholds from real systems. We also report a study on applying this method in a corpus with 106 systems. Based on the results of this study, we argue that the proposed thresholds express a balance between real and idealized design practices.
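A relative threshold as described above pairs a percentage p with a limit k: "p% of entities should have metric value at most k". A minimal sketch of extracting k for a given p (the percentile form is an assumption for illustration; the paper's extraction method is more elaborate):

```python
# Sketch of the relative-threshold idea: for a chosen percentage p, find
# the smallest limit k such that at least p% of entities satisfy
# "metric <= k" -- i.e., the p-th percentile of the metric distribution.
def relative_threshold(metric_values, p=85):
    values = sorted(metric_values)
    cutoff_index = max(0, int(len(values) * p / 100.0) - 1)
    return values[cutoff_index]
```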

Article Search
Reverse Engineering Web Configurators
Ebrahim Khalil Abbasi, Mathieu Acher, Patrick Heymans, and Anthony Cleve
(University of Namur, Belgium; University of Rennes 1, France)
A Web configurator offers a highly interactive environment to assist users in customising sales products through the selection of configuration options. Our previous empirical study revealed that a significant number of configurators are suboptimal in reliability, efficiency, and maintainability, opening avenues for re-engineering support and methodologies. This paper presents a tool-supported reverse-engineering process to semi-automatically extract configuration-specific data from a legacy Web configurator. The extracted and structured data is stored in formal models (e.g., variability models) and can be used in a forward-engineering process to generate a customized interface with an underlying reliable reasoning engine. Two major components are presented: (1) a Web Wrapper that extracts structured configuration-specific data from unstructured or semi-structured Web pages of a configurator, and (2) a Web Crawler that explores the "configuration space" (i.e., all objects representing configuration-specific data) and simulates users' configuration actions. We describe variability data extraction patterns, used on top of the Wrapper and the Crawler to extract configuration data. Experimental results on five existing Web configurators show that the specification of a few variability patterns enables the identification of hundreds of options.

Article Search

Patterns and Anti-patterns

A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces
Luay Alawneh, Abdelwahab Hamou-Lhadj, Syed Shariyar Murtaza, and Yan Liu
(Concordia University, Canada; Jordan University of Science and Technology, Jordan)
Studies have shown that understanding of inter-process communication patterns is an enabler to effective analysis of high performance computing (HPC) applications. In previous work, we presented an algorithm for recovering communication patterns from traces of HPC systems. The algorithm worked well on small cases but it suffered from low accuracy when applied to large (and most interesting) traces. We believe that this was due to the fact that we viewed the trace as a mere string of operations of inter-process communication. That is, we did not take into account program control flow information. In this paper, we improve the detection accuracy by using function calls to serve as a context to guide the pattern extraction process. When applied to traces generated from two HPC benchmark applications, we demonstrate that this contextual approach improves precision and recall by an average of 56% and 66% respectively over the non-contextual method.

Article Search
Transition and Defect Patterns of Components in Dependency Cycles during Software Evolution
Tosin Daniel Oyetoyan, Daniela Soares Cruzes, and Reidar Conradi
(NTNU, Norway; SINTEF, Norway)
The challenge of breaking existing cyclically connected components in running software is not trivial, since it involves planning and human resources to ensure that the software behavior is preserved after the refactoring activity. Therefore, to motivate refactoring, it is essential to obtain evidence of the benefits to product quality. This study investigates the defect-proneness patterns of cyclically connected components vs. non-cyclic ones as they transition across software releases. We have mined and classified software components into two groups and two transition states: the cyclic and the non-cyclic ones. Next, we have performed an empirical study of four software systems from an evolutionary perspective. Using standard statistical tests on formulated hypotheses, we have determined the significance of the defect profiles and complexities of each group. The results show that during software evolution, components that transition between dependency cycles have a higher probability of being defect-prone than those that transition outside of cycles. Furthermore, of the three complexity variables investigated, we found that an increase in the class reachability set size tends to be more associated with components that turn defective when they transition between dependency cycles. Lastly, we found no evidence of any systematic "cycle-breaking" refactoring between releases of the software systems. Thus, these findings motivate refactoring of components in dependency cycles, taking into account the minimization of metrics such as the class reachability set size.

Article Search
Anti-pattern Detection with Model Queries: A Comparison of Approaches
Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc
(Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary)
Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local-search and incremental evaluation. We provide in-depth comparison of these techniques on the source code of 17 Java projects using queries taken from refactoring operations in different usage profiles. Our results show that general purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios.

Article Search

Early Research Achievements

Maintenance and Co

Examining the Relationship between Topic Model Similarity and Software Maintenance
Scott Grant and James R. Cordy
(Queen's University, Canada)
Software maintenance is the last phase of software development, and typically one of the most time-consuming. One reason for this is the difficulty in finding related source code fragments. A high-level understanding of the source code is necessary to make decisions about which source code fragments should be modified together, for example, in the context of fixing a bug. Even with a similarity metric available, understanding what it means to measure similarity in the first place is important; if a technique suggests that two source code fragments are related, is there a human-oriented way of explaining that relation? In this paper, we attempt to identify a concrete link between software maintenance and the similarity metrics provided by latent topic models. We show that similarity in topic models is related to the likelihood that source code fragments will be modified together in the future, and that an awareness of similar source code can make software maintenance easier.
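A common way to compare the topic distributions that latent topic models assign to two code fragments is cosine similarity; the sketch below illustrates that idea (the paper's exact similarity metric may differ):

```python
import math

# Illustrative sketch: cosine similarity between the topic distributions
# of two source code fragments, as produced by an LDA-style topic model.
# A value near 1 suggests the fragments share vocabulary/topics and, per
# the paper's hypothesis, may be modified together in the future.
def topic_similarity(dist_a, dist_b):
    dot = sum(a * b for a, b in zip(dist_a, dist_b))
    norm = (math.sqrt(sum(a * a for a in dist_a))
            * math.sqrt(sum(b * b for b in dist_b)))
    return dot / norm if norm else 0.0
```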

Article Search
On the Maintainability of CRAN Packages
Maëlick Claes, Tom Mens, and Philippe Grosjean
(University of Mons, Belgium)
When writing software, developers are confronted with a trade-off between depending on existing components and reimplementing similar functionality in their own code. Errors may be inadvertently introduced because of dependencies on unreliable components, and it may take longer to fix these errors. We study such issues in the context of the CRAN archive, a long-lived software ecosystem consisting of over 5000 R packages actively maintained by over 2500 maintainers, with different flavors of each package depending on the development status and target operating system. Based on an analysis of package dependencies and package status, we present preliminary results on the sources of errors in these packages per flavor, and the time needed to fix these errors.

Article Search
Formal Foundations for Semi-parsing
Vadim Zaytsev
(University of Amsterdam, Netherlands)
There exist many techniques for imprecise manipulation of source code (robust parsing, error repair, lexical analysis, etc), mostly relying on heuristic-based tolerance. Such techniques are rarely fully formalised and quite often idiosyncratic, which makes them very hard to compare with respect to their applicability, tolerance level and general usefulness. With a combination of recently developed formal methods such as Boolean grammars and parsing schemata, we can model different tolerant methods of modelling software and formally argue about relationships between them.

Article Search
On the Use of Positional Proximity in IR-Based Feature Location
Emily Hill, Bunyamin Sisman, and Avinash Kak
(Montclair State University, USA; Purdue University, USA)
As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Recently proposed approaches to bug localization and feature location have suggested using the positional proximity of words in the source code files and the bug reports to determine the relevance of a file to a query. Two different types of approaches have emerged for incorporating word proximity and order in retrieval: those based on ad-hoc considerations and those based on Markov Random Field (MRF) modeling. In this paper, we explore using both these types of approaches to identify over 200 features in five open source Java systems. In addition, we use positional proximity of query words within natural language (NL) phrases in order to capture the NL semantics of positional proximity. As expected, our results indicate that the power of these approaches varies from one dataset to another. However, the variations are larger for the ad-hoc positional-proximity based approaches than with the approach based on MRF. In other words, the feature location results are more consistent across the datasets with MRF based modeling of the features.

Article Search
Recommending Verbs for Rename Method using Association Rule Mining
Yuki Kashiwabara, Yuya Onizuka, Takashi Ishio, Yasuhiro Hayase, Tetsuo Yamamoto, and Katsuro Inoue
(Osaka University, Japan; University of Tsukuba, Japan; Nihon University, Japan)
An identifier is one of the crucial elements for program readability. Method names in an object-oriented program are important identifiers because method names are used for understanding the behavior of the methods without reading a part of the program. It is well-known that each method name should consist of a verb and objects according to general guidelines. However, it is not easy to name methods consistently since each of the developers may have a different understanding of the verbs and objects used in the method names. As a first step to enable developers to name methods consistently and easily, we focus on the verbs used in the method names.
In this paper, we present a technique to recommend candidate verbs for a method name so that developers can use consistent verbs for method names. Given a method, we recommend a list of verbs used in many other methods similar to the given method, by using association rules. We have extracted association rules from 445 OSS projects and applied these rules to two projects. As a result, the extracted rules could recommend the current verbs in the top 10 candidates for 60.6% of the methods covered by our approach. Furthermore, we have identified four meaningful groups of rules for verb recommendation.
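The recommendation described above can be pictured as mining "method characteristic → verb" association rules and ranking candidate verbs. A toy sketch of that idea (the feature extraction and scoring here are illustrative assumptions, not the paper's actual rule mining):

```python
from collections import defaultdict

# Toy sketch: count which verbs co-occur with which method characteristics,
# then rank candidate verbs for a new method by summed rule confidence.
def mine_verb_rules(methods):
    """methods: iterable of (features, verb) pairs, where features is a
    set of characteristics, e.g. {"returns:boolean", "reads:field"}."""
    counts = defaultdict(lambda: defaultdict(int))
    for features, verb in methods:
        for f in features:
            counts[f][verb] += 1
    return counts

def recommend_verbs(counts, features, top=10):
    scores = defaultdict(float)
    for f in features:
        total = sum(counts[f].values())
        for verb, n in counts[f].items():
            scores[verb] += n / total  # confidence of rule f -> verb
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [verb for verb, _ in ranked[:top]]
```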

Article Search
An Algorithm for Keyword Search on an Execution Path
Toshihiro Kamiya
(Future University Hakodate, Japan)
This paper presents a code-search method, comprising a keyword code-search algorithm and a prototype implementation. A query is a set of keywords, and a search result is a set of execution paths fulfilling the query, that is, each execution path includes all of the keywords. Here, an execution path represents one possible chain of method calls, at all call levels and over all possible dynamic dispatches in an OO program; thus, many execution paths can be generated even from a small program. The algorithm works on a data structure named an And/Or/Call graph, which is a compact representation of execution paths. The prototype implementation searches names of methods or types, or words in string literals, in Java source code.

Article Search
Comparison of Feature Implementations across Languages, Technologies, and Styles
Ralf Lämmel, Martin Leinberger, Thomas Schmorleiz, and Andrei Varanovich
(University of Koblenz-Landau, Germany)
We describe and validate a method for comparing programming languages or technologies or programming styles in the context of implementing certain programming tasks. To this end, we analyze a number of 'little software systems' readily implementing a common feature set. We analyze source code, structured documentation, derived metadata, and other computed data. More specifically, we compare these systems on the grounds of the NCLOC metric while delegating more advanced metrics to future work. To reason about feature implementations in such a multi-language and multi-technological setup, we rely on an infrastructure which enriches traditional software artifacts (i.e., files in a repository) with additional metadata for implemented features as well as used languages and technologies. All resources are organized and exposed according to Linked Data principles so that they are conveniently explorable; both programmatic and interactive access is possible. The relevant formats and the underlying ontology are openly accessible and documented.

Article Search
Spotting Automatically Cross-Language Relations
Federico Tomassetti, Giuseppe Rizzo, and Marco Torchiano
(Politecnico di Torino, Italy; Università di Torino, Italy; EURECOM, France)
Nowadays most software projects are coded using several formal languages, either spread over different artifacts or even embedded in the same one. These formal languages are linked to each other through cross-language relations, which are mainly framework-specific and established at runtime. In this work we present a language-agnostic approach to automatically detect cross-language relations, to ease refactoring and validation and to provide navigation support to the developer. We map a project onto a set of Abstract Syntax Trees (ASTs); pair-wise, we compute the intersection of the nodes and pre-select potential candidates that can hold cross-language relations. We then factorize the ASTs according to the nodes which surround each candidate and pair-wise compute the semantic similarity of the factorized trees. We narrow down a set of statistically significant features and map them into a predictive model. We apply this procedure to an AngularJS application and show that the approach spots cross-language relations at a fine-grained level with 93.2% recall and an F-measure of 92.2%.

Article Search

Change and Co-evolution

Mining Frequent Bug-Fix Code Changes
Haidar Osman, Mircea Lungu, and Oscar Nierstrasz
(University of Bern, Switzerland)
Detecting bugs as early as possible plays an important role in ensuring software quality before shipping. We argue that mining previous bug fixes can produce good knowledge about why bugs happen and how they are fixed. In this paper, we mine the change history of 717 open source projects to extract bug-fix patterns. We also manually inspect many of the bugs we found to gain insights into the contexts and reasons behind those bugs. For instance, we found that missing null checks and missing initializations are very recurrent, and we believe that they can be automatically detected and fixed.

Article Search
Orchestrating Change: An Artistic Representation of Software Evolution
Shane McIntosh, Katie Legere, and Ahmed E. Hassan
(Queen's University, Canada)
Several visualization tools have been proposed to highlight interesting software evolution phenomena. These tools help practitioners to navigate large and complex software systems, and also support researchers in studying software evolution. However, little work has explored the use of sound in the context of software evolution. In this paper, we propose the use of musical interpretation to support exploration of software evolution data. In order to generate music inspired by software evolution, we use parameter-based sonification, i.e., a mapping of dataset characteristics to sound. Our approach yields musical scores that can be played synthetically or by a symphony orchestra. In designing our approach, we address three challenges: (1) the generated music must be aesthetically pleasing, (2) the generated music must accurately reflect the changes that have occurred, and (3) a small group of musicians must be able to impersonate a large development team. We assess the feasibility of our approach using historical data from Eclipse, which yields promising results.

Article Search Info
Co-evolving Code-Related and Database-Related Changes in a Data-Intensive Software System
Mathieu Goeminne, Alexandre Decan, and Tom Mens
(University of Mons, Belgium)
Current empirical studies on the evolution of software systems primarily analyse source code. Sometimes, social aspects such as the activity of contributors are considered as well. Very few studies, however, focus on data-intensive software systems (DISS), in which a significant part of the total development effort is devoted to maintaining and evolving the database schema. We report on early results obtained in the empirical analysis of the co-evolution between code-related and database-related activities in a large open source DISS. As a case study, we have analysed OSCAR, for which historical information spanning many years is available in a Git repository.

Article Search
Improving the Detection Accuracy of Evolutionary Coupling by Measuring Change Correspondence
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
If two or more program entities change together (i.e., co-change) frequently (i.e., in many commits) during software evolution, it is likely that the entities are related, and we say that the entities exhibit evolutionary coupling. Association rules have been used to express evolutionary coupling, and two related measures, support and confidence, have been used to measure the strength of coupling among the co-changed entities. However, an association rule relies only on the number of times the entities have co-changed; it does not analyze whether the changes correspond to each other and whether the entities are really related. As a result, association rules often report false positives and also ignore important couplings among infrequently co-changed entities. Focusing on this issue, we propose to calculate a new measure, change correspondence, blending in the idea of concept location in a code base to determine whether the changes to the co-changed entities correspond and thus whether the entities are really related. Our preliminary investigation on four subject systems written in two programming languages shows that change correspondence has the potential to accurately determine whether two entities are related even if they co-changed infrequently. Thus, we believe that our new measure will help improve the detection accuracy of evolutionary coupling.
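The support and confidence measures mentioned above are standard association-rule statistics over a commit history. A minimal sketch (the commit representation is an illustrative assumption):

```python
# Minimal sketch: support and confidence of the association rule "a -> b"
# over a commit history, where each commit is the set of entities it changed.
def evolutionary_coupling(commits, a, b):
    changed_a = sum(1 for c in commits if a in c)
    co_changed = sum(1 for c in commits if a in c and b in c)
    support = co_changed / len(commits)          # how often a and b co-change
    confidence = co_changed / changed_a if changed_a else 0.0
    return support, confidence
```

Change correspondence, the measure the paper proposes, additionally asks whether the co-changes are textually/conceptually related rather than merely coincident.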

Article Search

Industry Track

Fact Extraction from Bash in Support of Script Migration
Ian J. Davis, Richard C. Holt, and Ron Mraz
(University of Waterloo, Canada; Owl Computing Technologies, USA)
Owl Computing Technologies provides software and hardware that facilitates secure unidirectional data transfer across the internet. Bash scripts are used to facilitate customer installation of Owl’s client/server software, and to provide high level management, control, and monitoring of client/server interfaces. With the evolution of more robust scripting languages, Owl now wishes to convert their bash scripts to other scripting languages. As part of this conversion exercise the configuration and customization of their bash scripts will no longer involve direct end user modifications of the script logic. It will instead be achieved through appropriate modification of a supporting XML configuration file, which is read by each script. This avoids the risk that end users erroneously change scripts, and makes legitimate end user customization of their scripts simpler, more obvious, and easier to discern.
An open source fact extractor was implemented that determines the dynamic usage made of every variable within an arbitrary bash script. This tool reports errors in a script and generates an XML configuration file that describes variable usage. Those variables whose value may not be assigned by an end user are manually removed from this XML configuration file. A second program reads this configuration file, generates the appropriate bash variable assignment statements, and these are then applied within bash by using the bash eval command. Collectively this provides a simple mechanism for altering arbitrary bash scripts so that they use an external XML configuration file, as a first step in the larger exercise of migrating bash scripts to other scripting languages.

Article Search Info
Lightweight Runtime Reverse Engineering of Binary File Format Variants
Jeroen van den Bos
(Netherlands Forensic Institute, Netherlands)
Binary file formats are regularly extended and modified, often unintentionally in the form of bugs in the implementations of applications and libraries that create files. Applications that need to read data from binary files created by other applications face the complicated task of supporting the resulting many variants.
Lightweight implementation patterns to perform runtime reverse engineering can be used to handle common extensions, modifications and bugs. This increases application usability by generating fewer errors as well as provides useful automated feedback to maintainers.
This paper describes a set of patterns that are the result of experience in developing and maintaining a collection of automated digital forensics tools. The patterns are illustrated through practical examples and can be directly applied by practitioners.

Article Search
Towards Tool Support for Analyzing Legacy Systems in Technical Domains
Claus Klammer and Josef Pichler
(Software Competence Center Hagenberg, Austria)
Software in technical domains contains extensive and complex computations in a highly-optimized and unstructured way. Such software systems developed and maintained over years are prone to become legacy code based on old technology and without accurate documentation. We have conducted several industrial projects to reengineer and re-document legacy systems in electrical engineering and steel making domains by means of self-provided techniques and tools. Based on this experience, we derived requirements for a toolkit to analyze legacy code in technical domains and developed a corresponding toolkit including feature location and static analysis on a multi-language level. We have applied our approach and toolkit for software systems implemented in the C++, Fortran, and PL/SQL programming languages and illustrate main benefits of our approach from these experiences.

Article Search
Analysis and Clustering of Model Clones: An Automotive Industrial Experience
Manar H. Alalfi, James R. Cordy, and Thomas R. Dean
(Queen's University, Canada)
In this paper we present our early experience analyzing subsystem similarity in industrial automotive models. We apply our model clone detection tool, SIMONE, to identify identical and near-miss Simulink subsystem clones and cluster them into classes based on clone size and similarity threshold. We then analyze the clone detection results using graph visualizations generated by SIMGraph, a SIMONE extension, to identify subsystem patterns. SIMGraph provides us and our industrial partners with new, interesting, and useful insights that improve our understanding of the analyzed models and suggest better ways to maintain them.

Article Search
Experience on Applying Software Architecture Recovery to Automotive Embedded Systems
Xinhai Zhang, Magnus Persson, Mattias Nyberg, Behrooz Mokhtari, Anton Einarson, Henrik Linder, Jonas Westman, DeJiu Chen, and Martin Törngren
(KTH, Sweden; Scania, Sweden; HiQ, Sweden)
The importance and potential advantages of a comprehensive product architecture description are well described in the literature. However, developing such a description takes additional resources, and it is difficult to maintain consistency with evolving implementations. This paper presents an approach, and industrial experience with it, based on architecture recovery from source code at the truck manufacturer Scania CV AB. The extracted representation of the architecture is presented in several views and verified at the CAN signal level. Lessons learned are discussed.

Article Search
Towards Recovering and Exploiting Domain Knowledge from C Code: A Case Study on Automotive Software
Jochen Quante, Mohammed Tarabain, and Janet Siegmund
(Bosch, Germany; University of Magdeburg, Germany; University of Passau, Germany)
To create a software system, a lot of knowledge about the domain that it deals with is needed. This is particularly true for embedded control software, which is in close contact with physical machinery, relationships, and effects. In this paper, we investigate if and how this knowledge, once built into the software, can be recovered from the source code - and what it can be used for. We apply approaches from previous research to engine control software and adapt them to our setting. In particular, we are constrained to pure C code with limited structure, whereas previous work has mainly dealt with object-oriented software. Despite these limiting conditions, our study shows promising results.

Migrating Legacy Spreadsheets-Based Systems to Web MVC Architecture: An Industrial Case Study
Domenico Amalfitano, Anna Rita Fasolino, Valerio Maggio, Porfirio Tramontana, Giancarlo Di Mare, Ferdinando Ferrara, and Stefano Scala
(University of Naples Federico II, Italy; Fiat Group Automobiles, Italy)
The use of spreadsheets to implement information systems is widespread in industry. Scripting languages and ad-hoc frameworks (e.g., Visual Basic for Applications) for Rapid Application Development are often exploited by organizations to quickly develop spreadsheet-based information systems for supporting the information management of their business processes. Maintenance tasks on these systems can be very difficult and cause a considerable worsening of the overall system quality. To prevent these issues, migrating such systems to new architectures may be a valid solution. In this paper we present our experience in migrating an Excel spreadsheet-based system to a Web application based on an MVC architecture. The proposed approach was successfully applied in the real context of a company operating in the automotive industry.


Project Track

CHOReOS: Large Scale Choreographies for the Future Internet
Marco Autili, Paola Inverardi, and Massimo Tivoli
(University of L'Aquila, Italy)
In this paper we share our experience in the CHOReOS EU project. CHOReOS provides solutions for the development and execution of large scale choreographies for the Future Internet. Our main involvement in the project concerns the definition of a choreography development process based on automated synthesis of choreographies out of a large scale service base and a user-centric requirements specification. Focusing on work package WP2, whose main outcome is the realization of the CHOReOS development process, we discuss its activities and summarize the main objectives and related achievements.

DIVERSIFY: Ecology-Inspired Software Evolution for Diversity Emergence
Benoit Baudry, Martin Monperrus, Cendrine Mony, Franck Chauvel, Franck Fleurey, and Siobhán Clarke
(INRIA, France; University of Rennes 1, France; SINTEF, Norway; Trinity College Dublin, Ireland)
DIVERSIFY is an EU-funded project which aims at favoring spontaneous diversification in software systems in order to increase their adaptive capacities. This objective is founded on three observations: software has to constantly evolve to face unpredictable changes in its requirements and execution environment, or to respond to failures (bugs, attacks, etc.); the emergence and maintenance of high levels of diversity are essential to provide adaptive capacities to many forms of complex systems, ranging from ecological and biological systems to social and economic systems; and diversity levels tend to be very low in software systems.
DIVERSIFY explores how the biological evolutionary mechanisms that sustain high levels of biodiversity in ecosystems (speciation, phenotypic plasticity, and natural selection) can be translated into software evolution principles. In this work, we consider evolution as a driver for diversity, and diversity as a means to increase resilience in software systems. In particular, we are inspired by bipartite ecological relationships to investigate the automatic diversification of the server side of a client-server architecture. This type of software diversity aims at mitigating the risks of software monoculture. The consortium gathers researchers from the software-intensive systems, distributed systems, and ecology areas in order to transfer ecological concepts and processes as software design principles.

The MARKet for Open Source: An Intelligent Virtual Open Source Marketplace
Gabriele Bavota, Alicja Ciemniewska, Ilknur Chulani, Antonio De Nigro, Massimiliano Di Penta, Davide Galletti, Roberto Galoppini, Thomas F. Gordon, Pawel Kedziora, Ilaria Lener, Francesco Torelli, Roberto Pratola, Juliusz Pukacki, Yacine Rebahi, and Sergio García Villalonga
(University of Sannio, Italy; Poznan Supercomputing and Networking Center, Poland; ATOS Research, Spain; Engineering Ingegneria Informatica, Italy; Slashdot Media, UK; Fraunhofer FOKUS, Germany; T6 ECO, Italy)
This paper describes the MARKOS (the MARKet for Open Source) European Project, an FP7-ICT-2011-8 STREP project, which aims to realize a service and an interactive application providing an integrated view of the open source projects available on the web, focusing on the functional, structural, and licensing aspects of software source code. MARKOS involves 7 partners from 5 countries, including industries, universities, and research institutions.
MARKOS differs from other services available on the Web, which often provide text-based code search, in that it provides the possibility to browse the code structure at a high level of abstraction, in order to facilitate the understanding of the software from a technical point of view. It also highlights relationships between software components released by different projects, giving an integrated view of the available Open Source software at a global scale. Last but not least, MARKOS is able to highlight potential legal issues due to license incompatibilities, providing explanations for these issues and supporting developers in the search for alternative solutions to their problems.
MARKOS will involve end users in order to put its results into practice in scenarios coming from industrial and Open Source communities.

ECOS: Ecological Studies of Open Source Software Ecosystems
Tom Mens, Maëlick Claes, and Philippe Grosjean
(University of Mons, Belgium)
Software ecosystems, collections of projects developed by the same community, are among the most complex artefacts constructed by humans. Collaborative development of open source software (OSS) has witnessed an exponential increase over the past two decades. Our hypothesis is that software ecosystems bear many similarities with natural ecosystems. While natural ecosystems have been the subject of study for many decades, research on software ecosystems is more recent. For this reason, the ECOS research project aims to determine whether and how selected ecological models and theories from natural ecosystems can be adapted and adopted to understand and better explain how OSS projects (akin to biological species) evolve, and to determine the main factors that drive the success or popularity of these projects. Expressed in biological terms, we wish to use knowledge of the evolution of natural ecosystems to provide support aimed at optimizing the fitness of OSS projects, and at increasing the resistance and resilience of OSS ecosystems.

FITTEST: A New Continuous and Automated Testing Process for Future Internet Applications
Tanja Vos, Paolo Tonella, Wishnu Prasetya, Peter M. Kruse, Alessandra Bagnato, Mark Harman, and Onn Shehory
(Universidad Politécnica de Valencia, Spain; Fondazione Bruno Kessler, Italy; Utrecht University, Netherlands; Berner & Mattner, Germany; SOFTEAM, France; University College London, UK; IBM Research, Israel)
Since our society is becoming increasingly dependent on applications emerging on the Future Internet, the quality of these applications becomes a matter that cannot be neglected. However, the complexity of the technologies involved in Future Internet applications makes testing extremely challenging. The EU FP7 FITTEST project has addressed some of these challenges by developing and evaluating a Continuous and Integrated Testing Environment that monitors a Future Internet application as it runs, so that it can automatically adapt the testware to the dynamically changing behaviour of the application.

Model Inference and Security Testing in the SPaCIoS Project
Matthias Büchler, Karim Hossen, Petru Florin Mihancea, Marius Minea, Roland Groz, and Catherine Oriat
(Technische Universität München, Germany; University of Grenoble, France; LIG, France; Institute e-Austria Timisoara, Romania; Politehnica University of Timisoara, Romania)
The SPaCIoS project has as its goal the validation and testing of security properties of services and web applications. It proposes a methodology and tool collection centered around models described in a dedicated specification language, supporting model inference, mutation-based testing, and model checking. The project has developed two approaches to reverse engineer models from implementations. One is based on remote interaction (typically through an HTTP connection) to observe the runtime behaviour and infer a model in black-box mode. The other is based on analysis of application code when available. This paper presents the reverse engineering parts of the project, along with an illustration of how vulnerabilities can be found with various SPaCIoS tool components on a typical security benchmark.


Tool Demonstrations

Tool Demonstrations 1

in*Bug: Visual Analytics of Bug Repositories
Tommaso Dal Sasso and Michele Lanza
(University of Lugano, Switzerland)
Bug tracking systems are used to track and store the defects reported during the life of software projects. The underlying repositories represent a valuable source of information, used for example for defect prediction and program comprehension. However, bug tracking systems present the actual bugs essentially in textual form, which is not only cumbersome to navigate, but also hinders the understanding of the intricate pieces of information that revolve around software bugs. We present in*Bug, a web-based visual analytics platform to navigate and inspect bug repositories. in*Bug provides several interactive views to understand detailed information about the bugs and the people who report them. The tool can be downloaded at http://inbug.inf.usi.ch.

APIEvolutionMiner: Keeping API Evolution under Control
André Hora, Anne Etien, Nicolas Anquetil, Stéphane Ducasse, and Marco Tulio Valente
(INRIA, France; UFMG, Brazil)
During software evolution, source code is constantly refactored. In real-world migrations, many methods in the newer version are not present in the old version (e.g., 60% of the methods in Eclipse 2.0 were not in version 1.0). This requires changes to be consistently applied to reflect the new API and avoid further maintenance problems. In this paper, we propose a tool to extract rules by monitoring API changes applied in source code during system evolution. In this process, changes are mined at the revision level in code history. Our tool focuses on mining invocation changes to keep track of how they are evolving. We also provide three case studies in order to evaluate the tool.

How the Sando Search Tool Recommends Queries
Xi Ge, David Shepherd, Kostadin Damevski, and Emerson Murphy-Hill
(North Carolina State University, USA; ABB, USA; Virginia State University, USA)
Developers spend a significant amount of time searching their local codebase. To help them search efficiently, researchers have proposed novel tools that apply state-of-the-art information retrieval algorithms to retrieve relevant code snippets from the local codebase. However, these tools still rely on the developer to craft an effective query, which requires that the developer is familiar with the terms contained in the related code snippets. Our empirical data from a state-of-the-art local code search tool, called Sando, suggests that developers are sometimes unacquainted with their local codebase. In order to bridge the gap between developers and their ever-increasing local codebase, in this paper we demonstrate the recommendation techniques integrated in Sando.

Building Development Tools Interactively using the Ekeko Meta-Programming Library
Coen De Roover and Reinout Stevens
(Vrije Universiteit Brussel, Belgium)
Ekeko is a Clojure library for applicative logic meta-programming against an Eclipse workspace. Ekeko has been applied successfully to answering program queries (e.g., “does this bug pattern occur in my code?”), to analyzing project corpora (e.g., “how often does this API usage pattern occur in this corpus?”), and to transforming programs (e.g., “change occurrences of this pattern as follows”) in a declarative manner. These applications rely on a seamless embedding of logic queries in applicative expressions. While the former identify source code of interest, the latter associate error markers with, compute statistics about, or rewrite the identified source code snippets. In this paper, we detail the logic and applicative aspects of the Ekeko library. We also highlight key choices in their implementation. In particular, we demonstrate how a causal connection with the Eclipse infrastructure enables building development tools interactively on the Clojure read-eval-print loop.

Bit-Error Injection for Software Developers
Marcel Heing-Becker, Timo Kamph, and Sibylle Schupp
(Hamburg University of Technology, Germany)
This paper presents FITIn, a bit-error injection tool designed for evaluating software-implemented hardware fault tolerance (SIHFT) mechanisms. Like most bit-error injection tools, FITIn injects faults at run time into the binary of a program. Unlike previous bit-error injection tools, FITIn allows a software developer to control the targets of injection campaigns at the level of a higher programming language rather than assembly. FITIn is implemented as a Valgrind plugin and has been tested for C programs. We present its architecture, demonstrate its functioning using examples from three benchmarks (Dhrystone, STAMP, and CoreMark), provide performance figures, and discuss general limitations of the approach.

QualityGate SourceAudit: A Tool for Assessing the Technical Quality of Software
Tibor Bakota, Péter Hegedűs, István Siket, Gergely Ladányi, and Rudolf Ferenc
(FrontEndART Software, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary; University of Szeged, Hungary)
Software systems evolve continuously in order to fulfill ever-changing business needs. This endless modification, however, decreases the internal quality of the system over time. This phenomenon is called software erosion, and it results in higher development, testing, and operational costs. The SourceAudit tool presented in this paper helps manage the technical risks of software deterioration by allowing immediate, automatic, and objective assessment of software quality. By monitoring the high-level technical quality of systems, it is possible to immediately perform the steps needed to reduce the effects of software erosion, thus reaching higher maintainability and lower costs in the mid and long term. The tool measures source code maintainability according to ColumbusQM, a probabilistic software maintainability model based on ISO/IEC 25010. It gives a holistic view of software quality and warns of declines in source code maintainability.


Tool Demonstrations 2

Follow the Path: Debugging Tools for Test-Driven Fault Navigation
Michael Perscheid and Robert Hirschfeld
(HPI, Germany)
Debugging failing test cases, particularly the search for failure causes, is often a laborious and time-consuming activity. Standard debugging tools such as symbolic debuggers and test runners hardly support developers in this task because they provide neither guidance toward failure causes nor back-in-time capabilities.
In this paper, we present test-driven fault navigation as a debugging guide that integrates spectrum-based and state anomalies into execution histories in order to systematically trace failure causes back to defects. We describe and demonstrate our Path tools that implement our debugging method for the Squeak/Smalltalk development environment.

jModex: Model Extraction for Verifying Security Properties of Web Applications
Petru Florin Mihancea and Marius Minea
(Politehnica University of Timisoara, Romania; Institute e-Austria Timisoara, Romania)
Detecting security vulnerabilities in web applications is an important task before taking them on-line. We present jModex, a tool that analyzes the code of web applications to extract behavioral models. The security properties of these models can then be verified with a model checker. An initial evaluation, in which a confirmed security flaw is identified using a model extracted by jModex, shows the tool's potential.

PHP AiR: Analyzing PHP Systems with Rascal
Mark Hills and Paul Klint
(East Carolina University, USA; CWI, Netherlands; INRIA, France)
PHP is currently one of the most popular programming languages, widely used in both the open source community and in industry to build large web-focused applications and application frameworks. To provide a solid framework for working with large PHP systems in areas such as evaluating how language features are used, studying how PHP systems evolve, program analysis for refactoring and security validation, and software metrics, we have developed PHP AiR, a framework for PHP Analysis in Rascal. Here we briefly describe features available in PHP AiR, integration with the Eclipse PHP Development Tools, and usage scenarios in program analysis, metrics, and empirical software engineering.

Mc2for Demo: A Tool for Automatically Translating Matlab to Fortran 95
Xu Li and Laurie Hendren
(McGill University, Canada)
MATLAB is a dynamic numerical scripting language widely used by scientists, engineers, and students. While MATLAB's high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static languages such as Fortran for their final distributable code. Rather than requiring programmers to rewrite their code by hand, our solution is a tool that automatically translates the original MATLAB program to an equivalent Fortran program. There are several important challenges in automatically translating MATLAB to Fortran, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to Fortran constructs.
In this tool demonstration, we introduce Mc2For, a mature prototype which automatically translates MATLAB programs to Fortran. The tool takes as input a MATLAB entry-point function file, together with information about its input parameters; it then automatically finds all functions reachable directly or indirectly from the entry point, loads the necessary files, and translates all the reachable MATLAB functions to equivalent Fortran. The output of the tool is a collection of Fortran function files, which can be compiled with any Fortran 95-compliant compiler. Mc2For is open source and has been implemented in Java using the McLab framework, which means that the tool runs on any system supporting Java.

Dahlia: A Visual Analyzer of Database Schema Evolution
Loup Meurice and Anthony Cleve
(University of Namur, Belgium)
In a continuously changing environment, software evolution becomes an unavoidable activity. The mining software repositories (MSR) field studies the valuable data available in software repositories such as source code version-control systems, issue/bug-tracking systems, and communication archives. In recent years, many researchers have used MSR techniques as a way to support software understanding and evolution. While many software systems are data-intensive, i.e., their central artifact is a database, little attention has been devoted to the analysis of this important system component in the context of software evolution. The goal of our work is to reduce this gap by considering the database evolution history as an additional source of information to aid software evolution. We present DAHLIA (Database ScHema EvoLutIon Analysis), a visual analyzer of database schema evolution. Our tool mines the database schema evolution history from the software repository and allows its interactive, visual analysis. We describe DAHLIA and present our novel approach to supporting data-intensive software evolution.


Doctoral Symposium

SENSEI: Software Evolution Service Integration
Jan Jelschen
(University of Oldenburg, Germany)
Software evolution tools mostly implement a single technique to assist in achieving a specific objective. Overhauling, renovating, or migrating a large and complex legacy software system requires the proper combination of several different techniques, each appropriate for a subtask. Since few tools are built for interoperability, setting up a toolchain to support a given software evolution process is an elaborate, time-consuming, error-prone, and redundant endeavor, which yields brittle and inflexible toolchains with little to no reusability.
This paper presents SENSEI, an approach that enables the implementation of an integration framework for software evolution tools using component-based, service-oriented, and model-driven methods, to ease toolchain creation and enable agile execution of software evolution projects. It will be evaluated by implementing it, using it to build the toolchains supporting two software evolution projects, and having practitioners assess its usefulness.

Understanding the Evolution of Socio-technical Aspects in Open Source Ecosystems
Mathieu Goeminne
(University of Mons, Belgium)
Open source systems that are related to each other may be grouped into larger systems called software ecosystems. The goal of our PhD dissertation was to understand the evolution of the social aspects of such ecosystems. More precisely, we studied how contributors to these ecosystems can be grouped into different communities that evolve and collaborate in different ways. In doing so, we provided evidence that contributors have specificities that are not taken into account by today's analysis tools. Becoming aware of these specificities opens up new and practically relevant research questions on how new automated tools can be designed and used to offer better support to the ecosystem's contributors in their activities.


Workshop Descriptions

International Workshop on Software Clones
Rainer Koschke, Nils Göde, and Yoshiki Higo
(University of Bremen, Germany; CQSE, Germany; Osaka University, Japan)
Software clones are identical or similar pieces of code, models, or designs. At IWSC 2014, we will discuss issues in software clone detection, analysis, and management, as well as applications in software engineering contexts that can benefit from knowledge of clones. These are important emerging topics in software engineering research and practice. We will also discuss broader topics related to software clones, such as clone detection methods, clone classification, management, and evolution; the role of clones in software system architecture, quality, and evolution; clones in plagiarism, licensing, and copyright; and other topics related to similarity in software systems. The format of this workshop will give enough time for intense discussions.

International Workshop on Open and Original Problems in Software Language Engineering
Anya Helene Bagge and Vadim Zaytsev
(University of Bergen, Norway; CWI, Netherlands; University of Amsterdam, Netherlands)
The second international workshop on Open and Original Problems in Software Language Engineering (OOPSLE'14) follows the first one, held at WCRE 2013 in Koblenz. It is meant to be a discussion-oriented and collaborative forum for formulating and addressing open, unsolved, and unsolvable problems in software language engineering (SLE), the research domain concerned with systematic, disciplined, and measurable approaches to the development, evolution, and maintenance of artificial languages used in software development. OOPSLE aims to serve as a think tank for identifying and formulating challenges in the software language engineering field; these challenges could be addressed later at venues such as SLE, MODELS, CSMR, WCRE, and ICSM.

International Workshop on Software Quality and Maintainability
Lodewijk Bergmans, Steven Raemaekers, and Tom Mens
(Software Improvement Group, Netherlands; University of Mons, Belgium)
SQM 2014 (http://sqm2014.sig.eu), the 8th International Workshop on Software Quality and Maintainability, was organized as a satellite event of the CSMR-WCRE 2014 conference in Antwerp, on February 3, 2014. The workshop received 18 submissions focusing on research, empirical studies, industry practices, and experiences in the area of software quality, maintainability, and traceability. The special theme of this year's workshop was "exploring the boundaries between the theory and practice of software quality".

