ICSME 2015 – Proceedings

Messages from the Chairs
Welcome to ICSME 2015 in Bremen! It is a pleasure and a distinct honour to host the IEEE International Conference on Software Maintenance and Evolution (ICSME) for the first time in Germany. Since its start in 1983 – at that time still named ICSM (International Conference on Software Maintenance) – ICSME has grown and developed into the largest international scientific forum for software maintenance researchers and practitioners. Participants from academia, government, and industry share ideas and experiences solving critical software maintenance problems. Today, ICSME is the premier international event in the field of maintenance and evolution.

Technical Research Track

Developers
Tue, Sep 29, 10:40 - 12:20, GW2 B3009 (Chair: Michael Godfrey)

Software History under the Lens: A Study on Why and How Developers Examine It
Mihai Codoban, Sruti Srinivasa Ragavan, Danny Dig, and Brian Bailey
(Oregon State University, USA; University of Illinois at Urbana-Champaign, USA)
Despite software history being indispensable for developers, there is little empirical knowledge about how they examine software history. Without such knowledge, researchers and tool builders are in danger of making wrong assumptions and building inadequate tools. In this paper we present an in-depth empirical study about the motivations developers have for examining software history, the strategies they use, and the challenges they encounter. To learn these, we interviewed 14 experienced developers from industry, and then extended our findings by surveying 217 developers. We found that history does not begin with the latest commit but with uncommitted changes. Moreover, we found that developers had different motivations for examining recent and old history. Based on these findings we propose 3-LENS HISTORY, a novel unified model for reasoning about software history.

To Fix or to Learn? How Production Bias Affects Developers' Information Foraging during Debugging
David Piorkowski, Scott D. Fleming, Christopher Scaffidi, Margaret Burnett, Irwin Kwan, Austin Z. Henley, Jamie Macbeth, Charles Hill, and Amber Horvath
(Oregon State University, USA; University of Memphis, USA; Clemson University, USA)
Developers performing maintenance activities must balance their efforts to learn the code vs. their efforts to actually change it. This balancing act is consistent with the "production bias" that, according to Carroll's minimalist learning theory, generally affects software users during everyday tasks. This suggests that developers' focus on efficiency should have marked effects on how they forage for the information they think they need to fix bugs. To investigate how developers balance fixing versus learning during debugging, we conducted the first empirical investigation of the interplay between production bias and information foraging. Our theory-based study involved 11 participants: half tasked with fixing a bug, and half tasked with learning enough to help someone else fix it. Despite the subtlety of difference between their tasks, participants foraged remarkably differently, making foraging decisions from different types of "patches," with different types of information, and succeeding with different foraging tactics.

Developers' Perception of Co-change Patterns: An Empirical Study
Luciana L. Silva, Marco Tulio Valente, Marcelo de A. Maia, and Nicolas Anquetil
(Federal University of Minas Gerais, Brazil; Federal Institute of Triangulo Mineiro, Brazil; Federal University of Uberlândia, Brazil; INRIA, France)
Co-change clusters are groups of classes that frequently change together. They are proposed as an alternative modular view, which can be used to assess the traditional decomposition of systems in packages. To investigate developers' perception of co-change clusters, we report in this paper a study with experts on six systems, implemented in two languages. We mine 102 co-change clusters from the version history of such systems, which are classified in three patterns regarding their projection to the package structure: Encapsulated, Crosscutting, and Octopus. We then collect the perception of expert developers on such clusters, aiming to answer two central questions: (a) what concerns and changes are captured by the extracted clusters? (b) do the extracted clusters reveal design anomalies? We conclude that Encapsulated Clusters are often viewed as healthy designs and that Crosscutting Clusters tend to be associated with design anomalies. Octopus Clusters are normally associated with expected class distributions, which are not easy to implement in an encapsulated way, according to the interviewed developers.
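As a rough illustration of what "classes that frequently change together" means, the following Python sketch counts how often pairs of classes appear in the same commit and keeps the pairs that reach a support threshold. It is not the mining procedure used in the paper: the commit data, the class names, and the helper names `co_change_counts` and `frequent_pairs` are all hypothetical, and a real approach would go on to cluster these pairs.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of classes is changed in the same commit."""
    counts = Counter()
    for changed in commits:
        # Sort and deduplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(changed)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(commits, min_support=2):
    """Keep only the pairs that co-changed at least min_support times."""
    return {pair: n for pair, n in co_change_counts(commits).items()
            if n >= min_support}


# Hypothetical version history: each commit lists the classes it touched.
commits = [
    ["Order", "OrderDao", "Invoice"],
    ["Order", "OrderDao"],
    ["Invoice", "Mailer"],
    ["Order", "OrderDao", "Mailer"],
]
pairs = frequent_pairs(commits, min_support=2)
print(pairs)
```

In this toy history only Order and OrderDao co-change often enough to be kept; comparing such groups with the package structure is the kind of projection the abstract classifies as Encapsulated, Crosscutting, or Octopus.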

When and Why Developers Adopt and Change Software Licenses
Christopher Vendome, Mario Linares-Vásquez, Gabriele Bavota, Massimiliano Di Penta, Daniel M. German, and Denys Poshyvanyk
(College of William and Mary, USA; Free University of Bolzano, Italy; University of Sannio, Italy; University of Victoria, Canada)
Software licenses legally govern the way in which developers can use, modify, and redistribute a particular system. While previous studies either investigated licensing through mining software repositories or studied licensing through FOSS reuse, we aim at understanding the rationale behind developers' decisions for choosing or changing software licensing by surveying open source developers. In this paper, we analyze when developers consider licensing, the reasons why developers pick a license for their project, and the factors that influence licensing changes. Additionally, we explore the licensing-related problems that developers experienced and expectations they have for licensing support from forges (e.g., GitHub). Our investigation involves, on one hand, the analysis of the commit history of 16,221 Java open source projects to identify the commits where licenses were added or changed. On the other hand, it consists of a survey in which 138 developers reported their involvement in licensing-related decisions and 52 provided deeper insights about the rationale behind the actions that they had undertaken. The results indicate that developers adopt licenses early in the project's development and change licensing after some period of development (if at all). We also found that developers have inherent biases with respect to software licensing. Additionally, reuse, whether by a non-contributor or for commercial purposes, is a dominant reason why developers change licenses of their systems. Finally, we discuss potential areas of research that could ameliorate the difficulties that software developers are facing with regard to licensing issues of their software systems.

Program Comprehension
Tue, Sep 29, 13:50 - 15:30, GW2 B3009 (Chair: Denys Poshyvanyk)

Investigating Naming Convention Adherence in Java References
Simon Butler, Michel Wermelinger, and Yijun Yu
(Open University, UK)
Naming conventions can help the readability and comprehension of code, and thus the onboarding of new developers. Conventions also provide cues that help developers and tools extract information from identifier names to support software maintenance. Tools exist to automatically check naming conventions but they are often limited to simple checks, e.g. regarding typography. The adherence to more elaborate conventions, such as the use of noun and verbal phrases in names, is not checked. We present Nominal, a naming convention checking library for Java that allows the declarative specification of conventions regarding typography and the use of abbreviations and phrases. To test Nominal, and to investigate the extent to which developers follow conventions, we extract 3.5 million reference - field, formal argument and local variable - name declarations from 60 FLOSS projects and determine their adherence to two well-known Java naming convention guidelines that give developers scope to choose a variety of forms of name, and sometimes offer conflicting advice. We found developers largely follow naming conventions, but adherence to specific conventions varies widely.

Developing a Model of Loop Actions by Mining Loop Characteristics from a Large Code Corpus
Xiaoran Wang, Lori Pollock, and K. Vijay-Shanker
(University of Delaware, USA)
Some high level algorithmic steps require more than one statement to implement, but are not large enough to be a method on their own. Specifically, many algorithmic steps (e.g., count, compare pairs of elements, find the maximum) are implemented as loop structures, which lack the higher level abstraction of the action being performed, and can negatively affect both human readers and automatic tools. Additionally, in a study of 14,317 projects, we found that less than 20% of loops are documented to help readers. In this paper, we present a novel automatic approach to identify the high level action implemented by a given loop. We leverage the available, large source of high-quality open source projects to mine loop characteristics and develop an action identification model. We use the model and feature vectors extracted from loop code to automatically identify the high level actions implemented by loops. We have evaluated the accuracy of the loop action identification and coverage of the model over 7159 open source programs. The results show great promise for this approach to automatically insert internal comments and provide additional higher level naming for loop actions to be used by tools such as code search.

Delta Extraction: An Abstraction Technique to Comprehend Why Two Objects Could Be Related
Naoya Nitta and Tomohiro Matsuoka
(Konan University, Japan)
In an execution of a large scale program, even a simple observable behavior may be generated by a wide range of the source code. To comprehend how such a behavior is implemented in the code, a debugger would be helpful. However, when using a debugger, developers often encounter several types of cumbersome tasks and are often confused by the huge and complicated runtime information. To support such a debugger-based comprehension task, we propose an abstraction technique of runtime information, named delta, and present a delta extraction and visualization tool. Basically, a delta is defined for two linked objects in an object-oriented program's execution. It intuitively represents the reason why these objects could be related in the execution, and it can hide the details of how these objects were related. We have conducted experiments on four subject tasks from two real-world systems to evaluate how appropriately an extracted delta can answer the "why" question and how much the tool can reduce the working time to answer the question. The results show that each delta can successfully answer the question and that a debugger-based task taking tens of minutes to one hour can be shortened by extracting a delta.

Modeling Changeset Topics for Feature Location
Christopher S. Corley, Kelly L. Kashuda, and Nicholas A. Kraft
(University of Alabama, USA; ABB Corporate Research, USA)
Feature location is a program comprehension activity in which a developer inspects source code to locate the classes or methods that implement a feature of interest. Many feature location techniques (FLTs) are based on text retrieval models, and in such FLTs it is typical for the models to be trained on source code snapshots. However, source code evolution leads to model obsolescence and thus to the need to retrain the model from the latest snapshot. In this paper, we introduce a topic-modeling-based FLT in which the model is built incrementally from source code history. By training an online learning algorithm using changesets, the FLT maintains an up-to-date model without incurring the non-trivial computational cost associated with retraining traditional FLTs. Overall, we studied over 600 defects and features from 4 open-source Java projects. We also present a historical simulation that demonstrates how the FLT performs as a project evolves. Our results indicate that the accuracy of a changeset-based FLT is similar to that of a snapshot-based FLT, but without the retraining costs.

Software Quality
Tue, Sep 29, 16:00 - 17:40, GW2 B3009 (Chair: Alexander Serebrenik)

Four Eyes Are Better Than Two: On the Impact of Code Reviews on Software Quality
Gabriele Bavota and Barbara Russo
(Free University of Bolzano, Italy)
Code review is advocated as one of the best practices to improve software quality and reduce the likelihood of introducing defects during code change activities. Recent research has shown how code components having a high review coverage (i.e., a high proportion of reviewed changes) tend to be less involved in post-release fixing activities. Yet the relationship between code review and bug introduction or the overall software quality is still largely unexplored.
This paper presents an empirical, exploratory study on three large open source systems that aims at investigating the influence of code review on (i) the chances of inducing bug fixes and (ii) the quality of the committed code components, as assessed by code coupling, complexity, and readability.
Findings show that unreviewed commits (i.e., commits that did not undergo a review process) have over two times more chances of introducing bugs than reviewed commits (i.e., commits that underwent a review process). In addition, code committed after review has a substantially higher readability with respect to unreviewed code.

A Comparative Study on the Bug-Proneness of Different Types of Code Clones
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
Code clones are defined to be identical or nearly identical code fragments in a software system's code-base. The existing clone related studies reveal that code clones are likely to introduce bugs and inconsistencies in the code-base. However, although there are different types of clones, it is still unknown which types of clones have a higher likelihood of introducing bugs to the software systems and so should be considered more important for managing with techniques such as refactoring or tracking. With this focus, we performed a study that compared the bug-proneness of the major clone-types: Type 1, Type 2, and Type 3. According to our experimental results on thousands of revisions of seven diverse subject systems, Type 3 clones exhibit the highest bug-proneness among the three clone-types. The bug-proneness of Type 1 clones is the lowest. Also, Type 3 clones have the highest likelihood of being co-changed consistently while experiencing bug-fixing changes. Moreover, the Type 3 clones that experience bug-fixes have a higher possibility of evolving following a Similarity Preserving Change Pattern (SPCP) compared to the bug-fix clones of the other two clone-types. From the experimental results it is clear that Type 3 clones should be given a higher priority than the other two clone-types when making clone management decisions. We believe that our study provides useful implications for ranking clones for refactoring and tracking.

An Empirical Study of Bugs in Test Code
Arash Vahabzadeh, Amin Milani Fard, and Ali Mesbah
(University of British Columbia, Canada)
Testing aims at detecting (regression) bugs in production code. However, testing code is just as likely to contain bugs as the code it tests. Buggy test cases can silently miss bugs in the production code or loudly ring false alarms when the production code is correct. We present the first empirical study of bugs in test code to characterize their prevalence and root cause categories. We mine the bug repositories and version control systems of 211 Apache Software Foundation (ASF) projects and find 5,556 test-related bug reports. We (1) compare properties of test bugs with production bugs, such as active time and fixing effort needed, and (2) qualitatively study 443 randomly sampled test bug reports in detail and categorize them based on their impact and root causes. Our results show that (1) around half of all the projects had bugs in their test code; (2) the majority of test bugs are false alarms, i.e., test fails while the production code is correct, while a minority of these bugs result in silent horrors, i.e., test passes while the production code is incorrect; (3) incorrect and missing assertions are the dominant root cause of silent horror bugs; (4) semantic (25%), flaky (21%), environment-related (18%) bugs are the dominant root cause categories of false alarms; (5) the majority of false alarm bugs happen in the exercise portion of the tests, and (6) developers contribute more actively to fixing test bugs and test bugs are fixed sooner compared to production bugs. In addition, we evaluate whether existing bug detection tools can detect bugs in test code.

Investigating Code Review Quality: Do People and Participation Matter?
Oleksii Kononenko, Olga Baysal, Latifa Guerrouj, Yaxin Cao, and Michael W. Godfrey
(University of Waterloo, Canada; Université de Montréal, Canada; École de Technologie Supérieure, Canada)
Code review is an essential element of any mature software development project; it aims at evaluating code contributions submitted by developers. In principle, code review should improve the quality of code changes (patches) before they are committed to the project's master repository. In practice, bugs are sometimes unwittingly introduced during this process.
In this paper, we report on an empirical study investigating code review quality for Mozilla, a large open-source project. We explore the relationships between the reviewers' code inspections and a set of factors, both personal and social in nature, that might affect the quality of such inspections. We applied the SZZ algorithm to detect bug-inducing changes that were then linked to the code review information extracted from the issue tracking system. We found that 54% of the reviewed changes introduced bugs in the code. Our findings also showed that both personal metrics, such as reviewer workload and experience, and participation metrics, such as the number of involved developers, are associated with the quality of the code review process.

Modularity
Wed, Sep 30, 08:30 - 10:10, GW2 B3009 (Chair: Giuseppe Scanniello)

Inter-smell Relations in Industrial and Open Source Systems: A Replication and Comparative Analysis
Aiko Yamashita, Marco Zanoni, Francesca Arcelli Fontana, and Bartosz Walter
(Oslo and Akershus University College of Applied Sciences, Norway; University of Milano-Bicocca, Italy; Poznan University of Technology, Poland)
The presence of anti-patterns and code smells can adversely affect software evolution and quality. Recent work has shown that code smells that appear together in the same file (i.e., collocated smells) can interact with each other, leading to various types of maintenance issues and/or to the intensification of negative effects. Moreover, it has been found that code smell interactions can occur across coupled files (i.e., coupled smells), with negative effects comparable to those of the interaction of same-file (collocated) smells. Different inter-smell relations have been described in previous work, yet only a few studies have evaluated them empirically. This study attempts to replicate the findings from previous work on inter-smell relations by analyzing larger systems, and by including both industrial and open source ones. We also include the analysis of coupled smells in addition to collocated smells, to achieve a more complete picture of the landscape of inter-smell relations. Our observations suggest that if coupled smells are not considered, one may risk increasing the number of false negatives when analysing inter-smell relations. A major finding is that patterns of inter-smell relations vary between open source and industrial systems. This suggests that contextual variables (e.g., domain, development mode, environment) should be considered in further studies on code smells.

Evaluating Clone Detection Tools with BigCloneBench
Jeffrey Svajlenko and Chanchal K. Roy
(University of Saskatchewan, Canada)
Many clone detection tools have been proposed in the literature. However, our knowledge of their performance in real software systems is limited, particularly their recall. In this paper, we use our big data clone benchmark, BigCloneBench, to evaluate the recall of ten clone detection tools. BigCloneBench is a collection of eight million validated clones within IJaDataset-2.0, a big data software repository containing 25,000 open-source Java systems. BigCloneBench contains both intra-project and inter-project clones of the four primary clone types. We use this benchmark to evaluate the recall of the tools per clone type and across the entire range of clone syntactical similarity. We evaluate the tools for both single-system and cross-project detection scenarios. Using multiple clone-matching metrics, we evaluate the quality of the tools' reporting of the benchmark clones with respect to refactoring and automatic clone analysis use-cases. We compare these real-world results against our Mutation and Injection Framework, a synthetic benchmark, to reveal deeper understanding of the tools. We found that the tools have strong recall for Type-1 and Type-2 clones, as well as Type-3 clones with high syntactical similarity. The tools have weaker detection of clones with lower syntactical similarity.

Uncovering Dependence Clusters and Linchpin Functions
David Binkley, Árpád Beszédes, Syed Islam, Judit Jász, and Béla Vancsics
(Loyola University Maryland, USA; University of Szeged, Hungary; University of East London, UK)
Dependence clusters are (maximal) collections of mutually dependent source code entities according to some dependence relation. Their presence in software complicates many maintenance activities including testing, refactoring, and feature extraction. Despite several studies finding them common in production code, their formation, identification, and overall structure are not well understood, partly because of challenges in approximating true dependences between program entities. Previous research has considered two approximate dependence relations: a fine-grained statement-level relation using control and data dependences from a program's System Dependence Graph and a coarser relation based on function-level control-flow reachability. In principle, the first is more expensive and more precise than the second. Using a collection of twenty programs, we present an empirical investigation of the clusters identified by these two approaches. In support of the analysis, we consider a hybrid cluster type that works at the coarser function level but is based on the higher-precision statement-level dependences. The three types of clusters are compared based on their slice sets using two clustering metrics. We also perform extensive analysis of the programs to identify linchpin functions – functions primarily responsible for holding a cluster together. Results include evidence that the less expensive, coarser approaches can often be used as effective proxies for the more expensive, finer-grained approaches. Finally, the linchpin analysis shows that linchpin functions can be effectively and automatically identified.
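To make the coarser, function-level notion concrete, the Python sketch below treats two functions as mutually dependent when each can reach the other in a call graph, and scores each function by how much the largest cluster shrinks when that function is removed. This is only an illustration under stated assumptions: the call graph, the function names, and the scoring rule in `linchpin_scores` are hypothetical and are not taken from the paper, whose analysis is based on slice sets.

```python
def reachable(graph, start):
    """Return every node reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def all_nodes(graph):
    return set(graph) | {t for targets in graph.values() for t in targets}


def dependence_clusters(graph):
    """Group functions that can all reach each other (size > 1 only)."""
    nodes = all_nodes(graph)
    reach = {n: reachable(graph, n) for n in nodes}
    clusters, assigned = [], set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        cluster = {n} | {m for m in reach[n] if n in reach[m]}
        assigned |= cluster
        if len(cluster) > 1:
            clusters.append(cluster)
    return clusters


def largest_cluster_size(graph):
    return max((len(c) for c in dependence_clusters(graph)), default=0)


def linchpin_scores(graph):
    """Assumed scoring rule: drop in largest-cluster size when f is removed."""
    base = largest_cluster_size(graph)
    scores = {}
    for f in sorted(all_nodes(graph)):
        pruned = {k: [t for t in v if t != f]
                  for k, v in graph.items() if k != f}
        scores[f] = base - largest_cluster_size(pruned)
    return scores


# Hypothetical call graph: every function calls, and is called by, dispatch.
calls = {
    "dispatch": ["read", "eval", "write"],
    "read": ["dispatch"],
    "eval": ["dispatch"],
    "write": ["dispatch"],
}
scores = linchpin_scores(calls)
print(dependence_clusters(calls), scores)
```

In this toy graph all four functions form one cluster, and removing dispatch dissolves it entirely, while removing any other function only shrinks it by one; that asymmetry is the intuition behind calling a function a linchpin.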

Forked and Integrated Variants in an Open-Source Firmware Project
Ștefan Stănciulescu, Sandro Schulze, and Andrzej Wąsowski
(IT University of Copenhagen, Denmark; TU Braunschweig, Germany)
Code cloning has been reported both on small (code fragments) and large (entire projects) scale. Cloning-in-the-large, or forking, is gaining ground as a reuse mechanism thanks to the availability of better tools for maintaining forked project variants, including distributed version control systems and interactive source management platforms such as GitHub. We study advantages and disadvantages of forking using the case of Marlin, an open source firmware for 3D printers. We find that many problems and advantages of cloning do translate to forking. Interestingly, the Marlin community uses both forking and integrated variability management (conditional compilation) to create variants and features. Thus, studying it increases our understanding of the choice between integrated and clone-based variant management. It also allows us to observe mechanisms governing source code maturation, in particular when, why and how feature implementations are migrated from forks to the main integrated platform. We believe that this understanding will ultimately help the development of tools mixing clone-based and integrated variant management, combining the advantages of both.

Program Analysis
Wed, Sep 30, 10:40 - 12:20, GW2 B3009 (Chair: Arpad Beszedes)

Towards Automating Dynamic Analysis for Behavioral Design Pattern Detection
Andrea De Lucia, Vincenzo Deufemia, Carmine Gravino, and Michele Risi
(University of Salerno, Italy)
The detection of behavioral design patterns is more accurate when a dynamic analysis is performed on the candidate instances identified statically. Such a dynamic analysis requires the monitoring of the candidate instances at run-time through the execution of a set of test cases. However, the definition of such test cases is a time-consuming task if performed manually, even more so when the number of candidate instances is high and they include many false positives. In this paper we present the results of an empirical study aiming at assessing the effectiveness of dynamic analysis based on automatically generated test cases in behavioral design pattern detection. The study considered three behavioral design patterns, namely State, Strategy, and Observer, and three publicly available software systems, namely JHotDraw 5.1, QuickUML 2001, and MapperXML 1.9.7. The results show that dynamic analysis based on automatically generated test cases improves the precision of design pattern detection tools based on static analysis only. As expected, this improvement in precision is achieved at the expense of recall, so we also compared the results achieved with automatically generated test cases with the more expensive but also more accurate results achieved with manually built test cases. The results of this analysis allowed us to highlight costs and benefits of automating dynamic analysis for design pattern detection.

Practical and Accurate Pinpointing of Configuration Errors using Static Analysis
Zhen Dong, Artur Andrzejak, and Kun Shao
(University of Heidelberg, Germany; Hefei University of Technology, China)
Software misconfigurations are responsible for a substantial part of today's system failures, causing about one-quarter of all customer-reported issues. Identifying their root causes can be costly in terms of time and human resources. We present an approach to automatically pinpoint such defects without error reproduction. It uses static analysis to infer the correlation degree between each configuration option and program sites affected by an exception. The only run-time information required by our approach is the stack trace of a failure. This is an essential advantage compared to existing approaches, which require reproducing errors or providing testing oracles. We evaluate our approach on 29 errors from 4 configurable software programs, namely JChord, Randoop, Hadoop, and Hbase. Our approach can successfully diagnose 27 out of 29 errors. For 20 errors, the failure-inducing configuration option is ranked first.

Deterministic Dynamic Race Detection Across Program Versions
Sri Varun Poluri and Murali Krishna Ramanathan
(Indian Institute of Science, India)
Dynamic race detectors operate by analyzing execution traces of programs to detect races in multithreaded programs. As the thread interleavings influence these traces, the sets of races detected across multiple runs of the detector can vary. This non-determinism without any change in program source and input can reduce programmer confidence in using the detector. From an organizational perspective, a defect needs to be reported consistently until it is fixed. Non-determinism complicates the workflow and the problem is further exacerbated with modifications to the program. In this paper, we propose a framework for deterministic dynamic race detection that ensures detection of races until they are fixed, even across program versions. The design attempts to preserve the racy behavior with changes to the program source that include addition (and deletion) of locks and shared memory accesses. We record, transform and replay the schedules across program versions intelligently to achieve this goal. We have implemented a framework, named STABLER, and evaluated our ideas by applying popular race detectors (DJIT+, FastTrack) on different versions of many open-source multithreaded Java programs. Our experimental results show that we are able to detect all the unfixed races consistently across major releases of the program. For both the detectors, the maximum incurred slowdown, with our framework, for record and replay is 1.2x and 2.29x respectively. We also perform user experiments where volunteers fixed a significant number of races. In spite of these changes, our framework is effective in its ability to detect all the unfixed races.

Program Specialization and Verification using File Format Specifications
Raveendra Kumar Medicherla, Raghavan Komondoor, and S. Narendran
(Tata Consultancy Services, India; Indian Institute of Science, India)
Programs that process data that reside in files are widely used in varied domains, such as banking, healthcare, and web-traffic analysis. Precise static analysis of these programs in the context of software transformation and verification tasks is a challenging problem. Our key insight is that static analysis of file-processing programs can be made more useful if knowledge of the input file formats of these programs is made available to the analysis. We instantiate this idea to solve two practical problems: specializing the code of a program to a given "restricted" input file format, and verifying if a program "conforms" to a given input file format. We then discuss an implementation of our approach, and also empirical results on a set of real and realistic programs. The results are very encouraging in terms of both scalability and precision of the approach.

Refactoring
Wed, Sep 30, 13:50 - 15:30, GW2 B3009 (Chair: Romain Robbes)

An Empirical Evaluation of Ant Build Maintenance using Formiga
Ryan Hardt and Ethan V. Munson
(University of Wisconsin-Eau Claire, USA; University of Wisconsin-Milwaukee, USA)
As a software project evolves, so does its build system. Significant effort is necessary to maintain the build system to cope with this evolution, in part because changes to source code often require parallel changes in the build system. Our tool, Formiga, is a build maintenance and dependency discovery tool for the Ant build system. Formiga's primary uses are to automate build changes as the source code is updated, to identify the build dependencies within a software project, and to assist with build refactoring. Formiga is implemented as an IDE plugin, which allows it to recognize when project resources are updated and automatically update the build system accordingly. A controlled experiment was conducted to assess Formiga's ability to assist developers with build maintenance. Subjects responded to scenarios in which various forms of build maintenance and/or knowledge about deliverables and their contents were requested. Subjects completed eleven different build maintenance tasks in both an experimental condition using Formiga and a control condition using only conventional IDE services. The study used a balanced design relative to both task order and use of Formiga. This design also ensured that order balancing was not confounded with the subjects' level of Ant expertise. Formiga was shown to significantly reduce the time required to perform build maintenance while increasing the correctness with which it can be performed for both novice and experienced developers.

Scripting Parametric Refactorings in Java to Retrofit Design Patterns
Jongwook Kim, Don Batory, and Danny Dig
(University of Texas at Austin, USA; Oregon State University, USA)
Retrofitting design patterns into a program by hand is tedious and error-prone. A programmer must distinguish refactorings that are provided by an Integrated Development Environment (IDE) from those that must be realized manually, determine a precise sequence of refactorings to apply, and perform this sequence repetitively to a laborious degree. We designed, implemented, and evaluated Reflective Refactoring (R2), a Java package to automate the creation of classical design patterns (Visitor, Abstract Factory, etc.), their inverses, and variants. We encoded 18 out of 23 Gang-of-Four design patterns as R2 scripts and explain why the remaining are inappropriate for refactoring engines. We evaluate the productivity and scalability of R2 with a case study of 6 real-world applications. In one case, R2 automatically created a Visitor with 276 visit methods by invoking 554 Eclipse refactorings in 10 minutes - an achievement that could not be done manually. R2 also sheds light on why refactoring correctness, expressiveness, and speed are critical issues for scripting in next-generation refactoring engines.

System Specific, Source Code Transformations
Gustavo Santos, Nicolas Anquetil, Anne Etien, Stéphane Ducasse, and Marco Tulio Valente
(INRIA, France; Federal University of Minas Gerais, Brazil)
During its lifetime, a software system might undergo a major transformation effort in its structure, for example to migrate to a new architecture or bring some drastic improvements to the system. Particularly in this context, we found evidence that some sequences of code changes are made in a systematic way. These sequences are composed of small code transformations (e.g., create a class, move a method) which are repeatedly applied to groups of related entities (e.g., a class and some of its methods). A typical example is the systematic introduction of a Factory design pattern on the classes of a package. We define these sequences as transformation patterns. In this paper, we identify examples of transformation patterns in real-world software systems and study their properties: (i) they are specific to a system; (ii) they were applied manually; (iii) they were not always applied to all the software entities which could have been transformed; (iv) they were sometimes complex; and (v) they were not always applied in one shot but over several releases. These results suggest that transformation patterns could benefit from automated support in their application. From this study, we propose as future work to develop a macro recorder, a tool with which a developer records a sequence of code transformations and then automatically applies them in other parts of the system as a customizable, large-scale transformation operator.

A Decision Support System to Refactor Class Cycles
Tosin Daniel Oyetoyan, Daniela Soares Cruzes, and Christian Thurmann-Nielsen
(NTNU, Norway; SINTEF, Norway; EVRY, Norway)
Many studies show that real-world systems are riddled with large dependency cycles among software classes. Dependency cycles are claimed to affect quality factors such as testability, extensibility, modifiability, and reusability. Recent studies reveal that most defects are concentrated in classes that are in and near cycles. In this paper, we (1) propose a new metric, IRCRSS, based on the Class Reachability Set Size (CRSS), to identify the reduction ratio between the CRSS of a class and its interfaces, and (2) present a cycle-breaking decision support system (CB-DSS) that implements existing design approaches in combination with class edge contextual data. Evaluations of multiple systems show that (1) the IRCRSS metric can be used to identify fewer classes as candidates for breaking large cycles, thus reducing refactoring effort, and (2) the CB-DSS can assist software engineers in planning the restructuring of classes involved in complex dependency cycles.

Code Mining and Recommendation
Thu, Oct 1, 10:40 - 12:20, GW2 B3009 (Chair: David Shepherd)

On the Role of Developer's Scattered Changes in Bug Prediction
Dario Di Nucci, Fabio Palomba, Sandro Siravo, Gabriele Bavota, Rocco Oliveto, and Andrea De Lucia
(University of Salerno, Italy; University of Molise, Italy; Free University of Bolzano, Italy)
The importance of human-related factors in the introduction of bugs has recently been the subject of a number of empirical studies. However, such factors have not yet been captured in bug prediction models, which simply exploit product metrics or process metrics based on the number and type of changes or on the number of developers working on a software component. Previous studies have demonstrated that focused developers are less prone to introduce defects than non-focused developers. According to this observation, software components changed by focused developers should also be less error-prone than software components changed by less focused developers. In this paper we capture this observation by measuring the structural and semantic scattering of changes performed by the developers working on a software component and use these two measures to build a bug prediction model. Such a model has been evaluated on five open source systems and compared with two competitive prediction models: the first exploits the number of developers working on a code component in a given time period as predictor, while the second is based on the concept of code change entropy. The achieved results show the superiority of our model with respect to the two competitive approaches, and the complementarity of the defined scattering measures with respect to standard predictors commonly used in the literature.

How Do Developers React to API Evolution? The Pharo Ecosystem Case
André Hora, Romain Robbes, Nicolas Anquetil, Anne Etien, Stéphane Ducasse, and Marco Tulio Valente
(Federal University of Minas Gerais, Brazil; University of Chile, Chile; INRIA, France)
Software engineering research now considers that no system is an island, but that each is part of an ecosystem involving other systems, developers, users, hardware, etc. When one system (e.g., a framework) evolves, its clients often need to adapt. Client developers might need to adapt to new functionalities, client systems might need to be adapted to a new API, and client users might need to adapt to a new User Interface. The consequences of such changes are as yet unclear: what proportion of the ecosystem might be expected to react? How long might it take for a change to diffuse in the ecosystem? Do all clients react in the same way? This paper reports on an exploratory study aimed at observing API evolution and its impact on a large-scale software ecosystem, Pharo, which has about 3,600 distinct systems, more than 2,800 contributors, and six years of evolution. We analyze 118 API changes and answer research questions regarding the magnitude, duration, extension, and consistency of such changes in the ecosystem. The results of this study help to characterize the impact of API evolution in large software ecosystems, and provide the basis to better understand how such impact can be alleviated.

Who Should Review This Change?: Putting Text and File Location Analyses Together for More Accurate Recommendations
Xin Xia, David Lo, Xinyu Wang, and Xiaohu Yang
(Zhejiang University, China; Singapore Management University, Singapore)
Software code review is a process in which developers inspect new code changes made by others, to evaluate their quality and identify and fix defects, before integrating them into the main branch of a version control system. Modern Code Review (MCR), a lightweight and tool-based variant of conventional code review, is widely adopted in both open source and proprietary software projects. One challenge that impacts MCR is the assignment of appropriate developers to review a code change. Considering that there could be hundreds of potential code reviewers in a software project, picking suitable reviewers is not a straightforward task. A prior study by Thongtanunam et al. showed that the difficulty in selecting suitable reviewers may delay the review process by an average of 12 days. In this paper, to address the challenge of assigning suitable reviewers to changes, we propose a hybrid and incremental approach, TIE, which utilizes the advantages of both text mining and a file location-based approach. To do this, TIE integrates an incremental text mining model, which analyzes the textual contents in a review request, and a similarity model, which measures the similarity of changed file paths and reviewed file paths. We perform a large-scale experiment on four open source projects, namely Android, OpenStack, QT, and LibreOffice, containing a total of 42,045 reviews. The experimental results show that on average TIE can achieve top-1, top-5, and top-10 accuracies, and Mean Reciprocal Rank (MRR) of 0.52, 0.79, 0.85, and 0.64 for the four projects, which improves upon the state-of-the-art approach RevFinder, proposed by Thongtanunam et al., by 61%, 23%, 8%, and 37%, respectively.
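
For readers unfamiliar with the evaluation measures named in this abstract, the following is a minimal sketch of how top-k accuracy and Mean Reciprocal Rank are commonly computed for ranked reviewer recommendations. It assumes the usual textbook definitions, which may differ in detail from those used in the paper; the function names and the sample data are hypothetical and do not come from TIE or RevFinder.

```python
from typing import List, Sequence, Set


def top_k_accuracy(recommendations: Sequence[List[str]],
                   actual: Sequence[Set[str]], k: int) -> float:
    """Fraction of reviews whose top-k recommended list contains at
    least one developer who actually reviewed the change."""
    hits = sum(1 for recs, truth in zip(recommendations, actual)
               if set(recs[:k]) & truth)
    return hits / len(recommendations)


def mean_reciprocal_rank(recommendations: Sequence[List[str]],
                         actual: Sequence[Set[str]]) -> float:
    """Average of 1/rank of the first correct reviewer in each ranked
    list; a review with no correct reviewer contributes 0."""
    total = 0.0
    for recs, truth in zip(recommendations, actual):
        for rank, reviewer in enumerate(recs, start=1):
            if reviewer in truth:
                total += 1.0 / rank
                break
    return total / len(recommendations)


# Hypothetical data: two review requests with ranked recommendations.
recommended = [["alice", "bob", "carol"], ["dave", "erin", "frank"]]
reviewed_by = [{"bob"}, {"grace"}]

print(top_k_accuracy(recommended, reviewed_by, k=1))
print(top_k_accuracy(recommended, reviewed_by, k=2))
print(mean_reciprocal_rank(recommended, reviewed_by))
```

In this toy data the first request is matched at rank 2 and the second is never matched, so the reciprocal ranks are 1/2 and 0.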

Exploring API Method Parameter Recommendations
Muhammad Asaduzzaman, Chanchal K. Roy, Samiul Monir, and Kevin A. Schneider
(University of Saskatchewan, Canada)
A number of techniques have been developed that support method call completion. However, there has been little research on the problem of method parameter completion. In this paper, we first present a study that helps us to understand how developers complete method parameters. Based on our observations, we developed a recommendation technique, called Parc, that collects parameter usage context using a source code localness property that suggests that developers tend to collocate related code fragments. Parc uses previous code examples together with contextual and static type analysis to recommend method parameters. Evaluating our technique against the only available state-of-the-art tool using a number of subject systems and different Java libraries shows that our approach has potential. We also explore the parameter recommendation support provided by the Eclipse Java Development Tools (JDT). Finally, we discuss limitations of our proposed technique and outline future research directions.

Mobile Applications
Thu, Oct 1, 13:50 - 15:30, GW2 B3009 (Chair: Peng Xin)

How Can I Improve My App? Classifying User Reviews for Software Maintenance and Evolution
Sebastiano Panichella, Andrea Di Sorbo, Emitza Guzman, Corrado A. Visaggio, Gerardo Canfora, and Harald C. Gall
(University of Zurich, Switzerland; University of Sannio, Italy; TU München, Germany)
App Stores, such as Google Play or the Apple Store, allow users to provide feedback on apps by posting review comments and giving star ratings. These platforms constitute a useful electronic means through which application developers and users can productively exchange information about apps. Previous research showed that user feedback contains usage scenarios, bug reports, and feature requests that can help app developers to accomplish software maintenance and evolution tasks. However, in the case of the most popular apps, the large amount of received feedback, its unstructured nature, and its varying quality can make the identification of useful user feedback a very challenging task. In this paper we present a taxonomy to classify app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques, (1) Natural Language Processing, (2) Text Analysis, and (3) Sentiment Analysis, to automatically classify app reviews into the proposed categories. We show that the combined use of these techniques achieves better results (a precision of 75% and a recall of 74%) than those obtained using each technique individually (a precision of 70% and a recall of 67%).

User Reviews Matter! Tracking Crowdsourced Reviews to Support Evolution of Successful Apps
Fabio Palomba, Mario Linares-Vásquez, Gabriele Bavota, Rocco Oliveto, Massimiliano Di Penta, Denys Poshyvanyk, and Andrea De Lucia
(University of Salerno, Italy; College of William and Mary, USA; Free University of Bolzano, Italy; University of Molise, Italy; University of Sannio, Italy)
Nowadays software applications, and especially mobile apps, undergo frequent release updates through app stores. After installing or updating apps, users can post reviews and provide ratings, expressing their level of satisfaction with apps, and possibly pointing out bugs or desired features. In this paper we show, by performing a study on 100 Android apps, how developers who address user reviews increase their app's success in terms of ratings. Specifically, we devise an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings. The results indicate that developers implementing user reviews are rewarded in terms of ratings. This points to the need for specialized recommendation systems aimed at analyzing informative crowd reviews and prioritizing feedback to be satisfied in order to increase the apps' success.

What Are the Characteristics of High-Rated Apps? A Case Study on Free Android Applications
Yuan Tian, Meiyappan Nagappan, David Lo, and Ahmed E. Hassan
(Singapore Management University, Singapore; Rochester Institute of Technology, USA; Queen's University, Canada)
The tremendous rate of growth in the mobile app market over the past few years has attracted many developers to build mobile apps. However, while there is no shortage of stories of how lone developers have made great fortunes from their apps, the majority of developers are struggling to break even. For those struggling developers, knowing the "DNA" (i.e., characteristics) of high-rated apps is the first step towards successful development and evolution of their apps. In this paper, we investigate 28 factors along eight dimensions to understand how high-rated apps differ from low-rated apps. We also investigate which factors are most influential by applying a random-forest classifier to identify high-rated apps. Through a case study on 1,492 high-rated and low-rated free apps mined from the Google Play store, we find that high-rated apps are statistically significantly different in 17 out of the 28 factors that we considered. Our experiment also shows that the size of an app, the number of promotional images that the app displays on its web store page, and the target SDK version of an app are the most influential factors.

GreenAdvisor: A Tool for Analyzing the Impact of Software Evolution on Energy Consumption
Karan Aggarwal, Abram Hindle, and Eleni Stroulia
(University of Alberta, Canada)
Change-impact analysis, namely "identifying the potential consequences of a change", is an important and well-studied problem in software evolution. Any change may potentially affect an application's behaviour, performance, and energy consumption profile. Our previous work demonstrated that changes to the system-call profile of an application correlated with changes to the application's energy-consumption profile. This paper describes and evaluates GreenAdvisor, a first-of-its-kind tool that systematically records and analyzes an application's system calls to predict whether the energy-consumption profile of an application has changed. The GreenAdvisor tool was distributed to numerous software teams, whose members were surveyed about their experience using GreenAdvisor while developing Android applications to examine the energy-consumption impact of selected commits from the teams' projects. GreenAdvisor was also evaluated against commits of these teams' projects. The two studies confirm the usefulness of our tool in helping developers analyze and understand the energy-consumption profile changes of a new version. Based on our study findings, we constructed an improved prediction model to forecast the direction of the change when a change in the energy-consumption profile is anticipated. This work can potentially be extremely useful to developers who currently have no similar tools.

Tool Demo Track
Wed, Sep 30, 13:50 - 15:30, GW2 B2890 (Chair: Collin McMillan; Nicholas A. Kraft)

apiwave: Keeping Track of API Popularity and Migration
André Hora and Marco Tulio Valente
(Federal University of Minas Gerais, Brazil)
Every day new frameworks and libraries are created and existing ones evolve. To benefit from such newer or improved APIs, client developers should update their applications. In practice, this process presents some challenges: APIs are commonly backward-incompatible (causing client applications to fail when updating) and multiple APIs are available (making it difficult to decide which one to use). To address these challenges, we propose apiwave, a tool that keeps track of API popularity and migration of major frameworks/libraries. The current version includes data about the evolution of the top 650 GitHub Java projects, from which 320K APIs were extracted. We also report on our experience using apiwave in real-world scenarios.

UrbanIt: Visualizing Repositories Everywhere
Andrea Ciani, Roberto Minelli, Andrea Mocci, and Michele Lanza
(University of Lugano, Switzerland)
Software evolution is supported by a variety of tools that help developers understand the structure of a software system, analyze its history and support specific classes of analyses. However, the increasingly distributed nature of software development requires basic repository analyses to be always available to developers, even when they cannot access their workstation with full-fledged applications and command-line tools.
We present UrbanIt, a gesture-based tablet application for the iPad that supports the visualization of software repositories together with useful evolutionary analyses (e.g. version diff) and basic sharing features in a portable and mobile setting. UrbanIt is paired with a web application that manages synchronization of multiple repositories.

ePadEvo: A Tool for the Detection of Behavioral Design Patterns
Andrea De Lucia, Vincenzo Deufemia, Carmine Gravino, Michele Risi, and Ciro Pirolli
(University of Salerno, Italy)
In this demonstration we present ePadEvo, an Eclipse plug-in for recovering design pattern instances from object-oriented source code. The tool is able to recover design pattern instances through a static analysis performed on a data model extracted from source code, and a dynamic analysis performed through the instrumentation and the monitoring of the software system. Dynamic analysis is performed with automatically generated test cases exploiting the EvoSuite tool.

PARC: Recommending API Methods Parameters
Muhammad Asaduzzaman, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
APIs have grown considerably in size. To free developers from remembering every detail of an API, code completion has become an integral part of modern IDEs. Most work on code completion targets completing API method calls and leaves the task of completing method parameters to the developers. However, parameter completion is also a non-trivial task. We present an Eclipse plugin, called PARC, that supports automatic completion of API method parameters. The tool is based on the localness property of source code, which states that developers tend to put related code fragments close together. PARC combines contextual and static type analysis to support a wide range of parameter expression types.

ArchFLoc: Locating and Explaining Architectural Features in Running Web Applications
Yan Gao and Daqing Hou
(Clarkson University, USA)
Feature location is a critical step in the software maintenance process where a developer identifies the software artifacts that need to be changed in order to fulfill a new feature request. Much progress has been made in understanding the feature location process and in creating new tools to help a developer in performing this task. However, there is still a lack of support for locating architectural features, ones that require a developer to touch on more than one architectural component. We demonstrate a tool called ArchFLoc that can be used to discover and highlight architectural-level features that are otherwise hidden in a software system. ArchFLoc is integrated into user interfaces, so the developer can express a feature query by directly interacting with user interface elements at runtime. Based on the user query, ArchFLoc discovers relevant code artifacts and dependencies, and assembles documentation to explain their roles in the overall architectural design.

WSDarwin: A Web Application for the Support of REST Service Evolution
Marios Fokaefs, Mihai Oprescu, and Eleni Stroulia
(University of Alberta, Canada)
REST has become a very popular architectural style for service-oriented systems, primarily due to its ease of use and flexibility. However, the lightweight nature of its syntax does not necessitate the use of systematic methods and tools. In this work, we argue that such tools can greatly facilitate complex engineering tasks, including service discovery and evolution. We present the WSDarwin set of tools to generate WADL interfaces for REST services, to compare service interfaces to identify differences between versions, and to compare service offerings of different vendors to facilitate service discovery and interoperability. Video URL: http://youtu.be/52CclMbJt6M Web App URL: http://ssrg17.cs.ualberta.ca/wsdarwin/ Documentation URL: http://goo.gl/Qrzgaq

DUM-Tool
Simone Romano and Giuseppe Scanniello
(University of Basilicata, Italy)
With object-oriented programming languages (e.g., Java or C#), the identification of unreachable source code may be very complex, especially when working at method level. To deal with the detection of unreachable methods, we have defined an approach named DUM: Detecting Unreachable Methods. We implemented a prototype of a supporting tool, which we named DUM-Tool. It works on Java byte-code and detects unreachable methods by traversing a graph-based representation of the subject software.
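
The general idea of finding unreachable methods by graph traversal can be illustrated with a minimal sketch. This is not the DUM algorithm itself: it assumes a simple call graph given as a dictionary from each method to the methods it calls, and a known set of entry points, and it ignores complications such as dynamic dispatch and reflection. All names below are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def unreachable_methods(call_graph: Dict[str, List[str]],
                        entry_points: Iterable[str]) -> Set[str]:
    """Return the methods that a breadth-first traversal starting
    from the entry points never visits."""
    # Collect every method that appears as a caller or a callee.
    all_methods: Set[str] = set(call_graph)
    for callees in call_graph.values():
        all_methods.update(callees)

    queue = deque(e for e in entry_points if e in all_methods)
    visited: Set[str] = set(queue)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in visited:
                visited.add(callee)
                queue.append(callee)
    return all_methods - visited


# Hypothetical call graph: C.unused and D.dead are never reached
# from Main.main, even though C.unused calls a reachable method.
graph = {
    "Main.main": ["A.run"],
    "A.run": ["B.help"],
    "B.help": [],
    "C.unused": ["B.help"],
    "D.dead": ["C.unused"],
}
print(sorted(unreachable_methods(graph, ["Main.main"])))
```

Note that D.dead calls C.unused, so a check that only looks for methods without callers would miss C.unused; traversing from the entry points reports both.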

Industry Track

Industry Experience
Tue, Sep 29, 10:40 - 12:20, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

An Empirical Study on the Handling of Crash Reports in a Large Software Company: An Experience Report
Abdou Maiga, Abdelwahab Hamou-Lhadj, Mathieu Nayrolles, Korosh Koochekian-Sabor, and Alf Larsson
(Concordia University, Canada; Ericsson, Sweden)
In this paper, we report on an empirical study we have conducted at Ericsson to understand the handling of crash reports (CRs). The study was performed on a dataset of CRs spanning over two years of activities on one of Ericsson's largest systems (over 4 million LOC). CRs at Ericsson are divided into two types: internal and external. Internal CRs are reported within the organization after the integration and system testing phase. External CRs are submitted by customers and are caused mainly by field failures. We examine the proportion and severity of internal CRs and of external CRs. A large number of external (and severe) CRs could indicate flaws in the testing phase. Failing to react quickly to external CRs, on the other hand, may expose Ericsson to fines and penalties due to the Working Level Agreements (WLA) that Ericsson has with its customers. Moreover, we contrast the time it takes to handle each type of CR, with the dual aim of understanding the similarities and differences as well as the factors that impact the handling of each type. Our results show that (a) it takes more time to fix external CRs than internal CRs, (b) the severity attribute is used inconsistently across organizational units, (c) the assignment time of internal CRs is less than that of external CRs, and (d) more than 50% of CRs are not answered within the organization's fixing time requirements defined in the WLA.

How Developers Detect and Fix Performance Bottlenecks in Android Apps
Mario Linares-Vásquez, Christopher Vendome, Qi Luo, and Denys Poshyvanyk
(College of William and Mary, USA)
Performance of rapidly evolving mobile apps is one of the top concerns for users and developers nowadays. Despite the efforts of researchers and mobile API designers to provide developers with guidelines and best practices for improving the performance of mobile apps, performance bottlenecks are still a significant and frequent complaint that impacts the ratings and the apps' chances for success. However, little research has been done into understanding developers' actual practices for detecting and fixing performance bottlenecks in mobile apps. In this paper, we present the results of an empirical study aimed at studying and understanding these practices by surveying 485 open source Android app and library developers, and manually analyzing performance bugs and fixes in their app repositories hosted on GitHub. The paper categorizes actual practices and tools used by real developers while dealing with performance issues. In general, our findings indicate that developers heavily rely on user reviews and manual execution of the apps for detecting performance bugs. While developers also use available tools to detect performance bottlenecks, these tools are mostly for profiling and do not help in detecting and fixing performance issues automatically.

Challenges for Maintenance of PLC-Software and Its Related Hardware for Automated Production Systems: Selected Industrial Case Studies
Birgit Vogel-Heuser, Juliane Fischer, Susanne Rösch, Stefan Feldmann, and Sebastian Ulewicz
(TU München, Germany)
The specific challenges for maintenance of software and its related hardware for the domain of automated Production Systems are discussed. Presenting four industrial case studies from renowned and world market leading German machine and plant manufacturing companies, these challenges and different solution approaches are introduced with a focus on software architectures to support modularity as a basis for maintaining long-living automated Production Systems. Additionally, the most critical aspects hindering classical approaches from software engineering from being successful, e.g., modes of operation and fault handling, are discussed. In the last decades, research in the field of software engineering for automated Production Systems (aPS) has been focusing on developing domain-specific model-driven engineering approaches supporting the development process, but mostly neglecting the operation, maintenance, and re-engineering aspects. However, the success of model-driven engineering in the aPS industry has been limited because the effort to introduce model-driven engineering and to change the entire existing legacy software is estimated as too high and the benefit as too low against the background of customer-specific solutions expecting a low degree of reuse.

Code Smells in Spreadsheet Formulas Revisited on an Industrial Dataset
Bas Jansen and Felienne Hermans
(Delft University of Technology, Netherlands)
In previous work, code smells have been adapted to be applicable to spreadsheet formulas. The smell detection algorithm used in this earlier study was validated on a small dataset of industrial spreadsheets by interviewing the users of these spreadsheets and asking them for their opinion of the smells found. In this paper a more in-depth validation of the algorithm is done by analyzing a set of spreadsheets of which users indicated whether or not they are smelly. This new dataset gives us the unique possibility to get more insight into how we can distinguish 'bad' spreadsheets from 'good' spreadsheets. We do that in two ways: for both the smelly and non-smelly spreadsheets we 1) have calculated the metrics that detect the smells and 2) have calculated metrics with respect to size, level of coupling, and the use of functions. The results show that indeed the metrics for the smells decrease in spreadsheets that are not smelly. With respect to size we found to our surprise that the improved spreadsheets were not smaller, but bigger. With regard to coupling and the use of functions both datasets are similar. This indicates that it is difficult to use metrics with respect to size, degree of coupling, or use of functions to draw conclusions on the complexity of a spreadsheet.

Developer Studies
Tue, Sep 29, 13:50 - 15:30, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

Web Usage Patterns of Developers
Christopher S. Corley, Federico Lois, and Sebastián Quezada
(ABB Corporate Research, USA; Corvalius, Argentina)
Developers often rely on web-based tools for troubleshooting, collaboration, issue tracking, code reviewing, documentation viewing, and a myriad of other uses. Developers also use the web for non-development purposes, such as reading news or social media. In this paper we explore whether web usage is detrimental to a developer's focus on work, using a sample of over 150 developers. Additionally, we investigate whether highly-focused developers use the web differently than other developers. Our qualitative findings suggest highly-focused developers use the web differently, but we are unable to predict a developer's focus based on web usage alone. Further quantitative findings suggest that web usage does not have a negative impact on a developer's focus.

Identifying Wasted Effort in the Field via Developer Interaction Data
Gergő Balogh, Gábor Antal, Árpád Beszédes, László Vidács, Tibor Gyimóthy, and Ádám Zoltán Végh
(University of Szeged, Hungary; AENSys Informatics, Hungary)
During software projects, several parts of the source code are usually re-written due to imperfect solutions before the code is released. This wasted effort is of central interest to project management in assuring on-time delivery. Although the amount of thrown-away code can be measured from version control systems, stakeholders are more interested in productivity dynamics that reflect the constant change in a software project. In this paper we present a field study of measuring the productivity of a medium-sized J2EE project. We propose a productivity analysis method where productivity is expressed through dynamic profiles -- the so-called Micro-Productivity Profiles (MPPs). They can be used to characterize various constituents of software projects such as components, phases, and teams. We collected detailed traces of developers' actions using an Eclipse IDE plug-in for seven months of software development throughout two milestones. We present and evaluate profiles along two important axes of the development process: by milestone and by application layer. MPPs can be an aid in taking project control actions and can help in planning future projects. Based on the experiments, project stakeholders identified several points to improve the development process. It is also acknowledged that profiles show additional information compared to a naive diff-based approach.

Is This Code Written in English? A Study of the Natural Language of Comments and Identifiers in Practice
Timo Pawelka and Elmar Juergens
(TU München, Germany; CQSE, Germany)
Comments and identifiers are the main source of documentation of source code and are therefore an integral part of the development and the maintenance of a program. As English is the world language, most comments and identifiers are written in English. However, if they are in any other language, a developer without knowledge of this language will perceive the code as almost undocumented or even obfuscated. In the absence of industrial data, academia is not aware of the extent of the problem of non-English comments and identifiers in practice. In this paper, we propose an approach for the language identification of source-code comments and identifiers. With the approach, a large-scale study has been conducted of the natural language of source-code comments and identifiers, analyzing multiple open-source and industry systems. The results show that a significant number of the industry projects contain comments and identifiers in more than one language, whereas none of the analyzed open-source systems has this problem.

Impact Assessment for Vulnerabilities in Open-Source Software Libraries
Henrik Plate, Serena Elisa Ponta, and Antonino Sabetta
(SAP Labs, France)
Software applications integrate more and more open-source software (OSS) to benefit from code reuse. As a drawback, each vulnerability discovered in bundled OSS may potentially affect the application that includes it. Upon the disclosure of every new vulnerability, the application vendor has to assess whether such vulnerability is exploitable in the particular usage context of the applications, and needs to determine whether customers require an urgent patch containing a non-vulnerable version of the OSS. Unfortunately, current decision making relies mostly on natural-language vulnerability descriptions and expert knowledge, and is therefore difficult, time-consuming, and error-prone. This paper proposes a novel approach to support the impact assessment based on the analysis of code changes introduced by security fixes. We describe our approach using an illustrative example and perform a comparison with both proprietary and open-source state-of-the-art solutions. Finally we report on our experience with a sample application and two industrial development projects.

Software Quality
Wed, Sep 30, 10:40 - 12:20, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

Experiences from Performing Software Quality Evaluations via Combining Benchmark-Based Metrics Analysis, Software Visualization, and Expert Assessment
Aiko Yamashita
(Oslo and Akershus University College of Applied Sciences, Norway)
Software quality assessments are critical in organizations where the software has been produced by external vendors, or when the development and maintenance of a software product has been outsourced to external parties. These assessments are typically challenging because it is not always possible to access the original developers (or sometimes it is not even allowed), and only in rare cases do suppliers keep an account of the costs associated with code changes or defect fixes. In those situations, one is left with the artifacts (e.g., database, source code, and documentation) as the only sources of evidence for performing such evaluations. A major challenge is also to provide fact-based conclusions for supporting decision-making, instead of subjective interpretations based on expert assessments (an approach still very predominant in mainstream industrial practice). This paper describes an instance of a software quality evaluation process performed for an international logistics company, which combined benchmark-based metrics threshold analysis, software visualization, and expert assessment. An interview was carried out afterwards with a member of the business division of the company, to assess the usefulness of the methodology and corresponding findings, and to explore avenues for future improvement.

Do Automatic Refactorings Improve Maintainability? An Industrial Case Study
Gábor Szőke, Csaba Nagy, Péter Hegedűs, Rudolf Ferenc, and Tibor Gyimóthy
(University of Szeged, Hungary)
Refactoring is often treated as the main remedy against the unavoidable code erosion happening during software evolution. Studies show that refactoring is indeed an elemental part of the developers' arsenal. However, empirical studies about the impact of refactorings on software maintainability have not yet reached a consensus. Moreover, most of these empirical investigations are carried out on open-source projects where distinguishing refactoring operations from other development activities is a challenge in itself.
We had a chance to work together with several software development companies in a project where they got extra budget to improve their source code by performing refactoring operations. Taking advantage of this controlled environment, we collected a large amount of data during a refactoring phase where the developers used a (semi)automatic refactoring tool. By measuring the maintainability of the involved subject systems before and after the refactorings, we got valuable insights into the effect of these refactorings on large-scale industrial projects. All but one company, who applied a special refactoring strategy, achieved a maintainability improvement at the end of the refactoring phase, but even that one company suffered from the negative impact of only one type of refactoring.

An Empirical Evaluation of the Effectiveness of Inspection Scenarios Developed from a Defect Repository
Kiyotaka Kasubuchi, Shuji Morisaki, Akiko Yoshida, and Chikako Ogawa
(SCREEN Holdings, Japan; Nagoya University, Japan; Shizuoka University, Japan)
Abstracting and summarizing high-severity defects detected during inspections of previous software versions could lead to effective inspection scenarios in a subsequent version in software maintenance and evolution. We conducted an empirical evaluation of 456 defects detected from the requirement specification inspections conducted during the development of industrial software. The defects were collected from an earlier version, which included 59 high-severity defects, and from a later version, which included 48 high-severity defects. The results of the evaluation showed that nine defect types and their corresponding inspection scenarios were obtained by abstracting and summarizing 45 defects in the earlier version. The results of the evaluation also showed that 46 of the high-severity defects in the later version could be potentially detected using the obtained inspection scenarios. The study also investigated which inspection scenarios can be obtained by the checklist proposed in the value-based review (VBR). It was difficult to obtain five of the inspection scenarios using the VBR checklist. Furthermore, to investigate the effectiveness of cluster analysis for inspection scenario development, the 59 high-severity defects in the earlier version were clustered into similar defect groups by a clustering algorithm. The results indicated that cluster analysis can be a guide for selecting similar defects and help in the tasks of abstracting and summarizing defects.

Efficient Regression Testing Based on Test History: An Industrial Evaluation
Edward Dunn Ekelund and Emelie Engström
(Axis Communications, Sweden; Lund University, Sweden)
Due to changes in the development practices at Axis Communications, towards continuous integration, faster regression testing feedback is needed. The current automated regression test suite takes approximately seven hours to run which prevents developers from integrating code changes several times a day as preferred. Therefore we want to implement a highly selective yet accurate regression testing strategy. Traditional code coverage based techniques are not applicable due to the size and complexity of the software under test. Instead we decided to select tests based on regression test history. We developed a tool, the Difference Engine, which parses and analyzes results from previous test runs and outputs regression test recommendations. The Difference Engine correlates code and test cases at package level and recommends test cases that are strongly correlated to recently changed packages. We evaluated the technique with respect to correctness, precision, recall and efficiency. Our results are promising. On average the tool manages to identify 80% of the relevant tests while recommending only 4% of the test cases in the full regression test suite.
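The history-based selection idea can be illustrated with a minimal sketch. It is not the Difference Engine itself: the abstract does not say how the correlation is computed, so the score below (the fraction of runs in which a package changed and a test failed) and all names such as `build_correlation`, `recommend`, and the sample `history` are assumptions made for illustration.

```python
from collections import defaultdict


def build_correlation(history):
    """Estimate package/test correlation from past runs.

    history: iterable of (changed_packages, failed_tests) pairs, one per run.
    Returns {(package, test): fraction of the package's changes where the test failed}.
    The scoring rule is an assumption, not taken from the paper.
    """
    changes = defaultdict(int)   # package -> number of runs in which it changed
    co_fail = defaultdict(int)   # (package, test) -> runs where it changed and the test failed
    for changed_packages, failed_tests in history:
        for pkg in set(changed_packages):
            changes[pkg] += 1
            for test in set(failed_tests):
                co_fail[(pkg, test)] += 1
    return {key: count / changes[key[0]] for key, count in co_fail.items()}


def recommend(correlation, changed_packages, threshold=0.5):
    """Recommend tests whose correlation with any changed package meets the threshold."""
    changed = set(changed_packages)
    return sorted({test for (pkg, test), score in correlation.items()
                   if pkg in changed and score >= threshold})


# Hypothetical test history: which packages changed and which tests failed in each run.
history = [
    (["net", "ui"], ["test_socket"]),
    (["net"], ["test_socket", "test_http"]),
    (["ui"], []),
    (["storage"], ["test_disk"]),
]
correlation = build_correlation(history)
recommended = recommend(correlation, ["net"])
```

With this sample history, a change to the hypothetical `net` package selects only the two tests that failed alongside earlier `net` changes, which is the kind of small, targeted subset the abstract describes.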

Software Reengineering
Tue, Sep 29, 16:00 - 17:40, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

Migrating Legacy Control Software to Multi-core Hardware
Michael Wahler, Raphael Eidenbenz, Carsten Franke, and Yvonne-Anne Pignolet
(ABB Corporate Research, Switzerland)
This paper reports on a case study on analyzing, structuring and re-using real-time control algorithms which represent a significant amount of intellectual property. As a starting point, legacy code written in ADA together with a Windows-based testing framework is available. The goal is to migrate the code onto a real-time multi-core platform taking advantage of technological progress.
We present a tool-supported three-step approach for such legacy control software: identifying and isolating the control algorithms, preparing these algorithms and their information exchange for execution within a modern execution framework for Linux written in C++, and validating the solution by a) performing regression testing to ensure partial correctness and b) validating its real-time properties.

Query by Example in Large-Scale Code Repositories
Vipin Balachandran
(VMware, India)
Searching code samples in a code repository is an important part of program comprehension. Most of the existing tools for code search support syntactic element search and regular expression pattern search. However, they are text-based and hence cannot handle queries which are syntactic patterns. The proposed solutions for querying syntactic patterns using specialized query languages present a steep learning curve for users. The querying would be more user-friendly if the syntactic pattern can be formulated in the underlying programming language (as a sample code snippet) instead of a specialized query language. In this paper, we propose a solution for the query by example problem using Abstract Syntax Tree (AST) structural similarity match. The query snippet is converted to an AST, then its subtrees are compared against AST subtrees of source files in the repository and the similarity values of matching subtrees are aggregated to arrive at a relevance score for each of the source files. To scale this approach to large code repositories, we use locality-sensitive hash functions and numerical vector approximation of trees. Our experimental evaluation involves running control queries against a real project. The results show that our algorithm can achieve high precision (0.73) and recall (0.81) and scale to large code repositories without compromising quality.
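A heavily simplified sketch of query by example through AST similarity follows. It uses Python's standard `ast` module rather than the authors' tooling, approximates each subtree by a vector of node-type counts, and scores a file by its best-matching statement subtree. The locality-sensitive hashing step is omitted, and taking the maximum instead of aggregating over matching subtrees is an assumption made to keep the example short. The names `node_type_vector`, `cosine`, and `relevance` are hypothetical.

```python
import ast
import math
from collections import Counter


def node_type_vector(tree):
    """Count AST node types in a (sub)tree: a crude numerical approximation of its shape."""
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance(query_src, file_src):
    """Score a file by the best similarity between the query snippet and any statement subtree."""
    query_vec = node_type_vector(ast.parse(query_src).body[0])
    candidates = [node for node in ast.walk(ast.parse(file_src)) if isinstance(node, ast.stmt)]
    return max((cosine(query_vec, node_type_vector(node)) for node in candidates), default=0.0)


# The query is an ordinary code snippet, not an expression in a special query language.
query = "for item in items:\n    total += item\n"
file_a = "def f(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
file_b = "def g(a, b):\n    return a * b\n"

score_a = relevance(query, file_a)
score_b = relevance(query, file_b)
```

Because identifier names are ignored, `file_a` matches the query despite using different variable names, while `file_b`, which contains no accumulating loop, scores lower.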

Does Software Modernization Deliver What It Aimed for? A Post Modernization Analysis of Five Software Modernization Case Studies
Ravi Khadka, Prajan Shrestha, Bart Klein, Amir Saeidi, Jurriaan Hage, Slinger Jansen, Edwin van Dis, and Magiel Bruntink
(Utrecht University, Netherlands; University of Amsterdam, Netherlands; CGI, Netherlands)
Software modernization has been extensively researched, primarily focusing on observing the associated phenomena, and providing technical solutions to facilitate the modernization process. Software modernization is claimed to be successful when the modernization is completed using those technical solutions. Very limited research, if any, is reported with an aim at documenting the post-modernization impacts, i.e., whether any of the pre-modernization business goals are in fact achieved after modernization. In this research, we attempt to address this relative absence of empirical study through five retrospective software modernization case studies. We use an explanatory case study approach to document the pre-modernization business goals, and to decide whether those goals have been achieved. The intended benefits for each of the five cases we considered were all (partially) met, and in most cases fully. Moreover, many cases exhibited a number of unintended benefits, and some reported detrimental effects of modernization.

Reverse Engineering a Visual Age Application
Harry M. Sneed and Chris Verhoef
(SoRing, Hungary; TU Dresden, Germany; VU University Amsterdam, Netherlands)
This paper is an industrial case study of how a VisualAge application system on an IBM mainframe was reverse engineered into a system reference repository. The starting point was the code fragments generated by the VisualAge interactive development tool. The results of the reverse engineering process were a use case documentation, a module documentation and a system reference repository. In these documents, the names of the data and functions were extended to be more understandable. The process was in the end fully automated and took three months to implement. The resulting documentation is now being used as a basis for re-implementing the system in Java.

Using Static Analysis for Knowledge Extraction from Industrial User Interfaces
Bernhard Dorninger, Josef Pichler, and Albin Kern
(Software Competence Center Hagenberg, Austria; Engel Austria, Austria)
Graphical User Interfaces (GUI) play an essential role in operating industrial facilities and machines. Depending on the range and variability of a manufacturer's product portfolio, a huge library of GUI software may exist. This poses quite a challenge when it comes to testing or re-engineering. Static analysis helps to unveil valuable, inherent knowledge and prepare it for further analysis and processing. In our case at ENGEL Austria GmbH, we extract the internal structure of the GUI screens, their variants and the control system context they are used in, i.e. which PLC variables they access. In another step, we analyze the usage pattern of method calls to certain UI widgets. In this paper we show our approach to gaining this information based on static analysis of existing GUI source code for injection molding machines.

Early Research Achievements Track

Defects and Refactoring
Wed, Sep 30, 08:30 - 10:10, GW2 B2890 (Chair: Coen De Roover; Foutse Khomh; Lin Tan; Serge Demeyer)

Constrained Feature Selection for Localizing Faults
Tien-Duy B. Le, David Lo, and Ming Li
(Singapore Management University, Singapore; Nanjing University, China)
Developers often spend much time and effort finding buggy program elements. To help developers debug, many past studies have proposed spectrum-based fault localization techniques. These techniques compare and contrast correct and faulty execution traces and highlight suspicious program elements. In this work, we propose constrained feature selection algorithms that we use to localize faults. Feature selection algorithms are commonly used to identify important features that are helpful for a classification task. By mapping an execution trace to a classification instance and a program element to a feature, we can transform fault localization into a feature selection problem. Unfortunately, existing feature selection algorithms do not perform well on this task, so we improve their performance by adding a constraint to the feature selection formulation based on a specific characteristic of the fault localization problem. We have performed experiments on a popular benchmark containing 154 faulty versions from 8 programs and demonstrate that several variants of our approach can outperform many fault localization techniques proposed in the literature. Using the Wilcoxon rank-sum test and Cliff's d effect size, we also show that the improvements are both statistically significant and substantial.

Crowdsourced Bug Triaging
Ali Sajedi Badashian, Abram Hindle, and Eleni Stroulia
(University of Alberta, Canada)
Bug triaging and assignment is a time-consuming task in big projects. Most research in this area examines the developers’ prior development and bug-fixing activities in order to recognize their areas of expertise and assign to them relevant bug fixes. We propose a novel method that exploits a new source of evidence for the developers’ expertise, namely their contributions to Q&A platforms such as Stack Overflow. We evaluated this method in the context of the 20 largest GitHub projects, considering 7144 bug reports. Our results demonstrate that our method exhibits superior accuracy to other state-of-the-art methods, and that future bug-assignment algorithms should consider exploring other sources of expertise, beyond the project’s version-control system and bug tracker.

Toward Improving Graftability on Automated Program Repair
Soichi Sumi, Yoshiki Higo, Keisuke Hotta, and Shinji Kusumoto
(Osaka University, Japan)
In software evolution, many bugs occur and developers spend a long time fixing them. Program debugging is a costly and difficult task. Automated program repair is a promising way to reduce the cost of program debugging dramatically. Several repair techniques reusing existing code lines have been proposed in the past. They reuse code lines already existing in the source code to generate variants of a given source code (if an inserted code line to fix a given bug is identical to any of the code lines in existing source code, we call the code line graftable). However, there are many bugs that such techniques cannot automatically repair. One of the reasons is that many bugs require code lines not existing in the source code of the software. In order to mitigate this issue, we are conducting our research with two ideas. The first idea is using a large dataset of source code to reuse code lines. The second idea is reusing only structures of code lines. Vocabularies are obtained from faulty code regions. In this paper, we report the feasibility of the two ideas. More concretely, we found that the first and second ideas improved graftability of code lines to 43--59% and 56--64% from 34--54%, respectively. When we combined both ideas, graftability improved to 64--69%. In cases where we used the second idea, 24--49% of the variables used in reused code lines could be retrieved from the surrounding code of given faulty code regions.

Mining Stack Overflow for Discovering Error Patterns in SQL Queries
Csaba Nagy and Anthony Cleve
(University of Namur, Belgium)
Constructing complex queries in SQL sometimes necessitates the use of language constructs and the invocation of internal functions which inexperienced developers find hard to comprehend or which are unknown to them. In the worst case, bad usage of these constructs might lead to errors, to ineffective queries, or hamper developers in their tasks.
This paper presents a mining technique for Stack Overflow to identify error-prone patterns in SQL queries. Identifying such patterns can help developers to avoid the use of error-prone constructs, or if they have to use such constructs, the Stack Overflow posts can help them to properly utilize the language. Hence, our purpose is to provide the initial steps towards a recommendation system that supports developers in constructing SQL queries.
Our current implementation supports the MySQL dialect, and Stack Overflow has over 300,000 questions tagged with the MySQL flag in its database. It provides a huge knowledge base where developers can ask questions about real problems. Our initial results indicate that our technique is indeed able to identify patterns among them.

Towards Purity-Guided Refactoring in Java
Jiachen Yang, Keisuke Hotta, Yoshiki Higo, and Shinji Kusumoto
(Osaka University, Japan)
Refactoring source code requires preserving a certain level of semantic behavior, which is difficult for IDEs to check. Therefore, IDEs generally check syntactic pre-conditions before applying a refactoring instead, and these are often more restrictive than checking semantic behavior. On the other hand, there are pure functions in the source code that do not have observable side-effects, whose semantic behavior is easier to check. In this research, we propose purity-guided refactoring, which applies high-level refactoring such as memoization on pure functions that can be detected statically. By combining our purity analyzing tool purano with refactoring, we can ensure the preservation of semantic behavior on these detected pure functions, which is impossible through previous refactoring operations provided by IDEs. As a case study of our approach, we applied memoization refactoring to several open-source software systems written in Java. We observed performance improvements and preservation of semantics by profiling their bundled test cases.
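Why purity makes memoization safe can be shown with a small sketch. It is written in Python rather than Java, it assumes the function has already been identified as pure (the static purity analysis itself is not reproduced), and `square_sum` is a hypothetical example function.

```python
from functools import lru_cache


def square_sum(n: int) -> int:
    """Pure function: the result depends only on n and no external state is modified."""
    return sum(i * i for i in range(n))


# Memoization refactoring: wrap the pure function in a cache keyed by its arguments.
# This preserves behavior only because square_sum has no observable side effects.
memoized_square_sum = lru_cache(maxsize=None)(square_sum)

first = memoized_square_sum(1000)    # computed and stored in the cache
second = memoized_square_sum(1000)   # served from the cache
info = memoized_square_sum.cache_info()
```

If the function wrote to a log or read mutable global state, the second call would silently skip that behavior, which is why the refactoring is restricted to functions known to be pure.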

Fitness Workout for Fat Interfaces: Be Slim, Clean, and Flexible
Spyros Kranas, Apostolos V. Zarras, and Panos Vassiliadis
(University of Ioannina, Greece)
A class that provides a fat interface violates the interface segregation principle, which states that the clients of the class should not be coupled with methods that they do not need. Coping with this problem involves extracting interfaces that satisfy the needs of the clients. In this paper, we envision an interface extraction method that serves a combination of four principles: (1) fitness, as the extracted interfaces have to fit the needs of the clients, (2) clarity, as the interfaces should not be cluttered with duplicated method declarations due to clients' similar needs, (3) flexibility, as it should be easy to maintain the extracted interfaces to cope with client changes, without affecting parts of the software that are not concerned by the changes, and (4) practicality, as the interface extraction should account for practical issues like the number of extracted interfaces, domain/developer specific constraints on what to include in the interfaces, etc. Our preliminary results show that it is feasible to extract interfaces while respecting the aforementioned principles. Moreover, our results reveal a number of open issues around the trade-offs between fitness, clarity, flexibility and practicality.

Social and Developers
Thu, Oct 1, 10:40 - 12:20, GW2 B2890 (Chair: Fabian Beck; Latifa Guerrouj)

Choosing Your Weapons: On Sentiment Analysis Tools for Software Engineering Research
Robbert Jongeling, Subhajit Datta, and Alexander Serebrenik
(Eindhoven University of Technology, Netherlands; Singapore University of Technology and Design, Singapore)
Recent years have seen an increasing attention to social aspects of software engineering, including studies of emotions and sentiments experienced and expressed by the software developers. Most of these studies reuse existing sentiment analysis tools such as SentiStrength and NLTK. However, these tools have been trained on product reviews and movie reviews and, therefore, their results might not be applicable in the software engineering domain. In this paper we study whether the sentiment analysis tools agree with the sentiment recognized by human evaluators (as reported in an earlier study) as well as with each other. Furthermore, we evaluate the impact of the choice of a sentiment analysis tool on software engineering studies by conducting a simple study of differences in issue resolution times for positive, negative and neutral texts. We repeat the study for seven datasets (issue trackers and Stack Overflow questions) and different sentiment analysis tools and observe that the disagreement between the tools can lead to contradictory conclusions.

Assessing Developer Contribution with Repository Mining-Based Metrics
Jalerson Lima, Christoph Treude, Fernando Figueira Filho, and Uirá Kulesza
(Federal University of Rio Grande do Norte, Brazil; IFRN, Brazil)
Productivity as a result of individual developers' contributions is an important aspect for software companies to maintain their competitiveness in the market. However, there is no consensus in the literature on how to measure productivity or developer contribution. While some repository mining-based metrics have been proposed, they lack validation in terms of their applicability and usefulness from the individuals who will use them to assess developer contribution: team and project leaders. In this paper, we propose the design of a suite of metrics for the assessment of developer contribution, based on empirical evidence obtained from project and team leaders. In a preliminary evaluation with four software development teams, we found that code contribution and code complexity metrics received the most positive feedback, while participants pointed out several threats of using bug-related metrics for contribution assessment. None of the metrics can be used in isolation, and project leaders and developers need to be aware of the benefits, limitations, and threats of each one. These findings present a first step towards the design of a larger suite of metrics as well as an investigation into the impact of using metrics to assess contribution.

What's Hot in Software Engineering Twitter Space?
Abhishek Sharma, Yuan Tian, and David Lo
(Singapore Management University, Singapore)
Twitter is a popular means to disseminate information and currently more than 300 million people are using it actively. Software engineers are no exception; Singer et al. have shown that many developers use Twitter to stay current with recent technological trends. At various time points, many users are posting microblogs (i.e., tweets) about the same topic in Twitter. We refer to this reasonably large set of topically-coherent microblogs in the Twitter space made at a particular point in time as an event. In this work, we perform an exploratory study on software engineering related events in Twitter. We collect a large set of Twitter messages over a period of 8 months that are made by 79,768 Twitter users and filter them by five programming language keywords. We then run a state-of-the-art Twitter event detection algorithm borrowed from the Natural Language Processing (NLP) domain. Next, using the open coding procedure, we manually analyze 1,000 events that are identified by the NLP tool, and create eleven categories of events (10 main categories + “others”). We find that external resource sharing, technical discussion, and software product updates are the “hottest” categories. These findings shed light on hot topics in Twitter that are interesting to many people and they provide guidance to future Twitter analytics studies that develop automated solutions to help users find fresh, relevant, and interesting pieces of information from Twitter stream to keep developers up-to-date with recent trends.

Validating Metric Thresholds with Developers: An Early Result
Paloma Oliveira, Marco Tulio Valente, Alexandre Bergel, and Alexander Serebrenik
(Federal University of Minas Gerais, Brazil; IFMG, Brazil; University of Chile, Chile; Eindhoven University of Technology, Netherlands)
Thresholds are essential for promoting source code metrics as an effective instrument to control the internal quality of software applications. However, little is known about the relation between software quality as identified by metric thresholds and as perceived by real developers. In this paper, we report the first results of a study designed to validate a technique that extracts relative metric thresholds from benchmark data. We use this technique to extract thresholds from a benchmark of 79 Pharo/Smalltalk applications, which are validated with five experts and 25 developers. Our preliminary results indicate that good quality applications—as cited by experts—respect metric thresholds. In contrast, we observed that noncompliant applications are not largely viewed as requiring more effort to maintain than other applications.

Towards a Survival Analysis of Database Framework Usage in Java Projects
Mathieu Goeminne and Tom Mens
(University of Mons, Belgium)
Many software projects rely on a relational database in order to realize part of their functionality. Various database frameworks and object-relational mappings have been developed and used to facilitate data manipulation. Little is known about whether and how such frameworks co-occur, how they complement or compete with each other, and how this changes over time. We empirically studied these aspects for 5 Java database frameworks, based on a corpus of 3,707 GitHub Java projects. In particular, we analysed whether certain database frameworks co-occur frequently, and whether some database frameworks get replaced over time by others. Using the statistical technique of survival analysis, we explored the survival of the database frameworks in the considered projects. This provides useful evidence to software developers about which frameworks can be used successfully in combination and which combinations should be avoided.

Maintenance and Analysis
Thu, Oct 1, 13:50 - 15:30, GW2 B2890 (Chair: Ferenc Rudolf; Giuseppe Scanniello)

Exploring the Use of Deep Learning for Feature Location
Christopher S. Corley, Kostadin Damevski, and Nicholas A. Kraft
(University of Alabama, USA; Virginia Commonwealth University, USA; ABB Corporate Research, USA)
Deep learning models can infer complex patterns present in natural language text. Relative to n-gram models, deep learning models can capture more complex statistical patterns based on smaller training corpora. In this paper we explore the use of a particular deep learning model, document vectors (DVs), for feature location. DVs seem well suited to use with source code, because they both capture the influence of context on each term in a corpus and map terms into a continuous semantic space that encodes semantic relationships such as synonymy. We present preliminary results that show that a feature location technique (FLT) based on DVs can outperform an analogous FLT based on latent Dirichlet allocation (LDA) and then suggest several directions for future work on the use of deep learning models to improve developer effectiveness in feature location.

Using Stereotypes in the Automatic Generation of Natural Language Summaries for C++ Methods
Nahla J. Abid, Natalia Dragan, Michael L. Collard, and Jonathan I. Maletic
(Kent State University, USA; University of Akron, USA)
An approach to automatically generate natural language documentation summaries for C++ methods is presented. The approach uses prior work by the authors on stereotyping methods along with the source code analysis framework srcML. First, each method is automatically assigned a stereotype(s) based on static analysis and a set of heuristics. Then, the approach uses the stereotype information, static analysis, and predefined templates to generate a natural-language summary for each method. This summary is automatically added to the code base as a comment for each method. The predefined templates are designed to produce a generic summary for specific method stereotypes. Static analysis is used to extract internal details about the method (e.g., parameters, local variables, calls, etc.). This information is used to specialize the generated summaries.

Keecle: Mining Key Architecturally Relevant Classes using Dynamic Analysis
Liliane do Nascimento Vale and Marcelo de A. Maia
(Federal University of Uberlândia, Brazil; Federal University of Goiás, Brazil)
Reconstructing architectural components from existing software applications is an important task during the software maintenance cycle because either those elements do not exist or are outdated. Reverse engineering techniques are used to reduce the effort demanded during the reconstruction. Unfortunately, there is no widely accepted technique to retrieve software components from source code. Moreover, in several architectural descriptions of systems, a set of architecturally relevant classes are used to represent the set of architectural components. Based on this fact, we propose Keecle, a novel dynamic analysis approach for the detection of such classes from execution traces in a semi-automatic manner. Several mechanisms are applied to reduce the size of traces, and finally the reduced set of key classes is identified using Naïve Bayes classification. We evaluated the approach with two open source systems, in order to assess if the encountered classes map to the actual architectural classes defined in the documentation of those respective systems. The results were analyzed in terms of precision and recall, and suggest that the proposed approach is effective for revealing key classes that conceptualize architectural components, outperforming a state-of-the-art approach.

Combining Software Interrelationship Data across Heterogeneous Software Repositories
Nikola Ilo, Johann Grabner, Thomas Artner, Mario Bernhart, and Thomas Grechenig
(Vienna University of Technology, Austria)
Software interrelationships have an impact on the quality and evolution of software projects and are therefore important to development and maintenance. Package management and build systems result in software ecosystems that usually are syntactically and semantically incompatible with each other, although the described software can overlap. There is currently no general way for querying software interrelationships across these different ecosystems. In this paper, we present our approach to combine and consequently query information about software interrelationships across different ecosystems. We propose an ontology for the semantic modeling of the relationships as linked data. Furthermore, we introduce a temporal storage and query model to handle inconsistencies between different data sources. By providing a scalable and extensible architecture to retrieve and process data from multiple repositories, we establish a foundation for ongoing research activities. We evaluated our approach by integrating the data of several ecosystems and demonstrated its usefulness by creating tools for vulnerability notification and license violation detection.

Recovering Transitive Traceability Links among Software Artifacts
Kazuki Nishikawa, Hironori Washizaki, Yoshiaki Fukazawa, Keishi Oshima, and Ryota Mibe
(Waseda University, Japan; Hitachi, Japan; Yokohama Research Laboratory, Japan)
Although many methods have been suggested to automatically recover traceability links in software development, they do not cover all link combinations (e.g., links between the source code and test cases) because they rely on specific documents or artifact features (e.g., log documents and structures of source code). In this paper, we propose a method called the Connecting Links Method (CLM) to recover transitive traceability links between two artifacts using a third artifact. Because CLM uses a different artifact as a document, it can be applied to various kinds of data. Basically, CLM recovers traceability links using the Vector Space Model (VSM) in Information Retrieval (IR) methods. For example, by connecting links between A and B and between B and C, CLM retrieves the link between A and C transitively. In this way, CLM can recover transitive traceability links when a suggested method cannot. Using open source software, we demonstrate that CLM can effectively recover links that are hard to recover with VSM.
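
The idea of connecting A-B and B-C links to obtain an A-C link can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words vectors, the similarity threshold, the rule of multiplying the two similarities, and all artifact names and texts below are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (a very simple VSM representation)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def direct_links(sources, targets, threshold):
    """Return {(source_id, target_id): similarity} for pairs at or above threshold."""
    links = {}
    for s_id, s_text in sources.items():
        for t_id, t_text in targets.items():
            sim = cosine(vectorize(s_text), vectorize(t_text))
            if sim >= threshold:
                links[(s_id, t_id)] = sim
    return links


def transitive_links(links_ab, links_bc):
    """Connect A-B and B-C links through a shared B artifact to get A-C links."""
    result = {}
    for (a, b1), sim_ab in links_ab.items():
        for (b2, c), sim_bc in links_bc.items():
            if b1 == b2:
                score = sim_ab * sim_bc  # assumed scoring rule, not from the paper
                result[(a, c)] = max(result.get((a, c), 0.0), score)
    return result


# Hypothetical artifacts: a requirement (A), source code (B), and a test case (C).
requirements = {"R1": "user login password"}
code = {"Auth.java": "login password check session"}
tests = {"AuthTest": "session check verify"}

THRESHOLD = 0.3  # assumed cut-off
links_direct = direct_links(requirements, tests, THRESHOLD)
links_transitive = transitive_links(
    direct_links(requirements, code, THRESHOLD),
    direct_links(code, tests, THRESHOLD),
)
```

In this toy data the requirement and the test share no terms, so the direct comparison yields no link, while the route through the source file connects them.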

Live Object Exploration: Observing and Manipulating Behavior and State of Java Objects
Benjamin Biegel, Benedikt Lesch, and Stephan Diehl
(University of Trier, Germany)
In this paper we introduce a visual representation of Java objects that can be used for observing and manipulating behavior and state of currently developed classes. It runs separately, e.g., on a tablet, beside an integrated development environment. Within the visualization, developers are able to arbitrarily change the object state, then invoke any method with custom parameters and observe how the object state changes. When changing the source code of the related class, the visualization holds the previous object state and adapts to the new behavior defined by the underlying source code. This instantly enables developers to observe which functionality objects of a certain class have and how they manipulate their state, and especially, how source code changes influence their behavior. We implemented a first prototype as a touch-enabled web application that is connected to a conventional integrated development environment. In order to gain first practical insights, we evaluated our approach in a pilot user study.

Doctoral Symposium

Post-Doctoral
Mon, Sep 28, 13:30 - 15:00, GW2 B2890 (Chair: Eleni Stroulia; Massimiliano Di Penta)

Supporting Newcomers in Software Development Projects
Sebastiano Panichella
(University of Zurich, Switzerland)
The recent and fast expansion of OSS (open-source software) communities has fostered research on how open source projects evolve and how their communities interact. Several research studies show that the inflow of new developers plays an important role in the longevity and the success of OSS projects. These studies also found that a high percentage of newcomers tend to leave the project because of the socio-technical barriers they meet when they join. However, this research effort has not yet generated concrete results that support the retention and training of project newcomers. In this dissertation we investigated problems arising when newcomers join software projects, and possible solutions to support them. Specifically, we studied (i) how newcomers behave during development activities and how they interact with other developers, with the aim of (ii) developing tools and/or techniques for supporting them during their integration into the development team. Thus, among the various recommenders, we defined (i) a tool able to suggest appropriate mentors to newcomers during the training stage; then, with the aim of supporting newcomers during program comprehension, we defined two other recommenders: a tool that (ii) generates high-quality source code summaries and another tool able to (iii) provide descriptions of specific source code elements. For future work, we plan to improve the proposed recommenders and to integrate other kinds of recommenders to better support newcomers in OSS projects.

Advances in Software Product Quality Measurement and Its Applications in Software Evolution
Péter Hegedűs
(University of Szeged, Hungary)
The main results presented in this work, a synopsis of the connected PhD dissertation, are related to software product quality modeling and measurement as well as to the application of the newly proposed methods, tools and techniques in software evolution. All the novel theoretical results and models were thoroughly validated via empirical case studies and successfully applied in practice. The thesis result statements can be grouped into three major points: (i) system-level software quality models; (ii) source code element-level software quality models; (iii) applications of the proposed quality models. Some of the methods and tools presented in the thesis have been utilized in Hungarian and international R&D projects as well as by the industrial partners of the Software Engineering Department of the University of Szeged.

Pre-Doctoral
Mon, Sep 28, 15:30 - 17:00, GW2 B2890 (Chair: Eleni Stroulia; Massimiliano Di Penta)

Treating Software Quality as a First-Class Entity
Yuriy Tymchuk
(University of Lugano, Switzerland)
Quality is a crucial property of any software system and consists of many aspects. On the one hand, quality measures how well a piece of software satisfies its functional requirements. On the other hand, it captures how easy it is to understand, test and modify a software system. While functional requirements are provided by the product owner, maintainability of software is often underestimated. Currently software quality is either assessed by experts, or presented as a list of rule violations reported by some kind of static analyzer. Both of these approaches work with a sense of quality that lies outside of the software itself. We envision quality as a first-class entity of a software system, a concept that, similarly to functionality, is persistent within the software itself. We believe that each entity or group of software entities should be able to report on its quality, the reasons for its bad smells, and ways to resolve them. This concept will make it possible to build quality-aware tools for each step of the software development lifecycle. On our way to the concept of quality as a first-class entity, we have created a code review approach where software quality is the main concern. A reviewer makes decisions and takes actions based on the quality of the reviewed system. We plan to continue our research by integrating advanced quality rules into our tools and devising new approaches to represent quality and integrate it into everyday workflow. We have started to develop a layer on top of a software model that is responsible for quality feedback and supports the development of quality-aware IDE plugins.

Detection Strategies of Smells in Web Software Development
Maurício F. Aniche
(University of São Paulo, Brazil)
Web application development uses many technologies and programming languages, both on the server side and on the client side. Maintaining the heterogeneous source code base is not easy, as each technology contains its own set of best practices and standards. Therefore, developers must be aware of diverse technologies' and languages' best practices, and quickly identify them in their codebases. To achieve that, we propose a set of detection strategies to automatically identify the presence (or absence) of known bad web development practices. Our first implemented detection strategy enabled us to understand the feasibility of such work, and confirmed its usefulness for web developers.

Code Smells in Highly Configurable Software
Wolfram Fenske
(University of Magdeburg, Germany)
Modern software systems are increasingly configurable. Conditional compilation based on C preprocessor directives (i.e., #ifdefs) is a popular variability mechanism to implement this configurability in source code. Although C preprocessor usage has been subject to repeated criticism, with regard to variability implementation, there is no thorough understanding of which patterns are particularly harmful. Specifically, we lack empirical evidence of how frequently reputedly bad patterns occur in practice and which negative effect they have. For object-oriented software, in contrast, code smells are commonly used to describe source code that exhibits known design flaws, which negatively affect understandability or changeability. Established code smells, however, have no notion of variability. Consequently, they cannot characterize flawed patterns of variability implementation. The goal of my research is therefore to create a catalog of variability-aware code smells. I will collect empirical proof of how frequently these smells occur and what their negative impact is on understandability, changeability, and fault-proneness of affected code. Moreover, I will develop techniques to detect variability-aware code smells automatically and reliably.

A Model-Based Approach to Software Refactoring
Ioana Verebi
(Politehnica University of Timisoara, Romania)
Refactoring is a key activity for any software system, as it ensures that the system is easily maintainable and extensible. However, complex refactorings (restructurings) are largely performed by hand, as there are no automated means of chaining existing basic refactorings. In addition, developers cannot quickly and safely compare the effects of one restructuring solution against another. In this context, we introduce a model-based approach to software refactoring, which provides an easy and safe way to explore restructuring alternatives. Restructurings are written as a composition of low-level model transformations, making them reusable in different complex refactorings. In order to support our approach, we implemented a tool named reFactor, which aims to bridge the gap between design flaw detection and correction. It detects design problems and offers a platform to compose model transformations into composite restructurings, while permanently monitoring the overall quality of the code.
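
Writing a restructuring as a composition of low-level model transformations can be pictured with a small sketch. This is not the reFactor tool or its API: the model format (a mapping from class name to method names) and the functions `add_class`, `move_method`, `compose`, and `extract_class` are hypothetical names invented for this illustration.

```python
from functools import reduce


def _copy(model):
    """Copy the model so a transformation never mutates its input."""
    return {cls: set(methods) for cls, methods in model.items()}


def add_class(name):
    """Low-level transformation: add an empty class if it is missing."""
    def transform(model):
        new_model = _copy(model)
        new_model.setdefault(name, set())
        return new_model
    return transform


def move_method(method, src, dst):
    """Low-level transformation: move one method from src to dst."""
    def transform(model):
        if method not in model.get(src, set()):
            raise ValueError(f"{src} has no method {method}")
        new_model = _copy(model)
        new_model[src].discard(method)
        new_model.setdefault(dst, set()).add(method)
        return new_model
    return transform


def compose(*steps):
    """Chain transformations, applied left to right, into one restructuring."""
    def restructuring(model):
        return reduce(lambda current, step: step(current), steps, model)
    return restructuring


def extract_class(new_class, src, methods):
    """Composite restructuring built only from the low-level steps above."""
    moves = (move_method(m, src, new_class) for m in methods)
    return compose(add_class(new_class), *moves)


# Hypothetical model: class name -> set of method names.
model = {"Order": {"total", "print_invoice", "email_invoice"}}
restructured = extract_class("Invoice", "Order", ["print_invoice", "email_invoice"])(model)
```

Because every step returns a new model, the original stays untouched, so several restructuring alternatives can be applied to the same starting model and compared.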

Technical Research Track

Developers Tue, Sep 29, 10:40 - 12:20, GW2 B3009 (Chair: Michael Godfrey)

Program Comprehension Tue, Sep 29, 13:50 - 15:30, GW2 B3009 (Chair: Denys Poshyvanyk)

Software Quality Tue, Sep 29, 16:00 - 17:40, GW2 B3009 (Chair: Alexander Serebrenik)

Modularity Wed, Sep 30, 08:30 - 10:10, GW2 B3009 (Chair: Giuseppe Scanniello)

Program Analysis Wed, Sep 30, 10:40 - 12:20, GW2 B3009 (Chair: Arpad Beszedes)

Refactoring Wed, Sep 30, 13:50 - 15:30, GW2 B3009 (Chair: Romain Robbes)

Code Mining and Recommendation Thu, Oct 1, 10:40 - 12:20, GW2 B3009 (Chair: David Shepherd)

Mobile Applications Thu, Oct 1, 13:50 - 15:30, GW2 B3009 (Chair: Peng Xin)

Tool Demo Track Wed, Sep 30, 13:50 - 15:30, GW2 B2890 (Chair: Collin McMillan; Nicholas A. Kraft)

Industry Track

Industry Experience Tue, Sep 29, 10:40 - 12:20, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

Developer Studies Tue, Sep 29, 13:50 - 15:30, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

Software Quality Wed, Sep 30, 10:40 - 12:20, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

Software Reengineering Tue, Sep 29, 16:00 - 17:40, GW2 B2890 (Chair: Jochen Quante; David Shepherd)

Early Research Achievements Track

Defects and Refactoring Wed, Sep 30, 08:30 - 10:10, GW2 B2890 (Chair: Coen De Roover; Foutse Khomh; Lin Tan; Serge Demeyer)

Social and Developers Thu, Oct 1, 10:40 - 12:20, GW2 B2890 (Chair: Fabian Beck; Latifa Guerrouj)

Maintenance and Analysis Thu, Oct 1, 13:50 - 15:30, GW2 B2890 (Chair: Ferenc Rudolf; Giuseppe Scanniello)

Doctoral Symposium

Post-Doctoral Mon, Sep 28, 13:30 - 15:00, GW2 B2890 (Chair: Eleni Stroulia; Massimiliano Di Penta)

Pre-Doctoral Mon, Sep 28, 15:30 - 17:00, GW2 B2890 (Chair: Eleni Stroulia; Massimiliano Di Penta)