2015 IEEE 31st International Conference on Software Maintenance and Evolution (ICSME),
September 29 – October 1, 2015,
Bremen, Germany
Technical Research Track
Developers
Tue, Sep 29, 10:40 - 12:20, GW2 B3009 (Chair: Michael Godfrey)
Software History under the Lens: A Study on Why and How Developers Examine It
Mihai Codoban, Sruti Srinivasa Ragavan,
Danny Dig, and Brian Bailey
(Oregon State University, USA; University of Illinois at Urbana-Champaign, USA)
Despite software history being indispensable for developers, there is little empirical knowledge about how they examine software history. Without such knowledge, researchers and tool builders are in danger of making wrong assumptions and building inadequate tools. In this paper we present an in-depth empirical study about the motivations developers have for examining software history, the strategies they use, and the challenges they encounter. To learn these, we interviewed 14 experienced developers from industry, and then extended our findings by surveying 217 developers. We found that history does not begin with the latest commit but with uncommitted changes. Moreover, we found that developers had different motivations for examining recent and old history. Based on these findings we propose 3-LENS HISTORY, a novel unified model for reasoning about software history.
@InProceedings{ICSME15p1,
author = {Mihai Codoban and Sruti Srinivasa Ragavan and Danny Dig and Brian Bailey},
title = {Software History under the Lens: A Study on Why and How Developers Examine It},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {1--10},
doi = {},
year = {2015},
}
To Fix or to Learn? How Production Bias Affects Developers' Information Foraging during Debugging
David Piorkowski,
Scott D. Fleming, Christopher Scaffidi,
Margaret Burnett, Irwin Kwan,
Austin Z. Henley, Jamie Macbeth, Charles Hill, and Amber Horvath
(Oregon State University, USA; University of Memphis, USA; Clemson University, USA)
Developers performing maintenance activities must balance their efforts to learn the code vs. their efforts to actually change it. This balancing act is consistent with the "production bias" that, according to Carroll's minimalist learning theory, generally affects software users during everyday tasks. This suggests that developers' focus on efficiency should have marked effects on how they forage for the information they think they need to fix bugs. To investigate how developers balance fixing versus learning during debugging, we conducted the first empirical investigation of the interplay between production bias and information foraging. Our theory-based study involved 11 participants: half tasked with fixing a bug, and half tasked with learning enough to help someone else fix it. Despite the subtle difference between their tasks, participants foraged remarkably differently: making foraging decisions from different types of "patches," with different types of information, and succeeding with different foraging tactics.
@InProceedings{ICSME15p11,
author = {David Piorkowski and Scott D. Fleming and Christopher Scaffidi and Margaret Burnett and Irwin Kwan and Austin Z. Henley and Jamie Macbeth and Charles Hill and Amber Horvath},
title = {To Fix or to Learn? How Production Bias Affects Developers' Information Foraging during Debugging},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {11--20},
doi = {},
year = {2015},
}
Developers' Perception of Co-change Patterns: An Empirical Study
Luciana L. Silva,
Marco Tulio Valente, Marcelo de A. Maia, and
Nicolas Anquetil
(Federal University of Minas Gerais, Brazil; Federal Institute of Triangulo Mineiro, Brazil; Federal University of Uberlândia, Brazil; INRIA, France)
Co-change clusters are groups of classes that frequently change together. They are proposed as an alternative modular view, which can be used to assess the traditional decomposition of systems into packages. To investigate developers' perception of co-change clusters, we report in this paper a study with experts on six systems, implemented in two languages. We mine 102 co-change clusters from the version history of these systems, which are classified into three patterns regarding their projection onto the package structure: Encapsulated, Crosscutting, and Octopus. We then collect the perception of expert developers on such clusters, aiming to answer two central questions: (a) what concerns and changes are captured by the extracted clusters? (b) do the extracted clusters reveal design anomalies? We conclude that Encapsulated Clusters are often viewed as healthy designs and that Crosscutting Clusters tend to be associated with design anomalies. Octopus Clusters are normally associated with expected class distributions, which are not easy to implement in an encapsulated way, according to the interviewed developers.
@InProceedings{ICSME15p21,
author = {Luciana L. Silva and Marco Tulio Valente and Marcelo de A. Maia and Nicolas Anquetil},
title = {Developers' Perception of Co-change Patterns: An Empirical Study},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {21--30},
doi = {},
year = {2015},
}
When and Why Developers Adopt and Change Software Licenses
Christopher Vendome, Mario Linares-Vásquez,
Gabriele Bavota,
Massimiliano Di Penta, Daniel M. German, and
Denys Poshyvanyk
(College of William and Mary, USA; Free University of Bolzano, Italy; University of Sannio, Italy; University of Victoria, Canada)
Software licenses legally govern the way in which developers can use, modify, and redistribute a particular system. While previous studies either investigated licensing through mining software repositories or studied licensing through FOSS reuse, we aim at understanding the rationale behind developers' decisions for choosing or changing software licensing by surveying open source developers. In this paper, we analyze when developers consider licensing, the reasons why developers pick a license for their project, and the factors that influence licensing changes. Additionally, we explore the licensing-related problems that developers experienced and the expectations they have for licensing support from forges (e.g., GitHub). Our investigation involves, on one hand, the analysis of the commit history of 16,221 Java open source projects to identify the commits where licenses were added or changed. On the other hand, it consists of a survey---in which 138 developers reported their involvement in licensing-related decisions and 52 provided deeper insights about the rationale behind the actions that they had undertaken. The results indicate that developers adopt licenses early in the project's development and change licensing after some period of development (if at all). We also found that developers have inherent biases with respect to software licensing. Additionally, reuse---whether by a non-contributor or for commercial purposes---is a dominant reason why developers change licenses of their systems. Finally, we discuss potential areas of research that could ameliorate the difficulties that software developers are facing with regard to licensing issues of their software systems.
@InProceedings{ICSME15p31,
author = {Christopher Vendome and Mario Linares-Vásquez and Gabriele Bavota and Massimiliano Di Penta and Daniel M. German and Denys Poshyvanyk},
title = {When and Why Developers Adopt and Change Software Licenses},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {31--40},
doi = {},
year = {2015},
}
Program Comprehension
Tue, Sep 29, 13:50 - 15:30, GW2 B3009 (Chair: Denys Poshyvanyk)
Investigating Naming Convention Adherence in Java References
Simon Butler,
Michel Wermelinger, and
Yijun Yu
(Open University, UK)
Naming conventions can help the readability and comprehension of code, and thus the onboarding of new developers. Conventions also provide cues that help developers and tools extract information from identifier names to support software maintenance. Tools exist to automatically check naming conventions, but they are often limited to simple checks, e.g., regarding typography. The adherence to more elaborate conventions, such as the use of noun and verbal phrases in names, is not checked.
We present Nominal, a naming convention checking library for Java that allows the declarative specification of conventions regarding typography and the use of abbreviations and phrases. To test Nominal, and to investigate the extent to which developers follow conventions, we extract 3.5 million reference name declarations (fields, formal arguments, and local variables) from 60 FLOSS projects and determine their adherence to two well-known Java naming convention guidelines that give developers scope to choose a variety of name forms, and sometimes offer conflicting advice. We found that developers largely follow naming conventions, but adherence to specific conventions varies widely.
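The typography side of such convention checking can be sketched as a small rule table of patterns and an adherence measure. The convention names, regular expressions, and the `adherence` helper below are illustrative assumptions, not Nominal's actual specification language:

```python
import re

# Hypothetical convention rules inspired by common Java guidelines;
# names and patterns are assumptions for illustration only.
CONVENTIONS = {
    "camelCase": re.compile(r"^[a-z][a-zA-Z0-9]*$"),
    "CONSTANT_CASE": re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"),
}

def adherence(names, convention):
    """Fraction of reference names matching the convention's pattern."""
    pattern = CONVENTIONS[convention]
    return sum(bool(pattern.match(n)) for n in names) / len(names)

fields = ["maxValue", "count", "MAX_SIZE", "tmp_buf"]
print(adherence(fields, "camelCase"))  # 0.5: maxValue and count match
```

Checking phrase-level conventions (noun vs. verbal phrases) would additionally need part-of-speech analysis of the split identifier, which this sketch omits.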
@InProceedings{ICSME15p41,
author = {Simon Butler and Michel Wermelinger and Yijun Yu},
title = {Investigating Naming Convention Adherence in Java References},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {41--50},
doi = {},
year = {2015},
}
Developing a Model of Loop Actions by Mining Loop Characteristics from a Large Code Corpus
Xiaoran Wang,
Lori Pollock, and K. Vijay-Shanker
(University of Delaware, USA)
Some high level algorithmic steps require more than one statement to implement, but are not large enough to be a method on their own. Specifically, many algorithmic steps (e.g., count, compare pairs of elements, find the maximum) are implemented as loop structures, which lack the higher level abstraction of the action being performed, and can negatively affect both human readers and automatic tools. Additionally, in a study of 14,317 projects, we found that less than 20% of loops are documented to help readers. In this paper, we present a novel automatic approach to identify the high level action implemented by a given loop. We leverage the available, large source of high-quality open source projects to mine loop characteristics and develop an action identification model. We use the model and feature vectors extracted from loop code to automatically identify the high level actions implemented by loops. We have evaluated the accuracy of the loop action identification and coverage of the model over 7159 open source programs. The results show great promise for this approach to automatically insert internal comments and provide additional higher level naming for loop actions to be used by tools such as code search.
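The action identification step can be pictured as mapping a feature vector extracted from loop code to a high-level action label. The paper mines its model from a corpus; the hand-written rules and feature names below are assumptions made purely to illustrate the input/output shape:

```python
# Illustrative rule-based stand-in for the mined action-identification
# model; feature names ("increments_counter", etc.) are hypothetical.
def identify_action(features):
    if features.get("increments_counter") and not features.get("has_break"):
        return "count"
    if features.get("compares_to_running_best"):
        return "find-maximum"
    if features.get("compares_element_pairs"):
        return "compare-pairs"
    return "unknown"

loop_features = {"increments_counter": True, "has_break": False}
print(identify_action(loop_features))  # count
```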
@InProceedings{ICSME15p51,
author = {Xiaoran Wang and Lori Pollock and K. Vijay-Shanker},
title = {Developing a Model of Loop Actions by Mining Loop Characteristics from a Large Code Corpus},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {51--60},
doi = {},
year = {2015},
}
Delta Extraction: An Abstraction Technique to Comprehend Why Two Objects Could Be Related
Naoya Nitta and Tomohiro Matsuoka
(Konan University, Japan)
In an execution of a large scale program, even a simple observable behavior may be generated by a wide range of the source code. To comprehend how such a behavior is implemented in the code, a debugger can be helpful. However, when using a debugger, developers often encounter several types of cumbersome tasks and are often confused by the huge and complicated runtime information. To support such debugger-based comprehension tasks, we propose an abstraction technique for runtime information, named delta, and present a delta extraction and visualization tool. Basically, a delta is defined for two linked objects in an object-oriented program's execution. It intuitively represents the reason why these objects could be related in the execution, and it can hide the details of how these objects were related. We have conducted experiments on four subject tasks from two real-world systems to evaluate how appropriately an extracted delta can answer the `why' question and by how much the tool can reduce the working time needed to answer it. The results show that each delta can successfully answer the question and that extracting a delta can shorten a debugger-based task that would otherwise take tens of minutes to an hour.
@InProceedings{ICSME15p61,
author = {Naoya Nitta and Tomohiro Matsuoka},
title = {Delta Extraction: An Abstraction Technique to Comprehend Why Two Objects Could Be Related},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {61--70},
doi = {},
year = {2015},
}
Modeling Changeset Topics for Feature Location
Christopher S. Corley, Kelly L. Kashuda, and
Nicholas A. Kraft
(University of Alabama, USA; ABB Corporate Research, USA)
Feature location is a program comprehension activity in which a developer inspects source code to locate the classes or methods that implement a feature of interest. Many feature location techniques (FLTs) are based on text retrieval models, and in such FLTs it is typical for the models to be trained on source code snapshots. However, source code evolution leads to model obsolescence and thus to the need to retrain the model from the latest snapshot. In this paper, we introduce a topic-modeling-based FLT in which the model is built incrementally from source code history. By training an online learning algorithm using changesets, the FLT maintains an up-to-date model without incurring the non-trivial computational cost associated with retraining traditional FLTs. Overall, we studied over 600 defects and features from 4 open-source Java projects. We also present a historical simulation that demonstrates how the FLT performs as a project evolves. Our results indicate that the accuracy of a changeset-based FLT is similar to that of a snapshot-based FLT, but without the retraining costs.
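The incremental-model idea can be sketched without topic modeling machinery: update a per-entity text model from each changeset instead of re-indexing a snapshot, then rank entities against a query. This sketch uses plain term-frequency vectors with cosine similarity rather than the paper's online topic model (an assumption to keep the example dependency-free); the entity names and words are made up:

```python
from collections import Counter
import math

class ChangesetIndex:
    """Toy changeset-based index: the model grows with each changeset,
    so there is no snapshot-retraining step."""
    def __init__(self):
        self.docs = {}  # entity name -> Counter of words seen in changesets

    def apply_changeset(self, entity, added_words):
        self.docs.setdefault(entity, Counter()).update(added_words)

    def locate(self, query_words):
        q = Counter(query_words)
        def cosine(d):
            dot = sum(q[w] * d[w] for w in q)
            norm = (math.sqrt(sum(v * v for v in q.values()))
                    * math.sqrt(sum(v * v for v in d.values())))
            return dot / norm if norm else 0.0
        return sorted(self.docs, key=lambda e: cosine(self.docs[e]), reverse=True)

idx = ChangesetIndex()
idx.apply_changeset("AuthService.login", ["password", "login", "user"])
idx.apply_changeset("Renderer.draw", ["pixel", "canvas", "draw"])
print(idx.locate(["login", "user"])[0])  # AuthService.login
```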
@InProceedings{ICSME15p71,
author = {Christopher S. Corley and Kelly L. Kashuda and Nicholas A. Kraft},
title = {Modeling Changeset Topics for Feature Location},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {71--80},
doi = {},
year = {2015},
}
Software Quality
Tue, Sep 29, 16:00 - 17:40, GW2 B3009 (Chair: Alexander Serebrenik)
Four Eyes Are Better Than Two: On the Impact of Code Reviews on Software Quality
Gabriele Bavota and Barbara Russo
(Free University of Bolzano, Italy)
Code review is advocated as one of the best practices to improve software quality and reduce the likelihood of introducing defects during code change activities. Recent research has shown how code components having a high review coverage (i.e., a high proportion of reviewed changes) tend to be less involved in post-release fixing activities. Yet the relationship between code review and bug introduction or the overall software quality is still largely unexplored.
This paper presents an empirical, exploratory study on three large open source systems that aims at investigating the influence of code review on (i) the chances of inducing bug fixes and (ii) the quality of the committed code components, as assessed by code coupling, complexity, and readability.
Findings show that unreviewed commits (i.e., commits that did not undergo a review process) have over two times more chances of introducing bugs than reviewed commits (i.e., commits that underwent a review process). In addition, code committed after review has a substantially higher readability with respect to unreviewed code.
@InProceedings{ICSME15p81,
author = {Gabriele Bavota and Barbara Russo},
title = {Four Eyes Are Better Than Two: On the Impact of Code Reviews on Software Quality},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {81--90},
doi = {},
year = {2015},
}
A Comparative Study on the Bug-Proneness of Different Types of Code Clones
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
Code clones are defined as exactly or nearly similar code fragments in a software system's code-base. Existing clone-related studies reveal that code clones are likely to introduce bugs and inconsistencies in the code-base. However, although there are different types of clones, it is still unknown which types have a higher likelihood of introducing bugs to software systems and so should be considered more important to manage with techniques such as refactoring or tracking. With this focus, we performed a study that compared the bug-proneness of the major clone types: Type 1, Type 2, and Type 3. According to our experimental results on thousands of revisions of seven diverse subject systems, Type 3 clones exhibit the highest bug-proneness among the three clone types. The bug-proneness of Type 1 clones is the lowest. Also, Type 3 clones have the highest likelihood of being co-changed consistently while experiencing bug-fixing changes. Moreover, the Type 3 clones that experience bug-fixes have a higher possibility of evolving following a Similarity Preserving Change Pattern (SPCP) compared to the bug-fix clones of the other two clone types. From the experimental results it is clear that Type 3 clones should be given a higher priority than the other two clone types when making clone management decisions. We believe that our study provides useful implications for ranking clones for refactoring and tracking.
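The three clone types compared in the study are conventionally distinguished by how much normalization is needed before two fragments match: Type 1 is identical ignoring whitespace, Type 2 is identical after renaming identifiers and literals, and Type 3 is similar above some threshold. A minimal token-level sketch of that classification (the tokenizer, normalization, and threshold are simplified assumptions):

```python
import re
from collections import Counter

def tokens(code):
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def normalize(ts):
    # Collapse identifiers and numeric literals, keep operators/punctuation.
    return ["ID" if re.match(r"[A-Za-z_]", t) else "LIT" if t.isdigit() else t
            for t in ts]

def clone_type(a, b, threshold=0.7):
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return 1                       # identical token streams
    na, nb = normalize(ta), normalize(tb)
    if na == nb:
        return 2                       # identical after renaming
    common = sum((Counter(na) & Counter(nb)).values())
    sim = 2 * common / (len(na) + len(nb))
    return 3 if sim >= threshold else None

print(clone_type("x = y + 1", "a = b + 2"))  # 2
```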
@InProceedings{ICSME15p91,
author = {Manishankar Mondal and Chanchal K. Roy and Kevin A. Schneider},
title = {A Comparative Study on the Bug-Proneness of Different Types of Code Clones},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {91--100},
doi = {},
year = {2015},
}
An Empirical Study of Bugs in Test Code
Arash Vahabzadeh, Amin Milani Fard, and
Ali Mesbah
(University of British Columbia, Canada)
Testing aims at detecting (regression) bugs in production code. However, testing code is just as likely to contain bugs as the code it tests. Buggy test cases can silently miss bugs in the production code or loudly ring false alarms when the production code is correct. We present the first empirical study of bugs in test code to characterize their prevalence and root cause categories. We mine the bug repositories and version control systems of 211 Apache Software Foundation (ASF) projects and find 5,556 test-related bug reports. We (1) compare properties of test bugs with production bugs, such as active time and fixing effort needed, and (2) qualitatively study 443 randomly sampled test bug reports in detail and categorize them based on their impact and root causes. Our results show that (1) around half of all the projects had bugs in their test code; (2) the majority of test bugs are false alarms, i.e., the test fails while the production code is correct, while a minority of these bugs result in silent horrors, i.e., the test passes while the production code is incorrect; (3) incorrect and missing assertions are the dominant root cause of silent horror bugs; (4) semantic (25%), flaky (21%), and environment-related (18%) bugs are the dominant root cause categories of false alarms; (5) the majority of false alarm bugs happen in the exercise portion of the tests; and (6) developers contribute more actively to fixing test bugs and test bugs are fixed sooner compared to production bugs. In addition, we evaluate whether existing bug detection tools can detect bugs in test code.
@InProceedings{ICSME15p101,
author = {Arash Vahabzadeh and Amin Milani Fard and Ali Mesbah},
title = {An Empirical Study of Bugs in Test Code},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {101--110},
doi = {},
year = {2015},
}
Investigating Code Review Quality: Do People and Participation Matter?
Oleksii Kononenko,
Olga Baysal,
Latifa Guerrouj, Yaxin Cao, and
Michael W. Godfrey
(University of Waterloo, Canada; Université de Montréal, Canada; École de Technologie Supérieure, Canada)
Code review is an essential element of any mature software development project; it aims at evaluating code contributions submitted by developers. In principle, code review should improve the quality of code changes (patches) before they are committed to the project's master repository. In practice, bugs are sometimes unwittingly introduced during this process.
In this paper, we report on an empirical study investigating code review quality for Mozilla, a large open-source project. We explore the relationships between the reviewers' code inspections and a set of factors, both personal and social in nature, that might affect the quality of such inspections. We applied the SZZ algorithm to detect bug-inducing changes that were then linked to the code review information extracted from the issue tracking system. We found that 54% of the reviewed changes introduced bugs in the code. Our findings also showed that both personal metrics, such as reviewer workload and experience, and participation metrics, such as the number of involved developers, are associated with the quality of the code review process.
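The SZZ step the study relies on can be summarized compactly: for each bug-fix commit, the lines it deletes are blamed to recover the earlier commits that last touched them, and those commits are flagged as bug-inducing. A minimal sketch of that linking, with a toy blame map standing in for real `git blame` output:

```python
# Sketch of the SZZ linking step; the data structures are illustrative
# stand-ins, not the algorithm's full heuristics (e.g., it ignores
# cosmetic-change filtering that real SZZ variants apply).
def szz(fix_commits, blame):
    inducing = set()
    for fix in fix_commits:
        for line in fix["deleted_lines"]:
            inducing.add(blame[line])  # commit that introduced the faulty line
    return inducing

blame = {("util.c", 42): "c1", ("util.c", 43): "c2"}
fixes = [{"id": "c9", "deleted_lines": [("util.c", 42)]}]
print(szz(fixes, blame))  # {'c1'}
```

In the study, the commits flagged this way are then joined with the review records of the changes that introduced them.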
@InProceedings{ICSME15p111,
author = {Oleksii Kononenko and Olga Baysal and Latifa Guerrouj and Yaxin Cao and Michael W. Godfrey},
title = {Investigating Code Review Quality: Do People and Participation Matter?},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {111--120},
doi = {},
year = {2015},
}
Modularity
Wed, Sep 30, 08:30 - 10:10, GW2 B3009 (Chair: Giuseppe Scanniello)
Inter-smell Relations in Industrial and Open Source Systems: A Replication and Comparative Analysis
Aiko Yamashita,
Marco Zanoni, Francesca Arcelli Fontana, and Bartosz Walter
(Oslo and Akershus University College of Applied Sciences, Norway; University of Milano-Bicocca, Italy; Poznan University of Technology, Poland)
The presence of anti-patterns and code smells can adversely affect software evolution and quality. Recent work has shown that code smells that appear together in the same file (i.e., collocated smells) can interact with each other, leading to various types of maintenance issues and/or to the intensification of negative effects. Moreover, it has been found that code smell interactions can occur across coupled files (i.e., coupled smells), with negative effects comparable to the interaction of same-file (collocated) smells. Different inter-smell relations have been described in previous work, yet only a few studies have evaluated them empirically. This study attempts to replicate the findings from previous work on inter-smell relations by analyzing larger systems, including both industrial and open source ones. We also include the analysis of coupled smells in addition to collocated smells, to achieve a more complete picture of the landscape of inter-smell relations. Our observations suggest that if coupled smells are not considered, one may risk increasing the number of false negatives when analysing inter-smell relations. A major finding is that patterns of inter-smell relations vary between open source and industrial systems. This suggests that contextual variables (e.g., domain, development mode, environment) should be considered in further studies on code smells.
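The collocated-smell part of such an analysis reduces to counting how often pairs of smell kinds occur in the same file across a system. A minimal sketch (the smell names and file data are made up; coupled smells would additionally require a file-coupling relation, which this omits):

```python
from itertools import combinations

def collocated_pairs(file_smells):
    """Count, per pair of smell kinds, the files where both occur."""
    counts = {}
    for smells in file_smells.values():
        for pair in combinations(sorted(set(smells)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

file_smells = {
    "A.java": ["GodClass", "FeatureEnvy"],
    "B.java": ["GodClass", "FeatureEnvy", "DataClass"],
    "C.java": ["DataClass"],
}
print(collocated_pairs(file_smells)[("FeatureEnvy", "GodClass")])  # 2
```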
@InProceedings{ICSME15p121,
author = {Aiko Yamashita and Marco Zanoni and Francesca Arcelli Fontana and Bartosz Walter},
title = {Inter-smell Relations in Industrial and Open Source Systems: A Replication and Comparative Analysis},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {121--130},
doi = {},
year = {2015},
}
Evaluating Clone Detection Tools with BigCloneBench
Jeffrey Svajlenko and Chanchal K. Roy
(University of Saskatchewan, Canada)
Many clone detection tools have been proposed in the literature. However, our knowledge of their performance in real software systems is limited, particularly their recall. In this paper, we use our big data clone benchmark, BigCloneBench, to evaluate the recall of ten clone detection tools. BigCloneBench is a collection of eight million validated clones within IJaDataset-2.0, a big data software repository containing 25,000 open-source Java systems. BigCloneBench contains both intra-project and inter-project clones of the four primary clone types. We use this benchmark to evaluate the recall of the tools per clone type and across the entire range of clone syntactical similarity. We evaluate the tools for both single-system and cross-project detection scenarios. Using multiple clone-matching metrics, we evaluate the quality of the tools' reporting of the benchmark clones with respect to refactoring and automatic clone analysis use-cases. We compare these real-world results against our Mutation and Injection Framework, a synthetic benchmark, to reveal deeper understanding of the tools. We found that the tools have strong recall for Type-1 and Type-2 clones, as well as Type-3 clones with high syntactical similarity. The tools have weaker detection of clones with lower syntactical similarity.
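Benchmark-based recall, as measured here, is simply the fraction of known benchmark clone pairs that a tool reports (per clone type). A minimal sketch with made-up clone-pair identifiers, treating pairs as unordered:

```python
def recall(reported, benchmark):
    """Fraction of benchmark clone pairs found by the tool."""
    norm = lambda pairs: {tuple(sorted(p)) for p in pairs}  # unordered pairs
    found = norm(reported) & norm(benchmark)
    return len(found) / len(norm(benchmark))

benchmark_t1 = [("A.java:10-20", "B.java:5-15"), ("C.java:1-9", "D.java:2-10")]
tool_output = [("B.java:5-15", "A.java:10-20")]
print(recall(tool_output, benchmark_t1))  # 0.5
```

The paper's clone-matching metrics are subtler than exact pair identity (reported fragments may only partially overlap benchmark fragments), which this sketch deliberately ignores.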
@InProceedings{ICSME15p131,
author = {Jeffrey Svajlenko and Chanchal K. Roy},
title = {Evaluating Clone Detection Tools with BigCloneBench},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {131--140},
doi = {},
year = {2015},
}
Uncovering Dependence Clusters and Linchpin Functions
David Binkley,
Árpád Beszédes, Syed Islam, Judit Jász, and Béla Vancsics
(Loyola University Maryland, USA; University of Szeged, Hungary; University of East London, UK)
Dependence clusters are (maximal) collections of mutually dependent source code entities according to some dependence relation. Their presence in software complicates many maintenance activities including testing, refactoring, and feature extraction. Despite several studies finding them common in production code, their formation, identification, and overall structure are not well understood, partly because of challenges in approximating true dependences between program entities. Previous research has considered two approximate dependence relations: a fine-grained statement-level relation using control and data dependences from a program's System Dependence Graph, and a coarser relation based on function-level control-flow reachability. In principle, the first is more expensive and more precise than the second. Using a collection of twenty programs, we present an empirical investigation of the clusters identified by these two approaches. In support of the analysis, we consider a hybrid cluster type that works at the coarser function level but is based on the higher-precision statement-level dependences. The three types of clusters are compared based on their slice sets using two clustering metrics. We also perform extensive analysis of the programs to identify linchpin functions: functions primarily responsible for holding a cluster together. Results include evidence that the less expensive, coarser approaches can often be used as effective proxies for the more expensive, finer-grained approaches. Finally, the linchpin analysis shows that linchpin functions can be effectively and automatically identified.
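Under the coarser relation, a dependence cluster is a set of functions that are mutually reachable in the dependence graph. A minimal sketch computing such clusters by mutual reachability (fine for small graphs; real tools would use a linear-time SCC algorithm; the graph itself is illustrative):

```python
def dependence_clusters(graph):
    """Return clusters of mutually reachable nodes (size > 1)."""
    def reach(start):
        seen, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(graph.get(n, []))
        return seen

    nodes = set(graph) | {m for vs in graph.values() for m in vs}
    r = {n: reach(n) for n in nodes}
    clusters = {frozenset(m for m in nodes if n in r[m] and m in r[n])
                for n in nodes}
    return [sorted(c) for c in clusters if len(c) > 1]

# "eval" and "apply" depend on each other, so they form a cluster;
# removing that mutual edge would dissolve it, the linchpin intuition.
g = {"parse": ["eval"], "eval": ["apply"], "apply": ["eval"], "main": ["parse"]}
print(dependence_clusters(g))  # [['apply', 'eval']]
```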
@InProceedings{ICSME15p141,
author = {David Binkley and Árpád Beszédes and Syed Islam and Judit Jász and Béla Vancsics},
title = {Uncovering Dependence Clusters and Linchpin Functions},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {141--150},
doi = {},
year = {2015},
}
Forked and Integrated Variants in an Open-Source Firmware Project
Ștefan Stănciulescu,
Sandro Schulze, and
Andrzej Wąsowski
(IT University of Copenhagen, Denmark; TU Braunschweig, Germany)
Code cloning has been reported both on a small (code fragments) and a large (entire projects) scale. Cloning-in-the-large, or forking, is gaining ground as a reuse mechanism thanks to the availability of better tools for maintaining forked project variants, notably distributed version control systems and interactive source management platforms such as GitHub. We study the advantages and disadvantages of forking using the case of Marlin, an open source firmware for 3D printers. We find that many problems and advantages of cloning do translate to forking. Interestingly, the Marlin community uses both forking and integrated variability management (conditional compilation) to create variants and features. Thus, studying it increases our understanding of the choice between integrated and clone-based variant management. It also allows us to observe the mechanisms governing source code maturation, in particular when, why, and how feature implementations are migrated from forks to the main integrated platform. We believe that this understanding will ultimately help the development of tools mixing clone-based and integrated variant management, combining the advantages of both.
@InProceedings{ICSME15p151,
author = {Ștefan Stănciulescu and Sandro Schulze and Andrzej Wąsowski},
title = {Forked and Integrated Variants in an Open-Source Firmware Project},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {151--160},
doi = {},
year = {2015},
}
Program Analysis
Wed, Sep 30, 10:40 - 12:20, GW2 B3009 (Chair: Arpad Beszedes)
Towards Automating Dynamic Analysis for Behavioral Design Pattern Detection
Andrea De Lucia, Vincenzo Deufemia, Carmine Gravino, and Michele Risi
(University of Salerno, Italy)
The detection of behavioral design patterns is more accurate when a dynamic analysis is performed on the candidate instances identified statically. Such a dynamic analysis requires monitoring the candidate instances at run-time through the execution of a set of test cases. However, the definition of such test cases is a time-consuming task if performed manually, even more so when the number of candidate instances is high and they include many false positives. In this paper we present the results of an empirical study aiming at assessing the effectiveness of dynamic analysis based on automatically generated test cases in behavioral design pattern detection. The study considered three behavioral design patterns, namely State, Strategy, and Observer, and three publicly available software systems, namely JHotDraw 5.1, QuickUML 2001, and MapperXML 1.9.7. The results show that dynamic analysis based on automatically generated test cases improves the precision of design pattern detection tools based on static analysis only. As expected, this improvement in precision is achieved at the expense of recall, so we also compared the results achieved with automatically generated test cases with the more expensive but also more accurate results achieved with manually built test cases. The results of this analysis allowed us to highlight the costs and benefits of automating dynamic analysis for design pattern detection.
@InProceedings{ICSME15p161,
author = {Andrea De Lucia and Vincenzo Deufemia and Carmine Gravino and Michele Risi},
title = {Towards Automating Dynamic Analysis for Behavioral Design Pattern Detection},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {161--170},
doi = {},
year = {2015},
}
Practical and Accurate Pinpointing of Configuration Errors using Static Analysis
Zhen Dong, Artur Andrzejak, and Kun Shao
(University of Heidelberg, Germany; Hefei University of Technology, China)
Software misconfigurations are responsible for a substantial part of today's system failures, causing about one-quarter of all customer-reported issues. Identifying their root causes can be costly in terms of time and human resources. We present an approach to automatically pinpoint such defects without error reproduction. It uses static analysis to infer the correlation degree between each configuration option and the program sites affected by an exception. The only run-time information required by our approach is the stack trace of a failure. This is an essential advantage compared to existing approaches, which require reproducing errors or providing testing oracles. We evaluate our approach on 29 errors from 4 configurable software programs, namely JChord, Randoop, Hadoop, and HBase. Our approach can successfully diagnose 27 out of 29 errors. For 20 errors, the failure-inducing configuration option is ranked first.
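The ranking idea can be sketched as follows: static analysis associates each configuration option with the set of program sites it can influence, and options are then scored by how much of the failure's stack trace falls in that set. The option names, site sets, and scoring function below are hypothetical simplifications of the paper's correlation-degree analysis:

```python
def rank_options(option_sites, stack_trace):
    """Rank options by the fraction of stack-trace frames they influence."""
    trace = set(stack_trace)
    score = {opt: len(sites & trace) / len(trace)
             for opt, sites in option_sites.items()}
    return sorted(score, key=score.get, reverse=True)

# Illustrative data: which program sites each option can reach.
option_sites = {
    "dfs.replication": {"Foo.init", "Bar.write"},
    "io.buffer.size": {"Buf.alloc", "Bar.write", "Baz.flush"},
}
trace = ["Buf.alloc", "Baz.flush"]  # stack trace of the failure
print(rank_options(option_sites, trace)[0])  # io.buffer.size
```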
@InProceedings{ICSME15p171,
author = {Zhen Dong and Artur Andrzejak and Kun Shao},
title = {Practical and Accurate Pinpointing of Configuration Errors using Static Analysis},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {171--180},
doi = {},
year = {2015},
}
Deterministic Dynamic Race Detection Across Program Versions
Sri Varun Poluri and
Murali Krishna Ramanathan
(Indian Institute of Science, India)
Dynamic race detectors operate by analyzing execution traces of programs to detect races in multithreaded programs. As the thread interleavings influence these traces, the sets of races detected across multiple runs of the detector can vary. This non-determinism, without any change in program source or input, can reduce programmer confidence in using the detector. From an organizational perspective, a defect needs to be reported consistently until it is fixed. Non-determinism complicates the workflow, and the problem is further exacerbated by modifications to the program. In this paper, we propose a framework for deterministic dynamic race detection that ensures detection of races until they are fixed, even across program versions. The design attempts to preserve the racy behavior under changes to the program source that include the addition (and deletion) of locks and shared memory accesses. We record, transform, and replay the schedules across program versions intelligently to achieve this goal. We have implemented a framework, named STABLER, and evaluated our ideas by applying popular race detectors (DJIT+, FastTrack) to different versions of many open-source multithreaded Java programs. Our experimental results show that we are able to detect all the unfixed races consistently across major releases of the program. For the two detectors, the maximum slowdown incurred by our framework for record and replay is 1.2x and 2.29x, respectively. We also performed user experiments in which volunteers fixed a significant number of races. In spite of these changes, our framework is effective in its ability to detect all the unfixed races.
@InProceedings{ICSME15p181,
author = {Sri Varun Poluri and Murali Krishna Ramanathan},
title = {Deterministic Dynamic Race Detection Across Program Versions},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {181--190},
doi = {},
year = {2015},
}
Program Specialization and Verification using File Format Specifications
Raveendra Kumar Medicherla,
Raghavan Komondoor, and S. Narendran
(Tata Consultancy Services, India; Indian Institute of Science, India)
Programs that process data residing in files are widely used in varied domains, such as banking, healthcare, and web-traffic analysis. Precise static analysis of these programs in the context of software transformation and verification tasks is a challenging problem. Our key insight is that static analysis of file-processing programs can be made more useful if knowledge of the input file formats of these programs is made available to the analysis. We instantiate this idea to solve two practical problems: specializing the code of a program to a given ``restricted'' input file format, and verifying whether a program ``conforms'' to a given input file format. We then discuss an implementation of our approach, as well as empirical results on a set of real and realistic programs. The results are very encouraging in terms of both the scalability and the precision of the approach.
@InProceedings{ICSME15p191,
author = {Raveendra Kumar Medicherla and Raghavan Komondoor and S. Narendran},
title = {Program Specialization and Verification using File Format Specifications},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {191--200},
doi = {},
year = {2015},
}
Refactoring
Wed, Sep 30, 13:50 - 15:30, GW2 B3009 (Chair: Romain Robbes)
An Empirical Evaluation of Ant Build Maintenance using Formiga
Ryan Hardt and
Ethan V. Munson
(University of Wisconsin-Eau Claire, USA; University of Wisconsin-Milwaukee, USA)
As a software project evolves, so does its build system. Significant effort is necessary to maintain the build system to cope with this evolution, in part because changes to source code often require parallel changes in the build system. Our tool, Formiga, is a build maintenance and dependency discovery tool for the Ant build system. Formiga's primary uses are to automate build changes as the source code is updated, to identify the build dependencies within a software project, and to assist with build refactoring. Formiga is implemented as an IDE plugin, which allows it to recognize when project resources are updated and automatically update the build system accordingly. A controlled experiment was conducted to assess Formiga's ability to assist developers with build maintenance. Subjects responded to scenarios in which various forms of build maintenance and/or knowledge about deliverables and their contents were requested. Subjects completed eleven different build maintenance tasks in both an experimental condition using Formiga and a control condition using only conventional IDE services. The study used a balanced design relative to both task order and use of Formiga. This design also ensured that order balancing was not confounded with the subjects' level of Ant expertise. Formiga was shown to significantly reduce the time required to perform build maintenance while increasing the correctness with which it can be performed for both novice and experienced developers.
@InProceedings{ICSME15p201,
author = {Ryan Hardt and Ethan V. Munson},
title = {An Empirical Evaluation of Ant Build Maintenance using Formiga},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {201--210},
doi = {},
year = {2015},
}
Scripting Parametric Refactorings in Java to Retrofit Design Patterns
Jongwook Kim, Don Batory, and
Danny Dig
(University of Texas at Austin, USA; Oregon State University, USA)
Retrofitting design patterns into a program by hand is tedious and error-prone. A programmer must distinguish refactorings that are provided by an Integrated Development Environment (IDE) from those that must be realized manually, determine a precise sequence of refactorings to apply, and perform this sequence repeatedly, a laborious process. We designed, implemented, and evaluated Reflective Refactoring (R2), a Java package to automate the creation of classical design patterns (Visitor, Abstract Factory, etc.), their inverses, and variants. We encoded 18 of the 23 Gang-of-Four design patterns as R2 scripts and explain why the remaining five are inappropriate for refactoring engines. We evaluate the productivity and scalability of R2 in a case study of 6 real-world applications. In one case, R2 automatically created a Visitor with 276 visit methods by invoking 554 Eclipse refactorings in 10 minutes, an achievement that would be infeasible by hand. R2 also sheds light on why refactoring correctness, expressiveness, and speed are critical issues for scripting in next-generation refactoring engines.
@InProceedings{ICSME15p211,
author = {Jongwook Kim and Don Batory and Danny Dig},
title = {Scripting Parametric Refactorings in Java to Retrofit Design Patterns},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {211--220},
doi = {},
year = {2015},
}
System Specific, Source Code Transformations
Gustavo Santos,
Nicolas Anquetil, Anne Etien,
Stéphane Ducasse, and
Marco Tulio Valente
(INRIA, France; Federal University of Minas Gerais, Brazil)
During its lifetime, a software system might undergo a major transformation effort in its structure, for example to migrate to a new architecture or to bring drastic improvements to the system. Particularly in this context, we found evidence that some sequences of code changes are made in a systematic way. These sequences are composed of small code transformations (e.g., create a class, move a method) which are repeatedly applied to groups of related entities (e.g., a class and some of its methods). A typical example is the systematic introduction of a Factory design pattern on the classes of a package. We define these sequences as transformation patterns. In this paper, we identify examples of transformation patterns in real-world software systems and study their properties: (i) they are specific to a system; (ii) they were applied manually; (iii) they were not always applied to all the software entities which could have been transformed; (iv) they were sometimes complex; and (v) they were not always applied in one shot but over several releases. These results suggest that transformation patterns could benefit from automated support in their application. From this study, we propose as future work to develop a macro recorder, a tool with which a developer records a sequence of code transformations and then automatically applies them in other parts of the system as a customizable, large-scale transformation operator.
@InProceedings{ICSME15p221,
author = {Gustavo Santos and Nicolas Anquetil and Anne Etien and Stéphane Ducasse and Marco Tulio Valente},
title = {System Specific, Source Code Transformations},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {221--230},
doi = {},
year = {2015},
}
A Decision Support System to Refactor Class Cycles
Tosin Daniel Oyetoyan, Daniela Soares Cruzes, and Christian Thurmann-Nielsen
(NTNU, Norway; SINTEF, Norway; EVRY, Norway)
Many studies show that real-world systems are riddled with large dependency cycles among software classes. Dependency cycles are claimed to affect quality factors such as testability, extensibility, modifiability, and reusability. Recent studies reveal that most defects are concentrated in classes that are in and near cycles. In this paper, we (1) propose a new metric, IRCRSS, based on the Class Reachability Set Size (CRSS), to identify the reduction ratio between the CRSS of a class and that of its interfaces, and (2) present a cycle-breaking decision support system (CB-DSS) that implements existing design approaches in combination with class edge contextual data. Evaluations of multiple systems show that (1) the IRCRSS metric can be used to identify fewer classes as candidates for breaking large cycles, thus reducing refactoring effort, and (2) the CB-DSS can assist software engineers in planning the restructuring of classes involved in complex dependency cycles.
@InProceedings{ICSME15p231,
author = {Tosin Daniel Oyetoyan and Daniela Soares Cruzes and Christian Thurmann-Nielsen},
title = {A Decision Support System to Refactor Class Cycles},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {231--240},
doi = {},
year = {2015},
}
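The Class Reachability Set Size (CRSS) underlying the IRCRSS metric lends itself to a small illustration. The sketch below (hypothetical class names and dependency graph; not the authors' CB-DSS implementation) computes CRSS as the number of classes reachable from a class along dependency edges; classes inside a cycle, like A, B, and C here, all reach each other and thus share inflated CRSS values:

```python
from collections import deque

def crss(graph, cls):
    """Class Reachability Set Size: the number of classes reachable
    from `cls` by following dependency edges (excluding `cls` itself)."""
    seen, queue = {cls}, deque([cls])
    while queue:
        for dep in graph.get(queue.popleft(), ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return len(seen) - 1

# Hypothetical dependency graph: A -> B -> C -> A forms a cycle; C also uses D.
deps = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print({c: crss(deps, c) for c in deps})  # A, B, C each reach 3 classes; D reaches 0
```

Breaking the A-B-C cycle (e.g., by introducing an interface) would sharply reduce the CRSS of its member classes, which is the kind of reduction the IRCRSS ratio is designed to surface.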
Code Mining and Recommendation
Thu, Oct 1, 10:40 - 12:20, GW2 B3009 (Chair: David Shepherd)
On the Role of Developer's Scattered Changes in Bug Prediction
Dario Di Nucci,
Fabio Palomba, Sandro Siravo,
Gabriele Bavota,
Rocco Oliveto, and
Andrea De Lucia
(University of Salerno, Italy; University of Molise, Italy; Free University of Bolzano, Italy)
The importance of human-related factors in the introduction of bugs has recently been the subject of a number of empirical studies. However, such factors have not yet been captured in bug prediction models, which simply exploit product metrics or process metrics based on the number and type of changes or on the number of developers working on a software component. Previous studies have demonstrated that focused developers are less prone to introducing defects than non-focused developers. According to this observation, software components changed by focused developers should also be less error-prone than software components changed by less focused developers. In this paper we capture this observation by measuring the structural and semantic scattering of the changes performed by the developers working on a software component, and we use these two measures to build a bug prediction model. This model has been evaluated on five open-source systems and compared with two competing prediction models: the first exploits the number of developers working on a code component in a given time period as a predictor, while the second is based on the concept of code change entropy. The achieved results show the superiority of our model with respect to the two competing approaches, and the complementarity of the defined scattering measures with respect to standard predictors commonly used in the literature.
@InProceedings{ICSME15p241,
author = {Dario Di Nucci and Fabio Palomba and Sandro Siravo and Gabriele Bavota and Rocco Oliveto and Andrea De Lucia},
title = {On the Role of Developer's Scattered Changes in Bug Prediction},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {241--250},
doi = {},
year = {2015},
}
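The paper defines precise structural and semantic scattering measures; as a rough, hypothetical proxy for the structural one, the sketch below averages the pairwise package-path distance among the files a developer changed in a time window, so edits confined to one package score 0 while edits spread across distant packages score higher (illustrative only, not the authors' exact formula):

```python
from itertools import combinations

def package_distance(f1, f2):
    """Steps up and down the package tree between the two files' packages."""
    a, b = f1.split("/")[:-1], f2.split("/")[:-1]
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)

def structural_scattering(changed_files):
    """Average pairwise package distance of a developer's changed files."""
    pairs = list(combinations(changed_files, 2))
    if not pairs:
        return 0.0
    return sum(package_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical change set spanning app/ui and app/net.
files = ["app/ui/Menu.java", "app/ui/Button.java", "app/net/Http.java"]
print(structural_scattering(files))  # average of distances 0, 2, 2 -> 4/3
```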
How Do Developers React to API Evolution? The Pharo Ecosystem Case
André Hora,
Romain Robbes,
Nicolas Anquetil, Anne Etien,
Stéphane Ducasse, and
Marco Tulio Valente
(Federal University of Minas Gerais, Brazil; University of Chile, Chile; INRIA, France)
Software engineering research now considers that no system is an island: each system is part of an ecosystem involving other systems, developers, users, hardware, and so on. When one system (e.g., a framework) evolves, its clients often need to adapt: client developers might need to adapt to new functionality, client systems might need to be adapted to a new API, and client users might need to adapt to a new user interface. The consequences of such changes are as yet unclear: what proportion of the ecosystem might be expected to react? How long might it take for a change to diffuse through the ecosystem? Do all clients react in the same way? This paper reports on an exploratory study aimed at observing API evolution and its impact on a large-scale software ecosystem, Pharo, which has about 3,600 distinct systems, more than 2,800 contributors, and six years of evolution. We analyze 118 API changes and answer research questions regarding the magnitude, duration, extension, and consistency of such changes in the ecosystem. The results of this study help to characterize the impact of API evolution in large software ecosystems, and provide the basis to better understand how such impact can be alleviated.
@InProceedings{ICSME15p251,
author = {André Hora and Romain Robbes and Nicolas Anquetil and Anne Etien and Stéphane Ducasse and Marco Tulio Valente},
title = {How Do Developers React to API Evolution? The Pharo Ecosystem Case},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {251--260},
doi = {},
year = {2015},
}
Who Should Review This Change?: Putting Text and File Location Analyses Together for More Accurate Recommendations
Xin Xia,
David Lo, Xinyu Wang, and Xiaohu Yang
(Zhejiang University, China; Singapore Management University, Singapore)
Software code review is a process in which developers inspect new code changes made by others to evaluate their quality and to identify and fix defects before integrating the changes into the main branch of a version control system. Modern Code Review (MCR), a lightweight and tool-based variant of conventional code review, is widely adopted in both open-source and proprietary software projects. One challenge that impacts MCR is the assignment of appropriate developers to review a code change. Considering that there could be hundreds of potential code reviewers in a software project, picking suitable reviewers is not a straightforward task. A prior study by Thongtanunam et al. showed that the difficulty in selecting suitable reviewers may delay the review process by an average of 12 days. In this paper, to address the challenge of assigning suitable reviewers to changes, we propose a hybrid and incremental approach, TIE, which combines the advantages of text mining and a file location-based approach. To do this, TIE integrates an incremental text mining model, which analyzes the textual contents of a review request, and a similarity model, which measures the similarity of changed file paths and previously reviewed file paths. We perform a large-scale experiment on four open-source projects, namely Android, OpenStack, QT, and LibreOffice, containing a total of 42,045 reviews. The experimental results show that on average TIE achieves top-1, top-5, and top-10 accuracies and a Mean Reciprocal Rank (MRR) of 0.52, 0.79, 0.85, and 0.64 across the four projects, improving on the state-of-the-art approach RevFinder, proposed by Thongtanunam et al., by 61%, 23%, 8%, and 37%, respectively.
@InProceedings{ICSME15p261,
author = {Xin Xia and David Lo and Xinyu Wang and Xiaohu Yang},
title = {Who Should Review This Change?: Putting Text and File Location Analyses Together for More Accurate Recommendations},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {261--270},
doi = {},
year = {2015},
}
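TIE's file-location signal can be sketched in a few lines. The toy below (hypothetical reviewer history and a simplified common-prefix path similarity; not the authors' implementation) ranks reviewers by how closely the paths they previously reviewed match the changed paths:

```python
def path_similarity(p1, p2):
    """Fraction of path components shared as a common prefix."""
    a, b = p1.split("/"), p2.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

def rank_reviewers(changed_files, history):
    """history: reviewer -> list of previously reviewed file paths."""
    scores = {}
    for reviewer, past in history.items():
        scores[reviewer] = sum(
            max(path_similarity(c, p) for p in past) for c in changed_files
        )
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical review history: alice reviewed UI code, bob networking code.
hist = {
    "alice": ["ui/widgets/button.java", "ui/widgets/label.java"],
    "bob": ["net/http/client.java"],
}
print(rank_reviewers(["ui/widgets/menu.java"], hist))  # alice ranks first
```

In TIE this location score is combined with the output of the text mining model; only the location half is shown here.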
Exploring API Method Parameter Recommendations
Muhammad Asaduzzaman, Chanchal K. Roy, Samiul Monir, and Kevin A. Schneider
(University of Saskatchewan, Canada)
A number of techniques have been developed that support method call completion. However, there has been little research on the problem of method parameter completion. In this paper, we first present a study that helps us to understand how developers complete method parameters. Based on our observations, we developed a recommendation technique, called Parc, that collects parameter usage context using a source code localness property that suggests that developers tend to collocate related code fragments. Parc uses previous code examples together with contextual and static type analysis to recommend method parameters. Evaluating our technique against the only available state-of-the-art tool using a number of subject systems and different Java libraries shows that our approach has potential. We also explore the parameter recommendation support provided by the Eclipse Java Development Tools (JDT). Finally, we discuss limitations of our proposed technique and outline future research directions.
@InProceedings{ICSME15p271,
author = {Muhammad Asaduzzaman and Chanchal K. Roy and Samiul Monir and Kevin A. Schneider},
title = {Exploring API Method Parameter Recommendations},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {271--280},
doi = {},
year = {2015},
}
Mobile Applications
Thu, Oct 1, 13:50 - 15:30, GW2 B3009 (Chair: Peng Xin)
How Can I Improve My App? Classifying User Reviews for Software Maintenance and Evolution
Sebastiano Panichella,
Andrea Di Sorbo, Emitza Guzman, Corrado A. Visaggio,
Gerardo Canfora, and
Harald C. Gall
(University of Zurich, Switzerland; University of Sannio, Italy; TU München, Germany)
App stores, such as Google Play or Apple's App Store, allow users to provide feedback on apps by posting review comments and giving star ratings. These platforms constitute a useful electronic means by which application developers and users can productively exchange information about apps. Previous research showed that user feedback contains usage scenarios, bug reports, and feature requests that can help app developers accomplish software maintenance and evolution tasks. However, in the case of the most popular apps, the large amount of received feedback, its unstructured nature, and its varying quality can make the identification of useful user feedback a very challenging task. In this paper we present a taxonomy to classify app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques: (1) Natural Language Processing, (2) Text Analysis, and (3) Sentiment Analysis to automatically classify app reviews into the proposed categories. We show that the combined use of these techniques achieves better results (a precision of 75% and a recall of 74%) than using each technique individually (a precision of 70% and a recall of 67%).
@InProceedings{ICSME15p281,
author = {Sebastiano Panichella and Andrea Di Sorbo and Emitza Guzman and Corrado A. Visaggio and Gerardo Canfora and Harald C. Gall},
title = {How Can I Improve My App? Classifying User Reviews for Software Maintenance and Evolution},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {281--290},
doi = {},
year = {2015},
}
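To make the classification target concrete, the toy below maps raw reviews onto maintenance-relevant categories with a simple keyword heuristic over hypothetical rules; the paper's actual approach instead combines NLP patterns, text analysis, and sentiment analysis:

```python
# Hypothetical keyword rules for two of the maintenance-relevant categories.
RULES = {
    "bug report": ["crash", "bug", "error", "freeze"],
    "feature request": ["add", "wish", "would be great", "please support"],
}

def classify(review):
    """Return every category whose keywords appear in the review."""
    text = review.lower()
    labels = [cat for cat, kws in RULES.items() if any(k in text for k in kws)]
    return labels or ["other"]

print(classify("The app crashes when I rotate the screen"))  # ['bug report']
print(classify("Please support dark mode"))                  # ['feature request']
```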
User Reviews Matter! Tracking Crowdsourced Reviews to Support Evolution of Successful Apps
Fabio Palomba, Mario Linares-Vásquez,
Gabriele Bavota,
Rocco Oliveto,
Massimiliano Di Penta,
Denys Poshyvanyk, and
Andrea De Lucia
(University of Salerno, Italy; College of William and Mary, USA; Free University of Bolzano, Italy; University of Molise, Italy; University of Sannio, Italy)
Nowadays software applications, and especially mobile apps, undergo frequent release updates through app stores. After installing or updating apps, users can post reviews and provide ratings, expressing their level of satisfaction with the apps and possibly pointing out bugs or desired features. In this paper we show, by performing a study on 100 Android apps, how developers who address user reviews increase their app's success in terms of ratings. Specifically, we devise an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings. The results indicate that developers implementing user reviews are rewarded in terms of ratings. This highlights the need for specialized recommendation systems aimed at analyzing informative crowd reviews and prioritizing the feedback to be satisfied in order to increase the apps' success.
@InProceedings{ICSME15p291,
author = {Fabio Palomba and Mario Linares-Vásquez and Gabriele Bavota and Rocco Oliveto and Massimiliano Di Penta and Denys Poshyvanyk and Andrea De Lucia},
title = {User Reviews Matter! Tracking Crowdsourced Reviews to Support Evolution of Successful Apps},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {291--300},
doi = {},
year = {2015},
}
What Are the Characteristics of High-Rated Apps? A Case Study on Free Android Applications
Yuan Tian,
Meiyappan Nagappan,
David Lo, and Ahmed E. Hassan
(Singapore Management University, Singapore; Rochester Institute of Technology, USA; Queen's University, Canada)
The tremendous rate of growth in the mobile app market over the past few years has attracted many developers to build mobile apps. However, while there is no shortage of stories of how lone developers have made great fortunes from their apps, the majority of developers are struggling to break even. For those struggling developers, knowing the ``DNA'' (i.e., characteristics) of high-rated apps is the first step towards the successful development and evolution of their own apps. In this paper, we investigate 28 factors along eight dimensions to understand how high-rated apps differ from low-rated apps. We also investigate which factors are the most influential by applying a random-forest classifier to identify high-rated apps. Through a case study on 1,492 high-rated and low-rated free apps mined from the Google Play store, we find that high-rated apps are statistically significantly different in 17 out of the 28 factors that we considered. Our experiment also shows that the size of an app, the number of promotional images that the app displays on its web store page, and the target SDK version of an app are the most influential factors.
@InProceedings{ICSME15p301,
author = {Yuan Tian and Meiyappan Nagappan and David Lo and Ahmed E. Hassan},
title = {What Are the Characteristics of High-Rated Apps? A Case Study on Free Android Applications},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {301--310},
doi = {},
year = {2015},
}
GreenAdvisor: A Tool for Analyzing the Impact of Software Evolution on Energy Consumption
Karan Aggarwal, Abram Hindle, and Eleni Stroulia
(University of Alberta, Canada)
Change-impact analysis, namely "identifying the potential consequences of a change", is an important and well-studied problem in software evolution. Any change may potentially affect an application's behaviour, performance, and energy-consumption profile. Our previous work demonstrated that changes to the system-call profile of an application correlate with changes to the application's energy-consumption profile. This paper describes and evaluates GreenAdvisor, a first-of-its-kind tool that systematically records and analyzes an application's system calls to predict whether the energy-consumption profile of an application has changed. We distributed GreenAdvisor to numerous software teams developing Android applications and surveyed their members about their experience using it; we also evaluated GreenAdvisor against selected commits from the teams' projects to examine their energy-consumption impact. The two studies confirm the usefulness of our tool in helping developers analyze and understand the energy-consumption profile changes of a new version. Based on our study findings, we constructed an improved prediction model to forecast the direction of the change when a change in the energy-consumption profile is anticipated. This work can potentially be extremely useful to developers, who currently have no similar tools.
@InProceedings{ICSME15p311,
author = {Karan Aggarwal and Abram Hindle and Eleni Stroulia},
title = {GreenAdvisor: A Tool for Analyzing the Impact of Software Evolution on Energy Consumption},
booktitle = {Proc.\ ICSME},
publisher = {IEEE},
pages = {311--320},
doi = {},
year = {2015},
}