CSMR-WCRE 2014 – Author Index
Abbasi, Ebrahim Khalil
CSMR-WCRE '14: "Reverse Engineering Web Configurators ..."
Reverse Engineering Web Configurators
Ebrahim Khalil Abbasi, Mathieu Acher, Patrick Heymans, and Anthony Cleve (University of Namur, Belgium; University of Rennes 1, France) A Web configurator offers a highly interactive environment to assist users in customizing sales products through the selection of configuration options. Our previous empirical study revealed that a significant number of configurators are suboptimal in reliability, efficiency, and maintainability, opening avenues for re-engineering support and methodologies. This paper presents a tool-supported reverse-engineering process to semi-automatically extract configuration-specific data from a legacy Web configurator. The extracted and structured data is stored in formal models (e.g., variability models) and can be used in a forward-engineering process to generate a customized interface with an underlying reliable reasoning engine. Two major components are presented: (1) a Web Wrapper that extracts structured configuration-specific data from unstructured or semi-structured Web pages of a configurator, and (2) a Web Crawler that explores the "configuration space" (i.e., all objects representing configuration-specific data) and simulates users' configuration actions. We describe variability data extraction patterns, used on top of the Wrapper and the Crawler to extract configuration data. Experimental results on five existing Web configurators show that the specification of a few variability patterns enables the identification of hundreds of options. @InProceedings{CSMR-WCRE14p264, author = {Ebrahim Khalil Abbasi and Mathieu Acher and Patrick Heymans and Anthony Cleve}, title = {Reverse Engineering Web Configurators}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {264--273}, doi = {}, year = {2014}, }
Acher, Mathieu
CSMR-WCRE '14: "Reverse Engineering Web Configurators ..."
Reverse Engineering Web Configurators
Ebrahim Khalil Abbasi, Mathieu Acher, Patrick Heymans, and Anthony Cleve (University of Namur, Belgium; University of Rennes 1, France) A Web configurator offers a highly interactive environment to assist users in customizing sales products through the selection of configuration options. Our previous empirical study revealed that a significant number of configurators are suboptimal in reliability, efficiency, and maintainability, opening avenues for re-engineering support and methodologies. This paper presents a tool-supported reverse-engineering process to semi-automatically extract configuration-specific data from a legacy Web configurator. The extracted and structured data is stored in formal models (e.g., variability models) and can be used in a forward-engineering process to generate a customized interface with an underlying reliable reasoning engine. Two major components are presented: (1) a Web Wrapper that extracts structured configuration-specific data from unstructured or semi-structured Web pages of a configurator, and (2) a Web Crawler that explores the "configuration space" (i.e., all objects representing configuration-specific data) and simulates users' configuration actions. We describe variability data extraction patterns, used on top of the Wrapper and the Crawler to extract configuration data. Experimental results on five existing Web configurators show that the specification of a few variability patterns enables the identification of hundreds of options. @InProceedings{CSMR-WCRE14p264, author = {Ebrahim Khalil Abbasi and Mathieu Acher and Patrick Heymans and Anthony Cleve}, title = {Reverse Engineering Web Configurators}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {264--273}, doi = {}, year = {2014}, }
Alawneh, Luay
CSMR-WCRE '14: "A Contextual Approach for ..."
A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces
Luay Alawneh, Abdelwahab Hamou-Lhadj, Syed Shariyar Murtaza, and Yan Liu (Concordia University, Canada; Jordan University of Science and Technology, Jordan) Studies have shown that understanding inter-process communication patterns is an enabler of effective analysis of high performance computing (HPC) applications. In previous work, we presented an algorithm for recovering communication patterns from traces of HPC systems. The algorithm worked well on small cases, but it suffered from low accuracy when applied to large (and the most interesting) traces. We believe this was because we viewed the trace as a mere string of inter-process communication operations; that is, we did not take program control flow information into account. In this paper, we improve the detection accuracy by using function calls as a context to guide the pattern extraction process. When applied to traces generated from two HPC benchmark applications, we demonstrate that this contextual approach improves precision and recall by an average of 56% and 66%, respectively, over the non-contextual method. @InProceedings{CSMR-WCRE14p274, author = {Luay Alawneh and Abdelwahab Hamou-Lhadj and Syed Shariyar Murtaza and Yan Liu}, title = {A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {274--282}, doi = {}, year = {2014}, }
Anquetil, Nicolas
CSMR-WCRE '14: "Remodularization Analysis ..."
Remodularization Analysis using Semantic Clustering
Gustavo Santos, Marco Tulio Valente, and Nicolas Anquetil (UFMG, Brazil; INRIA, France) In this paper, we report on our experience using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis. We evaluate our adaptation using six real-world remodularizations of four software systems. We report that Semantic Clustering and conceptual metrics can be used to express and explain the intention of the architects when performing common modularization operators, such as module decomposition. @InProceedings{CSMR-WCRE14p224, author = {Gustavo Santos and Marco Tulio Valente and Nicolas Anquetil}, title = {Remodularization Analysis using Semantic Clustering}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {224--233}, doi = {}, year = {2014}, }
Antinyan, Vard
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers' requests. In this dynamic environment, the continuous modification of software code can pose risks for software developers: when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify risky areas and assess the risk when developing software code in a Lean/Agile environment. We have conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, the complexity and the number of revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as release-wise. A measurement system for systematic risk assessment has been introduced at the two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, }
Antoniol, Giuliano
CSMR-WCRE '14: "In Medio Stat Virtus: Extract ..."
In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria
Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, Andrian Marcus, Yann-Gaël Guéhéneuc, and Giuliano Antoniol (University of Salerno, Italy; University of Molise, Italy; Wayne State University, USA; Polytechnique Montréal, Canada) Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increase in cohesion and the undesired increase in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer's point of view. @InProceedings{CSMR-WCRE14p214, author = {Gabriele Bavota and Rocco Oliveto and Andrea De Lucia and Andrian Marcus and Yann-Gaël Guéhéneuc and Giuliano Antoniol}, title = {In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {214--223}, doi = {}, year = {2014}, }
Bavota, Gabriele
CSMR-WCRE '14: "In Medio Stat Virtus: Extract ..."
In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria
Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, Andrian Marcus, Yann-Gaël Guéhéneuc, and Giuliano Antoniol (University of Salerno, Italy; University of Molise, Italy; Wayne State University, USA; Polytechnique Montréal, Canada) Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increase in cohesion and the undesired increase in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer's point of view. @InProceedings{CSMR-WCRE14p214, author = {Gabriele Bavota and Rocco Oliveto and Andrea De Lucia and Andrian Marcus and Yann-Gaël Guéhéneuc and Giuliano Antoniol}, title = {In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {214--223}, doi = {}, year = {2014}, }
Beszédes, Árpád
CSMR-WCRE '14: "Test Suite Reduction for Fault ..."
Test Suite Reduction for Fault Detection and Localization: A Combined Approach
László Vidács, Árpád Beszédes, Dávid Tengeri, István Siket, and Tibor Gyimóthy (MTA-SZTE Research Group on Artificial Intelligence, Hungary; University of Szeged, Hungary) The relation between test suites and actual faults in software is of critical importance for timely product release. Two properties of test suites are particularly critical to this end: fault localization capability, which characterizes the effort of finding the actually defective program elements, and fault detection capability, which measures how probable their manifestation and detection are in the first place. While there are well-established methods to predict fault detection capability (by measuring code coverage, for instance), the characterization of fault localization is an emerging research topic. In this work, we investigate the effect of different test reduction methods on the performance of fault localization and detection techniques. We also provide new combined methods that incorporate both localization and detection aspects. We empirically evaluate the methods, first by measuring detection and localization metrics of test suites with various reduction sizes, and then by examining how reduced test suites perform with actual faults. We experiment with SIR programs traditionally used in fault localization research, and extend the case study with large industrial software systems including GCC and WebKit. @InProceedings{CSMR-WCRE14p204, author = {László Vidács and Árpád Beszédes and Dávid Tengeri and István Siket and Tibor Gyimóthy}, title = {Test Suite Reduction for Fault Detection and Localization: A Combined Approach}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {204--213}, doi = {}, year = {2014}, }
Brada, Premek
CSMR-WCRE '14: "Broken Promises: An Empirical ..."
Broken Promises: An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades
Jens Dietrich, Kamil Jezek, and Premek Brada (Massey University, New Zealand; University of West Bohemia, Czech Republic) It has become common practice to build programs by using libraries. While the benefits of reuse are well known, an often overlooked risk is system runtime failure due to API changes in libraries that evolve independently. Traditionally, the consistency between a program and the libraries it uses is checked at build time, when the entire system is compiled and tested. However, the trend towards partially upgrading systems by redeploying only evolved library versions results in situations where these crucial verification steps are skipped. For Java programs, partial upgrades create additional interesting problems, as the compiler and the virtual machine use different rule sets to enforce contracts between the providers and the consumers of APIs. We have studied the extent of the problem on the Qualitas Corpus, a data set consisting of Java open-source programs widely used in empirical studies. In this paper, we describe the study and report its key findings. We found that the above-mentioned issues do occur in practice, albeit not on a wide scale. @InProceedings{CSMR-WCRE14p64, author = {Jens Dietrich and Kamil Jezek and Premek Brada}, title = {Broken Promises: An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {64--73}, doi = {}, year = {2014}, }
Brandtner, Martin
CSMR-WCRE '14: "Supporting Continuous Integration ..."
Supporting Continuous Integration by Mashing-Up Software Quality Information
Martin Brandtner, Emanuel Giger, and Harald Gall (University of Zurich, Switzerland) Continuous Integration (CI) has become an established best practice of modern software development. Its philosophy of regularly integrating the changes of individual developers with the mainline code base saves the entire development team from descending into Integration Hell, a term coined in the field of extreme programming. In practice, CI is supported by automated tools to cope with this repeated integration of source code through automated builds, testing, and deployments. Currently available products, for example Jenkins-CI, SonarQube, or GitHub, allow for the implementation of a seamless CI process. One of the main problems, however, is that relevant information about the quality and health of a software system is scattered both across those tools and across multiple views. We address this challenging problem by raising awareness of quality aspects and tailoring this information to particular stakeholders, such as developers or testers. For that we present a quality awareness framework and platform called SQA-Mashup. It makes use of the service-based mashup paradigm and integrates information from the entire CI toolchain into a single service. To evaluate its usefulness we conducted a user study. It showed that SQA-Mashup's single point of access allows questions regarding the state of a system to be answered more quickly and accurately than with standalone CI tools. @InProceedings{CSMR-WCRE14p184, author = {Martin Brandtner and Emanuel Giger and Harald Gall}, title = {Supporting Continuous Integration by Mashing-Up Software Quality Information}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {184--193}, doi = {}, year = {2014}, }
Chen, Zhenyu
CSMR-WCRE '14: "Towards More Accurate Multi-label ..."
Towards More Accurate Multi-label Software Behavior Learning
Xin Xia, Yang Feng, David Lo, Zhenyu Chen, and Xinyu Wang (Zhejiang University, China; Nanjing University, China; Singapore Management University, Singapore) In a modern software system, when a program fails, a crash report containing an execution trace is sent to the software vendor for diagnosis. The failure corresponding to a crash report could be caused by multiple types of faults simultaneously. Many large companies, such as Baidu, organize a team to analyze these failures and classify them into multiple labels (i.e., multiple types of faults). However, it would be time-consuming and difficult for developers to manually analyze these failures and come up with appropriate fault labels. In this paper, we automatically classify a failure into multiple types of faults, using a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs and show that MLL-GA achieves average F-measures of 0.6078 to 0.8665. We also compare our algorithm with Ml.KNN and show that, on average across the 6 datasets, MLL-GA improves the average F-measure of Ml.KNN by 14.43%. @InProceedings{CSMR-WCRE14p134, author = {Xin Xia and Yang Feng and David Lo and Zhenyu Chen and Xinyu Wang}, title = {Towards More Accurate Multi-label Software Behavior Learning}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {134--143}, doi = {}, year = {2014}, }
Cleve, Anthony
CSMR-WCRE '14: "Reverse Engineering Web Configurators ..."
Reverse Engineering Web Configurators
Ebrahim Khalil Abbasi, Mathieu Acher, Patrick Heymans, and Anthony Cleve (University of Namur, Belgium; University of Rennes 1, France) A Web configurator offers a highly interactive environment to assist users in customizing sales products through the selection of configuration options. Our previous empirical study revealed that a significant number of configurators are suboptimal in reliability, efficiency, and maintainability, opening avenues for re-engineering support and methodologies. This paper presents a tool-supported reverse-engineering process to semi-automatically extract configuration-specific data from a legacy Web configurator. The extracted and structured data is stored in formal models (e.g., variability models) and can be used in a forward-engineering process to generate a customized interface with an underlying reliable reasoning engine. Two major components are presented: (1) a Web Wrapper that extracts structured configuration-specific data from unstructured or semi-structured Web pages of a configurator, and (2) a Web Crawler that explores the "configuration space" (i.e., all objects representing configuration-specific data) and simulates users' configuration actions. We describe variability data extraction patterns, used on top of the Wrapper and the Crawler to extract configuration data. Experimental results on five existing Web configurators show that the specification of a few variability patterns enables the identification of hundreds of options. @InProceedings{CSMR-WCRE14p264, author = {Ebrahim Khalil Abbasi and Mathieu Acher and Patrick Heymans and Anthony Cleve}, title = {Reverse Engineering Web Configurators}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {264--273}, doi = {}, year = {2014}, }
Conradi, Reidar
CSMR-WCRE '14: "Transition and Defect Patterns ..."
Transition and Defect Patterns of Components in Dependency Cycles during Software Evolution
Tosin Daniel Oyetoyan, Daniela Soares Cruzes, and Reidar Conradi (NTNU, Norway; SINTEF, Norway) The challenge of breaking existing cyclically connected components of running software is not trivial, since it involves planning and human resources to ensure that the software behavior is preserved after the refactoring activity. Therefore, to motivate refactoring it is essential to obtain evidence of its benefits to product quality. This study investigates the defect-proneness patterns of cyclically connected components vs. non-cyclic ones when they transition across software releases. We have mined and classified software components into two groups and two transition states: the cyclic and the non-cyclic ones. Next, we have performed an empirical study of four software systems from an evolutionary perspective. Using standard statistical tests on formulated hypotheses, we have determined the significance of the defect profiles and complexities of each group. The results show that during software evolution, components that transition between dependency cycles have a higher probability of being defect-prone than those that transition outside of cycles. Furthermore, out of the three complexity variables investigated, we found that an increase in the class reachability set size tends to be more associated with components that turn defective when they transition between dependency cycles. Lastly, we found no evidence of any systematic "cycle-breaking" refactoring between releases of the software systems. Thus, these findings motivate refactoring of components in dependency cycles, taking into account the minimization of metrics such as the class reachability set size. @InProceedings{CSMR-WCRE14p283, author = {Tosin Daniel Oyetoyan and Daniela Soares Cruzes and Reidar Conradi}, title = {Transition and Defect Patterns of Components in Dependency Cycles during Software Evolution}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {283--292}, doi = {}, year = {2014}, }
Cruzes, Daniela Soares
CSMR-WCRE '14: "Transition and Defect Patterns ..."
Transition and Defect Patterns of Components in Dependency Cycles during Software Evolution
Tosin Daniel Oyetoyan, Daniela Soares Cruzes, and Reidar Conradi (NTNU, Norway; SINTEF, Norway) The challenge of breaking existing cyclically connected components of running software is not trivial, since it involves planning and human resources to ensure that the software behavior is preserved after the refactoring activity. Therefore, to motivate refactoring it is essential to obtain evidence of its benefits to product quality. This study investigates the defect-proneness patterns of cyclically connected components vs. non-cyclic ones when they transition across software releases. We have mined and classified software components into two groups and two transition states: the cyclic and the non-cyclic ones. Next, we have performed an empirical study of four software systems from an evolutionary perspective. Using standard statistical tests on formulated hypotheses, we have determined the significance of the defect profiles and complexities of each group. The results show that during software evolution, components that transition between dependency cycles have a higher probability of being defect-prone than those that transition outside of cycles. Furthermore, out of the three complexity variables investigated, we found that an increase in the class reachability set size tends to be more associated with components that turn defective when they transition between dependency cycles. Lastly, we found no evidence of any systematic "cycle-breaking" refactoring between releases of the software systems. Thus, these findings motivate refactoring of components in dependency cycles, taking into account the minimization of metrics such as the class reachability set size. @InProceedings{CSMR-WCRE14p283, author = {Tosin Daniel Oyetoyan and Daniela Soares Cruzes and Reidar Conradi}, title = {Transition and Defect Patterns of Components in Dependency Cycles during Software Evolution}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {283--292}, doi = {}, year = {2014}, }
Csiszár, Norbert István
CSMR-WCRE '14: "Anti-pattern Detection with ..."
Anti-pattern Detection with Model Queries: A Comparison of Approaches
Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc (Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary) Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local search, and incremental evaluation. We provide an in-depth comparison of these techniques on the source code of 17 Java projects, using queries taken from refactoring operations in different usage profiles. Our results show that general-purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, the measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. @InProceedings{CSMR-WCRE14p293, author = {Zoltán Ujhelyi and Ákos Horváth and Dániel Varró and Norbert István Csiszár and Gábor Szőke and László Vidács and Rudolf Ferenc}, title = {Anti-pattern Detection with Model Queries: A Comparison of Approaches}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {293--302}, doi = {}, year = {2014}, }
Damevski, Kostadin
CSMR-WCRE '14: "A Case Study of Paired Interleaving ..."
A Case Study of Paired Interleaving for Evaluating Code Search Techniques
Kostadin Damevski, David Shepherd, and Lori Pollock (Virginia State University, USA; ABB, USA; University of Delaware, USA) Source code search tools are designed to help developers locate code relevant to their task. The effectiveness of a search technique often depends on properties of user queries, the code being searched, and the specific task at hand. Thus, new code search techniques should ideally be evaluated in realistic situations that closely reflect the complexity, purpose of use, and context encountered during actual search sessions. This paper explores what can be learned from using an online paired interleaving approach, originally used for evaluating internet search engines, to comparatively observe and assess the effectiveness of code search tools in the field. We present a case study in which we implemented online paired interleaving for code search, deployed the tool in an IDE for developers at multiple companies, and analyzed results from over 300 user queries during their daily software maintenance tasks. We leveraged the results to direct further improvement of a search technique, redeployed the tool, and analyzed results from over 600 queries to validate that an improvement in search was achieved in the field. We also report on the characteristics of user queries collected during the study, which are significantly different from queries currently used in evaluations. @InProceedings{CSMR-WCRE14p54, author = {Kostadin Damevski and David Shepherd and Lori Pollock}, title = {A Case Study of Paired Interleaving for Evaluating Code Search Techniques}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {54--63}, doi = {}, year = {2014}, }
De Lucia, Andrea
CSMR-WCRE '14: "Cross-Project Defect Prediction ..."
Cross-Project Defect Prediction Models: L'Union Fait la Force
Annibale Panichella, Rocco Oliveto, and Andrea De Lucia (University of Salerno, Italy; University of Molise, Italy) Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear winner. In this paper we present an empirical study aiming at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined as CODEP (COmbined DEfect Predictor), that employs the classification provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems and in the context of cross-project defect prediction, which represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors. @InProceedings{CSMR-WCRE14p164, author = {Annibale Panichella and Rocco Oliveto and Andrea De Lucia}, title = {Cross-Project Defect Prediction Models: L'Union Fait la Force}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {164--173}, doi = {}, year = {2014}, }
CSMR-WCRE '14: "In Medio Stat Virtus: Extract ..."
In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria
Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, Andrian Marcus, Yann-Gaël Guéhéneuc, and Giuliano Antoniol (University of Salerno, Italy; University of Molise, Italy; Wayne State University, USA; Polytechnique Montréal, Canada) Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increase in cohesion and the undesired increase in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer's point of view. @InProceedings{CSMR-WCRE14p214, author = {Gabriele Bavota and Rocco Oliveto and Andrea De Lucia and Andrian Marcus and Yann-Gaël Guéhéneuc and Giuliano Antoniol}, title = {In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {214--223}, doi = {}, year = {2014}, }
Dietrich, Jens |
CSMR-WCRE '14: "Broken Promises: An Empirical ..."
Broken Promises: An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades
Jens Dietrich, Kamil Jezek, and Premek Brada (Massey University, New Zealand; University of West Bohemia, Czech Republic) It has become common practice to build programs by using libraries. While the benefits of reuse are well known, an often overlooked risk is that of system runtime failures due to API changes in libraries that evolve independently. Traditionally, the consistency between a program and the libraries it uses is checked at build time when the entire system is compiled and tested. However, the trend towards partially upgrading systems by redeploying only evolved library versions results in situations where these crucial verification steps are skipped. For Java programs, partial upgrades create additional interesting problems as the compiler and the virtual machine use different rule sets to enforce contracts between the providers and the consumers of APIs. We have studied the extent of the problem on the Qualitas Corpus, a data set consisting of Java open-source programs widely used in empirical studies. In this paper, we describe the study and report its key findings. We found that the above-mentioned issues do occur in practice, albeit not on a wide scale. @InProceedings{CSMR-WCRE14p64, author = {Jens Dietrich and Kamil Jezek and Premek Brada}, title = {Broken Promises: An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {64--73}, doi = {}, year = {2014}, } Info |
|
Ding, Sun |
CSMR-WCRE '14: "Detecting Infeasible Branches ..."
Detecting Infeasible Branches Based on Code Patterns
Sun Ding, Hongyu Zhang, and Hee Beng Kuan Tan (Nanyang Technological University, Singapore; Tsinghua University, China) Infeasible branches are program branches that can never be exercised regardless of the inputs of the program. Detecting infeasible branches is important to many software engineering tasks such as test case generation and test coverage measurement. Applying full-scale symbolic evaluation to infeasible branch detection could be very costly, especially for a large software system. In this work, we propose a code pattern based method for detecting infeasible branches. We first introduce two general patterns that can characterize the source code containing infeasible branches. We then develop a tool, called Pattern-based method for Infeasible branch Detection (PIND), to detect infeasible branches based on the discovered code patterns. PIND only performs symbolic evaluation for the branches that exhibit the identified code patterns, thereby significantly reducing the number of symbolic evaluations required. We evaluate PIND from two aspects: accuracy and efficiency. The experimental results show that PIND can effectively and efficiently detect infeasible branches in real-world Java and Android programs. We also explore the application of PIND in measuring test case coverage. @InProceedings{CSMR-WCRE14p74, author = {Sun Ding and Hongyu Zhang and Hee Beng Kuan Tan}, title = {Detecting Infeasible Branches Based on Code Patterns}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {74--83}, doi = {}, year = {2014}, } |
|
Espinha, Tiago |
CSMR-WCRE '14: "Web API Growing Pains: Stories ..."
Web API Growing Pains: Stories from Client Developers and Their Code
Tiago Espinha, Andy Zaidman, and Hans-Gerhard Gross (Delft University of Technology, Netherlands) Web APIs provide a systematic and extensible approach for application-to-application interaction. Developers using web APIs are forced to accompany the API providers in their software evolution tasks. In order to understand the distress caused by this imposition on web API client developers, we perform a semi-structured interview with six such developers. We also investigate how major web API providers organize their API evolution, and we explore how this affects source code changes of their clients. Our exploratory study of the Twitter, Google Maps, Facebook and Netflix web APIs analyzes the state of web API evolution practices and provides insight into the impact of service evolution on client software. Our study is complemented with a set of observations regarding best practices for web API evolution. @InProceedings{CSMR-WCRE14p84, author = {Tiago Espinha and Andy Zaidman and Hans-Gerhard Gross}, title = {Web API Growing Pains: Stories from Client Developers and Their Code}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {84--93}, doi = {}, year = {2014}, } |
|
Fails, Jerry Alan |
CSMR-WCRE '14: "NL-Based Query Refinement ..."
NL-Based Query Refinement and Contextualized Code Search Results: A User Study
Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet (Montclair State University, USA) As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Source code search tools match a developer's keyword-style or natural language query with comments and identifiers in the source code to identify relevant methods that may need to be changed or understood to complete the maintenance task. In this search process, the developer faces a number of challenges: (1) formulating a query, (2) determining if the results are relevant, and (3) if the results are not relevant, reformulating the query. In this paper, we present an NL-based results view for searching source code for maintenance that helps address these challenges by integrating multiple feedback mechanisms into the search results view: prevalence of the query words in the result set, results grouped by NL-based information, as a result list, and suggested alternative query words. Our search technique is implemented as an Eclipse plug-in, CONQUER, and has been empirically validated by 18 Java developers. Our results show that users prefer CONQUER over a state-of-the-art search technique, requesting customization of the interface in future query reformulation techniques. @InProceedings{CSMR-WCRE14p34, author = {Emily Hill and Manuel Roldan-Vega and Jerry Alan Fails and Greg Mallet}, title = {NL-Based Query Refinement and Contextualized Code Search Results: A User Study}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {34--43}, doi = {}, year = {2014}, } Info |
|
Felgentreff, Tim |
CSMR-WCRE '14: "Follow the Path: Debugging ..."
Follow the Path: Debugging State Anomalies along Execution Histories
Michael Perscheid, Tim Felgentreff, and Robert Hirschfeld (HPI, Germany) To understand how observable failures come into being, back-in-time debuggers help developers by providing full access to past executions. However, such potentially large execution histories do not include any hints about failure causes. For that reason, developers are forced to ascertain unexpected state properties and wrong behavior completely on their own. Without deep program understanding, back-in-time debugging can end in countless and difficult questions about possible failure causes that consume a lot of time for following failures back to their root causes. In this paper, we present state navigation as a debugging guide that highlights unexpected state properties along execution histories. After deriving common object properties from the expected behavior of passing test cases, we generate likely invariants, compare them with the failing run, and map differences as state anomalies to the past execution. So, developers obtain a common thread through the large amount of run-time data which helps them to answer what causes the observable failure. We implement our completely automatic state navigation as part of our test-driven fault navigation and its Path tools framework. To evaluate our approach, we observe eight developers debugging four non-trivial failures. As a result, we find that our state navigation aids developers and decreases the time required for localizing the root cause of a failure. @InProceedings{CSMR-WCRE14p124, author = {Michael Perscheid and Tim Felgentreff and Robert Hirschfeld}, title = {Follow the Path: Debugging State Anomalies along Execution Histories}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {124--133}, doi = {}, year = {2014}, } Video Info |
|
Feng, Yang |
CSMR-WCRE '14: "Towards More Accurate Multi-label ..."
Towards More Accurate Multi-label Software Behavior Learning
Xin Xia, Yang Feng, David Lo , Zhenyu Chen, and Xinyu Wang (Zhejiang University, China; Nanjing University, China; Singapore Management University, Singapore) In a modern software system, when a program fails, a crash report which contains an execution trace would be sent to the software vendor for diagnosis. A crash report which corresponds to a failure could be caused by multiple types of faults simultaneously. Many large companies such as Baidu organize a team to analyze these failures, and classify them into multiple labels (i.e., multiple types of faults). However, it would be time-consuming and difficult for developers to manually analyze these failures and come up with appropriate fault labels. In this paper, we automatically classify a failure into multiple types of faults, using a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs and show that MLL-GA could achieve average F-measures of 0.6078 to 0.8665. We also compare our algorithm with Ml.KNN and show that on average across the 6 datasets, MLL-GA improves the average F-measure of Ml.KNN by 14.43%. @InProceedings{CSMR-WCRE14p134, author = {Xin Xia and Yang Feng and David Lo and Zhenyu Chen and Xinyu Wang}, title = {Towards More Accurate Multi-label Software Behavior Learning}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {134--143}, doi = {}, year = {2014}, } |
|
Ferenc, Rudolf |
CSMR-WCRE '14: "Anti-pattern Detection with ..."
Anti-pattern Detection with Model Queries: A Comparison of Approaches
Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc (Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary) Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local search, and incremental evaluation. We provide an in-depth comparison of these techniques on the source code of 17 Java projects using queries taken from refactoring operations in different usage profiles. Our results show that general-purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. @InProceedings{CSMR-WCRE14p293, author = {Zoltán Ujhelyi and Ákos Horváth and Dániel Varró and Norbert István Csiszár and Gábor Szőke and László Vidács and Rudolf Ferenc}, title = {Anti-pattern Detection with Model Queries: A Comparison of Approaches}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {293--302}, doi = {}, year = {2014}, } |
|
Gall, Harald |
CSMR-WCRE '14: "Supporting Continuous Integration ..."
Supporting Continuous Integration by Mashing-Up Software Quality Information
Martin Brandtner, Emanuel Giger, and Harald Gall (University of Zurich, Switzerland) Continuous Integration (CI) has become an established best practice of modern software development. Its philosophy of regularly integrating the changes of individual developers with the mainline code base saves the entire development team from descending into Integration Hell, a term coined in the field of extreme programming. In practice, CI is supported by automated tools to cope with this repeated integration of source code through automated builds, testing, and deployments. Currently available products, for example, Jenkins-CI, SonarQube or GitHub, allow for the implementation of a seamless CI-process. One of the main problems, however, is that relevant information about the quality and health of a software system is both scattered across those tools and across multiple views. We address this challenging problem by raising awareness of quality aspects and tailoring this information to particular stakeholders, such as developers or testers. For that we present a quality awareness framework and platform called SQA-Mashup. It makes use of the service-based mashup paradigm and integrates information from the entire CI-toolchain in a single service. To evaluate its usefulness we conducted a user study. It showed that SQA-Mashup’s single point of access makes it possible to answer questions regarding the state of a system more quickly and accurately than standalone CI-tools. @InProceedings{CSMR-WCRE14p184, author = {Martin Brandtner and Emanuel Giger and Harald Gall}, title = {Supporting Continuous Integration by Mashing-Up Software Quality Information}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {184--193}, doi = {}, year = {2014}, } Info |
|
Giger, Emanuel |
CSMR-WCRE '14: "Supporting Continuous Integration ..."
Supporting Continuous Integration by Mashing-Up Software Quality Information
Martin Brandtner, Emanuel Giger, and Harald Gall (University of Zurich, Switzerland) Continuous Integration (CI) has become an established best practice of modern software development. Its philosophy of regularly integrating the changes of individual developers with the mainline code base saves the entire development team from descending into Integration Hell, a term coined in the field of extreme programming. In practice, CI is supported by automated tools to cope with this repeated integration of source code through automated builds, testing, and deployments. Currently available products, for example, Jenkins-CI, SonarQube or GitHub, allow for the implementation of a seamless CI-process. One of the main problems, however, is that relevant information about the quality and health of a software system is both scattered across those tools and across multiple views. We address this challenging problem by raising awareness of quality aspects and tailoring this information to particular stakeholders, such as developers or testers. For that we present a quality awareness framework and platform called SQA-Mashup. It makes use of the service-based mashup paradigm and integrates information from the entire CI-toolchain in a single service. To evaluate its usefulness we conducted a user study. It showed that SQA-Mashup’s single point of access makes it possible to answer questions regarding the state of a system more quickly and accurately than standalone CI-tools. @InProceedings{CSMR-WCRE14p184, author = {Martin Brandtner and Emanuel Giger and Harald Gall}, title = {Supporting Continuous Integration by Mashing-Up Software Quality Information}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {184--193}, doi = {}, year = {2014}, } Info |
|
Gross, Hans-Gerhard |
CSMR-WCRE '14: "Web API Growing Pains: Stories ..."
Web API Growing Pains: Stories from Client Developers and Their Code
Tiago Espinha, Andy Zaidman, and Hans-Gerhard Gross (Delft University of Technology, Netherlands) Web APIs provide a systematic and extensible approach for application-to-application interaction. Developers using web APIs are forced to accompany the API providers in their software evolution tasks. In order to understand the distress caused by this imposition on web API client developers, we perform a semi-structured interview with six such developers. We also investigate how major web API providers organize their API evolution, and we explore how this affects source code changes of their clients. Our exploratory study of the Twitter, Google Maps, Facebook and Netflix web APIs analyzes the state of web API evolution practices and provides insight into the impact of service evolution on client software. Our study is complemented with a set of observations regarding best practices for web API evolution. @InProceedings{CSMR-WCRE14p84, author = {Tiago Espinha and Andy Zaidman and Hans-Gerhard Gross}, title = {Web API Growing Pains: Stories from Client Developers and Their Code}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {84--93}, doi = {}, year = {2014}, } |
|
Guéhéneuc, Yann-Gaël |
CSMR-WCRE '14: "In Medio Stat Virtus: Extract ..."
In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria
Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, Andrian Marcus, Yann-Gaël Guéhéneuc, and Giuliano Antoniol (University of Salerno, Italy; University of Molise, Italy; Wayne State University, USA; Polytechnique Montréal, Canada) Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identifies a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increment in cohesion and the undesired increment in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer’s point of view. @InProceedings{CSMR-WCRE14p214, author = {Gabriele Bavota and Rocco Oliveto and Andrea De Lucia and Andrian Marcus and Yann-Gaël Guéhéneuc and Giuliano Antoniol}, title = {In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {214--223}, doi = {}, year = {2014}, } |
|
Gyimóthy, Tibor |
CSMR-WCRE '14: "Test Suite Reduction for Fault ..."
Test Suite Reduction for Fault Detection and Localization: A Combined Approach
László Vidács, Árpád Beszédes, Dávid Tengeri, István Siket, and Tibor Gyimóthy (MTA-SZTE Research Group on Artificial Intelligence, Hungary; University of Szeged, Hungary) The relation of test suites and actual faults in software is of critical importance for timely product release. There are two particularly critical properties of test suites to this end: fault localization capability, to characterize the effort of finding the actually defective program elements, and fault detection capability, which measures how likely their manifestation and detection are in the first place. While there are well established methods to predict fault detection capability (by measuring code coverage, for instance), characterization of fault localization is an emerging research topic. In this work, we investigate the effect of different test reduction methods on the performance of fault localization and detection techniques. We also provide new combined methods that incorporate both localization and detection aspects. We empirically evaluate the methods first by measuring detection and localization metrics of test suites with various reduction sizes, followed by how reduced test suites perform with actual faults. We experiment with SIR programs traditionally used in fault localization research, and extend the case study with large industrial software systems including GCC and WebKit. @InProceedings{CSMR-WCRE14p204, author = {László Vidács and Árpád Beszédes and Dávid Tengeri and István Siket and Tibor Gyimóthy}, title = {Test Suite Reduction for Fault Detection and Localization: A Combined Approach}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {204--213}, doi = {}, year = {2014}, } |
|
Hamou-Lhadj, Abdelwahab |
CSMR-WCRE '14: "A Contextual Approach for ..."
A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces
Luay Alawneh, Abdelwahab Hamou-Lhadj, Syed Shariyar Murtaza, and Yan Liu (Concordia University, Canada; Jordan University of Science and Technology, Jordan) Studies have shown that understanding of inter-process communication patterns is an enabler to effective analysis of high performance computing (HPC) applications. In previous work, we presented an algorithm for recovering communication patterns from traces of HPC systems. The algorithm worked well on small cases, but it suffered from low accuracy when applied to large (and most interesting) traces. We believe that this was due to the fact that we viewed the trace as a mere string of operations of inter-process communication. That is, we did not take into account program control flow information. In this paper, we improve the detection accuracy by using function calls to serve as a context to guide the pattern extraction process. When applied to traces generated from two HPC benchmark applications, we demonstrate that this contextual approach improves precision and recall by an average of 56% and 66%, respectively, over the non-contextual method. @InProceedings{CSMR-WCRE14p274, author = {Luay Alawneh and Abdelwahab Hamou-Lhadj and Syed Shariyar Murtaza and Yan Liu}, title = {A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {274--282}, doi = {}, year = {2014}, } |
|
Hansson, Jörgen |
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers’ requests. In this dynamic environment the continuous modifications of software code can cause risks for software developers; when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We have conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we have measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, complexity and revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as release-wise. A measurement system for systematic risk assessment has been introduced to two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, } |
|
Hendren, Laurie |
CSMR-WCRE '14: "Mc2for: A Tool for Automatically ..."
Mc2for: A Tool for Automatically Translating Matlab to Fortran 95
Xu Li and Laurie Hendren (McGill University, Canada) MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB's high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static languages such as Fortran for their final distributable code. Rather than rewriting the code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to an equivalent Fortran program. There are several important challenges for automatically translating MATLAB to Fortran, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to equivalent Fortran constructs. In this paper, we introduce Mc2For, a tool which automatically translates MATLAB to Fortran 95. This tool consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shape of arrays and the range of scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated Fortran program. The second part is an extensible Fortran code generation framework automatically transforming MATLAB constructs to Fortran. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated Fortran code on a collection of MATLAB benchmarks. @InProceedings{CSMR-WCRE14p234, author = {Xu Li and Laurie Hendren}, title = {Mc2for: A Tool for Automatically Translating Matlab to Fortran 95}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {234--243}, doi = {}, year = {2014}, } |
|
Henriksson, Anders |
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers’ requests. In this dynamic environment the continuous modifications of software code can cause risks for software developers; when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We have conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we have measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, complexity and revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as release-wise. A measurement system for systematic risk assessment has been introduced to two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, } |
|
Heymans, Patrick |
CSMR-WCRE '14: "Reverse Engineering Web Configurators ..."
Reverse Engineering Web Configurators
Ebrahim Khalil Abbasi, Mathieu Acher, Patrick Heymans, and Anthony Cleve (University of Namur, Belgium; University of Rennes 1, France) A Web configurator offers a highly interactive environment to assist users in customising sales products through the selection of configuration options. Our previous empirical study revealed that a significant number of configurators are suboptimal in reliability, efficiency, and maintainability, opening avenues for re-engineering support and methodologies. This paper presents a tool-supported reverse-engineering process to semi-automatically extract configuration-specific data from a legacy Web configurator. The extracted and structured data is stored in formal models (e.g., variability models) and can be used in a forward-engineering process to generate a customized interface with an underlying reliable reasoning engine. Two major components are presented: (1) a Web Wrapper that extracts structured configuration-specific data from unstructured or semi-structured Web pages of a configurator, and (2) a Web Crawler that explores the "configuration space" (i.e., all objects representing configuration-specific data) and simulates users' configuration actions. We describe variability data extraction patterns, used on top of the Wrapper and the Crawler to extract configuration data. Experimental results on five existing Web configurators show that the specification of a few variability patterns enables the identification of hundreds of options. @InProceedings{CSMR-WCRE14p264, author = {Ebrahim Khalil Abbasi and Mathieu Acher and Patrick Heymans and Anthony Cleve}, title = {Reverse Engineering Web Configurators}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {264--273}, doi = {}, year = {2014}, } |
|
Higo, Yoshiki |
CSMR-WCRE '14: "Does Return Null Matter? ..."
Does Return Null Matter?
Shuhei Kimura, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, and Shinji Kusumoto (Osaka University, Japan) Developers often use null references for the returned values of methods (return null) in object-oriented languages. Although developers often use return null to indicate that a program does not satisfy some necessary conditions, it is generally felt that a method returning null is costly to maintain. One of the reasons is that when a method receives a value returned from a method invocation whose code includes return null, it is necessary to check whether the returned value is null or not (null check). As developers often forget to write null checks, null dereferences occur frequently. However, it has not been clarified to what degree return null affects software maintenance during software evolution. This paper shows the influences of return null by investigating return null and null check in the evolution of source code. Experiments conducted on 14 open source projects showed that developers modify return null more frequently than return statements that do not include null. This result indicates that return null has a negative effect on software maintenance. It was also found that the size and the development phases of projects have no effect on the frequency of modifications on return null and null check. In addition, we found that all the projects in this experiment had from one to four null checks per 100 lines. @InProceedings{CSMR-WCRE14p244, author = {Shuhei Kimura and Keisuke Hotta and Yoshiki Higo and Hiroshi Igaki and Shinji Kusumoto}, title = {Does Return Null Matter?}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {244--253}, doi = {}, year = {2014}, } |
|
Hill, Emily |
CSMR-WCRE '14: "NL-Based Query Refinement ..."
NL-Based Query Refinement and Contextualized Code Search Results: A User Study
Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet (Montclair State University, USA) As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Source code search tools match a developer's keyword-style or natural language query with comments and identifiers in the source code to identify relevant methods that may need to be changed or understood to complete the maintenance task. In this search process, the developer faces a number of challenges: (1) formulating a query, (2) determining if the results are relevant, and (3) if the results are not relevant, reformulating the query. In this paper, we present an NL-based results view for searching source code for maintenance that helps address these challenges by integrating multiple feedback mechanisms into the search results view: prevalence of the query words in the result set, results grouped by NL-based information, as a result list, and suggested alternative query words. Our search technique is implemented as an Eclipse plug-in, CONQUER, and has been empirically validated by 18 Java developers. Our results show that users prefer CONQUER over a state-of-the-art search technique, requesting customization of the interface in future query reformulation techniques. @InProceedings{CSMR-WCRE14p34, author = {Emily Hill and Manuel Roldan-Vega and Jerry Alan Fails and Greg Mallet}, title = {NL-Based Query Refinement and Contextualized Code Search Results: A User Study}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {34--43}, doi = {}, year = {2014}, } Info |
|
Hirschfeld, Robert |
CSMR-WCRE '14: "Follow the Path: Debugging ..."
Follow the Path: Debugging State Anomalies along Execution Histories
Michael Perscheid, Tim Felgentreff, and Robert Hirschfeld (HPI, Germany) To understand how observable failures come into being, back-in-time debuggers help developers by providing full access to past executions. However, such potentially large execution histories do not include any hints to failure causes. For that reason, developers are forced to ascertain unexpected state properties and wrong behavior completely on their own. Without deep program understanding, back-in-time debugging can end in countless and difficult questions about possible failure causes that consume a lot of time for following failures back to their root causes. In this paper, we present state navigation as a debugging guide that highlights unexpected state properties along execution histories. After deriving common object properties from the expected behavior of passing test cases, we generate likely invariants, compare them with the failing run, and map differences as state anomalies to the past execution. So, developers obtain a common thread through the large amount of run-time data, which helps them to answer what causes the observable failure. We implement our completely automatic state navigation as part of our test-driven fault navigation and its Path tools framework. To evaluate our approach, we observe eight developers debugging four non-trivial failures. As a result, we find that our state navigation is able to aid developers and to decrease the required time for localizing the root cause of a failure. @InProceedings{CSMR-WCRE14p124, author = {Michael Perscheid and Tim Felgentreff and Robert Hirschfeld}, title = {Follow the Path: Debugging State Anomalies along Execution Histories}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {124--133}, doi = {}, year = {2014}, } Video Info |
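The invariant-based navigation described above can be illustrated with a deliberately simplified sketch: per-variable value ranges stand in for the likely invariants derived from passing test cases, and a recorded run is just a mapping from variable names to values. Both of these are illustrative assumptions, not the paper's actual invariant model or trace format.

```python
# Simplified sketch: derive range "invariants" from passing runs, then flag
# variables in a failing run whose values fall outside those ranges.

def derive_range_invariants(passing_runs):
    """Map each variable to the (min, max) range observed across passing runs."""
    invariants = {}
    for run in passing_runs:
        for var, value in run.items():
            lo, hi = invariants.get(var, (value, value))
            invariants[var] = (min(lo, value), max(hi, value))
    return invariants


def state_anomalies(invariants, failing_run):
    """Return the variables whose value in the failing run violates its range."""
    return sorted(
        var for var, value in failing_run.items()
        if var in invariants
        and not (invariants[var][0] <= value <= invariants[var][1])
    )
```

The anomalies returned play the role of the "common thread" the paper describes: they point the developer at the state properties worth inspecting first.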
|
Horváth, Ákos |
CSMR-WCRE '14: "Anti-pattern Detection with ..."
Anti-pattern Detection with Model Queries: A Comparison of Approaches
Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc (Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary) Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local search, and incremental evaluation. We provide an in-depth comparison of these techniques on the source code of 17 Java projects using queries taken from refactoring operations in different usage profiles. Our results show that general-purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. @InProceedings{CSMR-WCRE14p293, author = {Zoltán Ujhelyi and Ákos Horváth and Dániel Varró and Norbert István Csiszár and Gábor Szőke and László Vidács and Rudolf Ferenc}, title = {Anti-pattern Detection with Model Queries: A Comparison of Approaches}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {293--302}, doi = {}, year = {2014}, } |
|
Hotta, Keisuke |
CSMR-WCRE '14: "Does Return Null Matter? ..."
Does Return Null Matter?
Shuhei Kimura, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, and Shinji Kusumoto (Osaka University, Japan) Developers often use null references for the returned values of methods (return null) in object-oriented languages. Although developers often use return null to indicate that a program does not satisfy some necessary conditions, it is generally felt that a method returning null is costly to maintain. One of the reasons for this is that when a method receives a value returned from a method invocation whose code includes return null, it is necessary to check whether the returned value is null or not (null check). As developers often forget to write null checks, null dereferences occur frequently. However, it has not been clarified to what degree return null affects software maintenance during software evolution. This paper shows the influence of return null by investigating return null and null check in the evolution of source code. Experiments conducted on 14 open source projects showed that developers modify return null more frequently than return statements that do not include null. This result indicates that return null has a negative effect on software maintenance. It was also found that the size and the development phases of projects have no effect on the frequency of modifications to return null and null check. In addition, we found that all the projects in this experiment had from one to four null checks per 100 lines. @InProceedings{CSMR-WCRE14p244, author = {Shuhei Kimura and Keisuke Hotta and Yoshiki Higo and Hiroshi Igaki and Shinji Kusumoto}, title = {Does Return Null Matter?}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {244--253}, doi = {}, year = {2014}, } |
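The return-null and null-check idioms the abstract studies look like this in a minimal sketch (transposed to Python, where None plays the role of Java's null; the function names are hypothetical):

```python
# Hypothetical illustration of the "return null" pattern and the "null check"
# it forces on callers; forgetting the check raises a TypeError here, the
# analogue of a Java NullPointerException.

def find_user(users, name):
    """Return the matching record, or None when no user satisfies the query."""
    for user in users:
        if user["name"] == name:
            return user
    return None  # "return null": every caller must remember to handle this


def greeting(users, name):
    user = find_user(users, name)
    # The "null check" the paper counts in its subject systems:
    if user is None:
        return "unknown user"
    return "hello, " + user["name"]
```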
|
Igaki, Hiroshi |
CSMR-WCRE '14: "Does Return Null Matter? ..."
Does Return Null Matter?
Shuhei Kimura, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, and Shinji Kusumoto (Osaka University, Japan) Developers often use null references for the returned values of methods (return null) in object-oriented languages. Although developers often use return null to indicate that a program does not satisfy some necessary conditions, it is generally felt that a method returning null is costly to maintain. One of the reasons for this is that when a method receives a value returned from a method invocation whose code includes return null, it is necessary to check whether the returned value is null or not (null check). As developers often forget to write null checks, null dereferences occur frequently. However, it has not been clarified to what degree return null affects software maintenance during software evolution. This paper shows the influence of return null by investigating return null and null check in the evolution of source code. Experiments conducted on 14 open source projects showed that developers modify return null more frequently than return statements that do not include null. This result indicates that return null has a negative effect on software maintenance. It was also found that the size and the development phases of projects have no effect on the frequency of modifications to return null and null check. In addition, we found that all the projects in this experiment had from one to four null checks per 100 lines. @InProceedings{CSMR-WCRE14p244, author = {Shuhei Kimura and Keisuke Hotta and Yoshiki Higo and Hiroshi Igaki and Shinji Kusumoto}, title = {Does Return Null Matter?}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {244--253}, doi = {}, year = {2014}, } |
|
Jezek, Kamil |
CSMR-WCRE '14: "Broken Promises: An Empirical ..."
Broken Promises: An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades
Jens Dietrich, Kamil Jezek, and Premek Brada (Massey University, New Zealand; University of West Bohemia, Czech Republic) It has become common practice to build programs by using libraries. While the benefits of reuse are well known, an often overlooked risk is that of system runtime failures due to API changes in libraries that evolve independently. Traditionally, the consistency between a program and the libraries it uses is checked at build time when the entire system is compiled and tested. However, the trend towards partially upgrading systems by redeploying only evolved library versions results in situations where these crucial verification steps are skipped. For Java programs, partial upgrades create additional interesting problems as the compiler and the virtual machine use different rule sets to enforce contracts between the providers and the consumers of APIs. We have studied the extent of the problem on the Qualitas Corpus, a data set consisting of Java open-source programs widely used in empirical studies. In this paper, we describe the study and report its key findings. We found that the above-mentioned issues do occur in practice, albeit not on a wide scale. @InProceedings{CSMR-WCRE14p64, author = {Jens Dietrich and Kamil Jezek and Premek Brada}, title = {Broken Promises: An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {64--73}, doi = {}, year = {2014}, } Info |
|
Keivanloo, Iman |
CSMR-WCRE '14: "An Empirical Study on the ..."
An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies
Shuai Xie, Foutse Khomh, Ying Zou, and Iman Keivanloo (Queen's University, Canada; Polytechnique Montréal, Canada) Copy and paste activities create clone groups in software systems. The evolution of a clone group across the history of a software system is termed a clone genealogy. During the evolution of a clone group, developers may change the location of the code fragments in the clone group. The type of the clone group may also change (e.g., from Type-1 to Type-2). These two phenomena have been referred to as clone migration and clone mutation, respectively. Previous studies have found that clone migration occurs frequently in software systems, and suggested that clone migration can induce faults in a software system. In this paper, we examine how clone migration phenomena affect the risk for faults in clone segments, clone groups, and clone genealogies from three long-lived software systems: JBOSS, APACHE-ANT, and ARGOUML. Results show that: (1) migrated clone segments, clone groups, and clone genealogies are not equally fault-prone; (2) when a clone mutation occurs during a clone migration, the risk for faults in the migrated clone is increased; (3) migrating a clone that was not changed for a long period of time is risky. @InProceedings{CSMR-WCRE14p94, author = {Shuai Xie and Foutse Khomh and Ying Zou and Iman Keivanloo}, title = {An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {94--103}, doi = {}, year = {2014}, } |
|
Khomh, Foutse |
CSMR-WCRE '14: "An Empirical Study on the ..."
An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies
Shuai Xie, Foutse Khomh, Ying Zou, and Iman Keivanloo (Queen's University, Canada; Polytechnique Montréal, Canada) Copy and paste activities create clone groups in software systems. The evolution of a clone group across the history of a software system is termed a clone genealogy. During the evolution of a clone group, developers may change the location of the code fragments in the clone group. The type of the clone group may also change (e.g., from Type-1 to Type-2). These two phenomena have been referred to as clone migration and clone mutation, respectively. Previous studies have found that clone migration occurs frequently in software systems, and suggested that clone migration can induce faults in a software system. In this paper, we examine how clone migration phenomena affect the risk for faults in clone segments, clone groups, and clone genealogies from three long-lived software systems: JBOSS, APACHE-ANT, and ARGOUML. Results show that: (1) migrated clone segments, clone groups, and clone genealogies are not equally fault-prone; (2) when a clone mutation occurs during a clone migration, the risk for faults in the migrated clone is increased; (3) migrating a clone that was not changed for a long period of time is risky. @InProceedings{CSMR-WCRE14p94, author = {Shuai Xie and Foutse Khomh and Ying Zou and Iman Keivanloo}, title = {An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {94--103}, doi = {}, year = {2014}, } |
|
Khurshid, Sarfraz |
CSMR-WCRE '14: "An Empirical Study of Long ..."
An Empirical Study of Long Lived Bugs
Ripon K. Saha, Sarfraz Khurshid, and Dewayne E. Perry (University of Texas at Austin, USA) Bug fixing is a crucial part of software development and maintenance. A large number of bugs often indicate poor software quality since buggy behavior not only causes failures that may be costly but also has a detrimental effect on the user’s overall experience with the software product. The impact of long lived bugs can be even more critical since experiencing the same bug version after version can be particularly frustrating for users. While there are many studies that investigate factors affecting bug fixing time for entire bug repositories, to the best of our knowledge, none of these studies investigates the extent and reasons for long lived bugs. In this paper, we analyzed long lived bugs from five different perspectives: their proportion, severity, assignment, reasons, as well as the nature of fixes. Our study on four open-source projects shows that there are a considerable number of long lived bugs in each system and over 90% of them adversely affect the user’s experience. The reasons for these long lived bugs are diverse, including long assignment time, not understanding their importance in advance, etc. However, many bug fixes were delayed without any specific reason. Our analysis of bug fixing changes further shows that many long lived bugs can be fixed quickly through careful prioritization. We believe our results will help both developers and researchers to better understand factors behind delays, improve the overall bug fixing process, and investigate analytical approaches for prioritizing bugs based on bug severity as well as expected bug fixing effort. @InProceedings{CSMR-WCRE14p144, author = {Ripon K. Saha and Sarfraz Khurshid and Dewayne E. Perry}, title = {An Empirical Study of Long Lived Bugs}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {144--153}, doi = {}, year = {2014}, } |
|
Kimura, Shuhei |
CSMR-WCRE '14: "Does Return Null Matter? ..."
Does Return Null Matter?
Shuhei Kimura, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, and Shinji Kusumoto (Osaka University, Japan) Developers often use null references for the returned values of methods (return null) in object-oriented languages. Although developers often use return null to indicate that a program does not satisfy some necessary conditions, it is generally felt that a method returning null is costly to maintain. One of the reasons for this is that when a method receives a value returned from a method invocation whose code includes return null, it is necessary to check whether the returned value is null or not (null check). As developers often forget to write null checks, null dereferences occur frequently. However, it has not been clarified to what degree return null affects software maintenance during software evolution. This paper shows the influence of return null by investigating return null and null check in the evolution of source code. Experiments conducted on 14 open source projects showed that developers modify return null more frequently than return statements that do not include null. This result indicates that return null has a negative effect on software maintenance. It was also found that the size and the development phases of projects have no effect on the frequency of modifications to return null and null check. In addition, we found that all the projects in this experiment had from one to four null checks per 100 lines. @InProceedings{CSMR-WCRE14p244, author = {Shuhei Kimura and Keisuke Hotta and Yoshiki Higo and Hiroshi Igaki and Shinji Kusumoto}, title = {Does Return Null Matter?}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {244--253}, doi = {}, year = {2014}, } |
|
Krishnan, Giri Panamoottil |
CSMR-WCRE '14: "Unification and Refactoring ..."
Unification and Refactoring of Clones
Giri Panamoottil Krishnan and Nikolaos Tsantalis (Concordia University, Canada) Code duplication has been recognized as a potentially serious problem having a negative impact on the maintainability, comprehensibility, and evolution of software systems. In the past, several techniques have been developed for the detection and management of software clones. Existing code duplication can be eliminated by extracting the common functionality into a single module. However, the unification and refactoring of software clones is a challenging problem, since clones usually go through several modifications after their initial introduction. In this paper we present an approach for the unification and refactoring of software clones that overcomes the limitations of previous approaches. More specifically, our approach is able to detect and parameterize non-trivial differences between the clones. Moreover, it can find an optimal mapping between the statements of the clones that minimizes the number of differences. We compared the proposed technique with a competitive clone refactoring tool and concluded that our approach is able to find a significantly larger number of refactorable clones. @InProceedings{CSMR-WCRE14p104, author = {Giri Panamoottil Krishnan and Nikolaos Tsantalis}, title = {Unification and Refactoring of Clones}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {104--113}, doi = {}, year = {2014}, } |
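A rough way to see the statement-mapping problem behind clone unification is to line up two clones and collect their unmatched statements as parameterization candidates. The sketch below uses Python's difflib as a greedy stand-in; the paper's technique computes an optimal mapping that minimizes differences and handles non-trivial ones, which this simplification does not.

```python
# Greedy illustration (NOT the paper's algorithm): align the statements of two
# clones and report the spans that differ, i.e. candidates for parameterization
# if the clones were unified into a single method.
import difflib

def clone_differences(clone_a, clone_b):
    """clone_a/clone_b: lists of statement strings. Returns unmatched pairs."""
    matcher = difflib.SequenceMatcher(a=clone_a, b=clone_b)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':  # 'replace', 'delete', or 'insert'
            diffs.append((clone_a[i1:i2], clone_b[j1:j2]))
    return diffs
```

Each returned pair is a difference between the clones; unification would extract the common statements and pass the differing expressions in as parameters.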
|
Kusumoto, Shinji |
CSMR-WCRE '14: "Does Return Null Matter? ..."
Does Return Null Matter?
Shuhei Kimura, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, and Shinji Kusumoto (Osaka University, Japan) Developers often use null references for the returned values of methods (return null) in object-oriented languages. Although developers often use return null to indicate that a program does not satisfy some necessary conditions, it is generally felt that a method returning null is costly to maintain. One of the reasons for this is that when a method receives a value returned from a method invocation whose code includes return null, it is necessary to check whether the returned value is null or not (null check). As developers often forget to write null checks, null dereferences occur frequently. However, it has not been clarified to what degree return null affects software maintenance during software evolution. This paper shows the influence of return null by investigating return null and null check in the evolution of source code. Experiments conducted on 14 open source projects showed that developers modify return null more frequently than return statements that do not include null. This result indicates that return null has a negative effect on software maintenance. It was also found that the size and the development phases of projects have no effect on the frequency of modifications to return null and null check. In addition, we found that all the projects in this experiment had from one to four null checks per 100 lines. @InProceedings{CSMR-WCRE14p244, author = {Shuhei Kimura and Keisuke Hotta and Yoshiki Higo and Hiroshi Igaki and Shinji Kusumoto}, title = {Does Return Null Matter?}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {244--253}, doi = {}, year = {2014}, } |
|
Lawall, Julia |
CSMR-WCRE '14: "Automated Construction of ..."
Automated Construction of a Software-Specific Word Similarity Database
Yuan Tian, David Lo, and Julia Lawall (Singapore Management University, Singapore; INRIA, France; LIP6, France) Many automated software engineering approaches, including code search, bug report categorization, and duplicate bug report detection, measure similarities between two documents by analyzing natural language contents. Often different words are used to express the same meaning, and thus measuring similarities using exact matching of words is insufficient. To solve this problem, past studies have shown the need to measure the similarities between pairs of words. To meet this need, the natural language processing community has built WordNet, which is a manually constructed lexical database that records semantic relations among words and can be used to measure how similar two words are. However, WordNet is a general-purpose resource, and often does not contain software-specific words. Also, the meanings of words in WordNet are often different from when they are used in a software engineering context. Thus, there is a need for a software-specific WordNet-like resource that can measure similarities of words. In this work, we propose an automated approach that builds a software-specific WordNet-like resource, named WordSimSE, by leveraging the textual contents of posts in StackOverflow. Our approach measures the similarity of words by computing the similarities of the weighted co-occurrences of these words with three types of words in the textual corpus. We have evaluated our approach on a set of software-specific words and compared our approach with an existing WordNet-based technique (WordNet_res) to return top-k most similar words. Human judges are used to evaluate the effectiveness of the two techniques. We find that WordNet_res returns no result for 55% of the queries. For the remaining queries, WordNet_res returns significantly poorer results.
@InProceedings{CSMR-WCRE14p44, author = {Yuan Tian and David Lo and Julia Lawall}, title = {Automated Construction of a Software-Specific Word Similarity Database}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {44--53}, doi = {}, year = {2014}, } |
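The core idea of measuring word similarity through co-occurrence can be sketched as below. The window-based counting and plain cosine similarity here are illustrative simplifications; they are not the authors' exact weighting scheme or their three types of context words.

```python
# Illustrative sketch: represent each word by the counts of words that appear
# near it, then compare two words by the cosine of their count vectors.
from collections import Counter
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, how often every other word appears within the window."""
    vectors = {}
    for words in sentences:
        for i, w in enumerate(words):
            context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            vectors.setdefault(w, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Words used in similar contexts (e.g. two API verbs that both precede "file handle") end up with similar vectors even though they never match exactly, which is the gap WordSimSE fills relative to exact word matching.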
|
Li, Xu |
CSMR-WCRE '14: "Mc2for: A Tool for Automatically ..."
Mc2for: A Tool for Automatically Translating Matlab to Fortran 95
Xu Li and Laurie Hendren (McGill University, Canada) MATLAB is a dynamic numerical scripting language widely used by scientists, engineers and students. While MATLAB's high-level syntax and dynamic types make it ideal for prototyping, programmers often prefer using high-performance static languages such as Fortran for their final distributable code. Rather than rewriting the code by hand, our solution is to provide a tool that automatically translates the original MATLAB program to an equivalent Fortran program. There are several important challenges for automatically translating MATLAB to Fortran, such as correctly estimating the static type characteristics of all the variables in a MATLAB program, mapping MATLAB built-in functions, and effectively mapping MATLAB constructs to equivalent Fortran constructs. In this paper, we introduce Mc2For, a tool which automatically translates MATLAB to Fortran 95. This tool consists of two major parts. The first part is an interprocedural analysis component to estimate the static type characteristics, such as the shape of arrays and the range of scalars, which are used to generate variable declarations and to remove unnecessary array bounds checking in the translated Fortran program. The second part is an extensible Fortran code generation framework automatically transforming MATLAB constructs to Fortran. This work has been implemented within the McLab framework, and we demonstrate the performance of the translated Fortran code on a collection of MATLAB benchmarks. @InProceedings{CSMR-WCRE14p234, author = {Xu Li and Laurie Hendren}, title = {Mc2for: A Tool for Automatically Translating Matlab to Fortran 95}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {234--243}, doi = {}, year = {2014}, } |
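To make the shape-estimation step concrete, here is a toy sketch of propagating array shapes through a tiny invented expression "IR" so that a translator could emit array declarations; the IR and its two rules are assumptions for illustration and are far simpler than Mc2for's interprocedural analysis of shapes and value ranges.

```python
# Toy static shape inference over a made-up expression IR:
#   ('var', name)        -> shape looked up in the environment
#   ('zeros', m, n)      -> an m x n array literal
#   ('matmul', a, b)     -> matrix product, checking inner dimensions
# A real translator would use the inferred shapes to generate Fortran
# declarations such as `real, dimension(2,4) :: result`.

def infer_shape(expr, env):
    """Return the (rows, cols) shape of expr, given shapes of variables in env."""
    kind = expr[0]
    if kind == 'var':
        return env[expr[1]]
    if kind == 'zeros':
        return (expr[1], expr[2])
    if kind == 'matmul':
        (m, k1) = infer_shape(expr[1], env)
        (k2, n) = infer_shape(expr[2], env)
        if k1 != k2:
            raise TypeError("inner matrix dimensions disagree")
        return (m, n)
    raise ValueError("unknown expression kind: %s" % kind)
```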
|
Lima, Fernando Paim |
CSMR-WCRE '14: "Extracting Relative Thresholds ..."
Extracting Relative Thresholds for Source Code Metrics
Paloma Oliveira, Marco Tulio Valente, and Fernando Paim Lima (UFMG, Brazil; IFMG, Brazil) Establishing credible thresholds is a central challenge for promoting source code metrics as an effective instrument to control the internal quality of software systems. To address this challenge, we propose the concept of relative thresholds for evaluating metrics data following heavy-tailed distributions. The proposed thresholds are relative because they assume that metric thresholds should be followed by most source code entities, but that it is also natural to have a number of entities in the ``long-tail'' that do not follow the defined limits. In the paper, we describe an empirical method for extracting relative thresholds from real systems. We also report a study on applying this method in a corpus with 106 systems. Based on the results of this study, we argue that the proposed thresholds express a balance between real and idealized design practices. @InProceedings{CSMR-WCRE14p254, author = {Paloma Oliveira and Marco Tulio Valente and Fernando Paim Lima}, title = {Extracting Relative Thresholds for Source Code Metrics}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {254--263}, doi = {}, year = {2014}, } |
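A relative threshold of the form "p% of entities should have metric <= k" can be extracted, in a much-simplified form, as follows: for a fixed p, take k from the p-th percentile of a typical (median) system in the corpus. The percentile arithmetic and the median choice below are assumptions for illustration, not the paper's extraction method.

```python
# Simplified sketch of extracting a relative threshold from a corpus of
# per-system metric values (e.g. lines of code per class for each system).

def percentile(values, p):
    """Smallest value v such that at least p% of the values are <= v."""
    ordered = sorted(values)
    idx = max(0, -(-p * len(ordered) // 100) - 1)  # ceil(p*n/100) - 1
    return ordered[idx]

def relative_threshold(corpus, p=90):
    """corpus: list of per-system metric value lists. Returns k for the rule
    'p% of entities should have metric <= k', taken from the median system."""
    per_system_k = [percentile(system, p) for system in corpus]
    return percentile(per_system_k, 50)
```

The long tail the paper allows for is exactly the remaining (100 - p)% of entities in each system that may legitimately exceed k.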
|
Liu, Yan |
CSMR-WCRE '14: "A Contextual Approach for ..."
A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces
Luay Alawneh, Abdelwahab Hamou-Lhadj, Syed Shariyar Murtaza, and Yan Liu (Concordia University, Canada; Jordan University of Science and Technology, Jordan) Studies have shown that understanding of inter-process communication patterns is an enabler of effective analysis of high performance computing (HPC) applications. In previous work, we presented an algorithm for recovering communication patterns from traces of HPC systems. The algorithm worked well on small cases, but it suffered from low accuracy when applied to large (and most interesting) traces. We believe that this was due to the fact that we viewed the trace as a mere string of operations of inter-process communication. That is, we did not take into account program control flow information. In this paper, we improve the detection accuracy by using function calls to serve as a context to guide the pattern extraction process. When applied to traces generated from two HPC benchmark applications, we demonstrate that this contextual approach improves precision and recall by an average of 56% and 66%, respectively, over the non-contextual method. @InProceedings{CSMR-WCRE14p274, author = {Luay Alawneh and Abdelwahab Hamou-Lhadj and Syed Shariyar Murtaza and Yan Liu}, title = {A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {274--282}, doi = {}, year = {2014}, } |
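The role of the enclosing function as context can be sketched as follows: instead of mining one flat string of communication operations, the trace is segmented wherever the enclosing function changes, so identical send/receive sequences only match when they come from the same function. The event format below is an illustrative assumption, not the paper's trace model.

```python
# Sketch: count repeated (context, operation-sequence) segments in a trace of
# (enclosing_function, communication_operation) events.
from collections import Counter

def contextual_patterns(trace):
    """Segment the flat event stream by enclosing function, then count
    identical segments; patterns are recovered per context, not globally."""
    segments = Counter()
    current_func, ops = None, []
    for func, op in trace + [(None, None)]:  # sentinel flushes the last segment
        if func != current_func:
            if ops:
                segments[(current_func, tuple(ops))] += 1
            current_func, ops = func, []
        if op is not None:
            ops.append(op)
    return segments
```

In the flat view, a send/recv pair inside an exchange routine and one inside an I/O routine would be conflated; keying segments by function keeps them apart, which is the intuition behind the reported accuracy gain.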
|
Lo, David |
CSMR-WCRE '14: "Towards More Accurate Multi-label ..."
Towards More Accurate Multi-label Software Behavior Learning
Xin Xia, Yang Feng, David Lo, Zhenyu Chen, and Xinyu Wang (Zhejiang University, China; Nanjing University, China; Singapore Management University, Singapore) In a modern software system, when a program fails, a crash report that contains an execution trace would be sent to the software vendor for diagnosis. A crash report that corresponds to a failure could be caused by multiple types of faults simultaneously. Many large companies such as Baidu organize a team to analyze these failures, and classify them into multiple labels (i.e., multiple types of faults). However, it would be time-consuming and difficult for developers to manually analyze these failures and come up with appropriate fault labels. In this paper, we automatically classify a failure into multiple types of faults, using a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs and show that MLL-GA could achieve average F-measures of 0.6078 to 0.8665. We also compare our algorithm with Ml.KNN and show that on average across the 6 datasets, MLL-GA improves the average F-measure of Ml.KNN by 14.43%. @InProceedings{CSMR-WCRE14p134, author = {Xin Xia and Yang Feng and David Lo and Zhenyu Chen and Xinyu Wang}, title = {Towards More Accurate Multi-label Software Behavior Learning}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {134--143}, doi = {}, year = {2014}, } CSMR-WCRE '14: "Automated Construction of ..." Automated Construction of a Software-Specific Word Similarity Database Yuan Tian, David Lo, and Julia Lawall (Singapore Management University, Singapore; INRIA, France; LIP6, France) Many automated software engineering approaches, including code search, bug report categorization, and duplicate bug report detection, measure similarities between two documents by analyzing natural language contents.
Often different words are used to express the same meaning, and thus measuring similarities using exact matching of words is insufficient. To solve this problem, past studies have shown the need to measure the similarities between pairs of words. To meet this need, the natural language processing community has built WordNet, which is a manually constructed lexical database that records semantic relations among words and can be used to measure how similar two words are. However, WordNet is a general-purpose resource, and often does not contain software-specific words. Also, the meanings of words in WordNet are often different from when they are used in a software engineering context. Thus, there is a need for a software-specific WordNet-like resource that can measure similarities of words. In this work, we propose an automated approach that builds a software-specific WordNet-like resource, named WordSimSE, by leveraging the textual contents of posts in StackOverflow. Our approach measures the similarity of words by computing the similarities of the weighted co-occurrences of these words with three types of words in the textual corpus. We have evaluated our approach on a set of software-specific words and compared our approach with an existing WordNet-based technique (WordNet_res) to return top-k most similar words. Human judges are used to evaluate the effectiveness of the two techniques. We find that WordNet_res returns no result for 55% of the queries. For the remaining queries, WordNet_res returns significantly poorer results. @InProceedings{CSMR-WCRE14p44, author = {Yuan Tian and David Lo and Julia Lawall}, title = {Automated Construction of a Software-Specific Word Similarity Database}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {44--53}, doi = {}, year = {2014}, } CSMR-WCRE '14: "An Empirical Study of Bug ..." 
An Empirical Study of Bug Report Field Reassignment Xin Xia, David Lo, Ming Wen, Emad Shihab, and Bo Zhou (Zhejiang University, China; Singapore Management University, Singapore; Rochester Institute of Technology, USA) A bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), platform, etc., which provide important information for the bug triaging and fixing process. It is important to make sure that bug information is correct since previous studies showed that the wrong assignment of bug report fields could increase the bug fixing time, and even delay the delivery of the software. In this paper, we perform an empirical study on bug report field reassignments in open-source software projects. To better understand why bug report fields are reassigned, we manually collect 99 recent bug reports that had their fields reassigned and emailed their reporters and developers asking why these fields got reassigned. Then, we perform a large-scale empirical study on 8 types of bug report field reassignments in 4 open-source software projects containing a total of 190,558 bug reports. In particular, we investigate 1) the number of bug reports whose fields get reassigned, 2) the difference in bug fixing time between bug reports whose fields get reassigned and those whose fields are not reassigned, 3) the duration a field in a bug report gets reassigned, 4) the number of fields in a bug report that get reassigned, 5) the number of times a field in a bug report gets reassigned, and 6) whether the experience of bug reporters affects the reassignment of bug report fields. We find that a large number (approximately 80%) of bug reports have their fields reassigned, and the bug reports whose fields get reassigned require more time to be fixed than those without field reassignments. 
@InProceedings{CSMR-WCRE14p174, author = {Xin Xia and David Lo and Ming Wen and Emad Shihab and Bo Zhou}, title = {An Empirical Study of Bug Report Field Reassignment}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {174--183}, doi = {}, year = {2014}, } |
|
Mallet, Greg |
CSMR-WCRE '14: "NL-Based Query Refinement ..."
NL-Based Query Refinement and Contextualized Code Search Results: A User Study
Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet (Montclair State University, USA) As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Source code search tools match a developer's keyword-style or natural language query with comments and identifiers in the source code to identify relevant methods that may need to be changed or understood to complete the maintenance task. In this search process, the developer faces a number of challenges: (1) formulating a query, (2) determining if the results are relevant, and (3) if the results are not relevant, reformulating the query. In this paper, we present an NL-based results view for searching source code for maintenance that helps address these challenges by integrating multiple feedback mechanisms into the search results view: prevalence of the query words in the result set, results grouped by NL-based information as well as in a result list, and suggested alternative query words. Our search technique is implemented as an Eclipse plug-in, CONQUER, and has been empirically validated by 18 Java developers. Our results show that users prefer CONQUER over a state-of-the-art search technique, requesting customization of the interface in future query reformulation techniques. @InProceedings{CSMR-WCRE14p34, author = {Emily Hill and Manuel Roldan-Vega and Jerry Alan Fails and Greg Mallet}, title = {NL-Based Query Refinement and Contextualized Code Search Results: A User Study}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {34--43}, doi = {}, year = {2014}, } |
|
Marcus, Andrian |
CSMR-WCRE '14: "In Medio Stat Virtus: Extract ..."
In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria
Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, Andrian Marcus, Yann-Gaël Guéhéneuc, and Giuliano Antoniol (University of Salerno, Italy; University of Molise, Italy; Wayne State University, USA; Polytechnique Montréal, Canada) Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increment in cohesion and the undesired increment in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer’s point of view. @InProceedings{CSMR-WCRE14p214, author = {Gabriele Bavota and Rocco Oliveto and Andrea De Lucia and Andrian Marcus and Yann-Gaël Guéhéneuc and Giuliano Antoniol}, title = {In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {214--223}, doi = {}, year = {2014}, } |
|
Meding, Wilhelm |
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers’ requests. In this dynamic environment, the continuous modifications of software code can cause risks for software developers; when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We have conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we have measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, complexity and revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as per release. A measurement system for systematic risk assessment has been introduced to two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, } |
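The metric superposition described in the abstract above can be sketched as follows. This is an illustrative reading only: the min-max normalization, equal weighting, and metric values are assumptions, not the authors' exact method.

```python
def risk_rank(files):
    """Rank files by a combined risk score, in the spirit of superposing
    complexity and revision count.

    files: dict mapping file name -> (complexity, revisions).
    Each metric is min-max normalized to [0, 1] and the two are summed,
    so a file must score high on both metrics to top the list.
    """
    comps = {f: c for f, (c, _) in files.items()}
    revs = {f: r for f, (_, r) in files.items()}

    def norm(metric):
        lo, hi = min(metric.values()), max(metric.values())
        span = (hi - lo) or 1  # avoid division by zero when all equal
        return {f: (v - lo) / span for f, v in metric.items()}

    nc, nr = norm(comps), norm(revs)
    score = {f: nc[f] + nr[f] for f in files}
    return sorted(files, key=lambda f: -score[f])
```

For example, a frequently revised, complex file (hypothetical names) outranks a complex but stable one: `risk_rank({"parser.c": (45, 30), "util.c": (5, 2), "net.c": (40, 4)})` puts `parser.c` first.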
|
Mondal, Manishankar |
CSMR-WCRE '14: "Automatic Ranking of Clones ..."
Automatic Ranking of Clones for Refactoring through Mining Association Rules
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider (University of Saskatchewan, Canada) In this paper, we present an in-depth empirical study on identifying clone fragments that can be important refactoring candidates. We mine association rules among clones in order to detect clone fragments that belong to the same clone class and have a tendency of changing together during software evolution. The idea is that if two or more clone fragments from the same class often change together (i.e., are likely to co-change) while preserving their similarity, they might be important candidates for refactoring. Merging such clones into one (if possible) can potentially decrease future clone maintenance effort. We define a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), and consider the cloned fragments that changed according to this pattern (i.e., the SPCP clones) as important candidates for refactoring. For the purpose of our study, we implement a prototype tool called MARC that identifies SPCP clones and mines association rules among them. The rules as well as the SPCP clones are ranked for refactoring on the basis of their change-proneness. We applied MARC to thirteen subject systems and retrieved the refactoring candidates for three types of clones (Type 1, Type 2, and Type 3) separately. Our experimental results show that SPCP clones can be considered important candidates for refactoring. Clones that do not follow SPCP either evolve independently or are rarely changed. By considering SPCP clones for refactoring, we can not only minimize refactoring effort considerably but also reduce the possibility of delayed synchronizations among clones and thus minimize inconsistencies in software systems. @InProceedings{CSMR-WCRE14p114, author = {Manishankar Mondal and Chanchal K. Roy and Kevin A. 
Schneider}, title = {Automatic Ranking of Clones for Refactoring through Mining Association Rules}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {114--123}, doi = {}, year = {2014}, } |
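The core association-rule idea behind the ranking can be sketched as below. This is not the MARC tool itself; the co-change input format, the pairwise-only rules, and the support threshold are illustrative assumptions.

```python
from itertools import combinations

def mine_cochange_rules(commits, min_support=2):
    """Mine pairwise co-change rules among clone fragments.

    commits: list of sets, each the clone-fragment ids changed together
    in one commit. Returns {(a, b): (support, confidence)} for the rule
    "if a changes, b changes too", keeping rules with enough support.
    """
    pair_count, single_count = {}, {}
    for changed in commits:
        for f in changed:
            single_count[f] = single_count.get(f, 0) + 1
        for a, b in combinations(sorted(changed), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = {}
    for (a, b), sup in pair_count.items():
        if sup >= min_support:
            # confidence = P(b changes | a changes), and vice versa
            rules[(a, b)] = (sup, sup / single_count[a])
            rules[(b, a)] = (sup, sup / single_count[b])
    return rules
```

Ranking candidates by (support, confidence) then surfaces fragment pairs that repeatedly change together, the paper's signal for refactoring-worthy clones.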
|
Murtaza, Syed Shariyar |
CSMR-WCRE '14: "A Contextual Approach for ..."
A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces
Luay Alawneh, Abdelwahab Hamou-Lhadj, Syed Shariyar Murtaza, and Yan Liu (Concordia University, Canada; Jordan University of Science and Technology, Jordan) Studies have shown that understanding of inter-process communication patterns is an enabler to effective analysis of high performance computing (HPC) applications. In previous work, we presented an algorithm for recovering communication patterns from traces of HPC systems. The algorithm worked well on small cases, but it suffered from low accuracy when applied to large (and most interesting) traces. We believe that this was due to the fact that we viewed the trace as a mere string of inter-process communication operations; that is, we did not take into account program control flow information. In this paper, we improve the detection accuracy by using function calls as a context to guide the pattern extraction process. When applied to traces generated from two HPC benchmark applications, we demonstrate that this contextual approach improves precision and recall by an average of 56% and 66%, respectively, over the non-contextual method. @InProceedings{CSMR-WCRE14p274, author = {Luay Alawneh and Abdelwahab Hamou-Lhadj and Syed Shariyar Murtaza and Yan Liu}, title = {A Contextual Approach for Effective Recovery of Inter-process Communication Patterns from HPC Traces}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {274--282}, doi = {}, year = {2014}, } |
|
Oliveira, Paloma |
CSMR-WCRE '14: "Extracting Relative Thresholds ..."
Extracting Relative Thresholds for Source Code Metrics
Paloma Oliveira, Marco Tulio Valente, and Fernando Paim Lima (UFMG, Brazil; IFMG, Brazil) Establishing credible thresholds is a central challenge for promoting source code metrics as an effective instrument to control the internal quality of software systems. To address this challenge, we propose the concept of relative thresholds for evaluating metrics data following heavy-tailed distributions. The proposed thresholds are relative because they assume that metric thresholds should be followed by most source code entities, but that it is also natural to have a number of entities in the ``long-tail'' that do not follow the defined limits. In the paper, we describe an empirical method for extracting relative thresholds from real systems. We also report a study on applying this method in a corpus with 106 systems. Based on the results of this study, we argue that the proposed thresholds express a balance between real and idealized design practices. @InProceedings{CSMR-WCRE14p254, author = {Paloma Oliveira and Marco Tulio Valente and Fernando Paim Lima}, title = {Extracting Relative Thresholds for Source Code Metrics}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {254--263}, doi = {}, year = {2014}, } |
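The notion of a relative threshold ("p% of the entities should have the metric value at most k") can be sketched as follows. The corpus quantile, the upward scan, and the assumption of integer-valued metrics are illustrative simplifications, not the paper's exact extraction method.

```python
def relative_threshold(systems, p=0.9, corpus_quantile=0.9):
    """Extract a relative threshold k for a source code metric.

    systems: list of lists, one list of per-entity metric values per
    system in the corpus (assumed integer-valued here). Returns the
    smallest k such that, in at least corpus_quantile of the systems,
    at least a fraction p of the entities have metric value <= k;
    the remaining entities form the tolerated "long tail".
    """
    k = 0
    while True:
        compliant = sum(
            1 for vals in systems
            if sum(v <= k for v in vals) / len(vals) >= p
        )
        if compliant / len(systems) >= corpus_quantile:
            return k
        k += 1
```

A rule like "75% of classes should have at most k methods" then falls out of the corpus rather than being fixed a priori.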
|
Oliveto, Rocco |
CSMR-WCRE '14: "Cross-Project Defect Prediction ..."
Cross-Project Defect Prediction Models: L'Union Fait la Force
Annibale Panichella, Rocco Oliveto, and Andrea De Lucia (University of Salerno, Italy; University of Molise, Italy) Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear winner. In this paper, we present an empirical study aiming at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined CODEP (COmbined DEfect Predictor), that employs the classification provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems and in the context of cross-project defect prediction, which represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors. @InProceedings{CSMR-WCRE14p164, author = {Annibale Panichella and Rocco Oliveto and Andrea De Lucia}, title = {Cross-Project Defect Prediction Models: L'Union Fait la Force}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {164--173}, doi = {}, year = {2014}, } CSMR-WCRE '14: "In Medio Stat Virtus: Extract ..." 
In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria Gabriele Bavota, Rocco Oliveto, Andrea De Lucia, Andrian Marcus, Yann-Gaël Guéhéneuc, and Giuliano Antoniol (University of Salerno, Italy; University of Molise, Italy; Wayne State University, USA; Polytechnique Montréal, Canada) Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increment in cohesion and the undesired increment in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer’s point of view. @InProceedings{CSMR-WCRE14p214, author = {Gabriele Bavota and Rocco Oliveto and Andrea De Lucia and Andrian Marcus and Yann-Gaël Guéhéneuc and Giuliano Antoniol}, title = {In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {214--223}, doi = {}, year = {2014}, } |
|
Österström, Per |
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers’ requests. In this dynamic environment, the continuous modifications of software code can cause risks for software developers; when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We have conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we have measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, complexity and revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as per release. A measurement system for systematic risk assessment has been introduced to two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, } |
|
Oyetoyan, Tosin Daniel |
CSMR-WCRE '14: "Transition and Defect Patterns ..."
Transition and Defect Patterns of Components in Dependency Cycles during Software Evolution
Tosin Daniel Oyetoyan, Daniela Soares Cruzes, and Reidar Conradi (NTNU, Norway; SINTEF, Norway) Breaking existing cyclically connected components of running software is not trivial, since it involves planning and human resources to ensure that the software behavior is preserved after the refactoring activity. Therefore, to motivate refactoring it is essential to obtain evidence of the benefits to product quality. This study investigates the defect-proneness patterns of cyclically connected components vs. non-cyclic ones when they transition across software releases. We have mined and classified software components into two groups and two transition states: the cyclic and the non-cyclic ones. Next, we have performed an empirical study of four software systems from an evolutionary perspective. Using standard statistical tests on formulated hypotheses, we have determined the significance of the defect profiles and complexities of each group. The results show that during software evolution, components that transition between dependency cycles have a higher probability of being defect-prone than those that transition outside of cycles. Furthermore, out of the three complexity variables investigated, we found that an increase in the class reachability set size tends to be more associated with components that turn defective when they transition between dependency cycles. Lastly, we found no evidence of any systematic “cycle-breaking” refactoring between releases of the software systems. Thus, these findings motivate refactoring of components in dependency cycles, taking into account the minimization of metrics such as the class reachability set size. @InProceedings{CSMR-WCRE14p283, author = {Tosin Daniel Oyetoyan and Daniela Soares Cruzes and Reidar Conradi}, title = {Transition and Defect Patterns of Components in Dependency Cycles during Software Evolution}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {283--292}, doi = {}, year = {2014}, } |
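Classifying components as cyclic or non-cyclic, as the study above does, amounts to finding strongly connected components of the dependency graph. A sketch using Tarjan's algorithm (the graph shape below is hypothetical; the paper does not prescribe this particular implementation):

```python
def cyclic_components(deps):
    """Return the set of components that lie on at least one dependency
    cycle, i.e. belong to a strongly connected component of size > 1
    or depend directly on themselves.

    deps: dict mapping component -> iterable of components it depends on.
    """
    index, low, on_stack, stack, out = {}, {}, set(), [], set()
    counter = [0]

    def strongconnect(v):  # Tarjan's SCC algorithm, recursive form
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            if len(scc) > 1 or v in deps.get(v, ()):
                out.update(scc)

    for v in list(deps):
        if v not in index:
            strongconnect(v)
    return out
```

For `{"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}`, only A, B, and C are in a cycle; D depends on the cycle but is not part of it.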
|
Panichella, Annibale |
CSMR-WCRE '14: "Cross-Project Defect Prediction ..."
Cross-Project Defect Prediction Models: L'Union Fait la Force
Annibale Panichella, Rocco Oliveto, and Andrea De Lucia (University of Salerno, Italy; University of Molise, Italy) Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear winner. In this paper, we present an empirical study aiming at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined CODEP (COmbined DEfect Predictor), that employs the classification provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems and in the context of cross-project defect prediction, which represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors. @InProceedings{CSMR-WCRE14p164, author = {Annibale Panichella and Rocco Oliveto and Andrea De Lucia}, title = {Cross-Project Defect Prediction Models: L'Union Fait la Force}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {164--173}, doi = {}, year = {2014}, } |
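The combination idea can be illustrated minimally as below. Note the hedge: CODEP itself trains a second-level classifier on the outputs of the stand-alone predictors; the simple weighted average here only approximates that, and the classifier names and weights are hypothetical.

```python
def combined_score(predictions, weights=None):
    """Combine per-classifier defect probabilities into one score.

    predictions: dict classifier_name -> probability that the entity is
    defect-prone. Returns a weighted average (uniform by default). A
    faithful CODEP-style combiner would instead fit a meta-classifier
    (e.g. logistic regression) on these outputs over a training set.
    """
    if weights is None:
        weights = {c: 1.0 for c in predictions}
    total = sum(weights[c] for c in predictions)
    return sum(weights[c] * p for c, p in predictions.items()) / total
```

Entities can then be ranked by the combined score, letting classifiers that disagree compensate for each other.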
|
Perry, Dewayne E. |
CSMR-WCRE '14: "An Empirical Study of Long ..."
An Empirical Study of Long Lived Bugs
Ripon K. Saha, Sarfraz Khurshid, and Dewayne E. Perry (University of Texas at Austin, USA) Bug fixing is a crucial part of software development and maintenance. A large number of bugs often indicates poor software quality, since buggy behavior not only causes failures that may be costly but also has a detrimental effect on the user’s overall experience with the software product. The impact of long-lived bugs can be even more critical, since experiencing the same bug version after version can be particularly frustrating for users. While there are many studies that investigate factors affecting bug fixing time for entire bug repositories, to the best of our knowledge, none of these studies investigates the extent of and reasons for long-lived bugs. In this paper, we analyzed long-lived bugs from five different perspectives: their proportion, severity, assignment, and reasons, as well as the nature of fixes. Our study on four open-source projects shows that there are a considerable number of long-lived bugs in each system and over 90% of them adversely affect the user’s experience. The reasons for these long-lived bugs are diverse, including long assignment time, not understanding their importance in advance, etc. However, many bug fixes were delayed without any specific reason. Our analysis of bug-fixing changes further shows that many long-lived bugs can be fixed quickly through careful prioritization. We believe our results will help both developers and researchers to better understand factors behind delays, improve the overall bug fixing process, and investigate analytical approaches for prioritizing bugs based on bug severity as well as expected bug fixing effort. @InProceedings{CSMR-WCRE14p144, author = {Ripon K. Saha and Sarfraz Khurshid and Dewayne E. Perry}, title = {An Empirical Study of Long Lived Bugs}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {144--153}, doi = {}, year = {2014}, } |
|
Perscheid, Michael |
CSMR-WCRE '14: "Follow the Path: Debugging ..."
Follow the Path: Debugging State Anomalies along Execution Histories
Michael Perscheid, Tim Felgentreff, and Robert Hirschfeld (HPI, Germany) To understand how observable failures come into being, back-in-time debuggers help developers by providing full access to past executions. However, such potentially large execution histories do not include any hints to failure causes. For that reason, developers are forced to ascertain unexpected state properties and wrong behavior completely on their own. Without deep program understanding, back-in-time debugging can end in countless and difficult questions about possible failure causes that consume a lot of time for following failures back to their root causes. In this paper, we present state navigation as a debugging guide that highlights unexpected state properties along execution histories. After deriving common object properties from the expected behavior of passing test cases, we generate likely invariants, compare them with the failing run, and map differences as state anomalies to the past execution. So, developers obtain a common thread through the large amount of run-time data, which helps them answer what causes the observable failure. We implement our completely automatic state navigation as part of our test-driven fault navigation and its Path tools framework. To evaluate our approach, we observe eight developers as they debug four non-trivial failures. As a result, we find that our state navigation is able to aid developers and to decrease the required time for localizing the root cause of a failure. @InProceedings{CSMR-WCRE14p124, author = {Michael Perscheid and Tim Felgentreff and Robert Hirschfeld}, title = {Follow the Path: Debugging State Anomalies along Execution Histories}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {124--133}, doi = {}, year = {2014}, } |
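The likely-invariant comparison can be sketched with simple range invariants mined from passing runs. This is an illustrative simplification: the Path tools derive richer object properties, and the variable names here are hypothetical.

```python
def infer_ranges(passing_states):
    """Derive likely range invariants from passing test runs.

    passing_states: list of dicts, each mapping variable -> value
    observed in one passing run. Returns variable -> (min, max).
    """
    inv = {}
    for state in passing_states:
        for var, val in state.items():
            lo, hi = inv.get(var, (val, val))
            inv[var] = (min(lo, val), max(hi, val))
    return inv

def anomalies(invariants, failing_state):
    """Variables in the failing run whose values escape the inferred
    ranges; these are the state anomalies to highlight along the
    execution history."""
    return {
        v for v, val in failing_state.items()
        if v in invariants
        and not (invariants[v][0] <= val <= invariants[v][1])
    }
```

A value like a negative index that never occurred in any passing run would then be flagged as the thread to follow back in time.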
|
Pollock, Lori |
CSMR-WCRE '14: "A Case Study of Paired Interleaving ..."
A Case Study of Paired Interleaving for Evaluating Code Search Techniques
Kostadin Damevski, David Shepherd, and Lori Pollock (Virginia State University, USA; ABB, USA; University of Delaware, USA) Source code search tools are designed to help developers locate code relevant to their task. The effectiveness of a search technique often depends on properties of user queries, the code being searched, and the specific task at hand. Thus, new code search techniques should ideally be evaluated in realistic situations that closely reflect the complexity, purpose of use, and context encountered during actual search sessions. This paper explores what can be learned from using an online paired interleaving approach, originally used for evaluating internet search engines, to comparatively observe and assess the effectiveness of code search tools in the field. We present a case study in which we implemented online paired interleaving for code search, deployed the tool in an IDE for developers at multiple companies, and analyzed results from over 300 user queries during their daily software maintenance tasks. We leveraged the results to direct further improvement of a search technique, redeployed the tool, and analyzed results from over 600 queries to validate that an improvement in search was achieved in the field. We also report on the characteristics of user queries collected during the study, which are significantly different from the queries currently used in evaluations. @InProceedings{CSMR-WCRE14p54, author = {Kostadin Damevski and David Shepherd and Lori Pollock}, title = {A Case Study of Paired Interleaving for Evaluating Code Search Techniques}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {54--63}, doi = {}, year = {2014}, } |
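Paired interleaving, the evaluation mechanism borrowed from web search above, is commonly realized as team-draft interleaving: the two rankers alternately "draft" their best unseen result into one merged list, and clicks per team decide the winner. A sketch (the deterministic coin parameter stands in for the randomized tie-breaking used in practice):

```python
def team_draft_interleave(a, b, coin):
    """Team-draft interleaving of two ranked result lists.

    a, b: ranked result lists from two search techniques.
    coin: zero-argument callable; True means technique A wins a tie
    (randomized in a real deployment).
    Returns (results, teams): the interleaved list plus, per slot,
    which technique contributed it.
    """
    results, teams, seen = [], [], set()
    na = nb = 0  # results credited to each team so far
    ia = ib = 0  # cursors into a and b
    while ia < len(a) or ib < len(b):
        a_turn = (na < nb) or (na == nb and coin())
        if (a_turn and ia < len(a)) or ib >= len(b):
            doc, ia, side = a[ia], ia + 1, "A"
        else:
            doc, ib, side = b[ib], ib + 1, "B"
        if doc not in seen:  # skip results the other team already placed
            seen.add(doc)
            results.append(doc)
            teams.append(side)
            if side == "A":
                na += 1
            else:
                nb += 1
    return results, teams
```

At evaluation time, the technique whose team slots attract more clicks across many queries is judged better, without ever showing users two separate result lists.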
|
Rahman, Mohammad Masudur |
CSMR-WCRE '14: "Towards a Context-Aware IDE-Based ..."
Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions
Mohammad Masudur Rahman, Shamima Yeasmin, and Chanchal K. Roy (University of Saskatchewan, Canada) Studies show that software developers spend about 19% of their time looking for information on the web during software development and maintenance. Traditional web search forces them to leave the working environment (e.g., the IDE) and look for information in the web browser. It also does not consider the context of the problems that the developers search solutions for. The frequent switching between the web browser and the IDE is both time-consuming and distracting, and keyword-based traditional web search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that exploits the APIs provided by three popular web search engines (Google, Yahoo, and Bing) and a popular programming Q&A site, StackOverflow, and captures the content relevance, context relevance, popularity, and search engine confidence of each candidate result against the encountered programming problem. Experiments with 75 programming errors and exceptions using the proposed approach show that the inclusion of different types of contextual information associated with a given exception can enhance the recommendation accuracy. Comparisons with two existing approaches and with existing web search engines confirm that our approach can perform better than them in terms of recall, mean precision, and other performance measures with little computational cost. @InProceedings{CSMR-WCRE14p194, author = {Mohammad Masudur Rahman and Shamima Yeasmin and Chanchal K. Roy}, title = {Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {194--203}, doi = {}, year = {2014}, } |
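The meta-search scoring, which combines content relevance, context relevance, popularity, and search engine confidence, can be sketched as a weighted sum. The weights and candidate values below are illustrative assumptions; the paper does not publish this exact formula.

```python
def rank_results(candidates, weights=(0.4, 0.3, 0.2, 0.1)):
    """Rank candidate search results by a combined relevance score.

    candidates: dict url -> (content_rel, context_rel, popularity,
    engine_conf), each component assumed already normalized to [0, 1].
    weights: relative importance of the four components (illustrative).
    Returns urls sorted from best to worst.
    """
    wc, wx, wp, we = weights
    score = {
        u: wc * c + wx * x + wp * p + we * e
        for u, (c, x, p, e) in candidates.items()
    }
    return sorted(candidates, key=lambda u: -score[u])
```

A result that matches the exception's context strongly can thus outrank one a single engine merely ranked highly.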
|
Roldan-Vega, Manuel |
CSMR-WCRE '14: "NL-Based Query Refinement ..."
NL-Based Query Refinement and Contextualized Code Search Results: A User Study
Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet (Montclair State University, USA) As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Source code search tools match a developer's keyword-style or natural language query with comments and identifiers in the source code to identify relevant methods that may need to be changed or understood to complete the maintenance task. In this search process, the developer faces a number of challenges: (1) formulating a query, (2) determining if the results are relevant, and (3) if the results are not relevant, reformulating the query. In this paper, we present an NL-based results view for searching source code for maintenance that helps address these challenges by integrating multiple feedback mechanisms into the search results view: prevalence of the query words in the result set, results grouped by NL-based information as well as in a result list, and suggested alternative query words. Our search technique is implemented as an Eclipse plug-in, CONQUER, and has been empirically validated by 18 Java developers. Our results show that users prefer CONQUER over a state-of-the-art search technique, requesting customization of the interface in future query reformulation techniques. @InProceedings{CSMR-WCRE14p34, author = {Emily Hill and Manuel Roldan-Vega and Jerry Alan Fails and Greg Mallet}, title = {NL-Based Query Refinement and Contextualized Code Search Results: A User Study}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {34--43}, doi = {}, year = {2014}, } |
|
Roy, Chanchal K. |
CSMR-WCRE '14: "Automatic Ranking of Clones ..."
Automatic Ranking of Clones for Refactoring through Mining Association Rules
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider (University of Saskatchewan, Canada) In this paper, we present an in-depth empirical study on identifying clone fragments that can be important refactoring candidates. We mine association rules among clones in order to detect clone fragments that belong to the same clone class and have a tendency of changing together during software evolution. The idea is that if two or more clone fragments from the same class often change together (i.e., are likely to co-change) while preserving their similarity, they might be important candidates for refactoring. Merging such clones into one (if possible) can potentially decrease future clone maintenance effort. We define a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), and consider the cloned fragments that changed according to this pattern (i.e., the SPCP clones) as important candidates for refactoring. For the purpose of our study, we implement a prototype tool called MARC that identifies SPCP clones and mines association rules among them. The rules as well as the SPCP clones are ranked for refactoring on the basis of their change-proneness. We applied MARC to thirteen subject systems and retrieved the refactoring candidates for three types of clones (Type 1, Type 2, and Type 3) separately. Our experimental results show that SPCP clones can be considered important candidates for refactoring. Clones that do not follow SPCP either evolve independently or are rarely changed. By considering SPCP clones for refactoring, we can not only minimize refactoring effort considerably but also reduce the possibility of delayed synchronizations among clones and thus minimize inconsistencies in software systems. @InProceedings{CSMR-WCRE14p114, author = {Manishankar Mondal and Chanchal K. Roy and Kevin A. 
Schneider}, title = {Automatic Ranking of Clones for Refactoring through Mining Association Rules}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {114--123}, doi = {}, year = {2014}, } CSMR-WCRE '14: "Towards a Context-Aware IDE-Based ..." Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions Mohammad Masudur Rahman, Shamima Yeasmin, and Chanchal K. Roy (University of Saskatchewan, Canada) Studies show that software developers spend about 19% of their time looking for information on the web during software development and maintenance. Traditional web search forces them to leave the working environment (e.g., the IDE) and look for information in the web browser. It also does not consider the context of the problems that the developers search solutions for. The frequent switching between the web browser and the IDE is both time-consuming and distracting, and keyword-based traditional web search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that exploits the APIs provided by three popular web search engines (Google, Yahoo, and Bing) and a popular programming Q&A site, StackOverflow, and captures the content relevance, context relevance, popularity, and search engine confidence of each candidate result against the encountered programming problem. Experiments with 75 programming errors and exceptions using the proposed approach show that the inclusion of different types of contextual information associated with a given exception can enhance the recommendation accuracy. Comparisons with two existing approaches and with existing web search engines confirm that our approach can perform better than them in terms of recall, mean precision, and other performance measures with little computational cost. @InProceedings{CSMR-WCRE14p194, author = {Mohammad Masudur Rahman and Shamima Yeasmin and Chanchal K. 
Roy}, title = {Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {194--203}, doi = {}, year = {2014}, } Video |
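The meta search engine abstract above describes blending content relevance, context relevance, popularity, and search-engine confidence for each candidate result. A minimal sketch of such a weighted blend, with hypothetical weights and field names (not the paper's actual scoring model):

```python
# Illustrative blend of four normalized ranking signals into one score.
# The weight values and the dictionary keys below are assumptions for
# demonstration only; the paper's actual combination may differ.
def blend_score(result, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of four signals, each assumed to lie in [0, 1]."""
    w_content, w_context, w_pop, w_conf = weights
    return (w_content * result["content_relevance"]
            + w_context * result["context_relevance"]
            + w_pop * result["popularity"]
            + w_conf * result["engine_confidence"])

def rank(results):
    """Order candidate results gathered from all engines by blended score."""
    return sorted(results, key=blend_score, reverse=True)
```

A result that matches the error context well outranks one that is merely popular, which is the intent of context-aware recommendation.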
|
Saha, Ripon K. |
CSMR-WCRE '14: "An Empirical Study of Long ..."
An Empirical Study of Long Lived Bugs
Ripon K. Saha, Sarfraz Khurshid, and Dewayne E. Perry (University of Texas at Austin, USA) Bug fixing is a crucial part of software development and maintenance. A large number of bugs often indicate poor software quality, since buggy behavior not only causes failures that may be costly but also has a detrimental effect on the user’s overall experience with the software product. The impact of long-lived bugs can be even more critical, since experiencing the same bug version after version can be particularly frustrating for users. While there are many studies that investigate factors affecting bug fixing time for entire bug repositories, to the best of our knowledge, none of these studies investigates the extent of and reasons for long-lived bugs. In this paper, we analyzed long-lived bugs from five different perspectives: their proportion, severity, assignment, reasons, as well as the nature of fixes. Our study on four open-source projects shows that there are a considerable number of long-lived bugs in each system and over 90% of them adversely affect the user’s experience. The reasons for these long-lived bugs are diverse, including long assignment times and failure to recognize their importance in advance. However, many bug fixes were delayed without any specific reason. Our analysis of bug-fixing changes further shows that many long-lived bugs can be fixed quickly through careful prioritization. We believe our results will help both developers and researchers to better understand factors behind delays, improve the overall bug fixing process, and investigate analytical approaches for prioritizing bugs based on bug severity as well as expected bug fixing effort. @InProceedings{CSMR-WCRE14p144, author = {Ripon K. Saha and Sarfraz Khurshid and Dewayne E. Perry}, title = {An Empirical Study of Long Lived Bugs}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {144--153}, doi = {}, year = {2014}, } |
|
Santos, Gustavo |
CSMR-WCRE '14: "Remodularization Analysis ..."
Remodularization Analysis using Semantic Clustering
Gustavo Santos, Marco Tulio Valente, and Nicolas Anquetil (UFMG, Brazil; INRIA, France) In this paper, we report on our experience using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis. We evaluate our adaptation using six real-world remodularizations of four software systems. We report that Semantic Clustering and conceptual metrics can be used to express and explain the intention of the architects when performing common modularization operators, such as module decomposition. @InProceedings{CSMR-WCRE14p224, author = {Gustavo Santos and Marco Tulio Valente and Nicolas Anquetil}, title = {Remodularization Analysis using Semantic Clustering}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {224--233}, doi = {}, year = {2014}, } Info |
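The Semantic Clustering abstract above rests on grouping classes by vocabulary similarity. A toy illustration of the idea, with a Jaccard measure and threshold as stand-ins (the actual approach uses information-retrieval techniques such as latent semantic indexing, not this simplification):

```python
# Hypothetical sketch: group classes whose identifier vocabularies overlap
# enough. The 0.5 threshold and first-member comparison are assumptions.
def jaccard(a, b):
    """Set overlap ratio in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_vocabulary(vocab_by_class, threshold=0.5):
    clusters = []
    for name, vocab in vocab_by_class.items():
        for cluster in clusters:
            rep = vocab_by_class[cluster[0]]  # compare with cluster's first member
            if jaccard(vocab, rep) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Classes sharing an "order/token" vocabulary cluster together while an unrelated HTTP class lands in its own cluster, mirroring how vocabulary clusters expose module intent.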
|
Schneider, Kevin A. |
CSMR-WCRE '14: "Automatic Ranking of Clones ..."
Automatic Ranking of Clones for Refactoring through Mining Association Rules
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider (University of Saskatchewan, Canada) In this paper, we present an in-depth empirical study on identifying clone fragments that can be important refactoring candidates. We mine association rules among clones in order to detect clone fragments that belong to the same clone class and tend to change together during software evolution. The idea is that if two or more clone fragments from the same class often change together (i.e., are likely to co-change) preserving their similarity, they might be important candidates for refactoring. Merging such clones into one (if possible) can potentially decrease future clone maintenance effort. We define a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), and consider the cloned fragments that changed according to this pattern (i.e., the SPCP clones) as important candidates for refactoring. For the purpose of our study, we implement a prototype tool called MARC that identifies SPCP clones and mines association rules among them. The rules as well as the SPCP clones are ranked for refactoring on the basis of their change-proneness. We applied MARC on thirteen subject systems and retrieved the refactoring candidates for three types of clones (Type 1, Type 2, and Type 3) separately. Our experimental results show that SPCP clones can be considered important candidates for refactoring. Clones that do not follow SPCP either evolve independently or are rarely changed. By considering SPCP clones for refactoring, we can not only reduce refactoring effort considerably but also lower the possibility of delayed synchronizations among clones and thus minimize inconsistencies in software systems. @InProceedings{CSMR-WCRE14p114, author = {Manishankar Mondal and Chanchal K. Roy and Kevin A. 
Schneider}, title = {Automatic Ranking of Clones for Refactoring through Mining Association Rules}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {114--123}, doi = {}, year = {2014}, } |
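The clone-ranking abstract above mines association rules over co-change history. A minimal sketch of pairwise rule mining with support and confidence, as commonly defined in association rule mining (the fragment ids, thresholds, and ranking key are hypothetical; this is not the authors' MARC tool):

```python
from itertools import permutations

# Hypothetical sketch: mine pairwise rules "if fragment a changes, fragment b
# changes too" from a co-change history, then rank by confidence and support.
def mine_rules(commits, min_support=2, min_confidence=0.6):
    """commits: list of sets of clone-fragment ids changed together."""
    rules = []
    fragments = set().union(*commits)
    for a, b in permutations(fragments, 2):
        support = sum(1 for c in commits if a in c and b in c)
        occurrences_a = sum(1 for c in commits if a in c)
        if occurrences_a and support >= min_support:
            confidence = support / occurrences_a
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    # Rank strongest rules first (a stand-in for change-proneness ranking).
    return sorted(rules, key=lambda r: (r[3], r[2]), reverse=True)
```

Fragments that repeatedly co-change yield high-confidence rules and thus surface as refactoring candidates.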
|
Shepherd, David |
CSMR-WCRE '14: "A Case Study of Paired Interleaving ..."
A Case Study of Paired Interleaving for Evaluating Code Search Techniques
Kostadin Damevski, David Shepherd, and Lori Pollock (Virginia State University, USA; ABB, USA; University of Delaware, USA) Source code search tools are designed to help developers locate code relevant to their task. The effectiveness of a search technique often depends on properties of user queries, the code being searched, and the specific task at hand. Thus, new code search techniques should ideally be evaluated in realistic situations that closely reflect the complexity, purpose of use, and context encountered during actual search sessions. This paper explores what can be learned from using an online paired interleaving approach, originally used for evaluating internet search engines, to comparatively observe and assess the effectiveness of code search tools in the field. We present a case study in which we implemented online paired interleaving for code search, deployed the tool in an IDE for developers at multiple companies, and analyzed results from over 300 user queries during their daily software maintenance tasks. We leveraged the results to direct further improvement of a search technique, redeployed the tool, and analyzed results from over 600 queries to validate that an improvement in search was achieved in the field. We also report on the characteristics of the user queries collected during the study, which differ significantly from the queries currently used in evaluations. @InProceedings{CSMR-WCRE14p54, author = {Kostadin Damevski and David Shepherd and Lori Pollock}, title = {A Case Study of Paired Interleaving for Evaluating Code Search Techniques}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {54--63}, doi = {}, year = {2014}, } |
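Paired interleaving, as the abstract above adapts it from web search, merges two rankings into one result list and credits clicks to whichever ranker contributed the clicked result. A sketch of the well-known team-draft variant, assuming both rankings cover the same documents (a simplification; the paper's exact interleaving scheme may differ):

```python
import random

# Team-draft interleaving sketch: sides A and B alternate contributing their
# best not-yet-picked result; clicks are credited to the contributing side.
def team_draft_interleave(ranking_a, ranking_b, rng=random.Random(0)):
    """Both rankings are assumed to be permutations of the same documents."""
    merged, credit = [], {}
    picks_a = picks_b = 0
    while len(merged) < len(ranking_a):
        a_turn = picks_a < picks_b or (picks_a == picks_b and rng.random() < 0.5)
        source = ranking_a if a_turn else ranking_b
        doc = next(d for d in source if d not in credit)
        credit[doc] = "A" if a_turn else "B"
        merged.append(doc)
        if a_turn:
            picks_a += 1
        else:
            picks_b += 1
    return merged, credit

def score_clicks(credit, clicked_docs):
    """Tally which side's contributions the user actually clicked."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        wins[credit[doc]] += 1
    return wins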
|
Shihab, Emad |
CSMR-WCRE '14: "An Empirical Study of Bug ..."
An Empirical Study of Bug Report Field Reassignment
Xin Xia, David Lo, Ming Wen, Emad Shihab, and Bo Zhou (Zhejiang University, China; Singapore Management University, Singapore; Rochester Institute of Technology, USA) A bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), and platform, which provide important information for the bug triaging and fixing process. It is important to make sure that bug information is correct, since previous studies showed that the wrong assignment of bug report fields could increase the bug fixing time and even delay the delivery of the software. In this paper, we perform an empirical study on bug report field reassignments in open-source software projects. To better understand why bug report fields are reassigned, we manually collect 99 recent bug reports that had their fields reassigned and email their reporters and developers to ask why these fields were reassigned. Then, we perform a large-scale empirical study on 8 types of bug report field reassignments in 4 open-source software projects containing a total of 190,558 bug reports. In particular, we investigate 1) the number of bug reports whose fields get reassigned, 2) the difference in bug fixing time between bug reports whose fields get reassigned and those whose fields are not reassigned, 3) how long it takes for a field in a bug report to get reassigned, 4) the number of fields in a bug report that get reassigned, 5) the number of times a field in a bug report gets reassigned, and 6) whether the experience of bug reporters affects the reassignment of bug report fields. We find that a large number (approximately 80%) of bug reports have their fields reassigned, and the bug reports whose fields get reassigned require more time to be fixed than those without field reassignments. 
@InProceedings{CSMR-WCRE14p174, author = {Xin Xia and David Lo and Ming Wen and Emad Shihab and Bo Zhou}, title = {An Empirical Study of Bug Report Field Reassignment}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {174--183}, doi = {}, year = {2014}, } |
|
Siket, István |
CSMR-WCRE '14: "Test Suite Reduction for Fault ..."
Test Suite Reduction for Fault Detection and Localization: A Combined Approach
László Vidács, Árpád Beszédes, Dávid Tengeri, István Siket, and Tibor Gyimóthy (MTA-SZTE Research Group on Artificial Intelligence, Hungary; University of Szeged, Hungary) The relation between test suites and actual faults in software is of critical importance for timely product release. There are two particularly critical properties of test suites to this end: fault localization capability, which characterizes the effort of finding the actually defective program elements, and fault detection capability, which measures how probable their manifestation and detection are in the first place. While there are well-established methods to predict fault detection capability (by measuring code coverage, for instance), characterization of fault localization is an emerging research topic. In this work, we investigate the effect of different test reduction methods on the performance of fault localization and detection techniques. We also provide new combined methods that incorporate both localization and detection aspects. We empirically evaluate the methods first by measuring detection and localization metrics of test suites with various reduction sizes, followed by how reduced test suites perform with actual faults. We experiment with SIR programs traditionally used in fault localization research, and extend the case study with large industrial software systems including GCC and WebKit. @InProceedings{CSMR-WCRE14p204, author = {László Vidács and Árpád Beszédes and Dávid Tengeri and István Siket and Tibor Gyimóthy}, title = {Test Suite Reduction for Fault Detection and Localization: A Combined Approach}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {204--213}, doi = {}, year = {2014}, } |
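Detection-oriented test suite reduction, which the abstract above builds on and extends with localization aspects, is classically done with a greedy coverage heuristic. A minimal sketch of that baseline (the combined detection-plus-localization methods of the paper are not shown; names here are illustrative):

```python
# Greedy coverage-based reduction sketch: repeatedly keep the test that
# covers the most not-yet-covered program elements, then stop when
# everything coverable is covered.
def greedy_reduce(coverage):
    """coverage: dict test-name -> set of covered program elements."""
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break  # defensive: remaining tests add no coverage
        selected.append(best)
        uncovered -= coverage[best]
    return selected
```

The reduced suite preserves total coverage with fewer tests; the paper's point is that such reductions also change fault localization capability, which this heuristic ignores.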
|
Staron, Miroslaw |
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers’ requests. In this dynamic environment, the continuous modification of software code can create risks for software developers; when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We have conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, complexity and the number of revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as release-wise. A measurement system for systematic risk assessment has been introduced to the two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, } |
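The finding above is that superposing two metrics, complexity and revision count per file, identifies risky areas. One simple way to superpose them is to normalize each to [0, 1] and sum; this sketch is an assumption about the combination, not the companies' actual measurement system:

```python
# Hypothetical superposition sketch: rank files by normalized complexity
# plus normalized revision count, so files high on both rise to the top.
def risk_ranking(files):
    """files: dict name -> (complexity, revisions)."""
    max_c = max(c for c, _ in files.values()) or 1
    max_r = max(r for _, r in files.values()) or 1
    score = {f: c / max_c + r / max_r for f, (c, r) in files.items()}
    return sorted(score, key=score.get, reverse=True)
```

A file that is both complex and frequently revised outranks files that score high on only one dimension, matching the intuition that churn in complex code is where defects concentrate.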
|
Szőke, Gábor |
CSMR-WCRE '14: "Anti-pattern Detection with ..."
Anti-pattern Detection with Model Queries: A Comparison of Approaches
Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc (Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary) Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local-search and incremental evaluation. We provide in-depth comparison of these techniques on the source code of 17 Java projects using queries taken from refactoring operations in different usage profiles. Our results show that general purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. @InProceedings{CSMR-WCRE14p293, author = {Zoltán Ujhelyi and Ákos Horváth and Dániel Varró and Norbert István Csiszár and Gábor Szőke and László Vidács and Rudolf Ferenc}, title = {Anti-pattern Detection with Model Queries: A Comparison of Approaches}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {293--302}, doi = {}, year = {2014}, } |
|
Tan, Hee Beng Kuan |
CSMR-WCRE '14: "Detecting Infeasible Branches ..."
Detecting Infeasible Branches Based on Code Patterns
Sun Ding, Hongyu Zhang, and Hee Beng Kuan Tan (Nanyang Technological University, Singapore; Tsinghua University, China) Infeasible branches are program branches that can never be exercised regardless of the inputs of the program. Detecting infeasible branches is important to many software engineering tasks such as test case generation and test coverage measurement. Applying full-scale symbolic evaluation to infeasible branch detection could be very costly, especially for a large software system. In this work, we propose a code pattern based method for detecting infeasible branches. We first introduce two general patterns that can characterize the source code containing infeasible branches. We then develop a tool, called Pattern-based method for Infeasible branch Detection (PIND), to detect infeasible branches based on the discovered code patterns. PIND only performs symbolic evaluation for the branches that exhibit the identified code patterns, thereby significantly reducing the number of symbolic evaluations required. We evaluate PIND from two aspects: accuracy and efficiency. The experimental results show that PIND can effectively and efficiently detect infeasible branches in real-world Java and Android programs. We also explore the application of PIND in measuring test case coverage. @InProceedings{CSMR-WCRE14p74, author = {Sun Ding and Hongyu Zhang and Hee Beng Kuan Tan}, title = {Detecting Infeasible Branches Based on Code Patterns}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {74--83}, doi = {}, year = {2014}, } |
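To make the notion of an infeasible branch concrete: a branch guarded by contradictory constraints can never execute. The toy checker below handles only conjunctions of integer bounds on a single variable, a drastic simplification of the symbolic evaluation PIND applies to pattern-matched branches:

```python
# Toy feasibility check: intersect interval constraints on one integer
# variable; an empty interval means the guarded branch is dead code.
# This is an illustration only, not PIND's pattern-based analysis.
def branch_feasible(constraints):
    """constraints: list of (op, value) with op in {'>', '<', '>=', '<='}."""
    lo, hi = float("-inf"), float("inf")
    for op, v in constraints:
        if op == ">":
            lo = max(lo, v + 1)   # assumes an integer-valued variable
        elif op == ">=":
            lo = max(lo, v)
        elif op == "<":
            hi = min(hi, v - 1)
        elif op == "<=":
            hi = min(hi, v)
    return lo <= hi
```

For example, a guard like `x > 5 && x < 3` yields an empty interval, so the branch is infeasible.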
|
Tengeri, Dávid |
CSMR-WCRE '14: "Test Suite Reduction for Fault ..."
Test Suite Reduction for Fault Detection and Localization: A Combined Approach
László Vidács, Árpád Beszédes, Dávid Tengeri, István Siket, and Tibor Gyimóthy (MTA-SZTE Research Group on Artificial Intelligence, Hungary; University of Szeged, Hungary) The relation between test suites and actual faults in software is of critical importance for timely product release. There are two particularly critical properties of test suites to this end: fault localization capability, which characterizes the effort of finding the actually defective program elements, and fault detection capability, which measures how probable their manifestation and detection are in the first place. While there are well-established methods to predict fault detection capability (by measuring code coverage, for instance), characterization of fault localization is an emerging research topic. In this work, we investigate the effect of different test reduction methods on the performance of fault localization and detection techniques. We also provide new combined methods that incorporate both localization and detection aspects. We empirically evaluate the methods first by measuring detection and localization metrics of test suites with various reduction sizes, followed by how reduced test suites perform with actual faults. We experiment with SIR programs traditionally used in fault localization research, and extend the case study with large industrial software systems including GCC and WebKit. @InProceedings{CSMR-WCRE14p204, author = {László Vidács and Árpád Beszédes and Dávid Tengeri and István Siket and Tibor Gyimóthy}, title = {Test Suite Reduction for Fault Detection and Localization: A Combined Approach}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {204--213}, doi = {}, year = {2014}, } |
|
Tian, Yuan |
CSMR-WCRE '14: "Automated Construction of ..."
Automated Construction of a Software-Specific Word Similarity Database
Yuan Tian, David Lo, and Julia Lawall (Singapore Management University, Singapore; INRIA, France; LIP6, France) Many automated software engineering approaches, including code search, bug report categorization, and duplicate bug report detection, measure similarities between two documents by analyzing natural language contents. Often different words are used to express the same meaning, and thus measuring similarities using exact matching of words is insufficient. To solve this problem, past studies have shown the need to measure the similarities between pairs of words. To meet this need, the natural language processing community has built WordNet, a manually constructed lexical database that records semantic relations among words and can be used to measure how similar two words are. However, WordNet is a general-purpose resource and often does not contain software-specific words. Also, the meanings of words in WordNet often differ from their meanings in a software engineering context. Thus, there is a need for a software-specific WordNet-like resource that can measure similarities of words. In this work, we propose an automated approach that builds a software-specific WordNet-like resource, named WordSimSE, by leveraging the textual contents of posts in StackOverflow. Our approach measures the similarity of words by computing the similarities of the weighted co-occurrences of these words with three types of words in the textual corpus. We have evaluated our approach on a set of software-specific words and compared our approach with an existing WordNet-based technique (WordNet_res) in returning the top-k most similar words. Human judges were used to evaluate the effectiveness of the two techniques. We find that WordNet_res returns no result for 55% of the queries. For the remaining queries, WordNet_res returns significantly poorer results. 
@InProceedings{CSMR-WCRE14p44, author = {Yuan Tian and David Lo and Julia Lawall}, title = {Automated Construction of a Software-Specific Word Similarity Database}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {44--53}, doi = {}, year = {2014}, } |
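The WordSimSE abstract above derives word similarity from weighted co-occurrences in a corpus. A minimal sketch of the underlying idea, representing each word by the counts of its co-occurring words and comparing those count vectors by cosine similarity (the actual approach weights three types of co-occurring words; this sketch treats all contexts uniformly):

```python
import math
from collections import Counter

# Sketch: build co-occurrence count vectors from tokenized sentences, then
# compare two words by cosine similarity of their vectors. Corpus and
# tokenization are hypothetical simplifications of the StackOverflow data.
def cooccurrence_vectors(sentences):
    vectors = {}
    for sentence in sentences:
        for word in sentence:
            ctx = vectors.setdefault(word, Counter())
            ctx.update(w for w in sentence if w != word)
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0
```

Words that appear in the same kinds of contexts ("throw" and "raise" around "exception") come out more similar than unrelated words, even though they never match exactly.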
|
Tsantalis, Nikolaos |
CSMR-WCRE '14: "Unification and Refactoring ..."
Unification and Refactoring of Clones
Giri Panamoottil Krishnan and Nikolaos Tsantalis (Concordia University, Canada) Code duplication has been recognized as a potentially serious problem having a negative impact on the maintainability, comprehensibility, and evolution of software systems. In the past, several techniques have been developed for the detection and management of software clones. Existing code duplication can be eliminated by extracting the common functionality into a single module. However, the unification and refactoring of software clones is a challenging problem, since clones usually go through several modifications after their initial introduction. In this paper we present an approach for the unification and refactoring of software clones that overcomes the limitations of previous approaches. More specifically, our approach is able to detect and parameterize non-trivial differences between the clones. Moreover, it can find an optimal mapping between the statements of the clones that minimizes the number of differences. We compared the proposed technique with a competitive clone refactoring tool and concluded that our approach is able to find a significantly larger number of refactorable clones. @InProceedings{CSMR-WCRE14p104, author = {Giri Panamoottil Krishnan and Nikolaos Tsantalis}, title = {Unification and Refactoring of Clones}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {104--113}, doi = {}, year = {2014}, } |
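The clone unification abstract above hinges on mapping the statements of two clones so that their differences are minimized and can be parameterized. A rough illustration using sequence matching from the standard library (the paper computes an optimal mapping with its own technique; this difflib-based pairing is only a stand-in):

```python
import difflib

# Sketch: pair up identical statements between two clone fragments and
# collect the mismatched spans, which correspond to the differences a
# unification would have to parameterize.
def map_statements(clone_a, clone_b):
    matcher = difflib.SequenceMatcher(a=clone_a, b=clone_b)
    mapped, diffs = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            mapped.extend(zip(clone_a[i1:i2], clone_b[j1:j2]))
        else:
            diffs.append((clone_a[i1:i2], clone_b[j1:j2]))
    return mapped, diffs
```

For two near-identical loops differing only in a variable name, the mapping pairs the common statements and isolates the single differing statement, which is exactly the kind of non-trivial difference a refactoring would turn into a parameter.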
|
Ujhelyi, Zoltán |
CSMR-WCRE '14: "Anti-pattern Detection with ..."
Anti-pattern Detection with Model Queries: A Comparison of Approaches
Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc (Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary) Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local-search and incremental evaluation. We provide in-depth comparison of these techniques on the source code of 17 Java projects using queries taken from refactoring operations in different usage profiles. Our results show that general purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. @InProceedings{CSMR-WCRE14p293, author = {Zoltán Ujhelyi and Ákos Horváth and Dániel Varró and Norbert István Csiszár and Gábor Szőke and László Vidács and Rudolf Ferenc}, title = {Anti-pattern Detection with Model Queries: A Comparison of Approaches}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {293--302}, doi = {}, year = {2014}, } |
|
Valente, Marco Tulio |
CSMR-WCRE '14: "Extracting Relative Thresholds ..."
Extracting Relative Thresholds for Source Code Metrics
Paloma Oliveira, Marco Tulio Valente, and Fernando Paim Lima (UFMG, Brazil; IFMG, Brazil) Establishing credible thresholds is a central challenge for promoting source code metrics as an effective instrument to control the internal quality of software systems. To address this challenge, we propose the concept of relative thresholds for evaluating metrics data following heavy-tailed distributions. The proposed thresholds are relative because they assume that metric thresholds should be followed by most source code entities, but that it is also natural to have a number of entities in the ``long-tail'' that do not follow the defined limits. In the paper, we describe an empirical method for extracting relative thresholds from real systems. We also report a study on applying this method in a corpus with 106 systems. Based on the results of this study, we argue that the proposed thresholds express a balance between real and idealized design practices. @InProceedings{CSMR-WCRE14p254, author = {Paloma Oliveira and Marco Tulio Valente and Fernando Paim Lima}, title = {Extracting Relative Thresholds for Source Code Metrics}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {254--263}, doi = {}, year = {2014}, } CSMR-WCRE '14: "Remodularization Analysis ..." Remodularization Analysis using Semantic Clustering Gustavo Santos, Marco Tulio Valente, and Nicolas Anquetil (UFMG, Brazil; INRIA, France) In this paper, we report on our experience using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis. We evaluate our adaptation using six real-world remodularizations of four software systems. 
We report that Semantic Clustering and conceptual metrics can be used to express and explain the intention of the architects when performing common modularization operators, such as module decomposition. @InProceedings{CSMR-WCRE14p224, author = {Gustavo Santos and Marco Tulio Valente and Nicolas Anquetil}, title = {Remodularization Analysis using Semantic Clustering}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {224--233}, doi = {}, year = {2014}, } Info |
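The relative thresholds of the Oliveira et al. abstract above say, informally, "p% of entities should respect limit k, with a long tail allowed". One plausible way to extract k from a corpus is to take the p-th percentile of the metric in each system and summarize across systems; the percentile choice and median summary below are assumptions for illustration, not the paper's extraction method:

```python
# Hypothetical relative-threshold extraction sketch: per-system p-th
# percentile of a metric, then the median of those values across systems
# as the corpus-wide limit k in "p% of entities should have metric <= k".
def relative_threshold(systems, p=90):
    """systems: list of lists of metric values, one inner list per system."""
    def percentile(values, q):
        values = sorted(values)
        idx = min(len(values) - 1, int(round(q / 100 * (len(values) - 1))))
        return values[idx]
    per_system = sorted(percentile(vals, p) for vals in systems)
    return per_system[len(per_system) // 2]  # median across systems
```

Because only a percentile of each system is used, outliers in the heavy tail (e.g., one class with an extreme metric value) do not inflate the extracted threshold.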
|
Varró, Dániel |
CSMR-WCRE '14: "Anti-pattern Detection with ..."
Anti-pattern Detection with Model Queries: A Comparison of Approaches
Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc (Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary) Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local-search and incremental evaluation. We provide in-depth comparison of these techniques on the source code of 17 Java projects using queries taken from refactoring operations in different usage profiles. Our results show that general purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. @InProceedings{CSMR-WCRE14p293, author = {Zoltán Ujhelyi and Ákos Horváth and Dániel Varró and Norbert István Csiszár and Gábor Szőke and László Vidács and Rudolf Ferenc}, title = {Anti-pattern Detection with Model Queries: A Comparison of Approaches}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {293--302}, doi = {}, year = {2014}, } |
|
Vidács, László |
CSMR-WCRE '14: "Test Suite Reduction for Fault ..."
Test Suite Reduction for Fault Detection and Localization: A Combined Approach
László Vidács, Árpád Beszédes, Dávid Tengeri, István Siket, and Tibor Gyimóthy (MTA-SZTE Research Group on Artificial Intelligence, Hungary; University of Szeged, Hungary) The relation between test suites and actual faults in software is of critical importance for timely product release. There are two particularly critical properties of test suites to this end: fault localization capability, which characterizes the effort of finding the actually defective program elements, and fault detection capability, which measures how probable their manifestation and detection are in the first place. While there are well-established methods to predict fault detection capability (by measuring code coverage, for instance), characterization of fault localization is an emerging research topic. In this work, we investigate the effect of different test reduction methods on the performance of fault localization and detection techniques. We also provide new combined methods that incorporate both localization and detection aspects. We empirically evaluate the methods first by measuring detection and localization metrics of test suites with various reduction sizes, followed by how reduced test suites perform with actual faults. We experiment with SIR programs traditionally used in fault localization research, and extend the case study with large industrial software systems including GCC and WebKit. @InProceedings{CSMR-WCRE14p204, author = {László Vidács and Árpád Beszédes and Dávid Tengeri and István Siket and Tibor Gyimóthy}, title = {Test Suite Reduction for Fault Detection and Localization: A Combined Approach}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {204--213}, doi = {}, year = {2014}, } CSMR-WCRE '14: "Anti-pattern Detection with ..." 
Anti-pattern Detection with Model Queries: A Comparison of Approaches Zoltán Ujhelyi, Ákos Horváth, Dániel Varró, Norbert István Csiszár, Gábor Szőke, László Vidács, and Rudolf Ferenc (Budapest University of Technology and Economics, Hungary; University of Szeged, Hungary; Refactoring 2011, Hungary; MTA-SZTE Research Group on Artificial Intelligence, Hungary) Program queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries. Our paper investigates the use of the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by three general-purpose model query techniques based on native Java code, local-search and incremental evaluation. We provide in-depth comparison of these techniques on the source code of 17 Java projects using queries taken from refactoring operations in different usage profiles. Our results show that general purpose model queries outperform hand-coded queries by 2-3 orders of magnitude, while there is a 5-10 times increase in memory consumption and model load time. In addition, measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. @InProceedings{CSMR-WCRE14p293, author = {Zoltán Ujhelyi and Ákos Horváth and Dániel Varró and Norbert István Csiszár and Gábor Szőke and László Vidács and Rudolf Ferenc}, title = {Anti-pattern Detection with Model Queries: A Comparison of Approaches}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {293--302}, doi = {}, year = {2014}, } |
|
Wang, Xinyu |
CSMR-WCRE '14: "Towards More Accurate Multi-label ..."
Towards More Accurate Multi-label Software Behavior Learning
Xin Xia, Yang Feng, David Lo, Zhenyu Chen, and Xinyu Wang (Zhejiang University, China; Nanjing University, China; Singapore Management University, Singapore) In a modern software system, when a program fails, a crash report containing an execution trace is sent to the software vendor for diagnosis. The failure behind a crash report can be caused by multiple types of faults simultaneously. Many large companies, such as Baidu, organize a team to analyze these failures and classify them into multiple labels (i.e., multiple types of faults). However, it is time-consuming and difficult for developers to manually analyze these failures and come up with appropriate fault labels. In this paper, we automatically classify a failure into multiple types of faults using a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs and show that MLL-GA achieves average F-measures of 0.6078 to 0.8665. We also compare our algorithm with Ml.KNN and show that, on average across the 6 datasets, MLL-GA improves the average F-measure of Ml.KNN by 14.43%. @InProceedings{CSMR-WCRE14p134, author = {Xin Xia and Yang Feng and David Lo and Zhenyu Chen and Xinyu Wang}, title = {Towards More Accurate Multi-label Software Behavior Learning}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {134--143}, doi = {}, year = {2014}, } |
|
Wen, Ming |
CSMR-WCRE '14: "An Empirical Study of Bug ..."
An Empirical Study of Bug Report Field Reassignment
Xin Xia, David Lo, Ming Wen, Emad Shihab, and Bo Zhou (Zhejiang University, China; Singapore Management University, Singapore; Rochester Institute of Technology, USA) A bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), and platform, which provide important information for the bug triaging and fixing process. It is important to ensure that this information is correct, since previous studies showed that the wrong assignment of bug report fields can increase the bug fixing time and even delay the delivery of the software. In this paper, we perform an empirical study on bug report field reassignments in open-source software projects. To better understand why bug report fields are reassigned, we manually collected 99 recent bug reports that had their fields reassigned and emailed their reporters and developers asking why these fields were reassigned. Then, we perform a large-scale empirical study on 8 types of bug report field reassignments in 4 open-source software projects containing a total of 190,558 bug reports. In particular, we investigate 1) the number of bug reports whose fields get reassigned, 2) the difference in bug fixing time between bug reports whose fields get reassigned and those whose fields are not reassigned, 3) the time it takes for a field in a bug report to be reassigned, 4) the number of fields in a bug report that get reassigned, 5) the number of times a field in a bug report gets reassigned, and 6) whether the experience of bug reporters affects the reassignment of bug report fields. We find that a large proportion (approximately 80%) of bug reports have their fields reassigned, and that bug reports whose fields get reassigned require more time to be fixed than those without field reassignments. 
@InProceedings{CSMR-WCRE14p174, author = {Xin Xia and David Lo and Ming Wen and Emad Shihab and Bo Zhou}, title = {An Empirical Study of Bug Report Field Reassignment}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {174--183}, doi = {}, year = {2014}, } |
|
Wikström, Erik |
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers’ requests. In this dynamic environment, the continuous modification of software code can create risks for software developers: when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, the complexity and the number of revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as per release. A measurement system for systematic risk assessment has been introduced at the two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, } |
|
Wranker, Johan |
CSMR-WCRE '14: "Identifying Risky Areas of ..."
Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report
Vard Antinyan, Miroslaw Staron, Wilhelm Meding, Per Österström, Erik Wikström, Johan Wranker, Anders Henriksson, and Jörgen Hansson (Chalmers, Sweden; University of Gothenburg, Sweden; Ericsson, Sweden; Volvo, Sweden) Modern software development relies on incremental delivery to facilitate quick response to customers’ requests. In this dynamic environment, the continuous modification of software code can create risks for software developers: when developing a new feature increment, the added or modified code may contain fault-prone or difficult-to-maintain elements. The outcome of these risks can be defective software or decreased development velocity. This study presents a method to identify the risky areas and assess the risk when developing software code in a Lean/Agile environment. We conducted an action research project in two large companies, Ericsson AB and Volvo Group Truck Technology. During the study we measured a set of code properties and investigated their influence on risk. The results show that the superposition of two metrics, the complexity and the number of revisions of a source code file, can effectively enable identification and assessment of the risk. We also illustrate how this kind of assessment can be successfully used by software developers to manage risks on a weekly basis as well as per release. A measurement system for systematic risk assessment has been introduced at the two companies. @InProceedings{CSMR-WCRE14p154, author = {Vard Antinyan and Miroslaw Staron and Wilhelm Meding and Per Österström and Erik Wikström and Johan Wranker and Anders Henriksson and Jörgen Hansson}, title = {Identifying Risky Areas of Software Code in Agile/Lean Software Development: An Industrial Experience Report}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {154--163}, doi = {}, year = {2014}, } |
|
Xia, Xin |
CSMR-WCRE '14: "Towards More Accurate Multi-label ..."
Towards More Accurate Multi-label Software Behavior Learning
Xin Xia, Yang Feng, David Lo, Zhenyu Chen, and Xinyu Wang (Zhejiang University, China; Nanjing University, China; Singapore Management University, Singapore) In a modern software system, when a program fails, a crash report containing an execution trace is sent to the software vendor for diagnosis. The failure behind a crash report can be caused by multiple types of faults simultaneously. Many large companies, such as Baidu, organize a team to analyze these failures and classify them into multiple labels (i.e., multiple types of faults). However, it is time-consuming and difficult for developers to manually analyze these failures and come up with appropriate fault labels. In this paper, we automatically classify a failure into multiple types of faults using a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs and show that MLL-GA achieves average F-measures of 0.6078 to 0.8665. We also compare our algorithm with Ml.KNN and show that, on average across the 6 datasets, MLL-GA improves the average F-measure of Ml.KNN by 14.43%. @InProceedings{CSMR-WCRE14p134, author = {Xin Xia and Yang Feng and David Lo and Zhenyu Chen and Xinyu Wang}, title = {Towards More Accurate Multi-label Software Behavior Learning}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {134--143}, doi = {}, year = {2014}, } CSMR-WCRE '14: "An Empirical Study of Bug ..." An Empirical Study of Bug Report Field Reassignment Xin Xia, David Lo, Ming Wen, Emad Shihab, and Bo Zhou (Zhejiang University, China; Singapore Management University, Singapore; Rochester Institute of Technology, USA) A bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), and platform, which provide important information for the bug triaging and fixing process. 
It is important to ensure that this information is correct, since previous studies showed that the wrong assignment of bug report fields can increase the bug fixing time and even delay the delivery of the software. In this paper, we perform an empirical study on bug report field reassignments in open-source software projects. To better understand why bug report fields are reassigned, we manually collected 99 recent bug reports that had their fields reassigned and emailed their reporters and developers asking why these fields were reassigned. Then, we perform a large-scale empirical study on 8 types of bug report field reassignments in 4 open-source software projects containing a total of 190,558 bug reports. In particular, we investigate 1) the number of bug reports whose fields get reassigned, 2) the difference in bug fixing time between bug reports whose fields get reassigned and those whose fields are not reassigned, 3) the time it takes for a field in a bug report to be reassigned, 4) the number of fields in a bug report that get reassigned, 5) the number of times a field in a bug report gets reassigned, and 6) whether the experience of bug reporters affects the reassignment of bug report fields. We find that a large proportion (approximately 80%) of bug reports have their fields reassigned, and that bug reports whose fields get reassigned require more time to be fixed than those without field reassignments. @InProceedings{CSMR-WCRE14p174, author = {Xin Xia and David Lo and Ming Wen and Emad Shihab and Bo Zhou}, title = {An Empirical Study of Bug Report Field Reassignment}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {174--183}, doi = {}, year = {2014}, } |
|
Xie, Shuai |
CSMR-WCRE '14: "An Empirical Study on the ..."
An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies
Shuai Xie, Foutse Khomh, Ying Zou, and Iman Keivanloo (Queen's University, Canada; Polytechnique Montréal, Canada) Copy-and-paste activities create clone groups in software systems. The evolution of a clone group across the history of a software system is termed a clone genealogy. During the evolution of a clone group, developers may change the location of the code fragments in the clone group. The type of the clone group may also change (e.g., from Type-1 to Type-2). These two phenomena are referred to as clone migration and clone mutation, respectively. Previous studies have found that clone migration occurs frequently in software systems, and suggested that clone migration can induce faults in a software system. In this paper, we examine how clone migration phenomena affect the risk of faults in clone segments, clone groups, and clone genealogies from three long-lived software systems: JBOSS, APACHE-ANT, and ARGOUML. Results show that: (1) migrated clone segments, clone groups, and clone genealogies are not equally fault-prone; (2) when a clone mutation occurs during a clone migration, the risk of faults in the migrated clone increases; (3) migrating a clone that has not been changed for a long period of time is risky. @InProceedings{CSMR-WCRE14p94, author = {Shuai Xie and Foutse Khomh and Ying Zou and Iman Keivanloo}, title = {An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {94--103}, doi = {}, year = {2014}, } |
|
Yeasmin, Shamima |
CSMR-WCRE '14: "Towards a Context-Aware IDE-Based ..."
Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions
Mohammad Masudur Rahman, Shamima Yeasmin, and Chanchal K. Roy (University of Saskatchewan, Canada) Studies show that software developers spend about 19% of their time looking for information on the web during software development and maintenance. Traditional web search forces them to leave the working environment (e.g., the IDE) and look for information in a web browser. It also does not consider the context of the problems that the developers search solutions for. The frequent switching between the web browser and the IDE is both time-consuming and distracting, and keyword-based traditional web search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that exploits the APIs provided by three popular web search engines (Google, Yahoo, and Bing) and a popular programming Q&A site, StackOverflow, and captures the content relevance, context relevance, popularity, and search engine confidence of each candidate result against the encountered programming problem. Experiments with 75 programming errors and exceptions using the proposed approach show that including different types of contextual information associated with a given exception can enhance the recommendation accuracy. Comparisons with two existing approaches and with existing web search engines confirm that our approach can outperform them in terms of recall, mean precision, and other performance measures, at little computational cost. @InProceedings{CSMR-WCRE14p194, author = {Mohammad Masudur Rahman and Shamima Yeasmin and Chanchal K. Roy}, title = {Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {194--203}, doi = {}, year = {2014}, } Video |
|
Zaidman, Andy |
CSMR-WCRE '14: "Web API Growing Pains: Stories ..."
Web API Growing Pains: Stories from Client Developers and Their Code
Tiago Espinha, Andy Zaidman, and Hans-Gerhard Gross (Delft University of Technology, Netherlands) Web APIs provide a systematic and extensible approach for application-to-application interaction. Developers using web APIs are forced to accompany the API providers in their software evolution tasks. In order to understand the distress caused by this imposition on web API client developers, we perform semi-structured interviews with six such developers. We also investigate how major web API providers organize their API evolution, and we explore how this affects source code changes in their clients. Our exploratory study of the Twitter, Google Maps, Facebook, and Netflix web APIs analyzes the state of web API evolution practices and provides insight into the impact of service evolution on client software. Our study is complemented with a set of observations regarding best practices for web API evolution. @InProceedings{CSMR-WCRE14p84, author = {Tiago Espinha and Andy Zaidman and Hans-Gerhard Gross}, title = {Web API Growing Pains: Stories from Client Developers and Their Code}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {84--93}, doi = {}, year = {2014}, } |
|
Zhang, Hongyu |
CSMR-WCRE '14: "Detecting Infeasible Branches ..."
Detecting Infeasible Branches Based on Code Patterns
Sun Ding, Hongyu Zhang, and Hee Beng Kuan Tan (Nanyang Technological University, Singapore; Tsinghua University, China) Infeasible branches are program branches that can never be exercised, regardless of the inputs of the program. Detecting infeasible branches is important to many software engineering tasks such as test case generation and test coverage measurement. Applying full-scale symbolic evaluation to infeasible branch detection can be very costly, especially for a large software system. In this work, we propose a code pattern based method for detecting infeasible branches. We first introduce two general patterns that characterize source code containing infeasible branches. We then develop a tool, called Pattern-based method for Infeasible branch Detection (PIND), to detect infeasible branches based on the discovered code patterns. PIND performs symbolic evaluation only for the branches that exhibit the identified code patterns, thereby significantly reducing the number of symbolic evaluations required. We evaluate PIND on two aspects: accuracy and efficiency. The experimental results show that PIND can effectively and efficiently detect infeasible branches in real-world Java and Android programs. We also explore the application of PIND in measuring test case coverage. @InProceedings{CSMR-WCRE14p74, author = {Sun Ding and Hongyu Zhang and Hee Beng Kuan Tan}, title = {Detecting Infeasible Branches Based on Code Patterns}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {74--83}, doi = {}, year = {2014}, } |
|
Zhou, Bo |
CSMR-WCRE '14: "An Empirical Study of Bug ..."
An Empirical Study of Bug Report Field Reassignment
Xin Xia, David Lo, Ming Wen, Emad Shihab, and Bo Zhou (Zhejiang University, China; Singapore Management University, Singapore; Rochester Institute of Technology, USA) A bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), and platform, which provide important information for the bug triaging and fixing process. It is important to ensure that this information is correct, since previous studies showed that the wrong assignment of bug report fields can increase the bug fixing time and even delay the delivery of the software. In this paper, we perform an empirical study on bug report field reassignments in open-source software projects. To better understand why bug report fields are reassigned, we manually collected 99 recent bug reports that had their fields reassigned and emailed their reporters and developers asking why these fields were reassigned. Then, we perform a large-scale empirical study on 8 types of bug report field reassignments in 4 open-source software projects containing a total of 190,558 bug reports. In particular, we investigate 1) the number of bug reports whose fields get reassigned, 2) the difference in bug fixing time between bug reports whose fields get reassigned and those whose fields are not reassigned, 3) the time it takes for a field in a bug report to be reassigned, 4) the number of fields in a bug report that get reassigned, 5) the number of times a field in a bug report gets reassigned, and 6) whether the experience of bug reporters affects the reassignment of bug report fields. We find that a large proportion (approximately 80%) of bug reports have their fields reassigned, and that bug reports whose fields get reassigned require more time to be fixed than those without field reassignments. 
@InProceedings{CSMR-WCRE14p174, author = {Xin Xia and David Lo and Ming Wen and Emad Shihab and Bo Zhou}, title = {An Empirical Study of Bug Report Field Reassignment}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {174--183}, doi = {}, year = {2014}, } |
|
Zou, Ying |
CSMR-WCRE '14: "An Empirical Study on the ..."
An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies
Shuai Xie, Foutse Khomh, Ying Zou, and Iman Keivanloo (Queen's University, Canada; Polytechnique Montréal, Canada) Copy-and-paste activities create clone groups in software systems. The evolution of a clone group across the history of a software system is termed a clone genealogy. During the evolution of a clone group, developers may change the location of the code fragments in the clone group. The type of the clone group may also change (e.g., from Type-1 to Type-2). These two phenomena are referred to as clone migration and clone mutation, respectively. Previous studies have found that clone migration occurs frequently in software systems, and suggested that clone migration can induce faults in a software system. In this paper, we examine how clone migration phenomena affect the risk of faults in clone segments, clone groups, and clone genealogies from three long-lived software systems: JBOSS, APACHE-ANT, and ARGOUML. Results show that: (1) migrated clone segments, clone groups, and clone genealogies are not equally fault-prone; (2) when a clone mutation occurs during a clone migration, the risk of faults in the migrated clone increases; (3) migrating a clone that has not been changed for a long period of time is risky. @InProceedings{CSMR-WCRE14p94, author = {Shuai Xie and Foutse Khomh and Ying Zou and Iman Keivanloo}, title = {An Empirical Study on the Fault-Proneness of Clone Migration in Clone Genealogies}, booktitle = {Proc.\ CSMR-WCRE}, publisher = {IEEE}, pages = {94--103}, doi = {}, year = {2014}, } |
95 authors