
2013 10th Working Conference on Mining Software Repositories (MSR), May 18–19, 2013, San Francisco, CA, USA

MSR 2013 – Proceedings


Preface

Title Page

Message from the Chairs
This is the MSR 2013 preface, including the welcome message from the chairs and the list of organizing, steering, and program committee members.

Keynote

What Is Software Development Productivity, Anyway? (Keynote)
Gail C. Murphy
(University of British Columbia, Canada)
Businesses and consumers all want more software faster. The seemingly ever-increasing demand for more software suggests the need to not only increase production capabilities but also to produce more with the resources available for production. In other words, software development productivity needs to increase. But what is software development productivity anyway? In this talk, I will explore various ways in which productivity, both in general and for software development, has been characterized and will explore ways in which mining software repository information can help accelerate both software development productivity and innovation.

Bug Triaging

Why So Complicated? Simple Term Filtering and Weighting for Location-Based Bug Report Assignment Recommendation
Ramin Shokripour, John Anvik, Zarinah M. Kasirun, and Sima Zamani
(University of Malaya, Malaysia; Central Washington University, USA)
Large software development projects receive many bug reports, and each of these reports needs to be triaged. An important step in the triage process is the assignment of the report to a developer. Most previous efforts towards improving bug report assignment have focused on activity-based approaches. We address some of the limitations of activity-based approaches by proposing a two-phase location-based approach, where bug report assignment recommendations are based on the predicted location of the bug. The proposed approach uses a noun extraction process on several information sources to determine bug location information and a simple term weighting scheme to provide a bug report assignment recommendation. We found that by using a location-based approach, we achieved accuracies of 89.41% and 59.76% when recommending five developers for the Eclipse and Mozilla projects, respectively.
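The flavor of such a location-based, term-weighted recommender can be sketched in a few lines. This is a minimal illustration, not the paper's actual method (which adds noun extraction and a two-phase pipeline); all data structures and names here are hypothetical.

```python
from collections import Counter, defaultdict

def recommend_assignees(report_terms, file_terms, file_fixers, k=5):
    """Rank developers by how strongly a bug report's terms point to
    source files they previously fixed (simple term-frequency weighting).

    report_terms: list of terms extracted from the bug report
    file_terms:   {file path -> list of terms found in that file}
    file_fixers:  {file path -> list of developers who fixed it before}
    """
    scores = defaultdict(float)
    report = Counter(report_terms)
    for path, terms in file_terms.items():
        # A file's weight is the overlap between report terms and file terms.
        weight = sum(report[t] for t in terms)
        if not weight:
            continue  # file not implicated by the report's vocabulary
        for dev in file_fixers.get(path, []):
            scores[dev] += weight
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [dev for dev, _ in ranked[:k]]
```

For example, a report mentioning "parser" and "token" would rank developers who previously fixed parser-related files above those who only touched unrelated code.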
Which Work-Item Updates Need Your Response?
Debdoot Mukherjee and Malika Garg
(IBM Research, India; IIT Delhi, India)
Work-item notifications alert the team collaborating on a work-item about any update to the work-item (e.g., addition of comments, change in status). However, as software professionals get involved in multiple tasks across projects, they are inundated by too many notifications from the work-item tool. Users complain that they often miss the notifications that solicit their response amid the crowd of mostly useless ones. We investigate the severity of this problem by studying the work-item repositories of two large collaborative projects and conducting a user study with one of the project teams. We find that, on average, only 1 out of every 5 notifications received by a user requires a response. We propose TWINY, a machine-learning-based approach to predict whether a notification will prompt any action from its recipient. Such a prediction can help to suitably mark up notifications and to decide whether a notification needs to be sent out immediately or bundled into a message digest. We conduct empirical studies to evaluate the efficacy of different classification techniques in this setting. We find that incremental learning algorithms are ideally suited, and ensemble methods appear to give the best results in terms of prediction accuracy.
Bug Report Assignee Recommendation using Activity Profiles
Hoda Naguib, Nitesh Narayan, Bernd Brügge, and Dina Helal
(TU Munich, Germany; German University in Cairo, Egypt)
One question that frequently arises within the context of artifacts stored in a bug tracking repository is: who should work on this bug report? A number of approaches exist to semi-automatically identify and recommend developers, e.g., using machine learning techniques and social network analysis. In this work, we propose a new approach for assignee recommendation that leverages user activities in a bug tracking repository. Within the bug tracking repository, an activity profile is created for each user from the history of all their activities (i.e., review, assign, and resolve). This profile, to some extent, indicates the user's role, expertise, and involvement in the project. These activities influence and contribute to the identification and ranking of suitable assignees. To evaluate our work, we apply it to bug reports of three different projects. Our results indicate that the proposed approach is able to achieve an average hit ratio of 88%. Compared with an LDA-SVM-based assignee recommendation technique, the proposed approach performs better.

MSR Goes Mobile

Asking for (and about) Permissions Used by Android Apps
Ryan Stevens, Jonathan Ganz, Vladimir Filkov, Premkumar Devanbu, and Hao Chen
(UC Davis, USA)
Security policies, which specify what applications are allowed to do, are notoriously difficult to specify correctly. Many applications have been found to request over-liberal permissions. On mobile platforms, this might prevent a cautious user from installing an otherwise harmless application or, even worse, increase the attack surface in vulnerable applications. As a result of such difficulties, programmers frequently ask about permissions in on-line fora. Our goal is to gain insight into both the misuse of permissions and the discussions of permissions in on-line fora. We analyze about 10,000 free apps from popular Android markets and find a significant sub-linear relationship between the popularity of a permission and the number of times it is misused. We also study the relationship between permission use and the number of questions about the permission on StackOverflow. Finally, we study the effect of the influence of a permission (the functionality that it controls) and the interference of a permission (the number of other permissions that influence the same classes) on the occurrence of both permission misuse and permission discussions on StackOverflow.
Retrieving and Analyzing Mobile Apps Feature Requests from Online Reviews
Claudia Iacob and Rachel Harrison
(Oxford Brookes University, UK)
Mobile app reviews are valuable repositories of ideas coming directly from app users. Such ideas span various topics, and in this paper we show that 23.3% of them represent feature requests, i.e. comments through which users either suggest new features for an app or express preferences for the re-design of already existing features of an app. One of the challenges app developers face when trying to make use of such feedback is the massive amount of available reviews. This makes it difficult to identify specific topics and recurring trends across reviews. Through this work, we aim to support such processes by designing MARA (Mobile App Review Analyzer), a prototype for automatic retrieval of mobile app feature requests from online reviews. The design of the prototype is a) informed by an investigation of the ways users express feature requests through reviews, b) developed around a set of pre-defined linguistic rules, and c) evaluated on a large sample of online reviews. The results of the evaluation were further analyzed using Latent Dirichlet Allocation for identifying common topics across feature requests, and the results of this analysis are reported in this paper.
Gerrit Software Code Review Data from Android
Murtuza Mukadam, Christian Bird, and Peter C. Rigby
(Concordia University, Canada; Microsoft Research, USA)
Over the past decade, a number of tools and systems have been developed to manage various aspects of the software development lifecycle. Until now, tool-supported code review, an important aspect of software development, has been largely ignored. With the advent of open source code review tools such as Gerrit, along with projects that use them, code review data is now available for collection, analysis, and triangulation with other software development data. In this paper, we extract Android peer review data from Gerrit. We describe the Android peer review process, the reverse engineering of the Gerrit JSON API, our data mining and cleaning methodology, and our database schema, and we provide an example of how the data can be used to answer an empirical software engineering question. The database is available for use by the research community.
Who Does What during a Code Review? Datasets of OSS Peer Review Repositories
Kazuki Hamasaki, Raula Gaikovina Kula, Norihiro Yoshida, A. E. Camargo Cruz, Kenji Fujiwara, and Hajimu Iida
(NAIST, Japan; Osaka University, Japan)
We present four datasets that focus on the general roles of OSS peer review members. With data mined from both an integrated peer review system and code source repositories, our datasets comprise peer review data that was automatically recorded. Using the Android project as a case study, we describe our extraction methodology, the datasets, and their application in three separate studies. Our datasets are available online at http://sdlab.naist.jp/reviewmining/

MSR Challenge

Why, When, and What: Analyzing Stack Overflow Questions by Topic, Type, and Code
Miltiadis Allamanis and Charles Sutton
(University of Edinburgh, UK)
Questions from Stack Overflow provide a unique opportunity to gain insight into which programming concepts are the most confusing. We present a topic modeling analysis that combines question concepts, types, and code. Using topic modeling, we are able to associate programming concepts and identifiers (like the String class) with particular types of questions, such as "how to perform encoding".
Deficient Documentation Detection: A Methodology to Locate Deficient Project Documentation using Topic Analysis
Joshua Charles Campbell, Chenlei Zhang, Zhen Xu, Abram Hindle, and James Miller
(University of Alberta, Canada)
A project's documentation is the primary source of information for developers using that project. With hundreds of thousands of programming-related questions posted on programming Q&A websites, such as Stack Overflow, we question whether the developer-written documentation provides enough guidance for programmers. In this study, we wanted to know if there are any topics which are inadequately covered by the project documentation. We combined questions from Stack Overflow and documentation from the PHP and Python projects. Then, we applied topic analysis to this data using latent Dirichlet allocation (LDA), and found topics in Stack Overflow that did not overlap the project documentation. We successfully located topics that had deficient project documentation. We also found topics in need of tutorial documentation that were outside of the scope of the PHP or Python projects, such as MySQL and HTML.
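The core idea of comparing question topics against project documentation can be sketched compactly. This sketch deliberately swaps the paper's LDA topic models for plain term overlap to stay dependency-free; the topic names, terms, and threshold below are hypothetical, not from the study.

```python
def deficient_topics(question_topics, doc_text, threshold=0.2):
    """Flag question topics whose top terms are poorly covered by the
    project documentation (a crude stand-in for comparing LDA topics).

    question_topics: {topic name -> list of the topic's top terms}
    doc_text:        the project documentation as one string
    threshold:       minimum fraction of topic terms the docs must cover
    """
    doc_terms = set(doc_text.lower().split())
    flagged = []
    for name, top_terms in question_topics.items():
        # Coverage = fraction of the topic's top terms found in the docs.
        coverage = sum(t.lower() in doc_terms for t in top_terms) / len(top_terms)
        if coverage < threshold:
            flagged.append(name)
    return flagged
```

A topic whose top terms never appear in the documentation (e.g., MySQL terms against PHP string docs) would be flagged as a candidate for new or improved documentation.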
Detecting API Usage Obstacles: A Study of iOS and Android Developer Questions
Wei Wang and Michael W. Godfrey
(University of Waterloo, Canada)
Software frameworks provide sets of generic functionalities that can later be customized for a specific task. When developers invoke API methods in a framework, they often encounter obstacles in finding the correct usage of the API, let alone employing best practices. Previous research addresses this line of questions by mining API usage patterns to induce API usage templates, by conducting and compiling interviews of developers, and by inferring correlations among APIs. In this paper, we analyze API-related posts regarding iOS and Android development from a Q&A website (stackoverflow.com). Assuming that API-related posts are primarily about API usage obstacles, we find several iOS and Android API classes that appear to be particularly likely to challenge developers, even after we factor out API usage hotspots inferred by modelling API usage of open source iOS and Android applications. For each API with usage obstacles, we further apply a topic mining tool to posts that are tagged with the API, and we discover several repetitive scenarios in which API usage obstacles occur. We consider our work a stepping stone towards understanding API usage challenges based on forum-based input from a multitude of developers, input that is prohibitively expensive to collect through interviews. Our method helps to motivate future research in API usage, and can allow designers of platforms such as iOS and Android to better understand the problems developers have in using their platforms, and to make corresponding improvements.
Encouraging User Behaviour with Achievements: An Empirical Study
Scott Grant and Buddy Betts
(Queen's University, Canada; OUYA, USA)
Stack Overflow, a question and answer website, uses a reward system called badges to publicly reward users for their contributions to the community. Badges are used alongside a reputation score to reward positive behaviour by relating a user's site identity with their perceived expertise and respect in the community. A greater number of badges associated with a user profile in some way indicates a higher level of authority, leading to a natural incentive for users to attempt to achieve as many badges as possible. In this study, we examine the publicly available logs for Stack Overflow to examine three of these badges in detail. We look at the effect of one badge in context on an individual user level and at the global scope of three related badges across all users by mining user behaviour around the time that the badge is awarded. This analysis supports the claim that badges can be used to influence user behaviour by demonstrating one instance of an increase in user activity related to a badge immediately before it is awarded when compared to the period afterwards.
Is Programming Knowledge Related to Age? An Exploration of Stack Overflow
Patrick Morrison and Emerson Murphy-Hill
(North Carolina State University, USA)
Becoming an expert at programming is thought to take an estimated 10,000 hours of deliberate practice. But what happens after that? Do programming experts continue to develop, do they plateau, or is there a decline at some point? A diversity of opinion exists on this matter, but many seem to think that aging brings a decline in the adoption and absorption of new programming knowledge. We develop several research questions on this theme and draw on data from StackOverflow to address them. The goal of this research is to support career planning and staff development for programmers by identifying age-related trends in StackOverflow data. We observe that programmer reputation scores increase relative to age well into the 50s, that programmers in their 30s tend to focus on fewer areas relative to those younger or older, and that there is not a strong correlation between age and scores in specific knowledge areas.
A Discriminative Model Approach for Suggesting Tags Automatically for Stack Overflow Questions
Avigit K. Saha, Ripon K. Saha, and Kevin A. Schneider
(University of Saskatchewan, Canada; University of Texas at Austin, USA)
Annotating documents with keywords or ‘tags’ is useful for categorizing documents and helping users find a document efficiently and quickly. Question and answer (Q&A) sites also use tags to categorize questions to help ensure that their users are aware of questions related to their areas of expertise or interest. However, someone asking a question may not necessarily know the best way to categorize or tag the question, and automatically tagging or categorizing a question is a challenging task. Since a Q&A site may host millions of questions with tags and other data, this information can be used as a training and test dataset for approaches that automatically suggest tags for new questions. In this paper, we mine data from millions of questions from the Q&A site Stack Overflow, and using a discriminative model approach, we automatically suggest question tags to help a questioner choose appropriate tags for eliciting a response.
Exploring Activeness of Users in QA Forums
Vibha Singhal Sinha, Senthil Mani, and Monika Gupta
(IBM Research, India)
The success of a Q&A forum depends on the volume of content (questions and answers) and the quality of that content (e.g., whether the questions asked are relevant and the answers provided are correct). Community participation is essential to create and curate content. Since their inception in 2008, Stack Exchange forums have engaged a large number of users to create a rich repository of good-quality questions and answers. In this paper, we investigate the activeness of users in the Stack Exchange network, particularly from the perspective of content creation. We also attempt to measure how the forums' incentive mechanism has enabled user activeness. Further, we investigate how users have diffused to other parts of the Stack Exchange network over time, thereby bootstrapping new forums.
A Study of Innovation Diffusion through Link Sharing on Stack Overflow
Carlos Gómez, Brendan Cleary, and Leif Singer
(University of Victoria, Canada)
It is poorly understood how developers discover and adopt software development innovations such as tools, libraries, frameworks, or web sites that support developers. Yet being aware of and choosing appropriate tools and components can have a significant impact on the outcome of a software project. In our study, we investigate link sharing on Stack Overflow to gain insights into how software developers discover and disseminate innovations. We find that link sharing is a significant phenomenon on Stack Overflow, that Stack Overflow is an important resource for the dissemination of software development innovations, and that it is part of a larger interconnected network of online resources used and referenced by developers. This knowledge can guide researchers and practitioners who build tools and services that support software developers in the exploration, discovery, and adoption of software development innovations.
Making Sense of Online Code Snippets
Siddharth Subramanian and Reid Holmes
(University of Waterloo, Canada)
Stack Overflow contains a large number of high-quality source code snippets, whose quality has been verified by users marking them as solving a specific problem. However, Stack Overflow treats source code snippets as plain text, and its searches surface snippets as they would any other text. Unfortunately, plain text does not capture the structural qualities of these snippets; for example, snippets frequently refer to specific APIs (e.g., Android), but by treating the snippets as text, linkage to the Android API is not always apparent. We perform snippet analysis to extract structural information from the short plain-text snippets that are often found on Stack Overflow. This analysis identifies 253,137 method calls and type references from 21,250 Stack Overflow code snippets. We show how identifying these structural relationships from snippets could perform better than lexical search over code blocks in practice.
Building Reputation in StackOverflow: An Empirical Investigation
Amiangshu Bosu, Christopher S. Corley, Dustin Heaton, Debarshi Chatterji, Jeffrey C. Carver, and Nicholas A. Kraft
(University of Alabama, USA)
StackOverflow (SO) contributors are recognized by reputation scores. Earning a high reputation score requires technical expertise and sustained effort. We analyzed the SO data from four perspectives to understand the dynamics of reputation building on SO. The results of our analysis provide guidance to new SO contributors who want to earn high reputation scores quickly. In particular, the results indicate that the following activities can help to build reputation quickly: answering questions related to tags with lower expertise density, answering questions promptly, being the first one to answer a question, being active during off peak hours, and contributing to diverse areas.
An Exploratory Analysis of Mobile Development Issues using Stack Overflow
Mario Linares-Vásquez, Bogdan Dit, and Denys Poshyvanyk
(College of William and Mary, USA)
Question & answer (Q&A) websites, such as Stack Overflow (SO), are widely used by developers to find and provide answers to technical issues and concerns in software development. Mobile development is no exception. In the latest SO dump, more than 400K questions were labeled with tags related to mobile technologies. Although previous work has analyzed the main topics and trends in SO threads, there are no studies devoted specifically to mobile development. In this paper we used topic modeling techniques to extract hot topics from mobile-development-related questions. Our findings suggest that most of the questions involve general concerns and compatibility issues, while more specific topics, such as crash reports and database connections, appear in a smaller set of questions.
Answering Questions about Unanswered Questions of Stack Overflow
Muhammad Asaduzzaman, Ahmed Shah Mashiyat, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada; University of Toronto, Canada)
Community-based question answering services accumulate large volumes of knowledge through the voluntary services of people across the globe. Stack Overflow is an example of such a service that targets developers and software engineers. In general, questions in Stack Overflow are answered in a very short time. However, we found that the number of unanswered questions has increased significantly in the past two years. Understanding why questions remain unanswered can help information seekers improve the quality of their questions, increase their chances of getting answers, and better decide when to use Stack Overflow services. In this paper, we mine data on unanswered questions from Stack Overflow. We then conduct a qualitative study to categorize unanswered questions, which reveals characteristics that would be difficult to find otherwise. Finally, we conduct an experiment to determine whether we can predict how long a question will remain unanswered in Stack Overflow.

Changes and Fixes

Will My Patch Make It? And How Fast?: Case Study on the Linux Kernel
Yujuan Jiang, Bram Adams, and Daniel M. German
(Polytechnique Montréal, Canada; University of Victoria, Canada)
The Linux kernel follows an extremely distributed reviewing and integration process supported by 130 developer mailing lists and a hierarchy of dozens of Git repositories for version control. Since not every patch makes it in, and some of those that do require far more reviewing and integration effort than others, developers, reviewers, and integrators need support for estimating which patches are worth spending effort on and which do not stand a chance. This paper cross-links and analyzes eight years of patch reviews from the kernel mailing lists and committed patches from the Git repository to understand which patches are accepted and how long it takes those patches to reach the end user. We found that 33% of the patches make it into a Linux release, and that most of them need 3 to 6 months to do so. Furthermore, patches developed by more experienced developers are accepted more easily and are reviewed and integrated faster. Additionally, reviewing time is impacted by submission time, the number of subsystems affected by the patch, and the number of requested reviewers.
Linux Variability Anomalies: What Causes Them and How Do They Get Fixed?
Sarah Nadi, Christian Dietrich, Reinhard Tartler, Richard C. Holt, and Daniel Lohmann
(University of Waterloo, Canada; University of Erlangen-Nuremberg, Germany)
The Linux kernel is one of the largest configurable open source software systems implementing static variability. In Linux, variability is scattered over three different artifacts: source code files, Kconfig files, and Makefiles. Previous work detected inconsistencies between these artifacts that led to anomalies in the intended variability of Linux. We call these variability anomalies. However, there has been no work done to analyze how these variability anomalies are introduced in the first place, and how they get fixed. In this work, we provide an analysis of the causes and fixes of variability anomalies in Linux. We first perform an exploratory case study that uses an existing set of patches which solve variability anomalies to identify patterns for their causes. The observations we make from this dataset allow us to develop four research questions which we then answer in a confirmatory case study on the scope of the whole Linux kernel. We show that variability anomalies exist for several releases in the kernel before they get fixed, and that contrary to our initial suspicion, typos in feature names do not commonly cause these anomalies. Our results show that variability anomalies are often introduced through incomplete patches that change Kconfig definitions without properly propagating these changes to the rest of the system. Anomalies are then commonly fixed through changes to the code rather than to Kconfig files.
The Impact of Tangled Code Changes
Kim Herzig and Andreas Zeller
(Microsoft Research, UK; Saarland University, Germany)
When interacting with version control systems, developers often commit unrelated or loosely related code changes in a single transaction. When analyzing the version history, such tangled changes will make all changes to all modules appear related, possibly compromising the resulting analyses through noise and bias. In an investigation of five open-source Java projects, we found up to 15% of all bug fixes to consist of multiple tangled changes. Using a multi-predictor approach to untangle changes, we show that on average at least 16.6% of all source files are incorrectly associated with bug reports. We recommend better change organization to limit the impact of tangled changes.
A Dataset from Change History to Support Evaluation of Software Maintenance Tasks
Bogdan Dit, Andrew Holtzhauer, Denys Poshyvanyk, and Huzefa Kagdi
(College of William and Mary, USA; Wichita State University, USA)
Approaches that support software maintenance need to be evaluated and compared against existing ones in order to demonstrate their usefulness in practice. However, the lack of well-established sets of benchmarks often leads to situations where these approaches are evaluated using different datasets, which results in biased comparisons. In this data paper we describe and make publicly available a set of benchmarks from six Java applications, which can be used in the evaluation of various software engineering (SE) tasks, such as feature location and impact analysis. These datasets consist of textual descriptions of change requests, the locations in the source code where they were implemented, and execution traces. Four of the benchmarks were already used in several SE research papers, and two of them are new. In addition, we describe in detail the methodology used for generating these benchmarks and provide a suite of tools in order to encourage other researchers to validate our datasets and generate new benchmarks for other subject software systems. Our online appendix: http://www.cs.wm.edu/semeru/data/msr13/
Apache Commits: Social Network Dataset
Alexander C. MacLean and Charles D. Knutson
(Brigham Young University, USA)
Building non-trivial software is a social endeavor. Therefore, understanding the social network of developers is key to the study of software development organizations. We present a graph representation of the commit behavior of developers within the Apache Software Foundation for 2010 and 2011. Relationships between developers in the network represent collaborative commit behavior. Several similarity and summary metrics have been pre-calculated. The data, along with the tools that were used to create it and some further discussion, can be found at: http://sequoia.cs.byu.edu/lab/?page=artifacts/apacheGraphs.

Software Evolution

Understanding the Evolution of Type-3 Clones: An Exploratory Study
Ripon K. Saha, Chanchal K. Roy, Kevin A. Schneider, and Dewayne E. Perry
(University of Texas at Austin, USA; University of Saskatchewan, Canada)
Understanding the evolution of clones is important both for understanding the maintenance implications of clones and for building a robust clone management system. To this end, researchers have already conducted a number of studies analyzing the evolution of clones, mostly focusing on Type-1 and Type-2 clones. However, although there are a significant number of Type-3 clones in software systems, we know little about how they actually evolve. In this paper, we perform an exploratory study on the evolution of Type-1, Type-2, and Type-3 clones in six open source software systems written in two different programming languages, and compare the results with a previous study to better understand the evolution of Type-3 clones. Our results show that although Type-3 clones are more likely to change inconsistently, the absolute number of consistently changed Type-3 clone classes is greater than that of Type-1 and Type-2. Type-3 clone classes also have a lifespan similar to that of Type-1 and Type-2 clones. In addition, a considerable number of Type-1 and Type-2 clones convert into Type-3 clones during evolution. Therefore, it is important to manage Type-3 clones properly to limit their negative impact. However, automated clone management techniques such as notifying developers about clone changes or linked editing should be chosen carefully due to the inconsistent nature of Type-3 clones.
An Empirical Study of the Fault-Proneness of Clone Mutation and Clone Migration
Shuai Xie, Foutse Khomh, and Ying Zou
(Queen's University, Canada; Polytechnique Montréal, Canada)
When implementing new features into a software system, developers may duplicate several lines of code to reuse existing code segments. This action creates code clones in the software system. The literature has documented different types of code clones (e.g., Type-1, Type-2, and Type-3). Once created, code clones evolve as they are modified during both the development and maintenance phases of the software system. The evolution of code clones across the revisions of a software system is known as a clone genealogy. Existing work has investigated the fault-proneness of Type-1 and Type-2 clone genealogies. In this study, we investigate clone genealogies containing Type-3 clones. We analyze three long-lived software systems, Apache Ant, ArgoUML, and JBoss, which are all written in Java. Using the NiCad clone detection tool, we build clone genealogies and examine two evolutionary phenomena on clones: the mutation of the type of a clone during the evolution of a system, and the migration of clone segments across the repositories of a software system. Results show that 1) mutation and migration occur frequently in software systems; 2) the mutation of a clone group to Type-2 or Type-3 clones increases the risk of faults; and 3) increasing the distance between code segments in a clone group also increases the risk of faults.
Intensive Metrics for the Study of the Evolution of Open Source Projects: Case Studies from Apache Software Foundation Projects
Santiago Gala-Pérez, Gregorio Robles, Jesús M. González-Barahona, and Israel Herraiz
(Apache Software Foundation, Spain; Universidad Rey Juan Carlos, Spain; Universidad Politécnica de Madrid, Spain)
Based on the empirical evidence that the ratio of email messages in public mailing lists to versioning system commits has remained relatively constant throughout the history of the Apache Software Foundation (ASF), the goal of this paper is to study what can be inferred from such a metric for projects of the ASF. We have found that the metric seems to be an intensive metric, as it is independent of the size of the project, its activity, or the number of developers, and remains relatively independent of the technology or functional area of the project. Our analysis provides evidence that the metric is related to the technical effervescence and popularity of a project, and as such is a good candidate for measuring its healthy evolution. Other, similar metrics, such as the ratio of developer messages to commits and the ratio of issue tracker messages to commits, are studied for several projects as well, in order to see whether they have similar characteristics.
Article Search
A Preliminary Investigation of Using Age and Distance Measures in the Detection of Evolutionary Couplings
Abdulkareem Alali, Brian Bartman, Christian D. Newman, and Jonathan I. Maletic
(Kent State University, USA)
An initial study of using two measures to improve the accuracy of evolutionary couplings uncovered from version history is presented. Two measures, namely the age of a pattern and the distance among items within a pattern, are defined and used with the traditional methods for computing evolutionary couplings. The goal is to reduce the number of false positives (i.e., inaccurate or irrelevant claims of coupling). Initial observations are presented that suggest these measures have the potential to improve the results of computing evolutionary couplings.
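As a rough illustration of the general idea (not the authors' implementation), the sketch below mines co-change pairs from a toy commit history and filters them by minimum support and by pattern age, i.e., how recently the pair last co-changed. All file names and thresholds are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Toy commit history: each commit is the set of files changed together.
commits = [
    {"a.c", "b.c"}, {"a.c", "b.c"}, {"a.c", "c.c"},
    {"a.c", "b.c"}, {"d.c"}, {"a.c", "b.c"},
]

pair_count = Counter()
last_seen = {}  # most recent commit index in which a pair co-changed
for idx, files in enumerate(commits):
    for pair in combinations(sorted(files), 2):
        pair_count[pair] += 1
        last_seen[pair] = idx

# Traditional filter: keep pairs with enough co-change support; the "age"
# measure then discards stale patterns that last co-changed long ago.
MIN_SUPPORT, MAX_AGE = 2, 3
latest = len(commits) - 1
couplings = {
    pair: n for pair, n in pair_count.items()
    if n >= MIN_SUPPORT and latest - last_seen[pair] <= MAX_AGE
}
print(couplings)  # only ('a.c', 'b.c') survives both filters
```

Here the pair ('a.c', 'c.c') is dropped for low support; a frequent pair that had not co-changed within the last few commits would be dropped by the age filter even with high support.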
Article Search

Analysis of Bug Reports

Search-Based Duplicate Defect Detection: An Industrial Experience
Mehdi Amoui, Nilam Kaushik, Abraham Al-Dabbagh, Ladan Tahvildari, Shimin Li, and Weining Liu
(University of Waterloo, Canada; BlackBerry, Canada)
Duplicate defects put extra overhead on software organizations, as the cost and effort of managing duplicate defects are mainly redundant. Due to the use of natural language and the various ways to describe a defect, it is usually hard to identify duplicate defects automatically. This problem is more severe in large software organizations with huge defect repositories and a massive number of defect reporters. Ideally, an efficient tool should prevent duplicate reports from reaching developers by automatically detecting and/or filtering duplicates. It should also be able to offer defect triagers a list of the top-N similar bug reports and allow them to compare the similarity of incoming bug reports with the suggested duplicates. This demand has motivated us to design and develop a search-based duplicate bug detection framework at BlackBerry. The approach follows a generalized process model to evaluate and tune the performance of the system in a systematic way. We have applied the framework to software projects at BlackBerry, in addition to the Mozilla defect repository. The experimental results demonstrate the performance of the developed framework and highlight the high impact of parameter tuning on its performance.
Article Search
A Contextual Approach towards More Accurate Duplicate Bug Report Detection
Anahita Alipour, Abram Hindle, and Eleni Stroulia
(University of Alberta, Canada)
Bug-tracking and issue-tracking systems tend to be populated with bugs, issues, or tickets written by a wide variety of bug reporters, with different levels of training and knowledge about the system being discussed. Many bug reporters lack the skills, vocabulary, knowledge, or time to efficiently search the issue tracker for similar issues. As a result, issue trackers are often full of duplicate issues and bugs, and bug triaging is time consuming and error prone. Many researchers have approached the bug-deduplication problem using off-the-shelf information-retrieval tools, such as BM25F used by Sun et al. In our work, we extend the state of the art by investigating how contextual information, relying on our prior knowledge of software quality, software architecture, and system-development (LDA) topics, can be exploited to improve bug-deduplication. We demonstrate the effectiveness of our contextual bug-deduplication method on the bug repository of the Android ecosystem. Based on this experience, we conclude that researchers should not ignore the context of software engineering when using IR tools for deduplication.
Article Search
Bug Resolution Catalysts: Identifying Essential Non-committers from Bug Repositories
Senthil Mani, Seema Nagar, Debdoot Mukherjee, Ramasuri Narayanam, Vibha Singhal Sinha, and Amit A. Nanavati
(IBM Research, India)
Bugs are inevitable in software projects. Resolving bugs is the primary activity in software maintenance. Developers, who fix bugs through code changes, are naturally important participants in bug resolution. However, there are other participants in these projects who do not perform any code commits. They can be reporters who report bugs, people with deep technical know-how of the software who provide valuable insights on how to solve a bug, or bug-tossers who re-assign bugs to the right set of developers. Even though all of them act on the bugs by tossing and commenting, not all of them may be crucial for bug resolution. In this paper, we formally define essential non-committers and try to identify these bug resolution catalysts. We empirically study 98,304 bug reports across 11 open source and 5 commercial software projects to validate the existence of such catalysts. We propose a network analysis based approach to construct a Minimal Essential Graph that identifies such people in a project. Finally, we suggest ways of leveraging this information for bug triaging and bug report summarization.
Article Search
The Eclipse and Mozilla Defect Tracking Dataset: A Genuine Dataset for Mining Bug Information
Ahmed Lamkanfi, Javier Pérez, and Serge Demeyer
(University of Antwerp, Belgium)
The analysis of bug reports is an important subfield within the mining software repositories community. It explores the rich data available in defect tracking systems to uncover interesting and actionable information about the bug triaging process. While bug data is readily accessible from systems like Bugzilla and JIRA, a common database schema and a curated dataset could significantly enhance future research because it allows for easier replication. Consequently, in this paper we propose the Eclipse and Mozilla Defect Tracking Dataset, a representative database of bug data, filtered to contain only genuine defects (i.e., no feature requests) and designed to cover the whole bug-triage life cycle (i.e., store all intermediate actions). We have used this dataset ourselves for predicting bug severity, for studying bug-fixing time and for identifying erroneously assigned components.
Article Search

Software Ecosystems, Big Data

Mining Source Code Repositories at Massive Scale using Language Modeling
Miltiadis Allamanis and Charles Sutton
(University of Edinburgh, UK)
The tens of thousands of high-quality open source software projects on the Internet raise the exciting possibility of studying software development by finding patterns across truly large source code repositories. This could enable new tools for developing code, encouraging reuse, and navigating large projects. In this paper, we build the first giga-token probabilistic language model of source code, based on 352 million lines of Java. This is 100 times the scale of the pioneering work by Hindle et al. The giga-token model is significantly better at the code suggestion task than previous models. More broadly, our approach provides a new "lens" for analyzing software projects, enabling new complexity metrics based on statistical analysis of large corpora. We call these metrics data-driven complexity metrics. We propose new metrics that measure the complexity of a code module and the topical centrality of a module to a software project. In particular, it is possible to distinguish reusable utility classes from classes that are part of a program's core logic based solely on general information theoretic criteria.
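A minimal sketch of the underlying intuition, far from giga-token scale: an n-gram language model assigns lower cross-entropy to token sequences that resemble its training corpus, which is the basis of the proposed data-driven complexity metrics. The corpus and token sequences below are toy examples, not drawn from the paper:

```python
import math
from collections import Counter

# Tiny bigram language model over code tokens with add-one smoothing.
corpus = "for i in range ( n ) : total += x [ i ]".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = len(unigrams)

def cross_entropy(tokens):
    """Average negative log2 probability per bigram; lower means the
    sequence is more 'predictable' relative to the corpus."""
    bits = 0.0
    pairs = list(zip(tokens, tokens[1:]))
    for a, b in pairs:
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)
        bits -= math.log2(p)
    return bits / len(pairs)

familiar = "for i in range ( n )".split()
unusual = "n ( range in i for )".split()
# The idiomatic ordering scores lower (less surprising) than the scrambled one.
assert cross_entropy(familiar) < cross_entropy(unusual)
```

Scaled up to a giga-token corpus, the same per-token surprisal can serve as a complexity score for a module, which is the "lens" the abstract describes.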
Article Search
Do Software Categories Impact Coupling Metrics?
Lucas Batista Leite de Souza and Marcelo de Almeida Maia
(UFU, Brazil)
Software metrics are a valuable mechanism for assessing the quality of software systems. Metrics can help automate the analysis of the growing data available in software repositories. Coupling metrics are a class of software metrics that has been used extensively since the seventies to evaluate software properties related to maintenance, evolution, and reuse tasks. For example, several works have shown that coupling metrics can be used to assess the reusability of software artifacts available in repositories. However, thresholds for software metrics that indicate adequate coupling levels are still a matter of discussion. In this paper, we investigate the impact of software categories on the coupling level of software systems. We have found that different categories may have different levels of coupling, suggesting that special attention is needed when comparing software systems in different categories and when using predefined thresholds already available in the literature.
Article Search
The Maven Repository Dataset of Metrics, Changes, and Dependencies
Steven Raemaekers, Arie van Deursen, and Joost Visser
(Software Improvement Group, Netherlands; TU Delft, Netherlands)
We present the Maven Dependency Dataset (MDD), containing metrics, changes, and dependencies of 148,253 jar files. Metrics and changes have been calculated at the level of individual methods, classes, and packages of multiple library versions. A complete call graph is also presented, which includes call, inheritance, containment, and historical relationships between all units of the entire repository. In this paper, we describe our dataset and the methodology used to obtain it. We present different conceptual views of MDD, and we also describe limitations and data quality issues that researchers using this data should be aware of.
Article Search
A Historical Dataset for the Gnome Ecosystem
Mathieu Goeminne, Maëlick Claes, and Tom Mens
(University of Mons, Belgium)
We present a dataset of the open source software ecosystem Gnome from a social point of view. We have collected historical data about the contributors to all Gnome projects stored on git.gnome.org, taking into account the problem of identity matching, and associating different activity types to the contributors. This type of information is very useful to complement the traditional, source-code related information one can obtain by mining and analyzing the actual source code. The dataset can be obtained at https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge.
Article Search
A Network of Rails: A Graph Dataset of Ruby on Rails and Associated Projects
Patrick Wagstrom, Corey Jergensen, and Anita Sarma
(IBM Research, USA; University of Nebraska-Lincoln, USA)
Software projects, whether open source, proprietary, or a combination thereof, rarely exist in isolation. Rather, most projects build on a network of people and ideas from dozens, hundreds, or even thousands of other projects. Using the GitHub APIs it is possible to extract these relationships for millions of users and projects. In this paper we present a dataset of a large network of open source projects centered around Ruby on Rails. This dataset provides insight into the relationships between Ruby on Rails and an ecosystem involving 1116 projects. To facilitate understanding of this data in the context of relationships between projects, users, and their activities, it is provided as a graph database suitable for assessing network properties of the community and individuals within those communities and can be found at https://github.com/pridkett/gitminer-data-rails.
Article Search
The GHTorrent Dataset and Tool Suite
Georgios Gousios
(TU Delft, Netherlands)
During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this paper, we present the dataset details and construction process, and outline the challenges and research opportunities emerging from it.
Article Search

Bug/Change Classification and Localization

Discovering, Reporting, and Fixing Performance Bugs
Adrian Nistor, Tian Jiang, and Lin Tan
(University of Illinois at Urbana-Champaign, USA; University of Waterloo, Canada)
Software performance is critical for how users perceive the quality of software products. Performance bugs---programming errors that cause significant performance degradation---lead to poor user experience and low system throughput. Designing effective techniques to address performance bugs requires a deep understanding of how performance bugs are discovered, reported, and fixed. In this paper, we study how performance bugs are discovered, reported to developers, and fixed by developers, and compare the results with those for non-performance bugs. We study performance and non-performance bugs from three popular code bases: Eclipse JDT, Eclipse SWT, and Mozilla. First, we find little evidence that fixing performance bugs has a higher chance to introduce new functional bugs than fixing non-performance bugs, which implies that developers may not need to be overly concerned about fixing performance bugs. Second, although fixing performance bugs is about as error-prone as fixing non-performance bugs, fixing performance bugs is more difficult than fixing non-performance bugs, indicating that developers need better tool support for fixing performance bugs and testing performance bug patches. Third, unlike many non-performance bugs, a large percentage of performance bugs are discovered through code reasoning, not through users observing the negative effects of the bugs (e.g., performance degradation) or through profiling. The result suggests that techniques to help developers reason about performance, better test oracles, and better profiling techniques are needed for discovering performance bugs.
Article Search
Improving Bug Localization using Correlations in Crash Reports
Shaohua Wang, Foutse Khomh, and Ying Zou
(Queen's University, Canada; Polytechnique Montréal, Canada)
Nowadays, many software organizations rely on automatic problem reporting tools to collect crash reports directly from users' environments. These crash reports are later grouped together into crash types. Usually, developers prioritize crash types based on the number of crash reports and file bugs for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bugs as a crash correlation group. In this paper, we propose three rules to identify correlated crash types automatically. We also propose an algorithm to locate and rank buggy files using crash correlation groups. Through an empirical study on Firefox and Eclipse, we show that the three rules can identify crash correlation groups with a precision of 100% and a recall of 90% for Firefox and a precision of 79% and a recall of 65% for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62% and a precision of 42% for Firefox and a recall of 52% and a precision of 50% for Eclipse. On the top 10 buggy file candidates, the recall increases to 92% for Firefox and 90% for Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types all together.
Article Search
Testing Principles, Current Practices, and Effects of Change Localization
Steven Raemaekers, Gabriela F. Nane, Arie van Deursen, and Joost Visser
(Software Improvement Group, Netherlands; TU Delft, Netherlands)
Best practices in software development state that code that is likely to change should be encapsulated to localize possible modifications. In this paper, we investigate the application and effects of this design principle. We investigate the relationship between the stability, encapsulation and popularity of libraries on a dataset of 148,253 Java libraries. We find that bigger systems with more rework in existing methods have less stable interfaces and that bigger systems tend to encapsulate dependencies better. Additionally, there are a number of factors that are associated with change in library interfaces, such as rework in existing methods, system size, encapsulation of dependencies and the number of dependencies. We find that current encapsulation practices are not targeted at the libraries that change the most. We also investigate the strength of ripple effects caused by instability of dependencies, and we find that libraries cause ripple effects in systems using them and that these effects can be mitigated by encapsulation.
Article Search

Social Mining

Fixing the 'Out of Sight Out of Mind' Problem: One Year of Mood-Based Microblogging in a Distributed Software Team
Kevin Dullemond, Ben van Gameren, Margaret-Anne Storey, and Arie van Deursen
(TU Delft, Netherlands; University of Victoria, Canada)
Distributed teams face the challenge of staying connected. How do team members stay connected when they no longer see each other on a daily basis? What should be done when there is no coffee corner to share your latest exploits? In this paper we evaluate a microblogging system which makes this possible in a distributed setting. The system, WeHomer, enables the sharing of information and corresponding emotions in a fully distributed organization. We analyzed the content of over a year of usage data by 19 team members in a structured fashion, performed 5 semi-structured interviews and report our findings in this paper. We draw conclusions about the topics shared, the impact on software teams and the impact of distribution and team composition. Main findings include an increase in team-connectedness and easier access to information that is traditionally harder to consistently acquire.
Article Search
Communication in Open Source Software Development Mailing Lists
Anja Guzzi, Alberto Bacchelli, Michele Lanza, Martin Pinzger, and Arie van Deursen
(TU Delft, Netherlands; University of Lugano, Switzerland; University of Klagenfurt, Austria)
Open source software (OSS) development teams use electronic means, such as emails, instant messaging, or forums, to conduct open and public discussions. Researchers have investigated mailing lists, considering them a hub for project communication. Prior work focused on specific aspects of emails, for example the handling of patches, traceability concerns, or social networks. This led to insights pertaining to the investigated aspects, but not to a comprehensive view of what developers communicate about. Our objective is to increase the understanding of communication on development mailing lists. We quantitatively and qualitatively analyzed a sample of 506 email threads from the development mailing list of a major OSS project, Lucene. Our investigation reveals that implementation details are discussed in only about 35% of the threads, and that a range of other topics is discussed. Moreover, core developers participate in less than 75% of the threads. We observed that the development mailing list is not the main player in OSS project communication, as the project also relies on other channels, such as the issue repository.
Article Search
Tag Recommendation in Software Information Sites
Xin Xia, David Lo, Xinyu Wang, and Bo Zhou
(Zhejiang University, China; Singapore Management University, Singapore)
Nowadays, software engineers use a variety of online media to search for and learn about new and interesting technologies, and to learn from and help one another. We refer to these kinds of online media, which help software engineers improve their performance in software development, maintenance, and test processes, as software information sites. Tags are common in software information sites, and many sites allow users to tag various objects with their own words. Users increasingly use tags to describe the most important features of their posted contents or projects. In this paper, we propose TagCombine, an automatic tag recommendation method which analyzes objects in software information sites. TagCombine has three components: 1. a multi-label ranking component, which considers tag recommendation as a multi-label learning problem; 2. a similarity-based ranking component, which recommends tags from similar objects; 3. a tag-term based ranking component, which considers the relationship between different terms and tags, and recommends tags after analyzing the terms in the objects. We evaluate TagCombine on two software information sites, StackOverflow and Freecode, which contain 47,668 and 39,231 text documents and 437 and 243 tags, respectively. Experimental results show that for StackOverflow, TagCombine achieves recall@5 and recall@10 scores of 0.5964 and 0.7239, respectively; for Freecode, it achieves recall@5 and recall@10 scores of 0.6391 and 0.7773, respectively. Moreover, averaging over the StackOverflow and Freecode results, we improve TagRec, proposed by Al-Kofahi et al., by 22.65% and 14.95%, and the tag recommendation method proposed by Zangerle et al. by 18.5% and 7.35% for recall@5 and recall@10 scores, respectively.
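The recall@k measure used in such evaluations can be sketched as follows: for each object, it is the fraction of the true tags that appear among the top-k recommended tags, averaged over all objects. The posts and tags below are invented for illustration:

```python
def recall_at_k(recommended, actual, k):
    """Fraction of an object's true tags found in the top-k recommendations."""
    top_k = set(recommended[:k])
    return len(top_k & set(actual)) / len(actual)

# Each entry: (ranked recommended tags, the tags actually assigned by users).
posts = [
    (["java", "eclipse", "swing", "maven", "ant", "gradle"], {"java", "swing"}),
    (["python", "django", "flask", "wsgi", "css", "html"], {"flask", "html", "sql"}),
]

scores = [recall_at_k(rec, act, k=5) for rec, act in posts]
average_recall_at_5 = sum(scores) / len(scores)
print(average_recall_at_5)  # mean over the two toy posts
```

The first post scores 1.0 (both true tags are in the top 5); the second scores 1/3 (only "flask" appears), so the average recall@5 is 2/3.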
Article Search
Using Developer Interaction Data to Compare Expertise Metrics
Romain Robbes and David Röthlisberger
(University of Chile, Chile; Federico Santa María Technical University, Chile)
The expertise of a software developer is said to be a crucial factor for the development time required to complete a task. Even if this hypothesis is intuitive, research has not yet quantified the effect of developer expertise on development time. A related problem is that the design space for expertise metrics is large; out of the various automated expertise metrics proposed, we do not know which metric most reliably captures expertise. What prevents a proper evaluation of expertise metrics and their relation with development time is the lack of data on development tasks, such as their precise duration. Fortunately, this data is starting to become available in the form of growing developer interaction repositories. We show that applying MSR techniques to these developer interaction repositories gives us the necessary tools to perform such an evaluation.
Article Search
Project Roles in the Apache Software Foundation: A Dataset
Megan Squire
(Elon University, USA)
This paper outlines the steps in the creation and maintenance of a new dataset listing leaders of the various projects of the Apache Software Foundation (ASF). Included in this dataset are different levels of committers to the various ASF project code bases, as well as regular and emeritus members of the ASF, and directors and officers of the ASF. The dataset has been donated to the FLOSSmole project under an open source license, and is available for download (https://code.google.com/p/flossmole/downloads/detail?name=apachePeople2013-Jan.zip), or for direct querying via a database client.
Article Search
Apache-Affiliated Twitter Screen Names: A Dataset
Megan Squire
(Elon University, USA)
This paper describes a new dataset containing Twitter screen names for members of the projects affiliated with the Apache Software Foundation (ASF). The dataset includes the confirmed Twitter screen names, as well as the real name as listed on Twitter, and the user identification as used within the Apache organization. The paper also describes the process used to collect and clean this data, and shows some sample queries for learning how to use the data. The dataset has been donated to the FLOSSmole project and is available for download (https://code.google.com/p/flossmole/downloads/detail?name=apacheTwitter2013-Jan.zip) or direct querying via a database client.
Article Search

Search-Driven Development

Assisting Code Search with Automatic Query Reformulation for Bug Localization
Bunyamin Sisman and Avinash C. Kak
(Purdue University, USA)
Source code retrieval plays an important role in many software engineering tasks. However, designing a query that can accurately retrieve the relevant software artifacts can be challenging for developers, as it requires a certain level of knowledge and experience with the code base. This paper demonstrates how the difficulty of designing a proper query can be alleviated through automatic Query Reformulation (QR), an under-the-hood operation that reformulates a user's query with no additional input from the user. The proposed QR framework works by enriching a user's search query with certain specific additional terms drawn from the highest-ranked artifacts retrieved in response to the initial query. The important point here is that the additional terms injected into a query are those deemed to be "close" to the original query terms in the source code on the basis of positional proximity. This similarity metric is based on the notion that terms that deal with the same concepts in source code are usually proximal to one another. We demonstrate the superiority of our QR framework in relation to QR frameworks well known in natural-language document retrieval by showing significant improvements in bug localization performance for two large software projects using more than 4,000 queries.
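A simplified sketch of proximity-based query expansion, not the paper's actual algorithm: expand a query with terms that co-occur within a small window around the query terms in top-ranked documents. The corpus, window size, and cutoff below are illustrative assumptions:

```python
from collections import Counter

WINDOW, TOP_TERMS = 2, 2  # illustrative parameters, not from the paper

def reformulate(query, documents):
    """Append the terms that most often appear near the query terms."""
    near = Counter()
    for doc in documents:
        tokens = doc.split()
        for i, tok in enumerate(tokens):
            if tok in query:
                lo, hi = max(0, i - WINDOW), i + WINDOW + 1
                near.update(t for t in tokens[lo:hi] if t not in query)
    return query + [t for t, _ in near.most_common(TOP_TERMS)]

# Toy "top-ranked artifacts" retrieved for the initial query.
docs = ["parse buffer stream read buffer",
        "read socket buffer flush stream"]
expanded = reformulate(["buffer"], docs)
print(expanded)  # the original query plus its two most proximal terms
```

In this toy corpus, "stream" and "read" sit closest to "buffer" most often, so they are injected into the reformulated query.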
Article Search
Mining Succinct and High-Coverage API Usage Patterns from Source Code
Jue Wang, Yingnong Dang, Hongyu Zhang, Kai Chen, Tao Xie, and Dongmei Zhang
(Tsinghua University, China; Microsoft Research, China; Peking University, China; North Carolina State University, USA)
During software development, a developer often needs to discover specific usage patterns of Application Programming Interface (API) methods. However, these usage patterns are often not well documented. To help developers obtain such usage patterns, approaches have been proposed to mine the client code of the API methods. However, they lack metrics to measure the quality of the mined usage patterns, and the API usage patterns mined by existing approaches tend to be numerous and redundant, posing significant barriers to practical adoption. To address these issues, in this paper we propose two quality metrics (succinctness and coverage) for mined usage patterns, and further propose a novel approach called Usage Pattern Miner (UP-Miner) that mines succinct and high-coverage usage patterns of API methods from source code. We have evaluated our approach on a large-scale Microsoft codebase. The results show that our approach is effective and outperforms an existing representative approach, MAPO. User studies conducted with Microsoft developers confirm the usefulness of the proposed approach in practice.
Article Search
Rendezvous: A Search Engine for Binary Code
Wei Ming Khoo, Alan Mycroft, and Ross Anderson
(University of Cambridge, UK)
The problem of matching between binaries is important for software copyright enforcement as well as for identifying disclosed vulnerabilities in software. We present a search engine prototype called Rendezvous which enables indexing and searching for code in binary form. Rendezvous identifies binary code using a statistical model comprising instruction mnemonics, control flow sub-graphs and data constants, which are simple to extract from a disassembly yet normalise with respect to different compilers and optimisations. Experiments show that Rendezvous achieves F2 measures of 86.7% and 83.0% on the GNU C library compiled with different compiler optimisations and the GNU coreutils suite compiled with gcc and clang, respectively. These two code bases together comprise more than one million lines of code. Rendezvous will bring significant changes to the way patch management and copyright enforcement are currently performed.
Article Search
An Unabridged Source Code Dataset for Research in Software Reuse
Werner Janjic, Oliver Hummel, Marcus Schumacher, and Colin Atkinson
(University of Mannheim, Germany; KIT, Germany)
This paper describes a large, unabridged dataset of Java source code gathered and shared as part of the Merobase Component Finder project of the Software Engineering Group at the University of Mannheim. It consists of the complete index used to drive the search engine, www.merobase.com, the vast majority of the source code modules accessible through it, and a tool that enables researchers to efficiently browse the collected data. We describe the techniques used to collect, format and store the dataset, as well as the core capabilities of the Merobase search engine, such as classic keyword-based, interface-based and test-driven search. This dataset, which represents one of the largest searchable collections of source and binary modules available online, has recently been made available for download and use in further research projects. All files are available at http://merobase.informatik.uni-mannheim.de/sources/
Article Search

10 Years of MSR

The MSR Cookbook: Mining a Decade of Research
Hadi Hemmati, Sarah Nadi, Olga Baysal, Oleksii Kononenko, Wei Wang, Reid Holmes, and Michael W. Godfrey
(University of Waterloo, Canada)
The Mining Software Repositories (MSR) research community has grown significantly since the first MSR workshop was held in 2004. As the community continues to broaden its scope and deepens its expertise, it is worthwhile to reflect on the best practices that our community has developed over the past decade of research. We identify these best practices by surveying past MSR conferences and workshops. To that end, we review all 117 full papers published in the MSR proceedings between 2004 and 2012. We extract 268 comments from these papers, and categorize them using a grounded theory methodology. From this evaluation, four high-level themes were identified: data acquisition and preparation, synthesis, analysis, and sharing/replication. Within each theme we identify several common recommendations, and also examine how these recommendations have evolved over the past decade. In an effort to make this survey a living artifact, we also provide a public forum that contains the extracted recommendations in the hopes that the MSR community can engage in a continuing discussion on our evolving best practices.
Article Search
Happy Birthday! A Trend Analysis on Past MSR Papers
Serge Demeyer, Alessandro Murgia, Kevin Wyckmans, and Ahmed Lamkanfi
(University of Antwerp, Belgium)
On the occasion of the 10th anniversary of the MSR conference, it is a worthwhile exercise to meditate on the past, present and future of our research discipline. Indeed, since the MSR community has experienced a big influx of researchers bringing in new ideas, state-of-the-art technology, and contemporary research methods, it is unclear what the future might bring. In this paper, we report on a text mining exercise applied to the complete corpus of MSR papers to reflect on where we come from, where we are now, and where we should be going. We address issues like the trendy (and outdated) research topics; the frequently (and less frequently) cited cases; the popular (and emerging) mining infrastructure; and finally the proclaimed actionable information which we aim to uncover.
Replicating Mining Studies with SOFAS
Giacomo Ghezzi and Harald C. Gall
(University of Zurich, Switzerland)
The replication of studies in mining software repositories (MSR) is essential to compare different mining techniques or assess their findings across many projects. However, it has been shown that very few of these studies can be easily replicated. Their replication is just as fundamental as the studies themselves and is one of the main threats to validity that they suffer from. In this paper, we show how we can alleviate this problem with our SOFAS framework. SOFAS is a platform that enables a systematic and repeatable analysis of software projects by providing extensible and composable analysis workflows. These workflows can be applied on a multitude of software projects, facilitating the replication and scaling of mining studies. In this paper, we show how and to which degree replication can be achieved. We investigated the mining studies of MSR from 2004 to 2011 and found that, of the 88 studies published in the MSR proceedings so far, we can fully replicate 25 empirical studies, and can replicate a further 27 studies to a large extent. These studies account for 30% and 32%, respectively, of the mining studies published. To support our claim we describe in detail one large study that we replicated and discuss how replication with SOFAS works for the other studies investigated. To discuss the potential of our platform we also characterise how studies can be easily enriched to deliver even more comprehensive answers by extending the analysis workflows provided by the platform.
A Historical Dataset of Software Engineering Conferences
Bogdan Vasilescu, Alexander Serebrenik, and Tom Mens
(TU Eindhoven, Netherlands; University of Mons, Belgium)
The Mining Software Repositories community typically focuses on data from software configuration management tools, mailing lists, and bug tracking repositories to uncover interesting and actionable information about the evolution of software systems. However, the techniques employed and the challenges faced when mining are not restricted to these types of repositories. In this paper, we present an atypical dataset of software engineering conferences, containing historical data about the accepted papers and the composition of programme committees for eleven well-established conferences. The dataset (published on GitHub at https://github.com/tue-mdse/conferenceMetrics) can be used, e.g., by conference steering committees or programme committee chairs to assess their selection process and compare against other conferences in the field, or by prospective authors to decide in which conferences to publish.

Mining Unstructured Data

Automatically Mining Software-Based, Semantically-Similar Words from Comment-Code Mappings
Matthew J. Howard, Samir Gupta, Lori Pollock, and K. Vijay-Shanker
(University of Delaware, USA)
Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms are particularly useful to overcome the mismatch in vocabularies, as are other word relations that indicate semantic similarity. However, experience shows that many words are semantically similar in computer science situations, but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and programmer conventions in writing them. Our evaluation shows that the mined comment-code word mappings that do not already occur in WordNet are indeed viewed, in high proportions, as semantically similar word pairs in the computer science context.
Strategies for Avoiding Test Fixture Smells during Software Evolution
Michaela Greiler, Andy Zaidman, Arie van Deursen, and Margaret-Anne Storey
(TU Delft, Netherlands; University of Victoria, Canada)
An important challenge in creating automated tests is how to design test fixtures, i.e., the setup code that initializes the system under test before actual automated testing can start. Test designers have to choose between different approaches for the setup, trading off maintenance overhead against slow test execution. Over time, test code quality can erode and test smells can develop, such as the occurrence of overly general fixtures, obscure in-line code and dead fields. In this paper, we investigate how fixture-related test smells evolve over time by analyzing several thousand revisions of five open source systems. Our findings indicate that setup management strategies strongly influence the types of test fixture smells that emerge in code, and that several types of fixture smells often emerge at the same time. Based on this information, we recommend important guidelines for setup strategies, and suggest how tool support can be improved to help both avoid the emergence of such smells and refactor code when test smells do appear.
Contextual Analysis of Program Logs for Understanding System Behaviors
Qiang Fu, Jian-Guang Lou, Qingwei Lin, Rui Ding, Dongmei Zhang, and Tao Xie
(Microsoft Research, China; Microsoft, China; North Carolina State University, USA)
Understanding the behaviors of a software system is very important to perform daily system maintenance tasks. In practice, one way to gain knowledge about the runtime behavior of a system is to manually analyze system logs collected during the system executions. With the increasing scale and complexity of software systems, it becomes challenging for system operators to manually analyze system logs. To address these challenges, in this paper, we propose a new approach for contextual analysis of system logs for understanding a system's behaviors. In particular, we first use execution patterns to represent execution structures reflected by a sequence of system logs, and propose an algorithm to mine execution patterns from program logs. The mined execution patterns correspond to different execution paths of the system. Based on these execution patterns, our approach further learns essential contextual factors (e.g., the occurrences of specific program logs with specific parameter values) that cause a specific branch or path to be executed by the system. The mining and learning results can help system operators to understand a software system's runtime execution logic and behaviors during various tasks such as system problem diagnosis. We demonstrate the feasibility of our approach upon two real-world software systems (Hadoop and Ethereal).
A Dataset for Evaluating Identifier Splitters
David Binkley, Dawn Lawrie, Lori Pollock, Emily Hill, and K. Vijay-Shanker
(Loyola University Maryland, USA; University of Delaware, USA; Montclair State University, USA)
Software engineering and evolution techniques have recently started to exploit the natural language information in source code. A key step in doing so is splitting identifiers into their constituent words. While simple in concept, identifier splitting raises several challenging issues, leading to a range of splitting techniques. Consequently, the research community would benefit from a dataset (i.e., a gold set) that facilitates comparative studies of identifier splitting techniques. A gold set of 2,663 split identifiers was constructed from 8,522 individual human splitting judgements and can be obtained from www.cs.loyola.edu/~binkley/ludiso. We describe the set's construction and offer observations aimed at its effective use.
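To make the splitting task concrete, a naive baseline for underscore and camelCase conventions can be sketched as follows. This is an illustrative sketch only, not one of the techniques evaluated against the dataset; the hard cases (e.g., same-case concatenations such as "filelen") are exactly what such a baseline cannot handle.

```python
import re

def split_identifier(name):
    """Naive identifier splitter: breaks on underscores, digit runs,
    and lower-to-upper case transitions, keeping acronyms together.
    An illustrative baseline, not a technique from the paper."""
    parts = []
    for chunk in name.split('_'):
        # Match acronyms (e.g., 'AST'), capitalized words, lowercase
        # runs, and digit runs as separate tokens.
        parts.extend(re.findall(r'[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+', chunk))
    return parts
```

For example, `split_identifier("ASTNode_count")` yields `['AST', 'Node', 'count']`, while a same-case identifier like `filelen` would remain a single token, illustrating why human-judged gold sets are needed to compare real splitters.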
INVocD: Identifier Name Vocabulary Dataset
Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp
(Open University, UK)
INVocD is a database of the identifier name declarations and vocabulary found in 60 FLOSS Java projects, where the source code structure is recorded and the identifier name vocabulary is made directly available, offering advantages for identifier name research over conventional source code models. The database has been used to support a range of research projects from identifier name analysis to concept location, and provides many opportunities to researchers. INVocD may be downloaded from http://oro.open.ac.uk/36992

Predictor Models

Better Cross Company Defect Prediction
Fayola Peters, Tim Menzies, and Andrian Marcus
(West Virginia University, USA; Wayne State University, USA)
How can we find data for quality prediction? Early in the life cycle, projects may lack the data needed to build such predictors. Prior work assumed that relevant training data was found nearest to the local project. But is this the best approach? This paper introduces the Peters filter, which is based on the following conjecture: when local data is scarce, more information exists in other projects. Accordingly, this filter selects training data via the structure of other projects. To assess the performance of the Peters filter, we compare it with two other approaches for quality prediction: within-company learning, and cross-company learning with the Burak filter (the state-of-the-art relevancy filter). This paper finds that: 1) within-company predictors are weak for small data-sets; 2) the Peters filter+cross-company builds better predictors than both within-company and the Burak filter+cross-company; and 3) the Peters filter builds 64% more useful predictors than both the within-company and the Burak filter+cross-company approaches. Hence, we recommend the Peters filter for cross-company learning.
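For context, a generic relevancy filter in the spirit of the Burak filter selects, for each local instance, its nearest cross-company training instances. The sketch below is illustrative only: it is not the Peters filter (which instead lets the structure of the cross-company data drive the selection), and the Euclidean-distance metric and `k` parameter are assumptions for the example.

```python
import math

def burak_style_filter(local_rows, cross_rows, k=2):
    """Illustrative nearest-neighbour relevancy filter: for each local
    instance, keep its k nearest cross-company training instances by
    Euclidean distance over the metric features. Not the Peters filter."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = set()
    for t in local_rows:
        # Indices of the k cross-company rows closest to this local row.
        nearest = sorted(range(len(cross_rows)),
                         key=lambda i: dist(t, cross_rows[i]))[:k]
        selected.update(nearest)
    # Return the union of selected rows as the filtered training set.
    return [cross_rows[i] for i in sorted(selected)]
```

A defect predictor would then be trained only on the filtered rows; the paper's contribution is showing that reversing the direction of this selection (cross-company data choosing via its own structure) yields better predictors when local data is scarce.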
Using Citation Influence to Predict Software Defects
Wei Hu and Kenny Wong
(University of Alberta, Canada)
The software dependency network reflects structure and the developer contribution network reflects process. Previous studies have used social network properties over these networks to predict whether a software component is defect-prone. However, these studies do not consider the strengths of the dependencies in the networks. In our approach, we use a citation influence topic model to determine dependency strengths among components and developers, analyze weak and strong dependencies separately, and apply social network properties to predict defect-prone components. In experiments on Eclipse and NetBeans, our approach has higher accuracy than prior work.
Revisiting Software Development Effort Estimation Based on Early Phase Development Activities
Masateru Tsunoda, Koji Toda, Kyohei Fushida, Yasutaka Kamei, Meiyappan Nagappan, and Naoyasu Ubayashi
(Toyo University, Japan; Fukuoka Institute of Technology, Japan; NTT, Japan; Kyushu University, Japan; Queen's University, Canada)
Many research projects on software estimation use software size as a major explanatory variable. However, practitioners sometimes estimate effort using the ratio of effort for early phase activities, such as planning and requirement analysis, to the effort for the whole development phase of the software. In this paper, we focus on effort estimation based on the effort for early phase activities. The goal of the research is to examine the relationship of early phase effort and software size with software development effort. To achieve the goal, we built effort estimation models using early phase effort as an explanatory variable, and compared the estimation accuracies of these models to those of effort estimation models based on software size. In addition, we built estimation models using both early phase effort and software size. In our experiment, we used the ISBSG dataset, which was collected from software development companies, and regarded planning phase effort and requirement analysis effort as early phase effort. The result of the experiment showed that when both software size and the sum of planning and requirement analysis phase effort were used as explanatory variables, the estimation accuracy was most improved (Average Balanced Relative Error was improved to 75.4% from 148.4%). Based on this result, we recommend that both early phase effort and software size be used as explanatory variables, because that combination showed the highest accuracy and did not have multicollinearity issues.
