ICPC 2013 – Proceedings

Improving Program Comprehension by Answering Questions (Keynote)
Brad A. Myers
(CMU, USA)
My Natural Programming Project is working on making software development easier to learn, more effective, and less error-prone. An important focus over the last few years has been to discover the hard-to-answer questions that developers ask while they try to comprehend their programs, and then to develop tools to help answer those questions. For example, when studying programmers working on everyday bugs, we found that they continuously ask “Why” and “Why Not” questions as they try to comprehend what happened. We developed the “Whyline” debugging tool, which allows programmers to ask these questions directly of their programs and get a visualization of the answers. In a small lab study, Whyline increased productivity by a factor of about two. We studied professional programmers trying to understand unfamiliar code and identified over 100 questions that they considered hard to answer. In particular, we saw that programmers frequently had specific questions about the feasible execution paths, so we developed a new visualization tool to present this information directly. When studying programmers using unfamiliar APIs, such as the Java SDK and the SAP eSOA APIs, we discovered some common patterns that make programmers up to 10 times slower in finding and understanding how to use the appropriate methods, so we developed new tools to assist them. This talk will provide an overview of our studies and the resulting tools that address program comprehension issues.

Technical Research

Textual Analysis
Mon, May 20, 11:00 - 12:30, Bayview A (Chair: Gabriele Bavota)

Part-of-Speech Tagging of Program Identifiers for Improved Text-Based Software Engineering Tools
Samir Gupta, Sana Malik, Lori Pollock, and K. Vijay-Shanker
(University of Delaware, USA; University of Maryland, USA)
To aid program comprehension, programmers choose identifiers for methods, classes, fields, and other program elements primarily by following naming conventions in software. These software “naming conventions” follow systematic patterns that can convey deep natural language clues that can be leveraged by software engineering tools. For example, they can be used to increase the accuracy of software search tools, improve the ability of program navigation tools to recommend related methods, and raise the accuracy of other program analyses. After splitting multi-word names into their component words, the next step in extracting accurate natural language information is tagging each word with its part of speech (POS) and then chunking the name into natural language phrases. State-of-the-art approaches, most of which rely on “traditional POS taggers” trained on natural language documents, do not capture the syntactic structure of program elements. In this paper, we present a POS tagger and syntactic chunker for source code names that takes into account programmers’ naming conventions to understand the regular, systematic ways a program element is named. We studied the naming conventions used in Object Oriented Programming and identified different grammatical constructions that characterize a large number of program identifiers. This study then informed the design of our POS tagger and chunker. Our evaluation results show a significant improvement in accuracy (11%-20%) of POS tagging of identifiers over the current approaches. With this improved accuracy, both automated software engineering tools and developers will be able to better capture and understand the information available in code.
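The abstract names splitting multi-word names into component words as the step that precedes POS tagging. The following is a minimal sketch of that splitting step only, assuming camelCase, acronym, digit, and underscore conventions; the regular expression and the name split_identifier are illustrative and are not the authors' tagger or splitter.

```python
import re

# Illustrative token pattern (an assumption, not the paper's splitter):
#  - an acronym run followed by a capitalized word, e.g. "XML" in "XMLParser"
#  - an optionally capitalized lowercase word, e.g. "get", "Parser"
#  - a trailing all-caps run, e.g. "ID"
#  - a run of digits
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name: str) -> list[str]:
    """Split a program identifier into lower-cased component words."""
    # Underscores and other separators are skipped because they match no alternative.
    return [tok.lower() for tok in _TOKEN.findall(name)]


print(split_identifier("getXMLParser"))  # ['get', 'xml', 'parser']
```

A POS tagger for identifiers would then label the resulting word list (for example, treating a leading word of a method name as a likely verb), which is the part the paper addresses.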

Evaluating Source Code Summarization Techniques: Replication and Expansion
Brian P. Eddy, Jeffrey A. Robinson, Nicholas A. Kraft, and Jeffrey C. Carver
(University of Alabama, USA)
During software evolution, a developer must investigate source code to locate and then understand the entities that must be modified to complete a change task. To help developers in this task, Haiduc et al. proposed text-summarization-based approaches to the automatic generation of class and method summaries and, via a study of four developers, evaluated source code summaries generated using their techniques. In this paper we propose a new topic-modeling-based approach to source code summarization and, via a study of 14 developers, we evaluate source code summaries generated using the proposed technique. Our study partially replicates the original study by Haiduc et al. in that it uses the objects, the instruments, and a subset of the summaries from the original study, but it also expands the original study in that it includes more subjects and new summaries. The results of our study both support the findings of the original and provide new insights into the processes and criteria that developers use to evaluate source code summaries. Based on our results, we suggest future directions for research on source code summarization.

Automatic Generation of Natural Language Summaries for Java Classes
Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K. Vijay-Shanker
(Wayne State University, USA; Universidad Nacional de Colombia, Colombia; IBM Research, India; University of Delaware, USA)
Most software engineering tasks require developers to understand parts of the source code. When faced with unfamiliar code, developers often rely on (internal or external) documentation to gain an overall understanding of the code and determine whether it is relevant for the current task. Unfortunately, the documentation is often absent or outdated. This paper presents a technique to automatically generate human-readable summaries for Java classes, assuming no documentation exists. The summaries allow developers to understand the main goal and structure of the class. The focus of the summaries is on the content and responsibilities of the classes, rather than their relationships with other classes. The summarization tool determines the class and method stereotypes and uses them, in conjunction with heuristics, to select the information to be included in the summaries. Then it generates the summaries using existing lexicalization tools. A group of programmers judged a set of generated summaries for Java classes and determined that they are readable and understandable, they do not include extraneous information, and, in most cases, they are not missing essential information.

The Role of Visualization in Program Comprehension
Mon, May 20, 14:00 - 16:00, Bayview A (Chair: Andrian Marcus)

An Empirical Study on the Efficiency of Graphical vs. Textual Representations in Requirements Comprehension
Zohreh Sharafi, Alessandro Marchetto, Angelo Susi, Giuliano Antoniol, and Yann-Gaël Guéhéneuc
(Polytechnique Montréal, Canada; Fondazione Bruno Kessler, Italy)
Graphical representations are used to visualise, specify, and document software artifacts in all stages of the software development process. In contrast with text, graphical representations are presented in two-dimensional form, which seems easy to process. However, few empirical studies have investigated the efficiency of graphical representations vs. textual ones in modelling and presenting software requirements. Therefore, in this paper, we report the results of an eye-tracking experiment involving 28 participants to study the impact of structured textual vs. graphical representations on subjects' efficiency while performing requirement comprehension tasks. We measure subjects' efficiency in terms of the percentage of correct answers (accuracy) and of the time and effort spent to perform the tasks.
We observe no statistically significant difference in terms of accuracy. However, our subjects spent more time and effort while working with the graphical representation, although this extra time and effort does not affect accuracy. Our findings challenge the general assumption that graphical representations are more efficient than textual ones, at least in the case of developers not familiar with the graphical representation. Indeed, our results emphasise that training can significantly improve the efficiency of our subjects working with graphical representations. Moreover, by comparing the visual paths of our subjects, we observe that the spatial structure of the graphical representation leads our subjects to follow two different strategies (top-down vs. bottom-up) and that this hierarchical structure subsequently helps developers ease the difficulty of model comprehension tasks.

SArF Map: Visualizing Software Architecture from Feature and Layer Viewpoints
Kenichi Kobayashi, Manabu Kamimura, Keisuke Yano, Koki Kato, and Akihiko Matsuo
(Fujitsu Labs, Japan)
To facilitate understanding of the architecture of a software system, we developed the SArF Map technique, which visualizes software architecture from feature and layer viewpoints using a city metaphor. SArF Map visualizes implicit software features using the SArF dependency-based software clustering algorithm from our previous study. Since features are high-level abstraction units of software, a generated map can be used directly for high-level decision making, such as reuse, and for communication between developers and non-developer stakeholders. In SArF Map, each feature is visualized as a city block, and classes in the feature are laid out as buildings reflecting their software layer. Relevance between features is represented as streets. Dependency links are visualized lucidly. Through open source and industrial case studies, we show that the architecture of the target systems can be easily overviewed and that the quality of their packaging designs can be quickly assessed.

Multiscale Visual Comparison of Execution Traces
Jonas Trümper, Jürgen Döllner, and Alexandru Telea
(HPI, Germany; University of Groningen, Netherlands)
Understanding the execution of programs by means of program traces is a key strategy in software comprehension. An important task in this context is comparing two traces in order to find similarities and differences in terms of executed code, execution order, and execution duration. For large and complex program traces, this is a difficult task due to the size of the trace data. In this paper, we propose a new visualization method based on icicle plots and edge bundles. We address visual scalability with several multiscale visualization metaphors, which help users navigate from the main differences between two traces to intermediate structural-difference levels and, finally, to fine-grained function call levels. We show how our approach, implemented in a tool called TraceDiff, is applicable in several scenarios for trace difference comprehension on real-world trace datasets.


In Situ Understanding of Performance Bottlenecks through Visually Augmented Code
Fabian Beck, Oliver Moseler, Stephan Diehl, and Günter Daniel Rey
(University of Stuttgart, Germany; University of Trier, Germany; Fernuniversität in Hagen, Germany)
Finding and fixing performance bottlenecks requires sound knowledge of the program that is to be optimized. In this paper, we propose an approach for presenting performance-related information to software engineers by visually augmenting source code shown in an editor. Small diagrams at each method declaration and method call visualize the propagation of runtime consumption through the program as well as the interplay of threads in parallelized programs. Compared with traditional representations, where code and profiling information are shown in different places, such in situ visualization promises to prevent the split-attention effect caused by multiple views: information is presented where it is required, which supports understanding and navigation. We implemented the approach as an IDE plug-in and tested it in a user study with four developers improving the performance of their own programs. The user study provides insights into the process of understanding performance bottlenecks with our approach.

Software Quality
Mon, May 20, 16:30 - 17:30, Bayview A (Chair: Andrew Begel)

Monitoring User Interactions for Supporting Failure Reproduction
Tobias Roehm, Nigar Gurbanova, Bernd Bruegge, Christophe Joubert, and Walid Maalej
(TU Munich, Germany; Prodevelop, Spain; University of Hamburg, Germany)
The first step in comprehending and fixing a software bug is usually to reproduce the corresponding failure. Reproducing a failure requires information about steps to reproduce, i.e., the steps necessary to make a failure occur in the development environment. In the case of an application with a user interface, steps to reproduce consist of the interactions between a user and the application that precede the failure. Unfortunately, bug reports typically lack this information. Users are either unaware of its importance to developers, are unable to describe it, or simply do not have time to report it. In this paper, we present a simple but effective and resource-efficient approach to monitor interactions between users and their applications selectively at a high level of abstraction, e.g., editing operations and commands. This minimizes the monitoring overhead and enables developers to analyze user interaction traces. We map monitored interactions to a taxonomy of user interactions to help developers comprehend user behavior. Further, we present the Timeline Tool, which visualizes monitored interaction traces preceding failures. To evaluate our approach, we conducted an experiment with 12 participants and asked them to reproduce bug reports from an open-source project. We found that developers are able to derive steps to reproduce from monitored interaction traces. In particular, inexperienced developers profit from the Timeline Tool, as they are able to reproduce failures that they cannot reproduce without it. The monitoring overhead is rather small (approx. 5% CPU and 2-5% memory) and users feel it does not influence their work in a negative way.

Quality Analysis of Source Code Comments
Daniela Steidl, Benjamin Hummel, and Elmar Juergens
(CQSE, Germany)
A significant amount of source code in software systems consists of comments, i.e., parts of the code that are ignored by the compiler. Comments in code represent a main source of system documentation and are hence key for source code understanding with respect to development and maintenance. Although many software developers consider comments to be crucial for program understanding, existing approaches for software quality analysis ignore system commenting or make only quantitative claims. Hence, current quality analyses do not take a significant part of the software into account. In this work, we present a first detailed approach for quality analysis and assessment of code comments. The approach provides a model for comment quality which is based on different comment categories. To categorize comments, we use machine learning on Java and C/C++ programs. The model comprises different quality aspects: by providing metrics tailored to suit specific categories, we show how quality aspects of the model can be assessed. The validity of the metrics is evaluated with a survey among 16 experienced software developers, and a case study demonstrates the relevance of the metrics in practice.

Source Code Comprehension
Tue, May 21, 11:00 - 12:30, Bayview A (Chair: Andy Zaidman)

Gapped Code Clone Detection with Lightweight Source Code Analysis
Hiroaki Murakami, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, and Shinji Kusumoto
(Osaka University, Japan)
Various methods for detecting code clones have been proposed. To detect gapped code clones, AST-based, PDG-based, and metric-based techniques, as well as text-based techniques using the LCS algorithm, have been proposed. However, each of these techniques has limitations. For example, existing AST-based and PDG-based techniques incur the cost of transforming source files into intermediate representations such as ASTs or PDGs and of comparing them. Existing metric-based techniques and text-based techniques using the LCS algorithm cannot detect code clones if methods or blocks are only partially duplicated. This paper proposes a new method that detects gapped code clones using the Smith-Waterman algorithm to resolve these limitations. The Smith-Waterman algorithm identifies similar alignments between two sequences even if they include some gaps. The authors implemented the proposed method as a software tool named CDSW and confirmed, through a quantitative evaluation with Bellon's benchmark, that the proposed method resolves these limitations.
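The abstract relies on Smith-Waterman's ability to align two sequences locally while tolerating gaps. A minimal sketch of textbook Smith-Waterman local alignment with a linear gap penalty follows; the scoring values (match=2, mismatch=-1, gap=-1) are illustrative assumptions, not CDSW's parameters, and the abstract does not say how CDSW tokenizes or normalizes code.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of sequences a and b.

    Returns (score, (start_a, end_a), (start_b, end_b)), where the spans are
    half-open slices a[start_a:end_a] and b[start_b:end_b].
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best local score ending at a[i-1], b[j-1]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets an alignment restart anywhere: this is what makes it local.
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, bi, bj = h[i][j], i, j
    # Trace back from the best cell until a zero cell marks the alignment start.
    i, j = bi, bj
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            i, j = i - 1, j - 1      # match or mismatch
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1                   # gap in b
        else:
            j -= 1                   # gap in a
    return best, (i, bi), (j, bj)


# "X" is an extra element in the second sequence; the alignment bridges it with one gap.
print(smith_waterman(list("ABCD"), list("ABXCD")))  # (7, (0, 4), (0, 5))
```

For clone detection, a and b would be token sequences of two code fragments, presumably normalized (for example, identifiers replaced by a placeholder, which is our assumption). An inserted statement then costs a gap penalty instead of splitting the match into two short pieces.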

Insight into a Method Co-change Pattern to Identify Highly Coupled Methods: An Empirical Study
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
In this paper, we describe an empirical study of a unique method co-change pattern that has the potential to pinpoint design deficiency in a software system. We automatically identify this pattern by inspecting the method co-change history using reasonable constraints on method association rules. We also investigate the effect of code clones on the method co-changes identified according to the pattern, because there is a common intuition that clone fragments from the same clone class often require corresponding changes to ensure they remain consistent with each other.
According to our in-depth investigation of hundreds of revisions of seven open-source software systems, considering three types of clones (Type 1, Type 2, Type 3), our identified pattern helps us detect methods that are logically coupled with multiple other methods and that exhibit a significantly higher modification frequency than other methods. We call the methods detected by the pattern MMCGs (Methods appearing in Multiple Commit Groups), reflecting the pattern's semantics. MMCGs can be considered candidates for restructuring in order to minimize coupling as well as to reduce the change-proneness of a software system. According to our observations, code clones have a significant effect on method co-changes as well as on MMCGs. We believe that clone refactoring can help us minimize evolutionary coupling among methods.
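The study mines method association rules from co-change history. A minimal sketch of standard support/confidence association-rule mining over commits follows, assuming each commit is given as the set of methods it changed. The thresholds and the function name are hypothetical, and this does not implement the paper's specific constraints or its MMCG pattern.

```python
from collections import Counter
from itertools import permutations


def co_change_rules(commits, min_support=2, min_confidence=0.5):
    """Mine rules (a, b) meaning 'when method a changes, method b tends to change too'.

    commits: iterable of collections of method names changed together.
    Returns {(a, b): (support, confidence)} for rules passing both thresholds.
    """
    single = Counter()  # number of commits that changed each method
    pair = Counter()    # number of commits that changed both a and b
    for changed in commits:
        methods = set(changed)
        single.update(methods)
        pair.update(permutations(sorted(methods), 2))  # ordered pairs (a, b) and (b, a)
    rules = {}
    for (a, b), support in pair.items():
        confidence = support / single[a]  # P(b changes | a changes)
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


history = [{"A.f", "B.g"}, {"A.f", "B.g"}, {"A.f", "C.h"}, {"C.h"}]
print(co_change_rules(history))
```

Here B.g changes with A.f every time it changes (confidence 1.0), whereas A.f changes with B.g in two of its three commits. A method that appears in many such high-confidence rules is the kind of coupled, frequently modified method the study targets.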

Patterns of Cross-Language Linking in Java Frameworks
Philip Mayer and Andreas Schroeder
(LMU Munich, Germany)
The term Cross-Language Linking refers to the ability to specify, locate, navigate, and keep intact the connections between artifacts defined in different programming languages used for building one software application. Although understanding cross-language links and keeping them intact during development and maintenance activities is an important productivity issue, there has been little research on understanding the characteristics of such connections. We have thus built a theory from case studies, specifically, three theory-selected Java cross-language frameworks, each of which links artifacts written in the Java programming language to artifacts written in a declarative, framework-specific domain specific language. Our main contribution is to identify, from these experiences, common patterns of cross-language linking in the domain of Java frameworks with DSLs, which besides their informative nature can also be seen as requirements for designing and building a linking language and tooling infrastructure.

Traceability and Feature Location
Tue, May 21, 14:00 - 15:00, Bayview A (Chair: Lori Pollock)

Using Code Ownership to Improve IR-Based Traceability Link Recovery
Diana Diaz, Gabriele Bavota, Andrian Marcus, Rocco Oliveto, Silvia Takahashi, and Andrea De Lucia
(Universidad de los Andes, Colombia; University of Sannio, Italy; Wayne State University, USA; University of Molise, Italy; University of Salerno, Italy)
Information Retrieval (IR) techniques have gained widespread acceptance as a method for automating traceability recovery. These techniques recover links between software artifacts based on their textual similarity, i.e., the higher the similarity, the higher the likelihood that there is a link between the two artifacts. A common problem with all IR-based techniques is filtering out noise from the list of candidate links in order to improve the recovery accuracy. Indeed, software artifacts may be related in many ways, and the textual information captures only one aspect of their relationships. In this paper we propose to leverage code ownership information to capture relationships between source code artifacts for improving the recovery of traceability links between documentation and source code. Specifically, we extract the author of each source code component and, for each author, we identify the “context” she worked on. Thus, for a given query from the external documentation, we compute the similarity between it and the context of the authors. When retrieving classes that relate to a specific query using a standard IR-based approach, we reward all the classes developed by the authors whose context is most similar to the query by boosting their similarity to the query. The proposed approach, named TYRION (TraceabilitY link Recovery using Information retrieval and code OwNership), has been instantiated for the recovery of traceability links between use cases and Java classes of two software systems. The results indicate that code ownership information can be used to improve the accuracy of an IR-based traceability link recovery technique.
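The ownership-based reward can be illustrated with a small ranking function. The sketch below assumes raw term-frequency vectors with cosine similarity, builds an author's “context” by pooling the terms of the classes that author wrote, and adds a fixed boost to the classes of the single most query-similar author. The vector model, the additive boost, its value, and the name rank_with_ownership are all hypothetical choices; the abstract does not specify TYRION's IR model or boosting formula.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_with_ownership(query, classes, owners, boost=0.1):
    """Rank classes for a query, rewarding classes owned by the most query-similar author.

    query: list of terms; classes: {class_name: list of terms};
    owners: {class_name: author}; boost: hypothetical additive reward.
    """
    q = Counter(query)
    # Each author's context pools the terms of all classes that author wrote.
    contexts = {}
    for cls, terms in classes.items():
        contexts.setdefault(owners[cls], Counter()).update(terms)
    best_author = max(contexts, key=lambda author: cosine(q, contexts[author]))
    scores = {}
    for cls, terms in classes.items():
        score = cosine(q, Counter(terms))
        if owners[cls] == best_author:
            score += boost
        scores[cls] = score
    return sorted(scores, key=scores.get, reverse=True)


classes = {
    "LoginForm": ["login", "user"],
    "SessionStore": ["session", "cache", "store"],
    "AuditLog": ["session", "audit"],
}
owners = {"LoginForm": "ana", "SessionStore": "ana", "AuditLog": "bo"}
print(rank_with_ownership(["login", "user", "session"], classes, owners))
```

Without the boost, AuditLog outranks SessionStore on text alone. Because ana's pooled context matches the query best, the reward lifts her SessionStore above it, which is the kind of reordering the abstract describes.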

Structural Information Based Term Weighting in Text Retrieval for Feature Location
Blake Bassett and Nicholas A. Kraft
(University of Alabama, USA)
Many recent feature location techniques (FLTs) apply text retrieval (TR) techniques to corpora built from text embedded in source code. Term weighting is a standard preprocessing step in TR and is used to adjust the importance of a term within a document or corpus. Common term weighting schemes such as tf-idf may not be optimal for use with source code, because they originate from a natural language context and were designed for use with unstructured documents. In this paper we propose a new approach to term weighting in which term weights are assigned using the structural information from the source code. We then evaluate the proposed approach by conducting an empirical study of a TR-based FLT. In all, we study over 400 bugs and features from five open source Java systems and find that structural term weighting can cause a statistically significant improvement in the accuracy of the FLT.
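The idea of replacing plain term frequency with structure-aware weights can be sketched as follows. This sketch assumes that each term occurrence is tagged with the code location it came from and that a fixed weight per location replaces the count of 1 in the tf part of tf-idf. The location names and weight values are hypothetical, and the abstract does not give the paper's actual weighting scheme.

```python
import math
from collections import defaultdict

# Hypothetical weights per structural location (not taken from the paper).
LOCATION_WEIGHTS = {"method_name": 3.0, "parameter": 2.0, "body": 1.0, "comment": 0.5}


def structural_weights(docs):
    """Location-weighted tf-idf.

    docs: {doc_id: [(term, location), ...]}.
    Returns {doc_id: {term: weight}}, where each occurrence contributes its
    location weight to the term frequency, and the result is scaled by idf.
    """
    n = len(docs)
    df = defaultdict(int)  # document frequency of each term
    for occurrences in docs.values():
        for term in {t for t, _ in occurrences}:
            df[term] += 1
    result = {}
    for doc_id, occurrences in docs.items():
        weighted_tf = defaultdict(float)
        for term, location in occurrences:
            weighted_tf[term] += LOCATION_WEIGHTS[location]
        result[doc_id] = {t: w * math.log(n / df[t]) for t, w in weighted_tf.items()}
    return result


docs = {
    "m1": [("parse", "method_name"), ("file", "parameter"), ("file", "body")],
    "m2": [("render", "method_name"), ("file", "comment")],
    "m3": [("parse", "body")],
}
weights = structural_weights(docs)
print(weights["m1"]["parse"] / weights["m3"]["parse"])  # 3.0: method-name occurrence outweighs a body occurrence
```

Under plain tf-idf, “parse” would carry the same weight in m1 and m3. With structural weighting, its appearance in m1's method name makes it three times as important there.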

Comprehending API
Tue, May 21, 15:00 - 16:00, Bayview A (Chair: Chris Parnin)

Extracting Problematic API Features from Forum Discussions
Yingying Zhang and Daqing Hou
(Clarkson University, USA)
Software engineering activities often produce large amounts of unstructured data. Useful information can be extracted from such data to facilitate software development activities, such as bug report management and documentation provision. Online forums, in particular, contain extensive valuable information that can aid in software development. However, no work has been done to extract problematic API features from online forums. In this paper, we investigate ways to extract problematic API features that are discussed as a source of difficulty in each thread, using natural language processing and sentiment analysis techniques. Based on a preliminary manual analysis of the content of a discussion thread and a categorization of the role of each sentence therein, we decided to focus on a negative-sentiment sentence and its close neighbors as a unit for extracting API features. We evaluate a set of candidate solutions by comparing tool-extracted problematic API design features with manually produced golden test data. Our best solution yields a precision of 89%. We have also investigated three potential applications for our feature extraction solution: (i) highlighting the negative sentence and its neighbors to help illustrate the main API feature; (ii) searching for helpful online information using the extracted API feature as a query; (iii) summarizing the problematic features to reveal the “hot topics” in a forum.
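The extraction unit described above, a negative sentence plus its close neighbors, can be illustrated with a windowing function. In this sketch, a tiny hand-made negative-word lexicon stands in for a real sentiment analyzer. Both the lexicon and the window size are hypothetical, and the paper's NLP pipeline is not reproduced here.

```python
# Toy stand-in for a sentiment analyzer (hypothetical lexicon).
NEGATIVE_WORDS = {"fail", "fails", "error", "confusing", "broken", "cannot", "wrong"}


def negative_units(sentences, window=1):
    """Return each negative sentence together with up to `window` neighbors on each side."""
    units = []
    for i, sentence in enumerate(sentences):
        words = {w.strip(".,!?").lower() for w in sentence.split()}
        if words & NEGATIVE_WORDS:
            lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
            units.append(sentences[lo:hi])
    return units


thread = [
    "I use JTable for my grid.",
    "Sorting fails after I add rows.",
    "I call fireTableDataChanged.",
    "Thanks!",
]
print(negative_units(thread))  # one unit: the first three sentences
```

The neighbors matter because the negative sentence often omits the API element that the surrounding sentences name (here, JTable and fireTableDataChanged).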

Multi-dimensional Exploration of API Usage
Coen De Roover, Ralf Lämmel, and Ekaterina Pek
(Vrije Universiteit Brussel, Belgium; University of Koblenz-Landau, Germany)
This paper is concerned with understanding API usage in a systematic, explorative manner for the benefit of both API developers and API users. There exist complementary, less explorative methods, e.g., based on code search, code completion, or API documentation. In contrast, our approach is highly interactive and can be seen as an extension of what IDEs readily provide today. Exploration is based on multiple dimensions: i) the hierarchically organized scopes of projects and APIs; ii) metrics of API usage (e.g., number of project classes extending API classes); iii) metadata for APIs; iv) project- versus API-centric views. We also provide the QUAATLAS corpus of Java projects which enhances the existing QUALITAS corpus to enable API-usage analysis. We implemented the exploration approach in an open-source, IDE-like, Web-enabled tool EXAPUS.

Comprehending Software Architectures
Tue, May 21, 16:30 - 18:00, Bayview A (Chair: Dirk Beyer)

Evaluating Software Clustering Algorithms in the Context of Program Comprehension
Anas Mahmoud and Nan Niu
(Mississippi State University, USA)
We propose a novel approach for evaluating software clustering algorithms in the context of program comprehension. Based on the assumption that program comprehension is a task-driven activity, our approach uses interaction logs from previous maintenance sessions to automatically devise multiple comprehension-aware and task-sensitive decompositions of software systems. These decompositions are then used as authoritative figures to evaluate the effectiveness of various clustering algorithms. Our approach addresses several challenges associated with evaluating clustering algorithms externally using expert-driven authoritative decompositions. These challenges include the subjectivity of human experts, the availability of such authoritative figures, and the decaying structure of software systems. We conduct an experimental analysis using two datasets, an open-source system and a proprietary system, to test the applicability of our approach and validate our research claims.

On the Accuracy of Architecture Compliance Checking Support: Accuracy of Dependency Analysis and Violation Reporting
Leo Pruijt, Christian Köppe, and Sjaak Brinkkemper
(Hogeschool Utrecht, Netherlands; Utrecht University, Netherlands)
Architecture Compliance Checking (ACC) is useful to bridge the gap between architecture and implementation. ACC is an approach to verify the conformance of implemented program code to high-level models of architectural design. Static ACC focuses on the modular software architecture and on the existence of rule-violating dependencies between modules. Accurate tool support is essential for effective and efficient ACC. This paper presents a study on the accuracy of ACC tools regarding dependency analysis and violation reporting. Seven tools were tested and compared by means of a custom-made test application. In addition, the code of the open source system Freemind was used to compare the tools on the number and precision of reported violation and dependency messages. On average, 74 percent of 34 dependency types in our custom-made test software were reported, while 69 percent of 109 violating dependencies within a module of Freemind were reported. The test results show large differences between the tools, but all tools could improve the accuracy of the reported dependencies and violations.
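Static ACC compares extracted dependencies against architectural rules. The violation-reporting step is therefore only as accurate as the dependency analysis that feeds it. The sketch below shows that reporting step, assuming dependencies have already been extracted as class pairs and that rules are a whitelist of allowed module-to-module dependencies. This rule model and all names are hypothetical simplifications; the tools studied in the paper support richer rule types.

```python
def find_violations(dependencies, module_of, allowed):
    """Report dependencies that cross module boundaries without an allowing rule.

    dependencies: iterable of (from_class, to_class) pairs found by static analysis.
    module_of: {class_name: module_name}.
    allowed: set of (from_module, to_module) pairs the architecture permits.
    """
    violations = []
    for src, dst in dependencies:
        m_src, m_dst = module_of[src], module_of[dst]
        if m_src != m_dst and (m_src, m_dst) not in allowed:
            violations.append((src, dst))
    return violations


module_of = {"OrderView": "presentation", "OrderService": "domain", "OrderDao": "data"}
allowed = {("presentation", "domain"), ("domain", "data")}  # strict layering
deps = [("OrderView", "OrderService"), ("OrderService", "OrderDao"), ("OrderView", "OrderDao")]
print(find_violations(deps, module_of, allowed))  # [('OrderView', 'OrderDao')]
```

If the dependency analysis misses a dependency type (for example, one introduced through inheritance or a cast), the corresponding violation silently disappears from the report. This is why the paper measures dependency detection and violation reporting together.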

Building Extensions for Applications: Towards the Understanding of Extension Possibilities
Mohamed Aly, Anis Charfi, and Mira Mezini
(SAP, Germany; TU Darmstadt, Germany)
Software extensions enable developers to introduce new features to a software system for supporting new requirements. In order for a developer to build an extension for a certain software system, the developer has to understand what extension possibilities exist, which software artifacts provide these possibilities, the constraints and dependencies between the extensible software artifacts, and how to correctly implement an extension. Building extensions for multilayered applications can be very challenging. For example, a simple user interface extension in a business application can require a developer to consider extensible artifacts from underlying user interfaces, business processes, databases, and code. In commercial applications, extension developers can depend on classical means like APIs, frameworks, documentation, tutorials, and example code provided by the software provider to understand the extension possibilities and how to successfully implement, run, and deploy an extension.
For complex multilayered applications, relying on such classical means can be very hard and time-consuming for extension developers. In IDEs, various program comprehension tools and approaches have helped developers carry out development tasks. However, most of these tools focus on the code level, lack support for multilayered applications, and do not particularly focus on extensibility. In this paper we investigate the resources and methods that extension developers currently depend on for implementing extensions, and we evaluate their effectiveness in a study of extension developers performing extension development tasks for a complex business application. Based on the results of our study, we identify the problems and challenges that extension developers face, and we propose requirements that program comprehension tools should support to aid extension developers.

Industry Track
Mon, May 20, 16:30 - 17:30, Bayview B

On the Understanding of Programs with Continuous Code Reviews
Mario Bernhart and Thomas Grechenig
(TU Vienna, Austria)
Code reviews are a very effective but effortful quality assurance technique. A major problem is reading and understanding source code that was produced by someone else. With different programming styles and complex interactions, understanding the code under review is the most expensive subtask of a code review. As with many other modern software engineering practices, code reviews may be applied as a continuous process to reduce the effort and support the concept of collective ownership. This study evaluates the effect of a continuous code review process on the understandability and collective ownership of the code base. A group of 8 subjects performed a total of 114 code reviews within 18 months in an industrial context and conducted an expert evaluation according to this research question. This study concludes that continuous code reviews have a clear positive effect on the understandability and collective ownership of the code base, but also identifies limiting factors and drawbacks for complex review tasks.

Applying Clone Change Notification System into an Industrial Development Process
Yuki Yamanaka, Eunjong Choi, Norihiro Yoshida, Katsuro Inoue, and Tateki Sano
(Osaka University, Japan; Nara Institute of Science and Technology, Japan; NEC, Japan)
Programmers tend to write code clones unintentionally, even in cases where they could easily avoid them. Clone change management is one of the crucial issues in open source software (OSS) development as well as in industrial software development (e.g., development of social infrastructure, financial systems, and medical equipment). When an industrial developer fixes a defect, he/she has to find the code clones corresponding to the code fragment containing it. So far, several studies have analyzed clone evolution in OSS. However, to our knowledge, few studies have reported on the application of a clone change notification system to an industrial development process. In this paper, we introduce a system for notifying developers of the creation and change of code clones, and then report on our experience with a 40-day application of it in a development process at NEC Corporation. In the industrial application, a developer successfully identified ten unintentionally developed clones that should be refactored.

Early Research Achievements Track
Tue, May 21, 16:30 - 18:00, Bayview B

Manhattan: Supporting Real-Time Visual Team Activity Awareness
Michele Lanza, Marco D'Ambros, Alberto Bacchelli, Lile Hattori, and Francesco Rigotti
(University of Lugano, Switzerland)
Collaboration is essential for the development of complex software systems. An important aspect of collaboration is team awareness: the understanding of the activity of others that provides a context for one’s own activity. We claim that current IDE support for awareness is inadequate: the typical setting is to rely on software configuration management systems (SCMs), which are based on an explicit check-out/check-in model. If developers rely only on SCM information, they become aware of concurrent changes only when they commit their code to the repository. This leads to problems such as complex merges and redundant work. Most tools that raise awareness notify developers of emerging conflicts through textual notifications. We propose to improve notification by integrating real-time visualization of team activity into the IDE. Our approach, implemented in a tool called Manhattan, eases team activity comprehension by relying on a city metaphor. Manhattan depicts a software system as a live city that changes as the underlying system evolves. Within the city, Manhattan renders team activity information, updating developers in real time about changes implemented by the entire development team. Further, Manhattan provides programmers with immediate feedback about emerging conflicts in which they are involved.
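The abstract does not say how Manhattan detects emerging conflicts. A minimal sketch, assuming each IDE reports the set of code entities (for example, classes) that its developer has modified but not yet committed, is to flag every entity currently being edited by two or more developers. The input format and the function names emerging_conflicts and conflicts_for are hypothetical.

```python
from collections import defaultdict


def emerging_conflicts(workspaces):
    """Return entities that two or more developers are modifying concurrently.

    workspaces: dict of developer name -> set of entity names modified in that
    developer's uncommitted working copy (hypothetical input format).
    """
    editors = defaultdict(set)
    for developer, entities in workspaces.items():
        for entity in entities:
            editors[entity].add(developer)
    # An entity with more than one concurrent editor is a potential conflict.
    return {entity: devs for entity, devs in editors.items() if len(devs) > 1}


def conflicts_for(developer, workspaces):
    """Return the emerging conflicts involving `developer`, mapped to the other editors."""
    return {
        entity: devs - {developer}
        for entity, devs in emerging_conflicts(workspaces).items()
        if developer in devs
    }
```

Because this check runs on uncommitted state, it surfaces the overlap before either developer commits, which is the gap the abstract attributes to the SCM check-out/check-in model.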

Blogging Developer Knowledge: Motivations, Challenges, and Future Directions
Chris Parnin, Christoph Treude, and Margaret-Anne Storey
(Georgia Tech, USA; McGill University, Canada; University of Victoria, Canada)
Why do software developers put so much effort into writing public blog posts about their knowledge, experiences, and opinions on software development? What are the benefits and problems, and what tools are needed? What can the research community do to help? In this paper, we describe a research agenda aimed at understanding the motivations and issues of software development blogging. We interviewed developers and mined and analyzed their blog posts. For this initial study, we selected developers from various backgrounds: IDE plugin development, mobile development, and web development. We found that developers used blogging for a variety of functions, such as documentation, technology discussion, and announcing progress. They were motivated by a variety of reasons, such as personal branding, knowledge retention, and feedback. Among the challenges for blog authors identified in our initial study were primitive tool support, difficulty recreating and recalling recent development experiences, and management of blog comments. Finally, many developers said that the motivations and benefits they received from blogging in public did not directly translate to corporate settings.

Towards Generating Human-Oriented Summaries of Unit Test Cases
Manabu Kamimura and Gail C. Murphy
(Fujitsu Labs, Japan; University of British Columbia, Canada)
The emergence of usable unit testing frameworks (e.g., JUnit for Java code) and unit test generators (e.g., CodePro for Java code) makes it easier to create more comprehensive unit test suites for applications. Unfortunately, test code, especially generated test code, can be difficult to comprehend. In this paper, we propose generating human-oriented summaries of test cases. We suggest an initial approach based on a static analysis of the source code of the test cases. Our goal is to improve a human’s ability to quickly comprehend unit test cases so that appropriate decisions can be made about where to place effort when dealing with large unit test suites.

Towards a Unified Software Attack Model to Assess Software Protections
Cataldo Basile and Mariano Ceccato
(Politecnico di Torino, Italy; Fondazione Bruno Kessler, Italy)
Attackers can tamper with programs to break usage conditions. Different software protection techniques have been proposed to limit the possibility of tampering. Some of them merely limit the ability to understand the (binary) code; others react more actively when a change attempt is detected. However, software protection techniques have always been validated without considering a unified process that attackers adopt to tamper with programs.
In this paper we present an extension of the mini-cycle of change, initially proposed to model the process of changing a program for maintenance, to describe the process an attacker follows to defeat software protections. This paper also shows how this new model should support a developer in deciding which protections are most appropriate to deploy.

Improving the Detection Accuracy of Evolutionary Coupling
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
If two or more program entities (e.g., files, classes, methods) co-change frequently during software evolution, these entities are said to have evolutionary coupling. Entities that frequently co-change (i.e., exhibit evolutionary coupling) are likely to have logical coupling (or dependencies) among them. Association rules and two related measurements, Support and Confidence, have been used to predict whether two or more co-changing entities are logically coupled. In this paper, we propose and investigate a new measurement, Significance, that has the potential to improve the detection accuracy of association rule mining techniques. Our preliminary investigation on four open-source subject systems suggests that our proposed measurement is capable of extracting coupling relationships even from infrequently co-changed entity sets that might seem insignificant when considering only Support and Confidence. Our proposed measurement, Significance (in association with Support and Confidence), has the potential to predict logical coupling with higher precision and recall.
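As a reference point for the two established measurements, here is a minimal sketch that treats each commit as the set of entities changed together. It uses the common count-based definition of Support for a rule A => B (the number of commits containing both A and B; some work instead normalizes by the total number of commits) and Confidence = Support(A, B) / (number of commits containing A). The proposed Significance measure is not defined in this abstract, so it is deliberately omitted; the function names, thresholds, and input format are assumptions.

```python
from itertools import permutations


def rule_metrics(commits, antecedent, consequent):
    """Compute Support and Confidence for the rule antecedent => consequent.

    commits: list of sets, each holding the entities changed together in one commit.
    Support counts commits containing both entities; Confidence divides that
    count by the number of commits containing the antecedent.
    """
    with_antecedent = sum(1 for c in commits if antecedent in c)
    support = sum(1 for c in commits if antecedent in c and consequent in c)
    confidence = support / with_antecedent if with_antecedent else 0.0
    return support, confidence


def coupled_pairs(commits, min_support, min_confidence):
    """Return every ordered entity pair whose rule meets both thresholds."""
    entities = set().union(*commits)
    result = []
    for a, b in permutations(sorted(entities), 2):
        support, confidence = rule_metrics(commits, a, b)
        if support >= min_support and confidence >= min_confidence:
            result.append((a, b, support, confidence))
    return result
```

Note that the rule is directional: if A.java changes in three commits and B.java in two, both together in two, then A.java => B.java has Confidence 2/3 while B.java => A.java has Confidence 1.0. A pair that co-changed only once falls below a Support threshold of 2 regardless of its Confidence, which is the kind of infrequently co-changed set the abstract says Support and Confidence alone may dismiss.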

Tool Demonstrations
Tue, May 21, 11:00 - 12:30, Bayview B

Agec: An Execution-Semantic Clone Detection Tool
Toshihiro Kamiya
(Future University Hakodate, Japan)
Agec is a semantic code-clone detection tool for Java bytecode, which (1) applies a kind of abstract interpretation to bytecode as a static analysis, in order to generate n-grams of possible execution traces, (2) detects identical n-grams at distinct places in the bytecode, and (3) then reports these n-grams as code clones. The strengths of the tool are static analysis (no need for test cases), detection of clones of deeply nested invocations, and Map-Reduce-ready detection algorithms for scalability.
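Steps (2) and (3) can be sketched as follows, assuming the execution traces from step (1) are already available as lists of operation names; the abstract interpretation of bytecode is outside this sketch. Here a "place" is taken to be a method, which is an assumption, and detect_ngram_clones is a hypothetical name, not Agec's interface.

```python
from collections import defaultdict


def ngrams(trace, n):
    """Yield every contiguous n-gram (as a tuple) of a trace of operations."""
    for i in range(len(trace) - n + 1):
        yield tuple(trace[i:i + n])


def detect_ngram_clones(traces_by_location, n):
    """Report n-grams that occur at two or more distinct locations.

    traces_by_location: dict of location (e.g. "Foo.bar()") -> list of traces,
    where each trace is a list of operation names (assumed input format).
    """
    locations = defaultdict(set)
    for location, traces in traces_by_location.items():
        for trace in traces:
            for gram in ngrams(trace, n):
                locations[gram].add(location)
    # Keep only n-grams shared by distinct places; these are the clone candidates.
    return {gram: locs for gram, locs in locations.items() if len(locs) > 1}
```

The grouping step (emit each n-gram with its location, then collect locations per n-gram) has the shape of a map/reduce job, which is consistent with the abstract's claim that the detection algorithms are Map-Reduce ready.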

JSummarizer: An Automatic Generator of Natural Language Summaries for Java Classes
Laura Moreno, Andrian Marcus, Lori Pollock, and K. Vijay-Shanker
(Wayne State University, USA; University of Delaware, USA)
JSummarizer is an Eclipse plug-in for automatically generating natural language summaries of Java classes. The summary is based on the stereotype of the class, which implicitly encodes the design intent of the class and is automatically inferred by JSummarizer. The tool uses a set of predefined heuristics to determine what information will be reflected in the summary, and it uses natural language processing and generation techniques to form the summary. The generated summaries can be used to re-document the code and to help developers more easily understand large and complex classes.

OnionUML: An Eclipse Plug-In for Visualizing UML Class Diagrams in Onion Graph Notation
Michael Falcone and Bonita Sharif
(Youngstown State University, USA)
This paper presents OnionUML, an Eclipse plug-in that reduces the number of visible classes in a UML class diagram while preserving the structure and semantics of the UML elements. Compaction of class elements is done using onion graph notation. The goal is to enable developers to view and understand subsystems of a large software system while also seeing how each subsystem fits into the whole system.

SimCad: An Extensible and Faster Clone Detection Tool for Large Scale Software Systems
Md. Sharif Uddin, Chanchal K. Roy, and Kevin A. Schneider
(University of Saskatchewan, Canada)
Code cloning is an inevitable phenomenon in the evolution of software systems. To reduce the harmful effects of clones in software evolution, clones need to be identified both correctly and efficiently. A software system may contain various types of clones. Earlier research shows that detecting near-miss clones in large datasets is costly in terms of time and memory, and few of the clone detection tools available in practice have proven effective in that regard. In this paper we present SimCad, a standalone clone detection tool. It is based on a highly scalable and fast clone detection algorithm designed to detect both exact and near-miss clones in large-scale software systems. A notable aspect of SimCad is that its clone detection function has been made more portable by packaging it into a library called SimLib. SimLib can thus be used as an off-the-shelf clone detection library that is easily integrated into other applications that work with detected clones. For example, a standalone tool or an Integrated Development Environment (IDE) plugin can use SimLib for real-time clone detection while providing its own services, such as clone visualization and/or clone management. We hope that both researchers and developers will benefit from using these tools for detecting and managing clones in software.
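The abstract does not describe SimCad's algorithm, so the following is not SimCad or SimLib. It is a minimal sketch of the exact versus near-miss distinction using a swapped-in technique, token-set Jaccard similarity with a hypothetical threshold of 0.7: fragments with identical token sequences count as exact clones (layout differences are ignored), and fragments whose token sets are similar enough count as near-miss clones.

```python
import re
from itertools import combinations


def tokens(code):
    """Split a fragment into identifier, number, and single-character tokens (a rough lexer)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a, b):
    """Jaccard similarity of two token sets: size of intersection over size of union."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def classify_pairs(fragments, threshold=0.7):
    """Label each fragment pair as an exact or near-miss clone; other pairs are omitted.

    fragments: dict of fragment name -> source text.
    threshold: hypothetical minimum similarity for a near-miss clone.
    """
    results = []
    for (name_a, code_a), (name_b, code_b) in combinations(fragments.items(), 2):
        ta, tb = tokens(code_a), tokens(code_b)
        if ta == tb:
            results.append((name_a, name_b, "exact"))
        elif jaccard(ta, tb) >= threshold:
            results.append((name_a, name_b, "near-miss"))
    return results
```

The pairwise loop is quadratic in the number of fragments, which is exactly the time cost on large datasets that the abstract says a scalable detector has to avoid; it is kept here only to make the two clone categories concrete.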


Preface

Keynote

Technical Research

Textual Analysis Mon, May 20, 11:00 - 12:30, Bayview A (Chair: Gabriele Bavota)

The Role of Visualization in Program Comprehension Mon, May 20, 14:00 - 16:00, Bayview A (Chair: Andrian Marcus)

Software Quality Mon, May 20, 16:30 - 17:30, Bayview A (Chair: Andrew Begel)

Source Code Comprehension Tue, May 21, 11:00 - 12:30, Bayview A (Chair: Andy Zaidman)

Traceability and Feature Location Tue, May 21, 14:00 - 15:00, Bayview A (Chair: Lori Pollock)

Comprehending API Tue, May 21, 15:00 - 16:00, Bayview A (Chair: Chris Parnin)

Comprehending Software Architectures Tue, May 21, 16:30 - 18:00, Bayview A (Chair: Dirk Beyer)

Industry Track Mon, May 20, 16:30 - 17:30, Bayview B

Early Research Achievements Track Tue, May 21, 16:30 - 18:00, Bayview B

Tool Demonstrations Tue, May 21, 11:00 - 12:30, Bayview B