ESEC/FSE Workshops 2017
2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2017)

3rd ACM SIGSOFT International Workshop on Software Analytics (SWAN 2017), September 4, 2017, Paderborn, Germany

SWAN 2017 – Proceedings



Title Page


Message from the Chairs
It is our pleasure to welcome you to SWAN 2017, the third International Workshop on Software Analytics, co-located with the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE) and held on September 4, 2017, in Paderborn, Germany. SWAN 2017 aims to provide a common venue for researchers and practitioners across the software engineering, data mining, and mining software repositories research communities to share new approaches and emerging results in developing and validating analytics-rich solutions, as well as in applying analytics to software development and maintenance processes to better inform everyday decisions.

Find, Understand, and Extend Development Screencasts on YouTube
Mathias Ellmann, Alexander Oeser, Davide Fucci, and Walid Maalej
(University of Hamburg, Germany; HITeC, Germany)
A software development screencast is a video that captures the screen of a developer working on a particular task and explaining implementation details. Due to the increasing popularity of development screencasts, e.g., on YouTube, we study how and to what extent they can be used as an additional source of knowledge to answer developers' questions, for example about the use of a specific API. We first study the difference between development screencasts and other types of screencasts using video frame analysis. When comparing frames with the cosine similarity measure, developers can expect ten development screencasts in the top 20 results out of 100 different YouTube videos. We then extracted popular development topics: database operations, system set-up, plug-in development, game development, and testing. We also identified six recurring tasks performed in development screencasts, such as object usage and UI operations. Finally, we conducted a similarity analysis between the screencast transcripts and the corresponding Javadoc.
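The frame comparison mentioned in the abstract can be illustrated in a few lines of Python. This is a minimal sketch, assuming frames are available as grayscale NumPy arrays; the function names are ours, not the authors' implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two frames, flattened to vectors."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def frame_change_profile(frames: list) -> list:
    """Similarity of each consecutive frame pair. Development screencasts,
    which mostly show static IDE/editor views, tend to score consistently
    high, whereas e.g. gameplay footage changes rapidly between frames."""
    return [cosine_similarity(f1, f2) for f1, f2 in zip(frames, frames[1:])]
```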

Analyzing Source Code for Automated Design Pattern Recommendation
Oliver Hummel and Stefan Burger
(Mannheim University of Applied Sciences, Germany; Siemens, Germany)
Mastery of the subtleties of object-oriented programming languages is undoubtedly challenging to achieve. Design patterns were proposed some decades ago to support software designers and developers in overcoming recurring challenges in the design of object-oriented software systems. However, given that dozens if not hundreds of patterns have emerged so far, it can be assumed that their mastery has become a serious challenge in its own right. In this paper, we describe a proof-of-concept implementation of a recommendation system that aims to detect opportunities for the Strategy design pattern that developers have missed so far. For this purpose, we have formalized natural language pattern guidelines from the literature and quantified them for static code analysis with data mined from a significant collection of open source systems. Moreover, we present the results from analyzing 25 different open source systems with this prototype, as it discovered more than 200 candidates for implementing the Strategy pattern, and the encouraging results of a preliminary evaluation with experienced developers. Finally, we sketch how we are currently extending this work to other patterns.
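One such formalized guideline can be pictured as a simple static check: long if/elif chains that dispatch on some condition are a classic textual hint that a Strategy may be missing. The sketch below is ours and uses Python's ast module rather than the Java-oriented analysis of the paper; the branch threshold is illustrative, whereas the paper mines its thresholds from open source data.

```python
import ast

def strategy_candidates(source: str, min_branches: int = 4) -> list:
    """Flag functions whose if/elif chains reach min_branches, a rough
    stand-in for the paper's quantified pattern guidelines."""
    candidates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for stmt in ast.walk(node):
            if isinstance(stmt, ast.If):
                branches, cur = 1, stmt
                # elif chains are parsed as nested If nodes in orelse
                while len(cur.orelse) == 1 and isinstance(cur.orelse[0], ast.If):
                    branches, cur = branches + 1, cur.orelse[0]
                if branches >= min_branches:
                    candidates.append(node.name)
                    break
    return candidates
```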

Metadata-Based Code Example Embedding
Philippe Tamla, Sven Feja, and Christian R. Prause
(NTT DATA, Germany; adesso, Germany; German Aerospace Center, Germany)
In practice, developers constantly seek ways to save time and effort. They therefore use various tools (such as search engines, issue trackers, or Q&A sites) to collaborate and find code examples that meet their specific needs. However, such tools only support the traditional find-alter-embed approach to code examples while ignoring the origin and location of these sources. Such information can be very useful for software development tasks such as bug fixing, teamwork, and knowledge transfer, through direct notification of critical changes made to the code example, or access to the original source including its discussions, issues, and bug reports. In this paper, we propose a new approach that consists of collecting meta information about a code example to automatically track critical changes to it and its origin and to provide feedback to both developers and the online community. We report on our vision, approach, and challenges, and draft a software architecture to implement our research idea.
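As a thought experiment, the meta information described above might look like the following record. This is a minimal sketch in Python; all field names are hypothetical and do not reflect the paper's actual schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CodeExampleMetadata:
    """Provenance attached to an embedded code example so the embedding
    site can later be notified about upstream changes."""
    origin_url: str    # e.g., the Q&A answer or repository the snippet came from
    retrieved_at: str  # when the snippet was copied
    snippet_hash: str  # fingerprint; a changed hash upstream signals an edit

    @classmethod
    def capture(cls, origin_url: str, snippet: str) -> "CodeExampleMetadata":
        return cls(
            origin_url=origin_url,
            retrieved_at=datetime.now(timezone.utc).isoformat(),
            snippet_hash=hashlib.sha256(snippet.encode("utf-8")).hexdigest(),
        )
```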

Timezone and Time-of-Day Variance in GitHub Teams: An Empirical Method and Study
Premkumar Devanbu, Pallavi Kudigrama, Cindy Rubio-González, and Bogdan Vasilescu
(University of California at Davis, USA; Carnegie Mellon University, USA)
Open source projects based in ecosystems like GitHub seamlessly allow distributed software development. Contributors to some GitHub projects may originate from many different timezones; in others, they may all reside in just one timezone. How might this timezone dispersion (or concentration) affect the diurnal distribution of work activity in these projects? In commercial projects, there has been a desire to use top-down management and work allocation to exploit the timezone dispersion of project teams and engender a more round-the-clock work cycle. We focus on GitHub and explore the relationship between timezone dispersion and work activity dispersion. We find that while time-of-day work activity dispersion is indeed strongly associated with timezone dispersion, it is equally (if not more strongly) affected by project team size.
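One plausible way to operationalize "work activity dispersion" is the entropy of a project's hour-of-day commit histogram. The metric choice below is our assumption for illustration, not necessarily the paper's exact measure.

```python
import math
from collections import Counter

def hourly_entropy(commit_hours: list) -> float:
    """Shannon entropy (in bits) of the hour-of-day histogram of commits.
    0.0 means all activity falls in a single hour; log2(24) ~ 4.58 means a
    perfectly round-the-clock work cycle."""
    if not commit_hours:
        return 0.0
    counts = Counter(h % 24 for h in commit_hours)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```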

Predicting Rankings of Software Verification Tools
Mike Czech, Eyke Hüllermeier, Marie-Christine Jakobs, and Heike Wehrheim
(University of Paderborn, Germany)
Today, software verification tools have reached the maturity to be used on large-scale programs. However, different tools perform differently well on different kinds of code. A software developer is hence faced with the problem of choosing a tool appropriate for her program at hand. A ranking of tools on programs could facilitate the choice. So far, however, such rankings can only be obtained by running all considered tools on the program.
In this paper, we present a machine learning approach to predicting rankings of tools on programs. The method builds upon so-called label ranking algorithms, which we complement with appropriate kernels providing a similarity measure for programs. Our kernels employ a graph representation of software source code that mixes elements of control flow and program dependence graphs with abstract syntax trees. Using data sets from the software verification competition SV-COMP, we demonstrate that our rank prediction technique generalizes well and achieves rather high predictive accuracy (rank correlation > 0.6).
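The reported accuracy is a rank correlation between predicted and observed tool rankings. Assuming it is Spearman's rho (the abstract does not name the statistic), it can be computed as in the following sketch.

```python
def spearman_rho(predicted: list, observed: list) -> float:
    """Spearman rank correlation for two rankings without ties.
    predicted/observed hold each tool's rank position, e.g. [1, 3, 2]."""
    n = len(predicted)
    d_squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# A correlation above 0.6 means the predicted tool order largely agrees
# with the order obtained by actually running every tool.
print(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```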
