MAINT 2019 – Proceedings

Message from the Chairs
Welcome to the 2nd International Workshop on Mining and Analyzing Interaction Histories (MAINT 2019), co-located with SANER 2019, the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, held in Hangzhou, China on the 24th of February 2019. MAINT focuses on the interactions of developers with the IDE, i.e., large sequences of events that capture how developers use the IDE's functionalities to support the programming task at hand, including activities like reading, writing, and debugging source code. These interaction histories can be seen at different levels of abstraction: they can be high-level events related to the domain of code entities, like a refactoring or adding a method to a class, very basic events like typing on the keyboard or moving the mouse to reach a specific UI element in the IDE, or even biometric data captured by wearable devices.

Comprehending and Empowering Developers by Mining Interaction Data (Keynote)
David Lo
(Singapore Management University, Singapore)
Developer interaction data is a rich trove of information that can be analysed and mined to better comprehend developer activities, which in turn sheds light on developers' pain points and needs and can empower them to perform their tasks more effectively and/or efficiently. Interaction data can be obtained by tracking developer activities in the IDE and in related tools that developers regularly use to accomplish their day-to-day tasks (e.g., a web browser). Mining and analysis of developer interaction data is a relatively young topic in the area of mining software repositories (MSR). As such, it is now an exciting time, with many challenges to address and opportunities to tap.
This talk highlights several of our recent works, which are part of the exciting community-wide effort to tackle the challenges and take advantage of the potential of mining developer interaction data. In particular, this talk, which is divided into 3 parts, highlights how developer interaction data can be mined to (1) comprehend how developers understand programs, (2) comprehend how developers search the web, and (3) empower developers to create interactive video tutorials. In the first part, I describe our field study of program comprehension in practice, which analysed developer interaction data within and outside the IDE across a total of seven real projects, involving 78 professional developers and amounting to 3,148 working hours. In the second part, I describe our mixed-method study, based on collecting search queries from 60 developers' interaction data and surveying 235 software engineers from more than 21 countries across five continents, to understand what developers frequently search for and which search tasks they often find challenging. In the final part, I describe our work that designs and builds a programming video tutorial authoring system that leverages operating system level instrumentation to log workflow history while tutorial authors are creating programming videos, and the corresponding tutorial watching system that enhances the learning experience of video tutorials by providing programming-specific workflow history and timeline-based browsing interactions.
Aside from methodologies and findings, opportunities in terms of open technical problems and potential benefits of mining developer interaction data will also be discussed. Hopefully, the talk will inspire attendees to continue innovating in this exciting research topic.

Visual Studio Automated Refactoring Tool Should Improve Development Time, but ReSharper Led to More Solution-Build Failures
Ehsan Firouzi and Ashkan Sami
(Shiraz University, Iran)
Several studies have shown the effectiveness of manual refactoring. Thus, automated refactoring tools in Visual Studio are expected to have positive effects on the success rate of ‘Builds’. Few studies have actually investigated the impact of automated refactoring tools in this respect. In addition, we investigated test results, as well as commits and undone changes in the version control system (VCS). Firstly, if the number of successful compilations of session solution ‘Builds’ is considered as a metric, our investigation of the Enriched Event Stream Dataset showed that the use of ReSharper lowered the success rate of builds, introduced errors, and may have increased development time. Secondly, test error rates decreased when ReSharper tools were used, in line with previous studies on software testing. Finally, the percentage of commits significantly increased and reverts decreased with the use of ReSharper. Overall, despite some positive impacts of the automated refactoring tool of Visual Studio, our study showed that the automated suggestions and changes made by ReSharper are not completely trustworthy. Our research suggests that ReSharper may not consider a global view of the solution.

Statistical API Completion Based on Code Relevance Mining
Chengpeng Wang, Yixiao Yang, Han Liu, and Le Kang
(Tsinghua University, China; Chinese Academy of Sciences, China)
While Application Programming Interfaces (APIs) enable an easy and flexible software development process, selecting a best-fit API is often not straightforward in practice, due to, for example, misunderstanding of the API specification or a complex programming context. Consequently, API selection has always been time-consuming and error-prone. In recent years, API recommendation systems have been introduced to help developers choose an API automatically, e.g., Eclipse and IntelliJ can generate an internal or user-defined API on the fly. Other research has leveraged language models to capture the regularity in API usage and further guide the completion of APIs. While existing approaches provide general support for API usage, they suffer from a lack of semantic awareness (e.g., Eclipse) and code relevance (e.g., language-model-based methods). To overcome these limitations, we propose CRMAC in this paper. The key insight of CRMAC is the combination of a cache language model, which learns code regularity from both open-source projects and local projects, with a relevance mining engine that identifies similar code to enable weighted language model training. In our empirical evaluation, CRMAC outperformed n-gram approaches, with an improvement of 5.28% in terms of top-10 accuracy. Moreover, over 79% of APIs were correctly predicted in the top 10 guesses of CRMAC.
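As a rough illustration of the general idea in the abstract above (not CRMAC's actual implementation, which the abstract does not detail), the sketch below interpolates a corpus-wide model with a local "cache" model and weights corpus snippets by their similarity to the local code. The bigram model, the Jaccard similarity used as a relevance score, the interpolation weight `lam`, and every name and sample sequence here are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def jaccard(a, b):
    """Token-set similarity, used here as a stand-in relevance score (assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


class BigramModel:
    """Hypothetical bigram model over API call sequences."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sequence, weight=1.0):
        # Each adjacent pair of calls contributes `weight` to its count.
        for prev, nxt in zip(sequence, sequence[1:]):
            self.counts[prev][nxt] += weight

    def prob(self, prev, nxt):
        followers = self.counts.get(prev, Counter())
        total = sum(followers.values())
        return followers[nxt] / total if total else 0.0

    def candidates(self, prev):
        return set(self.counts.get(prev, ()))


def top_k(global_model, cache_model, prev, k=10, lam=0.5):
    """Rank candidate next calls by a linear mix of global and cache probabilities."""
    cands = global_model.candidates(prev) | cache_model.candidates(prev)
    score = {
        c: (1 - lam) * global_model.prob(prev, c) + lam * cache_model.prob(prev, c)
        for c in cands
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(cands, key=lambda c: (-score[c], c))[:k]


# Made-up sample data: two corpus snippets and the developer's local code.
corpus = [
    ["File.open", "File.read", "File.close"],
    ["File.open", "File.write", "File.close"],
]
local = ["File.open", "File.write", "File.flush"]

global_model = BigramModel()
for snippet in corpus:
    # Snippets more similar to the local code count more during training.
    global_model.train(snippet, weight=jaccard(snippet, local))

cache_model = BigramModel()
cache_model.train(local)

suggestions = top_k(global_model, cache_model, "File.open")
print(suggestions)
```

Top-k accuracy, the metric quoted in the abstract, would then be the fraction of held-out call sites whose true next API appears in the list returned by `top_k`.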

Summarizing Code Changes by Tracing an Operation History Graph
Takayuki Omori, Katsuhisa Maruyama, and Atsushi Ohnishi
(Ritsumeikan University, Japan)
By replaying the edit history of source code, the changes in software can be precisely understood. Recently, several replay tools have been proposed to support developers' tasks of understanding past code changes. However, replaying is still a time-consuming task due to the plethora of edit operations. This paper proposes a novel method to generate a summary of edit histories. The method is intended to be used before replaying with existing tools, to easily find which part is important for the current understanding task. This paper also introduces OpG2, which is a graph format that represents change histories at the code and syntax levels. We can easily summarize edit operations by tracing edges in the graph. A case study shows an example of the summarization process using an operation history derived from an actual software development project.

Toward Interaction-Based Evaluation of Visualization Approaches to Comprehending Program Behavior
Lyu Kaixie, Kunihiro Noda, and Takashi Kobayashi
(Tokyo Institute of Technology, Japan)
Reverse-engineered sequence diagrams are promising tools to comprehend the runtime behavior of object-oriented programs. To improve the readability and understandability of massive-scale sequence diagrams, various techniques for effectively exploring or compressing sequence diagrams have been proposed in the literature. When researchers analyze the effectiveness of these approaches through user studies, it is important to reveal not only which techniques can improve developer productivity but also how developers explore reverse-engineered sequence diagrams and how exploration and compression features are utilized. We developed a feature to record interactions between a developer and recovered sequence diagrams in our tool, SDExplorer. We show how the recorded interaction data can be used for in-depth analysis of developer activities, toward the evaluation of visualization approaches that aid behavioral comprehension.

2019 IEEE 2nd International Workshop on Mining and Analyzing Interaction Histories (MAINT)