SANER 2017

2017 IEEE 24th International Conference on Software Analysis, Evolution, and Reengineering (SANER), February 20-24, 2017, Klagenfurt, Austria

Desktop Layout

Software and Model Analysis
Main Research
Reducing Redundancies in Multi-revision Code Analysis
Carol V. Alexandru, Sebastiano Panichella, and Harald C. Gall
(University of Zurich, Switzerland)
Abstract: Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resources requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code.

Authors:


Time stamp: 2020-02-28T21:38:31+01:00