2015 IEEE 1st International Workshop on Software Analytics (SWAN),
March 2, 2015,
Montréal, Canada
Frontmatter
Foreword
Welcome to SWAN 2015, the first International Workshop on Software Analytics, co-located with the
22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER 2015) and
held on March 2, 2015 in Montréal, Canada.
Bugs, Defects, and Crashes
Toward a Learned Project-Specific Fault Taxonomy: Application of Software Analytics
Billy Kidwell and
Jane Huffman Hayes
(University of Kentucky, USA)
This position paper argues that fault classification provides vital information for software analytics, and that machine learning techniques such as clustering can be applied to learn a project- (or organization-) specific fault taxonomy. Anecdotal evidence for this position is presented, along with possible areas of research for moving toward the posited goal.
@InProceedings{SWAN15p1,
author = {Billy Kidwell and Jane Huffman Hayes},
title = {Toward a Learned Project-Specific Fault Taxonomy: Application of Software Analytics},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {1--4},
doi = {},
year = {2015},
}
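To make the taxonomy-learning idea concrete, the following is a minimal sketch of clustering fault descriptions with TF-IDF and k-means in Python. The pipeline and the sample fault texts are illustrative assumptions, not the authors' implementation.

# Minimal sketch: learn candidate fault categories by clustering
# bug report texts. Pipeline and data are assumptions for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

fault_descriptions = [  # hypothetical fault report texts
    "null pointer dereference in parser",
    "off-by-one error in loop bound",
    "race condition on shared counter",
    "missing null check before dereference",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(fault_descriptions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Each cluster is a candidate category in the learned, project-specific taxonomy.
for text, label in zip(fault_descriptions, labels):
    print(label, text)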
Challenges and Issues of Mining Crash Reports
Le An and
Foutse Khomh
(Polytechnique Montréal, Canada)
Automatic crash reporting tools built into many software systems allow software practitioners to understand the origin of field crashes and help them prioritise field crashes or bugs, locate erroneous files, and/or predict bugs and crash occurrences in subsequent versions of the software systems. In this paper, after illustrating the structure of crash reports in Mozilla, we discuss some techniques for mining information from crash reports and highlight the challenges and issues of these techniques. Our aim is to raise the awareness of the research community about issues that may bias research results obtained from crash reports, and to provide some guidelines for addressing certain challenges related to mining crash reports.
@InProceedings{SWAN15p5,
author = {Le An and Foutse Khomh},
title = {Challenges and Issues of Mining Crash Reports},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {5--8},
doi = {},
year = {2015},
}
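One common crash-mining step the paper's setting involves is grouping reports by a "signature" derived from the top stack frames (Mozilla's Socorro does something similar). Below is a hedged sketch; the field names and the grouping rule are assumptions made for illustration only.

# Sketch: group crash reports by a stack-trace signature.
from collections import defaultdict

crashes = [  # hypothetical crash reports
    {"id": 1, "stack": ["js::GC", "js::Allocate", "main"]},
    {"id": 2, "stack": ["js::GC", "js::Allocate", "worker"]},
    {"id": 3, "stack": ["nsSocket::Read", "main"]},
]

def signature(stack, depth=2):
    """Use the top `depth` frames as the crash signature (assumed rule)."""
    return " | ".join(stack[:depth])

groups = defaultdict(list)
for crash in crashes:
    groups[signature(crash["stack"])].append(crash["id"])

# Crash volume per signature is one simple input to crash triage.
for sig, ids in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(len(ids), sig)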
Bug Report Recommendation for Code Inspection
Shin Fujiwara,
Hideaki Hata,
Akito Monden, and
Kenichi Matsumoto
(NAIST, Japan)
Large software projects such as Mozilla Firefox and Eclipse have more than ten thousand bug reports that have been reported but left unresolved. To utilize this large number of unresolved bug reports and accelerate bug detection and removal, we propose a way to recommend to programmers bug reports that are likely to contain failure descriptions related to a source file being inspected. We employ the vector space model (VSM) to rank bug reports by relevance to a given source file. The result of an experiment using data from three open source software projects showed that the accuracy of recommendations ranged from 21.74% to 60.05%, measured as the percentage of recommendations that contained relevant bug reports in the top-10 recommended list.
@InProceedings{SWAN15p9,
author = {Shin Fujiwara and Hideaki Hata and Akito Monden and Kenichi Matsumoto},
title = {Bug Report Recommendation for Code Inspection},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {9--12},
doi = {},
year = {2015},
}
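To make the proposed VSM ranking concrete, here is a minimal Python sketch using scikit-learn: bug reports and the inspected file are projected into the same TF-IDF space and ranked by cosine similarity. The corpus, the word-level preprocessing of the source file, and the top-10 cutoff are illustrative assumptions, not the authors' exact setup.

# Sketch: rank bug reports by VSM relevance to a source file.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

bug_reports = [  # hypothetical unresolved bug report texts
    "crash when parsing malformed xml attribute",
    "toolbar icon rendering glitch on resize",
    "xml parser leaks memory on invalid input",
]
# Assume identifiers in the source file were split into words beforehand.
source_file_text = "xml parser parse attribute tokens reject malformed input"

vectorizer = TfidfVectorizer()
report_vecs = vectorizer.fit_transform(bug_reports)
file_vec = vectorizer.transform([source_file_text])

scores = cosine_similarity(file_vec, report_vecs)[0]
top = sorted(range(len(bug_reports)), key=lambda i: -scores[i])[:10]
for i in top:
    print(f"{scores[i]:.3f}  {bug_reports[i]}")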
Testing
Test Case Analytics: Mining Test Case Traces to Improve Risk-Driven Testing
Tanzeem Bin Noor and
Hadi Hemmati
(University of Manitoba, Canada)
In risk-driven testing, test cases are generated and/or prioritized based on different risk measures. For example, the most basic risk measure analyzes the history of the software and assigns higher risk to the test cases that detected bugs in the past. In practice, however, a test case may not be exactly the same as a previously failed test, but quite similar. In this study, we define a new risk measure that assigns a risk factor to a test case if it is similar to a failing test case from history. The similarity is defined based on the execution traces of the test cases, where we represent each test case as a sequence of method calls. We have evaluated our new risk measure by comparing it to a traditional risk measure (where the risk is increased only if the very same test case, not a similar one, failed in the past). The results of our study, in the context of test case prioritization, on two open source projects show that our new risk measure is far more effective in identifying failing test cases than the traditional risk measure.
@InProceedings{SWAN15p13,
author = {Tanzeem Bin Noor and Hadi Hemmati},
title = {Test Case Analytics: Mining Test Case Traces to Improve Risk-Driven Testing},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {13--16},
doi = {},
year = {2015},
}
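The following Python sketch illustrates the similarity-based risk measure: each test is a sequence of method calls, and its risk is its maximum similarity to any historically failing trace. The Jaccard similarity over call sets used here is an assumed stand-in; the paper defines similarity over execution traces but this exact metric is our illustrative choice.

# Sketch: similarity-based risk for test case prioritization.
def similarity(trace_a, trace_b):
    """Jaccard similarity over the sets of called methods (assumed metric)."""
    a, b = set(trace_a), set(trace_b)
    return len(a & b) / len(a | b) if a | b else 0.0

failing_traces = [  # hypothetical traces of tests that failed in the past
    ["open", "parse", "validate", "close"],
    ["connect", "send", "timeout"],
]

def risk(trace):
    return max((similarity(trace, f) for f in failing_traces), default=0.0)

new_tests = {
    "t1": ["open", "parse", "close"],    # close to a past failing trace
    "t2": ["render", "layout", "paint"], # unrelated
}

# Prioritize: run the riskiest tests first.
for name, trace in sorted(new_tests.items(), key=lambda kv: -risk(kv[1])):
    print(name, round(risk(trace), 2))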
Testing Analytics on Software Variability
Hareton K. N. Leung and
Kim Man Lui
(Hong Kong Polytechnic University, China)
Software testing is a tool-driven process. However, there are many situations in which different hardware/software components are tightly integrated, so system integration testing has to be executed manually to evaluate the system's compliance with its specified requirements and performance. There can be many combinations of changes, as different versions of hardware and software components may be upgraded and/or substituted; occasionally, some software components may even be replaced by clones. After each component change, the whole system needs to be re-tested to ensure proper system behavior. For better utilization of resources, there is a need to prioritize past test cases when testing the newly integrated systems. We propose a way to facilitate the use of historical testing records of previous systems so that a test-case portfolio can be developed, which aims to make the most of testing resources across the same integrated product family. As the proposed framework does not consider much of the internal software complexity, the implementation costs are relatively low.
@InProceedings{SWAN15p17,
author = {Hareton K. N. Leung and Kim Man Lui},
title = {Testing Analytics on Software Variability},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {17--20},
doi = {},
year = {2015},
}
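As a hedged illustration of such a portfolio, the sketch below scores past test cases by how often they exposed failures on configurations sharing components with the new system. The scoring rule and data layout are assumptions made for illustration; the paper does not prescribe this formula.

# Sketch: prioritize historical test cases for a new configuration.
history = [  # (test case, components of the tested configuration, failed?)
    ("T1", {"camA", "fwV1"}, True),
    ("T2", {"camA", "fwV2"}, False),
    ("T1", {"camB", "fwV2"}, True),
    ("T3", {"camB", "fwV1"}, False),
]

new_config = {"camA", "fwV2"}

def score(test):
    s = 0.0
    for t, components, failed in history:
        if t == test and failed:
            # Past failures on similar configurations count more.
            s += len(components & new_config) / len(components | new_config)
    return s

tests = sorted({t for t, _, _ in history}, key=score, reverse=True)
print(tests)  # suggested run order for the newly integrated system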
Source Code, Branches, and Libraries
How We Resolve Conflict: An Empirical Study of Method-Level Conflict Resolution
Ryohei Yuzuki,
Hideaki Hata, and
Kenichi Matsumoto
(NAIST, Japan)
Context: Branching and merging are common activities in large-scale software development projects. Isolated development with branching enables developers to focus their effort on their specific tasks without wasting time on problems caused by other developers' changes. After the completion of tasks in branches, such branches should be integrated into common branches by merging. When conflicts occur in merging, developers need to resolve them, which is troublesome. Goal: To support conflict resolution in merging, we aim to understand how conflicts are resolved in practice through a large-scale study. Method: We present techniques for identifying conflicts and detecting conflict resolution at the method level. Result: From the analysis of 10 OSS projects written in Java, we found that (1) 44% (339/779) of conflicts are caused by concurrent changes to the same positions in methods, 48% (375/779) by deleted methods, and 8% (65/779) by renamed methods, and that (2) 99% (771/779) of conflicts are resolved by adopting one method directly. Conclusions: Our results suggest that most conflicts are resolved in a simple way. Our future work includes developing methods for supporting conflict resolution.
@InProceedings{SWAN15p21,
author = {Ryohei Yuzuki and Hideaki Hata and Kenichi Matsumoto},
title = {How We Resolve Conflict: An Empirical Study of Method-Level Conflict Resolution},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {21--24},
doi = {},
year = {2015},
}
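The 99% finding suggests a simple detector: compare the merged method body against the two conflicting versions. Below is a minimal Python sketch; the normalization and the three-way classification are assumptions for illustration, and the paper's actual identification technique is more involved.

# Sketch: classify how a method-level conflict was resolved.
def normalize(body: str) -> str:
    return " ".join(body.split())  # ignore whitespace-only differences

def classify_resolution(ours: str, theirs: str, merged: str) -> str:
    m = normalize(merged)
    if m == normalize(ours):
        return "adopted ours"
    if m == normalize(theirs):
        return "adopted theirs"
    return "manually edited"  # the rare (~1%) non-trivial resolution

print(classify_resolution(
    ours="int f() { return 1; }",
    theirs="int f() { return 2; }",
    merged="int f() {  return 2; }",
))  # -> "adopted theirs"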
M3: A General Model for Code Analytics in Rascal
Bas Basten,
Mark Hills,
Paul Klint,
Davy Landman,
Ashim Shahi,
Michael J. Steindorfer, and
Jurgen J. Vinju
(CWI, Netherlands; East Carolina University, USA)
This short paper introduces M3, a simple and extensible model for capturing facts about source code for later analysis. M3 is a core part of the standard library of the Rascal metaprogramming language. We motivate it, position it relative to related work, and detail its key design aspects.
@InProceedings{SWAN15p25,
author = {Bas Basten and Mark Hills and Paul Klint and Davy Landman and Ashim Shahi and Michael J. Steindorfer and Jurgen J. Vinju},
title = {M3: A General Model for Code Analytics in Rascal},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {25--28},
doi = {},
year = {2015},
}
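The gist of an M3-style model is a container of relations over source locations and logical names that analyses can query uniformly. M3 itself is defined in Rascal; the Python rendering below is only a rough analogue of the idea, not the Rascal standard-library API, and the relation names and location scheme are simplified.

# Sketch: a Python analogue of an M3-style fact model.
from dataclasses import dataclass, field

@dataclass
class M3Like:
    declarations: set = field(default_factory=set)  # (logical name, source location)
    uses: set = field(default_factory=set)          # (source location, logical name)
    containment: set = field(default_factory=set)   # (parent name, child name)

model = M3Like()
model.declarations.add(("java+method:///C/m()", "src/C.java:10-20"))
model.uses.add(("src/D.java:5", "java+method:///C/m()"))
model.containment.add(("java+class:///C", "java+method:///C/m()"))

# Example query: all locations where a declared method is used.
for decl, _ in model.declarations:
    callers = [loc for loc, name in model.uses if name == decl]
    print(decl, "used at", callers)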
Studying the Impact of Evolution in R Libraries on Software Engineering Research
Catherine Ramirez,
Meiyappan Nagappan, and
Mehdi Mirakhorli
(Rochester Institute of Technology, USA)
Empirical software engineering has become an integral and important part of software engineering research in both academia and industry. Every year several new theories are empirically validated by mining and analyzing historical data from open source and closed source projects. Researchers rely on statistical libraries in tools like R, Weka, SAS, SPSS, and Matlab for their analysis. However, these libraries, like any software library, undergo periodic maintenance. Such maintenance can improve performance, but it can also alter the core algorithms behind the library. If the core algorithms are indeed changed, then empirical results compiled with previous versions may no longer hold. However, this problem exists only if (a) statistical libraries are constantly edited and (b) the results they produce differ from one version to another. Hence, in this paper, we first explore whether either of the above two conditions holds for one library in the statistical package R. We find that both conditions are true in the case of the randomForest method in the randomForest package.
@InProceedings{SWAN15p29,
author = {Catherine Ramirez and Meiyappan Nagappan and Mehdi Mirakhorli},
title = {Studying the Impact of Evolution in R Libraries on Software Engineering Research},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {29--30},
doi = {},
year = {2015},
}
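The paper studies R's randomForest; as a hedged, language-transposed illustration of the same version-drift check, the Python sketch below records the library version alongside a fingerprint of a model's output, so re-running under another version reveals behavioral change. scikit-learn here is a stand-in, not the authors' R scripts.

# Sketch: fingerprint a statistical routine's output per library version.
import hashlib
import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
predictions = model.predict(X)

fingerprint = hashlib.sha256(predictions.tobytes()).hexdigest()[:16]
print(sklearn.__version__, fingerprint)
# Re-running under a different library version and comparing the
# fingerprints shows whether the algorithm's behavior changed.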
Tools
Analyzing Dynamic Information with Spy and Roassal: An Experience Report
Alison Fernandez,
Diego Gabriel Nuñez Duran,
Alejandro Infante, and
Alexandre Bergel
(University of San Simon, Bolivia; University of Chile, Chile)
Dynamic analysis tools are seldom crafted by practitioners. This paper discusses the benefits of supporting practitioners in building their own ad-hoc tools and presents our experience in lowering the barrier to gathering dynamic information. The experience we present is driven by the combination of the Spy profiling framework and the Roassal visualization engine, two frameworks used in industry and academia. We conclude with two questions to discuss at the workshop.
@InProceedings{SWAN15p31,
author = {Alison Fernandez and Diego Gabriel Nuñez Duran and Alejandro Infante and Alexandre Bergel},
title = {Analyzing Dynamic Information with Spy and Roassal: An Experience Report},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {31--34},
doi = {},
year = {2015},
}
MARFCAT: Fast Code Analysis for Defects and Vulnerabilities
Serguei A. Mokhov,
Joey Paquet, and
Mourad Debbabi
(Concordia University, Canada)
We present a fast machine-learning approach to static code analysis and fingerprinting for weaknesses related to security, software engineering, and other domains, using the open-source MARF framework and its MARFCAT application. We used the data sets from NIST's SATE IV static analysis tool exposition workshop, which included popular open-source projects and large synthetic sets, as test cases. For detecting weak or vulnerable code, whether source or binary, on different platforms, the machine-learning approach proved fast and accurate for tasks where other tools are either much slower or have much lower recall of known vulnerabilities. We use signal processing techniques in our approach to accomplish the classification tasks. MARFCAT's design is independent of the language being analyzed, be it source code, bytecode, or binary.
@InProceedings{SWAN15p35,
author = {Serguei A. Mokhov and Joey Paquet and Mourad Debbabi},
title = {MARFCAT: Fast Code Analysis for Defects and Vulnerabilities},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {35--38},
doi = {},
year = {2015},
}
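A loose sketch of the signal-processing idea: treat a file's bytes as a 1-D signal, extract spectral features, and classify by nearest neighbor against known examples. The feature and distance choices below are assumptions for illustration; MARF's actual pipeline differs.

# Sketch: classify code as a byte-level signal via spectral features.
import numpy as np

def spectral_features(data: bytes, n: int = 64) -> np.ndarray:
    signal = np.frombuffer(data, dtype=np.uint8).astype(float)
    spectrum = np.abs(np.fft.rfft(signal))
    features = np.resize(spectrum, n)  # fixed-length feature vector
    return features / (np.linalg.norm(features) + 1e-9)

known = {  # hypothetical fingerprints of labeled code snippets
    "weak": spectral_features(b"strcpy(buf, user_input); system(cmd);"),
    "clean": spectral_features(b"int add(int a, int b) { return a + b; }"),
}

def classify(data: bytes) -> str:
    f = spectral_features(data)
    return min(known, key=lambda k: np.linalg.norm(known[k] - f))

print(classify(b"strcat(buf, argv[1]); system(buf);"))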
Collaboration
University-Industry Collaboration and Open Source Software (OSS) Dataset in Mining Software Repositories (MSR) Research
Ambika Tripathi,
Savita Dabral, and
Ashish Sureka
(IIIT Delhi, India; SARL, India)
Mining Software Repositories (MSR) is an applied, practice-oriented field aimed at solving real problems encountered by practitioners and bringing value to industry. We believe that empirical studies on both Open Source Software (OSS) and Closed or Proprietary Source Software (CSS/PSS) are required in MSR research to increase the generalizability or transferability of findings and to reduce external validity concerns (or threats). Furthermore, we believe that collaboration between university and industry is essential for achieving the stated goals and agenda of MSR research (such as deployment and technology transfer). We analyse the past five years of research papers published in the MSR series of conferences (2010-2014) and count the number of studies using solely OSS data, solely CSS data, or both OSS and CSS data. We also count the number of papers published by authors solely from universities, solely from industry, and from both university and industry. We present our findings, which indicate a lack of university-industry collaboration (measured using co-authorship in scientific publications) and a paucity of empirical studies on CSS/PSS data. Our analysis reveals that out of 187 studies over a period of 5 years, 90.9
@InProceedings{SWAN15p39,
author = {Ambika Tripathi and Savita Dabral and Ashish Sureka},
title = {University-Industry Collaboration and Open Source Software (OSS) Dataset in Mining Software Repositories (MSR) Research},
booktitle = {Proc.\ SWAN},
publisher = {IEEE},
pages = {39--40},
doi = {},
year = {2015},
}