2015 IEEE 1st International Workshop on Software Analytics (SWAN), March 2, 2015, Montréal, Canada

SWAN 2015 – Proceedings

Frontmatter

Title Page


Foreword
Welcome to SWAN 2015, the first International Workshop on Software Analytics, co-located with the 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER 2015) and held on March 2, 2015, in Montréal, Canada.

Bugs, Defects, and Crashes

Toward a Learned Project-Specific Fault Taxonomy: Application of Software Analytics
Billy Kidwell and Jane Huffman Hayes
(University of Kentucky, USA)
This position paper argues that fault classification provides vital information for software analytics, and that machine learning techniques such as clustering can be applied to learn a project- (or organization-) specific fault taxonomy. Anecdotal evidence of this position is presented as well as possible areas of research for moving toward the posited goal.

Challenges and Issues of Mining Crash Reports
Le An and Foutse Khomh
(Polytechnique Montréal, Canada)
Automatic crash reporting tools built in many software systems allow software practitioners to understand the origin of field crashes and help them prioritise field crashes or bugs, locate erroneous files, and/or predict bugs and crash occurrences in subsequent versions of the software systems. In this paper, after illustrating the structure of crash reports in Mozilla, we discuss some techniques for mining information from crash reports, and highlight the challenges and issues of these techniques. Our aim is to raise the awareness of the research community about issues that may bias research results obtained from crash reports and provide some guidelines to address certain challenges related to mining crash reports.

Bug Report Recommendation for Code Inspection
Shin Fujiwara, Hideaki Hata, Akito Monden, and Kenichi Matsumoto
(NAIST, Japan)
Large software projects such as Mozilla Firefox and Eclipse have more than ten thousand bug reports that have been reported but left unresolved. To make use of this large number of unresolved bug reports and accelerate bug detection and removal, we propose a way to recommend to programmers a bug report that is likely to contain failure descriptions related to the source file being inspected. We employ the vector space model (VSM) to rank bug reports by relevance to a given source file. An experiment using data from three open source software projects showed that the accuracy of the recommendations ranged from 21.74% to 60.05%, measured as the percentage of recommendations that contained relevant bug reports in a top-10 recommended list.
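The ranking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain TF-IDF term weighting with cosine similarity, a naive lowercase word tokenizer, and hypothetical names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_bug_reports`) and report identifiers chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a naive lowercase word tokenizer; the paper may preprocess differently.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_bug_reports(source_text, reports, top_k=10):
    # Treat the source file as the query and rank report IDs by similarity.
    vectors = tfidf_vectors([source_text] + list(reports.values()))
    query, report_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query, v), rid) for rid, v in zip(reports, report_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in scored[:top_k]]


# Hypothetical data: identifiers extracted from a source file, and three open reports.
source = "bookmark save toolbar render bookmark"
reports = {
    "BUG-1": "crash when saving a bookmark from the toolbar",
    "BUG-2": "printer dialog freezes on startup",
    "BUG-3": "bookmark toolbar does not render",
}
ranking = rank_bug_reports(source, reports)
```

Here `ranking` lists the report sharing the most distinctive terms with the file first. A real pipeline would likely also split identifiers such as camelCase names and apply stemming, which this sketch omits.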

Testing

Test Case Analytics: Mining Test Case Traces to Improve Risk-Driven Testing
Tanzeem Bin Noor and Hadi Hemmati
(University of Manitoba, Canada)
In risk-driven testing, test cases are generated and/or prioritized based on different risk measures. For example, the most basic risk measure analyzes the history of the software and assigns higher risk to the test cases that detected bugs in the past. However, in practice, a test case may not be exactly the same as a previously failed test, but quite similar. In this study, we define a new risk measure that assigns a risk factor to a test case if it is similar to a failing test case from history. The similarity is defined based on the execution traces of the test cases, where we define each test case as a sequence of method calls. We have evaluated our new risk measure by comparing it to a traditional risk measure (where the risk would be increased only if the very same test case, not a similar one, failed in the past). The results of our study, in the context of test case prioritization, on two open source projects show that our new risk measure is far more effective at identifying failing test cases than the traditional risk measure.
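A similarity-based risk measure of this kind can be sketched as follows. The abstract does not state which similarity function is used, so this sketch assumes a normalized edit distance over method-call sequences; the function names and the example traces are hypothetical.

```python
def edit_distance(a, b):
    # Levenshtein distance between two sequences of method names.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    # Assumed similarity: 1 minus the edit distance normalized by the longer trace.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def risk(trace, failed_traces):
    # Risk of a test is its similarity to the closest previously failing trace.
    return max((similarity(trace, f) for f in failed_traces), default=0.0)


def prioritize(tests, failed_traces):
    # Order test names from highest to lowest risk; ties are broken by name.
    return sorted(tests, key=lambda name: (-risk(tests[name], failed_traces), name))


# Hypothetical history: one failing trace, and three candidate tests.
failed = [["open", "parse", "validate", "close"]]
tests = {
    "t_similar": ["open", "parse", "close"],
    "t_unrelated": ["login", "logout"],
    "t_same": ["open", "parse", "validate", "close"],
}
order = prioritize(tests, failed)
```

Under the traditional measure only `t_same` would receive any risk; with the similarity-based variant, `t_similar` is also ranked ahead of the unrelated test.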

Testing Analytics on Software Variability
Hareton K. N. Leung and Kim Man Lui
(Hong Kong Polytechnic University, China)
Software testing is a tool-driven process. However, there are many situations in which different hardware/software components are tightly integrated. In such cases, system integration testing has to be executed manually to evaluate the system's compliance with its specified requirements and performance. There can be many combinations of changes, as different versions of hardware and software components may be upgraded and/or substituted. Occasionally, some software components may even be replaced by clones. After each component change, the whole system must be re-tested to ensure proper system behavior. For better utilization of resources, there is a need to prioritize past test cases when testing the newly integrated systems. We propose a way to facilitate the use of historical testing records of the previous systems so that a test-case portfolio can be developed, with the intent of making the best use of testing resources across the same integrated product family. As the proposed framework does not consider much of the internal software complexity, the implementation costs are relatively low.

Source Code, Branches, and Libraries

How We Resolve Conflict: An Empirical Study of Method-Level Conflict Resolution
Ryohei Yuzuki, Hideaki Hata, and Kenichi Matsumoto
(NAIST, Japan)
Context: Branching and merging are common activities in large-scale software development projects. Isolated development with branching enables developers to focus their effort on their specific tasks without wasting time on the problems caused by other developers’ changes. After the tasks in branches are completed, the branches should be integrated into common branches by merging. When conflicts occur in merging, developers need to resolve them, which is troublesome. Goal: To support conflict resolution in merging, we aim to understand how conflicts are resolved in practice through a large-scale study. Method: We present techniques for identifying conflicts and detecting conflict resolution at the method level. Result: From the analysis of 10 OSS projects written in Java, we found that (1) 44% (339/779) of conflicts are caused by concurrent changes to the same positions in methods, 48% (375/779) by deleting methods, and 8% (65/779) by renaming methods, and that (2) 99% (771/779) of conflicts are resolved by adopting one method directly. Conclusions: Our results suggest that most conflicts are resolved in a simple way. One direction for our future work is developing methods for supporting conflict resolution.

M3: A General Model for Code Analytics in Rascal
Bas Basten, Mark Hills, Paul Klint, Davy Landman, Ashim Shahi, Michael J. Steindorfer, and Jurgen J. Vinju
(CWI, Netherlands; East Carolina University, USA)
This short paper introduces M3, a simple and extensible model for capturing facts about source code for future analysis. M3 is a core part of the standard library of the Rascal metaprogramming language. We motivate it, position it relative to related work, and detail the key design aspects.

Studying the Impact of Evolution in R Libraries on Software Engineering Research
Catherine Ramirez, Meiyappan Nagappan, and Mehdi Mirakhorli
(Rochester Institute of Technology, USA)
Empirical software engineering has become an integral and important part of software engineering research in both academia and industry. Every year several new theories are empirically validated by mining and analyzing historical data from open source and closed source projects. Researchers rely on statistical libraries in tools like R, Weka, SAS, SPSS, and Matlab for their analysis. However, these libraries, like any software library, undergo periodic maintenance. Such maintenance can be to improve performance, but can also be to alter the core algorithms behind the library. If the core algorithms are indeed changed, then the empirical results that were compiled with the previous versions may no longer be current. However, this problem exists only if (a) statistical libraries are constantly edited and (b) the results they produce differ from one version to another. Hence in this paper, we first explore whether the above two conditions hold true for one library in the statistical package R. We find that both conditions are true in the case of the randomForest method in the randomForest package.

Tools

Analyzing Dynamic Information with Spy and Roassal: An Experience Report
Alison Fernandez, Diego Gabriel Nuñez Duran, Alejandro Infante, and Alexandre Bergel
(University of San Simon, Bolivia; University of Chile, Chile)
Dynamic analysis tools are seldom crafted by practitioners. This paper discusses the benefits of supporting practitioners in building their own ad hoc tools and presents our experience in lowering the barrier to gathering dynamic information. The experience we present is driven by the combination of the Spy profiling framework and the Roassal visualization engine, two frameworks used in industry and academia. We conclude with two questions to discuss at the workshop.

MARFCAT: Fast Code Analysis for Defects and Vulnerabilities
Serguei A. Mokhov, Joey Paquet, and Mourad Debbabi
(Concordia University, Canada)
We present a fast machine-learning approach to static code analysis and fingerprinting for weaknesses related to security, software engineering, and others using the open-source MARF framework and its MARFCAT application. We used the data sets of NIST's SATE IV static analysis tool exposition workshop, which included popular open-source projects and large synthetic sets as test cases. For detecting weak or vulnerable code, whether source or binary, on different platforms, the machine learning approach proved to be fast and accurate for tasks where other tools are either much slower or have much lower recall of known vulnerabilities. We use signal processing techniques in our approach to accomplish the classification tasks. MARFCAT's design is independent of the language being analyzed, whether source code, bytecode, or binary.

Collaboration

University-Industry Collaboration and Open Source Software (OSS) Dataset in Mining Software Repositories (MSR) Research
Ambika Tripathi, Savita Dabral, and Ashish Sureka
(IIIT Delhi, India; SARL, India)
Mining Software Repositories (MSR) is an applied and practice-oriented field aimed at solving real problems encountered by practitioners and bringing value to Industry. We believe that empirical studies on both Open Source Software (OSS) and Closed or Proprietary Source Software (CSS/PSS) are required in MSR research to increase the generalizability or transferability of findings and reduce threats to external validity. Furthermore, we believe that collaboration between University and Industry is essential to achieving the stated goals and agenda of MSR research (such as deployment and technology transfer). We analyse the past five years of research papers published in the MSR series of conferences (2010-2014) and count the number of studies using solely OSS data, solely CSS data, or both OSS and CSS data. We also count the number of papers published by authors solely from Universities, solely from Industry, and from both University and Industry. We present our findings, which indicate a lack of University-Industry collaboration (measured using co-authorship in scientific publications) and a paucity of empirical studies on CSS/PSS data. Our analysis reveals that out of 187 studies over a period of 5 years, 90.9