ESEC/FSE 2018 Workshops
26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018)

4th ACM SIGSOFT International Workshop on Software Analytics (SWAN 2018), November 5, 2018, Lake Buena Vista, FL, USA

SWAN 2018 – Advance Table of Contents


4th ACM SIGSOFT International Workshop on Software Analytics (SWAN 2018)

Title Page

Message from the Chairs
Welcome to the 4th International Workshop on Software Analytics (SWAN 2018), co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE-18) and held on November 5, 2018 in Lake Buena Vista, Florida, USA. Here, researchers meet in an informal, interactive forum to exchange ideas and experiences, streamline research on software analytics, identify common ground in their work, share lessons and challenges, and articulate a vision for the future of software analytics.
(No) Influence of Continuous Integration on the Commit Activity in GitHub Projects
Sebastian Baltes, Jascha Knack, Daniel Anastasiou, Ralf Tymann, and Stephan Diehl
(University of Trier, Germany)
A core goal of Continuous Integration (CI) is to make small incremental changes to software projects, which are integrated frequently into a mainline repository or branch. This paper presents an empirical study that investigates whether developers adjust their commit activity towards this goal after projects start using CI. We analyzed the commit and merge activity in 93 GitHub projects that introduced the hosted CI system Travis CI but had been developed for at least one year before introducing CI. In our analysis, we found only one non-negligible effect, an increased merge ratio, meaning that merge commits made up a larger share of all commits after the projects started using Travis CI. This effect has also been reported in related work. However, we observed the same effect in a random sample of 60 GitHub projects not using CI. Thus, it is unlikely that the effect is caused by the introduction of CI alone. We conclude that: (1) in our sample of projects, the introduction of CI did not lead to major changes in developers' commit activity, and (2) it is important to compare the commit activity to a baseline before attributing an effect to a treatment that may not be its cause.
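The merge ratio mentioned above can be computed directly from commit metadata. The sketch below is a minimal illustration, not the authors' tooling: it assumes each commit is represented by a hypothetical `Commit` record carrying only its number of parents (a commit with two or more parents counts as a merge), and that a project's history has already been split at the date CI was introduced.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    parent_count: int  # a merge commit has two or more parents


def merge_ratio(commits: list[Commit]) -> float:
    """Share of merge commits among all commits (0.0 for an empty list)."""
    if not commits:
        return 0.0
    merges = sum(1 for c in commits if c.parent_count >= 2)
    return merges / len(commits)


def ratio_shift(before: list[Commit], after: list[Commit]) -> float:
    """Change in merge ratio from the period before a cutoff date to the period after it."""
    return merge_ratio(after) - merge_ratio(before)


# Hypothetical commit histories for one project, split at the date CI was introduced.
before = [Commit("a1", 1), Commit("a2", 1), Commit("a3", 2), Commit("a4", 1)]
after = [Commit("b1", 2), Commit("b2", 1), Commit("b3", 2), Commit("b4", 1)]
print(merge_ratio(before), merge_ratio(after), ratio_shift(before, after))  # 0.25 0.5 0.25
```

In line with the abstract's second conclusion, the same `ratio_shift` would also be computed for projects without CI over comparable periods; a shift in the CI projects is only informative if it differs from that baseline.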
Characterizing the Influence of Continuous Integration: Empirical Results from 250+ Open Source and Proprietary Projects
Akond Rahman, Amritanshu Agrawal, Rahul Krishna, and Alexander Sobran
(North Carolina State University, USA; IBM, USA)
Continuous integration (CI) tools integrate code changes by automatically compiling, building, and executing test cases upon submission of code changes. CI tools are increasingly popular, yet it remains unknown how proprietary projects reap the benefits of CI. To investigate the influence of CI on software development, we analyze 150 open source software (OSS) projects and 123 proprietary projects. For OSS projects, we observe the expected benefits after CI adoption, e.g., improvements in bug and issue resolution. However, for the proprietary projects, we cannot make similar observations. Our findings indicate that adopting CI alone might not be enough to improve the software development process. CI can be effective for software development if practitioners use CI's feedback mechanism efficiently by applying the practice of making frequent commits. For our set of proprietary projects, we observe that practitioners commit less frequently and hence do not use CI effectively for obtaining feedback on the submitted code changes. Based on our findings, we recommend that industry practitioners adopt CI best practices, for example making frequent commits, to reap the benefits of CI tools.
Facilitating Feasibility Analysis: The Pilot Defects Prediction Dataset Maker
Davide Falessi and Max Moede
(California Polytechnic State University, USA)
Our industrial experience in institutionalizing defect prediction models in the software industry shows that the first step is to measure prediction metrics and defects to assess the feasibility of the tool, i.e., whether the accuracy of the defect prediction tool is higher than that of a random predictor. However, computing prediction metrics is time-consuming and error-prone. Thus, the feasibility analysis has a cost that requires an initial investment by potential clients. This initial investment acts as a barrier to convincing potential clients of the benefits of institutionalizing a software prediction model. To reduce this barrier, in this paper we present the Pilot Defects Prediction Dataset Maker (PDPDM), a desktop application for measuring metrics to use for defect prediction. PDPDM takes as input a software project's repository information and provides as output, in an easy and replicable way, a dataset containing 17 well-defined product and process metrics, such as size and smells, that have been shown to be useful for defect prediction. PDPDM avoids the use of outdated datasets, and it allows researchers and practitioners to create defect datasets without writing any code.
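The feasibility criterion described above, a defect predictor that beats a random predictor, can be illustrated in a few lines. This is a sketch under assumptions that do not come from the paper: predictions and actual defect labels are plain per-file booleans, and the random baseline flags files as defective at the same rate as the model does, so its expected accuracy is p*q + (1 - p)*(1 - q) for defect share p and flag rate q.

```python
def accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of files whose predicted label matches the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def random_baseline_accuracy(actual: list[bool], flag_rate: float) -> float:
    """Expected accuracy of a predictor that flags each file as defective
    with probability flag_rate, independently of the file itself."""
    p = sum(actual) / len(actual)  # share of files that are actually defective
    return p * flag_rate + (1 - p) * (1 - flag_rate)


def is_feasible(predicted: list[bool], actual: list[bool]) -> bool:
    """True if the model beats a random predictor flagging files at the model's own rate."""
    acc = accuracy(predicted, actual)  # also validates the inputs
    flag_rate = sum(predicted) / len(predicted)
    return acc > random_baseline_accuracy(actual, flag_rate)


# Hypothetical labels for ten files: two are defective, and the model flags one of them.
actual = [True, True] + [False] * 8
predicted = [True] + [False] * 9
print(accuracy(predicted, actual), is_feasible(predicted, actual))  # 0.9 True
```

Accuracy is used here only because the abstract phrases the criterion that way; when few files are defective, a measure such as F-measure may be more informative, and the same comparison against a random baseline applies.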
Is One Hyperparameter Optimizer Enough?
Huy Tu and Vivek Nair
(North Carolina State University, USA)

Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While tuning is widely applied in empirical software engineering, there has not been much discussion on which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applies a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to a defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be “best” and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was no better than using default configurations in 50% of cases.

We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.
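To make the difference between two of the optimizers named above concrete, here is a minimal sketch of grid search and random search; differential evolution and Bayesian optimization are not shown. Everything in it is hypothetical: `SPACE`, `DEFAULTS`, and `toy_score` stand in for a real learner's parameters and a cross-validated F-measure, which in practice would be noisy and expensive to evaluate.

```python
import itertools
import random

# Hypothetical search space and defaults for a tree-based defect predictor.
SPACE = {"max_depth": [2, 4, 8, 16], "min_samples_leaf": [1, 5, 10]}
DEFAULTS = {"max_depth": 8, "min_samples_leaf": 1}


def toy_score(params: dict) -> float:
    """Stand-in for a cross-validated F-measure; it peaks at depth 4, leaf size 5."""
    return (1.0
            - 0.01 * abs(params["max_depth"] - 4)
            - 0.02 * abs(params["min_samples_leaf"] - 5))


def grid_search(space: dict, score) -> tuple[dict, float]:
    """Evaluate every combination in the space and keep the best one."""
    keys = list(space)
    best = None
    for values in itertools.product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best


def random_search(space: dict, score, n_trials: int = 5, seed: int = 0) -> tuple[dict, float]:
    """Evaluate n_trials randomly sampled combinations and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in space.items()}
        s = score(params)
        if best is None or s > best[1]:
            best = (params, s)
    return best


print("default:", toy_score(DEFAULTS))
print("grid:   ", grid_search(SPACE, toy_score))
print("random: ", random_search(SPACE, toy_score))
```

Grid search is exhaustive but its cost grows multiplicatively with each added parameter, while random search caps the budget at `n_trials` and may miss the optimum. On a real dataset with a flat or noisy objective, the default configuration can score as well as either tuned result, which is the situation the abstract reports for F-measure in half of the cases.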


Differentially-Private Software Analytics for Mobile Apps: Opportunities and Challenges
Hailong Zhang, Sufian Latif, Raef Bassily, and Atanas Rountev
(Ohio State University, USA)
Software analytics libraries are widely used in mobile applications, which raises many questions about trade-offs between privacy, utility, and practicality. A promising approach to address these questions is differential privacy. This algorithmic framework has emerged in the last decade as the foundation for numerous algorithms with strong privacy guarantees, and has recently been adopted by several projects in industry and government. This paper discusses the benefits and challenges of employing differential privacy in software analytics used in mobile apps. We aim to outline an initial research agenda that serves as the starting point for further discussions in the software engineering research community.
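The canonical building block of differential privacy is the Laplace mechanism: add noise drawn from a Laplace distribution with scale sensitivity/epsilon to a numeric query result. The sketch below applies it to a usage count, assuming a central model in which a trusted analytics server holds the exact count; it does not cover local-model variants in which each device perturbs its own data before upload, and the function names and the example query are illustrative only.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    `sensitivity` is the most one user's data can change the count
    (1 if each user contributes at most once)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query: how many users opened a given screen today.
rng = random.Random(7)
print(private_count(1234, epsilon=0.5, rng=rng))
```

Smaller `epsilon` means stronger privacy and larger noise, which is one concrete form of the privacy/utility trade-off the abstract raises.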
A Comparative Study of FAQs for Software Development
Mathias Ellmann and Irmo Timmann
(University of Hamburg, Germany)
Developers use FAQs (Frequently Asked Questions) to access and share knowledge about software libraries, APIs, and platforms. This paper studies 2,660 questions from 43 FAQ websites. We analyzed accessibility metrics, such as the number of steps from the main documentation page, tagging, and multilingualism, as well as structure and readability metrics, such as code-to-text ratio, number of links, and Flesch Reading-Ease. In addition, we compared these FAQs to 69,548 Stack Overflow (SO) posts that cover the same topics and have been posted by developers at least twice (i.e., duplicates). Our results reveal that different software vendors give different importance to their FAQs, e.g., by investing more or less effort in structuring and presenting them. We found that the studied FAQs include more references (e.g., to corresponding API documentation) and are more verbose and difficult to read than the corresponding SO duplicates. We also found that FAQs cover additional topics compared to the corresponding duplicate posts.
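The Flesch Reading-Ease metric named above is a fixed formula over word, sentence, and syllable counts; higher scores indicate easier text. The sketch below takes the counts as inputs, since syllable-counting heuristics vary and the abstract does not say which tool the study used; the example counts are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading-Ease: higher scores mean easier text.

    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical counts for one FAQ answer: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))  # 59.635
```

Longer sentences and more syllables per word both lower the score, which is how "more verbose and difficult to read" would show up in this metric.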
Towards a Framework for Generating Program Dependence Graphs from Source Code
Victor J. Marin and Carlos R. Rivero
(Rochester Institute of Technology, USA)
Originally conceived for compiler optimization, the program dependence graph has become a widely used internal representation for tools in many software engineering tasks. The currently available frameworks for building program dependence graphs rely on compiled source code, which requires resolving dependencies. As a result, these frameworks cannot be applied for analyzing legacy codebases whose dependencies cannot be automatically resolved, or for large codebases in which resolving dependencies can be infeasible. In this paper, we present a framework for generating program dependence graphs from source code based on transition rules, and we describe lessons learned when implementing two different versions of the framework based on a grammar interpreter and an abstract syntax tree iterator, respectively.
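A program dependence graph combines data dependences (a statement reads a value another statement wrote) with control dependences (a statement's execution depends on a branch). As a much-simplified illustration, the sketch below derives only the data-dependence edges for straight-line code with no branches or loops. It is not the authors' transition-rule framework: the `Stmt` record and its def/use sets are hypothetical and assume a front end has already extracted which variables each statement writes and reads.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    label: str
    defs: set[str] = field(default_factory=set)  # variables written
    uses: set[str] = field(default_factory=set)  # variables read


def data_dependences(stmts: list[Stmt]) -> set[tuple[str, str]]:
    """Data-dependence edges (defining stmt, using stmt) for straight-line code.

    Uses are resolved before the statement's own definitions, so `x = x + 1`
    depends on the previous definition of x, not on itself."""
    last_def: dict[str, str] = {}
    edges: set[tuple[str, str]] = set()
    for s in stmts:
        for var in s.uses:
            if var in last_def:
                edges.add((last_def[var], s.label))
        for var in s.defs:
            last_def[var] = s.label
    return edges


# x = 1; y = x + 2; x = y * 3; print(x, y)
program = [
    Stmt("s1", defs={"x"}),
    Stmt("s2", defs={"y"}, uses={"x"}),
    Stmt("s3", defs={"x"}, uses={"y"}),
    Stmt("s4", uses={"x", "y"}),
]
print(sorted(data_dependences(program)))
# [('s1', 's2'), ('s2', 's3'), ('s2', 's4'), ('s3', 's4')]
```

Note that this works on source-level statements without compiling anything or resolving external dependencies, which is the property the abstract motivates; handling branches and loops would additionally require control-flow information.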
