4th ACM SIGSOFT International Workshop on Software Analytics (SWAN 2018),
November 5, 2018,
Lake Buena Vista, FL, USA
Message from the Chairs
Welcome to the 4th International Workshop on Software Analytics (SWAN 2018), co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE-18) and held on November 5, 2018 in Lake Buena Vista, Florida, USA. Here, researchers meet in an informal interactive forum to exchange ideas and experiences, streamline research on software analytics, identify common ground in their work, share lessons and challenges, and articulate a vision for the future of software analytics.
(No) Influence of Continuous Integration on the Commit Activity in GitHub Projects
Sebastian Baltes, Jascha Knack, Daniel Anastasiou, Ralf Tymann, and Stephan Diehl
(University of Trier, Germany)
A core goal of Continuous Integration (CI) is to make small incremental changes to software projects, which are integrated frequently into a mainline repository or branch. This paper presents an empirical study that investigates whether developers adjust their commit activity towards the above-mentioned goal after projects start using CI. We analyzed the commit and merge activity in 93 GitHub projects that introduced the hosted CI system Travis CI, but had been developed for at least one year before introducing CI. In our analysis, we found only one non-negligible effect, an increased merge ratio, meaning that merge commits made up a larger share of all commits after the projects started using Travis CI. This effect has also been reported in related work. However, we observed the same effect in a random sample of 60 GitHub projects not using CI. Thus, it is unlikely that the effect is caused by the introduction of CI alone. We conclude that: (1) in our sample of projects, the introduction of CI did not lead to major changes in developers' commit activity, and (2) it is important to compare the commit activity to a baseline before attributing an effect to a treatment that may not be the cause of the observed effect.
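The merge ratio the authors analyze can be illustrated with a minimal sketch: a commit with two or more parents is a merge commit, and the ratio is merge commits over all commits. The commit data below is hypothetical and stands in for the mined GitHub histories; the paper's actual mining pipeline is not shown.

```python
# Sketch: a commit's parent count distinguishes merge commits (>= 2 parents)
# from ordinary commits. The merge ratio is merge commits / all commits.
# The commit lists below are hypothetical, for illustration only.

def merge_ratio(commits):
    """commits: list of (sha, parent_shas) tuples."""
    merges = sum(1 for _, parents in commits if len(parents) >= 2)
    return merges / len(commits) if commits else 0.0

before_ci = [("a1", ["a0"]), ("a2", ["a1"]), ("a3", ["a2", "b1"]), ("a4", ["a3"])]
after_ci  = [("c1", ["a4"]), ("c2", ["c1", "d1"]), ("c3", ["c2", "d2"]), ("c4", ["c3"])]

print(merge_ratio(before_ci))  # 0.25
print(merge_ratio(after_ci))   # 0.5
```

The paper's point is that such a before/after difference must also be checked against projects that never adopted CI before it is attributed to CI.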
@InProceedings{SWAN18p1,
author = {Sebastian Baltes and Jascha Knack and Daniel Anastasiou and Ralf Tymann and Stephan Diehl},
title = {(No) Influence of Continuous Integration on the Commit Activity in GitHub Projects},
booktitle = {Proc.\ SWAN},
publisher = {ACM},
pages = {1--7},
doi = {10.1145/3278142.3278143},
year = {2018},
}
Characterizing the Influence of Continuous Integration: Empirical Results from 250+ Open Source and Proprietary Projects
Akond Rahman, Amritanshu Agrawal, Rahul Krishna, and Alexander Sobran
(North Carolina State University, USA; IBM, USA)
Continuous integration (CI) tools integrate code changes by automatically compiling, building, and executing test cases upon submission of code changes. Use of CI tools is increasingly popular, yet how proprietary projects reap the benefits of CI remains unknown. To investigate the influence of CI on software development, we analyze 150 open source software (OSS) projects and 123 proprietary projects. For OSS projects, we observe the expected benefits after CI adoption, e.g., improvements in bug and issue resolution. However, for the proprietary projects, we cannot make similar observations. Our findings indicate that adoption of CI alone might not be enough to improve the software development process. CI can be effective for software development if practitioners use CI's feedback mechanism efficiently, by applying the practice of making frequent commits. For our set of proprietary projects, we observe that practitioners commit less frequently and hence do not use CI effectively for obtaining feedback on the submitted code changes. Based on our findings, we recommend that industry practitioners adopt the best practices of CI, for example making frequent commits, to reap the benefits of CI tools.
@InProceedings{SWAN18p8,
author = {Akond Rahman and Amritanshu Agrawal and Rahul Krishna and Alexander Sobran},
title = {Characterizing the Influence of Continuous Integration: Empirical Results from 250+ Open Source and Proprietary Projects},
booktitle = {Proc.\ SWAN},
publisher = {ACM},
pages = {8--14},
doi = {10.1145/3278142.3278149},
year = {2018},
}
Facilitating Feasibility Analysis: The Pilot Defects Prediction Dataset Maker
Davide Falessi and Max Jason Moede
(California Polytechnic State University, USA)
Our industrial experience in institutionalizing defect prediction models in the software industry shows that the first step is to measure prediction metrics and defects to assess the feasibility of the tool, i.e., whether the accuracy of the defect prediction tool is higher than that of a random predictor. However, computing prediction metrics is time consuming and error prone. Thus, the feasibility analysis has a cost that requires some initial investment by potential clients. This initial investment acts as a barrier to convincing potential clients of the benefits of institutionalizing a software prediction model. To reduce this barrier, in this paper we present the Pilot Defects Prediction Dataset Maker (PDPDM), a desktop application for measuring metrics to use for defect prediction. PDPDM receives as input a software project's repository information and provides as output, in an easy and replicable way, a dataset containing a set of 17 well-defined product and process metrics that have been shown to be useful for defect prediction, such as size and smells. PDPDM avoids the use of outdated datasets, and it allows researchers and practitioners to create defect datasets without the need to write any code.
@InProceedings{SWAN18p15,
author = {Davide Falessi and Max Jason Moede},
title = {Facilitating Feasibility Analysis: The Pilot Defects Prediction Dataset Maker},
booktitle = {Proc.\ SWAN},
publisher = {ACM},
pages = {15--18},
doi = {10.1145/3278142.3278147},
year = {2018},
}
Is One Hyperparameter Optimizer Enough?
Huy Tu and Vivek Nair
(North Carolina State University, USA)
Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in empirical software engineering, there has not been much discussion on which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to a defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be “best” and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was, in 50% of cases, no better than using default configurations.
We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.
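Two of the optimizers the paper compares, grid search and random search, can be sketched with the standard library alone. The two-parameter objective below is a toy stand-in for a classifier's F-measure, and the parameter names (`c`, `gamma`) and ranges are illustrative assumptions, not taken from the paper.

```python
import itertools
import random

# Toy objective standing in for a learner's F-measure: peaks at c=3, gamma=0.1.
def score(params):
    c, gamma = params
    return -((c - 3.0) ** 2 + (gamma - 0.1) ** 2)

# Grid search: exhaustively score every combination of candidate values.
def grid_search(cs, gammas):
    return max(itertools.product(cs, gammas), key=score)

# Random search: score a fixed budget of uniformly sampled configurations.
def random_search(c_range, g_range, budget, seed=0):
    rng = random.Random(seed)
    candidates = [(rng.uniform(*c_range), rng.uniform(*g_range))
                  for _ in range(budget)]
    return max(candidates, key=score)

best_grid = grid_search([0.1, 1.0, 3.0, 10.0], [0.01, 0.1, 1.0])
best_rand = random_search((0.1, 10.0), (0.01, 1.0), budget=50)
print(best_grid)  # (3.0, 0.1)
```

Which of the two wins depends on the objective and the budget, which is consistent with the paper's observation that no single optimizer dominates.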
@InProceedings{SWAN18p19,
author = {Huy Tu and Vivek Nair},
title = {Is One Hyperparameter Optimizer Enough?},
booktitle = {Proc.\ SWAN},
publisher = {ACM},
pages = {19--25},
doi = {10.1145/3278142.3278145},
year = {2018},
}
Differentially-Private Software Analytics for Mobile Apps: Opportunities and Challenges
Hailong Zhang, Sufian Latif, Raef Bassily, and Atanas Rountev
(Ohio State University, USA)
Software analytics libraries are widely used in mobile applications, which raises many questions about trade-offs between privacy, utility, and practicality. A promising approach to address these questions is differential privacy. This algorithmic framework has emerged in the last decade as the foundation for numerous algorithms with strong privacy guarantees, and has recently been adopted by several projects in industry and government. This paper discusses the benefits and challenges of employing differential privacy in software analytics used in mobile apps. We aim to outline an initial research agenda that serves as the starting point for further discussions in the software engineering research community.
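The flavor of the guarantee can be illustrated with randomized response, the simplest local differential privacy mechanism. The analytics question, parameter values, and aggregation step below are illustrative assumptions, not taken from the paper.

```python
import random

# Sketch of randomized response for a hypothetical analytics question
# ("did the user tap feature X?"). Each user reports truthfully with
# probability 3/4 and lies with probability 1/4, which satisfies
# epsilon = ln(3) local differential privacy.

def randomized_response(truth, rng):
    return truth if rng.random() < 0.75 else (not truth)

def estimate_rate(reports):
    """Debias: E[report] = 0.75*p + 0.25*(1-p), so p = (mean - 0.25) / 0.5."""
    mean = sum(reports) / len(reports)
    return (mean - 0.25) / 0.5

rng = random.Random(42)
true_answers = [i % 10 < 3 for i in range(10000)]   # true rate: 0.3
reports = [randomized_response(t, rng) for t in true_answers]
print(round(estimate_rate(reports), 2))
```

No individual report reveals the user's true answer with certainty, yet the aggregate rate is still recoverable, which is exactly the privacy/utility trade-off the paper explores for mobile analytics.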
@InProceedings{SWAN18p26,
author = {Hailong Zhang and Sufian Latif and Raef Bassily and Atanas Rountev},
title = {Differentially-Private Software Analytics for Mobile Apps: Opportunities and Challenges},
booktitle = {Proc.\ SWAN},
publisher = {ACM},
pages = {26--29},
doi = {10.1145/3278142.3278148},
year = {2018},
}
Towards a Framework for Generating Program Dependence Graphs from Source Code
Victor J. Marin and Carlos R. Rivero
(Rochester Institute of Technology, USA)
Originally conceived for compiler optimization, the program dependence graph has become a widely used internal representation for tools in many software engineering tasks. The currently available frameworks for building program dependence graphs rely on compiled source code, which requires resolving dependencies. As a result, these frameworks cannot be applied for analyzing legacy codebases whose dependencies cannot be automatically resolved, or for large codebases in which resolving dependencies can be infeasible. In this paper, we present a framework for generating program dependence graphs from source code based on transition rules, and we describe lessons learned when implementing two different versions of the framework based on a grammar interpreter and an abstract syntax tree iterator, respectively.
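As a loose illustration of building dependence edges directly from source code without resolving any external dependencies (not the paper's framework, which uses transition rules over grammars and abstract syntax trees), a toy def-use pass using only Python's standard-library ast module might look like:

```python
import ast

# Toy data-dependence extraction: link each variable use (Load) to the line
# of its most recent assignment (Store), visiting source in execution order.
# This is a sketch of the general idea, not the paper's transition-rule system.
class DefUse(ast.NodeVisitor):
    def __init__(self):
        self.last_def = {}   # variable name -> line of most recent assignment
        self.edges = []      # (def_line, use_line) data-dependence pairs

    def visit_Assign(self, node):
        self.visit(node.value)       # visit the RHS (uses) before the targets
        for target in node.targets:
            self.visit(target)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.last_def:
            self.edges.append((self.last_def[node.id], node.lineno))
        elif isinstance(node.ctx, ast.Store):
            self.last_def[node.id] = node.lineno

example = """
a = 1
b = a + 1
a = b * 2
c = a + b
"""
finder = DefUse()
finder.visit(ast.parse(example))
print(finder.edges)  # [(2, 3), (3, 4), (4, 5), (3, 5)]
```

A full program dependence graph would also need control-dependence edges and handling of branches, loops, and calls, which is where the complexity the paper discusses comes in.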
@InProceedings{SWAN18p30,
author = {Victor J. Marin and Carlos R. Rivero},
title = {Towards a Framework for Generating Program Dependence Graphs from Source Code},
booktitle = {Proc.\ SWAN},
publisher = {ACM},
pages = {30--36},
doi = {10.1145/3278142.3278144},
year = {2018},
}