33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018),
September 3–7, 2018,
Montpellier, France
Frontmatter
Message from the Chairs
It is our great pleasure and honor to welcome everyone to the 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2018). The ASE conference series is the premier research forum for automated software engineering. Each year it brings together researchers and practitioners from academia and industry to discuss foundations, techniques, and tools for automated analysis, design, implementation, testing, and maintenance of software systems.
ASE 2018 Conference Organization
Organizing Committee
General Chair
Marianne Huchard University of Montpellier, France
Program Co-Chairs
Gordon Fraser University of Passau, Germany
Christian Kästner Carnegie Mellon University, USA
Keynotes
Software Heritage: Collecting, Preserving, and Sharing All Our Source Code (Keynote)
Roberto Di Cosmo
(Inria, France; University Paris Diderot, France)
Software Heritage is a non-profit initiative whose ambitious goal is to collect, preserve, and share the source code of all software ever written, with its full development history, building a universal source code knowledge base.
Software Heritage addresses a variety of needs: preserving our scientific and technological knowledge, enabling better software development and reuse for society and industry, fostering better science, and building an essential infrastructure for large scale, reproducible software studies.
We have already collected over 4 billion unique source files from over 80 million repositories, and organised them into a giant Merkle graph with full deduplication across all repositories.
This allows us to cope with the growth of collaborative software development, and provides a unique vantage point for observing its evolution.
In this talk, we will highlight the new challenges and opportunities that Software Heritage brings up.
@InProceedings{ASE18p1,
author = {Roberto Di Cosmo},
title = {Software Heritage: Collecting, Preserving, and Sharing All Our Source Code (Keynote)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {1--2},
doi = {10.1145/3238147.3241985},
year = {2018},
}
Publisher's Version
Automated Requirements Engineering Challenges with Examples from Small Unmanned Aerial Systems (Keynote)
Jane Cleland-Huang
(University of Notre Dame, USA)
Requirements Engineering includes various activities aimed at discovering, analyzing, validating, evolving, and managing software and systems requirements. Many of these activities are human-facing, effort-intensive, and sometimes error-prone. They could benefit greatly from cutting-edge advances in automation. However, the software engineering community has primarily focused on automating other aspects of the development process, such as testing, code analytics, and mining software repositories. As a result, advances in software analytics have had only superficial impact on advancing the state of the art and practice in the field of requirements engineering. Two primary inhibitors are the lack of publicly available datasets and poorly publicized, industry-relevant open requirements analytics challenges. To empower the Automated Software Engineering community to tackle open Requirements Engineering challenges, the talk will describe the rapidly evolving landscape of requirements engineering, clearly articulate open challenges, draw upon examples from an ongoing, agile, safety-critical project in the domain of Unmanned Aerial Vehicles, and introduce Dronology as a new community dataset.
@InProceedings{ASE18p3,
author = {Jane Cleland-Huang},
title = {Automated Requirements Engineering Challenges with Examples from Small Unmanned Aerial Systems (Keynote)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {3--3},
doi = {10.1145/3238147.3241986},
year = {2018},
}
Publisher's Version
Implementation Science for Software Engineering: Bridging the Gap between Research and Practice (Keynote)
Lauren Herckis
(Carnegie Mellon University, USA)
Software engineering research rarely impacts practitioners in the field. A desire to facilitate transfer alone is not sufficient to bridge the gap between research and practice. Fields from medicine to education have acknowledged a similar challenge over the past 25 years. An empirical approach to the translation of research into evidence-based practice has emerged from the resulting discussion. Implementation science has fundamentally changed the way that biomedical research is conducted, and has revolutionized the daily practice of doctors, social workers, epidemiologists, early childhood educators, and more. In this talk we will explore the methods, frameworks, and practices of implementation science and their application to novel disciplines, including software engineering research. We will close by proposing some directions for future software engineering research to facilitate transfer.
@InProceedings{ASE18p4,
author = {Lauren Herckis},
title = {Implementation Science for Software Engineering: Bridging the Gap between Research and Practice (Keynote)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {4--4},
doi = {10.1145/3238147.3264581},
year = {2018},
}
Publisher's Version
The Need for Context in Software Engineering (IEEE CS Harlan Mills Award Keynote)
Gail C. Murphy
(University of British Columbia, Canada)
The development of a software system requires the orchestration of many different people using many different tools. Despite the need for a developer who is performing a task to understand the context in which that task is undertaken, the tools we imagine, build and provide to support software developers are typically isolated. Instead of the tools helping a developer work within the appropriate context, it is the developer who must bring the context to the tools.
In this talk, I will argue that the lack of context in the software engineering tools we build limits the effectiveness of developers and of our software development practices.
@InProceedings{ASE18p5,
author = {Gail C. Murphy},
title = {The Need for Context in Software Engineering (IEEE CS Harlan Mills Award Keynote)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {5--5},
doi = {10.1145/3238147.3241987},
year = {2018},
}
Publisher's Version
Technical Research Papers
Performance
On Adopting Linters to Deal with Performance Concerns in Android Apps
Sarra Habchi, Xavier Blanc, and Romain Rouvoy
(Inria, France; University of Lille, France; University of Bordeaux, France)
With millions of applications (apps) distributed through mobile markets, engaging and retaining end-users challenges Android developers to deliver a nearly perfect user experience. As mobile apps run on resource-limited devices, performance is a critical criterion for the quality of experience. Therefore, developers are expected to pay close attention to limiting performance bad practices. On the one hand, many studies have already identified such performance bad practices and shown that they can heavily impact app performance. Hence, many static analysers, a.k.a. linters, have been proposed to detect and fix these bad practices. On the other hand, other studies have shown that Android developers tend to deal with performance reactively and rarely build on linters to detect and fix performance bad practices. In this paper, we therefore perform a qualitative study to investigate this gap between the research and development communities. In particular, we interviewed 14 experienced Android developers to identify the perceived benefits and constraints of using linters to identify performance bad practices in Android apps. Our observations can have a direct impact on developers and the research community. Specifically, we describe why and how developers leverage static source code analysers to improve the performance of their apps. On top of that, we bring to light important challenges faced by developers when it comes to adopting static analysis for performance purposes.
@InProceedings{ASE18p6,
author = {Sarra Habchi and Xavier Blanc and Romain Rouvoy},
title = {On Adopting Linters to Deal with Performance Concerns in Android Apps},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {6--16},
doi = {10.1145/3238147.3238197},
year = {2018},
}
Publisher's Version
Published Artifact
Artifacts Available
PerfLearner: Learning from Bug Reports to Understand and Generate Performance Test Frames
Xue Han, Tingting Yu, and
David Lo
(University of Kentucky, USA; Singapore Management University, Singapore)
Software performance is important for ensuring the quality of software products. Performance bugs, defined as programming errors that cause significant performance degradation, can lead to slow systems and poor user experience. While there has been some research on automated performance testing, such as test case generation, the main idea is to select workload values that increase program execution times. These techniques often assume that the initial test cases have the right combination of input parameters and focus on evolving the values of certain input parameters. However, such an assumption may not hold for highly configurable real-world applications, in which the number of input parameter combinations can be very large. In this paper, we manually analyze 300 bug reports from three large open source projects: Apache HTTP Server, MySQL, and Mozilla Firefox. We found that 1) exposing performance bugs often requires combinations of multiple input parameters, and 2) certain input parameters are frequently involved in exposing performance bugs. Guided by these findings, we designed and evaluated an automated approach, PerfLearner, to extract execution commands and input parameters from descriptions of performance bug reports and use them to generate test frames for guiding actual performance test case generation.
@InProceedings{ASE18p17,
author = {Xue Han and Tingting Yu and David Lo},
title = {PerfLearner: Learning from Bug Reports to Understand and Generate Performance Test Frames},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {17--28},
doi = {10.1145/3238147.3238204},
year = {2018},
}
Publisher's Version
AutoConfig: Automatic Configuration Tuning for Distributed Message Systems
Liang Bao, Xin Liu, Ziheng Xu, and Baoyin Fang
(Xidian University, China; University of California at Davis, USA)
Distributed message systems (DMSs) serve as the communication backbone for many real-time streaming data processing applications. To support the vast diversity of such applications, DMSs provide a large number of parameters to configure. However, configuring these parameters well for better performance overwhelms most users. Although many automatic configuration approaches have been proposed to address this issue, critical challenges remain: 1) training a robust performance prediction model from a limited number of samples, and 2) searching a high-dimensional parameter space efficiently within a time constraint. In this paper, we propose AutoConfig -- an automatic configuration system that can optimize producer-side throughput on DMSs. AutoConfig constructs a novel comparison-based model (CBM) that is more robust than the prediction-based model (PBM) used by previous learning-based approaches. Furthermore, AutoConfig uses a weighted Latin hypercube sampling (wLHS) approach to select a set of samples that provides better coverage of the high-dimensional parameter space. wLHS allows AutoConfig to search for more promising configurations using the trained CBM. We have implemented AutoConfig on the Kafka platform and evaluated it using eight different testing scenarios deployed on a public cloud. Experimental results show that our CBM obtains better results than a PBM based on the same random-forest model. Furthermore, AutoConfig outperforms default configurations by 215.40% on average, and five state-of-the-art configuration algorithms by 7.21%-64.56%.
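To make the CBM/PBM distinction concrete, the following sketch contrasts the two model interfaces and shows how a pairwise comparator is enough to drive a configuration search; the interfaces and names are hypothetical illustrations, not AutoConfig's actual implementation.

import java.util.List;

// Prediction-based model (PBM): estimates absolute performance of one configuration.
interface PredictionModel {
    double predictThroughput(double[] configuration);
}

// Comparison-based model (CBM): only judges which of two configurations performs better,
// which is all a search over candidate configurations actually needs.
interface ComparisonModel {
    boolean firstIsBetter(double[] configA, double[] configB);
}

final class ConfigSearch {
    // Keep the best candidate seen so far using pairwise comparisons alone.
    // Assumes a non-empty list of candidate configurations.
    static double[] pickBest(ComparisonModel cbm, List<double[]> candidates) {
        double[] best = candidates.get(0);
        for (double[] candidate : candidates.subList(1, candidates.size())) {
            if (cbm.firstIsBetter(candidate, best)) {
                best = candidate;
            }
        }
        return best;
    }
}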
@InProceedings{ASE18p29,
author = {Liang Bao and Xin Liu and Ziheng Xu and Baoyin Fang},
title = {AutoConfig: Automatic Configuration Tuning for Distributed Message Systems},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {29--40},
doi = {10.1145/3238147.3238175},
year = {2018},
}
Publisher's Version
Is This Class Thread-Safe? Inferring Documentation using Graph-Based Learning
Andrew Habib and
Michael Pradel
(TU Darmstadt, Germany)
Thread-safe classes are pervasive in concurrent, object-oriented software. However, many classes lack documentation regarding their safety guarantees under multi-threaded usage. This lack of documentation forces developers who use a class in a concurrent program to either carefully inspect the implementation of the class, to conservatively synchronize all accesses to it, or to optimistically assume that the class is thread-safe. To overcome the lack of documentation, we present TSFinder, an approach to automatically classify classes as supposedly thread-safe or thread-unsafe. The key idea is to combine a lightweight static analysis that extracts a graph representation from classes with a graph-based classifier. After training the classifier with classes known to be thread-safe and thread-unsafe, it achieves an accuracy of 94.5% on previously unseen classes, enabling the approach to infer thread safety documentation with high confidence. The classifier takes about 3 seconds per class, i.e., it is efficient enough to infer documentation for many classes.
@InProceedings{ASE18p41,
author = {Andrew Habib and Michael Pradel},
title = {Is This Class Thread-Safe? Inferring Documentation using Graph-Based Learning},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {41--52},
doi = {10.1145/3238147.3238212},
year = {2018},
}
Publisher's Version
Info
Testing Studies
A Large-Scale Study of Test Coverage Evolution
Michael Hilton, Jonathan Bell, and
Darko Marinov
(Carnegie Mellon University, USA; George Mason University, USA; University of Illinois at Urbana-Champaign, USA)
Statement coverage is commonly used as a measure of test suite quality. Coverage is often used as a part of a code review process: if a patch decreases overall coverage, or is itself not covered, then the patch is scrutinized more closely. Traditional studies of how coverage changes with code evolution have examined the overall coverage of the entire program, and more recent work directly examines the coverage of patches (changed statements). We present an evaluation much larger than prior studies and moreover consider a new, important kind of change --- coverage changes of unchanged statements. We present a large-scale evaluation of code coverage evolution over 7,816 builds of 47 projects written in popular languages including Java, Python, and Scala. We find that in large, mature projects, simply measuring the change to statement coverage does not capture the nuances of code evolution. Going beyond considering statement coverage as a simple ratio, we examine how the set of statements covered evolves between project revisions. We present and study new ways to assess the impact of a patch on a project's test suite quality that both separate coverage of the patch from coverage of the non-patch and separate changes in coverage from changes in the set of statements covered.
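A minimal sketch of the set-based view described above, assuming hypothetical statement identifiers and covered-statement sets for two revisions; it is not the study's tooling, but it shows how coverage changes of unchanged statements become visible once coverage is treated as a set rather than a ratio.

import java.util.HashSet;
import java.util.Set;

final class CoverageDiff {
    // Report coverage changes between two revisions, split by whether the statement was patched.
    static void report(Set<String> coveredBefore, Set<String> coveredAfter,
                       Set<String> patchedStatements) {
        Set<String> newlyCovered = minus(coveredAfter, coveredBefore);
        Set<String> newlyUncovered = minus(coveredBefore, coveredAfter);

        // Coverage changes of unchanged statements are invisible to an overall ratio,
        // yet they indicate that the patch altered what the existing tests exercise.
        System.out.println("patched & newly covered:     " + intersect(newlyCovered, patchedStatements).size());
        System.out.println("unchanged & newly covered:   " + minus(newlyCovered, patchedStatements).size());
        System.out.println("unchanged & newly uncovered: " + minus(newlyUncovered, patchedStatements).size());
    }

    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a); r.removeAll(b); return r;
    }
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a); r.retainAll(b); return r;
    }
}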
@InProceedings{ASE18p53,
author = {Michael Hilton and Jonathan Bell and Darko Marinov},
title = {A Large-Scale Study of Test Coverage Evolution},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {53--63},
doi = {10.1145/3238147.3238183},
year = {2018},
}
Publisher's Version
Effectiveness and Challenges in Generating Concurrent Tests for Thread-Safe Classes
Valerio Terragni and
Mauro Pezzè
(University of Lugano, Switzerland)
Developing correct and efficient concurrent programs is difficult and error-prone, due to the complexity of thread synchronization. Often, developers alleviate this problem by relying on thread-safe classes, which encapsulate most synchronization-related challenges. Thus, testing such classes is crucial to ensure the reliability of the concurrency aspects of programs. Some recent techniques and corresponding tools tackle the problem of testing thread-safe classes by automatically generating concurrent tests. In this paper, we present a comprehensive study of the state-of-the-art techniques and an independent empirical evaluation of the publicly available tools. We conducted the study by executing all tools on the JaConTeBe benchmark that contains 47 well-documented concurrency faults. Our results show that 8 out of 47 faults (17%) were detected by at least one tool. By studying the issues of the tools and the generated tests, we derive insights to guide future research on improving the effectiveness of automated concurrent test generation.
@InProceedings{ASE18p64,
author = {Valerio Terragni and Mauro Pezzè},
title = {Effectiveness and Challenges in Generating Concurrent Tests for Thread-Safe Classes},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {64--75},
doi = {10.1145/3238147.3238224},
year = {2018},
}
Publisher's Version
Build and Test Automation
Scalable Incremental Building with Dynamic Task Dependencies
Gabriël Konat, Sebastian Erdweg, and
Eelco Visser
(Delft University of Technology, Netherlands)
Incremental build systems are essential for fast, reproducible software builds. They enable short feedback cycles when they capture dependencies precisely and selectively execute build tasks efficiently. A much overlooked feature of build systems is the expressiveness of the scripting language, which directly influences the maintainability of build scripts. In this paper, we present a new incremental build algorithm that allows build engineers to use a full-fledged programming language with explicit task invocation, value and file inspection facilities, and conditional and iterative language constructs. In contrast to prior work on incrementality for such programmable builds, our algorithm scales with the number of tasks affected by a change and is independent of the size of the software project being built. Specifically, our algorithm accepts a set of changed files, transitively detects and re-executes affected build tasks, and also accounts for new task dependencies discovered during building. We have evaluated the performance of our algorithm in a real-world case study and confirm its scalability.
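A minimal sketch of such a rebuild loop, assuming hypothetical Task and BuildContext types rather than the authors' implementation: starting from the changed files, it transitively re-executes affected tasks and records the file and task dependencies each task reports while it runs, so its cost scales with the number of affected tasks.

import java.util.*;

// A build task reports the files it reads and the tasks it invokes while running;
// these dependencies are only known after execution ("dynamic task dependencies").
interface Task {
    String id();
    void execute(BuildContext ctx);
}

final class BuildContext {
    final Map<String, Set<Task>> tasksReadingFile = new HashMap<>();
    final Map<Task, Set<Task>> callersOf = new HashMap<>();

    // Called by tasks as they run, so new dependencies are recorded when they appear.
    void recordFileRead(Task t, String file) {
        tasksReadingFile.computeIfAbsent(file, f -> new HashSet<>()).add(t);
    }
    void recordTaskCall(Task caller, Task callee) {
        callersOf.computeIfAbsent(callee, c -> new HashSet<>()).add(caller);
    }
}

final class IncrementalBuilder {
    // Re-execute only tasks affected by the changed files, following task-level
    // dependency edges transitively instead of rebuilding the whole project.
    void rebuild(BuildContext ctx, Set<String> changedFiles) {
        Deque<Task> worklist = new ArrayDeque<>();
        for (String f : changedFiles) {
            worklist.addAll(ctx.tasksReadingFile.getOrDefault(f, Set.of()));
        }
        Set<Task> executed = new HashSet<>();
        while (!worklist.isEmpty()) {
            Task t = worklist.pop();
            if (!executed.add(t)) continue;
            t.execute(ctx); // may record new file reads and task calls
            // Callers of t may observe a changed result, so they are affected too.
            worklist.addAll(ctx.callersOf.getOrDefault(t, Set.of()));
        }
    }
}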
@InProceedings{ASE18p76,
author = {Gabriël Konat and Sebastian Erdweg and Eelco Visser},
title = {Scalable Incremental Building with Dynamic Task Dependencies},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {76--86},
doi = {10.1145/3238147.3238196},
year = {2018},
}
Publisher's Version
Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI
Keheliya Gallaba, Christian Macho, Martin Pinzger, and Shane McIntosh
(McGill University, Canada; University of Klagenfurt, Austria)
Automated builds, which may pass or fail, provide feedback to a development team about changes to the codebase. A passing build indicates that the change compiles cleanly and tests (continue to) pass. A failing (a.k.a., broken) build indicates that there are issues that require attention. Without a closer analysis of the nature of build outcome data, practitioners and researchers are likely to make two critical assumptions: (1) build results are not noisy; however, passing builds may contain failing or skipped jobs that are actively or passively ignored; and (2) builds are equal; however, builds vary in terms of the number of jobs and configurations.
To investigate the degree to which these assumptions about build breakage hold, we perform an empirical study of 3.7 million build jobs spanning 1,276 open source projects. We find that: (1) 12% of passing builds have an actively ignored failure; (2) 9% of builds have a misleading or incorrect outcome on average; and (3) at least 44% of the broken builds contain passing jobs, i.e., the breakage is local to a subset of build variants. Like other software archives, build data is noisy and complex. Analysis of build data requires nuance.
@InProceedings{ASE18p87,
author = {Keheliya Gallaba and Christian Macho and Martin Pinzger and Shane McIntosh},
title = {Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {87--97},
doi = {10.1145/3238147.3238171},
year = {2018},
}
Publisher's Version
Quality Assurance for Machine Learning Techniques
Automated Directed Fairness Testing
Sakshi Udeshi, Pryanshu Arora, and Sudipta Chattopadhyay
(Singapore University of Technology and Design, Singapore; BITS Pilani, India)
Fairness is a critical trait in decision making. As machine-learning models are increasingly being used in sensitive application domains (e.g., education and employment) for decision making, it is crucial that the decisions computed by such models are free of unintended bias. But how can we automatically validate the fairness of arbitrary machine-learning models? For a given machine-learning model and a set of sensitive input parameters, our Aequitas approach automatically discovers discriminatory inputs that highlight fairness violations. At the core of Aequitas are three novel strategies to employ probabilistic search over the input space with the objective of uncovering fairness violations. Our Aequitas approach leverages the inherent robustness property of common machine-learning models to design and implement scalable test generation methodologies. An appealing feature of our generated test inputs is that they can be systematically added to the training set of the underlying model to improve its fairness. To this end, we design a fully automated module that guarantees to improve the fairness of the model. We implemented Aequitas and evaluated it on six state-of-the-art classifiers. Our subjects also include a classifier that was designed with fairness in mind. We show that Aequitas effectively generates inputs to uncover fairness violations in all the subject classifiers and systematically improves the fairness of the respective models using the generated test inputs. In our evaluation, Aequitas generates up to 70% discriminatory inputs (w.r.t. the total number of inputs generated) and leverages these inputs to improve fairness by up to 94%.
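The core check behind such directed fairness testing can be sketched as follows (hypothetical interfaces, and a naive random search instead of the paper's probabilistic search strategies): an input is discriminatory if changing only a sensitive attribute changes the model's decision.

import java.util.Arrays;
import java.util.Optional;
import java.util.Random;
import java.util.function.Function;

final class FairnessTester {
    // An input is discriminatory if flipping only the sensitive attribute flips the decision.
    static boolean isDiscriminatory(Function<double[], Integer> model, double[] input,
                                    int sensitiveIndex, double[] sensitiveValues) {
        int original = model.apply(input);
        for (double v : sensitiveValues) {
            double[] variant = Arrays.copyOf(input, input.length);
            variant[sensitiveIndex] = v;
            if (!model.apply(variant).equals(original)) return true;
        }
        return false;
    }

    // Naive random search over the input space for a discriminatory input.
    static Optional<double[]> searchDiscriminatoryInput(Function<double[], Integer> model,
                                                        int dims, int sensitiveIndex,
                                                        double[] sensitiveValues, int trials) {
        Random rnd = new Random(42);
        for (int i = 0; i < trials; i++) {
            double[] candidate = rnd.doubles(dims).toArray();
            if (isDiscriminatory(model, candidate, sensitiveIndex, sensitiveValues)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}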
@InProceedings{ASE18p98,
author = {Sakshi Udeshi and Pryanshu Arora and Sudipta Chattopadhyay},
title = {Automated Directed Fairness Testing},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {98--108},
doi = {10.1145/3238147.3238165},
year = {2018},
}
Publisher's Version
Concolic Testing for Deep Neural Networks
Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, and Daniel Kroening
(University of Oxford, UK; University of Liverpool, UK)
Concolic testing combines program execution and symbolic analysis to explore the execution paths of a software program. In this paper, we develop the first concolic testing approach for Deep Neural Networks (DNNs). More specifically, we utilise quantified linear arithmetic over rationals to express test requirements that have been studied in the literature, and then develop a coherent method to perform concolic testing with the aim of better coverage. Our experimental results show the effectiveness of the concolic testing approach in both achieving high coverage and finding adversarial examples.
@InProceedings{ASE18p109,
author = {Youcheng Sun and Min Wu and Wenjie Ruan and Xiaowei Huang and Marta Kwiatkowska and Daniel Kroening},
title = {Concolic Testing for Deep Neural Networks},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {109--119},
doi = {10.1145/3238147.3238172},
year = {2018},
}
Publisher's Version
DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems
Lei Ma, Felix Juefei-Xu,
Fuyuan Zhang, Jiyuan Sun,
Minhui Xue, Bo Li, Chunyang Chen, Ting Su,
Li Li,
Yang Liu,
Jianjun Zhao, and Yadong Wang
(Harbin Institute of Technology, China; Nanyang Technological University, Singapore; Carnegie Mellon University, USA; Kyushu University, Japan; University of Illinois at Urbana-Champaign, USA; Monash University, Australia)
Deep learning (DL) defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neural network through a set of training data. We have seen wide adoption of DL in many safety-critical scenarios. However, a plethora of studies have shown that state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the testing adequacy of a DL system is usually measured by the accuracy of test data. Considering the limited availability of high-quality test data, good accuracy performance on test data can hardly provide confidence in the testing adequacy and generality of DL systems. Unlike traditional software systems that have clear and controllable logic and functionality, the lack of interpretability in a DL system makes system analysis and defect detection difficult, which could potentially hinder its real-world deployment. In this paper, we propose DeepGauge, a set of multi-granularity testing criteria for DL systems, which aims at rendering a multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, and four state-of-the-art adversarial attack techniques against DL. The potential usefulness of DeepGauge sheds light on the construction of more generic and robust DL systems.
@InProceedings{ASE18p120,
author = {Lei Ma and Felix Juefei-Xu and Fuyuan Zhang and Jiyuan Sun and Minhui Xue and Bo Li and Chunyang Chen and Ting Su and Li Li and Yang Liu and Jianjun Zhao and Yadong Wang},
title = {DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {120--131},
doi = {10.1145/3238147.3238202},
year = {2018},
}
Publisher's Version
DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems
Mengshi Zhang, Yuqun Zhang, Lingming Zhang,
Cong Liu, and Sarfraz Khurshid
(University of Texas at Austin, USA; Southern University of Science and Technology, China; University of Texas at Dallas, USA)
While Deep Neural Networks (DNNs) have established the fundamentals of image-based autonomous driving systems, they may exhibit erroneous behaviors and cause fatal accidents. To address the safety issues in autonomous driving systems, a recent set of testing techniques have been designed to automatically generate artificial driving scenes to enrich the test suite, e.g., by generating new input images transformed from the original ones. However, these techniques are insufficient due to two limitations: first, many such synthetic images often lack the diversity of real driving scenes, and hence compromise the resulting efficacy and reliability. Second, for machine-learning-based systems, a mismatch between the training and application domains can dramatically degrade system accuracy, so it is necessary to validate inputs to improve system robustness.
In this paper, we propose DeepRoad, an unsupervised DNN-based framework for automatically testing the consistency of DNN-based autonomous driving systems and performing online input validation. First, DeepRoad automatically synthesizes large amounts of diverse driving scenes without using image transformation rules (e.g., scale, shear, and rotation). In particular, DeepRoad is able to produce driving scenes with various weather conditions (including rather extreme ones) by applying Generative Adversarial Networks (GANs) along with the corresponding real-world weather scenes. Second, DeepRoad utilizes metamorphic testing techniques to check the consistency of such systems using synthetic images. Third, DeepRoad validates input images for DNN-based systems by measuring the distance between the input and training images using their VGGNet features. We implement DeepRoad to test three well-recognized DNN-based autonomous driving systems from the Udacity self-driving car challenge. The experimental results demonstrate that DeepRoad can detect thousands of inconsistent behaviors for these systems, and effectively validate input images to potentially enhance system robustness as well.
@InProceedings{ASE18p132,
author = {Mengshi Zhang and Yuqun Zhang and Lingming Zhang and Cong Liu and Sarfraz Khurshid},
title = {DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {132--142},
doi = {10.1145/3238147.3238187},
year = {2018},
}
Publisher's Version
Variability
Testing Autonomous Cars for Feature Interaction Failures using Many-Objective Search
Raja Ben Abdessalem,
Annibale Panichella, Shiva Nejati, Lionel C. Briand, and Thomas Stifter
(University of Luxembourg, Luxembourg; Delft University of Technology, Netherlands; IEE, Luxembourg)
Complex systems such as autonomous cars are typically built as a composition of features that are independent units of functionality. Features tend to interact and impact one another's behavior in unknown ways. A challenge is to detect and manage feature interactions, in particular, those that violate system requirements, hence leading to failures. In this paper, we propose a technique to detect feature interaction failures by casting this problem into a search-based test generation problem. We define a set of hybrid test objectives (distance functions) that combine traditional coverage-based heuristics with new heuristics specifically aimed at revealing feature interaction failures. We develop a new search-based test generation algorithm, called FITEST, that is guided by our hybrid test objectives. FITEST extends recently proposed many-objective evolutionary algorithms to reduce the time required to compute fitness values. We evaluate our approach using two versions of an industrial self-driving system. Our results show that our hybrid test objectives are able to identify more than twice as many feature interaction failures as two baseline test objectives used in the software testing literature (i.e., coverage-based and failure-based test objectives). Further, the feedback from domain experts indicates that the detected feature interaction failures represent real faults in their systems that were not previously identified based on analysis of the system features and their requirements.
@InProceedings{ASE18p143,
author = {Raja Ben Abdessalem and Annibale Panichella and Shiva Nejati and Lionel C. Briand and Thomas Stifter},
title = {Testing Autonomous Cars for Feature Interaction Failures using Many-Objective Search},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {143--154},
doi = {10.1145/3238147.3238192},
year = {2018},
}
Publisher's Version
Tackling Combinatorial Explosion: A Study of Industrial Needs and Practices for Analyzing Highly Configurable Systems
Mukelabai Mukelabai, Damir Nešić, Salome Maro, Thorsten Berger, and Jan-Philipp Steghöfer
(Chalmers University of Technology, Sweden; University of Gothenburg, Sweden; KTH, Sweden)
Highly configurable systems are complex pieces of software. To tackle this complexity, hundreds of dedicated analysis techniques have been conceived, many of which are able to analyze system properties for all possible system configurations, as opposed to traditional, single-system analyses. Unfortunately, it is largely unknown whether these techniques are adopted in practice, whether they address actual needs, or what strategies practitioners actually apply to analyze highly configurable systems. We present a study of analysis practices and needs in industry. It relied on a survey with 27 practitioners engineering highly configurable systems and follow-up interviews with 15 of them, covering 18 different companies from eight countries. We confirm that typical properties considered in the literature (e.g., reliability) are relevant, that consistency between variability models and artifacts is critical, but that the majority of analyses for specifications of configuration options (a.k.a., variability model analysis) is not perceived as needed. We identified rather pragmatic analysis strategies, including practices to avoid the need for analysis. For instance, testing with experience-based sampling is the most commonly applied strategy, while systematic sampling is rarely applicable. We discuss analyses that are missing and synthesize our insights into suggestions for future research.
@InProceedings{ASE18p155,
author = {Mukelabai Mukelabai and Damir Nešić and Salome Maro and Thorsten Berger and Jan-Philipp Steghöfer},
title = {Tackling Combinatorial Explosion: A Study of Industrial Needs and Practices for Analyzing Highly Configurable Systems},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {155--166},
doi = {10.1145/3238147.3238201},
year = {2018},
}
Publisher's Version
Info
Understanding and Detecting Evolution-Induced Compatibility Issues in Android Apps
Dongjie He,
Lian Li, Lei Wang, Hengjie Zheng, Guangwei Li, and
Jingling Xue
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; UNSW, Australia)
The frequent releases of the Android OS and its various versions bring many compatibility issues to Android apps. This paper studies and addresses such evolution-induced compatibility problems. We conduct an extensive empirical study over 11 different Android versions and 4,936 Android apps. Our study shows that there are drastic API changes between adjacent Android versions, with on average 140.8 new types, 1,505.6 new methods, and 979.2 new fields introduced in each release. However, the Android Support Library (provided by the Android OS) supports less than 23% of the newly added methods, with much less support for new types and fields. As a result, 91.84% of Android apps contain additional code to support different OS versions. Furthermore, 88.65% of this supporting code shares a common pattern: it directly compares the variable android.os.Build.VERSION.SDK_INT with a constant version number before using an API that is only available in particular versions.
Based on our findings, we develop a new tool called IctApiFinder to detect incompatible API usages in Android applications. IctApiFinder effectively computes the OS versions on which an API may be invoked, using an inter-procedural data-flow analysis framework. It detects numerous incompatible API usages in 361 out of 1,425 apps. Compared to Android Lint, IctApiFinder is sound and reduces false positives by 82.1%. We have reported the issues to the developers of 13 apps. At present, 5 of them have been confirmed by the original developers and 3 have already been fixed.
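The guard pattern mentioned above looks roughly like the snippet below (it compiles against the Android SDK); NotificationChannel, an API added in level 26, merely serves as an illustrative example and is not drawn from the paper's dataset.

import android.app.NotificationChannel;
import android.app.NotificationManager;
import android.content.Context;
import android.os.Build;

public class ChannelCompat {
    // Common compatibility idiom: guard an API introduced in a newer OS release
    // behind a comparison of Build.VERSION.SDK_INT against a constant version number.
    public static void ensureChannel(Context context) {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            // NotificationChannel exists only since API level 26 (Android 8.0).
            NotificationManager nm =
                    (NotificationManager) context.getSystemService(Context.NOTIFICATION_SERVICE);
            nm.createNotificationChannel(
                    new NotificationChannel("updates", "Updates",
                            NotificationManager.IMPORTANCE_DEFAULT));
        }
        // On older versions notifications work without channels, so there is nothing to do.
    }
}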
@InProceedings{ASE18p167,
author = {Dongjie He and Lian Li and Lei Wang and Hengjie Zheng and Guangwei Li and Jingling Xue},
title = {Understanding and Detecting Evolution-Induced Compatibility Issues in Android Apps},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {167--177},
doi = {10.1145/3238147.3238185},
year = {2018},
}
Publisher's Version
Mining and Crowd Sourcing
Characterizing the Natural Language Descriptions in Software Logging Statements
Pinjia He, Zhuangbin Chen, Shilin He, and
Michael R. Lyu
(Chinese University of Hong Kong, China)
Logging is a common programming practice of great importance in modern software development, because software logs have been widely used in various software maintenance tasks. To provide high-quality logs, developers need to design the description text in logging statements carefully. Inappropriate descriptions will slow down or even mislead the maintenance process, such as postmortem analysis. However, there is currently a lack of rigorous guidelines and specifications on developer logging behaviors, which makes the construction of description text in logging statements a challenging problem. To fill this significant gap, in this paper we systematically study what developers log, with a focus on the usage of natural language descriptions in logging statements. We obtain 6 valuable findings by conducting source code analysis on 10 Java projects and 7 C# projects, which contain 28,532,975 LOC and 115,159 logging statements in total. Furthermore, our study demonstrates the potential of automated description text generation for logging statements by obtaining up to 49.04 BLEU-4 score and 62.1 ROUGE-L score using a simple information retrieval method. To facilitate future research in this field, the datasets have been publicly released.
@InProceedings{ASE18p178,
author = {Pinjia He and Zhuangbin Chen and Shilin He and Michael R. Lyu},
title = {Characterizing the Natural Language Descriptions in Software Logging Statements},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {178--189},
doi = {10.1145/3238147.3238193},
year = {2018},
}
Publisher's Version
Assessing the Type Annotation Burden
John-Paul Ore, Sebastian Elbaum, Carrick Detweiler, and Lambros Karkazis
(University of Nebraska-Lincoln, USA)
Type annotations provide a link between program variables and domain-specific types. When combined with a type system, these annotations can enable early fault detection. For type annotations to be cost-effective in practice, they need to be both accurate and affordable for developers. We lack, however, an understanding of how burdensome type annotation is for developers. Hence, this work explores three fundamental questions: 1) how accurately do developers make type annotations; 2) how long does a single annotation take; and, 3) if a system could automatically suggest a type annotation, how beneficial to accuracy are correct suggestions and how detrimental are incorrect suggestions? We present results of a study of 71 programmers using 20 random code artifacts that contain variables with physical unit types that must be annotated. Subjects choose a correct type annotation only 51% of the time and take an average of 136 seconds to make a single correct annotation. Our qualitative analysis reveals that variable names and reasoning over mathematical operations are the leading clues for type selection. We find that suggesting the correct type boosts accuracy to 73%, while making a poor suggestion decreases accuracy to 28%. We also explore what state-of-the-art automated type annotation systems can and cannot do to help developers with type annotations, and identify implications for tool developers.
@InProceedings{ASE18p190,
author = {John-Paul Ore and Sebastian Elbaum and Carrick Detweiler and Lambros Karkazis},
title = {Assessing the Type Annotation Burden},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {190--201},
doi = {10.1145/3238147.3238173},
year = {2018},
}
Publisher's Version
Mining File Histories: Should We Consider Branches?
Vladimir Kovalenko, Fabio Palomba, and
Alberto Bacchelli
(Delft University of Technology, Netherlands; University of Zurich, Switzerland)
Modern distributed version control systems, such as Git, offer support for branching — the possibility to develop parts of software outside the master trunk. Consideration of the repository structure in Mining Software Repository (MSR) studies requires a thorough approach to mining, but there is no well-documented, widespread methodology regarding the handling of merge commits and branches. Moreover, there is still a lack of knowledge of the extent to which considering branches during MSR studies impacts the results of the studies.
In this study, we set out to evaluate the importance of proper handling of branches when calculating file modification histories. We analyze over 1,400 Git repositories of four open source ecosystems and compute modification histories for over two million files, using two different algorithms. One algorithm only follows the first parent of each commit when traversing the repository, the other returns the full modification history of a file across all branches. We show that the two algorithms consistently deliver different results, but the scale of the difference varies across projects and ecosystems. Further, we evaluate the importance of accurate mining of file histories by comparing the performance of common techniques that rely on file modification history — reviewer recommendation, change recommendation, and defect prediction — for two algorithms of file history retrieval. We find that considering full file histories leads to an increase in the techniques’ performance that is rather modest.
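The two retrieval algorithms can be illustrated over a simplified commit graph (hypothetical types, not the study's mining infrastructure): the first follows only each commit's first parent, the second traverses all parents and therefore also sees modifications made on merged branches.

import java.util.*;

// Simplified view of a commit: its parents (first parent at index 0) and the files it touched.
record Commit(String id, List<Commit> parents, Set<String> changedFiles) {}

final class FileHistory {
    // History along the mainline only: follow the first parent of every commit.
    static List<Commit> firstParentHistory(Commit head, String file) {
        List<Commit> history = new ArrayList<>();
        for (Commit c = head; c != null; c = c.parents().isEmpty() ? null : c.parents().get(0)) {
            if (c.changedFiles().contains(file)) history.add(c);
        }
        return history;
    }

    // Full history: traverse all parents, so modifications made on merged branches are included.
    static List<Commit> fullHistory(Commit head, String file) {
        List<Commit> history = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<Commit> worklist = new ArrayDeque<>(List.of(head));
        while (!worklist.isEmpty()) {
            Commit c = worklist.pop();
            if (!visited.add(c.id())) continue;
            if (c.changedFiles().contains(file)) history.add(c);
            worklist.addAll(c.parents());
        }
        return history;
    }
}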
@InProceedings{ASE18p202,
author = {Vladimir Kovalenko and Fabio Palomba and Alberto Bacchelli},
title = {Mining File Histories: Should We Consider Branches?},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {202--213},
doi = {10.1145/3238147.3238169},
year = {2018},
}
Publisher's Version
Info
Tell Them Apart: Distilling Technology Differences from Crowd-Scale Comparison Discussions
Yi Huang, Chunyang Chen,
Zhenchang Xing, Tian Lin, and
Yang Liu
(Australian National University, Australia; Monash University, Australia; Nanyang Technological University, Singapore)
Developers can use different technologies for many software development tasks in their work. However, when faced with several technologies with comparable functionalities, it is not easy for developers to select the most appropriate one, as comparing technologies by trial and error is time-consuming. Instead, developers can resort to expert articles, read official documents, or ask questions on Q&A sites for technology comparison, but it is hard to obtain a comprehensive comparison because online information is often fragmented or contradictory. To overcome these limitations, we propose the diffTech system, which exploits crowdsourced discussions from Stack Overflow and assists technology comparison with an informative summary of different comparison aspects. We first build a large database of comparable technologies in software engineering by mining tags in Stack Overflow, and then locate comparative sentences about comparable technologies with natural language processing methods. We further mine prominent comparison aspects by clustering similar comparative sentences and representing each cluster with its keywords. The evaluation demonstrates both the accuracy and usefulness of our model, and we implement our approach in a practical website for public use.
@InProceedings{ASE18p214,
author = {Yi Huang and Chunyang Chen and Zhenchang Xing and Tian Lin and Yang Liu},
title = {Tell Them Apart: Distilling Technology Differences from Crowd-Scale Comparison Discussions},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {214--224},
doi = {10.1145/3238147.3238208},
year = {2018},
}
Publisher's Version
Info
Security
ReScue: Crafting Regular Expression DoS Attacks
Yuju Shen, Yanyan Jiang,
Chang Xu, Ping Yu,
Xiaoxing Ma, and Jian Lu
(Nanjing University, China)
Regular expressions (regexes) with modern extensions are among the most popular string processing tools. However, poorly-designed regexes can yield exponentially many matching steps and lead to regex Denial-of-Service (ReDoS) attacks under well-conceived string inputs. This paper presents ReScue, a three-phase gray-box analytical technique, to automatically generate ReDoS strings that highlight vulnerabilities of given regexes. ReScue systematically seeds (by a genetic search), incubates (by another genetic search), and finally pumps (by a regex-dedicated algorithm) to generate strings with maximized search time. We implemented the ReScue tool and evaluated it against 29,088 practical regexes in real-world projects. The evaluation results show that ReScue found 49% more attack strings than the best existing technique, and applying ReScue to popular GitHub projects discovered ten previously unknown ReDoS vulnerabilities.
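As a classic illustration of the kind of vulnerability ReScue targets (a textbook example, not taken from the paper's subjects), a nested quantifier forces Java's backtracking regex engine into exponentially many matching steps on inputs that almost match:

import java.util.regex.Pattern;

public class ReDoSDemo {
    public static void main(String[] args) {
        // A poorly designed regex: the nested quantifier (a+)+ can partition a run of
        // 'a's in exponentially many ways before the required trailing 'b' fails to match.
        Pattern vulnerable = Pattern.compile("(a+)+b");

        // 25 'a's and no 'b': every partition is tried and rejected, so each additional
        // 'a' roughly doubles the matching time on a backtracking engine like java.util.regex.
        String attack = "a".repeat(25);

        long start = System.nanoTime();
        boolean matched = vulnerable.matcher(attack).matches();
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("matched=" + matched + " in " + millis + " ms");
    }
}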
@InProceedings{ASE18p225,
author = {Yuju Shen and Yanyan Jiang and Chang Xu and Ping Yu and Xiaoxing Ma and Jian Lu},
title = {ReScue: Crafting Regular Expression DoS Attacks},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {225--235},
doi = {10.1145/3238147.3238159},
year = {2018},
}
Publisher's Version
Info
TDroid: Exposing App Switching Attacks in Android with Control Flow Specialization
Jie Liu, Diyu Wu, and
Jingling Xue
(UNSW, Australia)
The Android multitasking mechanism can be plagued with app switching attacks, in which a malicious app replaces the legitimate top activity of the focused app with one of its own, thus mounting, e.g., phishing and denial-of-service attacks. Existing market-level defenses are still ineffective, as static analysis is fundamentally unable to reason about the intention of an app and dynamic analysis has low coverage.
We introduce TDroid, a new market-level approach to detecting app switching attacks. The challenge lies in how to handle a plethora of input-dependent branch predicates (forming an exponential number of paths) that control the execution of the code responsible for launching such attacks. TDroid tackles this challenge by combining static and dynamic analysis to analyze an app without producing any false positives. In its static analysis, TDroid transforms the app into runnable slices that potentially contain app switching attacks, one slice per attack. In its dynamic analysis, TDroid executes these slices on an Android phone or emulator to expose their malicious GUIs. The novelty lies in the use of a new trigger-oriented slicing technique to produce runnable slices in which certain input-dependent branch predicates are specialized to always execute fixed branches.
Evaluated with a large set of malware apps, TDroid is shown to outperform the state of the art, by detecting substantially more app switching attacks, in a few minutes per app, on average.
@InProceedings{ASE18p236,
author = {Jie Liu and Diyu Wu and Jingling Xue},
title = {TDroid: Exposing App Switching Attacks in Android with Control Flow Specialization},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {236--247},
doi = {10.1145/3238147.3238188},
year = {2018},
}
Publisher's Version
Model-Driven Run-Time Enforcement of Complex Role-Based Access Control Policies
Ameni Ben Fadhel, Domenico Bianculli, and Lionel C. Briand
(University of Luxembourg, Luxembourg)
A Role-based Access Control (RBAC) mechanism prevents unauthorized users from performing operations, according to authorization policies defined on the user's role within an enterprise. Several models have been proposed to specify complex RBAC policies. However, existing approaches for policy enforcement do not fully support all the types of policies that can be expressed in these models, which hinders their adoption among practitioners.
In this paper we propose a model-driven enforcement framework for complex policies captured by GemRBAC+CTX, a comprehensive RBAC model proposed in the literature. We reduce the problem of making an access decision to checking whether a system state (from an RBAC point of view), expressed as an instance of the GemRBAC+CTX model, satisfies the constraints corresponding to the RBAC policies to be enforced at run time. We provide enforcement algorithms for various types of access requests and events, and a prototype tool (MORRO) implementing them. We also show how to integrate MORRO into an industrial Web application. The evaluation results show the applicability of our approach on an industrial system and its scalability with respect to the various parameters characterizing an access control configuration.
@InProceedings{ASE18p248,
author = {Ameni Ben Fadhel and Domenico Bianculli and Lionel C. Briand},
title = {Model-Driven Run-Time Enforcement of Complex Role-Based Access Control Policies},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {248--258},
doi = {10.1145/3238147.3238167},
year = {2018},
}
Publisher's Version
ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection
Bo Jiang, Ye Liu, and W. K. Chan
(Beihang University, China; City University of Hong Kong, China)
Decentralized cryptocurrencies feature the use of blockchains to transfer value among peers on networks without a central agency. Smart contracts are programs running on top of the blockchain consensus protocol that enable people to make agreements while minimizing trust. Millions of smart contracts have been deployed in various decentralized applications. Security vulnerabilities within those smart contracts pose significant threats to their applications. Indeed, many critical security vulnerabilities in smart contracts on the Ethereum platform have caused huge financial losses to their users. In this work, we present ContractFuzzer, a novel fuzzer to test Ethereum smart contracts for security vulnerabilities. ContractFuzzer generates fuzzing inputs based on the ABI specifications of smart contracts, defines test oracles to detect security vulnerabilities, instruments the EVM to log smart contracts' runtime behaviors, and analyzes these logs to report security vulnerabilities. Our fuzzing of 6,991 smart contracts has flagged more than 459 vulnerabilities with high precision. In particular, our fuzzing tool successfully detects the vulnerability of the DAO contract that led to a loss of USD 60 million, as well as the vulnerabilities of the Parity Wallet that led to the loss of USD 30 million and the freezing of USD 150 million worth of Ether.
@InProceedings{ASE18p259,
author = {Bo Jiang and Ye Liu and W. K. Chan},
title = {ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {259--269},
doi = {10.1145/3238147.3238177},
year = {2018},
}
Publisher's Version
Published Artifact
Info
Artifacts Available
Developer Tools
SEEDE: Simultaneous Execution and Editing in a Development Environment
Steven P. Reiss, Qi Xin, and Jeff Huang
(Brown University, USA)
We introduce a tool within the Code Bubbles development environment that allows for continuous execution as the programmer edits. The tool, SEEDE, shows both the intermediate and final results of execution in terms of variables, control and data flow, output, and graphics. These results are updated as the user edits. The tool can be used to help the user write new code or to find and fix bugs. The tool is explicitly designed to let the user quickly explore the execution of a method along with all the code it invokes, possibly while writing or modifying the code. The user can start continuous execution either at a breakpoint or for a test case. This paper describes the tool, its implementation, and its user interface. It presents an initial user study of the tool demonstrating its potential utility.
@InProceedings{ASE18p270,
author = {Steven P. Reiss and Qi Xin and Jeff Huang},
title = {SEEDE: Simultaneous Execution and Editing in a Development Environment},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {270--281},
doi = {10.1145/3238147.3238182},
year = {2018},
}
Publisher's Version
Effective API Recommendation without Historical Software Repositories
Xiaoyu Liu, LiGuo Huang, and Vincent Ng
(Southern Methodist University, USA; University of Texas at Dallas, USA)
It is time-consuming and labor-intensive to learn and locate the correct API for programming tasks. Thus, it is beneficial to perform API recommendation automatically. The graph-based statistical model has been shown to recommend top-10 API candidates effectively. It falls short, however, in accurately recommending an actual top-1 API. To address this weakness, we propose RecRank, an approach and tool that applies a novel ranking-based discriminative approach leveraging API usage path features to improve top-1 API recommendation. Empirical evaluation on a large corpus of (1385+8) open source projects shows that RecRank significantly improves top-1 API recommendation accuracy and mean reciprocal rank when compared to state-of-the-art API recommendation approaches.
@InProceedings{ASE18p282,
author = {Xiaoyu Liu and LiGuo Huang and Vincent Ng},
title = {Effective API Recommendation without Historical Software Repositories},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {282--292},
doi = {10.1145/3238147.3238216},
year = {2018},
}
Publisher's Version
API Method Recommendation without Worrying about the Task-API Knowledge Gap
Qiao Huang, Xin Xia,
Zhenchang Xing,
David Lo, and Xinyu Wang
(Zhejiang University, China; Monash University, Australia; Australian National University, Australia; Singapore Management University, Singapore)
Developers often need to search for appropriate APIs for their programming tasks. Although most libraries have API reference documentation, it is not easy to find appropriate APIs due to the lexical gap and knowledge gap between the natural language description of the programming task and the API description in API documentation. Here, the lexical gap refers to the fact that the same semantic meaning can be expressed by different words, and the knowledge gap refers to the fact that API documentation mainly describes API functionality and structure but lacks other types of information like concepts and purposes, which are usually the key information in the task description. In this paper, we propose an API recommendation approach named BIKER (Bi-Information source based KnowledgE Recommendation) to tackle these two gaps. To bridge the lexical gap, BIKER uses word embedding technique to calculate the similarity score between two text descriptions. Inspired by our survey findings that developers incorporate Stack Overflow posts and API documentation for bridging the knowledge gap, BIKER leverages Stack Overflow posts to extract candidate APIs for a program task, and ranks candidate APIs by considering the query’s similarity with both Stack Overflow posts and API documentation. It also summarizes supplementary information (e.g., API description, code examples in Stack Overflow posts) for each API to help developers select the APIs that are most relevant to their tasks. Our evaluation with 413 API-related questions confirms the effectiveness of BIKER for both class- and method-level API recommendation, compared with state-of-the-art baselines. Our user study with 28 Java developers further demonstrates the practicality of BIKER for API search.
@InProceedings{ASE18p293,
author = {Qiao Huang and Xin Xia and Zhenchang Xing and David Lo and Xinyu Wang},
title = {API Method Recommendation without Worrying about the Task-API Knowledge Gap},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {293--304},
doi = {10.1145/3238147.3238191},
year = {2018},
}
Publisher's Version
An Automated Approach to Estimating Code Coverage Measures via Execution Logs
Boyuan Chen, Jian Song, Peng Xu, Xing Hu, and
Zhen Ming (Jack) Jiang
(York University, Canada; Baidu, China)
Software testing is a widely used technique to ensure the quality of software systems. Code coverage measures are commonly used to evaluate and improve existing test suites. Based on our industrial and open source studies, existing state-of-the-art code coverage tools are only used during unit and integration testing due to issues like engineering challenges, performance overhead, and incomplete results. To resolve these issues, in this paper we propose an automated approach, called LogCoCo, to estimating code coverage measures using readily available execution logs. Using program analysis techniques, LogCoCo matches the execution logs with their corresponding code paths and estimates three different code coverage criteria: method coverage, statement coverage, and branch coverage. Case studies on one open source system (HBase) and five commercial systems from Baidu show that: (1) the results of LogCoCo are highly accurate (>96% in seven out of nine experiments) under a variety of testing activities (unit testing, integration testing, and benchmarking); and (2) the results of LogCoCo can be used to evaluate and improve the existing test suites. Our collaborators at Baidu are currently considering adopting LogCoCo and using it on a daily basis.
@InProceedings{ASE18p305,
author = {Boyuan Chen and Jian Song and Peng Xu and Xing Hu and Zhen Ming (Jack) Jiang},
title = {An Automated Approach to Estimating Code Coverage Measures via Execution Logs},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {305--316},
doi = {10.1145/3238147.3238214},
year = {2018},
}
Publisher's Version
Static Analysis
How Many of All Bugs Do We Find? A Study of Static Bug Detectors
Andrew Habib and
Michael Pradel
(TU Darmstadt, Germany)
Static bug detectors are becoming increasingly popular and are widely used by professional software developers. While most work on bug detectors focuses on whether they find bugs at all, and on how many false positives they report in addition to legitimate warnings, the inverse question is often neglected: How many of all real-world bugs do static bug detectors find? This paper addresses this question by studying the results of applying three widely used static bug detectors to an extended version of the Defects4J dataset that consists of 15 Java projects with 594 known bugs. To decide which of these bugs the tools detect, we use a novel methodology that combines an automatic analysis of warnings and bugs with a manual validation of each candidate of a detected bug. The results of the study show that: (i) static bug detectors find a non-negligible amount of all bugs, (ii) different tools are mostly complementary to each other, and (iii) current bug detectors miss the large majority of the studied bugs. A detailed analysis of bugs missed by the static detectors shows that some bugs could have been found by variants of the existing detectors, while others are domain-specific problems that do not match any existing bug pattern. These findings help potential users of such tools to assess their utility, motivate and outline directions for future work on static bug detection, and provide a basis for future comparisons of static bug detection with other bug finding techniques, such as manual and automated testing.
@InProceedings{ASE18p317,
author = {Andrew Habib and Michael Pradel},
title = {How Many of All Bugs Do We Find? A Study of Static Bug Detectors},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {317--328},
doi = {10.1145/3238147.3238213},
year = {2018},
}
Publisher's Version
Info
TRIMMER: Application Specialization for Code Debloating
Hashim Sharif, Muhammad Abubakar, Ashish Gehani, and Fareed Zaffar
(University of Illinois at Urbana-Champaign, USA; Lahore University of Management Sciences, Pakistan; SRI International, USA)
With the proliferation of new hardware architectures and ever-evolving user requirements, the software stack is becoming increasingly bloated. In practice, only a limited subset of the supported functionality is utilized in a particular usage context, thereby presenting an opportunity to eliminate unused features. In the past, program specialization has been proposed as a mechanism for enabling automatic software debloating. In this work, we show how existing program specialization techniques lack the analyses required for providing code simplification for real-world programs. We present an approach that uses stronger analysis techniques to take advantage of constant configuration data, thereby enabling more effective debloating. We developed Trimmer, an application specialization tool that leverages user-provided configuration data to specialize an application to its deployment context. The specialization process attempts to eliminate the application functionality that is unused in the user-defined context. Our evaluation demonstrates that Trimmer can effectively reduce code bloat. For 13 applications spanning various domains, we observe a mean binary size reduction of 21% and a maximum reduction of 75%. We also show that specialization reduces the attack surface for code-reuse attacks by reducing the number of exploitable gadgets. For the evaluated programs, we observe a 20% mean reduction in the total gadget count and a maximum reduction of 87%.
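As a loose analogy in Python (Trimmer itself operates on compiler intermediate representation, not source), the sketch below shows how knowing configuration values at specialization time lets code guarded by those values be resolved and the unused parts dropped; the feature names, flags, and bodies are all illustrative.

    # A toy 'program' whose features are guarded by configuration flags.
    # Each entry: (config key, value enabling the feature, feature body).
    FEATURES = [
        ("enable_tls",  True,    "tls_handshake(); encrypt_channel();"),
        ("enable_ipv6", True,    "setup_ipv6_sockets();"),
        ("log_level",   "debug", "install_verbose_logging();"),
    ]

    def specialize(features, config):
        """Keep only the feature bodies whose guarding flag is satisfied by the
        deployment-time configuration; everything else is 'debloated'."""
        kept = [body for key, val, body in features if config.get(key) == val]
        return "\n".join(kept)

    # Deployment context: TLS on, IPv6 off, default logging.
    print(specialize(FEATURES, {"enable_tls": True, "enable_ipv6": False}))
    # -> only the TLS feature body survives; the other features are eliminated.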
@InProceedings{ASE18p329,
author = {Hashim Sharif and Muhammad Abubakar and Ashish Gehani and Fareed Zaffar},
title = {TRIMMER: Application Specialization for Code Debloating},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {329--339},
doi = {10.1145/3238147.3238160},
year = {2018},
}
Publisher's Version
A Unified Lattice Model and Framework for Purity Analyses
Dominik Helm, Florian Kübler, Michael Eichberg, Michael Reif, and
Mira Mezini
(TU Darmstadt, Germany)
Analyzing whether methods in object-oriented programs are side-effect free and also deterministic, i.e., mathematically pure, has been the target of extensive research. Identifying such methods helps to find code smells and security-related issues, and also helps analyses that detect concurrency bugs. Pure methods are also used by formal verification approaches as the foundations for specifications, and proving purity is necessary to ensure correct specifications.
However, so far no common terminology exists that describes the purity of methods. Furthermore, some terms (e.g., pure or side-effect free) are used inconsistently. In addition, all current approaches report only selected purity information, making them suitable for only a subset of the potential use cases.
In this paper, we present a fine-grained unified lattice model which puts the purity levels found in the literature into relation and which adds a new level that generalizes existing definitions. We have also implemented a scalable, modularized purity analysis which produces significantly more precise results for real-world programs than the best-performing related work. The analysis shows that all defined levels are found in real-world projects.
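The central abstraction here is a lattice of purity levels. A minimal Python sketch of such a lattice and its join operation follows; the three levels shown are simplifications chosen for illustration and only roughly mirror the finer-grained levels defined in the paper.

    from enum import IntEnum

    class Purity(IntEnum):
        # A simplified chain: smaller value = stronger guarantee.
        PURE = 0              # deterministic and side-effect free
        SIDE_EFFECT_FREE = 1  # no side effects, but possibly non-deterministic
        IMPURE = 2            # may have side effects

    def join(a: Purity, b: Purity) -> Purity:
        """Least upper bound in the chain: a method is at most as pure as the
        least pure method it (transitively) calls."""
        return max(a, b)

    def purity_of(method_purity: Purity, callee_purities) -> Purity:
        result = method_purity
        for p in callee_purities:
            result = join(result, p)
        return result

    print(purity_of(Purity.PURE, [Purity.PURE, Purity.SIDE_EFFECT_FREE]))
    # -> Purity.SIDE_EFFECT_FREE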
@InProceedings{ASE18p340,
author = {Dominik Helm and Florian Kübler and Michael Eichberg and Michael Reif and Mira Mezini},
title = {A Unified Lattice Model and Framework for Purity Analyses},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {340--350},
doi = {10.1145/3238147.3238226},
year = {2018},
}
Publisher's Version
Verification 1
Control Flow-Guided SMT Solving for Program Verification
Jianhui Chen and
Fei He
(Tsinghua University, China)
Satisfiability modulo theories (SMT) solvers have been widely applied as the reasoning engine for diverse software analysis and verification technologies. The efficiency of the SMT solver has significant effects on the performance of these technologies. However, current SMT solvers are designed for the general purpose of constraint solving. Much useful knowledge about programs cannot be utilized during SMT solving. As a result, the SMT solver may spend substantial effort exploring redundant search space. In this paper, we propose a novel approach for utilizing control-flow knowledge in SMT solving. With this technique, the search space can be considerably reduced and the efficiency of SMT solving is observably improved. We conducted extensive experiments on credible benchmarks; the results show orders-of-magnitude improvements for our approach.
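As a hypothetical illustration of the general idea (not the authors' encoding), control-flow knowledge such as "the then- and else-blocks of a branch are mutually exclusive" can be handed to the solver as extra constraints, so assignments corresponding to infeasible block combinations are never explored. The sketch uses the z3 Python bindings (the z3-solver package); the guard and variable names are made up.

    from z3 import Bools, Int, Solver, Implies, Not, And, Or

    # Boolean guards: b_then / b_else mark which side of a branch is executed.
    b_then, b_else = Bools("b_then b_else")
    x = Int("x")

    s = Solver()
    # Constraints produced by a (hypothetical) verification-condition generator.
    s.add(Implies(b_then, x > 0))
    s.add(Implies(b_else, x <= 0))
    s.add(x == 5)

    # Control-flow knowledge: exactly one side of the branch is taken. Without
    # these facts the solver would also consider assignments where both or
    # neither guard holds, i.e., infeasible paths.
    s.add(Not(And(b_then, b_else)))
    s.add(Or(b_then, b_else))

    print(s.check())   # sat
    print(s.model())   # e.g. b_then = True, b_else = False, x = 5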
@InProceedings{ASE18p351,
author = {Jianhui Chen and Fei He},
title = {Control Flow-Guided SMT Solving for Program Verification},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {351--361},
doi = {10.1145/3238147.3238218},
year = {2018},
}
Publisher's Version
PaMpeR: Proof Method Recommendation System for Isabelle/HOL
Yutaka Nagashima and Yilun He
(Czech Technical University, Czechia; University of Innsbruck, Austria; University of Sydney, Australia)
Deciding which sub-tool to use for a given proof state requires expertise specific to each interactive theorem prover (ITP). To mitigate this problem, we present PaMpeR, a proof method recommendation system for Isabelle/HOL. Given a proof state, PaMpeR recommends proof methods to discharge the proof goal and provides qualitative explanations as to why it suggests these methods. PaMpeR generates these recommendations based on existing hand-written proof corpora, thus transferring experienced users' expertise to new users. Our evaluation shows that PaMpeR correctly predicts experienced users' proof method invocations, especially when it comes to special-purpose proof methods.
@InProceedings{ASE18p362,
author = {Yutaka Nagashima and Yilun He},
title = {PaMpeR: Proof Method Recommendation System for Isabelle/HOL},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {362--372},
doi = {10.1145/3238147.3238210},
year = {2018},
}
Publisher's Version
Maintenance and Machine Learning
Neural-Machine-Translation-Based Commit Message Generation: How Far Are We?
Zhongxin Liu, Xin Xia,
Ahmed E. Hassan,
David Lo,
Zhenchang Xing, and Xinyu Wang
(Zhejiang University, China; Monash University, Australia; Queen's University, Canada; Singapore Management University, Singapore; Australian National University, Australia)
Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes, hence they are useful for program comprehension and software maintenance. However, due to the lack of time and direct motivation, commit messages are sometimes neglected by developers. To address this problem, Jiang et al. proposed an approach (we refer to it as NMT), which leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising; however, they did not explore why their approach performs well. Thus, in this paper, we first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.'s dataset are noisy because they were automatically generated or describe repetitive trivial changes; and (3) the performance of NMT declines substantially after removing such noisy commit messages. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT, and outperforms NMT in terms of BLEU (an accuracy measure that is widely used to evaluate machine translation systems) by 21%. Finally, we discuss some observations on the road ahead for automated commit message generation to inspire other researchers.
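The nearest-neighbor idea is simple enough to sketch: represent each diff as a bag-of-words vector, find the training diff most similar to the new diff, and reuse that diff's commit message. The Python sketch below is a simplification of this scheme (the paper additionally re-ranks top candidates by BLEU); the training data shown is invented.

    from collections import Counter
    from math import sqrt

    def bag_of_words(diff: str) -> Counter:
        return Counter(diff.split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def nngen(new_diff, training_pairs):
        """training_pairs: list of (diff, commit_message). Return the message of
        the nearest training diff under cosine similarity over token counts."""
        query = bag_of_words(new_diff)
        best = max(training_pairs, key=lambda p: cosine(query, bag_of_words(p[0])))
        return best[1]

    train = [("+ if (user == null) return;", "Add null check for user"),
             ("+ logger.info(\"started\")", "Add startup logging")]
    print(nngen("+ if (account == null) return;", train))  # -> "Add null check for user"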
@InProceedings{ASE18p373,
author = {Zhongxin Liu and Xin Xia and Ahmed E. Hassan and David Lo and Zhenchang Xing and Xinyu Wang},
title = {Neural-Machine-Translation-Based Commit Message Generation: How Far Are We?},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {373--384},
doi = {10.1145/3238147.3238190},
year = {2018},
}
Publisher's Version
Deep Learning Based Feature Envy Detection
Hui Liu, Zhifeng Xu, and Yanzhen Zou
(Beijing Institute of Technology, China; Peking University, China)
Software refactoring is widely employed to improve software quality. A key step in software refactoring is to identify which part of the software should be refactored. To facilitate the identification, a number of approaches have been proposed to identify certain structures in the code (called code smells) that suggest the possibility of refactoring. Most such approaches rely on manually designed heuristics to map manually selected source code metrics to predictions. However, it is challenging to manually select the best features, especially textual features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a novel deep learning based approach to detecting feature envy, one of the most common code smells. The key insight is that deep neural networks and advanced deep learning techniques could automatically select features (especially textual features) of source code for feature envy detection, and could automatically build the complex mapping between such features and predictions. We also propose an automatic approach to generating labeled training data for the neural network based classifier, which does not require any human intervention. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state-of-the-art in both detecting feature envy smells and recommending destinations for identified smelly methods.
@InProceedings{ASE18p385,
author = {Hui Liu and Zhifeng Xu and Yanzhen Zou},
title = {Deep Learning Based Feature Envy Detection},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {385--396},
doi = {10.1145/3238147.3238166},
year = {2018},
}
Publisher's Version
Improving Automatic Source Code Summarization via Deep Reinforcement Learning
Yao Wan, Zhou Zhao, Min Yang,
Guandong Xu, Haochao Ying, Jian Wu, and Philip S. Yu
(Zhejiang University, China; Chinese Academy of Sciences, China; University of Technology Sydney, Australia; University of Illinois at Chicago, USA)
Code summarization provides a high-level natural language description of the function performed by code, which can benefit software maintenance, code categorization, and retrieval. To the best of our knowledge, most state-of-the-art approaches follow an encoder-decoder framework which encodes the code into a hidden space and then decodes it into natural language space, suffering from two major drawbacks: (a) their encoders only consider the sequential content of code, ignoring the tree structure which is also critical for the task of code summarization; (b) their decoders are typically trained to predict the next word by maximizing the likelihood of the next ground-truth word given the previous ground-truth words, whereas at test time they are expected to generate the entire sequence from scratch. This discrepancy can cause an exposure bias issue, making the learnt decoder suboptimal. In this paper, we incorporate an abstract syntax tree structure as well as sequential content of code snippets into a deep reinforcement learning framework (i.e., an actor-critic network). The actor network provides the confidence of predicting the next word according to the current state. On the other hand, the critic network evaluates the reward value of all possible extensions of the current state and can provide global guidance for exploration. We employ an advantage reward based on the BLEU metric to train both networks. Comprehensive experiments on a real-world dataset show the effectiveness of our proposed model when compared with state-of-the-art methods.
@InProceedings{ASE18p397,
author = {Yao Wan and Zhou Zhao and Min Yang and Guandong Xu and Haochao Ying and Jian Wu and Philip S. Yu},
title = {Improving Automatic Source Code Summarization via Deep Reinforcement Learning},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {397--407},
doi = {10.1145/3238147.3238206},
year = {2018},
}
Publisher's Version
Symbolic Execution
Template-Guided Concolic Testing via Online Learning
Sooyoung Cha, Seonho Lee, and
Hakjoo Oh
(Korea University, South Korea)
We present template-guided concolic testing, a new technique for effectively reducing the search space in concolic testing. Addressing the path-explosion problem has been a significant challenge in concolic testing. Diverse search heuristics have been proposed to mitigate this problem, but using search heuristics alone is not sufficient to substantially improve code coverage for real-world programs. The goal of this paper is to complement existing techniques and achieve higher coverage by exploiting templates in concolic testing. In our approach, a template is a partially symbolized input vector whose job is to reduce the search space. However, choosing the right set of templates is nontrivial and significantly affects the final performance of our approach. We present an algorithm that automatically learns useful templates online, based on data collected from previous runs of concolic testing. The experimental results with open-source programs show that our technique achieves greater branch coverage and finds bugs more effectively than conventional concolic testing.
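A template here is a partially symbolized input vector: some positions are fixed to concrete bytes, others are left symbolic and exposed to the search. The following Python sketch illustrates only that data structure and how it constrains input generation; the file-format bytes and function names are hypothetical.

    import random

    # A template: concrete bytes stay fixed, None marks symbolic positions.
    # Only the symbolic positions are handed to the constraint solver / search.
    template = [0x7F, ord('E'), ord('L'), ord('F'), None, None, None, None]

    def instantiate(template, symbolic_values):
        """Fill the symbolic holes of the template with concrete candidate values
        (e.g., produced by an SMT solver or a search heuristic)."""
        out, it = [], iter(symbolic_values)
        for byte in template:
            out.append(next(it) if byte is None else byte)
        return bytes(out)

    def random_instantiation(template):
        holes = sum(1 for b in template if b is None)
        return instantiate(template, [random.randrange(256) for _ in range(holes)])

    print(random_instantiation(template))  # magic bytes fixed, last 4 bytes explored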
@InProceedings{ASE18p408,
author = {Sooyoung Cha and Seonho Lee and Hakjoo Oh},
title = {Template-Guided Concolic Testing via Online Learning},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {408--418},
doi = {10.1145/3238147.3238227},
year = {2018},
}
Publisher's Version
Android Testing via Synthetic Symbolic Execution
Xiang Gao, Shin Hwei Tan, Zhen Dong, and
Abhik Roychoudhury
(National University of Singapore, Singapore; Southern University of Science and Technology, China)
Symbolic execution of Android applications is challenging as it involves either building a customized VM for Android or modeling the Android libraries. Since the Android Runtime evolves from one version to another, building a high-fidelity symbolic execution engine involves modeling the effect of the libraries and their evolved versions. Without simulating the behavior of Android libraries, path divergence may occur due to constraint loss when the symbolic values flow into the Android framework and these values later affect the subsequent path taken. Previous works such as JPF-Android have relied on the modeling of the execution environment such as libraries. In this work, we build a dynamic symbolic execution engine for Android apps, without any manual modeling of the execution environment. Environment (or library) dependent control flow decisions in the application will trigger an on-demand program synthesis step to automatically deduce a representation of the library. This representation is refined on-the-fly by running the corresponding library multiple times. The overarching goal of the refinement is to enhance behavioral coverage and to alleviate the path divergence problem during symbolic execution. Moreover, our library synthesis can be made context-specific. Compared to traditional synthesis approaches which aim to synthesize the complete library code, our context-specific synthesis engine can generate more precise expressions for a given context. The evaluation of our dynamic symbolic execution engine, built on top of JDART, shows that the library models obtained from program synthesis are often more accurate than the semi-manual models in JPF-Android. Furthermore, our symbolic execution engine could reach more branch targets, as compared to using the JPF-Android models.
@InProceedings{ASE18p419,
author = {Xiang Gao and Shin Hwei Tan and Zhen Dong and Abhik Roychoudhury},
title = {Android Testing via Synthetic Symbolic Execution},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {419--429},
doi = {10.1145/3238147.3238225},
year = {2018},
}
Publisher's Version
PARTI: A Multi-interval Theory Solver for Symbolic Execution
Oscar Soria Dustmann, Klaus Wehrle, and
Cristian Cadar
(RWTH Aachen University, Germany; Imperial College London, UK)
Symbolic execution is an effective program analysis technique whose scalability largely depends on the ability to quickly solve large numbers of first-order logic queries. We propose an effective general technique for speeding up the solving of queries in the theory of arrays and bit-vectors with a specific structure, while otherwise falling back to a complete solver. The technique has two stages: a learning stage that determines the solution sets of each symbolic variable, and a decision stage that uses this information to quickly determine the satisfiability of certain types of queries. The main challenges involve deciding which operators to support and precisely dealing with integer type casts and arithmetic underflow and overflow. We implemented this technique in an incomplete solver called PARTI (``PARtial Theory solver for Intervals''), directly integrating it into the popular KLEE symbolic execution engine. We applied KLEE with PARTI and a state-of-the-art SMT solver to synthetic and real-world benchmarks. We found that PARTI practically does not hurt performance while often achieving order-of-magnitude speedups.
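The two stages can be pictured with a toy multi-interval domain: the learning pass intersects per-variable solution sets derived from simple atomic constraints, and the decision pass answers queries of a supported shape from those sets without invoking a complete solver. The Python sketch below is a simplification under that assumption, not the PARTI implementation.

    # A variable's solution set is a list of disjoint closed intervals [lo, hi].
    def intersect(intervals, lo, hi):
        """Refine a multi-interval with the constraint lo <= x <= hi."""
        out = []
        for a, b in intervals:
            a2, b2 = max(a, lo), min(b, hi)
            if a2 <= b2:
                out.append((a2, b2))
        return out

    # Learning stage: accumulate simple constraints on x into a multi-interval.
    x_solutions = [(0, 255)]                      # x is an unsigned byte
    x_solutions = intersect(x_solutions, 10, 99)  # x >= 10 and x <= 99
    x_solutions = intersect(x_solutions, 0, 20)   # x <= 20

    # Decision stage: queries of the supported shape are answered directly.
    def may_equal(intervals, value):
        return any(a <= value <= b for a, b in intervals)

    print(x_solutions)                 # [(10, 20)]
    print(may_equal(x_solutions, 15))  # True  -> satisfiable, no full solver needed
    print(may_equal(x_solutions, 42))  # False -> unsatisfiable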
@InProceedings{ASE18p430,
author = {Oscar Soria Dustmann and Klaus Wehrle and Cristian Cadar},
title = {PARTI: A Multi-interval Theory Solver for Symbolic Execution},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {430--440},
doi = {10.1145/3238147.3238179},
year = {2018},
}
Publisher's Version
Maintenance
Client-Specific Equivalence Checking
Federico Mora, Yi Li,
Julia Rubin, and
Marsha Chechik
(University of Toronto, Canada; University of British Columbia, Canada)
Software is often built by integrating components created by different teams or even different organizations. With little understanding of changes in dependent components, it is challenging to maintain correctness and robustness of the entire system. In this paper, we investigate the effect of component changes on the behavior of their clients. We observe that changes in a component are often irrelevant to a particular client and thus can be adopted without any delays or negative effects. Following this observation, we formulate the notion of client-specific equivalence checking (CSE) and develop an automated technique optimized for checking such equivalence. We evaluate our technique on a set of benchmarks, including those from the existing literature on equivalence checking, and show its applicability and effectiveness.
@InProceedings{ASE18p441,
author = {Federico Mora and Yi Li and Julia Rubin and Marsha Chechik},
title = {Client-Specific Equivalence Checking},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {441--451},
doi = {10.1145/3238147.3238178},
year = {2018},
}
Publisher's Version
Replay without Recording of Production Bugs for Service Oriented Applications
Nipun Arora, Jonathan Bell, Franjo Ivančić,
Gail Kaiser, and
Baishakhi Ray
(Dropbox, USA; George Mason University, USA; Google, USA; Columbia University, USA)
Short time-to-localize and time-to-fix for production bugs are extremely important for any 24x7 service-oriented application (SOA). Debugging buggy behavior in deployed applications is hard, as it requires careful reproduction of a similar environment and workload. Prior approaches for automatically reproducing production failures do not scale to large SOA systems. Our key insight is that for many failures in SOA systems (e.g., many semantic and performance bugs), a failure can automatically be reproduced solely by relaying network packets to replicas of suspect services, an insight that we validated through a manual study of 16 real bugs across five different systems. This paper presents Parikshan, an application monitoring framework that leverages user-space virtualization and network proxy technologies to provide a sandbox “debug” environment. In this “debug” environment, developers are free to attach debuggers and analysis tools without impacting performance or correctness of the production environment. In comparison to existing monitoring solutions that can slow down production applications, Parikshan allows application monitoring at significantly lower overhead.
@InProceedings{ASE18p452,
author = {Nipun Arora and Jonathan Bell and Franjo Ivančić and Gail Kaiser and Baishakhi Ray},
title = {Replay without Recording of Production Bugs for Service Oriented Applications},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {452--463},
doi = {10.1145/3238147.3238186},
year = {2018},
}
Publisher's Version
Reducing Interactive Refactoring Effort via Clustering-Based Multi-objective Search
Vahid Alizadeh and Marouane Kessentini
(University of Michigan, USA)
Refactoring is nowadays widely adopted in industry because bad design decisions can be very costly and extremely risky. On the one hand, automated refactoring does not always lead to the desired design. On the other hand, manual refactoring is error-prone, time-consuming, and not practical for radical changes. Thus, recent research trends in the field have focused on integrating developer feedback into automated refactoring recommendations, because developers understand the problem domain intuitively and may have a clear target design in mind. However, this interactive process can be repetitive, expensive, and tedious, since developers must evaluate recommended refactorings and adapt them to the targeted design, especially in large systems where the number of possible strategies can grow exponentially. In this paper, we propose an interactive approach combining multi-objective search and unsupervised learning to reduce the developer's interaction effort when refactoring systems. We first generate, using multi-objective search, different possible refactoring strategies by finding a trade-off between several conflicting quality attributes. Then, an unsupervised learning algorithm clusters the different trade-off solutions, called the Pareto front, to guide developers in selecting their regions of interest and to reduce the number of refactoring options to explore. The feedback from the developer, both at the cluster and solution levels, is used to automatically generate constraints that reduce the search space in the next iterations and focus on the region of developer preferences. We selected 14 active developers to manually evaluate the effectiveness of our tool on 5 open source projects and one industrial system. The results show that the participants found their desired refactorings faster and more accurately than with the current state of the art.
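To make the clustering step concrete, the sketch below groups a set of Pareto-front refactoring solutions (each represented by its objective vector) with k-means, so the developer inspects a few representative regions rather than every solution. It illustrates only this one step, uses scikit-learn and NumPy, and the objective values are made up.

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row: objective vector of one non-dominated refactoring solution,
    # e.g. (quality improvement, number of refactorings, semantic coherence).
    pareto_front = np.array([
        [0.90, 42, 0.55], [0.88, 40, 0.60], [0.70, 15, 0.85],
        [0.68, 12, 0.88], [0.50,  5, 0.95], [0.52,  6, 0.93],
    ])

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pareto_front)

    # Show the developer one representative (centroid-closest solution) per
    # region of interest instead of the whole front.
    for c in range(3):
        members = np.where(kmeans.labels_ == c)[0]
        centroid = kmeans.cluster_centers_[c]
        dists = np.linalg.norm(pareto_front[members] - centroid, axis=1)
        rep = members[np.argmin(dists)]
        print(f"cluster {c}: solutions {members.tolist()}, representative {rep}")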
@InProceedings{ASE18p464,
author = {Vahid Alizadeh and Marouane Kessentini},
title = {Reducing Interactive Refactoring Effort via Clustering-Based Multi-objective Search},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {464--474},
doi = {10.1145/3238147.3238217},
year = {2018},
}
Publisher's Version
Software Quality
FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage
Caroline Lemieux and Koushik Sen
(University of California at Berkeley, USA)
In recent years, fuzz testing has proven itself to be one of the most effective techniques for finding correctness bugs and security vulnerabilities in practice. One particular fuzz testing tool, American Fuzzy Lop (AFL), has become popular thanks to its ease-of-use and bug-finding power. However, AFL remains limited in the bugs it can find since it simply does not cover large regions of code. If it does not cover parts of the code, it will not find bugs there. We propose a two-pronged approach to increase the coverage achieved by AFL. First, the approach automatically identifies branches exercised by few AFL-produced inputs (rare branches), which often guard code that is empirically hard to cover by naively mutating inputs. The second part of the approach is a novel mutation mask creation algorithm, which allows mutations to be biased towards producing inputs hitting a given rare branch. This mask is dynamically computed during fuzz testing and can be adapted to other testing targets. We implement this approach on top of AFL in a tool named FairFuzz. We evaluate FairFuzz on real-world programs against state-of-the-art versions of AFL. We find that on these programs FairFuzz achieves high branch coverage at a faster rate than state-of-the-art versions of AFL. In addition, on programs with nested conditional structure, it achieves sustained increases in branch coverage after 24 hours (average 10.6% increase). In qualitative analysis, we find that FairFuzz has an increased capacity to automatically discover keywords.
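A stripped-down view of the two ingredients, rare-branch identification and a mutation mask, might look like the Python sketch below (FairFuzz itself is implemented inside AFL in C). The mask is assumed to have been computed beforehand, e.g., by flipping each position and checking whether the rare branch is still hit; branch names, thresholds, and the seed are invented.

    import random

    # hit_counts[branch_id] = number of queue inputs exercising that branch.
    def rare_branches(hit_counts, rarity_cutoff=2):
        return {b for b, n in hit_counts.items() if n <= rarity_cutoff}

    # mask[i] == True  -> mutating byte i did not make the input lose the rare
    #                     branch, so that position may be mutated freely.
    # mask[i] == False -> byte i is needed to keep hitting the branch; keep it.
    def mutate_with_mask(data: bytes, mask) -> bytes:
        out = bytearray(data)
        for i, free in enumerate(mask):
            if free and random.random() < 0.05:
                out[i] = random.randrange(256)
        return bytes(out)

    hits = {"parse_header": 950, "handle_comment": 1, "main_loop": 980}
    print(rare_branches(hits))            # {'handle_comment'}
    seed = b"<!-- c -->"
    mask = [False, False, False, False, True, True, True, False, False, False]
    print(mutate_with_mask(seed, mask))   # '<!--' and '-->' preserved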
@InProceedings{ASE18p475,
author = {Caroline Lemieux and Koushik Sen},
title = {FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {475--485},
doi = {10.1145/3238147.3238176},
year = {2018},
}
Publisher's Version
Efficiently Manifesting Asynchronous Programming Errors in Android Apps
Lingling Fan, Ting Su,
Sen Chen, Guozhu Meng,
Yang Liu, Lihua Xu, and
Geguang Pu
(East China Normal University, China; Nanyang Technological University, Singapore; Chinese Academy of Sciences, China; New York University Shanghai, China)
Android, the #1 mobile app framework, enforces the single-GUI-thread model, in which a single UI thread manages GUI rendering and event dispatching. Due to this model, it is vital to avoid blocking the UI thread for responsiveness. One common practice is to offload long-running tasks into async threads. To achieve this, Android provides various async programming constructs, and leaves developers themselves to obey the rules implied by the model. However, as our study reveals, more than 25% of apps violate these rules and introduce hard-to-detect, fail-stop errors, which we term async programming errors (APEs). To this end, this paper introduces APEChecker, a technique to automatically and efficiently manifest APEs. The key idea is to characterize APEs as specific fault patterns, and to synergistically combine static analysis and dynamic UI exploration to detect and verify such errors. Among 40 real-world Android apps, APEChecker unveils 61 APEs, of which 51 are confirmed (83.6% hit rate). Specifically, APEChecker detects 3X more APEs than state-of-the-art testing tools (Monkey, Sapienz, and Stoat), and reduces testing time from half an hour to a few minutes. On a specific type of APEs, APEChecker confirms 5X more errors than the data race detection tool EventRacer, with very few false alarms.
@InProceedings{ASE18p486,
author = {Lingling Fan and Ting Su and Sen Chen and Guozhu Meng and Yang Liu and Lihua Xu and Geguang Pu},
title = {Efficiently Manifesting Asynchronous Programming Errors in Android Apps},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {486--497},
doi = {10.1145/3238147.3238170},
year = {2018},
}
Publisher's Version
Expandable Group Identification in Spreadsheets
Wensheng Dou,
Shi Han, Liang Xu,
Dongmei Zhang, and
Jun Wei
(University of Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China; Microsoft Research, China)
Spreadsheets are widely used in various business tasks. Spreadsheet users may lay out similar data and computations by repeating a block of cells (a unit) in their spreadsheets. We refer to such a unit and all of its repetitions as an expandable group. All units in an expandable group share the same or similar formats and semantics. As a data storage and management tool, expandable groups represent a fundamental structure in spreadsheets. However, existing spreadsheet systems do not recognize expandable groups. Therefore, other spreadsheet analysis tools, e.g., for data integration and fault detection, cannot utilize the structure of expandable groups to perform precise analysis.
In this paper, we propose ExpCheck to automatically extract expandable groups in spreadsheets. We observe that contiguous units that share similar formats and semantics are likely to form an expandable group. Inspired by this, we inspect the format of each cell and its corresponding semantics, and further classify cells into expandable groups according to their similarity. We evaluate ExpCheck on 120 spreadsheets randomly sampled from the EUSES and VEnron corpora. The experimental results show that ExpCheck is effective. ExpCheck successfully detects expandable groups with an F1-measure of 73.1%, significantly outperforming the state-of-the-art techniques (F1-measure of 13.3%).
@InProceedings{ASE18p498,
author = {Wensheng Dou and Shi Han and Liang Xu and Dongmei Zhang and Jun Wei},
title = {Expandable Group Identification in Spreadsheets},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {498--508},
doi = {10.1145/3238147.3238222},
year = {2018},
}
Publisher's Version
Break the Dead End of Dynamic Slicing: Localizing Data and Control Omission Bug
Yun Lin, Jun Sun, Lyly Tran, Guangdong Bai, Haijun Wang, and Jinsong Dong
(National University of Singapore, Singapore; Singapore University of Technology and Design, Singapore; Singapore Institute of Technology, Singapore; Nanyang Technological University, Singapore)
Dynamic slicing is a common way of identifying the root cause when a program fault is revealed. With the dynamic slicing technique, programmers can follow data and control flow along the program execution trace to the root cause. However, the technique usually fails to work on omission bugs, i.e., faults which are caused by failing to execute some code. In many cases, dynamic slicing over-skips the root cause when an omission bug happens, leading the debugging process to a dead end. In this work, we conduct an empirical study on the omission bugs in the Defects4J bug repository. Our study shows that (1) omission bugs are prevalent (46.4%) among all the studied bugs; (2) there are repeating patterns in the causes and fixes of the omission bugs; (3) the patterns of fixing omission bugs serve as a strong hint to break the slicing dead end. Based on our findings, we train a neural network model on the omission bugs in the Defects4J repository to recommend where to look when slicing can no longer make progress. We conduct an experiment by applying our approach to 3193 mutated omission bugs which slicing fails to locate. The results show that our approach outperforms a random baseline on breaking the dead end and localizing the mutated omission bugs (63.8% over 2.8%).
@InProceedings{ASE18p509,
author = {Yun Lin and Jun Sun and Lyly Tran and Guangdong Bai and Haijun Wang and Jinsong Dong},
title = {Break the Dead End of Dynamic Slicing: Localizing Data and Control Omission Bug},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {509--519},
doi = {10.1145/3238147.3238163},
year = {2018},
}
Publisher's Version
Architecture and Requirements
A Genetic Algorithm for Goal-Conflict Identification
Renzo Degiovanni, Facundo Molina, Germán Regis, and
Nazareno Aguirre
(National University of Río Cuarto, Argentina; CONICET, Argentina)
Goal-conflict analysis has been widely used as an abstraction for risk analysis in goal-oriented requirements engineering approaches. In this context, where the expected behaviour of the system-to-be is captured in terms of domain properties and goals, identifying combinations of circumstances that may make the goals diverge, i.e., not be satisfied as a whole, is of utmost importance.
Various approaches have been proposed in order to automatically identify boundary conditions, i.e., formulas capturing goal-divergent situations, but they either apply only to some specific goal expressions, or are affected by scalability issues that make them applicable only to relatively small specifications. In this paper, we present a novel approach to automatically identify boundary conditions, using evolutionary computation. More precisely, we develop a genetic algorithm that, given the LTL formulation of the domain properties and the goals, searches for formulas that capture divergences in the specification. We exploit a modern LTL satisfiability checker to successfully guide our genetic algorithm to the solutions. We assess our technique on a set of case studies, and show that our genetic algorithm is able to find boundary conditions that cannot be generated by related approaches, and is able to efficiently scale to LTL specifications that other approaches are unable to deal with.
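In outline, the genetic algorithm manipulates candidate LTL formulas and scores them with a satisfiability checker: a candidate is rewarded when it is consistent with the domain but makes the conjunction of domain, goals, and candidate unsatisfiable (i.e., the goals diverge). The Python skeleton below shows only that search loop; the formula representation, the mutation and crossover operators, and the check_sat callback (an LTL satisfiability checker) are placeholders, not the paper's implementation.

    import random

    def fitness(candidate, check_sat):
        """Higher is better. check_sat(formulas) -> True iff the conjunction is
        satisfiable. A boundary condition must be consistent with the domain but
        make domain + goals + candidate unsatisfiable."""
        consistent = check_sat(["DOMAIN", candidate])
        divergent = not check_sat(["DOMAIN", "GOAL_1", "GOAL_2", candidate])
        return int(consistent) + int(divergent)

    def genetic_search(initial_pop, mutate, crossover, check_sat, generations=100):
        population = list(initial_pop)
        for _ in range(generations):
            scored = sorted(population, key=lambda c: fitness(c, check_sat), reverse=True)
            if fitness(scored[0], check_sat) == 2:   # found a boundary condition
                return scored[0]
            parents = scored[: len(scored) // 2]     # truncation selection
            children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                        for _ in range(len(population) - len(parents))]
            population = parents + children
        return None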
@InProceedings{ASE18p520,
author = {Renzo Degiovanni and Facundo Molina and Germán Regis and Nazareno Aguirre},
title = {A Genetic Algorithm for Goal-Conflict Identification},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {520--531},
doi = {10.1145/3238147.3238220},
year = {2018},
}
Publisher's Version
Info
Mobile Analysis
Understanding and Detecting Callback Compatibility Issues for Android Applications
Huaxun Huang,
Lili Wei, Yepang Liu, and
Shing-Chi Cheung
(Hong Kong University of Science and Technology, China; Southern University of Science and Technology, China)
The control flows of Android apps are largely driven by the protocols that govern how callback APIs are invoked in response to various events. When these callback APIs evolve along with the Android framework, the changes in their invocation protocols can induce unexpected control flows to existing Android apps, causing various compatibility issues. We refer to these issues as callback compatibility issues. While Android framework updates have received due attention, little is known about their impacts on app control flows and the callback compatibility issues thus induced. To bridge the gap, we examined Android documentations and conducted an empirical study on 100 real-world callback compatibility issues to investigate how these issues were induced by callback API evolutions. Based on our empirical findings, we propose a graph-based model to capture the control flow inconsistencies caused by API evolutions and devise a static analysis technique, Cider, to detect callback compatibility issues. Our evaluation of Cider on 20 popular open-source Android apps shows that Cider is effective. It detected 13 new callback compatibility issues in these apps, among which 12 issues were confirmed and 9 issues were fixed.
@InProceedings{ASE18p532,
author = {Huaxun Huang and Lili Wei and Yepang Liu and Shing-Chi Cheung},
title = {Understanding and Detecting Callback Compatibility Issues for Android Applications},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {532--542},
doi = {10.1145/3238147.3238181},
year = {2018},
}
Publisher's Version
Detecting and Summarizing GUI Changes in Evolving Mobile Apps
Kevin Moran, Cody Watson, John Hoskins, George Purnell, and
Denys Poshyvanyk
(College of William and Mary, USA)
Mobile applications have become a popular software development domain in recent years due in part to a large user base, capable hardware, and accessible platforms. However, mobile developers also face unique challenges, including pressure for frequent releases to keep pace with rapid platform evolution, hardware iteration, and user feedback. Due to this rapid pace of evolution, developers need automated support for documenting the changes made to their apps in order to aid in program comprehension. One of the more challenging types of changes to document in mobile apps are those made to the graphical user interface (GUI) due to its abstract, pixel-based representation. In this paper, we present a fully automated approach, called GCAT, for detecting and summarizing GUI changes during the evolution of mobile apps. GCAT leverages computer vision techniques and natural language generation to accurately and concisely summarize changes made to the GUI of a mobile app between successive commits or releases. We evaluate the performance of our approach in terms of its precision and recall in detecting GUI changes compared to developer specified changes, and investigate the utility of the generated change reports in a controlled user study. Our results indicate that GCAT is capable of accurately detecting and classifying GUI changes, outperforming developers, while providing useful documentation.
@InProceedings{ASE18p543,
author = {Kevin Moran and Cody Watson and John Hoskins and George Purnell and Denys Poshyvanyk},
title = {Detecting and Summarizing GUI Changes in Evolving Mobile Apps},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {543--553},
doi = {10.1145/3238147.3238203},
year = {2018},
}
Publisher's Version
Empirically Assessing Opportunities for Prefetching and Caching in Mobile Apps
Yixue Zhao, Paul Wat, Marcelo Schmitt Laser, and
Nenad Medvidović
(University of Southern California, USA)
Network latency in mobile software has a large impact on user experience, with potentially severe economic consequences. Prefetching and caching have been shown effective in reducing the latencies in browser-based systems. However, those techniques cannot be directly applied to the emerging domain of mobile apps because of the differences in network interactions. Moreover, there is a lack of research on prefetching and caching techniques that may be suitable for the mobile app domain, and it is not clear whether such techniques can be effective or whether they are even feasible. This paper takes the first step toward answering these questions by conducting a comprehensive study to understand the characteristics of HTTP requests in over 1,000 popular Android apps. Our work focuses on the prefetchability of requests using static program analysis techniques and cacheability of resulting responses. We find that there is a substantial opportunity to leverage prefetching and caching in mobile apps, but that suitable techniques must take into account the nature of apps’ network interactions and idiosyncrasies such as untrustworthy HTTP header information. Our observations provide guidelines for developers to utilize prefetching and caching schemes in app development, and motivate future research in this area.
@InProceedings{ASE18p554,
author = {Yixue Zhao and Paul Wat and Marcelo Schmitt Laser and Nenad Medvidović},
title = {Empirically Assessing Opportunities for Prefetching and Caching in Mobile Apps},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {554--564},
doi = {10.1145/3238147.3238215},
year = {2018},
}
Publisher's Version
Safe Stream-Based Programming with Refinement Types
Benno Stein,
Lazaro Clapp, Manu Sridharan, and
Bor-Yuh Evan Chang
(University of Colorado Boulder, USA; Uber Technologies, USA)
In stream-based programming, data sources are abstracted as a stream of values that can be manipulated via callback functions. Stream-based programming is exploding in popularity, as it provides a powerful and expressive paradigm for handling asynchronous data sources in interactive software. However, high-level stream abstractions can also make it difficult for developers to reason about control- and data-flow relationships in their programs. This is particularly impactful when asynchronous stream-based code interacts with thread-limited features such as UI frameworks that restrict UI access to a single thread, since the threading behavior of streaming constructs is often non-intuitive and insufficiently documented. In this paper, we present a type-based approach that can statically prove the thread-safety of UI accesses in stream-based software. Our key insight is that the fluent APIs of stream-processing frameworks enable the tracking of threads via type-refinement, making it possible to reason automatically about what thread a piece of code runs on -- a difficult problem in general. We implement the system as an annotation-based Java typechecker for Android programs built upon the popular ReactiveX framework and evaluate its efficacy by annotating and analyzing 8 open-source apps, where we find 33 instances of unsafe UI access while incurring an annotation burden of only one annotation per 186 source lines of code. We also report on our experience applying the typechecker to two much larger apps from the Uber Technologies, Inc. codebase, where it currently runs on every code change and blocks changes that introduce potential threading bugs.
@InProceedings{ASE18p565,
author = {Benno Stein and Lazaro Clapp and Manu Sridharan and Bor-Yuh Evan Chang},
title = {Safe Stream-Based Programming with Refinement Types},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {565--576},
doi = {10.1145/3238147.3238174},
year = {2018},
}
Publisher's Version
Repair
Automated Model Repair for Alloy
Kaiyuan Wang, Allison Sullivan, and Sarfraz Khurshid
(University of Texas at Austin, USA)
Automated program repair is an active research area. However, existing research focuses mostly on imperative code, e.g. in Java. In this paper, we study the problem of repairing declarative models in Alloy -- a first order relational logic with transitive closure. We introduce ARepair, the first technique for repairing Alloy models. ARepair follows the spirit of traditional automated program repair techniques. Specifically, ARepair takes as input a faulty Alloy model and a test suite that contains some failing test, and outputs a repaired model that is correct with respect to the given tests. ARepair integrates ideas from mutation testing and program synthesis to provide an effective solution for repairing Alloy models. The experimental results show that ARepair can fix 28 out of 38 real-world faulty models we collected.
@InProceedings{ASE18p577,
author = {Kaiyuan Wang and Allison Sullivan and Sarfraz Khurshid},
title = {Automated Model Repair for Alloy},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {577--588},
doi = {10.1145/3238147.3238162},
year = {2018},
}
Publisher's Version
PFix: Fixing Concurrency Bugs Based on Memory Access Patterns
Huarui Lin,
Zan Wang,
Shuang Liu, Jun Sun, Dongdi Zhang, and Guangning Wei
(Tianjin University, China; Singapore University of Technology and Design, Singapore)
Concurrency bugs of a multi-threaded program may only manifest with certain scheduling, i.e., they are heisenbugs which are observed only from time to time if we execute the same program with the same input multiple times. They are notoriously hard to fix. In this work, we propose an approach to automatically fix concurrency bugs. Compared to previous approaches, our key idea is to systematically fix concurrency bugs by inferring locking policies from failure-inducing memory-access patterns. That is, we automatically identify memory-access patterns which are correlated with the manifestation of the bug, and then conjecture what the intended locking policy of the program is. Afterwards, we fix the program by implementing the locking policy so that the failure-inducing memory-access patterns are made impossible. We have implemented our approach in a toolkit called PFix which supports Java programs. We applied PFix to a set of 23 concurrency bugs and are able to automatically fix 19 of them. In comparison, Grail, which is the state-of-the-art tool for fixing concurrency bugs in Java programs, can only fix 3 of them correctly.
@InProceedings{ASE18p589,
author = {Huarui Lin and Zan Wang and Shuang Liu and Jun Sun and Dongdi Zhang and Guangning Wei},
title = {PFix: Fixing Concurrency Bugs Based on Memory Access Patterns},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {589--600},
doi = {10.1145/3238147.3238198},
year = {2018},
}
Publisher's Version
Generating Reusable Web Components from Mockups
Mohammad Bajammal, Davood Mazinanian, and
Ali Mesbah
(University of British Columbia, Canada)
The transformation of a user interface mockup designed by a graphic designer to web components in the final app built by a web developer is often laborious, involving manual and time consuming steps. We propose an approach to automate this aspect of web development by generating reusable web components from a mockup. Our approach employs visual analysis of the mockup, and unsupervised learning of visual cues to create reusable web components (e.g., React components). We evaluated our approach, implemented in a tool called VizMod, on five real-world web mockups, and assessed the transformations and generated components through comparison with web development experts. The results show that VizMod achieves on average 94% precision and 75% recall in terms of agreement with the developers' assessment. Furthermore, the refactorings yielded 22% code reusability, on average.
@InProceedings{ASE18p601,
author = {Mohammad Bajammal and Davood Mazinanian and Ali Mesbah},
title = {Generating Reusable Web Components from Mockups},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {601--611},
doi = {10.1145/3238147.3238194},
year = {2018},
}
Publisher's Version
Semantic Crash Bucketing
Rijnard van Tonder, John Kotheimer, and
Claire Le Goues
(Carnegie Mellon University, USA)
Precise crash triage is important for automated dynamic testing tools, like fuzzers. At scale, fuzzers produce millions of crashing inputs. Fuzzers use heuristics, like stack hashes, to cut down on duplicate bug reports. These heuristics are fast, but often imprecise: even after deduplication, hundreds of uniquely reported crashes can still correspond to the same bug. Remaining crashes must be inspected manually, incurring considerable effort. In this paper we present Semantic Crash Bucketing, a generic method for precise crash bucketing using program transformation. Semantic Crash Bucketing maps crashing inputs to unique bugs as a function of changing a program (i.e., a semantic delta). We observe that a real bug fix precisely identifies crashes belonging to the same bug. Our insight is to approximate real bug fixes with lightweight program transformation to obtain the same level of precision. Our approach uses (a) patch templates and (b) semantic feedback from the program to automatically generate and apply approximate fixes for general bug classes. Our evaluation shows that approximate fixes are competitive with using true fixes for crash bucketing, and significantly outperform built-in deduplication techniques for three state-of-the-art fuzzers.
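The bucketing idea can be sketched compactly: apply an approximate fix derived from a patch template (e.g., guard a suspect pointer dereference), re-run all crashing inputs, and group together exactly those inputs that the fix silences. The Python sketch below is a schematic stand-in for that loop, not the authors' tooling; the crash oracle and fix names are invented.

    def bucket_crashes(crashing_inputs, candidate_fixes, crashes_with):
        """crashes_with(fix, inp) -> True if the program patched with `fix`
        still crashes on `inp` (fix=None means the unpatched program).
        Inputs silenced by the same approximate fix land in the same bucket."""
        buckets = {}
        for inp in crashing_inputs:
            assert crashes_with(None, inp)          # sanity: crashes unpatched
            for fix in candidate_fixes:
                if not crashes_with(fix, inp):      # this fix silences the crash
                    buckets.setdefault(fix, []).append(inp)
                    break
            else:
                buckets.setdefault("unbucketed", []).append(inp)
        return buckets

    # Example with a fake oracle: fix "guard_p_at_parse" silences inputs 1 and 3.
    oracle = lambda fix, inp: not (fix == "guard_p_at_parse" and inp in {1, 3})
    print(bucket_crashes([1, 2, 3], ["guard_p_at_parse"], oracle))
    # {'guard_p_at_parse': [1, 3], 'unbucketed': [2]}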
@InProceedings{ASE18p612,
author = {Rijnard van Tonder and John Kotheimer and Claire Le Goues},
title = {Semantic Crash Bucketing},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {612--622},
doi = {10.1145/3238147.3238200},
year = {2018},
}
Publisher's Version
Verification 2
A Symbolic Model Checking Approach to the Analysis of String and Length Constraints
Hung-En Wang, Shih-Yu Chen, Fang Yu, and Jie-Hong R. Jiang
(National Taiwan University, Taiwan; National Chengchi University, Taiwan)
Strings with length constraints are prominent in software security analysis. Recent endeavors have made significant progress in developing constraint solvers for strings and integers. Most prior methods are based on deduction with inference rules or analysis using automata. The former may be inefficient when the constraints involve complex string manipulations such as language replacement; the latter may not be easily extended to handle length constraints and may be inadequate for counterexample generation due to approximation. Inspired by recent work on string analysis with logic circuit representation, we propose a new method for solving strings with length constraints via an implicit representation of automata with length encoding. The length-encoded automata have infinitely many states and can represent languages beyond regular expressions. By converting string and length constraints into a dependency graph of manipulations over length-encoded automata, a symbolic model checker for infinite state systems can be leveraged as an engine for the analysis of string and length constraints. Experiments show that our method has a unique capability of handling complex string and length constraints not solvable by existing methods.
@InProceedings{ASE18p623,
author = {Hung-En Wang and Shih-Yu Chen and Fang Yu and Jie-Hong R. Jiang},
title = {A Symbolic Model Checking Approach to the Analysis of String and Length Constraints},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {623--633},
doi = {10.1145/3238147.3238189},
year = {2018},
}
Publisher's Version
Info
Domain-Independent Multi-threaded Software Model Checking
Dirk Beyer and
Karlheinz Friedberger
(LMU Munich, Germany)
Recent development of software aims at massively parallel execution, because of the trend to increase the number of processing units per CPU socket. But many approaches for program analysis are not designed to benefit from multi-threaded execution and lack support to utilize multi-core computers. Rewriting existing algorithms is difficult and error-prone, and the design of new parallel algorithms also has limitations. An orthogonal problem is the granularity: computing each successor state in parallel seems too fine-grained, so the open question is to find the right structural level for parallel execution.
We propose an elegant solution to these problems: block summaries should be computed in parallel. Many successful approaches to software verification are based on summaries of control-flow blocks, large blocks, or function bodies. Block-abstraction memoization is a successful domain-independent approach for summary-based program analysis. We redesigned the verification approach of block-abstraction memoization starting from its original recursive definition, such that it can run in a parallel manner to utilize the available computation resources without losing its advantage of being independent from a certain abstract domain. We present an implementation of our new approach for multi-core shared-memory machines. The experimental evaluation shows that our summary-based approach has no significant overhead compared to the existing sequential approach and that it has a significant speedup when using multi-threading.
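The granularity argued for here, one task per block summary, can be pictured with a thread pool: each block abstraction is computed as an independent task and memoized for reuse, while the composition of summaries remains sequential. The Python sketch below is schematic only (it is not the CPAchecker implementation), and the block analysis callback is a placeholder.

    from concurrent.futures import ThreadPoolExecutor

    def summarize_block(block_id, entry_state, analyze_block):
        """Compute the abstraction (summary) of one block from its entry state."""
        return block_id, analyze_block(block_id, entry_state)

    def parallel_block_summaries(blocks, entry_states, analyze_block, workers=4):
        summaries = {}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(summarize_block, b, entry_states[b], analyze_block)
                       for b in blocks]
            for f in futures:
                block_id, summary = f.result()
                summaries[block_id] = summary    # memoized for later reuse
        return summaries

    # Toy 'analysis': the summary of a block is just a tagged string.
    analyze = lambda b, s: f"summary({b}, entry={s})"
    print(parallel_block_summaries(["f", "g", "h"],
                                   {"f": "top", "g": "x>0", "h": "top"}, analyze))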
@InProceedings{ASE18p634,
author = {Dirk Beyer and Karlheinz Friedberger},
title = {Domain-Independent Multi-threaded Software Model Checking},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {634--644},
doi = {10.1145/3238147.3238195},
year = {2018},
}
Publisher's Version
Published Artifact
Artifacts Available
Scheduling Constraint Based Abstraction Refinement for Weak Memory Models
Liangze Yin,
Wei Dong,
Wanwei Liu, and
Ji Wang
(National University of Defense Technology, China)
Scheduling constraint based abstraction refinement (SCAR) is one of the most efficient methods for verifying programs under sequential consistency (SC). However, most multi-processor architectures implement weak memory models (WMMs) in order to improve the performance of a program. Due to the nondeterministic execution of memory operations by the same thread, the behavior of a program under WMMs is much more complex than that under SC, which significantly increases the verification complexity. This paper elegantly extends the SCAR method to WMMs such as TSO and PSO. To capture the order requirements of an abstraction counterexample under WMMs, we have enriched the event order graph (EOG) of a counterexample such that it is competent for both SC and WMMs. We have also proposed a unified EOG generation method which can always obtain a minimal EOG efficiently. Experimental results on a large set of multi-threaded C programs show promising results of our method. It significantly outperforms state-of-the-art tools, and the time and memory it requires to verify a program under TSO and PSO are roughly comparable to those under SC.
@InProceedings{ASE18p645,
author = {Liangze Yin and Wei Dong and Wanwei Liu and Ji Wang},
title = {Scheduling Constraint Based Abstraction Refinement for Weak Memory Models},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {645--655},
doi = {10.1145/3238147.3238223},
year = {2018},
}
Publisher's Version
Code Differencing and Merging
Datalog-Based Scalable Semantic Diffing of Concurrent Programs
Chungha Sung,
Shuvendu K. Lahiri, Constantin Enea, and Chao Wang
(University of Southern California, USA; Microsoft Research, USA; University of Paris Diderot, France)
When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm that the change is correct, i.e., that it does not introduce unexpected behavior. However, manually comparing two programs to identify the semantic difference is labor intensive and error prone, whereas techniques based on model checking are computationally expensive. To fill the gap, we develop a fast and approximate static analysis for computing synchronization differences of two programs. The method is fast because, instead of relying on heavy-weight model checking techniques, it leverages a polynomial-time Datalog-based program analysis framework to compute differentiating data-flow edges, i.e., edges allowed by one program but not the other. Although approximation is used, our method is sufficiently accurate due to careful design of the Datalog inference rules and iterative increase of the required data-flow edges for representing a difference. We have implemented our method and evaluated it on a large number of multithreaded C programs to confirm its ability to produce, often within seconds, the same differences obtained by humans; in contrast, prior techniques based on model checking take minutes or even hours and thus can be 10x to 1000x slower.
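The notion of a differentiating data-flow edge has a compact set-based reading: compute the data-flow edges derivable for each program version (in the paper, as the fixpoint of Datalog rules) and report edges allowed by one version but not the other. The Python sketch below shows only that final comparison step, with hand-written edge sets standing in for the Datalog fixpoint; the statement labels are invented.

    # Each data-flow edge: (writing statement, reading statement, shared variable).
    # In the real analysis these sets are the fixpoint of Datalog inference rules;
    # here they are written out by hand for two versions of a program.
    edges_v1 = {("t1:x=1", "t2:r=x", "x"), ("t2:y=2", "t1:s=y", "y")}
    edges_v2 = {("t1:x=1", "t2:r=x", "x")}        # v2 added a lock around y

    def differentiating_edges(a, b):
        """Edges allowed by exactly one of the two program versions."""
        return {"only_in_first": a - b, "only_in_second": b - a}

    diff = differentiating_edges(edges_v1, edges_v2)
    print(diff["only_in_first"])    # {('t2:y=2', 't1:s=y', 'y')} -- removed by the change
    print(diff["only_in_second"])   # set()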
@InProceedings{ASE18p656,
author = {Chungha Sung and Shuvendu K. Lahiri and Constantin Enea and Chao Wang},
title = {Datalog-Based Scalable Semantic Diffing of Concurrent Programs},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {656--666},
doi = {10.1145/3238147.3238211},
year = {2018},
}
Publisher's Version
Info
αDiff: Cross-Version Binary Code Similarity Detection with DNN
Bingchang Liu,
Wei Huo,
Chao Zhang, Wenchao Li,
Feng Li, Aihua Piao, and Wei Zou
(Institute of Information Engineering at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Tsinghua University, China)
Binary code similarity detection (BCSD) has many applications, including patch analysis, plagiarism detection, malware detection, and vulnerability search. Existing solutions usually perform comparisons over specific syntactic features extracted from binary code, based on expert knowledge. They have either high performance overheads or low detection accuracy. Moreover, few solutions are suitable for detecting similarities between cross-version binaries, which may diverge not only in syntactic structure but also slightly in semantics.
In this paper, we propose a solution, αDiff, employing three semantic features to address the cross-version BCSD challenge. It first extracts the intra-function feature of each binary function using a deep neural network (DNN). The DNN works directly on the raw bytes of each function, rather than on features (e.g., syntactic structures) provided by experts. αDiff further analyzes the function call graph of each binary, which is relatively stable in cross-version binaries, and extracts the inter-function and inter-module features. Then, a distance is computed based on these three features and used for BCSD. We have implemented a prototype of αDiff and evaluated it on a dataset with about 2.5 million samples. The results show that αDiff outperforms state-of-the-art static solutions by over 10 percentage points on average in different BCSD settings.
@InProceedings{ASE18p667,
author = {Bingchang Liu and Wei Huo and Chao Zhang and Wenchao Li and Feng Li and Aihua Piao and Wei Zou},
title = {αDiff: Cross-Version Binary Code Similarity Detection with DNN},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {667--678},
doi = {10.1145/3238147.3238199},
year = {2018},
}
Publisher's Version
ClDiff: Generating Concise Linked Code Differences
Kaifeng Huang,
Bihuan Chen,
Xin Peng, Daihong Zhou, Ying Wang,
Yang Liu, and Wenyun Zhao
(Fudan University, China; Nanyang Technological University, Singapore)
Analyzing and understanding source code changes is important in a variety of software maintenance tasks. To this end, many code differencing and code change summarization methods have been proposed. For some tasks (e.g. code review and software merging), however, those differencing methods generate too fine-grained a representation of code changes, and those summarization methods generate too coarse-grained a representation of code changes. Moreover, they do not consider the relationships among code changes. Therefore, the generated differences or summaries make it not easy to analyze and understand code changes in some software maintenance tasks.
In this paper, we propose a code differencing approach, named CLDIFF, to generate concise linked code differences whose granularity is in between the existing code differencing and code change summarization methods. The goal of CLDIFF is to generate more easily understandable code differences. CLDIFF takes source code files before and after changes as inputs, and consists of three steps. First, it pre-processes the source code files by pruning unchanged declarations from the parsed abstract syntax trees. Second, it generates concise code differences by grouping fine-grained code differences at or above the statement level and describing high-level changes in each group. Third, it links the related concise code differences according to five pre-defined links. Experiments with 12 Java projects (74,387 commits) and a human study with 10 participants have indicated the accuracy, conciseness, performance and usefulness of CLDIFF.
@InProceedings{ASE18p679,
author = {Kaifeng Huang and Bihuan Chen and Xin Peng and Daihong Zhou and Ying Wang and Yang Liu and Wenyun Zhao},
title = {ClDiff: Generating Concise Linked Code Differences},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {679--690},
doi = {10.1145/3238147.3238219},
year = {2018},
}
Publisher's Version
Mobile Security
Characterizing and Identifying Misexposed Activities in Android Applications
Jiwei Yan, Xi Deng, Ping Wang, Tianyong Wu,
Jun Yan, and
Jian Zhang
(Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)
Exported Activity (EA), a kind of activity in Android apps that can be launched by external components, is one of the most important inter-component communication (ICC) mechanisms used to realize interaction and cooperation among multiple apps. Existing works have pointed out that, once exposed, an activity becomes vulnerable to malicious ICC attacks, such as permission leakage attacks. Unfortunately, it is observed that a considerable number of activities in commercial apps are exposed inadvertently, while few works have studied the necessity and reasonability of such exposure. This work takes the first step toward systematically studying the exposing behavior of EAs by analyzing 13,873 Android apps. It utilizes the EA-associated call relationships extracted from byte-code via data-flow analysis, as well as the launch conditions obtained from the manifest files, to guide the study on the usage and misexposure of EAs. The empirical findings are that the EA mechanism is widely adopted in development and that activities are liable to be misexposed due to developers' misunderstanding or carelessness. Further study on subsets of apps selected according to different criteria indicates that the misexposed EAs have specific characteristics, which are manually summarized into six typical misuse patterns. As a consequence, ten heuristics are designed to decide whether an activity should be exposed or not, and they are implemented in an automatic tool called Mist. Experiments on the collected apps show that around one fifth of the EAs are unnecessarily exposed, and that for more than one third of the EAs the exposure is not advisable.
@InProceedings{ASE18p691,
author = {Jiwei Yan and Xi Deng and Ping Wang and Tianyong Wu and Jun Yan and Jian Zhang},
title = {Characterizing and Identifying Misexposed Activities in Android Applications},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {691--701},
doi = {10.1145/3238147.3238164},
year = {2018},
}
Publisher's Version
A Tale of Two Cities: How WebView Induces Bugs to Android Applications
Jiajun Hu,
Lili Wei, Yepang Liu,
Shing-Chi Cheung, and
Huaxun Huang
(Hong Kong University of Science and Technology, China; Southern University of Science and Technology, China)
WebView is a widely used Android component that augments a native app with web browser capabilities. It eases the interactions between an app’s native code and web code. However, the interaction mechanism of WebView induces new types of bugs in Android apps. Understanding the characteristics and manifestation of these WebView-induced bugs (ωBugs for short) facilitates the correct usages of WebViews in Android apps. This motivates us to conduct the first empirical study on ωBugs based on those found in popular open-source Android apps. Our study identified the major root causes and consequences of ωBugs and made interesting observations that can be leveraged for detecting and diagnosing ωBugs. Based on the empirical study, we further propose an automated testing technique ωDroid to effectively expose ωBugs in Android apps. In our experiments, ωDroid successfully discovered 30 unique and previously-unknown ωBugs when applied to 146 open-source Android apps. We reported the 30 ωBugs to the corresponding app developers. Out of these 30 ωBugs, 14 were confirmed and 7 of them were fixed. This shows that ωDroid can effectively detect ωBugs that are of the developers’ concern.
@InProceedings{ASE18p702,
author = {Jiajun Hu and Lili Wei and Yepang Liu and Shing-Chi Cheung and Huaxun Huang},
title = {A Tale of Two Cities: How WebView Induces Bugs to Android Applications},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {702--713},
doi = {10.1145/3238147.3238180},
year = {2018},
}
Publisher's Version
Dual-Force: Understanding WebView Malware via Cross-Language Forced Execution
Zhenhao Tang, Juan Zhai,
Minxue Pan, Yousra Aafer, Shiqing Ma, Xiangyu Zhang, and Jianhua Zhao
(Nanjing University, China; Purdue University, USA)
Modern Android malware tends to use advanced techniques to cover its malicious behaviors. It usually features multi-staged, condition-guarded and environment-specific payloads. An increasing number of malware samples utilize WebView, particularly the two-way communication between Java and JavaScript, to evade detection and analysis by existing techniques. We propose Dual-Force, a forced execution technique which simultaneously forces both Java and JavaScript code of WebView applications to execute along various paths without requiring any environment setup or manually provided inputs. As such, the hidden payloads of WebView malware are forcefully exposed. The technique features a novel execution model that allows forced execution to suppress exceptions and continue execution. Experimental results show that Dual-Force precisely exposes the malicious payload in 119 out of 150 WebView malware samples. Compared to the state of the art, Dual-Force can expose 23% more malicious behaviors.
@InProceedings{ASE18p714,
author = {Zhenhao Tang and Juan Zhai and Minxue Pan and Yousra Aafer and Shiqing Ma and Xiangyu Zhang and Jianhua Zhao},
title = {Dual-Force: Understanding WebView Malware via Cross-Language Forced Execution},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {714--725},
doi = {10.1145/3238147.3238221},
year = {2018},
}
Publisher's Version
Self-Protection of Android Systems from Inter-component Communication Attacks
Mahmoud Hammad,
Joshua Garcia, and
Sam Malek
(University of California at Irvine, USA)
The current security mechanisms for Android apps, both static and dynamic analysis approaches, are insufficient for detection and prevention of the increasingly dynamic and sophisticated security attacks. Static analysis approaches suffer from false positives whereas dynamic analysis approaches suffer from false negatives. Moreover, they all lack the ability to efficiently analyze systems with incremental changes—such as adding/removing apps, granting/revoking permissions, and dynamic components’ communications. Each time the system changes, the entire analysis needs to be repeated, making the existing approaches inefficient for practical use. To mitigate their shortcomings, we have developed SALMA, a novel self-protecting Android software system that monitors itself and adapts its behavior at runtime to prevent a wide range of security risks. SALMA maintains a precise architectural model, represented as a Multiple-Domain-Matrix, and incrementally and efficiently analyzes an Android system in response to incremental system changes. The maintained architecture is used to reason about the running Android system. Every time the system changes, SALMA determines (1) the impacted part of the system, and (2) the subset of the security analyses that need to be performed, thereby greatly improving the performance of the approach. Our experimental results on hundreds of real-world apps corroborate SALMA’s scalability and efficiency as well as its ability to detect and prevent security attacks at runtime with minimal disruption.
@InProceedings{ASE18p726,
author = {Mahmoud Hammad and Joshua Garcia and Sam Malek},
title = {Self-Protection of Android Systems from Inter-component Communication Attacks},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {726--737},
doi = {10.1145/3238147.3238207},
year = {2018},
}
Publisher's Version
Experience Papers
An Empirical Study of Android Test Generation Tools in Industrial Cases
Wenyu Wang, Dengfeng Li, Wei Yang, Yurui Cao, Zhenwen Zhang,
Yuetang Deng, and Tao Xie
(University of Illinois at Urbana-Champaign, USA; University of Texas at Dallas, USA; Tencent, China)
User Interface (UI) testing is a popular approach to ensure the quality of mobile apps.
Numerous test generation tools have been developed to support UI testing on mobile apps, especially for Android apps.
Previous work evaluates and compares different test generation tools using only relatively simple open-source apps, while real-world industrial apps tend to have more complex functionalities and implementations.
There is no direct comparison among test generation tools with regard to effectiveness and ease-of-use on these industrial apps.
To address this limitation, we study existing state-of-the-art or state-of-the-practice test generation tools on 68 widely-used industrial apps.
We directly compare the tools with regard to code coverage and fault-detection ability.
According to our results, Monkey, a state-of-the-practice tool from Google, achieves the highest method coverage on 22 of 41 apps whose method coverage data can be obtained.
Of all 68 apps under study, Monkey also achieves the highest activity coverage on 35 apps, while Stoat, a state-of-the-art tool, is able to trigger the highest number of unique crashes on 23 apps.
By analyzing the experimental results, we provide suggestions for combining different test generation tools to achieve better performance.
We also report our experience in applying these tools to industrial apps under study.
Our study results give insights on how Android UI test generation tools could be improved to better handle complex industrial apps.
@InProceedings{ASE18p738,
author = {Wenyu Wang and Dengfeng Li and Wei Yang and Yurui Cao and Zhenwen Zhang and Yuetang Deng and Tao Xie},
title = {An Empirical Study of Android Test Generation Tools in Industrial Cases},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {738--748},
doi = {10.1145/3238147.3240465},
year = {2018},
}
Publisher's Version
Achieving Test Automation with Testers without Coding Skills: An Industrial Report
Davrondzhon Gafurov, Arne Erik Hurum, and Martin Markman
(Norwegian Directorate of eHealth, Norway)
We present a process driven test automation solution which enables delegating (part of) automation tasks from test automation engineer (expensive resource) to test analyst (non-developer, less expensive).
In our approach, a test automation engineer implements test steps (or actions) which are executed automatically. Such automated test steps represent user actions in the system under test and are specified in natural language understandable by a non-technical person. Then, a test analyst with domain knowledge organizes the automated steps, combined with test input, to create an automated test case. It should be emphasized that the test analyst does not need to possess programming skills to create, modify or execute automated test cases. We refine a benchmark test automation architecture to make it better suited for an effective separation and sharing of responsibilities between the test automation engineer (with coding skills) and the test analyst (with domain knowledge).
In addition, we propose a metric to empirically estimate the cooperation between the test automation engineer's and the test analyst's work.
The proposed automation solution has been defined based on our experience in the development and maintenance of Helsenorg, the national electronic health service in Norway, which has had over one million visits per month over the past year; we still use it to automate the execution of regression tests.
@InProceedings{ASE18p749,
author = {Davrondzhon Gafurov and Arne Erik Hurum and Martin Markman},
title = {Achieving Test Automation with Testers without Coding Skills: An Industrial Report},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {749--756},
doi = {10.1145/3238147.3240463},
year = {2018},
}
Publisher's Version
Navigating the Maze: The Impact of Configurability in Bioinformatics Software
Mikaela Cashman, Myra B. Cohen, Priya Ranjan, and Robert W. Cottingham
(University of Nebraska-Lincoln, USA; Oak Ridge National Laboratory, USA)
The bioinformatics software domain contains thousands of applications for automating tasks such as the pairwise alignment of DNA sequences, building and reasoning about metabolic models or simulating growth of an organism. Its end users range from sophisticated developers to those with little computational experience. In response to their needs, developers provide many options to customize the way their algorithms are tuned. Yet there is little or no automated help for the user in determining the consequences or impact of the options they choose. In this paper we describe our experience working with configurable bioinformatics tools. We find limited documentation and help for combining and selecting options along with variation in both functionality and performance. We also find previously undetected faults. We summarize our findings with a set of lessons learned, and present a roadmap for creating automated techniques to interact with bioinformatics software. We believe these will generalize to other types of scientific software.
@InProceedings{ASE18p757,
author = {Mikaela Cashman and Myra B. Cohen and Priya Ranjan and Robert W. Cottingham},
title = {Navigating the Maze: The Impact of Configurability in Bioinformatics Software},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {757--767},
doi = {10.1145/3238147.3240466},
year = {2018},
}
Publisher's Version
Info
Automatically Testing Implementations of Numerical Abstract Domains
Alexandra Bugariu,
Valentin Wüstholz,
Maria Christakis, and
Peter Müller
(ETH Zurich, Switzerland; MPI-SWS, Germany)
Static program analyses are routinely applied as the basis of code optimizations and to detect safety and security issues in software systems. For their results to be reliable, static analyses should be sound (i.e., should not produce false negatives) and precise (i.e., should report a low number of false positives). Even though it is possible to prove properties of the design of a static analysis, ensuring soundness and precision for its implementation is challenging. Complex algorithms and sophisticated optimizations make static analyzers difficult to implement and test.
In this paper, we present an automatic technique to test, among other properties, the soundness and precision of abstract domains, the core of all static analyzers based on abstract interpretation. In order to cover a wide range of test data and input states, we construct inputs by applying sequences of abstract-domain operations to representative domain elements, and vary the operations through gray-box fuzzing. We use mathematical properties of abstract domains as test oracles. Our experimental evaluation demonstrates the effectiveness of our approach. We detected several previously unknown soundness and precision errors in widely-used abstract domains. Our experiments also show that our approach is more effective than dynamic symbolic execution and than fuzzing the test inputs directly.
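To make the oracle idea concrete, here is a minimal, hypothetical sketch (in Python, not code from the paper or from any real analyzer): a toy interval abstract domain whose join operator is checked against the mathematical soundness property that a join must over-approximate both operands. Names such as Interval and check_join_soundness are invented for illustration.

```python
import random

class Interval:
    """Toy interval abstract domain element [lo, hi]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def join(self, other):
        # Least upper bound: the smallest interval containing both operands.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def contains(self, other):
        # Partial order: True if 'other' is included in this interval.
        return self.lo <= other.lo and other.hi <= self.hi

def check_join_soundness(trials=1000):
    """Mathematical oracle: a sound join over-approximates both operands."""
    for _ in range(trials):
        a = Interval(*sorted(random.randint(-100, 100) for _ in range(2)))
        b = Interval(*sorted(random.randint(-100, 100) for _ in range(2)))
        j = a.join(b)
        assert j.contains(a) and j.contains(b), "unsound join detected"

check_join_soundness()
```

A buggy join (say, one that took the maximum of the lower bounds) would violate the assertion, which is the kind of property-based signal the paper's fuzzing approach relies on at much larger scale.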
@InProceedings{ASE18p768,
author = {Alexandra Bugariu and Valentin Wüstholz and Maria Christakis and Peter Müller},
title = {Automatically Testing Implementations of Numerical Abstract Domains},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {768--778},
doi = {10.1145/3238147.3240464},
year = {2018},
}
Publisher's Version
Experiences Applying Automated Architecture Analysis Tool Suites
Ran Mo, Will Snipes, Yuanfang Cai, Srini Ramaswamy, Rick Kazman, and Martin Naedele
(Central China Normal University, China; ABB Corporate Research, USA; Drexel University, USA; ABB, USA; Carnegie Mellon University, USA; University of Hawaii, USA; ABB, Switzerland)
In this paper, we report our experiences of applying three complementary automated software architecture analysis techniques, supported by a tool suite called DV8, to 8 industrial projects within a large company. DV8 includes two state-of-the-art architecture-level maintainability metrics (Decoupling Level and Propagation Cost), an architecture flaw detection tool, and an architecture root detection tool. We collected development process data from the project teams as input to these tools, reported the results back to the practitioners, and followed up with telephone conferences and interviews. Our experiences revealed that the metrics scores, quantitative debt analysis, and architecture flaw visualization can effectively bridge the gap between management and development, and help them decide if, when, and where to refactor. In particular, the metrics scores, compared against industrial benchmarks, faithfully reflected the practitioners' intuitions about the maintainability of their projects, and enabled them to better understand the maintainability relative to other projects internal to their company and to other industrial products. The automatically detected architecture flaws and roots enabled the practitioners to precisely pinpoint, visualize, and quantify the "hotspots" within the systems that are responsible for high maintenance costs. Except for the two smallest projects, for which both architecture metrics indicated high maintainability, all other projects are planning or have already begun refactorings to address the problems detected by our analyses. We are working on further automating the tool chain, and transforming the analysis suite into deployable services accessible by all projects within the company.
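For readers unfamiliar with the metrics, Propagation Cost is often described as the density of the transitive closure of a system's dependency matrix, i.e., the fraction of module pairs where a change to one module can reach the other. The sketch below computes that quantity for a small dependency graph; it is an illustrative approximation (including self-reachability), not DV8 code.

```python
from itertools import product

def propagation_cost(deps, n):
    """deps: set of (i, j) edges meaning module i depends on module j.
    Returns the density of the reachability (transitive closure) matrix."""
    reach = [[i == j or (i, j) in deps for j in range(n)] for i in range(n)]
    # Floyd-Warshall-style transitive closure (k varies slowest).
    for k, i, j in product(range(n), repeat=3):
        reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return sum(map(sum, reach)) / (n * n)

# Example: 4 modules with a small dependency chain plus one extra edge.
print(propagation_cost({(0, 1), (1, 2), (3, 2)}, 4))   # 0.5
```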
@InProceedings{ASE18p779,
author = {Ran Mo and Will Snipes and Yuanfang Cai and Srini Ramaswamy and Rick Kazman and Martin Naedele},
title = {Experiences Applying Automated Architecture Analysis Tool Suites},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {779--789},
doi = {10.1145/3238147.3240467},
year = {2018},
}
Publisher's Version
New Ideas Papers
Continuous Code Quality: Are We (Really) Doing That?
Carmine Vassallo, Fabio Palomba,
Alberto Bacchelli, and
Harald C. Gall
(University of Zurich, Switzerland)
Continuous Integration (CI) is a software engineering practice where developers constantly integrate their changes to a project through an automated build process. The goal of CI is to provide developers with prompt feedback on several quality dimensions after each change. Indeed, previous studies provided empirical evidence on a positive association between properly following CI principles and source code quality. A core principle behind CI, Continuous Code Quality (also known as CCQ, which includes automated testing and automated code inspection), may appear simple and effective, yet we know little about its practical adoption. In this paper, we propose a preliminary empirical investigation aimed at understanding how rigorously practitioners follow CCQ. Our study reveals a strong dichotomy between theory and practice: developers do not perform continuous inspection but rather control for quality only at the end of a sprint, and most of the time only on the release branch. Preprint [https://doi.org/10.5281/zenodo.1341036]. Data and Materials [http://doi.org/10.5281/zenodo.1341015].
@InProceedings{ASE18p790,
author = {Carmine Vassallo and Fabio Palomba and Alberto Bacchelli and Harald C. Gall},
title = {Continuous Code Quality: Are We (Really) Doing That?},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {790--795},
doi = {10.1145/3238147.3240729},
year = {2018},
}
Publisher's Version
RUDSEA: Recommending Updates of Dockerfiles via Software Environment Analysis
Foyzul Hassan, Rodney Rodriguez, and Xiaoyin Wang
(University of Texas at San Antonio, USA)
Dockerfiles are configuration files for Docker images, which package all dependencies of a piece of software to enable convenient deployment and porting. In other words, Dockerfiles list all environment assumptions of a software application's build and/or execution, so they need to be updated frequently when those environment assumptions change during fast software evolution. In this paper, we propose RUDSEA, a novel approach to recommend updates of Dockerfiles to developers based on analyzing changes to software environment assumptions and their impacts. Our evaluation on 1,199 real-world instruction updates shows that RUDSEA can recommend correct update locations for 78.5% of the updates, and correct code changes for 44.1% of the updates.
@InProceedings{ASE18p796,
author = {Foyzul Hassan and Rodney Rodriguez and Xiaoyin Wang},
title = {RUDSEA: Recommending Updates of Dockerfiles via Software Environment Analysis},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {796--801},
doi = {10.1145/3238147.3240470},
year = {2018},
}
Publisher's Version
Delta Debugging Microservice Systems
Xiang Zhou,
Xin Peng, Tao Xie, Jun Sun, Wenhai Li, Chao Ji, and Dan Ding
(Fudan University, China; University of Illinois at Urbana-Champaign, USA; Singapore University of Technology and Design, Singapore)
Debugging microservice systems involves the deployment and manipulation of microservice systems on a containerized environment and faces unique challenges due to the high complexity and dynamism of microservices. To address these challenges, in this paper, we propose a debugging approach for microservice systems based on the delta debugging algorithm, which is to minimize failure-inducing deltas of circumstances (e.g., deployment, environmental configurations) for effective debugging. Our approach includes novel techniques for defining, deploying/manipulating, and executing deltas following the idea of delta debugging. In particular, to construct a (failing) circumstance space for delta debugging to minimize, our approach defines a set of dimensions that can affect the execution of microservice systems. Our experimental study on a medium-size microservice benchmark system shows that our approach can effectively identify failure-inducing deltas that help diagnose the root causes.
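The core of delta debugging, independent of microservices, is to repeatedly drop subsets of the failure-inducing circumstances while the failure still reproduces. A simplified greedy reduction loop (not the authors' implementation; the failure predicate fails is a stand-in for redeploying and re-executing the system under a candidate configuration) might look like:

```python
def minimize(deltas, fails):
    """Greedy one-at-a-time reduction. 'deltas' is a list of circumstance
    changes (e.g., configuration tweaks); 'fails(subset)' reruns the system
    with that subset applied and returns True if the failure reproduces."""
    assert fails(deltas), "the full set must reproduce the failure"
    changed = True
    while changed:
        changed = False
        for d in list(deltas):
            candidate = [x for x in deltas if x != d]
            if fails(candidate):       # the failure persists without d,
                deltas = candidate     # so d is not needed
                changed = True
    return deltas                      # a 1-minimal failure-inducing set

# Toy usage: the failure needs deltas "a" and "c" together.
print(minimize(["a", "b", "c", "d"], lambda s: "a" in s and "c" in s))  # ['a', 'c']
```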
@InProceedings{ASE18p802,
author = {Xiang Zhou and Xin Peng and Tao Xie and Jun Sun and Wenhai Li and Chao Ji and Dan Ding},
title = {Delta Debugging Microservice Systems},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {802--807},
doi = {10.1145/3238147.3240730},
year = {2018},
}
Publisher's Version
Personalized Teammate Recommendation for Crowdsourced Software Developers
Luting Ye, Hailong Sun, Xu Wang, and Jiaruijue Wang
(Beihang University, China; Beijing Advanced Innovation Center for Big Data and Brain Computing, China)
Most crowdsourced software development platforms adopt a contest paradigm to solicit contributions from the community. To attain competitiveness on complex tasks, crowdsourced software developers often choose to work with others collaboratively. However, existing crowdsourcing platforms generally assume independent contributions from developers and do not provide effective support for team formation. Prior studies on team recommendation aim at optimizing task outcomes by recommending the most suitable team for a task instead of finding appropriate collaborators for a specific person. In this work, we are concerned with teammate recommendation for crowdsourcing developers. First, we present the results of an empirical study of Kaggle, which shows that developers' personal teammate preferences are mainly affected by three factors. Second, we give a collaboration willingness model to characterize developers' teammate preferences and formulate teammate recommendation as an optimization problem. Then we design a heuristic algorithm to find suitable teammates for a developer. Finally, we have conducted a set of experiments on a Kaggle dataset to evaluate the effectiveness of our approach.
@InProceedings{ASE18p808,
author = {Luting Ye and Hailong Sun and Xu Wang and Jiaruijue Wang},
title = {Personalized Teammate Recommendation for Crowdsourced Software Developers},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {808--813},
doi = {10.1145/3238147.3240472},
year = {2018},
}
Publisher's Version
S-gram: Towards Semantic-Aware Security Auditing for Ethereum Smart Contracts
Han Liu, Chao Liu, Wenqi Zhao, Yu Jiang, and Jiaguang Sun
(Tsinghua University, China; Peking University, China; Ant Financial, China)
Smart contracts, as a promising and powerful application on the Ethereum blockchain, have been growing rapidly in the past few years. Since they are highly vulnerable to different forms of attacks, their security becomes a top priority. However, existing security auditing techniques are either limited in finding vulnerabilities (relying on pre-defined bug patterns) or very expensive (relying on program analysis), and thus are insufficient for Ethereum. To mitigate these limitations, we propose a novel semantic-aware security auditing technique called S-gram for Ethereum. The key insight is a combination of N-gram language modeling and lightweight static semantic labeling, which can learn statistical regularities of contract tokens and capture high-level semantics as well (e.g., flow sensitivity of a transaction). S-gram can be used to predict potential vulnerabilities by identifying irregular token sequences and to optimize existing in-depth analyzers (e.g., symbolic execution engines, fuzzers, etc.). We have implemented S-gram for Solidity smart contracts in Ethereum. The evaluation demonstrated the potential of S-gram in identifying possible security issues.
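To give an intuition of the N-gram component (a generic sketch, not the authors' implementation and ignoring the semantic labeling step; the corpus and token names below are made up), the following bigram model scores a token sequence by its average log-probability under add-one smoothing; unusually low scores would flag irregular sequences for further inspection.

```python
import math
from collections import Counter

def train_bigrams(corpus):
    """corpus: list of token lists drawn from known-good contracts."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def score(tokens, unigrams, bigrams):
    """Average log P(t_i | t_{i-1}) with add-one smoothing; lower = more irregular."""
    vocab = len(unigrams) or 1
    logp = [math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
            for a, b in zip(tokens, tokens[1:])]
    return sum(logp) / max(len(logp), 1)

corpus = [["call", "require", "transfer"], ["require", "call", "transfer"]]
uni, bi = train_bigrams(corpus)
print(score(["call", "transfer", "require"], uni, bi))  # low score: unusual order
```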
@InProceedings{ASE18p814,
author = {Han Liu and Chao Liu and Wenqi Zhao and Yu Jiang and Jiaguang Sun},
title = {S-gram: Towards Semantic-Aware Security Auditing for Ethereum Smart Contracts},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {814--819},
doi = {10.1145/3238147.3240728},
year = {2018},
}
Publisher's Version
An Evolutionary Approach for Analyzing Alloy Specifications
Jianghao Wang,
Hamid Bagheri, and Myra B. Cohen
(University of Nebraska-Lincoln, USA)
Formal methods use mathematical notations and logical reasoning to precisely define a program's specifications, from which we can instantiate valid instances of a system. With these techniques we can perform a multitude of tasks to check system dependability. Despite the existence of many automated tools, including ones considered lightweight, they still lack strong adoption in practice. At the crux of this problem is scalability and applicability to large real-world applications. In this paper we show how to relax the completeness guarantee without much loss, since soundness is maintained. We have extended a popular lightweight analysis, Alloy, with a genetic algorithm. Our new tool, EvoAlloy, works at the level of finite relations generated by Kodkod and evolves the chromosomes based on the failed constraints. In a feasibility study we demonstrate that we can find solutions to a set of specifications beyond the scope where traditional Alloy fails. While small specifications take longer with EvoAlloy, the scalability means we can handle larger specifications. Our future vision is that when specifications are small we can maintain both soundness and completeness, but when this fails, EvoAlloy can switch to its genetic algorithm.
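To illustrate the general idea of evolving candidate relation instances against constraints (a generic genetic-algorithm sketch, not EvoAlloy or Kodkod code; the bit-string encoding and the toy constraints are invented), consider:

```python
import random

def evolve(num_atoms, constraints, pop_size=30, generations=200, mut_rate=0.05):
    """Search for a binary relation over 'num_atoms' atoms, encoded as a flat
    bit string; fitness is the number of satisfied constraints."""
    size = num_atoms * num_atoms
    def fitness(bits):
        rel = {(i // num_atoms, i % num_atoms) for i, b in enumerate(bits) if b}
        return sum(c(rel) for c in constraints)
    pop = [[random.randint(0, 1) for _ in range(size)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(constraints):
            break                                     # all constraints satisfied
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, size)
            child = a[:cut] + b[cut:]                 # one-point crossover
            child = [1 - g if random.random() < mut_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return pop[0]

# Toy constraints over a 3-atom universe: the relation must be reflexive
# and must contain the pair (0, 2).
constraints = [lambda r: all((i, i) in r for i in range(3)),
               lambda r: (0, 2) in r]
print(evolve(3, constraints))   # typically a bit string satisfying both
```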
@InProceedings{ASE18p820,
author = {Jianghao Wang and Hamid Bagheri and Myra B. Cohen},
title = {An Evolutionary Approach for Analyzing Alloy Specifications},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {820--825},
doi = {10.1145/3238147.3240468},
year = {2018},
}
Publisher's Version
A Neural Framework for Retrieval and Summarization of Source Code
Qingying Chen and
Minghui Zhou
(Peking University, China)
Code retrieval and summarization are two tasks often employed by software developers to reuse code that spreads over online repositories. In this paper, we present a neural framework that allows bidirectional mapping between source code and natural language to improve these two tasks. Our framework, BVAE, is designed to have two Variational AutoEncoders (VAEs) to model bimodal data: C-VAE for source code and L-VAE for natural language. Both VAEs are trained jointly to reconstruct their input as much as possible with regularization that captures the closeness between the latent variables of code and description. BVAE could learn semantic vector representations for both code and description and generate completely new descriptions for arbitrary code snippets. We design two instance models of BVAE for retrieval and summarization tasks respectively and evaluate their performance on a benchmark which involves two programming languages: C# and SQL. Experiments demonstrate BVAE’s potential on the two tasks.
@InProceedings{ASE18p826,
author = {Qingying Chen and Minghui Zhou},
title = {A Neural Framework for Retrieval and Summarization of Source Code},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {826--831},
doi = {10.1145/3238147.3240471},
year = {2018},
}
Publisher's Version
An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation
Michele Tufano, Cody Watson,
Gabriele Bavota, Massimiliano Di Penta, Martin White, and
Denys Poshyvanyk
(College of William and Mary, USA; University of Lugano, Switzerland; University of Sannio, Italy)
Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. We mine millions of bug-fixes from the change histories of GitHub repositories to extract meaningful examples of such bug-fixes. Then, we abstract the buggy and corresponding fixed code, and use them to train an Encoder-Decoder model able to translate buggy code into its fixed version. Our model is able to fix hundreds of unique buggy methods in the wild. Overall, this model is capable of predicting fixed patches generated by developers in 9% of the cases.
@InProceedings{ASE18p832,
author = {Michele Tufano and Cody Watson and Gabriele Bavota and Massimiliano Di Penta and Martin White and Denys Poshyvanyk},
title = {An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {832--837},
doi = {10.1145/3238147.3240732},
year = {2018},
}
Publisher's Version
Loop Path Reduction by State Pruning
Jianxiong Gao and Steven S. Lumetta
(University of Illinois at Urbana-Champaign, USA)
Path explosion has long been a problem for symbolic execution. The key to avoiding path explosion is to limit the number of paths generated within loops while maintaining high code coverage. Full symbolic execution creates a path for every possible execution path. Frequently, paths within loops do not contribute to code coverage, yet branches within loops may generate new states at every iteration. The path explosion problem created by loops often prevents symbolic execution from reaching deeper parts of the code.
In this paper, we propose a new path exploration method that reduces the number of states needed to achieve high coverage. Our algorithm limits the number of new states created by first prioritizing states, and then pruning the states that do not contribute to code coverage. Our algorithm does not require loop invariant inference/loop summarization, nor does it bound the number of iterations of loop exploration. The proposed algorithm can thus handle a broader set of loops than previous approaches. In fact, our algorithm is orthogonal to loop summarization techniques and search-guide heuristics, so it complements the current methods.
We have implemented our algorithm using KLEE and tested with 235 student-generated versions of a classroom assignment. Our results show that our algorithm helps to achieve the same coverage with speedup of 11.8× for 117 out of the 235 programs, while adding 15% max observed and 2% average overhead over the 50% of programs not benefiting from the technique. The maximum speedup for a single program is 52.3×.
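The prioritize-then-prune idea can be pictured with a toy sketch (purely illustrative; the paper's implementation works on real symbolic states inside KLEE, and names like prune_and_prioritize are hypothetical): each pending state is tagged with the source lines it would cover, states contributing nothing new are dropped, and the rest are ordered by how much uncovered code they reach.

```python
def prune_and_prioritize(states, covered):
    """states: list of (state_id, lines) pairs, where 'lines' is the set of
    source lines the state would cover if explored further.
    Returns state ids ordered by new-coverage contribution, pruning states
    that contribute no lines beyond the already 'covered' set."""
    scored = [(len(lines - covered), sid) for sid, lines in states]
    kept = [(gain, sid) for gain, sid in scored if gain > 0]
    return [sid for gain, sid in sorted(kept, key=lambda t: -t[0])]

covered = {1, 2, 3}
states = [("s1", {1, 2}), ("s2", {3, 4, 5}), ("s3", {6})]
print(prune_and_prioritize(states, covered))   # ['s2', 's3'] -- 's1' is pruned
```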
@InProceedings{ASE18p838,
author = {Jianxiong Gao and Steven S. Lumetta},
title = {Loop Path Reduction by State Pruning},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {838--843},
doi = {10.1145/3238147.3240731},
year = {2018},
}
Publisher's Version
node2defect: Using Network Embedding to Improve Software Defect Prediction
Yu Qu, Ting Liu, Jianlei Chi, Yangxu Jin, Di Cui, Ancheng He, and Qinghua Zheng
(Xi'an Jiaotong University, China)
Network measures have been proven to be useful in predicting software defects. Leveraging the dependency relationships between software modules, network measures can capture various structural features of software systems. However, existing studies have relied on user-defined network measures (e.g., degree statistics or centrality metrics), which are inflexible and incur high computation cost, to describe the structural features. In this paper, we propose a new method called node2defect which uses a newly proposed network embedding technique, node2vec, to automatically learn to encode dependency network structure into low-dimensional vector spaces to improve software defect prediction. Specifically, we first construct a program's Class Dependency Network. Then node2vec is used to automatically learn structural features of the network. After that, we combine the learned features with traditional software engineering features, for accurate defect prediction. We evaluate our method on 15 open source programs. The experimental results show that, on average, node2defect improves the state-of-the-art approach by 9.15% in terms of F-measure.
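The combination step described above can be pictured as simple feature concatenation followed by an off-the-shelf classifier. The sketch below is illustrative only: it assumes the per-class network embeddings were already learned (e.g., by a node2vec implementation), fabricates tiny example data, and uses scikit-learn's logistic regression rather than the models evaluated in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_features(embeddings, metrics):
    """embeddings: dict class_name -> vector from the dependency network.
    metrics: dict class_name -> vector of traditional metrics (LOC, CK, ...).
    Returns class names and a feature matrix with both views concatenated."""
    names = sorted(embeddings)
    return names, np.vstack([np.concatenate([embeddings[n], metrics[n]]) for n in names])

rng = np.random.default_rng(0)
emb = {f"C{i}": rng.normal(size=3) for i in range(4)}   # toy 3-d embeddings
met = {f"C{i}": rng.normal(size=2) for i in range(4)}   # toy traditional metrics
labels = {"C0": 0, "C1": 1, "C2": 0, "C3": 1}           # 1 = defective

names, X = build_features(emb, met)
y = np.array([labels[n] for n in names])
clf = LogisticRegression().fit(X, y)
print(dict(zip(names, clf.predict(X))))
```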
@InProceedings{ASE18p844,
author = {Yu Qu and Ting Liu and Jianlei Chi and Yangxu Jin and Di Cui and Ancheng He and Qinghua Zheng},
title = {node2defect: Using Network Embedding to Improve Software Defect Prediction},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {844--849},
doi = {10.1145/3238147.3240469},
year = {2018},
}
Publisher's Version
Towards Automating Disambiguation of Regulations: Using the Wisdom of Crowds
Manasi Patwardhan, Abhishek Sainani, Richa Sharma, Shirish Karande, and Smita Ghaisas
(TCS Research, India)
Compliant software is a critical need of all modern businesses. Disambiguating regulations to derive requirements is therefore an important software engineering activity. Regulations however are ridden with ambiguities that make their comprehension a challenge, seemingly surmountable only by legal experts. Since legal experts' involvement in every project is expensive, approaches to automate the disambiguation need to be explored. These approaches however require a large amount of annotated data. Collecting data exclusively from experts is not a scalable and affordable solution. In this paper, we present the results of a crowd sourcing experiment to collect annotations on ambiguities in regulations from professional software engineers. We discuss an approach to automate the arduous and critical step of identifying ground truth labels by employing crowd consensus using Expectation Maximization (EM). We demonstrate that the annotations reaching a consensus match those of experts with an accuracy of 87%.
@InProceedings{ASE18p850,
author = {Manasi Patwardhan and Abhishek Sainani and Richa Sharma and Shirish Karande and Smita Ghaisas},
title = {Towards Automating Disambiguation of Regulations: Using the Wisdom of Crowds},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {850--855},
doi = {10.1145/3238147.3240727},
year = {2018},
}
Publisher's Version
Tool Demonstrations
jStanley: Placing a Green Thumb on Java Collections
Rui Pereira, Pedro Simão, Jácome Cunha, and
João Saraiva
(INESC TEC, Portugal; University of Minho, Portugal; NOVA-LINCS, Portugal; Universidade Nova Lisboa, Portugal)
Software developers are more and more eager to understand their code’s energy performance. However, even with such knowledge it is difficult to know how to improve the code. Indeed, little tool support exists to understand the energy consumption profile of a software system and to eventually (automatically) improve its code.
In this paper we present a tool termed jStanley which automatically finds collections in Java programs that can be replaced by others with a positive impact on the energy consumption as well as on the execution time. In seconds, developers obtain information about energy-eager collection usage. jStanley will further suggest alternative collections to improve the code, making it use less time, energy, or a combination of both. The preliminary evaluation we ran using jStanley shows energy gains between 2% and 17%, and a reduction in execution time between 2% and 13%.
A video can be seen at https://greensoftwarelab.github.io/jStanley.
@InProceedings{ASE18p856,
author = {Rui Pereira and Pedro Simão and Jácome Cunha and João Saraiva},
title = {jStanley: Placing a Green Thumb on Java Collections},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {856--859},
doi = {10.1145/3238147.3240473},
year = {2018},
}
Publisher's Version
Info
SRCIROR: A Toolset for Mutation Testing of C Source Code and LLVM Intermediate Representation
Farah Hariri and August Shi
(University of Illinois at Urbana-Champaign, USA)
We present SRCIROR (pronounced "sorcerer"), a toolset for performing mutation testing at the levels of C/C++ source code (SRC) and the LLVM compiler intermediate representation (IR). At the SRC level, SRCIROR identifies program constructs for mutation by pattern-matching on the Clang AST. At the IR level, SRCIROR directly mutates the LLVM IR instructions through LLVM passes. Our implementation enables SRCIROR to (1) handle any program that Clang can handle, extending to large programs with a minimal overhead, and (2) have a small percentage of invalid mutants that do not compile. SRCIROR enables performing mutation testing using the same classes of mutation operators at both the SRC and IR levels, and it is easily extensible to support more operators. In addition, SRCIROR can collect coverage to generate mutants only for covered code elements. Our tool is publicly available on GitHub (https://github.com/TestingResearchIllinois/srciror). We evaluate SRCIROR on Coreutils subjects. Our evaluation shows interesting differences between SRC and IR, demonstrating the value of SRCIROR in enabling mutation testing research across different levels of code representation.
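As a rough analogy to source-level mutation via AST pattern matching (shown here in Python rather than C/Clang, and in no way SRCIROR's code; requires Python 3.9+ for ast.unparse), the sketch below walks an AST, flips a single `+` to `-` per mutant, and emits one mutant per match:

```python
import ast

class AddToSub(ast.NodeTransformer):
    """Mutation operator: replace exactly one addition with a subtraction."""
    def __init__(self, target_index):
        self.target_index, self.seen = target_index, -1

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            self.seen += 1
            if self.seen == self.target_index:
                node.op = ast.Sub()        # the actual mutation
        return node

def mutants(source):
    """Yield one mutated source string per '+' occurrence in 'source'."""
    count = sum(isinstance(n.op, ast.Add)
                for n in ast.walk(ast.parse(source)) if isinstance(n, ast.BinOp))
    for i in range(count):
        tree = AddToSub(i).visit(ast.parse(source))
        yield ast.unparse(ast.fix_missing_locations(tree))

for m in mutants("def f(a, b):\n    return a + b + 1\n"):
    print(m)
```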
@InProceedings{ASE18p860,
author = {Farah Hariri and August Shi},
title = {SRCIROR: A Toolset for Mutation Testing of C Source Code and LLVM Intermediate Representation},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {860--863},
doi = {10.1145/3238147.3240482},
year = {2018},
}
Publisher's Version
Lightweight Source Code Monitoring with Triggr
Alim Ozdemir,
Ayse Tosun, Hakan Erdogmus, and
Rui Abreu
(Istanbul Technical University, Turkey; Carnegie Mellon University, USA; University of Lisbon, Portugal)
Existing tools for monitoring the quality of codebases modified by multiple developers tend to be centralized and inflexible. These tools increase the visibility of quality by producing effective reports and visualizations when a change is made to the codebase and triggering alerts when undesirable situations occur. However, their configuration is invariably both (a) centrally managed, in that individual maintainers cannot define local rules to receive customized feedback when a change occurs in a specific part of the code in which they are particularly interested, and (b) coarse-grained, in that analyses cannot be turned on and off below the file level. Triggr, the tool proposed in this paper, addresses these limitations by allowing distributed, customized, and fine-grained monitoring. It is a lightweight re-implementation of our previous tool, CodeAware, which adopts the same paradigm. The tool listens on a codebase’s shared repository using an event-based approach, and can send alerts to subscribed developers based on rules defined locally by them. Triggr is open-source and available at https://github.com/lyzerk/Triggr. A demonstration video can be found at https://youtu.be/qQs9aDwXJjY.
@InProceedings{ASE18p864,
author = {Alim Ozdemir and Ayse Tosun and Hakan Erdogmus and Rui Abreu},
title = {Lightweight Source Code Monitoring with Triggr},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {864--867},
doi = {10.1145/3238147.3240486},
year = {2018},
}
Publisher's Version
OCELOT: A Search-Based Test-Data Generation Tool for C
Simone Scalabrino,
Giovanni Grano, Dario Di Nucci, Michele Guerra,
Andrea De Lucia,
Harald C. Gall, and
Rocco Oliveto
(University of Molise, Italy; University of Zurich, Switzerland; Vrije Universiteit Brussel, Belgium; University of Salerno, Italy)
Automatically generating test cases plays an important role to reduce the time spent by developers during the testing phase.
In recent years, several approaches have been proposed to tackle this problem; amongst others, search-based techniques have been shown to be particularly promising.
In this paper we describe Ocelot, a search-based tool for the automatic generation of test cases in C. Ocelot allows practitioners to write skeletons of test cases for their programs and researchers to easily implement and experiment new approaches for automatic test-data generation. We show that Ocelot achieves a higher coverage compared to a competitive tool in 81% of the cases. Ocelot is publicly available to support both researchers and practitioners.
@InProceedings{ASE18p868,
author = {Simone Scalabrino and Giovanni Grano and Dario Di Nucci and Michele Guerra and Andrea De Lucia and Harald C. Gall and Rocco Oliveto},
title = {OCELOT: A Search-Based Test-Data Generation Tool for C},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {868--871},
doi = {10.1145/3238147.3240477},
year = {2018},
}
Publisher's Version
Live Versioning of Web Applications through Refactoring
Julián Grigera, Juan Cruz Gardey, Alejandra Garrido, and Gustavo Rossi
(National University of La Plata, Argentina; CIC, Argentina; CONICET, Argentina)
Client-Side Web Refactorings (CSWRs) allow improving interaction aspects of web applications by applying changes to the user interface without altering the code base, even in production settings. However, developers are not always willing, or even allowed to apply external adaptations to their applications’ user interface. Besides, CSWRs do not guarantee improvements in all contexts, so it may be unwise to install them in a production version. We propose a tool that allows creating private versions of a running web application almost automatically. Using this tool, developers or usability experts can easily combine CSWRs to create alternative versions of web applications, without the need of creating a cloned sandbox environment for each version. This yields many uses, such as quickly setting up user tests, showing live alternatives to Product Owners, and even performing A/B testing. The tool is built on top of Kobold, a service that allows applying CSWRs to fix usability smells. Kobold with versioning is available at: https://autorefactoring.lifia.info.unlp.edu.ar. A screencast of the tool is available at https://youtu.be/LVc3BOtVP3I.
@InProceedings{ASE18p872,
author = {Julián Grigera and Juan Cruz Gardey and Alejandra Garrido and Gustavo Rossi},
title = {Live Versioning of Web Applications through Refactoring},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {872--875},
doi = {10.1145/3238147.3240483},
year = {2018},
}
Publisher's Version
Video
code_call_lens: Raising the Developer Awareness of Critical Code
Andrea Janes, Michael Mairegger, and Barbara Russo
(Free University of Bolzano, Italy)
As a developer, it is often difficult to foresee the impact of source code changes on usage; e.g., it is time-consuming to find all components that will be impacted by a change or to estimate the impact of a failing piece of code on usability. It is therefore hard to decide how much effort in quality assurance is justifiable to obtain the desired business goals. In this paper, to make it easier for developers to understand the importance of source code, we propose an automated way to provide this information to developers as they are working on a given piece of code. As a proof of concept, we developed a plug-in for Microsoft Visual Studio Code that reports the importance of source code methods based on their frequency of usage by the end-users of the developed software. The plug-in aims to increase the awareness developers have about the importance of source code in an unobtrusive way, helping them prioritize their effort towards quality assurance, technical excellence, and usability. code_call_lens can be downloaded from GitHub at https://github.com/xxMUROxx/vscode.code_call_lens.
@InProceedings{ASE18p876,
author = {Andrea Janes and Michael Mairegger and Barbara Russo},
title = {code_call_lens: Raising the Developer Awareness of Critical Code},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {876--879},
doi = {10.1145/3238147.3240488},
year = {2018},
}
Publisher's Version
Code2graph: Automatic Generation of Static Call Graphs for Python Source Code
Gharib Gharibi, Rashmi Tripathi, and Yugyung Lee
(University of Missouri-Kansas City, USA)
A static call graph is an imperative prerequisite used in most interprocedural analyses and software comprehension tools. However, there is a lack of software tools that can automatically analyze the Python source-code and construct its static call graph. In this paper, we introduce a prototype Python tool, named code2graph, which automates the tasks of (1) analyzing the Python source-code and extracting its structure, (2) constructing static call graphs from the source code, and (3) generating a similarity matrix of all possible execution paths in the system. Our goal is twofold: First, assist the developers in understanding the overall structure of the system. Second, provide a stepping stone for further research that can utilize the tool in software searching and similarity detection applications. For example, clustering the execution paths into a logical workflow of the system would be applied to automate specific software tasks. Code2graph has been successfully used to generate static call graphs and similarity matrices of the paths for three popular open-source Deep Learning projects (TensorFlow, Keras, PyTorch). A tool demo is available at https://youtu.be/ecctePpcAKU.
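A minimal flavor of the first step, extracting caller-to-callee edges from Python source with the standard ast module, is sketched below (a simplified illustration, not code2graph itself; it only handles direct calls to plain names and ignores methods, imports, and nesting subtleties):

```python
import ast

def call_edges(source):
    """Return (caller, callee) pairs for direct name calls inside each function."""
    edges = set()
    tree = ast.parse(source)
    for func in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]:
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((func.name, node.func.id))
    return sorted(edges)

code = """
def helper():
    return 1

def main():
    print(helper())
"""
print(call_edges(code))   # [('main', 'helper'), ('main', 'print')]
```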
@InProceedings{ASE18p880,
author = {Gharib Gharibi and Rashmi Tripathi and Yugyung Lee},
title = {Code2graph: Automatic Generation of Static Call Graphs for Python Source Code},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {880--883},
doi = {10.1145/3238147.3240484},
year = {2018},
}
Publisher's Version
Video
The Electrum Analyzer: Model Checking Relational First-Order Temporal Specifications
Julien Brunel, David Chemouil,
Alcino Cunha, and Nuno Macedo
(ONERA, France; University of Toulouse, France; INESC TEC, Portugal; University of Minho, Portugal)
This paper presents the Electrum Analyzer, a free-software tool to validate and perform model checking of Electrum specifications. Electrum is an extension of Alloy that enriches its relational logic with LTL operators, thus simplifying the specification of dynamic systems. The Analyzer supports both automatic bounded model checking, with an encoding into SAT, and unbounded model checking, with an encoding into SMV. Instance, or counter-example, traces are presented back to the user in a unified visualizer. Features to speed up model checking are offered, including a decomposed parallel solving strategy and the extraction of symbolic bounds.
@InProceedings{ASE18p884,
author = {Julien Brunel and David Chemouil and Alcino Cunha and Nuno Macedo},
title = {The Electrum Analyzer: Model Checking Relational First-Order Temporal Specifications},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {884--887},
doi = {10.1145/3238147.3240475},
year = {2018},
}
Publisher's Version
Video
Info
ESBMC 5.0: An Industrial-Strength C Model Checker
Mikhail R. Gadelha, Felipe R. Monteiro, Jeremy Morse, Lucas C. Cordeiro, Bernd Fischer, and Denis A. Nicole
(University of Southampton, UK; Federal University of Amazonas, Brazil; University of Bristol, UK; University of Manchester, UK; Stellenbosch University, South Africa)
ESBMC is a mature, permissively licensed open-source context-bounded model checker for the verification of single- and multi-threaded C programs. It can verify both predefined safety properties (e.g., bounds check, pointer safety, overflow) and user-defined program assertions automatically. ESBMC provides C++ and Python APIs to access internal data structures, allowing inspection and extension at any stage of the verification process. We discuss improvements over previous versions of ESBMC, including the description of new front- and back-ends, IEEE floating-point support, and an improved k-induction algorithm. A demonstration is available at https://www.youtube.com/watch?v=YcJjXHlN1v8.
@InProceedings{ASE18p888,
author = {Mikhail R. Gadelha and Felipe R. Monteiro and Jeremy Morse and Lucas C. Cordeiro and Bernd Fischer and Denis A. Nicole},
title = {ESBMC 5.0: An Industrial-Strength C Model Checker},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {888--891},
doi = {10.1145/3238147.3240481},
year = {2018},
}
Publisher's Version
Video
Info
L-CMP: An Automatic Learning-Based Parameterized Verification Tool
Jialun Cao,
Yongjian Li, and Jun Pang
(Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; University of Luxembourg, Luxembourg)
This demo introduces L-CMP, an automatic learning-based parameterized verification tool. It can verify parameterized protocols by combining machine learning and model checking techniques. Given a parameterized protocol, L-CMP learns a set of auxiliary invariants and implements verification of the protocol using the invariants automatically. In particular, the learned auxiliary invariants are straightforward and readable. The experimental results show that L-CMP can successfully verify a number of cache coherence protocols, including the industrial-scale FLASH protocol.
The video is available at https://youtu.be/6Dl2HiiiS4E, and L-CMP can be downloaded at https://github.com/ArabelaTso/Learning-Based-ParaVerifer.
@InProceedings{ASE18p892,
author = {Jialun Cao and Yongjian Li and Jun Pang},
title = {L-CMP: An Automatic Learning-Based Parameterized Verification Tool},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {892--895},
doi = {10.1145/3238147.3240487},
year = {2018},
}
Publisher's Version
VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-Platform Binary
Jian Gao, Xin Yang, Ying Fu,
Yu Jiang, and Jiaguang Sun
(Tsinghua University, China)
Code reuse improves software development efficiency; however, vulnerabilities can be introduced inadvertently. Many existing works compute code similarity based on CFGs to determine whether a binary function contains a known vulnerability. Unfortunately, their performance in cross-platform binary search is limited. This paper presents VulSeeker, a semantic learning based vulnerability seeker for cross-platform binaries. Given a target function and a vulnerable function, VulSeeker first constructs the labeled semantic flow graphs and extracts basic block features as numerical vectors for both of them. Then the embedding vector of the whole binary function is generated by feeding the numerical vectors of basic blocks to the customized semantics-aware DNN model. Finally, the similarity of the two binary functions is measured based on the cosine distance. The experimental results show that VulSeeker outperforms the state-of-the-art approaches in terms of accuracy. For example, compared to the most recent and related work Gemini, VulSeeker finds 50.00% more vulnerabilities in the top-10 candidates and 13.89% more in the top-50 candidates, and improves the values of AUC and ACC by 8.23% and 12.14% respectively. The video is presented at https://youtu.be/Mw0mr84gpI8.
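The final matching step described above boils down to a cosine comparison of two embedding vectors. A trivial sketch follows (the embedding values are made up; in the paper they would come from the trained DNN, which is not reproduced here), with cosine distance being one minus this similarity:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two function-embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

target = np.array([0.12, -0.40, 0.88, 0.05])      # embedding of target function
vulnerable = np.array([0.10, -0.35, 0.90, 0.00])  # embedding of known-vulnerable function
print(cosine_similarity(target, vulnerable))      # close to 1.0 => likely similar
```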
@InProceedings{ASE18p896,
author = {Jian Gao and Xin Yang and Ying Fu and Yu Jiang and Jiaguang Sun},
title = {VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-Platform Binary},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {896--899},
doi = {10.1145/3238147.3240480},
year = {2018},
}
Publisher's Version
CPA-SymExec: Efficient Symbolic Execution in CPAchecker
Dirk Beyer and Thomas Lemberger
(LMU Munich, Germany)
We present CPA-SymExec, a tool for symbolic execution that is
implemented in the open-source, configurable verification framework CPAchecker.
Our implementation automatically detects which symbolic facts to track,
in order to obtain a small set of constraints that are necessary
to decide reachability of a program area of interest.
CPA-SymExec is based on abstraction
and counterexample-guided abstraction refinement (CEGAR),
and uses a constraint-interpolation approach to detect symbolic facts.
We show that our implementation can better mitigate the path-explosion problem
than symbolic execution without abstraction, by comparing the performance to
the state-of-the-art Klee-based symbolic-execution engine Symbiotic
and to Klee itself. For the experiments we use two kinds of
analysis tasks: one for finding an executable path to a specific
location of interest (e.g., if a test vector is desired to show that
a certain behavior occurs), and one for confirming that no executable path
to a specific location exists (e.g., if it is desired to show
that a certain behavior never occurs).
CPA-SymExec is released under the Apache 2 license
and available (inclusive source code) at https://cpachecker.sosy-lab.org.
A demonstration video is available at https://youtu.be/qoBHtvPKtnw.
@InProceedings{ASE18p900,
author = {Dirk Beyer and Thomas Lemberger},
title = {CPA-SymExec: Efficient Symbolic Execution in CPAchecker},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {900--903},
doi = {10.1145/3238147.3240478},
year = {2018},
}
Publisher's Version
Published Artifact
Video
Info
Artifacts Available
CANAL: A Cache Timing Analysis Framework via LLVM Transformation
Chungha Sung, Brandon Paulsen, and Chao Wang
(University of Southern California, USA)
A unified modeling framework for non-functional properties of a program is essential for research in software analysis and verification, since it reduces burdens on individual researchers to implement new approaches and compare existing approaches. We present CANAL, a framework that models the cache behaviors of a program by transforming its intermediate representation in the LLVM compiler. CANAL inserts auxiliary variables and instructions over these variables, to allow standard verification tools to handle a new class of cache related properties, e.g., for computing the worst-case execution time and detecting side-channel leaks.
We demonstrate the effectiveness of using three verification tools: KLEE, SMACK and Crab-llvm. We confirm the accuracy of our cache model by comparing with CPU cycle-accurate simulation results of GEM5.
CANAL is available on GitHub(https://github.com/canalcache/canal) and YouTube(https://youtu.be/JDou3F1j2nY).
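To illustrate what "auxiliary variables that model cache behavior" can mean in a simpler setting (a toy Python model of a direct-mapped cache, not CANAL's LLVM instrumentation; parameters are arbitrary), the sketch below tracks hit/miss counters while a program's memory accesses are replayed:

```python
class DirectMappedCache:
    """Toy direct-mapped cache: auxiliary state (tags, hit/miss counters)
    updated on every memory access, mirroring the kind of variables CANAL injects."""
    def __init__(self, num_sets=4, line_size=16):
        self.num_sets, self.line_size = num_sets, line_size
        self.tags = [None] * num_sets
        self.hits = self.misses = 0

    def access(self, address):
        line = address // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag        # fill the set with the new line

cache = DirectMappedCache()
for addr in [0, 4, 64, 0, 64, 128, 0]:
    cache.access(addr)
print(cache.hits, cache.misses)           # 1 6: repeated conflicts on set 0
```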
@InProceedings{ASE18p904,
author = {Chungha Sung and Brandon Paulsen and Chao Wang},
title = {CANAL: A Cache Timing Analysis Framework via LLVM Transformation},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {904--907},
doi = {10.1145/3238147.3240485},
year = {2018},
}
Publisher's Version
Video
Info
Descartes: A PITest Engine to Detect Pseudo-Tested Methods: Tool Demonstration
Oscar Luis Vera-Pérez, Martin Monperrus, and
Benoit Baudry
(Inria, France; KTH, Sweden)
Descartes is a tool that implements extreme mutation operators and aims at finding pseudo-tested methods in Java projects. It leverages the efficient transformation and runtime features of PITest. The demonstration compares Descartes with Gregor, the default mutation engine provided by PITest, on a set of real open source projects. It considers the execution time, the number of mutants created, and the relationship between the mutation scores produced by both engines. It provides some insights on the main features exposed by Descartes.
@InProceedings{ASE18p908,
author = {Oscar Luis Vera-Pérez and Martin Monperrus and Benoit Baudry},
title = {Descartes: A PITest Engine to Detect Pseudo-Tested Methods: Tool Demonstration},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {908--911},
doi = {10.1145/3238147.3240474},
year = {2018},
}
Publisher's Version
Info
DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-Value Stores
Mohammad Roohitavaf and Sandeep Kulkarni
(Michigan State University, USA)
We present our framework DKVF, which enables one to quickly prototype and evaluate new consistency protocols for key-value stores. DKVF is designed around a separation of concerns in creating distributed data stores. This separation of concerns allows the designers of consistency protocols to focus only on the high-level consistency protocol, which gives them the opportunity to quickly deploy a protocol and evaluate its performance. Moreover, the loose coupling of the different components allows us to easily change individual components (e.g., the storage engine) of an implementation. We demonstrate DKVF by implementing four existing protocols --eventual consistency, COPS, GentleRain and CausalSpartan-- with it. The implementation of these protocols was very convenient with DKVF, as it only required writing a piece of code for the consistency component that is very close to the pseudocode of the original papers. Hence, it was possible to achieve this in just 1-2 days per protocol. DKVF also comes with a toolset that facilitates running clusters and performing experiments. Tutorial video: https://www.youtube.com/watch?v=MFJQzsJkwfc&list=PLErtSVEHsnBJvoQQI6iqGn61oNrUVVuST
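The separation of concerns can be pictured as pluggable components behind a small server interface. The C sketch below is purely conceptual and does not reflect DKVF's actual (Java) API; all names and types are made up.

    #include <stdio.h>
    #include <string.h>

    typedef struct {
      void        (*put)(const char *key, const char *value);
      const char *(*get)(const char *key);
    } storage_engine;

    typedef struct {
      /* Protocol hook invoked on every write; a richer protocol would
         attach and propagate causality metadata here. */
      void (*on_put)(const char *key, const char *value);
    } consistency_protocol;

    /* Trivial single-slot in-memory storage, standing in for a real engine. */
    static char slot_key[64], slot_val[64];
    static void mem_put(const char *k, const char *v) {
      strncpy(slot_key, k, sizeof slot_key - 1);
      strncpy(slot_val, v, sizeof slot_val - 1);
    }
    static const char *mem_get(const char *k) {
      return strcmp(slot_key, k) == 0 ? slot_val : "";
    }

    /* "Eventual consistency": no extra bookkeeping on writes. */
    static void eventual_on_put(const char *k, const char *v) { (void)k; (void)v; }

    typedef struct {
      storage_engine       storage;
      consistency_protocol protocol;
    } kv_server;

    static void server_put(kv_server *s, const char *k, const char *v) {
      s->protocol.on_put(k, v);   /* protocol-specific step */
      s->storage.put(k, v);       /* storage-specific step */
    }

    int main(void) {
      kv_server s = { { mem_put, mem_get }, { eventual_on_put } };
      server_put(&s, "x", "1");
      printf("x = %s\n", s.storage.get("x"));
      return 0;
    }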
@InProceedings{ASE18p912,
author = {Mohammad Roohitavaf and Sandeep Kulkarni},
title = {DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-Value Stores},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {912--915},
doi = {10.1145/3238147.3240476},
year = {2018},
}
Publisher's Version
Video
Info
DroidMate-2: A Platform for Android Test Generation
Nataniel P. Borges Jr., Jenny Hotzkow, and Andreas Zeller
(Saarland University, Germany)
Android applications (apps) represent an ever-increasing portion of the software market. Automated test input generators are the state of the art for testing and security analysis. We introduce DroidMate-2 (DM-2), a platform that assists both developers and researchers in customizing, developing, and testing new test generators. DM-2 can be used without app instrumentation or operating-system modifications, as a test generator on real devices and emulators for app testing or regression testing. Additionally, it provides sensitive-resource monitoring or blocking capabilities through a lightweight app instrumentation, out-of-the-box statement-coverage measurement through a fully-fledged app instrumentation, and native experiment reproducibility. In our experiments we compared DM-2 against DroidBot, a state-of-the-art test generator, by measuring statement coverage. Our results show that DM-2 reached 96% of its peak coverage in less than 2/3 of the time needed by DroidBot, allowing for better and more efficient tests. On short runs (5 minutes) DM-2 outperformed DroidBot by 7%, while on longer runs (1 hour) this difference increased to 8%.
ACM DL Artifact: https://www.doi.org/10.1145/3264864
For the details see: https://github.com/uds-se/droidmate/wiki/ASE-2018:-Data
@InProceedings{ASE18p916,
author = {Nataniel P. Borges Jr. and Jenny Hotzkow and Andreas Zeller},
title = {DroidMate-2: A Platform for Android Test Generation},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {916--919},
doi = {10.1145/3238147.3240479},
year = {2018},
}
Publisher's Version
Published Artifact
Info
Artifacts Available
Doctoral Symposium
Session 1
Assessing and Evaluating Functional Suitability of Software
Philipp Haindl
(JKU Linz, Austria)
While formal task models allow the definition and time-based assessment of user interaction, they have not yet been used as a baseline for assessing the variability patterns of user interaction with software. Operationalizing these variability patterns enables us to evaluate how well the defined task-execution paths suit the user in achieving a predefined goal. Improvements to the software could thereby be based on evidence of how changes affect functional suitability for the user, rather than on prospective or opinion-based usage scenarios. Following a design-thinking approach tailored to software engineering, understanding and observing these variability patterns are mandatory steps for continuous, user-centered improvement of software. In practice, however, due to the absence of an operational quality model for functional suitability, it is hardly possible to put these operational measures into context and to derive concrete software-improvement actions. Operational suitability metrics are all the more important for software engineers because they allow development activities to focus on functionality that has business value for the user.
@InProceedings{ASE18p920,
author = {Philipp Haindl},
title = {Assessing and Evaluating Functional Suitability of Software},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {920--923},
doi = {10.1145/3238147.3241531},
year = {2018},
}
Publisher's Version
Session 2
Automatic Mining of Constraints for Monitoring Systems of Systems
Thomas Krismayer
(JKU Linz, Austria)
The behavior of complex software-intensive systems of systems often only fully emerges during operation, when all systems interact with each other and with their environment. Runtime monitoring approaches are thus used to detect deviations from the expected behavior, which is commonly defined by engineers, e.g., using temporal logic or domain-specific languages. However, the deep domain knowledge required to specify constraints is often not available during the development of systems of systems with multiple teams independently working on heterogeneous components. In this paper, we thus describe our ongoing PhD research to automatically mine constraints for runtime monitoring from recorded events. Our approach mines constraints on event occurrence, timing, data, and combinations of these properties. The approach further presents the mined constraints to users, offering multiple ranking strategies, and can also be used to support users in system evolution scenarios.
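As an illustration of the kind of constraint targeted here, the following C sketch checks a hypothetical mined timing constraint ("every request is answered within 200 ms") against a small recorded event log. The event names, fields, and bound are examples, not the mining algorithm proposed in the thesis.

    #include <stdio.h>
    #include <string.h>

    typedef struct {
      const char *name;
      long        timestamp_ms;
    } event;

    /* Constraint: every "request" must be followed by a "response"
       within max_delay_ms. */
    static int check_response_time(const event *log, int n, long max_delay_ms) {
      for (int i = 0; i < n; i++) {
        if (strcmp(log[i].name, "request") != 0) continue;
        int ok = 0;
        for (int j = i + 1; j < n; j++) {
          if (strcmp(log[j].name, "response") == 0) {
            ok = (log[j].timestamp_ms - log[i].timestamp_ms) <= max_delay_ms;
            break;
          }
        }
        if (!ok) return 0;   /* violation detected */
      }
      return 1;
    }

    int main(void) {
      event log[] = {
        {"request", 0},   {"response", 120},
        {"request", 500}, {"response", 650},
      };
      printf("constraint holds: %d\n", check_response_time(log, 4, 200));
      return 0;
    }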
@InProceedings{ASE18p924,
author = {Thomas Krismayer},
title = {Automatic Mining of Constraints for Monitoring Systems of Systems},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {924--927},
doi = {10.1145/3238147.3241532},
year = {2018},
}
Publisher's Version
Towards Automatic Restrictification of CUDA Kernel Arguments
Rokiatou Diarra
(University of Paris-Sud, France)
Many procedural languages, such as C and C++, have pointers. Pointers are powerful and convenient, but pointer aliasing still hinders compiler optimizations, despite several years of research on pointer-aliasing analysis. Because alias analysis is a difficult task and its results are not always accurate, the ISO C99 standard added a keyword, named restrict, that allows the programmer to specify non-aliasing as an aid to the compiler's optimizer and thereby possibly improve performance. The task of annotating pointers with the restrict keyword is still left to the programmer. This task is, in general, tedious and prone to errors, especially since the compiler does not perform any verification to ensure that the restrict keyword is not misplaced. In this paper we present a static analysis tool that (i) finds CUDA kernel call sites in which actual parameters do not alias; (ii) clones the kernels called at such sites; (iii) after performing an alias analysis in these kernels, adds the restrict keyword to their arguments; and (iv) replaces the original kernel call by a call to the optimized clone whenever possible.
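For reference, this is what the restrict qualifier expresses; the example below is plain C rather than a CUDA kernel, and is not taken from the paper.

    #include <stdio.h>

    /* Because out, a, and b are promised not to alias, the compiler may keep
       loaded values in registers and vectorize the loop without re-loading
       after each store to out[i]. */
    void add(int n, float *restrict out,
             const float *restrict a, const float *restrict b) {
      for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
    }

    int main(void) {
      float a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1}, out[4];
      add(4, out, a, b);   /* the three arrays indeed do not alias */
      printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
      return 0;
    }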
@InProceedings{ASE18p928,
author = {Rokiatou Diarra},
title = {Towards Automatic Restrictification of CUDA Kernel Arguments},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {928--931},
doi = {10.1145/3238147.3241533},
year = {2018},
}
Publisher's Version
A DSL for Requirements in the Context of a Seamless Approach
Florian Galinier
(IRIT, France; University of Toulouse, France)
Reducing the lack of consistency between requirements and the system that should satisfy them is one of the major issues in Requirements Engineering (RE).
The objective of my thesis work is to propose a seamless approach that allows users to express requirements, specifications, and the system itself in a single language.
The purpose of formal approaches is to reduce inconsistency.
However, most developers are not familiar with these approaches, and they are not often used outside the critical systems domain.
Since we want non-experts to also be able to use our approach to validate systems in the early stages of their development, we propose a Domain-Specific Language (DSL) that is: (i) close to natural language, and (ii) based on formal semantics.
Using Model-Driven Engineering (MDE), this language bridges the gap not only between the several stakeholders that can be involved in a project, considering their different backgrounds, but also between the requirements and the code.
@InProceedings{ASE18p932,
author = {Florian Galinier},
title = {A DSL for Requirements in the Context of a Seamless Approach},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {932--935},
doi = {10.1145/3238147.3241538},
year = {2018},
}
Publisher's Version
Session 3
A Multi-objective Framework for Effective Performance Fault Injection in Distributed Systems
Luca Traini
(University of L'Aquila, Italy)
Modern distributed systems should be built to anticipate performance degradation. Requests in these systems often involve tens to thousands of Remote Procedure Calls (RPCs), each of which can be a source of performance degradation. The PhD programme presented here intends to address this issue by providing automated instruments to effectively drive performance fault injection in distributed systems. The envisioned approach exploits multi-objective search-based techniques to automatically find small combinations of tiny performance degradations, induced by specific RPCs, that have significant impact on the user-perceived performance. Automating the search for these events will improve the ability to inject performance issues in production in order to force developers to anticipate and mitigate them.
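A rough picture of the two competing objectives, under a made-up topology: the end-to-end latency is the slower of two parallel RPC chains, and a search would minimize the total injected delay while maximizing the slowdown perceived by the user. The sketch below is hypothetical and not the actual fitness definition of the thesis.

    #include <stdio.h>

    #define NUM_RPCS 4

    /* Baseline latency (ms) of each RPC. */
    static const int base_ms[NUM_RPCS] = { 20, 30, 25, 25 };

    static int end_to_end_ms(const int inject_ms[NUM_RPCS]) {
      /* RPCs 0-1 form one sequential chain, RPCs 2-3 another; the two chains
         run in parallel, so the user sees the slower of the two. */
      int chain_a = base_ms[0] + inject_ms[0] + base_ms[1] + inject_ms[1];
      int chain_b = base_ms[2] + inject_ms[2] + base_ms[3] + inject_ms[3];
      return chain_a > chain_b ? chain_a : chain_b;
    }

    int main(void) {
      int candidate[NUM_RPCS] = { 0, 15, 0, 0 };  /* one tiny injected delay */
      int cost = 0;                               /* objective 1: minimize */
      for (int i = 0; i < NUM_RPCS; i++) cost += candidate[i];
      int impact = end_to_end_ms(candidate)       /* objective 2: maximize */
                 - end_to_end_ms((int[NUM_RPCS]){0});
      printf("injected %d ms, user-perceived slowdown %d ms\n", cost, impact);
      return 0;
    }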
@InProceedings{ASE18p936,
author = {Luca Traini},
title = {A Multi-objective Framework for Effective Performance Fault Injection in Distributed Systems},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {936--939},
doi = {10.1145/3238147.3241535},
year = {2018},
}
Publisher's Version
Top-Down Model-Driven Engineering of Web Services from Extended OpenAPI Models
David Sferruzza
(LS2N, France)
Web services engineering is a crucial subject, because web services are often built to be used by other programs; thus they should have good documentation targeting developers. Furthermore, when building a digital product, engineers need to build several programs that interact with a central instance of web services. OpenAPI, a popular industry standard, makes it possible to document web services in order to quickly make a prototype of the product. It allows a top-down process where developers iterate to build an OpenAPI model that describes the web services they want, and then implement both the web services and the programs that will consume them. However, when doing such rapid prototyping, developers tend to either skip this design phase and implement web services right away, or stop updating the OpenAPI model when the product is released; in both cases they cannot take advantage of having an OpenAPI model aligned with the implementation. We show how OpenAPI can be extended to add implementation details inside models. These extensions link services to assemblies of components that describe computations. This enables a top-down development process that keeps model and implementation aligned. Moreover, it makes it possible for developers to benefit from more support features while keeping the same flexibility.
@InProceedings{ASE18p940,
author = {David Sferruzza},
title = {Top-Down Model-Driven Engineering of Web Services from Extended OpenAPI Models},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {940--943},
doi = {10.1145/3238147.3241536},
year = {2018},
}
Publisher's Version
Differential Program Analysis with Fuzzing and Symbolic Execution
Yannic Noller
(Humboldt-Universität zu Berlin, Germany)
Differential program analysis means identifying the behavioral divergences in one or multiple programs, and it can be classified into two categories: identifying the behavioral divergences (1) between two program versions for the same input (aka regression analysis), and (2) for the same program with two different inputs (e.g., side-channel analysis). Most existing approaches for both sub-problems rely on a single technique and therefore suffer from its weaknesses, such as scalability issues or imprecision. This research proposes to combine two very strong techniques, namely fuzzing and symbolic execution, to tackle these problems and provide scalable solutions for real-world applications. The proposed approaches will be implemented on top of state-of-the-art tools like AFL and Symbolic PathFinder, and evaluated against existing work.
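As a small illustration of the first category (regression analysis), the hypothetical C example below shows two versions of a function that diverge for some inputs; a combination of fuzzing and symbolic execution would search for such divergence-revealing inputs automatically.

    #include <stdio.h>

    static int price_v1(int quantity) {
      return quantity * 10;
    }

    static int price_v2(int quantity) {        /* refactored version */
      int total = quantity * 10;
      if (quantity > 100) total -= 50;         /* newly added bulk discount */
      return total;
    }

    int main(void) {
      /* A naive sweep over a few inputs; differential analysis replaces this
         with guided search over the whole input space. */
      for (int x = 0; x <= 200; x += 50) {
        if (price_v1(x) != price_v2(x))
          printf("divergence for input %d: %d vs %d\n",
                 x, price_v1(x), price_v2(x));
      }
      return 0;
    }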
@InProceedings{ASE18p944,
author = {Yannic Noller},
title = {Differential Program Analysis with Fuzzing and Symbolic Execution},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {944--947},
doi = {10.1145/3238147.3241537},
year = {2018},
}
Publisher's Version
Software Engineering Techniques Applied to Relational Databases
Julien Delplanque
(University of Lille, France; CNRS, France; Inria, France)
Relational databases play a central role in software systems. Like any software artefact, they are subject to software-engineering problems. This applies even more when these databases hold behaviour in the form of stored functions, triggers, etc. The purpose of this article is to present a research plan to address these problems.
@InProceedings{ASE18p948,
author = {Julien Delplanque},
title = {Software Engineering Techniques Applied to Relational Databases},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {948--951},
doi = {10.1145/3238147.3241534},
year = {2018},
}
Publisher's Version
Journal-First Papers
Automatically Quantifying the Impact of a Change in Systems (Journal-First Abstract)
Nada Almasri, Luay Tahat, and Bogdan Korel
(Gulf University for Science and Technology, Kuwait; Illinois Institute of Technology, USA)
Software maintenance is becoming more challenging with the increased complexity of the software and the frequently applied changes. Performing impact analysis before the actual implementation of a change is a crucial task during system maintenance. While many tools and techniques are available to measure the impact of a change at the code level, little research has been done on measuring the impact of a change at an earlier stage in the development process. Measuring the impact of a change at the model level speeds up the maintenance process, allowing early discovery of critical components of the system before applying the actual change at the code level. In this paper, we present a model-based impact analysis approach for state-based systems such as telecommunication or embedded systems. The proposed approach uses model dependencies to automatically measure the expected impact of a requested change instead of relying on the expertise of system maintainers, and it generates two impact sets representing the lower bound and the upper bound of the impact. Although it can be extended to other behavioral models, the presented approach mainly addresses extended finite-state machine (EFSM) models. An empirical study is conducted on six EFSM models to investigate the usefulness of the proposed approach. The results show that on average the size of the impact after a single modification (a change in one EFSM transition) ranges between 14% and 38% of the total size of the model. For a modification involving multiple transitions, the average size of the impact ranges between 30% and 64% of the total size of the model. Additionally, we investigated the relationships (correlation) between the structure of the EFSM model and the size of the impact sets. Upon preliminary analysis of the correlation, the concepts of model density and data density were defined, and it was found that they could be the major factors influencing the sizes of impact sets for models. As a result, these factors can be used to determine the types of models for which the proposed approach is the most appropriate.
@InProceedings{ASE18p952,
author = {Nada Almasri and Luay Tahat and Bogdan Korel},
title = {Automatically Quantifying the Impact of a Change in Systems (Journal-First Abstract)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {952--952},
doi = {10.1145/3238147.3241984},
year = {2018},
}
Publisher's Version
Estimating the Number of Remaining Links in Traceability Recovery (Journal-First Abstract)
Davide Falessi, Massimiliano Di Penta, Gerardo Canfora, and Giovanni Cantone
(California Polytechnic State University, USA; University of Sannio, Italy; University of Rome Tor Vergata, Italy)
Although very important in software engineering, establishing traceability links between software artifacts is extremely tedious, error-prone, and requires significant effort. Even when approaches for automated traceability recovery exist, they provide the requirements analyst with a, usually very long, ranked list of candidate links that needs to be manually inspected. In this paper we introduce an approach called Estimation of the Number of Remaining Links (ENRL), which aims at estimating, via Machine Learning (ML) classifiers, the number of remaining positive links in a ranked list of candidate traceability links produced by a recovery approach based on Natural Language Processing (NLP) techniques. We have evaluated the accuracy of the ENRL approach by considering several ML classifiers and NLP techniques on three datasets from industry and academia, concerning traceability links among different kinds of software artifacts including requirements, use cases, design documents, source code, and test cases. Results from our study indicate that: (i) specific estimation models are able to provide accurate estimates of the number of remaining positive links; (ii) the estimation accuracy depends on the choice of the NLP technique; and (iii) univariate estimation models outperform multivariate ones.
@InProceedings{ASE18p953,
author = {Davide Falessi and Massimiliano Di Penta and Gerardo Canfora and Giovanni Cantone},
title = {Estimating the Number of Remaining Links in Traceability Recovery (Journal-First Abstract)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {953--953},
doi = {10.1145/3238147.3241982},
year = {2018},
}
Publisher's Version
Bounded Model Checking of C++ Programs Based on the Qt Cross-Platform Framework (Journal-First Abstract)
Felipe R. Monteiro, Mário A. P. Garcia, Lucas C. Cordeiro, and Eddie B. de Lima Filho
(Federal University of Amazonas, Brazil; University of Manchester, UK; TPV Technology, Brazil)
This work proposes an abstraction of the Qt framework, named Qt Operational Model (QtOM), which is integrated into two different verification approaches: explicit-state model checking and symbolic (bounded) model checking. The proposed methodology is the first to formally verify Qt-based applications, and has the potential to devise new directions for software verification of portable code. The full version of this paper was published in Software Testing, Verification and Reliability on 2 March 2017 and is available at https://doi.org/10.1002/stvr.1632.
@InProceedings{ASE18p954,
author = {Felipe R. Monteiro and Mário A. P. Garcia and Lucas C. Cordeiro and Eddie B. de Lima Filho},
title = {Bounded Model Checking of C++ Programs Based on the Qt Cross-Platform Framework (Journal-First Abstract)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {954--954},
doi = {10.1145/3238147.3241981},
year = {2018},
}
Publisher's Version
Video
Info
Understanding Semi-structured Merge Conflict Characteristics in Open-Source Java Projects (Journal-First Abstract)
Paola Accioly, Paulo Borba, and Guilherme Cavalcanti
(Federal University of Pernambuco, Brazil)
In a collaborative development environment, tasks are commonly assigned to developers working independently from each other. As a result, when trying to integrate these contributions, one might have to deal with conflicting changes. Such conflicts might be detected when merging contributions (merge conflicts), when building the system (build conflicts), or when running tests (semantic conflicts). Regarding such conflicts, previous studies show that they occur frequently and impair developers’ productivity, as understanding and solving them is a demanding and tedious task that might introduce defects.
However, despite the existing evidence in the literature, the structure of changes that lead to conflicts has not been studied yet. Understanding the underlying structure of conflicts, and the involved syntactic language elements, might shed light on how to better avoid them. For example, awareness tools that inform users about ongoing parallel changes such as Syde and Palantír can benefit from knowing the most common conflict patterns to become more efficient. With that aim, in this paper we focus on understanding the underlying structure of merge conflicts.
@InProceedings{ASE18p955,
author = {Paola Accioly and Paulo Borba and Guilherme Cavalcanti},
title = {Understanding Semi-structured Merge Conflict Characteristics in Open-Source Java Projects (Journal-First Abstract)},
booktitle = {Proc.\ ASE},
publisher = {ACM},
pages = {955--955},
doi = {10.1145/3238147.3241983},
year = {2018},
}
Publisher's Version