SANER 2019
2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering (SANER)

2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering (SANER), February 24-27, 2019, Hangzhou, China

SANER 2019 – Proceedings



Title Page

Message from the Chairs




Software Engineering in a Data Science Future (Keynote)
Ahmed E. Hassan
(Queen's University, Canada)
Machine Learning (ML) advances continue to be headline news. Enormous investments are being poured into Data Science (DS) and Artificial Intelligence (AI) initiatives worldwide. Hiring managers are turning over every rock looking for ML, DS, and AI experts and even novices! Forbes proclaims that software engineers will be replaced by deep learners in the not-so-distant future. In this talk I will highlight the crucial role of Software Engineering (SE) in this ML/DS/AI future. I will follow it up with a critical look at many of the challenges and risks that such sophisticated advances bring to software research and practice.

Does Your Software Value What You Value? (Keynote)
Jon Whittle
(Monash University, Australia)
Software engineering has generally done a good job of building software systems that deliver the intended functionality at the expected cost and that are safe, secure, and reliable. However, there is a broader set of human values -- such as transparency, integrity, diversity, compassion, and social justice -- that is largely ignored when we develop software systems. In this talk, I will argue that software development methods should place more emphasis on these human values so that we do a better job of building software that aligns with our individual, corporate, or societal values. Furthermore, drawing on recent evidence from case studies in industry, I will argue that dealing with human values in software systems is not just of interest to a small group of organisations; rather, all software projects should think about human values, build them in where appropriate, test for them, and use them to drive design decisions. When they are not dealt with in this way, there can be severe social and economic consequences.

Forward and Backward Traceability: Requirements and Challenges (Keynote)
Zhi Jin
(Peking University, China)
Traceability is originally one of the essential activities of good requirements management, ensuring that the right products are being built at each phase of the software development life cycle. Forward traceability has been proposed for tracing the progress of development and analyzing the impact of changed requirements in order to reduce change effort. It will become even more important in resilient cyber-physical systems, as such a system needs not only to replicate or modify itself but also to evolve the design by which it is replicated or modified. For these systems, bi-directional traceability is in high demand and, in some sense, becomes an embedded capability of the system itself. This talk will analyze the requirements and the technical challenges of building bi-directional traceability into resilient cyber-physical systems.


Research Papers

Software Multiple-Level Change Detection Based on Two-Step MPAT Matching
Tong Wang, Dongdong Wang, Ying Zhou, and Bixin Li
(Southeast University, China)
During software evolution, change detection plays an important role in software maintenance. For example, based on the detected changes, developers need to verify whether the evolved contents are consistent with the evolution plan and to make a regression testing plan. Currently, text-based and AST-based matching methods are widely used to detect changes. However, the text-based method cannot identify updates and renamings, so the accuracy of its detection results is not high enough; the AST-based matching method only reflects changes between files, so its results are incomplete for understanding software evolution from an overall perspective. To solve these problems, we propose a software multiple-level change detection method based on two-step MPAT (multilevel program analysis tree) matching, which detects changes between two programs at multiple levels to improve accuracy and provides comprehensive change information. Our approach consists of three steps. Firstly, we construct an MPAT for each program. Secondly, we apply the two-step matching algorithm to detect changes based on the two MPATs. Thirdly, these changes are classified and clustered. Finally, we develop ChangeAnalyzer to implement our approach and conduct experiments on six projects to evaluate the accuracy, performance, and usefulness of ChangeAnalyzer.

Pruning the AST with Hunks to Speed up Tree Differencing
Chunhua Yang and E. James Whitehead
(QILU University of Technology, China; University of California at Santa Cruz, USA)
Inefficiency is a problem in tree-differencing approaches. As textual-differencing approaches are highly efficient and the hunks they return reflect the line ranges of the modified code, we propose a novel approach that prunes the AST with hunks. We define pruning strategies at the declaration level and the statement level, respectively, and we have designed an algorithm to implement these strategies. Furthermore, we have integrated the algorithm into ChangeDistiller and GumTree. An evaluation on four open source projects shows that the approach is very effective in reducing the number of nodes and shortening the running time. On average, with declaration-level pruning, the number of nodes in the two tools is reduced by at least 64%; with statement-level pruning, the number of nodes in both tools is reduced by at least 74%. Using declaration-level and statement-level pruning, GumTree's runtime is reduced by at least 70% and 75%, respectively.

Expressions of Sentiments during Code Reviews: Male vs. Female
Rajshakhar Paul, Amiangshu Bosu, and Kazi Zakia Sultana
(Wayne State University, USA; Montclair University, USA)
As most software development organizations are male-dominated, many female developers encounter negative workplace experiences and report feeling that they "do not belong". Exposure to discriminatory expletives or negative critiques from their male colleagues may further exacerbate those feelings. The primary goal of this study is to identify differences in expressions of sentiment between male and female developers during various software engineering tasks. Toward this goal, we mined the code review repositories of six popular open source projects. We used a semi-automated approach leveraging names as well as multiple social networks to identify the gender of each developer. Using SentiSE, a customized, state-of-the-art sentiment analysis tool for the software engineering domain, we classify each communication as negative, positive, or neutral. We also compute the frequencies of sentiment words, emoticons, and expletives used by each developer.
Our results suggest that the likelihood of using sentiment words, emoticons, and expletives during code reviews varies based on the gender of a developer, as females are significantly less likely to express sentiments than males. Although female developers were more neutral toward their male colleagues than toward other females, male developers in three of the six projects were not only writing more frequent negative comments but also withholding positive encouragement from their female counterparts. Our results provide empirical evidence of another factor behind the negative workplace experiences encountered by female developers, one that may be contributing to the diminishing number of females in the SE industry.

A Study on the Interplay between Pull Request Review and Continuous Integration Builds
Fiorella Zampetti, Gabriele Bavota, Gerardo Canfora, and Massimiliano Di Penta
(University of Sannio, Italy; USI Lugano, Switzerland)
Modern code review (MCR) is nowadays well adopted in industrial and open source projects. Recent studies have investigated how developers perceive its ability to foster code quality, developers' code ownership, and team building. MCR is often used together with automated quality checks through static analysis tools, testing, or, ultimately, automated builds on a Continuous Integration (CI) infrastructure. With the aim of understanding how developers use the outcome of CI builds during code review and, more specifically, during the discussion of pull requests, this paper empirically investigates the interplay between pull request discussions and the use of CI by means of 64,865 pull request discussions belonging to 69 open source projects. After analyzing to what extent a build outcome influences the pull request merge, we qualitatively analyze the content of 857 pull request discussions. We complement this analysis with a survey involving 13 developers. While pull requests with a passed build have a higher chance of being merged than failed ones, and while survey participants confirmed this quantitative finding, other process-related factors play a more important role in the pull request merge decision. The survey participants also point out cases where a pull request can be merged in the presence of a CI failure, e.g., when a new pull request is opened to cope with the failure, or when the failure is due to minor static analysis warnings. The study also indicates that CI introduces extra complexity, as in many pull requests developers have to solve non-trivial CI configuration issues.

Confusion in Code Reviews: Reasons, Impacts, and Coping Strategies
Felipe Ebert, Fernando Castor, Nicole Novielli, and Alexander Serebrenik
(Federal University of Pernambuco, Brazil; University of Bari, Italy; Eindhoven University of Technology, Netherlands)
Code review is a software quality assurance practice widely employed in both open source and commercial software projects to detect defects, transfer knowledge and encourage adherence to coding standards. Notwithstanding, code reviews can also delay the incorporation of a code change into a code base, thus slowing down the overall development process. Part of this delay is often a consequence of reviewers not understanding, becoming confused by, or being uncertain about the intention, behavior, or effect of a code change.
We investigate the reasons for and impacts of confusion in code reviews, as well as the strategies developers adopt to cope with it. We employ a concurrent triangulation strategy to combine the analyses of survey responses and code review comments, and build a comprehensive confusion framework structured along the dimensions of the review process, the artifact being reviewed, the developers themselves, and the relation between the developer and the artifact. The most frequent reasons for confusion are a missing rationale, discussion of non-functional requirements of the solution, and lack of familiarity with existing code. Developers report that confusion delays the merge decision, decreases review quality, and results in additional discussions. To cope with confusion, developers request information, improve their familiarity with existing code, and discuss offline.
Based on the results, we provide a series of implications for tool builders, as well as insights and suggestions for researchers. The results of our work offer empirical justification for the need to improve code review tools to support developers facing confusion.

Deep Review Sharing
Chenkai Guo, Dengrong Huang, Naipeng Dong, Quanqi Ye, Jing Xu, Yaqing Fan, Hui Yang, and Yifan Xu
(Nankai University, China; National University of Singapore, Singapore; Advanced Digital Sciences Center, Singapore)
Review-Based Software Improvement (RBSI for short) has drawn increasing research attention in recent years. Relevant efforts focus on how to leverage the information within reviews to better guide further updates. However, few efforts consider Projects Without sufficient Reviews (PWR for short). In fact, PWRs dominate software projects, and the lack of PWR-based RBSI research severely blocks the improvement of such software. In this paper, we make the first attempt to pave the road. Our goal is to establish a generic framework for sharing suitable and informative reviews with arbitrary PWRs. To achieve this goal, we exploit techniques of code clone detection and review ranking. To improve the sharing precision, we introduce a Convolutional Neural Network (CNN) into our clone detection and design a novel CNN-based clone searching module for our sharing system. Meanwhile, we adopt a heuristic filtering strategy to reduce the sharing time cost. We implement a prototype review sharing system, RSharer, and collect 72,440 code-review pairs as our ground knowledge. Empirical experiments on hundreds of real code fragments verify the effectiveness of RSharer. RSharer also received positive responses and evaluations from expert developers.

A Comparative Study of Software Bugs in Micro-clones and Regular Code Clones
Judith F. Islam, Manishankar Mondal, and Chanchal K. Roy
(University of Saskatchewan, Canada)
Reusing a code fragment through copy/pasting, also known as code cloning, is a common practice during software development and maintenance. Most of the existing studies on code clones ignore micro-clones where the size of a micro-clone fragment can be 1 to 4 LOC. In this paper we compare the bug-proneness of micro-clones with that of regular code clones. From thousands of revisions of six diverse open-source subject systems written in three languages (C, C#, and Java), we identify and investigate both regular and micro-clones that are associated with reported bugs.
Our experiment reveals that the percentage of code fragments changed by bug-fix commits is significantly higher in micro-clones than in regular clones. The number of consistent changes due to bug-fix commits is also significantly higher in micro-clones than in regular clones. We further observe that a significantly higher percentage of files is affected by bug-fix commits in micro-clones than in regular clones. Finally, we found that the percentage of severe bugs is significantly higher in micro-clones than in regular clones. We perform the Mann-Whitney-Wilcoxon (MWW) test to evaluate the statistical significance of our experimental results. Our findings imply that micro-clones should be emphasized during clone management and software maintenance.

On Precision of Code Clone Detection Tools
Farima Farmahinifarahani, Vaibhav Saini, Di Yang, Hitesh Sajnani, and Cristina V. Lopes
(University of California at Irvine, USA; Microsoft, USA)
Precision and recall are the main metrics used to measure the correctness of clone detectors. These metrics require the existence of labeled datasets containing the ground truth – samples of clone and non-clone pairs. For source code clone detectors, in particular, there are some techniques, as well as a concrete framework, for automatically evaluating recall, down to different types of clones. However, evaluating precision is still challenging, because of the intensive and specialized manual effort required to accomplish the task. Moreover, when precision is reported, it is typically done over all types of clones, making it hard to assess the strengths and weaknesses of the corresponding clone detectors. This paper presents systematic experiments to evaluate precision of eight code clone detection tools. Three judges independently reviewed 12,800 clone pairs to compute the undifferentiated and type-based precision of these tools. Besides providing a useful baseline for future research in code clone detection, another contribution of our work is to unveil important considerations to take into account when doing precision measurements and reporting the results. Specifically, our work shows that the reported precision of these tools leads to significantly different conclusions and insights about the tools when different types of clones are taken into account. It also stresses, once again, the importance of reporting inter-rater agreement.

Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection
Lutz Büch and Artur Andrzejak
(University of Heidelberg, Germany)
Code clone detection remains a crucial challenge in maintaining software projects. Many classic approaches rely on handcrafted aggregation schemes, while recent work uses supervised or unsupervised learning. In this work, we study several aspects of aggregation schemes for code clone detection based on supervised learning. To this end, we implement an AST-based Recursive Neural Network. Firstly, our ablation study shows the influence of model choices and hyperparameters. We introduce error scaling as a way to effectively and efficiently address the class imbalance problem arising in code clone detection. Secondly, we study the influence of pretrained embeddings representing nodes in ASTs. We show that simply averaging all node vectors of a given AST yields a strong baseline aggregation scheme. Further, learned AST aggregation schemes greatly benefit from pretrained node embeddings. Finally, we show the importance of carefully separating training and test data by clone clusters, to reliably measure the generalization of models learned with supervision.

Fuzzing Program Logic Deeply Hidden in Binary Program Stages
Yanhao Wang, Zheng Leong Chua, Yuwei Liu, Purui Su, and Zhenkai Liang
(Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; National University of Singapore, Singapore)
Fuzzing is an effective method to identify bugs and security vulnerabilities in software. One particular difficulty faced by fuzzing is how to effectively generate inputs to cover program paths, especially for programs with complex logic. We observe that complex programs are often composed of components, which is a natural result of software engineering principles. The components interface with each other using memory buffers, forming stages of processing in the program logic. Program logic in later stages is difficult for fuzzers to reach. In this paper, we develop a novel solution to fuzz such program logic, called STAGEFUZZER. It identifies the stages and memory interfaces from program binaries, and fuzzes later stages of the program effectively. In our evaluation with a suite of typical binaries, STAGEFUZZER correctly identifies the program structure and effectively increases the coverage of program logic compared to the AFL fuzzer.

How Stable Are Eclipse Application Framework Internal Interfaces?
John Businge, Simon Kawuma, Moses Openja, Engineer Bainomugisha, and Alexander Serebrenik
(Mbarara University of Science and Technology, Uganda; Makerere University, Uganda; Eindhoven University of Technology, Netherlands)
The Eclipse framework provides two kinds of interfaces: stable interfaces (APIs) and unstable interfaces (non-APIs). Although non-APIs are discouraged and unsupported, their usage is not uncommon. Previous studies showed that applications using relatively old non-APIs are more likely to be compatible with new releases than those using newly introduced non-APIs; that the growth rate of non-APIs is nearly twice that of APIs; and that the promotion of non-APIs to APIs happens at a slow pace, since API providers have no assistance in identifying public interface candidates.
Motivated by these findings, our main aim was to empirically investigate the entire population (2,380K) of non-APIs to find those that remain stable for a long period of time. We employ cross-project clone detection to identify whether non-APIs introduced in a given Eclipse release remain stable over successive releases. We provide a dataset of 327K stable non-API methods that Eclipse interface providers can use as possible candidates for promotion. Instead of promoting non-APIs that are too fine-grained, we summarize the groups of non-API methods in given classes that are stable together and present class-level non-APIs that are possible candidates for promotion. We have shown that it is possible to predict the stability of a non-API in subsequent Eclipse releases with a precision of ≥56%, a recall of ≥96%, an AUC of ≥92%, and an F-measure of ≥81%. We have also shown that the length of a method and the number of method parameters in a non-API method are very good predictors of its stability in successive Eclipse releases. These results can help API providers estimate a priori how much work would be involved in performing the promotion.

Unveiling Exception Handling Guidelines Adopted by Java Developers
Hugo Melo, Roberta Coelho, and Christoph Treude
(Federal University of Rio Grande do Norte, Brazil; University of Adelaide, Australia)
Despite being an old language feature, Java exception handling code is one of the least understood parts of many systems. Several studies have analyzed the characteristics of exception handling code, trying to identify common practices or even link such practices to software bugs. Few works, however, have investigated exception handling issues from the point of view of developers. None of these works has focused on discovering the exception handling guidelines adopted by current systems -- which are likely to be a driver of common practices. In this work, we conducted a qualitative study based on semi-structured interviews and a survey whose goal was to investigate the guidelines that are (or should be) followed by developers in their projects. Initially, we conducted semi-structured interviews with seven experienced developers, which informed the design of a survey targeting a broader group of Java developers (i.e., active Java developers from top-starred projects on GitHub). We emailed 863 developers and received 98 valid answers. The study shows that exception handling guidelines usually exist (70%) but are usually implicit and undocumented (54%). Our study identifies 48 exception handling guidelines related to seven different categories. We also investigated how such guidelines are disseminated to the project team and how compliance between code and guidelines is verified; we observed that, according to more than half of the respondents, the guidelines are both disseminated and verified through code inspection or code review. Our findings provide software development teams with a means to improve exception handling guidelines based on insights from the state of practice of 87 software projects.

Migrating to GraphQL: A Practical Assessment
Gleison Brito, Thais Mombach, and Marco Tulio Valente
(Federal University of Minas Gerais, Brazil)
GraphQL is a novel query language proposed by Facebook to implement Web-based APIs. In this paper, we present a practical study on migrating API clients to this new technology. First, we conduct a grey literature review to gain an in-depth understanding of the benefits and key characteristics normally associated with GraphQL by practitioners. After that, we assess these benefits in practice by migrating seven systems to use GraphQL instead of standard REST-based APIs. As our key result, we show that GraphQL can reduce the size of the JSON documents returned by REST APIs by 94% (in number of fields) and by 99% (in number of bytes), both median results.

Are Refactorings to Blame? An Empirical Study of Refactorings in Merge Conflicts
Mehran Mahmoudi, Sarah Nadi, and Nikolaos Tsantalis
(University of Alberta, Canada; Concordia University, Canada)
With the rise of distributed software development, branching has become a popular approach that facilitates collaboration between software developers. One of the biggest challenges that developers face when using multiple development branches is dealing with merge conflicts. Conflicts occur when inconsistent changes happen to the code. Resolving these conflicts can be a cumbersome task as it requires prior knowledge about the changes in each of the development branches. A type of change that could potentially lead to complex conflicts is code refactoring. Previous studies have proposed techniques for facilitating conflict resolution in the presence of refactorings. However, the magnitude of the impact that refactorings have on merge conflicts has never been empirically evaluated. In this paper, we perform an empirical study on almost 3,000 well-engineered open-source Java software repositories and investigate the relation between merge conflicts and 15 popular refactoring types. Our results show that refactoring operations are involved in 22% of merge conflicts, which is remarkable taking into account that we investigated a relatively small subset of all possible refactoring types. Furthermore, certain refactoring types, such as Extract Method, tend to be more problematic for merge conflicts. Our results also suggest that conflicts that involve refactored code are usually more complex, compared to conflicts with no refactoring changes.

Accurate Design Pattern Detection Based on Idiomatic Implementation Matching in Java Language Context
Renhao Xiong and Bixin Li
(Southeast University, China)
Design patterns (DPs) are widely accepted as solutions to recurring problems in software design. While numerous approaches and tools have been proposed for DP detection over the years, neglecting the language-specific mechanisms that underlie the implementation idioms of DPs leads to false or missing DP instances, since language-specific features are not captured and similar characteristics are not distinguished. However, there is still a lack of research emphasizing idiomatic implementation in the context of a specific language. A vital challenge is the representation of software systems and language mechanisms. In this work, we propose a practical approach for DP detection from source code, which exploits idiomatic implementation in the context of the Java language. DPs are formally defined under the blueprint of the layered knowledge graph (LKG), which models both the language-independent concepts of DPs and the Java language mechanisms. Based on static analysis and inference techniques, the approach enables flexible search strategies integrating the structural, behavioral, and semantic aspects of DPs for detection. Concerning emerging patterns and pattern variants, the core methodology supports pluggable pattern templates. A prototype implementation has been evaluated on five open source software systems and compared with three other approaches. The evaluation results show that the proposed approach improves accuracy, with higher precision (85.7%) and recall (93.8%). Its runtime performance also supports its practical applicability.

Detecting Feature-Interaction Symptoms in Automotive Software using Lightweight Analysis
Bryan J. Muscedere, Robert Hackman, Davood Anbarnam, Joanne M. Atlee, Ian J. Davis, and Michael W. Godfrey
(University of Waterloo, Canada)
Modern automotive software systems are large, complex, and feature rich; they can contain over 100 million lines of code, comprising hundreds of features distributed across multiple electronic control units (ECUs), all operating in parallel and communicating over a CAN bus. Because they are safety-critical systems, the problem of possible Feature Interactions (FIs) must be addressed seriously; however, traditional detection approaches using dynamic analyses are unlikely to scale to the size of these systems. We are investigating an approach that detects static source-code patterns that are symptomatic of FIs. The tools report Feature-Interaction warnings, which can be investigated further by engineers to determine if they represent true FIs and if those FIs are problematic.
In this paper, we present our preliminary toolchain for FI detection. First, we extract a collection of static “facts” from the source code, such as function calls, variable assignments, and messages between features. Next, we perform relational algebra transformations on this factbase to infer additional “facts” that represent more complicated design information about the code, such as potential information flows and data dependencies; then, the full collection of “facts” is matched against a curated set of patterns for FI symptoms. We present a set of five patterns for FIs in automotive software as well as a case study in which we applied our tools to the Autonomoose autonomous-driving software, developed at the University of Waterloo. Our approach identified 1,444 possible FIs in this codebase, of which 10% were classified as probable interactions worthy of further investigation.

Mining Cross-Task Artifact Dependencies from Developer Interactions
Usman Ashraf, Christoph Mayr-Dorn, and Alexander Egyed
(JKU Linz, Austria)
Implementing a change is a challenging task in complex, safety-critical, or long-living software systems. Developers need to identify which artifacts are affected to correctly and completely implement a change. Changes often require editing artifacts across the software system, to the extent that several developers need to be involved. Crucially, a developer needs to know which artifacts under someone else’s control have impact on her work task and, in turn, how her changes cascade to other artifacts, again, under someone else’s control. These cross-task dependencies are especially important as they are a common cause of incomplete and incorrect change propagation and require explicit coordination. Along these lines, the core research question in this paper is: how can we automatically detect cross-task dependencies and use them to assist the developer? We introduce an approach for mining such dependencies from past developer interactions with engineering artifacts as the basis for recommending artifacts live during change implementation. We show that our approach lists 67% of the correctly recommended artifacts within the top-10 results with real interaction data and tasks from the Mylyn project. The results demonstrate that we are able to successfully find cross-task dependencies and also provide them to developers in a useful manner.

A Human-as-Sensors Approach to API Documentation Integration and Its Effects on Novice Programmers
Cong Chen, Yulong Yang, Lin Yang, and Kang Zhang
(Tianjin University, China; University of Texas at Dallas, USA)
In recent years, there has been great interest in integrating crowdsourced API documents that are often dispersed across multiple places. Because of the complexity of natural language, however, automatically synthesized documents often fall short on quality and completeness compared to those authored by human experts. We develop a complementary "human-as-sensors" approach to document integration that generates API FAQs based on users' help-seeking behavior and history. We investigated the benefits and limitations of this approach in the context of programming education. This paper describes a prototype system called COFAQ and a controlled experiment with 18 novice programmers. The study confirms that the generated FAQs effectively foster knowledge transfer between programmers and significantly reduce the need for repeated searches. It also uncovers several difficulties novice programmers encountered when seeking API help, as well as the strategies they used to seek and utilize API knowledge.

Feature Maps: A Comprehensible Software Representation for Design Pattern Detection
Hannes Thaller, Lukas Linsbauer, and Alexander Egyed
(JKU Linz, Austria)
Design patterns are elegant and well-tested solutions to recurrent software development problems. They are the result of software developers dealing with problems that frequently occur, solving them in the same or a slightly adapted way. A pattern's semantics provide the intent, motivation, and applicability, describing what it does, why it is needed, and where it is useful. Consequently, design patterns encode a wealth of information. Developers weave this information into their systems whenever they use design patterns to solve problems. This work presents Feature Maps, a flexible human- and machine-comprehensible software representation based on micro-structures. Our algorithm, the Feature-Role Normalization, presses the high-dimensional, inhomogeneous vector space of micro-structures into a feature map. We apply these concepts to the problem of detecting instances of design patterns in source code. We evaluate our methodology on four design patterns and a wide range of balanced and imbalanced labeled training data, and compare classical machine learning (Random Forests) with modern deep learning approaches (Convolutional Neural Networks). Feature maps yield robust classifiers even under challenging settings of strongly imbalanced data distributions, without sacrificing human comprehensibility. Results suggest that feature maps are an excellent addition to the software analysis toolbox that can reveal useful information hidden in the source code.

Article Search
Reformulating Queries for Duplicate Bug Report Detection
Oscar Chaparro, Juan Manuel Florez, Unnati Singh, and Andrian Marcus
(University of Texas at Dallas, USA)
When bugs are reported, one important task is to check if they are new or if they were reported before. Many approaches have been proposed to partially automate duplicate bug report detection, and most of them rely on text retrieval techniques, using the bug reports as queries. Some of them include additional bug information and use complex retrieval- or learning-based methods. In the end, even the most sophisticated approaches fail to retrieve duplicate bug reports in many cases, leaving the bug triagers to their own devices. We argue that these duplicate bug retrieval tools should be used interactively, allowing the users to reformulate the queries to refine the retrieval. With that in mind, we propose three query reformulation strategies that require the users to simply select from the bug report the description of the software's observed behavior and/or the bug title, and combine them to issue a new query. The paper reports an empirical evaluation of the reformulation strategies, using a basic duplicate retrieval technique, on bug reports with duplicates from 20 open source projects. The duplicate detector failed to retrieve duplicates in the top 5-30 results for a significant number of the bug reports (between 34% and 50%). We reformulated the queries for a sample of these bug reports and compared the results against the initial query. We found that using the observed behavior description together with the title leads to the best retrieval performance. Using only the title or only the observed behavior for reformulation is also better than retrieval with the initial query. The reformulation strategies lead to a 56.6%-78% average retrieval improvement over using the initial query only.
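As a concrete illustration, the three reformulation strategies amount to selecting and combining two fields of the bug report. The sketch below is hypothetical (the paper does not prescribe an API); the function and parameter names are our own:

```python
def reformulate(title, observed_behavior, strategy="title+ob"):
    """Build a new retrieval query from user-selected parts of a bug report.

    strategy: "title" (title only), "ob" (observed-behavior description only),
    or "title+ob" (both, the best-performing strategy in the study).
    """
    if strategy == "title":
        return title
    if strategy == "ob":
        return observed_behavior
    # Combine title and observed-behavior description into one query.
    return f"{title} {observed_behavior}"
```

The reformulated string would then be issued to the same duplicate retrieval engine in place of the full initial report.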

Article Search Info
Identifying Redundancies in Fork-based Development
Luyao Ren, Shurui Zhou, Christian Kästner, and Andrzej Wąsowski
(Peking University, China; Carnegie Mellon University, USA; IT University of Copenhagen, Denmark)
Fork-based development is popular and easy to use, but it makes it difficult to maintain an overview of the whole community as the number of forks increases. This may lead to redundant development, where multiple developers solve the same problem in parallel without being aware of each other. Redundant development wastes effort for both maintainers and developers. In this paper, we design an approach to identify redundant code changes in forks as early as possible, by extracting clues indicating similarities between code changes and building a machine learning model to predict redundancies. We evaluated its effectiveness from both the maintainer's and the developer's perspectives. The results show that we achieve 57-83% precision in detecting duplicate code changes from the maintainer's perspective, and that we could save developers an average of 1.9-3.0 commits of effort. We also show that our approach significantly outperforms the existing state of the art.

Article Search
Systematic Comprehension for Developer Reply in Mobile System Forum
Chenkai Guo, Weijing Wang, Yanfeng Wu, Naipeng Dong, Quanqi Ye, Jing Xu, and Sen Zhang
(Nankai University, China; National University of Singapore, Singapore; Advanced Digital Sciences Center, Singapore)
In this paper, we attempt to bridge the gap between user reviews and developer replies, and conduct a systematic study of review replies in development forums, especially Chinese mobile system forums. To this end, we concentrate on three research questions: 1) should a given review receive a reply; 2) how soon should it be replied to; 3) does traditional review analysis help a review obtain a reply? To answer these questions, given certain review datasets, we perform a systematic study in three stages: 1) binary classification to predict reply behavior, 2) regression to predict reply time, and 3) a systematic factor study of the relationship between traditional review analysis and reply performance. To enhance the accuracy of prediction and analysis, we propose a CNN-based weak-supervision analysis framework that combines a range of techniques from NLP and deep learning. We validate our approach via extensive comparison experiments. The results show that our analysis framework is effective. More importantly, we have uncovered several interesting findings, which provide valuable guidance for further review improvement and recommendation.

Article Search
Improving Model Inference in Industry by Combining Active and Passive Learning
Nan Yang, Kousar Aslam, Ramon Schiffelers, Leonard Lensink, Dennis Hendriks, Loek Cleophas, and Alexander Serebrenik
(Eindhoven University of Technology, Netherlands; ASML, Netherlands; ESI/TNO, Netherlands)
Inferring behavioral models (e.g., state machines) of software systems is an important element of re-engineering activities. Model inference techniques can be categorized as active or passive learning, constructing models by (dynamically) interacting with systems or by (statically) analyzing traces, respectively. Application of those techniques in industry is, however, hindered by the trade-off between learning time and achieved completeness (active learning) or by incomplete input logs (passive learning). We investigate this trade-off for active learning with a pilot study at ASML, a provider of lithography systems for the semiconductor industry. To resolve the trade-off, we advocate extending active learning with execution logs and passive learning results.
We apply the extended approach to eighteen components used in ASML TWINSCAN lithography machines. Compared to traditional active learning, our approach significantly reduces the active learning time. Moreover, it is capable of learning the behavior missed by the traditional active learning approach.

Article Search
Towards Understandable Guards of Extracted State Machines from Embedded Software
Wasim Said, Jochen Quante, and Rainer Koschke
(Robert Bosch, Germany; University of Bremen, Germany)
The extraction of state machines from complex software systems can be very useful for understanding the behavior of a software system, which is a prerequisite for other software activities, such as maintenance, evolution, and reengineering. However, using static analysis to extract state machines from real-world embedded software often leads to models that cannot be understood by humans: the extracted models contain a high number of states and transitions and very complex guards (transition conditions). Integrating user interaction into the extraction process can reduce these state machines to an acceptable size. However, the problem of highly complex guards remains. In this paper, we present a novel approach to reduce the complexity of guards in such state machines to a degree that is understandable for humans. The conditions are reduced by a combination of heuristic logic minimization, masking of infeasible paths, and the use of transition priorities. The approach is evaluated with software developers on industrial embedded C code. The results show that the approach is highly effective in making the guards understandable. Our controlled experiment shows that guards reduced by our approach and presented with priorities are easier to understand than guards without priorities.
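One classical ingredient of heuristic logic minimization is the absorption law, which removes any conjunct implied by another conjunct of the same guard. The sketch below illustrates only this single simplification on guards in disjunctive normal form; it is not the authors' algorithm, and the representation (sets of literal strings) is our own:

```python
def simplify_dnf(terms):
    """Drop redundant conjuncts of a DNF guard via the absorption law.

    `terms` is an iterable of sets of literals, e.g. {"a", "!b"} for (a && !b).
    A term that is a strict superset of another term is implied by it
    (X || (X && Y) == X) and can be removed without changing the guard.
    """
    terms = [frozenset(t) for t in terms]
    kept = []
    for t in terms:
        # Keep t only if no other term is a strict subset of it.
        if not any(other < t for other in terms):
            kept.append(t)
    # Remove exact duplicates while preserving order.
    out = []
    for t in kept:
        if t not in out:
            out.append(t)
    return out
```

For example, the guard (a && b) || (a && b && c) || a collapses to just a, since both larger terms are absorbed by it.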

Article Search
Mining Specifications from Documentation using a Crowd
Peng Sun, Chris Brown, Ivan Beschastnikh, and Kathryn T. Stolee
(Iowa State University, USA; North Carolina State University, USA; University of British Columbia, Canada)
Temporal API specifications are useful for many software engineering tasks, such as test case generation. In practice, however, APIs are rarely formally specified, inspiring researchers to develop tools that infer or mine specifications automatically.
Traditional specification miners infer likely temporal properties by statically analyzing the source code or by analyzing program runtime traces. These approaches are frequently confounded by the complexity of modern software and by the unavailability of representative and correct traces. Formally specifying software is traditionally an expert task. We hypothesize that human crowd intelligence provides a scalable alternative to experts without compromising quality.
In this work we present CrowdSpec, an approach to use collective intelligence of crowds to generate or improve automatically mined specifications. CrowdSpec uses the observation that APIs are often accompanied by natural language documentation, which is a more appropriate resource for humans to interpret and is a complementary source of information to what is used by most automated specification miners.

Article Search
Studying Android App Popularity by Cross-Linking GitHub and Google Play Store
John Businge, Moses Openja, David Kavaler, Engineer Bainomugisha, Foutse Khomh, and Vladmir Filkov
(Mbarara University of Science and Technology, Uganda; University of California at Davis, USA; Makerere University, Uganda; Polytechnique Montréal, Canada)
The incredible success of the mobile App economy has been attracting software developers hoping for new or repeated success. Surviving in the fiercely competitive App market involves, in part, planning ahead for the success of the App on the marketplace. Prior research has shown that App success can be viewed through its proxy: popularity. An important question, then, is what factors differentiate popular from unpopular Apps? GitHub, a software project forge, and the Google Play store, an app market, are both crowdsourced and provide publicly available data that can be used to cross-link source code and app download popularity. In this study, we examined how technical and social features of Open Source Software Apps, mined from the two crowdsourced websites, relate to App popularity. We observed that both technical and social factors play significant roles in explaining App popularity. However, the combined factors have a low effect size in explaining App popularity, as measured by average user rating on Google Play. Interestingly, on GitHub we found that social factors have higher power in explaining popularity than all the technical factors we investigated.

Article Search
An Empirical Study of Learning to Rank Techniques for Effort-Aware Defect Prediction
Xiao Yu, Kwabena Ebo Bennin, Jin Liu, Jacky Wai Keung, Xiaofei Yin, and Zhou Xu
(Wuhan University, China; City University of Hong Kong, China; Fudan University, China)
Effort-Aware Defect Prediction (EADP) ranks software modules based on the possibility of these modules being defective, their predicted number of defects, or their defect density, using learning to rank algorithms. Prior empirical studies compared only a few learning to rank algorithms, considered small numbers of datasets, evaluated with inappropriate or only one type of performance measure, and used non-robust statistical test techniques. To address these concerns and investigate the impact of learning to rank algorithms on the performance of EADP models, we examine the practical effects of 23 learning to rank algorithms on 41 available defect datasets from the PROMISE repository, using a module-based effort-aware performance measure (FPA) and a source lines of code (SLOC) based effort-aware performance measure (Norm(Popt)). In addition, we compare the performance of these algorithms when they are trained on a more relevant feature subset selected by the Information Gain feature selection method. In terms of FPA and Norm(Popt), statistically significant differences are observed among these algorithms, with BRR (Bayesian Ridge Regression) performing best in terms of FPA, and BRR and LTR (Learning-to-Rank) performing best in terms of Norm(Popt). When these algorithms are trained on a more relevant feature subset selected by Information Gain, LTR and BRR still perform best, with significant differences in terms of FPA and Norm(Popt). Therefore, we recommend BRR and LTR for building EADP models in order to find more defects by inspecting a certain number of modules or lines of code.

Article Search
COLOR: Correct Locator Recommender for Broken Test Scripts using Various Clues in Web Application
Hiroyuki Kirinuki, Haruto Tanno, and Katsuyuki Natsukawa
(NTT, Japan)
Test automation tools such as Selenium are commonly used for automating end-to-end tests, but when developers update the software, they often need to modify the test scripts accordingly. However, the costs of modifying these test scripts are a big obstacle to test automation because of the scripts’ fragility. In particular, locators in test scripts are prone to change. Some prior methods tried to repair broken locators by using structural clues, but these approaches usually cannot handle radical changes to page layouts.
In this paper, we propose a novel approach called COLOR (correct locator recommender) to support repairing broken locators in accordance with software updates. COLOR uses various properties obtained from screens as clues (i.e., attributes, texts, images, and positions). We determined which properties are reliable for recommending locators by examining changes between two release versions of software, and this reliability is adopted as the weight of each property. Our experimental results on four open source web applications show that COLOR can present the correct locator in first place with 77%-93% accuracy and is more robust against page layout changes than structure-based approaches.
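The property-weighting idea can be sketched as scoring each candidate element on the updated page by the weighted sum of properties it preserves from the old locator's target. The weights and property names below are illustrative placeholders, not the reliabilities measured in the paper:

```python
# Hypothetical per-property weights (higher = more reliable clue).
WEIGHTS = {"id": 0.9, "text": 0.7, "position": 0.3}

def score(old, candidate):
    """Sum the weights of the properties the candidate preserves.

    `old` and `candidate` are dicts mapping property names to values.
    """
    return sum(w for prop, w in WEIGHTS.items()
               if old.get(prop) is not None and old.get(prop) == candidate.get(prop))

def recommend(old, candidates):
    """Rank candidate elements on the updated page by descending score."""
    return sorted(candidates, key=lambda c: score(old, c), reverse=True)
```

A structure-based repairer would anchor on the DOM path instead; weighting several independent properties is what makes this style of recommendation robust to layout changes.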

Article Search
A Comparative Study of Android Repackaged Apps Detection Techniques
Xian Zhan, Tao Zhang, and Yutian Tang
(Hong Kong Polytechnic University, China; Harbin Engineering University, China)
App repackaging has become a serious problem: it not only violates the copyrights of the original developers but also harms the health of the Android ecosystem. A recent study shows that repackaged apps account for a significant proportion of malware samples. Therefore, it is imperative to detect repackaged apps in various app markets. Although many detection technologies have been proposed, there has been no systematic comparison among them. One reason is that many detection tools are not publicly available, and therefore little is known about their robustness and effectiveness. In this paper, we fill this gap by 1) analyzing these repackaging detection technologies; 2) implementing these detection techniques; and 3) comparing them on various metrics using real repackaged apps. The analysis and the experimental results reveal new insights, which shed light on the research of repackaged app detection.

Article Search Info
Want to Earn a Few Extra Bucks? A First Look at Money-Making Apps
Yangyu Hu, Haoyu Wang, Li Li, Yao Guo, Guoai Xu, and Ren He
(Beijing University of Posts and Telecommunications, China; Monash University, Australia; Peking University, China)
Have you ever thought of earning profits from the apps you use on your mobile device? It is actually achievable thanks to many so-called money-making apps, which pay users to complete tasks such as installing another app or clicking an advertisement. To the best of our knowledge, no existing studies have investigated the characteristics of money-making apps. To this end, we conduct the first exploratory study to understand the features and implications of money-making apps. We first propose a semi-automated approach to harvest money-making apps from Google Play and alternative app markets. Then we create a taxonomy to classify them into five categories and perform an empirical study from different aspects. Our study reveals several interesting observations: (1) money-making apps have become a target of malicious developers, as we found that many of them expose mobile users to serious privacy and security risks; roughly 26% of the studied apps are potentially malicious. (2) These apps have attracted millions of users; however, many users complain that they are cheated by these apps. We also reveal that ranking fraud techniques are widely used in these apps to promote their ranking in app markets. (3) These apps often spread inappropriate and malicious content, by which unsuspecting users could be infected. Our study demonstrates the urgency of detecting and regulating this kind of app to protect mobile users.

Article Search
AppCommune: Automated Third-Party Libraries De-duplicating and Updating for Android Apps
Bodong Li, Yuanyuan Zhang, Juanru Li, Runhan Feng, and Dawu Gu
(Shanghai Jiao Tong University, China)
The increasing usage of third-party libraries in Android apps is double-edged: it boosts development but introduces extra code and potential vulnerabilities. Unlike desktop operating systems, Android does not support sharing third-party libraries between different apps. Thus both the de-duplication and the updating of those libraries are difficult to manage in a unified way.
In this paper, we propose a third-party library sharing method to address the issues of code bloat and obsolete code. Our approach separates all integrated third-party libraries from the app code and keeps them accessible through a dynamic loading mechanism. The separated libraries are managed centrally and can be shared by different apps. This not only saves storage but also guarantees a prompt update of outdated libraries for every app. We implement AppCommune, a novel app installation and execution infrastructure that supports the proposed third-party library sharing without modifying the commodity Android system. Our experiments with 212 popular third-party libraries and 502 real-world Android apps demonstrate its feasibility and efficiency: all apps work stably with our library sharing model, and 11.1% of storage and bandwidth is saved for app downloading and installation. In addition, AppCommune updates 86.4% of the managed third-party libraries (44.6% to the latest versions).

Article Search
Characterizing and Detecting Inefficient Image Displaying Issues in Android Apps
Wenjie Li, Yanyan Jiang, Chang Xu, Yepang Liu, Xiaoxing Ma, and Jian Lü
(Nanjing University, China; Southern University of Science and Technology, China)
Mobile applications (apps for short) often need to display images. However, inefficient image displaying (IID) issues are pervasive in mobile apps, and can severely impact app performance and user experience. This paper presents an empirical study of 162 real-world IID issues collected from 243 popular open-source Android apps, validating the presence and severity of IID issues, and then sheds light on these issues’ characteristics to support future research on effective issue detection. Based on the findings of this study, we developed a static IID issue detection tool TAPIR and evaluated it with real-world Android apps. The experimental evaluations show encouraging results: TAPIR detected 43 previously-unknown IID issues in the latest version of the 243 apps, 16 of which have been confirmed by respective developers and 13 have been fixed.

Article Search
Detecting Data Races Caused by Inconsistent Lock Protection in Device Drivers
Qiu-Liang Chen, Jia-Ju Bai, Zu-Ming Jiang, Julia Lawall, and Shi-Min Hu
(Tsinghua University, China; Sorbonne University, France; Inria, France; LIP6, France)
Data races are often hard to detect in device drivers, due to the non-determinism of concurrent execution. According to our study of Linux driver patches that fix data races, more than 38% of patches involve a pattern that we call inconsistent lock protection. Specifically, if a variable is accessed within two concurrently executed functions, the sets of locks held around each access are disjoint, at least one of the locksets is non-empty, and at least one of the involved accesses is a write, then a data race may occur.
In this paper, we present a runtime analysis approach, named DILP, to detect data races caused by inconsistent lock protection in device drivers. By monitoring driver execution, DILP collects information about runtime variable accesses and executed functions. Then, after driver execution, DILP analyzes the collected information to detect and report data races caused by inconsistent lock protection. We evaluate DILP on 12 device drivers in Linux 4.16.9, and find 25 real data races.
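The inconsistent-lock-protection condition quoted above translates directly into a check over pairs of access records. The sketch below assumes accesses are recorded as (variable, lockset, is_write) tuples; this representation is our own, not DILP's:

```python
def may_race(access_a, access_b):
    """Flag a potential race per the inconsistent-lock-protection pattern.

    Each access is (variable_name, set_of_held_locks, is_write), taken from
    two concurrently executed functions. A race is possible when the same
    variable is accessed, the locksets are disjoint, at least one lockset
    is non-empty, and at least one access is a write.
    """
    var_a, locks_a, write_a = access_a
    var_b, locks_b, write_b = access_b
    return bool(var_a == var_b                 # same variable
                and not (locks_a & locks_b)    # disjoint locksets
                and (locks_a or locks_b)       # at least one non-empty
                and (write_a or write_b))      # at least one write
```

Note that the "at least one non-empty" clause excludes the fully unprotected case, which this pattern (as defined in the paper) does not cover.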

Article Search
An Empirical Study of Message Passing Concurrency in Go Projects
Nicolas Dilley and Julien Lange
(University of Kent, UK)
Go is a popular programming language renowned for its good support for systems programming and its channel-based message passing concurrency mechanism. These strengths have made it the language of choice for platform software such as Docker and Kubernetes. In this paper, we analyse 865 Go projects from GitHub in order to understand how message passing concurrency is used in publicly available code. Our results include the following findings: (1) message passing primitives are used frequently and intensively, (2) concurrency-related features are generally clustered in specific parts of a Go project, (3) most projects use synchronous communication channels over asynchronous ones, and (4) most Go projects use simple concurrent thread topologies, which are, however, currently unsupported by existing static verification frameworks.

Article Search Info
A Splitting Strategy for Testing Concurrent Programs
Xiaofang Qi and Huayang Zhou
(Southeast University, China)
Reachability testing is an important approach to testing concurrent programs. It generates and exercises every partially ordered synchronization sequence automatically and on the fly, without constructing a static model or saving any test history. However, test sequences generated by existing reachability testing are ineffective at detecting concurrency faults involving access anomalies, such as accessing shared variables without proper protection or synchronization, or incorrectly shrinking or shifting critical regions. In this paper, we present a splitting strategy, as well as a prototype tool called SplitRichTest, for revealing such concurrency faults. SplitRichTest adopts the framework of conventional reachability testing while implementing our splitting strategy. By splitting relevant code, SplitRichTest generates and exercises fine-grained synchronization sequences. The key ingredient of SplitRichTest is an efficient heuristic algorithm to select and sort candidate splitting points. We conducted an empirical study on representative concurrent Java programs and evaluated the effectiveness of SplitRichTest using mutation testing. Experimental results show that our splitting strategy facilitates generating more fine-grained synchronization sequences and significantly improves the concurrency fault detection capability of reachability testing.

Article Search
Understanding Node Change Bugs for Distributed Systems
Jie Lu, Liu Chen, Lian Li, and Xiaobing Feng
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)
Distributed systems are the fundamental infrastructure for modern cloud applications, and the reliability of these systems directly impacts service availability. Distributed systems run on clusters of nodes. While the system is running, nodes can join or leave the cluster at any time, due to unexpected failures or system maintenance. It is essential for distributed systems to tolerate such node changes. However, it is also notoriously difficult to handle node changes correctly. Node change bugs are widespread and can lead to catastrophic failures. We believe that a comprehensive study of node change bugs is necessary to better prevent and diagnose them. In this paper, we perform an extensive empirical study of node change bugs. We manually went through 6,660 bug issues of 5 representative distributed systems, and identified 620 of them as node change bugs. We studied 120 bug examples in detail to understand the root causes, impacts, trigger conditions, and fixing strategies of node change bugs. Our findings shed light on new detection and diagnosis techniques for node change bugs.
In the course of our empirical study, we developed two useful tools, NCTrigger and NPEDetector. NCTrigger helps users automatically reproduce a node change bug by injecting node change events based on a user specification. It largely reduces the manual effort needed to reproduce a bug (from 2 days to less than half a day). NPEDetector is a static analysis tool that detects null pointer exception errors. We developed this tool based on our finding that node operations often lead to null pointer exceptions, and that these errors share a simple common pattern. Experimental results show that this tool can detect 60 new null pointer errors, including 7 node change bugs; 23 of these bugs have already been patched and fixed.

Article Search
A Neural Model for Method Name Generation from Functional Description
Sa Gao, Chunyang Chen, Zhenchang Xing, Yukun Ma, Wen Song, and Shang-Wei Lin
(Nanyang Technological University, Singapore; Monash University, Australia; Australian National University, Australia)
The names of software artifacts, e.g., method names, are important for software understanding and maintenance, as good names can help developers easily understand others' code. Even with existing naming guidelines, however, developers, especially novices, find it difficult to come up with meaningful, concise, and compact names for variables, methods, classes, and files. With the popularity of open source, an enormous amount of project source code can be accessed, and the burden and inconsistency of manually naming methods could now be relieved by automatically learning a naming model from a large code repository. Nevertheless, building a comprehensive naming system is still challenging, due to the gap between natural language functional descriptions and method names. Specifically, there are three challenges: how to model the relationship between functional descriptions and formal method names, how to handle the explosion of vocabulary when dealing with large repositories, and how to transfer the knowledge learned from large repositories to a specific project. To address these challenges, we propose a neural network that directly generates readable method names from natural language descriptions. The proposed method is built upon the encoder-decoder framework with attention and copying mechanisms. Our experiments show that our method can generate meaningful and accurate method names and achieves significant improvement over state-of-the-art baseline models. We also address the cold-start problem, using a training trick to utilize big data on GitHub for specific projects.

Article Search
Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification
Nghi D. Q. Bui, Yijun Yu, and Lingxiao Jiang
(Singapore Management University, Singapore; Open University, UK)
Algorithm classification is to automatically identify the classes of a program based on the algorithm(s) and/or data structure(s) implemented in the program. It can be useful for various tasks, such as code reuse, code theft detection, and malware detection. Code similarity metrics, on the basis of features extracted from syntax and semantics, have been used to classify programs. Such features, however, often need manual selection effort and are specific to individual programming languages, limiting the classifiers to programs in the same language.
To recognize the similarities and differences among algorithms implemented in different languages, this paper describes a framework of Bilateral Neural Networks (Bi-NN) that builds a neural network on top of two underlying sub-networks, each of which encodes syntax and semantics of code in one language. A whole Bi-NN can be trained with bilateral programs that implement the same algorithms and/or data structures in different languages and then be applied to recognize algorithm classes across languages. We have instantiated the framework with several kinds of token-, tree- and graph-based neural networks that encode and learn various kinds of information in code.
We have applied the instances of the framework to a code corpus collected from GitHub containing thousands of Java and C++ programs implementing 50 different algorithms and data structures. Our evaluation results show that the use of Bi-NN indeed produces promising algorithm classification results both within one language and across languages, and the encoding of dependencies from code into the underlying neural networks helps improve algorithm classification accuracy further. In particular, our custom-built dependency trees with tree-based convolutional neural networks achieve the highest classification accuracy among the different instances of the framework that we have evaluated. Our study points to a possible future research direction to tailor bilateral and multilateral neural networks that encode more relevant semantics for code learning, mining and analysis tasks.

Article Search
DeepLink: A Code Knowledge Graph Based Deep Learning Approach for Issue-Commit Link Recovery
Rui Xie, Long Chen, Wei Ye, Zhiyu Li, Tianxiang Hu, Dongdong Du, and Shikun Zhang
(Peking University, China)
Links between issue reports and the code commits that fix them can greatly reduce the maintenance costs of a software project. More often than not, however, these links are missing and thus cannot be fully utilized by developers. Current practices in issue-commit link recovery extract text features and code features in terms of textual similarity from issue reports and commit logs to train their models. These approaches are limited, since semantic information can be lost. Furthermore, few of them consider the effect of the source code files related to a commit on issue-commit link recovery, let alone the semantics of code context. To tackle these problems, we propose to construct a code knowledge graph of a code repository and generate embeddings of source code files to capture the semantics of code context. We also use embeddings to capture the semantics of issue- or commit-related text. We then use these embeddings to calculate semantic similarity and code similarity with a deep learning approach, before training an SVM binary classification model with additional features. Evaluations on real-world projects show that our approach, DeepLink, outperforms the state-of-the-art method.
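The approach compares learned embeddings of issue text and commit code to obtain similarity features. A standard choice for such a comparison (assumed here; the abstract does not name the measure) is cosine similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors of equal length.

    Returns a value in [-1, 1]; 0.0 is returned for a zero-norm vector.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Per-pair similarities like this would then be fed, alongside other features, into the binary classifier that decides whether an issue and a commit are linked.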

Article Search
CNN-FL: An Effective Approach for Localizing Faults using Convolutional Neural Networks
Zhuo Zhang, Yan Lei, Xiaoguang Mao, and Panpan Li
(National University of Defense Technology, China; Chongqing University, China)
Fault localization aims at identifying suspicious statements potentially responsible for failures. The recent rapid progress in deep learning shows the promising potential of many neural network architectures for making sense of data, and, more importantly, this potential offers a new perspective that may benefit fault localization. This paper therefore proposes CNN-FL, an approach for localizing faults based on convolutional neural networks, to explore the promising potential of deep learning in fault localization. Specifically, CNN-FL constructs a convolutional neural network customized for fault localization, trains the network with test cases, and finally evaluates the suspiciousness of each statement by testing the trained model with a virtual test set. Our empirical results show that CNN-FL significantly improves fault localization effectiveness.

Article Search
Avatar: Fixing Semantic Bugs with Fix Patterns of Static Analysis Violations
Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé
(University of Luxembourg, Luxembourg)
Fix pattern-based patch generation is a promising direction in Automated Program Repair (APR). Notably, it has been demonstrated to produce more acceptable and correct patches than those obtained with mutation operators through genetic programming. The performance of pattern-based APR systems, however, depends on the fix ingredients mined from fix changes in development histories. Unfortunately, collecting a reliable set of bug fixes from repositories can be challenging. In this paper, we investigate, in an APR scenario, the possibility of leveraging code changes that address violations reported by static bug detection tools. To that end, we build the AVATAR APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation. Evaluated on the Defects4J benchmark, we show that, assuming perfect fault localization, AVATAR can generate correct patches to fix 34/39 bugs. We further find that AVATAR yields performance metrics comparable to those of closely related approaches in the literature. While AVATAR outperforms many state-of-the-art pattern-based APR systems, it is mostly complementary to current approaches. Overall, our study highlights the relevance of static bug finding tools as indirect contributors of fix ingredients for addressing code defects identified with functional test cases.

Article Search
Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies
Fernanda Madeiral, Simon Urli, Marcelo Maia, and Martin Monperrus
(Federal University of Uberlândia, Brazil; Inria, France; University of Lille, France; KTH, Sweden)
Benchmarks of bugs are essential to empirically evaluate automatic program repair tools. In this paper, we present Bears, a project for collecting and storing bugs in an extensible bug benchmark for automatic repair studies in Java. The collection of bugs relies on the build state of commits in Continuous Integration (CI) to find potential pairs of buggy and patched program versions from open-source projects hosted on GitHub. Each pair of program versions passes through a pipeline that attempts to reproduce the bug and its patch. The core step of the reproduction pipeline is the execution of the program's test suite on both versions. If a test failure is found in the buggy version candidate and no test failure is found in its patched version candidate, the bug and its patch are considered successfully reproduced. The uniqueness of Bears is its use of CI builds, which have been widely adopted by open-source projects in recent years, to identify buggy and patched program version candidates. This approach allows us to collect bugs from a diversity of projects beyond mature projects that use bug tracking systems. Moreover, Bears was designed to be publicly available and easily extensible by the research community through the automatic creation of branches with bugs in a given GitHub repository, which can be used for pull requests in the Bears repository. We present the approach employed by Bears, and we deliver version 1.0 of Bears, which contains 251 reproducible bugs collected from 72 projects that use Travis CI and the Maven build environment.
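The reproduction criterion described above can be sketched as a simple predicate; the dictionary-based test-result format here is a hypothetical simplification, not the Bears pipeline's actual data model:

```python
def is_reproduced(buggy_results, patched_results):
    """Bears-style reproduction check (sketch): a bug/patch pair counts as
    reproduced when the buggy version has at least one failing test and the
    patched version passes its whole test suite. Each argument maps a test
    name to True (pass) or False (fail)."""
    buggy_fails = any(not passed for passed in buggy_results.values())
    patched_passes = all(patched_results.values())
    return buggy_fails and patched_passes
```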

Article Search
Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities
Martin White, Michele Tufano, Matías Martínez, Martin Monperrus, and Denys Poshyvanyk
(College of William and Mary, USA; Polytechnic University of Hauts-de-France, France; KTH, Sweden)
In the field of automated program repair, the redundancy assumption claims that large programs contain the seeds of their own repair. However, most redundancy-based program repair techniques do not reason about the repair ingredients: the code that is reused to craft a patch. We aim to reason about the repair ingredients by using code similarities to prioritize and transform statements in a codebase for patch generation. Our approach, DeepRepair, relies on deep learning to reason about code similarities. Code fragments at well-defined levels of granularity in a codebase can be sorted according to their similarity to suspicious elements (i.e., code elements that contain suspicious statements), and statements can be transformed by mapping out-of-scope identifiers to similar identifiers in scope. We examined these new search strategies for patch generation with respect to effectiveness from the viewpoint of a software maintainer. Our comparative experiments were executed on six open-source Java projects, including 374 buggy program revisions, and consisted of 19,949 trials spanning 2,616 days of computation time. DeepRepair's search strategy using code similarities generally found compilable ingredients faster than the baseline, jGenProg, but this improvement neither yielded test-adequate patches in fewer attempts (on average) nor found significantly more patches (on average) than the baseline. Although the patch counts were not statistically different, there were notable differences between the nature of DeepRepair patches and jGenProg patches. The results show that our learning-based approach finds patches that cannot be found by existing redundancy-based repair techniques.

Article Search Info
On the Relation between Outdated Docker Containers, Severity Vulnerabilities, and Bugs
Ahmed Zerouali, Tom Mens, Gregorio Robles, and Jesus M. Gonzalez-Barahona
(University of Mons, Belgium; Universidad Rey Juan Carlos, Spain)
Packaging software into containers is becoming common practice when deploying services in cloud and other environments. Docker images are one of the most popular container technologies for building and deploying containers. A container image usually includes a collection of software packages that can have bugs and security vulnerabilities affecting the container's health. Our goal is to support container deployers by analyzing the relation between outdated containers and the vulnerable and buggy packages installed in them. We use the concept of the technical lag of a container: the difference between a given container and the most up-to-date container that could be built from the most recent releases of the same collection of packages. For 7,380 official and community Docker images based on the Debian Linux distribution, we identify which software packages are installed in them and measure their technical lag in terms of version updates, security vulnerabilities, and bugs. We found, among other things, that no release is devoid of vulnerabilities, so deployers cannot avoid vulnerabilities even if they deploy the most recent packages. We offer some lessons learned for container developers regarding the strategies they can follow to minimize the number of vulnerabilities. We argue that Docker container scanning and security management tools should improve their platforms by adding data about other kinds of bugs and by measuring technical lag, in order to tell deployers when to update.
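A minimal sketch of the technical-lag idea, assuming dotted `major.minor.patch` version strings; the paper's lag measurement (which also covers vulnerabilities and bugs) is richer than this illustrative version-delta computation:

```python
def version_lag(installed, latest):
    """Per-package lag as component-wise deltas between dotted
    major.minor.patch version strings (0 when already up to date)."""
    cur = [int(p) for p in installed.split(".")]
    new = [int(p) for p in latest.split(".")]
    return tuple(max(0, n - c) for c, n in zip(cur, new))

def container_lag(installed_pkgs, latest_pkgs):
    """Aggregate lag of a container image: the sum of per-package deltas
    over every installed package (packages unknown upstream count as 0)."""
    totals = [0, 0, 0]
    for name, version in installed_pkgs.items():
        lag = version_lag(version, latest_pkgs.get(name, version))
        totals = [t + d for t, d in zip(totals, lag)]
    return tuple(totals)
```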

Article Search
Exploring Regular Expression Evolution
Peipei Wang, Gina R. Bai, and Kathryn T. Stolee
(North Carolina State University, USA)
Although there are tools to help developers understand the matching behaviors between a regular expression and a string, regular-expression-related faults are still common. Learning developers' behavior through the change history of regular expressions can identify common edit patterns, which can inform the creation of mutation and repair operators to assist with testing and fixing regular expressions. In this work, we explore how regular expressions evolve over time, focusing on the characteristics of regular expression edits, the syntactic and semantic differences introduced by the edits, and the feature changes of edits. Our exploration uses two datasets. First, we look at GitHub projects that have a regular expression in their current version and look back through the commit logs to collect the regular expressions' edit history. Second, we collect regular expressions composed by study participants during problem-solving tasks. Our results show that 1) 95% of the regular expressions from GitHub are not edited, 2) most edited regular expressions have a syntactic distance of 4-6 characters from their predecessors, 3) over 50% of the edits in GitHub tend to expand the scope of the regular expression, and 4) the number of features used in regular expressions increases over time, indicating broader use of the regular expression language. This work has implications for supporting regular expression repair and mutation to ensure test suite quality.
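The syntactic distance between successive regex versions can be illustrated with a plain character-level Levenshtein distance; whether the study used exactly this distance is an assumption of this sketch:

```python
def syntactic_distance(old_regex, new_regex):
    """Character-level Levenshtein distance between two versions of a
    regular expression, used here as a stand-in for 'syntactic distance'."""
    m, n = len(old_regex), len(new_regex)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if old_regex[i - 1] == new_regex[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n]
```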

Article Search

RENE Track

Mining Scala Framework Extensions for Recommendation Patterns
Yunior Pacheco, Jonas De Bleser, Tim Molderez, Dario Di Nucci, Wolfgang De Meuter, and Coen De Roover
(Vrije Universiteit Brussel, Belgium; Pinar del Rio University, Cuba)
To use a framework, developers often need to hook into and customise some of its functionality. For example, a common way of customising a framework is to subclass a framework type and override some of its methods. Recently, Asaduzzaman et al. defined these customisations as extension points and proposed a new approach to mine large amounts of Java code examples and recommend the most frequently used examples, so-called extension patterns. Indeed, recommending extension patterns that frequently occur at such extension points can help developers adopt a new framework correctly and fully exploit it. In this paper, we present a differentiated replication study of the work by Asaduzzaman et al. on Java frameworks. Our aim is to replicate the work in order to analyse extension points and extension patterns in the context of Scala frameworks. To this end, we propose SCALA-XP-MINER, a tool for mining extension patterns in Scala software systems, to empirically investigate our hypotheses. Our results show that the approach proposed by the reference work can also mine extension patterns for Scala frameworks and that our tool achieves precision, recall, and F-measure similar to those of FEMIR. Despite this, the distribution of extension points by category is different, and most of the patterns are rather simple. Thus, the challenge of recommending more complex patterns to Scala developers remains an open problem.

Article Search Info
Reuse (or Lack Thereof) in Travis CI Specifications: An Empirical Study of CI Phases and Commands
Puneet Kaur Sidhu, Gunter Mussbacher, and Shane McIntosh
(McGill University, Canada)
Continuous Integration (CI) is a widely used practice in which code changes are automatically built and tested to check for regression as they appear in the Version Control System (VCS). CI services allow users to customize phases, which define the sequential steps of build jobs that are triggered by changes to the project. While past work has made important observations about the adoption and usage of CI, little is known about patterns of reuse in CI specifications. Should reuse be common in CI specifications, we envision that a tool could guide developers through the generation of CI specifications by offering suggestions based on popular sequences of phases and commands. To assess the feasibility of such a tool, we perform an empirical analysis of the use of different phases and commands in a curated sample of 913 CI specifications for Java-based projects that use Travis CI, one of the most popular public CI service providers. First, we observe that five of the nine phases are used in 18%-75% of the projects. Second, for the five most popular phases, we apply association rule mining to discover frequent phase, command, and command category usage patterns. Unfortunately, we observe that the association rules lack the support, confidence, or lift values needed to be considered statistically interesting. Our findings suggest that the usage of phases and commands in Travis CI specifications is broad and diverse. Hence, we cannot provide suggestions for Java-based projects as we had envisioned.
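The support, confidence, and lift measures used in the association rule mining step can be computed as follows; the transaction format (one set of commands per CI specification) is a simplification for illustration:

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.
    `transactions` is a list of sets (e.g., the commands appearing in one
    CI specification); the other two arguments are sets of items."""
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift
```

A lift near 1 means the antecedent tells us little about the consequent, which is one way a mined rule can fail to be interesting.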

Article Search
Is Self-Admitted Technical Debt a Good Indicator of Architectural Divergences?
Giancarlo Sierra, Ahmad Tahmid, Emad Shihab, and Nikolaos Tsantalis
(Concordia University, Canada)
Large software systems tend to be highly complex and often contain unaddressed issues that evolve from bad design practices or architectural implementations that drift from their definition. These design flaws can originate from quick fixes, hacks, or shortcuts to a solution; hence, they can be seen as Technical Debt. Recently, new work has focused on studying source code comments that indicate Technical Debt, i.e., Self-Admitted Technical Debt (SATD). However, it is not known whether addressing the information left by developers in the form of source code comments can give insight into the design flaws of a system and has the potential to provide fixes for bad architectural implementations. This paper investigates the possibility of using SATD comments to resolve architectural divergences. We leverage a dataset of previously classified SATD comments to trace them to the architectural divergences of a large open source system, namely ArgoUML. We extract its conceptual and concrete architectures based on available design documentation and source code, and contrast both to expose divergences, trace them to SATD comments, and investigate their resolution. We found 7 high-level divergences in ArgoUML and 22 others among its subsystems, observing that merely 4 out of 29 (14%) divergences can be directly traced to SATD. Although using SATD as an indicator of architectural divergences is viable, doing so is time-intensive and, in general, will not lead to a significant reduction of architectural flaws in a software system.

Article Search

Industry Track

Identifying Feature Clones: An Industrial Case Study
Muslim Chochlov, Michael English, Jim Buckley, Daniel Ilie, and Maria Scanlon
(University of Limerick, Ireland; Wood, Ireland)
During its software evolution, the original software system of our industrial partner was split into three variants. These have evolved over time but retained a lot of common functionality. During strategic planning, our industrial partner realized the need to consolidate common code into a shared code base for more efficient code maintenance and reuse. To support this agenda, we proposed a feature-clone identification approach that combines elements of feature location (to identify the relevant code in one system) and clone detection (to identify that common feature's code across systems).
In this work, this approach is used (via our prototype tool CoRA) to locate and evaluate three features that the industrial partner identified for refactoring. The methodology, involving a system expert, was designed to evaluate the discrete parts of the approach in isolation: the textual and static analyses of feature location, and clone detection. We found that the approach can effectively identify features and their clones. The hybrid textual/static feature location part is effective even for a relative system novice, showing results comparable to the system expert's more optimal suggestions. Finally, more effective feature location increases the effectiveness of the clone detection part of the approach.

Article Search
Towards Generating Cost-Effective Test-Suite for Ethereum Smart Contract
Xingya Wang, Haoran Wu, Weisong Sun, and Yuan Zhao
(Nanjing University, China)
In Ethereum, many accounts and funds are managed by smart contracts, making them attractive targets. Due to the persistence characteristic of the blockchain, revising a deployed smart contract is almost impossible. Both realities heighten the risks of managing funds and thus increase the demand for sufficient testing of Ethereum Smart Contracts (ESC). Unlike conventional software, an ESC is a gas-driven program: developers must pay gas to deploy and test it. Therefore, it is important to provide a cost-effective yet representative test suite, where representativeness is typically measured by branch coverage. In this paper, we treat ESC test generation as a Pareto minimization problem with three objectives: minimizing (1) uncovered branches, (2) time cost, and (3) gas cost. We then propose a random-based and an NSGA-II-based multi-objective approach to seek cost-effective test suites. Our empirical study on a set of smart contracts from eight of the most widely used Ethereum Decentralized Applications (DApps) verifies that the proposed approaches can significantly reduce gas and time costs while retaining the ability to cover branches.
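Pareto minimization over the three objectives (uncovered branches, time cost, gas cost) rests on a dominance test; this sketch shows the standard non-dominated filtering step that both random-based and NSGA-II-based search rely on, not the authors' implementation:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is at least as good as b on
    every objective and strictly better on at least one. Objective vectors
    here are (uncovered branches, time cost, gas cost)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Filter a list of objective vectors down to the non-dominated ones."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]
```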

Article Search
EVM*: From Offline Detection to Online Reinforcement for Ethereum Virtual Machine
Fuchen Ma, Ying Fu, Meng Ren, Mingzhe Wang, Yu Jiang, Kaixiang Zhang, Huizhong Li, and Xiang Shi
(Beijing University of Posts and Telecommunications, China; Tsinghua University, China; Sun Yat-sen University, China; WeBank, China)
Attacks on Ethereum transactions can be dangerous because they may lead to significant financial losses. Many tools detect vulnerabilities in smart contracts in an attempt to prevent potential attacks. However, we found that many vulnerabilities in contracts are still missed. Motivated by this, we propose a methodology to reinforce the EVM to stop dangerous transactions in real time, even when the smart contract contains vulnerabilities. The methodology consists of three steps: monitoring strategy definition, opcode-structure maintenance, and EVM instrumentation. Monitoring strategy definition refers to the specific rule that tests whether a dangerous operation occurs during transaction execution. Opcode-structure maintenance maintains a structure that stores the rule-related opcodes and analyzes it before an operation executes. EVM instrumentation inserts the monitoring strategy, the interrupting mechanism, and the opcode-structure operations into the EVM source code. For evaluation, we implement EVM* on js-evm, a widely used EVM platform written in JavaScript. We collected 10 contracts online with known bugs and used each contract to execute a dangerous transaction; all of them were interrupted by our reinforced EVM*, while the original EVM permitted all attack transactions. Regarding time overhead, the reinforced EVM* is 20-30% slower than the original, which is tolerable for financially critical applications.
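The monitoring-and-interruption idea can be sketched abstractly: a monitoring strategy is a predicate over the current opcode and the execution history, and the reinforced interpreter aborts the transaction when it fires. The opcode stream and the `forbid_selfdestruct` rule below are hypothetical; EVM* instruments the real js-evm interpreter:

```python
def execute_with_monitor(opcodes, monitor):
    """Toy interpreter loop: before each opcode runs, ask the monitoring
    strategy whether the operation is dangerous given the history so far,
    and interrupt the whole transaction if it is."""
    executed = []
    for op in opcodes:
        if monitor(op, executed):
            raise RuntimeError("transaction interrupted at " + op)
        executed.append(op)
    return executed

def forbid_selfdestruct(op, history):
    """A hypothetical monitoring strategy: never allow SELFDESTRUCT."""
    return op == "SELFDESTRUCT"
```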

Article Search
Testing the Message Flow of Android Auto Apps
Yu Zhang, Xi Deng, Jun Yan, Hang Su, and Hongyu Gao
(Beijing University of Technology, China; Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)
Android Auto is designed to enhance the driving experience by extending car dashboards with smartphone functionalities, an essential one being the message flow via the notification mechanism. This paper investigates the quality of current compatible apps and locates two main error-prone points. The study begins with manually designed black-box testing models, including a finite state machine and a combinatorial input model derived from safety requirements, and extracts test suites from them. The tests are executed on 17 popular apps and reveal dozens of defects that might result in safety risks or an inferior driving experience. These defects are manually inspected and organized into several patterns. The experience and lessons from this empirical study are helpful for the detailed design and implementation of messaging modules.

Article Search
Open-Source License Violations of Binary Software at Large Scale
Muyue Feng, Weixuan Mao, Zimu Yuan, Yang Xiao, Gu Ban, Wei Wang, Shiyang Wang, Qian Tang, Jiahuan Xu, He Su, Binghong Liu, and Wei Huo
(Institute of Information Engineering at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; National Computer Network Emergency Response Technical Team, China)
Open-source licenses are widely used in open-source projects. However, developers using or modifying the source code of open-source projects do not always strictly follow the licenses. GPL and AGPL, two of the most popular copyleft licenses, are the most likely to be violated, because they require developers to open-source the entire project if any code under GPL/AGPL protection is included, whether modified or not. There are few license violation detectors focusing on binary software, owing to the challenge of mapping binary code to source code efficiently and accurately at large scale. In this paper, we propose a scalable and fully automated system to check open-source license violations of binary software at large scale. We match source code to binary code by analyzing file attributes of executable files and code features that are not affected by compilation yet vary between projects. Moreover, to break the barrier of large-scale analysis, we introduce an automatic extractor that parses executable files from the installation packages broadly available on software download sites. In empirical experiments on binary-to-source mapping, we achieved a remarkably high accuracy of 99.5% and a recall of 95.6% without significant loss of precision. In addition, 2,270 pairs of binary-to-source mapping relationships were discovered, with 110 violations of the GPL and AGPL licenses related to 7.2% of the 1,000 real-world binary software projects.

Article Search
Qualify First! A Large Scale Modernisation Report
Leszek Włodarski, Boris Pereira, Ivan Povazan, Johan Fabry, and Vadim Zaytsev
(mBank, Poland; Raincode Labs, Belgium)
Typically, in modernisation projects, any concerns about code quality are silenced until the end of the migration, to simplify an already complex process. Yet we claim from experience that prioritising quality above many other issues has many benefits. In this experience report, we discuss a modernisation project at mBank, a large Polish bank, where bad smell detection and elimination, automated testing, and refactoring played a crucial role, provided pay-offs early in the project, increased buy-in, and ensured the maintainability of the end result.

Article Search
Challenges of SonarQube Plug-In Maintenance
Bence Barta, Günter Manz, István Siket, and Rudolf Ferenc
(University of Szeged, Hungary; FrontEndART Software, Hungary)
The SonarQube platform is a widely used open-source tool for continuous code quality management. It provides an API to extend the platform with plug-ins that upload additional data or enrich its functionality. The SourceMeter plug-in for SonarQube integrates the SourceMeter static source code analyzer into the SonarQube platform, i.e., it uploads the analysis results and extends the GUI to present the new results. The first version of the plug-in was released in 2015 and was compatible with the corresponding SonarQube version. However, the platform, and more importantly its API, has evolved considerably since then, so the plug-in had to be adapted to the new API. This was not just a slight adjustment: we had to redesign and reimplement the whole UI and, at the same time, make significant alterations to other parts of the plug-in. In addition, we examined the effect of the API evolution on other open-source plug-ins and found that most of them remain compatible with the latest version, even though they have not been updated alongside the underlying API modifications. The reason is that these plug-ins use only a small part of the API, which has not changed over time.

Article Search Video Info
GUI Migration using MDE from GWT to Angular 6: An Industrial Case
Benoît Verhaeghe, Anne Etien, Nicolas Anquetil, Abderrahmane Seriai, Laurent Deruelle, Stéphane Ducasse, and Mustapha Derras
(University of Lille, France; CNRS, France; Inria, France; Berger-Levrault, France)
During the evolution of an application, it happens that developers must change the programming language. In the context of a collaboration with Berger-Levrault, a major IT company, we are working on the migration of a GWT application to Angular. We focus on the GUI aspect of this migration which, even if both frameworks are web Graphical User Interface (GUI) frameworks, is made difficult because they use different programming languages and different organization schemas. Such migration is complicated by the fact that the new application must closely mimic the visual aspect of the old one so that the users of the application are not disrupted. We propose an approach in four steps that uses a meta-model to represent the GUI at a high abstraction level. We evaluated this approach on an application comprising 470 Java (GWT) classes representing 56 pages. We are able to model all the web pages of the application and 93% of the widgets they contain, and we successfully migrated 26 out of 39 pages (66%). We give examples of the migrated pages, both successful and not.

Article Search

ERA Track

Program State Coverage: A Test Coverage Metric Based on Executed Program States
Khashayar Etemadi Someoliayi, Sajad Jalali, Mostafa Mahdieh, and Seyed-Hassan Mirian-Hosseinabadi
(Sharif University of Technology, Iran)
In software testing, different metrics are proposed to predict and compare test suite effectiveness. In this regard, Mutation Score (MS) is one of the most accurate metrics. However, calculating MS requires executing test suites many times, so it is not commonly used in industry. On the other hand, Line Coverage (LC) is a widely used metric that is calculated by executing test suites only once, although it is not as accurate as MS at predicting and comparing test suite effectiveness. In this paper, we propose a novel test coverage metric, called Program State Coverage (PSC), which improves the accuracy of LC. PSC works almost the same as LC and can be calculated by executing test suites only once. However, it further considers the number of distinct program states in which each line is executed. Our experiments on 120 test suites from four packages of Apache Commons Math and Apache Commons Lang show that, compared to LC, PSC is more strongly correlated with normalized MS. We therefore conclude that PSC is a promising test coverage metric.
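One plausible reading of the idea (a sketch, not necessarily the paper's exact formula) credits each covered line with the number of distinct program states observed there, capped at some bound, so that PSC refines LC:

```python
def line_coverage(executions, total_lines):
    """Classic LC: the fraction of lines executed at least once.
    `executions` maps a line number to the set of distinct program states
    observed when that line ran."""
    return len(executions) / total_lines

def program_state_coverage(executions, total_lines, max_states):
    """Sketch of PSC: each covered line earns credit for every distinct
    program state it executed in, capped at max_states, so two suites with
    equal LC can still be told apart."""
    credit = sum(min(len(states), max_states) for states in executions.values())
    return credit / (total_lines * max_states)
```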

Article Search
On the Diversity of Software Package Popularity Metrics: An Empirical Study of npm
Ahmed Zerouali, Tom Mens, Gregorio Robles, and Jesus M. Gonzalez-Barahona
(University of Mons, Belgium; Universidad Rey Juan Carlos, Spain)
Software systems often leverage open source software libraries to reuse functionality. Such libraries are readily available through software package managers like npm for JavaScript. Because of the huge number of packages available in such package distributions, developers often decide whether to rely on or contribute to a software package based on its popularity. Moreover, it is common practice for researchers to depend on popularity metrics for data sampling and for choosing the right candidates for their studies. However, the meaning of popularity is relative; it can be defined and measured in many different ways, which might produce different outcomes even for the same study. In this paper, we show evidence of how varied the meaning of popularity is in software engineering research, and we empirically analyse the relationship between different software popularity measures. As a case study, for a large dataset of 175k npm packages, we computed and extracted 9 different popularity metrics from three open source tracking systems, including GitHub. We found that popularity can indeed be measured with different, unrelated metrics, each of which can be defined within a specific context. This indicates the need for a generic framework that would use a portfolio of popularity metrics drawing from different concepts.

Article Search
On the Impact of Refactoring Operations on Code Naturalness
Bin Lin, Csaba Nagy, Gabriele Bavota, and Michele Lanza
(USI Lugano, Switzerland)
Recent studies have demonstrated that software is natural, that is, its source code is highly repetitive and predictable, like human languages. Previous studies also suggested the existence of a relationship between code quality and naturalness, presenting empirical evidence that buggy code is "less natural" than non-buggy code. We conjecture that this quality-naturalness relationship could be exploited to support refactoring activities (e.g., to locate source code areas in need of refactoring). We take a first step in this direction by analyzing whether refactoring can improve the naturalness of code.
We use state-of-the-art tools to mine a large dataset of refactoring operations performed in open source systems. Then, we investigate the impact of different types of refactoring operations on the naturalness of the impacted code. We found that (i) code refactoring does not necessarily increase the naturalness of the refactored code; and (ii) the impact on the code naturalness strongly depends on the type of refactoring operations.
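Naturalness is typically measured as cross-entropy under a language model trained on a code corpus; the add-one-smoothed bigram model below is a deliberately tiny stand-in for the models used in this line of work:

```python
import math
from collections import Counter

def bigram_cross_entropy(train_tokens, test_tokens):
    """Average cross-entropy (bits per token) of a token sequence under an
    add-one-smoothed bigram model trained on `train_tokens`. Lower values
    mean the test code is 'more natural' with respect to the corpus."""
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    unigrams = Counter(train_tokens)
    vocab = len(set(train_tokens)) + 1  # +1 for unseen tokens
    pairs = list(zip(test_tokens, test_tokens[1:]))
    bits = 0.0
    for a, b in pairs:
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)
        bits -= math.log2(p)
    return bits / len(pairs)
```

Comparing the cross-entropy of a code fragment before and after a refactoring is one way to quantify whether the refactoring made it "more natural".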

Article Search
Knowledge Graphing Git Repositories: A Preliminary Study
Yanjie Zhao, Haoyu Wang, Lei Ma, Yuxin Liu, Li Li, and John Grundy
(Beijing University of Posts and Telecommunications, China; Harbin Institute of Technology, China; Monash University, Australia)
Knowledge Graphs, which can connect information from a variety of sources, have become very popular since Google introduced the concept in 2012. Researchers in our community have leveraged knowledge graphs for various purposes, such as improving the accessibility of API caveats, generating answers to developer questions, and reasoning about common software weaknesses. In this work, we leverage the knowledge graph concept to help developers and project managers comprehend software repositories. To this end, we design and implement a prototype tool called GitGraph, which takes a Git repository as input and automatically constructs a knowledge graph associated with it. Our preliminary experimental results show that GitGraph can correctly generate knowledge graphs for Git projects and that the generated graphs are useful for users to comprehend the projects. More specifically, the knowledge graph provides, on the one hand, a graphic interface through which users can interactively explore the integrated artefacts, such as commits and changed methods, and, on the other hand, a convenient means for users to search for advanced relations between the different artefacts.
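A knowledge graph over a repository can be represented minimally as a set of (subject, relation, object) triples; the commit-record format below is hypothetical and much simpler than what GitGraph extracts:

```python
def build_repo_graph(commits):
    """Build a small knowledge graph as (subject, relation, object) triples
    from commit records of the form
    {"sha": ..., "author": ..., "methods": [...]}."""
    triples = set()
    for commit in commits:
        triples.add((commit["sha"], "authored_by", commit["author"]))
        for method in commit["methods"]:
            triples.add((commit["sha"], "changes", method))
    return triples

def neighbors(triples, node):
    """Entities directly connected to `node`, in either direction, which is
    the basic query behind interactive graph exploration."""
    out = {(rel, obj) for subj, rel, obj in triples if subj == node}
    out |= {(rel, subj) for subj, rel, obj in triples if obj == node}
    return out
```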

Article Search
Should You Consider Adware as Malware in Your Study?
Jun Gao, Li Li, Pingfan Kong, Tegawendé F. Bissyandé, and Jacques Klein
(University of Luxembourg, Luxembourg; Monash University, Australia)
Empirical validations of research approaches eventually require a curated ground truth. In studies related to Android malware, such a ground truth is built by leveraging Anti-Virus (AV) scanning reports, which are often provided freely through online services such as VirusTotal. Unfortunately, these reports do not offer precise information for appropriately and uniquely assigning classes to samples in app datasets: AV engines indeed do not have a consensus on the information specified in labels. Furthermore, labels often mix information related to families, types, etc. In particular, the notion of "adware" is currently blurry when it comes to maliciousness. There is thus a need to thoroughly investigate cases where adware samples can actually be associated with malware (e.g., because they are tagged as adware but could also be considered malware).
In this work, we present a large-scale analytical study of Android adware samples to quantify the extent to which "adware should be considered as malware". Our analysis is based on the Androzoo repository of 5 million apps with associated AV labels, and it leverages a state-of-the-art label harmonization tool to infer the malicious type of each app before comparing it with the ad families that each adware app is associated with. We found that all adware families include samples that are known to implement specific malicious behavior types. Up to 50% of the samples in an ad family could be flagged as malicious. Overall, the study demonstrates that adware is not necessarily benign.

Article Search
Please Help! A Preliminary Study on the Effect of Social Proof and Legitimization of Paltry Contributions in Donations to OSS
Ugo Yukizawa, Masateru Tsunoda, and Amjed Tahir
(Kindai University, Japan; Massey University, New Zealand)
Open source communities have contributed widely to modern software development. The number of open source software (OSS) projects has increased rapidly in the past two decades. Most open source foundations (such as Eclipse, Mozilla, and Apache) operate as non-profits, and usually seek donations from users and developers to financially support their activities. Without such support, some projects might cease development, or even disappear. However, contributions to those foundations are usually solicited in a very simple and modest way, with no special promotion of such contributions. The aim of this study is to promote new strategies that can help increase donations to OSS projects. We analyzed how existing donation pages are structured. We then introduce behavioral economics and psychological theories that have been used in other disciplines to promote donations in OSS. In particular, we used social proof theory, i.e., the observation that people tend to consider the actions of others in an attempt to reflect correct behavior when choosing their own actions, and the legitimization-of-paltry-contributions strategy, i.e., using specific phrases such as “even a very small amount will be helpful” to encourage donations. We conducted an experiment with university students to examine whether those theories are effective in encouraging donations to OSS. Our initial results indicate that the two strategies were indeed effective in promoting donations, and that users were more open to donating compared to traditional methods. This is only a preliminary analysis; we aim to include more users in the future for a more comprehensive analysis. We anticipate that such techniques might help OSS projects secure more donations in the future.

Article Search
DeepCT: Tomographic Combinatorial Testing for Deep Learning Systems
Lei Ma, Felix Juefei-Xu, Minhui Xue, Bo Li, Li Li, Yang Liu, and Jianjun Zhao
(Harbin Institute of Technology, China; Carnegie Mellon University, USA; Macquarie University, Australia; University of Illinois at Urbana-Champaign, USA; Monash University, Australia; Nanyang Technological University, Singapore; Kyushu University, Japan)
Deep learning (DL) has achieved remarkable progress over the past decade and has been widely applied in many industry domains. However, the robustness of DL systems has recently become a great concern: a minor perturbation of the input might cause a DL system to malfunction. These robustness issues could result in severe consequences when a DL system is deployed in safety-critical applications, and they hinder the real-world deployment of DL systems. Testing techniques enable robustness evaluation and the detection of vulnerability issues in a DL system at an early stage. The main challenge of testing a DL system stems from the high dimensionality of its inputs and its large internal latent feature space, which make testing every state almost impossible. For traditional software, combinatorial testing (CT) is an effective technique for balancing testing exploration effort and defect detection capability. In this paper, we perform an exploratory study of CT on DL systems. We propose a set of combinatorial testing criteria specialized for DL systems, as well as a CT-coverage-guided test generation technique. Our evaluation demonstrates that CT provides a promising avenue for testing DL systems.
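To make the CT intuition concrete, here is a minimal sketch of 2-way (pairwise) coverage over binary neuron activation states. This is a simplification for illustration; DeepCT's actual criteria and discretization are defined in the paper, and the function below is an assumption:

```python
# Pairwise combinatorial coverage over binarized neuron activations:
# count which (neuron pair, value pair) combinations a test set exercises.
from itertools import combinations

def pairwise_coverage(activations):
    """activations: list of binary tuples, one per test input, giving
    each neuron's on/off state. Returns the fraction of all 2-way
    (neuron pair, value pair) combinations that are covered."""
    n = len(activations[0])
    covered = set()
    for row in activations:
        for i, j in combinations(range(n), 2):
            covered.add((i, j, row[i], row[j]))
    total = len(list(combinations(range(n), 2))) * 4  # 4 value pairs per neuron pair
    return len(covered) / total

tests = [(0, 0, 0), (1, 1, 0), (0, 1, 1)]
print(round(pairwise_coverage(tests), 2))  # 0.75
```

A coverage-guided generator would then prefer new inputs that hit currently uncovered combinations.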

Article Search
On the Impact of Outdated and Vulnerable JavaScript Packages in Docker Images
Ahmed Zerouali, Valerio Cosentino, Tom Mens, Gregorio Robles, and Jesus M. Gonzalez-Barahona
(University of Mons, Belgium; Bitergia, Spain; Universidad Rey Juan Carlos, Spain)
Containerized applications, and in particular Docker images, are becoming a common solution in cloud environments to meet ever-increasing demands for portability, reliability, and fast deployment. A Docker image includes all the environmental dependencies required to run it, such as specific versions of system and third-party packages. Leveraging its modularity, an image can easily be embedded in other images, thus simplifying dependency sharing and the building of new software. However, the dependencies included in an image may be out of date due to backward compatibility requirements, exposing the environments where the image is deployed to known vulnerabilities. While previous research has studied the impact of bugs and vulnerabilities in system packages within Docker images, no attention has been given to third-party packages. This paper empirically studies the impact of npm JavaScript package vulnerabilities in Docker images. We based our analysis on 961 images from three official repositories that use Node.js, and 1,099 security reports of packages available on npm, the most popular JavaScript package manager. Our results reveal that the presence of outdated npm packages in Docker images increases the risk of potential security vulnerabilities, suggesting that Docker maintainers should keep their installed JavaScript packages up to date.
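The core matching step of such a study can be sketched as comparing installed package versions against a fix-version table. The advisory data and package versions below are made up; the study uses real npm security reports:

```python
# Flag packages in an image's dependency list whose installed version
# is below the first fixed version recorded in a toy advisory table.

def parse_ver(v):
    """Turn '4.17.4' into (4, 17, 4) for tuple comparison."""
    return tuple(int(x) for x in v.split("."))

ADVISORIES = {            # package -> first fixed version (invented data)
    "lodash": "4.17.12",
    "minimist": "1.2.3",
}

def vulnerable(installed):
    """installed: dict package -> version. Returns packages below the fix."""
    return sorted(p for p, v in installed.items()
                  if p in ADVISORIES and parse_ver(v) < parse_ver(ADVISORIES[p]))

image_deps = {"lodash": "4.17.4", "minimist": "1.2.5", "express": "4.16.0"}
print(vulnerable(image_deps))  # ['lodash']
```

Real npm versions are semver strings with pre-release tags, so a production analysis would need a proper semver comparator rather than this tuple split.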

Article Search

Tool Demonstrations

GuardiaML: Machine Learning-Assisted Dynamic Information Flow Control
Angel Luis Scull Pupo, Jens Nicolay, Kyriakos Efthymiadis, Ann Nowé, Coen De Roover, and Elisa Gonzalez Boix
(Vrije Universiteit Brussel, Belgium)
Developing JavaScript and web applications with confidentiality and integrity guarantees is challenging. Information flow control enables the enforcement of such guarantees. However, this technique has yet to be integrated into the software tools developers use in their workflow.
In this paper we present GuardiaML, a machine learning-assisted dynamic information flow control tool for JavaScript web applications. GuardiaML enables developers to detect unwanted information flow from sensitive sources to public sinks. It can handle the DOM and interaction with internal and external libraries and services. Because the specification of sources and sinks can be tedious, GuardiaML assists in this process by suggesting the tagging of sources and sinks via a machine learning component.
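The source-to-sink idea behind dynamic information flow control can be shown in miniature. GuardiaML targets JavaScript and the DOM; this Python toy, with hypothetical source and sink functions, only illustrates the core mechanism of propagating a sensitivity tag and checking it at sinks:

```python
# Minimal dynamic taint tracking: values from a sensitive source carry a
# tag through string concatenation; a public sink rejects tagged values.

class Tainted(str):
    """A string that keeps its 'sensitive' tag through concatenation."""
    def __add__(self, other):
        return Tainted(str(self) + str(other))
    def __radd__(self, other):
        return Tainted(str(other) + str(self))

def source_password():            # hypothetical sensitive source
    return Tainted("hunter2")

def public_sink(value):           # hypothetical public sink (e.g. analytics)
    if isinstance(value, Tainted):
        raise RuntimeError("flow violation: tainted value reached a sink")
    return "sent"

print(public_sink("harmless"))                            # sent
print(isinstance("user=" + source_password(), Tainted))   # True: taint survives
```

The machine-learning component in GuardiaML addresses the part this sketch hard-codes: deciding which functions should be tagged as sources and sinks in the first place.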

Article Search
OBLIVE: Seamless Code Obfuscation for Java Programs and Android Apps
Davide Pizzolotto, Roberto Fellin, and Mariano Ceccato
(Fondazione Bruno Kessler, Italy; University of Trento, Italy)
Malicious reverse engineering is a problem when a program is delivered to end users. An end user might try to understand the internals of the program in order to elaborate an attack, tamper with the software, and alter its behaviour. Code obfuscation mitigates such malicious reverse engineering and tampering attacks by making programs harder to analyze (by a tool) and understand (by a human).
In this paper, we present Oblive, a tool meant to support developers in applying code obfuscation to their programs. A developer is only required to specify security requirements as single-line code annotations. Oblive then reads the annotations and applies state-of-the-art data and code obfuscation, namely xor-mask with opaque mask and Java-to-native code conversion, while the program is being compiled. Oblive has been successfully applied both to plain Java programs and to Android apps.
Showcase videos are available for both the code obfuscation part and the data obfuscation part.
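The xor-mask transformation itself is simple arithmetic. The sketch below shows only the math; Oblive applies it to Java fields at compile time and hides the mask behind an opaque expression, whereas here the mask is a visible constant:

```python
# Xor-mask data obfuscation in miniature: the sensitive constant is
# stored xor-ed with a mask and only unmasked at the point of use.

MASK = 0x5A5A5A5A

def mask(value):
    return value ^ MASK      # stored form, meaningless on its own

def unmask(stored):
    return stored ^ MASK     # xor is its own inverse

secret = 1234
stored = mask(secret)
assert stored != secret      # the stored value no longer reveals the secret
print(unmask(stored))        # 1234
```

The protection comes from making the mask opaque to static analysis, not from the xor itself, which is trivially invertible once the mask is known.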

Article Search
Madoop: Improving Browser-Based Volunteer Computing Based on Modern Web Technologies
Hiroyuki Matsuo, Shinsuke Matsumoto, Yoshiki Higo, and Shinji Kusumoto
(Osaka University, Japan)
Browser-based volunteer computing (BBVC) is a distributed computing paradigm that has attracted researchers' and developers' attention for its portability and extraordinary potential computing power. However, BBVC still faces two significant challenges: low programmability and low performance. These challenges are a heavy burden for users and prevent BBVC from spreading widely. In this paper, we propose a novel BBVC framework that addresses these challenges by using MapReduce and WebAssembly. Our framework reduces the total execution time by 64% compared with a traditional BBVC mechanism. We also show a practical scenario and its performance.
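The MapReduce model the framework builds on fits in a few lines: map emits key/value pairs, shuffle groups them by key, and reduce folds each group. This word-count sketch is the classic textbook example, not Madoop's implementation (which distributes these phases across browsers via WebAssembly):

```python
# Classic MapReduce word count: map -> shuffle (group by key) -> reduce.
from collections import defaultdict

def map_phase(docs):
    for doc in docs:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

Because map and reduce are pure per-key functions, each browser can process a partition independently, which is what makes the model a good fit for volunteer computing.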

Article Search Info
Automating Performance Antipattern Detection and Software Refactoring in UML Models
Davide Arcelli, Vittorio Cortellessa, and Daniele Di Pompeo
(University of L'Aquila, Italy)
The satisfaction of ever more stringent performance requirements is one of the main reasons for software evolution. However, it is difficult to determine the primary causes of performance degradation, because they may depend on the joint combination of multiple factors (e.g., workload, software deployment, hardware utilization). With the increasing complexity of software systems, classical bottleneck analysis shows limitations in capturing complex performance problems. Hence, in the last decade, the detection of performance antipatterns has gained momentum as an effective way to identify the causes of performance degradation. We introduce PADRE (Performance Antipattern Detection and REfactoring), a tool for (i) detecting performance antipatterns in UML models and (ii) refactoring models with the aim of removing the detected antipatterns. PADRE has been implemented within Epsilon, an open-source platform for model-driven engineering. It is based on a methodology that allows performance antipattern detection and refactoring within the same implementation context.
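Antipattern detection is, at its core, rule matching over model metrics. The sketch below shows a single invented rule over invented thresholds; PADRE's real rules operate on UML/MARTE model elements via Epsilon, so everything here is an assumption made for illustration:

```python
# Rule-based detection in miniature: flag a "Blob"-style antipattern when
# one component concentrates too many operations and too much utilization.

BLOB_OPS, BLOB_UTIL = 10, 0.8   # hypothetical thresholds

def detect_blob(components):
    """components: list of dicts with 'name', 'operations', 'utilization'."""
    return [c["name"] for c in components
            if c["operations"] > BLOB_OPS and c["utilization"] > BLOB_UTIL]

model = [
    {"name": "OrderManager", "operations": 23, "utilization": 0.92},
    {"name": "Logger", "operations": 3, "utilization": 0.10},
]
print(detect_blob(model))  # ['OrderManager']
```

A matched rule would then trigger a corresponding refactoring, e.g. splitting the flagged component and redistributing its operations.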

Article Search Video
ICSD: Interactive Visual Support for Understanding Code Control Structure
Ahmad Jbara, Mousa Agbaria, Alon Adoni, Malek Jabareen, and Ameen Yasin
(Netanya Academic College, Israel; University of Connecticut, USA)
Code comprehension is a mental process involved in any maintenance activity, and it becomes decisive in large methods. Such methods are burdened with programming constructs, as lines of code (LOC) correlate with McCabe's cyclomatic complexity (MCC). This makes it hard to capture their code, as they span many pages even on large screens, and as a result hinders grasping structural properties that might be key for maintenance.
Visualization can assist in comprehending complex systems. It has been shown that control structure diagrams (CSDs) could be useful to better understand and discover structural properties, such as code regularity, of such large methods.
IDEs and development tools have been moving from desktops to the Web so as to benefit from this ubiquitous environment that provides instant collaboration and easier integration.
This paper presents ICSD, an interactive Web-based tool that implements CSDs for Java methods. In particular, it visualizes their control structure and nesting. The tool's interactivity enables the developer to examine the underlying code of specific parts of the diagram. Using ICSD, the developer can compare different regions of the diagram that seem to share some commonality or another interesting relation. ICSD easily conveys the structural characteristics of code, especially very large code, to programmers and helps them better understand and refactor it.
To demonstrate the usage and usefulness of ICSD, we present very large functions from real software systems and show ICSD's different features.
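The structural data a control structure diagram renders is essentially a nesting-depth profile of the control statements. ICSD targets Java; the sketch below uses Python's `ast` module purely to illustrate what that underlying data looks like:

```python
# Compute a (line, nesting depth) profile of control statements, the raw
# structural information a control structure diagram visualizes.
import ast

def nesting_profile(src):
    """Return (line, depth) for each if/for/while/try statement."""
    profile = []
    def walk(node, depth):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.If, ast.For, ast.While, ast.Try)):
                profile.append((child.lineno, depth + 1))
                walk(child, depth + 1)
            else:
                walk(child, depth)
    walk(ast.parse(src), 0)
    return profile

code = "for i in range(3):\n    if i % 2:\n        print(i)\n"
print(nesting_profile(code))  # [(1, 1), (2, 2)]
```

Repeated shapes in such a profile are exactly the kind of code regularity the abstract mentions as discoverable through a CSD.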

Article Search
GoCity: Code City for Go
Rodrigo Brito, Aline Brito, Gleison Brito, and Marco Tulio Valente
(Federal University of Minas Gerais, Brazil)
Go is a statically typed and compiled language that has been widely used to develop robust and popular projects. As with other systems, these projects change over time: developers modify the source code to improve quality or to implement new features. In this context, they rely on tools and approaches that support software maintenance tasks. However, there is a lack of such tools for Go developers. To address this gap, we introduce GoCity, a web-based implementation of the CodeCity program visualization metaphor for Go. The tool extracts source code metrics to create a software visualization in an automated way. We also report usage scenarios of GoCity involving three popular Go projects. Finally, we report feedback from 12 developers about the GoCity visualizations of their projects.
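The CodeCity metaphor reduces to a mapping from source metrics to building dimensions. The abstract does not give GoCity's exact mapping, so the one below (footprint from function count, height from LOC) is an invented example of the general idea:

```python
# CodeCity-style metric mapping: each package becomes a "building" whose
# footprint and height are derived from source code metrics.
import math

def building(metrics):
    """metrics: dict with 'loc' and 'num_funcs'. Returns (width, height)."""
    width = math.sqrt(metrics["num_funcs"])  # footprint grows with function count
    height = metrics["loc"] / 10             # height grows with lines of code
    return round(width, 2), round(height, 2)

print(building({"loc": 450, "num_funcs": 16}))  # (4.0, 45.0)
```

Rendering every package this way makes outliers, such as a package that towers over the rest of the "city", immediately visible.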

Article Search Info
