SANER 2017 – Proceedings

What Information about Code Snippets Is Available in Different Software-Related Documents? An Exploratory Study
Preetha Chatterjee, Manziba Akanda Nishi, Kostadin Damevski, Vinay Augustine, Lori Pollock, and Nicholas A. Kraft
(University of Delaware, USA; Virginia Commonwealth University, USA; ABB Corporate Research, USA)
A large corpus of software-related documents is available on the Web, and these documents offer the unique opportunity to learn from what developers are saying or asking about the code snippets that they are discussing. For example, the natural language in a bug report provides information about what is not functioning properly in a particular code snippet. Previous research has mined information about code snippets from bug reports, emails, and Q&A forums. This paper describes an exploratory study into the kinds of information that are embedded in different software-related documents. The goal of the study is to gain insight into the potential value and difficulty of mining the natural language text associated with the code snippets found in a variety of software-related documents, including blog posts, API documentation, code reviews, and public chats.

Harnessing Twitter to Support Serendipitous Learning of Developers
Abhishek Sharma, Yuan Tian, Agus Sulistya, David Lo, and Aiko Fallas Yamashita
(Singapore Management University, Singapore; Oslo and Akershus University College of Applied Sciences, Norway)
Developers often rely on various online resources, such as blogs, to keep themselves up-to-date with the fast pace at which software technologies are evolving. Singer et al. found that developers tend to use channels such as Twitter to keep themselves updated and support learning, often in an undirected or serendipitous way, coming across things that they may not apply presently, but which should be helpful in supporting their developer activities in future. However, identifying relevant and useful articles among the millions of pieces of information shared on Twitter is a non-trivial task. In this work, to support serendipitous discovery of relevant and informative resources for developer learning, we propose an unsupervised and a supervised approach to find and rank URLs (which point to web resources) harvested from Twitter based on their informativeness and relevance to a domain of interest. We propose 14 features to characterize each URL by considering the contents of the webpage it points to, the contents and popularity of tweets mentioning it, and the popularity of users who shared the URL on Twitter. The results of our experiments on tweets generated by a set of 85,171 users over a one-month period highlight that our proposed unsupervised and supervised approaches can achieve reasonably high Normalized Discounted Cumulative Gain (NDCG) scores of 0.719 and 0.832, respectively.
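For readers unfamiliar with the metric reported above, the following is a minimal sketch of NDCG for a single ranked list. It assumes the common formulation with linear gain and a log2 rank discount; the abstract does not state which NDCG variant or cutoff the authors used, and the function names and relevance values here are illustrative only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: the item at rank i (1-based) is discounted by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (descending) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance judgments for ranked URLs, listed in ranked order.
perfect_ranking = [3, 2, 1]
reversed_ranking = [1, 2, 3]
perfect_score = ndcg(perfect_ranking)
reversed_score = ndcg(reversed_ranking)
```

A ranking that places the most relevant URLs first scores 1.0, and any other ordering of the same items scores lower.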

Why Do We Break APIs? First Answers from Developers
Laerte Xavier, Andre Hora, and Marco Tulio Valente
(Federal University of Minas Gerais, Brazil)
Breaking contracts has a major impact on API clients. Despite this fact, recent studies show that libraries are often backward incompatible and that the rate of breaking changes increases over time. However, the specific reasons that motivate library developers to break contracts with their clients are still unclear. In this paper, we describe a qualitative study with library developers and real instances of API breaking changes. Our goal is to (i) elicit the reasons why developers introduce breaking changes; and (ii) check if they are aware of the risks of such changes. Our survey with the top contributors of popular Java libraries reveals a list of five reasons why developers break API contracts. Moreover, it also shows that most developers are aware of these risks and, in some cases, adopt strategies to mitigate them. We conclude by prospecting a future study to strengthen our current findings. With this study, we expect to contribute to delineating tools to better assess the risks and impacts of API breaking changes.

An Arc-Based Approach for Visualization of Code Smells
Marcel Steinbeck
(University of Bremen, Germany)
Code smells are indicators of design flaws that may have negative effects on software comprehensibility and changeability. In recent years, several detection tools have been developed that are supposed to help in revealing code smells in large software systems. However, usually only a subset of the detected code smells is suitable for refactoring. Previous studies on software clones have shown that visualization of findings may assist developers in identifying relevant refactoring opportunities by highlighting peculiarities and, thus, is useful to enhance a software’s maintainability. Nevertheless, techniques to visualize code smells in general are rare, although they are an interesting field of research that could bridge the gap between code smell detection and code smell refactoring. This paper presents a visualization approach that is supposed to help in assessing the dispersion and extent of arbitrary code smells by combining different existing techniques. The core of our approach consists of several Treemaps that are arranged on a circle in order to obtain a better integration of additional visualizations. Furthermore, the presented technique provides various interaction mechanisms that allow users to adjust the visualization to target elements of interest.

Towards Continuous Software Release Planning
David Ameller, Carles Farré, Xavier Franch, Danilo Valerio, and Antonino Cassarino
(Universitat Politècnica de Catalunya, Spain; Siemens, Austria)
Continuous software engineering is a new trend that has been gaining increasing attention from the research community in recent years. The main idea behind this trend is to tighten the connection between the software engineering lifecycle activities (e.g., development, planning, integration, testing). While the connection between development and integration (i.e., continuous integration) has been the subject of research and is applied in industrial settings, the connection between other activities is still in a very early stage. We are contributing to this research topic by proposing our ideas towards connecting the software development and software release planning activities (i.e., continuous software release planning). In this paper we present our initial findings on this topic, how we envision addressing continuous software release planning, and a research agenda to fulfil our objectives.

Evolution of Open Source Systems
Fri, Feb 24, 09:00 - 10:30

An Exploratory Study on Library Aging by Monitoring Client Usage in a Software Ecosystem
Raula Gaikovina Kula, Daniel M. German, Takashi Ishio, Ali Ouni, and Katsuro Inoue
(Osaka University, Japan; University of Victoria, Canada; United Arab Emirates University, United Arab Emirates)
In recent times, use of third-party libraries has become a prevalent practice in contemporary software development. Much like other code components, unmaintained libraries are a cause for concern, especially when they risk code degradation over time. Therefore, awareness of when a library should be updated is important. With the emergence of large library-hosting repositories such as Maven Central, we can leverage the dynamics of these ecosystems to understand and estimate when a library is due for an update. In this paper, based on the concepts of software aging, we empirically explore library usage as a means to describe its age. The study covers about 1,500 libraries belonging to the Maven software ecosystem. Results show that library usage changes are not random, with 81.7% of the popular libraries fitting typical polynomial models. Further analysis shows that ecosystem factors such as emerging rivals have an effect on aging characteristics. Our preliminary findings demonstrate that awareness of library aging and its characteristics is a promising step towards aiding client systems in the maintenance of their libraries.

Trends on Empty Exception Handlers for Java Open Source Libraries
Ana Filipa Nogueira, José C. B. Ribeiro, and Mário A. Zenha-Rela
(University of Coimbra, Portugal; Polytechnic Institute of Leiria, Portugal)
Exception-handling structures provide a means to recover from unexpected or undesired flows that occur during software execution, allowing the developer to put the program in a valid state. Still, the application of proper exception-handling strategies is at the bottom of priorities for a great number of developers. Studies have already discussed this subject, pinpointing that, frequently, the implementation of exception-handling mechanisms is enforced by compilers. As a consequence, several anti-patterns related to exception handling have already been identified in the literature.
In this study, we have picked several releases from different Java programs and we investigated one of the most well-known anti-patterns: the empty catch handlers. We have analysed how the empty handlers evolved through several releases of a software product. We have observed some common approaches in terms of empty catches' evolution. For instance, often an empty catch is transformed into an empty catch with a comment. Moreover, for the majority of the programs, the percentage of empty handlers has decreased when comparing the first and last releases. Future work includes the automation of the analysis, allowing the inclusion of data collected from other software artefacts: test suites and data from issue tracking systems.

Analyzing the Evolution of Testing Library Usage in Open Source Java Projects
Ahmed Zerouali and Tom Mens
(University of Mons, Belgium)
Software development projects frequently rely on testing-related libraries to test the functionality of the software product automatically and efficiently. Many such libraries are available for Java, and developers face a hard time deciding which libraries are most appropriate for their project, or when to migrate to a competing library. We empirically analysed the usage of eight testing-related libraries in 4,532 open source Java projects hosted on GitHub. We studied how frequently specific (pairs of) libraries are used over time. We also identified if and when library usages are replaced by competing ones during a project’s lifetime. We found that some libraries are considerably more popular than their competitors, while some libraries become more popular over time. We observed that many projects tend to use multiple libraries together. We also observed permanent and temporary migrations between competing libraries. These findings may pave the way for recommendation tools that allow project developers to choose the most appropriate library for their needs, and to be informed of better alternatives.

On the Evolution of Exception Usage in Java Projects
Haidar Osman, Andrei Chiş, Jakob Schaerer, Mohammad Ghafari, and Oscar Nierstrasz
(University of Bern, Switzerland; Feenk, Switzerland)
Programming languages use exceptions to handle abnormal situations during the execution of a program. While programming languages often provide a set of standard exceptions, developers can further create custom exceptions to capture relevant data about project- and domain-specific errors. We hypothesize that, given their usefulness, custom exceptions are used increasingly as software systems mature. To assess this claim, we empirically analyze the evolution of exceptions and exception-handling code within four popular and long-lived Java systems. We observe that indeed the amount of error-handling code, together with the number of custom exceptions and their usage in catch handlers and throw statements, increases as projects evolve. However, we find that the usage of standard exceptions increases more than the usage of custom exceptions in both catch handlers and throw statements. A preliminary manual analysis of throw statements reveals that developers encode the domain information into the standard Java exceptions as custom string error messages instead of relying on custom exception classes.

Statically Identifying Class Dependencies in Legacy JavaScript Systems: First Results
Leonardo Humberto Silva, Marco Tulio Valente, and Alexandre Bergel
(Federal Institute of Northern Minas Gerais, Brazil; Federal University of Minas Gerais, Brazil; University of Chile, Chile)
Identifying dependencies between classes is an essential activity when maintaining and evolving software applications. It is also known that JavaScript developers often use classes to structure their projects. This happens even in legacy code, i.e., code implemented in JavaScript versions that do not provide syntactical support for classes. However, identifying associations and other dependencies between classes remains a challenge due to the lack of static type annotations. This paper investigates the use of type inference to identify relations between classes in legacy JavaScript code. For this purpose, we rely on Flow, a state-of-the-art type checker and inferencer tool for JavaScript. We perform a study using code with and without annotating the class import statements in two modular applications. The results show that precision is 100% in both systems, and that the annotated version improves the recall, ranging from 37% to 51% for dependencies in general and from 54% to 85% for associations. Therefore, we hypothesize that these tools should also depend on dynamic analysis to cover all possible dependencies in JavaScript code.
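As a reminder of how the precision and recall figures above relate, here is a minimal sketch that computes both over sets of class dependencies. The class names and dependency pairs are hypothetical and are not taken from the systems studied; the sketch only illustrates how a tool can reach 100% precision while missing part of the true dependencies.

```python
def precision_recall(detected, actual):
    """Return (precision, recall) for detected dependencies against the true ones."""
    detected, actual = set(detected), set(actual)
    true_positives = detected & actual
    precision = len(true_positives) / len(detected) if detected else 0.0
    recall = len(true_positives) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical dependencies, each a (source class, target class) pair.
actual = {("Cart", "Item"), ("Cart", "User"), ("Order", "Cart"), ("Order", "Payment")}
# Everything reported is correct, but half of the true dependencies are missed.
detected = {("Cart", "Item"), ("Order", "Cart")}

precision, recall = precision_recall(detected, actual)
```

In this toy case precision is 1.0 and recall is 0.5, the same shape of result the abstract reports: no false positives, but incomplete coverage.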

CodeCritics Applied to Database Schema: Challenges and First Results
Julien Delplanque, Anne Etien, Olivier Auverlot, Tom Mens, Nicolas Anquetil, and Stéphane Ducasse
(University of Mons, Belgium; University of Lille, France; Inria, France)
Relational databases (DB) play a critical role in many information systems. For different reasons, their schemas gather not only tables and columns but also views, triggers or stored functions (i.e., fragments of code describing treatments). As for any other code-related artefact, software quality in a DB schema helps avoid future bugs. However, few tools exist to analyse DB quality and prevent the introduction of technical debt. Moreover, these tools suffer from limitations such as the difficulty of dealing with some entities (e.g., functions) or dependencies between entities. This paper presents research issues related to assessing the software quality of a DB schema by adapting existing source code analysis research to database schemas. We present preliminary results that have been validated through the implementation of DBCritics, a prototype tool to perform static analysis on the SQL source code of a database schema. DBCritics addresses the limitations of existing DB quality tools based on an internal representation considering all entities of the database and their relationships.

Patterns and Optimization
Fri, Feb 24, 14:00 - 15:30

Cloud-Based Parallel Concolic Execution
Ting Chen, Youzheng Feng, Xiapu Luo, Xiaodong Lin, and Xiaosong Zhang
(University of Electronic Science and Technology of China, China; Hong Kong Polytechnic University, China; University of Ontario Institute of Technology, Canada)
Path explosion is one of the biggest challenges hindering the wide application of concolic execution. Although several parallel approaches have been proposed to accelerate concolic execution, they neither scale well nor properly handle resource fluctuations and node failures, which often happen in practice. In this paper, we propose a novel approach, named PACCI, which parallelizes concolic execution and adapts to the drastic changes of computing resources by leveraging cloud infrastructures. PACCI tailors concolic execution to the MapReduce programming model and takes into account the features of cloud infrastructures. In particular, we tackle several challenging issues, such as making the exploration of different program paths independent and constructing an extensible path exploration module to support the prioritization of test inputs from a global perspective. Preliminary experimental results show that PACCI is scalable (e.g., gaining about 20X speedup using 24 nodes) and that its efficiency declines only slightly, by about 5% and 6.1% under resource fluctuations and node failures, respectively.

Under-Optimized Smart Contracts Devour Your Money
Ting Chen, Xiaoqi Li, Xiapu Luo, and Xiaosong Zhang
(University of Electronic Science and Technology of China, China; Hong Kong Polytechnic University, China)
Smart contracts are full-fledged programs that run on blockchains (e.g., Ethereum, one of the most popular blockchains). In Ethereum, gas (in Ether, a cryptographic currency like Bitcoin) is the execution fee compensating the computing resources of miners for running smart contracts. However, we find that under-optimized smart contracts cost more gas than necessary, and therefore the creators or users will be overcharged. In this work, we conduct the first investigation on Solidity, the recommended compiler, and reveal that it fails to optimize gas-costly programming patterns. In particular, we identify 7 gas-costly patterns and group them into 2 categories. Then, we propose and develop GASPER, a new tool for automatically locating gas-costly patterns by analyzing smart contracts' bytecodes. The preliminary results on discovering 3 representative patterns from 4,240 real smart contracts show that 93.5%, 90.1% and 80% of contracts suffer from these 3 patterns, respectively.

Pluggable Controllers and Nano-Patterns
Yossi Gil, Ori Marcovitch, and Matteo Orrù
(Technion, Israel)
This paper raises the idea of giving end users the ability to modify and extend the control flow constructs (if, while, etc.) of the underlying programming language, just as they can modify and extend the standard library implementation of function printf and class String. Pluggable Controllers are a means for modular design of control constructors, e.g., if, while, do, switch, and operators such as short circuit conjunction (&&) and the “?.” operator of the Swift programming language. We propose a modular, pluggable-controllers-based design of a language. In this design there are control constructors which are core, augmented by a standard library of control constructors, which, just like all standard libraries, is extensible and replaceable. The control constructors standard library can then follow a course of evolution that is less coupled with that of the main language, where a library release does not mandate a new language release. At the same time, the library could be extended by individuals, corporations, and communities to implement more or less idiosyncratic Nano-Patterns. We demonstrate the imposition of pluggable control constructors on Java by employing Lola, a Turing-complete and programming-language-independent code preprocessor.

Query Construction Patterns in PHP
David Anderson and Mark Hills
(East Carolina University, USA)
Most PHP applications use databases, with developers including both static queries, given directly in the code, and dynamic queries, which are based on a mixture of static text, computed values, and user input. In this paper, we focus specifically on how developers create queries that are then used with the original MySQL API library. Based on a collection of open-source PHP applications, our initial results show that many of these queries are created according to a small collection of query construction patterns. We believe that identifying these patterns provides a solid base for program analysis, comprehension, and transformation tools that need to reason about database queries, including tools to support renovating existing PHP code to support safer, more modern database access APIs.

Supporting Schema Evolution in Schema-Less NoSQL Data Stores
Loup Meurice and Anthony Cleve
(University of Namur, Belgium)
NoSQL data stores are becoming popular due to their schema-less nature. They offer a high level of flexibility, since they do not require declaring a global schema. Thus, the data model is maintained within the application source code. However, due to this flexibility, developers have to struggle with a growing data structure entropy and to manage legacy data. Moreover, support for schema evolution is lacking, which may lead to runtime errors or irretrievable data loss, if not properly handled. This paper presents an approach to support the evolution of a schema-less NoSQL data store by analyzing the application source code and its history. We motivate this approach on a subject system and explain how useful it is to understand the present database structure and facilitate future developments.

Early Research Achievements

Learning from and Providing Help to Developers
Wed, Feb 22, 11:00 - 12:30
