SANER 2017 – Author Index |
Adams, Bram |
SANER '17: "Code of Conduct in Open Source ..."
Code of Conduct in Open Source Projects
Parastou Tourani, Bram Adams, and Alexander Serebrenik (Polytechnique Montréal, Canada; Eindhoven University of Technology, Netherlands) Open source projects rely on collaboration of members from all around the world using web technologies like GitHub and Gerrit. This mixture of people with a wide range of backgrounds, including minorities like women, ethnic minorities, and people with disabilities, may increase the risk of offensive and destructive behaviours in the community, potentially leading affected project members to leave for a more welcoming and friendly environment. To counter these effects, open source projects are increasingly turning to codes of conduct, in an attempt to promote their expectations and standards of ethical behaviour. In this first-of-its-kind empirical study of codes of conduct in open source software projects, we investigated the role, scope and influence of codes of conduct through a mixture of quantitative and qualitative analysis, supported by interviews with practitioners. We found that the top codes of conduct are adopted by hundreds to thousands of projects, and that all of them share five common dimensions. @InProceedings{SANER17p24, author = {Parastou Tourani and Bram Adams and Alexander Serebrenik}, title = {Code of Conduct in Open Source Projects}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {24--33}, doi = {}, year = {2017}, } |
|
Alexandru, Carol V. |
SANER '17: "Reducing Redundancies in Multi-revision ..."
Reducing Redundancies in Multi-revision Code Analysis
Carol V. Alexandru, Sebastiano Panichella, and Harald C. Gall (University of Zurich, Switzerland) Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code. @InProceedings{SANER17p148, author = {Carol V. Alexandru and Sebastiano Panichella and Harald C. Gall}, title = {Reducing Redundancies in Multi-revision Code Analysis}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {148--159}, doi = {}, year = {2017}, } |
|
AlSuhaibani, Reem S. |
SANER '17: "Lexical Categories for Source ..."
Lexical Categories for Source Code Identifiers
Christian D. Newman, Reem S. AlSuhaibani, Michael L. Collard, and Jonathan I. Maletic (Kent State University, USA; University of Akron, USA) A set of lexical categories, analogous to part-of-speech categories for English prose, is defined for source-code identifiers. The lexical category for an identifier is determined from its declaration in the source code, syntactic meaning in the programming language, and static program analysis. Current techniques for assigning lexical categories to identifiers use natural-language part-of-speech taggers. However, these NLP approaches assign lexical tags based on how terms are used in English prose. The approach taken here differs in that it uses only source code to determine the lexical category. The approach assigns a lexical category to each identifier and stores this information along with each declaration. srcML is used as the infrastructure to implement the approach and so the lexical information is stored directly in the srcML markup as an additional XML element for each identifier. These lexical-category annotations can later be used by tools that automatically generate such things as code summarization or documentation. The approach is applied to 50 open source projects and the soundness of the defined lexical categories is evaluated. The evaluation shows that at every level of minimum support tested, categorization is consistent at least 79% of the time with an overall consistency (across all supports) of at least 88%. The categories reveal a correlation between how an identifier is named and how it is declared. This provides a syntax-oriented view (as opposed to English part-of-speech view) of developer intent of identifiers. @InProceedings{SANER17p228, author = {Christian D. Newman and Reem S. AlSuhaibani and Michael L. Collard and Jonathan I. Maletic}, title = {Lexical Categories for Source Code Identifiers}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {228--239}, doi = {}, year = {2017}, } |
|
Amann, Sven |
SANER '17: "Enriching In-IDE Process Information ..."
Enriching In-IDE Process Information with Fine-Grained Source Code History
Sebastian Proksch, Sarah Nadi, Sven Amann, and Mira Mezini (TU Darmstadt, Germany; University of Alberta, Canada) Current studies on software development either focus on the change history of source code from version-control systems or on an analysis of simplistic in-IDE events without context information. Each of these approaches contains valuable information that is unavailable in the other case. Our work proposes enriched event streams, a solution that combines the best of both worlds and provides a holistic view of the software development process. Enriched event streams not only capture developer activities in the IDE, but also specialized context information, such as source-code snapshots for change events. To enable the storage of such code snapshots in an analyzable format, we introduce a new intermediate representation called Simplified Syntax Trees (SSTs) and build CARET, a platform that offers reusable components to conveniently work with enriched event streams. We implement FeedBaG++, an instrumentation for Visual Studio that collects enriched event streams with code snapshots in the form of SSTs. We share a dataset of enriched event streams captured from 58 users and representing 915 days of work. Additionally, to demonstrate usefulness, we present three research applications that have already made use of CARET and FeedBaG++. @InProceedings{SANER17p250, author = {Sebastian Proksch and Sarah Nadi and Sven Amann and Mira Mezini}, title = {Enriching In-IDE Process Information with Fine-Grained Source Code History}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {250--260}, doi = {}, year = {2017}, } Info |
|
An, Le |
SANER '17: "Stack Overflow: A Code Laundering ..."
Stack Overflow: A Code Laundering Platform?
Le An, Ons Mlouki, Foutse Khomh, and Giuliano Antoniol (Polytechnique Montréal, Canada) Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license that developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps, to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to code posting to Stack Overflow or code reuse from Stack Overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow. @InProceedings{SANER17p283, author = {Le An and Ons Mlouki and Foutse Khomh and Giuliano Antoniol}, title = {Stack Overflow: A Code Laundering Platform?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {283--293}, doi = {}, year = {2017}, } |
|
Anquetil, Nicolas |
SANER '17: "Recommending Source Code Locations ..."
Recommending Source Code Locations for System Specific Transformations
Gustavo Santos, Klérisson V. R. Paixão, Nicolas Anquetil, Anne Etien, Marcelo de Almeida Maia, and Stéphane Ducasse (University of Lille, France; CNRS, France; Inria, France; Federal University of Uberlândia, Brazil) From time to time, developers perform sequences of code transformations in a systematic and repetitive way. This may happen, for example, when introducing a design pattern in a legacy system: similar classes have to be introduced, containing similar methods that are called in a similar way. Automation of these sequences of transformations has been proposed in the literature to avoid errors due to their repetitive nature. However, developers still need support to identify all the relevant code locations that are candidates for transformation. Past research showed that these kinds of transformations can lag for years, with forgotten instances popping up from time to time as other evolutions bring them to light. In this paper, we evaluate three distinct code search approaches (structural, Information Retrieval based, and AST based) to find code locations that would require similar transformations. We validate the resulting candidate locations from these approaches on real cases previously identified in the literature. The results show that looking for code with similar roles, e.g., classes in the same hierarchy, provides interesting results, with an average recall of 87% and, in some cases, precision of up to 70%. @InProceedings{SANER17p160, author = {Gustavo Santos and Klérisson V. R. Paixão and Nicolas Anquetil and Anne Etien and Marcelo de Almeida Maia and Stéphane Ducasse}, title = {Recommending Source Code Locations for System Specific Transformations}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {160--170}, doi = {}, year = {2017}, } |
|
Antoniol, Giuliano |
SANER '17: "An Empirical Study of Code ..."
An Empirical Study of Code Smells in JavaScript Projects
Amir Saboury, Pooya Musavi, Foutse Khomh, and Giuliano Antoniol (Polytechnique Montréal, Canada) JavaScript is a powerful scripting programming language that has gained a lot of attention this past decade. Initially used exclusively for client-side web development, it has evolved to become one of the most popular programming languages, with developers now using it for both client-side and server-side application development. Similar to applications written in other programming languages, JavaScript applications contain code smells, which are poor design choices that can negatively impact the quality of an application. In this paper, we investigate code smells in JavaScript server-side applications with the aim to understand how they impact the fault-proneness of applications. We detect 12 types of code smells in 537 releases of five popular JavaScript applications (i.e., express, grunt, bower, less.js, and request) and perform survival analysis, comparing the time until a fault occurrence, in files containing code smells and files without code smells. Results show that (1) on average, files without code smells have hazard rates 65% lower than files with code smells. (2) Among the studied smells, “Variable Re-assign” and “Assignment in Conditional Statements” code smells have the highest hazard rates. Additionally, we conduct a survey with 1,484 JavaScript developers, to understand the perception of developers towards our studied code smells. We found that developers consider “Nested Callbacks”, “Variable Re-assign” and “Long Parameter List” code smells to be serious design problems that hinder the maintainability and reliability of applications. This assessment is in line with the findings of our quantitative analysis. Overall, code smells negatively affect the quality of JavaScript applications, and developers should consider tracking and removing them early on, before releasing applications to the public. 
@InProceedings{SANER17p294, author = {Amir Saboury and Pooya Musavi and Foutse Khomh and Giuliano Antoniol}, title = {An Empirical Study of Code Smells in JavaScript Projects}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {294--305}, doi = {}, year = {2017}, } SANER '17: "Stack Overflow: A Code Laundering ..." Stack Overflow: A Code Laundering Platform? Le An, Ons Mlouki, Foutse Khomh, and Giuliano Antoniol (Polytechnique Montréal, Canada) Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license that developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps, to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to code posting to Stack Overflow or code reuse from Stack Overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow. 
@InProceedings{SANER17p283, author = {Le An and Ons Mlouki and Foutse Khomh and Giuliano Antoniol}, title = {Stack Overflow: A Code Laundering Platform?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {283--293}, doi = {}, year = {2017}, } |
|
Bill, Robert |
SANER '17: "Automated Generation of Consistency-Achieving ..."
Automated Generation of Consistency-Achieving Model Editors
Patrick Neubauer, Robert Bill, Tanja Mayerhofer, and Manuel Wimmer (Vienna University of Technology, Austria) The advances of domain-specific modeling languages (DSMLs) and their editors, created with modern language workbenches, have convinced domain experts to apply them as important and powerful means in their daily endeavors. Although such editors are proficient in retaining syntactical model correctness, they exhibit major shortcomings in preserving the consistency of models with elaborate language-specific constraints, requiring language engineers to manually implement sophisticated editing capabilities. Consequently, there is a demand for automated procedures that support editor users in both comprehending and resolving consistency violations. In this paper, we present an approach to automate the generation of advanced editing support for DSMLs, offering validation, content-assist, and quick-fix capabilities beyond those created by state-of-the-art language workbenches, which help domain experts retain and achieve the consistency of models. For validation, we show potential error causes for violated constraints, instead of only the context in which constraints are violated. Our approach mitigates the state-space explosion problem by resolving constraint violations in a three-stage process that incrementally widens the neighborhood scope, seeking constraint repair solutions that are presented as quick fixes to the editor user. We illustrate and provide an initial evaluation of our approach based on an Xtext-based DSML for modeling service clusters. @InProceedings{SANER17p127, author = {Patrick Neubauer and Robert Bill and Tanja Mayerhofer and Manuel Wimmer}, title = {Automated Generation of Consistency-Achieving Model Editors}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {127--137}, doi = {}, year = {2017}, } Info |
|
Briand, Lionel C. |
SANER '17: "Improving Fault Localization ..."
Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models
Bing Liu, Lucia, Shiva Nejati, and Lionel C. Briand (University of Luxembourg, Luxembourg) One promising way to improve the accuracy of fault localization based on statistical debugging is to increase diversity among test cases in the underlying test suite. In many practical situations, adding test cases is not a cost-free option because test oracles are developed manually or running test cases is expensive. Hence, we need test suites that are both diverse and small to improve debugging. In this paper, we focus on improving fault localization of Simulink models by generating test cases. We identify three test objectives that aim to increase test suite diversity. We use these objectives in a search-based algorithm to generate diversified but small test suites. To further minimize test suite sizes, we develop a prediction model to stop test generation when adding test cases is unlikely to improve fault localization. We evaluate our approach using three industrial subjects. Our results show (1) the three selected test objectives are able to significantly improve the accuracy of fault localization for small test suite sizes, and (2) our prediction model is able to maintain almost the same fault localization accuracy while reducing the average number of newly generated test cases by more than half. @InProceedings{SANER17p359, author = {Bing Liu and Lucia and Shiva Nejati and Lionel C. Briand}, title = {Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {359--370}, doi = {}, year = {2017}, } |
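The statistical debugging underlying this entry can be illustrated with a classic spectrum-based suspiciousness metric such as Tarantula. The sketch below is generic Python over made-up coverage data, not the paper's Simulink-specific technique or its prediction model:

```python
# Tarantula suspiciousness: statements executed mostly by failing tests
# score close to 1, statements executed mostly by passing tests close to 0.
# The coverage data below is hypothetical, for illustration only.

def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """failed_cov/passed_cov: times the statement ran in failing/passing tests."""
    f = failed_cov / total_failed if total_failed else 0.0
    p = passed_cov / total_passed if total_passed else 0.0
    return f / (f + p) if (f + p) > 0 else 0.0

# Three statements; the suite has 2 failing and 8 passing tests in total.
coverage = {"s1": (2, 1), "s2": (0, 8), "s3": (1, 4)}
scores = {s: tarantula(fc, pc, 2, 8) for s, (fc, pc) in coverage.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # most suspicious first
```

Increasing test diversity, as the paper does, sharpens exactly these per-statement pass/fail profiles and thus the resulting ranking.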
|
Brito, Aline |
SANER '17: "Historical and Impact Analysis ..."
Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study
Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente (Federal University of Minas Gerais, Brazil) Change is routine in software development. Like any system, libraries also evolve over time. As a consequence, clients are compelled to update and, thus, benefit from the available API improvements. However, some of these API changes may break contracts previously established, resulting in compilation errors and behavioral changes. In this paper, we study a set of questions regarding API breaking changes. Our goal is to measure the amount of breaking changes on real-world libraries and its impact on clients at a large scale. We assess (i) the frequency of breaking changes, (ii) the behavior of these changes over time, (iii) the impact on clients, and (iv) the characteristics of libraries with high frequency of breaking changes. Our large-scale analysis on 317 real-world Java libraries, 9K releases, and 260K client applications shows that (i) 14.78% of the API changes break compatibility with previous versions, (ii) the frequency of breaking changes increases over time, (iii) 2.54% of their clients are impacted, and (iv) systems with higher frequency of breaking changes are larger, more popular, and more active. Based on these results, we provide a set of lessons to better support library and client developers in their maintenance tasks. @InProceedings{SANER17p138, author = {Laerte Xavier and Aline Brito and Andre Hora and Marco Tulio Valente}, title = {Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {138--147}, doi = {}, year = {2017}, } |
|
Carette, Antonin |
SANER '17: "Investigating the Energy Impact ..."
Investigating the Energy Impact of Android Smells
Antonin Carette, Mehdi Adel Ait Younes, Geoffrey Hecht, Naouel Moha, and Romain Rouvoy (Université du Québec à Montréal, Canada; Inria, France; University of Lille, France; IUF, France) Android code smells are bad implementation practices within Android applications (or apps) that may lead to poor software quality. These code smells are known to degrade the performance of apps and to have an impact on energy consumption. However, few studies have assessed the positive impact on energy consumption when correcting code smells. In this paper, we therefore propose a tooled and reproducible approach, called Hot-Pepper, to automatically correct code smells and evaluate their impact on energy consumption. Currently, Hot-Pepper is able to automatically correct three types of Android-specific code smells: Internal Getter/Setter, Member Ignoring Method, and HashMap Usage. Hot-Pepper derives four versions of the apps by correcting each detected smell independently, and all of them at once. Hot-Pepper is able to report on the energy consumption of each app version with a single user scenario test. Our empirical study on five open-source Android apps shows that correcting the three aforementioned Android code smells effectively and significantly reduces the energy consumption of apps. In particular, we observed a global reduction in energy consumption by 4.83% in one app when the three code smells are corrected. We also take advantage of the flexibility of Hot-Pepper to investigate the impact of three picture smells (bad picture format, compression, and bitmap format) in sample apps. We observed that the usage of optimised JPG pictures with the Android default bitmap format is the most energy-efficient combination in Android apps. We believe that developers can benefit from our approach and results to guide their refactoring, and thus improve the energy consumption of their mobile apps. 
@InProceedings{SANER17p115, author = {Antonin Carette and Mehdi Adel Ait Younes and Geoffrey Hecht and Naouel Moha and Romain Rouvoy}, title = {Investigating the Energy Impact of Android Smells}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {115--126}, doi = {}, year = {2017}, } |
|
Ceccato, Mariano |
SANER '17: "Automatic Generation of Opaque ..."
Automatic Generation of Opaque Constants Based on the K-Clique Problem for Resilient Data Obfuscation
Roberto Tiella and Mariano Ceccato (Fondazione Bruno Kessler, Italy) Data obfuscations are program transformations used to complicate program understanding and conceal actual values of program variables. The possibility to hide constant values is a basic building block of several obfuscation techniques. For example, in XOR Masking a constant mask is used to encode data, but this mask must be hidden too, in order to keep the obfuscation resilient to attacks. In this paper, we present a novel technique based on the k-clique problem, which is known to be NP-complete, to generate opaque constants, i.e. values that are difficult to guess by static analysis. In our experimental assessment we show that our opaque constants are computationally cheap to generate, both at obfuscation time and at runtime. Moreover, due to the NP-completeness of the k-clique problem, our opaque constants can be proven to be hard to attack with state-of-the-art static analysis tools. @InProceedings{SANER17p182, author = {Roberto Tiella and Mariano Ceccato}, title = {Automatic Generation of Opaque Constants Based on the K-Clique Problem for Resilient Data Obfuscation}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {182--192}, doi = {}, year = {2017}, } |
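The XOR Masking building block mentioned in this abstract can be sketched in a few lines of Python (illustrative only; the paper's actual contribution, concealing the mask itself behind k-clique-based opaque constants, is not reproduced here):

```python
# XOR Masking: a secret constant is stored only in encoded form, and the
# original value is recovered at runtime by XOR-ing with the mask.
# Without also hiding MASK, an attacker can recover SECRET statically,
# which is the weakness the paper's opaque constants address.

MASK = 0x5A5A5A5A       # must itself be concealed in a real obfuscator
SECRET = 0xCAFEBABE     # the value the obfuscation wants to hide

encoded = SECRET ^ MASK  # what would actually be stored in the binary

def decode(enc, mask):
    """Runtime recovery of the masked constant."""
    return enc ^ mask
```

`decode(encoded, MASK)` yields `SECRET` again; an opaque constant, as generated by the paper's technique, would stand in for `MASK` so that static analysis cannot evaluate it.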
|
Ciurumelea, Adelina |
SANER '17: "Analyzing Reviews and Code ..."
Analyzing Reviews and Code of Mobile Apps for Better Release Planning
Adelina Ciurumelea, Andreas Schaufelbühl, Sebastiano Panichella, and Harald C. Gall (University of Zurich, Switzerland) The mobile applications industry experiences unprecedented growth, and developers working in this context face fierce competition in acquiring and retaining users. They have to quickly implement new features and fix bugs, or risk losing their users to the competition. To achieve this goal, they must closely monitor and analyze the user feedback they receive in the form of reviews. However, successful apps can receive up to several thousand reviews per day, and manually analysing each of them is a time-consuming task. To help developers deal with the large amount of available data, we manually analyzed the text of 1,566 user reviews and defined a high- and low-level taxonomy containing mobile-specific categories (e.g. performance, resources, battery, memory, etc.) highly relevant for developers during the planning of maintenance and evolution activities. Then we built the User Request Referencer (URR) prototype, using Machine Learning and Information Retrieval techniques, to automatically classify reviews according to our taxonomy and to recommend, for a particular review, which source code files need to be modified to handle the issue it describes. We evaluated our approach through an empirical study involving the reviews and code of 39 mobile applications. Our results show a high precision and recall of URR in organising reviews according to the defined taxonomy. @InProceedings{SANER17p91, author = {Adelina Ciurumelea and Andreas Schaufelbühl and Sebastiano Panichella and Harald C. Gall}, title = {Analyzing Reviews and Code of Mobile Apps for Better Release Planning}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {91--102}, doi = {}, year = {2017}, } |
|
Claes, Maëlick |
SANER '17: "An Empirical Comparison of ..."
An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems
Alexandre Decan, Tom Mens, and Maëlick Claes (University of Mons, Belgium) Nearly every popular programming language comes with one or more open source software packaging ecosystem(s), containing a large collection of interdependent software packages developed in that programming language. Such packaging ecosystems are extremely useful for their respective software development community. We present an empirical analysis of how the dependency graphs of three large packaging ecosystems (npm, CRAN and RubyGems) evolve over time. We study how the existing package dependencies impact the resilience of the three ecosystems over time and to which extent these ecosystems suffer from issues related to package dependency updates. We analyse specific solutions that each ecosystem has put into place and argue that none of these solutions is perfect, motivating the need for better tools to deal with package dependency update problems. @InProceedings{SANER17p2, author = {Alexandre Decan and Tom Mens and Maëlick Claes}, title = {An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {2--12}, doi = {}, year = {2017}, } |
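As a rough illustration of the resilience question this entry studies, the set of packages transitively affected by a change to one package can be computed with a reverse-dependency traversal; the package names and edges below are hypothetical:

```python
# Transitive reverse dependencies: every package that directly or
# indirectly depends on a given package, i.e. the "blast radius" of a
# breaking update to it. Edges and names are made up for illustration.

from collections import defaultdict, deque

deps = {                      # package -> packages it depends on
    "left-pad-ish": [],
    "strutil": ["left-pad-ish"],
    "webfmk": ["strutil"],
    "cli-tool": ["webfmk", "left-pad-ish"],
}

rdeps = defaultdict(set)      # invert the edges
for pkg, targets in deps.items():
    for t in targets:
        rdeps[t].add(pkg)

def blast_radius(pkg):
    """Breadth-first walk over reverse dependencies."""
    seen, queue = set(), deque([pkg])
    while queue:
        for dependent in rdeps[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

On real ecosystems such as npm, CRAN or RubyGems, the same traversal runs over millions of edges, which is why a single low-level package can affect a large share of the ecosystem.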
|
Collard, Michael L. |
SANER '17: "Lexical Categories for Source ..."
Lexical Categories for Source Code Identifiers
Christian D. Newman, Reem S. AlSuhaibani, Michael L. Collard, and Jonathan I. Maletic (Kent State University, USA; University of Akron, USA) A set of lexical categories, analogous to part-of-speech categories for English prose, is defined for source-code identifiers. The lexical category for an identifier is determined from its declaration in the source code, syntactic meaning in the programming language, and static program analysis. Current techniques for assigning lexical categories to identifiers use natural-language part-of-speech taggers. However, these NLP approaches assign lexical tags based on how terms are used in English prose. The approach taken here differs in that it uses only source code to determine the lexical category. The approach assigns a lexical category to each identifier and stores this information along with each declaration. srcML is used as the infrastructure to implement the approach and so the lexical information is stored directly in the srcML markup as an additional XML element for each identifier. These lexical-category annotations can later be used by tools that automatically generate such things as code summarization or documentation. The approach is applied to 50 open source projects and the soundness of the defined lexical categories is evaluated. The evaluation shows that at every level of minimum support tested, categorization is consistent at least 79% of the time with an overall consistency (across all supports) of at least 88%. The categories reveal a correlation between how an identifier is named and how it is declared. This provides a syntax-oriented view (as opposed to English part-of-speech view) of developer intent of identifiers. @InProceedings{SANER17p228, author = {Christian D. Newman and Reem S. AlSuhaibani and Michael L. Collard and Jonathan I. Maletic}, title = {Lexical Categories for Source Code Identifiers}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {228--239}, doi = {}, year = {2017}, } |
|
Constantinou, Eleni |
SANER '17: "Socio-Technical Evolution ..."
Socio-Technical Evolution of the Ruby Ecosystem in GitHub
Eleni Constantinou and Tom Mens (University of Mons, Belgium) The evolution dynamics of a software ecosystem depend on the activity of the developer community contributing to projects within it. Both social and technical changes affect an ecosystem's evolution and the research community has been investigating the impact of these modifications over the last few years. Existing studies mainly focus on temporary modifications, often ignoring the effect of permanent changes on the software ecosystem. We present an empirical study of the magnitude and effect of permanent modifications in both the social and technical parts of a software ecosystem. More precisely, we measure permanent changes with regard to the ecosystem's projects, contributors and source code files and present our findings concerning the effect of these modifications. We study the Ruby ecosystem in GitHub over a nine-year period by carrying out a socio-technical analysis of the co-evolution of a large number of base projects and their forks. This analysis involves both the source code developed for these projects as well as the developers having contributed to them. We discuss our findings with respect to the ecosystem evolution according to three different viewpoints: (1) the base projects, (2) the forks and (3) the entire ecosystem containing both the base projects and forks. Our findings show an increased growth in both the technical and social aspects of the Ruby ecosystem until early 2014, followed by an increased contributor and project abandonment rate. We show the effect of permanent modifications in the ecosystem evolution and provide preliminary evidence of contributors migrating to other ecosystems when leaving the Ruby ecosystem. @InProceedings{SANER17p34, author = {Eleni Constantinou and Tom Mens}, title = {Socio-Technical Evolution of the Ruby Ecosystem in GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {34--44}, doi = {}, year = {2017}, } Info |
|
Cornu, Benoit |
SANER '17: "Dynamic Patch Generation for ..."
Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming
Thomas Durieux, Benoit Cornu, Lionel Seinturier, and Martin Monperrus (University of Lille, France; Inria, France) Null pointer exceptions (NPE) are the number one cause of uncaught crashing exceptions in production. In this paper, we aim at exploring the search space of possible patches for null pointer exceptions with metaprogramming. Our idea is to transform the program under repair with automated code transformation, so as to obtain a metaprogram. This metaprogram contains automatically injected hooks that can be activated to emulate a null pointer exception patch. This enables us to perform a fine-grained analysis of the runtime context of null pointer exceptions. We set up an experiment with 16 real null pointer exceptions that have happened in the field. We compare the effectiveness of our metaprogramming approach against simple templates for repairing null pointer exceptions. @InProceedings{SANER17p349, author = {Thomas Durieux and Benoit Cornu and Lionel Seinturier and Martin Monperrus}, title = {Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {349--358}, doi = {}, year = {2017}, } |
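The hook idea above invites a small illustration. The following Python toy (all names hypothetical; the authors' tool targets Java programs) emulates two classic NPE patch templates, either skipping the guilty dereference or substituting a default value:

```python
# Sketch of hook-based null-dereference patch emulation (illustrative only,
# not the paper's tool). A call site that may dereference None is routed
# through a hook that can switch between candidate repair strategies.

SKIP, DEFAULT = "skip", "default"

def npe_hook(receiver, method, strategy, default=None):
    """Emulate a patch for a possibly-None receiver of a no-arg method."""
    if receiver is not None:
        return getattr(receiver, method)()   # normal execution path
    if strategy == SKIP:                     # patch template: skip the statement
        return None
    if strategy == DEFAULT:                  # patch template: use a default value
        return default
    raise AttributeError("unhandled null dereference")

# text may be None at runtime; try each strategy and keep the one whose
# result makes the failing test pass.
text = None
assert npe_hook(text, "upper", SKIP) is None
assert npe_hook(text, "upper", DEFAULT, default="") == ""
```

In this spirit, the metaprogram activates one hook configuration per candidate patch and observes which configuration avoids the crash.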
|
Dal Sasso, Tommaso |
SANER '17: "How to Gamify Software Engineering ..."
How to Gamify Software Engineering
Tommaso Dal Sasso, Andrea Mocci, Michele Lanza, and Ebrisa Mastrodicasa (University of Lugano, Switzerland) Software development, like any prolonged and intellectually demanding activity, can negatively affect the motivation of developers. This is especially true in specific areas of software engineering, such as requirements engineering, test-driven development, bug reporting and fixing, where the creative aspects of programming fall short. The developers’ engagement might progressively degrade, potentially impacting their work’s quality. Gamification, the use of game elements and game design techniques in non-game contexts, is hailed as a means to boost the motivation of people for a wide range of rote activities. Indeed, well-designed games deeply involve gamers in a positive loop of production, feedback, and reward, eliciting desirable feelings like happiness and collaboration. The question we investigate is how the seemingly frivolous context of games and gamification can be ported to the technically challenging and sober domain of software engineering. Our investigation starts with a review of the state of the art of gamification, supported by a motivating scenario to expose how gamification elements can be integrated in software engineering. We provide a set of basic building blocks to apply gamification techniques, present a conceptual framework to do so, illustrated in two usage contexts, and critically discuss our findings. @InProceedings{SANER17p261, author = {Tommaso Dal Sasso and Andrea Mocci and Michele Lanza and Ebrisa Mastrodicasa}, title = {How to Gamify Software Engineering}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {261--271}, doi = {}, year = {2017}, } |
|
Decan, Alexandre |
SANER '17: "An Empirical Comparison of ..."
An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems
Alexandre Decan, Tom Mens, and Maëlick Claes (University of Mons, Belgium) Nearly every popular programming language comes with one or more open source software packaging ecosystems, containing a large collection of interdependent software packages developed in that programming language. Such packaging ecosystems are extremely useful for their respective software development communities. We present an empirical analysis of how the dependency graphs of three large packaging ecosystems (npm, CRAN and RubyGems) evolve over time. We study how the existing package dependencies impact the resilience of the three ecosystems over time and to which extent these ecosystems suffer from issues related to package dependency updates. We analyse specific solutions that each ecosystem has put into place and argue that none of these solutions is perfect, motivating the need for better tools to deal with package dependency update problems. @InProceedings{SANER17p2, author = {Alexandre Decan and Tom Mens and Maëlick Claes}, title = {An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {2--12}, doi = {}, year = {2017}, } |
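One concrete source of dependency update issues is the version-constraint syntax each ecosystem uses. As a hedged illustration, here is a minimal sketch of an npm-style caret ("^") range check; this is a simplification of the real semver grammar (it assumes plain MAJOR.MINOR.PATCH versions with a non-zero major):

```python
# Minimal sketch of an npm-style caret ("^") constraint check.
# Simplified: plain MAJOR.MINOR.PATCH versions, major > 0.

def parse(v):
    return tuple(int(x) for x in v.split("."))

def satisfies_caret(version, constraint):
    """True if `version` matches `^constraint`: same major, not older."""
    ver, low = parse(version), parse(constraint)
    return ver[0] == low[0] and ver >= low

assert satisfies_caret("1.4.2", "1.2.0")      # compatible update
assert not satisfies_caret("2.0.0", "1.2.0")  # breaking major bump
assert not satisfies_caret("1.1.9", "1.2.0")  # older than the floor
```

A package depending on `^1.2.0` silently receives 1.4.2 on reinstall, which is exactly the kind of implicit update whose failure modes the paper studies across ecosystems.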
|
De Lucia, Andrea |
SANER '17: "Software-Based Energy Profiling ..."
Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?
Dario Di Nucci, Fabio Palomba, Antonio Prota, Annibale Panichella, Andy Zaidman, and Andrea De Lucia (University of Salerno, Italy; Delft University of Technology, Netherlands; University of Luxembourg, Luxembourg) Modeling the power profile of mobile applications is a crucial activity to identify the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile. However, all these solutions present their own advantages and disadvantages. Hardware-based tools are highly precise, but at the same time their use is bound to the acquisition of costly hardware components. Model-based tools require the calibration of parameters needed to correctly create a model on a specific hardware device. Software-based approaches do not need any hardware components, but they rely on battery measurements and, thus, they are hardware-assisted. These tools are cheaper and easier to use than hardware-based tools, but they are believed to be less precise. In this paper, we take a deeper look at the pros and cons of software-based solutions, investigating to what extent their measurements depart from hardware-based solutions. To this aim, we propose a software-based tool named PETrA that we compare with the hardware-based Monsoon toolkit on 54 Android apps. The results show that PETrA performs similarly to Monsoon despite not using any sophisticated hardware components. In fact, in all the apps the mean relative error with respect to Monsoon is lower than 0.05. Moreover, for 95% of the analyzed methods the estimation error is within 5% of the actual values measured using the hardware-based toolkit.
@InProceedings{SANER17p103, author = {Dario Di Nucci and Fabio Palomba and Antonio Prota and Annibale Panichella and Andy Zaidman and Andrea De Lucia}, title = {Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {103--114}, doi = {}, year = {2017}, } Video Info |
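The kind of software-based estimate that tools in this space build on can be sketched from first principles. The following is a generic back-of-the-envelope computation (not PETrA's actual model): energy is accumulated as power (voltage times current) over each sampling interval, a left Riemann sum over battery readings.

```python
# Back-of-the-envelope energy estimate from battery samples (illustrative
# sketch; real profilers apply calibration and per-component models).

def energy_joules(samples):
    """samples: list of (t_seconds, volts, amps), sorted by time."""
    total = 0.0
    for (t0, v, i), (t1, _, _) in zip(samples, samples[1:]):
        total += v * i * (t1 - t0)   # P = V * I, E = P * dt
    return total

# Three readings one second apart: 2.0 J in each of the two intervals.
readings = [(0.0, 4.0, 0.5), (1.0, 4.0, 0.5), (2.0, 4.0, 0.3)]
assert abs(energy_joules(readings) - 4.0) < 1e-9
```

Comparing such software-derived totals against a hardware power monitor like Monsoon is precisely the validation question the paper investigates.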
|
De Roover, Coen |
SANER '17: "Extracting Executable Transformations ..."
Extracting Executable Transformations from Distilled Code Changes
Reinout Stevens and Coen De Roover (Vrije Universiteit Brussel, Belgium) Change distilling algorithms compute a sequence of fine-grained changes that, when executed in order, transform a given source AST into a given target AST. The resulting change sequences are used in the field of mining software repositories to study source code evolution. Unfortunately, detecting and specifying source code evolutions in such a change sequence is cumbersome. We therefore introduce a tool-supported approach that identifies minimal executable subsequences in a sequence of distilled changes that implement a particular evolution pattern, specified in terms of intermediate states of the AST that undergoes each change. This enables users to describe the effect of multiple changes, irrespective of their execution order, while ensuring that different change sequences that implement the same code evolution are recalled. Correspondingly, our evaluation is two-fold. Using examples, we demonstrate the expressiveness of specifying source code evolutions through intermediate ASTs. We also show that our approach is able to recall different implementation variants of the same source code evolution in open-source histories. @InProceedings{SANER17p171, author = {Reinout Stevens and Coen De Roover}, title = {Extracting Executable Transformations from Distilled Code Changes}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {171--181}, doi = {}, year = {2017}, } |
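To make the idea of minimal executable subsequences concrete, here is a deliberately toy model (a hypothetical encoding, not the authors' tool): an AST is reduced to a set of node labels, a distilled change adds or removes one label, and the evolution pattern is a predicate on the intermediate state reached after executing a candidate subsequence in order.

```python
# Toy sketch: find the shortest subsequence of distilled changes whose
# execution satisfies an evolution pattern. Brute force; fine for tiny
# histories, illustrative only.

from itertools import combinations

def apply_changes(state, changes):
    s = set(state)
    for op, node in changes:
        s.add(node) if op == "add" else s.discard(node)
    return s

def minimal_subsequence(changes, start, pattern):
    for k in range(1, len(changes) + 1):
        for idxs in combinations(range(len(changes)), k):
            sub = [changes[i] for i in idxs]   # preserves execution order
            if pattern(apply_changes(start, sub)):
                return sub
    return None

history = [("add", "fieldX"), ("add", "getX"), ("remove", "fieldX"),
           ("add", "setX")]
# Pattern: a getter/setter pair was introduced, regardless of what
# happened to fieldX in between.
found = minimal_subsequence(history, set(), lambda s: {"getX", "setX"} <= s)
assert found == [("add", "getX"), ("add", "setX")]
```

The point mirrored from the paper: the two relevant changes are interleaved with unrelated ones, and different orderings of the unrelated changes would still be recalled by the same pattern.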
|
Deursen, Arie van |
SANER '17: "Spreadsheet Testing in Practice ..."
Spreadsheet Testing in Practice
Sohon Roy, Felienne Hermans, and Arie van Deursen (Delft University of Technology, Netherlands) Despite being popular end-user tools, spreadsheets are notoriously error-prone. In software engineering, testing has been proposed as a way to address errors. It is therefore important to know whether spreadsheet users also test, how they test, and to what extent, especially since most spreadsheet users do not have training or experience in software engineering principles. Towards this end, we conduct a two-phase mixed-methods study. First, a qualitative phase, in which we interview 12 spreadsheet users, and second, a quantitative phase, in which we conduct an online survey completed by 72 users. The outcome of the interviews, organized into four different categories, consists of an overview of test practices, perceptions of spreadsheet users about testing, a set of preventive measures for avoiding errors, and an overview of maintenance practices for ensuring the correctness of spreadsheets over time. The survey adds to these findings by providing quantitative estimates indicating that ensuring correctness is an important concern and that a major fraction of users do test their spreadsheets. However, their techniques are largely manual and lack formalism. Tools and automated support are rarely used. @InProceedings{SANER17p338, author = {Sohon Roy and Felienne Hermans and Arie van Deursen}, title = {Spreadsheet Testing in Practice}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {338--348}, doi = {}, year = {2017}, } |
|
Di Nucci, Dario |
SANER '17: "Software-Based Energy Profiling ..."
Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?
Dario Di Nucci, Fabio Palomba, Antonio Prota, Annibale Panichella, Andy Zaidman, and Andrea De Lucia (University of Salerno, Italy; Delft University of Technology, Netherlands; University of Luxembourg, Luxembourg) Modeling the power profile of mobile applications is a crucial activity to identify the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile. However, all these solutions present their own advantages and disadvantages. Hardware-based tools are highly precise, but at the same time their use is bound to the acquisition of costly hardware components. Model-based tools require the calibration of parameters needed to correctly create a model on a specific hardware device. Software-based approaches do not need any hardware components, but they rely on battery measurements and, thus, they are hardware-assisted. These tools are cheaper and easier to use than hardware-based tools, but they are believed to be less precise. In this paper, we take a deeper look at the pros and cons of software-based solutions, investigating to what extent their measurements depart from hardware-based solutions. To this aim, we propose a software-based tool named PETrA that we compare with the hardware-based Monsoon toolkit on 54 Android apps. The results show that PETrA performs similarly to Monsoon despite not using any sophisticated hardware components. In fact, in all the apps the mean relative error with respect to Monsoon is lower than 0.05. Moreover, for 95% of the analyzed methods the estimation error is within 5% of the actual values measured using the hardware-based toolkit.
@InProceedings{SANER17p103, author = {Dario Di Nucci and Fabio Palomba and Antonio Prota and Annibale Panichella and Andy Zaidman and Andrea De Lucia}, title = {Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {103--114}, doi = {}, year = {2017}, } Video Info |
|
Ducasse, Stéphane |
SANER '17: "Recommending Source Code Locations ..."
Recommending Source Code Locations for System Specific Transformations
Gustavo Santos, Klérisson V. R. Paixão, Nicolas Anquetil, Anne Etien, Marcelo de Almeida Maia, and Stéphane Ducasse (University of Lille, France; CNRS, France; Inria, France; Federal University of Uberlândia, Brazil) From time to time, developers perform sequences of code transformations in a systematic and repetitive way. This may happen, for example, when introducing a design pattern in a legacy system: similar classes have to be introduced, containing similar methods that are called in a similar way. Automation of these sequences of transformations has been proposed in the literature to avoid errors due to their repetitive nature. However, developers still need support to identify all the relevant code locations that are candidates for transformation. Past research showed that these kinds of transformations can lag for years, with forgotten instances popping up from time to time as other evolutions bring them to light. In this paper, we evaluate three distinct code search approaches (a “structural” approach, an Information Retrieval-based approach, and an AST-based algorithm) to find code locations that would require similar transformations. We validate the resulting candidate locations from these approaches on real cases identified previously in the literature. The results show that looking for code with similar roles, e.g., classes in the same hierarchy, provides interesting results, with an average recall of 87% and in some cases precision of up to 70%. @InProceedings{SANER17p160, author = {Gustavo Santos and Klérisson V. R. Paixão and Nicolas Anquetil and Anne Etien and Marcelo de Almeida Maia and Stéphane Ducasse}, title = {Recommending Source Code Locations for System Specific Transformations}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {160--170}, doi = {}, year = {2017}, } |
|
Durieux, Thomas |
SANER '17: "Dynamic Patch Generation for ..."
Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming
Thomas Durieux, Benoit Cornu, Lionel Seinturier, and Martin Monperrus (University of Lille, France; Inria, France) Null pointer exceptions (NPE) are the number one cause of uncaught crashing exceptions in production. In this paper, we aim at exploring the search space of possible patches for null pointer exceptions with metaprogramming. Our idea is to transform the program under repair with automated code transformation, so as to obtain a metaprogram. This metaprogram contains automatically injected hooks that can be activated to emulate a null pointer exception patch. This enables us to perform a fine-grained analysis of the runtime context of null pointer exceptions. We set up an experiment with 16 real null pointer exceptions that have happened in the field. We compare the effectiveness of our metaprogramming approach against simple templates for repairing null pointer exceptions. @InProceedings{SANER17p349, author = {Thomas Durieux and Benoit Cornu and Lionel Seinturier and Martin Monperrus}, title = {Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {349--358}, doi = {}, year = {2017}, } |
|
Egyed, Alexander |
SANER '17: "Analyzing Closeness of Code ..."
Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery
Hongyu Kuang, Jia Nie, Hao Hu, Patrick Rempel, Jian Lü, Alexander Egyed, and Patrick Mäder (Nanjing University, China; JKU Linz, Austria; TU Ilmenau, Germany) Information Retrieval (IR) identifies trace links based on textual similarities among software artifacts. However, the vocabulary mismatch problem between different artifacts hinders the performance of IR-based approaches. A growing body of work addresses this issue by combining IR techniques with code dependency analysis such as method calls. However, so far the performance of combined approaches is highly dependent on the correctness of the IR techniques and does not take full advantage of the code dependency analysis. In this paper, we combine IR techniques with closeness analysis to improve IR-based traceability recovery. Specifically, we quantify and utilize the “closeness” of each call and data dependency between two classes to improve the rankings of traceability candidate lists. An empirical evaluation based on three real-world systems suggests that our approach outperforms three baseline approaches. @InProceedings{SANER17p68, author = {Hongyu Kuang and Jia Nie and Hao Hu and Patrick Rempel and Jian Lü and Alexander Egyed and Patrick Mäder}, title = {Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {68--78}, doi = {}, year = {2017}, } |
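The general re-ranking idea can be sketched in a few lines (illustrative weights and class names; the paper derives its closeness scores from call and data dependencies between classes): combine the IR score of each candidate with its dependency closeness, then sort best-first.

```python
# Sketch of re-ranking an IR candidate list with dependency "closeness"
# (hypothetical weighting scheme, not the paper's exact formula).

def rerank(candidates, closeness, alpha=0.5):
    """candidates: {class_name: ir_score in [0, 1]};
    closeness: {class_name: closeness score in [0, 1]}.
    Returns class names sorted by the combined score, best first."""
    combined = {c: alpha * ir + (1 - alpha) * closeness.get(c, 0.0)
                for c, ir in candidates.items()}
    return sorted(combined, key=combined.get, reverse=True)

ir_scores = {"Parser": 0.9, "Lexer": 0.6, "Cache": 0.55}
close = {"Lexer": 0.9, "Cache": 0.1}   # Lexer is tightly coupled to the query
assert rerank(ir_scores, close)[0] == "Lexer"  # closeness lifts Lexer past Parser
```

This shows how a dependency signal can overrule a vocabulary-mismatch-afflicted IR score, which is the intuition the paper develops.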
|
Etien, Anne |
SANER '17: "Recommending Source Code Locations ..."
Recommending Source Code Locations for System Specific Transformations
Gustavo Santos, Klérisson V. R. Paixão, Nicolas Anquetil, Anne Etien, Marcelo de Almeida Maia, and Stéphane Ducasse (University of Lille, France; CNRS, France; Inria, France; Federal University of Uberlândia, Brazil) From time to time, developers perform sequences of code transformations in a systematic and repetitive way. This may happen, for example, when introducing a design pattern in a legacy system: similar classes have to be introduced, containing similar methods that are called in a similar way. Automation of these sequences of transformations has been proposed in the literature to avoid errors due to their repetitive nature. However, developers still need support to identify all the relevant code locations that are candidate for transformation. Past research showed that these kinds of transformation can lag for years with forgotten instances popping out from time to time as other evolutions bring them into light. In this paper, we evaluate three distinct code search approaches (“structural”, based on Information Retrieval, and AST based algorithm) to find code locations that would require similar transformations. We validate the resulting candidate locations from these approaches on real cases identified previously in literature. The results show that looking for code with similar roles, e.g., classes in the same hierarchy, provides interesting results with an average recall of 87% and in some cases the precision up to 70%. @InProceedings{SANER17p160, author = {Gustavo Santos and Klérisson V. R. Paixão and Nicolas Anquetil and Anne Etien and Marcelo de Almeida Maia and Stéphane Ducasse}, title = {Recommending Source Code Locations for System Specific Transformations}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {160--170}, doi = {}, year = {2017}, } |
|
Ettinger, Ran |
SANER '17: "Efficient Method Extraction ..."
Efficient Method Extraction for Automatic Elimination of Type-3 Clones
Ran Ettinger, Shmuel Tyszberowicz, and Shay Menaia (Ben-Gurion University of the Negev, Israel; Academic College of Tel Aviv-Yaffo, Israel) A semantics-preserving transformation by Komondoor and Horwitz has been shown to be most effective in the elimination of type-3 clones. The two original algorithms for realizing this transformation, however, are not as efficient as the related (slice-based) transformations. We present an asymptotically faster algorithm that implements the same transformation via bidirectional reachability on a program dependence graph, and we prove its equivalence to the original formulation. @InProceedings{SANER17p327, author = {Ran Ettinger and Shmuel Tyszberowicz and Shay Menaia}, title = {Efficient Method Extraction for Automatic Elimination of Type-3 Clones}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {327--337}, doi = {}, year = {2017}, } |
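One plausible reading of bidirectional reachability on a dependence graph can be sketched as follows (an illustration under that assumption, not the paper's algorithm): intersect the nodes reachable forward from a source set with the nodes reachable backward from a target set, yielding the nodes lying on some dependence path between the two.

```python
# Illustrative bidirectional reachability on a dependence graph given as
# adjacency lists. Both directions are plain BFS traversals.

from collections import deque

def reachable(graph, seeds):
    seen, queue = set(seeds), deque(seeds)
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def between(graph, sources, targets):
    """Nodes on some path from `sources` to `targets`."""
    reverse = {}
    for u, vs in graph.items():
        for v in vs:
            reverse.setdefault(v, []).append(u)
    return reachable(graph, sources) & reachable(reverse, targets)

pdg = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": []}
assert between(pdg, {"a"}, {"e"}) == {"a", "b", "c", "e"}
```

Each BFS is linear in the size of the graph, which hints at why a reachability-based formulation can beat the original slice-based algorithms asymptotically.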
|
Feng, Yiyang |
SANER '17: "An Empirical Investigation ..."
An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults
Yiyang Feng, Wanwangying Ma, Yibiao Yang, Hongmin Lu, Yuming Zhou, and Baowen Xu (Nanjing University, China) In recent years, it has been shown that fault prediction models could effectively guide test effort allocation in finding faults if they have a high enough fault prediction accuracy (Norm(Popt) > 0.78). However, it is often difficult to achieve such high fault prediction accuracy in practice. As a result, fault-prediction-model-guided allocation (FPA) methods may not be applicable in real development environments. To address this problem, in this paper, we propose a new type of test effort allocation strategy: the reliability-growth-model-guided allocation (RGA) method. For a given project release V, RGA attempts to predict the optimal test effort allocation for V by learning the fault distribution information from the previous releases. Based on three open-source projects, we empirically investigate the cost-effectiveness of three test effort allocation strategies for finding faults: RGA, FPA, and the structural-complexity-guided allocation (SCA) method. The experimental results show that RGA shows promising performance in finding faults when compared with SCA and FPA. @InProceedings{SANER17p371, author = {Yiyang Feng and Wanwangying Ma and Yibiao Yang and Hongmin Lu and Yuming Zhou and Baowen Xu}, title = {An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {371--381}, doi = {}, year = {2017}, } |
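The core of a history-guided allocation can be illustrated with a minimal sketch (hypothetical module names; RGA itself fits reliability growth models rather than using raw proportions): distribute the testing budget over modules in proportion to each module's share of faults observed in previous releases.

```python
# Minimal sketch of history-proportional test effort allocation
# (illustrative baseline, not the paper's RGA method).

def allocate(budget_hours, past_faults):
    """past_faults: {module: fault count in previous releases}.
    Returns {module: hours}, proportional to historical fault share."""
    total = sum(past_faults.values())
    return {m: budget_hours * n / total for m, n in past_faults.items()}

history = {"core": 30, "ui": 15, "io": 5}
plan = allocate(100, history)
assert plan == {"core": 60.0, "ui": 30.0, "io": 10.0}
```

A reliability growth model would additionally discount modules whose fault discovery rate has flattened, rather than assuming the past distribution repeats unchanged.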
|
Fenske, Wolfram |
SANER '17: "Variant-Preserving Refactorings ..."
Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line
Wolfram Fenske, Jens Meinicke, Sandro Schulze, Steffen Schulze, and Gunter Saake (University of Magdeburg, Germany; Carnegie Mellon University, USA) A common and simple way to create custom product variants is to copy and adapt existing software (a.k.a. the clone-and-own approach). Clone-and-own promises low initial costs for creating a new variant as existing code is easily reused. However, clone-and-own also comes with major drawbacks for maintenance and evolution since changes, such as bug fixes, need to be synchronized among several product variants. Software product lines (SPLs) provide solutions to these problems because commonalities are implemented only once. Thus, in an SPL, changes also need to be applied only once. Therefore, the migration of cloned product variants to an SPL would be beneficial. The main tasks of migration are the identification and extraction of commonalities from existing products. However, these tasks are challenging and currently not well-supported. In this paper, we propose a step-wise and semi-automated process to migrate cloned product variants to a feature-oriented SPL. Our process relies on clone detection to identify code that is common to multiple variants and novel, variant-preserving refactorings to extract such common code. We evaluated our approach on five cloned product variants, reducing code clones by 25%. Moreover, we provide qualitative insights into possible limitations and potentials for removing even more redundant code. We argue that our approach can effectively decrease synchronization effort compared to clone-and-own development and thus reduce the long-term costs for maintenance and evolution. @InProceedings{SANER17p316, author = {Wolfram Fenske and Jens Meinicke and Sandro Schulze and Steffen Schulze and Gunter Saake}, title = {Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {316--326}, doi = {}, year = {2017}, } |
|
Gall, Harald C. |
SANER '17: "Analyzing Reviews and Code ..."
Analyzing Reviews and Code of Mobile Apps for Better Release Planning
Adelina Ciurumelea, Andreas Schaufelbühl, Sebastiano Panichella, and Harald C. Gall (University of Zurich, Switzerland) The mobile applications industry experiences unprecedented growth, and developers working in this context face fierce competition in acquiring and retaining users. They have to quickly implement new features and fix bugs, or risk losing their users to the competition. To achieve this goal they must closely monitor and analyze the user feedback they receive in the form of reviews. However, successful apps can receive up to several thousands of reviews per day, and manually analysing each of them is a time-consuming task. To help developers deal with the large amount of available data, we manually analyzed the text of 1566 user reviews and defined a high- and low-level taxonomy containing mobile-specific categories (e.g. performance, resources, battery, memory, etc.) highly relevant for developers during the planning of maintenance and evolution activities. Then we built the User Request Referencer (URR) prototype, using Machine Learning and Information Retrieval techniques, to automatically classify reviews according to our taxonomy and recommend, for a particular review, the source code files that need to be modified to handle the issue described in the user review. We evaluated our approach through an empirical study involving the reviews and code of 39 mobile applications. Our results show a high precision and recall of URR in organising reviews according to the defined taxonomy. @InProceedings{SANER17p91, author = {Adelina Ciurumelea and Andreas Schaufelbühl and Sebastiano Panichella and Harald C. Gall}, title = {Analyzing Reviews and Code of Mobile Apps for Better Release Planning}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {91--102}, doi = {}, year = {2017}, }
SANER '17: "Reducing Redundancies in Multi-revision ..."
Reducing Redundancies in Multi-revision Code Analysis
Carol V. Alexandru, Sebastiano Panichella, and Harald C. Gall (University of Zurich, Switzerland) Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code. @InProceedings{SANER17p148, author = {Carol V. Alexandru and Sebastiano Panichella and Harald C. Gall}, title = {Reducing Redundancies in Multi-revision Code Analysis}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {148--159}, doi = {}, year = {2017}, } |
|
Hecht, Geoffrey |
SANER '17: "Investigating the Energy Impact ..."
Investigating the Energy Impact of Android Smells
Antonin Carette, Mehdi Adel Ait Younes, Geoffrey Hecht, Naouel Moha, and Romain Rouvoy (Université du Québec à Montréal, Canada; Inria, France; University of Lille, France; IUF, France) Android code smells are bad implementation practices within Android applications (or apps) that may lead to poor software quality. These code smells are known to degrade the performance of apps and to have an impact on energy consumption. However, few studies have assessed the positive impact on energy consumption of correcting code smells. In this paper, we therefore propose a tooled and reproducible approach, called Hot-Pepper, to automatically correct code smells and evaluate their impact on energy consumption. Currently, Hot-Pepper is able to automatically correct three types of Android-specific code smells: Internal Getter/Setter, Member Ignoring Method, and HashMap Usage. Hot-Pepper derives four versions of the apps by correcting each detected smell independently, and all of them at once. Hot-Pepper is able to report on the energy consumption of each app version with a single user scenario test. Our empirical study on five open-source Android apps shows that correcting the three aforementioned Android code smells effectively and significantly reduces the energy consumption of apps. In particular, we observed a global reduction in energy consumption of 4.83% in one app when the three code smells are corrected. We also take advantage of the flexibility of Hot-Pepper to investigate the impact of three picture smells (bad picture format, compression, and bitmap format) in sample apps. We observed that the usage of optimised JPG pictures with the Android default bitmap format is the most energy-efficient combination in Android apps. We believe that developers can benefit from our approach and results to guide their refactoring, and thus improve the energy consumption of their mobile apps.
@InProceedings{SANER17p115, author = {Antonin Carette and Mehdi Adel Ait Younes and Geoffrey Hecht and Naouel Moha and Romain Rouvoy}, title = {Investigating the Energy Impact of Android Smells}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {115--126}, doi = {}, year = {2017}, } |
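The simplest of the three corrected smells, Internal Getter/Setter, can be illustrated with a toy source transformation: within a class, calls to its own getters and setters are replaced by direct field accesses. This is a regex sketch over a Java snippet (illustrative only; Hot-Pepper's actual correction works on real code, and a production tool would rewrite the AST rather than use regexes):

```python
# Toy correction of the Internal Getter/Setter smell: inside a class,
# this.getX()/this.setX(v) become direct accesses to this.x.
# Regex-based illustration; breaks on nested parentheses, by design kept simple.

import re

def inline_internal_accessors(java_src, field):
    cap = field[0].upper() + field[1:]
    src = re.sub(rf"this\.get{cap}\(\)", f"this.{field}", java_src)
    src = re.sub(rf"this\.set{cap}\((.*?)\)", rf"this.{field} = \1", src)
    return src

code = "int d = this.getCount() + 1; this.setCount(d);"
fixed = inline_internal_accessors(code, "count")
assert fixed == "int d = this.count + 1; this.count = d;"
```

On Android, the saving comes from avoiding a virtual method call per field access on hot paths, which is why this smell shows up in energy measurements at all.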
|
Hermans, Felienne |
SANER '17: "Spreadsheet Testing in Practice ..."
Spreadsheet Testing in Practice
Sohon Roy, Felienne Hermans, and Arie van Deursen (Delft University of Technology, Netherlands) Despite being popular end-user tools, spreadsheets are notoriously error-prone. In software engineering, testing has been proposed as a way to address errors. It is therefore important to know whether spreadsheet users also test, how they test, and to what extent, especially since most spreadsheet users do not have training or experience in software engineering principles. Towards this end, we conduct a two-phase mixed-methods study. First, a qualitative phase, in which we interview 12 spreadsheet users, and second, a quantitative phase, in which we conduct an online survey completed by 72 users. The outcome of the interviews, organized into four different categories, consists of an overview of test practices, perceptions of spreadsheet users about testing, a set of preventive measures for avoiding errors, and an overview of maintenance practices for ensuring the correctness of spreadsheets over time. The survey adds to these findings by providing quantitative estimates indicating that ensuring correctness is an important concern and that a major fraction of users do test their spreadsheets. However, their techniques are largely manual and lack formalism. Tools and automated support are rarely used. @InProceedings{SANER17p338, author = {Sohon Roy and Felienne Hermans and Arie van Deursen}, title = {Spreadsheet Testing in Practice}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {338--348}, doi = {}, year = {2017}, } |
|
Hofmeister, Johannes |
SANER '17: "Shorter Identifier Names Take ..."
Shorter Identifier Names Take Longer to Comprehend
Johannes Hofmeister, Janet Siegmund, and Daniel V. Holt (University of Passau, Germany; University of Heidelberg, Germany) Developers spend the majority of their time comprehending code, a process in which identifier names play a key role. Although many identifier naming styles exist, they often lack an empirical basis and it is not quite clear whether short or long identifier names facilitate comprehension. In this paper, we investigate the effect of different identifier naming styles (letters, abbreviations, words) on program comprehension, and whether these effects arise because of their length or their semantics. We conducted an experimental study with 72 professional C# developers, who looked for defects in source-code snippets. We used a within-subjects design, such that each developer saw all three versions of identifier naming styles and we measured the time it took them to find a defect. We found that words lead to, on average, 19% faster comprehension speed compared to letters and abbreviations, but we did not find a significant difference in speed between letters and abbreviations. The results of our study suggest that defects in code are more difficult to detect when code contains only letters and abbreviations. Words as identifier names facilitate program comprehension and can help to save costs and improve software quality. @InProceedings{SANER17p217, author = {Johannes Hofmeister and Janet Siegmund and Daniel V. Holt}, title = {Shorter Identifier Names Take Longer to Comprehend}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {217--227}, doi = {}, year = {2017}, } Info |
|
Holt, Daniel V. |
SANER '17: "Shorter Identifier Names Take ..."
Shorter Identifier Names Take Longer to Comprehend
Johannes Hofmeister, Janet Siegmund, and Daniel V. Holt (University of Passau, Germany; University of Heidelberg, Germany) Developers spend the majority of their time comprehending code, a process in which identifier names play a key role. Although many identifier naming styles exist, they often lack an empirical basis and it is not quite clear whether short or long identifier names facilitate comprehension. In this paper, we investigate the effect of different identifier naming styles (letters, abbreviations, words) on program comprehension, and whether these effects arise because of their length or their semantics. We conducted an experimental study with 72 professional C# developers, who looked for defects in source-code snippets. We used a within-subjects design, such that each developer saw all three versions of identifier naming styles and we measured the time it took them to find a defect. We found that words lead to, on average, 19% faster comprehension speed compared to letters and abbreviations, but we did not find a significant difference in speed between letters and abbreviations. The results of our study suggest that defects in code are more difficult to detect when code contains only letters and abbreviations. Words as identifier names facilitate program comprehension and can help to save costs and improve software quality. @InProceedings{SANER17p217, author = {Johannes Hofmeister and Janet Siegmund and Daniel V. Holt}, title = {Shorter Identifier Names Take Longer to Comprehend}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {217--227}, doi = {}, year = {2017}, } Info |
|
Hora, Andre |
SANER '17: "Historical and Impact Analysis ..."
Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study
Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente (Federal University of Minas Gerais, Brazil) Change is routine in software development. Like any system, libraries also evolve over time. As a consequence, clients are compelled to update and, thus, benefit from the available API improvements. However, some of these API changes may break contracts previously established, resulting in compilation errors and behavioral changes. In this paper, we study a set of questions regarding API breaking changes. Our goal is to measure the amount of breaking changes in real-world libraries and their impact on clients at a large scale. We assess (i) the frequency of breaking changes, (ii) the behavior of these changes over time, (iii) the impact on clients, and (iv) the characteristics of libraries with high frequency of breaking changes. Our large-scale analysis on 317 real-world Java libraries, 9K releases, and 260K client applications shows that (i) 14.78% of the API changes break compatibility with previous versions, (ii) the frequency of breaking changes increases over time, (iii) 2.54% of their clients are impacted, and (iv) systems with higher frequency of breaking changes are larger, more popular, and more active. Based on these results, we provide a set of lessons to better support library and client developers in their maintenance tasks. @InProceedings{SANER17p138, author = {Laerte Xavier and Aline Brito and Andre Hora and Marco Tulio Valente}, title = {Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {138--147}, doi = {}, year = {2017}, } |
|
Hu, Hao |
SANER '17: "Analyzing Closeness of Code ..."
Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery
Hongyu Kuang, Jia Nie, Hao Hu, Patrick Rempel, Jian Lü, Alexander Egyed, and Patrick Mäder (Nanjing University, China; JKU Linz, Austria; TU Ilmenau, Germany) Information Retrieval (IR) identifies trace links based on textual similarities among software artifacts. However, the vocabulary mismatch problem between different artifacts hinders the performance of IR-based approaches. A growing body of work addresses this issue by combining IR techniques with code dependency analysis such as method calls. However, so far the performance of combined approaches is highly dependent on the correctness of IR techniques and does not take full advantage of the code dependency analysis. In this paper, we combine IR techniques with closeness analysis to improve IR-based traceability recovery. Specifically, we quantify and utilize the “closeness” of each call and data dependency between two classes to improve rankings of traceability candidate lists. An empirical evaluation based on three real-world systems suggests that our approach outperforms three baseline approaches. @InProceedings{SANER17p68, author = {Hongyu Kuang and Jia Nie and Hao Hu and Patrick Rempel and Jian Lü and Alexander Egyed and Patrick Mäder}, title = {Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {68--78}, doi = {}, year = {2017}, } |
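The idea of adjusting an IR ranking with dependency closeness, as described in the abstract above, can be illustrated with a minimal sketch. The weighting scheme, the `alpha` parameter, and all names here are assumptions for illustration, not the paper's actual formula:

```python
def rerank(ir_scores, closeness, alpha=0.5):
    """Combine textual similarity with dependency closeness.
    ir_scores: {artifact: IR similarity in [0, 1]}
    closeness: {artifact: closeness of its code dependencies, in [0, 1]}
    alpha: weight on the IR score (assumed value, not from the paper)."""
    combined = {
        artifact: alpha * ir_scores[artifact] + (1 - alpha) * closeness.get(artifact, 0.0)
        for artifact in ir_scores
    }
    # Return candidate artifacts ordered by the combined score, best first.
    return sorted(combined, key=combined.get, reverse=True)

candidates = rerank(
    {"ClassA": 0.9, "ClassB": 0.4, "ClassC": 0.5},
    {"ClassB": 0.8, "ClassC": 0.1},
)
# ClassB's strong closeness lifts it above the textually more similar ClassA.
```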
|
Jansen, Slinger |
SANER '17: "The Dark Side of Event Sourcing: ..."
The Dark Side of Event Sourcing: Managing Data Conversion
Michiel Overeem, Marten Spoor, and Slinger Jansen (AFAS Software, Netherlands; Utrecht University, Netherlands) Evolving software systems involves data schema changes, and because of those schema changes data has to be converted. Converting data between two different schemas while continuing the operation of the system is a challenge when that system is expected to be always available. Data conversion in event sourced systems introduces new challenges, because of the relative novelty of the event sourcing architectural pattern, because of the lack of standardized tools for data conversion, and because of the large amount of data that is stored in typical event stores. This paper addresses the challenge of schema evolution and the resulting data conversion for event sourced systems. First, a set of event store upgrade operations is proposed that can be used to convert data between two versions of a data schema. Second, a set of techniques and strategies that execute the data conversion while continuing the operation of the system is discussed. The final contribution is an event store upgrade framework that identifies which techniques and strategies can be combined to execute the event store upgrade operations while continuing operation of the system. Two utilizations of the framework are given: the first is as decision support in the upfront design of an upgrade system for event sourced systems; the framework can also be utilized as the description of an automated upgrade system for continuous deployment. The event store upgrade framework is evaluated in interviews with three renowned experts in the domain and has been found to be a comprehensive overview that can be utilized in the design and implementation of an upgrade system. The automated upgrade system has been implemented partially and applied in experiments. 
@InProceedings{SANER17p193, author = {Michiel Overeem and Marten Spoor and Slinger Jansen}, title = {The Dark Side of Event Sourcing: Managing Data Conversion}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {193--204}, doi = {}, year = {2017}, } |
|
Jezek, Kamil |
SANER '17: "Antipatterns Causing Memory ..."
Antipatterns Causing Memory Bloat: A Case Study
Kamil Jezek and Richard Lipka (University of West Bohemia, Czech Republic) Java is one of the languages that are popular for high abstraction and automatic memory management. As in other object-oriented languages, Java’s objects can easily represent a domain model of an application. While this has a positive impact on the design, implementation, and maintenance of applications, there are drawbacks as well. One of them is a relatively high memory overhead to manage objects. In this work, we share our experience with searching for this problem in an application that we refactored to use less memory. Although the application was relatively well designed with no memory leaks, it required so much memory that, for large data, it was not usable in practice. We made three relatively simple improvements: we reduced the usage of Java Collections, removed unnecessary object instances, and simplified the domain model, which reduced memory needs by up to 88% and made the application more usable and even faster. This work is a case study reporting results. Moreover, the employed ideas are formulated as a set of antipatterns, which may be used for other applications. @InProceedings{SANER17p306, author = {Kamil Jezek and Richard Lipka}, title = {Antipatterns Causing Memory Bloat: A Case Study}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {306--315}, doi = {}, year = {2017}, } |
|
Kabir, Muhammad Ashad |
SANER '17: "HDSKG: Harvesting Domain Specific ..."
HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
Xuejiao Zhao, Zhenchang Xing, Muhammad Ashad Kabir, Naoya Sawada, Jing Li, and Shang-Wei Lin (Nanyang Technological University, Singapore; Australian National University, Australia; Charles Sturt University, Australia; NTT, Japan) A knowledge graph is useful for many different domains like search result ranking, recommendation, exploratory search, etc. It integrates structural information of concepts across multiple information sources, and links these concepts together. The extraction of domain specific relation triples (subject, verb phrase, object) is one of the important techniques for domain specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain specific concepts and their relation triples from the content of webpages. We combine a dependency parser with a rule-based method to chunk the relation triple candidates, then we extract advanced features of these candidate relation triples to estimate the domain relevance by a machine learning algorithm. For the evaluation of our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of the software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7 respectively) are much higher than those of OpenIE (0.11 and 0.6 respectively). The performance is particularly strong in the case of complex sentences. Furthermore, with the self-training technique we used in the classifier, HDSKG can be applied to other domains easily with less training data. 
@InProceedings{SANER17p56, author = {Xuejiao Zhao and Zhenchang Xing and Muhammad Ashad Kabir and Naoya Sawada and Jing Li and Shang-Wei Lin}, title = {HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {56--67}, doi = {}, year = {2017}, } Info |
|
Khomh, Foutse |
SANER '17: "An Empirical Study of Code ..."
An Empirical Study of Code Smells in JavaScript Projects
Amir Saboury, Pooya Musavi, Foutse Khomh, and Giuliano Antoniol (Polytechnique Montréal, Canada) JavaScript is a powerful scripting programming language that has gained a lot of attention this past decade. Initially used exclusively for client-side web development, it has evolved to become one of the most popular programming languages, with developers now using it for both client-side and server-side application development. Similar to applications written in other programming languages, JavaScript applications contain code smells, which are poor design choices that can negatively impact the quality of an application. In this paper, we investigate code smells in JavaScript server-side applications with the aim of understanding how they impact the fault-proneness of applications. We detect 12 types of code smells in 537 releases of five popular JavaScript applications (i.e., express, grunt, bower, less.js, and request) and perform survival analysis, comparing the time until a fault occurrence, in files containing code smells and files without code smells. Results show that (1) on average, files without code smells have hazard rates 65% lower than files with code smells. (2) Among the studied smells, “Variable Re-assign” and “Assignment In Conditional statements” code smells have the highest hazard rates. Additionally, we conduct a survey with 1,484 JavaScript developers, to understand the perception of developers towards our studied code smells. We found that developers consider “Nested Callbacks”, “Variable Re-assign” and “Long Parameter List” code smells to be serious design problems that hinder the maintainability and reliability of applications. This assessment is in line with the findings of our quantitative analysis. Overall, code smells negatively affect the quality of JavaScript applications, and developers should consider tracking and removing them early on, before the release of applications to the public. 
@InProceedings{SANER17p294, author = {Amir Saboury and Pooya Musavi and Foutse Khomh and Giuliano Antoniol}, title = {An Empirical Study of Code Smells in JavaScript Projects}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {294--305}, doi = {}, year = {2017}, } SANER '17: "Stack Overflow: A Code Laundering ..." Stack Overflow: A Code Laundering Platform? Le An, Ons Mlouki, Foutse Khomh , and Giuliano Antoniol (Polytechnique Montréal, Canada) Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribute-ShareAlike 3.0 Unported license that developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps, to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to code posting to Stack overflow or code reuse from Stack overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow. 
@InProceedings{SANER17p283, author = {Le An and Ons Mlouki and Foutse Khomh and Giuliano Antoniol}, title = {Stack Overflow: A Code Laundering Platform?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {283--293}, doi = {}, year = {2017}, } |
|
Kochhar, Pavneet Singh |
SANER '17: "Detecting Similar Repositories ..."
Detecting Similar Repositories on GitHub
Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, and Jianling Sun (Zhejiang University, China; Singapore Management University, Singapore; University of California at Berkeley, USA) GitHub contains millions of repositories, among which many are similar to one another (i.e., having similar source code or implementing similar functionalities). Finding similar repositories on GitHub can be helpful for software engineers, as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either make use of only a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) which are not considered in previous works. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar to one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach, CLAN, using one thousand Java repositories on GitHub. 
Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision, and confidence than CLAN. @InProceedings{SANER17p13, author = {Yun Zhang and David Lo and Pavneet Singh Kochhar and Xin Xia and Quanlai Li and Jianling Sun}, title = {Detecting Similar Repositories on GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {13--23}, doi = {}, year = {2017}, } Info |
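The three heuristics above can be sketched in a few lines. This is an illustrative reconstruction, not RepoPal's actual implementation: the bag-of-words cosine for readme relevance, the Jaccard overlap for stargazer relevance, and the simple average used to integrate the scores are all assumptions:

```python
from collections import Counter
import math

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two readme texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard(stars_a, stars_b):
    """Overlap between the two repositories' sets of stargazers."""
    union = stars_a | stars_b
    return len(stars_a & stars_b) / len(union) if union else 0.0

def repo_similarity(readme_a, readme_b, stars_a, stars_b, time_score):
    """Integrate readme-, stargazer-, and time-based relevance.
    The unweighted average is an assumption; the paper's integration may differ."""
    return (cosine_similarity(readme_a, readme_b)
            + jaccard(stars_a, stars_b)
            + time_score) / 3.0
```

Identical readmes and stargazer sets with a time score of 1.0 yield a similarity of 1.0; disjoint data yields 0.0.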
|
Kuang, Hongyu |
SANER '17: "Analyzing Closeness of Code ..."
Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery
Hongyu Kuang, Jia Nie, Hao Hu, Patrick Rempel, Jian Lü, Alexander Egyed, and Patrick Mäder (Nanjing University, China; JKU Linz, Austria; TU Ilmenau, Germany) Information Retrieval (IR) identifies trace links based on textual similarities among software artifacts. However, the vocabulary mismatch problem between different artifacts hinders the performance of IR-based approaches. A growing body of work addresses this issue by combining IR techniques with code dependency analysis such as method calls. However, so far the performance of combined approaches is highly dependent on the correctness of IR techniques and does not take full advantage of the code dependency analysis. In this paper, we combine IR techniques with closeness analysis to improve IR-based traceability recovery. Specifically, we quantify and utilize the “closeness” of each call and data dependency between two classes to improve rankings of traceability candidate lists. An empirical evaluation based on three real-world systems suggests that our approach outperforms three baseline approaches. @InProceedings{SANER17p68, author = {Hongyu Kuang and Jia Nie and Hao Hu and Patrick Rempel and Jian Lü and Alexander Egyed and Patrick Mäder}, title = {Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {68--78}, doi = {}, year = {2017}, } |
|
Lanza, Michele |
SANER '17: "How to Gamify Software Engineering ..."
How to Gamify Software Engineering
Tommaso Dal Sasso, Andrea Mocci, Michele Lanza , and Ebrisa Mastrodicasa (University of Lugano, Switzerland) Software development, like any prolonged and intellectually demanding activity, can negatively affect the motivation of developers. This is especially true in specific areas of software engineering, such as requirements engineering, test-driven development, bug reporting and fixing, where the creative aspects of programming fall short. The developers’ engagement might progressively degrade, potentially impacting their work’s quality. Gamification, the use of game elements and game design techniques in non-game contexts, is hailed as a means to boost the motivation of people for a wide range of rote activities. Indeed, well-designed games deeply involve gamers in a positive loop of production, feedback, and reward, eliciting desirable feelings like happiness and collaboration. The question we investigate is how the seemingly frivolous context of games and gamification can be ported to the technically challenging and sober domain of software engineering. Our investigation starts with a review of the state of the art of gamification, supported by a motivating scenario to expose how gamification elements can be integrated in software engineering. We provide a set of basic building blocks to apply gamification techniques, present a conceptual framework to do so, illustrated in two usage contexts, and critically discuss our findings. @InProceedings{SANER17p261, author = {Tommaso Dal Sasso and Andrea Mocci and Michele Lanza and Ebrisa Mastrodicasa}, title = {How to Gamify Software Engineering}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {261--271}, doi = {}, year = {2017}, } |
|
Laverdière, Marc-André |
SANER '17: "Computing Counter-Examples ..."
Computing Counter-Examples for Privilege Protection Losses using Security Models
Marc-André Laverdière and Ettore Merlo (Tata Consultancy Services, Canada; Polytechnique Montréal, Canada) Role-Based Access Control (RBAC) is commonly used in web applications to protect information and restrict operations. Code changes may affect the security of the application and need to be validated, in order to avoid security vulnerabilities, which is a major undertaking. A statement suffers from privilege protection loss in a release pair when it was definitely protected on all execution paths in the previous release and is now reachable by some execution paths with an inferior privilege protection. Because the code change and the resulting privilege protection loss may be distant (e.g. in different functions or files), developers may find it difficult to diagnose and correct the issue. We use Pattern Traversal Flow Analysis (PTFA) to statically analyze code-derived formal models. Our analysis automatically computes counter-examples of definite protection properties and privilege protection losses. We computed privilege protections and their changes for 147 release pairs of WordPress. We computed counter-examples for a total of 14,116 privilege protection losses we found spread in 31 release pairs. We present the distribution of counter-examples’ lengths, as well as their spread across function and file boundaries. Our results show that counter-examples are typically short and localized. The median example spans 88 statements, crosses a single function boundary, and is contained in the same file. The 90th percentile example measures 174 statements and spans 3 function boundaries over 3 files. We believe that the privilege protection counter-examples’ characteristics would be helpful to focus developers’ attention for security reviews. These counter-examples are also a first step toward explanations. 
@InProceedings{SANER17p240, author = {Marc-André Laverdière and Ettore Merlo}, title = {Computing Counter-Examples for Privilege Protection Losses using Security Models}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {240--249}, doi = {}, year = {2017}, } |
|
Leung, Hareton |
SANER '17: "StiCProb: A Novel Feature ..."
StiCProb: A Novel Feature Mining Approach using Conditional Probability
Yutian Tang and Hareton Leung (Hong Kong Polytechnic University, China) Software Product Line Engineering is a key approach to construct applications with systematic reuse of architecture, documents, and other relevant components. To migrate legacy software into a product line system, it is essential to identify the code segments that should be constructed as features from the source base. However, this can be an error-prone and complicated task, as it involves exploring a complex structure and extracting the relations between different components within a system. Representing the structural information of a program in a mathematical way is a promising direction to investigate. We improve this situation by proposing a probability-based approach named StiCProb to capture source code fragments for a feature of concern, which inherently provides a conditional probability to describe the closeness between two programming elements. In the case study, we conduct feature mining on several legacy systems to compare our approach with other related approaches. As demonstrated in our experiment, our approach can support developers in locating features within legacy systems successfully, with a precision of 83% and a recall of 41%. @InProceedings{SANER17p45, author = {Yutian Tang and Hareton Leung}, title = {StiCProb: A Novel Feature Mining Approach using Conditional Probability}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {45--55}, doi = {}, year = {2017}, } Info |
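A conditional probability as a closeness measure between two programming elements, as described in the abstract above, can be sketched roughly as follows. This is an illustrative reconstruction, not the StiCProb algorithm itself; the co-occurrence input and the class names are hypothetical:

```python
from collections import defaultdict

def conditional_closeness(dependency_pairs):
    """Estimate P(b | a): given that element a appears in a code fragment,
    how often does element b appear together with it?
    dependency_pairs: iterable of (a, b) observed co-occurrences."""
    pair_count = defaultdict(int)
    elem_count = defaultdict(int)
    for a, b in dependency_pairs:
        pair_count[(a, b)] += 1
        elem_count[a] += 1
    # Relative frequency of each pair among all occurrences of its first element.
    return {(a, b): c / elem_count[a] for (a, b), c in pair_count.items()}

pairs = [("Cart", "Order"), ("Cart", "Order"), ("Cart", "Item"), ("Order", "Invoice")]
closeness = conditional_closeness(pairs)
# P(Order | Cart) = 2/3, P(Item | Cart) = 1/3, P(Invoice | Order) = 1.0
```

Elements with high conditional closeness to an already-identified feature element would be candidate members of the same feature.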
|
Li, Jing |
SANER '17: "HDSKG: Harvesting Domain Specific ..."
HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
Xuejiao Zhao, Zhenchang Xing, Muhammad Ashad Kabir, Naoya Sawada, Jing Li, and Shang-Wei Lin (Nanyang Technological University, Singapore; Australian National University, Australia; Charles Sturt University, Australia; NTT, Japan) A knowledge graph is useful for many different domains like search result ranking, recommendation, exploratory search, etc. It integrates structural information of concepts across multiple information sources, and links these concepts together. The extraction of domain specific relation triples (subject, verb phrase, object) is one of the important techniques for domain specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain specific concepts and their relation triples from the content of webpages. We combine a dependency parser with a rule-based method to chunk the relation triple candidates, then we extract advanced features of these candidate relation triples to estimate the domain relevance by a machine learning algorithm. For the evaluation of our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of the software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7 respectively) are much higher than those of OpenIE (0.11 and 0.6 respectively). The performance is particularly strong in the case of complex sentences. Furthermore, with the self-training technique we used in the classifier, HDSKG can be applied to other domains easily with less training data. 
@InProceedings{SANER17p56, author = {Xuejiao Zhao and Zhenchang Xing and Muhammad Ashad Kabir and Naoya Sawada and Jing Li and Shang-Wei Lin}, title = {HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {56--67}, doi = {}, year = {2017}, } Info |
|
Li, Quanlai |
SANER '17: "Detecting Similar Repositories ..."
Detecting Similar Repositories on GitHub
Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, and Jianling Sun (Zhejiang University, China; Singapore Management University, Singapore; University of California at Berkeley, USA) GitHub contains millions of repositories, among which many are similar to one another (i.e., having similar source code or implementing similar functionalities). Finding similar repositories on GitHub can be helpful for software engineers, as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either make use of only a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) which are not considered in previous works. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar to one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach, CLAN, using one thousand Java repositories on GitHub. 
Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision, and confidence than CLAN. @InProceedings{SANER17p13, author = {Yun Zhang and David Lo and Pavneet Singh Kochhar and Xin Xia and Quanlai Li and Jianling Sun}, title = {Detecting Similar Repositories on GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {13--23}, doi = {}, year = {2017}, } Info |
|
Lin, Shang-Wei |
SANER '17: "HDSKG: Harvesting Domain Specific ..."
HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
Xuejiao Zhao, Zhenchang Xing, Muhammad Ashad Kabir, Naoya Sawada, Jing Li, and Shang-Wei Lin (Nanyang Technological University, Singapore; Australian National University, Australia; Charles Sturt University, Australia; NTT, Japan) A knowledge graph is useful for many different domains like search result ranking, recommendation, exploratory search, etc. It integrates structural information of concepts across multiple information sources, and links these concepts together. The extraction of domain specific relation triples (subject, verb phrase, object) is one of the important techniques for domain specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain specific concepts and their relation triples from the content of webpages. We combine a dependency parser with a rule-based method to chunk the relation triple candidates, then we extract advanced features of these candidate relation triples to estimate the domain relevance by a machine learning algorithm. For the evaluation of our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of the software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7 respectively) are much higher than those of OpenIE (0.11 and 0.6 respectively). The performance is particularly strong in the case of complex sentences. Furthermore, with the self-training technique we used in the classifier, HDSKG can be applied to other domains easily with less training data. 
@InProceedings{SANER17p56, author = {Xuejiao Zhao and Zhenchang Xing and Muhammad Ashad Kabir and Naoya Sawada and Jing Li and Shang-Wei Lin}, title = {HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {56--67}, doi = {}, year = {2017}, } Info |
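The HDSKG pipeline (chunk relation-triple candidates, then score their domain relevance) can be sketched roughly as follows. This is a hypothetical simplification: the paper uses a real dependency parser and a trained classifier with self-training, neither of which is reproduced here; the POS-pattern rule and the toy domain vocabulary below are assumptions for illustration only.

```python
# Toy sketch of rule-based (subject, verb phrase, object) chunking plus a
# domain-relevance score. HDSKG itself uses dependency parsing and a
# machine-learned classifier; this stand-in only mirrors the overall shape.

DOMAIN_TERMS = {"java", "thread", "exception", "jvm"}  # assumed toy vocabulary

def chunk_triples(tagged):
    """Extract (subject, verb, object) candidates from a sentence given
    as (token, POS-tag) pairs: nearest noun before/after each verb."""
    triples = []
    for i, (tok, tag) in enumerate(tagged):
        if tag.startswith("VB"):
            subj = next((t for t, g in reversed(tagged[:i]) if g.startswith("NN")), None)
            obj = next((t for t, g in tagged[i + 1:] if g.startswith("NN")), None)
            if subj and obj:
                triples.append((subj, tok, obj))
    return triples

def domain_relevance(triple):
    """Score a candidate by overlap with the domain vocabulary (a crude
    stand-in for the paper's learned relevance estimation)."""
    words = {w.lower() for w in triple}
    return len(words & DOMAIN_TERMS) / len(words)

tagged = [("Java", "NNP"), ("threads", "NNS"), ("throw", "VBP"),
          ("an", "DT"), ("exception", "NN")]
candidates = chunk_triples(tagged)
ranked = sorted(candidates, key=domain_relevance, reverse=True)
```

Ranking by domain relevance is what lets the approach discard generic triples that a domain-agnostic extractor such as openIE would keep.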
|
Lipka, Richard |
SANER '17: "Antipatterns Causing Memory ..."
Antipatterns Causing Memory Bloat: A Case Study
Kamil Jezek and Richard Lipka (University of West Bohemia, Czech Republic) Java is one of the languages popular for high abstraction and automatic memory management. As in other object-oriented languages, Java’s objects can easily represent a domain model of an application. While this has a positive impact on the design, implementation, and maintenance of applications, there are drawbacks as well. One of them is the relatively high memory overhead of managing objects. In this work, we share our experience with tracking down this problem in an application that we refactored to use less memory. Although the application was relatively well designed and had no memory leaks, it required so much memory that it was not practically usable for large data. We made three relatively simple improvements: we reduced the usage of Java Collections, removed unnecessary object instances, and simplified the domain model, which reduced memory needs by up to 88% and made the application more usable and even faster. This work is a case study reporting these results. Moreover, the employed ideas are formulated as a set of antipatterns that may be applied to other applications. @InProceedings{SANER17p306, author = {Kamil Jezek and Richard Lipka}, title = {Antipatterns Causing Memory Bloat: A Case Study}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {306--315}, doi = {}, year = {2017}, } |
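The paper's antipatterns are Java-specific, but the underlying observation, that general-purpose collections and per-instance attribute storage carry sizeable per-object overhead, carries over to other managed runtimes. A minimal Python analogue (illustrative only, not the paper's code or measurements):

```python
# Compare the memory footprint of a domain object that stores attributes
# in a per-instance dict with a simplified, fixed-layout version.
import sys

class PointDict:             # "rich" domain object: attribute dict per instance
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:            # simplified model: fixed slots, no attribute dict
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p = PointDict(1, 2)
dict_cost = sys.getsizeof(p) + sys.getsizeof(p.__dict__)  # instance + its dict
slots_cost = sys.getsizeof(PointSlots(1, 2))              # whole slotted instance
```

Multiplied across millions of domain objects, this per-instance difference is exactly the kind of bloat the case study eliminated by simplifying the domain model.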
|
Liu, Bing |
SANER '17: "Improving Fault Localization ..."
Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models
Bing Liu, Lucia, Shiva Nejati, and Lionel C. Briand (University of Luxembourg, Luxembourg) One promising way to improve the accuracy of fault localization based on statistical debugging is to increase diversity among test cases in the underlying test suite. In many practical situations, adding test cases is not a cost-free option because test oracles are developed manually or running test cases is expensive. Hence, we need test suites that are both diverse and small to improve debugging. In this paper, we focus on improving fault localization of Simulink models by generating test cases. We identify three test objectives that aim to increase test suite diversity. We use these objectives in a search-based algorithm to generate diversified but small test suites. To further minimize test suite sizes, we develop a prediction model to stop test generation when adding test cases is unlikely to improve fault localization. We evaluate our approach using three industrial subjects. Our results show that (1) the three selected test objectives are able to significantly improve the accuracy of fault localization for small test suite sizes, and (2) our prediction model is able to maintain almost the same fault localization accuracy while reducing the average number of newly generated test cases by more than half. @InProceedings{SANER17p359, author = {Bing Liu and Lucia and Shiva Nejati and Lionel C. Briand}, title = {Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {359--370}, doi = {}, year = {2017}, } |
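The core intuition, preferring diverse test cases when growing a small suite, can be sketched with a greedy max-min-distance selection over coverage vectors. This is an assumed stand-in: the paper's search-based algorithm and its three Simulink-specific test objectives are considerably more elaborate.

```python
# Greedy diversity-maximizing test selection over binary coverage vectors
# (toy stand-in for the paper's search-based generation of small, diverse
# test suites; Hamming distance is an assumed diversity measure).

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def select_diverse(candidates, k):
    """Greedily pick k tests, each time adding the candidate that
    maximizes the minimum distance to the suite so far."""
    suite = [candidates[0]]
    while len(suite) < k:
        best = max((c for c in candidates if c not in suite),
                   key=lambda c: min(hamming(c, s) for s in suite))
        suite.append(best)
    return suite

# Coverage vectors over 4 model elements (1 = covered by the test).
tests = [(1, 1, 0, 0), (1, 1, 0, 1), (0, 0, 1, 1), (1, 0, 1, 0)]
suite = select_diverse(tests, 2)
```

A stopping criterion like the paper's prediction model would then cut generation off once additional tests stop changing the localization ranking.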
|
Liu, Jin |
SANER '17: "Scalable Tag Recommendation ..."
Scalable Tag Recommendation for Software Information Sites
Pingyi Zhou, Jin Liu, Zijiang Yang, and Guangyou Zhou (Wuhan University, China; Western Michigan University, USA; Central China Normal University, China) Software developers can search, share and learn development experience, solutions, bug fixes and open source projects in software information sites such as StackOverflow and Freecode. Many software information sites rely on tags to classify their contents, i.e., software objects, in order to improve the performance and accuracy of various operations on the sites. The quality of tags thus has a significant impact on the usefulness of these sites. High-quality tags are expected to be concise and to describe the most important features of the software objects. Unfortunately, tagging is inherently an uncoordinated process. The choice of tags made by individual software developers depends not only on a developer's understanding of the software object but also on the developer's English skills and preferences. As a result, the number of different tags grows rapidly along with the continuous addition of software objects. With thousands of different tags, many of which introduce noise, software objects become poorly classified. This phenomenon negatively affects the speed and accuracy of developers' queries. In this paper, we propose a tool called TagMulRec to automatically recommend tags and classify software objects in evolving large-scale software information sites. Given a new software object, TagMulRec locates the software objects that are semantically similar to the new one and exploits their tags. We have evaluated TagMulRec on four software information sites, StackOverflow, AskUbuntu, AskDifferent and Freecode. According to our empirical study, TagMulRec is not only accurate but also scalable: it can handle a large-scale software information site with millions of software objects and thousands of tags. 
@InProceedings{SANER17p272, author = {Pingyi Zhou and Jin Liu and Zijiang Yang and Guangyou Zhou}, title = {Scalable Tag Recommendation for Software Information Sites}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {272--282}, doi = {}, year = {2017}, } |
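The neighbor-based recommendation step can be sketched as follows: find the objects most textually similar to the new one and aggregate their tags, weighted by similarity. This toy version uses bag-of-words cosine similarity over a tiny in-memory corpus; the actual tool indexes millions of objects and is engineered for scale, which this sketch does not attempt.

```python
# Similarity-weighted tag aggregation in the spirit of TagMulRec
# (toy cosine similarity; the real system's indexing is far more scalable).
from collections import Counter
import math

def cosine(a, b):
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def recommend_tags(query, corpus, top_k=2):
    """Rank the tags of the objects most similar to the query description."""
    scores = Counter()
    for text, tags in corpus:
        sim = cosine(query, text)
        for t in tags:
            scores[t] += sim
    return [t for t, _ in scores.most_common(top_k)]

corpus = [
    ("sort a list in python", {"python", "sorting"}),
    ("java hashmap iteration", {"java", "collections"}),
    ("python dict comprehension", {"python"}),
]
tags = recommend_tags("how to sort a python list", corpus)
```

Because tags are accumulated across all similar neighbors, rare noisy tags are naturally outweighed by the tags the community uses consistently.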
|
Lo, David |
SANER '17: "Detecting Similar Repositories ..."
Detecting Similar Repositories on GitHub
Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, and Jianling Sun (Zhejiang University, China; Singapore Management University, Singapore; University of California at Berkeley, USA) GitHub contains millions of repositories, many of which are similar to one another (i.e., they have similar source code or implement similar functionality). Finding similar repositories on GitHub can be helpful for software engineers as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either only make use of a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) which are not considered in previous works. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar to one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach, CLAN, using one thousand Java repositories on GitHub. 
Our empirical evaluation demonstrates that RepoPal achieves higher success rate, precision, and confidence than CLAN. @InProceedings{SANER17p13, author = {Yun Zhang and David Lo and Pavneet Singh Kochhar and Xin Xia and Quanlai Li and Jianling Sun}, title = {Detecting Similar Repositories on GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {13--23}, doi = {}, year = {2017}, } Info |
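The three-heuristic combination can be sketched as below. The exact scoring formulas and integration are defined in the paper; this toy version uses Jaccard overlap of readme words and of stargazer sets, plus a pre-computed co-starring-time bonus, combined multiplicatively, all of which are illustrative assumptions rather than RepoPal's precise definitions.

```python
# Toy combination of readme-based, stargazer-based, and time-based
# relevance for a repository pair (assumed formulas, not RepoPal's).

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def repo_similarity(r1, r2):
    readme = jaccard(set(r1["readme"].split()), set(r2["readme"].split()))
    stars = jaccard(r1["stargazers"], r2["stargazers"])
    # time-based relevance: bonus for users who starred both repositories
    # within a short window (pre-computed here as a toy value)
    time_rel = r1.get("co_star_bonus", {}).get(r2["name"], 0.0)
    return readme * stars * (1.0 + time_rel)

repo_a = {"name": "a", "readme": "json parser library",
          "stargazers": {"u1", "u2", "u3"}, "co_star_bonus": {"b": 0.5}}
repo_b = {"name": "b", "readme": "fast json parser",
          "stargazers": {"u2", "u3", "u4"}}
score = repo_similarity(repo_a, repo_b)
```

Ranking all candidate repositories by such a combined score yields the recommendation list that the evaluation compares against CLAN's.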
|
Lu, Hongmin |
SANER '17: "An Empirical Investigation ..."
An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults
Yiyang Feng, Wanwangying Ma, Yibiao Yang, Hongmin Lu, Yuming Zhou, and Baowen Xu (Nanjing University, China) In recent years, it has been shown that fault prediction models could effectively guide test effort allocation in finding faults if they have a high enough fault prediction accuracy (Norm(Popt) > 0.78). However, it is often difficult to achieve such a high fault prediction accuracy in practice. As a result, fault-prediction-model-guided allocation (FPA) methods may not be applicable in real development environments. To address this problem, we propose a new type of test effort allocation strategy: the reliability-growth-model-guided allocation (RGA) method. For a given project release V, RGA attempts to predict the optimal test effort allocation for V by learning the fault distribution information from the previous releases. Based on three open-source projects, we empirically investigate the cost-effectiveness of three test effort allocation strategies for finding faults: RGA, FPA, and the structural-complexity-guided allocation (SCA) method. The experimental results show that RGA achieves promising performance in finding faults compared with SCA and FPA. @InProceedings{SANER17p371, author = {Yiyang Feng and Wanwangying Ma and Yibiao Yang and Hongmin Lu and Yuming Zhou and Baowen Xu}, title = {An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {371--381}, doi = {}, year = {2017}, } |
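RGA's core idea, learn the fault distribution from previous releases and allocate the test budget accordingly, can be sketched as follows. The paper fits actual reliability growth models; the exponential-decay weighting of past releases used here is an illustrative assumption.

```python
# Toy allocation of a test budget proportional to per-module fault
# estimates learned from release history (assumed decay weighting; the
# paper fits reliability growth models instead).

def predict_faults(history, decay=0.5):
    """Estimate per-module fault proneness, weighting recent releases
    more heavily. `history` maps module -> fault counts, newest first."""
    pred = {}
    for module, counts in history.items():
        weights = [decay ** i for i in range(len(counts))]
        pred[module] = sum(w * c for w, c in zip(weights, counts))
    return pred

def allocate_effort(history, budget):
    pred = predict_faults(history)
    total = sum(pred.values())
    return {m: budget * v / total for m, v in pred.items()}

history = {"core": [8, 6], "ui": [2, 2], "io": [0, 4]}  # newest release first
plan = allocate_effort(history, budget=100)
```

Unlike FPA, no per-module fault prediction model is required for the new release; only historical fault counts feed the allocation.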
|
Lü, Jian |
SANER '17: "Analyzing Closeness of Code ..."
Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery
Hongyu Kuang, Jia Nie, Hao Hu, Patrick Rempel, Jian Lü, Alexander Egyed, and Patrick Mäder (Nanjing University, China; JKU Linz, Austria; TU Ilmenau, Germany) Information Retrieval (IR) identifies trace links based on textual similarities among software artifacts. However, the vocabulary mismatch problem between different artifacts hinders the performance of IR-based approaches. A growing body of work addresses this issue by combining IR techniques with code dependency analysis such as method calls. However, so far the performance of combined approaches has been highly dependent on the correctness of the IR techniques and has not taken full advantage of code dependency analysis. In this paper, we combine IR techniques with closeness analysis to improve IR-based traceability recovery. Specifically, we quantify and utilize the “closeness” of each call and data dependency between two classes to improve rankings of traceability candidate lists. An empirical evaluation based on three real-world systems suggests that our approach outperforms three baseline approaches. @InProceedings{SANER17p68, author = {Hongyu Kuang and Jia Nie and Hao Hu and Patrick Rempel and Jian Lü and Alexander Egyed and Patrick Mäder}, title = {Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {68--78}, doi = {}, year = {2017}, } |
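Re-ranking an IR candidate list with a code-closeness score can be sketched as a weighted combination. The linear form and the weight below are illustrative assumptions; the paper derives closeness per call and data dependency between classes, which this toy dictionary only stands in for.

```python
# Re-rank IR traceability candidates by mixing textual and closeness
# scores (assumed linear combination; weight ALPHA is hypothetical).

ALPHA = 0.6  # weight of the textual (IR) score; assumed, not from the paper

def rerank(candidates, closeness):
    """candidates: list of (class_name, ir_score).
    closeness: class -> closeness score w.r.t. already-traced classes."""
    combined = [(name, ALPHA * ir + (1 - ALPHA) * closeness.get(name, 0.0))
                for name, ir in candidates]
    return sorted(combined, key=lambda x: x[1], reverse=True)

ir_list = [("PaymentService", 0.70), ("Logger", 0.65), ("CartController", 0.60)]
closeness = {"CartController": 0.9, "PaymentService": 0.4}
ranked = rerank(ir_list, closeness)
```

The effect is that a class with strong code dependencies on already-traced classes (here `CartController`) can overtake textually closer but structurally unrelated candidates.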
|
Lucia |
SANER '17: "Improving Fault Localization ..."
Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models
Bing Liu, Lucia, Shiva Nejati, and Lionel C. Briand (University of Luxembourg, Luxembourg) One promising way to improve the accuracy of fault localization based on statistical debugging is to increase diversity among test cases in the underlying test suite. In many practical situations, adding test cases is not a cost-free option because test oracles are developed manually or running test cases is expensive. Hence, we need test suites that are both diverse and small to improve debugging. In this paper, we focus on improving fault localization of Simulink models by generating test cases. We identify three test objectives that aim to increase test suite diversity. We use these objectives in a search-based algorithm to generate diversified but small test suites. To further minimize test suite sizes, we develop a prediction model to stop test generation when adding test cases is unlikely to improve fault localization. We evaluate our approach using three industrial subjects. Our results show that (1) the three selected test objectives are able to significantly improve the accuracy of fault localization for small test suite sizes, and (2) our prediction model is able to maintain almost the same fault localization accuracy while reducing the average number of newly generated test cases by more than half. @InProceedings{SANER17p359, author = {Bing Liu and Lucia and Shiva Nejati and Lionel C. Briand}, title = {Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {359--370}, doi = {}, year = {2017}, } |
|
Ma, Wanwangying |
SANER '17: "An Empirical Investigation ..."
An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults
Yiyang Feng, Wanwangying Ma, Yibiao Yang, Hongmin Lu, Yuming Zhou, and Baowen Xu (Nanjing University, China) In recent years, it has been shown that fault prediction models could effectively guide test effort allocation in finding faults if they have a high enough fault prediction accuracy (Norm(Popt) > 0.78). However, it is often difficult to achieve such a high fault prediction accuracy in practice. As a result, fault-prediction-model-guided allocation (FPA) methods may not be applicable in real development environments. To address this problem, we propose a new type of test effort allocation strategy: the reliability-growth-model-guided allocation (RGA) method. For a given project release V, RGA attempts to predict the optimal test effort allocation for V by learning the fault distribution information from the previous releases. Based on three open-source projects, we empirically investigate the cost-effectiveness of three test effort allocation strategies for finding faults: RGA, FPA, and the structural-complexity-guided allocation (SCA) method. The experimental results show that RGA achieves promising performance in finding faults compared with SCA and FPA. @InProceedings{SANER17p371, author = {Yiyang Feng and Wanwangying Ma and Yibiao Yang and Hongmin Lu and Yuming Zhou and Baowen Xu}, title = {An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {371--381}, doi = {}, year = {2017}, } |
|
Mäder, Patrick |
SANER '17: "Analyzing Closeness of Code ..."
Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery
Hongyu Kuang, Jia Nie, Hao Hu, Patrick Rempel, Jian Lü, Alexander Egyed, and Patrick Mäder (Nanjing University, China; JKU Linz, Austria; TU Ilmenau, Germany) Information Retrieval (IR) identifies trace links based on textual similarities among software artifacts. However, the vocabulary mismatch problem between different artifacts hinders the performance of IR-based approaches. A growing body of work addresses this issue by combining IR techniques with code dependency analysis such as method calls. However, so far the performance of combined approaches has been highly dependent on the correctness of the IR techniques and has not taken full advantage of code dependency analysis. In this paper, we combine IR techniques with closeness analysis to improve IR-based traceability recovery. Specifically, we quantify and utilize the “closeness” of each call and data dependency between two classes to improve rankings of traceability candidate lists. An empirical evaluation based on three real-world systems suggests that our approach outperforms three baseline approaches. @InProceedings{SANER17p68, author = {Hongyu Kuang and Jia Nie and Hao Hu and Patrick Rempel and Jian Lü and Alexander Egyed and Patrick Mäder}, title = {Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {68--78}, doi = {}, year = {2017}, } |
|
Maia, Marcelo de Almeida |
SANER '17: "Recommending Source Code Locations ..."
Recommending Source Code Locations for System Specific Transformations
Gustavo Santos, Klérisson V. R. Paixão, Nicolas Anquetil, Anne Etien, Marcelo de Almeida Maia, and Stéphane Ducasse (University of Lille, France; CNRS, France; Inria, France; Federal University of Uberlândia, Brazil) From time to time, developers perform sequences of code transformations in a systematic and repetitive way. This may happen, for example, when introducing a design pattern in a legacy system: similar classes have to be introduced, containing similar methods that are called in a similar way. Automation of these sequences of transformations has been proposed in the literature to avoid errors due to their repetitive nature. However, developers still need support to identify all the relevant code locations that are candidate for transformation. Past research showed that these kinds of transformation can lag for years with forgotten instances popping out from time to time as other evolutions bring them into light. In this paper, we evaluate three distinct code search approaches (“structural”, based on Information Retrieval, and AST based algorithm) to find code locations that would require similar transformations. We validate the resulting candidate locations from these approaches on real cases identified previously in literature. The results show that looking for code with similar roles, e.g., classes in the same hierarchy, provides interesting results with an average recall of 87% and in some cases the precision up to 70%. @InProceedings{SANER17p160, author = {Gustavo Santos and Klérisson V. R. Paixão and Nicolas Anquetil and Anne Etien and Marcelo de Almeida Maia and Stéphane Ducasse}, title = {Recommending Source Code Locations for System Specific Transformations}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {160--170}, doi = {}, year = {2017}, } |
|
Maletic, Jonathan I. |
SANER '17: "Lexical Categories for Source ..."
Lexical Categories for Source Code Identifiers
Christian D. Newman, Reem S. AlSuhaibani, Michael L. Collard, and Jonathan I. Maletic (Kent State University, USA; University of Akron, USA) A set of lexical categories, analogous to part-of-speech categories for English prose, is defined for source-code identifiers. The lexical category for an identifier is determined from its declaration in the source code, syntactic meaning in the programming language, and static program analysis. Current techniques for assigning lexical categories to identifiers use natural-language part-of-speech taggers. However, these NLP approaches assign lexical tags based on how terms are used in English prose. The approach taken here differs in that it uses only source code to determine the lexical category. The approach assigns a lexical category to each identifier and stores this information along with each declaration. srcML is used as the infrastructure to implement the approach and so the lexical information is stored directly in the srcML markup as an additional XML element for each identifier. These lexical-category annotations can then be later used by tools that automatically generate such things as code summarization or documentation. The approach is applied to 50 open source projects and the soundness of the defined lexical categories evaluated. The evaluation shows that at every level of minimum support tested, categorization is consistent at least 79% of the time with an overall consistency (across all supports) of at least 88%. The categories reveal a correlation between how an identifier is named and how it is declared. This provides a syntax-oriented view (as opposed to English part-of-speech view) of developer intent of identifiers. @InProceedings{SANER17p228, author = {Christian D. Newman and Reem S. AlSuhaibani and Michael L. Collard and Jonathan I. Maletic}, title = {Lexical Categories for Source Code Identifiers}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {228--239}, doi = {}, year = {2017}, } |
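Assigning a lexical category from the declaration rather than from English part-of-speech can be sketched as a small rule table over declaration facts. The category names and rules below are illustrative assumptions; the paper derives its categories from srcML markup and static program analysis, which this toy version does not reproduce.

```python
# Toy declaration-driven lexical categorization (hypothetical category
# names and rules; srcML-based analysis is not reproduced here).

def lexical_category(decl):
    """Assign a category from how the identifier is declared, ignoring
    English part-of-speech entirely."""
    if decl["kind"] == "function":
        return "callable"
    if decl["kind"] == "variable":
        if decl.get("const"):
            return "constant"
        return "collection" if decl.get("is_container") else "scalar"
    if decl["kind"] == "class":
        return "type"
    return "unknown"

decls = [
    {"name": "MAX_RETRIES", "kind": "variable", "const": True},
    {"name": "parse", "kind": "function"},
    {"name": "items", "kind": "variable", "is_container": True},
]
categories = {d["name"]: lexical_category(d) for d in decls}
```

As in the paper, each category annotation would then be stored next to the identifier's declaration (in srcML, as an extra XML element) for downstream tools such as summarizers to consume.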
|
Mastrodicasa, Ebrisa |
SANER '17: "How to Gamify Software Engineering ..."
How to Gamify Software Engineering
Tommaso Dal Sasso, Andrea Mocci, Michele Lanza , and Ebrisa Mastrodicasa (University of Lugano, Switzerland) Software development, like any prolonged and intellectually demanding activity, can negatively affect the motivation of developers. This is especially true in specific areas of software engineering, such as requirements engineering, test-driven development, bug reporting and fixing, where the creative aspects of programming fall short. The developers’ engagement might progressively degrade, potentially impacting their work’s quality. Gamification, the use of game elements and game design techniques in non-game contexts, is hailed as a means to boost the motivation of people for a wide range of rote activities. Indeed, well-designed games deeply involve gamers in a positive loop of production, feedback, and reward, eliciting desirable feelings like happiness and collaboration. The question we investigate is how the seemingly frivolous context of games and gamification can be ported to the technically challenging and sober domain of software engineering. Our investigation starts with a review of the state of the art of gamification, supported by a motivating scenario to expose how gamification elements can be integrated in software engineering. We provide a set of basic building blocks to apply gamification techniques, present a conceptual framework to do so, illustrated in two usage contexts, and critically discuss our findings. @InProceedings{SANER17p261, author = {Tommaso Dal Sasso and Andrea Mocci and Michele Lanza and Ebrisa Mastrodicasa}, title = {How to Gamify Software Engineering}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {261--271}, doi = {}, year = {2017}, } |
|
Mayerhofer, Tanja |
SANER '17: "Automated Generation of Consistency-Achieving ..."
Automated Generation of Consistency-Achieving Model Editors
Patrick Neubauer, Robert Bill, Tanja Mayerhofer, and Manuel Wimmer (Vienna University of Technology, Austria) Advances in domain-specific modeling languages (DSMLs) and their editors, created with modern language workbenches, have convinced domain experts to apply them as important and powerful means in their daily endeavors. Although such editors are proficient at retaining syntactic model correctness, they fall short in preserving consistency in models with elaborate language-specific constraints, which requires language engineers to manually implement sophisticated editing capabilities. Consequently, there is a demand for automated procedures that support editor users in both comprehending and resolving consistency violations. In this paper, we present an approach to automate the generation of advanced editing support for DSMLs offering automated validation, content-assist, and quick fix capabilities beyond those created by state-of-the-art language workbenches that help domain experts in retaining and achieving the consistency of models. For validation, we show potential error causes for violated constraints, instead of only the context in which constraints are violated. Our approach mitigates the state-space explosion problem by resolving constraint violations through increasing the neighborhood scope in a three-stage process, seeking constraint repairs that are presented as quick fixes to the editor user. We illustrate and provide an initial evaluation of our approach based on an Xtext-based DSML for modeling service clusters. @InProceedings{SANER17p127, author = {Patrick Neubauer and Robert Bill and Tanja Mayerhofer and Manuel Wimmer}, title = {Automated Generation of Consistency-Achieving Model Editors}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {127--137}, doi = {}, year = {2017}, } Info |
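The staged neighborhood expansion can be sketched over a toy model: try repairs that change one field, then two, and so on, returning the first (smallest) fix that satisfies the constraint. This is an assumed simplification; the paper's quick fixes operate on DSML models with language-specific constraints, not plain integer fields.

```python
# Toy staged-neighborhood constraint repair: widen the scope of changed
# fields stage by stage until the constraint holds (illustrative only).
from itertools import combinations

def repair(model, constraint, stages=3, deltas=(-1, 1)):
    """Try changing 1 field, then 2, then 3, until the constraint holds."""
    keys = list(model)
    for scope in range(1, stages + 1):           # widen the neighborhood
        for fields in combinations(keys, scope):
            for delta in deltas:
                fix = dict(model)
                for f in fields:
                    fix[f] = model[f] + delta
                if constraint(fix):
                    return fix                   # first (smallest) quick fix
    return None

model = {"replicas": 2, "min_replicas": 3}
ok = lambda m: m["replicas"] >= m["min_replicas"]
fixed = repair(model, ok)
```

Bounding the search to small neighborhoods first is what keeps the state space manageable, mirroring the paper's mitigation of state-space explosion.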
|
Meinicke, Jens |
SANER '17: "Variant-Preserving Refactorings ..."
Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line
Wolfram Fenske, Jens Meinicke, Sandro Schulze, Steffen Schulze, and Gunter Saake (University of Magdeburg, Germany; Carnegie Mellon University, USA) A common and simple way to create custom product variants is to copy and adapt existing software (a.k.a. the clone-and-own approach). Clone-and-own promises low initial costs for creating a new variant as existing code is easily reused. However, clone-and-own also comes with major drawbacks for maintenance and evolution since changes, such as bug fixes, need to be synchronized among several product variants. Software product lines (SPLs) provide solutions to these problems because commonalities are implemented only once. Thus, in an SPL, changes also need to be applied only once. Therefore, the migration of cloned product variants to an SPL would be beneficial. The main tasks of migration are the identification and extraction of commonalities from existing products. However, these tasks are challenging and currently not well-supported. In this paper, we propose a step-wise and semi-automated process to migrate cloned product variants to a feature-oriented SPL. Our process relies on clone detection to identify code that is common to multiple variants and novel, variant-preserving refactorings to extract such common code. We evaluated our approach on five cloned product variants, reducing code clones by 25%. Moreover, we provide qualitative insights into possible limitations and potentials for removing even more redundant code. We argue that our approach can effectively decrease synchronization effort compared to clone-and-own development and thus reduce the long-term costs for maintenance and evolution. @InProceedings{SANER17p316, author = {Wolfram Fenske and Jens Meinicke and Sandro Schulze and Steffen Schulze and Gunter Saake}, title = {Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {316--326}, doi = {}, year = {2017}, } |
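The commonality-identification step, finding code shared by all cloned variants as a candidate for extraction into the product line's common core, can be sketched at line granularity. This is a deliberate simplification: the paper relies on a real clone detector (which tolerates variation) and on variant-preserving refactorings, neither shown here.

```python
# Toy identification of code common to all cloned variants (line-level
# set intersection; a real clone detector handles renamed and reordered
# code, which this sketch does not).

def common_core(variants):
    """Return the lines present in every variant, preserving the order
    of the first variant."""
    shared = set(variants[0]).intersection(*variants[1:])
    return [line for line in variants[0] if line in shared]

variant_a = ["open()", "read()", "log('a')", "close()"]
variant_b = ["open()", "read()", "retry()", "close()"]
variant_c = ["open()", "read()", "close()"]
core = common_core([variant_a, variant_b, variant_c])
```

Everything outside the common core (`log('a')`, `retry()`) would remain variant-specific, guarded by features in the resulting product line.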
|
Menaia, Shay |
SANER '17: "Efficient Method Extraction ..."
Efficient Method Extraction for Automatic Elimination of Type-3 Clones
Ran Ettinger, Shmuel Tyszberowicz, and Shay Menaia (Ben-Gurion University of the Negev, Israel; Academic College of Tel Aviv-Yaffo, Israel) A semantics-preserving transformation by Komondoor and Horwitz has been shown to be most effective in the elimination of type-3 clones. The two original algorithms for realizing this transformation, however, are not as efficient as the related (slice-based) transformations. We present an asymptotically-faster algorithm that implements the same transformation via bidirectional reachability on a program dependence graph, and we prove its equivalence to the original formulation. @InProceedings{SANER17p327, author = {Ran Ettinger and Shmuel Tyszberowicz and Shay Menaia}, title = {Efficient Method Extraction for Automatic Elimination of Type-3 Clones}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {327--337}, doi = {}, year = {2017}, } |
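Bidirectional reachability on a dependence graph can be sketched as the intersection of a forward search from the entry nodes and a backward search from the exit nodes. The toy adjacency-list graph below is an assumption for illustration; the paper's algorithm operates on full program dependence graphs.

```python
# Bidirectional reachability: nodes reachable forward from the entries
# AND backward from the exits (toy graph; not the paper's PDG encoding).
from collections import deque

def reachable(graph, starts):
    seen, queue = set(starts), deque(starts)
    while queue:
        for succ in graph.get(queue.popleft(), ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def bidirectional(graph, entries, exits):
    reverse = {}
    for node, succs in graph.items():
        for s in succs:
            reverse.setdefault(s, []).append(node)
    return reachable(graph, entries) & reachable(reverse, exits)

# Node 6 is reachable from the entry but never reaches the exit,
# so it is excluded from the extracted region.
pdg = {1: [2, 3, 6], 2: [4], 3: [5], 4: [5], 5: [], 6: []}
nodes = bidirectional(pdg, entries={1}, exits={5})
```

Each BFS visits every edge at most once, which is where the asymptotic improvement over the original two algorithms plausibly comes from.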
|
Mens, Tom |
SANER '17: "An Empirical Comparison of ..."
An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems
Alexandre Decan , Tom Mens, and Maëlick Claes (University of Mons, Belgium) Nearly every popular programming language comes with one or more open source software packaging ecosystem(s), containing a large collection of interdependent software packages developed in that programming language. Such packaging ecosystems are extremely useful for their respective software development community. We present an empirical analysis of how the dependency graphs of three large packaging ecosystems (npm, CRAN and RubyGems) evolve over time. We study how the existing package dependencies impact the resilience of the three ecosystems over time and to which extent these ecosystems suffer from issues related to package dependency updates. We analyse specific solutions that each ecosystem has put into place and argue that none of these solutions is perfect, motivating the need for better tools to deal with package dependency update problems. @InProceedings{SANER17p2, author = {Alexandre Decan and Tom Mens and Maëlick Claes}, title = {An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {2--12}, doi = {}, year = {2017}, } SANER '17: "Socio-Technical Evolution ..." Socio-Technical Evolution of the Ruby Ecosystem in GitHub Eleni Constantinou and Tom Mens (University of Mons, Belgium) The evolution dynamics of a software ecosystem depend on the activity of the developer community contributing to projects within it. Both social and technical changes affect an ecosystem's evolution and the research community has been investigating the impact of these modifications over the last few years. Existing studies mainly focus on temporary modifications, often ignoring the effect of permanent changes on the software ecosystem. We present an empirical study of the magnitude and effect of permanent modifications in both the social and technical parts of a software ecosystem. 
More precisely, we measure permanent changes with regard to the ecosystem's projects, contributors and source code files and present our findings concerning the effect of these modifications. We study the Ruby ecosystem in GitHub over a nine-year period by carrying out a socio-technical analysis of the co-evolution of a large number of base projects and their forks. This analysis involves both the source code developed for these projects as well as the developers having contributed to them. We discuss our findings with respect to the ecosystem evolution according to three different viewpoints: (1) the base projects, (2) the forks and (3) the entire ecosystem containing both the base projects and forks. Our findings show an increased growth in both the technical and social aspects of the Ruby ecosystem until early 2014, followed by an increased contributor and project abandonment rate. We show the effect of permanent modifications in the ecosystem evolution and provide preliminary evidence of contributors migrating to other ecosystems when leaving the Ruby ecosystem. @InProceedings{SANER17p34, author = {Eleni Constantinou and Tom Mens}, title = {Socio-Technical Evolution of the Ruby Ecosystem in GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {34--44}, doi = {}, year = {2017}, } Info |
|
Merlo, Ettore |
SANER '17: "Computing Counter-Examples ..."
Computing Counter-Examples for Privilege Protection Losses using Security Models
Marc-André Laverdière and Ettore Merlo (Tata Consultancy Services, Canada; Polytechnique Montréal, Canada) Role-Based Access Control (RBAC) is commonly used in web applications to protect information and restrict operations. Code changes may affect the security of the application and need to be validated to avoid security vulnerabilities, which is a major undertaking. A statement suffers from privilege protection loss in a release pair when it was definitely protected on all execution paths in the previous release and is now reachable by some execution paths with an inferior privilege protection. Because the code change and the resulting privilege protection loss may be distant (e.g. in different functions or files), developers may find it difficult to diagnose and correct the issue. We use Pattern Traversal Flow Analysis (PTFA) to statically analyze code-derived formal models. Our analysis automatically computes counter-examples of definite protection properties and privilege protection losses. We computed privilege protections and their changes for 147 release pairs of WordPress. We computed counter-examples for a total of 14,116 privilege protection losses we found spread in 31 release pairs. We present the distribution of counter-examples’ lengths, as well as their spread across function and file boundaries. Our results show that counter-examples are typically short and localized. The median example spans 88 statements, crosses a single function boundary, and is contained in the same file. The 90th-percentile example measures 174 statements and spans 3 function boundaries over 3 files. We believe that the privilege protection counter-examples’ characteristics would be helpful to focus developers’ attention for security reviews. These counter-examples are also a first step toward explanations. 
@InProceedings{SANER17p240, author = {Marc-André Laverdière and Ettore Merlo}, title = {Computing Counter-Examples for Privilege Protection Losses using Security Models}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {240--249}, doi = {}, year = {2017}, } |
|
Mezini, Mira |
SANER '17: "Enriching In-IDE Process Information ..."
Enriching In-IDE Process Information with Fine-Grained Source Code History
Sebastian Proksch, Sarah Nadi, Sven Amann, and Mira Mezini (TU Darmstadt, Germany; University of Alberta, Canada) Current studies on software development either focus on the change history of source code from version-control systems or on an analysis of simplistic in-IDE events without context information. Each of these approaches contains valuable information that is unavailable in the other case. Our work proposes enriched event streams, a solution that combines the best of both worlds and provides a holistic view on the software development process. Enriched event streams not only capture developer activities in the IDE, but also specialized context information, such as source-code snapshots for change events. To enable the storage of such code snapshots in an analyzable format, we introduce a new intermediate representation called Simplified Syntax Trees (SSTs) and build CARET, a platform that offers reusable components to conveniently work with enriched event streams. We implement FeedBaG++, an instrumentation for Visual Studio that collects enriched event streams with code snapshots in the form of SSTs. We share a dataset of enriched event streams captured from 58 users and representing 915 days of work. Additionally, to demonstrate usefulness, we present three research applications that have already made use of CARET and FeedBaG++. @InProceedings{SANER17p250, author = {Sebastian Proksch and Sarah Nadi and Sven Amann and Mira Mezini}, title = {Enriching In-IDE Process Information with Fine-Grained Source Code History}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {250--260}, doi = {}, year = {2017}, } Info |
|
Mlouki, Ons |
SANER '17: "Stack Overflow: A Code Laundering ..."
Stack Overflow: A Code Laundering Platform?
Le An, Ons Mlouki, Foutse Khomh, and Giuliano Antoniol (Polytechnique Montréal, Canada) Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license that developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps, to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to code posting to Stack Overflow or code reuse from Stack Overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow. @InProceedings{SANER17p283, author = {Le An and Ons Mlouki and Foutse Khomh and Giuliano Antoniol}, title = {Stack Overflow: A Code Laundering Platform?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {283--293}, doi = {}, year = {2017}, } |
|
Mocci, Andrea |
SANER '17: "How to Gamify Software Engineering ..."
How to Gamify Software Engineering
Tommaso Dal Sasso, Andrea Mocci, Michele Lanza, and Ebrisa Mastrodicasa (University of Lugano, Switzerland) Software development, like any prolonged and intellectually demanding activity, can negatively affect the motivation of developers. This is especially true in specific areas of software engineering, such as requirements engineering, test-driven development, bug reporting and fixing, where the creative aspects of programming fall short. The developers’ engagement might progressively degrade, potentially impacting their work’s quality. Gamification, the use of game elements and game design techniques in non-game contexts, is hailed as a means to boost the motivation of people for a wide range of rote activities. Indeed, well-designed games deeply involve gamers in a positive loop of production, feedback, and reward, eliciting desirable feelings like happiness and collaboration. The question we investigate is how the seemingly frivolous context of games and gamification can be ported to the technically challenging and sober domain of software engineering. Our investigation starts with a review of the state of the art of gamification, supported by a motivating scenario to expose how gamification elements can be integrated in software engineering. We provide a set of basic building blocks to apply gamification techniques, present a conceptual framework to do so, illustrated in two usage contexts, and critically discuss our findings. @InProceedings{SANER17p261, author = {Tommaso Dal Sasso and Andrea Mocci and Michele Lanza and Ebrisa Mastrodicasa}, title = {How to Gamify Software Engineering}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {261--271}, doi = {}, year = {2017}, } |
|
Moha, Naouel |
SANER '17: "Investigating the Energy Impact ..."
Investigating the Energy Impact of Android Smells
Antonin Carette, Mehdi Adel Ait Younes, Geoffrey Hecht, Naouel Moha, and Romain Rouvoy (Université du Québec à Montréal, Canada; Inria, France; University of Lille, France; IUF, France) Android code smells are bad implementation practices within Android applications (or apps) that may lead to poor software quality. These code smells are known to degrade the performance of apps and to have an impact on energy consumption. However, few studies have assessed the positive impact on energy consumption when correcting code smells. In this paper, we therefore propose a tooled and reproducible approach, called Hot-Pepper, to automatically correct code smells and evaluate their impact on energy consumption. Currently, Hot-Pepper is able to automatically correct three types of Android-specific code smells: Internal Getter/Setter, Member Ignoring Method, and HashMap Usage. Hot-Pepper derives four versions of the apps by correcting each detected smell independently, and all of them at once. Hot-Pepper is able to report on the energy consumption of each app version with a single user scenario test. Our empirical study on five open-source Android apps shows that correcting the three aforementioned Android code smells effectively and significantly reduces the energy consumption of apps. In particular, we observed a global reduction in energy consumption by 4.83% in one app when the three code smells are corrected. We also take advantage of the flexibility of Hot-Pepper to investigate the impact of three picture smells (bad picture format, compression, and bitmap format) in sample apps. We observed that the usage of optimised JPG pictures with the Android default bitmap format is the most energy efficient combination in Android apps. We believe that developers can benefit from our approach and results to guide their refactoring, and thus improve the energy consumption of their mobile apps. 
@InProceedings{SANER17p115, author = {Antonin Carette and Mehdi Adel Ait Younes and Geoffrey Hecht and Naouel Moha and Romain Rouvoy}, title = {Investigating the Energy Impact of Android Smells}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {115--126}, doi = {}, year = {2017}, } |
|
Monperrus, Martin |
SANER '17: "Dynamic Patch Generation for ..."
Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming
Thomas Durieux, Benoit Cornu, Lionel Seinturier, and Martin Monperrus (University of Lille, France; Inria, France) Null pointer exceptions (NPE) are the number one cause of uncaught crashing exceptions in production. In this paper, we aim at exploring the search space of possible patches for null pointer exceptions with metaprogramming. Our idea is to transform the program under repair with automated code transformation, so as to obtain a metaprogram. This metaprogram contains automatically injected hooks, that can be activated to emulate a null pointer exception patch. This enables us to perform a fine-grain analysis of the runtime context of null pointer exceptions. We set up an experiment with 16 real null pointer exceptions that have happened in the field. We compare the effectiveness of our metaprogramming approach against simple templates for repairing null pointer exceptions. @InProceedings{SANER17p349, author = {Thomas Durieux and Benoit Cornu and Lionel Seinturier and Martin Monperrus}, title = {Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {349--358}, doi = {}, year = {2017}, } |
|
Musavi, Pooya |
SANER '17: "An Empirical Study of Code ..."
An Empirical Study of Code Smells in JavaScript Projects
Amir Saboury, Pooya Musavi, Foutse Khomh, and Giuliano Antoniol (Polytechnique Montréal, Canada) JavaScript is a powerful scripting programming language that has gained a lot of attention this past decade. Initially used exclusively for client-side web development, it has evolved to become one of the most popular programming languages, with developers now using it for both client-side and server-side application development. Similar to applications written in other programming languages, JavaScript applications contain code smells, which are poor design choices that can negatively impact the quality of an application. In this paper, we investigate code smells in JavaScript server-side applications with the aim to understand how they impact the fault-proneness of applications. We detect 12 types of code smells in 537 releases of five popular JavaScript applications (i.e., express, grunt, bower, less.js, and request) and perform survival analysis, comparing the time until a fault occurrence, in files containing code smells and files without code smells. Results show that (1) on average, files without code smells have hazard rates 65% lower than files with code smells. (2) Among the studied smells, “Variable Re-assign” and “Assignment In Conditional statements” code smells have the highest hazard rates. Additionally, we conduct a survey with 1,484 JavaScript developers, to understand the perception of developers towards our studied code smells. We found that developers consider “Nested Callbacks”, “Variable Re-assign” and “Long Parameter List” code smells to be serious design problems that hinder the maintainability and reliability of applications. This assessment is in line with the findings of our quantitative analysis. Overall, code smells negatively affect the quality of JavaScript applications and developers should consider tracking and removing them early on, before the release of applications to the public. 
@InProceedings{SANER17p294, author = {Amir Saboury and Pooya Musavi and Foutse Khomh and Giuliano Antoniol}, title = {An Empirical Study of Code Smells in JavaScript Projects}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {294--305}, doi = {}, year = {2017}, } |
|
Nadi, Sarah |
SANER '17: "Enriching In-IDE Process Information ..."
Enriching In-IDE Process Information with Fine-Grained Source Code History
Sebastian Proksch, Sarah Nadi, Sven Amann, and Mira Mezini (TU Darmstadt, Germany; University of Alberta, Canada) Current studies on software development either focus on the change history of source code from version-control systems or on an analysis of simplistic in-IDE events without context information. Each of these approaches contains valuable information that is unavailable in the other case. Our work proposes enriched event streams, a solution that combines the best of both worlds and provides a holistic view on the software development process. Enriched event streams not only capture developer activities in the IDE, but also specialized context information, such as source-code snapshots for change events. To enable the storage of such code snapshots in an analyzable format, we introduce a new intermediate representation called Simplified Syntax Trees (SSTs) and build CARET, a platform that offers reusable components to conveniently work with enriched event streams. We implement FeedBaG++, an instrumentation for Visual Studio that collects enriched event streams with code snapshots in the form of SSTs. We share a dataset of enriched event streams captured from 58 users and representing 915 days of work. Additionally, to demonstrate usefulness, we present three research applications that have already made use of CARET and FeedBaG++. @InProceedings{SANER17p250, author = {Sebastian Proksch and Sarah Nadi and Sven Amann and Mira Mezini}, title = {Enriching In-IDE Process Information with Fine-Grained Source Code History}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {250--260}, doi = {}, year = {2017}, } Info |
|
Nejati, Shiva |
SANER '17: "Improving Fault Localization ..."
Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models
Bing Liu, Lucia, Shiva Nejati, and Lionel C. Briand (University of Luxembourg, Luxembourg) One promising way to improve the accuracy of fault localization based on statistical debugging is to increase diversity among test cases in the underlying test suite. In many practical situations, adding test cases is not a cost-free option because test oracles are developed manually or running test cases is expensive. Hence, we need test suites that are both diverse and small to improve debugging. In this paper, we focus on improving fault localization of Simulink models by generating test cases. We identify three test objectives that aim to increase test suite diversity. We use these objectives in a search-based algorithm to generate diversified but small test suites. To further minimize test suite sizes, we develop a prediction model to stop test generation when adding test cases is unlikely to improve fault localization. We evaluate our approach using three industrial subjects. Our results show (1) the three selected test objectives are able to significantly improve the accuracy of fault localization for small test suite sizes, and (2) our prediction model is able to maintain almost the same fault localization accuracy while reducing the average number of newly generated test cases by more than half. @InProceedings{SANER17p359, author = {Bing Liu and Lucia and Shiva Nejati and Lionel C. Briand}, title = {Improving Fault Localization for Simulink Models using Search-Based Testing and Prediction Models}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {359--370}, doi = {}, year = {2017}, } |
|
Neubauer, Patrick |
SANER '17: "Automated Generation of Consistency-Achieving ..."
Automated Generation of Consistency-Achieving Model Editors
Patrick Neubauer, Robert Bill, Tanja Mayerhofer, and Manuel Wimmer (Vienna University of Technology, Austria) The advances of domain-specific modeling languages (DSMLs) and their editors, created with modern language workbenches, have convinced domain experts to apply them as important and powerful means in their daily endeavors. Despite the fact that such editors are proficient in retaining syntactical model correctness, they fall short in preserving the consistency of models with elaborate language-specific constraints, which requires language engineers to manually implement sophisticated editing capabilities. Consequently, there is a demand for automated procedures to support editor users in both comprehending and resolving consistency violations. In this paper, we present an approach to automate the generation of advanced editing support for DSMLs, offering automated validation, content-assist, and quick fix capabilities beyond those created by state-of-the-art language workbenches, which help domain experts in retaining and achieving the consistency of models. For validation, we show potential error causes for violated constraints, instead of only the context in which constraints are violated. Our approach mitigates the state-space explosion problem by resolving constraint violations through increasing the neighborhood scope in a three-stage process, seeking constraint repair solutions presented as quick fixes to the editor user. We illustrate and provide an initial evaluation of our approach based on an Xtext-based DSML for modeling service clusters. @InProceedings{SANER17p127, author = {Patrick Neubauer and Robert Bill and Tanja Mayerhofer and Manuel Wimmer}, title = {Automated Generation of Consistency-Achieving Model Editors}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {127--137}, doi = {}, year = {2017}, } Info |
|
Newman, Christian D. |
SANER '17: "Lexical Categories for Source ..."
Lexical Categories for Source Code Identifiers
Christian D. Newman, Reem S. AlSuhaibani, Michael L. Collard, and Jonathan I. Maletic (Kent State University, USA; University of Akron, USA) A set of lexical categories, analogous to part-of-speech categories for English prose, is defined for source-code identifiers. The lexical category for an identifier is determined from its declaration in the source code, syntactic meaning in the programming language, and static program analysis. Current techniques for assigning lexical categories to identifiers use natural-language part-of-speech taggers. However, these NLP approaches assign lexical tags based on how terms are used in English prose. The approach taken here differs in that it uses only source code to determine the lexical category. The approach assigns a lexical category to each identifier and stores this information along with each declaration. srcML is used as the infrastructure to implement the approach and so the lexical information is stored directly in the srcML markup as an additional XML element for each identifier. These lexical-category annotations can then be later used by tools that automatically generate such things as code summarization or documentation. The approach is applied to 50 open source projects and the soundness of the defined lexical categories is evaluated. The evaluation shows that at every level of minimum support tested, categorization is consistent at least 79% of the time with an overall consistency (across all supports) of at least 88%. The categories reveal a correlation between how an identifier is named and how it is declared. This provides a syntax-oriented view (as opposed to an English part-of-speech view) of developer intent of identifiers. @InProceedings{SANER17p228, author = {Christian D. Newman and Reem S. AlSuhaibani and Michael L. Collard and Jonathan I. Maletic}, title = {Lexical Categories for Source Code Identifiers}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {228--239}, doi = {}, year = {2017}, } |
|
Nie, Jia |
SANER '17: "Analyzing Closeness of Code ..."
Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery
Hongyu Kuang, Jia Nie, Hao Hu, Patrick Rempel, Jian Lü, Alexander Egyed, and Patrick Mäder (Nanjing University, China; JKU Linz, Austria; TU Ilmenau, Germany) Information Retrieval (IR) identifies trace links based on textual similarities among software artifacts. However, the vocabulary mismatch problem between different artifacts hinders the performance of IR-based approaches. A growing body of work addresses this issue by combining IR techniques with code dependency analysis such as method calls. However, so far the performance of combined approaches is highly dependent on the correctness of IR techniques and does not take full advantage of the code dependency analysis. In this paper, we combine IR techniques with closeness analysis to improve IR-based traceability recovery. Specifically, we quantify and utilize the “closeness” of each call and data dependency between two classes to improve rankings of traceability candidate lists. An empirical evaluation based on three real-world systems suggests that our approach outperforms three baseline approaches. @InProceedings{SANER17p68, author = {Hongyu Kuang and Jia Nie and Hao Hu and Patrick Rempel and Jian Lü and Alexander Egyed and Patrick Mäder}, title = {Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {68--78}, doi = {}, year = {2017}, } |
|
Overeem, Michiel |
SANER '17: "The Dark Side of Event Sourcing: ..."
The Dark Side of Event Sourcing: Managing Data Conversion
Michiel Overeem, Marten Spoor, and Slinger Jansen (AFAS Software, Netherlands; Utrecht University, Netherlands) Evolving software systems includes data schema changes, and because of those schema changes data has to be converted. Converting data between two different schemas while continuing the operation of the system is a challenge when that system is expected to be always available. Data conversion in event sourced systems introduces new challenges, because of the relative novelty of the event sourcing architectural pattern, because of the lack of standardized tools for data conversion, and because of the large amount of data that is stored in typical event stores. This paper addresses the challenge of schema evolution and the resulting data conversion for event sourced systems. First, a set of event store upgrade operations is proposed that can be used to convert data between two versions of a data schema. Second, a set of techniques and strategies that execute the data conversion while continuing the operation of the system is discussed. The final contribution is an event store upgrade framework that identifies which techniques and strategies can be combined to execute the event store upgrade operations while continuing operation of the system. Two utilizations of the framework are given: the first is as decision support in the upfront design of an upgrade system for event sourced systems; the framework can also be utilized as the description of an automated upgrade system for continuous deployment. The event store upgrade framework is evaluated in interviews with three renowned experts in the domain and has been found to be a comprehensive overview that can be utilized in the design and implementation of an upgrade system. The automated upgrade system has been implemented partially and applied in experiments. 
@InProceedings{SANER17p193, author = {Michiel Overeem and Marten Spoor and Slinger Jansen}, title = {The Dark Side of Event Sourcing: Managing Data Conversion}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {193--204}, doi = {}, year = {2017}, } |
|
Paixão, Klérisson V. R. |
SANER '17: "Recommending Source Code Locations ..."
Recommending Source Code Locations for System Specific Transformations
Gustavo Santos, Klérisson V. R. Paixão, Nicolas Anquetil, Anne Etien, Marcelo de Almeida Maia, and Stéphane Ducasse (University of Lille, France; CNRS, France; Inria, France; Federal University of Uberlândia, Brazil) From time to time, developers perform sequences of code transformations in a systematic and repetitive way. This may happen, for example, when introducing a design pattern in a legacy system: similar classes have to be introduced, containing similar methods that are called in a similar way. Automation of these sequences of transformations has been proposed in the literature to avoid errors due to their repetitive nature. However, developers still need support to identify all the relevant code locations that are candidates for transformation. Past research showed that these kinds of transformations can lag for years, with forgotten instances popping out from time to time as other evolutions bring them into light. In this paper, we evaluate three distinct code search approaches (“structural”, based on Information Retrieval, and an AST-based algorithm) to find code locations that would require similar transformations. We validate the resulting candidate locations from these approaches on real cases identified previously in the literature. The results show that looking for code with similar roles, e.g., classes in the same hierarchy, provides interesting results, with an average recall of 87% and, in some cases, a precision of up to 70%. @InProceedings{SANER17p160, author = {Gustavo Santos and Klérisson V. R. Paixão and Nicolas Anquetil and Anne Etien and Marcelo de Almeida Maia and Stéphane Ducasse}, title = {Recommending Source Code Locations for System Specific Transformations}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {160--170}, doi = {}, year = {2017}, } |
|
Palomba, Fabio |
SANER '17: "Software-Based Energy Profiling ..."
Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?
Dario Di Nucci, Fabio Palomba, Antonio Prota, Annibale Panichella, Andy Zaidman, and Andrea De Lucia (University of Salerno, Italy; Delft University of Technology, Netherlands; University of Luxembourg, Luxembourg) Modeling the power profile of mobile applications is a crucial activity to identify the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile. However, all these solutions present their own advantages and disadvantages. Hardware-based tools are highly precise, but at the same time their use is bound to the acquisition of costly hardware components. Model-based tools require the calibration of parameters needed to correctly create a model on a specific hardware device. Software-based approaches do not need any hardware components, but they rely on battery measurements and, thus, they are hardware-assisted. These tools are cheaper and easier to use than hardware-based tools, but they are believed to be less precise. In this paper, we take a deeper look at the pros and cons of software-based solutions investigating to what extent their measurements depart from hardware-based solutions. To this aim, we propose a software-based tool named PETrA that we compare with the hardware-based Monsoon toolkit on 54 Android apps. The results show that PETrA performs similarly to Monsoon despite not using any sophisticated hardware components. In fact, in all the apps the mean relative error with respect to Monsoon is lower than 0.05. Moreover, for 95% of the analyzed methods the estimation error is within 5% of the actual values measured using the hardware-based toolkit. 
@InProceedings{SANER17p103, author = {Dario Di Nucci and Fabio Palomba and Antonio Prota and Annibale Panichella and Andy Zaidman and Andrea De Lucia}, title = {Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {103--114}, doi = {}, year = {2017}, } Video Info |
|
Panichella, Annibale |
SANER '17: "Software-Based Energy Profiling ..."
Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?
Dario Di Nucci, Fabio Palomba, Antonio Prota, Annibale Panichella, Andy Zaidman, and Andrea De Lucia (University of Salerno, Italy; Delft University of Technology, Netherlands; University of Luxembourg, Luxembourg) Modeling the power profile of mobile applications is a crucial activity to identify the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile. However, all these solutions present their own advantages and disadvantages. Hardware-based tools are highly precise, but at the same time their use is bound to the acquisition of costly hardware components. Model-based tools require the calibration of parameters needed to correctly create a model on a specific hardware device. Software-based approaches do not need any hardware components, but they rely on battery measurements and, thus, they are hardware-assisted. These tools are cheaper and easier to use than hardware-based tools, but they are believed to be less precise. In this paper, we take a deeper look at the pros and cons of software-based solutions investigating to what extent their measurements depart from hardware-based solutions. To this aim, we propose a software-based tool named PETrA that we compare with the hardware-based Monsoon toolkit on 54 Android apps. The results show that PETrA performs similarly to Monsoon despite not using any sophisticated hardware components. In fact, in all the apps the mean relative error with respect to Monsoon is lower than 0.05. Moreover, for 95% of the analyzed methods the estimation error is within 5% of the actual values measured using the hardware-based toolkit. 
@InProceedings{SANER17p103, author = {Dario Di Nucci and Fabio Palomba and Antonio Prota and Annibale Panichella and Andy Zaidman and Andrea De Lucia}, title = {Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {103--114}, doi = {}, year = {2017}, } Video Info |
|
Panichella, Sebastiano |
SANER '17: "Analyzing Reviews and Code ..."
Analyzing Reviews and Code of Mobile Apps for Better Release Planning
Adelina Ciurumelea, Andreas Schaufelbühl, Sebastiano Panichella, and Harald C. Gall (University of Zurich, Switzerland) The mobile applications industry experiences unprecedented growth; developers working in this context face fierce competition in acquiring and retaining users. They have to quickly implement new features and fix bugs, or risk losing their users to the competition. To achieve this goal they must closely monitor and analyze the user feedback they receive in the form of reviews. However, successful apps can receive up to several thousand reviews per day, and manually analyzing each of them is a time-consuming task. To help developers deal with the large amount of available data, we manually analyzed the text of 1566 user reviews and defined a high and low level taxonomy containing mobile specific categories (e.g. performance, resources, battery, memory, etc.) highly relevant for developers during the planning of maintenance and evolution activities. Then we built the User Request Referencer (URR) prototype, using Machine Learning and Information Retrieval techniques, to automatically classify reviews according to our taxonomy and recommend, for a particular review, the source code files that need to be modified to handle the issue described in the user review. We evaluated our approach through an empirical study involving the reviews and code of 39 mobile applications. Our results show a high precision and recall of URR in organising reviews according to the defined taxonomy. @InProceedings{SANER17p91, author = {Adelina Ciurumelea and Andreas Schaufelbühl and Sebastiano Panichella and Harald C. Gall}, title = {Analyzing Reviews and Code of Mobile Apps for Better Release Planning}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {91--102}, doi = {}, year = {2017}, }
SANER '17: "Reducing Redundancies in Multi-revision ..."
Reducing Redundancies in Multi-revision Code Analysis
Carol V. Alexandru, Sebastiano Panichella, and Harald C. Gall (University of Zurich, Switzerland) Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code. @InProceedings{SANER17p148, author = {Carol V. Alexandru and Sebastiano Panichella and Harald C. Gall}, title = {Reducing Redundancies in Multi-revision Code Analysis}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {148--159}, doi = {}, year = {2017}, } |
|
Pollock, Lori |
SANER '17: "Automatically Generating Natural ..."
Automatically Generating Natural Language Descriptions for Object-Related Statement Sequences
Xiaoran Wang, Lori Pollock, and K. Vijay-Shanker (University of Delaware, USA) Current source code analyses driving software maintenance tools treat methods as either a single unit or a set of individual statements or words. They often leverage method names and any existing internal comments. However, internal comments are rare, and method names do not typically capture the method’s multiple high-level algorithmic steps that are too small to be a single method, but require more than one statement to implement. Previous work demonstrated feasibility of identifying high level actions automatically for loops; however, many high level actions remain unaddressed and undocumented, particularly sequences of consecutive statements that are associated with each other primarily by object references. We call these object-related action units. In this paper, we present an approach to automatically generate natural language descriptions of object-related action units within methods. We leverage the available, large source of high-quality open source projects to learn the templates of object-related actions, identify the statement that can represent the main action, and generate natural language descriptions for these actions. Our evaluation study of a set of 100 object-related statement sequences showed promise of our approach to automatically identify the action and arguments and generate natural language descriptions. @InProceedings{SANER17p205, author = {Xiaoran Wang and Lori Pollock and K. Vijay-Shanker}, title = {Automatically Generating Natural Language Descriptions for Object-Related Statement Sequences}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {205--216}, doi = {}, year = {2017}, } |
|
Proksch, Sebastian |
SANER '17: "Enriching In-IDE Process Information ..."
Enriching In-IDE Process Information with Fine-Grained Source Code History
Sebastian Proksch, Sarah Nadi, Sven Amann, and Mira Mezini (TU Darmstadt, Germany; University of Alberta, Canada) Current studies on software development either focus on the change history of source code from version-control systems or on an analysis of simplistic in-IDE events without context information. Each of these approaches contains valuable information that is unavailable in the other case. Our work proposes enriched event streams, a solution that combines the best of both worlds and provides a holistic view on the software development process. Enriched event streams not only capture developer activities in the IDE, but also specialized context information, such as source-code snapshots for change events. To enable the storage of such code snapshots in an analyzable format, we introduce a new intermediate representation called Simplified Syntax Trees (SSTs) and build CARET, a platform that offers reusable components to conveniently work with enriched event streams. We implement FeedBaG++, an instrumentation for Visual Studio that collects enriched event streams with code snapshots in the form of SSTs. We share a dataset of enriched event streams captured from 58 users and representing 915 days of work. Additionally, to demonstrate usefulness, we present three research applications that have already made use of CARET and FeedBaG++. @InProceedings{SANER17p250, author = {Sebastian Proksch and Sarah Nadi and Sven Amann and Mira Mezini}, title = {Enriching In-IDE Process Information with Fine-Grained Source Code History}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {250--260}, doi = {}, year = {2017}, } Info |
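The enriched event streams described above pair IDE events with context, such as an SST code snapshot for an edit event. A minimal sketch of what such a record might look like (hypothetical field names, not the actual CARET/FeedBaG++ schema):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class EnrichedEvent:
    """Hypothetical shape of an enriched event: an IDE activity plus optional
    context, e.g. a simplified-syntax-tree snapshot attached to edit events."""
    kind: str                                    # e.g. "EditEvent", "BuildEvent"
    timestamp: datetime
    context: dict = field(default_factory=dict)  # e.g. {"sst": <snapshot>}

# A tiny stream: one edit carrying a snapshot, one build event without context.
stream = [
    EnrichedEvent("EditEvent", datetime(2017, 2, 21, 9, 0), {"sst": "<snapshot>"}),
    EnrichedEvent("BuildEvent", datetime(2017, 2, 21, 9, 5)),
]
edits_with_snapshots = [e for e in stream if "sst" in e.context]
```

An analysis over such a stream can then correlate process events (builds, test runs) with the fine-grained code history carried by the snapshots.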
|
Prota, Antonio |
SANER '17: "Software-Based Energy Profiling ..."
Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?
Dario Di Nucci, Fabio Palomba, Antonio Prota, Annibale Panichella, Andy Zaidman, and Andrea De Lucia (University of Salerno, Italy; Delft University of Technology, Netherlands; University of Luxembourg, Luxembourg) Modeling the power profile of mobile applications is a crucial activity to identify the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile. However, all these solutions present their own advantages and disadvantages. Hardware-based tools are highly precise, but at the same time their use is bound to the acquisition of costly hardware components. Model-based tools require the calibration of parameters needed to correctly create a model on a specific hardware device. Software-based approaches do not need any hardware components, but they rely on battery measurements and, thus, they are hardware-assisted. These tools are cheaper and easier to use than hardware-based tools, but they are believed to be less precise. In this paper, we take a deeper look at the pros and cons of software-based solutions investigating to what extent their measurements depart from hardware-based solutions. To this aim, we propose a software-based tool named PETrA that we compare with the hardware-based Monsoon toolkit on 54 Android apps. The results show that PETrA performs similarly to Monsoon despite not using any sophisticated hardware components. In fact, in all the apps the mean relative error with respect to Monsoon is lower than 0.05. Moreover, for 95% of the analyzed methods the estimation error is within 5% of the actual values measured using the hardware-based toolkit. 
@InProceedings{SANER17p103, author = {Dario Di Nucci and Fabio Palomba and Antonio Prota and Annibale Panichella and Andy Zaidman and Andrea De Lucia}, title = {Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {103--114}, doi = {}, year = {2017}, } Video Info |
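The headline result above is a mean relative error below 0.05 against the Monsoon reference. As a hedged illustration of how such a figure is computed, here is a minimal sketch with hypothetical sample values (not the paper's data):

```python
def mean_relative_error(estimated, reference):
    """Mean of |estimate - reference| / reference over paired measurements."""
    return sum(abs(e - r) / r for e, r in zip(estimated, reference)) / len(reference)

# Hypothetical per-method energy samples in joules:
# a software-based estimate vs. a hardware-based reference.
petra_like = [1.02, 0.98, 2.10, 1.55]
monsoon_like = [1.00, 1.00, 2.00, 1.60]
mre = mean_relative_error(petra_like, monsoon_like)
```

With these made-up numbers the mean relative error is about 0.03, i.e. within the 0.05 bound the paper reports for its real measurements.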
|
Rahman, Mohammad Masudur |
SANER '17: "STRICT: Information Retrieval ..."
STRICT: Information Retrieval Based Search Term Identification for Concept Location
Mohammad Masudur Rahman and Chanchal K. Roy (University of Saskatchewan, Canada) During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts illustrate the change requirement involving various domain related concepts. Software developers need to find appropriate search terms from those concepts so that they could locate the possible locations in the source code using a search technique. Once such locations are identified, they can implement the requested changes there. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose a novel technique--STRICT--that automatically identifies suitable search terms for a software change task by analyzing its task description using two information retrieval (IR) techniques--TextRank and POSRank. These IR techniques determine a term's importance based on not only its co-occurrences with other important terms but also its syntactic relationships with them. Experiments using 1,939 change requests from eight subject systems report that STRICT can identify better quality search terms than baseline terms from 52%--62% of the requests with 30%--57% Top-10 retrieval accuracy, which is promising. Comparison with two state-of-the-art techniques not only validates our empirical findings but also demonstrates the superiority of our technique. @InProceedings{SANER17p79, author = {Mohammad Masudur Rahman and Chanchal K. Roy}, title = {STRICT: Information Retrieval Based Search Term Identification for Concept Location}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {79--90}, doi = {}, year = {2017}, } Info |
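STRICT builds on graph-based term ranking. A minimal, self-contained sketch of the TextRank idea behind it: rank the terms of a change-request description by running PageRank over a word co-occurrence graph (an illustrative simplification, not the STRICT implementation, which also uses POSRank and richer preprocessing):

```python
import re

def textrank_terms(text, window=2, damping=0.85, iters=50):
    """Rank terms by PageRank over an undirected co-occurrence graph:
    two words are linked if they appear within `window` positions."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    graph = {w: set() for w in words}
    for i, w in enumerate(words):
        for n in words[i + 1 : i + window + 1]:
            if n != w:
                graph[w].add(n)
                graph[n].add(w)
    score = {w: 1.0 for w in graph}
    for _ in range(iters):  # power iteration until scores stabilise
        score = {
            w: (1 - damping) + damping * sum(
                score[n] / len(graph[n]) for n in graph[w] if graph[n]
            )
            for w in graph
        }
    return sorted(score, key=score.get, reverse=True)

# Terms that co-occur with many other terms rise to the top:
ranked = textrank_terms("null pointer crash when saving file crash on save")
```

The top-ranked terms would then serve as the search query for locating the concept in source code.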
|
Rempel, Patrick |
SANER '17: "Analyzing Closeness of Code ..."
Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery
Hongyu Kuang, Jia Nie, Hao Hu, Patrick Rempel, Jian Lü, Alexander Egyed, and Patrick Mäder (Nanjing University, China; JKU Linz, Austria; TU Ilmenau, Germany) Information Retrieval (IR) identifies trace links based on textual similarities among software artifacts. However, the vocabulary mismatch problem between different artifacts hinders the performance of IR-based approaches. A growing body of work addresses this issue by combining IR techniques with code dependency analysis such as method calls. However, so far the performance of combined approaches is highly dependent on the correctness of IR techniques and does not take full advantage of the code dependency analysis. In this paper, we combine IR techniques with closeness analysis to improve IR-based traceability recovery. Specifically, we quantify and utilize the “closeness” for each call and data dependency between two classes to improve rankings of traceability candidate lists. An empirical evaluation based on three real-world systems suggests that our approach outperforms three baseline approaches. @InProceedings{SANER17p68, author = {Hongyu Kuang and Jia Nie and Hao Hu and Patrick Rempel and Jian Lü and Alexander Egyed and Patrick Mäder}, title = {Analyzing Closeness of Code Dependencies for Improving IR-Based Traceability Recovery}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {68--78}, doi = {}, year = {2017}, } |
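The core move here is re-ranking an IR candidate list with a dependency-derived closeness weight. A hypothetical sketch of such a combination (the mixing parameter, score shapes, and linear blend are assumptions for illustration, not taken from the paper):

```python
def rerank(candidates, closeness, alpha=0.5):
    """Blend an IR textual-similarity score with a [0, 1] closeness weight.

    candidates: dict mapping class name -> textual similarity to the requirement.
    closeness:  dict mapping class name -> dependency-closeness weight.
    alpha is a hypothetical mixing parameter; classes absent from `closeness`
    get weight 0.0. Returns class names, best candidate first.
    """
    combined = {
        c: (1 - alpha) * sim + alpha * closeness.get(c, 0.0)
        for c, sim in candidates.items()
    }
    return sorted(combined, key=combined.get, reverse=True)

# A class with weaker textual similarity but strong code-dependency closeness
# can overtake a textually similar but structurally unrelated one:
order = rerank({"OrderService": 0.9, "OrderRepository": 0.5},
               {"OrderService": 0.0, "OrderRepository": 1.0})
```

The design choice worth noting is that the closeness term only adjusts rankings; the IR score still anchors the candidate list, matching the paper's framing of closeness as an improvement over pure IR.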
|
Rouvoy, Romain |
SANER '17: "Investigating the Energy Impact ..."
Investigating the Energy Impact of Android Smells
Antonin Carette, Mehdi Adel Ait Younes, Geoffrey Hecht, Naouel Moha, and Romain Rouvoy (Université du Québec à Montréal, Canada; Inria, France; University of Lille, France; IUF, France) Android code smells are bad implementation practices within Android applications (or apps) that may lead to poor software quality. These code smells are known to degrade the performance of apps and to have an impact on energy consumption. However, few studies have assessed the positive impact on energy consumption when correcting code smells. In this paper, we therefore propose a tooled and reproducible approach, called Hot-Pepper, to automatically correct code smells and evaluate their impact on energy consumption. Currently, Hot-Pepper is able to automatically correct three types of Android-specific code smells: Internal Getter/Setter, Member Ignoring Method, and HashMap Usage. Hot-Pepper derives four versions of the apps by correcting each detected smell independently, and all of them at once. Hot-Pepper is able to report on the energy consumption of each app version with a single user scenario test. Our empirical study on five open-source Android apps shows that correcting the three aforementioned Android code smells effectively and significantly reduces the energy consumption of apps. In particular, we observed a global reduction in energy consumption of 4.83% in one app when the three code smells are corrected. We also take advantage of the flexibility of Hot-Pepper to investigate the impact of three picture smells (bad picture format, compression, and bitmap format) in sample apps. We observed that the usage of optimised JPG pictures with the Android default bitmap format is the most energy efficient combination in Android apps. We believe that developers can benefit from our approach and results to guide their refactoring, and thus improve the energy consumption of their mobile apps. 
@InProceedings{SANER17p115, author = {Antonin Carette and Mehdi Adel Ait Younes and Geoffrey Hecht and Naouel Moha and Romain Rouvoy}, title = {Investigating the Energy Impact of Android Smells}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {115--126}, doi = {}, year = {2017}, } |
|
Roy, Chanchal K. |
SANER '17: "STRICT: Information Retrieval ..."
STRICT: Information Retrieval Based Search Term Identification for Concept Location
Mohammad Masudur Rahman and Chanchal K. Roy (University of Saskatchewan, Canada) During maintenance, software developers deal with numerous change requests that are written in an unstructured fashion using natural language. Such natural language texts illustrate the change requirement involving various domain related concepts. Software developers need to find appropriate search terms from those concepts so that they could locate the possible locations in the source code using a search technique. Once such locations are identified, they can implement the requested changes there. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose a novel technique--STRICT--that automatically identifies suitable search terms for a software change task by analyzing its task description using two information retrieval (IR) techniques--TextRank and POSRank. These IR techniques determine a term's importance based on not only its co-occurrences with other important terms but also its syntactic relationships with them. Experiments using 1,939 change requests from eight subject systems report that STRICT can identify better quality search terms than baseline terms from 52%--62% of the requests with 30%--57% Top-10 retrieval accuracy, which is promising. Comparison with two state-of-the-art techniques not only validates our empirical findings but also demonstrates the superiority of our technique. @InProceedings{SANER17p79, author = {Mohammad Masudur Rahman and Chanchal K. Roy}, title = {STRICT: Information Retrieval Based Search Term Identification for Concept Location}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {79--90}, doi = {}, year = {2017}, } Info |
|
Roy, Sohon |
SANER '17: "Spreadsheet Testing in Practice ..."
Spreadsheet Testing in Practice
Sohon Roy, Felienne Hermans, and Arie van Deursen (Delft University of Technology, Netherlands) Despite being popular end-user tools, spreadsheets suffer from the vulnerability of error-proneness. In software engineering, testing has been proposed as a way to address errors. It is therefore important to know whether spreadsheet users also test, how they test, and to what extent, especially since most spreadsheet users do not have training or experience in software engineering principles. Towards this end, we conduct a two-phase mixed methods study. First, a qualitative phase, in which we interview 12 spreadsheet users, and second, a quantitative phase, in which we conduct an online survey completed by 72 users. The outcome of the interviews, organized into four different categories, consists of an overview of test practices, perceptions of spreadsheet users about testing, a set of preventive measures for avoiding errors, and an overview of maintenance practices for ensuring correctness of spreadsheets over time. The survey adds to the findings by providing quantitative estimates indicating that ensuring correctness is an important concern, and a major fraction of users do test their spreadsheets. However, their techniques are largely manual and lack formalism. Tools and automated support are rarely used. @InProceedings{SANER17p338, author = {Sohon Roy and Felienne Hermans and Arie van Deursen}, title = {Spreadsheet Testing in Practice}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {338--348}, doi = {}, year = {2017}, } |
|
Saake, Gunter |
SANER '17: "Variant-Preserving Refactorings ..."
Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line
Wolfram Fenske, Jens Meinicke, Sandro Schulze, Steffen Schulze, and Gunter Saake (University of Magdeburg, Germany; Carnegie Mellon University, USA) A common and simple way to create custom product variants is to copy and adapt existing software (a.k.a. the clone-and-own approach). Clone-and-own promises low initial costs for creating a new variant as existing code is easily reused. However, clone-and-own also comes with major drawbacks for maintenance and evolution since changes, such as bug fixes, need to be synchronized among several product variants. Software product lines (SPLs) provide solutions to these problems because commonalities are implemented only once. Thus, in an SPL, changes also need to be applied only once. Therefore, the migration of cloned product variants to an SPL would be beneficial. The main tasks of migration are the identification and extraction of commonalities from existing products. However, these tasks are challenging and currently not well-supported. In this paper, we propose a step-wise and semi-automated process to migrate cloned product variants to a feature-oriented SPL. Our process relies on clone detection to identify code that is common to multiple variants and novel, variant-preserving refactorings to extract such common code. We evaluated our approach on five cloned product variants, reducing code clones by 25%. Moreover, we provide qualitative insights into possible limitations and potentials for removing even more redundant code. We argue that our approach can effectively decrease synchronization effort compared to clone-and-own development and thus reduce the long-term costs for maintenance and evolution. @InProceedings{SANER17p316, author = {Wolfram Fenske and Jens Meinicke and Sandro Schulze and Steffen Schulze and Gunter Saake}, title = {Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {316--326}, doi = {}, year = {2017}, } |
|
Saboury, Amir |
SANER '17: "An Empirical Study of Code ..."
An Empirical Study of Code Smells in JavaScript Projects
Amir Saboury, Pooya Musavi, Foutse Khomh, and Giuliano Antoniol (Polytechnique Montréal, Canada) JavaScript is a powerful scripting programming language that has gained a lot of attention this past decade. Initially used exclusively for client-side web development, it has evolved to become one of the most popular programming languages, with developers now using it for both client-side and server-side application development. Similar to applications written in other programming languages, JavaScript applications contain code smells, which are poor design choices that can negatively impact the quality of an application. In this paper, we investigate code smells in JavaScript server-side applications with the aim of understanding how they impact the fault-proneness of applications. We detect 12 types of code smells in 537 releases of five popular JavaScript applications (i.e., express, grunt, bower, less.js, and request) and perform survival analysis, comparing the time until a fault occurrence, in files containing code smells and files without code smells. Results show that (1) on average, files without code smells have hazard rates 65% lower than files with code smells. (2) Among the studied smells, “Variable Re-assign” and “Assignment In Conditional statements” code smells have the highest hazard rates. Additionally, we conduct a survey with 1,484 JavaScript developers, to understand the perception of developers towards our studied code smells. We found that developers consider “Nested Callbacks”, “Variable Re-assign” and “Long Parameter List” code smells to be serious design problems that hinder the maintainability and reliability of applications. This assessment is in line with the findings of our quantitative analysis. Overall, code smells negatively affect the quality of JavaScript applications, and developers should consider tracking and removing them early on, before the release of applications to the public. 
@InProceedings{SANER17p294, author = {Amir Saboury and Pooya Musavi and Foutse Khomh and Giuliano Antoniol}, title = {An Empirical Study of Code Smells in JavaScript Projects}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {294--305}, doi = {}, year = {2017}, } |
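The 65% figure above comes from survival analysis of time-to-fault in smelly versus clean files. As a back-of-the-envelope illustration of the quantity being compared, one can contrast crude constant hazard rates (fault events per unit of observed time at risk); the counts below are hypothetical, and the paper fits proper survival models rather than this simplification:

```python
def hazard_rate(events, time_at_risk):
    """Crude constant hazard: fault events per unit of observed time at risk."""
    return events / time_at_risk

# Hypothetical observations: (faults observed, total file-days at risk).
smelly = hazard_rate(30, 1000)   # files containing code smells
clean = hazard_rate(10, 1000)    # files without code smells

# Relative reduction in hazard for clean files vs. smelly files
# (the paper reports roughly a 65% lower hazard for clean files).
reduction = 1 - clean / smelly
```

With these made-up counts, clean files show about a 67% lower hazard, the same kind of statement the paper derives from its fitted survival curves.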
|
Santos, Gustavo |
SANER '17: "Recommending Source Code Locations ..."
Recommending Source Code Locations for System Specific Transformations
Gustavo Santos, Klérisson V. R. Paixão, Nicolas Anquetil, Anne Etien, Marcelo de Almeida Maia, and Stéphane Ducasse (University of Lille, France; CNRS, France; Inria, France; Federal University of Uberlândia, Brazil) From time to time, developers perform sequences of code transformations in a systematic and repetitive way. This may happen, for example, when introducing a design pattern in a legacy system: similar classes have to be introduced, containing similar methods that are called in a similar way. Automation of these sequences of transformations has been proposed in the literature to avoid errors due to their repetitive nature. However, developers still need support to identify all the relevant code locations that are candidate for transformation. Past research showed that these kinds of transformation can lag for years with forgotten instances popping out from time to time as other evolutions bring them into light. In this paper, we evaluate three distinct code search approaches (“structural”, Information Retrieval based, and AST based) to find code locations that would require similar transformations. We validate the resulting candidate locations from these approaches on real cases identified previously in literature. The results show that looking for code with similar roles, e.g., classes in the same hierarchy, provides interesting results with an average recall of 87% and, in some cases, precision of up to 70%. @InProceedings{SANER17p160, author = {Gustavo Santos and Klérisson V. R. Paixão and Nicolas Anquetil and Anne Etien and Marcelo de Almeida Maia and Stéphane Ducasse}, title = {Recommending Source Code Locations for System Specific Transformations}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {160--170}, doi = {}, year = {2017}, } |
|
Sawada, Naoya |
SANER '17: "HDSKG: Harvesting Domain Specific ..."
HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
Xuejiao Zhao, Zhenchang Xing, Muhammad Ashad Kabir, Naoya Sawada, Jing Li, and Shang-Wei Lin (Nanyang Technological University, Singapore; Australian National University, Australia; Charles Sturt University, Australia; NTT, Japan) Knowledge graphs are useful for many different domains like search result ranking, recommendation, exploratory search, etc. A knowledge graph integrates structural information of concepts across multiple information sources, and links these concepts together. The extraction of domain specific relation triples (subject, verb phrase, object) is one of the important techniques for domain specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain specific concepts and their relation triples from the content of webpages. We combine a dependency parser with a rule-based method to chunk the relation triple candidates; then we extract advanced features of these candidate relation triples to estimate the domain relevance by a machine learning algorithm. For the evaluation of our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7, respectively) are much higher than those of OpenIE (0.11 and 0.6, respectively). The performance advantage is particularly pronounced in the case of complex sentences. Furthermore, with the self-training technique we used in the classifier, HDSKG can be applied to other domains easily with less training data. 
@InProceedings{SANER17p56, author = {Xuejiao Zhao and Zhenchang Xing and Muhammad Ashad Kabir and Naoya Sawada and Jing Li and Shang-Wei Lin}, title = {HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {56--67}, doi = {}, year = {2017}, } Info |
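To give a flavour of rule-based relation-triple chunking: HDSKG itself combines a dependency parser with rules, whereas the heavily simplified sketch below only pattern-matches a few hard-coded verb phrases (the verb-phrase list and pattern are assumptions for illustration, not the paper's rules):

```python
import re

# Naive closed set of verb phrases; HDSKG discovers verb phrases from
# dependency parses rather than enumerating them like this.
VERB_PHRASES = ["is a", "depends on", "is used for", "extends"]

def chunk_triples(sentence):
    """Return a (subject, verb phrase, object) triple, or None if no
    known verb phrase splits the sentence."""
    for vp in VERB_PHRASES:
        m = re.match(
            rf"(.+?)\s+{re.escape(vp)}\s+(.+)",
            sentence.strip().rstrip("."),
        )
        if m:
            return (m.group(1), vp, m.group(2))
    return None
```

Downstream, each candidate triple would be scored for domain relevance by a classifier, which is where the paper's machine-learning step comes in.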
|
Schaufelbühl, Andreas |
SANER '17: "Analyzing Reviews and Code ..."
Analyzing Reviews and Code of Mobile Apps for Better Release Planning
Adelina Ciurumelea, Andreas Schaufelbühl, Sebastiano Panichella, and Harald C. Gall (University of Zurich, Switzerland) The mobile applications industry experiences an unprecedented high growth, developers working in this context face a fierce competition in acquiring and retaining users. They have to quickly implement new features and fix bugs, or risks losing their users to the competition. To achieve this goal they must closely monitor and analyze the user feedback they receive in form of reviews. However, successful apps can receive up to several thousands of reviews per day, manually analysing each of them is a time consuming task. To help developers deal with the large amount of available data, we manually analyzed the text of 1566 user reviews and defined a high and low level taxonomy containing mobile specific categories (e.g. performance, resources, battery, memory, etc.) highly relevant for developers during the planning of maintenance and evolution activities. Then we built the User Request Referencer (URR) prototype, using Machine Learning and Information Retrieval techniques, to automatically classify reviews according to our taxonomy and recommend for a particular review what are the source code files that need to be modified to handle the issue described in the user review. We evaluated our approach through an empirical study involving the reviews and code of 39 mobile applications. Our results show a high precision and recall of URR in organising reviews according to the defined taxonomy. @InProceedings{SANER17p91, author = {Adelina Ciurumelea and Andreas Schaufelbühl and Sebastiano Panichella and Harald C. Gall}, title = {Analyzing Reviews and Code of Mobile Apps for Better Release Planning}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {91--102}, doi = {}, year = {2017}, } |
|
Schulze, Sandro |
SANER '17: "Variant-Preserving Refactorings ..."
Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line
Wolfram Fenske, Jens Meinicke, Sandro Schulze, Steffen Schulze, and Gunter Saake (University of Magdeburg, Germany; Carnegie Mellon University, USA) A common and simple way to create custom product variants is to copy and adapt existing software (a.k.a. the clone-and-own approach). Clone-and-own promises low initial costs for creating a new variant as existing code is easily reused. However, clone-and-own also comes with major drawbacks for maintenance and evolution since changes, such as bug fixes, need to be synchronized among several product variants. Software product lines (SPLs) provide solutions to these problems because commonalities are implemented only once. Thus, in an SPL, changes also need to be applied only once. Therefore, the migration of cloned product variants to an SPL would be beneficial. The main tasks of migration are the identification and extraction of commonalities from existing products. However, these tasks are challenging and currently not well-supported. In this paper, we propose a step-wise and semi-automated process to migrate cloned product variants to a feature-oriented SPL. Our process relies on clone detection to identify code that is common to multiple variants and novel, variant-preserving refactorings to extract such common code. We evaluated our approach on five cloned product variants, reducing code clones by 25%. Moreover, we provide qualitative insights into possible limitations and potentials for removing even more redundant code. We argue that our approach can effectively decrease synchronization effort compared to clone-and-own development and thus reduce the long-term costs for maintenance and evolution. @InProceedings{SANER17p316, author = {Wolfram Fenske and Jens Meinicke and Sandro Schulze and Steffen Schulze and Gunter Saake}, title = {Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {316--326}, doi = {}, year = {2017}, } |
|
Schulze, Steffen |
SANER '17: "Variant-Preserving Refactorings ..."
Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line
Wolfram Fenske, Jens Meinicke, Sandro Schulze, Steffen Schulze, and Gunter Saake (University of Magdeburg, Germany; Carnegie Mellon University, USA) A common and simple way to create custom product variants is to copy and adapt existing software (a.k.a. the clone-and-own approach). Clone-and-own promises low initial costs for creating a new variant as existing code is easily reused. However, clone-and-own also comes with major drawbacks for maintenance and evolution since changes, such as bug fixes, need to be synchronized among several product variants. Software product lines (SPLs) provide solutions to these problems because commonalities are implemented only once. Thus, in an SPL, changes also need to be applied only once. Therefore, the migration of cloned product variants to an SPL would be beneficial. The main tasks of migration are the identification and extraction of commonalities from existing products. However, these tasks are challenging and currently not well-supported. In this paper, we propose a step-wise and semi-automated process to migrate cloned product variants to a feature-oriented SPL. Our process relies on clone detection to identify code that is common to multiple variants and novel, variant-preserving refactorings to extract such common code. We evaluated our approach on five cloned product variants, reducing code clones by 25%. Moreover, we provide qualitative insights into possible limitations and potentials for removing even more redundant code. We argue that our approach can effectively decrease synchronization effort compared to clone-and-own development and thus reduce the long-term costs for maintenance and evolution. @InProceedings{SANER17p316, author = {Wolfram Fenske and Jens Meinicke and Sandro Schulze and Steffen Schulze and Gunter Saake}, title = {Variant-Preserving Refactorings for Migrating Cloned Products to a Product Line}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {316--326}, doi = {}, year = {2017}, } |
|
Seinturier, Lionel |
SANER '17: "Dynamic Patch Generation for ..."
Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming
Thomas Durieux, Benoit Cornu, Lionel Seinturier, and Martin Monperrus (University of Lille, France; Inria, France) Null pointer exceptions (NPE) are the number one cause of uncaught crashing exceptions in production. In this paper, we aim at exploring the search space of possible patches for null pointer exceptions with metaprogramming. Our idea is to transform the program under repair with automated code transformation, so as to obtain a metaprogram. This metaprogram contains automatically injected hooks, that can be activated to emulate a null pointer exception patch. This enables us to perform a fine-grain analysis of the runtime context of null pointer exceptions. We set up an experiment with 16 real null pointer exceptions that have happened in the field. We compare the effectiveness of our metaprogramming approach against simple templates for repairing null pointer exceptions. @InProceedings{SANER17p349, author = {Thomas Durieux and Benoit Cornu and Lionel Seinturier and Martin Monperrus}, title = {Dynamic Patch Generation for Null Pointer Exceptions using Metaprogramming}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {349--358}, doi = {}, year = {2017}, } |
|
Serebrenik, Alexander |
SANER '17: "Code of Conduct in Open Source ..."
Code of Conduct in Open Source Projects
Parastou Tourani, Bram Adams, and Alexander Serebrenik (Polytechnique Montréal, Canada; Eindhoven University of Technology, Netherlands) Open source projects rely on collaboration of members from all around the world using web technologies like GitHub and Gerrit. This mixture of people with a wide range of backgrounds including minorities like women, ethnic minorities, and people with disabilities may increase the risk of offensive and destructive behaviours in the community, potentially leading affected project members to leave for a more welcoming and friendly environment. To counter these effects, open source projects are increasingly turning to codes of conduct, in an attempt to promote their expectations and standards of ethical behaviour. In this first-of-its-kind empirical study of codes of conduct in open source software projects, we investigated the role, scope and influence of codes of conduct through a mixture of quantitative and qualitative analysis, supported by interviews with practitioners. We found that the top codes of conduct are adopted by hundreds to thousands of projects, while all of them share five common dimensions. @InProceedings{SANER17p24, author = {Parastou Tourani and Bram Adams and Alexander Serebrenik}, title = {Code of Conduct in Open Source Projects}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {24--33}, doi = {}, year = {2017}, } |
|
Siegmund, Janet |
SANER '17: "Shorter Identifier Names Take ..."
Shorter Identifier Names Take Longer to Comprehend
Johannes Hofmeister, Janet Siegmund, and Daniel V. Holt (University of Passau, Germany; University of Heidelberg, Germany) Developers spend the majority of their time comprehending code, a process in which identifier names play a key role. Although many identifier naming styles exist, they often lack an empirical basis and it is not quite clear whether short or long identifier names facilitate comprehension. In this paper, we investigate the effect of different identifier naming styles (letters, abbreviations, words) on program comprehension, and whether these effects arise because of their length or their semantics. We conducted an experimental study with 72 professional C# developers, who looked for defects in source-code snippets. We used a within-subjects design, such that each developer saw all three versions of identifier naming styles and we measured the time it took them to find a defect. We found that words lead to, on average, 19% faster comprehension speed compared to letters and abbreviations, but we did not find a significant difference in speed between letters and abbreviations. The results of our study suggest that defects in code are more difficult to detect when code contains only letters and abbreviations. Words as identifier names facilitate program comprehension and can help to save costs and improve software quality. @InProceedings{SANER17p217, author = {Johannes Hofmeister and Janet Siegmund and Daniel V. Holt}, title = {Shorter Identifier Names Take Longer to Comprehend}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {217--227}, doi = {}, year = {2017}, } Info |
|
Spoor, Marten |
SANER '17: "The Dark Side of Event Sourcing: ..."
The Dark Side of Event Sourcing: Managing Data Conversion
Michiel Overeem, Marten Spoor, and Slinger Jansen (AFAS Software, Netherlands; Utrecht University, Netherlands) Evolving software systems includes data schema changes, and because of those schema changes data has to be converted. Converting data between two different schemas while continuing the operation of the system is a challenge when that system is expected to be always available. Data conversion in event sourced systems introduces new challenges, because of the relative novelty of the event sourcing architectural pattern, because of the lack of standardized tools for data conversion, and because of the large amount of data that is stored in typical event stores. This paper addresses the challenge of schema evolution and the resulting data conversion for event sourced systems. First, a set of event store upgrade operations is proposed that can be used to convert data between two versions of a data schema. Second, a set of techniques and strategies that execute the data conversion while continuing the operation of the system is discussed. The final contribution is an event store upgrade framework that identifies which techniques and strategies can be combined to execute the event store upgrade operations while continuing operation of the system. Two utilizations of the framework are given: the first is as decision support in the upfront design of an upgrade system for event sourced systems; the framework can also be utilized as the description of an automated upgrade system that can be used for continuous deployment. The event store upgrade framework is evaluated in interviews with three renowned experts in the domain and has been found to be a comprehensive overview that can be utilized in the design and implementation of an upgrade system. The automated upgrade system has been implemented partially and applied in experiments. 
@InProceedings{SANER17p193, author = {Michiel Overeem and Marten Spoor and Slinger Jansen}, title = {The Dark Side of Event Sourcing: Managing Data Conversion}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {193--204}, doi = {}, year = {2017}, } |
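The kind of event store upgrade operation discussed above can be illustrated with a lazy "upcasting" sketch. This is a hypothetical example (the event fields, version chain, and operation are invented), not the paper's framework:

```python
# One upgrade operation: convert stored events from schema v1 to v2 at read
# time, so old events can coexist with new ones while the system keeps running.

def upcast_v1_to_v2(event):
    """v2 split the 'name' field into 'first_name'/'last_name' (invented)."""
    first, _, last = event["name"].partition(" ")
    out = {k: v for k, v in event.items() if k != "name"}
    out.update({"first_name": first, "last_name": last, "version": 2})
    return out

UPCASTERS = {1: upcast_v1_to_v2}   # schema version -> upgrade operation

def read_event(stored):
    """Apply upgrade operations in a chain until the event is current."""
    event = dict(stored)
    while event["version"] in UPCASTERS:
        event = UPCASTERS[event["version"]](event)
    return event
```

Because conversion happens on read, the store itself need not be rewritten in one offline pass, which is one of the strategies for continuing operation during an upgrade.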
|
Stevens, Reinout |
SANER '17: "Extracting Executable Transformations ..."
Extracting Executable Transformations from Distilled Code Changes
Reinout Stevens and Coen De Roover (Vrije Universiteit Brussel, Belgium) Change distilling algorithms compute a sequence of fine-grained changes that, when executed in order, transform a given source AST into a given target AST. The resulting change sequences are used in the field of mining software repositories to study source code evolution. Unfortunately, detecting and specifying source code evolutions in such a change sequence is cumbersome. We therefore introduce a tool-supported approach that identifies minimal executable subsequences in a sequence of distilled changes that implement a particular evolution pattern, specified in terms of intermediate states of the AST that undergoes each change. This enables users to describe the effect of multiple changes, irrespective of their execution order, while ensuring that different change sequences that implement the same code evolution are recalled. Correspondingly, our evaluation is two-fold. Using examples, we demonstrate the expressiveness of specifying source code evolutions through intermediate ASTs. We also show that our approach is able to recall different implementation variants of the same source code evolution in open-source histories. @InProceedings{SANER17p171, author = {Reinout Stevens and Coen De Roover}, title = {Extracting Executable Transformations from Distilled Code Changes}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {171--181}, doi = {}, year = {2017}, } |
|
Sun, Jianling |
SANER '17: "Detecting Similar Repositories ..."
Detecting Similar Repositories on GitHub
Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, and Jianling Sun (Zhejiang University, China; Singapore Management University, Singapore; University of California at Berkeley, USA) GitHub contains millions of repositories, many of which are similar to one another (i.e., they have similar source code or implement similar functionality). Finding similar repositories on GitHub can be helpful for software engineers, as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either make use of only a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) which are not considered in previous works. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar to one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach, CLAN, using one thousand Java repositories on GitHub. 
Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision, and confidence than CLAN. @InProceedings{SANER17p13, author = {Yun Zhang and David Lo and Pavneet Singh Kochhar and Xin Xia and Quanlai Li and Jianling Sun}, title = {Detecting Similar Repositories on GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {13--23}, doi = {}, year = {2017}, } Info |
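The relevance scores can be illustrated with a small sketch: readme-based relevance as bag-of-words cosine similarity and stargazer-based relevance as Jaccard overlap. The time-based score is omitted here, and the simple averaging used to integrate the scores is an assumption, not necessarily the paper's formula:

```python
import math
from collections import Counter

def readme_relevance(a_text, b_text):
    """Cosine similarity of bag-of-words readme vectors (illustrative)."""
    a, b = Counter(a_text.lower().split()), Counter(b_text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def stargazer_relevance(a_stars, b_stars):
    """Jaccard overlap of the two repositories' stargazer sets."""
    a, b = set(a_stars), set(b_stars)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined(a, b):
    """Integrate the scores; a plain average here (an assumption)."""
    return (readme_relevance(a["readme"], b["readme"])
            + stargazer_relevance(a["stars"], b["stars"])) / 2
```

Two repositories with near-identical readmes and heavily overlapping stargazer sets would thus score close to 1, matching the intuition behind the heuristics.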
|
Tang, Yutian |
SANER '17: "StiCProb: A Novel Feature ..."
StiCProb: A Novel Feature Mining Approach using Conditional Probability
Yutian Tang and Hareton Leung (Hong Kong Polytechnic University, China) Software Product Line Engineering is a key approach to constructing applications with systematic reuse of architecture, documents, and other relevant components. To migrate legacy software into a product line system, it is essential to identify the code segments in the source base that should be constructed as features. However, this can be an error-prone and complicated task, as it involves exploring a complex structure and extracting the relations between different components within a system. Representing the structural information of a program in a mathematical way is a promising direction to investigate. We improve this situation by proposing a probability-based approach named StiCProb to capture the source code fragments of the feature concerned, which inherently provides a conditional probability to describe the closeness between two programming elements. In the case study, we conduct feature mining on several legacy systems to compare our approach with other related approaches. As demonstrated in our experiment, our approach can successfully support developers in locating features within legacy systems, with a performance of 83% precision and 41% recall. @InProceedings{SANER17p45, author = {Yutian Tang and Hareton Leung}, title = {StiCProb: A Novel Feature Mining Approach using Conditional Probability}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {45--55}, doi = {}, year = {2017}, } Info |
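The closeness measure described above can be sketched as a plain co-occurrence estimate of conditional probability. The estimator below is an assumption for illustration, not StiCProb's actual formulation:

```python
# Estimate P(a | b): how often programming element `a` appears in a code unit
# (e.g. a method or class), given that element `b` appears in it.

def conditional_prob(units, a, b):
    """units: iterable of sets of element names; returns P(a | b)."""
    with_b = [u for u in units if b in u]
    if not with_b:
        return 0.0
    return sum(1 for u in with_b if a in u) / len(with_b)
```

A high `conditional_prob(units, a, b)` suggests `a` belongs to the same feature as `b`, which is the kind of signal a feature-mining pass can rank candidates by.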
|
Tiella, Roberto |
SANER '17: "Automatic Generation of Opaque ..."
Automatic Generation of Opaque Constants Based on the K-Clique Problem for Resilient Data Obfuscation
Roberto Tiella and Mariano Ceccato (Fondazione Bruno Kessler, Italy) Data obfuscations are program transformations used to complicate program understanding and conceal actual values of program variables. The possibility to hide constant values is a basic building block of several obfuscation techniques. For example, in XOR Masking a constant mask is used to encode data, but this mask must be hidden too, in order to keep the obfuscation resilient to attacks. In this paper, we present a novel technique based on the k-clique problem, which is known to be NP-complete, to generate opaque constants, i.e. values that are difficult to guess by static analysis. In our experimental assessment we show that our opaque constants are computationally cheap to generate, both at obfuscation time and at runtime. Moreover, due to the NP-completeness of the k-clique problem, our opaque constants can be proven to be hard to attack with state-of-the-art static analysis tools. @InProceedings{SANER17p182, author = {Roberto Tiella and Mariano Ceccato}, title = {Automatic Generation of Opaque Constants Based on the K-Clique Problem for Resilient Data Obfuscation}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {182--192}, doi = {}, year = {2017}, } |
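A toy round trip conveys the flavor of the scheme: plant a k-clique in a random graph at obfuscation time and derive the constant from the clique at runtime. The vertex-folding derivation is invented, and the sketch passes the clique in directly, so it only demonstrates the encode/decode step; the resilience of the real technique rests on a static attacker having to solve the NP-complete k-clique problem:

```python
import random

def plant_clique(n, k, seed):
    """Random graph (edge set of pairs (i, j), i < j) with a planted k-clique."""
    rng = random.Random(seed)
    clique = rng.sample(range(n), k)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 0.3:          # background noise edges
                edges.add((i, j))
    for x in clique:                        # force the planted clique
        for y in clique:
            if x < y:
                edges.add((x, y))
    return edges, sorted(clique)

def fold(vertices):
    """Fold clique vertices into a number (an invented derivation)."""
    value = 0
    for v in vertices:
        value = value * 31 + v
    return value

def opaque_constant(edges, clique, mask):
    """Runtime side: verify the witness really is a clique, then unmask."""
    for x in clique:
        for y in clique:
            if x < y:
                assert (x, y) in edges, "not a clique"
    return fold(clique) ^ mask
```

At obfuscation time one would compute `mask = fold(clique) ^ secret` and embed the graph, the (scattered) clique vertices, and the mask in the program, so the literal `secret` never appears in the binary.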
|
Tourani, Parastou |
SANER '17: "Code of Conduct in Open Source ..."
Code of Conduct in Open Source Projects
Parastou Tourani, Bram Adams, and Alexander Serebrenik (Polytechnique Montréal, Canada; Eindhoven University of Technology, Netherlands) Open source projects rely on collaboration of members from all around the world using web technologies like GitHub and Gerrit. This mixture of people with a wide range of backgrounds including minorities like women, ethnic minorities, and people with disabilities may increase the risk of offensive and destroying behaviours in the community, potentially leading affected project members to leave towards a more welcoming and friendly environment. To counter these effects, open source projects increasingly are turning to codes of conduct, in an attempt to promote their expectations and standards of ethical behaviour. In this first of its kind empirical study of codes of conduct in open source software projects, we investigated the role, scope and influence of codes of conduct through a mixture of quantitative and qualitative analysis, supported by interviews with practitioners. We found that the top codes of conduct are adopted by hundreds to thousands of projects, while all of them share 5 common dimensions. @InProceedings{SANER17p24, author = {Parastou Tourani and Bram Adams and Alexander Serebrenik}, title = {Code of Conduct in Open Source Projects}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {24--33}, doi = {}, year = {2017}, } |
|
Tyszberowicz, Shmuel |
SANER '17: "Efficient Method Extraction ..."
Efficient Method Extraction for Automatic Elimination of Type-3 Clones
Ran Ettinger, Shmuel Tyszberowicz, and Shay Menaia (Ben-Gurion University of the Negev, Israel; Academic College of Tel Aviv-Yaffo, Israel) A semantics-preserving transformation by Komondoor and Horwitz has been shown to be most effective in the elimination of type-3 clones. The two original algorithms for realizing this transformation, however, are not as efficient as the related (slice-based) transformations. We present an asymptotically-faster algorithm that implements the same transformation via bidirectional reachability on a program dependence graph, and we prove its equivalence to the original formulation. @InProceedings{SANER17p327, author = {Ran Ettinger and Shmuel Tyszberowicz and Shay Menaia}, title = {Efficient Method Extraction for Automatic Elimination of Type-3 Clones}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {327--337}, doi = {}, year = {2017}, } |
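The core of the transformation's analysis step, bidirectional reachability over a program dependence graph, can be sketched with plain graph search (the edge-map encoding is hypothetical):

```python
# Collect every PDG node reachable from the seed statements, following
# dependence edges forward (succ) and backward (pred).

def reachable(edge_map, seeds):
    """Nodes reachable from `seeds` following `edge_map` (node -> neighbors)."""
    seen, work = set(seeds), list(seeds)
    while work:
        n = work.pop()
        for m in edge_map.get(n, ()):
            if m not in seen:
                seen.add(m)
                work.append(m)
    return seen

def bidirectional(succ, pred, seeds):
    """Union of forward and backward reachability from the seeds."""
    return reachable(succ, seeds) | reachable(pred, seeds)
```

Each reachability pass is linear in the size of the graph, which is what makes a reachability-based formulation asymptotically cheaper than repeatedly recomputing slices.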
|
Valente, Marco Tulio |
SANER '17: "Historical and Impact Analysis ..."
Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study
Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente (Federal University of Minas Gerais, Brazil) Change is routine in software development. Like any system, libraries also evolve over time. As a consequence, clients are compelled to update and, thus, benefit from the available API improvements. However, some of these API changes may break contracts previously established, resulting in compilation errors and behavioral changes. In this paper, we study a set of questions regarding API breaking changes. Our goal is to measure the amount of breaking changes in real-world libraries and their impact on clients at a large scale. We assess (i) the frequency of breaking changes, (ii) the behavior of these changes over time, (iii) the impact on clients, and (iv) the characteristics of libraries with a high frequency of breaking changes. Our large-scale analysis of 317 real-world Java libraries, 9K releases, and 260K client applications shows that (i) 14.78% of the API changes break compatibility with previous versions, (ii) the frequency of breaking changes increases over time, (iii) 2.54% of their clients are impacted, and (iv) systems with a higher frequency of breaking changes are larger, more popular, and more active. Based on these results, we provide a set of lessons to better support library and client developers in their maintenance tasks. @InProceedings{SANER17p138, author = {Laerte Xavier and Aline Brito and Andre Hora and Marco Tulio Valente}, title = {Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {138--147}, doi = {}, year = {2017}, } |
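One simple instance of a breaking-change check, flagging public signatures that disappear between releases, can be sketched as a set difference (the paper's catalogue of breaking change types is richer):

```python
# A removed or renamed public signature breaks every client compiled
# against the old release.

def breaking_changes(old_api, new_api):
    """Signatures present in the old release but missing from the new one."""
    return sorted(set(old_api) - set(new_api))
```

Signatures only present in `new_api` are additions and thus non-breaking, which is why the difference is taken in one direction only.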
|
Vijay-Shanker, K. |
SANER '17: "Automatically Generating Natural ..."
Automatically Generating Natural Language Descriptions for Object-Related Statement Sequences
Xiaoran Wang, Lori Pollock, and K. Vijay-Shanker (University of Delaware, USA) Current source code analyses driving software maintenance tools treat methods as either a single unit or a set of individual statements or words. They often leverage method names and any existing internal comments. However, internal comments are rare, and method names do not typically capture the method’s multiple high-level algorithmic steps that are too small to be a single method, but require more than one statement to implement. Previous work demonstrated feasibility of identifying high level actions automatically for loops; however, many high level actions remain unaddressed and undocumented, particularly sequences of consecutive statements that are associated with each other primarily by object references. We call these object-related action units. In this paper, we present an approach to automatically generate natural language descriptions of object-related action units within methods. We leverage the available, large source of high-quality open source projects to learn the templates of object-related actions, identify the statement that can represent the main action, and generate natural language descriptions for these actions. Our evaluation study of a set of 100 object-related statement sequences showed promise of our approach to automatically identify the action and arguments and generate natural language descriptions. @InProceedings{SANER17p205, author = {Xiaoran Wang and Lori Pollock and K. Vijay-Shanker}, title = {Automatically Generating Natural Language Descriptions for Object-Related Statement Sequences}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {205--216}, doi = {}, year = {2017}, } |
|
Wang, Xiaoran |
SANER '17: "Automatically Generating Natural ..."
Automatically Generating Natural Language Descriptions for Object-Related Statement Sequences
Xiaoran Wang, Lori Pollock, and K. Vijay-Shanker (University of Delaware, USA) Current source code analyses driving software maintenance tools treat methods as either a single unit or a set of individual statements or words. They often leverage method names and any existing internal comments. However, internal comments are rare, and method names do not typically capture the method’s multiple high-level algorithmic steps that are too small to be a single method, but require more than one statement to implement. Previous work demonstrated feasibility of identifying high level actions automatically for loops; however, many high level actions remain unaddressed and undocumented, particularly sequences of consecutive statements that are associated with each other primarily by object references. We call these object-related action units. In this paper, we present an approach to automatically generate natural language descriptions of object-related action units within methods. We leverage the available, large source of high-quality open source projects to learn the templates of object-related actions, identify the statement that can represent the main action, and generate natural language descriptions for these actions. Our evaluation study of a set of 100 object-related statement sequences showed promise of our approach to automatically identify the action and arguments and generate natural language descriptions. @InProceedings{SANER17p205, author = {Xiaoran Wang and Lori Pollock and K. Vijay-Shanker}, title = {Automatically Generating Natural Language Descriptions for Object-Related Statement Sequences}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {205--216}, doi = {}, year = {2017}, } |
|
Wimmer, Manuel |
SANER '17: "Automated Generation of Consistency-Achieving ..."
Automated Generation of Consistency-Achieving Model Editors
Patrick Neubauer, Robert Bill, Tanja Mayerhofer, and Manuel Wimmer (Vienna University of Technology, Austria) The advances of domain-specific modeling languages (DSMLs) and their editors created with modern language workbenches have convinced domain experts to apply them as important and powerful means in their daily endeavors. Although such editors are proficient at retaining the syntactic correctness of models, they show major shortcomings in preserving the consistency of models with elaborate language-specific constraints, which requires language engineers to manually implement sophisticated editing capabilities. Consequently, there is a demand for automated procedures to support editor users in both comprehending and resolving consistency violations. In this paper, we present an approach to automate the generation of advanced editing support for DSMLs, offering automated validation, content-assist, and quick fix capabilities beyond those created by state-of-the-art language workbenches, which helps domain experts retain and achieve the consistency of models. For validation, we show potential error causes for violated constraints, instead of only the context in which constraints are violated. Our approach mitigates the state-space explosion problem by resolving constraint violations in a three-stage process that increases the neighborhood scope, seeking constraint repair solutions that are presented as quick fixes to the editor user. We illustrate and provide an initial evaluation of our approach based on an Xtext-based DSML for modeling service clusters. @InProceedings{SANER17p127, author = {Patrick Neubauer and Robert Bill and Tanja Mayerhofer and Manuel Wimmer}, title = {Automated Generation of Consistency-Achieving Model Editors}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {127--137}, doi = {}, year = {2017}, } Info |
|
Xavier, Laerte |
SANER '17: "Historical and Impact Analysis ..."
Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study
Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente (Federal University of Minas Gerais, Brazil) Change is routine in software development. Like any system, libraries also evolve over time. As a consequence, clients are compelled to update and, thus, benefit from the available API improvements. However, some of these API changes may break contracts previously established, resulting in compilation errors and behavioral changes. In this paper, we study a set of questions regarding API breaking changes. Our goal is to measure the amount of breaking changes in real-world libraries and their impact on clients at a large scale. We assess (i) the frequency of breaking changes, (ii) the behavior of these changes over time, (iii) the impact on clients, and (iv) the characteristics of libraries with a high frequency of breaking changes. Our large-scale analysis of 317 real-world Java libraries, 9K releases, and 260K client applications shows that (i) 14.78% of the API changes break compatibility with previous versions, (ii) the frequency of breaking changes increases over time, (iii) 2.54% of their clients are impacted, and (iv) systems with a higher frequency of breaking changes are larger, more popular, and more active. Based on these results, we provide a set of lessons to better support library and client developers in their maintenance tasks. @InProceedings{SANER17p138, author = {Laerte Xavier and Aline Brito and Andre Hora and Marco Tulio Valente}, title = {Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {138--147}, doi = {}, year = {2017}, } |
|
Xia, Xin |
SANER '17: "Detecting Similar Repositories ..."
Detecting Similar Repositories on GitHub
Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, and Jianling Sun (Zhejiang University, China; Singapore Management University, Singapore; University of California at Berkeley, USA) GitHub contains millions of repositories, many of which are similar to one another (i.e., they have similar source code or implement similar functionality). Finding similar repositories on GitHub can be helpful for software engineers, as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either make use of only a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) which are not considered in previous works. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar to one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach, CLAN, using one thousand Java repositories on GitHub. 
Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision, and confidence than CLAN. @InProceedings{SANER17p13, author = {Yun Zhang and David Lo and Pavneet Singh Kochhar and Xin Xia and Quanlai Li and Jianling Sun}, title = {Detecting Similar Repositories on GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {13--23}, doi = {}, year = {2017}, } Info |
|
Xing, Zhenchang |
SANER '17: "HDSKG: Harvesting Domain Specific ..."
HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
Xuejiao Zhao, Zhenchang Xing, Muhammad Ashad Kabir, Naoya Sawada, Jing Li, and Shang-Wei Lin (Nanyang Technological University, Singapore; Australian National University, Australia; Charles Sturt University, Australia; NTT, Japan) Knowledge graphs are useful for many different domains like search result ranking, recommendation, exploratory search, etc. A knowledge graph integrates structural information of concepts across multiple information sources and links these concepts together. The extraction of domain-specific relation triples (subject, verb phrase, object) is one of the important techniques for domain-specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain-specific concepts and their relation triples from the content of webpages. We incorporate a dependency parser with a rule-based method to chunk the relation triple candidates, then we extract advanced features of these candidate relation triples to estimate their domain relevance with a machine learning algorithm. For the evaluation of our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of the software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7, respectively) are much higher than those of OpenIE (0.11 and 0.6, respectively). The performance is particularly strong for complex sentences. Furthermore, with the self-training technique used in the classifier, HDSKG can easily be applied to other domains with less training data. 
@InProceedings{SANER17p56, author = {Xuejiao Zhao and Zhenchang Xing and Muhammad Ashad Kabir and Naoya Sawada and Jing Li and Shang-Wei Lin}, title = {HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {56--67}, doi = {}, year = {2017}, } Info |
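Rule-based triple chunking can be conveyed with a toy sketch that splits a simple sentence around a known verb phrase. The verb-phrase list is invented and HDSKG itself relies on a dependency parser, so this only illustrates the (subject, verb phrase, object) shape of the candidates:

```python
# Naive chunker: find a known verb phrase in a simple sentence and split
# the sentence into a (subject, verb phrase, object) candidate triple.

VERB_PHRASES = ["is used for", "depends on", "is part of"]  # invented list

def chunk_triple(sentence):
    """Return a candidate triple, or None if no known verb phrase matches."""
    s = sentence.rstrip(".").lower()
    for vp in VERB_PHRASES:
        if f" {vp} " in s:
            subj, obj = s.split(f" {vp} ", 1)
            return (subj.strip(), vp, obj.strip())
    return None
```

In the full pipeline, such candidates would then be scored for domain relevance by a classifier rather than accepted outright.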
|
Xu, Baowen |
SANER '17: "An Empirical Investigation ..."
An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults
Yiyang Feng, Wanwangying Ma, Yibiao Yang, Hongmin Lu, Yuming Zhou, and Baowen Xu (Nanjing University, China) In recent years, it has been shown that fault prediction models can effectively guide test effort allocation in finding faults if they have a high enough fault prediction accuracy (Norm(Popt) > 0.78). However, it is often difficult to achieve such a high fault prediction accuracy in practice. As a result, fault-prediction-model-guided allocation (FPA) methods may not be applicable in real development environments. To attack this problem, in this paper, we propose a new type of test effort allocation strategy: the reliability-growth-model-guided allocation (RGA) method. For a given project release V, RGA attempts to predict the optimal test effort allocation for V by learning the fault distribution information from the previous releases. Based on three open-source projects, we empirically investigate the cost-effectiveness of three test effort allocation strategies for finding faults: RGA, FPA, and the structural-complexity-guided allocation (SCA) method. The experimental results show that RGA delivers promising performance in finding faults when compared with SCA and FPA. @InProceedings{SANER17p371, author = {Yiyang Feng and Wanwangying Ma and Yibiao Yang and Hongmin Lu and Yuming Zhou and Baowen Xu}, title = {An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {371--381}, doi = {}, year = {2017}, } |
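The intuition of learning an allocation from past fault distributions can be sketched as effort proportional to historical fault counts per module. RGA fits reliability growth models to the history; proportional allocation is a simplifying assumption here:

```python
# Split a test budget across modules according to how many faults each
# module accumulated in previous releases.

def allocate(budget, history):
    """history: module -> past fault count; returns module -> effort share."""
    total = sum(history.values())
    if total == 0:                      # no history: fall back to even split
        share = budget / len(history)
        return {m: share for m in history}
    return {m: budget * f / total for m, f in history.items()}
```

Modules that were historically fault-prone thus receive proportionally more of the testing budget, which is the behavior a reliability-growth-guided strategy refines.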
|
Yang, Yibiao |
SANER '17: "An Empirical Investigation ..."
An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults
Yiyang Feng, Wanwangying Ma, Yibiao Yang, Hongmin Lu, Yuming Zhou, and Baowen Xu (Nanjing University, China) In recent years, it has been shown that fault prediction models can effectively guide test effort allocation in finding faults if they have a high enough fault prediction accuracy (Norm(Popt) > 0.78). However, it is often difficult to achieve such a high fault prediction accuracy in practice. As a result, fault-prediction-model-guided allocation (FPA) methods may not be applicable in real development environments. To attack this problem, in this paper, we propose a new type of test effort allocation strategy: the reliability-growth-model-guided allocation (RGA) method. For a given project release V, RGA attempts to predict the optimal test effort allocation for V by learning the fault distribution information from the previous releases. Based on three open-source projects, we empirically investigate the cost-effectiveness of three test effort allocation strategies for finding faults: RGA, FPA, and the structural-complexity-guided allocation (SCA) method. The experimental results show that RGA delivers promising performance in finding faults when compared with SCA and FPA. @InProceedings{SANER17p371, author = {Yiyang Feng and Wanwangying Ma and Yibiao Yang and Hongmin Lu and Yuming Zhou and Baowen Xu}, title = {An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {371--381}, doi = {}, year = {2017}, } |
|
Yang, Zijiang |
SANER '17: "Scalable Tag Recommendation ..."
Scalable Tag Recommendation for Software Information Sites
Pingyi Zhou, Jin Liu, Zijiang Yang, and Guangyou Zhou (Wuhan University, China; Western Michigan University, USA; Central China Normal University, China) Software developers can search, share, and learn development experience, solutions, bug fixes, and open source projects on software information sites such as StackOverflow and Freecode. Many software information sites rely on tags to classify their contents, i.e., software objects, in order to improve the performance and accuracy of various operations on the sites. The quality of tags thus has a significant impact on the usefulness of these sites. High-quality tags are expected to be concise and to describe the most important features of the software objects. Unfortunately, tagging is inherently an uncoordinated process. The choice of tags made by individual software developers depends not only on a developer's understanding of the software object but also on the developer's English skills and preferences. As a result, the number of different tags grows rapidly with the continuous addition of software objects. With thousands of different tags, many of which introduce noise, software objects become poorly classified. This phenomenon negatively affects the speed and accuracy of developers' queries. In this paper, we propose a tool called TagMulRec to automatically recommend tags and classify software objects in evolving large-scale software information sites. Given a new software object, TagMulRec locates the software objects that are semantically similar to the new one and exploits their tags. We have evaluated TagMulRec on four software information sites: StackOverflow, AskUbuntu, AskDifferent, and Freecode. According to our empirical study, TagMulRec is not only accurate but also scalable, able to handle a large-scale software information site with millions of software objects and thousands of tags. 
@InProceedings{SANER17p272, author = {Pingyi Zhou and Jin Liu and Zijiang Yang and Guangyou Zhou}, title = {Scalable Tag Recommendation for Software Information Sites}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {272--282}, doi = {}, year = {2017}, } |
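The abstract describes TagMulRec only at a high level: locate software objects semantically similar to a new one and reuse their tags. A minimal sketch of that general idea (nearest neighbours voting with their tags), assuming a toy bag-of-words cosine similarity; the function names, corpus, and similarity measure are illustrative placeholders, not TagMulRec's actual implementation:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend_tags(query_text, corpus, k=2, n_tags=3):
    """Recommend tags for a new software object by letting the k most
    textually similar existing objects vote with their tags."""
    q = Counter(query_text.lower().split())
    scored = sorted(corpus,
                    key=lambda obj: cosine(q, Counter(obj["text"].lower().split())),
                    reverse=True)
    votes = Counter()
    for obj in scored[:k]:
        votes.update(obj["tags"])
    return [tag for tag, _ in votes.most_common(n_tags)]

# Hypothetical mini-corpus of already-tagged software objects.
corpus = [
    {"text": "parse json string in python", "tags": ["python", "json"]},
    {"text": "read json file with python requests", "tags": ["python", "json", "http"]},
    {"text": "segfault in c pointer arithmetic", "tags": ["c", "pointers"]},
]
print(recommend_tags("how to parse a json response in python", corpus))
```

A real system at StackOverflow scale would need an index (e.g., inverted index or approximate nearest neighbours) rather than a full scan, which is where the paper's scalability claim comes in.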
|
Younes, Mehdi Adel Ait |
SANER '17: "Investigating the Energy Impact ..."
Investigating the Energy Impact of Android Smells
Antonin Carette, Mehdi Adel Ait Younes, Geoffrey Hecht, Naouel Moha, and Romain Rouvoy (Université du Québec à Montréal, Canada; Inria, France; University of Lille, France; IUF, France) Android code smells are bad implementation practices within Android applications (or apps) that may lead to poor software quality. These code smells are known to degrade the performance of apps and to have an impact on energy consumption. However, few studies have assessed the positive impact on energy consumption of correcting code smells. In this paper, we therefore propose a tooled and reproducible approach, called Hot-Pepper, to automatically correct code smells and evaluate their impact on energy consumption. Currently, Hot-Pepper is able to automatically correct three types of Android-specific code smells: Internal Getter/Setter, Member Ignoring Method, and HashMap Usage. Hot-Pepper derives four versions of an app by correcting each detected smell independently, and all of them at once. Hot-Pepper is able to report on the energy consumption of each app version with a single user scenario test. Our empirical study on five open-source Android apps shows that correcting the three aforementioned Android code smells effectively and significantly reduces the energy consumption of apps. In particular, we observed a global reduction in energy consumption of 4.83% in one app when the three code smells are corrected. We also take advantage of the flexibility of Hot-Pepper to investigate the impact of three picture smells (bad picture format, compression, and bitmap format) in sample apps. We observed that the use of optimised JPG pictures with the Android default bitmap format is the most energy-efficient combination in Android apps. We believe that developers can benefit from our approach and results to guide their refactoring, and thus improve the energy consumption of their mobile apps. 
@InProceedings{SANER17p115, author = {Antonin Carette and Mehdi Adel Ait Younes and Geoffrey Hecht and Naouel Moha and Romain Rouvoy}, title = {Investigating the Energy Impact of Android Smells}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {115--126}, doi = {}, year = {2017}, } |
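The comparison step of the approach (derive one app version per corrected smell plus one with all corrections, then report energy per version) reduces to comparing each variant against the baseline. A minimal sketch with entirely hypothetical measurements; the figures below are invented for illustration, not taken from the paper's experiments:

```python
def energy_reduction(baseline_mj: float, corrected_mj: float) -> float:
    """Percentage of energy saved by a corrected version relative to baseline."""
    return 100.0 * (baseline_mj - corrected_mj) / baseline_mj

# Hypothetical per-version measurements (mJ) for one user scenario test:
# the original app, one version per corrected smell, and all smells at once.
measurements = {
    "original": 1200.0,
    "fix_internal_getter_setter": 1180.0,
    "fix_member_ignoring_method": 1165.0,
    "fix_hashmap_usage": 1150.0,
    "fix_all": 1142.0,
}
base = measurements["original"]
for version, mj in measurements.items():
    if version != "original":
        print(f"{version}: {energy_reduction(base, mj):.2f}% saved")
```

Measuring each smell in isolation, as well as all at once, is what lets the study attribute savings to individual smells rather than to their combination only.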
|
Zaidman, Andy |
SANER '17: "Software-Based Energy Profiling ..."
Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?
Dario Di Nucci, Fabio Palomba, Antonio Prota, Annibale Panichella, Andy Zaidman, and Andrea De Lucia (University of Salerno, Italy; Delft University of Technology, Netherlands; University of Luxembourg, Luxembourg) Modeling the power profile of mobile applications is a crucial activity for identifying the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile. However, all these solutions present their own advantages and disadvantages. Hardware-based tools are highly precise, but at the same time their use is bound to the acquisition of costly hardware components. Model-based tools require the calibration of parameters needed to correctly create a model on a specific hardware device. Software-based approaches do not need any hardware components, but they rely on battery measurements and, thus, are hardware-assisted. These tools are cheaper and easier to use than hardware-based tools, but they are believed to be less precise. In this paper, we take a deeper look at the pros and cons of software-based solutions, investigating to what extent their measurements depart from those of hardware-based solutions. To this aim, we propose a software-based tool named PETrA, which we compare with the hardware-based Monsoon toolkit on 54 Android apps. The results show that PETrA performs similarly to Monsoon despite not using any sophisticated hardware components. In fact, in all the apps the mean relative error with respect to Monsoon is lower than 0.05. Moreover, for 95% of the analyzed methods the estimation error is within 5% of the actual values measured using the hardware-based toolkit. 
@InProceedings{SANER17p103, author = {Dario Di Nucci and Fabio Palomba and Antonio Prota and Annibale Panichella and Andy Zaidman and Andrea De Lucia}, title = {Software-Based Energy Profiling of Android Apps: Simple, Efficient and Reliable?}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {103--114}, doi = {}, year = {2017}, } |
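The two headline numbers of the evaluation (mean relative error below 0.05, and 95% of methods within 5% of the hardware reference) correspond to two simple metrics over paired measurements. A sketch of those metrics with hypothetical data; the sample values are invented, not the paper's measurements:

```python
def mean_relative_error(estimates, ground_truth):
    """Mean of |estimate - truth| / truth over paired measurements."""
    return sum(abs(e - g) / g for e, g in zip(estimates, ground_truth)) / len(estimates)

def within_tolerance(estimates, ground_truth, tol=0.05):
    """Fraction of estimates whose relative error is within `tol` of the truth."""
    return sum(abs(e - g) / g <= tol for e, g in zip(estimates, ground_truth)) / len(estimates)

# Hypothetical per-method energy estimates (software-based tool) versus a
# hardware-based reference, in millijoules.
software = [10.2, 5.1, 7.9, 3.3]
hardware = [10.0, 5.0, 8.0, 3.5]
print(mean_relative_error(software, hardware))  # average relative deviation
print(within_tolerance(software, hardware))     # share of methods within 5%
```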
|
Zhang, Yun |
SANER '17: "Detecting Similar Repositories ..."
Detecting Similar Repositories on GitHub
Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, and Jianling Sun (Zhejiang University, China; Singapore Management University, Singapore; University of California at Berkeley, USA) GitHub contains millions of repositories, many of which are similar to one another (i.e., they have similar source code or implement similar functionalities). Finding similar repositories on GitHub can be helpful for software engineers, as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either make use of only a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed based on three heuristics leveraging two data sources (i.e., GitHub stars and readme files) that are not considered in previous work. The three heuristics are: repositories whose readme files contain similar contents are likely to be similar to one another; repositories starred by users of similar interests are likely to be similar; and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to a prior state-of-the-art approach, CLAN, using one thousand Java repositories on GitHub. 
Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision, and confidence than CLAN. @InProceedings{SANER17p13, author = {Yun Zhang and David Lo and Pavneet Singh Kochhar and Xin Xia and Quanlai Li and Jianling Sun}, title = {Detecting Similar Repositories on GitHub}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {13--23}, doi = {}, year = {2017}, } |
|
Zhao, Xuejiao |
SANER '17: "HDSKG: Harvesting Domain Specific ..."
HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
Xuejiao Zhao, Zhenchang Xing, Muhammad Ashad Kabir, Naoya Sawada, Jing Li, and Shang-Wei Lin (Nanyang Technological University, Singapore; Australian National University, Australia; Charles Sturt University, Australia; NTT, Japan) Knowledge graphs are useful in many domains, such as search result ranking, recommendation, and exploratory search. A knowledge graph integrates structural information about concepts across multiple information sources and links these concepts together. The extraction of domain-specific relation triples (subject, verb phrase, object) is one of the important techniques for domain-specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain-specific concepts and their relation triples from the content of webpages. We incorporate a dependency parser with a rule-based method to chunk the relation triple candidates, then extract advanced features of these candidate relation triples to estimate their domain relevance with a machine learning algorithm. For the evaluation of our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of the software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.70, respectively) are much higher than those of OpenIE (0.11 and 0.60, respectively). HDSKG performs particularly well on complex sentences. Furthermore, with the self-training technique used in the classifier, HDSKG can be applied to other domains easily with less training data. 
@InProceedings{SANER17p56, author = {Xuejiao Zhao and Zhenchang Xing and Muhammad Ashad Kabir and Naoya Sawada and Jing Li and Shang-Wei Lin}, title = {HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {56--67}, doi = {}, year = {2017}, } |
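HDSKG chunks (subject, verb phrase, object) candidates using a dependency parser plus rules. As a much cruder stand-in that conveys the shape of the output, here is a toy rule-based chunker over POS-tagged tokens: it takes the noun phrase before the first verb phrase as subject, the verb phrase as relation, and the following noun phrase as object. The tag rules and the hand-tagged example sentence are illustrative assumptions, not the paper's method:

```python
def chunk_triple(tagged):
    """Toy rule-based chunker over (token, POS) pairs yielding one
    (subject, verb phrase, object) relation triple candidate."""
    def is_np(pos): return pos.startswith(("NN", "JJ"))
    def is_vp(pos): return pos.startswith(("VB", "IN"))
    subj, verb, obj, state = [], [], [], "subj"
    for tok, pos in tagged:
        if state == "subj":
            if is_np(pos): subj.append(tok)
            elif is_vp(pos) and subj: state, verb = "verb", [tok]
        elif state == "verb":
            if is_vp(pos): verb.append(tok)
            elif is_np(pos): state, obj = "obj", [tok]
        elif state == "obj":
            if is_np(pos): obj.append(tok)
            else: break
    return (" ".join(subj), " ".join(verb), " ".join(obj))

# "Maven is a build tool for Java" with hand-assigned POS tags.
sent = [("Maven", "NNP"), ("is", "VBZ"), ("a", "DT"),
        ("build", "NN"), ("tool", "NN"), ("for", "IN"), ("Java", "NNP")]
print(chunk_triple(sent))
```

In the actual pipeline, candidates like this are then scored for domain relevance by a classifier rather than accepted outright.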
|
Zhou, Guangyou |
SANER '17: "Scalable Tag Recommendation ..."
Scalable Tag Recommendation for Software Information Sites
Pingyi Zhou, Jin Liu, Zijiang Yang, and Guangyou Zhou (Wuhan University, China; Western Michigan University, USA; Central China Normal University, China) Software developers can search, share, and learn about development experience, solutions, bug fixes, and open source projects on software information sites such as StackOverflow and Freecode. Many software information sites rely on tags to classify their contents, i.e., software objects, in order to improve the performance and accuracy of various operations on the sites. The quality of tags thus has a significant impact on the usefulness of these sites. High-quality tags are expected to be concise and to describe the most important features of the software objects. Unfortunately, tagging is inherently an uncoordinated process. The choice of tags made by individual software developers depends not only on a developer's understanding of the software object but also on the developer's English skills and preferences. As a result, the number of different tags grows rapidly with the continuous addition of software objects. With thousands of different tags, many of which introduce noise, software objects become poorly classified. This phenomenon negatively affects the speed and accuracy of developers' queries. In this paper, we propose a tool called TagMulRec to automatically recommend tags and classify software objects in evolving large-scale software information sites. Given a new software object, TagMulRec locates the software objects that are semantically similar to the new one and exploits their tags. We have evaluated TagMulRec on four software information sites: StackOverflow, AskUbuntu, AskDifferent, and Freecode. According to our empirical study, TagMulRec is not only accurate but also scalable: it can handle a large-scale software information site with millions of software objects and thousands of tags. 
@InProceedings{SANER17p272, author = {Pingyi Zhou and Jin Liu and Zijiang Yang and Guangyou Zhou}, title = {Scalable Tag Recommendation for Software Information Sites}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {272--282}, doi = {}, year = {2017}, } |
|
Zhou, Pingyi |
SANER '17: "Scalable Tag Recommendation ..."
Scalable Tag Recommendation for Software Information Sites
Pingyi Zhou, Jin Liu, Zijiang Yang, and Guangyou Zhou (Wuhan University, China; Western Michigan University, USA; Central China Normal University, China) Software developers can search, share, and learn about development experience, solutions, bug fixes, and open source projects on software information sites such as StackOverflow and Freecode. Many software information sites rely on tags to classify their contents, i.e., software objects, in order to improve the performance and accuracy of various operations on the sites. The quality of tags thus has a significant impact on the usefulness of these sites. High-quality tags are expected to be concise and to describe the most important features of the software objects. Unfortunately, tagging is inherently an uncoordinated process. The choice of tags made by individual software developers depends not only on a developer's understanding of the software object but also on the developer's English skills and preferences. As a result, the number of different tags grows rapidly with the continuous addition of software objects. With thousands of different tags, many of which introduce noise, software objects become poorly classified. This phenomenon negatively affects the speed and accuracy of developers' queries. In this paper, we propose a tool called TagMulRec to automatically recommend tags and classify software objects in evolving large-scale software information sites. Given a new software object, TagMulRec locates the software objects that are semantically similar to the new one and exploits their tags. We have evaluated TagMulRec on four software information sites: StackOverflow, AskUbuntu, AskDifferent, and Freecode. According to our empirical study, TagMulRec is not only accurate but also scalable: it can handle a large-scale software information site with millions of software objects and thousands of tags. 
@InProceedings{SANER17p272, author = {Pingyi Zhou and Jin Liu and Zijiang Yang and Guangyou Zhou}, title = {Scalable Tag Recommendation for Software Information Sites}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {272--282}, doi = {}, year = {2017}, } |
|
Zhou, Yuming |
SANER '17: "An Empirical Investigation ..."
An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults
Yiyang Feng, Wanwangying Ma, Yibiao Yang, Hongmin Lu, Yuming Zhou, and Baowen Xu (Nanjing University, China) In recent years, it has been shown that fault prediction models can effectively guide test effort allocation in finding faults if they have a sufficiently high fault prediction accuracy (Norm(Popt) > 0.78). However, it is often difficult to achieve such a high fault prediction accuracy in practice. As a result, fault-prediction-model-guided allocation (FPA) methods may not be applicable in real development environments. To address this problem, in this paper we propose a new type of test effort allocation strategy: the reliability-growth-model-guided allocation (RGA) method. For a given project release V, RGA attempts to predict the optimal test effort allocation for V by learning the fault distribution information from the previous releases. Based on three open-source projects, we empirically investigate the cost-effectiveness of three test effort allocation strategies for finding faults: RGA, FPA, and the structural-complexity-guided allocation (SCA) method. The experimental results show that RGA performs promisingly in finding faults when compared with SCA and FPA. @InProceedings{SANER17p371, author = {Yiyang Feng and Wanwangying Ma and Yibiao Yang and Hongmin Lu and Yuming Zhou and Baowen Xu}, title = {An Empirical Investigation into the Cost-Effectiveness of Test Effort Allocation Strategies for Finding Faults}, booktitle = {Proc.\ SANER}, publisher = {IEEE}, pages = {371--381}, doi = {}, year = {2017}, } |
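The core idea of history-guided allocation, learning the fault distribution from previous releases to split the test budget for the next one, can be reduced to a proportional split. A minimal sketch under the assumption that effort is allocated proportionally to prior-release per-module fault counts; the module names, counts, and budget are hypothetical, and the actual RGA method fits reliability growth models rather than using raw proportions:

```python
def allocate_effort(prev_faults, total_hours):
    """Split a test budget across modules in proportion to the fault
    counts observed in the previous release."""
    total = sum(prev_faults.values())
    return {m: total_hours * n / total for m, n in prev_faults.items()}

# Hypothetical fault counts from release V-1, and a 100-hour budget for V.
prev = {"core": 30, "ui": 15, "io": 5}
print(allocate_effort(prev, 100.0))
```

A structural-complexity-guided baseline (SCA) would replace the fault counts with per-module complexity metrics in the same proportional split.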
124 authors