2018 IEEE 12th International Workshop on Software Clones (IWSC),
March 20, 2018,
Campobasso, Italy
Frontmatter
Message from the Chairs
Welcome to the 2018 International Workshop on Software Clones (IWSC). It is the 12th workshop in the series, which began at ICSM 2002 in Montreal. This year's workshop is co-located with the 25th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2018) in Campobasso, Italy.
Keynote
Large Scale Clone Detection, Analysis, and Benchmarking: An Evolutionary Perspective (Keynote)
Chanchal K. Roy
(University of Saskatchewan, Canada)
Copying a code fragment and then reusing it by pasting and adapting it (e.g., adding, modifying, or deleting statements) is a common practice in software development, and it results in a significant amount of duplicated code in software systems. Developers consider cloning one of the principled reengineering approaches and often clone intentionally for a variety of reasons, such as faster development, avoiding risk by reusing stable old code, or time pressure. On the other hand, duplicated code poses a number of threats to the maintenance of software systems. Clones are the #1 “bad smell” in Fowler’s refactoring list. Several recent studies, including studies of industrial systems, show that although in many cases clones are not really harmful, and can even be useful in some cases, they can also be detrimental to software maintenance. For example, reusing a fragment that contains unknown bugs may propagate those bugs. Likewise, a change in requirements that involves a cloned fragment may require changes to all fragments similar to it, multiplying the work to be done. Furthermore, inconsistent changes to cloned fragments during updates may lead to severe unexpected behaviour. Software clones are thus considered one of the major contributors to high software maintenance cost, which can be up to 80% of total software development cost. The era of Big Data has introduced new applications for clone detection. For example, clone detection has been used on large inter-project repositories to find similar mobile applications, to intelligently tag code snippets, and to identify code examples. The dual role of clones in software development and maintenance, along with these many emerging applications of clone detection, has led to a great many clone detection tools and analysis frameworks.
In this keynote talk, I will review the cloning literature to date. In particular, I will talk about our recent work on large-scale clone detection, the challenges in evaluating such clone detectors, and how we have overcome them, at least in part, with our BigCloneBench and Mutation Framework. I will then talk about recent advances in clone analysis and management, along with a vision for a comprehensive clone management system.
@InProceedings{IWSC18p1,
author = {Chanchal K. Roy},
title = {Large Scale Clone Detection, Analysis, and Benchmarking: An Evolutionary Perspective (Keynote)},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {1--1},
doi = {},
year = {2018},
}
Clone Analysis
Are There Functionally Similar Code Clones in Practice?
Verena Käfer, Stefan Wagner, and
Rainer Koschke
(University of Stuttgart, Germany; University of Bremen, Germany)
Having similar code fragments, also called clones, in software systems can lead to unnecessary comprehension, review, and change efforts. Syntactically similar clones are often encountered in practice. The same is not clear for clones that are only functionally similar (FSCs). We conducted an exploratory survey among developers to investigate two questions: whether they encounter functionally similar clones in practice, and whether their inclination to remove them differs from their inclination to remove syntactically similar clones. Of the 34 developers answering the survey, 31 have experienced FSCs in their professional work, and 24 have experienced problems caused by FSCs. We found no difference in the inclination and reasoning for removing FSCs and syntactically similar clones. FSCs exist in practice and should be investigated further. The goal is to bring clone detectors for FSCs to the same quality as those for syntactically similar clones, because being able to detect FSCs allows developers to manage and potentially remove them.
@InProceedings{IWSC18p2,
author = {Verena Käfer and Stefan Wagner and Rainer Koschke},
title = {Are There Functionally Similar Code Clones in Practice?},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {2--8},
doi = {},
year = {2018},
}
Structural Clones: An Evolution Perspective
Jaweria Kanwal, Hamid Abdul Basit, and Onaiza Maqbool
(Quaid-i-Azam University, Pakistan; Lahore University of Management Sciences, Pakistan)
Structural clones are recurring patterns of simple code clones in software that represent a bigger picture of similarity in software (e.g., software design). Elevating the analysis of cloning to the structural clone level supports better clone management in terms of clone understanding, maintenance, and evolution. In this paper, we propose a systematic approach to study structural clone evolution across software versions. We use our approach to analyze the evolutionary behavior of structural clones and compare it with the evolution of simple clones. We performed experiments on different versions of three Java systems. Our analysis of structural clone evolution reveals interesting evolutionary characteristics of clones. For example, one finding is that simple clones are changed more frequently than structural clones, whereas the average lifetime of structural clones is shorter than that of simple clones. Studying clone evolution helps to identify the maintenance implications of clones and to devise better clone management systems.
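The two evolution measures mentioned above are change frequency and average lifetime. A minimal Python sketch of how they might be computed follows. It assumes that clone classes have already been mapped across versions and carry a stable identifier. It also assumes that each version is summarised as a mapping from that identifier to a fingerprint of the cloned fragments. `track_histories`, `CloneHistory`, and `average_lifetime` are hypothetical names, not the authors' tooling.

```python
from dataclasses import dataclass

# Hypothetical input format: one dict per version, mapping a stable
# clone-class identifier to a fingerprint (e.g. a hash of its fragments).
VersionSnapshot = dict[str, str]


@dataclass
class CloneHistory:
    first_version: int
    last_version: int
    changes: int = 0  # transitions in which the fingerprint changed

    @property
    def lifetime(self) -> int:
        # Assumes a clone class, once removed, does not reappear later.
        return self.last_version - self.first_version + 1


def track_histories(versions: list[VersionSnapshot]) -> dict[str, CloneHistory]:
    """Record when each clone class appears, disappears, and changes."""
    histories: dict[str, CloneHistory] = {}
    previous: VersionSnapshot = {}
    for index, snapshot in enumerate(versions):
        for clone_id, fingerprint in snapshot.items():
            history = histories.get(clone_id)
            if history is None:
                histories[clone_id] = CloneHistory(index, index)
                continue
            history.last_version = index
            # Count a change only between consecutive versions containing the class.
            if clone_id in previous and previous[clone_id] != fingerprint:
                history.changes += 1
        previous = snapshot
    return histories


def average_lifetime(histories: dict[str, CloneHistory]) -> float:
    """Mean lifetime, in versions, over all tracked clone classes."""
    if not histories:
        return 0.0
    return sum(h.lifetime for h in histories.values()) / len(histories)
```

Run the sketch once on structural clone classes and once on simple clone classes. This gives the two sets of averages that such a comparison would contrast.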
@InProceedings{IWSC18p9,
author = {Jaweria Kanwal and Hamid Abdul Basit and Onaiza Maqbool},
title = {Structural Clones: An Evolution Perspective},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {9--15},
doi = {},
year = {2018},
}
Generated Code in Studies on Clone Rates
Rainer Koschke and Moritz Weinig
(University of Bremen, Germany)
Various earlier studies have measured clone rates for diverse
projects. One of the reasons for exceptionally high clone rates for
individual source files was found to be auto-generated
code. Automatically generated code is generally not maintained and,
hence, should be excluded from clone-rate measurements. This kind of
code might even introduce a bias to clone rates of projects when there
is a large amount of generated code and clone rates for generated
files generally deviate from the average clone rate for handwritten
code. While some generated files stuck out with clone rates above the
average in earlier studies, we do not know whether this is generally
the case and how much code is actually generated automatically.
This paper investigates the amount of generated files in projects,
whether clone rates for generated files really differ from handwritten
code, and, overall, whether generated code in fact introduces a bias
to clone rates. We heuristically detect generated files in a very
large open-source project corpus of programs written in C, C++, C#,
or Java and report the number of projects with generated code. For
these projects, we compare clone rates of generated and handwritten
files.
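This abstract does not spell out the detection heuristics. One common heuristic, shown here only as an assumption, is to look for marker comments such as “generated by” or “do not edit” near the top of a file. The marker list and the 20-line header window below are hypothetical choices.

```python
import re

# Hypothetical marker phrases often left by code generators in file headers.
GENERATED_MARKERS = re.compile(
    r"auto-?generated|generated by|do not (?:edit|modify)",
    re.IGNORECASE,
)


def looks_generated(source: str, header_lines: int = 20) -> bool:
    """Heuristically flag a file as generated if a marker occurs in its header."""
    header = "\n".join(source.splitlines()[:header_lines])
    return GENERATED_MARKERS.search(header) is not None


def split_files(files: dict[str, str]) -> tuple[list[str], list[str]]:
    """Partition file paths into (generated, handwritten)."""
    generated = [path for path, text in files.items() if looks_generated(text)]
    handwritten = [path for path in files if path not in generated]
    return generated, handwritten
```

Marker-based checks have two weaknesses. They miss generators that emit no comment, and they can flag handwritten files that merely mention code generation. The result is therefore an estimate.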
Our results show higher clone rates for generated files. Moreover,
when we aggregate clone rates from files to projects, the clone rates
of projects with at least one generated file are also slightly higher
than in projects for which no generated files were detected. Our
results suggest that researchers should indeed take special care to
exclude generated code in studies on clone rates.
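A minimal sketch of the comparison follows. It assumes that a clone rate is the fraction of lines covered by clones. It also assumes that file-level data are aggregated by summing line counts, i.e. a line-weighted average; averaging per-file rates would be an equally plausible alternative. `FileStats`, `clone_rate`, and `rates_by_origin` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    total_lines: int
    cloned_lines: int  # lines covered by at least one clone
    generated: bool


def clone_rate(files: list[FileStats]) -> float:
    """Line-weighted clone rate: cloned lines divided by all lines."""
    total = sum(f.total_lines for f in files)
    if total == 0:
        return 0.0
    return sum(f.cloned_lines for f in files) / total


def rates_by_origin(files: list[FileStats]) -> tuple[float, float]:
    """Return (generated rate, handwritten rate) for one project or corpus."""
    generated = [f for f in files if f.generated]
    handwritten = [f for f in files if not f.generated]
    return clone_rate(generated), clone_rate(handwritten)
```

To see how much generated files shift a project's rate, compare `clone_rate(project_files)` with `clone_rate([f for f in project_files if not f.generated])`.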
@InProceedings{IWSC18p16,
author = {Rainer Koschke and Moritz Weinig},
title = {Generated Code in Studies on Clone Rates},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {16--22},
doi = {},
year = {2018},
}
Cloning Applications: Code Generation and Software Quality Metrics
On the Characteristics of Buggy Code Clones: A Code Quality Perspective
Md. Rakibul Islam and Minhaz F. Zibran
(University of New Orleans, USA)
Code clones are an extensively studied code smell. Not all clones in a software system are equally harmful. Earlier work studied various traits of clones in comparison with non-cloned code, including their stability and their relationship with program faults. This paper presents a comparative study of the characteristics of buggy and non-buggy clones from a code quality perspective.
Using 29 code quality metrics, we study buggy and non-buggy clones in 2,077 revisions of three software systems written in Java. The findings from this work add to the characterization of buggy clones. Such a characterization will be useful for cost-effective clone management and clone-aware software development.
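The abstract does not name the statistics used to compare the two groups. As an illustrative assumption, a non-parametric effect size such as Cliff's delta can be computed per metric. It contrasts the metric values of buggy clones with those of non-buggy clones. Metric names and the `compare_metrics` helper are hypothetical.

```python
def cliffs_delta(xs: list[float], ys: list[float]) -> float:
    """Cliff's delta: P(x > y) - P(x < y) over all pairs, in [-1, 1]."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def compare_metrics(
    buggy: dict[str, list[float]], clean: dict[str, list[float]]
) -> dict[str, float]:
    """Effect size per metric; positive means buggy clones tend to score higher."""
    return {name: cliffs_delta(values, clean[name])
            for name, values in buggy.items() if name in clean}
```

The pairwise loop is quadratic, which is acceptable for per-metric samples of a few thousand clones.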
@InProceedings{IWSC18p23,
author = {Md. Rakibul Islam and Minhaz F. Zibran},
title = {On the Characteristics of Buggy Code Clones: A Code Quality Perspective},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {23--29},
doi = {},
year = {2018},
}
Towards Automated Generation of Java Methods: A Way of Automated Reuse-Based Programming
Kento Shimonaka, Yoshiki Higo, Junnosuke Matsumoto, Keigo Naitou, and Shinji Kusumoto
(Osaka University, Japan)
Automatic programming has been researched for a long time, and a variety of methodologies have been proposed. However, they have limited applicability, or they can generate only a few lines of code. In this research, the authors aim to generate the source code of Java methods based on their specifications. In this paper, we propose a reuse-based code generation technique that takes a method signature and test cases as input. First, our technique searches for existing Java methods whose signatures are the same as the one input by the user. Then, it reworks each of them by using the test cases input by the user. Methods that pass all the test cases are given to the user. So far, the authors have implemented a naive prototype and conducted experiments with four open source software systems. In total, our technique succeeded in generating 18 Java methods. In this paper, we also introduce some actual examples of generated Java methods and some ideas to enhance our technique.
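The search-and-filter loop described above can be sketched in Python, with type annotations standing in for Java method signatures. This is a hypothetical simplification. The repository is a plain list of functions, and tests are (arguments, expected result) pairs. The reworking step the authors apply to each candidate is omitted, so only candidates that already pass every test are returned.

```python
import inspect
from typing import Any, Callable

# One test case: positional arguments and the expected return value.
IOExample = tuple[tuple[Any, ...], Any]


def signature_key(func: Callable[..., Any]) -> tuple[tuple[Any, ...], Any]:
    """Parameter and return annotations, standing in for a Java method signature."""
    sig = inspect.signature(func)
    params = tuple(p.annotation for p in sig.parameters.values())
    return params, sig.return_annotation


def passes_all(func: Callable[..., Any], tests: list[IOExample]) -> bool:
    for args, expected in tests:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed test
    return True


def generate(target: tuple[tuple[Any, ...], Any],
             tests: list[IOExample],
             repository: list[Callable[..., Any]]) -> list[Callable[..., Any]]:
    """Return repository functions with the target signature that pass all tests."""
    candidates = [f for f in repository if signature_key(f) == target]
    return [f for f in candidates if passes_all(f, tests)]
```

Filtering by signature before running tests keeps the expensive step, test execution, limited to plausible candidates.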
@InProceedings{IWSC18p30,
author = {Kento Shimonaka and Yoshiki Higo and Junnosuke Matsumoto and Keigo Naitou and Shinji Kusumoto},
title = {Towards Automated Generation of Java Methods: A Way of Automated Reuse-Based Programming},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {30--36},
doi = {},
year = {2018},
}
Correlation Analysis between Code Clone Metrics and Project Data on the Same Specification Projects
Yoshiki Higo, Shinsuke Matsumoto, Shinji Kusumoto, Takashi Fujinami, and Takashi Hoshino
(Osaka University, Japan; NTT, Japan)
The presence of code clones has been pointed out as a factor that makes software maintenance more difficult. On the other hand, some studies have reported that only a small part of code clones requires simultaneous changes and that their negative influence on software maintenance is limited. Moreover, other studies have reported that code clones often have positive effects on software development. The authors are currently exploring the effect of clones on software development and maintenance. In this paper, the authors report their exploratory results on the relationship between clone metrics and project data, such as the number of test cases and the number of bugs found. The targets of this exploration are nine web-based software systems. Interestingly, all of them were developed based on the same specification; in other words, they are functionally identical software systems. By targeting such projects, we can explore how implementation differences affect software development. Our results indicate that unit, integration, and system testing become more difficult when many clones exist in a project.
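The abstract does not state which correlation coefficient was used. As an assumption, a rank-based coefficient such as Spearman's rho suits small samples like nine projects. The sketch below computes it as the Pearson correlation of average ranks, which handles ties. Inputs would be one clone metric and one project measure per system.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples of size >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))
```

With only nine data points, a coefficient alone says little about significance. A permutation test or a significance table would be needed before drawing conclusions.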
@InProceedings{IWSC18p37,
author = {Yoshiki Higo and Shinsuke Matsumoto and Shinji Kusumoto and Takashi Fujinami and Takashi Hoshino},
title = {Correlation Analysis between Code Clone Metrics and Project Data on the Same Specification Projects},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {37--43},
doi = {},
year = {2018},
}
Clone Detection Techniques and Clone Visualization
A Picture Is Worth a Thousand Words: Code Clone Detection Based on Image Similarity
Chaiyong Ragkhitwetsagul, Jens Krinke, and Bruno Marnette
(University College London, UK; Prodo, UK)
This paper introduces a new code clone detection technique based on image
similarity. The technique captures visual perception of code seen by humans in
an IDE by applying syntax highlighting and image conversion to raw source code
text. We compared two similarity measures, Jaccard and earth mover's distance
(EMD) for our image-based code clone detection technique. Jaccard similarity
offered better detection performance than EMD. The F1
score of our technique on detecting Java clones with pervasive code
modifications is comparable to that of five well-known code clone detectors: CCFinderX,
Deckard, iClones, NiCad, and Simian. A Gaussian blur filter is chosen as a normalisation
technique for type-2 and type-3 clones. We found that blurring code images
before similarity computation resulted in higher precision and recall. The
detection performance after including the blur filter increased by 1 to 6
percent. The manual investigation of clone pairs in three software systems
revealed that our technique, while it missed some of the true clones, could also
detect additional true clone pairs missed by NiCad.
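The Jaccard comparison can be sketched on already-rendered grayscale images. This is a simplified interpretation, assumed rather than taken from the paper. Each image is reduced to the set of “ink” pixels darker than a threshold. Similarity is then the size of the intersection divided by the size of the union. A 3x3 Gaussian kernel stands in for the blur filter, and rendering the syntax-highlighted code to an image is not shown.

```python
Image = list[list[float]]  # grayscale rows; 0 = black ink, 255 = white background

# 3x3 approximation of a Gaussian kernel (weights sum to 16).
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur(img: Image) -> Image:
    """Convolve with KERNEL, renormalising the weights at the borders."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, weight = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        k = KERNEL[dy + 1][dx + 1]
                        total += k * img[yy][xx]
                        weight += k
            out[y][x] = total / weight
    return out


def ink_pixels(img: Image, threshold: float = 128) -> set[tuple[int, int]]:
    """Coordinates of pixels darker than the threshold."""
    return {(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v < threshold}


def jaccard(a: Image, b: Image, blur: bool = False) -> float:
    """|A & B| / |A | B| over ink pixels; two blank images count as identical."""
    if blur:
        a, b = gaussian_blur(a), gaussian_blur(b)
    pa, pb = ink_pixels(a), ink_pixels(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0
```

The `blur` flag mirrors the normalisation step. With real renderings, the threshold would need tuning, since blurring spreads and lightens thin strokes.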
@InProceedings{IWSC18p44,
author = {Chaiyong Ragkhitwetsagul and Jens Krinke and Bruno Marnette},
title = {A Picture Is Worth a Thousand Words: Code Clone Detection Based on Image Similarity},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {44--50},
doi = {},
year = {2018},
}
Detecting Functionally Similar Code within the Same Project
Ryo Tajima, Masataka Nagura, and Shingo Takada
(Keio University, Japan; Nihon University, Japan)
Multiple developers often take part in a software
development project. Although these developers are collaborating
towards the development within the same project, each developer
creates code on their own. This may lead to duplicate or
similar code appearing in different parts of the software. Such
code should be removed to improve maintainability. This paper
proposes an approach to automatically detect such code, which
we shall call functionally similar code. The unit of detection is at
the method level, and we focus on input/output and the method
structure using program dependence graphs. We show the results
of applying our approach to open source software.
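The sketch below roughly illustrates the input/output part of such an approach. It pairs methods whose parameter and return annotations match and whose outputs agree on a shared set of sample inputs. Executing methods on sample inputs is an assumption made here for simplicity. The paper's own use of input/output information and its program dependence graph comparison are not reproduced.

```python
import inspect
from typing import Any, Callable


def io_signature(func: Callable[..., Any]) -> tuple[tuple[Any, ...], Any]:
    """Parameter and return annotations as a stand-in for input/output types."""
    sig = inspect.signature(func)
    return tuple(p.annotation for p in sig.parameters.values()), sig.return_annotation


def behaves_alike(f: Callable[..., Any], g: Callable[..., Any],
                  inputs: list[tuple[Any, ...]]) -> bool:
    """Same I/O types and equal outputs on every sample input."""
    if io_signature(f) != io_signature(g):
        return False
    for args in inputs:
        try:
            if f(*args) != g(*args):
                return False
        except Exception:
            return False  # simplification: any exception ends the comparison
    return True


def similar_pairs(methods: list[Callable[..., Any]],
                  inputs: list[tuple[Any, ...]]) -> list[tuple[str, str]]:
    """All unordered pairs of methods that behave alike on the inputs."""
    pairs = []
    for i, f in enumerate(methods):
        for g in methods[i + 1:]:
            if behaves_alike(f, g, inputs):
                pairs.append((f.__name__, g.__name__))
    return pairs
```

Agreement on a finite input set is evidence, not proof, of functional similarity. This is one reason a structural comparison such as the PDG-based one is useful alongside it.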
@InProceedings{IWSC18p51,
author = {Ryo Tajima and Masataka Nagura and Shingo Takada},
title = {Detecting Functionally Similar Code within the Same Project},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {51--57},
doi = {},
year = {2018},
}
Towards Slice-Based Semantic Clone Detection
Hakam W. Alomari and Matthew Stephan
(Miami University, USA)
This paper presents our proposed approach for detecting code clones based on similar slices across different versions of large software systems. We begin by presenting our initial thoughts on realizing slice-based clone detection. We then describe initial results obtained by means of scripts that identify clones at different levels of granularity. Clones between versions are represented as pairs of cloned slices. Our results include a case study of over 191 versions of the Linux kernel, spanning over 10 years. In the near future, we plan to experiment with established clone detectors to realize a complete and robust analysis approach.
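Slice-based comparison can be sketched with a toy dependence graph per version. In this hypothetical model, statements are numbered in source order, and `deps` maps a statement to the statements it depends on. A backward slice is the transitive closure from a slicing criterion. Two slices count as clones when their whitespace-normalised statement texts match. Real slicers and the authors' scripts are considerably more involved.

```python
# Hypothetical program model for one version: statement texts keyed by
# statement number (in source order) and data/control dependences.
Statements = dict[int, str]
Dependences = dict[int, set[int]]  # statement -> statements it depends on


def backward_slice(deps: Dependences, criterion: int) -> set[int]:
    """All statements the criterion transitively depends on, plus itself."""
    seen: set[int] = set()
    stack = [criterion]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, ()))
    return seen


def slice_text(statements: Statements, ids: set[int]) -> tuple[str, ...]:
    """Slice as whitespace-normalised statement texts in source order."""
    return tuple(" ".join(statements[i].split()) for i in sorted(ids))


def is_slice_clone(old: tuple[Statements, Dependences], old_criterion: int,
                   new: tuple[Statements, Dependences], new_criterion: int) -> bool:
    """Treat two slices from different versions as clones if their texts match."""
    old_slice = slice_text(old[0], backward_slice(old[1], old_criterion))
    new_slice = slice_text(new[0], backward_slice(new[1], new_criterion))
    return old_slice == new_slice
```

Because only statements reachable from the criterion are compared, unrelated edits elsewhere in a function do not prevent a match.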
@InProceedings{IWSC18p58,
author = {Hakam W. Alomari and Matthew Stephan},
title = {Towards Slice-Based Semantic Clone Detection},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {58--59},
doi = {},
year = {2018},
}
Code Difference Visualization by a Call Tree
Toshihiro Kamiya
(Shimane University, Japan)
Understanding modifications to a software product is essential in software maintenance. To help programmers understand modifications, especially how code changes during a refactoring, this paper presents a semi-automated dynamic analysis that compares two revisions of a product.
The approach detects “similar but different” sub-tree pairs between the call trees obtained from execution traces of the two revisions, and then draws a call graph of those pairs. In addition, pruning techniques or heuristics are used to make the graph smaller and easier to understand.
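The core comparison can be sketched on simple call trees. This is a hypothetical simplification in which children are aligned by position rather than by a sequence alignment. A pair is reported whenever two corresponding nodes call the same function but their subtrees differ. Pruning heuristics and graph drawing are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A call-tree node: the called function and the calls it made, in order."""
    name: str
    children: tuple["Call", ...] = ()


def similar_but_different(old: Call, new: Call) -> list[tuple[Call, Call]]:
    """Pairs of corresponding nodes that call the same function but whose subtrees differ."""
    if old.name != new.name or old == new:
        return []  # unrelated nodes, or identical subtrees: nothing to report
    pairs = [(old, new)]
    # Simplification: align children by position instead of by sequence alignment.
    for old_child, new_child in zip(old.children, new.children):
        pairs.extend(similar_but_different(old_child, new_child))
    return pairs
```

Each reported pair's parent pair gives an edge. This is one way to assemble the call graph of pairs that the approach draws.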
@InProceedings{IWSC18p60,
author = {Toshihiro Kamiya},
title = {Code Difference Visualization by a Call Tree},
booktitle = {Proc.\ IWSC},
publisher = {IEEE},
pages = {60--63},
doi = {},
year = {2018},
}