Deep Neural Networks (DNNs) have often been used for labeling image files (e.g., object detection). Although they can also be applied to labeling code fragments (i.e., code-to-code search) in software engineering, the learning process of a DNN requires a large number of code fragments for each label. In this paper, we present an approach to code-to-code search based on a DNN model and on code mutation, which generates a sufficient number of code fragments for each label. A preliminary experiment shows high precision and recall for the proposed approach.
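As a rough illustration of how code mutation can multiply the training examples available for one label, the sketch below applies a single mutation operator, consistent identifier renaming. The operator, the keyword list, and the names `rename_identifiers` and `generate_mutants` are assumptions made for this example; the abstract does not state which mutation operators the approach actually uses.

```python
import re

# Assumption: a tiny keyword list for a C-like language; a real tool
# would rely on the language's grammar instead.
KEYWORDS = {"for", "if", "else", "while", "return", "int"}

def rename_identifiers(fragment: str, suffix: str) -> str:
    """Return a copy of the fragment with every non-keyword identifier suffixed."""
    def repl(match):
        name = match.group(0)
        return name if name in KEYWORDS else name + suffix
    # Matches words that start with a letter or underscore, so numeric
    # literals are left untouched.
    return re.sub(r"\b[A-Za-z_]\w*\b", repl, fragment)

def generate_mutants(fragment: str, label: str, count: int):
    """Produce `count` (mutant, label) training pairs from one labeled fragment."""
    return [(rename_identifiers(fragment, f"_{i}"), label) for i in range(count)]

mutants = generate_mutants("for (int i = 0; i < n; i++) sum += a[i];", "array_sum", 3)
```

Each mutant keeps the label of the fragment it was derived from, so one hand-labeled fragment yields several syntactically distinct training examples.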
The typical structure of unit test code (setup - stimulate - verify - teardown)
gives rise to duplicated test logic. Researchers have demonstrated the widespread use
of syntactic clones in test code, yet if duplicated test code is indeed a
problem, then semantic clones may be an issue as well. However, while detecting
syntactic similarities can be done relatively easily, semantic similarities are
more difficult to find. In this paper we present a novel way of detecting
semantic clones by exploiting the unique features present in test code. We
demonstrate on the Apache Commons Math Library's test suite that our approach
can detect 259 semantic clones, of which only 54 were also detected by NiCad.
This confirms that it is both feasible and worthwhile to investigate semantic
clones in test code.
Code review is key to ensuring the absence of potential issues in source code.
Code reviewers spend a large amount of time manually checking submitted patches based on their own knowledge.
Since a number of patches sometimes have similar potential issues, code reviewers need to suggest similar source code changes to patch authors.
If patch authors noticed such code improvement patterns themselves before submitting patches for review, the reviewers' workload would be reduced.
To detect similar code change patterns, this study employs sequential pattern mining to find source code improvement patterns that appear frequently in the code review history.
In a case study using a code review dataset from the OpenStack project, we found that the patterns detected by our proposed approach included effective examples for improving patches without manual checks by reviewers.
We also found that the patterns changed over time; our pattern mining approach tracks the effective code improvement patterns in a timely manner.
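The idea behind mining frequent change patterns can be sketched as follows. This is a much simplified stand-in for a full sequential pattern mining algorithm: it only counts ordered pairs of changed tokens (gaps allowed) and keeps those that occur in at least `min_support` patches. The function name `frequent_pairs` and the sample change tokens are invented for illustration and are not taken from the OpenStack dataset.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sequences, min_support):
    """Return ordered token pairs that occur in at least `min_support` sequences."""
    support = Counter()
    for seq in sequences:
        # combinations() preserves input order, so (a, b) means a precedes b.
        # The set() ensures each sequence counts a pair at most once.
        support.update(set(combinations(seq, 2)))
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical per-patch sequences of deleted (-) and added (+) tokens.
history = [
    ["-xrange", "+range", "-print"],
    ["-xrange", "+six", "+range"],
    ["-print", "+LOG"],
]
patterns = frequent_pairs(history, min_support=2)
```

Here only the pair `("-xrange", "+range")` is supported by two patches, so it is the only pattern reported. Re-running the mining over successive time windows of the review history is one way to observe how frequent patterns shift over time.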
The existence of code clones has several negative
impacts on software maintenance, which is why a vast amount of
research exists in the literature to characterize clone evolution.
Most of these studies focus on clone genealogy and clone changeness
(consistent and inconsistent changes). However, analyzing clone
evolution with respect to clone location and clone lifetime requires
more attention to better characterize clone evolution. In this
research, an empirical study has been performed on clone
evolution by considering clone location (i.e., Inter-File and Intra-File)
and clone lifetime. The study has been performed on four
open source software systems covering 12 to 66 versions. In
the study, it has been found that (i) Intra-File clones occur
in a repository more often than Inter-File clones, which implies that
developers tend to clone code within the same file rather than across
different files, and (ii) Intra-File clones are more volatile than Inter-File
clones, which implies that developers tend to refactor or change
clones within the same file more than clones spanning different files.
These observations help determine which clones should get more
attention during clone maintenance tasks such as refactoring.
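The location distinction used above can be made concrete with a minimal sketch. It assumes that a clone pair is reported as two (file path, start line) tuples, which is a representation chosen for this example and not necessarily the format of the clone detector used in the study.

```python
from collections import Counter

def clone_location(fragment_a, fragment_b):
    """Classify a clone pair by whether both fragments live in the same file."""
    file_a, _ = fragment_a
    file_b, _ = fragment_b
    return "Intra-File" if file_a == file_b else "Inter-File"

def count_locations(clone_pairs):
    """Count Intra-File and Inter-File clone pairs in one version."""
    return Counter(clone_location(a, b) for a, b in clone_pairs)

# Hypothetical clone pairs: (file path, start line) for each fragment.
pairs = [
    (("a.c", 10), ("a.c", 80)),
    (("a.c", 10), ("b.c", 5)),
    (("c.c", 1), ("c.c", 40)),
]
counts = count_locations(pairs)
```

Applying `count_locations` to each version of a system gives the per-version distribution of the two clone categories, which can then be compared across versions.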
Analyzing histories of code clones is important for understanding how they affect software development and developers.
To this end, many studies have been devoted to tracking code clones.
However, to the best of our knowledge, no existing studies have attempted to track code clones in long-term and fine-grained change histories.
In this paper, we report on the analysis of histories of method-level code clones hosted by a fine-grained version control system called historage, which allowed us to track source code entities across commits.
We tracked and analyzed method-level code clones in 10 open source software projects and found that
(1) in many projects, method-level code clones are removed regardless of whether they were changed or how frequently they were changed,
and (2) groups of method-level code clones created at the same time tend to survive longer than those created individually.
We believe that these findings will provide useful insights for future research on code clones such as determining the priority of code clone management.
Legacy systems are important in business but difficult to maintain.
One cause of the difficulty is the large number of code clones in these systems;
those clones implement similar functionality using loop idioms that are common within a company.
Since the loop idioms have been developed to implement popular functionalities,
most of them are likely to be translated into simple SQL statements in a new, modernized version of a system.
To investigate the feasibility of the approach, we propose a method to automatically extract cloned loop idioms embedded in COBOL program files.
We manually classified the extracted idioms and labeled them according to their functionalities.
We evaluated the accuracy of our classification result with three experts.