ICSE 2011 Workshops
33rd International Conference on Software Engineering

Workshop on Emerging Trends in Software Metrics (WETSoM 2011), May 24, 2011, Waikiki, Honolulu, HI, USA

WETSoM 2011 – Proceedings




Title Page

The Workshop on Emerging Trends in Software Metrics aims at bringing together researchers and practitioners to discuss the progress of software metrics. The motivation for this workshop is the low impact that software metrics have on current software development. The goals of this workshop are to critically examine the evidence for the effectiveness of existing metrics and to identify new directions for the development of software metrics.


Data Quality: Cinderella at the Software Metrics Ball?
Martin Shepperd
(Brunel University, UK)
In this keynote I explore what exactly we mean by data quality, techniques to assess data quality, and the very significant challenges that poor data quality can pose. I believe we neglect data quality at our peril since — whether we like it or not — our research results are founded upon data and upon our assumption that data quality issues do not confound our results. A systematic review of the literature suggests that it is a minority practice to even explicitly discuss data quality. I therefore suggest that this topic should become a higher priority amongst empirical software engineering researchers.


Software Quality

Integrating Quality Models and Static Analysis for Comprehensive Quality Assessment
Klaus Lochmann and Lars Heinemann
(Technische Universität München, Germany)
To assess the quality of software, two ingredients are available today: (1) quality models defining abstract quality characteristics and (2) code analysis tools providing a large variety of metrics. However, there is a gap between these two worlds. The quality attributes defined in quality models are too abstract to be operationalized. On the other hand, aggregating the results of static code analysis tools remains a challenge. We address these problems by defining a quality model based on an explicit meta-model. It allows quality models to be operationalized by defining how the metrics calculated by tools are aggregated. Furthermore, we propose a new approach for normalizing the results of rule-based code analysis tools, which uses the information on the structure of the source code in the quality model. We evaluate the quality model by providing tool support for both developing quality models and conducting automatic quality assessments. Our results indicate that large quality models can be built on our meta-model. The automatic assessments show a high correlation with an expert-based ranking.

Is My Project's Truck Factor Low? Theoretical and Empirical Considerations About the Truck Factor Threshold
Marco Torchiano, Filippo Ricca, and Alessandro Marchetto
(Politecnico di Torino, Italy; Università di Genova, Italy; Fondazione Bruno Kessler, Italy)
The Truck Factor is a simple way, proposed by the agile community, to measure how knowledge of a system is distributed across a team of developers. It can be used to highlight potential project problems due to an inadequate distribution of system knowledge. Notwithstanding its relevance, only a few studies have investigated the Truck Factor and proposed ways to efficiently measure, evaluate, and use it. In particular, the effective use of the Truck Factor is limited by the lack of reliable thresholds. In this preliminary paper, we present a theoretical model of the Truck Factor and, in particular, we investigate its use to define the maximum achievable Truck Factor value in a project. Such a value matters for the definition of a reliable threshold for the Truck Factor. Furthermore, we document an experiment in which we apply the proposed model to real software projects with the aim of comparing the maximum achievable value of the Truck Factor with the only threshold proposed in the literature. Our preliminary results show that the existing threshold has some limitations and problems.
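The abstract does not spell out how the Truck Factor is computed; one common operationalization (an illustrative sketch, not the authors' model) counts the smallest number of developers whose removal leaves more than half of the files with no remaining author who has worked on them:

```python
def truck_factor(file_authors, threshold=0.5):
    """Smallest number of developers whose removal 'orphans' more than
    `threshold` of the files. `file_authors` maps file -> list of authors.
    Greedy approximation: repeatedly remove the developer covering the
    most still-covered files."""
    files = list(file_authors)
    removed = set()
    count = 0
    while True:
        orphaned = sum(1 for f in files
                       if not (set(file_authors[f]) - removed))
        if orphaned > threshold * len(files):
            return count
        coverage = {}
        for f in files:
            for a in set(file_authors[f]) - removed:
                coverage[a] = coverage.get(a, 0) + 1
        if not coverage:
            return count
        removed.add(max(coverage, key=coverage.get))
        count += 1

# Hypothetical authorship data for illustration only.
authors = {
    "core.py": ["alice"],
    "ui.py":   ["alice", "bob"],
    "db.py":   ["alice"],
    "docs.md": ["carol"],
}
print(truck_factor(authors))  # → 2: losing alice plus one more orphans >50%
```

A low value signals that system knowledge is concentrated in very few heads, which is exactly the project risk the paper's thresholds aim to flag.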

Analyzing Tool Usage to Understand to What Extent Experts Change their Activities when Mentoring
Pekka Abrahamsson, Ilenia Fronza, and Jelena Vlasenko
(Free University of Bozen, Italy)
Automated In-Process Software Engineering Measurement and Analysis (AISEMA) systems represent a major advancement in non-invasively tracking the activities of developers. On top of an AISEMA system we have built a model that enables a better understanding of how tools are used in practical, real-life development settings. In this work we evaluate to what extent experienced developers change their activities while mentoring in Pair Programming (PP) and, if they do, how long this effect can be observed. We compare how experienced developers use the tools when working with other experts and when working with new developers. The results indicate that there is a notable difference in tool usage between experts working together and experts mentoring new developers who have just joined the team. Moreover, over time the difference between pairs of experts and mixed pairs (experts and novices) becomes almost unnoticeable.

By No Means: A Study on Aggregating Software Metrics
Bogdan Vasilescu, Alexander Serebrenik, and Mark van den Brand
(Technische Universiteit Eindhoven, Netherlands)
Fault prediction models usually employ software metrics which were previously shown to be strong predictors of defects, e.g., SLOC. However, metrics are usually defined at a micro-level (method, class, package), and should therefore be aggregated in order to provide insights into the evolution at the macro-level (system). In addition to traditional aggregation techniques such as the mean, median, or sum, econometric aggregation techniques, such as the Gini, Theil, and Hoover indices, have recently been proposed.
In this paper we wish to understand whether the aggregation technique influences the presence and strength of the relation between SLOC and defects. Our results indicate that the correlation is not strong and is influenced by the aggregation technique.
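As an illustration of how an econometric index aggregates a micro-level metric to the system level (a minimal sketch, not the paper's implementation), the Gini index over method-level SLOC can be computed as:

```python
def gini(values):
    """Gini index of non-negative metric values: 0 means perfectly even,
    values near 1 mean the total is concentrated in a few items."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over the rank-ordered values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

# Hypothetical method-level SLOC, aggregated to one system-level score.
sloc_per_method = [5, 8, 12, 300]   # one dominant method
print(gini(sloc_per_method))        # high inequality
print(gini([10, 10, 10, 10]))       # → 0.0, perfectly even
```

Unlike the mean or sum, such indices capture how unevenly a metric is distributed across methods, which is why they can change the observed relation between SLOC and defects.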

Insights into Component Testing Process
Vikrant Kaulgud and Vibhu Saujanya Sharma
(Accenture Technology Labs, India)
Effective component testing (commonly termed unit testing) is important to control defect slippage into the testing stage. Often testing teams lack in-process visibility into the effectiveness of ongoing component testing. Using project data such as code coverage and schedule and effort estimates, we generate temporal and rate-based insights into component testing effectiveness. A simple composite metric is used for measuring and forecasting the health of the component testing process. The early warning signals, based on the forecast and associated insights, lead teams to take proactive actions for improving component testing. In our ongoing experimental studies, we have observed that the use of these insights causes a substantial reduction in defect slippage.


Changes and Defects

Linking Software Design Metrics to Component Change-Proneness
Claire Ingram and Steve Riddle
(Newcastle University, UK)
One technique from value-based software engineering involves prioritising the system and selectively applying time-consuming techniques (such as traceability) in order to maximise return on investment. This prioritisation could be based on predicted change-proneness of code modules, if a sufficiently accurate prediction can be achieved. Several previous studies have examined links between software change-proneness and software metrics such as size and complexity. However, conclusions differ as to the strength of the relationships. We present here a new case study project, extracting a range of complexity values from the code modules and testing for the existence of a significant link between change-proneness and complexity. We find only limited evidence of a linear relationship, but analysis using other statistical techniques does reveal some other significant links.

Stability of Java Interfaces: A Preliminary Investigation
Jonathan Chow and Ewan Tempero
(The University of Auckland, New Zealand)
The attribute of stability is regarded by some as an important attribute of software. Some claims regarding software design quality imply that what are called interfaces in Java are stable. This paper introduces some new metrics for investigating such claims, and presents some preliminary measurements from these metrics, which indicate that developers do not consistently develop stable interfaces.

Different Strokes for Different Folks: A Case Study on Software Metrics for Different Defect Categories
Ayse Tosun Mısırlı, Bora Çağlayan, Andriy V. Miranskyy, Ayşe Başar Bener, and Nuzio Ruffolo
(Bogazici University, Turkey; IBM Canada Ltd., Canada; Ryerson University, Canada)

Concern-Based Cohesion as Change Proneness Indicator: An Initial Empirical Study
Bruno C. da Silva, Cláudio Sant'Anna, and Christina Chavez
(Federal University of Bahia, Brazil)
Structure-based cohesion metrics, such as the well-known Chidamber and Kemerer's Lack of Cohesion in Methods (LCOM), fail to capture the semantic notion of a software component's cohesion. Some researchers claim that this is one of the reasons they are not good indicators of change proneness. The Lack of Concern-based Cohesion metric (LCC) is an alternative cohesion metric centered on counting the number of concerns a component implements. A concern is any important concept, feature, property, or area of interest of a system that we want to treat in a modular way. In this way, LCC focuses on what really matters for assessing a component's cohesion: the amount of responsibility placed on it. Our aim in this paper is to present an initial investigation into the applicability of this concern-based cohesion metric as a change proneness indicator. We also checked whether this metric correlates with efferent coupling. An initial empirical assessment was done with two small to medium-sized systems. Our results indicated a moderate to strong correlation between LCC and change proneness, and also a strong correlation between LCC and efferent coupling.

A Revised Web Objects Method to Estimate Web Application Development Effort
Raffaella Folgieri, Giulio Barabino, Giulio Concas, Erika Corona, Roberto De Lorenzi, Michele L. Marchesi, and Andrea Segni
(University of Genova, Italy; University of Milan, Italy; University of Cagliari, Italy; Datasiel spa, Italy)
We present a study of the effectiveness of estimating web application development effort using the Function Points and Web Objects methods, and a method we propose, the Revised Web Objects (RWO). RWO is an upgrade of the WO method, aimed at accounting for new web development styles and technologies. It also introduces an up-front classification of web applications according to their size, scope, and technology, to further refine their effort estimation. These methods were applied to a data-set of 24 projects obtained from Datasiel spa, a mid-sized Italian company focused on web application projects, showing that RWO performs statistically better than WO, and roughly the same as FP.

Which Code Construct Metrics are Symptoms of Post Release Failures?
Meiyappan Nagappan, Brendan Murphy, and Mladen Vouk
(North Carolina State University, USA; Microsoft Research, USA)
Software metrics, such as code complexity metrics and code churn metrics, are used to predict failures. In this paper we study a specific set of metrics called code construct metrics and relate them to post release failures. We use the values of the code construct metrics for each file to characterize that file. We analyze the code construct metrics along with the post release failure data on the files (which splits the files into two classes: files with post release failures and files without post release failures). In our analysis we compare a file with post release failures to a set of files without post release failures that have similar characteristics. In this comparison we identify which code construct metric, more often than the others, differs the most between these two classes of files. The goal of our research is to find out which code construct metrics can perhaps be used as symptoms of post release failures. In this paper we analyzed the code construct metrics of Eclipse 2.0, 2.1, and 3.0. Our results indicate that MethodInvocation, QualifiedName, and SimpleName are the code constructs that differentiate the two classes of files the most and hence are the key symptoms/indicators of a file with post release failures in these versions of Eclipse.


Challenges and Future Research Trends

The Fractal Dimension Metric and Its Use to Assess Object-Oriented Software Quality
Ivana Turnu, Giulio Concas, Michele L. Marchesi, and Roberto Tonelli
(University of Cagliari, Italy)
We present a study where software systems are considered as complex networks which have a self-similar structure under a length-scale transformation. On such complex software networks we computed a self-similarity coefficient, also known as the fractal dimension, using the “box counting method”.
We analyzed various releases of the publicly available Eclipse software system, calculating the fractal dimension for twenty randomly chosen sub-projects for every release, as well as for each release as a whole. Our results display an overall consistency among the sub-projects and among all the analyzed releases. We found a very good correlation between the fractal dimension and the number of bugs for Eclipse and for the twenty sub-projects. Since the fractal dimension is just a scalar number that characterizes a whole system, while complexity and quality metrics are in general computed on every system module, this result suggests that the fractal dimension could be considered as a global quality metric for large software systems. Our results need, however, to be confirmed for other large software systems.
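The box counting idea can be sketched as follows: cover the network with “boxes” of increasing diameter l_B, count the boxes N_B needed, and read the fractal dimension off the slope of log N_B versus log l_B. This is a simplified greedy covering for illustration, not the authors' exact procedure:

```python
from collections import deque
import math

def bfs_dist(adj, src):
    """Hop distances from `src` in an adjacency-list graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def box_count(adj, l_b):
    """Greedy number of boxes (balls of hop-radius < l_b) covering the graph."""
    uncovered = set(adj)
    boxes = 0
    while uncovered:
        seed = next(iter(uncovered))
        d = bfs_dist(adj, seed)
        uncovered -= {v for v in uncovered if d.get(v, l_b) < l_b}
        boxes += 1
    return boxes

def fractal_dimension(adj, sizes=(2, 3, 4)):
    """Negative slope of the least-squares fit of log N_B vs. log l_B."""
    xs = [math.log(l) for l in sizes]
    ys = [math.log(box_count(adj, l)) for l in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Toy example: a 16-node path graph standing in for a dependency network.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 16] for i in range(16)}
print(fractal_dimension(path))
```

Greedy covering only approximates the minimal box count, so published studies typically use more careful covering algorithms; the log-log fit itself is the standard part.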

Program Slicing-Based Cohesion Measurement: The Challenges of Replicating Studies Using Metrics
David Bowes, Tracy Hall, and Andrew Kerr
(University of Hertfordshire, UK; Brunel University, UK)
Background: It is important to develop corpuses of data to test the efficacy of using metrics. Replicated studies are an important contribution to corpuses of metrics data. There are few replicated studies using metrics reported in software engineering. Aim: To contribute more data to the body of evidence on the use of novel program slicing-based cohesion metrics. Method: We replicate a well-regarded study by Meyers and Binkley [15, 16] which analyses the cohesion of open source projects using program slicing-based metrics. Results: Our results are very different from Meyers and Binkley's original results. This suggests that there are a variety of opportunities for inconsistency to creep into the collection and analysis of metrics data during replicated studies. Conclusion: We conclude that researchers using metrics data must present their work with sufficient detail for replication to be possible. Without this detail it is difficult for subsequent researchers to accurately replicate a study such that consistent and reliable data can be added to a body of evidence.

Human Judgement and Software Metrics: Vision for the Future
Carolyn Mair and Martin Shepperd
(Southampton Solent University, UK; Brunel University, UK)
Background: There has been much research into building formal (metrics-based) prediction systems with the aim of improving resource estimation and planning of software projects. However, the ‘objectivity’ of such systems is illusory in the sense that many inputs themselves need to be estimated by the software engineer. Method: We review the uptake of past software project prediction research and identify relevant cognitive psychology research on expert behaviour. In particular we explore potential applications of recent metacognition research. Results: We find the human aspect is largely ignored, despite the availability of many important results from cognitive psychology. Conclusions: In order to increase the actual use of our metrics research, e.g., effort prediction systems, we need a more integrated view of how such research might be used and who might be using it. This leads to our belief that future research must be more holistic and inter-disciplinary.

