
2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE), May 19, 2013, San Francisco, CA, USA

TEFSE 2013 – Proceedings


Preface

Title Page


Message from the Chairs
Welcome to the 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE 2013), which is collocated with the 35th International Conference on Software Engineering (ICSE 2013) in San Francisco, California. We hope you will all enjoy the location as much as the workshop and the main conference.

Committees


Full Papers

Why Innovation Processes Need to Support Traceability
Thomas Beyhl, Gregor Berg, and Holger Giese
(HPI, Germany)
Today, more and more companies employ innovation processes to gain a competitive advantage. The resulting ideas, i.e. products or services, are often desirable for end users, but also have to be feasible to produce and viable to sell. In practice, innovation processes (e.g. design thinking) and engineering are two separate processes with an information handover in between. This handover often includes a presentation and a prototype, which illustrate the overall idea. However, the rationales leading to this final idea are often neglected. Without this information, engineers are not able to make well-informed trade-off decisions between different aspects of the final idea, which are required to realize a desirable product feasibly and viably. Thus, engineers require a handover that is as detailed and explicit as possible to close the documentation gap between non-engineers and engineers. In this position paper, we discuss how employing traceability can close this handover gap. Specifically, we illustrate how Gotel and Morris’ traceability framework can be applied to innovative engineering processes. We present the benefits traceability provides to innovators and engineers and how traceability can improve the successful realization of innovative ideas.

Decision-Centric Traceability of Architectural Concerns
Jane Cleland-Huang, Mehdi Mirakhorli, Adam Czauderna, and Mateusz Wieloch
(DePaul University, USA)
We present an architecture-centric approach for achieving traceability between stakeholders' quality concerns, architecturally significant requirements, design rationales, and source code. In Decision-Centric Traceability (DCT), all trace links are focused around architectural decisions that include factors as varied as platforms, languages, frameworks, patterns, and lower-level architectural tactics. We show how DCT supports critical software engineering activities such as safety-case construction, impact analysis, stakeholder satisfaction analysis, requirements validation, and architectural preservation. Our approach is illustrated and validated with examples drawn from the architectural decisions and subsequent design of the TraceLab project funded by the US National Science Foundation under a Major Research Infrastructure grant.

Getting More from Requirements Traceability: Requirements Testing Progress
Celal Ziftci and Ingolf Krüger
(UC San Diego, USA)
Requirements Engineering (RE) and Testing are important steps in many software development processes. It is critical to monitor the progress of the testing phase to allocate resources (person-power, time, computational resources) properly, and to make sure the prioritization of requirements is reflected during testing, i.e. more critical requirements are given higher priority and tested well. In this paper, we propose a new metric to help stakeholders monitor the progress of the testing phase from a requirements perspective, i.e. which requirements are tested adequately, and which ones insufficiently. Unlike existing progress-related metrics, such as code coverage and MC/DC (modified condition/decision) coverage, this metric is on the requirements level, not source code level. We propose to automatically reverse engineer this metric from the existing test cases of a system. We also propose a method to evaluate this metric, and report the results of three case studies. On these case studies, our technique obtains results within 75.23% - 91.11% of the baseline on average.

Using Traceability Links to Identifying Potentially Erroneous Artifacts during Regulatory Reviews
Wuwei Shen, Chung-Ling Lin, and Andrian Marcus
(Western Michigan University, USA; Wayne State University, USA)
Safety critical systems emphasize the high quality of both hardware and software of a product since the safety of the product to the public tops all the other considerations. In these domains, regulatory agencies are entitled to conduct reviews on the entire range of artifacts produced from system design to system performance/maintenance. Regulatory review is comprised of the pre-market review and the post-market review. Each aspect of the regulatory review is time-consuming and laborious. In this paper we target situations when errors are identified in the reviewed software, either during pre- or post-market review. We propose an automated mechanism, which utilizes traceability information to identify lists of related software artifacts that are similar to those involved in the reported errors. With the tool recommendations, regulators can quickly investigate these suspicious locations and avoid the occurrence of future hazardous events.

Towards Recovering and Maintaining Trace Links for Model Sketches across Interactive Displays
Markus Kleffmann, Matthias Book, and Volker Gruhn
(University of Duisburg-Essen, Germany)
In complex software projects, it is often difficult for a team of stakeholders with heterogeneous backgrounds to maintain a common understanding of the system's structure and the challenges in its implementation. In this paper, we therefore introduce the concept of an augmented "Interaction Room", i.e. a physical room whose walls are outfitted with wall-sized touchscreens that visualize different aspects of a software system. The information displayed on the walls is related, so when a user changes the content on one wall, e.g. by editing or navigating a diagram, the contents on the other walls should change correspondingly. This raises the need for traceability techniques to recover trace links between the walls and to maintain them in real-time. In contrast to how one works with existing modeling tools, the pragmatic methodology of the Interaction Room encourages users to work with sketches that may often remain incomplete and inconsistent. This makes the identification and maintenance of trace links particularly difficult. We therefore describe how a combination of prospective and retrospective traceability techniques can be used to recover and maintain the trace links in such an interactive room.

Ontology-Based Trace Retrieval
Yonghua Li and Jane Cleland-Huang
(Wuhan University of Technology, China; DePaul University, USA)
In automated requirements trace retrieval, an ontology can be used as an intermediary artifact to identify relationships that would not be recognized by standard information retrieval techniques. However, ontologies must be carefully constructed to fit the needs of the project. In this paper we present a technique for incorporating information from general and domain-specific ontologies into the tracing process. Our approach applies the domain ontology at the phrase level and then uses a general ontology to augment simple term matching in order to deduce relationships between individual terms weighted according to the relative importance of the phrase in which they occur. The combined weights are used to compute the overall similarity between a source and target artifact in order to establish a candidate trace link. We experimentally evaluated our approach against the standard Vector Space Model (VSM) and show that a domain ontology combined with generalized ontology returned greatest improvements in trace accuracy.
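
For readers unfamiliar with the Vector Space Model baseline named in this abstract, the following is a minimal sketch of how a VSM tracer can rank candidate links with TF-IDF weights and cosine similarity. It is not the authors' implementation and uses no ontology; the function names, the weighting variant (raw term frequency times log inverse document frequency), and the sample artifacts are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_tokens, targets):
    """Rank target artifacts (id -> tokens) by similarity to one source artifact."""
    ids = list(targets)
    vectors = tfidf_vectors([source_tokens] + [targets[i] for i in ids])
    scores = [(i, cosine(vectors[0], vec)) for i, vec in zip(ids, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical artifacts, already tokenized and lower-cased.
requirement = "user shall login with password".split()
code_artifacts = {
    "Auth.java": "login password check user".split(),
    "Report.java": "print monthly report".split(),
}
ranking = rank_candidates(requirement, code_artifacts)
```

In this sketch, a target that shares no terms with the source receives a score of zero, which is the kind of missed relationship an ontology is meant to recover.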

Human Recoverability Index: A TraceLab Experiment
Alexander Dekhtyar and Michael Hilton
(Cal Poly, USA)
It has been generally accepted that not all trace links in a given requirements traceability matrix are equal - both human analysts and automated methods are good at spotting some links, but have blind spots for others. One way to choose automated techniques for inclusion in assisted tracing processes (i.e., the tracing processes that combine the expertise of a human analyst and special-purpose tracing software) is to select the techniques that tend to discover more links that are hard for human analysts to observe and establish on their own. This paper proposes a new measure of performance of a tracing method: human recoverability index-based recall. In the presence of knowledge about the difficulty of link recovery by human analysts, this measure rewards methods that are able to recover such links over methods that tend to recover the same links as the human analysts. We describe a TraceLab experiment we designed to evaluate automated trace recovery methods based on this measure and provide a case study of the use of this experiment to profile and evaluate different automated tracing techniques.

Trace Matrix Analyzer (TMA)
Wenbin Li, Jane Huffman Hayes, Fan Yang, Ken Imai, Jesse Yannelli, Chase Carnes, and Maureen Doyle
(University of Kentucky, USA; Northern Kentucky University, USA)
A Trace Matrix (TM) represents the relationship between software engineering artifacts and is foundational for many software assurance techniques such as criticality analysis. In a large project, a TM might represent the relationships between thousands of elements of dozens of artifacts (for example, between design elements and code elements, between requirements and test cases). In mission- and safety-critical systems, a third party agent may be given the job to assess a TM prepared by the developer. Due to the size and complexity of the task, automated techniques are needed. We have developed a technique for analyzing a TM, called Trace Matrix Analyzer (TMA), so that third party agents can perform their work faster and more effectively. To validate, we applied TMA to two TMs with known problems and golden answersets: MoonLander and MODIS. We also asked an experienced software engineer to manually review the TM. We found that TMA properly identified TM issues and was much faster than manual review, but also falsely identified issues for one dataset. This work addresses the Trusted Grand Challenge, research projects 3, 5, and 6.

Towards an Eye-Tracking Enabled IDE for Software Traceability Tasks
Braden Walters, Michael Falcone, Alexander Shibble, and Bonita Sharif
(Youngstown State University, USA)
The paper presents iTrace, an eye-tracking plug-in for the Eclipse IDE. The premise is to use developers' eye gaze as input to traceability tasks such as generating links between various artifacts. The design, architecture, and current state of iTrace are described. Support for a variety of traceability tasks such as link retrieval, link evolution, link visualization, and empirical studies is also discussed. An initial link generation heuristic using iTrace is presented with plans for future evaluation.

Backward Propagation of Code Refinements on Transformational Code Generation Environments
Victor Guana and Eleni Stroulia
(University of Alberta, Canada)
Transformational code generation is at the core of generative software development. It advocates the modeling of common and variable features in software-system families with domain-specific languages, and the specification of transformation compositions for successively refining the abstract domain models towards eventually enriching them with execution semantics. Thus, using code-generation environments, families of software systems can be generated, based on models specified in high-level domain languages. The major advantage of this software-construction methodology stems from the fact that it enables the reuse of verified execution semantics, derived from domain models. However, like all software, once an implementation is generated, it is bound to evolve and be manually refined to introduce features that were not captured by its original generation environment. This paper describes a conceptual framework for identifying features that have to be propagated backwards to generation engines, from refined generated references. Our conceptual framework is based on static and symbolic execution analysis, and aims to contribute to the maintenance and evolution challenges of model-driven development.

REquirements TRacing On target (RETRO) Enhanced with an Automated Thesaurus Builder: An Empirical Study
Sandeep Pandanaboyana, Shreeram Sridharan, Jesse Yannelli, and Jane Huffman Hayes
(University of Kentucky, USA)
Several techniques have been proposed to increase the performance of the tracing process, including use of a thesaurus. Some thesauri pre-exist and have been shown to improve the recall for some datasets. But the drawback is that they are manually generated by analysts based on study and analysis of the textual artifacts being traced. To alleviate that effort, we developed an application that accepts textual artifacts as input and generates a thesaurus dynamically; we call it Thesaurus Builder. We evaluated the performance of REquirements TRacing On target (RETRO) with a Thesaurus generated by Thesaurus Builder. We found that recall increased from 81.9% with no thesaurus to 87.18% when the dynamic thesaurus was used. We also found that Okapi weighting resulted in better recall and precision than TF-IDF weighting, but only precision was statistically significant.

Establishing Content Traceability for Software Applications: An Approach Based on Structuring and Tracking of Configuration Elements
Padmalata Nistala and Priyanka Kumari
(TATA Consultancy Services, India)
Establishing content traceability between various software artifacts or configuration elements at granular level and identifying the gaps in traceability at every phase is a key challenge in software development. In other disciplines such as manufacturing and systems engineering we can find models, well established principles and practices for formulating and tracing the product parts and composition. This paper extends the system model and product breakdown structure concepts from these disciplines to software systems. We propose a model that provides a granular view of software product composition and content traceability through structured relationships among various software configuration elements. Here we define the key configuration elements essential for the alignment and traceability, create a structure through interconnected relationships of these elements at each phase and analyze the inconsistencies in the relationship. The model provides a visual representation to understand the completeness at each of the development stages. The content traceability is established from both completeness and correctness perspectives and gaps are identified at each phase. The paper briefly describes the model and initial results from pilot implementation in an industry application.

Enabling Traceability Reuse for Impact Analyses: A Feasibility Study in a Safety Context
Markus Borg, Orlena C. Z. Gotel, and Krzysztof Wnuk
(Lund University, Sweden)
Engineers working on safety critical software development must explicitly specify trace links as part of Impact Analyses (IA), both to code and non-code development artifacts. In large-scale projects, constituting information spaces of thousands of artifacts, conducting IA is tedious work relying on extensive system understanding. We propose to support this activity by enabling engineers to reuse knowledge from previously completed IAs. We do this by mining the trace links in documented IA reports, creating a semantic network of the resulting traceability, and rendering the resulting network amenable to visual analyses. We studied an Issue Management System (IMS), from within a company in the power and automation domain, containing 4,845 IA reports from 9 years of development relating to a single safety critical system. The domain has strict process requirements guiding the documented IAs. We used link mining to extract trace links, from these IA reports to development artifacts, and to determine their link semantics. We constructed a semantic network of the interrelated development artifacts, containing 6,104 non-code artifacts and 9,395 trace links, and we used two visualizations to examine the results. We provide initial suggestions as to how the knowledge embedded in such a network can be (re-)used to advance support for IA.

A TraceLab-Based Solution for Identifying Traceability Links using LSI
Nouh Alhindawi, Omar Meqdadi, Brian Bartman, and Jonathan I. Maletic
(Kent State University, USA)
An information retrieval technique, latent semantic indexing (LSI), is used to automatically identify traceability links from system documentation to program source code. The experiment is performed in the TraceLab framework. The solution provides templates and components for building and querying LSI space and datasets (corpora) that can be used as inputs for these components. The proposed solution is evaluated on traceability links already discovered by mining adaptive commits of the open source system KDE/Koffice. The results show that the approach can identify traceability links with high precision using TraceLab components.

The Role of Artefact Corpus in LSI-Based Traceability Recovery
Gabriele Bavota, Andrea De Lucia, Rocco Oliveto, Annibale Panichella, Fabio Ricci, and Genoveffa Tortora
(University of Sannio, Italy; University of Salerno, Italy; University of Molise, Italy)
Latent Semantic Indexing (LSI) is an advanced method widely and successfully employed in Information Retrieval (IR). It is an extension of Vector Space Model (VSM) and it is able to overcome VSM in canonical IR scenarios where it is used on very large document repositories. LSI has also been used to semi-automatically generate traceability links between software artefacts. However, in such a scenario LSI is not able to overcome VSM. This contradicting result is probably due to the different characteristics of software artefact repositories as compared to document repositories. In this paper we present a preliminary empirical study to analyze how the size and the vocabulary of the repository - in terms of number of documents and terms (i.e., the vocabulary) - affect the retrieval accuracy. Even if replications are needed to generalize our findings, the study presented in this paper provides some insights that might be used as guidelines for selecting the more adequate methods to be used for traceability recovery depending on the particular application context.

Challenge Track

Traceability Challenge 2013: Statistical Analysis for Traceability Experiments: Software Verification and Validation Research Laboratory (SVVRL) of the University of Kentucky
Mark Hays, Jane Huffman Hayes, Arnold J. Stromberg, and Arne C. Bathke
(University of Kentucky, USA)
An important aspect of traceability experiments is the ability to compare techniques. In order to assure proper comparison, it is necessary to perform statistical analysis of the dependent variables collected from technique application. Currently, there is a lack of components in TraceLab to support such analysis. The Software Verification and Validation Research Laboratory (SVVRL) and the Statistics Department of the University of Kentucky have developed a collection of such components as well as a workflow for determining what type of analysis to apply (parametric, non-parametric). The components use industry-accepted R algorithms. The components have been validated using independent standard statistical algorithms applied to publicly available datasets. This work addresses the Purposed grand challenge (research project 4) and Cost-Effective Grand Challenge (research project 4) as well as the Valued Grand Challenge - research project 6.

Traceability Challenge 2013: Query+ Enhancement for Semantic Tracing (QuEST): Software Verification and Validation Research Laboratory (SVVRL) of the University of Kentucky
Wenbin Li and Jane Huffman Hayes
(University of Kentucky, USA)
We present the process and methods applied in undertaking the Traceability Challenge in addressing the Ubiquitous Grand Challenge, Research Project 3. Terms contained within queries (along with document collection terms, hence the “+”) have been enhanced to include semantic tags that indicate whether a term represents an action or an agent. This information is obtained by calling the Senna semantic role labeling tool. The standard TF-IDF component in TraceLab is then used to recover trace links. The QuEST method was applied to four datasets. Results based on the provided answer sets show that QuEST improved Mean Average Precision (MAP) for two artifact pairs of two of the datasets when the artifacts used natural language, but generally did not outperform unaugmented datasets not using natural language. We provide insights on this finding.
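
As a reading aid for the Mean Average Precision (MAP) figure reported in this abstract, the sketch below shows one common way to compute MAP from ranked candidate lists and an answer set. It is an illustration under assumptions, not the evaluation code used in the challenge: the function names and sample data are hypothetical, and average precision is normalized by the total number of relevant links, so relevant links that are never retrieved count as zero.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked candidate list against its true links."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at the rank where this relevant item was found.
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, answer_set):
    """Mean of average precision over all queries that have at least one true link."""
    queries = [q for q in rankings if answer_set.get(q)]
    if not queries:
        return 0.0
    total = sum(average_precision(rankings[q], answer_set[q]) for q in queries)
    return total / len(queries)


# Hypothetical ranked results for two source artifacts and their answer set.
rankings = {"R1": ["a", "b", "c"], "R2": ["x", "y"]}
answer_set = {"R1": {"a", "c"}, "R2": {"y"}}
map_score = mean_average_precision(rankings, answer_set)
```

Because each relevant link contributes the precision at its own rank, MAP rewards methods that place true links near the top of the candidate list rather than merely retrieving them somewhere.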

Towards Feature-Aware Retrieval of Refinement Traces
Patrick Rempel, Patrick Mäder, and Tobias Kuschke
(TU Ilmenau, Germany)
Requirements traceability supports practitioners in reaching higher project maturity and better product quality. To gain this support, traces between various artifacts of the software development process are required. Depending on the number of existing artifacts, establishing traces can be a time-consuming and error-prone task. Additionally, the manual creation of traces frequently interrupts the software development process. In order to overcome those problems, practitioners are asking for techniques that support the creation of traces (see Grand Challenge: Ubiquitous (GC-U)). In this paper, we propose the usage of a graph clustering algorithm to support the retrieval of refinement traces. Refinement traces are traces that exist between artifacts created in different phases of a development project, e.g., between features and use cases. We assessed the effectiveness of our approach in several TraceLab experiments. These experiments employ three standard datasets containing differing types of refinement traces. Results show that graph clustering can improve the retrieval of refinement traces and is a step towards the overall goal of ubiquitous traceability.

Configuring Topic Models for Software Engineering Tasks in TraceLab
Bogdan Dit, Annibale Panichella, Evan Moritz, Rocco Oliveto, Massimiliano Di Penta, Denys Poshyvanyk, and Andrea De Lucia
(College of William and Mary, USA; University of Salerno, Italy; University of Molise, Italy; University of Sannio, Italy)
A number of approaches in traceability link recovery and other software engineering tasks incorporate topic models, such as Latent Dirichlet Allocation (LDA). Although in theory these topic models can produce very good results if they are configured properly, in reality their potential may be undermined by improper calibration of their parameters (e.g., number of topics, hyper-parameters), which could potentially lead to sub-optimal results. In our previous work we addressed this issue and proposed LDA-GA, an approach that uses Genetic Algorithms (GA) to find a near-optimal configuration of parameters for LDA, which was shown to produce superior results for traceability link recovery and other tasks than reported ad-hoc configurations. LDA-GA works by optimizing the coherence of topics produced by LDA for a given dataset. In this paper, we instantiate LDA-GA as a TraceLab experiment, making publicly available all the implemented components, the datasets and the results from our previous work. In addition, we provide guidelines on how to extend our LDA-GA approach to other IR techniques and other software engineering tasks using existing TraceLab components.

Trace-by-Classification: A Machine Learning Approach to Generate Trace Links for Frequently Occurring Software Artifacts
Mateusz Wieloch, Sorawit Amornborvornwong, and Jane Cleland-Huang
(DePaul University, USA)
Over the past decade the traceability research community has focused upon developing and improving trace retrieval techniques in order to retrieve trace links between a source artifact, such as a requirement, and a set of target artifacts, such as a set of Java classes. In this Trace Challenge paper we present a previously published technique that uses machine learning to trace software artifacts that recur in similar forms across multiple projects. Examples include quality concerns related to non-functional requirements such as security, performance, and usability; regulatory codes that are applied across multiple systems; and architectural decisions that are found in many different solutions. The purpose of this paper is to release a publicly available TraceLab experiment including reusable and modifiable components as well as associated datasets, and to establish baseline results that would encourage further experimentation.
