33rd International Conference on Software Engineering, May 21–28, 2011, Waikiki, Honolulu, HI, USA
Software Engineering in Practice
Empirical Software Engineering
A Case Study of Measuring Process Risk for Early Insights into Software Safety
Lucas Layman, Victor R. Basili, Marvin V. Zelkowitz, and Karen L. Fisher
(Fraunhofer CESE, USA; University of Maryland, USA; NASA Goddard Space Flight Center, USA)
In this case study, we examine software safety risk in three flight hardware systems in NASA’s Constellation spaceflight program. We applied our Technical and Process Risk Measurement (TPRM) methodology to the Constellation hazard analysis process to quantify the technical and process risks involving software safety in the early design phase of these projects. We analyzed 154 hazard reports and collected metrics to measure the prevalence of software in hazards and the specificity of descriptions of software causes of hazardous conditions. We found that in 49–70% of the 154 hazardous conditions, software was a potential cause or was involved in preventing the hazardous condition. We also found that 12–17% of the 2013 hazard causes involved software, and that 23–29% of all causes had a software control. The application of the TPRM methodology identified process risks in the application of the hazard analysis process itself that may lead to software safety risk.
@InProceedings{ICSE11p623,
author = {Lucas Layman and Victor R. Basili and Marvin V. Zelkowitz and Karen L. Fisher},
title = {A Case Study of Measuring Process Risk for Early Insights into Software Safety},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {623--632},
doi = {},
year = {2011},
}
Model-Driven Engineering Practices in Industry
John Hutchinson, Mark Rouncefield, and Jon Whittle
(Lancaster University, UK)
In this paper, we attempt to address the relative absence of empirical studies of model-driven engineering by describing the practices of three commercial organizations as they adopted a model-driven engineering approach to their software development. Using in-depth semi-structured interviewing, we invited practitioners to reflect on their experiences and selected three to use as exemplars or case studies. In documenting some details of attempts to deploy model-driven practices, we identify some ‘lessons learned’, in particular the importance of complex organizational, managerial and social factors – as opposed to simple technical factors – in the relative success, or failure, of the endeavour. As an example of organizational change management, the successful deployment of model-driven engineering appears to require: a progressive and iterative approach; transparent organizational commitment and motivation; integration with existing organizational processes; and a clear business focus.
@InProceedings{ICSE11p633,
author = {John Hutchinson and Mark Rouncefield and Jon Whittle},
title = {Model-Driven Engineering Practices in Industry},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {633--642},
doi = {},
year = {2011},
}
SORASCS: A Case Study in SOA-based Platform Design for Socio-Cultural Analysis
Bradley Schmerl, David Garlan, Vishal Dwivedi, Michael W. Bigrigg, and Kathleen M. Carley
(CMU, USA)
An increasingly important class of software-based systems is platforms that permit integration of third-party components, services, and tools. Service-Oriented Architecture (SOA) is one such platform that has been successful in providing integration and distribution in the business domain, and could be effective in other domains (e.g., scientific computing, healthcare, and complex decision making). In this paper, we discuss our application of SOA to provide an integration platform for socio-cultural analysis, a domain that, through models, tries to understand, analyze and predict relationships in large complex social systems. In developing this platform, called SORASCS, we had to overcome issues we believe are generally applicable to any application of SOA within a domain that involves technically naïve users and seeks to establish a sustainable software ecosystem based on a common integration platform. We discuss these issues, the lessons learned about the kinds of problems that occur, and pathways toward a solution.
@InProceedings{ICSE11p643,
author = {Bradley Schmerl and David Garlan and Vishal Dwivedi and Michael W. Bigrigg and Kathleen M. Carley},
title = {SORASCS: A Case Study in SOA-based Platform Design for Socio-Cultural Analysis},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {643--652},
doi = {},
year = {2011},
}
Industry Software Architecture
A Method for Selecting SOA Pilot Projects Including a Pilot Metrics Framework
Liam O'Brien, James Gibson, and Jon Gray
(CSIRO, Australia; ANU, Australia; NICTA, Australia)
Many organizations are introducing Service Oriented Architecture (SOA) as part of their business transformation projects to take advantage of the proposed benefits associated with using SOA. However, in many cases organizations do not necessarily know for which projects introducing SOA would be of value and show real benefits to the organization. In this paper we outline a method and pilot metrics framework (PMF) to help organizations select, from a set of candidate projects, those which would be most suitable for piloting SOA. The PMF is used as part of a method based on identifying a set of benefit and risk criteria, investigating each of the candidate projects, mapping them to the criteria, and then selecting the most suitable project(s). The paper outlines a case study where the PMF was applied in a large government organization to help it select pilot projects and develop an overall strategy for introducing SOA into the organization.
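To make the selection step concrete, here is a minimal weighted-criteria sketch in Java. The benefit and risk criteria, weights, and project names below are invented for illustration; the paper's actual PMF metrics are not reproduced here.

    // A generic weighted-criteria illustration of the selection step described
    // in the abstract; the criteria, weights, and candidates are hypothetical.
    // Each candidate project gets a benefit score and a risk score, and the
    // most suitable pilot maximizes benefit relative to risk.
    import java.util.*;

    public class PilotSelection {
        record Candidate(String name, double[] benefits, double[] risks) {}

        static double weighted(double[] scores, double[] weights) {
            double s = 0;
            for (int i = 0; i < scores.length; i++) s += scores[i] * weights[i];
            return s;
        }

        public static void main(String[] args) {
            double[] bw = {0.5, 0.3, 0.2};  // e.g., reuse potential, visibility, fit
            double[] rw = {0.6, 0.4};       // e.g., schedule risk, integration risk
            List<Candidate> candidates = List.of(
                new Candidate("payments gateway", new double[]{4, 3, 5}, new double[]{2, 3}),
                new Candidate("reporting portal", new double[]{3, 5, 4}, new double[]{1, 2}));
            candidates.stream()
                .max(Comparator.comparingDouble((Candidate c) ->
                    weighted(c.benefits(), bw) - weighted(c.risks(), rw)))
                .ifPresent(c -> System.out.println("pilot: " + c.name()));
        }
    }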
@InProceedings{ICSE11p653,
author = {Liam O'Brien and James Gibson and Jon Gray},
title = {A Method for Selecting SOA Pilot Projects Including a Pilot Metrics Framework},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {653--662},
doi = {},
year = {2011},
}
Architecture Evaluation without an Architecture: Experience with the Smart Grid
Rick Kazman, Len Bass, James Ivers, and Gabriel A. Moreno
(SEI/CMU, USA; University of Hawaii, USA)
This paper describes an analysis of some of the challenges facing one portion of the Smart Grid in the United States—residential Demand Response (DR) systems. The purposes of this paper are twofold: 1) to discover risks to residential DR systems and 2) to illustrate an architecture-based analysis approach to uncovering risks that span a collection of technical and social concerns. The results presented here are specific to residential DR but the approach is general and it could be applied to other systems within the Smart Grid and other critical infrastructure domains. Our architecture-based analysis is different from most other approaches to analyzing complex systems in that it addresses multiple quality attributes simultaneously (e.g., performance, reliability, security, modifiability, usability, etc.) and it considers the architecture of a complex system from a socio-technical perspective where the actions of the people in the system are as important, from an analysis perspective, as the physical and computational elements of the system. This analysis can be done early in a system’s lifetime, before substantial resources have been committed to its construction or procurement, and so it provides extremely cost-effective risk analysis.
@InProceedings{ICSE11p663,
author = {Rick Kazman and Len Bass and James Ivers and Gabriel A. Moreno},
title = {Architecture Evaluation without an Architecture: Experience with the Smart Grid},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {663--670},
doi = {},
year = {2011},
}
Bringing Domain-Specific Languages to Digital Forensics
Jeroen van den Bos and Tijs van der Storm
(Netherlands Forensic Institute, Netherlands; Centrum Wiskunde en Informatica, Netherlands)
Digital forensics investigations often consist of analyzing large quantities of data. The software tools used for analyzing such data are constantly evolving to cope with a multiplicity of versions and variants of data formats. This process of customization is time consuming and error prone. To improve this situation we present Derric, a domain-specific language (DSL) for declaratively specifying data structures. This way, the specification of structure is separated from data processing. The resulting architecture encourages customization and facilitates reuse. It enables faster development through a division of labour between investigators and software engineers. We have performed an initial evaluation of Derric by constructing a data recovery tool. This so-called carver has been automatically derived from a declarative description of the structure of JPEG files. We compare it to existing carvers, and show it to be in the same league both with respect to recovered evidence and runtime performance.
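As a hedged illustration of what a carver does (not of Derric's syntax, which is not shown here), the following Java fragment hard-codes the two JPEG stream markers that a declarative format description would specify, and scans a raw disk image for candidate files.

    // Illustrative sketch only: a hand-written JPEG carver fragment showing the
    // kind of logic a Derric specification would be compiled into. The real
    // tool derives validators from declarative format descriptions; this
    // hard-codes the two JPEG markers (SOI = FF D8, EOI = FF D9) for brevity.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class NaiveJpegCarver {
        /** Returns [start, end) offsets of candidate JPEG files in a raw image. */
        static List<int[]> carve(byte[] data) {
            List<int[]> hits = new ArrayList<>();
            for (int i = 0; i + 1 < data.length; i++) {
                if (data[i] == (byte) 0xFF && data[i + 1] == (byte) 0xD8) { // SOI
                    for (int j = i + 2; j + 1 < data.length; j++) {
                        if (data[j] == (byte) 0xFF && data[j + 1] == (byte) 0xD9) { // EOI
                            hits.add(new int[] { i, j + 2 });
                            break;
                        }
                    }
                }
            }
            return hits;
        }

        public static void main(String[] args) throws IOException {
            byte[] image = Files.readAllBytes(Paths.get(args[0])); // disk image path
            for (int[] h : carve(image))
                System.out.printf("candidate JPEG at %d..%d%n", h[0], h[1]);
        }
    }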
@InProceedings{ICSE11p671,
author = {Jeroen van den Bos and Tijs van der Storm},
title = {Bringing Domain-Specific Languages to Digital Forensics},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {671--680},
doi = {},
year = {2011},
}
Software Engineering at Large
Building and Using Pluggable Type-Checkers
Werner Dietl, Stephanie Dietzel, Michael D. Ernst, Kıvanç Muşlu, and Todd W. Schiller
(University of Washington, USA)
This paper describes practical experience building and using pluggable type-checkers. A pluggable type-checker refines (strengthens) the built-in type system of a programming language. This permits programmers to detect and prevent, at compile time, defects that would otherwise have been manifested as run-time errors. The prevented defects may be generally applicable to all programs, such as null pointer dereferences. Or, an application-specific pluggable type system may be designed for a single application. We built a series of pluggable type-checkers using the Checker Framework, and evaluated them on 2 million lines of code, finding hundreds of bugs in the process. We also observed 28 first-year computer science students use a checker to eliminate null pointer errors in their course projects. Along with describing the checkers and characterizing the bugs we found, we report the insights we had throughout the process. Overall, we found that the type-checkers were easy to write, easy for novices to productively use, and effective in finding real bugs and verifying program properties, even for widely tested and used open source projects.
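For readers unfamiliar with pluggable type-checking, here is a minimal nullness example using the Checker Framework's annotations (from org.checkerframework.checker.nullness.qual, with checker-qual on the classpath); the program and error are our own illustration, not taken from the paper's evaluation.

    // A minimal sketch of nullness checking with the Checker Framework. Compiled
    // with the Nullness Checker enabled, the flagged line is rejected at compile
    // time instead of failing with a NullPointerException at run time.
    import org.checkerframework.checker.nullness.qual.Nullable;

    public class NullnessDemo {
        static @Nullable String lookup(String key) {
            return "host".equals(key) ? "localhost" : null; // may be absent
        }

        public static void main(String[] args) {
            @Nullable String value = lookup("port");
            // Rejected by the checker: dereference of possibly-null 'value'
            // System.out.println(value.length());
            if (value != null) {                    // flow-sensitive refinement:
                System.out.println(value.length()); // accepted, non-null here
            }
        }
    }

Compiling with javac -processor org.checkerframework.checker.nullness.NullnessChecker would report the commented-out dereference as a compile-time error.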
@InProceedings{ICSE11p681,
author = {Werner Dietl and Stephanie Dietzel and Michael D. Ernst and Kıvanç Muşlu and Todd W. Schiller},
title = {Building and Using Pluggable Type-Checkers},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {681--690},
doi = {},
year = {2011},
}
Deploying CogTool: Integrating Quantitative Usability Assessment into Real-World Software Development
Rachel Bellamy, Bonnie E. John, and Sandra Kogan
(IBM Research Watson, USA; CMU, USA; IBM Software Group, USA)
Usability concerns are often difficult to integrate into real-world software development processes. To remedy this situation, IBM research and development, partnering with Carnegie Mellon University, has begun to employ a repeatable and quantifiable usability analysis method, embodied in CogTool, in its development practice. CogTool analyzes tasks performed on an interactive system from a storyboard and a demonstration of tasks on that storyboard, and predicts the time a skilled user will take to perform those tasks. We discuss how IBM designers and UX professionals used CogTool in their existing practice for contract compliance, communication within a product team and between a product team and its customer, assigning appropriate personnel to fix customer complaints, and quantitatively assessing design ideas before a line of code is written. We then reflect on the lessons learned by both the development organizations and the researchers attempting this technology transfer from academic research to integration into real-world practice, and we point to future research to even better serve the needs of practice.
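CogTool's predictions come from an ACT-R cognitive model, but a back-of-envelope sketch with the classic Keystroke-Level Model operator times (Card, Moran & Newell; published variants differ slightly) conveys the flavor of "predicted skilled-user task time". The task and numbers below are hypothetical.

    // Back-of-envelope illustration in the spirit of CogTool's predictions,
    // using classic Keystroke-Level Model operator times; CogTool itself
    // computes its estimates from an ACT-R model, not from this table.
    public class KlmEstimate {
        static final double K = 0.28; // keystroke, average skilled typist (s)
        static final double P = 1.10; // point with mouse (s)
        static final double H = 0.40; // home hands between mouse and keyboard (s)
        static final double M = 1.35; // mental preparation (s)

        public static void main(String[] args) {
            // Task: click a field, type a 5-character code, click "Submit".
            double t = M + P          // think, point at the field
                     + H + 5 * K      // home to keyboard, type 5 characters
                     + H + M + P;     // home to mouse, think, point at Submit
            System.out.printf("predicted skilled-user time: %.2f s%n", t);
        }
    }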
@InProceedings{ICSE11p691,
author = {Rachel Bellamy and Bonnie E. John and Sandra Kogan},
title = {Deploying CogTool: Integrating Quantitative Usability Assessment into Real-World Software Development},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {691--700},
doi = {},
year = {2011},
}
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL
Daniel Port, Allen Nikora, Jairus Hihn, and LiGuo Huang
(University of Hawaii, USA; Jet Propulsion Laboratory, USA; Southern Methodist University, USA)
Repositories of systems engineering artifacts at NASA’s Jet Propulsion Laboratory (JPL) are often so large and poorly structured that they have outgrown our capability to effectively process their contents manually to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experiences exploring such methods in three areas at JPL (historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies) have shown that obtaining useful results requires substantial unanticipated effort, from preprocessing the data to transforming the output for practical applications. We have not observed any quick “wins” or realized benefit from short-term effort avoidance through automation in this area. Surprisingly, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates on some of these benefits and on the important lessons we learned while preparing and applying text mining to large, unstructured system artifacts at JPL, with the aim of benefiting future text mining applications in similar problem domains and, we hope, in broader areas of application.
@InProceedings{ICSE11p701,
author = {Daniel Port and Allen Nikora and Jairus Hihn and LiGuo Huang},
title = {Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {701--710},
doi = {},
year = {2011},
}
Software Metrics
An Evaluation of the Internal Quality of Business Applications: Does Size Matter?
Bill Curtis, Jay Sappidi, and Jitendra Subramanyam
(CAST, USA)
This paper summarizes the results of a study of the internal, structural quality of 288 business applications comprising 108 million lines of code, collected from 75 companies in 8 industry segments. These applications were submitted to a static analysis that evaluates quality within and across application components that may be coded in different languages. The analysis consists of evaluating the application against a repository of over 900 rules of good architectural and coding practice. Results are presented for measures of security, performance, and changeability. The effect of size on quality is evaluated, and the ability of modularity to reduce the impact of size is suggested by the results.
@InProceedings{ICSE11p711,
author = {Bill Curtis and Jay Sappidi and Jitendra Subramanyam},
title = {An Evaluation of the Internal Quality of Business Applications: Does Size Matter?},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {711--715},
doi = {},
year = {2011},
}
Characterizing the Differences Between Pre- and Post- Release Versions of Software
Paul Luo Li, Ryan Kivett, Zhiyuan Zhan, Sung-eok Jeon, Nachiappan Nagappan, Brendan Murphy, and Andrew J. Ko
(Microsoft Inc., USA; University of Washington, USA; Microsoft Research, USA)
Many software producers utilize beta programs to predict post-release quality and to ensure that their products meet the quality expectations of users. Prior work indicates that software producers need to adjust predictions to account for differences in usage environments and usage scenarios between beta populations and post-release populations. However, little is known about how usage characteristics relate to field quality and how usage characteristics differ between beta and post-release. In this study, we examine application crash, application hang, system crash, and usage information from millions of Windows® users to 1) examine the effects of usage characteristic differences on field quality (e.g., which usage characteristics impact quality), 2) examine usage characteristic differences between beta and post-release (e.g., do impactful usage characteristics differ), and 3) report experiences adjusting field quality predictions for Windows. Among the 18 usage characteristics that we examined, the five most important were: the number of applications executed, whether the machine was pre-installed by the original equipment manufacturer, two sub-populations (two language/geographic locales), and whether Windows was 64-bit (not 32-bit). We found each of these usage characteristics to differ between beta and post-release, and by adjusting for the differences, the accuracy of field quality predictions for Windows improved by ~59%.
@InProceedings{ICSE11p716,
author = {Paul Luo Li and Ryan Kivett and Zhiyuan Zhan and Sung-eok Jeon and Nachiappan Nagappan and Brendan Murphy and Andrew J. Ko},
title = {Characterizing the Differences Between Pre- and Post- Release Versions of Software},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {716--725},
doi = {},
year = {2011},
}
Why Software Quality Improvement Fails (and How to Succeed Nevertheless)
Jonathan Streit and Markus Pizka
(itestra GmbH, Germany)
Quality improvement is the key to enormous cost reduction in the IT business. However, improvement projects often fail in practice. In many cases, the improvement is inhibited by stakeholders who fear, e.g., a loss of power, or who do not recognize its benefits. Systematic change management and an economic perspective help to overcome these issues, but are little known and seldom applied. This industrial experience report presents the main challenges in software quality improvement projects as well as practices for tackling them. The authors have performed over 50 quality analyses and quality improvement projects on mission-critical software systems of European banking, insurance, and automotive companies.
@InProceedings{ICSE11p726,
author = {Jonathan Streit and Markus Pizka},
title = {Why Software Quality Improvement Fails (and How to Succeed Nevertheless)},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {726--735},
doi = {},
year = {2011},
}
Software Testing and Analysis
Code Coverage Analysis in Practice for Large Systems
Yoram Adler, Noam Behar, Orna Raz, Onn Shehory, Nadav Steindler, Shmuel Ur, and Aviad Zlotnick
(IBM Research Haifa, Israel; Microsoft, Israel; Shmuel Ur Innovation, Israel)
Large systems generate immense quantities of code coverage data. A user faced with the task of analyzing this data, for example, to decide on test areas to improve, faces a ‘needle in a haystack’ problem. In earlier studies we introduced substring hole analysis, a technique for presenting large quantities of coverage data in a succinct way. Here we demonstrate the successful use of substring hole analysis on large-scale data from industrial software systems. To this end we augment substring hole analysis by introducing a workflow and tool support for practical code coverage analysis. We conduct real-data experiments indicating that augmented substring hole analysis enables code coverage analysis where it was previously impractical, correctly identifies functionality that is missing from existing tests, and can increase the probability of finding bugs. These facilitate cost-effective code coverage analysis.
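A rough sketch of the intuition, as we read it from the abstract (the grouping heuristic below is our own simplification, not IBM's algorithm): group uncovered function names by shared substrings so that thousands of individual coverage gaps collapse into a few named "holes".

    // Group uncovered function names by shared name tokens; tokens shared by
    // several uncovered functions suggest a functional area missing from tests.
    import java.util.*;

    public class SubstringHoles {
        public static void main(String[] args) {
            List<String> uncovered = List.of(
                "ipv6_route_add", "ipv6_route_del", "ipv6_neighbor_scan",
                "cache_flush_all", "cache_flush_entry");
            Map<String, List<String>> holes = new TreeMap<>();
            for (String fn : uncovered)
                for (String token : fn.split("_"))
                    holes.computeIfAbsent(token, k -> new ArrayList<>()).add(fn);
            holes.forEach((token, fns) -> {
                if (fns.size() > 1) // a substring shared by several uncovered functions
                    System.out.println("hole '" + token + "': " + fns);
            });
        }
    }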
@InProceedings{ICSE11p736,
author = {Yoram Adler and Noam Behar and Orna Raz and Onn Shehory and Nadav Steindler and Shmuel Ur and Aviad Zlotnick},
title = {Code Coverage Analysis in Practice for Large Systems},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {736--745},
doi = {},
year = {2011},
}
Practical Change Impact Analysis Based on Static Program Slicing for Industrial Software Systems
Mithun Acharya and Brian Robinson
(ABB Corporate Research, USA)
Change impact analysis, i.e., knowing the potential consequences of a software change, is critical for the risk analysis, developer effort estimation, and regression testing of evolving software. Static program slicing is an attractive option for enabling routine change impact analysis for newly committed changesets during daily software build. For small programs with a few thousand lines of code, static program slicing scales well and can assist precise change impact analysis. However, as we demonstrate in this paper, static program slicing faces unique challenges when applied routinely on large and evolving industrial software systems. Despite recent advances in static program slicing, to our knowledge, there have been no studies of static change impact analysis applied on large and evolving industrial software systems. In this paper, we share our experiences in designing a static change impact analysis framework for such software systems. We have implemented our framework as a tool called Imp and have applied Imp on an industrial codebase with over a million lines of C/C++ code with promising empirical results.
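A drastically simplified sketch of the underlying idea, assuming only a static call graph rather than the full data- and control-flow-sensitive slicing a tool like Imp performs: propagate a changeset backwards over dependency edges to find code that may need re-testing. The function names are hypothetical.

    // Propagate a changeset backwards over caller edges to approximate the set
    // of potentially impacted functions. Real static slicing is far more
    // precise; this sketch only walks a call graph.
    import java.util.*;

    public class ImpactAnalysis {
        public static void main(String[] args) {
            // callers.get(f) = functions that depend on f
            Map<String, List<String>> callers = Map.of(
                "parse_config", List.of("init", "reload"),
                "init",         List.of("main"),
                "reload",       List.of("signal_handler"));
            Set<String> impacted = new LinkedHashSet<>();
            Deque<String> work = new ArrayDeque<>(List.of("parse_config")); // changeset
            while (!work.isEmpty()) {
                String f = work.pop();
                if (impacted.add(f))
                    work.addAll(callers.getOrDefault(f, List.of()));
            }
            System.out.println("re-test: " + impacted);
        }
    }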
@InProceedings{ICSE11p746,
author = {Mithun Acharya and Brian Robinson},
title = {Practical Change Impact Analysis Based on Static Program Slicing for Industrial Software Systems},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {746--755},
doi = {},
year = {2011},
}
Value-Based Program Characterization and Its Application to Software Plagiarism Detection
Yoon-Chan Jhi, Xinran Wang, Xiaoqi Jia, Sencun Zhu, Peng Liu, and Dinghao Wu
(Pennsylvania State University, USA; Chinese Academy of Sciences, China)
Identifying similar or identical code fragments becomes much more challenging in code theft cases, where plagiarizers can use various automated code transformation techniques to hide stolen code from detection. Previous works in this field are largely limited in that (1) most of them cannot handle advanced obfuscation techniques; (2) methods based on source code analysis are less practical, since the source code of suspicious programs is typically not available until strong evidence has been collected; and (3) those depending on the features of specific operating systems or programming languages have limited applicability. Based on the observation that some critical runtime values are hard to replace or eliminate with semantics-preserving transformation techniques, we introduce a novel approach to the dynamic characterization of executable programs. Leveraging such invariant values, our technique is resilient to various control and data obfuscation techniques. We show how the values can be extracted and refined to expose the critical values, and how we can apply this runtime property to help solve problems in software plagiarism detection. We have implemented a prototype with a dynamic taint analyzer atop a generic processor emulator. Our experimental results show that the value-based method successfully discriminates 34 plagiarisms obfuscated by SandMark, plagiarisms heavily obfuscated by KlassMaster, programs obfuscated by Thicket, and executables obfuscated by Loco/Diablo.
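To make "shared invariant runtime values" concrete, here is an illustrative Java sketch that scores the overlap of two value traces with a longest common subsequence; the paper's actual extraction and refinement pipeline (dynamic taint analysis on an emulator) is far more involved, and the traces below are made up.

    // Illustrative only: comparing two runtime value traces with a
    // longest-common-subsequence score, one simple way to make "shared
    // invariant values" concrete.
    import java.util.List;

    public class ValueTraceSimilarity {
        static int lcs(List<Long> a, List<Long> b) {
            int[][] d = new int[a.size() + 1][b.size() + 1];
            for (int i = 1; i <= a.size(); i++)
                for (int j = 1; j <= b.size(); j++)
                    d[i][j] = a.get(i - 1).equals(b.get(j - 1))
                            ? d[i - 1][j - 1] + 1
                            : Math.max(d[i - 1][j], d[i][j - 1]);
            return d[a.size()][b.size()];
        }

        public static void main(String[] args) {
            List<Long> original = List.of(17L, 4096L, 255L, 8191L, 42L);
            List<Long> suspect  = List.of(3L, 17L, 4096L, 255L, 8191L, 42L, 9L);
            double score = (double) lcs(original, suspect) / original.size();
            System.out.printf("trace similarity: %.2f%n", score); // 1.00 here
        }
    }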
@InProceedings{ICSE11p756,
author = {Yoon-Chan Jhi and Xinran Wang and Xiaoqi Jia and Sencun Zhu and Peng Liu and Dinghao Wu},
title = {Value-Based Program Characterization and Its Application to Software Plagiarism Detection},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {756--765},
doi = {},
year = {2011},
}
Tools and Environments
A Comparison of Model-based and Judgment-based Release Planning in Incremental Software Projects
Hans Christian Benestad and Jo E. Hannay
(Simula Research Laboratory, Norway)
Numerous factors are involved when deciding when to implement which features in incremental software development. To facilitate a rational and efficient planning process, release planning models make such factors explicit and compute release plan alternatives according to optimization principles. However, experience suggests that industrial use of such models is limited. To investigate the feasibility of model and tool support, we compared input factors assumed by release planning models with factors considered by expert planners. The former factors were cataloged by systematically surveying release planning models, while the latter were elicited through repertory grid interviews in three software organizations. The findings indicate a substantial overlap between the two approaches. However, a detailed analysis reveals that models focus on only select parts of a possibly larger space of relevant planning factors. Three concrete areas of mismatch were identified: (1) continuously evolving requirements and specifications, (2) continuously changing prioritization criteria, and (3) authority-based decision processes. With these results in mind, models, tools and guidelines can be adjusted to better address real-life development processes.
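As a sketch of the kind of optimization such models perform, the following Java fragment greedily selects features by value per unit of effort under an effort capacity; published release planning models use richer factors (dependencies, stakeholder weights, risk), and the backlog below is hypothetical.

    // A minimal sketch of release planning as optimization: pick features for
    // the next release to maximize value under an effort capacity. Greedy
    // value-per-effort is a deliberate simplification.
    import java.util.*;

    public class GreedyReleasePlan {
        record Feature(String name, double value, double effort) {}

        public static void main(String[] args) {
            List<Feature> backlog = new ArrayList<>(List.of(
                new Feature("single sign-on", 8, 5),
                new Feature("audit log", 5, 2),
                new Feature("dark mode", 3, 3),
                new Feature("bulk import", 6, 4)));
            backlog.sort(Comparator
                .comparingDouble((Feature f) -> f.value() / f.effort())
                .reversed());
            double capacity = 9; // effort available in the next release
            for (Feature f : backlog)
                if (f.effort() <= capacity) {
                    System.out.println("release 1: " + f.name());
                    capacity -= f.effort();
                }
        }
    }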
@InProceedings{ICSE11p766,
author = {Hans Christian Benestad and Jo E. Hannay},
title = {A Comparison of Model-based and Judgment-based Release Planning in Incremental Software Projects},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {766--775},
doi = {},
year = {2011},
}
An Industrial Case Study on Quality Impact Prediction for Evolving Service-Oriented Software
Heiko Koziolek, Bastian Schlich, Carlos Bilich, Roland Weiss, Steffen Becker, Klaus Krogmann, Mircea Trifu, Raffaela Mirandola, and Anne Koziolek
(ABB Corporate Research, Germany; University of Paderborn, Germany; FZI, Germany; Politecnico di Milano, Italy; KIT, Germany)
Systematic decision support for architectural design decisions is a major concern for software architects of evolving service-oriented systems. In practice, architects often analyse the expected performance and reliability of design alternatives based on prototypes or former experience. Model-driven prediction methods claim to uncover the tradeoffs between different alternatives quantitatively while being more cost-effective and less error-prone. However, they often suffer from weak tool support and focus on single quality attributes. Furthermore, there is limited evidence on their effectiveness based on documented industrial case studies. Thus, we have applied a novel, model-driven prediction method called Q-ImPrESS on a large-scale process control system consisting of several million lines of code from the automation domain to evaluate its evolution scenarios. This paper reports our experiences with the method and lessons learned. Benefits of Q-ImPrESS are the good architectural decision support and comprehensive tool framework, while one drawback is the time-consuming data collection.
@InProceedings{ICSE11p776,
author = {Heiko Koziolek and Bastian Schlich and Carlos Bilich and Roland Weiss and Steffen Becker and Klaus Krogmann and Mircea Trifu and Raffaela Mirandola and Anne Koziolek},
title = {An Industrial Case Study on Quality Impact Prediction for Evolving Service-Oriented Software},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {776--785},
doi = {},
year = {2011},
}
Enabling the Runtime Assertion Checking of Concurrent Contracts for the Java Modeling Language
Wladimir Araujo, Lionel C. Briand, and Yvan Labiche
(Juniper Networks, Canada; Simula Research Laboratory, Norway; University of Oslo, Norway; Carleton University, Canada)
Though there exists ample support for Design by Contract (DbC) for sequential programs, applying DbC to concurrent programs presents several challenges. In previous work, we extended the Java Modeling Language (JML) with constructs to specify concurrent contracts for Java programs. We present a runtime assertion checker (RAC) for the expanded JML capable of verifying assertions for concurrent Java programs. We systematically evaluate the validity of system testing results obtained via runtime assertion checking using actual concurrent and functional faults on a highly concurrent industrial system from the telecommunications domain.
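For context, standard sequential JML contracts look like the following minimal sketch: pre- and postconditions live in //@ annotation comments that the runtime assertion checker enforces during execution. The paper's concurrency extensions to JML are not reproduced here, and the class below is our own example.

    // A minimal (sequential) JML contract: the RAC checks the precondition on
    // entry and the postconditions on exit at run time.
    public class Account {
        private /*@ spec_public @*/ long balance;

        //@ requires amount > 0;
        //@ ensures balance == \old(balance) + amount;
        //@ ensures \result == balance;
        public long deposit(long amount) {
            balance += amount;
            return balance;
        }
    }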
@InProceedings{ICSE11p786,
author = {Wladimir Araujo and Lionel C. Briand and Yvan Labiche},
title = {Enabling the Runtime Assertion Checking of Concurrent Contracts for the Java Modeling Language},
booktitle = {Proc.\ ICSE},
publisher = {ACM},
pages = {786--795},
doi = {},
year = {2011},
}