
2013 5th International Workshop on Software Engineering for Computational Science and Engineering (SE-CSE), May 18, 2013, San Francisco, CA, USA

SE-CSE 2013 – Proceedings


5th International Workshop on Software Engineering for Computational Science and Engineering (SE-CSE)

Preface

Title Page


Foreword
Welcome to the 2013 International Workshop on Software Engineering for Computational Science & Engineering. We are excited about the program for this year. The submissions represented a good mix of papers from the software engineering perspective and papers from the computational science & engineering perspective. This year we accepted two types of papers: full papers (10 pages) and position papers (4 pages). We received a total of 20 submissions (15 full papers and 5 position papers). After a thorough review process, the program committee and organizing committee selected 10 full papers and 4 position papers for inclusion in this year’s program.

Full Papers

The Software Development Process of FLASH, a Multiphysics Simulation Code
Anshu Dubey, Katie Antypas, Alan Calder, Bruce Fryxell, Don Lamb, Paul Ricker, Lynn Reid, Katherine Riley, Robert Rosner, Andrew Siegel, Francis Timmes, Natalia Vladimirova, and Klaus Weide
(University of Chicago, USA; Lawrence Berkeley National Laboratory, USA; Stony Brook University, USA; University of Michigan, USA; University of Illinois at Urbana-Champaign, USA; University of Western Australia, Australia; Argonne National Laboratory, USA; Arizona State University, USA; University of New Mexico, USA)
The FLASH code has evolved into a modular and extensible scientific simulation software system over the decade of its existence. During this time it has been cumulatively used by over a thousand researchers in several scientific communities (i.e., astrophysics, cosmology, high-energy density physics, turbulence, and fluid-structure interactions) to obtain results for research. The code started its life as an amalgamation of two already existing software packages and sections of other codes developed independently by various participating members of the team for other purposes. In the evolution process it has undergone four major revisions, three of which involved a significant architectural advancement. A corresponding evolution of the software process and policies for maintenance occurred simultaneously. The code is currently in its 4.x release with a substantial user community. Recently there has been an upsurge in contributions by external users, some of which provide significant new capability. This paper outlines the software development and evolution processes that have contributed to the success of the FLASH code.

A Case Study: Agile Development in the Community Laser-Induced Incandescence Modeling Environment (CLiiME)
Aziz Nanthaamornphong, Karla Morris, Damian W. I. Rouson, and Hope A. Michelsen
(University of Alabama, USA; Sandia National Laboratories, USA)
The multidisciplinary requirements of current computational modeling problems preclude the development of scientific software that is maintained and used by only a select group of scientists. The multidisciplinary nature of these efforts requires the development of large-scale software projects established with a wide developer and user base in mind. This article describes some of the software-engineering practices adopted in a scientific-software application for a laser-induced incandescence community model. The project uses an Agile and Test-Driven Development approach to implement the infrastructure for the development of a collaborative model that is to be extended, modified, and used by different researchers. We discuss some of the software-engineering practices that can be exploited through the life of a project, starting with its inception, when only a handful of developers are contributing to the project, and the mechanisms we have put in place to allow the natural expansion of the model.

Binary Instrumentation Support for Measuring Performance in OpenMP Programs
Mustafa Elfituri, Jeanine Cook, and Jonathan Cook
(New Mexico State University, USA)
In parallel computations, evaluating the causes of poor speedup is an important development activity on the way to creating the most efficient parallel computation possible. In our research on irregular parallel computations, especially graph algorithms, we had specific measurement needs for which few existing tools were available. We created PGOMP, a small library-based profiling tool for the GNU OpenMP implementation, and show its use here in discovering some of the causes of poor speedup in graph computations.
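
As a point of reference for the kind of measurement such a profiling library automates, the sketch below times each thread of an OpenMP parallel region using only the standard runtime routines omp_get_wtime, omp_get_thread_num, and omp_get_num_threads, and reports a simple load-imbalance ratio. The loop, the array sizes, and the imbalance metric are illustrative assumptions and do not reflect the PGOMP API.

/* Minimal sketch: per-thread timing of an OpenMP region to expose load
 * imbalance, one common cause of poor speedup in irregular computations.
 * This is not the PGOMP API; it only illustrates the kind of data such a
 * profiling library gathers automatically. */
#include <omp.h>
#include <stdio.h>

#define N 10000000
#define MAX_THREADS 256

static double data[N];

int main(void) {
    double t_thread[MAX_THREADS] = {0.0};
    int nthreads = 1;

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        #pragma omp single
        nthreads = omp_get_num_threads();

        double t0 = omp_get_wtime();

        /* Irregular work: later iterations cost more, so a static
         * schedule leaves the highest-numbered threads with most of it. */
        #pragma omp for schedule(static)
        for (long i = 0; i < N; i++)
            for (long j = 0; j < i / 100000; j++)
                data[i] += 1.0 / (double)(j + 1);

        t_thread[tid] = omp_get_wtime() - t0;
    }

    double tmax = 0.0, tsum = 0.0;
    for (int t = 0; t < nthreads; t++) {
        printf("thread %2d: %.4f s\n", t, t_thread[t]);
        if (t_thread[t] > tmax) tmax = t_thread[t];
        tsum += t_thread[t];
    }
    /* 1.0 means perfectly balanced; larger values indicate imbalance. */
    printf("imbalance ratio = %.2f\n", tmax / (tsum / nthreads));
    return 0;
}

Compiled with gcc -fopenmp and run with different OMP_NUM_THREADS settings, the per-thread times make the imbalance visible without modifying the computation itself; switching the schedule clause to dynamic typically shows the corresponding improvement.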

Software Design for Decoupled Parallel Meshing of CAD Models
Serban Georgescu and Peter Chow
(Fujitsu Labs, UK)
The creation of Finite Element (FE) meshes is one of the most time-consuming steps in FE analysis. While the exponential increase in computational power, following Moore's law, has gradually reduced the time spent in the FE solver, this has not generally been the case for FE mesh creation software. There are two main reasons for this: most FE meshers are still serial, and human intervention is generally required. In this paper we present the design of a system that tackles both of these issues. More specifically, this paper proposes a system that, in combination with an unmodified off-the-shelf serial meshing program and an off-the-shelf CAD kernel, results in a fast and scalable tool capable of meshing complex CAD models, such as the ones used in industry, with reduced user intervention. To achieve scalability, our system uses two levels of parallelism: assembly-level parallelism, across the multiple parts found in an assembly-type CAD model, and part-level parallelism, obtained by partitioning individual CAD solids into multiple sections at the CAD level. We show preliminary results for the parallel meshing of a complex laptop model, through which we highlight both some of the achieved benefits and the main challenges that need to be addressed in order to obtain good scalability.

Scientific Software Process Improvement Decisions: A Proposed Research Strategy
Erika S. Mesh and J. Scott Hawker
(Rochester Institute of Technology, USA)
Scientific research is hard enough; software shouldn't make it harder. While traditional software engineering development and management practices have been shown to be effective in scientific software projects, adoption of these practices has been limited. Rather than presume to create a prescriptive scientific software process improvement (SPI) manual, or leave scientists to determine their own plans with only minimal references as support, we posit that a hybrid approach is required to adequately support and guide scientific SPI decisions. This paper presents a grounded theory approach for determining the driving factors of scientific software process planning activities in order to generate supporting data for a proposed Scientific Software Process Improvement Framework (SciSPIF).

Water Science Software Institute: An Open Source Engagement Process
Stan Ahalt, Larry Band, Barbara Minsker, Margaret Palmer, Michael Tiemann, Ray Idaszak, Chris Lenhardt, and Mary Whitton
(RENCI, USA; University of North Carolina at Chapel Hill, USA; NCSA, USA; SESYNC, USA; Red Hat, USA)
We have conceptualized a public/private Water Science Software Institute (WSSI) whose mission is “to enable and accelerate transformative water science by concurrently transforming both the software culture and the research culture of the water science community”. To achieve our goals, we have applied an Open Community Engagement Process (OCEP), based in large part on the principles and practices of Agile and Open Source software development. This manuscript describes the WSSI and the OCEP model we have developed to operationalize the WSSI.

Techniques for Testing Scientific Programs Without an Oracle
Upulee Kanewala and James M. Bieman
(Colorado State University, USA)
The existence of an oracle is often assumed in software testing. But in many situations, especially for scientific programs, oracles do not exist or are too hard to implement. This paper examines three techniques that are used to test programs without oracles: (1) metamorphic testing, (2) run-time assertions, and (3) test oracles developed using machine learning. We examine these methods in terms of their (1) fault-finding ability, (2) automation, and (3) required domain knowledge. Several case studies apply these three techniques to effectively test scientific programs that do not have oracles. Certain techniques have shown better fault-finding ability than others when testing specific programs. Finally, there is potential to increase the level of automation of these techniques, thereby reducing the required level of domain knowledge. Techniques that can potentially be automated include (1) detection of likely metamorphic relations, (2) static analyses to eliminate spurious invariants, and (3) structural analyses to develop machine-learning-generated oracles.
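
To make the first of these techniques concrete, the sketch below checks a metamorphic relation on a numerical integration routine for which no exact reference value is assumed: the integral over an interval must equal the sum of the integrals over its two halves. The trapezoid routine, the integrand, and the tolerance are illustrative assumptions, not examples taken from the paper.

/* Minimal sketch of metamorphic testing: rather than comparing the output
 * of integrate() against a known correct value (an oracle), we check a
 * relation that must hold between related executions: the integral over
 * [a, b] must equal the sum of the integrals over [a, c] and [c, b]. */
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Program under test: composite trapezoid rule (illustrative). */
static double integrate(double (*f)(double), double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; i++)
        sum += f(a + i * h);
    return h * sum;
}

/* Integrand with no elementary antiderivative, so no simple oracle exists. */
static double g(double x) { return exp(-x * x); }

int main(void) {
    double a = 0.0, b = 2.0, c = 1.0;   /* c lies on the grid used below */
    int n = 1000;

    double whole = integrate(g, a, b, n);
    double parts = integrate(g, a, c, n / 2) + integrate(g, c, b, n / 2);

    /* Metamorphic relation: whole == parts up to round-off, even though
     * the exact value of the integral is unknown. */
    double tol = 1e-9 * fabs(whole);
    assert(fabs(whole - parts) <= tol && "metamorphic relation violated");
    printf("relation holds: %.15f vs %.15f\n", whole, parts);
    return 0;
}

The assert statement also illustrates the second technique: a run-time assertion encodes a property the computation must satisfy instead of comparing against a precomputed answer. An injected fault such as weighting the endpoints incorrectly in integrate breaks the relation and is detected without any oracle.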

Design and Rationale of a Quality Assurance Process for a Scientific Framework
Hanna Remmel, Barbara Paech, Christian Engwer, and Peter Bastian
(University of Heidelberg, Germany; University of Münster, Germany)
The testing of scientific frameworks is a challenging task. The special characteristics of scientific software, e.g., the missing test oracle, the need for high-performance parallel computing, and the high priority of non-functional requirements, need to be accounted for, as does the large variability in a framework. In our previous research, we have shown how software product line engineering can be applied to support the testing of scientific frameworks. We developed a process for handling the variability of a framework using software product line (SPL) variability modeling. From the variability models, we derive test applications and use them for system tests of the framework. In this paper we examine the overall quality assurance for a scientific framework. First, we propose an SPL test strategy for scientific frameworks called the Variable test Application strategy for Frameworks (VAF). This test strategy tests both the commonality and the variability of the framework and supports the framework's users in testing their applications by creating reusable test artifacts. We operationalize VAF with test activities that are combined with other quality assurance activities to form the design of a quality assurance process for scientific frameworks. We introduce a list of special characteristics of scientific software that we use as the rationale for the design of this process.

Implementing Continuous Integration Software in an Established Computational Chemistry Software Package
Robin M. Betz and Ross C. Walker
(San Diego Supercomputer Center, USA; UC San Diego, USA)
Continuous integration is the software engineering principle of rapid and automated development and testing. We identify several key points of continuous integration and demonstrate how they relate to the needs of computational science projects by discussing the implementation and relevance of these principles to AMBER, a large and widely used molecular dynamics software package. The use of a continuous integration server has improved collaboration and communication between AMBER's globally distributed developers, and it has made failure and benchmark information that would be time-consuming for individual developers to obtain by themselves available in real time. Currently available continuous integration servers are aimed at the software engineering community and can be difficult to adapt to the needs of computational science projects; however, as demonstrated in this paper, the effort pays off rapidly, since uncommon errors are found and contributions from geographically separated researchers are unified into one easily accessible web-based interface.

Practical Formal Correctness Checking of Million-Core Problem Solving Environments for HPC
Diego Caminha B. de Oliveira, Zvonimir Rakamarić, Ganesh Gopalakrishnan, Alan Humphrey, Qingyu Meng, and Martin Berzins
(University of Utah, USA)
While formal correctness checking methods have been deployed at scale in a number of important practical domains, we believe that such an experiment has yet to occur in the domain of high performance computing at the scale of a million CPU cores. This paper presents preliminary results from the Uintah Runtime Verification (URV) project that has been launched with this objective. Uintah is an asynchronous task-graph based problem-solving environment that has shown promising results on problems as diverse as fluid-structure interaction and turbulent combustion at well over 200K cores to date. Uintah has been tested on leading platforms such as Kraken, Keeneland, and Titan consisting of multicore CPUs and GPUs, incorporates several innovative design features, and is following a roadmap for development well into the million-core regime. The main results from the URV project to date are crystallized in two observations: (1) A diverse array of well-known ideas from lightweight formal methods and testing/observing HPC systems at scale has an excellent chance of succeeding. The real challenges are in finding out exactly which combinations of ideas to deploy, and where. (2) Large-scale problem solving environments for HPC must be designed such that they can be "crashed early" (at smaller scales of deployment) and "crashed often" (have effective ways of input generation and schedule perturbation that cause vulnerabilities to be attacked with higher probability). Furthermore, following each crash, one must "explain well" (given the extremely obscure ways in which an error finally manifests itself, we must develop ways to record information leading up to the crash in informative ways, to minimize off-site debugging burden). Our plans to achieve these goals and to measure our success are described. We also highlight some of the broadly applicable concepts and approaches.

Position Papers

DSLs, DLA, DxT, and MDE in CSE
Bryan Marker, Robert Van de Geijn, and Don Batory
(University of Texas at Austin, USA)
We narrate insights from a collaboration between researchers in Software Engineering (SE) and in the domain of Dense Linear Algebra (DLA) libraries. We highlight our impressions of how software development for computational science has traditionally been different from the development of software in other domains. We observe that scientific software (at least DLA libraries) is often developed by domain experts rather than legions of programmers. For this reason, researchers in SE need to impact the productivity of experts rather than the productivity of the masses. We document this and other lessons learned.

Towards Flexible Automated Support to Improve the Quality of Computational Science and Engineering Software
Davide Falessi and Forrest Shull
(Fraunhofer CESE, USA)
Continual evolution of the available hardware (e.g., in terms of increasing size, architecture, and computing power) and software (e.g., reusable libraries) is the norm rather than the exception. Our goal is to enable CSE developers to spend more of their time finding scientific results by capitalizing on these evolutions instead of being stuck fixing software engineering (SE) problems such as porting the application to new hardware, debugging, reusing (unreliable) code, and integrating open source libraries. In this paper we sketch a flexible automated solution supporting scientists and engineers in developing accurate and reliable CSE applications. This solution, by collecting and analyzing product and process metrics, enables the application of well-established software engineering best practices (e.g., separation of concerns, regression testing, and inspections), and it is based upon the principles of automation, flexibility, and iteration.

Implicit Provenance Gathering through Configuration Management
Vitor C. Neves, Vanessa Braganholo, and Leonardo Murta
(UFF, Brazil)
Scientific experiments based on computer simulations usually consume and produce huge amounts of data. Data provenance is used to help scientists answer queries related to how experiment data were generated or changed. However, during experiment execution, data not explicitly referenced by the experiment specification may lead to an implicit data flow that is missed by existing provenance-gathering infrastructures. This paper introduces a novel approach to gather and store implicit data flow provenance through configuration management. Our approach opens some new opportunities in terms of provenance analysis, such as identifying implicit data flows, identifying data transformations along an experiment trial, comparing data evolution in different trials of the same experiment, and identifying side effects on data evolution caused by implicit data flows.

Exploring Issues in Software Systems Used and Developed by Domain Experts
Jette Henderson and Dewayne E. Perry
(University of Texas at Austin, USA)
Software engineering researchers have paid a good deal of attention to fault cause, discovery, and repair in software systems developed by software professionals. Yet, not all software is developed by software professionals, and consequently, not as much research has been conducted that explores fault cause, discovery, and repair in software systems developed by domain experts. In this exploratory paper, we outline research plans for studying the types of faults that domain experts encounter when developing software for their own research. To attain this goal we propose a multiple case study that will allow us to explore questions about domain expert software use, needs, and development.
