SEPS 2017 – Proceedings

Message from the Chairs
Welcome to the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems (SEPS 2017), held in Vancouver, Canada, on October 23, 2017, and co-located with the ACM SIGPLAN conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH 2017). The purpose of this workshop is to provide a stable forum for researchers and practitioners dealing with the compelling challenges of the software development life cycle on modern parallel platforms.

Full Papers

MALT: A Malloc Tracker
Sébastien Valat, Andres S. Charif-Rubial, and William Jalby
(Exascale Computing Research, France; University of Versailles, France)
In the early days of computing, memory management was a major issue, as applications had to fit into a very small amount of memory (a few kilobytes). Hardware evolution over the past years has made this resource cheap, and available memory now reaches a few hundred gigabytes. However, the multi/many-core era is bringing some of these issues back: available memory does not grow as fast as the number of cores, so memory per thread is becoming scarce again. New issues also arise from having to manage a larger address space with many more allocated objects, which increases the likelihood of both memory leaks and memory-management performance problems. With MALT, we therefore provide a tool that tracks the memory allocated by an application and maps the extracted metrics onto the source code, much as KCachegrind does with Valgrind for CPU performance. Unlike most available tools, MALT can also be used to track potential performance losses due to poor allocation patterns (too many allocations, small allocations, recycling of large allocations, short-lived allocations, etc.) through the various metrics it exposes to the user. This paper details the metrics extracted by MALT and how we present them to the user through a web-based graphical interface, something most available Linux tools lack.

Performance Analysis and Optimization of the RAMPAGE Metal Alloy Potential Generation Software
Philip C. Roth, Hongzhang Shan, David Riegner, Nikolas Antolin, Sarat Sreepathi, Leonid Oliker, Samuel Williams, Shirley Moore, and Wolfgang Windl
(Oak Ridge National Laboratory, USA; Lawrence Berkeley National Laboratory, USA; Ohio State University, USA)
The Rapid Alloy Method for Producing Accurate, General Empirical potential generation toolkit (RAMPAGE) is a program for fitting multicomponent interatomic potential functions for metal alloys. In this paper, we describe a collaborative effort between domain scientists and performance engineers to improve the parallelism, scalability, and maintainability of the code. We modified RAMPAGE to use the Message Passing Interface (MPI) for communication and synchronization, to use more than one MPI process when evaluating candidate potential functions, and to have its MPI processes execute functionality that was previously performed by external non-MPI processes. We ported RAMPAGE to the Eos and Titan Cray systems at the Oak Ridge Leadership Computing Facility (OLCF) of the United States Department of Energy (DOE), and to the Cori and Edison systems at the DOE's National Energy Research Scientific Computing Center (NERSC). Our modifications resulted in a 7× speedup on 8 Eos nodes and scalability up to 2048 processes on the Cori system with Intel Knights Landing processors. To improve the maintainability of the RAMPAGE source code, we also introduced several software engineering best practices into the RAMPAGE developers' workflow.

The Influence of HPCToolkit and Score-P on Hardware Performance Counters
Jan-Patrick Lehr, Christian Iwainsky, and Christian Bischof
(TU Darmstadt, Germany)
Performance measurement and analysis are routine tasks for high-performance computing applications. Both sampling and instrumentation approaches to performance measurement can capture hardware performance counter (HWPC) metrics to assess how well the software uses the functional units of the processor. Since the measurement software usually executes on the same processor, it necessarily competes with the target application for hardware resources. Consequently, the measurement system perturbs the target application, which often results in runtime overhead. While the runtime overhead of different measurement techniques has been studied before, the extent to which HWPC values are perturbed by the measurement process has not been thoroughly examined. In this paper, we investigate how two widely used performance measurement systems, HPCToolkit (sampling) and Score-P (instrumentation), influence HWPC. Our experiments on the SPEC CPU 2006 C/C++ benchmarks show that, while Score-P's default instrumentation can massively increase runtime, it does not always heavily perturb relevant HWPC. HPCToolkit, on the other hand, shows no significant runtime overhead but significantly influences some relevant HWPC. We conclude that, for every performance experiment, sufficient baseline measurements are essential to identify the HWPC that remain valid performance indicators for a given measurement technique. Performance analysis tools therefore need to offer easily accessible means to automate baseline measurement and validation.

Transactional Actors: Communication in Transactions
Janwillem Swalens, Joeri De Koster, and Wolfgang De Meuter
(Vrije Universiteit Brussel, Belgium)
Developers often require different concurrency models to fit the various concurrency needs of the different parts of their applications. Many programming languages, such as Clojure, Scala, and Haskell, cater to this need by incorporating different concurrency models. It has been shown that, in practice, developers often combine these concurrency models. However, they are often combined in an ad hoc way and the semantics of the combination is not always well-defined. The starting hypothesis of this paper is that different concurrency models need to be carefully integrated such that the properties of each individual model are still maintained.
This paper proposes one such combination, namely the combination of the actor model and software transactional memory. In this paper we show that, while both individual models offer strong safety guarantees, these guarantees are no longer valid when they are combined. The main contribution of this paper is a novel hybrid concurrency model called transactional actors that combines both models while preserving their guarantees. This paper also presents an implementation in Clojure and an experimental evaluation of the performance of the transactional actor model.

Position Papers

How to Test Your Concurrent Software: An Approach for the Selection of Testing Techniques
Silvana Morita Melo, Simone do Rocio Senger de Souza, Paulo Sergio Lopes de Souza, and Jeffrey C. Carver
(University of São Paulo, Brazil; University of Alabama, USA)
High-Performance Computing (HPC) applications consist of concurrent programs with multi-process and/or multithreaded models and varying degrees of parallelism. Although their design patterns, models, and principles are similar to those of sequential programs, their non-deterministic behavior makes testing more complex. To address this complexity, several techniques for testing concurrent software have been developed over the past years. However, transferring knowledge between academia and industry remains a challenge, mainly due to the lack of a solid base of evidence to inform the decision-making process. This paper proposes the construction of a body of evidence for the concurrent programming field that supports the selection of an adequate testing technique for a software project. We propose a characterization schema that supports decision-making and is based on relevant information from the technical literature regarding available techniques, attributes, and concepts of concurrent programming that affect the testing process. The schema was used to classify 109 studies, which make up the preliminary body of evidence. We validated the schema through a survey of specialists, who assessed the adequacy and relevance of the defined attributes. The results indicate that the schema is effective and can support teams testing concurrent applications.

Declaring Lua Data Types for GPU Code Generation
Paulo Motta
(Motta & SantAnna Pesquisa e Desenvolvimento, Brazil)
Some effort has been devoted to enabling interpreted languages to take advantage of the computing capabilities of GPUs. Interpreted languages make it possible to abstract the hardware and its specifics away from the user application, making development less complicated. However, due to hardware dependencies, the code still needs to be compiled before execution. We want to compile a Lua function into a GPU kernel as transparently as possible, giving the user access to the underlying hardware without the complexities of traditional GPU programming. The main challenge in this scenario is inferring the data types of variables while interfering as little as possible with the user's programming paradigm.

Facilitating Collaboration in High-Performance Computing Projects with an Interaction Room
Matthias Book, Morris Riedel, Helmut Neukirchen, and Markus Götz
(University of Iceland, Iceland; Jülich Supercomputing Centre, Germany)
The design, development, and deployment of scientific computing applications can be quite complex, as they require scientific, high-performance computing (HPC), and software engineering expertise. However, HPC applications are often developed by end users who are experts in their scientific domain but need support from a supercomputing centre for the engineering and optimization aspects. Cooperation and communication between experts from such different disciplines can be difficult. We therefore propose to employ the Interaction Room, a technique that facilitates interdisciplinary collaboration in complex software projects.
