PLDI 2018 Workshops
39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018)
Powered by
Conference Publishing Consulting

5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY 2018), June 19, 2018, Philadelphia, PA, USA

ARRAY 2018 – Advance Table of Contents

Contents - Abstracts - Authors

5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY 2018)


Title Page

Message from the Chairs
Welcome to the proceedings of the fifth ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, held on June 19, 2018, in conjunction with PLDI 2018 in Philadelphia, USA. The purpose of the workshop is to discuss experience with, and future developments of, programming with arrays, as well as general aspects of Computer Science loosely centered on the general theme of arrays. The intention of the steering committee is that the workshop provide an annual focal point where the array-programming community can gather and share ideas: researchers, educators, implementors, programmers, hobbyists, and enthusiasts of all stripes—all welcome.

Array Language Commonalities

A Rosetta Stone for Array Languages
Artjoms Šinkarovs, Robert Bernecky, Hans-Nikolai Vießmann, and Sven-Bodo Scholz
(Heriot-Watt University, UK; Snake Island Research, Canada)
This paper aims to foster cross-fertilisation between programming language and compiler research performed on different array programming language infrastructures. We study how to enable better comparability of concepts and techniques by looking into generic translations between array languages. Our approach is based on the idea of a basic core language Heh which only captures the absolute essentials of array languages: multi-dimensional arrays and shape-invariant operations on them. Subsequently, we investigate how to map these constructs into several existing languages: SaC, APL, Julia, Python, and C. This approach provides us with some first understanding on how the peculiarities of these languages affect their suitability for expressing the basic building-blocks of array languages. We show that the existing tool-chains by-and-large are very sensitive to the way code is specified. Furthermore, we expose several fundamental programming patterns where optimisations missing in one or the other tool chain inhibit fair comparisons and, with it, cross-fertilisation.
Article Search

Exploiting Dynamic Information

Petalisp: Run Time Code Generation for Operations on Strided Arrays
Marco Heisig and Harald Köstler
(Friedrich-Alexander University Erlangen-Nürnberg, Germany)
We present the data parallel programming library Petalisp --- an extension of Common Lisp for data parallel programming. The core of Petalisp is deliberately simple. It features only a single data structure --- the strided array --- and four operations on such strided arrays, namely element-wise application of functions, reduction with a binary function, fusion of several arrays and affine-linear reshaping. The novelty of our approach is that the contents of each strided array are subject to lazy evaluation and that we delay performance analysis, optimization and code generation entirely to the run time. In doing so, we considerably increase the ability of our compiler to make qualified decisions, at the price of significant run time overhead. We show that this overhead is manageable and that Petalisp is able to execute several thousand explicit array evaluations per second.
Article Search
Profile-Based Vectorization for MATLAB
Patryk Kiepas, Jaroslaw Kozlak, Claude Tadonki, and Corinne Ancourt
(MINES ParisTech, France; AGH University of Science and Technology, Poland)
In recent years, MATLAB's just-in-time (JIT) interpreter has improved the execution time of for-loops to the extent that loops can outperform equivalent array operations in some scenarios. This has caused systematic translation of loops to array operations, a prevalent approach for performance improvement in MATLAB, to sometimes yield a performance loss. Therefore, we propose a selective strategy to loop translation with selection criteria guided by loop profiling data. As a result, only loops with a high-performance speedup potential are selected for translation to array operations. The results of our experiments confirm the efficiency of our approach and illustrate the cases where systematic translation leads to a performance degradation.
Article Search Info

Types and Correctness

Parallel Programming with Arrays in Kappa 
Beatrice Åkerblom, Elias Castegren, and Tobias Wrigstad
(Stockholm University, Sweden; Uppsala University, Sweden)
Array algorithms where operations are applied to disjoint parts of an array lend themselves well to parallelism, since parallel threads can operate on the parts of the array without synchronisation. However, implementing such algorithms requires programmer diligence to verify that a thread does not accidentally access an element of the array while another thread is updating the same element. An off-by-one error can lead to data-races and non-deterministic bugs which are notoriously hard to track down. Previous work on Kappa, a capability-based type system, provides data-race freedom for concurrent, object-oriented programs, even in the presence of concurrent mutating accesses to the same object. In this paper we show how Kappa can be extended to handle concurrent programming with arrays. By defining array capabilities which grant access to (parts of) an array, we can piggy-back on the existing type system in a straightforward fashion. We illustrate how split and merge operations integrate with Kappa in practise by discussing the implementation of a divide-and-conquer quicksort algorithm. We explore the semantics of the Kappa extension by using a simple imperative calculus and sketch on how it could be implemented efficiently.
Article Search
Rank Polymorphism Viewed as a Constraint Problem
Justin Slepak, Panagiotis Manolios, and Olin Shivers
(Northeastern University, USA)
Rank polymorphism serves as a type of control flow used in array-oriented languages, where functions are automatically lifted to operate on high-dimensional arguments. The iteration space is derived directly from the shape of the data, presenting a challenge to compilation. A type system can characterize data shape, though the level of detail is beyond what can be reasonably expected from entirely human-generated annotations. The task of checking or inferring shapes can be phrased as solving constraints in the theory of the free monoid over the natural numbers, but the constraints involve both universal and existential quantification. Here is a plan of attack for leveraging past work on decision procedures, which has generally focused on the purely existential fragment of the theory.
Article Search
Proving a Core Code for FDM Correct by 2 + dw Tests
Magne Haveraaen
(University of Bergen, Norway)

Software correctness in general is a hard problem, and especially so for high performance computing (HPC). One problem being that array layout and traversal may depend on array size and hardware properties (cache size, core count, etc), making verification almost specific to every execution of the software. Can one of the reasons for the difficulties be that we are focussing too much on specific instances and low level detail, and not enough on abstraction and generic code? There exists several meta-theorems for generic code, one promising correctness by testing of minimal data sets, another free theorems.

Here we present three generic array algorithms which can form the core when implementing explicit solvers for finite difference methods (FDM). Knowing the size for the computational grid, the first two generic algorithms requires one test each to ensure correctness. The third requires dw tests, where d is the number of dimensions, e.g., 3, and w is the width of the difference operator, e.g., 3 or 5. This verification is fast enough to be an integral part of a simulation run, which typically will have hundreds of calls with these parameter combinations.

Generic implementations also provide free theorems, which gives rise to optimisation rules for the FDM code.

We illustrate this on a 3D Burgers’ solver using these generic core codes implemented for CPU and GPU.

Article Search

Accessing the Memory System

Inner Array Inlining for Structure of Arrays Layout
Matthias Springer, Yaozhu Sun, and Hidehiko Masuhara
(Tokyo Institute of Technology, Japan)
Previous work has shown how the well-studied and SIMD-friendly Structure of Arrays (SOA) data layout strategy can speed up applications in high-performance computing compared to a traditional Array of Structures (AOS) data layout. However, a standard SOA layout cannot handle structures with inner arrays; such structures appear frequently in graph-based applications and object-oriented designs with associations of high multiplicity. This work extends the SOA data layout to structures with array-typed fields. We present different techniques for inlining (embedding) inner arrays into an AOS or SOA layout, as well as the design and implementation of an embedded C++/CUDA DSL that lets programmers write such layouts in a notation close to standard C++. We evaluate several layout strategies with a traffic flow simulation, an important real-world application in transport planning.
Article Search Info
An Array API for Finite Difference Methods
Eva Burrows, Helmer André Friis, and Magne Haveraaen
(University of Bergen, Norway; IRIS, Norway)
As we move towards exascale computing, computer architecture is bound to see dramatic changes. Multiple nodes, with or without shared memory, multicore and accelerators (GPUs, FPGAs) will be the norm. For many domains, such as finite difference numerical simulations, the array used to represent a perfect match between the user level code and the hardware architecture’s uniform memory access, well supported by programming languages and compilers. Facing the exascale challenge, we propose replacing the compiler supported array by an array API, empowering the software developer to implement their own array-memory layout. Application code written towards such an API will be independent of underlying architecture changes, thus easily ported between architectures. Here we demonstrate the viability of this approach by demonstrating an array API for finite difference solvers.
Article Search

proc time: 0.18