ARRAY 2019 – Proceedings

Welcome from the Chair
Array-oriented programming offers a unique blend of programmer productivity and high-performance parallel execution. As an abstraction, it directly mirrors high-level mathematical constructions commonly used in many fields from natural sciences through engineering to financial modelling. As a language feature, it exposes regular control flow, exhibits structured data dependencies, and lends itself to many types of program analysis. Furthermore, many modern computer architectures, particularly highly parallel architectures such as GPUs and FPGAs, lend themselves to efficiently executing array operations.
The ARRAY workshop series is intended to bring together researchers from many different communities, including language designers, library developers, compiler researchers, and practitioners, who are using or working on numeric, array-centric aspects of programming languages, libraries and methodologies from all domains: imperative or declarative; object-oriented or functional; interpreted or compiled; strongly typed, weakly typed, or untyped.

Keynote

Array Processing on Steroids for the SKA Radio-Telescope (Keynote)
Peter J. Braam
(University of Oxford, UK)
The Square Kilometre Array (SKA) radio telescope will be a massive scientific instrument entering service in the late 2020's. The conversion of its antenna signals to images and the detection of transient phenomena is a massive computational undertaking, requiring 200PB/sec of memory bandwidth, all dedicated to array processing. In this talk we will give an overview of the data processing in the telescope and the process that has been followed to design suitable algorithms and systems. We will highlight parts of the challenge that have interesting relationships to computer science, and then transition to review recent technological developments such as memory, machine learning accelerators, and new floating point formats that may prove helpful.

Publisher's Version

Papers

Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Benjamin Chetioui, Lenore Mullin, Ole Abusdal, Magne Haveraaen, Jaakko Järvi, and Sandra Macià
(University of Bergen, Norway; SUNY Albany, USA; Barcelona Supercomputing Center, Spain)
Numerous scientific-computational domains make use of array data. The core computing of the numerical methods and the algorithms involved is related to multi-dimensional array manipulation. Memory layout and the access patterns of that data are crucial to the optimal performance of the array-based computations. As we move towards exascale computing, writing portable code for efficient data parallel computations is increasingly requiring an abstract productive working environment. To that end, we present the design of a framework for optimizing scientific array-based computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous high-level application layer from the discrete formulation of the collective array-based numerical methods and algorithms and the final detailed low-level code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture.

Publisher's Version

Data-Parallel Flattening by Expansion
Martin Elsman

, Troels Henriksen

, and Niels Gustav Westphal Serup
(University of Copenhagen, Denmark)
We present a higher-order programmer-level technique for compiling particular kinds of irregular data-parallel problems to parallel hardware. The technique, which we have named ``flattening-by-expansion'' builds on a number of segmented data-parallel operations but is itself implemented as a higher-order generic function, which makes it useful for many irregular problems. Concretely, the implementation is given in Futhark and we demonstrate the usefulness of the functionality for a number of irregular problems and show that, in practice, the irregular problems are compiled to efficient parallel code that can be executed on GPUs. The technique is useful in any data-parallel language that provides a key set of primitives.

Publisher's Version

ALPyNA: Acceleration of Loops in Python for Novel Architectures
Dejice Jacob and Jeremy Singer

(University of Glasgow, UK)
We present ALPyNA, an automatic loop parallelization framework for Python, which analyzes data dependences within nested loops and dynamically generates CUDA kernels for GPU execution. The ALPyNA system applies classical dependence analysis techniques to discover and exploit potential parallelism. The skeletal structure of the dependence graph is determined statically (if possible) or at runtime; this is combined with type and bounds information discovered at runtime, to auto-generate high-performance kernels for offload to GPU.
We demonstrate speedups of up to 1000x relative to the native CPython interpreter across four array-intensive numerical Python benchmarks. Performance improvement is related to both iteration domain size and dependence graph complexity. Nevertheless, this approach promises to bring the benefits of manycore parallelism to application developers.

Publisher's Version

High-Level Synthesis of Functional Patterns with Lift
Martin Kristien, Bruno Bodin, Michel Steuwer

, and Christophe Dubach
(University of Edinburgh, UK; Yale-NUS College, Singapore; University of Glasgow, UK)
High-level languages are commonly seen as a good fit to tackle the problem of performance portability across parallel architectures. The Lift framework is a recent approach which combines high-level, array-based programming abstractions, with a system of rewrite-rules that express algorithmic as well as low-level hardware optimizations. Lift has successfully demonstrated its ability to address the challenge of performance portability across multiple types of CPU and GPU devices by automatically generating code that is on-par with highly optimized hand-written code.
This paper demonstrates the potential of Lift for targeting FPGA-based platforms. It presents the design of new Lift parallel patterns operating on data streams, and describes the implementation of a Lift VHDL backend. This approach is evaluated on a Xilinx XC7Z010 FPGA using matrix multiplication, leading to a 10x speed-up over highly optimized CPU code and a commercial HLS tool. Furthermore, by considering the potential of design space exploration enabled by Lift, this work is a stepping stone towards automatically generated competitive code for FPGAs.

Publisher's Version

Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods
Erdal Mutlu, Karol Kowalski, and Sriram Krishnamoorthy
(Pacific Northwest National Laboratory, USA)
The widespread use of tensor operations in describing electronic structure calculations has motivated the design of software frameworks for productive development of scalable optimized tensor-based electronic structure methods. Whereas prior work focused on Cartesian abstractions for dense tensors, we present an algebra to specify and perform tensor operations on a larger class of block-sparse tensors. We illustrate the use of this framework in expressing real-world computational chemistry calculations beyond the reach of existing frameworks.

Publisher's Version

TeIL: A Type-Safe Imperative Tensor Intermediate Language
Norman A. Rink and Jeronimo Castrillon

(TU Dresden, Germany)
Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels.
The tensor languages are superficially similar to the well-known functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL's formal semantics we develop a type-safety result in the Coq proof assistant.

Publisher's Version

Convolutional Neural Networks in APL
Artjoms Šinkarovs

, Robert Bernecky, and Sven-Bodo Scholz
(Heriot-Watt University, UK; Snake Island Research, Canada)
This paper shows how a Convolutional Neural Network (CNN) can be implemented in APL. Its first-class array support ideally fits that domain, and the operations of APL facilitate rapid and concise creation of generically reusable building blocks. For our example, only ten blocks are needed, and they can be expressed as ten lines of native APL. All these blocks are purely functional, and they are built out of a small number of builtin operators, resulting in a highly portable specification that is immediately runnable and should be suitable for high-performance optimizations and parallel execution. This implies that APL can be seen as a framework to define shallowly-embedded machine learning DSLs without any external dependencies, making them useful at least for experiments and prototyping. We explain the construction of each CNN building block, and briefly discuss the performance of the resulting specification.

Publisher's Version

Info

Records with Rank Polymorphism
Justin Slepak, Olin Shivers

, and Panagiotis Manolios
(Northeastern University, USA)
In a rank-polymorphic programming language, all functions automatically lift to operate on arbitrarily high-dimensional aggregate data. By adding records to such a language, we can support computation on data frames, a tabular data structure containing heterogeneous data but in which individual columns are homogeneous. In such a setting, a data frame is a vector of records, subject to both ordinary array operations (, filtering, reducing, sorting) and lifted record operations—projecting a field lifts to projecting a column. Data frames have become a popular tool for exploratory data analysis, but fluidity of interacting with data frames via lifted record operations depends on how the language’s records are designed.
We investigate three languages with different notions of record data: Racket, Standard ML, and Python. For each, we examine several common tasks for working with data frames and how the language’s records make these tasks easy or hard. Based on their advantages and disadvantages, we synthesize their ideas to produce a design for record types which is flexible for both scalar and lifted computation.

Publisher's Version

Linear Algebraic Depth-First Search
Daniele G. Spampinato, Upasana Sridhar, and Tze Meng Low

(Carnegie Mellon University, USA)
There is a recent push by a segment of the graph community to implement graph algorithms in the language of linear algebra. However, graph algorithms that depend on depth-first search (DFS) techniques are often highlighted as limitations of the linear algebraic approach as linear algebraic formulation of DFS algorithms are few, if any. This paper provides a linear algebraic approach for developing DFS graph algorithms and demonstrates its use for defining three classical DFS-based computations: Binary tree traversal, topological sort, and biconnected components.

Publisher's Version

ARRAY 2019 – Proceedings

6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY 2019)

Frontmatter

Keynote

Papers