Workshop ARRAY 2019 – Author Index 
Contents 
Abstracts 
Authors

Abusdal, Ole 
ARRAY '19: "Finite Difference Methods ..."
Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Benjamin Chetioui, Lenore Mullin, Ole Abusdal, Magne Haveraaen, Jaakko Järvi, and Sandra Macià (University of Bergen, Norway; SUNY Albany, USA; Barcelona Supercomputing Center, Spain) Numerous scientificcomputational domains make use of array data. The core computing of the numerical methods and the algorithms involved is related to multidimensional array manipulation. Memory layout and the access patterns of that data are crucial to the optimal performance of the arraybased computations. As we move towards exascale computing, writing portable code for efficient data parallel computations is increasingly requiring an abstract productive working environment. To that end, we present the design of a framework for optimizing scientific arraybased computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous highlevel application layer from the discrete formulation of the collective arraybased numerical methods and algorithms and the final detailed lowlevel code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture. @InProceedings{ARRAY19p2, author = {Benjamin Chetioui and Lenore Mullin and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Sandra Macià}, title = {Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {213}, doi = {10.1145/3315454.3329954}, year = {2019}, } Publisher's Version Article Search 

Bernecky, Robert 
ARRAY '19: "Convolutional Neural Networks ..."
Convolutional Neural Networks in APL
Artjoms Šinkarovs, Robert Bernecky, and SvenBodo Scholz (HeriotWatt University, UK; Snake Island Research, Canada) This paper shows how a Convolutional Neural Network (CNN) can be implemented in APL. Its firstclass array support ideally fits that domain, and the operations of APL facilitate rapid and concise creation of generically reusable building blocks. For our example, only ten blocks are needed, and they can be expressed as ten lines of native APL. All these blocks are purely functional, and they are built out of a small number of builtin operators, resulting in a highly portable specification that is immediately runnable and should be suitable for highperformance optimizations and parallel execution. This implies that APL can be seen as a framework to define shallowlyembedded machine learning DSLs without any external dependencies, making them useful at least for experiments and prototyping. We explain the construction of each CNN building block, and briefly discuss the performance of the resulting specification. @InProceedings{ARRAY19p69, author = {Artjoms Šinkarovs and Robert Bernecky and SvenBodo Scholz}, title = {Convolutional Neural Networks in APL}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {6979}, doi = {10.1145/3315454.3329960}, year = {2019}, } Publisher's Version Article Search Info 

Bodin, Bruno 
ARRAY '19: "HighLevel Synthesis of Functional ..."
HighLevel Synthesis of Functional Patterns with Lift
Martin Kristien, Bruno Bodin, Michel Steuwer, and Christophe Dubach (University of Edinburgh, UK; YaleNUS College, Singapore; University of Glasgow, UK) Highlevel languages are commonly seen as a good fit to tackle the problem of performance portability across parallel architectures. The Lift framework is a recent approach which combines highlevel, arraybased programming abstractions, with a system of rewriterules that express algorithmic as well as lowlevel hardware optimizations. Lift has successfully demonstrated its ability to address the challenge of performance portability across multiple types of CPU and GPU devices by automatically generating code that is onpar with highly optimized handwritten code. This paper demonstrates the potential of Lift for targeting FPGAbased platforms. It presents the design of new Lift parallel patterns operating on data streams, and describes the implementation of a Lift VHDL backend. This approach is evaluated on a Xilinx XC7Z010 FPGA using matrix multiplication, leading to a 10x speedup over highly optimized CPU code and a commercial HLS tool. Furthermore, by considering the potential of design space exploration enabled by Lift, this work is a stepping stone towards automatically generated competitive code for FPGAs. @InProceedings{ARRAY19p35, author = {Martin Kristien and Bruno Bodin and Michel Steuwer and Christophe Dubach}, title = {HighLevel Synthesis of Functional Patterns with Lift}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {3545}, doi = {10.1145/3315454.3329957}, year = {2019}, } Publisher's Version Article Search 

Braam, Peter J. 
ARRAY '19: "Array Processing on Steroids ..."
Array Processing on Steroids for the SKA RadioTelescope (Keynote)
Peter J. Braam (University of Oxford, UK) The Square Kilometre Array (SKA) radio telescope will be a massive scientific instrument entering service in the late 2020's. The conversion of its antenna signals to images and the detection of transient phenomena is a massive computational undertaking, requiring 200PB/sec of memory bandwidth, all dedicated to array processing. In this talk we will give an overview of the data processing in the telescope and the process that has been followed to design suitable algorithms and systems. We will highlight parts of the challenge that have interesting relationships to computer science, and then transition to review recent technological developments such as memory, machine learning accelerators, and new floating point formats that may prove helpful. @InProceedings{ARRAY19p1, author = {Peter J. Braam}, title = {Array Processing on Steroids for the SKA RadioTelescope (Keynote)}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {11}, doi = {10.1145/3315454.3329953}, year = {2019}, } Publisher's Version Article Search 

Castrillon, Jeronimo 
ARRAY '19: "TeIL: A TypeSafe Imperative ..."
TeIL: A TypeSafe Imperative Tensor Intermediate Language
Norman A. Rink and Jeronimo Castrillon (TU Dresden, Germany) Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels. The tensor languages are superficially similar to the wellknown functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL's formal semantics we develop a typesafety result in the Coq proof assistant. @InProceedings{ARRAY19p57, author = {Norman A. Rink and Jeronimo Castrillon}, title = {TeIL: A TypeSafe Imperative Tensor Intermediate Language}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {5768}, doi = {10.1145/3315454.3329959}, year = {2019}, } Publisher's Version Article Search 

Chetioui, Benjamin 
ARRAY '19: "Finite Difference Methods ..."
Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Benjamin Chetioui, Lenore Mullin, Ole Abusdal, Magne Haveraaen, Jaakko Järvi, and Sandra Macià (University of Bergen, Norway; SUNY Albany, USA; Barcelona Supercomputing Center, Spain) Numerous scientificcomputational domains make use of array data. The core computing of the numerical methods and the algorithms involved is related to multidimensional array manipulation. Memory layout and the access patterns of that data are crucial to the optimal performance of the arraybased computations. As we move towards exascale computing, writing portable code for efficient data parallel computations is increasingly requiring an abstract productive working environment. To that end, we present the design of a framework for optimizing scientific arraybased computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous highlevel application layer from the discrete formulation of the collective arraybased numerical methods and algorithms and the final detailed lowlevel code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture. @InProceedings{ARRAY19p2, author = {Benjamin Chetioui and Lenore Mullin and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Sandra Macià}, title = {Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {213}, doi = {10.1145/3315454.3329954}, year = {2019}, } Publisher's Version Article Search 

Dubach, Christophe 
ARRAY '19: "HighLevel Synthesis of Functional ..."
HighLevel Synthesis of Functional Patterns with Lift
Martin Kristien, Bruno Bodin, Michel Steuwer, and Christophe Dubach (University of Edinburgh, UK; YaleNUS College, Singapore; University of Glasgow, UK) Highlevel languages are commonly seen as a good fit to tackle the problem of performance portability across parallel architectures. The Lift framework is a recent approach which combines highlevel, arraybased programming abstractions, with a system of rewriterules that express algorithmic as well as lowlevel hardware optimizations. Lift has successfully demonstrated its ability to address the challenge of performance portability across multiple types of CPU and GPU devices by automatically generating code that is onpar with highly optimized handwritten code. This paper demonstrates the potential of Lift for targeting FPGAbased platforms. It presents the design of new Lift parallel patterns operating on data streams, and describes the implementation of a Lift VHDL backend. This approach is evaluated on a Xilinx XC7Z010 FPGA using matrix multiplication, leading to a 10x speedup over highly optimized CPU code and a commercial HLS tool. Furthermore, by considering the potential of design space exploration enabled by Lift, this work is a stepping stone towards automatically generated competitive code for FPGAs. @InProceedings{ARRAY19p35, author = {Martin Kristien and Bruno Bodin and Michel Steuwer and Christophe Dubach}, title = {HighLevel Synthesis of Functional Patterns with Lift}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {3545}, doi = {10.1145/3315454.3329957}, year = {2019}, } Publisher's Version Article Search 

Elsman, Martin 
ARRAY '19: "DataParallel Flattening by ..."
DataParallel Flattening by Expansion
Martin Elsman, Troels Henriksen, and Niels Gustav Westphal Serup (University of Copenhagen, Denmark) We present a higherorder programmerlevel technique for compiling particular kinds of irregular dataparallel problems to parallel hardware. The technique, which we have named ``flatteningbyexpansion'' builds on a number of segmented dataparallel operations but is itself implemented as a higherorder generic function, which makes it useful for many irregular problems. Concretely, the implementation is given in Futhark and we demonstrate the usefulness of the functionality for a number of irregular problems and show that, in practice, the irregular problems are compiled to efficient parallel code that can be executed on GPUs. The technique is useful in any dataparallel language that provides a key set of primitives. @InProceedings{ARRAY19p14, author = {Martin Elsman and Troels Henriksen and Niels Gustav Westphal Serup}, title = {DataParallel Flattening by Expansion}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1424}, doi = {10.1145/3315454.3329955}, year = {2019}, } Publisher's Version Article Search 

Haveraaen, Magne 
ARRAY '19: "Finite Difference Methods ..."
Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Benjamin Chetioui, Lenore Mullin, Ole Abusdal, Magne Haveraaen, Jaakko Järvi, and Sandra Macià (University of Bergen, Norway; SUNY Albany, USA; Barcelona Supercomputing Center, Spain) Numerous scientificcomputational domains make use of array data. The core computing of the numerical methods and the algorithms involved is related to multidimensional array manipulation. Memory layout and the access patterns of that data are crucial to the optimal performance of the arraybased computations. As we move towards exascale computing, writing portable code for efficient data parallel computations is increasingly requiring an abstract productive working environment. To that end, we present the design of a framework for optimizing scientific arraybased computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous highlevel application layer from the discrete formulation of the collective arraybased numerical methods and algorithms and the final detailed lowlevel code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture. @InProceedings{ARRAY19p2, author = {Benjamin Chetioui and Lenore Mullin and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Sandra Macià}, title = {Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {213}, doi = {10.1145/3315454.3329954}, year = {2019}, } Publisher's Version Article Search 

Henriksen, Troels 
ARRAY '19: "DataParallel Flattening by ..."
DataParallel Flattening by Expansion
Martin Elsman, Troels Henriksen, and Niels Gustav Westphal Serup (University of Copenhagen, Denmark) We present a higherorder programmerlevel technique for compiling particular kinds of irregular dataparallel problems to parallel hardware. The technique, which we have named ``flatteningbyexpansion'' builds on a number of segmented dataparallel operations but is itself implemented as a higherorder generic function, which makes it useful for many irregular problems. Concretely, the implementation is given in Futhark and we demonstrate the usefulness of the functionality for a number of irregular problems and show that, in practice, the irregular problems are compiled to efficient parallel code that can be executed on GPUs. The technique is useful in any dataparallel language that provides a key set of primitives. @InProceedings{ARRAY19p14, author = {Martin Elsman and Troels Henriksen and Niels Gustav Westphal Serup}, title = {DataParallel Flattening by Expansion}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1424}, doi = {10.1145/3315454.3329955}, year = {2019}, } Publisher's Version Article Search 

Jacob, Dejice 
ARRAY '19: "ALPyNA: Acceleration of Loops ..."
ALPyNA: Acceleration of Loops in Python for Novel Architectures
Dejice Jacob and Jeremy Singer (University of Glasgow, UK) We present ALPyNA, an automatic loop parallelization framework for Python, which analyzes data dependences within nested loops and dynamically generates CUDA kernels for GPU execution. The ALPyNA system applies classical dependence analysis techniques to discover and exploit potential parallelism. The skeletal structure of the dependence graph is determined statically (if possible) or at runtime; this is combined with type and bounds information discovered at runtime, to autogenerate highperformance kernels for offload to GPU. We demonstrate speedups of up to 1000x relative to the native CPython interpreter across four arrayintensive numerical Python benchmarks. Performance improvement is related to both iteration domain size and dependence graph complexity. Nevertheless, this approach promises to bring the benefits of manycore parallelism to application developers. @InProceedings{ARRAY19p25, author = {Dejice Jacob and Jeremy Singer}, title = {ALPyNA: Acceleration of Loops in Python for Novel Architectures}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {2534}, doi = {10.1145/3315454.3329956}, year = {2019}, } Publisher's Version Article Search 

Järvi, Jaakko 
ARRAY '19: "Finite Difference Methods ..."
Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Benjamin Chetioui, Lenore Mullin, Ole Abusdal, Magne Haveraaen, Jaakko Järvi, and Sandra Macià (University of Bergen, Norway; SUNY Albany, USA; Barcelona Supercomputing Center, Spain) Numerous scientificcomputational domains make use of array data. The core computing of the numerical methods and the algorithms involved is related to multidimensional array manipulation. Memory layout and the access patterns of that data are crucial to the optimal performance of the arraybased computations. As we move towards exascale computing, writing portable code for efficient data parallel computations is increasingly requiring an abstract productive working environment. To that end, we present the design of a framework for optimizing scientific arraybased computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous highlevel application layer from the discrete formulation of the collective arraybased numerical methods and algorithms and the final detailed lowlevel code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture. @InProceedings{ARRAY19p2, author = {Benjamin Chetioui and Lenore Mullin and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Sandra Macià}, title = {Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {213}, doi = {10.1145/3315454.3329954}, year = {2019}, } Publisher's Version Article Search 

Kowalski, Karol 
ARRAY '19: "Toward Generalized Tensor ..."
Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods
Erdal Mutlu, Karol Kowalski, and Sriram Krishnamoorthy (Pacific Northwest National Laboratory, USA) The widespread use of tensor operations in describing electronic structure calculations has motivated the design of software frameworks for productive development of scalable optimized tensorbased electronic structure methods. Whereas prior work focused on Cartesian abstractions for dense tensors, we present an algebra to specify and perform tensor operations on a larger class of blocksparse tensors. We illustrate the use of this framework in expressing realworld computational chemistry calculations beyond the reach of existing frameworks. @InProceedings{ARRAY19p46, author = {Erdal Mutlu and Karol Kowalski and Sriram Krishnamoorthy}, title = {Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {4656}, doi = {10.1145/3315454.3329958}, year = {2019}, } Publisher's Version Article Search 

Krishnamoorthy, Sriram 
ARRAY '19: "Toward Generalized Tensor ..."
Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods
Erdal Mutlu, Karol Kowalski, and Sriram Krishnamoorthy (Pacific Northwest National Laboratory, USA) The widespread use of tensor operations in describing electronic structure calculations has motivated the design of software frameworks for productive development of scalable optimized tensorbased electronic structure methods. Whereas prior work focused on Cartesian abstractions for dense tensors, we present an algebra to specify and perform tensor operations on a larger class of blocksparse tensors. We illustrate the use of this framework in expressing realworld computational chemistry calculations beyond the reach of existing frameworks. @InProceedings{ARRAY19p46, author = {Erdal Mutlu and Karol Kowalski and Sriram Krishnamoorthy}, title = {Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {4656}, doi = {10.1145/3315454.3329958}, year = {2019}, } Publisher's Version Article Search 

Kristien, Martin 
ARRAY '19: "HighLevel Synthesis of Functional ..."
HighLevel Synthesis of Functional Patterns with Lift
Martin Kristien, Bruno Bodin, Michel Steuwer, and Christophe Dubach (University of Edinburgh, UK; YaleNUS College, Singapore; University of Glasgow, UK) Highlevel languages are commonly seen as a good fit to tackle the problem of performance portability across parallel architectures. The Lift framework is a recent approach which combines highlevel, arraybased programming abstractions, with a system of rewriterules that express algorithmic as well as lowlevel hardware optimizations. Lift has successfully demonstrated its ability to address the challenge of performance portability across multiple types of CPU and GPU devices by automatically generating code that is onpar with highly optimized handwritten code. This paper demonstrates the potential of Lift for targeting FPGAbased platforms. It presents the design of new Lift parallel patterns operating on data streams, and describes the implementation of a Lift VHDL backend. This approach is evaluated on a Xilinx XC7Z010 FPGA using matrix multiplication, leading to a 10x speedup over highly optimized CPU code and a commercial HLS tool. Furthermore, by considering the potential of design space exploration enabled by Lift, this work is a stepping stone towards automatically generated competitive code for FPGAs. @InProceedings{ARRAY19p35, author = {Martin Kristien and Bruno Bodin and Michel Steuwer and Christophe Dubach}, title = {HighLevel Synthesis of Functional Patterns with Lift}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {3545}, doi = {10.1145/3315454.3329957}, year = {2019}, } Publisher's Version Article Search 

Low, Tze Meng 
ARRAY '19: "Linear Algebraic DepthFirst ..."
Linear Algebraic DepthFirst Search
Daniele G. Spampinato, Upasana Sridhar, and Tze Meng Low (Carnegie Mellon University, USA) There is a recent push by a segment of the graph community to implement graph algorithms in the language of linear algebra. However, graph algorithms that depend on depthfirst search (DFS) techniques are often highlighted as limitations of the linear algebraic approach as linear algebraic formulation of DFS algorithms are few, if any. This paper provides a linear algebraic approach for developing DFS graph algorithms and demonstrates its use for defining three classical DFSbased computations: Binary tree traversal, topological sort, and biconnected components. @InProceedings{ARRAY19p93, author = {Daniele G. Spampinato and Upasana Sridhar and Tze Meng Low}, title = {Linear Algebraic DepthFirst Search}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {93104}, doi = {10.1145/3315454.3329962}, year = {2019}, } Publisher's Version Article Search 

Macià, Sandra 
ARRAY '19: "Finite Difference Methods ..."
Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Benjamin Chetioui, Lenore Mullin, Ole Abusdal, Magne Haveraaen, Jaakko Järvi, and Sandra Macià (University of Bergen, Norway; SUNY Albany, USA; Barcelona Supercomputing Center, Spain) Numerous scientificcomputational domains make use of array data. The core computing of the numerical methods and the algorithms involved is related to multidimensional array manipulation. Memory layout and the access patterns of that data are crucial to the optimal performance of the arraybased computations. As we move towards exascale computing, writing portable code for efficient data parallel computations is increasingly requiring an abstract productive working environment. To that end, we present the design of a framework for optimizing scientific arraybased computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous highlevel application layer from the discrete formulation of the collective arraybased numerical methods and algorithms and the final detailed lowlevel code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture. @InProceedings{ARRAY19p2, author = {Benjamin Chetioui and Lenore Mullin and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Sandra Macià}, title = {Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {213}, doi = {10.1145/3315454.3329954}, year = {2019}, } Publisher's Version Article Search 

Manolios, Panagiotis 
ARRAY '19: "Records with Rank Polymorphism ..."
Records with Rank Polymorphism
Justin Slepak, Olin Shivers, and Panagiotis Manolios (Northeastern University, USA) In a rankpolymorphic programming language, all functions automatically lift to operate on arbitrarily highdimensional aggregate data. By adding records to such a language, we can support computation on data frames, a tabular data structure containing heterogeneous data but in which individual columns are homogeneous. In such a setting, a data frame is a vector of records, subject to both ordinary array operations (, filtering, reducing, sorting) and lifted record operations—projecting a field lifts to projecting a column. Data frames have become a popular tool for exploratory data analysis, but fluidity of interacting with data frames via lifted record operations depends on how the language’s records are designed. We investigate three languages with different notions of record data: Racket, Standard ML, and Python. For each, we examine several common tasks for working with data frames and how the language’s records make these tasks easy or hard. Based on their advantages and disadvantages, we synthesize their ideas to produce a design for record types which is flexible for both scalar and lifted computation. @InProceedings{ARRAY19p80, author = {Justin Slepak and Olin Shivers and Panagiotis Manolios}, title = {Records with Rank Polymorphism}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {8092}, doi = {10.1145/3315454.3329961}, year = {2019}, } Publisher's Version Article Search 

Mullin, Lenore 
ARRAY '19: "Finite Difference Methods ..."
Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Benjamin Chetioui, Lenore Mullin, Ole Abusdal, Magne Haveraaen, Jaakko Järvi, and Sandra Macià (University of Bergen, Norway; SUNY Albany, USA; Barcelona Supercomputing Center, Spain) Numerous scientificcomputational domains make use of array data. The core computing of the numerical methods and the algorithms involved is related to multidimensional array manipulation. Memory layout and the access patterns of that data are crucial to the optimal performance of the arraybased computations. As we move towards exascale computing, writing portable code for efficient data parallel computations is increasingly requiring an abstract productive working environment. To that end, we present the design of a framework for optimizing scientific arraybased computations, building a case study for a Partial Differential Equations solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous highlevel application layer from the discrete formulation of the collective arraybased numerical methods and algorithms and the final detailed lowlevel code. The case study lays the groundwork for achieving optimized memory layout and efficient computations while preserving a stable abstraction layer independent of underlying algorithms and changes in the architecture. @InProceedings{ARRAY19p2, author = {Benjamin Chetioui and Lenore Mullin and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Sandra Macià}, title = {Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {213}, doi = {10.1145/3315454.3329954}, year = {2019}, } Publisher's Version Article Search 

Mutlu, Erdal 
ARRAY '19: "Toward Generalized Tensor ..."
Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods
Erdal Mutlu, Karol Kowalski, and Sriram Krishnamoorthy (Pacific Northwest National Laboratory, USA) The widespread use of tensor operations in describing electronic structure calculations has motivated the design of software frameworks for productive development of scalable optimized tensorbased electronic structure methods. Whereas prior work focused on Cartesian abstractions for dense tensors, we present an algebra to specify and perform tensor operations on a larger class of blocksparse tensors. We illustrate the use of this framework in expressing realworld computational chemistry calculations beyond the reach of existing frameworks. @InProceedings{ARRAY19p46, author = {Erdal Mutlu and Karol Kowalski and Sriram Krishnamoorthy}, title = {Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {4656}, doi = {10.1145/3315454.3329958}, year = {2019}, } Publisher's Version Article Search 

Rink, Norman A. 
ARRAY '19: "TeIL: A TypeSafe Imperative ..."
TeIL: A TypeSafe Imperative Tensor Intermediate Language
Norman A. Rink and Jeronimo Castrillon (TU Dresden, Germany) Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels. The tensor languages are superficially similar to the wellknown functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL's formal semantics we develop a typesafety result in the Coq proof assistant. @InProceedings{ARRAY19p57, author = {Norman A. Rink and Jeronimo Castrillon}, title = {TeIL: A TypeSafe Imperative Tensor Intermediate Language}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {5768}, doi = {10.1145/3315454.3329959}, year = {2019}, } Publisher's Version Article Search 

Scholz, SvenBodo 
ARRAY '19: "Convolutional Neural Networks ..."
Convolutional Neural Networks in APL
Artjoms Šinkarovs, Robert Bernecky, and SvenBodo Scholz (HeriotWatt University, UK; Snake Island Research, Canada) This paper shows how a Convolutional Neural Network (CNN) can be implemented in APL. Its firstclass array support ideally fits that domain, and the operations of APL facilitate rapid and concise creation of generically reusable building blocks. For our example, only ten blocks are needed, and they can be expressed as ten lines of native APL. All these blocks are purely functional, and they are built out of a small number of builtin operators, resulting in a highly portable specification that is immediately runnable and should be suitable for highperformance optimizations and parallel execution. This implies that APL can be seen as a framework to define shallowlyembedded machine learning DSLs without any external dependencies, making them useful at least for experiments and prototyping. We explain the construction of each CNN building block, and briefly discuss the performance of the resulting specification. @InProceedings{ARRAY19p69, author = {Artjoms Šinkarovs and Robert Bernecky and SvenBodo Scholz}, title = {Convolutional Neural Networks in APL}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {6979}, doi = {10.1145/3315454.3329960}, year = {2019}, } Publisher's Version Article Search Info 

Serup, Niels Gustav Westphal 
ARRAY '19: "DataParallel Flattening by ..."
DataParallel Flattening by Expansion
Martin Elsman, Troels Henriksen, and Niels Gustav Westphal Serup (University of Copenhagen, Denmark) We present a higherorder programmerlevel technique for compiling particular kinds of irregular dataparallel problems to parallel hardware. The technique, which we have named ``flatteningbyexpansion'' builds on a number of segmented dataparallel operations but is itself implemented as a higherorder generic function, which makes it useful for many irregular problems. Concretely, the implementation is given in Futhark and we demonstrate the usefulness of the functionality for a number of irregular problems and show that, in practice, the irregular problems are compiled to efficient parallel code that can be executed on GPUs. The technique is useful in any dataparallel language that provides a key set of primitives. @InProceedings{ARRAY19p14, author = {Martin Elsman and Troels Henriksen and Niels Gustav Westphal Serup}, title = {DataParallel Flattening by Expansion}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1424}, doi = {10.1145/3315454.3329955}, year = {2019}, } Publisher's Version Article Search 

Shivers, Olin 
ARRAY '19: "Records with Rank Polymorphism ..."
Records with Rank Polymorphism
Justin Slepak, Olin Shivers, and Panagiotis Manolios (Northeastern University, USA) In a rankpolymorphic programming language, all functions automatically lift to operate on arbitrarily highdimensional aggregate data. By adding records to such a language, we can support computation on data frames, a tabular data structure containing heterogeneous data but in which individual columns are homogeneous. In such a setting, a data frame is a vector of records, subject to both ordinary array operations (, filtering, reducing, sorting) and lifted record operations—projecting a field lifts to projecting a column. Data frames have become a popular tool for exploratory data analysis, but fluidity of interacting with data frames via lifted record operations depends on how the language’s records are designed. We investigate three languages with different notions of record data: Racket, Standard ML, and Python. For each, we examine several common tasks for working with data frames and how the language’s records make these tasks easy or hard. Based on their advantages and disadvantages, we synthesize their ideas to produce a design for record types which is flexible for both scalar and lifted computation. @InProceedings{ARRAY19p80, author = {Justin Slepak and Olin Shivers and Panagiotis Manolios}, title = {Records with Rank Polymorphism}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {8092}, doi = {10.1145/3315454.3329961}, year = {2019}, } Publisher's Version Article Search 

Singer, Jeremy 
ARRAY '19: "ALPyNA: Acceleration of Loops ..."
ALPyNA: Acceleration of Loops in Python for Novel Architectures
Dejice Jacob and Jeremy Singer (University of Glasgow, UK) We present ALPyNA, an automatic loop parallelization framework for Python, which analyzes data dependences within nested loops and dynamically generates CUDA kernels for GPU execution. The ALPyNA system applies classical dependence analysis techniques to discover and exploit potential parallelism. The skeletal structure of the dependence graph is determined statically (if possible) or at runtime; this is combined with type and bounds information discovered at runtime, to autogenerate highperformance kernels for offload to GPU. We demonstrate speedups of up to 1000x relative to the native CPython interpreter across four arrayintensive numerical Python benchmarks. Performance improvement is related to both iteration domain size and dependence graph complexity. Nevertheless, this approach promises to bring the benefits of manycore parallelism to application developers. @InProceedings{ARRAY19p25, author = {Dejice Jacob and Jeremy Singer}, title = {ALPyNA: Acceleration of Loops in Python for Novel Architectures}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {2534}, doi = {10.1145/3315454.3329956}, year = {2019}, } Publisher's Version Article Search 

Šinkarovs, Artjoms 
ARRAY '19: "Convolutional Neural Networks ..."
Convolutional Neural Networks in APL
Artjoms Šinkarovs, Robert Bernecky, and SvenBodo Scholz (HeriotWatt University, UK; Snake Island Research, Canada) This paper shows how a Convolutional Neural Network (CNN) can be implemented in APL. Its firstclass array support ideally fits that domain, and the operations of APL facilitate rapid and concise creation of generically reusable building blocks. For our example, only ten blocks are needed, and they can be expressed as ten lines of native APL. All these blocks are purely functional, and they are built out of a small number of builtin operators, resulting in a highly portable specification that is immediately runnable and should be suitable for highperformance optimizations and parallel execution. This implies that APL can be seen as a framework to define shallowlyembedded machine learning DSLs without any external dependencies, making them useful at least for experiments and prototyping. We explain the construction of each CNN building block, and briefly discuss the performance of the resulting specification. @InProceedings{ARRAY19p69, author = {Artjoms Šinkarovs and Robert Bernecky and SvenBodo Scholz}, title = {Convolutional Neural Networks in APL}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {6979}, doi = {10.1145/3315454.3329960}, year = {2019}, } Publisher's Version Article Search Info 

Slepak, Justin 
ARRAY '19: "Records with Rank Polymorphism ..."
Records with Rank Polymorphism
Justin Slepak, Olin Shivers, and Panagiotis Manolios (Northeastern University, USA) In a rankpolymorphic programming language, all functions automatically lift to operate on arbitrarily highdimensional aggregate data. By adding records to such a language, we can support computation on data frames, a tabular data structure containing heterogeneous data but in which individual columns are homogeneous. In such a setting, a data frame is a vector of records, subject to both ordinary array operations (, filtering, reducing, sorting) and lifted record operations—projecting a field lifts to projecting a column. Data frames have become a popular tool for exploratory data analysis, but fluidity of interacting with data frames via lifted record operations depends on how the language’s records are designed. We investigate three languages with different notions of record data: Racket, Standard ML, and Python. For each, we examine several common tasks for working with data frames and how the language’s records make these tasks easy or hard. Based on their advantages and disadvantages, we synthesize their ideas to produce a design for record types which is flexible for both scalar and lifted computation. @InProceedings{ARRAY19p80, author = {Justin Slepak and Olin Shivers and Panagiotis Manolios}, title = {Records with Rank Polymorphism}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {8092}, doi = {10.1145/3315454.3329961}, year = {2019}, } Publisher's Version Article Search 

Spampinato, Daniele G. 
ARRAY '19: "Linear Algebraic DepthFirst ..."
Linear Algebraic DepthFirst Search
Daniele G. Spampinato, Upasana Sridhar, and Tze Meng Low (Carnegie Mellon University, USA) There is a recent push by a segment of the graph community to implement graph algorithms in the language of linear algebra. However, graph algorithms that depend on depthfirst search (DFS) techniques are often highlighted as limitations of the linear algebraic approach as linear algebraic formulation of DFS algorithms are few, if any. This paper provides a linear algebraic approach for developing DFS graph algorithms and demonstrates its use for defining three classical DFSbased computations: Binary tree traversal, topological sort, and biconnected components. @InProceedings{ARRAY19p93, author = {Daniele G. Spampinato and Upasana Sridhar and Tze Meng Low}, title = {Linear Algebraic DepthFirst Search}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {93104}, doi = {10.1145/3315454.3329962}, year = {2019}, } Publisher's Version Article Search 

Sridhar, Upasana 
ARRAY '19: "Linear Algebraic DepthFirst ..."
Linear Algebraic DepthFirst Search
Daniele G. Spampinato, Upasana Sridhar, and Tze Meng Low (Carnegie Mellon University, USA) There is a recent push by a segment of the graph community to implement graph algorithms in the language of linear algebra. However, graph algorithms that depend on depthfirst search (DFS) techniques are often highlighted as limitations of the linear algebraic approach as linear algebraic formulation of DFS algorithms are few, if any. This paper provides a linear algebraic approach for developing DFS graph algorithms and demonstrates its use for defining three classical DFSbased computations: Binary tree traversal, topological sort, and biconnected components. @InProceedings{ARRAY19p93, author = {Daniele G. Spampinato and Upasana Sridhar and Tze Meng Low}, title = {Linear Algebraic DepthFirst Search}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {93104}, doi = {10.1145/3315454.3329962}, year = {2019}, } Publisher's Version Article Search 

Steuwer, Michel 
ARRAY '19: "HighLevel Synthesis of Functional ..."
HighLevel Synthesis of Functional Patterns with Lift
Martin Kristien, Bruno Bodin, Michel Steuwer, and Christophe Dubach (University of Edinburgh, UK; YaleNUS College, Singapore; University of Glasgow, UK) Highlevel languages are commonly seen as a good fit to tackle the problem of performance portability across parallel architectures. The Lift framework is a recent approach which combines highlevel, arraybased programming abstractions, with a system of rewriterules that express algorithmic as well as lowlevel hardware optimizations. Lift has successfully demonstrated its ability to address the challenge of performance portability across multiple types of CPU and GPU devices by automatically generating code that is onpar with highly optimized handwritten code. This paper demonstrates the potential of Lift for targeting FPGAbased platforms. It presents the design of new Lift parallel patterns operating on data streams, and describes the implementation of a Lift VHDL backend. This approach is evaluated on a Xilinx XC7Z010 FPGA using matrix multiplication, leading to a 10x speedup over highly optimized CPU code and a commercial HLS tool. Furthermore, by considering the potential of design space exploration enabled by Lift, this work is a stepping stone towards automatically generated competitive code for FPGAs. @InProceedings{ARRAY19p35, author = {Martin Kristien and Bruno Bodin and Michel Steuwer and Christophe Dubach}, title = {HighLevel Synthesis of Functional Patterns with Lift}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {3545}, doi = {10.1145/3315454.3329957}, year = {2019}, } Publisher's Version Article Search 
30 authors
proc time: 1.75