Workshop ARRAY 2021 – Author Index 
Contents 
Abstracts 
Authors

Abusdal, Ole 
ARRAY '21: "Padding in the Mathematics ..."
Padding in the Mathematics of Arrays
Benjamin Chetioui , Ole Abusdal, Magne Haveraaen, Jaakko Järvi , and Lenore Mullin (University of Bergen, Norway; Western Norway University of Applied Sciences, Norway; University of Turku, Finland; SUNY Albany, USA) Multidimensional array manipulation constitutes a core component of numerous numerical methods, e.g. finite difference solvers of Partial Differential Equations (PDEs). The efficiency of such computations is tightly connected to traversing array data in a hardwarefriendly way. The Mathematics of Arrays (MoA) allows reasoning about array computations at a high level and enables systematic transformations of arraybased programs. We have previously shown that stencil computations reduce to MoA's Denotational Normal Form (DNF). Here we bring to light MoA's Operational Normal Forms (ONFs) that allow for adapting array computations to hardware characteristics. ONF transformations start from the DNF. Alongside the ONF transformations, we extend MoA with rewriting rules for padding. These new rules allow both a simplification of array indexing and a systematic approach to introducing halos to PDE solvers. Experiments on various architectures confirm the flexibility of the approach. @InProceedings{ARRAY21p15, author = {Benjamin Chetioui and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Lenore Mullin}, title = {Padding in the Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1526}, doi = {10.1145/3460944.3464311}, year = {2021}, } Publisher's Version 

Chetioui, Benjamin 
ARRAY '21: "Padding in the Mathematics ..."
Padding in the Mathematics of Arrays
Benjamin Chetioui , Ole Abusdal, Magne Haveraaen, Jaakko Järvi , and Lenore Mullin (University of Bergen, Norway; Western Norway University of Applied Sciences, Norway; University of Turku, Finland; SUNY Albany, USA) Multidimensional array manipulation constitutes a core component of numerous numerical methods, e.g. finite difference solvers of Partial Differential Equations (PDEs). The efficiency of such computations is tightly connected to traversing array data in a hardwarefriendly way. The Mathematics of Arrays (MoA) allows reasoning about array computations at a high level and enables systematic transformations of arraybased programs. We have previously shown that stencil computations reduce to MoA's Denotational Normal Form (DNF). Here we bring to light MoA's Operational Normal Forms (ONFs) that allow for adapting array computations to hardware characteristics. ONF transformations start from the DNF. Alongside the ONF transformations, we extend MoA with rewriting rules for padding. These new rules allow both a simplification of array indexing and a systematic approach to introducing halos to PDE solvers. Experiments on various architectures confirm the flexibility of the approach. @InProceedings{ARRAY21p15, author = {Benjamin Chetioui and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Lenore Mullin}, title = {Padding in the Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1526}, doi = {10.1145/3460944.3464311}, year = {2021}, } Publisher's Version 

Elsman, Martin 
ARRAY '21: "Towards SizeDependent Types ..."
Towards SizeDependent Types for Array Programming
Troels Henriksen and Martin Elsman (University of Copenhagen, Denmark) We present a type system for expressing size constraints on array types in an MLstyle type system. The goal is to detect shape mismatches at compiletime, while being simpler than full dependent types. The main restrictions is that the only terms that can occur in types are array sizes, and syntactically they must be variables or constants. For those programs where this is not sufficient, we support a form of existential types, with the type system automatically managing the requisite bookkeeping. We formalise a large subset of the type system in a small core language, which we prove sound. We also present an integration of the type system in the highperformance parallel functional language Futhark, and show on a collection of 44 representative programs that the restrictions in the type system are not too problematic in practice. @InProceedings{ARRAY21p1, author = {Troels Henriksen and Martin Elsman}, title = {Towards SizeDependent Types for Array Programming}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {114}, doi = {10.1145/3460944.3464310}, year = {2021}, } Publisher's Version 

Haveraaen, Magne 
ARRAY '21: "Padding in the Mathematics ..."
Padding in the Mathematics of Arrays
Benjamin Chetioui , Ole Abusdal, Magne Haveraaen, Jaakko Järvi , and Lenore Mullin (University of Bergen, Norway; Western Norway University of Applied Sciences, Norway; University of Turku, Finland; SUNY Albany, USA) Multidimensional array manipulation constitutes a core component of numerous numerical methods, e.g. finite difference solvers of Partial Differential Equations (PDEs). The efficiency of such computations is tightly connected to traversing array data in a hardwarefriendly way. The Mathematics of Arrays (MoA) allows reasoning about array computations at a high level and enables systematic transformations of arraybased programs. We have previously shown that stencil computations reduce to MoA's Denotational Normal Form (DNF). Here we bring to light MoA's Operational Normal Forms (ONFs) that allow for adapting array computations to hardware characteristics. ONF transformations start from the DNF. Alongside the ONF transformations, we extend MoA with rewriting rules for padding. These new rules allow both a simplification of array indexing and a systematic approach to introducing halos to PDE solvers. Experiments on various architectures confirm the flexibility of the approach. @InProceedings{ARRAY21p15, author = {Benjamin Chetioui and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Lenore Mullin}, title = {Padding in the Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1526}, doi = {10.1145/3460944.3464311}, year = {2021}, } Publisher's Version 

Henriksen, Troels 
ARRAY '21: "Towards SizeDependent Types ..."
Towards SizeDependent Types for Array Programming
Troels Henriksen and Martin Elsman (University of Copenhagen, Denmark) We present a type system for expressing size constraints on array types in an MLstyle type system. The goal is to detect shape mismatches at compiletime, while being simpler than full dependent types. The main restrictions is that the only terms that can occur in types are array sizes, and syntactically they must be variables or constants. For those programs where this is not sufficient, we support a form of existential types, with the type system automatically managing the requisite bookkeeping. We formalise a large subset of the type system in a small core language, which we prove sound. We also present an integration of the type system in the highperformance parallel functional language Futhark, and show on a collection of 44 representative programs that the restrictions in the type system are not too problematic in practice. @InProceedings{ARRAY21p1, author = {Troels Henriksen and Martin Elsman}, title = {Towards SizeDependent Types for Array Programming}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {114}, doi = {10.1145/3460944.3464310}, year = {2021}, } Publisher's Version 

Hlava, Marek 
ARRAY '21: "Acceleration of Lattice Models ..."
Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives
Wojciech Michal Pawlak , Marek Hlava, Martin Metaksov, and Cosmin Eugen Oancea (SimCorp, Denmark; University of Copenhagen, Denmark) This paper reports on the acceleration of a standard, latticebased numerical algorithm that is widely used in finance for pricing a class of fixedincome vanilla derivatives. We start with a highlevel algorithmic specification, exhibiting irregular nested parallelism, which is challenging to map efficiently to GPU hardware. From it we systematically derive and optimize two CUDA implementations, which utilize only the outer or all levels of parallelism, respectively. A detailed evaluation demonstrates (i) the high impact of the proposed optimizations, (ii) the complementary strength and weaknesses of the two GPU versions, and that (iii) they are on average 2.4× faster than our welltuned CPUparallel implementation (OpenMP+AVX2) running on 104 hardware threads, and by 3to4 order of magnitude faster than an OpenMPparallel implementation using the popular QuantLib library. @InProceedings{ARRAY21p27, author = {Wojciech Michal Pawlak and Marek Hlava and Martin Metaksov and Cosmin Eugen Oancea}, title = {Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {2738}, doi = {10.1145/3460944.3464309}, year = {2021}, } Publisher's Version 

Järvi, Jaakko 
ARRAY '21: "Padding in the Mathematics ..."
Padding in the Mathematics of Arrays
Benjamin Chetioui , Ole Abusdal, Magne Haveraaen, Jaakko Järvi , and Lenore Mullin (University of Bergen, Norway; Western Norway University of Applied Sciences, Norway; University of Turku, Finland; SUNY Albany, USA) Multidimensional array manipulation constitutes a core component of numerous numerical methods, e.g. finite difference solvers of Partial Differential Equations (PDEs). The efficiency of such computations is tightly connected to traversing array data in a hardwarefriendly way. The Mathematics of Arrays (MoA) allows reasoning about array computations at a high level and enables systematic transformations of arraybased programs. We have previously shown that stencil computations reduce to MoA's Denotational Normal Form (DNF). Here we bring to light MoA's Operational Normal Forms (ONFs) that allow for adapting array computations to hardware characteristics. ONF transformations start from the DNF. Alongside the ONF transformations, we extend MoA with rewriting rules for padding. These new rules allow both a simplification of array indexing and a systematic approach to introducing halos to PDE solvers. Experiments on various architectures confirm the flexibility of the approach. @InProceedings{ARRAY21p15, author = {Benjamin Chetioui and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Lenore Mullin}, title = {Padding in the Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1526}, doi = {10.1145/3460944.3464311}, year = {2021}, } Publisher's Version 

Metaksov, Martin 
ARRAY '21: "Acceleration of Lattice Models ..."
Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives
Wojciech Michal Pawlak , Marek Hlava, Martin Metaksov, and Cosmin Eugen Oancea (SimCorp, Denmark; University of Copenhagen, Denmark) This paper reports on the acceleration of a standard, latticebased numerical algorithm that is widely used in finance for pricing a class of fixedincome vanilla derivatives. We start with a highlevel algorithmic specification, exhibiting irregular nested parallelism, which is challenging to map efficiently to GPU hardware. From it we systematically derive and optimize two CUDA implementations, which utilize only the outer or all levels of parallelism, respectively. A detailed evaluation demonstrates (i) the high impact of the proposed optimizations, (ii) the complementary strength and weaknesses of the two GPU versions, and that (iii) they are on average 2.4× faster than our welltuned CPUparallel implementation (OpenMP+AVX2) running on 104 hardware threads, and by 3to4 order of magnitude faster than an OpenMPparallel implementation using the popular QuantLib library. @InProceedings{ARRAY21p27, author = {Wojciech Michal Pawlak and Marek Hlava and Martin Metaksov and Cosmin Eugen Oancea}, title = {Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {2738}, doi = {10.1145/3460944.3464309}, year = {2021}, } Publisher's Version 

Mullin, Lenore 
ARRAY '21: "Padding in the Mathematics ..."
Padding in the Mathematics of Arrays
Benjamin Chetioui , Ole Abusdal, Magne Haveraaen, Jaakko Järvi , and Lenore Mullin (University of Bergen, Norway; Western Norway University of Applied Sciences, Norway; University of Turku, Finland; SUNY Albany, USA) Multidimensional array manipulation constitutes a core component of numerous numerical methods, e.g. finite difference solvers of Partial Differential Equations (PDEs). The efficiency of such computations is tightly connected to traversing array data in a hardwarefriendly way. The Mathematics of Arrays (MoA) allows reasoning about array computations at a high level and enables systematic transformations of arraybased programs. We have previously shown that stencil computations reduce to MoA's Denotational Normal Form (DNF). Here we bring to light MoA's Operational Normal Forms (ONFs) that allow for adapting array computations to hardware characteristics. ONF transformations start from the DNF. Alongside the ONF transformations, we extend MoA with rewriting rules for padding. These new rules allow both a simplification of array indexing and a systematic approach to introducing halos to PDE solvers. Experiments on various architectures confirm the flexibility of the approach. @InProceedings{ARRAY21p15, author = {Benjamin Chetioui and Ole Abusdal and Magne Haveraaen and Jaakko Järvi and Lenore Mullin}, title = {Padding in the Mathematics of Arrays}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {1526}, doi = {10.1145/3460944.3464311}, year = {2021}, } Publisher's Version 

Oancea, Cosmin Eugen 
ARRAY '21: "Acceleration of Lattice Models ..."
Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives
Wojciech Michal Pawlak , Marek Hlava, Martin Metaksov, and Cosmin Eugen Oancea (SimCorp, Denmark; University of Copenhagen, Denmark) This paper reports on the acceleration of a standard, latticebased numerical algorithm that is widely used in finance for pricing a class of fixedincome vanilla derivatives. We start with a highlevel algorithmic specification, exhibiting irregular nested parallelism, which is challenging to map efficiently to GPU hardware. From it we systematically derive and optimize two CUDA implementations, which utilize only the outer or all levels of parallelism, respectively. A detailed evaluation demonstrates (i) the high impact of the proposed optimizations, (ii) the complementary strength and weaknesses of the two GPU versions, and that (iii) they are on average 2.4× faster than our welltuned CPUparallel implementation (OpenMP+AVX2) running on 104 hardware threads, and by 3to4 order of magnitude faster than an OpenMPparallel implementation using the popular QuantLib library. @InProceedings{ARRAY21p27, author = {Wojciech Michal Pawlak and Marek Hlava and Martin Metaksov and Cosmin Eugen Oancea}, title = {Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {2738}, doi = {10.1145/3460944.3464309}, year = {2021}, } Publisher's Version 

Pawlak, Wojciech Michal 
ARRAY '21: "Acceleration of Lattice Models ..."
Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives
Wojciech Michal Pawlak , Marek Hlava, Martin Metaksov, and Cosmin Eugen Oancea (SimCorp, Denmark; University of Copenhagen, Denmark) This paper reports on the acceleration of a standard, latticebased numerical algorithm that is widely used in finance for pricing a class of fixedincome vanilla derivatives. We start with a highlevel algorithmic specification, exhibiting irregular nested parallelism, which is challenging to map efficiently to GPU hardware. From it we systematically derive and optimize two CUDA implementations, which utilize only the outer or all levels of parallelism, respectively. A detailed evaluation demonstrates (i) the high impact of the proposed optimizations, (ii) the complementary strength and weaknesses of the two GPU versions, and that (iii) they are on average 2.4× faster than our welltuned CPUparallel implementation (OpenMP+AVX2) running on 104 hardware threads, and by 3to4 order of magnitude faster than an OpenMPparallel implementation using the popular QuantLib library. @InProceedings{ARRAY21p27, author = {Wojciech Michal Pawlak and Marek Hlava and Martin Metaksov and Cosmin Eugen Oancea}, title = {Acceleration of Lattice Models for Pricing Portfolios of FixedIncome Derivatives}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {2738}, doi = {10.1145/3460944.3464309}, year = {2021}, } Publisher's Version 

Scholz, SvenBodo 
ARRAY '21: "Array Languages Make Neural ..."
Array Languages Make Neural Networks Fast
Artjoms Šinkarovs , HansNikolai Vießmann , and SvenBodo Scholz (HeriotWatt University, UK; Radboud University Nijmegen, Netherlands) Most implementations of machine learning algorithms are based on specialpurpose frameworks such as TensorFlow or PyTorch. While these frameworks are convenient to use, they introduce multimillion lines of code dependency that one has to trust, understand and potentially modify. As an alternative, this paper investigates a direct implementation of a state of the art Convolutional Neural Network (CNN) in an array language. While our implementation requires 150 lines of code to define the specialpurpose operators needed for CNNs, which are readily provided through frameworks such as TensorFlow and PyTorch, our implementation outperforms these frameworks by factors 2 and 3 on a fixed set of hardware — a 64core GPUaccelerated machine; for a simple example network. The resulting specification is written in a rankpolymorphic dataparallel style, and it can be immediately leveraged by optimising compilers. Indeed, array languages make neural networks fast. @InProceedings{ARRAY21p39, author = {Artjoms Šinkarovs and HansNikolai Vießmann and SvenBodo Scholz}, title = {Array Languages Make Neural Networks Fast}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {3950}, doi = {10.1145/3460944.3464312}, year = {2021}, } Publisher's Version 

Šinkarovs, Artjoms 
ARRAY '21: "Array Languages Make Neural ..."
Array Languages Make Neural Networks Fast
Artjoms Šinkarovs , HansNikolai Vießmann , and SvenBodo Scholz (HeriotWatt University, UK; Radboud University Nijmegen, Netherlands) Most implementations of machine learning algorithms are based on specialpurpose frameworks such as TensorFlow or PyTorch. While these frameworks are convenient to use, they introduce multimillion lines of code dependency that one has to trust, understand and potentially modify. As an alternative, this paper investigates a direct implementation of a state of the art Convolutional Neural Network (CNN) in an array language. While our implementation requires 150 lines of code to define the specialpurpose operators needed for CNNs, which are readily provided through frameworks such as TensorFlow and PyTorch, our implementation outperforms these frameworks by factors 2 and 3 on a fixed set of hardware — a 64core GPUaccelerated machine; for a simple example network. The resulting specification is written in a rankpolymorphic dataparallel style, and it can be immediately leveraged by optimising compilers. Indeed, array languages make neural networks fast. @InProceedings{ARRAY21p39, author = {Artjoms Šinkarovs and HansNikolai Vießmann and SvenBodo Scholz}, title = {Array Languages Make Neural Networks Fast}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {3950}, doi = {10.1145/3460944.3464312}, year = {2021}, } Publisher's Version 

Vießmann, HansNikolai 
ARRAY '21: "Array Languages Make Neural ..."
Array Languages Make Neural Networks Fast
Artjoms Šinkarovs , HansNikolai Vießmann , and SvenBodo Scholz (HeriotWatt University, UK; Radboud University Nijmegen, Netherlands) Most implementations of machine learning algorithms are based on specialpurpose frameworks such as TensorFlow or PyTorch. While these frameworks are convenient to use, they introduce multimillion lines of code dependency that one has to trust, understand and potentially modify. As an alternative, this paper investigates a direct implementation of a state of the art Convolutional Neural Network (CNN) in an array language. While our implementation requires 150 lines of code to define the specialpurpose operators needed for CNNs, which are readily provided through frameworks such as TensorFlow and PyTorch, our implementation outperforms these frameworks by factors 2 and 3 on a fixed set of hardware — a 64core GPUaccelerated machine; for a simple example network. The resulting specification is written in a rankpolymorphic dataparallel style, and it can be immediately leveraged by optimising compilers. Indeed, array languages make neural networks fast. @InProceedings{ARRAY21p39, author = {Artjoms Šinkarovs and HansNikolai Vießmann and SvenBodo Scholz}, title = {Array Languages Make Neural Networks Fast}, booktitle = {Proc.\ ARRAY}, publisher = {ACM}, pages = {3950}, doi = {10.1145/3460944.3464312}, year = {2021}, } Publisher's Version 
14 authors
proc time: 4.08