9th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing (FHPNC 2021),
August 22, 2021,
Virtual, Republic of Korea
Frontmatter
Welcome from the Chairs
Welcome to the 2021 edition of the ACM SIGPLAN International
Workshop on Functional High-Performance and Numerical Computing
(FHPNC), virtually co-located with ICFP on August 22, 2021.
FHPNC aims to bring together practitioners who employ functional and
declarative programming languages for scientific computing, as well as researchers
who work on languages, techniques and frameworks in this space. A particular focus
of this workshop is the domain of high-performance computing and numerical programming.
Papers
Improving GHC Haskell NUMA Profiling
Ruairidh MacGregor,
Phil Trinder, and
Hans-Wolfgang Loidl
(University of Glasgow, UK; Heriot-Watt University, UK)
As the number of cores increases, Non-Uniform Memory Access (NUMA) is becoming increasingly prevalent in general-purpose machines. Effectively exploiting NUMA can significantly reduce memory access latency, and thus runtime, by 10-20%, and profiling provides information on how to optimise. Language-level NUMA profilers are rare, and mostly profile conventional languages executing on virtual machines. Here we profile, and develop new NUMA profilers for, a functional language executing on a runtime system.
We start by using existing OS- and language-level tools to systematically profile 8 benchmarks from the GHC Haskell nofib suite on a typical NUMA server (8 regions, 64 cores). We propose a new metric, NUMA access rate, that allows us to compare the load placed on the memory system by different programs, and use it to contrast the benchmarks. We demonstrate significant differences in NUMA usage between computational and data-intensive benchmarks, e.g. local memory access rates of 23% and 30%, respectively. We show that small changes to coordination behaviour can significantly alter NUMA usage, and for the first time quantify the effectiveness of the GHC 8.2 NUMA adaptation.
We identify information not available from existing profilers and extend both the numaprof profiler and the GHC runtime system to obtain three new NUMA profiles: OS thread allocation locality, GC count (per region and generation), and GC thread locality. The new profiles not only provide a deeper understanding of program memory usage, but also suggest ways that GHC can be adapted to better exploit NUMA architectures.
@InProceedings{FHPNC21p1,
author = {Ruairidh MacGregor and Phil Trinder and Hans-Wolfgang Loidl},
title = {Improving GHC Haskell NUMA Profiling},
booktitle = {Proc.\ FHPNC},
publisher = {ACM},
pages = {1--12},
doi = {10.1145/3471873.3472974},
year = {2021},
}
Parallelism-Preserving Automatic Differentiation for Second-Order Array Languages
Adam Paszke,
Matthew J. Johnson,
Roy Frostig, and
Dougal Maclaurin
(Google Research, Poland; Google Research, USA)
We develop automatic differentiation (AD) procedures for reductions and scans—parameterized by arbitrary differentiable monoids—in a way that preserves parallelism, by rewriting them as other reductions and scans. This is in contrast with the literature and with existing AD systems, which are either general, but force sequential execution of the derivative program, or only include hand-crafted rules for a select few monoids (usually (0, +), (1, ×), (−∞, max) and (∞, min)) and thus lack the general flexibility of second-order languages.
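As a minimal illustration of the idea (not the paper's implementation): the gradient of a product reduction over the (1, ×) monoid can itself be expressed as two exclusive scans, since ∂/∂xᵢ (x₁⋯xₙ) is the prefix product before position i times the suffix product after it. This keeps the derivative computation parallelisable rather than forcing a sequential loop.

```python
# Sketch: differentiating a (1, *)-reduction by rewriting the gradient
# as two exclusive scans (prefix and suffix products). Names are
# illustrative, not from the paper.
import itertools
import operator

def product_grad(xs):
    # Exclusive prefix products: [1, x1, x1*x2, ...]
    prefix = list(itertools.accumulate([1.0] + xs[:-1], operator.mul))
    # Exclusive suffix products, built by scanning the reversed tail
    suffix = list(itertools.accumulate([1.0] + xs[:0:-1], operator.mul))[::-1]
    # grad_i = (product of everything before i) * (product of everything after i)
    return [p * s for p, s in zip(prefix, suffix)]

print(product_grad([2.0, 3.0, 4.0]))  # [12.0, 8.0, 6.0]
```

Both `accumulate` passes are scans, so on a parallel backend each could run as a work-efficient parallel scan; the hand-crafted rules the abstract mentions are special cases of this pattern for particular monoids.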
@InProceedings{FHPNC21p13,
author = {Adam Paszke and Matthew J. Johnson and Roy Frostig and Dougal Maclaurin},
title = {Parallelism-Preserving Automatic Differentiation for Second-Order Array Languages},
booktitle = {Proc.\ FHPNC},
publisher = {ACM},
pages = {13--23},
doi = {10.1145/3471873.3472975},
year = {2021},
}
Computing Persistent Homology in Futhark
Erik von Brömssen
(Chalmers University of Technology, Sweden)
We present a massively parallel algorithm for computing persistent homology, a concept within the field of topological data analysis, and we implement it in the purely functional array-based language Futhark, which has an efficient compiler targeting GPUs.
Computing persistent homology consists of bringing a certain sparse matrix to a reduced form.
We compare our implementation with OpenPH, an existing library for computing persistent homology on GPUs, and on large matrices we achieve speedups of 2.3× to 5×.
Our work shows both that persistent homology can be computed efficiently entirely on GPU hardware, and that Futhark can be used for this kind of sparse matrix manipulation.
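The matrix reduction the abstract refers to is the standard persistent homology algorithm: columns of the boundary matrix are added to each other (mod 2) until every non-empty column has a unique lowest row index. A minimal sequential sketch of that reduction (the paper's contribution is parallelising it on GPUs, which this sketch does not attempt):

```python
# Standard boundary-matrix reduction over F2, the core of persistent
# homology. Columns are represented as sets of row indices; adding two
# columns mod 2 is symmetric difference. Illustrative sketch only.

def reduce_boundary(columns):
    reduced = [set(c) for c in columns]
    low_of = {}                                # lowest row index -> owning column
    for j in range(len(reduced)):
        while reduced[j]:
            low = max(reduced[j])              # the column's lowest (largest-index) entry
            if low not in low_of:
                low_of[low] = j                # column j now uniquely owns row `low`
                break
            reduced[j] ^= reduced[low_of[low]]  # add the owning column (mod 2)
    return reduced

# Boundary matrix of a filled triangle: 3 vertices, 3 edges, 1 face.
cols = [[], [], [], [0, 1], [1, 2], [0, 2], [3, 4, 5]]
print(reduce_boundary(cols))
```

After reduction, the pairs (low, owning column) give the birth/death pairs of the persistence diagram, and emptied columns (like the third edge above) mark cycle births.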
@InProceedings{FHPNC21p24,
author = {Erik von Brömssen},
title = {Computing Persistent Homology in Futhark},
booktitle = {Proc.\ FHPNC},
publisher = {ACM},
pages = {24--36},
doi = {10.1145/3471873.3472976},
year = {2021},
}
Generating High Performance Code for Irregular Data Structures using Dependent Types
Federico Pizzuti,
Michel Steuwer, and
Christophe Dubach
(University of Edinburgh, UK; McGill University, Canada)
Parallel architectures offer high performance but are challenging to program. Data-parallel functional languages offer a solution by providing a high-level programming model for working with accelerators such as GPUs. Existing languages are designed to work with dense arrays, limiting their usefulness for expressing irregular data structures, such as the graphs and sparse matrices that are important in many application domains.
This paper addresses this limitation by extending a data-parallel language with limited dependent types, including position dependent arrays and dependent pairs to model irregular data structures. The approach is demonstrated through three case studies: dense to sparse matrix conversion, sparse matrix-vector multiplication, and parallel breadth-first search.
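To make the irregular-data-structure point concrete, here is a dynamic Python sketch of the sparse matrix-vector case study: a CSR matrix is essentially a dependent pair of per-row extents and a flat value array, which the paper captures statically with dependent types. All names here are illustrative, not from the paper.

```python
# CSR sparse matrix-vector multiply. row_ptr[r]..row_ptr[r+1] delimits
# row r's slice of col_idx/values -- the "dependent" part: each row's
# length is data, not a static dimension. Illustrative sketch only.

def spmv(row_ptr, col_idx, values, x):
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):  # row r's nonzeros
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

# The matrix [[1, 0, 2], [0, 3, 0]] in CSR form:
print(spmv([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```

In a dependently typed array language, the relationship between `row_ptr` and the length of `values` can be checked by the compiler, which is what enables generating flat, efficient GPU code for such irregular structures.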
Experimental results show that this approach outperforms state-of-the-art implementations on GPUs. Compared to Nvidia’s cuSparse, our automatically generated code achieves an average speedup of 1.2× for dense to sparse matrix conversion and 1.3× for sparse matrix-vector multiplication.
@InProceedings{FHPNC21p37,
author = {Federico Pizzuti and Michel Steuwer and Christophe Dubach},
title = {Generating High Performance Code for Irregular Data Structures using Dependent Types},
booktitle = {Proc.\ FHPNC},
publisher = {ACM},
pages = {37--49},
doi = {10.1145/3471873.3472977},
year = {2021},
}