PLDI 2020 Co-Located Events
41st ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2020)

41st ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2020), June 15–20, 2020, London, UK

PLDI 2020 Co-Located Events – Proceedings


Frontmatter

Title Page


Message from the Chairs


Committees


Sponsors



2020 ACM SIGPLAN International Symposium on Memory Management (ISMM 2020)

Frontmatter

Title Page


Welcome from the Chairs
It is with great pleasure that we welcome you to the ACM SIGPLAN International Symposium on Memory Management (ISMM) 2020. This is the 19th installment of ISMM and was originally meant to take place in London, UK on June 16, 2020, in conjunction with PLDI. However, due to the COVID-19 pandemic, the physical conference had to be cancelled and ISMM moved to a virtual conference format.

ISMM 2020 Organization
Committee Listings

ISMM 2020 Sponsors
Sponsors

Papers

Garbage Collection using a Finite Liveness Domain
Aman Bansal, Saksham Goel, Preey Shah, Amitabha Sanyal, and Prasanna Kumar
(IIT Bombay, India)
Functional languages manage heap data through garbage collection. Since static analysis of heap data is difficult, garbage collectors conservatively approximate the liveness of heap objects by reachability, i.e., every object that is reachable from the root set is considered live. Consequently, a large amount of memory that is reachable but not used further during execution is left uncollected by the collector.
Earlier attempts at liveness-based garbage collection for languages supporting structured types were based on analyses that considered arbitrary liveness values, i.e., they assumed that any substructure of the data could potentially be live. This made the analyses complex and unscalable. However, functional programs traverse structured data like lists in identifiable patterns. We identify a set of eight usage patterns that we use as liveness values. The liveness analysis that accompanies our garbage collector is based on this set; liveness arising out of other patterns of traversal is conservatively approximated by this set.
This restriction to a small set of liveness values reaps several benefits: it results in a simple liveness analysis that scales to much larger programs with minimal loss of precision, enables the use of a faster collection technique, and extends to higher-order programs.
Our experiments with a purely functional subset of Scheme show a reduction in analysis time by orders of magnitude. In addition, the minimum heap size required to run programs is comparable with that of a liveness-based collector with unrestricted liveness values, and in situations where memory is limited, garbage collection time is lower than with its reachability-based counterpart.
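The gap between reachability and pattern-restricted liveness can be illustrated with a toy sketch (ours, not the paper's analysis; the `Cell` class and the "spine" pattern are hypothetical stand-ins for the eight usage patterns):

```python
# Toy illustration (not the paper's algorithm): a mark phase that,
# instead of marking everything reachable, follows only the fields
# permitted by a liveness pattern such as "the spine of the list is
# live, but element payloads are not".

class Cell:
    def __init__(self, payload, next=None):
        self.payload = payload   # pointer to a heap object
        self.next = next         # pointer to the next cell

def mark_reachable(cell, marked):
    """Classic reachability: mark the spine and every payload."""
    while cell is not None and id(cell) not in marked:
        marked.add(id(cell))
        if cell.payload is not None:
            marked.add(id(cell.payload))
        cell = cell.next

def mark_live(cell, marked, pattern):
    """Liveness-based marking: 'spine' keeps only the cells themselves,
    'spine+payload' behaves like plain reachability."""
    while cell is not None and id(cell) not in marked:
        marked.add(id(cell))
        if pattern == "spine+payload" and cell.payload is not None:
            marked.add(id(cell.payload))
        cell = cell.next

# A three-cell list whose payloads are never used again.
payloads = [object() for _ in range(3)]
lst = None
for p in reversed(payloads):
    lst = Cell(p, lst)

reach, live = set(), set()
mark_reachable(lst, reach)
mark_live(lst, live, "spine")
print(len(reach), len(live))  # 6 vs 3: payloads need not be retained
```

A reachability-based collector must retain all six objects, while a collector armed with the "spine" liveness pattern can reclaim the three payloads.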

Prefetching in Functional Languages
Sam Ainsworth and Timothy M. Jones
(University of Cambridge, UK)
Functional programming languages contain a number of runtime and language features, such as garbage collection, indirect memory accesses, linked data structures, and immutability, that interact with a processor's memory system. These conspire to cause a variety of unintuitive memory-performance effects. For example, it is slower to traverse linked lists and arrays of data that have been sorted than to traverse the same data accessed in the order it was allocated.
We seek to understand these issues and mitigate them in a manner consistent with functional languages, taking advantage of the features themselves where possible. For example, immutability and garbage collection force linked lists to be allocated roughly sequentially in memory, even when the data pointed to within each node is not. We add language primitives for software prefetching to the OCaml language to exploit this, and observe significant performance improvements on a variety of micro- and macro-benchmarks, resulting in speedups of up to 2× on the out-of-order superscalar Intel Haswell and Xeon Phi Knights Landing systems, and up to 3× on the in-order Arm Cortex-A53.

Improving Phase Change Memory Performance with Data Content Aware Access
Shihao Song, Anup Das, Onur Mutlu, and Nagarajan Kandasamy
(Drexel University, USA; ETH Zurich, Switzerland)
Phase change memory (PCM) is a scalable non-volatile memory technology that has low access latency (like DRAM) and high capacity (like Flash). Writing to PCM incurs significantly higher latency and energy penalties compared to reading its content. A prominent characteristic of PCM’s write operation is that its latency and energy are sensitive to the data to be written as well as the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells only in one direction, i.e., using either SET or RESET operations, not both.
In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros, or all-ones) by comprehensively considering the number of set bits in the data to be written, and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. Results demonstrate that DATACON improves the effective access latency by 31%, overall system performance by 27%, and total memory system energy consumption by 43% compared to the best of performance-oriented state-of-the-art techniques.
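The first step, estimating the benefit of overwriting known content from the number of set bits, can be sketched as follows (the energy constants and function names are illustrative assumptions, not values from the paper):

```python
# Hypothetical cost model (illustrative numbers, not the paper's):
# writing a word over known all-zeros needs only SET pulses for the 1
# bits, over known all-ones only RESET pulses for the 0 bits, while
# unknown content may require programming in both directions.

SET_ENERGY, RESET_ENERGY = 1.0, 0.6   # assumed per-bit energies

def write_energy(word, background, bits=64):
    ones = bin(word & ((1 << bits) - 1)).count("1")
    zeros = bits - ones
    if background == "all-zeros":
        return ones * SET_ENERGY          # only the 1 bits are programmed
    if background == "all-ones":
        return zeros * RESET_ENERGY       # only the 0 bits are programmed
    return ones * SET_ENERGY + zeros * RESET_ENERGY  # unknown: worst case

def best_background(word):
    """Pick the known content type that is cheapest to overwrite."""
    return min(("all-zeros", "all-ones"),
               key=lambda bg: write_energy(word, bg))

print(best_background(0x0F))  # few set bits -> overwrite all-zeros
```

A word with few set bits is cheapest to write over all-zeros; a mostly-ones word is cheapest over all-ones, which is exactly the redirection decision DATACON makes per write.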

Verified Sequential Malloc/Free
Andrew W. Appel and David A. Naumann
(Princeton University, USA; Stevens Institute of Technology, USA)
We verify the functional correctness of an array-of-bins (segregated free-lists) single-thread malloc/free system with respect to a correctness specification written in separation logic. The memory allocator is written in standard C code compatible with the standard API; the specification is in the Verifiable C program logic, and the proof is done in the Verified Software Toolchain within the Coq proof assistant. Our "resource-aware" specification can guarantee when malloc will successfully return a block, unlike the standard Posix specification that allows malloc to return NULL whenever it wants to. We also prove subsumption (refinement): the resource-aware specification implies a resource-oblivious spec.

Understanding and Optimizing Persistent Memory Allocation
Wentao Cai, Haosen Wen, H. Alan Beadle, Chris Kjellqvist, Mohammad Hedayati, and Michael L. Scott
(University of Rochester, USA)
The proliferation of fast, dense, byte-addressable nonvolatile memory suggests that data might be kept in pointer-rich "in-memory" format across program runs and even process and system crashes. For full generality, such data requires dynamic memory allocation, and while the allocator could in principle be "rolled into" each data structure, it is desirable to make it a separate abstraction.
Toward this end, we introduce recoverability, a correctness criterion for persistent allocators, together with a nonblocking allocator, Ralloc, that satisfies this criterion. Ralloc is based on the LRMalloc of Leite and Rocha, with three key innovations. First, we persist just enough information during normal operation to permit correct reconstruction of the heap after a full-system crash. Our reconstruction mechanism performs garbage collection (GC) to identify and remedy any failure-induced memory leaks. Second, we introduce the notion of filter functions, which identify the locations of pointers within persistent blocks to mitigate the limitations of conservative GC. Third, to allow persistent regions to be mapped at an arbitrary address, we employ position-independent (offset-based) pointers for both data and metadata.
Experiments show Ralloc to be performance-competitive with both Makalu, the state-of-the-art lock-based persistent allocator, and such transient allocators as LRMalloc and JEMalloc. In particular, reliance on GC and offline metadata reconstruction allows Ralloc to pay almost nothing for persistence during normal operation.
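The third innovation, position-independent (offset-based) pointers, can be sketched in a few lines (our simplification; Ralloc's actual representation is more involved):

```python
# Sketch of offset-based pointers: links are stored as offsets from the
# link's own location rather than as absolute addresses, so a persistent
# region remains valid when mapped at a different base address.

class Region:
    def __init__(self, base, size):
        self.base = base             # current mapping address
        self.words = [0] * size      # simulated persistent words

    def store_ptr(self, at, target):
        # store target as an offset relative to the slot's own address;
        # the base cancels out, which is the whole point
        self.words[at] = (self.base + target) - (self.base + at)

    def load_ptr(self, at):
        # recover the target index regardless of the current base
        return at + self.words[at]

r = Region(base=0x1000, size=16)
r.store_ptr(at=2, target=9)
r.base = 0x8000          # "remap" the region at a new address
print(r.load_ptr(2))     # still resolves to index 9
```

Because the stored value is a relative offset, remapping the region (as after a crash and recovery) leaves every link valid without any pointer fix-up pass.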

ThinGC: Complete Isolation with Marginal Overhead
Albert Mingkun Yang, Erik Österlund, Jesper Wilhelmsson, Hanna Nyblom, and Tobias Wrigstad
(Uppsala University, Sweden; Oracle, Sweden; KTH, Sweden)
Previous works on leak-tolerating GC and write-rationing GC show that most reads/writes in an application are concentrated in a small number of objects. This suggests that many applications enjoy a clear and stable clustering of hot (recently read and/or written) and cold (the inverse of hot) objects. These results have been shown in the context of Jikes RVM, for stop-the-world collectors. This paper explores a similar design for a concurrent collector in the context of OpenJDK, plus a separate collector to manage cold objects in their own subheap. We evaluate the design and implementation of ThinGC using algorithms from JGraphT and the DaCapo suite. The results show that ThinGC considers fewer objects cold than previous work, and maintaining separate subheaps of hot and cold objects induces marginal overhead for most benchmarks, except one where a large slowdown due to excessive reheats is observed.

Alligator Collector: A Latency-Optimized Garbage Collector for Functional Programming Languages
Ben Gamari and Laura Dietz
(Well-Typed LLP, UK; University of New Hampshire, USA)
Modern hardware and applications require runtime systems that can operate under large-heap and low-latency requirements. For many client/server or interactive applications, reducing average and maximum pause times is more important than maximizing throughput.
The GHC Haskell runtime system version 8.10.1 offers a new latency-optimized garbage collector as an alternative to the existing throughput-optimized copying garbage collector. This paper details the latency-optimized GC design, which is a generational collector integrating GHC's existing collector and bump-pointer allocator with a non-moving collector and non-moving heap suggested by Ueno and Ohori. We provide an empirical analysis of the latency/throughput trade-offs. We augment the established nofib micro-benchmark suite with a response-time-focused benchmark that simulates real-world applications such as LRU caches, web search, and key-value stores.

Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories
Shihao Song, Anup Das, and Nagarajan Kandasamy
(Drexel University, USA)
Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that NVM access latency is much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory.
We show that our hybrid tiered-memory architecture has tremendous potential to improve performance and reliability, if exploited by an efficient page management policy in the operating system (OS). Modern OSes are already aware of inter-memory asymmetry. They migrate pages between the two memory types during program execution, starting from an initial allocation of the page to a randomly-selected free physical address in the memory. We extend existing OS awareness in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly-referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid tiered memory improves both performance and reliability for both single-core and multi-programmed workloads.
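A page-placement policy of the kind described, matching predicted access intensity to a tier, might look like the following sketch (tier names, latencies, and thresholds are invented for illustration and are not MNEME's):

```python
# Hypothetical four-tier placement policy: two segments per memory type,
# hotter pages preferred in faster tiers, with fallback when a tier is full.

TIERS = [
    ("DRAM-fast", 10),   # near segment: lowest latency, scarcest
    ("DRAM-slow", 25),
    ("NVM-fast",  60),
    ("NVM-slow", 150),   # far segment: highest latency, most capacity
]

def place_page(predicted_accesses_per_ms, free_pages):
    """Map predicted intensity to a preferred tier, then fall back to
    the next tier with free capacity."""
    if predicted_accesses_per_ms > 100:
        preferred = 0
    elif predicted_accesses_per_ms > 10:
        preferred = 1
    elif predicted_accesses_per_ms > 1:
        preferred = 2
    else:
        preferred = 3
    for i in range(preferred, len(TIERS)):
        name, _latency = TIERS[i]
        if free_pages.get(name, 0) > 0:
            free_pages[name] -= 1
            return name
    return None  # no capacity anywhere

free = {"DRAM-fast": 1, "DRAM-slow": 4, "NVM-fast": 64, "NVM-slow": 512}
print(place_page(500, free))   # hot page -> DRAM-fast
print(place_page(500, free))   # fast tier now full -> DRAM-slow
```

Placing a page in a matching tier up front, as in this sketch, is what lets the real system avoid later migrations over the memory channel.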


4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2020)

Frontmatter

Title Page


Welcome from the Chairs
Welcome to the fourth ACM SIGPLAN workshop on Machine Learning and Programming Languages (MAPL), co-located with PLDI 2020. The focus of MAPL is to advance interdisciplinary research across the fields of machine learning (ML) and programming languages (PL). The workshop was born of the belief that although significant advances have been made in both ML and PL, there are many open research problems that are likely to be best solved using a combination of the two. These include the development of machine learning to support novel programming systems that are better able to capture programmers' intentions, invent new algorithms, and potentially help with the autonomous generation and optimization of software programs. It also includes the use of programming systems technologies to improve machine learning infrastructure and to support learning with stronger guarantees.

Papers

Generating Correctness Proofs with Neural Networks
Alex Sanchez-Stern, Yousef Alhessi, Lawrence Saul, and Sorin Lerner
(University of California at San Diego, USA)
Foundational verification allows programmers to build software which has been empirically shown to have high levels of assurance in a variety of important domains. However, the cost of producing foundationally verified software remains prohibitively high for most projects, as it requires significant manual effort by highly trained experts. In this paper we present Proverbot9001, a proof search system using machine learning techniques to produce proofs of software correctness in interactive theorem provers. We demonstrate Proverbot9001 on the proof obligations from a large practical proof project, the CompCert verified C compiler, and show that it can effectively automate what were previously manual proofs, automatically producing proofs for 28% of theorem statements in our test dataset, when combined with solver-based tooling. Without any additional solvers, we exhibit a proof completion rate that is a 4× improvement over prior state-of-the-art machine learning models for generating proofs in Coq.

Semi-static Type, Shape, and Symbolic Shape Inference for Dynamic Computation Graphs
Momoko Hattori, Shimpei Sawada, Shinichiro Hamaji, Masahiro Sakai, and Shunsuke Shimizu
(University of Tokyo, Japan; Preferred Networks, Japan)
The growing interest in deep learning has created a demand to compile computation graphs to accelerate execution and to deploy applications on various devices. Modern deep learning frameworks construct computation graphs dynamically. This gives rise to the problem of inferring types of dynamic computation graphs. Two approaches are known. One of them is a dynamic approach that constructs the computation graph from the execution trace of an actual input. While this approach is straightforward, the results of the shape inference consist only of concrete values and are often difficult for users to interpret. The other one performs static analysis over the source program. This method can produce symbolic shape inference results but suffers from the dynamic nature of the host programming language Python.
In this paper, we propose a novel approach for type, shape, and symbolic shape inference of dynamic computation graphs as a mixture of the above two methods. We present results of applying our prototype inference engine for networks written with PyTorch and demonstrate its effectiveness for nontrivial networks.
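The core idea of propagating symbolic shapes can be sketched as follows (a minimal illustration of symbolic dimension unification, ours rather than the authors' inference engine):

```python
# Minimal sketch of symbolic shape propagation: dimensions are either
# concrete ints or symbolic names (strings), and matrix multiplication
# unifies the inner dimensions.

def unify(a, b):
    """Two dimensions unify if they are equal, or if one is symbolic."""
    if a == b:
        return a
    if isinstance(a, str):   # symbolic dimension: adopt the other side
        return b
    if isinstance(b, str):
        return a
    raise TypeError(f"shape mismatch: {a} vs {b}")

def matmul_shape(x, y):
    """Infer the result shape of x @ y for 2-D operands."""
    (n, k1), (k2, m) = x, y
    unify(k1, k2)            # inner dimensions must agree
    return (n, m)

# A two-layer network whose batch size is unknown at trace time:
h = matmul_shape(("batch", 784), (784, 128))
out = matmul_shape(h, (128, 10))
print(out)  # ('batch', 10)
```

Unlike a purely dynamic trace, which would report a concrete batch size such as `(32, 10)`, the symbolic result `('batch', 10)` documents how the output shape depends on the input.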

On the Challenges in Programming Mixed-Precision Deep Neural Networks
Ruizhe Zhao, Wayne Luk, Chao Xiong, Xinyu Niu, and Kuen Hung Tsoi
(Imperial College London, UK; Corerain Technologies, China)
Deep Neural Networks (DNNs) are resilient to reduced data precision, which motivates exploiting low-precision data formats for more efficient computation, especially on custom hardware accelerators. Multiple low-precision types can be mixed to fit the dynamic range of different DNN layers. However, these formats are not often supported on popular microprocessors and Deep Learning (DL) frameworks, hence we have to manually implement and optimize such novel data types and integrate them with multiple DL framework components, which is tedious and error-prone. This paper first reviews three major challenges in programming mixed-precision DNNs, including generating high-performance arithmetic and typecast functions, reducing the recompilation time and bloated binary size caused by excessive template specialization, and optimizing mixed-precision DNN computational graphs. We present our approach, Lowgen, a framework that addresses these challenges. For each challenge, we present our solution implemented and tested on our in-house, TensorFlow-like DL framework. Empirical evaluation shows that Lowgen can automatically generate efficient data type implementations that enable significant speed-up, which greatly lowers the development effort and enhances research productivity on mixed-precision DNNs.

Learning Quantitative Representation Synthesis
Mayur Patil, Farzin Houshmand, and Mohsen Lesani
(University of California at Riverside, USA)
Software systems often use specialized combinations of data structures to store and retrieve data. Designing and maintaining custom data structures, particularly concurrent ones, is time-consuming and error-prone. We let the user declare the required data as a high-level specification of a relation and method interface, and automatically synthesize correct and efficient concurrent data representations. We present provably sound syntactic derivations to synthesize structures that efficiently support the interface. We then synthesize synchronization to support concurrent execution on the structures. Multiple candidate representations may satisfy the same specification, and we aim at quantitative selection of the most efficient candidate. Previous works have either used dynamic auto-tuners to execute and measure the performance of the candidates or used static cost functions to estimate their performance. However, repeating the execution for many candidates is time-consuming, and a single performance model cannot be an effective predictor of all workloads across all platforms. We present a novel approach to quantitative synthesis that learns the performance model. We developed a synthesis tool called Leqsy that trains an artificial neural network to statically predict the performance of candidate representations. Experimental evaluations demonstrate that Leqsy can synthesize near-optimum representations.

Learned Garbage Collection
Lujing Cen, Ryan Marcus, Hongzi Mao, Justin Gottschlich, Mohammad Alizadeh, and Tim Kraska
(Massachusetts Institute of Technology, USA; Intel Labs, USA)
Several programming languages use garbage collectors (GCs) to automatically manage memory for the programmer. Such collectors must decide when to look for unreachable objects to free, which can have a large performance impact on some applications. In this preliminary work, we propose a design for a learned garbage collector that autonomously learns over time when to perform collections. By using reinforcement learning, our design can incorporate user-defined reward functions, allowing an autonomous garbage collector to learn to optimize the exact metric the user desires (e.g., request latency or queries per second). We conduct an initial experimental study on a prototype, demonstrating that an approach based on tabular Q learning may be promising.
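The tabular Q-learning design can be sketched as follows (the state, action, and reward choices and the toy environment are our assumptions for illustration, not the paper's prototype):

```python
# Sketch of a tabular Q-learning GC trigger: the agent observes a coarse
# heap-occupancy bucket and decides whether to collect now; the reward
# penalizes both collection pauses and running out of memory.

import random

ACTIONS = ["wait", "collect"]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {}  # (occupancy_bucket, action) -> estimated value

def choose(state):
    """Epsilon-greedy action selection over the Q table."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

# Toy environment: collecting at high occupancy pays off, collecting
# early wastes time, and never collecting risks heap exhaustion.
random.seed(0)
for _ in range(2000):
    occ = random.randint(0, 9)             # heap occupancy bucket
    a = choose(occ)
    if a == "collect":
        reward = 1.0 if occ >= 8 else -0.2  # pause cost vs. reclaimed memory
    else:
        reward = -1.0 if occ >= 9 else 0.0  # penalty for near-exhaustion
    next_occ = max(0, occ - 5) if a == "collect" else occ
    update(occ, a, reward, next_occ)

greedy_at_full = max(ACTIONS, key=lambda a: Q.get((9, a), 0.0))
print(greedy_at_full)  # the learned policy collects at a near-full heap
```

A user-defined reward (e.g., request latency instead of this toy signal) slots into the same update rule, which is what lets the collector optimize the metric the user actually cares about.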


9th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP 2020)

Frontmatter

Title Page


Message from the Chairs
Welcome to the 9th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP 2020), co-located with the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). In keeping with past workshops, SOAP 2020 focuses on results that are related to all aspects of static and dynamic analysis of programs.

Keynote

Formal Reasoning and the Hacker Way (Keynote)
Peter W. O'Hearn
(University College London, UK; Facebook, UK)
In 2013 I moved to industry after over 25 years in academia, when Facebook acquired a verification startup, Monoidics, that I was involved with. In this talk I'll recount the clash of cultures I encountered, where traditionally calm and cool formal reasoning techniques came into contact with a heated software development methodology based on rapid modification of large codebases (thousands of modifications per day on tens of MLOC). I will tell how we found that static formal reasoning could thrive if certain technical approaches (based on compositionality) were adopted, how the industrial experience caused me to question some of the assumptions I learned in academic static analysis, and how I've come out the other side with new science spurred by that experience (most recently, incorrectness logic). Overall, I hope to convey that having science and engineering play off one another in a tight feedback loop is possible, even advantageous, when practicing static analysis in industry at present.


Papers

TACAI: An Intermediate Representation Based on Abstract Interpretation
Michael Reif, Florian Kübler, Dominik Helm, Ben Hermann, Michael Eichberg, and Mira Mezini
(TU Darmstadt, Germany; University of Paderborn, Germany)
Most Java static analysis frameworks provide an intermediate representation (IR) of Java bytecode to facilitate the development of static analyses. While such IRs are often based on three-address code, the transformation itself is a great opportunity to apply optimizations, such as constant propagation, to the transformed code.
In this paper, we propose TACAI, a refinable IR that is based on abstract interpretation results of a method's bytecode. Exchanging the underlying abstract interpretation domains enables the creation of various IRs of different precision levels. Our evaluation shows that TACAI can be efficiently computed and provides slightly more precise receiver-type information than Soot's Shimple representation. Furthermore, we show how exchanging the underlying abstract domains directly impacts the generated IR.
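Constant propagation during such a transformation can be sketched on a toy three-address program (our illustration of the general technique, not TACAI's abstract-interpretation-based implementation):

```python
# Toy constant propagation while emitting three-address code: an
# environment maps variables to known constants, and additions whose
# operands are both known are folded into constants.

def const_prop(code):
    """code: list of (target, op, a, b) three-address instructions."""
    env, out = {}, []
    for target, op, a, b in code:
        va = env.get(a, a) if isinstance(a, str) else a
        vb = env.get(b, b) if isinstance(b, str) else b
        if op == "const":
            env[target] = va
            out.append((target, "const", va, None))
        elif op == "add" and isinstance(va, int) and isinstance(vb, int):
            env[target] = va + vb                 # fold: both operands known
            out.append((target, "const", va + vb, None))
        else:
            out.append((target, op, va, vb))      # keep, operands simplified
    return out

prog = [("x", "const", 2, None),
        ("y", "const", 3, None),
        ("z", "add", "x", "y"),      # both operands known: becomes const 5
        ("w", "add", "z", "p")]      # p unknown: stays an add
print(const_prop(prog)[2])  # ('z', 'const', 5, None)
```

Swapping the constant environment for a richer abstract domain (intervals, type sets, and so on) is the kind of exchange of underlying domains that yields IRs of different precision levels.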

Value and Allocation Sensitivity in Static Python Analyses
Raphaël Monat, Abdelraouf Ouadjaout, and Antoine Miné
(Sorbonne University, France; CNRS, France; LIP6, France)
Sound static analyses for large subsets of static programming languages such as C are now widespread. For example, the Astrée static analyzer soundly overapproximates the behavior of C programs that do not contain any dynamic code loading, longjmp statements, nor recursive functions. The sound and precise analysis of widely used dynamic programming languages like JavaScript and Python remains a challenge. This paper examines the variation of static analyses of Python – in precision, time, and memory usage – by adapting three parameters: the value sensitivity, the allocation sensitivity, and the activation of an abstract garbage collector. It is not clear yet which level of sensitivity constitutes a sweet spot in terms of precision versus efficiency to achieve a meaningful Python analysis. We thus perform an experimental evaluation using a prototype static analyzer on benchmarks a few thousand lines long. Key findings are: the value analysis does not improve the precision over type-related alarms; the value analysis is three times costlier than the type analysis; the allocation sensitivity depends on the value sensitivity; using an abstract garbage collector lowers memory usage and running times, but does not affect precision.

Explaining Bug Provenance with Trace Witnesses
Jixiang Shen, Xi Wu, Neville Grech, Bernhard Scholz, and Yannis Smaragdakis
(University of Sydney, Australia; University of Athens, Greece)
Bug finders are mainstream tools used during software development that significantly improve the productivity of software engineers and lower maintenance costs. These tools search for software anomalies by scrutinising the program's code using static program analysis techniques, i.e., without executing the code. However, current bug finders do not explain why bugs were found, primarily due to coarse-grain abstractions that abstract away large portions of the operational semantics of programming languages. To further improve the utility of bug finders, it is paramount to explain reported bugs to the end-users.
In this work, we devise a new technique that produces a program trace for a reported bug, giving insight into its root cause. For the generation of the program trace, we use an abstracted flow-based semantics for programs to overcome the undecidability of the problem. We simplify the semantic problem by mapping an input program with a reported bug to a Constant Copy Machine (CCM) for the trace construction. Using CCM, the semantics of the program can be weakened, and thus bug provenance can be solved in polynomial time, producing a shortest trace, which gives the most concise explanation. The technique is reified in the bug tracing tool Digger and is evaluated on several open-source Java programs.
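Once the program is abstracted to a flow graph, finding a shortest witness trace reduces to a shortest-path search; the following sketch (our simplification of the CCM-based approach, with an invented graph) illustrates the idea:

```python
# Sketch: breadth-first search from the bug's source to the report site
# over an abstracted flow graph yields a shortest witness trace.

from collections import deque

def shortest_trace(flow, source, sink):
    """flow: adjacency map over abstract program points; returns the
    shortest source->sink path, or None if no explanation exists."""
    parent = {source: None}
    q = deque([source])
    while q:
        node = q.popleft()
        if node == sink:
            trace = []               # reconstruct the path backwards
            while node is not None:
                trace.append(node)
                node = parent[node]
            return trace[::-1]
        for nxt in flow.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                q.append(nxt)
    return None

# Tiny abstracted flow graph: a null assigned early flows, via a copy,
# into a dereference that a bug finder would report.
flow = {"x = null": ["y = x", "z = 0"],
        "y = x":    ["deref y.f"],
        "z = 0":    ["w = z", "deref y.f"]}
print(shortest_trace(flow, "x = null", "deref y.f"))
# ['x = null', 'y = x', 'deref y.f']
```

BFS runs in time linear in the abstracted graph, which is consistent with the polynomial-time claim once the semantics has been weakened to copies of constants.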

