MPLR 2022
19th International Conference on Managed Programming Languages and Runtimes (MPLR 2022)

Powered by

Conference Publishing Consulting

19th International Conference on Managed Programming Languages and Runtimes (MPLR 2022), September 14–15, 2022, Brussels, Belgium

MPLR 2022 – Proceedings

Contents - Abstracts - Authors

Frontmatter

Title Page

Welcome from the Chairs
Welcome to MPLR 2022, the 19th International Conference on Managed Programming Languages and Runtimes. MPLR is a successor to the conference series on Managed Languages and Runtimes (ManLang) which originated as Principles and Practice of Programming in Java (PPPJ). This is a premier forum for presenting and discussing novel results in all aspects of managed programming languages and runtime systems, which serve as building blocks for many of the most important computing systems, ranging from small-scale (embedded and real-time systems) to large-scale (cloud-computing and big-data platforms) and anything in between (mobile, IoT, and wearable applications).

Organization

Keynotes

On JavaScript Ahead-of-Time Compilation Performance (Keynote)
Manuel Serrano
(Inria, France)
JavaScript is particularly difficult to implement efficiently because most of its expressions have all sorts of different meanings that involve all sorts of different executions that are not distinguished by any syntactic or type annotation. The common intuition is that only JIT compilers can handle it efficiently because they can rely on heuristic-based strategies that require having the program and the data on hand. But static (AOT) compilers can speculate and use heuristics too! To demonstrate that, we propose Hopc, an AOT compiler for JavaScript, based on the following assumption:
The most likely data structure a program will use is the one for which the compiler is able to produce its best code.
Thus, contrary to most AOT compilers, Hopc does not rely on complex static analyses to optimize programs. It simply tries to generate its best code that it protects with guards. In this presentation, I will present its main optimizations and an in-depth performance analysis.

Publisher's Version

Performance Optimizations in the .NET GC (Keynote)
Maoni Stephens
(Microsoft, USA)
In this talk we will look at a few interesting optimizations done in the .NET GC. One of the .NET GC's principles is to separate mechanisms from policies which allows for much more dynamic tuning decisions. We will also use these optimizations to illustrate this principle.
How does the .NET GC organize its objects on the heap when processing them during various GC phases? It groups adjacent live objects into what's called plugs and uses dead objects inbetween them, called gaps, to store necessary info to facilitate relocation. Special handling is done for pinned objects. Dead spaces inbetween pinned plugs which are threaded onto a free list can get used sooner to satisfy allocation requests by a mechanism called demotion. And we can decide if we want to make them available sooner by separating mechanisms from policies.
The GC can also make a decision whether it's more productive to do a compacting or sweeping GC after calculating the relocation information (another example of separating mechanisms from policies). If it turns out to be a sweeping GC, the allocations done in the free list of the older generation will need to be repaired. We repair this by an UNDO field on each free list item and keeping a count on the original free list item we took off for allocation, and we only need to repair that many when threading back the items whose UNDO field was touched.

Publisher's Version

Papers

Analysing and Predicting Energy Consumption of Garbage Collectors in OpenJDK
Marina Shimchenko, Mihail Popov, and Tobias Wrigstad
(Uppsala University, Sweden; Inria, France)
Sustainable computing needs energy-efficient software. This paper explores the potential of leveraging the nature of software written in managed languages: increasing energy efficiency by changing a program's memory management strategy without altering a single line of code. To this end, we perform comprehensive energy profiling of 35 Java applications across four benchmarks. In many cases, we find that it is possible to save energy by replacing the default G1 collector with another without sacrificing performance. Furthermore, potential energy savings can be even higher if performance regressions are permitted. Inspired by these results, we study what the most energy-efficient GCs are to help developers prune the search space for energy profiling at a low cost. Finally, we show that machine learning can be successfully applied to the problem of finding an energy-efficient GC configuration for an application, reducing the cost even further.

Publisher's Version

Automatic Array Transformation to Columnar Storage at Run Time
Lukas Makor, Sebastian Kloibhofer, David Leopoldseder, Daniele Bonetta, Lukas Stadler, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, Austria; Oracle Labs, Netherlands)
Today's huge memories make it possible to store and process large data structures in memory instead of in a database. Hence, accesses to this data should be optimized, which is normally relegated either to the runtimes and compilers or is left to the developers, who often lack the knowledge about optimization strategies. As arrays are often part of the language, developers frequently use them as an underlying storage mechanism. Thus, optimization of arrays may be vital to improve performance of data-intensive applications. While compilers can apply numerous optimizations to speed up accesses, it would also be beneficial to adapt the actual layout of the data in memory to improve cache utilization. However, runtimes and compilers typically do not perform such memory layout optimizations. In this work, we present an approach to dynamically perform memory layout optimizations on arrays of objects to transform them into a columnar memory layout, a storage layout frequently used in analytical applications that enables faster processing of read-intensive workloads. By integration into a state-of-the-art JavaScript runtime, our approach can speed up queries for large workloads by up to 9x, where the initial transformation overhead is amortized over time.

Publisher's Version

Better Understanding the Costs and Benefits of Automatic Memory Management
Kunal Sareen and Stephen M. Blackburn
(Australian National University, Australia)
Automatic memory management relieves programmers of the burden of having to reason about object lifetimes in order to soundly reclaim allocated memory. However, this automation comes at a cost. The cost and benefits of garbage collection relative to manual memory management have been the subject of contention for a long time, and will likely remain so. However, until now, the question is surprisingly under-studied. We examine the costs and benefits of garbage collection through four studies, exploring: (i) the space overheads of garbage collection; (ii) the effects of garbage collection on the execution time of a mutator using a free-list allocator; (iii) how proximity to garbage collection events affects mutator performance; and (iv) the effects of the delay in memory reuse on manually managed workloads. We conduct this study in a contemporary setting using recent CPU microarchitectures, and novel methodologies including a mark-sweep collector built upon off-the-shelf free-list allocators, allowing us to shed new light on garbage collection overheads in a modern context.
We find that: (i) simple, fully-copying collectors such as semi-space have average space overheads of about 65-80%, while immix has an overhead of 11-17% over a baseline approximating manual memory management; (ii) for the collection frequencies we evaluate, garbage collection has little impact on mutator time when using an optimized free-list allocator; (iii) the proximity of mutator work to the nearest collection has little impact on its performance, unless a free-list allocator is used; and (iv) postponing the reuse of memory generally has little effect on performance, but is allocator-specific, and is noticeable if the delay is significant relative to the workload’s footprint. The costs and benefits of garbage collection are likely to remain subject to contentious discussion. However, the methodologies and evaluations we present here provide a deeper understanding of the differences in costs between manual memory management and garbage collection.

Publisher's Version

Compressed Forwarding Tables Reconsidered
Jonas Norlinder, Erik Österlund, and Tobias Wrigstad
(Uppsala University, Sweden; Oracle, Sweden)
How concurrent compacting collectors store and manage forwarding information is crucial for their performance.
In this paper, we propose CFW, a representation technique for forwarding information that guarantees that forwarding information for an entire heap can be stored in at most 3.14% of its size. By providing such a guarantee, we simplify the task of deploying programs with respect to memory needs. This is important given how memory is typically the dominating factor in the cost model for cloud-based deployments.
We explore the design space of our technique through a prototype implementation on-top of ZGC. A rigorous performance evaluation shows promising results.

Publisher's Version

Dynamic Taint Analysis with Label-Defined Semantics
Jacob Kreindl, Daniele Bonetta, Lukas Stadler, David Leopoldseder, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, USA; Oracle Labs, Austria)
Dynamic taint analysis is a popular analysis technique which tracks the propagation of specific values while a program executes. To this end, a taint label is attached to these values and is dynamically propagated to any values derived from them. Frequent application of this analysis technique in many fields has led to the development of general-purpose analysis platforms with taint propagation capabilities. However, these platforms generally limit analysis developers to a specific implementation language, to specific propagation semantics or to specific taint label representations.
In this paper we present label-defined dynamic taint analysis, a language-agnostic approach for specifying the properties of a dynamic taint analysis in terms of propagated taint labels. This approach enables analysis platforms to support arbitrary adaptations to these properties by delegating propagation decisions to propagated taint labels and thus to provide more flexibility to analysis developers than other analysis platforms. We implemented this approach in TruffleTaint, a GraalVM-based taint analysis platform, and integrated it with GraalVM’s language interoperability and tooling support. We further integrated our approach with GraalVM’s performance optimizations. Our performance evaluation shows that label-defined taint analysis can reach peak performance similar to that of equivalent engine-integrated taint analyses. In addition to supporting the convenient reimplementation of existing dynamic taint analyses, our approach enables new capabilities for these analyses. It also enabled us to implement a novel tooling infrastructure for analysis developers as well as tooling support for end users.

Publisher's Version

Event-Based Out-of-Place Debugging
Tom Lauwaerts, Carlos Rojas Castillo, Robbert Gurdeep Singh, Matteo Marra, Christophe Scholliers, and Elisa Gonzalez Boix
(Universiteit Gent, Belgium; Vrije Universiteit Brussel, Belgium)
Debugging IoT applications is challenging due to the hardware constraints of IoT devices, making advanced techniques like record-replay debugging impractical. As a result, programmers often rely on manual resets or inefficient and time-consuming debugging techniques such as printf. Although simulators can help in that regard, their applicability is limited because they fall short of accurately simulating and reproducing the runtime conditions where bugs appear. In this work, we explore a novel debugging approach called event-based out-of-place debugging in which developers can capture a remotely running program and debug it locally on a (more powerful) machine. Our approach thus provides rich debugging features (e.g., step-back) that normally would not run on the hardware restricted devices. Two different strategies are offered to deal with resources which cannot be easily transferred (e.g., sensors): pull-based (akin to remote debugging), or push-based (where data updates are pushed to developer's machine during the debug session). We present EDWARD, an event-based out-of-place debugger prototype, implemented by extending the WARDuino WebAssembly microcontroller Virtual Machine, that has been integrated into Visual Studio Code. To validate our approach, we show how our debugger helps uncover IoT bugs representative of real-world applications through several use-case applications. Initial benchmarks show that event-based out-of-place debugging can drastically reduce debugging latency.

Publisher's Version

Machine-Learning-Based Self-Optimizing Compiler Heuristics
Raphael Mosaner, David Leopoldseder, Wolfgang Kisling, Lukas Stadler, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, Austria)
Compiler optimizations are often based on hand-crafted heuristics to guide the optimization process. These heuristics are designed to benefit the average program and are otherwise static or only customized by profiling information. We propose machine-learning-based self-optimizing compiler heuristics, a novel approach for fitting optimization decisions in a dynamic compiler to specific environments. This is done by updating a machine learning model with extracted performance data at run time. Related work—which primarily targets static compilers—has already shown that machine learning can outperform hand-crafted heuristics. Our approach is specifically designed for dynamic compilation and uses concepts such as deoptimization for transparently switching between generating data and performing machine learning decisions in single program runs. We implemented our approach in the GraalVM, a high-performance production VM for dynamic compilation. When evaluating our approach by replacing loop peeling heuristics with learned models we encountered speedups larger than 30% for several benchmarks and only few slowdowns of up to 7%.

Publisher's Version

Porting a JIT Compiler to RISC-V: Challenges and Opportunities
Quentin Ducasse, Guillermo Polito, Pablo Tesone, Pascal Cotret, and Loïc Lagadec
(ENSTA Bretagne, France; University of Lille, France; CNRS, France; Inria, France; Centrale Lille, France; CRIStAL, France)
The RISC-V Instruction Set Architecture (ISA) is an open-source, modular and extensible ISA. The ability to add new instructions into a dedicated core opens up perspectives to accelerate VM components or provide dedicated hardware IPs to applications running on top. However, the RISC-V ISA design is clashing on several aspects with other ISAs and therefore software historically built around them. Among them, the lack of condition codes and instruction expansion through simple instruction combination. In this paper we present the challenges of porting Cogit, the Pharo's JIT compiler tightly linked to the x86 ISA, on RISC-V. We present concrete examples of them and the rationale behind their inclusion in the RISC-V ISA. We show how those mismatches are solved through design choices of the compilation process or through tools helping development: a VM simulation framework to keep the development in a high-level environment for the most part, an ISA-agnostic test harness covering main VM functionalities and a machine code debugger to explore and execute generated machine code. We also present a way to prototype custom instructions and execute them in the Pharo environment.

Publisher's Version

SecSharp: Towards Efficient Trusted Execution in Managed Languages (Work in Progress)
Gilang Mentari Hamidy, Pieter Philippaerts, and Wouter Joosen
(KU Leuven, Belgium)
Trusted execution environments (TEEs) gained significant traction in recent years. They have become the foundation of Confidential Computing in cloud services, where certain security properties can be guaranteed on untrusted servers. Despite this adoption, writing code to target TEEs remains challenging. The SDKs for popular TEE implementations, like Intel SGX, are aimed at low-level languages like C/C++. Previous research has introduced support for developing and running programs written in managed programming languages in a TEE environment. However, in these works, the language runtime is embedded into the TEE, increasing the Trusted Computing Base (TCB) and thus inherently reducing trust into the TEE itself. To solve this problem, we propose a new approach to integrate the development of TEE code in managed languages, without the need to embed the full language runtime inside the TEE. It allows developers to write the TEE logic as part of their program in a managed programming language. Using the existing compiler infrastructure, the TEE logic is extracted and passed to a source-to-source compiler that transforms it into a low-level unmanaged equivalent. The resulting low-level code is then compiled by the compiler toolchain targeting the TEE platform. This paper reports on the design and the first results of our work-in-progress implementation of SecSharp, a tool to enable TEE development in C#.

Publisher's Version

Towards a Model Checking Framework for a New Collector Framework
Bochen Xu, Eliot Moss, and Stephen M. Blackburn
(University of Massachusetts at Amherst, USA; Google, Australia)
Garbage collectors provide memory safety, an important step toward program correctness. However, correctness of the collector itself can be challenging to establish, given both the style in which such systems are written and the weakly-ordered memory accesses of modern hardware. One way to maximize benefits is to use a framework in which effort can be focused on the correctness of small, modular critical components from which various collectors may be composed. Full proof of correctness is likely impractical, so we propose to gain a degree of confidence in collector correctness by applying model checking to critical kernels within a garbage collection framework. We further envisage a model framework, paralleling the framework nature of the collector, in hope that it will be easy to create new models for new collectors. We describe here a prototype model structure, and present results of model checking both stop-the-world and snapshot-at-the-beginning concurrent marking. We found useful regularities of model structure, and that models could be checked within possible time and space budgets on capable servers. This suggests that collectors built in a modular style might be model checked, and further that it may be worthwhile to develop a model checking framework with a domain-specific language from which to generate those models.

Publisher's Version

Posters

Analyzing the Cost of Safety for Vectorized Bytecode in Dynamically-Typed Languages
Nicolás Rainhart, Guillermo Polito, Pablo Tesone, and Stéphane Ducasse
(University of Lille, France; Inria, France; CNRS, France; Centrale Lille, France; CRIStAL, France; Pharo Consortium, France)
Vector instructions are a class of processor instructions that allow data level parallelism by performing the same instruction on multiple pieces of data, instead of a single one as usual.
On managed runtimes, special support from the virtual machine is required to generate vector instructions for the host machine’s architecture. There are two possible approaches to achieve this: add virtual machine intrinsics that use vector instructions, or extend the bytecode set with vector operations that are then mapped to processor vector instructions.
Intrinsics are methods implemented at the VM level, using native code. They allow low-level control of the operations being performed and are generally used to build entire operations, such as sorting an array or adding two arrays together.
Bytecode operations, on the other hand, are usually designed to be composable. For example, the operation of adding two arrays can be composed of bytecodes to load a part of an array into a vector register, one to add the vector registers, and one to store the result back into an array.
We implemented both approaches in the Pharo VM and analyzed the impact of such safety checks on performance. We found that the most significant difference in performance between intrinsic methods and bytecode operations stems from the latter performing redundant type checking on bytecode's parameters, bounds checking on array accesses, and overflow tests on array iterators.
These checks are necessary since bytecodes can be composed in arbitrary sequences, and so they need to prevent invalid ones from causing catastrophic errors. This requires setting up safety checks on each bytecode to ensure that each operates with valid parameters. However, once a bytecode sequence is formed, some of these checks could be removed without compromising the correctness of that sequence.
We propose existing compiler optimizations than could be used to reduce the impact of these checks on performance, although we have not yet evaluated their effectiveness.

Publisher's Version

Automatically Transforming Arrays to Columnar Storage at Run Time
Sebastian Kloibhofer, Lukas Makor, David Leopoldseder, Daniele Bonetta, Lukas Stadler, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, Austria; Oracle Labs, Netherlands)
Picking the right data structure for the right job is one of the key challenges for every developer. However, especially in the realm of object-oriented programming, the memory layout of data structures is often still suboptimal for certain data access patterns, due to objects being scattered across the heap. Therefore, this work presents an approach for the automated transformation of arrays of objects into a contiguous format (called columnar arrays). At run time, we identify suitable arrays, perform the transformation and use a dynamic compiler to gain performance improvements. In the evaluation, we show that our approach can improve the performance of certain queries over large, uniform arrays.

Publisher's Version

BestGC: An Automatic GC Selector Software
Sanaz Tavakolisomeh, Rodrigo Bruno, and Paulo Ferreira
(University of Oslo, Norway; INESC-ID, Portugal; University of Lisbon, Portugal)
Garbage collection (GC) solutions are widely used in programming languages like Java. Such GC solutions follow prioritized goals and behave differently regarding crucial performance metrics like pause time, throughput, and memory usage. Consequently, running an application using each GC would have an evident impact on the application's performance, especially on those dealing with massive data and tons of transactions. Nevertheless, it is challenging for a user/developer to pick a GC solution that fits an application's performance goals. However, surprisingly, there is no tool to help a user/developer on this matter. In this study, we follow an effective methodology to build a heuristic combining throughput and pause time, with diverse heap sizes available, to score four production GCs (G1, Parallel, Shenandoah, and ZGC). Then we propose a system, BestGC, which offers the most suitable GC solution for a user application taking into account the performance goals of the user application.

Publisher's Version

Characterizing WebAssembly Bytecode
Yuxin Qin, Dejice Jacob, and Jeremy Singer
(University of Glasgow, UK)
WebAssembly, known as Wasm, is an interpreted portable bytecode execution format that is growing in popularity. In this work, we perform a simple pair of characterizations of Wasm. Statically, we compare the instruction set to other portable bytecodes like JVM and PCODE, showing that Wasm operates at a lower abstraction level than JVM, similar to PCODE. Dynamically, we study the Wasm instruction mix for a set of common benchmark applications. This investigation reveals that, like JVM, data movement operations occur most frequently in instruction traces. We conclude by discussing possible future directions for optimizing Wasm execution.

Publisher's Version

Selecting Semi-permanent Object Candidates in Dynamically-Typed Reflective Languages
Nahuel Palumbo, Pablo Tesone, Guillermo Polito, and Stéphane Ducasse
(University of Lille, France; Inria, France; CNRS, France; Centrale Lille, France; CRIStAL, France; Pharo Consortium, France)
Garbage collector pauses can take human perceivable pauses on big heaps. In this poster, we argue the existence of semi-permanent objects: objects that have really low chances of being collected, and even lower chances in production scenarios where the application code is not subject to change at run-time. The key challenge that stems from having a semi-permanent generation is to select what objects are meant to be moved or allocated to such generation. We present preliminary work that shows that a manual selection of code-related objects following hand-crafted heuristics reduces GC pause times by half.

Publisher's Version

Demonstrations

Boehm-Demers-Weiser Garbage Collection on Morello
Dejice Jacob and Jeremy Singer
(University of Glasgow, UK)
The Boehm-Demers-Weiser collector is a conservative garbage collection scheme generally used for C/C++ applications. In this demo, we will show the collector running with C benchmark programs on a new CHERI-style hardware capability platform, specifically the Arm Morello system.

Publisher's Version

Polyglot, Label-Defined Dynamic Taint Analysis in TruffleTaint
Jacob Kreindl, Daniele Bonetta, Lukas Stadler, David Leopoldseder, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, Netherlands; Oracle Labs, Austria)
Dynamic taint analysis assigns taint labels to sensitive data and tracks the propagation of such tainted data during program execution. This program analysis technique has been implemented in various analysis platforms targeting specific programming languages or program representations and has been applied to diverse fields such as software security and debugging. While some of these platforms support customization of their taint analysis, such customization is typically limited to certain analysis properties or to predefined options. This limitation can require analysis developers to modify the analysis platform in order to adapt other analysis properties or to implement new taint analysis applications.
We designed label-defined dynamic taint analysis as a new approach to specifying a dynamic taint analysis in terms of taint labels. This approach enables an analysis platform to allow analysis developers to adapt arbitrary analysis properties without modifying the platform itself. We implemented our approach in TruffleTaint, a GraalVM-based dynamic taint analysis platform targeting multiple programming languages. Our prototype supports implementing taint analyses in multiple programming languages and further provides tooling support for analysis development. In this tool demonstration we will present the capabilities of our prototype and demonstrate the implementation of label-defined dynamic taint analyses with common adaptations to various analysis properties.

Publisher's Version

proc time: 5.39