MPLR 2020
17th International Conference on Managed Programming Languages and Runtimes (MPLR 2020)

17th International Conference on Managed Programming Languages and Runtimes (MPLR 2020), November 4-6, 2020, Virtual, UK

MPLR 2020 – Proceedings


Frontmatter

Title Page


Welcome from the Chairs
Welcome to MPLR 2020, the 17th International Conference on Managed Programming Languages and Runtimes. MPLR is a successor to the conference series on Managed Languages and Runtimes (ManLang). It is a premier forum for presenting and discussing novel results in all aspects of managed programming languages and runtime systems, which serve as building blocks for some of the most important computing systems around, ranging from small-scale (embedded and real-time systems) to large-scale (cloud-computing and big-data platforms) and anything in between (mobile, IoT, and wearable applications).

MPLR 2020 Organization


Keynotes

Garbage Collection: Implementation, Innovation, Performance, and Security (Keynote)
Stephen M. Blackburn
(Australian National University, Australia)
Garbage collection is a key technology underpinning many popular languages today. However, garbage collectors are the epitome of performance- and security-critical low-level systems programming. They are notoriously difficult to implement correctly and are extremely performance-sensitive. Performance, innovation, and security are often viewed as being in tension; all the more so in systems programming. In this talk, I will reflect on my experience in academia and industry building garbage collectors, and will particularly address the question of how critical systems software can be engineered to maximize performance, innovation, and security.

Hardware Support for Managed Languages: An Old Idea Whose Time Has Finally Come? (Keynote)
Martin Maas
(Google Research, USA)
A large number of workloads are written in managed languages, including server workloads in data centers, web applications in browsers, and client workloads on desktops or mobile devices. Due to their widespread adoption, improving the performance and efficiency of managed-language features such as garbage collection, JIT compilation, and dynamic runtime checks can have a significant impact on many real workloads. Hardware support for such features has been investigated since the 1970s, but still has not seen widespread practical adoption.
In this talk, I will discuss trends that make today a great time to revisit these ideas. I will describe work done at UC Berkeley that moves garbage collection into a small hardware accelerator close to the memory controller and performs GC more efficiently than a CPU. I will also talk about current work within the open-source RISC-V project on developing standard extensions for managed-language support in the context of the free and open RISC-V ISA. Finally, I will lay out opportunities for research in this area, and how open-source infrastructure can be used to build end-to-end prototypes of this work in an academic setting.


Research Papers

From Causality to Stability: Understanding and Reducing Meta-Data in CRDTs
Jim Bauwens and Elisa Gonzalez Boix
(Vrije Universiteit Brussel, Belgium)
Modern distributed applications increasingly replicate data to guarantee both high availability of systems and optimal user experience. Conflict-Free Replicated Data Types (CRDTs) are a family of data types specially designed for highly available systems that guarantee some form of eventual consistency. To ensure state convergence between replicas, CRDT implementations need to keep track of additional meta-data. This is not a scalable strategy, as a growing amount of meta-data has to be kept.
In this paper, we show that existing solutions for this problem miss optimisation opportunities and may lead to less reactive CRDTs. For this, we analyse the relation between meta-data and the causality of operations in operation-based CRDTs. We explore a new optimisation strategy for pure operation-based CRDTs and show how it reduces memory overhead. Our approach takes advantage of the communication layer providing reliable delivery to determine causal stability, and as a result, meta-data can be removed sooner. We furthermore propose a solution for improving the reactivity of CRDTs built on a reliable causal broadcasting layer.
We apply our strategy to pure-operation based CRDTs and validate our approach by measuring its impact on several different set-ups. The results show how our approach can lead to significant improvements in meta-data cleanup when compared to state-of-the-art techniques.
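The central idea, discarding meta-data once an operation is causally stable (i.e., once every replica is known to have observed it), can be illustrated with a minimal vector-clock sketch. All names here are hypothetical illustrations, not the authors' implementation:

```python
# Minimal sketch of causal stability in an operation-based CRDT
# (hypothetical names). An operation tagged with timestamp ts is causally
# stable once every replica is known to have observed it; its meta-data
# (the timestamp) can then be removed.

def pointwise_min(clocks):
    """Component-wise minimum over vector clocks (dicts replica -> counter)."""
    keys = set().union(*clocks)
    return {k: min(c.get(k, 0) for c in clocks) for k in keys}

def is_causally_stable(ts, known_clocks):
    """ts is stable if it is dominated by the clock every replica has reached."""
    low = pointwise_min(known_clocks)
    return all(low.get(k, 0) >= v for k, v in ts.items())

# Replica A is known to have seen {A:3, B:2}; replica B only {A:2, B:2}:
known = [{"A": 3, "B": 2}, {"A": 2, "B": 2}]
assert is_causally_stable({"A": 2, "B": 1}, known)      # everyone has it
assert not is_causally_stable({"A": 3, "B": 2}, known)  # B has not seen A:3 yet
```

The paper's insight is that a reliable causal broadcast layer already carries the delivery information needed to compute `known_clocks`, so stability can be detected, and meta-data cleaned up, sooner than with separate acknowledgement protocols.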

Multi-language Dynamic Taint Analysis in a Polyglot Virtual Machine
Jacob Kreindl, Daniele Bonetta, Lukas Stadler, David Leopoldseder, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, USA; Oracle Labs, Austria)
Dynamic taint analysis is a popular program analysis technique in which sensitive data is marked as tainted and the propagation of tainted data is tracked in order to determine whether that data reaches critical program locations. This analysis technique has been successfully applied to software vulnerability detection, malware analysis, testing and debugging, and many other fields. However, existing approaches to dynamic taint analysis are either language-specific or they target native code. Neither is suitable for analyzing applications in which high-level dynamic languages such as JavaScript and low-level languages such as C interact. In these approaches, the language boundary forms an opaque barrier that prevents a sound analysis of data flow in the other language and can thus lead to the analysis being evaded.
In this paper we introduce TruffleTaint, a platform for multi-language dynamic taint analysis that uses language-independent techniques for propagating taint labels to overcome the language boundary but still allows for language-specific taint propagation rules. Based on the Truffle framework for implementing runtimes for programming languages, TruffleTaint supports propagating taint in and between a selection of dynamic and low-level programming languages and can be easily extended to support additional languages. We demonstrate TruffleTaint’s propagation capabilities and evaluate its performance using several benchmarks from the Computer Language Benchmarks Game, which we implemented as combinations of C, JavaScript and Python code and which we adapted to propagate taint in various scenarios of language interaction. Our evaluation shows that TruffleTaint causes low to zero slowdown when no taint is introduced, rivaling state-of-the-art dynamic taint analysis platforms, and only up to ∼40x slowdown when taint is introduced.
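The basic mechanism of dynamic taint analysis can be sketched in a few lines: values carry a set of taint labels, and every operation unions the labels of its operands so that taint survives computation on its way to a sink. This is a generic illustration of the technique, not TruffleTaint's actual API:

```python
# Sketch of taint label propagation (hypothetical names): each value carries
# a set of labels, and operations union the labels of their operands.

class Tainted:
    def __init__(self, value, labels=frozenset()):
        self.value = value
        self.labels = frozenset(labels)

    def __add__(self, other):
        o_val = other.value if isinstance(other, Tainted) else other
        o_lab = other.labels if isinstance(other, Tainted) else frozenset()
        # Propagation rule: the result is tainted by both operands.
        return Tainted(self.value + o_val, self.labels | o_lab)

def sink(v):
    """A critical program location: reject tainted data."""
    if isinstance(v, Tainted) and v.labels:
        raise ValueError(f"tainted data reached sink: {sorted(v.labels)}")
    return v

secret = Tainted(42, {"user-input"})  # a source introduces a label
result = secret + 1                   # the label survives computation
assert result.value == 43 and "user-input" in result.labels
```

The multi-language challenge the paper addresses is exactly where such a per-language scheme breaks down: the labels must survive a call from, say, JavaScript into C, which is why TruffleTaint attaches propagation to language-independent runtime infrastructure rather than to one language's value representation.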

Efficient, Near Complete, and Often Sound Hybrid Dynamic Data Race Prediction
Martin Sulzmann and Kai Stadtmüller
(Karlsruhe University of Applied Sciences, Germany)
Dynamic data race prediction aims to identify races based on a single program run represented by a trace. The challenge is to remain efficient while being as sound and as complete as possible. Efficient means a linear run-time, as otherwise the method is unlikely to scale to real-world programs. We introduce an efficient, near complete, and often sound dynamic data race prediction method that combines the lockset method with several improvements made in the area of happens-before methods. By near complete we mean that the method is complete in theory, but for efficiency reasons the implementation applies some optimizations that may result in incompleteness. The method can be shown to be sound for two threads but is unsound in general. Experiments show that our method works well in practice.
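The lockset component that the paper combines with happens-before reasoning follows the classic Eraser-style scheme: a variable's candidate lockset is the intersection of the locks held at each of its accesses, and an empty intersection across multiple threads flags a potential race. A minimal sketch with hypothetical trace format:

```python
# Eraser-style lockset check (illustrative sketch, not the paper's hybrid
# algorithm): trace events are (thread, variable, locks_held) tuples.

def lockset_check(trace):
    """Return variables whose candidate lockset becomes empty while the
    variable is accessed by more than one thread (potential races)."""
    candidate = {}  # variable -> set of locks consistently held at accesses
    threads = {}    # variable -> set of threads that accessed it
    races = set()
    for thread, var, held in trace:
        candidate[var] = candidate.get(var, set(held)) & set(held)
        threads.setdefault(var, set()).add(thread)
        if not candidate[var] and len(threads[var]) > 1:
            races.add(var)
    return races

trace = [
    ("T1", "x", {"L"}),   # x is always protected by lock L
    ("T2", "x", {"L"}),
    ("T1", "y", {"L"}),   # y is accessed under different locks ...
    ("T2", "y", {"M"}),   # ... so its candidate lockset becomes empty
]
assert lockset_check(trace) == {"y"}
```

Pure lockset analysis of this kind is complete but unsound (it reports false positives for, e.g., safe publication idioms), which is precisely the gap the paper's happens-before refinements aim to close.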

Efficient Dispatch of Multi-object Polymorphic Call Sites in Contextual Role-Oriented Programming Languages
Lars Schütze and Jeronimo Castrillon
(TU Dresden, Germany)
Adaptive software becomes more and more important as computing is increasingly context-dependent. Runtime adaptability can be achieved by dynamically selecting and applying context-specific code. Role-oriented programming has been proposed as a paradigm to enable runtime-adaptive software by design. Roles change the objects’ behavior at runtime, thus adapting the software to a given context. The cost of adaptivity, however, is a high runtime overhead stemming from executing compositions of behavior-modifying code. It has been shown that the overhead can be reduced by optimizing dispatch plans at runtime when contexts do not change, but no method exists to reduce the overhead in cases with high context variability. This paper presents a novel approach to implement polymorphic role dispatch, taking advantage of runtime information to effectively guard abstractions and enable reuse even in the presence of variable contexts. The concept of polymorphic inline caches is extended to role invocations. We evaluate the implementation with a benchmark for role-oriented programming languages, achieving a geometric mean speedup of 4.0× (3.8× up to 4.5×) with static contexts, and close to no overhead in the case of varying contexts over the current implementation of contextual roles in Object Teams.
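A polymorphic inline cache, the mechanism the paper extends to role invocations, caches a small number of (type, target) pairs at each call site and falls back to a full lookup on a miss. A generic sketch of the idea (hypothetical names, single-receiver dispatch rather than the paper's multi-object case):

```python
# Sketch of a polymorphic inline cache (PIC): each call site remembers a
# few (receiver type, resolved method) pairs so repeated calls with the
# same types take a fast guarded path instead of a full method lookup.

class CallSite:
    MAX_ENTRIES = 4  # typical PIC depth before a site is treated as megamorphic

    def __init__(self, slow_lookup):
        self.cache = []              # list of (type, method) pairs
        self.slow_lookup = slow_lookup

    def dispatch(self, receiver, *args):
        t = type(receiver)
        for cached_type, method in self.cache:   # fast path: cheap type guard
            if cached_type is t:
                return method(receiver, *args)
        method = self.slow_lookup(t)             # slow path: full lookup
        if len(self.cache) < self.MAX_ENTRIES:
            self.cache.append((t, method))       # monomorphic -> polymorphic
        return method(receiver, *args)

class A:
    def greet(self): return "A"

class B:
    def greet(self): return "B"

site = CallSite(lambda t: t.greet)
assert site.dispatch(A()) == "A"   # miss, then cached
assert site.dispatch(B()) == "B"   # second cache entry
assert site.dispatch(A()) == "A"   # fast-path hit
assert len(site.cache) == 2
```

Role dispatch complicates this picture because the dispatch target depends not only on the receiver's type but on the set of currently active roles and contexts, so the cache guards must cover that additional state as well.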


Work-in-Progress Papers

SymJEx: Symbolic Execution on the GraalVM
Sebastian Kloibhofer, Thomas Pointhuber, Maximilian Heisinger, Hanspeter Mössenböck, Lukas Stadler, and David Leopoldseder
(JKU Linz, Austria; Oracle Labs, Austria)
Developing software systems is inherently subject to errors that can later cause failures in production. While testing can help to identify critical issues, it is limited to concrete inputs and states. Exhaustive testing is infeasible in practice; hence we can never prove the absence of faults. Symbolic execution, i.e., the process of symbolically reasoning about the program state during execution, can inspect the behavior of a system under all possible concrete inputs at run time. It automatically generates logical constraints that match the program semantics and uses theorem provers to verify the existence of error states within the application. This paper presents a novel symbolic execution engine called SymJEx, implemented on top of the multi-language Java Virtual Machine GraalVM. SymJEx uses the Graal compiler's intermediate representation to derive and evaluate path conditions, allowing GraalVM users to leverage the engine to improve software quality. In this work, we show how SymJEx finds non-trivial faults in existing software systems and compare our approach with established symbolic execution engines.

Transparent Acceleration of Java-based Deep Learning Engines
Athanasios Stratikopoulos, Mihai-Cristian Olteanu, Ian Vaughan, Zoran Sevarac, Nikos Foutris, Juan Fumero, and Christos Kotselidis
(University of Manchester, UK; Deep Netts, USA)
The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has increased the demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning, data mining, and computer vision. In recent years, hardware accelerators have been employed as a means to meet this demand, due to the high parallelism that these applications exhibit. Although this approach can yield high performance, the development of new deep learning neural networks on heterogeneous hardware requires a steep learning curve. The main reason is that existing deep learning engines support only the static compilation of the accelerated code, which can be accessed via wrapper calls from a wide range of managed programming languages (e.g., Java, Python, Scala). Therefore, the development of high-performance neural network architectures is fragmented between programming models, thereby forcing developers to manually specialize the code for heterogeneous execution. The specialization of the applications' code for heterogeneous execution is not a trivial task, as it requires developers to have hardware expertise and use a low-level programming language, such as OpenCL, CUDA, or High Level Synthesis (HLS) tools.
In this paper we showcase how we have employed TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts on heterogeneous hardware. Our work shows how a pure Java-based deep learning neural network engine can be dynamically compiled at runtime and specialized for particular hardware accelerators, without requiring developers to employ any low-level programming framework typically used for such devices. Our preliminary results show up to 6.45x end-to-end performance speedup and up to 88.5x kernel performance speedup when executing the feed-forward process of the network's training on GPUs, compared to the sequential execution of the original Deep Netts framework.

You Can’t Hide You Can’t Run: A Performance Assessment of Managed Applications on a NUMA Machine
Orion Papadakis, Foivos S. Zakkak, Nikos Foutris, and Christos Kotselidis
(University of Manchester, UK; Red Hat, UK)
The ever-growing demand for more memory capacity from applications has always been a challenging factor in computer architecture. The advent of the Non-Uniform Memory Access (NUMA) architecture works around the physical constraints of a single processor by providing more system memory through pools of processors, each with its own memory, but with variable access times. However, the efficient exploitation of such computing systems is a non-trivial task for software engineers. We have observed that the performance of more than half of the applications picked from two distinct benchmark suites is negatively affected when running on a NUMA machine in the absence of manual tuning. This finding motivated us to develop a new profiling tool, called PerfUtil, to study, characterize, and better understand why those benchmarks have sub-optimal performance on NUMA machines. PerfUtil's effectiveness is based on its ability to track numerous events throughout the system at the managed runtime system level, which ultimately assists in demystifying NUMA peculiarities and accurately characterizing the profiles of managed applications.

trcview: Interactive Architecture Agnostic Execution Trace Analysis
Daniel Pekarek and Hanspeter Mössenböck
(JKU Linz, Austria)
Debuggers are traditionally used to investigate and observe the dynamic behavior of software. Reverse debuggers record a program execution and provide a method to step through the program forward and backward in order to quickly locate the operation of interest. However, recording-based approaches usually assume a specific method of recording the program execution. Furthermore, the recording and the analysis are often tightly coupled, so that it is not trivial to quickly add support for new architectures or other recording tools.
To address these shortcomings, we defined a set of essential event types that fully capture a program execution in a platform-independent way. A prototype of the interactive trace analysis software was implemented in Java; it can handle recorded execution traces with 65 million instructions when using a Java heap size of 16 GiB for the analysis tool. To validate the platform independence, three fundamentally different architectures were tested: AMD64, PowerPC, and PDP-11.
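An architecture-agnostic event vocabulary of the kind the abstract describes might look like the following sketch. The event names and payloads here are hypothetical illustrations, not the paper's actual event set:

```python
# Sketch of a platform-independent trace event model (hypothetical names):
# a small, fixed vocabulary of events suffices to reconstruct an execution
# regardless of the recorded architecture.

from enum import Enum, auto

class EventKind(Enum):
    INSTRUCTION = auto()  # program counter + decoded operation
    MEM_READ = auto()     # address, size, value
    MEM_WRITE = auto()
    REG_WRITE = auto()    # register id, new value
    TRAP = auto()         # interrupts, system calls

def count_kinds(trace):
    """Index a recorded trace by event kind, e.g., for interactive navigation."""
    counts = {}
    for kind, _payload in trace:
        counts[kind] = counts.get(kind, 0) + 1
    return counts

trace = [
    (EventKind.INSTRUCTION, {"pc": 0x1000}),
    (EventKind.MEM_WRITE, {"addr": 0x2000, "value": 7}),
    (EventKind.INSTRUCTION, {"pc": 0x1004}),
]
assert count_kinds(trace)[EventKind.INSTRUCTION] == 2
```

Any recorder, whether an AMD64 emulator or a PDP-11 simulator, that can emit this vocabulary can then feed the same interactive analysis front end.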

