Powered by
2024 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH 2024),
October 20–25, 2024,
Pasadena, CA, USA
2024 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH 2024)
Frontmatter
Doctoral Symposium
Unified Analysis Techniques for Programs with Outcomes
Noam Zilberstein
(Cornell University, USA)
The theories required for program analysis have evolved past the foundations developed decades ago.
Programs display computational effects such as nondeterminism, randomization, exceptions, and mutable state.
Orthogonally,
bug-finding is often more tangible than establishing correctness.
Specialized techniques have been introduced to handle effects and incorrectness reasoning.
While successful, the disjointed nature of those techniques means that new analysis systems must be built to analyze each kind of program and property.
In my research, I am creating a logical foundation to consolidate the aforementioned theories, with which I plan to develop a new generation of static analysis tools that are simpler and more efficient than those available today.
Article Search
A VM-Based Approach for Power Modeling
Joseph Raskind
(SUNY Binghamton, USA)
Power modeling is an essential building block for computer systems in support of energy optimization, energy profiling, and energy-aware application development. In 2024, I introduced VESTA, a novel approach to modeling the power consumption of applications with one key insight: language runtime events are often correlated with a sustained level of power consumption. When compared with the established approach of power modeling based on hardware performance counters (HPCs), VESTA has the benefit of solely requiring application-scoped information and enabling a higher level of explainability, while achieving comparable or even higher precision. I hope to continue on this track of discovering the deep connections between VM-events and power consumption while also providing practical solutions for mapping one to the other through the construction of power models.
Article Search
Full-Stack Collaboration for Robust Heterogeneity-Enabled AI Systems
Yuxin Qiu
(University of California at Riverside, USA)
In this new era of AI with diverse hardware accelerators such as GPUs and quantum circuits, achieving system-wide robustness requires tackling issues throughout all system layers, spanning from software applications to hardware components. My research is to enhance the robustness of heterogeneity-enabled AI systems by reinventing software testing and analysis techniques via leveraging full-stack insights and advanced AI capabilities. I have completed one research project and have collaborated on a couple of others at the application and language levels. As the next steps, I will explore (1) holistic regression testing to prioritize test inputs associated with system-wide changes and (2) full-stack analysis to optimize computing resource allocation and reduce hardware reliance by analyzing application characteristics and using alternative resources in tandem.
Article Search
JMVX: Improving Record-Replay for Managed Languages
David Schwartz
(University of Illinois at Chicago, USA)
Record-Replay (RR) is a useful technique for investigating concurrency related bugs whose appearance are subject to particular thread schedulings. RR logs/records the nondeterministic inputs that influence a program such as thread scheduling, shared memory operations, and external file/socket data. The recording can be replayed, providing the same inputs to replicate the bug. Unfortunately, RR is not good at patch validation. Mutable replay is when a recording generated by the buggy version of a program is replayed on the patched version. Patches typically introduce divergent behavior which often prevents replay entirely.
In addition, current RR systems are slow for languages running on a Virtual Machine like Java. They limit parallelism while forcing the Garbage Collector and Just-In-Time compiler to behave eactly the same between the recording and replay. RR systems designed for Java replay mulithreaded code incorrectly or require the use of a customized JVM.
To alleviate this, we developed a prototype RR system for Java called JMVX. It operates within Java bytecode, allowing it to tolerate benign divergences --- differences in a program's execution caused by functionally equivalent operations introduced by the garbage collector, just in time compiler, and class loading. JMVX supports multithreaded replay by providing a total ordering of monitor/lock operations via a vector clock. JMVX will serve as a foundation to experiment with partial order implementations of the vector clock and specialized support for thread pools. In addition, we plan to support mutable replay by allowing the replay to interact "live" with a mock version of a resource's snapshot.
Article Search
Static-Dynamic Information Flow Control in Rust
Vincent Beardsley
(Ohio State University, USA)
Information Flow Control (IFC) provides security by preventing high-secrecy data from flowing to low-secrecy data, and low-integrity data from flowing to high-integrity data. However, prior approaches to IFC either suffer from imprecision and falsely identify security violations, or require modification of the programming language or compiler, reducing the practical utility for programmers. Recent work Cocoon demonstrates an approach to IFC that allows for precision without modifying the language or compiler, but is severely limited in its practical use by virtue of only supporting IFC for static secrecy. I have created an IFC approach called Carapace that adds support for both integrity and dynamic labels.
Article Search
Step-wise Execution of Data-Centric Systems
Chi Zhang
(Nanjing University, China)
Data-centric systems play a crucial role in computer systems, and their correctness is of paramount importance. Logic bugs are one of the main types of bugs in data-centric systems, leading to incorrect results. Existing methods for testing data-centric systems face challenges such as ineffective test oracles, insufficient coverage of target functionalities, and low efficiency in test case generation. To address these issues, we propose a general, novel black-box methodology for testing various kinds of important data-centric systems. Our key insight is that we can incrementally generate the test case and use the intermediate results to construct the reference test case. The result of the original test case should be identical to that of the reference test case. We have applied this methodology to test both Datalog engines and DBMSs, successfully uncovering 75 unique bugs. We believe that this idea might be applicable to many other systems as well, and will next explore this idea to Deep Learning libraries.
Article Search
Student Research Competition
A Parameterized Framework for the Formal Verification of Zero-Knowledge Virtual Machines
Youwei Zhong
(Shanghai Jiao Tong University, China)
Zero-knowledge virtual machines (zkVMs) enable verifiable computation on via succinct Zero-knowledge proofs (ZKPs). However, current zkVMs, still in development, show many bugs. This paper introduces a parameterized framework for the formal verification of zkVMs in Coq. We prove the soundness and completeness of the constraint generation algorithm from machine instructions to semantics-level constraints. Existing works target specific zkVMs, and require repeated proof work in this phase, whereas our proofs are parameterized and can be reused in development and by all zkVMs. We also demonstrate the generality of our framework by instantiation on two examples: Cairo VM and a simplified zkEVM.
Article Search
Algebraic Effect Handlers with Bidirectional Type-Checking
Maya Mückenschnabel
(Unaffiliated, Czechia)
In the development of type systems there are multiple paths for
achieving more expressiveness. On the one hand, algebraic effect
handlers allow us to reason about
program's side effects. On the other hand,
dependent types make it possible to more
precisely reason about program states and data and to prove general
mathematical statements.
Our work is concerned with the development of a practical Lisp-based
programming language that combines dependent types with effect
handlers. More specifically we focus on describing the type system
that has the two following features:
Article Search
Automatically Generating an Abstract Interpretation-Based Optimizer from a DSL
Ken Jin Ooi
(National University of Singapore, Singapore)
Just-in-Time (JIT) compilers can gain information at run time that are not available to Ahead-of-Time (AOT) compilers. As such, abstract interpretation baseline JIT compilers are common in many dynamic language implementations. Yet the reference implementation of Python --- CPython, has largely avoided implementing a baseline JIT compiler, likely due to the prohibitive maintenance costs associated with one. This paper implements an abstract-interpretation based optimizer for CPython bytecode that is easy to maintain and less error-prone by automatically generating the optimizer from a pre-existing Domain Specific Language (DSL) --- reusing the same DSL used to specify the interpreter. The key insight presented in this paper is that the very same DSL used to generate a concrete interpreter can also generate an abstract interpreter, providing multiple benefits such as being less error-prone and greater extensibility. The proposed abstract interpreter has been accepted into CPython 3.13 and forms a part of its experimental JIT compiler.
Article Search
A Formal Approach to the Analysis of Human-Machine Interaction with Fuzzy Logic
Leyi Cui
(Columbia University, USA)
Human uses mental models when interacting with systems. Misalignment between a mental model and the system design, known as mode confusions, can lead to automation surprises. To better handle the vagueness of mental models through formalization, Fuzzy Mental Model Finite State Machines (FMMFSMs), incorporating fuzzy logic, have been introduced. This work explores the potential of FMMFSMs in formal analyses of human-machine interactions, proposing a set of formal behavior patterns of mode confusions and a tool for identifying mode confusions and unsafe interactions.
Article Search
Automatic Local Inverse Calculation for Change of Variables
Elias Rojas Collins
(Massachusetts Institute of Technology, USA)
Inversion is a fundamental operation that arises frequently in probabilistic inference and computer graphics. For example, inversion is used to decrease variance and to enable differentiation in variational inference (e.g., reparameterization trick) and in differentiable rendering (e.g., to integrate over object boundaries). Existing approaches to inversion limit the class of functions inverted, for example, to affine functions, or require a user-specified inverse. We study when a local inverse—an inverse that is valid in a neighborhood of a point—exists. We provide an algorithm to approximate the local inverse and give the convergence rate of the solver. We present LIN, a system that automatically computes the local inverse of a function using a fixed-point solver. We implement LIN in Python and use it to automatically compute the local inverse of affine, polar, and hyperbolic changes of variables arising in image stylization.
Article Search
Kawa: An Abstract Language for Scalable and Variable Detection of Spectre Vulnerabilities
Zheyuan Wu,
Haoyi Zeng, and
Aaron Bies
(Saarland University, Germany)
Since the discovery of Spectre attacks, various detection methods for speculative vulnerabilities have been developed. Sound static analyses based on symbolic execution give precise results but lack scalability, while pattern-based analyses can accommodate large code bases but may be unsound and require manually crafted patterns for each microarchitecture.
We introduce Kawa, an abstract language designed to model control and data flows, allowing efficient analysis of Spectre vulnerabilities. Kawa's abstract nature also enables interpretation as schemata to capture entire classes of concrete programs with speculative leaks.
Article Search
Design of Fractional Permissions for a Gradual Verifier
Craig Liu
(Purdue University, USA)
Static verification assures program correctness but requires heavy user annotation. Gradual verification alleviates this burden by allowing users to write partial, imprecise specifications that are checked statically where possible and dynamically when necessary. The first gradual verifier, Gradual C0, reasons about shared heap memory through a permission logic. This paper describes an extension to Gradual C0 which supports fractional permissions in its permission logic.
Article Search
Grammar Derivation Visualization in Automata Theory
Tijana Minić and
Andrés M. Garced
(Seton Hall University, USA)
In Formal Languages and Automata Theory courses, students are exposed to grammars. They are expected to learn how to design them and how to use them to derive words in the grammar’s language. Some students, however, find understanding word derivation difficult due to nondeterminism and find it difficult to determine if the grammars they develop are correct. This research project aims to develop novel visualization tools to help students understand grammar derivation and establish the correctness of their designs.
Article Search
VESC: Towards Temporal Verification of Smart Contracts
Samuel Larsen,
Kevin Johanson, and
Yuandong Cyrus Liu
(Grinnell College, USA)
Blockchain technologies are applied in diverse domains such as financial systems, supply chains, and identity management, leading to the emergence of various smart contract languages design. These contracts often involve time dependent transactions recorded immutably on the blockchain, making their correctness crucial. This paper addresses the formal verification of temporal behaviors in smart contracts without human interaction. We study 9 recent smart contract languages used in 7 leading blockchains and model 27 common temporal patterns from 3148 benchmarks across 9 domain specific application categories. We introduce VESC, a temporal specification language that allows developers to specify temporal properties in structured natural language, which VESC compiles into formal linear temporal logic. Our experiments demonstrate that VESC effectively specifies common temporal behaviors, paving the way for automated temporal verification of smart contracts.
Article Search
Understanding Program Visualizations in the Wild
Joel Enrique Castro Hernandez and
Olohi Goodness John
(University of California at Berkeley, USA; Smith College, USA)
Visualizations play a significant role in writing, debugging, profiling, and generally understanding programs. However, little work has been done to understand the structure of program visualizations at a fundamental level— namely, why and how they scale. Previous work on visualization comprehension focuses narrowly on a few domains or on surface-level details. While this work makes it possible to point out how different features of two vastly different visualizations may be independently effective, it does not provide an underlying theory that explains both. We develop a theory of the interpretability of program visualizations focused on their abstraction and composition properties. We summarize the preliminary results from our grounded theory study of open-source software visualizations, comprising more than 150 examples. Finally, we hint at work-in-progress to experimentally confirm our findings.
Article Search
Posters
Meerkat: Distributed Reactive Live Semantics with Causal Consistency
Heng Zhong and
Anrui Liu
(Fudan University, China; Carnegie Mellon University, USA)
We introduce Meerkat, a novel, type-safe reactive programming language supporting concurrent live code updates in distributed scenarios. Meerkat seamlessly integrates the development of both server-side and client-side components within a unified framework. The language ensures causal consistency during data updates through the Historiographer algorithm. The soundness of concurrent code updates is maintained by a type-checking process protected by our locking protocol.
Article Search
Ordering Rejectable Stacks in SGLR Parsing
Jeff Smits and
Daniel A. A. Pelsmaeker
(Delft University of Technology, Netherlands)
Scannerless Generalised LR can parse context-free grammars extended with conjunction-with-negation, represented by reject rules. To handle reject rules correctly, Visser defines a specific ordering between states that can be reduced during parsing: first reduce non-rejectable states, then rejectable states. However, he remarks that in specific situations an ordering between rejectable states is required for correct parsing results, for example when encoding intersections with reject rules. The question of how to order rejectable states was left open. In this work, we present a solution that orders stacks topped by rejectable states.
Article Search
proc time: 6.13