Proceedings of the ACM on Programming Languages, Volume 9, Number PLDI,
June 16–20, 2025,
Seoul, Republic of Korea
Frontmatter
Papers
Verifying General-Purpose RCU for Reclamation in Relaxed Memory Separation Logic
Jaehwang Jung,
Sunho Park,
Janggun Lee,
Jeho Yeon, and
Jeehoon Kang
(KAIST, Republic of Korea)
Read-Copy-Update (RCU) is a critical synchronization mechanism for concurrent data structures, enabling efficient deferred memory reclamation. However, implementing and using RCU correctly is challenging due to its inherent concurrency complexities. While prior work has verified RCU, it either relied on the unrealistic assumption of a sequentially consistent (SC) memory model or lacked three key features of general-purpose RCU libraries: modular specification, switchable critical sections, and concurrent writer support.
We present the first formal verification of a general-purpose RCU in realistic relaxed memory consistency (RMC), addressing the challenges posed by these features. To achieve modular specification that encompasses relaxed behaviors, we extend existing SC specifications to account for explicit synchronization. To support switchable critical sections, which require read-after-write (RAW) synchronization, we introduce a reasoning principle for RAW-synchronizing SC fences. Using this principle, we also present the first formal verification of Peterson’s mutex in RMC. To support concurrent writers performing partially ordered writes, we avoid assuming a total order of links and instead formulate invariants based on per-node incoming link histories. Our proofs are mechanized in the iRC11 relaxed memory separation logic, built upon Iris, in Rocq.
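To make the read-after-write (RAW) pattern behind Peterson's mutex concrete, here is a minimal Python sketch (an illustration only, not the paper's iRC11 development): each thread first writes its own flag and then reads the other thread's flag, and under relaxed memory an SC fence is needed between that write and read. CPython's interpreter lock makes the example behave sequentially consistently, so the fence is indicated only as a comment.

```python
# Peterson's mutex for two threads. The RAW pattern: write own flag, then read
# the other's flag. Under relaxed memory that read may be reordered before the
# write unless an SC fence sits between them; the comment marks where it goes.
# CPython's GIL gives sequentially consistent behavior here, so this is only
# an illustration of the pattern, not a relaxed-memory experiment.
import threading

flag = [False, False]   # flag[i]: thread i wants to enter
turn = 0                # tie-breaker
counter = 0             # shared state protected by the lock

def lock(i):
    global turn
    other = 1 - i
    flag[i] = True       # write own flag
    turn = other         # yield priority to the other thread
    # --- SC fence needed here under relaxed memory (RAW synchronization) ---
    while flag[other] and turn == other:   # read the other thread's flag
        pass             # spin

def unlock(i):
    flag[i] = False

def worker(i, iters=200):
    global counter
    for _ in range(iters):
        lock(i)
        counter += 1     # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)           # expected: 400 (mutual exclusion holds)
```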
Article Search
Artifacts Available
Leveraging Immutability to Validate Hazard Pointers for Optimistic Traversals
Janggun Lee,
Jeonghyeon Kim, and
Jeehoon Kang
(KAIST, Republic of Korea)
Hazard pointers (HP) is one of the earliest manual memory reclamation algorithms for concurrent data structures. It is widely used for its robustness: memory overhead is bounded (e.g., by the number of threads). To access a node, threads first announce the protection of each to-be-accessed node, which prevents its reclamation. After announcement, they validate the node's reachability from the root to ensure that no thread has missed the announcement and reclaimed it. Traversal-based data structures typically take a marking-based validation strategy. This strategy uses a node's mark to indicate whether the node is to be detached. Unmarked nodes are considered safe to traverse, as both the node and its successors are still reachable, while marked nodes are considered unsafe. However, this strategy is inapplicable to the efficient optimistic traversal strategy that skips over marked nodes.
We propose a new validation strategy for HP that supports lock-free data structures with optimistic traversal, such as lists, trees, and skip lists. The key idea is to exploit the immutability of marked nodes, and validate their reachability at once by checking the reachability of the most recent unmarked node. To ensure correctness, we prove the safety of Harris's list protected with the new strategy in Rocq using the Iris separation logic framework. We show that the new strategy's performance is competitive with state-of-the-art reclamation algorithms when applied to data structures with optimistic traversal, while remaining simple and robust.
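The following single-threaded Python caricature (names such as protect and still_reachable are hypothetical, and all concurrency is elided) sketches the validation idea: marked nodes are skipped during traversal, and a whole run of them is validated at once by re-checking only the most recent unmarked node.

```python
# A single-threaded caricature of the validation idea: instead of rejecting
# marked nodes outright, a run of (immutable) marked nodes is validated by
# re-checking only the most recent *unmarked* node that was protected. The
# real algorithm is concurrent and is proved safe for Harris's list in the
# paper; this sketch only shows the control flow of the traversal.
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.marked = False   # logically deleted; 'next' becomes immutable

hazards = set()               # announced (protected) nodes; reclaimers would consult this

def protect(node):
    hazards.add(node)

def still_reachable(head, target):
    # Reachability check from the root; stands in for the re-validation read.
    cur = head
    while cur is not None:
        if cur is target:
            return True
        cur = cur.next
    return False

def find(head, key):
    anchor = head             # most recent unmarked, validated node
    protect(anchor)
    cur = head.next
    while cur is not None:
        protect(cur)
        if not cur.marked:
            # Validate the skipped run of marked nodes at once: if the anchor
            # is still reachable, the immutable chain from it (and 'cur')
            # cannot have been reclaimed.
            if not still_reachable(head, anchor):
                return find(head, key)   # restart the traversal
            anchor = cur
            if cur.key >= key:
                return cur
        cur = cur.next
    return None

# Tiny demo: head -> 1 -> 2 (marked) -> 3
head = Node(float("-inf")); a, b, c = Node(1), Node(2), Node(3)
head.next, a.next, b.next, c.next = a, b, c, None
b.marked = True
print(find(head, 3).key)      # 3, reached by skipping the marked node 2
```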
Article Search
Artifacts Available
Verifying Lock-Free Traversals in Relaxed Memory Separation Logic
Sunho Park,
Jaehwang Jung,
Janggun Lee, and
Jeehoon Kang
(KAIST, Republic of Korea)
We report the first formal verification of a lock-free list, a skiplist, and a skiplist-based priority queue against a strong specification in relaxed memory consistency (RMC). RMC allows relaxed behaviors in which memory accesses may be reordered with other operations, posing two significant challenges for the verification of lock-free traversals. (1) Specification challenge: formulating a specification that is flexible enough to capture relaxed behaviors, yet simple enough to be easily understood and used. We address this challenge by proposing the per-key linearizable history specification that enforces a total order of operations for each key that respects causality, rather than a total order of all operations. (2) Verification challenge: devising verification techniques for reasoning about the reachability of edges for traversing threads, which can read stale edges due to relaxed behaviors. We address this challenge by introducing the shadowed-by relation that formalizes the notion of outdated edges. This relation enables us to establish a total order of edges, and thus of their associated operations, for each key, as required to satisfy the strong specification. All our proofs are mechanized in the iRC11 relaxed memory separation logic, built on the Iris framework in Rocq.
Article Search
Artifacts Available
RefinedProsa: Connecting Response-Time Analysis with C Verification for Interrupt-Free Schedulers
Kimaya Bedarkar,
Laila Elbeheiry,
Michael Sammler,
Lennard Gäher,
Björn Brandenburg,
Derek Dreyer, and
Deepak Garg
(MPI-SWS, Germany; ETH Zurich, Switzerland; ISTA, Austria)
There has been a recent upsurge of interest in formal, machine-checked verification of timing guarantees for C implementations of real-time system schedulers. However, prior work has only considered tick-based schedulers, which enjoy a clearly defined notion of time: the time "quantum". In this work, we present a new approach to real-time systems verification for interrupt-free schedulers, which are commonly used in deeply embedded and resource-constrained systems but which do not enjoy a natural notion of periodic time. Our approach builds on and connects two recently developed Rocq-based systems—RefinedC (for foundational C verification) and Prosa (for verified response-time analysis)—adapting the former to reason about timed traces and the latter to reason about overheads. We apply the resulting system, which we call RefinedProsa, to verify Rössl, a simple yet representative, fixed-priority, non-preemptive, interrupt-free scheduler implemented in C.
Article Search
Efficient Timestamping for Sampling-Based Race Detection
Minjian Zhang,
Daniel Wee Soong Lim,
Mosaad Al Thokair,
Umang Mathur, and
Mahesh Viswanathan
(University of Illinois at Urbana-Champaign, USA; National University of Singapore, Singapore; Saudi Aramco, Saudi Arabia)
Dynamic race detection based on the happens-before (HB) partial order has become the de facto approach to quickly identify data races in multi-threaded software. Most practical implementations for detecting these races use timestamps to infer causality between events and detect races based on these timestamps. Such an algorithm updates timestamps (stored in vector clocks) at every event in the execution and is known to induce excessive overhead. Random sampling has emerged as a promising algorithmic paradigm to offset this overhead: it offers the promise of making sound race detection scalable. In this work, we consider the task of designing an efficient sampling-based race detector with low timestamping overhead when the number of sampled events is much smaller than the total number of events in an execution. To solve this problem, we propose (1) a new notion of freshness timestamp, (2) a new data structure to store timestamps, and (3) an algorithm that combines them to reduce the cost of timestamping in sampling-based race detection. Further, we prove that our algorithm is close to optimal: the number of vector clock traversals is bounded by the number of sampled events and the number of threads, and, on any given dynamic execution, the cost of timestamping due to our algorithm is close to the amount of work any timestamping-based algorithm must perform on that execution; that is, it is instance optimal. Our evaluation on real-world benchmarks demonstrates the effectiveness of our proposed algorithm over prior timestamping algorithms that are agnostic to sampling.
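For context, the per-event cost that sampling tries to avoid comes from the standard vector-clock treatment of happens-before, sketched below in Python. The sketch shows only the textbook baseline (clocks for threads and locks, plus a last-write record per variable); the paper's freshness timestamps and timestamp store are not reproduced here.

```python
# Baseline vector-clock tracking of the happens-before (HB) partial order.
# Every thread, lock, and variable carries a vector clock, and *every* event
# updates one; this per-event overhead is what sampling-based detectors avoid.
from collections import defaultdict

NTHREADS = 2
def vc(): return [0] * NTHREADS
def join(a, b): return [max(x, y) for x, y in zip(a, b)]

C = [vc() for _ in range(NTHREADS)]     # per-thread clocks
for t in range(NTHREADS): C[t][t] = 1
L = defaultdict(vc)                     # per-lock clocks
W = defaultdict(vc)                     # last-write clock per variable

def acquire(t, m): C[t] = join(C[t], L[m])
def release(t, m): L[m] = list(C[t]); C[t][t] += 1
def write(t, x):
    # race iff the previous write is not HB-before this event
    racy = any(W[x][u] > C[t][u] for u in range(NTHREADS))
    W[x] = list(C[t]); C[t][t] += 1
    return racy

# Thread 0 writes x and releases lock m; thread 1 first writes x without m.
print(write(0, "x"))          # False: first write to x
release(0, "m")
print(write(1, "x"))          # True: unordered with thread 0's write -> race
acquire(1, "m")
print(write(1, "x"))          # False: ordered after the previous write
```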
Article Search
Artifacts Available
Relaxing Alias Analysis: Exploring the Unexplored Space
Michel Weber,
Theodoros Theodoridis, and
Zhendong Su
(ETH Zurich, Switzerland)
Alias analysis is a fundamental compiler analysis that powers numerous optimizations. While research has focused on deriving more precise alias information, assuming that more precision lets the compiler optimize better, recent work shows a negligible, or even negative, performance impact of alias information. In this work, we shift the perspective from refining to relaxing alias information, i.e., removing information, to complement existing work and challenge that assumption systematically. Our study on a state-of-the-art compiler, LLVM, running the SPEC CPU 2017 benchmark suite, shows (1) a small overall impact—removing alias analysis entirely has little impact on the final binaries, (2) few influential queries—only a small fraction, namely ∼3%, of the alias information leads to changes in the final binary, and (3) lost potential—random relaxations can reduce execution time by 21% and binary size by 39% in certain cases, suggesting that compilers could better utilize alias information. Through this work, we advocate that future research should not merely refine the general precision of alias analysis, but also explore how to find and refine the most relevant queries and how to utilize alias information more effectively.
Article Search
Artifacts Available
StacKAT: Infinite State Network Verification
Jules Jacobs,
Nate Foster,
Tobias Kappé,
Dexter Kozen,
Lily Saada,
Alexandra Silva, and
Jana Wagemaker
(Cornell University, USA; Jane Street, USA; Leiden University, Netherlands; Radboud University Nijmegen, Netherlands)
We develop StacKAT, a network verification language featuring loops, finite state variables, nondeterminism, and---most importantly---access to a stack with accompanying push and pop operations.
By viewing the variables and stack as the (parsed) headers and (to-be-parsed) contents of a network packet, StacKAT can express a wide range of network behaviors including parsing, source routing, and telemetry. These behaviors are difficult or impossible to model using existing languages like NetKAT.
We develop a decision procedure for StacKAT program equivalence, based on finite automata.
This decision procedure provides the theoretical basis for verifying network-wide properties and is able to provide counterexamples for inequivalent programs.
Finally, we provide an axiomatization of StacKAT equivalence and establish its completeness.
Article Search
Partial Evaluation, Whole-Program Compilation
Chris Fallin and
Maxwell Bernstein
(F5, USA; Recurse Center, USA)
There is a tension in dynamic language runtime design between speed and correctness. State-of-the-art JIT compilation, the result of enormous industrial investment and significant research, achieves heroic speedups at the cost of complexity. This complexity leads to subtle and sometimes catastrophic correctness bugs. Much of this complexity comes from the existence of multiple tiers and the need to maintain correspondence between these separate definitions of the language’s semantics; it also comes from the indirect nature of the semantics implicitly encoded in a compiler backend. One way to address this complexity is to automatically derive, as much as possible, the compiled code from a single source-of-truth, such as the interpreter tier. In this work, we introduce a partial evaluator that can compile a whole guest-language function ahead-of-time, without tracing or profiling, “for free.” This transform unrolls an interpreter function expressed in a standard compiler intermediate representation (static single assignment or SSA) and uses partial evaluation of the interpreter function and its regular control flow to drive the guest-language compilation. The effect of this is that the transform is applicable to almost unmodified existing interpreters in systems languages such as C or C++, producing ahead-of-time guest-language compilers. We show the effectiveness of this new tool by applying it to the interpreter tier of an existing industrial JavaScript engine, SpiderMonkey, yielding 2.17× speedups, and the PUC-Rio Lua interpreter, yielding 1.84× speedups. Finally, we outline an approach to carry this work further, deriving more of the capabilities of a JIT backend from first principles while retaining correctness.
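The transform is an instance of the classic idea of specializing an interpreter to a fixed guest program (the first Futamura projection). The Python toy below, with a hypothetical stack bytecode, shows the intent: what the paper derives automatically at the SSA level is, conceptually, the hand-specialized function at the end.

```python
# A toy illustration of the underlying idea: specializing an interpreter to
# one fixed guest program yields compiled code for that program. The paper
# does this ahead of time on real interpreters at the SSA level; the bytecode
# and the "specialized" function below are hypothetical stand-ins.
def interp(bytecode, x):
    stack, pc = [], 0
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        if op == "push":  stack.append(arg)
        elif op == "arg": stack.append(x)
        elif op == "add": b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == "mul": b, a = stack.pop(), stack.pop(); stack.append(a * b)
        pc += 1
    return stack.pop()

# Guest program: f(x) = x * x + 1
PROG = [("arg", None), ("arg", None), ("mul", None), ("push", 1), ("add", None)]

# What partial evaluation can derive by unrolling the interpreter loop for
# PROG: the dispatch on 'op' and the stack traffic disappear at compile time.
def interp_specialized_to_PROG(x):
    return x * x + 1

print(interp(PROG, 5), interp_specialized_to_PROG(5))   # 26 26
```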
Article Search
Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact
Lucian Popescu and
Nuno P. Lopes
(INESC-ID, Portugal; Instituto Superior Técnico - University of Lisbon, Portugal)
The C and C++ languages define hundreds of cases as having undefined behavior (UB). These include, for example, corner cases where different CPU architectures disagree on the semantics of an instruction and the language does not want to force a specific implementation (e.g., shift by a value larger than the bitwidth). Another class of UB involves errors that the language chooses not to detect because it would be too expensive or impractical, such as dereferencing out-of-bounds pointers.
Although there is a common belief within the compiler community that UB enables certain optimizations that would not be possible otherwise, no rigorous large-scale studies have been conducted on this subject. At the same time, there is growing interest in eliminating UB from programming languages to improve security.
In this paper, we present the first comprehensive study that examines the performance impact of exploiting UB in C and C++ applications across multiple CPU architectures. Using LLVM, a compiler known for its extensive use of UB for optimizations, we demonstrate that, for the benchmarks and UB categories that we evaluated, the end-to-end performance gains are minimal. Moreover, when performance regresses, it can often be recovered through small improvements to optimization algorithms or by using link-time optimizations.
Preprint
Task-Based Tensor Computations on Modern GPUs
Rohan Yadav,
Michael Garland,
Alex Aiken, and
Michael Bauer
(Stanford University, USA; NVIDIA, USA)
Domain-specific, fixed-function units are becoming increasingly common in modern processors. As the computational demands of applications evolve, the capabilities and programming interfaces of these fixed-function units continue to change. NVIDIA’s Hopper GPU architecture contains multiple fixed-function units per compute unit, including an asynchronous data movement unit (TMA) and an asynchronous matrix multiplication unit (Tensor Core). Efficiently utilizing these units requires a fundamentally different programming style than previous architectures; programmers must now develop warp-specialized kernels that orchestrate producer-consumer pipelines between the asynchronous units. To manage the complexity of programming these new architectures, we introduce Cypress, a task-based programming model with sequential semantics. Cypress programs are a set of designated functions called tasks that operate on tensors and are free of communication and synchronization. Cypress programs are bound to the target machine through a mapping specification that describes where tasks should run and in which memories tensors should be materialized. We present a compiler architecture that lowers Cypress programs into CUDA programs that perform competitively with expert-written codes. Cypress achieves 0.88x-1.06x the performance of cuBLAS on GEMM, and between 0.80x-0.98x the performance of the currently best-known Flash Attention implementation while eliminating all aspects of explicit data movement and asynchronous computation from application code.
Article Search
DR.FIX: Automatically Fixing Data Races at Industry Scale
Farnaz Behrang,
Zhizhou Zhang,
Georgian-Vlad Saioc,
Peng Liu, and
Milind Chabbi
(Uber Technologies, USA; Aarhus University, Denmark)
Data races are a prevalent class of concurrency bugs in shared-memory parallel programs, posing significant challenges to software reliability and reproducibility. While there is an extensive body of research on detecting data races and a wealth of practical detection tools across various programming languages, considerably less effort has been directed toward automatically fixing data races at an industrial scale. In large codebases, data races are continuously introduced and exhibit myriad patterns, making automated fixing particularly challenging.
In this paper, we tackle the problem of automatically fixing data races at an industrial scale. We present Dr.Fix, a tool that combines large language models (LLMs) with program analysis to generate fixes for data races in real-world settings, effectively addressing a broad spectrum of racy patterns in complex code contexts. Implemented for Go—the programming language widely used in modern microservice architectures where concurrency is pervasive and data races are common—Dr.Fix seamlessly integrates into existing development workflows. We detail the design of Dr.Fix and examine how individual design choices influence the quality of the fixes produced. Over the past 18 months, Dr.Fix has been integrated into developer workflows at Uber, demonstrating its practical utility. During this period, Dr.Fix produced patches for 224 (55%) of a corpus of 404 data races spanning various categories; 193 of these patches (86%) were accepted by more than a hundred developers via code reviews and integrated into the codebase.
Article Search
Link-Time Optimization of Dynamic Casts in C++ Programs
Xufan Lu and
Nuno P. Lopes
(INESC-ID, Portugal; Instituto Superior Técnico - University of Lisbon, Portugal)
A core design principle of C++ is that users should only incur costs for features they actually use, both in terms of performance and code size. A notable exception to this rule is the run-time type information (RTTI) data, used for dynamic downcasts, exceptions, and run-time type introspection.
For classes that define at least one virtual method, compilers generate RTTI data that uniquely identifies the type, including a string for the type name. In large programs with complex type inheritance hierarchies, this RTTI data can grow substantially in size. Moreover, dynamic casting algorithms are linear in the type hierarchy size, causing some programs to spend considerable time on these casts.
The common workaround is to use the -fno-rtti compiler flag, which disables RTTI data generation. However, this approach has significant drawbacks, such as disabling polymorphic exceptions and dynamic casts, and requiring the flag to be applied across the entire program due to ABI changes.
In this paper, we propose a new link-time optimization to mitigate both the performance and size overhead associated with dynamic casts and RTTI data. Our optimization replaces costly library calls for downcasts with short instruction sequences and eliminates unnecessary RTTI data by modifying vtables to remove RTTI slots. Our prototype, implemented in the LLVM compiler, demonstrates an average speedup of 1.4
Preprint
Robustifying Debug Information Updates in LLVM via Control Flow Conformance Analysis
Shan Huang,
Jingjing Liang,
Ting Su, and
Qirun Zhang
(East China Normal University, China; Georgia Institute of Technology, USA)
Optimizing compilers, such as LLVM, generate *debug information* in machine code to aid debugging. This information is particularly important when debugging optimized code, as modern software is often compiled with optimization enabled. However, properly updating debug information to reflect code transformations during optimization is a complex task that often relies on manual effort. This complexity makes the process prone to errors, which can lead to incorrect or lost debug information. Finding and fixing potential debug information update errors is vital to maintaining the accuracy and reliability of the overall debugging process. To our knowledge, no existing techniques can rectify debug information update errors in LLVM. While black-box testing approaches can find such bugs, they can neither pinpoint the root causes nor suggest fixes.
To fill the gap, we propose the *first* technique to *robustify* debug information updates in LLVM. In particular, our robustification approach can find and fix incorrect debug location updates. Central to our approach is the observation that the debug locations in the original and optimized programs must satisfy a *conformance relation*. The relation ensures that LLVM optimizations do not introduce extraneous debug location information on the control-flow paths of the optimized programs. We introduce *control-flow conformance analysis*, a novel approach that determines the reference updates ensuring the conformance relation by observing the execution of LLVM optimization passes and analyzing the debug locations in the control-flow graphs of programs under optimization. The determined reference updates are then used to check developer-written updates in LLVM. When discrepancies arise, the reference updates serve as the update skeletons to guide the fixing.
We realized our approach as a tool named MetaLoc, which determines proper debug location updates for LLVM optimizations. More importantly, with MetaLoc, we have reported and patched 46 previously unknown update errors in LLVM. All the patches, along with 22 new regression tests, have been merged into the LLVM codebase, effectively improving the accuracy and reliability of debug information in all programs optimized by LLVM. Furthermore, our approach uncovered and led to corrections in two issues within LLVM’s official documentation on debug information updates.
Article Search
A Uniform Framework for Handling Position Constraints in String Solving
Yu-Fang Chen,
Vojtěch Havlena,
Michal Hečko,
Lukáš Holík, and
Ondřej Lengál
(Academia Sinica, Taiwan; Brno University of Technology, Czech Republic; Aalborg University, Denmark)
We introduce a novel decision procedure for solving the class of position string constraints, which includes string disequalities, ¬prefixof, ¬suffixof, str.at, and ¬str.at. These constraints are generated frequently in almost any application of string constraint solving. Our procedure avoids expensive encoding of the constraints to word equations and, instead, reduces the problem to checking conflicts on positions satisfying an integer constraint obtained from the Parikh image of a polynomial-sized finite automaton with a special structure. By the reduction to counting, solving position constraints becomes NP-complete and in some cases even falls into PTime. This is much cheaper than the previously used techniques, which either used reductions generating word equations and length constraints (for which modern string solvers use exponential-space algorithms) or incomplete techniques. Our method is relevant especially for automata-based string solvers, which have recently achieved the best results in terms of practical efficiency, generality, and completeness guarantees. This work allows them to excel also on position constraints, which used to be their weakness. Besides the efficiency gains, we show that our framework may be extended to solve a large fragment of ¬contains (in NExpTime), for which decidability has long been open, and gives hope of solving the general problem. Our implementation of the technique within the Z3-Noodler solver significantly improves its performance on position constraints.
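As a small, hedged illustration of the position-based view (not the paper's automaton-and-Parikh-image procedure), the Python check below confirms that a negated prefix constraint is equivalent to a constraint about positions: either x is longer than y, or some position carries different letters in x and y.

```python
# Viewing string constraints through positions rather than word equations:
# not(prefixof(x, y)) holds exactly when |x| > |y| or some position i < |x|
# carries different letters in x and y. The brute-force check below confirms
# this equivalence on short strings; the paper's procedure works on automata
# and their Parikh images instead of enumerating words.
from itertools import product

def not_prefixof(x, y):
    return not y.startswith(x)

def position_view(x, y):
    return len(x) > len(y) or any(x[i] != y[i] for i in range(min(len(x), len(y))))

alphabet = "ab"
words = ["".join(w) for n in range(4) for w in product(alphabet, repeat=n)]
assert all(not_prefixof(x, y) == position_view(x, y) for x in words for y in words)
print("position formulation agrees with not(prefixof) on all words up to length 3")
```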
Preprint
Artifacts Available
MarQSim: Reconciling Determinism and Randomness in Compiler Optimization for Quantum Simulation
Xiuqi Cao,
Junyu Zhou,
Yuhao Liu,
Yunong Shi, and
Gushu Li
(University of Pennsylvania, USA; AWS Quantum Technologies, USA)
Quantum Hamiltonian simulation, fundamental in quantum algorithm design, extends far beyond its foundational roots, powering diverse quantum computing applications. However, optimizing the compilation of quantum Hamiltonian simulation poses significant challenges. Existing approaches fall short in reconciling deterministic and randomized compilation, lack appropriate intermediate representations, and struggle to guarantee correctness. Addressing these challenges, we present MarQSim, a novel compilation framework. MarQSim leverages a Markov chain-based approach, encapsulated in the Hamiltonian Term Transition Graph, adeptly reconciling deterministic and randomized compilation benefits. Furthermore, we formulate a Minimum-Cost Flow model that can tune transition matrices to enforce correctness while accommodating various optimization objectives. Experimental results demonstrate MarQSim's superiority in generating more efficient quantum circuits for simulating various quantum Hamiltonians while maintaining precision.
Article Search
Type-Aware Constraining for LLM Code Generation
Niels Mündler,
Jingxuan He,
Hao Wang,
Koushik Sen,
Dawn Song, and
Martin Vechev
(ETH Zurich, Switzerland; University of California at Berkeley, USA)
Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although constrained decoding is a promising approach to alleviate this issue, it has only been applied to handle either domain-specific languages or syntactic features of general-purpose programming languages. However, LLMs frequently generate code with typing errors, which are beyond the domain of syntax and generally hard to adequately constrain. To address this challenge, we introduce a type-constrained decoding approach that leverages type systems to guide code generation. We develop novel prefix automata for this purpose and introduce a sound approach to enforce well-typedness based on type inference and a search over inhabitable types. We formalize our approach on a foundational simply-typed language and extend it to TypeScript to demonstrate practicality. Our evaluation on the HumanEval and MBPP datasets shows that our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks across LLMs of various sizes and model families, including state-of-the-art open-weight models with more than 30B parameters. The results demonstrate the generality and effectiveness of our approach in constraining LLM code generation with formal rules of type systems.
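The shape of such a decoder can be sketched in a few lines of Python: candidate tokens are filtered by a prefix-validity check before one is committed. In the sketch below, a toy bracket-balance checker and a hard-coded token ranking stand in for the paper's type-based prefix automata and for an LLM's scores, respectively; both are hypothetical placeholders.

```python
# A generic sketch of constrained decoding: at each step, candidate tokens are
# filtered by a prefix-validity check before one is chosen. A toy checker
# ("never close more brackets than were opened") stands in for the paper's
# type-based prefix automata, and 'rank_tokens' stands in for an LLM's ranking.
TOKENS = ["(", ")", "x", "+", " "]

def prefix_ok(code):
    depth = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a prefix that can never become well formed
                return False
    return True

def rank_tokens(prefix):
    # Placeholder for model scores: prefer ')' first, then 'x', then the rest.
    return sorted(TOKENS, key=lambda t: (t != ")", t != "x"))

def constrained_decode(prefix, steps):
    for _ in range(steps):
        for tok in rank_tokens(prefix):            # best-ranked first
            if prefix_ok(prefix + tok):            # keep only valid prefixes
                prefix += tok
                break
    return prefix

print(constrained_decode("(", 3))   # prints "()xx": invalid ')' continuations were masked
```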
Preprint
Artifacts Available
Optimization-Directed Compiler Fuzzing for Continuous Translation Validation
Jaeseong Kwon,
Bongjun Jang,
Juneyoung Lee, and
Kihong Heo
(KAIST, Republic of Korea; AWS, USA)
Incorrect compiler optimizations can lead to unintended program behavior and security vulnerabilities. However, the enormous size and complexity of modern compilers make it challenging to ensure the correctness of optimizations. The problem becomes more severe as compiler engineers continuously add new optimizations to improve performance and support new language features. In this paper, we propose Optimuzz, a framework to effectively detect incorrect optimization bugs in such continuously changing compilers. The key idea is to combine two complementary techniques: directed grey-box fuzzing and translation validation. We design a novel optimization-directed fuzzing framework that efficiently generates input programs to trigger specific compiler optimizations. Optimuzz then uses existing translation validation tools to verify the correctness of the optimizations on the input programs. We instantiate our approach for two major compilers, LLVM and TurboFan. The results show that Optimuzz can effectively detect miscompilation bugs in these compilers compared to state-of-the-art tools. We also applied Optimuzz to the latest version of LLVM and discovered 55 new miscompilation bugs.
Article Search
Info
Artifacts Available
Solving Floating-Point Constraints with Continuous Optimization
Qian Chen,
Chenqi Cui,
Fengjuan Gao,
Yu Wang,
Ke Wang, and
Linzhang Wang
(Nanjing University, China; Nanjing University of Science and Technology, China; Visa Research, USA)
The Satisfiability Modulo Theories (SMT) problem over floating-point operations presents a significant challenge. State-of-the-art SMT solvers often run into difficulties when dealing with large, complex floating-point constraints. Recently, a new line of approaches to floating-point constraint solving has emerged that uses mathematical optimization (MO) methods as its solving engine. Despite their novelty, these methods can fall short in both effectiveness and efficiency due to issues with the translated functions (e.g., discontinuity) and inherent limitations of the underlying MO methods (e.g., an imprecise search process, scalability issues).
Driven by these weaknesses of prior solvers, this paper introduces a new MO-based approach that proves highly effective at solving floating-point constraints. Specifically, on the benchmarks of JFS (a recent solver based on fuzzing), Grater, a realization of our approach, solves as many constraints as Bitwuzla and one more than CVC5, while running over 10 times faster than Bitwuzla and over 40 times faster than CVC5 in median solving time across all benchmarks. It is worth mentioning that Bitwuzla and CVC5 are the strongest solvers for floating-point constraints according to the results of the annual international SMT solver competition (SMT-COMP): together, they have won all gold medals in the QF_FPArith and FPArith divisions, which focus on floating-point constraint solving, over the past three years. To further evaluate Grater, we select over 100 of the most difficult benchmarks from the SMT-LIB FP logic, which is regularly used in SMT-COMP. Difficulty is measured by the complexity of the composition (e.g., number of variables and clauses) and the interdependencies within constraints. Grater again solves the same number of constraints as Bitwuzla and CVC5 while running over 10 times faster than both solvers in average solving time, and over 50 times (resp. 30 times) faster than Bitwuzla (resp. CVC5) in median solving time.
We release the source code of Grater, along with all evaluation data, including detailed comparisons of Grater against each baseline solver (i.e., Z3, CVC5, Bitwuzla, JFS, XSat, and CoverMe), at https://github.com/grater-exp/grater-experiment to facilitate reproducibility.
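The general MO recipe used by this line of solvers can be illustrated as follows (a sketch only, not Grater's algorithm): encode the constraint as a non-negative residual that vanishes exactly on satisfying assignments, then drive the residual toward zero with a numerical search.

```python
# A hedged sketch of the MO idea behind this line of floating-point solvers:
# the constraint, the residual, and the crude random-restart hill climb below
# are illustrative only and are not Grater's algorithm.
import random

# Constraint over doubles: x*x + y*y == 4.0  and  x >= 1.0
def residual(x, y):
    r1 = abs(x * x + y * y - 4.0)      # zero iff the equality holds
    r2 = max(0.0, 1.0 - x)             # zero iff the inequality holds
    return r1 + r2

def solve(seed=0, restarts=50, steps=2000):
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        x, y = rng.uniform(-4, 4), rng.uniform(-4, 4)
        step = 1.0
        for _ in range(steps):
            cand = (x + rng.uniform(-step, step), y + rng.uniform(-step, step))
            if residual(*cand) < residual(x, y):
                x, y = cand
            else:
                step *= 0.999          # slowly shrink the search radius
        if best is None or residual(x, y) < residual(*best):
            best = (x, y)
        if residual(*best) == 0.0:
            break
    return best

x, y = solve()
print(x, y, residual(x, y))            # residual at (or very near) 0 => a model
```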
Article Search
Reductive Analysis with Compiler-Guided Large Language Models for Input-Centric Code Optimizations
Xiangwei Wang,
Xinning Hui,
Chunhua Liao, and
Xipeng Shen
(North Carolina State University, USA; Lawrence Livermore National Laboratory, USA)
Input-centric program optimization aims to optimize code by considering the relations between program inputs and program behaviors. Despite its promise, a long-standing barrier to its adoption is the difficulty of automatically identifying critical features of complex inputs. This paper introduces a novel technique, reductive analysis through compiler-guided Large Language Models (LLMs), to solve the problem through a synergy between compilers and LLMs. It uses a reductive approach to overcome the scalability and other limitations of LLMs in program code analysis. The solution, for the first time, automates the identification of critical input features without heavy instrumentation or profiling, cutting the time needed for input identification by 44× (or 450× for local LLMs), from 9.6 hours to 13 minutes (with remote LLMs) or 77 seconds (with local LLMs) on average, and making it practical to integrate input characterization into the program compilation workflow. Optimizations based on the identified input features show similar or even better results than those based on features identified by previous profiling-based methods, yielding 92.6% accuracy in selecting appropriate adaptive OpenMP parallelization decisions and 20–30% performance improvement in serverless computing while reducing resource usage by 50–60%.
Article Search
Quantum Register Machine: Efficient Implementation of Quantum Recursive Programs
Zhicheng Zhang and
Mingsheng Ying
(University of Technology Sydney, Australia)
Quantum recursive programming has been recently introduced for describing sophisticated and complicated quantum algorithms in a compact and elegant way. However, implementation of quantum recursion involves intricate interplay between quantum control flow and recursive procedure calls. In this paper, we aim at resolving this fundamental challenge and develop a series of techniques to efficiently implement quantum recursive programs. Our main contributions include:
1. We propose a notion of quantum register machine, the first quantum architecture (including an instruction set) that provides instruction-level support for quantum control flow and recursive procedure calls at the same time.
2. Based on quantum register machine, we describe the first comprehensive implementation process of quantum recursive programs, including the compilation, the partial evaluation of quantum control flow, and the execution on the quantum register machine.
3. As a bonus, our efficient implementation of quantum recursive programs also offers automatic parallelisation of quantum algorithms. For certain quantum algorithmic subroutines, like the widely used quantum multiplexor, we can even obtain an exponential parallel speed-up (over the straightforward implementation) from this automatic parallelisation. This demonstrates that quantum recursive programming can be a win-win for both the modularity of programs and the efficiency of their implementation.
Preprint
Destabilizing Iris
Simon Spies,
Niklas Mück,
Haoyi Zeng,
Michael Sammler,
Andrea Lattuada,
Peter Müller, and
Derek Dreyer
(MPI-SWS, Germany; Saarland University, Germany; ETH Zurich, Switzerland; ISTA, Austria)
Article Search
Probabilistic Kleene Algebra with Angelic Nondeterminism
Shawn Ong,
Stephanie Ma, and
Dexter Kozen
(Cornell University, USA)
We introduce a version of probabilistic Kleene algebra with angelic nondeterminism and a corresponding class of automata. Our approach implements semantics via distributions over multisets in order to overcome theoretical barriers arising from the lack of a distributive law between the powerset and Giry monads. We produce a full Kleene theorem and a coalgebraic theory, as well as both operational and denotational semantics and equational reasoning principles.
Article Search
CRGC: Fault-Recovering Actor Garbage Collection in Pekko
Dan Plyukhin,
Gul Agha, and
Fabrizio Montesi
(University of Southern Denmark, Denmark; University of Illinois at Urbana-Champaign, USA)
Actors are lightweight reactive processes that communicate by asynchronous message-passing. Actors address common problems like concurrency control and fault tolerance, but resource management remains challenging: in all four of the most popular actor frameworks (Pekko, Akka, Erlang, and Elixir) programmers must explicitly kill actors to free up resources. To simplify resource management, researchers have devised actor garbage collectors (actor GCs) that monitor the application and detect when actors are safe to kill. However, existing actor GCs are impractical for distributed systems where the network is unreliable and nodes can fail. The simplest actor GCs do not collect cyclic garbage, whereas more sophisticated actor GCs are not fault-recovering: dropped messages and crashed nodes can cause actors to become garbage that never gets collected.
We present Conflict-free Replicated Garbage Collection (CRGC): the first fault-recovering cyclic actor GC. In CRGC, actors and nodes record information locally and broadcast updates to the garbage collectors running on each node. CRGC does not require locks, explicit memory barriers, or any assumptions about message delivery order, except for reliable FIFO channels from actors to their local garbage collector. Moreover, CRGC is simple: we concisely present its operational semantics, which has been formalized in TLA+, and prove both soundness (non-garbage actors are never killed) and completeness (all garbage actors are eventually killed, under reasonable assumptions). We also present a preliminary implementation in Apache Pekko and measure its performance using two actor benchmark suites. Our results show the performance overhead of CRGC is competitive with simpler approaches like weighted reference counting, while also being much more powerful.
Article Search
Artifacts Available
A Hybrid Approach to Semi-automated Rust Verification
Sacha-Élie Ayoun,
Xavier Denis,
Petar Maksimović, and
Philippa Gardner
(Imperial College London, UK; ETH Zurich, Switzerland; Nethermind, UK)
We propose a hybrid approach to end-to-end Rust verification where the proof effort is split into powerful automated verification of safe Rust and targeted semi-automated verification of unsafe Rust.
To this end, we present Gillian-Rust, a proof-of-concept semi-automated verification tool built on top of the Gillian platform that can reason about type safety and functional correctness of unsafe code. Gillian-Rust automates a rich separation logic for real-world Rust, embedding the lifetime logic of RustBelt and the parametric prophecies of RustHornBelt, and is able to verify real-world Rust standard library code with only minor annotations and with verification times orders of magnitude faster than those of comparable tools.
We link Gillian-Rust with Creusot, a state-of-the-art verifier for safe Rust, by providing a systematic encoding of unsafe code specifications that Creusot can use but cannot verify, demonstrating the feasibility of our hybrid approach.
Article Search
Artifacts Available
QVM: Quantum Gate Virtualization Machine
Nathaniel Tornow,
Emmanouil Giortamis, and
Pramod Bhatotia
(TU Munich, Germany; LRZ, Germany)
We present the Quantum Gate Virtualization Machine (QVM), an end-to-end generic system for scalable execution of large quantum circuits with high fidelity on noisy and small quantum processors (QPUs) by leveraging gate virtualization. QVM exposes a virtual circuit intermediate representation (IR) that extends the notion of quantum circuits to incorporate gate virtualization. Based on the virtual circuit as our IR, we propose the QVM compiler—an extensible compiler infrastructure to transpile a virtual circuit through a series of modular optimization passes to produce a set of optimized circuit fragments. Lastly, these transpiled circuit fragments are executed on QPUs using our QVM runtime—a scalable and parallel infrastructure to virtualize and execute circuit fragments on a set of QPUs.
We evaluate QVM on IBM’s 7- and 27-qubit QPUs. Our evaluation shows that our approach allows for the execution of circuits with up to double the number of qubits compared to the qubit count of a QPU, while improving fidelity by 4.7× on average compared to larger QPUs, and that we can effectively reduce circuit depths to only 40% of the original circuit depths.
Article Search
Artifacts Available
Tree Borrows
Neven Villani,
Johannes Hostert,
Derek Dreyer, and
Ralf Jung
(University of Grenoble Alpes - VERIMAG, France; ETH Zurich, Switzerland; MPI-SWS, Germany)
The Rust programming language is well known for its ownership-based type system, which offers strong guarantees like memory safety and data race freedom. However, Rust also provides unsafe escape hatches, for which safety is not guaranteed automatically and must instead be manually upheld by the programmer. This creates a tension. On the one hand, compilers would like to exploit the strong guarantees of the type system—particularly those pertaining to aliasing of pointers—in order to unlock powerful intraprocedural optimizations. On the other hand, those optimizations are easily invalidated by “badly behaved” unsafe code. To ensure correctness of such optimizations, it thus becomes necessary to clearly define what unsafe code is “badly behaved.” In prior work, Stacked Borrows defined a set of rules achieving this goal. However, Stacked Borrows rules out several patterns that turn out to be common in real-world unsafe Rust code, and it does not account for advanced features of the Rust borrow checker that were introduced more recently.
To resolve these issues, we present Tree Borrows. As the name suggests, Tree Borrows is defined by replacing the stack at the heart of Stacked Borrows with a tree. This overcomes the aforementioned limitations: our evaluation on the 30 000 most widely used Rust crates shows that Tree Borrows rejects 54
Article Search
Artifacts Available
Verified Foundations for Differential Privacy
Markus de Medeiros,
Muhammad Naveed,
Tancrède Lepoint,
Temesghen Kahsai,
Tristan Ravitch,
Stefan Zetzsche,
Anjali Joshi,
Joseph Tassarotti,
Aws Albarghouthi, and
Jean-Baptiste Tristan
(New York University, USA; Amazon, UK; Amazon, USA)
Article Search
Efficient, Portable, Census-Polymorphic Choreographic Programming
Mako Bates,
Shun Kashiwa,
Syed Jafri,
Gan Shen,
Lindsey Kuper, and
Joseph P. Near
(University of Vermont, USA; University of California at San Diego, USA; University of California at Santa Cruz, USA)
Choreographic programming (CP) is a paradigm for implementing distributed systems that uses a single global program to define the actions and interactions of all participants. Library-level CP implementations, like HasChor, integrate well with mainstream programming languages but have several limitations: Their conditionals require extra communication; they require specific host-language features (e.g., monads); and they lack support for programming patterns that are essential for implementing realistic distributed applications.
We make three contributions to library-level CP to specifically address these challenges. First, we propose and formalize conclaves and multiply-located values, which enable efficient conditionals in library-level CP without redundant communication. Second, we propose census polymorphism, a technique for abstracting over the number of participants in a choreography. Third, we introduce a design pattern for library-level CP in host languages without support for monads. We demonstrate these contributions via implementations in Haskell, Rust, and TypeScript.
Article Search
Artifacts Available
Taking out the Toxic Trash: Recovering Precision in Mixed Flow-Sensitive Static Analyses
Fabian Stemmler,
Michael Schwarz,
Julian Erhard,
Sarah Tilscher, and
Helmut Seidl
(TU Munich, Germany; LMU Munich, Germany)
Static analysis of real-world programs combines flow- and context-sensitive analyses of local program states with the computation of flow- and context-insensitive invariants at globals that, e.g., abstract data shared by multiple threads. The values of locals and globals may mutually depend on each other, with the analysis of local program states both making contributions to globals and querying their values. Usually, all contributions to globals are accumulated during fixpoint iteration, with widening applied to enforce termination. Such flow-insensitive information often becomes unnecessarily imprecise and can include superfluous contributions — trash — which, in turn, may be toxic to the precision of the overall analysis. To recover precision of globals, we propose techniques that complement each other: narrowing on globals differentiates contributions by origin; reluctant widening limits the amount of widening applied at globals; and, finally, abstract garbage collection undoes contributions to globals and propagates their withdrawal. The experimental evaluation shows that these techniques increase the precision of mixed flow-sensitive analyses at a reasonable cost.
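For readers unfamiliar with the terminology, the toy Python interval analysis below shows the local version of the widening/narrowing interplay that the abstract builds on: widening guarantees termination of the fixpoint iteration but overshoots to an unbounded interval, and a single narrowing step recovers the loop bound. The paper applies related ideas (plus abstract garbage collection) to flow-insensitive invariants on globals, which this sketch does not model.

```python
# Textbook interval example: analyze the loop head of
#   i = 0; while i <= 99: i = i + 1
# Widening enforces termination but loses the upper bound; one narrowing
# (decreasing) step recovers it.
NEG_INF, POS_INF = float("-inf"), float("inf")
BOT = None                                 # bottom: unreachable / no value yet

def join(a, b):
    if a is BOT: return b
    if b is BOT: return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet_upper(a, hi):                     # a ∩ (-inf, hi]
    if a is BOT or a[0] > hi: return BOT
    return (a[0], min(a[1], hi))

def widen(old, new):
    if old is BOT: return new
    return (old[0] if new[0] >= old[0] else NEG_INF,
            old[1] if new[1] <= old[1] else POS_INF)

def F(x):
    # abstract transformer of the loop head: i = 0  or  (i <= 99; i = i + 1)
    body = meet_upper(x, 99)
    inc = BOT if body is BOT else (body[0] + 1, body[1] + 1)
    return join((0, 0), inc)

x = BOT
while True:                                # ascending phase with widening
    nxt = widen(x, F(x))
    if nxt == x: break
    x = nxt
print("after widening:", x)                # (0, inf): the loop bound is lost
x = F(x)                                   # one narrowing step
print("after narrowing:", x)               # (0, 100): precision recovered
```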
Article Search
Artifacts Available
Relational Abstractions Based on Labeled Union-Find
Dorian Lesbre,
Matthieu Lemerre,
Hichem Rami Ait-El-Hara, and
François Bobot
(Université Paris-Saclay - CEA LIST, France; OCamlPro, France)
We introduce a new family of abstractions based on a data structure that we call labeled union-find, an extension of the classic efficient union-find data structure where edges carry labels. These labels have a composition operation that obeys the group axioms. Like union-find, the labeled version can efficiently compute the transitive closure of a relation, but it is not limited to equivalence relations; it can represent any injective transformation between equivalence classes, which includes two-variables-per-equality (TVPE) constraints of the form y = a × x + b. Using abstract interpretation theory, we study the properties deriving from the use of abstract relations as labels, and the combination of labeled union-find with other representations of constraints, allowing both improvements in precision and simplification of existing constraints. Due to its efficiency, the labeled union-find abstraction could find many uses; we apply it in two use cases, program analysis based on abstract interpretation and constraint solving for SMT, with encouraging preliminary results.
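A minimal sketch of the data structure, assuming the simplest useful label group (integer offsets under addition, i.e., the a = 1 case of the TVPE constraints y = x + b): find composes labels along the path to the representative, and union either merges two classes or reports a contradiction.

```python
# Labeled union-find with offset labels under the group (Z, +): find() returns
# a representative together with the accumulated label, so the structure
# decides constraints of the form y = x + b. Labels are composed along paths
# and during path compression. This is a minimal sketch; the paper's structure
# handles general a*x + b and abstract relations as labels.
parent, offset = {}, {}          # offset[x]: value of x minus value of parent[x]

def find(x):
    parent.setdefault(x, x); offset.setdefault(x, 0)
    if parent[x] == x:
        return x, 0
    root, off = find(parent[x])
    parent[x], offset[x] = root, offset[x] + off      # path compression
    return root, offset[x]

def union(x, y, b):
    """Record the constraint  y = x + b;  return False on contradiction."""
    rx, ox = find(x)             # value(x) = value(rx) + ox
    ry, oy = find(y)             # value(y) = value(ry) + oy
    if rx == ry:
        return oy - ox == b      # consistent only if the implied offset matches
    parent[ry] = rx              # attach ry below rx with the derived label
    offset[ry] = ox + b - oy
    return True

print(union("x", "y", 3))        # y = x + 3        -> True
print(union("y", "z", 4))        # z = y + 4        -> True
print(union("x", "z", 7))        # z = x + 7 holds  -> True (consistent)
print(union("x", "z", 8))        # contradicts      -> False
```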
Article Search
Artifacts Available
Functional Meaning for Parallel Streaming
Nick Rioux and
Steve Zdancewic
(University of Pennsylvania, USA)
Nondeterminism introduced by race conditions and message reorderings makes parallel and distributed programming hard. Nevertheless, promising approaches such as LVars and CRDTs address this problem by introducing a partial order structure on shared state that describes how the state evolves over time. Monotone programs that respect the order are deterministic. Datalog-inspired languages incorporate this idea of monotonicity in a first-class way but they are not general-purpose. We would like parallel and distributed languages to be as natural to use as any functional language, without sacrificing expressivity, and with a formal basis of study as appealing as the lambda calculus.
This paper presents λ∨, a core language for deterministic parallelism that embodies the ideas above. In λ∨, values may increase over time according to a streaming order and all computations are monotone with respect to that order. The streaming order coincides with the approximation order found in Scott semantics and so unifies the foundations of functional programming with the foundations of deterministic distributed computation. The resulting lambda calculus has a computationally adequate model rooted in domain theory. It integrates the compositionality and power of abstraction characteristic of functional programming with the declarative nature of Datalog.
Preprint
Membership Testing for Semantic Regular Expressions
Yifei Huang,
Matin Amini,
Alexis Le Glaunec,
Konstantinos Mamouras, and
Mukund Raghothaman
(University of Southern California, USA; Rice University, USA)
This paper is about semantic regular expressions (SemREs), a concept recently proposed with Smore (Chen et al. 2023), in which classical regular expressions are extended with a primitive to query external oracles such as databases and large language models (LLMs). SemREs can be used to identify lines of text containing references to semantic concepts such as cities, celebrities, political entities, etc. The focus of that work was on automatically synthesizing semantic regular expressions from positive and negative examples.
In this paper, we study the membership testing problem.
First, we present a two-pass NFA-based algorithm to determine whether a string w matches a SemRE r in O(|r|²|w|² + |r||w|³) time, assuming the oracle responds to each query in unit time. In common situations, where oracle queries are not nested, we show that this procedure runs in O(|r|²|w|²) time. Experiments with a prototype implementation of this algorithm validate our theoretical analysis, and show that the procedure massively outperforms a dynamic programming-based baseline, and incurs a ≈2× overhead over the time needed for interaction with the oracle.
Second, we establish connections between SemRE membership testing and the triangle finding problem from graph theory, which suggest that developing algorithms which are simultaneously practical and asymptotically faster might be challenging. Furthermore, algorithms for classical regular expressions primarily aim to optimize their time and memory consumption. In contrast, an important consideration in our setting is to minimize the cost of invoking the oracle. We demonstrate an Ω(|w|²) lower bound on the number of oracle queries necessary to make this determination.
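To make the role of the oracle concrete, the toy Python matcher below (with a hypothetical pattern syntax and a set-based stand-in for an LLM or database oracle) decides membership for a concatenation-only SemRE by enumerating split points, issuing one oracle query per candidate substring; the paper's NFA-based algorithm and its query lower bound concern exactly this oracle cost.

```python
# A toy illustration of SemRE membership with an oracle: a semantic atom like
# [city] is decided by an external oracle on substrings, and even this naive
# matcher for a concatenation-only pattern already issues one oracle query per
# candidate substring. The pattern syntax and 'city_oracle' are hypothetical
# stand-ins; the paper's algorithm is NFA-based and far more general.
ORACLE_CALLS = 0

def city_oracle(s):
    global ORACLE_CALLS
    ORACLE_CALLS += 1
    return s in {"Seoul", "Paris"}       # stand-in for an LLM or database query

# pattern: a list of atoms, each either ("lit", string) or ("sem", oracle)
PATTERN = [("lit", "flights to "), ("sem", city_oracle), ("lit", " today")]

def matches(pattern, w, start=0):
    if not pattern:
        return start == len(w)
    kind, atom = pattern[0]
    if kind == "lit":
        return w.startswith(atom, start) and matches(pattern[1:], w, start + len(atom))
    # semantic atom: try every possible end point and ask the oracle
    return any(atom(w[start:end]) and matches(pattern[1:], w, end)
               for end in range(start, len(w) + 1))

print(matches(PATTERN, "flights to Seoul today"))   # True
print(matches(PATTERN, "flights to Mars today"))    # False
print("oracle queries:", ORACLE_CALLS)              # number of substring queries issued
```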
Preprint
Artifacts Available
MISAAL: Synthesis-Based Automatic Generation of Efficient and Retargetable Semantics-Driven Optimizations
Abdul Rafae Noor,
Dhruv Baronia,
Akash Kothari,
Muchen Xu,
Charith Mendis, and
Vikram S. Adve
(University of Illinois at Urbana-Champaign, USA)
Using program synthesis to select instructions for and optimize input programs is receiving increasing attention. However, existing synthesis-based compilers face two major challenges that prohibit the deployment of program synthesis in production compilers: exorbitantly long synthesis times, spanning minutes to hours, and scalability issues that prevent synthesis of the complex modern compute and data-swizzle instructions that have been found to maximize the performance of modern tensor and stencil workloads. This paper proposes MISAAL, a synthesis-based compiler that employs a novel strategy of using formal semantics of hardware instructions to automatically prune a large search space of rewrite rules for modern complex instructions in an offline stage. MISAAL also proposes a novel methodology to make the term-rewriting process in the online stage (at compile time) extremely lightweight, enabling programs to compile in seconds. Our results show that MISAAL reduces compilation times by up to a geomean of 16x compared to the state-of-the-art synthesis-based compiler, HYDRIDE. MISAAL also delivers runtime performance competitive with Halide, a production compiler for image processing and deep learning workloads, as well as with HYDRIDE, across x86, Hexagon, and ARM.
Article Search
An Interactive Debugger for Rust Trait Errors
Gavin Gray,
Will Crichton, and
Shriram Krishnamurthi
(Brown University, USA)
Compiler diagnostics for type inference failures are notoriously bad, and type classes only make the problem worse. By introducing a complex search process during inference, type classes can lead to wholly inscrutable or useless errors. We describe a system, Argus, for interactively visualizing type class inferences to help programmers debug inference failures, applied specifically to Rust’s trait system. The core insight of Argus is to avoid the traditional model of compiler diagnostics as one-size-fits-all, instead providing the programmer with different views on the search tree corresponding to different debugging goals. Argus carefully uses defaults to improve debugging productivity, including interface design (e.g., not showing full paths of types by default) and heuristics (e.g., sorting obligations based on the expected complexity of fixing them). We evaluated Argus in a user study where N = 25 participants debugged type inference failures in realistic Rust programs, finding that participants using Argus correctly localized 2.2× as many faults and localized 3.3× faster compared to not using Argus.
Article Search
Artifacts Available
Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search
Pinhan Zhao,
Yuepeng Wang, and
Xinyu Wang
(University of Michigan, USA; Simon Fraser University, Canada)
We present a novel symbolic reasoning engine for SQL which can efficiently generate an input I for n queries P1, ⋯, Pn, such that their outputs on I satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each Pi — that is, a subset of Pi’s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
Article Search
Artifacts Available
Divergence-Aware Testing of Graphics Shader Compiler Back-Ends
Dongwei Xiao,
Shuai Wang,
Zhibo Liu,
Yiteng Peng,
Daoyuan Wu, and
Zhendong Su
(Hong Kong University of Science and Technology, China; ETH Zurich, Switzerland)
Graphics shaders are the core of modern 3D visual effects, enabling developers to create realistic, real-time rendering of 3D scenes. Shaders are specialized programs written in high-level shading languages like GLSL, and graphics shader compilers translate these high-level shader programs into low-level binaries that run on GPUs. These shader compilers are complex programs with multiple layers: front-end, middle-end, and back-end. Despite significant development efforts from industrial GPU vendors such as NVIDIA and AMD, graphics shader compilers still contain bugs that can impact downstream applications and lead to negative consequences from poor user experience in entertainment to accidents in driving assistance systems. Because they are complex and deep in the compilation pipeline, the back-ends of shader compilers are particularly challenging to test. Our empirical exploration shows that state-of-the-art testing tools for shader compilers do not specifically target the back-ends and are thus ineffective in uncovering back-end bugs.
This work fills this gap and introduces ShadeDiv, an automated testing tool specifically designed to uncover bugs in the back-ends of graphics shader compilers. To this end, ShadeDiv generates test inputs with two novel, carefully designed strategies to support the unique computational models of the back-ends, namely control and data flow divergence among GPU threads. ShadeDiv deliberately perturbs divergence patterns in both the control and data flow of shader programs to effectively trigger back-end optimizations. Our evaluation of ShadeDiv on graphics shader compilers from four mainstream GPU vendors uncovered 12 back-end bugs. Further comparison with existing shader compiler testing tools shows that ShadeDiv achieves a 25% coverage increase in the back-end components and finds four times as many back-end bugs.
Article Search
Pointer Analysis for Database-Backed Applications
Yufei Liang,
Teng Zhang,
Ganlin Li,
Tian Tan,
Chang Xu,
Chun Cao,
Xiaoxing Ma, and
Yue Li
(Nanjing University, China)
Database-backed applications form the backbone of modern software, yet their complexity poses significant challenges for static analysis. These applications involve intricate interactions among application code, diverse database frameworks such as JDBC, Hibernate, and Spring Data JPA, and languages like Java and SQL. In this paper, we introduce DBridge, the first pointer analysis specifically designed for Java database-backed applications, capable of statically constructing comprehensive Java-to-database value flows. DBridge unifies application code analysis, database access specification modeling, SQL analysis, and database abstraction within a single pointer analysis framework, capturing interactions across a wide range of database access APIs and frameworks. Additionally, we present DB-Micro, a new micro-benchmark suite with 824 test cases crafted to systematically evaluate static analysis for database-backed applications. Experiments on DB-Micro and large, complex, real-world applications demonstrate DBridge's effectiveness, achieving high recall and precision in building Java-to-database value flows efficiently and outperforming state-of-the-art tools in SQL statement identification. To further validate DBridge's utility, we develop three client analyses for security and program understanding. Evaluation on these real-world applications reveals 30 Stored XSS attack vulnerabilities and 3 horizontal broken access control vulnerabilities, all previously undiscovered and real, as well as a high detection rate in impact analysis for schema changes. By open-sourcing DBridge (14K LoC) and DB-Micro (22K LoC), we seek to help advance static analysis for modern database-backed applications in the future.
Article Search
Artifacts Available
Principal Type Inference under a Prefix: A Fresh Look at Static Overloading
Daan Leijen and
Wenjia Ye
(Microsoft Research, USA; National University of Singapore, Singapore; University of Hong Kong, China)
At the heart of the Damas-Hindley-Milner (HM) type system lies the abstraction rule which derives a function type for a lambda expression. In this rule, the type of the parameter can be "guessed", and can be any type that fits the derivation. The beauty of the HM system is that there always exists a most general type that encompasses all possible derivations – Algorithm W is used to infer these most general types in practice.
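For readers who want the rule in front of them, the standard HM abstraction rule can be written as follows (this is the textbook formulation, not the HMQ variant the paper introduces):

```latex
% Standard HM abstraction rule: the parameter type \tau is "guessed"
% and may be any type that makes the premise derivable.
\frac{\Gamma,\; x : \tau \vdash e : \sigma}
     {\Gamma \vdash \lambda x.\, e \;:\; \tau \to \sigma}
```

For example, λx. x admits a derivation with type τ → τ for every choice of τ; Algorithm W infers the principal type scheme ∀α. α → α, of which each of those derivations is an instance.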
Unfortunately, this property is also the bane of the HM type rules. Many languages extend HM typing with additional features which often require complex side conditions to the type rules to maintain principal types. For example, various type systems for impredicative type inference, like HMF, FreezeML, or Boxy types, require let-bindings to always assign most general types. Such a restriction is difficult to specify as a logical deduction rule though, as it ranges over all possible derivations. Despite these complications, the actual implementations of various type inference algorithms are usually straightforward extensions of algorithm W, and from an implementation perspective, much of the complexity of various type system extensions, like boxes or polymorphic weights, is in some sense artificial.
In this article we rephrase the HM type rules as _type inference under a prefix_, called HMQ. HMQ is sound and complete with respect to the HM type rules, but always derives principal types that correspond to the types inferred by algorithm W. The HMQ type rules are close to the clarity of the declarative HM type rules, but are also specific enough to "read off" an inference algorithm, and can form an excellent basis for describing type system extensions in practice. We show in particular how to describe the FreezeML and HMF systems in terms of inference under a prefix, and how we no longer require complex side conditions. We also show a novel formalization of static overloading in HMQ, as implemented in the Koka language.
Article Search
Robust Constant-Time Cryptography
Matthew Kolosick,
Basavesh Ammanaghatta Shivakumar,
Sunjay Cauligi,
Marco Patrignani,
Marco Vassena,
Ranjit Jhala, and
Deian Stefan
(University of California at San Diego, USA; Virginia Tech, USA; ICSI, Germany; University of Trento, Italy; Utrecht University, Netherlands)
Cryptographic library developers take care to ensure their library does not leak secrets even when there are (inevitably) exploitable vulnerabilities in the applications the library is linked against. To do so, they choose some class of application vulnerabilities to defend against and hardcode protections against those vulnerabilities in the library code. A single set of choices is a poor fit for all contexts: a chosen protection could impose unnecessary overheads in contexts where those attacks are impossible, and an ignored protection could render the library insecure in contexts where the attack is feasible.
We introduce RoboCop, a new methodology and toolchain for building secure and efficient applications from cryptographic libraries, via four contributions. First, we present an operational semantics that describes the behavior of a (cryptographic) library executing in the context of a potentially vulnerable application, so that we can precisely specify what different attackers can observe. Second, we use our semantics to define a novel security property, Robust Constant Time (RCT), that defines when a cryptographic library is secure in the context of a vulnerable application. Crucially, our definition is parameterized by an attacker model, allowing us to factor out the classes of attackers that a library may wish to secure against. This refactoring yields our third contribution: a compiler that can synthesize bespoke cryptographic libraries with security tailored to the specific application context against which the library will be linked, guaranteeing that the library is RCT in that context. Finally, we present an empirical evaluation showing that the RoboCop compiler can automatically generate code to efficiently protect a wide range (over 500) of cryptographic library primitives against three classes of attacks: read gadgets (due to application memory-safety vulnerabilities), speculative read gadgets (due to application speculative-execution vulnerabilities), and concurrent observations (due to application threads). Performance overhead is generally under 2% for protection against read gadgets and under 4% for protection against speculative read gadgets, freeing library developers from one-size-fits-all choices between security and performance.
Article Search
PulseCore: An Impredicative Concurrent Separation Logic for Dependently Typed Programs
Gabriel Ebner,
Guido Martínez,
Aseem Rastogi,
Thibault Dardinier,
Megan Frisella,
Tahina Ramananandro, and
Nikhil Swamy
(Microsoft Research, USA; Microsoft Research, India; ETH Zurich, Switzerland; University of Washington, USA)
Article Search
Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices
Yang He,
Xiaoyu Liu, and
Yuepeng Wang
(Simon Fraser University, Canada)
Image editing is a common task across a wide range of domains, from personal use to professional applications. Despite advances in computer vision, current tools still demand significant manual effort for editing tasks that require repetitive operations on images with many objects. In this paper, we present a novel approach to automating the image editing process using program synthesis. We propose a new algorithm based on lattice structures to automatically synthesize object selection predicates for image editing from positive and negative examples. By leveraging the algebraic properties of lattices, our algorithm efficiently synthesizes an optimal object selection predicate among multiple correct solutions. We have implemented our technique and evaluated it on 100 tasks over 20 images. The evaluation results demonstrate that our tool is effective and efficient, outperforming state-of-the-art synthesizers and LLM-based approaches.
Article Search
Artifacts Available
Dynamic Region Ownership for Concurrency Safety
Fridtjof Peer Stoldt,
Brandt Bucher,
Sylvan Clebsch,
Matthew A. Johnson,
Matthew J. Parkinson,
Guido van Rossum,
Eric Snow, and
Tobias Wrigstad
(Uppsala University, Sweden; Microsoft, USA; Microsoft Azure Research, UK)
The ways in which the components of a program interact with each other in a concurrent setting can be considerably more complex than in a sequential setting. The core problem is unrestricted shared mutable state. An alternative to unrestricted shared mutable state is to restrict the sharing using Ownership. Ownership can turn what would have been a race into a deterministic failure that can be explained to the programmer. However, work on Ownership has predominantly taken place in statically typed languages.
In this paper, we explore retrofitting an existing dynamically typed programming language with an ownership model based on regions. Our core aim is to provide safe concurrency; that is, the ownership model should provide deterministic dynamic failures of ownership that can be explained to the programmer. We present a dynamic model of ownership that provides ownership of groups of objects called regions. We provide dynamic enforcement of our region discipline, which we have implemented in a simple interpreter that provides a Python-like syntax and semantics, and we report on our first steps toward integrating it into an existing language, Python.
Article Search
Program Synthesis From Partial Traces
Margarida Ferreira,
Victor Nicolet,
Joey Dodds, and
Daniel Kroening
(Carnegie Mellon University, USA; INESC-ID, Portugal; Instituto Superior Técnico - University of Lisbon, Portugal; Amazon, Canada; Amazon, USA)
We present the first technique to synthesize programs that compose side-effecting functions, pure functions, and control flow, from partial traces containing records of only the side-effecting functions. This technique can be applied to synthesize API composing scripts from logs of calls made to those APIs, or a script from traces of system calls made by a workload, for example. All of the provided traces are positive examples, meaning that they describe desired behavior. Our approach does not require negative examples. Instead, it generalizes over the examples and uses cost metrics to prevent over-generalization. Because the problem is too complex for traditional monolithic program synthesis techniques, we propose a new combination of optimizing rewrites and syntax-guided program synthesis. The resulting program is correct by construction, so its output will always be able to reproduce the input traces.
We evaluate the quality of the programs synthesized when considering various optimization metrics and the synthesizer's efficiency on real-world benchmarks. The results show that our approach can generate useful real-world programs.
Article Search
Artifacts Available
Probabilistic Refinement Session Types
Qiancheng Fu,
Ankush Das, and
Marco Gaboardi
(Boston University, USA)
Session types provide a formal type system to define and verify communication protocols between message-passing processes. In order to analyze randomized systems, recent works have extended session types with probabilistic type constructors. Unfortunately, all the proposed extensions only support constant probabilities, which limits their applicability to real-world systems. Our work addresses this limitation by introducing probabilistic refinement session types, which enable symbolic reasoning for concurrent probabilistic systems in a core calculus we call PReST. The type system is carefully designed to be a conservative extension of refinement session types and supports both probabilistic and regular choice type operators. We also implement PReST in a prototype, which we use to validate probabilistic concurrent programs. The added expressive power leads to significant challenges, both in the metatheory and in the implementation of PReST, particularly with type checking: it requires reconstructing intermediate types for channels when type checking probabilistic branching expressions. The theory handles this by semantically quantifying refinement variables in probabilistic typing rules, a deviation from standard refinement type systems. The implementation relies on a bi-directional type checker that uses an SMT solver to reconstruct the intermediate types, minimizing annotation overhead and increasing usability. To guarantee that probabilistic processes are almost-surely terminating, we integrate cost analysis into our type system to obtain expected upper bounds on recursion depth. We evaluate PReST on a wide variety of benchmarks from four categories: (i) randomized distributed protocols such as Itai and Rodeh's leader election, bounded retransmission, etc., (ii) parametric Markov chains such as random walks, (iii) probabilistic analysis of concurrent data structures such as queues, and (iv) distributions obtained by composing uniform distributions using operators such as max and sum. Our experiments show that the PReST type checker scales to large programs with sophisticated probabilistic distributions.
Article Search
Info
Artifacts Available
Graphiti: Bridging Graph and Relational Database Queries
Yang He,
Ruijie Fang,
Işıl Dillig, and
Yuepeng Wang
(Simon Fraser University, Canada; University of Texas at Austin, USA)
This paper presents an automated reasoning technique for checking equivalence between graph database queries written in Cypher and relational queries in SQL. To formalize a suitable notion of equivalence in this setting, we introduce the concept of database transformers, which transform database instances between graph and relational models. We then propose a novel verification methodology that checks equivalence modulo a given transformer by reducing the original problem to verifying equivalence between a pair of SQL queries. This reduction is achieved by embedding a subset of Cypher into SQL through syntax-directed translation, allowing us to leverage existing research on automated reasoning for SQL while obviating the need for reasoning simultaneously over two different data models. We have implemented our approach in a tool called Graphiti and used it to check equivalence between graph and relational queries. Our experiments demonstrate that Graphiti is useful both for verification and refutation and that it can uncover subtle bugs, including those found in Cypher tutorials and academic papers.
Article Search
Artifacts Available
Verifying Solutions to Semantics-Guided Synthesis Problems
Charlie Murphy,
Keith J.C. Johnson,
Thomas Reps, and
Loris D'Antoni
(University of Wisconsin-Madison, USA; University of California at San Diego, USA)
Semantics-Guided Synthesis (SemGuS) provides a framework to specify synthesis problems in a solver-agnostic and domain-agnostic way, by allowing a user to provide both the syntax and semantics of the language in which the desired program should be synthesized. Because synthesis and verification are closely intertwined, the SemGuS framework raises the following question: how does one verify that a user-given program satisfies a given specification when interpreted according to a user-given semantics?
In this paper, we prove that this form of language-agnostic verification (specifically that verifying whether a program is a valid solution to a SemGuS problem) can be reduced to proving validity of a query in the µCLP calculus, a fixed-point logic that is capable of expressing alternating least and greatest fixed-points. Our encoding into µCLP allows us to further classify the SemGuS verification problems into ones that are reducible to satisfiability of (i) first-order-logic formulas, (ii) Constrained Horn Clauses, and (iii) µCLP queries. Furthermore, our encoding shines light on some limitations of the SemGuS framework, such as its inability to model nondeterminism and reactive synthesis. We thus propose a modification to SemGuS that makes it more expressive, and for which verifying solutions is exactly equivalent to proving validity of a query in the µCLP calculus. Our implementation of SemGuS verifiers based on the above encoding can verify instances that were not even encodable in previous work. Furthermore, we use our SemGuS verifiers within an enumeration-based SemGuS solver to correctly synthesize solutions to SemGuS problems that no previous SemGuS synthesizer could solve.
Article Search
Handling the Selection Monad
Gordon Plotkin and
Ningning Xie
(Google DeepMind, USA)
The selection monad on a set consists of selection functions. These select an element from the set, based on a loss (dually, reward) function giving the loss resulting from the choice of an element. Abadi and Plotkin used the monad to model a language with operations that make choices between computations, taking account of the loss that would arise from each choice. However, their choices were optimal, and they asked whether the choices could instead be programmer-provided.
In this work, we present a novel design enabling programmers to do so. We present a version of algebraic effect handlers enriched by computational ideas inspired by the selection monad. Specifically, as well as the usual delimited continuations, our new kind of handlers additionally have access to choice continuations, that give the possible future losses. In this way programmers can write operations implementing optimisation algorithms that are aware of the losses arising from their possible choices.
We give an operational semantics for a higher-order model language λC, and establish desirable properties including progress, type soundness, and termination for a subset with a mild hierarchical constraint on allowable operation types. We give this subset a selection monad denotational semantics, and prove soundness and adequacy results. We also present a Haskell implementation and give a variety of programming examples.
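For context, the selection monad referred to above is the classic one of Escardó and Oliva; a minimal Haskell sketch is below. The names, the argminOver helper, and the toy loss function are illustrative assumptions rather than anything taken from the paper, whose contribution is the handler design built on top of this monad, not the monad itself.

```haskell
import Control.Monad (ap)
import Data.List (minimumBy)
import Data.Ord (comparing)

-- A selection function picks an element of a, given a loss function on a.
newtype Sel r a = Sel { runSel :: (a -> r) -> a }

instance Functor (Sel r) where
  fmap f (Sel s) = Sel (\k -> f (s (k . f)))

instance Applicative (Sel r) where
  pure x = Sel (\_ -> x)
  (<*>)  = ap

instance Monad (Sel r) where
  -- Sequencing: choose x by the loss of the best future it enables,
  -- then make that best future choice.
  Sel s >>= f = Sel (\k ->
    let pick x = runSel (f x) k
    in  pick (s (\x -> k (pick x))))

-- A selection function over a finite, non-empty list that picks an
-- element minimising the given loss.
argminOver :: Ord r => [a] -> Sel r a
argminOver xs = Sel (\loss -> minimumBy (comparing loss) xs)

-- Toy example: choose two booleans so that exactly one is True, by
-- minimising a loss that is 0 on the desired outcomes and 1 otherwise.
example :: (Bool, Bool)
example = runSel pair loss
  where
    pair = do x <- argminOver [False, True]
              y <- argminOver [False, True]
              return (x, y)
    loss (x, y) = if x /= y then 0 else 1 :: Int
```

Running example yields a pair with loss 0 (exactly one component is True); the point is that each earlier choice is made with access to the losses of its possible futures, which is the kind of information the paper's choice continuations expose to handlers.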
Article Search
Spineless Traversal for Layout Invalidation
Marisa Kirisame,
Tiezhi Wang, and
Pavel Panchekha
(University of Utah, USA; Tongji University, China)
Latency is a major concern for web rendering engines like those in Chrome, Safari, and Firefox. These engines reduce latency by using an incremental layout algorithm to redraw the page when the user interacts with it. In such an algorithm, elements that change frame-to-frame are marked dirty, and only those elements are processed to draw the next frame, dramatically reducing latency. However, the standard incremental layout algorithm must search the page for dirty elements, accessing auxiliary elements in the process. These auxiliary elements add cache misses and stalled cycles, and are responsible for a sizable fraction of all layout latency.
We introduce a new, faster incremental layout algorithm called Spineless Traversal. Spineless Traversal uses a cache-friendlier priority queue algorithm that avoids accessing auxiliary nodes and thus reduces cache traffic and stalls. This leads to dramatic speedups on the most latency-critical interactions such as hovering, typing, and animation. Moreover, thanks to numerous low-level optimizations, Spineless Traversal is competitive across the whole spectrum of incremental layout workloads. Spineless Traversal is faster than the standard approach on 83.0% of 2216 benchmarks, with a mean speedup of 1.80× concentrated in the most latency-critical interactions.
Article Search
Slotted E-Graphs: First-Class Support for (Bound) Variables in E-Graphs
Rudi Schneider,
Marcus Rossel,
Amir Shaikhha,
Andrés Goens,
Thomas Kœhler, and
Michel Steuwer
(Technische Universität Berlin, Germany; Barkhausen Institut, Germany; University of Edinburgh, UK; University of Amsterdam, Netherlands; CNRS - ICube Lab, France)
Equality saturation has gained significant interest as a powerful optimization and reasoning technique. At its heart is the e-graph data structure, which space-efficiently represents equal sub-terms uniquely. An important open problem in this context is extending this efficient representation to languages featuring (bound) variables. Regardless of whether variables are represented in e-graphs as names or namelessly (using de Bruijn indices), sharing is broken: sub-terms that differ only in the names of their variables are represented separately. This results in aggressive e-graph growth, poor performance, and reduced expressiveness.
In this paper, we present a novel approach to representing bound variables in e-graphs by making them a first-class, built-in feature of the data structure. Our slotted e-graph represents terms that differ only by (bound or free) variable names uniquely. To do so, e-classes, which represent equivalent terms via e-nodes, are parameterized by slots that abstract over the free variables of the represented terms. Referring to an e-class from an e-node now requires relating the variables of the e-node's context to the slots of the e-class.
Our evaluation of slotted e-graphs uses two case studies, from compiler optimization and theorem proving, to show that performing equality saturation for languages with bound variables is greatly simplified, and that we can solve practically relevant problems that cannot be solved with e-graphs using de Bruijn indices.
Preprint
Usability Barriers for Liquid Types
Catarina Gamboa,
Abigail Reese,
Alcides Fonseca, and
Jonathan Aldrich
(Carnegie Mellon University, USA; University of Lisbon, Portugal)
Liquid types can express richer verification properties than simple type systems. However, despite their advantages, liquid types have yet to achieve widespread adoption. To understand why, we conducted a study analyzing developers' challenges with liquid types, focusing on LiquidHaskell.
Our findings reveal nine key barriers spanning three categories: developer experience, scalability challenges with large and complex codebases, and understanding of the verification process. Together, these obstacles provide a comprehensive view of the usability challenges that hinder broader adoption of liquid types, and offer insights that can inform the current and future design and implementation of liquid type systems.
Article Search
Guided Tensor Lifting
Yixuan Li,
José Wesley de Souza Magalhães,
Alexander Brauckmann,
Michael F. P. O'Boyle, and
Elizabeth Polgreen
(University of Edinburgh, UK)
Domain-specific languages (DSLs) for machine learning are revolutionizing the speed and efficiency of machine learning workloads, as they give users easy access to high-performance compiler optimizations and accelerators. However, to take advantage of these capabilities, a user must first translate their legacy code from the language it is currently written in into the new DSL. The task of automatically lifting code into these DSLs has been identified by several recent works, which propose program synthesis as a solution. However, synthesis is expensive and struggles to scale without carefully designed and hard-wired heuristics. In this paper, we present an approach to lifting that combines enumerative synthesis with a Large Language Model used to automatically learn the domain-specific heuristics for program lifting, in the form of a probabilistic grammar. Our approach outperforms the state-of-the-art tools in this area, despite only using learned heuristics.
Article Search
Correctly Rounded Math Libraries without Worrying about the Application’s Rounding Mode
Sehyeok Park,
Justin Kim, and
Santosh Nagarakatte
(Rutgers University, USA)
Our RLibm project has recently proposed methods to generate a single implementation of an elementary function that produces correctly rounded results for multiple rounding modes and for representations with up to 32 bits. These implementations are appealing for developing fast reference libraries without double-rounding issues. The key insight is to build polynomial approximations that produce the correctly rounded result for a representation with two additional bits compared to the largest target representation, using the "non-standard" round-to-odd rounding mode, which makes it innocuous to double-round the RLibm math library's result to any smaller target representation. The resulting approximations generated by the RLibm approach are implemented with machine-supported floating-point operations using the round-to-nearest rounding mode. When an application uses a rounding mode other than round-to-nearest, the RLibm math library saves the application's rounding mode, changes the system's rounding mode to round-to-nearest, computes the correctly rounded result, and restores the application's rounding mode. This frequent change of rounding modes has a performance cost.
This paper proposes two new methods, which we call rounding-invariant outputs and rounding-invariant input bounds, to avoid the frequent changes to the rounding mode and the dependence on the round-to-nearest mode. First, our rounding-invariant outputs method uses the round-to-zero rounding mode to implement RLibm's polynomial approximations; we propose fast, error-free transformations to emulate a round-to-zero result from any standard rounding mode without changing the rounding mode. Second, our rounding-invariant input bounds method factors any rounding error due to different rounding modes using interval bounds in the RLibm pipeline. Both methods make different trade-offs and improve the performance of the resulting libraries by more than 2X.
Article Search
Practical Type Inference with Levels
Andong Fan,
Han Xu, and
Ningning Xie
(University of Toronto, Canada; Princeton University, USA)
Modern functional languages rely on sophisticated type inference algorithms. However, there often exists a gap between the theoretical presentation of these algorithms and their practical implementations. Specifically, implementations employ techniques that are not explicitly included in the formal specifications, with undesirable consequences. First, this leads to confusion and unforeseen challenges for developers adhering to the formal specification. Moreover, theoretical guarantees established for a formal presentation may not directly translate to the implementation. This paper focuses on formalizing one such technique, known as levels, which is widely used in practice but whose theoretical treatment remains largely understudied. We present the first comprehensive formalization of levels and demonstrate their applicability to type inference implementations.
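As background on the technique in question, here is a purely illustrative sketch in the spirit of OCaml-style inference engines (the representation and names are assumptions, not the paper's formalization): each unification variable is tagged with the level at which it was created, and at a let-binding only variables from a strictly deeper level are generalized.

```haskell
-- Illustrative only: a tiny type representation whose unification variables
-- carry levels, plus the generalization check that levels enable.
type Level  = Int
type MetaId = Int

data Type
  = TMeta MetaId Level   -- unification variable, tagged with its creation level
  | TArrow Type Type
  | TCon String
  deriving Show

-- At a let-binding elaborated at level `here`, only unification variables
-- created at a strictly deeper level can be generalized; shallower variables
-- may still be constrained by the enclosing context and must stay monomorphic.
generalizableVars :: Level -> Type -> [MetaId]
generalizableVars here ty = case ty of
  TMeta v lvl | lvl > here -> [v]
              | otherwise  -> []
  TArrow a b               -> generalizableVars here a ++ generalizableVars here b
  TCon _                   -> []
```

A real inference engine also lowers a variable's level when unification links it to data from an outer scope; this sketch omits that (and substitution entirely), which is precisely the kind of gap between folklore implementation techniques and formal specifications that the paper addresses.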
Article Search
AWDIT: An Optimal Weak Database Isolation Tester
Lasse Møldrup and
Andreas Pavlogiannis
(Aarhus University, Denmark)
Database isolation is a formal contract concerning the level of data consistency that a database provides to its clients. In order to achieve low latency, high throughput, and partition tolerance, modern databases forgo strong transaction isolation for weak isolation guarantees. However, several production databases have been found to suffer from isolation bugs, breaking their data-consistency contract. Black-box testing is a prominent technique for detecting isolation bugs, by checking whether histories of database transactions adhere to a prescribed isolation level.
In order to test databases on realistic workloads of large size, isolation testers must be as efficient as possible, a requirement that has initiated a study of the complexity of isolation testing. Although testing strong isolation has been known to be NP-complete, weak isolation levels were recently shown to be testable in polynomial time, which has propelled the scalability of testing tools. However, existing testers have a large polynomial complexity, restricting testing to workloads of only moderate size, which is not typical of large-scale databases. How efficiently can we provably test weak database isolation?
In this work, we develop AWDIT, a highly efficient and provably optimal tester for weak database isolation. Given a history H of size n and k sessions, AWDIT tests whether H satisfies the most common weak isolation levels of Read Committed (RC), Read Atomic (RA), and Causal Consistency (CC) in time O(n^{3/2}), O(n^{3/2}), and O(n·k), respectively, improving significantly over the state of the art. Moreover, we prove that AWDIT is essentially optimal, in the sense that there is a lower bound of n^{3/2}, based on the combinatorial BMM hypothesis, for any weak isolation level between RC and CC. Our experiments show that AWDIT is significantly faster than existing, highly optimized testers; e.g., for the ∼20% largest histories, AWDIT obtains an average speedup of 245×, 193×, and 62× for RC, RA, and CC, respectively, over the best baseline.
Preprint
Artifacts Available