OOPSLA1 2026
Proceedings of the ACM on Programming Languages, Volume 10, Number OOPSLA1
Powered by
Conference Publishing Consulting

Proceedings of the ACM on Programming Languages, Volume 10, Number OOPSLA1

OOPSLAA – Journal Issue

Contents - Abstracts - Authors
Title Page


Article: oopslaa26foreword-fm000-p doi:
Editorial Message
The Proceedings of the ACM series presents high-quality research across diverse areas of computer science, as represented by the ACM Special Interest Groups (SIGs). The Proceedings of the ACM on Programming Languages (PACMPL) focuses on research on all aspects of programming languages, from design to implementation and from mathematical formalisms to empirical studies. The journal operates in close collaboration with the Special Interest Group on Programming Languages (SIGPLAN) and is committed to making high-quality peer-reviewed scientific research in programming languages free of restrictions on both access and use. This issue of PACMPL publishes 75 articles selected for OOPSLA1 2026. These articles represent a broad cross-section of current programming-languages research, spanning language design, verification, program analysis, compilers, systems, security, software engineering, AI for programming, and human-centered aspects of PL. Collectively, these papers reflect both the technical depth and the intellectual breadth of the community.

Article: oopslaa26foreword-fm001-p doi:
Learning Symmetric Invariants from Symmetric Samples
Zhijie Xu and Fei He
(Tsinghua University, China; Key Laboratory for Information System Security, China)
Invariant synthesis is a fundamental problem in program verification, yet existing learning-based approaches rarely exploit the inherent symmetry present in many programs, particularly parameterized and concurrent systems. Such symmetry induces a symmetric reachable state space, naturally yielding symmetric samples and admitting symmetric invariants, motivating the task of learning symmetric invariants from symmetric samples. To this end, we introduce symmetric decision trees (SDTs), a novel hypothesis class that enforces symmetry structurally, guaranteeing symmetric invariants by construction. Furthermore, we develop a learning algorithm to construct SDTs and integrate it as the learner within the Horn-ICE framework, yielding our approach, Horn-SDT. Empirical evaluation on parameterized programs demonstrates that Horn-SDT achieves faster convergence and constructs more compact trees compared to non-symmetric baselines.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p9-p doi:10.1145/3798200
Decompiling for Constant-Time Analysis
Santiago Arranz-Olmos, Gilles Barthe, Lionel Blatter, Youcef Bouzid, Sören van der Wall, and Zhiyuan Zhang
(MPI-SP, Germany; IMDEA Software Institute, Spain; ENS Paris-Saclay, France; TU Braunschweig, Germany)
The constant-time programming discipline is commonly used to protect cryptographic libraries against side-channel attacks. However, it is hard to write constant-time code; moreover, compilers can introduce constant-time violations. Therefore, it is important to ensure that assembly code is constant-time. One approach is to show that source programs are constant-time, and that constant-timeness is preserved by compilation. In this paper, we explore the methodological soundness and scalability of the Decompile-then-Analyze approach, a less conventional alternative that has been suggested in the broader setting of static analysis. Informally, the Decompile-then-Analyze approach uses decompilers a front-end for static analysis tools. As a motivation for our study, we show that current decompilers eliminate CT vulnerabilities before CT analysis, leading to non-CT programs being accepted as constant-time. Independently, we provide constructed examples of non-CT, exploitable, programs that are accepted by two popular CT analysis tools; in both cases the culprit are program transformations that are used internally prior to CT analysis and eliminate CT violations. While our examples do not invalidate the general approach of these tools, they emphasize the need for studying the Decompile-then-Analyze approach. On the methodological side, we define the notion of CT transparency. Informally, a program transformation is CT transparent if does not eliminate nor introduce CT violations. We also provide general methods for proving that a transformation is CT transparent, and show that several transformations of interest are transparent. We also sketch an extension of CT transparency to speculative constant-time, which is used by cryptographic software as a protection against Spectre attacks. On the practical side, we build a CT-transparent version of the popular LLVM-based decompiler RetDec, and combine it with CT-LLVM, an existing CT verification tool for LLVM. We evaluate the resulting tool, called CT-RetDec on a benchmark set of real-world vulnerabilities in binaries, and show that the modifications had significant impact on how well CT-RetDec performs.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p13-p doi:10.1145/3798201
Taming the Hydra: Targeted Control-Flow Transformations for Dynamic Symbolic Execution
Charitha Saumya, Muhammad Hassan, Rohan Gangaraju, Milind Kulkarni, and Kirshanthan Sundararajah
(Intel, USA; Virginia Tech, USA; University of Texas at Austin, USA; Purdue University, USA)
Dynamic Symbolic Execution (DSE) suffers from the path explosion problem when the target program has many conditional branches. The classical approach for managing the path explosion problem is dynamic state merging. Dynamic state merging combines similar symbolic program states to avoid the exponential growth in the number of states during DSE. However, state merging still requires solver invocations at each program branch, even when both paths of the branch are feasible. Moreover, the best path search strategy for DSE may not create the best state merging opportunities. Some drawbacks of state merging can be mitigated by compile-time state merging (i.e., branch elimination by converting control-flow into dataflow). In this paper, we propose a non-semantics-preserving but failure-preserving compiler transformation for removing expensive symbolic branches in a program to improve the scalability of DSE. We have developed a framework for detecting spurious bugs that our transformation can insert. Finally, we show that our transformation can significantly improve the performance of DSE on various benchmark programs and help improve the performance of coverage and bug discovery of large real-world programs.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p20-p doi:10.1145/3798202
Floating-Point Usage on GitHub: A Large-Scale Study of Statically Typed Languages
Andrea Gilot, Tobias Wrigstad, and Eva Darulova
(Uppsala University, Sweden)
Reasoning about floating-point arithmetic is notoriously hard. While static and dynamic analysis techniques or program repair have made significant progress, more work is still needed to make them relevant to real-world code. On the critical path to that goal is understanding what real-world floating-point code looks like. To close that knowledge gap, this paper presents the first large-scale empirical study of floating-point arithmetic usage across public GitHub repositories. We focus on statically typed languages to allow our study to scale to millions of repositories. We follow state-of the art mining practices including random sampling and filtering based on only intrinsic properties to avoid bias, and identify floating-point usage by searching for keywords in the source code, and programming language constructs (e.g., loops) by parsing the code. Our evaluation supports the claim often made in papers that floating-point arithmetic is widely used. Comparing statistics such as size and usage of certain constructs and functions, we find that benchmarks used in literature to evaluate automated reasoning techniques for floating-point arithmetic are in certain aspects representative of 'real-world' code, but not in all. We publish a dataset of 10 million real-world floating-point functions extracted from our study. We demonstrate in a case study how it may be used to identify new floating-point benchmarks and help future techniques for floating-point arithmetic to be designed and evaluated to match actual users' expectations.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p22-p doi:10.1145/3798203
OBsmith: LLM-Powered JavaScript Obfuscator Testing
Shan Jiang, Chenguang Zhu, and Sarfraz Khurshid
(University of Texas at Austin, USA)
JavaScript obfuscators are widely deployed to protect intellectual property and resist reverse engineering, yet their correctness has been largely overlooked compared to performance and resilience. Existing evaluations typically measure resistance to deobfuscation, leaving the critical question of whether obfuscators preserve program semantics unanswered. Incorrect transformations can silently alter functionality, compromise reliability, and erode security—undermining the very purpose of obfuscation. To address this gap, we present OBsmith, a novel framework to systematically test JavaScript obfuscators using large language models (LLMs). OBsmith leverages LLMs to generate program sketches—abstract templates capturing diverse language constructs, idioms, and corner cases—which are instantiated into executable programs and subjected to obfuscation under different configurations. Besides LLM-powered sketching, OBsmith also employs a second source: automatic extraction of skeletons from real programs. This extraction path enables more focused testing of project-specific features and lets developers inject domain knowledge into the resulting test cases. OBsmith uses two techniques to derive test oracles: (i) reference-oriented equivalence testing, which takes the original program as reference oracle (ground truth) and checks whether the obfuscated version preserves equivalent functionality, and (ii) metamorphic testing, which applies semantics-preserving transformations to the original program and checks if obfuscation violates expected behavior.
We evaluate OBsmith on two widely used obfuscators, Obfuscator.IO and JS-Confuser, generating 600 sketches using six popular LLMs. OBsmith fills these sketches and generates over 3,000 candidate programs and obfuscates them across seven obfuscation configurations. OBsmith uncovers 11 previously unknown correctness bugs. Under an equal program budget, five general purpose state-of-the-art JavaScript fuzzers (FuzzJIT, Jsfunfuzz, Superion, DIE, Fuzzilli) failed to detect these issues, highlighting OBsmith’s complementary focus on obfuscation-induced misbehavior. An ablation shows that all components except our generic MRs contribute to at least one bug class; the negative MR result suggests the need for obfuscator-specific metamorphic relations. Our results also seed a discussion on how to balance obfuscation presets and performance cost. We envision OBsmith as an important step towards automated testing and quality assurance of obfuscators and other semantic-preserving toolchains.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Article: oopslaa26main-p25-p doi:10.1145/3798204
Peeling Off the Cocoon: Unveiling Suppressed Golden Seeds for Mutational Greybox Fuzzing
Ruixiang Qian, Chunrong Fang, Zengxu Chen, Youxin Fu, and Zhenyu Chen
(Nanjing University, China)
Mutational greybox fuzzing (MGF) is a powerful software testing technique. Initial seeds are critical for MGF since they define the space of possible inputs and fundamentally shape the effectiveness of MGF. Nevertheless, having more initial seeds is not always better. A bloated initial seed set can inhibit throughput, thereby degrading the effectiveness of MGF. To avoid bloating, modern fuzzing practices recommend performing seed selection to maintain golden seeds (i.e., those identified as beneficial for MGF) while minimizing the size of the set. Typically, seed selection favors seeds that execute unique code regions and discards those that contribute stale coverage. This coverage-based strategy is straightforward and useful, and is widely adopted by the fuzzing community. However, coverage-based seed selection (CSS) is not flawless and has a notable blind spot: it fails to identify golden seeds suppressed by unpassed coverage guards, even if these seeds contain valuable payload that can benefit MGF. This blind spot prevents suppressed golden seeds from realizing their true values, which may ultimately degrade the effectiveness of downstream MGF.
In this paper, we propose a novel technique named PoCo to address the blind spot of traditional CSS. The basic idea behind PoCo is to manifest the true strengths of the suppressed golden seeds by gradually disabling obstacle conditional guards. To this end, we develop a lightweight program transformation to enable flexible disabling of guards and devise a novel guard hierarchy analysis to identify obstacle ones. An iterative seed selection algorithm is constructed to stepwise select suppressed golden seeds. We prototype PoCo on top of the AFL++ utilities (version 4.10c) and compare it with seven baselines, including two state-of-the-art tools afl-cmin and OptiMin. Compared with afl-cmin, PoCo selects 3–40 additional seeds within a practical time budget of two hours. To evaluate how effective the studied techniques are in seeding MGF, we further conduct extensive fuzzing (over 17,280 CPU hours) with eight different targets from a mature benchmark named Magma, adopting the most representative fuzzer AFL++ for MGF. The results show that the additional seeds selected by PoCo yield modest improvements in both code coverage and bug discovery. Although our evaluation reveals some limitations of PoCo, it also demonstrates the presence and value of suppressed golden seeds. Based on the evaluation results, we distill lessons and insights that may inspire the fuzzing community.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p37-p doi:10.1145/3798205
Diatom: Polylithic Binary Lifting with Data-Flow Summaries and Type-Aware IR Linking
Anshunkang Zhou and Charles Zhang
(Hong Kong University of Science and Technology, China)
Binary lifting, which translates binary code into LLVM intermediate representations (IRs) through iterative IR transformations for recovering high-level constructs from low-level machine features, is the cornerstone of many binary analysis systems. Therefore, the scalability and precision of the upper layer analysis could be greatly affected by the underlying binary lifting. However, all existing binary lifters still suffer from severe performance problems in that they require much time to handle extremely large binaries, which becomes a barrier to achieving the expected performance gains in various analyses and hinders them from meeting the requirement of quick response in modern continuous integration pipelines. We found that the root cause of the scalability issue is the inherent "monolithic" design that performs all lifting stages on a single LLVM module, which entails a global environment that enforces sequential dependences between any two transformations on IRs, thus limiting the parallelism.
This paper presents Diatom, a novel parallel binary lifter powered by a new "polylithic" design, which decomposes the monolithic LLVM module into partitions to perform fully parallelized binary lifting. In the meantime, it leverages light-weight data-flow summaries and type-aware IR linking to avoid soundness loss caused by separating dependent code fragments. Large-scale experiments on 16 real-world benchmarks whose sizes range from dozens of megabytes (MBs) to several gigabytes (GBs) show that Diatom achieves an average speedup of 7.45× and a maximum speedup of 16.8× over a traditional monolithic binary lifter, while still maintaining the lifting soundness. Diatom can complete the translation for the Linux Kernel binary within only 10 minutes, which significantly accelerates the overall binary code analysis process.

Publisher's Version Article: oopslaa26main-p41-p doi:10.1145/3798206
Handling Exceptions and Effects with Automatic Resource Analysis
Ethan Chu, Yiyang Guo, and Jan Hoffmann
(Carnegie Mellon University, USA)
There exist many techniques for automatically deriving parametric resource (or cost) bounds by analyzing the source code of a program. These techniques work effectively for a large class of programs and language features. However, non-local transfer of control as needed for exception or effect handlers has remained a challenge.
This paper presents the first automatic resource bound analysis that supports non-local control transfer between exceptions or effects and their handlers. The analysis is an extension of type-based automatic amortized resource analysis (AARA), which automates the potential method of amortized analysis. It is presented for a simple functional language with lists and linear potential functions. However, the ideas are directly applicable to richer settings and implemented for Standard ML and polynomial potential functions.
Apart from the new type system for exceptions and effects, a main contribution is a novel syntactic type- soundness theorem that establishes the correctness of the derived bounds with respect to a stack-based abstract machine. An experimental evaluation shows that the new analysis is capable of analyzing programs that cannot be analyzed by existing methods and that the efficiency overhead of supporting exception and effect handlers is low.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p42-p doi:10.1145/3798207
RAT-CAT-SAT: Model Checking Memory Consistency Models
Jan Grünke, Thomas Haas, and Roland Meyer
(TU Braunschweig, Germany)
We present a model checking approach for axiomatic memory consistency models written in the language CAT. It can prove properties of single memory models like the monotonicity of barriers, and also compare models like TSO and ARM. To achieve this expressiveness, our approach supports the full rational fragment of CAT. We not only support the Kleene star operation, composition, and union to define relations in a memory model, but also, for the first time, intersection, converse, and the construction of relations from sets.
Our model checking approach for memory consistency models is logical in nature: we formulate the problem as satisfiability in a logical theory of relations. Our technical contribution is then a new theory solver that is sound, complete, and optimal from a complexity point of view. At the heart of our solver is a cyclic proof system that can detect invariants on-the-fly. This allows us to terminate early not only in the case that a model has been found, but also in the case of unsatisfiability. Our solver is easy to implement and easy to combine with heuristics. We have implemented a combination with counterexample-guided abstraction refinement, and exercised the prototype on a number of benchmarks - with promising results.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p45-p doi:10.1145/3798208
Specy: Learning Specifications for Distributed Systems from Event Traces
Mike He, Ankush Desai, Aishwarya Jagarapu, Doug Terry, Sharad Malik, and Aarti Gupta
(Princeton University, USA; Snowflake, USA; Amazon Web Services, USA; LinkedIn, USA)
Reasoning about the correctness of distributed systems is a significant challenge, with precise correctness specifications serving as an essential prerequisite to verification. However, identifying and formulating specifications remains a major hurdle for developers in practice. Specy addresses this challenge by automatically learning specifications from observable event traces generated by message exchanges in distributed systems. The system employs a specialized grammar tailored for event-based specifications, incorporating support for quantifiers over events – a capability essential for capturing the complex behavioral patterns inherent in distributed protocols. Specy utilizes a novel learning procedure that combines grammar-based enumerative search with dynamic learning from event traces, providing effective control over the specification search. We evaluated Specy on established distributed protocols and industrial case studies, demonstrating its ability to successfully learn important protocol specifications. Specy can discover previously unidentified specifications overlooked by developers, automatically derive inductive invariants that were previously constructed manually for verification purposes, and, through run-time monitoring in production systems, reveal gaps in testing coverage – highlighting opportunities to leverage specifications in practice.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p48-p doi:10.1145/3798209
Commuting Conversions and Join Points for Call-by-Push-Value
Jonathan Chan, Madi Gudin, Annabel Levy, and Stephanie Weirich
(University of Pennsylvania, USA; Amherst College, USA; University of Maryland, Baltimore County, USA)
Levy’s call-by-push-value (CBPV) is a language that subsumes both call-by-name and call-by-value lambda calculi by syntactically distinguishing values from computations and explicitly specifying execution order. This low-level handling of computation suspension and resumption makes CBPV suitable as a compiler intermediate representation (IR), while its substitution evaluation semantics affords compositional reasoning about programs. In particular, βη-equivalences in CBPV have been used to justify compiler optimizations in low-level IRs. However, these equivalences do not validate commuting conversions, which are key transformations in compiler passes such as A-normalization. Such transformations syntactically rearrange computations without affecting evaluation order, and can reveal new opportunities for inlining.
In this work, we identify the commuting conversions of CBPV, define a commuting conversion normal form (CCNF) for CBPV, present a single-pass transformation into CCNF based on A-normalization, and prove that well-typed, translated programs evaluate to the same result. To avoid the usual code duplication issues that also arise with A-normal form, we adapt the explicit join point constructs by Maurer et al. Our results are all mechanized in Lean 4.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p54-p doi:10.1145/3798210
Hermes: Making Path-Sensitive Pointer Analysis Scalable for Sparse Value-Flow Analysis
Yuxuan He, Ruilin Jiang, He Zhang, Qingkai Shi, Huaxun Huang, and Rongxin Wu
(Xiamen University, China; Nanjing University, China)
Sparse Value-Flow Analysis (SVFA) is essential for detecting software bugs such as null pointer dereference and memory leak. However, SVFA heavily relies on path-sensitive pointer analysis, which faces significant scalability challenges when analyzing industrial-scale projects, notably the summary-explosion problem. To address this issue, we propose Hermes, which symbolizes memory side effects and constructs an incomplete Sparse Value-Flow Graph (SVFG) called Lazy Symbolic Expression Graph (LSEG). Leveraging this structure, Hermes builds inter-procedural value flows relevant to bug detection only when necessary, significantly reducing the overhead of pointer analysis and streamlining the bug-search paths. Evaluations on large-scale real-world projects demonstrate that, compared to the state-of-the-art, Hermes achieves average speedups of at least 9.84× and 4.79× for pointer analysis and bug search, respectively, without sacrificing the effectiveness of bug detection.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: oopslaa26main-p61-p doi:10.1145/3798211
MetaSpace: Metamorphic Testing for Spatial Cognition in Embodied Agents
Gengyang Xu, Dongwei Xiao, Yiteng Peng, and Shuai Wang
(Hong Kong University of Science and Technology, China)
An embodied agent is an intelligent entity that interacts with its environment through a physical body. Currently, the evaluation of embodied agents primarily relies on two paradigms: (1) manually annotated Visual Question Answering (VQA) pairs and (2) high-level task completion metrics, such as success in navigation or manipulation. The former is labor-intensive and subject to variability in annotation quality. The latter may obscure critical vulnerabilities, allowing agents to complete tasks through suboptimal means or safety violations, thereby concealing safety risks and inefficiencies. Given that spatial cognition is the cornerstone for executing embodied tasks, there is a pressing need to assess whether embodied agents possess robust spatial cognition during task execution.
Inspired by metamorphic testing principles in software engineering, we propose MetaSpace, a novel framework designed to evaluate the spatial cognition of agents. By leveraging spatiotemporal multimodal states derived from real execution trajectories, MetaSpace automatically generates test cases based on predefined metamorphic relations (MRs) grounded in logical rules and physical laws. Crucially, we encode these MRs as executable rules in a logic programming language (Prolog). Violations of these relations indicate failures in spatial cognition. Our empirical evaluation across three embodied scenarios demonstrates that MetaSpace successfully detects 90,422 spatial cognition errors in state-of-the-art (SOTA) MLLM-driven agents. We introduce the Spatial Cognition (SC) score to quantify performance. Results indicate that all SOTA agents achieve average scores between 0.44 and 0.52, significantly lower than the human benchmark of 0.96. Additionally, these agents struggle with directional tasks, with SC scores consistently below 0.38. In contrast, their performance in magnitude-related tasks is relatively better, with most SC scores exceeding 0.5. To mitigate the identified spatial cognition errors, we explore potential improvement strategies. Preliminary results suggest that traditional prompting techniques (e.g., Chain of Thought) are limited, while spatially-aware prompting (e.g., cognitive maps) shows promise. Our findings underscore the importance of ongoing community efforts to enhance embodied agent performance by prioritizing the improvement of spatial cognition, a fundamental requirement for executing embodied tasks.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p63-p doi:10.1145/3798212
Debugging Debugging Information using Dynamic Call Trees
J. Ryan Stinnett and Stephen Kell
(King's College London, UK)
Debugging tools rely on compiler-generated metadata to present a source-language view, but current compilers often throw away or corrupt debugging information in optimised programs. Attempts to test debugging information are confounded by ad-hoc limitations of the debug info formats and a lack of clarity on whether the compiler or the format is to blame for any given loss. Adopting the "residual program" conceptual view of debug info, we conduct a study of the quality of debugging information in respect of the source-level dynamic call trees it can recover. We compare the trees recovered from optimised and unoptimised versions of the same program, producing a classification of the observed divergences. For each class, we analyse whether format or compiler is to blame and identify specific ways to address these defects. We also validate our classification across a larger collection of well-known codebases.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Article: oopslaa26main-p64-p doi:10.1145/3798213
Beer: Interactive Alarm Resolution in Bayesian Program Analysis via Exploration-Exploitation
Haoran Lin, Zhenyu Yan, and Xin Zhang
(Peking University, China)
Interactive Bayesian program analysis enhances static analysis by modeling derivations as probabilistic dependencies, enabling ranking alarms by calculated confidences, proposing highly likely alarms for user inspection, and updating confidences with inspection results. Existing interactive approaches adopt a purely greedy, exploitation-only selection strategy that always inspects the highest-confidence alarm. However, such strategies are prone to local optima, leading to redundant inspections and delayed identification of true alarms. We propose Beer (Bayesian Exploration-Exploitation Ranker), a framework that systematically integrates the Exploration-Exploitation trade-off into Bayesian program analysis. Beer leverages structural correlations between alarms—derived from shared root causes in the Bayesian model—to estimate expected information gain and guide exploration. When repeated false alarms indicate model stagnation, Beer selects alarms from minimally explored, highly correlated clusters to accelerate learning. Implemented atop the Bingo framework, Beer achieves up to 32% effectiveness in ranking efficiency over the greedy baseline on datarace, thread-escape, and taint analyses, demonstrating the efficacy of exploration-guided alarm resolution.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p91-p doi:10.1145/3798214
noDice: Inference for Discrete Probabilistic Programs with Nondeterminism and Conditioning
Tobias Gürtler and Benjamin Lucien Kaminski
(Saarland University, Germany; Saarland Informatics Campus, Germany; University College London, UK)
Probabilistic programming languages (PPLs) are an expressive and intuitive means of representing complex probability distributions. In that realm, languages like Dice target an important class of probabilistic programs: those whose probability distributions are discrete. Discrete distributions are common in many fields, including text analysis, network verification, artificial intelligence, and graph analysis. Another important feature in the world of probabilistic modeling are nondeterministic choices as found in Markov Decision Processes (MDPs) which play a major role in reinforcement learning. Modern PPLs usually lack support for nondeterminism. We address this gap with the introduction of noDice, which extends the discrete probabilistic inference engine Dice. noDice performs inference on loop-free programs by constructing an MDP so that the distributions modeled by the program correspond to schedulers in the MDP. Furthermore, decision diagrams are used as an intermediate step to exploit the program structure and drastically reduce the state space of the MDP.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p98-p doi:10.1145/3798215
Mechanically Translating Iterative Dataflow Analysis to Algebraic Program Analysis
Chenyu Zhou, Jingbo Wang, and Chao Wang
(University of Southern California, USA; Purdue University, USA)
We propose a method for mechanically translating iterative dataflow analysis (IDA) algorithms to algebraic program analysis (APA) algorithms capable of computing exactly the same set of dataflow facts. The method is useful because while most of the dataflow analysis algorithms used in practice are expressed as iterative procedures, APA provides an alternative and inherently-compositional approach to solving dataflow problems, thus making it well suited for certain applications (e.g., incremental analysis of a program that goes through frequent code changes, or amortizing the cost of answering a large number of dataflow queries for the same program). However, manually crafting an APA algorithm not only is labor intensive and error prone but also can lead to suboptimal performance. Our method overcomes the limitation by providing a mechanical translation that guarantees to be correct by construction. Our method handles a broad class of dataflow analysis problems — the only requirements are that (1) the set of dataflow facts is finite and (2) the dataflow functions distribute over the confluence operation (e.g., set union). They include classical dataflow problems whose IDA algorithms can be expressed using Gen/Kill sets, such as reaching definitions, live variables, and available expressions. They also include non Gen/Kill problems such as copy constant propagation, truly-live variables, and possibly-initialized variables. Our experimental evaluation shows that the translated APA algorithms are not only simpler and easier to understand, but also significantly faster than manually-designed APA algorithms, especially for incremental program analysis.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: oopslaa26main-p99-p doi:10.1145/3798216
SymGPT: Auditing Smart Contracts via Combining Symbolic Execution with Large Language Models
Shihao Xia, Mengting He, Shuai Shao, Tingting Yu, Yiying Zhang, Nobuko Yoshida, and Linhai Song
(Pennsylvania State University, USA; University of Connecticut, USA; University of California at San Diego, USA; University of Oxford, UK; Institute of Computing Technology at Chinese Academy of Sciences, China)
To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each defining a set of rules governing contract behavior. Violating these rules can cause serious security issues and financial losses, signifying the importance of verifying ERC compliance. Today’s practices of such verification include manual audits, expert-developed program-analysis tools, and large language models (LLMs), all of which remain ineffective at detecting ERC rule violations.
This paper introduces SymGPT, a tool that combines LLMs with symbolic execution to automatically verify smart contracts’ compliance with ERC rules. We begin by empirically analyzing 132 ERC rules from three major ERC standards, examining their content, security implications, and natural language descriptions. Based on this study, SymGPTinstructs an LLM to translate ERC rules into a domain-specific language, synthesizes constraints from the translated rules to model potential rule violations, and performs symbolic execution for violation detection. Our evaluation shows that SymGPTidentifies 5,783 ERC rule violations in 4,000 real-world contracts, including 1,375 violations with clear attack paths for financial theft. Furthermore, SymGPToutperforms six automated techniques and a security-expert auditing service, underscoring its superiority over current smart contract analysis methods.

Publisher's Version Article: oopslaa26main-p107-p doi:10.1145/3798217
CMakeSonar: A Static Approach to Detecting CMake Bugs with a Fine-Grained Type System
Haotian Han, Zihang Zhong, Qingan Li, Jingling Xue, and Mengting Yuan
(Wuhan University, China; UNSW Sydney, Australia)
As build systems and their scripts grow in size and complexity, detecting bugs in build configurations becomes increasingly challenging due to the rich functionality and weak typing of build scripting languages. This paper introduces CMakeSonar, the first static approach to precisely identifying semantic bugs in CMake scripts. CMakeSonar addresses this challenge by (1) designing a fine-grained type system that captures the runtime semantics of CMake values, and (2) performing a flow-sensitive analysis that detects inconsistent and ill-typed value usages by solving type constraints. Our approach identifies configuration and usage errors that can silently affect build correctness, portability, and deployment safety. In our evaluation, CMakeSonar identifies 155 bugs across 36 real-world CMake projects on GitHub, of which 23 have been accepted and fixed by developers. With a false positive rate of 4.32% and a recall of 97.48%, CMakeSonar demonstrates that precise static analysis can effectively uncover high-impact bugs in untyped build systems.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p114-p doi:10.1145/3798218
When Specifications Meet Reality: Uncovering API Inconsistencies in Ethereum Infrastructure
Jie Ma, Ningyu He, Jinwen Xi, Mingzhe Xing, Liangxin Liu, Jiushenzi Luo, Xiaopeng Fu, Chiachih Wu, Haoyu Wang, Ying Gao, and Yinliang Yue
(Beihang University, China; Zhongguancun Laboratory, China; Hong Kong Polytechnic University, Hong Kong; Beijing Institute of Technology, China; Amber Group, Hong Kong; Huazhong University of Science and Technology, China)
The Ethereum ecosystem, which secures over $381 billion in assets, fundamentally relies on client APIs as the sole interface between users and the blockchain. However, these critical APIs suffer from widespread implementation inconsistencies, which can lead to financial discrepancies, degraded user experiences, and threats to network reliability. Despite this criticality, existing testing approaches remain manual and incomplete: they require extensive domain expertise, struggle to keep pace with Ethereum's rapid evolution, and fail to distinguish genuine bugs from acceptable implementation variations. We present APIDiffer, the first specification-guided differential testing framework designed to automatically detect API inconsistencies across Ethereum's diverse client ecosystem. APIDiffer transforms API specifications into comprehensive test suites through two key innovations: (1) specification-guided test input generation that creates both syntactically valid and invalid requests enriched with real-time blockchain data, and (2) specification-aware false positive filtering that leverages large language models to distinguish genuine bugs from acceptable variations. Our evaluation across all 11 major Ethereum clients reveals the pervasiveness of API bugs in production systems. APIDiffer uncovered 72 bugs, with 90.28% already confirmed or fixed by developers, including one critical error in the official specifications themselves. Beyond these raw numbers, APIDiffer achieves up to 89.67% higher code coverage than existing tools and reduces false positive rates by 37.38%. The Ethereum community's response validates our impact: developers have integrated our test cases, expressed interest in adopting our methodology, and escalated one bug to the official Ethereum Project Management meeting. By making APIDiffer open-source, we enable continuous validation of Ethereum client API implementations, thereby strengthening the foundational integrity of the entire Ethereum ecosystem.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p122-p doi:10.1145/3798219
Type Inference for Functional and Imperative Dynamic Languages
Mickaël Laurent and Jan Vitek
(Charles University, Czech Republic; Czech Technical University, Czech Republic)
In this paper, we formalize a type system based on set-theoretic types for dynamic languages that support both functional and imperative programming paradigms. We adapt prior work in the typing of overloaded and generic functions to support an impure lambda-calculus, focusing on imperative features commonly found in dynamic languages such as JavaScript, Python, and Julia. We introduce a general notion of parametric opaque data types using set-theoretic types, enabling precise modeling of mutable data structures while promoting modularity, clarity, and readability. Finally, we compare our approach to existing work and evaluate our prototype implementation on a range of examples.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p124-p doi:10.1145/3798220
Fully-Automatic Type Inference for Borrows with Lifetimes
William Brandon, Benjamin Driscoll, Frank Dai, Jonathan Ragan-Kelley, Mae Milano, and Alex Aiken
(Massachusetts Institute of Technology, USA; Stanford University, USA; Unaffiliated, USA; Princeton University, USA)
We present a new pure functional language and type system with borrows with lifetimes, and a corresponding fully-automatic type inference procedure. Inference provides users the performance benefits of borrows with lifetimes without requiring user annotation. If the user's program cannot be typed, inference inserts a handful of reference count operations so that it can be typed. We provide a heap semantics for our borrowing language and prove a soundness theorem, guaranteeing that well-typed programs do not violate memory safety. We implement our memory management strategy as part of the Morphic language stack and compare it to Perceus, a state-of-the-art reference-counting technique based on linear type inference. We find that our system is able to eliminate almost all reference count operations across a range of programs, reducing reference count increments by 75-100% on all benchmarks with reference count increments under the baseline. As a result, we achieve a 1.48x geomean speedup overall on all benchmarks.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p128-p doi:10.1145/3798221
(Dis)Proving Spectre Security with Speculation-Passing Style
Santiago Arranz-Olmos, Gilles Barthe, Lionel Blatter, Xingyu Xie, and Zhiyuan Zhang
(MPI-SP, Germany; IMDEA Software Institute, Spain)
Constant-time (CT) verification tools are commonly used for detecting potential side-channel vulnerabilities in cryptographic libraries. Recently, a new class of tools, called speculative constant-time (SCT) tools, has also been used for detecting potential Spectre vulnerabilities. In many cases, these SCT tools have emerged as liftings of CT tools. However, these liftings are seldom defined precisely and are almost never analyzed formally. The goal of this paper is to address this gap, by developing formal foundations for these liftings, and to demonstrate that these foundations can yield practical benefits.
Concretely, we introduce a program transformation, coined Speculation-Passing Style (SPS), for reducing SCT verification to CT verification. Essentially, the transformation instruments the program with a new input that corresponds to attacker-controlled predictions and modifies the program to follow them. This approach is sound and complete, in the sense that a program is SCT if and only if its SPS transform is CT. Thus, we can leverage existing CT verification tools to prove SCT; we illustrate this by combining SPS with three standard methodologies for CT verification, namely reducing it to noninterference, assertion safety, and dynamic taint analysis. We realize these combinations with three existing tools, EasyCrypt, Binsec/Rel, and CTGrind, and we evaluate them on Kocher’s benchmarks for Spectre-v1. Our results focus on Spectre-v1 in the standard CT leakage model; however, we also discuss applications of our method to other variants of Spectre and other leakage models.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p130-p doi:10.1145/3798222
Static Factorisation of Probabilistic Programs with User-Labelled Sample Statements and While Loops
Markus Böck and Jürgen Cito
(TU Wien, Austria)
It is commonly known that any Bayesian network can be implemented as a probabilistic program, but the reverse direction is not so clear. In this work, we address the open question to what extent a probabilistic program with user-labelled sample statements and while loops - features found in languages like Gen, Turing, and Pyro - can be represented graphically. To this end, we extend existing operational semantics to support these language features. By translating a program to its control-flow graph, we define a sound static analysis that approximates the dependency structure of the random variables in the program. As a result, we obtain a static factorisation of the implicitly defined program density, which is equivalent to the known Bayesian network factorisation for programs without loops and constant labels, but constitutes a novel graphical representation for programs that define an unbounded number of random variables via loops or dynamic labels. We further develop a sound program slicing technique to leverage this structure to statically enable three well-known optimisations for the considered program class: we reduce the variance of gradient estimates in variational inference and we speed up both single-site Metropolis Hastings and sequential Monte Carlo. These optimisations are proven correct and empirically shown to match or outperform existing techniques.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p132-p doi:10.1145/3798223
Mixtris: Mechanised Higher-Order Separation Logic for Mixed Choice Multiparty Message Passing
Jonas Kastberg Hinrichsen, Iwan Quémerais, and Lars Birkedal
(Aalborg University, Denmark; ENS-Lyon, France; Aarhus University, Denmark)
Mixed choice multiparty message passing is an expressive concurrency programming paradigm where components use non-determinism to choose between concurrent options for sending and receiving messages. This flexibility makes it possible to program advanced algorithms, such as leader election protocols, succinctly. We present Mixtris, a mechanised higher-order separation logic for reasoning about functional correctness of higher-order imperative programs with mixed choice multiparty message passing in a shared memory setting. Mixtris builds upon recent work on separation logic for (non-mixed choice) multiparty message-passing programs, by drawing inspiration from session type systems for mixed choice multiparty message-passing programs. Mixtris is the first program logic for mixed choice multiparty message passing. We prove soundness of Mixtris using a novel model of our mixed choice multiparty protocols. We demonstrate how Mixtris can be used to formally reason about challenging examples, including some leader election protocols such as Chang and Roberts’s ring leader election protocol. All the results in the paper (both meta-theory and examples) have been formalised in the Rocq proof assistant on top of the Iris program logic framework.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p134-p doi:10.1145/3798224
Phaedrus: Predicting Dynamic Application Behavior with Lightweight Generative Models and LLMs
Bodhisatwa Chatterjee, Neeraj Jadhav, and Santosh Pande
(Georgia Institute of Technology, USA)
Application profiling is an indispensable technique for many software development tasks, such as code and memory layout optimizations, where optimization decisions are tailored to specific program profiles. Unfortunately, modern application codebases exhibit highly variant behavior across different inputs, creating challenges for conventional profiling approaches that rely on a single representative execution instance. In this paper, we propose Phaedrus, a new compiler-assisted deep learning framework designed to predict dynamic program behavior across varied execution instances, specifically focusing on dynamic function call prediction. These predicted call sequences are subsequently used to guide input-specific compiler optimizations, producing code specialized for each execution instance.
Traditional profile-guided optimization methods struggle with the input-dependent variability of modern applications, where profiling on different inputs yields divergent application behaviors. To address this, Phaedrus proposes two new approaches: Application Profile Synthesis (Dynamis), a profile-less approach where Large Language Models (LLMs) directly infer dynamic functions based on source code & static compiler analysis, bypassing the need for traditional profiling, and Application Profile Generalization (Morpheus), which uses generative models trained on compressed and augmented Whole Program Path (WPP) based function profiles to predict application behavior under unseen inputs. Our experiments show that Phaedrus accurately identifies the most frequently executed and runtime-dominated hotspot functions, accounting for up to 85–99% of total execution time. Leveraging these predictions, Phaedrus enables superior profile-guided optimizations, delivering an average speedup of 6% (up to 25%) and a binary size reduction of 5.19% (up to 19%), without any program execution. In addition, Phaedrus reduces WPP function profile sizes by up to 107×.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p136-p doi:10.1145/3798225
Metamorphic Testing for Infrastructure-as-Code Engines
David Spielmann, George Zakhour, Dominik Arnold, Matteo Biagiola, Roland Meier, and Guido Salvaneschi
(University of St. Gallen, Switzerland; University of Zurich, Switzerland; USI Lugano, Switzerland; armasuisse, Switzerland)
Infrastructure-as-Code (IaC) engines, such as Terraform, OpenTofu, and Pulumi, automate the provisioning and management of cloud resources. They parse IaC specifications and orchestrate the required actions, making them the backbone of modern clouds, and critical to the reliability of both the underlying infrastructure and the software that depends on it. Despite this importance, this class of systems has received little attention: prior work largely targets the correctness of IaC programs rather than the IaC engines themselves. Existing test suites rely on manually written oracles and struggle to expose faults that manifest across multiple executions, leaving a significant reliability gap.
We present EMIaC, a metamorphic testing framework for IaC engines. EMIaC defines metamorphic relations as graph-based transformations of IaC programs and checks invariants across executions of the original and transformed programs. A central novelty is our use of e-graphs in software testing, as both a test-input generator and an equivalence oracle. E-graphs compactly represent program equivalences, enabling the systematic generation of large spaces of equivalent IaC programs. To ground these relations, we analyze 43,593 real-world Terraform programs and show that IaC dependency graphs are typically small and sparse, making e-graphs a natural fit.
Evaluating EMIaC on Pulumi, Terraform, and OpenTofu, we show that it complements existing test suites by exercising engine-critical code paths and covering 98 previously untested statements in Terraform and 1,313 in Pulumi. EMIaC also uncovers previously unknown issues in all three test suites, improving their adequacy. Three test cases have been merged into Terraform's main branch, and Pulumi has merged a specification fix.

Publisher's Version Published Artifact Artifacts Available Article: oopslaa26main-p139-p doi:10.1145/3798226
Class-Dictionary Specialization with Rank-2 Polymorphic Functions
Yong Qi Foo and Michael D. Adams
(National University of Singapore, Singapore)
Rank-2 polymorphism, when combined with type-class-constrained arguments, enables powerful abstractions and code reuse by allowing functions to accept arguments that are themselves ad-hoc polymorphic. Optimizing compilers like the Glasgow Haskell Compiler (GHC) use techniques like class-dictionary specialization and inlining for optimization. However, these techniques can falter when the rank-2 polymorphic functions are recursive. Specializing the polymorphic arguments of recursive rank-2 polymorphic function applications requires inlining the applied function to expose type and dictionary applications to the arguments, but the compiler’s heuristic-driven inliner is reluctant to inline recursive functions, risking non-termination during compilation. This stalemate causes recursive rank-2 polymorphic functions to remain un-optimized and be left with a runtime penalty. Within the Haskell ecosystem, we identify this stalemate within three widely used libraries: the Scrap Your Boilerplate (SYB) and SYB With Class (SYB3) generic-programming libraries, and a fragment of the lens optics library. In these libraries, the combination of rank-2 polymorphism, type-class constraints and recursion is central to their implementation.
In this paper, we present a new optimization technique that breaks this stalemate and enables class-dictionary specialization with recursive rank-2 polymorphic functions. We introduce a partial evaluator that strategically applies the standard transformations of inlining, β-reduction and memoization to applications of rank-2 polymorphic functions that are partially static, i.e., whose arguments have some information known statically. This process exposes applications of its polymorphic arguments onto concrete types and dictionaries, which can be specialized by the standard compiler optimization pipeline. Additionally, we introduce type-constant folding to evaluate run-time type-equality tests statically, further leveraging static type information gained from partial evaluation. We implement our technique as a GHC plugin and demonstrate its effectiveness by resolving the performance bottlenecks in the aforementioned Haskell libraries. On SYB and SYB3 traversals, our technique achieves speedups of 43× on average (up to 155×) and 6.1× on average (up to 9.5×), respectively, matching the performance of their hand-written counterparts. On the fragment of the lens library containing the identified slowdowns, our technique achieves speedups of 1.6× on average (up to 2.1×).

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: oopslaa26main-p141-p doi:10.1145/3798227
Sound and Complete Invariant-Based Heap Encodings
Zafer Esen, Philipp Rümmer, and Tjark Weber
(Uppsala University, Sweden; University of Regensburg, Germany)
Verification of programs operating on heap-allocated data structures, for instance lists or trees, poses significant challenges due to the potentially unbounded size of such data structures.We present time-indexed heap invariants, a novel invariant-based heap encoding leveraging uninterpreted predicates and prophecy variables to reduce verification of heap-manipulating programs to verification of programs over integers only. Our encoding of heap is general and agnostic to specific data structures. To the best of our knowledge, our approach is the first heap invariant-based method that achieves both soundness and completeness. We provide formal proofs establishing the correctness of our encodings. Through an experimental evaluation, we demonstrate that time-indexed heap invariants significantly extend the capability of existing verification tools, allowing automatic verification of programs with heap that were previously out of reach for state-of-the-art tools.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p144-p doi:10.1145/3798228
Scylla: Translating an Applicative Subset of C to Safe Rust
Aymeric Fromherz and Jonathan Protzenko
(Inria, France; Google, USA)
The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C. Automatically translating C to Rust is thus an appealing course of action. Several works have gone down this path, handling an ever-increasing subset of C through a variety of Rust features, such as unsafe. While the prospect of automation is appealing, producing code that relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of porting existing codebases to memory-safe languages. We instead advocate for a different approach, where the programmer iterates on the original C, gradually making the code more structured until it becomes eligible for compilation to safe Rust. This means that redesigns and rewrites can be evaluated incrementally for performance and correctness against existing test suites and production environments.
Compiling structured C to safe Rust relies on the following contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static analysis based on “split trees” which allows expressing C’s pointer arithmetic using Rust’s slices and splitting operations; an analysis that infers which borrows need to be mutable; and a compilation strategy for C pointer types that is compatible with Rust’s distinction between non-owned and owned allocations. We evaluate our approach on real-world cryptographic libraries, binary parsers and serializers, and a file compression library. We show that these can be rewritten to Rust with small refactors of the original C code, and that the resulting Rust code exhibits similar performance characteristics as the original C code. As part of our translation process, we also identify and report undefined behaviors in the bzip2 compression library and in Microsoft’s implementation of the FrodoKEM cryptographic primitive.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p146-p doi:10.1145/3798229
Hybrid Game Control Envelope Synthesis
Aditi Kabra, Jonathan Laurent, Stefan Mitsch, and André Platzer
(Carnegie Mellon University, USA; KIT, Germany; DePaul University, USA)
Control problems for embedded systems like cars and trains can be modeled by two-player hybrid games. Control envelopes, which are families of safe control solutions, correspond to nondeterministic policies that ensure a player following them will not lose. Each deterministic, finite specialization of the nondeterministic policy is a control solution. This paper synthesizes control envelopes for hybrid games that are as permissive as possible. It introduces subvalue maps, a compositional representation of such policies that enables verification and synthesis along the structure of the game. An inductive logical characterization in differential game logic (dGL) checks whether a subvalue map induces a sound control envelope which ensures that the player never loses, no matter what actions the opponent plays. The maximal subvalue map, which allows the most action options while still winning, is shown to exist and satisfy a logical characterization. An inductive subvalue map synthesis framework is obtained from the soundness characterization. An evaluation of the framework uses the significant expressivity of dGL to model and solve a broad range of control challenges.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Article: oopslaa26main-p158-p doi:10.1145/3798230
Hunting CUDA Bugs at Scale with cuFuzz
Mohamed Tarek Ibn Ziad and Christos Kozyrakis
(NVIDIA, USA; Stanford University, USA)
GPUs play an increasingly important role in modern software. However, the heterogeneous host-device execution model and expanding software stacks make GPU programs prone to memory-safety and concurrency bugs that evade static analysis. While fuzz-testing, combined with dynamic error checking tools, offers a plausible solution, it remains underutilized for GPUs. In this work, we identify three main obstacles limiting prior GPU fuzzing efforts: (1) kernel-level fuzzing leading to false positives, (2) lack of device-side coverage-guided feedback, and (3) incompatibility between coverage and sanitization tools. We present cuFuzz, the first CUDA-oriented fuzzer that makes GPU fuzzing practical by addressing these obstacles.
cuFuzz uses whole program fuzzing to avoid false positives from independently fuzzing device-side kernels. It leverages NVBit to instrument device-side instructions and merges the resultant coverage with compiler-based host coverage. Finally, cuFuzz decouples sanitization from coverage collection by executing host- and device-side sanitizers in separate processes. cuFuzz uncovers 43 previously unknown bugs (19 in commercial libraries) across 14 CUDA programs, including illegal memory accesses, uninitialized reads, and data races. cuFuzz achieves significantly more discovered edges and unique inputs compared to baseline approaches, especially on closed-source targets. Moreover, we quantify the execution time overheads of the different cuFuzz components and add persistent-mode support to improve the overall fuzzing throughput. Our results demonstrate that cuFuzz is an effective and deployable addition to the GPU testing toolbox. cuFuzz is publicly available at https://github.com/NVlabs/cuFuzz/.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p167-p doi:10.1145/3798231
Beacon: Detecting Broken Access Control Vulnerabilities in DBMSs via System Catalog Consistency Validation
Zongrui Peng, Jingzhou Fu, Zhiyong Wu, Jie Liang, Xiangdong Huang, Dalong Shi, and Yu Jiang
(Tsinghua University, China; Beihang University, China; Aviation Industry Corporation of China, China)
Access control in DBMSs is critical for ensuring data security and integrity. However, the increasing complexity of its implementation often introduces broken access control (BAC) vulnerabilities. These vulnerabilities can lead to severe consequences, including privilege escalation, unauthorized data access, or even full compromise of the DBMS. Existing manual testing for BAC vulnerabilities is time-consuming and incomplete. Automated methods like static analysis also struggle in DBMSs, as static rules are difficult to apply to multi-level and dynamically changing privileges.
In this paper, we propose Beacon, which detects BAC vulnerabilities by validating the consistency between SQL operations and system catalogs. Our key insight is that the visibility of objects in the system catalogs is consistent with the user's access control: if an object is invisible to a user in the system catalogs, the user should not have any access privileges on that. Any inconsistency suggests that a user is exceeding their privileges, indicating a potential BAC vulnerability. We used Beacon to test eight popular DBMSs (e.g., MySQL and MariaDB), uncovering 39 previously unknown BAC vulnerabilities. Among them, 19 result in privilege escalation, and 20 lead to unauthorized information disclosure.Moreover, 7 of them have existed in DBMSs for more than 6 years, with the longest-persisting one lasting 13 years. DBMS vendors took these issues seriously and have already confirmed all of these vulnerabilities. Many vendors provided positive feedback, recognizing the importance of addressing these vulnerabilities. For instance, OceanBase awarded bounties for reported vulnerabilities, underscoring Beacon's role in improving DBMS access control.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: oopslaa26main-p172-p doi:10.1145/3798232
Efficient Directed Hybrid Fuzzing via Target-Centric Seed Selection and Generation
Zhen Li, Shenghan Liu, Qiuping Yi, Pengbo Du, and Hongliang Liang
(Beijing University of Posts and Telecommunications, China)
Software vulnerabilities pose severe security threats, highlighting the need for effective automated detection. Directed hybrid fuzzing, which combines the rapid exploration of fuzz testing with the precise constraint solving of symbolic execution, has made notable advancements in vulnerability discovery. However, existing directed hybrid fuzzing approaches still face two key challenges: (1) inefficient seed selection, leading to inadequate prioritization of optimal inputs for symbolic execution, and (2) inefficient seed generation, resulting in suboptimal seed generation. To address these issues, we propose TACO-Fuzz, TArget-Centric cOncolic Fuzzing, which introduces a two-phase target-centric seed selection strategy to prioritize under-explored paths and a target-centric seed generation approach based on constructing extended path conditions, thereby improving seed quality. Our evaluation on a selected set of public benchmarks shows that TACO-Fuzz can outperform several representative state-of-the-art directed fuzzing tools, achieving up to an average speedup of nearly 10x in reaching target locations, along with comparable improvements in reproducing real-world vulnerabilities. Moreover, TACO-Fuzz contributed to the discovery of 17 previously unknown vulnerabilities, each assigned a CVE, and demonstrated faster vulnerability discovery and reproduction in most cases.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Article: oopslaa26main-p176-p doi:10.1145/3798233
Efficient Incremental GR(1) Synthesis via Monotonic Fixed-Point Reuse
Sirui Liu, Wei Dong, Yijie Zheng, and Haonan Guo
(National University of Defense Technology, China)
Although reactive synthesis guarantees correct-by-construction implementations, its practical adoption is limited by a performance bottleneck in the iterative design-and-refinement cycle of formal specifications. GR(1) synthesizers perform redundant, from-scratch computations for each specification modification, severely slowing the development workflow. To address this inefficiency, we propose an incremental synthesis method that exploits the monotonicity of the underlying fixed-point computations. By reusing the system winning region from the preceding check, our method significantly accelerates realizability check.
Our work applies an incremental method to iterative GR(1) specification development covering the full GR(1) scope, including system guarantees and environment assumptions, to accelerate realizability checking. Furthermore, we introduce two heuristics for early fixed-point detection during incremental realizability checking. Finally, we analyze why prior work failed to integrate an incremental technique limited to system guarantees into the unrealizable core minimization algorithm DDMin, further establishing iterative development as a suitable application setting for incremental methods.
Evaluated on a large-scale benchmark of 8,282 specifications, our method exhibits a strong positive correlation between performance gain and specification complexity. For the most challenging specifications, which constitute the primary bottlenecks in development, our approach achieves speedups of several orders of magnitude, reducing computation times in some cases from nearly an hour to just seconds.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: oopslaa26main-p187-p doi:10.1145/3798234
VeriEQ: Finding Verilog Simulators and Synthesizers Bugs with Equivalence Circuit Transformation
Zhen Yan, Yuanliang Chen, Fuchen Ma, Zehong Yu, Dalong Shi, and Yu Jiang
(Tsinghua University, China; Aviation Industry Corporation of China, China)
Verilog simulators and synthesizers play a critical role in chip design and verification. However, due to the complexity of simulation and synthesis processes, they are prone to introducing various types of bugs. Among these, Behavioral Deviation Bugs (BDBs) are particularly severe, as they can cause incorrect results by introducing subtle semantic deviations. Such deviations make the chip behave differently from its intended design and may even enable hardware backdoors. In this work, we propose VeriEQ, an automated framework based on the concept of metamorphic testing, to detect BDBs by generating semantically equivalent Verilog programs. First, to increase the likelihood of triggering BDBs, we analyze the structural patterns of historical BDB cases and design a specialized Verilog code template. Second, we generate semantically equivalent variants by applying equivalence-preserving circuit transformation rules. These rules incorporate constraints on bit-width and signedness to ensure logical consistency before and after transformation. Finally, we design an inlined deviation-checking mechanism that embeds multiple equivalent modules within a single testbench, thereby improving testing efficiency. We implement and evaluate VeriEQ on four mainstream Verilog simulators and synthesizers. Experimental results demonstrate that VeriEQ achieves a 138.1% to 4161.9% speedup compared to state-of-the-art tools. In total, VeriEQ successfully detects 33 previously unknown bugs, including 29 BDBs and 4 hang bugs as additional findings. All discovered bugs have been confirmed, with 27 already fixed. In contrast, existing tools are able to detect only 1 to 7 bugs.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: oopslaa26main-p188-p doi:10.1145/3798235
SART: Sign-Absolute Reformulation Theory for Binary Variable Reduction in Neural Network Verification
Jin Xu, Miaomiao Zhang, and Bowen Du
(Tongji University, China)
Complete formal verification of neural networks is crucial for their deployment in safety-critical domains. A key bottleneck stems from encoding complexity: traditional methods assign one binary variable per unstable ReLU neuron. We propose the Sign-Absolute Reformulation Theory (SART), which fundamentally breaks the conventional one-to-one mapping between unstable neurons and binary variables by establishing formal reducibility criteria. This allows for finer-grained modeling, where each unstable neuron corresponds on average to fewer than one binary variable, thereby reducing verification complexity at its source. Based on SART, we derive a theoretical lower bound on the number of binary variables required for complete verification and, under the assumption that PNP, prove that variables in the final layer can be compressed by 50%, while the number of variables in intermediate layers cannot be further reduced. To overcome the apparent “last-layer-only” limitation, we recast verification as a sequential process and, crucially, show that the gain lifts to the entire network: LayerABS, a SART-based progressive tightening verifier, iteratively treats intermediate layers as temporary final layers and propagates tight bounds that shrink the global search space and binary-variable counts. Furthermore, we reveal a structural law influencing verification complexity: when the signs of weights of unstable neurons satisfy numerical symmetry, with positive and negative weights equal or differing by at most one, the worst-case verification complexity achieves the theoretical optimum, offering theoretical guidance for the design of verification-friendly architectures. As a general-purpose underlying encoding, the value of SART is independent of specific algorithms. To comprehensively evaluate its effectiveness, we first evaluate the abstraction-free SART encoding, and then integrate it with abstraction techniques to construct the complete verifier LayerABS and its incomplete variant Incomplete-LayerABS. Across benchmarks, our methods surpass state-of-the-art baselines, validating SART’s practical impact.

Publisher's Version Article: oopslaa26main-p200-p doi:10.1145/3798237
InspectCoder: Dynamic Analysis-Driven Self Repair through Interactive LLM-Debugger Collaboration
Yunkun Wang, Yue Zhang, Guochang Li, Chen Zhi, Binhua Li, Fei Huang, Yongbin Li, and Shuiguang Deng
(Zhejiang University, China; Alibaba, China)
Complex logic errors in LLM-generated code are challenging to diagnose and repair. While existing LLM-based self-repair approaches conduct intensive static semantic analysis or rely on superficial execution logs, they miss the in-depth runtime behaviors that often expose bug root causes—lacking the interactive dynamic analysis capabilities that make human debugging effective. We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control. Our dual-agent framework enables strategic breakpoint placement, targeted state inspection, and incremental runtime experimentation within stateful debugger sessions. Unlike existing methods that follow fixed log collection procedures, InspectCoder adaptively inspects and perturbs relevant intermediate states at runtime, and leverages immediate process rewards from debugger feedback to guide multi-step reasoning, transforming LLM debugging paradigm from blind trial-and-error into systematic root cause diagnosis. We conduct comprehensive experiments on two challenging self-repair benchmarks: BigCodeBench-R and LiveCodeBench-R. InspectCoder achieves 5.10%–60.37% relative improvements in repair accuracy over the strongest baseline, while delivering 1.67x-2.24x superior bug-fix efficiency respectively. We also contribute InspectWare, an open-source middleware that abstracts debugger complexities and maintains stateful debugging sessions across mainstream Python testing frameworks. Our work provides actionable insight into the interactive LLM-debugger systems, demonstrating the significant potential of LLM-driven dynamic analysis for automated software engineering.

Publisher's Version Article: oopslaa26main-p202-p doi:10.1145/3798238
Geo: A Query Rewrite Framework for Graph Pattern Mining
Nazanin Yousefian, Kasra Jamshidi, Keval Vora, and Anders Miltner
(Simon Fraser University, Canada)
Graph pattern mining is important for analyzing graph data. Graph mining systems typically require answering pattern matching queries, which involve solving the NP-complete subgraph isomorphism problem. To address this, domain experts often develop custom pattern matching query optimization strategies based on exploiting substructural similarities across different patterns. While these optimizers can be effective, their development is challenging due to the complex structural properties of the patterns (e.g., subsymmetries), which are difficult to address. This complexity limits the exploration of interactions between different optimization strategies and restricts experts from continuously improving the optimizers—such as by incorporating additional custom or general pattern-based equivalences over time.
In this paper, we present a programmable pattern matching query optimizer called Geo, which automatically manages the interactions between various equivalences, ensures the optimizations maintain correctness of results, and simplifies the management of substructure equivalences. Geo exposes a simple but flexible language for expressing pattern equivalences as rewrite rules. By maintaining canonical representations of generated patterns during equality saturation, Geo avoids issues arising from syntactic differences in isomorphic patterns. Additionally, we develop embedded reconstructablility (EmRec) that tracks provenance across equivalences to ensure various reconstructability needs of desired outputs. Our evaluation demonstrates that Geo can discover novel query equivalences through complex composition of various rewrite rules, enabling our optimized queries to achieve a cost reduction of up to 99% compared to the queries in prior work. We further test Geo’s effectiveness at speeding up practical graph mining problems by using it in two representative case studies – approximate pattern matching and quasi-clique mining, and find it is highly effective at optimizing these tasks, enabling cost reductions of up to 71%.

Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: oopslaa26main-p206-p doi:10.1145/3798239
Lawyer: Modular Obligations-Based Liveness Reasoning in Higher-Order Impredicative Concurrent Separation Logic
Egor Namakonov, Justus Fasse, Bart Jacobs, Lars Birkedal, and Amin Timany
(Aarhus University, Denmark; KU Leuven, Belgium)
Higher-order concurrent separation logics, such as Iris, have been tremendously successful in verifying safety properties of concurrent programs. However, state-of-the-art attempts to verify liveness properties in such logics have so far either lacked modularity (the ability to compose specifications of independent modules), or they have been far too complex to mechanize in a proof assistant. In this work, we introduce Lawyer --- a mechanized program logic for modular verification of (fair) termination of concurrent programs. Lawyer draws inspiration from state-of-the-art approaches that use obligations for specifying and proving termination. However, unlike these approaches, which incorporate obligations by instrumenting the source code with erasable auxiliary code and state, Lawyer avoids such instrumentations. Instead, Lawyer incorporates obligations into the logic by embedding them into a purely logical labeled transition system that the program is shown to refine --- this makes Lawyer far more amenable to mechanization. We demonstrate the expressivity of Lawyer by verifying termination of a range of examples, including modular verification of a client program whose termination relies on correctness of a fair lock library, and (separately) proving that a ticket lock implementation implements that library's interface. To the best of our knowledge, Lawyer is the first mechanized program logic that supports modular higher-order impredicative liveness specifications of program modules. All the results that appear in the paper have been mechanized in the Rocq proof assistant on top of the Iris separation logic framework.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: oopslaa26main-p211-p doi:10.1145/3798240
LARTS: Language Abstractions for Real-Time and Secure Systems
Yanqi Li, Hongliang Liang, Rui Yao, Yang Zhang, Dong Liu, Lei Wang, and