Proceedings of the ACM on Programming Languages, Volume 8, Number OOPSLA2,
October 20–25, 2024,
Pasadena, CA, USA
Frontmatter
Editorial Message
The Proceedings of the ACM series presents the highest-quality research conducted in diverse areas of computer science, as represented by the ACM Special Interest Groups (SIGs).
The Proceedings of the ACM on Programming Languages (PACMPL) focuses on research on all aspects of programming languages, from design to implementation and from mathematical formalisms to empirical studies.
The journal operates in close collaboration with the Special Interest Group on Programming Languages (SIGPLAN) and is committed to making high-quality peer-reviewed scientific research in programming languages free of restrictions on both access and use.
This issue of the PACMPL journal publishes 95 articles that were submitted in response to a call for papers seeking contributions on all practical and theoretical investigations of programming languages, systems, and environments.
Papers may target any stage of software development, including requirements, modelling, prototyping, design, implementation, generation, analysis, verification, testing, evaluation, maintenance, and reuse of software systems. Contributions may include the development of new tools, techniques, principles, and evaluations.
Papers
VarLifter: Recovering Variables and Types from Bytecode of Solidity Smart Contracts
Yichuan Li,
Wei Song, and
Jeff Huang
(Nanjing University of Science and Technology, China; Texas A&M University, USA)
Since funds or tokens in smart contracts are maintained through specific state variables, contract auditing, an effective means of security assurance, focuses particularly on these variables and their related operations. However, many contracts expose only bytecode and no publicly accessible source code, which hinders audit efforts. Recovering variables and their types from Solidity bytecode is thus a critical task in smart contract analysis and auditing, yet it is challenging because the bytecode discards variable and type information, leaving only low-level data manipulated through stack operations and untyped memory/storage accesses. State-of-the-art smart contract decompilers fail to identify many variables and incorrectly infer the types of many of the variables they do identify. To this end, we propose VarLifter, a lifter dedicated to the precise and efficient recovery of typed variables. VarLifter interprets every read or written field of a data region as at least one potential variable and, after discarding falsely identified variables, progressively refines the variable types based on variable behaviors in the form of operation sequences. We evaluate VarLifter on 34,832 real-world Solidity smart contracts. VarLifter attains a precision of 97.48% and a recall of 91.84% for typed variable recovery. Moreover, VarLifter finishes analyzing 77% of smart contracts in around 10 seconds per contract. If VarLifter is used to replace the variable recovery modules of the two state-of-the-art Solidity bytecode decompilers, 52.4% and 74.6% more typed variables are correctly recovered, respectively. Applications of VarLifter to contract decompilation, contract auditing, and contract bytecode fuzzing illustrate that the recovered variable information improves many contract analysis tasks.
Article Search
Artifacts Available
Artifacts Functional
A Dependent Nominal Physical Type System for Static Analysis of Memory in Low Level Code
Julien Simonnet,
Matthieu Lemerre, and
Mihaela Sighireanu
(University Paris-Saclay - CEA - List, France; University Paris-Saclay - ENS Paris-Saclay - CNRS - LMF, France)
We tackle the problem of checking non-proof-carrying code, i.e. automatically proving type-safety (implying in our type system spatial memory safety) of low-level C code or of machine code resulting from its compilation without modification. This requires a precise static analysis that we obtain by having a type system which (i) is expressive enough to encode common low-level idioms, like pointer arithmetic, discriminating variants by bit-stealing on aligned pointers, storing the size and the base address of a buffer in distinct parts of the memory, or records with flexible array members, among others; and (ii) can be embedded in an abstract interpreter. We propose a new type system that meets these criteria. The distinguishing feature of this type system is a nominal organization of contiguous memory regions, which (i) allows nesting, concatenation, union, and sharing parameters between regions; (ii) induces a lattice over sets of addresses from the type definitions; and (iii) permits updates to memory cells that change their type without requiring one to control aliasing. We provide a semantic model for our type system, which enables us to derive sound type checking rules by abstract interpretation, then to integrate these rules as an abstract domain in a standard flow-sensitive static analysis. Our experiments on various challenging benchmarks show that semantic type-checking using this expressive type system generally succeeds in proving type safety and spatial memory safety of C and machine code programs without modification, using only user-provided function prototypes.
Preprint
Info
Artifacts Available
Artifacts Reusable
Object-Oriented Fixpoint Programming with Datalog
David Klopp,
Sebastian Erdweg, and
André Pacak
(JGU Mainz, Germany)
Modern usages of Datalog exceed its original design purpose in scale and complexity.
In particular, Datalog lacks abstractions for code organization and reuse, making programs hard to maintain.
Is it possible to exploit abstractions and design patterns from object-oriented programming (OOP) while retaining a Datalog-like fixpoint semantics?
To answer this question, we design a new OOP language called OODL with common OOP features: dynamic object allocation, object identity, dynamic dispatch, and mutation.
However, OODL has a Datalog-like fixpoint semantics, such that recursive computations iterate until their result becomes stable.
We develop two semantics for OODL: a fixpoint interpreter and a compiler that translates OODL to Datalog.
Although the side effects found in OOP (object allocation and mutation) conflict with Datalog's fixpoint semantics, we can mostly resolve these incompatibilities through extensions of OODL.
Within fixpoint computations, we employ immutable algebraic data structures (e.g. case classes in Scala), rather than relying on object allocation, and we introduce monotonically mutable data types (mono types) to enable a relaxed form of mutation.
Our performance evaluation shows that the interpreter fails to solve fixpoint problems efficiently, whereas the compiled code exploits Datalog's semi-naïve evaluation.
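As a rough illustration of the Datalog-style fixpoint semantics described above (not taken from the paper), the sketch below computes a transitive closure by iterating until the result is stable, first naively and then with the semi-naïve strategy that only joins newly derived facts; the latter is the evaluation style the compiled OODL code is able to exploit.

```python
# Minimal sketch: Datalog-style fixpoint iteration in plain Python.
def transitive_closure_naive(edges):
    reach = set(edges)
    while True:
        new = {(a, c) for (a, b) in reach for (b2, c) in edges if b == b2}
        if new <= reach:              # result is stable: fixed point reached
            return reach
        reach |= new

def transitive_closure_seminaive(edges):
    reach = set(edges)
    delta = set(edges)                # only the facts derived in the last round
    while delta:
        delta = {(a, c) for (a, b) in delta for (b2, c) in edges if b == b2} - reach
        reach |= delta
    return reach

edges = {(1, 2), (2, 3), (3, 4)}
assert transitive_closure_naive(edges) == transitive_closure_seminaive(edges)
```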
Article Search
Intensional Functions
Zachary Palmer,
Nathaniel Wesley Filardo, and
Ke Wu
(Swarthmore College, USA; Microsoft, Canada; Johns Hopkins University, USA)
Functions in functional languages have a single elimination form — application — and cannot be compared, hashed, or subjected to other non-application operations. These operations can be approximated via defunctionalization: functions are replaced with first-order data and calls are replaced with invocations of a dispatch function. Operations such as comparison may then be implemented for these first-order data to approximate e.g. deduplication of continuations in algorithms such as unbounded searches. Unfortunately, this encoding is tedious, imposes a maintenance burden, and obfuscates the affected code. We introduce an alternative in intensional functions, a language feature which supports the definition of non-application operations in terms of a function’s definition site and closure-captured values. First-order data operations may be defined on intensional functions without burdensome code transformation. We give an operational semantics and type system and prove their formal properties. We further define intensional monads, whose Kleisli arrows are intensional functions, enabling monadic values to be similarly subjected to additional operations.
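To make the defunctionalization encoding mentioned above concrete, here is a small Python sketch (illustrative only, with made-up names): closures become first-order records plus a dispatch function, which makes them hashable and comparable, e.g. for deduplicating continuations in a search.

```python
# Minimal sketch of defunctionalization: functions as first-order data + dispatch.
from dataclasses import dataclass

@dataclass(frozen=True)          # frozen => hashable and comparable
class AddK:
    k: int

@dataclass(frozen=True)
class MulK:
    k: int

def apply(f, x):                 # the dispatch function
    if isinstance(f, AddK):
        return x + f.k
    if isinstance(f, MulK):
        return x * f.k
    raise TypeError(f)

# First-order "functions" can now be deduplicated, unlike native closures:
worklist = [AddK(1), AddK(1), MulK(2)]
deduped = set(worklist)          # {AddK(1), MulK(2)}
assert [apply(f, 10) for f in sorted(deduped, key=repr)] == [11, 20]
```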
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
Automating Unrealizability Logic: Hoare-Style Proof Synthesis for Infinite Sets of Programs
Shaan Nagy,
Jinwoo Kim,
Thomas Reps, and
Loris D’Antoni
(University of Wisconsin-Madison, USA; Seoul National University, South Korea; University of California at San Diego, USA)
Automated verification of all members of a (potentially infinite) set of programs has the potential to be useful in program synthesis, as well as in verification of dynamically loaded code, concurrent code, and language properties. Existing techniques for verification of sets of programs are limited in scope and unable to create or use interpretable or reusable information about sets of programs. The consequence is that one cannot learn anything from one verification problem that can be used in another. Unrealizability logic (UL), proposed by Kim et al. as the first Hoare-style proof system to prove properties over sets of programs (defined by a regular tree grammar), presents a theoretical framework that can express and use reusable insight. In particular, UL features nonterminal summaries---inductive facts that characterize recursive nonterminals (analogous to procedure summaries in Hoare logic). In this work, we design the first UL proof synthesis algorithm, implemented as Wuldo. Specifically, we decouple the problem of deciding how to apply UL rules from the problem of synthesizing/checking nonterminal summaries by computing proof structure in a fully syntax-directed fashion. We show that Wuldo, when provided nonterminal summaries, can express and prove verification problems beyond the reach of existing tools, including establishing how infinitely many programs behave on infinitely many inputs. In some cases, Wuldo can even synthesize the necessary nonterminal summaries. Moreover, Wuldo can reuse previously proven nonterminal summaries across verification queries, making verification 1.96 times as fast as when summaries are instead proven from scratch.
Preprint
Artifacts Available
Artifacts Reusable
Results Reproduced
Statistical Testing of Quantum Programs via Fixed-Point Amplitude Amplification
Chan Gu Kang,
Joonghoon Lee, and
Hakjoo Oh
(Korea University, South Korea)
We present a new technique for accelerating quantum program testing. Given a quantum circuit with an input/output specification, our goal is to check whether executing the program on the input state produces the expected output. In quantum computing, however, it is impossible to directly check the equivalence of the two quantum states. Instead, we rely on statistical testing, which involves repeated program executions, state measurements, and subsequent comparisons with the specified output. To guarantee a high level of assurance, however, this method requires an extensive number of measurements. In this paper, we propose a solution to alleviate this challenge by adapting Fixed-Point Amplitude Amplification (FPAA) for quantum program testing. We formally present our technique, demonstrate its ability to reduce the required number of measurements as well as runtime cost without sacrificing the original statistical guarantee, and showcase its runtime effectiveness through case studies.
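As a back-of-the-envelope illustration of why plain statistical testing needs many measurements (this calculation is standard and is not the paper's technique), Hoeffding's inequality bounds the number of program executions needed to estimate a measurement probability within additive error eps at confidence 1 - delta; FPAA-based amplification aims to reduce this measurement count.

```python
# Sketch: sample count required by a Hoeffding bound for statistical testing.
import math

def hoeffding_samples(eps: float, delta: float) -> int:
    # n >= ln(2/delta) / (2 * eps^2) measurements suffice for additive error eps
    # with failure probability at most delta.
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

for eps in (0.05, 0.01):
    print(f"eps={eps}: {hoeffding_samples(eps, delta=1e-3)} measurements")
# eps=0.05 -> 1521 measurements, eps=0.01 -> 38005 measurements
```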
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
A Low-Level Look at A-Normal Form
William J. Bowman
(University of British Columbia, Canada)
A-normal form (ANF) is a widely studied intermediate form in which local control and data flow is made explicit in syntax, and a normal form in which many programs with equivalent control-flow graphs have a single normal syntactic representation. However, ANF is difficult to implement effectively and, as we formalize, difficult to extend with new lexically scoped constructs such as scoped region-based allocation. The problem, as has often been observed, is that normalization of commuting conversions is hard.
This traditional view of ANF, that normalizing commuting conversions is hard, found in formal models and informed by high-level calculi, is wrong. By studying the low-level intensional aspects of ANF, we can derive a normal form in which normalizing commuting conversions is easy, does not require join points, code duplication, or renormalization after inlining, and is easily extended with new lexically scoped effects. We formalize the connection between ANF and monadic form and their intensional properties, derive an imperative ANF, and design a compiler pipeline from an untyped λ-calculus with scoped regions, to monadic form, to a low-level imperative monadic form in which A-normalization is trivial and safe for regions. We prove that any such compiler preserves, or optimizes, stack and memory behaviour compared to ANF. Our formalization reconstructs and systematizes pragmatic choices found in practice, including in current production-ready compilers.
The main take-away from this work is that, in general, monadic form should be preferred over ANF, and A-normalization should only be done in a low-level imperative intermediate form. This maximizes the advantages of each form, and avoids all the standard problems with ANF.
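The following toy A-normalizer (an illustration, not the paper's low-level imperative formulation) shows the basic idea of the intermediate form discussed above: every non-trivial subexpression is bound to a fresh name, making control and data flow explicit in the syntax.

```python
# Minimal sketch of A-normalization for a toy expression language.
counter = 0
def fresh():
    global counter
    counter += 1
    return f"t{counter}"

def normalize(expr, bindings):
    """Return an atom for expr, appending let-bindings to `bindings`."""
    if isinstance(expr, (int, str)):          # atoms stay as they are
        return expr
    op, lhs, rhs = expr                       # e.g. ("add", e1, e2)
    a1 = normalize(lhs, bindings)
    a2 = normalize(rhs, bindings)
    name = fresh()
    bindings.append((name, (op, a1, a2)))
    return name

bindings = []
result = normalize(("add", ("mul", 2, 3), ("mul", 4, 5)), bindings)
# bindings: [('t1', ('mul', 2, 3)), ('t2', ('mul', 4, 5)), ('t3', ('add', 't1', 't2'))]
print(bindings, "in", result)
```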
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Full Iso-Recursive Types
Litao Zhou,
Qianyong Wan, and
Bruno C. d. S. Oliveira
(University of Hong Kong, China)
There are two well-known formulations of recursive types: iso-recursive and equi-recursive types. Abadi and Fiore [LICS 1996] have shown that iso- and equi-recursive types have the same expressive power. However, their encoding of equi-recursive types in terms of iso-recursive types requires explicit coercions. These coercions come with significant additional computational overhead, and complicate reasoning about the equivalence of the two formulations of recursive types. This paper proposes a generalization of iso-recursive types called full iso-recursive types. Full iso-recursive types allow encoding all programs with equi-recursive types without computational overhead. Instead of explicit term coercions, all type transformations are captured by computationally irrelevant casts, which can be erased at runtime without affecting the semantics of the program. Consequently, reasoning about the equivalence between the two approaches can be greatly simplified. We present a calculus called λFiµ, which extends the simply typed lambda calculus (STLC) with full iso-recursive types. The λFiµ calculus is proved to be type sound, and shown to have the same expressive power as a calculus with equi-recursive types. We also extend our results to subtyping, and show that equi-recursive subtyping can be expressed in terms of iso-recursive subtyping with cast operators.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Non-termination Proving at Scale
Azalea Raad,
Julien Vanegue, and
Peter O’Hearn
(Imperial College London, United Kingdom; Bloomberg, USA; University College London, United Kingdom)
Program termination is a classic non-safety property whose falsification cannot in general be witnessed by a finite trace. This makes testing for non-termination challenging, and also a natural target for symbolic proof. Several works in the literature apply non-termination proving to small, self-contained benchmarks, but it has not been developed for large, real-world projects; as such, despite its allure, non-termination proving has had limited practical impact. We develop a compositional theory for non-termination proving, paving the way for its scalable application to large codebases. Discovering non-termination is an under-approximate problem, and we present UNTer, a sound and complete under-approximate logic for proving non-termination. We then extend UNTer with separation logic and develop UNTerSL for heap-manipulating programs, yielding a compositional proof method amenable to automation via under-approximation and bi-abduction. We extend the Pulse analyser from Meta and develop Pulse∞, an automated, compositional prover for non-termination based on UNTerSL. We have run Pulse∞ on large codebases and libraries, each comprising hundreds of thousands of lines of code, including OpenSSL, libxml2, libxpm and CryptoPP; it discovered several previously unknown non-termination bugs, which we have reported to the developers of these libraries.
Preprint
Info
Artifacts Available
Artifacts Reusable
Compiler Support for Sparse Tensor Convolutions
Peiming Liu,
Alexander J Root,
Anlun Xu,
Yinying Li,
Fredrik Kjolstad, and
Aart J.C. Bik
(Google Research, USA; Stanford University, USA; Google, USA)
This paper extends prior work on sparse tensor algebra compilers to generate asymptotically efficient code for tensor expressions with affine subscript expressions. Our technique enables compiler support for a wide range of sparse computations, including sparse convolutions and pooling that are widely used in ML and graphics applications. We propose an approach that gradually rewrites compound subscript expressions to simple subscript expressions with loops that exploit the sparsity pattern of the input sparse tensors. As a result, the time complexity of the generated kernels is bounded by the number of stored elements and not by the shape of the tensors. Our approach seamlessly integrates into existing frameworks and is compatible with recent advances in compilers for sparse computations, including the flexibility to efficiently handle arbitrary combinations of different sparse tensor formats. The implementation of our algorithm is open source and upstreamed to the MLIR sparse compiler. Experimental results show that our method achieves 19.5x speedup when compared with the state-of-the-art compiler-based method at 99.9% sparsity. The generated sparse kernels start to outperform dense convolution implementations at about 80% sparsity.
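As a minimal illustration of the asymptotic claim above (not the MLIR implementation), the sketch below computes a 1-D sparse "convolution" by iterating only over the stored nonzeros of the input and filter, so the work is bounded by the numbers of stored elements rather than by the tensor shapes; the affine subscript i + j is handled directly.

```python
# Sketch: sparse 1-D convolution whose cost is O(nnz(input) * nnz(filter)).
from collections import defaultdict

def sparse_conv1d(x_nz, f_nz):
    """x_nz, f_nz: dicts {index: value} of nonzeros; returns output nonzeros."""
    out = defaultdict(float)
    for i, xv in x_nz.items():        # loop over stored input entries only
        for j, fv in f_nz.items():    # loop over stored filter entries only
            out[i + j] += xv * fv     # affine subscript i + j handled directly
    return dict(out)

x = {0: 1.0, 1000000: 2.0}            # huge logical shape, only two stored values
f = {0: 3.0, 1: -1.0}
print(sparse_conv1d(x, f))            # {0: 3.0, 1: -1.0, 1000000: 6.0, 1000001: -2.0}
```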
Article Search
Artifacts Functional
Iris-MSWasm: Elucidating and Mechanising the Security Invariants of Memory-Safe WebAssembly
Maxime Legoupil,
June Rousseau,
Aïna Linn Georges,
Jean Pichon-Pharabod, and
Lars Birkedal
(Aarhus University, Denmark; MPI-SWS, Germany)
WebAssembly offers coarse-grained encapsulation guarantees via its module system, but does not support fine-grained sharing of its linear memory. MSWasm is a recent proposal which extends WebAssembly with fine-grained memory sharing via handles, a type of capability that guarantees spatial and temporal safety, and thus enables an expressive yet safe style of programming with flexible sharing.
In this paper, we formally validate the pen-and-paper design of MSWasm. To do so, we first define MSWasmCert, a mechanisation of MSWasm that makes it a fully-defined, conservative extension of WebAssembly 1.0, including the module system. We then develop Iris-MSWasm, a foundational reasoning framework for MSWasm composed of a separation logic to reason about known code, and a logical relation to reason about unknown, potentially adversarial code. Iris-MSWasm thereby makes explicit a key aspect of the implicit universal contract of MSWasm: robust capability safety. We apply Iris-MSWasm to reason about key use cases of handles, in which the effect of calling an unknown function is bounded by robust capability safety. Iris-MSWasm thus works as a framework to prove complex security properties of MSWasm programs, and provides a foundation to evaluate the language-level guarantees of MSWasm.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
libLISA: Instruction Discovery and Analysis on x86-64
Jos Craaijo,
Freek Verbeek, and
Binoy Ravindran
(Open Universiteit, Netherlands; Virginia Tech, USA)
Even though heavily researched, a full formal model of the x86-64 instruction set is still not available. We present libLISA, a tool for automated discovery and analysis of the ISA of a CPU. This produces the most extensive formal x86-64 model to date, with over 118000 different instruction groups. The process requires as little human specification as possible: specifically, we do not rely on a human-written (dis)assembler to dictate which instructions are executable on a given CPU, or what their in- and outputs are. The generated model is CPU-specific: behavior that is "undefined" is synthesized for the current machine. Producing models for five different x86-64 machines, we mutually compare them, discover undocumented instructions, and generate instruction sequences that are CPU-specific. Experimental evaluation shows that we enumerate virtually all instructions within scope, that the instructions' semantics are correct w.r.t. existing work, and that we improve existing work by exposing bugs in their handwritten models.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Synthesizing Formal Semantics from Executable Interpreters
Jiangyi Liu,
Charlie Murphy,
Anvay Grover,
Keith J.C. Johnson,
Thomas Reps, and
Loris D’Antoni
(University of Wisconsin-Madison, USA; University of California at San Diego, USA)
Program verification and synthesis frameworks that allow one to customize the language in which one is interested typically require the user to provide a formally defined semantics for the language.
Because writing a formal semantics can be a daunting and error-prone task, this requirement stands in the way of such frameworks being adopted by non-expert users.
We present an algorithm that can automatically synthesize inductively defined syntax-directed semantics when given (i) a grammar describing the syntax of a language and (ii) an executable (closed-box) interpreter for computing the semantics of programs in the language of the grammar.
Our algorithm synthesizes the semantics in the form of Constrained-Horn Clauses (CHCs), a natural, extensible, and formal logical framework for specifying inductively defined relations that has recently received widespread adoption in program verification and synthesis.
The key innovation of our synthesis algorithm is a Counterexample-Guided Synthesis (CEGIS) approach that breaks the hard problem of synthesizing a set of constrained Horn clauses into small, tractable expression-synthesis problems that can be dispatched to existing SyGuS synthesizers.
Our tool Synantic synthesized inductively-defined formal semantics from 14 interpreters for languages used in program-synthesis applications.
When synthesizing formal semantics for one of our benchmarks, Synantic unveiled an inconsistency in the semantics computed by the interpreter for a language of regular expressions; fixing the inconsistency resulted in a more efficient semantics and, for some cases, in a 1.2x speedup for a synthesizer solving synthesis problems over such a language.
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
A Modal Type Theory of Expected Cost in Higher-Order Probabilistic Programs
Vineet Rajani,
Gilles Barthe, and
Deepak Garg
(University of Kent, United Kingdom; MPI-SP, Germany; IMDEA Software Institute, Spain; MPI-SWS, Germany)
The design of online learning algorithms typically aims to optimise the incurred loss or cost, e.g., the number of classification mistakes made by the algorithm. The goal of this paper is to build a type-theoretic framework to prove that a certain algorithm achieves its stated bound on the cost. Online learning algorithms often rely on randomness, their loss functions are often defined as expectations, precise bounds are often non-polynomial (e.g., logarithmic) and proofs of optimality often rely on potential-based arguments. Accordingly, we present pλ-amor, a type-theoretic graded modal framework for analysing (expected) costs of higher-order probabilistic programs with recursion. pλ-amor is an effect-based framework which uses graded modal types to represent potentials, cost and probability at the type level. It extends prior work (λ-amor) on cost analysis for deterministic programs. We prove pλ-amor sound relative to a Kripke step-indexed model which relates potentials with probabilistic coupling. We use pλ-amor to prove cost bounds of several examples from the online machine learning literature. Finally, we describe an extension of pλ-amor with a graded comonad and describe the relationship between the different modalities.
Article Search
Sensitivity by Parametricity
Elisabet Lobo-Vesga,
Alejandro Russo,
Marco Gaboardi, and
Carlos Tomé Cortiñas
(DPella, Sweden; Chalmers University of Technology, Sweden; Boston University, USA)
The work of Fuzz has pioneered the use of functional programming languages where types allow reasoning about the sensitivity of programs. Fuzz and subsequent work (e.g., DFuzz and Duet) use advanced technical devices like linear types, modal types, and partial evaluation. These features usually require the design of a new programming language from scratch—a significant task on its own! While these features are part of the classical toolbox of programming languages, they are often unfamiliar to non-experts in this field. Fortunately, recent studies (e.g., Solo) have shown that linear types, and complex types in general, are not strictly needed for the task of determining programs’ sensitivity, since this can be achieved by annotating base types with static sensitivity information. In this work, we take a different approach. We propose to enrich base types with information about the metric relation between values, and we present the novel idea of applying parametricity to derive direct proofs for the sensitivity of functions. A direct consequence of our result is that calculating and proving the sensitivity of functions is reduced to simply type-checking in a programming language with support for polymorphism and type-level naturals. We formalize our main result in a calculus, prove its soundness, and implement a software library in the programming language Haskell, where we reason about the sensitivity of canonical examples. We show that the simplicity of our approach allows us to exploit the type inference of the host language to support a limited form of sensitivity inference. Furthermore, we extend the language with a privacy monad to showcase how our library can be used in practical scenarios such as the implementation of differentially private programs, where the privacy guarantees depend on the sensitivity of user-defined functions. Our library, called Spar, is implemented in less than 500 lines of code.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Mix Testing: Specifying and Testing ABI Compatibility of C/C++ Atomics Implementations
Luke Geeson,
James Brotherston,
Wilco Dijkstra,
Alastair F. Donaldson,
Lee Smith,
Tyler Sorensen, and
John Wickerson
(University College London, United Kingdom; Arm, United Kingdom; Imperial College London, United Kingdom; University of California at Santa Cruz, USA)
The correctness of complex software depends on the correctness of both the source code and the compilers that generate corresponding binary code. Compilers must do more than preserve the semantics of a single source file: they must ensure that generated binaries can be composed with other binaries to form a final executable. The compatibility of composition is ensured using an Application Binary Interface (ABI), which specifies details of calling conventions, exception handling, and so on. Unfortunately, there are no official ABIs for concurrent programs, so different atomics mappings, although correct in isolation, may induce bugs when composed. Indeed, today, mixing binaries generated by different compilers can lead to an erroneous resulting binary. We present mix testing: a new technique designed to find compiler bugs when the instructions of a C/C++ test are separately compiled for multiple compatible architectures and then mixed together. We define a class of compiler bugs, coined mixing bugs, that arise when parts of a program are compiled separately using different mappings from C/C++ atomic operations to assembly sequences. To demonstrate the generality of mix testing, we have designed and implemented a tool, atomic-mixer, which we have used: (a) to reproduce one existing non-mixing bug that state-of-the-art concurrency testing tools are limited to being able to find (showing that atomic-mixer at least meets the capabilities of these tools), and (b) to find four previously-unknown mixing bugs in LLVM and GCC, and one prospective mixing bug in mappings proposed for the Java Virtual Machine. Lastly, we have worked with engineers at Arm to specify, for the first time, an atomics ABI for Armv8, and have used atomic-mixer to validate the LLVM and GCC compilers against it.
Article Search
Video
Info
Artifacts Available
Artifacts Functional
Statically Contextualizing Large Language Models with Typed Holes
Andrew Blinn,
Xiang Li,
June Hyung Kim, and
Cyrus Omar
(University of Michigan, USA)
Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate code context, particularly when working with definitions that are neither in the training data nor near the cursor. This paper demonstrates that tighter integration with the type and binding structure of the programming language in use, as exposed by its language server, can help address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server is able to identify the type and typing context of the hole that the programmer is filling, with Hazel's total syntax and type error correction ensuring that a meaningful program sketch is available whenever the developer requests a completion. This allows the system to prompt the LLM with codebase-wide contextual information that is not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server, which provides error localization and error messages. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications with accompanying unit tests that have been written from scratch to avoid data contamination, and that can easily be ported to new languages because they do not have large external library dependencies. These applications serve as challenge problems due to their extensive reliance on application-specific data structures. Through an ablation study, we examine the impact of contextualization with type definitions, function headers, and error messages, individually and in combination. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel, a low-resource language, we duplicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource mainstream languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring
Adhitha Dias,
Logan Anderson,
Kirshanthan Sundararajah,
Artem Pelenitsyn, and
Milind Kulkarni
(Purdue University, USA; Virginia Tech, USA)
Automated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop restructuring transformations to reduce asymptotic time complexity and memory footprint.
Furthermore, we present an auto-scheduler that uses a partially ordered set (poset)-based cost model, combining time and auxiliary memory complexities, to prune the search space of schedules. In addition, we highlight the use of Satisfiability Modulo Theories (SMT) solvers in sparse auto-schedulers to narrow the Pareto frontier of better schedules down to the smallest possible number of schedules, with user-defined constraints available at compile-time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that the schedules provided by the auto-scheduler achieve orders-of-magnitude speedups compared to the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.
Article Search
Artifacts Available
Quantum Probabilistic Model Checking for Time-Bounded Properties
Seungmin Jeon,
Kyeongmin Cho,
Changu Kang,
Janggun Lee,
Hakjoo Oh, and
Jeehoon Kang
(KAIST, South Korea; Korea University, South Korea)
Probabilistic model checking (PMC) is a verification technique for analyzing the properties of probabilistic systems. However, existing techniques face challenges in verifying large systems with high accuracy. PMC struggles with state explosion, where the number of states grows exponentially with the size of the system, making large system verification infeasible. While statistical model checking (SMC) avoids PMC’s state explosion problem by using a simulation approach, it suffers from runtime explosion, requiring numerous samples for high accuracy.
To address these limitations in verifying large systems with high accuracy, we present quantum probabilistic model checking (QPMC), the first method leveraging quantum computing for PMC with respect to time-bounded properties. QPMC addresses state explosion by encoding PMC problems into quantum circuits that superpose states within qubits. Additionally, QPMC resolves runtime explosion through Quantum Amplitude Estimation, efficiently estimating the probabilities of specified properties. We prove that QPMC correctly solves PMC problems and achieves a quadratic speedup in time complexity compared to SMC.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Crabtree: Rust API Test Synthesis Guided by Coverage and Type
Yoshiki Takashima,
Chanhee Cho,
Ruben Martins,
Limin Jia, and
Corina S. Păsăreanu
(Carnegie Mellon University, USA)
Rust's type system constrains pointer operations, preventing bugs such as use-after-free. However, these constraints may be too strict for programming tasks such as implementing cyclic data structures. For such tasks, programmers can temporarily suspend checks using the unsafe keyword. Rust libraries wrap unsafe code blocks and expose higher-level APIs. They need to be extensively tested to uncover memory-safety bugs that can only be triggered by unexpected API call sequences or inputs. While prior works have attempted to automatically test Rust library APIs, they fail to test APIs with common Rust features, such as polymorphism, traits, and higher-order functions, or they have scalability issues and can only generate tests for a small number of combined APIs. We propose Crabtree, a testing tool for Rust library APIs that can automatically synthesize test cases with native support for Rust traits and higher-order functions. Our tool improves upon the test synthesis algorithms of prior works by combining synthesis and fuzzing through a coverage- and type-guided search algorithm that intelligently grows test programs and the input corpus towards testing more code. To the best of our knowledge, our tool is the first to generate well-typed tests for libraries that make use of higher-order trait functions. Evaluation of Crabtree on 30 libraries found four previously unreported memory-safety bugs, all of which were accepted by the respective authors.
Article Search
Artifacts Available
Merging Gradual Typing
Wenjia Ye,
Bruno C. d. S. Oliveira, and
Matías Toro
(University of Hong Kong, China; University of Chile, Chile)
Programming language mechanisms with a type-directed semantics are nowadays common and widely used. Such mechanisms include gradual typing, type classes, implicits and intersection types with a merge operator. While sharing common challenges in their design and having complementary strengths, type-directed mechanisms have been mostly independently studied.
This paper studies a new calculus, called λM⋆, which combines two type-directed mechanisms: gradual typing and a merge operator based on intersection types. Gradual typing enables a smooth transition between dynamically and statically typed code, and is available in languages such as TypeScript or Flow. The merge operator generalizes record concatenation to allow merges of values of any two types. Recent work has shown that the merge operator enables modelling expressive OOP features like first-class traits/classes and dynamic inheritance with static type-checking. These features are not found in mainstream statically typed OOP languages, but they can be found in dynamically or gradually typed languages such as JavaScript or TypeScript. In λM⋆, by exploiting the complementary strengths of gradual typing and the merge operator, we obtain a foundation for modelling gradually typed languages with both first-class classes and dynamic inheritance. We study a static variant of λM⋆ (called λM); prove the type-soundness of λM⋆; show that λM⋆ can encode gradual rows and all well-typed terms in the GTFL≲ calculus; and show that λM⋆ satisfies gradual typing criteria. The dynamic gradual guarantee (DGG) is challenging due to the possibility of ambiguity errors. We establish a variant of the DGG using a semantic notion of precision based on a step-indexed logical relation.
Article Search
Artifacts Available
Artifacts Functional
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs
Federico Cassano,
John Gouwar,
Francesca Lucchetti,
Claire Schlesinger,
Anders Freeman,
Carolyn Jane Anderson,
Molly Q Feldman,
Michael Greenberg,
Abhinav Jangda, and
Arjun Guha
(Northeastern University, USA; Wellesley College, USA; Oberlin College, USA; Stevens Institute of Technology, USA; Microsoft Research, USA; Roblox, USA)
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others).
This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter out obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language. Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done.
Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural language to code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
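The pipeline described above is easy to picture as a filtering loop. The Python sketch below is schematic only: the four callables stand in for the Code-LLM test synthesizer, the Code-LLM translator, the lightweight test-case compiler, and a test runner, and are not part of any real API; only the generate-translate-validate structure follows the paper's three steps.

```python
# Schematic sketch of a MultiPL-T-style corpus builder; the helper callables are
# hypothetical stand-ins, not a real API.
def build_corpus(source_items, target_lang,
                 synthesize_tests, translate, compile_tests, run_tests):
    corpus = []
    for item in source_items:
        # Step 1: synthesize unit tests for the source-language item; assume the
        # synthesizer already filters out failing and low-coverage tests.
        tests = synthesize_tests(item)
        if not tests:
            continue
        # Step 2: translate the code to the low-resource target language; many
        # candidate translations may be wrong at this stage.
        candidates = translate(item, target_lang)
        # Step 3: compile the kept tests to the target language and use them to
        # reject wrong translations; only validated items enter the corpus.
        target_tests = compile_tests(tests, target_lang)
        corpus.extend(c for c in candidates if run_tests(c, target_tests))
    return corpus
```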
Preprint
Info
Artifacts Available
Artifacts Functional
Results Reproduced
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models
Chenyuan Yang,
Yinlin Deng,
Runyu Lu,
Jiayi Yao,
Jiawei Liu,
Reyhaneh Jabbarvand, and
Lingming Zhang
(University of Illinois at Urbana-Champaign, USA; Huazhong University of Science and Technology, China; Chinese University of Hong Kong, Shenzhen, China)
Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences across the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing approaches focus on black- and grey-box fuzzing, which generates test programs without sufficient understanding of internal compiler behaviors. As such, they often fail to construct test programs to exercise intricate optimizations. Meanwhile, traditional white-box techniques, such as symbolic execution, are computationally inapplicable to the enormous codebases of compiler systems. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have even achieved state-of-the-art performance in black-box fuzzing. Nonetheless, guiding LLMs with compiler source-code information remains a missing piece of research in compiler testing.
To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization, with a spotlight on detecting deep logic bugs in the emerging deep learning (DL) compilers. WhiteFox adopts a multi-agent framework: (i) an LLM-based analysis agent examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) an LLM-based generation agent produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are also used as feedback to further enhance the test generation prompt on the fly. Our evaluation on the three most popular DL compilers (i.e., PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite) shows that WhiteFox can generate high-quality test programs to exercise deep optimizations requiring intricate conditions, exercising up to 8 times more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found a total of 101 bugs for the compilers under test, with 92 confirmed as previously unknown and 70 already fixed. Notably, WhiteFox has been recently acknowledged by the PyTorch team, and is in the process of being incorporated into its development workflow. Finally, beyond DL compilers, WhiteFox can also be adapted for compilers in different domains, such as LLVM, where WhiteFox has already found multiple bugs.
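The multi-agent loop described above can be sketched as follows. This is schematic only: analyze_optimization and generate_tests stand in for the LLM-based analysis and generation agents, and compile_and_check stands in for running the compiler under test; none of these names come from WhiteFox itself.

```python
# Schematic sketch of a two-agent, feedback-driven fuzzing loop; all callables
# are hypothetical placeholders for the paper's components.
def whitefox_loop(opt_source, analyze_optimization, generate_tests,
                  compile_and_check, rounds=5, batch=10):
    requirements = analyze_optimization(opt_source)   # agent (i): source -> trigger requirements
    triggering_examples = []                          # feedback for the generation prompt
    suspected_bugs = []
    for _ in range(rounds):
        for program in generate_tests(requirements, triggering_examples, batch):
            triggered, correct = compile_and_check(program)
            if triggered:
                triggering_examples.append(program)   # enhance the prompt on the fly
            if not correct:
                suspected_bugs.append(program)        # candidate miscompilation
    return suspected_bugs
```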
Article Search
HiPy: Extracting High-Level Semantics from Python Code for Data Processing
Michael Jungmair,
Alexis Engelke, and
Jana Giceva
(TU Munich, Germany)
Data science workloads frequently include Python code, but Python's dynamic nature makes efficient execution hard. Traditional approaches either treat Python as a black box, missing out on optimization potential, or are limited to a narrow domain. However, a deep and efficient integration of user-defined Python code into data processing systems requires extracting the semantics of the entire Python code.
In this paper, we propose a novel approach for extracting the high-level semantics by transforming general Python functions into program generators that generate a statically-typed IR when executed.
The extracted IR then allows for high-level, domain-specific optimizations and the generation of efficient C++ code. With our prototype implementation, HiPy, we achieve single-threaded speedups of 2-20x for many workloads. Furthermore, HiPy is also capable of accelerating Python code in other domains like numerical data, where it can sometimes even outperform specialized compilers.
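The "program generator" idea described above can be illustrated with a few lines of Python (this staging trick is only illustrative; HiPy's IR, typing, and C++ backend are far richer): the user function is executed on tracer objects whose overloaded operators record a small typed IR instead of computing values.

```python
# Minimal sketch of staging a Python function into a tiny typed IR via tracing.
class Tracer:
    def __init__(self, ir, name, ty="i64"):
        self.ir, self.name, self.ty = ir, name, ty
    def _emit(self, op, other):
        res = f"v{len(self.ir)}"
        rhs = other.name if isinstance(other, Tracer) else repr(other)
        self.ir.append((res, op, self.name, rhs, self.ty))
        return Tracer(self.ir, res, self.ty)
    def __add__(self, other): return self._emit("add", other)
    def __mul__(self, other): return self._emit("mul", other)

def stage(fn, nargs):
    ir = []
    args = [Tracer(ir, f"arg{i}") for i in range(nargs)]
    out = fn(*args)
    return ir, out.name

def user_fn(x, y):          # ordinary-looking Python code
    return x * 2 + y

print(stage(user_fn, 2))
# ([('v0', 'mul', 'arg0', '2', 'i64'), ('v1', 'add', 'v0', 'arg1', 'i64')], 'v1')
```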
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
Automatically Reducing Privilege for Access Control Policies
Loris D’Antoni,
Shuo Ding,
Amit Goel,
Mathangi Ramesh,
Neha Rungta, and
Chungha Sung
(Amazon Web Services, USA; Georgia Institute of Technology, USA)
Access control policies are programs used to secure cloud resources. These policies should grant only the permissions that a given application needs. However, it is challenging to write and maintain policies as applications and their required permissions change over time.
In this paper, we focus on the Amazon Web Services (AWS) IAM policy language and present an approach that, given a policy, synthesizes a modified policy that is more restrictive and better abides by the principle of least privilege. Our approach looks at the actual access history (e.g., access logs) of an application and computes the least permissive local modification of the user-given policy that still provides the same permissions that were observed in the access history. We treat the problem of computing the least permissive policy as a generalization problem in a lattice of possible policies (i.e., the set of local modifications). We show that our synthesis algorithm comes with correctness guarantees and is amenable to an efficient implementation that is easy to parallelize. We implement our algorithm in a tool, IAM-PolicyRefiner, and evaluate it on policies attached to AWS roles with access logs. For each role, IAM-PolicyRefiner computes easy-to-inspect refined policies in less than one minute, and the refined policies do not overfit to the requests in the log, i.e., they also allow the requests in a left-out test set.
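As a toy illustration of log-guided refinement (over a drastically simplified "policy" of action patterns, not the real AWS IAM language), the sketch below narrows each wildcard statement to just the observed actions it matches and drops statements that the log never exercises.

```python
# Toy sketch of narrowing a wildcard policy using an access log.
from fnmatch import fnmatch

def refine(policy_patterns, observed_actions):
    refined = set()
    for pattern in policy_patterns:
        matched = {a for a in observed_actions if fnmatch(a, pattern)}
        if not matched:
            continue                      # statement never exercised: drop it
        if matched == {pattern}:
            refined.add(pattern)          # already least permissive
        else:
            refined |= matched            # narrow the wildcard to observed actions
    return refined

policy = {"s3:*", "ec2:StartInstances"}
log = {"s3:GetObject", "s3:PutObject"}
print(refine(policy, log))               # {'s3:GetObject', 's3:PutObject'}
```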
Article Search
Fully Verified Instruction Scheduling
Ziteng Yang,
Jun Shirako, and
Vivek Sarkar
(Georgia Institute of Technology, USA)
The CompCert project, the first end-to-end formally verified C compiler, does not support fully verified instruction scheduling. Instead, existing research on this topic implements only translation validation: there is no direct formal proof that the scheduling algorithm is correct, only a posterior check of each compilation run. With this approach, CompCert compiles a valid C program correctly only when the untrusted scheduler happens to produce a correct result; the complete correctness of the scheduler itself is not guaranteed, and the validation incurs additional compile-time overhead.
In this work, we present the first mechanized library for fully verified instruction scheduling that keeps the proof workload acceptably lightweight. The idea for reducing the proof burden is to exploit a simple property: any topological reordering of a topologically sorted list can be obtained as a sequence of swaps of adjacent unordered elements. Together with the transitivity of the semantic simulation relation, the only remaining burden is to prove semantic preservation for a transition that swaps two adjacent independent instructions inside one block. With this result in place, proving the correctness of a new instruction scheduling algorithm only requires showing that it preserves the syntax-level dependences among instructions, instead of reasoning about semantic details every time. We implemented a mechanized library of these methods in the Coq proof assistant, using CompCert's library as a framework, and used the list scheduling algorithm as a case study to show that its correctness can be formally proved using our theory.
We show that, because our method abstracts away the semantic details, any scheduler that reorders instructions can be implemented with little extra proof burden. The scheduler in our case study also abstracts the scheduling heuristic as a universal parameter, so it can be modified without touching any correctness proof.
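The key property exploited above can be illustrated directly: a reordering of a block is a valid topological reordering exactly when every dependent pair of instructions keeps its relative order, which is equivalent to being reachable by swapping adjacent independent instructions. The Python sketch below uses a toy read/write-set dependence check; it is not extracted from the Coq development.

```python
# Sketch: a schedule is valid iff every dependent pair keeps its relative order.
def depends(a, b):
    """a and b are (writes, reads) sets; True if they must not be reordered."""
    wa, ra = a
    wb, rb = b
    return bool(wa & wb or wa & rb or ra & wb)

def valid_schedule(block, perm):
    pos = {idx: p for p, idx in enumerate(perm)}      # original index -> new position
    return all(pos[i] < pos[j]
               for i in range(len(block))
               for j in range(i + 1, len(block))
               if depends(block[i], block[j]))

#           writes    reads
block = [({"r1"},   set()),        # 0: r1 = ...
         ({"r2"},   set()),        # 1: r2 = ...
         ({"r3"},   {"r1", "r2"})] # 2: r3 = r1 + r2
print(valid_schedule(block, [1, 0, 2]))  # True: 0 and 1 are independent
print(valid_schedule(block, [2, 0, 1]))  # False: 2 reads r1 written by 0
```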
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Quantitative Weakest Hyper Pre: Unifying Correctness and Incorrectness Hyperproperties via Predicate Transformers
Linpeng Zhang,
Noam Zilberstein,
Benjamin Lucien Kaminski, and
Alexandra Silva
(University College London, United Kingdom; Cornell University, USA; Saarland University, Germany)
We present a novel weakest pre calculus for reasoning about quantitative hyperproperties over nondeterministic and probabilistic programs. Whereas existing calculi allow reasoning about the expected value that a quantity assumes after program termination from a single initial state, we do so for initial sets of states or initial probability distributions.
We thus (i) obtain a weakest pre calculus for hyper Hoare logic and (ii) enable reasoning about so-called hyperquantities which include expected values but also quantities (e.g. variance) out of scope of previous work. As a byproduct, we obtain a novel strongest post for weighted programs that extends both existing strongest and strongest liberal post calculi. Our framework reveals novel dualities between forward and backward transformers, correctness and incorrectness, as well as nontermination and unreachability.
Preprint
CoolerSpace: A Language for Physically Correct and Computationally Efficient Color Programming
Ethan Chen,
Jiwon Chang, and
Yuhao Zhu
(University of Rochester, USA)
Color programmers manipulate lights, materials, and the resulting colors from light-material interactions. Existing libraries for color programming provide only a thin layer of abstraction around matrix operations. Color programs are, thus, vulnerable to bugs arising from mathematically permissible but physically meaningless matrix computations. Correct implementations are difficult to write and optimize. We introduce CoolerSpace to facilitate physically correct and computationally efficient color programming. CoolerSpace raises the level of abstraction of color programming by allowing programmers to focus on describing the logic of color physics. Correctness and efficiency are handled by CoolerSpace. The type system in CoolerSpace assigns physical meaning and dimensions to user-defined objects. The typing rules permit only legal computations informed by color physics and perception. Along with type checking, CoolerSpace also generates performance-optimized programs using equality saturation. CoolerSpace is implemented as a Python library and compiles to ONNX, a common intermediate representation for tensor computations. CoolerSpace not only prevents common errors in color programming, but also does so without run-time overhead: even unoptimized CoolerSpace programs out-perform existing Python-based color programming systems by up to 5.7 times; our optimizations provide up to an additional 1.4 times speed-up.
Article Search
Artifacts Available
Plume: Efficient and Complete Black-Box Checking of Weak Isolation Levels
Si Liu,
Long Gu,
Hengfeng Wei, and
David Basin
(ETH Zurich, Switzerland; Nanjing University, China)
Modern databases embrace weak isolation levels to cater for highly available transactions. However, weak isolation bugs have recently manifested in many production databases. This raises the concern of whether database implementations actually deliver their promised isolation guarantees in practice. In this paper we present Plume, the first efficient, complete, black-box checker for weak isolation levels. Plume builds on modular, fine-grained, transactional anomalous patterns, with which we establish sound and complete characterizations of representative weak isolation levels, including read committed, read atomicity, and transactional causal consistency. Plume leverages a novel combination of two techniques, vectors and tree clocks, to accelerate isolation checking. Our extensive assessment shows that Plume can reproduce all known violations in a large collection of anomalous database execution histories, detect new isolation bugs in three production databases along with informative counterexamples, find more weak isolation anomalies than the state-of-the-art checkers, and efficiently validate isolation guarantees under a wide variety of workloads.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Computing Precise Control Interface Specifications
Eric Hayden Campbell,
Hossein Hojjat, and
Nate Foster
(Cornell University, USA; Tehran Institute for Advanced Studies, Iran)
Verifying network programs is challenging because of how they divide labor: the control plane computes high level routes through the network and compiles them to device configurations, while the data plane uses these configurations to realize the desired forwarding behavior. In practice, the correctness of the data plane often assumes that the configurations generated by the control plane will satisfy complex specifications. Consequently, validation tools such as program verifiers, runtime monitors, fuzzers, and test-case generators must be aware of these control interface specifications (ci-specs) to avoid raising false alarms. In this paper, we propose the first algorithm for computing precise ci-specs for network data planes. Our specifications are designed to be efficiently monitorable—concretely, checking that a fixed configuration satisfies a ci-spec can be done in polynomial time. Our algorithm, based on modular program instrumentation, quantifier elimination, and a path-based analysis, is more expressive than prior work, and is applicable to practical network programs. We describe an implementation and show that ci-specs computed by our tool are useful for finding real bugs in real-world data plane programs.
Preprint
Artifacts Available
Artifacts Reusable
Automating Pruning in Top-Down Enumeration for Program Synthesis Problems with Monotonic Semantics
Keith J.C. Johnson,
Rahul Krishnan,
Thomas Reps, and
Loris D’Antoni
(University of Wisconsin-Madison, USA; University of California at San Diego, USA)
In top-down enumeration for program synthesis, abstraction-based pruning uses an abstract domain to approximate the set of possible values that a partial program, when completed, can output on a given input. If the set does not contain the desired output, the partial program and all its possible completions can be pruned. In its general form, abstraction-based pruning requires manually designed, domain-specific abstract domains and semantics, and thus has only been used in domain-specific synthesizers.
This paper provides sufficient conditions under which a form of abstraction-based pruning can be automated for arbitrary synthesis problems in the general-purpose Semantics-Guided Synthesis (SemGuS) framework without requiring manually-defined abstract domains. We show that if the semantics of the language for which we are synthesizing programs exhibits some monotonicity properties, one can obtain an abstract interval-based semantics for free from the concrete semantics of the programming language, and use such semantics to effectively prune the search space. We also identify a condition that ensures such abstract semantics can be used to compute a precise abstraction of the set of values that a program derivable from a given hole in a partial program can produce. These precise abstractions make abstraction-based pruning more effective.
We implement our approach in a tool, Moito, which can tackle synthesis problems defined in the SemGuS framework. Moito can automate interval-based pruning without any a-priori knowledge of the problem domain, and solve synthesis problems that previously required domain-specific, abstraction-based synthesizers— e.g., synthesis of regular expressions, CSV file schema, and imperative programs from examples.
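As a small illustration of the interval semantics obtained "for free" from monotonicity (the grammar, hole bounds, and operators below are invented for the example), a partial program containing a hole is pruned whenever the desired output lies outside the interval of values its completions can produce.

```python
# Sketch: interval-based pruning for top-down enumeration over a monotone toy language.
HOLE = ("hole",)                       # unknown subterm, assumed to range over [0, 10]
HOLE_BOUNDS = (0, 10)

def abstract_eval(expr, x):
    """Interval (lo, hi) of possible values of expr on input x (all values non-negative)."""
    if expr == HOLE:
        return HOLE_BOUNDS
    if expr == "x":
        return (x, x)
    if isinstance(expr, int):
        return (expr, expr)
    op, a, b = expr
    (alo, ahi), (blo, bhi) = abstract_eval(a, x), abstract_eval(b, x)
    if op == "add":                    # monotone in both arguments
        return (alo + blo, ahi + bhi)
    if op == "mul":                    # monotone on non-negative intervals
        return (alo * blo, ahi * bhi)
    raise ValueError(op)

def prune(partial, examples):
    """True if some input/output example cannot be met by any completion of the hole."""
    return any(not (lo <= out <= hi)
               for (x, out) in examples
               for (lo, hi) in [abstract_eval(partial, x)])

examples = [(3, 100)]                          # we want a program mapping 3 to 100
print(prune(("add", "x", HOLE), examples))     # True: completions reach at most 3 + 10 = 13
print(prune(("mul", ("add", "x", HOLE), HOLE), examples))  # False: up to 13 * 10 = 130
```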
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Refinement Type Refutations
Robin Webbers,
Klaus von Gleissenthall, and
Ranjit Jhala
(Vrije Universiteit Amsterdam, Netherlands; University of California at San Diego, USA)
Refinement types combine SMT decidable constraints with a compositional, syntax-directed type system to provide a convenient way to statically and automatically check properties of programs. However, when type checking fails, programmers must rely on cryptic error messages that, at best, point out the code location where a subtyping constraint failed, in order to determine the root cause of the failure. In this paper, we introduce refinement type refutations, a new approach to explaining why refinement type checking fails, which mirrors the compositional way in which refinement type checking is carried out. First, we show how to systematically transform standard bidirectional type checking rules to obtain refutations. Second, we extend the approach to account for global constraint-based refinement inference via the notion of a must-instantiation: a set of concrete inhabitants of the types of subterms that suffice to demonstrate why typing fails. Third, we implement our method in HayStack—an extension to LiquidHaskell which automatically finds type refutations when refinement type checking fails, and helps users understand refutations via an interactive user interface. Finally, we present an empirical evaluation of HayStack using the regression benchmark set of LiquidHaskell, and the benchmark set of G2, a previous method that searches for (non-compositional) counterexample traces by symbolically executing Haskell source. We show that HayStack can find refutations for 99.7% of benchmarks, including those with complex typing constructs (e.g., abstract and bounded refinements, and reflection), and does so an order of magnitude faster than G2.
Article Search
Info
The Ultimate Conditional Syntax
Luyu Cheng and
Lionel Parreaux
(Hong Kong University of Science and Technology, Hong Kong)
Functional programming languages typically support expressive pattern-matching syntax allowing programmers
to write concise and type-safe code, especially appropriate for manipulating algebraic data types. Many
features have been proposed to enhance the expressiveness of stock pattern-matching syntax, such as pattern
bindings, pattern alternatives (a.k.a. disjunction), pattern conjunction, view patterns, pattern guards, pattern
synonyms, active patterns, ‘if-let’ patterns, multi-way if-expressions, etc. In this paper, we propose a new
pattern-matching syntax that is both more expressive and (we argue) simpler and more readable than previous
alternatives. Our syntax supports parallel and nested matches interleaved with computations and intermediate
bindings. This is achieved through a form of nested multi-way if-expressions with a condition-splitting mechanism
to factor common conditional prefixes as well as a binding technique we call conditional pattern flowing.
We motivate this new syntax with many examples in the setting of MLscript, a new ML-family programming
language. We describe a straightforward desugaring pass from our rich source syntax into a minimal core
syntax that only supports flat patterns and has an intuitive small-step semantics. We then provide a translation
from the core syntax into a normalized syntax without backtracking, which is more amenable to coverage
checking and compilation, and formally prove that our translation is semantics-preserving. We view this work
as a step towards rethinking pattern matching to make it more powerful and natural to use. Our syntax can
easily be integrated, in part or in whole, into existing as well as future programming language designs.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
On the Expressive Power of Languages for Static Variability
Paul Maximilian Bittner,
Alexander Schultheiß,
Benjamin Moosherr,
Jeffrey M. Young,
Leopoldo Teixeira,
Eric Walkingshaw,
Parisa Ataei, and
Thomas Thüm
(University of Paderborn, Germany; Ulm University, Germany; University of Bern, Switzerland; University of Ulm, Germany; IOHK, USA; Federal University of Pernambuco, Brazil; Unaffiliated, USA; TU Braunschweig, Germany)
Variability permeates software development to satisfy ever-changing requirements and mass-customization needs. A prime example is the Linux kernel, which employs the C preprocessor to specify a set of related but distinct kernel variants. To study, analyze, and verify variational software, several formal languages have been proposed. For example, the choice calculus has been successfully applied for type checking and symbolic execution of configurable software, while other formalisms have been used for variational model checking and change impact analysis, among other use cases. Yet, these languages have not been formally compared; hence, little is known about their relationships. Crucially, it is unclear to what extent one language subsumes another, how research results from one language can be applied to other languages, and which language is suitable for which purpose or domain. In this paper, we propose a formal framework to compare the expressive power of languages for static (i.e., compile-time) variability. By establishing a common semantic domain that captures a widely used intuition of explicit variability, we can formulate the basic, yet to date neglected, properties of soundness, completeness, and expressiveness for variability languages. We then prove the (un)soundness and (in)completeness of a range of existing languages, and relate their ability to express the same variational systems. We implement our framework as an extensible open-source Agda library in which proofs act as correct compilers between languages or differencing algorithms. We find different levels of expressiveness as well as complete and incomplete languages w.r.t. our unified semantic domain, with the choice calculus being among the most expressive languages.
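For readers unfamiliar with the choice calculus mentioned above, the following toy Python sketch (a simplification, not the paper's Agda formalization) shows the core idea: variation points are binary choices tagged with a dimension, and configuring a variational expression selects one alternative per dimension.

    from dataclasses import dataclass
    from typing import Union

    # A variational expression is either a plain value or a choice
    # D<left, right> between two variational subexpressions.
    @dataclass
    class Chc:
        dim: str
        left: "VExpr"
        right: "VExpr"

    VExpr = Union[str, Chc]

    def configure(e: VExpr, selection: dict) -> str:
        """Resolve every choice according to `selection` (dimension -> bool)."""
        if isinstance(e, Chc):
            branch = e.left if selection[e.dim] else e.right
            return configure(branch, selection)
        return e

    # A tiny variational greeting with two dimensions of variability.
    greeting = Chc("Formal", "Dear", Chc("Casual", "Hi", "Hello"))
    assert configure(greeting, {"Formal": True, "Casual": False}) == "Dear"
    assert configure(greeting, {"Formal": False, "Casual": True}) == "Hi"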
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Programmable MCMC with Soundly Composed Guide Programs
Long Pham,
Di Wang,
Feras A. Saad, and
Jan Hoffmann
(Carnegie Mellon University, USA; Peking University, China)
Probabilistic programming languages (PPLs) provide language support for expressing flexible probabilistic models and solving Bayesian inference problems. PPLs with programmable inference make it possible for users to obtain improved results by customizing inference engines using guide programs that are tailored to a corresponding model program. However, errors in guide programs can compromise the statistical soundness of the inference. This article introduces a novel coroutine-based framework for verifying the correctness of user-written guide programs for a broad class of Markov chain Monte Carlo (MCMC) inference algorithms. Our approach rests on a novel type system for describing communication protocols between a model program and a sequence of guides that each update only a subset of random variables. We prove that, by translating guide types to context-free processes with finite norms, it is possible to check structural type equality between models and guides in polynomial time. This connection gives rise to an efficient type-inference algorithm for probabilistic programs with flexible constructs such as general recursion and branching. We also contribute a coverage-checking algorithm that verifies the support of sequentially composed guide programs agrees with that of the model program, which is a key soundness condition for MCMC inference with multiple guides. Evaluations on diverse benchmarks show that our type-inference and coverage-checking algorithms efficiently infer types and detect sound and unsound guides for programs that existing static analyses cannot handle.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Extending the C/C++ Memory Model with Inline Assembly
Paulo Emílio de Vilhena,
Ori Lahav,
Viktor Vafeiadis, and
Azalea Raad
(Imperial College London, United Kingdom; Tel Aviv University, Israel; MPI-SWS, Germany)
Programs written in C/C++ often include inline assembly: a snippet of architecture-specific assembly code used to access low-level functionalities that are impossible or expensive to simulate in the source language. Although inline assembly is widely used, its semantics has not yet been formally studied.
In this paper, we overcome this deficiency by investigating the effect of inline assembly on the consistency semantics of C/C++ programs. We propose the first memory model of the C++ Programming Language with support for inline assembly for Intel's x86 including non-temporal stores and store fences. We argue that previous provably correct compiler optimizations and correct compiler mappings should remain correct under such an extended model and we prove that this requirement is met by our proposed model.
Article Search
Effects and Coeffects in Call-by-Push-Value
Cassia Torczon,
Emmanuel Suárez Acevedo,
Shubh Agrawal,
Joey Velez-Ginorio, and
Stephanie Weirich
(University of Pennsylvania, USA; University of Michigan, USA)
Effect and coeffect tracking integrate many types of compile-time
analysis, such as cost, liveness, or dataflow, directly into a language's type
system. In this paper, we investigate the addition of effect and coeffect
tracking to the type system of call-by-push-value (CBPV), a computational
model useful in compilation for its isolation of effects and for its ability
to cleanly express both call-by-name and call-by-value computations. Our
main result is effect-and-coeffect soundness, which asserts that the
type system accurately bounds the effects that the program may trigger
during execution and accurately tracks the demands that the program may make
on its environment. This result holds for two different dynamic semantics: a
generic one that can be adapted for different coeffects and one that is
adapted for reasoning about resource usage. In particular, the second
semantics discards the evaluation of unused values and pure computations
while ensuring that effectful computations are always evaluated, even if
their results are not required. Our results have been mechanized using
the Coq proof assistant.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Gradient: Gradual Compartmentalization via Object Capabilities Tracked in Types
Aleksander Boruch-Gruszecki,
Adrien Ghosn,
Mathias Payer, and
Clément Pit-Claudel
(Charles University, Czechia; Microsoft Research, United Kingdom; EPFL, Switzerland)
Modern software needs fine-grained compartmentalization, i.e., intra-process isolation.
A particularly important reason for this is supply-chain attacks,
a threat aggravated by modern applications depending on hundreds or even thousands of libraries.
Object capabilities are a particularly salient approach to compartmentalization,
but they require the entire program to assume a lack of ambient authority.
Most existing code was written under no such assumption;
effectively, existing applications need to undergo a rewrite-the-world migration to reap the advantages of ocap.
We propose gradual compartmentalization,
an approach which allows gradually migrating an application to object capabilities,
component by component in arbitrary order,
all the while continuously enjoying security guarantees.
The approach relies on runtime authority enforcement
and tracking the authority of objects in the type system.
We present Gradient, a proof-of-concept gradual compartmentalization extension to Scala
which uses Enclosures and Capture Tracking as its key components.
We evaluate our proposal by migrating the standard XML library of Scala to Gradient.
Article Search
Compilation of Shape Operators on Sparse Arrays
Alexander J Root,
Bobby Yan,
Peiming Liu,
Christophe Gyurgyik,
Aart J.C. Bik, and
Fredrik Kjolstad
(Stanford University, USA; Google Research, USA)
We show how to build a compiler for a sparse array language that supports shape operators such as reshaping or concatenating arrays, in addition to compute operators. Existing sparse array programming systems implement generic shape operators for only some sparse data structures, reduce shape operators on other data structures to those, and do not support fusion. Our system compiles sparse array expressions to code that efficiently iterates over reshaped views of irregular sparse data structures, without needing to materialize temporary storage for intermediates. Our evaluation shows that our approach generates sparse array code competitive with popular sparse array libraries: our generated shape operators achieve geometric mean speed-ups of 1.66×–15.3× when compared to hand-written kernels in scipy.sparse and 1.67×–651× when compared to generic implementations in pydata/sparse. For operators that require data structure conversions in these libraries, our generated code achieves geometric mean speed-ups of 7.29×–13.0× when compared to scipy.sparse and 21.3×–511× when compared to pydata/sparse. Finally, our evaluation demonstrates that fusing shape and compute operators improves the performance of several expressions by geometric mean speed-ups of 1.22×–2.23×.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Tachis: Higher-Order Separation Logic with Credits for Expected Costs
Philipp G. Haselwarter,
Kwing Hei Li,
Markus de Medeiros,
Simon Oddershede Gregersen,
Alejandro Aguirre,
Joseph Tassarotti, and
Lars Birkedal
(Aarhus University, Denmark; New York University, USA)
We present Tachis, a higher-order separation logic to reason about the expected cost of probabilistic programs. Inspired by the uses of time credits for reasoning about the running time of deterministic programs, we introduce a novel notion of probabilistic cost credit. Probabilistic cost credits are a separation logic resource that can be used to pay for the cost of operations in programs, and that can be distributed across all possible branches of sampling instructions according to their weight, thus enabling us to reason about expected cost. The representation of cost credits as separation logic resources gives Tachis a great deal of flexibility and expressivity. In particular, it permits reasoning about amortized expected cost by storing excess credits as potential into data structures to pay for future operations. Tachis further supports a range of cost models, including running time and entropy usage. We showcase the versatility of this approach by applying our techniques to prove upper bounds on the expected cost of a variety of probabilistic algorithms and data structures, including randomized quicksort, hash tables, and meldable heaps. All of our results have been mechanized using Coq, Iris, and the Coquelicot real analysis library.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Making Formulog Fast: An Argument for Unconventional Datalog Evaluation
Aaron Bembenek,
Michael Greenberg, and
Stephen Chong
(University of Melbourne, Australia; Stevens Institute of Technology, USA; Harvard University, USA)
With its combination of Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping—a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with some surprising results: while 2.2× speedups can be obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come from abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In place of semi-naive evaluation, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog’s SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé’s code generator achieve mean 5.2× and 7.6× speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. All in all, using compilation and eager evaluation (as appropriate), Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F♯, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that traditional semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.
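As a rough illustration of the traversal-order difference described above (a generic toy, not Formulog's engine or Soufflé's generated code): semi-naive evaluation processes newly derived facts in breadth-first rounds, whereas eager evaluation propagates each new fact depth-first as soon as it is derived, shown here for transitive closure.

    # Eager, depth-first derivation of transitive closure path(x, y)
    # from edge facts: each new fact is propagated immediately rather
    # than deferred to the next semi-naive round.
    edges = {("a", "b"), ("b", "c"), ("c", "d")}
    path = set()

    def derive(fact):
        if fact in path:
            return
        path.add(fact)                    # new fact: process it right away
        x, y = fact
        for (u, v) in edges:
            if u == y:                    # path(x,y), edge(y,v) => path(x,v)
                derive((x, v))

    for e in edges:
        derive(e)

    assert ("a", "d") in path and len(path) == 6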
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Realistic Realizability: Specifying ABIs You Can Count On
Andrew Wagner,
Zachary Eisbach, and
Amal Ahmed
(Northeastern University, USA)
The Application Binary Interface (ABI) for a language defines the interoperability rules for its target platforms, including data layout and calling conventions, such that compliance with the rules ensures “safe” execution and perhaps certain resource usage guarantees. These rules are relied upon by compilers, libraries, and foreign-function interfaces. Unfortunately, ABIs are typically specified in prose, and while type systems for source languages have evolved, ABIs have comparatively stalled, lacking advancements in expressivity and safety.
We propose a vision for richer, semantic ABIs to improve interoperability and library integration, supported by a methodology for formally specifying ABIs using realizability models. These semantic ABIs connect abstract, high-level types to unwieldy, but well-behaved, low-level code. We illustrate our approach with a case study formalizing the ABI of a functional source language in terms of a reference-counting implementation in a C-like target language. A key contribution supporting this case study is a graph-based model of separation logic that captures the ownership and accessibility of reference-counted resources using modalities inspired by hybrid logic. To highlight the flexibility of our methodology, we show how various design decisions can be interpreted into the semantic ABI. Finally, we provide the first formalization of library evolution, a distinguishing feature of Swift’s ABI.
Article Search
PolyJuice: Detecting Mis-compilation Bugs in Tensor Compilers with Equality Saturation Based Rewriting
Chijin Zhou,
Bingzhou Qian,
Gwihwan Go,
Quan Zhang,
Shanshan Li, and
Yu Jiang
(Tsinghua University, China; National University of Defense Technology, China)
Tensor compilers are essential for deploying deep learning applications across various hardware platforms. While powerful, they are inherently complex and present significant challenges in ensuring correctness. This paper introduces PolyJuice, an automatic detection tool for identifying mis-compilation bugs in tensor compilers. Its basic idea is to construct semantically-equivalent computation graphs to validate the correctness of tensor compilers. The main challenge is to construct equivalent graphs capable of efficiently exploring the diverse optimization logic during compilation. We approach it from two dimensions. First, we propose arithmetic and structural equivalent rewrite rules to modify the dataflow of a tensor program. Second, we design an efficient equality saturation based rewriting framework to identify the most simplified and the most complex equivalent computation graphs for an input graph. After that, the outcome computation graphs have different dataflow and will likely experience different optimization processes during compilation. We applied it to five well-tested industrial tensor compilers, namely PyTorch Inductor, OnnxRuntime, TVM, TensorRT, and XLA, as well as two well-maintained academic tensor compilers, EinNet and Hidet. In total, PolyJuice detected 84 non-crash mis-compilation bugs, out of which 49 were confirmed with 20 fixed.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Semantics Lifting for Syntactic Sugar
Zhichao Guan,
Yiyuan Cao,
Tailai Yu,
Ziheng Wang,
Di Wang, and
Zhenjiang Hu
(Peking University, China; Tsinghua University, China)
Syntactic sugar plays a crucial role in engineering programming languages. It offers convenient syntax and higher levels of abstraction, as witnessed by its pervasive use in both general-purpose and domain-specific contexts. Unfortunately, the traditional approach of translating programs containing syntactic sugars into the host language can lead to abstraction leakage, breaking the promise of convenience and hindering program comprehension. To address this challenge, we introduce the idea of semantics lifting, which aims to statically derive self-contained evaluation rules for syntactic sugars. More specifically, we propose a semantics-lifting framework that consists of (i) a general algorithm for deriving host-independent semantics of syntactic sugars from the semantics of the host language and the desugaring rules, (ii) a formulation of the correctness and abstraction properties for a lifted semantics, and (iii) a systematic investigation of sufficient conditions that ensure a lifted semantics is provably correct and abstract. To evaluate our semantics-lifting framework, we have implemented a system named Osazone and conducted several case studies, demonstrating that our approach is flexible, effective, and practical for implementing domain-specific languages.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
MEA2: A Lightweight Field-Sensitive Escape Analysis with Points-to Calculation for Golang
Boyao Ding,
Qingwei Li,
Yu Zhang,
Fugen Tang, and
Jinbao Chen
(University of Science and Technology of China, China; Institute of Artificial Intelligence at Hefei Comprehensive National Science Center, China)
Escape analysis plays a crucial role in garbage-collected languages as it enables the allocation of non-escaping variables on the stack by identifying the dynamic lifetimes of objects and pointers. This helps in reducing heap allocations and alleviating garbage collection pressure. However, Go, as a garbage-collected language, employs a fast yet conservative escape analysis, which is field-insensitive and omits points-to set calculation to expedite compilation. This results in more variables being allocated on the heap. Empirical statistics reveal that field access and indirect memory access are prevalent in real-world Go programs, suggesting potential opportunities for escape analysis to enhance program performance. In this paper, we propose MEA2, an escape analysis framework atop GoLLVM (an LLVM-based Go compiler), which combines field sensitivity and points-to analysis. Moreover, a novel generic function summary representation is designed to facilitate fast inter-procedural analysis. We evaluated it by using MEA2 to perform stack allocation in 12 widely used open-source projects. The results show that, compared to Go’s escape analysis, MEA2 can reduce heap allocation sites by 7.9
Article Search
Weighted Context-Free-Language Ordered Binary Decision Diagrams
Meghana Sistla,
Swarat Chaudhuri, and
Thomas Reps
(University of Texas at Austin, USA; University of Wisconsin, USA)
This paper presents a new data structure, called Weighted Context-Free-Language Ordered BDDs (WCFLOBDDs), which are hierarchically structured decision diagrams, akin to Weighted BDDs (WBDDs) enhanced with a procedure-call mechanism. For some functions, WCFLOBDDs are exponentially more succinct than WBDDs. They are potentially beneficial for representing functions of type B^n → D, when a function’s image V ⊆ D has many different values. We apply WCFLOBDDs in quantum-circuit simulation, and find that they perform better than WBDDs on certain benchmarks. With a 15-minute timeout, the number of qubits that can be handled by WCFLOBDDs is 1–64× that of WBDDs (and 1–128× that of CFLOBDDs, which are an unweighted version of WCFLOBDDs). These results support the conclusion that, for this application—from the standpoint of problem size, measured as the number of qubits—WCFLOBDDs provide the best of both worlds: performance roughly matches whichever of WBDDs and CFLOBDDs is better. (From the standpoint of running time, the results are more nuanced.)
Preprint
Artifacts Available
Artifacts Functional
Results Reproduced
Finding ∀∃ Hyperbugs using Symbolic Execution
Arthur Correnson,
Tobias Nießen,
Bernd Finkbeiner, and
Georg Weissenbacher
(CISPA Helmholtz Center for Information Security, Germany; TU Wien, Austria)
Many important hyperproperties, such as refinement and generalized non-interference, fall into the class of ∀∃ hyperproperties and require, for each execution trace of a system, the existence of another trace relating to the first one in a certain way. The alternation of quantifiers renders ∀∃ hyperproperties extremely difficult to verify, or even just to test. Indeed, contrary to trace properties, where it suffices to find a single counterexample trace, refuting a ∀∃ hyperproperty requires not only finding a trace, but also proving that no second trace satisfies the specified relation with the first trace. As a consequence, automated testing of ∀∃ hyperproperties falls outside the scope of existing automated testing tools. In this paper, we present a fully automated approach to detect violations of ∀∃ hyperproperties in software systems. Our approach extends bug-finding techniques based on symbolic execution with support for trace quantification. We provide a prototype implementation of our approach, and demonstrate its effectiveness on a set of challenging examples.
Article Search
Multris: Functional Verification of Multiparty Message Passing in Separation Logic
Jonas Kastberg Hinrichsen,
Jules Jacobs, and
Robbert Krebbers
(Aarhus University, Denmark; Cornell University, USA; Radboud University Nijmegen, Netherlands)
We introduce Multris, a separation logic for verifying functional correctness of programs that combine multiparty message-passing communication with shared-memory concurrency. The foundation of our work is a novel concept of multiparty protocol consistency, which guarantees safe communication among a set of parties, provided each party adheres to its prescribed protocol. Our concept of protocol consistency is inspired by the bottom-up approach for multiparty session types. However, by considering it in the context of separation logic instead of a type system, we go further in terms of generality by supporting new notions of implicit transfer of knowledge and implicit transfer of resources. We develop tactics for automatically verifying protocol consistency and for reasoning about message-passing operations in Multris. We evaluate Multris on a range of examples, including the well-known two- and three-buyer protocols, as well as a new verification benchmark based on Chang and Roberts's ring leader election protocol. To ensure the reliability of our work, we prove soundness of Multris w.r.t. a low-level channel semantics using the Iris framework in Coq.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Law and Order for Typestate with Borrowing
Hannes Saffrich,
Yuki Nishida, and
Peter Thiemann
(University of Freiburg, Germany; Kyoto University, Japan)
Typestate systems are notoriously complex as they require
sophisticated machinery for tracking aliasing.
We propose a new, transition-oriented foundation for typestate in the
setting of impure functional programming.
Our approach relies on ordered types for simple alias tracking and
its formalization draws on work on bunched implications.
Yet, we support a flexible notion of borrowing in the presence of typestate.
Our core calculus comes with a notion of resource types indexed by an
ordered partial monoid that models abstract state transitions.
We prove syntactic type soundness with respect to a
resource-instrumented semantics.
We give an algorithmic version of our type system and prove its
soundness. Algorithmic typing facilitates a
simple surface language that does not expose tedious details of
ordered types. We implemented a typechecker for the surface language
along with an interpreter for the core language.
Article Search
FPCC: Detecting Floating-Point Errors via Chain Conditions
Xin Yi,
Hengbiao Yu,
Liqian Chen,
Xiaoguang Mao, and
Ji Wang
(National University of Defense Technology, China; National University of Defense Technology, Changsha, China)
Floating-point arithmetic is notorious for its rounding errors, which can propagate and accumulate, leading to unacceptable results. Detecting inputs that can trigger significant floating-point errors is crucial for enhancing the reliability of numerical programs. Existing methods for generating error-triggering inputs often rely on costly shadow executions that involve high-precision computations or suffer from false positives. This paper introduces chain conditions to capture the propagation and accumulation of floating-point errors, using them to guide the search for error-triggering inputs. We have implemented a tool named FPCC and evaluated it on 88 functions from the GNU Scientific Library, as well as 21 functions with multiple inputs from previous research. The experimental results demonstrate the effectiveness and efficiency of our approach: (1) FPCC achieves 100% accuracy in detecting significant errors for the reported rank-1 inputs, whereas only 72.69% of the rank-1 inputs from the state-of-the-art tool ATOMU trigger significant errors. Overall, 99.64% (1049/1053) of the inputs reported by FPCC can trigger significant errors, whereas only 19.45% (141/723) of the inputs reported by ATOMU can trigger significant errors; (2) FPCC exhibits a 2.17× speedup over ATOMU in detecting significant errors; (3) FPCC also excels in supporting functions with multiple inputs, outperforming the state-of-the-art technique. To facilitate further research in the community, we have made FPCC available on GitHub at https://github.com/DataReportRe/FPCC.
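As a minimal, generic example of the class of errors such tools hunt for (this is standard numerical-analysis folklore, not FPCC's chain conditions): a benign-looking expression can lose all significant digits for certain inputs, even though every individual operation is correctly rounded.

    import math

    # (1 - cos(x)) / x**2 tends to 0.5 as x -> 0, but the subtraction
    # cancels catastrophically for tiny x, so the computed value is
    # wildly wrong even though each operation is correctly rounded.
    def naive(x):
        return (1.0 - math.cos(x)) / (x * x)

    def stable(x):
        # Rewriting via sin(x/2) avoids the cancellation.
        s = math.sin(x / 2.0)
        return 2.0 * s * s / (x * x)

    x = 1e-8
    print(naive(x))   # 0.0 -- all significant digits lost
    print(stable(x))  # ~0.5, the mathematically expected value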
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
Scaling Abstraction Refinement for Program Analyses in Datalog using Graph Neural Networks
Zhenyu Yan,
Xin Zhang, and
Peng Di
(Peking University, China; Ant Group, China)
Counterexample-guided abstraction refinement (CEGAR) is a popular approach for automatically selecting abstractions with high precision and low time costs. Existing works cast abstraction refinements as constraint-solving problems. Due to the complexity of these problems, they cannot be scaled to large programs or complex analyses. We propose a novel approach that applies graph neural networks to improve the scalability of CEGAR for Datalog-based program analyses. By constructing graphs directly from the Datalog solver’s calculations, our method uses a neural network to score abstraction parameters based on the information in these graphs. We then reformulate the constraint problems so that the constraint solver ignores parameters with low scores. This in turn reduces the solution space and the size of the constraint problems. Since our graphs are constructed directly from Datalog computation without human effort, our approach can be applied to a broad range of parametric static analyses implemented in Datalog. We evaluate our approach on a pointer analysis and a typestate analysis; our approach can answer 2.83× and 1.5× as many queries as the baseline approach on large programs for the pointer analysis and the typestate analysis, respectively.
Article Search
Artifacts Available
Artifacts Functional
Minotaur: A SIMD-Oriented Synthesizing Superoptimizer
Zhengyang Liu,
Stefan Mada, and
John Regehr
(University of Utah, USA)
A superoptimizing compiler—one that performs a meaningful search of the program space as part of the optimization process—can find optimization opportunities that are missed by even the best existing optimizing compilers. We created Minotaur: a superoptimizer for LLVM that uses program synthesis to improve its code generation, focusing on integer and floating-point SIMD code. On an Intel Cascade Lake processor, Minotaur achieves an average speedup of 7.3% on the GNU Multiple Precision library (GMP)’s benchmark suite, with a maximum speedup of 13%. On SPEC CPU 2017, our superoptimizer produces an average speedup of 1.5%, with a maximum speedup of 4.5% for 638.imagick. Every optimization produced by Minotaur has been formally verified, and several optimizations that it has discovered have been implemented in LLVM as a result of our work.
Article Search
A Typed Multi-level Datalog IR and Its Compiler Framework
David Klopp,
Sebastian Erdweg, and
André Pacak
(JGU Mainz, Germany)
The resurgence of Datalog in the last two decades has led to a multitude of new Datalog systems.
These systems explore novel ideas for improving Datalog's programmability and performance, making important contributions to the field.
Unfortunately, the individual systems progress at a much slower pace than the overall field, because improvements in one system are rarely ported to other systems.
The reason for this rift is that each system provides its own Datalog dialect with specific notation, language features, and invariants, enabling specific optimization and execution strategies.
This paper presents the first compiler framework for Datalog that can be used to support any Datalog frontend language and to target any Datalog backend.
The centerpiece of our framework is a novel typed multi-level Datalog IR that supports IR extensions and guarantees executability.
Existing Datalog systems can provide a compiler frontend that translates their Datalog dialect to the extended IR.
The IR is then progressively lowered toward core Datalog, allowing optimizations at each level.
At last, compiler backends can target different Datalog solvers.
We have implemented the compiler framework and integrated 4 Datalog frontends and 3 Datalog backends, using 16 IR extensions.
We also formalize the IR's flexible type system, which is bidirectional, flow-sensitive, bipolar, and uses three-valued typing contexts.
The type system simultaneously validates type compatibility and precisely tracks bindings of logic variables while permitting IR extensions.
Article Search
HardTaint: Production-Run Dynamic Taint Analysis via Selective Hardware Tracing
Yiyu Zhang,
Tianyi Liu,
Yueyang Wang,
Yun Qi,
Kai Ji,
Jian Tang,
Xiaoliang Wang,
Xuandong Li, and
Zhiqiang Zuo
(Nanjing University, China)
Dynamic taint analysis (DTA), as a fundamental analysis technique, is widely used in security, privacy, and diagnosis. As DTA needs to collect and analyze massive amounts of taint data online, it suffers from extremely high runtime overhead. Over the past decades, numerous attempts have been made to lower the overhead of DTA. Unfortunately, the reductions achieved are marginal, leaving DTA applicable only to debugging/testing scenarios. In this paper, we propose and implement HardTaint, a system that realizes production-run dynamic taint tracking. HardTaint adopts a hybrid and systematic design which combines static analysis, selective hardware tracing, and parallel graph processing techniques. Comprehensive evaluations demonstrate that HardTaint introduces only around 8% runtime overhead, an order of magnitude lower than the state of the art, without sacrificing any taint detection capability.
Article Search
Artifacts Available
Jmvx: Fast Multi-threaded Multi-version Execution and Record-Replay for Managed Languages
David Schwartz,
Ankith Kowshik, and
Luís Pina
(University of Illinois, Chicago, USA)
Multi-Version eXecution (MVX) is a technique that deploys many equivalent versions of the same program — variants — as a single program, with direct applications in important fields such as security, reliability, analysis, and availability. MVX can be seen as “online Record/Replay (RR)”, as RR captures a program’s execution as a log stored on disk that can later be replayed to observe the same execution. Unfortunately, current MVX techniques target programs written in C/C++ and do not support programs written in managed languages, which account for the vast majority of code written nowadays.
This paper presents the design, implementation, and evaluation of Jmvx— a novel system for performing MVX and RR on programs written in managed languages. Jmvx supports programs written in Java by intercepting automatically identified non-deterministic methods, via a novel dynamic analysis technique, and ensuring that all variants execute the same methods and obtain the same data.
Jmvx supports multi-threaded programs by capturing synchronization operations in one variant and ensuring that all other variants follow the same ordering. We validated that Jmvx supports MVX and RR by applying it to a suite of benchmarks representative of programs written in Java. Internally, Jmvx uses a circular buffer located in shared memory between JVMs to enable fast communication between all variants, averaging 5%|47% performance overhead when performing MVX with multithreading support disabled|enabled, 8%|25% when recording, and 13%|73% when replaying.
Article Search
Artifacts Available
Lexical Effect Handlers, Directly
Cong Ma,
Zhaoyi Ge,
Edward Lee, and
Yizhou Zhang
(University of Waterloo, Canada)
Lexically scoping effect handlers is a language-design idea that equips algebraic effects with a modular semantics: it enables local-reasoning principles without giving up on the control-flow expressiveness that makes effect handlers powerful. However, we observe that existing implementations risk incurring costs akin to the run-time search for dynamically scoped handlers. This paper presents a compilation strategy for lexical effect handlers, adhering to the lexical scoping principle and targeting a language with low-level control over stack layout. Key aspects of this approach are formalized and proven correct. We embody the ideas in a language called Lexa and a compiler that translates high-level effect handling in Lexa to low-level stack switching. We evaluate the Lexa compiler on a set of benchmarks; the results suggest that it generates efficient code, reducing running-time complexity from quadratic to linear in some cases.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
HybridSA: GPU Acceleration of Multi-pattern Regex Matching using Bit Parallelism
Alexis Le Glaunec,
Lingkun Kong, and
Konstantinos Mamouras
(Rice University, USA)
Multi-pattern matching is widely used in modern software for applications requiring high throughput, such as protein search, network traffic inspection, and virus or spam detection. Graphics Processing Units (GPUs) excel at executing massively parallel workloads. Regular expression (regex) matching is typically performed by simulating the execution of deterministic finite automata (DFAs) or nondeterministic finite automata (NFAs). The natural implementations of these automata simulation algorithms on GPUs are highly inefficient because they give rise to irregular memory access patterns.
This paper presents HybridSA, a heterogeneous CPU-GPU parallel engine for multi-pattern matching. HybridSA uses bit parallelism to efficiently simulate NFAs on GPUs, thus reducing the number of memory accesses and increasing the throughput. Our bit-parallel algorithms extend the classical shift-and algorithm for string matching to a large class of regular expressions and reduce automata simulation to a small number of bitwise operations. We have developed a compiler to translate regular expressions into bit masks, perform optimizations, and choose the best algorithms to run on the GPU. The majority of the regular expressions are accelerated on the GPU, while the patterns that exhibit random memory accesses are executed on the CPU in parallel. We evaluate HybridSA against state-of-the-art CPU and GPU engines, as well as a hybrid combination of the two. HybridSA achieves between 4 and 60 times higher throughput than the state-of-the-art CPU engine and between 4 and 233 times better than the state-of-the-art GPU engine across a collection of real-world benchmarks.
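For background, the classical shift-and algorithm that HybridSA generalizes fits in a few lines: the set of active NFA states for a string pattern is encoded as bits of a machine word, so each input character is processed with one shift and one mask. The sketch below is the textbook string-matching version in Python, not HybridSA's GPU code.

    def shift_and_match(pattern, text):
        """Classical bit-parallel shift-and string matching.
        Returns the end positions of all occurrences of `pattern`."""
        m = len(pattern)
        # Precompute, for each character, a mask of the pattern positions
        # at which it occurs.
        masks = {}
        for i, c in enumerate(pattern):
            masks[c] = masks.get(c, 0) | (1 << i)
        accept = 1 << (m - 1)

        state, matches = 0, []
        for pos, c in enumerate(text):
            # Shift in a fresh start state, then keep only the transitions
            # consistent with the current character.
            state = ((state << 1) | 1) & masks.get(c, 0)
            if state & accept:
                matches.append(pos)
        return matches

    assert shift_and_match("abab", "xababab") == [4, 6]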
Article Search
A Runtime System for Interruptible Query Processing: When Incremental Computing Meets Fine-Grained Parallelism
Jeff Eymer,
Philip Dexter,
Joseph Raskind, and
Yu David Liu
(SUNY Binghamton, USA)
Online data services have stringent performance requirements and must tolerate workload fluctuation. This paper introduces PitStop, a new query language runtime design built on the idea of interruptible query processing: the time-consuming task of data inspection for processing each query or update may be interrupted and resumed later at the boundary of fine-grained data partitions. This counter-intuitive idea enables a novel form of fine-grained concurrency while preserving sequential consistency. We build PitStop by modifying the language runtime of Cypher, the query language of a state-of-the-art graph database, Neo4j. Our evaluation on the Google Cloud shows that PitStop can outperform unmodified Neo4j during workload fluctuation, with reduced latency and increased throughput.
Article Search
Artifacts Available
StarMalloc: Verifying a Modern, Hardened Memory Allocator
Antonin Reitz,
Aymeric Fromherz, and
Jonathan Protzenko
(Inria, France; Microsoft Research, USA)
We present StarMalloc, a verified, efficient, security-oriented, and concurrent memory allocator. Using the Steel separation logic framework, we show how to specify and verify a multitude of low-level patterns and delicate security mechanisms, by relying on a combination of dependent types, SMT, and modular abstractions to enable efficient verification. We produce a verified artifact, in C, that implements the entire API surface of an allocator, and as such works as a drop-in replacement for real-world projects, notably the Firefox browser.
As part of StarMalloc, we develop several generic data structures and proof libraries directly reusable in future systems verification projects. We also extend the Steel toolchain to express several low-level idioms that were previously missing. Finally, we show that StarMalloc exhibits competitive performance by evaluating it against 10 state-of-the-art memory allocators, and against a variety of real-world projects, such as Redis, the Lean compiler, and the Z3 SMT solver.
Article Search
Artifacts Available
Artifacts Reusable
AUTOMAP: Inferring Rank-Polymorphic Function Applications with Integer Linear Programming
Robert Schenck,
Nikolaj Hey Hinnerskov,
Troels Henriksen,
Magnus Madsen, and
Martin Elsman
(University of Copenhagen, Denmark; Aarhus University, Denmark)
Dynamically typed array languages such as Python, APL, and Matlab lift scalar operations to arrays and replicate scalars to fit applications. We present a mechanism for automatically inferring map and replicate operations in a statically-typed language in a way that resembles the programming experience of a dynamically-typed language while preserving the static typing guarantees. Our type system---which supports parametric polymorphism, higher-order functions, and top-level let-generalization---makes use of integer linear programming in order to find the minimum number of operations needed to elaborate to a well-typed program. We argue that the inference system provides useful and unsurprising guarantees to the programmer. We demonstrate important theoretical properties of the mechanism and report on the implementation of the mechanism in the statically-typed array programming language Futhark.
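A schematic Python sketch of the elaboration being inferred (illustrative only; this is not Futhark syntax and it ignores the ILP-based inference): the source-level expression applies a scalar to an array, and the elaborated form makes the replicate and map explicit.

    # In a dynamically typed array language one writes  xs + 10  and the
    # runtime replicates the scalar and maps the addition over the array.
    # A statically typed elaboration inserts those coercions explicitly.
    def rep(x, n):
        # Replicate a scalar to match an array's length.
        return [x] * n

    def map2(f, xs, ys):
        # Lift a scalar operation pointwise over two arrays.
        return [f(x, y) for x, y in zip(xs, ys)]

    xs = [1, 2, 3]
    # Source:      xs + 10
    # Elaboration: map2(add, xs, rep(10, len(xs)))
    print(map2(lambda a, b: a + b, xs, rep(10, len(xs))))   # [11, 12, 13]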
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Concurrent Data Structures Made Easy
Callista Le,
Kiran Gopinathan,
Koon Wen Lee,
Seth Gilbert, and
Ilya Sergey
(Yale-NUS College, Singapore; National University of Singapore, Singapore; Ahrefs, Singapore)
Design of an efficient thread-safe concurrent data structure is a balancing act between its implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workload. At the same time, lock-free concurrent structures allow for high throughput, but are notoriously difficult to get right and require careful reasoning to formally establish their correctness.
In this work, we explore a solution to this conundrum based on a relatively old idea of batch parallelism—an approach for designing high-throughput concurrent data structures via a simple insight: efficiently processing a batch of a priori known operations in parallel is easier than optimising performance for a stream of arbitrary asynchronous requests. Alas, batch-parallel structures have not seen wide practical adoption due to (i) the inconvenience of having to structure multi-threaded programs to explicitly group operations and (ii) the lack of a systematic methodology to implement batch-parallel structures as simply as lock-based ones.
We present OBatcher—a Multicore OCaml library that streamlines the design, implementation, and usage of batch-parallel structures. OBatcher solves the first challenge (how to use) by suggesting a new lightweight implicit batching design pattern that is built on top of generic asynchronous programming mechanisms. The second challenge (how to implement) is addressed by identifying a family of strategies for converting common sequential structures into the corresponding efficient batch-parallel versions, and by providing a library of functors that embody those strategies. We showcase OBatcher with a diverse set of benchmarks ranging from Red-Black and AVL trees to van Emde Boas trees, skip lists, and a thread-safe implementation of a Datalog solver. Our evaluation of all the implementations on large asynchronous workloads shows that (a) they consistently outperform the corresponding coarse-grained lock-based implementations—the only ones available in OCaml to date, and that (b) their throughput scales reasonably with the number of processors.
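The following toy Python sketch (OBatcher itself is a Multicore OCaml library built on asynchronous programming mechanisms) illustrates the batching idea: callers submit operations, and each accumulated batch of known operations is applied together, which is the point where a batch-parallel structure can exploit parallelism.

    import threading

    class BatchedSet:
        """Toy explicitly batched set: callers submit operations and a
        worker applies each accumulated batch at once.  A real batch-parallel
        structure would process the batch with data parallelism."""
        def __init__(self):
            self.data = set()
            self.pending = []
            self.lock = threading.Lock()

        def submit(self, op, value):
            with self.lock:
                self.pending.append((op, value))

        def run_batch(self):
            with self.lock:
                batch, self.pending = self.pending, []
            # The whole batch is known up front, so operations could be
            # sorted, deduplicated, or split across cores here.
            for op, value in batch:
                if op == "insert":
                    self.data.add(value)
                elif op == "remove":
                    self.data.discard(value)

    s = BatchedSet()
    for v in range(5):
        s.submit("insert", v)
    s.submit("remove", 3)
    s.run_batch()
    assert s.data == {0, 1, 2, 4}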
Preprint
Artifacts Available
Artifacts Reusable
Results Reproduced
Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models
Ningke Li,
Yuekang Li,
Yi Liu,
Ling Shi,
Kailong Wang, and
Haoyu Wang
(Huazhong University of Science and Technology, China; UNSW, Australia; Nanyang Technological University, Singapore)
Large language models (LLMs) have revolutionized language processing, but face critical challenges with security, privacy, and generating hallucinations — coherent but factually inaccurate outputs. A major issue is fact-conflicting hallucination (FCH), where LLMs produce content contradicting ground truth facts. Addressing FCH is difficult due to two key challenges: 1) Automatically constructing and updating benchmark datasets is hard, as existing methods rely on manually curated static benchmarks that cannot cover the broad, evolving spectrum of FCH cases. 2) Validating the reasoning behind LLM outputs is inherently difficult, especially for complex logical relations. To tackle these challenges, we introduce a novel logic-programming-aided metamorphic testing technique for FCH detection. We develop an extensive and extensible framework that constructs a comprehensive factual knowledge base by crawling sources like Wikipedia, seamlessly integrated into Drowzee. Using logical reasoning rules, we transform and augment this knowledge into a large set of test cases with ground truth answers. We test LLMs on these cases through template-based prompts, requiring them to provide reasoned answers. To validate their reasoning, we propose two semantic-aware oracles that assess the similarity between the semantic structures of the LLM answers and ground truth. Our approach automatically generates useful test cases and identifies hallucinations across six LLMs within nine domains, with hallucination rates ranging from 24.7% to 59.8%. Key findings include LLMs struggling with temporal concepts, out-of-distribution knowledge, and lack of logical reasoning capabilities. The results show that logic-based test cases generated by Drowzee effectively trigger and detect hallucinations. To further mitigate the identified FCHs, we explored model editing techniques, which proved effective on a small scale (with edits to fewer than 1000 knowledge pieces). Our findings emphasize the need for continued community efforts to detect and mitigate model hallucinations.
Article Search
Monotone Procedure Summarization via Vector Addition Systems and Inductive Potentials
Nikhil Pimpalkhare and
Zachary Kincaid
(Princeton University, USA)
This paper presents a technique for summarizing recursive procedures operating on integer variables. The motivation of our work is to create more predictable program analyzers, and in particular to formally guarantee compositionality and monotonicity of procedure summarization. To summarize a procedure, we compute its best abstraction as a vector addition system with resets (VASR) and exactly summarize the executions of this VASR over the context-free language of syntactic paths through the procedure. We improve upon this technique by refining the language of syntactic paths using (automatically synthesized) linear potential functions that bound the number of recursive calls within valid executions of the input program. We implemented our summarization technique in an automated program verification tool; our experimental evaluation demonstrates that our technique computes more precise summaries than existing abstract interpreters and that our tool’s verification capabilities are comparable with state-of-the-art software model checkers.
Article Search
Artifacts Available
Model Checking Distributed Protocols in Must
Constantin Enea,
Dimitra Giannakopoulou,
Michalis Kokologiannakis, and
Rupak Majumdar
(Amazon Web Services, France; École Polytechnique, France; Amazon Web Services, USA; ETH Zurich, Switzerland; Amazon Web Services, Germany; MPI-SWS, Germany)
We describe the design and implementation of Must, a framework for
modeling and automatically verifying distributed systems. Must
provides a concurrency API that supports multiple communication
models, on top of a mainstream programming language, such as Rust.
Given a program using this API, Must verifies it by means of a
novel, optimal dynamic partial order reduction algorithm that
maintains completeness and optimality for all communication models
supported by the API.
We use Must to design and verify models of distributed systems in
an industrial context. We demonstrate the usability of Must's API
by modeling high-level system idioms (e.g., timeouts, leader
election, versioning) as abstractions over the core API, and
demonstrate Must's scalability by verifying systems employed in
production (e.g., replicated logs, distributed transaction management
protocols), the verification of which lies beyond the capacity of
previous model checkers.
Article Search
Reward Augmentation in Reinforcement Learning for Testing Distributed Systems
Andrea Borgarelli,
Constantin Enea,
Rupak Majumdar, and
Srinidhi Nagendra
(MPI-SWS, Germany; LIX - CNRS - École Polytechnique, France; IRIF - CNRS - Université Paris Cité, France; Chennai Mathematical Institute, India)
Bugs in popular distributed protocol implementations have been the source of many downtimes in popular internet services. We describe a randomized testing approach for distributed protocol implementations based on reinforcement learning. Since the natural reward structure is very sparse, the key to successful exploration in reinforcement learning is reward augmentation. We show two different techniques that build on one another. First, we provide a decaying exploration bonus based on the discovery of new states—the reward decays as the same state is visited multiple times. The exploration bonus captures the intuition from coverage-guided fuzzing of prioritizing new coverage points; in contrast to other schemes, we show that taking the maximum of the bonus and the Q-value leads to more effective exploration. Second, we provide waypoints to the algorithm as a sequence of predicates that capture interesting semantic scenarios. Waypoints exploit designer insight about the protocol and guide the exploration to “interesting” parts of the state space. Our reward structure ensures that new episodes can reliably get to deep interesting states even without execution caching. We have implemented our algorithm in Go. Our evaluation on three large benchmarks (RedisRaft, Etcd, and RSL) shows that our algorithm can significantly outperform baseline approaches in terms of coverage and bug finding.
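A hypothetical Python sketch of the first augmentation described above, combining a decaying novelty bonus with the Q-value via a maximum rather than a sum; the state abstraction, constants, and action model are illustrative assumptions, not the paper's implementation.

    from collections import defaultdict

    visits = defaultdict(int)      # how often each abstract state has been seen
    Q = defaultdict(float)         # learned Q-values, keyed by (state, action)

    def observe(state):
        # Called whenever a state is actually reached during an episode.
        visits[state] += 1

    def exploration_bonus(state):
        # Novelty bonus that decays as the same state is revisited.
        return 1.0 / (1 + visits[state])

    def augmented_value(state, action, next_state):
        # Combining bonus and Q-value with max (rather than +) keeps rarely
        # visited successors attractive without drowning out learned values.
        return max(Q[(state, action)], exploration_bonus(next_state))

    def choose_action(state, actions, successor):
        # successor(state, action) predicts the abstract next state.
        return max(actions, key=lambda a: augmented_value(state, a, successor(state, a)))

    # Toy usage: integer states, an action adds its value to the state.
    step = lambda s, a: s + a
    observe(0)
    print(choose_action(0, [1, 2, 3], step))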
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
Rustlantis: Randomized Differential Testing of the Rust Compiler
Qian Wang and
Ralf Jung
(ETH Zurich, Switzerland; Imperial College London, United Kingdom)
Compilers are at the core of all computer architecture. Their middle-end and back-end are full of subtle code that is easy to get wrong. At the same time, the consequences of compiler bugs can be severe. Therefore, it is important that we develop techniques to increase our confidence in compiler correctness, and to help find the bugs that inevitably happen. One promising such technique that has successfully found many compiler bugs in the past is randomized differential testing, a fuzzing approach whereby the same program is executed with different compilers or different compiler settings to detect any unexpected differences in behavior. We present Rustlantis, the first fuzzer for the Rust programming language that is able to find new correctness bugs in the official Rust compiler. To avoid having to deal with Rust’s strict type and borrow checker, Rustlantis directly generates MIR, the central IR of the Rust compiler for optimizations. The program generation strategy of Rustlantis is a combination of statically tracking the state of the program, obscuring the program state for the compiler, and decoy blocks to lead optimizations astray. This has allowed us to identify 22 previously unknown bugs in the Rust compiler, most of which have been fixed.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO Architectures
Guillaume Ambal,
Brijesh Dongol,
Haggai Eran,
Vasileios Klimis,
Ori Lahav, and
Azalea Raad
(Imperial College London, United Kingdom; University of Surrey, United Kingdom; NVIDIA, Israel; Queen Mary University of London, United Kingdom; Tel Aviv University, Israel)
Remote direct memory access (RDMA) is a modern technology enabling networked machines to exchange information without involving the operating system of either side, and thus significantly speeding up data transfer in computer clusters. While RDMA is extensively used in practice and studied in various research papers, a formal underlying model specifying the allowed behaviours of concurrent RDMA programs running in modern multicore architectures is still missing. This paper aims to close this gap and provide semantic foundations of RDMA on x86-TSO machines. We propose three equivalent formal models, two operational models in different levels of abstraction and one declarative model, and prove that the three characterisations are equivalent. To gain confidence in the proposed semantics, the more concrete operational model has been reviewed by NVIDIA experts, a major vendor of RDMA systems, and we have empirically validated the declarative formalisation on various subtle litmus tests by extensive testing. We believe that this work is a necessary initial step for formally addressing RDMA-based systems by proposing language-level models, verifying their mapping to hardware, and developing reasoning techniques for concurrent RDMA programs.
Article Search
Info
Imperative Compositional Programming: Type Sound Distributive Intersection Subtyping with References via Bidirectional Typing
Wenjia Ye,
Yaozhu Sun, and
Bruno C. d. S. Oliveira
(University of Hong Kong, China)
Compositional programming is a programming paradigm that emphasizes modularity and is implemented in the CP programming language. The foundations for compositional programming are based on a purely functional variant of System F with intersection types, called Fi+, which includes distributivity rules for subtyping.
This paper shows how to extend compositional programming and CP with mutable references, enabling a modular, imperative compositional programming style. A technical obstacle solved in our work is the interaction between distributive intersection subtyping and mutable references. Davies and Pfenning [2000] studied this problem in standard formulations of intersection type systems and argued that, when combined with references, distributive subtyping rules lead to type unsoundness. To recover type soundness, they proposed dropping distributivity rules in subtyping. CP cannot adopt this solution, since it fundamentally relies on distributivity for modularity. Therefore, we revisit the problem and show that, by adopting bidirectional typing, a more lightweight and type sound restriction is possible: we can simply restrict the typing rule for references. This solution retains distributivity and an unrestricted intersection introduction rule. We present a first calculus, based on Davies and Pfenning's work, which illustrates the generality of our solution. Then we present an extension of Fi+ with references, which adopts our restriction and enables imperative compositional programming. We implement an extension of CP with references and show how to model a modular live-variable analysis in CP. Both calculi and their proofs are formalized in the Coq proof assistant.
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
QuAC: Quick Attribute-Centric Type Inference for Python
Jifeng Wu and
Caroline Lemieux
(University of British Columbia, Canada)
Python’s dynamic typing facilitates rapid prototyping and underlies its popularity in many domains. However, dynamic typing reduces the power of many static checking and bug-finding tools. Python type annotations can make these tools more useful. Type inference tools aim to reduce developers’ burden of adding them. However, existing type inference tools struggle to support dynamic features, infer correct types (especially container type parameters and non-builtin types), and run in reasonable time. Inspired by Python’s duck typing, where the attributes accessed on Python expressions characterize their implicit interfaces, we propose QuAC (Quick Attribute-Centric Type Inference for Python). At its core, QuAC collects attribute sets for Python expressions and leverages information retrieval techniques to predict classes from these attribute sets. It also recursively predicts container type parameters. We evaluate QuAC’s performance on popular Python projects. Compared to state-of-the-art non-LLM baselines, QuAC predicts types with high accuracy complementary to those predicted by the baselines while not sacrificing coverage. It also demonstrates clear advantages in predicting container type parameters and non-builtin types and reduces run times. Furthermore, QuAC is nearly two orders of magnitude faster than an LLM-based method while covering nearly half of its errorless non-trivial type predictions. It is also significantly more consistent at predicting container type parameters and non-builtin types than the LLM-based method, regardless of whether the project has ground-truth type annotations.
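To give a flavour of the attribute-centric idea, the toy Python sketch below ranks candidate classes for an expression by how well they cover the attributes accessed on it. The attribute index and scoring are hypothetical stand-ins; QuAC's actual ranking uses information-retrieval techniques and also infers container type parameters.

# Toy attribute-centric type prediction: rank candidate classes by a simple
# overlap score between the attributes accessed on an expression and the
# attributes each class provides.
CLASS_ATTRS = {
    "list": {"append", "extend", "pop", "sort", "__getitem__"},
    "dict": {"keys", "values", "items", "get", "__getitem__"},
    "set":  {"add", "discard", "union", "intersection"},
}

def predict_class(accessed: set[str]) -> list[tuple[str, float]]:
    scores = []
    for cls, attrs in CLASS_ATTRS.items():
        # Fraction of accessed attributes that the class actually provides.
        scores.append((cls, len(accessed & attrs) / len(accessed)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# An expression on which .keys() and .get() are accessed is most likely a dict.
print(predict_class({"keys", "get"}))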
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Automated Verification of Parametric Channel-Based Process Communication
Georgian-Vlad Saioc,
Julien Lange, and
Anders Møller
(Aarhus University, Denmark; Royal Holloway University of London, United Kingdom)
A challenge of writing concurrent message passing programs is ensuring the absence of partial deadlocks, which can cause severe memory leaks in long-running systems. Several static analysis techniques have been proposed for automatically detecting partial deadlocks in Go programs. For a large enterprise code base, we found these tools too imprecise to reason about process communication that is parametric, i.e., where the number of channel communication operations or the channel capacities are determined at runtime.
We present a novel approach to automatically verify the absence of partial deadlocks in Go program fragments with such parametric process communication. The key idea is to translate Go fragments to a core language that is sufficiently expressive to represent real-world parametric communication patterns and can be encoded into Dafny programs annotated with postconditions enforcing partial deadlock freedom. In situations where a fragment is partial deadlock free only when the concurrency parameters satisfy certain conditions, a suitable precondition can often be inferred.
Experimental results on a real-world code base containing 583 program fragments that are beyond the reach of existing techniques have shown that the approach can verify the absence of partial deadlocks in 145 cases. For an additional 228 cases, a nontrivial precondition is inferred that the surrounding code must satisfy to ensure partial deadlock freedom.
Article Search
Artifacts Available
Artifacts Reusable
Modular Synthesis of Efficient Quantum Uncomputation
Hristo Venev,
Timon Gehr,
Dimitar Dimitrov, and
Martin Vechev
(Sofia University St. Kliment Ohridski, Bulgaria; ETH Zurich, Switzerland)
A key challenge of quantum programming is uncomputation: the reversible deallocation of qubits. And while there has been much recent progress on automating uncomputation, state-of-the-art methods are insufficient for handling today’s expressive quantum programming languages. A core reason is that they operate on primitive quantum circuits, while quantum programs express computations beyond circuits, for instance, they can capture families of circuits defined recursively in terms of uncomputation and adjoints.
In this paper, we introduce the first modular automatic approach to synthesize correct and efficient uncomputation for expressive quantum programs. Our method is based on two core technical contributions: (i) an intermediate representation (IR) that can capture expressive quantum programs and comes with support for uncomputation, and (ii) modular algorithms over that IR for synthesizing uncomputation and adjoints.
We have built a complete end-to-end implementation of our method, including an implementation of the IR and the synthesis algorithms, as well as a translation from an expressive fragment of the Silq programming language to our IR and circuit generation from the IR. Our experimental evaluation demonstrates that we can handle programs beyond the capabilities of existing uncomputation approaches, while being competitive on the benchmarks they can handle. More broadly, we show that it is possible to benefit from the greater expressivity and safety offered by high-level quantum languages without sacrificing efficiency.
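For readers unfamiliar with uncomputation, the sketch below shows the classic circuit-level compute/copy/uncompute pattern that such approaches generalize. It is a toy Python illustration over a list-of-gates representation and does not reflect the paper's IR or its modular synthesis algorithms.

def inverse(gate):
    # The gates used in this toy example (CX) are self-inverse; a real adjoint
    # would map each gate to its inverse (e.g. T to Tdg).
    return gate

def with_uncomputation(compute_gates, copy_gates):
    # Compute into scratch qubits, copy the answer out, then run the inverse
    # gates in reverse order so the scratch qubits return to |0> and can be freed.
    return compute_gates + copy_gates + [inverse(g) for g in reversed(compute_gates)]

compute = [("CX", (0, 2)), ("CX", (1, 2))]   # scratch qubit 2 := q0 XOR q1
copy = [("CX", (2, 3))]                      # copy the result into qubit 3
print(with_uncomputation(compute, copy))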
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Type Inference Logics
Denis Carnier,
François Pottier, and
Steven Keuchel
(KU Leuven, Belgium; Inria, France; Vrije Universiteit Brussel, Belgium)
Type inference is essential for statically-typed languages such as OCaml and Haskell. It can be decomposed into two (possibly interleaved) phases: a generator converts programs to constraints; a solver decides whether a constraint is satisfiable. Elaboration, the task of decorating a program with explicit type annotations, can also be structured in this way. Unfortunately, most machine-checked implementations of type inference do not follow this phase-separated, constraint-based approach. Those that do are rarely executable, lack effectful abstractions, and do not include elaboration. To close the gap between common practice in real-world implementations and mechanizations inside proof assistants, we propose an approach that enables modular reasoning about monadic constraint generation in the presence of elaboration. Our approach includes a domain-specific base logic for reasoning about metavariables and a program logic that allows us to reason abstractly about the meaning of constraints. To evaluate it, we report on a machine-checked implementation of our techniques inside the Coq proof assistant. As a case study, we verify both soundness and completeness for three elaborating type inferencers for the simply typed lambda calculus with Booleans. Our results are the first demonstration that type inference algorithms can be verified in the same form as they are implemented in practice: in an imperative style, modularly decomposed into constraint generation and solving, and delivering elaborated terms to the remainder of the compiler chain.
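To make the two-phase structure concrete, here is a minimal Python sketch of a constraint generator and a unification-based solver for the simply typed lambda calculus with Booleans. It mirrors only the phase separation described above (ad hoc term representations, no occurs check, no elaboration); the paper's contribution is verifying such inferencers, including elaboration, inside Coq.

import itertools

fresh = itertools.count()
def meta():      return ("meta", next(fresh))
def arrow(a, b): return ("arrow", a, b)
BOOL = ("bool",)

def generate(term, env, ty, constraints):
    # Emit equality constraints forcing `term` to have type `ty` under `env`.
    tag = term[0]
    if tag == "var":
        constraints.append((env[term[1]], ty))
    elif tag in ("true", "false"):
        constraints.append((BOOL, ty))
    elif tag == "lam":                       # ("lam", x, body)
        a, b = meta(), meta()
        constraints.append((arrow(a, b), ty))
        generate(term[2], {**env, term[1]: a}, b, constraints)
    elif tag == "app":                       # ("app", fn, arg)
        a = meta()
        generate(term[1], env, arrow(a, ty), constraints)
        generate(term[2], env, a, constraints)

def resolve(ty, subst):
    # Follow metavariable bindings until a non-meta (or unbound meta) is reached.
    while ty[0] == "meta" and ty in subst:
        ty = subst[ty]
    return ty

def unify(constraints):
    subst, work = {}, list(constraints)
    while work:
        left, right = work.pop()
        left, right = resolve(left, subst), resolve(right, subst)
        if left == right:
            continue
        if left[0] == "meta":
            subst[left] = right
        elif right[0] == "meta":
            subst[right] = left
        elif left[0] == right[0] == "arrow":
            work += [(left[1], right[1]), (left[2], right[2])]
        else:
            raise TypeError(f"cannot unify {left} and {right}")
    return subst

# Infer a type for (\x. x) true: the goal metavariable resolves to bool.
goal, constraints = meta(), []
generate(("app", ("lam", "x", ("var", "x")), ("true",)), {}, goal, constraints)
print(resolve(goal, unify(constraints)))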
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Wasm-R3: Record-Reduce-Replay for Realistic and Standalone WebAssembly Benchmarks
Doehyun Baek,
Jakob Getz,
Yusung Sim,
Daniel Lehmann,
Ben L. Titzer,
Sukyoung Ryu, and
Michael Pradel
(KAIST, South Korea; University of Stuttgart, Germany; Google, Germany; Carnegie Mellon University, USA)
WebAssembly (Wasm for short) brings a new, powerful capability to the web as well as Edge, IoT, and embedded systems. Wasm is a portable, compact binary code format with high performance and robust sandboxing properties. As Wasm applications grow in size and importance, the complex performance characteristics of diverse Wasm engines demand robust, representative benchmarks for proper tuning. Stopgap benchmark suites, such as PolyBenchC and libsodium, continue to be used in the literature, though they are known to be unrepresentative. Porting of more complex suites remains difficult because Wasm lacks many system APIs and extracting real-world Wasm benchmarks from the web is difficult due to complex host interactions. To address this challenge, we introduce Wasm-R3, the first record and replay technique for Wasm. Wasm-R3 transparently injects instrumentation into Wasm modules to record an execution trace from inside the module, then reduces the execution trace via several optimizations, and finally produces a replay module that is executable standalone, without any host environment, on any engine. The benchmarks created by our approach are (i) realistic, because the approach records real-world web applications, (ii) faithful to the original execution, because the replay benchmark includes the unmodified original code, only adding emulation of host interactions, and (iii) standalone, because the replay benchmarks run on any engine. Applying Wasm-R3 to web-based Wasm applications in the wild demonstrates the correctness of our approach as well as the effectiveness of our optimizations, which reduce the recorded traces by 99.53% and the size of the replay benchmark by 9.98%. We release the resulting benchmark suite of 27 applications, called Wasm-R3-Bench, to the community, to inspire a new generation of realistic and standalone Wasm benchmarks.
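The record/replay contract at the heart of the approach can be illustrated abstractly in Python, with a hypothetical module and host: during recording, every host call and its result are logged; during replay, the logged results are fed back in order, so the module runs without the original host. Wasm-R3 realizes this by instrumenting Wasm modules and additionally reduces the recorded trace, which this sketch does not show.

class Recorder:
    def __init__(self, host_call):
        self.host_call = host_call      # the real host import
        self.trace = []

    def call(self, name, *args):
        # Forward to the host and remember the interaction.
        result = self.host_call(name, *args)
        self.trace.append((name, args, result))
        return result

class Replayer:
    def __init__(self, trace):
        self.trace = iter(trace)

    def call(self, name, *args):
        # Answer from the log instead of the host; detect divergence.
        logged_name, logged_args, result = next(self.trace)
        assert (logged_name, logged_args) == (name, args), "divergent replay"
        return result

# Hypothetical "module" that calls out to the host twice.
def module(host):
    return host.call("now") + host.call("random_seed")

recorder = Recorder(lambda name, *a: {"now": 42, "random_seed": 7}[name])
original = module(recorder)
assert module(Replayer(recorder.trace)) == original   # same result, no host needed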
Article Search
Artifacts Available
Semantic-Type-Guided Bug Finding
Kelvin Qian,
Scott Smith,
Brandon Stride,
Shiwei Weng, and
Ke Wu
(Johns Hopkins University, USA)
In recent years, there has been an increased interest in tools that establish incorrectness rather than correctness of program properties. In this work we build on this approach by developing a novel methodology to prove incorrectness of semantic typing properties of functional programs, extending the incorrectness approach to the model theory of functional program typing. We define a semantic type refuter which refutes semantic typings for a simple functional language. We prove our refuter is co-recursively enumerable, and that it is sound and complete with respect to a semantic typing notion. An initial implementation is described which uses symbolic evaluation to efficiently find type errors over a functional language with a rich type system.
Article Search
Artifacts Available
Artifacts Functional
Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis
Benjamin Mariano,
Ziteng Wang,
Shankara Pailoor,
Christian Collberg, and
Işıl Dillig
(University of Texas at Austin, USA; University of Arizona, USA)
Code deobfuscation, which attempts to simplify code that has been intentionally obfuscated to prevent understanding, is a critical technique for downstream security analysis tasks like malware detection. While there has been significant prior work on code deobfuscation, most techniques either do not handle obfuscations that modify control flow or target only specific classes of control-flow obfuscations, making them unsuitable for handling new types of obfuscations or combinations of existing ones. In this paper, we study a new deobfuscation technique that is based on program synthesis and that can handle a broad class of control flow obfuscations. Given an obfuscated program P, our approach aims to synthesize a smallest program that is a control-flow reduction of P and that is semantically equivalent to it. Since our method does not assume knowledge about the types of obfuscations that have been applied to the original program, the underlying synthesis problem ends up being very challenging. To address this challenge, we propose a novel trace-informed compositional synthesis algorithm that leverages hints present in dynamic traces of the obfuscated program to decompose the synthesis problem into a set of simpler subproblems. In particular, we show how dynamic traces can be useful for inferring a suitable control-flow skeleton of the deobfuscated program and performing independent synthesis of each basic block. We have implemented this approach in a tool called Chisel and evaluate it on 546 benchmarks that have been obfuscated using combinations of six different obfuscation techniques. Our evaluation shows that our approach is effective and that it produces code that is almost identical (modulo variable renaming) to the original (non-obfuscated) program in 86% of cases. Our evaluation also shows that Chisel significantly outperforms existing techniques.
Article Search
Artifacts Functional
Unifying Static and Dynamic Intermediate Languages for Accelerator Generators
Caleb Kim,
Pai Li,
Anshuman Mohan,
Andrew Butt,
Adrian Sampson, and
Rachit Nigam
(Cornell University, USA)
Compilers for accelerator design languages (ADLs) translate high-level languages into application-specific hardware. ADL compilers rely on a hardware control interface to compose hardware units. There are two choices: static control, which relies on cycle-level timing; or dynamic control, which uses explicit signalling to avoid depending on timing details. Static control is efficient but brittle; dynamic control incurs hardware costs to support compositional reasoning. Piezo is an ADL compiler that unifies static and dynamic control in a single intermediate language (IL). Its key insight is that the IL’s static fragment is a refinement of its dynamic fragment: static code admits a subset of the run-time behaviors of the dynamic equivalent. Piezo can optimize code by combining facts from static and dynamic submodules, and it opportunistically converts code from dynamic to static control styles. We implement Piezo as an extension to an existing dynamic ADL compiler, Calyx. We use Piezo to implement a frontend for an existing ADL, a systolic array generator, and a packet-scheduling hardware generator to demonstrate its optimizations and the static–dynamic interactions it enables.
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
Mark–Scavenge: Waiting for Trash to Take Itself Out
Jonas Norlinder,
Erik Österlund,
David Black-Schaffer, and
Tobias Wrigstad
(Uppsala University, Sweden; Oracle, Sweden)
Moving garbage collectors (GCs) typically free memory by evacuating live objects in order to reclaim contiguous memory regions. Evacuation is typically done either during tracing (scavenging), or after tracing when identification of live objects is complete (mark–evacuate). Scavenging typically requires more memory (memory for all objects to be moved), but performs less work in sparse memory areas (single pass). This makes it attractive for collecting young objects. Mark–evacuate typically requires less memory and performs less work in memory areas with dense object clusters, by focusing relocation around sparse regions, making it attractive for collecting old objects. Mark–evacuate also completes identification of live objects faster, making it attractive for concurrent GCs that can reclaim memory immediately after identification of live objects finishes (as opposed to when evacuation finishes), at the expense of more work compared to scavenging, for young objects. We propose an alternative approach for concurrent GCs to combine the benefits of scavenging with the benefits of mark–evacuate, for young objects. The approach is based on the observation that by the time young objects are relocated by a concurrent GC, they are likely to already be unreachable. By performing relocation lazily, most of the relocations in the defragmentation phase of mark–evacuate can typically be eliminated. Similar to scavenging, objects are relocated during tracing with the proposed approach. However, instead of relocating all objects that are live in the current GC cycle, it lazily relocates profitable sparse object clusters that survived from the previous GC cycle. This turns the memory headroom that concurrent GCs typically “waste” in order to safely avoid running out of memory before GC finishes, into an asset used to eliminate much of the relocation work, which constitutes a significant portion of the GC work. We call this technique mark–scavenge and implement it on top of ZGC in OpenJDK in a collector we call MS-ZGC. We perform a performance evaluation that compares MS-ZGC against ZGC. The most striking result is an (up to) 91% reduction in relocation of dead objects (depending on machine-dependent factors).
Article Search
Artifacts Available
Compositionality and Observational Refinement for Linearizability with Crashes
Arthur Oliveira Vale,
Zhongye Wang,
Yixuan Chen,
Peixin You, and
Zhong Shao
(Yale University, USA)
Crash-safety is an important property of real systems, as the main functionality of some systems is resilience to crashes. Toward a compositional verification approach for crash-safety under full-system crashes, one observes that crashes propagate instantaneously to all components across all levels of abstraction, even to unspecified components, hindering compositionality. Furthermore, in the presence of concurrency, a correctness criterion that addresses both crashes and concurrency proves necessary. For this, several adaptations of linearizability have been suggested, each featuring different trade-offs between complexity and expressiveness. The recently proposed compositional linearizability framework shows that to achieve compositionality with linearizability, both a locality and observational refinement property are necessary. Despite that, no linearizability criterion with crashes has been proven to support an observational refinement property. In this paper, we define a compositional model of concurrent computation with full-system crashes. We use this model to develop a compositional theory of linearizability with crashes, which reveals a criterion, crash-aware linearizability, as its inherent notion of linearizability and supports both locality and observational refinement. We then show that strict linearizability and durable linearizability factor through crash-aware linearizability as two different ways of translating between concurrent computation with and without crashes, enabling simple proofs of locality and observational refinement for a generalization of these two criteria. Then, we show how the theory can be connected with a program logic for durable and crash-aware linearizability, which gives the first program logic that verifies a form of linearizability with crashes. We showcase the advantages of compositionality by verifying a library facilitating programming persistent data structures and a fragment of a transactional interface for a file system.
Article Search
Making Sense of Multi-threaded Application Performance at Scale with NonSequitur
Augustine Wong,
Paul Bucci,
Ivan Beschastnikh, and
Alexandra Fedorova
(University of British Columbia, Canada)
Modern multi-threaded systems are highly complex. This makes their behavior difficult to understand. Developers frequently capture behavior in the form of program traces and then manually inspect these traces. Existing tools, however, fail to scale to traces larger than a million events.
In this paper we present an approach to compress multi-threaded traces in order to allow developers to visually explore these traces at scale. Our approach is able to compress traces that contain millions of events down to a few hundred events. We use this approach to design and implement a tool called NonSequitur.
We present three case studies which demonstrate how we used NonSequitur to analyze real-world performance issues with Meta's storage engine RocksDB and MongoDB's storage engine WiredTiger, two complex database backends. We also evaluate NonSequitur with 42 participants on traces from RocksDB and WiredTiger. We demonstrate that, in some cases, participants on average scored 11 times higher with NonSequitur than with other tools when performing performance analysis tasks on large execution traces. Additionally, for some performance analysis tasks, the participants spent on average three times longer with other tools than with NonSequitur.
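As a much-simplified illustration of why execution traces compress so well, the toy Python snippet below folds consecutive repetitions of the same event into (event, count) pairs. NonSequitur's actual compression is considerably more sophisticated and preserves the structure needed for cross-thread performance analysis; this only conveys how a million-event trace can shrink to a handful of visual elements.

from itertools import groupby

def compress(trace):
    # Fold runs of identical events into (event, count) pairs.
    return [(event, sum(1 for _ in group)) for event, group in groupby(trace)]

trace = ["acquire"] + ["read"] * 1_000_000 + ["release"]
print(compress(trace))   # three runs summarize a million-plus events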
Article Search
Video
Info
Artifacts Available
Artifacts Functional
Results Reproduced
Dependency-Aware Code Naturalness
Chen Yang,
Junjie Chen,
Jiajun Jiang, and
Yuliang Huang
(Tianjin University, China)
Code naturalness, which captures repetitiveness and predictability in programming languages, has proven valuable for various code-related tasks in software engineering. However, precisely measuring code naturalness remains a fundamental challenge. Existing methods measure code naturalness over individual lines of code while ignoring the deep semantic relations among different lines, e.g., program dependency, which may negatively affect the precision of the measure. Despite the intuitive appeal of extending the code naturalness measure to the code dependency domain (as some prior work has already begun to utilize code dependency for diverse code-related tasks), this assumption remains unexplored and warrants direct investigation. In this study, we aim to perform the first empirical study to investigate whether incorporating code dependency, instead of analyzing individual lines, can enhance the precision of measuring code naturalness.
To achieve that, we first propose a new method named DAN for measuring code naturalness by incorporating the rich dependency information in the code. Specifically, DAN extracts multiple sequences of code lines by traversing the program dependency graph, where different code lines are connected by dependencies in each sequence, and then the code naturalness will be measured by taking each sequence as a whole. In this way, the dependency information can be well captured. Finally, we have conducted an extensive study to evaluate the influence of code dependency for measuring code naturalness with DAN, and compared it with the state-of-the-art methods under three emerging application scenarios of code naturalness. The results demonstrate that DAN can not only better distinguish natural and unnatural code, but also substantially boost two important downstream applications of code naturalness, i.e., distinguishing buggy and non-buggy code lines and data cleansing for training better code models, reflecting the significance of code dependency in measuring code naturalness.
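A rough Python sketch of the dependency-aware idea follows: instead of scoring isolated lines, follow program-dependency edges to form sequences of related lines and score each sequence as a whole. The graph, the example lines, and the naturalness function below are placeholders (a real measure would use a language model, e.g. cross-entropy over tokens), not DAN's implementation.

def dependency_sequences(pdg, roots):
    # Enumerate root-to-leaf line sequences along dependency edges.
    sequences = []
    def walk(line, path):
        path = path + [line]
        successors = pdg.get(line, [])
        if not successors:
            sequences.append(path)
        for nxt in successors:
            walk(nxt, path)
    for root in roots:
        walk(root, [])
    return sequences

def naturalness(sequence_of_lines):
    # Placeholder score: a real implementation would concatenate the lines and
    # compute a language-model cross-entropy over the token stream.
    return sum(len(line.split()) for line in sequence_of_lines)

# Hypothetical program-dependency graph over source lines.
pdg = {"x = read()": ["y = x + 1"], "y = x + 1": ["print(y)"]}
for seq in dependency_sequences(pdg, roots=["x = read()"]):
    print(seq, "->", naturalness(seq))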
Article Search
Artifacts Available
Artifacts Functional
Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration
Dominik Winterer and
Zhendong Su
(ETH Zurich, Switzerland)
We introduce ET, a grammar-based enumerator for validating SMT solver correctness and performance. By compiling grammars of the SMT theories to algebraic datatypes, ET leverages the functional enumerator FEAT. ET is highly effective at bug finding and has many complementary benefits. Despite the extensive and continuous testing of the state-of-the-art SMT solvers Z3 and cvc5, ET found 102 bugs, out of which 76 were confirmed and 32 were fixed. Moreover, ET can be used to understand the evolution of solvers. We derive eight grammars realizing all major SMT theories including the booleans, integers, reals, realints, bit-vectors, arrays, floating points, and strings. Using ET, we test all consecutive releases of the SMT solvers Z3 and CVC4/cvc5 from the last six years (61 versions) on 8 million formulas and 488 million solver calls. Our results suggest improved correctness in recent versions of both solvers but decreased performance in newer releases of Z3 on small timeouts (since z3-4.8.11) and regressions in early cvc5 releases on larger timeouts. Due to its systematic testing and efficiency, we further advocate ET's use for continuous integration.
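The enumerate-and-compare workflow can be sketched in Python with a toy Boolean grammar (rather than ET's compilation of SMT-LIB theory grammars to algebraic datatypes enumerated with FEAT). The sketch assumes z3 and cvc5 binaries are installed and on the PATH; any disagreement on sat/unsat is a correctness signal.

import itertools, subprocess, tempfile

ATOMS = ["p", "(not p)", "true", "false"]

def formulas(max_size):
    # Enumerate Boolean terms of bounded size over a single variable p.
    level = list(ATOMS)
    for _ in range(max_size):
        level += [f"({op} {a} {b})" for op in ("and", "or")
                  for a, b in itertools.product(level, repeat=2)]
        yield from level

def check(solver, formula):
    # Write a tiny SMT-LIB script and ask the given solver for sat/unsat.
    script = f"(declare-const p Bool)\n(assert {formula})\n(check-sat)\n"
    with tempfile.NamedTemporaryFile("w", suffix=".smt2", delete=False) as f:
        f.write(script)
        path = f.name
    return subprocess.run([solver, path], capture_output=True, text=True).stdout.strip()

for formula in itertools.islice(formulas(1), 40):
    z3_result, cvc5_result = check("z3", formula), check("cvc5", formula)
    if z3_result != cvc5_result:
        print("disagreement on", formula, ":", z3_result, "vs", cvc5_result)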
Article Search
Practical Verification of Smart Contracts using Memory Splitting
Shelly Grossman,
John Toman,
Alexander Bakst,
Sameer Arora,
Mooly Sagiv, and
Chandrakana Nandi
(Certora, Israel; Certora, USA)
SMT-based verification of low-level code requires modeling and reasoning about memory operations. Prior work has shown that optimizing memory representations is beneficial for scaling verification—pointer analysis, for example, can be used to split memory into disjoint regions leading to faster SMT solving. However, these techniques are mostly designed for C and C++ programs with explicit operations for memory allocation which are not present in all languages. For instance, on the Ethereum virtual machine, memory is simply a monolithic array of bytes which can be freely accessed by Ethereum bytecode, and there is no allocation primitive.
In this paper, we present a memory splitting transformation guided by a conservative memory analysis for Ethereum bytecode generated by the Solidity compiler. The analysis consists of two phases: recovering memory allocation and memory regions, followed by a pointer analysis. The goal of the analysis is to enable memory splitting which in turn speeds up verification. We implemented both the analysis and the memory splitting transformation as part of a verification tool, CertoraProver, and show that the transformation speeds up SMT solving by up to 120x and additionally mitigates 16 timeouts when used on 229 real-world smart contract verification tasks.
Article Search
Artifacts Available
Sound and Partially-Complete Static Analysis of Data-Races in GPU Programs
Dennis Liew,
Tiago Cogumbreiro, and
Julien Lange
(University of Massachusetts, Boston, USA; Royal Holloway University of London, United Kingdom)
GPUs are progressively being integrated into modern society, playing a pivotal role in Artificial Intelligence and High-Performance Computing. Programmers need a deep understanding of the GPU programming model to avoid subtle data-races in their codes. Static verification that is sound and incomplete can guarantee data-race freedom, but the alarms it raises may be spurious and need to be validated.
In this paper, we establish a True Positive Theorem for a static data-race detector for GPU programs, i.e., a result that identifies a class of programs for which our technique only raises true alarms. Our work builds on the formalism of memory access protocols, which models the concurrency operations of CUDA programs. The crux of our approach is an approximation analysis that can correctly identify true alarms and pinpoint the conditions that make an alarm imprecise. Our approximation analysis detects when the reported locations are reachable (control independence, or CI) and when the reported locations are precise (data independence, or DI), as well as identifying inexact values in an alarm (Theorem 4.3). In addition to a True Positive result for programs that are CI and DI (Theorem 4.5), we establish the root causes of spurious alarms depending on whether CI or DI are present (Corollary 4.4).
We apply our theory to introduce FaialAA, the first sound and partially complete data-race detector. We evaluate FaialAA in three experiments. First, in a comparative study with state-of-the-art tools, we show that FaialAA confirms more DRF programs than others while emitting 1.9× fewer potential alarms. Importantly, the approximation analysis of FaialAA detects 10 undocumented data-races. Second, in an experiment studying 6 commits of data-race fixes in the open-source projects OpenMM and Nvidia’s MegaTron, FaialAA confirmed the buggy and fixed versions of 5 commits, while others were only able to confirm 2. Third, we show that 59.5% of 2,770 programs are CI and DI, quantifying when the approximation analysis of FaialAA is complete.
This paper is accompanied by the mechanized proofs of the theoretical results presented therein and a tool (FaialAA) implementing our theory.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Effect Handlers for C via Coroutines
Mario Alvarez-Picallo,
Teodoro Freund,
Dan R. Ghica, and
Sam Lindley
(Huawei Research Centre, United Kingdom; University of Birmingham, United Kingdom; University of Edinburgh, United Kingdom)
Effect handlers provide a structured means for implementing user-defined, composable, and customisable computational effects, ranging from exceptions to generators to lightweight threads. We introduce libseff, a novel effect handlers library for C, based on coroutines. Whereas prior effect handler libraries for C are intended primarily as compilation targets, libseff is intended to be used directly from C programs. As such, the design of libseff parts ways with traditional effect handler implementations, both by using mutable coroutines as the main representation of pending computations, and by avoiding closures as handlers by way of reified effects. We show that the performance of libseff is competitive across a range of platforms and benchmarks.
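The coroutine encoding of effect handlers can be conveyed with Python generators, purely as a conceptual analogue (libseff's C API, its mutable coroutine representation, and its reified effects are not shown here): the computation yields effect requests and is resumed with whatever answer the handler chooses.

def counter_program():
    # A computation performing get/put effects; it never sees how state is stored.
    n = yield ("get",)
    yield ("put", n + 1)
    final = yield ("get",)
    return f"counter ended at {final}"

def handle_state(program, initial):
    # A state handler: interprets get/put requests against a local variable.
    state = initial
    gen = program()
    answer = None                      # first send must be None to start the coroutine
    while True:
        try:
            request = gen.send(answer)
        except StopIteration as stop:  # the computation returned its final value
            return stop.value
        if request[0] == "get":
            answer = state
        elif request[0] == "put":
            state = request[1]
            answer = None

print(handle_state(counter_program, initial=41))   # counter ended at 42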
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
When Your Infrastructure Is a Buggy Program: Understanding Faults in Infrastructure as Code Ecosystems
Georgios-Petros Drosos,
Thodoris Sotiropoulos,
Georgios Alexopoulos,
Dimitris Mitropoulos, and
Zhendong Su
(ETH Zurich, Switzerland; University of Athens, Greece)
Modern applications have become increasingly complex and their manual installation and configuration is no longer practical. Instead, IT organizations heavily rely on Infrastructure as Code (IaC) technologies to automate the provisioning, configuration, and maintenance of computing infrastructures and systems. IaC systems typically offer declarative, domain-specific languages (DSLs) that allow system administrators and developers to write high-level programs that specify the desired state of their infrastructure in a reliable, predictable, and documented fashion. Just like traditional programs, IaC software is not immune to faults, with issues ranging from deployment failures to critical misconfigurations that often impact production systems used by millions of end users. Surprisingly, despite its crucial role in global infrastructure management, the tooling and techniques for ensuring IaC reliability still have room for improvement.
In this work, we conduct a comprehensive analysis of 360 bugs identified in IaC software within prominent IaC ecosystems including Ansible, Puppet, and Chef. Our work is the first in-depth exploration of bug characteristics in these widely-used IaC environments. Through our analysis we aim to understand: (1) how these bugs manifest, (2) their underlying root causes, (3) their reproduction requirements in terms of system state (e.g., operating system versions) or input characteristics, and (4) how these bugs are fixed. Based on our findings, we evaluate the state-of-the-art techniques for IaC reliability, identify their limitations, and provide a set of recommendations for future research. We believe that our study helps researchers to (1) better understand the complexity and peculiarities of IaC software, and (2) develop advanced tooling for more reliable and robust system configurations.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
A Case for First-Class Environments
Jinhao Tan and
Bruno C. d. S. Oliveira
(University of Hong Kong, China)
Formalizations of programming languages typically adopt the substitution model from the lambda calculus. However, substitution creates notorious complications for reasoning and implementation. Furthermore, it is disconnected from practical implementations, which normally adopt environments and closures.
In this paper we advocate for formalizing programming languages using a novel style of small-step environment-based semantics, which avoids substitution and is closer to implementations.
We present a call-by-value statically typed calculus, called λE, using our small-step environment semantics. With our alternative environment semantics, programming language constructs for first-class environments arise naturally, without creating significant additional complexity. Therefore, λE also adopts first-class environments, adding expressive power that is not available in conventional lambda calculi. λE is a conservative extension of the call-by-value Simply Typed Lambda Calculus (STLC), and employs de Bruijn indices for its formalization, which fit naturally with the environment-based semantics.
Reasoning about λE is simple, and in many cases simpler than reasoning about the traditional STLC. We show an abstract machine that implements the semantics of λE, and has an easy correctness proof. We also extend λE with references. We show that λE can model a simple form of first-class modules, and suggest using first-class environments as an alternative to objects for modelling capabilities.
All technical results are formalized in the Coq proof assistant. In summary, our work shows that the small-step environment semantics that we adopt has three main and orthogonal benefits: 1) it simplifies the notorious binding problem in formalizations and proof assistants; 2) it is closer to implementations; and 3) additional expressive power is obtained from first-class environments almost for free.
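For intuition, the following toy Python evaluator treats environments as ordinary values: a hypothetical env form reifies the current environment, and a with form evaluates an expression under a programmer-supplied environment. It is a big-step illustration of first-class environments only, not the paper's small-step, de Bruijn-indexed λE.

def evaluate(term, env):
    tag = term[0]
    if tag == "lit":                    # ("lit", value)
        return term[1]
    if tag == "var":                    # ("var", name)
        return env[term[1]]
    if tag == "lam":                    # ("lam", param, body) -> closure over env
        return ("closure", term[1], term[2], env)
    if tag == "app":                    # ("app", fn, arg)
        _, param, body, captured = evaluate(term[1], env)
        return evaluate(body, {**captured, param: evaluate(term[2], env)})
    if tag == "env":                    # ("env",) reifies the current environment
        return dict(env)
    if tag == "with":                   # ("with", env_expr, body) runs body under env_expr
        return evaluate(term[2], evaluate(term[1], env))
    raise ValueError(f"unknown form {tag}")

# Capture an environment in which x = 1, then evaluate x under it elsewhere.
captured = evaluate(("app", ("lam", "x", ("env",)), ("lit", 1)), {})
print(evaluate(("with", ("lit", captured), ("var", "x")), {}))   # 1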
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Fast and Optimal Extraction for Sparse Equality Graphs
Amir Kafshdar Goharshady,
Chun Kit Lam, and
Lionel Parreaux
(Hong Kong University of Science and Technology, China)
Equality graphs (e-graphs) are used to compactly represent equivalence classes of terms in symbolic reasoning systems. Beyond their original roots in automated theorem proving, e-graphs have been used in a variety of applications. They have become particularly important as the key ingredient in the popular technique of equality saturation, which has notable applications in compiler optimization, program synthesis, program verification, and symbolic execution, among others. In a typical equality saturation workflow, an e-graph is used to store a large number of equalities that are generated by local rewrites during a saturation phase, after which an optimal term is extracted from the e-graph as the output of the technique. However, despite its crucial role in equality saturation, e-graph extraction has received relatively little attention in the literature, which we seek to start addressing in this paper. Extraction is a challenging problem and is notably known to be NP-hard in general, so current equality saturation tools rely either on slow optimal extraction algorithms based on integer linear programming (ILP) or on heuristics that may not always produce the optimal result. In fact, in this paper, we show that e-graph extraction is hard to approximate within any constant ratio. Thus, any such heuristic will produce wildly suboptimal results in the worst case. Fortunately, we show that the problem becomes tractable when the e-graph is sparse, which is the case in many practical applications. We present a novel parameterized algorithm for extracting optimal terms from e-graphs with low treewidth, a measure of how “tree-like” a graph is, and prove its correctness. We also present an efficient Rust implementation of our algorithm and evaluate it against ILP on a number of benchmarks extracted from the Cranelift benchmark suite, a real-world compiler optimization library based on equality saturation. Our algorithm optimally extracts e-graphs with treewidths of up to 10 in a fraction of the time taken by ILP. These results suggest that our algorithm can be a valuable tool for equality saturation users who need to extract optimal terms from sparse e-graphs.
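For readers new to extraction, the sketch below performs basic bottom-up cost-minimizing extraction over a tiny, acyclic e-graph with hypothetical costs. It conveys what extraction computes, but not the paper's treewidth-parameterized algorithm, which handles the general (cyclic, shared) case optimally.

from functools import lru_cache

# e-class id -> list of e-nodes; an e-node is (operator, child e-class ids).
EGRAPH = {
    "c_x":   [("x", ())],
    "c_one": [("1", ())],
    "c_sum": [("add", ("c_x", "c_x")), ("shl", ("c_x", "c_one"))],   # x+x == x<<1
}
COST = {"x": 0, "1": 0, "add": 2, "shl": 1}

@lru_cache(maxsize=None)
def best(eclass):
    # Pick the e-node whose operator cost plus best child costs is minimal.
    candidates = []
    for op, children in EGRAPH[eclass]:
        child_best = [best(child) for child in children]
        total = COST[op] + sum(cost for cost, _ in child_best)
        candidates.append((total, (op, tuple(term for _, term in child_best))))
    return min(candidates)

print(best("c_sum"))   # extracts the cheaper shift form of x+x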
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
Automated Robustness Verification of Concurrent Data Structure Libraries against Relaxed Memory Models
Kartik Nagar,
Anmol Sahoo,
Romit Roy Chowdhury, and
Suresh Jagannathan
(IIT Madras, India; Purdue University, USA; Chennai Mathematical Institute, India)
Clients reason about the behavior of concurrent data structure libraries such as sets, queues, or stacks using specifications that capture well-understood correctness conditions, such as linearizability. The implementations of these libraries, however, focused as they are on performance, may additionally exploit relaxed memory behaviors allowed by the language or underlying hardware that weaken the strong ordering and visibility constraints on shared-memory accesses that would otherwise be imposed by a sequentially consistent (SC) memory model. As an alternative to developing new specification and verification mechanisms for reasoning about libraries under relaxed memory models, we instead consider the orthogonal problem of library robustness, a property that holds when all possible behaviors of a library implementation under a relaxed memory model are also possible under SC. In this paper, we develop a new automated technique for verifying robustness of library implementations in the context of a C11-style memory model. This task is challenging because a most-general client may invoke an unbounded number of concurrently executing library operations that can manipulate an unbounded number of shared locations. We establish a novel inductive technique for verifying library robustness that leverages prior work on the robustness problem for the C11 memory model based on the search for a non-robustness witness under SC executions. We crucially rely on the fact that this search is carried out over SC executions, and use high-level SC specifications (including linearizability) of the library to verify the absence of a non-robustness witness. Our technique is compositional: we show how we can safely preserve robustness of multiple interacting library implementations and clients using additional SC fences to guarantee robustness of entire executions. Experimental results on a number of complex realistic library implementations demonstrate the feasibility of our approach.
Article Search
Artifacts Available
Artifacts Reusable
Results Reproduced
The ART of Sharing Points-to Analysis: Reusing Points-to Analysis Results Safely and Efficiently
Shashin Halalingaiah,
Vijay Sundaresan,
Daryl Maier, and
V. Krishna Nandivada
(IIT Madras, India; University of Texas at Austin, USA; IBM, Canada)
Data-flow analyses like points-to analysis can vastly improve the precision of other analyses, and enable powerful code optimizations. However, whole-program points-to analysis of large Java programs tends to be expensive – both in terms of time and memory. Consequently, many compilers (both static and JIT) and program-analysis tools tend to employ faster – but more conservative – points-to analyses to improve usability. As an alternative to such trading of precision for performance, various techniques have been proposed to perform precise yet expensive fixed-point points-to analyses ahead of time in a static analyzer, store the results, and then transmit them to independent compilation/program-analysis stages that may need them. However, an underlying concern of safety affects all such techniques – can a compiler (or program analysis tool) trust the points-to analysis results generated by another compiler/tool?
In this work, we address this issue of trust in the context of Java, while accounting for the issue of performance. We propose ART: Analysis-results Representation Template – a novel scheme to efficiently and concisely encode results of flow-sensitive, context-insensitive points-to analysis computed by a static analyzer for use in any independent system that may benefit from such a precise points-to analysis. ART also allows for fast regeneration of the encoded sound analysis results in such systems. Our scheme has two components: (i) a producer that can statically perform expensive points-to analysis and encode the same concisely, (ii) a consumer that, on receiving such encoded results (called artwork), can regenerate the points-to analysis results encoded by the artwork if it is deemed “safe”. The regeneration scheme completely avoids fixed-point computations and thus can help consumers like static analyzers and JIT compilers to obtain precise points-to information without paying a prohibitively high cost. We demonstrate the usage of ART by implementing a producer (in Soot) and two consumers (in Soot and the Eclipse OpenJ9 JIT compiler). We have evaluated our implementation over various benchmarks from the DaCapo and SPECjvm2008 suites. Our results demonstrate that using ART, a consumer can obtain precise flow-sensitive, context-insensitive points-to analysis results in less than 1% (on average) of the time taken by a static analyzer to perform the same analysis, with the storage overhead of ART representing a small fraction of the program size (around 4% on average).
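The producer/consumer contract can be illustrated very loosely in Python: the producer ships analysis results together with a fingerprint of the code it analyzed, and the consumer reuses them only if its own view of the code matches, otherwise falling back to a conservative analysis. ART's actual template encoding and its fixed-point-free regeneration are far richer; everything below (the fingerprinting scheme, the data shapes) is a hypothetical stand-in.

import hashlib, json

def fingerprint(methods: dict[str, str]) -> str:
    # Hash a canonical serialization of the analyzed code.
    return hashlib.sha256(json.dumps(methods, sort_keys=True).encode()).hexdigest()

def produce(methods, expensive_points_to):
    # Run the costly analysis once and package it with the fingerprint.
    return {"fingerprint": fingerprint(methods), "points_to": expensive_points_to(methods)}

def consume(artwork, methods, conservative_points_to):
    if artwork["fingerprint"] == fingerprint(methods):
        return artwork["points_to"]            # deemed safe: reuse, no fixed point needed
    return conservative_points_to(methods)     # code changed: do not trust the artwork

methods = {"A.foo": "new B(); return b;"}
artwork = produce(methods, lambda m: {"A.foo#b": {"B@1"}})
print(consume(artwork, methods, lambda m: {"A.foo#b": {"any"}}))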
Article Search
Info
Boosting the Performance of Alias-Aware IFDS Analysis with CFL-Based Environment Transformers
Haofeng Li,
Chenghang Shi,
Jie Lu,
Lian Li, and
Jingling Xue
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Zhongguancun Laboratory, China; UNSW, Australia)
The IFDS algorithm is pivotal in solving field-sensitive data-flow problems. However, its conventional use of access paths for field sensitivity leads to the generation of a large number of data-flow facts. This causes scalability challenges in larger programs, limiting its practical application in extensive codebases. In response, we propose a new field-sensitive technique that reinterprets the generation of access paths as a Context-Free Language (CFL) for field sensitivity and formulates it as an IDE problem. This approach significantly reduces the number of data-flow facts generated and handled during the analysis, which is a major factor in performance degradation. To demonstrate the effectiveness of this approach, we developed a taint analysis tool, IDEDroid, in the IFDS/IDE framework. IDEDroid outperforms FlowDroid, an established IFDS-based taint analysis tool, in the analysis of 24 major Android apps while improving its precision (guaranteed theoretically). The speed improvement ranges from 2.1× to 2,368.4×, averaging 222.0×, with precision gains reaching up to 20.0% (in terms of false positives reduced). This performance indicates that IDEDroid is substantially more effective in detecting information-flow leaks, making it a potentially superior tool for mobile app vetting in the market.
Article Search
Artifacts Available
Artifacts Functional
Results Reproduced
Higher-Order Model Checking of Effect-Handling Programs with Answer-Type Modification
Taro Sekiyama and
Hiroshi Unno
(National Institute of Informatics, Japan; SOKENDAI, Japan; Tohoku University, Japan)
Model checking is one of the successful program verification methodologies. Since the seminal work by Ong, the model checking of higher-order programs (called higher-order model checking, or HOMC for short) has gained attention. To make HOMC applicable to real-world software, it is also crucial to address programs involving computational effects. Recently, Dal Lago and Ghyselen considered an extension of HOMC to algebraic effect handlers, which enable programming the semantics of effects. They showed a negative result for HOMC with algebraic effect handlers: it is undecidable. In this work, we explore a restriction on programs with algebraic effect handlers which ensures the decidability of HOMC while allowing implementations of various effects. We identify the crux of the undecidability as the use of an unbounded number of algebraic effect handlers being active at the same time. To prevent it, we introduce answer-type modification (ATM), which can bound the number of algebraic effect handlers that can be active at the same time. We prove that ATM can ensure the decidability of HOMC and show that it accommodates a wide range of effects. To evaluate our approach, we implemented an automated verifier EffCaml based on the presented techniques and confirmed that the program examples discussed in this paper can be automatically verified.
Article Search
Info