POPL 2023
Proceedings of the ACM on Programming Languages, Volume 7, Number POPL

Proceedings of the ACM on Programming Languages, Volume 7, Number POPL, January 15–21, 2023, Boston, MA, USA

POPL – Journal Issue



Title Page

Editorial Message
The Proceedings of the ACM series presents the highest-quality research conducted in diverse areas of computer science, as represented by the ACM Special Interest Groups (SIGs). The Proceedings of the ACM on Programming Languages (PACMPL) focuses on research on all aspects of programming languages, from design to implementation and from mathematical formalisms to empirical studies. The journal operates in close collaboration with the Special Interest Group on Programming Languages (SIGPLAN) and is committed to making high-quality peer-reviewed scientific research in programming languages free of restrictions on both access and use.



CN: Verifying Systems C Code with Separation-Logic Refinement Types
Christopher Pulte, Dhruv C. Makwana, Thomas Sewell, Kayvan Memarian, Peter Sewell, and Neel Krishnaswami
(University of Cambridge, UK)
Despite significant progress in the verification of hypervisors, operating systems, and compilers, and in verification tooling, there exists a wide gap between the approaches used in verification projects and conventional development of systems software. We see two main challenges in bringing these closer together: verification handling the complexity of code and semantics of conventional systems software, and verification usability.
We describe an experiment in verification tool design aimed at addressing some aspects of both: we design and implement CN, a separation-logic refinement type system for C systems software, aimed at predictable proof automation, based on a realistic semantics of ISO C. CN reduces refinement typing to decidable propositional logic reasoning, uses first-class resources to support pointer aliasing and pointer arithmetic, features resource inference for iterated separating conjunction, and uses a novel syntactic restriction of ghost variables in specifications to guarantee their successful inference. We implement CN and formalise key aspects of the type system, including a soundness proof of type checking. To demonstrate the usability of CN we use it to verify a substantial component of Google's pKVM hypervisor for Android.

Step-Indexed Logical Relations for Countable Nondeterminism and Probabilistic Choice
Alejandro Aguirre and Lars Birkedal
(Aarhus University, Denmark)
Developing denotational models for higher-order languages that combine probabilistic and nondeterministic choice is known to be very challenging. In this paper, we propose an alternative approach based on operational techniques. We study a higher-order language combining parametric polymorphism, recursive types, discrete probabilistic choice and countable nondeterminism. We define probabilistic generalizations of may- and must-termination as the optimal and pessimal probabilities of termination. Then we define step-indexed logical relations and show that they are sound and complete with respect to the induced contextual preorders. For may-equivalence we use step-indexing over the natural numbers whereas for must-equivalence we index over the countable ordinals. We then show that the probabilities of may- and must-termination coincide with the maximal and minimal probabilities of termination under all schedulers. Finally we derive the equational theory induced by contextual equivalence and show that it validates the distributive combination of the algebraic theories for probabilistic and nondeterministic choice.

A Type-Based Approach to Divide-and-Conquer Recursion in Coq
Pedro Abreu, Benjamin Delaware, Alex Hubers, Christa Jenkins, J. Garrett Morris, and Aaron Stump
(Purdue University, USA; University of Iowa, USA)
This paper proposes a new approach to writing and verifying divide-and-conquer programs in Coq. Extending the rich line of previous work on algebraic approaches to recursion schemes, we present an algebraic approach to divide-and-conquer recursion: recursions are represented as a form of algebra, and from outer recursions, one may initiate inner recursions that can construct data upon which the outer recursions may legally recurse. Termination is enforced entirely by the typing discipline of our recursion schemes. Despite this, our approach requires little from the underlying type system, and can be implemented in System Fω plus a limited form of positive-recursive types. Our implementation of the method in Coq does not rely on structural recursion or on dependent types. The method is demonstrated on several examples, including mergesort, quicksort, Harper’s regular-expression matcher, and others. An indexed version is also derived, implementing a form of divide-and-conquer induction that can be used to reason about functions defined via our method.

Published Artifact, Artifacts Available, Artifacts Reusable
Comparative Synthesis: Learning Near-Optimal Network Designs by Query
Yanjun Wang, Zixuan Li, Chuan Jiang, Xiaokang Qiu, and Sanjay Rao
(Purdue University, USA)
When managing wide-area networks, network architects must decide how to balance multiple conflicting metrics, and ensure fair allocations to competing traffic while prioritizing critical traffic. The state of practice poses challenges since architects must precisely encode their intent into formal optimization models using abstract notions such as utility functions, and ad-hoc manually tuned knobs. In this paper, we present the first effort to synthesize optimal network designs with indeterminate objectives using an interactive program-synthesis-based approach. We make three contributions. First, we present comparative synthesis, an interactive synthesis framework which produces near-optimal programs (network designs) through two kinds of queries (Validate and Compare), without an objective explicitly given. Second, we develop the first learning algorithm for comparative synthesis in which a voting-guided learner picks the most informative query in each iteration. We present theoretical analysis of the convergence rate of the algorithm. Third, we implemented Net10Q, a system based on our approach, and demonstrate its effectiveness on four real-world network case studies using black-box oracles and simulation experiments, as well as a pilot user study comprising network researchers and practitioners. Both theoretical and experimental results show the promise of our approach.

Artifacts Functional
ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs
Alexander K. Lew, Mathieu Huot, Sam Staton, and Vikash K. Mansinghka
(Massachusetts Institute of Technology, USA; University of Oxford, UK)
Optimizing the expected values of probabilistic processes is a central problem in computer science and its applications, arising in fields ranging from artificial intelligence to operations research to statistical computing. Unfortunately, automatic differentiation techniques developed for deterministic programs do not in general compute the correct gradients needed for widely used solutions based on gradient-based optimization.
In this paper, we present ADEV, an extension to forward-mode AD that correctly differentiates the expectations of probabilistic processes represented as programs that make random choices. Our algorithm is a source-to-source program transformation on an expressive, higher-order language for probabilistic computation, with both discrete and continuous probability distributions. The result of our transformation is a new probabilistic program, whose expected return value is the derivative of the original program’s expectation. This output program can be run to generate unbiased Monte Carlo estimates of the desired gradient, that can be used within the inner loop of stochastic gradient descent. We prove ADEV correct using logical relations over the denotations of the source and target probabilistic programs. Because it modularly extends forward-mode AD, our algorithm lends itself to a concise implementation strategy, which we exploit to develop a prototype in just a few dozen lines of Haskell (https://github.com/probcomp/adev).
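The gap between differentiating a sampled run and differentiating an expectation can be seen on a toy program that flips a coin with bias θ and returns 0 on heads and -θ/2 on tails. The sketch below is not ADEV and does not use its API: it is a hand-rolled dual-number forward-mode AD, all names (`Dual`, `payoff`, `naive_gradient`, `exact_gradient`) are hypothetical, and the expectation is computed by enumerating the two outcomes instead of sampling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number: a value together with its derivative w.r.t. theta."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule for the tangent component.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.tan)

    def __rsub__(self, other):
        return _lift(other) + (-self)


def _lift(x):
    """Treat plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def payoff(heads, theta):
    """Return value of the toy program once the coin flip is fixed."""
    return Dual(0.0) if heads else theta * -0.5


def naive_gradient(theta):
    """Average of per-run derivatives: ignores that the flip depends on theta."""
    t = Dual(theta, 1.0)
    return theta * payoff(True, t).tan + (1 - theta) * payoff(False, t).tan


def exact_gradient(theta):
    """Derivative of the expectation, with the flip probability also dual."""
    t = Dual(theta, 1.0)
    expectation = t * payoff(True, t) + (1 - t) * payoff(False, t)
    return expectation.tan


if __name__ == "__main__":
    print(naive_gradient(0.25), exact_gradient(0.25))
```

At θ = 0.25 the expectation is (1 - θ)(-θ/2), whose derivative is θ - 1/2 = -0.25, while averaging per-run derivatives gives -0.375. The difference comes entirely from the dependence of the coin's bias on θ.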

HFL(Z) Validity Checking for Automated Program Verification
Naoki Kobayashi, Kento Tanahashi, Ryosuke Sato, and Takeshi Tsukada
(University of Tokyo, Japan; Chiba University, Japan)
We propose an automated method for checking the validity of a formula of HFL(Z), a higher-order logic with fixpoint operators and integers. Combined with Kobayashi et al.'s reduction from higher-order program verification to HFL(Z) validity checking, our method yields a fully automated, uniform verification method for arbitrary temporal properties of higher-order functional programs expressible in the modal mu-calculus, including termination, non-termination, fair termination, fair non-termination, and also branching-time properties. We have implemented our method and obtained promising experimental results.

From SMT to ASP: Solver-Based Approaches to Solving Datalog Synthesis-as-Rule-Selection Problems
Aaron Bembenek, Michael Greenberg, and Stephen Chong
(Harvard University, USA; Stevens Institute of Technology, USA)
Given a set of candidate Datalog rules, the Datalog synthesis-as-rule-selection problem chooses a subset of these rules that satisfies a specification (such as an input-output example). Building off prior work using counterexample-guided inductive synthesis, we present a progression of three solver-based approaches for solving Datalog synthesis-as-rule-selection problems. Two of our approaches offer some advantages over existing approaches, and can be used more generally to solve arbitrary SMT formulas containing Datalog predicates; the third—an encoding into standard, off-the-shelf answer set programming (ASP)—leads to significant speedups (∼ 9× geomean) over the state of the art while synthesizing higher quality programs.
Our progression of solutions explores the space of interactions between SAT/SMT and Datalog, identifying ASP as a promising tool for working with and reasoning about Datalog. Along the way, we identify Datalog programs as monotonic SMT theories, which enjoy particularly efficient interactions in SMT; our plugins for popular SMT solvers make it easy to load an arbitrary Datalog program into the SMT solver as a custom monotonic theory. Finally, we evaluate our approaches using multiple underlying solvers to provide a more thorough and nuanced comparison against the current state of the art.
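To make the problem statement concrete, the following sketch states synthesis-as-rule-selection as a brute-force search. It is not any of the solver-based approaches described in the abstract: it simply enumerates subsets of candidate rules, evaluates each subset with a naive bottom-up fixpoint, and returns the first (smallest) subset whose derived facts match the expected output. The rule encoding, the helper names, and the example are all assumptions made for illustration.

```python
from itertools import combinations

# An atom is (predicate, args); arguments starting with an uppercase letter
# are variables, everything else is a constant. A rule is (head, [body atoms]).


def match(atom, fact, env):
    """Extend env so that atom equals fact, or return None if impossible."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    env = dict(env)
    for a, v in zip(args, fargs):
        if a[0].isupper():
            if env.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return env


def evaluate(rules, facts):
    """Naive bottom-up evaluation: apply all rules until nothing new appears."""
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            envs = [{}]
            for atom in body:
                envs = [e2 for e in envs for f in facts
                        if (e2 := match(atom, f, e)) is not None]
            for e in envs:
                args = tuple(e[a] if a[0].isupper() else a for a in head[1])
                new.add((head[0], args))
        if new <= facts:
            return facts
        facts |= new


def select_rules(candidates, inputs, expected, target):
    """Indices of a smallest rule subset deriving exactly `expected` for `target`."""
    for k in range(len(candidates) + 1):
        for idxs in combinations(range(len(candidates)), k):
            derived = evaluate([candidates[i] for i in idxs], inputs)
            if {f for f in derived if f[0] == target} == expected:
                return idxs
    return None


CANDIDATES = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Z")), [("path", ("X", "Y")), ("edge", ("Y", "Z"))]),
    (("path", ("X", "Y")), [("edge", ("Y", "X"))]),
]
EDGES = {("edge", ("a", "b")), ("edge", ("b", "c"))}
EXPECTED = {("path", ("a", "b")), ("path", ("b", "c")), ("path", ("a", "c"))}

if __name__ == "__main__":
    print(select_rules(CANDIDATES, EDGES, EXPECTED, "path"))
```

The search is exponential in the number of candidate rules, which is the cost that solver-based encodings aim to avoid.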

Published Artifact, Artifacts Available, Artifacts Reusable
Why Are Proofs Relevant in Proof-Relevant Models?
Axel Kerinec, Giulio Manzonetto, and Federico Olimpieri
(Université Sorbonne Paris Nord, France; LIPN, France; CNRS, France; University of Leeds, UK)
Relational models of λ-calculus can be presented as type systems, the relational interpretation of a λ-term being given by the set of its typings. Within a distributors-induced bicategorical semantics generalizing the relational one, we identify the class of ‘categorified’ graph models and show that they can be presented as type systems as well. We prove that all the models living in this class satisfy an Approximation Theorem stating that the interpretation of a program corresponds to the filtered colimit of the denotations of its approximants. As in the relational case, the quantitative nature of our models allows us to prove this property via a simple induction, rather than using impredicative techniques. Unlike relational models, our 2-dimensional graph models are also proof-relevant in the sense that the interpretation of a λ-term does not contain only its typings, but the whole type derivations. The additional information carried by a type derivation permits reconstructing an approximant having the same type in the same environment. From this, we obtain the characterization of the theory induced by the categorified graph models as a simple corollary of the Approximation Theorem: two λ-terms have isomorphic interpretations exactly when their Böhm trees coincide.

Formally Verified Native Code Generation in an Effectful JIT: Turning the CompCert Backend into a Formally Verified JIT Compiler
Aurèle Barrière, Sandrine Blazy, and David Pichardie
(University of Rennes, France; Inria, France; CNRS, France; IRISA, France; Meta, France)
Modern Just-in-Time compilers (or JITs) typically interleave several mechanisms to execute a program. For faster startup times and to observe the initial behavior of an execution, interpretation can be initially used. But after a while, JITs dynamically produce native code for parts of the program they execute often. Although some time is spent compiling dynamically, this mechanism makes for much faster times for the remainder of the program execution. Such compilers are complex pieces of software with various components, and rely heavily on a precise interplay between the different languages being executed, including on-stack-replacement. Traditional static compilers like CompCert have been mechanized in proof assistants, but JITs have been scarcely formalized so far, partly due to their impure nature and their numerous components. This work presents a model JIT with dynamic generation of native code, implemented and formally verified in Coq. Although some parts of a JIT cannot be written in Coq, we propose a proof methodology to delimit, specify and reason about the impure effects of a JIT. We argue that the daunting task of formally verifying a complete JIT should draw on existing proofs of native code generation. To this end, our work successfully reuses CompCert and its correctness proofs during dynamic compilation. Finally, our prototype can be extracted and executed.

Published Artifact, Artifacts Available, Artifacts Reusable
On the Expressive Power of String Constraints
Joel D. Day, Vijay Ganesh, Nathan Grewal, and Florin Manea
(Loughborough University, UK; University of Waterloo, Canada; University of Göttingen, Germany)
We investigate properties of strings which are expressible by canonical types of string constraints. Specifically, we consider a landscape of 20 logical theories, whose syntax is built around combinations of four common elements of string constraints: language membership (e.g. for regular languages), concatenation, equality between string terms, and equality between string-lengths. For a variable x and formula f from a given theory, we consider the set of values for which x may be substituted as part of a satisfying assignment, or in other words, the property f expresses through x. Since we consider string-based logics, this set is a formal language. We firstly consider the relative expressive power of different combinations of string constraints by comparing the classes of languages expressible in the corresponding theories, and are able to establish a mostly complete picture in this regard. Secondly, we consider the question of deciding whether the language or property expressed by a variable/formula in one theory can be expressed in another theory. We establish several negative results which are relevant to preprocessing and normalisation of string constraints in practice. Some of our results have strong connections to important open problems regarding word equations and the theory of string solving.

Proto-Quipper with Dynamic Lifting
Peng Fu, Kohei Kishida, Neil J. Ross, and Peter Selinger
(Dalhousie University, Canada; University of Illinois at Urbana-Champaign, USA)
Quipper is a functional programming language for quantum computing. Proto-Quipper is a family of languages aiming to provide a formal foundation for Quipper. In this paper, we extend Proto-Quipper-M with a construct called dynamic lifting, which is present in Quipper. By virtue of being a circuit description language, Proto-Quipper has two separate runtimes: circuit generation time and circuit execution time. Values that are known at circuit generation time are called parameters, and values that are known at circuit execution time are called states. Dynamic lifting is an operation that enables a state, such as the result of a measurement, to be lifted to a parameter, where it can influence the generation of the next portion of the circuit. As a result, dynamic lifting enables Proto-Quipper programs to interleave classical and quantum computation. We describe the syntax of a language we call Proto-Quipper-Dyn. Its type system uses a system of modalities to keep track of the use of dynamic lifting. We also provide an operational semantics, as well as an abstract categorical semantics for dynamic lifting based on enriched category theory. We prove that both the type system and the operational semantics are sound with respect to our categorical semantics. Finally, we give some examples of Proto-Quipper-Dyn programs that make essential use of dynamic lifting.

Published Artifact, Artifacts Available, Artifacts Reusable
Smoothness Analysis for Probabilistic Programs with Application to Optimised Variational Inference
Wonyeol Lee, Xavier Rival, and Hongseok Yang
(Stanford University, USA; Inria, France; ENS, France; CNRS, France; PSL University, France; KAIST, South Korea; IBS, South Korea)
We present a static analysis for discovering differentiable or more generally smooth parts of a given probabilistic program, and show how the analysis can be used to improve the pathwise gradient estimator, one of the most popular methods for posterior inference and model learning. Our improvement increases the scope of the estimator from differentiable models to non-differentiable ones without requiring manual intervention of the user; the improved estimator automatically identifies differentiable parts of a given probabilistic program using our static analysis, and applies the pathwise gradient estimator to the identified parts while using a more general but less efficient estimator, called score estimator, for the rest of the program. Our analysis has a surprisingly subtle soundness argument, partly due to the misbehaviours of some target smoothness properties when viewed from the perspective of program analysis designers. For instance, some smoothness properties, such as partial differentiability and partial continuity, are not preserved by function composition, and this makes it difficult to analyse sequential composition soundly without heavily sacrificing precision. We formulate five assumptions on a target smoothness property, prove the soundness of our analysis under those assumptions, and show that our leading examples satisfy these assumptions. We also show that by using information from our analysis instantiated for differentiability, our improved gradient estimator satisfies an important differentiability requirement and thus computes the correct estimate on average (i.e., returns an unbiased estimate) under a regularity condition. Our experiments with representative probabilistic programs in the Pyro language show that our static analysis is capable of identifying smooth parts of those programs accurately, and making our improved pathwise gradient estimator exploit all the opportunities for high performance in those programs.

Published Artifact, Artifacts Available, Artifacts Reusable
Executing Microservice Applications on Serverless, Correctly
Konstantinos Kallas, Haoran Zhang, Rajeev Alur, Sebastian Angel, and Vincent Liu
(University of Pennsylvania, USA; Microsoft Research, USA)
While serverless platforms substantially simplify the provisioning, configuration, and management of cloud applications, implementing correct services on top of these platforms can present significant challenges to programmers. For example, serverless infrastructures introduce a host of failure modes that are not present in traditional deployments. Individual serverless instances can fail while others continue to make progress, correct but slow instances can be killed by the cloud provider as part of resource management, and providers will often respond to such failures by re-executing requests. For functions with side-effects, these scenarios can create behaviors that are not observable in serverful deployments.
In this paper, we propose mu2sls, a framework for implementing microservice applications on serverless using standard Python code with two extra primitives: transactions and asynchronous calls. Our framework orchestrates user-written services to address several challenges, such as failures and re-executions, and provides formal guarantees that the generated serverless implementations are correct. To that end, we present a novel service specification abstraction and formalization of serverless implementations that facilitate reasoning about the correctness of a given application’s serverless implementation. This formalization forms the basis of the mu2sls prototype, which we then use to develop a few real-world microservice applications and show that the performance of the generated serverless implementations achieves significant scalability (3-5× the throughput of a sequential implementation) while providing correctness guarantees in the context of faults, re-execution, and concurrency.

Published Artifact, Artifacts Available, Artifacts Functional
babble: Learning Better Abstractions with E-Graphs and Anti-unification
David Cao, Rose Kunkel, Chandrakana Nandi, Max Willsey, Zachary Tatlock, and Nadia Polikarpova
(University of California at San Diego, USA; Certora, USA; University of Washington, USA)
Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in the input.
We propose library learning modulo theory (LLMT), a new library learning algorithm that additionally takes as input an equational theory for a given problem domain. LLMT uses e-graphs and equality saturation to compactly represent the space of programs equivalent modulo the theory, and uses a novel e-graph anti-unification technique to find common patterns in the corpus more directly and efficiently.
We implemented LLMT in a tool named babble. Our evaluation shows that babble achieves better compression orders of magnitude faster than the state of the art. We also provide a qualitative evaluation showing that babble learns reusable functions on inputs previously out of reach for library learning.
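For readers unfamiliar with anti-unification, the sketch below shows the plain syntactic version on two terms: it computes their most specific common pattern, replacing each differing pair of subterms with a pattern variable. This is only the textbook first-order operation, not babble's e-graph anti-unification, and the term encoding (nested tuples with the operator first) and the `?x` variable names are assumptions made for illustration.

```python
def anti_unify(s, t, table=None):
    """Most specific generalization of terms s and t.

    Terms are either leaves (any hashable value) or tuples whose first
    element is the operator. `table` maps each mismatching pair of
    subterms to one pattern variable, so that a repeated mismatch is
    abstracted by the same variable.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    same_shape = (isinstance(s, tuple) and isinstance(t, tuple)
                  and len(s) == len(t) and s[0] == t[0])
    if same_shape:
        children = (anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + tuple(children)
    if (s, t) not in table:
        table[(s, t)] = f"?x{len(table)}"
    return table[(s, t)]


if __name__ == "__main__":
    left = ("add", ("mul", 2, "a"), ("mul", 2, "a"))
    right = ("add", ("mul", 3, "b"), ("mul", 3, "b"))
    print(anti_unify(left, right))
```

The resulting pattern is a candidate library function: its variables become parameters, and each original term is recovered by applying it to the corresponding arguments.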

Published Artifact, Artifacts Available, Artifacts Reusable
MSWasm: Soundly Enforcing Memory-Safe Execution of Unsafe Code
Alexandra E. Michael, Anitha Gollamudi, Jay Bosamiya, Evan Johnson, Aidan Denlinger, Craig Disselkoen, Conrad Watt, Bryan Parno, Marco Patrignani, Marco Vassena, and Deian Stefan
(University of California at San Diego, USA; University of Washington, USA; University of Massachusetts Lowell, USA; Carnegie Mellon University, USA; Arm, USA; University of Cambridge, UK; University of Trento, Italy; Utrecht University, Netherlands)
Most programs compiled to WebAssembly (Wasm) today are written in unsafe languages like C and C++. Unfortunately, memory-unsafe C code remains unsafe when compiled to Wasm—and attackers can exploit buffer overflows and use-after-frees in Wasm almost as easily as they can on native platforms. Memory-Safe WebAssembly (MSWasm) proposes to extend Wasm with language-level memory-safety abstractions to precisely address this problem. In this paper, we build on the original MSWasm position paper to realize this vision. We give a precise and formal semantics of MSWasm, and prove that well-typed MSWasm programs are, by construction, robustly memory safe. To this end, we develop a novel, language-independent memory-safety property based on colored memory locations and pointers. This property also lets us reason about the security guarantees of a formal C-to-MSWasm compiler—and prove that it always produces memory-safe programs (and preserves the semantics of safe programs). We use these formal results to then guide several implementations: Two compilers of MSWasm to native code, and a C-to-MSWasm compiler (that extends Clang). Our MSWasm compilers support different enforcement mechanisms, allowing developers to make security-performance trade-offs according to their needs. Our evaluation shows that on the PolyBenchC suite, the overhead of enforcing memory safety in software ranges from 22% (enforcing spatial safety alone) to 198% (enforcing full memory safety), and 51.7% when using hardware memory capabilities for spatial safety and pointer integrity.
More importantly, MSWasm’s design makes it easy to swap between enforcement mechanisms; as fast (especially hardware-based) enforcement techniques become available, MSWasm will be able to take advantage of these advances almost for free.

Published Artifact, Artifacts Available, Artifacts Functional
Grisette: Symbolic Compilation as a Functional Programming Library
Sirui Lu and Rastislav Bodík
(University of Washington, USA; Google Research, USA)
The development of constraint solvers simplified automated reasoning about programs and shifted the engineering burden to implementing symbolic compilation tools that translate programs into efficiently solvable constraints. We describe Grisette, a reusable symbolic evaluation framework for implementing domain-specific symbolic compilers. Grisette evaluates all execution paths and merges their states into a normal form that avoids making guards mutually exclusive. This ordered-guards representation reduces the constraint size 5-fold and the solving time more than 2-fold. Grisette is designed entirely as a library, which sidesteps the complications of lifting the host language into the symbolic domain. Grisette is purely functional, enabling memoization of symbolic compilation as well as monadic integration with host libraries. Grisette is statically typed, which allows catching programming errors at compile time rather than delaying their detection to the constraint solver. We implemented Grisette in Haskell and evaluated it on benchmarks that stress both the symbolic evaluation and constraint solving.

Published Artifact, Artifacts Available, Artifacts Reusable
Locally Nameless Sets
Andrew M. Pitts
(University of Cambridge, UK)
This paper provides a new mathematical foundation for the locally nameless representation of syntax with binders, one informed by nominal techniques. It gives an equational axiomatization of two key locally nameless operations, "variable opening" and "variable closing" and shows that a lot of the locally nameless infrastructure can be defined from that in a syntax-independent way, including crucially a "shift" functor for name binding. That functor operates on a category whose objects we call locally nameless sets. Functors combining shift with sums and products have initial algebras that recover the usual locally nameless representation of syntax with binders in the finitary case. We demonstrate this by uniformly constructing such an initial locally nameless set for each instance of Plotkin's notion of binding signature. We also show by example that the shift functor is useful for locally nameless sets of a semantic rather than a syntactic character. The category of locally nameless sets is proved to be isomorphic to a known topos of finitely supported M-sets, where M is the full transformation monoid on a countably infinite set. A corollary of the proof is that several categories that have been used in the literature to model variable renaming operations on syntax with binders are all equivalent to each other and to the category of locally nameless sets.

Published Artifact, Artifacts Available, Artifacts Reusable
A Bowtie for a Beast: Overloading, Eta Expansion, and Extensible Data Types in F⋈
Nick Rioux, Xuejing Huang, Bruno C. d. S. Oliveira, and Steve Zdancewic
(University of Pennsylvania, USA; University of Hong Kong, China)
The typed merge operator offers the promise of a compositional style of statically-typed programming in which solutions to the expression problem arise naturally. This approach, dubbed compositional programming, has recently been demonstrated by Zhang et al.
Unfortunately, the merge operator is an unwieldy beast. Merging values from overlapping types may be ambiguous, so disjointness relations have been introduced to rule out undesired nondeterminism and obtain a well-behaved semantics. Past type systems using a disjoint merge operator rely on intersection types, but extending such systems to include union types or overloaded functions is problematic: naively adding either reintroduces ambiguity. In a nutshell: the elimination forms of unions and overloaded functions require values to be distinguishable by case analysis, but the merge operator can create exotic values that violate that requirement.
This paper presents F⋈, a core language that demonstrates how unions, intersections, and overloading can all coexist with a tame merge operator. The key is an underlying design principle that states that any two inhabited types can support either the deterministic merging of their values, or the ability to distinguish their values, but never both. To realize this invariant, we decompose previously studied notions of disjointness into two new, dual relations that permit the operation that best suits each pair of types. This duality respects the polarization of the type structure, yielding an expressive language that we prove to be both type safe and deterministic.

Published Artifact, Artifacts Available, Artifacts Reusable
Kater: Automating Weak Memory Model Metatheory and Consistency Checking
Michalis Kokologiannakis, Ori Lahav, and Viktor Vafeiadis
(MPI-SWS, Germany; Tel Aviv University, Israel)
The metatheory of axiomatic weak memory models covers questions like the correctness of compilation mappings from one model to another and the correctness of local program transformations according to a given model, topics usually requiring lengthy human investigation. We show that these questions can be solved by answering a more basic question: "Given two memory models, is one weaker than the other?" Moreover, for a wide class of axiomatic memory models, we show that this basic question can be reduced to a language inclusion problem between regular languages, which is decidable.
Similarly, implementing an efficient check for whether an execution graph is consistent according to a given memory model has required non-trivial manual effort. Again, we show that such efficient checks can be derived automatically for a wide class of axiomatic memory models, and that incremental consistency checks can be incorporated in GenMC, a state-of-the-art model checker for concurrent programs. As a result, we get the first time- and space-efficient bounded verifier taking the axiomatic memory model as an input parameter.
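The decidability of regular language inclusion mentioned above can be illustrated with the textbook product construction on deterministic automata. The sketch below is not Kater's algorithm and says nothing about how memory models are encoded as regular languages; the DFA representation `(start, accepting, delta)` and the helper `count_a_mod` are assumptions made for illustration, and both automata are assumed to be complete over the same alphabet.

```python
from collections import deque


def included(a, b, alphabet):
    """Return True iff L(a) is a subset of L(b).

    Each automaton is a complete DFA (start, accepting, delta) where
    delta maps (state, symbol) to the next state. The search explores
    the reachable pairs of states and fails on any pair in which `a`
    accepts but `b` does not.
    """
    (start_a, acc_a, delta_a), (start_b, acc_b, delta_b) = a, b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if p in acc_a and q not in acc_b:
            return False  # some word reaches (p, q): in L(a) but not L(b)
        for symbol in alphabet:
            nxt = (delta_a[p, symbol], delta_b[q, symbol])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


def count_a_mod(n):
    """DFA over {a, b} accepting words whose number of a's is divisible by n."""
    delta = {}
    for state in range(n):
        delta[state, "a"] = (state + 1) % n
        delta[state, "b"] = state
    return (0, {0}, delta)


if __name__ == "__main__":
    print(included(count_a_mod(4), count_a_mod(2), "ab"))
    print(included(count_a_mod(2), count_a_mod(4), "ab"))
```

Because the product has finitely many state pairs, the search always terminates, which is the essence of the decidability argument for DFAs.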

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
An Algebra of Alignment for Relational Verification
Timos Antonopoulos, Eric Koskinen, Ton Chanh Le, Ramana Nagasamudram, David A. Naumann, and Minh Ngo
(Yale University, USA; Stevens Institute of Technology, USA)
Relational verification encompasses information flow security, regression verification, translation validation for compilers, and more. Effective alignment of the programs and computations to be related facilitates use of simpler relational invariants and relational procedure specs, which in turn enables automation and modular reasoning. Alignment has been explored in terms of trace pairs, deductive rules of relational Hoare logics (RHL), and several forms of product automata. This article shows how a simple extension of Kleene Algebra with Tests (KAT), called BiKAT, subsumes prior formulations, including alignment witnesses for forall-exists properties, which brings to light new RHL-style rules for such properties. Alignments can be discovered algorithmically or devised manually but, in either case, their adequacy with respect to the original programs must be proved; an explicit algebra enables constructive proof by equational reasoning. Furthermore, our approach inherits algorithmic benefits from existing KAT-based techniques and tools, which are applicable to a range of semantic models.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Optimal CHC Solving via Termination Proofs
Yu Gu, Takeshi Tsukada, and Hiroshi Unno
(University of Tsukuba, Japan; Chiba University, Japan; RIKEN AIP, Japan)
Motivated by applications to open program reasoning such as maximal specification inference, this paper studies optimal CHC solving, the problem of computing maximal and/or minimal solutions of constrained Horn clauses (CHCs). This problem and its subproblems have been studied in the literature, and a major approach is to iteratively improve a solution of CHCs until it becomes optimal. A key ingredient of optimization methods is therefore the optimality checking of a given solution. We propose a novel optimality checking method, as well as an optimization method using the proposed optimality checker, based on a computational theoretical analysis of the optimality checking problem. The key observation is that the optimality checking problem is closely related to the termination analysis of programs, and this observation is useful both theoretically and practically. From a theoretical perspective, it clarifies a limitation of an existing method and the incorrectness of another method in the literature. From a practical perspective, it allows us to apply techniques of termination analysis to the optimality checking of a solution of CHCs. We present an optimality checking method based on constraint-based synthesis of termination arguments, implement it, evaluate it on CHCs that encode maximal specification synthesis problems, and obtain promising results.

Publisher's Version
Towards a Higher-Order Mathematical Operational Semantics
Sergey Goncharov, Stefan Milius, Lutz Schröder, Stelios Tsampas, and Henning Urbat
(University of Erlangen-Nuremberg, Germany)
Compositionality proofs in higher-order languages are notoriously involved, and general semantic frameworks guaranteeing compositionality are hard to come by. In particular, Turi and Plotkin’s bialgebraic abstract GSOS framework, which has been successfully applied to obtain off-the-shelf compositionality results for first-order languages, so far does not apply to higher-order languages. In the present work, we develop a theory of abstract GSOS specifications for higher-order languages, in effect transferring the core principles of Turi and Plotkin’s framework to a higher-order setting. In our theory, the operational semantics of higher-order languages is represented by certain dinatural transformations that we term pointed higher-order GSOS laws. We give a general compositionality result that applies to all systems specified in this way and discuss how compositionality of the SKI calculus and the λ-calculus w.r.t. a strong variant of Abramsky’s applicative bisimilarity are obtained as instances.

Publisher's Version
Unrealizability Logic
Jinwoo Kim, Loris D'Antoni, and Thomas Reps
(University of Wisconsin-Madison, USA; Seoul National University, South Korea)
We consider the problem of establishing that a program-synthesis problem is unrealizable (i.e., has no solution in a given search space of programs). Prior work on unrealizability has developed some automatic techniques to establish that a problem is unrealizable; however, these techniques are all black-box, meaning that they conceal the reasoning behind why a synthesis problem is unrealizable.
In this paper, we present a Hoare-style reasoning system, called unrealizability logic, for establishing that a program-synthesis problem is unrealizable. To the best of our knowledge, unrealizability logic is the first proof system for overapproximating the execution of an infinite set of imperative programs. The logic provides a general, logical system for building checkable proofs about unrealizability. Similar to how Hoare logic distills the fundamental concepts behind algorithms and tools to prove the correctness of programs, unrealizability logic distills into a single logical system the fundamental concepts that were hidden within prior tools capable of establishing that a program-synthesis problem is unrealizable.

Publisher's Version
The Geometry of Causality: Multi-token Geometry of Interaction and Its Causal Unfolding
Simon Castellan and Pierre Clairambault
(University of Rennes, France; Inria, France; CNRS, France; IRISA, France; Université Aix-Marseille, France; Université de Toulon, France; LIS, France)
We introduce a multi-token machine for Idealized Parallel Algol (IPA), a higher-order concurrent programming language with shared state and semaphores. Our machine takes the shape of a compositional interpretation of terms as Petri structures, certain coloured Petri nets. For the purely functional fragment of IPA, our machine is conceptually close to Geometry of Interaction token machines, originating from Linear Logic and presenting higher-order computation as the low-level process of a token walking through a graph (a proof net) representing the term. We combine here these ideas with folklore ideas on the representation of first-order imperative concurrent programs as coloured Petri nets.
To prove our machine computationally adequate with respect to the reference operational semantics, we follow game semantics and represent types as certain games specifying dependencies and conflict between computational events. Petri strategies are those Petri structures obeying the rules of the game extracted from the type. We show how Petri strategies unfold to concurrent strategies in the sense of concurrent games on event structures. This link with concurrent strategies not only allows us to prove adequacy of our machine, but also lets us generate operationally a causal description of the behaviour of programs at higher-order types, which is shown to coincide with that given denotationally by the interpretation in concurrent games.

Publisher's Version Info Artifacts Functional
A High-Level Separation Logic for Heap Space under Garbage Collection
Alexandre Moine, Arthur Charguéraud, and François Pottier
(Inria, France; Université de Strasbourg, France; CNRS, France; ICube, France)
We present a Separation Logic with space credits for reasoning about heap space in a sequential call-by-value lambda-calculus equipped with garbage collection and mutable state. A key challenge is to design sound, modular, lightweight mechanisms for establishing the unreachability of a block. Prior work in this area uses pointed-by assertions to keep track of the predecessors of every block, but is carried out in the setting of an assembly-like programming language. We take up the challenge in the setting of a high-level language, where a key problem is to identify and reason about the memory locations that the garbage collector considers as roots. For this purpose, we propose novel "stackable" assertions, which keep track of the existence of stack-to-heap pointers without explicitly recording their origin. Furthermore, we explain how to reason about closures -- concrete heap-allocated data structures that implement the abstract concept of a first-class function. We demonstrate the expressiveness and tractability of our program logic via a range of examples, including recursive functions on linked lists, objects implemented using closures and mutable internal state, recursive functions in continuation-passing style, and three stack implementations that exhibit different space bounds. These last three examples illustrate reasoning about the reachability of the items stored in a container as well as amortized reasoning about space. All of our results are proved in Coq on top of Iris.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
The Path to Durable Linearizability
Emanuele D'Osualdo, Azalea Raad, and Viktor Vafeiadis
(MPI-SWS, Germany; Imperial College London, UK)
There is an increasing body of literature proposing new and efficient persistent versions of concurrent data structures ensuring that a consistent state can be recovered after a power failure or a crash. Their correctness is typically stated in terms of durable linearizability (DL), which requires that individual library operations appear to be executed atomically in a sequence consistent with the real-time order and, moreover, that recovering from a crash return a state corresponding to a prefix of that sequence. Sadly, however, there are hardly any formal DL proofs, and those that do exist cover the correctness of rather simple persistent algorithms on specific (simplified) persistency models. In response, we propose a general, powerful, modular, and incremental proof technique that can be used to guide the development and establish DL. Our technique is (1) general, in that it is not tied to a specific persistency and/or consistency model, (2) powerful, in that it can handle the most advanced persistent algorithms in the literature, (3) modular, in that it allows the reuse of an existing linearizability argument, and (4) incremental, in that the additional requirements for establishing DL depend on the complexity of the algorithm to be verified. We illustrate this technique on various versions of a persistent set, leading to the link-free set of Zuriel et al.

Publisher's Version
DimSum: A Decentralized Approach to Multi-language Semantics and Verification
Michael Sammler, Simon Spies, Youngju Song, Emanuele D'Osualdo, Robbert Krebbers, Deepak Garg, and Derek Dreyer
(MPI-SWS, Germany; Radboud University Nijmegen, Netherlands)
Prior work on multi-language program verification has achieved impressive results, including the compositional verification of complex compilers. But the existing approaches to this problem impose a variety of restrictions on the overall structure of multi-language programs (e.g. fixing the source language, fixing the set of involved languages, fixing the memory model, or fixing the semantics of interoperation). In this paper, we explore the problem of how to avoid such global restrictions.
Concretely, we present DimSum: a new, decentralized approach to multi-language semantics and verification, which we have implemented in the Coq proof assistant. Decentralization means that we can define and reason about languages independently from each other (as independent modules communicating via events), but also combine and translate between them when necessary (via a library of combinators).
We apply DimSum to a high-level imperative language Rec (with an abstract memory model and function calls), a low-level assembly language Asm (with a concrete memory model, arbitrary jumps, and syscalls), and a mathematical specification language Spec. We evaluate DimSum on two case studies: an Asm library extending Rec with support for pointer comparison, and a coroutine library for Rec written in Asm. In both cases, we show how DimSum allows the Asm libraries to be abstracted to Rec-level specifications, despite the behavior of the Asm libraries not being syntactically expressible in Rec itself. We also verify an optimizing multi-pass compiler from Rec to Asm, showing that it is compatible with these Asm libraries.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
A General Noninterference Policy for Polynomial Time
Emmanuel Hainry and Romain Péchoux
(Université de Lorraine, France; CNRS, France; Inria, France; LORIA, France)
We introduce a new noninterference policy to capture the class of functions computable in polynomial time on an object-oriented programming language. This policy makes a clear separation between the standard noninterference techniques for the control flow and the layering properties required to ensure that each “security” level preserves polynomial time soundness, and is thus very powerful with respect to the class of programs it can capture. This new characterization is a proper extension of existing tractable characterizations of polynomial time based on safe recursion. Despite the fact that this noninterference policy is Π⁰₁-complete, we show that it can be instantiated to some decidable and conservative instance using shape analysis techniques.

Publisher's Version
CoqQ: Foundational Verification of Quantum Programs
Li Zhou, Gilles Barthe, Pierre-Yves Strub, Junyi Liu, and Mingsheng Ying
(MPI-SP, Germany; Institute of Software at Chinese Academy of Sciences, China; IMDEA Software Institute, Spain; Meta, France; University of Chinese Academy of Sciences, China; Tsinghua University, China)
CoqQ is a framework for reasoning about quantum programs in the Coq proof assistant. Its main components are: a deeply embedded quantum programming language, in which classic quantum algorithms are easily expressed, and an expressive program logic for proving properties of programs. CoqQ is foundational: the program logic is formally proved sound with respect to a denotational semantics based on state-of-the-art mathematical libraries (MathComp and MathComp Analysis). CoqQ is also practical: assertions can use Dirac expressions, which eases concise specifications, and proofs can exploit local and parallel reasoning, which minimizes verification effort. We illustrate the applicability of CoqQ with many examples from the literature.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
A Core Calculus for Equational Proofs of Cryptographic Protocols
Joshua Gancher, Kristina Sojakova, Xiong Fan, Elaine Shi, and Greg Morrisett
(Carnegie Mellon University, USA; Inria, France; Rutgers University, USA; Cornell University, USA)
Many proofs of interactive cryptographic protocols (e.g., as in Universal Composability) operate by proving the protocol at hand to be observationally equivalent to an idealized specification. While pervasive, formal tool support for observational equivalence of cryptographic protocols is still a nascent area of research. Current mechanization efforts tend to either focus on diff-equivalence, which establishes observational equivalence between protocols with identical control structures, or require an explicit witness for the observational equivalence in the form of a bisimulation relation. Our goal is to simplify proofs for cryptographic protocols by introducing a core calculus, IPDL, for cryptographic observational equivalences. Via IPDL, we aim to address a number of theoretical issues for cryptographic proofs in a simple manner, including probabilistic behaviors, distributed message-passing, and resource-bounded adversaries and simulators. We demonstrate IPDL on a number of case studies, including a distributed coin toss protocol, Oblivious Transfer, and the GMW multi-party computation protocol. All proofs of case studies are mechanized via an embedding of IPDL into the Coq proof assistant.

Publisher's Version Published Artifact Info Artifacts Available Artifacts Reusable
Making a Type Difference: Subtraction on Intersection Types as Generalized Record Operations
Han Xu, Xuejing Huang, and Bruno C. d. S. Oliveira
(Peking University, China; University of Hong Kong, China)
In programming languages with records, objects, or traits, it is common to have operators that allow dropping, updating or renaming some components. These operators are useful for programmers to explicitly deal with conflicts and override or update some components. While such operators have been studied for record types, little work has been done to generalize and study their theory for other types.
This paper shows that, given subtyping and disjointness relations, we can specify and derive algorithmic implementations for a general type difference operator that works for other types, including function types, record types and intersection types. When defined in this way, the type difference algebra has many desired properties that are expected from a subtraction operator. Together with a generic merge operator, using type difference we can generalize many operations on records formalized in the literature. To illustrate the usefulness of type difference we create an intermediate calculus with a rich set of operators on expressions of arbitrary type, and demonstrate applications of these operators in CP, a prototype language for Compositional Programming. The semantics of the calculus is given by elaborating into a calculus with disjoint intersection types and a merge operator. We have implemented type difference and all the operators in the CP language. Moreover, all the calculi and related proofs are mechanically formalized in the Coq theorem prover.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Qunity: A Unified Language for Quantum and Classical Computing
Finn Voichick, Liyi Li, Robert Rand, and Michael Hicks
(University of Maryland, USA; University of Chicago, USA; Amazon, USA)
We introduce Qunity, a new quantum programming language designed to treat quantum computing as a natural generalization of classical computing. Qunity presents a unified syntax where familiar programming constructs can have both quantum and classical effects. For example, one can use sum types to implement the direct sum of linear operators, exception-handling syntax to implement projective measurements, and aliasing to induce entanglement. Further, Qunity takes advantage of the overlooked BQP subroutine theorem, allowing one to construct reversible subroutines from irreversible quantum algorithms through the uncomputation of "garbage" outputs. Unlike existing languages that enable quantum aspects with separate add-ons (like a classical language with quantum gates bolted on), Qunity provides a unified syntax and a novel denotational semantics that guarantees that programs are quantum mechanically valid. We present Qunity's syntax, type system, and denotational semantics, showing how it can cleanly express several quantum algorithms. We also detail how Qunity can be compiled into a low-level qubit circuit language like OpenQASM, proving the realizability of our design.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
FlashFill++: Scaling Programming by Example by Cutting to the Chase
José Cambronero, Sumit Gulwani, Vu Le, Daniel Perelman, Arjun Radhakrishna, Clint Simon, and Ashish Tiwari
(Microsoft, USA)
Programming-by-Examples (PBE) involves synthesizing an "intended program" from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed small domain-specific language (DSL) with "effectively-invertible" (EI) operators at the top and "effectively-enumerable" (EE) operators at the bottom. This facilitates an effective combination of top-down synthesis strategy (which backpropagates outputs over various paths in the DSL using inverse functions) with a bottom-up synthesis strategy (which propagates inputs over various paths in the DSL). We address the problem of scaling synthesis to large DSLs with several non-EI/EE operators. This is motivated by the need to support a richer class of transformations and the need for readable code generation. We propose a novel solution strategy that relies on propagating fewer values and over fewer paths.
Our first key idea is that of "cut functions" that prune the set of values being propagated by using knowledge of the sub-DSL on the other side. Cuts can be designed to preserve completeness of synthesis; however, DSL designers may use incomplete cuts to have finer control over the kind of programs synthesized. In either case, cuts make search feasible for non-EI/EE operators and efficient for deep DSLs. Our second key idea is that of "guarded DSLs" that allow a precedence on DSL operators, which dynamically controls exploration of various paths in the DSL. This makes search efficient over grammars with large fanouts without losing recall. It also makes ranking simpler yet more effective in learning an intended program from very few examples. Both cuts and precedence provide a mechanism to the DSL designer to restrict search to a reasonable, and possibly incomplete, space of programs.
Using cuts and gDSLs, we have built FlashFill++, an industrial-strength PBE engine for performing rich string transformations, including datetime and number manipulations. The FlashFill++ gDSL is designed to enable readable code generation in different target languages including Excel's formula language, PowerFx, and Python. We show FlashFill++ is more expressive, more performant, and generates better quality code than comparable existing PBE systems. FlashFill++ is being deployed in several mass-market products ranging from spreadsheet software to notebooks and business intelligence applications, each with millions of users.
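The bottom-up strategy mentioned in this abstract, which propagates example inputs through DSL operators, can be sketched in a few lines. This is a generic enumerative synthesizer with pruning by observed outputs, not the FlashFill++ algorithm: it has no cut functions, no guarded DSL, and no top-down component. The toy DSL (`upper`, `lower`, `first`, `last`, `concat`) and the names `synthesize` and `run` are hypothetical.

```python
from itertools import product

# Hypothetical toy DSL: four unary string operators and one binary operator.
UNARY = {
    "upper": str.upper,
    "lower": str.lower,
    "first": lambda s: s[:1],
    "last": lambda s: s[-1:],
}
BINARY = {"concat": lambda a, b: a + b}

def synthesize(examples, max_rounds=3):
    """Grow a bank of programs, keyed by their outputs on the example inputs."""
    inputs = tuple(i for i, _ in examples)
    target = tuple(o for _, o in examples)
    bank = {inputs: ("x",)}  # ("x",) denotes the input variable
    if target in bank:
        return bank[target]
    for _ in range(max_rounds):
        new = {}
        for name, fn in UNARY.items():
            for outs, prog in bank.items():
                new.setdefault(tuple(fn(v) for v in outs), (name, prog))
        for name, fn in BINARY.items():
            for (o1, p1), (o2, p2) in product(bank.items(), repeat=2):
                vals = tuple(fn(a, b) for a, b in zip(o1, o2))
                new.setdefault(vals, (name, p1, p2))
        # Keep one program per distinct output vector (observational equivalence).
        for vals, prog in new.items():
            bank.setdefault(vals, prog)
        if target in bank:
            return bank[target]
    return None

def run(prog, x):
    """Evaluate a synthesized program (nested tuples) on input string x."""
    if prog == ("x",):
        return x
    name, *args = prog
    vals = [run(a, x) for a in args]
    return (UNARY[name] if name in UNARY else BINARY[name])(*vals)

examples = [("alan", "AN"), ("grace", "GE")]
program = synthesize(examples)
print(program)
print(run(program, "kurt"))
```

Even in this toy setting the bank grows quadratically per round because of the binary operator, which hints at why pruning the propagated values matters for large DSLs.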

Publisher's Version
Witnessability of Undecidable Problems
Shuo Ding and Qirun Zhang
(Georgia Institute of Technology, USA)
Many problems in programming language theory and formal methods are undecidable, so they cannot be solved precisely. Practical techniques for dealing with undecidable problems are often based on decidable approximations. Undecidability implies that those approximations are always imprecise. Typically, practitioners use heuristics and ad hoc reasoning to identify imprecision issues and improve approximations, but there is a lack of computability-theoretic foundations about whether those efforts can succeed.
This paper shows a surprising interplay between undecidability and decidable approximations: there exists a class of undecidable problems, such that it is computable to transform any decidable approximation to a witness input demonstrating its imprecision. We call those undecidable problems witnessable problems. For example, if a program property P is witnessable, then there exists a computable function f_P, such that f_P takes as input the code of any program analyzer targeting P and produces an input program w on which the program analyzer is imprecise. An even more surprising fact is that the class of witnessable problems includes almost all undecidable problems in programming language theory and formal methods. Specifically, we prove the diagonal halting problem K is witnessable, and the class of witnessable problems is closed under complements and many-one reductions. In particular, all “non-trivial semantic properties of programs” mentioned in Rice’s theorem are witnessable. We also explicitly construct a problem in the non-witnessable (and undecidable) class and show that both classes have cardinality 2^ℵ₀.
Our results offer a new perspective on the understanding of undecidability: for witnessable problems, although it is impossible to solve them precisely, it is always possible to improve any decidable approximation to make it closer to the precise solution. This fact formally demonstrates that research efforts on such approximations are promising and shows there exist universal ways to identify precision issues of program analyzers, program verifiers, SMT solvers, etc., because their essences are decidable approximations of witnessable problems.

Publisher's Version
Single-Source-Single-Target Interleaved-Dyck Reachability via Integer Linear Programming
Yuanbo Li, Qirun Zhang, and Thomas Reps
(Georgia Institute of Technology, USA; University of Wisconsin-Madison, USA)
An interleaved-Dyck (InterDyck) language consists of the interleaving of two or more Dyck languages, where each Dyck language represents a set of strings of balanced parentheses. InterDyck-reachability is a fundamental framework for program analyzers that simultaneously track multiple properly-matched pairs of actions such as call/return, lock/unlock, or write-data/read-data. Existing InterDyck-reachability algorithms are based on the well-known tabulation technique.
This paper presents a new perspective on solving InterDyck-reachability. Our key observation is that for the single-source-single-target InterDyck-reachability variant, it is feasible to summarize all paths from the source node to the target node based on path expressions. Therefore, InterDyck-reachability becomes an InterDyck-path-recognition problem over path expressions. Instead of computing summary edges as in traditional tabulation algorithms, this new perspective enables us to express InterDyck-reachability as a parenthesis-counting problem, which can be naturally formulated via integer linear programming (ILP).
We implemented our ILP-based algorithm and performed extensive evaluations based on two client analyses (a reachability analysis for concurrent programs and a taint analysis). In particular, we evaluated our algorithm against two types of algorithms: (1) the general all-pairs InterDyck-reachability algorithms based on linear conjunctive language (LCL) reachability and synchronized pushdown system (SPDS) reachability, and (2) two domain-specific algorithms for both client analyses. The experimental results are encouraging. Our algorithm achieves 1.42×, 28.24×, and 11.76× speedup for the concurrency-analysis benchmarks compared to all-pairs LCL-reachability, SPDS-reachability, and domain-specific tools, respectively; 1.2×, 69.9×, and 0.98× speedup for the taint-analysis benchmarks. Moreover, the algorithm also provides precision improvements, particularly for taint analysis, where it achieves 4.55%, 11.1%, and 6.8% improvement, respectively.
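As a small illustration of the definition of an interleaved-Dyck language given at the start of this abstract, the sketch below checks membership of a single string: a word belongs to the interleaving when its projection onto each component alphabet is balanced. It does not address graph reachability or the ILP formulation described in the paper; the function names and the two bracket alphabets are assumptions for the example.

```python
def is_dyck(word, pairs):
    """Check that `word` is balanced with respect to `pairs` (open -> close)."""
    closers = set(pairs.values())
    stack = []
    for ch in word:
        if ch in pairs:
            stack.append(pairs[ch])  # remember which closer we expect
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
    return not stack

def is_interdyck(word, *languages):
    """Check membership in the interleaving of the given Dyck languages."""
    known = set()
    for pairs in languages:
        known |= set(pairs) | set(pairs.values())
    if any(ch not in known for ch in word):
        return False
    for pairs in languages:
        symbols = set(pairs) | set(pairs.values())
        projection = [ch for ch in word if ch in symbols]
        if not is_dyck(projection, pairs):
            return False
    return True

# Hypothetical component alphabets, e.g. call/return and lock/unlock.
PARENS = {"(": ")"}
BRACKETS = {"[": "]"}

print(is_interdyck("([)]", PARENS, BRACKETS))   # interleaved, each projection balanced
print(is_interdyck("([)", PARENS, BRACKETS))    # bracket projection is unbalanced
print(is_dyck("([)]", {"(": ")", "[": "]"}))    # not a word of a single Dyck language
```

The first and third calls show the difference between the interleaving of two Dyck languages and a single Dyck language over both kinds of brackets.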

Publisher's Version
Higher-Order Leak and Deadlock Free Locks
Jules Jacobs and Stephanie Balzer
(Radboud University Nijmegen, Netherlands; Carnegie Mellon University, USA)
Reasoning about concurrent programs is challenging, especially if data is shared among threads. Program correctness can be violated by the presence of data races, whose prevention has been a topic of concern both in research and in practice. The Rust programming language is a prime example, putting the slogan fearless concurrency into practice by not only employing an ownership-based type system for memory management, but also using its type system to enforce mutual exclusion on shared data. Locking, unfortunately, not only comes at the price of deadlocks but shared access to data may also cause memory leaks.
This paper develops a theory of deadlock and leak freedom for higher-order locks in a shared memory concurrent setting. Higher-order locks allow sharing not only of basic values but also of other locks and channels, and are themselves first-class citizens. The theory is based on the notion of a sharing topology, administrating who is permitted to access shared data at what point in the program. The paper first develops higher-order locks for acyclic sharing topologies, instantiated in a λ-calculus with higher-order locks and message-passing concurrency. The paper then extends the calculus to support circular dependencies with dynamic lock orders, which we illustrate with a dynamic version of Dijkstra’s dining philosophers problem. Well-typed programs in the resulting calculi are shown to be free of deadlocks and memory leaks, with proofs mechanized in the Coq proof assistant.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
A Robust Theory of Series Parallel Graphs
Rajeev Alur, Caleb Stanford, and Christopher Watson
(University of Pennsylvania, USA; University of California at San Diego, USA; University of California at Davis, USA)
Motivated by distributed data processing applications, we introduce a class of labeled directed acyclic graphs constructed using sequential and parallel composition operations, and study automata and logics over them. We show that deterministic and non-deterministic acceptors over such graphs have the same expressive power, which can be equivalently characterized by Monadic Second-Order logic and the graded µ-calculus. We establish closure under composition operations and decision procedures for membership, emptiness, and inclusion. A key feature of our graphs, called synchronized series-parallel graphs (SSPG), is that parallel composition introduces a synchronization edge from the newly introduced source vertex to the sink. The transfer of information enabled by such edges is crucial to the determinization construction, which would not be possible for the traditional definition of series-parallel graphs.
SSPGs allow both ordered ranked parallelism and unordered unranked parallelism. The latter feature means that in the corresponding automata, the transition function needs to account for an arbitrary number of predecessors by counting each type of state only up to a specified constant, thus leading to a notion of counting complexity that is distinct from the classical notion of state complexity. The determinization construction translates a nondeterministic automaton with n states and k counting complexity to a deterministic automaton with 2^(n^2) states and kn counting complexity, and both these bounds are shown to be tight. Furthermore, for nondeterministic automata a bound of 2 on counting complexity suffices without loss of expressiveness.

Publisher's Version
A Compositional Theory of Linearizability
Arthur Oliveira Vale, Zhong Shao, and Yixuan Chen
(Yale University, USA)
Compositionality is at the core of programming languages research and has become an important goal toward scalable verification of large systems. Despite that, there is no compositional account of linearizability, the gold standard of correctness for concurrent objects.
In this paper, we develop a compositional semantics for linearizable concurrent objects. We start by showcasing a common issue, which is independent of linearizability, in the construction of compositional models of concurrent computation: interaction with the neutral element for composition can lead to emergent behaviors, a hindrance to compositionality. Category theory provides a solution for the issue in the form of the Karoubi envelope. Surprisingly, and this is the main discovery of our work, this abstract construction is deeply related to linearizability and leads to a novel formulation of it. Notably, this new formulation neither relies on atomicity nor directly upon happens-before ordering and is only possible because of compositionality, revealing that linearizability and compositionality are intrinsically related to each other.
We use this new, compositional understanding of linearizability to revisit much of the theory of linearizability, providing novel, simple, algebraic proofs of the locality property and of an analogue of the equivalence with observational refinement. We show our techniques can be used in practice by connecting our semantics with a simple program logic that is nonetheless sound with respect to this generalized linearizability.

Publisher's Version Info
Conditional Contextual Refinement
Youngju Song, Minki Cho, Dongjae Lee, Chung-Kil Hur, Michael Sammler, and Derek Dreyer
(Seoul National University, South Korea; MPI-SWS, Germany)
Much work in formal verification of low-level systems is based on one of two approaches: refinement or separation logic. These two approaches have complementary benefits: refinement supports the use of programs as specifications, as well as transitive composition of proofs, whereas separation logic supports conditional specifications, as well as modular ownership reasoning about shared state. A number of verification frameworks employ these techniques in tandem, but in all such cases the benefits of the two techniques remain separate. For example, in frameworks that use relational separation logic to prove contextual refinement, the relational separation logic judgment does not support transitive composition of proofs, while the contextual refinement judgment does not support conditional specifications. In this paper, we propose Conditional Contextual Refinement (or CCR, for short), the first verification system to not only combine refinement and separation logic in a single framework but also to truly marry them together into a unified mechanism enjoying all the benefits of refinement and separation logic simultaneously. Specifically, unlike in prior work, CCR’s refinement specifications are both conditional (with separation logic pre- and post-conditions) and transitively composable. We implement CCR in Coq and evaluate its effectiveness on a range of interesting examples.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Tail Recursion Modulo Context: An Equational Approach
Daan Leijen and Anton Lorenzen
(Microsoft Research, USA; University of Edinburgh, UK)
The tail-recursion modulo cons transformation can rewrite functions that are not quite tail-recursive into a tail-recursive form that can be executed efficiently. In this article we generalize tail recursion modulo cons (TRMc) to modulo contexts (TRMC), and calculate a general TRMC algorithm from its specification. We can instantiate our general algorithm by providing an implementation of application and composition on abstract contexts, and showing that our context laws hold. We provide some known instantiations of TRMC, namely modulo evaluation contexts (CPS), and associative operations, and further instantiations not so commonly associated with TRMC, such as defunctionalized evaluation contexts, monoids, semirings, exponents, and cons products. We study the modulo cons instantiation in particular and prove that an instantiation using Minamide’s hole calculus is sound. We also calculate a second instantiation in terms of the Perceus heap semantics to precisely reason about the soundness of in-place update. While all previous approaches to TRMc fail in the presence of non-linear control (for example induced by call/cc, shift/reset or algebraic effect handlers), we can elegantly extend the heap semantics to a hybrid approach which dynamically adapts to non-linear control flow. We have a full implementation of hybrid TRMc in the Koka language and our benchmark shows the TRMc transformed functions are always as fast or faster than using manual alternatives.
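To make the modulo cons case concrete, the following is a minimal hand-written sketch in Python, not the paper's calculated algorithm and not Koka code. All names (`Cons`, `map_naive`, `map_trmc`) are hypothetical. It contrasts a `map` whose recursive call sits under a constructor with a version that allocates the cell first, leaves a hole in its tail, and fills the hole on the next step.

```python
# Illustrative sketch only: Cons, map_naive and map_trmc are hypothetical
# names, not taken from the paper or from the Koka implementation.

class Cons:
    """A mutable cons cell; None stands for the empty list."""
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

def from_list(xs):
    """Build a linked list of Cons cells from a Python list."""
    result = None
    for x in reversed(xs):
        result = Cons(x, result)
    return result

def to_list(cell):
    """Convert a linked list of Cons cells back to a Python list."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out

def map_naive(f, xs):
    """Not tail-recursive: the Cons is built after the recursive call returns."""
    if xs is None:
        return None
    return Cons(f(xs.head), map_naive(f, xs.tail))

def map_trmc(f, xs):
    """The same function after a hand-applied modulo-cons rewrite:
    each cell is allocated with a hole in its tail, and the hole is
    filled on the next iteration, so the call stack does not grow."""
    dummy = Cons(None)   # dummy.tail is the initial hole
    hole = dummy
    while xs is not None:
        cell = Cons(f(xs.head))  # cell.tail is the new hole
        hole.tail = cell         # fill the previous hole
        hole = cell
        xs = xs.tail
    return dummy.tail
```

The loop in `map_trmc` relies on in-place update of a cell that has not yet been shared, which is the kind of reasoning the abstract attributes to the heap-semantics instantiation.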

Publisher's Version
Top-Down Synthesis for Library Learning
Matthew Bowers, Theo X. Olausson, Lionel Wong, Gabriel Grand, Joshua B. Tenenbaum, Kevin Ellis, and Armando Solar-Lezama
(Massachusetts Institute of Technology, USA; Cornell University, USA)
This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain specific language (DSL). The algorithm builds abstractions directly from initial DSL primitives, using syntactic pattern matching of intermediate abstractions to intelligently prune the search space and guide the algorithm towards abstractions that maximally capture shared structures in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3-4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch’s scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches and show empirically that it is robust to terminating the search procedure early—further allowing it to scale to challenging datasets by means of early stopping.

Publisher's Version Published Artifact Archive submitted (1.6 MB) Info Artifacts Available Artifacts Reusable
Admissible Types-to-PERs Relativization in Higher-Order Logic
Andrei Popescu and Dmitriy Traytel
(University of Sheffield, UK; University of Copenhagen, Denmark)
Relativizing statements in Higher-Order Logic (HOL) from types to sets is useful for improving productivity when working with HOL-based interactive theorem provers such as HOL4, HOL Light and Isabelle/HOL. This paper provides the first comprehensive definition and study of types-to-sets relativization in HOL, done in the more general form of types-to-PERs (partial equivalence relations). We prove that, for a large practical fragment of HOL which includes container types such as datatypes and codatatypes, types-to-PERs relativization is admissible, in that the provability of the original, type-based statement implies the provability of its relativized, PER-based counterpart. Our results also imply the admissibility of a previously proposed axiomatic extension of HOL with local type definitions. We have implemented types-to-PERs relativization as an Isabelle tool that performs relativization of HOL theorems on demand.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
You Only Linearize Once: Tangents Transpose to Gradients
Alexey Radul, Adam Paszke, Roy Frostig, Matthew J. Johnson, and Dougal Maclaurin
(Google Research, USA; Google Research, Poland)
Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two “modes”—forward and reverse—which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part.
To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping let expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD.
We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).

Publisher's Version
When Less Is More: Consequence-Finding in a Weak Theory of Arithmetic
Zachary Kincaid, Nicolas Koh, and Shaowei Zhu
(Princeton University, USA)
This paper presents a theory of non-linear integer/real arithmetic and algorithms for reasoning about this theory. The theory can be conceived of as an extension of linear integer/real arithmetic with a weakly-axiomatized multiplication symbol, which retains many of the desirable algorithmic properties of linear arithmetic. In particular, we show that the conjunctive fragment of the theory can be effectively manipulated (analogously to the usual operations on convex polyhedra, the conjunctive fragment of linear arithmetic). As a result, we can solve the following consequence-finding problem: given a ground formula F, find the strongest conjunctive formula that is entailed by F. As an application of consequence-finding, we give a loop invariant generation algorithm that is monotone with respect to the theory and (in a sense) complete. Experiments show that the invariants generated from the consequences are effective for proving safety properties of programs that require non-linear reasoning.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Dynamic Race Detection with O(1) Samples
Mosaad Al Thokair, Minjian Zhang, Umang Mathur, and Mahesh Viswanathan
(University of Illinois at Urbana-Champaign, USA; National University of Singapore, Singapore)
Happens-before-based dynamic analysis is the go-to technique for detecting data races in large-scale software projects due to the absence of false positive reports. However, such analyses are costly since they employ expensive vector clock updates at each event, rendering them usable only for in-house testing. In this paper, we present a sampling-based, randomized race detector that processes only constantly many events of the input trace even in the worst case. This is the first sub-linear time (i.e., running in o(n) time where n is the length of the trace) dynamic race detection algorithm; previous sampling-based approaches run in linear time (i.e., O(n)). Our algorithm is a property tester for ε-race detection: it is sound in that it never reports any false positive, and on traces that are far, with respect to Hamming distance, from any race-free trace, the algorithm detects an ε-race with high probability. Our experimental evaluation of the algorithm and its comparison with state-of-the-art deterministic and sampling-based race detectors shows that the algorithm does indeed have significantly low running time, and detects races quite often.
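For contrast with the sampling approach, here is a small sketch of the baseline the abstract refers to: a happens-before detector that maintains vector clocks and inspects every event of the trace. It is a simplified textbook-style version, not the paper's algorithm; the trace format and the name `detect_hb_races` are assumptions made for illustration.

```python
# Illustrative baseline only (not the paper's sampling algorithm).
# Assumed trace format: a list of (thread, op, target) with
# op in {"acq", "rel", "rd", "wr"}; target is a lock or a variable name.

def detect_hb_races(trace, num_threads):
    """Return index pairs (i, j), i < j, of conflicting accesses to the same
    variable that are not ordered by happens-before."""
    # clocks[t] is the vector clock of thread t.
    clocks = [[0] * num_threads for _ in range(num_threads)]
    for t in range(num_threads):
        clocks[t][t] = 1
    lock_clock = {}   # lock -> vector clock at its last release
    accesses = {}     # variable -> list of (index, thread, local time, is_write)
    races = []
    for j, (t, op, x) in enumerate(trace):
        c = clocks[t]
        if op == "acq":
            released = lock_clock.get(x)
            if released is not None:
                # Join: learn everything that happened before the release.
                clocks[t] = [max(a, b) for a, b in zip(c, released)]
        elif op == "rel":
            lock_clock[x] = list(c)
            c[t] += 1
        else:
            is_write = (op == "wr")
            for i, u, time_u, was_write in accesses.get(x, []):
                conflicting = is_write or was_write
                ordered = time_u <= c[u]   # earlier access is known to thread t
                if conflicting and not ordered:
                    races.append((i, j))
            accesses.setdefault(x, []).append((j, t, c[t], is_write))
    return races
```

For example, `detect_hb_races([(0, "wr", "x"), (1, "rd", "x")], 2)` reports the pair `(0, 1)`, while protecting both accesses with the same lock yields no report. Every event is processed, which is the linear cost that a sampling-based detector aims to avoid.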

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Affine Monads and Lazy Structures for Bayesian Programming
Swaraj Dash, Younesse Kaddar, Hugo Paquet, and Sam Staton
(University of Oxford, UK)
We show that streams and lazy data structures are a natural idiom for programming with infinite-dimensional Bayesian methods such as Poisson processes, Gaussian processes, jump processes, Dirichlet processes, and Beta processes. The crucial semantic idea, inspired by developments in synthetic probability theory, is to work with two separate monads: an affine monad of probability, which supports laziness, and a commutative, non-affine monad of measures, which does not. (Affine means that T(1) ≅ 1.) We show that the separation is important from a decidability perspective, and that the recent model of quasi-Borel spaces supports these two monads.
To perform Bayesian inference with these examples, we introduce new inference methods that are specially adapted to laziness; they are proven correct by reference to the Metropolis-Hastings-Green method. Our theoretical development is implemented as a Haskell library, LazyPPL.

Publisher's Version Published Artifact Info Artifacts Available Artifacts Reusable
Dargent: A Silver Bullet for Verified Data Layout Refinement
Zilin Chen, Ambroise Lafont, Liam O'Connor, Gabriele Keller, Craig McLaughlin, Vincent Jackson, and Christine Rizkallah
(UNSW, Australia; University of Cambridge, UK; University of Edinburgh, UK; Utrecht University, Netherlands; University of Melbourne, Australia)
Systems programmers need fine-grained control over the memory layout of data structures, both to produce performant code and to comply with well-defined interfaces imposed by existing code, standardised protocols or hardware. Code that manipulates these low-level representations in memory is hard to get right. Traditionally, this problem is addressed by the implementation of tedious marshalling code to convert between compiler-selected data representations and the desired compact data formats. Such marshalling code is error-prone and can lead to a significant runtime overhead due to excessive copying. While there are many languages and systems that address the correctness issue, by automating the generation and, in some cases, the verification of the marshalling code, the performance overhead introduced by the marshalling code remains. In particular for systems code, this overhead can be prohibitive. In this work, we address both the correctness and the performance problems.
We present a data layout description language and data refinement framework, called Dargent, which allows programmers to declaratively specify how algebraic data types are laid out in memory. Our solution is applied to the Cogent language, but the general ideas behind our solution are applicable to other settings. The Dargent framework generates C code that manipulates data directly with the desired memory layout, while retaining the formal proof that this generated C code is correct with respect to the functional semantics. This added expressivity removes the need for implementing and verifying marshalling code, which eliminates copying, smoothens interoperability with surrounding systems, and increases the trustworthiness of the overall system.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Recursive Subtyping for All
Litao Zhou, Yaoda Zhou, and Bruno C. d. S. Oliveira
(University of Hong Kong, China)
Recursive types and bounded quantification are prominent features in many modern programming languages, such as Java, C#, Scala or TypeScript. Unfortunately, the interaction between recursive types, bounded quantification and subtyping has been shown to be problematic in the past. Consequently, defining a simple foundational calculus that combines those features and has desirable properties, such as decidability, transitivity of subtyping, conservativity and a sound and complete algorithmic formulation, has been a long-standing challenge.
This paper presents an extension of kernel F≤, called F≤µ, with iso-recursive types. F≤ is a well-known polymorphic calculus with bounded quantification. In F≤µ we add iso-recursive types, and correspondingly extend the subtyping relation with iso-recursive subtyping using the recently proposed nominal unfolding rules. We also add two smaller extensions to F≤. The first one is a generalization of the kernel F≤ rule for bounded quantification that accepts equivalent rather than equal bounds. The second extension is the use of so-called structural folding/unfolding rules, inspired by the structural unfolding rule proposed by Abadi, Cardelli, and Viswanathan [1996]. The structural rules add expressive power to the more conventional folding/unfolding rules in the literature, and they enable additional applications. We present several results, including: type soundness; transitivity and decidability of subtyping; the conservativity of F≤µ over F≤; and a sound and complete algorithmic formulation of F≤µ. Moreover, we study an extension of F≤µ, called F≤≥µ, which includes lower bounded quantification in addition to the conventional (upper) bounded quantification of F≤. All the results in this paper have been formalized in the Coq theorem prover.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Stratified Commutativity in Verification Algorithms for Concurrent Programs
Azadeh Farzan, Dominik Klumpp, and Andreas Podelski
(University of Toronto, Canada; University of Freiburg, Germany)
The importance of exploiting commutativity relations in verification algorithms for concurrent programs is well-known. They can help simplify the proof and improve the time and space efficiency. This paper studies commutativity relations as a first-class object in the setting of verification algorithms for concurrent programs. A first contribution is a general framework for abstract commutativity relations. We introduce a general soundness condition for commutativity relations, and present a method to automatically derive sound abstract commutativity relations from a given proof. The method can be used in a verification algorithm based on abstraction refinement to compute a new commutativity relation in each iteration of the abstraction refinement loop. A second result is a general proof rule that allows one to combine multiple commutativity relations, with incomparable power, in a stratified way that preserves soundness and allows one to profit from the full power of the combined relations. We present an algorithm for the stratified proof rule that performs an optimal combination (in a sense made formal), enabling usage of stratified commutativity in algorithmic verification. We empirically evaluate the impact of abstract commutativity and stratified combination of commutativity relations on verification algorithms for concurrent programs.

Publisher's Version Archive submitted (780 kB)
Type-Preserving, Dependence-Aware Guide Generation for Sound, Effective Amortized Probabilistic Inference
Jianlin Li, Leni Ven, Pengyuan Shi, and Yizhou Zhang
(University of Waterloo, Canada)
In probabilistic programming languages (PPLs), a critical step in optimization-based inference methods is constructing, for a given model program, a trainable guide program. Soundness and effectiveness of inference rely on constructing good guides, but the expressive power of a universal PPL poses challenges. This paper introduces an approach to automatically generating guides for deep amortized inference in a universal PPL. Guides are generated using a type-directed translation per a novel behavioral type system. Guide generation extracts and exploits independence structures using a syntactic approach to conditional independence, with a semantic account left to further work. Despite the control-flow expressiveness allowed by the universal PPL, generated guides are guaranteed to satisfy a critical soundness condition and moreover, consistently improve training and inference over state-of-the-art baselines for a suite of benchmarks.

Publisher's Version
Quantitative Inhabitation for Different Lambda Calculi in a Unifying Framework
Victor Arrial, Giulio Guerrieri, and Delia Kesner
(Université Paris Cité - CNRS - IRIF, France; Aix Marseille Université - CNRS - LIS, France; Edinburgh Research Centre - Central Software Institute - Huawei, UK; Institut Universitaire de France, France)
We solve the inhabitation problem for a language called λ!, a subsuming paradigm (inspired by call-by-push-value) that can encode, among others, the call-by-name and call-by-value strategies of functional programming. The type specification uses a non-idempotent intersection type system, which is able to capture quantitative properties about the dynamics of programs. As an application, we show how our general methodology can be used to derive inhabitation algorithms for different lambda-calculi that are encodable into λ!.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Fast Coalgebraic Bisimilarity Minimization
Jules Jacobs and Thorsten Wißmann
(Radboud University Nijmegen, Netherlands)
Coalgebraic bisimilarity minimization generalizes classical automaton minimization to a large class of automata whose transition structure is specified by a functor, subsuming strong, weighted, and probabilistic bisimilarity. This offers the enticing possibility of turning bisimilarity minimization into an off-the-shelf technology, without having to develop a new algorithm for each new type of automaton. Unfortunately, there is no existing algorithm that is fully general, efficient, and able to handle large systems.
We present a generic algorithm that minimizes coalgebras over an arbitrary functor in the category of sets as long as the action on morphisms is sufficiently computable. The algorithm makes at most O(m log n) calls to the functor-specific action, where n is the number of states and m is the number of transitions in the coalgebra.
While more specialized algorithms can be asymptotically faster than our algorithm (usually by a factor of (m/n)), our algorithm is especially well suited to efficient implementation, and our tool often uses much less time and memory on existing benchmarks, and can handle larger automata, despite being more generic.
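The role of the functor-specific action can be illustrated with a naive signature-based partition refinement, sketched below in Python. This is not the paper's algorithm and does not achieve its O(m log n) bound: it recomputes every signature in every round. The names `minimize` and `fmap`, and the encoding of a coalgebra as a dictionary, are assumptions made for the example.

```python
# Illustrative naive refinement, not the algorithm presented in the paper.

def minimize(states, structure, fmap):
    """Compute a partition of `states` by repeated signature refinement.

    structure[s] is the successor structure of state s (an element of F(S)).
    fmap(t, g) is the functor's action on morphisms: it applies g to every
    state occurring in t and must return a hashable value.
    Returns a dict mapping each state to its block number.
    """
    block = {s: 0 for s in states}
    num_blocks = 1
    while True:
        ids = {}
        new_block = {}
        for s in states:
            # Keeping the old block in the signature guarantees that the
            # new partition refines the old one.
            sig = (block[s], fmap(structure[s], block.__getitem__))
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(ids) == num_blocks:   # no block was split: fixpoint reached
            return new_block
        block, num_blocks = new_block, len(ids)

def lts_fmap(transitions, g):
    """Functor action for labelled transition systems: sets of (label, state)."""
    return frozenset((label, g(target)) for label, target in transitions)
```

With `structure = {0: {("a", 1)}, 1: set(), 2: {("a", 3)}, 3: set(), 4: {("b", 1)}}`, the call `minimize(range(5), structure, lts_fmap)` places states 0 and 2 in one block, 1 and 3 in another, and 4 in a block of its own. Supporting another kind of automaton only requires a different `fmap`.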

Publisher's Version Published Artifact Artifacts Available Artifacts Functional
An Operational Approach to Library Abstraction under Relaxed Memory Concurrency
Abhishek Kr Singh and Ori Lahav
(Tel Aviv University, Israel)
Concurrent data structures and synchronization mechanisms implemented by expert developers are indispensable for modular software development. In this paper, we address the fundamental problem of library abstraction under weak memory concurrency, and identify a general library correctness condition allowing clients of the library to reason about program behaviors using the specification code, which is often much simpler than the concrete implementation. We target (a fragment of) the RC11 memory model, and develop an equivalent operational presentation that exposes knowledge propagation between threads, and is sufficiently expressive to capture library behaviors as totally ordered operational execution traces. We further introduce novel access modes to the language that allow intricate specifications accounting for library internal synchronization that is not exposed to the client, as well as the library's demands on external synchronization by the client. We illustrate applications of our approach in several examples of different natures.

Publisher's Version
Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations
Tom J. Smeding and Matthijs I. L. Vákár
(Utrecht University, Netherlands)
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98.
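The two pairings described above can be sketched directly. The Python below is a naive illustration, not the optimised algorithm of the paper and not its Haskell implementation; the class names `Fwd` and `Rev` and the helper `gradient` are hypothetical. In this unoptimised form a backpropagator may be called once per use of a shared value, which is the kind of inefficiency the abstract sets out to remove.

```python
# Illustrative sketch only: Fwd, Rev, rev_input and gradient are hypothetical
# names; no optimisation from the paper is applied here.

class Fwd:
    """Forward mode: a scalar paired with its tangent."""
    def __init__(self, val, tan):
        self.val, self.tan = val, tan
    def __add__(self, other):
        return Fwd(self.val + other.val, self.tan + other.tan)
    def __mul__(self, other):
        return Fwd(self.val * other.val,
                   self.tan * other.val + self.val * other.tan)

def add_grads(g1, g2):
    """Pointwise sum of two sparse gradients (dicts from input index to value)."""
    out = dict(g1)
    for k, v in g2.items():
        out[k] = out.get(k, 0.0) + v
    return out

class Rev:
    """Reverse mode: a scalar paired with a backpropagator, a function from
    the cotangent of this value to gradient contributions for the inputs."""
    def __init__(self, val, bp):
        self.val, self.bp = val, bp
    def __add__(self, other):
        return Rev(self.val + other.val,
                   lambda d: add_grads(self.bp(d), other.bp(d)))
    def __mul__(self, other):
        return Rev(self.val * other.val,
                   lambda d: add_grads(self.bp(d * other.val),
                                       other.bp(d * self.val)))

def rev_input(i, val):
    """The i-th input: its backpropagator records the cotangent at index i."""
    return Rev(val, lambda d: {i: d})

def gradient(f, *args):
    """Gradient of f at args, obtained by one call to the output's backpropagator."""
    out = f(*[rev_input(i, a) for i, a in enumerate(args)])
    return out.bp(1.0)

def example(x, y):
    return x * y + x * x
```

For `example` at the point (3, 4), forward mode with tangent (1, 0) gives the partial derivative 10 with respect to the first argument, and `gradient(example, 3.0, 4.0)` gives `{0: 10.0, 1: 3.0}` in a single reverse pass.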

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
A Partial Order View of Message-Passing Communication Models
Cinzia Di Giusto, Davide Ferré, Laetitia Laversa, and Etienne Lozes
(Université Côte d'Azur, France; CNRS, France)
There is a wide variety of message-passing communication models, ranging from synchronous "rendez-vous" communications to fully asynchronous/out-of-order communications. For large-scale distributed systems, the communication model is determined by the transport layer of the network, and a few classes of orders of message delivery (FIFO, causally ordered) have been identified in the early days of distributed computing. For local-scale message-passing applications, e.g., running on a single machine, the communication model may be determined by the actual implementation of message buffers and by how FIFO queues are used. While large-scale communication models, such as causal ordering, are defined by logical axioms, local-scale models are often defined by an operational semantics. In this work, we connect these two approaches, and we present a unified hierarchy of communication models encompassing both large-scale and local-scale models, based on their concurrent behaviors. We also show that all the communication models we consider can be axiomatized in the monadic second order logic, and may therefore benefit from several bounded verification techniques based on bounded special treewidth.

Publisher's Version
Combining Functional and Automata Synthesis to Discover Causal Reactive Programs
Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, and Zenna Tavares
(Stanford University, USA; Massachusetts Institute of Technology, USA; Basis, USA; Columbia University, USA)
We present a new algorithm that synthesizes functional reactive programs from observation data. The key novelty is to iterate between a functional synthesis step, which attempts to generate a transition function over observed states, and an automata synthesis step, which adds any additional latent state necessary to fully account for the observations. We develop a functional reactive DSL called Autumn that can express a rich variety of causal dynamics in time-varying, Atari-style grid worlds, and apply our method to synthesize Autumn programs from data. We evaluate our algorithm on a benchmark suite of 30 Autumn programs as well as a third-party corpus of grid-world-style video games. We find that our algorithm synthesizes 27 out of 30 programs in our benchmark suite and 21 out of 27 programs from the third-party corpus, including several programs describing complex latent state transformations, and from input traces containing hundreds of observations. We expect that our approach will provide a template for how to integrate functional and automata synthesis in other induction domains.

Publisher's Version Archive submitted (780 kB)
An Order-Theoretic Analysis of Universe Polymorphism
Kuen-Bang Hou (Favonia), Carlo Angiuli, and Reed Mullanix
(University of Minnesota, USA; Carnegie Mellon University, USA)
We present a novel formulation of universe polymorphism in dependent type theory in terms of monads on the category of strict partial orders, and a novel algebraic structure, displacement algebras, on top of which one can implement a generalized form of McBride’s “crude but effective stratification” scheme for lightweight universe polymorphism. We give some examples of exotic but consistent universe hierarchies, and prove that every universe hierarchy in our sense can be embedded in a displacement algebra and hence implemented via our generalization of McBride’s scheme. Many of our technical results are mechanized in Agda, and we have an OCaml library for universe levels based on displacement algebras, for use in proof assistant implementations.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Statically Resolvable Ambiguity
Viktor Palmkvist, Elias Castegren, Philipp Haller, and David Broman
(KTH Royal Institute of Technology, Sweden; Uppsala University, Sweden)
Traditionally, a grammar defining the syntax of a programming language is both context free and unambiguous. However, recent work suggests that an attractive alternative is to use ambiguous grammars, thus postponing the task of resolving the ambiguity to the end user. If all programs accepted by an ambiguous grammar can be rewritten unambiguously, then the parser for the grammar is said to be resolvably ambiguous. Guaranteeing resolvable ambiguity statically, that is, for all programs, is hard, and previous work only solves it partially using techniques based on property-based testing. In this paper, we present the first efficient, practical, and proven correct solution to the statically resolvable ambiguity problem. Our approach introduces several key ideas, including splittable productions, operator sequences, and the concept of a grouper that works in tandem with a standard parser. We prove static resolvability using a Coq mechanization and demonstrate its efficiency and practical applicability by implementing and integrating resolvable ambiguity into an essential part of the standard OCaml parser.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
The Fine-Grained Complexity of CFL Reachability
Paraschos Koutris and Shaleen Deep
(University of Wisconsin-Madison, USA; Microsoft, USA)
Many problems in static program analysis can be modeled as the context-free language (CFL) reachability problem on directed labeled graphs. The CFL reachability problem can be generally solved in time O(n^3), where n is the number of vertices in the graph, with some specific cases that can be solved faster. In this work, we ask the following question: given a specific CFL, what is the exact exponent in the monomial of the running time? In other words, for which cases do we have linear, quadratic or cubic algorithms, and are there problems with intermediate runtimes? This question is inspired by recent efforts to classify classic problems in terms of their exact polynomial complexity, known as fine-grained complexity. Although recent efforts have shown some conditional lower bounds (mostly for the class of combinatorial algorithms), a general picture of the fine-grained complexity landscape for CFL reachability is missing.
Our main contribution is lower bound results that pinpoint the exact running time of several classes of CFLs or specific CFLs under widely believed lower bound conjectures (e.g., Boolean Matrix Multiplication, k-Clique, APSP, 3SUM). We particularly focus on the family of Dyck-k languages (which are strings with well-matched parentheses), a fundamental class of CFL reachability problems. Remarkably, we are able to show an Ω(n^2.5) lower bound for Dyck-2 reachability, which to the best of our knowledge is the first super-quadratic lower bound that applies to all algorithms, and shows that CFL reachability is strictly harder than Boolean Matrix Multiplication. We also present new lower bounds for the case of sparse input graphs where the number of edges m is the input parameter, a common setting in the database literature. For this setting, we show a cubic lower bound for Andersen’s Pointer Analysis which significantly strengthens prior known results.
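As a concrete instance of the problem being classified, the sketch below computes Dyck-1 reachability with a plain worklist fixpoint over the grammar S → ε | S S | ( S ). It is a straightforward illustration of the general technique, not an algorithm from the paper, and it is not tuned to match any of the bounds discussed. The function name `dyck1_reach` and the edge encoding are assumptions.

```python
# Illustrative sketch only: a simple worklist fixpoint, with hypothetical names.

def dyck1_reach(nodes, edges):
    """Return all pairs (u, v) such that some path from u to v spells a
    balanced string of parentheses.

    edges is an iterable of (u, label, v) with label "(" or ")".
    """
    open_in = {}     # v -> sources u of edges u -(-> v
    close_out = {}   # u -> targets v of edges u -)-> v
    for u, label, v in edges:
        if label == "(":
            open_in.setdefault(v, []).append(u)
        else:
            close_out.setdefault(u, []).append(v)

    reach = set()
    succ = {v: set() for v in nodes}   # succ[u] = {v | (u, v) in reach}
    pred = {v: set() for v in nodes}   # pred[v] = {u | (u, v) in reach}
    worklist = []

    def add(u, v):
        if (u, v) not in reach:
            reach.add((u, v))
            succ[u].add(v)
            pred[v].add(u)
            worklist.append((u, v))

    for v in nodes:                    # S -> epsilon
        add(v, v)

    while worklist:
        u, v = worklist.pop()
        # S -> ( S ): wrap the balanced path u..v in a matching pair.
        for a in open_in.get(u, ()):
            for b in close_out.get(v, ()):
                add(a, b)
        # S -> S S: extend on the left and on the right.
        for w in list(pred[u]):
            add(w, v)
        for w in list(succ[v]):
            add(u, w)
    return reach
```

On the path 0 -(-> 1 -(-> 2 -)-> 3 -)-> 4, the only non-trivial pairs found are (1, 3) and (0, 4), corresponding to the strings "()" and "(())".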

Publisher's Version
Taking Back Control in an Intermediate Representation for GPU Computing
Vasileios Klimis, Jack Clark, Alan Baker, David Neto, John Wickerson, and Alastair F. Donaldson
(Imperial College London, UK; Google, Canada; Google, UK)
We describe our experiences successfully applying lightweight formal methods to substantially improve and reformulate an important part of the Standard Portable Intermediate Representation (SPIR-V), an industry-standard language for GPU computing. The formal model that we present has allowed us to (1) identify several ambiguities and needless complexities in the way that structured control flow was defined in the SPIR-V specification; (2) interact with the authors of the SPIR-V specification to rectify these problems; (3) validate the developer tools and conformance test suites that support the SPIR-V language by cross-checking them against our formal model, improving the tools, test suites, and our models in the process; and (4) develop a novel method for fuzzing SPIR-V compilers to detect miscompilation bugs that leverages our formal model. The latest release of the SPIR-V specification incorporates the revised set of control-flow definitions that have arisen from our work. Furthermore, our novel compiler-fuzzing technique has led to the discovery of twenty distinct, previously unknown bugs in SPIR-V compilers from Google, the Khronos Group, Intel, and Mozilla. Our work showcases the practical impact that formal modelling and analysis techniques can have on the design and implementation of industry-standard programming languages.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Choice Trees: Representing Nondeterministic, Recursive, and Impure Programs in Coq
Nicolas Chappe, Paul He, Ludovic Henrio, Yannick Zakowski, and Steve Zdancewic
(University of Lyon - ENS Lyon - UCBL - CNRS - Inria - LIP, France; University of Pennsylvania, USA)
This paper introduces ctrees, a monad for modeling nondeterministic, recursive, and impure programs in Coq. Inspired by Xia et al.'s itrees, this novel data structure embeds computations into coinductive trees with three kinds of nodes: external events, and two variants of nondeterministic branching. This apparent redundancy allows us to provide a shallow embedding of denotational models with internal choice in the style of CCS, while recovering an inductive LTS view of the computation. ctrees inherit a vast collection of bisimulation and refinement tools, with respect to which we establish a rich equational theory.
We connect ctrees to the itree infrastructure by showing how a monad morphism embedding the former into the latter makes it possible to use ctrees to implement nondeterministic effects. We demonstrate the utility of ctrees by using them to model concurrency semantics in two case studies: CCS and cooperative multithreading.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Hefty Algebras: Modular Elaboration of Higher-Order Algebraic Effects
Casper Bach Poulsen and Cas van der Rest
(Delft University of Technology, Netherlands)
Algebraic effects and handlers is an increasingly popular approach to programming with effects. An attraction of the approach is its modularity: effectful programs are written against an interface of declared operations, which allows the implementation of these operations to be defined and refined without changing or recompiling programs written against the interface. However, higher-order operations (i.e., operations that take computations as arguments) break this modularity. While it is possible to encode higher-order operations by elaborating them into more primitive algebraic effects and handlers, such elaborations are typically not modular. In particular, operations defined by elaboration are typically not a part of any effect interface, so we cannot define and refine their implementation without changing or recompiling programs. To resolve this problem, a recent line of research focuses on developing new and improved effect handlers. In this paper we present a (surprisingly) simple alternative solution to the modularity problem with higher-order operations: we modularize the previously non-modular elaborations commonly used to encode higher-order operations. Our solution is as expressive as the state of the art in effects and handlers.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Elements of Quantitative Rewriting
Francesco Gavazzo and Cecilia Di Florio
(University of Pisa, Italy; University of Bologna, Italy)
We introduce a general theory of quantitative and metric rewriting systems, namely systems with a rewriting relation enriched over quantales modelling abstract quantities. We develop theories of abstract and term-based systems, refining cornerstone results of rewriting theory (such as Newman’s Lemma, Church-Rosser Theorem, and critical pair-like lemmas) to a metric and quantitative setting. To avoid distance trivialisation and lack of confluence issues, we introduce non-expansive, linear term rewriting systems, and then generalise the latter to the novel class of graded term rewriting systems. These systems make quantitative rewriting modal and context-sensitive, thereby endowing rewriting with coeffectful behaviours.

Publisher's Version
Deconstructing the Calculus of Relations with Tape Diagrams
Filippo Bonchi, Alessandro Di Giorgio, and Alessio Santamaria
(University of Pisa, Italy; University of Sussex, UK)
Rig categories with finite biproducts are categories with two monoidal products, where one is a biproduct and the other distributes over it. In this work we present tape diagrams, a sound and complete diagrammatic language for these categories, which can be intuitively thought of as string diagrams of string diagrams. We test the effectiveness of our approach against the positive fragment of Tarski's calculus of relations.

Publisher's Version
SSA Translation Is an Abstract Interpretation
Matthieu Lemerre
(Université Paris-Saclay - CEA LIST, France)
Static single assignment (SSA) form is a popular intermediate representation that helps implement useful static analyses, including global value numbering (GVN), sparse dataflow analyses, or SMT-based abstract interpretation or model checking. However, the precision of the SSA translation itself depends on static analyses, and a priori static analysis is even indispensable in the case of low-level input languages like machine code.
To solve this chicken-and-egg problem, we propose to turn the SSA translation into a standard static analysis based on abstract interpretation. This allows the SSA translation to be combined with other static analyses in a single pass, taking advantage of the fact that it is more precise to combine analyses than to apply passes in sequence.
We illustrate the practicality of these results by writing a simple dataflow analysis that performs SSA translation, optimistic global value numbering, sparse conditional constant propagation, and loop-invariant code motion in a single small pass; and by presenting a multi-language static analyzer for both C and machine code that uses the SSA abstract domain as its main intermediate representation.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
Probabilistic Resource-Aware Session Types
Ankush Das, Di Wang, and Jan Hoffmann
(Amazon, USA; Carnegie Mellon University, USA)
Session types guarantee that message-passing processes adhere to predefined communication protocols. Prior work on session types has focused on deterministic languages but many message-passing systems, such as Markov chains and randomized distributed algorithms, are probabilistic. To implement and analyze such systems, this article develops the meta theory of probabilistic session types with an application focus on automatic expected resource analysis. Probabilistic session types describe probability distributions over messages and are a conservative extension of intuitionistic (binary) session types. To send on a probabilistic channel, processes have to utilize internal randomness from a probabilistic branching or external randomness from receiving on a probabilistic channel. The analysis for expected resource bounds is smoothly integrated with the type system and is a variant of automatic amortized resource analysis. Type inference relies on linear constraint solving to automatically derive symbolic bounds for various cost metrics. The technical contributions include the meta theory that is based on a novel nested multiverse semantics and a type-reconstruction algorithm that allows flexible mixing of different sources of randomness without burdening the programmer with complex type annotations. The type system has been implemented in the language NomosPro with linear-time type checking. Experiments demonstrate that NomosPro is applicable in different domains such as cost analysis of randomized distributed algorithms, analysis of Markov chains, probabilistic analysis of amortized data structures and digital contracts. NomosPro is also shown to be scalable by (i) implementing two broadcast protocols and a bounded retransmission protocol where messages are dropped with a fixed probability, and (ii) verifying the limiting distribution of a Markov chain with 64 states and 420 transitions.

Publisher's Version Published Artifact Artifacts Available Artifacts Reusable
A Calculus for Amortized Expected Runtimes
Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Lena Verscht
(RWTH Aachen University, Germany; Saarland University, Germany; University College London, UK; DTU, Denmark)
We develop a weakest-precondition-style calculus à la Dijkstra for reasoning about amortized expected runtimes of randomized algorithms with access to dynamic memory: the aert calculus. Our calculus is truly quantitative, i.e. instead of Boolean valued predicates, it manipulates real-valued functions. En route to the aert calculus, we study the ert calculus for reasoning about expected runtimes of Kaminski et al. [2018] extended by capabilities for handling dynamic memory, thus enabling compositional and local reasoning about randomized data structures. This extension employs runtime separation logic, which has been foreshadowed by Matheja [2020] and then implemented in Isabelle/HOL by Haslbeck [2021]. In addition to Haslbeck’s results, we further prove soundness of the so-extended ert calculus with respect to an operational Markov decision process model featuring countably-branching nondeterminism, provide extensive intuitive explanations, and provide proof rules enabling separation logic-style verification for upper bounds on expected runtimes. Finally, we build the so-called potential method for amortized analysis into the ert calculus, thus obtaining the aert calculus. Soundness of the aert calculus is obtained from the soundness of the ert calculus and some probabilistic form of telescoping. Since one needs to be able to handle changes in potential which can in principle be either positive or negative, the aert calculus needs to be, essentially, capable of handling certain signed random variables. A particularly pleasing feature of our solution is that, unlike e.g. Kozen [1985], we obtain a loop rule for our signed random variables, and furthermore, unlike e.g. Kaminski and Katoen [2017], the aert calculus makes do without the need for involved technical machinery keeping track of the integrability of the random variables.
Finally, we present case studies, including a formal analysis of a randomized delete-insert-find-any set data structure [Brodal et al. 1996], which yields a constant expected runtime per operation, whereas no deterministic algorithm can achieve this.
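As background for the potential method and the telescoping argument mentioned above, the classical deterministic formulation can be written as follows. This is the standard textbook version, given here only as a reminder; the symbols $c_i$, $\hat{c}_i$, $s_i$ and $\Phi$ are notation chosen for this sketch, and the probabilistic form used by the aert calculus is not reproduced here.

```latex
% c_i: actual cost of the i-th operation, s_i: state after it,
% \Phi: potential function on states, \hat{c}_i: amortized cost.
\[
  \hat{c}_i \;=\; c_i + \Phi(s_i) - \Phi(s_{i-1})
\]
% Summing over n operations, the potential differences telescope:
\[
  \sum_{i=1}^{n} \hat{c}_i
  \;=\; \sum_{i=1}^{n} c_i \;+\; \Phi(s_n) - \Phi(s_0)
\]
% Hence, if \Phi(s_n) \ge \Phi(s_0), the total amortized cost
% is an upper bound on the total actual cost.
```

Note that an individual difference $\Phi(s_i) - \Phi(s_{i-1})$ may be negative, which is why a calculus built on this idea has to cope with signed quantities.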

Publisher's Version
Reconciling Shannon and Scott with a Lattice of Computable Information
Sebastian Hunt, David Sands, and Sandro Stucki
(City University of London, UK; Chalmers University of Technology, Sweden; Amazon Prime Video, Sweden)
This paper proposes a reconciliation of two different theories of information. The first, originally proposed in a lesser-known work by Claude Shannon (some five years after the publication of his celebrated quantitative theory of communication), describes how the information content of channels can be described qualitatively, but still abstractly, in terms of information elements, where information elements can be viewed as equivalence relations over the data source domain. Shannon showed that these elements have a partial ordering, expressing when one information element is more informative than another, and that these partially ordered information elements form a complete lattice. In the context of security and information flow this structure has been independently rediscovered several times, and used as a foundation for understanding and reasoning about information flow.
The second theory of information is Dana Scott’s domain theory, a mathematical framework for giving meaning to programs as continuous functions over a particular topology. Scott’s partial ordering also represents when one element is more informative than another, but in the sense of computational progress – i.e. when one element is a more defined or evolved version of another.
To give a satisfactory account of information flow in computer programs it is necessary to consider both theories together, in order to understand not only what information is conveyed by a program (viewed as a channel, à la Shannon) but also how the precision with which that information can be observed is determined by the definedness of its encoding (à la Scott). To this end we show how these theories can be fruitfully combined, by defining the Lattice of Computable Information (LoCI), a lattice of preorders rather than equivalence relations. LoCI retains the rich lattice structure of Shannon’s theory, filters out elements that do not make computational sense, and refines the remaining information elements to reflect how Scott’s ordering captures possible varieties in the way that information is presented.
We show how the new theory facilitates the first general definition of termination-insensitive information flow properties, a weakened form of information flow property commonly targeted by static program analyses.
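To make the first of the two theories concrete, the sketch below models information elements as partitions (equivalence relations) of a small finite domain and checks the "more informative than" ordering by refinement. It illustrates only the equivalence-relation view attributed to Shannon above, not LoCI itself, which uses preorders; the helper names and the orientation of the ordering are assumptions made for this example.

```python
def kernel(f, domain):
    """Partition of `domain` induced by f: x ~ y iff f(x) == f(y)."""
    blocks = {}
    for x in domain:
        blocks.setdefault(f(x), set()).add(x)
    return {frozenset(b) for b in blocks.values()}

def at_least_as_informative(p, q):
    """True if partition p refines q: every block of p lies inside a block of q."""
    return all(any(b <= c for c in q) for b in p)

def combine(p, q):
    """Information from observing both p and q: the coarsest common refinement."""
    return {b & c for b in p for c in q if b & c}

domain = range(8)
parity = kernel(lambda x: x % 2, domain)        # reveals the lowest bit
low_two_bits = kernel(lambda x: x % 4, domain)  # reveals the two lowest bits
is_small = kernel(lambda x: x < 4, domain)      # reveals whether x < 4
```

Here `low_two_bits` is at least as informative as `parity`, but not the other way around, and `combine(parity, is_small)` distinguishes four classes of inputs.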

Publisher's Version
Higher-Order MSL Horn Constraints
Jerome Jochems, Eddie Jones, and Steven Ramsay
(University of Bristol, UK)
The monadic shallow linear (MSL) class is a decidable fragment of first-order Horn clauses that was discovered and rediscovered around the turn of the century, with applications in static analysis and verification. We propose a new class of higher-order Horn constraints which extend MSL to higher-order logic and develop a resolution-based decision procedure. Higher-order MSL Horn constraints can quite naturally capture the complex patterns of call and return that are possible in higher-order programs, which make them well suited to higher-order program verification. In fact, we show that the higher-order MSL satisfiability problem and the HORS model checking problem are interreducible, so that higher-order MSL can be seen as a constraint-based approach to higher-order model checking. Finally, we describe an implementation of our decision procedure and its application to verified socket programming.

Publisher's Version
Inductive Synthesis of Structurally Recursive Functional Programs from Non-recursive Expressions
Woosuk Lee and Hangyeol Cho
(Hanyang University, South Korea)
We present a novel approach to synthesizing recursive functional programs from input-output examples. Synthesizing a recursive function is challenging because recursive subexpressions should be constructed while the target function has not been fully defined yet. We address this challenge by using a new technique we call block-based pruning. A block refers to a recursion- and conditional-free expression (i.e., straight-line code) that yields an output from a particular input. We first synthesize as many blocks as possible for each input-output example, and then we explore the space of recursive programs, pruning candidates that are inconsistent with the blocks. Our method is based on an efficient version space learning, thereby effectively dealing with a possibly enormous number of blocks. In addition, we present a method that uses sampled input-output behaviors of library functions to enable a goal-directed search for a recursive program using the library. We have implemented our approach in a system called Trio and evaluated it on synthesis tasks from prior work and on new tasks. Our experiments show that Trio outperforms prior work by synthesizing a solution to 98% of the benchmarks in our benchmark suite.

Publisher's Version Published Artifact Archive submitted (710 kB) Artifacts Available Artifacts Reusable
Temporal Verification with Answer-Effect Modification: Dependent Temporal Type-and-Effect System with Delimited Continuations
Taro Sekiyama and Hiroshi Unno
(National Institute of Informatics, Japan; University of Tsukuba, Japan; RIKEN AIP, Japan)
Type-and-effect systems are a widely used approach to program verification, verifying the result of a computation using types, and its behavior using effects. This paper extends an effect system for verifying temporal, value-dependent properties on event sequences yielded by programs, to the delimited control operators shift0/reset0. While these delimited control operators enable useful and powerful programming techniques, they hinder reasoning about the behavior of programs because of their ability to suspend, resume, discard, and duplicate delimited continuations. This problem is more serious in effect systems for temporal properties because these systems must be capable of identifying what event sequences are yielded by captured continuations. Our key observation for achieving effective reasoning in the presence of the delimited control operators is that their use modifies answer effects, which are temporal effects of the continuations. Based on this observation, we extend an effect system for temporal verification to accommodate answer-effect modification. Allowing answer-effect modification makes it easy to reason about the traces that captured continuations yield. Another novel feature of our effect system is the support for dependently typed continuations, which allows us to reason about programs more precisely. We prove soundness of the effect system for finite event sequences via type safety and that for infinite event sequences using a logical relation.

Publisher's Version Archive submitted (940 kB)
Modular Primal-Dual Fixpoint Logic Solving for Temporal Verification
Hiroshi Unno, Tachio Terauchi, Yu Gu, and Eric Koskinen
(University of Tsukuba, Japan; RIKEN AIP, Japan; Waseda University, Japan; Stevens Institute of Technology, USA)
We present a novel approach to deciding the validity of formulas in first-order fixpoint logic with background theories and arbitrarily nested inductive and co-inductive predicates defining least and greatest fixpoints. Our approach is constraint-based, and reduces the validity checking problem of the given first-order-fixpoint logic formula (formally, an instance in a language called µCLP) to a constraint satisfaction problem for a recently introduced predicate constraint language.
Coupled with an existing sound-and-relatively-complete solver for the constraint language, this novel reduction alone already gives a sound and relatively complete method for deciding µCLP validity, but we further improve it to a novel modular primal-dual method. The key observations are (1) µCLP is closed under complement such that each (co-)inductive predicate in the original primal instance has a corresponding (co-)inductive predicate representing its complement in the dual instance obtained by taking the standard De Morgan’s dual of the primal instance, and (2) partial solutions for (co-)inductive predicates synthesized during the constraint solving process of the primal side can be used as sound upper-bounds of the corresponding (co-)inductive predicates in the dual side, and vice versa. By solving the primal and dual problems in parallel and exchanging each other’s partial solutions as sound bounds, the two processes mutually reduce each other’s solution spaces, thus enabling rapid convergence. The approach is also modular in that the bounds are synthesized and exchanged at the granularity of individual (co-)inductive predicates.
We demonstrate the utility of our novel fixpoint logic solving by encoding a wide variety of temporal verification problems in µCLP, including termination/non-termination, LTL, CTL, and even the full modal µ-calculus model checking of infinite state programs. The encodings exploit the modularity in both the program and the property by expressing each loop and (recursive) function in the program and each sub-formula of the property as an individual (possibly nested) (co-)inductive predicate. Together with our novel modular primal-dual µCLP solving, we obtain a novel approach to efficiently solving a wide range of temporal verification problems.

Publisher's Version
Context-Bounded Verification of Context-Free Specifications
Pascal Baumann, Moses Ganardi, Rupak Majumdar, Ramanathan S. Thinniyam, and Georg Zetzsche
(MPI-SWS, Germany)
A fundamental problem in refinement verification is to check that the language of behaviors of an implementation is included in the language of the specification. We consider the refinement verification problem where the implementation is a multithreaded shared memory system modeled as a multistack pushdown automaton and the specification is an input-deterministic multistack pushdown language. Our main result shows that the context-bounded refinement problem, where we ask that all behaviors generated in runs with a bounded number of context switches belong to a specification given by a Dyck language, is decidable and coNP-complete. The more general case of input-deterministic languages follows, with the same complexity. Context-bounding is essential since emptiness for multipushdown automata is already undecidable, and so is the refinement verification problem for the subclass of regular specifications. Input-deterministic languages capture many non-regular specifications of practical interest and our result opens the way for algorithmic analysis of these properties. The context-bounded refinement problem is coNP-hard already with deterministic regular specifications; our result demonstrates that the problem is not harder despite the stronger class of specifications. Our proof introduces several general techniques for formal languages and counter programs and shows that the search for counterexamples can be reduced in non-deterministic polynomial time to the satisfiability problem for existential Presburger arithmetic. These techniques are essential to ensure the coNP upper bound: existing techniques for regular specifications are not powerful enough for decidability, while simple reductions lead to problems that are either undecidable or have high complexities. As a special case, our decidability result gives an algorithmic verification technique to reason about reference counting and re-entrant locking in multithreaded programs.

Publisher's Version
Impredicative Observational Equality
Loïc Pujet and Nicolas Tabareau
(Inria, France)
In dependent type theory, impredicativity is a powerful logical principle that allows the definition of propositions that quantify over arbitrarily large types, potentially resulting in self-referential propositions. Impredicativity can provide a system with increased logical strength and flexibility, but in return it comes with multiple incompatibility results. In particular, Abel and Coquand showed that adding definitional uniqueness of identity proofs (UIP) to the main proof assistants that support impredicative propositions (Coq and Lean) breaks the normalization procedure, and thus the type-checking algorithm. However, it was not known whether this stems from a fundamental incompatibility between UIP and impredicativity or if a more suitable algorithm could decide type-checking for a type theory that supports both. In this paper, we design a theory that handles both UIP and impredicativity by extending the recently introduced observational type theory TTobs with an impredicative universe of definitionally proof-irrelevant types, as initially proposed in the seminal work on observational equality of Altenkirch et al. We prove decidability of conversion for the resulting system, which we call CCobs, by harnessing proof-irrelevance to avoid computing with impredicative proof terms. Additionally, we prove normalization for CCobs in plain Martin-Löf type theory, thereby showing that adding proof-irrelevant impredicativity does not increase the computational content of the theory.

Publisher's Version Published Artifact Info Artifacts Available Artifacts Reusable
