POPL 2018 – Author Index
Abel, Andreas
![]() Andreas Abel, Joakim Öhman (University of Gothenburg, Sweden; IMDEA Software Institute, Spain; Chalmers University of Technology, Sweden) Type theory should be able to handle its own meta-theory, both to justify its foundational claims and to obtain a verified implementation. At the core of a type checker for intensional type theory lies an algorithm to check equality of types, or in other words, to check whether two types are convertible. We have formalized in Agda a practical conversion checking algorithm for a dependent type theory with one universe à la Russell, natural numbers, and η-equality for Π types. We prove the algorithm correct via a Kripke logical relation parameterized by a suitable notion of equivalence of terms. We then instantiate the parameterized fundamental lemma twice: once to obtain canonicity and injectivity of type formers, and once again to prove the completeness of the algorithm. Our proof relies on inductive-recursive definitions, but not on the uniqueness of identity proofs. Thus, it is valid in variants of intensional Martin-Löf Type Theory as long as they support induction-recursion, for instance, Extensional, Observational, or Homotopy Type Theory. ![]() ![]() ![]() |
Abraham, Ittai
Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar (Tel Aviv University, Israel; VMware, USA; Stanford University, USA) Callbacks are essential in many programming environments, but drastically complicate program understanding and reasoning because they allow external objects to mutate an object's local state in unexpected ways, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum employed callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and statically checking if an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contract executions, excluding those of the DAO or of contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state.
Agrawal, Sheshansh
Sheshansh Agrawal, Krishnendu Chatterjee (IIT Bombay, India; IST Austria, Austria) Probabilistic programs extend classical imperative programs with real-valued random variables and random branching. The most basic liveness property for such programs is the termination property. The qualitative (aka almost-sure) termination problem asks whether a given program terminates with probability 1. While ranking functions provide a sound and complete method for non-probabilistic programs, the extension of them to probabilistic programs is achieved via ranking supermartingales (RSMs). Although deep theoretical results have been established about RSMs, their application to probabilistic programs with nondeterminism has been limited only to programs of restricted control-flow structure. For non-probabilistic programs, lexicographic ranking functions provide a compositional and practical approach for termination analysis of real-world programs. In this work we introduce lexicographic RSMs and show that they present a sound method for almost-sure termination of probabilistic programs with nondeterminism. We show that lexicographic RSMs provide a tool for compositional reasoning about almost-sure termination, and for probabilistic programs with linear arithmetic they can be synthesized efficiently (in polynomial time). We also show that with additional restrictions even asymptotic bounds on expected termination time can be obtained through lexicographic RSMs. Finally, we present experimental results on benchmarks adapted from previous work to demonstrate the effectiveness of our approach.
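As a rough aid to reading the abstract above, the following is a simplified sketch of the shape of lexicographic RSM conditions, paraphrased from the general ranking-supermartingale literature rather than taken from the paper; nondeterminism and the measurability side conditions are omitted, and the symbols (f_i, epsilon, mu_s) are illustrative.

```latex
% Sketch: lexicographic ranking-supermartingale conditions (simplified).
% A map (f_1,...,f_n) from program states to reals is a lexicographic RSM if,
% for every reachable non-terminal state s with successor distribution \mu_s,
% there is an index i such that
\begin{align*}
  \mathbb{E}_{s' \sim \mu_s}[f_i(s')] &\le f_i(s) - \varepsilon, \\
  \mathbb{E}_{s' \sim \mu_s}[f_j(s')] &\le f_j(s) \quad \text{for all } j < i, \\
  f_j(s) &\ge 0 \quad \text{for all } j \le i .
\end{align*}
```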
Ahman, Danel
![]() Danel Ahman, Cédric Fournet (Inria, France; Microsoft Research, UK; ENS Paris, France; Microsoft Research, India; Microsoft Research, USA) We provide a way to ease the verification of programs whose state evolves monotonically. The main idea is that a property witnessed in a prior state can be soundly recalled in the current state, provided (1) state evolves according to a given preorder, and (2) the property is preserved by this preorder. In many scenarios, such monotonic reasoning yields concise modular proofs, saving the need for explicit program invariants. We distill our approach into the monotonic-state monad, a general yet compact interface for Hoare-style reasoning about monotonic state in a dependently typed language. We prove the soundness of the monotonic-state monad and use it as a unified foundation for reasoning about monotonic state in the F⋆ verification system. Based on this foundation, we build libraries for various mutable data structures like monotonic references and apply these libraries at scale to the verification of several distributed applications. ![]() ![]() ![]() ![]() ![]() Danel Ahman (Inria, France) We study algebraic computational effects and their handlers in the dependently typed setting. We describe computational effects using a generalisation of Plotkin and Pretnar's effect theories, whose dependently typed operations allow us to capture precise notions of computation, e.g., state with location-dependent store types and dependently typed update monads. Our treatment of handlers is based on an observation that their conventional term-level definition leads to unsound program equivalences being derivable in languages that include a notion of homomorphism. We solve this problem by giving handlers a novel type-based treatment via a new computation type, the user-defined algebra type, which pairs a value type (the carrier) with a set of value terms (the operations), capturing Plotkin and Pretnar's insight that effect handlers denote algebras. We then show that the conventional presentation of handlers can be routinely derived, and demonstrate that this type-based treatment of handlers provides a useful mechanism for reasoning about effectful computations. We also equip the resulting language with a sound denotational semantics based on families fibrations. ![]() ![]() |
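The first abstract above centers on a witness/recall discipline for monotonic state. Below is a minimal Haskell sketch of that idea, assuming a non-decreasing counter as the preorder; the names MST, update, witness, and recall are illustrative, and where F⋆ discharges stability of predicates statically, this sketch falls back to runtime checks.

```haskell
module MonotonicSketch where

import Control.Monad.State

-- The state evolves along a fixed preorder; here, a non-decreasing counter.
type Counter = Int
type MST a = State Counter a

grows :: Counter -> Counter -> Bool
grows old new = old <= new

-- Update the state, dynamically enforcing the preorder.
update :: Counter -> MST ()
update new = do
  old <- get
  if grows old new then put new else error "non-monotonic update"

-- A stable predicate is preserved by the preorder, e.g. (>= 3) for a
-- non-decreasing counter.  Once witnessed, it can be recalled later
-- without re-establishing it.  In F* this is a static proof step; in
-- this sketch it is just a runtime assertion.
witness :: (Counter -> Bool) -> MST ()
witness p = do
  s <- get
  if p s then pure () else error "witness: predicate does not hold"

recall :: (Counter -> Bool) -> MST ()
recall = witness   -- sound only for preorder-stable predicates
```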
Ahmed, Amal
![]() Olivier Flückiger (Northeastern University, USA; Inria, France; Czech Technical University, Czechia) High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition. ![]() ![]() ![]() William J. Bowman, Youyou Cong, Nick Rioux, and Amal Ahmed (Northeastern University, USA; Ochanomizu University, Japan) Dependently typed languages such as Coq are used to specify and prove functional correctness of source programs, but what we ultimately need are guarantees about correctness of compiled code. By preserving dependent types through each compiler pass, we could preserve source-level specifications and correctness proofs into the generated target-language programs. Unfortunately, type-preserving compilation of dependent types is hard. In 2002, Barthe and Uustalu showed that type-preserving CPS is not possible for languages such as Coq. Specifically, they showed that for strong dependent pairs (Σ types), the standard typed call-by-name CPS is not type preserving. They further proved that for dependent case analysis on sums, a class of typed CPS translations—including the standard translation—is not possible. In 2016, Morrisett noticed a similar problem with the standard call-by-value CPS translation for dependent functions (Π types). In essence, the problem is that the standard typed CPS translation by double-negation, in which computations are assigned types of the form (A → ⊥) → ⊥, disrupts the term/type equivalence that is used during type checking in a dependently typed language. In this paper, we prove that type-preserving CPS translation for dependently typed languages is not not possible. We develop both call-by-name and call-by-value CPS translations from the Calculus of Constructions with both Π and Σ types (CC) to a dependently typed target language, and prove type preservation and compiler correctness of each translation. Our target language is CC extended with an additional equivalence rule and an additional typing rule, which we prove consistent by giving a model in the extensional Calculus of Constructions. Our key observation is that we can use a CPS translation that employs answer-type polymorphism, where CPS-translated computations have type ∀ α. (A → α) → α. 
This type justifies, by a free theorem, the new equality rule in our target language and allows us to recover the term/type equivalences that CPS translation disrupts. Finally, we conjecture that our translation extends to dependent case analysis on sums, despite the impossibility result, and provide a proof sketch. ![]() ![]() ![]() |
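The Bowman et al. abstract above turns on the answer-type-polymorphic CPS type ∀α.(A → α) → α. A simply-typed Haskell analogue (dependent types are out of reach here, so this only conveys the intuition) shows why the polymorphic answer type lets the underlying value be recovered by instantiating α at A and passing the identity continuation.

```haskell
{-# LANGUAGE RankNTypes #-}
module CpsSketch where

-- Answer-type-polymorphic CPS computations: instead of the double-negation
-- type (a -> r) -> r for one fixed answer type r, the answer type is
-- universally quantified.
newtype CPS a = CPS { runCPS :: forall r. (a -> r) -> r }

-- Translation of a value.
ret :: a -> CPS a
ret x = CPS (\k -> k x)

-- Sequencing of translated computations.
bind :: CPS a -> (a -> CPS b) -> CPS b
bind m f = CPS (\k -> runCPS m (\x -> runCPS (f x) k))

-- Because the answer type is polymorphic, we can instantiate it at 'a'
-- itself and recover the underlying value with the identity continuation;
-- this is the free-theorem fact that lets the paper restore the term/type
-- equivalences broken by double-negation CPS.
extract :: CPS a -> a
extract m = runCPS m id
```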
Aiken, Alex
![]() Wonyeol Lee (Stanford University, USA; Microsoft Research, India) Industry standard implementations of math.h claim (often without formal proof) tight bounds on floating-point errors. We demonstrate a novel static analysis that proves these bounds and verifies the correctness of these implementations. Our key insight is a reduction of this verification task to a set of mathematical optimization problems that can be solved by off-the-shelf computer algebra systems. We use this analysis to prove the correctness of implementations in Intel's math library automatically. Prior to this work, these implementations could only be verified with significant manual effort. ![]() ![]() |
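A hedged sketch of the general shape of such a verification obligation, as suggested by the abstract but not necessarily matching the paper's actual decomposition: a claimed relative-error bound for an implementation impl of a function f on an input range I becomes a global-optimization query that a computer algebra system can certify. The symbols impl, f, I, and epsilon are placeholders.

```latex
% Illustrative shape of the obligation (not the paper's exact formulation):
\[
  \max_{x \in I} \; \left| \frac{\mathrm{impl}(x) - f(x)}{f(x)} \right| \;\le\; \varepsilon
\]
```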
Albarghouthi, Aws
![]() Aws Albarghouthi (University of Wisconsin-Madison, USA; University College London, UK) Differential privacy has emerged as a promising probabilistic formulation of privacy, generating intense interest within academia and industry. We present a push-button, automated technique for verifying ε-differential privacy of sophisticated randomized algorithms. We make several conceptual, algorithmic, and practical contributions: (i) Inspired by the recent advances on approximate couplings and randomness alignment, we present a new proof technique called coupling strategies, which casts differential privacy proofs as a winning strategy in a game where we have finite privacy resources to expend. (ii) To discover a winning strategy, we present a constraint-based formulation of the problem as a set of Horn modulo couplings (HMC) constraints, a novel combination of first-order Horn clauses and probabilistic constraints. (iii) We present a technique for solving HMC constraints by transforming probabilistic constraints into logical constraints with uninterpreted functions. (iv) Finally, we implement our technique in the FairSquare verifier and provide the first automated privacy proofs for a number of challenging algorithms from the differential privacy literature, including Report Noisy Max, the Exponential Mechanism, and the Sparse Vector Mechanism. ![]() ![]() |
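For reference, the property such proofs establish is the standard definition of ε-differential privacy (textbook background, not specific to this paper):

```latex
% A randomized algorithm M is \varepsilon-differentially private if, for all
% adjacent inputs D, D' (differing in one record) and all sets S of outputs,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].
\]
```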
Amin, Nada
Nada Amin and Tiark Rompf (University of Cambridge, UK; Purdue University, USA) Given a tower of interpreters, i.e., a sequence of multiple interpreters interpreting one another as input programs, we aim to collapse this tower into a compiler that removes all interpretive overhead and runs in a single pass. In the real world, a use case might be Python code executed by an x86 runtime, on a CPU emulated in a JavaScript VM, running on an ARM CPU. Collapsing such a tower can not only exponentially improve runtime performance, but also enable the use of base-language tools for interpreted programs, e.g., for analysis and verification. In this paper, we lay the foundations in an idealized but realistic setting. We present a multi-level lambda calculus that features staging constructs and stage polymorphism: based on runtime parameters, an evaluator either executes source code (thereby acting as an interpreter) or generates code (thereby acting as a compiler). We identify stage polymorphism, a programming model from the domain of high-performance program generators, as the key mechanism to make such interpreters compose in a collapsible way. We present Pink, a meta-circular Lisp-like evaluator on top of this calculus, and demonstrate that we can collapse arbitrarily many levels of self-interpretation, including levels with semantic modifications. We discuss several examples: compiling regular expressions through an interpreter to base code, building program transformers from modified interpreters, and others. We develop these ideas further to include reflection and reification, culminating in Purple, a reflective language inspired by Brown, Blond, and Black, which realizes a conceptually infinite tower, where every aspect of the semantics can change dynamically. Addressing an open challenge, we show how user programs can be compiled and recompiled under user-modified semantics.
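A toy illustration (constructed here, not the paper's calculus) of the stage-polymorphism idea in Haskell: the same arithmetic evaluator computes immediately on static inputs and emits residual code otherwise. Pink and Purple are, of course, far richer than this sketch.

```haskell
module StageSketch where

-- A stage-polymorphic value: either known now (static) or residual code (dynamic).
data Val = Now Int | Code String

render :: Val -> String
render (Now n)  = show n
render (Code s) = s

-- The same "evaluator" for addition acts as an interpreter on static
-- inputs and as a compiler (code generator) otherwise.
add :: Val -> Val -> Val
add (Now a) (Now b) = Now (a + b)
add a       b       = Code ("(" ++ render a ++ " + " ++ render b ++ ")")

-- > add (Now 1) (Now 2)     evaluates to  Now 3
-- > add (Code "x") (Now 2)  evaluates to  Code "(x + 2)"
```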
Bach Poulsen, Casper
![]() Casper Bach Poulsen (Delft University of Technology, Netherlands; Portland State University, USA) A definitional interpreter defines the semantics of an object language in terms of the (well-known) semantics of a host language, enabling understanding and validation of the semantics through execution. Combining a definitional interpreter with a separate type system requires a separate type safety proof. An alternative approach, at least for pure object languages, is to use a dependently-typed language to encode the object language type system in the definition of the abstract syntax. Using such intrinsically-typed abstract syntax definitions allows the host language type checker to verify automatically that the interpreter satisfies type safety. Does this approach scale to larger and more realistic object languages, and in particular to languages with mutable state and objects? In this paper, we describe and demonstrate techniques and libraries in Agda that successfully scale up intrinsically-typed definitional interpreters to handle rich object languages with non-trivial binding structures and mutable state. While the resulting interpreters are certainly more complex than the simply-typed λ-calculus interpreter we start with, we claim that they still meet the goals of being concise, comprehensible, and executable, while guaranteeing type safety for more elaborate object languages. We make the following contributions: (1) A dependent-passing style technique for hiding the weakening of indexed values as they propagate through monadic code. (2) An Agda library for programming with scope graphs and frames, which provides a uniform approach to dealing with name binding in intrinsically-typed interpreters. (3) Case studies of intrinsically-typed definitional interpreters for the simply-typed λ-calculus with references (STLC+Ref) and for a large subset of Middleweight Java (MJ). ![]() ![]() ![]() |
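For readers unfamiliar with the starting point the paper scales up, here is the idea of intrinsically-typed syntax transliterated into a Haskell GADT rather than Agda, and without the binders or mutable state the paper handles: because only well-typed object terms are representable, the interpreter needs no failure cases.

```haskell
{-# LANGUAGE GADTs #-}
module IntrinsicSketch where

-- Intrinsically-typed syntax: the object-language type is an index of the
-- syntax, so ill-typed object terms cannot even be constructed.
data Expr t where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  Leq   :: Expr Int -> Expr Int -> Expr Bool
  If    :: Expr Bool -> Expr t -> Expr t -> Expr t

-- The interpreter has no error or tag-checking cases: type safety of the
-- object language is enforced by the host type checker.
eval :: Expr t -> t
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add a b)  = eval a + eval b
eval (Leq a b)  = eval a <= eval b
eval (If c t e) = if eval c then eval t else eval e
```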
Bao, Wenlei
![]() Wenlei Bao, Sriram Krishnamoorthy, Louis-Noel Pouchet, and P. Sadayappan (Ohio State University, USA; Pacific Northwest National Laboratory, USA; Colorado State University, USA) Optimizing compilers implement program transformation strategies aimed at reducing data movement to or from main memory by exploiting the data-cache hierarchy. However, instead of attempting to minimize the number of cache misses, very approximate cost models are used, due to the lack of precise compile-time models for misses for hierarchical caches. The current state of practice for cache miss analysis is based on accurate simulation. However, simulation requires time proportional to the dataset/problem size, as well as the number of distinct cache configurations of interest to be evaluated. This paper takes a fundamentally different approach, by focusing on polyhedral programs with static control flow. Instead of relying on costly simulation, a closed-form solution for modeling of misses in a set associative cache hierarchy is developed. This solution can enable program transformation choice at compile time to optimize cache misses. A tool implementing the approach has been developed and used for validation of the framework. ![]() ![]() |
Barthe, Gilles
![]() Ivan Radiček, Gilles Barthe (Vienna University of Technology, Austria; IMDEA Software Institute, Spain; SUNY Buffalo, USA; MPI-SWS, Germany) Formal frameworks for cost analysis of programs have been widely studied in the unary setting and, to a limited extent, in the relational setting. However, many of these frameworks focus only on the cost aspect, largely side-lining functional properties that are often a prerequisite for cost analysis, thus leaving many interesting programs out of their purview. In this paper, we show that elegant, simple, expressive proof systems combining cost analysis and functional properties can be built by combining already known ingredients: higher-order refinements and cost monads. Specifically, we derive two syntax-directed proof systems, UC and RC, for unary and relational cost analysis, by adding a cost monad to a (syntax-directed) logic of higher-order programs. We study the metatheory of the systems, show that several nontrivial examples can be verified in them, and prove that existing frameworks for cost analysis (RelCost and RAML) can be embedded in them. ![]() ![]() ![]() Gilles Barthe (IMDEA Software Institute, Spain; UPMC, France; Inria, France; University College London, UK; École Polytechnique, France) Program sensitivity, also known as Lipschitz continuity, describes how small changes in a program’s input lead to bounded changes in the output. We propose an average notion of program sensitivity for probabilistic programs—expected sensitivity—that averages a distance function over a probabilistic coupling of two output distributions from two similar inputs. By varying the distance, expected sensitivity recovers useful notions of probabilistic function sensitivity, including stability of machine learning algorithms and convergence of Markov chains. Furthermore, expected sensitivity satisfies clean compositional properties and is amenable to formal verification. We develop a relational program logic called EpRHL for proving expected sensitivity properties. Our logic features two key ideas. First, relational pre-conditions and post-conditions are expressed using distances, a real-valued generalization of typical boolean-valued (relational) assertions. Second, judgments are interpreted in terms of expectation coupling, a novel, quantitative generalization of probabilistic couplings which supports compositional reasoning. We demonstrate our logic on examples beyond the reach of prior relational logics. Our main example formalizes uniform stability of the stochastic gradient method. Furthermore, we prove rapid mixing for a probabilistic model of population dynamics. We also extend our logic with a transitivity principle for expectation couplings to capture the path coupling proof technique by Bubley and Dyer, and formalize rapid mixing of the Glauber dynamics from statistical physics. ![]() ![]() ![]() |
Bernardy, Jean-Philippe
![]() Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack (University of Gothenburg, Sweden; Tweag I/O, France; Indiana University, USA; Microsoft Research, UK) Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values. To demonstrate the efficacy of our linear type system — both how easy it can be integrated in an existing language implementation and how streamlined it makes it to write programs with linear types — we implemented our type system in ghc, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions. ![]() ![]() ![]() |
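A small sketch of the design in Haskell, using the multiplicity-annotated arrow syntax (%1 ->) that later shipped in GHC 9 under the LinearTypes extension; the paper itself writes the linear arrow as ⊸, and this fragment only approximates its examples.

```haskell
{-# LANGUAGE LinearTypes #-}
module LinearSketch where

-- A linear function consumes its argument exactly once; swap is accepted
-- because each component of the pair is used exactly once.
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

-- Rejected by the linear type checker: the argument is discarded.
-- discard :: a %1 -> ()
-- discard _ = ()

-- Non-linear code can still apply linear functions to unrestricted values,
-- which is the backwards-compatibility point of attaching linearity to arrows.
example :: (Int, Bool) -> (Bool, Int)
example p = swap p
```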
Biboudis, Aggelos
![]() Martin Odersky, Olivier Blanvillain, Fengyun Liu, Aggelos Biboudis, Heather Miller, and Sandro Stucki (EPFL, Switzerland; Northeastern University, USA) Understanding a program entails understanding its context; dependencies, configurations and even implementations are all forms of contexts. Modern programming languages and theorem provers offer an array of constructs to define contexts, implicitly. Scala offers implicit parameters which are used pervasively, but which cannot be abstracted over. This paper describes a generalization of implicit parameters to implicit function types, a powerful way to abstract over the context in which some piece of code is run. We provide a formalization based on bidirectional type-checking that closely follows the semantics implemented by the Scala compiler. To demonstrate their range of abstraction capabilities, we present several applications that make use of implicit function types. We show how to encode the builder pattern, tagless interpreters, reader and free monads and we assess the performance of the monadic structures presented. ![]() ![]() ![]() |
Biernacki, Dariusz
![]() Dariusz Biernacki, Maciej Piróg, Piotr Polesiuk, and Filip Sieczkowski (University of Wrocław, Poland) Algebraic effects and handlers have received a lot of attention recently, both from the theoretical point of view and in practical language design. This stems from the fact that algebraic effects give the programmer unprecedented freedom to define, combine, and interpret computational effects. This plenty-of-rope, however, demands not only a deep understanding of the underlying semantics, but also access to practical means of reasoning about effectful code, including correctness and program equivalence. In this paper we tackle this problem by constructing a step-indexed relational interpretation of a call-by-value calculus with algebraic effect handlers and a row-based polymorphic type-and-effect system. Our calculus, while striving for simplicity, enjoys desirable theoretical properties, and is close to the cores of programming languages with algebraic effects used in the wild, while the logical relation we build for it can be used to reason about non-trivial properties, such as contextual equivalence and contextual approximation of programs. Our development has been fully formalised in the Coq proof assistant. ![]() ![]() ![]() ![]() |
Birkedal, Lars
![]() Amin Timany, Léo Stefanesco, Morten Krogh-Jespersen, and Lars Birkedal (KU Leuven, Belgium; IRIF, France; CNRS, France; University of Paris Diderot, France; Aarhus University, Denmark) We present a logical relations model of a higher-order functional programming language with impredicative polymorphism, recursive types, and a Haskell-style ST monad type with runST. We use our logical relations model to show that runST provides proper encapsulation of state, by showing that effectful computations encapsulated by runST are heap independent. Furthermore, we show that contextual refinements and equivalences that are expected to hold for pure computations do indeed hold in the presence of runST. This is the first time such relational results have been proven for a language with monadic encapsulation of state. We have formalized all the technical development and results in Coq. ![]() ![]() ![]() |
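The encapsulation property under study can be illustrated with an ordinary Haskell use of runST (standard library code, not the paper's logical-relations model): an internally stateful computation is exposed as a pure function.

```haskell
module RunSTSketch where

import Control.Monad.ST
import Data.STRef

-- Internally uses a mutable reference, but the effect is encapsulated by
-- runST: sumST is a pure function, and the paper's result makes precise
-- that such encapsulated computations are heap independent.
sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref

-- > sumST [1,2,3,4]  evaluates to  10
```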
Blanvillain, Olivier
![]() Martin Odersky, Olivier Blanvillain, Fengyun Liu, Aggelos Biboudis, Heather Miller, and Sandro Stucki (EPFL, Switzerland; Northeastern University, USA) Understanding a program entails understanding its context; dependencies, configurations and even implementations are all forms of contexts. Modern programming languages and theorem provers offer an array of constructs to define contexts, implicitly. Scala offers implicit parameters which are used pervasively, but which cannot be abstracted over. This paper describes a generalization of implicit parameters to implicit function types, a powerful way to abstract over the context in which some piece of code is run. We provide a formalization based on bidirectional type-checking that closely follows the semantics implemented by the Scala compiler. To demonstrate their range of abstraction capabilities, we present several applications that make use of implicit function types. We show how to encode the builder pattern, tagless interpreters, reader and free monads and we assess the performance of the monadic structures presented. ![]() ![]() ![]() |
Bodik, Rastislav
![]() Kartik Chandra and Rastislav Bodik (Henry M. Gunn High School, USA; Stanford University, USA; University of Washington, USA) When designing a type system, we may want to mechanically check the design to guide its further development. We describe algorithms that perform symbolic reasoning about executable models of type systems. The algorithms support three queries. First, they check type soundness and synthesize a counterexample program if such a soundness bug is found. Second, they compare two versions of a type system, synthesizing a program accepted by one but rejected by the other. Third, they minimize the size of synthesized counterexample programs. These algorithms symbolically evaluate typecheckers and interpreters, producing formulas that characterize the set of programs that fail or succeed in the typechecker and the interpreter. However, symbolically evaluating interpreters poses efficiency challenges, which are caused by having to merge execution paths of the various possible input programs. Our main contribution is the bonsai tree, a novel symbolic representation of programs and program states that addresses these challenges. Bonsai trees encode complex syntactic information in terms of logical constraints, enabling more efficient merging. We implement these algorithms in the Bonsai tool, an assistant for type system designers. We perform case studies on how Bonsai helps test and explore a variety of type systems. Bonsai efficiently synthesizes counterexamples for soundness bugs previously inaccessible to automatic tools and is the first automated tool to find a counterexample for the recently discovered Scala soundness bug SI-9633. ![]() ![]() ![]() |
Boespflug, Mathieu
![]() Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack (University of Gothenburg, Sweden; Tweag I/O, France; Indiana University, USA; Microsoft Research, UK) Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values. To demonstrate the efficacy of our linear type system — both how easy it can be integrated in an existing language implementation and how streamlined it makes it to write programs with linear types — we implemented our type system in ghc, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions. ![]() ![]() ![]() |
Bowman, William J.
![]() William J. Bowman, Youyou Cong, Nick Rioux, and Amal Ahmed (Northeastern University, USA; Ochanomizu University, Japan) Dependently typed languages such as Coq are used to specify and prove functional correctness of source programs, but what we ultimately need are guarantees about correctness of compiled code. By preserving dependent types through each compiler pass, we could preserve source-level specifications and correctness proofs into the generated target-language programs. Unfortunately, type-preserving compilation of dependent types is hard. In 2002, Barthe and Uustalu showed that type-preserving CPS is not possible for languages such as Coq. Specifically, they showed that for strong dependent pairs (Σ types), the standard typed call-by-name CPS is not type preserving. They further proved that for dependent case analysis on sums, a class of typed CPS translations—including the standard translation—is not possible. In 2016, Morrisett noticed a similar problem with the standard call-by-value CPS translation for dependent functions (Π types). In essence, the problem is that the standard typed CPS translation by double-negation, in which computations are assigned types of the form (A → ⊥) → ⊥, disrupts the term/type equivalence that is used during type checking in a dependently typed language. In this paper, we prove that type-preserving CPS translation for dependently typed languages is not not possible. We develop both call-by-name and call-by-value CPS translations from the Calculus of Constructions with both Π and Σ types (CC) to a dependently typed target language, and prove type preservation and compiler correctness of each translation. Our target language is CC extended with an additional equivalence rule and an additional typing rule, which we prove consistent by giving a model in the extensional Calculus of Constructions. Our key observation is that we can use a CPS translation that employs answer-type polymorphism, where CPS-translated computations have type ∀ α. (A → α) → α. This type justifies, by a free theorem, the new equality rule in our target language and allows us to recover the term/type equivalences that CPS translation disrupts. Finally, we conjecture that our translation extends to dependent case analysis on sums, despite the impossibility result, and provide a proof sketch. ![]() ![]() ![]() |
Breck, Jason
![]() Zachary Kincaid (Princeton University, USA; University of Wisconsin-Madison, USA; GrammaTech, USA) Automatic generation of non-linear loop invariants is a long-standing challenge in program analysis, with many applications. For instance, reasoning about exponentials provides a way to find invariants of digital-filter programs, and reasoning about polynomials and/or logarithms is needed for establishing invariants that describe the time or memory usage of many well-known algorithms. An appealing approach to this challenge is to exploit the powerful recurrence-solving techniques that have been developed in the field of computer algebra, which can compute exact characterizations of non-linear repetitive behavior. However, there is a gap between the capabilities of recurrence solvers and the needs of program analysis: (1) loop bodies are not merely systems of recurrence relations---they may contain conditional branches, nested loops, non-deterministic assignments, etc., and (2) a client program analyzer must be able to reason about the closed-form solutions produced by a recurrence solver (e.g., to prove assertions). This paper presents a method for generating non-linear invariants of general loops based on analyzing recurrence relations. The key components are an abstract domain for reasoning about non-linear arithmetic, a semantics-based method for extracting recurrence relations from loop bodies, and a recurrence solver that avoids closed forms that involve complex or irrational numbers. Our technique has been implemented in a program analyzer that can analyze general loops and mutually recursive procedures. Our experiments show that our technique shows promise for non-linear assertion-checking and resource-bound generation. ![]() ![]() ![]() |
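A worked micro-example (constructed for illustration, not drawn from the paper) of the recurrence-based reasoning described above:

```latex
% Loop body: x := 2*x + 1, executed k times starting from x_0.
% Extracted recurrence and its closed form:
\[
  x_{k+1} = 2\,x_k + 1
  \quad\Longrightarrow\quad
  x_k = 2^{k}\,(x_0 + 1) - 1,
\]
% which yields the non-linear loop invariant  x + 1 = 2^k (x_0 + 1)
% relating the program variable x to the iteration counter k.
```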
Brown, Matt
Matt Brown and Jens Palsberg (University of California at Los Angeles, USA) We present partial evaluation by specialization-safe normalization, a novel partial evaluation technique that is Jones-optimal, that can be self-applied to achieve the Futamura projections and that can be type-checked to ensure it always generates code with the correct type. Jones-optimality is the gold-standard for nontrivial partial evaluation and guarantees that a specializer can remove an entire layer of interpretation. We achieve Jones-optimality by using a novel affine-variable static analysis that directs specialization-safe normalization to always decrease a program’s runtime. We demonstrate the robustness of our approach by showing Jones-optimality in a variety of settings. We have formally proved that our partial evaluator is Jones-optimal for call-by-value reduction, and we have experimentally shown that it is Jones-optimal for call-by-value, normal-order, and memoized normal-order. Each of our experiments tests Jones-optimality with three different self-interpreters. We implemented our partial evaluator in Fωµi, a recent language for typed self-applicable meta-programming. It is the first Jones-optimal and self-applicable partial evaluator whose type guarantees that it always generates type-correct code.
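For context, self-application of a specializer mix yields the three classical Futamura projections mentioned above (standard background; the paper's contribution is achieving them Jones-optimally and in a typed, self-applicable setting):

```latex
\begin{align*}
  \mathit{target}   &= \mathit{mix}(\mathit{interp},\ \mathit{source})
      && \text{(1st projection: a compiled program)}\\
  \mathit{compiler} &= \mathit{mix}(\mathit{mix},\ \mathit{interp})
      && \text{(2nd projection: a compiler)}\\
  \mathit{cogen}    &= \mathit{mix}(\mathit{mix},\ \mathit{mix})
      && \text{(3rd projection: a compiler generator)}
\end{align*}
```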
Cai, Yufei
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
Campora, John Peter
![]() John Peter Campora, Sheng Chen, Martin Erwig, and Eric Walkingshaw (University of Louisiana at Lafayette, USA; Oregon State University, USA) Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically-typed to more statically-typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared. In this paper, we address this problem by developing migrational typing, which efficiently types all possible ways of adding or removing type annotations from a gradually typed program. The typing result supports automatically migrating a program to be as static as possible, or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing. ![]() ![]() ![]() |
Capriotti, Paolo
![]() Paolo Capriotti and Nicolai Kraus (University of Nottingham, UK) Category theory in homotopy type theory is intricate as categorical laws can only be stated ”up to homotopy”, and thus require coherences. The established notion of a univalent category (as introduced by Ahrens et al.) solves this by considering only truncated types, roughly corresponding to an ordinary category. This fails to capture many naturally occurring structures, stemming from the fact that the naturally occurring structures in homotopy type theory are not ordinary, but rather higher categories. Out of the large variety of approaches to higher category theory that mathematicians have proposed, we believe that, for type theory, the simplicial strategy is best suited. Work by Lurie and Harpaz motivates the following definition. Given the first (n+3) levels of a semisimplicial type S, we can equip S with three properties: first, contractibility of the types of certain horn fillers; second, a completeness property; and third, a truncation condition. We call this a complete semi-Segal n-type. This is very similar to an earlier suggestion by Schreiber. The definition of a univalent (1-) category by Ahrens et al. can easily be extended or restricted to the definition of a univalent n-category (more precisely, (n,1)-category) for n ∈ {0,1,2}, and we show that the type of complete semi-Segal n-types is equivalent to the type of univalent n-categories in these cases. Thus, we believe that the notion of a complete semi-Segal n-type can be taken as the definition of a univalent n-category. We provide a formalisation in the proof assistant Agda using a completely explicit representation of semi-simplicial types for levels up to 4. ![]() ![]() ![]() ![]() |
Cathcart Burn, Toby
![]() Toby Cathcart Burn, C.-H. Luke Ong, and Steven J. Ramsay (University of Oxford, UK; University of Bristol, UK) Motivated by applications in automated verification of higher-order functional programs, we develop a notion of constrained Horn clauses in higher-order logic and a decision problem concerning their satisfiability. We show that, although satisfiable systems of higher-order clauses do not generally have least models, there is a notion of canonical model obtained through a reduction to a problem concerning a kind of monotone logic program. Following work in higher-order program verification, we develop a refinement type system in order to reason about and automate the search for models. This provides a sound but incomplete method for solving the decision problem. Finally, we show that there is a sense in which we can use refinement types to express properties of terms whilst staying within the higher-order constrained Horn clause framework. ![]() ![]() ![]() |
Chalupa, Marek
Marek Chalupa, Krishnendu Chatterjee (Masaryk University, Czechia; IST Austria, Austria; Kena Labs, India; IIT Bombay, India) We present a new dynamic partial-order reduction method for stateless model checking of concurrent programs. A common approach for exploring program behaviors relies on enumerating the traces of the program, without storing the visited states (aka stateless exploration). As the number of distinct traces grows exponentially, dynamic partial-order reduction (DPOR) techniques have been successfully used to partition the space of traces into equivalence classes (Mazurkiewicz partitioning), with the goal of exploring only a few representative traces from each class. We introduce a new equivalence on traces under sequential consistency semantics, which we call the observation equivalence. Two traces are observationally equivalent if every read event observes the same write event in both traces. While the traditional Mazurkiewicz equivalence is control-centric, our new definition is data-centric. We show that our observation equivalence is coarser than the Mazurkiewicz equivalence, and in many cases even exponentially coarser. We devise a DPOR exploration of the trace space, called data-centric DPOR, based on the observation equivalence. For acyclic architectures, our algorithm is guaranteed to explore exactly one representative trace from each observation class, while spending polynomial time per class. Hence, our algorithm is optimal wrt the observation equivalence, and in several cases explores exponentially fewer traces than any enumerative method based on the Mazurkiewicz equivalence. For cyclic architectures, we consider an equivalence between traces which is finer than the observation equivalence but coarser than the Mazurkiewicz equivalence, and in some cases is exponentially coarser. Our data-centric DPOR algorithm remains optimal under this trace equivalence. Finally, we perform a basic experimental comparison between the existing Mazurkiewicz-based DPOR and our data-centric DPOR on a set of academic benchmarks. Our results show a significant reduction in both running time and the number of explored equivalence classes.
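A small constructed example of why observation equivalence can be coarser than Mazurkiewicz equivalence (added here for illustration, not taken from the paper):

```latex
% Events: w_1 and w_2 write variable x, r reads x.  All three events conflict
% pairwise, so every one of the 3! = 6 interleavings is its own Mazurkiewicz
% class, but there are only three observation classes, indexed by the write
% that r observes:
\begin{align*}
  r \text{ observes the initial value} &: \; r\,w_1\,w_2, \quad r\,w_2\,w_1\\
  r \text{ observes } w_1 &: \; w_1\,r\,w_2, \quad w_2\,w_1\,r\\
  r \text{ observes } w_2 &: \; w_2\,r\,w_1, \quad w_1\,w_2\,r
\end{align*}
```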
Chandra, Kartik
![]() Kartik Chandra and Rastislav Bodik (Henry M. Gunn High School, USA; Stanford University, USA; University of Washington, USA) When designing a type system, we may want to mechanically check the design to guide its further development. We describe algorithms that perform symbolic reasoning about executable models of type systems. The algorithms support three queries. First, they check type soundness and synthesize a counterexample program if such a soundness bug is found. Second, they compare two versions of a type system, synthesizing a program accepted by one but rejected by the other. Third, they minimize the size of synthesized counterexample programs. These algorithms symbolically evaluate typecheckers and interpreters, producing formulas that characterize the set of programs that fail or succeed in the typechecker and the interpreter. However, symbolically evaluating interpreters poses efficiency challenges, which are caused by having to merge execution paths of the various possible input programs. Our main contribution is the bonsai tree, a novel symbolic representation of programs and program states that addresses these challenges. Bonsai trees encode complex syntactic information in terms of logical constraints, enabling more efficient merging. We implement these algorithms in the Bonsai tool, an assistant for type system designers. We perform case studies on how Bonsai helps test and explore a variety of type systems. Bonsai efficiently synthesizes counterexamples for soundness bugs previously inaccessible to automatic tools and is the first automated tool to find a counterexample for the recently discovered Scala soundness bug SI-9633. ![]() ![]() ![]() |
Chang, Stephen
![]() Stephen Chang, Alex Knauth, and Emina Torlak (Northeastern University, USA; University of Washington, USA) We present lambda_sym, a typed λ-calculus for lenient symbolic execution, where some language constructs do not recognize symbolic values. Its type system, however, ensures safe behavior of all symbolic values in a program. Our calculus extends a base occurrence typing system with symbolic types and mutable state, making it a suitable model for both functional and imperative symbolically executed languages. Naively allowing mutation in this mixed setting introduces soundness issues, however, so we further add concreteness polymorphism, which restores soundness without rejecting too many valid programs. To show that our calculus is a useful model for a real language, we implemented Typed Rosette, a typed extension of the solver-aided Rosette language. We evaluate Typed Rosette by porting a large code base, demonstrating that our type system accommodates a wide variety of symbolically executed programs. ![]() ![]() ![]() |
Chatterjee, Krishnendu
Krishnendu Chatterjee (IST Austria, Austria; IIT Bombay, India) A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graph where the edges are labeled with different types of opening and closing parentheses, and the reachability information is computed via paths whose parentheses are properly matched. We present new results for Dyck reachability problems with applications to alias analysis and data-dependence analysis. Our main contributions, which include improved upper bounds as well as lower bounds that establish optimality guarantees, are as follows: First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph with n nodes and m edges, we present: (i) an algorithm with worst-case running time O(m + n · α(n)), where α(n) is the inverse Ackermann function, improving the previously known O(n²) time bound; (ii) a matching lower bound that shows that our algorithm is optimal wrt worst-case complexity; and (iii) an optimal average-case upper bound of O(m) time, improving the previously known O(m · log n) bound. Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is only linear, and only wrt the number of call sites. Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean Matrix Multiplication, which is a long-standing open problem. Thus we establish that the existing combinatorial algorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the same hardness holds for graphs of constant treewidth. Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependence analysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform all existing methods on the two problems, over real-world benchmarks.

Marek Chalupa, Krishnendu Chatterjee (Masaryk University, Czechia; IST Austria, Austria; Kena Labs, India; IIT Bombay, India) We present a new dynamic partial-order reduction method for stateless model checking of concurrent programs. A common approach for exploring program behaviors relies on enumerating the traces of the program, without storing the visited states (aka stateless exploration). As the number of distinct traces grows exponentially, dynamic partial-order reduction (DPOR) techniques have been successfully used to partition the space of traces into equivalence classes (Mazurkiewicz partitioning), with the goal of exploring only a few representative traces from each class. We introduce a new equivalence on traces under sequential consistency semantics, which we call the observation equivalence. Two traces are observationally equivalent if every read event observes the same write event in both traces. While the traditional Mazurkiewicz equivalence is control-centric, our new definition is data-centric. We show that our observation equivalence is coarser than the Mazurkiewicz equivalence, and in many cases even exponentially coarser.
We devise a DPOR exploration of the trace space, called data-centric DPOR, based on the observation equivalence. For acyclic architectures, our algorithm is guaranteed to explore exactly one representative trace from each observation class, while spending polynomial time per class. Hence, our algorithm is optimal wrt the observation equivalence, and in several cases explores exponentially fewer traces than any enumerative method based on the Mazurkiewicz equivalence. For cyclic architectures, we consider an equivalence between traces which is finer than the observation equivalence but coarser than the Mazurkiewicz equivalence, and in some cases is exponentially coarser. Our data-centric DPOR algorithm remains optimal under this trace equivalence. Finally, we perform a basic experimental comparison between the existing Mazurkiewicz-based DPOR and our data-centric DPOR on a set of academic benchmarks. Our results show a significant reduction in both running time and the number of explored equivalence classes.

Sheshansh Agrawal, Krishnendu Chatterjee (IIT Bombay, India; IST Austria, Austria) Probabilistic programs extend classical imperative programs with real-valued random variables and random branching. The most basic liveness property for such programs is the termination property. The qualitative (aka almost-sure) termination problem asks whether a given program terminates with probability 1. While ranking functions provide a sound and complete method for non-probabilistic programs, the extension of them to probabilistic programs is achieved via ranking supermartingales (RSMs). Although deep theoretical results have been established about RSMs, their application to probabilistic programs with nondeterminism has been limited only to programs of restricted control-flow structure. For non-probabilistic programs, lexicographic ranking functions provide a compositional and practical approach for termination analysis of real-world programs. In this work we introduce lexicographic RSMs and show that they present a sound method for almost-sure termination of probabilistic programs with nondeterminism. We show that lexicographic RSMs provide a tool for compositional reasoning about almost-sure termination, and for probabilistic programs with linear arithmetic they can be synthesized efficiently (in polynomial time). We also show that with additional restrictions even asymptotic bounds on expected termination time can be obtained through lexicographic RSMs. Finally, we present experimental results on benchmarks adapted from previous work to demonstrate the effectiveness of our approach.
Chen, Sheng
![]() John Peter Campora, Sheng Chen, Martin Erwig, and Eric Walkingshaw (University of Louisiana at Lafayette, USA; Oregon State University, USA) Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically-typed to more statically-typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared. In this paper, we address this problem by developing migrational typing, which efficiently types all possible ways of adding or removing type annotations from a gradually typed program. The typing result supports automatically migrating a program to be as static as possible, or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing. ![]() ![]() ![]() |
Chen, Taolue
![]() Taolue Chen, Yan Chen, Matthew Hague (University of London, UK; Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; Royal Holloway University of London, UK; University of Oxford, UK) The theory of strings with concatenation has been widely argued as the basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in practice; for example, the use of regular constraints (pattern matching against a regular expression), and the string-replace function (replacing either the first occurrence or all occurrences of a ``pattern'' string constant/variable/regular expression by a ``replacement'' string constant/variable), among many others. Both regular constraints and the string-replace function are crucial for such applications as analysis of JavaScript (or more generally HTML5 applications) against cross-site scripting (XSS) vulnerabilities, which motivates us to consider a richer class of string constraints. The importance of the string-replace function (especially the replace-all facility) is increasingly recognised, which can be witnessed by the incorporation of the function in the input languages of several string constraint solvers. Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version where pattern/replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. Despite this, the straight-line restriction is still practically sensible since this condition is typically met by string constraints that are generated by symbolic execution. In this paper, we provide the first systematic study of straight-line string constraints with the string-replace function and the regular constraints as the basic operations. We show that a large class of such constraints (i.e. when only a constant string or a regular expression is permitted in the pattern) is decidable. We note that the string-replace function, even under this restriction, is sufficiently powerful for expressing the concatenation operator and much more (e.g. extensions of regular expressions with string variables). This gives us the most expressive decidable logic containing concatenation, replace, and regular constraints under the same umbrella. Our decision procedure for the straight-line fragment follows an automata-theoretic approach, and is modular in the sense that the string-replace terms are removed one by one to generate more and more regular constraints, which can then be discharged by the state-of-the-art string constraint solvers. We also show that this fragment is, in a way, a maximal decidable subclass of the straight-line fragment with string-replace and regular constraints. To this end, we show undecidability results for the following two extensions: (1) variables are permitted in the pattern parameter of the replace function, (2) length constraints are permitted. ![]() ![]() |
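An illustrative, hypothetical constraint in the decidable straight-line fragment, of the kind that XSS analysis would generate (the concrete notation is schematic and not any solver's input language): x is a sanitised copy of user input y, and the query asks whether a script tag can survive sanitisation; since every "<" is replaced, a solver for this fragment would report the constraint unsatisfiable.

```latex
\[
  x = \mathsf{replaceAll}(y,\ \text{\texttt{<}},\ \text{\texttt{\&lt;}})
  \;\wedge\;
  x \in \Sigma^{*} \cdot \text{\texttt{<script>}} \cdot \Sigma^{*}
\]
```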
|
Chen, Yan |
![]() Taolue Chen, Yan Chen, Matthew Hague (University of London, UK; Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; Royal Holloway University of London, UK; University of Oxford, UK) The theory of strings with concatenation has been widely argued as the basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in practice; for example, the use of regular constraints (pattern matching against a regular expression), and the string-replace function (replacing either the first occurrence or all occurrences of a ``pattern'' string constant/variable/regular expression by a ``replacement'' string constant/variable), among many others. Both regular constraints and the string-replace function are crucial for such applications as analysis of JavaScript (or more generally HTML5 applications) against cross-site scripting (XSS) vulnerabilities, which motivates us to consider a richer class of string constraints. The importance of the string-replace function (especially the replace-all facility) is increasingly recognised, which can be witnessed by the incorporation of the function in the input languages of several string constraint solvers. Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version where pattern/replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. Despite this, the straight-line restriction is still practically sensible since this condition is typically met by string constraints that are generated by symbolic execution. In this paper, we provide the first systematic study of straight-line string constraints with the string-replace function and the regular constraints as the basic operations. We show that a large class of such constraints (i.e. when only a constant string or a regular expression is permitted in the pattern) is decidable. We note that the string-replace function, even under this restriction, is sufficiently powerful for expressing the concatenation operator and much more (e.g. extensions of regular expressions with string variables). This gives us the most expressive decidable logic containing concatenation, replace, and regular constraints under the same umbrella. Our decision procedure for the straight-line fragment follows an automata-theoretic approach, and is modular in the sense that the string-replace terms are removed one by one to generate more and more regular constraints, which can then be discharged by the state-of-the-art string constraint solvers. We also show that this fragment is, in a way, a maximal decidable subclass of the straight-line fragment with string-replace and regular constraints. To this end, we show undecidability results for the following two extensions: (1) variables are permitted in the pattern parameter of the replace function, (2) length constraints are permitted. ![]() ![]() |
|
Chin, Wei-Ngan |
![]() Quang-Trung Ta, Ton Chanh Le, Siau-Cheng Khoo, and Wei-Ngan Chin (National University of Singapore, Singapore) The symbolic-heap fragment of separation logic has been actively developed and advocated for verifying the memory-safety property of computer programs. At present, one of its biggest challenges is to effectively prove entailments containing inductive heap predicates. These entailments are usually proof obligations generated when verifying programs that manipulate complex data structures like linked lists, trees, or graphs. To assist in proving such entailments, this paper introduces a lemma synthesis framework, which automatically discovers lemmas to serve as eureka steps in the proofs. Mathematical induction and template-based constraint solving are the two pillars of our framework. To derive the supporting lemmas for a given entailment, the framework first identifies possible lemma templates from the entailment's heap structure. It then sets up unknown relations among each template's variables and conducts a structural induction proof to generate constraints about these relations. Finally, it solves the constraints to find the actual definitions of the unknown relations, thereby discovering the lemmas. We have integrated this framework into a prototype prover and have evaluated it on various entailment benchmarks. The experimental results show that our lemma-synthesis-assisted prover can prove many entailments that could not be handled by existing techniques. This new proposal opens up more opportunities to automatically reason about complex inductive heap predicates. ![]() ![]() ![]() ![]() |
|
Choudhary, Bhavya |
![]() Krishnendu Chatterjee (IST Austria, Austria; IIT Bombay, India) A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graph where the edges are labeled with different types of opening and closing parentheses, and the reachability information is computed via paths whose parentheses are properly matched. We present new results for Dyck reachability problems with applications to alias analysis and data-dependence analysis. Our main contributions, which include improved upper bounds as well as lower bounds that establish optimality guarantees, are as follows: First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph with n nodes and m edges, we present: (i) an algorithm with worst-case running time O(m + n · α(n)), where α(n) is the inverse Ackermann function, improving the previously known O(n²) time bound; (ii) a matching lower bound that shows that our algorithm is optimal with respect to worst-case complexity; and (iii) an optimal average-case upper bound of O(m) time, improving the previously known O(m · log n) bound. Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library to the complexity of the client analysis is only linear, and only with respect to the number of call sites. Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean Matrix Multiplication, which is a long-standing open problem. Thus we establish that the existing combinatorial algorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the same hardness holds for graphs of constant treewidth. Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependence analysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform all existing methods on the two problems, over real-world benchmarks. ![]() ![]() ![]() |
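The sketch below illustrates only the classical merging idea behind bidirected Dyck reachability, via a naive union-find fixpoint rather than the paper's optimal O(m + n · α(n)) algorithm; the edge encoding and the toy graph are our own.

```python
# A naive union-find fixpoint for bidirected Dyck reachability (a sketch of the
# classical merging idea, not the paper's optimal algorithm).
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def bidirected_dyck_classes(n, open_edges):
    """open_edges: (u, i, v) means u --(_i--> v; bidirectedness supplies the
    matching close edge v --)_i--> u implicitly. Nodes end up in the same
    union-find class when a properly matched (Dyck) path connects them."""
    uf = UnionFind(n)
    changed = True
    while changed:                   # naive fixpoint; the paper achieves O(m + n*alpha(n))
        changed = False
        seen = {}                    # (open label, class of target) -> a source node
        for u, i, v in open_edges:
            key = (i, uf.find(v))
            if key in seen:
                # Same open parenthesis into equivalent targets:
                # u --(_i--> x --)_i--> u' is a matched path, so merge the sources.
                changed |= uf.union(seen[key], u)
            else:
                seen[key] = u
    return uf

uf = bidirected_dyck_classes(4, [(0, "f", 2), (1, "f", 2), (2, "g", 3)])
print(uf.find(0) == uf.find(1))      # True: 0 --(_f--> 2 --)_f--> 1 is matched
print(uf.find(0) == uf.find(3))      # False: no matched path between 0 and 3
```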
|
Choudhury, Vikraman |
![]() Niki Vazou, Anish Tondwalkar, Vikraman Choudhury (University of Maryland, USA; University of California at San Diego, USA; Indiana University, USA; University of Edinburgh, UK; Input Output HK, UK) We introduce Refinement Reflection, a new framework for building SMT-based deductive verifiers. The key idea is to reflect the code implementing a user-defined function into the function’s (output) refinement type. As a consequence, at uses of the function, the function definition is instantiated in the SMT logic in a precise fashion that permits decidable verification. Reflection allows the user to write equational proofs of programs just by writing other programs using pattern-matching and recursion to perform case-splitting and induction. Thus, via the propositions-as-types principle, we show that reflection permits the specification of arbitrary functional correctness properties. Finally, we introduce a proof-search algorithm called Proof by Logical Evaluation that uses techniques from model checking and abstract interpretation, to completely automate equational reasoning. We have implemented reflection in Liquid Haskell and used it to verify that the widely used instances of the Monoid, Applicative, Functor, and Monad typeclasses actually satisfy key algebraic laws required to make the clients safe, and have used reflection to build the first library that actually verifies assumptions about associativity and ordering that are crucial for safe deterministic parallelism. ![]() ![]() ![]() |
|
Clairambault, Pierre |
![]() Pierre Clairambault, Charles Grellois, and Andrzej S. Murawski (University of Lyon, France; CNRS, France; ENS Lyon, France; Claude Bernard University Lyon 1, France; LIP, France; Inria, France; Aix-Marseille University, France; ENSAM, France; University of Toulon, France; University of Oxford, UK) Higher-order recursion schemes (HORS) have recently emerged as a promising foundation for higher-order program verification. We examine the impact of enriching HORS with linear types. To that end, we introduce two frameworks that blend non-linear and linear types: a variant of the λY -calculus and an extension of HORS, called linear HORS (LHORS). First we prove that the two formalisms are equivalent and there exist polynomial-time translations between them. Then, in order to support model-checking of (trees generated by) LHORS, we propose a refined version of alternating parity tree automata, called LNAPTA, whose behaviour depends on information about linearity. We show that the complexity of LNAPTA model-checking for LHORS depends on two type-theoretic parameters: linear order and linear depth. The former is in general smaller than the standard notion of order and ignores linear function spaces. In contrast, the latter measures the depth of linear clusters inside a type. Our main result states that LNAPTA model-checking of LHORS of linear order n is n-EXPTIME-complete, when linear depth is fixed. This generalizes and improves upon the classic result of Ong, which relies on the standard notion of order. To illustrate the significance of the result, we consider two applications: the MSO model-checking problem on variants of HORS with case distinction (RSFD and HORSC) on a finite domain and a call-by-value resource verification problem. In both cases, decidability can be established by translation into HORS, but the implied complexity bounds will be suboptimal due to increases in type order. In contrast, we show that the complexity bounds derived by translations into LHORS and appealing to our result are optimal in that they match the respective hardness results. ![]() ![]() |
|
Cong, Youyou |
![]() William J. Bowman, Youyou Cong, Nick Rioux, and Amal Ahmed (Northeastern University, USA; Ochanomizu University, Japan) Dependently typed languages such as Coq are used to specify and prove functional correctness of source programs, but what we ultimately need are guarantees about correctness of compiled code. By preserving dependent types through each compiler pass, we could preserve source-level specifications and correctness proofs into the generated target-language programs. Unfortunately, type-preserving compilation of dependent types is hard. In 2002, Barthe and Uustalu showed that type-preserving CPS is not possible for languages such as Coq. Specifically, they showed that for strong dependent pairs (Σ types), the standard typed call-by-name CPS is not type preserving. They further proved that for dependent case analysis on sums, a class of typed CPS translations—including the standard translation—is not possible. In 2016, Morrisett noticed a similar problem with the standard call-by-value CPS translation for dependent functions (Π types). In essence, the problem is that the standard typed CPS translation by double-negation, in which computations are assigned types of the form (A → ⊥) → ⊥, disrupts the term/type equivalence that is used during type checking in a dependently typed language. In this paper, we prove that type-preserving CPS translation for dependently typed languages is not not possible. We develop both call-by-name and call-by-value CPS translations from the Calculus of Constructions with both Π and Σ types (CC) to a dependently typed target language, and prove type preservation and compiler correctness of each translation. Our target language is CC extended with an additional equivalence rule and an additional typing rule, which we prove consistent by giving a model in the extensional Calculus of Constructions. Our key observation is that we can use a CPS translation that employs answer-type polymorphism, where CPS-translated computations have type ∀ α. (A → α) → α. This type justifies, by a free theorem, the new equality rule in our target language and allows us to recover the term/type equivalences that CPS translation disrupts. Finally, we conjecture that our translation extends to dependent case analysis on sums, despite the impossibility result, and provide a proof sketch. ![]() ![]() ![]() |
|
Cook, William R. |
![]() Yuepeng Wang, Isil Dillig (University of Texas at Austin, USA; Microsoft Research, USA) This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the same externally observable behavior before and after schema migration, there are no existing tools for proving equivalence of such programs. This paper takes a first step towards solving this problem by formalizing the equivalence and refinement checking problems for database-driven applications. We also propose a proof methodology based on the notion of bisimulation invariants over relational algebra with updates and describe a technique for synthesizing such bisimulation invariants. We have implemented the proposed technique in a tool called Mediator for verifying equivalence between database-driven applications written in our intermediate language and evaluate our tool on 21 benchmarks extracted from textbooks and real-world web applications. Our results show that the proposed methodology can successfully verify 20 of these benchmarks. ![]() ![]() |
|
Cyphert, John |
![]() Zachary Kincaid (Princeton University, USA; University of Wisconsin-Madison, USA; GrammaTech, USA) Automatic generation of non-linear loop invariants is a long-standing challenge in program analysis, with many applications. For instance, reasoning about exponentials provides a way to find invariants of digital-filter programs, and reasoning about polynomials and/or logarithms is needed for establishing invariants that describe the time or memory usage of many well-known algorithms. An appealing approach to this challenge is to exploit the powerful recurrence-solving techniques that have been developed in the field of computer algebra, which can compute exact characterizations of non-linear repetitive behavior. However, there is a gap between the capabilities of recurrence solvers and the needs of program analysis: (1) loop bodies are not merely systems of recurrence relations---they may contain conditional branches, nested loops, non-deterministic assignments, etc., and (2) a client program analyzer must be able to reason about the closed-form solutions produced by a recurrence solver (e.g., to prove assertions). This paper presents a method for generating non-linear invariants of general loops based on analyzing recurrence relations. The key components are an abstract domain for reasoning about non-linear arithmetic, a semantics-based method for extracting recurrence relations from loop bodies, and a recurrence solver that avoids closed forms that involve complex or irrational numbers. Our technique has been implemented in a program analyzer that can analyze general loops and mutually recursive procedures. Our experiments show that our technique shows promise for non-linear assertion-checking and resource-bound generation. ![]() ![]() ![]() |
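As a minimal illustration of the recurrence-solving step (using sympy rather than the paper's analyzer), the sketch below extracts the recurrence induced by a hypothetical loop body x = 2*x + 1 and obtains the closed form that would serve as a candidate non-linear invariant.

```python
# A small illustration (using sympy, not the paper's analyzer) of turning a loop
# body into a recurrence and solving it for a closed form. Loop and names are
# hypothetical:
#
#   x = 0; k = 0
#   while k < n:          # the body induces x_{k+1} = 2*x_k + 1
#       x = 2*x + 1
#       k = k + 1
#
from sympy import Function, rsolve, symbols, simplify

k = symbols("k", integer=True, nonnegative=True)
x = Function("x")

closed_form = rsolve(x(k + 1) - 2 * x(k) - 1, x(k), {x(0): 0})
print(closed_form)                      # 2**k - 1

# The closed form suggests the non-linear loop invariant  x == 2**k - 1,
# which we re-check against the recurrence:
assert simplify(closed_form.subs(k, k + 1) - (2 * closed_form + 1)) == 0
```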
|
Danielsson, Nils Anders |
![]() Nils Anders Danielsson (University of Gothenburg, Sweden; Chalmers University of Technology, Sweden) Up-to techniques are used to make it easier—or feasible—to construct, for instance, proofs of bisimilarity. This text shows how many up-to techniques can be framed as size-preserving functions, using sized types to keep track of sizes. Through a number of examples it is argued that this approach to up-to techniques is often convenient to use in practice. Some examples of functions that cannot be made size-preserving are also included, in order to illustrate the limits of the approach. On the more theoretical side a class of up-to techniques intended to capture a natural mode of use of size-preserving functions is defined. This class turns out to correspond closely to "functions below the companion", a notion recently introduced by Pous. ![]() ![]() ![]() |
|
Deacon, Will |
![]() Christopher Pulte (University of Cambridge, UK; ARM, UK; University of St. Andrews, UK) ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes could become visible to some other threads before becoming visible to all — but this has not been exploited in production implementations, the corresponding potential hardware optimisations are thought to have insufficient benefits in the ARM context, and it gives rise to subtle complications when combined with other ARMv8 features. The ARMv8 architecture has therefore been revised: it now has a multicopy-atomic model. It has also been simplified in other respects, including more straightforward notions of dependency, and the architecture now includes a formal concurrency model. In this paper we detail these changes and discuss their motivation. We define two formal concurrency models: an operational one, simplifying the Flowing model of Flur et al., and the axiomatic model of the revised ARMv8 specification. The models were developed by an academic group and by ARM staff, respectively, and this extended collaboration partly motivated the above changes. We prove the equivalence of the two models. The operational model is integrated into an executable exploration tool with new web interface, demonstrated by exhaustively checking the possible behaviours of a loop-unrolled version of a Linux kernel lock implementation, a previously known bug due to unprevented speculation, and a fixed version. ![]() ![]() ![]() |
|
De Araújo, Marcus R. |
![]() Leandro T. C. Melo, Rodrigo G. Ribeiro, Marcus R. de Araújo, and Fernando Magno Quintão Pereira (Federal University of Minas Gerais, Brazil; Federal University of Ouro Preto, Brazil) Incomplete source code naturally emerges in software development: during the design phase, while evolving, testing and analyzing programs. Therefore, the ability to understand partial programs is a valuable asset. However, this problem is still unsolved in the C programming language. Difficulties stem from the fact that parsing C requires, not only syntax, but also semantic information. Furthermore, inferring types so that they respect C's type system is a challenging task. In this paper we present a technique that lets us solve these problems. We provide a unification-based type inference capable of dealing with C intricacies. The ideas we present let us reconstruct partial C programs into complete well-typed ones. Such program reconstruction has several applications: enabling static analysis tools in scenarios where software components may be absent; improving static analysis tools that do not rely on build-specifications; allowing stub-generation and testing tools to work on snippets; and assisting programmers on the extraction of reusable data-structures out of the program parts that use them. Our evaluation is performed on source code from a variety of C libraries such as GNU's Coreutils, GNULib, GNOME's GLib, and GDSL; on implementations from Sedgewick's books; and on snippets from popular open-source projects like CPython, FreeBSD, and Git. ![]() ![]() ![]() ![]() |
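A minimal sketch of the unification flavour of such inference is shown below; the C snippet, the extracted constraints, and the tiny type language are hypothetical and far simpler than the paper's treatment of C.

```python
# A minimal sketch of unification-based inference of a missing declaration, in
# the spirit of (but much simpler than) the paper's technique. Hypothetical
# incomplete snippet, where the type T of 'x' and 'y' is unknown:
#
#   T x, y;
#   y = x + 1;        // constraint: T = int            (arithmetic use)
#   int *p = &y;      // constraint: ptr(T) = ptr(int)
#
def resolve(t, subst):
    while isinstance(t, str) and t in subst:
        t = subst[t]
    if isinstance(t, tuple):
        return (t[0], resolve(t[1], subst))
    return t

def unify(t1, t2, subst):
    """Syntactic unification; types are strings ('int'), variables ('?T'),
    or constructors like ('ptr', t). No occurs check: fine for this sketch."""
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith("?"):     # type variable
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.startswith("?"):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and t1[0] == t2[0]:
        return unify(t1[1], t2[1], subst)
    raise TypeError(f"cannot unify {t1} with {t2}")

# Constraints collected from the snippet above.
constraints = [("?T", "int"), (("ptr", "?T"), ("ptr", "int"))]
subst = {}
for lhs, rhs in constraints:
    subst = unify(lhs, rhs, subst)

print(resolve("?T", subst))    # 'int': reconstruct the declaration as 'int x, y;'
```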
|
Devriese, Dominique |
![]() Dominique Devriese, Marco Patrignani, and Frank Piessens (KU Leuven, Belgium; MPI-SWS, Germany; CISPA, Germany) There has long been speculation in the scientific literature on how to dynamically enforce parametricity such as that yielded by System F. Almost 20 years ago, Sumii and Pierce proposed a formal compiler from System F into the cryptographic lambda calculus: an untyped lambda calculus extended with an idealised model of encryption. They conjectured that this compiler was fully abstract, i.e. that compiled terms are contextually equivalent if and only if the original terms were, a property that can be seen as a form of secure compilation. The conjecture has received attention in several other publications since then, but remains open to this day. More recently, several researchers have been looking at gradually-typed languages that extend System F. In this setting it is natural to wonder whether embedding System F into these gradually-typed languages preserves contextual equivalence and thus parametricity. In this paper, we answer both questions negatively. We provide a concrete counterexample: two System F terms whose contextual equivalence is not preserved by the Sumii-Pierce compiler, nor the embedding into the polymorphic blame calculus. This counterexample relies on the absence in System F of what we call a universal type, i.e., a type that all other types can be injected into and extracted from. As the languages in which System F is compiled have a universal type, the compilation cannot be fully abstract; this paper explains why. We believe this paper thus sheds light on recent results in the field of gradually typed languages and it provides a perspective for further research into secure compilation of polymorphic languages. ![]() ![]() |
|
Dillig, Isil |
![]() Xinyu Wang, Isil Dillig (University of Texas at Austin, USA; Microsoft Research, USA) We present a new approach to example-guided program synthesis based on counterexample-guided abstraction refinement. Our method uses the abstract semantics of the underlying DSL to find a program P whose abstract behavior satisfies the examples. However, since program P may be spurious with respect to the concrete semantics, our approach iteratively refines the abstraction until we either find a program that satisfies the examples or prove that no such DSL program exists. Because many programs have the same input-output behavior in terms of their abstract semantics, this synthesis methodology significantly reduces the search space compared to existing techniques that use purely concrete semantics. While synthesis using abstraction refinement (SYNGAR) could be implemented in different settings, we propose a refinement-based synthesis algorithm that uses abstract finite tree automata (AFTA). Our technique uses a coarse initial program abstraction to construct an initial AFTA, which is iteratively refined by constructing a proof of incorrectness of any spurious program. In addition to ruling out the spurious program accepted by the previous AFTA, proofs of incorrectness are also useful for ruling out many other spurious programs. We implement these ideas in a framework called Blaze, which can be instantiated in different domains by providing a suitable DSL and its corresponding concrete and abstract semantics. We have used the Blaze framework to build synthesizers for string and matrix transformations, and we compare Blaze with existing techniques. Our results for the string domain show that Blaze compares favorably with FlashFill, a domain-specific synthesizer that is now deployed in Microsoft PowerShell. In the context of matrix manipulations, we compare Blaze against Prose, a state-of-the-art general-purpose VSA-based synthesizer, and show that Blaze results in a 90x speed-up over Prose. In both application domains, Blaze also consistently improves upon the performance of two other existing techniques by at least an order of magnitude. ![]() ![]() ![]() Yuepeng Wang, Isil Dillig (University of Texas at Austin, USA; Microsoft Research, USA) This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the same externally observable behavior before and after schema migration, there are no existing tools for proving equivalence of such programs. This paper takes a first step towards solving this problem by formalizing the equivalence and refinement checking problems for database-driven applications. We also propose a proof methodology based on the notion of bisimulation invariants over relational algebra with updates and describe a technique for synthesizing such bisimulation invariants. We have implemented the proposed technique in a tool called Mediator for verifying equivalence between database-driven applications written in our intermediate language and evaluate our tool on 21 benchmarks extracted from textbooks and real-world web applications. Our results show that the proposed methodology can successfully verify 20 of these benchmarks. ![]() ![]() |
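The sketch below conveys only the core intuition (it is not Blaze and uses no tree automata): an enumerative synthesizer over a made-up string DSL prunes candidates whose abstract semantics, here just the interval of possible output lengths, already contradicts the examples.

```python
# An illustrative enumerative synthesizer (not Blaze, no abstract tree automata)
# that uses an abstract semantics, here the interval of possible output lengths,
# to prune candidate programs before running them concretely. The DSL and the
# abstraction are made up for this sketch.
from itertools import product

OPS = {
    "upper":  (lambda s: s.upper(),   lambda lo, hi: (lo, hi)),
    "first3": (lambda s: s[:3],       lambda lo, hi: (min(lo, 3), min(hi, 3))),
    "dup":    (lambda s: s + s,       lambda lo, hi: (2 * lo, 2 * hi)),
    "strip_digits": (lambda s: "".join(c for c in s if not c.isdigit()),
                     lambda lo, hi: (0, hi)),
}

def run(prog, s):
    for op in prog:
        s = OPS[op][0](s)
    return s

def synthesize(examples, max_depth=3):
    explored = pruned = 0
    for depth in range(1, max_depth + 1):
        for prog in product(OPS, repeat=depth):
            explored += 1
            # Abstract run: propagate the interval of feasible output lengths.
            feasible = True
            for inp, out in examples:
                lo = hi = len(inp)
                for op in prog:
                    lo, hi = OPS[op][1](lo, hi)
                if not (lo <= len(out) <= hi):
                    feasible = False
                    break
            if not feasible:
                pruned += 1
                continue
            # Concrete check only for abstractly feasible candidates.
            if all(run(prog, inp) == out for inp, out in examples):
                return prog, explored, pruned
    return None, explored, pruned

print(synthesize([("ab1c", "ABC"), ("x9yz", "XYZ")]))
# finds ('upper', 'strip_digits') and reports how many candidates the abstraction pruned
```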
|
Dongol, Brijesh |
![]() Brijesh Dongol, Radha Jagadeesan, and James Riely (Brunel University London, UK; DePaul University, USA) The integration of transactions into hardware relaxed memory architectures is a topic of current research both in industry and academia. In this paper, we provide a general architectural framework for the introduction of transactions into models of relaxed memory in hardware, including the SC, TSO, ARMv8 and PPC models. Our framework incorporates flexible and expressive forms of transaction aborts and execution that have hitherto been in the realm of software transactional memory. In contrast to software transactional memory, we account for the characteristics of relaxed memory as a restricted form of distributed system, without a notion of global time. We prove abstraction theorems to demonstrate that the programmer API matches the intuitions and expectations about transactions. ![]() ![]() |
|
Dreyer, Derek |
![]() Ralf Jung (MPI-SWS, Germany; Delft University of Technology, Netherlands) Rust is a new systems programming language that promises to overcome the seemingly fundamental tradeoff between high-level safety guarantees and low-level control over resource management. Unfortunately, none of Rust's safety claims have been formally proven, and there is good reason to question whether they actually hold. Specifically, Rust employs a strong, ownership-based type system, but then extends the expressive power of this core type system through libraries that internally use unsafe features. In this paper, we give the first formal (and machine-checked) safety proof for a language representing a realistic subset of Rust. Our proof is extensible in the sense that, for each new Rust library that uses unsafe features, we can say what verification condition it must satisfy in order for it to be deemed a safe extension to the language. We have carried out this verification for some of the most important libraries that are used throughout the Rust ecosystem. ![]() ![]() ![]() ![]() |
Ehrhard, Thomas |
![]() Thomas Ehrhard, Michele Pagani, and Christine Tasson (University of Paris Diderot, France; CNRS, France) We define a notion of stable and measurable map between cones endowed with measurability tests and show that it forms a cpo-enriched cartesian closed category. This category gives a denotational model of an extension of PCF supporting the main primitives of probabilistic functional programming, like continuous and discrete probabilistic distributions, sampling, conditioning and full recursion. We prove the soundness and adequacy of this model with respect to a call-by-name operational semantics and give some examples of its denotations. ![]() ![]() ![]() |
|
Emmi, Michael |
![]() Michael Emmi and Constantin Enea (SRI International, USA; University of Paris Diderot, France; CNRS, France) While many program properties like the validity of assertions and in-bounds array accesses admit nearly-trivial monitoring algorithms, the standard correctness criterion for concurrent data structures does not. Given an implementation of an arbitrary abstract data type, checking whether the operations invoked in one single concurrent execution are linearizable, i.e., indistinguishable from an execution where the same operations are invoked atomically, requires exponential time in the number of operations. In this work we identify a class of collection abstract data types which admit polynomial-time linearizability monitors. Collections capture the majority of concurrent data structures available in practice, including stacks, queues, sets, and maps. Although monitoring executions of arbitrary abstract data types requires enumerating exponentially-many possible linearizations, collections enjoy combinatorial properties which avoid the enumeration. We leverage these properties to reduce linearizability to Horn satisfiability. As far as we know, ours is the first sound, complete, and tractable algorithm for monitoring linearizability for types beyond single-value registers. ![]() ![]() |
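For contrast with the polynomial-time monitors developed in the paper, here is the exponential baseline: a brute-force monitor that searches for a real-time-respecting linearization of a FIFO-queue history; the history and timestamps below are made up.

```python
# A brute-force linearizability monitor for a concurrent FIFO-queue history:
# the exponential baseline that the paper's polynomial-time monitors for
# collection types avoid. The operations and timestamps are made up.
from itertools import permutations
from collections import deque

# (start_time, end_time, operation, value): value is the enqueued item or the
# dequeue result.
history = [
    (0, 3, "enq", "a"),
    (1, 2, "enq", "b"),
    (2, 5, "deq", "b"),
    (4, 6, "deq", "a"),
]

def legal_sequential(seq):
    q = deque()
    for _, _, op, val in seq:
        if op == "enq":
            q.append(val)
        else:                                   # deq
            expected = q.popleft() if q else None
            if expected != val:
                return False
    return True

def respects_real_time(seq):
    # If one operation finishes before another starts, it must come first.
    pos = {id(o): i for i, o in enumerate(seq)}
    return all(pos[id(a)] < pos[id(b)]
               for a in seq for b in seq if a[1] < b[0])

def linearizable(history):
    # n! candidate linearizations: exponential in the number of operations.
    return any(respects_real_time(seq) and legal_sequential(seq)
               for seq in permutations(history))

print(linearizable(history))   # True: enq(b); enq(a); deq()=b; deq()=a
```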
|
Enea, Constantin |
![]() Michael Emmi and Constantin Enea (SRI International, USA; University of Paris Diderot, France; CNRS, France) While many program properties like the validity of assertions and in-bounds array accesses admit nearly-trivial monitoring algorithms, the standard correctness criterion for concurrent data structures does not. Given an implementation of an arbitrary abstract data type, checking whether the operations invoked in one single concurrent execution are linearizable, i.e., indistinguishable from an execution where the same operations are invoked atomically, requires exponential time in the number of operations. In this work we identify a class of collection abstract data types which admit polynomial-time linearizability monitors. Collections capture the majority of concurrent data structures available in practice, including stacks, queues, sets, and maps. Although monitoring executions of arbitrary abstract data types requires enumerating exponentially-many possible linearizations, collections enjoy combinatorial properties which avoid the enumeration. We leverage these properties to reduce linearizability to Horn satisfiability. As far as we know, ours is the first sound, complete, and tractable algorithm for monitoring linearizability for types beyond single-value registers. ![]() ![]() |
|
Erwig, Martin |
![]() John Peter Campora, Sheng Chen, Martin Erwig, and Eric Walkingshaw (University of Louisiana at Lafayette, USA; Oregon State University, USA) Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically-typed to more statically-typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared. In this paper, we address this problem by developing migrational typing, which efficiently types all possible ways of adding or removing type annotations from a gradually typed program. The typing result supports automatically migrating a program to be as static as possible, or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing. ![]() ![]() ![]() |
|
Espitau, Thomas |
![]() Gilles Barthe (IMDEA Software Institute, Spain; UPMC, France; Inria, France; University College London, UK; École Polytechnique, France) Program sensitivity, also known as Lipschitz continuity, describes how small changes in a program’s input lead to bounded changes in the output. We propose an average notion of program sensitivity for probabilistic programs—expected sensitivity—that averages a distance function over a probabilistic coupling of two output distributions from two similar inputs. By varying the distance, expected sensitivity recovers useful notions of probabilistic function sensitivity, including stability of machine learning algorithms and convergence of Markov chains. Furthermore, expected sensitivity satisfies clean compositional properties and is amenable to formal verification. We develop a relational program logic called EpRHL for proving expected sensitivity properties. Our logic features two key ideas. First, relational pre-conditions and post-conditions are expressed using distances, a real-valued generalization of typical boolean-valued (relational) assertions. Second, judgments are interpreted in terms of expectation coupling, a novel, quantitative generalization of probabilistic couplings which supports compositional reasoning. We demonstrate our logic on examples beyond the reach of prior relational logics. Our main example formalizes uniform stability of the stochastic gradient method. Furthermore, we prove rapid mixing for a probabilistic model of population dynamics. We also extend our logic with a transitivity principle for expectation couplings to capture the path coupling proof technique by Bubley and Dyer, and formalize rapid mixing of the Glauber dynamics from statistical physics. ![]() ![]() ![]() |
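An informal, empirical companion to this notion (not the logic EpRHL): couple two SGD runs by sharing their randomness, feed them datasets differing in a single example, and average the distance between the learned parameters. The model, data, and step size below are made up.

```python
# An informal empirical illustration (not the paper's logic) of expected
# sensitivity: couple two runs of SGD by sharing their randomness, train them on
# datasets differing in one example, and average the distance between outputs.
import random

def sgd(data, steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)               # shared seed = a coupling of the two runs
    w = 0.0
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        grad = 2 * (w * x - y) * x           # gradient of the squared loss (w*x - y)^2
        w -= lr * grad
    return w

base = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2)]
neighbour = base[:-1] + [(4.0, 6.0)]         # differs in a single example

distances = [abs(sgd(base, seed=s) - sgd(neighbour, seed=s)) for s in range(100)]
print(sum(distances) / len(distances))       # small: the averaged (expected) distance stays bounded
```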
|
Farzan, Azadeh |
![]() Azadeh Farzan (University of Toronto, Canada; Princeton University, USA) Many problems in formal methods can be formalized as two-player games. For several applications—program synthesis, for example—in addition to determining which player wins the game, we are interested in computing a winning strategy for that player. This paper studies the strategy synthesis problem for games defined within the theory of linear rational arithmetic. Two types of games are considered. A satisfiability game, described by a quantified formula, is played by two players that take turns instantiating quantifiers. The objective of each player is to prove (or disprove) satisfiability of the formula. A reachability game, described by a pair of formulas defining the legal moves of each player, is played by two players that take turns choosing positions—rational vectors of some fixed dimension. The objective of each player is to reach a position where the opposing player has no legal moves (or to play the game forever). We give a complete algorithm for synthesizing winning strategies for satisfiability games and a sound (but necessarily incomplete) algorithm for synthesizing winning strategies for reachability games. ![]() ![]() ![]() |
|
Feng, Xinyu |
![]() Hongjin Liang and Xinyu Feng (Nanjing University, China; University of Science and Technology of China, China) Various progress properties have been proposed for concurrent objects, such as wait-freedom, lock-freedom, starvation-freedom and deadlock-freedom. However, none of them applies to concurrent objects with partial methods, i.e., methods that are supposed not to return under certain circumstances. A typical example is the lock_acquire method, which must not return when the lock has already been acquired. In this paper we propose two new progress properties, partial starvation-freedom (PSF) and partial deadlock-freedom (PDF), for concurrent objects with partial methods. We also design four patterns to write abstract specifications for PSF or PDF objects under strongly or weakly fair scheduling, so that these objects contextually refine the abstract specifications. Our Abstraction Theorem shows the equivalence between PSF (or PDF) and the progress-aware contextual refinement. Finally, we generalize the program logic LiLi to have a new logic to verify the PSF (or PDF) property and linearizability of concurrent objects. ![]() ![]() |
|
Fisher, Kathleen |
![]() Anders Miltner, Kathleen Fisher, Benjamin C. Pierce (Princeton University, USA; Tufts University, USA; University of Pennsylvania, USA) Bidirectional transformations between different data representations occur frequently in modern software systems. They appear as serializers and deserializers, as parsers and pretty printers, as database views and view updaters, and as a multitude of different kinds of ad hoc data converters. Manually building bidirectional transformations---by writing two separate functions that are intended to be inverses---is tedious and error prone. A better approach is to use a domain-specific language in which both directions can be written as a single expression. However, these domain-specific languages can be difficult to program in, requiring programmers to manage fiddly details while working in a complex type system. We present an alternative approach. Instead of coding transformations manually, we synthesize them from declarative format descriptions and examples. Specifically, we present Optician, a tool for type-directed synthesis of bijective string transformers. The inputs to Optician are a pair of ordinary regular expressions representing two data formats and a few concrete examples for disambiguation. The output is a well-typed program in Boomerang (a bidirectional language based on the theory of lenses). The main technical challenge involves navigating the vast program search space efficiently. In particular, and unlike most prior work on type-directed synthesis, our system operates in the context of a language with a rich equivalence relation on types (the theory of regular expressions). Consequently, program synthesis requires search in two dimensions: First, our synthesis algorithm must find a pair of "syntactically compatible types," and second, using the structure of those types, it must find a type- and example-compliant term. Our key insight is that it is possible to reduce the size of this search space without losing any computational power by defining a new language of lenses designed specifically for synthesis. The new language is free from arbitrary function composition and operates only over types and terms in a new disjunctive normal form. We prove (1) our new language is just as powerful as a more natural, compositional, and declarative language and (2) our synthesis algorithm is sound and complete with respect to the new language. We also demonstrate empirically that our new language changes the synthesis problem from one that admits intractable solutions to one that admits highly efficient solutions, able to synthesize intricate lenses between complex file formats in seconds. We evaluate Optician on a benchmark suite of 39 examples that includes both microbenchmarks and realistic examples derived from other data management systems including Flash Fill, a tool for synthesizing string transformations in spreadsheets, and Augeas, a tool for bidirectional processing of Linux system configuration files. ![]() ![]() ![]() |
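To show what the synthesized artifact amounts to, here is a hand-written bijective string transformer between two made-up formats together with its round-trip laws checked on examples; Optician synthesizes such pairs in Boomerang from the two regular expressions and examples rather than requiring them to be written by hand.

```python
# What a bijective string transformer (lens) between two formats looks like once
# written out by hand; the formats and examples are made up for this sketch:
#   source format:  "Lastname,Firstname"
#   target format:  "Firstname Lastname"
import re

SOURCE = re.compile(r"([A-Za-z]+),([A-Za-z]+)")
TARGET = re.compile(r"([A-Za-z]+) ([A-Za-z]+)")

def get(s):                          # source -> target
    last, first = SOURCE.fullmatch(s).groups()
    return f"{first} {last}"

def put(t):                          # target -> source (the inverse direction)
    first, last = TARGET.fullmatch(t).groups()
    return f"{last},{first}"

# Round-trip (bijectivity) laws, checked on the disambiguating examples.
for s in ["Hopper,Grace", "Turing,Alan"]:
    assert put(get(s)) == s
for t in ["Grace Hopper", "Alan Turing"]:
    assert get(put(t)) == t
print(get("Hopper,Grace"))           # 'Grace Hopper'
```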
|
Flückiger, Olivier |
![]() Olivier Flückiger (Northeastern University, USA; Inria, France; Czech Technical University, Czechia) High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition. ![]() ![]() |
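A runtime-level sketch of the phenomenon (not the paper's intermediate representation sourir): specialized code compiled under an explicit assumption, a guard that checks it, and a deoptimization fallback to semantically equivalent generic code when the assumption fails.

```python
# A runtime-level sketch of speculation with an explicit assumption: a "compiled"
# fast path assumes its argument is an int; a guard checks the assumption and
# deoptimizes to the generic path otherwise. (A real JIT would also reconstruct
# interpreter state at the deoptimization point.)
deopt_count = 0

def generic_add_one(x):              # baseline semantics: works for any x
    return x + 1

def specialized_add_one(x):
    global deopt_count
    if type(x) is not int:           # guard materialising the assumption "x is an int"
        deopt_count += 1             # assumption invalidated: transfer control ...
        return generic_add_one(x)    # ... to semantically equivalent generic code
    return x + 1                     # fast path, valid only under the assumption

print(specialized_add_one(41))       # 42, via the fast path
print(specialized_add_one(1.5))      # 2.5, via deoptimization
print(deopt_count)                   # 1
```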
|
Flur, Shaked |
![]() Christopher Pulte (University of Cambridge, UK; ARM, UK; University of St. Andrews, UK) ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes could become visible to some other threads before becoming visible to all — but this has not been exploited in production implementations, the corresponding potential hardware optimisations are thought to have insufficient benefits in the ARM context, and it gives rise to subtle complications when combined with other ARMv8 features. The ARMv8 architecture has therefore been revised: it now has a multicopy-atomic model. It has also been simplified in other respects, including more straightforward notions of dependency, and the architecture now includes a formal concurrency model. In this paper we detail these changes and discuss their motivation. We define two formal concurrency models: an operational one, simplifying the Flowing model of Flur et al., and the axiomatic model of the revised ARMv8 specification. The models were developed by an academic group and by ARM staff, respectively, and this extended collaboration partly motivated the above changes. We prove the equivalence of the two models. The operational model is integrated into an executable exploration tool with new web interface, demonstrated by exhaustively checking the possible behaviours of a loop-unrolled version of a Linux kernel lock implementation, a previously known bug due to unprevented speculation, and a fixed version. ![]() ![]() ![]() |
|
Fournet, Cédric |
![]() Danel Ahman, Cédric Fournet (Inria, France; Microsoft Research, UK; ENS Paris, France; Microsoft Research, India; Microsoft Research, USA) We provide a way to ease the verification of programs whose state evolves monotonically. The main idea is that a property witnessed in a prior state can be soundly recalled in the current state, provided (1) state evolves according to a given preorder, and (2) the property is preserved by this preorder. In many scenarios, such monotonic reasoning yields concise modular proofs, saving the need for explicit program invariants. We distill our approach into the monotonic-state monad, a general yet compact interface for Hoare-style reasoning about monotonic state in a dependently typed language. We prove the soundness of the monotonic-state monad and use it as a unified foundation for reasoning about monotonic state in the F⋆ verification system. Based on this foundation, we build libraries for various mutable data structures like monotonic references and apply these libraries at scale to the verification of several distributed applications. ![]() ![]() ![]() ![]() |
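A dynamic, run-time-checked analogue of the idea (the paper's reasoning is static, in F*): when the state is a grow-only set, the preorder is set inclusion, and a predicate such as "this entry has been logged" is stable under it, so a witness taken earlier can be recalled later without rechecking. The log and entry names below are made up.

```python
# A dynamic analogue of the monotonic-state idea: the state is a grow-only set,
# the preorder is set inclusion, and membership is a stable predicate, so a
# witness taken earlier can be recalled later. F* discharges this statically.
class GrowOnlyLog:
    def __init__(self):
        self._entries = set()

    def add(self, entry):                       # the only update: monotone w.r.t. inclusion
        self._entries.add(entry)

    def witness(self, entry):                   # establish the stable predicate now
        assert entry in self._entries
        return ("witnessed", entry)

    def recall(self, token):                    # sound later, because the state only grows
        _, entry = token
        assert entry in self._entries           # succeeds for every valid token
        return entry

log = GrowOnlyLog()
log.add("nonce-42")
token = log.witness("nonce-42")
log.add("nonce-43")                             # state evolves, but only by growing
print(log.recall(token))                        # 'nonce-42': the old fact still holds
```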
|
Fragoso Santos, José |
![]() José Fragoso Santos, Petar Maksimović, Daiva Naudžiūnienė, Thomas Wood, and Philippa Gardner (Imperial College London, UK; Mathematical Institute SASA, Serbia) The dynamic nature of JavaScript and its complex semantics make it a difficult target for logic-based verification. We introduce JaVerT, a semi-automatic JavaScript Verification Toolchain, based on separation logic and aimed at the specialist developer wanting rich, mechanically verified specifications of critical JavaScript code. To specify JavaScript programs, we design abstractions that capture its key heap structures (for example, prototype chains and function closures), allowing the developer to write clear and succinct specifications with minimal knowledge of the JavaScript internals. To verify JavaScript programs, we develop JaVerT, a verification pipeline consisting of: JS-2-JSIL, a well-tested compiler from JavaScript to JSIL, an intermediate goto language capturing the fundamental dynamic features of JavaScript; JSIL Verify, a semi-automatic verification tool based on a sound JSIL separation logic; and verified axiomatic specifications of the JavaScript internal functions. Using JaVerT, we verify functional correctness properties of: data-structure libraries (key-value map, priority queue) written in an object-oriented style; operations on data structures such as binary search trees (BSTs) and lists; examples illustrating function closures; and test cases from the official ECMAScript test suite. The verification times suggest that reasoning about larger, more complex code using JaVerT is feasible. ![]() ![]() ![]() |
|
French, Jon |
![]() Christopher Pulte (University of Cambridge, UK; ARM, UK; University of St. Andrews, UK) ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes could become visible to some other threads before becoming visible to all — but this has not been exploited in production implementations, the corresponding potential hardware optimisations are thought to have insufficient benefits in the ARM context, and it gives rise to subtle complications when combined with other ARMv8 features. The ARMv8 architecture has therefore been revised: it now has a multicopy-atomic model. It has also been simplified in other respects, including more straightforward notions of dependency, and the architecture now includes a formal concurrency model. In this paper we detail these changes and discuss their motivation. We define two formal concurrency models: an operational one, simplifying the Flowing model of Flur et al., and the axiomatic model of the revised ARMv8 specification. The models were developed by an academic group and by ARM staff, respectively, and this extended collaboration partly motivated the above changes. We prove the equivalence of the two models. The operational model is integrated into an executable exploration tool with new web interface, demonstrated by exhaustively checking the possible behaviours of a loop-unrolled version of a Linux kernel lock implementation, a previously known bug due to unprevented speculation, and a fixed version. ![]() ![]() ![]() |
|
Gaboardi, Marco |
![]() Ivan Radiček, Gilles Barthe (Vienna University of Technology, Austria; IMDEA Software Institute, Spain; SUNY Buffalo, USA; MPI-SWS, Germany) Formal frameworks for cost analysis of programs have been widely studied in the unary setting and, to a limited extent, in the relational setting. However, many of these frameworks focus only on the cost aspect, largely side-lining functional properties that are often a prerequisite for cost analysis, thus leaving many interesting programs out of their purview. In this paper, we show that elegant, simple, expressive proof systems combining cost analysis and functional properties can be built by combining already known ingredients: higher-order refinements and cost monads. Specifically, we derive two syntax-directed proof systems, UC and RC, for unary and relational cost analysis, by adding a cost monad to a (syntax-directed) logic of higher-order programs. We study the metatheory of the systems, show that several nontrivial examples can be verified in them, and prove that existing frameworks for cost analysis (RelCost and RAML) can be embedded in them. ![]() ![]() |
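A unary-cost illustration of the cost-monad ingredient, rendered in Python (not the paper's proof systems UC and RC): computations return a value paired with a cost, so cost bounds become ordinary assertions about that pair. The tick placement and the insertion-sort example are our own.

```python
# A unary-cost illustration of the "cost monad" idea: computations return a
# (value, cost) pair, and cost bounds become ordinary assertions about that pair.
def ret(v):
    return (v, 0)

def bind(m, f):
    v, c1 = m
    w, c2 = f(v)
    return (w, c1 + c2)

def tick(n=1):
    return (None, n)                      # charge n units of cost

def insert(x, xs):                        # insert x into sorted xs, one tick per comparison
    if not xs:
        return ret([x])
    m = tick()
    if x <= xs[0]:
        return bind(m, lambda _: ret([x] + xs))
    return bind(m, lambda _: bind(insert(x, xs[1:]),
                                  lambda rest: ret([xs[0]] + rest)))

def isort(xs):
    if not xs:
        return ret([])
    return bind(isort(xs[1:]), lambda sorted_tail: insert(xs[0], sorted_tail))

value, cost = isort([3, 1, 2, 5, 4])
print(value, cost)                        # [1, 2, 3, 4, 5] with cost 6
assert cost <= 5 * 4 // 2                 # the unary bound n*(n-1)/2, checked dynamically
```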
|
Gardner, Philippa |
![]() José Fragoso Santos, Petar Maksimović, Daiva Naudžiūnienė, Thomas Wood, and Philippa Gardner (Imperial College London, UK; Mathematical Institute SASA, Serbia) The dynamic nature of JavaScript and its complex semantics make it a difficult target for logic-based verification. We introduce JaVerT, a semi-automatic JavaScript Verification Toolchain, based on separation logic and aimed at the specialist developer wanting rich, mechanically verified specifications of critical JavaScript code. To specify JavaScript programs, we design abstractions that capture its key heap structures (for example, prototype chains and function closures), allowing the developer to write clear and succinct specifications with minimal knowledge of the JavaScript internals. To verify JavaScript programs, we develop JaVerT, a verification pipeline consisting of: JS-2-JSIL, a well-tested compiler from JavaScript to JSIL, an intermediate goto language capturing the fundamental dynamic features of JavaScript; JSIL Verify, a semi-automatic verification tool based on a sound JSIL separation logic; and verified axiomatic specifications of the JavaScript internal functions. Using JaVerT, we verify functional correctness properties of: data-structure libraries (key-value map, priority queue) written in an object-oriented style; operations on data structures such as binary search trees (BSTs) and lists; examples illustrating function closures; and test cases from the official ECMAScript test suite. The verification times suggest that reasoning about larger, more complex code using JaVerT is feasible. ![]() ![]() ![]() |
|
Garg, Deepak |
![]() Ivan Radiček, Gilles Barthe (Vienna University of Technology, Austria; IMDEA Software Institute, Spain; SUNY Buffalo, USA; MPI-SWS, Germany) Formal frameworks for cost analysis of programs have been widely studied in the unary setting and, to a limited extent, in the relational setting. However, many of these frameworks focus only on the cost aspect, largely side-lining functional properties that are often a prerequisite for cost analysis, thus leaving many interesting programs out of their purview. In this paper, we show that elegant, simple, expressive proof systems combining cost analysis and functional properties can be built by combining already known ingredients: higher-order refinements and cost monads. Specifically, we derive two syntax-directed proof systems, UC and RC, for unary and relational cost analysis, by adding a cost monad to a (syntax-directed) logic of higher-order programs. We study the metatheory of the systems, show that several nontrivial examples can be verified in them, and prove that existing frameworks for cost analysis (RelCost and RAML) can be embedded in them. ![]() ![]() |
|
Ghahramani, Zoubin |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
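One of the simplest building blocks such a modular account composes, rendered concretely (plain Monte Carlo, not the paper's quasi-Borel development): a probabilistic program represented as a weighted sampler, with conditioning implemented by importance weights. The coin-bias model below is made up.

```python
# A weighted-sampler building block: importance sampling for a made-up coin-bias
# model with a uniform prior and Bernoulli observations. Plain Monte Carlo, used
# only to illustrate the kind of component the paper's semantics composes.
import random

def prior():
    return random.random()                      # coin bias, uniform on [0, 1]

def likelihood(bias, observations):
    w = 1.0
    for obs in observations:
        w *= bias if obs else (1.0 - bias)
    return w

def importance_posterior_mean(observations, n=100_000):
    total_w = total_wx = 0.0
    for _ in range(n):
        bias = prior()                          # sample from the prior ...
        w = likelihood(bias, observations)      # ... and weight by the likelihood
        total_w += w
        total_wx += w * bias
    return total_wx / total_w

obs = [True, True, True, False]                 # 3 heads, 1 tail
print(importance_posterior_mean(obs))           # close to 2/3, the exact posterior mean
```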
|
Gilray, Thomas |
![]() Phúc C. Nguyễn, Thomas Gilray, Sam Tobin-Hochstadt, and David Van Horn (University of Maryland, USA; Indiana University, USA) Software contracts allow programmers to state rich program properties using the full expressive power of an object language. However, since they are enforced at runtime, monitoring contracts imposes significant overhead and delays error discovery. So contract verification aims to guarantee all or most of these properties ahead of time, enabling valuable optimizations and yielding a more general assurance of correctness. Existing methods for static contract verification satisfy the needs of more restricted target languages, but fail to address the challenges unique to those conjoining untyped, dynamic programming, higher-order functions, modularity, and statefulness. Our approach tackles all these features at once, in the context of the full Racket system—a mature environment for stateful, higher-order, multi-paradigm programming with or without types. Evaluating our method using a set of both pure and stateful benchmarks, we are able to verify 99.94% of checks statically (all but 28 of 49,861). Stateful, higher-order functions pose significant challenges for static contract verification in particular. In the presence of these features, a modular analysis must permit code from the current module to escape permanently to an opaque context (unspecified code from outside the current module) that may be stateful and therefore store a reference to the escaped closure. Also, contracts themselves, being predicates written in unrestricted Racket, may exhibit stateful behavior; a sound approach must be robust to contracts which are arbitrarily expressive and interwoven with the code they monitor. In this paper, we present and evaluate our solution based on higher-order symbolic execution, explain the techniques we used to address such thorny issues, formalize a notion of behavioral approximation, and use it to provide a mechanized proof of soundness. ![]() ![]() ![]() ![]() |
|
Goel, Aviral |
![]() Olivier Flückiger (Northeastern University, USA; Inria, France; Czech Technical University, Czechia) High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition. ![]() ![]() |
|
Golan-Gueta, Guy |
![]() Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar (Tel Aviv University, Israel; VMware, USA; Stanford University, USA) Callbacks are essential in many programming environments, but drastically complicate program understanding and reasoning because they allow external objects to mutate an object's local state in unexpected ways, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum employed callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and statically checking if an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contract executions, excluding those of the DAO or of contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state. ![]() ![]() ![]() |
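The modularity failure behind DAO-style bugs can be sketched in a few lines of Python (hypothetical names; Ethereum contracts are of course not Python): a withdraw operation invokes an external callback before committing its own state update, so a reentrant callback observes a stale balance. Reordering the update before the callback makes the trace equivalent to one without callbacks, i.e. effectively callback free.

```python
class Bank:
    def __init__(self, balance):
        self.balance = balance

    def withdraw_buggy(self, amount, notify):
        if self.balance >= amount:
            notify(amount)              # external callback runs while state is stale
            self.balance -= amount

    def withdraw_ecf(self, amount, notify):
        if self.balance >= amount:
            self.balance -= amount      # commit local state first
            notify(amount)              # callback can no longer observe stale state

def attacker(bank, method):
    stolen = [0]
    def reenter(amount):
        stolen[0] += amount
        if bank.balance >= amount:      # re-enter while the first call is unfinished
            getattr(bank, method)(amount, lambda a: stolen.__setitem__(0, stolen[0] + a))
    getattr(bank, method)(100, reenter)
    return stolen[0], bank.balance

print(attacker(Bank(100), "withdraw_buggy"))  # (200, -100): drains more than allowed
print(attacker(Bank(100), "withdraw_ecf"))    # (100, 0): a single withdrawal, as intended
```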
|
Grégoire, Benjamin |
![]() Gilles Barthe (IMDEA Software Institute, Spain; UPMC, France; Inria, France; University College London, UK; École Polytechnique, France) Program sensitivity, also known as Lipschitz continuity, describes how small changes in a program’s input lead to bounded changes in the output. We propose an average notion of program sensitivity for probabilistic programs—expected sensitivity—that averages a distance function over a probabilistic coupling of two output distributions from two similar inputs. By varying the distance, expected sensitivity recovers useful notions of probabilistic function sensitivity, including stability of machine learning algorithms and convergence of Markov chains. Furthermore, expected sensitivity satisfies clean compositional properties and is amenable to formal verification. We develop a relational program logic called EpRHL for proving expected sensitivity properties. Our logic features two key ideas. First, relational pre-conditions and post-conditions are expressed using distances, a real-valued generalization of typical boolean-valued (relational) assertions. Second, judgments are interpreted in terms of expectation coupling, a novel, quantitative generalization of probabilistic couplings which supports compositional reasoning. We demonstrate our logic on examples beyond the reach of prior relational logics. Our main example formalizes uniform stability of the stochastic gradient method. Furthermore, we prove rapid mixing for a probabilistic model of population dynamics. We also extend our logic with a transitivity principle for expectation couplings to capture the path coupling proof technique by Bubley and Dyer, and formalize rapid mixing of the Glauber dynamics from statistical physics. ![]() ![]() ![]() |
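The central notion can be paraphrased compactly (a hedged LaTeX rendering; the paper's notation may differ): a program is expected f-sensitive when the outputs from any two related inputs can be coupled so that the expected output distance is bounded by f applied to the input distance.

```latex
% Hedged paraphrase: c is expected f-sensitive w.r.t. distances d_in, d_out
% if, writing [[c]](x) for the output distribution of c on input x and
% Gamma(mu_1, mu_2) for the set of probabilistic couplings of mu_1 and mu_2:
\forall x_1, x_2.\;\;
  \exists\, \mu \in \Gamma\big([\![c]\!](x_1),\, [\![c]\!](x_2)\big).\;\;
  \mathbb{E}_{(y_1,y_2) \sim \mu}\!\left[ d_{\mathrm{out}}(y_1, y_2) \right]
  \;\le\; f\big( d_{\mathrm{in}}(x_1, x_2) \big)
```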
|
Grellois, Charles |
![]() Pierre Clairambault, Charles Grellois, and Andrzej S. Murawski (University of Lyon, France; CNRS, France; ENS Lyon, France; Claude Bernard University Lyon 1, France; LIP, France; Inria, France; Aix-Marseille University, France; ENSAM, France; University of Toulon, France; University of Oxford, UK) Higher-order recursion schemes (HORS) have recently emerged as a promising foundation for higher-order program verification. We examine the impact of enriching HORS with linear types. To that end, we introduce two frameworks that blend non-linear and linear types: a variant of the λY -calculus and an extension of HORS, called linear HORS (LHORS). First we prove that the two formalisms are equivalent and there exist polynomial-time translations between them. Then, in order to support model-checking of (trees generated by) LHORS, we propose a refined version of alternating parity tree automata, called LNAPTA, whose behaviour depends on information about linearity. We show that the complexity of LNAPTA model-checking for LHORS depends on two type-theoretic parameters: linear order and linear depth. The former is in general smaller than the standard notion of order and ignores linear function spaces. In contrast, the latter measures the depth of linear clusters inside a type. Our main result states that LNAPTA model-checking of LHORS of linear order n is n-EXPTIME-complete, when linear depth is fixed. This generalizes and improves upon the classic result of Ong, which relies on the standard notion of order. To illustrate the significance of the result, we consider two applications: the MSO model-checking problem on variants of HORS with case distinction (RSFD and HORSC) on a finite domain and a call-by-value resource verification problem. In both cases, decidability can be established by translation into HORS, but the implied complexity bounds will be suboptimal due to increases in type order. In contrast, we show that the complexity bounds derived by translations into LHORS and appealing to our result are optimal in that they match the respective hardness results. ![]() ![]() |
|
Grossman, Shelly |
![]() Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar (Tel Aviv University, Israel; VMware, USA; Stanford University, USA) Callbacks are essential in many programming environments, but drastically complicate program understanding and reasoning because they allow external objects to mutate an object's local state in unexpected ways, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum employed callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and statically checking if an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contract executions, excluding those of the DAO or of contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state. ![]() ![]() ![]() |
|
Hague, Matthew |
![]() Taolue Chen, Yan Chen, Matthew Hague (University of London, UK; Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; Royal Holloway University of London, UK; University of Oxford, UK) The theory of strings with concatenation has been widely argued as the basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in practice; for example, the use of regular constraints (pattern matching against a regular expression), and the string-replace function (replacing either the first occurrence or all occurrences of a ``pattern'' string constant/variable/regular expression by a ``replacement'' string constant/variable), among many others. Both regular constraints and the string-replace function are crucial for such applications as analysis of JavaScript (or more generally HTML5 applications) against cross-site scripting (XSS) vulnerabilities, which motivates us to consider a richer class of string constraints. The importance of the string-replace function (especially the replace-all facility) is increasingly recognised, which can be witnessed by the incorporation of the function in the input languages of several string constraint solvers. Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version where pattern/replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. Despite this, the straight-line restriction is still practically sensible since this condition is typically met by string constraints that are generated by symbolic execution. In this paper, we provide the first systematic study of straight-line string constraints with the string-replace function and the regular constraints as the basic operations. We show that a large class of such constraints (i.e. when only a constant string or a regular expression is permitted in the pattern) is decidable. We note that the string-replace function, even under this restriction, is sufficiently powerful for expressing the concatenation operator and much more (e.g. extensions of regular expressions with string variables). This gives us the most expressive decidable logic containing concatenation, replace, and regular constraints under the same umbrella. Our decision procedure for the straight-line fragment follows an automata-theoretic approach, and is modular in the sense that the string-replace terms are removed one by one to generate more and more regular constraints, which can then be discharged by the state-of-the-art string constraint solvers. We also show that this fragment is, in a way, a maximal decidable subclass of the straight-line fragment with string-replace and regular constraints. To this end, we show undecidability results for the following two extensions: (1) variables are permitted in the pattern parameter of the replace function, (2) length constraints are permitted. ![]() ![]() |
|
Heunen, Chris |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
|
Hoenicke, Jochen |
![]() Oded Padon, Jochen Hoenicke, Giuliano Losa, Andreas Podelski (Tel Aviv University, Israel; University of Freiburg, Germany; University of California at Los Angeles, USA) We develop a new technique for verifying temporal properties of infinite-state (distributed) systems. The main idea is to reduce the temporal verification problem to the problem of verifying the safety of infinite-state systems expressed in first-order logic. This allows us to leverage existing techniques for safety verification to verify temporal properties of interesting distributed protocols, including some that have not been mechanically verified before. We model infinite-state systems using first-order logic, and use first-order temporal logic (FO-LTL) to specify temporal properties. This general formalism allows us to naturally model distributed systems, while supporting both unbounded parallelism (where the system is allowed to dynamically create processes) and infinite state per process. The traditional approach for verifying temporal properties of infinite-state systems employs well-founded relations (e.g. using linear arithmetic ranking functions). In contrast, our approach is based on the idea of fair cycle detection. In finite-state systems, temporal verification can always be reduced to fair cycle detection (a system contains a fair cycle if it revisits a state after satisfying all fairness constraints). However, with both infinitely many states and infinitely many fairness constraints, a straightforward reduction to fair cycle detection is unsound. To regain soundness, we augment the infinite-state transition system by a dynamically computed finite set that exploits the locality of transitions. This set lets us define a form of fair cycle detection that is sound in the presence of both infinitely many states and infinitely many fairness constraints. Our approach allows a new style of temporal verification that does not explicitly involve ranking functions. This fits well with pure first-order verification which does not explicitly reason about numerical values. In particular, it can be used with effectively propositional first-order logic (EPR), in which case checking verification conditions is decidable. We applied our technique to verify temporal properties of several interesting protocols. To the best of our knowledge, we have obtained the first mechanized liveness proof for both TLB Shootdown and Stoppable Paxos. ![]() ![]() ![]() ![]() |
|
Holík, Lukáš |
![]() Lukáš Holík (Brno University of Technology, Czechia; University of Oxford, UK; Uppsala University, Sweden) String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting, and automatic test-case generation. A popular string analysis technique includes symbolic executions, which at their core use constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature. ![]() ![]() ![]() ![]() |
|
Hriţcu, Cătălin |
![]() Danel Ahman, Cédric Fournet (Inria, France; Microsoft Research, UK; ENS Paris, France; Microsoft Research, India; Microsoft Research, USA) We provide a way to ease the verification of programs whose state evolves monotonically. The main idea is that a property witnessed in a prior state can be soundly recalled in the current state, provided (1) state evolves according to a given preorder, and (2) the property is preserved by this preorder. In many scenarios, such monotonic reasoning yields concise modular proofs, saving the need for explicit program invariants. We distill our approach into the monotonic-state monad, a general yet compact interface for Hoare-style reasoning about monotonic state in a dependently typed language. We prove the soundness of the monotonic-state monad and use it as a unified foundation for reasoning about monotonic state in the F⋆ verification system. Based on this foundation, we build libraries for various mutable data structures like monotonic references and apply these libraries at scale to the verification of several distributed applications. ![]() ![]() ![]() ![]() |
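A toy Python rendering of the witness/recall idea (hypothetical API, not F*'s monotonic-state monad): the state may evolve only along a fixed preorder, so a property that is preserved by that preorder can be witnessed once and recalled later without carrying an explicit invariant.

```python
class MonotonicCounter:
    """State evolves only upward (the preorder is <= on the stored value)."""
    def __init__(self, value=0):
        self._value = value
        self._witnessed = []            # stable (upward-closed) predicates seen so far

    def increase(self, delta):
        assert delta >= 0               # evolution must respect the preorder
        self._value += delta

    def witness(self, pred):
        """Record pred, provided it holds now and is preserved by any increase."""
        assert pred(self._value)
        self._witnessed.append(pred)

    def recall(self, pred):
        """Soundly reuse a previously witnessed stable property."""
        assert pred in self._witnessed
        assert pred(self._value)        # holds again, with no extra invariant needed

at_least_5 = lambda v: v >= 5

c = MonotonicCounter()
c.increase(7)
c.witness(at_least_5)   # true now, and stable under further increases
c.increase(100)         # arbitrary later evolution
c.recall(at_least_5)    # still true, "for free"
```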
Hsu, Justin |
![]() Aws Albarghouthi (University of Wisconsin-Madison, USA; University College London, UK) Differential privacy has emerged as a promising probabilistic formulation of privacy, generating intense interest within academia and industry. We present a push-button, automated technique for verifying ε-differential privacy of sophisticated randomized algorithms. We make several conceptual, algorithmic, and practical contributions: (i) Inspired by the recent advances on approximate couplings and randomness alignment, we present a new proof technique called coupling strategies, which casts differential privacy proofs as a winning strategy in a game where we have finite privacy resources to expend. (ii) To discover a winning strategy, we present a constraint-based formulation of the problem as a set of Horn modulo couplings (HMC) constraints, a novel combination of first-order Horn clauses and probabilistic constraints. (iii) We present a technique for solving HMC constraints by transforming probabilistic constraints into logical constraints with uninterpreted functions. (iv) Finally, we implement our technique in the FairSquare verifier and provide the first automated privacy proofs for a number of challenging algorithms from the differential privacy literature, including Report Noisy Max, the Exponential Mechanism, and the Sparse Vector Mechanism. ![]() ![]() ![]() Gilles Barthe (IMDEA Software Institute, Spain; UPMC, France; Inria, France; University College London, UK; École Polytechnique, France) Program sensitivity, also known as Lipschitz continuity, describes how small changes in a program’s input lead to bounded changes in the output. We propose an average notion of program sensitivity for probabilistic programs—expected sensitivity—that averages a distance function over a probabilistic coupling of two output distributions from two similar inputs. By varying the distance, expected sensitivity recovers useful notions of probabilistic function sensitivity, including stability of machine learning algorithms and convergence of Markov chains. Furthermore, expected sensitivity satisfies clean compositional properties and is amenable to formal verification. We develop a relational program logic called EpRHL for proving expected sensitivity properties. Our logic features two key ideas. First, relational pre-conditions and post-conditions are expressed using distances, a real-valued generalization of typical boolean-valued (relational) assertions. Second, judgments are interpreted in terms of expectation coupling, a novel, quantitative generalization of probabilistic couplings which supports compositional reasoning. We demonstrate our logic on examples beyond the reach of prior relational logics. Our main example formalizes uniform stability of the stochastic gradient method. Furthermore, we prove rapid mixing for a probabilistic model of population dynamics. We also extend our logic with a transitivity principle for expectation couplings to capture the path coupling proof technique by Bubley and Dyer, and formalize rapid mixing of the Glauber dynamics from statistical physics. ![]() ![]() ![]() |
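The first abstract above cites Report Noisy Max among the verified algorithms; for reference, here is the textbook mechanism in a few lines of Python (the standard presentation with Laplace noise of scale 2/ε for sensitivity-1 counting queries, not the authors' constraint-based encoding).

```python
import random

def laplace(scale):
    # The difference of two independent exponentials is Laplace-distributed.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def report_noisy_max(counts, epsilon):
    """Add Laplace(2/epsilon) noise to every count and release only the argmax."""
    noisy = [c + laplace(2.0 / epsilon) for c in counts]
    return max(range(len(counts)), key=noisy.__getitem__)

print(report_noisy_max([10, 7, 12, 12], epsilon=0.5))
```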
|
Hu, Zhenjiang |
![]() Hsiang-Shang Ko and Zhenjiang Hu (National Institute of Informatics, Japan) Among the frameworks of bidirectional transformations proposed for addressing various synchronisation (consistency maintenance) problems, Foster et al.’s [2007] asymmetric lenses have influenced the design of a generation of bidirectional programming languages. Most of these languages are based on a declarative programming model, and only allow the programmer to describe a consistency specification with ad hoc and/or awkward control over the consistency restoration behaviour. However, synchronisation problems are diverse and require vastly different consistency restoration strategies, and to cope with the diversity, the programmer must have the ability to fully control and reason about the consistency restoration behaviour. The putback-based approach to bidirectional programming aims to provide exactly this ability, and this paper strengthens the putback-based position by proposing the first fully fledged reasoning framework for a bidirectional language — a Hoare-style logic for Ko et al.’s [2016] putback-based language BiGUL. The Hoare-style logic lets the BiGUL programmer precisely characterise the bidirectional behaviour of their programs by reasoning solely in the putback direction, thereby offering a unidirectional programming abstraction that is reasonably straightforward to work with and yet provides full control not achieved by previous approaches. The theory has been formalised and checked in Agda, but this paper presents the Hoare-style logic in a semi-formal way to make it easily understood and usable by the working BiGUL programmer. ![]() ![]() ![]() ![]() |
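The putback-based discipline can be illustrated with a toy Python lens (an invented example, unrelated to BiGUL's actual syntax): the programmer writes only put, the forward direction is whatever makes the round-tripping laws hold, and the assertions below check those laws on sample data.

```python
# Source: a record; view: just its "name" field.
def put(source, view):
    """Putback direction, written by the programmer: embed the view into the source."""
    updated = dict(source)
    updated["name"] = view
    return updated

def get(source):
    """Forward direction, determined by put (here simply projecting the field)."""
    return source["name"]

src = {"name": "alice", "age": 30}

# PutGet: what we put is what we get back.
assert get(put(src, "bob")) == "bob"

# GetPut: putting back the current view changes nothing.
assert put(src, get(src)) == src
print("lens laws hold on the sample data")
```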
|
Inala, Jeevana Priya |
![]() Jeevana Priya Inala and Rishabh Singh (Massachusetts Institute of Technology, USA; Microsoft Research, USA) Data integration between web sources and relational data is a key challenge faced by data scientists and spreadsheet users. There are two main challenges in programmatically joining web data with relational data. First, most websites do not expose a direct interface to obtain tabular data, so the user needs to formulate a logic to get to different webpages for each input row in the relational table. Second, after reaching the desired webpage, the user needs to write complex scripts to extract the relevant data, which is often conditioned on the input data. Since many data scientists and end-users come from diverse backgrounds, writing such complex regular-expression based logical scripts to perform data integration tasks is unfortunately often beyond their programming expertise. We present WebRelate, a system that allows users to join semi-structured web data with relational data in spreadsheets using input-output examples. WebRelate decomposes the web data integration task into two sub-tasks of i) URL learning and ii) input-dependent web extraction. We introduce a novel synthesis paradigm called "Output-constrained Programming By Examples", which allows us to use the finite set of possible outputs for the new inputs to efficiently constrain the search in the synthesis algorithm. We instantiate this paradigm for the two sub-tasks in WebRelate. The first sub-task generates the URLs for the webpages containing the desired data for all rows in the relational table. WebRelate achieves this by learning a string transformation program using a few example URLs. The second sub-task uses examples of desired data to be extracted from the corresponding webpages and learns a program to extract the data for the other rows. We design expressive domain-specific languages for URL generation and web data extraction, and present efficient synthesis algorithms for learning programs in these DSLs from few input-output examples. We evaluate WebRelate on 88 real-world web data integration tasks taken from online help forums and Excel product team, and show that WebRelate can learn the desired programs within few seconds using only 1 example for the majority of the tasks. ![]() ![]() ![]() |
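A minimal sketch of the programming-by-examples flavour involved (a hypothetical, drastically simplified DSL, not WebRelate's): enumerate URL-template programs over the input columns, keep those consistent with the single example URL, and run a surviving program on the remaining rows.

```python
from itertools import permutations

rows = [
    {"city": "Paris", "year": "2018"},
    {"city": "Lima",  "year": "2017"},
    {"city": "Oslo",  "year": "2016"},
]
# One input-output example: the URL the user fetched for the first row.
example = (rows[0], "http://example.com/weather/Paris/2018")

def programs(columns):
    """Tiny DSL: a URL is a constant prefix followed by column values joined by '/'."""
    prefixes = ["http://example.com/weather", "http://example.com"]
    for prefix in prefixes:
        for k in range(1, len(columns) + 1):
            for cols in permutations(columns, k):
                yield prefix, cols

def run(prog, row):
    prefix, cols = prog
    return "/".join([prefix] + [row[c] for c in cols])

consistent = [p for p in programs(["city", "year"]) if run(p, example[0]) == example[1]]
prog = consistent[0]
print([run(prog, r) for r in rows])   # generalizes the single example to every row
```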
|
Jagadeesan, Radha |
![]() Brijesh Dongol, Radha Jagadeesan, and James Riely (Brunel University London, UK; DePaul University, USA) The integration of transactions into hardware relaxed memory architectures is a topic of current research both in industry and academia. In this paper, we provide a general architectural framework for the introduction of transactions into models of relaxed memory in hardware, including the SC, TSO, ARMv8 and PPC models. Our framework incorporates flexible and expressive forms of transaction aborts and execution that have hitherto been in the realm of software transactional memory. In contrast to software transactional memory, we account for the characteristics of relaxed memory as a restricted form of distributed system, without a notion of global time. We prove abstraction theorems to demonstrate that the programmer API matches the intuitions and expectations about transactions. ![]() ![]() |
|
Jagannathan, Suresh |
![]() Gowtham Kaki, Kartik Nagar, Mahsa Najafzadeh, and Suresh Jagannathan (Purdue University, USA) Serializability is a well-understood correctness criterion that simplifies reasoning about the behavior of concurrent transactions by ensuring they are isolated from each other while they execute. However, enforcing serializable isolation comes at a steep cost in performance because it necessarily restricts opportunities to exploit concurrency even when such opportunities would not violate application-specific invariants. As a result, database systems in practice support, and often encourage, developers to implement transactions using weaker alternatives. These alternatives break the strong isolation guarantees offered by serializable transactions to permit greater concurrency. Unfortunately, the semantics of weak isolation is poorly understood, and usually explained only informally in terms of low-level implementation artifacts. Consequently, verifying high-level correctness properties in such environments remains a challenging problem. To address this issue, we present a novel program logic that enables compositional reasoning about the behavior of concurrently executing weakly-isolated transactions. Recognizing that the proof burden necessary to use this logic may dissuade application developers, we also describe an inference procedure based on this foundation that ascertains the weakest isolation level that still guarantees the safety of high-level consistency assertions associated with such transactions. The key to effective inference is the observation that weakly-isolated transactions can be viewed as functional (monadic) computations over an abstract database state, allowing us to treat their operations as state transformers over the database. This interpretation enables automated verification using off-the-shelf SMT solvers. Our development is parametric over a transaction’s specific isolation semantics, allowing it to be applicable over a range of concurrency control mechanisms. Case studies and experiments on real-world applications (written in an embedded DSL in OCaml) demonstrate the utility of our approach, and provide strong evidence that automated verification of weakly-isolated transactions can be placed on the same formal footing as their strongly-isolated serializable counterparts. ![]() ![]() |
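The kind of anomaly the logic is meant to rule out can be reproduced schematically in Python (an informal model of isolation, not the paper's semantics): two withdrawals, modelled as state transformers, each pass their local check against a stale snapshot, so under weak isolation both succeed and 160 is paid out of a balance of 100.

```python
# A schematic database with one record.
db = {"balance": 100}

def withdraw(amount, snapshot):
    """A transaction modelled as a state transformer over the state it read."""
    if snapshot["balance"] >= amount:
        return {"balance": snapshot["balance"] - amount}
    return snapshot

# Serializable-like execution: the second transaction sees the first one's write.
s = dict(db)
s = withdraw(80, s)
s = withdraw(80, s)
print("serial:", s)    # {'balance': 20} -- the second withdrawal is refused

# Weakly isolated execution: both transactions read the same stale snapshot and
# the last writer wins (a "lost update"): only 80 is deducted for 160 dispensed.
snap1, snap2 = dict(db), dict(db)
w1 = withdraw(80, snap1)
w2 = withdraw(80, snap2)
print("weak:  ", w2)   # {'balance': 20}, even though both withdrawals succeeded
```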
|
Janků, Petr |
![]() Lukáš Holík (Brno University of Technology, Czechia; University of Oxford, UK; Uppsala University, Sweden) String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting, and automatic test-case generation. A popular string analysis technique includes symbolic executions, which at their core use constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature. ![]() ![]() ![]() ![]() |
|
Jhala, Ranjit |
![]() Niki Vazou, Anish Tondwalkar, Vikraman Choudhury (University of Maryland, USA; University of California at San Diego, USA; Indiana University, USA; University of Edinburgh, UK; Input Output HK, UK) We introduce Refinement Reflection, a new framework for building SMT-based deductive verifiers. The key idea is to reflect the code implementing a user-defined function into the function’s (output) refinement type. As a consequence, at uses of the function, the function definition is instantiated in the SMT logic in a precise fashion that permits decidable verification. Reflection allows the user to write equational proofs of programs just by writing other programs using pattern-matching and recursion to perform case-splitting and induction. Thus, via the propositions-as-types principle, we show that reflection permits the specification of arbitrary functional correctness properties. Finally, we introduce a proof-search algorithm called Proof by Logical Evaluation that uses techniques from model checking and abstract interpretation, to completely automate equational reasoning. We have implemented reflection in Liquid Haskell and used it to verify that the widely used instances of the Monoid, Applicative, Functor, and Monad typeclasses actually satisfy key algebraic laws required to make the clients safe, and have used reflection to build the first library that actually verifies assumptions about associativity and ordering that are crucial for safe deterministic parallelism. ![]() ![]() ![]() |
|
Jourdan, Jacques-Henri |
![]() Ralf Jung (MPI-SWS, Germany; Delft University of Technology, Netherlands) Rust is a new systems programming language that promises to overcome the seemingly fundamental tradeoff between high-level safety guarantees and low-level control over resource management. Unfortunately, none of Rust's safety claims have been formally proven, and there is good reason to question whether they actually hold. Specifically, Rust employs a strong, ownership-based type system, but then extends the expressive power of this core type system through libraries that internally use unsafe features. In this paper, we give the first formal (and machine-checked) safety proof for a language representing a realistic subset of Rust. Our proof is extensible in the sense that, for each new Rust library that uses unsafe features, we can say what verification condition it must satisfy in order for it to be deemed a safe extension to the language. We have carried out this verification for some of the most important libraries that are used throughout the Rust ecosystem. ![]() ![]() ![]() ![]() |
Jung, Ralf |
![]() Ralf Jung (MPI-SWS, Germany; Delft University of Technology, Netherlands) Rust is a new systems programming language that promises to overcome the seemingly fundamental tradeoff between high-level safety guarantees and low-level control over resource management. Unfortunately, none of Rust's safety claims have been formally proven, and there is good reason to question whether they actually hold. Specifically, Rust employs a strong, ownership-based type system, but then extends the expressive power of this core type system through libraries that internally use unsafe features. In this paper, we give the first formal (and machine-checked) safety proof for a language representing a realistic subset of Rust. Our proof is extensible in the sense that, for each new Rust library that uses unsafe features, we can say what verification condition it must satisfy in order for it to be deemed a safe extension to the language. We have carried out this verification for some of the most important libraries that are used throughout the Rust ecosystem. ![]() ![]() ![]() ![]() |
Kaki, Gowtham |
![]() Gowtham Kaki, Kartik Nagar, Mahsa Najafzadeh, and Suresh Jagannathan (Purdue University, USA) Serializability is a well-understood correctness criterion that simplifies reasoning about the behavior of concurrent transactions by ensuring they are isolated from each other while they execute. However, enforcing serializable isolation comes at a steep cost in performance because it necessarily restricts opportunities to exploit concurrency even when such opportunities would not violate application-specific invariants. As a result, database systems in practice support, and often encourage, developers to implement transactions using weaker alternatives. These alternatives break the strong isolation guarantees offered by serializable transactions to permit greater concurrency. Unfortunately, the semantics of weak isolation is poorly understood, and usually explained only informally in terms of low-level implementation artifacts. Consequently, verifying high-level correctness properties in such environments remains a challenging problem. To address this issue, we present a novel program logic that enables compositional reasoning about the behavior of concurrently executing weakly-isolated transactions. Recognizing that the proof burden necessary to use this logic may dissuade application developers, we also describe an inference procedure based on this foundation that ascertains the weakest isolation level that still guarantees the safety of high-level consistency assertions associated with such transactions. The key to effective inference is the observation that weakly-isolated transactions can be viewed as functional (monadic) computations over an abstract database state, allowing us to treat their operations as state transformers over the database. This interpretation enables automated verification using off-the-shelf SMT solvers. Our development is parametric over a transaction’s specific isolation semantics, allowing it to be applicable over a range of concurrency control mechanisms. Case studies and experiments on real-world applications (written in an embedded DSL in OCaml) demonstrate the utility of our approach, and provide strong evidence that automated verification of weakly-isolated transactions can be placed on the same formal footing as their strongly-isolated serializable counterparts. ![]() ![]() |
|
Kaminski, Benjamin Lucien |
![]() Annabelle McIver, Carroll Morgan, Benjamin Lucien Kaminski, and Joost-Pieter Katoen (Macquarie University, Australia; UNSW, Australia; Data61 at CSIRO, Australia; RWTH Aachen University, Germany; University College London, UK; IST Austria, Austria) We present a new proof rule for proving almost-sure termination of probabilistic programs, including those that contain demonic non-determinism. An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so. It applies directly to the program's source code, even if the program contains demonic choice. Like others, we use variant functions (a.k.a. "super-martingales") that are real-valued and decrease randomly on each loop iteration; but our key innovation is that the amount as well as the probability of the decrease are parametric. We prove the soundness of the new rule, indicate where its applicability goes beyond existing rules, and explain its connection to classical results on denumerable (non-demonic) Markov chains. ![]() ![]() |
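As a purely empirical illustration of the property being certified (a Python simulation, not the proof rule itself): the symmetric random walk below has infinite expected running time, yet it reaches 0 with probability 1, which is exactly the almost-sure termination distinction the rule addresses.

```python
import random

def walk(start=5, cap=10**5):
    """Symmetric random walk on the non-negative integers, absorbed at 0."""
    x, steps = start, 0
    while x > 0 and steps < cap:
        x += random.choice((-1, 1))
        steps += 1
    return x == 0

runs = 500
terminated = sum(walk() for _ in range(runs))
print(f"{terminated}/{runs} runs hit 0 within the step cap")
```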
|
Kammar, Ohad |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
|
Katoen, Joost-Pieter |
![]() Annabelle McIver, Carroll Morgan, Benjamin Lucien Kaminski, and Joost-Pieter Katoen (Macquarie University, Australia; UNSW, Australia; Data61 at CSIRO, Australia; RWTH Aachen University, Germany; University College London, UK; IST Austria, Austria) We present a new proof rule for proving almost-sure termination of probabilistic programs, including those that contain demonic non-determinism. An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so. It applies directly to the program's source code, even if the program contains demonic choice. Like others, we use variant functions (a.k.a. "super-martingales") that are real-valued and decrease randomly on each loop iteration; but our key innovation is that the amount as well as the probability of the decrease are parametric. We prove the soundness of the new rule, indicate where its applicability goes beyond existing rules, and explain its connection to classical results on denumerable (non-demonic) Markov chains. ![]() ![]() |
|
Khoo, Siau-Cheng |
![]() Quang-Trung Ta, Ton Chanh Le, Siau-Cheng Khoo, and Wei-Ngan Chin (National University of Singapore, Singapore) The symbolic-heap fragment of separation logic has been actively developed and advocated for verifying the memory-safety property of computer programs. At present, one of its biggest challenges is to effectively prove entailments containing inductive heap predicates. These entailments are usually proof obligations generated when verifying programs that manipulate complex data structures like linked lists, trees, or graphs. To assist in proving such entailments, this paper introduces a lemma synthesis framework, which automatically discovers lemmas to serve as eureka steps in the proofs. Mathematical induction and template-based constraint solving are two pillars of our framework. To derive the supporting lemmas for a given entailment, the framework firstly identifies possible lemma templates from the entailment's heap structure. It then sets up unknown relations among each template's variables and conducts structural induction proof to generate constraints about these relations. Finally, it solves the constraints to find out actual definitions of the unknown relations, thus discovers the lemmas. We have integrated this framework into a prototype prover and have experimented it on various entailment benchmarks. The experimental results show that our lemma-synthesis-assisted prover can prove many entailments that could not be handled by existing techniques. This new proposal opens up more opportunities to automatically reason with complex inductive heap predicates. ![]() ![]() ![]() ![]() |
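Typical examples of the "eureka" lemmas such provers need are the classic list-segment composition lemmas below (generic illustrations from the separation-logic literature, stated in LaTeX; they are not necessarily among the paper's benchmarks). Neither follows by simply unfolding the inductive definition of ls.

```latex
% Two classic list-segment lemmas (ls(x,y): acyclic segment from x to y)
% that plain unfolding of the inductive definition cannot derive:
\forall x\, y.\quad
  \mathit{ls}(x, y) \ast \mathit{ls}(y, \mathit{nil}) \;\vdash\; \mathit{ls}(x, \mathit{nil})

\forall x\, y\, z\, w.\quad
  \mathit{ls}(x, y) \ast \mathit{ls}(y, z) \ast z \mapsto w \;\vdash\; \mathit{ls}(x, z) \ast z \mapsto w
```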
|
Kincaid, Zachary |
![]() Zachary Kincaid (Princeton University, USA; University of Wisconsin-Madison, USA; GrammaTech, USA) Automatic generation of non-linear loop invariants is a long-standing challenge in program analysis, with many applications. For instance, reasoning about exponentials provides a way to find invariants of digital-filter programs, and reasoning about polynomials and/or logarithms is needed for establishing invariants that describe the time or memory usage of many well-known algorithms. An appealing approach to this challenge is to exploit the powerful recurrence-solving techniques that have been developed in the field of computer algebra, which can compute exact characterizations of non-linear repetitive behavior. However, there is a gap between the capabilities of recurrence solvers and the needs of program analysis: (1) loop bodies are not merely systems of recurrence relations---they may contain conditional branches, nested loops, non-deterministic assignments, etc., and (2) a client program analyzer must be able to reason about the closed-form solutions produced by a recurrence solver (e.g., to prove assertions). This paper presents a method for generating non-linear invariants of general loops based on analyzing recurrence relations. The key components are an abstract domain for reasoning about non-linear arithmetic, a semantics-based method for extracting recurrence relations from loop bodies, and a recurrence solver that avoids closed forms that involve complex or irrational numbers. Our technique has been implemented in a program analyzer that can analyze general loops and mutually recursive procedures. Our experiments show that our technique shows promise for non-linear assertion-checking and resource-bound generation. ![]() ![]() ![]() ![]() Azadeh Farzan (University of Toronto, Canada; Princeton University, USA) Many problems in formal methods can be formalized as two-player games. For several applications—program synthesis, for example—in addition to determining which player wins the game, we are interested in computing a winning strategy for that player. This paper studies the strategy synthesis problem for games defined within the theory of linear rational arithmetic. Two types of games are considered. A satisfiability game, described by a quantified formula, is played by two players that take turns instantiating quantifiers. The objective of each player is to prove (or disprove) satisfiability of the formula. A reachability game, described by a pair of formulas defining the legal moves of each player, is played by two players that take turns choosing positions—rational vectors of some fixed dimension. The objective of each player is to reach a position where the opposing player has no legal moves (or to play the game forever). We give a complete algorithm for synthesizing winning strategies for satisfiability games and a sound (but necessarily incomplete) algorithm for synthesizing winning strategies for reachability games. ![]() ![]() ![]() |
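To illustrate the recurrence-based view of a loop (a self-contained Python check, not the paper's abstract domain): the loop body x = 2*x + 1 induces the recurrence x_{k+1} = 2*x_k + 1 with closed form x_k = 2^k*(x_0 + 1) - 1, and closed forms of this shape are precisely the non-linear facts the analysis makes available to an assertion checker.

```python
def closed_form(x0, k):
    # Candidate closed-form solution of the recurrence x_{k+1} = 2*x_k + 1.
    return 2 ** k * (x0 + 1) - 1

def run_loop(x0, k):
    x = x0
    for _ in range(k):
        x = 2 * x + 1
    return x

# Cross-check the closed form against direct execution of the loop.
for x0 in range(-3, 4):
    for k in range(0, 12):
        assert run_loop(x0, k) == closed_form(x0, k)
print("closed form matches the loop on all sampled inputs")
```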
|
Knauth, Alex |
![]() Stephen Chang, Alex Knauth, and Emina Torlak (Northeastern University, USA; University of Washington, USA) We present lambda_sym, a typed λ-calculus for lenient symbolic execution, where some language constructs do not recognize symbolic values. Its type system, however, ensures safe behavior of all symbolic values in a program. Our calculus extends a base occurrence typing system with symbolic types and mutable state, making it a suitable model for both functional and imperative symbolically executed languages. Naively allowing mutation in this mixed setting introduces soundness issues, however, so we further add concreteness polymorphism, which restores soundness without rejecting too many valid programs. To show that our calculus is a useful model for a real language, we implemented Typed Rosette, a typed extension of the solver-aided Rosette language. We evaluate Typed Rosette by porting a large code base, demonstrating that our type system accommodates a wide variety of symbolically executed programs. ![]() ![]() ![]() |
|
Ko, Hsiang-Shang |
![]() Hsiang-Shang Ko and Zhenjiang Hu (National Institute of Informatics, Japan) Among the frameworks of bidirectional transformations proposed for addressing various synchronisation (consistency maintenance) problems, Foster et al.’s [2007] asymmetric lenses have influenced the design of a generation of bidirectional programming languages. Most of these languages are based on a declarative programming model, and only allow the programmer to describe a consistency specification with ad hoc and/or awkward control over the consistency restoration behaviour. However, synchronisation problems are diverse and require vastly different consistency restoration strategies, and to cope with the diversity, the programmer must have the ability to fully control and reason about the consistency restoration behaviour. The putback-based approach to bidirectional programming aims to provide exactly this ability, and this paper strengthens the putback-based position by proposing the first fully fledged reasoning framework for a bidirectional language — a Hoare-style logic for Ko et al.’s [2016] putback-based language BiGUL. The Hoare-style logic lets the BiGUL programmer precisely characterise the bidirectional behaviour of their programs by reasoning solely in the putback direction, thereby offering a unidirectional programming abstraction that is reasonably straightforward to work with and yet provides full control not achieved by previous approaches. The theory has been formalised and checked in Agda, but this paper presents the Hoare-style logic in a semi-formal way to make it easily understood and usable by the working BiGUL programmer. ![]() ![]() ![]() ![]() |
|
Koch, Christoph E. |
![]() Lionel Parreaux (EPFL, Switzerland; University of Pennsylvania, USA) Metaprograms are programs that manipulate (generate, analyze and evaluate) other programs. These tasks are greatly facilitated by quasiquotation, a technique to construct and deconstruct program fragments using quoted code templates expressed in the syntax of the manipulated language. We argue that two main flavors of quasiquotes have existed so far: Lisp-style quasiquotes, which can both construct and deconstruct programs but may produce code that contains type mismatches and unbound variables; and MetaML-style quasiquotes, which rely on static typing to prevent these errors, but can only construct programs. In this paper, we show how to combine the advantages of both flavors into a unified framework: we allow the construction, deconstruction and evaluation of program fragments while ensuring that generated programs are well-typed and well-scoped, a combination unseen in previous work. We formalize our approach as λ{}, a multi-stage calculus with code pattern matching and rewriting, and prove its type safety. We also present its realization in Squid, a metaprogramming framework for Scala, leveraging Scala’s expressive type system. To demonstrate the usefulness of our approach, we introduce speculative rewrite rules, a novel code transformation technique that makes decisive use of these capabilities, and we outline how it simplifies the design of some crucial query compiler optimizations. ![]() ![]() ![]() |
|
Kokologiannakis, Michalis |
![]() Michalis Kokologiannakis, Ori Lahav (National Technical University of Athens, Greece; Tel Aviv University, Israel; Uppsala University, Sweden; MPI-SWS, Germany) We present a stateless model checking algorithm for verifying concurrent programs running under RC11, a repaired version of the C/C++11 memory model without dependency cycles. Unlike most previous approaches, which enumerate thread interleavings up to some partial order reduction improvements, our approach works directly on execution graphs and (in the absence of RMW instructions and SC atomics) avoids redundant exploration by construction. We have implemented a model checker, called RCMC, based on this approach and applied it to a number of challenging concurrent programs. Our experiments confirm that RCMC is significantly faster, scales better than other model checking tools, and is also more resilient to small changes in the benchmarks. ![]() ![]() ![]() ![]() |
|
Kraus, Nicolai |
![]() Paolo Capriotti and Nicolai Kraus (University of Nottingham, UK) Category theory in homotopy type theory is intricate as categorical laws can only be stated "up to homotopy", and thus require coherences. The established notion of a univalent category (as introduced by Ahrens et al.) solves this by considering only truncated types, roughly corresponding to an ordinary category. This fails to capture many naturally occurring structures, stemming from the fact that the naturally occurring structures in homotopy type theory are not ordinary, but rather higher categories. Out of the large variety of approaches to higher category theory that mathematicians have proposed, we believe that, for type theory, the simplicial strategy is best suited. Work by Lurie and Harpaz motivates the following definition. Given the first (n+3) levels of a semisimplicial type S, we can equip S with three properties: first, contractibility of the types of certain horn fillers; second, a completeness property; and third, a truncation condition. We call this a complete semi-Segal n-type. This is very similar to an earlier suggestion by Schreiber. The definition of a univalent (1-) category by Ahrens et al. can easily be extended or restricted to the definition of a univalent n-category (more precisely, (n,1)-category) for n ∈ {0,1,2}, and we show that the type of complete semi-Segal n-types is equivalent to the type of univalent n-categories in these cases. Thus, we believe that the notion of a complete semi-Segal n-type can be taken as the definition of a univalent n-category. We provide a formalisation in the proof assistant Agda using a completely explicit representation of semi-simplicial types for levels up to 4. ![]() ![]() ![]() ![]() |
Krebbers, Robbert |
![]() Ralf Jung (MPI-SWS, Germany; Delft University of Technology, Netherlands) Rust is a new systems programming language that promises to overcome the seemingly fundamental tradeoff between high-level safety guarantees and low-level control over resource management. Unfortunately, none of Rust's safety claims have been formally proven, and there is good reason to question whether they actually hold. Specifically, Rust employs a strong, ownership-based type system, but then extends the expressive power of this core type system through libraries that internally use unsafe features. In this paper, we give the first formal (and machine-checked) safety proof for a language representing a realistic subset of Rust. Our proof is extensible in the sense that, for each new Rust library that uses unsafe features, we can say what verification condition it must satisfy in order for it to be deemed a safe extension to the language. We have carried out this verification for some of the most important libraries that are used throughout the Rust ecosystem. ![]() ![]() ![]() ![]() ![]() Casper Bach Poulsen (Delft University of Technology, Netherlands; Portland State University, USA) A definitional interpreter defines the semantics of an object language in terms of the (well-known) semantics of a host language, enabling understanding and validation of the semantics through execution. Combining a definitional interpreter with a separate type system requires a separate type safety proof. An alternative approach, at least for pure object languages, is to use a dependently-typed language to encode the object language type system in the definition of the abstract syntax. Using such intrinsically-typed abstract syntax definitions allows the host language type checker to verify automatically that the interpreter satisfies type safety. Does this approach scale to larger and more realistic object languages, and in particular to languages with mutable state and objects? In this paper, we describe and demonstrate techniques and libraries in Agda that successfully scale up intrinsically-typed definitional interpreters to handle rich object languages with non-trivial binding structures and mutable state. While the resulting interpreters are certainly more complex than the simply-typed λ-calculus interpreter we start with, we claim that they still meet the goals of being concise, comprehensible, and executable, while guaranteeing type safety for more elaborate object languages. We make the following contributions: (1) A dependent-passing style technique for hiding the weakening of indexed values as they propagate through monadic code. (2) An Agda library for programming with scope graphs and frames, which provides a uniform approach to dealing with name binding in intrinsically-typed interpreters. (3) Case studies of intrinsically-typed definitional interpreters for the simply-typed λ-calculus with references (STLC+Ref) and for a large subset of Middleweight Java (MJ). ![]() ![]() ![]() |
|
Krishna, Siddharth |
![]() Siddharth Krishna, Dennis Shasha, and Thomas Wies (New York University, USA) Concurrent separation logics have helped to significantly simplify correctness proofs for concurrent data structures. However, a recurring problem in such proofs is that data structure abstractions that work well in the sequential setting are much harder to reason about in a concurrent setting due to complex sharing and overlays. To solve this problem, we propose a novel approach to abstracting regions in the heap by encoding the data structure invariant into a local condition on each individual node. This condition may depend on a quantity associated with the node that is computed as a fixpoint over the entire heap graph. We refer to this quantity as a flow. Flows can encode both structural properties of the heap (e.g. the reachable nodes from the root form a tree) as well as data invariants (e.g. sortedness). We then introduce the notion of a flow interface, which expresses the relies and guarantees that a heap region imposes on its context to maintain the local flow invariant with respect to the global heap. Our main technical result is that this notion leads to a new semantic model of separation logic. In this model, flow interfaces provide a general abstraction mechanism for describing complex data structures. This abstraction mechanism admits proof rules that generalize over a wide variety of data structures. To demonstrate the versatility of our approach, we show how to extend the logic RGSep with flow interfaces. We have used this new logic to prove linearizability and memory safety of nontrivial concurrent data structures. In particular, we obtain parametric linearizability proofs for concurrent dictionary algorithms that abstract from the details of the underlying data structure representation. These proofs cannot be easily expressed using the abstraction mechanisms provided by existing separation logics. ![]() ![]() |
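A small Python sketch of the flow idea (an invented encoding, not the paper's formal semantics): give the root an inflow of 1, propagate along edges, and take a fixpoint; the node-local condition "flow equals 1" then captures the global property that every node is reachable from the root along exactly one path.

```python
def compute_flow(nodes, edges, root, iterations=100):
    """Path-counting flow: inflow 1 at the root, summed over incoming edges.
    Naive Kleene iteration; assumes an acyclic heap graph for convergence."""
    flow = {n: 0 for n in nodes}
    for _ in range(iterations):
        new = {n: (1 if n == root else 0) for n in nodes}
        for (src, dst) in edges:
            new[dst] += flow[src]
        if new == flow:
            break
        flow = new
    return flow

def locally_tree_like(nodes, edges, root):
    flow = compute_flow(nodes, edges, root)
    return all(flow[n] == 1 for n in nodes)     # a purely node-local check

tree   = (["a", "b", "c"], [("a", "b"), ("a", "c")], "a")
shared = (["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c")], "a")
print(locally_tree_like(*tree))    # True
print(locally_tree_like(*shared))  # False: node "c" has flow 2 (it is shared)
```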
|
Krishnamoorthy, Sriram |
![]() Wenlei Bao, Sriram Krishnamoorthy, Louis-Noel Pouchet, and P. Sadayappan (Ohio State University, USA; Pacific Northwest National Laboratory, USA; Colorado State University, USA) Optimizing compilers implement program transformation strategies aimed at reducing data movement to or from main memory by exploiting the data-cache hierarchy. However, instead of attempting to minimize the number of cache misses, they rely on very approximate cost models, owing to the lack of precise compile-time models of misses for hierarchical caches. The current state of practice for cache miss analysis is based on accurate simulation. However, simulation requires time proportional to the dataset/problem size and to the number of distinct cache configurations of interest. This paper takes a fundamentally different approach, focusing on polyhedral programs with static control flow. Instead of relying on costly simulation, it develops a closed-form solution for modeling misses in a set-associative cache hierarchy. This solution can inform program transformation choices at compile time so as to optimize cache misses. A tool implementing the approach has been developed and used to validate the framework. ![]() ![]() |
|
Krogh-Jespersen, Morten |
![]() Amin Timany, Léo Stefanesco, Morten Krogh-Jespersen, and Lars Birkedal (KU Leuven, Belgium; IRIF, France; CNRS, France; University of Paris Diderot, France; Aarhus University, Denmark) We present a logical relations model of a higher-order functional programming language with impredicative polymorphism, recursive types, and a Haskell-style ST monad type with runST. We use our logical relations model to show that runST provides proper encapsulation of state, by showing that effectful computations encapsulated by runST are heap independent. Furthermore, we show that contextual refinements and equivalences that are expected to hold for pure computations do indeed hold in the presence of runST. This is the first time such relational results have been proven for a language with monadic encapsulation of state. We have formalized all the technical development and results in Coq. ![]() ![]() ![]() |
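The encapsulation property at issue above can be illustrated with a small, standard Haskell example (not taken from the paper): a function whose implementation uses local mutable state via runST yet is observationally pure, which is exactly the kind of computation the logical relations model shows to be heap independent.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- A pure function whose implementation uses local mutable state.
-- runST encapsulates the STRef: the reference cannot escape, so callers
-- may reason about sumSquares as if it were written without effects.
sumSquares :: [Int] -> Int
sumSquares xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x * x)) xs
  readSTRef acc

main :: IO ()
main = print (sumSquares [1 .. 10])   -- 385
```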
|
Kunčar, Ondřej |
![]() Ondřej Kunčar and Andrei Popescu (TU Munich, Germany; Middlesex University, UK; Institute of Mathematics at Romanian Academy, Romania) Definitions are traditionally considered to be a safe mechanism for introducing concepts on top of a logic known to be consistent. In contrast to arbitrary axioms, definitions should in principle be treatable as a form of abbreviation, and thus compiled away from the theory without losing provability. In particular, definitions should form a conservative extension of the pure logic. These properties are crucial for modern interactive theorem provers, since they ensure the consistency of the logic, as well as a valid environment for total/certified functional programming. We prove these properties, namely, safety and conservativity, for Higher-Order Logic (HOL), a logic implemented in several mainstream theorem provers and relied upon by thousands of users. Some unique features of HOL, such as the requirement to give non-emptiness proofs when defining new types and the impossibility to unfold type definitions, make the proof of these properties, and also the very formulation of safety, nontrivial. Our study also factors in the essential variation of HOL definitions featured by Isabelle/HOL, a popular member of the HOL-based provers family. The current work improves on recent results which showed a weaker property, consistency of Isabelle/HOL's definitions. ![]() ![]() ![]() |
|
Lahav, Ori |
![]() Michalis Kokologiannakis, Ori Lahav (National Technical University of Athens, Greece; Tel Aviv University, Israel; Uppsala University, Sweden; MPI-SWS, Germany) We present a stateless model checking algorithm for verifying concurrent programs running under RC11, a repaired version of the C/C++11 memory model without dependency cycles. Unlike most previous approaches, which enumerate thread interleavings up to some partial order reduction improvements, our approach works directly on execution graphs and (in the absence of RMW instructions and SC atomics) avoids redundant exploration by construction. We have implemented a model checker, called RCMC, based on this approach and applied it to a number of challenging concurrent programs. Our experiments confirm that RCMC is significantly faster, scales better than other model checking tools, and is also more resilient to small changes in the benchmarks. ![]() ![]() ![]() ![]() |
|
Lahiri, Shuvendu K. |
![]() Yuepeng Wang, Isil Dillig (University of Texas at Austin, USA; Microsoft Research, USA) This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the same externally observable behavior before and after schema migration, there are no existing tools for proving equivalence of such programs. This paper takes a first step towards solving this problem by formalizing the equivalence and refinement checking problems for database-driven applications. We also propose a proof methodology based on the notion of bisimulation invariants over relational algebra with updates and describe a technique for synthesizing such bisimulation invariants. We have implemented the proposed technique in a tool called Mediator for verifying equivalence between database-driven applications written in our intermediate language and evaluate our tool on 21 benchmarks extracted from textbooks and real-world web applications. Our results show that the proposed methodology can successfully verify 20 of these benchmarks. ![]() ![]() |
|
Lampropoulos, Leonidas |
![]() Leonidas Lampropoulos, Zoe Paraskevopoulou, and Benjamin C. Pierce (University of Pennsylvania, USA; Princeton University, USA) Property-based random testing (PBRT) is widely used in the functional programming and verification communities. For testing simple properties, PBRT tools such as QuickCheck can automatically generate random inputs of a given type. But for more complex properties, effective testing often demands generators for random inputs that belong to a given type and satisfy some logical condition. QuickCheck provides a library of combinators for building such generators by hand, but this can be tedious for simple conditions and error prone for more complex ones. Fortunately, the process can often be automated. The most prominent method, narrowing, works by traversing the structure of the condition, lazily instantiating parts of the data structure as constraints involving them are met. We show how to use ideas from narrowing to compile a large subclass of Coq's inductive relations into efficient generators, avoiding the interpretive overhead of previous implementations. More importantly, the same compilation technique allows us to produce proof terms certifying that each derived generator is good---i.e., sound and complete with respect to the inductive relation it was derived from. We implement our algorithm as an extension of QuickChick, an existing tool for property-based testing in Coq. We evaluate our method by automatically deriving good generators for the majority of the specifications in Software Foundations, a formalized textbook on programming language foundations. ![]() ![]() ![]() ![]() |
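The gap between generate-and-test and constrained generation that motivates this work can be sketched with ordinary Haskell QuickCheck (the names below are illustrative only; the paper derives the second kind of generator automatically from a Coq inductive relation, together with a proof that it is sound and complete):

```haskell
import Test.QuickCheck

-- The property "insert keeps a list sorted" only makes sense on inputs
-- that are already sorted.
isSorted :: [Int] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

insert :: Int -> [Int] -> [Int]
insert x [] = [x]
insert x (y:ys)
  | x <= y    = x : y : ys
  | otherwise = y : insert x ys

-- Generate-and-test: filter arbitrary lists through the condition.
-- Simple, but wasteful, and hopeless for sparse conditions.
sortedByFiltering :: Gen [Int]
sortedByFiltering = arbitrary `suchThat` isSorted

-- A hand-written generator that satisfies the condition by construction;
-- deriving generators of this kind automatically, with correctness
-- certificates, is what the paper automates.
sortedByConstruction :: Gen [Int]
sortedByConstruction = sized $ \n -> do
  start <- choose (-100, 100)
  steps <- vectorOf n (choose (0, 10))
  pure (scanl (+) start steps)

prop_insertKeepsSorted :: Int -> Property
prop_insertKeepsSorted x =
  forAll sortedByConstruction $ \xs -> isSorted (insert x xs)

main :: IO ()
main = quickCheck prop_insertKeepsSorted
```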
|
Le, Ton Chanh |
![]() Quang-Trung Ta, Ton Chanh Le, Siau-Cheng Khoo, and Wei-Ngan Chin (National University of Singapore, Singapore) The symbolic-heap fragment of separation logic has been actively developed and advocated for verifying the memory-safety property of computer programs. At present, one of its biggest challenges is to effectively prove entailments containing inductive heap predicates. These entailments are usually proof obligations generated when verifying programs that manipulate complex data structures like linked lists, trees, or graphs. To assist in proving such entailments, this paper introduces a lemma synthesis framework, which automatically discovers lemmas to serve as eureka steps in the proofs. Mathematical induction and template-based constraint solving are the two pillars of our framework. To derive the supporting lemmas for a given entailment, the framework first identifies possible lemma templates from the entailment's heap structure. It then sets up unknown relations among each template's variables and conducts a structural induction proof to generate constraints about these relations. Finally, it solves the constraints to find the actual definitions of the unknown relations, thereby discovering the lemmas. We have integrated this framework into a prototype prover and experimented with it on various entailment benchmarks. The experimental results show that our lemma-synthesis-assisted prover can prove many entailments that could not be handled by existing techniques. This new proposal opens up more opportunities to reason automatically about complex inductive heap predicates. ![]() ![]() ![]() ![]() |
|
Lee, Wonyeol |
![]() Wonyeol Lee (Stanford University, USA; Microsoft Research, India) Industry standard implementations of math.h claim (often without formal proof) tight bounds on floating-point errors. We demonstrate a novel static analysis that proves these bounds and verifies the correctness of these implementations. Our key insight is a reduction of this verification task to a set of mathematical optimization problems that can be solved by off-the-shelf computer algebra systems. We use this analysis to prove the correctness of implementations in Intel's math library automatically. Prior to this work, these implementations could only be verified with significant manual effort. ![]() ![]() |
|
Li, Yangjia |
![]() Yangjia Li and Mingsheng Ying (Institute of Software at Chinese Academy of Sciences, China; University of Technology Sydney, Australia; Tsinghua University, China) We introduce the notion of linear ranking super-martingale (LRSM) for quantum programs (with nondeterministic choices, namely angelic and demonic choices). Several termination theorems are established showing that the existence of LRSMs for a quantum program implies its termination. Thus, the termination problem of quantum programs is reduced to the realisability and synthesis of LRSMs. We further show that the realisability and synthesis problem of LRSMs for quantum programs can be reduced to an SDP (Semi-Definite Programming) problem, which can be settled with existing SDP solvers. The techniques developed in this paper are used to analyse the termination of several example quantum programs, including quantum random walks and a quantum Bernoulli factory for random number generation. This work is essentially a generalisation of the constraint-based approach to the corresponding problems for probabilistic programs developed in the recent literature, adding two novel ideas: (1) employing the fundamental Gleason's theorem in quantum mechanics to guide the choice of templates; and (2) a generalised Farkas' lemma in terms of observables (Hermitian operators) in quantum physics. ![]() ![]() |
|
Liang, Hongjin |
![]() Hongjin Liang and Xinyu Feng (Nanjing University, China; University of Science and Technology of China, China) Various progress properties have been proposed for concurrent objects, such as wait-freedom, lock-freedom, starvation-freedom and deadlock-freedom. However, none of them applies to concurrent objects with partial methods, i.e., methods that are supposed not to return under certain circumstances. A typical example is the lock_acquire method, which must not return when the lock has already been acquired. In this paper we propose two new progress properties, partial starvation-freedom (PSF) and partial deadlock-freedom (PDF), for concurrent objects with partial methods. We also design four patterns for writing abstract specifications of PSF or PDF objects under strongly or weakly fair scheduling, so that these objects contextually refine the abstract specifications. Our Abstraction Theorem shows the equivalence between PSF (or PDF) and the progress-aware contextual refinement. Finally, we generalize the program logic LiLi into a new logic for verifying the PSF (or PDF) property and linearizability of concurrent objects. ![]() ![]() |
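The notion of a partial method can be made concrete with a small Haskell sketch (purely illustrative; the paper's development is a program logic, not Haskell): an MVar-based lock whose acquire operation is supposed not to return while the lock is held elsewhere.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newMVar, takeMVar, putMVar)

-- `acquire` is a partial method in the sense of the abstract above: it is
-- *supposed* not to return while the lock is held by another thread, so
-- classical progress properties such as starvation-freedom cannot be
-- stated for it directly.
newtype Lock = Lock (MVar ())

newLock :: IO Lock
newLock = Lock <$> newMVar ()

acquire :: Lock -> IO ()
acquire (Lock m) = takeMVar m   -- blocks (does not return) until the lock is free

release :: Lock -> IO ()
release (Lock m) = putMVar m ()

main :: IO ()
main = do
  l <- newLock
  acquire l
  _ <- forkIO (acquire l >> putStrLn "second acquire returned" >> release l)
  threadDelay 100000            -- the forked acquire is still blocked here
  release l                     -- now the forked thread can proceed
  threadDelay 100000
```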
|
Lin, Anthony W. |
![]() Lukáš Holík (Brno University of Technology, Czechia; University of Oxford, UK; Uppsala University, Sweden) String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications, including the automatic detection of cross-site scripting and automatic test-case generation. One popular string analysis technique is symbolic execution, which at its core uses constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers have started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. the straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also the possibility of succinctly representing otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature. ![]() ![]() ![]() ![]() ![]() Taolue Chen, Yan Chen, Matthew Hague (University of London, UK; Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Royal Holloway University of London, UK; University of Oxford, UK) The theory of strings with concatenation has been widely argued to be the basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in practice; for example, the use of regular constraints (pattern matching against a regular expression), and the string-replace function (replacing either the first occurrence or all occurrences of a ``pattern'' string constant/variable/regular expression by a ``replacement'' string constant/variable), among many others. 
Both regular constraints and the string-replace function are crucial for such applications as analysis of JavaScript (or more generally HTML5 applications) against cross-site scripting (XSS) vulnerabilities, which motivates us to consider a richer class of string constraints. The importance of the string-replace function (especially the replace-all facility) is increasingly recognised, which can be witnessed by the incorporation of the function in the input languages of several string constraint solvers. Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version where pattern/replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. Despite this, the straight-line restriction is still practically sensible since this condition is typically met by string constraints that are generated by symbolic execution. In this paper, we provide the first systematic study of straight-line string constraints with the string-replace function and the regular constraints as the basic operations. We show that a large class of such constraints (i.e. when only a constant string or a regular expression is permitted in the pattern) is decidable. We note that the string-replace function, even under this restriction, is sufficiently powerful for expressing the concatenation operator and much more (e.g. extensions of regular expressions with string variables). This gives us the most expressive decidable logic containing concatenation, replace, and regular constraints under the same umbrella. Our decision procedure for the straight-line fragment follows an automata-theoretic approach, and is modular in the sense that the string-replace terms are removed one by one to generate more and more regular constraints, which can then be discharged by the state-of-the-art string constraint solvers. We also show that this fragment is, in a way, a maximal decidable subclass of the straight-line fragment with string-replace and regular constraints. To this end, we show undecidability results for the following two extensions: (1) variables are permitted in the pattern parameter of the replace function, (2) length constraints are permitted. ![]() ![]() |
|
Liu, Fengyun |
![]() Martin Odersky, Olivier Blanvillain, Fengyun Liu, Aggelos Biboudis, Heather Miller, and Sandro Stucki (EPFL, Switzerland; Northeastern University, USA) Understanding a program entails understanding its context; dependencies, configurations and even implementations are all forms of contexts. Modern programming languages and theorem provers offer an array of constructs to define contexts, implicitly. Scala offers implicit parameters which are used pervasively, but which cannot be abstracted over. This paper describes a generalization of implicit parameters to implicit function types, a powerful way to abstract over the context in which some piece of code is run. We provide a formalization based on bidirectional type-checking that closely follows the semantics implemented by the Scala compiler. To demonstrate their range of abstraction capabilities, we present several applications that make use of implicit function types. We show how to encode the builder pattern, tagless interpreters, reader and free monads and we assess the performance of the monadic structures presented. ![]() ![]() ![]() |
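The paper above targets Scala; as a very loose analogue in Haskell, given here only for illustration, GHC's ImplicitParams extension likewise lets a context requirement appear in a function's type and be propagated without explicit plumbing (the Config type and function names below are hypothetical):

```haskell
{-# LANGUAGE ImplicitParams #-}

-- A rough Haskell analogue of making the implicit context part of a type;
-- the paper develops the corresponding idea for Scala's implicit parameters.
newtype Config = Config { verbosity :: Int }

-- The context requirement `(?cfg :: Config)` is part of the type, so it can
-- be abstracted over and propagated without threading an extra argument.
report :: (?cfg :: Config) => String -> String
report msg
  | verbosity ?cfg > 0 = "[info] " ++ msg
  | otherwise          = msg

runPipeline :: (?cfg :: Config) => [String] -> [String]
runPipeline = map report

main :: IO ()
main =
  let ?cfg = Config { verbosity = 1 }
  in mapM_ putStrLn (runPipeline ["parsing", "typechecking"])
```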
|
Löding, Christof |
![]() Christof Löding, P. Madhusudan (RWTH Aachen University, Germany; University of Illinois at Urbana-Champaign, USA) We give foundational results that explain the efficacy of heuristics used for dealing with quantified formulas and recursive definitions. We develop a framework for first order logic (FOL) over an uninterpreted combination of background theories. Our central technical result is that systematic term instantiation is complete for a fragment of FOL that we call safe. This result, coupled with the fact that unfolding recursive definitions is essentially term instantiation and with the observation that heap verification engines generate verification conditions in the safe fragment, explains the efficacy of verification engines like natural proofs that resort to such heuristics. Furthermore, we study recursive definitions with least fixpoint semantics and show that though they are not amenable to complete procedures, we can systematically introduce induction principles that in practice bridge the divide between FOL and FOL with recursive definitions. ![]() ![]() ![]() |
|
Losa, Giuliano |
![]() Oded Padon, Jochen Hoenicke, Giuliano Losa, Andreas Podelski (Tel Aviv University, Israel; University of Freiburg, Germany; University of California at Los Angeles, USA) We develop a new technique for verifying temporal properties of infinite-state (distributed) systems. The main idea is to reduce the temporal verification problem to the problem of verifying the safety of infinite-state systems expressed in first-order logic. This allows us to leverage existing techniques for safety verification to verify temporal properties of interesting distributed protocols, including some that have not been mechanically verified before. We model infinite-state systems using first-order logic, and use first-order temporal logic (FO-LTL) to specify temporal properties. This general formalism allows us to model distributed systems naturally, while supporting both unbounded parallelism (where the system is allowed to dynamically create processes) and infinite state per process. The traditional approach for verifying temporal properties of infinite-state systems employs well-founded relations (e.g. using linear arithmetic ranking functions). In contrast, our approach is based on the idea of fair cycle detection. In finite-state systems, temporal verification can always be reduced to fair cycle detection (a system contains a fair cycle if it revisits a state after satisfying all fairness constraints). However, with both infinitely many states and infinitely many fairness constraints, a straightforward reduction to fair cycle detection is unsound. To regain soundness, we augment the infinite-state transition system with a dynamically computed finite set that exploits the locality of transitions. This set lets us define a form of fair cycle detection that is sound in the presence of both infinitely many states and infinitely many fairness constraints. Our approach allows a new style of temporal verification that does not explicitly involve ranking functions. This fits well with pure first-order verification, which does not explicitly reason about numerical values. In particular, it can be used with effectively propositional first-order logic (EPR), in which case checking verification conditions is decidable. We applied our technique to verify temporal properties of several interesting protocols. To the best of our knowledge, we have obtained the first mechanized liveness proof for both TLB Shootdown and Stoppable Paxos. ![]() ![]() ![]() ![]() |
|
Madhusudan, P. |
![]() Christof Löding, P. Madhusudan (RWTH Aachen University, Germany; University of Illinois at Urbana-Champaign, USA) We give foundational results that explain the efficacy of heuristics used for dealing with quantified formulas and recursive definitions. We develop a framework for first order logic (FOL) over an uninterpreted combination of background theories. Our central technical result is that systematic term instantiation is complete for a fragment of FOL that we call safe. This result, coupled with the fact that unfolding recursive definitions is essentially term instantiation and with the observation that heap verification engines generate verification conditions in the safe fragment, explains the efficacy of verification engines like natural proofs that resort to such heuristics. Furthermore, we study recursive definitions with least fixpoint semantics and show that though they are not amenable to complete procedures, we can systematically introduce induction principles that in practice bridge the divide between FOL and FOL with recursive definitions. ![]() ![]() ![]() |
|
Maillard, Kenji |
![]() Danel Ahman, Cédric Fournet (Inria, France; Microsoft Research, UK; ENS Paris, France; Microsoft Research, India; Microsoft Research, USA) We provide a way to ease the verification of programs whose state evolves monotonically. The main idea is that a property witnessed in a prior state can be soundly recalled in the current state, provided (1) state evolves according to a given preorder, and (2) the property is preserved by this preorder. In many scenarios, such monotonic reasoning yields concise modular proofs, saving the need for explicit program invariants. We distill our approach into the monotonic-state monad, a general yet compact interface for Hoare-style reasoning about monotonic state in a dependently typed language. We prove the soundness of the monotonic-state monad and use it as a unified foundation for reasoning about monotonic state in the F⋆ verification system. Based on this foundation, we build libraries for various mutable data structures like monotonic references and apply these libraries at scale to the verification of several distributed applications. ![]() ![]() ![]() ![]() |
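A caricature of the monotonic-state idea in plain Haskell follows (names are illustrative; the paper's F⋆ development uses dependent types to make the witness/recall step an actual proof, which Haskell cannot express): because the state only ever grows, a fact observed about it once remains valid later without re-checking.

```haskell
import Control.Monad.State (State, evalState, gets, modify)

-- The state is an append-only log, so the preorder "is an extension of"
-- governs every update; a stable property such as "entry e is present"
-- can be observed once and relied on afterwards.
type Log = [String]

append :: String -> State Log ()
append e = modify (++ [e])

-- A stable property: once an entry is in the log, it stays in the log.
witnessed :: String -> State Log Bool
witnessed e = gets (elem e)

demo :: State Log Bool
demo = do
  append "open"
  seen  <- witnessed "open"   -- property observed here ...
  append "write"
  append "close"
  still <- witnessed "open"   -- ... still holds here, since the state only grew
  pure (seen && still)

main :: IO ()
main = print (evalState demo [])   -- True
```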
|
Majumdar, Rupak |
![]() Rupak Majumdar (MPI-SWS, Germany) Random testing has proven to be an effective way to catch bugs in distributed systems in the presence of network partition faults. This is surprising, as the space of potentially faulty executions is enormous, and the bugs depend on a subtle interplay between sequences of operations and faults. We provide a theoretical justification of the effectiveness of random testing in this context. First, we show a general construction, using the probabilistic method from combinatorics, that shows that whenever a random test covers a fixed coverage goal with sufficiently high probability, a small randomly-chosen set of tests achieves full coverage with high probability. In particular, we show that our construction can give test sets exponentially smaller than systematic enumeration. Second, based on an empirical study of many bugs found by random testing in production distributed systems, we introduce notions of test coverage relating to network partition faults which are effective in finding bugs. Finally, we show using combinatorial arguments that for these notions of test coverage we introduce, we can find a lower bound on the probability that a random test covers a given goal. Our general construction then explains why random testing tools achieve good coverage---and hence, find bugs---quickly. While we formulate our results in terms of network partition faults, our construction provides a step towards rigorous analysis of random testing algorithms, and can be applicable in other scenarios. ![]() ![]() |
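One standard way to make the first step concrete, shown here only as a plausible union-bound reading of the construction and not as the paper's exact statement, is the following calculation.

```latex
% Suppose there are m coverage goals and a single random test covers any
% fixed goal with probability at least p. For k independent random tests,
\[
  \Pr[\text{some goal remains uncovered}] \;\le\; m\,(1-p)^{k} \;\le\; m\,e^{-pk},
\]
% so k \ge \tfrac{1}{p}\ln\tfrac{m}{\delta} tests achieve full coverage with
% probability at least 1-\delta: logarithmic in the number of goals, whereas
% systematic enumeration may need on the order of one test per goal.
```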
|
Maksimović, Petar |
![]() José Fragoso Santos, Petar Maksimović, Daiva Naudžiūnienė, Thomas Wood, and Philippa Gardner (Imperial College London, UK; Mathematical Institute SASA, Serbia) The dynamic nature of JavaScript and its complex semantics make it a difficult target for logic-based verification. We introduce JaVerT, a semi-automatic JavaScript Verification Toolchain, based on separation logic and aimed at the specialist developer wanting rich, mechanically verified specifications of critical JavaScript code. To specify JavaScript programs, we design abstractions that capture its key heap structures (for example, prototype chains and function closures), allowing the developer to write clear and succinct specifications with minimal knowledge of the JavaScript internals. To verify JavaScript programs, we develop JaVerT, a verification pipeline consisting of: JS-2-JSIL, a well-tested compiler from JavaScript to JSIL, an intermediate goto language capturing the fundamental dynamic features of JavaScript; JSIL Verify, a semi-automatic verification tool based on a sound JSIL separation logic; and verified axiomatic specifications of the JavaScript internal functions. Using JaVerT, we verify functional correctness properties of: data-structure libraries (key-value map, priority queue) written in an object-oriented style; operations on data structures such as binary search trees (BSTs) and lists; examples illustrating function closures; and test cases from the official ECMAScript test suite. The verification times suggest that reasoning about larger, more complex code using JaVerT is feasible. ![]() ![]() ![]() |
|
Mazza, Damiano |
![]() Damiano Mazza, Luc Pellissier, and Pierre Vial (CNRS, France; University of Paris 13, France; University of Paris Diderot, France) Starting from an exact correspondence between linear approximations and non-idempotent intersection types, we develop a general framework for building systems of intersection types characterizing normalization properties. We show how this construction, which uses in a fundamental way Melliès and Zeilberger's ``type systems as functors'' viewpoint, allows us to recover equivalent versions of every well known intersection type system (including Coppo and Dezani's original system, as well as its non-idempotent variants independently introduced by Gardner and de Carvalho). We also show how new systems of intersection types may be built almost automatically in this way. ![]() ![]() |
|
McIver, Annabelle |
![]() Annabelle McIver, Carroll Morgan, Benjamin Lucien Kaminski, and Joost-Pieter Katoen (Macquarie University, Australia; UNSW, Australia; Data61 at CSIRO, Australia; RWTH Aachen University, Germany; University College London, UK; IST Austria, Austria) We present a new proof rule for proving almost-sure termination of probabilistic programs, including those that contain demonic non-determinism. An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so. It applies directly to the program's source code, even if the program contains demonic choice. Like others, we use variant functions (a.k.a. "super-martingales") that are real-valued and decrease randomly on each loop iteration; but our key innovation is that the amount as well as the probability of the decrease are parametric. We prove the soundness of the new rule, indicate where its applicability goes beyond existing rules, and explain its connection to classical results on denumerable (non-demonic) Markov chains. ![]() ![]() |
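A classic example of the difficulty, mentioned here purely as an illustration rather than as one of the paper's own case studies, is the one-dimensional symmetric random walk.

```latex
% The loop body decrements or increments x with equal probability:
%
%   while (x > 0) { x := x - 1  [1/2]  x := x + 1 }
%
% The walk reaches x = 0 with probability 1 from any starting value, yet
\[
  \mathbb{E}[\text{number of iterations}] = \infty ,
\]
% so no variant with a fixed strict expected decrease can exist (it would
% imply a finite expected running time); rules in which the probability and
% the amount of the decrease may vary with the variant's current value are
% aimed at exactly this kind of program.
```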
|
Melo, Leandro T. C. |
![]() Leandro T. C. Melo, Rodrigo G. Ribeiro, Marcus R. de Araújo, and Fernando Magno Quintão Pereira (Federal University of Minas Gerais, Brazil; Federal University of Ouro Preto, Brazil) Incomplete source code naturally emerges in software development: during the design phase, while evolving, testing and analyzing programs. Therefore, the ability to understand partial programs is a valuable asset. However, this problem is still unsolved in the C programming language. Difficulties stem from the fact that parsing C requires, not only syntax, but also semantic information. Furthermore, inferring types so that they respect C's type system is a challenging task. In this paper we present a technique that lets us solve these problems. We provide a unification-based type inference capable of dealing with C intricacies. The ideas we present let us reconstruct partial C programs into complete well-typed ones. Such program reconstruction has several applications: enabling static analysis tools in scenarios where software components may be absent; improving static analysis tools that do not rely on build-specifications; allowing stub-generation and testing tools to work on snippets; and assisting programmers on the extraction of reusable data-structures out of the program parts that use them. Our evaluation is performed on source code from a variety of C libraries such as GNU's Coreutils, GNULib, GNOME's GLib, and GDSL; on implementations from Sedgewick's books; and on snippets from popular open-source projects like CPython, FreeBSD, and Git. ![]() ![]() ![]() ![]() |
|
Michalevsky, Yan |
![]() Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar (Tel Aviv University, Israel; VMware, USA; Stanford University, USA) Callbacks are essential in many programming environments, but drastically complicate program understanding and reasoning because they allow external objects to mutate an object's local state in unexpected ways, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum employed callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and statically checking if an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contract executions, excluding those of the DAO or of contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state. ![]() ![]() ![]() |
|
Miller, Heather |
![]() Martin Odersky, Olivier Blanvillain, Fengyun Liu, Aggelos Biboudis, Heather Miller, and Sandro Stucki (EPFL, Switzerland; Northeastern University, USA) Understanding a program entails understanding its context; dependencies, configurations and even implementations are all forms of contexts. Modern programming languages and theorem provers offer an array of constructs to define contexts, implicitly. Scala offers implicit parameters which are used pervasively, but which cannot be abstracted over. This paper describes a generalization of implicit parameters to implicit function types, a powerful way to abstract over the context in which some piece of code is run. We provide a formalization based on bidirectional type-checking that closely follows the semantics implemented by the Scala compiler. To demonstrate their range of abstraction capabilities, we present several applications that make use of implicit function types. We show how to encode the builder pattern, tagless interpreters, reader and free monads and we assess the performance of the monadic structures presented. ![]() ![]() ![]() |
|
Miltner, Anders |
![]() Anders Miltner, Kathleen Fisher, Benjamin C. Pierce (Princeton University, USA; Tufts University, USA; University of Pennsylvania, USA) Bidirectional transformations between different data representations occur frequently in modern software systems. They appear as serializers and deserializers, as parsers and pretty printers, as database views and view updaters, and as a multitude of different kinds of ad hoc data converters. Manually building bidirectional transformations---by writing two separate functions that are intended to be inverses---is tedious and error prone. A better approach is to use a domain-specific language in which both directions can be written as a single expression. However, these domain-specific languages can be difficult to program in, requiring programmers to manage fiddly details while working in a complex type system. We present an alternative approach. Instead of coding transformations manually, we synthesize them from declarative format descriptions and examples. Specifically, we present Optician, a tool for type-directed synthesis of bijective string transformers. The inputs to Optician are a pair of ordinary regular expressions representing two data formats and a few concrete examples for disambiguation. The output is a well-typed program in Boomerang (a bidirectional language based on the theory of lenses). The main technical challenge involves navigating the vast program search space efficiently. In particular, and unlike most prior work on type-directed synthesis, our system operates in the context of a language with a rich equivalence relation on types (the theory of regular expressions). Consequently, program synthesis requires search in two dimensions: First, our synthesis algorithm must find a pair of "syntactically compatible types," and second, using the structure of those types, it must find a type- and example-compliant term. Our key insight is that it is possible to reduce the size of this search space without losing any computational power by defining a new language of lenses designed specifically for synthesis. The new language is free from arbitrary function composition and operates only over types and terms in a new disjunctive normal form. We prove (1) our new language is just as powerful as a more natural, compositional, and declarative language and (2) our synthesis algorithm is sound and complete with respect to the new language. We also demonstrate empirically that our new language changes the synthesis problem from one that admits intractable solutions to one that admits highly efficient solutions, able to synthesize intricate lenses between complex file formats in seconds. We evaluate Optician on a benchmark suite of 39 examples that includes both microbenchmarks and realistic examples derived from other data management systems including Flash Fill, a tool for synthesizing string transformations in spreadsheets, and Augeas, a tool for bidirectional processing of Linux system configuration files. ![]() ![]() ![]() |
|
Morgan, Carroll |
![]() Annabelle McIver, Carroll Morgan, Benjamin Lucien Kaminski, and Joost-Pieter Katoen (Macquarie University, Australia; UNSW, Australia; Data61 at CSIRO, Australia; RWTH Aachen University, Germany; University College London, UK; IST Austria, Austria) We present a new proof rule for proving almost-sure termination of probabilistic programs, including those that contain demonic non-determinism. An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so. It applies directly to the program's source code, even if the program contains demonic choice. Like others, we use variant functions (a.k.a. "super-martingales") that are real-valued and decrease randomly on each loop iteration; but our key innovation is that the amount as well as the probability of the decrease are parametric. We prove the soundness of the new rule, indicate where its applicability goes beyond existing rules, and explain its connection to classical results on denumerable (non-demonic) Markov chains. ![]() ![]() |
|
Moss, Sean K. |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
|
Murawski, Andrzej S. |
![]() Pierre Clairambault, Charles Grellois, and Andrzej S. Murawski (University of Lyon, France; CNRS, France; ENS Lyon, France; Claude Bernard University Lyon 1, France; LIP, France; Inria, France; Aix-Marseille University, France; ENSAM, France; University of Toulon, France; University of Oxford, UK) Higher-order recursion schemes (HORS) have recently emerged as a promising foundation for higher-order program verification. We examine the impact of enriching HORS with linear types. To that end, we introduce two frameworks that blend non-linear and linear types: a variant of the λY -calculus and an extension of HORS, called linear HORS (LHORS). First we prove that the two formalisms are equivalent and there exist polynomial-time translations between them. Then, in order to support model-checking of (trees generated by) LHORS, we propose a refined version of alternating parity tree automata, called LNAPTA, whose behaviour depends on information about linearity. We show that the complexity of LNAPTA model-checking for LHORS depends on two type-theoretic parameters: linear order and linear depth. The former is in general smaller than the standard notion of order and ignores linear function spaces. In contrast, the latter measures the depth of linear clusters inside a type. Our main result states that LNAPTA model-checking of LHORS of linear order n is n-EXPTIME-complete, when linear depth is fixed. This generalizes and improves upon the classic result of Ong, which relies on the standard notion of order. To illustrate the significance of the result, we consider two applications: the MSO model-checking problem on variants of HORS with case distinction (RSFD and HORSC) on a finite domain and a call-by-value resource verification problem. In both cases, decidability can be established by translation into HORS, but the implied complexity bounds will be suboptimal due to increases in type order. In contrast, we show that the complexity bounds derived by translations into LHORS and appealing to our result are optimal in that they match the respective hardness results. ![]() ![]() |
|
Nagar, Kartik |
![]() Gowtham Kaki, Kartik Nagar, Mahsa Najafzadeh, and Suresh Jagannathan (Purdue University, USA) Serializability is a well-understood correctness criterion that simplifies reasoning about the behavior of concurrent transactions by ensuring they are isolated from each other while they execute. However, enforcing serializable isolation comes at a steep cost in performance because it necessarily restricts opportunities to exploit concurrency even when such opportunities would not violate application-specific invariants. As a result, database systems in practice support, and often encourage, developers to implement transactions using weaker alternatives. These alternatives break the strong isolation guarantees offered by serializable transactions to permit greater concurrency. Unfortunately, the semantics of weak isolation is poorly understood, and usually explained only informally in terms of low-level implementation artifacts. Consequently, verifying high-level correctness properties in such environments remains a challenging problem. To address this issue, we present a novel program logic that enables compositional reasoning about the behavior of concurrently executing weakly-isolated transactions. Recognizing that the proof burden necessary to use this logic may dissuade application developers, we also describe an inference procedure based on this foundation that ascertains the weakest isolation level that still guarantees the safety of high-level consistency assertions associated with such transactions. The key to effective inference is the observation that weakly-isolated transactions can be viewed as functional (monadic) computations over an abstract database state, allowing us to treat their operations as state transformers over the database. This interpretation enables automated verification using off-the-shelf SMT solvers. Our development is parametric over a transaction’s specific isolation semantics, allowing it to be applicable over a range of concurrency control mechanisms. Case studies and experiments on real-world applications (written in an embedded DSL in OCaml) demonstrate the utility of our approach, and provide strong evidence that automated verification of weakly-isolated transactions can be placed on the same formal footing as their strongly-isolated serializable counterparts. ![]() ![]() |
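The abstract-state view described above can be sketched in a few lines of Haskell (the representation and names are illustrative, not the paper's actual OCaml DSL): a transaction is a monadic computation over an abstract database, so each of its operations is just a state transformer over that database, which is what makes symbolic reasoning with an SMT solver possible.

```haskell
import           Control.Monad.State (State, execState, gets, modify)
import           Data.Map            (Map)
import qualified Data.Map            as Map

type Key = String
type DB  = Map Key Int

-- A transaction as a monadic computation over the abstract database state.
type Txn a = State DB a

readKey :: Key -> Txn Int
readKey k = gets (Map.findWithDefault 0 k)

writeKey :: Key -> Int -> Txn ()
writeKey k v = modify (Map.insert k v)

-- A transfer transaction; the invariant "total balance is preserved" is the
-- kind of high-level assertion whose safety under a given (weak) isolation
-- level an inference procedure would be asked to establish.
transfer :: Key -> Key -> Int -> Txn ()
transfer from to amt = do
  a <- readKey from
  b <- readKey to
  writeKey from (a - amt)
  writeKey to   (b + amt)

main :: IO ()
main = print (execState (transfer "alice" "bob" 10)
                        (Map.fromList [("alice", 100), ("bob", 0)]))
```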
|
Najafzadeh, Mahsa |
![]() Gowtham Kaki, Kartik Nagar, Mahsa Najafzadeh, and Suresh Jagannathan (Purdue University, USA) Serializability is a well-understood correctness criterion that simplifies reasoning about the behavior of concurrent transactions by ensuring they are isolated from each other while they execute. However, enforcing serializable isolation comes at a steep cost in performance because it necessarily restricts opportunities to exploit concurrency even when such opportunities would not violate application-specific invariants. As a result, database systems in practice support, and often encourage, developers to implement transactions using weaker alternatives. These alternatives break the strong isolation guarantees offered by serializable transactions to permit greater concurrency. Unfortunately, the semantics of weak isolation is poorly understood, and usually explained only informally in terms of low-level implementation artifacts. Consequently, verifying high-level correctness properties in such environments remains a challenging problem. To address this issue, we present a novel program logic that enables compositional reasoning about the behavior of concurrently executing weakly-isolated transactions. Recognizing that the proof burden necessary to use this logic may dissuade application developers, we also describe an inference procedure based on this foundation that ascertains the weakest isolation level that still guarantees the safety of high-level consistency assertions associated with such transactions. The key to effective inference is the observation that weakly-isolated transactions can be viewed as functional (monadic) computations over an abstract database state, allowing us to treat their operations as state transformers over the database. This interpretation enables automated verification using off-the-shelf SMT solvers. Our development is parametric over a transaction’s specific isolation semantics, allowing it to be applicable over a range of concurrency control mechanisms. Case studies and experiments on real-world applications (written in an embedded DSL in OCaml) demonstrate the utility of our approach, and provide strong evidence that automated verification of weakly-isolated transactions can be placed on the same formal footing as their strongly-isolated serializable counterparts. ![]() ![]() |
|
Naudžiūnienė, Daiva |
![]() José Fragoso Santos, Petar Maksimović, Daiva Naudžiūnienė, Thomas Wood, and Philippa Gardner (Imperial College London, UK; Mathematical Institute SASA, Serbia) The dynamic nature of JavaScript and its complex semantics make it a difficult target for logic-based verification. We introduce JaVerT, a semi-automatic JavaScript Verification Toolchain, based on separation logic and aimed at the specialist developer wanting rich, mechanically verified specifications of critical JavaScript code. To specify JavaScript programs, we design abstractions that capture its key heap structures (for example, prototype chains and function closures), allowing the developer to write clear and succinct specifications with minimal knowledge of the JavaScript internals. To verify JavaScript programs, we develop JaVerT, a verification pipeline consisting of: JS-2-JSIL, a well-tested compiler from JavaScript to JSIL, an intermediate goto language capturing the fundamental dynamic features of JavaScript; JSIL Verify, a semi-automatic verification tool based on a sound JSIL separation logic; and verified axiomatic specifications of the JavaScript internal functions. Using JaVerT, we verify functional correctness properties of: data-structure libraries (key-value map, priority queue) written in an object-oriented style; operations on data structures such as binary search trees (BSTs) and lists; examples illustrating function closures; and test cases from the official ECMAScript test suite. The verification times suggest that reasoning about larger, more complex code using JaVerT is feasible. ![]() ![]() ![]() |
|
Newton, Ryan R. |
![]() Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack (University of Gothenburg, Sweden; Tweag I/O, France; Indiana University, USA; Microsoft Research, UK) Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values. To demonstrate the efficacy of our linear type system — both how easy it can be integrated in an existing language implementation and how streamlined it makes it to write programs with linear types — we implemented our type system in ghc, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions. ![]() ![]() ![]() ![]() Niki Vazou, Anish Tondwalkar, Vikraman Choudhury (University of Maryland, USA; University of California at San Diego, USA; Indiana University, USA; University of Edinburgh, UK; Input Output HK, UK) We introduce Refinement Reflection, a new framework for building SMT-based deductive verifiers. The key idea is to reflect the code implementing a user-defined function into the function’s (output) refinement type. As a consequence, at uses of the function, the function definition is instantiated in the SMT logic in a precise fashion that permits decidable verification. Reflection allows the user to write equational proofs of programs just by writing other programs using pattern-matching and recursion to perform case-splitting and induction. Thus, via the propositions-as-types principle, we show that reflection permits the specification of arbitrary functional correctness properties. Finally, we introduce a proof-search algorithm called Proof by Logical Evaluation that uses techniques from model checking and abstract interpretation, to completely automate equational reasoning. We have implemented reflection in Liquid Haskell and used it to verify that the widely used instances of the Monoid, Applicative, Functor, and Monad typeclasses actually satisfy key algebraic laws required to make the clients safe, and have used reflection to build the first library that actually verifies assumptions about associativity and ordering that are crucial for safe deterministic parallelism. ![]() ![]() ![]() |
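A small sketch in the style the first abstract above describes, written with the arrow syntax that later shipped in GHC's LinearTypes extension (9.0+); the Token type and its API are hypothetical, chosen only to show linearity attached to the function arrow rather than to a separate kind of type:

```haskell
{-# LANGUAGE LinearTypes #-}

-- A resource-like value that must be consumed exactly once.
newtype Token = Token Int

newToken :: Int -> Token
newToken n = Token n

-- The linear arrow means the argument is consumed exactly once: dropping
-- the token, or consuming it twice, is rejected by the type checker.
close :: Token %1 -> Int
close (Token n) = n

-- `withToken` promises to use its argument linearly, so callers know the
-- resource is neither duplicated nor silently discarded.
withToken :: Token %1 -> (Token %1 -> Int) -> Int
withToken t k = k t

main :: IO ()
main = print (withToken (newToken 42) close)
```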
|
Nguyễn, Phúc C. |
![]() Phúc C. Nguyễn, Thomas Gilray, Sam Tobin-Hochstadt, and David Van Horn (University of Maryland, USA; Indiana University, USA) Software contracts allow programmers to state rich program properties using the full expressive power of an object language. However, since they are enforced at runtime, monitoring contracts imposes significant overhead and delays error discovery. So contract verification aims to guarantee all or most of these properties ahead of time, enabling valuable optimizations and yielding a more general assurance of correctness. Existing methods for static contract verification satisfy the needs of more restricted target languages, but fail to address the challenges unique to those conjoining untyped, dynamic programming, higher-order functions, modularity, and statefulness. Our approach tackles all these features at once, in the context of the full Racket system—a mature environment for stateful, higher-order, multi-paradigm programming with or without types. Evaluating our method using a set of both pure and stateful benchmarks, we are able to verify 99.94% of checks statically (all but 28 of 49,861). Stateful, higher-order functions pose significant challenges for static contract verification in particular. In the presence of these features, a modular analysis must permit code from the current module to escape permanently to an opaque context (unspecified code from outside the current module) that may be stateful and therefore store a reference to the escaped closure. Also, contracts themselves, being predicates written in unrestricted Racket, may exhibit stateful behavior; a sound approach must be robust to contracts which are arbitrarily expressive and interwoven with the code they monitor. In this paper, we present and evaluate our solution based on higher-order symbolic execution, explain the techniques we used to address such thorny issues, formalize a notion of behavioral approximation, and use it to provide a mechanized proof of soundness. ![]() ![]() ![]() ![]() |
|
Niksic, Filip |
![]() Rupak Majumdar (MPI-SWS, Germany) Random testing has proven to be an effective way to catch bugs in distributed systems in the presence of network partition faults. This is surprising, as the space of potentially faulty executions is enormous, and the bugs depend on a subtle interplay between sequences of operations and faults. We provide a theoretical justification of the effectiveness of random testing in this context. First, we show a general construction, using the probabilistic method from combinatorics, that shows that whenever a random test covers a fixed coverage goal with sufficiently high probability, a small randomly-chosen set of tests achieves full coverage with high probability. In particular, we show that our construction can give test sets exponentially smaller than systematic enumeration. Second, based on an empirical study of many bugs found by random testing in production distributed systems, we introduce notions of test coverage relating to network partition faults which are effective in finding bugs. Finally, we show using combinatorial arguments that for these notions of test coverage we introduce, we can find a lower bound on the probability that a random test covers a given goal. Our general construction then explains why random testing tools achieve good coverage---and hence, find bugs---quickly. While we formulate our results in terms of network partition faults, our construction provides a step towards rigorous analysis of random testing algorithms, and can be applicable in other scenarios. ![]() ![]() |
|
Novotný, Petr |
![]() Sheshansh Agrawal, Krishnendu Chatterjee (IIT Bombay, India; IST Austria, Austria) Probabilistic programs extend classical imperative programs with real-valued random variables and random branching. The most basic liveness property for such programs is the termination property. The qualitative (aka almost-sure) termination problem asks whether a given program program terminates with probability 1. While ranking functions provide a sound and complete method for non-probabilistic programs, the extension of them to probabilistic programs is achieved via ranking supermartingales (RSMs). Although deep theoretical results have been established about RSMs, their application to probabilistic programs with nondeterminism has been limited only to programs of restricted control-flow structure. For non-probabilistic programs, lexicographic ranking functions provide a compositional and practical approach for termination analysis of real-world programs. In this work we introduce lexicographic RSMs and show that they present a sound method for almost-sure termination of probabilistic programs with nondeterminism. We show that lexicographic RSMs provide a tool for compositional reasoning about almost-sure termination, and for probabilistic programs with linear arithmetic they can be synthesized efficiently (in polynomial time). We also show that with additional restrictions even asymptotic bounds on expected termination time can be obtained through lexicographic RSMs. Finally, we present experimental results on benchmarks adapted from previous work to demonstrate the effectiveness of our approach. ![]() ![]() ![]() |
|
Odersky, Martin |
![]() Martin Odersky, Olivier Blanvillain, Fengyun Liu, Aggelos Biboudis, Heather Miller, and Sandro Stucki (EPFL, Switzerland; Northeastern University, USA) Understanding a program entails understanding its context; dependencies, configurations and even implementations are all forms of contexts. Modern programming languages and theorem provers offer an array of constructs to define contexts, implicitly. Scala offers implicit parameters which are used pervasively, but which cannot be abstracted over. This paper describes a generalization of implicit parameters to implicit function types, a powerful way to abstract over the context in which some piece of code is run. We provide a formalization based on bidirectional type-checking that closely follows the semantics implemented by the Scala compiler. To demonstrate their range of abstraction capabilities, we present several applications that make use of implicit function types. We show how to encode the builder pattern, tagless interpreters, reader and free monads and we assess the performance of the monadic structures presented. ![]() ![]() ![]() |
|
Öhman, Joakim |
![]() Andreas Abel, Joakim Öhman (University of Gothenburg, Sweden; IMDEA Software Institute, Spain; Chalmers University of Technology, Sweden) Type theory should be able to handle its own meta-theory, both to justify its foundational claims and to obtain a verified implementation. At the core of a type checker for intensional type theory lies an algorithm to check equality of types, or in other words, to check whether two types are convertible. We have formalized in Agda a practical conversion checking algorithm for a dependent type theory with one universe à la Russell, natural numbers, and η-equality for Π types. We prove the algorithm correct via a Kripke logical relation parameterized by a suitable notion of equivalence of terms. We then instantiate the parameterized fundamental lemma twice: once to obtain canonicity and injectivity of type formers, and once again to prove the completeness of the algorithm. Our proof relies on inductive-recursive definitions, but not on the uniqueness of identity proofs. Thus, it is valid in variants of intensional Martin-Löf Type Theory as long as they support induction-recursion, for instance, Extensional, Observational, or Homotopy Type Theory. ![]() ![]() ![]() |
|
Ong, C.-H. Luke |
![]() Toby Cathcart Burn, C.-H. Luke Ong, and Steven J. Ramsay (University of Oxford, UK; University of Bristol, UK) Motivated by applications in automated verification of higher-order functional programs, we develop a notion of constrained Horn clauses in higher-order logic and a decision problem concerning their satisfiability. We show that, although satisfiable systems of higher-order clauses do not generally have least models, there is a notion of canonical model obtained through a reduction to a problem concerning a kind of monotone logic program. Following work in higher-order program verification, we develop a refinement type system in order to reason about and automate the search for models. This provides a sound but incomplete method for solving the decision problem. Finally, we show that there is a sense in which we can use refinement types to express properties of terms whilst staying within the higher-order constrained Horn clause framework. ![]() ![]() ![]() |
|
Ostermann, Klaus |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
|
Padon, Oded |
![]() Oded Padon, Jochen Hoenicke, Giuliano Losa, Andreas Podelski (Tel Aviv University, Israel; University of Freiburg, Germany; University of California at Los Angeles, USA) We develop a new technique for verifying temporal properties of infinite-state (distributed) systems. The main idea is to reduce the temporal verification problem to the problem of verifying the safety of infinite-state systems expressed in first-order logic. This allows us to leverage existing techniques for safety verification to verify temporal properties of interesting distributed protocols, including some that have not been mechanically verified before. We model infinite-state systems using first-order logic, and use first-order temporal logic (FO-LTL) to specify temporal properties. This general formalism allows us to naturally model distributed systems, while supporting both unbounded parallelism (where the system is allowed to dynamically create processes) and infinite state per process. The traditional approach for verifying temporal properties of infinite-state systems employs well-founded relations (e.g. using linear arithmetic ranking functions). In contrast, our approach is based on the idea of fair cycle detection. In finite-state systems, temporal verification can always be reduced to fair cycle detection (a system contains a fair cycle if it revisits a state after satisfying all fairness constraints). However, with both infinitely many states and infinitely many fairness constraints, a straightforward reduction to fair cycle detection is unsound. To regain soundness, we augment the infinite-state transition system by a dynamically computed finite set that exploits the locality of transitions. This set lets us define a form of fair cycle detection that is sound in the presence of both infinitely many states and infinitely many fairness constraints. Our approach allows a new style of temporal verification that does not explicitly involve ranking functions. This fits well with pure first-order verification which does not explicitly reason about numerical values. In particular, it can be used with effectively propositional first-order logic (EPR), in which case checking verification conditions is decidable. We applied our technique to verify temporal properties of several interesting protocols. To the best of our knowledge, we have obtained the first mechanized liveness proof for both TLB Shootdown and Stoppable Paxos. ![]() ![]() ![]() ![]() |
|
Pagani, Michele |
![]() Thomas Ehrhard, Michele Pagani, and Christine Tasson (University of Paris Diderot, France; CNRS, France) We define a notion of stable and measurable map between cones endowed with measurability tests and show that it forms a cpo-enriched cartesian closed category. This category gives a denotational model of an extension of PCF supporting the main primitives of probabilistic functional programming, like continuous and discrete probabilistic distributions, sampling, conditioning and full recursion. We prove the soundness and adequacy of this model with respect to a call-by-name operational semantics and give some examples of its denotations. ![]() ![]() ![]() |
|
Palsberg, Jens |
![]() Matt Brown and Jens Palsberg (University of California at Los Angeles, USA) We present partial evaluation by specialization-safe normalization, a novel partial evaluation technique that is Jones-optimal, that can be self-applied to achieve the Futamura projections and that can be type-checked to ensure it always generates code with the correct type. Jones-optimality is the gold-standard for nontrivial partial evaluation and guarantees that a specializer can remove an entire layer of interpretation. We achieve Jones-optimality by using a novel affine-variable static analysis that directs specialization-safe normalization to always decrease a program’s runtime. We demonstrate the robustness of our approach by showing Jones-optimality in a variety of settings. We have formally proved that our partial evaluator is Jones-optimal for call-by-value reduction, and we have experimentally shown that it is Jones-optimal for call-by-value, normal-order, and memoized normal-order. Each of our experiments tests Jones-optimality with three different self-interpreters. We implemented our partial evaluator in Fωµi, a recent language for typed self-applicable meta-programming. It is the first Jones-optimal and self-applicable partial evaluator whose type guarantees that it always generates type-correct code. ![]() ![]() ![]() ![]() |
|
Paraskevopoulou, Zoe |
![]() Leonidas Lampropoulos, Zoe Paraskevopoulou, and Benjamin C. Pierce (University of Pennsylvania, USA; Princeton University, USA) Property-based random testing (PBRT) is widely used in the functional programming and verification communities. For testing simple properties, PBRT tools such as QuickCheck can automatically generate random inputs of a given type. But for more complex properties, effective testing often demands generators for random inputs that belong to a given type and satisfy some logical condition. QuickCheck provides a library of combinators for building such generators by hand, but this can be tedious for simple conditions and error prone for more complex ones. Fortunately, the process can often be automated. The most prominent method, narrowing, works by traversing the structure of the condition, lazily instantiating parts of the data structure as constraints involving them are met. We show how to use ideas from narrowing to compile a large subclass of Coq's inductive relations into efficient generators, avoiding the interpretive overhead of previous implementations. More importantly, the same compilation technique allows us to produce proof terms certifying that each derived generator is good---i.e., sound and complete with respect to the inductive relation it was derived from. We implement our algorithm as an extension of QuickChick, an existing tool for property-based testing in Coq. We evaluate our method by automatically deriving good generators for the majority of the specifications in Software Foundations, a formalized textbook on programming language foundations. ![]() ![]() ![]() ![]() |
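The generator problem the abstract describes is easy to see in plain Haskell QuickCheck (QuickChick's Coq setting is analogous; the sorted-list example below is mine). Filtering a type-based generator with a precondition discards most candidates once the condition becomes sparse, while a hand-written constrained generator produces valid inputs directly; it is generators of the latter kind that the paper derives, and certifies, automatically from inductive relations.

```haskell
import Test.QuickCheck

-- Generate-and-filter: simple to write, but wasteful once few random
-- lists happen to be sorted.
sortedViaFilter :: Gen [Int]
sortedViaFilter = arbitrary `suchThat` isSorted
  where isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Hand-written constrained generator: build sorted lists directly by
-- summing non-negative increments, so every candidate is valid.
sortedDirect :: Gen [Int]
sortedDirect = do
  start <- choose (-100, 100)
  steps <- listOf (choose (0, 10))
  pure (scanl (+) start steps)

-- Sanity property: everything sortedDirect produces is indeed sorted.
prop_sortedDirectIsSorted :: Property
prop_sortedDirectIsSorted =
  forAll sortedDirect $ \xs -> and (zipWith (<=) xs (drop 1 xs))

main :: IO ()
main = quickCheck prop_sortedDirectIsSorted
```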
|
Parreaux, Lionel |
![]() Lionel Parreaux (EPFL, Switzerland; University of Pennsylvania, USA) Metaprograms are programs that manipulate (generate, analyze and evaluate) other programs. These tasks are greatly facilitated by quasiquotation, a technique to construct and deconstruct program fragments using quoted code templates expressed in the syntax of the manipulated language. We argue that two main flavors of quasiquotes have existed so far: Lisp-style quasiquotes, which can both construct and deconstruct programs but may produce code that contains type mismatches and unbound variables; and MetaML-style quasiquotes, which rely on static typing to prevent these errors, but can only construct programs. In this paper, we show how to combine the advantages of both flavors into a unified framework: we allow the construction, deconstruction and evaluation of program fragments while ensuring that generated programs are well-typed and well-scoped, a combination unseen in previous work. We formalize our approach as λ{}, a multi-stage calculus with code pattern matching and rewriting, and prove its type safety. We also present its realization in Squid, a metaprogramming framework for Scala, leveraging Scala’s expressive type system. To demonstrate the usefulness of our approach, we introduce speculative rewrite rules, a novel code transformation technique that makes decisive use of these capabilities, and we outline how it simplifies the design of some crucial query compiler optimizations. ![]() ![]() ![]() |
|
Patrignani, Marco |
![]() Dominique Devriese, Marco Patrignani, and Frank Piessens (KU Leuven, Belgium; MPI-SWS, Germany; CISPA, Germany) There has long been speculation in the scientific literature on how to dynamically enforce parametricity such as that yielded by System F. Almost 20 years ago, Sumii and Pierce proposed a formal compiler from System F into the cryptographic lambda calculus: an untyped lambda calculus extended with an idealised model of encryption. They conjectured that this compiler was fully abstract, i.e. that compiled terms are contextually equivalent if and only if the original terms were, a property that can be seen as a form of secure compilation. The conjecture has received attention in several other publications since then, but remains open to this day. More recently, several researchers have been looking at gradually-typed languages that extend System F. In this setting it is natural to wonder whether embedding System F into these gradually-typed languages preserves contextual equivalence and thus parametricity. In this paper, we answer both questions negatively. We provide a concrete counterexample: two System F terms whose contextual equivalence is not preserved by the Sumii-Pierce compiler, nor the embedding into the polymorphic blame calculus. This counterexample relies on the absence in System F of what we call a universal type, i.e., a type that all other types can be injected into and extracted from. As the languages in which System F is compiled have a universal type, the compilation cannot be fully abstract; this paper explains why. We believe this paper thus sheds light on recent results in the field of gradually typed languages and it provides a perspective for further research into secure compilation of polymorphic languages. ![]() ![]() |
|
Pavlogiannis, Andreas |
![]() Krishnendu Chatterjee (IST Austria, Austria; IIT Bombay, India) A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graph where the edges are labeled with different types of opening and closing parentheses, and the reachability information is computed via paths whose parentheses are properly matched. We present new results for Dyck reachability problems with applications to alias analysis and data-dependence analysis. Our main contributions, which include improved upper bounds as well as lower bounds that establish optimality guarantees, are as follows: First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph with n nodes and m edges, we present: (i) an algorithm with worst-case running time O(m + n · α(n)), where α(n) is the inverse Ackermann function, improving the previously known O(n²) time bound; (ii) a matching lower bound that shows that our algorithm is optimal with respect to worst-case complexity; and (iii) an optimal average-case upper bound of O(m) time, improving the previously known O(m · log n) bound. Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is only linear, and only with respect to the number of call sites. Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean Matrix Multiplication, which is a long-standing open problem. Thus we establish that the existing combinatorial algorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the same hardness holds for graphs of constant treewidth. Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependence analysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform all existing methods on the two problems, over real-world benchmarks. ![]() ![]() ![]() ![]() Marek Chalupa, Krishnendu Chatterjee (Masaryk University, Czechia; IST Austria, Austria; Kena Labs, India; IIT Bombay, India) We present a new dynamic partial-order reduction method for stateless model checking of concurrent programs. A common approach for exploring program behaviors relies on enumerating the traces of the program, without storing the visited states (aka stateless exploration). As the number of distinct traces grows exponentially, dynamic partial-order reduction (DPOR) techniques have been successfully used to partition the space of traces into equivalence classes (Mazurkiewicz partitioning), with the goal of exploring only a few representative traces from each class. We introduce a new equivalence on traces under sequential consistency semantics, which we call the observation equivalence. Two traces are observationally equivalent if every read event observes the same write event in both traces. While the traditional Mazurkiewicz equivalence is control-centric, our new definition is data-centric. We show that our observation equivalence is coarser than the Mazurkiewicz equivalence, and in many cases even exponentially coarser. 
We devise a DPOR exploration of the trace space, called data-centric DPOR, based on the observation equivalence. For acyclic architectures, our algorithm is guaranteed to explore exactly one representative trace from each observation class, while spending polynomial time per class. Hence, our algorithm is optimal with respect to the observation equivalence, and in several cases explores exponentially fewer traces than any enumerative method based on the Mazurkiewicz equivalence. For cyclic architectures, we consider an equivalence between traces which is finer than the observation equivalence, but coarser than the Mazurkiewicz equivalence, and in some cases is exponentially coarser. Our data-centric DPOR algorithm remains optimal under this trace equivalence. Finally, we perform a basic experimental comparison between the existing Mazurkiewicz-based DPOR and our data-centric DPOR on a set of academic benchmarks. Our results show a significant reduction in both running time and the number of explored equivalence classes. ![]() ![]() |
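For reference, the properly-matched paths in the first abstract above are those whose edge labels spell a word of the Dyck language over k parenthesis types, generated by the standard grammar (a textbook definition, not notation from the paper):

```latex
D \;\to\; \varepsilon \;\mid\; D\,D \;\mid\; (_i\, D\, )_i \qquad (1 \le i \le k).
```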
|
Pellissier, Luc |
![]() Damiano Mazza, Luc Pellissier, and Pierre Vial (CNRS, France; University of Paris 13, France; University of Paris Diderot, France) Starting from an exact correspondence between linear approximations and non-idempotent intersection types, we develop a general framework for building systems of intersection types characterizing normalization properties. We show how this construction, which uses in a fundamental way Melliès and Zeilberger's "type systems as functors" viewpoint, allows us to recover equivalent versions of every well-known intersection type system (including Coppo and Dezani's original system, as well as its non-idempotent variants independently introduced by Gardner and de Carvalho). We also show how new systems of intersection types may be built almost automatically in this way. ![]() ![]() |
|
Peña, Lucas |
![]() Christof Löding, P. Madhusudan (RWTH Aachen University, Germany; University of Illinois at Urbana-Champaign, USA) We give foundational results that explain the efficacy of heuristics used for dealing with quantified formulas and recursive definitions. We develop a framework for first order logic (FOL) over an uninterpreted combination of background theories. Our central technical result is that systematic term instantiation is complete for a fragment of FOL that we call safe. Coupled with the fact that unfolding recursive definitions is essentially term instantiation and with the observation that heap verification engines generate verification conditions in the safe fragment, this explains the efficacy of verification engines like natural proofs that resort to such heuristics. Furthermore, we study recursive definitions with least fixpoint semantics and show that though they are not amenable to complete procedures, we can systematically introduce induction principles that in practice bridge the divide between FOL and FOL with recursive definitions. ![]() ![]() ![]() |
|
Pereira, Fernando Magno Quintão |
![]() Leandro T. C. Melo, Rodrigo G. Ribeiro, Marcus R. de Araújo, and Fernando Magno Quintão Pereira (Federal University of Minas Gerais, Brazil; Federal University of Ouro Preto, Brazil) Incomplete source code naturally emerges in software development: during the design phase, while evolving, testing and analyzing programs. Therefore, the ability to understand partial programs is a valuable asset. However, this problem is still unsolved in the C programming language. Difficulties stem from the fact that parsing C requires, not only syntax, but also semantic information. Furthermore, inferring types so that they respect C's type system is a challenging task. In this paper we present a technique that lets us solve these problems. We provide a unification-based type inference capable of dealing with C intricacies. The ideas we present let us reconstruct partial C programs into complete well-typed ones. Such program reconstruction has several applications: enabling static analysis tools in scenarios where software components may be absent; improving static analysis tools that do not rely on build-specifications; allowing stub-generation and testing tools to work on snippets; and assisting programmers on the extraction of reusable data-structures out of the program parts that use them. Our evaluation is performed on source code from a variety of C libraries such as GNU's Coreutils, GNULib, GNOME's GLib, and GDSL; on implementations from Sedgewick's books; and on snippets from popular open-source projects like CPython, FreeBSD, and Git. ![]() ![]() ![]() ![]() |
|
Peyton Jones, Simon |
![]() Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack (University of Gothenburg, Sweden; Tweag I/O, France; Indiana University, USA; Microsoft Research, UK) Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values. To demonstrate the efficacy of our linear type system — both how easily it can be integrated into an existing language implementation and how streamlined it makes writing programs with linear types — we implemented our type system in GHC, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions. ![]() ![]() ![]() |
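This design has since shipped in GHC as the LinearTypes extension (GHC 9.0 and later), where linearity is written on the arrow as %1 ->. A minimal sketch, with example functions of my own rather than from the paper:

```haskell
{-# LANGUAGE LinearTypes #-}

-- 'a %1 -> b' is a function that promises to consume its argument
-- exactly once; the multiplicity lives on the arrow, not on the types.
linearSwap :: (a, b) %1 -> (b, a)
linearSwap (x, y) = (y, x)          -- each component used exactly once

-- Linear functions can still be applied to ordinary, unrestricted
-- values, which is the backwards-compatibility point of the design.
useSwap :: (Int, Bool) -> (Bool, Int)
useSwap p = linearSwap p

-- By contrast, a definition such as
--   dup :: a %1 -> (a, a)
--   dup x = (x, x)
-- is rejected, because 'x' would be consumed twice.

main :: IO ()
main = print (useSwap (1, True))
```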
|
Pierce, Benjamin C. |
![]() Leonidas Lampropoulos, Zoe Paraskevopoulou, and Benjamin C. Pierce (University of Pennsylvania, USA; Princeton University, USA) Property-based random testing (PBRT) is widely used in the functional programming and verification communities. For testing simple properties, PBRT tools such as QuickCheck can automatically generate random inputs of a given type. But for more complex properties, effective testing often demands generators for random inputs that belong to a given type and satisfy some logical condition. QuickCheck provides a library of combinators for building such generators by hand, but this can be tedious for simple conditions and error prone for more complex ones. Fortunately, the process can often be automated. The most prominent method, narrowing, works by traversing the structure of the condition, lazily instantiating parts of the data structure as constraints involving them are met. We show how to use ideas from narrowing to compile a large subclass of Coq's inductive relations into efficient generators, avoiding the interpretive overhead of previous implementations. More importantly, the same compilation technique allows us to produce proof terms certifying that each derived generator is good---i.e., sound and complete with respect to the inductive relation it was derived from. We implement our algorithm as an extension of QuickChick, an existing tool for property-based testing in Coq. We evaluate our method by automatically deriving good generators for the majority of the specifications in Software Foundations, a formalized textbook on programming language foundations. ![]() ![]() ![]() ![]() ![]() Anders Miltner, Kathleen Fisher, Benjamin C. Pierce (Princeton University, USA; Tufts University, USA; University of Pennsylvania, USA) Bidirectional transformations between different data representations occur frequently in modern software systems. They appear as serializers and deserializers, as parsers and pretty printers, as database views and view updaters, and as a multitude of different kinds of ad hoc data converters. Manually building bidirectional transformations---by writing two separate functions that are intended to be inverses---is tedious and error prone. A better approach is to use a domain-specific language in which both directions can be written as a single expression. However, these domain-specific languages can be difficult to program in, requiring programmers to manage fiddly details while working in a complex type system. We present an alternative approach. Instead of coding transformations manually, we synthesize them from declarative format descriptions and examples. Specifically, we present Optician, a tool for type-directed synthesis of bijective string transformers. The inputs to Optician are a pair of ordinary regular expressions representing two data formats and a few concrete examples for disambiguation. The output is a well-typed program in Boomerang (a bidirectional language based on the theory of lenses). The main technical challenge involves navigating the vast program search space efficiently. In particular, and unlike most prior work on type-directed synthesis, our system operates in the context of a language with a rich equivalence relation on types (the theory of regular expressions). Consequently, program synthesis requires search in two dimensions: First, our synthesis algorithm must find a pair of "syntactically compatible types," and second, using the structure of those types, it must find a type- and example-compliant term. 
Our key insight is that it is possible to reduce the size of this search space without losing any computational power by defining a new language of lenses designed specifically for synthesis. The new language is free from arbitrary function composition and operates only over types and terms in a new disjunctive normal form. We prove (1) our new language is just as powerful as a more natural, compositional, and declarative language and (2) our synthesis algorithm is sound and complete with respect to the new language. We also demonstrate empirically that our new language changes the synthesis problem from one that admits intractable solutions to one that admits highly efficient solutions, able to synthesize intricate lenses between complex file formats in seconds. We evaluate Optician on a benchmark suite of 39 examples that includes both microbenchmarks and realistic examples derived from other data management systems including Flash Fill, a tool for synthesizing string transformations in spreadsheets, and Augeas, a tool for bidirectional processing of Linux system configuration files. ![]() ![]() ![]() |
|
Piessens, Frank |
![]() Dominique Devriese, Marco Patrignani, and Frank Piessens (KU Leuven, Belgium; MPI-SWS, Germany; CISPA, Germany) There has long been speculation in the scientific literature on how to dynamically enforce parametricity such as that yielded by System F. Almost 20 years ago, Sumii and Pierce proposed a formal compiler from System F into the cryptographic lambda calculus: an untyped lambda calculus extended with an idealised model of encryption. They conjectured that this compiler was fully abstract, i.e. that compiled terms are contextually equivalent if and only if the original terms were, a property that can be seen as a form of secure compilation. The conjecture has received attention in several other publications since then, but remains open to this day. More recently, several researchers have been looking at gradually-typed languages that extend System F. In this setting it is natural to wonder whether embedding System F into these gradually-typed languages preserves contextual equivalence and thus parametricity. In this paper, we answer both questions negatively. We provide a concrete counterexample: two System F terms whose contextual equivalence is not preserved by the Sumii-Pierce compiler, nor the embedding into the polymorphic blame calculus. This counterexample relies on the absence in System F of what we call a universal type, i.e., a type that all other types can be injected into and extracted from. As the languages in which System F is compiled have a universal type, the compilation cannot be fully abstract; this paper explains why. We believe this paper thus sheds light on recent results in the field of gradually typed languages and it provides a perspective for further research into secure compilation of polymorphic languages. ![]() ![]() |
|
Piróg, Maciej |
![]() Dariusz Biernacki, Maciej Piróg, Piotr Polesiuk, and Filip Sieczkowski (University of Wrocław, Poland) Algebraic effects and handlers have received a lot of attention recently, both from the theoretical point of view and in practical language design. This stems from the fact that algebraic effects give the programmer unprecedented freedom to define, combine, and interpret computational effects. This plenty-of-rope, however, demands not only a deep understanding of the underlying semantics, but also access to practical means of reasoning about effectful code, including correctness and program equivalence. In this paper we tackle this problem by constructing a step-indexed relational interpretation of a call-by-value calculus with algebraic effect handlers and a row-based polymorphic type-and-effect system. Our calculus, while striving for simplicity, enjoys desirable theoretical properties, and is close to the cores of programming languages with algebraic effects used in the wild, while the logical relation we build for it can be used to reason about non-trivial properties, such as contextual equivalence and contextual approximation of programs. Our development has been fully formalised in the Coq proof assistant. ![]() ![]() ![]() ![]() |
|
Podelski, Andreas |
![]() Oded Padon, Jochen Hoenicke, Giuliano Losa, Andreas Podelski (Tel Aviv University, Israel; University of Freiburg, Germany; University of California at Los Angeles, USA) We develop a new technique for verifying temporal properties of infinite-state (distributed) systems. The main idea is to reduce the temporal verification problem to the problem of verifying the safety of infinite-state systems expressed in first-order logic. This allows us to leverage existing techniques for safety verification to verify temporal properties of interesting distributed protocols, including some that have not been mechanically verified before. We model infinite-state systems using first-order logic, and use first-order temporal logic (FO-LTL) to specify temporal properties. This general formalism allows us to naturally model distributed systems, while supporting both unbounded parallelism (where the system is allowed to dynamically create processes) and infinite state per process. The traditional approach for verifying temporal properties of infinite-state systems employs well-founded relations (e.g. using linear arithmetic ranking functions). In contrast, our approach is based on the idea of fair cycle detection. In finite-state systems, temporal verification can always be reduced to fair cycle detection (a system contains a fair cycle if it revisits a state after satisfying all fairness constraints). However, with both infinitely many states and infinitely many fairness constraints, a straightforward reduction to fair cycle detection is unsound. To regain soundness, we augment the infinite-state transition system by a dynamically computed finite set that exploits the locality of transitions. This set lets us define a form of fair cycle detection that is sound in the presence of both infinitely many states and infinitely many fairness constraints. Our approach allows a new style of temporal verification that does not explicitly involve ranking functions. This fits well with pure first-order verification which does not explicitly reason about numerical values. In particular, it can be used with effectively propositional first-order logic (EPR), in which case checking verification conditions is decidable. We applied our technique to verify temporal properties of several interesting protocols. To the best of our knowledge, we have obtained the first mechanized liveness proof for both TLB Shootdown and Stoppable Paxos. ![]() ![]() ![]() ![]() |
|
Polesiuk, Piotr |
![]() Dariusz Biernacki, Maciej Piróg, Piotr Polesiuk, and Filip Sieczkowski (University of Wrocław, Poland) Algebraic effects and handlers have received a lot of attention recently, both from the theoretical point of view and in practical language design. This stems from the fact that algebraic effects give the programmer unprecedented freedom to define, combine, and interpret computational effects. This plenty-of-rope, however, demands not only a deep understanding of the underlying semantics, but also access to practical means of reasoning about effectful code, including correctness and program equivalence. In this paper we tackle this problem by constructing a step-indexed relational interpretation of a call-by-value calculus with algebraic effect handlers and a row-based polymorphic type-and-effect system. Our calculus, while striving for simplicity, enjoys desirable theoretical properties, and is close to the cores of programming languages with algebraic effects used in the wild, while the logical relation we build for it can be used to reason about non-trivial properties, such as contextual equivalence and contextual approximation of programs. Our development has been fully formalised in the Coq proof assistant. ![]() ![]() ![]() ![]() |
|
Popescu, Andrei |
![]() Ondřej Kunčar and Andrei Popescu (TU Munich, Germany; Middlesex University, UK; Institute of Mathematics at Romanian Academy, Romania) Definitions are traditionally considered to be a safe mechanism for introducing concepts on top of a logic known to be consistent. In contrast to arbitrary axioms, definitions should in principle be treatable as a form of abbreviation, and thus compiled away from the theory without losing provability. In particular, definitions should form a conservative extension of the pure logic. These properties are crucial for modern interactive theorem provers, since they ensure the consistency of the logic, as well as a valid environment for total/certified functional programming. We prove these properties, namely, safety and conservativity, for Higher-Order Logic (HOL), a logic implemented in several mainstream theorem provers and relied upon by thousands of users. Some unique features of HOL, such as the requirement to give non-emptiness proofs when defining new types and the impossibility to unfold type definitions, make the proof of these properties, and also the very formulation of safety, nontrivial. Our study also factors in the essential variation of HOL definitions featured by Isabelle/HOL, a popular member of the HOL-based provers family. The current work improves on recent results which showed a weaker property, consistency of Isabelle/HOL's definitions. ![]() ![]() ![]() |
|
Pouchet, Louis-Noel |
![]() Wenlei Bao, Sriram Krishnamoorthy, Louis-Noel Pouchet, and P. Sadayappan (Ohio State University, USA; Pacific Northwest National Laboratory, USA; Colorado State University, USA) Optimizing compilers implement program transformation strategies aimed at reducing data movement to or from main memory by exploiting the data-cache hierarchy. However, instead of attempting to minimize the number of cache misses, very approximate cost models are used, due to the lack of precise compile-time models for misses for hierarchical caches. The current state of practice for cache miss analysis is based on accurate simulation. However, simulation requires time proportional to the dataset/problem size, as well as the number of distinct cache configurations of interest to be evaluated. This paper takes a fundamentally different approach, by focusing on polyhedral programs with static control flow. Instead of relying on costly simulation, a closed-form solution for modeling of misses in a set associative cache hierarchy is developed. This solution can enable program transformation choice at compile time to optimize cache misses. A tool implementing the approach has been developed and used for validation of the framework. ![]() ![]() |
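A trivial instance of the kind of closed-form miss counting the paper generalizes (my toy example, not the paper's model): a stride-1 scan over an array of N 8-byte doubles through a cache with 64-byte lines incurs

```latex
\text{cold misses} \;=\; \left\lceil \tfrac{8N}{64} \right\rceil \;=\; \left\lceil \tfrac{N}{8} \right\rceil
```

misses regardless of associativity. The contribution of the paper is to obtain closed forms of this kind for set-associative hierarchies and general polyhedral loop nests, where data reuse and conflicts make the counting far less obvious.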
|
Pulte, Christopher |
![]() Christopher Pulte (University of Cambridge, UK; ARM, UK; University of St. Andrews, UK) ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes could become visible to some other threads before becoming visible to all — but this has not been exploited in production implementations, the corresponding potential hardware optimisations are thought to have insufficient benefits in the ARM context, and it gives rise to subtle complications when combined with other ARMv8 features. The ARMv8 architecture has therefore been revised: it now has a multicopy-atomic model. It has also been simplified in other respects, including more straightforward notions of dependency, and the architecture now includes a formal concurrency model. In this paper we detail these changes and discuss their motivation. We define two formal concurrency models: an operational one, simplifying the Flowing model of Flur et al., and the axiomatic model of the revised ARMv8 specification. The models were developed by an academic group and by ARM staff, respectively, and this extended collaboration partly motivated the above changes. We prove the equivalence of the two models. The operational model is integrated into an executable exploration tool with new web interface, demonstrated by exhaustively checking the possible behaviours of a loop-unrolled version of a Linux kernel lock implementation, a previously known bug due to unprevented speculation, and a fixed version. ![]() ![]() ![]() |
|
Püschel, Markus |
![]() Gagandeep Singh, Markus Püschel, and Martin Vechev (ETH Zurich, Switzerland) Numerical abstract domains such as Polyhedra, Octahedron, Octagon, Interval, and others are an essential component of static program analysis. The choice of domain offers a performance/precision tradeoff ranging from cheap and imprecise (Interval) to expensive and precise (Polyhedra). Recently, significant speedups were achieved for Octagon and Polyhedra by manually decomposing their transformers to work with the Cartesian product of projections associated with partitions of the variable set. While practically useful, this manual process is time consuming, error-prone, and has to be applied from scratch for every domain. In this paper, we present a generic approach for decomposing the transformers of sub-polyhedra domains along with conditions for checking whether the decomposed transformers lose precision with respect to the original transformers. These conditions are satisfied by most practical transformers, thus our approach is suitable for increasing the performance of these transformers without compromising their precision. Furthermore, our approach is "black box": it does not require changes to the internals of the original non-decomposed transformers or additional manual effort per domain. We implemented our approach and applied it to the domains of Zones, Octagon, and Polyhedra. We then compared the performance of the decomposed transformers obtained with our generic method versus the state of the art: the (non-decomposed) PPL for Polyhedra and the much faster ELINA (which uses manual decomposition) for Polyhedra and Octagon. Against ELINA we demonstrate finer partitions and an associated speedup of about 2x on average. Our results indicate that the general construction presented in this work is a viable method for improving the performance of sub-polyhedra domains. It enables designers of abstract domains to benefit from decomposition without re-writing all of their transformers from scratch as required by prior work. ![]() ![]() ![]() ![]() |
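The decomposition at play can be seen on a tiny example of my own: if the current abstract element relates x with y but constrains z independently, for instance

```latex
\{\, x - y \le 3,\;\; z \le 5 \,\}
\quad\text{factoring over the partition}\quad
\{\{x, y\},\, \{z\}\},
```

then a transformer for an assignment to z only needs to touch the second block. The generic contribution is a per-transformer condition for deciding when such factored application loses no precision.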
|
Radiček, Ivan |
![]() Ivan Radiček, Gilles Barthe (Vienna University of Technology, Austria; IMDEA Software Institute, Spain; SUNY Buffalo, USA; MPI-SWS, Germany) Formal frameworks for cost analysis of programs have been widely studied in the unary setting and, to a limited extent, in the relational setting. However, many of these frameworks focus only on the cost aspect, largely side-lining functional properties that are often a prerequisite for cost analysis, thus leaving many interesting programs out of their purview. In this paper, we show that elegant, simple, expressive proof systems combining cost analysis and functional properties can be built by combining already known ingredients: higher-order refinements and cost monads. Specifically, we derive two syntax-directed proof systems, UC and RC, for unary and relational cost analysis, by adding a cost monad to a (syntax-directed) logic of higher-order programs. We study the metatheory of the systems, show that several nontrivial examples can be verified in them, and prove that existing frameworks for cost analysis (RelCost and RAML) can be embedded in them. ![]() ![]() |
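UC and RC are refinement-typed proof systems rather than libraries, but the "cost monad" ingredient has a familiar shallow analogue. As a generic illustration (the Writer-based instrumentation below is mine, not the paper's semantics), one can thread a step count through a computation and then state properties of it:

```haskell
import Control.Monad.Writer (Writer, runWriter, tell)
import Data.Monoid (Sum (..))

-- A minimal "cost monad": computations that also accumulate a step count.
type Costed a = Writer (Sum Int) a

tick :: Costed ()
tick = tell (Sum 1)

-- Insertion into a sorted list, charging one tick per comparison.
insertC :: Int -> [Int] -> Costed [Int]
insertC x [] = pure [x]
insertC x (y:ys)
  | x <= y    = tick >> pure (x : y : ys)
  | otherwise = tick >> ((y :) <$> insertC x ys)

main :: IO ()
main = print (runWriter (insertC 3 [1, 2, 4, 5]))
-- prints ([1,2,3,4,5], Sum {getSum = 3}): three comparisons were charged
```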
|
Ramsay, Steven J. |
![]() Toby Cathcart Burn, C.-H. Luke Ong, and Steven J. Ramsay (University of Oxford, UK; University of Bristol, UK) Motivated by applications in automated verification of higher-order functional programs, we develop a notion of constrained Horn clauses in higher-order logic and a decision problem concerning their satisfiability. We show that, although satisfiable systems of higher-order clauses do not generally have least models, there is a notion of canonical model obtained through a reduction to a problem concerning a kind of monotone logic program. Following work in higher-order program verification, we develop a refinement type system in order to reason about and automate the search for models. This provides a sound but incomplete method for solving the decision problem. Finally, we show that there is a sense in which we can use refinement types to express properties of terms whilst staying within the higher-order constrained Horn clause framework. ![]() ![]() ![]() |
|
Rastogi, Aseem |
![]() Danel Ahman, Cédric Fournet (Inria, France; Microsoft Research, UK; ENS Paris, France; Microsoft Research, India; Microsoft Research, USA) We provide a way to ease the verification of programs whose state evolves monotonically. The main idea is that a property witnessed in a prior state can be soundly recalled in the current state, provided (1) state evolves according to a given preorder, and (2) the property is preserved by this preorder. In many scenarios, such monotonic reasoning yields concise modular proofs, saving the need for explicit program invariants. We distill our approach into the monotonic-state monad, a general yet compact interface for Hoare-style reasoning about monotonic state in a dependently typed language. We prove the soundness of the monotonic-state monad and use it as a unified foundation for reasoning about monotonic state in the F⋆ verification system. Based on this foundation, we build libraries for various mutable data structures like monotonic references and apply these libraries at scale to the verification of several distributed applications. ![]() ![]() ![]() ![]() |
|
Rémy, Didier |
![]() Thomas Williams and Didier Rémy (Inria, France) Ornaments are a way to describe changes in datatype definitions, reorganizing, adding, or dropping some pieces of data, so that functions operating on the bare definition can be partially and sometimes totally lifted into functions operating on the ornamented structure. We propose an extension of ML with higher-order ornaments, demonstrate its expressiveness with a few typical examples, including code refactoring, study the metatheoretical properties of ornaments, and describe their elaboration process. We formalize ornamentation via an a posteriori abstraction of the bare code, returning a generic term, which lives in a meta-language above ML. The lifted code is obtained by application of the generic term to well-chosen arguments, followed by staged reduction, and some remaining simplifications. We use logical relations to closely relate the lifted code to the bare code. ![]() ![]() ![]() ![]() |
|
Reps, Thomas |
![]() Zachary Kincaid (Princeton University, USA; University of Wisconsin-Madison, USA; GrammaTech, USA) Automatic generation of non-linear loop invariants is a long-standing challenge in program analysis, with many applications. For instance, reasoning about exponentials provides a way to find invariants of digital-filter programs, and reasoning about polynomials and/or logarithms is needed for establishing invariants that describe the time or memory usage of many well-known algorithms. An appealing approach to this challenge is to exploit the powerful recurrence-solving techniques that have been developed in the field of computer algebra, which can compute exact characterizations of non-linear repetitive behavior. However, there is a gap between the capabilities of recurrence solvers and the needs of program analysis: (1) loop bodies are not merely systems of recurrence relations---they may contain conditional branches, nested loops, non-deterministic assignments, etc., and (2) a client program analyzer must be able to reason about the closed-form solutions produced by a recurrence solver (e.g., to prove assertions). This paper presents a method for generating non-linear invariants of general loops based on analyzing recurrence relations. The key components are an abstract domain for reasoning about non-linear arithmetic, a semantics-based method for extracting recurrence relations from loop bodies, and a recurrence solver that avoids closed forms that involve complex or irrational numbers. Our technique has been implemented in a program analyzer that can analyze general loops and mutually recursive procedures. Our experiments show that our technique shows promise for non-linear assertion-checking and resource-bound generation. ![]() ![]() ![]() |
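A textbook-style instance of the recurrence idea (an illustration of mine, not an example from the paper): for the loop body x := x + y; y := y + 1, extracting and solving the induced recurrences gives, after k iterations,

```latex
y_k = y_0 + k, \qquad
x_k = x_0 + \sum_{i=0}^{k-1} y_i \;=\; x_0 + k\,y_0 + \frac{k(k-1)}{2}.
```

Eliminating k = y - y_0 yields a polynomial, hence non-linear, invariant relating x and y, exactly the kind of fact that linear invariant generators cannot express.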
Ribeiro, Rodrigo G. |
![]() Leandro T. C. Melo, Rodrigo G. Ribeiro, Marcus R. de Araújo, and Fernando Magno Quintão Pereira (Federal University of Minas Gerais, Brazil; Federal University of Ouro Preto, Brazil) Incomplete source code naturally emerges in software development: during the design phase, while evolving, testing and analyzing programs. Therefore, the ability to understand partial programs is a valuable asset. However, this problem is still unsolved in the C programming language. Difficulties stem from the fact that parsing C requires, not only syntax, but also semantic information. Furthermore, inferring types so that they respect C's type system is a challenging task. In this paper we present a technique that lets us solve these problems. We provide a unification-based type inference capable of dealing with C intricacies. The ideas we present let us reconstruct partial C programs into complete well-typed ones. Such program reconstruction has several applications: enabling static analysis tools in scenarios where software components may be absent; improving static analysis tools that do not rely on build-specifications; allowing stub-generation and testing tools to work on snippets; and assisting programmers on the extraction of reusable data-structures out of the program parts that use them. Our evaluation is performed on source code from a variety of C libraries such as GNU's Coreutils, GNULib, GNOME's GLib, and GDSL; on implementations from Sedgewick's books; and on snippets from popular open-source projects like CPython, FreeBSD, and Git. ![]() ![]() ![]() ![]() |
|
Riely, James |
![]() Brijesh Dongol, Radha Jagadeesan, and James Riely (Brunel University London, UK; DePaul University, USA) The integration of transactions into hardware relaxed memory architectures is a topic of current research both in industry and academia. In this paper, we provide a general architectural framework for the introduction of transactions into models of relaxed memory in hardware, including the SC, TSO, ARMv8 and PPC models. Our framework incorporates flexible and expressive forms of transaction aborts and execution that have hitherto been in the realm of software transactional memory. In contrast to software transactional memory, we account for the characteristics of relaxed memory as a restricted form of distributed system, without a notion of global time. We prove abstraction theorems to demonstrate that the programmer API matches the intuitions and expectations about transactions. ![]() ![]() |
|
Rinetzky, Noam |
![]() Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar (Tel Aviv University, Israel; VMware, USA; Stanford University, USA) Callbacks are essential in many programming environments, but drastically complicate program understanding and reasoning because they allow external objects to mutate an object's local state in unexpected ways, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum employed callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and statically checking if an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contract executions, excluding those of the DAO or of contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state. ![]() ![]() ![]() |
|
Rioux, Nick |
![]() William J. Bowman, Youyou Cong, Nick Rioux, and Amal Ahmed (Northeastern University, USA; Ochanomizu University, Japan) Dependently typed languages such as Coq are used to specify and prove functional correctness of source programs, but what we ultimately need are guarantees about correctness of compiled code. By preserving dependent types through each compiler pass, we could preserve source-level specifications and correctness proofs into the generated target-language programs. Unfortunately, type-preserving compilation of dependent types is hard. In 2002, Barthe and Uustalu showed that type-preserving CPS is not possible for languages such as Coq. Specifically, they showed that for strong dependent pairs (Σ types), the standard typed call-by-name CPS is not type preserving. They further proved that for dependent case analysis on sums, a class of typed CPS translations—including the standard translation—is not possible. In 2016, Morrisett noticed a similar problem with the standard call-by-value CPS translation for dependent functions (Π types). In essence, the problem is that the standard typed CPS translation by double-negation, in which computations are assigned types of the form (A → ⊥) → ⊥, disrupts the term/type equivalence that is used during type checking in a dependently typed language. In this paper, we prove that type-preserving CPS translation for dependently typed languages is not not possible. We develop both call-by-name and call-by-value CPS translations from the Calculus of Constructions with both Π and Σ types (CC) to a dependently typed target language, and prove type preservation and compiler correctness of each translation. Our target language is CC extended with an additional equivalence rule and an additional typing rule, which we prove consistent by giving a model in the extensional Calculus of Constructions. Our key observation is that we can use a CPS translation that employs answer-type polymorphism, where CPS-translated computations have type ∀ α. (A → α) → α. This type justifies, by a free theorem, the new equality rule in our target language and allows us to recover the term/type equivalences that CPS translation disrupts. Finally, we conjecture that our translation extends to dependent case analysis on sums, despite the impossibility result, and provide a proof sketch. ![]() ![]() ![]() |
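The free theorem in question is the standard parametricity fact that, in a parametric model, answer-type-polymorphic continuation types are isomorphic to the underlying type:

```latex
\forall \alpha.\, (A \to \alpha) \to \alpha \;\cong\; A .
```

The double-negation type (A → ⊥) → ⊥ admits no such inverse in general, which is why the fixed-answer-type translation destroys the term/type equivalences that the answer-type-polymorphic one can restore.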
|
Rompf, Tiark |
![]() Nada Amin and Tiark Rompf (University of Cambridge, UK; Purdue University, USA) Given a tower of interpreters, i.e., a sequence of multiple interpreters interpreting one another as input programs, we aim to collapse this tower into a compiler that removes all interpretive overhead and runs in a single pass. In the real world, a use case might be Python code executed by an x86 runtime, on a CPU emulated in a JavaScript VM, running on an ARM CPU. Collapsing such a tower can not only exponentially improve runtime performance, but also enable the use of base-language tools for interpreted programs, e.g., for analysis and verification. In this paper, we lay the foundations in an idealized but realistic setting. We present a multi-level lambda calculus that features staging constructs and stage polymorphism: based on runtime parameters, an evaluator either executes source code (thereby acting as an interpreter) or generates code (thereby acting as a compiler). We identify stage polymorphism, a programming model from the domain of high-performance program generators, as the key mechanism to make such interpreters compose in a collapsible way. We present Pink, a meta-circular Lisp-like evaluator on top of this calculus, and demonstrate that we can collapse arbitrarily many levels of self-interpretation, including levels with semantic modifications. We discuss several examples: compiling regular expressions through an interpreter to base code, building program transformers from modified interpreters, and others. We develop these ideas further to include reflection and reification, culminating in Purple, a reflective language inspired by Brown, Blond, and Black, which realizes a conceptually infinite tower, where every aspect of the semantics can change dynamically. Addressing an open challenge, we show how user programs can be compiled and recompiled under user-modified semantics. ![]() ![]() ![]() ![]() |
|
Rouvoet, Arjen |
![]() Casper Bach Poulsen (Delft University of Technology, Netherlands; Portland State University, USA) A definitional interpreter defines the semantics of an object language in terms of the (well-known) semantics of a host language, enabling understanding and validation of the semantics through execution. Combining a definitional interpreter with a separate type system requires a separate type safety proof. An alternative approach, at least for pure object languages, is to use a dependently-typed language to encode the object language type system in the definition of the abstract syntax. Using such intrinsically-typed abstract syntax definitions allows the host language type checker to verify automatically that the interpreter satisfies type safety. Does this approach scale to larger and more realistic object languages, and in particular to languages with mutable state and objects? In this paper, we describe and demonstrate techniques and libraries in Agda that successfully scale up intrinsically-typed definitional interpreters to handle rich object languages with non-trivial binding structures and mutable state. While the resulting interpreters are certainly more complex than the simply-typed λ-calculus interpreter we start with, we claim that they still meet the goals of being concise, comprehensible, and executable, while guaranteeing type safety for more elaborate object languages. We make the following contributions: (1) A dependent-passing style technique for hiding the weakening of indexed values as they propagate through monadic code. (2) An Agda library for programming with scope graphs and frames, which provides a uniform approach to dealing with name binding in intrinsically-typed interpreters. (3) Case studies of intrinsically-typed definitional interpreters for the simply-typed λ-calculus with references (STLC+Ref) and for a large subset of Middleweight Java (MJ). ![]() ![]() ![]() |
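The core idea of intrinsic typing can also be sketched outside Agda. The following hypothetical Haskell/GADT rendering indexes terms by object-language types, so the host type checker rules out ill-typed terms and the evaluator is type-safe by construction; the paper's actual development additionally covers mutable state and scope-graph-based binding, which this sketch omits.

```haskell
{-# LANGUAGE GADTs, DataKinds, TypeFamilies, TypeOperators, KindSignatures #-}

import Data.Kind (Type)

data Ty = TInt | TFun Ty Ty

-- Typed de Bruijn indices into a typing context.
data Ix :: [Ty] -> Ty -> Type where
  Z :: Ix (t ': ctx) t
  S :: Ix ctx t -> Ix (u ': ctx) t

-- Intrinsically-typed terms: only well-typed syntax is representable.
data Tm :: [Ty] -> Ty -> Type where
  Lit :: Int -> Tm ctx 'TInt
  Add :: Tm ctx 'TInt -> Tm ctx 'TInt -> Tm ctx 'TInt
  Var :: Ix ctx t -> Tm ctx t
  Lam :: Tm (a ': ctx) b -> Tm ctx ('TFun a b)
  App :: Tm ctx ('TFun a b) -> Tm ctx a -> Tm ctx b

-- Object-language types interpreted as host-language types.
type family Sem (t :: Ty) :: Type where
  Sem 'TInt       = Int
  Sem ('TFun a b) = Sem a -> Sem b

data Env :: [Ty] -> Type where
  Nil  :: Env '[]
  Cons :: Sem t -> Env ctx -> Env (t ': ctx)

lookupIx :: Ix ctx t -> Env ctx -> Sem t
lookupIx Z     (Cons v _)  = v
lookupIx (S i) (Cons _ vs) = lookupIx i vs

-- Type-safe by construction: no runtime type errors are possible.
eval :: Env ctx -> Tm ctx t -> Sem t
eval _   (Lit n)   = n
eval env (Add a b) = eval env a + eval env b
eval env (Var i)   = lookupIx i env
eval env (Lam b)   = \v -> eval (Cons v env) b
eval env (App f a) = eval env f (eval env a)

main :: IO ()
main = print (eval Nil (App (Lam (Add (Var Z) (Lit 1))) (Lit 41)))  -- 42
```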
|
Rümmer, Philipp |
![]() Lukáš Holík (Brno University of Technology, Czechia; University of Oxford, UK; Uppsala University, Sweden) String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting, and automatic test-case generation. A popular string analysis technique includes symbolic executions, which at their core use constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature. ![]() ![]() ![]() ![]() |
|
Sadayappan, P. |
![]() Wenlei Bao, Sriram Krishnamoorthy, Louis-Noel Pouchet, and P. Sadayappan (Ohio State University, USA; Pacific Northwest National Laboratory, USA; Colorado State University, USA) Optimizing compilers implement program transformation strategies aimed at reducing data movement to or from main memory by exploiting the data-cache hierarchy. However, instead of attempting to minimize the number of cache misses, very approximate cost models are used, due to the lack of precise compile-time models for misses for hierarchical caches. The current state of practice for cache miss analysis is based on accurate simulation. However, simulation requires time proportional to the dataset/problem size, as well as the number of distinct cache configurations of interest to be evaluated. This paper takes a fundamentally different approach, by focusing on polyhedral programs with static control flow. Instead of relying on costly simulation, a closed-form solution for modeling of misses in a set associative cache hierarchy is developed. This solution can enable program transformation choice at compile time to optimize cache misses. A tool implementing the approach has been developed and used for validation of the framework. ![]() ![]() |
|
Sagiv, Mooly |
![]() Oded Padon, Jochen Hoenicke, Giuliano Losa, Andreas Podelski (Tel Aviv University, Israel; University of Freiburg, Germany; University of California at Los Angeles, USA) We develop a new technique for verifying temporal properties of infinite-state (distributed) systems. The main idea is to reduce the temporal verification problem to the problem of verifying the safety of infinite-state systems expressed in first-order logic. This allows us to leverage existing techniques for safety verification to verify temporal properties of interesting distributed protocols, including some that have not been mechanically verified before. We model infinite-state systems using first-order logic, and use first-order temporal logic (FO-LTL) to specify temporal properties. This general formalism allows us to naturally model distributed systems, while supporting both unbounded-parallelism (where the system is allowed to dynamically create processes), and infinite-state per process. The traditional approach for verifying temporal properties of infinite-state systems employs well-founded relations (e.g. using linear arithmetic ranking functions). In contrast, our approach is based on the idea of fair cycle detection. In finite-state systems, temporal verification can always be reduced to fair cycle detection (a system contains a fair cycle if it revisits a state after satisfying all fairness constraints). However, with both infinitely many states and infinitely many fairness constraints, a straightforward reduction to fair cycle detection is unsound. To regain soundness, we augment the infinite-state transition system by a dynamically computed finite set that exploits the locality of transitions. This set lets us define a form of fair cycle detection that is sound in the presence of both infinitely many states, and infinitely many fairness constraints. Our approach allows a new style of temporal verification that does not explicitly involve ranking functions. This fits well with pure first-order verification which does not explicitly reason about numerical values. In particular, it can be used with effectively propositional first-order logic (EPR), in which case checking verification conditions is decidable. We applied our technique to verify temporal properties of several interesting protocols. To the best of our knowledge, we have obtained the first mechanized liveness proof for both TLB Shootdown, and Stoppable Paxos. ![]() ![]() ![]() ![]() ![]() Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar (Tel Aviv University, Israel; VMware, USA; Stanford University, USA) Callbacks are essential in many programming environments, but drastically complicate program understanding and reasoning because they allow an object's local state to be mutated by external objects in unexpected fashions, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum employed callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and statically checking if an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. 
By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contract executions, excluding those of the DAO or of contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state. ![]() ![]() ![]() |
|
Sagonas, Konstantinos |
![]() Michalis Kokologiannakis, Ori Lahav (National Technical University of Athens, Greece; Tel Aviv University, Israel; Uppsala University, Sweden; MPI-SWS, Germany) We present a stateless model checking algorithm for verifying concurrent programs running under RC11, a repaired version of the C/C++11 memory model without dependency cycles. Unlike most previous approaches, which enumerate thread interleavings up to some partial order reduction improvements, our approach works directly on execution graphs and (in the absence of RMW instructions and SC atomics) avoids redundant exploration by construction. We have implemented a model checker, called RCMC, based on this approach and applied it to a number of challenging concurrent programs. Our experiments confirm that RCMC is significantly faster, scales better than other model checking tools, and is also more resilient to small changes in the benchmarks. ![]() ![]() ![]() ![]() |
|
Sarkar, Susmit |
![]() Christopher Pulte (University of Cambridge, UK; ARM, UK; University of St. Andrews, UK) ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes could become visible to some other threads before becoming visible to all — but this has not been exploited in production implementations, the corresponding potential hardware optimisations are thought to have insufficient benefits in the ARM context, and it gives rise to subtle complications when combined with other ARMv8 features. The ARMv8 architecture has therefore been revised: it now has a multicopy-atomic model. It has also been simplified in other respects, including more straightforward notions of dependency, and the architecture now includes a formal concurrency model. In this paper we detail these changes and discuss their motivation. We define two formal concurrency models: an operational one, simplifying the Flowing model of Flur et al., and the axiomatic model of the revised ARMv8 specification. The models were developed by an academic group and by ARM staff, respectively, and this extended collaboration partly motivated the above changes. We prove the equivalence of the two models. The operational model is integrated into an executable exploration tool with new web interface, demonstrated by exhaustively checking the possible behaviours of a loop-unrolled version of a Linux kernel lock implementation, a previously known bug due to unprevented speculation, and a fixed version. ![]() ![]() ![]() |
|
Satake, Yuki |
![]() Hiroshi Unno (University of Tsukuba, Japan; Waseda University, Japan) This paper considers verification of non-deterministic higher-order functional programs. Our contribution is a novel type system in which the types are used to express and verify (conditional) safety, termination, non-safety, and non-termination properties in the presence of ∀-∃ branching behavior due to non-determinism. For instance, the judgement ⊢ e:{u: int | φ(u) }∀∀ says that every evaluation of e either diverges or reduces to some integer u satisfying φ(u), whereas ⊢ e:{u: int | ψ(u) }∃∀ says that there exists an evaluation of e that either diverges or reduces to some integer u satisfying ψ(u). Note that the former is a safety property whereas the latter is a counterexample to a (conditional) termination property. Following the recent work on type-based verification methods for deterministic higher-order functional programs, we formalize the idea on the foundation of dependent refinement types, thereby allowing the type system to express and verify rich properties involving program values, branching behaviors, and the combination thereof. Our type system is able to seamlessly combine deductions of both universal and existential facts within a unified framework, paving the way for an exciting opportunity for new type-based verification methods that combine both universal and existential reasoning. For example, our system can prove the existence of a path violating some safety property from a proof of termination that uses a well-foundedness termination argument. We prove that our type system is sound and relatively complete, and further, thanks to having both modes of non-determinism, we show that our types are closed under complement. ![]() ![]() |
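To make the two readings concrete, consider a hypothetical program e = if * then 0 else 1, where * is a non-deterministic boolean. Using only the informal readings of the judgements given above:

```latex
% Hypothetical worked example, using the readings of the judgements above.
\begin{align*}
  &\vdash e : \{u : \mathsf{int} \mid 0 \le u \le 1\}^{\forall\forall}
    && \text{every evaluation diverges or returns $0$ or $1$}\\
  &\vdash e : \{u : \mathsf{int} \mid u = 0\}^{\exists\forall}
    && \text{some evaluation diverges or returns $0$}\\
  &\not\vdash e : \{u : \mathsf{int} \mid u = 0\}^{\forall\forall}
    && \text{the branch returning $1$ is a counterexample}
\end{align*}
```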
|
Scherer, Gabriel |
![]() Olivier Flückiger (Northeastern University, USA; Inria, France; Czech Technical University, Czechia) High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition. ![]() ![]() |
|
Ścibior, Adam |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
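The flavour of inference built from composable representations can be sketched with a much simpler, hypothetical Haskell program: a weighted sampler as an intermediate representation, with conditioning as weight multiplication, giving likelihood weighting. The paper's development (quasi-Borel semantics, SMC, MCMC) is far richer; nothing below is taken from its implementation.

```haskell
-- Hypothetical sketch; requires the 'random' package.
import System.Random (randomRIO)

-- A probabilistic program is represented as a weighted sampler:
-- running it yields a draw together with an importance weight.
type Weighted a = IO (a, Double)

uniform :: Double -> Double -> Weighted Double
uniform lo hi = do { x <- randomRIO (lo, hi); return (x, 1) }

-- Conditioning multiplies the weight (a soft constraint / likelihood).
score :: Double -> Weighted ()
score w = return ((), w)

-- Sequencing composes representations and multiplies weights.
bindW :: Weighted a -> (a -> Weighted b) -> Weighted b
bindW m k = do
  (x, w1) <- m
  (y, w2) <- k x
  return (y, w1 * w2)

-- A toy model: draw x uniformly, condition on x being near 0.7.
model :: Weighted Double
model = uniform 0 1 `bindW` \x ->
        score (exp (negate ((x - 0.7) ^ 2 / 0.01))) `bindW` \_ ->
        return (x, 1)

-- Likelihood-weighted (self-normalised) posterior mean of x.
posteriorMean :: Int -> IO Double
posteriorMean n = do
  samples <- mapM (const model) [1 .. n]
  let num = sum [x * w | (x, w) <- samples]
      den = sum [w | (_, w) <- samples]
  return (num / den)

main :: IO ()
main = posteriorMean 10000 >>= print
```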
|
Scott, Ryan G. |
![]() Niki Vazou, Anish Tondwalkar, Vikraman Choudhury (University of Maryland, USA; University of California at San Diego, USA; Indiana University, USA; University of Edinburgh, UK; Input Output HK, UK) We introduce Refinement Reflection, a new framework for building SMT-based deductive verifiers. The key idea is to reflect the code implementing a user-defined function into the function’s (output) refinement type. As a consequence, at uses of the function, the function definition is instantiated in the SMT logic in a precise fashion that permits decidable verification. Reflection allows the user to write equational proofs of programs just by writing other programs using pattern-matching and recursion to perform case-splitting and induction. Thus, via the propositions-as-types principle, we show that reflection permits the specification of arbitrary functional correctness properties. Finally, we introduce a proof-search algorithm called Proof by Logical Evaluation that uses techniques from model checking and abstract interpretation, to completely automate equational reasoning. We have implemented reflection in Liquid Haskell and used it to verify that the widely used instances of the Monoid, Applicative, Functor, and Monad typeclasses actually satisfy key algebraic laws required to make the clients safe, and have used reflection to build the first library that actually verifies assumptions about associativity and ordering that are crucial for safe deterministic parallelism. ![]() ![]() ![]() |
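For flavour, here is a rough sketch of what reflection-based equational proofs look like in Liquid Haskell. The pragma and combinator names (`reflect`, `ProofCombinators`, `===`, `?`, `*** QED`) are recalled from the Liquid Haskell documentation and should be treated as approximate rather than as this paper's exact development.

```haskell
{-@ LIQUID "--reflection" @-}
module AppendLaws where

import Language.Haskell.Liquid.ProofCombinators

-- Reflect the definition of 'app' into its refinement type, so the
-- SMT solver can unfold it precisely at call sites.
{-@ reflect app @-}
app :: [a] -> [a] -> [a]
app []     ys = ys
app (x:xs) ys = x : app xs ys

-- An equational proof, written as an ordinary recursive program:
-- [] is a right identity for app.
{-@ rightId :: xs:[a] -> { app xs [] == xs } @-}
rightId :: [a] -> Proof
rightId []     = app [] [] === [] *** QED
rightId (x:xs) = app (x:xs) []
             === x : app xs []
               ? rightId xs
             === x : xs
             *** QED
```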
|
Sergey, Ilya |
![]() Ilya Sergey, James R. Wilcox, and Zachary Tatlock (University College London, UK; University of Washington, USA) Distributed systems play a crucial role in modern infrastructure, but are notoriously difficult to implement correctly. This difficulty arises from two main challenges: (a) correctly implementing core system components (e.g., two-phase commit), so all their internal invariants hold, and (b) correctly composing standalone system components into functioning trustworthy applications (e.g., persistent storage built on top of a two-phase commit instance). Recent work has developed several approaches for addressing (a) by means of mechanically verifying implementations of core distributed components, but no methodology exists to address (b) by composing such verified components into larger verified applications. As a result, expensive verification efforts for key system components are not easily reusable, which hinders further verification efforts. In this paper, we present Disel, the first framework for implementation and compositional verification of distributed systems and their clients, all within the mechanized, foundational context of the Coq proof assistant. In Disel, users implement distributed systems using a domain specific language shallowly embedded in Coq and providing both high-level programming constructs as well as low-level communication primitives. Components of composite systems are specified in Disel as protocols, which capture system-specific logic and disentangle system definitions from implementation details. By virtue of Disel's dependent type system, well-typed implementations always satisfy their protocols' invariants and never go wrong, allowing users to verify system implementations interactively using Disel's Hoare-style program logic, which extends state-of-the-art techniques for concurrency verification to the distributed setting. By virtue of the substitution principle and frame rule provided by Disel's logic, system components can be composed leading to modular, reusable verified distributed systems. We describe Disel, illustrate its use with a series of examples, outline its logic and metatheory, and report on our experience using it as a framework for implementing, specifying, and verifying distributed systems. ![]() ![]() ![]() ![]() |
|
Sewell, Peter |
![]() Christopher Pulte (University of Cambridge, UK; ARM, UK; University of St. Andrews, UK) ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes could become visible to some other threads before becoming visible to all — but this has not been exploited in production implementations, the corresponding potential hardware optimisations are thought to have insufficient benefits in the ARM context, and it gives rise to subtle complications when combined with other ARMv8 features. The ARMv8 architecture has therefore been revised: it now has a multicopy-atomic model. It has also been simplified in other respects, including more straightforward notions of dependency, and the architecture now includes a formal concurrency model. In this paper we detail these changes and discuss their motivation. We define two formal concurrency models: an operational one, simplifying the Flowing model of Flur et al., and the axiomatic model of the revised ARMv8 specification. The models were developed by an academic group and by ARM staff, respectively, and this extended collaboration partly motivated the above changes. We prove the equivalence of the two models. The operational model is integrated into an executable exploration tool with new web interface, demonstrated by exhaustively checking the possible behaviours of a loop-unrolled version of a Linux kernel lock implementation, a previously known bug due to unprevented speculation, and a fixed version. ![]() ![]() ![]() |
|
Shaikhha, Amir |
![]() Lionel Parreaux (EPFL, Switzerland; University of Pennsylvania, USA) Metaprograms are programs that manipulate (generate, analyze and evaluate) other programs. These tasks are greatly facilitated by quasiquotation, a technique to construct and deconstruct program fragments using quoted code templates expressed in the syntax of the manipulated language. We argue that two main flavors of quasiquotes have existed so far: Lisp-style quasiquotes, which can both construct and deconstruct programs but may produce code that contains type mismatches and unbound variables; and MetaML-style quasiquotes, which rely on static typing to prevent these errors, but can only construct programs. In this paper, we show how to combine the advantages of both flavors into a unified framework: we allow the construction, deconstruction and evaluation of program fragments while ensuring that generated programs are well-typed and well-scoped, a combination unseen in previous work. We formalize our approach as λ{}, a multi-stage calculus with code pattern matching and rewriting, and prove its type safety. We also present its realization in Squid, a metaprogramming framework for Scala, leveraging Scala’s expressive type system. To demonstrate the usefulness of our approach, we introduce speculative rewrite rules, a novel code transformation technique that makes decisive use of these capabilities, and we outline how it simplifies the design of some crucial query compiler optimizations. ![]() ![]() ![]() |
|
Sharma, Rahul |
![]() Wonyeol Lee (Stanford University, USA; Microsoft Research, India) Industry standard implementations of math.h claim (often without formal proof) tight bounds on floating-point errors. We demonstrate a novel static analysis that proves these bounds and verifies the correctness of these implementations. Our key insight is a reduction of this verification task to a set of mathematical optimization problems that can be solved by off-the-shelf computer algebra systems. We use this analysis to prove the correctness of implementations in Intel's math library automatically. Prior to this work, these implementations could only be verified with significant manual effort. ![]() ![]() |
|
Shasha, Dennis |
![]() Siddharth Krishna, Dennis Shasha, and Thomas Wies (New York University, USA) Concurrent separation logics have helped to significantly simplify correctness proofs for concurrent data structures. However, a recurring problem in such proofs is that data structure abstractions that work well in the sequential setting are much harder to reason about in a concurrent setting due to complex sharing and overlays. To solve this problem, we propose a novel approach to abstracting regions in the heap by encoding the data structure invariant into a local condition on each individual node. This condition may depend on a quantity associated with the node that is computed as a fixpoint over the entire heap graph. We refer to this quantity as a flow. Flows can encode both structural properties of the heap (e.g. the reachable nodes from the root form a tree) as well as data invariants (e.g. sortedness). We then introduce the notion of a flow interface, which expresses the relies and guarantees that a heap region imposes on its context to maintain the local flow invariant with respect to the global heap. Our main technical result is that this notion leads to a new semantic model of separation logic. In this model, flow interfaces provide a general abstraction mechanism for describing complex data structures. This abstraction mechanism admits proof rules that generalize over a wide variety of data structures. To demonstrate the versatility of our approach, we show how to extend the logic RGSep with flow interfaces. We have used this new logic to prove linearizability and memory safety of nontrivial concurrent data structures. In particular, we obtain parametric linearizability proofs for concurrent dictionary algorithms that abstract from the details of the underlying data structure representation. These proofs cannot be easily expressed using the abstraction mechanisms provided by existing separation logics. ![]() ![]() |
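Schematically (notation adapted here, not the paper's exact definitions), a flow is a solution of a fixpoint equation over the heap graph, and the data structure invariant becomes a per-node condition on that quantity:

```latex
% Schematic form of the flow fixpoint equation (adapted notation).
\[
  \mathit{flow}(n) \;=\; \mathit{in}(n) \;+\;
  \sum_{n' \rightarrow n} \mathit{edge}(n', n)\bigl(\mathit{flow}(n')\bigr)
\]
% The inflow in(n) is supplied by the region's interface; edge(n',n)
% describes how the quantity propagates along the edge from n' to n.
% For example, a path-counting flow (in(root) = 1, edges propagate the
% count) makes "flow(n) = 1 for every reachable node" a local condition
% expressing that the region reachable from the root forms a tree.
```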
|
Shoham, Sharon |
![]() Oded Padon, Jochen Hoenicke, Giuliano Losa, Andreas Podelski (Tel Aviv University, Israel; University of Freiburg, Germany; University of California at Los Angeles, USA) We develop a new technique for verifying temporal properties of infinite-state (distributed) systems. The main idea is to reduce the temporal verification problem to the problem of verifying the safety of infinite-state systems expressed in first-order logic. This allows us to leverage existing techniques for safety verification to verify temporal properties of interesting distributed protocols, including some that have not been mechanically verified before. We model infinite-state systems using first-order logic, and use first-order temporal logic (FO-LTL) to specify temporal properties. This general formalism allows us to naturally model distributed systems, while supporting both unbounded-parallelism (where the system is allowed to dynamically create processes), and infinite-state per process. The traditional approach for verifying temporal properties of infinite-state systems employs well-founded relations (e.g. using linear arithmetic ranking functions). In contrast, our approach is based on the idea of fair cycle detection. In finite-state systems, temporal verification can always be reduced to fair cycle detection (a system contains a fair cycle if it revisits a state after satisfying all fairness constraints). However, with both infinitely many states and infinitely many fairness constraints, a straightforward reduction to fair cycle detection is unsound. To regain soundness, we augment the infinite-state transition system by a dynamically computed finite set that exploits the locality of transitions. This set lets us define a form of fair cycle detection that is sound in the presence of both infinitely many states, and infinitely many fairness constraints. Our approach allows a new style of temporal verification that does not explicitly involve ranking functions. This fits well with pure first-order verification which does not explicitly reason about numerical values. In particular, it can be used with effectively propositional first-order logic (EPR), in which case checking verification conditions is decidable. We applied our technique to verify temporal properties of several interesting protocols. To the best of our knowledge, we have obtained the first mechanized liveness proof for both TLB Shootdown, and Stoppable Paxos. ![]() ![]() ![]() ![]() |
|
Sieczkowski, Filip |
![]() Dariusz Biernacki, Maciej Piróg, Piotr Polesiuk, and Filip Sieczkowski (University of Wrocław, Poland) Algebraic effects and handlers have received a lot of attention recently, both from the theoretical point of view and in practical language design. This stems from the fact that algebraic effects give the programmer unprecedented freedom to define, combine, and interpret computational effects. This plenty-of-rope, however, demands not only a deep understanding of the underlying semantics, but also access to practical means of reasoning about effectful code, including correctness and program equivalence. In this paper we tackle this problem by constructing a step-indexed relational interpretation of a call-by-value calculus with algebraic effect handlers and a row-based polymorphic type-and-effect system. Our calculus, while striving for simplicity, enjoys desirable theoretical properties, and is close to the cores of programming languages with algebraic effects used in the wild, while the logical relation we build for it can be used to reason about non-trivial properties, such as contextual equivalence and contextual approximation of programs. Our development has been fully formalised in the Coq proof assistant. ![]() ![]() ![]() ![]() |
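To fix intuitions (this is not the paper's calculus, which has row-polymorphic effect types), the standard free-monad encoding below shows a single algebraic operation and two handlers giving it different interpretations; it is equivalences between such effectful programs that the logical relation is designed to establish.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- A small, standard free-monad rendering of an algebraic effect with
-- handlers, purely to make the objects of study concrete.

data Free f a = Pure a | Op (f (Free f a)) deriving Functor

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure f <*> x = fmap f x
  Op m   <*> x = Op (fmap (<*> x) m)

instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Op m   >>= k = Op (fmap (>>= k) m)

-- One algebraic operation: nondeterministic choice.
data Choice k = Choose (Bool -> k) deriving Functor

choose :: Free Choice Bool
choose = Op (Choose Pure)

-- Two handlers give two interpretations of the same program.
allResults :: Free Choice a -> [a]
allResults (Pure a)        = [a]
allResults (Op (Choose k)) = allResults (k True) ++ allResults (k False)

firstResult :: Free Choice a -> a
firstResult (Pure a)        = a
firstResult (Op (Choose k)) = firstResult (k True)

prog :: Free Choice Int
prog = do
  b <- choose
  c <- choose
  return (fromEnum b * 2 + fromEnum c)

main :: IO ()
main = do
  print (allResults prog)   -- [3,2,1,0]
  print (firstResult prog)  -- 3
```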
|
Singh, Gagandeep |
![]() Gagandeep Singh, Markus Püschel, and Martin Vechev (ETH Zurich, Switzerland) Numerical abstract domains such as Polyhedra, Octahedron, Octagon, Interval, and others are an essential component of static program analysis. The choice of domain offers a performance/precision tradeoff ranging from cheap and imprecise (Interval) to expensive and precise (Polyhedra). Recently, significant speedups were achieved for Octagon and Polyhedra by manually decomposing their transformers to work with the Cartesian product of projections associated with partitions of the variable set. While practically useful, this manual process is time consuming, error-prone, and has to be applied from scratch for every domain. In this paper, we present a generic approach for decomposing the transformers of sub-polyhedra domains along with conditions for checking whether the decomposed transformers lose precision with respect to the original transformers. These conditions are satisfied by most practical transformers, thus our approach is suitable for increasing the performance of these transformers without compromising their precision. Furthermore, our approach is ``black box:'' it does not require changes to the internals of the original non-decomposed transformers or additional manual effort per domain. We implemented our approach and applied it to the domains of Zones, Octagon, and Polyhedra. We then compared the performance of the decomposed transformers obtained with our generic method versus the state of the art: the (non-decomposed) PPL for Polyhedra and the much faster ELINA (which uses manual decomposition) for Polyhedra and Octagon. Against ELINA we demonstrate finer partitions and an associated speedup of about 2x on average. Our results indicate that the general construction presented in this work is a viable method for improving the performance of sub-polyhedra domains. It enables designers of abstract domains to benefit from decomposition without re-writing all of their transformers from scratch as required by prior work. ![]() ![]() ![]() ![]() |
|
Singh, Rishabh |
![]() Xinyu Wang, Isil Dillig (University of Texas at Austin, USA; Microsoft Research, USA) We present a new approach to example-guided program synthesis based on counterexample-guided abstraction refinement. Our method uses the abstract semantics of the underlying DSL to find a program P whose abstract behavior satisfies the examples. However, since program P may be spurious with respect to the concrete semantics, our approach iteratively refines the abstraction until we either find a program that satisfies the examples or prove that no such DSL program exists. Because many programs have the same input-output behavior in terms of their abstract semantics, this synthesis methodology significantly reduces the search space compared to existing techniques that use purely concrete semantics. While synthesis using abstraction refinement (SYNGAR) could be implemented in different settings, we propose a refinement-based synthesis algorithm that uses abstract finite tree automata (AFTA). Our technique uses a coarse initial program abstraction to construct an initial AFTA, which is iteratively refined by constructing a proof of incorrectness of any spurious program. In addition to ruling out the spurious program accepted by the previous AFTA, proofs of incorrectness are also useful for ruling out many other spurious programs. We implement these ideas in a framework called Blaze, which can be instantiated in different domains by providing a suitable DSL and its corresponding concrete and abstract semantics. We have used the Blaze framework to build synthesizers for string and matrix transformations, and we compare Blaze with existing techniques. Our results for the string domain show that Blaze compares favorably with FlashFill, a domain-specific synthesizer that is now deployed in Microsoft PowerShell. In the context of matrix manipulations, we compare Blaze against Prose, a state-of-the-art general-purpose VSA-based synthesizer, and show that Blaze results in a 90x speed-up over Prose. In both application domains, Blaze also consistently improves upon the performance of two other existing techniques by at least an order of magnitude. ![]() ![]() ![]() Jeevana Priya Inala and Rishabh Singh (Massachusetts Institute of Technology, USA; Microsoft Research, USA) Data integration between web sources and relational data is a key challenge faced by data scientists and spreadsheet users. There are two main challenges in programmatically joining web data with relational data. First, most websites do not expose a direct interface to obtain tabular data, so the user needs to formulate a logic to get to different webpages for each input row in the relational table. Second, after reaching the desired webpage, the user needs to write complex scripts to extract the relevant data, which is often conditioned on the input data. Since many data scientists and end-users come from diverse backgrounds, writing such complex regular-expression based logical scripts to perform data integration tasks is unfortunately often beyond their programming expertise. We present WebRelate, a system that allows users to join semi-structured web data with relational data in spreadsheets using input-output examples. WebRelate decomposes the web data integration task into two sub-tasks of i) URL learning and ii) input-dependent web extraction. 
We introduce a novel synthesis paradigm called "Output-constrained Programming By Examples", which allows us to use the finite set of possible outputs for the new inputs to efficiently constrain the search in the synthesis algorithm. We instantiate this paradigm for the two sub-tasks in WebRelate. The first sub-task generates the URLs for the webpages containing the desired data for all rows in the relational table. WebRelate achieves this by learning a string transformation program using a few example URLs. The second sub-task uses examples of desired data to be extracted from the corresponding webpages and learns a program to extract the data for the other rows. We design expressive domain-specific languages for URL generation and web data extraction, and present efficient synthesis algorithms for learning programs in these DSLs from few input-output examples. We evaluate WebRelate on 88 real-world web data integration tasks taken from online help forums and Excel product team, and show that WebRelate can learn the desired programs within few seconds using only 1 example for the majority of the tasks. ![]() ![]() ![]() |
|
Sinha, Nishant |
![]() Marek Chalupa, Krishnendu Chatterjee (Masaryk University, Czechia; IST Austria, Austria; Kena Labs, India; IIT Bombay, India) We present a new dynamic partial-order reduction method for stateless model checking of concurrent programs. A common approach for exploring program behaviors relies on enumerating the traces of the program, without storing the visited states (aka stateless exploration). As the number of distinct traces grows exponentially, dynamic partial-order reduction (DPOR) techniques have been successfully used to partition the space of traces into equivalence classes (Mazurkiewicz partitioning), with the goal of exploring only few representative traces from each class. We introduce a new equivalence on traces under sequential consistency semantics, which we call the observation equivalence. Two traces are observationally equivalent if every read event observes the same write event in both traces. While the traditional Mazurkiewicz equivalence is control-centric, our new definition is data-centric. We show that our observation equivalence is coarser than the Mazurkiewicz equivalence, and in many cases even exponentially coarser. We devise a DPOR exploration of the trace space, called data-centric DPOR, based on the observation equivalence. For acyclic architectures, our algorithm is guaranteed to explore exactly one representative trace from each observation class, while spending polynomial time per class. Hence, our algorithm is optimal wrt the observation equivalence, and in several cases explores exponentially fewer traces than any enumerative method based on the Mazurkiewicz equivalence. For cyclic architectures, we consider an equivalence between traces which is finer than the observation equivalence; but coarser than the Mazurkiewicz equivalence, and in some cases is exponentially coarser. Our data-centric DPOR algorithm remains optimal under this trace equivalence. Finally, we perform a basic experimental comparison between the existing Mazurkiewicz-based DPOR and our data-centric DPOR on a set of academic benchmarks. Our results show a significant reduction in both running time and the number of explored equivalence classes. ![]() ![]() |
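The data-centric equivalence has a direct executable reading. The following small Haskell sketch (a hypothetical simplification: events are uniquely-identified reads and writes under sequential consistency) computes, for each read, the write it observes, and compares two traces on exactly that data.

```haskell
import Data.List (sort)

-- Events carry a unique identifier so distinct writes and reads to
-- the same location are not conflated. (Hypothetical representation.)
data Event = Write { evId :: Int, evLoc :: String }
           | Read  { evId :: Int, evLoc :: String }
  deriving (Eq, Ord, Show)

type Trace = [Event]

-- For each read, the write it observes: the most recent earlier write
-- to the same location (Nothing means it reads the initial value).
observations :: Trace -> [(Int, Maybe Int)]
observations tr = sort
  [ (evId r, lastWrite (take i tr) (evLoc r))
  | (i, r@Read{}) <- zip [0 ..] tr ]
  where
    lastWrite prefix loc =
      case [evId w | w@Write{} <- prefix, evLoc w == loc] of
        [] -> Nothing
        ws -> Just (last ws)

-- Two traces over the same events are observationally equivalent iff
-- every read observes the same write in both.
observationEquivalent :: Trace -> Trace -> Bool
observationEquivalent t1 t2 = observations t1 == observations t2

main :: IO ()
main =
  let w1 = Write 1 "x"; w2 = Write 2 "y"; r1 = Read 3 "x"; r2 = Read 4 "y"
  in print ( observationEquivalent [w1, w2, r1, r2] [w2, w1, r2, r1]  -- True
           , observationEquivalent [w1, r1, w2, r2] [r1, w1, w2, r2]) -- False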
|
Spiwack, Arnaud |
![]() Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack (University of Gothenburg, Sweden; Tweag I/O, France; Indiana University, USA; Microsoft Research, UK) Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values. To demonstrate the efficacy of our linear type system — both how easy it can be integrated in an existing language implementation and how streamlined it makes it to write programs with linear types — we implemented our type system in ghc, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions. ![]() ![]() ![]() |
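Since the design described above shipped as GHC's LinearTypes extension (GHC 9.0 and later), a small example can be written directly in Haskell: `a %1 -> b` is a function that must consume its argument exactly once, while ordinary arrows remain unrestricted.

```haskell
{-# LANGUAGE LinearTypes #-}
-- Requires GHC >= 9.0 with the LinearTypes extension.

-- Accepted: each component of the pair is consumed exactly once.
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

-- Linear functions still mix with ordinary code: 'apply' takes an
-- unrestricted function argument but uses 'x' linearly.
apply :: (a %1 -> b) -> a %1 -> b
apply f x = f x

-- The following would be rejected by the type checker, because 'x'
-- is used twice rather than linearly:
--
--   dup :: a %1 -> (a, a)
--   dup x = (x, x)

main :: IO ()
main = print (apply swap (1 :: Int, "one"))
```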
|
Staton, Sam |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
|
Stefanesco, Léo |
![]() Amin Timany, Léo Stefanesco, Morten Krogh-Jespersen, and Lars Birkedal (KU Leuven, Belgium; IRIF, France; CNRS, France; University of Paris Diderot, France; Aarhus University, Denmark) We present a logical relations model of a higher-order functional programming language with impredicative polymorphism, recursive types, and a Haskell-style ST monad type with runST. We use our logical relations model to show that runST provides proper encapsulation of state, by showing that effectful computations encapsulated by runST are heap independent. Furthermore, we show that contextual refinements and equivalences that are expected to hold for pure computations do indeed hold in the presence of runST. This is the first time such relational results have been proven for a language with monadic encapsulation of state. We have formalized all the technical development and results in Coq. ![]() ![]() ![]() |
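A standard example of the encapsulation property being proved: `sumST` below uses local mutable state via `runST` yet exposes a pure interface, and the contextual equivalences established in the paper justify treating it as an ordinary pure function.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- runST encapsulates local mutable state behind a pure interface:
-- observationally, sumST is a pure function.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc

-- The rank-2 type of runST (forall s. ST s a -> a) prevents the
-- reference from escaping; e.g. the following would not type-check:
--
--   leak = runST (newSTRef (0 :: Int))

main :: IO ()
main = print (sumST [1 .. 10])  -- 55
```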
|
Strub, Pierre-Yves |
![]() Gilles Barthe (IMDEA Software Institute, Spain; UPMC, France; Inria, France; University College London, UK; École Polytechnique, France) Program sensitivity, also known as Lipschitz continuity, describes how small changes in a program’s input lead to bounded changes in the output. We propose an average notion of program sensitivity for probabilistic programs—expected sensitivity—that averages a distance function over a probabilistic coupling of two output distributions from two similar inputs. By varying the distance, expected sensitivity recovers useful notions of probabilistic function sensitivity, including stability of machine learning algorithms and convergence of Markov chains. Furthermore, expected sensitivity satisfies clean compositional properties and is amenable to formal verification. We develop a relational program logic called EpRHL for proving expected sensitivity properties. Our logic features two key ideas. First, relational pre-conditions and post-conditions are expressed using distances, a real-valued generalization of typical boolean-valued (relational) assertions. Second, judgments are interpreted in terms of expectation coupling, a novel, quantitative generalization of probabilistic couplings which supports compositional reasoning. We demonstrate our logic on examples beyond the reach of prior relational logics. Our main example formalizes uniform stability of the stochastic gradient method. Furthermore, we prove rapid mixing for a probabilistic model of population dynamics. We also extend our logic with a transitivity principle for expectation couplings to capture the path coupling proof technique by Bubley and Dyer, and formalize rapid mixing of the Glauber dynamics from statistical physics. ![]() ![]() ![]() |
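Schematically, and with notation simplified from the paper, expected sensitivity of a probabilistic program P relates a coupling of its two output distributions to the distance between its inputs:

```latex
% Schematic statement only (notation simplified).
% P is expected-sensitive with factor L w.r.t. input distance d_in and
% output distance d_out if for all inputs x_1, x_2 there is a coupling
% mu of the output distributions P(x_1) and P(x_2) with
\[
  \mathbb{E}_{(y_1, y_2) \sim \mu}\,\bigl[\, d_{\mathrm{out}}(y_1, y_2) \,\bigr]
  \;\le\; L \cdot d_{\mathrm{in}}(x_1, x_2).
\]
% Taking d_out to be a loss difference yields algorithmic stability;
% taking it to be a suitable metric on states yields contraction, and
% hence rapid mixing, of a Markov chain.
```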
|
Stucki, Sandro |
![]() Martin Odersky, Olivier Blanvillain, Fengyun Liu, Aggelos Biboudis, Heather Miller, and Sandro Stucki (EPFL, Switzerland; Northeastern University, USA) Understanding a program entails understanding its context; dependencies, configurations and even implementations are all forms of contexts. Modern programming languages and theorem provers offer an array of constructs to define contexts, implicitly. Scala offers implicit parameters which are used pervasively, but which cannot be abstracted over. This paper describes a generalization of implicit parameters to implicit function types, a powerful way to abstract over the context in which some piece of code is run. We provide a formalization based on bidirectional type-checking that closely follows the semantics implemented by the Scala compiler. To demonstrate their range of abstraction capabilities, we present several applications that make use of implicit function types. We show how to encode the builder pattern, tagless interpreters, reader and free monads and we assess the performance of the monadic structures presented. ![]() ![]() ![]() |
|
Swamy, Nikhil |
![]() Danel Ahman, Cédric Fournet (Inria, France; Microsoft Research, UK; ENS Paris, France; Microsoft Research, India; Microsoft Research, USA) We provide a way to ease the verification of programs whose state evolves monotonically. The main idea is that a property witnessed in a prior state can be soundly recalled in the current state, provided (1) state evolves according to a given preorder, and (2) the property is preserved by this preorder. In many scenarios, such monotonic reasoning yields concise modular proofs, saving the need for explicit program invariants. We distill our approach into the monotonic-state monad, a general yet compact interface for Hoare-style reasoning about monotonic state in a dependently typed language. We prove the soundness of the monotonic-state monad and use it as a unified foundation for reasoning about monotonic state in the F⋆ verification system. Based on this foundation, we build libraries for various mutable data structures like monotonic references and apply these libraries at scale to the verification of several distributed applications. ![]() ![]() ![]() ![]() |
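The following hypothetical Haskell sketch approximates the idea in a non-dependently-typed setting: the exported operations only move the state forward along a preorder, so a property that is stable under that preorder, once witnessed, remains valid later. In F⋆ this discipline is enforced by the type system; here it is only a convention of the sketch.

```haskell
-- Hypothetical approximation of the monotonic-state idea.
newtype MonSt s a = MonSt { runMonSt :: s -> (a, s) }

instance Functor (MonSt s) where
  fmap f (MonSt g) = MonSt (\s -> let (a, s') = g s in (f a, s'))
instance Applicative (MonSt s) where
  pure a = MonSt (\s -> (a, s))
  MonSt f <*> MonSt g =
    MonSt (\s -> let (h, s1) = f s; (a, s2) = g s1 in (h a, s2))
instance Monad (MonSt s) where
  MonSt g >>= k = MonSt (\s -> let (a, s') = g s in runMonSt (k a) s')

-- Example instance: a monotonically growing counter (preorder is <=).
-- The only exported update never decreases the state.
increase :: Int -> MonSt Int ()
increase n = MonSt (\s -> ((), max s (s + n)))

-- A stable property: being at least some bound. Once witnessed, it
-- keeps holding in all later states, so no re-check is needed.
witnessAtLeast :: Int -> MonSt Int Bool
witnessAtLeast k = MonSt (\s -> (s >= k, s))

example :: MonSt Int Bool
example = do
  increase 5
  ok <- witnessAtLeast 3   -- witnessed here ...
  increase 2
  return ok                -- ... and still valid here, by monotonicity

main :: IO ()
main = print (fst (runMonSt example 0))  -- True
```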
|
Ta, Quang-Trung |
![]() Quang-Trung Ta, Ton Chanh Le, Siau-Cheng Khoo, and Wei-Ngan Chin (National University of Singapore, Singapore) The symbolic-heap fragment of separation logic has been actively developed and advocated for verifying the memory-safety property of computer programs. At present, one of its biggest challenges is to effectively prove entailments containing inductive heap predicates. These entailments are usually proof obligations generated when verifying programs that manipulate complex data structures like linked lists, trees, or graphs. To assist in proving such entailments, this paper introduces a lemma synthesis framework, which automatically discovers lemmas to serve as eureka steps in the proofs. Mathematical induction and template-based constraint solving are two pillars of our framework. To derive the supporting lemmas for a given entailment, the framework first identifies possible lemma templates from the entailment's heap structure. It then sets up unknown relations among each template's variables and conducts a structural induction proof to generate constraints about these relations. Finally, it solves the constraints to find out actual definitions of the unknown relations, thus discovering the lemmas. We have integrated this framework into a prototype prover and have experimented with it on various entailment benchmarks. The experimental results show that our lemma-synthesis-assisted prover can prove many entailments that could not be handled by existing techniques. This new proposal opens up more opportunities to automatically reason with complex inductive heap predicates. ![]() ![]() ![]() ![]() |
|
Tasson, Christine |
![]() Thomas Ehrhard, Michele Pagani, and Christine Tasson (University of Paris Diderot, France; CNRS, France) We define a notion of stable and measurable map between cones endowed with measurability tests and show that it forms a cpo-enriched cartesian closed category. This category gives a denotational model of an extension of PCF supporting the main primitives of probabilistic functional programming, like continuous and discrete probabilistic distributions, sampling, conditioning and full recursion. We prove the soundness and adequacy of this model with respect to a call-by-name operational semantics and give some examples of its denotations. ![]() ![]() ![]() |
|
Tatlock, Zachary |
![]() Ilya Sergey, James R. Wilcox, and Zachary Tatlock (University College London, UK; University of Washington, USA) Distributed systems play a crucial role in modern infrastructure, but are notoriously difficult to implement correctly. This difficulty arises from two main challenges: (a) correctly implementing core system components (e.g., two-phase commit), so all their internal invariants hold, and (b) correctly composing standalone system components into functioning trustworthy applications (e.g., persistent storage built on top of a two-phase commit instance). Recent work has developed several approaches for addressing (a) by means of mechanically verifying implementations of core distributed components, but no methodology exists to address (b) by composing such verified components into larger verified applications. As a result, expensive verification efforts for key system components are not easily reusable, which hinders further verification efforts. In this paper, we present Disel, the first framework for implementation and compositional verification of distributed systems and their clients, all within the mechanized, foundational context of the Coq proof assistant. In Disel, users implement distributed systems using a domain specific language shallowly embedded in Coq and providing both high-level programming constructs as well as low-level communication primitives. Components of composite systems are specified in Disel as protocols, which capture system-specific logic and disentangle system definitions from implementation details. By virtue of Disel's dependent type system, well-typed implementations always satisfy their protocols' invariants and never go wrong, allowing users to verify system implementations interactively using Disel's Hoare-style program logic, which extends state-of-the-art techniques for concurrency verification to the distributed setting. By virtue of the substitution principle and frame rule provided by Disel's logic, system components can be composed leading to modular, reusable verified distributed systems. We describe Disel, illustrate its use with a series of examples, outline its logic and metatheory, and report on our experience using it as a framework for implementing, specifying, and verifying distributed systems. ![]() ![]() ![]() ![]() |
|
Terauchi, Tachio |
![]() Hiroshi Unno (University of Tsukuba, Japan; Waseda University, Japan) This paper considers verification of non-deterministic higher-order functional programs. Our contribution is a novel type system in which the types are used to express and verify (conditional) safety, termination, non-safety, and non-termination properties in the presence of ∀-∃ branching behavior due to non-determinism. For instance, the judgement ⊢ e:{u: int | φ(u) }∀∀ says that every evaluation of e either diverges or reduces to some integer u satisfying φ(u), whereas ⊢ e:{u: int | ψ(u) }∃∀ says that there exists an evaluation of e that either diverges or reduces to some integer u satisfying ψ(u). Note that the former is a safety property whereas the latter is a counterexample to a (conditional) termination property. Following the recent work on type-based verification methods for deterministic higher-order functional programs, we formalize the idea on the foundation of dependent refinement types, thereby allowing the type system to express and verify rich properties involving program values, branching behaviors, and the combination thereof. Our type system is able to seamlessly combine deductions of both universal and existential facts within a unified framework, paving the way for an exciting opportunity for new type-based verification methods that combine both universal and existential reasoning. For example, our system can prove the existence of a path violating some safety property from a proof of termination that uses a well-foundedness termination argument. We prove that our type system is sound and relatively complete, and further, thanks to having both modes of non-determinism, we show that our types are closed under complement. ![]() ![]() |
|
Timany, Amin |
![]() Amin Timany, Léo Stefanesco, Morten Krogh-Jespersen, and Lars Birkedal (KU Leuven, Belgium; IRIF, France; CNRS, France; University of Paris Diderot, France; Aarhus University, Denmark) We present a logical relations model of a higher-order functional programming language with impredicative polymorphism, recursive types, and a Haskell-style ST monad type with runST. We use our logical relations model to show that runST provides proper encapsulation of state, by showing that effectful computations encapsulated by runST are heap independent. Furthermore, we show that contextual refinements and equivalences that are expected to hold for pure computations do indeed hold in the presence of runST. This is the first time such relational results have been proven for a language with monadic encapsulation of state. We have formalized all the technical development and results in Coq. ![]() ![]() ![]() |
|
Tobin-Hochstadt, Sam |
![]() Phúc C. Nguyễn, Thomas Gilray, Sam Tobin-Hochstadt, and David Van Horn (University of Maryland, USA; Indiana University, USA) Software contracts allow programmers to state rich program properties using the full expressive power of an object language. However, since they are enforced at runtime, monitoring contracts imposes significant overhead and delays error discovery. So contract verification aims to guarantee all or most of these properties ahead of time, enabling valuable optimizations and yielding a more general assurance of correctness. Existing methods for static contract verification satisfy the needs of more restricted target languages, but fail to address the challenges unique to those conjoining untyped, dynamic programming, higher-order functions, modularity, and statefulness. Our approach tackles all these features at once, in the context of the full Racket system—a mature environment for stateful, higher-order, multi-paradigm programming with or without types. Evaluating our method using a set of both pure and stateful benchmarks, we are able to verify 99.94% of checks statically (all but 28 of 49,861). Stateful, higher-order functions pose significant challenges for static contract verification in particular. In the presence of these features, a modular analysis must permit code from the current module to escape permanently to an opaque context (unspecified code from outside the current module) that may be stateful and therefore store a reference to the escaped closure. Also, contracts themselves, being predicates written in unrestricted Racket, may exhibit stateful behavior; a sound approach must be robust to contracts which are arbitrarily expressive and interwoven with the code they monitor. In this paper, we present and evaluate our solution based on higher-order symbolic execution, explain the techniques we used to address such thorny issues, formalize a notion of behavioral approximation, and use it to provide a mechanized proof of soundness. ![]() ![]() ![]() ![]() |
|
Tolmach, Andrew |
![]() Casper Bach Poulsen (Delft University of Technology, Netherlands; Portland State University, USA) A definitional interpreter defines the semantics of an object language in terms of the (well-known) semantics of a host language, enabling understanding and validation of the semantics through execution. Combining a definitional interpreter with a separate type system requires a separate type safety proof. An alternative approach, at least for pure object languages, is to use a dependently-typed language to encode the object language type system in the definition of the abstract syntax. Using such intrinsically-typed abstract syntax definitions allows the host language type checker to verify automatically that the interpreter satisfies type safety. Does this approach scale to larger and more realistic object languages, and in particular to languages with mutable state and objects? In this paper, we describe and demonstrate techniques and libraries in Agda that successfully scale up intrinsically-typed definitional interpreters to handle rich object languages with non-trivial binding structures and mutable state. While the resulting interpreters are certainly more complex than the simply-typed λ-calculus interpreter we start with, we claim that they still meet the goals of being concise, comprehensible, and executable, while guaranteeing type safety for more elaborate object languages. We make the following contributions: (1) A dependent-passing style technique for hiding the weakening of indexed values as they propagate through monadic code. (2) An Agda library for programming with scope graphs and frames, which provides a uniform approach to dealing with name binding in intrinsically-typed interpreters. (3) Case studies of intrinsically-typed definitional interpreters for the simply-typed λ-calculus with references (STLC+Ref) and for a large subset of Middleweight Java (MJ). ![]() ![]() ![]() |
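The intrinsic-typing idea can be sketched in Haskell with a GADT indexed by object-language types; this is our own toy analogue of the technique, not the paper's Agda development, and it omits the binders, scope graphs, frames, and mutable state that the paper actually handles. Only well-typed object terms can be constructed, so the interpreter needs no failure cases.

```haskell
{-# LANGUAGE GADTs #-}

-- Object-language types are reflected as host-language types:
-- a term of type 'Expr t' can only denote a value of type 't'.
data Expr t where
  IntLit  :: Int  -> Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int -> Expr Int -> Expr Int
  If      :: Expr Bool -> Expr t -> Expr t -> Expr t

-- Ill-typed terms such as 'Add (BoolLit True) (IntLit 1)' are rejected
-- by the host type checker, so evaluation is type safe by construction.
eval :: Expr t -> t
eval (IntLit n)   = n
eval (BoolLit b)  = b
eval (Add e1 e2)  = eval e1 + eval e2
eval (If c e1 e2) = if eval c then eval e1 else eval e2

main :: IO ()
main = print (eval (If (BoolLit True) (Add (IntLit 1) (IntLit 2)) (IntLit 0)))
```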
|
Tondwalkar, Anish |
![]() Niki Vazou, Anish Tondwalkar, Vikraman Choudhury (University of Maryland, USA; University of California at San Diego, USA; Indiana University, USA; University of Edinburgh, UK; Input Output HK, UK) We introduce Refinement Reflection, a new framework for building SMT-based deductive verifiers. The key idea is to reflect the code implementing a user-defined function into the function’s (output) refinement type. As a consequence, at uses of the function, the function definition is instantiated in the SMT logic in a precise fashion that permits decidable verification. Reflection allows the user to write equational proofs of programs just by writing other programs using pattern-matching and recursion to perform case-splitting and induction. Thus, via the propositions-as-types principle, we show that reflection permits the specification of arbitrary functional correctness properties. Finally, we introduce a proof-search algorithm called Proof by Logical Evaluation that uses techniques from model checking and abstract interpretation, to completely automate equational reasoning. We have implemented reflection in Liquid Haskell and used it to verify that the widely used instances of the Monoid, Applicative, Functor, and Monad typeclasses actually satisfy key algebraic laws required to make the clients safe, and have used reflection to build the first library that actually verifies assumptions about associativity and ordering that are crucial for safe deterministic parallelism. ![]() ![]() ![]() |
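As a flavour of reflection and equational proofs-as-programs, here is a minimal sketch in Liquid Haskell style; the combinators ===, ***, QED, and Proof are assumed to come from Language.Haskell.Liquid.ProofCombinators, and the exact annotation syntax may differ across Liquid Haskell versions, so treat this as an illustration rather than verified code.

```haskell
import Language.Haskell.Liquid.ProofCombinators

-- Reflection lifts the definition of 'double' into the refinement logic.
{-@ reflect double @-}
double :: Int -> Int
double x = x + x

-- An equational proof written as an ordinary program: each step is a
-- term whose refinement records the equation being established.
{-@ doubleIsTwice :: x:Int -> { double x == 2 * x } @-}
doubleIsTwice :: Int -> Proof
doubleIsTwice x
  =   double x
  === x + x
  === 2 * x
  *** QED
```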
|
Torlak, Emina |
![]() Stephen Chang, Alex Knauth, and Emina Torlak (Northeastern University, USA; University of Washington, USA) We present lambda_sym, a typed λ-calculus for lenient symbolic execution, where some language constructs do not recognize symbolic values. Its type system, however, ensures safe behavior of all symbolic values in a program. Our calculus extends a base occurrence typing system with symbolic types and mutable state, making it a suitable model for both functional and imperative symbolically executed languages. Naively allowing mutation in this mixed setting introduces soundness issues, however, so we further add concreteness polymorphism, which restores soundness without rejecting too many valid programs. To show that our calculus is a useful model for a real language, we implemented Typed Rosette, a typed extension of the solver-aided Rosette language. We evaluate Typed Rosette by porting a large code base, demonstrating that our type system accommodates a wide variety of symbolically executed programs. ![]() ![]() ![]() |
|
Unno, Hiroshi |
![]() Hiroshi Unno (University of Tsukuba, Japan; Waseda University, Japan) This paper considers verification of non-deterministic higher-order functional programs. Our contribution is a novel type system in which the types are used to express and verify (conditional) safety, termination, non-safety, and non-termination properties in the presence of ∀-∃ branching behavior due to non-determinism. For instance, the judgement ⊢ e:{u: int | φ(u) }∀∀ says that every evaluation of e either diverges or reduces to some integer u satisfying φ(u), whereas ⊢ e:{u: int | ψ(u) }∃∀ says that there exists an evaluation of e that either diverges or reduces to some integer u satisfying ψ(u). Note that the former is a safety property whereas the latter is a counterexample to a (conditional) termination property. Following the recent work on type-based verification methods for deterministic higher-order functional programs, we formalize the idea on the foundation of dependent refinement types, thereby allowing the type system to express and verify rich properties involving program values, branching behaviors, and the combination thereof. Our type system is able to seamlessly combine deductions of both universal and existential facts within a unified framework, paving the way for an exciting opportunity for new type-based verification methods that combine both universal and existential reasoning. For example, our system can prove the existence of a path violating some safety property from a proof of termination that uses a well-foundedness termination argument. We prove that our type system is sound and relatively complete, and further, thanks to having both modes of non-determinism, we show that our types are closed under complement. ![]() ![]() |
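The two modes can be mimicked concretely by modelling a non-deterministic program as the finite list of its possible results; the sketch below is our own illustration of the ∀∀ versus ∃∀ readings, deliberately ignores divergence, and is not the paper's type system.

```haskell
-- A non-deterministic integer expression, modelled by the list of
-- values it may evaluate to (finite, terminating programs only).
type NonDet a = [a]

-- Non-deterministic choice between two branches.
choice :: NonDet a -> NonDet a -> NonDet a
choice = (++)

-- e = if * then 2 else 7   (both branches are possible)
e :: NonDet Int
e = choice [2] [7]

-- The ∀∀ reading: every evaluation satisfies the refinement φ.
allSat :: (a -> Bool) -> NonDet a -> Bool
allSat = all

-- The ∃∀ reading (restricted to terminating runs): some evaluation
-- satisfies ψ, e.g. witnessing a path that violates a safety property.
someSat :: (a -> Bool) -> NonDet a -> Bool
someSat = any

main :: IO ()
main = do
  print (allSat (> 0) e)   -- True:  every run yields a positive value
  print (someSat (> 5) e)  -- True:  some run yields a value above 5
  print (allSat (> 5) e)   -- False: not every run yields a value above 5
```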
|

Vafeiadis, Viktor |
![]() Michalis Kokologiannakis, Ori Lahav (National Technical University of Athens, Greece; Tel Aviv University, Israel; Uppsala University, Sweden; MPI-SWS, Germany) We present a stateless model checking algorithm for verifying concurrent programs running under RC11, a repaired version of the C/C++11 memory model without dependency cycles. Unlike most previous approaches, which enumerate thread interleavings up to some partial order reduction improvements, our approach works directly on execution graphs and (in the absence of RMW instructions and SC atomics) avoids redundant exploration by construction. We have implemented a model checker, called RCMC, based on this approach and applied it to a number of challenging concurrent programs. Our experiments confirm that RCMC is significantly faster, scales better than other model checking tools, and is also more resilient to small changes in the benchmarks. ![]() ![]() ![]() ![]() |
Vaidya, Kapil |
![]() Marek Chalupa, Krishnendu Chatterjee (Masaryk University, Czechia; IST Austria, Austria; Kena Labs, India; IIT Bombay, India) We present a new dynamic partial-order reduction method for stateless model checking of concurrent programs. A common approach for exploring program behaviors relies on enumerating the traces of the program, without storing the visited states (aka stateless exploration). As the number of distinct traces grows exponentially, dynamic partial-order reduction (DPOR) techniques have been successfully used to partition the space of traces into equivalence classes (Mazurkiewicz partitioning), with the goal of exploring only a few representative traces from each class. We introduce a new equivalence on traces under sequential consistency semantics, which we call the observation equivalence. Two traces are observationally equivalent if every read event observes the same write event in both traces. While the traditional Mazurkiewicz equivalence is control-centric, our new definition is data-centric. We show that our observation equivalence is coarser than the Mazurkiewicz equivalence, and in many cases even exponentially coarser. We devise a DPOR exploration of the trace space, called data-centric DPOR, based on the observation equivalence. For acyclic architectures, our algorithm is guaranteed to explore exactly one representative trace from each observation class, while spending polynomial time per class. Hence, our algorithm is optimal with respect to the observation equivalence, and in several cases explores exponentially fewer traces than any enumerative method based on the Mazurkiewicz equivalence. For cyclic architectures, we consider an equivalence between traces which is finer than the observation equivalence but coarser than the Mazurkiewicz equivalence, and in some cases is exponentially coarser. Our data-centric DPOR algorithm remains optimal under this trace equivalence. Finally, we perform a basic experimental comparison between the existing Mazurkiewicz-based DPOR and our data-centric DPOR on a set of academic benchmarks. Our results show a significant reduction in both running time and the number of explored equivalence classes. ![]() ![]() |
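Observation equivalence itself is easy to compute for a concrete trace: map each read to the write it observes and compare the resulting maps. The event encoding below is our own simplification, not the paper's formal model.

```haskell
import qualified Data.Map.Strict as M

data Event = Write { loc :: String, wid :: Int }  -- wid: unique write id
           | Read  { loc :: String, rid :: Int }  -- rid: unique read id
  deriving (Eq, Show)

type Trace = [Event]

-- Map each read to the id of the write it observes (Nothing = initial value),
-- scanning the trace while remembering the latest write per location.
observations :: Trace -> M.Map Int (Maybe Int)
observations = go M.empty M.empty
  where
    go _     obs []                = obs
    go lastW obs (Write l w : es)  = go (M.insert l w lastW) obs es
    go lastW obs (Read  l r : es)  = go lastW (M.insert r (M.lookup l lastW) obs) es

observationEquivalent :: Trace -> Trace -> Bool
observationEquivalent t1 t2 = observations t1 == observations t2

-- Reordering two conflicting writes whose values are never read gives a
-- different Mazurkiewicz class but the same observation class.
main :: IO ()
main = print (observationEquivalent
                [Write "x" 1, Write "x" 2, Read "y" 1]
                [Write "x" 2, Write "x" 1, Read "y" 1])  -- True
```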
|
Vákár, Matthijs |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
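One way to make "intermediate representations of probabilistic programs" and "distribution preservation" tangible is a finite, discrete toy representation; the sketch below is our own simplification and sidesteps the measure-theoretic (quasi-Borel) issues the paper addresses for continuous distributions and higher-order functions.

```haskell
import qualified Data.Map.Strict as M

-- A (sub)distribution represented as weighted outcomes.
newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, w) | (x, w) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, wf * wx) | (f, wf) <- fs, (x, wx) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, wx * wy) | (x, wx) <- xs, (y, wy) <- runDist (k x)]

-- A validity criterion for transforming representations: the induced
-- weight per outcome must be preserved.
semantics :: Ord a => Dist a -> M.Map a Double
semantics (Dist xs) = M.fromListWith (+) xs

coin :: Dist Bool
coin = Dist [(True, 0.5), (False, 0.5)]

twoCoins :: Dist Int
twoCoins = do a <- coin; b <- coin; pure (fromEnum a + fromEnum b)

main :: IO ()
main = print (semantics twoCoins)   -- fromList [(0,0.25),(1,0.5),(2,0.25)]
```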
|
Van Horn, David |
![]() Phúc C. Nguyễn, Thomas Gilray, Sam Tobin-Hochstadt, and David Van Horn (University of Maryland, USA; Indiana University, USA) Software contracts allow programmers to state rich program properties using the full expressive power of an object language. However, since they are enforced at runtime, monitoring contracts imposes significant overhead and delays error discovery. So contract verification aims to guarantee all or most of these properties ahead of time, enabling valuable optimizations and yielding a more general assurance of correctness. Existing methods for static contract verification satisfy the needs of more restricted target languages, but fail to address the challenges unique to those conjoining untyped, dynamic programming, higher-order functions, modularity, and statefulness. Our approach tackles all these features at once, in the context of the full Racket system—a mature environment for stateful, higher-order, multi-paradigm programming with or without types. Evaluating our method using a set of both pure and stateful benchmarks, we are able to verify 99.94% of checks statically (all but 28 of 49,861). Stateful, higher-order functions pose significant challenges for static contract verification in particular. In the presence of these features, a modular analysis must permit code from the current module to escape permanently to an opaque context (unspecified code from outside the current module) that may be stateful and therefore store a reference to the escaped closure. Also, contracts themselves, being predicates written in unrestricted Racket, may exhibit stateful behavior; a sound approach must be robust to contracts which are arbitrarily expressive and interwoven with the code they monitor. In this paper, we present and evaluate our solution based on higher-order symbolic execution, explain the techniques we used to address such thorny issues, formalize a notion of behavioral approximation, and use it to provide a mechanized proof of soundness. ![]() ![]() ![]() ![]() |
|
Vazou, Niki |
![]() Niki Vazou, Anish Tondwalkar, Vikraman Choudhury (University of Maryland, USA; University of California at San Diego, USA; Indiana University, USA; University of Edinburgh, UK; Input Output HK, UK) We introduce Refinement Reflection, a new framework for building SMT-based deductive verifiers. The key idea is to reflect the code implementing a user-defined function into the function’s (output) refinement type. As a consequence, at uses of the function, the function definition is instantiated in the SMT logic in a precise fashion that permits decidable verification. Reflection allows the user to write equational proofs of programs just by writing other programs using pattern-matching and recursion to perform case-splitting and induction. Thus, via the propositions-as-types principle, we show that reflection permits the specification of arbitrary functional correctness properties. Finally, we introduce a proof-search algorithm called Proof by Logical Evaluation that uses techniques from model checking and abstract interpretation, to completely automate equational reasoning. We have implemented reflection in Liquid Haskell and used it to verify that the widely used instances of the Monoid, Applicative, Functor, and Monad typeclasses actually satisfy key algebraic laws required to make the clients safe, and have used reflection to build the first library that actually verifies assumptions about associativity and ordering that are crucial for safe deterministic parallelism. ![]() ![]() ![]() |
|
Vechev, Martin |
![]() Gagandeep Singh, Markus Püschel, and Martin Vechev (ETH Zurich, Switzerland) Numerical abstract domains such as Polyhedra, Octahedron, Octagon, Interval, and others are an essential component of static program analysis. The choice of domain offers a performance/precision tradeoff ranging from cheap and imprecise (Interval) to expensive and precise (Polyhedra). Recently, significant speedups were achieved for Octagon and Polyhedra by manually decomposing their transformers to work with the Cartesian product of projections associated with partitions of the variable set. While practically useful, this manual process is time consuming, error-prone, and has to be applied from scratch for every domain. In this paper, we present a generic approach for decomposing the transformers of sub-polyhedra domains along with conditions for checking whether the decomposed transformers lose precision with respect to the original transformers. These conditions are satisfied by most practical transformers, thus our approach is suitable for increasing the performance of these transformers without compromising their precision. Furthermore, our approach is ``black box:'' it does not require changes to the internals of the original non-decomposed transformers or additional manual effort per domain. We implemented our approach and applied it to the domains of Zones, Octagon, and Polyhedra. We then compared the performance of the decomposed transformers obtained with our generic method versus the state of the art: the (non-decomposed) PPL for Polyhedra and the much faster ELINA (which uses manual decomposition) for Polyhedra and Octagon. Against ELINA we demonstrate finer partitions and an associated speedup of about 2x on average. Our results indicate that the general construction presented in this work is a viable method for improving the performance of sub-polyhedra domains. It enables designers of abstract domains to benefit from decomposition without re-writing all of their transformers from scratch as required by prior work. ![]() ![]() ![]() ![]() |
|
Vezzosi, Andrea |
![]() Andreas Abel, Joakim Öhman (University of Gothenburg, Sweden; IMDEA Software Institute, Spain; Chalmers University of Technology, Sweden) Type theory should be able to handle its own meta-theory, both to justify its foundational claims and to obtain a verified implementation. At the core of a type checker for intensional type theory lies an algorithm to check equality of types, or in other words, to check whether two types are convertible. We have formalized in Agda a practical conversion checking algorithm for a dependent type theory with one universe à la Russell, natural numbers, and η-equality for Π types. We prove the algorithm correct via a Kripke logical relation parameterized by a suitable notion of equivalence of terms. We then instantiate the parameterized fundamental lemma twice: once to obtain canonicity and injectivity of type formers, and once again to prove the completeness of the algorithm. Our proof relies on inductive-recursive definitions, but not on the uniqueness of identity proofs. Thus, it is valid in variants of intensional Martin-Löf Type Theory as long as they support induction-recursion, for instance, Extensional, Observational, or Homotopy Type Theory. ![]() ![]() ![]() |
|
Vial, Pierre |
![]() Damiano Mazza, Luc Pellissier, and Pierre Vial (CNRS, France; University of Paris 13, France; University of Paris Diderot, France) Starting from an exact correspondence between linear approximations and non-idempotent intersection types, we develop a general framework for building systems of intersection types characterizing normalization properties. We show how this construction, which uses in a fundamental way Melliès and Zeilberger's ``type systems as functors'' viewpoint, allows us to recover equivalent versions of every well known intersection type system (including Coppo and Dezani's original system, as well as its non-idempotent variants independently introduced by Gardner and de Carvalho). We also show how new systems of intersection types may be built almost automatically in this way. ![]() ![]() |
|
Visser, Eelco |
![]() Casper Bach Poulsen (Delft University of Technology, Netherlands; Portland State University, USA) A definitional interpreter defines the semantics of an object language in terms of the (well-known) semantics of a host language, enabling understanding and validation of the semantics through execution. Combining a definitional interpreter with a separate type system requires a separate type safety proof. An alternative approach, at least for pure object languages, is to use a dependently-typed language to encode the object language type system in the definition of the abstract syntax. Using such intrinsically-typed abstract syntax definitions allows the host language type checker to verify automatically that the interpreter satisfies type safety. Does this approach scale to larger and more realistic object languages, and in particular to languages with mutable state and objects? In this paper, we describe and demonstrate techniques and libraries in Agda that successfully scale up intrinsically-typed definitional interpreters to handle rich object languages with non-trivial binding structures and mutable state. While the resulting interpreters are certainly more complex than the simply-typed λ-calculus interpreter we start with, we claim that they still meet the goals of being concise, comprehensible, and executable, while guaranteeing type safety for more elaborate object languages. We make the following contributions: (1) A dependent-passing style technique for hiding the weakening of indexed values as they propagate through monadic code. (2) An Agda library for programming with scope graphs and frames, which provides a uniform approach to dealing with name binding in intrinsically-typed interpreters. (3) Case studies of intrinsically-typed definitional interpreters for the simply-typed λ-calculus with references (STLC+Ref) and for a large subset of Middleweight Java (MJ). ![]() ![]() ![]() |
|
Vitek, Jan |
![]() Olivier Flückiger (Northeastern University, USA; Inria, France; Czech Technical University, Czechia) High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition. ![]() ![]() |
|
Voizard, Antoine |
![]() Lionel Parreaux (EPFL, Switzerland; University of Pennsylvania, USA) Metaprograms are programs that manipulate (generate, analyze and evaluate) other programs. These tasks are greatly facilitated by quasiquotation, a technique to construct and deconstruct program fragments using quoted code templates expressed in the syntax of the manipulated language. We argue that two main flavors of quasiquotes have existed so far: Lisp-style quasiquotes, which can both construct and deconstruct programs but may produce code that contains type mismatches and unbound variables; and MetaML-style quasiquotes, which rely on static typing to prevent these errors, but can only construct programs. In this paper, we show how to combine the advantages of both flavors into a unified framework: we allow the construction, deconstruction and evaluation of program fragments while ensuring that generated programs are well-typed and well-scoped, a combination unseen in previous work. We formalize our approach as λ{}, a multi-stage calculus with code pattern matching and rewriting, and prove its type safety. We also present its realization in Squid, a metaprogramming framework for Scala, leveraging Scala’s expressive type system. To demonstrate the usefulness of our approach, we introduce speculative rewrite rules, a novel code transformation technique that makes decisive use of these capabilities, and we outline how it simplifies the design of some crucial query compiler optimizations. ![]() ![]() ![]() |
|
Vojnar, Tomáš |
![]() Lukáš Holík (Brno University of Technology, Czechia; University of Oxford, UK; Uppsala University, Sweden) String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting, and automatic test-case generation. A popular string analysis technique includes symbolic executions, which at their core use constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature. ![]() ![]() ![]() ![]() |
Wadler, Philip |
![]() Niki Vazou, Anish Tondwalkar, Vikraman Choudhury (University of Maryland, USA; University of California at San Diego, USA; Indiana University, USA; University of Edinburgh, UK; Input Output HK, UK) We introduce Refinement Reflection, a new framework for building SMT-based deductive verifiers. The key idea is to reflect the code implementing a user-defined function into the function’s (output) refinement type. As a consequence, at uses of the function, the function definition is instantiated in the SMT logic in a precise fashion that permits decidable verification. Reflection allows the user to write equational proofs of programs just by writing other programs using pattern-matching and recursion to perform case-splitting and induction. Thus, via the propositions-as-types principle, we show that reflection permits the specification of arbitrary functional correctness properties. Finally, we introduce a proof-search algorithm called Proof by Logical Evaluation that uses techniques from model checking and abstract interpretation, to completely automate equational reasoning. We have implemented reflection in Liquid Haskell and used it to verify that the widely used instances of the Monoid, Applicative, Functor, and Monad typeclasses actually satisfy key algebraic laws required to make the clients safe, and have used reflection to build the first library that actually verifies assumptions about associativity and ordering that are crucial for safe deterministic parallelism. ![]() ![]() ![]() |
|
Walker, David |
![]() Anders Miltner, Kathleen Fisher, Benjamin C. Pierce (Princeton University, USA; Tufts University, USA; University of Pennsylvania, USA) Bidirectional transformations between different data representations occur frequently in modern software systems. They appear as serializers and deserializers, as parsers and pretty printers, as database views and view updaters, and as a multitude of different kinds of ad hoc data converters. Manually building bidirectional transformations---by writing two separate functions that are intended to be inverses---is tedious and error prone. A better approach is to use a domain-specific language in which both directions can be written as a single expression. However, these domain-specific languages can be difficult to program in, requiring programmers to manage fiddly details while working in a complex type system. We present an alternative approach. Instead of coding transformations manually, we synthesize them from declarative format descriptions and examples. Specifically, we present Optician, a tool for type-directed synthesis of bijective string transformers. The inputs to Optician are a pair of ordinary regular expressions representing two data formats and a few concrete examples for disambiguation. The output is a well-typed program in Boomerang (a bidirectional language based on the theory of lenses). The main technical challenge involves navigating the vast program search space efficiently. In particular, and unlike most prior work on type-directed synthesis, our system operates in the context of a language with a rich equivalence relation on types (the theory of regular expressions). Consequently, program synthesis requires search in two dimensions: First, our synthesis algorithm must find a pair of "syntactically compatible types," and second, using the structure of those types, it must find a type- and example-compliant term. Our key insight is that it is possible to reduce the size of this search space without losing any computational power by defining a new language of lenses designed specifically for synthesis. The new language is free from arbitrary function composition and operates only over types and terms in a new disjunctive normal form. We prove (1) our new language is just as powerful as a more natural, compositional, and declarative language and (2) our synthesis algorithm is sound and complete with respect to the new language. We also demonstrate empirically that our new language changes the synthesis problem from one that admits intractable solutions to one that admits highly efficient solutions, able to synthesize intricate lenses between complex file formats in seconds. We evaluate Optician on a benchmark suite of 39 examples that includes both microbenchmarks and realistic examples derived from other data management systems including Flash Fill, a tool for synthesizing string transformations in spreadsheets, and Augeas, a tool for bidirectional processing of Linux system configuration files. ![]() ![]() ![]() |
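The object being synthesised here is a bijective lens: a pair of functions subject to round-trip laws. The toy sketch below (our own; it uses hand-written functions rather than Boomerang combinators or Optician's output) shows the shape of such a lens and the laws a candidate must satisfy on the given examples.

```haskell
-- A bijective string lens: two directions plus the round-trip laws.
data Lens = Lens { get :: String -> String, put :: String -> String }

-- The properties a synthesised lens must satisfy, checked here on samples.
roundTrips :: Lens -> [String] -> [String] -> Bool
roundTrips l sources views =
  all (\s -> put l (get l s) == s) sources &&
  all (\v -> get l (put l v) == v) views

-- Example: a lens between "lastname,firstname" and "firstname lastname".
swapName :: Lens
swapName = Lens to from
  where
    to s   = let (lastN, rest)  = break (== ',') s in drop 1 rest ++ " " ++ lastN
    from v = let (firstN, rest) = break (== ' ') v in drop 1 rest ++ "," ++ firstN

main :: IO ()
main = print (roundTrips swapName ["Curie,Marie"] ["Marie Curie"])  -- True
```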
|
Walkingshaw, Eric |
![]() John Peter Campora, Sheng Chen, Martin Erwig, and Eric Walkingshaw (University of Louisiana at Lafayette, USA; Oregon State University, USA) Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically-typed to more statically-typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared. In this paper, we address this problem by developing migrational typing, which efficiently types all possible ways of adding or removing type annotations from a gradually typed program. The typing result supports automatically migrating a program to be as static as possible, or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing. ![]() ![]() ![]() |
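A naive reading of the migration problem is a brute-force search over all 2^n ways of keeping or dropping annotations, which is exactly the blow-up migrational typing avoids. The sketch below is our own toy model of that naive search: typeChecks is a stand-in predicate, not a gradual type checker.

```haskell
import Data.List (maximumBy, subsequences)
import Data.Ord (comparing)

-- A configuration records which parameters carry a static annotation.
type Param  = String
type Config = [Param]

-- Stand-in for the gradual type checker: here, a toy predicate saying
-- that "f" and "g" may not both be statically annotated at once.
typeChecks :: Config -> Bool
typeChecks cfg = not ("f" `elem` cfg && "g" `elem` cfg)

-- Brute force: 2^n candidate configurations for n parameters.
mostStatic :: [Param] -> Config
mostStatic params =
  maximumBy (comparing length) (filter typeChecks (subsequences params))

main :: IO ()
main = print (mostStatic ["f", "g", "h"])  -- one maximal configuration, e.g. ["g","h"]
```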
|
Wang, Xinyu |
![]() Xinyu Wang, Isil Dillig (University of Texas at Austin, USA; Microsoft Research, USA) We present a new approach to example-guided program synthesis based on counterexample-guided abstraction refinement. Our method uses the abstract semantics of the underlying DSL to find a program P whose abstract behavior satisfies the examples. However, since program P may be spurious with respect to the concrete semantics, our approach iteratively refines the abstraction until we either find a program that satisfies the examples or prove that no such DSL program exists. Because many programs have the same input-output behavior in terms of their abstract semantics, this synthesis methodology significantly reduces the search space compared to existing techniques that use purely concrete semantics. While synthesis using abstraction refinement (SYNGAR) could be implemented in different settings, we propose a refinement-based synthesis algorithm that uses abstract finite tree automata (AFTA). Our technique uses a coarse initial program abstraction to construct an initial AFTA, which is iteratively refined by constructing a proof of incorrectness of any spurious program. In addition to ruling out the spurious program accepted by the previous AFTA, proofs of incorrectness are also useful for ruling out many other spurious programs. We implement these ideas in a framework called Blaze, which can be instantiated in different domains by providing a suitable DSL and its corresponding concrete and abstract semantics. We have used the Blaze framework to build synthesizers for string and matrix transformations, and we compare Blaze with existing techniques. Our results for the string domain show that Blaze compares favorably with FlashFill, a domain-specific synthesizer that is now deployed in Microsoft PowerShell. In the context of matrix manipulations, we compare Blaze against Prose, a state-of-the-art general-purpose VSA-based synthesizer, and show that Blaze results in a 90x speed-up over Prose. In both application domains, Blaze also consistently improves upon the performance of two other existing techniques by at least an order of magnitude. ![]() ![]() |
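The overall abstraction-refinement loop can be sketched generically: find a program that satisfies the examples under the current abstraction, test it concretely, and refine the abstraction when it turns out spurious. In this toy sketch (our own), the "abstraction" is merely a set of refuted candidates, whereas Blaze refines abstract finite tree automata and rules out many spurious programs per refinement step.

```haskell
-- A generic abstraction-refinement synthesis loop (our simplification).
synthesize
  :: [p]               -- enumeration of DSL programs
  -> (a -> p -> Bool)  -- does p satisfy the examples under abstraction a?
  -> (p -> Bool)       -- does p satisfy the examples concretely?
  -> (a -> p -> a)     -- refine a using the spurious program p
  -> a                 -- initial (coarse) abstraction
  -> Maybe p
synthesize progs absOk concOk refine = go
  where
    go a = case filter (absOk a) progs of
             []      -> Nothing                        -- unrealizable
             (p : _)
               | concOk p  -> Just p                   -- genuine solution
               | otherwise -> go (refine a p)          -- p was spurious

main :: IO ()
main =
  let progs    = [("inc", (+ 1)), ("dbl", (* 2)), ("sq", \x -> x * x)]
                   :: [(String, Int -> Int)]
      examples = [(3, 6)]
      -- toy abstraction: the set of programs already proven spurious
      absOk refuted (name, _)  = name `notElem` refuted
      concOk (_, f)            = all (\(i, o) -> f i == o) examples
      refine refuted (name, _) = name : refuted
  in print (fst <$> synthesize progs absOk concOk refine [])  -- Just "dbl"
```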
|
Wang, Yuepeng |
![]() Yuepeng Wang, Isil Dillig (University of Texas at Austin, USA; Microsoft Research, USA) This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the same externally observable behavior before and after schema migration, there are no existing tools for proving equivalence of such programs. This paper takes a first step towards solving this problem by formalizing the equivalence and refinement checking problems for database-driven applications. We also propose a proof methodology based on the notion of bisimulation invariants over relational algebra with updates and describe a technique for synthesizing such bisimulation invariants. We have implemented the proposed technique in a tool called Mediator for verifying equivalence between database-driven applications written in our intermediate language and evaluate our tool on 21 benchmarks extracted from textbooks and real-world web applications. Our results show that the proposed methodology can successfully verify 20 of these benchmarks. ![]() ![]() |
|
Wies, Thomas |
![]() Siddharth Krishna, Dennis Shasha, and Thomas Wies (New York University, USA) Concurrent separation logics have helped to significantly simplify correctness proofs for concurrent data structures. However, a recurring problem in such proofs is that data structure abstractions that work well in the sequential setting are much harder to reason about in a concurrent setting due to complex sharing and overlays. To solve this problem, we propose a novel approach to abstracting regions in the heap by encoding the data structure invariant into a local condition on each individual node. This condition may depend on a quantity associated with the node that is computed as a fixpoint over the entire heap graph. We refer to this quantity as a flow. Flows can encode both structural properties of the heap (e.g. the reachable nodes from the root form a tree) as well as data invariants (e.g. sortedness). We then introduce the notion of a flow interface, which expresses the relies and guarantees that a heap region imposes on its context to maintain the local flow invariant with respect to the global heap. Our main technical result is that this notion leads to a new semantic model of separation logic. In this model, flow interfaces provide a general abstraction mechanism for describing complex data structures. This abstraction mechanism admits proof rules that generalize over a wide variety of data structures. To demonstrate the versatility of our approach, we show how to extend the logic RGSep with flow interfaces. We have used this new logic to prove linearizability and memory safety of nontrivial concurrent data structures. In particular, we obtain parametric linearizability proofs for concurrent dictionary algorithms that abstract from the details of the underlying data structure representation. These proofs cannot be easily expressed using the abstraction mechanisms provided by existing separation logics. ![]() ![]() |
|
Wilcox, James R. |
![]() Ilya Sergey, James R. Wilcox, and Zachary Tatlock (University College London, UK; University of Washington, USA) Distributed systems play a crucial role in modern infrastructure, but are notoriously difficult to implement correctly. This difficulty arises from two main challenges: (a) correctly implementing core system components (e.g., two-phase commit), so all their internal invariants hold, and (b) correctly composing standalone system components into functioning trustworthy applications (e.g., persistent storage built on top of a two-phase commit instance). Recent work has developed several approaches for addressing (a) by means of mechanically verifying implementations of core distributed components, but no methodology exists to address (b) by composing such verified components into larger verified applications. As a result, expensive verification efforts for key system components are not easily reusable, which hinders further verification efforts. In this paper, we present Disel, the first framework for implementation and compositional verification of distributed systems and their clients, all within the mechanized, foundational context of the Coq proof assistant. In Disel, users implement distributed systems using a domain specific language shallowly embedded in Coq and providing both high-level programming constructs as well as low-level communication primitives. Components of composite systems are specified in Disel as protocols, which capture system-specific logic and disentangle system definitions from implementation details. By virtue of Disel's dependent type system, well-typed implementations always satisfy their protocols' invariants and never go wrong, allowing users to verify system implementations interactively using Disel's Hoare-style program logic, which extends state-of-the-art techniques for concurrency verification to the distributed setting. By virtue of the substitution principle and frame rule provided by Disel's logic, system components can be composed leading to modular, reusable verified distributed systems. We describe Disel, illustrate its use with a series of examples, outline its logic and metatheory, and report on our experience using it as a framework for implementing, specifying, and verifying distributed systems. ![]() ![]() ![]() ![]() |
|
Williams, Thomas |
![]() Thomas Williams and Didier Rémy (Inria, France) Ornaments are a way to describe changes in datatype definitions that reorganize, add, or drop some pieces of data, so that functions operating on the bare definition can be partially, and sometimes totally, lifted into functions operating on the ornamented structure. We propose an extension of ML with higher-order ornaments, demonstrate its expressiveness with a few typical examples, including code refactoring, study the metatheoretical properties of ornaments, and describe their elaboration process. We formalize ornamentation via an a posteriori abstraction of the bare code, returning a generic term, which lives in a meta-language above ML. The lifted code is obtained by application of the generic term to well-chosen arguments, followed by staged reduction, and some remaining simplifications. We use logical relations to closely relate the lifted code to the bare code. ![]() ![]() ![]() ![]() |
|
Wood, Thomas |
![]() José Fragoso Santos, Petar Maksimović, Daiva Naudžiūnienė, Thomas Wood, and Philippa Gardner (Imperial College London, UK; Mathematical Institute SASA, Serbia) The dynamic nature of JavaScript and its complex semantics make it a difficult target for logic-based verification. We introduce JaVerT, a semi-automatic JavaScript Verification Toolchain, based on separation logic and aimed at the specialist developer wanting rich, mechanically verified specifications of critical JavaScript code. To specify JavaScript programs, we design abstractions that capture its key heap structures (for example, prototype chains and function closures), allowing the developer to write clear and succinct specifications with minimal knowledge of the JavaScript internals. To verify JavaScript programs, we develop JaVerT, a verification pipeline consisting of: JS-2-JSIL, a well-tested compiler from JavaScript to JSIL, an intermediate goto language capturing the fundamental dynamic features of JavaScript; JSIL Verify, a semi-automatic verification tool based on a sound JSIL separation logic; and verified axiomatic specifications of the JavaScript internal functions. Using JaVerT, we verify functional correctness properties of: data-structure libraries (key-value map, priority queue) written in an object-oriented style; operations on data structures such as binary search trees (BSTs) and lists; examples illustrating function closures; and test cases from the official ECMAScript test suite. The verification times suggest that reasoning about larger, more complex code using JaVerT is feasible. ![]() ![]() ![]() |
|
Wu, Zhilin |
![]() Taolue Chen, Yan Chen, Matthew Hague (University of London, UK; Institute of Software at Chinese Academy of Sciences, China; University at Chinese Academy of Sciences, China; Royal Holloway University of London, UK; University of Oxford, UK) The theory of strings with concatenation has been widely argued as the basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in practice; for example, the use of regular constraints (pattern matching against a regular expression), and the string-replace function (replacing either the first occurrence or all occurrences of a ``pattern'' string constant/variable/regular expression by a ``replacement'' string constant/variable), among many others. Both regular constraints and the string-replace function are crucial for such applications as analysis of JavaScript (or more generally HTML5 applications) against cross-site scripting (XSS) vulnerabilities, which motivates us to consider a richer class of string constraints. The importance of the string-replace function (especially the replace-all facility) is increasingly recognised, which can be witnessed by the incorporation of the function in the input languages of several string constraint solvers. Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version where pattern/replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. Despite this, the straight-line restriction is still practically sensible since this condition is typically met by string constraints that are generated by symbolic execution. In this paper, we provide the first systematic study of straight-line string constraints with the string-replace function and the regular constraints as the basic operations. We show that a large class of such constraints (i.e. when only a constant string or a regular expression is permitted in the pattern) is decidable. We note that the string-replace function, even under this restriction, is sufficiently powerful for expressing the concatenation operator and much more (e.g. extensions of regular expressions with string variables). This gives us the most expressive decidable logic containing concatenation, replace, and regular constraints under the same umbrella. Our decision procedure for the straight-line fragment follows an automata-theoretic approach, and is modular in the sense that the string-replace terms are removed one by one to generate more and more regular constraints, which can then be discharged by the state-of-the-art string constraint solvers. We also show that this fragment is, in a way, a maximal decidable subclass of the straight-line fragment with string-replace and regular constraints. To this end, we show undecidability results for the following two extensions: (1) variables are permitted in the pattern parameter of the replace function, (2) length constraints are permitted. ![]() ![]() |
|
Yang, Hongseok |
![]() Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton (University of Cambridge, UK; MPI Tübingen, Germany; University of Oxford, UK; KAIST, South Korea; University of Tübingen, Germany; University of Edinburgh, UK; Uber AI Labs, USA) We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem. ![]() ![]() ![]() |
|
Yee, Ming-Ho |
![]() Olivier Flückiger (Northeastern University, USA; Inria, France; Czech Technical University, Czechia) High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition. ![]() ![]() |
|
Ying, Mingsheng |
![]() Yangjia Li and Mingsheng Ying (Institute of Software at Chinese Academy of Sciences, China; University of Technology Sydney, Australia; Tsinghua University, China) We introduce the notion of linear ranking super-martingale (LRSM) for quantum programs (with nondeterministic choices, namely angelic and demonic choices). Several termination theorems are established showing that the existence of the LRSMs of a quantum program implies its termination. Thus, the termination problems of quantum programs is reduced to realisability and synthesis of LRSMs. We further show that the realisability and synthesis problem of LRSMs for quantum programs can be reduced to an SDP (Semi-Definite Programming) problem, which can be settled with the existing SDP solvers. The techniques developed in this paper are used to analyse the termination of several example quantum programs, including quantum random walks and quantum Bernoulli factory for random number generation. This work is essentially a generalisation of constraint-based approach to the corresponding problems for probabilistic programs developed in the recent literature by adding two novel ideas: (1) employing the fundamental Gleason's theorem in quantum mechanics to guide the choices of templates; and (2) a generalised Farkas' lemma in terms of observables (Hermitian operators) in quantum physics. ![]() ![]() |
|
Zdancewic, Steve |
![]() Anders Miltner, Kathleen Fisher, Benjamin C. Pierce (Princeton University, USA; Tufts University, USA; University of Pennsylvania, USA) Bidirectional transformations between different data representations occur frequently in modern software systems. They appear as serializers and deserializers, as parsers and pretty printers, as database views and view updaters, and as a multitude of different kinds of ad hoc data converters. Manually building bidirectional transformations---by writing two separate functions that are intended to be inverses---is tedious and error prone. A better approach is to use a domain-specific language in which both directions can be written as a single expression. However, these domain-specific languages can be difficult to program in, requiring programmers to manage fiddly details while working in a complex type system. We present an alternative approach. Instead of coding transformations manually, we synthesize them from declarative format descriptions and examples. Specifically, we present Optician, a tool for type-directed synthesis of bijective string transformers. The inputs to Optician are a pair of ordinary regular expressions representing two data formats and a few concrete examples for disambiguation. The output is a well-typed program in Boomerang (a bidirectional language based on the theory of lenses). The main technical challenge involves navigating the vast program search space efficiently. In particular, and unlike most prior work on type-directed synthesis, our system operates in the context of a language with a rich equivalence relation on types (the theory of regular expressions). Consequently, program synthesis requires search in two dimensions: First, our synthesis algorithm must find a pair of "syntactically compatible types," and second, using the structure of those types, it must find a type- and example-compliant term. Our key insight is that it is possible to reduce the size of this search space without losing any computational power by defining a new language of lenses designed specifically for synthesis. The new language is free from arbitrary function composition and operates only over types and terms in a new disjunctive normal form. We prove (1) our new language is just as powerful as a more natural, compositional, and declarative language and (2) our synthesis algorithm is sound and complete with respect to the new language. We also demonstrate empirically that our new language changes the synthesis problem from one that admits intractable solutions to one that admits highly efficient solutions, able to synthesize intricate lenses between complex file formats in seconds. We evaluate Optician on a benchmark suite of 39 examples that includes both microbenchmarks and realistic examples derived from other data management systems including Flash Fill, a tool for synthesizing string transformations in spreadsheets, and Augeas, a tool for bidirectional processing of Linux system configuration files. ![]() ![]() ![]() |
Zohar, Yoni |
![]() Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar (Tel Aviv University, Israel; VMware, USA; Stanford University, USA) Callbacks are essential in many programming environments, but drastically complicate program understanding and reasoning because they allow external objects to mutate an object's local state in unexpected ways, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum employed callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and statically checking if an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contract executions, excluding those of the DAO or of contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state. ![]() ![]() ![]() |
|
Zuleger, Florian |
![]() Ivan Radiček, Gilles Barthe (Vienna University of Technology, Austria; IMDEA Software Institute, Spain; SUNY Buffalo, USA; MPI-SWS, Germany) Formal frameworks for cost analysis of programs have been widely studied in the unary setting and, to a limited extent, in the relational setting. However, many of these frameworks focus only on the cost aspect, largely side-lining functional properties that are often a prerequisite for cost analysis, thus leaving many interesting programs out of their purview. In this paper, we show that elegant, simple, expressive proof systems combining cost analysis and functional properties can be built by combining already known ingredients: higher-order refinements and cost monads. Specifically, we derive two syntax-directed proof systems, UC and RC, for unary and relational cost analysis, by adding a cost monad to a (syntax-directed) logic of higher-order programs. We study the metatheory of the systems, show that several nontrivial examples can be verified in them, and prove that existing frameworks for cost analysis (RelCost and RAML) can be embedded in them. ![]() ![]() |
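The cost-monad ingredient has a very small footprint on its own; the following sketch (ours, using a Writer monad over a sum monoid rather than either UC or RC) shows how monadic sequencing threads a cost counter alongside the functional result, the quantity about which unary bounds and relational comparisons are then stated.

```haskell
import Control.Monad.Writer (Writer, runWriter, tell)
import Data.Monoid (Sum (..))

type Cost = Writer (Sum Int)

-- 'tick' charges one unit; all other monadic steps are free.
tick :: Cost ()
tick = tell (Sum 1)

-- Insertion into a sorted list, charging one tick per comparison.
insertC :: Int -> [Int] -> Cost [Int]
insertC x []       = pure [x]
insertC x (y : ys)
  | x <= y    = tick >> pure (x : y : ys)
  | otherwise = tick >> ((y :) <$> insertC x ys)

main :: IO ()
main = do
  let (r1, Sum c1) = runWriter (insertC 0 [1 .. 5])  -- best case: 1 comparison
      (r2, Sum c2) = runWriter (insertC 9 [1 .. 5])  -- worst case: 5 comparisons
  print (r1, c1)            -- ([0,1,2,3,4,5],1)
  print (r2, c2)            -- ([1,2,3,4,5,9],5)
  print (c2 - c1 <= 5)      -- a relational fact about the two runs
```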
228 authors