Powered by
Proceedings of the ACM on Programming Languages, Volume 7, Number OOPSLA1,
October 22–27, 2023,
Cascais, Portugal
Frontmatter
Papers
Accelerating Fuzzing through Prefix-Guided Execution
Shaohua Li
and Zhendong Su
(ETH Zurich, Switzerland)
Coverage-guided fuzzing is one of the most effective approaches for discovering software defects and vulnerabilities. It executes all mutated tests from seed inputs to expose coverage-increasing tests. However, executing all mutated tests incurs significant performance penalties---most of the mutated tests are discarded because they do not increase code coverage. Thus, determining if a test increases code coverage without actually executing it is beneficial, but a paradoxical challenge. In this paper, we introduce the notion of prefix-guided execution (PGE) to tackle this challenge. PGE leverages two key observations: (1) Only a tiny fraction of the mutated tests increase coverage, thus requiring full execution; and (2) whether a test increases coverage may be accurately inferred from its partial execution. PGE monitors the execution of a test and applies early termination when the execution prefix indicates that the test is unlikely to increase coverage.
To demonstrate the potential of PGE, we implement a prototype on top of AFL++, which we call AFL++-PGE. We evaluate AFL++-PGE on MAGMA, a ground-truth benchmark set that consists of 21 programs from nine popular real-world projects. Our results show that, after 48 hours of fuzzing, AFL++-PGE finds more bugs, discovers bugs faster, and achieves higher coverage.
Prefix-guided execution is general and can benefit the AFL-based family of fuzzers.
Published Artifact
Artifacts Available
Artifacts Functional
Solving Conditional Linear Recurrences for Program Verification: The Periodic Case
Chenglin Wang
and Fangzhen Lin
(Hong Kong University of Science and Technology, China)
In program verification, one method for reasoning about loops is to convert them into sets of recurrences, and
then try to solve these recurrences by computing their closed-form solutions.
While there are solvers for computing closed-form solutions to these recurrences,
their capabilities are limited when the recurrences have
conditional expressions, which arise when the body of a loop contains
conditional statements.
In this paper, we take a step towards solving these recurrences. Specifically, we consider what we call
conditional linear recurrences and show that given such a recurrence and an initial value, if the
index sequence generated by the recurrence on the initial value is what we call ultimately periodic,
then it has a closed-form solution. However, checking whether such a sequence is
ultimately periodic is undecidable so we propose a heuristic "generate and verify" algorithm for
checking the ultimate periodicity of the sequence and computing closed-form solutions at the same time.
We implemented a solver based on this algorithm, and
our experiments show that a straightforward program verifier based on our solver and using the SMT solver Z3
is effective in verifying properties of many benchmark programs that contain conditional statements in their loops,
and compares favorably to other recurrence-based verification tools. Finally, we also consider extending
our results to computing closed-form solutions of recurrences with unknown initial values.
Published Artifact
Artifacts Available
Artifacts Reusable
Generating Proof Certificates for a Language-Agnostic Deductive Program Verifier
Zhengyao Lin
,
Xiaohong Chen , Minh-Thai Trinh
, John Wang
, and Grigore Roşu
(Carnegie Mellon University, USA; University of Illinois at Urbana-Champaign, USA; Advanced Digital Sciences Center, Singapore)
Previous work on rewriting and reachability logic establishes a vision for a language-agnostic program verifier, which takes three inputs: a program, its formal specification, and the formal semantics of the programming language in which the program is written. The verifier then uses a language-agnostic verification algorithm to prove the program correct with respect to the specification and the formal language semantics. Such a complex verifier can easily have bugs. This paper proposes a method to certify the correctness of each successful verification run by generating a proof certificate. The proof certificate can be checked by a small proof checker. The preliminary experiments apply the method to generate proof certificates for program verification in an imperative language, a functional language, and an assembly language, showing that the proposed method is language-agnostic.
Published Artifact
Archive submitted (1.1 MB)
Artifacts Available
Artifacts Reusable
Hybrid Multiparty Session Types: Compositionality for Protocol Specification through Endpoint Projection
Lorenzo Gheri and Nobuko Yoshida
(University of Oxford, UK)
Multiparty session types (MPST) are a specification and verification framework for distributed message-passing systems. The communication protocol of the system is specified as a global type, from which a collection of local types (local process implementations) is obtained by endpoint projection. A global type is a single disciplining entity for the whole system, specified by one designer that has full knowledge of the communication protocol. On the other hand, distributed systems are often described in terms of their components: a different designer is in charge of providing a subprotocol for each component. The problem of modular specification of global protocols has been addressed in the literature, but the state of the art focuses only on dual input/output compatibility. Our work overcomes this limitation. We propose the first MPST theory of multiparty compositionality for distributed protocol specification that is semantics-preserving, allows the composition of two or more components, and retains full MPST expressiveness. We introduce hybrid types for describing subprotocols interacting with each other, define a novel compatibility relation, explicitly describe an algorithm for composing multiple subprotocols into a well-formed global type, and prove that compositionality preserves projection, thus retaining semantic guarantees, such as liveness and deadlock freedom. Finally, we test our work against real-world case studies and we smoothly extend our novel compatibility to MPST with delegation and explicit connections.
Languages with Decidable Learning: A Meta-theorem
Paul Krogmeier
and
P. Madhusudan
(University of Illinois at Urbana-Champaign, USA)
We study expression learning problems with syntactic restrictions and introduce the class of finite-aspect checkable languages to characterize symbolic languages that admit decidable learning. The semantics of such languages can be defined using a bounded amount of auxiliary information that is independent of expression size but depends on a fixed structure over which evaluation occurs. We introduce a generic programming language for writing programs that evaluate expression syntax trees, and we give a meta-theorem that connects such programs for finite-aspect checkable languages to finite tree automata, which allows us to derive new decidable learning results and decision procedures for several expression learning problems by writing programs in the programming language.
Enabling Bounded Verification of Doubly-Unbounded Distributed Agreement-Based Systems via Bounded Regions
Christopher Wagner
, Nouraldin Jaber
, and Roopsha Samanta
(Purdue University, USA)
The ubiquity of distributed agreement protocols, such as consensus, has galvanized interest in verification of such protocols as well as applications built on top of them. The complexity and unboundedness of such systems, however, makes their verification onerous in general, and, particularly prohibitive for full automation. An exciting, recent breakthrough reveals that, through careful modeling, it becomes possible to reduce verification of interesting distributed agreement-based (DAB) systems, that are unbounded in the number of processes, to model checking of small, finite-state systems. It is an open question if such reductions are also possible for DAB systems that are doubly-unbounded, in particular, DAB systems that additionally have unbounded data domains. We answer this question in the affirmative in this work thereby broadening the class of DAB systems which can be automatically and efficiently verified. We present a novel reduction which leverages value symmetry and a new notion of data saturation to reduce verification of doubly-unbounded DAB systems to model checking of small, finite-state systems. We develop a tool, Venus, that can efficiently verify sophisticated DAB system models such as the arbitration mechanism for a consortium blockchain, a distributed register, and a simple key-value store.
Published Artifact
Artifacts Available
Artifacts Reusable
User-Customizable Transpilation of Scripting Languages
Bo Wang
, Aashish Kolluri
, Ivica Nikolić
, Teodora Baluta
, and
Prateek Saxena
(National University of Singapore, Singapore)
A transpiler converts code from one programming language to another. Many practical uses of transpilers require the user to be able to guide or customize the program produced from a given input program. This customizability is important for satisfying many application-specific goals for the produced code such as ensuring performance, readability, ease of exposition or maintainability, compatibility with external environment or analysis tools, and so on. Conventional transpilers are deterministic rule-driven systems often written without offering customizability per user and per program. Recent advances in transpilers based on neural networks offer some customizability to users, e.g. through interactive prompts, but they are still difficult to precisely control the production of a desired output. Both conventional and neural transpilation also suffer from the "last mile" problem: they produce correct code on average, i.e., on most parts of a given program, but not necessarily for all parts of it. We propose a new transpilation approach that offers fine-grained customizability and reusability of transpilation rules created by others, without burdening the user to understand the global semantics of the given source program. Our approach is mostly automatic and incremental, i.e., constructs translation rules needed to transpile the given program as per the user's guidance piece-by-piece. Users can rely on existing transpilation rules to translate most of the program correctly while focusing their effort locally, only on parts that are incorrect or need customization. This improves the correctness of the end result. We implement the transpiler as a tool called DuoGlot, which translates Python to Javascript programs, and evaluate it on the popular GeeksForGeeks benchmarks. DuoGlot achieves 90% translation accuracy and so it outperforms all existing translators (both handcrafted and neural-based), while it produces readable code. We evaluate DuoGlot on two additional benchmarks, containing more challenging and longer programs, and similarly observe improved accuracy compared to the other transpilers.
Published Artifact
Artifacts Available
Artifacts Functional
Bidirectional Object-Oriented Programming: Towards Programmatic and Direct Manipulation of Objects
Xing Zhang
, Guanchen Guo
, Xiao He
, and Zhenjiang Hu
(Peking University, China; University of Science and Technology Beijing, China)
Many bidirectional programming languages, which are mainly functional and relational, have been designed to support writing programs that run in both forward and backward directions. Nevertheless, there is little study on the bidirectionalization of object-oriented languages that are more popular in practice. This paper presents the first bidirectional object-oriented language that supports programmatic and direct manipulation of objects. Specifically, we carefully extend a core object-oriented language, which has a standard forward evaluation semantics, with backward updating semantics for class inheritance hierarchies and references. We formally prove that the bidirectional evaluation semantics satisfies the round-tripping properties if the output is altered consistently. To validate the utility of our approach, we have developed a tool called BiOOP for generating HTML documents through bidirectional GUI design. We evaluate the expressiveness and effectiveness of BiOOP for HTML webpage development by reproducing ten classic object-oriented applications from a Java Swing tutorial and one large project from GitHub. The experimental results show the response time of direct manipulation programming on object-oriented programs that produce HTML webpages is acceptable for developers.
Published Artifact
Artifacts Available
A Gradual Probabilistic Lambda Calculus
Wenjia Ye
, Matías Toro
, and Federico Olmedo
(University of Hong Kong, China; University of Chile, Chile)
Probabilistic programming languages have recently gained a lot of attention, in particular due to their applications in domains such as machine learning and differential privacy. To establish invariants of interest, many such languages include some form of static checking in the form of type systems. However, adopting such a type discipline can be cumbersome or overly conservative.
Gradual typing addresses this problem by supporting a smooth transition between static and dynamic checking, and has been successfully applied for languages with different constructs and type abstractions. Nevertheless, its benefits have never been explored in the context of probabilistic languages.
In this work, we present and formalize GPLC, a gradual source probabilistic lambda calculus. GPLC includes a binary probabilistic choice operator and allows programmers to gradually introduce/remove static type–and probability–annotations. The static semantics of GPLC heavily relies on the notion of probabilistic couplings, as required for defining several relations, such as consistency, precision, and consistent transitivity. The dynamic semantics of GPLC is given via elaboration to the target language TPLC, which features a distribution-based semantics interpreting programs as probability distributions over final values. Regarding the language metatheory, we establish that TPLC–and therefore also GPLC–is type safe and satisfies two of the so-called refined criteria for gradual languages, namely, that it is a conservative extension of a fully static variant and that it satisfies the gradual guarantee, behaving smoothly with respect to type precision.
Verus: Verifying Rust Programs using Linear Ghost Types
Andrea Lattuada
, Travis Hance
, Chanhee Cho
, Matthias Brun
, Isitha Subasinghe
, Yi Zhou
, Jon Howell
,
Bryan Parno , and Chris Hawblitzel
(VMware Research, Switzerland; Carnegie Mellon University, USA; ETH Zurich, Switzerland; UNSW Sydney, Australia; VMware Research, USA; Microsoft Research, USA)
The Rust programming language provides a powerful type system that checks linearity and borrowing, allowing code to safely manipulate memory without garbage collection and making Rust ideal for developing low-level, high-assurance systems. For such systems, formal verification can be useful to prove functional correctness properties beyond type safety. This paper presents Verus, an SMT-based tool for formally verifying Rust programs.
With Verus, programmers express proofs and specifications using the Rust language, allowing proofs to take advantage of Rust's linear types and borrow checking. We show how this allows proofs to manipulate linearly typed permissions that let Rust code safely manipulate memory, pointers, and concurrent resources. Verus organizes proofs and specifications using a novel mode system that distinguishes specifications, which are not checked for linearity and borrowing, from executable code and proofs, which are checked for linearity and borrowing.
We formalize Verus' linearity, borrowing, and modes in a small lambda calculus, for which we prove type safety and termination of specifications and proofs. We demonstrate Verus on a series of examples, including pointer-manipulating code (an xor-based doubly linked list), code with interior mutability, and concurrent code.
Published Artifact
Archive submitted (61 kB)
Artifacts Available
Artifacts Reusable
Fat Pointers for Temporal Memory Safety of C
Jie Zhou
, John Criswell
, and
Michael Hicks
(University of Rochester, USA; Amazon, USA; University of Maryland, USA)
Temporal memory safety bugs, especially use-after-free and double free bugs, pose a major security threat to C programs. Real-world exploits utilizing these bugs enable attackers to read and write arbitrary memory locations, causing disastrous violations of confidentiality, integrity, and availability. Many previous solutions retrofit temporal memory safety to C, but they all either incur high performance overhead and/or miss detecting certain types of temporal memory safety bugs.
In this paper, we propose a temporal memory safety solution that is both efficient and comprehensive. Specifically, we extend Checked C, a spatially-safe extension to C, with temporally-safe pointers. These are implemented by combining two techniques: fat pointers and dynamic key-lock checks. We show that the fat-pointer solution significantly improves running time and memory overhead compared to the disjoint-metadata approach that provides the same level of protection. With empirical program data and hands-on experience porting real-world applications, we also show that our solution is practical in terms of backward compatibility---one of the major complaints about fat pointers.
Published Artifact
Artifacts Available
Artifacts Functional
Modular Component-Based Quantum Circuit Synthesis
Chan Gu Kang
and Hakjoo Oh
(Korea University, South Korea)
In this article, we present a novel method for synthesizing quantum circuits from user-supplied components. Given input-output state vectors and component quantum gates, our synthesizer aims to construct a quantum circuit that implements the provided functionality in terms of the supplied component gates. To achieve this, we basically use an enumerative search with pruning. To accelerate the procedure, however, we perform the search and pruning at the module level; instead of simply enumerating candidate circuits by appending component gates in sequence, we stack modules, which are groups of gate operations.
With this modular approach, we can effectively reduce the search space by directing the search in a way that bridges the gap between the current circuit and the input-output specification.
Evaluation on 17 benchmark problems shows that our technique is highly effective at synthesizing quantum circuits. Our method successfully synthesized 16 out of 17 benchmark circuits in 96.6 seconds on average. On the other hand, the conventional, gate-level synthesis algorithm succeeded in 10 problems with an average time of 639.1 seconds. Our algorithm increased the speed of the baseline by 20.3x for the 10 problems commonly solved by both approaches.
Published Artifact
Artifacts Available
Artifacts Reusable
A Verification Methodology for the Arm® Confidential Computing Architecture: From a Secure Specification to Safe Implementations
Anthony C. J. Fox
, Gareth Stockwell
, Shale Xiong
, Hanno Becker
, Dominic P. Mulligan
, Gustavo Petri
, and Nathan Chong
(ARM, UK; Amazon Web Services, UK; Amazon Web Services, USA)
We present Arm's efforts in verifying the specification and prototype reference implementation of the Realm Management Monitor (RMM), an essential firmware component of Arm Confidential Computing Architecture (Arm CCA), the recently-announced Confidential Computing technologies incorporated in the Armv9-A architecture. Arm CCA introduced the Realm Management Extension (RME), an architectural extension for Armv9-A, and a technology that will eventually be deployed in hundreds of millions of devices. Given the security-critical nature of the RMM, and its taxing threat model, we use a combination of interactive theorem proving, model checking, and concurrency-aware testing to validate and verify security and safety properties of both the specification and a prototype implementation of the RMM. Crucially, our verification efforts were, and are still being, developed and refined contemporaneously with active development of both specification and implementation, and have been adopted by Arm's product teams.
We describe our major achievements, realized through the application of formal techniques, as well as challenges that remain for future work. We believe that the work reported in this paper is the most thorough application of formal techniques to the design and implementation of any current commercially-viable Confidential Computing implementation, setting a new high-water mark for work in this area.
Artifacts Reusable
Compositional Security Definitions for Higher-Order Where Declassification
Jan Menz
, Andrew K. Hirsch
, Peixuan Li
, and
Deepak Garg
(MPI-SWS, Germany; University at Buffalo, USA; Pennsylvania State University, USA)
To ensure programs do not leak private data, we often want to be able to provide formal guarantees ensuring such data is handled correctly. Often, we cannot keep such data secret entirely; instead programmers specify how private data may be declassified. While security definitions for declassification exist, they mostly do not handle higher-order programs. In fact, in the higher-order setting no compositional security definition exists for intensional information-flow properties such as where declassification, which allows declassification in specific parts of a program. We use logical relations to build a model (and thus security definition) of where declassification. The key insight required for our model is that we must stop enforcing indistinguishability once a relevant declassification has occurred. We show that the resulting security definition provides more security than the most related previous definition, which is for the lower-order setting.
Deep Learning Robustness Verification for Few-Pixel Attacks
Yuval Shapira
, Eran Avneri
, and Dana Drachsler-Cohen
(Technion, Israel)
While successful, neural networks have been shown to be vulnerable to adversarial example attacks. In L_{0} adversarial attacks, also known as few-pixel attacks, the attacker picks t pixels from the image and arbitrarily perturbs them. To understand the robustness level of a network to these attacks, it is required to check the robustness of the network to perturbations of every set of t pixels. Since the number of sets is exponentially large, existing robustness verifiers, which can reason about a single set of pixels at a time, are impractical for L_{0} robustness verification. We introduce Calzone, an L_{0} robustness verifier for neural networks. To the best of our knowledge, Calzone is the first to provide a sound and complete analysis for L_{0} adversarial attacks. Calzone builds on the following observation: if a classifier is robust to any perturbation of a set of k pixels, for k>t, then it is robust to any perturbation of its subsets of size t. Thus, to reduce the verification time, Calzone predicts the largest k that can be proven robust, via dynamic programming and sampling. It then relies on covering designs to compute a covering of the image with sets of size k. For each set in the covering, Calzone submits its corresponding box neighborhood to an existing L_{∞} robustness verifier. If a set’s neighborhood is not robust, Calzone repeats this process and covers this set with sets of size k′<k. We evaluate Calzone on several datasets and networks, for t≤ 5. Typically, Calzone verifies L_{0} robustness within few minutes. On our most challenging instances (e.g., t=5), Calzone completes within few hours. We compare to a MILP baseline and show that it does not scale already for t=3.
Proof Automation for Linearizability in Separation Logic
Ike Mulder and
Robbert Krebbers
(Radboud University Nijmegen, Netherlands)
Recent advances in concurrent separation logic enabled the formal verification of increasingly sophisticated fine-grained (i.e., lock-free) concurrent programs. For such programs, the golden standard of correctness is linearizability, which expresses that concurrent executions always behave as some valid sequence of sequential executions. Compositional approaches to linearizability (such as contextual refinement and logical atomicity) make it possible to prove linearizability of whole programs or compound data structures (e.g., a ticket lock) using proofs of linearizability of their individual components (e.g., a counter). While powerful, these approaches are also laborious—state-of-the-art tools such as Iris, FCSL, and Voila all require a form of interactive proof.
This paper develops proof automation for contextual refinement and logical atomicity in Iris. The key ingredient of our proof automation is a collection of proof rules whose application is directed by both the program and the logical state. This gives rise to effective proof search strategies that can prove linearizability of simple examples fully automatically. For more complex examples, we ensure the proof automation cooperates well with interactive proof tactics by minimizing the use of backtracking.
We implement our proof automation in Coq by extending and generalizing Diaframe, a proof automation extension for Iris. While the old version (Diaframe 1.0) was limited to ordinary Hoare triples, the new version (Diaframe 2.0) is extensible in its support for program verification styles: our proof search strategies for contextual refinement and logical atomicity are implemented as modules for Diaframe 2.0. We evaluate our proof automation on a set of existing benchmarks and novel proofs, showing that it provides significant reduction of proof work for both approaches to linearizability.
Published Artifact
Artifacts Available
Artifacts Reusable
Regular Expression Matching using Bit Vector Automata
Alexis Le Glaunec
,
Lingkun Kong , and Konstantinos Mamouras
(Rice University, USA)
Regular expressions (regexes) are ubiquitous in modern software. There is a variety of implementation techniques for regex matching, which can be roughly categorized as (1) relying on backtracking search, or (2) being based on finite-state automata. The implementations that use backtracking are often chosen due to their ability to support advanced pattern-matching constructs. Unfortunately, they are known to suffer from severe performance problems. For some regular expressions, the running time for matching can be exponential in the size of the input text. In order to provide stronger guarantees of matching efficiency, automata-based regex matching is the preferred choice. However, even these regex engines may exhibit severe performance degradation for some patterns. The main reason for this is that regexes used in practice are not exclusively built from the classical regular constructs, i.e., concatenation, nondeterministic choice and Kleene's star. They involve additional constructs that provide succinctness and convenience of expression. The most common such construct is bounded repetition (also called counting), which describes the repetition of the pattern a fixed number of times.
In this paper, we propose a new algorithm for the efficient matching of regular expressions that involve bounded repetition. Our algorithms are based on a new model of automata, which we call nondeterministic bit vector automata (NBVA). This model is chosen to be expressively equivalent to nondeterministic counter automata with bounded counters, a very natural model for expressing patterns with bounded repetition. We show that there is a class of regular expressions with bounded repetition that can be matched in time that is independent from the repetition bounds. Our algorithms are general enough to cover the vast majority of challenging bounded repetitions that arise in practice. We provide an implementation of our approach in a regex engine, which we call BVA-Scan. We compare BVA-Scan against state-of-the-art regex engines on several real datasets.
Artifacts Functional
Outcome Logic: A Unifying Foundation for Correctness and Incorrectness Reasoning
Noam Zilberstein
,
Derek Dreyer , and Alexandra Silva
(Cornell University, USA; MPI-SWS, Germany)
Program logics for bug-finding (such as the recently introduced Incorrectness Logic) have framed correctness and incorrectness as dual concepts requiring different logical foundations. In this paper, we argue that a single unified theory can be used for both correctness and incorrectness reasoning. We present Outcome Logic (OL), a novel generalization of Hoare Logic that is both monadic (to capture computational effects) and monoidal (to reason about outcomes and reachability). OL expresses true positive bugs, while retaining correctness reasoning abilities as well. To formalize the applicability of OL to both correctness and incorrectness, we prove that any false OL specification can be disproven in OL itself. We also use our framework to reason about new types of incorrectness in nondeterministic and probabilistic programs. Given these advances, we advocate for OL as a new foundational theory of correctness and incorrectness.
Aliasing Limits on Translating C to Safe Rust
Mehmet Emre
, Peter Boyland
, Aesha Parekh
, Ryan Schroeder
, Kyle Dewey
, and Ben Hardekopf
(University of San Francisco, USA; University of California at Santa Barbara, USA; California State University, Northridge, USA)
The Rust language was created to provide safe low-level systems programming. There is both industrial and academic interest in the problem of (semi-)automatically translating C code to Rust in order to exploit Rust's safety guarantees. We study the effectiveness and limitations of existing techniques for automatically translating unsafe raw pointers (in Rust programs translated from C) into safe Rust references via ownership and lifetime inference. Our novel evaluation methodology enables our study to extend beyond prior studies, and to discover new information contradicting the conclusions of prior studies. We find that existing translation methods are severely limited by a lack of precision in the Rust compiler's safety checker, causing many safe pointer manipulations to be labeled as potentially unsafe. Leveraging this information, we propose methods for improving translation, based on encoding the results of a more precise analysis in a manner that is understandable to an unmodified Rust compiler. We implement one of our proposed methods, increasing the number of pointers that can be translated to safe Rust references by 75% over the baseline (from 12% to 21% of all pointers).
Published Artifact
Artifacts Available
Artifacts Functional
Automated Translation of Functional Big Data Queries to SQL
Guoqiang Zhang
, Benjamin Mariano
,
Xipeng Shen , and Işıl Dillig
(North Carolina State University, USA; University of Texas at Austin, USA)
Big data analytics frameworks like Apache Spark and Flink enable users to implement queries over large, distributed databases using functional APIs. In recent years, these APIs have grown in popularity because their functional interfaces abstract away much of the minutiae of distributed programming required by traditional query languages like SQL. However, the convenience of these APIs comes at a cost because functional queries are often less efficient than their SQL counterparts. Motivated by this observation, we present a new technique for automatically transpiling functional queries to SQL. While our approach is based on the standard paradigm of counterexample-guided inductive synthesis, it uses a novel column-wise decomposition technique to split the synthesis task into smaller subquery synthesis problems. We have implemented this approach as a new tool called RDD2SQL for translating Spark RDD queries to SQL and empirically evaluate the effectiveness of RDD2SQL on a set of real-world RDD queries. Our results show that (1) most RDD queries can be translated to SQL, (2) our tool is very effective at automating this translation, and (3) performing this translation offers significant performance benefits.
Archive submitted (880 kB)
Live Pattern Matching with Typed Holes
Yongwei Yuan
, Scott Guest
, Eric Griffis
, Hannah Potter
, David Moon
, and
Cyrus Omar
(Purdue University, USA; University of Michigan, USA; University of Washington, USA)
Several modern programming systems, including GHC Haskell, Agda, Idris, and Hazel, support typed holes. Assigning static and, to varying degree, dynamic meaning to programs with holes allows program editors and other tools to offer meaningful feedback and assistance throughout editing, i.e. in a live manner. Prior work, however, has considered only holes appearing in expressions and types. This paper considers, from type theoretic and logical first principles, the problem of typed pattern holes. We confront two main difficulties, (1) statically reasoning about exhaustiveness and irredundancy when patterns are not fully known, and (2) live evaluation of expressions containing both pattern and expression holes. In both cases, this requires reasoning conservatively about all possible hole fillings. We develop a typed lambda calculus, Peanut, where reasoning about exhaustiveness and redundancy is mapped to the problem of deriving first order entailments. We equip Peanut with an operational semantics in the style of Hazelnut Live that allows us to evaluate around holes in both expressions and patterns. We mechanize the metatheory of Peanut in Agda and formalize a procedure capable of deciding the necessary entailments. Finally, we scale up and implement these mechanisms within Hazel, a programming environment for a dialect of Elm that automatically inserts holes during editing to provide static and dynamic feedback to the programmer in a maximally live manner, i.e. for every possible editor state. Hazel is the first maximally live environment for a general-purpose functional language.
Published Artifact
Artifacts Available
Artifacts Reusable
Pushing the Limit of 1-Minimality of Language-Agnostic Program Reduction
Zhenyang Xu
, Yongqiang Tian
, Mengxiao Zhang
, Gaosen Zhao
, Yu Jiang
, and Chengnian Sun
(University of Waterloo, Canada; Tsinghua University, China)
Program reduction has demonstrated its usefulness in facilitating debugging language implementations in practice, by minimizing bug-triggering programs. There are two categories of program reducers: language-agnostic program reducers (AGRs) and language-specific program reducers (SPRs). AGRs, such as HDD and Perses, are generally applicable to various languages; SPRs are specifically designed for one language with meticulous thoughts and significant engineering efforts, e.g., C-Reduce for reducing C/C++ programs.
Program reduction is an NP-complete problem: finding the globally minimal program is usually infeasible. Thus all existing program reducers resort to producing 1-minimal results, a special type of local minima. However, 1-minimality can still be large and contain excessive bug-irrelevant program elements. This is especially the case for AGR-produced results because of the generic reduction algorithms used in AGRs. An SPR often yields smaller results than AGRs for the language for which the SPR has customized reduction algorithms. But SPRs are not language-agnostic, and implementing a new SPR for a different language requires significant engineering efforts.
This paper proposes Vulcan, a language-agnostic framework to further minimize AGRs-produced results by exploiting the formal syntax of the language to perform aggressive program transformations, in hope of creating reduction opportunities for other reduction algorithms to progress or even directly deleting bugirrelevant elements from the results. Our key insights are two-fold. First, the program transformations in all existing program reducers including SPRs are not diverse enough, which traps these program reducers early in 1-minimality. Second, compared with the original program, the results of AGRs are much smaller, and time-wise it is affordable to perform diverse program transformations that change programs but do not necessarily reduce the sizes of the programs directly. Within the Vulcan framework, we proposed three simple examples of fine-grained program transformations to demonstrate that Vulcan can indeed further push the 1-minimality of AGRs. By performing these program transformations, a 1-minimal program might become a non-1-minimal one that can be further reduced later.
Our extensive evaluations on multilingual benchmarks including C, Rust and SMT-LIBv2 programs strongly demonstrate the effectiveness and generality of Vulcan. Vulcan outperforms the state-of-the-art language-agnostic program reducer Perses in size in all benchmarks: On average, the result of Vulcan contains 33.55%, 21.61%, and 31.34% fewer tokens than that of Perses on C, Rust, and SMT-LIBv2 subjects respectively. Vulcan can produce even smaller results if more reduction time is allocated. Moreover, for the C programs that are reduced by C-Reduce, Vulcan is even able to further minimize them by 10.07%.
Exact Recursive Probabilistic Programming
David Chiang
, Colin McDonald
, and Chung-chieh Shan
(University of Notre Dame, USA; Indiana University, USA)
Recursive calls over recursive data are useful for generating probability distributions, and probabilistic programming allows computations over these distributions to be expressed in a modular and intuitive way. Exact inference is also useful, but unfortunately, existing probabilistic programming languages do not perform exact inference on recursive calls over recursive data, forcing programmers to code many applications manually. We introduce a probabilistic language in which a wide variety of recursion can be expressed naturally, and inference carried out exactly. For instance, probabilistic pushdown automata and their generalizations are easy to express, and polynomial-time parsing algorithms for them are derived automatically. We eliminate recursive data types using program transformations related to defunctionalization and refunctionalization. These transformations are assured correct by a linear type system, and a successful choice of transformations, if there is one, is guaranteed to be found by a greedy algorithm.
Published Artifact
Archive submitted (520 kB)
Info
Artifacts Available
Artifacts Functional
Lower Bounds for Possibly Divergent Probabilistic Programs
Shenghua Feng
, Mingshuai Chen
, Han Su
, Benjamin Lucien Kaminski
, Joost-Pieter Katoen
, and Naijun Zhan
(Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Zhejiang University, China; Saarland University, Germany; University College London, UK; RWTH Aachen University, Germany)
We present a new proof rule for verifying lower bounds on quantities of probabilistic programs. Our proof rule is not confined to almost-surely terminating programs -- as is the case for existing rules -- and can be used to establish non-trivial lower bounds on, e.g., termination probabilities and expected values, for possibly divergent probabilistic loops, e.g., the well-known three-dimensional random walk on a lattice.
Algebro-geometric Algorithms for Template-Based Synthesis of Polynomial Programs
Amir Kafshdar Goharshady
, S. Hitarth
, Fatemeh Mohammadi
, and Harshit Jitendra Motwani
(Hong Kong University of Science and Technology, Hong Kong; KU Leuven, Belgium; Ghent University, Belgium)
Template-based synthesis, also known as sketching, is a localized approach to program synthesis in which the programmer provides not only a specification, but also a high-level "sketch" of the program. The sketch is basically a partial program that models the general intuition of the programmer, while leaving the low-level details as unimplemented "holes". The role of the synthesis engine is then to fill in these holes such that the completed program satisfies the desired specification. In this work, we focus on template-based synthesis of polynomial imperative programs with real variables, i.e. imperative programs in which all expressions appearing in assignments, conditions and guards are polynomials over program variables. While this problem can be solved in a sound and complete manner by a reduction to the first-order theory of the reals, the resulting formulas will contain a quantifier alternation and are extremely hard for modern SMT solvers, even when considering toy programs with a handful of lines. Moreover, the classical algorithms for quantifier elimination are notoriously unscalable and not at all applicable to this use-case.
In contrast, our main contribution is an algorithm, based on several well-known theorems in polyhedral and real algebraic geometry, namely Putinar's Positivstellensatz, the Real Nullstellensatz, Handelman's Theorem and Farkas' Lemma, which sidesteps the quantifier elimination difficulty and reduces the problem directly to Quadratic Programming (QP). Alternatively, one can view our algorithm as an efficient way of eliminating quantifiers in the particular formulas that appear in the synthesis problem. The resulting QP instances can then be handled quite easily by SMT solvers. Notably, our reduction to QP is sound and semi-complete, i.e. it is complete if polynomials of a sufficiently high degree are used in the templates. Thus, we provide the first method for sketching-based synthesis of polynomial programs that does not sacrifice completeness, while being scalable enough to handle meaningful programs. Finally, we provide experimental results over a variety of examples from the literature.
Published Artifact
Artifacts Available
Artifacts Reusable
Randomized Testing of Byzantine Fault Tolerant Algorithms
Levin N. Winter
, Florena Buse
, Daan de Graaf
, Klaus von Gleissenthall
, and Burcu Kulahcioglu Ozkan
(Delft University of Technology, Netherlands; Vrije Universiteit Amsterdam, Netherlands)
Byzantine fault-tolerant algorithms promise agreement on a correct value, even if a subset of processes can deviate from the algorithm arbitrarily. While these algorithms provide strong guarantees in theory, in practice, protocol bugs and implementation mistakes may still cause them to go wrong. This paper introduces ByzzFuzz, a simple yet effective method for automatically finding errors in implementations of Byzantine fault-tolerant algorithms through randomized testing. ByzzFuzz detects fault-tolerance bugs by injecting randomly generated network and process faults into their executions. To navigate the space of possible process faults, ByzzFuzz introduces small-scope message mutations which mutate the contents of the protocol messages by applying small changes to the original message either in value (e.g., by incrementing the round number) or in time (e.g., by repeating a proposal value from a previous message). We find that small-scope mutations, combined with insights from the testing and fuzzing literature, are effective at uncovering protocol logic and implementation bugs in real-world fault-tolerant systems.
We implemented ByzzFuzz and applied it to test the production implementations of two popular blockchain systems, Tendermint and Ripple, and an implementation of the seminal PBFT protocol. ByzzFuzz detected several bugs in the implementation of PBFT, a potential liveness violation in Tendermint, and materialized two theoretically described vulnerabilities in Ripple’s XRP Ledger Consensus Algorithm. Moreover, we discovered a previously unknown fault-tolerance bug in the production implementation of Ripple, which is confirmed by the developers and fixed.
Published Artifact
Artifacts Available
Artifacts Reusable
Verification-Preserving Inlining in Automatic Separation Logic Verifiers
Thibault Dardinier , Gaurav Parthasarathy
, and Peter Müller
(ETH Zurich, Switzerland)
Bounded verification has proved useful to detect bugs and to increase confidence in the correctness of a program. In contrast to unbounded verification, reasoning about calls via (bounded) inlining and about loops via (bounded) unrolling does not require method specifications and loop invariants and, therefore, reduces the annotation overhead to the bare minimum, namely specifications of the properties to be verified. For verifiers based on traditional program logics, verification is preserved by inlining (and unrolling): successful unbounded verification of a program w.r.t. some annotation implies successful verification of the inlined program. That is, any error detected in the inlined program reveals a true error in the original program. However, this essential property might not hold for automatic separation logic verifiers such as Caper, GRASShopper, RefinedC, Steel, VeriFast, and verifiers based on Viper. In this setting, inlining generally changes the resources owned by method executions, which may affect automatic proof search algorithms and introduce spurious errors.
In this paper, we present the first technique for verification-preserving inlining in automatic separation logic verifiers. We identify a semantic condition on programs and prove in Isabelle/HOL that it ensures verification-preserving inlining for state-of-the-art automatic separation logic verifiers. We also prove a dual result: successful verification of the inlined program ensures that there are method and loop annotations that enable the verification of the original program for bounded executions. To check our semantic condition automatically, we present two approximations that can be checked syntactically and with a program verifier, respectively. We implement these checks in Viper and demonstrate that they are effective for non-trivial examples from different verifiers.
Published Artifact
Artifacts Available
Artifacts Functional
Improving Oracle-Guided Inductive Synthesis by Efficient Question Selection
Ruyi Ji
, Chaozhe Kong
, Yingfei Xiong
, and Zhenjiang Hu
(Peking University, China)
Oracle-guided inductive synthesis (OGIS) is a widely-used framework to apply program synthesis techniques in practice. The question selection problem aims at reducing the number of iterations in OGIS by selecting a proper input for each OGIS iteration. Theoretically, a question selector can generally improve the performance of OGIS solvers on both interactive and non-interactive tasks if it is not only effective for reducing iterations but also efficient. However, all existing effective question selectors fail in satisfying the requirement of efficiency. To ensure effectiveness, they convert the question selection problem into an optimization one, which is difficult to solve within a short time.
In this paper, we propose a novel question selector, named LearnSy. LearnSy is both efficient and effective and thus achieves general improvement for OGIS solvers for the first time. Since we notice that the optimization tasks in previous studies are difficult because of the complex behavior of operators, we estimate these behaviors in LearnSy as simple random events. Subsequently, we provide theoretical results for the precision of this estimation and design an efficient algorithm for its calculation.
According to our evaluation, when dealing with interactive tasks, LearnSy can offer competitive performance compared to existing selectors while being more efficient and more general. Moreover, when working on non-interactive tasks, LearnSy can generally reduce the time cost of existing CEGIS solvers by up to 43.0%.
Published Artifact
Artifacts Available
Artifacts Functional
Back to Direct Style: Typed and Tight
Marius Müller
, Philipp Schuster
,
Jonathan Immanuel Brachthäuser , and Klaus Ostermann
(University of Tübingen, Germany)
Translating programs into continuation-passing style is a well-studied
tool to explicitly deal with the control structure of programs. This is
useful, for example, for compilation.
In a typed setting, there also is a logical interpretation of such a translation
as an embedding of classical logic into intuitionistic logic.
A naturally arising question is whether there is an inverse translation
back to direct style. The answer to this question depends on how the
continuation-passing translation is defined and on the domain of the inverse translation.
In general, translating programs from continuation-passing style back to direct
style requires the use of control operators to account for the use of
continuations in non-trivial ways.
We present two languages, one in direct style and one in continuation-passing
style. Both languages are typed and equipped with an abstract machine semantics.
Moreover, both languages allow for non-trivial control flow.
We further present a translation to continuation-passing style and a translation
back to direct style. We show that both translations are type-preserving and
also preserve semantics in a very precise way giving an operational
correspondence between the two languages.
Moreover, we show that the compositions of the translations are well-behaved.
In particular, they are syntactic one-sided inverses on the full language and full
syntactic inverses when restricted to trivial control flow.
proc time: 9.13