Proceedings of the ACM on Programming Languages, Volume 3, Number OOPSLA
Powered by
Conference Publishing Consulting

Proceedings of the ACM on Programming Languages, Volume 3, Number OOPSLA, October 23–25, 2019, Athens, Greece

OOPSLA – Journal Issue

Contents - Abstracts - Authors


Title Page

Editorial Message
The Proceedings of the ACM series presents the highest quality research conducted in diverse areas of computer science, as represented by the ACM Special Interest Groups (SIGs). The ACM Proceedings of the ACM on Programming Languages (PACMPL) focuses on research on all aspects of programming languages, from design to implementation and from mathematical formalisms to empirical studies. The journal operates in close collaboration with the Special Interest Group on Programming Languages (SIGPLAN) and is committed to making high-quality peer-reviewed scientific research in programming languages free of restrictions on both access and use.


CLOTHO: Directed Test Generation for Weakly Consistent Database Systems
Kia Rahmani, Kartik Nagar, Benjamin Delaware, and Suresh Jagannathan
(Purdue University, USA)
Relational database applications are notoriously difficult to test and debug. Concurrent execution of database transactions may violate complex structural invariants that constraint how changes to the contents of one (shared) table affect the contents of another. Simplifying the underlying concurrency model is one way to ameliorate the difficulty of understanding how concurrent accesses and updates can affect database state with respect to these sophisticated properties. Enforcing serializable execution of all transactions achieves this simplification, but it comes at a significant price in performance, especially at scale, where database state is often replicated to improve latency and availability.
To address these challenges, this paper presents a novel testing framework for detecting serializability violations in (SQL) database-backed Java applications executing on weakly-consistent storage systems. We manifest our approach in a tool, CLOTHO, that combines a static analyzer and model checker to generate abstract executions, discover serializability violations in these executions, and translate them back into concrete test inputs suitable for deployment in a test environment. To the best of our knowledge, CLOTHO, is the first automated test generation facility for identifying serializability anomalies of Java applications intended to operate in geo-replicated distributed environments. An experimental evaluation on a set of industry-standard benchmarks demonstrates the utility of our approach.

Publisher's Version Article Search Info
Probabilistic Verification of Fairness Properties via Concentration
Osbert Bastani, Xin Zhang, and Armando Solar-Lezama
(University of Pennsylvania, USA; Massachusetts Institute of Technology, USA)
As machine learning systems are increasingly used to make real world legal and financial decisions, it is of paramount importance that we develop algorithms to verify that these systems do not discriminate against minorities. We design a scalable algorithm for verifying fairness specifications. Our algorithm obtains strong correctness guarantees based on adaptive concentration inequalities; such inequalities enable our algorithm to adaptively take samples until it has enough data to make a decision. We implement our algorithm in a tool called VeriFair, and show that it scales to large machine learning models, including a deep recurrent neural network that is more than five orders of magnitude larger than the largest previously-verified neural network. While our technique only gives probabilistic guarantees due to the use of random samples, we show that we can choose the probability of error to be extremely small.

Publisher's Version Article Search
Verifying Safety and Accuracy of Approximate Parallel Programs via Canonical Sequentialization
Vimuth Fernando, Keyur Joshi, and Sasa Misailovic
(University of Illinois at Urbana-Champaign, USA)
We present Parallely, a programming language and a system for verification of approximations in parallel message-passing programs. Parallely's language can express various software and hardware level approximations that reduce the computation and communication overheads at the cost of result accuracy.
Parallely's safety analysis can prove the absence of deadlocks in approximate computations and its type system can ensure that approximate values do not interfere with precise values. Parallely's quantitative accuracy analysis can reason about the frequency and magnitude of error. To support such analyses, Parallely presents an approximation-aware version of canonical sequentialization, a recently proposed verification technique that generates sequential programs that capture the semantics of well-structured parallel programs (i.e., ones that satisfy a symmetric nondeterminism property). To the best of our knowledge, Parallely is the first system designed to analyze parallel approximate programs.
We demonstrate the effectiveness of Parallely on eight benchmark applications from the domains of graph analytics, image processing, and numerical analysis. We also encode and study five approximation mechanisms from literature. Our implementation of Parallely automatically and efficiently proves type safety, reliability, and accuracy properties of the approximate benchmarks.

Publisher's Version Article Search
Qubit Allocation as a Combination of Subgraph Isomorphism and Token Swapping
Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Caroline Collange, and Fernando Magno Quintão Pereira
(Federal University of Minas Gerais, Brazil; Inria, France; University of Rennes, France; CNRS, France; IRISA, France)
In 2016, the first quantum processors have been made available to the general public. The possibility of programming an actual quantum device has elicited much enthusiasm. Yet, such possibility also brought challenges. One challenge is the so called Qubit Allocation problem: the mapping of a virtual quantum circuit into an actual quantum architecture. There exist solutions to this problem; however, in our opinion, they fail to capitalize on decades of improvements on graph theory. In contrast, this paper shows how to model qubit allocation as the combination of Subgraph Isomorphism and Token Swapping. This idea has been made possible by the publication of an approximative solution to the latter problem in 2016. We have compared our algorithm against five other qubit allocators, all independently designed in the last two years, including the winner of the IBM Challenge. When evaluated in "Tokyo", a quantum architecture with 20 qubits, our technique outperforms these state-of-the-art approaches in terms of the quality of the solutions that it finds and the amount of memory that it uses, while showing practical runtime.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
Modular Verification of Heap Reachability Properties in Separation Logic
Arshavir Ter-Gabrielyan, Alexander J. Summers, and Peter Müller
(ETH Zurich, Switzerland)
The correctness of many algorithms and data structures depends on reachability properties, that is, on the existence of chains of references between objects in the heap. Reasoning about reachability is difficult for two main reasons. First, any heap modification may affect an unbounded number of reference chains, which complicates modular verification, in particular, framing. Second, general graph reachability is not supported by first-order SMT solvers, which impedes automatic verification.
In this paper, we present a modular specification and verification technique for reachability properties in separation logic. For each method, we specify reachability only locally within the fragment of the heap on which the method operates. We identify relative convexity, a novel relation between the heap fragments of a client and a callee, which enables (first-order) reachability framing, that is, extending reachability properties from the heap fragment of a callee to the larger fragment of its client, enabling precise procedure-modular reasoning. Our technique supports practically important heap structures, namely acyclic graphs with a bounded outdegree as well as (potentially cyclic) graphs with at most one path (modulo cycles) between each pair of nodes. The integration into separation logic allows us to reason about reachability and other properties in a uniform way, to verify concurrent programs, and to automate our technique via existing separation logic verifiers. We demonstrate that our verification technique is amenable to SMT-based verification by encoding a number of benchmark examples into the Viper verification infrastructure.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
Complete Monitors for Gradual Types
Ben Greenman, Matthias Felleisen, and Christos Dimoulas
(Northeastern University, USA; Northwestern University, USA)
In the context of gradual typing, type soundness guarantees the safety of typed code. When untyped code fails to respect types, a runtime check finds the discrepancy. As for untyped code, type soundness makes no promises; it does not protect untyped code from mistakes in type specifications and unwarranted blame.
To address the asymmetry, this paper adapts complete monitoring from the contract world to gradual typing. Complete monitoring strengthens plain soundness into a guarantee that catches problems with faulty type specifications. Furthermore, a semantics that satisfies complete monitoring can easily pinpoint the conflict between a type specification and a value. For gradual typing systems that fail complete monitoring, the technical framework provides a source-of-truth to assess the quality of blame.

Publisher's Version Article Search
TLA+ Model Checking Made Symbolic
Igor Konnov, Jure Kukovec, and Thanh-Hai Tran
(Inria, France; LORIA, France; University of Lorraine, France; CNRS, France; TU Vienna, Austria)
TLA+ is a language for formal specification of all kinds of computer systems. System designers use this language to specify concurrent, distributed, and fault-tolerant protocols, which are traditionally presented in pseudo-code. TLA+ is extremely concise yet expressive: The language primitives include Booleans, integers, functions, tuples, records, sequences, and sets thereof, which can be also nested. This is probably why the only model checker for TLA+ (called TLC) relies on explicit enumeration of values and states.
In this paper, we present APALACHE -- a first symbolic model checker for TLA+. Like TLC, it assumes that all specification parameters are fixed and all states are finite structures. Unlike TLC, APALACHE translates the underlying transition relation into quantifier-free SMT constraints, which allows us to exploit the power of SMT solvers. Designing this translation is the central challenge that we address in this paper. Our experiments show that APALACHE outperforms TLC on examples with large state spaces.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Value-Centric Dynamic Partial Order Reduction
Krishnendu Chatterjee, Andreas Pavlogiannis, and Viktor Toman
(IST Austria, Austria; EPFL, Switzerland)
The verification of concurrent programs remains an open challenge, as thread interaction has to be accounted for, which leads to state-space explosion. Stateless model checking battles this problem by exploring traces rather than states of the program. As there are exponentially many traces, dynamic partial-order reduction (DPOR) techniques are used to partition the trace space into equivalence classes, and explore a few representatives from each class. The standard equivalence that underlies most DPOR techniques is the happens-before equivalence, however recent works have spawned a vivid interest towards coarser equivalences. The efficiency of such approaches is a product of two parameters: (i) the size of the partitioning induced by the equivalence, and (ii) the time spent by the exploration algorithm in each class of the partitioning.
In this work, we present a new equivalence, called value-happens-before and show that it has two appealing features. First, value-happens-before is always at least as coarse as the happens-before equivalence, and can be even exponentially coarser. Second, the value-happens-before partitioning is efficiently explorable when the number of threads is bounded. We present an algorithm called value-centric DPOR (VCDPOR), which explores the underlying partitioning using polynomial time per class. Finally, we perform an experimental evaluation of VCDPOR on various benchmarks, and compare it against other state-of-the-art approaches. Our results show that value-happens-before typically induces a significant reduction in the size of the underlying partitioning, which leads to a considerable reduction in the running time for exploring the whole partitioning.

Publisher's Version Article Search
Seq: A High-Performance Language for Bioinformatics
Ariya Shajii, Ibrahim Numanagić, Riyadh Baghdadi, Bonnie Berger, and Saman Amarasinghe
(Massachusetts Institute of Technology, USA)
The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100—a factor of over 106—and the amount of data to be analyzed has increased proportionally. Yet, as Moore’s Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers are primarily focused on biological analysis over computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines.
Here, we introduce Seq, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq starts with a subset of Python—and is in many cases a drop-in replacement—yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks like reverse complementation, k-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160× improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.

Publisher's Version Article Search Info Artifacts Available Artifacts Functional
Staged Abstract Interpreters: Fast and Modular Whole-Program Analysis via Meta-programming
Guannan Wei, Yuxuan Chen, and Tiark Rompf
(Purdue University, USA)
It is well known that a staged interpreter is a compiler: specializing an interpreter to a given program produces an equivalent executable that runs faster. This connection is known as the first Futamura projection. It is even more widely known that an abstract interpreter is a program analyzer: tweaking an interpreter to run on abstract domains produces a sound static analysis. What happens when we combine these two ideas, and apply specialization to an abstract interpreter?
In this paper, we present a unifying framework that naturally extends the first Futamura projection from concrete interpreters to abstract interpreters. Our approach derives a sound staged abstract interpreter based on a generic interpreter with type-level binding-time abstractions and monadic abstractions. By using different instantiations of these abstractions, the generic interpreter can flexibly behave in one of four modes: as an unstaged concrete interpreter, a staged concrete interpreter, an unstaged abstract interpreter, or a staged abstract interpreter. As an example of abstraction without regret, the overhead of these abstraction layers is eliminated in the generated code after staging. We show that staging abstract interpreters is practical and useful to optimize static analysis, all while requiring less engineering effort and without compromising soundness. We conduct three case studies, including a comparison with Boucher and Feeley’s abstract compilation, applications to various control-flow analyses, and a demonstration for modular analysis. We also empirically evaluate the effect of staging on the execution time. The experiment shows an order of magnitude speedup with staging for control-flow analyses.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Derivative Grammars: A Symbolic Approach to Parsing with Derivatives
Ian Henriksen, Gianfranco Bilardi, and Keshav Pingali
(University of Texas at Austin, USA; University of Padua, Italy)
We present a novel approach to context-free grammar parsing that is based on generating a sequence of grammars called derivative grammars from a given context-free grammar and input string. The generation of the derivative grammars is described by a few simple inference rules. We present an O(n2) space and O(n3) time recognition algorithm, which can be extended to generate parse trees in O(n3) time and O(n2logn) space. Derivative grammars can be viewed as a symbolic approach to implementing the notion of derivative languages, which was introduced by Brzozowski.
Might and others have explored an operational approach to implementing derivative languages in which the context-free grammar is encoded as a collection of recursive algebraic data types in a functional language like Haskell. Functional language implementation features like knot-tying and lazy evaluation are exploited to ensure that parsing is done correctly and efficiently in spite of complications like left-recursion. In contrast, our symbolic approach using inference rules can be implemented easily in any programming language and we obtain better space bounds for parsing.
Reifying derivative languages by encoding them symbolically as grammars also enables formal connections to be made for the first time between the derivatives approach and classical parsing methods like the Earley and LL/LR parsers. In particular, we show that the sets of Earley items maintained by the Earley parser implicitly encode derivative grammars and we give a procedure for producing derivative grammars from these sets. Conversely, we show that our derivative grammar recognizer can be transformed into the Earley recognizer by optimizing some of its bookkeeping. These results suggest that derivative grammars may provide a new foundation for context-free grammar recognition and parsing.

Publisher's Version Article Search
Efficient Lock-Free Durable Sets
Yoav Zuriel, Michal Friedman, Gali Sheffi, Nachshon Cohen, and Erez Petrank
(Technion, Israel; Amazon, Israel)
Non-volatile memory is expected to co-exist or replace DRAM in upcoming architectures. Durable concurrent data structures for non-volatile memories are essential building blocks for constructing adequate software for use with these architectures. In this paper, we propose a new approach for durable concurrent sets and use this approach to build the most efficient durable hash tables available today. Evaluation shows a performance improvement factor of up to 3.3x over existing technology.

Publisher's Version Article Search
Modular Verification for Almost-Sure Termination of Probabilistic Programs
Mingzhang Huang, Hongfei Fu, Krishnendu Chatterjee, and Amir Kafshdar Goharshady
(Shanghai Jiao Tong University, China; East China Normal University, China; IST Austria, Austria)
In this work, we consider the almost-sure termination problem for probabilistic programs that asks whether a given probabilistic program terminates with probability 1. Scalable approaches for program analysis often rely on modularity as their theoretical basis. In non-probabilistic programs, the classical variant rule (V-rule) of Floyd-Hoare logic provides the foundation for modular analysis. Extension of this rule to almost-sure termination of probabilistic programs is quite tricky, and a probabilistic variant was proposed by Fioriti and Hermanns in POPL 2015. While the proposed probabilistic variant cautiously addresses the key issue of integrability, we show that the proposed modular rule is still not sound for almost-sure termination of probabilistic programs.
Besides establishing unsoundness of the previous rule, our contributions are as follows: First, we present a sound modular rule for almost-sure termination of probabilistic programs. Our approach is based on a novel notion of descent supermartingales. Second, for algorithmic approaches, we consider descent supermartingales that are linear and show that they can be synthesized in polynomial time. Finally, we present experimental results on a variety of benchmarks and several natural examples that model various types of nested while loops in probabilistic programs and demonstrate that our approach is able to efficiently prove their almost-sure termination property.

Publisher's Version Article Search Artifacts Available Artifacts Functional
IVT: An Efficient Method for Sharing Subtype Polymorphic Objects
Yu-Ping Wang, Xu-Qiang Hu, Zi-Xin Zou, Wende Tan, and Gang Tan
(Tsinghua University, China; Pennsylvania State University, USA)
Shared memory provides the fastest form of inter-process communication. Sharing polymorphic objects between different address spaces requires solving the issue of sharing pointers. In this paper, we propose a method, named Indexed Virtual Tables (IVT for short), to share polymorphic objects efficiently. On object construction, the virtual table pointers are replaced with indexes, which are used to find the actual virtual table pointers on dynamic dispatch. Only a few addition and load instructions are needed for both operations. Experimental results show that the IVT can outperform prior techniques on both object construction time and dynamic dispatch time. We also apply the proposed IVT technique to several practical scenarios, resulting the improvement of overall performance.

Publisher's Version Article Search
Refinement Kinds: Type-Safe Programming with Practical Type-Level Computation
Luís Caires and Bernardo Toninho
(Nova University of Lisbon, Portugal; NOVA-LINCS, Portugal)
This work introduces the novel concept of kind refinement, which we develop in the context of an explicitly polymorphic ML-like language with type-level computation. Just as type refinements embed rich specifications by means of comprehension principles expressed by predicates over values in the type domain, kind refinements provide rich kind specifications by means of predicates over types in the kind domain. By leveraging our powerful refinement kind discipline, types in our language are not just used to statically classify program expressions and values, but also conveniently manipulated as tree-like data structures, with their kinds refined by logical constraints on such structures. Remarkably, the resulting typing and kinding disciplines allow for powerful forms of type reflection, ad-hoc polymorphism and type-directed meta-programming, which are often found in modern software development, but not typically expressible in a type-safe manner in general purpose languages. We validate our approach both formally and pragmatically by establishing the standard meta-theoretical results of type safety and via a prototype implementation of a kind checker, type checker and interpreter for our language.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Program Synthesis with Algebraic Library Specifications
Benjamin Mariano, Josh Reese, Siyuan Xu, ThanhVu Nguyen, Xiaokang Qiu, Jeffrey S. Foster, and Armando Solar-Lezama
(University of Maryland at College Park, USA; Purdue University, USA; University of Nebraska-Lincoln, USA; Tufts University, USA; Massachusetts Institute of Technology, USA)
A key challenge in program synthesis is synthesizing programs that use libraries, which most real-world software does. The current state of the art is to model libraries with mock library implementations that perform the same function in a simpler way. However, mocks may still be large and complex, and must include many implementation details, both of which could limit synthesis performance. To address this problem, we introduce JLibSketch, a Java program synthesis tool that allows library behavior to be described with algebraic specifications, which are rewrite rules for sequences of method calls, e.g., encryption followed by decryption (with the same key) is the identity. JLibSketch implements rewrite rules by compiling JLibSketch problems into problems for the Sketch program synthesis tool. More specifically, after compilation, library calls are represented by abstract data types (ADTs), and rewrite rules manipulate those ADTs. We formalize compilation and prove it sound and complete if the rewrite rules are ordered and non-unifiable. We evaluated JLibSketch by using it to synthesize nine programs that use libraries from three domains: data structures, cryptography, and file systems. We found that algebraic specifications are, on average, about half the size of mocks. We also found that algebraic specifications perform better than mocks on seven of the nine programs, sometimes significantly so, and perform equally well on the last two programs. Thus, we believe that JLibSketch takes an important step toward synthesis of programs that use libraries.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Weakening WebAssembly
Conrad Watt, Andreas Rossberg, and Jean Pichon-Pharabod
(University of Cambridge, UK; Dfinity Stiftung, Germany)
WebAssembly (Wasm) is a safe, portable virtual instruction set that can be hosted in a wide range of environments, such as a Web browser. It is a low-level language whose instructions are intended to compile directly to bare hardware. While the initial version of Wasm focussed on single-threaded computation, a recent proposal extends it with low-level support for multiple threads and atomic instructions for synchronised access to shared memory. To support the correct compilation of concurrent programs, it is necessary to give a suitable specification of its memory model.
Wasm's language definition is based on a fully formalised specification that carefully avoids undefined behaviour. We present a substantial extension to this semantics, incorporating a relaxed memory model, along with a few proposed extensions. Wasm's memory model is unique in that its linear address space can be dynamically grown during execution, while all accesses are bounds-checked. This leads to the novel problem of specifying how observations about the size of the memory can propagate between threads. We argue that, considering desirable compilation schemes, we cannot give a sequentially consistent semantics to memory growth.
We show that our model provides sequential consistency for data-race-free executions (SC-DRF). However, because Wasm is to run on the Web, we must also consider interoperability of its model with that of JavaScript. We show, by counter-example, that JavaScript's memory model is not SC-DRF, in contrast to what is claimed in its specification. We propose two axiomatic conditions that should be added to the JavaScript model to correct this difference.
We also describe a prototype SMT-based litmus tool which acts as an oracle for our axiomatic model, visualising its behaviours, including memory resizing.

Publisher's Version Article Search
Generating a Fluent API with Syntax Checking from an LR Grammar
Tetsuro Yamazaki, Tomoki Nakamaru, Kazuhiro Ichikawa, and Shigeru Chiba
(University of Tokyo, Japan)
This paper proposes a fluent API generator for Scala, Haskell, and C++. It receives a grammar definition and generates a code skeleton of the library in the host programming language. The generated library is accessed through a chain of method calls; this style of API is called a fluent API. The library uses the host-language type checker to detect an invalid chain of method calls. Each method call is regarded as a lexical token in the embedded domain specific language implemented by that library. A sequence of the lexical tokens is checked and, if the sequence is not acceptable by the grammar, a type error is reported during compilation time. A contribution of this paper is to present an algorithm for generating the code-skeleton for a fluent API that reports a type error when a chain of method calls to the library does not match the given LR grammar. Our algorithm works in Scala, Haskell, and C++. To encode LR parsing, it uses the method/function overloading available in those languages. It does not need an advanced type system, or exponential compilation time or memory consumption. This paper also presents our implementation of the proposed generator.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Weak Persistency Semantics from the Ground Up: Formalising the Persistency Semantics of ARMv8 and Transactional Models
Azalea Raad, John Wickerson, and Viktor Vafeiadis
(MPI-SWS, Germany; Imperial College London, UK)
Emerging non-volatile memory (NVM) technologies promise the durability of disks with the performance of volatile memory (RAM). To describe the persistency guarantees of NVM, several memory persistency models have been proposed in the literature. However, the formal persistency semantics of mainstream hardware is unexplored to date. To close this gap, we present a formal declarative framework for describing concurrency models in the NVM context, and then develop the PARMv8 persistency model as an instance of our framework, formalising the persistency semantics of the ARMv8 architecture for the first time. To facilitate correct persistent programming, we study transactions as a simple abstraction for concurrency and persistency control. We thus develop the PSER (persistent serialisability) persistency model, formalising transactional semantics in the NVM context for the first time, and demonstrate that PSER correctly compiles to PARMv8. This then enables programmers to write correct, concurrent and persistent programs, without having to understand the low-level architecture-specific persistency semantics of the underlying hardware.

Publisher's Version Article Search Info
DeepSEA: A Language for Certified System Software
Vilhelm Sjöberg, Yuyang Sang, Shu-chun Weng, and Zhong Shao
(Yale University, USA; CertiK, USA)
Writing certifiably correct system software is still very labor-intensive, and current programming languages are not well suited for the task. Proof assistants work best on programs written in a high-level functional style, while operating systems need low-level control over the hardware. We present DeepSEA, a language which provides support for layered specification and abstraction refinement, effect encapsulation and composition, and full equational reasoning. A single DeepSEA program is automatically compiled into a certified ``layer'' consisting of a C program (which is then compiled into assembly by CompCert), a low-level functional Coq specification, and a formal (Coq) proof that the C program satisfies the specification. Multiple layers can be composed and interleaved with manual proofs to ascribe a high-level specification to a program by stepwise refinement. We evaluate the language by using it to reimplement two existing verified programs: a SHA-256 hash function and an OS kernel page table manager. This new style of programming language design can directly support the development of correct-by-construction system software.

Publisher's Version Article Search
BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and Per-Path Abstract Interpretation
Zhuo Zhang, Wei You, Guanhong Tao, Guannan Wei, Yonghwi Kwon, and Xiangyu Zhang
(Purdue University, USA; Renmin University of China, China; University of Virginia, USA)
Binary program dependence analysis determines dependence between instructions and hence is important for many applications that have to deal with executables without any symbol information. A key challenge is to identify if multiple memory read/write instructions access the same memory location. The state-of-the-art solution is the value set analysis (VSA) that uses abstract interpretation to determine the set of addresses that are possibly accessed by memory instructions. However, VSA is conservative and hence leads to a large number of bogus dependences and then substantial false positives in downstream analyses such as malware behavior analysis. Furthermore, existing public VSA implementations have difficulty scaling to complex binaries. In this paper, we propose a new binary dependence analysis called BDA enabled by a randomized abstract interpretation technique. It features a novel whole program path sampling algorithm that is not biased by path length, and a per-path abstract interpretation avoiding precision loss caused by merging paths in traditional analyses. It also provides probabilistic guarantees. Our evaluation on SPECINT2000 programs shows that it can handle complex binaries such as gcc whereas VSA implementations from the-state-of-art platforms have difficulty producing results for many SPEC binaries. In addition, the dependences reported by BDA are 75 and 6 times smaller than Alto, a scalable binary dependence analysis tool, and VSA, respectively, with only 0.19% of true dependences observed during dynamic execution missed (by BDA). Applying BDA to call graph generation and malware analysis shows that BDA substantially supersedes the commercial tool IDA in recovering indirect call targets and outperforms a state-of-the-art malware analysis tool Cuckoo by disclosing 3 times more hidden payloads.

Publisher's Version Article Search Info
Asphalion: Trustworthy Shielding against Byzantine Faults
Ivana Vukotic, Vincent Rahli, and Paulo Esteves-Veríssimo
(University of Luxembourg, Luxembourg; University of Birmingham, UK)
Byzantine fault-tolerant state-machine replication (BFT-SMR) is a technique for hardening systems to tolerate arbitrary faults. Although robust, BFT-SMR protocols are very costly in terms of the number of required replicas (3f+1 to tolerate f faults) and of exchanged messages. However, with "hybrid" architectures, where "normal" components trust some "special" components to provide properties in a trustworthy manner, the cost of using BFT can be dramatically reduced. Unfortunately, even though such hybridization techniques decrease the message/time/space complexity of BFT protocols, they also increase their structural complexity.
Therefore, we introduce Asphalion, the first theorem prover-based framework for verifying implementations of hybrid systems and protocols. It relies on three novel languages: (1) HyLoE: a Hybrid Logic of Events to reason about hybrid fault models; (2) MoC: a Monadic Component language to implement systems as collections of interacting hybrid components; and (3) LoCK: a sound Logic of events-based Calculus of Knowledge to reason about both homogeneous and hybrid systems at a high-level of abstraction (thereby allowing reusing proofs, and capturing the high-level logic of distributed systems). In addition, Asphalion supports compositional reasoning, e.g., through mechanisms to lift properties about trusted-trustworthy components, to the level of the distributed systems they are integrated in. As a case study, we have verified crucial safety properties (e.g., agreement) of several implementations of hybrid protocols.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Automatic Repair of Regular Expressions
Rong Pan, Qinheping Hu, Gaowei Xu, and Loris D'Antoni
(University of Texas at Austin, USA; University of Wisconsin-Madison, USA)
We introduce RFixer, a tool for repairing complex regular expressions using examples and only consider regular expressions without non-regular operators (e.g., negative lookahead). Given an incorrect regular expression and sets of positive and negative examples, RFixer synthesizes the closest regular expression to the original one that is consistent with the examples. Automatically repairing regular expressions requires exploring a large search space because practical regular expressions: i) are large, ii) operate over very large alphabets---e.g., UTF-16 and ASCII---and iii) employ complex constructs---e.g., character classes and numerical quantifiers. RFixer's repair algorithm achieves scalability by taking advantage of structural properties of regular expressions to effectively prune the search space, and it employs satisfiability modulo theory solvers to efficiently and symbolically explore the sets of possible character classes and numerical quantifiers. RFixer could successfully compute minimal repairs for regular expressions collected from a variety of sources, whereas existing tools either failed to produce any repair or produced overly complex repairs.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Static Analysis with Demand-Driven Value Refinement
Benno Stein, Benjamin Barslev Nielsen, Bor-Yuh Evan Chang, and Anders Møller
(University of Colorado Boulder, USA; Aarhus University, Denmark)
Static analysis tools for JavaScript must strike a delicate balance, achieving the level of precision required by the most complex features of target programs without incurring prohibitively high analysis time. For example, reasoning about dynamic property accesses sometimes requires precise relational information connecting the object, the dynamically-computed property name, and the property value. Even a minor precision loss at such critical program locations can result in a proliferation of spurious dataflow that renders the analysis results useless.
We present a technique by which a conventional non-relational static dataflow analysis can be combined soundly with a value refinement mechanism to increase precision on demand at critical locations. Crucially, our technique is able to incorporate relational information from the value refinement mechanism into the non-relational domain of the dataflow analysis.
We demonstrate the feasibility of this approach by extending an existing JavaScript static analysis with a demand-driven value refinement mechanism that relies on backwards abstract interpretation. Our evaluation finds that precise analysis of widely used JavaScript utility libraries depends heavily on the precision at a small number of critical locations that can be identified heuristically, and that backwards abstract interpretation is an effective mechanism to provide that precision on demand.

Publisher's Version Article Search Info
Relational Verification using Reinforcement Learning
Jia Chen, Jiayi Wei, Yu Feng, Osbert Bastani, and Isil Dillig
(University of Texas at Austin, USA; University of California at Santa Barbara, USA; University of Pennsylvania, USA)
Relational verification aims to prove properties that relate a pair of programs or two different runs of the same program. While relational properties (e.g., equivalence, non-interference) can be verified by reducing them to standard safety, there are typically many possible reduction strategies, only some of which result in successful automated verification. Motivated by this problem, we propose a novel relational verification algorithm that learns useful reduction strategies using reinforcement learning. Specifically, we show how to formulate relational verification as a Markov Decision Process (MDP) and use reinforcement learning to synthesize an optimal policy for the underlying MDP. The learned policy is then used to guide the search for a successful verification strategy. We have implemented this approach in a tool called Coeus and evaluate it on two benchmark suites. Our evaluation shows that Coeus solves significantly more problems within a given time limit compared to multiple baselines, including two state-of-the-art relational verification tools.

Publisher's Version Article Search
A Formalization of Java’s Concurrent Access Modes
John Bender and Jens Palsberg
(University of California at Los Angeles, USA)
Java's memory model was recently updated and expanded with new access modes. The accompanying documentation for these access modes is intended to make strong guarantees about program behavior that the Java compiler must enforce, yet the documentation is frequently unclear. This makes the intended program behavior ambiguous, impedes discussion of key design decisions, and makes it impossible to prove general properties about the semantics of the access modes.
In this paper we present the first formalization of Java's access modes. We have constructed an axiomatic model for all of the modes using the Herd modeling tool. This allows us to give precise answers to questions about the behavior of example programs, called litmus tests. We have validated our model using a large suite of litmus tests from existing research which helps to shed light on the relationship with other memory models. We have also modeled the semantics in Coq and proven several general theorems including a DRF guarantee, which says that if a program is properly synchronized then it will exhibit sequentially consistent behavior. Finally, we use our model to prove that the unusual design choice of a partial order among writes to the same location is unobservable in any program.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
On the Fly Synthesis of Edit Suggestions
Anders Miltner, Sumit Gulwani, Vu Le, Alan Leung, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari, and Abhishek Udupa
(Princeton University, USA; Microsoft, USA)
When working with a document, users often perform context-specific repetitive edits – changes to the document that are similar but specific to the contexts at their locations. Programming by demonstration/examples (PBD/PBE) systems automate these tasks by learning programs to perform the repetitive edits from demonstration or examples. However, PBD/PBE systems are not widely adopted, mainly because they require modal UIs – users must enter a special mode to give the demonstration/examples. This paper presents Blue-Pencil, a modeless system for synthesizing edit suggestions on the fly. Blue-Pencil observes users as they make changes to the document, silently identifies repetitive changes, and automatically suggests transformations that can apply at other locations. Blue-Pencil is parameterized – it allows the ”plug-and-play” of different PBE engines to support different document types and different kinds of transformations. We demonstrate this parameterization by instantiating Blue-Pencil to several domains – C# and SQL code, markdown documents, and spreadsheets – using various existing PBE engines. Our evaluation on 37 code editing sessions shows that Blue-Pencil synthesized edit suggestions with a precision of 0.89 and a recall of 1.0, and took 199 ms to return suggestions on average. Finally, we report on several improvements based on feedback gleaned from a field study with professional programmers to investigate the use of Blue-Pencil during long code editing sessions. Blue-Pencil has been integrated with Visual Studio IntelliCode to power the IntelliCode refactorings feature.

Publisher's Version Article Search Artifacts Functional
A Fault-Tolerant Programming Model for Distributed Interactive Applications
Ragnar Mogk, Joscha Drechsler, Guido Salvaneschi, and Mira Mezini
(TU Darmstadt, Germany)
Ubiquitous connectivity of web, mobile, and IoT computing platforms has fostered a variety of distributed applications with decentralized state. These applications execute across multiple devices with varying reliability and connectivity. Unfortunately, there is no declarative fault-tolerant programming model for distributed interactive applications with an inherently decentralized system model.
We present a novel approach to automating fault tolerance using high-level programming abstractions tailored to the needs of distributed interactive applications. Specifically, we propose a calculus that enables formal reasoning about applications' dataflow within and across individual devices. Our calculus reinterprets the functional reactive programming model to seamlessly integrate its automated state change propagation with automated crash recovery of device-local dataflow and disconnection-tolerant distribution with guaranteed automated eventual consistency semantics based on conflict-free replicated datatypes. As a result, programmers are relieved of handling intricate details of distributing change propagation and coping with distribution failures in the presence of interactivity. We also provides proofs of our claims, an implementation of our calculus, and an empirical evaluation using a common interactive application.

Publisher's Version Article Search Info
A Path to DOT: Formalizing Fully Path-Dependent Types
Marianna Rapoport and Ondřej Lhoták
(University of Waterloo, Canada)
The Dependent Object Types (DOT) calculus aims to formalize the Scala programming language with a focus on path-dependent types — types such as x.a1an.T that depend on the runtime value of a path x.a1an to an object. Unfortunately, existing formulations of DOT can model only types of the form x.A which depend on variables rather than general paths. This restriction makes it impossible to model nested module dependencies. Nesting small components inside larger ones is a necessary ingredient of a modular, scalable language. DOT’s variable restriction thus undermines its ability to fully formalize a variety of programming-language features including Scala’s module system, family polymorphism, and covariant specialization.
This paper presents the pDOT calculus, which generalizes DOT to support types that depend on paths of arbitrary length, as well as singleton types to track path equality. We show that naive approaches to add paths to DOT make it inherently unsound, and present necessary conditions for such a calculus to be sound. We discuss the key changes necessary to adapt the techniques of the DOT soundness proofs so that they can be applied to pDOT. Our paper comes with a Coq-mechanized type-safety proof of pDOT. With support for paths of arbitrary length, pDOT can realize DOT’s full potential for formalizing Scala-like calculi.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Reliable and Fast DWARF-Based Stack Unwinding
Théophile Bastian, Stephen Kell, and Francesco Zappa Nardelli
(ENS, France; University of Kent, UK; Inria, France)
Debug information, usually encoded in the DWARF format, is a hidden and obscure component of our computing infrastructure. Debug information is obviously used by debuggers, but it also plays a key role in program analysis tools, and, most surprisingly, it can be relied upon by the runtime of high-level programming languages. For instance the C++ runtime leverages DWARF stack unwind tables to implement exceptions! Alas, generating debug information adds significant burden to compiler implementations, and the debug information itself can be pervaded by subtle bugs, making the whole infrastructure unreliable. Additionally, interpreting the debug tables is a time-consuming task and, for some applications as sampling profilers, it turns out to be a performance bottleneck.
In this paper we focus on the DWARF .eh_frame table, that enables stack unwinding in absence of frame-pointers. We will describe two techniques to perform validation and synthesis of the DWARF stack unwinding tables, and their implementation for the x86_64 architecture. The validation tool has proven effective for compiler and inline assembly testing, while the synthesis tool can generate DWARF unwind tables for arbitrary binaries lacking debug information. Additionally, we report on a technique to precompile unwind tables into native x86_64 code, which we have implemented and integrated into libunwind, resulting in 11x-25x DWARF-based unwind speedups.

Publisher's Version Article Search Info Artifacts Available Artifacts Functional
Leveraging Rust Types for Modular Specification and Verification
Vytautas Astrauskas, Peter Müller, Federico Poli, and Alexander J. Summers
(ETH Zurich, Switzerland)
Rust's type system ensures memory safety: well-typed Rust programs are guaranteed to not exhibit problems such as dangling pointers, data races, and unexpected side effects through aliased references. Ensuring correctness properties beyond memory safety, for instance, the guaranteed absence of assertion failures or more-general functional correctness, requires static program verification. For traditional system programming languages, formal verification is notoriously difficult and requires complex specifications and logics to reason about pointers, aliasing, and side effects on mutable state. This complexity is a major obstacle to the more-widespread verification of system software.
In this paper, we present a novel verification technique that leverages Rust's type system to greatly simplify the specification and verification of system software written in Rust. We analyse information from the Rust compiler and synthesise a corresponding core proof for the program in a flavour of separation logic tailored to automation. To verify correctness properties beyond memory safety, users can annotate Rust programs with specifications at the abstraction level of Rust expressions; our technique weaves them into the core proof to verify modularly whether these specifications hold. Crucially, our proofs are constructed and checked automatically without exposing the underlying formal logic, allowing users to work exclusively at the level of abstraction of the programming language. As such, our work enables a new kind of verification tool, with the potential to impact a wide audience and allow the Rust community to benefit from state-of-the-art verification techniques. We have implemented our techniques for a subset of Rust; our evaluation on several thousand functions from widely-used Rust crates demonstrates its effectiveness.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
Precision-Preserving Yet Fast Object-Sensitive Pointer Analysis with Partial Context Sensitivity
Jingbo Lu and Jingling Xue
(UNSW, Australia)
Object-sensitivity is widely used as a context abstraction for computing the points-to information context-sensitively for object-oriented languages like Java. Due to the combinatorial explosion of contexts in large programs, k-object-sensitive pointer analysis (under k-limiting), denoted k-obj, is scalable only for small values of k, where k⩽2 typically. A few recent solutions attempt to improve its efficiency by instructing k-obj to analyze only some methods in the program context-sensitively, determined heuristically by a pre-analysis. While already effective, these heuristics-based pre-analyses do not provide precision guarantees, and consequently, are limited in the efficiency gains achieved. We introduce a radically different approach, Eagle, that makes k-obj run significantly faster than the prior art while maintaining its precision. The novelty of Eagle is to enable k-obj to analyze a method with partial context-sensitivity, i.e., context-sensitively for only some of its selected variables/allocation sites. Eagle makes these selections during a lightweight pre-analysis by reasoning about context-free-language (CFL) reachability at the level of variables/objects in the program, based on a new CFL-reachability formulation of k-obj. We demonstrate the advances made by Eagle by comparing it with the prior art in terms of a set of popular Java benchmarks and applications.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Formal Foundations of Serverless Computing
Abhinav Jangda, Donald Pinckney, Yuriy Brun, and Arjun Guha
(University of Massachusetts Amherst, USA)
Serverless computing (also known as functions as a service) is a new cloud computing abstraction that makes it easier to write robust, large-scale web services. In serverless computing, programmers write what are called serverless functions, which are programs that respond to external events. When demand for the serverless function spikes, the platform automatically allocates additional hardware and manages load-balancing; when demand falls, the platform silently deallocates idle resources; and when the platform detects a failure, it transparently retries affected requests. In 2014, Amazon Web Services introduced the first serverless platform, AWS Lambda, and similar abstractions are now available on all major cloud computing platforms.
Unfortunately, the serverless computing abstraction exposes several low-level operational details that make it hard for programmers to write and reason about their code. This paper sheds light on this problem by presenting λλ, an operational semantics of the essence of serverless computing. Despite being a small (half a page) core calculus, λλ models all the low-level details that serverless functions can observe. To show that λλ is useful, we present three applications. First, to ease reasoning about code, we present a simplified naive semantics of serverless execution and precisely characterize when the naive semantics and λλ coincide. Second, we augment λλ with a key-value store to allow reasoning about stateful serverless functions. Third, since a handful of serverless platforms support serverless function composition, we show how to extend λλ with a composition language and show that our implementation can outperform prior work.

Publisher's Version Article Search
Optimal Stateless Model Checking for Reads-From Equivalence under Sequential Consistency
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, Magnus Lång, Tuan Phong Ngo, and Konstantinos Sagonas
(Uppsala University, Sweden)
We present a new approach for stateless model checking (SMC) of multithreaded programs under Sequential Consistency (SC) semantics. To combat state-space explosion, SMC is often equipped with a partial-order reduction technique, which defines an equivalence on executions, and only needs to explore one execution in each equivalence class. Recently, it has been observed that the commonly used equivalence of Mazurkiewicz traces can be coarsened but still cover all program crashes and assertion violations. However, for this coarser equivalence, which preserves only the reads-from relation from writes to reads, there is no SMC algorithm which is (i) optimal in the sense that it explores precisely one execution in each reads-from equivalence class, and (ii) efficient in the sense that it spends polynomial effort per class. We present the first SMC algorithm for SC that is both optimal and efficient in practice, meaning that it spends polynomial time per equivalence class on all programs that we have tried. This is achieved by a novel test that checks whether a given reads-from relation can arise in some execution. We have implemented the algorithm by extending Nidhugg, an SMC tool for C/C++ programs, with a new mode called rfsc. Our experimental results show that Nidhugg/rfsc, although slower than the fastest SMC tools in programs where tools happen to examine the same number of executions, always scales similarly or better than them, and outperforms them by an exponential factor in programs where the reads-from equivalence is coarser than the standard one. We also present two non-trivial use cases where the new equivalence is particularly effective, as well as the significant performance advantage that Nidhugg/rfsc offers compared to state-of-the-art SMC and systematic concurrency testing tools.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Modular Verification of Web Page Layout
Pavel Panchekha, Michael D. Ernst, Zachary Tatlock, and Shoaib Kamil
(University of Utah, USA; University of Washington, USA; Adobe, USA)
Automated verification can ensure that a web page satisfies accessibility, usability, and design properties regardless of the end user's device, preferences, and assistive technologies. However, state-of-the-art verification tools for layout properties do not scale to large pages because they rely on whole-page analyses and must reason about the entire page using the complex semantics of the browser layout algorithm.
This paper introduces and formalizes modular layout proofs. A modular layout proof splits a monolithic verification problem into smaller verification problems, one for each component of a web page. Each component specification can use rely/guarantee-style preconditions to make it verifiable independently of the rest of the page and enabling reuse across multiple pages. Modular layout proofs scale verification to pages an order of magnitude larger than those supported by previous approaches.
We prototyped these techniques in a new proof assistant, Troika. In Troika, a proof author partitions a page into components and writes specifications for them. Troika then verifies the specifications, and uses those specifications to verify whole-page properties. Troika also enables the proof author to verify different component specifications with different verification tools, leveraging the strengths of each. In a case study, we use Troika to verify a large web page and demonstrate a speed-up of 13--1469x over existing tools, taking verification time from hours to seconds. We develop a systematic approach to writing Troika proofs and demonstrate it on 8 proofs of properties from prior work to show that modular layout proofs are short, easy to write, and provide benefits over existing tools.

Publisher's Version Article Search
Aroma: Code Recommendation via Structural Code Search
Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, and Satish Chandra
(Facebook, USA; University of California at Irvine, USA; University of California at Berkeley, USA)
Programmers often write code that has similarity to existing code written somewhere. A tool that could help programmers to search such similar code would be immensely useful. Such a tool could help programmers to extend partially written code snippets to completely implement necessary functionality, help to discover extensions to the partial code which are commonly included by other programmers, help to cross-check against similar code written by other programmers, or help to add extra code which would fix common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.

Publisher's Version Article Search
On the Design, Implementation, and Use of Laziness in R
Aviral Goel and Jan Vitek
(Northeastern University, USA; Czech Technical University, Czechia)
The R programming language has been lazy for over twenty-five years. This paper presents a review of the design and implementation of call-by-need in R, and a data-driven study of how generations of programmers have put laziness to use in their code. We analyze 16,707 packages and observe the creation of 270.9 B promises. Our data suggests that there is little supporting evidence to assert that programmers use laziness to avoid unnecessary computation or to operate over infinite data structures. For the most part R code appears to have been written without reliance on, and in many cases even knowledge of, delayed argument evaluation. The only significant exception is a small number of packages which leverage call-by-need for meta-programming.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Mergeable Replicated Data Types
Gowtham Kaki, Swarn Priya, KC Sivaramakrishnan, and Suresh Jagannathan
(Purdue University, USA; IIT Madras, India)
Programming geo-replicated distributed systems is challenging given the complexity of reasoning about different evolving states on different replicas. Existing approaches to this problem impose significant burden on application developers to consider the effect of how operations performed on one replica are witnessed and applied on others. To alleviate these challenges, we present a fundamentally different approach to programming in the presence of replicated state. Our insight is based on the use of invertible relational specifications of an inductively-defined data type as a mechanism to capture salient aspects of the data type relevant to how its different instances can be safely merged in a replicated environment. Importantly, because these specifications only address a data type's (static) structural properties, their formulation does not require exposing low-level system-level details concerning asynchrony, replication, visibility, etc. As a consequence, our framework enables the correct-by-construction synthesis of rich merge functions over arbitrarily complex (i.e., composable) data types. We show that the use of a rich relational specification language allows us to extract sufficient conditions to automatically derive merge functions that have meaningful non-trivial convergence properties. We incorporate these ideas in a tool called Quark, and demonstrate its utility via a detailed evaluation study on real-world benchmarks.

Publisher's Version Article Search
Compiler Fuzzing: How Much Does It Matter?
Michaël Marcozzi, Qiyi Tang, Alastair F. Donaldson, and Cristian Cadar
(Imperial College London, UK)
Despite much recent interest in randomised testing (fuzzing) of compilers, the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology where the bug impact over the compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which generated assembly code changes syntactically due to triggering of the bug; and (3) whether such changes cause regression test suite failures, or whether we can manually find application inputs that trigger execution divergence due to such changes. The study is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or as a by-product of formal verification efforts. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for at least one package, in which case only a very small part of the binary is typically affected, yet causing two failures when running the test suites of all the impacted packages. User-reported and formal verification bugs do not exhibit a higher impact, with a lower rate of triggered bugs and one test failure. The manual analysis of a selection of the syntactic changes caused by some of our bugs (fuzzer-found and non fuzzer-found) in package assembly code, shows that either these changes have no semantic impact or that they would require very specific runtime circumstances to trigger execution divergence.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
DProf: Distributed Profiler with Strong Guarantees
Zachary Benavides, Keval Vora, and Rajiv Gupta
(University of California at Riverside, USA; Simon Fraser University, Canada)
Performance analysis of a distributed system is typically achieved by collecting profiles whose underlying events are timestamped with unsynchronized clocks of multiple machines in the system. To allow comparison of timestamps taken at different machines, several timestamp synchronization algorithms have been developed. However, the inaccuracies associated with these algorithms can lead to inaccuracies in the final results of performance analysis. To address this problem, in this paper, we develop a system for constructing distributed performance profiles called DProf. At the core of DProf is a new timestamp synchronization algorithm, FreeZer, that tightly bounds the inaccuracy in a converted timestamp to a time interval. This not only allows timestamps from different machines to be compared, it also enables maintaining strong guarantees throughout the comparison which can be carefully transformed into guarantees for analysis results. To demonstrate the utility of DProf, we use it to implement dCSP and dCOZ that are accuracy bounded distributed versions of Context Sensitive Profiles and Causal Profiles developed for shared memory systems. While dCSP enables user to ascertain existence of a performance bottleneck, dCOZ estimates the expected performance benefit from eliminating that bottleneck. Experiments with three distributed applications on a cluster of heterogeneous machines validate that inferences via dCSP and dCOZ are highly accurate. Moreover, if FreeZer is replaced by two existing timestamp algorithms (linear regression & convex hull), the inferences provided by dCSP and dCOZ are severely degraded.

Publisher's Version Article Search
Precise Reasoning with Structured Time, Structured Heaps, and Collective Operations
Grégory M. Essertel, Guannan Wei, and Tiark Rompf
(Purdue University, USA)
Despite decades of progress, static analysis tools still have great difficulty dealing with programs that combine arithmetic, loops, dynamic memory allocation, and linked data structures. In this paper we draw attention to two fundamental reasons for this difficulty: First, typical underlying program abstractions are low-level and inherently scalar, characterizing compound entities like data structures or results computed through iteration only indirectly. Second, to ensure termination, analyses typically project away the dimension of time, and merge information per program point, which incurs a loss in precision.
As a remedy, we propose to make collective operations first-class in program analysis – inspired by Σ-notation in mathematics, and also by the success of high-level intermediate languages based on @map/reduce@ operations in program generators and aggressive optimizing compilers for domain-specific languages (DSLs). We further propose a novel structured heap abstraction that preserves a symbolic dimension of time, reflecting the program’s loop structure and thus unambiguously correlating multiple temporal points in the dynamic execution with a single point in the program text.
This paper presents a formal model, based on a high-level intermediate analysis language, a practical realization in a prototype tool that analyzes C code, and an experimental evaluation that demonstrates competitive results on a series of benchmarks. Remarkably, our implementation achieves these results in a fully semantics-preserving strongest-postcondition model, which is a worst-case for analysis/verification. The underlying ideas, however, are not tied to this model and would equally apply in other settings, e.g., demand-driven invariant inference in a weakest-precondition model. Given its semantics-preserving nature, our implementation is not limited to analysis for verification, but can also check program equivalence, and translate legacy C code to high-performance DSLs.

Publisher's Version Article Search
Casting about in the Dark: An Empirical Study of Cast Operations in Java Programs
Luis Mastrangelo, Matthias Hauswirth, and Nathaniel Nystrom
(USI Lugano, Switzerland)
The main goal of a static type system is to prevent certain kinds of errors from happening at run time. A type system is formulated as a set of constraints that gives any expression or term in a program a well-defined type. Yet mainstream programming languages are endowed with type systems that provide the means to circumvent their constraints through casting.
We want to understand how and when developers escape the static type system to use dynamic typing. We empirically study how casting is used by developers in more than seven thousand Java projects. We find that casts are widely used (8.7% of methods contain at least one cast) and that 50% of casts we inspected are not guarded locally to ensure against potential run-time errors.
To help us better categorize use cases and thus understand how casts are used in practice, we identify 25 cast-usage patterns---recurrent programming idioms using casts to solve a specific issue. This knowledge can be: (a) a recommendation for current and future language designers to make informed decisions (b) a reference for tool builders, e.g., by providing more precise or new refactoring analyses, (c) a guide for researchers to test new language features, or to carry out controlled programming experiments, and (d) a guide for developers for better practices.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Getafix: Learning to Fix Bugs Automatically
Johannes Bader, Andrew Scott, Michael Pradel, and Satish Chandra
(Facebook, USA)
Static analyzers help find bugs early by warning about recurring bug categories. While fixing these bugs still remains a mostly manual task in practice, we observe that fixes for a specific bug category often are repetitive. This paper addresses the problem of automatically fixing instances of common bugs by learning from past fixes. We present Getafix, an approach that produces human-like fixes while being fast enough to suggest fixes in time proportional to the amount of time needed to obtain static analysis results in the first place.
Getafix is based on a novel hierarchical clustering algorithm that summarizes fix patterns into a hierarchy ranging from general to specific patterns. Instead of an expensive exploration of a potentially large space of candidate fixes, Getafix uses a simple yet effective ranking technique that uses the context of a code change to select the most appropriate fix for a given bug.
Our evaluation applies Getafix to 1,268 bug fixes for six bug categories reported by popular static analyzers for Java, including null dereferences, incorrect API calls, and misuses of particular language constructs. The approach predicts exactly the human-written fix as the top-most suggestion between 12% and 91% of the time, depending on the bug category. The top-5 suggestions contain fixes for 526 of the 1,268 bugs. Moreover, we report on deploying the approach within Facebook, where it contributes to the reliability of software used by billions of people. To the best of our knowledge, Getafix is the first industrially-deployed automated bug-fixing tool that learns fix patterns from past, human-written fixes to produce human-like fixes.

Publisher's Version Article Search
Generating Precise Error Specifications for C: A Zero Shot Learning Approach
Baijun Wu, John Peter Campora III, Yi He, Alexander Schlecht, and Sheng Chen
(University of Louisiana at Lafayette, USA)
In C programs, error specifications, which specify the value range that each function returns to indicate failures, are widely used to check and propagate errors for the sake of reliability and security. Various kinds of C analyzers employ error specifications for different purposes, e.g., to detect error handling bugs, yet a general approach for generating precise specifications is still missing. This limits the applicability of those tools.
In this paper, we solve this problem by developing a machine learning-based approach named MLPEx. It generates error specifications by analyzing only the source code, and is thus general. We propose a novel machine learning paradigm based on transfer learning, enabling MLPEx to require only one-time minimal data labeling from us (as the tool developers) and zero manual labeling efforts from users. To improve the accuracy of generated error specifications, MLPEx extracts and exploits project-specific information. We evaluate MLPEx on 10 projects, including 6 libraries and 4 applications. An investigation of 3,443 functions and 17,750 paths reveals that MLPEx generates error specifications with a precision of 91% and a recall of 94%, significantly higher than those of state-of-the-art approaches. To further demonstrate the usefulness of the generated error specifications, we use them to detect 57 bugs in 5 tested projects.

Publisher's Version Article Search
Specifying Concurrent Programs in Separation Logic: Morphisms and Simulations
Aleksandar Nanevski, Anindya Banerjee, Germán Andrés Delbianco, and Ignacio Fábregas
(IMDEA Software Institute, Spain; IRIF, France; University of Paris, France)
In addition to pre- and postconditions, program specifications in recent separation logics for concurrency have employed an algebraic structure of resources—a form of state transition systems—to describe the state-based program invariants that must be preserved, and to record the permissible atomic changes to program state. In this paper we introduce a novel notion of resource morphism, i.e. structure-preserving function on resources, and show how to effectively integrate it into separation logic, using an associated notion of morphism-specific simulation. We apply morphisms and simulations to programs verified under one resource, to compositionally adapt them to operate under another resource, thus facilitating proof reuse.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Improving Bug Detection via Context-Based Code Representation Learning and Attention-Based Neural Networks
Yi Li, Shaohua Wang, Tien N. Nguyen, and Son Van Nguyen
(New Jersey Institute of Technology, USA; University of Texas at Dallas, USA)
Bug detection has been shown to be an effective way to help developers in detecting bugs early, thus, saving much effort and time in software development process. Recently, deep learning-based bug detection approaches have gained successes over the traditional machine learning-based approaches, the rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods and suffer high rate of false positives. In this paper, we propose a combination approach with the use of contexts and attention neural network to overcome those limitations. We propose to use as the global context the Program Dependence Graph (PDG) and Data Flow Graph (DFG) to connect the method under investigation with the other relevant methods that might contribute to the buggy code. The global context is complemented by the local context extracted from the path on the AST built from the method’s body. The use of PDG and DFG enables our model to reduce the false positive rate, while to complement for the potential reduction in recall, we make use of the attention neural network mechanism to put more weights on the buggy paths in the source code. That is, the paths that are similar to the buggy paths will be ranked higher, thus, improving the recall of our model. We have conducted several experiments to evaluate our approach on a very large dataset with +4.973M methods in 92 different project versions. The results show that our tool can have a relative improvement up to 160% on F-score when comparing with the state-of-the-art bug detection approaches. Our tool can detect 48 true bugs in the list of top 100 reported bugs, which is 24 more true bugs when comparing with the baseline approaches. We also reported that our representation is better suitable for bug detection and relatively improves over the other representations up to 206% in accuracy.

Publisher's Version Article Search Info
Scala Implicits Are Everywhere: A Large-Scale Study of the Use of Scala Implicits in the Wild
Filip Křikava, Heather Miller, and Jan Vitek
(Czech Technical University, Czechia; Carnegie Mellon University, USA; Northeastern University, USA)
The Scala programming language offers two distinctive language features implicit parameters and implicit conversions, often referred together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce the boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Optimization of Swift Protocols
Rajkishore Barik, Manu Sridharan, Murali Krishna Ramanathan, and Milind Chabbi
(Uber Technologies, USA; University of California at Riverside, USA)
Swift, an increasingly-popular programming language, advocates the use of protocols, which define a set of required methods and properties for conforming types. Protocols are commonly used in Swift programs for abstracting away implementation details; e.g., in a large industrial app from Uber, they are heavily used to enable mock objects for unit testing. Unfortunately, heavy use of protocols can result in significant performance overhead. Beyond the dynamic dispatch often associated with such a feature, Swift allows for both value and reference types to conform to a protocol, leading to significant boxing and unboxing overheads.
In this paper, we describe three new optimizations and transformations to reduce the overhead of Swift protocols. Within a procedure, we define LocalVar, a dataflow analysis and transformation to remove both dynamic dispatch and boxing overheads. We also describe Param, which optimizes the case of protocol-typed method parameters using specialization. Finally, we describe SoleType, a transformation that injects casts when a global analysis (like type-hierarchy analysis) discovers some protocol variable must have some concrete type. We also describe how these optimizations work fruitfully together and with existing Swift optimizations to deliver further speedups.
We perform elaborate experimentation and demonstrate that our optimizations deliver an average 1.56x speedup on a suite of Swift benchmarks that use protocols. Further, we applied the optimizations to a production iOS Swift application from Uber used by millions of customers daily. For a set of performance spans defined by the developers of the application, the optimized version showed speedups ranging from 6.9% to 55.49%. A version of our optimizations has been accepted as part of the official Swift compiler distribution.

Publisher's Version Article Search Artifacts Available Artifacts Functional
On the Complexity of Checking Transactional Consistency
Ranadeep Biswas and Constantin Enea
(University of Paris, France; IRIF, France; CNRS, France)
Transactions simplify concurrent programming by enabling computations on shared data that are isolated from other concurrent computations and are resilient to failures. Modern databases provide different consistency models for transactions corresponding to different tradeoffs between consistency and availability. In this work, we investigate the problem of checking whether a given execution of a transactional database adheres to some consistency model. We show that consistency models like read committed, read atomic, and causal consistency are polynomial-time checkable while prefix consistency and snapshot isolation are NP-complete in general. These results complement a previous NP-completeness result concerning serializability. Moreover, in the context of NP-complete consistency models, we devise algorithms that are polynomial time assuming that certain parameters in the input executions, e.g., the number of sessions, are fixed. We evaluate the scalability of these algorithms in the context of several production databases.

Publisher's Version Article Search Artifacts Available Artifacts Functional
System FR: Formalized Foundations for the Stainless Verifier
Jad Hamza, Nicolas Voirol, and Viktor Kunčak
(EPFL, Switzerland)
We present the design, implementation, and foundation of a verifier for higher-order functional programs with generics and recursive data types. Our system supports proving safety and termination using preconditions, postconditions and assertions. It supports writing proof hints using assertions and recursive calls. To formalize the soundness of the system we introduce System FR, a calculus supporting System F polymorphism, dependent refinement types, and recursive types (including recursion through contravariant positions of function types). Through the use of sized types, System FR supports reasoning about termination of lazy data structures such as streams. We formalize a reducibility argument using the Coq proof assistant and prove the soundness of a type-checker with respect to call-by-value semantics, ensuring type safety and normalization for typeable programs. Our program verifier is implemented as an alternative verification-condition generator for the Stainless tool, which relies on the Inox SMT-based solver backend for automation. We demonstrate the efficiency of our approach by verifying a collection of higher-order functional programs comprising around 14000 lines of polymorphic higher-order Scala code, including graph search algorithms, basic number theory, monad laws, functional data structures, and assignments from popular Functional Programming MOOCs.

Publisher's Version Article Search
Language-Integrated Privacy-Aware Distributed Queries
Guido Salvaneschi, Mirko Köhler, Daniel Sokolowski, Philipp Haller, Sebastian Erdweg, and Mira Mezini
(TU Darmstadt, Germany; KTH, Sweden; Johannes Gutenberg University Mainz, Germany)
Distributed query processing is an effective means for processing large amounts of data. To abstract from the technicalities of distributed systems, algorithms for operator placement automatically distribute sequential data queries over the available processing units. However, current algorithms for operator placement focus on performance and ignore privacy concerns that arise when handling sensitive data.
We present a new methodology for privacy-aware operator placement that both prevents leakage of sensitive information and improves performance. Crucially, our approach is based on an information-flow type system for data queries to reason about the sensitivity of query subcomputations. Our solution unfolds in two phases. First, placement space reduction generates deployment candidates based on privacy constraints using a syntax-directed transformation driven by the information-flow type system. Second, constraint solving selects the best placement among the candidates based on a cost model that maximizes performance. We verify that our algorithm preserves the sequential behavior of queries and prevents leakage of sensitive data. We implemented the type system and placement algorithm for a new query language SecQL and demonstrate significant performance improvements in benchmarks.

Publisher's Version Article Search
AutoPandas: Neural-Backed Generators for Program Synthesis
Rohan Bavishi, Caroline Lemieux, Roy Fox, Koushik Sen, and Ion Stoica
(University of California at Berkeley, USA; University of California at Irvine, USA)
Developers nowadays have to contend with a growing number of APIs. While in the long-term they are very useful to developers, many modern APIs have an incredibly steep learning curve, due to their hundreds of functions handling many arguments, obscure documentation, and frequently changing semantics. For APIs that perform data transformations, novices can often provide an I/O example demonstrating the desired transformation, but may be stuck on how to translate it to the API. A programming-by-example synthesis engine that takes such I/O examples and directly produces programs in the target API could help such novices. Such an engine presents unique challenges due to the breadth of real-world APIs, and the often-complex constraints over function arguments. We present a generator-based synthesis approach to contend with these problems. This approach uses a program candidate generator, which encodes basic constraints on the space of programs. We introduce neural-backed operators which can be seamlessly integrated into the program generator. To improve the efficiency of the search, we simply use these operators at non-deterministic decision points, instead of relying on domain-specific heuristics. We implement this technique for the Python pandas library in AutoPandas. AutoPandas supports 119 pandas dataframe transformation functions. We evaluate AutoPandas on 26 real-world benchmarks and find it solves 17 of them.

Publisher's Version Article Search
Ryū Revisited: Printf Floating Point Conversion
Ulf Adams
(Google, Germany)
Ryū Printf is a new algorithm to convert floating-point numbers to decimal strings according to the printf %f, %e, and %g formats: %f generates ‘full’ output (integer part of the input, dot, configurable number of digits), %e generates scientific output (one leading digit, dot, configurable number of digits, exponent), and %g generates the shorter of the two. Ryū Printf is based on the Ryū algorithm, which converts binary floating-point numbers to the shortest equivalent decimal floating-point representation. We provide quantitative evidence that Ryū Printf is between 3.8 and 55 times faster than existing printf implementations.
Furthermore, we show that both Ryū and Ryū Printf generalize to arbitrary number bases. This finding implies the existence of a fast algorithm to convert from base-10 to base-2, as long as the maximum precision of the input is known a priori.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
IntelliMerge: A Refactoring-Aware Software Merging Technique
Bo Shen, Wei Zhang, Haiyan Zhao, Guangtai Liang, Zhi Jin, and Qianxiang Wang
(Peking University, China; Huawei Technologies, China)
In modern software development, developers rely on version control systems like Git to collaborate in the branch-based development workflow. One downside of this workflow is the conflicts occurred when merging contributions from different developers: these conflicts are tedious and error-prone to be correctly resolved, reducing the efficiency of collaboration and introducing potential bugs. The situation becomes even worse, with the popularity of refactorings in software development and evolution, because current merging tools (usually based on the text or tree structures of source code) are unaware of refactorings. In this paper, we present IntelliMerge, a graph-based refactoring-aware merging algorithm for Java programs. We explicitly enhance this algorithm's ability in detecting and resolving refactoring-related conflicts. Through the evaluation on 1,070 merge scenarios from 10 popular open-source Java projects, we show that IntelliMerge reduces the number of merge conflicts by 58.90% comparing with GitMerge (the prevalent unstructured merging tool) and 11.84% comparing with jFSTMerge (the state-of-the-art semi-structured merging tool) without sacrificing the auto-merging precision (88.48%) and recall (90.22%). Besides, the evaluation of performance shows that IntelliMerge takes 539 milliseconds to process one merge scenario on the median, which indicates its feasibility in real-world applications.

Publisher's Version Article Search Info
Certifying Graph-Manipulating C Programs via Localizations within Data Structures
Shengyi Wang, Qinxiang Cao, Anshuman Mohan, and Aquinas Hobor
(National University of Singapore, Singapore; Shanghai Jiao Tong University, China)
We develop powerful and general techniques to mechanically verify realistic programs that manipulate heap-represented graphs. These graphs can exhibit well-known organization principles, such as being a directed acyclic graph or a disjoint-forest; alternatively, these graphs can be totally unstructured. The common thread for such structures is that they exhibit deep intrinsic sharing and can be expressed using the language of graph theory. We construct a modular and general setup for reasoning about abstract mathematical graphs and use separation logic to define how such abstract graphs are represented concretely in the heap. We develop a Localize rule that enables modular reasoning about such programs, and show how this rule can support existential quantifiers in postconditions and smoothly handle modified program variables. We demonstrate the generality and power of our techniques by integrating them into the Verified Software Toolchain and certifying the correctness of seven graph-manipulating programs written in CompCert C, including a 400-line generational garbage collector for the CertiCoq project. While doing so, we identify two places where the semantics of C is too weak to define generational garbage collectors of the sort used in the OCaml runtime. Our proofs are entirely machine-checked in Coq.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Duet: An Expressive Higher-Order Language and Linear Type System for Statically Enforcing Differential Privacy
Joseph P. Near, David Darais, Chike Abuah, Tim Stevens, Pranav Gaddamadugu, Lun Wang, Neel Somani, Mu Zhang, Nikhil Sharma, Alex Shan, and Dawn Song
(University of Vermont, USA; University of California at Berkeley, USA; University of Utah, USA)
During the past decade, differential privacy has become the gold standard for protecting the privacy of individuals. However, verifying that a particular program provides differential privacy often remains a manual task to be completed by an expert in the field. Language-based techniques have been proposed for fully automating proofs of differential privacy via type system design, however these results have lagged behind advances in differentially-private algorithms, leaving a noticeable gap in programs which can be automatically verified while also providing state-of-the-art bounds on privacy.
We propose Duet, an expressive higher-order language, linear type system and tool for automatically verifying differential privacy of general-purpose higher-order programs. In addition to general purpose programming, Duet supports encoding machine learning algorithms such as stochastic gradient descent, as well as common auxiliary data analysis tasks such as clipping, normalization and hyperparameter tuning - each of which are particularly challenging to encode in a statically verified differential privacy framework.
We present a core design of the Duet language and linear type system, and complete key proofs about privacy for well-typed programs. We then show how to extend Duet to support realistic machine learning applications and recent variants of differential privacy which result in improved accuracy for many practical differentially private algorithms. Finally, we implement several differentially private machine learning algorithms in Duet which have never before been automatically verified by a language-based tool, and we present experimental results which demonstrate the benefits of Duet's language design in terms of accuracy of trained machine learning models.

Publisher's Version Article Search Artifacts Available Artifacts Functional
Effective Lock Handling in Stateless Model Checking
Michalis Kokologiannakis, Azalea Raad, and Viktor Vafeiadis
(MPI-SWS, Germany)
Stateless Model Checking (SMC) is a verification technique for concurrent programs that checks for safety violations by exploring all possible thread interleavings. SMC is usually coupled with Partial Order Reduction (POR), which exploits the independence of instructions to avoid redundant explorations when an equivalent one has already been considered. While effective POR techniques have been developed for many different memory models, they are only able to exploit independence at the instruction level, which makes them unsuitable for programs with coarse-grained synchronization mechanisms such as locks.
We present a lock-aware POR algorithm, LAPOR, that exploits independence at both instruction and critical section levels. This enables LAPOR to explore exponentially fewer interleavings than the state-of-the-art techniques for programs that use locks conservatively. Our algorithm is sound, complete, and optimal, and can be used for verifying programs under several different memory models. We implement LAPOR in a tool and show that it can be exponentially faster than the state-of-the-art model checkers.

Publisher's Version Article Search Info Artifacts Available Artifacts Functional
FuzzFactory: Domain-Specific Fuzzing with Waypoints
Rohan Padhye, Caroline Lemieux, Koushik Sen, Laurent Simon, and Hayawardh Vijayakumar
(University of California at Berkeley, USA; Samsung Research, USA)
Coverage-guided fuzz testing has gained prominence as a highly effective method of finding security vulnerabilities such as buffer overflows in programs that parse binary data. Recently, researchers have introduced various specializations to the coverage-guided fuzzing algorithm for different domain-specific testing goals, such as finding performance bottlenecks, generating valid inputs, handling magic-byte comparisons, etc. Each such solution can require non-trivial implementation effort and produces a distinct variant of a fuzzing tool. We observe that many of these domain-specific solutions follow a common solution pattern.
In this paper, we present FuzzFactory, a framework for developing domain-specific fuzzing applications without requiring changes to mutation and search heuristics. FuzzFactory allows users to specify the collection of dynamic domain-specific feedback during test execution, as well as how such feedback should be aggregated. FuzzFactory uses this information to selectively save intermediate inputs, called waypoints, to augment coverage-guided fuzzing. Such waypoints always make progress towards domain-specific multi-dimensional objectives. We instantiate six domain-specific fuzzing applications using FuzzFactory: three re-implementations of prior work and three novel solutions, and evaluate their effectiveness on benchmarks from Google's fuzzer test suite. We also show how multiple domains can be composed to perform better than the sum of their parts. For example, we combine domain-specific feedback about strict equality comparisons and dynamic memory allocations, to enable the automatic generation of LZ4 bombs and PNG bombs.

Publisher's Version Article Search Artifacts Available Artifacts Functional
AL: Autogenerating Supervised Learning Programs
José P. Cambronero and Martin C. Rinard
(Massachusetts Institute of Technology, USA)
We present AL, a novel automated machine learning system that learns to generate new supervised learning pipelines from an existing corpus of supervised learning programs. In contrast to existing automated machine learning tools, which typically implement a search over manually selected machine learning functions and classes, AL learns to identify the relevant classes in an API by analyzing dynamic program traces that use the target machine learning library. AL constructs a conditional probability model from these traces to estimate the likelihood of the generated supervised learning pipelines and uses this model to guide the search to generate pipelines for new datasets. Our evaluation shows that AL can produce successful pipelines for datasets that previous systems fail to process and produces pipelines with comparable predictive performance for datasets that previous systems process successfully.

Publisher's Version Article Search
Sound and Reusable Components for Abstract Interpretation
Sven Keidel and Sebastian Erdweg
(Johannes Gutenberg University Mainz, Germany)
Abstract interpretation is a methodology for defining sound static analysis. Yet, building sound static analyses for modern programming languages is difficult, because these static analyses need to combine sophisticated abstractions for values, environments, stores, etc. However, static analyses often tightly couple these abstractions in the implementation, which not only complicates the implementation, but also makes it hard to decide which parts of the analyses can be proven sound independently from each other. Furthermore, this coupling makes it hard to combine soundness lemmas for parts of the analysis to a soundness proof of the complete analysis.
To solve this problem, we propose to construct static analyses modularly from reusable analysis components. Each analysis component encapsulates a single analysis concern and can be proven sound independently from the analysis where it is used. We base the design of our analysis components on arrow transformers, which allows us to compose analysis components. This composition preserves soundness, which guarantees that a static analysis is sound, if all its analysis components are sound. This means that analysis developers do not have to worry about soundness as long as they reuse sound analysis components. To evaluate our approach, we developed a library of 13 reusable analysis components in Haskell. We use these components to define a k-CFA analysis for PCF and an interval and reaching definition analysis for a While language.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
Design, Implementation, and Application of GPU-Based Java Bytecode Interpreters
Ahmet Celik, Pengyu Nie, Christopher J. Rossbach, and Milos Gligoric
(University of Texas at Austin, USA; VMware, USA)
We present the design and implementation of GVM, the first system for executing Java bytecode entirely on GPUs. GVM is ideal for applications that execute a large number of short-living tasks, which share a significant fraction of their codebase and have similar execution time. GVM uses novel algorithms, scheduling, and data layout techniques to adapt to the massively parallel programming and execution model of GPUs. We apply GVM to generate and execute tests for Java projects. First, we implement a sequence-based test generation on top of GVM and design novel algorithms to avoid redundant test sequences. Second, we use GVM to execute randomly generated test cases. We evaluate GVM by comparing it with two existing Java bytecode interpreters (Oracle JVM and Java Pathfinder), as well as with the Oracle JVM with just-in-time (JIT) compiler, which has been engineered and optimized for over twenty years. Our evaluation shows that sequence-based test generation on GVM outperforms both Java Pathfinder and Oracle JVM interpreter. Additionally, our results show that GVM performs as well as running our parallel sequence-based test generation algorithm using JVM with JIT with many CPU threads. Furthermore, our evaluation on several classes from open-source projects shows that executing randomly generated tests on GVM outperforms sequential execution on JVM interpreter and JVM with JIT.

Publisher's Version Article Search Artifacts Functional
Specification and Inference of Trace Refinement Relations
Timos Antonopoulos, Eric Koskinen, and Ton Chanh Le
(Yale University, USA; Stevens Institute of Technology, USA)
The modern software engineering process is evolutionary, with commits/patches begetting new versions of code, progressing steadily toward improved systems. In recent years, program analysis and verification tools have exploited version-based reasoning, where new code can be seen in terms of how it has changed from the previous version. When considering program versions, refinement seems a natural fit and, in recent decades, researchers have weakened classical notions of concrete refinement and program equivalence to capture similarities as well as differences between programs. For example, Benton, Yang and others have worked on state-based refinement relations.
In this paper, we explore a form of weak refinement based on trace relations rather than state relations. The idea begins by partitioning traces of a program C1 into trace classes, each identified via a restriction r1. For each class, we specify similar behavior in the other program C2 via a separate restriction r2 on C2. Still, these two trace classes may not yet be equivalent so we further permit a weakening via a binary relation A on traces, that allows one to, for instance disregard unimportant events, relate analogous atomic events, etc.
We address several challenges that arise. First, we explore one way to specify trace refinement relations by instantiating the framework to Kleene Algebra with Tests (KAT) due to Kozen. We use KAT intersection for restriction, KAT hypotheses for A, KAT inclusion for refinement, and have proved compositionality. Next, we present an algorithm for automatically synthesizing refinement relations, based on a mixture of semantic program abstraction, KAT inclusion, a custom edit-distance algorithm on counterexamples, and case-analysis on nondeterministic branching. We have proved our algorithm to be sound. Finally, we implemented our algorithm as a tool called Knotical, on top of Interproc and Symkat. We demonstrate promising first steps in synthesizing trace refinement relations across a hand-crafted collection of 37 benchmarks that include changing fragments of array programs, models of systems code, and examples inspired by the thttpd and Merecat web servers.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Dependence-Aware, Unbounded Sound Predictive Race Detection
Kaan Genç, Jake Roemer, Yufan Xu, and Michael D. Bond
(Ohio State University, USA)
Data races are a real problem for parallel software, yet hard to detect. Sound predictive analysis observes a program execution and detects data races that exist in some other, unobserved execution. However, existing predictive analyses miss races because they do not scale to full program executions or do not precisely incorporate data and control dependence.
This paper introduces two novel, sound predictive approaches that incorporate data and control dependence and handle full program executions. An evaluation using real, large Java programs shows that these approaches detect more data races than the closest related approaches, thus advancing the state of the art in sound predictive race detection.

Publisher's Version Article Search
Trace Aware Random Testing for Distributed Systems
Burcu Kulahcioglu Ozkan, Rupak Majumdar, and Simin Oraee
(MPI-SWS, Germany)
Distributed and concurrent applications often have subtle bugs that only get exposed under specific schedules. While these schedules may be found by systematic model checking techniques, in practice, model checkers do not scale to large systems. On the other hand, naive random exploration techniques often require a very large number of runs to find the specific interactions needed to expose a bug. In recent years, several random testing algorithms have been proposed that, on the one hand, exploit state-space reduction strategies from model checking and, on the other, provide guarantees on the probability of hitting bugs of certain kinds.
These existing techniques exploit two orthogonal strategies to reduce the state space: partial-order reduction and bug depth. Testing algorithms based on partial order techniques, such as RAPOS or POS, ensure non-redundant exploration of independent interleavings among system events by imposing an equivalence relation on schedules and ideally exploring only one schedule from each equivalence class. Techniques based on bug depth, such as PCT, exploit the empirical observation that many bugs are exposed by the clever scheduling of a small number of key events. They bias the sample space of schedules to only cover all executions of small depth, rather than the much larger space of all schedules. At this point, there is no random testing algorithm that combines the power of both approaches.
In this paper, we provide such an algorithm. Our algorithm, trace-aware PCT (taPCTCP), extends and unifies several different algorithms in the random testing literature. It samples the space of low-depth executions by constructing a schedule online, while taking dependencies among events into account. Moreover, the algorithm comes with a theoretical guarantee on the probability of sampling a trace of low depth---the probability grows exponentially with the depth but only polynomially with the number of racy events explored. We further show that the guarantee is optimal among a large class of techniques.
We empirically compare our algorithm with several state-of-the-art random testing approaches for concurrent software on two large-scale distributed systems, Zookeeper and Cassandra, and show that our approach is effective in uncovering subtle bugs and usually outperforms related random testing algorithms.

Publisher's Version Article Search
Coverage Guided, Property Based Testing
Leonidas Lampropoulos, Michael Hicks, and Benjamin C. Pierce
(University of Maryland, USA; University of Pennsylvania, USA)
Property-based random testing, exemplified by frameworks such as Haskell's QuickCheck, works by testing an executable predicate (a property) on a stream of randomly generated inputs. Property testing works very well in many cases, but not always. Some properties are conditioned on the input satisfying demanding semantic invariants that are not consequences of its syntactic structure---e.g., that an input list must be sorted or have no duplicates. Most randomly generated inputs fail to satisfy properties with such sparse preconditions, and so are simply discarded. As a result, much of the target system may go untested.
We address this issue with a novel technique called coverage guided, property based testing (CGPT). Our approach is inspired by the related area of coverage guided fuzzing, exemplified by tools like AFL. Rather than just generating a fresh random input at each iteration, CGPT can also produce new inputs by mutating previous ones using type-aware, generic mutator operators. The target program is instrumented to track which control flow branches are executed during a run and inputs whose runs expand control-flow coverage are retained for future mutations. This means that, when sparse conditions in the target are satisfied and new coverage is observed, the input that triggered them will be retained and used as a springboard to go further.
We have implemented CGPT as an extension to the QuickChick property testing tool for Coq programs; we call our implementation FuzzChick. We evaluate FuzzChick on two Coq developments for abstract machines that aim to enforce flavors of noninterference, which has a (very) sparse precondition. We systematically inject bugs in the machines' checking rules and use FuzzChick to look for counterexamples to the claim that they satisfy a standard noninterference property. We find that vanilla QuickChick almost always fails to find any bugs after a long period of time, as does an earlier proposal for combining property testing and fuzzing. In contrast, FuzzChick often finds them within seconds to minutes. Moreover, FuzzChick is almost fully automatic; although highly tuned, hand-written generators can find the bugs faster, they require substantial amounts of insight and manual effort.

Publisher's Version Article Search
PlanAlyzer: Assessing Threats to the Validity of Online Experiments
Emma Tosch, Eytan Bakshy, Emery D. Berger, David D. Jensen, and J. Eliot B. Moss
(University of Massachusetts Amherst, USA; Facebook, USA)
Online experiments have become a ubiquitous aspect of design and engineering processes within Internet firms. As the scale of experiments has grown, so has the complexity of their design and implementation. In response, firms have developed software frameworks for designing and deploying online experiments. Ensuring that experiments in these frameworks are correctly designed and that their results are trustworthy---referred to as internal validity---can be difficult. Currently, verifying internal validity requires manual inspection by someone with substantial expertise in experimental design.
We present the first approach for statically checking the internal validity of online experiments. Our checks are based on well-known problems that arise in experimental design and causal inference. Our analyses target PlanOut, a widely deployed, open-source experimentation framework that uses a domain-specific language to specify and run complex experiments. We have built a tool called PlanAlyzer that checks PlanOut programs for a variety of threats to internal validity, including failures of randomization, treatment assignment, and causal sufficiency. PlanAlyzer uses its analyses to automatically generate contrasts, a key type of information required to perform valid statistical analyses over the results of these experiments. We demonstrate PlanAlyzer's utility on a corpus of PlanOut scripts deployed in production at Facebook, and we evaluate its ability to identify threats to validity on a mutated subset of this corpus. PlanAlyzer has both precision and recall of 92% on the mutated corpus, and 82% of the contrasts it generates match hand-specified data.

Publisher's Version Article Search
I/O Dependent Idempotence Bugs in Intermittent Systems
Milijana Surbatovich, Limin Jia, and Brandon Lucia
(Carnegie Mellon University, USA)
Intermittently-powered, energy-harvesting devices operate on energy collected from their environment and must operate intermittently as energy is available.
Runtime systems for such devices often rely on checkpoints or redo-logs to save execution state between power cycles, causing arbitrary code regions to re-execute on reboot. Any non-idempotent program behavior—behavior that can change on each execution—can lead to incorrect results.
This work investigates non-idempotent behavior caused by repeating I/O operations, not addressed by prior work. If such operations affect a control statement or address of a memory update, they can cause programs to take different paths or write to different memory locations on re-executions, resulting in inconsistent memory states. We provide the first characterization of input-dependent idempotence bugs and develop IBIS-S, a program analysis tool for detecting such bugs at compile time, and IBIS-D, a dynamic information flow tracker to detect bugs at runtime.
These tools use taint propagation to determine the reach of input. IBIS-S searches for code patterns leading to inconsistent memory updates, while IBIS-D detects concrete memory inconsistencies. We evaluate IBIS on embedded system drivers and applications. IBIS can detect I/O-dependent idempotence bugs, giving few (IBIS-S) or no (IBIS-D) false positives and providing actionable bug reports. These bugs are common in sensor-driven applications and are not fixed by existing intermittent systems.

Publisher's Version Article Search
Initialize Once, Start Fast: Application Initialization at Build Time
Christian Wimmer, Codrut Stancu, Peter Hofer, Vojin Jovanovic, Paul Wögerer, Peter B. Kessler, Oleg Pliss, and Thomas Würthinger
(Oracle Labs, USA; Oracle Labs, Austria; Oracle Labs, Switzerland)
Arbitrary program extension at run time in language-based VMs, e.g., Java's dynamic class loading, comes at a startup cost: high memory footprint and slow warmup. Cloud computing amplifies the startup overhead. Microservices and serverless cloud functions lead to small, self-contained applications that are started often. Slow startup and high memory footprint directly affect the cloud hosting costs, and slow startup can also break service-level agreements. Many applications are limited to a prescribed set of pre-tested classes, i.e., use a closed-world assumption at deployment time. For such Java applications, GraalVM Native Image offers fast startup and stable performance.
GraalVM Native Image uses a novel iterative application of points-to analysis and heap snapshotting, followed by ahead-of-time compilation with an optimizing compiler. Initialization code can run at build time, i.e., executables can be tailored to a particular application configuration. Execution at run time starts with a pre-populated heap, leveraging copy-on-write memory sharing. We show that this approach improves the startup performance by up to two orders of magnitude compared to the Java HotSpot VM, while preserving peak performance. This allows Java applications to have a better startup performance than Go applications and the V8 JavaScript VM.

Publisher's Version Article Search Info
Safer Smart Contract Programming with Scilla
Ilya Sergey, Vaivaswatha Nagaraj, Jacob Johannsen, Amrit Kumar, Anton Trunov, and Ken Chan Guan Hao
(Yale-NUS College, Singapore; National University of Singapore, Singapore; Zilliqa Research, India; Zilliqa Research, Denmark; Zilliqa Research, UK; Zilliqa Research, Russia; Zilliqa Research, Malaysia)
The rise of programmable open distributed consensus platforms based on the blockchain technology has aroused a lot of interest in replicated stateful computations, aka smart contracts. As blockchains are used predominantly in financial applications, smart contracts frequently manage millions of dollars worth of virtual coins. Since smart contracts cannot be updated once deployed, the ability to reason about their correctness becomes a critical task. Yet, the de facto implementation standard, pioneered by the Ethereum platform, dictates smart contracts to be deployed in a low-level language, which renders independent audit and formal verification of deployed code infeasible in practice.
We report an ongoing experiment held with an industrial blockchain vendor on designing, evaluating, and deploying Scilla, a new programming language for safe smart contracts. Scilla is positioned as an intermediate-level language, suitable to serve as a compilation target and also as an independent programming framework. Taking System F as a foundational calculus, Scilla offers strong safety guarantees by means of type soundness. It provides a clean separation between pure computational, state-manipulating, and communication aspects of smart contracts, avoiding many known pitfalls due to execution in a byzantine environment. We describe the motivation, design principles, and semantics of Scilla, and we report on Scilla use cases provided by the developer community. Finally, we present a framework for lightweight verification of Scilla programs, and showcase it with two domain-specific analyses on a suite of real-world use cases.

Publisher's Version Article Search Info Artifacts Available Artifacts Reusable Artifacts Functional
ApproxHPVM: A Portable Compiler IR for Accuracy-Aware Optimizations
Hashim Sharif, Prakalp Srivastava, Muhammad Huzaifa, Maria Kotsifakou, Keyur Joshi, Yasmin Sarita, Nathan Zhao, Vikram S. Adve, Sasa Misailovic, and Sarita Adve
(University of Illinois at Urbana-Champaign, USA; Cornell University, USA)
We propose ApproxHPVM, a compiler IR and system designed to enable accuracy-aware performance and energy tuning on heterogeneous systems with multiple compute units and approximation methods. ApproxHPVM automatically translates end-to-end application-level quality metrics into accuracy requirements for individual operations. ApproxHPVM uses a hardware-agnostic accuracy-tuning phase to do this translation that provides greater portability across heterogeneous hardware platforms and enables future capabilities like accuracy-aware dynamic scheduling and design space exploration.
ApproxHPVM incorporates three main components: (a) a compiler IR with hardware-agnostic approximation metrics, (b) a hardware-agnostic accuracy-tuning phase to identify error-tolerant computations, and (c) an accuracy-aware hardware scheduler that maps error-tolerant computations to approximate hardware components. As ApproxHPVM does not incorporate any hardware-specific knowledge as part of the IR, it can serve as a portable virtual ISA that can be shipped to all kinds of hardware platforms.
We evaluate our framework on nine benchmarks from the deep learning domain and five image processing benchmarks. Our results show that our framework can offload chunks of approximable computations to special-purpose accelerators that provide significant gains in performance and energy, while staying within user-specified application-level quality metrics with high probability. Across the 14 benchmarks, we observe from 1-9x performance speedups and 1.1-11.3x energy reduction for very small reductions in accuracy.

Publisher's Version Article Search
Reflection-Aware Static Regression Test Selection
August Shi, Milica Hadzi-Tanovic, Lingming Zhang, Darko Marinov, and Owolabi Legunsen
(University of Illinois at Urbana-Champaign, USA; University of Texas at Dallas, USA)
Regression test selection (RTS) aims to speed up regression testing by rerunning only tests that are affected by code changes. RTS can be performed using static or dynamic analysis techniques. Our prior study showed that static and dynamic RTS perform similarly for medium-sized Java projects. However, the results of that prior study also showed that static RTS can be unsafe, missing to select tests that dynamic RTS selects, and that reflection was the only cause of unsafety observed among the evaluated projects.
In this paper, we investigate five techniques—three purely static techniques and two hybrid static-dynamic techniques—that aim to make static RTS safe with respect to reflection. We implement these reflection-aware (RA) techniques by extending the reflection-unaware (RU) class-level static RTS technique in a tool called STARTS. To evaluate these RA techniques, we compare their end-to-end times with RU, and with RetestAll, which reruns all tests after every code change. We also compare safety and precision of the RA techniques with Ekstazi, a state-of-the-art dynamic RTS technique; precision is a measure of unaffected tests selected.
Our evaluation on 1173 versions of 24 open-source Java projects shows negative results. The RA techniques improve the safety of RU but at very high costs. The purely static techniques are safe in our experiments but decrease the precision of RU, with end-to-end time at best 85.8% of RetestAll time, versus 69.1% for RU. One hybrid static-dynamic technique improves the safety of RU but at high cost, with end-to-end time that is 91.2% of RetestAll. The other hybrid static-dynamic technique provides better precision, is safer than RU, and incurs lower end-to-end time—75.8% of RetestAll, but it can still be unsafe in the presence of test-order dependencies. Our study highlights the challenges involved in making static RTS safe with respect to reflection.

Publisher's Version Article Search
Automatic and Scalable Detection of Logical Errors in Functional Programming Assignments
Dowon Song, Myungho Lee, and Hakjoo Oh
(Korea University, South Korea)
We present a new technique for automatically detecting logical errors in functional programming assignments. Compared to syntax or type errors, detecting logical errors remains largely a manual process that requires hand-made test cases. However, designing proper test cases is nontrivial and involves a lot of human effort. Furthermore, manual test cases are unlikely to catch diverse errors because instructors cannot predict all corner cases of diverse student submissions. We aim to reduce this burden by automatically generating test cases for functional programs. Given a reference program and a student's submission, our technique generates a counter-example that captures the semantic difference of the two programs without any manual effort. The key novelty behind our approach is the counter-example generation algorithm that combines enumerative search and symbolic verification techniques in a synergistic way. The experimental results show that our technique is able to detect 88 more errors not found by mature test cases that have been improved over the past few years, and performs better than the existing property-based testing techniques. We also demonstrate the usefulness of our technique in the context of automated program repair, where it effectively helps to eliminate test-suite-overfitted patches.

Publisher's Version Article Search Artifacts Available Artifacts Reusable Artifacts Functional
Detecting Nondeterministic Payment Bugs in Ethereum Smart Contracts
Shuai Wang, Chengyu Zhang, and Zhendong Su
(Hong Kong University of Science and Technology, China; East China Normal University, China; ETH Zurich, Switzerland)
The term “smart contracts” has become ubiquitous to describe an enormous number of programs uploaded to the popular Ethereum blockchain system. Despite rapid growth of the smart contract ecosystem, errors and exploitations have been constantly reported from online contract systems, which has put financial stability at risk with losses totaling millions of US dollars. Most existing research focuses on pinpointing specific types of vulnerabilities using known patterns. However, due to the lack of awareness of the inherent nondeterminism in the Ethereum blockchain system and how it affects the funds transfer of smart contracts, there can be unknown vulnerabilities that may be exploited by attackers to access numerous online smart contracts.
In this paper, we introduce a methodical approach to understanding the inherent nondeterminism in the Ethereum blockchain system and its (unwanted) influence on contract payments. We show that our new focus on nondeterminism-related smart contract payment bugs captures the root causes of many common vulnerabilities without relying on any known patterns and also encompasses recently disclosed issues that are not handled by existing research. To do so, we introduce techniques to systematically model components in the contract execution context and to expose various nondeterministic factors that are not yet fully understood. We further study how these nondeterministic factors impact contract funds transfer using information flow tracking. The technical challenge of detecting nondeterministic payments lies in discovering the contract global variables subtly affected by read-write hazards because of unpredictable transaction scheduling and external callee behavior. We show how to augment and instrument a contract program into a representation that simulates the execution of a large subset of the contract behavior. The instrumented code is then analyzed to flag nondeterministic global variables using off-the-shelf model checkers.
We implement the proposed techniques as a practical tool named NPChecker (Nondeterministic Payment Checker) and evaluate it on 30K online contracts (3,075 distinct) collected from the Ethereum mainnet. NPChecker has successfully detected nondeterministic payments in 1,111 online contracts with reasonable cost. Further investigation reports high precision of NPChecker (only four false positives in a manual study of 50 contracts). We also show that NPChecker unveils contracts vulnerable to recently-disclosed attack vectors. NPChecker can identify all six new vulnerabilities or variants of common smart contract vulnerabilities that are missed by existing research relying on a “contract vulnerability checklist.”

Publisher's Version Article Search

proc time: 0.34