Powered by
Conference Publishing Consulting

37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2016), June 13–17, 2016, Santa Barbara, CA, USA

PLDI 2016 – Proceedings

Contents - Abstracts - Authors

Frontmatter

Title Page

Message from the Chairs
Welcome to PLDI 2016, the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation; held this year in Santa Barbara, California. PLDI is the premier research conference on programming languages and their implementation.
Organization
PLDI 2016 Organization
Keynote Presentations
PLDI 2016 Keynote Presentations
Sponsors and Supporters
Sponsors and Supporters

Research Papers

Down to the Metal I

Into the Depths of C: Elaborating the De Facto Standards
Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert N. M. Watson, and Peter Sewell
(University of Cambridge, UK)
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-prone situation. First, we describe an in-depth analysis of the design space for the semantics of pointers and memory in C as it is used in practice. We articulate many specific questions, build a suite of semantic test cases, gather experimental data from multiple implementations, and survey what C experts believe about the de facto standards. We identify questions where there is a consensus (either following ISO or differing) and where there are conflicts. We apply all this to an experimental C implemented above capability hardware. Second, we describe a formal model, Cerberus, for large parts of C. Cerberus is parameterised on its memory model; it is linkable either with a candidate de facto memory object model, under construction, or with an operational C11 concurrency model; it is defined by elaboration to a much simpler Core language for accessibility, and it is executable as a test oracle on small examples. This should provide a solid basis for discussion of what mainstream C is now: what programmers and analysis tools can assume and what compilers aim to implement. Ultimately we hope it will be a step towards clear, consistent, and accepted semantics for the various use-cases of C.
Publisher's Version Article Search Info
Living on the Edge: Rapid-Toggling Probes with Cross-Modification on x86
Buddhika Chamith, Bo Joel Svensson, Luke Dalessandro, and Ryan R. Newton
(Indiana University, USA)
Dynamic probe injection is now a widely used method to debug performance in production. Current techniques for dynamic probing of native code, however, rely on an expensive stop-the-world approach: binary changes are made within a safe state of the program---typically in which all the program threads are halted---to ensure that another thread executing the modified code region doesn't step into a partially-modified code. Stop-the-world patching is not scalable. In contrast, low overhead, scalable probes that can be rapidly toggled on and off in-place would open up new use cases for statistical profilers and language implementations, even traditional ahead-of-time, native-code compilers. In this paper we introduce safe cross-modification protocols that mutate x86 code between threads but do not require quiescing threads, resulting in radically lower overheads than existing solutions. A key problem is handling instructions that straddle cache lines. We empirically evaluate existing x86 architectures to derive a safe policy given current processor behavior, and we argue that future architectures should clarify the semantics of instruction fetching to make cheap cross-modification easier and future proof.
Publisher's Version Article Search
Polymorphic Type Inference for Machine Code
Matt Noonan, Alexey Loginov, and David Cok
(GrammaTech, USA)
For many compiled languages, source-level types are erased very early in the compilation process. As a result, further compiler passes may convert type-safe source into type-unsafe machine code. Type-unsafe idioms in the original source and type-unsafe optimizations mean that type information in a stripped binary is essentially nonexistent. The problem of recovering high-level types by performing type inference over stripped machine code is called type reconstruction, and offers a useful capability in support of reverse engineering and decompilation. In this paper, we motivate and develop a novel type system and algorithm for machine-code type inference. The features of this type system were developed by surveying a wide collection of common source- and machine-code idioms, building a catalog of challenging cases for type reconstruction. We found that these idioms place a sophisticated set of requirements on the type system, inducing features such as recursively-constrained polymorphic types. Many of the features we identify are often seen only in expressive and powerful type systems used by high-level functional languages. Using these type-system features as a guideline, we have developed Retypd: a novel static type-inference algorithm for machine code that supports recursive types, polymorphism, and subtyping. Retypd yields more accurate inferred types than existing algorithms, while also enabling new capabilities such as reconstruction of pointer const annotations with 98% recall. Retypd can operate on weaker program representations than the current state of the art, removing the need for high-quality points-to information that may be impractical to compute.
Publisher's Version Article Search Info

Verification I

Data-Driven Precondition Inference with Learned Features
Saswat Padhi, Rahul Sharma, and Todd Millstein
(University of California at Los Angeles, USA; Stanford University, USA)
We extend the data-driven approach to inferring preconditions for code from a set of test executions. Prior work requires a fixed set of features, atomic predicates that define the search space of possible preconditions, to be specified in advance. In contrast, we introduce a technique for on-demand feature learning, which automatically expands the search space of candidate preconditions in a targeted manner as necessary. We have instantiated our approach in a tool called PIE. In addition to making precondition inference more expressive, we show how to apply our feature-learning technique to the setting of data-driven loop invariant inference. We evaluate our approach by using PIE to infer rich preconditions for black-box OCaml library functions and using our loop-invariant inference algorithm as part of an automatic program verifier for C++ programs.
Publisher's Version Article Search
Cartesian Hoare Logic for Verifying k-Safety Properties
Marcelo Sousa and Isil Dillig
(University of Oxford, UK; University of Texas at Austin, USA)
Unlike safety properties which require the absence of a “bad” program trace, k-safety properties stipulate the absence of a “bad” interaction between k traces. Examples of k-safety properties include transitivity, associativity, anti-symmetry, and monotonicity. This paper presents a sound and relatively complete calculus, called Cartesian Hoare Logic (CHL), for verifying k-safety properties. We also present an automated verification algorithm based on CHL and implement it in a tool called DESCARTES. We use DESCARTES to analyze user-defined relational operators in Java and demonstrate that DESCARTES is effective at verifying (or finding violations of) multiple k-safety properties.
Publisher's Version Article Search
Verifying Bit-Manipulations of Floating-Point
Wonyeol Lee, Rahul Sharma, and Alex Aiken
(Stanford University, USA)
Reasoning about floating-point is difficult and becomes only more so if there is an interplay between floating-point and bit-level operations. Even though real-world floating-point libraries use implementations that have such mixed computations, no systematic technique to verify the correctness of the implementations of such computations is known. In this paper, we present the first general technique for verifying the correctness of mixed binaries, which combines abstraction, analytical optimization, and testing. The technique provides a method to compute an error bound of a given implementation with respect to its mathematical specification. We apply our technique to Intel's implementations of transcendental functions and prove formal error bounds for these widely used routines.
Publisher's Version Article Search

Testing and Debugging

Coverage-Directed Differential Testing of JVM Implementations
Yuting Chen, Ting Su, Chengnian Sun, Zhendong Su, and Jianjun Zhao
(Shanghai Jiao Tong University, China; East China Normal University, China; University of California at Davis, USA; Kyushu University, Japan)
Java virtual machine (JVM) is a core technology, whose reliability is critical. Testing JVM implementations requires painstaking effort in designing test classfiles (*.class) along with their test oracles. An alternative is to employ binary fuzzing to differentially test JVMs by blindly mutating seeding classfiles and then executing the resulting mutants on different JVM binaries for revealing inconsistent behaviors. However, this blind approach is not cost effective in practice because most of the mutants are invalid and redundant. This paper tackles this challenge by introducing classfuzz, a coverage-directed fuzzing approach that focuses on representative classfiles for differential testing of JVMs’ startup processes. Our core insight is to (1) mutate seeding classfiles using a set of predefined mutation operators (mutators) and employ Markov Chain Monte Carlo (MCMC) sampling to guide mutator selection, and (2) execute the mutants on a reference JVM implementation and use coverage uniqueness as a discipline for accepting representative ones. The accepted classfiles are used as inputs to differentially test different JVM implementations and find defects. We have implemented classfuzz and conducted an extensive evaluation of it against existing fuzz testing algorithms. Our evaluation results show that classfuzz can enhance the ratio of discrepancy-triggering classfiles from 1.7% to 11.9%. We have also reported 62 JVM discrepancies, along with the test classfiles, to JVM developers. Many of our reported issues have already been confirmed as JVM defects, and some even match recent clarifications and changes to the Java SE 8 edition of the JVM specification.
Publisher's Version Article Search
Exposing Errors Related to Weak Memory in GPU Applications
Tyler Sorensen and Alastair F. Donaldson
(Imperial College London, UK)
We present the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. We evaluate our approach on seven GPUs spanning three Nvidia architectures, across ten CUDA applications that use fine-grained concurrency. Our results show that applications that rarely or never exhibit errors related to weak memory when executed natively can readily exhibit these errors when executed in our testing environment. Our testing environment also provides a means to help identify the root causes of such errors, and automatically suggests how to insert fences that harden an application against weak memory bugs. To understand the cost of GPU fences, we benchmark applications with fences provided by the hardening strategy as well as a more conservative, sound fencing strategy.
Publisher's Version Article Search
Lightweight Computation Tree Tracing for Lazy Functional Languages
Maarten Faddegon and Olaf Chitil
(University of Kent, UK)
A computation tree of a program execution describes computations of functions and their dependencies. A computation tree describes how a program works and is at the heart of algorithmic debugging. To generate a computation tree, existing algorithmic debuggers either use a complex implementation or yield a less informative approximation. We present a method for lazy functional languages that requires only a simple tracing library to generate a detailed computation tree. With our algorithmic debugger a programmer can debug any Haskell program by only importing our library and annotating suspected functions.
Publisher's Version Article Search aec-badge-pldi

Energy and Performance

Effective Padding of Multidimensional Arrays to Avoid Cache Conflict Misses
Changwan Hong, Wenlei Bao, Albert Cohen, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, J. Ramanujam, and P. Sadayappan
(Ohio State University, USA; Inria, France; ENS, France; Pacific Northwest National Laboratory, USA; Louisiana State University, USA)
Caches are used to significantly improve performance. Even with high degrees of set associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity. This can cause conflict misses and lower performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays aimed at a set-associative cache for arbitrary tile sizes. In addition, we develop the first solution to padding for nested tiles and multi-level caches. Experimental results with multiple benchmarks demonstrate a significant performance improvement from padding.
Publisher's Version Article Search
GreenWeb: Language Extensions for Energy-Efficient Mobile Web Computing
Yuhao Zhu and Vijay Janapa Reddi
(University of Texas at Austin, USA)
Web computing is gradually shifting toward mobile devices, in which the energy budget is severely constrained. As a result, Web developers must be conscious of energy efficiency. However, current Web languages provide developers little control over energy consumption. In this paper, we take a first step toward language-level research to enable energy-efficient Web computing. Our key motivation is that mobile systems can wisely budget energy usage if informed with user quality-of-service (QoS) constraints. To do this, programmers need new abstractions. We propose two language abstractions, QoS type and QoS target, to capture two fundamental aspects of user QoS experience. We then present GreenWeb, a set of language extensions that empower developers to easily express the QoS abstractions as program annotations. As a proof of concept, we develop a GreenWeb runtime, which intelligently determines how to deliver specified user QoS expectation while minimizing energy consumption. Overall, GreenWeb shows significant energy savings (29.2% ∼ 66.0%) over Android’s default Interactive governor with few QoS violations. Our work demonstrates a promising first step toward language innovations for energy-efficient Web computing.
Publisher's Version Article Search
Input Responsiveness: Using Canary Inputs to Dynamically Steer Approximation
Michael A. Laurenzano, Parker Hill, Mehrzad Samadi, Scott Mahlke, Jason Mars, and Lingjia Tang
(University of Michigan, USA)
This paper introduces Input Responsive Approximation (IRA), an approach that uses a canary input — a small program input carefully constructed to capture the intrinsic properties of the original input — to automatically control how program approximation is applied on an input-by-input basis. Motivating this approach is the observation that many of the prior techniques focusing on choosing how to approximate arrive at conservative decisions by discounting substantial differences between inputs when applying approximation. The main challenges in overcoming this limitation lie in making the choice of how to approximate both effectively (e.g., the fastest approximation that meets a particular accuracy target) and rapidly for every input. With IRA, each time the approximate program is run, a canary input is constructed and used dynamically to quickly test a spectrum of approximation alternatives. Based on these runtime tests, the approximation that best fits the desired accuracy constraints is selected and applied to the full input to produce an approximate result. We use IRA to select and parameterize mixes of four approximation techniques from the literature for a range of 13 image processing, machine learning, and data mining applications. Our results demonstrate that IRA significantly outperforms prior approaches, delivering an average of 10.2× speedup over exact execution while minimizing accuracy losses in program outputs.
Publisher's Version Article Search

New Languages

Configuration Synthesis for Programmable Analog Devices with Arco
Sara Achour, Rahul Sarpeshkar, and Martin C. Rinard
(Massachusetts Institute of Technology, USA; Dartmouth College, USA)
Programmable analog devices have emerged as a powerful computing substrate for performing complex neuromorphic and cytomorphic computations. We present Arco, a new solver that, given a dynamical system specification in the form of a set of differential equations, generates physically realizable configurations for programmable analog devices that are algebraically equivalent to the specified system. On a set of benchmarks from the biological domain, Arco generates configurations with 35 to 534 connections and 28 to 326 components in 1 to 54 minutes.
Publisher's Version Article Search aec-badge-pldi
From Datalog to Flix: A Declarative Language for Fixed Points on Lattices
Magnus Madsen, Ming-Ho Yee, and Ondřej Lhoták
(University of Waterloo, Canada)
We present Flix, a declarative programming language for specifying and solving least fixed point problems, particularly static program analyses. Flix is inspired by Datalog and extends it with lattices and monotone functions. Using Flix, implementors of static analyses can express a broader range of analyses than is currently possible in pure Datalog, while retaining its familiar rule-based syntax. We define a model-theoretic semantics of Flix as a natural extension of the Datalog semantics. This semantics captures the declarative meaning of Flix programs without imposing any specific evaluation strategy. An efficient strategy is semi-naive evaluation which we adapt for Flix. We have implemented a compiler and runtime for Flix, and used it to express several well-known static analyses, including the IFDS and IDE algorithms. The declarative nature of Flix clearly exposes the similarity between these two algorithms.
Publisher's Version Article Search
Latte: A Language, Compiler, and Runtime for Elegant and Efficient Deep Neural Networks
Leonard Truong, Rajkishore Barik, Ehsan Totoni, Hai Liu, Chick Markley, Armando Fox, and Tatiana Shpeisman
(Intel Labs, USA; University of California at Berkeley, USA)
Deep neural networks (DNNs) have undergone a surge in popularity with consistent advances in the state of the art for tasks including image recognition, natural language processing, and speech recognition. The computationally expensive nature of these networks has led to the proliferation of implementations that sacrifice abstraction for high performance. In this paper, we present Latte, a domain-specific language for DNNs that provides a natural abstraction for specifying new layers without sacrificing performance. Users of Latte express DNNs as ensembles of neurons with connections between them. The Latte compiler synthesizes a program based on the user specification, applies a suite of domain-specific and general optimizations, and emits efficient machine code for heterogeneous architectures. Latte also includes a communication runtime for distributed memory data-parallelism. Using networks described using Latte, we demonstrate 3-6x speedup over Caffe (C++/MKL) on the three state-of-the-art ImageNet models executing on an Intel Xeon E5-2699 v3 x86 CPU.
Publisher's Version Article Search

Parsing and Compilation

On the Complexity and Performance of Parsing with Derivatives
Michael D. Adams, Celeste Hollenbeck, and Matthew Might
(University of Utah, USA)
Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously. Might et al. introduced parsing with derivatives, which handles arbitrary context-free grammars while being both easy to understand and simple to implement. Despite much initial enthusiasm and a multitude of independent implementations, its worst-case complexity has never been proven to be better than exponential. In fact, high-level arguments claiming it is fundamentally exponential have been advanced and even accepted as part of the folklore. Performance ended up being sluggish in practice, and this sluggishness was taken as informal evidence of exponentiality. In this paper, we reexamine the performance of parsing with derivatives. We have discovered that it is not exponential but, in fact, cubic. Moreover, simple (though perhaps not obvious) modifications to the implementation by Might et al. lead to an implementation that is not only easy to understand but also highly performant in practice.
Publisher's Version Article Search

Down to the Metal II

Stratified Synthesis: Automatically Learning the x86-64 Instruction Set
Stefan Heule, Eric Schkufza, Rahul Sharma, and Alex Aiken
(Stanford University, USA; VMware, USA)
The x86-64 ISA sits at the bottom of the software stack of most desktop and server software. Because of its importance, many software analysis and verification tools depend, either explicitly or implicitly, on correct modeling of the semantics of x86-64 instructions. However, formal semantics for the x86-64 ISA are difficult to obtain and often written manually through great effort. We describe an automatically synthesized formal semantics of the input/output behavior for a large fraction of the x86-64 Haswell ISA’s many thousands of instruction variants. The key to our results is stratified synthesis, where we use a set of instructions whose semantics are known to synthesize the semantics of additional instructions whose semantics are unknown. As the set of formally described instructions increases, the synthesis vocabulary expands, making it possible to synthesize the semantics of increasingly complex instructions. Using this technique we automatically synthesized formal semantics for 1,795 instruction variants of the x86-64 Haswell ISA. We evaluate the learned semantics against manually written semantics (where available) and find that they are formally equivalent with the exception of 50 instructions, where the manually written semantics contain an error. We further find the learned formulas to be largely as precise as manually written ones and of similar size.
Publisher's Version Article Search Info
Remix: Online Detection and Repair of Cache Contention for the JVM
Ariel Eizenberg, Shiliang Hu, Gilles Pokam, and Joseph Devietti
(University of Pennsylvania, USA; Intel, USA)
As ever more computation shifts onto multicore architectures, it is increasingly critical to find effective ways of dealing with multithreaded performance bugs like true and false sharing. Previous approaches to fixing false sharing in unmanaged languages have employed highly-invasive runtime program modifications. We observe that managed language runtimes, with garbage collection and JIT code compilation, present unique opportunities to repair such bugs directly, mirroring the techniques used in manual repairs. We present Remix, a modified version of the Oracle HotSpot JVM which can detect cache contention bugs and repair false sharing at runtime. Remix's detection mechanism leverages recent performance counter improvements on Intel platforms, which allow for precise, unobtrusive monitoring of cache contention at the hardware level. Remix can detect and repair known false sharing issues in the LMAX Disruptor high-performance inter-thread messaging library and the Spring Reactor event-processing framework, automatically providing 1.5-2x speedups over unoptimized code and matching the performance of hand-optimization. Remix also finds a new false sharing bug in SPECjvm2008, and uncovers a true sharing bug in the HotSpot JVM that, when fixed, improves the performance of three NAS Parallel Benchmarks by 7-25x. Remix incurs no statistically-significant performance overhead on other benchmarks that do not exhibit cache contention, making Remix practical for always-on use.
Publisher's Version Article Search aec-badge-pldi
Statistical Similarity of Binaries
Yaniv David, Nimrod Partush, and Eran Yahav
(Technion, Israel)
We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled using different compilers, or has been modified. The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures. We have implemented our approach in a tool called Esh, and applied it to find various prominent vulnerabilities across compilers and versions, including Heartbleed, Shellshock and Venom. We show that Esh produces high accuracy results, with few to no false positives -- a crucial factor in the scenario of vulnerability search in stripped binaries.
Publisher's Version Article Search Info

Types I

Accepting Blame for Safe Tunneled Exceptions
Yizhou Zhang, Guido Salvaneschi, Quinn Beightol, Barbara Liskov, and Andrew C. Myers
(Cornell University, USA; TU Darmstadt, Germany; Massachusetts Institute of Technology, USA)
Unhandled exceptions crash programs, so a compile-time check that exceptions are handled should in principle make software more reliable. But designers of some recent languages have argued that the benefits of statically checked exceptions are not worth the costs. We introduce a new statically checked exception mechanism that addresses the problems with existing checked-exception mechanisms. In particular, it interacts well with higher-order functions and other design patterns. The key insight is that whether an exception should be treated as a "checked" exception is not a property of its type but rather of the context in which the exception propagates. Statically checked exceptions can "tunnel" through code that is oblivious to their presence, but the type system nevertheless checks that these exceptions are handled. Further, exceptions can be tunneled without being accidentally caught, by expanding the space of exception identifiers to identify the exception-handling context. The resulting mechanism is expressive and syntactically light, and can be implemented efficiently. We demonstrate the expressiveness of the mechanism using significant codebases and evaluate its performance. We have implemented this new exception mechanism as part of the new Genus programming language, but the mechanism could equally well be applied to other programming languages.
Publisher's Version Article Search
Occurrence Typing Modulo Theories
Andrew M. Kent, David Kempe, and Sam Tobin-Hochstadt
(Indiana University, USA)
We present a new type system combining occurrence typing---a technique previously used to type check programs in dynamically-typed languages such as Racket, Clojure, and JavaScript---with dependent refinement types. We demonstrate that the addition of refinement types allows the integration of arbitrary solver-backed reasoning about logical propositions from external theories. By building on occurrence typing, we can add our enriched type system as a natural extension of Typed Racket, reusing its core while increasing its expressiveness. The result is a well-tested type system with a conservative, decidable core in which types may depend on a small but extensible set of program terms. In addition to describing our design, we present the following: a formal model and proof of correctness; a strategy for integrating new theories, with specific examples including linear arithmetic and bitvectors; and an evaluation in the context of the full Typed Racket implementation. Specifically, we take safe vector operations as a case study, examining all vector accesses in a 56,000 line corpus of Typed Racket programs. Our system is able to prove that 50% of these are safe with no new annotations, and with a few annotations and modifications we capture more than 70%.
Publisher's Version Article Search aec-badge-pldi
Refinement Types for TypeScript
Panagiotis Vekris, Benjamin Cosman, and Ranjit Jhala
(University of California at San Diego, USA)
We present Refined TypeScript (RSC), a lightweight refinement type system for TypeScript, that enables static verification of higher-order, imperative programs. We develop a formal system for RSC that delineates the interaction between refinement types and mutability, and enables flow-sensitive reasoning by translating input programs to an equivalent intermediate SSA form. By establishing type safety for the intermediate form, we prove safety for the input programs. Next, we extend the core to account for imperative and dynamic features of TypeScript, including overloading, type reflection, ad hoc type hierarchies and object initialization. Finally, we evaluate RSC on a set of real-world benchmarks, including parts of the Octane benchmarks, D3, Transducers, and the TypeScript compiler. We show how RSC successfully establishes a number of value dependent properties, such as the safety of array accesses and downcasts, while incurring a modest overhead in type annotations and code restructuring.
Publisher's Version Article Search aec-badge-pldi

Synthesis I

MapReduce Program Synthesis
Calvin Smith and Aws Albarghouthi
(University of Wisconsin-Madison, USA)
By abstracting away the complexity of distributed systems, large-scale data processing platforms—MapReduce, Hadoop, Spark, Dryad, etc.—have provided developers with simple means for harnessing the power of the cloud. In this paper, we ask whether we can automatically synthesize MapReduce-style distributed programs from input–output examples. Our ultimate goal is to enable end users to specify large-scale data analyses through the simple interface of examples. We thus present a new algorithm and tool for synthesizing programs composed of efficient data-parallel operations that can execute on cloud computing infrastructure. We evaluate our tool on a range of real-world big-data analysis tasks and general computations. Our results demonstrate the efficiency of our approach and the small number of examples it requires to synthesize correct, scalable programs.
Publisher's Version Article Search
Programmatic and Direct Manipulation, Together at Last
Ravi Chugh, Brian Hempel, Mitchell Spradlin, and Jacob Albers
(University of Chicago, USA)
Direct manipulation interfaces and programmatic systems have distinct and complementary strengths. The former provide intuitive, immediate visual feedback and enable rapid prototyping, whereas the latter enable complex, reusable abstractions. Unfortunately, existing systems typically force users into just one of these two interaction modes. We present a system called Sketch-n-Sketch that integrates programmatic and direct manipulation for the particular domain of Scalable Vector Graphics (SVG). In Sketch-n-Sketch, the user writes a program to generate an output SVG canvas. Then the user may directly manipulate the canvas while the system immediately infers a program update in order to match the changes to the output, a workflow we call live synchronization. To achieve this, we propose (i) a technique called trace-based program synthesis that takes program execution history into account in order to constrain the search space and (ii) heuristics for dealing with ambiguities. Based on our experience with examples spanning 2,000 lines of code and from the results of a preliminary user study, we believe that Sketch-n-Sketch provides a novel workflow that can augment traditional programming systems. Our approach may serve as the basis for live synchronization in other application domains, as well as a starting point for yet more ambitious ways of combining programmatic and direct manipulation.
Publisher's Version Article Search Info aec-badge-pldi
Fast Synthesis of Fast Collections
Calvin Loncaric, Emina Torlak, and Michael D. Ernst
(University of Washington, USA)
Many applications require specialized data structures not found in the standard libraries, but implementing new data structures by hand is tedious and error-prone. This paper presents a novel approach for synthesizing efficient implementations of complex collection data structures from high-level specifications that describe the desired retrieval operations. Our approach handles a wider range of data structures than previous work, including structures that maintain an order among their elements or have complex retrieval methods. We have prototyped our approach in a data structure synthesizer called Cozy. Four large, real-world case studies compare structures generated by Cozy against handwritten implementations in terms of correctness and performance. Structures synthesized by Cozy match the performance of handwritten data structures while avoiding human error.
Publisher's Version Article Search Info aec-badge-pldi

Software-Defined Networking

Event-Driven Network Programming
Jedidiah McClurg, Hossein Hojjat, Nate Foster, and Pavol Černý
(University of Colorado at Boulder, USA; Cornell University, USA)
Software-defined networking (SDN) programs must simultaneously describe static forwarding behavior and dynamic updates in response to events. Event-driven updates are critical to get right, but difficult to implement correctly due to the high degree of concurrency in networks. Existing SDN platforms offer weak guarantees that can break application invariants, leading to problems such as dropped packets, degraded performance, security violations, etc. This paper introduces EVENT-DRIVEN CONSISTENT UPDATES that are guaranteed to preserve well-defined behaviors when transitioning between configurations in response to events. We propose NETWORK EVENT STRUCTURES (NESs) to model constraints on updates, such as which events can be enabled simultaneously and causal dependencies between events. We define an extension of the NetKAT language with mutable state, give semantics to stateful programs using NESs, and discuss provably-correct strategies for implementing NESs in SDNs. Finally, we evaluate our approach empirically, demonstrating that it gives well-defined consistency guarantees while avoiding expensive synchronization and packet buffering.
Publisher's Version Article Search aec-badge-pldi
Temporal NetKAT
Ryan Beckett, Michael Greenberg, and David Walker
(Princeton University, USA; Pomona College, USA)
Over the past 5-10 years, the rise of software-defined networking (SDN) has inspired a wide range of new systems, libraries, hypervisors and languages for programming, monitoring, and debugging network behavior. Oftentimes, these systems are disjoint—one language for programming and another for verification, and yet another for run-time monitoring and debugging. In this paper, we present a new, unified framework, called Temporal NetKAT, capable of facilitating all of these tasks at once. As its name suggests, Temporal NetKAT is the synthesis of two formal theories: past-time (finite trace) linear temporal logic and (network) Kleene Algebra with Tests. Temporal predicates allow programmers to write down concise properties of a packet’s path through the network and to make dynamic packet-forwarding, access control or debugging decisions on that basis. In addition to being useful for programming, the combined equational theory of LTL and NetKAT facilitates proofs of path-based correctness properties. Using new, general, proof techniques, we show that the equational semantics is sound with respect to the denotational semantics, and, for a class of programs we call network-wide programs, complete. We have also implemented a compiler for temporal NetKAT, evaluated its performance on a range of benchmarks, and studied the effectiveness of several optimizations.
Publisher's Version Article Search aec-badge-pldi
SDNRacer: Concurrency Analysis for Software-Defined Networks
Ahmed El-Hassany, Jeremie Miserez, Pavol Bielik, Laurent Vanbever, and Martin Vechev
(ETH Zurich, Switzerland)
Concurrency violations are an important source of bugs in Software-Defined Networks (SDN), often leading to policy or invariant violations. Unfortunately, concurrency violations are also notoriously difficult to avoid, detect and debug. This paper presents a novel approach and a tool, SDNRacer, for detecting concurrency violations of SDNs. Our approach is enabled by three key ingredients: (i) a precise happens- before model for SDNs that captures when events can happen concurrently; (ii) a set of sound, domain-specific filters that reduce reported violations by orders of magnitude, and; (iii) a sound and complete dynamic analyzer, based on the above, that can ensure the network is free of harmful errors such as data races and per-packet incoherence. We evaluated SDNRacer on several real-world OpenFlow controllers, running both reactive and proactive applications in large networks. We show that SDNRacer is practically effective: it quickly pinpoints harmful concurrency violations without overwhelming the user with false positives.
Publisher's Version Article Search aec-badge-pldi

Verifying Systems

Rehearsal: A Configuration Verification Tool for Puppet
Rian Shambaugh, Aaron Weiss, and Arjun Guha
(University of Massachusetts at Amherst, USA)
Large-scale data centers and cloud computing have turned system configuration into a challenging problem. Several widely-publicized outages have been blamed not on software bugs, but on configuration bugs. To cope, thousands of organizations use system configuration languages to manage their computing infrastructure. Of these, Puppet is the most widely used with thousands of paying customers and many more open-source users. The heart of Puppet is a domain-specific language that describes the state of a system. Puppet already performs some basic static checks, but they only prevent a narrow range of errors. Furthermore, testing is ineffective because many errors are only triggered under specific machine states that are difficult to predict and reproduce. With several examples, we show that a key problem with Puppet is that configurations can be non-deterministic. This paper presents Rehearsal, a verification tool for Puppet configurations. Rehearsal implements a sound, complete, and scalable determinacy analysis for Puppet. To develop it, we (1) present a formal semantics for Puppet, (2) use several analyses to shrink our models to a tractable size, and (3) frame determinism-checking as decidable formulas for an SMT solver. Rehearsal then leverages the determinacy analysis to check other important properties, such as idempotency. Finally, we apply Rehearsal to several real-world Puppet configurations.
Publisher's Version Article Search aec-badge-pldi
Toward Compositional Verification of Interruptible OS Kernels and Device Drivers
Hao Chen, Xiongnan (Newman) Wu, Zhong Shao, Joshua Lockerman, and Ronghui Gu
(Yale University, USA)
An operating system (OS) kernel forms the lowest level of any system software stack. The correctness of the OS kernel is the basis for the correctness of the entire system. Recent efforts have demonstrated the feasibility of building formally verified general-purpose kernels, but it is unclear how to extend their work to verify the functional correctness of device drivers, due to the non-local effects of interrupts. In this paper, we present a novel compositional framework for building certified interruptible OS kernels with device drivers. We provide a general device model that can be instantiated with various hardware devices, and a realistic formal model of interrupts, which can be used to reason about interruptible code. We have realized this framework in the Coq proof assistant. To demonstrate the effectiveness of our new approach, we have successfully extended an existing verified non-interruptible kernel with our framework and turned it into an interruptible kernel with verified device drivers. To the best of our knowledge, this is the first verified interruptible operating system with device drivers.
Publisher's Version Article Search
Verified Peephole Optimizations for CompCert
Eric Mullen, Daryl Zuniga, Zachary Tatlock, and Dan Grossman
(University of Washington, USA)
Transformations over assembly code are common in many compilers. These transformations are also some of the most bug-dense compiler components. Such bugs could be elim- inated by formally verifying the compiler, but state-of-the- art formally verified compilers like CompCert do not sup- port assembly-level program transformations. This paper presents Peek, a framework for expressing, verifying, and running meaning-preserving assembly-level program trans- formations in CompCert. Peek contributes four new com- ponents: a lower level semantics for CompCert x86 syntax, a liveness analysis, a library for expressing and verifying peephole optimizations, and a verified peephole optimiza- tion pass built into CompCert. Each of these is accompanied by a correctness proof in Coq against realistic assumptions about the calling convention and the system memory alloca- tor. Verifying peephole optimizations in Peek requires prov- ing only a set of local properties, which we have proved are sufficient to ensure global transformation correctness. We have proven these local properties for 28 peephole transfor- mations from the literature. We discuss the development of our new assembly semantics, liveness analysis, representa- tion of program transformations, and execution engine; de- scribe the verification challenges of each component; and detail techniques we applied to mitigate the proof burden.
Publisher's Version Article Search

Types II

Just-in-Time Static Type Checking for Dynamic Languages
Brianna M. Ren and Jeffrey S. Foster
(University of Maryland at College Park, USA)
Dynamic languages such as Ruby, Python, and JavaScript have many compelling benefits, but the lack of static types means subtle errors can remain latent in code for a long time. While many researchers have developed various systems to bring some of the benefits of static types to dynamic languages, prior approaches have trouble dealing with metaprogramming, which generates code as the program executes. In this paper, we propose Hummingbird, a new system that uses a novel technique, just-in-time static type checking, to type check Ruby code even in the presence of metaprogramming. In Hummingbird, method type signatures are gathered dynamically at run-time, as those methods are created. When a method is called, Hummingbird statically type checks the method body against current type signatures. Thus, Hummingbird provides thorough static checks on a per-method basis, while also allowing arbitrarily complex metaprogramming. For performance, Hummingbird memoizes the static type checking pass, invalidating cached checks only if necessary. We formalize Hummingbird using a core, Ruby-like language and prove it sound. To evaluate Hummingbird, we applied it to six apps, including three that use Ruby on Rails, a powerful framework that relies heavily on metaprogramming. We found that all apps typecheck successfully using Hummingbird, and that Hummingbird's performance overhead is reasonable. We applied Hummingbird to earlier versions of one Rails app and found several type errors that had been introduced and then fixed. Lastly, we demonstrate using Hummingbird in Rails development mode to typecheck an app as live updates are applied to it.
Publisher's Version Article Search aec-badge-pldi
Types from Data: Making Structured Data First-Class Citizens in F#
Tomas Petricek, Gustavo Guerra, and Don Syme
(University of Cambridge, UK; Microsoft, UK; Microsoft Research, UK)
Most modern applications interact with external services and access data in structured formats such as XML, JSON and CSV. Static type systems do not understand such formats, often making data access more cumbersome. Should we give up and leave the messy world of external data to dynamic typing and runtime checks? Of course, not! We present F# Data, a library that integrates external structured data into F#. As most real-world data does not come with an explicit schema, we develop a shape inference algorithm that infers a shape from representative sample documents. We then integrate the inferred shape into the F# type system using type providers. We formalize the process and prove a relative type soundness theorem. Our library significantly reduces the amount of data access code and it provides additional safety guarantees when contrasted with the widely used weakly typed techniques.
Publisher's Version Article Search
Automatically Learning Shape Specifications
He Zhu, Gustavo Petri, and Suresh Jagannathan
(Purdue University, USA; University of Paris Diderot, France)
This paper presents a novel automated procedure for discovering expressive shape specifications for sophisticated functional data structures. Our approach extracts potential shape predicates based on the definition of constructors of arbitrary user-defined inductive data types, and combines these predicates within an expressive first-order specification language using a lightweight data-driven learning procedure. Notably, this technique requires no programmer annotations, and is equipped with a type-based decision procedure to verify the correctness of discovered specifications. Experimental results indicate that our implementation is both efficient and effective, capable of automatically synthesizing sophisticated shape specifications over a range of complex data types, going well beyond the scope of existing solutions.
Publisher's Version Article Search aec-badge-pldi

Synthesis II

Synthesizing Transformations on Hierarchically Structured Data
Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri
(University of Texas at Austin, USA; University of Freiburg, Germany; Rice University, USA)
This paper presents a new approach for synthesizing transformations on tree-structured data, such as Unix directories and XML documents. We consider a general abstraction for such data, called hierarchical data trees (HDTs) and present a novel example-driven synthesis algorithm for HDT transformations. Our central insight is to reduce the problem of synthesizing tree transformers to the synthesis of list transformations that are applied to the paths of the tree. The synthesis problem over lists is solved using a new algorithm that combines SMT solving and decision tree learning. We have implemented our technique in a system called HADES and show that HADES can automatically synthesize a variety of interesting transformations collected from online forums.
Publisher's Version Article Search
Program Synthesis from Polymorphic Refinement Types
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama
(Massachusetts Institute of Technology, USA)
We present a method for synthesizing recursive functions that provably satisfy a given specification in the form of a polymorphic refinement type. We observe that such specifications are particularly suitable for program synthesis for two reasons. First, they offer a unique combination of expressive power and decidability, which enables automatic verification—and hence synthesis—of nontrivial programs. Second, a type-based specification for a program can often be effectively decomposed into independent specifications for its components, causing the synthesizer to consider fewer component combinations and leading to a combinatorial reduction in the size of the search space. At the core of our synthesis procedure is a newalgorithm for refinement type checking, which supports specification decomposition. We have evaluated our prototype implementation on a large set of synthesis problems and found that it exceeds the state of the art in terms of both scalability and usability. The tool was able to synthesize more complex programs than those reported in prior work (several sorting algorithms and operations on balanced search trees), as well as most of the benchmarks tackled by existing synthesizers, often starting from a more concise and intuitive user input.
Publisher's Version Article Search aec-badge-pldi

Parallelism I

Higher-Order and Tuple-Based Massively-Parallel Prefix Sums
Sepideh Maleki, Annie Yang, and Martin Burtscher
(Texas State University, USA)
Prefix sums are an important parallel primitive, especially in massively-parallel programs. This paper discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a GPU-friendly algorithm for computing prefix sums and other scans that directly supports higher orders and tuple values. Its templated CUDA implementation unifies all of these computations in a single 100-statement kernel. SAM is communication-efficient in the sense that it minimizes main-memory accesses. When computing prefix sums of a million or more values, it outperforms Thrust and CUDPP on both a Titan X and a K40 GPU. On the Titan X, SAM reaches memory-copy speeds for large input sizes, which cannot be surpassed. SAM outperforms CUB, the currently fastest conventional prefix sum implementation, by up to a factor of 2.9 on eighth-order prefix sums and by up to a factor of 2.6 on eight-tuple prefix sums.
Publisher's Version Article Search Info aec-badge-pldi
A Distributed OpenCL Framework using Redundant Computation and Data Replication
Junghyun Kim, Gangwon Jo, Jaehoon Jung, Jungwon Kim, and Jaejin Lee
(Seoul National University, South Korea)
Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.
Publisher's Version Article Search

Memory Management

Idle Time Garbage Collection Scheduling
Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer
(Google, Germany; Google, USA; Google, UK)
Efficient garbage collection is increasingly important in today's managed language runtime systems that demand low latency, low memory consumption, and high throughput. Garbage collection may pause the application for many milliseconds to identify live memory, free unused memory, and compact fragmented regions of memory, even when employing concurrent garbage collection. In animation-based applications that require 60 frames per second, these pause times may be observable, degrading user experience. This paper introduces idle time garbage collection scheduling to increase the responsiveness of applications by hiding expensive garbage collection operations inside of small, otherwise unused idle portions of the application's execution, resulting in smoother animations. Additionally we take advantage of idleness to reduce memory consumption while allowing higher memory use when high throughput is required. We implemented idle time garbage collection scheduling in V8, an open-source, production JavaScript virtual machine running within Chrome. We present performance results on various benchmarks running popular webpages and show that idle time garbage collection scheduling can significantly improve latency and memory consumption. Furthermore, we introduce a new metric called frame time discrepancy to quantify the quality of the user experience and precisely measure the improvements that idle time garbage collection provides for a WebGL-based game benchmark. Idle time garbage collection is shipped and enabled by default in Chrome.
Publisher's Version Article Search aec-badge-pldi
Assessing the Limits of Program-Specific Garbage Collection Performance
Nicholas Jacek, Meng-Chieh Chiu, Benjamin Marlin, and Eliot Moss
(University of Massachusetts at Amherst, USA)
We consider the ultimate limits of program-specific garbage collector performance for real programs. We first characterize the GC schedule optimization problem using Markov Decision Processes (MDPs). Based on this characterization, we develop a method of determining, for a given program run and heap size, an optimal schedule of collections for a non-generational collector. We further explore the limits of performance of a generational collector, where it is not feasible to search the space of schedules to prove optimality. Still, we show significant improvements with Least Squares Policy Iteration, a reinforcement learning technique for solving MDPs. We demonstrate that there is considerable promise to reduce garbage collection costs by developing program-specific collection policies.
Publisher's Version Article Search

Verification II

Cardinalities and Universal Quantifiers for Verifying Parameterized Systems
Klaus v. Gleissenthall, Nikolaj Bjørner, and Andrey Rybalchenko
(TU Munich, Germany; University of California at San Diego, USA; Microsoft Research, USA; Microsoft Research, UK)
Parallel and distributed systems rely on intricate protocols to manage shared resources and synchronize, i.e., to manage how many processes are in a particular state. Effective verification of such systems requires universally quantification to reason about parameterized state and cardinalities tracking sets of processes, messages, failures to adequately capture protocol logic. In this paper we present Tool, an automatic invariant synthesis method that integrates cardinality-based reasoning and universal quantification. The resulting increase of expressiveness allows Tool to verify, for the first time, a representative collection of intricate parameterized protocols.
Publisher's Version Article Search
Ivy: Safety Verification by Interactive Generalization
Oded Padon, Kenneth L. McMillan, Aurojit Panda, Mooly Sagiv, and Sharon Shoham
(Tel Aviv University, Israel; Microsoft Research, USA; University of California at Berkeley, USA)
Despite several decades of research, the problem of formal verification of infinite-state systems has resisted effective automation. We describe a system --- Ivy --- for interactively verifying safety of infinite-state systems. Ivy's key principle is that whenever verification fails, Ivy graphically displays a concrete counterexample to induction. The user then interactively guides generalization from this counterexample. This process continues until an inductive invariant is found. Ivy searches for universally quantified invariants, and uses a restricted modeling language. This ensures that all verification conditions can be checked algorithmically. All user interactions are performed using graphical models, easing the user's task. We describe our initial experience with verifying several distributed protocols.
Publisher's Version Article Search aec-badge-pldi

Security

Precise, Dynamic Information Flow for Database-Backed Applications
Jean Yang, Travis Hance, Thomas H. Austin, Armando Solar-Lezama, Cormac Flanagan, and Stephen Chong
(Carnegie Mellon University, USA; Harvard Medical School, USA; Dropbox, USA; San Jose State University, USA; Massachusetts Institute of Technology, USA; University of California at Santa Cruz, USA; Harvard University, USA)

We present an approach for dynamic information flow control across the application and database. Our approach reduces the amount of policy code required, yields formal guarantees across the application and database, works with existing relational database implementations, and scales for realistic applications. In this paper, we present a programming model that factors out information flow policies from application code and database queries, a dynamic semantics for the underlying JDB core language, and proofs of termination-insensitive non-interference and policy compliance for the semantics. We implement these ideas in Jacqueline, a Python web framework, and demonstrate feasibility through three application case studies: a course manager, a health record system, and a conference management system used to run an academic workshop. We show that in comparison to traditional applications with hand-coded policy checks, Jacqueline applications have 1) a smaller trusted computing base, 2) fewer lines of policy code, and 2) reasonable, often negligible, additional overheads.


Publisher's Version Article Search aec-badge-pldi
End-to-End Verification of Information-Flow Security for C and Assembly Programs
David Costanzo, Zhong Shao, and Ronghui Gu
(Yale University, USA)
Protecting the confidentiality of information manipulated by a computing system is one of the most important challenges facing today's cybersecurity community. A promising step toward conquering this challenge is to formally verify that the end-to-end behavior of the computing system really satisfies various information-flow policies. Unfortunately, because today's system software still consists of both C and assembly programs, the end-to-end verification necessarily requires that we not only prove the security properties of individual components, but also carefully preserve these properties through compilation and cross-language linking. In this paper, we present a novel methodology for formally verifying end-to-end security of a software system that consists of both C and assembly programs. We introduce a general definition of observation function that unifies the concepts of policy specification, state indistinguishability, and whole-execution behaviors. We show how to use different observation functions for different levels of abstraction, and how to link different security proofs across abstraction levels using a special kind of simulation that is guaranteed to preserve state indistinguishability. To demonstrate the effectiveness of our new methodology, we have successfully constructed an end-to-end security proof, fully formalized in the Coq proof assistant, of a nontrivial operating system kernel (running on an extended CompCert x86 assembly machine model). Some parts of the kernel are written in C and some are written in assembly; we verify all of the code, regardless of language.
Publisher's Version Article Search aec-badge-pldi
A Design and Verification Methodology for Secure Isolated Regions
Rohit Sinha, Manuel Costa, Akash Lal, Nuno P. Lopes, Sriram Rajamani, Sanjit A. Seshia, and Kapil Vaswani
(University of California at Berkeley, USA; Microsoft Research, UK; Microsoft Research, India)
Hardware support for isolated execution (such as Intel SGX) enables development of applications that keep their code and data confidential even while running in a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, (b) automatic verification of the application's machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime's internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.
Publisher's Version Article Search aec-badge-pldi

Parallelism II

Transactional Data Structure Libraries
Alexander Spiegelman, Guy Golan-Gueta, and Idit Keidar
(Technion, Israel; Yahoo Research, Israel)
We introduce transactions into libraries of concurrent data structures; such transactions can be used to ensure atomicity of sequences of data structure operations. By focusing on transactional access to a well-defined set of data structure operations, we strike a balance between the ease-of-programming of transactions and the efficiency of custom-tailored data structures. We exemplify this concept by designing and implementing a library supporting transactions on any number of maps, sets (implemented as skiplists), and queues. Our library offers efficient and scalable transactions, which are an order of magnitude faster than state-of-the-art transactional memory toolkits. Moreover, our approach treats stand-alone data structure operations (like put and enqueue) as first class citizens, and allows them to execute with virtually no overhead, at the speed of the original data structure library.
Publisher's Version Article Search
FlexVec: Auto-Vectorization for Irregular Loops
Sara S. Baghsorkhi, Nalini Vasudevan, and Youfeng Wu
(Intel, USA; Google, USA)
Traditional vectorization techniques build a dependence graph with distance and direction information to determine whether a loop is vectorizable. Since vectorization reorders the execution of instructions across iterations, in general instructions involved in a strongly connected component (SCC) are deemed not vectorizable unless the SCC can be eliminated using techniques such as scalar expansion or privatization. Therefore, traditional vectorization techniques are limited in their ability to efficiently handle loops with dynamic cross-iteration dependencies or complex control flow interweaved within the dependence cycles. When potential dependencies do not occur very often, the end-result is under utilization of the SIMD hardware. In this paper, we propose FlexVec architecture that combines new vector instructions with novel code generation techniques to dynamically adjusts vector length for loop statements affected by cross-iteration dependencies that happen at runtime. We have designed and implemented FlexVec's new ISA as extensions to the recently released AVX-512 ISA. We have evaluated the performance improvements enabled by FlexVec vectorization for 11 C/C++ SPEC 2006 benchmarks and 7 real applications with AVX-512 vectorization as baseline. We show that FlexVec vectorization technique produces a Geomean speedup of 9% for SPEC 2006 and a Geomean speedup of 11% for 7 real applications.
Publisher's Version Article Search
Verified Lifting of Stencil Computations
Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, and Armando Solar-Lezama
(Adobe, USA; University of Washington, USA; Massachusetts Institute of Technology, USA)
This paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and leverages counter-example guided inductive synthesis (CEGIS) to find provably correct translations. Lifting existing code to a high-performance description language has a number of benefits, including maintainability and performance portability. For example, our experiments show that the lifted summaries can enable domain specific compilers to do a better job of parallelization as compared to an off-the-shelf compiler working on the original code, and can even support fully automatic migration to hardware accelerators such as GPUs. We have implemented verified lifting in a system called STNG and have evaluated it using microbenchmarks, mini-apps, and real-world applications. We demonstrate the benefits of verified lifting by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.
Publisher's Version Article Search

proc time: 0.33