38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017),
June 18–23, 2017,
Barcelona, Spain
Frontmatter
Message from the Chairs
It is our great pleasure to welcome you to the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). PLDI is the premier forum in the field of programming languages and programming systems research, covering the areas of design, implementation, theory, applications, and performance. PLDI welcomes outstanding research that clearly advances the field and has the potential to make a lasting contribution.
Compiler Optimizations
Cache Locality Optimization for Recursive Programs
Jonathan Lifflander and Sriram Krishnamoorthy
(Sandia National Laboratories, USA; Pacific Northwest National Laboratory, USA)
We present an approach to optimize the cache locality for recursive programs by dynamically splicing---recursively interleaving---the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.
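As a rough illustration of the splicing idea (an invented toy in Python, not the authors' Cilk-based system), the sketch below interleaves two recursive reductions over the same array so that both uses of each element happen close together, shrinking reuse distance compared to running the two traversals back to back:

    # Two independent recursive reductions over the same array.
    def tree_sum(a, lo, hi):
        if hi - lo == 1:
            return a[lo]
        mid = (lo + hi) // 2
        return tree_sum(a, lo, mid) + tree_sum(a, mid, hi)

    def tree_sum_squares(a, lo, hi):
        if hi - lo == 1:
            return a[lo] * a[lo]
        mid = (lo + hi) // 2
        return tree_sum_squares(a, lo, mid) + tree_sum_squares(a, mid, hi)

    # "Spliced" version: the two invocations are interleaved level by level,
    # so each a[lo] is touched by both computations while it is still in cache.
    def spliced(a, lo, hi):
        if hi - lo == 1:
            x = a[lo]
            return x, x * x
        mid = (lo + hi) // 2
        s1, q1 = spliced(a, lo, mid)
        s2, q2 = spliced(a, mid, hi)
        return s1 + s2, q1 + q2

    data = list(range(1, 9))
    assert spliced(data, 0, len(data)) == (tree_sum(data, 0, len(data)),
                                           tree_sum_squares(data, 0, len(data)))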
@InProceedings{PLDI17p1,
author = {Jonathan Lifflander and Sriram Krishnamoorthy},
title = {Cache Locality Optimization for Recursive Programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {1--16},
doi = {},
year = {2017},
}
Fusing Effectful Comprehensions
Olli Saarikivi,
Margus Veanes, Todd Mytkowicz, and
Madan Musuvathi
(Aalto University, Finland; Helsinki Institute for Information Technology, Finland; Microsoft Research, USA)
List comprehensions provide a powerful abstraction mechanism for expressing computations over ordered collections of data declaratively without having to use explicit iteration constructs. This paper puts forth effectful comprehensions as an elegant way to describe list comprehensions that incorporate loop-carried state. This is motivated by operations such as compression/decompression and serialization/deserialization that are common in log/data processing pipelines and require loop-carried state when processing an input stream of data.
We build on the underlying theory of symbolic transducers to fuse pipelines of effectful comprehensions into a single representation, from which efficient code can be generated. Using background theory reasoning with an SMT solver, our fusion and subsequent reachability-based branch elimination algorithms can significantly reduce the complexity of the fused pipelines. Our implementation shows significant speedups over reasonable hand-written code (3.4×, on average) and a traditionally fused version of the pipeline (2.6×, on average) for a variety of examples, including scenarios for extracting fields with regular expressions, processing XML with XPath, and running queries over encoded data.
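As a hedged, hand-written analogue of what the transducer-based fusion automates (the two stages below are invented for illustration and are not from the paper), consider two list-processing stages that each carry loop state, and their manually fused single-pass equivalent:

    # Stage 1: delta-decoding, with the previous value as loop-carried state.
    def delta_decode(xs):
        prev = 0
        for d in xs:
            prev += d
            yield prev

    # Stage 2: running maximum, again with loop-carried state.
    def running_max(xs):
        best = float("-inf")
        for x in xs:
            best = max(best, x)
            yield best

    # Fused pipeline: one loop body carries both pieces of state.
    def fused(xs):
        prev, best = 0, float("-inf")
        for d in xs:
            prev += d
            best = max(best, prev)
            yield best

    data = [3, -1, 4, -1, 5]
    assert list(fused(data)) == list(running_max(delta_decode(data)))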
@InProceedings{PLDI17p17,
author = {Olli Saarikivi and Margus Veanes and Todd Mytkowicz and Madan Musuvathi},
title = {Fusing Effectful Comprehensions},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {17--32},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Generalizations of the Theory and Deployment of Triangular Inequality for Compiler-Based Strength Reduction
Yufei Ding, Lin Ning,
Hui Guan, and
Xipeng Shen
(North Carolina State University, USA)
Triangular Inequality (TI) has been used in many manual algorithm designs to achieve good efficiency in solving some distance calculation-based problems. This paper presents our generalization of the idea into a compiler optimization technique, named TI-based strength reduction. The generalization consists of three parts. The first is the establishment of the theoretical foundation of this new optimization via the development of a new form of TI named Angular Triangular Inequality, along with several fundamental theorems. The second is the revealing of the properties of the new forms of TI and the proposal of guided TI adaptation, a systematic method to address the difficulties in effective deployments of TI optimizations. The third is an integration of the new optimization technique in an open-source compiler. Experiments on a set of data mining and machine learning algorithms show that the new technique can speed up the standard implementations by as much as 134X and by 46X on average for distance-related problems, outperforming previous TI-based optimizations by 2.35X on average. It also extends the applicability of TI-based optimizations to vector-related problems, producing speedups of tens of times.
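To make the classical idea concrete (a toy Python sketch, not the paper's generalized compiler transformation; the centers and query point are invented), the nearest-center search below skips an exact distance computation whenever the standard triangle inequality already proves a candidate cannot win:

    import math

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def nearest_center(x, centers, center_dist):
        # center_dist[i][j] = dist(centers[i], centers[j]), precomputed once.
        best, best_d = 0, dist(x, centers[0])
        for i in range(1, len(centers)):
            # Triangle inequality: dist(x, c_i) >= center_dist[best][i] - best_d,
            # so if center_dist[best][i] >= 2 * best_d, c_i cannot be closer.
            if center_dist[best][i] >= 2 * best_d:
                continue                      # pruned: no exact distance needed
            d = dist(x, centers[i])
            if d < best_d:
                best, best_d = i, d
        return best

    centers = [(0.0, 0.0), (10.0, 0.0), (0.2, 0.1)]
    cd = [[dist(a, b) for b in centers] for a in centers]
    print(nearest_center((0.1, 0.0), centers, cd))  # -> 0; the others are pruned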
@InProceedings{PLDI17p33,
author = {Yufei Ding and Lin Ning and Hui Guan and Xipeng Shen},
title = {Generalizations of the Theory and Deployment of Triangular Inequality for Compiler-Based Strength Reduction},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {33--48},
doi = {},
year = {2017},
}
Alive-Infer: Data-Driven Precondition Inference for Peephole Optimizations in LLVM
David Menendez and
Santosh Nagarakatte
(Rutgers University, USA)
Peephole optimizations are a common source of compiler bugs. Compiler developers typically transform an incorrect peephole optimization into a valid one by strengthening the precondition. This process is challenging and tedious. This paper proposes Alive-Infer, a data-driven approach that infers preconditions for peephole optimizations expressed in Alive. Alive-Infer generates positive and negative examples for an optimization, enumerates predicates on-demand, and learns a set of predicates that separate the positive and negative examples. Alive-Infer repeats this process until it finds a precondition that ensures the validity of the optimization. Alive-Infer reports both a weakest precondition and a set of succinct partial preconditions to the developer. Our prototype generates preconditions that are weaker than LLVM’s preconditions for 73 optimizations in the Alive suite. We also demonstrate the applicability of this technique to generalize 54 optimization patterns generated by Souper, an LLVM IR–based superoptimizer.
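The core data-driven loop can be illustrated with a deliberately small, invented example (plain Python, not Alive syntax, and not the authors' predicate language): sample constants for a rewrite, label them valid or invalid by exhaustive checking at a small bit width, and keep the pool predicates that separate the two sets.

    # Toy rewrite: x * C  ==>  x << k  with  k = C.bit_length() - 1.
    W = 8
    MASK = (1 << W) - 1

    def valid(C):
        k = max(C.bit_length() - 1, 0)
        return all(((x * C) & MASK) == ((x << k) & MASK) for x in range(1 << W))

    predicates = {                      # invented predicate pool
        "isPowerOf2(C)": lambda C: C != 0 and (C & (C - 1)) == 0,
        "C > 1":         lambda C: C > 1,
        "C is even":     lambda C: C % 2 == 0,
    }

    pos = [C for C in range(1, 1 << W) if valid(C)]       # positive examples
    neg = [C for C in range(1, 1 << W) if not valid(C)]   # negative examples

    learned = [name for name, p in predicates.items()
               if all(p(C) for C in pos) and not any(p(C) for C in neg)]
    print("candidate preconditions:", learned)   # -> ['isPowerOf2(C)']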
@InProceedings{PLDI17p49,
author = {David Menendez and Santosh Nagarakatte},
title = {Alive-Infer: Data-Driven Precondition Inference for Peephole Optimizations in LLVM},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {49--63},
doi = {},
year = {2017},
}
Artifacts Functional
Learning and Probabilistic
DemoMatch: API Discovery from Demonstrations
Kuat Yessenov, Ivan Kuraj, and
Armando Solar-Lezama
(Massachusetts Institute of Technology, USA)
We introduce DemoMatch, a tool for API discovery that allows the user to discover how to implement functionality using a software framework by demonstrating the functionality in existing applications built with the same framework. DemoMatch matches the demonstrations against a database of execution traces called Semeru and generates code snippets explaining how to use the functionality. We evaluated DemoMatch on several case studies involving Java Swing and Eclipse RCP.
@InProceedings{PLDI17p64,
author = {Kuat Yessenov and Ivan Kuraj and Armando Solar-Lezama},
title = {DemoMatch: API Discovery from Demonstrations},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {64--78},
doi = {},
year = {2017},
}
Video
Artifacts Functional
Similarity of Binaries through re-Optimization
Yaniv David, Nimrod Partush, and Eran Yahav
(Technion, Israel)
We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge, while avoiding false positives, is invaluable to the process of reverse engineering and the process of locating vulnerable code.
We present a technique that is scalable and precise, as it alleviates the need for heavyweight semantic comparison by performing out-of-context re-optimization of procedure fragments. It works by decomposing binary procedures to comparable fragments and transforming them to a canonical, normalized form using the compiler optimizer, which enables finding equivalent fragments through simple syntactic comparison. We use a statistical framework built by analyzing samples collected “in the wild” to generate a global context that quantifies the significance of each pair of fragments, and uses it to lift pairwise fragment equivalence to whole procedure similarity.
We have implemented our technique in a tool called GitZ and performed an extensive evaluation. We show that GitZ is able to perform millions of comparisons efficiently, and find similarity with high accuracy.
@InProceedings{PLDI17p79,
author = {Yaniv David and Nimrod Partush and Eran Yahav},
title = {Similarity of Binaries through re-Optimization},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {79--94},
doi = {},
year = {2017},
}
Synthesizing Program Input Grammars
Osbert Bastani,
Rahul Sharma,
Alex Aiken, and Percy Liang
(Stanford University, USA; Microsoft Research, India)
We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fuzz test programs with structured inputs. We show that GLADE substantially increases the incremental coverage on valid inputs compared to two baseline fuzzers.
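As a hedged illustration of why a synthesized grammar is useful for fuzzing (the toy grammar below is invented and unrelated to GLADE's internal representation), sampling random derivations from a context-free grammar yields structured inputs that byte-level random fuzzing is unlikely to produce:

    import random

    grammar = {                       # toy grammar for nested arithmetic expressions
        "E": [["E", "+", "T"], ["T"]],
        "T": [["(", "E", ")"], ["n"]],
    }

    def sample(symbol, depth=0):
        if symbol not in grammar:
            return symbol
        # Past a depth bound, only the last (shortest) rule is used, so every
        # derivation terminates.
        rules = grammar[symbol] if depth < 6 else [grammar[symbol][-1]]
        return "".join(sample(s, depth + 1) for s in random.choice(rules))

    print([sample("E") for _ in range(3)])   # e.g. ['n+(n)', 'n', '(n+n)+n']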
@InProceedings{PLDI17p95,
author = {Osbert Bastani and Rahul Sharma and Alex Aiken and Percy Liang},
title = {Synthesizing Program Input Grammars},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {95--110},
doi = {},
year = {2017},
}
Compiling Markov Chain Monte Carlo Algorithms for Probabilistic Modeling
Daniel Huang, Jean-Baptiste Tristan, and
Greg Morrisett
(Harvard University, USA; Oracle Labs, USA; Cornell University, USA)
The problem of probabilistic modeling and inference, at a high-level, can be viewed as constructing a (model, query, inference) tuple, where an inference algorithm implements a query on a model. Notably, the derivation of inference algorithms can be a difficult and error-prone task. Hence, researchers have explored how ideas from probabilistic programming can be applied. In the context of constructing these tuples, probabilistic programming can be seen as taking a language-based approach to probabilistic modeling and inference. For instance, by using (1) appropriate languages for expressing models and queries and (2) devising inference techniques that operate on encodings of models (and queries) as program expressions, the task of inference can be automated.
In this paper, we describe a compiler that transforms a probabilistic model written in a restricted modeling language and a query for posterior samples given observed data into a Markov Chain Monte Carlo (MCMC) inference algorithm that implements the query. The compiler uses a sequence of intermediate languages (ILs) that guide it in gradually and successively refining a declarative specification of a probabilistic model and the query into an executable MCMC inference algorithm. The compilation strategy produces composable MCMC algorithms for execution on a CPU or GPU.
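To ground the (model, query, inference) framing, here is a hedged, hand-written sketch of the kind of sampler such a compiler derives automatically; the model, data, constants, and proposal below are all invented for illustration:

    import math, random

    data = [2.1, 1.9, 2.3, 2.0, 2.2]

    def log_joint(mu):
        # prior: mu ~ Normal(0, 10);  likelihood: x_i ~ Normal(mu, 1)
        lp = -0.5 * (mu / 10.0) ** 2
        lp += sum(-0.5 * (x - mu) ** 2 for x in data)
        return lp

    def posterior_samples(n, step=0.5):
        mu, samples = 0.0, []
        for _ in range(n):
            cand = mu + random.gauss(0.0, step)          # symmetric proposal
            accept = math.exp(min(0.0, log_joint(cand) - log_joint(mu)))
            if random.random() < accept:
                mu = cand
            samples.append(mu)
        return samples

    samples = posterior_samples(5000)[1000:]             # drop burn-in
    print(sum(samples) / len(samples))                   # close to the data mean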
@InProceedings{PLDI17p111,
author = {Daniel Huang and Jean-Baptiste Tristan and Greg Morrisett},
title = {Compiling Markov Chain Monte Carlo Algorithms for Probabilistic Modeling},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {111--125},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Concurrency Analysis
BARRACUDA: Binary-level Analysis of Runtime RAces in CUDA programs
Ariel Eizenberg, Yuanfeng Peng, Toma Pigli, William Mansky, and Joseph Devietti
(University of Pennsylvania, USA; Princeton University, USA)
GPU programming models enable and encourage massively parallel programming with over a million threads, requiring extreme parallelism to achieve good performance. Massive parallelism brings significant correctness challenges by increasing the possibility for bugs as the number of thread interleavings balloons. Conventional dynamic safety analyses struggle to run at this scale.
We present BARRACUDA, a concurrency bug detector for GPU programs written in Nvidia’s CUDA language. BARRACUDA handles a wider range of parallelism constructs than previous work, including branch operations, low-level atomics and memory fences, which allows BARRACUDA to detect new classes of concurrency bugs. BARRACUDA operates at the binary level for increased compatibility with existing code, leveraging a new binary instrumentation framework that is extensible to other dynamic analyses. BARRACUDA incorporates a number of novel optimizations that are crucial for scaling concurrency bug detection to over a million threads.
@InProceedings{PLDI17p126,
author = {Ariel Eizenberg and Yuanfeng Peng and Toma Pigli and William Mansky and Joseph Devietti},
title = {BARRACUDA: Binary-level Analysis of Runtime RAces in CUDA programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {126--140},
doi = {},
year = {2017},
}
BigFoot: Static Check Placement for Dynamic Race Detection
Dustin Rhodes, Cormac Flanagan, and Stephen N. Freund
(University of California at Santa Cruz, USA; Williams College, USA)
Precise dynamic data race detectors provide strong correctness guarantees but have high overheads because they generally keep analysis state in a separate shadow location for each heap memory location, and they check (and potentially update) the corresponding shadow location on each heap access. The BigFoot dynamic data race detector uses a combination of static and dynamic analysis techniques to coalesce checks and compress shadow locations. With BigFoot, multiple accesses to an object or array often induce a single coalesced check that manipulates a single compressed shadow location, resulting in a performance improvement over FastTrack of 61%.
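The effect of check coalescing can be sketched with a toy counter that stands in for the per-access shadow-location work (an invented illustration, not BigFoot's instrumentation or its static analysis):

    class ShadowChecker:
        """Stand-in for a FastTrack-style analysis; it only counts checks."""
        def __init__(self):
            self.checks = 0
        def check(self, obj, fields):
            self.checks += 1

    def update_per_access(p, chk):
        chk.check(p, {"x"}); p["x"] += 1
        chk.check(p, {"y"}); p["y"] += 1
        chk.check(p, {"x"}); p["x"] *= 2       # three separate checks

    def update_coalesced(p, chk):
        chk.check(p, {"x", "y"})               # one statically placed check
        p["x"] += 1; p["y"] += 1; p["x"] *= 2  # covering the whole access block

    a, b = ShadowChecker(), ShadowChecker()
    update_per_access({"x": 0, "y": 0}, a)
    update_coalesced({"x": 0, "y": 0}, b)
    print(a.checks, b.checks)                  # 3 vs. 1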
@InProceedings{PLDI17p141,
author = {Dustin Rhodes and Cormac Flanagan and Stephen N. Freund},
title = {BigFoot: Static Check Placement for Dynamic Race Detection},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {141--156},
doi = {},
year = {2017},
}
Artifacts Functional
Dynamic Race Prediction in Linear Time
Dileep Kini,
Umang Mathur, and
Mahesh Viswanathan
(University of Illinois at Urbana-Champaign, USA)
Writing reliable concurrent software remains a huge challenge for today's programmers. Programmers rarely reason about their code by explicitly considering different possible interleavings of its execution. We consider the problem of detecting data races from individual executions in a sound manner. The classical approach to solving this problem has been to use Lamport's happens-before (HB) relation. Until now, HB remains the only approach that runs in linear time. Previous efforts to improve over HB, such as causally-precedes (CP) and maximal causal models, fall short because they cannot be implemented efficiently and hence must compromise their race-detection ability by limiting their techniques to bounded-size fragments of the execution. We present a new relation, weak-causally-precedes (WCP), that is provably better than CP in terms of being able to detect more races, while still remaining sound. Moreover, it admits a linear-time algorithm that works on the entire execution without having to fragment it.
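For background, the baseline the paper improves on can be written as a single vector-clock pass over a trace; the toy happens-before detector below (not the paper's WCP algorithm, and the trace is invented) shows the linear-time, one-pass shape that WCP preserves while predicting strictly more races:

    from collections import defaultdict

    def hb_races(trace):
        """trace: list of (thread, op, target), op in {"acq", "rel", "wr", "rd"}."""
        C = defaultdict(lambda: defaultdict(int))   # per-thread vector clocks
        L = {}                                      # lock -> clock at last release
        last = defaultdict(list)                    # var  -> prior accesses
        races = []
        for (t, op, x) in trace:
            C[t][t] += 1
            if op == "acq" and x in L:
                for u, c in L[x].items():           # join the releasing clock
                    C[t][u] = max(C[t][u], c)
            elif op == "rel":
                L[x] = dict(C[t])
            elif op in ("wr", "rd"):
                for (u, cu, prev) in last[x]:
                    conflicting = op == "wr" or prev == "wr"
                    if u != t and conflicting and cu > C[t][u]:
                        races.append((x, u, t))     # prior access is unordered
                last[x].append((t, C[t][t], op))
        return races

    ordered = [(0, "wr", "v"), (0, "acq", "m"), (0, "rel", "m"),
               (1, "acq", "m"), (1, "rel", "m"), (1, "wr", "v")]
    print(hb_races(ordered))                            # []: the lock orders the writes
    print(hb_races([(0, "wr", "v"), (1, "wr", "v")]))   # one unsynchronized race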
@InProceedings{PLDI17p157,
author = {Dileep Kini and Umang Mathur and Mahesh Viswanathan},
title = {Dynamic Race Prediction in Linear Time},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {157--170},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Systematic Black-Box Analysis of Collaborative Web Applications
Marina Billes,
Anders Møller, and
Michael Pradel
(TU Darmstadt, Germany; Aarhus University, Denmark)
Web applications, such as collaborative editors that allow multiple clients to concurrently interact on a shared resource, are difficult to implement correctly. Existing techniques for analyzing concurrent software do not scale to such complex systems or do not consider multiple interacting clients. This paper presents Simian, the first fully automated technique for systematically analyzing multi-client web applications.
Naively exploring all possible interactions between a set of clients of such applications is practically infeasible. Simian obtains scalability for real-world applications by using a two-phase black-box approach. The application code remains unknown to the analysis and is first explored systematically using a single client to infer potential conflicts between client events triggered in a specific context. The second phase synthesizes multi-client interactions targeted at triggering misbehavior that may result from the potential conflicts, and reports an inconsistency if the clients do not converge to a consistent state.
We evaluate the analysis on three widely used systems, Google Docs, Firepad, and ownCloud Documents, where it reports a variety of inconsistencies, such as incorrect formatting and misplaced text fragments. Moreover, we find that the two-phase approach runs 10x faster compared to exhaustive exploration, making systematic analysis practically applicable.
@InProceedings{PLDI17p171,
author = {Marina Billes and Anders Møller and Michael Pradel},
title = {Systematic Black-Box Analysis of Collaborative Web Applications},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {171--184},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Language Implementation
Bringing the Web up to Speed with WebAssembly
Andreas Haas,
Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien
(Google, Germany; Google, USA; Microsoft, USA; Mozilla, USA; Apple, USA)
The maturation of the Web platform has given rise to sophisticated and demanding Web applications such as interactive 3D visualization, audio and video software, and games.
With that, efficiency and security of code on the Web has become more important than ever.
Yet JavaScript as the only built-in language of the Web is not well-equipped to meet these requirements, especially as a compilation target.
Engineers from the four major browser vendors have risen to the challenge and collaboratively designed a portable low-level bytecode called WebAssembly. It offers compact representation, efficient validation and compilation, and safe low to no-overhead execution. Rather than committing to a specific programming model, WebAssembly is an abstraction over modern hardware, making it language-, hardware-, and platform-independent, with use cases beyond just the Web. WebAssembly has been designed with a formal semantics from the start. We describe the motivation, design and formal semantics of WebAssembly and provide some preliminary experience with implementations.
@InProceedings{PLDI17p185,
author = {Andreas Haas and Andreas Rossberg and Derek L. Schuff and Ben L. Titzer and Michael Holman and Dan Gohman and Luke Wagner and Alon Zakai and JF Bastien},
title = {Bringing the Web up to Speed with WebAssembly},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {185--200},
doi = {},
year = {2017},
}
Miniphases: Compilation using Modular and Efficient Tree Transformations
Dmitry Petrashko,
Ondřej Lhoták, and
Martin Odersky
(EPFL, Switzerland; University of Waterloo, Canada)
Production compilers commonly perform dozens of transformations on an intermediate representation. Running those transformations in separate passes harms performance. One approach to recover performance is to combine transformations by hand in order to reduce the number of passes. Such an approach harms modularity, and thus makes it hard to maintain and evolve a compiler over the long term, and makes reasoning about performance harder. This paper describes a methodology that allows a compiler writer to define multiple transformations separately, but fuse them into a single traversal of the intermediate representation when the compiler runs. This approach has been implemented in a compiler for the Scala language. Our performance evaluation indicates that this approach reduces the running time of tree transformations by 35% and shows that this is due to improved cache friendliness. At the same time, the approach improves total memory consumption by reducing the object tenuring rate by 50%. This approach enables compiler writers to write transformations that are both modular and fast at the same time.
@InProceedings{PLDI17p201,
author = {Dmitry Petrashko and Ondřej Lhoták and Martin Odersky},
title = {Miniphases: Compilation using Modular and Efficient Tree Transformations},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {201--216},
doi = {},
year = {2017},
}
Proactive and Adaptive Energy-Aware Programming with Mixed Typechecking
Anthony Canino and Yu David Liu
(SUNY Binghamton, USA)
Application-level energy management is an important dimension of energy optimization. In this paper, we introduce ENT, a novel programming language for enabling *proactive* and *adaptive* mode-based energy management at the application level. The proactive design allows programmers to apply their application knowledge to energy management, by characterizing the energy behavior of different program fragments with modes. The adaptive design allows such characterization to be delayed until run time, useful for capturing dynamic program behavior dependent on program states, configuration settings, external battery levels, or CPU temperatures. The key insight is that both proactiveness and adaptiveness can be unified under a type system that combines static and dynamic typing. ENT has been implemented as an extension to Java, and successfully ported to three energy-conscious platforms: an Intel-based laptop, a Raspberry Pi, and an Android phone. Evaluation shows that ENT improves the programmability, debuggability, and energy efficiency of battery-aware and temperature-aware programs.
@InProceedings{PLDI17p217,
author = {Anthony Canino and Yu David Liu},
title = {Proactive and Adaptive Energy-Aware Programming with Mixed Typechecking},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {217--232},
doi = {},
year = {2017},
}
Artifacts Functional
Simple, Fast, and Safe Manual Memory Management
Piyus Kedia, Manuel Costa,
Matthew Parkinson, Kapil Vaswani, Dimitrios Vytiniotis, and Aaron Blankstein
(Microsoft Research, India; Microsoft Research, UK; Princeton University, USA)
Safe programming languages are readily available, but many applications continue to be written in unsafe languages because of efficiency. As a consequence, many applications continue to have exploitable memory safety bugs. Since garbage collection is a major source of inefficiency in the implementation of safe languages, replacing it with safe manual memory management would be an important step towards solving this problem.
Previous approaches to safe manual memory management use programming models based on regions, unique pointers, borrowing of references, and ownership types. We propose a much simpler programming model that does not require any of these concepts. Starting from the design of an imperative type safe language (like Java or C#), we just add a delete operator to free memory explicitly and an exception which is thrown if the program dereferences a pointer to freed memory. We propose an efficient implementation of this programming model that guarantees type safety. Experimental results from our implementation based on the C# native compiler show that this design achieves up to 3x reduction in peak working set and run time.
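The programming model itself is easy to mimic in a toy setting; the Python sketch below is a hedged illustration only (the paper's contribution is making this fast and type-safe in a native runtime, which the sketch does not show):

    class Heap:
        """Toy heap: 'pointers' are handles, and freed handles fault on use."""
        def __init__(self):
            self.objects = {}
            self.next_id = 0
        def new(self, value):
            self.next_id += 1
            self.objects[self.next_id] = value
            return self.next_id                      # a "pointer"
        def delete(self, ref):                       # the explicit delete operator
            del self.objects[ref]
        def deref(self, ref):
            if ref not in self.objects:
                raise RuntimeError("dereference of freed memory")
            return self.objects[ref]

    h = Heap()
    p = h.new("payload")
    h.delete(p)
    try:
        h.deref(p)
    except RuntimeError as e:
        print(e)        # the dangling access raises instead of corrupting memory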
@InProceedings{PLDI17p233,
author = {Piyus Kedia and Manuel Costa and Matthew Parkinson and Kapil Vaswani and Dimitrios Vytiniotis and Aaron Blankstein},
title = {Simple, Fast, and Safe Manual Memory Management},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {233--247},
doi = {},
year = {2017},
}
Static Analysis
Compositional Recurrence Analysis Revisited
Zachary Kincaid, Jason Breck, Ashkan Forouhi Boroujeni, and
Thomas Reps
(Princeton University, USA; University of Wisconsin-Madison, USA; GrammaTech, USA)
Compositional recurrence analysis (CRA) is a static-analysis method based on a combination of symbolic analysis and abstract interpretation. This paper addresses the problem of creating a context-sensitive interprocedural version of CRA that handles recursive procedures. The problem is non-trivial because there is an “impedance mismatch” between CRA, which relies on analysis techniques based on regular languages (i.e., Tarjan’s path-expression method), and the context-free-language underpinnings of context-sensitive analysis.
We show how to address this impedance mismatch by augmenting the CRA abstract domain with additional operations. We call the resulting algorithm Interprocedural CRA (ICRA). Our experiments with ICRA show that it has broad overall strength compared with several state-of-the-art software model checkers.
@InProceedings{PLDI17p248,
author = {Zachary Kincaid and Jason Breck and Ashkan Forouhi Boroujeni and Thomas Reps},
title = {Compositional Recurrence Analysis Revisited},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {248--262},
doi = {},
year = {2017},
}
Artifacts Functional
Context Transformations for Pointer Analysis
Rei Thiessen and
Ondřej Lhoták
(University of Waterloo, Canada)
Points-to analysis for Java benefits greatly from context sensitivity. CFL-reachability and k-limited context strings are two approaches to obtaining context sensitivity with different advantages: CFL-reachability allows local reasoning about data-value flow and thus is suitable for demand-driven analyses, whereas k-limited analyses allow object sensitivity which is a superior calling context abstraction for object-oriented languages. We combine the advantages of both approaches to obtain a context-sensitive analysis that is as precise as k-limited context strings, but is more efficient to compute. Our key insight is based on a novel abstraction of contexts adapted from CFL-reachability that represents a relation between two calling contexts as a composition of transformations over contexts.
We formulate pointer analysis in an algebraic structure of context transformations, which is a set of functions over calling contexts closed under function composition. We show that the context representation of context-string-based analyses is an explicit enumeration of all input and output values of context transformations. CFL-reachability-based pointer analysis is formulated to use call-strings as contexts, but the context transformations concept can be applied to any context abstraction used in k-limited analyses, including object- and type-sensitive analysis. The result is a more efficient algorithm for computing context-sensitive results for a wide variety of context configurations.
@InProceedings{PLDI17p263,
author = {Rei Thiessen and Ondřej Lhoták},
title = {Context Transformations for Pointer Analysis},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {263--277},
doi = {},
year = {2017},
}
Efficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent Automata
Tian Tan, Yue Li, and
Jingling Xue
(UNSW, Australia)
Mainstream points-to analysis techniques for object-oriented languages rely predominantly on the allocation-site abstraction to model heap objects. We present MAHJONG, a novel heap abstraction that is specifically developed to address the needs of an important class of type-dependent clients, such as call graph construction, devirtualization and may-fail casting. By merging equivalent automata representing type-consistent objects that are created by the allocation-site abstraction, MAHJONG enables an allocation-site-based points-to analysis to run significantly faster while achieving nearly the same precision for type-dependent clients.
MAHJONG is conceptually simple, efficient, and drops easily into any allocation-site-based points-to analysis. We demonstrate its effectiveness by discussing some insights on why it is a better alternative to the allocation-site abstraction for type-dependent clients and by evaluating it extensively on 12 large real-world Java programs with five context-sensitive points-to analyses and three widely used type-dependent clients. MAHJONG is expected to provide significant benefits for many program analyses where call graphs are required.
@InProceedings{PLDI17p278,
author = {Tian Tan and Yue Li and Jingling Xue},
title = {Efficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent Automata},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {278--291},
doi = {},
year = {2017},
}
Artifacts Functional
Static Deadlock Detection for Asynchronous C# Programs
Anirudh Santhiar and Aditya Kanade
(Indian Institute of Science, India)
Asynchronous programming is a standard approach for designing responsive applications. Modern languages such as C# provide async/await primitives for the disciplined use of asynchrony. In spite of this, programs can deadlock because of incorrect use of blocking operations along with non-blocking (asynchronous) operations. While developers are aware of this problem, there is no automated technique to detect deadlocks in asynchronous programs.
We present a novel representation of control flow and scheduling of asynchronous programs, called the continuation scheduling graph, and formulate necessary conditions for a deadlock to occur in a program. We design static analyses to construct continuation scheduling graphs of asynchronous C# programs and to identify deadlocks in them.
We have implemented the static analyses in a tool called DeadWait. Using DeadWait, we found 43 previously unknown deadlocks in 11 asynchronous C# libraries. We reported the deadlocks to the library developers. They have confirmed and fixed 40 of them.
@InProceedings{PLDI17p292,
author = {Anirudh Santhiar and Aditya Kanade},
title = {Static Deadlock Detection for Asynchronous C# Programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {292--305},
doi = {},
year = {2017},
}
Info
Dynamic Analysis and Testing
Achieving High Coverage for Floating-Point Code via Unconstrained Programming
Zhoulai Fu and Zhendong Su
(University of California at Davis, USA)
Achieving high code coverage is essential in testing, which gives us confidence in code quality. Testing floating-point code usually requires painstaking efforts in handling floating-point constraints, e.g., in symbolic execution. This paper turns the challenge of testing floating-point code into the opportunity of applying unconstrained programming --- the mathematical solution for calculating function minimum points over the entire search space. Our core insight is to derive a representing function from the floating-point program, any of whose minimum points is a test input guaranteed to exercise a new branch of the tested program. This guarantee allows us to achieve high coverage of the floating-point program by repeatedly minimizing the representing function. We have realized this approach in a tool called CoverMe and conducted an extensive evaluation of it on Sun's C math library. Our evaluation results show that CoverMe achieves, on average, 90.8% branch coverage in 6.9 seconds, drastically outperforming the tools we compare against: (1) random testing, (2) AFL, a highly optimized, robust fuzzer released by Google, and (3) Austin, a state-of-the-art coverage-based testing tool designed to support floating-point code.
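A hedged, self-contained caricature of the idea follows; the branch, the representing function, and the crude optimizer are all invented for illustration and are far simpler than CoverMe's actual construction:

    def program(x):
        if x * x < 2.0:                   # the branch we still need to cover
            return "then"
        return "else"

    def representing(x):
        # Non-negative, and zero exactly on inputs that take the 'then' branch
        # (a small margin keeps this toy optimizer away from the boundary).
        return max(x * x - 1.99, 0.0)

    def minimize(f, x0, step=1.0, iters=200):
        x = x0                             # crude local search stands in for a
        for _ in range(iters):             # real unconstrained optimizer
            for cand in (x - step, x + step):
                if f(cand) < f(x):
                    x = cand
            step *= 0.9
        return x

    x_star = minimize(representing, x0=10.0)
    print(x_star, program(x_star))         # a minimum point covers the branch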
@InProceedings{PLDI17p306,
author = {Zhoulai Fu and Zhendong Su},
title = {Achieving High Coverage for Floating-Point Code via Unconstrained Programming},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {306--319},
doi = {},
year = {2017},
}
Instruction Punning: Lightweight Instrumentation for x86-64
Buddhika Chamith, Bo Joel Svensson, Luke Dalessandro, and Ryan R. Newton
(Indiana University, USA; Chalmers University of Technology, Sweden)
Existing techniques for injecting probes into running applications are limited; they either fail to support probing arbitrary locations, or to support scalable, rapid toggling of probes. We introduce a new technique on x86-64, called instruction punning, which allows scalable probes at any instruction. The key idea is that when we inject a jump instruction, the relative address of the jump serves simultaneously as data and as an instruction sequence. We show that this approach achieves probe invocation overheads of only a few dozen cycles, and probe activation/deactivation costs that are cheaper than a system call, even when all threads in the system are both invoking probes and toggling them.
@InProceedings{PLDI17p320,
author = {Buddhika Chamith and Bo Joel Svensson and Luke Dalessandro and Ryan R. Newton},
title = {Instruction Punning: Lightweight Instrumentation for x86-64},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {320--332},
doi = {},
year = {2017},
}
Low Overhead Dynamic Binary Translation on ARM
Amanieu d'Antras, Cosmin Gorgovan, Jim Garside, and
Mikel Luján
(University of Manchester, UK)
The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity.
We present MAMBO-X64, a dynamic binary translator for Linux which executes 32-bit ARM binaries using only the AArch64 instruction set. We have evaluated the performance of MAMBO-X64 on three existing ARMv8 processors which support both AArch32 and AArch64 instruction sets. The performance was measured by comparing the running time of 32-bit benchmarks running under MAMBO-X64 with the same benchmark running natively. On SPEC CPU2006, we achieve a geometric mean overhead of less than 7.5% on in-order Cortex-A53 processors and a performance improvement of 1% on out-of-order X-Gene 1 processors.
MAMBO-X64 achieves such low overhead by novel optimizations to map AArch32 floating-point registers to AArch64 registers dynamically, handle overflowing address calculations efficiently, generate traces that harness hardware return address prediction, and handle operating system signals accurately.
@InProceedings{PLDI17p333,
author = {Amanieu d'Antras and Cosmin Gorgovan and Jim Garside and Mikel Luján},
title = {Low Overhead Dynamic Binary Translation on ARM},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {333--346},
doi = {},
year = {2017},
}
Artifacts Functional
Skeletal Program Enumeration for Rigorous Compiler Testing
Qirun Zhang, Chengnian Sun, and Zhendong Su
(University of California at Davis, USA)
A program can be viewed as a syntactic structure P (syntactic skeleton) parameterized by a collection of identifiers V (variable names). This paper introduces the skeletal program enumeration (SPE) problem: Given a syntactic skeleton P and a set of variables V , enumerate a set of programs P exhibiting all possible variable usage patterns within P. It proposes an effective realization of SPE for systematic, rigorous compiler testing by leveraging three important observations: (1) Programs with different variable usage patterns exhibit diverse control- and data-dependence, and help exploit different compiler optimizations; (2) most real compiler bugs were revealed by small tests (i.e., small-sized P) — this “small-scope” observation opens up SPE for practical compiler validation; and (3) SPE is exhaustive w.r.t. a given syntactic skeleton and variable set, offering a level of guarantee absent from all existing compiler testing techniques.
The key challenge of SPE is how to eliminate the enormous amount of equivalent programs w.r.t. α-conversion. Our main technical contribution is a novel algorithm for computing the canonical (and smallest) set of all non-α-equivalent programs. To demonstrate its practical utility, we have applied the SPE technique to test C/C++ compilers using syntactic skeletons derived from their own regression test-suites. Our evaluation results are extremely encouraging. In less than six months, our approach has led to 217 confirmed GCC/Clang bug reports, 119 of which have already been fixed, and the majority are long latent despite extensive prior testing efforts. Our SPE algorithm also provides six orders of magnitude reduction. Moreover, in three weeks, our technique has found 29 CompCert crashing bugs and 42 bugs in two Scala optimizing compilers. These results demonstrate our SPE technique’s generality and further illustrate its effectiveness.
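The flavor of the canonicalization problem can be shown with a tiny invented skeleton (a toy Python enumerator, not the paper's algorithm): introducing variable names only in first-use order keeps exactly one representative per alpha-equivalence class.

    def canonical_fillings(num_holes, variables):
        # Restricted-growth enumeration: the i-th hole may reuse any variable
        # seen so far or introduce the next unused one, never skip ahead.
        def go(prefix, used):
            if len(prefix) == num_holes:
                yield prefix
                return
            for v in range(min(used + 1, len(variables))):
                yield from go(prefix + [variables[v]], max(used, v + 1))
        yield from go([], 0)

    skeleton = "{} = {} + {};"
    for filling in canonical_fillings(3, ["a", "b", "c"]):
        print(skeleton.format(*filling))
    # 5 canonical programs instead of 3**3 = 27 naming variants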
@InProceedings{PLDI17p347,
author = {Qirun Zhang and Chengnian Sun and Zhendong Su},
title = {Skeletal Program Enumeration for Rigorous Compiler Testing},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {347--361},
doi = {},
year = {2017},
}
Static Analysis and Security
Decomposition Instead of Self-Composition for Proving the Absence of Timing Channels
Timos Antonopoulos, Paul Gazzillo,
Michael Hicks, Eric Koskinen, Tachio Terauchi, and Shiyi Wei
(Yale University, USA; University of Maryland, USA; JAIST, Japan)
We present a novel approach to proving the absence of timing channels. The idea is to partition the program’s execution traces in such a way that each partition component is checked for timing attack resilience by a time complexity analysis and that per-component resilience implies the resilience of the whole program. We construct a partition by splitting the program traces at secret-independent branches. This ensures that any pair of traces with the same public input has a component containing both traces. Crucially, the per-component checks can be normal safety properties expressed in terms of a single execution. Our approach is thus in contrast to prior approaches, such as self-composition, that aim to reason about multiple (k≥ 2) executions at once.
We formalize the above as an approach called quotient partitioning, generalized to any k-safety property, and prove it to be sound. A key feature of our approach is a demand-driven partitioning strategy that uses a regex-like notion called trails to identify sets of execution traces, particularly those influenced by tainted (or secret) data. We have applied our technique in a prototype implementation tool called Blazer, based on WALA, PPL, and the brics automaton library. We have proved timing-channel freedom of (or synthesized an attack specification for) 24 programs written in Java bytecode, including 6 classic examples from the literature and 6 examples extracted from the DARPA STAC challenge problems.
@InProceedings{PLDI17p362,
author = {Timos Antonopoulos and Paul Gazzillo and Michael Hicks and Eric Koskinen and Tachio Terauchi and Shiyi Wei},
title = {Decomposition Instead of Self-Composition for Proving the Absence of Timing Channels},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {362--375},
doi = {},
year = {2017},
}
Artifacts Functional
Automatic Program Inversion using Symbolic Transducers
Qinheping Hu and
Loris D'Antoni
(University of Wisconsin-Madison, USA)
We propose a fully automated technique for inverting functional programs that operate over lists, such as string encoders and decoders. We consider programs that can be modeled using symbolic extended finite transducers (s-EFTs), an expressive model that can describe complex list-manipulating programs while retaining several decidable properties. Concretely, given a program P expressed as an s-EFT, we propose techniques for: (1) checking whether P is injective and, if that is the case, (2) building an s-EFT P−1 describing its inverse. We first show that it is undecidable to check whether an s-EFT is injective and propose an algorithm for checking injectivity for a restricted but practical class of s-EFTs. We then propose an algorithm for inverting s-EFTs based on the following idea: if an s-EFT is injective, inverting it amounts to inverting all its individual transitions. We leverage recent advances in program synthesis and show that the transition inversion problem can be expressed as an instance of the syntax-guided synthesis framework. Finally, we implement the proposed techniques in a tool and show that it can invert 13 out of 14 real complex string encoders and decoders, producing inverse programs that are substantially identical to manually written ones.
@InProceedings{PLDI17p376,
author = {Qinheping Hu and Loris D'Antoni},
title = {Automatic Program Inversion using Symbolic Transducers},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {376--389},
doi = {},
year = {2017},
}
Control-Flow Recovery from Partial Failure Reports
Peter Ohmann, Alexander Brooks,
Loris D'Antoni, and Ben Liblit
(University of Wisconsin-Madison, USA)
Debugging is difficult. When software fails in production, debugging is even harder, as failure reports usually provide only an incomplete picture of the failing execution. We present a system that answers control-flow queries posed by developers as formal languages, indicating whether the query expresses control flow that is possible or impossible for a given failure report. We consider three separate approaches that trade off precision, expressiveness for failure constraints, and scalability. We also introduce a new subclass of regular languages, the unreliable trace languages, which are particularly suited to answering control-flow queries in polynomial time. Our system answers queries remarkably efficiently when we encode failure constraints and user queries entirely as unreliable trace languages.
@InProceedings{PLDI17p390,
author = {Peter Ohmann and Alexander Brooks and Loris D'Antoni and Ben Liblit},
title = {Control-Flow Recovery from Partial Failure Reports},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {390--405},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Rigorous Analysis of Software Countermeasures against Cache Attacks
Goran Doychev and Boris Köpf
(IMDEA Software Institute, Spain)
CPU caches introduce variations into the execution time of programs that can be exploited by adversaries to recover private information about users or cryptographic keys. Establishing the security of countermeasures against this threat often requires intricate reasoning about the interactions of program code, memory layout, and hardware architecture and has so far only been done for restricted cases. In this paper we devise novel techniques that provide support for bit-level and arithmetic reasoning about memory accesses in the presence of dynamic memory allocation. These techniques enable us to perform the first rigorous analysis of widely deployed software countermeasures against cache attacks on modular exponentiation, based on executable code.
@InProceedings{PLDI17p406,
author = {Goran Doychev and Boris Köpf},
title = {Rigorous Analysis of Software Countermeasures against Cache Attacks},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {406--421},
doi = {},
year = {2017},
}
Artifacts Functional
Synthesis
Component-Based Synthesis of Table Consolidation and Transformation Tasks from Examples
Yu Feng, Ruben Martins, Jacob Van Geffen,
Isil Dillig, and Swarat Chaudhuri
(University of Texas at Austin, USA; Rice University, USA)
This paper presents a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specifications, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, the use of SMT-based deduction greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially-filled program sketches. We apply the proposed synthesis methodology for automating a large class of data preparation tasks that commonly arise in data science. We have evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from on-line forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
@InProceedings{PLDI17p422,
author = {Yu Feng and Ruben Martins and Jacob Van Geffen and Isil Dillig and Swarat Chaudhuri},
title = {Component-Based Synthesis of Table Consolidation and Transformation Tasks from Examples},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {422--436},
doi = {},
year = {2017},
}
Artifacts Functional
Network Configuration Synthesis with Abstract Topologies
Ryan Beckett, Ratul Mahajan,
Todd Millstein, Jitendra Padhye, and David Walker
(Princeton University, USA; Intentionet, USA; Microsoft, USA; University of California at Los Angeles, USA)
We develop Propane/AT, a system to synthesize provably-correct BGP (border gateway protocol) configurations for large, evolving networks from high-level specifications of topology, routing policy, and fault-tolerance requirements. Propane/AT is based on new abstractions for capturing parameterized network topologies and their evolution, and algorithms to analyze the impact of topology and routing policy on fault tolerance. Our algorithms operate entirely on abstract topologies. We prove that the properties established by our analyses hold for every concrete instantiation of the given abstract topology. Propane/AT also guarantees that only incremental changes to existing device configurations are required when the network evolves to add or remove devices and links. Our experiments with real-world topologies and policies show that our abstractions and algorithms are effective, and that, for large networks, Propane/AT synthesizes configurations two orders of magnitude faster than systems that operate on concrete topologies.
@InProceedings{PLDI17p437,
author = {Ryan Beckett and Ratul Mahajan and Todd Millstein and Jitendra Padhye and David Walker},
title = {Network Configuration Synthesis with Abstract Topologies},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {437--451},
doi = {},
year = {2017},
}
Artifacts Functional
Synthesizing Highly Expressive SQL Queries from Input-Output Examples
Chenglong Wang, Alvin Cheung, and Rastislav Bodik
(University of Washington, USA)
SQL is the de facto language for manipulating relational data. Though powerful, SQL is difficult for many users to write because of its highly expressive constructs.
While using the programming-by-example paradigm to help users write SQL queries is an attractive proposition, as evidenced by online help forums such as Stack Overflow, developing techniques for synthesizing SQL queries from given input-output (I/O) examples has been difficult, due to the large space of SQL queries as a result of its rich set of operators.
In this paper, we present a new scalable and efficient algorithm for synthesizing SQL queries based on I/O examples. The key innovation of our algorithm is the development of a language for abstract queries, i.e., queries with uninstantiated operators, that can be used to express a large space of SQL queries efficiently. Using abstract queries to represent the search space nicely decomposes the synthesis problem into two tasks: 1) searching for abstract queries that can potentially satisfy the given I/O examples, and 2) instantiating the found abstract queries and ranking the results.
We have implemented this algorithm in a new tool called Scythe and evaluated it using 193 benchmarks collected from Stack Overflow. Our evaluation shows that Scythe can efficiently solve 74% of the benchmarks, most in just a few seconds, and the synthesized queries range from simple ones involving a single selection to complex queries with 6 nested subqueries.
@InProceedings{PLDI17p452,
author = {Chenglong Wang and Alvin Cheung and Rastislav Bodik},
title = {Synthesizing Highly Expressive SQL Queries from Input-Output Examples},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {452--466},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Synthesizing Memory Models from Framework Sketches and Litmus Tests
James Bornholt and
Emina Torlak
(University of Washington, USA)
A memory consistency model specifies which writes to shared memory a given read may see. Ambiguities or errors in these specifications can lead to bugs in both compilers and applications. Yet architectures usually define their memory models with prose and litmus tests—small concurrent programs that demonstrate allowed and forbidden outcomes. Recent work has formalized the memory models of common architectures through substantial manual effort, but as new architectures emerge, there is a growing need for tools to aid these efforts.
This paper presents MemSynth, a synthesis-aided system for reasoning about axiomatic specifications of memory models. MemSynth takes as input a set of litmus tests and a framework sketch that defines a class of memory models. The sketch comprises a set of axioms with missing expressions (or holes). Given these inputs, MemSynth synthesizes a completion of the axioms—i.e., a memory model—that gives the desired outcome on all tests. The MemSynth engine employs a novel embedding of bounded relational logic in a solver-aided programming language, which enables it to tackle complex synthesis queries intractable to existing relational solvers. This design also enables it to solve new kinds of queries, such as checking if a set of litmus tests unambiguously defines a memory model within a framework sketch.
We show that MemSynth can synthesize specifications for x86 in under two seconds, and for PowerPC in 12 seconds from 768 litmus tests. Our ambiguity check identifies missing tests from both the Intel x86 documentation and the validation suite of a previous PowerPC formalization. We also used MemSynth to reproduce, debug, and automatically repair a paper on comparing memory models in just two days.
@InProceedings{PLDI17p467,
author = {James Bornholt and Emina Torlak},
title = {Synthesizing Memory Models from Framework Sketches and Litmus Tests},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {467--481},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Functional Programming and Correctness
Compiling without Continuations
Luke Maurer,
Paul Downen, Zena M. Ariola, and Simon Peyton Jones
(University of Oregon, USA; Microsoft Research, UK)
Many fields of study in compilers give rise to the concept of a join point—a place where different execution paths come together. Join points are often treated as functions or continuations, but we believe it is time to study them in their own right. We show that adding join points to a direct-style functional intermediate language is a simple but powerful change that allows new optimizations to be performed, including a significant improvement to list fusion. Finally, we report on recent work on adding join points to the intermediate language of the Glasgow Haskell Compiler.
@InProceedings{PLDI17p482,
author = {Luke Maurer and Paul Downen and Zena M. Ariola and Simon Peyton Jones},
title = {Compiling without Continuations},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {482--494},
doi = {},
year = {2017},
}
FunTAL: Reasonably Mixing a Functional Language with Assembly
Daniel Patterson, Jamie Perconti, Christos Dimoulas, and
Amal Ahmed
(Northeastern University, USA; Harvard University, USA)
We present FunTAL, the first multi-language system to formalize safe interoperability between a high-level functional language and low-level assembly code while supporting compositional reasoning about the mix. A central challenge in developing such a multi-language is bridging the gap between assembly, which is staged into jumps to continuations, and high-level code, where subterms return a result. We present a compositional stack-based typed assembly language that supports components, comprised of one or more basic blocks, that may be embedded in high-level contexts. We also present a logical relation for FunTAL that supports reasoning about equivalence of high-level components and their assembly replacements, mixed-language programs with callbacks between languages, and assembly components comprised of different numbers of basic blocks.
@InProceedings{PLDI17p495,
author = {Daniel Patterson and Jamie Perconti and Christos Dimoulas and Amal Ahmed},
title = {FunTAL: Reasonably Mixing a Functional Language with Assembly},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {495--509},
doi = {},
year = {2017},
}
Artifacts Functional
HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics
Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu
(University of Washington, USA)
Every database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenge comes from the intricacies and rich features of query languages, which make reasoning about rewrite rules difficult. In this paper, we propose a machine-checkable denotational semantics for SQL, the de facto language for relational databases, for rigorously validating rewrite rules. Unlike previously proposed semantics that are either non-mechanized or only cover a small number of SQL language features, our semantics covers all major features of SQL, including bags, correlated subqueries, aggregation, and indexes. Our mechanized semantics, called HoTTSQL, is based on K-Relations and homotopy type theory, where we denote relations as mathematical functions from tuples to univalent types. We have implemented HoTTSQL in Coq, which takes fewer than 300 lines of code, and have proved a wide range of SQL rewrite rules, including those from database research literature (e.g., magic set rewrites) and real-world query optimizers (e.g., subquery elimination). Several of these rewrite rules have never been previously proven correct. In addition, while query equivalence is generally undecidable, we have implemented an automated decision procedure using HoTTSQL for conjunctive queries: a well-studied decidable fragment of SQL that encompasses many real-world queries.
@InProceedings{PLDI17p510,
author = {Shumo Chu and Konstantin Weitz and Alvin Cheung and Dan Suciu},
title = {HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {510--524},
doi = {},
year = {2017},
}
Info
Artifacts Functional
Levity Polymorphism
Richard A. Eisenberg and Simon Peyton Jones
(Bryn Mawr College, USA; Microsoft Research, UK)
Parametric polymorphism is one of the linchpins of modern typed programming, but it comes with a real performance penalty. We describe this penalty; offer a principled way to reason about it (kinds as calling conventions); and propose levity polymorphism. This new form of polymorphism allows abstractions over calling conventions; we detail and verify restrictions that are necessary in order to compile levity-polymorphic functions. Levity polymorphism has created new opportunities in Haskell, including the ability to generalize nearly half of the type classes in GHC's standard library.
@InProceedings{PLDI17p525,
author = {Richard A. Eisenberg and Simon Peyton Jones},
title = {Levity Polymorphism},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {525--539},
doi = {},
year = {2017},
}
Parallelization and Concurrency
Synthesis of Divide and Conquer Parallelism for Loops
Azadeh Farzan and Victor Nicolet
(University of Toronto, Canada)
Divide-and-conquer is a common parallel programming skeleton supported by many cross-platform multithreaded libraries, and most commonly used by programmers for parallelization. The challenges of producing (manually or automatically) a correct divide-and-conquer parallel program from a given sequential code are two-fold: (1) assuming that a good solution exists where individual worker threads execute a code identical to the sequential one, the programmer has to provide the extra code for dividing the tasks and combining the partial results (i.e., joins), and (2) the sequential code may not be suitable for divide-and-conquer parallelization as is, and may need to be modified to become a part of a good solution. We address both challenges in this paper. We present an automated synthesis technique to synthesize correct joins and an algorithm for modifying the sequential code to make it suitable for parallelization when necessary. This paper focuses on a class of loops that traverse a read-only collection and compute a scalar function over that collection. We present theoretical results for when the necessary modifications to sequential code are possible, theoretical guarantees for the algorithmic solutions presented here, and an experimental evaluation of the approach's success in practice and the quality of the produced parallel programs.
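A classic instance of the join problem, written as a hedged toy (not the paper's synthesizer or benchmarks): the sequential loop below computes the maximum prefix sum, and a correct divide-and-conquer version needs an auxiliary accumulator, the segment sum, before a join even exists.

    def sequential(xs):
        s = best = 0
        for x in xs:
            s += x
            best = max(best, s)
        return best

    def parallel(xs):
        if len(xs) <= 1:
            s = sum(xs)
            return s, max(0, s)                   # (segment sum, best prefix)
        mid = len(xs) // 2
        ls, lbest = parallel(xs[:mid])            # in a real runtime these two
        rs, rbest = parallel(xs[mid:])            # calls run on worker threads
        return ls + rs, max(lbest, ls + rbest)    # the join to be synthesized

    data = [3, -5, 4, 2, -1, 6, -3]
    assert parallel(data)[1] == sequential(data)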
@InProceedings{PLDI17p540,
author = {Azadeh Farzan and Victor Nicolet},
title = {Synthesis of Divide and Conquer Parallelism for Loops},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {540--555},
doi = {},
year = {2017},
}
Futhark: Purely Functional GPU-Programming with Nested Parallelism and In-Place Array Updates
Troels Henriksen, Niels G. W. Serup,
Martin Elsman,
Fritz Henglein, and
Cosmin E. Oancea
(University of Copenhagen, Denmark)
Futhark is a purely functional data-parallel array language that
offers a machine-neutral programming model and an optimising compiler
that generates OpenCL code for GPUs.
This paper presents the design and implementation of three key
features of Futhark that seek a suitable middle ground with imperative
approaches.
First, in order to express efficient code inside the parallel
constructs, we introduce a simple type system for in-place updates
that ensures referential transparency and supports equational
reasoning.
Second, we furnish Futhark with parallel operators capable of
expressing efficient strength-reduced code, along with their
fusion rules.
Third, we present a flattening transformation aimed at enhancing the
degree of parallelism that (i) builds on loop interchange and
distribution but uses higher-order reasoning rather than
array-dependence analysis, and (ii) still allows further
locality-of-reference optimisations. Finally, an evaluation on 16
benchmarks demonstrates the impact of the language and compiler
features and shows application-level performance competitive with
hand-written GPU code.
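As a rough analogy (in Haskell rather than Futhark, and not the paper's type system), an in-place update such as Futhark's "let b = a with [i] <- v" behaves like a pure array update; a uniqueness-style typing discipline is what lets the compiler reuse the old array's storage when it is provably not used afterwards, whereas the pure encoding below pays for a copy:

    import Data.Array (Array, listArray, elems)
    import Data.Array.ST (runSTArray, thaw, writeArray)

    -- Observationally pure update; mutation is confined to the ST region.
    updateAt :: Array Int Int -> Int -> Int -> Array Int Int
    updateAt a i v = runSTArray $ do
      m <- thaw a           -- copies the array; Futhark's typing discipline
      writeArray m i v      -- is what allows safely eliding this copy
      return m

    -- elems (updateAt (listArray (0, 3) [1, 2, 3, 4]) 2 9) == [1, 2, 9, 4]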
@InProceedings{PLDI17p556,
author = {Troels Henriksen and Niels G. W. Serup and Martin Elsman and Fritz Henglein and Cosmin E. Oancea},
title = {Futhark: Purely Functional GPU-Programming with Nested Parallelism and In-Place Array Updates},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {556--571},
doi = {},
year = {2017},
}
Artifacts Functional
Gradual Synthesis for Static Parallelization of Single-Pass Array-Processing Programs
Grigory Fedyukovich, Maaz Bin Safeer Ahmad, and Rastislav Bodik
(University of Washington, USA)
Parallelizing software improves its effectiveness and productivity. To guarantee correctness, the parallel and serial versions of the same code must be formally verified to be equivalent. We present a novel approach, called GRASSP, that automatically synthesizes parallel single-pass array-processing programs by treating the given serial versions as specifications. Given an arbitrary segmentation of the input array, GRASSP synthesizes code that determines a new segmentation of the array, allowing partial results to be computed for each segment and then merged. In contrast to other parallelizers, GRASSP gradually considers several parallelization scenarios and certifies the results using constrained Horn solving. For several classes of programs, we show that such parallelization can be performed efficiently. The C++ translations of the GRASSP solutions improved performance by up to 5X relative to serial code on an 8-thread machine, and Hadoop translations by up to 10X on a 10-node Amazon EMR cluster.
@InProceedings{PLDI17p572,
author = {Grigory Fedyukovich and Maaz Bin Safeer Ahmad and Rastislav Bodik},
title = {Gradual Synthesis for Static Parallelization of Single-Pass Array-Processing Programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {572--585},
doi = {},
year = {2017},
}
Verified Computation
A Formally Verified Compiler for Lustre
Timothy Bourke, Lélio Brun, Pierre-Évariste Dagand, Xavier Leroy,
Marc Pouzet, and Lionel Rieg
(Inria, France; ENS, France; UPMC, France; CNRS, France; Collège de France, France; Yale University, USA)
The correct compilation of block diagram languages like Lustre, Scade, and a discrete subset of Simulink is important since they are used to program critical embedded control software. We describe the specification and verification in an interactive theorem prover of a compilation chain that treats the key aspects of Lustre: sampling, nodes, and delays. Building on CompCert, we show that repeated execution of the generated assembly code faithfully implements the dataflow semantics of source programs.
We resolve two key technical challenges. The first is the change from a synchronous dataflow semantics, where programs manipulate streams of values, to an imperative one, where computations manipulate memory sequentially. The second is the verified compilation of an imperative language with encapsulated state to C code where the state is realized by nested records. We also treat a standard control optimization that eliminates unnecessary conditional statements.
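To illustrate the first challenge (a hedged Haskell sketch of the semantic gap, not the paper's Coq formalization), consider a counter node: its dataflow meaning is an infinite stream, while the generated code is an imperative step function over explicit memory, and compiler correctness says the two views agree on every prefix:

    -- Dataflow view of the Lustre-style node  n = 0 -> pre n + 1.
    counterStream :: [Int]
    counterStream = 0 : map (+ 1) counterStream

    -- Imperative view: a step function over encapsulated state.
    newtype Mem = Mem { lastN :: Int }

    step :: Mem -> (Int, Mem)
    step (Mem n) = (n, Mem (n + 1))

    run :: Int -> [Int]
    run k = go k (Mem 0)
      where
        go 0 _ = []
        go i m = let (o, m') = step m in o : go (i - 1) m'

    -- take 5 counterStream == run 5 == [0, 1, 2, 3, 4]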
@InProceedings{PLDI17p586,
author = {Timothy Bourke and Lélio Brun and Pierre-Évariste Dagand and Xavier Leroy and Marc Pouzet and Lionel Rieg},
title = {A Formally Verified Compiler for Lustre},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {586--601},
doi = {},
year = {2017},
}
Artifacts Functional
Flatten and Conquer: A Framework for Efficient Analysis of String Constraints
Parosh Aziz Abdulla,
Mohamed Faouzi Atig,
Yu-Fang Chen, Bui Phi Diep,
Lukáš Holík, Ahmed Rezine, and Philipp Rümmer
(Uppsala University, Sweden; Academia Sinica, Taiwan; Brno University of Technology, Czech Republic; Linköping University, Sweden)
We describe a uniform and efficient framework for checking the satisfiability of a large class of string constraints. The framework is based on the observation that both satisfiability and unsatisfiability of common constraints can be demonstrated through witnesses with simple patterns. These patterns are captured using flat automata, each of which consists of a sequence of simple loops. We build a Counter-Example Guided Abstraction Refinement (CEGAR) framework which contains both an under-approximation and an over-approximation module. The flow of information between the modules allows the precision to be increased automatically. We have implemented the framework as a tool and performed extensive experiments that demonstrate both the generality and the efficiency of our method.
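As a small illustration (ours, not drawn from the paper's benchmarks), every satisfying assignment for the word equation x ++ "ab" == "ab" ++ x follows a single simple loop, x = ("ab")^n, which is exactly the kind of flat witness pattern the framework enumerates:

    -- Solutions of  x ++ "ab" == "ab" ++ x  follow the flat pattern ("ab")^n:
    -- one simple loop that reads the word "ab" repeatedly.
    satisfies :: String -> Bool
    satisfies x = x ++ "ab" == "ab" ++ x

    flatWitness :: Int -> String
    flatWitness n = concat (replicate n "ab")

    -- all (satisfies . flatWitness) [0 .. 10]  ==  True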
@InProceedings{PLDI17p602,
author = {Parosh Aziz Abdulla and Mohamed Faouzi Atig and Yu-Fang Chen and Bui Phi Diep and Lukáš Holík and Ahmed Rezine and Philipp Rümmer},
title = {Flatten and Conquer: A Framework for Efficient Analysis of String Constraints},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {602--617},
doi = {},
year = {2017},
}
Artifacts Functional
Correctness
Repairing Sequential Consistency in C/C++11
Ori Lahav,
Viktor Vafeiadis, Jeehoon Kang,
Chung-Kil Hur, and
Derek Dreyer
(MPI-SWS, Germany; Seoul National University, South Korea)
The C/C++11 memory model defines the semantics of concurrent memory accesses in C/C++, and in particular supports racy "atomic" accesses at a range of different consistency levels, from very weak consistency ("relaxed") to strong, sequential consistency ("SC"). Unfortunately, as we observe in this paper, the semantics of SC atomic accesses in C/C++11, as well as in all proposed strengthenings of the semantics, is flawed, in that (contrary to previously published results) both suggested compilation schemes to the Power architecture are unsound. We propose a model, called RC11 (for Repaired C11), with a better semantics for SC accesses that restores the soundness of the compilation schemes to Power, maintains the DRF-SC guarantee, and provides stronger, more useful, guarantees to SC fences. In addition, we formally prove, for the first time, the correctness of the proposed stronger compilation schemes to Power that preserve load-to-store ordering and avoid "out-of-thin-air" reads.
@InProceedings{PLDI17p618,
author = {Ori Lahav and Viktor Vafeiadis and Jeehoon Kang and Chung-Kil Hur and Derek Dreyer},
title = {Repairing Sequential Consistency in C/C++11},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {618--632},
doi = {},
year = {2017},
}
Taming Undefined Behavior in LLVM
Juneyoung Lee, Yoonseung Kim, Youngju Song,
Chung-Kil Hur, Sanjoy Das, David Majnemer,
John Regehr, and Nuno P. Lopes
(Seoul National University, South Korea; Azul Systems, USA; Google, USA; University of Utah, USA; Microsoft Research, UK)
A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis.
In this paper we study an aspect of IR design that has received little attention: the role of undefined behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of undefined behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, causing long-standing bugs.
We present solutions to the problems we have identified in LLVM's IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.
@InProceedings{PLDI17p633,
author = {Juneyoung Lee and Yoonseung Kim and Youngju Song and Chung-Kil Hur and Sanjoy Das and David Majnemer and John Regehr and Nuno P. Lopes},
title = {Taming Undefined Behavior in LLVM},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {633--647},
doi = {},
year = {2017},
}
Artifacts Functional
Systems and Performance
Low-Synchronization, Mostly Lock-Free, Elastic Scheduling for Streaming Runtimes
Scott Schneider and Kun-Lung Wu
(IBM Research, USA)
We present the scalable, elastic operator scheduler in IBM Streams 4.2. Streams
is a distributed stream processing system used in production at many companies
in a wide range of industries. The programming language for Streams, SPL,
presents operators, tuples and streams as the primary abstractions. A
fundamental SPL optimization is operator fusion, where multiple operators
execute in the same process. Streams 4.2 introduces automatic
submission-time fusion to simplify application development and deployment.
However, potentially thousands of operators could then execute
in the same process, with no user guidance for thread placement. We needed a way
to automatically determine how many threads to use, for arbitrarily sized
applications on a wide variety of hardware, without any input from
programmers. Our solution has two components. The first is a scalable operator
scheduler that minimizes synchronization, locks and global data, while allowing
threads to execute any operator and dynamically come and go. The second is an
elastic algorithm to dynamically adjust the number of threads to optimize
performance, using the principles of trusted measurements to establish trends.
We demonstrate our scheduler's ability to scale to over a hundred threads, and
our elasticity algorithm's ability to adapt to different workloads on an Intel
Xeon system with 176 logical cores, and an IBM Power8 system with 184 logical
cores.
@InProceedings{PLDI17p648,
author = {Scott Schneider and Kun-Lung Wu},
title = {Low-Synchronization, Mostly Lock-Free, Elastic Scheduling for Streaming Runtimes},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {648--661},
doi = {},
year = {2017},
}
Practical Partial Evaluation for High-Performance Dynamic Language Runtimes
Thomas Würthinger,
Christian Wimmer, Christian Humer, Andreas Wöß,
Lukas Stadler, Chris Seaton, Gilles Duboscq, Doug Simon, and Matthias Grimmer
(Oracle Labs, Switzerland; Oracle Labs, USA; Oracle Labs, Austria; Oracle Labs, UK; JKU Linz, Austria)
Most high-performance dynamic language virtual machines duplicate language semantics in the interpreter, compiler, and runtime system.
This violates the don't-repeat-yourself principle.
In contrast, we define languages solely by writing an interpreter.
The interpreter performs specializations, e.g., augments the interpreted program with type information and profiling information.
Compiled code is derived automatically using partial evaluation while incorporating these specializations.
This makes partial evaluation practical in the context of dynamic languages: It reduces the size of the compiled code while still compiling all parts of an operation that are relevant for a particular program.
When a speculation fails, execution transfers back to the interpreter, the program re-specializes in the interpreter, and later partial evaluation again transforms the new state of the interpreter to compiled code.
We evaluate our approach by comparing our implementations of JavaScript, Ruby, and R with best-in-class specialized production implementations.
Our general-purpose compilation system is competitive with production systems even when they have been heavily optimized for the one language they support.
For our set of benchmarks, our speedup relative to the V8 JavaScript VM is 0.83x, relative to JRuby is 3.8x, and relative to GNU R is 5x.
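As a toy illustration of the interpreter-specialization idea behind this approach (a Haskell sketch of ours, unrelated to the actual Graal/Truffle implementation), partially applying an interpreter to a fixed program is the point at which a partial evaluator can residualize away the interpretive dispatch:

    data Expr = Lit Int | Var | Add Expr Expr

    -- A tiny interpreter for expressions over one variable.
    interp :: Expr -> Int -> Int
    interp (Lit n)   _ = n
    interp Var       x = x
    interp (Add a b) x = interp a x + interp b x

    -- Specializing the interpreter to a fixed program: a partial evaluator
    -- would reduce this to the residual code  \x -> x + 1.
    prog :: Expr
    prog = Add Var (Lit 1)

    compiled :: Int -> Int
    compiled = interp prog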
@InProceedings{PLDI17p662,
author = {Thomas Würthinger and Christian Wimmer and Christian Humer and Andreas Wöß and Lukas Stadler and Chris Seaton and Gilles Duboscq and Doug Simon and Matthias Grimmer},
title = {Practical Partial Evaluation for High-Performance Dynamic Language Runtimes},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {662--676},
doi = {},
year = {2017},
}
Responsive Parallel Computation: Bridging Competitive and Cooperative Threading
Stefan K. Muller,
Umut A. Acar, and
Robert Harper
(Carnegie Mellon University, USA; Inria, France)
Competitive and cooperative threading are widely used
abstractions in computing. In competitive threading, threads are
scheduled preemptively with the goal of minimizing response time,
usually of interactive applications. In cooperative threading,
threads are scheduled non-preemptively with the goal of maximizing
throughput or minimizing the completion time, usually in
compute-intensive applications, e.g., scientific computing, machine
learning, and AI.
Although both of these forms of threading rely on the same abstraction
of a thread, they have, to date, remained largely separate forms of
computing. Motivated by the recent increase in the mainstream use of
multicore computers, we propose a threading model that aims to unify
competitive and cooperative threading. To this end, we extend the
classic graph-based cost model for cooperative threading to allow for
competitive threading, and describe how such a cost model may be used
in a programming language by presenting a language and a corresponding
cost semantics. Finally, we show that the cost model and the
semantics are realizable by presenting an operational semantics for
the language that specifies the behavior of an implementation, as
well as an implementation and a small empirical evaluation.
@InProceedings{PLDI17p677,
author = {Stefan K. Muller and Umut A. Acar and Robert Harper},
title = {Responsive Parallel Computation: Bridging Competitive and Cooperative Threading},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {677--692},
doi = {},
year = {2017},
}
Artifacts Functional
StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data
Konstantinos Mamouras, Mukund Raghothaman,
Rajeev Alur, Zachary G. Ives, and
Sanjeev Khanna
(University of Pennsylvania, USA)
Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications.
We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.
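As a hedged and heavily simplified sketch of the combinator flavour (in Haskell; the actual StreamQRE implementation is a Java library with richer combinators and a decidable type system), a quantitative query maps a stream prefix to a value, and queries compose:

    -- A query consumes a finite stream prefix and produces a quantitative result.
    newtype Query a b = Query { runQuery :: [a] -> b }

    -- Iterated aggregation over every item.
    iterQ :: (b -> a -> b) -> b -> Query a b
    iterQ step z = Query (foldl step z)

    -- Combine two queries over the same stream with a binary operation.
    combineQ :: (b -> c -> d) -> Query a b -> Query a c -> Query a d
    combineQ f q1 q2 = Query (\xs -> f (runQuery q1 xs) (runQuery q2 xs))

    -- Example: the average of a numeric stream as sum combined with count.
    averageQ :: Query Double Double
    averageQ = combineQ (/) (iterQ (+) 0) (iterQ (\n _ -> n + 1) 0)

    -- runQuery averageQ [1, 2, 3, 4]  ==  2.5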
@InProceedings{PLDI17p693,
author = {Konstantinos Mamouras and Mukund Raghothaman and Rajeev Alur and Zachary G. Ives and Sanjeev Khanna},
title = {StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {693--708},
doi = {},
year = {2017},
}
Artifacts Functional