39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018),
June 18–22, 2018,
Philadelphia, PA, USA
Frontmatter
Message from the Chairs
Welcome to PLDI 2018, the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, held in Philadelphia, Pennsylvania, June 18-22. As the call for research papers indicated: PLDI is
a premier forum for programming language research, broadly construed, including design, implementation,
theory, applications, and performance. PLDI contains outstanding research that extends and/or applies
programming-language concepts to advance the field of computing. Novel system designs, thorough
empirical work, well-motivated theoretical results, and new application areas are all welcome emphases at
PLDI.
Web Pages
Verifying That Web Pages Have Accessible Layout
Pavel Panchekha, Adam T. Geller,
Michael D. Ernst,
Zachary Tatlock, and Shoaib Kamil
(University of Washington, USA; Adobe Research, USA)
Usability and accessibility guidelines aim to make graphical user interfaces accessible to all users, by, say, requiring that text is sufficiently large, interactive controls are visible, and heading size corresponds to importance. These guidelines must hold on the infinitely many possible renderings of a web page generated by differing screen sizes, fonts, and other user preferences. Today, these guidelines are tested by manual inspection of a few renderings, because 1) the guidelines are not expressed in a formal language, 2) the semantics of browser rendering are not well understood, and 3) no tools exist to check all possible renderings of a web page. VizAssert solves these problems. First, it introduces visual logic to precisely specify accessibility properties. Second, it formalizes a large fragment of the browser rendering algorithm using novel finitization reductions. Third, it provides a sound, automated tool for verifying assertions in visual logic.
We encoded 14 assertions drawn from best-practice accessibility and mobile-usability guidelines in visual logic. VizAssert checked them on 62 professionally designed web pages. It found 64 distinct errors in the web pages, while reporting only 13 false positive warnings.
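As a rough illustration (in Python, not VizAssert's visual logic, and with hypothetical names such as RenderedBox), one of these guideline-style assertions can be read as a predicate over the boxes of a rendering; VizAssert's contribution is verifying such predicates over every rendering the page admits, not a handful of samples.

# Illustrative sketch only: a best-practice accessibility check expressed as a
# predicate over rendered boxes. VizAssert's visual logic and verifier are not
# shown here; the names below (RenderedBox, text_is_legible) are hypothetical.
from dataclasses import dataclass

MIN_FONT_PX = 14  # assumed threshold for "sufficiently large" body text

@dataclass
class RenderedBox:
    tag: str            # e.g. "p", "h1", "button"
    font_size_px: float
    visible: bool

def text_is_legible(rendering: list[RenderedBox]) -> bool:
    """Every visible text box uses a sufficiently large font."""
    return all(b.font_size_px >= MIN_FONT_PX for b in rendering if b.visible)

# A dynamic checker could only test a few sampled renderings:
sample = [RenderedBox("p", 16.0, True), RenderedBox("small", 13.0, True)]
print(text_is_legible(sample))  # False: this particular rendering violates the guideline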
@InProceedings{PLDI18p1,
author = {Pavel Panchekha and Adam T. Geller and Michael D. Ernst and Zachary Tatlock and Shoaib Kamil},
title = {Verifying That Web Pages Have Accessible Layout},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {1--14},
doi = {},
year = {2018},
}
Published Artifact

Artifacts Available
Artifacts Functional
BLeak: Automatically Debugging Memory Leaks in Web Applications
John Vilk and
Emery D. Berger
(University of Massachusetts at Amherst, USA)
Despite the presence of garbage collection in managed languages like JavaScript, memory leaks remain a serious problem. In the context of web applications, these leaks are especially pervasive and difficult to debug. Web application memory leaks can take many forms, including failing to dispose of unneeded event listeners, repeatedly injecting iframes and CSS files, and failing to call cleanup routines in third-party libraries. Leaks degrade responsiveness by increasing GC frequency and overhead, and can even lead to browser tab crashes by exhausting available memory. Because previous leak detection approaches designed for conventional C, C++ or Java applications are ineffective in the browser environment, tracking down leaks currently requires intensive manual effort by web developers.
This paper introduces BLeak (Browser Leak debugger), the first system for automatically debugging memory leaks
in web applications. BLeak's algorithms leverage the observation that in modern web applications, users often repeatedly return to the same (approximate) visual state (e.g., the inbox view in Gmail). Sustained growth between round trips is a strong indicator of a memory leak. To use BLeak, a developer writes a short script (17-73 LOC on our benchmarks) to drive a web application in round trips to the same visual state. BLeak then automatically generates a list of leaks found along with their root causes, ranked by return on investment. Guided by BLeak, we identify and fix over 50 memory leaks in popular libraries and apps including Airbnb, AngularJS, Google Analytics, Google Maps SDK, and jQuery. BLeak's median precision is 100%; fixing the leaks it identifies reduces heap growth by an average of 94%, saving from 0.5 MB to 8 MB per round trip. We believe BLeak's approach to be broadly applicable beyond web applications, including to GUI applications on desktop and mobile platforms.
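The round-trip driver BLeak expects is conceptually simple. The sketch below uses Python with Selenium rather than BLeak's actual JavaScript driver format, and the URL and selector are placeholders; it only illustrates the idea of returning an app to the same visual state and watching for sustained heap growth.

# Sketch of a round-trip leak-hunting loop in the spirit of BLeak's approach.
# BLeak's real driver scripts are JavaScript and its analysis pinpoints root
# causes; this Python/Selenium loop only illustrates the "same visual state" idea.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/inbox")   # hypothetical web app URL

def heap_bytes():
    # Chrome-only, non-standard API; adequate for a rough growth signal.
    return driver.execute_script("return performance.memory.usedJSHeapSize")

sizes = []
for _ in range(8):                         # several round trips
    driver.find_element("css selector", ".first-message").click()  # leave the state
    driver.back()                          # ...and return to the inbox view
    sizes.append(heap_bytes())

# Sustained growth across round trips is a strong leak indicator.
growth = [b - a for a, b in zip(sizes, sizes[1:])]
print("suspicious" if all(g > 0 for g in growth) else "no clear leak signal")
driver.quit()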
@InProceedings{PLDI18p15,
author = {John Vilk and Emery D. Berger},
title = {BLeak: Automatically Debugging Memory Leaks in Web Applications},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {15--38},
doi = {},
year = {2018},
}
Published Artifact
Artifacts Available
Artifacts Functional
Putting in All the Stops: Execution Control for JavaScript
Samuel Baxter, Rachit Nigam, Joe Gibbs Politz, Shriram Krishnamurthi, and Arjun Guha
(University of Massachusetts at Amherst, USA; University of California at San Diego, USA; Brown University, USA)
Scores of compilers produce JavaScript, enabling programmers to use many languages on the Web, reuse existing code, and even use Web IDEs. Unfortunately, most compilers inherit the browser's compromised execution model, so long-running programs freeze the browser tab, infinite loops crash IDEs, and so on. The few compilers that avoid these problems suffer poor performance and are difficult to engineer.
This paper presents Stopify, a source-to-source compiler that extends JavaScript with debugging abstractions and blocking operations, and easily integrates with existing compilers. We apply Stopify to ten programming languages and develop a Web IDE that supports stopping, single-stepping, breakpointing, and long-running computations. For nine languages, Stopify requires no or trivial compiler changes. For eight, our IDE is the first that provides these features. Two of our subject languages have compilers with similar features. Stopify's performance is competitive with these compilers and it makes them dramatically simpler.
Stopify's abstractions rely on first-class continuations, which it provides by compiling JavaScript to JavaScript. We also identify sub-languages of JavaScript that compilers implicitly use, and exploit these to improve performance. Finally, Stopify needs to repeatedly interrupt and resume program execution. We use a sampling-based technique to estimate program speed that outperforms other systems.
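As a loose analogue of the execution control Stopify provides (it actually compiles JavaScript to JavaScript to obtain first-class continuations), the Python generator below shows how inserting explicit stop points into a long-running loop lets a host pause, step, and resume the computation.

# Rough analogue, not Stopify's actual transformation: a long-running loop
# rewritten to yield control periodically, so a host can pause/step/resume it.
def long_computation(n, yield_every=10_000):
    total = 0
    for i in range(n):
        total += i * i
        if i % yield_every == 0:
            yield None          # a "stop point" the host can interrupt at
    return total

gen = long_computation(10**6)
steps = 0
try:
    while True:
        next(gen)               # run until the next stop point
        steps += 1
        # the host could check for a "stop" or breakpoint request here
except StopIteration as done:
    print("result:", done.value, "after", steps, "stop points")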
@InProceedings{PLDI18p39,
author = {Samuel Baxter and Rachit Nigam and Joe Gibbs Politz and Shriram Krishnamurthi and Arjun Guha},
title = {Putting in All the Stops: Execution Control for JavaScript},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {39--54},
doi = {},
year = {2018},
}

Artifacts Functional
Emerging Hardware
Persistency for Synchronization-Free Regions
Vaibhav Gogte, Stephan Diestelhorst, William Wang, Satish Narayanasamy, Peter M. Chen, and Thomas F. Wenisch
(University of Michigan, USA; ARM Research, UK)
Nascent persistent memory (PM) technologies promise the performance of DRAM with the durability of disk, but how best to integrate them into programming systems remains an open question.
Recent work extends language memory models with a persistency model prescribing semantics for updates to PM.
These semantics enable programmers to design data structures in PM that are accessed like memory and yet are recoverable upon crash or failure.
Alas, we find the semantics and performance of existing approaches unsatisfying.
Existing approaches require high-overhead mechanisms, are restricted to certain synchronization constructs, provide incomplete semantics, and/or may recover to state that cannot arise in fault-free execution.
We propose persistency semantics that guarantee failure atomicity of synchronization-free regions (SFRs) - program regions delimited by synchronization operations.
Our approach provides clear semantics for the PM state recovery code may observe and extends C++11's "sequential consistency for data-race-free" guarantee to post-failure recovery code.
We investigate two designs for failure-atomic SFRs that vary in performance and the degree to which commit of persistent state may lag execution.
We demonstrate both approaches in LLVM v3.6.0 and compare to a state-of-the-art baseline to show performance improvement up to 87.5% (65.5% avg).
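As a conceptual sketch only (the paper's designs are implemented in LLVM and target real persistent memory hardware), failure atomicity of an SFR can be pictured as undo logging between synchronization operations: updates inside the region are logged and the log is discarded once the region commits, so recovery never observes a partially executed region.

# Conceptual model of failure-atomic synchronization-free regions (SFRs) via
# undo logging. This toy Python model only illustrates the recovery guarantee.
import threading

pm = {"head": 0, "data": []}       # stand-in for persistent memory state
undo_log = []                      # old values, logged before in-place updates
lock = threading.Lock()            # synchronization ops delimit SFRs

def pm_write(key, value):
    undo_log.append((key, pm[key]))    # log the old value first
    pm[key] = value

def recover():
    # After a crash, roll back any uncommitted SFR so that recovery code
    # observes state as of the last synchronization boundary.
    for key, old in reversed(undo_log):
        pm[key] = old
    undo_log.clear()

with lock:                          # lock acquire: SFR begins
    pm_write("data", pm["data"] + [42])
    pm_write("head", pm["head"] + 1)
    undo_log.clear()                # SFR commits at the synchronization boundary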
@InProceedings{PLDI18p55,
author = {Vaibhav Gogte and Stephan Diestelhorst and William Wang and Satish Narayanasamy and Peter M. Chen and Thomas F. Wenisch},
title = {Persistency for Synchronization-Free Regions},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {55--70},
doi = {},
year = {2018},
}
Write-Rationing Garbage Collection for Hybrid Memories
Shoaib Akram, Jennifer B. Sartor,
Kathryn S. McKinley, and
Lieven Eeckhout
(Ghent University, Belgium; Vrije Universiteit Brussel, Belgium; Google, USA)
Emerging Non-Volatile Memory (NVM) technologies offer high capacity and energy efficiency compared to DRAM, but suffer from limited write endurance and longer latencies. Prior work seeks the best of both technologies by combining DRAM and NVM in hybrid memories to attain low latency, high capacity, energy efficiency, and durability. Coarse-grained hardware and OS optimizations then spread writes out (wear-leveling) and place highly mutated pages in DRAM to extend NVM lifetimes. Unfortunately even with these coarse-grained methods, popular Java applications exact impractical NVM lifetimes of 4 years or less.
This paper shows how to make hybrid memories practical, without changing the programming model, by enhancing garbage collection in managed language runtimes. We find object write behaviors offer two opportunities: (1) 70% of writes occur to newly allocated objects, and (2) 2% of objects capture 81% of writes to mature objects. We introduce write-rationing garbage collectors that exploit these fine-grained behaviors. They extend NVM lifetimes by placing highly mutated objects in DRAM and read-mostly objects in NVM. We implement two such systems. (1) Kingsguard-nursery places new allocation in DRAM and survivors in NVM, reducing NVM writes by 5× versus NVM only with wear-leveling. (2) Kingsguard-writers (KG-W) places nursery objects in DRAM and survivors in a DRAM observer space. It monitors all mature object writes and moves unwritten mature objects from DRAM to NVM. Because most mature objects are unwritten, KG-W exploits NVM capacity while increasing NVM lifetimes by 11×. It reduces the energy-delay product by 32% over DRAM-only and 29% over NVM-only. This work opens up new avenues for making hybrid memories practical.
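A minimal sketch of the placement policy in the Kingsguard-writers spirit, with toy names and no real collector mechanics: allocate new objects in DRAM, keep survivors in a DRAM observer space, and migrate only objects that remain unwritten to NVM.

# Illustrative placement policy for a write-rationing collector: highly mutated
# objects stay in DRAM, read-mostly survivors migrate to NVM. This is a toy
# model, not the paper's managed-runtime implementation.
class Obj:
    def __init__(self, oid):
        self.oid = oid
        self.space = "dram_nursery"      # all new allocation goes to DRAM
        self.writes_since_survival = 0

def on_write(obj):
    obj.writes_since_survival += 1

def on_nursery_collection(survivors):
    for obj in survivors:
        obj.space = "dram_observer"      # monitor mature-object writes in DRAM
        obj.writes_since_survival = 0

def on_observer_collection(observed):
    for obj in observed:
        # Unwritten mature objects move to NVM; written ones stay in DRAM.
        obj.space = "nvm" if obj.writes_since_survival == 0 else "dram_mature"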
@InProceedings{PLDI18p71,
author = {Shoaib Akram and Jennifer B. Sartor and Kathryn S. McKinley and Lieven Eeckhout},
title = {Write-Rationing Garbage Collection for Hybrid Memories},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {71--86},
doi = {},
year = {2018},
}
Mapping Spiking Neural Networks onto a Manycore Neuromorphic Architecture
Chit-Kwan Lin, Andreas Wild, Gautham N. Chinya, Tsung-Han Lin, Mike Davies, and Hong Wang
(Intel Labs, USA)
We present a compiler for Loihi, a novel manycore neuromorphic processor that features a programmable, on-chip learning engine for training and executing spiking neural networks (SNNs). An SNN is distinguished from other neural networks in that (1) its independent computing units, or "neurons", communicate with others only through spike messages; and (2) each neuron evaluates local learning rules, which are functions of spike arrival and departure timings, to modify its local state. The collective neuronal state dynamics of an SNN form a nonlinear dynamical system that can be cast as an unconventional model of computation. To realize such an SNN on Loihi requires each constituent neuron to locally store and independently update its own spike timing information. However, each Loihi core has limited resources for this purpose and these must be shared by neurons assigned to the same core. In this work, we present a compiler for Loihi that maps the neurons of an SNN onto and across Loihi's cores efficiently. We show that a poor neuron-to-core mapping can incur significant energy costs and address this with a greedy algorithm that compiles SNNs onto Loihi in a power-efficient manner. In so doing, we highlight the need for further development of compilers for this new, emerging class of architectures.
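The heart of the compiler is a greedy, power-aware neuron-to-core assignment. The sketch below shows one plausible greedy strategy of that general shape; the cost model, per-core capacity, and placement order are assumptions for illustration, not Intel's actual algorithm.

# Sketch of a greedy neuron-to-core mapping that tries to keep heavily
# communicating neurons on the same core. The cost model, capacity, and neuron
# ordering here are assumptions for illustration, not the Loihi compiler itself.
def greedy_map(neurons, synapses, num_cores, capacity):
    """neurons: iterable of ids; synapses: dict (src, dst) -> spike rate."""
    core_of = {}
    load = [0] * num_cores

    def extra_traffic(n, core):
        # Estimated off-core spike traffic added by placing neuron n on `core`.
        cost = 0.0
        for (src, dst), rate in synapses.items():
            other = dst if src == n else src if dst == n else None
            if other in core_of and core_of[other] != core:
                cost += rate
        return cost

    # Place high-fanout neurons first so their neighbors can co-locate with them.
    fanout = {n: sum(r for (s, d), r in synapses.items() if s == n) for n in neurons}
    for n in sorted(neurons, key=fanout.get, reverse=True):
        candidates = [c for c in range(num_cores) if load[c] < capacity]
        best = min(candidates, key=lambda c: extra_traffic(n, c))
        core_of[n] = best
        load[best] += 1
    return core_of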
@InProceedings{PLDI18p87,
author = {Chit-Kwan Lin and Andreas Wild and Gautham N. Chinya and Tsung-Han Lin and Mike Davies and Hong Wang},
title = {Mapping Spiking Neural Networks onto a Manycore Neuromorphic Architecture},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {87--98},
doi = {},
year = {2018},
}
Concurrency and Termination
Static Serializability Analysis for Causal Consistency
Lucas Brutschy, Dimitar Dimitrov,
Peter Müller, and
Martin Vechev
(ETH Zurich, Switzerland)
Many distributed databases provide only weak consistency guarantees to reduce
synchronization overhead and remain available under network partitions.
However, this leads to behaviors not possible under stronger guarantees. Such
behaviors can easily defy programmer intuition and lead to errors that are
notoriously hard to detect.
In this paper, we propose a static analysis for detecting non-serializable
behaviors of applications running on top of causally-consistent databases. Our
technique is based on a novel, local serializability criterion and combines a
generalization of graph-based techniques from the database literature with
another, complementary analysis technique that encodes our serializability
criterion into first-order logic formulas to be checked by an SMT solver. This
analysis is more expensive yet more precise and produces concrete
counter-examples.
We implemented our methods and evaluated them on a number of applications from
two different domains: cloud-backed mobile applications and clients of a
distributed database. Our experiments demonstrate that our analysis is able to
detect harmful serializability violations while producing only a small number
of false alarms.
@InProceedings{PLDI18p99,
author = {Lucas Brutschy and Dimitar Dimitrov and Peter Müller and Martin Vechev},
title = {Static Serializability Analysis for Causal Consistency},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {99--113},
doi = {},
year = {2018},
}
Published Artifact
Artifacts Available
Artifacts Functional
CUBA: Interprocedural Context-UnBounded Analysis of Concurrent Programs
Peizun Liu and Thomas Wahl
(Northeastern University, USA)
A classical result by Ramalingam about synchronization-sensitive interprocedural program analysis implies that reachability for concurrent threads running recursive procedures is undecidable. A technique proposed by Qadeer and Rehof, to bound the number of context switches allowed between the threads, leads to an incomplete solution that is, however, believed to catch “most bugs” in practice. The question whether the technique can also prove the absence of bugs at least in some cases has remained largely open.
In this paper we introduce a broad verification methodology for resource-parameterized programs that observes how changes to the resource parameter affect the behavior of the program. Applied to the context-unbounded analysis problem (CUBA), the methodology results in partial verification techniques for procedural concurrent programs. Our solutions may not terminate, but are able to both refute and prove context-unbounded safety for concurrent recursive threads. We demonstrate the effectiveness of our method using a variety of examples, including safe programs that cannot be proved safe by earlier, context-bounded methods.
@InProceedings{PLDI18p114,
author = {Peizun Liu and Thomas Wahl},
title = {CUBA: Interprocedural Context-UnBounded Analysis of Concurrent Programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {114--128},
doi = {},
year = {2018},
}
Published Artifact

Artifacts Available
Artifacts Functional
Symbolic Reasoning for Automatic Signal Placement
Kostas Ferles, Jacob Van Geffen,
Isil Dillig, and Yannis Smaragdakis
(University of Texas at Austin, USA; University of Athens, Greece)
Explicit signaling between threads is a perennial cause of bugs in concurrent programs. While there are several run-time techniques to automatically notify threads upon the availability of some shared resource, such techniques are not widely-adopted due to their run-time overhead. This paper proposes a new solution based on static analysis for automatically generating a performant explicit-signal program from its corresponding implicit-signal implementation. The key idea is to generate verification conditions that allow us to minimize the number of required signals and unnecessary context switches, while guaranteeing semantic equivalence between the source and target programs. We have implemented our method in a tool called Expresso and evaluate it on challenging benchmarks from prior papers and open-source software. Expresso-generated code significantly outperforms past automatic signaling mechanisms (avg. 1.56x speedup) and closely matches the performance of hand-optimized explicit-signal code.
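To make the implicit-signal versus explicit-signal distinction concrete, the sketch below contrasts the two monitor styles for a bounded buffer using Python condition variables; Expresso itself targets Java-style monitors, so this only illustrates the programming-model gap it closes, not its analysis.

# Implicit vs. explicit signaling, illustrated with Python condition variables.
import threading
from collections import deque

class ExplicitSignalBuffer:
    """Hand-written explicit-signal style: the programmer decides when to notify."""
    def __init__(self, cap):
        self.cap, self.items = cap, deque()
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def put(self, x):
        with self.lock:
            while len(self.items) == self.cap:
                self.not_full.wait()
            self.items.append(x)
            self.not_empty.notify()      # explicit signal: easy to misplace or forget

    def get(self):
        with self.lock:
            while not self.items:
                self.not_empty.wait()
            x = self.items.popleft()
            self.not_full.notify()       # explicit signal
            return x

# Implicit-signal style: the programmer only states the predicate to wait on
# ("wait until not full"); a runtime (or, with Expresso, a static analysis)
# works out where signals are needed. Sketched here as a helper:
def wait_until(cond, pred):
    with cond:
        cond.wait_for(pred)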
@InProceedings{PLDI18p129,
author = {Kostas Ferles and Jacob Van Geffen and Isil Dillig and Yannis Smaragdakis},
title = {Symbolic Reasoning for Automatic Signal Placement},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {129--143},
doi = {},
year = {2018},
}
Advanced Automata-Based Algorithms for Program Termination Checking
Yu-Fang Chen, Matthias Heizmann,
Ondřej Lengál, Yong Li,
Ming-Hsien Tsai,
Andrea Turrini, and
Lijun Zhang
(Academia Sinica, Taiwan; National Taipei University, Taiwan; University of Freiburg, Germany; Brno University of Technology, Czechia; Institute of Software at Chinese Academy of Sciences, China)
In 2014, Heizmann et al. proposed a novel framework for program termination analysis. The analysis starts with a termination proof of a sample path. The path is generalized to a Büchi automaton (BA) whose language (by construction) represents a set of terminating paths. All these paths can be safely removed from the program. The removal of paths is done using automata difference, implemented via BA complementation and intersection. The analysis constructs in this way a set of BAs that jointly "cover" the behavior of the program, thus proving its termination. An implementation of the approach in Ultimate Automizer won the 1st place in the Termination category of SV-COMP 2017.
In this paper, we exploit advanced automata-based algorithms and propose several non-trivial improvements of the framework. To alleviate the complementation computation for BAs---one of the most expensive operations in the framework---we propose a multi-stage generalization construction. We start with generalizations producing subclasses of BAs (such as deterministic BAs) for which efficient complementation algorithms are known, and proceed to more general classes only if necessary. Particularly, we focus on the quite expressive subclass of semideterministic BAs and provide an improved complementation algorithm for this class. Our experimental evaluation shows that the proposed approach significantly improves the power of termination checking within the Ultimate Automizer framework.
@InProceedings{PLDI18p144,
author = {Yu-Fang Chen and Matthias Heizmann and Ondřej Lengál and Yong Li and Ming-Hsien Tsai and Andrea Turrini and Lijun Zhang},
title = {Advanced Automata-Based Algorithms for Program Termination Checking},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {144--159},
doi = {},
year = {2018},
}
Dynamic Techniques
HHVM JIT: A Profile-Guided, Region-Based Compiler for PHP and Hack
Guilherme Ottoni
(Facebook, USA)
Dynamic languages such as PHP, JavaScript, Python, and Ruby have been
gaining popularity over the last two decades. A very popular domain
for these languages is web development, including server-side
development of large-scale websites. As a result, improving the
performance of these languages has become more important. Efficiently
compiling programs in these languages is challenging, and many popular
dynamic languages still lack efficient production-quality
implementations. This paper describes the design of the second
generation of the HHVM JIT and how it addresses the challenges to
efficiently execute PHP and Hack programs. This new design uses
profiling to build an aggressive region-based JIT compiler. We
discuss the benefits of this approach compared to the more popular
method-based and trace-based approaches to compile dynamic languages.
Our evaluation running a very large PHP-based code base, the Facebook
website, demonstrates the effectiveness of the new JIT design.
@InProceedings{PLDI18p160,
author = {Guilherme Ottoni},
title = {HHVM JIT: A Profile-Guided, Region-Based Compiler for PHP and Hack},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {160--174},
doi = {},
year = {2018},
}
On-Stack Replacement, Distilled
Daniele Cono D'Elia and Camil Demetrescu
(Sapienza University of Rome, Italy)
On-stack replacement (OSR) is essential technology for adaptive optimization, allowing changes to code actively executing in a managed runtime. The engineering aspects of OSR are well-known among VM architects, with several implementations available to date. However, OSR is yet to be explored as a general means to transfer execution between related program versions, which can pave the road to unprecedented applications that stretch beyond VMs. We aim at filling this gap with a constructive and provably correct OSR framework, allowing a class of general-purpose transformation functions to yield a special-purpose replacement. We describe and evaluate an implementation of our technique in LLVM. As a novel application of OSR, we present a feasibility study on debugging of optimized code, showing how our techniques can be used to fix variables holding incorrect values at breakpoints due to optimizations.
@InProceedings{PLDI18p175,
author = {Daniele Cono D'Elia and Camil Demetrescu},
title = {On-Stack Replacement, Distilled},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {175--189},
doi = {},
year = {2018},
}
EffectiveSan: Type and Memory Error Detection using Dynamically Typed C/C++
Gregory J. Duck and Roland H. C. Yap
(National University of Singapore, Singapore)
Low-level programming languages with weak/static type systems, such as C and C++, are vulnerable to errors relating to the misuse of memory at runtime, such as (sub-)object bounds overflows, (re)use-after-free, and type confusion. Such errors account for many security and other undefined behavior bugs for programs written in these languages. In this paper, we introduce the notion of dynamically typed C/C++, which aims to detect such errors by dynamically checking the "effective type" of each object before use at runtime. We also present an implementation of dynamically typed C/C++ in the form of the Effective Type Sanitizer (EffectiveSan). EffectiveSan enforces type and memory safety using a combination of low-fat pointers, type meta data and type/bounds check instrumentation. We evaluate EffectiveSan against the SPEC2006 benchmark suite and the Firefox web browser, and detect several new type and memory errors. We also show that EffectiveSan achieves high compatibility and reasonable overheads for the given error coverage. Finally, we highlight that EffectiveSan is one of only a few tools that can detect sub-object bounds errors, and uses a novel approach (dynamic type checking) to do so.
@InProceedings{PLDI18p190,
author = {Gregory J. Duck and Roland H. C. Yap},
title = {EffectiveSan: Type and Memory Error Detection using Dynamically Typed C/C++},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {190--204},
doi = {},
year = {2018},
}
Calling-to-Reference Context Translation via Constraint-Guided CFL-Reachability
Cheng Cai,
Qirun Zhang,
Zhiqiang Zuo, Khanh Nguyen, Guoqing Xu, and Zhendong Su
(University of California at Irvine, USA; University of California at Davis, USA; Nanjing University, China)
A calling context is an important piece of information used widely to help developers understand program executions (e.g., for debugging). While calling contexts offer useful control information, information regarding data involved in a bug (e.g., what data structure holds a leaking object), in many cases, can bring developers closer to the bug's root cause. Such data information, often exhibited as heap reference paths, has already been needed by many tools.
The only way for a dynamic analysis to record complete reference paths is to perform heap dumping, which incurs huge runtime overhead and renders the analysis impractical. This paper presents a novel static analysis that can precisely infer, from a calling context of a method that contains a use (e.g., read or write) of an object, the heap reference paths leading to the object at the time the use occurs. Since calling context recording is much less expensive, our technique provides benefits for all dynamic techniques that need heap information, significantly reducing their overhead.
@InProceedings{PLDI18p205,
author = {Cheng Cai and Qirun Zhang and Zhiqiang Zuo and Khanh Nguyen and Guoqing Xu and Zhendong Su},
title = {Calling-to-Reference Context Translation via Constraint-Guided CFL-Reachability},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {205--219},
doi = {},
year = {2018},
}
Transactions and Races
The Semantics of Transactions and Weak Memory in x86, Power, ARM, and C++
Nathan Chong, Tyler Sorensen, and
John Wickerson
(ARM, UK; Imperial College London, UK)
Weak memory models provide a complex, system-centric semantics for concurrent programs, while transactional memory (TM) provides a simpler, programmer-centric semantics. Both have been studied in detail, but their combined semantics is not well understood. This is problematic because such widely-used architectures and languages as x86, Power, and C++ all support TM, and all have weak memory models.
Our work aims to clarify the interplay between weak memory and TM by extending existing axiomatic weak memory models (x86, Power, ARMv8, and C++) with new rules for TM. Our formal models are backed by automated tooling that enables (1) the synthesis of tests for validating our models against existing implementations and (2) the model-checking of TM-related transformations, such as lock elision and compiling C++ transactions to hardware. A key finding is that a proposed TM extension to ARMv8 currently being considered within ARM Research is incompatible with lock elision without sacrificing portability or performance.
@InProceedings{PLDI18p220,
author = {Nathan Chong and Tyler Sorensen and John Wickerson},
title = {The Semantics of Transactions and Weak Memory in x86, Power, ARM, and C++},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {220--234},
doi = {},
year = {2018},
}

Artifacts Functional
MixT: A Language for Mixing Consistency in Geodistributed Transactions
Matthew Milano and
Andrew C. Myers
(Cornell University, USA)
Programming concurrent, distributed systems is hard—especially when these systems mutate shared, persistent state replicated at geographic scale. To enable high availability and scalability, a new class of weakly consistent data stores has become popular. However, some data needs strong consistency. To manipulate both weakly and strongly consistent data in a single transaction, we introduce a new abstraction: mixed-consistency transactions, embodied in a new embedded language, MixT. Programmers explicitly associate consistency models with remote storage sites; each atomic, isolated transaction can access a mixture of data with different consistency models. Compile-time information-flow checking, applied to consistency models, ensures that these models are mixed safely and enables the compiler to automatically partition transactions. New run-time mechanisms ensure that consistency models can also be mixed safely, even when the data used by a transaction resides on separate, mutually unaware stores. Performance measurements show that despite their stronger guarantees, mixed-consistency transactions retain much of the speed of weak consistency, significantly outperforming traditional serializable transactions.
@InProceedings{PLDI18p235,
author = {Matthew Milano and Andrew C. Myers},
title = {MixT: A Language for Mixing Consistency in Geodistributed Transactions},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {235--250},
doi = {},
year = {2018},
}
Bounding Data Races in Space and Time
Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy
(University of Cambridge, UK)
We propose a new semantics for shared-memory parallel programs that
gives strong guarantees even in the presence of data races. Our
local data race freedom property guarantees that all data-race-free
portions of programs exhibit sequential semantics. We provide a
straightforward operational semantics and an equivalent axiomatic
model, and evaluate an implementation for the OCaml
programming language. Our evaluation demonstrates that it is possible
to balance a comprehensible memory model with a reasonable (no
overhead on x86, ~0.6% on ARM) sequential performance trade-off in a
mainstream programming language.
@InProceedings{PLDI18p251,
author = {Stephen Dolan and KC Sivaramakrishnan and Anil Madhavapeddy},
title = {Bounding Data Races in Space and Time},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {251--264},
doi = {},
year = {2018},
}
Artifacts Functional
Floats and Maps
Finding Root Causes of Floating Point Error
Alex Sanchez-Stern, Pavel Panchekha,
Sorin Lerner, and
Zachary Tatlock
(University of California at San Diego, USA; University of Washington, USA)
Floating-point arithmetic plays a central role in science, engineering,
and finance by enabling developers to approximate real arithmetic.
To address numerical issues in large floating-point applications,
developers must identify root causes, which is difficult because
floating-point errors are generally non-local, non-compositional,
and non-uniform.
This paper presents Herbgrind, a tool to help developers identify and
address root causes in numerical code written in low-level languages
like C/C++ and Fortran. Herbgrind dynamically tracks dependencies
between operations and program outputs to avoid false positives and
abstracts erroneous computations to simplified program fragments whose
improvement can reduce output error. We perform several case studies
applying Herbgrind to large, expert-crafted numerical programs and
show that it scales to applications spanning hundreds of thousands of
lines, correctly handling the low-level details of modern floating
point hardware and mathematical libraries and tracking error across
function boundaries and through the heap.
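Floating-point root causes are typically far from the output where the error becomes visible. A standard illustration (not taken from the paper's case studies) is the naive one-pass variance formula, which cancels catastrophically long before the symptom surfaces in a printed standard deviation.

# Classic non-local floating-point error: the root cause is the cancellation in
# E[x^2] - E[x]^2, but the symptom only appears downstream at the output.
import math

def variance_naive(xs):
    n = len(xs)
    s, s2 = sum(xs), sum(x * x for x in xs)
    return s2 / n - (s / n) ** 2           # catastrophic cancellation here

def variance_stable(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

data = [1e8 + i for i in (0.1, 0.2, 0.3, 0.4)]
print(math.sqrt(abs(variance_naive(data))))   # visibly wrong standard deviation
print(math.sqrt(variance_stable(data)))       # ~0.112, the expected answer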
@InProceedings{PLDI18p265,
author = {Alex Sanchez-Stern and Pavel Panchekha and Sorin Lerner and Zachary Tatlock},
title = {Finding Root Causes of Floating Point Error},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {265--278},
doi = {},
year = {2018},
}
Published Artifact

Artifacts Available
Artifacts Functional
Ryū: Fast Float-to-String Conversion
Ulf Adams
(Google, Germany)
We present Ryū, a new routine to convert binary floating point numbers to their decimal representations using only fixed-size integer operations, and prove its correctness. Ryū is simpler and approximately three times faster than the previously fastest implementation.
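The correctness criterion Ryū targets can be stated simply: emit the shortest decimal string that parses back to exactly the same binary float. The brute-force Python check below states that property; it bears no resemblance to Ryū's fixed-size integer algorithm.

# Brute-force statement of the float-to-string round-trip criterion: the
# shortest decimal representation that converts back to the same bits.
def shortest_roundtrip(x: float) -> str:
    for digits in range(1, 18):              # 17 significant digits always suffice
        s = f"{x:.{digits}g}"
        if float(s) == x:
            return s
    return repr(x)

for v in (0.1, 1.0 / 3.0, 2.0 ** -1074, 123456789.123456789):
    s = shortest_roundtrip(v)
    assert float(s) == v                      # round-trip property holds
    print(v, "->", s)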
@InProceedings{PLDI18p279,
author = {Ulf Adams},
title = {Ryū: Fast Float-to-String Conversion},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {279--291},
doi = {},
year = {2018},
}
Artifacts Functional
To-Many or To-One? All-in-One! Efficient Purely Functional Multi-maps with Type-Heterogeneous Hash-Tries
Michael J. Steindorfer and
Jurgen J. Vinju
(Delft University of Technology, Netherlands; CWI, Netherlands; Eindhoven University of Technology, Netherlands)
An immutable multi-map is a many-to-many map data structure with expected fast insert and lookup operations. This data structure is used for applications processing graphs or many-to-many relations as applied in compilers, runtimes of programming languages, or in static analysis of object-oriented systems. Collection data structures are assumed to carefully balance execution time of operations with memory consumption characteristics and need to scale gracefully from a few elements to multiple gigabytes at least. When processing larger in-memory data sets the overhead of the data structure encoding itself becomes a memory usage bottleneck, dominating the overall performance.
In this paper we propose AXIOM, a novel hash-trie data structure that allows for a highly efficient and type-safe multi-map encoding by distinguishing inlined values of singleton sets from nested sets of multi-mappings. AXIOM strictly generalizes over previous hash-trie data structures by supporting the processing of fine-grained type-heterogeneous content on the implementation level (while API and language support for type-heterogeneity are outside the scope of this paper). We detail the design and optimizations of AXIOM and further compare it against state-of-the-art immutable maps and multi-maps in Java, Scala and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a case study in static analysis. AXIOM reduces the key-value storage overhead by 1.87x; with specializing and inlining across collection boundaries it improves by 5.1x.
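The core encoding idea, inlining a singleton value directly and allocating a nested set only for genuine multi-mappings, can be sketched in a few lines; the toy Python dictionary below shows only that distinction, not the type-heterogeneous hash-trie the paper engineers.

# Toy model of the multi-map encoding idea: a key maps either to an inlined
# single value or to a nested set, avoiding a set allocation per singleton.
# (Assumes values are not themselves frozensets; the paper handles this
# heterogeneity properly inside its hash-trie encoding.)
class MultiMap:
    def __init__(self):
        self._m = {}                          # key -> value | frozenset of values

    def insert(self, k, v):
        cur = self._m.get(k)
        if cur is None:
            self._m[k] = v                    # singleton: value stored inline
        elif isinstance(cur, frozenset):
            self._m[k] = cur | {v}            # already a multi-mapping
        elif cur != v:
            self._m[k] = frozenset((cur, v))  # promote singleton to a nested set

    def lookup(self, k):
        cur = self._m.get(k)
        if cur is None:
            return frozenset()
        return cur if isinstance(cur, frozenset) else frozenset((cur,))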
@InProceedings{PLDI18p292,
author = {Michael J. Steindorfer and Jurgen J. Vinju},
title = {To-Many or To-One? All-in-One! Efficient Purely Functional Multi-maps with Type-Heterogeneous Hash-Tries},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {292--304},
doi = {},
year = {2018},
}
Published Artifact

Artifacts Available
Artifacts Functional
Multicore and More
Spatial: A Language and Compiler for Application Accelerators
David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel,
Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, and
Kunle Olukotun
(Stanford University, USA; EPFL, Switzerland)
Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software and hardware abstractions which make performance optimizations difficult.
In this work, we describe a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators.
We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.
@InProceedings{PLDI18p305,
author = {David Koeplinger and Matthew Feldman and Raghu Prabhakar and Yaqi Zhang and Stefan Hadjis and Ruben Fiszel and Tian Zhao and Luigi Nardi and Ardavan Pedram and Christos Kozyrakis and Kunle Olukotun},
title = {Spatial: A Language and Compiler for Application Accelerators},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {305--320},
doi = {},
year = {2018},
}
Enhancing Computation-to-Core Assignment with Physical Location Information
Orhan Kislal, Jagadish Kotra, Xulong Tang, Mahmut Taylan Kandemir, and Myoungsoo Jung
(Pennsylvania State University, USA; Yonsei University, South Korea)
Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy which involves exposing architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. The experimental data collected using a set of 21 multi-threaded applications reveal that, on an average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4% in the case of private LLCs, and 43.8% in the case of shared LLCs. These improvements translate to the corresponding execution time improvements of 10.9% and 12.7% for the private LLC and shared LLC based systems, respectively.
@InProceedings{PLDI18p321,
author = {Orhan Kislal and Jagadish Kotra and Xulong Tang and Mahmut Taylan Kandemir and Myoungsoo Jung},
title = {Enhancing Computation-to-Core Assignment with Physical Location Information},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {321--336},
doi = {},
year = {2018},
}
SWOOP: Software-Hardware Co-design for Non-speculative, Execute-Ahead, In-Order Cores
Kim-Anh Tran, Alexandra Jimborean,
Trevor E. Carlson, Konstantinos Koukos, Magnus Själander, and Stefanos Kaxiras
(Uppsala University, Sweden; National University of Singapore, Singapore; NTNU, Norway)
Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy efficiency.
In this paper, we address one of the main performance bottlenecks — last-level cache misses — through a software-hardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms. We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execution, and software prefetching. In contrast to previous approaches we do not rely on either software or hardware speculation that can be detrimental to efficiency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control-flow and indirect memory accesses.
@InProceedings{PLDI18p337,
author = {Kim-Anh Tran and Alexandra Jimborean and Trevor E. Carlson and Konstantinos Koukos and Magnus Själander and Stefanos Kaxiras},
title = {SWOOP: Software-Hardware Co-design for Non-speculative, Execute-Ahead, In-Order Cores},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {337--352},
doi = {},
year = {2018},
}
Concurrency Debugging
iReplayer: In-situ and Identical Record-and-Replay for Multithreaded Applications
Hongyu Liu,
Sam Silvestro,
Wei Wang, Chen Tian, and Tongping Liu
(University of Texas at San Antonio, USA; Huawei Lab, USA)
Reproducing executions of multithreaded programs is very challenging due to many intrinsic and external non-deterministic factors. Existing RnR systems achieve significant progress in terms of performance overhead, but none targets the in-situ setting, in which replay occurs within the same process as the recording process. Also, most existing work cannot achieve identical replay, which may prevent the reproduction of some errors.
This paper presents iReplayer, which aims to identically replay multithreaded programs in the original process (under the "in-situ" setting). The novel in-situ and identical replay of iReplayer makes it more likely to reproduce errors, and allows it to directly employ debugging mechanisms (e.g. watchpoints) to aid failure diagnosis. Currently, iReplayer only incurs 3% performance overhead on average, which allows it to be always enabled in the production environment. iReplayer enables a range of possibilities, and this paper presents three examples: two automatic tools for detecting buffer overflows and use-after-free bugs, and one interactive debugging tool that is integrated with GDB.
@InProceedings{PLDI18p353,
author = {Hongyu Liu and Sam Silvestro and Wei Wang and Chen Tian and Tongping Liu},
title = {iReplayer: In-situ and Identical Record-and-Replay for Multithreaded Applications},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {353--367},
doi = {},
year = {2018},
}
D4: Fast Concurrency Debugging with Parallel Differential Analysis
Bozhen Liu and
Jeff Huang
(Texas A&M University, USA)
We present D4, a fast concurrency analysis framework that detects concurrency bugs (e.g., data races and deadlocks) interactively in the programming phase. As developers add, modify, and remove statements, the code changes are sent to D4 to detect concurrency bugs in real time, which in turn provides the developer with immediate feedback about any new bugs. The cornerstone of D4 includes a novel system design and two novel parallel differential algorithms that embrace both change and parallelization for fundamental static analyses of concurrent programs. Both algorithms react to program changes by memoizing the analysis results and only recomputing the impact of a change in parallel. Our evaluation on an extensive collection of large real-world applications shows that D4 efficiently pinpoints concurrency bugs within 100ms on average after a code change, several orders of magnitude faster than both the exhaustive analysis and the state-of-the-art incremental techniques.
@InProceedings{PLDI18p368,
author = {Bozhen Liu and Jeff Huang},
title = {D4: Fast Concurrency Debugging with Parallel Differential Analysis},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {368--382},
doi = {},
year = {2018},
}

High-Coverage, Unbounded Sound Predictive Race Detection
Jake Roemer, Kaan Genç, and
Michael D. Bond
(Ohio State University, USA)
Dynamic program analysis can predict data races knowable from an observed execution, but existing predictive analyses either miss races or cannot analyze full program executions. This paper presents Vindicator, a novel, sound (no false races) predictive approach that finds more data races than existing predictive approaches. Vindicator achieves high coverage by using a new, efficient analysis that finds all possible predictable races but may detect false races. Vindicator ensures soundness using a novel algorithm that checks each potential race to determine whether it is a true predictable race. An evaluation using large Java programs shows that Vindicator finds hard-to-detect predictable races that existing sound predictive analyses miss, at a comparable performance cost.
@InProceedings{PLDI18p383,
author = {Jake Roemer and Kaan Genç and Michael D. Bond},
title = {High-Coverage, Unbounded Sound Predictive Race Detection},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {383--398},
doi = {},
year = {2018},
}
Artifacts Functional
CURD: A Dynamic CUDA Race Detector
Yuanfeng Peng,
Vinod Grover, and Joseph Devietti
(University of Pennsylvania, USA; NVIDIA, USA)
As GPUs have become an integral part of nearly every processor, GPU programming has become increasingly popular. GPU programming requires a combination of extreme levels of parallelism and low-level programming, making it easy for concurrency bugs such as data races to arise. These concurrency bugs can be extremely subtle and difficult to debug due to the massive numbers of threads running concurrently on a modern GPU.
While some tools exist to detect data races in GPU programs, they are often prohibitively slow or focused only on a small class of data races in shared memory. Compared to prior work, our race detector, CURD, can detect data races precisely on both shared and global memory, selects an appropriate race detection algorithm based on the synchronization used in a program, and utilizes efficient compiler instrumentation to reduce performance overheads. Across 53 benchmarks, we find that using CURD incurs an average slowdown of just 2.88x over native execution. CURD is 2.1x faster than Nvidia's CUDA-Racecheck race detector, despite detecting a much broader class of races. CURD finds 35 races across our benchmarks, including bugs in established benchmark suites and in sample programs from Nvidia.
@InProceedings{PLDI18p399,
author = {Yuanfeng Peng and Vinod Grover and Joseph Devietti},
title = {CURD: A Dynamic CUDA Race Detector},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {399--412},
doi = {},
year = {2018},
}
Synthesis and Learning
A General Path-Based Representation for Predicting Program Properties
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav
(Technion, Israel; University of Washington, USA)
Predicting program properties such as names or expression types has a wide range of applications. It can ease the task of programming, and increase programmer productivity. A major challenge when learning from programs is how to represent programs in a way that facilitates effective learning.
We present a general path-based representation for learning from programs. Our representation is purely syntactic and extracted automatically. The main idea is to represent a program using paths in its abstract syntax tree (AST). This allows a learning model to leverage the structured nature of code rather than treating it as a flat sequence of tokens.
We show that this representation is general and can: (i) cover different prediction tasks, (ii) drive different learning algorithms (for both generative and discriminative models), and (iii) work across different programming languages.
We evaluate our approach on the tasks of predicting variable names, method names, and full types. We use our representation to drive both CRF-based and word2vec-based learning, for programs of four languages: JavaScript, Java, Python and C#. Our evaluation shows that our approach obtains better results than task-specific handcrafted representations across different tasks and programming languages.
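The representation itself is easy to picture: each program is summarized by paths between pairs of AST terminals. The Python sketch below approximates the idea using Python's own ast module; it is not the authors' extraction pipeline and handles only names and constants as terminals.

# Sketch of the path-based representation: enumerate paths between pairs of AST
# terminals, recording the node kinds along the way.
import ast
from itertools import combinations

def terminals(node, path=()):
    """Yield (token, root-to-terminal path of node-kind names)."""
    path = path + (type(node).__name__,)
    if isinstance(node, (ast.Name, ast.Constant)):
        token = node.id if isinstance(node, ast.Name) else repr(node.value)
        yield token, path
        return
    for child in ast.iter_child_nodes(node):
        yield from terminals(child, path)

def leaf_pair_paths(code):
    leaves = list(terminals(ast.parse(code)))
    for (t1, p1), (t2, p2) in combinations(leaves, 2):
        i = 0                                    # split at the lowest common ancestor
        while i < min(len(p1), len(p2)) and p1[i] == p2[i]:
            i += 1
        yield t1, p1[i - 1:][::-1] + p2[i:], t2  # path up from t1, then down to t2

for t1, path, t2 in leaf_pair_paths("done = total > 0"):
    print(t1, path, t2)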
@InProceedings{PLDI18p413,
author = {Uri Alon and Meital Zilberstein and Omer Levy and Eran Yahav},
title = {A General Path-Based Representation for Predicting Program Properties},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {413--428},
doi = {},
year = {2018},
}
Program Synthesis using Conflict-Driven Learning
Yu Feng,
Ruben Martins, Osbert Bastani, and
Isil Dillig
(University of Texas at Austin, USA; Carnegie Mellon University, USA; Massachusetts Institute of Technology, USA)
We propose a new conflict-driven program synthesis technique that is capable of learning from past mistakes. Given a spurious program that violates the desired specification, our synthesis algorithm identifies the root cause of the conflict and learns new lemmas that can prevent similar mistakes in the future. Specifically, we introduce the notion of equivalence modulo conflict and show how this idea can be used to learn useful lemmas that allow the synthesizer to prune large parts of the search space. We have implemented a general-purpose CDCL-style program synthesizer called Neo and evaluate it in two different application domains, namely data wrangling in R and functional programming over lists. Our experiments demonstrate the substantial benefits of conflict-driven learning and show that Neo outperforms two state-of-the-art synthesis tools, Morpheus and Deepcoder, that target these respective domains.
@InProceedings{PLDI18p429,
author = {Yu Feng and Ruben Martins and Osbert Bastani and Isil Dillig},
title = {Program Synthesis using Conflict-Driven Learning},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {429--444},
doi = {},
year = {2018},
}
Accelerating Search-Based Program Synthesis using Learned Probabilistic Models
Woosuk Lee, Kihong Heo,
Rajeev Alur, and Mayur Naik
(University of Pennsylvania, USA)
A key challenge in program synthesis concerns how to efficiently search for the desired program in the space of possible programs. We propose a general approach to accelerate search-based program synthesis by biasing the search towards likely programs. Our approach targets a standard formulation, syntax-guided synthesis (SyGuS), by extending the grammar of possible programs with a probabilistic model dictating the likelihood of each program. We develop a weighted search algorithm to efficiently enumerate programs in order of their likelihood. We also propose a method based on transfer learning that enables effective learning of a powerful model, called a probabilistic higher-order grammar, from known solutions in a domain. We have implemented our approach in a tool called Euphony and evaluate it on SyGuS benchmark problems from a variety of domains. We show that Euphony can learn good models using easily obtainable solutions, and achieves significant performance gains over existing general-purpose as well as domain-specific synthesizers.
@InProceedings{PLDI18p445,
author = {Woosuk Lee and Kihong Heo and Rajeev Alur and Mayur Naik},
title = {Accelerating Search-Based Program Synthesis using Learned Probabilistic Models},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {445--458},
doi = {},
year = {2018},
}
Published Artifact
Artifacts Available
Artifacts Functional
Inferring Crypto API Rules from Code Changes
Rumen Paletov, Petar Tsankov, Veselin Raychev, and
Martin Vechev
(ETH Zurich, Switzerland; DeepCode, Switzerland)
Creating and maintaining an up-to-date set of security rules that match misuses of crypto APIs is challenging, as crypto APIs constantly evolve over time with new cryptographic primitives and settings, making existing ones obsolete.
To address this challenge, we present a new approach to extract security fixes from thousands of code changes. Our approach consists of: (i) identifying code changes, which often capture security fixes, (ii) an abstraction that filters irrelevant code changes (such as refactorings), and (iii) a clustering analysis that reveals commonalities between semantic code changes and helps in eliciting security rules.
We applied our approach to the Java Crypto API and showed that it is effective: (i) our abstraction effectively filters non-semantic code changes (over 99% of all changes) without removing security fixes, and (ii) over 80% of the code changes are security fixes identifying security rules. Based on our results, we identified 13 rules, including new ones not supported by existing security checkers.
@InProceedings{PLDI18p459,
author = {Rumen Paletov and Petar Tsankov and Veselin Raychev and Martin Vechev},
title = {Inferring Crypto API Rules from Code Changes},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {459--473},
doi = {},
year = {2018},
}

Programming-Student Feedback
Automated Clustering and Program Repair for Introductory Programming Assignments
Sumit Gulwani, Ivan Radiček, and Florian Zuleger
(Microsoft, USA; Vienna University of Technology, Austria)
Providing feedback on programming assignments is a tedious task for the
instructor, and even impossible in large Massive Open Online Courses with
thousands of students.
Previous research has suggested that program repair techniques can be used to
generate feedback in programming education.
In this paper, we present a novel fully automated program repair algorithm for
introductory programming assignments.
The key idea of the technique, which enables automation and scalability, is to
use the existing correct student solutions to repair the incorrect attempts.
We evaluate the approach in two experiments:
(I) We evaluate the number, size and quality of the generated repairs on 4,293
incorrect student attempts from an existing MOOC.
We find that our approach can repair 97% of student attempts,
while 81% of those are small repairs of good quality.
(II) We conduct a preliminary user study on performance and repair usefulness
in an interactive teaching setting.
We obtain promising initial results (the average usefulness grade 3.4 on a
scale from 1 to 5), and conclude that our approach can be used in an
interactive setting.
@InProceedings{PLDI18p474,
author = {Sumit Gulwani and Ivan Radiček and Florian Zuleger},
title = {Automated Clustering and Program Repair for Introductory Programming Assignments},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {474--489},
doi = {},
year = {2018},
}
Search, Align, and Repair: Data-Driven Feedback Generation for Introductory Programming Exercises
Ke Wang, Rishabh Singh, and Zhendong Su
(University of California at Davis, USA; Microsoft Research, USA)
This paper introduces the “Search, Align, and Repair” data-driven program repair framework to automate feedback generation for introductory programming exercises. Distinct from existing techniques, our goal is to develop an efficient, fully automated, and problem-agnostic technique for large or MOOC-scale introductory programming courses. We leverage the large amount of available student submissions in such settings and develop new algorithms for identifying similar programs, aligning correct and incorrect programs, and repairing incorrect programs by finding minimal fixes. We have implemented our technique in the Sarfgen system and evaluated it on thousands of real student attempts from the Microsoft-DEV204.1x edX course and the Microsoft CodeHunt platform. Our results show that Sarfgen can, within two seconds on average, generate concise, useful feedback for 89.7% of the incorrect student submissions. It has been integrated with the Microsoft-DEV204.1X edX class and deployed for production use.
@InProceedings{PLDI18p490,
author = {Ke Wang and Rishabh Singh and Zhendong Su},
title = {Search, Align, and Repair: Data-Driven Feedback Generation for Introductory Programming Exercises},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {490--504},
doi = {},
year = {2018},
}
Analyzing Probabilistic Programs
Bounded Expectations: Resource Analysis for Probabilistic Programs
Van Chan Ngo, Quentin Carbonneaux, and
Jan Hoffmann
(Carnegie Mellon University, USA; Yale University, USA)
This paper presents a new static analysis for deriving upper bounds on the expected resource consumption of probabilistic programs. The analysis is fully automatic and derives symbolic bounds that are multivariate polynomials in the inputs. The new technique combines manual state-of-the-art reasoning techniques for probabilistic programs with an effective method for automatic resource-bound analysis of deterministic programs. It can be seen as both, an extension of automatic amortized resource analysis (AARA) to probabilistic programs and an automation of manual reasoning for probabilistic programs that is based on weakest preconditions. An advantage of the technique is that it combines the clarity and compositionality of a weakest-precondition calculus with the efficient automation of AARA. As a result, bound inference can be reduced to off-the-shelf LP solving in many cases and automatically-derived bounds can be interactively extended with standard program logics if the automation fails. Building on existing work, the soundness of the analysis is proved with respect to an operational semantics that is based on Markov decision processes. The effectiveness of the technique is demonstrated with a prototype implementation that is used to automatically analyze 39 challenging probabilistic programs and randomized algorithms. Experiments indicate that the derived constant factors in the bounds are very precise and even optimal for some programs.
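A tiny worked example of what such a bound means (an assumed example, not one of the paper's benchmarks): a loop that terminates on each iteration with probability 1/2 consumes 2 iterations in expectation, so a sound symbolic bound on its expected cost must be at least 2.

# Worked example of an expected resource bound: a geometric loop with success
# probability 1/2 has expected cost E[iterations] = 1/p = 2. A static analysis
# like the paper's would derive a symbolic bound; here we only check the figure
# by simulation.
import random

def geometric_loop():
    ticks = 0
    while True:
        ticks += 1                 # one unit of "resource" per iteration
        if random.random() < 0.5:  # terminate with probability 1/2
            return ticks

samples = [geometric_loop() for _ in range(100_000)]
print(sum(samples) / len(samples))   # ~2.0, matching the expected-cost bound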
@InProceedings{PLDI18p505,
author = {Van Chan Ngo and Quentin Carbonneaux and Jan Hoffmann},
title = {Bounded Expectations: Resource Analysis for Probabilistic Programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {505--521},
doi = {},
year = {2018},
}

Artifacts Functional
PMAF: An Algebraic Framework for Static Analysis of Probabilistic Programs
Di Wang,
Jan Hoffmann, and
Thomas Reps
(Carnegie Mellon University, USA; University of Wisconsin, USA; GrammaTech, USA)
Automatically establishing that a probabilistic program satisfies some property ϕ is a challenging problem. While a sampling-based approach—which involves running the program repeatedly—can suggest that ϕ holds, to establish that the program satisfies ϕ, analysis techniques must be used. Despite recent successes, probabilistic static analyses are still more difficult to design and implement than their deterministic counterparts. This paper presents a framework, called PMAF, for designing, implementing, and proving the correctness of static analyses of probabilistic programs with challenging features such as recursion, unstructured control-flow, divergence, nondeterminism, and continuous distributions. PMAF introduces pre-Markov algebras to factor out common parts of different analyses. To perform interprocedural analysis and to create procedure summaries, PMAF extends ideas from non-probabilistic interprocedural dataflow analysis to the probabilistic setting. One novelty is that PMAF is based on a semantics formulated in terms of a control-flow hyper-graph for each procedure, rather than a standard control-flow graph. To evaluate its effectiveness, PMAF has been used to reformulate and implement existing intraprocedural analyses for Bayesian-inference and the Markov decision problem, by creating corresponding interprocedural analyses. Additionally, PMAF has been used to implement a new interprocedural linear expectation-invariant analysis. Experiments with benchmark programs for the three analyses demonstrate that the approach is practical.
@InProceedings{PLDI18p522,
author = {Di Wang and Jan Hoffmann and Thomas Reps},
title = {PMAF: An Algebraic Framework for Static Analysis of Probabilistic Programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {522--537},
doi = {},
year = {2018},
}
Published Artifact
Artifacts Available
Artifacts Functional
Optimization and Locality
Polyhedral Auto-transformation with No Integer Linear Programming
Aravind Acharya,
Uday Bondhugula, and Albert Cohen
(Indian Institute of Science, India; Inria, France; ENS, France)
State-of-the-art algorithms used in automatic polyhedral transformation for parallelization and locality optimization typically rely on Integer Linear Programming (ILP). This poses a scalability challenge for programs with tens or hundreds of statements, and can be a concern in production compiler settings. In this work, we consider relaxing integrality in the ILP formulation of the Pluto algorithm, a popular algorithm used to find good affine transformations. We show that the rational solutions obtained from the relaxed LP formulation can easily be scaled to valid integral ones to obtain desired solutions, although with some caveats. We first present formal results connecting the solution of the relaxed LP to the original Pluto ILP. We then show that there are difficulties in realizing the above theoretical results in practice, and propose an alternate approach to overcome those while still leveraging linear programming. Our new approach obtains dramatic compile-time speedups for a range of large benchmarks. While achieving these compile-time improvements, we show that the performance of the transformed code is not sacrificed. Our approach to automatic transformation provides a mean compilation time improvement of 5.6× over state-of-the-art on relevant challenging benchmarks from the NAS PB, SPEC CPU 2006, and PolyBench suites. We also came across situations where prior frameworks failed to find a transformation in a reasonable amount of time, while our new approach did so instantaneously.
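One ingredient mentioned in the abstract, scaling a rational LP solution to an integral one, can be pictured with a tiny sketch: multiply the solution through by the least common multiple of its denominators (the paper discusses the caveats under which the scaled point remains a valid, desirable transformation). The values below are illustrative.

from fractions import Fraction
from math import lcm   # requires Python 3.9+

def scale_to_integral(rational_solution):
    denominators = [Fraction(x).denominator for x in rational_solution]
    m = lcm(*denominators)
    return [int(Fraction(x) * m) for x in rational_solution]

print(scale_to_integral([Fraction(1, 2), Fraction(2, 3), Fraction(1, 1)]))  # [3, 4, 6]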
@InProceedings{PLDI18p538,
author = {Aravind Acharya and Uday Bondhugula and Albert Cohen},
title = {Polyhedral Auto-transformation with No Integer Linear Programming},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {538--551},
doi = {},
year = {2018},
}
Info
Partial Control-Flow Linearization
Simon Moll and
Sebastian Hack
(Saarland University, Germany)
If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a SIMD program, several targets of a branch might be executed because of divergence. Especially for irregular data-parallel workloads, it is crucial to avoid if-converting non-divergent branches to increase SIMD utilization. In this paper, we present partial linearization, a simple and efficient if-conversion algorithm that overcomes several limitations of existing if-conversion techniques. In contrast to prior work, it has provable guarantees on which non-divergent branches are retained and will never duplicate code or insert additional branches. We show how our algorithm can be used in a classic loop vectorizer as well as to implement data-parallel languages such as ISPC or OpenCL. Furthermore, we implement prior vectorizer optimizations on top of partial linearization in a more general way. We evaluate the implementation of our algorithm in LLVM on a range of irregular data analytics kernels, a neutronics simulation benchmark, and NAB, a molecular dynamics benchmark from SPEC 2017, on AVX2, AVX512, and ARM Advanced SIMD machines and report speedups of up to 146% over ICC, GCC, and Clang O3.
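The contrast between if-converting a divergent branch and retaining a uniform one can be emulated with a small scalar sketch (Python lists standing in for SIMD lanes; the functions and data are illustrative, not the paper's algorithm).

def simd_divergent(xs):
    # Divergent branch after if-conversion: both sides execute for all lanes,
    # and a per-lane mask blends the results.
    mask = [x > 0 for x in xs]
    then_side = [x * 2 for x in xs]
    else_side = [-x for x in xs]
    return [t if m else e for m, t, e in zip(mask, then_side, else_side)]

def simd_uniform(xs, flag):
    # Uniform (non-divergent) branch retained by partial linearization:
    # 'flag' is identical across lanes, so only one side ever executes.
    if flag:
        return [x * 2 for x in xs]
    return [-x for x in xs]

print(simd_divergent([3, -1, 4]))         # [6, 1, 8]
print(simd_uniform([3, -1, 4], False))    # [-3, 1, -4]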
@InProceedings{PLDI18p552,
author = {Simon Moll and Sebastian Hack},
title = {Partial Control-Flow Linearization},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {552--565},
doi = {},
year = {2018},
}
Locality Analysis through Static Parallel Sampling
Dong Chen, Fangzhou Liu,
Chen Ding, and
Sreepathi Pai
(University of Rochester, USA)
Locality analysis is important since accessing memory is much slower than computing. Compile-time locality analysis can provide detailed program-level feedback for compilers or runtime systems faster than trace-based locality analysis.
In this paper, we describe a new approach to locality analysis based on static parallel sampling. A compiler analyzes loop-based code and generates sampler code which is run to measure locality.
Our approach can predict precise cache-line-granularity miss ratio curves for complex loops with non-linear array references and even branches. The precision and overhead of static sampling are evaluated using PolyBench and a bit-reversal loop. Our results show that by randomly sampling 2% of loop iterations, a compiler can construct miss ratio curves nearly identical to those from trace-based analysis. Sampling 0.5% and 1% of iterations achieves good precision and efficiency, taking on average 0.6% and 1% of the time of tracing, respectively. Our analysis can also be parallelized. The analysis may assist program optimization techniques such as tiling, program co-location, and cache hint selection, and help to analyze write locality and parallel locality.
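To see how sampled reuse distances turn into a miss ratio curve, consider the toy sketch below: for a fully associative LRU cache, an access misses at capacity C exactly when the number of distinct lines touched since its previous use is at least C (cold accesses always miss). The paper generates such samplers statically from the loop nest; this illustration simply samples a concrete, made-up trace.

import random

def reuse_distances(trace):
    last_seen, dists = {}, []
    for i, line in enumerate(trace):
        if line in last_seen:
            dists.append(len(set(trace[last_seen[line] + 1:i])))  # distinct lines in between
        else:
            dists.append(float('inf'))                            # cold miss
        last_seen[line] = i
    return dists

def miss_ratio_curve(dists, capacities):
    return {c: sum(d >= c for d in dists) / len(dists) for c in capacities}

trace = [(i * 7) % 13 for i in range(2000)]                           # toy stream of cache lines
sampled = random.sample(reuse_distances(trace), k=len(trace) // 50)   # ~2% sample
print(miss_ratio_curve(sampled, capacities=[1, 4, 8, 16]))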
@InProceedings{PLDI18p566,
author = {Dong Chen and Fangzhou Liu and Chen Ding and Sreepathi Pai},
title = {Locality Analysis through Static Parallel Sampling},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {566--579},
doi = {},
year = {2018},
}
Published Artifact
Artifacts Available
Artifacts Functional
Inference for Probabilistic Programs
Incremental Inference for Probabilistic Programs
Marco Cusumano-Towner,
Benjamin Bichsel,
Timon Gehr,
Martin Vechev, and
Vikash K. Mansinghka
(Massachusetts Institute of Technology, USA; ETH Zurich, Switzerland)
We present a novel approach for approximate sampling in probabilistic programs based on incremental inference. The key idea is to adapt the samples for a program P into samples for a program Q, thereby avoiding the expensive sampling computation for program Q. To enable incremental inference in probabilistic programming, our work: (i) introduces the concept of a trace translator which adapts samples from P into samples of Q, (ii) phrases this translation approach in the context of sequential Monte Carlo (SMC), which gives theoretical guarantees that the adapted samples converge to the distribution induced by Q, and (iii) shows how to obtain a concrete trace translator by establishing a correspondence between the random choices of the two probabilistic programs. We implemented our approach in two different probabilistic programming systems and showed that, compared to methods that sample the program Q from scratch, incremental inference can lead to orders of magnitude increase in efficiency, depending on how closely related P and Q are.
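In the simplest degenerate case, where the trace translator is the identity and P and Q share one random choice, the adaptation step reduces to importance reweighting followed by resampling. The Python sketch below illustrates only that step, with made-up densities, and omits everything that makes the general trace-translation machinery interesting.

import random, math

def p_pdf(x):   # P: Normal(0, 1)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def q_pdf(x):   # Q: Normal(0.5, 1), a "nearby" program
    return math.exp(-(x - 0.5) ** 2 / 2) / math.sqrt(2 * math.pi)

xs = [random.gauss(0, 1) for _ in range(50_000)]     # existing samples for P
ws = [q_pdf(x) / p_pdf(x) for x in xs]               # importance weights
adapted = random.choices(xs, weights=ws, k=50_000)   # resample: approximate samples for Q
print(sum(adapted) / len(adapted))                   # close to 0.5, Q's mean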
@InProceedings{PLDI18p580,
author = {Marco Cusumano-Towner and Benjamin Bichsel and Timon Gehr and Martin Vechev and Vikash K. Mansinghka},
title = {Incremental Inference for Probabilistic Programs},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {580--594},
doi = {},
year = {2018},
}
Artifacts Functional
Bayonet: Probabilistic Inference for Networks
Timon Gehr,
Sasa Misailovic, Petar Tsankov, Laurent Vanbever, Pascal Wiesmann, and
Martin Vechev
(ETH Zurich, Switzerland; University of Illinois at Urbana-Champaign, USA)
Network operators often need to ensure that important probabilistic properties are met, such as that the probability of network congestion is below a certain threshold.
Ensuring such properties is challenging and requires both a suitable language for probabilistic networks and an automated procedure for answering probabilistic inference queries.
We present Bayonet, a novel approach that consists of: (i) a probabilistic network programming language and (ii) a system that performs probabilistic inference on Bayonet programs. The key insight behind Bayonet is to phrase the problem of probabilistic network reasoning as inference in existing probabilistic languages. As a result, Bayonet directly leverages existing probabilistic inference systems and offers a flexible and expressive interface to operators.
We present a detailed evaluation of Bayonet on common network scenarios, such as network congestion, reliability of packet delivery, and others. Our results indicate that Bayonet can express such practical scenarios and answer queries for realistic topology sizes (with up to 30 nodes).
@InProceedings{PLDI18p595,
author = {Timon Gehr and Sasa Misailovic and Petar Tsankov and Laurent Vanbever and Pascal Wiesmann and Martin Vechev},
title = {Bayonet: Probabilistic Inference for Networks},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {595--611},
doi = {},
year = {2018},
}
Published Artifact
Info
Artifacts Available
Artifacts Functional
Probabilistic Programming with Programmable Inference
Vikash K. Mansinghka, Ulrich Schaechtle, Shivam Handa, Alexey Radul, Yutian Chen, and
Martin Rinard
(Massachusetts Institute of Technology, USA; Google DeepMind, UK)
We introduce inference metaprogramming for probabilistic programming languages, including new language constructs, a formalism, and the first demonstration of effectiveness in practice. Instead of relying on rigid black-box inference algorithms hard-coded into the language implementation as in previous probabilistic programming languages, inference metaprogramming enables developers to 1) dynamically decompose inference problems into subproblems, 2) apply inference tactics to subproblems, 3) alternate between incorporating new data and performing inference over existing data, and 4) explore multiple execution traces of the probabilistic program at once. Implemented tactics include gradient-based optimization, Markov chain Monte Carlo, variational inference, and sequential Monte Carlo techniques. Inference metaprogramming enables the concise expression of probabilistic models and inference algorithms across diverse fields, such as computer vision, data science, and robotics, within a single probabilistic programming language.
@InProceedings{PLDI18p612,
author = {Vikash K. Mansinghka and Ulrich Schaechtle and Shivam Handa and Alexey Radul and Yutian Chen and Martin Rinard},
title = {Probabilistic Programming with Programmable Inference},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {612--625},
doi = {},
year = {2018},
}
Verification
VeriPhy: Verified Controller Executables from Verified Cyber-Physical System Models
Brandon Bohrer, Yong Kiam Tan, Stefan Mitsch,
Magnus O. Myreen, and André Platzer
(Carnegie Mellon University, USA; Chalmers University of Technology, Sweden)
We present VeriPhy, a verified pipeline which automatically transforms verified high-level models of safety-critical cyber-physical systems (CPSs) in differential dynamic logic (dL) to verified controller executables.
VeriPhy proves that all safety results are preserved end-to-end as it bridges abstraction gaps, including:
i) the gap between mathematical reals in physical models and machine arithmetic in the implementation,
ii) the gap between real physics and its differential-equation models, and
iii) the gap between nondeterministic controller models and machine code.
VeriPhy reduces CPS safety to the faithfulness of the physical environment, which is checked at runtime by synthesized, verified monitors.
We use three provers in this effort: KeYmaera X, HOL4, and Isabelle/HOL.
To minimize the trusted base, we cross-verify KeYmaera X in Isabelle/HOL.
We evaluate the resulting controller and monitors on commodity robotics hardware.
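The generic runtime-monitor pattern the abstract alludes to looks roughly like the unverified, illustrative Python sketch below: a controller monitor checks that the untrusted controller's decision fits the verified controller model, a plant monitor checks that observed physics stayed within the differential-equation model, and any violation triggers a fallback action that the model proves safe. All names and bounds here are invented and are not VeriPhy's synthesized monitors.

SAFE_BRAKE = -2.0          # fallback acceleration assumed safe in the model
MAX_ACCEL = 1.0
MAX_MODEL_ERROR = 0.05     # tolerated gap between predicted and observed position

def controller_monitor(accel, velocity, distance_to_obstacle):
    # e.g. accelerate only if the stopping distance still fits before the obstacle
    stopping = velocity ** 2 / (2 * abs(SAFE_BRAKE))
    return -abs(SAFE_BRAKE) <= accel <= MAX_ACCEL and stopping < distance_to_obstacle

def plant_monitor(predicted_position, observed_position):
    return abs(predicted_position - observed_position) <= MAX_MODEL_ERROR

def step(untrusted_accel, velocity, distance, predicted_pos, observed_pos):
    if controller_monitor(untrusted_accel, velocity, distance) and \
       plant_monitor(predicted_pos, observed_pos):
        return untrusted_accel     # within the verified model: apply it
    return SAFE_BRAKE              # otherwise fall back to the safe action

print(step(0.5, velocity=2.0, distance=10.0, predicted_pos=1.0, observed_pos=1.02))  # 0.5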
@InProceedings{PLDI18p626,
author = {Brandon Bohrer and Yong Kiam Tan and Stefan Mitsch and Magnus O. Myreen and André Platzer},
title = {VeriPhy: Verified Controller Executables from Verified Cyber-Physical System Models},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {626--639},
doi = {},
year = {2018},
}
Info
Artifacts Functional
Crellvm: Verified Credible Compilation for LLVM
Jeehoon Kang, Yoonseung Kim, Youngju Song, Juneyoung Lee, Sanghoon Park, Mark Dongyeon Shin,
Yonghyun Kim, Sungkeun Cho,
Joonwon Choi,
Chung-Kil Hur, and
Kwangkeun Yi
(Seoul National University, South Korea; Massachusetts Institute of Technology, USA)
Production compilers such as GCC and LLVM are large complex software systems, for which achieving a high level of reliability is hard. Although testing is an effective method for finding bugs, it alone cannot guarantee a high level of reliability. To provide a higher level of reliability, many approaches that examine compilers' internal logics have been proposed. However, none of them have been successfully applied to major optimizations of production compilers.
This paper presents Crellvm: a verified credible compilation framework for LLVM, which can be used as a systematic way of providing a high level of reliability for major optimizations in LLVM. Specifically, we augment an LLVM optimizer to generate translation results together with their correctness proofs, which can then be checked by a proof checker formally verified in Coq. As case studies, we applied our approach to two major optimizations of LLVM: register promotion mem2reg and global value numbering gvn, having found four new miscompilation bugs (two in each).
@InProceedings{PLDI18p640,
author = {Jeehoon Kang and Yoonseung Kim and Youngju Song and Juneyoung Lee and Sanghoon Park and Mark Dongyeon Shin and Yonghyun Kim and Sungkeun Cho and Joonwon Choi and Chung-Kil Hur and Kwangkeun Yi},
title = {Crellvm: Verified Credible Compilation for LLVM},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {640--654},
doi = {},
year = {2018},
}
Info
Artifacts Functional
Certified Concurrent Abstraction Layers
Ronghui Gu,
Zhong Shao, Jieung Kim, Xiongnan (Newman) Wu,
Jérémie Koenig, Vilhelm Sjöberg,
Hao Chen, David Costanzo, and Tahina Ramananandro
(Yale University, USA; Microsoft Research, USA)
Concurrent abstraction layers are ubiquitous in modern computer systems because of the pervasiveness of multithreaded programming and multicore hardware. Abstraction layers are used to hide the implementation details (e.g., fine-grained synchronization) and reduce the complex dependencies among components at different levels of abstraction. Despite their obvious importance, concurrent abstraction layers have not been treated formally. This severely limits the applicability of layer-based techniques and makes it difficult to scale verification across multiple concurrent layers.
In this paper, we present CCAL---a fully mechanized programming toolkit developed under the CertiKOS project---for specifying, composing, compiling, and linking certified concurrent abstraction layers. CCAL consists of three technical novelties: a new game-theoretical, strategy-based compositional semantic model for concurrency (and its associated program verifiers), a set of formal linking theorems for composing multithreaded and multicore concurrent layers, and a new CompCertX compiler that supports certified thread-safe compilation and linking. The CCAL toolkit is implemented in Coq and supports layered concurrent programming in both C and assembly. It has been successfully applied to build a fully certified concurrent OS kernel with fine-grained locking.
@InProceedings{PLDI18p655,
author = {Ronghui Gu and Zhong Shao and Jieung Kim and Xiongnan (Newman) Wu and Jérémie Koenig and Vilhelm Sjöberg and Hao Chen and David Costanzo and Tahina Ramananandro},
title = {Certified Concurrent Abstraction Layers},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {655--670},
doi = {},
year = {2018},
}
Artifacts Functional
Modularity for Decidability of Deductive Verification with Applications to Distributed Systems
Marcelo Taube, Giuliano Losa, Kenneth L. McMillan, Oded Padon,
Mooly Sagiv,
Sharon Shoham, James R. Wilcox, and Doug Woos
(Tel Aviv University, Israel; University of California at Los Angeles, USA; Microsoft Research, USA; University of Washington, USA)
Proof automation can substantially increase productivity in formal verification of complex systems. However, the unpredictability of automated provers in handling quantified formulas presents a major hurdle to the usability of these tools. We propose to solve this problem not by improving the provers, but by using a modular proof methodology that allows us to produce decidable verification conditions. Decidability greatly improves the predictability of proof automation, resulting in a more practical verification approach. We apply this methodology to develop verified implementations of distributed protocols, demonstrating its effectiveness.
@InProceedings{PLDI18p671,
author = {Marcelo Taube and Giuliano Losa and Kenneth L. McMillan and Oded Padon and Mooly Sagiv and Sharon Shoham and James R. Wilcox and Doug Woos},
title = {Modularity for Decidability of Deductive Verification with Applications to Distributed Systems},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {671--686},
doi = {},
year = {2018},
}
Artifacts Functional
Program Analysis
Active Learning of Points-To Specifications
Osbert Bastani,
Rahul Sharma,
Alex Aiken, and Percy Liang
(Stanford University, USA; Microsoft Research, India)
When analyzing programs, large libraries pose significant challenges to static points-to analysis. A popular solution is to have a human analyst provide points-to specifications that summarize relevant behaviors of library code, which can substantially improve precision and handle missing code such as native code. We propose Atlas, a tool that automatically infers points-to specifications. Atlas synthesizes unit tests that exercise the library code, and then infers points-to specifications based on observations from these executions. Atlas automatically infers specifications for the Java standard library, and produces better results for a client static information flow analysis on a benchmark of 46 Android apps compared to using existing handwritten specifications.
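The core inference idea can be illustrated in a few lines (here in Python, with object identity standing in for the Java heap; the functions and specification strings are invented for illustration): call the library function on fresh marker objects and observe which markers the result aliases.

class Marker:
    pass

def infer_spec(fn, arity):
    args = [[Marker()] for _ in range(arity)]   # each argument: a container holding a marker
    result = fn(*args)
    spec = []
    for i, a in enumerate(args):
        if result is a:
            spec.append(f"return aliases arg{i}")
        elif isinstance(result, list) and any(r is a[0] for r in result):
            spec.append(f"return contents alias contents of arg{i}")
    return spec or ["return is fresh"]

def my_concat(xs, ys):            # a stand-in "library method" under test
    return xs + ys

print(infer_spec(my_concat, 2))
# ['return contents alias contents of arg0', 'return contents alias contents of arg1']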
@InProceedings{PLDI18p687,
author = {Osbert Bastani and Rahul Sharma and Alex Aiken and Percy Liang},
title = {Active Learning of Points-To Specifications},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {687--701},
doi = {},
year = {2018},
}
Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code
Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and
Charles Zhang
(Hong Kong University of Science and Technology, China; Sourcebrella, China)
When dealing with millions of lines of code, we still cannot have our cake and eat it too: sparse value-flow analysis is powerful for checking source-sink problems, but existing work cannot escape the “pointer trap”: a precise points-to analysis limits scalability, while an imprecise one seriously undermines precision. We present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memoization. Such memoization enables the on-demand slicing of only the necessary inter-procedural data dependence and path feasibility queries, which are then solved by a costly SMT solver. Experiments show that Pinpoint can check programs such as MySQL (around 2 million lines of code) within 1.5 hours. The overall false positive rate is also very low (14.3%–23.6%). Pinpoint has discovered over forty real bugs in mature and extensively checked open source systems. The implementation of Pinpoint and all experimental results are freely available.
@InProceedings{PLDI18p702,
author = {Qingkai Shi and Xiao Xiao and Rongxin Wu and Jinguo Zhou and Gang Fan and Charles Zhang},
title = {Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {702--715},
doi = {},
year = {2018},
}
Info
Artifacts Functional
A Data-Driven CHC Solver
He Zhu, Stephen Magill, and
Suresh Jagannathan
(Galois, USA; Purdue University, USA)
We present a data-driven technique to solve Constrained Horn Clauses (CHCs) that encode verification conditions of programs containing unconstrained loops and recursions. Our CHC solver neither constrains the search space from which a predicate's components are inferred (e.g., by constraining the number of variables or the values of coefficients used to specify an invariant), nor fixes the shape of the predicate itself (e.g., by bounding the number and kind of logical connectives). Instead, our approach is based on a novel machine-learning-inspired tool chain that synthesizes CHC solutions in terms of arbitrary Boolean combinations of unrestricted atomic predicates. A CEGAR-based verification loop inside the solver progressively samples representative positive and negative data from recursive CHCs, which is fed to the machine learning tool chain. Our solver is implemented as an LLVM pass in the SeaHorn verification framework and has been used to successfully verify a large number of nontrivial and challenging C programs from the literature and well-known benchmark suites (e.g., SV-COMP).
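The learning step can be pictured with a toy sketch: given positive and negative state samples, build a Boolean combination of atomic predicates that separates them. The sketch below (all atoms and samples invented) takes the conjunction of every atom that holds on all positives and checks that it rejects the negatives; the real solver wraps such a learner in a CEGAR loop with an SMT-backed checker and supports arbitrary Boolean combinations.

atoms = {
    "x >= 0":      lambda s: s["x"] >= 0,
    "y >= 0":      lambda s: s["y"] >= 0,
    "x <= y":      lambda s: s["x"] <= s["y"],
    "x + y <= 10": lambda s: s["x"] + s["y"] <= 10,
}

positives = [{"x": 0, "y": 0}, {"x": 2, "y": 3}, {"x": 4, "y": 4}]   # sampled reachable states
negatives = [{"x": 5, "y": 1}, {"x": -1, "y": 2}]                    # sampled bad states

candidate = [name for name, p in atoms.items() if all(p(s) for s in positives)]
separates = all(not all(atoms[name](s) for name in candidate) for s in negatives)
print(" and ".join(candidate), "| separates samples:", separates)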
@InProceedings{PLDI18p716,
author = {He Zhu and Stephen Magill and Suresh Jagannathan},
title = {A Data-Driven CHC Solver},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {716--730},
doi = {},
year = {2018},
}
Published Artifact
Artifacts Available
Artifacts Functional
User-Guided Program Reasoning using Bayesian Inference
Mukund Raghothaman, Sulekha Kulkarni, Kihong Heo, and Mayur Naik
(University of Pennsylvania, USA)
Program analyses necessarily make approximations that often lead them to report true alarms interspersed with many false alarms. We propose a new approach to leverage user feedback to guide program analyses towards true alarms and away from false alarms. Our approach associates each alarm with a confidence value by performing Bayesian inference on a probabilistic model derived from the analysis rules. In each iteration, the user inspects the alarm with the highest confidence and labels its ground truth, and the approach recomputes the confidences of the remaining alarms given this feedback. It thereby maximizes the return on the effort by the user in inspecting each alarm. We have implemented our approach in a tool named Bingo for program analyses expressed in Datalog. Experiments with real users and two sophisticated analyses---a static datarace analysis for Java programs and a static taint analysis for Android apps---show significant improvements on a range of metrics, including false alarm rates and number of bugs found.
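A miniature version of the feedback loop can be written down directly: each alarm is real exactly when every rule instance in its derivation is correct, rule instances have prior correctness probabilities, and labelling one alarm updates the confidence of alarms that share parts of its derivation. The brute-force enumeration below only conveys the idea; Bingo performs marginal inference over the full Datalog derivation graph, and the rules, priors, and alarms here are invented.

from itertools import product

rules = {"r1": 0.9, "r2": 0.6, "r3": 0.6}                     # prior correctness of rule instances
alarms = {"A": {"r1", "r2"}, "B": {"r1", "r3"}, "C": {"r3"}}  # derivation of each alarm

def confidences(feedback):     # feedback: alarm -> True/False as labelled by the user
    total, marg = 0.0, {a: 0.0 for a in alarms}
    for bits in product([True, False], repeat=len(rules)):
        world = dict(zip(rules, bits))
        weight = 1.0
        for r, ok in world.items():
            weight *= rules[r] if ok else 1 - rules[r]
        real = {a: all(world[r] for r in deps) for a, deps in alarms.items()}
        if any(real[a] != lab for a, lab in feedback.items()):
            continue                                          # world inconsistent with the labels
        total += weight
        for a in alarms:
            marg[a] += weight * real[a]
    return {a: marg[a] / total for a in alarms}

print(confidences({}))             # prior confidences: A 0.54, B 0.54, C 0.6
print(confidences({"A": True}))    # labelling A true raises B via the shared premise r1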
@InProceedings{PLDI18p731,
author = {Mukund Raghothaman and Sulekha Kulkarni and Kihong Heo and Mayur Naik},
title = {User-Guided Program Reasoning using Bayesian Inference},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {731--744},
doi = {},
year = {2018},
}
Artifacts Functional
Parallelism
GPU Code Optimization using Abstract Kernel Emulation and Sensitivity Analysis
Changwan Hong, Aravind Sukumaran-Rajam, Jinsung Kim, Prashant Singh Rawat, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, and P. Sadayappan
(Ohio State University, USA; Pacific Northwest National Laboratory, USA; Colorado State University, USA; Grenoble Alpes University, France; Inria, France)
In this paper, we develop an approach to GPU kernel optimization by focusing on identification of bottleneck resources and determining optimization parameters that can alleviate the bottleneck. Performance modeling for GPUs is done by abstract kernel emulation along with latency/gap modeling of resources. Sensitivity analysis with respect to resource latency/gap parameters is used to predict the bottleneck resource for a given kernel's execution. The utility of the bottleneck analysis is demonstrated in two contexts: 1) Coupling the new bottleneck-driven optimization strategy with the OpenTuner auto-tuner: experimental results on all kernels from the Rodinia suite and GPU tensor contraction kernels from the NWChem computational chemistry suite demonstrate effectiveness. 2) Manual code optimization: two case studies illustrate the use of the bottleneck analysis to iteratively improve the performance of code from state-of-the-art domain-specific code generators.
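The sensitivity idea can be conveyed with a toy cost model: predict kernel time from per-resource latency/gap parameters, perturb each parameter by a small amount, and report the resource whose perturbation moves the prediction the most. The model below is a made-up stand-in for the paper's abstract kernel emulation, and all numbers are invented.

def predicted_time(work, gap):
    # toy model: execution time is limited by the most occupied resource
    return max(work[r] * gap[r] for r in work)

work = {"dram": 1.0e6, "shared_mem": 4.0e6, "alu": 8.0e6}   # operations issued per resource
gap  = {"dram": 2.0,   "shared_mem": 0.3,   "alu": 0.2}     # cycles per operation

base = predicted_time(work, gap)
sensitivity = {}
for r in gap:
    bumped = dict(gap, **{r: gap[r] * 1.10})                # perturb one resource by +10%
    sensitivity[r] = (predicted_time(work, bumped) - base) / base

print(sensitivity, "-> bottleneck:", max(sensitivity, key=sensitivity.get))   # dram here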
@InProceedings{PLDI18p745,
author = {Changwan Hong and Aravind Sukumaran-Rajam and Jinsung Kim and Prashant Singh Rawat and Sriram Krishnamoorthy and Louis-Noël Pouchet and Fabrice Rastello and P. Sadayappan},
title = {GPU Code Optimization using Abstract Kernel Emulation and Sensitivity Analysis},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {745--760},
doi = {},
year = {2018},
}
Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali
(University of Texas at Austin, USA; University of Illinois at Urbana-Champaign, USA)
This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models. The key to this approach is Gluon, a communication-optimizing substrate.
Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies.
To demonstrate Gluon’s ability to support different programming models, we interfaced Gluon with the Galois and Ligra shared-memory graph analytics systems to produce distributed-memory versions of these systems named D-Galois and D-Ligra, respectively. To demonstrate Gluon’s ability to support heterogeneous processors, we interfaced Gluon with IrGL, a state-of-the-art single-GPU system for graph analytics, to produce D-IrGL, the first multi-GPU distributed-memory graph analytics system.
Our experiments were done on CPU clusters with up to 256 hosts and roughly 70,000 threads and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ∼2.6× on the average. D-Galois and D-IrGL scale well and are faster than Gemini, the state-of-the-art distributed CPU graph analytics system, by factors of ∼3.9× and ∼4.9×, respectively, on the average.
@InProceedings{PLDI18p761,
author = {Roshan Dathathri and Gurbinder Gill and Loc Hoang and Hoang-Vu Dang and Alex Brooks and Nikoli Dryden and Marc Snir and Keshav Pingali},
title = {Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {761--777},
doi = {},
year = {2018},
}
Heartbeat Scheduling: Provable Efficiency for Nested Parallelism
Umut A. Acar,
Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski
(Carnegie Mellon University, USA; University of Strasbourg, France; Inria, France; CREST, USA)
A classic problem in parallel computing is to take a high-level parallel program written, for example, in nested-parallel style with fork-join constructs and run it efficiently on a real machine. The problem could be considered solved in theory, but not in practice, because the overheads of creating and managing parallel threads can overwhelm their benefits. Developing efficient parallel codes therefore usually requires extensive tuning and optimizations to reduce parallelism just to a point where the overheads become acceptable.
In this paper, we present a scheduling technique that delivers provably efficient results for arbitrary nested-parallel programs, without the tuning needed for controlling parallelism overheads. The basic idea behind our technique is to create threads only at a beat (which we refer to as the "heartbeat") and make sure to do useful work in between. We specify our heartbeat scheduler using an abstract-machine semantics and provide mechanized proofs that the scheduler guarantees low overheads for all nested parallel programs. We present a prototype C++ implementation and an evaluation that shows that Heartbeat competes well with manually optimized Cilk Plus codes, without requiring manual tuning.
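The heartbeat rule itself is simple enough to sketch sequentially: promote a potential fork point to a real parallel task only when at least a heartbeat's worth of work has run since the last promotion, so task-creation overhead stays a bounded fraction of useful work. The counter-based simulation below is illustrative only and does not model the paper's fork-join runtime or its mechanized bounds.

HEARTBEAT = 100          # units of work between promotions (illustrative)

def run(work_items):
    since_last, promotions, total = 0, 0, 0
    for cost in work_items:           # each item is a potential fork point
        if since_last >= HEARTBEAT:
            promotions += 1           # promote: this fork becomes a real task
            since_last = 0
        total += cost                 # ...then execute the item's work
        since_last += cost
    return promotions, total

promotions, total = run([3] * 1000)   # 1000 tiny tasks of 3 work units each
print(promotions, total, "promotions per unit of work ~", promotions / total)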
@InProceedings{PLDI18p778,
author = {Umut A. Acar and Arthur Charguéraud and Adrien Guatto and Mike Rainey and Filip Sieczkowski},
title = {Heartbeat Scheduling: Provable Efficiency for Nested Parallelism},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {778--791},
doi = {},
year = {2018},
}
Info
Types
Guarded Impredicative Polymorphism
Alejandro Serrano,
Jurriaan Hage, Dimitrios Vytiniotis, and Simon Peyton Jones
(Utrecht University, Netherlands; Microsoft Research, UK)
The design space for type systems that support impredicative instantiation is extremely complicated. One needs to strike a balance between expressiveness, simplicity for both the end programmer and the type system implementor, and how easily the system can be integrated with other advanced type system concepts. In this paper, we propose a new point in the design space, which we call guarded impredicativity. Its key idea is that impredicative instantiation in an application is allowed for type variables that occur under a type constructor. The resulting type system has a clean declarative specification, making it easy for programmers to predict what will type and what will not; it allows for a smooth integration with GHC’s OutsideIn(X) constraint-solving framework, while giving up very little in terms of expressiveness compared to systems like HMF, HML, FPH and MLF. We give a sound and complete inference algorithm, and prove a principal type property for our system.
@InProceedings{PLDI18p792,
author = {Alejandro Serrano and Jurriaan Hage and Dimitrios Vytiniotis and Simon Peyton Jones},
title = {Guarded Impredicative Polymorphism},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {792--805},
doi = {},
year = {2018},
}
Typed Closure Conversion for the Calculus of Constructions
William J. Bowman and
Amal Ahmed
(Northeastern University, USA)
Dependently typed languages such as Coq are used to specify and verify the full functional correctness of source programs. Type-preserving compilation can be used to preserve these specifications and proofs of correctness through compilation into the generated target-language programs. Unfortunately, type-preserving compilation of dependent types is hard. In essence, the problem is that dependent type systems are designed around high-level compositional abstractions to decide type checking, but compilation interferes with the type-system rules for reasoning about run-time terms.
We develop a type-preserving closure-conversion translation from the Calculus of Constructions (CC) with strong dependent pairs (Σ types)—a subset of the core language of Coq—to a type-safe, dependently typed compiler intermediate language named CC-CC. The central challenge in this work is how to translate the source type-system rules for reasoning about functions into target type-system rules for reasoning about closures. To justify these rules, we prove soundness of CC-CC by giving a model in CC. In addition to type preservation, we prove correctness of separate compilation.
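At the term level (ignoring the dependent types that are the paper's real subject), closure conversion rewrites a function with free variables into a pair of an explicit environment and closed code that receives that environment as an extra argument; the Python sketch below shows only that untyped shape.

# Source-level function: fun y -> x + y, with free variable x
def make_adder_source(x):
    return lambda y: x + y

# After closure conversion: closed code plus an explicit environment
def adder_code(env, y):               # no free variables; everything comes from env
    return env["x"] + y

def make_adder_closure(x):
    return {"code": adder_code, "env": {"x": x}}

def apply_closure(clo, arg):
    return clo["code"](clo["env"], arg)

print(make_adder_source(40)(2), apply_closure(make_adder_closure(40), 2))   # 42 42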
@InProceedings{PLDI18p806,
author = {William J. Bowman and Amal Ahmed},
title = {Typed Closure Conversion for the Calculus of Constructions},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {806--820},
doi = {},
year = {2018},
}
Info
Inferring Type Rules for Syntactic Sugar
Justin Pombrio and Shriram Krishnamurthi
(Brown University, USA)
Type systems and syntactic sugar are both valuable to programmers, but sometimes at odds. While sugar is a valuable mechanism for implementing realistic languages, the expansion process obscures program source structure. As a result, type errors can reference terms the programmers did not write (and even constructs they do not know), baffling them. The language developer must also manually construct type rules for the sugars, to give a typed account of the surface language. We address these problems by presenting a process for automatically reconstructing type rules for the surface language using rules for the core. We have implemented this theory, and show several interesting case studies.
@InProceedings{PLDI18p821,
author = {Justin Pombrio and Shriram Krishnamurthi},
title = {Inferring Type Rules for Syntactic Sugar},
booktitle = {Proc.\ PLDI},
publisher = {ACM},
pages = {821--834},
doi = {},
year = {2018},
}
Published Artifact
Info
Artifacts Available
Artifacts Functional