CC 2018 – Author Index

Abdeljallal, Fahd
CC '18: "A Session Type Provider: Compile-Time ..."
A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#
Rumyana Neykova, Raymond Hu, Nobuko Yoshida, and Fahd Abdeljallal (Imperial College London, UK) We present a library for the specification and implementation of distributed protocols in native F# (and other .NET languages) based on multiparty session types (MPST). There are two main contributions. Our library is the first practical development of MPST to support what we refer to as interaction refinements: a collection of features related to the refinement of protocols, such as message-type refinements (value constraints) and message value dependent control flow. A well-typed endpoint program using our library is guaranteed to perform only compliant session I/O actions w.r.t. the refined protocol, up to premature termination. Second, our library is developed as a session type provider, which performs on-demand compile-time protocol validation and generation of protocol-specific .NET types for users writing the distributed endpoint programs. It is implemented by extending and integrating Scribble (a toolchain for MPST) with an SMT solver into the type providers framework. The safety guarantees are achieved by a combination of static type checking of the generated types for messages and I/O operations, correctness by construction from code generation, and automated inlining of assertions. @InProceedings{CC18p128, author = {Rumyana Neykova and Raymond Hu and Nobuko Yoshida and Fahd Abdeljallal}, title = {A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {128--138}, doi = {10.1145/3178372.3179495}, year = {2018}, } Publisher's Version
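The provider discharges these refinement checks statically, via F# type providers and an SMT solver. Purely as an illustrative analogue (not the paper's API), the sketch below enforces a message-type refinement at runtime in Python; RefinedSend and the propose/accept protocol are invented for this sketch.

```python
# Hypothetical sketch: a protocol step "send label(x) where pred(x)".
# The F# library rejects violating programs at compile time; this
# Python analogue can only assert the refinement at runtime.

class RefinedSend:
    def __init__(self, label, refinement):
        self.label = label
        self.refinement = refinement   # invented predicate, e.g. x > 0

    def send(self, channel, value):
        # In the paper this obligation is discharged by an SMT solver;
        # here it is a plain runtime assertion.
        assert self.refinement(value), f"refinement violated: {self.label}"
        channel.append((self.label, value))

# Toy protocol: propose(n) where n > 0, then accept(m) where m == n.
channel = []
RefinedSend("propose", lambda n: n > 0).send(channel, 42)
RefinedSend("accept", lambda m: m == channel[-1][1]).send(channel, 42)
print(channel)   # [('propose', 42), ('accept', 42)]
```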

Adam, Justus
CC '18: "Compiling for Concise Code ..."
Compiling for Concise Code and Efficient I/O
Sebastian Ertel, Andrés Goens, Justus Adam, and Jeronimo Castrillon (TU Dresden, Germany) Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals. @InProceedings{CC18p104, author = {Sebastian Ertel and Andrés Goens and Justus Adam and Jeronimo Castrillon}, title = {Compiling for Concise Code and Efficient I/O}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {104--115}, doi = {10.1145/3178372.3179505}, year = {2018}, } Publisher's Version |
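As a rough sketch of the rewrite's effect (not Ÿauhau's expression IR or its JVM implementation): fetch and fetch_batch below are invented stand-ins for a micro-service client, and the "rewrite" is applied by hand to show independent I/O calls being merged into one batched round-trip.

```python
# Invented service API: fetch costs one round-trip per call.
def fetch(key):
    return {"a": 1, "b": 2, "c": 3}[key]

def fetch_batch(keys):
    # A real service would answer all keys in a single round-trip.
    return [fetch(k) for k in keys]

# Before the rewrite: three independent calls, three round-trips.
def before():
    return fetch("a") + fetch("b") + fetch("c")

# After: the calls are independent in the dataflow graph, so they
# can be hoisted and merged into one batched request.
def after():
    a, b, c = fetch_batch(["a", "b", "c"])
    return a + b + c

assert before() == after() == 6
```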

Balatsouras, George
CC '18: "An Efficient Data Structure ..."
An Efficient Data Structure for Must-Alias Analysis
George Kastrinis, George Balatsouras, Kostas Ferles, Nefeli Prokopaki-Kostopoulou, and Yannis Smaragdakis (University of Athens, Greece; University of Texas at Austin, USA) A must-alias (or “definite-alias”) analysis computes sets of expressions that are guaranteed to be aliased at a given program point. The analysis has been shown to have significant practical impact, and it is actively used in popular research frameworks and commercial tools. We present a custom data structure that speeds up must-alias analysis by nearly two orders of magnitude (while computing identical results). The data structure achieves efficiency by encoding multiple alias sets in a single linked structure, and compactly representing the aliasing relations of arbitrarily long expressions. We explore the data structure’s performance in both an imperative and a declarative setting and contrast it extensively with prior techniques. With our approach, must-alias analysis can be performed efficiently, over large Java benchmarks, in under half a minute, making the analysis cost acceptable for most practical uses. @InProceedings{CC18p48, author = {George Kastrinis and George Balatsouras and Kostas Ferles and Nefeli Prokopaki-Kostopoulou and Yannis Smaragdakis}, title = {An Efficient Data Structure for Must-Alias Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {48--58}, doi = {10.1145/3178372.3179519}, year = {2018}, } Publisher's Version
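A much-simplified sketch of one ingredient of the idea, that many alias classes can live in one cheap shared structure. The paper's data structure is considerably richer (it also compactly represents aliasing of arbitrarily long access-path expressions); this union-find only shows cheap merge and membership.

```python
class AliasSets:
    """Toy must-alias classes via union-find (invented, simplified)."""
    def __init__(self):
        self.parent = {}

    def find(self, e):
        self.parent.setdefault(e, e)
        while self.parent[e] != e:
            self.parent[e] = self.parent[self.parent[e]]  # path halving
            e = self.parent[e]
        return e

    def must_alias(self, e1, e2):      # record "e1 and e2 must alias"
        self.parent[self.find(e1)] = self.find(e2)

    def aliased(self, e1, e2):
        return self.find(e1) == self.find(e2)

s = AliasSets()
s.must_alias("x", "y.f")    # after executing x = y.f
s.must_alias("y.f", "z")    # after executing z = y.f
assert s.aliased("x", "z")  # transitively, x and z must alias
```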

Barany, Gergö
CC '18: "Finding Missed Compiler Optimizations ..."
Finding Missed Compiler Optimizations by Differential Testing
Gergö Barany (Inria, France) Randomized differential testing of compilers has had great success in finding compiler crashes and silent miscompilations. In this paper we investigate whether we can use similar techniques to improve the quality of the generated code: Can we compare the code generated by different compilers to find optimizations performed by one but missed by another? We have developed a set of tools for running such tests. We compile C code generated by standard random program generators and use a custom binary analysis tool to compare the output programs. Depending on the optimization of interest, the tool can be configured to compare features such as the number of total instructions, multiply or divide instructions, function calls, stack accesses, and more. A standard test case reduction tool produces minimal examples once an interesting difference has been found. We have used our tools to compare the code generated by GCC, Clang, and CompCert. We have found previously unreported missing arithmetic optimizations in all three compilers, as well as individual cases of unnecessary register spilling, missed opportunities for register coalescing, dead stores, redundant computations, and missing instruction selection patterns. @InProceedings{CC18p82, author = {Gergö Barany}, title = {Finding Missed Compiler Optimizations by Differential Testing}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {82--92}, doi = {10.1145/3178372.3179521}, year = {2018}, } Publisher's Version Info |
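A minimal sketch of such a comparison loop, assuming gcc, clang, and objdump are installed and that test.c stands in for a file emitted by a random program generator. The paper uses a purpose-built binary analysis tool; counting mnemonics in a disassembly, as below, is only a crude approximation.

```python
import re
import subprocess

def count_muls(compiler, source):
    """Compile at -O3 and count multiply instructions (approximate)."""
    subprocess.run([compiler, "-O3", "-c", source, "-o", "out.o"], check=True)
    asm = subprocess.run(["objdump", "-d", "out.o"],
                         capture_output=True, text=True, check=True).stdout
    return len(re.findall(r"\bi?mul", asm))

gcc_muls = count_muls("gcc", "test.c")
clang_muls = count_muls("clang", "test.c")
if gcc_muls != clang_muls:
    # An interesting difference: next, hand test.c to a test-case
    # reducer (e.g. C-Reduce) to obtain a minimal example.
    print(f"gcc: {gcc_muls} muls, clang: {clang_muls} muls")
```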

Binder, Walter
CC '18: "Efficient Dynamic Analysis ..."
Efficient Dynamic Analysis for Node.js
Haiyang Sun, Daniele Bonetta, Christian Humer, and Walter Binder (University of Lugano, Switzerland; Oracle Labs, USA; Oracle Labs, Switzerland) Due to its popularity, there is an urgent need for dynamic program-analysis tools for Node.js, helping developers find bugs, performance bottlenecks, and bad coding practices. Frameworks based on code-level instrumentation enable dynamic analyses close to program semantics and are more flexible than Node.js built-in profiling tools. However, existing code-level instrumentation frameworks for JavaScript suffer from enormous overheads and difficulties in instrumenting the built-in module library of Node.js. In this paper, we introduce a new dynamic analysis framework for JavaScript and Node.js called NodeProf. While offering similar flexibility as code-level instrumentation frameworks, NodeProf significantly improves analysis performance while ensuring comprehensive code coverage. NodeProf supports runtime (de)activation of analyses and incurs zero overhead when no analysis is active. NodeProf is based on dynamic instrumentation of the JavaScript runtime and leverages automatic partial evaluation to generate efficient machine code. In addition, NodeProf makes use of the language interoperability provided by the runtime and thus allows dynamic analyses to be written in Java and JavaScript with compatibility to Jalangi, a state-of-the-art code-level JavaScript instrumentation framework. Our experiments show that the peak performance of running the same dynamic analyses using NodeProf can be up to three orders of magnitude faster than Jalangi. @InProceedings{CC18p196, author = {Haiyang Sun and Daniele Bonetta and Christian Humer and Walter Binder}, title = {Efficient Dynamic Analysis for Node.js}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {196--206}, doi = {10.1145/3178372.3179527}, year = {2018}, } Publisher's Version |
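A loose conceptual analogue in Python of runtime-(de)activatable instrumentation; NodeProf actually instruments the JavaScript runtime and relies on partial evaluation for performance, none of which sys.settrace models.

```python
import sys

calls = {}

def analysis(frame, event, arg):
    # Count function entries while the analysis is active.
    if event == "call":
        name = frame.f_code.co_name
        calls[name] = calls.get(name, 0) + 1
    return analysis

def activate():
    sys.settrace(analysis)   # start observing

def deactivate():
    sys.settrace(None)       # tracing off: no per-call hook remains

def work():
    pass

activate()
work(); work()
deactivate()
work()                       # not observed
assert calls["work"] == 2
```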

Bonetta, Daniele
CC '18: "Efficient Dynamic Analysis ..."
Efficient Dynamic Analysis for Node.js
Haiyang Sun, Daniele Bonetta, Christian Humer, and Walter Binder (University of Lugano, Switzerland; Oracle Labs, USA; Oracle Labs, Switzerland) Due to its popularity, there is an urgent need for dynamic program-analysis tools for Node.js, helping developers find bugs, performance bottlenecks, and bad coding practices. Frameworks based on code-level instrumentation enable dynamic analyses close to program semantics and are more flexible than Node.js built-in profiling tools. However, existing code-level instrumentation frameworks for JavaScript suffer from enormous overheads and difficulties in instrumenting the built-in module library of Node.js. In this paper, we introduce a new dynamic analysis framework for JavaScript and Node.js called NodeProf. While offering similar flexibility as code-level instrumentation frameworks, NodeProf significantly improves analysis performance while ensuring comprehensive code coverage. NodeProf supports runtime (de)activation of analyses and incurs zero overhead when no analysis is active. NodeProf is based on dynamic instrumentation of the JavaScript runtime and leverages automatic partial evaluation to generate efficient machine code. In addition, NodeProf makes use of the language interoperability provided by the runtime and thus allows dynamic analyses to be written in Java and JavaScript with compatibility to Jalangi, a state-of-the-art code-level JavaScript instrumentation framework. Our experiments show that the peak performance of running the same dynamic analyses using NodeProf can be up to three orders of magnitude faster than Jalangi. @InProceedings{CC18p196, author = {Haiyang Sun and Daniele Bonetta and Christian Humer and Walter Binder}, title = {Efficient Dynamic Analysis for Node.js}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {196--206}, doi = {10.1145/3178372.3179527}, year = {2018}, } Publisher's Version |

Brock, Jacob
CC '18: "PAYJIT: Space-Optimal JIT ..."
PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation
Jacob Brock, Chen Ding, Xiaoran Xu, and Yan Zhang (University of Rochester, USA; Rice University, USA; Futurewei Technologies, USA) Just-in-time (JIT) compilation in dynamic programming languages can improve execution speed in code with hot sections. However, that comes at the cost of increased memory usage for the storage of compiled code and associated bookkeeping data, as well as up-front compilation time for the hot code. The current standard JIT compilation policy (used in both Android's JIT compiler and the HotSpot Java compiler) is simplistic; any method that has reached a certain hotness, as counted by invocations and loop iterations, is scheduled for compilation. This would be fine if the cost/benefit of compilation was the same for every method of a given hotness, but that is not the case. In fact, compiling a large method will likely result in less overall speedup than compiling a smaller, equally hot method. This exposes an opportunity for improvement. We propose the PAYJIT compilation policy for method-based JIT compilers, which scales compilation hotness thresholds with method sizes, and includes two-point tuning, a mechanism for determining those hotness thresholds. In this way, PAYJIT compiles more small methods, it compiles them sooner, and it nearly eliminates the compilation of large methods that contribute little to overall speedup. Among 10 popular Android apps tested across 16 experiments, the PAYJIT compilation policy decreases compilation time by a maximum of 49.2%, with an average of 18.7%; execution time by a maximum of 10.9%, with an average of 1.34% (for a geometric mean speedup of 1.46%); and code cache size by a maximum of 41.1%, with an average of 15.6%. @InProceedings{CC18p71, author = {Jacob Brock and Chen Ding and Xiaoran Xu and Yan Zhang}, title = {PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {71--81}, doi = {10.1145/3178372.3179523}, year = {2018}, } Publisher's Version |
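A sketch of a size-scaled compilation decision. PAYJIT's actual threshold function and two-point tuning are not reproduced here; the linear scaling and every constant below are invented for illustration.

```python
BASE_THRESHOLD = 10_000   # hotness required of a tiny method (invented)
SIZE_FACTOR = 50          # extra hotness per unit of method size (invented)

def should_compile(hotness, method_size):
    # Larger methods must demonstrate more hotness before compilation,
    # since compiling them costs more time and code-cache space for
    # less expected speedup.
    return hotness >= BASE_THRESHOLD + SIZE_FACTOR * method_size

assert should_compile(hotness=12_000, method_size=20)        # small and hot
assert not should_compile(hotness=12_000, method_size=200)   # large, same hotness
```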

Burke, Michael G.
CC '18: "Parallel Sparse Flow-Sensitive ..."
Parallel Sparse Flow-Sensitive Points-to Analysis
Jisheng Zhao, Michael G. Burke, and Vivek Sarkar (Rice University, USA) This paper aims to contribute to further advances in pointer (or points-to) analysis algorithms along the combined dimensions of precision, scalability, and performance. For precision, we aim to support interprocedural flow-sensitive analysis. For scalability, we aim to show that our approach scales to large applications with reasonable memory requirements. For performance, we aim to design a points-to analysis algorithm that is amenable to parallel execution. The algorithm introduced in this paper achieves all these goals. As an example, our experimental results show that our algorithm can analyze the 2.2MLOC Tizen OS framework with < 16GB of memory while delivering an average analysis rate of > 10KLOC/second. Our points-to analysis algorithm, PSEGPT, is based on the Pointer Sparse Evaluation Graph (PSEG) form, a new analysis representation that combines both points-to and heap def-use information. PSEGPT is a scalable interprocedural flow-sensitive context-insensitive points-to analysis that is amenable to efficient task-parallel implementations, even though points-to analysis is usually viewed as a challenge problem for parallelization. Our experimental results with 6 real-world applications on a 12-core machine show an average parallel speedup of 4.45× and maximum speedup of 7.35×. The evaluation also includes precision results by demonstrating that our algorithm identifies significantly more inlinable indirect calls (IICs) than SUPT and SS, two state-of-the-art SSA-based points-to analyses implemented in LLVM. @InProceedings{CC18p59, author = {Jisheng Zhao and Michael G. Burke and Vivek Sarkar}, title = {Parallel Sparse Flow-Sensitive Points-to Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {59--70}, doi = {10.1145/3178372.3179517}, year = {2018}, } Publisher's Version
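For contrast only, here is the textbook baseline that PSEGPT improves on: a tiny flow-insensitive, inclusion-based points-to worklist. The paper's analysis is flow-sensitive over the PSEG form and runs task-parallel; none of that is reflected in this sequential sketch.

```python
from collections import defaultdict

def points_to(news, copies):
    """news: (var, obj) pairs for 'p = new O'; copies: (dst, src) for 'dst = src'."""
    pts = defaultdict(set)
    for var, obj in news:
        pts[var].add(obj)
    changed = True
    while changed:          # iterate the copy constraints to a fixed point
        changed = False
        for dst, src in copies:
            before = len(pts[dst])
            pts[dst] |= pts[src]
            changed |= len(pts[dst]) != before
    return pts

pts = points_to(news=[("p", "O1"), ("q", "O2")],
                copies=[("r", "p"), ("r", "q")])
assert pts["r"] == {"O1", "O2"}
```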

Castrillon, Jeronimo
CC '18: "Compiling for Concise Code ..."
Compiling for Concise Code and Efficient I/O
Sebastian Ertel, Andrés Goens, Justus Adam, and Jeronimo Castrillon (TU Dresden, Germany) Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals. @InProceedings{CC18p104, author = {Sebastian Ertel and Andrés Goens and Justus Adam and Jeronimo Castrillon}, title = {Compiling for Concise Code and Efficient I/O}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {104--115}, doi = {10.1145/3178372.3179505}, year = {2018}, } Publisher's Version |

Cohen, Albert
CC '18: "A Polyhedral Compilation Framework ..."
A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds
Jie Zhao, Michael Kruse, and Albert Cohen (Inria, France; ENS, France) We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamic data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable, removing the loop-carried data dependences induced by termination conditions. We propose an automatic compilation approach to parallelize and optimize dynamic counted loops. Our approach relies on affine relations only, as implemented in state-of-the-art polyhedral libraries. Revisiting a state-of-the-art framework to parallelize arbitrary while loops, we introduce additional control dependences on data-dependent predicates. Our method goes beyond the state of the art in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops and avoiding the introduction of spurious loop-carried dependences. We conduct experiments on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization. @InProceedings{CC18p14, author = {Jie Zhao and Michael Kruse and Albert Cohen}, title = {A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {14--24}, doi = {10.1145/3178372.3179509}, year = {2018}, } Publisher's Version
CC '18: "Modeling the Conflicting Demands ..."
Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling
Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen (Inria, France; ENS, France; KU Leuven, Belgium; Polly Labs, Belgium; Georgia Tech, USA; ETH Zurich, Switzerland) The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and to the complex memory/computation hierarchies in modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing locality and parallelism. Acknowledging these conflicting demands on loop nest optimization, we propose an algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators. This algorithmic template orchestrates a collection of parameterizable, linear optimization problems over a polyhedral space of semantics-preserving transformations. While the overall problem is not convex, effective algorithms can be derived from this template delivering unprecedented performance portability over GPU and multicore CPU. We discuss the rationale for this algorithmic template and validate it on representative computational kernels/benchmarks. @InProceedings{CC18p3, author = {Oleksandr Zinenko and Sven Verdoolaege and Chandan Reddy and Jun Shirako and Tobias Grosser and Vivek Sarkar and Albert Cohen}, title = {Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {3--13}, doi = {10.1145/3178372.3179507}, year = {2018}, } Publisher's Version

Cole, Murray
CC '18: "Towards a Compiler Analysis ..."
Towards a Compiler Analysis for Parallel Algorithmic Skeletons
Tobias J. K. Edler von Koch, Stanislav Manilov, Christos Vasiladiotis, Murray Cole, and Björn Franke (Qualcomm Innovation Center, USA; University of Edinburgh, UK) Parallelizing compilers aim to detect data-parallel loops in sequential programs, which -- after suitable transformation -- can be safely and profitably executed in parallel. However, in the traditional model safe parallelization requires provable absence of dependences. At the same time, several well-known parallel algorithmic skeletons cannot be easily expressed in a data dependence framework due to spurious dependences, which prevent parallel execution. In this paper we argue that commutativity is a more suitable concept supporting formal characterization of parallel algorithmic skeletons. We show that existing commutativity definitions cannot be easily adapted for practical use, and develop a new concept of commutativity based on liveness, which readily integrates with existing compiler analyses. This enables us to develop formal definitions of parallel algorithmic skeletons such as task farms, MapReduce and Divide&Conquer. We show that existing informal characterizations of various parallel algorithmic skeletons are captured by our abstract formalizations. In this way we provide the urgently needed formal characterization of widely used parallel constructs allowing their immediate use in novel parallelizing compilers. @InProceedings{CC18p174, author = {Tobias J. K. Edler von Koch and Stanislav Manilov and Christos Vasiladiotis and Murray Cole and Björn Franke}, title = {Towards a Compiler Analysis for Parallel Algorithmic Skeletons}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {174--184}, doi = {10.1145/3178372.3179513}, year = {2018}, } Publisher's Version

Colin, Alexei
CC '18: "Termination Checking and Task ..."
Termination Checking and Task Decomposition for Task-Based Intermittent Programs
Alexei Colin and Brandon Lucia (Carnegie Mellon University, USA) Emerging energy-harvesting computer systems extract energy from their environment to compute, sense, and communicate with no battery or tethered power supply. Building software for energy-harvesting devices is a challenge, because they operate only intermittently as energy is available. Programs frequently reboot due to power loss, which can corrupt program state and prevent forward progress. Task-based programming models allow intermittent execution of long-running applications, but require the programmer to decompose code into tasks that will eventually complete between two power failures. Task decomposition is challenging and no tools exist to aid in task decomposition. We propose CleanCut, a tool that can check for and report non-terminating tasks in existing code, as well as automatically decompose code into efficient, terminating tasks. CleanCut is based on a statistical model for energy of paths through the program. We applied a prototype of CleanCut to four applications, including pattern-recognition, encryption, compression, and data filtering. Our experiments demonstrated the risk of non-termination in existing code and showed that CleanCut finds efficient task decompositions that execute 2.45x faster on average than manually placed boundaries. @InProceedings{CC18p116, author = {Alexei Colin and Brandon Lucia}, title = {Termination Checking and Task Decomposition for Task-Based Intermittent Programs}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {116--127}, doi = {10.1145/3178372.3179525}, year = {2018}, } Publisher's Version Info |
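A sketch of the core termination check, with invented numbers and units: a task can always complete between two power failures only if its most expensive path through the task fits within the energy the harvester can buffer.

```python
ENERGY_BUDGET = 100.0   # usable energy per charge cycle (invented units)

def worst_path_energy(path_costs):
    # path_costs: per-path energy estimates; the paper derives these
    # from a statistical model of instruction energy (values invented).
    return max(sum(path) for path in path_costs)

task_paths = [[10.0, 25.5, 12.0],   # cheap path: 47.5 units
              [10.0, 60.0, 40.5]]   # expensive path: 110.5 units
if worst_path_energy(task_paths) > ENERGY_BUDGET:
    print("possibly non-terminating task: split at a cheaper boundary")
```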

Crawford, Lewis
CC '18: "CAnDL: A Domain Specific Language ..."
CAnDL: A Domain Specific Language for Compiler Analysis
Philip Ginsbach, Lewis Crawford, and Michael F. P. O'Boyle (University of Edinburgh, UK) Optimizing compilers require sophisticated program analysis and transformations to exploit modern hardware. Implementing the appropriate analysis for a compiler optimization is a time-consuming activity. For example, in LLVM, tens of thousands of lines of code are required to detect appropriate places to apply peephole optimizations. This is a barrier to the rapid prototyping and evaluation of new optimizations. In this paper we present the Compiler Analysis Description Language (CAnDL), a domain specific language for compiler analysis. CAnDL is a constraint based language that operates over LLVM's intermediate representation. The compiler developer writes a CAnDL program, which is then compiled by the CAnDL compiler into a C++ LLVM pass. It provides a uniform manner in which to describe compiler analysis and can be applied to a range of compiler analysis problems, reducing code length and complexity. We implemented and evaluated CAnDL on a number of real world use cases: eliminating redundant operations; graphics code optimization; identifying static control flow regions. In all cases we were able to express the analysis more briefly than competing approaches. @InProceedings{CC18p151, author = {Philip Ginsbach and Lewis Crawford and Michael F. P. O'Boyle}, title = {CAnDL: A Domain Specific Language for Compiler Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {151--162}, doi = {10.1145/3178372.3179515}, year = {2018}, } Publisher's Version
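A toy taste of constraint-style analysis, on an invented three-address IR rather than LLVM IR, and in Python rather than CAnDL's actual syntax: the "analysis" finds additions of the constant zero, a trivially redundant operation.

```python
# Invented IR: (destination, opcode, operand1, operand2).
ir = [
    ("r1", "add", "x", 0),      # redundant: x + 0
    ("r2", "mul", "r1", "y"),
    ("r3", "add", "r2", 4),
]

def redundant_adds(instructions):
    # Constraint: opcode is 'add' AND some operand is the constant 0.
    return [dst for (dst, op, a, b) in instructions
            if op == "add" and 0 in (a, b)]

assert redundant_adds(ir) == ["r1"]
```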

Demange, Delphine
CC '18: "Semantic Reasoning about the ..."
Semantic Reasoning about the Sea of Nodes
Delphine Demange, Yon Fernández de Retana, and David Pichardie (Univ Rennes, France; Inria, France; CNRS, France; IRISA, France) The Sea of Nodes intermediate representation was introduced by Cliff Click in the mid 90s as an enhanced Static Single Assignment (SSA) form. It improves on the initial SSA form by relaxing the total order on instructions in basic blocks into explicit data and control dependencies. This makes programs more flexible to optimize. This graph-based representation is now used in many industrial-strength compilers, such as HotSpot or Graal. While the SSA form is now well understood from a semantic perspective -- even formally verified optimizing compilers use it in their middle-end -- very few semantic studies have been conducted about the Sea of Nodes. This paper presents a simple but rigorous formal semantics for a Sea of Nodes form. It comprises a denotational component to express data computation, and an operational component to express control flow. We then prove a fundamental, dominance-based semantic property on Sea of Nodes programs which determines the regions of the graph where the values of nodes are preserved. Finally, we apply our results to prove the semantic correctness of a redundant zero-check elimination optimization. All the necessary semantic properties have been mechanically verified in the Coq proof assistant. @InProceedings{CC18p163, author = {Delphine Demange and Yon Fernández de Retana and David Pichardie}, title = {Semantic Reasoning about the Sea of Nodes}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {163--173}, doi = {10.1145/3178372.3179503}, year = {2018}, } Publisher's Version |
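To give a microscopic feel for the representation's data component (a hypothetical encoding, far simpler than the paper's formal semantics, which also covers control nodes): nodes depend only on their value inputs, so any evaluation order respecting the dependencies computes the same result.

```python
# Invented node encoding: name -> (opcode, inputs...).
graph = {
    "c1": ("const", 2),
    "c2": ("const", 3),
    "add": ("add", "c1", "c2"),
    "mul": ("mul", "add", "c2"),
}

def eval_node(g, n, memo=None):
    memo = {} if memo is None else memo
    if n in memo:
        return memo[n]
    op, *args = g[n]
    if op == "const":
        v = args[0]
    else:
        a, b = (eval_node(g, x, memo) for x in args)
        v = a + b if op == "add" else a * b
    memo[n] = v
    return v

assert eval_node(graph, "mul") == 15   # any dependency-respecting schedule agrees
```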

Ding, Chen
CC '18: "PAYJIT: Space-Optimal JIT ..."
PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation
Jacob Brock, Chen Ding, Xiaoran Xu, and Yan Zhang (University of Rochester, USA; Rice University, USA; Futurewei Technologies, USA) Just-in-time (JIT) compilation in dynamic programming languages can improve execution speed in code with hot sections. However, that comes at the cost of increased memory usage for the storage of compiled code and associated bookkeeping data, as well as up-front compilation time for the hot code. The current standard JIT compilation policy (used in both Android's JIT compiler and the HotSpot Java compiler) is simplistic; any method that has reached a certain hotness, as counted by invocations and loop iterations, is scheduled for compilation. This would be fine if the cost/benefit of compilation was the same for every method of a given hotness, but that is not the case. In fact, compiling a large method will likely result in less overall speedup than compiling a smaller, equally hot method. This exposes an opportunity for improvement. We propose the PAYJIT compilation policy for method-based JIT compilers, which scales compilation hotness thresholds with method sizes, and includes two-point tuning, a mechanism for determining those hotness thresholds. In this way, PAYJIT compiles more small methods, it compiles them sooner, and it nearly eliminates the compilation of large methods that contribute little to overall speedup. Among 10 popular Android apps tested across 16 experiments, the PAYJIT compilation policy decreases compilation time by a maximum of 49.2%, with an average of 18.7%; execution time by a maximum of 10.9%, with an average of 1.34% (for a geometric mean speedup of 1.46%); and code cache size by a maximum of 41.1%, with an average of 15.6%. @InProceedings{CC18p71, author = {Jacob Brock and Chen Ding and Xiaoran Xu and Yan Zhang}, title = {PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {71--81}, doi = {10.1145/3178372.3179523}, year = {2018}, } Publisher's Version |

Doerfert, Johannes
CC '18: "Polyhedral Expression Propagation ..."
Polyhedral Expression Propagation
Johannes Doerfert, Shrey Sharma, and Sebastian Hack (Saarland University, Germany) Polyhedral techniques have proven to be powerful for various optimizations, from automatic parallelization to accelerator programming. At their core, these techniques compute accurate dependences among statement instances in order to apply complex program transformations. Such transformations comprise memory layout or program order modifications by optimizing memory access functions or scheduling functions. However, these approaches treat statements as opaque entities and do not consider changing the structure of the contained expressions or the memory accesses involved. In this paper we present a technique that statically propagates expressions in order to avoid communicating their result via memory. While orthogonal to other polyhedral optimizations, this transformation can be used to enable them. Applied separately, expression propagation can increase parallelism, eliminate temporary arrays, create independent computations and improve cache utilization. It is especially useful for streaming codes that involve temporary arrays and scalar variables. For multiple image processing pipelines we achieve portable speedups of up to 21.3× as well as a significant memory reduction compared to a naive parallel implementation. In 6 out of 7 cases, expression propagation outperforms a state-of-the-art polyhedral optimization especially designed for this kind of programs by a factor of up to 2.03×. @InProceedings{CC18p25, author = {Johannes Doerfert and Shrey Sharma and Sebastian Hack}, title = {Polyhedral Expression Propagation}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {25--36}, doi = {10.1145/3178372.3179529}, year = {2018}, } Publisher's Version |
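A hand-applied before/after sketch of the transformation's effect (the paper performs it statically, guided by polyhedral dependence information; this toy ignores all legality reasoning):

```python
def before(a, b):
    n = len(a)
    tmp = [a[i] + b[i] for i in range(n)]   # temporary array in memory
    return [tmp[i] * 2 for i in range(n)]   # consumer reads it back

def after(a, b):
    n = len(a)
    # The defining expression is propagated into its use, so the
    # temporary array (and the loop that filled it) disappears.
    return [(a[i] + b[i]) * 2 for i in range(n)]

assert before([1, 2], [3, 4]) == after([1, 2], [3, 4]) == [8, 12]
```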

Ertel, Sebastian
CC '18: "Compiling for Concise Code ..."
Compiling for Concise Code and Efficient I/O
Sebastian Ertel, Andrés Goens, Justus Adam, and Jeronimo Castrillon (TU Dresden, Germany) Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals. @InProceedings{CC18p104, author = {Sebastian Ertel and Andrés Goens and Justus Adam and Jeronimo Castrillon}, title = {Compiling for Concise Code and Efficient I/O}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {104--115}, doi = {10.1145/3178372.3179505}, year = {2018}, } Publisher's Version |

Ertl, M. Anton
CC '18: "Fast and Flexible Instruction ..."
Fast and Flexible Instruction Selection with Constraints
Patrick Thier, M. Anton Ertl, and Andreas Krall (Vienna University of Technology, Austria) Tree-parsing instruction selection as used in, e.g., lcc, uses dynamic costs to gain flexibility and handle situations (such as read-modify-write instructions) that do not fit into the basic tree-parsing model. The disadvantage of dynamic costs is that we can no longer turn the tree grammar into a tree automaton (as is done by burg) for fast instruction selection for JIT compilers. In this paper we introduce constraints that say whether a tree-grammar rule is applicable or not. While theoretically less powerful than dynamic costs, constraints cover the practical uses of dynamic costs; more importantly, they allow turning the tree grammar with constraints into a tree automaton (with instruction-selection-time checks), resulting in faster instruction selection than with pure instruction-selection-time dynamic programming. We integrate constraints in an instruction selector that matches DAGs with tree rules. We evaluate this concept in lcc and the CACAO JavaVM JIT compiler, and see instruction selector speedups by a factor 1.33--1.89. @InProceedings{CC18p93, author = {Patrick Thier and M. Anton Ertl and Andreas Krall}, title = {Fast and Flexible Instruction Selection with Constraints}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {93--103}, doi = {10.1145/3178372.3179501}, year = {2018}, } Publisher's Version |
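A sketch of a constraint-guarded rule, in an invented rule format: the constraint decides at instruction-selection time whether the cheap rule applies, which is what the paper compiles into tree-automaton states plus selection-time checks.

```python
def fits_imm8(value):
    return -128 <= value <= 127    # invented target restriction

rules = [
    # (pattern, constraint, cost, code to emit) -- all invented
    (("add", "reg", "const"), fits_imm8,       1, "add r, #imm8"),
    (("add", "reg", "const"), lambda c: True,  2, "ldr tmp, =imm; add r, tmp"),
]

def select(const_value):
    # Cheapest rule whose constraint holds for this operand.
    applicable = [r for r in rules if r[1](const_value)]
    return min(applicable, key=lambda r: r[2])[3]

assert select(100) == "add r, #imm8"                  # constant fits
assert select(1000) == "ldr tmp, =imm; add r, tmp"    # constraint rejects rule 1
```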

Ferles, Kostas
CC '18: "An Efficient Data Structure ..."
An Efficient Data Structure for Must-Alias Analysis
George Kastrinis, George Balatsouras, Kostas Ferles, Nefeli Prokopaki-Kostopoulou, and Yannis Smaragdakis (University of Athens, Greece; University of Texas at Austin, USA) A must-alias (or “definite-alias”) analysis computes sets of expressions that are guaranteed to be aliased at a given program point. The analysis has been shown to have significant practical impact, and it is actively used in popular research frameworks and commercial tools. We present a custom data structure that speeds up must-alias analysis by nearly two orders of magnitude (while computing identical results). The data structure achieves efficiency by encoding multiple alias sets in a single linked structure, and compactly representing the aliasing relations of arbitrarily long expressions. We explore the data structure’s performance in both an imperative and a declarative setting and contrast it extensively with prior techniques. With our approach, must-alias analysis can be performed efficiently, over large Java benchmarks, in under half a minute, making the analysis cost acceptable for most practical uses. @InProceedings{CC18p48, author = {George Kastrinis and George Balatsouras and Kostas Ferles and Nefeli Prokopaki-Kostopoulou and Yannis Smaragdakis}, title = {An Efficient Data Structure for Must-Alias Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {48--58}, doi = {10.1145/3178372.3179519}, year = {2018}, } Publisher's Version

Fernández de Retana, Yon
CC '18: "Semantic Reasoning about the ..."
Semantic Reasoning about the Sea of Nodes
Delphine Demange, Yon Fernández de Retana, and David Pichardie (Univ Rennes, France; Inria, France; CNRS, France; IRISA, France) The Sea of Nodes intermediate representation was introduced by Cliff Click in the mid 90s as an enhanced Static Single Assignment (SSA) form. It improves on the initial SSA form by relaxing the total order on instructions in basic blocks into explicit data and control dependencies. This makes programs more flexible to optimize. This graph-based representation is now used in many industrial-strength compilers, such as HotSpot or Graal. While the SSA form is now well understood from a semantic perspective -- even formally verified optimizing compilers use it in their middle-end -- very few semantic studies have been conducted about the Sea of Nodes. This paper presents a simple but rigorous formal semantics for a Sea of Nodes form. It comprises a denotational component to express data computation, and an operational component to express control flow. We then prove a fundamental, dominance-based semantic property on Sea of Nodes programs which determines the regions of the graph where the values of nodes are preserved. Finally, we apply our results to prove the semantic correctness of a redundant zero-check elimination optimization. All the necessary semantic properties have been mechanically verified in the Coq proof assistant. @InProceedings{CC18p163, author = {Delphine Demange and Yon Fernández de Retana and David Pichardie}, title = {Semantic Reasoning about the Sea of Nodes}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {163--173}, doi = {10.1145/3178372.3179503}, year = {2018}, } Publisher's Version |

Franke, Björn
CC '18: "Generalized Profile-Guided ..."
Generalized Profile-Guided Iterator Recognition
Stanislav Manilov, Christos Vasiladiotis, and Björn Franke (University of Edinburgh, UK) Iterators prescribe the traversal of data structures and determine loop termination, and many loop analyses and transformations require their exact identification. While recognition of iterators is a straightforward task for affine loops, the situation is different for loops iterating over dynamic data structures or involving control flow dependent computations to determine the next data element. In this paper we propose a compiler analysis for recognizing loop iterators for a wide class of loops. We initially develop a static analysis, which is then enhanced with profiling information to support speculative code optimizations. We have prototyped our analysis in the LLVM framework and demonstrate its capabilities using the SPEC CPU2006 benchmarks. Our approach is applicable to all loops and we show that we can recognize iterators in, on average, 88.1% of over 75,000 loops using static analysis alone, and up to 94.9% using additional profiling information. Existing techniques perform substantially worse, especially for C and C++ applications, and cover only 35-44% of the loops. Our analysis enables advanced loop optimizations such as decoupled software pipelining, commutativity analysis and source code rejuvenation for real-world applications, which escape analysis and transformation if loop iterators are not recognized accurately. @InProceedings{CC18p185, author = {Stanislav Manilov and Christos Vasiladiotis and Björn Franke}, title = {Generalized Profile-Guided Iterator Recognition}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {185--195}, doi = {10.1145/3178372.3179511}, year = {2018}, } Publisher's Version
CC '18: "Towards a Compiler Analysis ..."
Towards a Compiler Analysis for Parallel Algorithmic Skeletons
Tobias J. K. Edler von Koch, Stanislav Manilov, Christos Vasiladiotis, Murray Cole, and Björn Franke (Qualcomm Innovation Center, USA; University of Edinburgh, UK) Parallelizing compilers aim to detect data-parallel loops in sequential programs, which -- after suitable transformation -- can be safely and profitably executed in parallel. However, in the traditional model safe parallelization requires provable absence of dependences. At the same time, several well-known parallel algorithmic skeletons cannot be easily expressed in a data dependence framework due to spurious dependences, which prevent parallel execution. In this paper we argue that commutativity is a more suitable concept supporting formal characterization of parallel algorithmic skeletons. We show that existing commutativity definitions cannot be easily adapted for practical use, and develop a new concept of commutativity based on liveness, which readily integrates with existing compiler analyses. This enables us to develop formal definitions of parallel algorithmic skeletons such as task farms, MapReduce and Divide&Conquer. We show that existing informal characterizations of various parallel algorithmic skeletons are captured by our abstract formalizations. In this way we provide the urgently needed formal characterization of widely used parallel constructs allowing their immediate use in novel parallelizing compilers. @InProceedings{CC18p174, author = {Tobias J. K. Edler von Koch and Stanislav Manilov and Christos Vasiladiotis and Murray Cole and Björn Franke}, title = {Towards a Compiler Analysis for Parallel Algorithmic Skeletons}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {174--184}, doi = {10.1145/3178372.3179513}, year = {2018}, } Publisher's Version

Ginsbach, Philip
CC '18: "CAnDL: A Domain Specific Language ..."
CAnDL: A Domain Specific Language for Compiler Analysis
Philip Ginsbach, Lewis Crawford, and Michael F. P. O'Boyle (University of Edinburgh, UK) Optimizing compilers require sophisticated program analysis and transformations to exploit modern hardware. Implementing the appropriate analysis for a compiler optimization is a time-consuming activity. For example, in LLVM, tens of thousands of lines of code are required to detect appropriate places to apply peephole optimizations. This is a barrier to the rapid prototyping and evaluation of new optimizations. In this paper we present the Compiler Analysis Description Language (CAnDL), a domain specific language for compiler analysis. CAnDL is a constraint based language that operates over LLVM's intermediate representation. The compiler developer writes a CAnDL program, which is then compiled by the CAnDL compiler into a C++ LLVM pass. It provides a uniform manner in which to describe compiler analysis and can be applied to a range of compiler analysis problems, reducing code length and complexity. We implemented and evaluated CAnDL on a number of real world use cases: eliminating redundant operations; graphics code optimization; identifying static control flow regions. In all cases we were able to express the analysis more briefly than competing approaches. @InProceedings{CC18p151, author = {Philip Ginsbach and Lewis Crawford and Michael F. P. O'Boyle}, title = {CAnDL: A Domain Specific Language for Compiler Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {151--162}, doi = {10.1145/3178372.3179515}, year = {2018}, } Publisher's Version

Goens, Andrés
CC '18: "Compiling for Concise Code ..."
Compiling for Concise Code and Efficient I/O
Sebastian Ertel, Andrés Goens, Justus Adam, and Jeronimo Castrillon (TU Dresden, Germany) Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of micro-services. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals. @InProceedings{CC18p104, author = {Sebastian Ertel and Andrés Goens and Justus Adam and Jeronimo Castrillon}, title = {Compiling for Concise Code and Efficient I/O}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {104--115}, doi = {10.1145/3178372.3179505}, year = {2018}, } Publisher's Version |

Grosser, Tobias
CC '18: "Modeling the Conflicting Demands ..."
Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling
Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen (Inria, France; ENS, France; KU Leuven, Belgium; Polly Labs, Belgium; Georgia Tech, USA; ETH Zurich, Switzerland) The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and to the complex memory/computation hierarchies in modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing locality and parallelism. Acknowledging these conflicting demands on loop nest optimization, we propose an algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators. This algorithmic template orchestrates a collection of parameterizable, linear optimization problems over a polyhedral space of semantics-preserving transformations. While the overall problem is not convex, effective algorithms can be derived from this template delivering unprecedented performance portability over GPU and multicore CPU. We discuss the rationale for this algorithmic template and validate it on representative computational kernels/benchmarks. @InProceedings{CC18p3, author = {Oleksandr Zinenko and Sven Verdoolaege and Chandan Reddy and Jun Shirako and Tobias Grosser and Vivek Sarkar and Albert Cohen}, title = {Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {3--13}, doi = {10.1145/3178372.3179507}, year = {2018}, } Publisher's Version |

Hack, Sebastian
CC '18: "Polyhedral Expression Propagation ..."
Polyhedral Expression Propagation
Johannes Doerfert, Shrey Sharma, and Sebastian Hack (Saarland University, Germany) Polyhedral techniques have proven to be powerful for various optimizations, from automatic parallelization to accelerator programming. At their core, these techniques compute accurate dependences among statement instances in order to apply complex program transformations. Such transformations comprise memory layout or program order modifications by optimizing memory access functions or scheduling functions. However, these approaches treat statements as opaque entities and do not consider changing the structure of the contained expressions or the memory accesses involved. In this paper we present a technique that statically propagates expressions in order to avoid communicating their result via memory. While orthogonal to other polyhedral optimizations, this transformation can be used to enable them. Applied separately, expression propagation can increase parallelism, eliminate temporary arrays, create independent computations and improve cache utilization. It is especially useful for streaming codes that involve temporary arrays and scalar variables. For multiple image processing pipelines we achieve portable speedups of up to 21.3× as well as a significant memory reduction compared to a naive parallel implementation. In 6 out of 7 cases, expression propagation outperforms a state-of-the-art polyhedral optimization especially designed for this kind of programs by a factor of up to 2.03×. @InProceedings{CC18p25, author = {Johannes Doerfert and Shrey Sharma and Sebastian Hack}, title = {Polyhedral Expression Propagation}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {25--36}, doi = {10.1145/3178372.3179529}, year = {2018}, } Publisher's Version |

Heim, Bettina
CC '18: "Compiler and Language Design ..."
Compiler and Language Design for Quantum Computing (Keynote)
Bettina Heim (Microsoft Research, USA) Quantum computing, once merely a curious concept discussed within the field of theoretical physics, has long since become of practical interest in numerous fields and caught the attention of mainstream media. The reason for the widespread interest stems from the tremendous impact it could have on today’s technology. Quantum computing could revolutionize how we develop new materials, how we approach machine learning and optimization tasks, and provide answers to some of the most intriguing questions in physics, chemistry and biology. Despite having been conceived as early as 1981, quantum computers remained beyond the realms of possibility until recently. Building a scalable programmable universal device to this day is one of the biggest challenges of the 21st century. Such an endeavor entails a set of unique requirements for the design of classical software. In my talk I will discuss the particularities of compiling for quantum hardware compared to classical machines, and what advantages a well-designed quantum computing language can provide. I will outline the architecture of the software stack involved in translating high level mathematical concepts into machine instructions, and elaborate on the role of classical computing in this new age of quantum. @InProceedings{CC18p2, author = {Bettina Heim}, title = {Compiler and Language Design for Quantum Computing (Keynote)}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {2--2}, doi = {10.1145/3178372.3183636}, year = {2018}, } Publisher's Version

Humer, Christian
CC '18: "Efficient Dynamic Analysis ..."
Efficient Dynamic Analysis for Node.js
Haiyang Sun, Daniele Bonetta, Christian Humer, and Walter Binder (University of Lugano, Switzerland; Oracle Labs, USA; Oracle Labs, Switzerland) Due to its popularity, there is an urgent need for dynamic program-analysis tools for Node.js, helping developers find bugs, performance bottlenecks, and bad coding practices. Frameworks based on code-level instrumentation enable dynamic analyses close to program semantics and are more flexible than Node.js built-in profiling tools. However, existing code-level instrumentation frameworks for JavaScript suffer from enormous overheads and difficulties in instrumenting the built-in module library of Node.js. In this paper, we introduce a new dynamic analysis framework for JavaScript and Node.js called NodeProf. While offering similar flexibility as code-level instrumentation frameworks, NodeProf significantly improves analysis performance while ensuring comprehensive code coverage. NodeProf supports runtime (de)activation of analyses and incurs zero overhead when no analysis is active. NodeProf is based on dynamic instrumentation of the JavaScript runtime and leverages automatic partial evaluation to generate efficient machine code. In addition, NodeProf makes use of the language interoperability provided by the runtime and thus allows dynamic analyses to be written in Java and JavaScript with compatibility to Jalangi, a state-of-the-art code-level JavaScript instrumentation framework. Our experiments show that the peak performance of running the same dynamic analyses using NodeProf can be up to three orders of magnitude faster than Jalangi. @InProceedings{CC18p196, author = {Haiyang Sun and Daniele Bonetta and Christian Humer and Walter Binder}, title = {Efficient Dynamic Analysis for Node.js}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {196--206}, doi = {10.1145/3178372.3179527}, year = {2018}, } Publisher's Version |

Hu, Raymond
CC '18: "A Session Type Provider: Compile-Time ..."
A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#
Rumyana Neykova, Raymond Hu, Nobuko Yoshida, and Fahd Abdeljallal (Imperial College London, UK) We present a library for the specification and implementation of distributed protocols in native F# (and other .NET languages) based on multiparty session types (MPST). There are two main contributions. Our library is the first practical development of MPST to support what we refer to as interaction refinements: a collection of features related to the refinement of protocols, such as message-type refinements (value constraints) and message value dependent control flow. A well-typed endpoint program using our library is guaranteed to perform only compliant session I/O actions w.r.t. the refined protocol, up to premature termination. Second, our library is developed as a session type provider, which performs on-demand compile-time protocol validation and generation of protocol-specific .NET types for users writing the distributed endpoint programs. It is implemented by extending and integrating Scribble (a toolchain for MPST) with an SMT solver into the type providers framework. The safety guarantees are achieved by a combination of static type checking of the generated types for messages and I/O operations, correctness by construction from code generation, and automated inlining of assertions. @InProceedings{CC18p128, author = {Rumyana Neykova and Raymond Hu and Nobuko Yoshida and Fahd Abdeljallal}, title = {A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {128--138}, doi = {10.1145/3178372.3179495}, year = {2018}, } Publisher's Version

Kastrinis, George
CC '18: "An Efficient Data Structure ..."
An Efficient Data Structure for Must-Alias Analysis
George Kastrinis, George Balatsouras, Kostas Ferles, Nefeli Prokopaki-Kostopoulou, and Yannis Smaragdakis (University of Athens, Greece; University of Texas at Austin, USA) A must-alias (or “definite-alias”) analysis computes sets of expressions that are guaranteed to be aliased at a given program point. The analysis has been shown to have significant practical impact, and it is actively used in popular research frameworks and commercial tools. We present a custom data structure that speeds up must-alias analysis by nearly two orders of magnitude (while computing identical results). The data structure achieves efficiency by encoding multiple alias sets in a single linked structure, and compactly representing the aliasing relations of arbitrarily long expressions. We explore the data structure’s performance in both an imperative and a declarative setting and contrast it extensively with prior techniques. With our approach, must-alias analysis can be performed efficiently, over large Java benchmarks, in under half a minute, making the analysis cost acceptable for most practical uses. @InProceedings{CC18p48, author = {George Kastrinis and George Balatsouras and Kostas Ferles and Nefeli Prokopaki-Kostopoulou and Yannis Smaragdakis}, title = {An Efficient Data Structure for Must-Alias Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {48--58}, doi = {10.1145/3178372.3179519}, year = {2018}, } Publisher's Version

Khedker, Uday P.
CC '18: "Computing Partially Path-Sensitive ..."
Computing Partially Path-Sensitive MFP Solutions in Data Flow Analyses
Komal Pathade and Uday P. Khedker (Tata Consultancy Services, India; IIT Bombay, India) Data flow analysis traverses paths in a control flow graph (CFG) representation of programs to compute useful information. Many of these paths are infeasible, i.e. they cannot arise in any possible execution. The information computed along these paths adds imprecision to the conventional Maximal Fixed Point (MFP) solution of a data flow analysis. Existing approaches for removing this imprecision are either specific to a data flow problem or involve control flow graph restructuring which has exponential complexity. We introduce partial path-sensitivity to the MFP solution by identifying clusters of minimal infeasible path segments to distinguish between the data flowing along feasible and infeasible control flow paths. This allows us to lift any data flow analysis to an analysis over k+1 tuples where k is the number of clusters. Our flow function for a k+1 tuple shifts the values of the underlying analysis from an element in the tuple to other element(s) at the start and end of a cluster as appropriate. This allows us to maintain the distinctions where they are beneficial. Since k is linear in the number of conditional edges in the CFG, the effort is multiplied by a factor that is linear in the number of conditional edges (and is not exponential, unlike conventional approaches of achieving path sensitivity.) We have implemented our method of computing partially path sensitive MFP for reaching definitions analysis and value range analysis of variables. Our measurements on benchmark programs show up to 9% reduction in the number of reaching definitions and up to 14% cases where the value range of a variable is smaller. @InProceedings{CC18p37, author = {Komal Pathade and Uday P. Khedker}, title = {Computing Partially Path-Sensitive MFP Solutions in Data Flow Analyses}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {37--47}, doi = {10.1145/3178372.3179497}, year = {2018}, } Publisher's Version |
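A toy illustration of why keeping facts separate per cluster pays off, using an invented interval domain. The program sketch is: if (c) x = 1; else x = 100; followed by if (c) y = x; else y = -x; the mixed paths (c true then c false, and vice versa) are infeasible.

```python
def join(r1, r2):               # interval join (least upper bound)
    return (min(r1[0], r2[0]), max(r1[1], r2[1]))

# Conventional MFP: facts from both arms merge before the second 'if'.
x_merged = join((1, 1), (100, 100))                    # x in [1, 100]
y_mfp = join(x_merged, (-x_merged[1], -x_merged[0]))   # y in [-100, 100]

# Partially path-sensitive: one tuple element per feasible cluster.
y_c_true = (1, 1)               # c true:  y = x = 1
y_c_false = (-100, -100)        # c false: y = -x = -100
y_ps = join(y_c_true, y_c_false)

assert y_mfp == (-100, 100)
assert y_ps == (-100, 1)        # strictly tighter: infeasible paths excluded
```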
|
Krall, Andreas |
CC '18: "Fast and Flexible Instruction ..."
Fast and Flexible Instruction Selection with Constraints
Patrick Thier, M. Anton Ertl, and Andreas Krall (Vienna University of Technology, Austria) Tree-parsing instruction selection as used in, e.g., lcc, uses dynamic costs to gain flexibility and handle situations (such as read-modify-write instructions) that do not fit into the basic tree-parsing model. The disadvantage of dynamic costs is that we can no longer turn the tree grammar into a tree automaton (as is done by burg) for fast instruction selection for JIT compilers. In this paper we introduce constraints that say whether a tree-grammar rule is applicable or not. While theoretically less powerful than dynamic costs, constraints cover the practical uses of dynamic costs; more importantly, they allow turning the tree grammar with constraints into a tree automaton (with instruction-selection-time checks), resulting in faster instruction selection than with pure instruction-selection-time dynamic programming. We integrate constraints in an instruction selector that matches DAGs with tree rules. We evaluate this concept in lcc and the CACAO JavaVM JIT compiler, and see instruction selector speedups by a factor of 1.33--1.89. @InProceedings{CC18p93, author = {Patrick Thier and M. Anton Ertl and Andreas Krall}, title = {Fast and Flexible Instruction Selection with Constraints}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {93--103}, doi = {10.1145/3178372.3179501}, year = {2018}, } Publisher's Version |
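A hedged Python sketch of the core idea, with invented rule and predicate names: a rule's applicability is a cheap predicate checked at selection time, while costs stay static, so the grammar can still be compiled to a tree automaton offline:

```python
from collections import namedtuple

Rule = namedtuple("Rule", "nonterm pattern cost constraint")

def small_const(node):
    # applicable only if the constant fits an 8-bit immediate field (assumption)
    return -128 <= node["value"] <= 127

rules = [
    Rule("reg", ("const",), cost=1, constraint=small_const),  # load immediate
    Rule("reg", ("const",), cost=2, constraint=None),         # load from constant pool
]

def select(node):
    """Pick the cheapest rule whose constraint holds for this matched node."""
    applicable = [r for r in rules
                  if r.constraint is None or r.constraint(node)]
    return min(applicable, key=lambda r: r.cost)

print(select({"op": "const", "value": 42}).cost)    # 1: immediate form fits
print(select({"op": "const", "value": 4200}).cost)  # 2: falls back to pool load
```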
|
Kruse, Michael |
CC '18: "A Polyhedral Compilation Framework ..."
A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds
Jie Zhao, Michael Kruse, and Albert Cohen (Inria, France; ENS, France) We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamic data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable, removing the loop-carried data dependences induced by termination conditions. We propose an automatic compilation approach to parallelize and optimize dynamic counted loops. Our approach relies on affine relations only, as implemented in state-of-the-art polyhedral libraries. Revisiting a state-of-the-art framework to parallelize arbitrary while loops, we introduce additional control dependences on data-dependent predicates. Our method goes beyond the state of the art in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops and avoiding the introduction of spurious loop-carried dependences. We conduct experiments on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization. @InProceedings{CC18p14, author = {Jie Zhao and Michael Kruse and Albert Cohen}, title = {A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {14--24}, doi = {10.1145/3178372.3179509}, year = {2018}, } Publisher's Version |
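For readers unfamiliar with the loop class, here is a small illustrative example (Python, invented names): the inner trip count is dynamic and data-dependent, yet the loop is still counted, not a general while loop with an inductively defined exit condition:

```python
def sparse_row_sums(values, row_len):
    """Sum each row of a ragged array stored contiguously in 'values'."""
    sums, pos = [], 0
    for i in range(len(row_len)):
        s = 0.0
        # dynamic counted loop: trip count row_len[i] is known only at run time,
        # but the loop is affine in its induction variable j
        for j in range(row_len[i]):
            s += values[pos + j]
        pos += row_len[i]
        sums.append(s)
    return sums

print(sparse_row_sums([1.0, 2.0, 3.0, 4.0], [1, 3]))  # [1.0, 9.0]
```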
|
Lhoták, Ondřej |
CC '18: "Tail Call Elimination and ..."
Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine
Magnus Madsen, Ramin Zarifi, and Ondřej Lhoták (Aalborg University, Denmark; University of Waterloo, Canada) The Java Virtual Machine (JVM) offers an attractive runtime environment for programming language implementers. The JVM has a simple bytecode format, excellent performance, multiple state-of-the-art garbage collectors, robust backwards compatibility, and it runs on almost all platforms. Further, the Java ecosystem grants access to a plethora of libraries and tooling, including debuggers and profilers. Yet, the JVM was originally designed for Java, and its representation of code and data may cause difficulties for other languages. In this paper, we discuss how to efficiently implement functional languages on the JVM. We focus on two overall challenges: (a) how to efficiently represent algebraic data types in the presence of tags and tuples, option types, newtypes, and parametric polymorphism, and (b) how to support full tail call elimination on the JVM. We present two technical contributions: a fused representation of tags and tuples and a full tail call elimination strategy that is thread-safe. We implement these techniques in the Flix language and evaluate their performance. @InProceedings{CC18p139, author = {Magnus Madsen and Ramin Zarifi and Ondřej Lhoták}, title = {Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {139--150}, doi = {10.1145/3178372.3179499}, year = {2018}, } Publisher's Version |
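One classic way to obtain full tail call elimination on a platform without native support is a trampoline; the Python sketch below shows only the shape of the technique, and the paper's thread-safe JVM strategy differs in its details:

```python
def trampoline(f, *args):
    """Run a function whose tail calls are returned as thunks."""
    result = f(*args)
    while callable(result):          # a returned thunk represents a tail call
        result = result()
    return result

def is_even(n):
    return True if n == 0 else (lambda: is_odd(n - 1))

def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))

# runs in constant stack space even across 100000 mutual tail calls
print(trampoline(is_even, 100000))   # True
```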
|
Lucia, Brandon |
CC '18: "Termination Checking and Task ..."
Termination Checking and Task Decomposition for Task-Based Intermittent Programs
Alexei Colin and Brandon Lucia (Carnegie Mellon University, USA) Emerging energy-harvesting computer systems extract energy from their environment to compute, sense, and communicate with no battery or tethered power supply. Building software for energy-harvesting devices is a challenge, because they operate only intermittently as energy is available. Programs frequently reboot due to power loss, which can corrupt program state and prevent forward progress. Task-based programming models allow intermittent execution of long-running applications, but require the programmer to decompose code into tasks that will eventually complete between two power failures. Task decomposition is challenging, and no tools exist to aid it. We propose CleanCut, a tool that can check for and report non-terminating tasks in existing code, as well as automatically decompose code into efficient, terminating tasks. CleanCut is based on a statistical model for energy of paths through the program. We applied a prototype of CleanCut to four applications, including pattern-recognition, encryption, compression, and data filtering. Our experiments demonstrated the risk of non-termination in existing code and showed that CleanCut finds efficient task decompositions that execute 2.45x faster on average than manually placed boundaries. @InProceedings{CC18p116, author = {Alexei Colin and Brandon Lucia}, title = {Termination Checking and Task Decomposition for Task-Based Intermittent Programs}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {116--127}, doi = {10.1145/3178372.3179525}, year = {2018}, } Publisher's Version Info |
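The execution model CleanCut reasons about can be sketched as follows; this is a hypothetical Python model, not CleanCut's API or its statistical energy model. A task that fails mid-way is re-executed from its start, so a task whose energy cost can exceed the budget on every path never terminates, which is exactly the condition a path-energy model can detect statically:

```python
import random

ENERGY_BUDGET = 100   # abstract energy units per charge cycle (assumption)

def run_intermittent(tasks, state):
    """Replay semantics: only state committed at a task boundary survives
    a power failure; a failed task restarts from its beginning."""
    i = 0
    while i < len(tasks):
        scratch = dict(state)                 # task works on a volatile copy
        cost = tasks[i](scratch)
        if cost <= ENERGY_BUDGET:
            state.clear(); state.update(scratch)   # commit at task boundary
            i += 1
        # else: power failed mid-task; the loop retries task i, and if the
        # cost exceeds the budget on every path, the program never progresses

def sample_task(s):
    s["count"] = s.get("count", 0) + 1
    return random.randint(50, 90)             # energy consumed on this path

state = {}
run_intermittent([sample_task, sample_task], state)
print(state)                                   # {'count': 2}
```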
|
Madsen, Magnus |
CC '18: "Tail Call Elimination and ..."
Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine
Magnus Madsen, Ramin Zarifi, and Ondřej Lhoták (Aalborg University, Denmark; University of Waterloo, Canada) The Java Virtual Machine (JVM) offers an attractive runtime environment for programming language implementers. The JVM has a simple bytecode format, excellent performance, multiple state-of-the art garbage collectors, robust backwards compatibility, and it runs on almost all platforms. Further, the Java ecosystem grants access to a plethora of libraries and tooling, including debuggers and profilers. Yet, the JVM was originally designed for Java, and its representation of code and data may cause difficulties for other languages. In this paper, we discuss how to efficiently implement functional languages on the JVM. We focus on two overall challenges: (a) how to efficiently represent algebraic data types in the presence of tags and tuples, option types, newtypes, and parametric polymorphism, and (b) how to support full tail call elimination on the JVM. We present two technical contributions: A fused representation of tags and tuples and a full tail call elimination strategy that is thread-safe. We implement these techniques in the Flix language and evaluate their performance. @InProceedings{CC18p139, author = {Magnus Madsen and Ramin Zarifi and Ondřej Lhoták}, title = {Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {139--150}, doi = {10.1145/3178372.3179499}, year = {2018}, } Publisher's Version |
|
Manilov, Stanislav |
CC '18: "Generalized Profile-Guided ..."
Generalized Profile-Guided Iterator Recognition
Stanislav Manilov, Christos Vasiladiotis, and Björn Franke (University of Edinburgh, UK) Iterators prescribe the traversal of data structures and determine loop termination, and many loop analyses and transformations require their exact identification. While recognition of iterators is a straightforward task for affine loops, the situation is different for loops iterating over dynamic data structures or involving control flow dependent computations to determine the next data element. In this paper we propose a compiler analysis for recognizing loop iterators for a wide class of loops. We initially develop a static analysis, which is then enhanced with profiling information to support speculative code optimizations. We have prototyped our analysis in the LLVM framework and demonstrate its capabilities using the SPEC CPU2006 benchmarks. Our approach is applicable to all loops and we show that we can recognize iterators in, on average, 88.1% of over 75,000 loops using static analysis alone, and up to 94.9% using additional profiling information. Existing techniques perform substantially worse, especially for C and C++ applications, and cover only 35--44% of the loops. Our analysis enables advanced loop optimizations such as decoupled software pipelining, commutativity analysis and source code rejuvenation for real-world applications, which escape analysis and transformation if loop iterators are not recognized accurately. @InProceedings{CC18p185, author = {Stanislav Manilov and Christos Vasiladiotis and Björn Franke}, title = {Generalized Profile-Guided Iterator Recognition}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {185--195}, doi = {10.1145/3178372.3179511}, year = {2018}, } Publisher's Version CC '18: "Towards a Compiler Analysis ..." Towards a Compiler Analysis for Parallel Algorithmic Skeletons Tobias J. K. Edler von Koch, Stanislav Manilov, Christos Vasiladiotis, Murray Cole, and Björn Franke (Qualcomm Innovation Center, USA; University of Edinburgh, UK) Parallelizing compilers aim to detect data-parallel loops in sequential programs, which -- after suitable transformation -- can be safely and profitably executed in parallel. However, in the traditional model safe parallelization requires provable absence of dependences. At the same time, several well-known parallel algorithmic skeletons cannot be easily expressed in a data dependence framework due to spurious dependences, which prevent parallel execution. In this paper we argue that commutativity is a more suitable concept supporting formal characterization of parallel algorithmic skeletons. We show that existing commutativity definitions cannot be easily adapted for practical use, and develop a new concept of commutativity based on liveness, which readily integrates with existing compiler analyses. This enables us to develop formal definitions of parallel algorithmic skeletons such as task farms, MapReduce and Divide&Conquer. We show that existing informal characterizations of various parallel algorithmic skeletons are captured by our abstract formalizations. In this way we provide the urgently needed formal characterization of widely used parallel constructs allowing their immediate use in novel parallelizing compilers. @InProceedings{CC18p174, author = {Tobias J. K. Edler von Koch and Stanislav Manilov and Christos Vasiladiotis and Murray Cole and Björn Franke}, title = {Towards a Compiler Analysis for Parallel Algorithmic Skeletons}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {174--184}, doi = {10.1145/3178372.3179513}, year = {2018}, } Publisher's Version |
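To illustrate why iterator recognition is hard beyond affine loops, consider this pointer-chasing loop (Python, illustrative): the iterator update is data-dependent rather than a closed-form function of an induction variable, so separating iterator from payload needs analysis of the kind the first paper proposes:

```python
class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt

def list_sum(head):
    total, node = 0, head
    while node is not None:        # termination depends on the data structure
        total += node.val          # loop payload
        node = node.next           # iterator update: non-affine, data-dependent
    return total

head = Node(1, Node(2, Node(3)))
print(list_sum(head))              # 6
```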
|
Neykova, Rumyana |
CC '18: "A Session Type Provider: Compile-Time ..."
A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#
Rumyana Neykova, Raymond Hu, Nobuko Yoshida, and Fahd Abdeljallal (Imperial College London, UK) We present a library for the specification and implementation of distributed protocols in native F# (and other .NET languages) based on multiparty session types (MPST). There are two main contributions. Our library is the first practical development of MPST to support what we refer to as interaction refinements: a collection of features related to the refinement of protocols, such as message-type refinements (value constraints) and message value dependent control flow. A well-typed endpoint program using our library is guaranteed to perform only compliant session I/O actions w.r.t. the refined protocol, up to premature termination. Second, our library is developed as a session type provider, which performs on-demand compile-time protocol validation and generation of protocol-specific .NET types for users writing the distributed endpoint programs. It is implemented by extending Scribble (a toolchain for MPST) with an SMT solver and integrating it into the type providers framework. The safety guarantees are achieved by a combination of static type checking of the generated types for messages and I/O operations, correctness by construction from code generation, and automated inlining of assertions. @InProceedings{CC18p128, author = {Rumyana Neykova and Raymond Hu and Nobuko Yoshida and Fahd Abdeljallal}, title = {A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {128--138}, doi = {10.1145/3178372.3179495}, year = {2018}, } Publisher's Version |
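The two refinement features can be illustrated with a runtime-checked Python sketch; the library enforces analogous properties statically in F#, and all names here are invented for the example:

```python
def send_bid(channel, amount, prev):
    # value constraint on the message payload: a new bid must increase
    assert amount > prev, "refinement violated: bid must exceed previous bid"
    channel.append(("bid", amount))

def seller(channel, reserve):
    label, amount = channel[-1]
    # message-value-dependent control flow: the protocol branches on 'amount'
    return "accept" if amount >= reserve else "counter"

chan = []
send_bid(chan, amount=120, prev=100)
print(seller(chan, reserve=110))    # accept
```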
|
O'Boyle, Michael F. P. |
CC '18: "CAnDL: A Domain Specific Language ..."
CAnDL: A Domain Specific Language for Compiler Analysis
Philip Ginsbach, Lewis Crawford, and Michael F. P. O'Boyle (University of Edinburgh, UK) Optimizing compilers require sophisticated program analysis and transformations to exploit modern hardware. Implementing the appropriate analysis for a compiler optimization is a time-consuming activity. For example, in LLVM, tens of thousands of lines of code are required to detect appropriate places to apply peephole optimizations. This effort is a barrier to the rapid prototyping and evaluation of new optimizations. In this paper we present the Compiler Analysis Description Language (CAnDL), a domain specific language for compiler analysis. CAnDL is a constraint-based language that operates over LLVM's intermediate representation. The compiler developer writes a CAnDL program, which is then compiled by the CAnDL compiler into a C++ LLVM pass. It provides a uniform manner in which to describe compiler analysis and can be applied to a range of compiler analysis problems, reducing code length and complexity. We implemented and evaluated CAnDL on a number of real-world use cases: eliminating redundant operations; graphics code optimization; identifying static control flow regions. In all cases we were able to express the analysis more briefly than competing approaches. @InProceedings{CC18p151, author = {Philip Ginsbach and Lewis Crawford and Michael F. P. O'Boyle}, title = {CAnDL: A Domain Specific Language for Compiler Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {151--162}, doi = {10.1145/3178372.3179515}, year = {2018}, } Publisher's Version |
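The flavor of constraint-based analysis description can be sketched over a toy IR; this is Python pseudocode for the concept only, not CAnDL's syntax or its LLVM backend:

```python
# toy SSA instructions: (result, opcode, operand1, operand2)
toy_ir = [
    ("%1", "mul", "%x", "%x"),
    ("%2", "add", "%1", "%y"),
]

def match(constraints, ir):
    """Enumerate assignments of instructions to pattern variables (a, b)
    that satisfy every constraint in the conjunction."""
    return [(a, b) for a in ir for b in ir
            if all(c(a, b) for c in constraints)]

# 'square then add' pattern: a computes x*x, b consumes a's result
pattern = [
    lambda a, b: a[1] == "mul" and a[2] == a[3],
    lambda a, b: b[1] == "add" and b[2] == a[0],
]
print(match(pattern, toy_ir))
# [(('%1', 'mul', '%x', '%x'), ('%2', 'add', '%1', '%y'))]
```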
|
Pathade, Komal |
CC '18: "Computing Partially Path-Sensitive ..."
Computing Partially Path-Sensitive MFP Solutions in Data Flow Analyses
Komal Pathade and Uday P. Khedker (Tata Consultancy Services, India; IIT Bombay, India) Data flow analysis traverses paths in a control flow graph (CFG) representation of programs to compute useful information. Many of these paths are infeasible, i.e. they cannot arise in any possible execution. The information computed along these paths adds imprecision to the conventional Maximal Fixed Point (MFP) solution of a data flow analysis. Existing approaches for removing this imprecision are either specific to a data flow problem or involve control flow graph restructuring which has exponential complexity. We introduce partial path-sensitivity to the MFP solution by identifying clusters of minimal infeasible path segments to distinguish between the data flowing along feasible and infeasible control flow paths. This allows us to lift any data flow analysis to an analysis over k+1 tuples where k is the number of clusters. Our flow function for a k+1 tuple shifts the values of the underlying analysis from an element in the tuple to other element(s) at the start and end of a cluster as appropriate. This allows us to maintain the distinctions where they are beneficial. Since k is linear in the number of conditional edges in the CFG, the effort is multiplied by a factor that is linear in the number of conditional edges (and is not exponential, unlike conventional approaches of achieving path sensitivity.) We have implemented our method of computing partially path sensitive MFP for reaching definitions analysis and value range analysis of variables. Our measurements on benchmark programs show up to 9% reduction in the number of reaching definitions and up to 14% cases where the value range of a variable is smaller. @InProceedings{CC18p37, author = {Komal Pathade and Uday P. Khedker}, title = {Computing Partially Path-Sensitive MFP Solutions in Data Flow Analyses}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {37--47}, doi = {10.1145/3178372.3179497}, year = {2018}, } Publisher's Version |
|
Pichardie, David |
CC '18: "Semantic Reasoning about the ..."
Semantic Reasoning about the Sea of Nodes
Delphine Demange, Yon Fernández de Retana, and David Pichardie (Univ Rennes, France; Inria, France; CNRS, France; IRISA, France) The Sea of Nodes intermediate representation was introduced by Cliff Click in the mid-1990s as an enhanced Static Single Assignment (SSA) form. It improves on the initial SSA form by relaxing the total order on instructions in basic blocks into explicit data and control dependencies. This makes programs more amenable to optimization. This graph-based representation is now used in many industrial-strength compilers, such as HotSpot or Graal. While the SSA form is now well understood from a semantic perspective -- even formally verified optimizing compilers use it in their middle-end -- very few semantic studies have been conducted about the Sea of Nodes. This paper presents a simple but rigorous formal semantics for a Sea of Nodes form. It comprises a denotational component to express data computation, and an operational component to express control flow. We then prove a fundamental, dominance-based semantic property on Sea of Nodes programs which determines the regions of the graph where the values of nodes are preserved. Finally, we apply our results to prove the semantic correctness of a redundant zero-check elimination optimization. All the necessary semantic properties have been mechanically verified in the Coq proof assistant. @InProceedings{CC18p163, author = {Delphine Demange and Yon Fernández de Retana and David Pichardie}, title = {Semantic Reasoning about the Sea of Nodes}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {163--173}, doi = {10.1145/3178372.3179503}, year = {2018}, } Publisher's Version |
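The order-free, demand-driven evaluation of data nodes, the essence of the denotational component, can be sketched as follows; this is an illustrative Python model, not the paper's Coq formalization:

```python
class N:
    """A data node: its meaning depends only on its inputs, with no
    basic-block ordering; evaluation is demand-driven."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self, env):
        if self.op == "param":
            return env[self.inputs[0]]
        vals = [i.eval(env) for i in self.inputs]
        return {"add": lambda a, b: a + b,
                "mul": lambda a, b: a * b}[self.op](*vals)

x = N("param", "x")
expr = N("add", N("mul", x, x), x)    # x*x + x, instruction order left implicit
print(expr.eval({"x": 3}))            # 12
```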
|
Prokopaki-Kostopoulou, Nefeli |
CC '18: "An Efficient Data Structure ..."
An Efficient Data Structure for Must-Alias Analysis
George Kastrinis, George Balatsouras, Kostas Ferles, Nefeli Prokopaki-Kostopoulou, and Yannis Smaragdakis (University of Athens, Greece; University of Texas at Austin, USA) A must-alias (or “definite-alias”) analysis computes sets of expressions that are guaranteed to be aliased at a given pro- gram point. The analysis has been shown to have significant practical impact, and it is actively used in popular research frameworks and commercial tools. We present a custom data structure that speeds up must-alias analysis by nearly two orders of magnitude (while computing identical results). The data structure achieves efficiency by encoding multiple alias sets in a single linked structure, and compactly representing the aliasing relations of arbitrarily long expressions. We ex- plore the data structure’s performance in both an imperative and a declarative setting and contrast it extensively with prior techniques. With our approach, must-alias analysis can be performed efficiently, over large Java benchmarks, in under half a minute, making the analysis cost acceptable for most practical uses. @InProceedings{CC18p48, author = {George Kastrinis and George Balatsouras and Kostas Ferles and Nefeli Prokopaki-Kostopoulou and Yannis Smaragdakis}, title = {An Efficient Data Structure for Must-Alias Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {48--58}, doi = {10.1145/3178372.3179519}, year = {2018}, } Publisher's Version |
|
Reddy, Chandan |
CC '18: "Modeling the Conflicting Demands ..."
Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling
Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen (Inria, France; ENS, France; KU Leuven, Belgium; Polly Labs, Belgium; Georgia Tech, USA; ETH Zurich, Switzerland) The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and to the complex memory/computation hierarchies in modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing locality and parallelism. Acknowledging these conflicting demands on loop nest optimization, we propose an algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators. This algorithmic template orchestrates a collection of parameterizable, linear optimization problems over a polyhedral space of semantics-preserving transformations. While the overall problem is not convex, effective algorithms can be derived from this template delivering unprecedented performance portability over GPU and multicore CPU. We discuss the rationale for this algorithmic template and validate it on representative computational kernels/benchmarks. @InProceedings{CC18p3, author = {Oleksandr Zinenko and Sven Verdoolaege and Chandan Reddy and Jun Shirako and Tobias Grosser and Vivek Sarkar and Albert Cohen}, title = {Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {3--13}, doi = {10.1145/3178372.3179507}, year = {2018}, } Publisher's Version |
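The tension being modeled shows up even in a plain loop interchange (Python, illustrative only; the paper's algorithm operates on affine schedules, not source rewrites): both orders are semantics-preserving, but they differ in spatial locality, and a scheduler must weigh such choices against parallelism across iterations:

```python
def copy_ij(a, b):
    for i in range(len(a)):
        for j in range(len(a[0])):    # inner j: stride-1 within each row
            b[i][j] = a[i][j]

def copy_ji(a, b):
    for j in range(len(a[0])):
        for i in range(len(a)):       # inner i: jumps between rows each step
            b[i][j] = a[i][j]

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[0.0] * 3 for _ in range(2)]
copy_ij(a, b); copy_ji(a, b)          # same result, different schedules
```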
|
Sarkar, Vivek |
CC '18: "Parallel Sparse Flow-Sensitive ..."
Parallel Sparse Flow-Sensitive Points-to Analysis
Jisheng Zhao, Michael G. Burke, and Vivek Sarkar (Rice University, USA) This paper aims to contribute to further advances in pointer (or points-to) analysis algorithms along the combined dimensions of precision, scalability, and performance. For precision, we aim to support interprocedural flow-sensitive analysis. For scalability, we aim to show that our approach scales to large applications with reasonable memory requirements. For performance, we aim to design a points-to analysis algorithm that is amenable to parallel execution. The algorithm introduced in this paper achieves all these goals. As an example, our experimental results show that our algorithm can analyze the 2.2MLOC Tizen OS framework with < 16GB of memory while delivering an average analysis rate of > 10KLOC/second. Our points-to analysis algorithm, PSEGPT, is based on the Pointer Sparse Evaluation Graph (PSEG) form, a new analysis representation that combines both points-to and heap def-use information. PSEGPT is a scalable interprocedural flow-sensitive context-insensitive points-to analysis that is amenable to efficient task-parallel implementations, even though points-to analysis is usually viewed as a challenge problem for parallelization. Our experimental results with 6 real-world applications on a 12-core machine show an average parallel speedup of 4.45× and maximum speedup of 7.35×. The evaluation also includes precision results by demonstrating that our algorithm identifies significantly more inlinable indirect calls (IICs) than SUPT and SS, two state-of-the-art SSA-based points-to analyses implemented in LLVM. @InProceedings{CC18p59, author = {Jisheng Zhao and Michael G. Burke and Vivek Sarkar}, title = {Parallel Sparse Flow-Sensitive Points-to Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {59--70}, doi = {10.1145/3178372.3179517}, year = {2018}, } Publisher's Version CC '18: "Modeling the Conflicting Demands ..." Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen (Inria, France; ENS, France; KU Leuven, Belgium; Polly Labs, Belgium; Georgia Tech, USA; ETH Zurich, Switzerland) The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and to the complex memory/computation hierarchies in modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing locality and parallelism. Acknowledging these conflicting demands on loop nest optimization, we propose an algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators. This algorithmic template orchestrates a collection of parameterizable, linear optimization problems over a polyhedral space of semantics-preserving transformations. While the overall problem is not convex, effective algorithms can be derived from this template delivering unprecedented performance portability over GPU and multicore CPU. We discuss the rationale for this algorithmic template and validate it on representative computational kernels/benchmarks. @InProceedings{CC18p3, author = {Oleksandr Zinenko and Sven Verdoolaege and Chandan Reddy and Jun Shirako and Tobias Grosser and Vivek Sarkar and Albert Cohen}, title = {Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {3--13}, doi = {10.1145/3178372.3179507}, year = {2018}, } Publisher's Version |
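The intuition behind sparse evaluation, propagating facts along def-use edges instead of through every CFG node, can be sketched as follows for the points-to paper above; this is illustrative Python, not the PSEG construction:

```python
defs = {                       # SSA-like program: var -> (opcode, operands)
    "p": ("alloc", "A"),
    "q": ("alloc", "B"),
    "r": ("phi", "p", "q"),    # r merges the values of p and q
}

def points_to(var, memo=None):
    """Demand-driven, sparse resolution: each variable's points-to set is
    computed from its defining instruction only, following def-use edges."""
    memo = {} if memo is None else memo
    if var in memo:
        return memo[var]
    op, *args = defs[var]
    memo[var] = ({args[0]} if op == "alloc"
                 else set().union(*(points_to(a, memo) for a in args)))
    return memo[var]

print(sorted(points_to("r")))   # ['A', 'B']
```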
|
Sharma, Shrey |
CC '18: "Polyhedral Expression Propagation ..."
Polyhedral Expression Propagation
Johannes Doerfert, Shrey Sharma, and Sebastian Hack (Saarland University, Germany) Polyhedral techniques have proven to be powerful for various optimizations, from automatic parallelization to accelerator programming. At their core, these techniques compute accurate dependences among statement instances in order to apply complex program transformations. Such transformations comprise memory layout or program order modifications by optimizing memory access functions or scheduling functions. However, these approaches treat statements as opaque entities and do not consider changing the structure of the contained expressions or the memory accesses involved. In this paper we present a technique that statically propagates expressions in order to avoid communicating their result via memory. While orthogonal to other polyhedral optimizations, this transformation can be used to enable them. Applied separately, expression propagation can increase parallelism, eliminate temporary arrays, create independent computations and improve cache utilization. It is especially useful for streaming codes that involve temporary arrays and scalar variables. For multiple image processing pipelines we achieve portable speedups of up to 21.3× as well as a significant memory reduction compared to a naive parallel implementation. In 6 out of 7 cases, expression propagation outperforms a state-of-the-art polyhedral optimization designed especially for this kind of program by a factor of up to 2.03×. @InProceedings{CC18p25, author = {Johannes Doerfert and Shrey Sharma and Sebastian Hack}, title = {Polyhedral Expression Propagation}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {25--36}, doi = {10.1145/3178372.3179529}, year = {2018}, } Publisher's Version |
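A minimal before/after example of the transformation (Python, illustrative): the temporary array exists only to communicate a value between pipeline stages, so its defining expression can be propagated into the consumer, eliminating the array:

```python
def pipeline_with_tmp(xs):
    tmp = [x * 0.5 for x in xs]           # stage 1 materializes a temporary array
    return [t + 1.0 for t in tmp]         # stage 2 reads it back from memory

def pipeline_propagated(xs):
    return [(x * 0.5) + 1.0 for x in xs]  # expression propagated: no temporary

xs = [1.0, 2.0]
assert pipeline_with_tmp(xs) == pipeline_propagated(xs)
```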
|
Shen, Xipeng |
CC '18: "Rethinking Compilers in the ..."
Rethinking Compilers in the Rise of Machine Learning and AI (Keynote)
Xipeng Shen (North Carolina State University, USA) Recent years have witnessed influential progress in Machine Learning (ML) and Artificial Intelligence (AI). This progress may lead to significant changes in future programming. Many programs, for instance, may not be code written in specially designed programming languages, but high-level user intentions expressed in natural language. Deep Learning-based software, despite the difficulties in interpreting its results, may continue its rapid growth in the software market and its influence on people's everyday lives. This talk will first examine the implications of these changes for compiler research, and then discuss the potential opportunities that ML and AI could bring to possibly transform the field of compiler research. Specifically, the talk will focus on the possibilities for ML and AI to help reveal the high-level semantics and attributes of software components that traditional compiler technology cannot, and hence open important opportunities for high-level large-scoped code reasoning and optimizations---a direction that has tremendous potential but has been beyond the reach of traditional compiler technology. The talk will discuss how ML and AI may help break the "abstraction wall"---barriers formed by layers of abstractions in modern software---for program analysis and optimizations, and how ML and AI may transform the way in which high-level user intentions get translated into low-level code implementations. The talk will conclude with a list of grand challenges and possible research directions for future compiler construction. @InProceedings{CC18p1, author = {Xipeng Shen}, title = {Rethinking Compilers in the Rise of Machine Learning and AI (Keynote)}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {1--1}, doi = {10.1145/3178372.3183634}, year = {2018}, } Publisher's Version |
|
Shirako, Jun |
CC '18: "Modeling the Conflicting Demands ..."
Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling
Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen (Inria, France; ENS, France; KU Leuven, Belgium; Polly Labs, Belgium; Georgia Tech, USA; ETH Zurich, Switzerland) The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and to the complex memory/computation hierarchies in modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing locality and parallelism. Acknowledging these conflicting demands on loop nest optimization, we propose an algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators. This algorithmic template orchestrates a collection of parameterizable, linear optimization problems over a polyhedral space of semantics-preserving transformations. While the overall problem is not convex, effective algorithms can be derived from this template delivering unprecedented performance portability over GPU and multicore CPU. We discuss the rationale for this algorithmic template and validate it on representative computational kernels/benchmarks. @InProceedings{CC18p3, author = {Oleksandr Zinenko and Sven Verdoolaege and Chandan Reddy and Jun Shirako and Tobias Grosser and Vivek Sarkar and Albert Cohen}, title = {Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {3--13}, doi = {10.1145/3178372.3179507}, year = {2018}, } Publisher's Version |
|
Smaragdakis, Yannis |
CC '18: "An Efficient Data Structure ..."
An Efficient Data Structure for Must-Alias Analysis
George Kastrinis, George Balatsouras, Kostas Ferles, Nefeli Prokopaki-Kostopoulou, and Yannis Smaragdakis (University of Athens, Greece; University of Texas at Austin, USA) A must-alias (or “definite-alias”) analysis computes sets of expressions that are guaranteed to be aliased at a given program point. The analysis has been shown to have significant practical impact, and it is actively used in popular research frameworks and commercial tools. We present a custom data structure that speeds up must-alias analysis by nearly two orders of magnitude (while computing identical results). The data structure achieves efficiency by encoding multiple alias sets in a single linked structure, and compactly representing the aliasing relations of arbitrarily long expressions. We explore the data structure’s performance in both an imperative and a declarative setting and contrast it extensively with prior techniques. With our approach, must-alias analysis can be performed efficiently, over large Java benchmarks, in under half a minute, making the analysis cost acceptable for most practical uses. @InProceedings{CC18p48, author = {George Kastrinis and George Balatsouras and Kostas Ferles and Nefeli Prokopaki-Kostopoulou and Yannis Smaragdakis}, title = {An Efficient Data Structure for Must-Alias Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {48--58}, doi = {10.1145/3178372.3179519}, year = {2018}, } Publisher's Version |
|
Sun, Haiyang |
CC '18: "Efficient Dynamic Analysis ..."
Efficient Dynamic Analysis for Node.js
Haiyang Sun, Daniele Bonetta, Christian Humer, and Walter Binder (University of Lugano, Switzerland; Oracle Labs, USA; Oracle Labs, Switzerland) Due to its popularity, there is an urgent need for dynamic program-analysis tools for Node.js, helping developers find bugs, performance bottlenecks, and bad coding practices. Frameworks based on code-level instrumentation enable dynamic analyses close to program semantics and are more flexible than Node.js built-in profiling tools. However, existing code-level instrumentation frameworks for JavaScript suffer from enormous overheads and difficulties in instrumenting the built-in module library of Node.js. In this paper, we introduce a new dynamic analysis framework for JavaScript and Node.js called NodeProf. While offering flexibility similar to that of code-level instrumentation frameworks, NodeProf significantly improves analysis performance while ensuring comprehensive code coverage. NodeProf supports runtime (de)activation of analyses and incurs zero overhead when no analysis is active. NodeProf is based on dynamic instrumentation of the JavaScript runtime and leverages automatic partial evaluation to generate efficient machine code. In addition, NodeProf makes use of the language interoperability provided by the runtime and thus allows dynamic analyses to be written in Java and JavaScript, compatible with Jalangi, a state-of-the-art code-level JavaScript instrumentation framework. Our experiments show that the peak performance of running the same dynamic analyses using NodeProf can be up to three orders of magnitude faster than Jalangi. @InProceedings{CC18p196, author = {Haiyang Sun and Daniele Bonetta and Christian Humer and Walter Binder}, title = {Efficient Dynamic Analysis for Node.js}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {196--206}, doi = {10.1145/3178372.3179527}, year = {2018}, } Publisher's Version |
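The hook-based instrumentation concept, including runtime (de)activation with minimal cost when idle, can be sketched as follows; this is illustrative Python, not NodeProf's API or its partial-evaluation machinery:

```python
hooks = []                                   # registered analysis callbacks

def instrumented(op_name, fn):
    """Wrap an operation so registered analyses observe it."""
    def wrapper(*args):
        if hooks:                            # near-zero work when no analysis active
            for h in hooks:
                h(op_name, args)
        return fn(*args)
    return wrapper

add = instrumented("add", lambda a, b: a + b)
add(1, 2)                                    # no analysis active: plain call
hooks.append(lambda op, args: print("saw", op, args))
add(3, 4)                                    # saw add (3, 4)
hooks.clear()                                # runtime deactivation
add(5, 6)                                    # back to the fast path
```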
|
Thier, Patrick |
CC '18: "Fast and Flexible Instruction ..."
Fast and Flexible Instruction Selection with Constraints
Patrick Thier, M. Anton Ertl, and Andreas Krall (Vienna University of Technology, Austria) Tree-parsing instruction selection as used in, e.g., lcc, uses dynamic costs to gain flexibility and handle situations (such as read-modify-write instructions) that do not fit into the basic tree-parsing model. The disadvantage of dynamic costs is that we can no longer turn the tree grammar into a tree automaton (as is done by burg) for fast instruction selection for JIT compilers. In this paper we introduce constraints that say whether a tree-grammar rule is applicable or not. While theoretically less powerful than dynamic costs, constraints cover the practical uses of dynamic costs; more importantly, they allow turning the tree grammar with constraints into a tree automaton (with instruction-selection-time checks), resulting in faster instruction selection than with pure instruction-selection-time dynamic programming. We integrate constraints in an instruction selector that matches DAGs with tree rules. We evaluate this concept in lcc and the CACAO JavaVM JIT compiler, and see instruction selector speedups by a factor of 1.33--1.89. @InProceedings{CC18p93, author = {Patrick Thier and M. Anton Ertl and Andreas Krall}, title = {Fast and Flexible Instruction Selection with Constraints}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {93--103}, doi = {10.1145/3178372.3179501}, year = {2018}, } Publisher's Version |
|
Vasiladiotis, Christos |
CC '18: "Generalized Profile-Guided ..."
Generalized Profile-Guided Iterator Recognition
Stanislav Manilov, Christos Vasiladiotis, and Björn Franke (University of Edinburgh, UK) Iterators prescribe the traversal of data structures and determine loop termination, and many loop analyses and transformations require their exact identification. While recognition of iterators is a straightforward task for affine loops, the situation is different for loops iterating over dynamic data structures or involving control flow dependent computations to determine the next data element. In this paper we propose a compiler analysis for recognizing loop iterators for a wide class of loops. We initially develop a static analysis, which is then enhanced with profiling information to support speculative code optimizations. We have prototyped our analysis in the LLVM framework and demonstrate its capabilities using the SPEC CPU2006 benchmarks. Our approach is applicable to all loops and we show that we can recognize iterators in, on average, 88.1% of over 75,000 loops using static analysis alone, and up to 94.9% using additional profiling information. Existing techniques perform substantially worse, especially for C and C++ applications, and cover only 35--44% of the loops. Our analysis enables advanced loop optimizations such as decoupled software pipelining, commutativity analysis and source code rejuvenation for real-world applications, which escape analysis and transformation if loop iterators are not recognized accurately. @InProceedings{CC18p185, author = {Stanislav Manilov and Christos Vasiladiotis and Björn Franke}, title = {Generalized Profile-Guided Iterator Recognition}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {185--195}, doi = {10.1145/3178372.3179511}, year = {2018}, } Publisher's Version CC '18: "Towards a Compiler Analysis ..." Towards a Compiler Analysis for Parallel Algorithmic Skeletons Tobias J. K. Edler von Koch, Stanislav Manilov, Christos Vasiladiotis, Murray Cole, and Björn Franke (Qualcomm Innovation Center, USA; University of Edinburgh, UK) Parallelizing compilers aim to detect data-parallel loops in sequential programs, which -- after suitable transformation -- can be safely and profitably executed in parallel. However, in the traditional model safe parallelization requires provable absence of dependences. At the same time, several well-known parallel algorithmic skeletons cannot be easily expressed in a data dependence framework due to spurious dependences, which prevent parallel execution. In this paper we argue that commutativity is a more suitable concept supporting formal characterization of parallel algorithmic skeletons. We show that existing commutativity definitions cannot be easily adapted for practical use, and develop a new concept of commutativity based on liveness, which readily integrates with existing compiler analyses. This enables us to develop formal definitions of parallel algorithmic skeletons such as task farms, MapReduce and Divide&Conquer. We show that existing informal characterizations of various parallel algorithmic skeletons are captured by our abstract formalizations. In this way we provide the urgently needed formal characterization of widely used parallel constructs allowing their immediate use in novel parallelizing compilers. @InProceedings{CC18p174, author = {Tobias J. K. Edler von Koch and Stanislav Manilov and Christos Vasiladiotis and Murray Cole and Björn Franke}, title = {Towards a Compiler Analysis for Parallel Algorithmic Skeletons}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {174--184}, doi = {10.1145/3178372.3179513}, year = {2018}, } Publisher's Version |
|
Verdoolaege, Sven |
CC '18: "Modeling the Conflicting Demands ..."
Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling
Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen (Inria, France; ENS, France; KU Leuven, Belgium; Polly Labs, Belgium; Georgia Tech, USA; ETH Zurich, Switzerland) The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and to the complex memory/computation hierarchies in modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing locality and parallelism. Acknowledging these conflicting demands on loop nest optimization, we propose an algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators. This algorithmic template orchestrates a collection of parameterizable, linear optimization problems over a polyhedral space of semantics-preserving transformations. While the overall problem is not convex, effective algorithms can be derived from this template delivering unprecedented performance portability over GPU and multicore CPU. We discuss the rationale for this algorithmic template and validate it on representative computational kernels/benchmarks. @InProceedings{CC18p3, author = {Oleksandr Zinenko and Sven Verdoolaege and Chandan Reddy and Jun Shirako and Tobias Grosser and Vivek Sarkar and Albert Cohen}, title = {Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {3--13}, doi = {10.1145/3178372.3179507}, year = {2018}, } Publisher's Version |
|
Von Koch, Tobias J. K. Edler |
CC '18: "Towards a Compiler Analysis ..."
Towards a Compiler Analysis for Parallel Algorithmic Skeletons
Tobias J. K. Edler von Koch, Stanislav Manilov, Christos Vasiladiotis, Murray Cole, and Björn Franke (Qualcomm Innovation Center, USA; University of Edinburgh, UK) Parallelizing compilers aim to detect data-parallel loops in sequential programs, which -- after suitable transformation -- can be safely and profitably executed in parallel. However, in the traditional model safe parallelization requires provable absence of dependences. At the same time, several well-known parallel algorithmic skeletons cannot be easily expressed in a data dependence framework due to spurious dependences, which prevent parallel execution. In this paper we argue that commutativity is a more suitable concept supporting formal characterization of parallel algorithmic skeletons. We show that existing commutativity definitions cannot be easily adapted for practical use, and develop a new concept of commutativity based on liveness, which readily integrates with existing compiler analyses. This enables us to develop formal definitions of parallel algorithmic skeletons such as task farms, MapReduce and Divide&Conquer. We show that existing informal characterizations of various parallel algorithmic skeletons are captured by our abstract formalizations. In this way we provide the urgently needed formal characterization of widely used parallel constructs allowing their immediate use in novel parallelizing compilers. @InProceedings{CC18p174, author = {Tobias J. K. Edler von Koch and Stanislav Manilov and Christos Vasiladiotis and Murray Cole and Björn Franke}, title = {Towards a Compiler Analysis for Parallel Algorithmic Skeletons}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {174--184}, doi = {10.1145/3178372.3179513}, year = {2018}, } Publisher's Version |
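Liveness-based commutativity can be sketched as follows; this is illustrative Python, not the paper's formal definition. Two steps commute if the live-out state agrees under both execution orders, even when dead state, such as an execution log, differs:

```python
import copy

def commutes(step_i, step_j, state, live):
    """Check order-independence of two steps on the live variables only."""
    a = copy.deepcopy(state); step_i(a); step_j(a)
    b = copy.deepcopy(state); step_j(b); step_i(b)
    return all(a[k] == b[k] for k in live)

state = {"acc": 0, "log": []}
f = lambda s: (s.__setitem__("acc", s["acc"] + 2), s["log"].append("f"))
g = lambda s: (s.__setitem__("acc", s["acc"] + 3), s["log"].append("g"))
# 'log' records execution order and would block reordering under a strict
# state-equality definition, but it is dead after the loop:
print(commutes(f, g, state, live={"acc"}))   # True: a task-farm-like pattern
```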
|
Xu, Xiaoran |
CC '18: "PAYJIT: Space-Optimal JIT ..."
PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation
Jacob Brock, Chen Ding, Xiaoran Xu, and Yan Zhang (University of Rochester, USA; Rice University, USA; Futurewei Technologies, USA) Just-in-time (JIT) compilation in dynamic programming languages can improve execution speed in code with hot sections. However, that comes at the cost of increased memory usage for the storage of compiled code and associated bookkeeping data, as well as up-front compilation time for the hot code. The current standard JIT compilation policy (used in both Android's JIT compiler and the HotSpot Java compiler) is simplistic; any method that has reached a certain hotness, as counted by invocations and loop iterations, is scheduled for compilation. This would be fine if the cost/benefit of compilation were the same for every method of a given hotness, but that is not the case. In fact, compiling a large method will likely result in less overall speedup than compiling a smaller, equally hot method. This exposes an opportunity for improvement. We propose the PAYJIT compilation policy for method-based JIT compilers, which scales compilation hotness thresholds with method sizes, and includes two-point tuning, a mechanism for determining those hotness thresholds. In this way, PAYJIT compiles more small methods, it compiles them sooner, and it nearly eliminates the compilation of large methods that contribute little to overall speedup. Among 10 popular Android apps tested across 16 experiments, the PAYJIT compilation policy decreases compilation time by a maximum of 49.2%, with an average of 18.7%; execution time by a maximum of 10.9%, with an average of 1.34% (for a geometric mean speedup of 1.46%); and code cache size by a maximum of 41.1%, with an average of 15.6%. @InProceedings{CC18p71, author = {Jacob Brock and Chen Ding and Xiaoran Xu and Yan Zhang}, title = {PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {71--81}, doi = {10.1145/3178372.3179523}, year = {2018}, } Publisher's Version |
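The policy idea can be sketched in a few lines of Python; the constants and the linear scaling are illustrative assumptions, not PAYJIT's tuned thresholds:

```python
BASE_THRESHOLD = 10_000      # hotness required per 100 bytes of code (assumption)

def threshold(method_size_bytes):
    """Scale the hotness threshold with method size: bigger methods must
    prove much more hotness before compilation pays off."""
    return BASE_THRESHOLD * method_size_bytes // 100

def should_compile(hotness, method_size_bytes):
    return hotness >= threshold(method_size_bytes)

print(should_compile(30_000, 200))    # True: small hot method, compiled early
print(should_compile(30_000, 5_000))  # False: large method, threshold too high
```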
|
Yoshida, Nobuko |
CC '18: "A Session Type Provider: Compile-Time ..."
A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#
Rumyana Neykova, Raymond Hu, Nobuko Yoshida, and Fahd Abdeljallal (Imperial College London, UK) We present a library for the specification and implementation of distributed protocols in native F# (and other .NET languages) based on multiparty session types (MPST). There are two main contributions. Our library is the first practical development of MPST to support what we refer to as interaction refinements: a collection of features related to the refinement of protocols, such as message-type refinements (value constraints) and message value dependent control flow. A well-typed endpoint program using our library is guaranteed to perform only compliant session I/O actions w.r.t. the refined protocol, up to premature termination. Second, our library is developed as a session type provider, which performs on-demand compile-time protocol validation and generation of protocol-specific .NET types for users writing the distributed endpoint programs. It is implemented by extending Scribble (a toolchain for MPST) with an SMT solver and integrating it into the type providers framework. The safety guarantees are achieved by a combination of static type checking of the generated types for messages and I/O operations, correctness by construction from code generation, and automated inlining of assertions. @InProceedings{CC18p128, author = {Rumyana Neykova and Raymond Hu and Nobuko Yoshida and Fahd Abdeljallal}, title = {A Session Type Provider: Compile-Time API Generation of Distributed Protocols with Refinements in F#}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {128--138}, doi = {10.1145/3178372.3179495}, year = {2018}, } Publisher's Version |
|
Zarifi, Ramin |
CC '18: "Tail Call Elimination and ..."
Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine
Magnus Madsen, Ramin Zarifi, and Ondřej Lhoták (Aalborg University, Denmark; University of Waterloo, Canada) The Java Virtual Machine (JVM) offers an attractive runtime environment for programming language implementers. The JVM has a simple bytecode format, excellent performance, multiple state-of-the-art garbage collectors, robust backwards compatibility, and it runs on almost all platforms. Further, the Java ecosystem grants access to a plethora of libraries and tooling, including debuggers and profilers. Yet, the JVM was originally designed for Java, and its representation of code and data may cause difficulties for other languages. In this paper, we discuss how to efficiently implement functional languages on the JVM. We focus on two overall challenges: (a) how to efficiently represent algebraic data types in the presence of tags and tuples, option types, newtypes, and parametric polymorphism, and (b) how to support full tail call elimination on the JVM. We present two technical contributions: a fused representation of tags and tuples and a full tail call elimination strategy that is thread-safe. We implement these techniques in the Flix language and evaluate their performance. @InProceedings{CC18p139, author = {Magnus Madsen and Ramin Zarifi and Ondřej Lhoták}, title = {Tail Call Elimination and Data Representation for Functional Languages on the Java Virtual Machine}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {139--150}, doi = {10.1145/3178372.3179499}, year = {2018}, } Publisher's Version |
|
Zhang, Yan |
CC '18: "PAYJIT: Space-Optimal JIT ..."
PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation
Jacob Brock, Chen Ding, Xiaoran Xu, and Yan Zhang (University of Rochester, USA; Rice University, USA; Futurewei Technologies, USA) Just-in-time (JIT) compilation in dynamic programming languages can improve execution speed in code with hot sections. However, that comes at the cost of increased memory usage for the storage of compiled code and associated bookkeeping data, as well as up-front compilation time for the hot code. The current standard JIT compilation policy (used in both Android's JIT compiler and the HotSpot Java compiler) is simplistic; any method that has reached a certain hotness, as counted by invocations and loop iterations, is scheduled for compilation. This would be fine if the cost/benefit of compilation were the same for every method of a given hotness, but that is not the case. In fact, compiling a large method will likely result in less overall speedup than compiling a smaller, equally hot method. This exposes an opportunity for improvement. We propose the PAYJIT compilation policy for method-based JIT compilers, which scales compilation hotness thresholds with method sizes, and includes two-point tuning, a mechanism for determining those hotness thresholds. In this way, PAYJIT compiles more small methods, it compiles them sooner, and it nearly eliminates the compilation of large methods that contribute little to overall speedup. Among 10 popular Android apps tested across 16 experiments, the PAYJIT compilation policy decreases compilation time by a maximum of 49.2%, with an average of 18.7%; execution time by a maximum of 10.9%, with an average of 1.34% (for a geometric mean speedup of 1.46%); and code cache size by a maximum of 41.1%, with an average of 15.6%. @InProceedings{CC18p71, author = {Jacob Brock and Chen Ding and Xiaoran Xu and Yan Zhang}, title = {PAYJIT: Space-Optimal JIT Compilation and Its Practical Implementation}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {71--81}, doi = {10.1145/3178372.3179523}, year = {2018}, } Publisher's Version |
|
Zhao, Jie |
CC '18: "A Polyhedral Compilation Framework ..."
A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds
Jie Zhao, Michael Kruse, and Albert Cohen (Inria, France; ENS, France) We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamic data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable, removing the loop-carried data dependences induced by termination conditions. We propose an automatic compilation approach to parallelize and optimize dynamic counted loops. Our approach relies on affine relations only, as implemented in state-of-the-art polyhedral libraries. Revisiting a state-of-the-art framework to parallelize arbitrary while loops, we introduce additional control dependences on data-dependent predicates. Our method goes beyond the state of the art in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops and avoiding the introduction of spurious loop-carried dependences. We conduct experiments on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization. @InProceedings{CC18p14, author = {Jie Zhao and Michael Kruse and Albert Cohen}, title = {A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {14--24}, doi = {10.1145/3178372.3179509}, year = {2018}, } Publisher's Version |
|
Zhao, Jisheng |
CC '18: "Parallel Sparse Flow-Sensitive ..."
Parallel Sparse Flow-Sensitive Points-to Analysis
Jisheng Zhao, Michael G. Burke, and Vivek Sarkar (Rice University, USA) This paper aims to contribute to further advances in pointer (or points-to) analysis algorithms along the combined dimensions of precision, scalability, and performance. For precision, we aim to support interprocedural flow-sensitive analysis. For scalability, we aim to show that our approach scales to large applications with reasonable memory requirements. For performance, we aim to design a points-to analysis algorithm that is amenable to parallel execution. The algorithm introduced in this paper achieves all these goals. As an example, our experimental results show that our algorithm can analyze the 2.2MLOC Tizen OS framework with < 16GB of memory while delivering an average analysis rate of > 10KLOC/second. Our points-to analysis algorithm, PSEGPT, is based on the Pointer Sparse Evaluation Graph (PSEG) form, a new analysis representation that combines both points-to and heap def-use information. PSEGPT is a scalable interprocedural flow-sensitive context-insensitive points-to analysis that is amenable to efficient task-parallel implementations, even though points-to analysis is usually viewed as a challenge problem for parallelization. Our experimental results with 6 real-world applications on a 12-core machine show an average parallel speedup of 4.45× and maximum speedup of 7.35×. The evaluation also includes precision results by demonstrating that our algorithm identifies significantly more inlinable indirect calls (IICs) than SUPT and SS, two state-of-the-art SSA-based points-to analyses implemented in LLVM. @InProceedings{CC18p59, author = {Jisheng Zhao and Michael G. Burke and Vivek Sarkar}, title = {Parallel Sparse Flow-Sensitive Points-to Analysis}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {59--70}, doi = {10.1145/3178372.3179517}, year = {2018}, } Publisher's Version |
|
Zinenko, Oleksandr |
CC '18: "Modeling the Conflicting Demands ..."
Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling
Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen (Inria, France; ENS, France; KU Leuven, Belgium; Polly Labs, Belgium; Georgia Tech, USA; ETH Zurich, Switzerland) The construction of effective loop nest optimizers and parallelizers remains challenging despite decades of work in the area. Due to the increasing diversity of loop-intensive applications and to the complex memory/computation hierarchies in modern processors, optimization heuristics are pulled towards conflicting goals, highlighting the lack of a systematic approach to optimizing locality and parallelism. Acknowledging these conflicting demands on loop nest optimization, we propose an algorithmic template capable of modeling the multi-level parallelism and the temporal/spatial locality of multiprocessors and accelerators. This algorithmic template orchestrates a collection of parameterizable, linear optimization problems over a polyhedral space of semantics-preserving transformations. While the overall problem is not convex, effective algorithms can be derived from this template delivering unprecedented performance portability over GPU and multicore CPU. We discuss the rationale for this algorithmic template and validate it on representative computational kernels/benchmarks. @InProceedings{CC18p3, author = {Oleksandr Zinenko and Sven Verdoolaege and Chandan Reddy and Jun Shirako and Tobias Grosser and Vivek Sarkar and Albert Cohen}, title = {Modeling the Conflicting Demands of Parallelism and Temporal/Spatial Locality in Affine Scheduling}, booktitle = {Proc.\ CC}, publisher = {ACM}, pages = {3--13}, doi = {10.1145/3178372.3179507}, year = {2018}, } Publisher's Version |
59 authors