2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2018),
June 18, 2018,
Philadelphia, PA, USA
Frontmatter
Program Analysis
Ariadne: Analysis for Machine Learning Programs
Julian Dolby,
Avraham Shinnar, Allison Allain, and Jenna Reinen
(IBM Research, USA)
Machine learning has transformed domains like vision and translation, and is now increasingly used in science, where the correctness of such code is vital. Python is popular for machine learning, in part because of its wealth of machine learning libraries, and is felt to make development faster; however, this dynamic language has less support for error detection at code creation time than tools like Eclipse. This is especially problematic for machine learning: given its statistical nature, code with subtle errors may run and produce results that look plausible but are meaningless. This can vitiate scientific results. We report on Ariadne: applying a static analysis framework, WALA, to machine learning code that uses TensorFlow. We have created a static analysis for Python, a type system for tracking tensors—TensorFlow's core data structures—and a data flow analysis to track their usage. We report on how it was built and present some early results.
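The class of bug such a tensor-aware analysis targets can be illustrated with a toy shape checker (a minimal sketch in plain Python; `matmul_shape` and the layer shapes are invented for illustration, not Ariadne's implementation):

```python
# Toy sketch of tensor-shape tracking: each abstract value carries a
# shape, and an operation like matmul is checked before "execution",
# so a dimension mismatch is caught statically rather than producing
# plausible-looking garbage at runtime.

def matmul_shape(a, b):
    """Return the result shape of a matrix multiply, or raise."""
    if len(a) != 2 or len(b) != 2:
        raise TypeError(f"matmul expects rank-2 tensors, got {a} and {b}")
    if a[1] != b[0]:
        raise TypeError(f"inner dimensions differ: {a} vs {b}")
    return (a[0], b[1])

# A small pipeline tracked purely at the shape level:
batch   = (32, 784)   # input batch
weights = (784, 128)  # first layer
hidden  = (128, 10)   # second layer

h   = matmul_shape(batch, weights)  # (32, 128)
out = matmul_shape(h, hidden)       # (32, 10)
```

A swapped operand order, such as `matmul_shape(weights, batch)`, raises a `TypeError` from the checker instead of silently computing a meaningless result.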
@InProceedings{MAPL18p1,
author = {Julian Dolby and Avraham Shinnar and Allison Allain and Jenna Reinen},
title = {Ariadne: Analysis for Machine Learning Programs},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {1--10},
doi = {10.1145/3211346.3211349},
year = {2018},
}
Clone-Hunter: Accelerated Bound Checks Elimination via Binary Code Clone Detection
Hongfa Xue, Guru Venkataramani, and Tian Lan
(George Washington University, USA)
Unsafe pointer usage and illegitimate memory accesses are prevalent bugs in software. To ensure memory safety, conditions for array bound checks are inserted into the code to detect out-of-bound memory accesses. Unfortunately, these bound checks contribute to high runtime overheads; therefore, redundant array bound checks should be removed to improve application performance. In this paper, we propose Clone-Hunter, a practical and scalable framework for redundant bound check elimination in binary executables. Clone-Hunter first uses binary code clone detection, and then employs a bound safety verification mechanism (using binary symbolic execution) to ensure sound removal of redundant bound checks. Our results show that Clone-Hunter can swiftly identify redundant bound checks about 90× faster than pure binary symbolic execution, while ensuring zero false positives.
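The core idea, running the expensive check once per clone class and reusing the verdict, can be sketched as follows (a hypothetical Python simplification; `normalize` and `dedup_checks` are illustrative names and the toy instruction tuples stand in for real binary code, not Clone-Hunter's actual pipeline):

```python
# Sketch: group code regions into clone classes by a normalized
# fingerprint (registers and constants abstracted away), run the
# costly safety check once per class, and cache the verdict.

def normalize(instrs):
    """Abstract away register names and constants so clones hash identically."""
    return tuple(
        (op, tuple("REG" if o.startswith("r") else "CONST" for o in operands))
        for op, *operands in instrs
    )

def dedup_checks(regions, expensive_check):
    """Run expensive_check once per clone class; reuse cached verdicts."""
    cache, verdicts, checks = {}, {}, 0
    for name, instrs in regions.items():
        key = normalize(instrs)
        if key not in cache:
            cache[key] = expensive_check(instrs)  # e.g. symbolic execution
            checks += 1
        verdicts[name] = cache[key]
    return verdicts, checks

# Two clones of the same bound-check pattern and one unrelated region:
regions = {
    "f1": [("cmp", "r1", "r2"), ("jae", "0x10")],
    "f2": [("cmp", "r3", "r4"), ("jae", "0x40")],  # clone of f1
    "g":  [("mov", "r1", "r2")],
}
verdicts, checks = dedup_checks(regions, lambda instrs: True)
```

With three regions but only two clone classes, the expensive check runs twice, which is the source of the speedup over checking every region symbolically.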
@InProceedings{MAPL18p11,
author = {Hongfa Xue and Guru Venkataramani and Tian Lan},
title = {Clone-Hunter: Accelerated Bound Checks Elimination via Binary Code Clone Detection},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {11--19},
doi = {10.1145/3211346.3211347},
year = {2018},
}
Code Search
Obfuscation Resilient Search through Executable Classification
Fang-Hsiang Su, Jonathan Bell,
Gail Kaiser, and Baishakhi Ray
(Columbia University, USA; George Mason University, USA)
Android applications are usually obfuscated before release, making it difficult to analyze them for malware presence or intellectual property violations. Obfuscators might hide the true intent of code by renaming variables and/or modifying program structures, which makes it challenging for developers to efficiently search for executables relevant to an obfuscated application. Prior approaches to obfuscation-resilient search have relied on certain structural parts of apps remaining as landmarks, untouched by obfuscation. For instance, some prior approaches have assumed that the structural relationships between identifiers are not broken by obfuscators; others have assumed that control flow graphs maintain their structures. Both approaches can be easily defeated by a motivated obfuscator. We present a new approach, MACNETO, which searches for programs relevant to obfuscated executables by leveraging deep learning and principal components on instructions. MACNETO makes few assumptions about the kinds of modifications that an obfuscator might perform. We show that it has high search precision for executables obfuscated by a state-of-the-art obfuscator that changes control flow. Further, we demonstrate the potential of MACNETO to help developers understand executables by inferring keywords (drawn from relevant un-obfuscated programs) for obfuscated executables.
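The intuition behind renaming-resilient matching can be sketched in a few lines (toy Python; the opcode histograms below stand in for MACNETO's learned instruction features, which this sketch does not reproduce):

```python
# Sketch: identifier renaming changes names but not the distribution
# of instruction kinds, so comparing opcode histograms can match a
# program with its renamed/obfuscated version.

from collections import Counter
from math import sqrt

def opcode_histogram(instructions):
    """Count opcode kinds, ignoring operand names entirely."""
    return Counter(op for op, *_ in instructions)

def cosine(h1, h2):
    keys = set(h1) | set(h2)
    dot = sum(h1[k] * h2[k] for k in keys)
    n1 = sqrt(sum(v * v for v in h1.values()))
    n2 = sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2)

original  = [("load", "x"), ("load", "y"), ("add",), ("store", "z")]
renamed   = [("load", "a"), ("load", "b"), ("add",), ("store", "c")]
unrelated = [("call", "f"), ("ret",)]

sim_clone = cosine(opcode_histogram(original), opcode_histogram(renamed))
sim_other = cosine(opcode_histogram(original), opcode_histogram(unrelated))
```

Renaming every operand leaves the histogram, and hence the similarity, unchanged; a control-flow-changing obfuscator is harder, which is where the learned features come in.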
@InProceedings{MAPL18p20,
author = {Fang-Hsiang Su and Jonathan Bell and Gail Kaiser and Baishakhi Ray},
title = {Obfuscation Resilient Search through Executable Classification},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {20--30},
doi = {10.1145/3211346.3211352},
year = {2018},
}
Retrieval on Source Code: A Neural Code Search
Saksham Sachdev, Hongyu Li, Sifei Luan, Seohyun Kim,
Koushik Sen, and Satish Chandra
(University of Waterloo, Canada; Facebook, USA; University of California at Berkeley, USA)
Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers can express their intent in natural language, similar to the interaction that Stack Overflow supports.
In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e., without having a curated Q&A forum such as Stack Overflow at hand.
Our experiments using a benchmark suite derived from Stack Overflow and GitHub repositories show promising results. We find that while a basic word-embedding based search procedure works acceptably, better results can be obtained by adding a layer of supervision, as well as by a customized ranking strategy.
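A minimal sketch of the unsupervised word-embedding baseline (the `EMB` table, its vector values, and the snippet corpus are all invented for illustration, not the paper's trained embeddings):

```python
# Sketch: embed a query and each code document as the average of its
# token vectors, then rank documents by cosine similarity.

from math import sqrt

# Tiny hand-made "embedding table" (hypothetical 2-D values):
EMB = {
    "read":    (1.0, 0.0), "file":    (0.9, 0.1),
    "open":    (0.8, 0.2), "socket":  (0.0, 1.0),
    "connect": (0.1, 0.9),
}

def embed(tokens):
    """Average the vectors of the tokens we have embeddings for."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    n = len(vecs)
    return tuple(sum(v[i] for v in vecs) / n for i in range(2))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

corpus = {
    "snippet_io":  ["open", "file", "read"],
    "snippet_net": ["socket", "connect"],
}
query = ["read", "file"]
ranked = sorted(corpus,
                key=lambda k: cosine(embed(query), embed(corpus[k])),
                reverse=True)
```

The supervision layer the paper adds can be thought of as replacing the plain average with learned weights so that query and code vocabularies align better.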
@InProceedings{MAPL18p31,
author = {Saksham Sachdev and Hongyu Li and Sifei Luan and Seohyun Kim and Koushik Sen and Satish Chandra},
title = {Retrieval on Source Code: A Neural Code Search},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {31--41},
doi = {10.1145/3211346.3211353},
year = {2018},
}
New Languages
Diesel: DSL for Linear Algebra and Neural Net Computations on GPUs
Venmugil Elango, Norm Rubin, Mahesh Ravishankar, Hariharan Sandanagobalane, and
Vinod Grover
(NVIDIA, USA)
We present Diesel, a domain-specific language compiler for basic linear algebra and neural network computations that accepts input expressions in an intuitive form and generates high-performing code for GPUs. The current trend is to represent a neural network as a computation DAG, where each node corresponds to a single operation such as matrix-matrix multiplication, and to map the individual operations to hand-tuned library functions provided by standard libraries such as cuBLAS and cuDNN. While this method takes advantage of readily available optimized library code to achieve good performance for individual operations, it cannot optimize across operations. In contrast, given a computation composed of several operations, Diesel generates a set of efficient device functions in which the code is optimized for the computation as a whole, using polyhedral compilation techniques. In addition, there are cases where the code needs to be specialized for specific problem sizes to achieve optimal performance. While standard libraries are written for parametric problem sizes (where problem sizes are provided at runtime), Diesel can accept problem sizes at compile time and generate specialized code. Experimental results show that the performance achieved by Diesel-generated code for individual operations is comparable to the highly tuned versions provided by standard libraries, while for composite computations Diesel outperforms manually written versions.
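Why optimizing across operations pays off can be illustrated with a fused versus unfused composite (plain Python as a stand-in for generated GPU code; the `relu(A @ B)` example is hypothetical, not a Diesel program):

```python
# Sketch: computing relu(A @ B) as two separate "library calls"
# materializes the intermediate product, while a fused loop applies
# the nonlinearity as each element is produced -- one pass, no
# intermediate buffer.

def matmul(a, b):
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def relu(mat):
    return [[max(0.0, x) for x in row] for row in mat]

def fused_matmul_relu(a, b):
    """One pass: the intermediate matrix is never materialized."""
    n, k, m = len(a), len(b), len(b[0])
    return [[max(0.0, sum(a[i][p] * b[p][j] for p in range(k)))
             for j in range(m)] for i in range(n)]

A = [[1.0, -2.0], [0.5, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
```

On a GPU the fused version saves a full round trip of the intermediate through memory, which is the kind of cross-operation optimization a per-operation library cannot perform.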
@InProceedings{MAPL18p42,
author = {Venmugil Elango and Norm Rubin and Mahesh Ravishankar and Hariharan Sandanagobalane and Vinod Grover},
title = {Diesel: DSL for Linear Algebra and Neural Net Computations on GPUs},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {42--51},
doi = {10.1145/3211346.3211354},
year = {2018},
}
A Design Proposal for Gen: Probabilistic Programming with Fast Custom Inference via Code Generation
Marco Cusumano-Towner and
Vikash K. Mansinghka
(Massachusetts Institute of Technology, USA)
Probabilistic programming languages have the potential to make probabilistic modeling and inference easier to use in practice, but only if inference is sufficiently fast and accurate for real applications. Thus far, this has only been possible for domain-specific languages that focus on a restricted class of models and inference algorithms. This paper proposes a design for a probabilistic programming language called Gen, embedded in Julia, that aims to be sufficiently expressive and performant for general-purpose use. The language provides constructs for automatically generating optimized implementations of custom inference tactics based on static analysis of the target probabilistic model. This paper informally describes a language design for Gen, and shows that Gen is more expressive than Stan, a widely used language for hierarchical Bayesian modeling. A first benchmark shows that a prototype implementation of Gen can be as fast as Stan, only ∼1.4x slower than a hand-coded sampler in Julia, and ∼7,500x faster than Venture, one of the only other probabilistic languages with support for custom inference.
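The kind of inference computation such a language automates can be sketched with a hand-written importance sampler (plain Python, not Gen's API; the coin-bias model is an invented example):

```python
# Sketch: score latent proposals against observed data and combine
# them by weight -- self-normalized importance sampling for the bias
# of a coin, with a Uniform(0, 1) prior.

import random

def likelihood(bias, flips):
    """Probability of the observed flips given a coin bias."""
    p = 1.0
    for heads in flips:
        p *= bias if heads else (1.0 - bias)
    return p

def importance_sample(flips, n=2000, seed=0):
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(n)]          # draws from prior
    weights = [likelihood(b, flips) for b in samples]   # score vs. data
    total = sum(weights)
    # Posterior-mean estimate via self-normalized importance weights:
    return sum(b * w for b, w in zip(samples, weights)) / total

flips = [True] * 8 + [False] * 2   # 8 heads out of 10
est = importance_sample(flips)      # true posterior mean is 9/12 = 0.75
```

A system like Gen aims to generate specialized code for loops like these from the model and the chosen inference tactic, rather than having the user write them by hand.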
@InProceedings{MAPL18p52,
author = {Marco Cusumano-Towner and Vikash K. Mansinghka},
title = {A Design Proposal for Gen: Probabilistic Programming with Fast Custom Inference via Code Generation},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {52--57},
doi = {10.1145/3211346.3211350},
year = {2018},
}
Relay: A New IR for Machine Learning Frameworks
Jared Roesch, Steven Lyubomirsky, Logan Weber, Josh Pollock, Marisa Kirisame, Tianqi Chen, and
Zachary Tatlock
(University of Washington, USA)
Machine learning powers diverse services in industry, including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely functional, statically typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open-source NNVM compiler framework, which powers Amazon's deep learning framework MXNet.
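What a purely functional, statically typed tensor IR might look like can be sketched with a toy expression language (a hypothetical illustration; the node names and checker are not Relay's actual definition):

```python
# Sketch: every expression is an immutable node, and a checker assigns
# each node a tensor shape, rejecting ill-typed programs before any
# compilation or execution happens.

from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str
    shape: tuple

@dataclass(frozen=True)
class Add:            # elementwise: both operands must share a shape
    lhs: object
    rhs: object

@dataclass(frozen=True)
class Dense:          # matrix multiply: (n, k) x (k, m) -> (n, m)
    data: object
    weight: object

def infer_shape(expr):
    if isinstance(expr, Var):
        return expr.shape
    if isinstance(expr, Add):
        l, r = infer_shape(expr.lhs), infer_shape(expr.rhs)
        if l != r:
            raise TypeError(f"shape mismatch in add: {l} vs {r}")
        return l
    if isinstance(expr, Dense):
        (n, k1), (k2, m) = infer_shape(expr.data), infer_shape(expr.weight)
        if k1 != k2:
            raise TypeError(f"shape mismatch in dense: {k1} vs {k2}")
        return (n, m)
    raise TypeError(f"unknown node: {expr!r}")

x = Var("x", (32, 784))
w = Var("w", (784, 10))
b = Var("b", (32, 10))
out_shape = infer_shape(Add(Dense(x, w), b))
```

Because nodes are immutable and types are known statically, a compiler over such an IR can rewrite and fuse expressions freely while preserving meaning.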
@InProceedings{MAPL18p58,
author = {Jared Roesch and Steven Lyubomirsky and Logan Weber and Josh Pollock and Marisa Kirisame and Tianqi Chen and Zachary Tatlock},
title = {Relay: A New IR for Machine Learning Frameworks},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {58--68},
doi = {10.1145/3211346.3211348},
year = {2018},
}
Programming Methodology
The Three Pillars of Machine Programming
Justin Gottschlich,
Armando Solar-Lezama, Nesime Tatbul,
Michael Carbin,
Martin Rinard, Regina Barzilay,
Saman Amarasinghe,
Joshua B. Tenenbaum, and Tim Mattson
(Intel Labs, USA; Massachusetts Institute of Technology, USA)
In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.
@InProceedings{MAPL18p69,
author = {Justin Gottschlich and Armando Solar-Lezama and Nesime Tatbul and Michael Carbin and Martin Rinard and Regina Barzilay and Saman Amarasinghe and Joshua B. Tenenbaum and Tim Mattson},
title = {The Three Pillars of Machine Programming},
booktitle = {Proc.\ MAPL},
publisher = {ACM},
pages = {69--80},
doi = {10.1145/3211346.3211355},
year = {2018},
}