MAPL 2019 – Proceedings

Welcome from the Chairs
Welcome to the third ACM SIGPLAN workshop on Machine Learning and Programming Languages (MAPL), co-located with PLDI in Phoenix, Arizona. The focus of MAPL is to advance interdisciplinary research across the fields of machine learning (ML) and programming languages (PL).
The workshop was born of the belief that although significant advances have been made in both ML and PL, there are many open research problems that are likely to be best solved using a combination of the two. These include the development of machine learning to support novel programming systems that are better able to capture programmer's intentions, invent new algorithms, and potentially help with the autonomous generation and optimization of software programs. It also includes the use of programming systems technologies to improve machine learning infrastructure and to support learning with stronger guarantees.

Papers

Machine Learning in Python with No Strings Attached
Guillaume Baudart, Martin Hirzel, Kiran Kate, Louis Mandel

, and Avraham Shinnar

(IBM Research, USA)
Machine-learning frameworks in Python, such as scikit-learn, Keras, Spark, or Pyro, use embedded domain specific languages (EDSLs) to assemble a computational graph. Unfortunately, these EDSLs make heavy use of strings as names for computational graph nodes and other entities, leading to repetitive and hard-to-maintain code that does not benefit from standard Python tooling. This paper proposes eliminating strings where possible, reusing Python variable names instead. We demonstrate this on two examples from opposite ends of the design space: Keras.na, a light-weight wrapper around the Keras library, and , a new embedding of Stan into Python. Our techniques do not require modifications to the underlying library. Avoiding strings removes redundancy, simplifies maintenance, and enables Python tooling to better reason about the code and assist users.

Publisher's Version

Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations
Philippe Tillet, H. T. Kung, and David Cox
(Harvard University, USA; IBM, USA)
The validation and deployment of novel research ideas in the field of Deep Learning is often limited by the availability of efficient compute kernels for certain basic primitives. In particular, operations that cannot leverage existing vendor libraries (e.g., cuBLAS, cuDNN) are at risk of facing poor device utilization unless custom implementations are written by experts – usually at the expense of portability. For this reason, the development of new programming abstractions for specifying custom Deep Learning workloads at a minimal performance cost has become crucial.
We present Triton, a language and compiler centered around the concept of tile, i.e., statically shaped multi-dimensional sub-arrays. Our approach revolves around (1) a C-based language and an LLVM-based intermediate representation (IR) for expressing tensor programs in terms of operations on parametric tile variables and (2) a set of novel tile-level optimization passes for compiling these programs into efficient GPU code. We demonstrate how Triton can be used to build portable implementations of matrix multiplication and convolution kernels on par with hand-tuned vendor libraries (cuBLAS / cuDNN), or for efficiently implementing recent research ideas such as shift convolutions.

Publisher's Version

Info

HackPPL: A Universal Probabilistic Programming Language
Jessica Ai, Nimar S. Arora, Ning Dong, Beliz Gokkaya, Thomas Jiang, Anitha Kubendran, Arun Kumar, Michael Tingley, and Narjes Torabi
(Facebook, USA)
HackPPL is a probabilistic programming language (PPL) built within the Hack programming language. Its universal inference engine allows developers to perform inference across a diverse set of models expressible in arbitrary Hack code. Through language-level extensions and direct integration with developer tools, HackPPL aims to bridge the gap between domain-specific and embedded PPLs. This paper overviews the design and implementation choices for the HackPPL toolchain and presents findings by applying it to a representative problem faced by social media companies.

Publisher's Version

Neural Query Expansion for Code Search
Jason Liu, Seohyun Kim, Vijayaraghavan Murali, Swarat Chaudhuri, and Satish Chandra

(Facebook, USA; Rice University, USA)
Searching repositories of existing source code for code snippets is a key task in software engineering. Over the years, many approaches to this problem have been proposed. One recent tool called NCS, takes in a natural language query and outputs relevant code snippets, often being able to correctly answer Stack Overflow questions. But what happens when the developer doesn’t provide a query with a clear intent? What if shorter queries are used to demonstrate a more vague intent?
We find that the performance of NCS regresses with shorter queries. Furthermore, data from developers’ code search history logs shows that shorter queries have a less successful code search session: there are more query reformulations and more time is spent browsing the results. These observations lead us to believe that using NCS alone with short queries may not be productive enough.
In this paper, we explore an additional way of using neural networks in code search: the automatic expansion of queries. We present NQE, a neural model that takes in a set of keywords and predicts a set of keywords to expand the query to NCS. NQE learns to predict keywords that co-occur with the query keywords in the underlying corpus, which helps expand the query in a productive way. Our results show that with query expansion, NQE + NCS is able to perform better than using NCS alone.

Publisher's Version

A Case Study on Machine Learning for Synthesizing Benchmarks
Andrés Goens, Alexander Brauckmann, Sebastian Ertel, Chris Cummins, Hugh Leather, and Jeronimo Castrillon

(TU Dresden, Germany; University of Edinburgh, UK)
Good benchmarks are hard to find because they require a substantial effort to keep them representative for the constantly changing challenges of a particular field. Synthetic benchmarks are a common approach to deal with this, and methods from machine learning are natural candidates for synthetic benchmark generation. In this paper we investigate the usefulness of machine learning in the prominent CLgen benchmark generator. We re-evaluate CLgen by comparing the benchmarks generated by the model with the raw data used to train it. This re-evaluation indicates that, for the use case considered, machine learning did not yield additional benefit over a simpler method using the raw data. We investigate the reasons for this and provide further insights into the challenges the problem could pose for potential future generators.

Publisher's Version

MAPL 2019 – Proceedings

3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2019)

Frontmatter

Papers