MAPL 2017 – Proceedings

Message from the Chairs
Welcome to the first ACM SIGPLAN workshop on Machine Learning and Programming Languages (MAPL), co-located with PLDI in Barcelona, Spain, June 18, 2017. The focus of MAPL is to advance interdisciplinary research across the fields of machine learning (ML) and programming languages (PL).
Although significant advances have been made in both ML and PL, individually, we believe there are many open research problems that are likely to be best solved using a combination of the two. Some of those areas aim to simplify programmer intention, use ML for algorithmic invention, as well as provide infrastructure for the autonomous generation and optimization of software programs. It is for these reasons that we formed MAPL -- to help usher in a new wave of research in these domains.

Languages and Frameworks

A Computational Model for TensorFlow: An Introduction
Martín Abadi, Michael Isard, and Derek G. Murray
(Google Brain, USA)
TensorFlow is a powerful, programmable system for machine learning. This paper aims to provide the basics of a conceptual framework for understanding the behavior of TensorFlow models during training and inference: it describes an operational semantics, of the kind common in the literature on programming languages. More broadly, the paper suggests that a programming-language perspective is fruitful in designing and in explaining systems such as TensorFlow.

Dyna: Toward a Self-Optimizing Declarative Language for Machine Learning Applications
Tim Vieira, Matthew Francis-Landau, Nathaniel Wesley Filardo, Farzad Khorasani, and Jason Eisner
(Johns Hopkins University, USA; Rice University, USA)
Declarative programming is a paradigm that allows programmers to specify what they want to compute, leaving how to compute it to a solver. Our declarative programming language, Dyna, is designed to compactly specify computations like those that are frequently encountered in machine learning. As a declarative language, Dyna’s solver has a large space of (correct) strategies available to it. We describe a reinforcement learning framework for adaptively choosing among these strategies to maximize efficiency for a given workload. Adaptivity in execution is especially important for software that will run under a variety of workloads, where no fixed policy works well. We hope that reinforcement learning will identify good policies reasonably quickly—offloading the burden of writing efficient code from human programmers.

Debugging, Analysis, and Verification

Debugging Probabilistic Programs
Chandrakana Nandi

, Dan Grossman, Adrian Sampson, Todd Mytkowicz, and Kathryn S. McKinley

(University of Washington, USA; Cornell University, USA; Microsoft Research, USA; Google, USA)
Many applications compute with estimated and uncertain data. While advances in probabilistic programming help developers build such applications, debugging them remains extremely challenging. New types of errors in probabilistic programs include 1) ignoring dependencies and correlation between random variables and in training data, 2) poorly chosen inference hyper-parameters, and 3) incorrect statistical models. A partial solution to prevent these errors in some languages forbids developers from explicitly invoking inference. While this prevents some dependence errors, it limits composition and control over inference, and does not guarantee absence of other types of errors. This paper presents the FLEXI programming model which supports constructs for invoking inference in the language and reusing the results in other statistical computations. We define a novel formalism for inference with a Decorated Bayesian Network and present a tool, DePP, that analyzes this representation to identify the above errors. We evaluate DePP on a range of prototypical examples to show how it helps developers to detect errors.

Combining the Logical and the Probabilistic in Program Analysis
Xin Zhang, Xujie Si, and Mayur Naik

(Georgia Institute of Technology, USA; University of Pennsylvania, USA)
Conventional program analyses have made great strides by leveraging logical reasoning. However, they cannot handle uncertain knowledge, and they lack the ability to learn and adapt. This in turn hinders the accuracy, scalability, and usability of program analysis tools in practice. We seek to address these limitations by proposing a methodology and framework for incorporating probabilistic reasoning directly into existing program analyses that are based on logical reasoning. We demonstrate that the combined approach can benefit a number of important applications of program analysis and thereby facilitate more widespread adoption of this technology.

Learning a Classifier for False Positive Error Reports Emitted by Static Code Analysis Tools
Ugur Koc, Parsa Saadatpanah, Jeffrey S. Foster, and Adam A. Porter
(University of Maryland at College Park, USA)
The large scale and high complexity of modern software systems make perfectly precise static code analysis (SCA) infeasible. Therefore SCA tools often over-approximate, so not to miss any real problems. This, however, comes at the expense of raising false alarms, which, in practice, reduces the usability of these tools. To partially address this problem, we propose a novel learning process whose goal is to discover program structures that cause a given SCA tool to emit false error reports, and then to use this information to predict whether a new error report is likely to be a false positive as well. To do this, we first preprocess code to isolate the locations that are related to the error report. Then, we apply machine learning techniques to the preprocessed code to discover correlations and to learn a classifier. We evaluated this approach in an initial case study of a widely-used SCA tool for Java. Our results showed that for our dataset we could accurately classify a large majority of false positive error reports. Moreover, we identified some common coding patterns that led to false positive errors. We believe that SCA developers may be able to redesign their methods to address these patterns and reduce false positive error reports.

Verified Perceptron Convergence Theorem
Charlie Murphy, Patrick Gray, and Gordon Stewart
(Princeton University, USA; Ohio University, USA)
Frank Rosenblatt invented the perceptron algorithm in 1957 as part of an early attempt to build ``brain models'', artificial neural networks. In this paper, we apply tools from symbolic logic such as dependent type theory as implemented in Coq to build, and prove convergence of, one-layer perceptrons (specifically, we show that our Coq implementation converges to a binary classifier when trained on linearly separable datasets).
Our perceptron and proof are extensible, which we demonstrate by adapting our convergence proof to the averaged perceptron, a common variant of the basic perceptron algorithm. We perform experiments to evaluate the performance of our Coq perceptron vs. an arbitrary-precision C++ implementation and against a hybrid implementation in which separators learned in C++ are certified in Coq. We find that by carefully optimizing the extraction of our Coq perceptron, we can meet -- and occasionally exceed -- the performance of the arbitrary-precision C++ implementation. Our hybrid Coq certifier demonstrates an architecture for building high-assurance machine-learning systems that reuse existing codebases.

Info

MAPL 2017 – Proceedings

1st ACM SIGPLAN Workshop on Machine Learning and Programming Languages (MAPL 2017)

Frontmatter

Languages and Frameworks

Debugging, Analysis, and Verification