Onward! 2017 – Proceedings

Message from the Chairs
Onward! is a premier multidisciplinary conference focused on everything to do with programming and software, including processes, methods, languages, communities and applications. Onward! is more radical, more visionary and more open than other conferences to ideas that are well-argued but not yet proven. We welcome different ways of thinking about, approaching and reporting on programming language and software engineering research. Onward! 2017 is part of SPLASH 2017, taking place in Vancouver, Canada on October 22--27, 2017.

Papers

Language Design

Can We Crowdsource Language Design?
Preston Tunnell Wilson, Justin Pombrio, and Shriram Krishnamurthi
(Brown University, USA)
Most programming languages have been designed by committees or individuals. What happens if, instead, we throw open the design process and let lots of programmers weigh in on semantic choices? Will they avoid well-known mistakes like dynamic scope? What do they expect of aliasing? What kind of overloading behavior will they choose?
We investigate this issue by posing questions to programmers on Amazon Mechanical Turk. We examine several language features, in each case using multiple-choice questions to explore programmer preferences. We check the responses for consensus (agreement between people) and consistency (agreement across responses from one person). In general we find low consistency and consensus, potential confusion over mainstream features, and arguably poor design choices. In short, this preliminary evidence does not argue in favor of designing languages based on programmer preference.

Assessing User Preferences in Programming Language Design
Roger D. Chamberlain
(Washington University at St. Louis, USA)
The design of new programming languages has primarily been guided by the preferences of a few (the authors of the language), rather than systematic study of the various options available. This is in part due to the fact that user studies to effectively test usability or understandability hypotheses are cumbersome and expensive. An interesting question is whether crowdsourcing techniques can be leveraged to improve this situation.
We explore this idea using a specific example. While the streaming data paradigm is a popular one for expressing parallelism within applications, there has been little consensus on the methods used to express streaming topologies. Here, we explore the use of Mechanical Turk to recruit self-described programmers as a community to assess user preferences and code readability for two techniques currently in use for the expression of streaming application topology.
The positive results of this study point to the idea that crowdsourcing techniques can be an effective technique that can inexpensively assist language developers in making good design choices.

Replacing Phrase Structure Grammar with Dependency Grammar in the Design and Implementation of Programming Languages
Friedrich Steimann

(Fernuniversität in Hagen, Germany)
For decades, the design and implementation of programming languages has been based on phrase structure grammars, which divide a program recursively into phrases (represented by nonterminals), leaving the words of the program (the terminals) as the atoms of this division. By contrast, dependency grammar suggests that dependency between words, not constituency between phrases, is the primary grammatical relationship, and that a grammar capturing this relationship is basically a lexicon.
In this paper, I suggest the use of dependency grammar for the design and implementation of programming languages. This allows me to unify the dictionaries populated by programs (with variables, procedures, functions, etc.) written in a programming language with the grammar of this language. I call the compilation of a programming language and its programs into a lexicon a langram and, using the example of the while language, show how langrams serve the “growing of languages”.

Program Generation and Synthesis

Generating Chat Bots from Web API Specifications
Mandana Vaziri, Louis Mandel

, Avraham Shinnar

, Jérôme Siméon, and Martin Hirzel
(IBM Research, USA)
Companies want to offer chat bots to their customers and employees which can answer questions, enable self-service, and showcase their products and services. Implementing and maintaining chat bots by hand costs time and money. Companies typically have web APIs for their services, which are often documented with an API specification. This paper presents a compiler that takes a web API specification written in Swagger and automatically generates a chat bot that helps the user make API calls. The generated bot is self-documenting, using descriptions from the API specification to answer help requests. Unfortunately, Swagger specifications are not always good enough to generate high-quality chat bots. This paper addresses this problem via a novel in-dialogue curation approach: the power user can improve the generated chat bot by interacting with it. The result is then saved back as an API specification. This paper reports on the design and implementation of the chat bot compiler, the in-dialogue curation, and working case studies.

ChimpCheck: Property-Based Randomized Test Generation for Interactive Apps
Edmund S. L. Lam, Peilun Zhang, and Bor-Yuh Evan Chang

(University of Colorado at Boulder, USA)
We consider the problem of generating relevant execution traces to test rich interactive applications. Rich interactive applications, such as apps on mobile platforms, are complex stateful and often distributed systems where sufficiently exercising the app with user-interaction (UI) event sequences to expose defects is both hard and time-consuming. In particular, there is a fundamental tension between brute-force random UI exercising tools, which are fully-automated but offer low relevance, and UI test scripts, which are manual but offer high relevance. In this paper, we consider a middle way---enabling a seamless fusion of scripted and randomized UI testing. This fusion is prototyped in a testing tool called ChimpCheck for programming, generating, and executing property-based randomized test cases for Android apps. Our approach realizes this fusion by offering a high-level, embedded domain-specific language for defining custom generators of simulated user-interaction event sequences. What follows is a combinator library built on industrial strength frameworks for property-based testing (ScalaCheck) and Android testing (Android JUnit and Espresso) to implement property-based randomized testing for Android development. Driven by real, reported issues in open source Android apps, we show, through case studies, how ChimpCheck enables expressing effective testing patterns in a compact manner.

Unbounded Superoptimization
Abhinav Jangda and Greta Yorsh
(IIT Varanasi, India; Queen Mary University of London, UK)
Our aim is to enable software to take full advantage of the capabilities of emerging microprocessor designs without modifying the compiler.
Towards this end, we propose a new approach to code generation and optimization. Our approach uses an SMT solver in a novel way to generate efficient code for modern architectures and guarantee that the generated code correctly implements the source code. The distinguishing characteristic of our approach is that the size of the constraints does not depend on the candidate sequence of instructions.
To study the feasibility of our approach, we implemented a preliminary prototype, which takes as input LLVM IR code and uses Z3 SMT solver to generate ARMv7-A assembly. The prototype handles arbitrary loop-free code (not only basic blocks) as input and output. We applied it to small but tricky examples used as standard benchmarks for other superoptimization and synthesis tools. We are encouraged to see that Z3 successfully solved complex constraints that arise from our approach.
This work paves the way to employing recent advances in SMT solvers and has a potential to advance SMT solvers further by providing a new category of challenging benchmarks that come from an industrial application domain.

Programming Models

The Serverless Trilemma: Function Composition for Serverless Computing
Ioana Baldini, Perry Cheng, Stephen J. Fink, Nick Mitchell, Vinod Muthusamy, Rodric Rabbah, Philippe Suter, and Olivier Tardieu

(IBM Research, USA; Two Sigma, USA)
The field of serverless computing has recently emerged in support of highly scalable, event-driven applications. A serverless application is a set of stateless functions, along with the events that should trigger their activation. A serverless runtime allocates resources as events arrive, avoiding the need for costly pre-allocated or dedicated hardware.
While an attractive economic proposition, serverless computing currently lags behind the state of the art when it comes to function composition. This paper addresses the challenge of programming a composition of functions, where the composition is itself a serverless function.
We demonstrate that engineering function composition into a serverless application is possible, but requires a careful evaluation of trade-offs. To help in evaluating these trade-offs, we identify three competing constraints: functions should be considered as black boxes; function composition should obey a substitution principle with respect to synchronous invocation; and invocations should not be double-billed.
Furthermore, we argue that, if the serverless runtime is limited to a reactive core, i.e. one that deals only with dispatching functions in response to events, then these constraints form the serverless trilemma. Without specific runtime support, compositions-as-functions must violate at least one of the three constraints.
Finally, we demonstrate an extension to the reactive core of an open-source serverless runtime that enables the sequential composition of functions in a trilemma-satisfying way. We conjecture that this technique could be generalized to support other combinations of functions.

Encoding the Building Blocks of Communication
Aleksandar Prokopec

(Oracle Labs, Switzerland)
Distributed systems are often built from the simplest building blocks such as message sends and RPCs. Since many communication patterns have to be reinvented every time a distributed system is created, implementing a high-level system is usually expensive. The recently proposed reactor model alleviates this cost by expressing distributed computations as reusable components, however, encodings for various communications patterns in this model are missing.
This paper investigates how to encode the router, client-server, scatter-gather, rendezvous, two-way communication, reliable communication and the backpressure protocol in the reactor model. These protocols are used to implement the core of a distributed streaming framework, and the performance of these implementations is evaluated.

Iᴏᴛᴀ: A Calculus for Internet of Things Automation
Julie L. Newcomb, Satish Chandra

, Jean-Baptiste Jeannin, Cole Schlesinger, and Manu Sridharan

(University of Washington, USA; Samsung Research, USA)
Programmatically controllable home devices are proliferating, ranging from lights, locks, and motion sensors to smart refrigerators, televisions, and cameras, giving end users unprecedented control over their environment. New domain-specific languages are emerging to supplant general purpose programming platforms as a means for end users to configure home automation. These languages, based on event-condition-action (ECA) rules, have an appealing simplicity. But programmatic control lets users write programs with bugs, introducing the frustrations of software engineering with none of the tool support. The subtle semantics of the home automation domain---and the varying interfaces and implementation strategies that existing home automation platforms use---exacerbates the problem. In this work, we present the Internet of Things Automation (Iota) calculus, the first calculus for the domain of home automation. Iota models an ECA language equipped with first-class notions of time, state, and device aggregation, and comes equipped with a precise semantics inspired by a careful analysis of five existing home automation platforms. We show that the Iota calculus is useful by implementing two analyses from the software engineering literature, and expressive by encoding sixteen programs from these home automation platforms. Along the way, we highlight where the design of the Iota semantics rules out subtle classes of bugs.

Usability and Performance

Error Messages Are Classifiers: A Process to Design and Evaluate Error Messages
John Wrenn and Shriram Krishnamurthi
(Brown University, USA)
This paper presents a lightweight process to guide error report authoring. We take the perspective that error reports are really classifiers of program information. They should therefore be subjected to the same measures as other classifiers (e.g., precision and recall). We formalize this perspective as a process for assessing error reports, describe our application of this process to an actual programming language, and present a preliminary study on the utility of the resulting error reports.

You Can Have It All: Abstraction and Good Cache Performance
Juliana Franco, Martin Hagelin, Tobias Wrigstad

, Sophia Drossopoulou

, and Susan Eisenbach

(Imperial College London, UK; Dirac, Sweden; Uppsala University, Sweden)
On current architectures, the optimisation of an application's performance often involves data being stored according to access affinity --- what is accessed together should be stored together, rather than logical affinity --- what belongs together logically stays together. Such low level techniques lead to faster, but more error prone code, and end up tangling the program's logic with low-level data layout details.
Our vision, which we call SHAPES --- Safe, High-level, Abstractions for oPtimisation of mEmory cacheS --- is that the layout of a data structure should be defined only once, upon instantiation, and the remainder of the code should be layout agnostic. This enables performance improvements while also guaranteeing memory safety, and supports the separation of program logic from low level concerns. In this paper we investigate how this vision can be supported by extending a programming language.
We describe the core language features supporting this vision: classes can be customized to support different layouts, and layout information is carried around in types; the remaining source code is layout-unaware and the compiler emits layout-aware code. We then discuss our SHAPES implementation through a prototype library, which we also used for preliminary evaluations. Finally, we discuss how the core could be expanded so as to deliver SHAPES's full potential: the incorporation of compacting garbage collection, ad hoc polymorphism and late binding, synchronization of representations of different collections, support for dynamic change of representation, etc.

Garbology: A Study of How Java Objects Die
Raoul L. Veroy and Samuel Z. Guyer
(Tufts University, USA)
In this paper we present a study of how Java programs dispose of objects. Unlike prior work on object demographics and lifetime patterns, our goal is to precisely characterize the actions that cause objects to become unreachable. We use a recently-developed tracing tool, called Elephant Tracks, which can localize object deaths within a specific method and tell us the proximal cause. Our analysis centers around garbage clusters: groups of connected objects that become unreachable at precisely the same time due to a single program action. We classify these clusters using traditional features, such as size, allocation site, and lifetime, and using new ones, such as death site and cause of death. We then use this knowledge about garbage clusters in a new GC algorithm: the Cluster Aware Garbage Collection (CAGC) algorithm.
We present results for a set of standard benchmarks including SPECJVM98, SPECjbb, and DaCapo. We identify several patterns that could inform the design of new collectors or tuning of existing systems. Most garbage clusters are small, suggesting that these programs almost always dispose of large structures in a piecemeal fashion. In addition, most clusters die in one of only a dozen or so places in the program. Furthermore, these death sites are much more stable and predictable than object lifetimes. Finally we show that the CAGC algorithm can in certain cases improve GC performance.

New Languages

Infra: Structure All the Way Down: Structured Data as a Visual Programming Language
Christopher Hall, Trevor Standley, and Tobias Hollerer
(University of California at Santa Barbara, USA; Stanford University, USA)
We present Infra, a new baseline medium for representing data. With Infra, arbitrarily-complex structured data can be encoded, viewed, edited, and processed, all while remaining in an efficient non-textual form. It is suitable for the full range of information modalities, from free-form input, to compact schema-conforming structures. With its own equivalent of a text editor and text-field widget, Infra is designed to target the domain currently dominated by flat character strings while simultaneously enabling the expression of sub-structure, inter-reference, dynamic dependencies, abstraction, computation, and context (metadata).
Existing metaformats fit neatly into two categories. They are either textual for human readability (such as XML and JSON) or binary for compact serialization (such as Thrift and Protocol Buffers). In contrast, Infra unifies those two paradigms. In order to have the desirable properties of binary formats, Infra has no textual representation. And yet, it is designed to be easily read and authored by end-users.
We show how the organization Infra brings to data makes a new non-textual programming paradigm viable. Programs that modify data can now be embedded into the data itself. Furthermore, these programs can often be authored by demonstration. We argue that Infra can be used to improve existing software projects and that bringing direct authoring and human readability to a binary data paradigm could have rippling ramifications on the computing landscape.

Video

Info

Selfie and the Basics
Christoph M. Kirsch
(University of Salzburg, Austria)
Imagine a world in which virtually everyone at least intuitively understands the fundamental principles of information and computation. However, computer science, in research and in education, is still a young field compared to others and lacks maturity despite the enormous demand created by information technology. To address the problem we would like to encourage everyone in the computer science community to go back to their favorite topic and identify the absolute basics that they feel are essential for understanding the topic. We present here our experience in trying to do just that with programming languages and runtime systems as our favorite topic. We argue that understanding the construction of their semantics and the self-referentiality involved in that is essential for understanding computer science. We have developed selfie, a tiny self-compiling C compiler, self-executing MIPS emulator, and self-hosting MIPS hypervisor all implemented in a single, self-contained file using a tiny subset of C. Selfie has become the foundation of our classes on the design and implementation of programming languages and runtime systems. Teaching selfie has also helped us identify some of the absolute basics that we feel are essential for understanding computer science in general.

Info

Systems Level Liveness with Extempore
Andrew Sorensen and Henry Gardner
(Australian National University, Australia)
Live programs can be modified while the program is executing in order to provide a more reactive experience for the programmer. In demanding applications, such programs traditionally utilise pre-defined function calls to compiled libraries. We present a system that enables demanding live programs to be built where the supporting stack of libraries is, itself, live. In such situations, the top level code might be thought of as a simple ``live environment'' that can be created live and that encapsulates code that has ``bubbled-up'' from the supporting libraries. Our system enables this bubbling up to be achieved in an ad-hoc way and with minimal performance penalty. The deep, systems-level liveness that it exhibits is described and compared with other approaches to Live Coding and liveness generally.
The work described here has its origins in the artistic Live Coding of computer music and multimedia. We also discuss its wider uses including the development of interactive multimedia installations and the harnessing of scientific simulation.

Info

Essays

Some Were Meant for C: The Endurance of an Unmanageable Language
Stephen Kell
(University of Cambridge, UK)
The C language leads a double life: as an application programming language of yesteryear, perpetuated by circumstance; and as a systems programming language which remains a weapon of choice decades after its creation. This essay is a C programmer's reaction to the call to abandon ship. It questions several aspects commonly held to define the experience of using C; these include unsafety, undefined behaviour, and the motivation of performance. It argues all these are in fact inessential; rather, it traces C's ultimate strength to a communicative design which cannot be understood within the usual conception of "a programming language", but can be seen as the antithesis of so-called "managed" languages. This communicativity is understood as facilitating the essential aspect of system-building: creating parts which interact with other remote parts---being "alongside" not "within", and of "alien" origin.

Concept Analysis in Programming Language Research: Done Well It Is All Right
Antti-Juhani Kaijanaho
(University of Jyväskylä, Finland)
Programming language research is becoming method conscious. Rigorous mathematical or empirical evaluation is often demanded, which is a good thing. However, I argue in this essay that concept analysis is a legitimate research approach in programming languages, with important limitations. It can be used to sharpen vague concepts, and to expose distinctions that have previously been overlooked, but it does not demonstrate the superiority of one language design over another. Arguments and counter-arguments are essential to successful concept analysis, and such thoughtful conversations should be published more.

Panel

How Can Our Publication Models Best Serve Our Research? (Panel)
Robert Biddle
(Carleton University, Canada)
Research and publication have a long history together, with models that date back millennia. Our experience in Computer Science has much in common with other areas of scholarship, but also some distinctive characteristics. One issue discussed and explored extensively involves the role of conferences and journals. This panel has specific focus on how our publication model might best help the research itself. Issues that arise still include the pragmatic, but also a range of other topics particular to our discipline: the role of artifacts and empirical research, the potential for new tools and infrastructure similar to those we use for software, and how best to structure research publications to support practice.

Publisher's Version

Onward! 2017 – Proceedings

Frontmatter

Papers

Language Design

Program Generation and Synthesis

Programming Models

Usability and Performance

New Languages

Essays

Panel