GPCE 2017
16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2017)

Powered by

Conference Publishing Consulting

16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2017), October 23–24, 2017, Vancouver, BC, Canada

GPCE 2017 – Proceedings

Contents - Abstracts - Authors

Frontmatter

Title Page

Message from the Chairs
Welcome to the 16th International Conference on Generative Programming: Concepts & Experiences (GPCE’17). GPCE continues to provide the premier venue for researchers and practitioners interested in techniques and tools for code generation, language implementation, and product-line development. In addition to exploring cutting-edge techniques of generative software, our goal is to foster further cross-fertilization between the software engineering and the programming languages research communities. This year, GPCE is co-located with SLE, the conference on software language engineering, and with various other events under the SPLASH umbrella. GPCE participants are invited to visit other sessions on the same day and vice versa. This provides the attendees of all events with a comprehensive overview of current research at the intersection of programming languages and software engineering.

Keynote

The Landscape of Refactoring Research in the Last Decade (Keynote)
Danny Dig
(Oregon State University, USA)
In the last decade refactoring research has seen an exponential growth. I will attempt to map this vast landscape and the advances that the community has made by answering questions such as who does what, when, where, with who, why, and how. I will muse on some of the factors contributing to the growth of the field, the adoption of research into industry, and the lessons that we learned along this journey. This will inspire and equip you so that you can make a difference, with people who make a difference, at a time when it makes a difference.

Publisher's Version

Staging

Refining Semantics for Multi-stage Programming
Rui Ge and Ronald Garcia

(University of British Columbia, Canada)
The multi-stage programming paradigm supports runtime code generation and execution. Though powerful, its potential is impeded by the lack of static analysis support. Van Horn and Might proposed a general-purpose approach to systematically develop static analyses by transforming an environmental abstract machine, which evolves a control string, an environment and a continuation as a program evaluates. To the best of our knowledge, no such semantics exists for a multi-stage language like MetaML.
We develop and prove correct an environmental abstract machine semantics for MetaML by gradually refining the reference substitutional structural operational semantics. Highlights of our approach include leveraging explicit substitutions to bridge the gap between substitutional and environmental semantics, and devising meta-environments to model the complexities of variable bindings in multi-stage environmental semantics.

Publisher's Version

Staging for Generic Programming in Space and Time
Georg Ofenbeck, Tiark Rompf

, and Markus Püschel

(ETH Zurich, Switzerland; Purdue University, USA)
Metaprogramming is among the most promising candidates to solve the abstraction vs performance trade-off that plagues software engineering through specialization. Metaprogramming has been used to enable low-overhead generic programming for a long time, with C++ templates being one of the most prominent examples. But often a single, fixed pattern of specialization is not enough, and more flexibility is needed. Hence, this paper seeks to apply generic programming techniques to challenges in metaprogramming, in particular to abstract over the execution stage of individual program expressions. We thus extend the scope of generic programming into the dimension of time. The resulting notion of stage polymorphism enables novel abstractions in the design of program generators, which we develop and explore in this paper. We present one possible implementation, in Scala using the lightweight modular staging (LMS) framework, and apply it to two important case studies: convolution on images and the fast Fourier transform (FFT).

Publisher's Version

Staging with Control: Type-Safe Multi-stage Programming with Control Operators
Junpei Oishi and Yukiyoshi Kameyama

(University of Tsukuba, Japan)
Staging allows a programmer to write domain-specific, custom code generators. Ideally, a programming language for staging provides all necessary features for staging, and at the same time, gives static guarantee for the safety properties of generated code including well typedness and well scopedness. We address this classic problem for the language with control operators, which allow code optimizations in a modular and compact way. Specifically, we design a staged programming language with the expressive control operators shift0 and reset0, which let us express, for instance, multi-layer let-insertion, while keeping the static guarantee of well typedness and well scopedness. For this purpose, we extend our earlier work on refined environment classifiers which were introduced for the staging language with state. We show that our language is expressive enough to express interesting code generation techniques, and that the type system enjoys type soundness. We also mention a type inference algorithm for our language under reasonable restriction.

Publisher's Version

Code Staging in GNU Guix
Ludovic Courtès
(Inria, France)
GNU Guix is a “functional” package manager that borrows from earlier work on Nix by Dolstra et al.. Guix implements high-level abstractions such as packages and operating system services as domain-specific languages (DSL) embedded in Scheme, and it also implements build actions and operating system orchestration in Scheme. This leads to a multi-tier programming environment where embedded code snippets are staged for eventual execution.
In this paper we present G-expressions or “gexps”. We explain our journey from traditional Lisp S-expressions to G-expressions, which augment the former with contextual information, and we discuss the implementation of gexps. We report on our experience using gexps in a variety of operating system use cases—from package build processes to system services. Gexps provide a novel way to cover many aspects of OS configuration in a single, multi-tier language while facilitating code reuse and code sharing.

Publisher's Version

Variability

A Classification of Variation Control Systems
Lukas Linsbauer, Thorsten Berger, and Paul Grünbacher
(JKU Linz, Austria; Chalmers University of Technology, Sweden; University of Gothenburg, Sweden)
Version control systems are an integral part of today's software and systems development processes. They facilitate the management of revisions (sequential versions) and variants (concurrent versions) of a system under development and enable collaboration between developers. Revisions are commonly maintained either per file or for the whole system. Variants are supported via branching or forking mechanisms that conceptually clone the whole system under development. It is known that such cloning practices come with disadvantages. In fact, while short-lived branches for isolated development of new functionality (a.k.a. feature branches) are well supported, dealing with long-term and fine-grained system variants currently requires employing additional mechanisms, such as preprocessors, build systems or custom configuration tools. Interestingly, the literature describes a number of variation control systems, which provide a richer set of capabilities for handling fine-grained system variants compared to the version control systems widely used today. In this paper we present a classification and comparison of selected variation control systems to get an understanding of their capabilities and the advantages they can offer. We discuss problems of variation control systems, which may explain their comparably low popularity. We also propose research activities we regard as important to change this situation.

Publisher's Version

Analyzing the Impact of Natural Language Processing over Feature Location in Models
Raúl Lapeña, Jaime Font, Óscar Pastor, and Carlos Cetina
(San Jorge University, Spain; Universitat Politècnica de València, Spain)
Feature Location (FL) is a common task in the Software Engineering field, specially in maintenance and evolution of software products. The results of FL depend in a great manner in the style in which Feature Descriptions and software artifacts are written. Therefore, Natural Language Processing (NLP) techniques are used to process them. Through this paper, we analyze the influence of the most common NLP techniques over FL in Conceptual Models through Latent Semantic Indexing, and the influence of human participation when embedding domain knowledge in the process. We evaluated the techniques in a real-world industrial case study in the rolling stocks domain.

Publisher's Version

How Preprocessor Annotations (Do Not) Affect Maintainability: A Case Study on Change-Proneness
Wolfram Fenske, Sandro Schulze, and Gunter Saake
(University of Magdeburg, Germany)
Preprocessor annotations (e.g., #ifdef in C) enable the development of similar, but distinct software variants from a common code base. One particularly popular preprocessor is the C preprocessor, cpp. But the cpp is also widely criticized for impeding software maintenance by making code hard to understand and change. Yet, evidence to support this criticism is scarce. In this paper, we investigate the relation between cpp usage and maintenance effort, which we approximate with the frequency and extent of source code changes. To this end, we mined the version control repositories of eight open- source systems written in C. For each system, we measured if and how individual functions use cpp annotations and how they were changed. We found that functions containing cpp annotations are generally changed more frequently and more profoundly than other functions. However, when accounting for function size, the differences disappear or are greatly diminished. In summary, with respect to the frequency and extent of changes, our findings do not support the criticism of the cpp regarding maintainability.

Publisher's Version

Types

Type Qualifiers as Composable Language Extensions
Travis Carlson and Eric Van Wyk

(University of Minnesota, USA)
This paper reformulates type qualifiers as language extensions that can be automatically and reliably composed. Type qualifiers annotate type expressions to introduce new subtyping relations and are powerful enough to detect many kinds of errors. Type qualifiers, as illustrated in our ableC extensible language framework for C, can introduce rich forms of concrete syntax, can generate dynamic checks on data when static checks are infeasible or not appropriate, and inject code that affects the program's behavior, for example for conversions of data or logging.
ableC language extensions to C are implemented as attribute grammar fragments and provide an expressive mechanism for type qualifier implementations to check for additional errors, e.g. dereferences to pointers not qualified by a "nonnull" qualifier, and report custom error messages. Our approach distinguishes language extension users from developers and provides modular analyses to developers to ensure that when users select a set of extensions to use, they will automatically compose to form a working compiler.

Publisher's Version

Accurate Reification of Complete Supertype Information for Dynamic Analysis on the JVM
Andrea Rosà

, Eduardo Rosales

, and Walter Binder

(University of Lugano, Switzerland)
Reflective supertype information (RSI) is useful for many instrumentation-based dynamic analyses on the Java Virtual Machine (JVM). On the one hand, while such information can be obtained when performing the instrumentation within the same JVM process executing the instrumented program, in-process instrumentation severely limits the code coverage of the analysis. On the other hand, performing the instrumentation in a separate process can achieve full code coverage, but complete RSI is generally not available, often requiring expensive runtime checks in the instrumented program. Providing accurate and complete RSI in the instrumentation process is challenging because of dynamic class loading and classloader namespaces. In this paper, we present a novel technique to accurately reify complete RSI in a separate instrumentation process. We implement our technique in the dynamic analysis framework DiSL and evaluate it on a task profiler, achieving speedups of up to 45% for an analysis with full code coverage.

Publisher's Version

Rewriting for Sound and Complete Union, Intersection and Negation Types
David J. Pearce
(Victoria University of Wellington, New Zealand)
Implementing the type system of a programming language is a critical task that is often done in an ad-hoc fashion. Whilst this makes it hard to ensure the system is sound, it also makes it difficult to extend as the language evolves. We are interested in describing type systems using declarative rewrite rules from which an implementation can be automatically generated. Whilst not all type systems are easily expressed in this manner, those involving unions, intersections and negations are well-suited for this.
In this paper, we consider a relatively complex type system involving unions, intersections and negations developed previously. This system was not developed with rewriting in mind, though clear parallels are immediately apparent from the original presentation. For example, the system presented required types be first converted into a variation on Disjunctive Normal Form. We identify that the original system can, for the most part, be reworked to enable a natural expression using declarative rewrite rules. We present an implementation of our rewrite rules in the Whiley Rewrite Language (WyRL), and report performance results compared with a hand-coded solution.

Publisher's Version

Performance

Quoted Staged Rewriting: A Practical Approach to Library-Defined Optimizations
Lionel Parreaux

, Amir Shaikhha, and Christoph E. Koch
(EPFL, Switzerland)
Staging has proved a successful technique for programmatically removing code abstractions, thereby allowing for faster program execution while retaining a high-level interface for the programmer. Unfortunately, techniques based on staging suffer from a number of problems — ranging from practicalities to fundamental limitations — which have prevented their widespread adoption. We introduce Quoted Staged Rewriting (QSR), an approach that uses type-safe, pattern matching-enabled quasiquotes to define optimizations. The approach is “staged” in two ways: first, rewrite rules can execute arbitrary code during pattern matching and code reconstruction, leveraging the power and flexibility of staging; second, library designers can orchestrate the application of successive rewriting phases (stages). The advantages of using quasiquote-based rewriting are that library designers never have to deal directly with the intermediate representation (IR), and that it allows for non-intrusive optimizations — in contrast with staging, it is not necessary to adapt the entire library and user programs to accommodate optimizations.
We show how Squid, a Scala macro-based framework, enables QSR and renders library-defined optimizations more practical than ever before: library designers write domain-specific optimizers that users invoke transparently on delimited portions of their code base. As a motivating example we describe an implementation of stream fusion (a well-known deforestation technique) that is both simpler and more powerful than the state of the art, and can readily be used by Scala programmers with no knowledge of metaprogramming.

Publisher's Version

Reducing Calling Convention Overhead in Object-Oriented Programming on Embedded ARM Thumb-2 Platforms
Joseph Caldwell and Shigeru Chiba

(University of Tokyo, Japan)
This paper examines the causes and extent of code size overhead caused by the ARM calling convention in Thumb-2 binaries. We show that binaries generated from C++ source files generally have higher amounts of calling convention overhead, and present a binary file optimizer to eliminate some of that overhead. Calling convention overhead can negatively impact power consumption, flash memory costs, and chip size in embedded or otherwise resource-constrained domains. This is particularly true on platforms using "compressed" instruction sets, such as the 16-bit ARM Thumb and Thumb-2 instruction sets, used in virtually all smartphones and in many other smaller-scale embedded devices. In this paper, we examine the extent of calling convention overhead in practical software, and compare the results of C and C++ programs, and find that C++ programs generally have a higher percentage of calling-convention overhead. Finally, we demonstrate a tool capable of eliminating some of this overhead, particularly in the case of C++ programs, by modifying the calling conventions on a per-procedure basis.

Publisher's Version

RaTrace: Simple and Efficient Abstractions for BVH Ray Traversal Algorithms
Arsène Pérard-Gayot, Martin Weier, Richard Membarth, Philipp Slusallek, Roland Leißa, and Sebastian Hack

(Saarland University, Germany; Bonn-Rhein-Sieg University of Applied Sciences, Germany; DFKI, Germany)
In order to achieve the highest possible performance, the ray traversal and intersection routines at the core of every high-performance ray tracer are usually hand-coded, heavily optimized, and implemented separately for each hardware platform—even though they share most of their algorithmic core. The results are implementations that heavily mix algorithmic aspects with hardware and implementation details, making the code non-portable and difficult to change and maintain.
In this paper, we present a new approach that offers the ability to define in a functional language a set of conceptual, high-level language abstractions that are optimized away by a special compiler in order to maximize performance. Using this abstraction mechanism we separate a generic ray traversal and intersection algorithm from its low-level aspects that are specific to the target hardware. We demonstrate that our code is not only significantly more flexible, simpler to write, and more concise but also that the compiled results perform as well as state-of-the-art implementations on any of the tested CPU and GPU platforms.

Publisher's Version

Towards Compositional and Generative Tensor Optimizations
Adilla Susungi, Norman A. Rink, Jerónimo Castrillón

, Immo Huismann, Albert Cohen, Claude Tadonki, Jörg Stiller, and Jochen Fröhlich
(MINES ParisTech, France; TU Dresden, Germany; Inria, France; ENS, France)
Many numerical algorithms are naturally expressed as operations on tensors (i.e. multi-dimensional arrays). Hence, tensor expressions occur in a wide range of application domains, e.g. quantum chemistry and physics; big data analysis and machine learning; and computational fluid dynamics. Each domain, typically, has developed its own strategies for efficiently generating optimized code, supported by tools such as domain-specific languages, compilers, and libraries. However, strategies and tools are rarely portable between domains, and generic solutions typically act as ''black boxes'' that offer little control over code generation and optimization. As a consequence, there are application domains without adequate support for easily generating optimized code, e.g. computational fluid dynamics. In this paper we propose a generic and easily extensible intermediate language for expressing tensor computations and code transformations in a modular and generative fashion. Beyond being an intermediate language, our solution also offers meta-programming capabilities for experts in code optimization. While applications from the domain of computational fluid dynamics serve to illustrate our proposed solution, we believe that our general approach can help unify research in tensor optimizations and make solutions more portable between domains.

Publisher's Version

Analysis and Testing

Four Languages and Lots of Macros: Analyzing Autotools Build Systems
Jafar M. Al-Kofahi, Suresh Kothari, and Christian Kästner

(Iowa State University, USA; Carnegie Mellon University, USA)
Build systems are crucial for software system development, however there is a lack of tool support to help with their high maintenance overhead. GNU Autotools are widely used in the open source community, but users face various challenges from its hard to comprehend nature and staging of multiple code generation steps, often leading to low quality and error-prone build code. In this paper, we present a platform, AutoHaven, to provide a foundation for developers to create analysis tools to help them understand, maintain, and migrate their GNU Autotools build systems. Internally it uses approximate parsing and symbolic analysis of the build logic. We illustrate the use of the platform with two tools: ACSense helps developers to better understand their build systems and ACSniff detects build smells to improve build code quality. Our evaluation shows that AutoHaven can support most GNU Autotools build systems and can detect build smells in the wild.

Publisher's Version

Avoiding Useless Mutants
Leonardo Fernandes, Márcio Ribeiro, Luiz Carvalho, Rohit Gheyi, Melina Mongiovi, André Santos, Ana Cavalcanti, Fabiano Ferrari, and José Carlos Maldonado
(Federal University of Pernambuco, Brazil; Federal University of Alagoas, Brazil; Federal University of Campina Grande, Brazil; University of York, UK; Federal University of São Carlos, Brazil; University of São Paulo, Brazil)
Mutation testing is a program-transformation technique that injects artificial bugs to check whether the existing test suite can detect them. However, the costs of using mutation testing are usually high, hindering its use in industry. Useless mutants (equivalent and duplicated) contribute to increase costs. Previous research has focused mainly on detecting useless mutants only after they are generated and compiled. In this paper, we introduce a strategy to help developers with deriving rules to avoid the generation of useless mutants. To use our strategy, we pass as input a set of programs. For each program, we also need a passing test suite and a set of mutants. As output, our strategy yields a set of useless mutants candidates. After manually confirming that the mutants classified by our strategy as "useless" are indeed useless, we derive rules that can avoid their generation and thus decrease costs. To the best of our knowledge, we introduce 37 new rules that can avoid useless mutants right before their generation. We then implement a subset of these rules in the MUJAVA mutation testing tool. Since our rules have been derived based on artificial and small Java programs, we take our MUJAVA version embedded with our rules and execute it in industrial-scale projects. Our rules reduced the number of mutants by almost 13% on average. Our results are promising because (i) we avoid useless mutants generation; (ii) our strategy can help with identifying more rules in case we set it to use more complex Java programs; and (iii) our MUJAVA version has only a subset of the rules we derived.

Publisher's Version

Silverchain: A Fluent API Generator
Tomoki Nakamaru, Kazuhiro Ichikawa, Tetsuro Yamazaki

, and Shigeru Chiba

(University of Tokyo, Japan)
This paper presents a tool named Silverchain, which generates class definitions for a fluent API from the grammar of the API. A fluent API is an API that is used by method chaining and its grammar is a BNF-like set of rules that defines method chains accepted in type checking. Fluent APIs generated by Silverchain provide two styles of APIs: One is for building a chain by concatenating all method calls in series. The other is for building a chain from partial chains by passing child chains to method calls in the parent chain as their arguments. To generate such a fluent API, Silverchain first translates given grammar into a set of deterministic pushdown automata without є-transitions, then encodes these automata into class definitions. Each constructed automata corresponds to a nonterminal in given grammar and recognizes symbol sequences produced from its corresponding nonterminal.

Publisher's Version

Parser Generation by Example for Legacy Pattern Languages
Vadim Zaytsev

(Raincode Labs, Belgium)
Most modern software languages enjoy relatively free and relaxed concrete syntax, with significant flexibility of formatting of the program/model/sheet text. Yet, in the dark legacy corners of software engineering there are still languages with a strict fixed column-based structure — the compromises of times long gone, attempting to combine some human readability with some ease of machine processing. In this paper, we consider an industrial case study for retirement of a legacy domain-specific language, completed under extreme circumstances: absolute lack of documentation, varying line structure, hierarchical blocks within one file, scalability demands for millions of lines of code, performance demands for manipulating tens of thousands multi-megabyte files, etc. However, the regularity of the language allowed to infer its structure from the available examples, automatically, and produce highly efficient parsers for it.

Publisher's Version

DSLs

A Haskell Compiler for Signal Transforms
Geoffrey Mainland and Jeremy Johnson
(Drexel University, USA)
Building a reusable, auto-tuning code generator from scratch is a challenging problem, requiring many careful design choices. We describe HSpiral, a Haskell compiler for signal transforms that builds on the foundational work of Spiral. Our design leverages many Haskell language features to ensure that our framework is reusable, flexible, and efficient. As well as describing the design of our system, we show how to extend it to support new classes of transforms, including the number-theoretic transform and a variant of the split-radix algorithm that results in reduced operation counts. We also show how to incorporate rewrite rules into our system to reproduce results from previous literature on code generation for the fast Fourier transform.
Although the Spiral project demonstrated significant advances in automatic code generation, it has not been widely used by other researchers. HSpiral is freely available under an MIT-style license, and we are actively working to turn it into a tool to further both our own research goals and to serve as a foundation for other research groups' work in developing new implementations of signal transform algorithms.

Publisher's Version

Automatic Generation of Virtual Learning Spaces Driven by CaVa^DSL: An Experience Report
Ricardo Giuliani Martini and Pedro Rangel Henriques
(University of Minho, Portugal)
Several applications are based on Domain-Specific Languages (DSL). They provide the right terminology to a peculiar problem/subject, because they use a particular domain vocabulary that defines abstract concepts, different from general-purpose languages. Aiming an easy generation of virtual Learning Spaces (LS) for the use of the responsible of institutional archives or museums, we have idealized and developed an external domain-specific language, called CaVa DSL, to describe, in an abstract level, virtual exhibition rooms in the museum curator's viewpoint, giving the curator the possibility to specify the virtual LS upon a domain ontology vocabulary. We also contribute with a set of processors that deal with CaVa DSL and generates virtual Learning Spaces, turning available the navigation over important and real information contained in archival documents to the public through virtual museums. To demonstrate the obtained results, we present a running example along the paper showing the virtual LS generation process.

Publisher's Version

Rewriting a Shallow DSL using a GHC Compiler Extension
Mark Grebe, David Young, and Andy Gill
(University of Kansas, USA)
Embedded Domain Specific Languages are a powerful tool for developing customized languages to fit specific problem domains. Shallow EDSLs allow a programmer to program using many of the features of a host language and its syntax, but sacrifice performance. Deep EDSLs provide better performance and flexibility, through the ability to manipulate the abstract syntax tree of the DSL program, but sacrifice syntactical similarity to the host language. Using Haskino, an EDSL designed for small embedded systems based on the Arduino line of microcontrollers, and a compiler plugin for the Haskell GHC compiler, we show a method for combining the best aspects of shallow and deep EDSLs. The programmer is able to write in the shallow EDSL, and have it automatically transformed into the deep EDSL. This allows the EDSL user to benefit from powerful aspects of the host language, Haskell, while meeting the demanding resource constraints of the small embedded processing environment.

Publisher's Version

proc time: 4.47