Powered by
Conference Publishing Consulting

15th Symposium on Database Programming Languages (DBPL 2015), October 27, 2015, Pittsburgh, PA, USA

DBPL 2015 – Proceedings

Contents - Abstracts - Authors

Frontmatter

Title Page


Foreword
This volume contains the papers presented at the 15th International Symposium on Database Programming Languages (DBPL 2015), held in Pittsburgh, Pennsylvania, October 27, 2015. For over 25 years, DBPL has established itself as the principal venue for publishing and discussing new ideas at the intersection of databases and programming languages. Many key contributions in query languages for object-oriented data, persistent databases, nested relational data, and semistructured data, as well as fundamental ideas in types for query languages, were first announced at DBPL. Today, this creative research area is broadening into a subfield of data-centric computation, currently scattered among a range of venues. DBPL is an established destination for such new ideas and solicits submissions from researchers in databases, programming languages or any other community interested in the design, implementation or foundations of data-centric computation.

Invited Talk

The Gremlin Graph Traversal Machine and Language (Invited Talk)
Marko A. Rodriguez
(DataStax, USA)
Gremlin is a graph traversal machine and language designed, developed, and distributed by the Apache TinkerPop project. Gremlin, as a graph traversal machine, is composed of three interacting components: a graph, a traversal, and a set of traversers. The traversers move about the graph according to the instructions specified in the traversal, where the result of the computation is the ultimate locations of all halted traversers. A Gremlin machine can be executed over any supporting graph computing system such as an OLTP graph database and/or an OLAP graph processor. Gremlin, as a graph traversal language, is a functional language implemented in the user's native programming language and is used to define the traversal of a Gremlin machine. This article provides a mathematical description of Gremlin and details its automaton and functional properties. These properties enable Gremlin to naturally support imperative and declarative querying, host language agnosticism, user-defined domain specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, hybrid depth- and breadth-first evaluation, as well as the existence of a Universal Gremlin Machine and its respective entailments.

Full Papers 1

A Common Data Manipulation Language for Nested Data in Heterogeneous Environments
João Costa Seco, Hugo Lourenço, and Paulo Ferreira
(Universidade Nova de Lisboa, Portugal; OutSystems, Portugal)
One key aspect of data-centric applications is the manipu- lation of persistent data repositories, which is moving fast from querying a centralized relational database to the ad- hoc combination of constellations of data sources. Query languages are being typefuly integrated in host, general purpose, languages in order to increase reasoning and optimizing capabilities of interpreters and compilers. However, not much is being done to integrate and orches- trate different and separate sources of data. We present a common data manipulation language, that abstracts the nature and localization of the data-sources. We define its semantics and a type directed compilation, query optimization, and query orchestration mechanism to be used in development tools for heterogeneous environments. We provide type safety and language integration. Our approach is also suitable for an interactive query construction environment by rich user interfaces that pro- vide immediate feedback on data manipulation operations. This approach is currently the base for the data layer of a development platform for mobile and web applications.

Relational Foundations for Functorial Data Migration
David I. Spivak and Ryan Wisnesky
(Massachusetts Institute of Technology, USA)
We study the data transformation capabilities associated with schemas that are presented by directed multi-graphs and path equations. Unlike most approaches which treat graph-based schemas as abbreviations for relational schemas, we treat graph-based schemas as categories. A schema S is a finitely-presented category, and the collection of all S-instances forms a category, S-inst. A functor F between schemas S and T, which can be generated from a visual mapping between graphs, induces three adjoint data migration functors, Sigma_F : S-inst -> T-inst, Pi_F : S-inst -> T-inst, and Delta_F : T-inst -> S-inst. We present an algebraic query language FQL based on these functors, prove that FQL is closed under composition, prove that FQL can be implemented with the select-project-product-union relational algebra (SPCU) extended with a key-generation operation, and prove that SPCU can be implemented with FQL.

Abstract Rewriting Approach to Solve Datalog Programs
Fernando Tarin Morales, Fuyuki Ishikawa, and Shinichi Honiden
(University of Tokyo, Japan; National Institute of Informatics, Japan)
Over the past decade, we have seen a resurgence in the Datalog language in different computing areas for solving a number of non- trivial problems. In this paper we introduce a novel resolution ap- proach to solve Datalog programs. We present a version of the technique that works on plain Datalog programs. We have devel- oped an abstract rewriting formalism to create a functional reso- lution process for Datalog. The resolution approach translates the Datalog resolution strategy into a fix-point abstract rewriting equa- tion system. Being an abstract rewriting formalism, every equation of the system can be viewed as a function. Based on this fact, we also developed an optimization process that improves the initial rewriting equation system. The optimization process generates an equation system that computes the solutions much more efficiently. Well known optimizations such as strength reduction or memoiza- tion have been used. We also developed a prototype compiler that encodes the optimized equation system into a solver. Experimental results obtained with the solver suggest execution times several or- ders of magnitude better than modern Prolog solvers like Y AP or X SB and usually one order of magnitude faster than state-of-the-art Datalog solvers such as B DDBDDB and DLV.

Short Papers

Requesting Heterogeneous Data Sources with Array Comprehensions in Hop.js
Yoann Couillec and Manuel Serrano
(INRIA, France)
During the past few years the volume of accumulated data has increased dramatically. New kinds of data stores have emerged as NoSQL family stores. Many modern applications now collect, analyze, and produce data from several heterogeneous sources. However implementing such applications is still difficult because of lack of appropriate tools and formalisms. We propose a solution to this problem in the context of the JavaScript programming language by extending array comprehensions. Our extension allows programmers to query data from usual stores, such as SQL databases, NoSQL databases, Semantic Web data repositories, Web pages, or even custom user defined data structures. The extension has been implemented in the Hop.js system. It is the subject of this paper.

A Datalog-Based Protocol for Lazy Data Migration in Agile NoSQL Application Development
Stefanie Scherzinger, Uta Störl, and Meike Klettke
(Regensburg University of Applied Sciences, Germany; Darmstadt University of Applied Sciences, Germany; University of Rostock, Germany)
We address a practical challenge in agile web development against NoSQL data stores: Upon a new release of the web application, entities already persisted in production no longer match the application code. Rather than migrating all legacy entities eagerly (prior to the release) and at the cost of application downtime, lazy data migration is a popular alternative: When a legacy entity is loaded by the application, all pending structural changes are applied. Yet correctly migrating legacy data from several releases back, involving more than one entity at-a-time, is not trivial. In this paper, we propose a holistic Datalog ¬non-rec model model for reading, writing, and migrating data. In implementing our model, we may blend established Datalog evaluation algorithms, such as an incremental evaluation with certain rules evaluated bottom-up, and certain rules evaluated top-down with sideways information passing. Our systematic approach guarantees that from the viewpoint of the application, it remains transparent whether data is migrated eagerly or lazily.

Function Inlining in XQuery 3.0 Optimization
Leonard Wörteler, Michael Grossniklaus, Christian Grün, and Marc H. Scholl
(University of Konstanz, Germany)
Originally developed as a query language for XML databases, XQuery has evolved into a complete functional programming language. In order to unlock all optimization opportunities, XQuery processors therefore need to combine traditional query optimization with techniques used in optimizing compilers. In this paper, we discuss how the well-known technique of function inlining can be applied to XQuery. We present an implementation of function inlining based on the query processor of BaseX, an open-source XML database. Finally, a detailed quantitative evaluation demonstrates that the performance benefits obtained by blending compiler and query optimizer techniques surpass results from any one single technique.

Full Papers 2

Using Dependent Types and Tactics to Enable Semantic Optimization of Language-Integrated Queries
Gregory Malecha and Ryan Wisnesky
(University of California at San Diego, USA; Massachusetts Institute of Technology, USA)
Semantic optimization --- the use of data integrity constraints to optimize relational queries --- has been well studied but, owing to limitations in how SQL handles constraints, has not often been applied by mainstream RDBMSs. In a language-integrated query setting, however, the query provider is free to rewrite queries before they are executed on an RDBMS. We show, using Coq as our ambient language, how to use dependent types to represent a well known class of constraints --- embedded, implicational dependencies --- and how Coq tactics can be used to implement a particular kind of semantic optimization: tableaux minimization, which minimizes the number of joins required by a query.

Relative Expressive Power of Downward Fragments of Navigational Query Languages on Trees and Chains
Jelle Hellings, Marc Gyssens, Yuqing Wu, Dirk Van Gucht, Jan Van den Bussche, Stijn Vansummeren, and George H. L. Fletcher
(University of Hasselt, Belgium; Transnational University of Limburg, Belgium; Pomona College, USA; Indiana University, USA; Université Libre de Bruxelles, Belgium; Eindhoven University of Technology, Netherlands)
Motivated by the continuing interest in the tree data model, we study the expressive power of downward fragments of navigational query languages on trees. The basic navigational query language we consider expresses queries by building binary relations from the edge relations and the identity relation, using composition and union. We study the effects on the expressive power when we add transitive closure, projections, coprojections, intersection, and difference. We study expressiveness at the level of boolean queries and path queries, on labeled and unlabeled trees, and on labeled and unlabeled chains. In all these cases, we are able to present the complete Hasse diagram of relative expressiveness. In particular, we were able to decide, for each fragment of the navigational query languages that we study, whether it is closed under difference and intersection when applied on trees.

Typing Regular Path Query Languages for Data Graphs
Dario Colazzo and Carlo Sartiani
(Paris Dauphine University, France; University of Basilicata, Italy)
Regular path query languages for data graphs are essentially untyped. The lack of type information greatly limits the optimization opportunities for query engines and makes application development more complex. In this paper we discuss a simple, yet expressive, schema language for edge-labelled data graphs. This schema language is, then, used to define a query type inference approach with good precision properties.

proc time: 0.73