6th ACM SIGPLAN X10 Workshop (X10 2016)
June 14, 2016, Santa Barbara, CA, USA
Frontmatter
Welcome to X10'16, the 6th ACM SIGPLAN X10 Workshop, held in Santa Barbara, California, USA, on June 14, 2016. The X10 Workshop provides a forum for the X10 community to exchange insights, experiences, and plans. Since its inception in 2011, the X10 Workshop has been co-located with the annual ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI).
Control Structure Overloading in X10
Louis Mandel, Josh Milthorpe, and
Olivier Tardieu
(IBM Research, USA)
The X10 programming language offers a simple but expressive model of concurrency and distribution. Domain-specific languages embedded in X10 (eDSLs) can build upon this model to offer scheduling and placement facilities tailored to particular patterns of applications, e.g., stencils or graph traversals. They exploit X10's rich type system and closures to offer flexible and precise functional interfaces; however, they are restricted by X10's rigid syntax. In this work, we propose an overloading mechanism enabling eDSLs to redefine or extend the behavior of X10 control structures. Loops can be parallelized or distributed. Exception handlers can triage and process exceptions arising from concurrent tasks. While our overloading mechanism requires augmenting the X10 syntax with new forms, the change to the syntax is small and intuitive. Overall, the combination of syntax and semantics we propose improves code readability over traditional X10 at no cost in runtime performance.
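As a point of reference for the problem the paper addresses, the following is a minimal sketch, in plain X10, of the closure-based interface an eDSL must expose without control-structure overloading; the names ForeachSketch and foreachParallel are hypothetical, not from the paper.

public class ForeachSketch {
    // A library "loop" exposed as a method taking a closure: precise
    // and type-safe, but a call to it does not read like a native loop.
    public static def foreachParallel(r:LongRange, body:(Long)=>void) {
        finish for (i in r) async { body(i); }
    }

    public static def main(args:Rail[String]) {
        // Each iteration runs as its own asynchronous task.
        foreachParallel(0..9, (i:Long) => { Console.OUT.println("iter " + i); });
    }
}

The function type makes the interface precise, but the call site loses the look of an X10 loop; that readability gap is what the proposed overloading mechanism closes.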
@InProceedings{X1016p1,
author = {Louis Mandel and Josh Milthorpe and Olivier Tardieu},
title = {Control Structure Overloading in X10},
booktitle = {Proc.\ X10},
publisher = {ACM},
pages = {1--6},
doi = {},
year = {2016},
}
A Memory Model for X10
Andreas Zwinkau
(KIT, Germany)
A programming language used for concurrent shared-memory programs must specify its memory model so that programmers can reason about the behavior of a program. Java and C++ have plugged this hole in their specifications, but X10 has not. This paper proposes a memory model for X10. Additionally, it serves as a case study of how the design goals of a language map to requirements for its memory model.
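For illustration, here is a classic store-buffering litmus test expressed in X10 (the class is illustrative, not from the paper). Whether both activities may read 0 is exactly the kind of question only a memory model can answer.

class RaceSketch {
    var x:Long = 0;
    var y:Long = 0;
    var r1:Long = 0;
    var r2:Long = 0;

    // Two activities race on the fields x and y. Sequential consistency
    // forbids the outcome r1 == 0 && r2 == 0; weaker models may allow it.
    def run() {
        finish {
            async { x = 1; r2 = y; }
            async { y = 1; r1 = x; }
        }
        Console.OUT.println("r1=" + r1 + ", r2=" + r2);
    }

    public static def main(args:Rail[String]) {
        new RaceSketch().run();
    }
}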
@InProceedings{X1016p7,
author = {Andreas Zwinkau},
title = {A Memory Model for X10},
booktitle = {Proc.\ X10},
publisher = {ACM},
pages = {7--12},
doi = {},
year = {2016},
}
Cooperation vs. Coordination for Lifeline-Based Global Load Balancing in APGAS
Jonas Posner and Claudia Fohry
(University of Kassel, Germany)
Work stealing can be implemented in either a cooperative or a coordinated way. We compared the two approaches for lifeline-based global load balancing, which is the algorithm used by X10's Global Load Balancing framework GLB. We conducted our study with the APGAS library for Java, to which we ported GLB in a first step. Our cooperative variant resembles the original GLB framework, except that strict sequentialization is replaced by Java synchronization constructs such as critical sections. Our coordinated variant enables concurrent access to local task pools by using a split queue data structure. In experiments with modified versions of the UTS and BC benchmarks, the cooperative and coordinated APGAS variants had similar execution times, without a clear winner. Both variants outperformed the original GLB when compiled with Managed X10. Experiments were run on up to 128 nodes, to which we assigned up to 512 places.
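The following is a much-simplified sketch of the split-queue idea, written in X10 for consistency with the other examples here (hypothetical code, not the authors' Java implementation; for brevity, both ends are guarded by atomic blocks, whereas the actual coordinated variant allows concurrent access to the two ends).

class SplitQueue {
    private val buf = new Rail[Long](1024);
    private var head:Long = 0; // shared end: thieves steal here
    private var tail:Long = 0; // private end: the owner works here

    // Owner-side operations touch only the tail of the queue.
    def push(t:Long) { atomic { buf(tail % buf.size) = t; tail++; } }
    def popLocal():Long {
        var t:Long = -1;
        atomic { if (tail > head) { tail--; t = buf(tail % buf.size); } }
        return t;
    }

    // Thieves take from the opposite end, so the owner and thieves
    // contend only when the queue is nearly empty.
    def steal():Long {
        var t:Long = -1;
        atomic { if (head < tail) { t = buf(head % buf.size); head++; } }
        return t;
    }
}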
@InProceedings{X1016p13,
author = {Jonas Posner and Claudia Fohry},
title = {Cooperation vs. Coordination for Lifeline-Based Global Load Balancing in APGAS},
booktitle = {Proc.\ X10},
publisher = {ACM},
pages = {13--17},
doi = {},
year = {2016},
}
Resilient X10 over MPI User Level Failure Mitigation
Sara S. Hamouda, Benjamin Herta, Josh Milthorpe,
David Grove, and
Olivier Tardieu
(Australian National University, Australia; IBM Research, USA)
Many PGAS languages and libraries rely on high-performance transport layers such as GASNet and MPI to achieve low communication latency, portability, and scalability. As systems increase in scale, failures are expected to become normal events rather than exceptions. Unfortunately, GASNet and standard MPI do not provide fault tolerance capabilities. This limitation hinders PGAS languages and other high-level programming models from supporting resilience at scale. For this reason, Resilient X10 has previously been supported over sockets only, not over MPI. This paper describes the use of a fault-tolerant MPI implementation, called ULFM (User Level Failure Mitigation), as a transport layer for Resilient X10. By providing fault-tolerant collective and agreement algorithms, on-demand failure propagation, and support for InfiniBand, ULFM provides the required infrastructure to create a high-performance transport layer for Resilient X10. We show that replacing X10's emulated collectives with ULFM's blocking collectives results in significant performance improvements. For three iterative SPMD-style applications running on 1000 X10 places, the improvement ranged between 30% and 51%. The per-step overhead for resilience was less than 9%. A proposal for adding ULFM to the coming MPI-4 standard is currently under assessment by the MPI Forum. Our results show that adding user-level fault tolerance support in MPI makes it a suitable base for resilience in high-level programming models.
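For context, the sketch below shows the standard Resilient X10 idiom such a transport layer must support: failures of remote places surface as DeadPlaceExceptions collected at the enclosing finish. The step body and recovery action are placeholders, not the paper's application code.

class ResilientStep {
    public static def step() {
        try {
            finish for (p in Place.places()) at (p) async {
                // one iteration of the SPMD computation (placeholder)
            }
        } catch (es:MultipleExceptions) {
            // The finish gathers exceptions from all spawned activities;
            // dead places appear among them as DeadPlaceExceptions.
            for (i in 0..(es.exceptions.size-1)) {
                val e = es.exceptions(i);
                if (e instanceof DeadPlaceException) {
                    Console.OUT.println("lost " + (e as DeadPlaceException).place);
                }
            }
            // recover, e.g. restore application state from a checkpoint
        }
    }
}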
@InProceedings{X1016p18,
author = {Sara S. Hamouda and Benjamin Herta and Josh Milthorpe and David Grove and Olivier Tardieu},
title = {Resilient X10 over MPI User Level Failure Mitigation},
booktitle = {Proc.\ X10},
publisher = {ACM},
pages = {18--23},
doi = {},
year = {2016},
}
ActorX10: An Actor Library for X10
Sascha Roloff, Alexander Pöppl, Tobias Schwarzer, Stefan Wildermann, Michael Bader, Michael Glaß, Frank Hannig, and Jürgen Teich
(University of Erlangen-Nuremberg, Germany; TU Munich, Germany)
The APGAS programming model is a powerful computing paradigm for multi-core and massively parallel computer architectures. It allows for the dynamic creation and distribution of thousands of threads amongst hundreds of nodes in a cluster computer within a single application. For programs of such complexity, appropriate higher-level abstractions of computation and communication are necessary for performance analysis and optimization. In this work, we present actorX10, an X10 library of a formally specified actor model based on the APGAS principles. The realized actor model explicitly exposes communication paths and decouples them from the control flow of the concurrently executed application components. Our approach provides the right abstraction for a wide range of applications. Its capabilities and advantages are introduced and demonstrated for two applications from the embedded systems and HPC domains: an object detection chain and a proxy application for the simulation of tsunami events.
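A hypothetical sketch of the core ingredient, not the actorX10 API: a mailbox built from X10's atomic and when constructs gives an actor an explicit message path that is decoupled from its control flow.

class Mailbox {
    private val buf = new Rail[Long](64); // bounded buffer for simplicity
    private var head:Long = 0;
    private var tail:Long = 0;

    // Senders enqueue a message; 'when' blocks until there is room.
    def send(m:Long) {
        when (tail - head < buf.size) { buf(tail % buf.size) = m; tail++; }
    }

    // The receiving actor blocks until a message has arrived.
    def receive():Long {
        var m:Long = -1;
        when (tail > head) { m = buf(head % buf.size); head++; }
        return m;
    }
}

An actor is then an activity that loops over receive(), and its outgoing communication paths are the mailboxes of its neighbor actors.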
@InProceedings{X1016p24,
author = {Sascha Roloff and Alexander Pöppl and Tobias Schwarzer and Stefan Wildermann and Michael Bader and Michael Glaß and Frank Hannig and Jürgen Teich},
title = {ActorX10: An Actor Library for X10},
booktitle = {Proc.\ X10},
publisher = {ACM},
pages = {24--29},
doi = {},
year = {2016},
}
SWE-X10: An Actor-Based and Locally Coordinated Solver for the Shallow Water Equations
Alexander Pöppl and Michael Bader
(TU Munich, Germany)
We present an X10 software package for the solution of the shallow water equations, a set of equations commonly used to simulate tsunami and flooding events. The software uses an actor-oriented approach to obtain a communication scheme that does not rely on central coordination; instead, each actor communicates only with its neighbors. We evaluated the package via scaling tests on single-place shared-memory as well as multi-place distributed-memory system configurations, and found it to perform comparably to prior implementations based on C++, OpenMP, and MPI.
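A hypothetical sketch of the local coordination scheme, not the SWE-X10 API: a patch actor advances one time step as soon as the ghost layers from both neighbors have arrived, with no global barrier involved.

class PatchActor {
    var leftGhost:Double = 0.0;
    var rightGhost:Double = 0.0;
    var leftReady:Boolean = false;
    var rightReady:Boolean = false;

    // Neighbors deliver their boundary values directly; no central
    // coordinator is involved at any point.
    def deliverFromLeft(v:Double)  { atomic { leftGhost = v;  leftReady = true; } }
    def deliverFromRight(v:Double) { atomic { rightGhost = v; rightReady = true; } }

    // The patch blocks until both ghost layers for this step are in,
    // then consumes them and updates its cells.
    def step(cells:Rail[Double]) {
        when (leftReady && rightReady) { leftReady = false; rightReady = false; }
        // ... compute fluxes at cell edges using leftGhost/rightGhost
        //     and update 'cells' (shallow water solver kernel) ...
    }
}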
@InProceedings{X1016p30,
author = {Alexander Pöppl and Michael Bader},
title = {SWE-X10: An Actor-Based and Locally Coordinated Solver for the Shallow Water Equations},
booktitle = {Proc.\ X10},
publisher = {ACM},
pages = {30--31},
doi = {},
year = {2016},
}
A Case for Distributed Work-Stealing in Regular Applications
Brendan Sheridan and Jeremy T. Fineman
(Georgetown University, USA)
This paper presents a dynamically heterogeneous architecture use case that is both realistic and favorable for distributed work-stealing in regular parallel applications. Using a straightforward implementation of distributed dense matrix multiplication in X10's Global Load Balancing (GLB) library, we show that moderate differences in node processing power allow work-stealing to significantly outperform a standard static schedule such as SUMMA. The work-stealing implementation also scales comparably on up to 128 cores.
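As a sketch of why this workload steals well, consider a block decomposition of C = A * B (hypothetical code, not the paper's implementation nor the GLB API): each block of C is an independent task, so faster nodes simply complete more blocks.

class BlockMM {
    // One stealable task: compute a bs x bs block of C = A * B for
    // n x n row-major matrices stored in Rails. Tasks share no state,
    // so they can migrate freely between places.
    public static def multiplyBlock(a:Rail[Double], b:Rail[Double],
                                    c:Rail[Double], n:Long,
                                    bi:Long, bj:Long, bs:Long) {
        for (i in bi..(bi+bs-1)) {
            for (j in bj..(bj+bs-1)) {
                var s:Double = 0.0;
                for (k in 0..(n-1)) s += a(i*n + k) * b(k*n + j);
                c(i*n + j) = s;
            }
        }
    }
}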
@InProceedings{X1016p32,
author = {Brendan Sheridan and Jeremy T. Fineman},
title = {A Case for Distributed Work-Stealing in Regular Applications},
booktitle = {Proc.\ X10},
publisher = {ACM},
pages = {32--33},
doi = {},
year = {2016},
}