Programming Journal, Volume 7, Issue 1
The Art, Science, and Engineering of Programming

Frontmatter

Title Page


Message from the Chairs


Committees


Sponsors


Papers

Building a Secure Software Supply Chain with GNU Guix
Ludovic Courtès
(Inria, France)
The software supply chain has become a widespread analogy for the series of steps taken to go from source code published by developers to executables running on the users’ computers. A security vulnerability in any of these steps puts users at risk, and evidence shows that attacks on the supply chain are becoming more common. The consequences of an attack on the software supply chain can be tragic in a society that relies on many interconnected software systems, which has led to rising research interest in, and governmental incentives for, supply chain security.
GNU Guix is a software deployment tool and software distribution that supports provenance tracking, reproducible builds, and reproducible software environments. Unlike many software distributions, it consists exclusively of source code: it provides a set of package definitions that describe how to build code from source. Together, these properties set it apart from many deployment tools that center on the distribution of binaries.
This paper focuses on one research question: how can Guix and similar systems allow users to securely update their software? Guix source code is distributed using the Git version control system; updating Guix-installed software packages means, first, updating the local copy of the Guix source code. Prior work on secure software updates focuses on systems very different from Guix—systems such as Debian, Fedora, or PyPI, where updating consists of fetching metadata about the latest binary artifacts available—and is largely inapplicable in the context of Guix. By contrast, the main threats for Guix are attacks on its source code repository, which could lead users to run inauthentic code or to downgrade their system. Deployment tools that more closely resemble Guix, from Nix to Portage, either lack secure update mechanisms or suffer from shortcomings.
Our main contribution is a model and tool to authenticate new Git revisions. We further show how, building on Git semantics, we protect against downgrade attacks and related threats. We explain our implementation choices. This work was deployed in production two years ago, giving us insight into its actual use at scale every day. The Git checkout authentication at its core is applicable beyond the specific use case of Guix, and we think it could benefit developer teams that use Git.
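To make the authentication model concrete, here is a minimal Python sketch of the idea: each commit must be signed by a key that its parent commits authorize, starting from a trusted "introduction" commit, and an update is only accepted if the target descends from the commit currently deployed. The Commit class, the plain-field signature and authorization data, and the assumption that the history is iterated parents-first are simplifications for illustration, not Guix's actual implementation (which reads authorizations from '.guix-authorizations' files and verifies OpenPGP signatures).

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Commit:
        id: str
        parents: tuple         # ids of parent commits
        signer: str            # fingerprint of the key that signed this commit
        authorized: frozenset  # keys authorized as of this commit (in Guix,
                               # the contents of '.guix-authorizations')

    def authenticate(history, introduction_id):
        """Authenticate every commit, assuming 'history' maps commit ids to
        Commits and iterates parents-first (topological order)."""
        ok = {introduction_id}
        for commit in history.values():
            if commit.id in ok:
                continue       # the introduction commit is trusted out of band
            for pid in commit.parents:
                parent = history[pid]
                if pid not in ok or commit.signer not in parent.authorized:
                    raise ValueError(f"cannot authenticate commit {commit.id}")
            ok.add(commit.id)
        return ok

    def no_downgrade(history, current_id, target_id):
        """Downgrade protection: accept a target commit only if it descends
        from the commit currently deployed."""
        stack, seen = [target_id], set()
        while stack:
            cid = stack.pop()
            if cid == current_id:
                return True
            if cid not in seen:
                seen.add(cid)
                stack.extend(history[cid].parents)
        raise ValueError("target does not descend from the current commit")

Note the design choice this sketch mirrors: authorization is evaluated against the *parent* commits, so the set of authorized keys can itself evolve in the repository history without a trusted third party.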
As attacks on the software supply chain multiply, security research is now looking at every link of the chain. Secure updates are one important aspect of the supply chain, but this paper also looks at the broader context: how Guix models and implements the supply chain, from upstream source code to binaries running on computers. While much recent work focuses on attestation—certifying each link of the supply chain—Guix takes a more radical approach: enabling independent verification of each step, building on reproducible builds, “bootstrappable” builds, and provenance tracking. The big picture shows how Guix can be used as the foundation of secure software supply chains.
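Independent verification of a build step reduces, in principle, to rebuilding the same package and comparing the outputs bit for bit. The sketch below illustrates that idea in Python; the file paths and SHA-256 comparison are illustrative assumptions, not Guix's interface (Guix exposes a comparable check through its `guix challenge` command).

    import hashlib

    def artifact_digest(path, chunk_size=1 << 20):
        """Hash a build output bit for bit."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def independently_verified(paths):
        """Outputs built independently (different machines, different
        parties) from the same package definition should be byte-identical;
        any disagreement is evidence of non-reproducibility or tampering."""
        return len({artifact_digest(p) for p in paths}) == 1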

Gradual Soundness: Lessons from Static Python
Kuang-Chen Lu, Ben Greenman, Carl Meyer, Dino Viehland, Aniket Panse, and Shriram Krishnamurthi
(Brown University, USA; Meta, USA)
Context: Gradually-typed languages allow typed and untyped code to interoperate, but typically come with significant drawbacks. In some languages, the types are unreliable; in others, communication across type boundaries can be extremely expensive; and still others allow only limited forms of interoperability. The research community is actively seeking a sound, fast, and expressive approach to gradual typing.
Inquiry: This paper describes Static Python, a language developed by engineers at Instagram that has proven itself sound, fast, and reasonably expressive in production. Static Python’s approach to gradual types is essentially a programmer-tunable combination of the concrete and transient approaches from the literature. Concrete types provide full soundness and low performance overhead, but impose nonlocal constraints. Transient types are sound in a shallow sense and easier to use; they help to bridge the gap between untyped code and typed concrete code.
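To illustrate the distinction in plain Python (a didactic sketch, not Static Python's actual runtime, which inserts such checks during compilation and enforces concrete types through specialized object layouts): a transient check inspects only the top-level shape of a value at a typed boundary, while a concrete check requires the whole value to match its type up front.

    from typing import List

    def transient_check(xs) -> None:
        """Transient style: verify only the top-level shape at the boundary;
        element-level errors surface later, at use sites."""
        if not isinstance(xs, list):
            raise TypeError("expected a list")

    def concrete_check(xs) -> None:
        """Concrete style: the whole value must match List[int] up front."""
        if not (isinstance(xs, list) and all(isinstance(x, int) for x in xs)):
            raise TypeError("expected List[int]")

    def total(xs: List[int]) -> int:
        """A typed function; transient-style checking inserts a shallow
        guard like transient_check at its boundary."""
        transient_check(xs)
        return sum(xs)

    untyped_value = [1, 2, "three"]  # a value flowing in from untyped code
    transient_check(untyped_value)   # passes: the top-level shape is a list
    # concrete_check(untyped_value)  # would raise immediately
    # total(untyped_value)           # passes the shallow guard but fails
                                     # later, when sum() reaches "three"

This mirrors the trade-off named in the abstract: the shallow guard is cheap and imposes no constraints on callers, while the deep check gives a stronger guarantee at a higher cost at the boundary.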
Approach: We evaluate the language in its current state and develop a model that captures the essence of its approach to gradual types. We draw upon personal communication, bug reports, and the Static Python regression test suite to develop this model.
Knowledge: Our main finding is that the gradual soundness that arises from a mix of concrete and transient types is an effective way to lower the maintenance cost of the concrete approach. We also find that method-based JIT technology can eliminate the costs of the transient approach. On a more technical level, this paper describes two contributions: a model of Static Python and a performance evaluation of Static Python. The process of formalization uncovered several errors in the implementation, including fatal ones.
Grounding: Our model of Static Python is implemented in PLT Redex and tested using property-based soundness tests and 265 tests from the Static Python regression suite. This paper includes a small core of the model to convey the main ideas of the Static Python approach and its soundness. Our performance claims are based on production experience in the Instagram web server. Migrations to Static Python in the server have caused a 3.7% increase in requests handled per second at maximum CPU load.
Importance: Static Python is the first sound gradual language whose piecemeal application to a realistic codebase has consistently improved performance. Other language designers may wish to replicate its approach, especially those who currently maintain unsound gradual languages and are seeking a path to soundness.

Compilation Forking: A Fast and Flexible Way of Generating Data for Compiler-Internal Machine Learning Tasks
Raphael Mosaner, David Leopoldseder, Wolfgang Kisling, Lukas Stadler, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs Vienna, Austria; Oracle Labs Linz, Austria)
Compiler optimization decisions are often based on hand-crafted heuristics centered around a few established benchmark suites. Alternatively, they can be learned from feature and performance data produced during compilation. However, data-driven compiler optimizations based on machine learning models require large sets of quality data for training in order to match or even outperform existing human-crafted heuristics. In static compilation setups, related work has addressed this problem with iterative compilation. A dynamic compiler, however, may produce different data depending on dynamically chosen compilation strategies, which complicates the generation of comparable data. We propose compilation forking, a technique for generating consistent feature and performance data from arbitrary, dynamically compiled programs. Different versions of program parts with the same profiling and compilation history are executed within single program runs to minimize noise stemming from dynamic compilation and the runtime environment. Our approach facilitates large-scale performance evaluations of compiler optimization decisions. Additionally, compilation forking supports creating domain-specific compilation strategies based on machine learning by providing the data for model training. We implemented compilation forking in the GraalVM compiler in a programming-language-agnostic way. To assess the quality of the generated data, we trained several machine learning models to replace compiler heuristics for loop-related optimizations. The trained models perform on par with the highly tuned compiler heuristics when comparing the geometric means of benchmark suite performances; larger impacts on a few individual benchmarks range from speedups of 20% to slowdowns of 17%. The presented approach can be implemented in any dynamic compiler, and we believe it can help analyze compilation decisions and bring machine learning to dynamic compilation.
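As a rough illustration of the forking idea outside any real compiler: the sketch below times several "compilation variants" of the same computation within a single process run, so that noise from the runtime environment affects all variants equally and the resulting data stays comparable. The compile_with stand-in, the decision names, and the timing protocol are assumptions for illustration; the actual system forks compilation inside the GraalVM compiler so that all variants share the same profiling and compilation history.

    import time

    def compile_with(decision):
        """Stand-in 'compiler': each decision yields a differently
        'optimized' implementation of the same computation."""
        if decision == "unrolled":
            def run(n):
                total, i = 0, 0
                while i + 4 <= n:  # loop manually unrolled by four
                    total += i + (i + 1) + (i + 2) + (i + 3)
                    i += 4
                while i < n:       # remainder iterations
                    total += i
                    i += 1
                return total
        else:  # "simple"
            def run(n):
                return sum(range(n))
        return run

    def compilation_fork(decisions, workload, repeats=5):
        """Execute every variant within the same program run and record
        its best observed time, yielding comparable data per decision."""
        results = {}
        for decision in decisions:
            compiled = compile_with(decision)
            timings = []
            for _ in range(repeats):
                start = time.perf_counter()
                compiled(workload)
                timings.append(time.perf_counter() - start)
            results[decision] = min(timings)
        return results

    print(compilation_fork(["simple", "unrolled"], workload=1_000_000))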
