Ensuring the integrity of software build artifacts is an increasingly important concern for modern software engineering, driven by increasingly sophisticated attacks on build systems, distribution channels, and development infrastructures. Reproducible builds -- where binaries built independently from the same source code can be verified to be bit-for-bit identical to the distributed artifacts -- provide a principled foundation for transparency and trust in software distribution. Despite their potential, the large-scale adoption of reproducible builds faces two significant challenges: achieving high reproducibility rates across vast software collections and establishing reproducibility monitoring infrastructure that can operate at very large scale. While recent studies have shown that high reproducibility rates are achievable at scale -- demonstrated by the Nix ecosystem achieving over 90% reproducibility on more than 80,000 packages -- the problem of effective reproducibility monitoring remains largely unsolved. In this work, we address the reproducibility monitoring challenge by introducing Lila, a decentralized system for reproducibility assessment tailored to the functional package management model. Lila enables distributed reporting of build results and aggregation into a reproducibility database, benefiting both practitioners and future empirical build reproducibility studies.
The reproducibility of software environments is a critical concern in modern software engineering, with ramifications ranging from the effectiveness of collaboration workflows to software supply chain security and scientific reproducibility. Containerization technologies like Docker address this problem by encapsulating software environments into shareable filesystem snapshots known as images. While Docker is frequently cited in the literature as a tool that enables reproducibility in theory, the extent of its guarantees and limitations in practice remains under-explored. In this work, we address this gap through two complementary approaches. First, we conduct a systematic literature review to examine how Docker is framed in scientific discourse on reproducibility and to identify documented best practices for writing Dockerfiles enabling reproducible image building. Then, we perform a large-scale empirical study of 5298 Docker builds collected from GitHub workflows. By rebuilding these images and comparing the results with their historical counterparts, we assess the real reproducibility of Docker images and evaluate the effectiveness of the best practices identified in the literature.
In March 2024, a backdoor was discovered in xz, a compression software widely used across Linux distributions. This work analyzes the backdoor's mechanics and explores how bitwise build reproducibility, systematically applied within the nixpkgs bootstrap, could have mechanically detected the compromise. It proposes a concrete verification scheme based on rebuilding suspect packages after the bootstrap from independently-sourced tarballs and checking for bitwise convergence.
Reproducible Builds (R-B) guarantee that rebuilding a software package from source leads to bitwise identical artifacts. R-B is a promising approach to increase the integrity of the software supply chain, when installing open source software built by third parties. Unfortunately, despite success stories like high build reproducibility levels in Debian packages, uncertainty remains among field experts on the scalability of R-B to very large package repositories. In this work, we perform the first large-scale study of bitwise reproducibility, in the context of the Nix functional package manager, rebuilding 709 816 packages from historical snapshots of the nixpkgs repository, the largest cross-ecosystem open source software distribution, sampled in the period 2017-2023. We obtain very high bitwise reproducibility rates, between 69 and 91% with an upward trend, and even higher rebuildability rates, over 99%. We investigate unreproducibility causes, showing that about 15% of failures are due to embedded build dates. We release a novel dataset with all build statuses, logs, as well as full 'diffoscopes': recursive diffs of where unreproducible build artifacts differ.
Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FMPs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain.
Modern software engineering builds up on the composability of software components, that rely on more and more direct and transitive dependencies to build their functionalities. This principle of reusability however makes it harder to reproduce projects' build environments, even though reproducibility of build environments is essential for collaboration, maintenance and component lifetime. In this work, we argue that functional package managers provide the tooling to make build environments reproducible in space and time, and we produce a preliminary evaluation to justify this claim. Using historical data, we show that we are able to reproduce build environments of about 7 million Nix packages, and to rebuild 99.94% of the 14 thousand packages from a 6-year-old Nixpkgs revision.