Reproducibility

Vaibify is built around the principle that every computational result should be reproducible from a single command. This page describes the tools and practices that make this possible.

The Reproducibility Stack

A Vaibify project captures four layers of provenance:

Environment – The Docker image pins the operating system, compilers, system libraries, Python version, and all package versions.
Code – container.conf lists every repository with its branch or tag, so the exact source code is recorded.
Pipeline – workflow.json defines the commands to run and their order, removing ambiguity about how results were produced.
Configuration – vaibify.yml records all settings, so a collaborator can rebuild the identical environment.

Together, these four files constitute a reproducibility manifest. Sharing them (or the repository that contains them) is sufficient for anyone with Docker to reproduce the results.

L1 precondition: workflows live inside a git repo

Vaibify enforces the lowest rung of the reproducibility ladder as a precondition, not a best practice. Every workflow must live inside a git repository — its project repo — which vaibify auto-detects as the git work tree enclosing the workflow.json file. A workflow saved to a directory that is not a git work tree is rejected at both creation and connect time with a clear error pointing the user to run git init. The dashboard cannot display a meaningful reproducibility level for code that cannot be committed, so asking for one would be dishonest.

The project-repo path is auto-detected once per connect via git rev-parse --show-toplevel, stamped on the in-memory workflow dict, and threaded through every subsequent status, badge, and manifest call. A single container may host multiple workflows in separate project-repo subdirectories (for example, a paper pipeline and a follow-on cross-system analysis that share the same dependency clones); the active workflow determines the scope of every per-file badge.
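
The lookup can be approximated in pure Python for illustration. The text says vaibify itself calls git rev-parse --show-toplevel; the sketch below instead walks up the directory tree looking for a .git entry, which gives the same answer for ordinary clones. The function names find_project_repo and require_project_repo are hypothetical and are not part of vaibify.

```python
from pathlib import Path
from typing import Optional


def find_project_repo(workflow_path: str) -> Optional[Path]:
    """Return the nearest enclosing directory that contains a .git entry."""
    start = Path(workflow_path).resolve().parent
    for candidate in (start, *start.parents):
        # .git is a directory in a normal clone and a file in a linked
        # worktree or submodule, so test for existence rather than is_dir().
        if (candidate / ".git").exists():
            return candidate
    return None


def require_project_repo(workflow_path: str) -> Path:
    """Reject a workflow that does not live inside a git work tree."""
    repo = find_project_repo(workflow_path)
    if repo is None:
        raise ValueError(
            f"{workflow_path} is not inside a git work tree; run 'git init' first"
        )
    return repo
```

Resolving the path once and passing the result along, rather than re-detecting it in every call, mirrors the "once per connect" behaviour described above.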

Test markers (the JSON files that record the last pytest outcome and the output-file hashes for each step) live inside the project repo under .vaibify/test_markers/ and are committed alongside workflow.json. This makes a workflow’s verification state (which tests have run, what they produced, and whether the outputs have drifted) reproducible from a fresh clone without rerunning anything.

AICS Level 3 — Reproducible

Vaibify targets AICS Level 3 (“Reproducible”) on the AI Containment Scale: third parties can confirm that the artefacts they hold are byte-for-byte identical to the artefacts the original workflow produced. Level 3 is a claim about file-byte identity, not numerical re-derivation. Re-running the workflow on a different machine may produce slightly different bytes for the same inputs (CPU/BLAS variance, see Known limitations); the hashes recorded in MANIFEST.sha256 describe the bytes the original run produced, and those bytes can be redistributed and verified anywhere coreutils is installed.

The Reproducibility Envelope

An honest L3 claim covers three tiers. Vaibify writes one file per tier into the project repo, and each tier is independently verifiable with standard tools — vaibify is the orchestrator, not a dependency.

The envelope is regenerated automatically when the workflow transitions to all-green (every step fully verified), in addition to manual Archive button presses. This keeps the manifest in sync with the latest verified state without requiring the user to remember to trigger it.

Tier 1 — Artifacts (`MANIFEST.sha256`)

A GNU coreutils sha256sum-format file at the project-repo root listing every declared workflow output (everything in each step’s saOutputFiles, saPlotFiles, and saDataFiles) by repo-relative POSIX path with its SHA-256 hash:

1a2b3c...  scripts/runAnalysis.py
4d5e6f...  data/results.csv
7a8b9c...  plots/figure1.pdf

Paths containing newlines or backslashes are encoded with the GNU escape convention: the line is prefixed with \ and the path itself has \\ for backslash and \n for newline. This prevents an attacker from forging a second manifest line by injecting a newline into a filename.
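
A minimal sketch of how one manifest line could be produced, including the escape rule just described. The helper names sha256_of_file and format_manifest_line are hypothetical; this is not the body of fnWriteManifest, only an illustration of the line format that sha256sum -c accepts.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def format_manifest_line(hex_digest: str, rel_path: str) -> str:
    """Return '<digest>  <path>', escaped when the path needs it."""
    if "\\" not in rel_path and "\n" not in rel_path:
        return f"{hex_digest}  {rel_path}"
    # Escape backslashes first; otherwise the backslash introduced for a
    # newline would itself be doubled.
    escaped = rel_path.replace("\\", "\\\\").replace("\n", "\\n")
    # A leading backslash tells the verifier that the path is escaped.
    return f"\\{hex_digest}  {escaped}"
```

Because the escaped form never contains a literal newline, a hostile filename such as "a\nb" still occupies exactly one manifest line.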

The manifest is written by fnWriteManifest and verified in-process by flistVerifyManifest. The file is also verifiable on any system that ships coreutils:

sha256sum -c MANIFEST.sha256

No vaibify install is required for this check.

An architectural-invariants test enforces that every path-list field in workflow.json (saOutputFiles, saPlotFiles, saDataFiles, and any future addition) is reflected in MANIFEST.sha256, guarding against silent under-tracking when the workflow schema is extended.
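
The architectural-invariants check on path-list fields can be pictured as a set difference between what workflow.json declares and what the manifest lists. The sketch below rests on two loudly labelled assumptions that the text does not confirm: that steps sit under a top-level "listSteps" key, and that path-list fields can be recognised by the sa...Files naming pattern. All function names are hypothetical.

```python
from typing import Dict, List, Set


def collect_declared_paths(workflow: Dict) -> Set[str]:
    """Gather every path from every path-list field of every step."""
    declared: Set[str] = set()
    for step in workflow.get("listSteps", []):  # "listSteps" is an assumed key
        for key, value in step.items():
            # Assumed convention: path-list fields are named sa<Something>Files,
            # so a newly added field is picked up without code changes.
            if key.startswith("sa") and key.endswith("Files"):
                declared.update(value)
    return declared


def parse_manifest_paths(manifest_text: str) -> Set[str]:
    """Extract the path column from unescaped '<digest>  <path>' lines."""
    paths: Set[str] = set()
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        _digest, _sep, path = line.partition("  ")
        paths.add(path)
    return paths


def find_untracked_outputs(workflow: Dict, manifest_text: str) -> List[str]:
    """Return declared outputs that the manifest fails to list."""
    declared = collect_declared_paths(workflow)
    return sorted(declared - parse_manifest_paths(manifest_text))
```

An empty result means nothing is under-tracked; any returned path is a declared output with no recorded hash.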

Tier 2 — Python dependencies (`requirements.lock`)

A pinned, hash-augmented Python dependency lockfile at the project-repo root. It is generated by fnGenerateRequirementsLock, which shells out to uv pip compile --generate-hashes against pyproject.toml (or requirements.in). Each entry pins an exact version and carries at least one --hash=sha256:... line.

Verifiers reproduce the environment with stock pip:

pip install --require-hashes -r requirements.lock

uv is needed only to generate the lockfile, never to consume it. flistVerifyRequirementsLock performs a structural check (file exists, parses, every entry carries a sha256 hash) without installing.
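
A structural check of this kind might look like the following sketch. It is an assumption about what "parses, every entry carries a sha256 hash" could mean in practice, not the actual flistVerifyRequirementsLock; the names logical_lines and find_lock_problems are hypothetical.

```python
import re
from typing import List

# name==version at the start of an entry (extras such as pkg[extra] allowed).
_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._\-\[\],]*==\S+")
# A complete sha256 hash option: 64 lowercase hex digits.
_HASH = re.compile(r"--hash=sha256:[0-9a-f]{64}\b")


def logical_lines(text: str) -> List[str]:
    """Join backslash-continued lines; drop blank lines and comments."""
    joined: List[str] = []
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            buffer += line[:-1] + " "
            continue
        joined.append(buffer + line)
        buffer = ""
    if buffer:
        joined.append(buffer)
    return joined


def find_lock_problems(text: str) -> List[str]:
    """Return one message per entry that is unpinned or lacks a hash."""
    problems: List[str] = []
    for entry in logical_lines(text):
        if entry.startswith("-"):  # global options such as --index-url
            continue
        if not _PIN.match(entry):
            problems.append(f"not pinned to an exact version: {entry}")
        elif not _HASH.search(entry):
            problems.append(f"missing sha256 hash: {entry}")
    return problems
```

Nothing is installed or downloaded here; the check only confirms that pip install --require-hashes would have a hash to compare against for every entry.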

Tier 3 — Container / system layer (`.vaibify/environment.json`)

A JSON document at <projectRepo>/.vaibify/environment.json capturing the layers below the Python interpreter. It is written by fnWriteEnvironmentJson from three orthogonal capture helpers:

fdictCaptureContainerImageDigest(sContainerName) — the immutable <image>@sha256:... digest of the running container image, via docker inspect.
fdictCaptureHostBinaryHashes(listBinaryPaths) — for each binary the workflow declares as a host-side dependency (e.g., a compiled scientific executable referenced from saHostBinaries in workflow.json), the SHA-256 of the file plus the first line of its --version output.
fdictCaptureSystemTools() — Python interpreter version, gcc --version, platform.libc_ver(), and the contents of /etc/os-release from inside the container.

This tier records what the container layer cannot pin by digest alone, without claiming to bit-pin floating-point arithmetic across CPU architectures.
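
As an illustration of the system-tools capture, the sketch below collects the same four facts with the standard library. The JSON key names and the function names are hypothetical, and the real environment.json layout may differ; the sketch also tolerates a missing gcc or /etc/os-release, which the real helper may treat differently.

```python
import json
import platform
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def first_line_of(command: List[str]) -> Optional[str]:
    """Run a command and return the first line of stdout, or None."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, removing surrounding quotes from values."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def capture_system_tools() -> Dict[str, object]:
    """Collect interpreter, compiler, libc, and OS release details."""
    os_release = Path("/etc/os-release")
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        "gcc": first_line_of(["gcc", "--version"]),
        "libc": {"name": libc_name, "version": libc_version},
        "osRelease": parse_os_release(os_release.read_text())
        if os_release.exists()
        else {},
    }


def system_tools_as_json() -> str:
    """Serialise with sorted keys so repeated captures diff cleanly."""
    return json.dumps(capture_system_tools(), indent=2, sort_keys=True)
```

Sorting the keys keeps the committed file stable across runs, so a git diff shows only genuine environment changes.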

The verification ceremony: `vaibify reproduce`

For users who want one command instead of three, vaibify reproduce walks the three tiers in sequence and (optionally) re-runs the workflow:

$ git clone <project-url> && cd <project>
$ vaibify reproduce
[1/4] Verifying file integrity (MANIFEST.sha256) ... 47/47 OK
[2/4] Reproducing Python env (requirements.lock) ... hashes verified OK
[3/4] Pulling pinned container image ... python@sha256:1a2b... OK
[4/4] Re-running workflow ... skipped (use --rerun)

L3 reproduction confirmed.

Flags:

--repo <path> — path to the project repo (defaults to the current directory).
--rerun / --no-rerun — also run step 4, the full workflow re-execution. Off by default; opt-in because workflows can be expensive and the re-run (called Tier 4 under Known limitations) is best-effort. When enabled, vaibify dispatches to the same pipeline runner that vaibify run uses, against a running container resolved from the project repo.
--skip-tier 1|2|3 — skip a tier; may be repeated. Useful when a verifier only wants to confirm artefact identity without installing Python packages.

Exit codes:

0 — every selected tier passed.
1 — at least one tier failed; per-tier diagnostics are printed above the final summary.
2 — usage error (a required input file is missing, or environment.json is malformed).
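
The exit-code contract above can be sketched as a small dispatcher. This is one way an orchestrator could map per-tier results onto the three codes, under the assumption that each tier check returns a boolean and signals bad input with an exception; ExitCode, UsageError, and run_tiers are hypothetical names.

```python
from enum import IntEnum
from typing import Callable, Dict, Iterable, List


class ExitCode(IntEnum):
    OK = 0
    TIER_FAILED = 1
    USAGE_ERROR = 2


class UsageError(Exception):
    """Raised by a tier check when a required input is missing or malformed."""


def run_tiers(
    checks: Dict[int, Callable[[], bool]],
    skip: Iterable[int] = (),
) -> ExitCode:
    """Run the selected tier checks in order and derive the exit code."""
    skipped = set(skip)
    failed: List[int] = []
    for tier in sorted(checks):
        if tier in skipped:
            continue
        try:
            passed = checks[tier]()
        except UsageError:
            # Bad input is reported immediately and is distinct from a
            # verification failure.
            return ExitCode.USAGE_ERROR
        if not passed:
            # Keep going so diagnostics cover every selected tier.
            failed.append(tier)
    return ExitCode.TIER_FAILED if failed else ExitCode.OK
```

Continuing past a failed tier, rather than stopping at the first one, matches the statement that per-tier diagnostics are printed before the final summary.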

Trust-anchor architecture

vaibify reproduce is a convenience orchestrator, not the trust anchor. The trust anchor for Tier 1 is sha256sum -c MANIFEST.sha256, a coreutils binary every verifier already has. If vaibify reproduce is ever wrong, a third party verifying by hand catches the discrepancy. This is the load-bearing reason the AICS levels are defined independently of vaibify: it makes vaibify auditable rather than authoritative. The same independence applies to Tier 2 (pip install --require-hashes) and Tier 3 (docker pull <image>@sha256:...); each step can be performed manually by anyone who reads the three files.

Remote-mirror verification

When a workflow is pushed to a public mirror — GitHub, Overleaf, or Zenodo — vaibify verifies that the remote copy of every manifested file still matches the SHA-256 recorded at archive time. Each remote exposes a uniform fdictFetchRemoteHashes(...) API (githubMirror.py, overleafMirror.py, zenodoClient.py) that returns one SHA-256 per declared file. Two layers run on top:

Cheap poll — continuous, low-cost change detection (per-file blob SHA-1 or modified-time metadata). Flags “something might have drifted, re-verify.”
Authoritative verify — downloads bytes, recomputes SHA-256, compares against MANIFEST.sha256. Triggered by the per-remote Re-verify button in the dashboard or by the scheduled background loop in scheduledReverify.py. The cadence is currently a single global default (6 hours) set when the FastAPI app is constructed and applied uniformly to every loaded workflow; per-workflow overrides are deferred to a future commit.

Results are cached in <projectRepo>/.vaibify/syncStatus.json keyed by service so the dashboard always shows ground truth without a network round trip on every poll. See the dashboard guide for the resulting UI.
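
The two layers can be illustrated with standard hashing. A git blob SHA-1 covers a short header plus the file bytes, so it can be recomputed locally and compared against a remote listing without downloading anything; the authoritative layer then compares full SHA-256 values against the manifest. Both function names below are hypothetical, and how vaibify obtains the remote values is not shown.

```python
import hashlib
from typing import Dict, List


def git_blob_sha1(content: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def find_drifted_files(
    manifest: Dict[str, str],
    remote: Dict[str, str],
) -> List[str]:
    """Return manifested paths whose remote hash is missing or different."""
    return sorted(
        path
        for path, expected in manifest.items()
        if remote.get(path) != expected
    )
```

Files that exist only on the remote are ignored by find_drifted_files, since the manifest is the reference for what must be preserved.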

Known limitations

Symbolic links are rejected. fnWriteManifest raises ValueError if any declared output is a symlink. Following them silently would let the manifest hash a target the declared path no longer points to; refusing is the only honest behaviour.
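
A minimal sketch of such a guard, assuming it runs before any hashing. The name reject_symlinks is hypothetical, and the sketch inspects only the final path component; whether fnWriteManifest also checks intermediate directories is not stated in the text.

```python
import os
from typing import Iterable


def reject_symlinks(repo_root: str, rel_paths: Iterable[str]) -> None:
    """Raise ValueError if any declared output is a symbolic link."""
    for rel_path in rel_paths:
        full_path = os.path.join(repo_root, rel_path)
        # islink does not follow the link, so a dangling link is caught too.
        if os.path.islink(full_path):
            raise ValueError(f"declared output is a symbolic link: {rel_path}")
```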

Tier 1 is bit-perfect; re-running the workflow is best-effort. MANIFEST.sha256 records the exact bytes a particular run produced, and sha256sum -c confirms those bytes were preserved. Re-executing the workflow on a different CPU, BLAS implementation, or compiler toolchain may produce numerically near-identical but byte-different outputs because of floating-point order-of-operation variance. This is a science-of-reproducibility limitation, not a vaibify defect, and we document it rather than try to engineer around it. Tier 4 (workflow re-run via vaibify reproduce --rerun) is therefore advisory.
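
The effect is easy to reproduce on a single machine: floating-point addition is not associative, so summing the same three numbers in a different order changes the last bit, and therefore the hash of the stored value. The helper sha256_of_double is a hypothetical name used only for this demonstration.

```python
import hashlib
import struct

# The same three values, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)


def sha256_of_double(value: float) -> str:
    """Hash the 8-byte little-endian IEEE 754 encoding of a float."""
    return hashlib.sha256(struct.pack("<d", value)).hexdigest()


# The results agree to about 16 significant digits but not bit for bit.
values_match = left_to_right == right_to_left
hashes_match = sha256_of_double(left_to_right) == sha256_of_double(right_to_left)
```

A BLAS library that reorders a reduction for a different CPU does the same thing at scale, which is why a re-run can be scientifically equivalent and still fail a byte-level comparison.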

The unfixable failure mode. If vaibify reproduce itself is replaced by a tampered binary on the verifier’s machine, vaibify cannot detect that — the same problem every verification tool has, including a tampered sha256sum. The mitigation is the architectural one above: vaibify’s source is public, builds reproducibly, and any verifier can fall back to plain coreutils.

Publishing a Workflow

Generate a GitHub Actions workflow that automates the entire pipeline:

vaibify publish workflow

This reads workflow.json and vaibify.yml, renders the Jinja2 template at templates/workflow.yml.j2, and writes the result to .github/workflows/vaibify.yml.

The generated workflow:

Checks out the repository.
Installs Vaibify.
Builds the Docker image.
Runs each pipeline step inside the container.
Uploads artifacts (figures, data products) to GitHub Actions.

Archiving to Zenodo

Create a Zenodo deposit for long-term archival:

vaibify publish archive

This packages the Docker image, configuration files, and pipeline outputs into a tarball, uploads it to Zenodo (or the Zenodo sandbox, depending on the reproducibility.zenodoService setting), and returns a DOI.

Authentication with Zenodo is handled through the host’s credential manager. Vaibify never stores tokens in configuration files or environment variables.

Version Pinning

For maximum reproducibility, pin repository branches to specific tags or commit hashes in container.conf:

mycode|git@github.com:user/mycode.git|v1.2.3|pip_editable
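
An entry like this can be split and sanity-checked with a few lines of Python. The field names (name, url, ref, install_mode) are guesses from the example, not a documented schema, and looks_pinned is only a heuristic: a branch whose name happens to be hexadecimal or version-like would pass it.

```python
import re
from typing import NamedTuple


class RepoEntry(NamedTuple):
    name: str
    url: str
    ref: str
    install_mode: str


_COMMIT_HASH = re.compile(r"^[0-9a-f]{7,40}$")
_VERSION_TAG = re.compile(r"^v?\d+(\.\d+)*$")


def parse_conf_line(line: str) -> RepoEntry:
    """Split one pipe-separated line into its four assumed fields."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 pipe-separated fields, got {len(parts)}: {line!r}"
        )
    return RepoEntry(*parts)


def looks_pinned(entry: RepoEntry) -> bool:
    """Heuristic: the ref resembles a commit hash or a version tag."""
    return bool(_COMMIT_HASH.match(entry.ref) or _VERSION_TAG.match(entry.ref))
```

A check like this could run in continuous integration to flag entries that still track a moving branch such as main.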

The Docker image caches the cloned repositories, so a changed branch or tag takes effect only after rebuilding with vaibify build, which pulls the updated code.

Network Isolation

Enable networkIsolation: true in vaibify.yml to disable outbound network access from the container. This ensures that the pipeline cannot download external resources at runtime, guaranteeing that all dependencies are captured in the image.