Reproducibility
Vaibify is built around the principle that every computational result should be reproducible from a single command. This page describes the tools and practices that make this possible.
The Reproducibility Stack
A Vaibify project captures four layers of provenance:
Environment – The Docker image pins the operating system, compilers, system libraries, Python version, and all package versions.
Code – container.conf lists every repository with its branch or tag, so the exact source code is recorded.
Pipeline – workflow.json defines the commands to run and their order, removing ambiguity about how results were produced.
Configuration – vaibify.yml records all settings, so a collaborator can rebuild the identical environment.
Together, these four files constitute a reproducibility manifest. Sharing them (or the repository that contains them) is sufficient for anyone with Docker to reproduce the results.
L1 precondition: workflows live inside a git repo
Vaibify enforces the lowest rung of the reproducibility ladder as a
precondition, not a best practice. Every workflow must live inside a
git repository — its project repo — which vaibify auto-detects as
the git work tree enclosing the workflow.json file. A workflow saved
to a directory that is not a git work tree is rejected at both
creation and connect time with a clear error pointing the user to run
git init. The dashboard cannot display a meaningful reproducibility
level for code that cannot be committed, so asking for one would be
dishonest.
The project-repo path is auto-detected once per connect via
git rev-parse --show-toplevel, stamped on the in-memory workflow
dict, and threaded through every subsequent status, badge, and
manifest call. A single container may host multiple workflows in
separate project-repo subdirectories (for example, a paper pipeline
and a follow-on cross-system analysis that share the same dependency
clones); the active workflow determines the scope of every per-file
badge.
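The detection rule can be sketched without invoking git at all. The snippet below is a rough approximation of `git rev-parse --show-toplevel`, not vaibify's implementation: it walks up from the workflow file looking for a `.git` entry and ignores git features such as the `GIT_DIR` environment variable. The function names `find_project_repo` and `require_project_repo` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_project_repo(workflow_path: str) -> Optional[Path]:
    """Return the nearest ancestor directory that holds a .git entry, or None."""
    start = Path(workflow_path).resolve().parent
    for candidate in (start, *start.parents):
        # .git is a directory in a normal clone and a file in a linked
        # worktree or submodule, so test for existence rather than is_dir().
        if (candidate / ".git").exists():
            return candidate
    return None


def require_project_repo(workflow_path: str) -> Path:
    """Reject workflows that do not live inside a git work tree."""
    repo = find_project_repo(workflow_path)
    if repo is None:
        raise ValueError(
            f"{workflow_path} is not inside a git work tree; run 'git init' first"
        )
    return repo
```

Resolving the path once and passing the result along mirrors the "detect once per connect" behaviour described above.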
Test markers (the JSON files that record the last pytest outcome and
output-file hashes for each step) live inside the project repo under
.vaibify/test_markers/ and are committed alongside workflow.json.
This makes a workflow’s verification state — which tests have run,
what they produced, whether the outputs have drifted — reproducible
from a fresh clone without rerunning anything.
AICS Level 3 — Reproducible
Vaibify targets AICS Level 3 (“Reproducible”) on the AI
Containment Scale: third parties can confirm, at the bit level, that
the artefacts they hold are byte-for-byte identical to the artefacts
the original workflow produced. Level 3 is a claim about file-byte
identity, not numerical re-derivation. Re-running the workflow on a
different machine may produce slightly different bytes for the same
inputs (CPU/BLAS variance, see Known
limitations); the hashes recorded in
MANIFEST.sha256 describe the bytes the original run produced, and
those bytes can be redistributed and verified anywhere coreutils is
installed.
The Reproducibility Envelope
An honest L3 claim covers three tiers. Vaibify writes one file per tier into the project repo, and each tier is independently verifiable with standard tools — vaibify is the orchestrator, not a dependency.
The envelope is regenerated automatically when the workflow transitions to all-green (every step fully verified), in addition to manual Archive button presses. This keeps the manifest in sync with the latest verified state without requiring the user to remember to trigger it.
Tier 1 — Artifacts (MANIFEST.sha256)
A GNU coreutils sha256sum-format file at the project-repo root listing
every declared workflow output (everything in each step’s
saOutputFiles, saPlotFiles, and saDataFiles) by repo-relative
POSIX path with its SHA-256 hash:
1a2b3c...  scripts/runAnalysis.py
4d5e6f...  data/results.csv
7a8b9c...  plots/figure1.pdf
Paths containing newlines or backslashes are encoded with the GNU
escape convention: the line is prefixed with \ and the path itself
has \\ for backslash and \n for newline. This prevents an
attacker from forging a second manifest line by injecting a newline
into a filename.
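A minimal sketch of the escaping rule, assuming the GNU text-mode layout of digest, two spaces, then path. The helper name `format_manifest_line` is hypothetical and is not taken from vaibify's source.

```python
def format_manifest_line(hex_digest: str, rel_path: str) -> str:
    """Format one manifest line, escaping backslashes and newlines GNU-style."""
    if "\\" not in rel_path and "\n" not in rel_path:
        return f"{hex_digest}  {rel_path}"
    # Escape backslashes first so the backslash introduced for a newline
    # is not itself doubled.
    escaped = rel_path.replace("\\", "\\\\").replace("\n", "\\n")
    # A leading backslash tells the reader that the path is escaped.
    return f"\\{hex_digest}  {escaped}"
```

Because a newline in a filename is written as the two characters backslash and `n`, every file occupies exactly one manifest line regardless of its name.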
Written by fnWriteManifest and verified in-process by flistVerifyManifest. The file is also
verifiable on any system that ships coreutils:
sha256sum -c MANIFEST.sha256
An architectural-invariants test enforces that every path-list field
in workflow.json (saOutputFiles, saPlotFiles, saDataFiles,
and any future addition) is reflected in MANIFEST.sha256 — guarding
against silent under-tracking when the workflow schema is extended.
Verifying the manifest with sha256sum -c requires no vaibify install.
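Where coreutils is not at hand, the same check can be approximated with the Python standard library. This is a sketch of what `sha256sum -c` does, not vaibify's flistVerifyManifest; the names `verify_manifest` and `unescape_gnu_path` are hypothetical, and the reader handles only the backslash and newline escapes.

```python
import hashlib
from pathlib import Path
from typing import List


def unescape_gnu_path(escaped: str) -> str:
    # Reverse the escaping: backslash-backslash is a backslash,
    # backslash-n is a newline.
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\" and i + 1 < len(escaped):
            out.append("\n" if escaped[i + 1] == "n" else escaped[i + 1])
            i += 2
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)


def verify_manifest(repo_root: str, manifest_name: str = "MANIFEST.sha256") -> List[str]:
    """Return the manifest paths whose bytes are missing or do not match."""
    root = Path(repo_root)
    text = (root / manifest_name).read_text(encoding="utf-8")
    failures = []
    for line in text.split("\n"):
        if not line:
            continue
        is_escaped = line.startswith("\\")
        body = line[1:] if is_escaped else line
        # Layout: hex digest, one space, a mode character (space or '*'), path.
        digest, _, rest = body.partition(" ")
        rel_path = rest[1:]
        if is_escaped:
            rel_path = unescape_gnu_path(rel_path)
        target = root / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        # Reads the whole file into memory; fine for a sketch.
        if hashlib.sha256(target.read_bytes()).hexdigest() != digest.lower():
            failures.append(rel_path)
    return failures
```

An empty return value corresponds to every line of `sha256sum -c` reporting OK.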
Tier 2 — Python dependencies (requirements.lock)
A pinned, hash-augmented Python dependency lockfile at the project-repo
root. Generated by fnGenerateRequirementsLock, which shells out to
uv pip compile --generate-hashes against
pyproject.toml (or requirements.in). Each entry pins an exact
version and carries at least one --hash=sha256:... line.
Verifiers reproduce the environment with stock pip:
pip install --require-hashes -r requirements.lock
uv is needed only to generate the lockfile, never to consume it.
flistVerifyRequirementsLock performs a structural check (file
exists, parses, every entry carries a sha256 hash) without installing.
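A structural check of this kind could look roughly like the following. It is a sketch under assumptions, not flistVerifyRequirementsLock itself: it assumes the usual layout of one `name==version` entry followed by backslash-continued `--hash=sha256:` lines, and the name `find_unhashed_entries` is hypothetical.

```python
import re
from typing import List

# A sha256 digest is exactly 64 lowercase hex characters.
_HASH_RE = re.compile(r"--hash=sha256:[0-9a-f]{64}\b")
# An exact pin: package name, optional extras, then '=='.
_PIN_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s\\;]+")


def find_unhashed_entries(lock_text: str) -> List[str]:
    """Return entries that lack an exact pin or a sha256 hash."""
    # Join backslash continuations so each entry sits on one logical line.
    joined = re.sub(r"\\\r?\n", " ", lock_text)
    problems = []
    for raw in joined.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and global options such as --index-url.
        if not line or line.startswith("#") or line.startswith("--"):
            continue
        if _PIN_RE.match(line) is None or _HASH_RE.search(line) is None:
            problems.append(line.split()[0])
    return problems
```

Nothing is installed or downloaded; the function only inspects the text of the lockfile.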
Tier 3 — Container / system layer (.vaibify/environment.json)
A JSON document at <projectRepo>/.vaibify/environment.json capturing
the layers below the Python interpreter. Written by fnWriteEnvironmentJson from three orthogonal capture helpers:
fdictCaptureContainerImageDigest(sContainerName): the immutable <image>@sha256:... digest of the running container image, via docker inspect.
fdictCaptureHostBinaryHashes(listBinaryPaths): for each binary the workflow declares as a host-side dependency (e.g., a compiled scientific executable referenced from saHostBinaries in workflow.json), the SHA-256 of the file plus the first line of its --version output.
fdictCaptureSystemTools(): Python interpreter version, gcc --version, platform.libc_ver(), and the contents of /etc/os-release from inside the container.
This tier records what the container layer cannot pin by digest alone, without claiming to bit-pin floating-point arithmetic across CPU architectures.
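As an illustration of the system-tools capture, the sketch below gathers the same four facts with the standard library. It is not fdictCaptureSystemTools: the function name `capture_system_tools` and the dictionary keys are hypothetical, since the actual schema of environment.json is not shown here.

```python
import platform
import shutil
import subprocess
from pathlib import Path


def capture_system_tools() -> dict:
    """Collect interpreter, compiler, libc, and OS release details."""
    libc_name, libc_version = platform.libc_ver()

    gcc_version = None
    gcc_path = shutil.which("gcc")
    if gcc_path:
        result = subprocess.run(
            [gcc_path, "--version"],
            capture_output=True, text=True, check=False, timeout=10,
        )
        lines = result.stdout.splitlines()
        # Keep only the first line, which names the compiler and version.
        gcc_version = lines[0] if lines else None

    os_release = Path("/etc/os-release")
    return {
        "pythonVersion": platform.python_version(),  # hypothetical key names
        "gccVersion": gcc_version,
        "libc": {"name": libc_name, "version": libc_version},
        "osRelease": os_release.read_text(encoding="utf-8")
        if os_release.is_file() else None,
    }
```

Missing tools are recorded as `None` rather than raising, so the capture still produces a document on minimal images.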
The verification ceremony: vaibify reproduce
For users who want one command instead of three, vaibify reproduce walks the three tiers in sequence and (optionally) re-runs the workflow:
$ git clone <project-url> && cd <project>
$ vaibify reproduce
[1/4] Verifying file integrity (MANIFEST.sha256) ... 47/47 OK
[2/4] Reproducing Python env (requirements.lock) ... hashes verified OK
[3/4] Pulling pinned container image ... python@sha256:1a2b... OK
[4/4] Re-running workflow ... skipped (use --rerun)
L3 reproduction confirmed.
Flags:
--repo <path>: path to the project repo (defaults to the current directory).
--rerun/--no-rerun: also run step 4, the full workflow re-execution. Off by default; opt-in because workflows can be expensive and Tier 4 is best-effort (see Known limitations). When enabled, vaibify dispatches to the same pipeline runner that vaibify run uses, against a running container resolved from the project repo.
--skip-tier 1|2|3: skip a tier; may be repeated. Useful when a verifier only wants to confirm artefact identity without installing Python packages.
Exit codes:
0: every selected tier passed.
1: at least one tier failed; per-tier diagnostics are printed above the final summary.
2: usage error (a required input file is missing, or environment.json is malformed).
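For scripted use, for example in a CI job, the flags and exit codes above can be wrapped in a small helper. This is a sketch that uses only the flags documented on this page; the helper names `build_reproduce_command` and `describe_exit_code` are hypothetical.

```python
from typing import Iterable, List

# Meanings taken from the exit-code list above.
EXIT_MEANINGS = {
    0: "every selected tier passed",
    1: "at least one tier failed",
    2: "usage error",
}


def build_reproduce_command(
    repo: str = ".", rerun: bool = False, skip_tiers: Iterable[int] = ()
) -> List[str]:
    """Assemble the argument list for a vaibify reproduce invocation."""
    argv = ["vaibify", "reproduce", "--repo", repo]
    argv.append("--rerun" if rerun else "--no-rerun")
    for tier in skip_tiers:
        if tier not in (1, 2, 3):
            raise ValueError(f"--skip-tier accepts 1, 2, or 3, got {tier!r}")
        # The flag may be repeated, once per skipped tier.
        argv += ["--skip-tier", str(tier)]
    return argv


def describe_exit_code(code: int) -> str:
    """Translate an exit status into the documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")
```

A caller would pass the list to `subprocess.run` and feed the resulting `returncode` to `describe_exit_code`.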
Trust-anchor architecture
vaibify reproduce is a convenience orchestrator, not the trust
anchor. The trust anchor for Tier 1 is sha256sum -c MANIFEST.sha256,
a coreutils binary every verifier already has. If vaibify reproduce is ever wrong, a third party verifying by hand catches the
discrepancy. This is the load-bearing reason the AICS levels are
defined independently of vaibify: it makes vaibify auditable rather
than authoritative. The same independence applies to Tier 2 (pip install --require-hashes) and Tier 3 (docker pull <image>@sha256:...); each step can be performed manually by anyone
who reads the three files.
Remote-mirror verification
When a workflow is pushed to a public mirror — GitHub, Overleaf, or
Zenodo — vaibify verifies that the remote copy of every manifested
file still matches the SHA-256 recorded at archive time. Each remote
exposes a uniform fdictFetchRemoteHashes(...) API
(githubMirror.py, overleafMirror.py, zenodoClient.py) that
returns one SHA-256 per declared file. Two layers run on top:
Cheap poll: continuous, low-cost change detection (per-file blob SHA-1 or modified-time metadata). It flags “something might have drifted, re-verify.”
Authoritative verify: downloads bytes, recomputes SHA-256, and compares against MANIFEST.sha256. Triggered by the per-remote Re-verify button in the dashboard or by the scheduled background loop in scheduledReverify.py. The cadence is currently a single global default (6 hours) set when the FastAPI app is constructed and applied uniformly to every loaded workflow; per-workflow overrides are deferred to a future commit.
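The two layers can be illustrated with a pair of small functions. This is a sketch, not the code in the mirror modules: it assumes the "blob SHA-1" is git's object hash (SHA-1 over a `blob <size>` header, a NUL byte, and the content), and the names `git_blob_sha1` and `find_drifted` are hypothetical.

```python
import hashlib
from typing import Dict, List


def git_blob_sha1(data: bytes) -> str:
    """Compute the object ID git assigns to a file's content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def find_drifted(manifest: Dict[str, str], remote_hashes: Dict[str, str]) -> List[str]:
    """Return manifest paths whose remote SHA-256 is absent or different."""
    return sorted(
        path
        for path, expected in manifest.items()
        if remote_hashes.get(path) != expected
    )
```

Comparing a locally computed blob ID with the one a remote reports is cheap because no file bytes are transferred; only a mismatch needs to trigger the full SHA-256 comparison.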
Results are cached in <projectRepo>/.vaibify/syncStatus.json keyed
by service so the dashboard always shows ground truth without a
network round trip on every poll. See the dashboard
guide for the resulting UI.
Known limitations
Symbolic links are rejected. fnWriteManifest raises
ValueError if any declared output is a symlink. Following them
silently would let the manifest hash a target the declared path no
longer points to; refusing is the only honest behaviour.
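The refusal can be expressed in a few lines. The sketch below is not fnWriteManifest; the name `reject_symlinks` is hypothetical, and it checks only the final path component, not symlinked parent directories.

```python
from pathlib import Path
from typing import Iterable


def reject_symlinks(repo_root: str, rel_paths: Iterable[str]) -> None:
    """Raise ValueError if any declared output is a symbolic link."""
    root = Path(repo_root)
    for rel_path in rel_paths:
        # is_symlink() inspects the link itself and does not follow it.
        if (root / rel_path).is_symlink():
            raise ValueError(f"declared output is a symbolic link: {rel_path}")
```

Running a check like this before hashing ensures that no digest is ever computed through a link.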
Tier 1 is bit-perfect; re-running the workflow is best-effort.
MANIFEST.sha256 records the exact bytes a particular run produced,
and sha256sum -c confirms those bytes were preserved. Re-executing
the workflow on a different CPU, BLAS implementation, or compiler
toolchain may produce numerically near-identical but
byte-different outputs because of floating-point order-of-operation
variance. This is a science-of-reproducibility limitation, not a
vaibify defect, and we document it rather than try to engineer around
it. Tier 4 (workflow re-run via vaibify reproduce --rerun) is
therefore advisory.
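The effect is easy to demonstrate on a single machine by changing only the order of a three-term sum. The snippet is purely illustrative and does not involve vaibify.

```python
import hashlib
import struct

values = [0.1, 0.2, 0.3]

# The same three numbers, added in two different orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])


def digest(x: float) -> str:
    """SHA-256 of the 8-byte little-endian IEEE 754 encoding of x."""
    return hashlib.sha256(struct.pack("<d", x)).hexdigest()


numerically_close = abs(left_to_right - right_to_left) < 1e-12
same_bytes = digest(left_to_right) == digest(right_to_left)

print(numerically_close, same_bytes)
```

The two sums agree to within rounding error yet hash differently, which is why a re-run can be scientifically equivalent and still fail a byte-level comparison.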
The unfixable failure mode. If vaibify reproduce itself is
replaced by a tampered binary on the verifier’s machine, vaibify
cannot detect that — the same problem every verification tool has,
including a tampered sha256sum. The mitigation is the architectural
one above: vaibify’s source is public, builds reproducibly, and any
verifier can fall back to plain coreutils.
Publishing a Workflow
Generate a GitHub Actions workflow that automates the entire pipeline:
vaibify publish workflow
This reads workflow.json and vaibify.yml, renders the Jinja2
template at templates/workflow.yml.j2, and writes the result to
.github/workflows/vaibify.yml.
The generated workflow:
Checks out the repository.
Installs Vaibify.
Builds the Docker image.
Runs each pipeline step inside the container.
Uploads artifacts (figures, data products) to GitHub Actions.
Archiving to Zenodo
Create a Zenodo deposit for long-term archival:
vaibify publish archive
This packages the Docker image, configuration files, and pipeline outputs
into a tarball, uploads it to Zenodo (or the Zenodo sandbox, depending on
the reproducibility.zenodoService setting), and returns a DOI.
Authentication with Zenodo is handled through the host’s credential manager. Vaibify never stores tokens in configuration files or environment variables.
Version Pinning
For maximum reproducibility, pin repository branches to specific tags or
commit hashes in container.conf:
mycode|git@github.com:user/mycode.git|v1.2.3|pip_editable
The Docker image caches the cloned repositories, so a changed branch or
tag takes effect only after rebuilding with vaibify build, which pulls
the updated code.
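The pipe-delimited container.conf line shown above can be read with a small parser. This is a sketch based only on that one example line: the field names in `RepoEntry` are hypothetical, and `looks_like_commit` is a lexical heuristic that cannot tell a tag from a branch name.

```python
import re
from typing import NamedTuple


class RepoEntry(NamedTuple):
    name: str          # hypothetical field names inferred from the example
    url: str
    ref: str
    install_mode: str


_COMMIT_RE = re.compile(r"^[0-9a-f]{7,40}$")


def parse_conf_line(line: str) -> RepoEntry:
    """Split one container.conf entry into its four fields."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(fields)}")
    return RepoEntry(*fields)


def looks_like_commit(ref: str) -> bool:
    """True if the ref is 7 to 40 lowercase hex characters, like a commit hash."""
    return _COMMIT_RE.match(ref) is not None
```

A full commit hash is the strictest pin, because tags and branches can be moved after the fact.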
Network Isolation
Enable networkIsolation: true in vaibify.yml to disable outbound
network access from the container. The pipeline then cannot
download external resources at runtime, so a run that completes has used
only dependencies already captured in the image.