Reproducibility

Reproducibility is one reason to record lineage. Another is knowing exactly what went into a model or dataset (did you accidentally train on test data?), and that provenance is what TReqs builds on. This page covers the reproducibility side: what roar can and can't promise, the distinction that matters most (reproducibility vs. recreatability), why some runs won't come back the same, and the handful of habits that keep your work reproducible. It ends with a short how-to for roar reproduce.

Reproducibility vs. recreatability

These are often used interchangeably; roar keeps them apart because the difference is where most surprises live.

  • Recreatability — you can re-run the recipe. The code, the inputs, the environment, and the steps are all recoverable, so the pipeline can be executed again on another machine. This is what a lineage DAG captures: a recipe for rebuilding an artifact.
  • Reproducibility — you re-run the recipe and get the same bytes. The output hash matches the original.

Recreatability is a property of the record — did roar capture everything needed to run it again? Reproducibility is a property of the run — given a faithful recreation, does the pipeline produce an identical result? A lineage can be perfectly recreatable and still not reproduce byte-for-byte, because the pipeline itself is non-deterministic.
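A byte-level comparison makes the distinction concrete. The following minimal Python sketch is not roar's implementation. It hashes an original output and a recreated output and compares the digests. SHA-256 stands in for "the output hash" purely as an illustrative assumption, and the helper names are hypothetical.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hex digest of an artifact's bytes (SHA-256 assumed for illustration)."""
    return hashlib.sha256(data).hexdigest()

def reproduces(original: bytes, recreated: bytes) -> bool:
    """A recreated run reproduces the original only if the bytes hash identically."""
    return content_hash(original) == content_hash(recreated)

# Same recipe, same bytes: reproducible.
print(reproduces(b"accuracy=0.91\n", b"accuracy=0.91\n"))  # True
# Same recipe, different bytes (e.g. an unseeded shuffle): recreatable, not reproducible.
print(reproduces(b"accuracy=0.91\n", b"accuracy=0.90\n"))  # False
```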

roar's job is to make your work recreatable — to record the code, inputs, and environment honestly, and to tell you when something is missing. Whether a recreated run reproduces the original bytes is up to the pipeline.

Non-determinism

A recreatable pipeline that reads the same inputs and runs the same code can still produce different output. Common sources:

  • Unseeded randomness — RNGs without a fixed seed, random data shuffling, dropout.
  • Wall-clock time and timestamps — anything that embeds "now" into an output.
  • Concurrency — thread and process scheduling, non-deterministic reduction order.
  • Hardware and kernels — GPU floating-point atomics, BLAS variants, differing CPU instruction sets.
  • Dependency drift — a package range that resolves to a newer build than the one recorded.

roar does not make a non-deterministic pipeline deterministic; nothing can, short of changing the pipeline. What it does is remove the variables it can control: pinned packages, the recorded interpreter, the exact commit. When a recreated run still differs, you know the cause is the pipeline's own non-determinism, not a missing input or the wrong code. That is a precise, useful answer.

If byte-identical output matters to you, make the pipeline deterministic at the source: seed your RNGs, avoid embedding timestamps, and pin anything that floats.
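Each of those three habits fits in a few lines. The sketch below is plain Python, not roar. It shuffles with a private seeded generator and keeps the wall-clock time out of the serialized bytes unless asked. It also flags requirement strings that aren't exact `==` pins. The function names and the deliberately strict pin rule are illustrative assumptions.

```python
import random
import re
from datetime import datetime, timezone

def shuffled_ids(n: int, seed: int) -> list[int]:
    """Shuffle with a private, seeded RNG so every run gets the same order."""
    rng = random.Random(seed)  # seeded instance, independent of the global RNG state
    ids = list(range(n))
    rng.shuffle(ids)
    return ids

def report(ids: list[int], embed_time: bool = False) -> bytes:
    """Serialize results; embedding "now" makes the bytes differ on every run."""
    lines = [",".join(map(str, ids))]
    if embed_time:
        lines.append(datetime.now(timezone.utc).isoformat())  # non-deterministic
    return ("\n".join(lines) + "\n").encode()

# Strict on purpose: only "name==version" counts as pinned; ranges, wildcards,
# extras, and environment markers are all treated as floating.
_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[^=,;\s*]+$")

def is_pinned(requirement: str) -> bool:
    return bool(_PIN.match(requirement.strip()))

assert report(shuffled_ids(10, seed=42)) == report(shuffled_ids(10, seed=42))
print(is_pinned("numpy==1.26.4"), is_pinned("numpy>=1.26"))  # True False
```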

Sourced and unsourced inputs

For a run to be recreatable, every input has to be recoverable on another machine. roar divides inputs into two kinds:

  • Sourced — the input has a tracked origin. Either a previous job in the lineage produced it, or it's covered by the recorded git commit. Anyone who recreates the lineage gets it.
  • Unsourced — nothing tracked produced it and it isn't covered by git. It pre-existed the run as a loose file on your machine, so it won't exist anywhere else.

An unsourced input is the most common reason a lineage that looks complete can't actually be recreated. Classic cases are a file in /tmp, a manually downloaded dataset sitting outside the repo, and a script you ran directly (roar run ./script.sh) but never committed: the program itself is an input, and a loose one is unsourced.
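The sourced/unsourced rule is a simple set test. roar tracks this itself; the hypothetical Python sketch below only mirrors the logic and assumes you already have three collections of project-relative paths. An input is sourced if a lineage job produced it or the recorded commit tracks it. Otherwise it is unsourced.

```python
from pathlib import PurePosixPath

def classify_inputs(inputs, produced_by_lineage, tracked_in_git):
    """Split a run's input paths into (sourced, unsourced).

    inputs              -- paths the run read
    produced_by_lineage -- paths written by earlier jobs in the lineage
    tracked_in_git      -- paths covered by the recorded commit
    """
    sourced, unsourced = [], []
    for path in inputs:
        p = str(PurePosixPath(path))  # normalize e.g. "./train.py" to "train.py"
        if p in produced_by_lineage or p in tracked_in_git:
            sourced.append(p)
        else:
            unsourced.append(p)  # a loose file: it won't exist on another machine
    return sourced, unsourced

src, loose = classify_inputs(
    ["train.py", "data/clean.csv", "/tmp/extra.csv", "run.sh"],
    produced_by_lineage={"data/clean.csv"},
    tracked_in_git={"train.py"},
)
print(loose)  # ['/tmp/extra.csv', 'run.sh']
```

The uncommitted run.sh lands in the unsourced bucket alongside the /tmp file, just as described above.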

roar surfaces these as you go:

roar inputs --unsourced <ref>    # list a run's unsourced inputs

and warns at run time when a job reads one. To fix an unsourced input, give it a tracked origin:

  • Code and configs — commit them to the git repository.
  • External data — ingest it through roar so the retrieval is recorded: roar get <url> or roar run wget <url>. Now a Get job is the input's tracked producer.

A /tmp file is doubly fragile — even setting aside sourcing, it's ephemeral and likely gone by the next run. Intermediate outputs in particular should live in your project, not /tmp.

Artifact hygiene

A few habits keep lineage reproducible:

  • Commit your code before the run. A dirty working tree means the commit alone no longer describes what executed — see the FAQ on why a dirty tree counts as not reproducible.
  • Keep inputs inside the project or ingest them. Anything read from outside the repo and not pulled in with roar get is unsourced.
  • Don't park intermediates in /tmp. Write them where they'll survive and be tracked.
  • Write outputs into tracked directories. Git recreates a directory on a clean checkout only if it tracks something inside it. An output written to an untracked directory (or to a path outside the repo) may not exist when the run is reproduced elsewhere. roar status --untracked-dirs lists the offenders; a committed .gitkeep (or any tracked file) in the directory is enough.
  • Prefer project-relative paths over absolute, machine-local ones. A path like /home/you/scratch/data.csv doesn't travel; data/input.csv under the repo does.
  • Pin what floats. Reproduction reinstalls the recorded packages; the tighter your declared versions, the closer the recreation.
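The tracked-directory rule follows from git tracking files, not directories. A directory appears on a clean checkout only as the ancestor of some tracked file. The minimal Python sketch below assumes `tracked_files` is a list of repository-relative paths like those `git ls-files` prints. The function names are hypothetical, and this is not how roar status is implemented.

```python
from pathlib import PurePosixPath

def recreated_dirs(tracked_files):
    """Directories a clean checkout will contain: every ancestor of a tracked file."""
    dirs = set()
    for f in tracked_files:
        dirs.update(str(parent) for parent in PurePosixPath(f).parents)
    return dirs  # includes "." for the repository root

def check_output(path, tracked_files):
    """Return a hygiene problem for an output path, or None if it should travel."""
    p = PurePosixPath(path)
    if p.is_absolute():
        return "absolute path: won't exist on another machine"
    if ".." in p.parts:
        return "escapes the project directory"
    if str(p.parent) not in recreated_dirs(tracked_files):
        return f"parent directory {p.parent} is not tracked; add a .gitkeep"
    return None

tracked = ["train.py", "data/.gitkeep"]
for out in ["data/model.pt", "results/metrics.json", "/home/you/scratch/out.csv"]:
    print(out, "->", check_output(out, tracked) or "ok")
```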

The reproducibility checklist

roar register and roar put print this checklist as a publish receipt, evaluated over the artifact's transitive lineage, so you can see at a glance whether a result will come back:

Reproducibility — 6/7
  [✅] code committed to git
  [✅] single git commit across all steps
  [❌] commit reachable on a remote
  [✅] all inputs sourced
  [✅] all artifact paths in tracked directories
  [✅] runtime captured (interpreter + packages)
  [✅] lineage saved on glaas.ai

Each item is a precondition for recreating the run elsewhere: the code is committed, the steps share one commit, that commit is fetchable from a remote, every input is sourced, every artifact lands in a directory a clean checkout will recreate, the interpreter and packages were captured, and the record is published. An unchecked item isn't a failure — roar never blocks on it — but it's a concrete thing to fix if you want the lineage to reproduce on another machine.
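"Evaluated over the transitive lineage" means an item passes only if it holds for every step that fed the artifact. The hypothetical Python sketch below shows that aggregation for four of the seven items. It leaves out the remote, tracked-directory, and publish checks, which need git and glaas.ai. The `Step` fields are assumptions, not roar's record format. Treating a dirty tree as "not committed" follows the FAQ note in Artifact hygiene.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Step:
    """One job in an artifact's transitive lineage (hypothetical shape)."""
    commit: Optional[str]                 # recorded git commit; None if outside a repo
    dirty: bool = False                   # uncommitted changes at run time
    unsourced_inputs: List[str] = field(default_factory=list)
    runtime_captured: bool = True         # interpreter + packages recorded

def checklist(steps: List[Step]) -> List[Tuple[str, bool]]:
    """An item passes only if it holds for every step that fed the artifact."""
    commits = {s.commit for s in steps}
    return [
        ("code committed to git",
         all(s.commit is not None and not s.dirty for s in steps)),
        ("single git commit across all steps",
         len(commits) == 1 and None not in commits),
        ("all inputs sourced",
         all(not s.unsourced_inputs for s in steps)),
        ("runtime captured (interpreter + packages)",
         all(s.runtime_captured for s in steps)),
    ]

lineage = [
    Step(commit="a1b2c3"),
    Step(commit="a1b2c3", unsourced_inputs=["/tmp/extra.csv"]),
]
items = checklist(lineage)
print(f"Reproducibility: {sum(ok for _, ok in items)}/{len(items)}")  # Reproducibility: 3/4
for name, ok in items:
    print("[x]" if ok else "[ ]", name)
```

One unsourced input in one upstream step is enough to uncheck "all inputs sourced" for the final artifact.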

roar reproduce prints the same checklist minus the publish-receipt items (lineage saved on glaas.ai and all artifact paths in tracked directories). By the time you're reproducing, you already hold the lineage and are recreating its outputs, so only the checks that bear on whether the run can execute apply.

Reproducing an artifact

Once you have an artifact hash (from roar register, roar show, or a glaas.ai artifact page), you can rebuild it:

roar reproduce <artifact-hash>          # preview the recipe — no changes
roar reproduce <artifact-hash> --run    # actually reproduce it

The default is a preview: the target, its git information, the build and run steps, and the packages that would be installed. Add --run to execute. roar then resolves the code from the best available source: your current checkout if you're already in the matching repository at the recorded commit, otherwise a clone of the recorded remote. It creates a virtual environment pinned to the recorded interpreter, installs the recorded packages, and runs the steps. The replay happens in a fresh session, so it never pollutes your active lineage.
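The code-source rule above amounts to a small decision. The following hypothetical Python sketch is not roar's code. It assumes "matching repository" means the current remote URL equals the recorded one. It compares URLs as plain strings, so SSH and HTTPS forms of one repository would count as different here.

```python
from typing import Optional

def choose_code_source(recorded_remote: str, recorded_commit: str,
                       current_remote: Optional[str],
                       current_head: Optional[str]) -> str:
    """Reuse the current checkout if it matches the record, else clone."""
    in_matching_repo = current_remote == recorded_remote  # plain string comparison
    at_recorded_commit = current_head == recorded_commit
    if in_matching_repo and at_recorded_commit:
        return "reuse current checkout"
    return f"clone {recorded_remote} at {recorded_commit}"

# Run from outside any repository: no current remote or HEAD, so clone.
print(choose_code_source("https://example.com/team/proj.git", "a1b2c3", None, None))
# clone https://example.com/team/proj.git at a1b2c3
```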

To reproduce a whole recorded session rather than a single artifact, pass its 64-character lineage hash with --lineage:

roar reproduce <lineage-hash> --lineage --run

For the full set of flags — package-version fallbacks, system-package install, requirement listing — see the roar Guide.


For a hands-on reproduction from a hash in a fresh directory, see the End-to-End Example.