Use Cases
Common workflows, framed as the question a senior engineer actually asks before opening the docs. Each one names the trigger, then the steps.
"I just trained this model. What did it learn from?"
You finished a run and want to confirm — locally, before sharing anything — which dataset and which code state the model actually saw.
roar show model.pkl # or `roar show <hash>`
That gives you the full producer chain: the command that ran, the input artifacts (with their hashes), the git commit, and the environment. For the DAG view:
roar dag # the active session
roar show @<step> # one specific step
Purely local — no GLaaS, no register. Useful for end-of-day "wait, what was that?" checks.
"Something behaved differently this run. What changed?"
You have two runs that should have produced similar results but didn't. The diff is the answer.
- Register both runs to GLaaS (`roar register <hash>` on each) if they aren't already.
- Open the first run's DAG on glaas.ai.
- Click Diff and pick the second run.
- The comparison highlights the root-cause difference and breaks down divergence by code, data, params, environment, pipeline.
The Diff view is the "fast explanation of what's different" path. Especially useful when both runs look fine in isolation but the metric moved.
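To make the five categories concrete, here is a minimal Python sketch of grouping the differences between two run records by category. The record layout and the name `categorize_divergence` are hypothetical, chosen for illustration; this is not the GLaaS data model or how its Diff view is implemented.

```python
# Hypothetical run records: one dict per category named in the Diff view.
CATEGORIES = ("code", "data", "params", "environment", "pipeline")


def categorize_divergence(run_a: dict, run_b: dict) -> dict:
    """Return {category: {key: (value_in_a, value_in_b)}} for every differing key."""
    divergence = {}
    for category in CATEGORIES:
        a = run_a.get(category, {})
        b = run_b.get(category, {})
        changed = {
            key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)
        }
        if changed:
            divergence[category] = changed
    return divergence


# Two made-up runs that differ only in the hash of one input file.
run_a = {
    "code": {"git_commit": "a1b2c3"},
    "data": {"train.csv": "hash-1111"},
    "params": {"lr": 0.01, "epochs": 10},
    "environment": {"python": "3.11"},
    "pipeline": {"steps": ["prep", "train"]},
}
run_b = {
    "code": {"git_commit": "a1b2c3"},
    "data": {"train.csv": "hash-2222"},
    "params": {"lr": 0.01, "epochs": 10},
    "environment": {"python": "3.11"},
    "pipeline": {"steps": ["prep", "train"]},
}

result = categorize_divergence(run_a, run_b)
print(result)
```

In this made-up example only the `data` category is reported, which is the "both runs look fine in isolation but the metric moved" situation: same code, same params, different input bytes.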
"Prove which dataset this model was trained on, six months later"
The audit case. You need to point at a model artifact and produce a hash-grounded answer about the data behind it — not "I think it was the v3 export."
- Find the model artifact hash (from your records, the registered DAG link, or `roar show model.pkl`).
- On glaas.ai, paste the hash. Or run `roar reproduce <hash> --plan` for the recipe DAG.
- Walk the lineage upstream from the model job → its inputs → their producer jobs → the original dataset registration. Each edge is hash-addressed; there's no "the file used to be called X."
The same query works locally as long as you still have the `.roar/` directory:
roar show <model-hash> # walks back as far as the local DB has
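The upstream walk itself is a plain graph traversal. Here is a minimal Python sketch over an in-memory mapping; the `PRODUCERS` table, its hash strings, and `walk_upstream` are hypothetical stand-ins, not roar's local DB format or API.

```python
from collections import deque

# Hypothetical lineage records: artifact hash -> the job that produced it.
# A hash with no producer entry is treated as an original registration.
PRODUCERS = {
    "model-hash": {"command": "python train.py", "inputs": ["features-hash", "config-hash"]},
    "features-hash": {"command": "python prep.py", "inputs": ["dataset-hash"]},
}


def walk_upstream(artifact_hash, producers):
    """Breadth-first walk upstream; return (hashes in visit order, root hashes)."""
    visited, roots = [], []
    seen = {artifact_hash}
    queue = deque([artifact_hash])
    while queue:
        current = queue.popleft()
        visited.append(current)
        job = producers.get(current)
        if job is None:
            # Nothing recorded upstream of this hash: the walk stops here.
            roots.append(current)
            continue
        for parent in job["inputs"]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return visited, roots


visited, roots = walk_upstream("model-hash", PRODUCERS)
print(visited)
print(roots)
```

Because every edge is keyed by hash rather than by filename, renaming or moving a file does not change the answer. The walk ends wherever the available records end, which mirrors the "as far as the local DB has" caveat.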
"I need to recreate this artifact"
Someone (a teammate, a paper, an older version of you) published an artifact. You want the same bytes, locally.
- Get the artifact hash from glaas.ai, a Slack message, an email, or the `roar register` output that produced it.
- Run: `roar reproduce <artifact-hash>`
- `roar` walks the lineage backward, prints the planned DAG, and offers to run the steps.
- Execute end-to-end or step-by-step; the result is byte-identical when the inputs are still reachable.
This is the inverse of traditional pipelines: you start from the outcome, not the recipe.
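"Byte-identical" is easy to check for yourself: hash the reproduced file and compare it with the digest of the original. The sketch below assumes SHA-256 purely for illustration, since this page does not say which hash function roar uses; `file_digest` and `is_byte_identical` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large artifacts fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_byte_identical(path, expected_digest):
    """True when the file's content hashes to the expected digest."""
    return file_digest(path) == expected_digest


# Demo with two throwaway files standing in for the original and the reproduction.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    reproduced = os.path.join(tmp, "reproduced.bin")
    for path in (original, reproduced):
        with open(path, "wb") as f:
            f.write(b"model weights")

    expected = file_digest(original)
    same = is_byte_identical(reproduced, expected)

    # A single extra byte is enough to change the digest.
    with open(reproduced, "ab") as f:
        f.write(b"!")
    changed = is_byte_identical(reproduced, expected)

print(same, changed)
```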
"I want to pick up a teammate's checkpoint and keep training"
Reproduce gives you the path to the checkpoint; from there it's just `roar run` on your continuation script.
- `roar reproduce <checkpoint-hash>` gets you the checkpoint and its lineage locally.
- Your continuation script reads the checkpoint as input: `roar run python continue_training.py --from-checkpoint ./checkpoint.pt --output continued.pt`
- The resulting `continued.pt` artifact's lineage walks all the way back to your teammate's checkpoint, and from there to their lineage, so the audit trail stays whole.
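For reference, here is a minimal sketch of what a `continue_training.py` with those two flags could look like. Everything inside it is an assumption for illustration: the checkpoint is a small JSON file standing in for a real framework checkpoint, and `train_more` is a placeholder for actual training. Nothing in it is roar-specific; the script only reads one file and writes another.

```python
import argparse
import json
import os
import tempfile


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Continue training from a checkpoint.")
    parser.add_argument("--from-checkpoint", required=True, help="path to the input checkpoint")
    parser.add_argument("--output", required=True, help="path for the continued checkpoint")
    return parser.parse_args(argv)


def train_more(state, extra_steps=10):
    """Placeholder for real training: only advances a step counter."""
    return {**state, "step": state["step"] + extra_steps}


def main(argv=None):
    args = parse_args(argv)
    # Read the checkpoint as an input file (JSON stand-in for a real format).
    with open(args.from_checkpoint) as f:
        state = json.load(f)
    # Write the continued state to the requested output path.
    with open(args.output, "w") as f:
        json.dump(train_more(state), f)


# Demo on throwaway files; a real script would instead call main()
# under an `if __name__ == "__main__":` guard.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "checkpoint.pt")
    dst = os.path.join(tmp, "continued.pt")
    with open(src, "w") as f:
        json.dump({"step": 100}, f)
    main(["--from-checkpoint", src, "--output", dst])
    with open(dst) as f:
        continued = json.load(f)

print(continued)
```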
"Share my work, but keep the data internal"
You have a result worth showing externally, but the training data is private. You want the artifact discoverable, the dataset not.
GLaaS handles this by design: the artifact's existence is globally visible by hash (that's the point of a content-addressable registry), while the attached records (jobs, metadata, labels) follow the workspace's scope.
roar login # one-time
roar scope use <owner>/<project> # or `private`
roar register model.pkl
Anyone with the hash sees the model is registered. Reading what it was trained on requires being inside your scope. See Scopes for the full visibility matrix.
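The split between "exists" and "what's attached" can be pictured with a toy model. This Python sketch is a deliberately simplified assumption: the `Registry` class, its methods, and the single-scope equality check are hypothetical, and the real visibility rules are whatever the Scopes page says.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy content-addressed registry: existence is public, records are scoped."""

    # artifact hash -> (owning scope, attached records)
    _entries: dict = field(default_factory=dict)

    def register(self, artifact_hash, scope, records):
        self._entries[artifact_hash] = (scope, records)

    def is_registered(self, artifact_hash):
        # Anyone holding the hash can ask this, regardless of scope.
        return artifact_hash in self._entries

    def read_records(self, artifact_hash, viewer_scope):
        scope, records = self._entries[artifact_hash]
        if viewer_scope != scope:
            raise PermissionError("records are not visible outside the owning scope")
        return records


registry = Registry()
registry.register("model-hash", "acme/vision", {"inputs": ["dataset-hash"]})

print(registry.is_registered("model-hash"))
print(registry.read_records("model-hash", viewer_scope="acme/vision"))
```

An outsider calling `read_records` with a different scope gets a `PermissionError` in this toy model, while `is_registered` still answers `True`.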
"How is this lineage different from that one?"
Same DAG-Diff workflow as the "What changed?" case above, but applied across teams or experiments rather than across two of your own runs. Open the first DAG on glaas.ai, click Diff, pick the second; the comparison highlights the root cause and categorizes divergence (code / data / params / environment / pipeline).