Why roar + GLaaS?
Lineage without orchestration
Modern ML teams move fast — often directly from the command line. That speed is powerful, but it comes with a cost: work becomes hard to trace, results are difficult to reproduce, and context is hard to share with others.
roar and GLaaS exist to help, without asking you to slow down, change how you work, or write more YAML. They do this by making context automatic.
The promise is simple:
- Install roar
- Put roar run in front of a command
- Keep working as you normally do
Everything else is derived from what actually happened. See Quick Start for the install and first-run walkthrough.
What roar is
roar is a CLI that adds implicit observation to your existing workflows. You run the same commands you already run — training scripts, preprocessing jobs, shell scripts — but prefixed with roar run. From that point on, roar quietly records:
- The command and its parameters
- Which files were read and written
- Which artifacts were created
- The git repository, branch, and commit (and whether the working tree was dirty)
- Package dependencies and environment variables
- Execution time, exit status, and errors
You don't declare inputs. You don't define pipelines. You don't write config files.
roar infers inputs, outputs, and the entire dependency graph (DAG) from the runs it observes.
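To make the inference idea concrete, here is a minimal Python sketch of how dependency edges could be derived from nothing more than observed reads and writes. It is an illustration of the principle, not roar's implementation: the `ObservedRun` record, the `infer_edges` helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObservedRun:
    """One observed command. Hypothetical record, not roar's actual schema."""
    command: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def infer_edges(runs: list[ObservedRun]) -> list[tuple[str, str, str]]:
    """Return (producer, consumer, artifact) edges.

    A run depends on an earlier run if it reads a file that run wrote.
    Runs are assumed to be listed in execution order.
    """
    last_writer: dict[str, str] = {}
    edges: list[tuple[str, str, str]] = []
    for run in runs:
        # Sort reads so the edge order is deterministic.
        for path in sorted(run.reads):
            producer = last_writer.get(path)
            if producer is not None and producer != run.command:
                edges.append((producer, run.command, path))
        for path in run.writes:
            last_writer[path] = run.command
    return edges


# Hypothetical observations from three commands run in sequence.
runs = [
    ObservedRun("python preprocess.py", reads={"raw.csv"}, writes={"clean.parquet"}),
    ObservedRun("python train.py", reads={"clean.parquet"}, writes={"model.pt"}),
    ObservedRun(
        "python evaluate.py",
        reads={"model.pt", "clean.parquet"},
        writes={"metrics.json"},
    ),
]

edges = infer_edges(runs)
for producer, consumer, artifact in edges:
    print(f"{producer} -> {consumer} (via {artifact})")
```

Nothing in the sketch was declared up front: the three edges fall out of matching each read against the most recent write of the same file.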
The core idea: two "what ifs"
What if observation were implicit?
What if, when you ran a command, you could automatically capture what it read, what it wrote, and use that to infer how multiple commands depended on each other — without declaring any of it?
That's what roar run does. It observes data as it flows through a pipeline. No declarations. No orchestration.
What if artifacts were dereferenceable?
What if you had a great big lookup table from content hashes to their lineage? Then you could answer:
- Where did this come from? (Even if the chain of custody is broken.)
- What data did it depend on?
- Which code and parameters produced it?
You'd be able to trust models even when the original pipeline, logs, or orchestration context are gone, because the model's content hash acts as a dereferenceable link back to how it was produced.
This becomes possible when artifacts are tracked by content hashes, not filenames. That's where GLaaS comes in.
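A short Python sketch shows why hashing content rather than names matters. It assumes SHA-256 purely for illustration; this page does not say which hash function roar uses, and `content_hash` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file's bytes. SHA-256 is an assumption for this sketch."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "model.pt"
    renamed = Path(tmp) / "final_v2_really_final.bin"
    original.write_bytes(b"weights")
    renamed.write_bytes(b"weights")

    # Same bytes under a different name: the identity is unchanged.
    same_bytes = content_hash(original) == content_hash(renamed)

    # One changed byte: the identity changes, whatever the file is called.
    renamed.write_bytes(b"weights!")
    changed_bytes = content_hash(original) == content_hash(renamed)

print(same_bytes, changed_bytes)
```

The filename never enters the digest, so a copy that has been moved or renamed still resolves to the same identity, while any change to the bytes produces a different one.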
What GLaaS is
GLaaS (Global Lineage-as-a-Service) is a content-addressable lineage registry.
When you run roar register, GLaaS records:
- The artifact's hash
- The job(s) that produced it
- The inputs it depended on
- The surrounding context (commit, environment, packages)
GLaaS never stores your artifacts. It only stores how they were created.
This means:
- No storage lock-in
- No copying your data
- No replacing your existing object stores
GLaaS gives you the ability to look up provenance globally by hash — like a time machine for models and data. You don't need a chain-of-custody guarantee on a binary to know where it came from. Put it anywhere, name it anything; as long as the bytes don't change, the lineage is reachable.
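The registry idea can be pictured as a lookup table from hash to lineage metadata. The Python sketch below is an in-memory toy under stated assumptions: `LineageRecord`, `LineageRegistry`, their fields, and the short placeholder hashes are hypothetical and do not reflect the GLaaS API or schema. Note that it stores only metadata, never artifact bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageRecord:
    """Metadata only. Hypothetical fields loosely mirroring the list above."""
    artifact_hash: str
    job: str
    input_hashes: tuple[str, ...]
    commit: str


class LineageRegistry:
    """Toy in-memory stand-in for a content-addressable lineage registry."""

    def __init__(self) -> None:
        self._records: dict[str, LineageRecord] = {}

    def register(self, record: LineageRecord) -> None:
        self._records[record.artifact_hash] = record

    def lookup(self, artifact_hash: str) -> Optional[LineageRecord]:
        return self._records.get(artifact_hash)

    def ancestors(self, artifact_hash: str) -> list[str]:
        """Walk upstream and return every known ancestor hash once."""
        seen: list[str] = []
        stack = [artifact_hash]
        while stack:
            record = self._records.get(stack.pop())
            if record is None:
                continue  # unregistered hash, e.g. raw source data
            for parent in record.input_hashes:
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen


# Placeholder hashes ("a1", "b2", "c3") stand in for real content digests.
registry = LineageRegistry()
registry.register(LineageRecord("b2", "python preprocess.py", ("a1",), "abc123"))
registry.register(LineageRecord("c3", "python train.py", ("b2",), "abc123"))

model_record = registry.lookup("c3")
upstream = registry.ancestors("c3")
print(model_record.job, upstream)
```

Because the key is the hash, the lookup works no matter where the model file currently lives or what it is named.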
The CLI is great for day-to-day work; the GLaaS website is great for navigation and visualization — paste a hash (artifact, job, or session) into the search bar at glaas.ai and click through artifact → job → DAG → other artifacts. See the roar Guide for the registration walkthrough.
Safety without friction
roar is designed to support self-governance, not enforcement. It records git state and warns about uncommitted changes, with optional hygiene rules (e.g., committed-code-only) you can opt into. The goal: make the happy path the compliant path — without slowing builders down.
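As a rough sketch of what a warn-by-default, opt-in-to-strict hygiene check might look like, the Python below classifies the output of `git status --porcelain`, which is empty when the working tree is clean. The function names, the `committed_code_only` flag, and the "blocked" outcome are assumptions made for illustration; this page does not specify how roar applies its rules.

```python
import subprocess


def git_porcelain(repo_dir: str = ".") -> str:
    """Return the output of `git status --porcelain` (empty means clean)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_dirty(porcelain_output: str) -> bool:
    """Any non-blank status line counts as dirty in this sketch.

    This includes untracked files ("??"); whether roar treats those as
    dirty is not stated here.
    """
    return any(line.strip() for line in porcelain_output.splitlines())


def check_hygiene(porcelain_output: str, committed_code_only: bool = False) -> str:
    """Warn by default; refuse only when the stricter rule is opted into."""
    if not is_dirty(porcelain_output):
        return "ok"
    return "blocked" if committed_code_only else "warning"


# Sample porcelain strings, so the example runs without a git repository.
clean_result = check_hygiene("")
default_result = check_hygiene(" M train.py\n")
strict_result = check_hygiene(" M train.py\n", committed_code_only=True)
print(clean_result, default_result, strict_result)
```

In a real repository, `check_hygiene(git_porcelain())` would apply the same logic to the live working tree.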
What roar and GLaaS are not
- Not an experiment tracker
- Not a workflow orchestrator
- Not a model registry that stores your artifacts
- Not a replacement for git, cloud storage, or training frameworks
- Not a system that requires declarations to be useful
They are tools for carrying context forward, built on implicit observation, not ceremony.
Why this matters
Together, roar and GLaaS build good practices into your AI development workflow:
- Traceability — follow models and data across their full lifecycle
- Reproducibility — recreate results later without guesswork
- Attributability — understand what inputs and decisions drove outcomes
- Collaboration — reason together using shared context instead of private state
Or more simply: they make audits trivial, recovery fast, and coordination cheap.
Where to go next
- Quick Start — install and run your first commands
- Core Concepts — jobs, artifacts, sessions, DAGs
- Common Use Cases — workflows for real problems
- FAQ — implementation details and edge cases