Why roar + GLaaS?

roar

CLI tool for Run Observation & Artifact Registration.

GLaaS

Global Lineage-as-a-Service—a content-addressable registry for the DAGs and artifacts roar observes.

Lineage without orchestration

Modern ML teams move fast — often directly from the command line. That speed is powerful, but it comes with a cost: work becomes hard to trace, results are difficult to reproduce, and context is hard to share with others.

roar and GLaaS exist to help, without asking you to slow down, change how you work, or write more YAML. They do this by capturing context automatically.

The promise is simple:

Install roar
Put roar run in front of a command
Keep working as you normally do

Everything else is derived from what actually happened. See Quick Start for the install and first-run walkthrough.

What roar is

roar is a CLI that adds implicit observation to your existing workflows. You run the same commands you already run — training scripts, preprocessing jobs, shell scripts — but prefixed with roar run. From that point on, roar quietly records:

The command and its parameters
Which files were read and written
Which artifacts were created
The git repository, branch, and commit (and whether the working tree was dirty)
Package dependencies and environment variables
Execution time, exit status, and errors

You don't declare inputs. You don't define pipelines. You don't write config files.

roar infers inputs, outputs, and the entire DAG by observing runs.
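As a rough illustration of how a DAG can fall out of observation alone, here is a minimal Python sketch. It assumes each observed run has been reduced to a hypothetical `Run` record listing the content hashes it read and wrote. The record shape, the `infer_edges` helper, and the placeholder hash strings are illustrative only; they are not roar's internal data model. An edge is drawn whenever one run reads an artifact that another run wrote.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical record of one observed `roar run` invocation."""
    command: str
    reads: set[str] = field(default_factory=set)   # content hashes of files read
    writes: set[str] = field(default_factory=set)  # content hashes of files written


def infer_edges(runs: list[Run]) -> set[tuple[str, str]]:
    """Return (producer command, consumer command) pairs.

    A consumer depends on a producer when it read an artifact the producer
    wrote. No pipeline definition is needed: the edges come from I/O alone.
    """
    # Map each written artifact hash to the run that produced it.
    producer_of: dict[str, str] = {}
    for run in runs:
        for digest in run.writes:
            producer_of[digest] = run.command

    # Every read of a produced artifact becomes a dependency edge.
    edges = set()
    for run in runs:
        for digest in run.reads:
            producer = producer_of.get(digest)
            if producer is not None and producer != run.command:
                edges.add((producer, run.command))
    return edges


# Placeholder strings stand in for real content hashes.
runs = [
    Run("python prep.py", reads={"h_raw"}, writes={"h_clean"}),
    Run("python train.py", reads={"h_clean"}, writes={"h_model"}),
    Run("python eval.py", reads={"h_model", "h_clean"}, writes={"h_report"}),
]
for edge in sorted(infer_edges(runs)):
    print(edge)
```

Keying runs by their command string keeps the sketch short; a real system would need a unique job ID, since the same command can run many times.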

The core idea: two "what ifs"

What if observation were implicit?

What if, when you ran a command, you could automatically capture what it read and what it wrote, then use that to infer how multiple commands depended on each other, without declaring any of it?

That's what roar run does. It observes data as it flows through a pipeline. No declarations. No orchestration.

What if artifacts were dereferenceable?

What if you had a great big lookup table from content hashes to their lineage? Then you could answer:

Where did this come from? (Even if the chain of custody is broken.)
What data did it depend on?
Which code and parameters produced it?

You'd be able to trust models even when the original pipeline, logs, or orchestration context are gone — because the model itself carries a dereferenceable link back to how it was produced.

This becomes possible when artifacts are tracked by content hashes, not filenames. That's where GLaaS comes in.
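The difference between tracking by filename and tracking by content hash can be shown in a few lines of Python. This sketch uses SHA-256 from the standard library purely for illustration; this page does not say which hash function roar uses. Renaming a file leaves its hash unchanged, while changing even one byte produces a different hash.

```python
import hashlib
import os
import tempfile


def content_hash(path: str) -> str:
    """SHA-256 of a file's bytes, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "model.pt")
    with open(original, "wb") as f:
        f.write(b"fake model weights")
    before = content_hash(original)

    # Rename the file: the name changes, the bytes don't.
    renamed = os.path.join(tmp, "final_v2_REALLY_final.pt")
    os.rename(original, renamed)
    after_rename = content_hash(renamed)

    # Append a single byte: the content, and therefore the hash, changes.
    with open(renamed, "ab") as f:
        f.write(b"!")
    after_edit = content_hash(renamed)

print(before == after_rename)  # True: identity follows content, not name
print(before == after_edit)    # False: different bytes, different artifact
```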

What GLaaS is

GLaaS (Global Lineage-as-a-Service) is a content-addressable lineage registry.

When you run roar register, GLaaS records:

The artifact's hash
The job(s) that produced it
The inputs it depended on
The surrounding context (commit, environment, packages)

GLaaS never stores your artifacts. It only stores how they were created.

This means:

No storage lock-in
No copying your data
No replacing your existing object stores

GLaaS lets you look up provenance globally by hash, like a time machine for models and data. You don't need a chain-of-custody guarantee on a binary to know where it came from. Put it anywhere, name it anything; as long as the bytes don't change, the lineage is reachable.
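To make "dereferenceable" concrete, here is a toy, in-memory stand-in for the lookup table, written in Python. It assumes each registered artifact hash maps to a hypothetical `Lineage` record holding some of the fields GLaaS is described as recording (producing job, input hashes, commit). The dict, the field names, and the `trace` function are illustrative only; they are not the GLaaS API. Walking the table backward answers "where did this come from?" starting from nothing but a hash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """Hypothetical registry entry: how an artifact was made, never its bytes."""
    job: str                 # command that produced the artifact
    inputs: tuple[str, ...]  # content hashes the job read
    commit: str              # git commit the job ran at


# Toy, in-memory stand-in for a global registry keyed by content hash.
# Placeholder strings stand in for real digests.
REGISTRY: dict[str, Lineage] = {
    "h_model": Lineage("python train.py --lr 3e-4", ("h_clean",), "a1b2c3d"),
    "h_clean": Lineage("python prep.py", ("h_raw",), "a1b2c3d"),
    # "h_raw" has no entry: no observed job produced it.
}


def trace(digest: str, depth: int = 0, path: frozenset = frozenset()) -> list[str]:
    """Walk lineage backward from an artifact hash, one indented line per hop."""
    indent = "  " * depth
    if digest in path:  # guard against a malformed, cyclic registry
        return [f"{indent}{digest}: cycle detected, stopping"]
    entry = REGISTRY.get(digest)
    if entry is None:
        return [f"{indent}{digest}: no recorded producer (raw input or unregistered)"]
    lines = [f"{indent}{digest}: made by `{entry.job}` at commit {entry.commit}"]
    for parent in entry.inputs:
        lines.extend(trace(parent, depth + 1, path | {digest}))
    return lines


print("\n".join(trace("h_model")))
```

Nothing in the walk touches the artifacts themselves; only hashes and records are consulted, which mirrors the point that the registry stores how artifacts were created rather than the artifacts.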

The CLI is great for day-to-day work; the GLaaS website is great for navigation and visualization. Paste a hash (artifact, job, or session) into the search bar at glaas.ai, then click through artifact → job → DAG → other artifacts. See the roar Guide for the registration walkthrough.

Safety without friction

roar is designed to support self-governance, not enforcement. It records git state, warns about uncommitted changes, and offers opt-in hygiene rules (e.g., committed-code-only). The goal is to make the happy path the compliant path without slowing builders down.
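A committed-code-only rule amounts to checking whether the working tree is clean before a run. The Python sketch below shows one way such a check could work, parsing the output of the real `git status --porcelain` command, where empty output means a clean tree. The function names and the warn-versus-enforce switch are hypothetical; this is not roar's implementation or configuration, and counting untracked files as dirty is an assumption of the sketch.

```python
import subprocess


def dirty_paths(porcelain: str) -> list[str]:
    """Paths listed by `git status --porcelain` (v1 format: 'XY path')."""
    # The first two characters are status codes and the third is a space,
    # so the path starts at index 3. Untracked files ('??') count as dirty here.
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def committed_code_only(porcelain: str, enforce: bool = False) -> bool:
    """Hypothetical opt-in hygiene rule: warn (or refuse) when the tree is dirty."""
    paths = dirty_paths(porcelain)
    if not paths:
        return True
    message = "uncommitted changes: " + ", ".join(paths)
    if enforce:
        raise RuntimeError(message)
    print("warning:", message)
    return False


def current_status() -> str:
    """Porcelain status of the git repository containing the working directory."""
    return subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout


# Inside a repository you would call committed_code_only(current_status());
# a hard-coded sample keeps this example runnable anywhere.
sample = " M train.py\n?? scratch.ipynb\n"
committed_code_only(sample)  # prints a warning and returns False
```

Defaulting to a warning rather than an error matches the self-governance framing: the check informs, and only refuses when a team opts into enforcement.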

What roar and GLaaS are not

Not an experiment tracker
Not a workflow orchestrator
Not a model registry that stores your artifacts
Not a replacement for git, cloud storage, or training frameworks
Not a system that requires declarations to be useful

They are tools for carrying context forward, built on implicit observation, not ceremony.

Why this matters

Together, roar and GLaaS build good practices into your AI development workflow:

Traceability — follow models and data across their full lifecycle
Reproducibility — recreate results later without guesswork
Attributability — understand what inputs and decisions drove outcomes
Collaboration — reason together using shared context instead of private state

Or more simply: they make audits trivial, recovery fast, and coordination cheap.

Where to go next

Quick Start — install and run your first commands
Core Concepts — jobs, artifacts, sessions, DAGs
Common Use Cases — workflows for real problems
Compared to Logs and Trackers — implementation details and edge cases