Start Here

roar
CLI tool for Run Observation & Artifact Registration.
GLaaS
Global Lineage-as-a-Service—a content-addressable registry for the DAGs and artifacts roar observes.

Lineage without Orchestration

Modern ML teams move fast, often directly from the command line. That speed is powerful, but it comes at a cost: work becomes hard to trace, results are difficult to reproduce, and context is hard to share with others.

roar and GLaaS exist to help, without asking you to slow down, change how you work, or write more YAML. They do this by making context automatic.

The key promise is simplicity:

  • Install roar
  • Put roar run in front of a command
  • Keep working as you normally do

That’s it. Everything else is derived from what actually happened.

What roar Is

roar is a CLI that adds implicit observation to your existing workflows.

You run the same commands you already run—training scripts, preprocessing jobs, shell scripts—but you prefix them with:

roar run <your command>

From that point on, roar quietly observes what happens.

Specifically, roar records:

  • The command and parameters you ran
  • Which files were read and written
  • Which artifacts were created
  • The git repository, branch, and commit
  • Whether the repository had uncommitted changes
  • Package dependencies and environment variables
  • Execution time, exit status, and errors

You don’t declare inputs.
You don’t define pipelines.
You don’t write config files to keep track of things.

roar automatically infers inputs, outputs and the entire pipeline directly by observing runs.
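
As a rough illustration of the list above, here is a minimal sketch of what one recorded run could look like as a data structure. The `JobRecord` class, its field names, and the "read but not written" rule for inferring inputs are all assumptions made for this example; roar's actual schema and inference rules are not described on this page.

```python
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    """Hypothetical shape of one observed run; not roar's real schema."""
    command: list[str]        # the command and its parameters
    files_read: set[str]      # paths observed being read
    files_written: set[str]   # paths observed being written
    git_branch: str
    git_commit: str
    git_dirty: bool           # True if the repo had uncommitted changes
    exit_status: int
    duration_seconds: float
    packages: dict[str, str] = field(default_factory=dict)  # name -> version

    def inferred_inputs(self) -> set[str]:
        # Assumption: an input is a file that was read but not written by this run.
        return self.files_read - self.files_written

    def inferred_outputs(self) -> set[str]:
        return set(self.files_written)


job = JobRecord(
    command=["python", "train.py", "--data", "features.parquet", "--output", "model.pt"],
    files_read={"train.py", "features.parquet"},
    files_written={"model.pt"},
    git_branch="main",
    git_commit="0000000",  # placeholder, not a real commit
    git_dirty=False,
    exit_status=0,
    duration_seconds=12.5,
)

print(sorted(job.inferred_inputs()))
print(sorted(job.inferred_outputs()))
```

Nothing in the record is declared by the user; every field is something an observer could note while the command runs.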

The Core Idea: Two “What Ifs”

roar is built around two simple questions.

What if observation were implicit?

What if when you ran a command you could automatically capture:

  • what it read
  • what it wrote
  • and use that to infer how multiple commands depend on each other

...without you having to declare any of it?

That’s what roar run does. It observes data as it flows through a pipeline. No declarations. No orchestration.

What if artifacts were dereferenceable?

What if you had a great big lookup table from content hashes to their lineage? Could you then answer...

  • Where did this come from? (Even if the chain of control over a model is broken?)
  • What data did it depend on?
  • Which code and parameters produced it?

Then you could gain trust in models even when the original pipeline, logs, or orchestration context are gone—because the model itself carries a dereferenceable link back to who built it, how it was produced, and what evidence supports it.

This becomes possible when artifacts are tracked by content hashes, not filenames. That’s where GLaaS comes in.
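
To see why hashing content rather than filenames matters, consider the following sketch. It uses SHA-256 from Python's standard library purely for illustration; which hash function roar and GLaaS actually use is not stated here, and the file names are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash a file's bytes in chunks; the filename never enters the digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "model.pt"
    renamed = Path(tmp) / "final_v2.bin"
    modified = Path(tmp) / "model_copy.pt"

    original.write_bytes(b"pretend these are model weights")
    renamed.write_bytes(original.read_bytes())          # same bytes, new name
    modified.write_bytes(original.read_bytes() + b"!")  # one extra byte

    same_bytes_match = content_hash(original) == content_hash(renamed)
    changed_bytes_match = content_hash(original) == content_hash(modified)

print(same_bytes_match)     # True: renaming does not change the hash
print(changed_bytes_match)  # False: changing the bytes does
```

A renamed or relocated copy keeps the same identity, while any change to the bytes produces a different one.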

What GLaaS Is

GLaaS (Global Lineage-as-a-Service) is a content-addressable lineage registry.

When you choose to register artifacts (for example via roar register), GLaaS records:

  • the artifact’s hash
  • the job(s) that produced it
  • the inputs it depended on
  • and surrounding context (commit, environment, packages)

GLaaS never stores your artifacts.
It only stores how they were created.

This means:

  • No storage lock-in
  • No copying your data
  • No replacing your existing object stores

GLaaS simply gives you the ability to look up provenance globally by hash—like a time machine for models and data. You don't need to guarantee a chain of custody of a binary in order to know where it came from. You can put that binary anywhere you like, call it anything you like, but as long as the bytes don't change, you can still figure out its lineage.
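
The idea of a registry that stores provenance but never the artifact can be sketched with an ordinary dictionary. Everything below is a hypothetical stand-in: the `register` and `lookup` functions, the record fields, and the use of SHA-256 are assumptions for illustration, not the GLaaS API.

```python
import hashlib
from typing import Optional

# Hypothetical in-memory stand-in for a lineage registry: hash -> provenance.
registry: dict[str, dict] = {}


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def register(data: bytes, produced_by: str, inputs: list[str]) -> str:
    """Record how the bytes were made. The bytes themselves are not stored."""
    key = digest(data)
    registry[key] = {"produced_by": produced_by, "inputs": inputs}
    return key


def lookup(data: bytes) -> Optional[dict]:
    """Find provenance from the bytes alone, wherever they live now."""
    return registry.get(digest(data))


features = b"feature table bytes"
model = b"model weight bytes"

features_hash = register(features, "python preprocess.py", inputs=[])
register(model, "python train.py", inputs=[features_hash])

# Later, someone holds a copy of the model under a different name.
copy_found_elsewhere = bytes(model)
record = lookup(copy_found_elsewhere)

print(record["produced_by"])
print(lookup(b"tampered bytes"))  # None: different bytes, no lineage
```

Because the key is derived from the bytes, the holder of a file needs nothing else to ask where it came from.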

Sessions and DAGs

On the command line, you are always working inside a session.

A session represents the sequence of work you do as you run commands with roar. Each roar run adds another recorded job to the current session.

From that session, roar derives a DAG (directed acyclic graph) showing how artifacts flowed through your work as a series of steps.

On the CLI, the session and the DAG are effectively the same thing.
roar dag is how you inspect and manage your current session.

Jobs and Artifacts

  • Each roar run creates a job (a recorded execution).
  • Jobs consume input artifacts and produce output artifacts.
  • Artifacts are tracked by content hash, not by filename.

The DAG is not something you define—it is inferred from these relationships.
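
One way to picture this inference: if a job read an artifact that an earlier job wrote, the two are linked. The sketch below assumes each job is summarized as a name plus the sets of hashes it read and wrote; the job names, the short placeholder hashes, and the `infer_edges` function are hypothetical and are not roar's implementation.

```python
from collections import defaultdict

# Hypothetical observations: (job name, hashes read, hashes written).
# Short strings stand in for real content hashes.
observed_jobs = [
    ("preprocess", {"h_raw"}, {"h_features"}),
    ("train", {"h_features"}, {"h_model"}),
    ("evaluate", {"h_model", "h_features"}, {"h_metrics"}),
]


def infer_edges(jobs):
    """Link job A -> job B whenever B read an artifact that A wrote."""
    producer_of = {}
    for name, _reads, writes in jobs:
        for artifact in writes:
            # Simplification: if two jobs wrote the same hash, the later one wins.
            producer_of[artifact] = name

    edges = defaultdict(set)
    for name, reads, _writes in jobs:
        for artifact in reads:
            upstream = producer_of.get(artifact)
            if upstream is not None and upstream != name:
                edges[upstream].add(name)
    return {job: sorted(downstream) for job, downstream in edges.items()}


dag = infer_edges(observed_jobs)
print(dag)
```

Here `h_raw` has no recorded producer, so it is treated as a source input rather than an edge.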

Controlling your session

You don’t “start” a session explicitly. A session begins naturally as soon as you run your first command with roar.

You do control the session when you want to reset or restructure your work.

Resetting the session

To reset the current session (and its derived DAG):

roar reset

This clears the session state and lets you begin a new line of work, without affecting previously registered artifacts.

Think of it as: "start fresh from here."

A 5‑Minute First Experience

# Initialize roar in a project
roar init

# Run commands as usual—just prefix with roar
roar run python preprocess.py --input data.csv --output features.parquet
roar run python train.py --data features.parquet --output model.pt
roar run python evaluate.py --model model.pt --output metrics.json

# View recorded jobs
roar log

# View the current inferred DAG
roar dag

# Ask: where did this artifact come from?
roar show model.pt

At this point you already have:

  • Traceability
  • Reproducibility
  • A derived pipeline (DAG)
  • A searchable history of your work

All without changing how you work.

Using the GLaaS website (glaas.ai)

The CLI is great for day-to-day work; the GLaaS website is great for navigation and visualization.

Typical flow:

  1. Run roar register to register DAG information with GLaaS.
  2. Visit glaas.ai and paste a hash into the search bar:
    • artifact hash
    • job hash
    • session/DAG hash

Note: To use the roar register command, you'll need to set up an account with GLaaS. See authentication docs for more information.

On an artifact page:

  • Artifact Details shows size and registration time
  • Associated Jobs lists the job(s) that produced or used the artifact
  • Originating Sessions links back to the session/DAG context
  • A “Reproduce with roar” action gives you the exact command to recreate it locally

From there you can click through:

artifact → job → session → other artifacts → …

This “follow the links” experience is intentionally simple: you don’t need to understand the entire system up front to use it effectively.

Reproducing Work from an Artifact

Because artifacts are content-addressed, you can work backward from a result.

roar reproduce <artifact-hash>

This traces the lineage of the artifact and reconstructs the steps required to reproduce it. Depending on the artifact, reproduction may:

  • clone or fetch the original repo (if available)
  • guide you through environment/package setup
  • reconstruct the “recipe” DAG of steps to run
  • let you run steps selectively or end-to-end

This is the inverse of traditional pipelines: you start from the outcome, not the recipe.
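
Working backward from an outcome can be sketched as a walk over lineage records. The example assumes each record maps an artifact hash to the command that produced it and the hashes it consumed; the `lineage` table, the placeholder hashes, and the `recipe_for` function are hypothetical, and the real behavior of roar reproduce may differ.

```python
# Hypothetical lineage records: artifact hash -> (producing command, input hashes).
# Short strings stand in for real content hashes.
lineage = {
    "h_features": ("python preprocess.py --input data.csv --output features.parquet", ["h_raw"]),
    "h_model": ("python train.py --data features.parquet --output model.pt", ["h_features"]),
    "h_metrics": ("python evaluate.py --model model.pt --output metrics.json", ["h_model"]),
}


def recipe_for(target, records):
    """Return the commands needed to rebuild `target`, upstream steps first."""
    steps = []
    visited = set()

    def visit(artifact):
        if artifact in visited or artifact not in records:
            return  # already handled, or a source input with no recorded producer
        visited.add(artifact)
        command, inputs = records[artifact]
        for upstream in inputs:
            visit(upstream)  # make sure dependencies are rebuilt first
        steps.append(command)

    visit(target)
    return steps


for step in recipe_for("h_model", lineage):
    print(step)
```

Asking for `h_model` yields only the preprocess and train steps; the evaluate step is left out because the model does not depend on it.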

Tip: On glaas.ai, the artifact page’s “Reproduce with roar” action can generate the exact command to paste into your terminal.

From Local to Global with GLaaS (Optional)

By default, everything stays local.

When you're ready, you can register artifacts and jobs to GLaaS:

roar register model.pt

When you do this:

  • The artifact becomes globally dereferenceable by hash
  • Its provenance (jobs, inputs, commit, environment) is recorded
  • The associated git commit is tagged for reproducibility
  • The artifact can be looked up by others without sharing the data itself

GLaaS never stores your artifacts. It only stores how they were created.

Safety Without Friction

roar is designed to support self-governance, not enforcement.

Examples include:

  • Recording git state and warning about uncommitted changes
  • Optional enforcement of hygiene rules (e.g., committed code only)

The goal is to make the happy path the compliant path—without slowing builders down.
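
A check along these lines could be built on `git status --porcelain`, which prints one line per changed or untracked path and nothing when the tree is clean. The sketch below is an assumption about how such a rule might look, not roar's code: the function names and the `enforce` switch are hypothetical, and it treats untracked files as uncommitted changes, which roar may or may not do.

```python
import subprocess


def has_uncommitted_changes(porcelain_output: str) -> bool:
    """Any non-empty line of porcelain output means the tree is not clean."""
    return any(line.strip() for line in porcelain_output.splitlines())


def git_status_porcelain(repo_dir: str = ".") -> str:
    """Ask git for a machine-readable status of the working tree."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_hygiene(porcelain_output: str, enforce: bool = False) -> str:
    """Warn by default; refuse only when the optional rule is switched on."""
    if not has_uncommitted_changes(porcelain_output):
        return "ok"
    if enforce:
        raise RuntimeError("uncommitted changes: commit before running")
    return "warning: run recorded with uncommitted changes"


print(check_hygiene(""))
print(check_hygiene(" M train.py\n?? notes.txt\n"))
```

Inside a repository, `check_hygiene(git_status_porcelain())` would apply the same rule to the live working tree.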

What roar and GLaaS Are Not

To set expectations clearly, roar and GLaaS are:

  • Not an experiment tracker
  • Not a workflow orchestrator
  • Not a model registry that stores your artifacts
  • Not a replacement for git, cloud storage, or training frameworks
  • Not a system that requires declarations to be useful

They are tools for carrying context forward, built on implicit observation, not ceremony.

Why This Matters

Together, roar and GLaaS help build good practices into your AI development workflow:

  • Traceability — follow models and data across their full lifecycle
  • Reproducibility — recreate results later without guesswork
  • Attributability — understand what inputs and decisions drove outcomes
  • Collaboration — reason together using shared context instead of private state

Or, more simply: they make audits trivial, recovery fast, and coordination cheap.

Summary

roar and GLaaS are built on a simple idea:

  • Install it.
  • Run your commands as usual.
  • Let structure emerge from reality.

Everything else—DAGs, lineage, reproduction, collaboration, governance—is a consequence of that choice.

Where to Go Next

  • Learn the Core Concepts in more detail
  • Explore roar run, roar show, roar dag, and roar reproduce usage
  • Read the Common Use Cases
  • Understand how roar works and review the FAQ