Core Concepts

roar and GLaaS are built on a small set of ideas. Most of the power comes not from learning a new workflow, but from understanding what is automatically captured and how to use it.

You don’t need to define pipelines.
You don’t need to annotate data.
You don’t need to change how you work.

You install roar, and then you put roar run in front of a command. That’s it.

What follows are the concepts that explain what roar is capturing and how to get the most power out of roar + GLaaS.

roar (Run Observation & Artifact Registration)

roar is based on implicit observation.

Instead of asking you to declare inputs, outputs, or pipelines, roar actively observes what happens when you run commands. In particular, roar observes:

  • which files are read
  • which files are written
  • which artifacts are produced
  • and how those events relate over time

All structure—DAGs, lineage, reproducibility—is derived from runs, not specified up front.

roar creates records of jobs and artifacts.

Jobs

A job is a recorded execution of a command.

Each time you run:

roar run <command>

roar creates a job record that includes (conceptually):

  • a unique ID for the job
  • the command and arguments
  • files read (input artifacts)
  • files written (output artifacts)
  • execution time and status
  • OS version and basic hardware details (e.g., CPU/GPU, when available)
  • environment variables that are read
  • package dependencies (system and Python packages)
  • git repository, branch, and commit
  • whether the working tree had uncommitted changes
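
To make the list above concrete, here is a minimal sketch of what such a record could look like as a data structure. This is not roar's actual schema: the class name JobRecord, every field name, and the example values are hypothetical, chosen only to mirror the bullets above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical shape of a job record; field names are illustrative only."""
    job_id: str                                          # unique ID for the job
    command: list[str]                                   # command and arguments
    inputs: list[str] = field(default_factory=list)      # content hashes of files read
    outputs: list[str] = field(default_factory=list)     # content hashes of files written
    duration_seconds: Optional[float] = None             # execution time
    exit_status: Optional[int] = None                    # execution status
    os_version: Optional[str] = None
    hardware: dict[str, str] = field(default_factory=dict)       # e.g. CPU/GPU, when available
    env_vars_read: dict[str, str] = field(default_factory=dict)  # only variables that were read
    packages: dict[str, str] = field(default_factory=dict)       # package name -> version
    git_repo: Optional[str] = None
    git_branch: Optional[str] = None
    git_commit: Optional[str] = None
    git_dirty: bool = False                              # uncommitted changes in the working tree


# Example values are made up for illustration.
job = JobRecord(
    job_id="job-0001",
    command=["python", "train.py", "--epochs", "3"],
    inputs=["hash-of-train-csv"],
    outputs=["hash-of-model-pt"],
    exit_status=0,
    git_commit="abc1234",
    git_dirty=True,
)
```

The point of the sketch is that a job record is plain data: nothing in it has to be declared by the user before the command runs.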

You can see these captured properties when you run:

roar show <job-id>

Job records capture the details you may need later to recreate finished artifacts.

Jobs consume and produce artifacts.

See How Does roar Work? for an implementation overview.

Artifacts

Artifacts are files that connect jobs.

Many artifacts are outputs of one job and inputs to another—intermediate files that matter for reproducibility even if they are never “deployed.”

Other artifacts are input datasets.
Still others are finished artifacts, such as model files or evaluation results.

While roar doesn’t know an artifact’s purpose, it captures any file that might matter to the later recreation of finished artifacts.

Artifacts are identified by a content hash, not by:

  • filename
  • file path
  • storage location

If two files have the same content, they are the same artifact—even if they live in different places or have different names.
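
The idea of content-based identity can be sketched in a few lines. The example below assumes SHA-256 as the hash function, which is an assumption made for illustration; this page does not say which algorithm roar uses. The helper content_hash and the file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "data" / "train.csv"
    b = Path(tmp) / "backup" / "copy_of_train.csv"
    c = Path(tmp) / "data" / "other.csv"
    for p in (a, b, c):
        p.parent.mkdir(parents=True, exist_ok=True)
    a.write_bytes(b"x,y\n1,2\n")
    b.write_bytes(b"x,y\n1,2\n")   # same bytes as a, different name and directory
    c.write_bytes(b"x,y\n1,3\n")   # one byte differs from a

    hash_a, hash_b, hash_c = content_hash(a), content_hash(b), content_hash(c)

same_artifact = hash_a == hash_b        # True: identical content
different_artifact = hash_a != hash_c   # True: content differs
```

Because the identifier depends only on the bytes, renaming or moving a file does not change which artifact it is, while editing a single byte produces a different artifact.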

You can see information captured about artifacts when you run:

roar show <artifact-hash>
# or: roar show <path-to-file>

Lineage and DAGs

roar tracks a recreatable lineage of artifacts. Because it’s shorter and easier to type, we often just call this the DAG.

An artifact’s lineage is the graph of jobs and artifacts that created the artifact. Because job records are rich, and artifacts connect job dependencies, the lineage DAG captures a recipe for how to recreate an artifact.

In many pipeline tools, jobs are the only nodes. In roar, artifacts are also nodes. That’s essential because while a run is happening, roar cannot know which output files might later become inputs to other jobs.
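
As a rough illustration of a graph in which both jobs and artifacts are nodes, the sketch below walks backward from one artifact to everything upstream of it. It is not roar's implementation: the job names, the artifact names, and the lineage function are hypothetical, and readable names stand in for content hashes.

```python
from collections import deque

# Hypothetical observations: each job maps to the artifacts it read and wrote.
jobs = {
    "job:preprocess": {"reads": ["art:raw.csv"], "writes": ["art:clean.csv"]},
    "job:train": {"reads": ["art:clean.csv", "art:config.yaml"], "writes": ["art:model.pt"]},
    "job:evaluate": {"reads": ["art:model.pt", "art:clean.csv"], "writes": ["art:metrics.json"]},
}

# Index each artifact by the job that wrote it.
producer = {art: job for job, io in jobs.items() for art in io["writes"]}


def lineage(artifact: str) -> list[str]:
    """Return every job and artifact upstream of `artifact`, nearest first."""
    seen = {artifact}
    order = []
    queue = deque([artifact])
    while queue:
        node = queue.popleft()
        if node.startswith("art:"):
            # An artifact's parent is the job that produced it, if one was observed.
            parents = [producer[node]] if node in producer else []
        else:
            # A job's parents are the artifacts it read.
            parents = jobs[node]["reads"]
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


model_lineage = lineage("art:model.pt")
# ['job:train', 'art:clean.csv', 'art:config.yaml', 'job:preprocess', 'art:raw.csv']
```

Note that job:evaluate does not appear in the result: it consumes the model but played no part in creating it. Input files with no observed producer, such as art:raw.csv here, are the leaves of the lineage.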

To view the current inferred DAG:

roar dag

To view the lineage behind a specific artifact:

roar show <artifact-hash>

Note: roar assumes the lineage graph is directed and acyclic for observed CLI workflows. If you intentionally create cycles (e.g., by reading a file you also overwrite in a loop across steps), roar can still record the events, but the resulting structure may be presented as a best-effort DAG with cycle detection/handling. (Details will evolve as the system matures.)
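
For intuition about what cycle detection means here, the sketch below uses Python's standard graphlib module to order a set of job dependencies and to notice when no valid order exists. This shows the general technique only; how roar actually detects or handles cycles is not described on this page, and the job names and the execution_order helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job-level dependencies: each key depends on the jobs in its set.
acyclic = {
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
}

# A loop: step_a reads what step_b wrote, and step_b reads what step_a wrote.
cyclic = {
    "step_a": {"step_b"},
    "step_b": {"step_a"},
}


def execution_order(deps):
    """Return a valid run order, or None if the dependencies contain a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


order = execution_order(acyclic)   # ['preprocess', 'train', 'evaluate']
loop = execution_order(cyclic)     # None
```

A graph with a valid order can be replayed step by step; a graph without one has no well-defined recipe, which is why cycles need special handling.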

TODO: insert simple DAG example visualization.

Registration and GLaaS

Where are jobs, artifacts, and DAGs stored?

When you use roar locally, jobs, artifacts, and DAG state are stored on your machine in the directory where you ran:

roar init

This creates a .roar/ directory containing the local database and cache. Typically, you run roar init once per repo (near your .git directory).

When you’ve produced something you care about, especially something expensive, you may want it to be reproducible and dereferenceable in the future. That’s when you register its lineage on GLaaS as a Registered DAG.

You do that with roar register:

roar register <artifact1>

This creates a Registered DAG on GLaaS: the jobs, artifacts, and relationships that explain how the registered artifact was produced.

At the end of roar register, roar prints a roar reproduce ... command you can use later.

Global dereferenceability

Once a DAG is registered, you can start from a hash and navigate backward or forward:

  • artifact → job(s) → session → other artifacts
  • job → inputs → upstream jobs
  • artifact → “Reproduce with roar”

This works in the CLI (roar show, roar reproduce) and on the glaas.ai website via clickable links between artifacts, jobs, and sessions.


For a practical walkthrough of how to use roar and GLaaS, see the End-to-End Example.