Core Concepts
roar and GLaaS are built on a small set of ideas. Most of the power comes not from learning a new workflow, but from understanding what is automatically captured and how to use it.
You don’t need to define pipelines.
You don’t need to annotate data.
You don’t need to change how you work.
You install roar, and then you put roar run in front of a command. That’s it.
What follows are the concepts that explain what roar is capturing and how to get the most power out of roar + GLaaS.
roar (Run Observation & Artifact Registration)
roar is based on implicit observation.
Instead of asking you to declare inputs, outputs, or pipelines, roar actively observes what happens when you run commands. In particular, roar observes:
- which files are read
- which files are written
- which artifacts are produced
- and how those events relate over time
All structure—DAGs, lineage, reproducibility—is derived from runs, not specified up front.
roar creates records of jobs and artifacts.
Jobs
A job is a recorded execution of a command. There are three main types of jobs in GLaaS:
- Standard Job: A typical processing task created when you run a command.
- Get Job: A retrieval operation focused on pulling or downloading input data.
- Put Job: A storage or upload operation focused on pushing output data.
Each time you run:
roar run <command>
roar creates a standard job record that includes (conceptually):
- a unique ID for the job
- the command and arguments
- files read (input artifacts)
- files written (output artifacts)
- execution time and status
- OS version and basic hardware details (e.g., CPU/GPU, when available)
- environment variables that are read
- package dependencies (system and Python packages)
- git repository, branch, and commit
- whether the working tree had uncommitted changes
You can see these captured properties when you run:
roar show <job-id>
Jobs capture the details that may matter when you later need to recreate finished artifacts.
Jobs consume and produce artifacts.
See How Does roar Work? for an implementation overview.
Artifacts
Artifacts are files that connect jobs.
Many artifacts are outputs of one job and inputs to another—intermediate files that matter for reproducibility even if they are never “deployed.”
Other artifacts are input datasets.
Other artifacts are finished model files or evaluation results—finished artifacts.
While roar doesn’t know an artifact’s purpose, it captures any file that might matter to the later recreation of finished artifacts.
Artifacts are identified by a content hash, not by:
- filename
- file path
- storage location
If two files have the same content, they are the same artifact—even if they live in different places or have different names.
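To make content-based identity concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash function and a hypothetical helper named `content_hash`; this page does not say which algorithm roar actually uses, so treat both as illustrative only.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a hex digest of the file's bytes (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "model.bin"
    second = Path(tmp) / "copies" / "renamed.bin"
    second.parent.mkdir()
    first.write_bytes(b"same bytes")
    second.write_bytes(b"same bytes")
    # Different name, different directory, identical content.
    same_artifact = content_hash(first) == content_hash(second)

print(same_artifact)  # True
```

Because only the bytes feed the digest, renaming or moving a file leaves its identity unchanged, while editing a single byte produces a different artifact.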
You can see information captured about artifacts when you run:
roar show <artifact-hash>
# or: roar show <path-to-file>
Labels
Labels are lightweight key/value annotations you can attach to sessions, jobs and artifacts.
- With the roar CLI, labels can be used to store metrics, comments, or any text.
- In GLaaS, labels can be configured for visibility across items, and label views can be used to construct leaderboards or other custom views.
Composite Artifacts
A composite artifact represents a directory or collection of files as a single tracked unit. In GLaaS, composite artifacts are also referred to as Datasets.
Many ML workflows produce or consume entire directories—datasets, sharded model checkpoints, multi-file outputs. If you track each file independently, you lose the relationship that says "these files are one thing." A composite preserves that grouping.
Instead of tracking each file as a separate artifact, roar groups all the files under one content hash. That hash is derived from the hashes of every component file. If any component changes, the composite hash changes. This gives composites the same content-addressable identity guarantee that single-file artifacts have.
A composite artifact is one artifact with one hash, but it knows about its parts.
While the composite is a single artifact, it retains awareness of its individual components—file paths, sizes, and types. You can query which files belong to a composite.
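One way to picture a hash that is "derived from the hashes of every component file" is sketched below in Python. The combining scheme is hypothetical: SHA-256, the sorted ordering, and the choice to mix relative paths into the digest are all assumptions made for illustration, not a description of roar's actual format.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Content hash of a single component (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def composite_hash(component_hashes: dict[str, str]) -> str:
    """Combine per-file hashes into one digest (hypothetical scheme)."""
    combined = hashlib.sha256()
    # Sort by relative path so the result does not depend on traversal order.
    for relative_path, digest_hex in sorted(component_hashes.items()):
        combined.update(f"{relative_path}\0{digest_hex}\n".encode("utf-8"))
    return combined.hexdigest()


checkpoint = {
    "shard-0.bin": file_hash(b"weights part 0"),
    "shard-1.bin": file_hash(b"weights part 1"),
}
original = composite_hash(checkpoint)

# Changing any one component changes the composite hash.
modified = {**checkpoint, "shard-1.bin": file_hash(b"weights part 1, retrained")}
changed = composite_hash(modified) != original

# The composite is one hash, but its parts remain queryable.
components = sorted(checkpoint)

print(changed)     # True
print(components)  # ['shard-0.bin', 'shard-1.bin']
```

The mapping from path to hash plays the role of the component listing, and the single combined digest is what would appear as one node in the DAG.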
Composites participate in the DAG like any other artifact. They can be inputs or outputs of jobs, and their lineage is tracked end-to-end. Nothing about lineage or registration changes when an artifact is composite rather than a single file.
Lineage and DAGs
roar tracks a recreatable lineage of artifacts. Because it’s shorter and easier to type, we often just call this the DAG.
An artifact’s lineage is the graph of jobs and artifacts that created the artifact. Because job records are rich, and artifacts connect job dependencies, the lineage DAG captures a recipe for how to recreate an artifact.
In many pipeline tools, jobs are the only nodes. In roar, artifacts are also nodes. That’s essential because while a run is happening, roar cannot know which output files might later become inputs to other jobs.
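The sketch below shows, in Python, how a lineage can be derived purely from observed reads and writes, with artifact hashes acting as the connecting nodes between jobs. `ObservedJob` and `lineage` are hypothetical names, and the record is far simpler than a real roar job record; it is a sketch of the idea, not of roar's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedJob:
    """Hypothetical, simplified job record: only what was read and written."""
    job_id: str
    reads: tuple[str, ...]   # content hashes of input artifacts
    writes: tuple[str, ...]  # content hashes of output artifacts


def lineage(artifact: str, jobs: list[ObservedJob]) -> list[str]:
    """Return the job IDs needed to recreate `artifact`, upstream jobs first."""
    # Artifacts link jobs: map each output hash to the job that wrote it.
    producer = {out: job for job in jobs for out in job.writes}
    ordered: list[str] = []
    visited: set[str] = set()

    def visit(artifact_hash: str) -> None:
        job = producer.get(artifact_hash)
        if job is None or job.job_id in visited:
            return  # a source dataset, or a job that was already handled
        visited.add(job.job_id)
        for upstream in job.reads:
            visit(upstream)
        ordered.append(job.job_id)

    visit(artifact)
    return ordered


jobs = [
    ObservedJob("prep", reads=("raw",), writes=("clean",)),
    ObservedJob("train", reads=("clean",), writes=("model",)),
    ObservedJob("eval", reads=("model", "clean"), writes=("report",)),
    ObservedJob("unrelated", reads=("raw",), writes=("plot",)),
]

print(lineage("report", jobs))  # ['prep', 'train', 'eval']
```

Nothing in the job list declares a pipeline. The ordering falls out of which hashes each job read and wrote, and jobs that did not contribute to the requested artifact are left out of its lineage.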
To view the current inferred DAG:
roar dag
To view the lineage behind a specific artifact:
roar show <artifact-hash>
Note: roar assumes the lineage graph is directed and acyclic for observed CLI workflows. If you intentionally create cycles (e.g., by reading a file you also overwrite in a loop across steps), roar can still record the events, but the resulting structure may be presented as a best-effort DAG with cycle detection/handling. (Details will evolve as the system matures.)
Registration and GLaaS
Where are jobs, artifacts, and DAGs stored?
When you use roar locally, jobs, artifacts, and DAG state are stored on your machine in the directory where you ran:
roar init
This creates a .roar/ directory containing the local database and cache. Typically, you run roar init once per repo (near your .git directory).
When you’ve produced something you care about, especially something expensive, you may want it to be reproducible and dereferenceable in the future. That’s when you register its lineage on GLaaS as a Registered DAG.
You do that with roar register:
roar register <artifact1>
This creates the Registered DAG on GLaaS: the jobs, artifacts, and relationships that explain how those artifacts were produced.
At the end of roar register, roar prints a roar reproduce ... command you can use later.
Labels
Labels are structured key/value annotations attached to jobs, artifacts, or sessions.
roar automatically captures structure—files, dependencies, execution context—but some information cannot be inferred from observation alone. Labels provide a way to record that additional information explicitly.
Labels are added via roar and are then queryable in GLaaS.
Lineage answers what happened. Labels help describe how it performed.
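As a rough illustration of how key/value labels can back a leaderboard-style view, the Python sketch below ranks items by a numeric label. The data layout, the item names, and the `leaderboard` helper are all hypothetical, and storing label values as text is an assumption; this does not reflect how roar or GLaaS store or query labels.

```python
# Hypothetical labels keyed by item; values are kept as text.
labels = {
    "artifact-a": {"accuracy": "0.91", "comment": "baseline"},
    "artifact-b": {"accuracy": "0.94", "comment": "larger batch size"},
    "artifact-c": {"comment": "no metric recorded"},
}


def leaderboard(
    labels_by_item: dict[str, dict[str, str]], key: str
) -> list[tuple[str, float]]:
    """Rank items that carry a numeric label `key`, highest value first."""
    scored = [
        (item, float(item_labels[key]))
        for item, item_labels in labels_by_item.items()
        if key in item_labels  # items without the label are skipped
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(leaderboard(labels, "accuracy"))
# [('artifact-b', 0.94), ('artifact-a', 0.91)]
```

The lineage of each ranked item is unaffected by its labels; the labels only add the performance information that observation alone cannot supply.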
Global dereferenceability
Once a DAG is registered, you can start from a hash and navigate backward or forward:
- artifact → job(s) → session → other artifacts
- job → inputs → upstream jobs
- artifact → “Reproduce with roar”
This works in the CLI (roar show, roar reproduce) and on the glaas.ai website via clickable links between artifacts, jobs, and sessions.
For a practical walkthrough of how to use roar and GLaaS, see the End-to-End Example.