Core Concepts
The structural building blocks of roar and GLaaS. Read top-to-bottom if you're new; every section is also independently scannable.
Jobs
A job is one recorded command execution. Each roar run, roar build, roar get, and roar put produces a job. Jobs are the active nodes in the DAG — they consume input artifacts and produce output artifacts.
Three job types are distinguished in the record:
- Run / build jobs: the standard processing tasks, produced by roar run and roar build. The bulk of any DAG.
- Get jobs: retrieval operations (roar get), recording input data that came from outside the workspace.
- Put jobs: upload operations (roar put), recording outputs that were pushed somewhere external.
The split matters because Get and Put participate in the lineage as ordinary jobs but represent boundary crossings — data entering or leaving the workspace.
Each job record captures:
- A unique ID, the command and arguments
- Files read (input artifacts), files written (output artifacts)
- Execution time and exit status
- OS, hardware (CPU/GPU when available), Python and system packages
- Environment variables that were read
- Git repository, branch, commit, and dirty/clean state
Inspect a job:
roar show <job-id>
# or: roar show @<step> e.g. roar show @5
See Tracers for what the observation layer actually does.
Artifacts
Artifacts are the files (or composite artifacts) that connect jobs. They're the passive nodes in the DAG.
Artifacts are identified by their content hash, not by:
- filename
- file path
- storage location
If two files have the same bytes, they are the same artifact — even if they live in different places or have different names. The whole DAG model rests on this.
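To make the identity rule concrete, here is a minimal sketch that hashes file contents and compares the results. It uses SHA-256 purely for illustration; that choice, the content_hash helper, and the file names are assumptions of this example, not a statement about roar's default algorithm (see Hashes for that).

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash a file's bytes only; its name and location play no part."""
    digest = hashlib.sha256()  # illustrative choice, not necessarily roar's default
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "backup").mkdir()
    a = root / "raw" / "train.csv"
    b = root / "backup" / "copy-of-train.csv"
    c = root / "raw" / "test.csv"
    a.write_bytes(b"id,label\n1,0\n")
    b.write_bytes(b"id,label\n1,0\n")  # same bytes, different name and directory
    c.write_bytes(b"id,label\n2,1\n")  # different bytes

    same_artifact = content_hash(a) == content_hash(b)
    different_artifact = content_hash(a) != content_hash(c)

print(same_artifact, different_artifact)
```

Under this model, a and b would be one artifact despite their different names and directories, while c would be a separate one.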
Inspect an artifact:
roar show <hash>
# or: roar show <path>
See Hashes for the default algorithm choice and the multi-algorithm storage model, and Composite Artifacts for how roar represents directory-shaped artifacts (datasets) as single nodes.
Sessions
On the command line you are always working inside a session. A session is the sequence of jobs you've recorded since the last reset, and it's what roar derives the local DAG from. Each roar run adds another job to the active session.
On the CLI, the session and the DAG are effectively the same thing.
roar status shows the session at a glance; roar dag is how you inspect its structure.
You don't "start" a session explicitly — one begins naturally as soon as you run your first command with roar. To wipe state and begin a new line of work (without affecting previously registered artifacts):
roar reset
Think of it as: "start fresh from here."
Lineage and DAGs
roar tracks a recreatable lineage of artifacts. Because the term is shorter, we usually just call it the DAG.
An artifact's lineage is the graph of jobs and artifacts that produced it. Because each job record captures the command, its inputs and outputs, and the environment it ran in, and because artifacts link each job to the jobs it depends on, the lineage DAG doubles as a recipe for recreating that artifact.
In many pipeline tools, jobs are the only nodes. In roar, artifacts are also nodes. That's essential because, while a run is happening, roar cannot know which output files might later become inputs to other jobs — so it records the artifact regardless.
roar dag # the inferred DAG of the active session
roar show <artifact-hash> # the lineage behind a specific artifact
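The idea that a lineage is a recipe can be sketched as a backward walk over a graph whose nodes are both jobs and artifacts. This is a toy model under stated assumptions: the Job class, its field names, the placeholder hashes (h_raw, h_clean, ...), and the lineage function are all hypothetical and do not reflect roar's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    command: str
    inputs: tuple[str, ...]   # content hashes of files read
    outputs: tuple[str, ...]  # content hashes of files written


def lineage(target: str, jobs: list[Job]) -> list[Job]:
    """Return the jobs needed to recreate `target`, upstream jobs first."""
    # Map each artifact hash to the job that wrote it.
    producer = {h: job for job in jobs for h in job.outputs}
    ordered: list[Job] = []
    seen: set[str] = set()

    def visit(artifact_hash: str) -> None:
        job = producer.get(artifact_hash)
        if job is None or job.job_id in seen:
            return  # external input, or job already handled
        seen.add(job.job_id)
        for h in job.inputs:
            visit(h)  # place upstream jobs before this one
        ordered.append(job)

    visit(target)
    return ordered


jobs = [
    Job("j1", "python prep.py", inputs=("h_raw",), outputs=("h_clean",)),
    Job("j2", "python train.py", inputs=("h_clean",), outputs=("h_model",)),
    Job("j3", "python plot.py", inputs=("h_clean",), outputs=("h_plot",)),
]

recipe = [job.command for job in lineage("h_model", jobs)]
print(recipe)  # ['python prep.py', 'python train.py']
```

Note that j3 is left out of the recipe for h_model: it shares an input with j2 but contributes nothing to the target. Treating artifacts as nodes is what makes that distinction possible.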
Labels
Labels are user-attached metadata: key/value annotations on a session, job, or artifact, for things that cannot be observed from the execution itself, such as a metric, a comment, an experiment ID, or a human judgment. Where lineage answers what happened, labels capture how it performed or what it's for.
For the full label model, configuration, and how labels become searchable on glaas.ai, see Labels.
Registration and GLaaS
When you use roar locally, jobs, artifacts, and DAG state live in your workspace's .roar/ directory. To make an artifact's lineage resolvable from anywhere, and reproducible by anyone with the hash, register it to GLaaS:
roar register <artifact>
This publishes the artifact, the jobs that produced it, and the chain of upstream inputs as a registered DAG. Visibility is governed by the workspace's scope; the artifact's existence is global by design, while the metadata, jobs, DAGs, and labels follow the scope rule.
At the end of roar register, roar prints a roar reproduce … command you can use later.
Global dereferenceability
Once a DAG is registered, you can start from any hash — artifact, job, or session — and navigate backward or forward:
- artifact → job(s) → session → other artifacts
- job → inputs → upstream jobs
- artifact → "Reproduce with roar"
This works in the CLI (roar show, roar reproduce) and on glaas.ai via clickable links between artifacts, jobs, and sessions.
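The navigation paths above can be pictured as lookups in a single index keyed by hash, where every record links to its neighbours. The sketch below is a toy in-memory model only: the REGISTRY dictionary, its fields, the short stand-in hashes, and the helper functions are all hypothetical and are not the GLaaS API or storage format.

```python
# Hypothetical registry keyed by hash; real hashes are long digests, these are stand-ins.
REGISTRY = {
    "s1": {"kind": "session", "jobs": ["j1", "j2"]},
    "j1": {"kind": "job", "session": "s1", "inputs": ["a_raw"], "outputs": ["a_clean"]},
    "j2": {"kind": "job", "session": "s1", "inputs": ["a_clean"], "outputs": ["a_model"]},
    "a_raw": {"kind": "artifact"},
    "a_clean": {"kind": "artifact"},
    "a_model": {"kind": "artifact"},
}


def producers(artifact_hash: str) -> list[str]:
    """Step backward: jobs that wrote this artifact."""
    return [key for key, rec in REGISTRY.items()
            if rec["kind"] == "job" and artifact_hash in rec["outputs"]]


def consumers(artifact_hash: str) -> list[str]:
    """Step forward: jobs that read this artifact."""
    return [key for key, rec in REGISTRY.items()
            if rec["kind"] == "job" and artifact_hash in rec["inputs"]]


def session_artifacts(session_hash: str) -> list[str]:
    """Every artifact touched by any job in the session, sorted."""
    touched: set[str] = set()
    for job_hash in REGISTRY[session_hash]["jobs"]:
        job = REGISTRY[job_hash]
        touched.update(job["inputs"], job["outputs"])
    return sorted(touched)


# artifact -> job -> session -> other artifacts
job = producers("a_model")[0]
session = REGISTRY[job]["session"]
others = [h for h in session_artifacts(session) if h != "a_model"]
print(job, session, others)

# artifact -> downstream job
print(consumers("a_clean"))
```

Because every record is reachable by its own hash and points at its neighbours, any node can serve as the starting point, which is the property this section describes.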
For a practical walkthrough, see the End-to-End Example.