Core Concepts

Jobs

A job is one recorded command execution. Each roar run, roar build, roar get, and roar put produces a job. Jobs are the active nodes in the DAG — they consume input artifacts and produce output artifacts.

Three job types are distinguished in the record:

Run / build jobs — the standard processing tasks, produced by roar run and roar build. The bulk of any DAG.
Get jobs — retrieval operations (roar get), recording input data that came from outside the workspace.
Put jobs — upload operations (roar put), recording outputs that were pushed somewhere external.

The split matters because Get and Put participate in the lineage as ordinary jobs but represent boundary crossings — data entering or leaving the workspace.

Each job record captures:

A unique ID, the command and arguments
Files read (input artifacts), files written (output artifacts)
Execution time and exit status
OS, hardware (CPU/GPU when available), Python and system packages
Environment variables that were read
Git repository, branch, commit, and dirty/clean state
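
To make the list above concrete, here is a minimal sketch of what one job record might look like as a data structure. The class name, field names, and the is_clean_success helper are all hypothetical, chosen for illustration; this is not roar's actual schema or on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical shape of one job record; every field name is illustrative."""
    job_id: str
    command: List[str]                # the command and its arguments
    inputs: List[str]                 # content hashes of files read
    outputs: List[str]                # content hashes of files written
    duration_s: float                 # execution time in seconds
    exit_status: int
    environment: Dict[str, str] = field(default_factory=dict)    # OS, hardware, packages
    env_vars_read: Dict[str, str] = field(default_factory=dict)  # only variables that were read
    git_commit: Optional[str] = None
    git_dirty: bool = False

    def is_clean_success(self) -> bool:
        """True when the job exited 0 from a committed, non-dirty working tree."""
        return (
            self.exit_status == 0
            and self.git_commit is not None
            and not self.git_dirty
        )


# Assumed example values, not real roar output.
job = JobRecord(
    job_id="job-1",
    command=["python", "train.py"],
    inputs=["aaa"],
    outputs=["bbb"],
    duration_s=12.5,
    exit_status=0,
    git_commit="abc123",
)
```

A record like this carries enough to rerun the command against the same inputs, code, and environment, which is why the Git state and the packages sit alongside the command itself.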

Inspect a job:

roar show <job-id>
# or:  roar show @<step>          e.g. roar show @5

See Tracers for what the observation layer actually does.

Artifacts

Artifacts are the files (or composite artifacts) that connect jobs. They're the passive nodes in the DAG.

Artifacts are identified by their content hash, not by:

filename
file path
storage location

If two files have the same bytes, they are the same artifact — even if they live in different places or have different names. The whole DAG model rests on this.
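
The identity rule can be sketched in a few lines of Python. SHA-256 is used here only as a stand-in (see Hashes for the algorithm roar actually defaults to), and content_hash and same_artifact are hypothetical helper names, not part of roar.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes only; its name and location never enter the digest."""
    digest = hashlib.sha256()  # stand-in algorithm, not necessarily roar's default
    with path.open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(a: Path, b: Path) -> bool:
    """Two paths name the same artifact when their content hashes match."""
    return content_hash(a) == content_hash(b)


# Same bytes under different names and directories: one artifact.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "b").mkdir()
    first = root / "a" / "train.csv"
    second = root / "b" / "copy-of-data.csv"
    first.write_bytes(b"x,y\n1,2\n")
    second.write_bytes(b"x,y\n1,2\n")
    identical = same_artifact(first, second)
```

Because the path plays no part in the hash, moving or renaming a file leaves its identity unchanged, while changing a single byte gives a different artifact.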

Inspect an artifact:

roar show <hash>
# or:  roar show <path>

See Hashes for the default algorithm choice and the multi-algorithm storage model, and Composite Artifacts for how roar represents directory-shaped artifacts (datasets) as single nodes.

Sessions

On the command line you are always working inside a session. A session is the sequence of jobs you've recorded since the last reset, and it's what roar derives the local DAG from. Each roar run adds another job to the active session.

On the CLI, the session and the DAG are effectively the same thing. roar status shows the session at a glance; roar dag is how you inspect its structure.

You don't "start" a session explicitly — one begins naturally as soon as you run your first command with roar. To wipe state and begin a new line of work (without affecting previously registered artifacts):

roar reset

Think of it as: "start fresh from here."

Lineage and DAGs

roar tracks a recreatable lineage of artifacts. Because the term is shorter, we usually just call it the DAG.

An artifact's lineage is the graph of jobs and artifacts that created it. Because each job record captures the command, environment, and code state, and because artifacts link each job to the jobs it depends on, the lineage DAG is a recipe for recreating that artifact.

In many pipeline tools, jobs are the only nodes. In roar, artifacts are also nodes. That's essential because, while a run is happening, roar cannot know which output files might later become inputs to other jobs — so it records the artifact regardless.
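
As a rough illustration of how a lineage falls out of a graph with both job and artifact nodes, the sketch below walks backward from one artifact to the jobs that produced it. The Job tuple and the lineage function are hypothetical, and the artifact names stand in for content hashes; this is not how roar stores or infers its DAG.

```python
from typing import Dict, List, NamedTuple, Set


class Job(NamedTuple):
    job_id: str
    inputs: List[str]   # artifacts consumed (names stand in for content hashes)
    outputs: List[str]  # artifacts produced


def lineage(artifact: str, jobs: List[Job]) -> List[str]:
    """Return IDs of the jobs upstream of `artifact`, producers before consumers."""
    # Simplification: assume each artifact has one producing job.
    producer: Dict[str, Job] = {out: job for job in jobs for out in job.outputs}
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(art: str) -> None:
        job = producer.get(art)
        if job is None or job.job_id in seen:
            return  # no recorded producer, or this job was already handled
        seen.add(job.job_id)
        for upstream in job.inputs:
            visit(upstream)
        ordered.append(job.job_id)  # post-order: a job follows everything it needs

    visit(artifact)
    return ordered


jobs = [
    Job("get-1", [], ["raw"]),
    Job("run-2", ["raw"], ["clean"]),
    Job("run-3", ["clean"], ["model"]),
    Job("run-4", ["clean"], ["report"]),
]
recipe = lineage("model", jobs)  # ["get-1", "run-2", "run-3"]
```

Note that run-4 is absent from the result: it consumes the same intermediate artifact but contributes nothing to "model". The artifact nodes are what make that distinction possible, since the edges are discovered by matching one job's outputs to another job's inputs.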

roar dag                       # the inferred DAG of the active session
roar show <artifact-hash>      # the lineage behind a specific artifact

Labels

Labels are user-attached metadata: key/value annotations on a session, job, or artifact, used when the thing you want to record isn't observable, such as a metric, a comment, an experiment ID, or a human judgment. Where lineage answers what happened, labels capture how it performed or what it's for.

For the full label model, configuration, and how labels become searchable on glaas.ai: Labels.

Registration and GLaaS

When you use roar locally, jobs, artifacts, and DAG state live in your workspace's .roar/ directory. To make an artifact's lineage globally discoverable, and reproducible by anyone with the hash, register it to GLaaS:

roar register <artifact>

This publishes the artifact, the jobs that produced it, and the chain of upstream inputs as a registered DAG. Visibility is governed by the workspace's scope; the artifact's existence is global by design, while the metadata, jobs, DAGs, and labels follow the scope rule.

At the end of roar register, roar prints a roar reproduce … command you can use later.

Global dereferenceability

Once a DAG is registered, you can start from any hash — artifact, job, or session — and navigate backward or forward:

artifact → job(s) → session → other artifacts
job → inputs → upstream jobs
artifact → "Reproduce with roar"

This works in the CLI (roar show, roar reproduce) and on glaas.ai via clickable links between artifacts, jobs, and sessions.
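
The first navigation path (artifact → job(s) → session → other artifacts) can be sketched with a pair of lookup tables. Everything below is hypothetical: the row layout, the function names, and the IDs are assumptions for illustration and do not reflect how roar or GLaaS index registered DAGs.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical row layout: (job_id, session_id, input artifacts, output artifacts)
JobRow = Tuple[str, str, List[str], List[str]]


def build_index(jobs: List[JobRow]):
    """Build lookups so an artifact resolves to jobs, and a job to its session."""
    produced_by: DefaultDict[str, List[str]] = defaultdict(list)
    consumed_by: DefaultDict[str, List[str]] = defaultdict(list)
    session_of: Dict[str, str] = {}
    for job_id, session_id, inputs, outputs in jobs:
        session_of[job_id] = session_id
        for art in inputs:
            consumed_by[art].append(job_id)   # forward edge: artifact -> consumer
        for art in outputs:
            produced_by[art].append(job_id)   # backward edge: artifact -> producer
    return produced_by, consumed_by, session_of


def other_artifacts_in_session(artifact: str, jobs: List[JobRow]) -> List[str]:
    """Follow artifact -> job(s) -> session -> other artifacts."""
    produced_by, consumed_by, session_of = build_index(jobs)
    touching = produced_by[artifact] + consumed_by[artifact]
    sessions = {session_of[job_id] for job_id in touching}
    found = set()
    for _job_id, session_id, inputs, outputs in jobs:
        if session_id in sessions:
            found.update(inputs)
            found.update(outputs)
    found.discard(artifact)
    return sorted(found)


rows = [
    ("j1", "s1", [], ["raw"]),
    ("j2", "s1", ["raw"], ["clean"]),
    ("j3", "s2", ["clean"], ["model"]),
]
neighbours = other_artifacts_in_session("clean", rows)  # ["model", "raw"]
```

Keeping both a producer and a consumer table is what allows movement in either direction from the same starting hash.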

For a practical walkthrough, see the End-to-End Example.