Definitions for the vocabulary used across the `roar` and GLaaS docs.

## Main entities

### Session

A **session** is the local context for a sequence of `roar run` commands. It captures everything you've recorded since the last `roar reset`, and it's what `roar` derives the local DAG from. One session, one local DAG. Confusingly, GLaaS-side records sometimes call the *registered* version of a session a **DAG** for short — see [DAG](#dag-directed-acyclic-graph) below.

### DAG (Directed Acyclic Graph)

The graph of jobs and artifacts that produced an artifact. Each session has one local DAG; you can also publish a snapshot of it as a **registered DAG** on GLaaS. The two aren't identical — the local session evolves as you re-run things; the registered DAG is frozen at register time. When you view a "DAG page" on glaas.ai, you're looking at a registered DAG.

### Job

A single recorded command execution. Active node in the DAG. Created by `roar run`, `roar build`, `roar get`, or `roar put`. Jobs consume input artifacts and produce output artifacts.

### Artifact

A file (or composite artifact) identified by its **content hash**, not by filename or path. Passive node in the DAG: it sits between the jobs that produce it and the jobs that consume it.
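The job/artifact relationship can be sketched as a small bipartite graph. This is an illustration only, not `roar`'s internal schema: the `Job` class, the `upstream_jobs` helper, and the placeholder hashes such as `hash-raw` are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Active node: one recorded command execution (hypothetical model)."""
    job_id: str
    command: str
    inputs: set = field(default_factory=set)   # content hashes the job read
    outputs: set = field(default_factory=set)  # content hashes the job wrote


def upstream_jobs(artifact_hash, jobs):
    """Return the ids of every job that contributed to an artifact."""
    # Index jobs by the artifacts they produced.
    producers = {}
    for job in jobs:
        for h in job.outputs:
            producers.setdefault(h, []).append(job)

    # Walk backwards: artifact -> producing job -> that job's inputs -> ...
    seen, stack = set(), [artifact_hash]
    while stack:
        for job in producers.get(stack.pop(), []):
            if job.job_id not in seen:
                seen.add(job.job_id)
                stack.extend(job.inputs)
    return seen


# Placeholder hashes stand in for real content hashes.
jobs = [
    Job("j1", "python prepare.py", inputs={"hash-raw"}, outputs={"hash-clean"}),
    Job("j2", "python train.py", inputs={"hash-clean"}, outputs={"hash-model"}),
    Job("j3", "python report.py", inputs={"hash-clean"}, outputs={"hash-report"}),
]
lineage = upstream_jobs("hash-model", jobs)  # jobs j1 and j2, but not j3
```

Because artifacts are keyed by content hash rather than path, two jobs that touch identical bytes under different filenames would link to the same artifact node in a model like this.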

### Composite artifact

An artifact representing a directory or multi-file collection (a dataset, a sharded checkpoint) as a single tracked unit with one content hash. Component files are still recorded internally for introspection, but the DAG shows the composite as one node. See [Composite Artifacts](/docs/composite-artifacts).

### Component

An individual file inside a composite artifact. Records the relative path within the composite and the component's own hash.

### Registered DAG

A snapshot of a session published to GLaaS via `roar register`. Once registered, it can be looked up globally by hash. The local session it was registered from can evolve afterward without changing the registered DAG.

## Hashes and identity

### Content hash

A hash of an artifact's bytes — the artifact's identity. If two files share a content hash, they are the same artifact regardless of where they live or what they're called.

### blake3

`roar`'s default content-hashing algorithm. 256-bit output; roughly 2× faster than sha256 single-threaded and 5–10× faster with SIMD. Falls back to sha256 if the `blake3` Python module isn't installed. See [Hashes](/docs/hashes).
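A minimal sketch of content hashing with the sha256 fallback described above. It assumes the `blake3` PyPI package and its hashlib-style `update()` / `hexdigest()` interface; the `content_hash` function and the 1 MiB chunk size are illustrative choices, not `roar`'s actual implementation.

```python
import hashlib
import tempfile
from pathlib import Path

try:
    from blake3 import blake3  # optional third-party module
    ALGORITHM = "blake3"
    _new_hasher = blake3
except ImportError:
    # Fall back to the standard library when blake3 isn't installed.
    ALGORITHM = "sha256"
    _new_hasher = hashlib.sha256


def content_hash(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks; the filename never enters the digest."""
    hasher = _new_hasher()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return ALGORITHM, hasher.hexdigest()


# Two files with different names but identical bytes get the same hash.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "weights.bin")
    b = Path(tmp, "copy-of-weights.bin")
    a.write_bytes(b"same bytes")
    b.write_bytes(b"same bytes")
    same = content_hash(a) == content_hash(b)
```

Returning the algorithm name alongside the digest keeps a blake3 digest from being compared against a sha256 digest by accident, since both are 256 bits long.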

### sha256

256-bit content hash; widely used (Hugging Face, Git LFS, IPFS, Docker, TLS). `roar` can compute it alongside blake3 per operation when configured, which is useful for cross-tool identity checks. See [Hashes](/docs/hashes).

### composite-blake3

A Merkle-tree-style hash over a composite artifact: blake3 over the component file hashes plus their relative paths. Deterministic for the same inputs in the same layout, and sensitive to any change in a component's bytes or path. See [Composite Artifacts](/docs/composite-artifacts#composite-blake3).
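The idea can be sketched as a hash over sorted (path, hash) records. The byte layout `roar` actually feeds into the hash isn't specified here, so the `path\0hash\n` record format below is an assumption, as is the `composite_hash` name; sha256 from the standard library stands in for blake3 so the example runs without extra dependencies.

```python
import hashlib


def composite_hash(components):
    """components: mapping of relative path -> component content hash (hex)."""
    hasher = hashlib.sha256()  # stand-in for blake3
    # Sort by path so the digest does not depend on traversal order.
    for rel_path in sorted(components):
        # Assumed record format: path, NUL separator, hash, newline.
        record = f"{rel_path}\0{components[rel_path]}\n"
        hasher.update(record.encode("utf-8"))
    return hasher.hexdigest()


shards = {
    "shard-0.bin": hashlib.sha256(b"first shard").hexdigest(),
    "shard-1.bin": hashlib.sha256(b"second shard").hexdigest(),
}
baseline = composite_hash(shards)

# Same components listed in a different order: same digest.
reordered = dict(reversed(list(shards.items())))

# Renaming a component, or changing its bytes, changes the digest.
renamed = {"shard-0.bin": shards["shard-0.bin"], "shard-2.bin": shards["shard-1.bin"]}
modified = {**shards, "shard-1.bin": hashlib.sha256(b"changed").hexdigest()}
```

Hashing the component hashes rather than the raw bytes means a composite digest can be recomputed from recorded component entries without rereading every file.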

### ETag

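The hash S3 returns on `PutObject` / `GetObject`. For single-part uploads it's the MD5 of the body (unless the object is encrypted with SSE-KMS or SSE-C, in which case it isn't an MD5 digest); for multipart uploads it's `MD5(concat(MD5(part_1), MD5(part_2), …))-<N>`, where `<N>` is the number of parts. The multipart value depends on how the upload was split, so it is **not** a content digest. `roar` records ETag as one of an artifact's hash algorithms (via the [Proxy](/docs/proxy)) but treats blake3 as the source of truth for content identity.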
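To see why a multipart ETag can't serve as a content digest, the two formulas can be reproduced locally. This sketch assumes unencrypted (or SSE-S3) objects and fixed-size parts; the `s3_etag` helper is a hypothetical name, and the tiny part size is for illustration only, since real multipart uploads use much larger parts.

```python
import hashlib


def s3_etag(data, part_size=None):
    """Reproduce the ETag formulas for a single-part or multipart upload.

    part_size=None models a single-part upload; otherwise the body is
    split into fixed-size parts, as in a multipart upload.
    """
    if part_size is None:
        return hashlib.md5(data).hexdigest()
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)] or [b""]
    # Concatenate the raw 16-byte digests of each part, not their hex strings.
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


body = b"a" * 10
single = s3_etag(body)                    # plain MD5 of the body
three_parts = s3_etag(body, part_size=4)  # parts of 4, 4, and 2 bytes
two_parts = s3_etag(body, part_size=5)    # parts of 5 and 5 bytes
```

The same ten bytes yield three different ETags depending only on how the upload was chunked, which is the reason a separate content hash is needed for identity.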

## Operations and characteristics

### Get

A `roar get` invocation, or the input side of any job: an operation where a job reads an artifact. Get as a CLI command (`roar get`) creates a Get-type job recording data entering the workspace from outside; Get as an edge in the DAG just means "this job read this artifact."

### Put

A `roar put` invocation, or the output side of any job: an operation where a job writes an artifact. Put as a CLI command (`roar put`) creates a Put-type job recording data leaving the workspace; Put as an edge in the DAG just means "this job wrote this artifact."

## Tracers, proxy, and integrations

### Tracer

The component that observes file I/O during `roar run`. `roar` ships three backends — eBPF, preload, ptrace — and picks one via auto-fallback. See [Tracers](/docs/tracers).

### Proxy

`roar`'s S3 reverse proxy. Captures `GetObject` / `PutObject` traffic so cloud-storage I/O ends up in the lineage. Today: AWS only; GCP on the roadmap. See [Proxy](/docs/proxy).

### Fragment

A streamed lineage event from a Ray worker (or any distributed runner) to the driver. Encapsulates one task's reads and writes; the driver-side reconstituter merges fragments into the local DB after the job exits. See [Ray](/docs/ray).

## Annotations and visibility

### Label

A user-attached key/value annotation on a session, job, or artifact. The way to record metadata that isn't observable from the run alone — metrics, comments, tags, experiment IDs. Searchable on glaas.ai. See [Labels](/docs/labels).

### Scope

The persistent visibility setting on a `roar` workspace — `anonymous`, `private`, `public`, or `<owner>/<project>`. Governs who can read the records registered from that workspace. See [Scopes](/docs/scopes).

### Dataset fingerprint

A heuristic identifier `roar` attaches to composite artifacts it recognizes as datasets — based on the directory shape, manifest files, and file types present. Surfaces as `dataset.id`, `dataset.modality`, `dataset.type` labels.

## Relationship overview

The following diagram shows how these entities connect to form the lineage of your work.

![Relationship Overview](/images/docs/roar-glaas-diagram.png)

## Inspecting the DAG

From the CLI:

- **View the current DAG:** `roar dag`
- **Inspect a specific job:** `roar show <job-id>` (or `roar show @<step>`)
- **Inspect an artifact or path:** `roar show <hash-or-path>`

On glaas.ai, click any job or artifact to navigate its upstream and downstream connections.
