Guide to Using roar from Start to Finish

Installing roar

See Installation for the platform support matrix, prerequisites, tracer-backend requirements, macOS SIP notes, and build-from-source instructions.

Configuring roar (`roar init`)

In your project repo:

cd my-ml-project
roar init

This:

creates .roar/ (local DB + cache)
offers to add .roar/ to .gitignore
configures defaults (including the default GLaaS server, if set)

You can inspect and modify configuration via:

roar config list
roar config get <key>
roar config set <key> <value>

Building (`roar build`)

Use roar build for setup steps you want tracked separately from your "main" DAG steps—e.g., compiling extensions or installing local packages.

roar build pip install -e .
roar build make -j4

Build steps can be replayed as part of reproduction when needed.

Running (`roar run`)

roar run python train.py --epochs 10 --lr 1e-3
roar run ./scripts/preprocess.sh
roar run torchrun --nproc_per_node=4 train.py

roar observes file I/O across the full process tree, then updates jobs, artifacts, and the inferred DAG.
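
To make the idea of an inferred DAG concrete, here is a minimal Python sketch of how edges could be derived from observed reads and writes. It is an illustration only, not roar's implementation: the Job record, the infer_edges helper, and the file names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical record of one traced command: the files it read and wrote.
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def infer_edges(jobs: list[Job]) -> list[tuple[str, str, str]]:
    """Return (producer, artifact, consumer) edges for jobs given in execution order."""
    last_writer: dict[str, str] = {}
    edges: list[tuple[str, str, str]] = []
    for job in jobs:
        # A read of a file that an earlier job wrote links the two jobs.
        for path in sorted(job.reads):
            if path in last_writer:
                edges.append((last_writer[path], path, job.name))
        # Record this job as the most recent writer of each of its outputs.
        for path in job.writes:
            last_writer[path] = job.name
    return edges


jobs = [
    Job("preprocess", reads={"raw.csv"}, writes={"clean.csv"}),
    Job("train", reads={"clean.csv"}, writes={"model.pt"}),
]
edges = infer_edges(jobs)
print(edges)
```

In this sketch, raw.csv has no producer, so it shows up only as an external input; the single edge connects preprocess to train through clean.csv.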

Getting and Putting Data (`roar get` and `roar put`)

While roar run is used for standard processing tasks, you can explicitly track data movement into and out of your workspace:

roar get s3://my-bucket/dataset ./data/
roar put ./model/ s3://my-bucket/models/

roar get: Records a Get job. This is a retrieval operation that explicitly tracks data being pulled into your workspace as an input artifact.
roar put: Records a Put job. This is a storage operation that explicitly tracks data being pushed out as an output artifact.

These commands are especially useful when working with Composite Artifacts (Datasets)—directories or collections of files treated as a single tracked unit. By using roar get or roar put, an entire dataset directory maintains its lineage and content hash identity as it moves into or out of your DAG.
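
One way to picture a content hash identity for a whole directory is the sketch below. It assumes SHA-256 and a simple "relative path plus file digest" scheme; roar's actual hashing scheme is not described here, so treat dataset_digest and file_digest as hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of one file, read in chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def dataset_digest(root: Path) -> str:
    """Combine every file's relative path and digest into one directory-level hash."""
    rel_paths = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    h = hashlib.sha256()
    for rel in rel_paths:
        entry = rel.encode("utf-8") + b"\0" + file_digest(root / rel).encode("ascii")
        h.update(entry + b"\n")
    return h.hexdigest()


# Two directories with identical contents get the same digest; editing a file changes it.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for d in (a, b):
        (Path(d) / "train").mkdir()
        (Path(d) / "train" / "part-0.csv").write_text("x,y\n1,2\n")
    same = dataset_digest(Path(a)) == dataset_digest(Path(b))
    (Path(b) / "train" / "part-0.csv").write_text("x,y\n1,3\n")
    changed = dataset_digest(Path(a)) != dataset_digest(Path(b))

print(same, changed)
```

Because the digest depends only on relative paths and file contents, the same dataset keeps the same identity wherever it is copied, which is the property the paragraph above relies on.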

The current DAG (`roar dag`)

As you iterate, you may re-run commands, overwrite outputs, or change downstream results. roar retains history, but roar dag shows what is true now.

roar dag

The current DAG:

collapses re-runs
keeps only the most recent equivalent job
hides downstream work that depends on overwritten inputs

This is a projection of history, not a deletion of it.
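
The three rules above can be sketched as a small projection over a job history. This is a hedged illustration, not roar's algorithm: the JobRun record is hypothetical, and "equivalent" is assumed to mean "same command string".

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    # Hypothetical history record: paths map to the content hash read or written.
    seq: int
    command: str
    inputs: dict
    outputs: dict


def current_dag(history):
    """Project the full history down to the jobs that are still valid now."""
    runs = sorted(history, key=lambda r: r.seq)
    current = {}  # path -> hash of the most recent write
    latest = {}   # command -> most recent run (collapses re-runs)
    for run in runs:
        current.update(run.outputs)
        latest[run.command] = run

    stale_paths = set()
    visible = []
    for run in sorted(latest.values(), key=lambda r: r.seq):
        # An input was overwritten if its current hash differs from the one consumed.
        overwritten = any(current.get(p, h) != h for p, h in run.inputs.items())
        # Staleness propagates through outputs of jobs that are already hidden.
        tainted = any(p in stale_paths for p in run.inputs)
        if overwritten or tainted:
            stale_paths.update(
                p for p, h in run.outputs.items() if current.get(p) == h
            )
        else:
            visible.append(run)
    return visible


history = [
    JobRun(1, "python preprocess.py", {"raw.csv": "h0"}, {"clean.csv": "c1"}),
    JobRun(2, "python train.py", {"clean.csv": "c1"}, {"model.pt": "m1"}),
    JobRun(3, "python eval.py", {"model.pt": "m1"}, {"metrics.json": "e1"}),
    JobRun(4, "python preprocess.py", {"raw.csv": "h0"}, {"clean.csv": "c2"}),
]
visible = current_dag(history)
print([run.seq for run in visible])
```

Here the re-run of preprocessing (run 4) replaces run 1 and overwrites clean.csv, so training and evaluation drop out of the projection until they are re-run. The history list itself is never modified.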

Setting up authentication with glaas.ai (`roar login`)

To register artifacts under your GLaaS identity (rather than anonymously), log in once:

roar login

This opens a device-code flow in your browser: sign in with GitHub, approve the device, done. The auth state is stored under ~/.config/roar/ so all your roar workspaces share it.

After roar login, the workspace's scope automatically flips from anonymous to private if it was still at the init default, so the next roar register is access-controlled rather than public by default.

To check what's stored:

roar whoami

To clear the auth state:

roar logout

Legacy: SSH-key auth (roar auth). An older auth path lets you pair roar with a GitHub SSH key registered on glaas.ai (roar auth key to print the public key, roar auth test to verify). It's still functional and useful for non-interactive environments where the device-code flow is awkward. For interactive setups, roar login is the recommended path.

Registering DAGs with GLaaS (`roar register`)

When you run:

roar register <path to artifacts>

roar registers a Registered DAG:

includes the selected artifact(s), which can be single files or Composite Artifacts (Datasets)
includes upstream jobs and artifacts required to explain them
forms a self-contained recipe for reproduction

This is what GLaaS stores and makes searchable.
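
The "upstream jobs and artifacts required to explain them" can be thought of as a graph closure. The sketch below walks backwards from the selected artifacts; the produced_by and job_inputs mappings and the file names are hypothetical, and this is not a description of how roar register is implemented.

```python
def upstream_closure(targets, produced_by, job_inputs):
    """Collect every artifact and job needed to explain the target artifacts."""
    artifacts, jobs = set(), set()
    stack = list(targets)
    while stack:
        artifact = stack.pop()
        if artifact in artifacts:
            continue
        artifacts.add(artifact)
        job = produced_by.get(artifact)
        if job is None:
            continue  # external input: nothing upstream to follow
        if job not in jobs:
            jobs.add(job)
            stack.extend(job_inputs[job])
    return artifacts, jobs


# Hypothetical workspace: report.html is downstream of model.pt, so it is not included.
produced_by = {
    "clean.csv": "preprocess",
    "model.pt": "train",
    "report.html": "report",
}
job_inputs = {
    "preprocess": ["raw.csv"],
    "train": ["clean.csv"],
    "report": ["model.pt"],
}
artifacts, jobs = upstream_closure(["model.pt"], produced_by, job_inputs)
print(sorted(artifacts), sorted(jobs))
```

Only what lies upstream of the selected artifact is collected, which is why the result can serve as a self-contained recipe.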

Note: roar register ... can be performed without setting up an account. New workspaces default to the anonymous scope, which publishes publicly, without attribution and without guaranteed persistence. For attributed registration and a real personal namespace, run roar login once; this is the recommended workflow. See Scopes.

After registration, use glaas.ai to visualize and navigate the Registered DAG by clicking between artifacts, jobs, and sessions.

Reproduction (`roar reproduce`)

When you run:

roar reproduce <artifact-hash>

roar reconstructs a recipe DAG:

describes steps required to recreate an artifact
contains planned steps, not completed jobs
may reference artifacts that do not yet exist

As you execute steps:

planned steps become real jobs
artifacts are created
the recipe merges into your active session
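
As a rough mental model, a recipe can be seen as a set of planned steps that are executed in dependency order and recorded as jobs as they complete. The sketch below uses Python's standard graphlib module; the step names, the status values, and the session_jobs list are hypothetical and do not reflect roar's internal data model.

```python
from graphlib import TopologicalSorter

# Hypothetical recipe: each planned step maps to the steps it depends on.
recipe = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
}

status = {step: "planned" for step in recipe}
session_jobs = []  # stands in for the active session's completed jobs

# static_order() yields each step only after all of its dependencies.
order = list(TopologicalSorter(recipe).static_order())
for step in order:
    # A real run would execute the step's command here and create its artifacts.
    status[step] = "done"
    session_jobs.append(step)

print(order)
print(status)
```

Before the loop runs, every step is only planned and its artifacts may not exist; afterwards each step appears in the session as a completed job.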

On glaas.ai, artifact pages provide a "Reproduce with roar" action to generate the reproduce command.
