Frequently Asked Questions
How Does roar Work?
roar records which files were read and written using a Rust-based tracer.
At a high level:
roar runs your command under ptrace:
- it watches syscalls for file opens/reads/writes across the whole process tree (including fork/exec)
- it resolves file paths and records events (processes, timing, env, etc.)
- it writes a report to disk
roar consumes that report to:
- identify artifacts
- create job records
- and build/update the DAG
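The last step, turning job records into a DAG, can be sketched roughly as follows. This is a minimal illustration, not roar's implementation: the job-record shape (`name`, `inputs`, `outputs`) and the placeholder hashes are assumptions made for the example, since the report format is not described here.

```python
from collections import defaultdict


def build_dag(jobs):
    """Link jobs: an edge A -> B exists when B reads an artifact that A wrote.

    `jobs` is a hypothetical list of dicts with "name", "inputs" and
    "outputs" keys, where inputs/outputs are artifact content hashes.
    """
    producer = {}  # artifact hash -> name of the job that wrote it
    for job in jobs:
        for digest in job["outputs"]:
            producer[digest] = job["name"]

    edges = defaultdict(set)
    for job in jobs:
        for digest in job["inputs"]:
            upstream = producer.get(digest)
            if upstream is not None and upstream != job["name"]:
                edges[upstream].add(job["name"])
    return {name: sorted(targets) for name, targets in edges.items()}


# Placeholder hashes stand in for real content hashes.
jobs = [
    {"name": "preprocess", "inputs": ["h_raw"], "outputs": ["h_clean"]},
    {"name": "train", "inputs": ["h_clean"], "outputs": ["h_model"]},
    {"name": "evaluate", "inputs": ["h_clean", "h_model"], "outputs": ["h_metrics"]},
]
dag = build_dag(jobs)
print(dag)
```

Because edges are derived from which artifacts each job read and wrote, no job has to declare its dependencies explicitly.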
What Does roar Run On?
Today, roar is Linux-only due to its use of user-space tracing (ptrace). (Support for other OSes may come later via different tracing strategies.)
How Are Artifacts Recognized?
As roar detects reads and writes, it categorizes files:
- System files: OS paths like /usr, /etc, /proc, /dev, /lib, etc. These are ignored when filtering read inputs.
- Package files: Python site-packages, virtualenv directories, standard library. These are not treated as data inputs; instead, roar records imported packages + versions.
- Artifacts: everything else, i.e. files your code actually reads/writes that aren't system/package noise. These become tracked inputs/outputs in the DAG.
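The three categories can be approximated with a simple path classifier. The sketch below is an assumption-laden illustration rather than roar's actual rules: the prefix list mirrors the examples above, while the virtualenv directory names and the `pythonX.Y` standard-library heuristic are hypothetical choices made for the example.

```python
import re
from pathlib import PurePosixPath

# Prefixes taken from the examples in the text; the real list is likely longer.
SYSTEM_PREFIXES = ("/usr", "/etc", "/proc", "/dev", "/lib")
# Hypothetical markers for package directories.
PACKAGE_DIRS = {"site-packages", "venv", ".venv"}
# Crude heuristic: a directory named like "python3.11" holds stdlib or packages.
STDLIB_DIR = re.compile(r"python\d+\.\d+")


def classify(path):
    """Return "package", "system" or "artifact" for an absolute POSIX path."""
    dirs = PurePosixPath(path).parts[:-1]  # directory components only
    # Check packages first: site-packages often lives under /usr.
    if any(d in PACKAGE_DIRS or STDLIB_DIR.fullmatch(d) for d in dirs):
        return "package"
    if any(path == p or path.startswith(p + "/") for p in SYSTEM_PREFIXES):
        return "system"
    return "artifact"


for example in (
    "/etc/hosts",
    "/usr/lib/python3.11/json/decoder.py",
    "/home/me/proj/.venv/lib/python3.11/site-packages/numpy/__init__.py",
    "/home/me/proj/data/train.csv",
):
    print(classify(example), example)
```

Checking package markers before system prefixes matters, because many package directories sit underneath /usr.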
Are Artifact Hash Collisions Possible?
Assuming full-length BLAKE3 (256-bit) hashes behave like a uniform random hash:
- With 1 trillion artifacts (n = 10¹²), the birthday bound puts the chance of any collision at roughly
p ≈ n² / (2·2²⁵⁶) ≈ 4×10⁻⁵⁴ (about 1 in 2×10⁵³)
In other words: astronomically unlikely.
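The estimate can be checked with a few lines of Python. This sketch simply evaluates the birthday-bound approximation p ≈ n² / (2·2ᵇ) with exact rational arithmetic; the function name `collision_bound` is made up for the example.

```python
from fractions import Fraction


def collision_bound(n, bits=256):
    """Birthday-bound approximation p ~ n^2 / (2 * 2^bits), as an exact fraction."""
    return Fraction(n * n, 2 * 2**bits)


p = collision_bound(10**12)
print(f"p is about {float(p):.1e}")
print(f"roughly 1 in {float(1 / p):.1e}")
```

The approximation is only valid while p is small, which it is by a very wide margin here.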
Why is roar Hash Agnostic?
Different ecosystems standardize on different hashes (and future standards will evolve). Hash agnosticism keeps GLaaS durable and interoperable.
Why content hashes?
Content hashes provide two essential properties:
- Identity — the same content always refers to the same artifact
- Dereferenceability — you can work backward from an artifact to its provenance
This enables:
- reliable lineage
- deduplication
- reproduction without filenames or conventions
Hash agnosticism and defaults
roar treats hashes as attributes of artifacts, not as their identity. Multiple hashes may point to the same artifact.
Current defaults:
- roar get (downloads): compute SHA‑256 and BLAKE3
- intermediate artifacts: compute BLAKE3
- roar put (registration/uploads): register BLAKE3
This allows roar to:
- interoperate with ecosystems that expect SHA‑256 (e.g., Hugging Face)
- remain fast during iteration (BLAKE3)
- evolve hashing choices without breaking lineage
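The idea that hashes are attributes pointing at an artifact, rather than the artifact's identity, can be sketched as a small index. This is an illustration under stated assumptions, not roar's data model: `ArtifactIndex` and its methods are hypothetical, and because BLAKE3 is not in the Python standard library, `hashlib.blake2b` is used here purely as a stand-in for the second algorithm.

```python
import hashlib


def compute_hashes(data):
    """Compute several digests for the same bytes."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        # Stand-in only: BLAKE3 needs a third-party package, BLAKE2b does not.
        "blake2b": hashlib.blake2b(data).hexdigest(),
    }


class ArtifactIndex:
    """Hypothetical index where many (algorithm, digest) pairs map to one artifact."""

    def __init__(self):
        self._by_hash = {}  # (algorithm, digest) -> artifact id
        self._hashes = {}   # artifact id -> {algorithm: digest}

    def register(self, data):
        hashes = compute_hashes(data)
        # Reuse the artifact if any of its hashes is already known.
        for key in hashes.items():
            if key in self._by_hash:
                artifact_id = self._by_hash[key]
                break
        else:
            artifact_id = len(self._hashes)
            self._hashes[artifact_id] = {}
        self._hashes[artifact_id].update(hashes)
        for key in hashes.items():
            self._by_hash[key] = artifact_id
        return artifact_id

    def lookup(self, algorithm, digest):
        return self._by_hash.get((algorithm, digest))


index = ArtifactIndex()
first = index.register(b"model weights")
again = index.register(b"model weights")  # same content, same artifact
other = index.register(b"different bytes")
print(first, again, other)
```

Adding a new hash algorithm later only adds another key to each artifact's attribute set; existing lookups keep working, which is how lineage survives a change of default.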
Why won’t roar run on a dirty repo?
If your working tree has uncommitted changes, it can be difficult (or impossible) for someone else to reproduce what you did later, because the exact code state isn't anchored to any commit.
roar records dirty/clean status for transparency, and organizations may choose to enforce clean repos for certain workflows.
Why does roar tag git commits?
Every job records git context:
- repository
- branch
- commit
- dirty/clean state
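These four fields can all be read with standard git commands. The sketch below shows one plausible way to collect them; it is not roar's code. `GitContext` and `read_git_context` are hypothetical names, and the sketch assumes the remote is called `origin` (the `git config` call fails otherwise).

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class GitContext:
    repository: str
    branch: str
    commit: str
    dirty: bool


def _git(args, cwd="."):
    """Run a git subcommand and return its trimmed stdout; raises on failure."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def read_git_context(cwd=".", run=_git):
    """Collect repository, branch, commit and dirty state for a working tree.

    `run` is injectable so the function can be exercised without a real repo.
    """
    return GitContext(
        repository=run(["config", "--get", "remote.origin.url"], cwd),
        branch=run(["rev-parse", "--abbrev-ref", "HEAD"], cwd),
        commit=run(["rev-parse", "HEAD"], cwd),
        # `git status --porcelain` prints nothing when the tree is clean.
        dirty=bool(run(["status", "--porcelain"], cwd)),
    )
```

Calling `read_git_context()` inside a clone returns a record such as `GitContext(repository=..., branch=..., commit=..., dirty=False)`. Note that untracked files also appear in `git status --porcelain`, so this sketch counts them as dirty.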
When DAGs are registered, commits may be persistently tagged so that:
- provenance remains stable even as branches move
- reproduction does not depend on local history
- lineage can outlive a single clone or machine