Frequently Asked Questions

How Does roar Work?

roar uses a Rust-based tracer to record which files your command reads and writes.

At a high level:

  • roar runs your command under ptrace
  • it watches syscalls for file opens/reads/writes across the whole process tree (including fork/exec)
  • it resolves file paths and records events (processes, timing, env, etc.)
  • it writes a report to disk
  • roar consumes that report to:
    • identify artifacts
    • create job records
    • and build/update the DAG
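The last step can be made concrete with a minimal Python sketch. It assumes each traced job has already been reduced to the set of artifact paths it read and the set it wrote. The `Job` record and `build_dag` function are hypothetical illustrations, not roar's report format or API. An edge runs from the job that produced an artifact to every job that later reads it:

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical job record; roar's real report format is not shown here.
    job_id: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def build_dag(jobs: list[Job]) -> dict[str, set[str]]:
    """Map each job id to the ids of jobs that consume one of its outputs."""
    producer: dict[str, str] = {}
    for job in jobs:
        for path in job.writes:
            # If several jobs write the same path, the last writer wins.
            producer[path] = job.job_id

    edges: dict[str, set[str]] = {job.job_id: set() for job in jobs}
    for job in jobs:
        for path in job.reads:
            upstream = producer.get(path)
            # Skip inputs nobody produced (e.g. raw data) and self-reads.
            if upstream is not None and upstream != job.job_id:
                edges[upstream].add(job.job_id)
    return edges


jobs = [
    Job("prep", reads={"raw.csv"}, writes={"clean.parquet"}),
    Job("train", reads={"clean.parquet"}, writes={"model.pt"}),
    Job("eval", reads={"model.pt", "clean.parquet"}, writes={"metrics.json"}),
]
dag = build_dag(jobs)
print(dag)
```

Keying on paths keeps the sketch short; keying on content hashes instead avoids ambiguity when the same path is overwritten between jobs.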

What Does roar Run On?

Today, roar is Linux-only due to its use of user-space tracing (ptrace). (Support for other OSes may come later via different tracing strategies.)

How Are Artifacts Recognized?

As roar detects reads and writes, it categorizes files:

  • System files: OS paths like /usr, /etc, /proc, /dev, /lib, etc.
    These are ignored when filtering read inputs.
  • Package files: Python site-packages, virtualenv directories, standard library.
    These are not treated as data inputs; instead, roar records imported packages + versions.
  • Artifacts: what’s left—files your code actually reads/writes that aren’t system/package noise.
    These become tracked inputs/outputs in the DAG.
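The categorization can be sketched as a path classifier. The prefix lists and the `classify` function below are illustrative assumptions, not roar's actual rules; in practice the package roots would come from the active interpreter (for example, `sys.prefix` or `site.getsitepackages()`).

```python
from enum import Enum
from pathlib import PurePosixPath

# Illustrative values only; not roar's actual filter lists.
SYSTEM_PREFIXES = ("/usr", "/etc", "/proc", "/dev", "/lib", "/sys")
PACKAGE_ROOTS = ("/home/me/proj/.venv",)  # hypothetical virtualenv location


class Kind(Enum):
    SYSTEM = "system"
    PACKAGE = "package"
    ARTIFACT = "artifact"


def _is_under(path: PurePosixPath, root: str) -> bool:
    """True if `path` equals `root` or lies inside it, compared by path components."""
    r = PurePosixPath(root)
    return path == r or r in path.parents


def classify(path: str, package_roots: tuple[str, ...] = PACKAGE_ROOTS) -> Kind:
    p = PurePosixPath(path)
    # Package roots are checked first: a system interpreter's packages can live
    # under /usr/lib, and those should count as packages, not system files.
    if any(_is_under(p, root) for root in package_roots):
        return Kind.PACKAGE
    if any(_is_under(p, root) for root in SYSTEM_PREFIXES):
        return Kind.SYSTEM
    return Kind.ARTIFACT


for f in (
    "/usr/lib/x86_64-linux-gnu/libc.so.6",
    "/home/me/proj/.venv/lib/python3.12/site-packages/numpy/__init__.py",
    "/home/me/proj/data/train.csv",
):
    print(classify(f).value, f)
```

Comparing whole path components (via `PurePosixPath.parents`) rather than raw string prefixes keeps a path like `/usrdata/x.csv` from being mistaken for a system file.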

Are Artifact Hash Collisions Possible?

Assuming full-length BLAKE3 (256-bit) hashes behave like a uniform random hash:

  • With 1 trillion artifacts (n = 10¹²), chance of any collision is roughly
    p ≈ n² / (2·2²⁵⁶) ≈ 4×10⁻⁵⁴ (about 1 in 2×10⁵³)

In other words: astronomically unlikely.
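The estimate is the standard birthday bound: with N = 2²⁵⁶ possible digests and n(n−1)/2 ≈ n²/2 pairs of artifacts, a union bound over all pairs gives p ≤ n²/(2N). A few lines of Python reproduce the arithmetic (`collision_probability` is just an illustrative helper):

```python
def collision_probability(n: float, bits: int) -> float:
    """Birthday bound: P(any collision among n uniform `bits`-bit hashes) <= n^2 / (2 * 2^bits)."""
    return n * n / (2.0 * 2.0**bits)


p = collision_probability(1e12, 256)
print(f"p ≈ {p:.1e}")        # p ≈ 4.3e-54
print(f"1 in {1 / p:.1e}")   # 1 in 2.3e+53
```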

Why Is roar Hash Agnostic?

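Different ecosystems standardize on different hashes, and standards will keep evolving. Not tying artifacts to any single algorithm keeps GLaaS durable (new algorithms can be adopted without re-identifying existing artifacts) and interoperable (it can match the hashes other ecosystems publish).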

Why content hashes?

Content hashes provide two essential properties:

  • Identity — the same content always refers to the same artifact
  • Dereferenceability — you can work backward from an artifact to its provenance

This enables:

  • reliable lineage
  • deduplication
  • reproduction without filenames or conventions
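Identity and deduplication both follow from hashing file bytes rather than file names. The sketch below streams files through Python's `hashlib` and groups identical content under one key. It uses SHA-256 because BLAKE3 is not in the Python standard library (the third-party `blake3` package would be needed for that). The `content_hash` and `dedupe` helpers are hypothetical names for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Stream a file through a hashlib algorithm in fixed-size chunks; return the hex digest."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def dedupe(paths: list[Path]) -> dict[str, list[Path]]:
    """Group paths by content hash: identical bytes share a key whatever the filename."""
    groups: dict[str, list[Path]] = {}
    for p in paths:
        groups.setdefault(content_hash(p), []).append(p)
    return groups


with tempfile.TemporaryDirectory() as d:
    files = {
        "train.csv": b"x,y\n1,2\n",
        "train_copy.csv": b"x,y\n1,2\n",  # same bytes, different name
        "test.csv": b"x,y\n3,4\n",
    }
    paths = []
    for name, data in files.items():
        p = Path(d) / name
        p.write_bytes(data)
        paths.append(p)
    groups = dedupe(paths)

print(len(groups))  # 2
```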

Hash agnosticism and defaults

roar treats hashes as attributes of artifacts, not as their identity. Multiple hashes may point to the same artifact.

Current defaults:

  • roar get (downloads): compute SHA‑256 and BLAKE3
  • intermediate artifacts: compute BLAKE3
  • roar put (registration/uploads): register BLAKE3

This allows roar to:

  • interoperate with ecosystems that expect SHA‑256 (e.g., Hugging Face)
  • remain fast during iteration (BLAKE3)
  • evolve hashing choices without breaking lineage
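Treating hashes as attributes rather than identity can be sketched as a registry that indexes one artifact under several `(algorithm, digest)` keys. This is a hypothetical illustration, not roar's data model: `Artifact`, `Registry`, and the artifact id are invented names, and `blake2b` stands in for BLAKE3 only because it ships with `hashlib`.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Artifact:
    # Hypothetical record: identity is an opaque id; hashes are attributes.
    artifact_id: str
    hashes: dict[str, str] = field(default_factory=dict)  # algorithm -> hex digest


class Registry:
    """Index artifacts by (algorithm, digest) so any known hash resolves to the same record."""

    def __init__(self) -> None:
        self._by_hash: dict[tuple[str, str], Artifact] = {}

    def add_hash(self, artifact: Artifact, algorithm: str, digest: str) -> None:
        key = (algorithm, digest)
        existing = self._by_hash.get(key)
        if existing is not None and existing is not artifact:
            raise ValueError(f"{algorithm}:{digest} already belongs to {existing.artifact_id}")
        artifact.hashes[algorithm] = digest
        self._by_hash[key] = artifact

    def lookup(self, algorithm: str, digest: str) -> Artifact | None:
        return self._by_hash.get((algorithm, digest))


data = b"example model weights"
registry = Registry()
weights = Artifact("artifact-0001")  # hypothetical id
registry.add_hash(weights, "sha256", hashlib.sha256(data).hexdigest())
# blake2b stands in for BLAKE3, which hashlib does not provide.
registry.add_hash(weights, "blake2b", hashlib.blake2b(data).hexdigest())
```

Adding a new algorithm later is one more `add_hash` call on the existing artifact, so lookups by older digests keep resolving to the same record. Including the algorithm name in the key also keeps digests from different algorithms from being confused with one another.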

Why won’t roar run on a dirty repo?

If your working tree has uncommitted changes, it can be difficult (or impossible) for someone else to reproduce what you did later—because the exact code state isn’t anchored anywhere.

roar records dirty/clean status for transparency, and organizations may choose to enforce clean repos for certain workflows.
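A dirty-tree check can be built on `git status --porcelain`, which prints one line per modified, staged, or untracked path and prints nothing when the tree is clean. This is a sketch, not roar's implementation: whether untracked files (`??` lines) should count as dirty is a policy choice, exposed here as a hypothetical `include_untracked` flag.

```python
import subprocess


def is_dirty(porcelain: str, include_untracked: bool = True) -> bool:
    """Interpret `git status --porcelain` output: any remaining entry means uncommitted changes."""
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        if line.startswith("??") and not include_untracked:
            continue  # untracked file, ignored under this policy
        return True
    return False


def repo_is_dirty(repo: str = ".", include_untracked: bool = True) -> bool:
    """Run git in `repo` and apply the check above; raises CalledProcessError outside a repo."""
    out = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return is_dirty(out, include_untracked)
```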

Why does roar tag git commits?

Every job records git context:

  • repository
  • branch
  • commit
  • dirty/clean state

When DAGs are registered, commits may be persistently tagged so that:

  • provenance remains stable even as branches move
  • reproduction does not depend on local history
  • lineage can outlive a single clone or machine
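Capturing git context and tagging a commit can be sketched with plain git commands run through `subprocess`. Tagging helps because a tag keeps a commit reachable even after the branch that contained it is rebased or deleted, so git will not garbage-collect it; pushing the tag copies that reference to the remote. The `roar/<dag-id>` naming scheme, the `origin` remote name, and all function names below are assumptions for illustration; roar's actual tag format is not described here.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class GitContext:
    repository: str
    branch: str
    commit: str
    dirty: bool


def _git(repo: str, *args: str) -> str:
    """Run a git subcommand in `repo` and return its trimmed stdout."""
    return subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def capture_context(repo: str = ".") -> GitContext:
    # Assumes a remote named "origin" (git errors out otherwise).
    # In a detached HEAD, `rev-parse --abbrev-ref HEAD` prints "HEAD".
    return GitContext(
        repository=_git(repo, "remote", "get-url", "origin"),
        branch=_git(repo, "rev-parse", "--abbrev-ref", "HEAD"),
        commit=_git(repo, "rev-parse", "HEAD"),
        dirty=bool(_git(repo, "status", "--porcelain")),
    )


def tag_name(dag_id: str) -> str:
    # Hypothetical naming scheme, not roar's documented format.
    return f"roar/{dag_id}"


def tag_commit(repo: str, commit: str, dag_id: str, remote: str = "origin") -> str:
    """Create a lightweight tag on `commit` and push it (requires write access to `remote`)."""
    name = tag_name(dag_id)
    _git(repo, "tag", name, commit)
    _git(repo, "push", remote, f"refs/tags/{name}")
    return name
```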