
Troubleshooting


This page is symptom-indexed: scan for what you're seeing in the terminal, then jump.

For deeper context on individual subsystems, follow the cross-references to Tracers, Scopes, Ray, or the roar Guide.

First-time setup

Error: roar is not initialized in this directory.

What it means: you're running roar (or roar run, roar dag, etc.) in a directory that doesn't have a .roar/ subdirectory.

roar init

Run once per project. .roar/ lives next to your .git/. The init step is idempotent — re-running on an already-initialized workspace just reports it's done.

Error: Not in a git repository.

roar requires the working tree to be inside a git repo so it can tag every recorded job with the exact commit it ran from. Initialize git first, then roar:

git init && git add -A && git commit -m "initial"
roar init

roar: command not found after pip install roar-cli

Your shell can't see the roar binary on PATH. Three cases:

  • Installed via uv tool install roar-cli (the recommended path) — uv puts shims under ~/.local/bin by default. Add that to your PATH (a snippet follows this list), or run uv tool update-shell once to wire it up.
  • Installed via pipx install roar-cli — pipx similarly uses ~/.local/bin. Run pipx ensurepath once.
  • Installed via pip install roar-cli into a project venv: roar only exists when that venv is activated. The other two are how you keep roar always-available.
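If ~/.local/bin isn't already on your PATH, a minimal fix for a bash-style shell (adjust the rc file for zsh and friends) is:

export PATH="$HOME/.local/bin:$PATH"                         # current shell
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc     # future shells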

See Installation for the recommended setup.

Running commands

Git repository has uncommitted changes

By design. roar run refuses to start when the working tree is dirty so every recorded job carries an honest commit SHA — otherwise re-running later wouldn't actually re-run the same code.

git add -A && git commit -m "wip"
roar run python train.py

If you need to iterate fast without committing each loop, see the --allow-dirty discussion in the roar Guide.
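As a sketch (the flag's exact behavior and placement are covered in the Guide):

roar run --allow-dirty python train.py    # records the job against a dirty tree; the commit SHA no longer pins the exact code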

Tracer preflight failed for '<backend>': …

The selected tracer can't initialize. Start with the deep preflight to see exactly why:

roar tracer check <backend>    # ebpf | preload | ptrace

Common causes by backend:

  • eBPF — kernel < 5.8, missing BTF, or CAP_BPF not set. Run roar tracer enable ebpf once to apply the capability, or pick a different backend with roar tracer use preload.
  • preload — the shared library isn't built. cd rust && cargo build --release in the roar repo.
  • ptrace: kernel.yama.ptrace_scope is blocking us. Check sysctl kernel.yama.ptrace_scope; values 0 or 1 typically work, 2 requires CAP_SYS_PTRACE.

roar tracer (no subcommand) shows the readiness table for all three backends — useful when you don't yet know which one is failing.
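If you'd rather inspect the host directly, a few standard commands cover the same ground (thresholds as described above):

uname -r                           # eBPF needs a 5.8+ kernel
ls /sys/kernel/btf/vmlinux         # present when the kernel ships BTF
sysctl kernel.yama.ptrace_scope    # 0 or 1 is fine for ptrace; 2 requires CAP_SYS_PTRACE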

Operation not permitted creating BPF map

You're seeing the raw kernel error from the eBPF backend trying to allocate a BPF map without sufficient privilege. Two paths:

roar tracer check ebpf      # confirms the specific cause (CAP_BPF / kernel / BTF)
roar tracer enable ebpf     # one-shot: applies CAP_BPF to the eBPF binary

If the host doesn't allow CAP_BPF (managed container runtimes, locked-down CI), eBPF won't work here regardless. Fall back to preload:

roar tracer use preload

See Tracers → Cloud and managed GPU platforms for which environments support eBPF.

roar run completes but the DAG is empty (no outputs reported)

The first diagnostic is which backend actually ran, and whether it could see your workload at all:

roar tracer                  # which backend is active + the readiness table
roar tracer check <backend>  # deep preflight if the readiness table is unclear

The usual causes once you know the backend:

  • Static binaries via preload — preload only sees dynamically-linked libc calls. Switch to eBPF or ptrace for static binaries (a quick check is sketched after this list).
  • Setuid binaries — the dynamic linker scrubs LD_PRELOAD. Same fix.
  • Container without privileges — eBPF requires kernel access; preload usually works. Most CI runners need preload.
  • Auto-fallback chose a less-capable backend silently: roar tracer will tell you which backend was active for the run; if it wasn't the one you expected, run preflight on the one you want and fix the underlying issue.
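To confirm the static-binary case, standard tools work (the binary name here is a stand-in for your own workload):

file "$(command -v my_trainer)"    # "statically linked" means preload can't see it
ldd "$(command -v my_trainer)"     # "not a dynamic executable" says the same thing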

The Tracers cloud-platform table lists what's available where.

The wrapped command's exit code seems wrong

roar run forwards the wrapped command's exit code verbatim. If you're seeing the wrong one, check whether you have a shell pipeline rewriting it — roar run cmd | tee file will exit with tee's status, not cmd's. Either use set -o pipefail, or run roar run without the pipeline and have it write the output file directly.
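For the pipeline case, a bash/zsh sketch:

set -o pipefail                            # pipeline status now reflects the first failure
roar run python train.py | tee train.log
echo $?                                    # the wrapped command's status, not tee's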

Inspect and show

No artifact found for path: <path>

roar show <path> couldn't find a tracked artifact at that path. Three common reasons:

  • The file was never recorded — only files the tracer saw being read or written get tracked. A file you copied into the directory by hand isn't an artifact until a tracked command touches it.
  • Filtered as noise — temp files in /tmp/, package files in site-packages/, and .roar/ and .git/ internals are filtered out by default. Set filters.ignore_tmp_files = false in .roar/config.toml to recover temp-file tracking.
  • Path mismatch: roar show resolves relative paths against your current cwd. roar show ./model.pkl and roar show $(pwd)/model.pkl should both work; roar show model.pkl when you're in a different directory won't.

roar dag --show-artifacts lists every artifact in the active session; scan there to find the right path or hash.

roar show <path> reports (missing) next to a tracked file

The artifact is registered in roar's DB but the file no longer exists on disk. This is signal, not an error — you can still call roar reproduce <hash> to recreate it. To clear missing entries entirely:

roar reset

…which wipes the active session. If you only want to drop a single missing artifact, start a fresh session and simply don't rerun the step that produced it.
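A minimal flow for recreating one missing file, with a placeholder path and hash:

roar show data/model.pkl    # hypothetical path; the output includes the artifact's hash
roar reproduce <hash>       # replays the recorded producer to recreate the file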

A file shows as both input and output of the same job

In current roar builds this shouldn't happen — the cross-tracer normalization in tracer-fd ensures O_RDWR|O_TRUNC opens (the numpy.savez / zipfile.ZipFile("w") pattern) classify as write-only.

If you're seeing it: confirm you're on a recent build. Historical DB rows from older roar versions can carry the bug; new jobs recorded with current roar are clean. roar reset wipes the historical state.

Registration / scope / auth

You ran roar register (or roar put) from a workspace whose scope isn't set, and you're either logged out or your scope expects a project binding. Three resolutions, depending on intent:

roar scope use anonymous         # publish publicly without an account
roar scope use private           # personal scope, needs roar login
roar scope use <owner>/<project> # team / org project scope

See Scopes for the full picture.

Every roar register prompts for confirmation, even after the first

By design under anonymous scope — every publication is irreversible and worth a conscious confirmation. Bypass for one invocation:

roar register @5 -y

Or switch to a scope that doesn't prompt (roar login then any of private / public / <owner>/<project>).

Auth state is rejected (Stored auth state at <path> is not valid JSON / similar)

Your local auth state was corrupted or partially written. Wipe it and log back in:

roar logout
roar login

Registration aborted. after answering "no" to the anonymous prompt

That's the expected exit when you decline the prompt. The registration didn't happen; the rest of your local state is unchanged. Either pass -y next time, or pick a different scope (roar scope use …).

Reproduce

roar reproduce <hash> fails with "GLaaS server not configured"

The artifact isn't in your local .roar/roar.db so roar tried to fetch its lineage from GLaaS, but the workspace doesn't have a GLaaS URL configured. Two options:

  • Reproduce against your local DB (the artifact must already be tracked locally), or
  • Set the GLaaS URL: roar config set glaas.url https://glaas.ai, then retry.

roar reproduce partial replay — some steps succeed, others fail

The reproduce engine runs every recorded job in topological order. If a step references a tool or path that doesn't exist in the new environment, that step fails and the chain stops.

  • Check what's missing: the failing step's stderr usually tells you (missing python package, missing input file, etc.).
  • For environment differences: roar show @<step> shows the captured environment (packages, env vars) so you can recreate it, as sketched below.
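A hedged example of closing an environment gap, with placeholder step, package, and hash:

roar show @3                      # hypothetical failing step; lists its captured packages and env vars
pip install somepackage==1.2.3    # hypothetical dependency the step's stderr complained about
roar reproduce <hash>             # retry the replay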

Tracers

See Tracers → Debugging for the common tracer-side failure modes. The most frequent: preflight failures (covered above under "Running commands"), platform mismatches (see the comparison table), and edge cases like setuid/static binaries.

Ray

See Ray → Limitations for the cluster-side gotchas. Common ones:

  • Partial worker lineage — likely a runtime_env policy blocking worker_process_setup_hook. Check your cluster's policy.
  • S3 ETag mismatch — multipart uploads produce hash-of-part-hashes, not a content digest. roar records both ETag and a content blake3 from local open captures.
  • Cluster URL mismatch — driver uses localhost but workers need a routable address. Set ROAR_CLUSTER_GLAAS_URL and/or ROAR_CLUSTER_AWS_ENDPOINT_URL to the worker-visible URLs (placeholder example below).
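Assuming the variables are picked up from the driver's environment (the Ray page has the authoritative setup), the values are just URLs the workers can reach, e.g.:

export ROAR_CLUSTER_GLAAS_URL="http://<glaas-host-reachable-from-workers>:<port>"
export ROAR_CLUSTER_AWS_ENDPOINT_URL="http://<s3-endpoint-reachable-from-workers>:<port>"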

The Ray integration is currently beta — see the callout at the top of the Ray page.

Performance

roar run adds noticeable latency on small commands

The overhead breakdown:

  • Startup: 50–200 ms for backend selection + tracer init.
  • Per-syscall: very low under eBPF/preload, higher under ptrace (two context switches per syscall).
  • Post-run hashing: content hashes are computed at write time (Python open) or post-run (rename/replace). Big files dominate.

For sub-second commands, the overhead is a meaningful fraction of total runtime. If you need low overhead across many tiny commands, batch them under a single roar run rather than wrapping each one.
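One way to batch, assuming a bash shell is available to wrap (the script names are stand-ins):

roar run bash -c 'python preprocess.py && python train.py'    # one wrapped run instead of two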

Hashing throughput looks slow on large files

roar uses blake3 by default with sha256 as a fallback. If blake3 isn't installed in your environment (pip install blake3), you'll get sha256, which is roughly 5× slower on modern CPUs. Check with roar show <artifact-hash> — the Hash (blake3): … line tells you which algorithm ran.
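A quick check-and-fix, assuming pip targets the environment roar runs in (see Installation if unsure):

pip install blake3
roar show <artifact-hash>    # the Hash (blake3): line confirms the faster algorithm is in use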

Where to look next

  • Tracers — backend behavior, capabilities, the full debug surface.
  • Scopes — privacy and registration visibility.
  • Ray — Ray cluster setup and limitations.
  • roar Guide — full CLI reference for the commands referenced above.
  • FAQ — implementation details and the "why does it work this way" questions.