Troubleshooting
This page is symptom-indexed: scan for what you're seeing in the terminal, then jump.
For deeper context on individual subsystems, follow the cross-references to Tracers, Scopes, Ray, or the roar Guide.
First-time setup
Error: roar is not initialized in this directory.
What it means: you're running roar (or roar run, roar dag, etc.) in a directory that doesn't have a .roar/ subdirectory.
roar init
Run once per project. .roar/ lives next to your .git/. The init step is idempotent — re-running on an already-initialized workspace just reports it's done.
Error: Not in a git repository.
roar requires the working tree to be inside a git repo so it can tag every recorded job with the exact commit it ran from. Initialize git first, then roar:
git init && git add -A && git commit -m "initial"
roar init
roar: command not found after pip install roar-cli
Your shell can't see the roar binary on PATH. Three cases:
- Installed via uv tool install roar-cli (the recommended path): uv puts shims under ~/.local/bin by default. Add that directory to your PATH (manual sketch after this list), or run uv tool update-shell once to wire it up.
- Installed via pipx install roar-cli: pipx similarly uses ~/.local/bin. Run pipx ensurepath once.
- Installed via pip install roar-cli into a project venv: roar only exists while that venv is activated. The first two options are how you keep roar always available.
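If you'd rather wire PATH up by hand, a minimal sketch (bash shown; adapt the rc file for your shell):
export PATH="$HOME/.local/bin:$PATH"                       # current shell only
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc   # persist for future shells
command -v roar                                            # should now print the shim path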
See Installation for the recommended setup.
Running commands
Git repository has uncommitted changes
By design. roar run refuses to start when the working tree is dirty so every recorded job carries an honest commit SHA — otherwise re-running later wouldn't actually re-run the same code.
git add -A && git commit -m "wip"
roar run python train.py …
If you need to iterate fast without committing each loop, see the --allow-dirty discussion in the roar Guide.
Tracer preflight failed for '<backend>': …
The selected tracer can't initialize. Start with the deep preflight to see exactly why:
roar tracer check <backend> # ebpf | preload | ptrace
Common causes by backend:
- eBPF: kernel < 5.8, missing BTF, or CAP_BPF not set. Run roar tracer enable ebpf once to apply the capability, or pick a different backend with roar tracer use preload.
- preload: the shared library isn't built. Run cd rust && cargo build --release in the roar repo.
- ptrace: kernel.yama.ptrace_scope is blocking us. Check sysctl kernel.yama.ptrace_scope; values 0 or 1 typically work, 2 requires CAP_SYS_PTRACE. (Host-side checks for all three are sketched after this list.)
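To verify the host-side preconditions yourself, a few standard Linux checks that don't depend on roar (the last line assumes the repo layout implied by the build command above):
uname -r                          # eBPF wants kernel >= 5.8
ls /sys/kernel/btf/vmlinux        # present when the kernel ships BTF
sysctl kernel.yama.ptrace_scope   # 0 or 1 usually fine for ptrace; 2 needs CAP_SYS_PTRACE
ls rust/target/release/           # in the roar repo: is the preload library built?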
roar tracer (no subcommand) shows the readiness table for all three backends — useful when you don't yet know which one is failing.
Operation not permitted creating BPF map
You're seeing the raw kernel error from the eBPF backend trying to allocate a BPF map without sufficient privilege. Two paths:
roar tracer check ebpf # confirms the specific cause (CAP_BPF / kernel / BTF)
roar tracer enable ebpf # one-shot: applies CAP_BPF to the eBPF binary
If the host doesn't allow CAP_BPF (managed container runtimes, locked-down CI), eBPF won't work here regardless. Fall back to preload:
roar tracer use preload
See Tracers → Cloud and managed GPU platforms for which environments support eBPF.
roar run completes but the DAG is empty (no outputs reported)
The first diagnostic is which backend actually ran, and whether it could see your workload at all:
roar tracer # which backend is active + the readiness table
roar tracer check <backend> # deep preflight if the readiness table is unclear
The usual causes once you know the backend:
- Static binaries via preload: preload only sees dynamically-linked libc calls. Switch to eBPF or ptrace for static binaries.
- Setuid binaries: the dynamic linker scrubs LD_PRELOAD. Same fix.
- Container without privileges: eBPF requires kernel access; preload usually works. Most CI runners need preload.
- Auto-fallback chose a less-capable backend silently: roar tracer will tell you which backend was active for the run; if it wasn't the one you expected, run preflight on the one you want and fix the underlying issue (triage sketch below).
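Putting those together, a typical triage loop (ebpf below stands in for whichever backend you actually want):
roar tracer              # which backend was active for the run?
roar tracer check ebpf   # preflight the one you expected
roar tracer use ebpf     # pin it once the preflight passes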
The Tracers cloud-platform table lists what's available where.
The wrapped command's exit code seems wrong
roar run forwards the wrapped command's exit code verbatim. If you're seeing the wrong one, check whether a shell pipeline is rewriting it: roar run cmd | tee file exits with tee's status, not cmd's. Either use set -o pipefail, or run roar run without the pipeline and have the wrapped command write the output file itself.
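A concrete sketch (train.py and train.log are placeholders):
set -o pipefail                            # pipeline exits non-zero if any stage fails
roar run python train.py | tee train.log
echo $?                                    # now reflects python's status, even though tee succeeded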
Inspect and show
No artifact found for path: <path>
roar show <path> couldn't find a tracked artifact at that path. Three common reasons:
- The file was never recorded: only files the tracer saw being read or written get tracked. A file you copied into the directory by hand isn't an artifact until a tracked command touches it.
- Filtered as noise: temp files in /tmp/, package files in site-packages/, and .roar/ / .git/ internals are filtered out by default. Set filters.ignore_tmp_files = false in .roar/config.toml to recover temp-file tracking.
- Path mismatch: roar show resolves relative paths against your current cwd. roar show ./model.pkl and roar show $(pwd)/model.pkl should both work; roar show model.pkl from a different directory won't.
roar dag --show-artifacts lists every artifact in the active session; scan there to find the right path or hash.
roar show <path> reports (missing) next to a tracked file
The artifact is registered in roar's DB but the file no longer exists on disk. This is signal, not an error — you can still call roar reproduce <hash> to recreate it. To clear missing entries entirely:
roar reset
…which wipes the active session. If you only want to drop one missing artifact, do it in a fresh session and don't rerun the producing step.
A file shows as both input and output of the same job
In current roar builds this shouldn't happen: the cross-tracer normalization in tracer-fd classifies O_RDWR|O_TRUNC opens (the numpy.savez / zipfile.ZipFile("w") pattern) as write-only.
If you're seeing it: confirm you're on a recent build. Historical DB rows from older roar versions can carry the bug; new jobs recorded with current roar are clean. roar reset wipes the historical state.
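If you want to see the flags for yourself, a quick check with strace (assumes strace and numpy are installed; out.npz is scratch):
strace -f -e trace=openat \
  python -c 'import numpy as np; np.savez("out.npz", x=np.arange(3))' \
  2>&1 | grep out.npz   # expect O_RDWR|O_CREAT|O_TRUNC, i.e. the write-only case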
Registration / scope / auth
No GLaaS repo binding found for this publish. Link the repo to a TReqs owner/project first, or rerun with --public to publish publicly.
You ran roar register (or roar put) from a workspace whose scope isn't set, and you're either logged out or your scope expects a project binding. Three resolutions, depending on intent:
roar scope use anonymous # publish publicly without an account
roar scope use private # personal scope, needs roar login
roar scope use <owner>/<project> # team / org project scope
See Scopes for the full picture.
Every roar register prompts for confirmation, even after the first
By design under anonymous scope — every publication is irreversible and worth a conscious confirmation. Bypass for one invocation:
roar register @5 -y
Or switch to a scope that doesn't prompt (roar login then any of private / public / <owner>/<project>).
Auth state is rejected (Stored auth state at <path> is not valid JSON / similar)
Your local auth state was corrupted or partially written. Wipe it and log back in:
roar logout
roar login
Registration aborted. after answering "no" to the anonymous prompt
That's the expected exit when you decline the prompt. The registration didn't happen; the rest of your local state is unchanged. Either pass -y next time, or pick a different scope (roar scope use …).
Reproduce
roar reproduce <hash> fails with "GLaaS server not configured"
The artifact isn't in your local .roar/roar.db so roar tried to fetch its lineage from GLaaS, but the workspace doesn't have a GLaaS URL configured. Two options:
- Reproduce against your local DB (the artifact must already be tracked locally), or
- Set the GLaaS URL: roar config set glaas.url https://glaas.ai, then retry.
roar reproduce partial replay — some steps succeed, others fail
The reproduce engine runs every recorded job in topological order. If a step references a tool or path that doesn't exist in the new environment, that step fails and the chain stops.
- Check what's missing: the failing step's stderr usually tells you (missing python package, missing input file, etc.).
- For environment differences: roar show @<step> shows the captured environment (packages, env vars) so you can recreate it.
Tracers
See Tracers → Debugging for the common tracer-side failure modes. The most frequent: preflight failures (caught above under "Running commands"), platform mismatches (see the comparison table), and edge cases like setuid/static binaries.
Ray
See Ray → Limitations for the cluster-side gotchas. Common ones:
- Partial worker lineage: likely a runtime_env policy blocking worker_process_setup_hook. Check your cluster's policy.
- S3 ETag mismatch: multipart uploads produce a hash-of-part-hashes, not a content digest. roar records both the ETag and a content blake3 from local open captures.
- Cluster URL mismatch: the driver uses localhost but workers need a routable address. Set ROAR_CLUSTER_GLAAS_URL and/or ROAR_CLUSTER_AWS_ENDPOINT_URL to the worker-visible URLs (sketch below).
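A sketch of the driver-side fix (both URLs are hypothetical; substitute addresses your workers can actually reach):
export ROAR_CLUSTER_GLAAS_URL="http://glaas.internal.example:8080"        # worker-visible GLaaS
export ROAR_CLUSTER_AWS_ENDPOINT_URL="http://s3.internal.example:9000"    # worker-visible S3 endpoint
roar run python ray_driver.py                                             # ray_driver.py is a placeholder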
The Ray integration is currently beta — see the callout at the top of the Ray page.
Performance
roar run adds noticeable latency on small commands
The overhead breakdown:
- Startup: 50–200 ms for backend selection + tracer init.
- Per-syscall: very low under eBPF/preload, higher under ptrace (two context switches per syscall).
- Post-run hashing: content hashes are computed at write time (Python open) or post-run (rename/replace). Big files dominate.
For sub-second commands, the overhead is a meaningful fraction of total runtime. If you need lean overhead for many tiny commands, batch them under a single roar run rather than wrapping each.
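For example, instead of three wraps each paying 50–200 ms of tracer init, one session covers all three steps (script names are placeholders):
roar run bash -c 'python preprocess.py && python featurize.py && python train.py'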
Hashing throughput looks slow on large files
roar uses blake3 by default with sha256 as a fallback. If blake3 isn't installed in your environment (pip install blake3), you'll get sha256, which is roughly 5× slower on modern CPUs. Check with roar show <artifact-hash> — the Hash (blake3): … line tells you which algorithm ran.
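A quick way to confirm and fix (assumes a pip-managed environment; <artifact-hash> as listed by roar dag --show-artifacts):
python -c 'import blake3' 2>/dev/null || pip install blake3   # ensure the fast hash is importable
roar show <artifact-hash> | grep -i hash                      # which algorithm actually ran?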
Where to look next
- Tracers — backend behavior, capabilities, the full debug surface.
- Scopes — privacy and registration visibility.
- Ray — Ray cluster setup and limitations.
- roar Guide — full CLI reference for the commands referenced above.
- FAQ — implementation details and the "why does it work this way" questions.