## What is a tracer?

A **tracer** is the component that observes what your command does as it runs. When you type `roar run python train.py`, the tracer is what records the read/write/mmap/rename/unlink events that flow through your process — and through any subprocess it spawns. From those events `roar` derives the inputs, outputs, and DAG.

`roar` ships three backend tracers and picks one automatically. Each makes different tradeoffs around platform support, privileges, and overhead.

> **Privacy.** Tracers observe *metadata only* — file paths, sizes, syscall events, content hashes. File contents are never captured by the tracer.

## Quick use

```bash
roar tracer                        # show the active backend + per-backend status
roar tracer use <mode>             # pick auto | ebpf | preload | ptrace
roar tracer check <mode>           # deep preflight for one backend (CAP_BPF, BTF, etc.)
roar tracer enable ebpf            # configure eBPF capabilities / sysctls
```

The rest of this page explains *why* you'd pick each, what they observe, and how to debug them when something goes sideways.

## The three backends

### eBPF

A small kernel-side probe attaches to syscall tracepoints (`openat`, `read`, `write`, `mmap`, `rename`, `unlink`, `sendfile`, `copy_file_range`, `clone`/`fork`/`exec`). The probe is registered **system-wide** — it fires on those syscalls from every process on the host — but the in-kernel program's first action is to check the calling PID against a BPF map of `roar`-tracked PIDs. On a miss it returns immediately; on a hit it ships a compact event to a userspace daemon over a perf ring buffer.

**Strengths.** Lowest per-syscall overhead of the three backends, even compared with the in-process preload tracer. Sees every tracked process regardless of how it was built: statically-linked binaries, setuid binaries, and processes that bypass libc are all covered.

**Requirements.** Linux kernel ≥ 5.8 with BPF Type Format (BTF) available, and `CAP_BPF` on the tracer binary (or root).

**Footprint on the host.** The BPF program executes briefly in kernel context for unrelated processes' syscalls too — the PID-map lookup is the first thing it does, and a miss is a couple of instructions. No userspace events are emitted for those processes, and no path or content data leaves the kernel. The probe is open source and the audit surface is small.
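
To make the filter-first design concrete, here is a minimal userspace model in Python. It is only a sketch: the real probe is a BPF program running in the kernel, and the `Probe` class, its field names, and the event tuple layout are illustrative assumptions, not `roar`'s actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Probe:
    """Userspace model of the in-kernel filter (hypothetical, for illustration)."""

    tracked_pids: set[int] = field(default_factory=set)  # stands in for the BPF map
    ring_buffer: list[tuple[int, str, str]] = field(default_factory=list)

    def on_syscall(self, pid: int, syscall: str, path: str) -> bool:
        # First action: PID lookup. A miss returns before any event is built.
        if pid not in self.tracked_pids:
            return False
        # A hit ships a compact event toward the userspace daemon.
        self.ring_buffer.append((pid, syscall, path))
        return True


probe = Probe(tracked_pids={4242})
probe.on_syscall(4242, "openat", "data/train.csv")  # tracked: event emitted
probe.on_syscall(1, "openat", "/etc/passwd")        # unrelated process: dropped
print(probe.ring_buffer)
```

The unrelated process costs one lookup and produces no event, which is the property the footprint paragraph relies on.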

### preload

A `cdylib` shared library that wraps libc's I/O functions via `LD_PRELOAD` on Linux (and `DYLD_INSERT_LIBRARIES` on macOS — the dynamic-linker mechanisms differ by name and a few rules, but the single shared library covers both). Each hooked function emits a structured event to a daemon over a Unix socket before delegating to the real libc symbol.

**Strengths.** Very low overhead. Works on macOS where eBPF isn't available. No kernel privileges required. Crystal-clear semantics: an event fires when libc's `open`/`read`/`write`/etc. are called.

**Fundamental constraint.** Only sees calls that go through the dynamic libc. Statically-linked binaries and processes that issue raw syscalls are invisible. The dynamic linker also scrubs the preload variable from privileged binaries — `LD_PRELOAD` from setuid binaries on Linux, `DYLD_INSERT_LIBRARIES` from any SIP-protected or library-validated process on macOS — so those are invisible too.
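
A quick way to predict whether preload can see a given Linux binary is to check whether it asks for a dynamic linker at all. The sketch below is not part of `roar`; `has_dynamic_loader` is a hypothetical helper that looks for a `PT_INTERP` program header in an ELF image. Without one, the dynamic linker never runs and `LD_PRELOAD` is never consulted.

```python
import struct

PT_INTERP = 3  # program-header type that names the dynamic linker


def has_dynamic_loader(elf: bytes) -> bool:
    """Return True if the ELF image requests a dynamic linker (PT_INTERP)."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = elf[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is_64:
        (phoff,) = struct.unpack_from(endian + "Q", elf, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x36)
    else:
        (phoff,) = struct.unpack_from(endian + "I", elf, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x2A)
    # p_type is the first 4-byte field of every program header in both classes.
    for index in range(phnum):
        (p_type,) = struct.unpack_from(endian + "I", elf, phoff + index * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Calling it on the bytes of a binary (for example, `has_dynamic_loader(open(path, "rb").read())`) gives a first answer. A `True` result only rules out the static case: a dynamically-linked program can still issue raw syscalls that preload won't see.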

### ptrace

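Uses `PTRACE_SYSCALL` to stop the traced process on every syscall entry and exit (with `PTRACE_O_TRACESYSGOOD` set so syscall stops are distinguishable from ordinary signal stops), reads the registers to determine the syscall and its arguments, classifies the call, then resumes the process.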

**Strengths.** Works without privileges on most Linux systems and sees every syscall regardless of how the process was built (static, raw syscalls, etc.). The fallback when neither eBPF nor preload is usable.

**Cost.** Two stops per syscall (entry and exit), each a round trip of context switches between the traced process and the tracer. On heavy I/O workloads this is noticeable.
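
For readers unfamiliar with the flag: `PTRACE_O_TRACESYSGOOD` makes the kernel report syscall stops as `SIGTRAP | 0x80`, so a tracer can tell them apart from a genuine `SIGTRAP`. The following sketch shows the check a syscall-stop loop would apply to each `waitpid()` status. The helpers are hypothetical, not `roar`'s code, and the Linux wait-status layout is written out by hand.

```python
import signal

SYSCALL_STOP = signal.SIGTRAP | 0x80  # reported by the kernel when TRACESYSGOOD is on


def is_syscall_stop(status: int) -> bool:
    """Return True if a raw waitpid() status describes a syscall stop."""
    stopped = (status & 0xFF) == 0x7F   # Linux encoding of WIFSTOPPED
    stop_signal = (status >> 8) & 0xFF  # Linux encoding of WSTOPSIG
    return stopped and stop_signal == SYSCALL_STOP


def count_syscalls(statuses: list[int]) -> int:
    """Each syscall yields two stops (entry and exit), so halve the stop count."""
    stops = sum(1 for status in statuses if is_syscall_stop(status))
    return stops // 2
```

Because every syscall produces an entry stop and an exit stop, the tracer is woken twice per call, which is where the cost comes from.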

## What gets captured

Across all three backends, `roar`'s tracer records:

- **Reads** — `read`, `pread`, `preadv`, `readv`.
- **Writes** — `write`, `pwrite`, `pwritev`, `writev`.
- **Opens** — `open`, `openat`, `creat`, `fopen` (preload only).
- **Path-publication** — `rename`, `renameat`, `link`, `linkat`, `unlink`, `unlinkat`, `truncate`, `ftruncate`. Each marks the path it publishes or modifies as written even when no `write()` ever fires on that path (e.g. `mv out.tmp out`, where only `out.tmp` was ever written).
- **mmap** — both `PROT_READ` and `PROT_WRITE` mappings classified appropriately. `MAP_PRIVATE` writes do not count as output (they're copy-on-write).
- **Cross-fd I/O** — `sendfile` and `copy_file_range` are recorded as a read on the input fd and a write on the output fd.
- **Subprocesses** — `clone`, `fork`, `vfork`, `exec`. The tracer follows every child automatically; see [How forks are followed](#how-forks-are-followed) below.

**Backends agree.** Cross-backend classification (read vs. write, what counts as a publication, how `mmap` is recorded) lives in the shared `tracer-fd` crate, so the three backends produce the same DAG for the same syscall stream.
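
The `mmap` and cross-fd rules above can be sketched in a few lines of Python. This is an illustration of the stated policy, not the `tracer-fd` implementation: the function names and the string labels are hypothetical, and treating a mapping without `PROT_READ` as a non-read is an assumption.

```python
import mmap


def classify_mmap(prot: int, flags: int) -> set[str]:
    """Map an mmap() call's protection and sharing flags to read/write effects."""
    effects = set()
    if prot & mmap.PROT_READ:
        effects.add("read")
    # Only shared writable mappings reach the file; MAP_PRIVATE is copy-on-write.
    if (prot & mmap.PROT_WRITE) and (flags & mmap.MAP_SHARED):
        effects.add("write")
    return effects


def classify_copy(in_path: str, out_path: str) -> list[tuple[str, str]]:
    """sendfile / copy_file_range: a read on the input fd, a write on the output fd."""
    return [("read", in_path), ("write", out_path)]


print(classify_mmap(mmap.PROT_READ | mmap.PROT_WRITE, mmap.MAP_PRIVATE))
print(classify_mmap(mmap.PROT_READ | mmap.PROT_WRITE, mmap.MAP_SHARED))
```

Keeping rules like these in one place is what lets the three backends agree: each backend only has to report the raw flags.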

**Not captured.** File *contents* (only hashes and sizes). GPU compute issued via driver IOCTLs. Operations on anonymous fds (memfd, pipes, sockets) — those don't correspond to artifacts. Filesystem activity in nested mount namespaces the daemon can't see.

## How to choose

### Comparison

| | **eBPF** | **preload** | **ptrace** |
|---|---|---|---|
| Linux | ✓ (≥ 5.8) | ✓ | ✓ |
| macOS | ✗ | ✓ | ✗ |
| Required privileges | `CAP_BPF` or root | none | none (unless Yama blocks) |
| Static binaries | ✓ | ✗ | ✓ |
| Setuid binaries | ✓ | ✗ | ✗ |
| Subprocess follow | ✓ | ✓ | ✓ |
| Containers | needs host-kernel access | ✓ | ✓ |
| Per-syscall overhead | very low | very low | high |
| `mmap` capture | ✓ | ✓ | ✓ |

### Quick guide

- **Linux ≥ 5.8 with root or `CAP_BPF`** → eBPF. Lowest overhead, most coverage.
- **macOS** → preload. The only viable option.
- **No privileges, no kernel-version control** → preload if your workload only uses dynamically-linked tools (the common case), ptrace otherwise.
- **Static binaries on Linux** → eBPF or ptrace. **Setuid-sensitive workflows** → eBPF, the only backend that observes setuid binaries.
- **CI** → preload. No kernel deps, no privileges needed, scales fine.

### `auto` mode

The default. `roar` runs a per-backend preflight at startup, picks the most capable backend that passes, and falls back to the next one if the chosen backend errors out during the run (configurable — `tracer.fallback_enabled`).

The fallback order is: **eBPF → preload → ptrace**. The first one whose preflight succeeds becomes the active backend. `roar tracer` shows what auto resolved to and why the others were rejected.
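
The selection logic described above amounts to a first-match scan. A minimal sketch, assuming each backend exposes a preflight callable that returns a pass/fail flag and a reason (the `resolve_auto` name and the callable signature are hypothetical, not `roar`'s API):

```python
from typing import Callable

FALLBACK_ORDER = ("ebpf", "preload", "ptrace")
Preflight = Callable[[], tuple[bool, str]]


def resolve_auto(preflights: dict[str, Preflight]) -> tuple[str, dict[str, str]]:
    """Return the first backend whose preflight passes, plus rejection reasons."""
    rejected: dict[str, str] = {}
    for name in FALLBACK_ORDER:
        ok, reason = preflights[name]()
        if ok:
            return name, rejected
        rejected[name] = reason
    raise RuntimeError(f"no tracer backend passed preflight: {rejected}")


# Example: eBPF is rejected, so preload becomes the active backend.
active, rejected = resolve_auto({
    "ebpf": lambda: (False, "missing CAP_BPF"),
    "preload": lambda: (True, ""),
    "ptrace": lambda: (True, ""),
})
print(active, rejected)
```

The `rejected` mapping corresponds to the "why the others were rejected" part of the `roar tracer` output; backends after the first match are never probed in this sketch.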

## Limitations

- **Containers and namespaces.** eBPF requires the host kernel to be readable from where the daemon runs — nested namespaces and locked-down container runtimes can interfere. Preload and ptrace work without that constraint.
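- **Setuid binaries.** Preload is blocked by the dynamic linker, which ignores the preload variable for privileged binaries. Under ptrace, the kernel ignores the setuid bit when an unprivileged tracer is attached, so the binary runs without its elevated privileges and usually fails. eBPF observes them transparently.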
- **Statically-linked binaries**, and any process that issues raw syscalls without going through libc, are invisible to preload. eBPF and ptrace see them.
- **GPU compute** issued via driver IOCTLs is not resolved to artifacts. At most the tracer sees the `ioctl` calls as opaque syscall activity; it can't peer into the framebuffer or compute kernel to identify what was read or written.
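- **Browser-launched / detached processes** that don't inherit the tracer's environment (preload) or attachment (ptrace) escape observation. The eBPF backend still sees them as long as they descend from a tracked PID, because fork-following happens in the kernel and doesn't depend on the environment. Processes started on your behalf by an unrelated daemon are not in the tracked-PID map and are skipped like any other untracked process.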

## Platform notes

### Linux

- eBPF requires kernel ≥ 5.8 with BTF. Recent Ubuntu / Debian / RHEL ship this by default.
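- `kernel.yama.ptrace_scope` controls who can ptrace what. `0` allows attaching to any process running as the same user; `1` (the usual default where Yama is enabled) restricts attaching to a process's own descendants; `2` requires `CAP_SYS_PTRACE`; `3` disables ptrace entirely. `roar`'s ptrace backend forks its target, so `0` and `1` both work. eBPF doesn't go through ptrace, so it is unaffected once `roar tracer enable ebpf` has set `CAP_BPF`.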
- `CAP_BPF` (kernel ≥ 5.8) is the minimal capability for eBPF; `roar tracer enable ebpf` configures it as a one-time setup.
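
The kernel-side prerequisites in this list can also be inspected by hand. The sketch below is a set of hypothetical helpers, not part of `roar`; it assumes the conventional locations `/sys/kernel/btf/vmlinux` (where kernels built with BTF expose it) and `/proc/sys/kernel/yama/ptrace_scope`, and it does not check capabilities.

```python
import re
from pathlib import Path
from typing import Optional


def kernel_at_least(release: str, minimum: tuple[int, int] = (5, 8)) -> bool:
    """Compare a release string such as '6.5.0-35-generic' against major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Compare as integer tuples so that 5.15 sorts above 5.8.
    return (int(match.group(1)), int(match.group(2))) >= minimum


def btf_available() -> bool:
    """True when the running kernel exposes its BTF blob."""
    return Path("/sys/kernel/btf/vmlinux").exists()


def yama_ptrace_scope() -> Optional[int]:
    """Return the Yama setting, or None when Yama is not enabled."""
    path = Path("/proc/sys/kernel/yama/ptrace_scope")
    return int(path.read_text()) if path.exists() else None
```

On a Linux host, `kernel_at_least(platform.release())` (after `import platform`) covers the version check. `roar tracer check ebpf` remains the authoritative preflight; this only helps narrow down which prerequisite is missing.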

### macOS

Preload is the only backend. eBPF doesn't exist; ptrace is blocked on hardened (SIP-protected, library-validated) processes, which covers almost everything you'd want to trace.

The same hardening also strips `DYLD_INSERT_LIBRARIES` — so Apple-signed binaries (the system `/usr/bin/python3`, `/bin/zsh`, etc.) escape the preload tracer too. In practice this isn't a problem: ML workflows almost always run an unhardened Python (uv, Homebrew, MacPorts, Conda, a project venv), which preload covers fine.

### Containers / CI

- **Preload** is the safest default for CI. No kernel deps, no privileges, no host-side configuration.
- **eBPF in containers** works only if the kernel is host-visible (e.g., Docker with `--privileged` or specific capabilities). Most managed CI runners don't allow this.
- **ptrace in containers** works in most setups but is slower; double-check that `kernel.yama.ptrace_scope` isn't blocking it.

### Cloud and managed GPU platforms

eBPF needs a real Linux kernel with `CAP_BPF` (or root). Whether you've got that depends almost entirely on how the platform virtualizes you.

| Platform | eBPF? | Notes |
|---|---|---|
| AWS EC2 (incl. GPU instances) | ✓ | Standard Linux VMs; root by default. |
| GCP Compute Engine GPU VMs | ✓ | KVM VMs with root; Google itself ships eBPF/Cilium on GKE. |
| Azure GPU VMs | ✓ | Recent Azure kernels ship with `CONFIG_BPF=y`. |
| Lambda Cloud | ✓ | Ubuntu 22.04 LTS VMs with sudo. |
| CoreWeave | ✓ | Bare-metal Linux; privileged pods can load BPF. |
| Latitude.sh | ✓ | Bare-metal Linux, full kernel and root. |
| DigitalOcean GPU Droplets / Bare Metal GPUs | ✓ | Root VMs; bare metal allows kernel upgrades. |
| Crusoe Cloud | likely ✓ | KVM VMs with sudo; not explicitly verified. |
| Together AI Instant Clusters | likely ✓ | Bare-metal nodes with user-installed K8s/Slurm; not explicitly verified. |
| Anyscale | ✓ if pods are privileged | Needs `securityContext.privileged: true` on the K8s cluster. |
| Vast.ai | depends on the host's Docker config | Containers on third-party hosts; privileged mode varies per template. |
| RunPod | ✗ | Unprivileged containers; no `CAP_BPF`. |
| Modal | ✗ | Workloads run under gVisor; the `bpf()` syscall isn't forwarded to the host kernel. |
| Replicate | ✗ | Managed container runtime; no documented privileged escape hatch. |
| Paperspace Gradient Notebooks | ✗ | Managed notebook containers without host-kernel access. (DigitalOcean Paperspace GPU Droplets are different — those work.) |

**Rule of thumb.** Full VMs and bare metal → eBPF works. Container-as-a-service → either an opt-in privileged mode or nothing. Sandboxed serverless runtimes (Modal's gVisor, etc.) → blocked entirely; `roar`'s auto-fallback picks `preload` or `ptrace`.

If your platform isn't listed, the empirical test is `sudo bpftool feature` — one command, tells you whether the kernel supports the BPF features `roar` needs.

## Distributed runners (Ray, …)

Ray jobs run in worker processes that may be on remote nodes; each worker needs its own tracer attached. `roar` handles this via its Ray backend (`roar.backends.ray.*`), which wraps Ray worker startup so the tracer is present from the first task. See [Ray](/docs/ray) for the full integration story, including fragment-store outputs and host-submit vs in-cluster modes.

## Setup

### Building the binaries

The tracers live in the `roar` repo under `rust/tracers/`. Build all three with:

```bash
cd rust && cargo build --release
```

`roar` looks for the built binaries in `rust/target/release/`. A from-scratch build takes about a minute on a modern laptop.

### eBPF privileges

```bash
roar tracer enable ebpf
```

This is a one-time setup. It applies the `CAP_BPF` capability to the eBPF binary and ensures the kernel exposes the tracepoints we need. Re-run if the binary moves or the kernel changes.

### Configuration

The key knobs in `~/.roar/config.toml` (or per-repo `.roar/config.toml`):

| Key | Default | Effect |
|---|---|---|
| `tracer.mode` | `auto` | Override the default backend selection. |
| `tracer.fallback_enabled` | `true` | If `false`, a backend failure during the run aborts instead of falling back. |
| `tracer.preflight_timeout_ms` | `2000` | How long preflight will wait for each backend to respond before rejecting it. |

CLI flags take precedence: `roar run --tracer preload --no-tracer-fallback python train.py`.

## Debugging

### Preflight

```bash
roar tracer
```

Shows the currently-active backend, the preflight result for each backend, and (when `auto` is in play) why the others were rejected. First place to look when "it's not tracing" — usually preflight has a story.

### Common failure modes

- **`tracer preflight failed for 'ebpf'`** → check the kernel version (≥ 5.8), BTF availability, and `CAP_BPF`. `roar tracer check ebpf` runs the deep preflight for these; re-run `roar tracer enable ebpf` if needed.
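- **`Permission denied` on a setuid binary under ptrace** → expected: the kernel ignores the setuid bit when an unprivileged tracer is attached, so the binary runs without its elevated privileges. (Under preload the binary runs normally but untraced.) Either rebuild with capabilities you control, switch to eBPF, or wrap the setuid step in a different command.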
- **Some outputs are missing** → likely a static binary plus preload. Switch to eBPF or ptrace.
- **DAG shows a file as both input and output** → the shared classification in `tracer-fd` should prevent this; in particular, an `O_TRUNC` open is treated as write-only. If it still happens, file an issue with the tracer report.

### Verbose logs

`roar run -vv ...` enables debug-level tracer logging to `.roar/tracer.log`. Useful for issue reports; not for daily use.

## How they're built (and why they're fast)

All three backends share a small Rust workspace under `rust/`. The split is:

- **`crates/tracer-schema`** — wire-format types (`TraceEvent`, `FileRecord`, `TracerReport`).
- **`crates/tracer-fd`** — per-fd state aggregator. Receives raw `Read`/`Write`/`OpenRead`/`OpenWrite` events, applies the cross-backend classification rules (O_TRUNC = write-only, MAP_SHARED+PROT_WRITE = real write, etc.), and emits a canonical `FileSummary` regardless of which backend produced the events.
- **`tracers/ebpf`** — the kernel-side BPF program (written in Rust via `aya`) plus a userspace daemon. The probe filters in-kernel so the hot path doesn't enter userspace.
- **`tracers/preload`** — a `cdylib` shared library. Each interposed libc function dispatches through `dlsym(RTLD_NEXT)` to call the real symbol, with the event emission on the side via a thread-local Unix socket.
- **`tracers/ptrace`** — a standalone binary that forks the target, traces it with the options `PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE`, and runs a syscall-stop loop.
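
To illustrate what a canonical summary "regardless of which backend produced the events" means in practice, here is a small Python model of the per-fd aggregation. It is a sketch under assumptions: the real crate is Rust, the `FileSummary` fields shown here are invented for the example, and reading "O_TRUNC = write-only" as "reads after a truncating open do not make the file an input" is an interpretation of the rule listed above.

```python
import os
from dataclasses import dataclass


@dataclass
class FileSummary:
    path: str
    is_input: bool = False
    is_output: bool = False


def summarize(path: str, open_flags: int, events: list[str]) -> FileSummary:
    """Fold one fd's raw events into a single input/output verdict."""
    truncated = bool(open_flags & os.O_TRUNC)
    # A truncating open destroys prior content, so the file is already an output.
    summary = FileSummary(path, is_output=truncated)
    for event in events:
        if event == "Write":
            summary.is_output = True
        elif event == "Read" and not truncated:
            # Reads only count as input when pre-existing content could be seen.
            summary.is_input = True
    return summary


flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC
print(summarize("model.ckpt", flags, ["Write", "Read"]))
```

Under this reading, a checkpoint that is truncated, written, and read back appears in the DAG as an output only.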

Three deliberate choices that keep the hot paths cheap:

- **No allocations on the syscall path.** Event structs are fixed-size and pre-allocated; reports are batched and serialized via `rmp-serde` after the run.
- **Cross-backend code lives in one crate.** Adding a new policy (like the O_TRUNC fix) means changing one place — `tracer-fd` — and all three backends inherit it.
- **Trace transport is dumb.** preload uses a raw Unix socket; eBPF uses a perf ring buffer; ptrace just collects state in the parent process. No serialization until the run completes.
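
The fixed-size, pre-allocated event idea can be modeled with Python's `struct` module. The field layout below (PID, fd, byte count, event kind) is a hypothetical one chosen for illustration; the actual `TraceEvent` layout is defined in `tracer-schema` and is not reproduced here. Python still allocates objects under the hood, so this shows the layout discipline rather than the performance.

```python
import struct

# Hypothetical 20-byte record: pid (u32), fd (i32), byte count (u64), kind (u32).
EVENT = struct.Struct("<IiQI")


class EventBuffer:
    """Fixed-capacity buffer that is allocated once and filled in place."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.count = 0
        self.buf = bytearray(EVENT.size * capacity)  # single up-front allocation

    def push(self, pid: int, fd: int, nbytes: int, kind: int) -> bool:
        if self.count == self.capacity:
            return False  # full: the caller has to drain before pushing again
        EVENT.pack_into(self.buf, self.count * EVENT.size, pid, fd, nbytes, kind)
        self.count += 1
        return True

    def drain(self) -> list[tuple[int, int, int, int]]:
        """Decode all buffered records; decoding is deferred until this point."""
        events = [EVENT.unpack_from(self.buf, i * EVENT.size) for i in range(self.count)]
        self.count = 0
        return events


buffer = EventBuffer(capacity=2)
buffer.push(4242, 3, 4096, 1)
print(buffer.drain())
```

Because every record has the same size, the write offset is a multiplication, and nothing is decoded or serialized until `drain()` is called.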

### How forks are followed

All three backends follow `fork`/`vfork`/`clone`/`exec` automatically. The mechanism is different per backend:

- **eBPF.** The kernel probe attaches to the `sched_process_fork` and `sched_process_exec` tracepoints. When a tracked PID forks, the in-kernel program adds the child PID to the tracked-PID BPF map, so the child's syscalls are observed from its first instruction onward, then emits a `Clone` event to userspace with the parent and child PIDs. The userspace daemon uses that event to carry the parent's per-PID state over to the child. Coverage is automatic.
- **preload.** `LD_PRELOAD` (`DYLD_INSERT_LIBRARIES` on macOS) is part of the environment, so children inherit it automatically unless they scrub their environment. A `pthread_atfork` handler in the library increments a per-process generation counter on `fork()`, so each child opens its own trace socket instead of sharing the parent's.
- **ptrace.** Setting `PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE` tells the kernel to auto-attach the tracer to every child the traced process spawns. The tracer gets a stop on each new PID and adds it to its table.
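
Whatever the mechanism, the invariant is the same: a child is tracked if and only if its parent was tracked when it forked. A minimal sketch of that bookkeeping (the function name and the event tuple shape are hypothetical):

```python
def replay_forks(root_pid: int, forks: list[tuple[int, int]]) -> set[int]:
    """Replay (parent, child) fork events in order and return the tracked PIDs."""
    tracked = {root_pid}
    for parent, child in forks:
        # Forks by untracked processes are ignored, as in the in-kernel filter.
        if parent in tracked:
            tracked.add(child)
    return tracked


# PID 100 is the traced command; PID 7 is an unrelated process on the host.
print(replay_forks(100, [(100, 101), (101, 102), (7, 8)]))
```

Grandchildren are picked up transitively because each child joins the tracked set before its own forks are replayed.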

In practice: `roar run python train.py` with 100 worker processes Just Works on all three backends, with no configuration.

For perf numbers across backends and workloads, see [Benchmarks](/docs/benchmarks).
