Benchmarks

NOTE: We are actively compiling benchmark stats on the latest roar version. Contact us for more information.

This page reports performance characteristics for the parts of roar whose overhead users actually feel: the tracer backends, the S3 proxy, and the content-hashing pipeline. The goal isn't a single headline number but enough detail to predict overhead on your own workload and to pick configurations that fit your performance budget.

All benchmarks are reproducible; setup notes appear at the end of each section.

Tracers

How much wall-clock overhead each tracer backend adds on top of a baseline command. See Tracers for the backend mechanics and tradeoffs that explain why the numbers differ.
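A minimal harness for reproducing this kind of measurement yourself: time a workload bare, then time it under each tracer backend, and report the relative overhead. The `roar run --tracer=<backend> -- <cmd>` spelling below is an assumption for illustration, not the documented CLI, so the traced runs only execute if a `roar` binary is actually on your `PATH`.

```python
import shutil
import statistics
import subprocess
import sys
import time

def median_wall_clock(argv, trials=5):
    """Run argv `trials` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(argv, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

workload = [sys.executable, "-c", "pass"]
baseline = median_wall_clock(workload)
print(f"baseline: {baseline * 1e3:.1f} ms")

# Only attempt traced runs if a `roar` binary exists. The `run` subcommand
# and `--tracer` flag here are assumptions -- adjust to your actual CLI.
if shutil.which("roar"):
    for backend in ("ebpf", "preload", "ptrace"):
        traced = median_wall_clock(
            ["roar", "run", f"--tracer={backend}", "--"] + workload
        )
        print(f"{backend}: +{(traced - baseline) / baseline:.1%}")
```

Median (rather than mean) keeps a single cold-cache or scheduler-noise outlier from skewing short workloads.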

| Workload | Baseline | eBPF | preload | ptrace |
| --- | --- | --- | --- | --- |
| `cat <small file>` | TODO | TODO (~?%) | TODO (~?%) | TODO (~?%) |
| `python -c "pass"` | TODO | TODO | TODO | TODO |
| `python train.py` (MNIST baseline) | TODO | TODO | TODO | TODO |
| `dd if=/dev/zero of=out bs=1M count=1024` | TODO | TODO | TODO | TODO |

Per-syscall overhead breakdown, per backend:

| Backend | Per-syscall overhead |
| --- | --- |
| eBPF | TODO (in-kernel filtering; cheapest) |
| preload | TODO (libc interposition; very cheap on dynamically-linked binaries) |
| ptrace | TODO (two context switches per syscall; meaningfully higher) |
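Per-syscall cost can be estimated directly with a syscall-dense microbenchmark: issue a tight loop of cheap syscalls, run it bare and then under a tracer, and divide the wall-clock difference by the call count. A sketch using one-byte `read(2)` calls against `/dev/zero` (assumes a Unix-like system):

```python
import os
import time

def read_syscalls_per_second(n=100_000):
    """Issue n one-byte read(2) calls against /dev/zero; return calls/sec.

    Each os.read is a real syscall, so comparing this script bare vs. under
    a tracer (e.g. `strace -e trace=read python bench.py`) exposes the
    per-syscall overhead: per-call cost is 1 / (calls per second).
    """
    fd = os.open("/dev/zero", os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.read(fd, 1)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n / elapsed

rate = read_syscalls_per_second()
print(f"{rate:,.0f} read() calls/sec  ->  {1e9 / rate:,.0f} ns/call")
```

Note that the Python interpreter adds per-iteration overhead of its own; for sub-microsecond precision, the same loop in C gives cleaner numbers, but the relative ranking of backends comes out the same.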

Setup: TODO — hardware, kernel version, what's running, how many trials.

Proxy

How much overhead the S3 proxy adds on top of direct-to-S3 traffic. The proxy stands between an AWS SDK client and S3, re-signs each request, and forwards the result. Measurements should isolate the proxy's own work from network round-trip variability.
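One way to reproduce this comparison is with two boto3 clients, one pointed at S3 directly and one routed through the proxy via boto3's standard `endpoint_url` parameter. The proxy address, bucket, key, and `BENCH_BUCKET` gate below are placeholders, not part of roar's documented setup; the real measurements traced here would fill in the tables that follow.

```python
import os
import statistics
import time

def median_latency_ms(call, trials=10):
    """Time a zero-argument callable and return the median latency in ms."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)

try:
    import boto3  # third-party: pip install boto3
except ImportError:
    boto3 = None

# Only run live requests when explicitly configured via BENCH_BUCKET.
if boto3 is not None and os.environ.get("BENCH_BUCKET"):
    bucket = os.environ["BENCH_BUCKET"]
    direct = boto3.client("s3")
    # endpoint_url is boto3's standard hook for routing SDK traffic through
    # an intermediary; the address below is a placeholder for your proxy.
    proxied = boto3.client("s3", endpoint_url="http://localhost:8080")

    def get(client):
        client.get_object(Bucket=bucket, Key="bench/1kb.bin")

    d = median_latency_ms(lambda: get(direct))
    p = median_latency_ms(lambda: get(proxied))
    print(f"direct {d:.1f} ms, proxied {p:.1f} ms, overhead {p - d:.1f} ms")
```

Using the median over repeated requests, and keeping client, proxy, and bucket in the same region, helps separate the proxy's re-signing work from ordinary network jitter.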

| Operation | Direct-to-S3 | Through proxy | Overhead |
| --- | --- | --- | --- |
| GetObject (1 KB) | TODO | TODO | TODO |
| GetObject (1 MB) | TODO | TODO | TODO |
| GetObject (100 MB) | TODO | TODO | TODO |
| PutObject (1 MB) | TODO | TODO | TODO |
| Multipart upload (1 GB, 8 MB parts) | TODO | TODO | TODO |

Sustained throughput on bulk transfers:

| Transfer pattern | Direct | Through proxy |
| --- | --- | --- |
| Single-stream GetObject | TODO MB/s | TODO MB/s |
| Multi-stream (16-way) GetObject | TODO MB/s | TODO MB/s |

Setup: TODO — bucket region vs. host region, network class, MinIO local vs. real-AWS, etc.

Hashes

Throughput of the supported hash algorithms on the same hardware, measured on disk-backed reads.
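A sketch of the single-threaded measurement using Python's standard `hashlib` (which covers sha256, sha512, and md5; blake3 is not in the standard library and would need the third-party `blake3` package). It hashes in-memory data, which isolates the hash itself; disk-backed numbers additionally depend on whether the file is page-cached.

```python
import hashlib
import time

def hash_throughput_gbs(algorithm, size_mib=64, chunk=1 << 20):
    """Hash size_mib MiB of in-memory zeros; return throughput in GB/s.

    In-memory input measures the hash alone. For disk-backed figures,
    read from a file instead and record whether it was page-cached.
    """
    h = hashlib.new(algorithm)
    buf = b"\x00" * chunk
    start = time.perf_counter()
    for _ in range(size_mib):
        h.update(buf)
    elapsed = time.perf_counter() - start
    return size_mib * chunk / elapsed / 1e9

for algo in ("sha256", "sha512", "md5"):
    print(f"{algo}: {hash_throughput_gbs(algo):.2f} GB/s")
```

Feeding the data in 1 MiB chunks rather than one giant buffer keeps memory flat and matches how a hashing pipeline typically streams file contents.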

| Algorithm | Throughput (single-threaded) | Throughput (SIMD / multi-threaded) |
| --- | --- | --- |
| blake3 | TODO GB/s | TODO GB/s |
| sha256 | TODO GB/s | n/a (doesn't parallelize across a single input) |
| sha512 | TODO | n/a (doesn't parallelize across a single input) |
| md5 | TODO | n/a (doesn't parallelize across a single input) |

How that translates to wall-clock cost for typical artifact sizes:

| File size | blake3 | sha256 |
| --- | --- | --- |
| 10 MB | TODO | TODO |
| 100 MB | TODO | TODO |
| 1 GB | TODO | TODO |
| 10 GB | TODO | TODO |

Setup: TODO — CPU, disk class, whether the file is page-cached or cold, hashing library versions.

Where to look next

  • Tracers — backend tradeoffs that explain the tracer overhead numbers.
  • Proxy — the S3 proxy's request-signing path that drives the overhead measurements.
  • Hashes — why blake3 is default, and when sha256 is worth computing alongside.