## The short version

Most teams do not have a single system that answers lineage questions end to end. They piece the answer together from logs, notebooks, experiment trackers, tickets, and memory. That is usually enough while the run is fresh and the original author is still around. It breaks down when, months later, you need to say with confidence what *actually* produced a model or dataset.

`roar` and GLaaS are for that gap.

- `roar` observes what a run actually read and wrote.
- GLaaS registers that lineage under stable hashes so you can search it, inspect it, and reproduce it later.
- The result is a run-grounded lineage graph, not just a collection of notes and run records.
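
To make "stable hashes" concrete, here is a minimal sketch of content-addressing the files a run touched. It is an illustration only, not the `roar` or GLaaS API: the `content_hash` and `lineage_record` helpers, the record shape, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib


def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lineage_record(command, inputs, outputs):
    """Hypothetical record shape: one run, keyed by the content of what it read and wrote."""
    return {
        "command": command,
        "inputs": {str(path): content_hash(path) for path in inputs},
        "outputs": {str(path): content_hash(path) for path in outputs},
    }
```

Because the hash depends only on file content, a renamed or moved copy of the same dataset resolves to the same identity, while a silently edited file does not. That property is what makes a lineage record searchable long after paths and filenames have changed.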

## Compared to logs and experiment trackers

| Approach | Good at | Where it breaks down | What roar + GLaaS adds |
| --- | --- | --- | --- |
| Plain logs, notebooks, and READMEs | Human context, debugging notes, quick operational breadcrumbs | They describe intent, not a verified dependency graph. They drift, omit intermediate artifacts, and are hard to audit after the fact. | Hash-grounded lineage for the actual run: which inputs were read, which outputs were written, and how they connect. |
| Experiment trackers like W&B / MLflow / Neptune | Metrics, run comparison, dashboards, hyperparameters, links to training runs | They usually do not capture the full upstream data lineage. They describe the run record, not every data dependency that fed the final artifact. | The upstream artifact graph behind the run. GLaaS links out to experiment trackers rather than replacing them. |

## The practical distinction

The key distinction is **possible lineage** versus **actual lineage**.

- Code, configs, and naming conventions tell you what *should* have happened.
- `roar` records what *did* happen in a specific run.

That difference matters when:

- the same script can run on different subsets or parameters,
- transformations happened interactively or through shell glue,
- the original author is gone,
- or you need to defend an answer during an audit instead of giving your best reconstruction.
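
The gap between possible and actual lineage can be pictured as a set comparison between what the code or config declares and what a run was observed to touch. The sketch below is illustrative only: `lineage_drift` and its field names are hypothetical, and it assumes you already have both lists of paths from somewhere.

```python
def lineage_drift(declared_inputs, observed_inputs):
    """Compare inputs a pipeline claims to use with inputs a run actually read.

    Both arguments are iterables of path strings. The function name and the
    keys of the returned dict are hypothetical, chosen for this example.
    """
    declared = set(declared_inputs)
    observed = set(observed_inputs)
    return {
        # Read by the run but never mentioned in code or config.
        "undeclared_reads": sorted(observed - declared),
        # Mentioned in code or config but never read in this run.
        "unused_declarations": sorted(declared - observed),
    }


drift = lineage_drift(
    declared_inputs=["data/train.csv", "data/labels.csv"],
    observed_inputs=["data/train.csv", "data/train_subset_v2.csv"],
)
print(drift)
```

An empty result in both fields means the declared story and the observed run agree. Anything else is exactly the kind of detail that is easy to miss when lineage is reconstructed from conventions alone.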

## Why logs are not enough

Logs are useful when you already know where to look. They are a poor basis for reconstructing lineage across multiple steps, files, and tools.

- They tell you what a script printed, not necessarily what it consumed.
- They depend on human discipline: somebody had to log the right thing at the right moment.
- They rarely give you a navigable graph from final artifact back to upstream inputs.

That is why teams often feel fine during development but get stuck on the "prove it six months later" question.
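
For contrast, here is a rough sketch of what a "navigable graph" buys you. It assumes a hypothetical mapping from each artifact to the inputs of the job that produced it; the `upstream` function and the artifact names are invented for illustration and do not reflect how GLaaS stores or queries lineage.

```python
from collections import deque


def upstream(artifact, produced_from):
    """Return every upstream dependency of an artifact, nearest first.

    produced_from maps an artifact identifier to the list of inputs that the
    producing job read. Artifacts with no entry are treated as raw sources.
    """
    seen = set()
    ordered = []
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        for parent in produced_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


# Hypothetical two-stage pipeline: raw data -> features -> model.
edges = {
    "model.pt": ["features.parquet", "train.py"],
    "features.parquet": ["raw.csv", "prep.sh"],
}
print(upstream("model.pt", edges))
```

With a structure like this, "what fed the final model?" becomes a traversal rather than a search through log files from several tools.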

## Why experiment trackers are not enough on their own

Experiment trackers solve a different problem well. They help you compare runs, inspect metrics, track hyperparameters, and keep a dashboard of training results.

What they usually do **not** give you is the full upstream lineage behind an artifact.

- You may know the run ID, metric chart, and config.
- You may not know the exact chain of intermediate artifacts and producer jobs behind the final model.
- You may still need team conventions or manual structure to connect one run to the next stage cleanly.

This is why GLaaS is complementary to W&B, MLflow, and Neptune, not a replacement for them. See [Experiment Tracking](/docs/experiment-tracking) for the integration model.

## What this page should help you decide

This page is not arguing that logs and experiment trackers are bad. It is answering a narrower question:

> If we already have logs and an experiment tracker, what is still missing?

The missing layer is a trustworthy record of what the run actually read, wrote, and produced, plus a way to walk that graph later from the artifact you care about.

If your current stack already answers that question end to end, you may not need GLaaS for this part of the problem. If it does not, this is the gap `roar` and GLaaS are designed to close.

GLaaS is a strong fit when:

- you can answer "which model artifact?" but cannot reliably answer "what exact data and code produced it?",
- your team mixes Python, shell, notebooks, and tracker dashboards,
- you want lineage without first migrating everything onto a heavier platform,
- or you need to revisit results months later without relying on memory.

If your current platform already gives you that end-to-end answer, treat this page as a scope check: GLaaS addresses the lineage layer and does not try to replace your logging or experiment tracking workflow.

## Where to look next

- [Why roar + GLaaS?](/docs/why-roar-glaas) — the design philosophy behind observation-first lineage.
- [Experiment Tracking](/docs/experiment-tracking) — how GLaaS coexists with W&B / MLflow / Neptune.
- [Use Cases](/docs/use-cases) — concrete workflows like "what changed?" and "prove which dataset trained this model".
- [Scopes](/docs/scopes) — what is globally discoverable and what stays private.
