Loom is in pilot intake · these docs cover the full data-science lifecycle CLI

The data model

Loom's input is a Metaflow data object — a Metaflow Artifact referenced by pathspec (e.g. IngestDataset/123) and read only through the Metaflow Client API. There is exactly one boundary where outside data crosses in — loom ingest — and from then on every verb threads references, never the data itself.

Your input is a Metaflow data object

Loom never invents its own storage format. The unit of data is a Metaflow Artifact — the same versioned, content-addressed object Metaflow produces from any run — identified by a pathspec like IngestDataset/123. Wherever a verb takes data, it takes that pathspec as its --dataset:

loom eda --dataset IngestDataset/123 --target is_fraud

An ingested data object is a small, well-defined set of artifacts: train and optional test (DataFrames) plus a schema dict. Loom reads them only through the Metaflow Client API — concretely metaflow.Run(pathspec).data.<artifact> — and that read happens in exactly one module (loom/dataio.py), the single data door for the whole engine.
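The single-door read described above might be sketched as follows. This is a minimal illustration, not the contents of loom/dataio.py: the helper name load_data_object, its return shape, and the run_factory parameter are assumptions added for the example. It relies only on the Metaflow Client API's Run(pathspec).data accessor, taken here to expose the artifacts of the run's final step.

```python
from typing import Any, Callable, Dict, Optional


def load_data_object(
    pathspec: str,
    run_factory: Optional[Callable[[str], Any]] = None,
) -> Dict[str, Any]:
    """Resolve a run pathspec such as 'IngestDataset/123' into its artifacts.

    Hypothetical helper: the name and return shape are assumptions,
    not Loom's actual API.
    """
    if run_factory is None:
        # Imported lazily so the sketch can be exercised without a datastore.
        from metaflow import Run

        run_factory = Run

    data = run_factory(pathspec).data
    if data is None:
        raise LookupError(f"{pathspec} has no readable artifacts yet")

    return {
        "train": data.train,
        # 'test' is optional in an ingested data object.
        "test": data.test if "test" in data else None,
        "schema": data.schema,
    }
```

Keeping every read behind one function like this means the rest of the engine only ever handles pathspecs; run_factory exists purely so the door can be tested with a stand-in run object.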

A pathspec is the whole identity of the data — versioned and stable. The same IngestDataset/123 resolves to the same object whether Loom runs locally on your laptop today or in a cluster later. You pass the pathspec around; Loom resolves the bytes on demand.
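Because the pathspec is the whole identity, it can be validated before anything touches the datastore. The sketch below assumes the run-level form FlowName/run_id used throughout this page; RunPathspec and parse_run_pathspec are hypothetical names, not part of Loom or Metaflow.

```python
from typing import NamedTuple


class RunPathspec(NamedTuple):
    """A run-level reference, e.g. IngestDataset/123 (hypothetical type)."""

    flow_name: str
    run_id: str


def parse_run_pathspec(pathspec: str) -> RunPathspec:
    """Split 'FlowName/run_id' into its two parts, rejecting anything else."""
    parts = pathspec.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'FlowName/run_id', got {pathspec!r}")
    return RunPathspec(flow_name=parts[0], run_id=parts[1])
```

Parsing involves no I/O, so the same check behaves identically on a laptop and in a cluster.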

The datastore is opaque

Where those artifacts physically live — a local store, or object storage (S3 / minio) — is an implementation detail that Metaflow owns. You configure it once in your Metaflow profile/environment (METAFLOW_PROFILE / METAFLOW_*); after that, Loom is agnostic to it.
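To see which Metaflow settings the current shell would hand to Loom, a small inspection helper can list the METAFLOW_* variables. This is an illustrative sketch only: metaflow_settings and the masking rule are assumptions, and it reads the environment only, not any profile file.

```python
import os
from typing import Dict, Mapping, Optional

# Substrings that suggest a value should not be printed (assumed heuristic).
SECRET_HINTS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def metaflow_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return METAFLOW_* variables, masking values that look like credentials."""
    env = os.environ if environ is None else environ
    settings: Dict[str, str] = {}
    for name in sorted(env):
        if not name.startswith("METAFLOW_"):
            continue
        looks_secret = any(hint in name for hint in SECRET_HINTS)
        settings[name] = "***" if looks_secret else env[name]
    return settings
```

Calling metaflow_settings() with no arguments inspects the live environment; passing a mapping makes the helper easy to exercise in isolation.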

In local dev the datastore is a minio bucket you can browse at the minio console (http://localhost:9001) — that's where ingested data objects and run artifacts land. See Getting started for standing it up, and Configuration & providers for the profile knobs.

One boundary in: loom ingest

There is exactly one place where data from outside Loom crosses into the model: loom ingest. It runs the IngestDataset flow once to turn a local directory or CSV into Metaflow artifacts, and prints the resulting pathspec: the dataset_ref every downstream verb will use. ingest is the boundary tier; loom datasets, which lists what has been ingested, is read-only.

# Ingest once — a local dir/CSV becomes a Metaflow data object; prints the dataset_ref.
loom ingest --source ./your_task --name my-dataset   # -> dataset_ref : IngestDataset/123

# See what's ingested (read-only, via the Client API).
loom datasets

The source shape is small and predictable. A directory source contains train.csv (plus optional test.csv / sample_submission.csv); a single .csv is split for you. From there, solutions read from ./input/ and write predictions to ./working/submission.csv — the same on-disk workspace shape no matter where the bytes physically came from.
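The source shape described above can be checked before running loom ingest. The following is a minimal sketch under that description only; describe_source and its return format are hypothetical, and Loom's own validation may differ.

```python
from pathlib import Path
from typing import Dict, List, Union

REQUIRED_FILE = "train.csv"
OPTIONAL_FILES = ("test.csv", "sample_submission.csv")


def describe_source(source: Union[str, Path]) -> Dict[str, Union[str, List[str]]]:
    """Classify an ingest source as a directory or a single CSV (hypothetical helper)."""
    path = Path(source)
    if path.is_file():
        if path.suffix.lower() != ".csv":
            raise ValueError(f"single-file source must be a .csv: {path.name}")
        return {"kind": "single_csv", "files": [path.name]}
    if path.is_dir():
        if not (path / REQUIRED_FILE).is_file():
            raise ValueError(f"directory source is missing {REQUIRED_FILE}")
        present = [name for name in OPTIONAL_FILES if (path / name).is_file()]
        return {"kind": "directory", "files": [REQUIRED_FILE] + present}
    raise FileNotFoundError(str(path))
```

A pre-flight check like this fails fast on a malformed source, before any flow is started.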

Don't run a verb against a raw store, a file path mid-analysis, or a freshly fetched MCP source. MCP data tools locate and fetch; the discipline is then to loom ingest the result into a Metaflow data object and work the pathspec. Everything past ingest is a reference, and references are what make composition safe.

Verbs thread references, not data

Once data is a pathspec, the verbs hand off to each other by passing references — pathspecs, card_paths, experiment-ids — never the data. A verb that writes (like features) produces a new data object with its own pathspec, and that pathspec becomes the --dataset for the next verb.

# Build engineered features into a NEW data object; --from drops eda-flagged leakage columns.
loom features --dataset IngestDataset/123 --target target --from EdaFlow/7   # -> FeaturesFlow/9

# The new FeaturesFlow/9 pathspec is the --dataset for every downstream verb.
loom validate --dataset FeaturesFlow/9 --target target

# Promotion asserts the upstream validate VERDICT by reference — never by re-reading data.
loom deploy --validate ValidateFlow/12 --apply

This is how the lifecycle composes: a features data object feeds pipeline / validate; a validate VERDICT==PASS is what deploy asserts before it will promote (a sub-threshold validate BLOCKS deploy). The check happens on the reference and the typed summary, not on the bytes.
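The gate described above can be pictured as a check on a small typed summary rather than on the data. The sketch below is an assumption about shape only: ValidateSummary, DeployBlocked, and assert_deployable are hypothetical names, and the real summary fields are not specified on this page.

```python
from dataclasses import dataclass


class DeployBlocked(Exception):
    """Raised when the referenced validate run did not pass (hypothetical)."""


@dataclass(frozen=True)
class ValidateSummary:
    pathspec: str  # reference to the validate run, e.g. "ValidateFlow/12"
    verdict: str   # "PASS", or any other value for a sub-threshold run


def assert_deployable(summary: ValidateSummary) -> str:
    """Return the validate pathspec if its verdict is PASS, else block."""
    if summary.verdict != "PASS":
        raise DeployBlocked(
            f"{summary.pathspec} has verdict {summary.verdict!r}; deploy is blocked"
        )
    return summary.pathspec
```

Nothing in the gate reads a DataFrame: the decision depends only on the reference and the verdict attached to it.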

| You pass | What it is | Example |
| --- | --- | --- |
| --dataset | a Metaflow data-object pathspec | IngestDataset/123, FeaturesFlow/9 |
| --run / --solution | a run pathspec (a candidate / flow run) | EvalCandidate/42 |
| --validate | a validate run whose VERDICT gates deploy | ValidateFlow/12 |
| --from | an upstream run to compose on (e.g. drop leakage cols) | EdaFlow/7 |
| --experiment | a grouping id for report/lineage | loom-abc123 |

The privacy line: bulk data never reaches the LLM

The most load-bearing consequence of the data model is a privacy line: your bulk data never reaches the LLM. Datasets and transactions live as Metaflow data objects in your datastore/perimeter; candidate code processes them there. The model only ever sees small derived context — schema, a preview, code, metrics — never raw rows, never keys.

Keep it that way with prompt hygiene: don't paste large data or log blobs into context. If fetched material is data to model, bring it in via loom ingest — a Metaflow data object — never by streaming it through chat.
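One way to make that hygiene mechanical is to build model context from derived values only and to enforce a size budget. The sketch below is an illustration, not Loom's implementation: build_llm_context and the MAX_CONTEXT_CHARS value are assumptions.

```python
import json
from typing import Dict

# Assumed budget for illustration; a real limit depends on the model in use.
MAX_CONTEXT_CHARS = 4000


def build_llm_context(schema: Dict[str, str], metrics: Dict[str, float], n_rows: int) -> str:
    """Serialize small derived context; the signature accepts no row data at all."""
    context = {"schema": schema, "metrics": metrics, "n_rows": n_rows}
    text = json.dumps(context, sort_keys=True)
    if len(text) > MAX_CONTEXT_CHARS:
        raise ValueError(
            f"context is {len(text)} chars, over the {MAX_CONTEXT_CHARS} budget; "
            "summarize further instead of sending it"
        )
    return text
```

The function takes a schema, metrics, and a row count but never the rows themselves, so bulk data cannot leak through it by accident.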

Local vs. Metaflow input

Loom has two execution paths, and they differ only in how the input is named — both converge on the same ./input workspace shape so a solution reads it identically.

| | --mlops local | --mlops metaflow (default) |
| --- | --- | --- |
| Input | a local --data directory | a Metaflow data object (--dataset <pathspec> from loom ingest) |
| Data | local | a Metaflow artifact; the datastore (local or S3/minio) stays in your perimeter, opaque to Loom |
| Covers | the loom run search only | the whole lifecycle (eda, features, validate, deploy, …) |

The local path is the on-ramp for a quick "does it work" check with no datastore. The lifecycle verbs need --mlops metaflow: they run versioned flows, and each candidate becomes a real Metaflow run rather than an in-process one. The dataset_ref pathspec is the through-line either way.
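Because both paths converge on the same workspace shape, a solution can be written against ./input and ./working alone. The sketch below is a placeholder baseline under stated assumptions: it assumes input/test.csv is present and has an id column, and the id_column parameter, the prediction column name, and the constant prediction are all hypothetical.

```python
import csv
from pathlib import Path


def run_solution(workspace: Path, id_column: str = "id") -> Path:
    """Read input/test.csv and write working/submission.csv with a constant prediction."""
    test_path = workspace / "input" / "test.csv"
    out_path = workspace / "working" / "submission.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)

    with open(test_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow([id_column, "prediction"])
        for row in reader:
            # Placeholder prediction; a real solution would fit on input/train.csv.
            writer.writerow([row[id_column], 0])
    return out_path
```

Nothing in the function knows whether the files under input/ came from a local --data directory or were materialized from a Metaflow data object.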

The same IngestDataset seam is reused at corpus scale: loom telemetry export --to-dataset loom-ds-1 ingests the trajectory corpus through the identical boundary, so it becomes a durable, versioned, lossless Metaflow data object with its own pathspec. One ingest door, used everywhere data enters.