Lifecycle & verbs

Loom is an agentic CLI for the full data-science lifecycle: a catalog of verbs that span data access, EDA, feature engineering, model search and training, validation, viz, reporting, deployment, ops, and collaboration. Each verb runs through Loom's MLOps interface, produces a versioned Metaflow run, an @card, and a typed JSON summary with a VERDICT/status line, and declares an approval tier that is enforced in code, beneath the model. Modeling is one slice of this catalog, fully included and never out of scope.

The verb catalog

Every verb is both a loom <verb> command you can run directly and a /loom-<verb> workflow inside the agent (intake → plan/tier → run → verify → deliver). The agent drives these exact verbs, so a natural-language conversation can never do something a verb can't. The noun is almost always a single --dataset pathspec (e.g. IngestDataset/123); a small meta flag set (--from, --target, --goal, --metric) does the rest.

The tier column uses Loom's four approval tiers: read-only never prompts; workspace-write writes only to the workspace/Metaflow; and the two gated tiers, expensive and irreversible/external, cover the actions that always require an explicit confirmation.

| Verb | What it does | Tier |
| --- | --- | --- |
| `connect` / `ingest` / `datasets` | Bring outside data in as a Metaflow data object at the single `loom ingest` boundary; yields a `dataset_ref` pathspec. `datasets` lists what's ingested via the Client API. | read-only / workspace-write |
| `eda` | Profile a data object: shape, dtypes, missingness, target balance, top correlations, and leakage flags. | read-only |
| `viz` | Standard lineage-grounded plots from a data object or a run (distributions, correlation, the leaderboard), emitted as `@card` images. | read-only |
| `features` | Engineer features into a new data object (leakage-aware). Compose with `eda` via `--from`: the EDA-flagged leakage columns are dropped first. | workspace-write |
| `optimize` (`loom run`) | The AIDE search: propose, run, and score candidate solutions against your metric. The metric is the spec; it streams a leaderboard as it fills. | expensive |
| `validate` | Rigorous evaluation: purged K-fold CV + a sealed holdout + calibration + per-slice/fairness (with `--sensitive`) + leakage. Emits the `VERDICT` that `deploy` asserts. | workspace-write |
| `pipeline` | The whole lifecycle as one gated run (profile → features → a bounded optimize → validate), with each stage asserting the prior `VERDICT`. | workspace-write → expensive |
| `train` | Build a model through the model-builder seam (default NeMo): pretrain a backbone, build embeddings, or finetune. Stated in DS vocabulary (`--objective`, `--budget`, `--backbone`). | expensive · always-gate |
| `deploy` | Promote a validated solution. Asserts the upstream `validate VERDICT == PASS`; plans a staged manifest by default, mutates only with `--apply`. | irreversible / external |
| `ops` | Monitor run health, the leaderboard, and a data-object drift check (a current dataset vs. a `--reference`). | read-only |
| `report` | Assemble an experiment's runs + metrics + lineage into a model-card / report. | read-only |
| `collab` | Assemble a sanitized, shareable bundle (card + lineage). Builds by default; the off-box send is behind `--send`. | workspace-write (send is external) |
| `doctor` | Diagnose the local Loom + Metaflow datastore stack (PASS/WARN/FAIL per check + a one-line `VERDICT`); run first when anything reports a setup/datastore failure. | read-only |

The read-only and most workspace-write verbs (eda, features, validate, viz, report, ops, datasets, doctor, and the local train stand-in) work without a model key. Only the natural-language planning and the AIDE search (optimize / pipeline) need one — a missing key yields an actionable line (set ANTHROPIC_API_KEY … or pick a --model-provider), never a traceback.
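
The key requirement above can be pictured as a preflight check that runs before a verb starts. The following is a minimal sketch, not Loom's actual code: the `KEY_REQUIRED` set, the `preflight` helper, and the exact message wording are hypothetical, and the sketch ignores the `--model-provider` alternative. Only the verb names and the `ANTHROPIC_API_KEY` variable come from the text.

```python
import os

# Verbs that need a model key, per the paragraph above.
# The constant name is hypothetical; `run` is the CLI spelling of `optimize`.
KEY_REQUIRED = {"optimize", "run", "pipeline"}


def preflight(verb, env=None):
    """Return None if the verb may start, else a one-line actionable message."""
    env = os.environ if env is None else env
    if verb in KEY_REQUIRED and not env.get("ANTHROPIC_API_KEY"):
        # Return a message instead of raising, so the user sees a fix,
        # not a traceback. The wording here is illustrative only.
        return (f"{verb}: no model key found; "
                "set ANTHROPIC_API_KEY or pick a --model-provider")
    return None
```

Keyless verbs such as `eda` pass through untouched, so the check costs nothing on the read-only path.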

Running a verb directly

Everything the agent does is a verb you can run yourself and script; every verb except the AIDE search (optimize / pipeline) also runs without a model key. Pass --json to get the structured contract that CI and the examples eval bed assert on:

# Profile a data object — read-only, no key, no prompt.
loom eda --dataset IngestDataset/123 --target target

# Build leakage-aware features into a NEW data object, dropping eda-flagged columns.
loom features --dataset IngestDataset/123 --target target --from EdaFlow/7 --recipe minimal

# The AIDE search — the engine behind /loom-optimize. The metric is the spec.
loom run --dataset IngestDataset/123 \
  --goal   "Predict the target for each row in test.csv." \
  --metric "Maximize ROC-AUC on a held-out split." \
  --steps 10
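
For scripting, the same commands can be driven from Python. This is a hedged sketch under stated assumptions: `build_argv` and `run_verb` are hypothetical helpers, `loom` is assumed to be on `PATH`, and `--json` is assumed to write a single JSON document to stdout. The verbs and flags themselves are the ones shown above.

```python
import json
import subprocess


def build_argv(verb, dataset, **flags):
    """Build the argv for `loom <verb> --dataset <pathspec> ... --json`."""
    argv = ["loom", verb, "--dataset", dataset]
    for name, value in flags.items():
        # `from` is a Python keyword, so callers pass `from_`;
        # strip the trailing underscore to recover the CLI flag `--from`.
        argv += ["--" + name.rstrip("_").replace("_", "-"), str(value)]
    return argv + ["--json"]


def run_verb(verb, dataset, **flags):
    """Run a verb and return (exit_code, summary or None)."""
    proc = subprocess.run(build_argv(verb, dataset, **flags),
                          capture_output=True, text=True)
    try:
        # ASSUMPTION: stdout holds exactly one JSON document with --json.
        summary = json.loads(proc.stdout)
    except json.JSONDecodeError:
        summary = None  # e.g. a setup failure that printed no summary
    return proc.returncode, summary
```

A call such as `run_verb("features", "IngestDataset/123", target="target", from_="EdaFlow/7")` would then mirror the second command above.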

Modeling — optimize and train

Two verbs cover the modeling slice, and the split between them is a deliberate division of labor enforced down at the provider layer — not a user concern.

optimize (loom run): the AIDE search. It proposes candidate solutions, runs each one for real, scores it against your metric, and returns a leaderboard. Use it for the cheap, searchable work: framing, features, model heads, tokenization schemes, or anything else you can iterate on against a scalar metric.
train: the model-builder. It runs the /loom-train verb against the model-builder interface (default NeMo AutoModel). You state intent in DS vocabulary (--objective {next-event|masked-field|contrastive}, --budget {probe|small|full}, --backbone, --metric) and the adapter lowers that to backend config; no NeMo/Megatron/.nemo nouns ever surface. A heavy pretrain is launch-and-track, so the AIDE search never tree-searches it.

# Default backend is nemo. With no GPU target it REFUSES cleanly — shows the cost
# plan and tells you what to set. This is the safe default; it never launches.
loom train --dataset IngestDataset/123 --objective next-event --budget full
#   cost (gate) : budget=full: 8 GPU x 12 h = 96 GPU-hours (~$288)
#   STATUS      : REFUSED_NO_GPU_TARGET   (set LOOM_GPU_TARGET, or use the CPU stand-in)

# The torch-free CPU `local` stand-in builds a real backbone end-to-end — no GPU,
# sub-2s, deterministic (a PPMI+SVD embedding model; great for dev + CI).
LOOM_MODEL_BUILDER_PROVIDER=local \
  loom train --dataset IngestDataset/123 --objective next-event --budget probe
#   STATUS      : BUILT   (-> a backbone pathspec + @card)

The model-builder hides vocabulary, not physics. --budget full still means real GPU-hours, surfaced at the gate as a cost PLAN (GPU-count · hours · $). The real heavy launch is OFF by default behind --launch; to use it, point LOOM_GPU_TARGET at an on-demand GPU (Modal is the default launcher). Your laptop stays the control plane.
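
The cost PLAN is plain arithmetic: GPU count times hours gives GPU-hours, and GPU-hours times a rate gives dollars. The sketch below reproduces the gate line from the example above. The helper names are hypothetical, and the rate of $3 per GPU-hour is only inferred from that example ($288 / 96 GPU-hours); it is not a documented Loom price.

```python
def cost_plan(gpus, hours, usd_per_gpu_hour):
    """Return the GPU-hours and dollar estimate for a training budget."""
    gpu_hours = gpus * hours
    return {"gpu_hours": gpu_hours, "usd": gpu_hours * usd_per_gpu_hour}


def format_gate_line(budget, gpus, hours, usd_per_gpu_hour):
    """Render the plan in the same shape as the gate line shown above."""
    plan = cost_plan(gpus, hours, usd_per_gpu_hour)
    return (f"budget={budget}: {gpus} GPU x {hours} h = "
            f"{plan['gpu_hours']} GPU-hours (~${plan['usd']:.0f})")


# Rate of 3.0 USD per GPU-hour is inferred from the example, not documented.
print(format_gate_line("full", 8, 12, 3.0))
# budget=full: 8 GPU x 12 h = 96 GPU-hours (~$288)
```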

Approval tiers

Every verb declares one of four tiers. The gate is enforced by the client/hook layer in code — not by prompt text — so the boundary holds even if the model is wrong. The governing principle is opinionated low, permissive high: most cautious at compute/data/spend, most deferential at modeling choices.

| Tier | Verbs | Gate behavior |
| --- | --- | --- |
| read-only | `eda`, `viz`, `report`, `ops`, `datasets`, `doctor`, `validate` | Never prompts. Inspect data and runs freely; network off by default. |
| workspace-write | `ingest`, `features`, `pipeline` (build), `collab` (build) | Light / auto: runs with a one-line auto note. Writes only to the workspace/Metaflow, never to the source data. |
| expensive / mutate | `run` (`optimize`), `pipeline`'s optimize stage, `train` (GPU) | Always gates: the cost / rows are shown before it fires; confirm when the cost or data boundary is large. |
| irreversible / external | `deploy --apply`, `train --launch`, `collab --send` | Always gates, never model-auto-invoked. The model proposes; only you can fire it. A deny-first `y/N` confirm is required before the handler runs. |

The irreversible actions are safe by default: deploy produces a PLAN + staged manifest unless you pass --apply (and the gate must ALLOW); collab builds a bundle on-box unless you pass --send. The real GPU train --launch follows the same posture.
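
One way to picture a gate that lives in code rather than in prompt text is a small deny-first function keyed on the tier. This is an illustrative sketch, not Loom's hook layer: the `Tier` enum, the `gate` function, and the note strings are hypothetical. It also simplifies the expensive tier by always asking for confirmation, whereas the table says confirmation applies when the cost or data boundary is large.

```python
from enum import Enum


class Tier(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    EXPENSIVE = "expensive / mutate"
    IRREVERSIBLE = "irreversible / external"


def gate(tier, invoked_by_model, user_confirmed=False):
    """Return (allowed, note). Deny-first: risky tiers need an explicit yes."""
    if tier is Tier.READ_ONLY:
        return True, ""  # never prompts
    if tier is Tier.WORKSPACE_WRITE:
        return True, "auto: writes only to the workspace"
    if tier is Tier.IRREVERSIBLE and invoked_by_model:
        # The model may propose the action, but only the user can fire it.
        return False, "denied: not model-invocable"
    if not user_confirmed:
        return False, "denied: explicit confirmation required"
    return True, "confirmed by user"
```

Because the decision depends only on the tier and on who is asking, a wrong or over-eager model cannot talk its way past it.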

⚠

The irreversible/external actions are not offered to the agent automatically. You reach them only by explicitly asking, via the /loom-<verb> command — and even then the harness requires a confirmation before the action runs. Spend and promotion always stay with you.

Composition

Verbs compose by artifact hand-off + machine-checkable exit gates: each verb returns a typed --json summary with a VERDICT/status, and the next verb asserts the prior outcome before it runs. References (pathspec / card_path / experiment-id) thread between verbs — never the data itself. Three gate-asserts are load-bearing:

EDA leakage flags block features. Before building features, Loom reads the prior eda result's summary.leakage_flags and drops the flagged columns first — pass the eda run via --from EdaFlow/7. The new FeaturesFlow/<id> pathspec is then a --dataset for every downstream verb.
A validate VERDICT == PASS is required by deploy. deploy asserts the referenced validate result before it will promote anything.
A sub-threshold validate BLOCKS deploy. A REVIEW / FAIL / leaky validation stops the whole chain; it is surfaced plainly, never worked around.

# 1. Profile -> read leakage_flags from the eda result.
loom eda --dataset IngestDataset/123 --target target          # -> EdaFlow/7

# 2. Features ASSERTS eda's leakage flags and drops those columns first.
loom features --dataset IngestDataset/123 --from EdaFlow/7   # -> FeaturesFlow/9

# 3. Validate emits the VERDICT that gates promotion.
loom validate --dataset FeaturesFlow/9 --target target       # -> ValidateFlow/12, VERDICT: PASS

# 4. Deploy ASSERTS validate VERDICT==PASS. PLAN-only by default; --apply mutates.
loom deploy --validate ValidateFlow/12                        # PLAN only (no mutation)
loom deploy --validate ValidateFlow/12 --apply                # real action (gate must ALLOW)
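
The gate-asserts behind these steps can be sketched as two small checks over the typed summaries. Treat this as an assumption-laden illustration: `drop_leakage_columns`, `GateError`, and `assert_deployable` are hypothetical names, `summary.leakage_flags` is assumed to be a list of column names, and the top-level `VERDICT` key is a guess at the JSON shape.

```python
class GateError(RuntimeError):
    """Raised when a prior verb's outcome blocks the next verb."""


def drop_leakage_columns(columns, eda_result):
    """Drop the columns the eda result flagged before building features."""
    # `summary.leakage_flags` is named in the text; treating it as a list
    # of column names is an assumption.
    flagged = set(eda_result.get("summary", {}).get("leakage_flags", []))
    return [c for c in columns if c not in flagged]


def assert_deployable(validate_result):
    """Refuse promotion unless the referenced validate VERDICT is PASS."""
    verdict = validate_result.get("VERDICT")  # key name is an assumption
    if verdict != "PASS":
        raise GateError(
            f"deploy blocked: validate VERDICT is {verdict!r}, need 'PASS'")
```

Only references and summaries pass between the checks, which matches the hand-off rule: the data itself never threads through the chain.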

The pipeline verb bakes this composition into one gated run: profile → features → a bounded optimize → validate, each stage asserting the prior VERDICT (leakage blocks features; a sub-threshold validate marks the run FAIL).

⚠

Read the exit code, not just the text. Exit 1 still returns a well-formed result (e.g. a VERDICT == FAIL from validate); that is a domain outcome to compose on, not a crash. Exit 2 is setup/bad-args (e.g. the datastore is absent): run loom doctor and check the Metaflow setup; don't retry blindly.
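
A script that composes verbs can branch on the exit code before it reads any text. The sketch below encodes the two codes described above. The function names and returned labels are hypothetical, and treating exit 0 as success follows the usual CLI convention rather than anything stated here.

```python
def classify_exit(code):
    """Map a Loom exit code to a coarse category."""
    if code == 0:
        return "ok"              # ASSUMPTION: conventional success code
    if code == 1:
        return "domain-outcome"  # well-formed result, e.g. VERDICT == FAIL
    if code == 2:
        return "setup-error"     # bad args or missing datastore
    return "unknown"


def next_action(code):
    """Suggest what a calling script should do next."""
    kind = classify_exit(code)
    if kind == "ok":
        return "continue to the next verb"
    if kind == "domain-outcome":
        return "read the summary and stop the chain"
    if kind == "setup-error":
        return "run loom doctor, then fix the setup before retrying"
    return "inspect the output manually"
```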

Driving the happy path: `/loom-auto`

You don't have to thread the verbs by hand. Inside the agent, /loom-auto orchestrates the whole happy path for you — eda → features → optimize → validate → report — threading each artifact between steps and asserting each VERDICT before composing the next. It gates only at the expensive optimize step, and it never auto-fires deploy or collab --send. Each verb still plans, gates on cost/data, calls the interface, and narrates a lineage-grounded result.

loom                                # open the agent
loom> /loom-auto                         # eda -> features -> optimize -> validate -> report
loom> /loom-auto --dataset IngestDataset/123 --goal "predict churn"

So the irreversible promotion and any off-box send stay deliberate: /loom-auto takes you all the way to a validated, reported result, and then leaves the irreversible / external step — deploy --apply or collab --send — for you to fire explicitly. Spend is gated once, at optimize; promotion is never automatic.
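
The orchestration can be sketched as a loop over the happy-path verbs that threads a reference forward, asks for confirmation once at optimize, and stops at the first outcome that is not a pass. This is a hypothetical model of the behavior, not the agent's implementation: `run_auto`, the injected `run_step` and `confirm` callables, the `DECLINED` label, and the idea that every step reports a `PASS` verdict are all assumptions.

```python
HAPPY_PATH = ["eda", "features", "optimize", "validate", "report"]
GATED = {"optimize"}  # the only step that asks for confirmation


def run_auto(run_step, confirm):
    """Run the happy path and return the (verb, verdict) pairs executed.

    run_step(verb, prior_ref) -> (ref, verdict) runs one verb.
    confirm(verb) -> bool asks the user to approve a gated verb.
    """
    done, ref = [], None
    for verb in HAPPY_PATH:
        if verb in GATED and not confirm(verb):
            done.append((verb, "DECLINED"))  # hypothetical label
            break
        # Thread the prior artifact reference, never the data itself.
        ref, verdict = run_step(verb, ref)
        done.append((verb, verdict))
        if verdict != "PASS":
            break  # a sub-threshold step stops the chain
    return done
```

Neither `deploy` nor `collab` appears in `HAPPY_PATH`, so the loop has no way to fire them.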