Lifecycle & verbs
Loom is an agentic CLI for the full data-science lifecycle: a
catalog of verbs that span data access, EDA, feature engineering, model search and
training, validation, viz, reporting, deployment, ops, and collaboration. Each verb runs
through Loom's MLOps interface; produces a versioned Metaflow run, an @card,
and a typed JSON summary with a VERDICT/status line; and declares an approval
tier that is enforced in code beneath the model. Modeling is one slice of this catalog, fully
included and never out of scope.
The verb catalog
Every verb is both a loom <verb> command you can run directly and a
/loom-<verb> workflow inside the agent (intake → plan/tier → run → verify →
deliver). The agent drives these exact verbs, so a natural-language conversation can never do
something a verb can't. The noun is almost always a single --dataset pathspec
(e.g. IngestDataset/123); a small meta flag set (--from,
--target, --goal, --metric) does the rest.
The tier column uses Loom's four approval tiers: read-only never prompts, workspace-write writes only to the workspace/Metaflow, and the remaining two, expensive and irreversible/external, are gated, so they always require an explicit confirmation.
| Verb | What it does | Tier |
|---|---|---|
| connect / ingest / datasets | Bring outside data in as a Metaflow data object at the single loom ingest boundary, which yields a dataset_ref pathspec; datasets lists what has been ingested via the Client API. | read-only / workspace-write |
| eda | Profile a data object: shape, dtypes, missingness, target balance, top correlations, and leakage flags. | read-only |
| viz | Standard lineage-grounded plots from a data object or a run (distributions, correlation, the leaderboard), emitted as @card images. | read-only |
| features | Engineer features into a new data object (leakage-aware). Compose with eda via --from: the EDA-flagged leakage columns are dropped first. | workspace-write |
| optimize (loom run) | The AIDE search: propose, run, and score candidate solutions against your metric. The metric is the spec; the search streams a leaderboard as it fills. | expensive |
| validate | Rigorous evaluation: purged K-fold CV, a sealed holdout, calibration, per-slice/fairness checks (with --sensitive), and leakage checks. Emits the VERDICT that deploy asserts. | workspace-write |
| pipeline | The whole lifecycle as one gated run (profile → features → a bounded optimize → validate), with each stage asserting the prior VERDICT. | workspace-write → expensive |
| train | Build a model through the model-builder seam (default NeMo): pretrain a backbone, build embeddings, or finetune. Stated in DS vocabulary (--objective, --budget, --backbone). | expensive · always-gate |
| deploy | Promote a validated solution. Asserts the upstream validate VERDICT == PASS; plans a staged manifest by default and mutates only with --apply. | irreversible / external |
| ops | Monitor run health, the leaderboard, and a data-object drift check (a current dataset vs. a --reference). | read-only |
| report | Assemble an experiment's runs, metrics, and lineage into a model card / report. | read-only |
| collab | Assemble a sanitized, shareable bundle (card + lineage). Builds by default; the off-box send is behind --send. | workspace-write (send is external) |
| doctor | Diagnose the local Loom + Metaflow datastore stack (PASS/WARN/FAIL per check plus a one-line VERDICT); run it first whenever anything reports a setup/datastore failure. | read-only |
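The validate row names purged K-fold CV. To show what "purged" means, here is a minimal sketch assuming time-ordered rows and a fixed purge width. It is a simplification: it drops a symmetric gap of training rows around each test fold rather than purging by actual label overlap, and the function name purged_kfold and its purge parameter are hypothetical, not Loom's API or settings.

```python
from typing import Iterator, List, Tuple


def purged_kfold(n: int, k: int, purge: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous folds over n time-ordered rows.

    Training rows within `purge` positions of either edge of the test fold are
    dropped, so rows whose labels may overlap the test window stay out of training.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size  # the test fold is the half-open range [start, stop)
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start - purge or i >= stop + purge]
        yield train, test
        start = stop
```

With purge=0 this reduces to plain contiguous K-fold; a larger purge trades training data for a cleaner separation between train and test windows.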
The read-only and most workspace-write verbs (eda, features,
validate, viz, report, ops, datasets,
doctor, and the local train stand-in) work
without a model key. Only the natural-language planning and the AIDE search
(optimize / pipeline) need one — a missing key yields an actionable
line (set ANTHROPIC_API_KEY … or pick a --model-provider), never a
traceback.
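The no-traceback behavior can be sketched as an up-front check that returns one actionable line before any client library gets a chance to raise. This is a hypothetical illustration: the function name, the exact wording, and the choice to skip the check when an explicit provider is given are assumptions; only ANTHROPIC_API_KEY, --model-provider, and the set of key-needing verbs come from the text above.

```python
import os
from typing import Mapping, Optional

# Verbs the text above says need a model key; every other verb runs keyless.
KEY_NEEDING_VERBS = {"optimize", "run", "pipeline"}


def missing_key_message(verb: str, env: Mapping[str, str] = os.environ,
                        model_provider: Optional[str] = None) -> Optional[str]:
    """Return a one-line hint if `verb` needs a key that is absent, else None."""
    if verb not in KEY_NEEDING_VERBS:
        return None  # keyless verb: nothing to check
    if model_provider is not None or env.get("ANTHROPIC_API_KEY"):
        return None  # hypothetical: an explicit provider is assumed to bring its own key
    return (f"loom {verb}: no model key found; set ANTHROPIC_API_KEY "
            "or pick a --model-provider")
```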
Running a verb directly
Everything the agent does is a verb you can run yourself: scriptable and, apart from optimize/pipeline, keyless. Pass
--json to get the structured contract that CI and the examples eval bed assert on:
```shell
# Profile a data object: read-only, no key, no prompt.
loom eda --dataset IngestDataset/123 --target target
# Build leakage-aware features into a NEW data object, dropping eda-flagged columns.
loom features --dataset IngestDataset/123 --target target --from EdaFlow/7 --recipe minimal
# The AIDE search, the engine behind /loom-optimize. The metric is the spec.
loom run --dataset IngestDataset/123 \
  --goal "Predict the target for each row in test.csv." \
  --metric "Maximize ROC-AUC on a held-out split." \
  --steps 10
```
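In CI, the point of --json is that a script can branch on the status instead of scraping text. A minimal sketch, assuming the summary is a JSON object with a top-level verdict field; that field name and the sample payload are assumptions, since the exact schema is not documented here.

```python
import json


def require_verdict(raw: str, expected: str = "PASS") -> dict:
    """Parse a --json summary and stop the job unless its verdict matches."""
    summary = json.loads(raw)
    verdict = summary.get("verdict")  # assumed field name
    if verdict != expected:
        raise SystemExit(f"expected VERDICT {expected}, got {verdict!r}")
    return summary


# Hypothetical payload, standing in for captured `loom validate ... --json` output.
sample = '{"verdict": "PASS", "dataset": "FeaturesFlow/9"}'
print(require_verdict(sample)["dataset"])
```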
Modeling — optimize and train
Two verbs cover the modeling slice, and the split between them is a deliberate division of labor enforced down at the provider layer — not a user concern.
- optimize (loom run): the AIDE search. It proposes candidate solutions, runs each one for real, scores it against your metric, and returns a leaderboard. Use it for the cheap, searchable work: framing, features, model heads, tokenization schemes, anything you can iterate on against a scalar metric.
- train: the model-builder. It runs the /loom-train verb against the model-builder interface (default NeMo AutoModel). You state intent in DS vocabulary (--objective {next-event|masked-field|contrastive}, --budget {probe|small|full}, --backbone, --metric) and the adapter lowers that to backend config; no NeMo, Megatron, or .nemo nouns ever surface. A heavy pretrain is launch-and-track, so the AIDE search never tree-searches it.
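The propose → run → score → leaderboard cycle can be sketched as a plain loop. This is deliberately not AIDE: AIDE uses an LLM to draft and refine code solutions over a search tree, whereas this sketch is plain random search, where the "proposal" is a random number and the "run" is a toy scoring function. All names are hypothetical.

```python
import heapq
import random
from typing import Callable, List, Tuple


def search(propose: Callable[[random.Random], float],
           score: Callable[[float], float],
           steps: int, seed: int = 0, top: int = 3) -> List[Tuple[float, float]]:
    """Run `steps` propose/score rounds; return the top (metric, candidate) pairs."""
    rng = random.Random(seed)  # seeded so the leaderboard is reproducible
    board: List[Tuple[float, float]] = []
    for _ in range(steps):
        candidate = propose(rng)                     # propose a candidate solution
        board.append((score(candidate), candidate))  # "run" it and score it
    return heapq.nlargest(top, board)                # higher metric ranks first


# Toy problem: the metric peaks at candidate == 0.3.
leaderboard = search(lambda r: r.uniform(0, 1), lambda c: 1 - abs(c - 0.3), steps=10)
for rank, (metric, cand) in enumerate(leaderboard, 1):
    print(f"{rank}. metric={metric:.3f} candidate={cand:.3f}")
```

The only contract the loop needs is a scalar metric, which is why work that can be scored that way suits optimize and a multi-hour pretrain does not.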
```shell
# Default backend is nemo. With no GPU target it REFUSES cleanly: it shows the cost
# plan and tells you what to set. This is the safe default; it never launches.
loom train --dataset IngestDataset/123 --objective next-event --budget full
# cost (gate) : budget=full: 8 GPU x 12 h = 96 GPU-hours (~$288)
# STATUS : REFUSED_NO_GPU_TARGET (set LOOM_GPU_TARGET, or use the CPU stand-in)

# The torch-free CPU `local` stand-in builds a real backbone end-to-end: no GPU,
# sub-2s, deterministic (a PPMI+SVD embedding model; great for dev + CI).
LOOM_MODEL_BUILDER_PROVIDER=local \
  loom train --dataset IngestDataset/123 --objective next-event --budget probe
# STATUS : BUILT (-> a backbone pathspec + @card)
```
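The local stand-in is described as a PPMI+SVD embedding model. Here is a minimal NumPy sketch of that general technique: build a co-occurrence count matrix, convert it to positive pointwise mutual information, and factor it with an SVD. It illustrates the method only; how Loom's stand-in turns events into co-occurrences or chooses the dimension is not documented here, and the function name is hypothetical.

```python
import numpy as np


def ppmi_svd(cooc: np.ndarray, dim: int) -> np.ndarray:
    """Return a (vocab, dim) embedding from a symmetric co-occurrence count matrix."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)  # marginal count for each item
    col = cooc.sum(axis=0, keepdims=True)
    if np.any(row == 0) or np.any(col == 0):
        raise ValueError("every item needs at least one co-occurrence")
    with np.errstate(divide="ignore"):     # log(0) -> -inf for zero counts
        pmi = np.log(cooc * total / (row * col))  # log p(i,j) / (p(i) p(j))
    ppmi = np.maximum(pmi, 0.0)            # keep only positive associations
    u, s, _ = np.linalg.svd(ppmi)          # no random initialization involved
    return u[:, :dim] * np.sqrt(s[:dim])   # common symmetric singular-value weighting
```

Because nothing in it is sampled, the same counts always give the same vectors, which is consistent with the "deterministic" claim for the CPU stand-in.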
The model-builder hides vocabulary, not physics. --budget full
still means real GPU-hours, surfaced at the gate as a cost PLAN (GPU count · hours · $). The
real heavy launch is off by default behind --launch and needs
LOOM_GPU_TARGET pointed at an on-demand GPU (Modal is the default launcher); your laptop
stays the control plane.
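The cost PLAN in the example output is plain arithmetic: 8 GPUs × 12 h = 96 GPU-hours, and ~$288 for 96 GPU-hours implies roughly $3 per GPU-hour. A minimal sketch of that plan plus the refuse-without-a-target behavior; the $3 rate is inferred from that single example line, the function is hypothetical, and the "PLANNED" status name is invented for illustration.

```python
import os
from typing import Mapping, Tuple


def plan_train(gpus: int, hours: float, usd_per_gpu_hour: float,
               env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (cost plan line, status); never launches anything."""
    gpu_hours = gpus * hours
    cost = gpu_hours * usd_per_gpu_hour
    plan = f"cost (gate) : {gpus} GPU x {hours:g} h = {gpu_hours:g} GPU-hours (~${cost:,.0f})"
    if not env.get("LOOM_GPU_TARGET"):
        return plan, "REFUSED_NO_GPU_TARGET"
    return plan, "PLANNED"  # hypothetical status: a real launch still needs --launch


print(plan_train(8, 12, 3.0, env={}))
```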
Approval tiers
Every verb declares one of four tiers. The gate is enforced by the client/hook layer in code — not by prompt text — so the boundary holds even if the model is wrong. The governing principle is opinionated low, permissive high: most cautious at compute/data/spend, most deferential at modeling choices.
| Tier | Verbs | Gate behavior |
|---|---|---|
| read-only | eda, viz, report, ops, datasets, doctor, validate | Never prompts. Inspect data and runs freely; network off by default. |
| workspace-write | ingest, features, pipeline (build), collab (build) | Light/auto: runs with a one-line auto note. Writes only to the workspace/Metaflow, never to the source data. |
| expensive / mutate | run (optimize), pipeline's optimize stage, train (GPU) | Always gates: the cost and row count are shown before it fires; confirm when the cost or data boundary is large. |
| irreversible / external | deploy --apply, train --launch, collab --send | Always gates, never model-auto-invoked. The model proposes; only you can fire it. A deny-first y/N confirm is required before the handler runs. |
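The tier table reads naturally as a small policy function. A minimal sketch of deny-first gating under stated assumptions: Tier, gate, model_may_invoke, and the ask callback are hypothetical names, and treating any answer other than an explicit "y" as a denial is how this sketch models "deny-first y/N". Loom's actual client/hook layer is not shown in this document.

```python
from enum import Enum
from typing import Callable


class Tier(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    EXPENSIVE = "expensive"
    IRREVERSIBLE = "irreversible/external"


def model_may_invoke(tier: Tier) -> bool:
    """Irreversible/external actions are never offered to the model automatically."""
    return tier is not Tier.IRREVERSIBLE


def gate(tier: Tier, summary: str, ask: Callable[[str], str]) -> bool:
    """Return True only if the action may run; the default answer is always No."""
    if tier is Tier.READ_ONLY:
        return True                        # never prompts
    if tier is Tier.WORKSPACE_WRITE:
        print(f"auto: {summary}")          # one-line note, no prompt
        return True
    answer = ask(f"{summary} [y/N] ")      # expensive or irreversible: always gate
    return answer.strip().lower() == "y"   # empty or anything else denies
```

In an interactive session ask would be the built-in input; because the check lives in code rather than in a prompt, a wrong model proposal still cannot get past it without your "y".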
All three irreversible/external verbs are safe by default: deploy produces a
PLAN and a staged manifest unless you pass --apply (and the gate must ALLOW);
collab builds a bundle on-box unless you pass --send; and the real GPU
train fires only with --launch.
The irreversible/external actions are not offered to the agent
automatically. You reach them only by explicitly asking, via the /loom-<verb>
command — and even then the harness requires a confirmation before the action runs. Spend and
promotion always stay with you.
Composition
Verbs compose by artifact hand-off + machine-checkable exit gates: each verb
returns a typed --json summary with a VERDICT/status, and the next verb
asserts the prior outcome before it runs. References (pathspec /
card_path / experiment-id) thread between verbs — never the data itself.
Three gate-asserts are load-bearing:
- EDA leakage flags block features. Before building features, Loom reads the prior eda result's summary.leakage_flags and drops the flagged columns first; pass the eda run via --from EdaFlow/7. The new FeaturesFlow/<id> pathspec is then a --dataset for every downstream verb.
- A validate VERDICT == PASS is required by deploy. deploy asserts the referenced validate result before it will promote anything.
- A sub-threshold validate BLOCKS deploy. A REVIEW, FAIL, or leaky validation stops the whole chain; it is surfaced plainly, never worked around.
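The first two asserts amount to reading the prior result before acting. A minimal sketch using plain dicts in place of Loom's typed summaries; apart from summary.leakage_flags, which the text names, the dict keys, function names, and column names are assumptions.

```python
from typing import Dict, List


def drop_leaky_columns(columns: List[str], eda_result: Dict) -> List[str]:
    """Features step: remove every column the prior eda run flagged as leaky."""
    flagged = set(eda_result["summary"]["leakage_flags"])
    return [c for c in columns if c not in flagged]


def assert_deployable(validate_result: Dict) -> None:
    """Deploy step: refuse unless the referenced validate run says PASS."""
    verdict = validate_result.get("verdict")  # assumed key name
    if verdict != "PASS":
        # REVIEW, FAIL, or a missing verdict all stop the chain; no workaround.
        raise RuntimeError(f"deploy blocked: validate VERDICT is {verdict!r}")


# Hypothetical eda summary with one flagged column.
eda = {"summary": {"leakage_flags": ["churn_date"]}}
print(drop_leaky_columns(["tenure", "churn_date", "plan"], eda))  # ['tenure', 'plan']
```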
```shell
# 1. Profile -> read leakage_flags from the eda result.
loom eda --dataset IngestDataset/123 --target target   # -> EdaFlow/7
# 2. Features ASSERTS eda's leakage flags and drops those columns first.
loom features --dataset IngestDataset/123 --from EdaFlow/7   # -> FeaturesFlow/9
# 3. Validate emits the VERDICT that gates promotion.
loom validate --dataset FeaturesFlow/9 --target target   # -> VERDICT: PASS
# 4. Deploy ASSERTS validate VERDICT==PASS. PLAN-only by default; --apply mutates.
loom deploy --validate ValidateFlow/12            # PLAN only (no mutation)
loom deploy --validate ValidateFlow/12 --apply    # real action (gate must ALLOW)
```
The pipeline verb bakes this composition into one gated run: profile → features →
a bounded optimize → validate, each stage asserting the prior VERDICT (leakage blocks
features; a sub-threshold validate marks the run FAIL).
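The stage-by-stage assertion in pipeline is a short-circuiting chain. In this sketch each stage is a hypothetical callable that takes the prior stage's reference and returns a (reference, verdict) pair, and the run stops at the first non-PASS verdict. Stage names mirror the text; the return shape and the optimize pathspec are assumptions.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], Tuple[str, str]]]


def run_pipeline(dataset: str, stages: List[Stage]) -> Tuple[str, List[str]]:
    """Run stages in order, threading references; stop at the first non-PASS."""
    ref, trail = dataset, []
    for name, stage in stages:
        ref, verdict = stage(ref)  # only the reference moves between stages
        trail.append(f"{name}:{verdict}")
        if verdict != "PASS":
            return "FAIL", trail   # a sub-threshold stage fails the whole run
    return "PASS", trail


# Hypothetical stages and pathspecs; each ignores its input for brevity.
stages: List[Stage] = [
    ("profile", lambda r: ("EdaFlow/7", "PASS")),
    ("features", lambda r: ("FeaturesFlow/9", "PASS")),
    ("optimize", lambda r: ("RunFlow/10", "PASS")),
    ("validate", lambda r: ("ValidateFlow/12", "REVIEW")),
]
print(run_pipeline("IngestDataset/123", stages))
```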
Read the exit code, not just the text. Exit 1 still returns a well-formed
result (e.g. a VERDICT == FAIL from validate); that is a domain
outcome to compose on, not a crash. Exit 2 means a setup problem or bad arguments (e.g. the
datastore is absent): run loom doctor, fix the Metaflow setup it flags, and don't retry blindly.
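A wrapper that honors this contract maps exit codes to actions instead of retrying on any non-zero status. A minimal sketch: the meanings of exit 1 and exit 2 follow the text above, treating 0 as success is an assumption, and classify is a hypothetical name. In real use argv would be a loom command; the demo runs the current Python interpreter so it works anywhere.

```python
import subprocess
import sys
from typing import List


def classify(argv: List[str]) -> str:
    """Run a command and translate its exit code per the contract above."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 0:
        return "ok"               # assumed: 0 means success
    if proc.returncode == 1:
        return "domain-outcome"   # well-formed result, e.g. VERDICT == FAIL
    if proc.returncode == 2:
        return "setup: run `loom doctor`, fix the Metaflow setup, do not retry"
    return f"unexpected exit {proc.returncode}"


if __name__ == "__main__":
    print(classify([sys.executable, "-c", "import sys; sys.exit(2)"]))
```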
Driving the happy path: /loom-auto
You don't have to thread the verbs by hand. Inside the agent, /loom-auto
orchestrates the whole happy path for you — eda → features → optimize → validate →
report — threading each artifact between steps and asserting each VERDICT
before composing the next. It gates only at the expensive optimize step, and it
never auto-fires deploy or collab --send. Each verb
still plans, gates on cost/data, calls the interface, and narrates a lineage-grounded result.
```text
loom                 # open the agent
loom> /loom-auto     # eda -> features -> optimize -> validate -> report
loom> /loom-auto --dataset IngestDataset/123 --goal "predict churn"
```
The irreversible promotion and any off-box send therefore stay deliberate: /loom-auto
takes you all the way to a validated, reported result and then leaves the
irreversible/external step (deploy --apply or
collab --send) for you to fire explicitly. Spend is gated once, at optimize;
promotion is never automatic.