Loom is in pilot intake · these docs cover the full data-science lifecycle CLI

Lifecycle & verbs

Loom is an agentic CLI for the full data-science lifecycle — a catalog of verbs that span data access, EDA, feature engineering, model search and training, validation, viz, reporting, deployment, ops, and collaboration. Each verb runs through Loom's MLOps interface, produces a versioned Metaflow run + an @card plus a typed JSON summary with a VERDICT/status line, and declares an approval tier that is enforced beneath the model. Modeling is one slice of this catalog, fully included — never out of scope.

The verb catalog

Every verb is both a loom <verb> command you can run directly and a /loom-<verb> workflow inside the agent (intake → plan/tier → run → verify → deliver). The agent drives these exact verbs, so a natural-language conversation can never do something a verb can't. The noun is almost always a single --dataset pathspec (e.g. IngestDataset/123); a small meta flag set (--from, --target, --goal, --metric) does the rest.

The tier column uses Loom's four approval tiers: read-only never prompts; workspace-write writes only to the workspace/Metaflow; and the two gated tiers, expensive and irreversible/external, always require an explicit confirmation.

| Verb | What it does | Tier |
| --- | --- | --- |
| connect / ingest / datasets | Bring outside data in as a Metaflow data object at the single loom ingest boundary; yields a dataset_ref pathspec. datasets lists what's ingested via the Client API. | read-only / workspace-write |
| eda | Profile a data object: shape, dtypes, missingness, target balance, top correlations, and leakage flags. | read-only |
| viz | Standard lineage-grounded plots from a data object or a run (distributions, correlation, the leaderboard), emitted as @card images. | read-only |
| features | Engineer features into a new data object (leakage-aware). Compose with eda via --from: the EDA-flagged leakage columns are dropped first. | workspace-write |
| optimize (loom run) | The AIDE search: propose, run, and score candidate solutions against your metric. The metric is the spec; it streams a leaderboard as it fills. | expensive |
| validate | Rigorous evaluation: purged K-fold CV + a sealed holdout + calibration + per-slice/fairness (with --sensitive) + leakage. Emits the VERDICT that deploy asserts. | workspace-write |
| pipeline | The whole lifecycle as one gated run (profile → features → a bounded optimize → validate), with each stage asserting the prior VERDICT. | workspace-write / expensive |
| train | Build a model through the model-builder seam (default NeMo): pretrain a backbone, build embeddings, or finetune. Stated in DS vocabulary (--objective, --budget, --backbone). | expensive · always-gate |
| deploy | Promote a validated solution. Asserts the upstream validate VERDICT == PASS; plans a staged manifest by default, mutates only with --apply. | irreversible / external |
| ops | Monitor run health, the leaderboard, and a data-object drift check (a current dataset vs. a --reference). | read-only |
| report | Assemble an experiment's runs + metrics + lineage into a model-card / report. | read-only |
| collab | Assemble a sanitized, shareable bundle (card + lineage). Builds by default; the off-box send is behind --send. | workspace-write (send is external) |
| doctor | Diagnose the local Loom + Metaflow datastore stack (PASS/WARN/FAIL per check + a one-line VERDICT); run first when anything reports a setup/datastore failure. | read-only |

The read-only and most workspace-write verbs (eda, features, validate, viz, report, ops, datasets, doctor, and the local train stand-in) work without a model key. Only the natural-language planning and the AIDE search (optimize / pipeline) need one — a missing key yields an actionable line (set ANTHROPIC_API_KEY … or pick a --model-provider), never a traceback.
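A script that calls several verbs can check for the key before it reaches the search. The following is a minimal sketch, not part of Loom: `needs_model_key` and `preflight_key` are hypothetical helper names, and the check only looks at ANTHROPIC_API_KEY, so it would need adjusting if you select another provider with --model-provider.

```shell
# Hypothetical helpers (not Loom commands): decide whether a verb needs a model key.
# The list mirrors the note above: only the AIDE search (run / optimize / pipeline).
needs_model_key() {
  case "$1" in
    run|optimize|pipeline) return 0 ;;
    *) return 1 ;;
  esac
}

# Fail early with a readable message instead of starting a verb that cannot plan.
# Assumption: ANTHROPIC_API_KEY is the only key checked here.
preflight_key() {
  if needs_model_key "$1" && [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "loom $1 needs a model key: set ANTHROPIC_API_KEY or pick a --model-provider" >&2
    return 1
  fi
  return 0
}
```

For example, `preflight_key eda` succeeds with no key set, while `preflight_key run` returns 1 until the key is exported.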

Running a verb directly

Everything the agent does is a verb you can run and script yourself; all but the AIDE search (optimize / pipeline) run without a model key. Pass --json to get the structured contract that CI and the examples eval bed assert on:

# Profile a data object — read-only, no key, no prompt.
loom eda --dataset IngestDataset/123 --target target

# Build leakage-aware features into a NEW data object, dropping eda-flagged columns.
loom features --dataset IngestDataset/123 --target target --from EdaFlow/7 --recipe minimal

# The AIDE search — the engine behind /loom-optimize. The metric is the spec.
loom run --dataset IngestDataset/123 \
  --goal   "Predict the target for each row in test.csv." \
  --metric "Maximize ROC-AUC on a held-out split." \
  --steps 10
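In CI it is usually convenient to keep the --json summary on disk and pass the exit code along. The wrapper below is a minimal sketch under stated assumptions: `run_verb_json` is a hypothetical helper, the output path is your choice, and appending --json after the other flags is assumed to be accepted.

```shell
# Hypothetical wrapper (not part of Loom): run one verb with --json,
# write the summary to a file, and return the verb's own exit code.
# Usage: run_verb_json <summary-file> <verb> [flags...]
run_verb_json() {
  out="$1"; shift
  rc=0
  # Assumption: --json may be appended after the verb's other flags.
  loom "$@" --json >"$out" || rc=$?
  echo "exit=$rc summary=$out" >&2
  return "$rc"
}

# Example call, using the pathspec from the examples above:
#   run_verb_json eda.json eda --dataset IngestDataset/123 --target target
```

The summary file is written even when the verb exits non-zero, so a later step can still inspect it.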

Modeling — optimize and train

Two verbs cover the modeling slice, and the split between them is a deliberate division of labor enforced down at the provider layer — not a user concern.

# Default backend is nemo. With no GPU target it REFUSES cleanly — shows the cost
# plan and tells you what to set. This is the safe default; it never launches.
loom train --dataset IngestDataset/123 --objective next-event --budget full
#   cost (gate) : budget=full: 8 GPU x 12 h = 96 GPU-hours (~$288)
#   STATUS      : REFUSED_NO_GPU_TARGET   (set LOOM_GPU_TARGET, or use the CPU stand-in)

# The torch-free CPU `local` stand-in builds a real backbone end-to-end — no GPU,
# sub-2s, deterministic (a PPMI+SVD embedding model; great for dev + CI).
LOOM_MODEL_BUILDER_PROVIDER=local \
  loom train --dataset IngestDataset/123 --objective next-event --budget probe
#   STATUS      : BUILT   (-> a backbone pathspec + @card)

The model-builder hides vocabulary, not physics. --budget full still means real GPU-hours, surfaced at the gate as a cost PLAN (GPU-count · hours · $). The real heavy launch is off by default behind --launch; when you opt in, point LOOM_GPU_TARGET at an on-demand GPU (Modal is the default launcher) so your laptop stays the control plane.
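The cost line in the refusal example can be checked by hand: 8 GPUs for 12 hours is 96 GPU-hours, and ~$288 works out to about $3 per GPU-hour. The sketch below reproduces that arithmetic. `cost_plan` is a hypothetical helper, and the $3 rate is inferred from that single example, not a published price.

```shell
# Hypothetical helper: reproduce the gate's cost arithmetic with integer math.
# Usage: cost_plan <gpu-count> <hours> <dollars-per-gpu-hour>
cost_plan() {
  gpus="$1"; hours="$2"; rate="$3"
  gpu_hours=$((gpus * hours))
  echo "${gpus} GPU x ${hours} h = ${gpu_hours} GPU-hours (~\$$((gpu_hours * rate)))"
}

# The rate of 3 is an assumption derived from 288 / 96 in the example output.
cost_plan 8 12 3
```

Changing any of the three inputs shows how quickly a full budget scales, which is the reason the plan is shown before anything launches.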

Approval tiers

Every verb declares one of four tiers. The gate is enforced by the client/hook layer in code — not by prompt text — so the boundary holds even if the model is wrong. The governing principle is opinionated low, permissive high: most cautious at compute/data/spend, most deferential at modeling choices.

| Tier | Verbs | Gate behavior |
| --- | --- | --- |
| read-only | eda, viz, report, ops, datasets, doctor, validate | Never prompts. Inspect data and runs freely; network off by default. |
| workspace-write | ingest, features, pipeline (build), collab (build) | Light / auto: runs with a one-line auto note. Writes only to the workspace/Metaflow, never to the source data. |
| expensive / mutate | run (optimize), pipeline's optimize stage, train (GPU) | Always gates: the cost / rows are shown before it fires; confirm when the cost or data boundary is large. |
| irreversible / external | deploy --apply, train --launch, collab --send | Always gates, never model-auto-invoked. The model proposes; only you can fire it. A deny-first y/N confirm is required before the handler runs. |

The two irreversible verbs are safe by default: deploy produces a PLAN + staged manifest unless you pass --apply (and the gate must ALLOW); collab builds a bundle on-box unless you pass --send. The real GPU train --launch follows the same posture: nothing launches unless you pass the flag.

The irreversible/external actions are not offered to the agent automatically. You reach them only by explicitly asking, via the /loom-<verb> command — and even then the harness requires a confirmation before the action runs. Spend and promotion always stay with you.
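For scripts that wrap Loom, the tier table can be mirrored as a small lookup so that a wrapper knows when to expect a gate. This is an illustrative sketch only: `tier_of` and `needs_confirm` are hypothetical helpers, the mapping is transcribed from the tier table above (which lists validate as read-only), and the real enforcement lives in Loom's client/hook layer, not in a script like this.

```shell
# Hypothetical lookup: map a verb (plus an optional flag) to its approval tier.
# Anything the tier table does not list prints "unlisted".
tier_of() {
  case "$*" in
    "deploy --apply"|"train --launch"|"collab --send") echo "irreversible/external" ;;
    run|optimize|train) echo "expensive/mutate" ;;
    # pipeline's build is workspace-write; its optimize stage is gated as expensive.
    ingest|features|pipeline|collab) echo "workspace-write" ;;
    eda|viz|report|ops|datasets|doctor|validate) echo "read-only" ;;
    *) echo "unlisted" ;;
  esac
}

# Succeeds (exit 0) when the table says the action always gates.
needs_confirm() {
  case "$(tier_of "$@")" in
    expensive/mutate|irreversible/external) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `tier_of deploy --apply` prints `irreversible/external`, while `needs_confirm features` returns 1.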

Composition

Verbs compose by artifact hand-off plus machine-checkable exit gates: each verb returns a typed --json summary with a VERDICT/status, and the next verb asserts the prior outcome before it runs. References (pathspec / card_path / experiment-id) thread between verbs, never the data itself. The four-step chain below shows the load-bearing gate-asserts:

# 1. Profile -> read leakage_flags from the eda result.
loom eda --dataset IngestDataset/123 --target target          # -> EdaFlow/7

# 2. Features ASSERTS eda's leakage flags and drops those columns first.
loom features --dataset IngestDataset/123 --from EdaFlow/7   # -> FeaturesFlow/9

# 3. Validate emits the VERDICT that gates promotion.
loom validate --dataset FeaturesFlow/9 --target target       # -> VERDICT: PASS

# 4. Deploy ASSERTS validate VERDICT==PASS. PLAN-only by default; --apply mutates.
loom deploy --validate ValidateFlow/12                        # PLAN only (no mutation)
loom deploy --validate ValidateFlow/12 --apply                # real action (gate must ALLOW)

The pipeline verb bakes this composition into one gated run: profile → features → a bounded optimize → validate, each stage asserting the prior VERDICT (leakage blocks features; a sub-threshold validate marks the run FAIL).
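When threading the verbs by hand, the same stop-on-failure behavior can be approximated with exit codes alone. The following is a rough sketch, not how pipeline is implemented: `run_gated` is a hypothetical helper that runs one step, reports a non-zero exit, and passes the code along so that `&&` halts the chain.

```shell
# Hypothetical helper: run one step, note a failing gate, propagate its exit code.
# Usage: run_gated <label> <command> [args...]
run_gated() {
  label="$1"; shift
  rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "gate: $label exited $rc; stopping the chain" >&2
  fi
  return "$rc"
}

# Example chain, using the pathspecs from the composition example above:
#   run_gated eda      loom eda --dataset IngestDataset/123 --target target &&
#   run_gated features loom features --dataset IngestDataset/123 --from EdaFlow/7 &&
#   run_gated validate loom validate --dataset FeaturesFlow/9 --target target
```

Because each step returns its own exit code, a FAIL from validate stops the chain before any deploy command is reached.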

Read the exit code, not just the text. Exit 1 still returns a well-formed result (e.g. a VERDICT == FAIL from validate): that is a domain outcome to compose on, not a crash. Exit 2 is setup/bad-args (e.g. the datastore is absent): run loom doctor and check the Metaflow setup instead of retrying blindly.
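A wrapper script can make that distinction explicit. The sketch below is illustrative: `classify_exit` is a hypothetical helper, and treating exit 0 as success is an assumption based on the usual shell convention, since only exits 1 and 2 are described here.

```shell
# Hypothetical helper: turn a Loom exit code into a next action.
classify_exit() {
  case "$1" in
    0) echo "ok" ;;              # assumption: 0 means the verb succeeded
    1) echo "domain-outcome" ;;  # well-formed result, e.g. VERDICT == FAIL
    2) echo "setup-error" ;;     # bad args or missing datastore: run loom doctor
    *) echo "unexpected" ;;
  esac
}

# Example use after any verb:
#   loom validate --dataset FeaturesFlow/9 --target target; classify_exit "$?"
```

Only `setup-error` calls for fixing the environment; `domain-outcome` is a result the next step can compose on.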

Driving the happy path: /loom-auto

You don't have to thread the verbs by hand. Inside the agent, /loom-auto orchestrates the whole happy path for you — eda → features → optimize → validate → report — threading each artifact between steps and asserting each VERDICT before composing the next. It gates only at the expensive optimize step, and it never auto-fires deploy or collab --send. Each verb still plans, gates on cost/data, calls the interface, and narrates a lineage-grounded result.

loom                                # open the agent
loom> /loom-auto                         # eda -> features -> optimize -> validate -> report
loom> /loom-auto --dataset IngestDataset/123 --goal "predict churn"

So the irreversible promotion and any off-box send stay deliberate: /loom-auto takes you all the way to a validated, reported result, and then leaves the irreversible / external step — deploy --apply or collab --send — for you to fire explicitly. Spend is gated once, at optimize; promotion is never automatic.