The agentic CLI

loom is the agentic CLI: you state intent in plain English, the agent picks the right lifecycle verb, runs it through the provider interface, reads the structured VERDICT/summary, and composes the next step. It is a shell over the whole data-science lifecycle — modeling included — never an automated black box you hand a dataset to and walk away from.

Intent in, verb out

You don't memorize a command grammar. You say what you want — "profile my data and flag leakage", "validate this against a sealed holdout" — and the agent does four things, in order, for every turn:

1. Picks the verb. It maps your goal to one of the lifecycle verbs (eda, features, validate, deploy, …), choosing the smallest verb that answers the question rather than writing ad-hoc ML code.
2. Runs it through the interface. Every verb executes through Loom's MLOps provider, never a concrete backend, producing a Metaflow run, an @card, and a typed JSON summary.
3. Reads the structured result. It parses the verb's --json object (the VERDICT/status, the summary, the pathspec, the card_path, the gate) and treats a FAIL as a domain outcome to act on, not a crash.
4. Composes the next step. It threads each artifact forward (a leakage flag from eda blocks features; a validate VERDICT==PASS is what deploy asserts), so the chain is grounded in real lineage.
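The read-and-compose half of that loop can be sketched in a few lines. This is a minimal illustration, not Loom's implementation: the JSON key names (`verdict`, `pathspec`, `card_path`), the `NEXT_ON_PASS` table, and the card path are all assumptions made for the example.

```python
import json

# Hypothetical routing table: which verb a PASS on each verb unlocks.
# In Loom the agent decides the chain; this table is only illustrative.
NEXT_ON_PASS = {"eda": "features", "features": "validate", "validate": "deploy"}


def compose_next_step(verb: str, raw_json: str) -> str:
    """Read one verb's --json summary and name the next action."""
    result = json.loads(raw_json)
    verdict = result.get("verdict", "FAIL")  # assumed key name
    if verdict != "PASS":
        # A FAIL is a domain outcome to act on, not a crash.
        return f"stop: {verb} returned {verdict}; see {result.get('card_path')}"
    nxt = NEXT_ON_PASS.get(verb)
    if nxt is None:
        return "done"
    return f"{nxt} (from {result.get('pathspec')})"


example = json.dumps({
    "verdict": "PASS",
    "pathspec": "ValidateFlow/41",
    "card_path": "cards/validate.html",  # made-up path
})
print(compose_next_step("validate", example))
```

The point of the sketch is the shape of the decision: the next step is derived from a parsed artifact, never from free-form output.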

Under the hood the agent drives the very same verbs you can run yourself, so the conversation can never do something a verb can't. The agent's judgement is spent on the taste decisions — which metric, which threshold, which features, when to stop — not on mechanics the engine owns. For the full catalog and what each verb does, see Lifecycle & verbs.

Loom is the agentic operator for the entire lifecycle, end to end — data access, EDA, features, model search and training (AIDE + NeMo), validation, viz, reporting, deployment, ops, collaboration. Modeling is one verb in that arc, fully in scope; most tools automate only the modeling step, and Loom is the layer above all of them.

The interactive REPL

Run loom with no arguments to open the agent. It runs in a branded black/magenta terminal UI; the banner shows the active providers and the current model. There are three ways to start:

loom                      # open the agent (interactive)
loom "profile my data"    # start with a one-shot goal, in plain English
loom <verb> [--flags]     # jump straight to a verb workflow (e.g. loom eda --dataset …)

A short transcript — no key and no infra needed to reach the prompt:

  an agentic CLI for data science
  v0.1.0

  search        : aide
  mlops         : metaflow
  model-builder : nemo
  model         : anthropic-api

  type /help for the verbs, /exit to quit.
loom> profile this dataset and flag leakage
  running eda...
╭─ EDA profile ────────────────────────────────────────────╮
│ rows x cols : 50000 x 31                                  │
│ target      : is_fraud                                    │
│ leakage     : none detected                               │
╰──────────────────────────────────────────────────────────╯
✓ eda ok

A running verb is wrapped in a spinner; results render as panels with the verb's VERDICT line. Ctrl-C cancels the current action, not the REPL; Ctrl-D or /exit quits cleanly. The interactive UI dependencies are imported lazily from the REPL path, so a stripped environment can still run every one-shot subcommand even without them installed.
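The lazy-import behavior described above follows a common Python pattern, sketched here under assumptions: `load_ui`, `main`, and the module name `hypothetical_loom_ui` are placeholders, not Loom's actual entry points or dependencies.

```python
import importlib


def load_ui(module_name: str):
    """Import the UI dependency only when the REPL is actually opened."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None  # the caller prints a hint instead of crashing


def main(argv: list[str], ui_module: str = "hypothetical_loom_ui") -> str:
    if argv:
        # One-shot subcommand: this path never touches the UI import.
        return f"one-shot: {argv[0]}"
    ui = load_ui(ui_module)
    return "repl" if ui is not None else "repl unavailable: UI deps missing"


print(main(["eda"]))
print(main([]))
```

Because the import sits inside the REPL path, a missing UI package can only affect the interactive mode.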

Slash-commands & the meta set

Every lifecycle verb is also a slash-command. Type a verb with or without a leading /; tab-completion offers every verb plus a small meta set. Each /loom-<verb> runs that verb's full guided workflow — intake → plan/tier → run → verify → deliver.

loom> /eda --dataset IngestDataset/123 --target is_fraud
loom> /validate --dataset IngestDataset/123 --target is_fraud --sensitive region
loom> /deploy --validate ValidateFlow/41 --apply

The verb slash-commands cover the whole catalog — /eda, /datasets, /viz, /features, /run, /validate, /pipeline, /deploy, /train, /ops, /report, /collab, /ingest, /telemetry, /skillopt — plus /plan to toggle plan mode. Alongside those is a small meta set that controls the session itself, not the lifecycle:

Command	What it does
`/help`	The verb table — the full catalog and what each verb does.
`/status`	The banner again — the active search / mlops / model-builder providers and the current model.
`/doctor`	The read-only stack health check (Python engine, Metaflow, datastore reachability).
`/clear`	Clear the conversation and start a fresh context.
`/exit` · `/quit`	Quit the REPL cleanly.
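A rough sketch of how a REPL line could be routed, given that verbs work with or without a leading slash. The `parse_line` function and its return shape are hypothetical; the verb and meta names are the ones listed above, with /plan grouped into the meta set for brevity.

```python
import shlex

VERBS = {"eda", "datasets", "viz", "features", "run", "validate", "pipeline",
         "deploy", "train", "ops", "report", "collab", "ingest", "telemetry",
         "skillopt"}
# Session-level commands; "plan" is grouped here only to keep the sketch short.
META = {"help", "status", "doctor", "clear", "exit", "quit", "plan"}


def parse_line(line: str):
    """Split a REPL line into (kind, name, args)."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the whole line as plain-English intent.
        return ("intent", line.strip(), [])
    if not tokens:
        return ("empty", "", [])
    name = tokens[0].removeprefix("/")
    if name in VERBS:
        return ("verb", name, tokens[1:])
    if tokens[0].startswith("/") and name in META:
        return ("meta", name, tokens[1:])
    return ("intent", line.strip(), [])


print(parse_line("/eda --dataset IngestDataset/123 --target is_fraud"))
print(parse_line("profile my data"))
```

One limitation of this naive version: a plain-English sentence that happens to start with a verb name (for example "run a quick profile") would be routed as a verb, so a real router needs more context than the first token.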

Run the verbs directly, too

Everything the agent does is a verb you can run yourself — keyless and scriptable — straight against the engine with python -m loom <verb> … --json (exactly what the bundled examples/ and CI use). The natural-language planning is the layer on top; the verbs are the durable contract underneath it.
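For scripting, the direct invocation can be wrapped in a small helper. This sketch assumes the JSON summary is written to stdout and that flags map one-to-one to `--name value` pairs; `build_command` and `run_verb` are hypothetical helper names, not part of Loom.

```python
import json
import subprocess
import sys


def build_command(verb: str, **flags: str) -> list[str]:
    """Build the argv for `python -m loom <verb> ... --json`."""
    argv = [sys.executable, "-m", "loom", verb]
    for name, value in flags.items():
        argv += [f"--{name.replace('_', '-')}", value]
    return argv + ["--json"]


def run_verb(verb: str, **flags: str) -> dict:
    """Run a verb and parse its JSON summary (assumes JSON on stdout)."""
    proc = subprocess.run(build_command(verb, **flags),
                          capture_output=True, text=True)
    # No check=True: a FAIL verdict is a result to inspect, and this sketch
    # does not assume how the exit code relates to the verdict.
    return json.loads(proc.stdout)


print(build_command("eda", dataset="IngestDataset/123", target="is_fraud")[1:])
```

Calling `run_verb("eda", dataset="IngestDataset/123", target="is_fraud")` requires Loom to be installed; `build_command` alone runs anywhere and is handy for checking what would be executed.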

Streaming & the leaderboard

The two search verbs — /run (the AIDE optimize loop) and /pipeline — drive candidate proposal, real execution, and scoring against your metric. Because that loop is long-running, the REPL streams the leaderboard as it fills, so you watch candidates land and the best metric climb in real time:

Leaderboard (top 4):
   1. metric=0.93xx  node=...  [improve]
   2. metric=0.91xx  node=...  [draft]

When streaming isn't available, the UI falls back to a spinner plus the final rendered leaderboard: same result, no live fill. The metric is the spec: anything you can state as "here's the data, here's the goal, here's how a solution is scored" is a valid run, and the leaderboard is how the agent reads which candidate to carry forward.
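A streamed leaderboard amounts to re-ranking every time a scored candidate arrives. The sketch below is a minimal illustration with made-up candidates; the `Leaderboard` class is hypothetical, and it assumes a metric where higher is better.

```python
from dataclasses import dataclass, field


@dataclass
class Leaderboard:
    top_n: int = 4
    higher_is_better: bool = True  # assumption: the metric is maximized
    rows: list = field(default_factory=list)

    def add(self, metric: float, node: str, stage: str) -> None:
        """Record one scored candidate and keep only the best top_n."""
        self.rows.append((metric, node, stage))
        self.rows.sort(key=lambda r: r[0], reverse=self.higher_is_better)
        del self.rows[self.top_n:]

    def render(self) -> str:
        lines = [f"Leaderboard (top {self.top_n}):"]
        for rank, (metric, node, stage) in enumerate(self.rows, start=1):
            lines.append(f"  {rank}. metric={metric:.4f}  node={node}  [{stage}]")
        return "\n".join(lines)


board = Leaderboard()
# Made-up candidates, in the order a search might score them.
for metric, node, stage in [(0.91, "n1", "draft"), (0.93, "n2", "improve")]:
    board.add(metric, node, stage)
    print(board.render())  # re-render after every arrival: the "live fill"
```

The spinner fallback corresponds to calling `render()` once, after the last candidate, instead of after each one.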

Interactive approval gates

Every verb declares an approval tier, and the gate is enforced by the client/hook layer — beneath the model, not by prompt text. The agent mirrors in its behavior what the gate enforces in code, so it can propose an expensive action but can never self-approve it.

Tier	Examples	Gate behavior
read-only	`eda`, `viz`, `report`, `ops`, `datasets`, `doctor`, `validate`	Never prompts. Runs freely.
workspace-write	`features`, `ingest`, `pipeline`, `collab` (build)	Light / auto — runs with a one-line auto note; network off by default.
expensive · mutate	`run` / `optimize`, `pipeline`'s optimize stage, `train`	Always gates — shows the cost / rows / operation before firing.
irreversible · external	`deploy --apply`, `train --launch`, `collab --send`	Always gates, never model-auto-invoked. The model proposes; only you can fire it.

For the expensive and irreversible tiers the gate is a deny-first confirm: it shows the cost or the operation and asks [y/N], with the default landing on deny. The transcript below is the real shape — a declined deploy is blocked, not worked around:

loom> /deploy --validate ValidateFlow/41 --apply
╭─ Approval required: IRREVERSIBLE/EXTERNAL ────────────────╮
│ deploy to the external registry                           │
│ This is the IRREVERSIBLE/EXTERNAL tier — it always gates. │
│ The model proposes; only you can fire this.               │
╰──────────────────────────────────────────────────────────╯
Proceed with deploy to the external registry? [y/N]: n
  IRREVERSIBLE/EXTERNAL: BLOCKED — deploy to the external registry not approved.


The irreversible actions are safe by default and need an explicit flag: deploy produces a plan + staged manifest unless you pass --apply (and even then the gate must ALLOW), and collab builds a bundle that never leaves the box unless you pass --send. The principle throughout is opinionated low, permissive high: most cautious at compute / data / spend, most deferential at modeling choices.
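The tiering and the deny-first confirm can be sketched as a pure function. This is an illustration of the policy described in this section, not the actual hook code: the tier keys, `parse_confirm`, and `gate` are hypothetical names.

```python
# Tiers that always prompt, per the approval table (keys are illustrative).
ALWAYS_GATE = {"expensive", "irreversible"}


def parse_confirm(answer: str) -> bool:
    """Deny-first: only an explicit y/yes approves; empty input denies."""
    return answer.strip().lower() in {"y", "yes"}


def gate(tier: str, answer: str = "", by_model: bool = False) -> str:
    """Return ALLOW or BLOCKED for one proposed action."""
    if tier not in ALWAYS_GATE:
        return "ALLOW"  # read-only and workspace-write do not prompt
    if tier == "irreversible" and by_model:
        return "BLOCKED"  # the model proposes; only a person can fire it
    return "ALLOW" if parse_confirm(answer) else "BLOCKED"


print(gate("read-only"))
print(gate("irreversible", answer=""))   # pressing Enter lands on deny
print(gate("irreversible", answer="y"))
print(gate("irreversible", answer="y", by_model=True))
```

Keeping the decision in code like this, rather than in prompt text, is what makes "the agent can propose but not self-approve" enforceable.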

Keyless by default

The read-only and lifecycle verbs work without a model key — datasets, eda, viz, report, ops, validate, features, train (local), telemetry, doctor. Only two things need a key: the natural-language planning and the AIDE search behind run / pipeline.

A missing key never produces a traceback — it yields an actionable line telling you exactly which variable to set or which provider to pick:

# the quickest path — Claude is the default model
export ANTHROPIC_API_KEY="sk-ant-..."
# …or, inside the agent:
loom> /login

This is why you can install Loom, run loom doctor, ingest data, and profile it before you ever configure a model — the keyed surface is small and clearly fenced. See Configuration & providers for the full provider matrix.
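The "actionable line, never a traceback" behavior comes down to checking for the key up front and returning a message instead of raising. A minimal sketch, with the caveat that `missing_key_hint` is a hypothetical helper and the message wording is illustrative, not Loom's actual output:

```python
import os


def missing_key_hint(env=None, var="ANTHROPIC_API_KEY"):
    """Return None when the key is set, else a one-line actionable hint."""
    env = os.environ if env is None else env
    if env.get(var):
        return None
    # Illustrative wording only.
    return f'no model key: run `export {var}="..."` or use /login'


print(missing_key_hint(env={}))
print(missing_key_hint(env={"ANTHROPIC_API_KEY": "sk-ant-..."}))
```

Only the keyed surface (natural-language planning and the AIDE search) would call a check like this; keyless verbs skip it entirely.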

Picking the agent model

There are two model channels, and keeping them distinct is what lets the agent stay a thin shell over the engine:

The agent model: the LLM that drives the conversation, picks verbs, and narrates results. Set it per session in the REPL with /model, or at launch with the --model <provider/id> flag. Until you do, the banner shows "not set (use /model or /login)".
The engine's optimization providers: the LLM the AIDE search uses to write and judge candidate solutions, configured by --code-provider / --feedback-provider (or both at once with --model-provider) and their LOOM_* environment variables.

# pick the conversational agent model
loom --model anthropic-api/claude-sonnet-4-5
loom> /model

# …a separate channel: the engine's own search providers
loom run --goal "..." --metric "..." --model-provider openai-api

The engine's optimization providers are a separate channel the agent never proxies. When the AIDE search runs, its code-writing and judging calls go straight to the configured engine provider — they do not pass through the conversational agent. You can run the agent on one model and the search on a different one (or a different vendor entirely), and tune each independently.
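The separation of the two channels can be pictured as two independent pieces of configuration. In this sketch, `EngineProviders` and `resolve_engine_providers` are hypothetical names, and the precedence shown (a role-specific flag overrides --model-provider) is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineProviders:
    code: Optional[str]      # writes candidate solutions
    feedback: Optional[str]  # judges candidate solutions


def resolve_engine_providers(model_provider=None, code_provider=None,
                             feedback_provider=None) -> EngineProviders:
    """--model-provider sets both roles; a role-specific value overrides it
    (assumed precedence)."""
    return EngineProviders(
        code=code_provider or model_provider,
        feedback=feedback_provider or model_provider,
    )


# Channel 1: the conversational agent model.
agent_model = "anthropic-api/claude-sonnet-4-5"
# Channel 2: the engine's search providers, resolved independently.
engine = resolve_engine_providers(model_provider="openai-api")
print(agent_model)
print(engine)
```

Nothing in the engine configuration reads `agent_model`, which mirrors the point above: search calls go to the engine provider directly and are not proxied through the agent.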