The agentic CLI
loom is the agentic CLI: you state intent in plain English, the agent
picks the right lifecycle verb, runs it through the provider interface, reads the structured
VERDICT/summary, and composes the next step. It is a shell over the whole data-science
lifecycle — modeling included — never an automated black box you hand a dataset to and walk away from.
Intent in, verb out
You don't memorize a command grammar. You say what you want — "profile my data and flag leakage", "validate this against a sealed holdout" — and the agent does four things, in order, for every turn:
- Picks the verb. It maps your goal to one of the lifecycle verbs (eda, features, validate, deploy, …), choosing the smallest verb that answers the question rather than writing ad-hoc ML code.
- Runs it through the interface. Every verb executes through Loom's MLOps provider interface, never a concrete backend, and produces a Metaflow run, an @card, and a typed JSON summary.
- Reads the structured result. It parses the verb's --json object (the VERDICT/status, the summary, the pathspec, the card_path, the gate) and treats a FAIL as a domain outcome to act on, not a crash.
- Composes the next step. It threads each artifact forward (a leakage flag from eda blocks features; a validate VERDICT == PASS is what deploy asserts), so the chain is grounded in real lineage.
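The minimal Python sketch below makes the read-and-compose steps concrete. It assumes the following about the --json summary:

- It is a flat JSON object with keys named verdict (or status), pathspec, card_path, and summary.
- eda reports leakage as a boolean summary["leakage"].

Those key names, the VerbResult class, the example pathspecs, and the next_step rules are illustrative assumptions, not Loom's documented schema. What the sketch shows is the shape of the loop: a FAIL comes back as data to branch on, and each next step is derived from an upstream artifact.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class VerbResult:
    """Typed view of one verb's --json output (key names are assumed)."""
    verb: str
    verdict: str                       # "PASS" or "FAIL"
    pathspec: str                      # Metaflow run pathspec, e.g. "ValidateFlow/41"
    card_path: Optional[str] = None
    summary: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, verb: str, text: str) -> "VerbResult":
        raw = json.loads(text)
        return cls(
            verb=verb,
            # accept either key; a missing verdict is treated as FAIL (conservative)
            verdict=raw.get("verdict", raw.get("status", "FAIL")),
            pathspec=raw["pathspec"],
            card_path=raw.get("card_path"),
            summary=raw.get("summary", {}),
        )


def next_step(result: VerbResult) -> str:
    """Choose the follow-up; a FAIL is an outcome to act on, not an exception."""
    if result.verdict != "PASS":
        return f"investigate {result.pathspec} (verdict {result.verdict})"
    if result.verb == "eda":
        if result.summary.get("leakage"):
            return "blocked: resolve leakage before features"
        return f"propose features using {result.pathspec}"
    if result.verb == "validate":
        # deploy consumes only a validate run whose verdict is PASS
        return f"deploy --validate {result.pathspec}"
    return "done"
```

For example, a validate summary with a PASS verdict and pathspec ValidateFlow/41 yields deploy --validate ValidateFlow/41. That is the same invocation the slash-command examples on this page use.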
Under the hood the agent drives the very same verbs you can run yourself, so the conversation can never do something a verb can't. The agent's judgement is spent on the taste decisions — which metric, which threshold, which features, when to stop — not on mechanics the engine owns. For the full catalog and what each verb does, see Lifecycle & verbs.
Loom is the agentic operator for the entire lifecycle, end to end: data access, EDA, features, model search and training (AIDE + NeMo), validation, viz, reporting, deployment, ops, and collaboration. Modeling is one stage in that arc and fully in scope. Most tools automate only the modeling step, while Loom operates at the layer above them.
The interactive REPL
Run loom with no arguments to open the agent. It runs in a branded
black/magenta terminal UI; the banner shows the active providers and the current model:
```shell
loom                     # open the agent (interactive)
loom "profile my data"   # start with a one-shot goal, in plain English
loom <verb> [--flags]    # jump straight to a verb workflow (e.g. loom eda --dataset …)
```
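The three invocation forms can be told apart from argv alone. Below is a minimal Python sketch of that routing, not the real CLI's parser. It makes two assumptions:

- It uses a hypothetical VERBS set built from the slash-command catalog on this page.
- Anything that isn't a known verb is treated as a plain-English goal.

```python
from typing import Sequence

# Verb names taken from this page's slash-command list; the set itself is illustrative.
VERBS = {
    "eda", "datasets", "viz", "features", "run", "validate", "pipeline",
    "deploy", "train", "ops", "report", "collab", "ingest", "telemetry",
    "skillopt", "doctor",
}


def route(argv: Sequence[str]) -> tuple[str, object]:
    """Classify a command line as ("repl", None), ("verb", (name, args)), or ("goal", text)."""
    if not argv:
        return ("repl", None)                 # `loom` -> interactive agent
    head, *rest = argv
    if head in VERBS:
        return ("verb", (head, list(rest)))   # `loom eda --dataset ...`
    return ("goal", " ".join(argv))           # `loom "profile my data"`
```

The shell delivers a quoted goal as a single argv element, so "profile my data" arrives intact. The trade-off is that an unquoted goal starting with a verb name would be routed as that verb.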
A short transcript — no key and no infra needed to reach the prompt:
```text
an agentic CLI for data science
v0.1.0
search        : aide
mlops         : metaflow
model-builder : nemo
model         : anthropic-api
type /help for the verbs, /exit to quit.
loom> profile this dataset and flag leakage
running eda...
╭─ EDA profile ────────────────────────────────────────────╮
│ rows x cols : 50000 x 31                                 │
│ target      : is_fraud                                   │
│ leakage     : none detected                              │
╰──────────────────────────────────────────────────────────╯
✓ eda ok
```
A running verb is wrapped in a spinner; results render as panels with the verb's
VERDICT line. Ctrl-C cancels the current action, not the REPL;
Ctrl-D or /exit quits cleanly. The interactive UI dependencies are imported
lazily, only on the REPL path. An environment without them installed can still run every one-shot
subcommand.
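The cancel-versus-quit behavior and the lazy import can both be sketched in a few lines of Python. This is a hypothetical loop, not Loom's REPL:

- read_line, run, and echo are injected callables.
- rich stands in as an assumed optional UI dependency.

In Python, Ctrl-C surfaces as KeyboardInterrupt and Ctrl-D at an input() prompt surfaces as EOFError. Those are the two exceptions the handlers key on.

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


def load_ui() -> Optional[ModuleType]:
    """Import the optional UI package only when the REPL actually starts."""
    try:
        return importlib.import_module("rich.console")  # assumed UI dependency
    except ImportError:
        return None  # fall back to plain text; one-shot commands never call this


def repl(read_line: Callable[[str], str],
         run: Callable[[str], str],
         echo: Callable[[str], None]) -> None:
    """Loop until /exit, /quit, or EOF; Ctrl-C aborts only the current action."""
    while True:
        try:
            line = read_line("loom> ").strip()
        except EOFError:           # Ctrl-D at the prompt: quit cleanly
            break
        except KeyboardInterrupt:  # Ctrl-C at the prompt: drop the line, keep going
            continue
        if line in ("/exit", "/quit"):
            break
        if not line:
            continue
        try:
            echo(run(line))
        except KeyboardInterrupt:  # Ctrl-C mid-action: cancel it, keep the REPL
            echo("cancelled")
```

In an interactive session this would be wired as repl(input, dispatch, print), where dispatch is a hypothetical function that runs one verb or goal and returns its rendered result.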
Slash-commands & the meta set
Every lifecycle verb is also a slash-command. Type a verb with or without a leading
/; tab-completion offers every verb plus a small meta set. Each
/loom-<verb> runs that verb's full guided workflow — intake → plan/tier → run →
verify → deliver.
```text
loom> /eda --dataset IngestDataset/123 --target is_fraud
loom> /validate --dataset IngestDataset/123 --target is_fraud --sensitive region
loom> /deploy --validate ValidateFlow/41 --apply
```
The verb slash-commands cover the whole catalog —
/eda, /datasets, /viz, /features,
/run, /validate, /pipeline, /deploy,
/train, /ops, /report, /collab,
/ingest, /telemetry, /skillopt — plus
/plan to toggle plan mode. Alongside those is a small
meta set that controls the session itself, not the lifecycle:
| Command | What it does |
|---|---|
| /help | The verb table: the full catalog and what each verb does. |
| /status | The banner again: the active search / mlops / model-builder providers and the current model. |
| /doctor | The read-only stack health check (Python engine, Metaflow, datastore reachability). |
| /clear | Clear the conversation and start a fresh context. |
| /exit · /quit | Quit the REPL cleanly. |
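The sketch below shows how a REPL line might be classified and tab-completed, using the command names listed above. Its assumptions:

- The parse and complete functions are hypothetical.
- shlex (standard library) splits the flags the way a POSIX shell would.
- shlex is applied only after the first word is recognized as a command, so free-form goals containing apostrophes are never tokenized.

```python
import shlex

# Command names from this page; "plan" (the plan-mode toggle) is grouped with
# the session commands here purely for illustration.
VERBS = ("eda", "datasets", "viz", "features", "run", "validate", "pipeline",
         "deploy", "train", "ops", "report", "collab", "ingest", "telemetry",
         "skillopt")
META = ("help", "status", "doctor", "clear", "exit", "quit", "plan")


def parse(line: str) -> tuple[str, str, list[str]]:
    """Classify a REPL line as ("verb" | "meta" | "goal" | "empty", name, args)."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        return ("empty", "", [])
    name = parts[0].lstrip("/")            # the leading slash is optional
    kind = "meta" if name in META else "verb" if name in VERBS else None
    if kind is None:
        return ("goal", line.strip(), [])  # plain English: never tokenized
    rest = parts[1] if len(parts) > 1 else ""
    return (kind, name, shlex.split(rest))


def complete(prefix: str) -> list[str]:
    """Tab-completion candidates: every verb plus the meta set, by prefix."""
    stem = prefix.lstrip("/")
    return sorted("/" + c for c in VERBS + META if c.startswith(stem))
```

Because the slash is optional, a plain-English goal whose first word is a verb name (say, "run a quick baseline") is read as that verb here. A real implementation may resolve that ambiguity differently.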
Run the verbs directly, too
Everything the agent does is a verb you can run yourself, straight against
the engine with python -m loom <verb> … --json (exactly what the bundled
examples/ and CI use). The verbs are scriptable, and all of them except the AIDE-backed run and pipeline work without a model key. The natural-language planning is the layer on top; the verbs are
the durable contract underneath it.
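Because the verbs are the contract, a script can drive them the same way CI does. Below is a minimal sketch that uses only subprocess and json from the standard library. Its assumptions:

- verb_argv and run_verb are hypothetical helpers.
- A verb prints its JSON summary, and only that, on stdout when --json is passed.
- A FAIL verdict is reported in that summary rather than only through the exit code.

```python
import json
import subprocess
import sys
from typing import Any


def verb_argv(verb: str, **flags: str) -> list[str]:
    """Build `python -m loom <verb> --flag value ... --json` from keyword flags."""
    argv = [sys.executable, "-m", "loom", verb]
    for key, value in flags.items():
        argv += ["--" + key.replace("_", "-"), value]
    return argv + ["--json"]


def run_verb(verb: str, **flags: str) -> dict[str, Any]:
    """Run one verb and return its parsed JSON summary.

    check=True is deliberately not used: the verdict is read from the
    summary itself (an assumption about how Loom reports a FAIL).
    """
    proc = subprocess.run(verb_argv(verb, **flags), capture_output=True, text=True)
    if not proc.stdout.strip():
        raise RuntimeError(f"{verb} produced no JSON: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

With Loom installed, run_verb("eda", dataset="IngestDataset/123", target="is_fraud") would execute python -m loom eda --dataset IngestDataset/123 --target is_fraud --json and return the parsed object.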
Streaming & the leaderboard
The two search verbs — /run (the AIDE optimize loop) and /pipeline — drive
candidate proposal, real execution, and scoring against your metric. Because that loop is long-running,
the REPL streams the leaderboard as it fills, so you watch candidates land and the
best metric climb in real time:
```text
Leaderboard (top 4):
1. metric=0.93xx node=... [improve]
2. metric=0.91xx node=... [draft]
```
When streaming isn't available, the UI falls back to a spinner plus the final rendered leaderboard: the same result, without the live fill. The metric is the spec. Anything you can state as "here's the data, here's the goal, here's how a solution is scored" is a valid run, and the leaderboard is how the agent reads which candidate to carry forward.
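A streaming leaderboard only needs to hold the best k candidates seen so far. This sketch keeps them in a size-k min-heap (heapq from the standard library). Each new score then costs O(log k), and the worst survivor always sits at the top of the heap, ready to be evicted. Its assumptions:

- The Candidate fields and the render format are modeled loosely on the leaderboard lines above.
- Higher is better, so a loss-style metric such as RMSE would need its sign flipped before insertion.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node: str
    metric: float
    stage: str  # e.g. "draft" or "improve", as in the leaderboard tags


class Leaderboard:
    """Keep the best k candidates as scores stream in (higher metric is better)."""

    def __init__(self, k: int = 4) -> None:
        self.k = k
        self._heap: list[tuple[float, int, Candidate]] = []  # min-heap of the top k
        self._seen = 0

    def add(self, cand: Candidate) -> None:
        # the arrival counter breaks metric ties, so Candidates are never compared
        entry = (cand.metric, self._seen, cand)
        self._seen += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            heapq.heapreplace(self._heap, entry)  # evict the current worst

    def render(self) -> str:
        rows = sorted(self._heap, reverse=True)
        lines = [f"Leaderboard (top {self.k}):"]
        for rank, (_, _, c) in enumerate(rows, start=1):
            lines.append(f"{rank}. metric={c.metric:.4f} node={c.node} [{c.stage}]")
        return "\n".join(lines)
```

Calling render() after each add() gives the live fill. Calling it once at the end gives the fallback view.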
Interactive approval gates
Every verb declares an approval tier, and the gate is enforced by the client/hook layer — beneath the model, not by prompt text. The agent mirrors in its behavior what the gate enforces in code, so it can propose an expensive action but can never self-approve it.
| Tier | Examples | Gate behavior |
|---|---|---|
| read-only | eda, viz, report, ops, datasets, doctor, validate | Never prompts. Runs freely. |
| workspace-write | features, ingest, pipeline, collab (build) | Light / auto — runs with a one-line auto note; network off by default. |
| expensive · mutate | run / optimize, pipeline's optimize stage, train | Always gates — shows the cost / rows / operation before firing. |
| irreversible · external | deploy --apply, train --launch, collab --send | Always gates, never model-auto-invoked. The model proposes; only you can fire it. |
For the expensive and irreversible tiers the gate is a deny-first confirm: it shows
the cost or the operation and asks [y/N], with the default landing on deny. The
transcript below is the real shape — a declined deploy is blocked, not worked
around:
```text
loom> /deploy --validate ValidateFlow/41 --apply
╭─ Approval required: IRREVERSIBLE/EXTERNAL ────────────────╮
│ deploy to the external registry                           │
│ This is the IRREVERSIBLE/EXTERNAL tier — it always gates. │
│ The model proposes; only you can fire this.               │
╰───────────────────────────────────────────────────────────╯
Proceed with deploy to the external registry? [y/N]: n
IRREVERSIBLE/EXTERNAL: BLOCKED — deploy to the external registry not approved.
```
The irreversible verbs are safe by default, with each external action behind an explicit flag:

- deploy produces a plan + staged manifest unless you pass --apply, and even then the gate must ALLOW.
- collab builds a bundle that never leaves the box unless you pass --send.
- train stays local unless you pass --launch.
The principle throughout is opinionated low, permissive high — most cautious at
compute / data / spend, most deferential at modeling choices.
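The tier table and the deny-first confirm translate into a small decision function. This is a hypothetical Python sketch of the policy as described on this page, not Loom's client/hook layer. Its assumptions:

- Tier, tier_for, and gate are illustrative names.
- ask stands in for the human prompt and must never be wired to the model.
- Unlisted verbs (including deploy without --apply) fall back to a gating tier, chosen to err toward caution.

The key property is that only an explicit y or yes allows an expensive or irreversible action. An empty answer lands on deny.

```python
from enum import Enum
from typing import Callable, Sequence


class Tier(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    EXPENSIVE = "expensive / mutate"
    IRREVERSIBLE = "irreversible / external"


READ_ONLY_VERBS = {"eda", "viz", "report", "ops", "datasets", "doctor", "validate"}
# pipeline is listed as workspace-write; escalating its optimize stage is omitted here
WRITE_VERBS = {"features", "ingest", "pipeline", "collab"}
EXTERNAL_FLAGS = {("deploy", "--apply"), ("train", "--launch"), ("collab", "--send")}


def tier_for(verb: str, flags: Sequence[str]) -> Tier:
    """Map a verb invocation to its approval tier, following the table above."""
    if any((verb, f) in EXTERNAL_FLAGS for f in flags):
        return Tier.IRREVERSIBLE
    if verb in READ_ONLY_VERBS:
        return Tier.READ_ONLY
    if verb in WRITE_VERBS:
        return Tier.WORKSPACE_WRITE
    return Tier.EXPENSIVE  # run, train, and anything unlisted: always gate


def gate(tier: Tier, action: str, ask: Callable[[str], str]) -> bool:
    """Deny-first confirm: True means ALLOW, False means BLOCKED."""
    if tier in (Tier.READ_ONLY, Tier.WORKSPACE_WRITE):
        return True  # never prompts (workspace-write would also print a one-line note)
    answer = ask(f"Proceed with {action}? [y/N]: ")
    return answer.strip().lower() in ("y", "yes")  # "" and anything else deny
```

The flag check comes first, so the same verb can move tiers: collab on its own is a workspace write, while collab --send is external.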
Keyless by default
The read-only verbs and the local lifecycle verbs work without a model key:
datasets, eda, viz, report, ops,
validate, features, train (local), telemetry, and
doctor. Only two things need a key: the natural-language planning and the
AIDE search behind run / pipeline.
A missing key never produces a traceback — it yields an actionable line telling you exactly which variable to set or which provider to pick:
```text
# the quickest path: Claude is the default model
export ANTHROPIC_API_KEY="sk-ant-..."
# …or, inside the agent:
loom> /login
```
This is why you can install Loom, run loom doctor, ingest data, and profile it before
you ever configure a model — the keyed surface is small and clearly fenced. See
Configuration & providers for the full provider matrix.
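The "actionable line, never a traceback" behavior amounts to checking for the key before the keyed work starts and returning a hint instead of raising. Below is a minimal sketch, where missing_key_message is a hypothetical name. Its assumptions:

- Only the default Anthropic provider is considered, so only ANTHROPIC_API_KEY is checked.
- run and pipeline are the keyed verbs, per the list above.
- Other providers would need their own variables.

```python
import os
from typing import Mapping, Optional

# Per this page, only NL planning and the AIDE search behind run/pipeline need a key.
KEYED_VERBS = {"run", "pipeline"}


def missing_key_message(verb: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a one-line, actionable hint instead of raising; None means OK."""
    if verb not in KEYED_VERBS:
        return None  # keyless verb: nothing to check
    if env.get("ANTHROPIC_API_KEY"):
        return None  # default provider is configured
    return (f"{verb} needs a model key: export ANTHROPIC_API_KEY=... "
            "or run /login inside the agent")
```

The caller prints the message and exits, so a keyless session never reaches the code path that would fail.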
Picking the agent model
There are two model channels, and keeping them distinct is what lets the agent stay a thin shell over the engine:
- The agent model: the LLM that drives the conversation, picks verbs, and narrates results. Set it per session in the REPL with /model, or at launch with the --model <provider/id> flag. Until you do, the banner shows "not set (use /model or /login)".
- The engine's optimization providers: the LLM the AIDE search uses to write and judge candidate solutions. It is configured by --code-provider / --feedback-provider (or both at once with --model-provider) and their LOOM_* environment variables.
```text
# pick the conversational agent model
loom --model anthropic-api/claude-sonnet-4-5
loom> /model
# …a separate channel: the engine's own search providers
loom run --goal "..." --metric "..." --model-provider openai-api
```
The engine's optimization providers are a separate channel the agent never proxies. When the AIDE search runs, its code-writing and judging calls go straight to the configured engine provider — they do not pass through the conversational agent. You can run the agent on one model and the search on a different one (or a different vendor entirely), and tune each independently.
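Keeping the two channels apart mostly means never letting one flag fall through to the other. Below is a sketch with argparse from the standard library, using the flag names from this page. Its assumptions:

- The Channels dataclass and parse_channels are hypothetical.
- A specific --code-provider or --feedback-provider overrides the combined --model-provider.
- The LOOM_* environment variables are ignored.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Channels:
    agent_model: Optional[str]        # conversational agent (--model or /model)
    code_provider: Optional[str]      # engine: writes AIDE candidate solutions
    feedback_provider: Optional[str]  # engine: judges candidate solutions


def parse_channels(argv: Sequence[str]) -> Channels:
    p = argparse.ArgumentParser(prog="loom")
    p.add_argument("--model")
    p.add_argument("--model-provider")
    p.add_argument("--code-provider")
    p.add_argument("--feedback-provider")
    args, _rest = p.parse_known_args(argv)  # verb and its own flags are left alone
    return Channels(
        agent_model=args.model,  # never copied into the engine channel
        code_provider=args.code_provider or args.model_provider,
        feedback_provider=args.feedback_provider or args.model_provider,
    )
```

In this model, the run example above sets both engine providers to openai-api and leaves agent_model unset. The --model launch example does the reverse.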