Data access (MCP)
Your data rarely lives in one place. Loom reaches it across vendors
and clouds — S3, GCS, a warehouse, Postgres, a plain filesystem — and drives
cloud ops over the Model Context Protocol. The discipline is strict and it
is what keeps this safe: MCP data tools locate and fetch, then
you loom ingest into a Metaflow data object and work on the pathspec — never
the raw store, never bulk data through chat. MCP cloud-ops tools are
gated, exactly like deploy --apply.
What MCP gives Loom
The Model Context Protocol is an open standard for connecting an agent to external systems through small, declared tools. Loom uses it for two jobs, and the two are governed differently:
- data tools: locate and pull data from wherever it lives (object stores, warehouses, relational databases, files). They are a doorway, not a workspace. Their output flows through one boundary, loom ingest, before any verb touches it.
- cloud-ops tools: provision and deploy against your cloud (spin up infra, push an artifact, mutate a registry). These are irreversible/external and are treated like the irreversible/external verb tier: the model proposes, only you confirm.
MCP is powered by the bundled pi-mcp-adapter — a single lazy proxy tool
installed on first launch, so the agent only pays the cost of an MCP server when you actually
reach for one. You add servers; the adapter surfaces their tools to the agent.
MCP is how Loom reaches data and clouds it doesn't host. It does not
change the data model: the one place outside data crosses into Loom is still
loom ingest, and every lifecycle verb still runs on a Metaflow
data object referenced by pathspec. MCP just widens where the
bytes can come from.
Configure servers in .mcp.json
Declare the MCP servers you want in a .mcp.json at the root of your working
directory. Each entry names a server, the command that launches it, and the environment it
reads — and, as everywhere in Loom, secrets come from the environment, never inline:
{
  "mcpServers": {
    "s3": {
      "command": "uvx",
      "args": ["mcp-server-s3"],
      "env": { "AWS_REGION": "us-east-1" }
    },
    "warehouse": {
      "command": "uvx",
      "args": ["mcp-server-bigquery", "--project", "my-proj"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"]
    }
  }
}
JSON has no comment syntax, so the notes live here instead: the s3 server takes its credentials from your shell environment, and the postgres server reads the connection string from PGDATABASE / PG* environment variables, not a flag.
Prefer not to hand-edit JSON? Run /mcp setup inside the agent and let it walk
you through adding a server — it writes the same .mcp.json. Either way, the
adapter discovers the declared servers and exposes their tools to the agent the next time
you reach for data.
Never put a key, token, or connection secret on a command line or in a committed
.mcp.json. Put the literal values an MCP server needs (region, project, host)
in the file; put credentials in your environment and reference them by name.
This is the same rule the rest of Loom follows — secrets are only ever read from the
environment.
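The rule above can be checked mechanically before a commit. The following is a minimal sketch, not part of Loom: the helper name `find_inline_secrets`, the list of suspicious key names, and the sample config are all assumptions made for illustration, and the check is a heuristic that will miss secrets with unusual names.

```python
import json
import re

# Heuristic only: key names that usually hold credentials (an assumed list, not Loom's).
SECRET_NAME = re.compile(
    r"(secret|token|password|passwd|api[_-]?key|access[_-]?key)", re.IGNORECASE
)
# Connection URLs with an embedded password, e.g. postgres://user:pw@host/db
URL_WITH_PASSWORD = re.compile(r"://[^/\s:@]+:[^/\s@]+@")


def find_inline_secrets(config: dict) -> list[str]:
    """Return readable findings for values that look like inline secrets."""
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        # Any credential-looking key with a literal value in "env" is suspect.
        for key in server.get("env", {}):
            if SECRET_NAME.search(key):
                findings.append(f"{name}: env.{key} is set inline")
        # So is a command-line argument that carries user:password@host.
        for arg in server.get("args", []):
            if URL_WITH_PASSWORD.search(arg):
                findings.append(f"{name}: an argument embeds a password in a URL")
    return findings


# Hypothetical config: "s3" follows the rule, "bad" breaks it twice.
example = json.loads("""
{
  "mcpServers": {
    "s3": {"command": "uvx", "args": ["mcp-server-s3"],
           "env": {"AWS_REGION": "us-east-1"}},
    "bad": {"command": "npx",
            "args": ["server", "postgres://app:hunter2@db.internal/prod"],
            "env": {"API_TOKEN": "abc123"}}
  }
}
""")

for finding in find_inline_secrets(example):
    print(finding)
```

Literal, non-secret values such as a region pass untouched; only credential-shaped keys and password-bearing URLs are reported.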
The data discipline: locate → fetch → ingest
This is the rule that makes MCP safe to use inside a data-science loop. An MCP
data tool's job ends the moment the data is on the box. You do not
point a verb at a raw MCP store, and you do not stream bulk rows through the
chat. Instead:
- Locate: use the MCP data tool to find the object/table/file you want (list a bucket, browse a schema, run a scoping query).
- Fetch: pull just that down to a local path.
- Ingest: cross the one boundary. loom ingest turns the local source into a versioned Metaflow data object and prints its dataset_ref pathspec.
- Work the pathspec: every downstream verb (eda, features, run, validate, …) operates on that pathspec, not on the MCP store.
# 1. LOCATE + FETCH via MCP (ask the agent in plain English):
# "use the s3 server to pull s3://acme-data/txns/2026-05/ into ./pulled"
# 2. INGEST — the one external→Metaflow boundary; prints the dataset_ref
loom ingest --source ./pulled --name acme-txns # -> IngestDataset/123
# 3. WORK THE PATHSPEC — verbs run on the data object, never on S3
loom eda --dataset IngestDataset/123 --target is_fraud
loom features --dataset IngestDataset/123 --target is_fraud
Once ingested, the data object is identical to any other — Loom never knows or cares that it came from S3 versus a warehouse versus a local CSV. The datastore behind the pathspec stays in your perimeter and opaque to Loom; the lifecycle composes on references, not bytes.
Two derivative rules follow from this. Prompt hygiene: your bulk data never enters the model's context — only small derived signal (a schema, a preview, metrics) does, and that is the LLM's only view of your data. Scope the fetch: pull the slice you need, not the whole bucket — a scoping query or a prefix beats a full dump every time.
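To make "small derived signal" concrete, here is a minimal sketch of reducing a fetched CSV to column names, a short preview, and a row count. It illustrates the idea only and is not how Loom computes its summaries; the function name `summarize_csv` and the sample rows are hypothetical.

```python
import csv
import io


def summarize_csv(text: str, preview_rows: int = 3) -> dict:
    """Reduce a CSV to column names, a short preview, and a row count."""
    reader = csv.DictReader(io.StringIO(text))
    preview = []
    row_count = 0
    for row in reader:
        # Keep only the first few rows; the rest are counted, never stored.
        if row_count < preview_rows:
            preview.append(row)
        row_count += 1
    return {
        "columns": reader.fieldnames or [],
        "preview": preview,
        "row_count": row_count,
    }


# Hypothetical extract standing in for a file under ./pulled.
sample = (
    "txn_id,amount,is_fraud\n"
    "1,10.0,0\n2,250.0,1\n3,12.5,0\n4,99.0,0\n5,7.25,0\n"
)
signal = summarize_csv(sample)
print(signal["columns"], signal["row_count"])
```

The summary stays the same size whether the extract has five rows or five million, which is the property that keeps bulk data out of the model's context.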
Across vendors and clouds
Because the verb only ever sees a pathspec, the same lifecycle runs no matter where the source sits. A typical pilot mixes several at once — pull a base table from a warehouse, enrich from Postgres, drop a reference file from GCS — and ingests each as its own data object:
| Source | MCP data server reaches it | Then |
|---|---|---|
| Amazon S3 | list / get objects by prefix | loom ingest --source ./pulled |
| Google Cloud Storage | list / get objects by prefix | loom ingest … |
| A warehouse (e.g. BigQuery) | run a scoping query, export the result | loom ingest … |
| Postgres | inspect schema, fetch a scoped extract | loom ingest … |
| A filesystem | read files/dirs by path | loom ingest … |
You can also stitch sources together before ingest — pull two extracts, join or stack them locally, ingest the result as one named dataset. Whatever lands becomes one content-addressed, versioned object you reference by pathspec for the rest of the run.
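Stacking two extracts locally before ingest might look like the sketch below. It assumes both extracts are CSV with an identical header; `stack_csv` and the sample data are hypothetical, and a real join would more likely use a dataframe library.

```python
import csv
import io


def stack_csv(*extracts: str) -> str:
    """Stack CSV extracts that share one header into a single CSV string."""
    out = io.StringIO()
    writer = None
    header = None
    for text in extracts:
        reader = csv.DictReader(io.StringIO(text))
        if reader.fieldnames is None:
            raise ValueError("empty extract: no header row")
        if header is None:
            # The first extract fixes the header for the stacked output.
            header = reader.fieldnames
            writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
            writer.writeheader()
        elif reader.fieldnames != header:
            raise ValueError(f"header mismatch: {reader.fieldnames} != {header}")
        writer.writerows(reader)
    return out.getvalue()


# Two hypothetical extracts pulled from different sources.
may = "id,amount\n1,10\n2,20\n"
june = "id,amount\n3,30\n"
print(stack_csv(may, june))
```

Writing the stacked result into the local directory you then pass to loom ingest --source gives one named dataset instead of two.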
Cloud ops are gated
The second class of MCP tool drives your cloud directly — provision compute, deploy a
service, mutate a registry. These are irreversible/external, so Loom governs
them with the same posture as its own most dangerous verb, deploy --apply:
the model can propose the action, but only you can fire it.
| Tier | MCP class · examples | Gate |
|---|---|---|
| read-only | data: locate, browse a schema, scoped fetch | runs freely; output goes through ingest |
| irreversible/external | cloud-ops: provision, deploy, mutate infra | always gates: deny-first y/N, never model-auto-fired |
The same interactive approval gate that wraps deploy --apply,
train --launch, and collab --send wraps a cloud-ops MCP call: it shows
the operation and waits for an explicit confirmation before anything runs. A subagent can never
reach one either — delegated personas only ever hold tier-safe,
read-only verb sets, so provisioning and deploys always stay with you in the parent session.
An MCP cloud-ops tool acts on real infrastructure outside Loom's datastore.
Treat its approval prompt the way you'd treat deploy --apply: read what it will
do, and confirm only when you mean it. Loom never self-approves an external mutation.
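The "deny-first y/N" behaviour can be pictured with a small sketch. This is an illustration of the pattern, not Loom's implementation: `approved`, `run_gated`, and the sample operation text are hypothetical names chosen for the example.

```python
def approved(reply: str) -> bool:
    """Deny-first: only an explicit yes passes; empty or anything else denies."""
    return reply.strip().lower() in {"y", "yes"}


def run_gated(description: str, action, ask=input):
    """Show the operation, wait for an explicit confirmation, then run it."""
    reply = ask(f"{description}\nProceed? [y/N] ")
    if not approved(reply):
        return None  # denied: the action never runs
    return action()


# Simulated replies stand in for a human at the prompt.
def deploy():
    return "deployed"


print(run_gated("deploy service to prod", deploy, ask=lambda _: ""))   # Enter alone denies
print(run_gated("deploy service to prod", deploy, ask=lambda _: "y"))  # explicit yes runs
```

The default is the important part: pressing Enter, typing anything ambiguous, or getting no reply at all leaves the infrastructure untouched.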
In plan mode
When you toggle plan mode (/plan), the agent
drops to a read-only exploration phase. MCP fits cleanly into that: a data tool
that only locates — listing a bucket, inspecting a warehouse schema — is fair game for
figuring out what's there before you commit to a framing. But the boundary still holds:
loom ingest is a write, and every cloud-ops tool is irreversible, so
both are blocked until you toggle plan mode off and approve the plan. Plan mode is exactly when
you'd use MCP to survey the landscape and then propose which sources to ingest.
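The plan-mode rule reduces to a small lookup, sketched here under assumptions: the `Kind` enum and `allowed_in_plan_mode` are hypothetical names, and the sketch covers only the three cases the text above names (it takes no position on fetches to a local path).

```python
from enum import Enum


class Kind(Enum):
    LOCATE = "locate"        # list a bucket, inspect a warehouse schema
    INGEST = "ingest"        # loom ingest, which is a write
    CLOUD_OPS = "cloud-ops"  # provision, deploy, mutate infra


def allowed_in_plan_mode(kind: Kind) -> bool:
    """Only read-only exploration runs while plan mode is on."""
    return kind is Kind.LOCATE


for kind in Kind:
    status = "allowed" if allowed_in_plan_mode(kind) else "blocked"
    print(f"{kind.value}: {status}")
```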
Recap
- MCP lets Loom reach data across vendors and clouds and drive cloud ops, via the bundled pi-mcp-adapter.
- Configure servers in .mcp.json, or run /mcp setup in the agent; keep secrets in the environment.
- data tools locate and fetch only; then loom ingest into a Metaflow data object and work the pathspec. Never run a verb against the raw store; never push bulk data through chat.
- cloud-ops tools are gated like deploy --apply: the model proposes, you confirm.