Data access (MCP)

Your data rarely lives in one place. Loom reaches it across vendors and clouds (S3, GCS, a warehouse, Postgres, a plain filesystem) and drives cloud ops over the Model Context Protocol. The discipline is strict, and it is what keeps this safe: MCP data tools locate and fetch, then you loom ingest the result into a Metaflow data object and work on its pathspec, never on the raw store and never by pushing bulk data through chat. MCP cloud-ops tools are gated, exactly like deploy --apply.

What MCP gives Loom

The Model Context Protocol is an open standard for connecting an agent to external systems through small, declared tools. Loom uses it for two jobs, and the two are governed differently:

data tools: locate and pull data from wherever it lives (object stores, warehouses, relational databases, files). They are a doorway, not a workspace. Their output flows through one boundary, loom ingest, before any verb touches it.
cloud-ops tools: provision and deploy against your cloud (spin up infra, push an artifact, mutate a registry). Their effects are irreversible and land outside Loom, so they are governed like Loom's own irreversible/external verb tier: the model proposes, only you confirm.

Loom's MCP support comes from the bundled pi-mcp-adapter: a single lazy proxy tool, installed on first launch, so the agent pays the cost of an MCP server only when you actually reach for one. You add servers; the adapter surfaces their tools to the agent.
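The "lazy proxy" idea can be sketched in a few lines of Python. This is an illustration of the behavior described above, not pi-mcp-adapter's code: `LazyServerPool`, `make_launcher`, and the factory-based launch are hypothetical stand-ins for spawning a real MCP server process.

```python
from typing import Callable, Dict, List


class LazyServerPool:
    """Hypothetical sketch: start each declared server only on first use."""

    def __init__(self, launchers: Dict[str, Callable[[], object]]):
        # name -> zero-argument factory that "starts" that server
        self._launchers = launchers
        self._running: Dict[str, object] = {}

    def get(self, name: str) -> object:
        # Nothing is launched until a tool on this server is actually requested.
        if name not in self._running:
            if name not in self._launchers:
                raise KeyError(f"no MCP server declared as {name!r}")
            self._running[name] = self._launchers[name]()
        return self._running[name]

    def running(self) -> List[str]:
        return sorted(self._running)


starts: List[str] = []


def make_launcher(name: str) -> Callable[[], object]:
    def launch() -> object:
        starts.append(name)  # record each simulated process start
        return f"<{name} server>"
    return launch


pool = LazyServerPool({n: make_launcher(n) for n in ("s3", "warehouse", "postgres")})
pool.get("s3")
pool.get("s3")          # second call reuses the already-running server
print(pool.running())   # ['s3']
print(starts)           # ['s3']
```

Declaring three servers costs nothing here; only the one the agent touches is ever started, and it is started once.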

MCP is how Loom reaches data and clouds it doesn't host. It does not change the data model: the one place outside data crosses into Loom is still loom ingest, and every lifecycle verb still runs on a Metaflow data object referenced by pathspec. MCP just widens where the bytes can come from.

Configure servers in `.mcp.json`

Declare the MCP servers you want in a .mcp.json at the root of your working directory. Each entry names a server, the command that launches it, and the environment it reads — and, as everywhere in Loom, secrets come from the environment, never inline:

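{
  "mcpServers": {
    "s3": {
      "command": "uvx",
      "args": ["mcp-server-s3"],
      "env": { "AWS_REGION": "us-east-1" }
    },
    "warehouse": {
      "command": "uvx",
      "args": ["mcp-server-bigquery", "--project", "my-proj"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"]
    }
  }
}

The s3 entry sets only its region in the file; AWS credentials come from your shell environment. The postgres entry is meant to read its connection details from the PGDATABASE / PG* environment variables rather than a command-line flag. Confirm this against your server's documentation: some Postgres MCP servers take a connection URL as an argument, and that URL must not carry a password.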

Prefer not to hand-edit JSON? Run /mcp setup inside the agent and let it walk you through adding a server — it writes the same .mcp.json. Either way, the adapter discovers the declared servers and exposes their tools to the agent the next time you reach for data.
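To make "discovers the declared servers" concrete, here is a small Python sketch that reads a .mcp.json shaped like the example and turns each entry into a launch spec. `ServerSpec` and `load_servers` are hypothetical names, and the real adapter's parsing may differ.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServerSpec:
    name: str
    command: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)

    def argv(self) -> List[str]:
        # The full command line that would be spawned for this server.
        return [self.command, *self.args]


def load_servers(text: str) -> List[ServerSpec]:
    """Parse .mcp.json text (strict JSON: no comments) into launch specs."""
    config = json.loads(text)
    return [
        ServerSpec(
            name=name,
            command=entry["command"],
            args=list(entry.get("args", [])),
            env=dict(entry.get("env", {})),
        )
        for name, entry in config.get("mcpServers", {}).items()
    ]


example = """
{
  "mcpServers": {
    "s3": {"command": "uvx", "args": ["mcp-server-s3"],
           "env": {"AWS_REGION": "us-east-1"}},
    "postgres": {"command": "npx",
                 "args": ["-y", "@modelcontextprotocol/server-postgres"]}
  }
}
"""

for spec in load_servers(example):
    print(spec.name, spec.argv(), spec.env)
```

Against a real file you would pass `Path(".mcp.json").read_text()` from `pathlib`. Because the file is parsed as strict JSON, a stray `#` comment makes `json.loads` raise rather than being silently ignored.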

⚠

Never put a key, token, or connection secret on a command line or in a committed .mcp.json. Put the non-secret values an MCP server needs (region, project, host) in the file, and keep credentials in your environment, where the server reads them by name. This is the same rule the rest of Loom follows: secrets are only ever read from the environment.
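One way to back this rule up before a commit is a small check that flags secret-looking values in .mcp.json. The sketch below is a hypothetical pre-commit helper, not a Loom feature; its key-name and value patterns are heuristic assumptions that will miss some secrets, so treat it as a safety net rather than a guarantee.

```python
import json
import re
from typing import List

# Heuristic assumptions, not an exhaustive secret scanner:
# env keys whose names suggest a credential...
SECRET_KEY = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL", re.IGNORECASE)
# ...and argument values shaped like one.
SECRET_VALUE = re.compile(
    r"AKIA[0-9A-Z]{16}"            # AWS access key id shape
    r"|://[^/\s:@]+:[^/\s@]+@"     # user:password@ inside a connection URL
    r"|^(?:sk|ghp|xox[bp])[-_]"    # common API-token prefixes
)


def find_inline_secrets(text: str) -> List[str]:
    """Return warnings for secret-looking values in .mcp.json text."""
    problems = []
    servers = json.loads(text).get("mcpServers", {})
    for name, entry in servers.items():
        for key in entry.get("env", {}):
            if SECRET_KEY.search(key):
                problems.append(f"{name}: env {key} looks like a secret")
        for i, arg in enumerate(entry.get("args", [])):
            if SECRET_VALUE.search(str(arg)):
                # Report the position, not the value, so the warning doesn't leak it.
                problems.append(f"{name}: args[{i}] looks like a credential")
    return problems


risky = json.dumps({"mcpServers": {"pg": {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-postgres",
             "postgresql://app:hunter2@db.internal/prod"],
    "env": {"PGPASSWORD": "hunter2"},
}}})
for warning in find_inline_secrets(risky):
    print(warning)
# pg: env PGPASSWORD looks like a secret
# pg: args[2] looks like a credential
```

The s3 entry from the example config, with only `AWS_REGION` in its env, produces no warnings.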

The data discipline: locate → fetch → `ingest`

This is the rule that makes MCP safe to use inside a data-science loop. An MCP data tool's job ends the moment the data is on the box. You do not point a verb at a raw MCP store, and you do not stream bulk rows through the chat. Instead:

Locate — use the MCP data tool to find the object/table/file you want (list a bucket, browse a schema, run a scoping query).
Fetch — pull just that down to a local path.
Ingest — cross the one boundary: loom ingest turns the local source into a versioned Metaflow data object and prints its dataset_ref pathspec.
Work the pathspec — every downstream verb (eda, features, run, validate, …) operates on that pathspec, not on the MCP store.

# 1. LOCATE + FETCH via MCP (ask the agent in plain English):
#    "use the s3 server to pull s3://acme-data/txns/2026-05/ into ./pulled"

# 2. INGEST — the one external→Metaflow boundary; prints the dataset_ref
loom ingest --source ./pulled --name acme-txns      # -> IngestDataset/123

# 3. WORK THE PATHSPEC — verbs run on the data object, never on S3
loom eda      --dataset IngestDataset/123 --target is_fraud
loom features --dataset IngestDataset/123 --target is_fraud

Once ingested, the data object is identical to any other — Loom never knows or cares that it came from S3 versus a warehouse versus a local CSV. The datastore behind the pathspec stays in your perimeter and opaque to Loom; the lifecycle composes on references, not bytes.
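The "verbs take a pathspec, never a store" rule is easy to enforce mechanically. Below is a hypothetical Python guard that a verb wrapper could run on its --dataset argument. The pattern assumes a Metaflow pathspec of the form FlowName/RunId (like IngestDataset/123), optionally followed by step, task, and artifact segments; `check_dataset_ref` is illustrative, not Loom's code.

```python
import re

# Assumption: FlowName/RunId[/Step[/Task[/Artifact]]], where FlowName is a
# Python identifier and the remaining segments are simple ids.
PATHSPEC = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(/[A-Za-z0-9_-]+){1,4}")


def check_dataset_ref(ref: str) -> str:
    """Return ref if it looks like a pathspec; refuse raw stores and local paths."""
    if "://" in ref:
        raise ValueError(f"{ref!r} is a raw store URI; run loom ingest first")
    if not PATHSPEC.fullmatch(ref):
        raise ValueError(f"{ref!r} is not a pathspec like IngestDataset/123")
    return ref


for ref in ["IngestDataset/123", "s3://acme-data/txns/2026-05/", "./pulled"]:
    try:
        print("ok:", check_dataset_ref(ref))
    except ValueError as err:
        print("rejected:", err)
```

Only the first reference passes: the S3 prefix is refused as a raw store, and ./pulled is refused because a local path is still pre-ingest data.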

Two rules follow from this. Prompt hygiene: your bulk data never enters the model's context. Only small derived signal (a schema, a preview, metrics) does, and that is the LLM's only view of your data. Scope the fetch: pull the slice you need, not the whole bucket. A scoping query or a prefix beats a full dump every time.
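As an illustration of "small derived signal", the Python sketch below reduces a CSV extract to inferred column types, a row count, and a short preview, the kind of summary that can reach the model while the rows stay on disk. `summarize_csv`, the crude type-inference rule, and the sample and preview sizes are all assumptions; Loom's actual summaries may look different.

```python
import csv
import io
from typing import Dict, List, TextIO


def infer_type(values: List[str]) -> str:
    """Crude typing: int if every non-empty value parses as int, else float, else str."""
    filled = [v for v in values if v != ""]
    if not filled:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in filled:
                cast(v)
        except ValueError:
            continue
        return name
    return "str"


def summarize_csv(stream: TextIO, preview_rows: int = 3,
                  sample_rows: int = 1000) -> Dict[str, object]:
    """Stream the CSV once, keeping a bounded sample rather than the whole table."""
    reader = csv.reader(stream)
    header = next(reader)
    sample: List[List[str]] = []
    n_rows = 0
    for row in reader:
        n_rows += 1
        if len(sample) < sample_rows:
            sample.append(row)
    columns = {
        name: infer_type([r[i] for r in sample if i < len(r)])
        for i, name in enumerate(header)
    }
    return {"columns": columns, "n_rows": n_rows, "preview": sample[:preview_rows]}


extract = io.StringIO("txn_id,amount,is_fraud\n1,12.50,0\n2,980.00,1\n3,4.99,0\n4,15.00,0\n")
summary = summarize_csv(extract, preview_rows=2)
print(summary["columns"])  # {'txn_id': 'int', 'amount': 'float', 'is_fraud': 'int'}
print(summary["n_rows"])   # 4
```

On a real extract you would pass an open file handle, for example `open(path, newline="")`. Memory stays bounded by `sample_rows` no matter how large the file is.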

Across vendors and clouds

Because the verb only ever sees a pathspec, the same lifecycle runs no matter where the source sits. A typical pilot mixes several at once — pull a base table from a warehouse, enrich from Postgres, drop a reference file from GCS — and ingests each as its own data object:

Source	How the MCP `data` server reaches it	Then
Amazon S3	list / get objects by prefix	`loom ingest --source ./pulled`
Google Cloud Storage	list / get objects by prefix	`loom ingest …`
A warehouse (e.g. BigQuery)	run a scoping query, export the result	`loom ingest …`
Postgres	inspect schema, fetch a scoped extract	`loom ingest …`
A filesystem	read files/dirs by path	`loom ingest …`
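In practice, a "scoping query" selects only the columns and the slice you need and writes the result to a local file for ingest. The sketch below uses Python's built-in sqlite3 as a stand-in for the warehouse or Postgres server that an MCP tool would actually query. The table, columns, date range, and `fetch_scoped` helper are invented for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Stand-in source: an in-memory table playing the role of the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txns (txn_id INTEGER, ts TEXT, amount REAL, raw_payload TEXT, is_fraud INTEGER)"
)
conn.executemany(
    "INSERT INTO txns VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2026-04-30", 20.0, "blob-1", 0),
        (2, "2026-05-02", 980.0, "blob-2", 1),
        (3, "2026-05-15", 4.99, "blob-3", 0),
    ],
)

# Scope by columns (raw_payload is left behind) and by a half-open date range.
SCOPING_QUERY = """
    SELECT txn_id, ts, amount, is_fraud
    FROM txns
    WHERE ts >= ? AND ts < ?
    ORDER BY txn_id
"""


def fetch_scoped(conn: sqlite3.Connection, start: str, end: str, dest: Path) -> int:
    """Write only the scoped rows to dest as CSV; return the row count."""
    cur = conn.execute(SCOPING_QUERY, (start, end))
    header = [col[0] for col in cur.description]
    count = 0
    with dest.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cur:
            writer.writerow(row)
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    pulled = Path(tmp) / "pulled"  # plays the role of ./pulled
    pulled.mkdir()
    n = fetch_scoped(conn, "2026-05-01", "2026-06-01", pulled / "txns-2026-05.csv")
    extract_text = (pulled / "txns-2026-05.csv").read_text()

print(n)  # 2
print(extract_text)
```

The April row and the raw_payload column never leave the source; the resulting directory is what loom ingest --source would then receive.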

You can also stitch sources together before ingest — pull two extracts, join or stack them locally, ingest the result as one named dataset. Whatever lands becomes one content-addressed, versioned object you reference by pathspec for the rest of the run.
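Here is a Python sketch of the stitch-then-ingest step for the "stack" case. Two local extracts that share a header are concatenated into one file, and a SHA-256 digest of the result illustrates what content addressing buys you: identical bytes always produce the same address. The file names, the `stack_csvs` helper, and the choice of SHA-256 are assumptions for illustration; Loom's own addressing scheme is not described here.

```python
import csv
import hashlib
import tempfile
from pathlib import Path
from typing import List, Optional


def stack_csvs(inputs: List[Path], output: Path) -> str:
    """Stack CSV extracts that share a header; return the output's SHA-256 hex digest."""
    header: Optional[List[str]] = None
    with output.open("w", newline="") as out:
        writer = csv.writer(out)
        for path in inputs:
            with path.open(newline="") as f:
                reader = csv.reader(f)
                cols = next(reader)
                if header is None:
                    header = cols
                    writer.writerow(header)
                elif cols != header:
                    raise ValueError(f"{path} has a different header: {cols}")
                writer.writerows(reader)  # remaining data rows
    return hashlib.sha256(output.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "warehouse.csv").write_text("txn_id,amount\n1,12.50\n")
    (d / "postgres.csv").write_text("txn_id,amount\n2,980.00\n")
    inputs = [d / "warehouse.csv", d / "postgres.csv"]
    first = stack_csvs(inputs, d / "combined.csv")
    again = stack_csvs(inputs, d / "combined2.csv")
    combined_text = (d / "combined.csv").read_text()

print(combined_text)
print(first == again)  # True: same bytes, same address
```

A join would replace the concatenation loop, but the last step is the same: one local file (or directory) handed to loom ingest --source under a single dataset name.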

Cloud ops are gated

The second class of MCP tool drives your cloud directly — provision compute, deploy a service, mutate a registry. These are irreversible/external, so Loom governs them with the same posture as its own most dangerous verb, deploy --apply: the model can propose the action, but only you can fire it.

Tier	MCP class · examples	Gate
read-only	`data` — locate, browse a schema, scoped fetch	runs freely; output goes through `ingest`
irreversible/external	`cloud-ops` — provision, deploy, mutate infra	always gates — deny-first `y/N`, never model-auto-fired
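The gate in this table can be modeled as a deny-first confirmation. The Python below is a hypothetical sketch of the behavior described, not Loom's implementation: `TIER_BY_CLASS`, `run_tool`, and the injected `ask` callable are illustrative names. Only an explicit "y" or "yes" fires an irreversible/external call; an empty answer, anything else, or no interactive user at all means no. Treating an unknown tool class as the strictest tier is an added assumption.

```python
from typing import Callable, Optional

TIER_BY_CLASS = {"data": "read-only", "cloud-ops": "irreversible/external"}


def run_tool(tool_class: str, description: str, action: Callable[[], str],
             ask: Optional[Callable[[str], str]] = None) -> str:
    # Unknown classes fall into the strictest tier rather than running freely.
    tier = TIER_BY_CLASS.get(tool_class, "irreversible/external")
    if tier == "read-only":
        return action()  # read-only tools run without a prompt
    if ask is None:
        # No human to ask: never auto-fire an external mutation.
        return "denied: no interactive approval available"
    answer = ask(f"{description}\nProceed? [y/N] ").strip().lower()
    if answer in ("y", "yes"):
        return action()
    return "denied"  # default N: anything but an explicit yes


print(run_tool("data", "list s3://acme-data/txns/", lambda: "listed"))  # listed
print(run_tool("cloud-ops", "deploy service fraud-api", lambda: "deployed",
               ask=lambda prompt: ""))  # denied
print(run_tool("cloud-ops", "deploy service fraud-api", lambda: "deployed",
               ask=lambda prompt: "y"))  # deployed
```

In an interactive session `ask` would be the built-in `input`. Injecting it keeps the answer in the user's hands: the model can supply the description, but not the reply.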

The same interactive approval gate that wraps deploy --apply, train --launch, and collab --send wraps a cloud-ops MCP call: it shows the operation and waits for an explicit confirmation before anything runs. A subagent can never reach one either — delegated personas only ever hold tier-safe, read-only verb sets, so provisioning and deploys always stay with you in the parent session.
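The delegation rule amounts to a filter on tool sets. In this hypothetical Python sketch, a subagent persona receives only the read-only tools from the parent's list, so a cloud-ops tool can never end up in its hands. The `Tool` record and the tool names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tool:
    name: str
    tier: str  # "read-only" or "irreversible/external"


def delegate_tools(parent_tools: List[Tool]) -> List[Tool]:
    # Allow-list, not deny-list: anything not explicitly read-only stays with the parent.
    return [t for t in parent_tools if t.tier == "read-only"]


parent = [
    Tool("s3.list_objects", "read-only"),
    Tool("warehouse.describe_schema", "read-only"),
    Tool("cloud.deploy_service", "irreversible/external"),
]
print([t.name for t in delegate_tools(parent)])
# ['s3.list_objects', 'warehouse.describe_schema']
```

Filtering on an explicit "read-only" tag means a newly added tool with a missing or unfamiliar tier is withheld from subagents by default.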

⚠

An MCP cloud-ops tool acts on real infrastructure outside Loom's datastore. Treat its approval prompt the way you'd treat deploy --apply: read what it will do, and confirm only when you mean it. Loom never self-approves an external mutation.

In plan mode

When you toggle plan mode (/plan), the agent drops to a read-only exploration phase. MCP fits cleanly into that: a data tool that only locates — listing a bucket, inspecting a warehouse schema — is fair game for figuring out what's there before you commit to a framing. But the boundary still holds: loom ingest is a write, and every cloud-ops tool is irreversible, so both are blocked until you toggle plan mode off and approve the plan. Plan mode is exactly when you'd use MCP to survey the landscape and then propose which sources to ingest.
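The plan-mode boundary can be sketched as a one-line policy in Python. `Action` and `blocked_by_plan_mode` are hypothetical names, and the sketch covers only the three cases this section names. Outside plan mode it blocks nothing, but the per-tier gates still apply.

```python
from enum import Enum


class Action(Enum):
    LOCATE = "locate"        # list a bucket, inspect a warehouse schema
    INGEST = "ingest"        # loom ingest: a write
    CLOUD_OPS = "cloud-ops"  # provision, deploy, mutate infra


def blocked_by_plan_mode(action: Action, plan_mode: bool) -> bool:
    """True if plan mode forbids this action; only locating is allowed."""
    return plan_mode and action is not Action.LOCATE


for action in Action:
    print(action.value, blocked_by_plan_mode(action, plan_mode=True))
# locate False
# ingest True
# cloud-ops True
```

The rule allows LOCATE explicitly and blocks everything else, so any action added later starts out blocked during plan mode until someone decides otherwise.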

Recap

MCP lets Loom reach data across vendors and clouds and drive cloud ops, via the bundled pi-mcp-adapter.
Configure servers in .mcp.json, or run /mcp setup in the agent; keep secrets in the environment.
data tools locate and fetch only — then loom ingest into a Metaflow data object and work the pathspec. Never run a verb against the raw store; never push bulk data through chat.
cloud-ops tools are gated like deploy --apply — the model proposes, you confirm.