Loom is in pilot intake · these docs cover the full data-science lifecycle CLI

Data access (MCP)

Your data rarely lives in one place. Loom reaches it across vendors and clouds (S3, GCS, a warehouse, Postgres, a plain filesystem) and drives cloud ops over the Model Context Protocol. The discipline is strict, and it is what keeps this safe: MCP data tools locate and fetch, then you run loom ingest to bring the result into a Metaflow data object and work on its pathspec. You never work on the raw store, and bulk data never passes through chat. MCP cloud-ops tools are gated, exactly like deploy --apply.

What MCP gives Loom

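The Model Context Protocol is an open standard for connecting an agent to external systems through small, declared tools. Loom uses it for two jobs, and the two are governed differently: data tools, which locate and fetch data and run freely, and cloud-ops tools, which act on your infrastructure and always sit behind an approval gate.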

MCP is powered by the bundled pi-mcp-adapter — a single lazy proxy tool installed on first launch, so the agent only pays the cost of an MCP server when you actually reach for one. You add servers; the adapter surfaces their tools to the agent.

MCP is how Loom reaches data and clouds it doesn't host. It does not change the data model: the one place outside data crosses into Loom is still loom ingest, and every lifecycle verb still runs on a Metaflow data object referenced by pathspec. MCP just widens where the bytes can come from.

Configure servers in .mcp.json

Declare the MCP servers you want in a .mcp.json at the root of your working directory. Each entry names a server, the command that launches it, and the environment it reads — and, as everywhere in Loom, secrets come from the environment, never inline:

{
  "mcpServers": {
    "s3": {
      "command": "uvx",
      "args": ["mcp-server-s3"],
      "env": { "AWS_REGION": "us-east-1" }
    },
    "warehouse": {
      "command": "uvx",
      "args": ["mcp-server-bigquery", "--project", "my-proj"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"]
    }
  }
}

JSON has no comment syntax, so the notes live here rather than in the file: the s3 server takes its credentials from your shell environment, and the postgres server reads its connection string from PGDATABASE and the other PG* variables, not from a flag.

Prefer not to hand-edit JSON? Run /mcp setup inside the agent and let it walk you through adding a server — it writes the same .mcp.json. Either way, the adapter discovers the declared servers and exposes their tools to the agent the next time you reach for data.

Never put a key, token, or connection secret on a command line or in a committed .mcp.json. Put the literal values an MCP server needs (region, project, host) in the file; put credentials in your environment and reference them by name. This is the same rule the rest of Loom follows — secrets are only ever read from the environment.
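One way to hold yourself to this rule is a small check you run before committing .mcp.json. The sketch below is not part of Loom: the function name find_inline_secrets, the name patterns, and the sample config are all assumptions made for illustration, and the heuristics will miss secrets that do not match them.

```python
import json
import re

# Assumed heuristic: env var names that usually carry credentials.
SECRET_NAME = re.compile(r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|ACCESS_KEY)", re.IGNORECASE)
# Assumed heuristic: a URL with user:password@ embedded in it.
URL_WITH_PASSWORD = re.compile(r"://[^/\s:@]+:[^/\s@]+@")


def find_inline_secrets(config: dict) -> list[str]:
    """Return findings for values that look like credentials written into the file."""
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        for key, value in server.get("env", {}).items():
            if SECRET_NAME.search(key) and value:
                findings.append(f"{name}: env {key} is set inline")
        for arg in server.get("args", []):
            if URL_WITH_PASSWORD.search(str(arg)):
                findings.append(f"{name}: an argument embeds a password in a URL")
    return findings


# Made-up config: "s3" follows the rule, "bad" breaks it twice.
sample = json.loads("""
{
  "mcpServers": {
    "s3": {"command": "uvx", "args": ["mcp-server-s3"],
           "env": {"AWS_REGION": "us-east-1"}},
    "bad": {"command": "uvx",
            "args": ["some-server", "postgres://bob:hunter2@db/app"],
            "env": {"AWS_SECRET_ACCESS_KEY": "abc123"}}
  }
}
""")

for finding in find_inline_secrets(sample):
    print(finding)
```

Non-secret literals such as AWS_REGION pass untouched, which matches the split described above: plain values in the file, credentials in the environment.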

The data discipline: locate → fetch → ingest

This is the rule that makes MCP safe to use inside a data-science loop. An MCP data tool's job ends the moment the data is on the box. You do not point a verb at a raw MCP store, and you do not stream bulk rows through the chat. Instead:

# 1. LOCATE + FETCH via MCP (ask the agent in plain English):
#    "use the s3 server to pull s3://acme-data/txns/2026-05/ into ./pulled"

# 2. INGEST — the one external→Metaflow boundary; prints the dataset_ref
loom ingest --source ./pulled --name acme-txns      # -> IngestDataset/123

# 3. WORK THE PATHSPEC — verbs run on the data object, never on S3
loom eda      --dataset IngestDataset/123 --target is_fraud
loom features --dataset IngestDataset/123 --target is_fraud

Once ingested, the data object is identical to any other — Loom never knows or cares that it came from S3 versus a warehouse versus a local CSV. The datastore behind the pathspec stays in your perimeter and opaque to Loom; the lifecycle composes on references, not bytes.

Two derivative rules follow from this. Prompt hygiene: your bulk data never enters the model's context — only small derived signal (a schema, a preview, metrics) does, and that is the LLM's only view of your data. Scope the fetch: pull the slice you need, not the whole bucket — a scoping query or a prefix beats a full dump every time.
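To make "small derived signal" concrete, here is a minimal sketch of reducing a fetched extract to a schema, a row count, and a short preview. It does not show how Loom builds its own summaries; summarize_csv and the sample rows are hypothetical, and only the Python standard library is used.

```python
import csv
import io


def summarize_csv(text: str, preview_rows: int = 3) -> dict:
    """Return column names, a row count, and a short preview; never the full table."""
    reader = csv.DictReader(io.StringIO(text))
    preview, total = [], 0
    for row in reader:
        total += 1
        if len(preview) < preview_rows:
            preview.append(row)
    return {"columns": reader.fieldnames or [], "rows": total, "preview": preview}


# Made-up sample data standing in for a fetched extract.
sample = "txn_id,amount,is_fraud\n1,9.50,0\n2,120.00,1\n3,4.25,0\n4,77.10,0\n"
summary = summarize_csv(sample, preview_rows=2)
print(summary["columns"], summary["rows"], len(summary["preview"]))
```

The summary stays the same size however large the extract grows, which is the point of the prompt-hygiene rule.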

Across vendors and clouds

Because the verb only ever sees a pathspec, the same lifecycle runs no matter where the source sits. A typical pilot mixes several at once — pull a base table from a warehouse, enrich from Postgres, drop a reference file from GCS — and ingests each as its own data object:

| Source | MCP data server reaches it | Then |
| --- | --- | --- |
| Amazon S3 | list / get objects by prefix | loom ingest --source ./pulled |
| Google Cloud Storage | list / get objects by prefix | loom ingest … |
| A warehouse (e.g. BigQuery) | run a scoping query, export the result | loom ingest … |
| Postgres | inspect schema, fetch a scoped extract | loom ingest … |
| A filesystem | read files/dirs by path | loom ingest … |

You can also stitch sources together before ingest — pull two extracts, join or stack them locally, ingest the result as one named dataset. Whatever lands becomes one content-addressed, versioned object you reference by pathspec for the rest of the run.
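Stacking two extracts locally can be as simple as the sketch below. It assumes both extracts are CSV text with identical headers; stack_csv, the source column, and the sample rows are hypothetical, and a real join across differing schemas would need more than this.

```python
import csv
import io


def stack_csv(extracts: dict[str, str], source_column: str = "source") -> str:
    """Stack CSV texts that share a header, tagging each row with its origin label."""
    out = io.StringIO()
    writer = None
    header = None
    for label, text in extracts.items():
        reader = csv.DictReader(io.StringIO(text))
        if reader.fieldnames is None:
            continue  # empty extract: nothing to add
        if header is None:
            header = list(reader.fieldnames)
            writer = csv.DictWriter(
                out, fieldnames=header + [source_column], lineterminator="\n"
            )
            writer.writeheader()
        elif list(reader.fieldnames) != header:
            raise ValueError(f"{label}: columns differ from the first extract")
        for row in reader:
            writer.writerow({**row, source_column: label})
    return out.getvalue()


# Made-up extracts standing in for a warehouse pull and a Postgres pull.
warehouse = "txn_id,amount\n1,9.50\n2,120.00\n"
postgres = "txn_id,amount\n3,4.25\n"
combined = stack_csv({"warehouse": warehouse, "postgres": postgres})
print(combined)
```

Writing the combined text to a file under your pull directory and pointing loom ingest --source at that directory then yields a single named dataset. The extra source column is a design choice of this sketch, kept so that each row's origin survives the stack.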

Cloud ops are gated

The second class of MCP tool drives your cloud directly — provision compute, deploy a service, mutate a registry. These are irreversible/external, so Loom governs them with the same posture as its own most dangerous verb, deploy --apply: the model can propose the action, but only you can fire it.

| Tier | MCP class · examples | Gate |
| --- | --- | --- |
| read-only | data: locate, browse a schema, scoped fetch | runs freely; output goes through ingest |
| irreversible/external | cloud-ops: provision, deploy, mutate infra | always gates: deny-first y/N, never model-auto-fired |

The same interactive approval gate that wraps deploy --apply, train --launch, and collab --send wraps a cloud-ops MCP call: it shows the operation and waits for an explicit confirmation before anything runs. A subagent can never reach one either — delegated personas only ever hold tier-safe, read-only verb sets, so provisioning and deploys always stay with you in the parent session.

An MCP cloud-ops tool acts on real infrastructure outside Loom's datastore. Treat its approval prompt the way you'd treat deploy --apply: read what it will do, and confirm only when you mean it. Loom never self-approves an external mutation.
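The "deny-first y/N" behaviour can be pictured with a few lines of Python. This is an illustration of the pattern only, not Loom's gate: the approve function, its prompt text, and the sample operation are assumptions.

```python
from typing import Callable


def approve(operation: str, ask: Callable[[str], str] = input) -> bool:
    """Show the operation and return True only on an explicit 'y' or 'yes'."""
    answer = ask(f"About to run: {operation}\nProceed? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


# Simulated answers so the sketch runs without a terminal.
print(approve("provision 2 GPU nodes", ask=lambda _: "y"))   # explicit yes
print(approve("provision 2 GPU nodes", ask=lambda _: ""))    # bare Enter: denied
print(approve("provision 2 GPU nodes", ask=lambda _: "ok"))  # anything else: denied
```

Deny-first means the default is refusal: an empty or ambiguous answer never fires the operation, and the caller has to supply the answer, so nothing in this function can approve on its own.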

In plan mode

When you toggle plan mode (/plan), the agent drops to a read-only exploration phase. MCP fits cleanly into that: a data tool that only locates — listing a bucket, inspecting a warehouse schema — is fair game for figuring out what's there before you commit to a framing. But the boundary still holds: loom ingest is a write, and every cloud-ops tool is irreversible, so both are blocked until you toggle plan mode off and approve the plan. Plan mode is exactly when you'd use MCP to survey the landscape and then propose which sources to ingest.
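The rules in this section reduce to a small decision table. The sketch below is one reading of the text, not Loom's policy engine: the Action enum and the decision function are hypothetical, and fetches that write to disk are left out because the text only names locate-style tools as allowed in plan mode.

```python
from enum import Enum


class Action(Enum):
    LOCATE = "locate"        # list a bucket, inspect a warehouse schema
    INGEST = "ingest"        # loom ingest writes a data object
    CLOUD_OPS = "cloud_ops"  # provision, deploy, mutate infra


def decision(action: Action, plan_mode: bool) -> str:
    """Return 'allow', 'block', or 'gate' for an action under the stated rules."""
    if plan_mode:
        # Plan mode is read-only: only locating is permitted.
        return "allow" if action is Action.LOCATE else "block"
    if action is Action.CLOUD_OPS:
        return "gate"  # needs explicit y/N approval every time
    return "allow"


for mode in (True, False):
    for action in Action:
        print(f"plan_mode={mode} {action.value}: {decision(action, mode)}")
```

Note that leaving plan mode never turns a cloud-ops call into a free action; it only moves it from blocked to gated.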

Recap