
MCP Is Wasting Your Context Window (And How to Fix It)


If you've used Claude Code, Cursor, or any AI coding tool with MCP servers plugged in, you've probably noticed something: the more tools you connect, the dumber your AI gets. Not because the model changed — because you're drowning it in tool definitions it doesn't need yet.

The Problem in Plain English

Think of an AI's context window like a desk. Everything the AI is thinking about — your conversation, your code, its instructions — has to fit on that desk. MCP, short for Model Context Protocol, is the standard that lets AI tools talk to external servers, and it works like this: the moment you connect a server, it dumps the entire manual for every tool it offers onto that desk. All at once. Whether you need them or not.

One server with six tools? Fine, that's a pamphlet. But connect five servers with fifty tools between them and you've just dropped a phone book on the desk. We're talking 30,000 to 60,000 tokens — up to 30% of the AI's working memory — consumed before you even ask a question.

It gets worse. Those tool definitions are full of repeated boilerplate. The owner parameter shows up in 60% of GitHub tools. The repo parameter in 65%. Same definition, copied and pasted dozens of times. It's like printing the same glossary in every chapter of a book.

And when a tool does run? It sends back everything. You asked for a file's status? Here's the full metadata, permissions, timestamps, and the kitchen sink. No way to say "just give me the short version."

The result: your AI starts strong, but as the conversation grows and context fills up, it loses track of what matters. Not because the model is bad, but because the protocol is wasteful.

What "Fixed" Looks Like

Imagine a different approach. Call it MCP Zero — zero tokens until you actually need them.

Instead of dumping the manual, hand over a table of contents. When a server connects, it sends a tiny manifest: just the tool name and a one-line description. Ten tokens per tool instead of five hundred. That fifty-tool server now costs 500 tokens instead of 25,000.
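To make the numbers concrete, here's a minimal sketch of the manifest idea. The token costs and tool entries are illustrative, and none of these field names come from the MCP spec:

```python
# Hypothetical sketch: a "table of contents" manifest vs. today's full dump.
# Token costs are rough illustrations, not measured values from the MCP spec.

FULL_SCHEMA_TOKENS = 500   # approximate cost of one full tool definition
MANIFEST_TOKENS = 10       # tool name + one-line description

manifest = [
    {"name": "create_issue", "summary": "Open a new issue in a repo"},
    {"name": "get_file",     "summary": "Fetch a file's contents at a ref"},
    # ...48 more entries for a fifty-tool server
]

tool_count = 50
print(f"today:    {tool_count * FULL_SCHEMA_TOKENS:,} tokens up front")   # 25,000
print(f"manifest: {tool_count * MANIFEST_TOKENS:,} tokens up front")      # 500
```

Same fifty tools, two orders of magnitude less desk space before the conversation even starts.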

Look things up when you need them, not before. When the AI decides it wants to use a tool, then it asks for the full definition. One tool at a time. Like checking a reference book instead of memorizing it cover to cover.
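A lazy-loading client might look something like this sketch. `LazyToolClient`, `list_manifest`, and `get_schema` are all hypothetical names, not part of any MCP SDK:

```python
# Hypothetical lazy-loading client: full definitions are fetched only when
# the model actually decides to use a tool, then cached for reuse.

class LazyToolClient:
    def __init__(self, server):
        self.server = server
        self.manifest = server.list_manifest()   # tiny: names + one-liners
        self._schemas = {}                       # cache of full definitions

    def schema(self, name):
        # Fetch one tool's full definition on first use, then cache it.
        if name not in self._schemas:
            self._schemas[name] = self.server.get_schema(name)
        return self._schemas[name]


class FakeServer:
    """Stand-in server so the sketch runs end to end."""
    def list_manifest(self):
        return [{"name": "get_file", "summary": "Fetch a file at a ref"}]

    def get_schema(self, name):
        return {"name": name, "params": {"path": "string", "ref": "string"}}


client = LazyToolClient(FakeServer())
print(len(client._schemas))   # 0 — nothing loaded at connect time
client.schema("get_file")
print(len(client._schemas))   # 1 — loaded on demand, cached after
```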

Give tools a short, medium, and long version. Sometimes the AI just needs to know a tool exists (manifest). Sometimes it needs the parameter names (summary). Sometimes it needs the full JSON Schema with descriptions and constraints (full). Let the client pick based on how much desk space is left.
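One way to picture the three tiers, as a sketch over a single tool. The tier names and the `describe` helper are assumptions for illustration; only the JSON Schema inside `"schema"` is standard:

```python
# Sketch: three detail levels for one tool definition (tier names hypothetical).

def describe(tool, level):
    if level == "manifest":   # shortest: the tool exists
        return {"name": tool["name"], "summary": tool["summary"]}
    if level == "summary":    # medium: parameter names only
        return {"name": tool["name"],
                "params": list(tool["schema"]["properties"])}
    return tool["schema"]     # full: complete JSON Schema with constraints

tool = {
    "name": "get_file",
    "summary": "Fetch a file's contents at a ref",
    "schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Repo-relative file path"},
            "ref":  {"type": "string", "description": "Branch, tag, or SHA"},
        },
        "required": ["path"],
    },
}

print(describe(tool, "manifest"))
print(describe(tool, "summary"))
```

The client asks for the cheapest tier that answers its current question, and only pays for "full" when it's about to construct a call.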

Stop repeating yourself. Common parameter types — file paths, git refs, pagination options — get defined once in a shared registry. Tools reference them by name instead of inlining the same definition over and over.
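A sketch of what that shared registry could look like. `$ref` is standard JSON Schema; the registry layout and the `resolve` helper are assumptions:

```python
# Sketch: common parameter shapes defined once in a shared registry,
# referenced by name instead of inlined into every tool schema.

registry = {
    "FilePath": {"type": "string", "description": "Repo-relative file path"},
    "GitRef":   {"type": "string", "description": "Branch, tag, or commit SHA"},
}

# A tool schema points at shared types rather than repeating them.
get_file_params = {
    "path": {"$ref": "#/registry/FilePath"},
    "ref":  {"$ref": "#/registry/GitRef"},
}

def resolve(param):
    # Expand a $ref against the shared registry; pass plain params through.
    ref = param.get("$ref", "")
    return registry[ref.rsplit("/", 1)[-1]] if ref else param

print(resolve(get_file_params["path"])["description"])
```

Define `FilePath` once, and sixty tools that take a file path each pay a few tokens for the reference instead of hundreds for the inlined definition.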

Let the AI say "I'm running low." At connection time, the client tells the server its context budget. The server adapts: shorter descriptions, summarized results, paginated responses. Context becomes a resource both sides can reason about, not a silent limit that causes mysterious degradation.
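The negotiation could be as simple as this sketch. A context-budget field is a hypothetical extension, not something MCP supports today, and the token costs are illustrative:

```python
# Sketch: client declares a context budget at connect time; the server
# picks the cheapest response mode that fits. All numbers illustrative.

def negotiate(tool_names, context_budget_tokens):
    per_full, per_manifest = 500, 10   # rough per-tool costs
    if len(tool_names) * per_full <= context_budget_tokens:
        return "full"       # plenty of room: send everything, like today
    if len(tool_names) * per_manifest <= context_budget_tokens:
        return "manifest"   # tight: send the table of contents only
    return "paginated"      # very tight: page even the manifest

print(negotiate(["t"] * 6, 50_000))   # small server, big budget
print(negotiate(["t"] * 50, 5_000))   # big server, small budget
```

The same server behaves differently for a fresh conversation than for one that's already 80% full — which is exactly the adaptivity the protocol lacks now.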

Summarize first, detail on request. Tool results come back in layers. Here's the summary. Want more? Ask. No more 50KB JSON payloads when you only needed three fields.
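Layered results might work like this sketch: the full payload stays server-side behind a handle, and only the summary enters the context. The handle and `fetch_detail` mechanism are hypothetical:

```python
# Sketch: tool results return a summary plus a handle; full data stays
# server-side and is fetched field-by-field only on request.

results_store = {}

def tool_result(full_payload, summary):
    handle = f"res-{len(results_store)}"
    results_store[handle] = full_payload           # kept out of the context
    return {"summary": summary, "handle": handle}  # only this hits the desk

def fetch_detail(handle, fields):
    # Pull just the requested fields, not the whole payload.
    full = results_store[handle]
    return {f: full[f] for f in fields}

r = tool_result(
    {"path": "src/app.py", "size": 1204, "mode": "0644",
     "mtime": "2025-01-01T00:00:00Z"},
    "src/app.py: 1.2 KB Python file",
)
print(r["summary"])
print(fetch_detail(r["handle"], ["size"]))
```

If the summary answers the question, the other fields never cost a token.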

The Difference

Here's MCP today:

  1. Connect to server
  2. Receive ALL tool schemas (500 tokens each)
  3. AI reads through 30K tokens of definitions
  4. AI calls one tool, gets full-verbosity result
  5. Context fills up, conversation degrades

Here's what it could be:

  1. Connect to server, share context budget
  2. Receive manifest (10 tokens per tool)
  3. AI searches manifest, requests schemas for 2-3 relevant tools
  4. AI calls tool, gets summary result
  5. Context stays lean, conversation stays sharp

Why This Matters Now

MCP is still early. The ecosystem is growing fast — more servers, more tools, more ambitious agent workflows. The protocol works fine at demo scale. But anyone building serious multi-server setups is already hitting the wall.

Claude Code has started working around this with a tool search feature that kicks in when definitions exceed 10K tokens. That's a smart band-aid. But the protocol itself should be designed for efficiency from the ground up, not patched after the fact.

The tools should cost nothing until you need them. That's the whole idea.