Dewei Zhai

2026-05-13

CLI, not MCP

Swapped a Python Outlook MCP for a 250-line Rust CLI. The email body the LLM sees dropped from 6,200 to ~500 bytes, and 700 tokens of tool schemas left the context.

The setup

I have a small agent that triages my Outlook inbox. I started with the obvious thing: an off-the-shelf Outlook MCP server (Python, 4 tools). It worked, but every turn felt expensive, and latency was higher than a simple action like “summarize this thread” should have needed.

Where the tokens went

Looking at what actually got sent on each turn:

  1. The tool schemas sit in context permanently. All 4 MCP tools, every turn, whether the agent touched email or not. ~700 tokens of overhead before the user said anything.
  2. The email body was raw HTML. The MCP returned what Outlook sent — 6,200 bytes for an average email, most of it tracking pixels, inlined CSS, sig images. The LLM didn’t need any of it.

The swap

I replaced the MCP with a Rust CLI: same 4 actions (list, read, draft, send), shelled out from the agent like any other command.

  • HTML → plaintext, attachments and inline images dropped: 6,200 → 2,386 bytes per body.
  • Trimmed fields (from, subject, plaintext body) — what the LLM actually reads is now ~500 bytes.
  • Because it’s a CLI, the tool schemas aren’t camped in the input window. The agent sees a shell it already knows.
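To make the first two bullets concrete, here is a minimal sketch of the idea, not the code from zhaidewei/molk. The names `TrimmedEmail`, `html_to_text`, and `render` are hypothetical, and the tag stripper is deliberately naive: it uses only the standard library, does not decode entities such as `&nbsp;`, and would trip on a `>` inside an attribute value. A real tool would more likely lean on a proper HTML parser.

```rust
/// Hypothetical record holding only the fields the post says the LLM reads.
struct TrimmedEmail {
    from: String,
    subject: String,
    body: String,
}

/// Naive HTML-to-plaintext pass: drops every tag (including <img>),
/// skips the contents of <style> and <script>, and collapses whitespace.
fn html_to_text(html: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to `html`,
    // so indices found in `lower` are valid in `html` too.
    let lower = html.to_ascii_lowercase();
    let mut out = String::new();
    let mut i = 0;
    while i < html.len() {
        if html.as_bytes()[i] == b'<' {
            let rest = &lower[i..];
            // For style/script, skip through the matching close tag;
            // for anything else, skip to the end of this tag.
            let end = if rest.starts_with("<style") {
                rest.find("</style>").map(|p| p + "</style>".len())
            } else if rest.starts_with("<script") {
                rest.find("</script>").map(|p| p + "</script>".len())
            } else {
                rest.find('>').map(|p| p + 1)
            };
            match end {
                Some(n) => i += n,
                None => break, // unterminated tag: drop the remainder
            }
            out.push(' '); // a tag boundary acts as a word break
        } else {
            // Copy text up to the next tag (or the end of input).
            let next = html[i..].find('<').map(|p| i + p).unwrap_or(html.len());
            out.push_str(&html[i..next]);
            i = next;
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Render the trimmed record as the compact text handed to the model.
fn render(email: &TrimmedEmail) -> String {
    format!(
        "From: {}\nSubject: {}\n\n{}",
        email.from, email.subject, email.body
    )
}
```

Calling `html_to_text` on the raw body, placing the result in a `TrimmedEmail`, and printing `render(&email)` to stdout gives the agent a few short lines instead of the full markup.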

Numbers

Metric                    MCP          CLI
Avg email body to LLM     6,200 B      ~500 B
Tool schemas in context   ~700 tok     0
Install                   Python env   3.7 MB binary
LoC                       -            ~250 lines
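For a sense of scale, the byte counts quoted in this post can be turned into percentage reductions. The helper below is a hypothetical sketch (`reduction_pct` is my name for it), and since the ~500 B figure is approximate, the results are too.

```rust
/// Percentage reduction when a size goes from `before` to `after`.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}
```

With the numbers above, `reduction_pct(6200.0, 2386.0)` comes to roughly 61.5% for the HTML-to-plaintext step, `reduction_pct(2386.0, 500.0)` to roughly 79% for the field trimming, and `reduction_pct(6200.0, 500.0)` to roughly 92% overall.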

Code is open: zhaidewei/molk.

What I take away

Pre-built agent tools are shaped for “show everything.” That’s the right default when you don’t know who’s calling, but it’s the wrong default when you are calling and you know exactly what your model needs to see. The cheap move was treating the tool as something I should also own — not just something I install.

