Story

I Was Burning $300/Month on Claude Code Tokens — So I Built My Own Plugin

June 10, 2026 · 8 min read

Let me be direct with you: I love Claude Code. I use it every single day. It has genuinely changed how I write software — I move faster, I think at a higher level, and I ship with more confidence. But somewhere around the third month of using it seriously, I opened my Anthropic invoice and saw a number that stopped me cold: $312.47 for a single month.

I am not a company. I am a developer working on a SaaS product. Three hundred dollars a month on an AI coding assistant is not a catastrophe, but it is real money — money that made me sit down and actually think about what I was paying for.

The Moment I Realised Something Was Wrong

I started logging my Claude Code sessions in detail. Not to complain — I genuinely wanted to understand where the tokens were going. What I found was surprising. A typical feature implementation session looked something like this:

Tool: Read(src/lib/auth.ts)         → 4,200 input tokens
Tool: Search("getUserById")         → 2,100 input tokens
Tool: Read(src/lib/auth.ts)         → 4,200 input tokens   ← again!
Tool: Edit(src/lib/auth.ts)         → 4,200 input tokens   ← and again!
Tool: Read(src/lib/auth.ts)         → 4,200 input tokens   ← still reading
...
Total: ~28,000 tokens for one small change

Claude was reading the same file three or four times per task. Not because it was doing something wrong — it genuinely needed to maintain context across tool calls. But that pattern, multiplied across dozens of tasks per day, was absolutely obliterating my token budget.
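To see how much of a session like this is repetition, the log can be tallied mechanically. The sketch below is illustrative only. The `ToolCall` record shape and the `tallyRedundantTokens` helper are hypothetical, and any call that touches a file an earlier call already touched is counted as redundant.

```typescript
// Hypothetical shape of one logged tool call; field names are assumptions.
interface ToolCall {
  tool: "Read" | "Search" | "Edit";
  path?: string; // the file the call operated on, if any
  inputTokens: number;
}

// Sums all input tokens, plus the subset spent on calls that touched a file
// an earlier call in the same session had already touched.
function tallyRedundantTokens(calls: ToolCall[]): { total: number; redundant: number } {
  const seen = new Set<string>();
  let total = 0;
  let redundant = 0;
  for (const call of calls) {
    total += call.inputTokens;
    if (call.path === undefined) continue; // e.g. a Search with no single target file
    if (seen.has(call.path)) {
      redundant += call.inputTokens;
    } else {
      seen.add(call.path);
    }
  }
  return { total, redundant };
}

// The five calls shown in the log above; the elided "..." calls are not included.
const session: ToolCall[] = [
  { tool: "Read", path: "src/lib/auth.ts", inputTokens: 4200 },
  { tool: "Search", inputTokens: 2100 },
  { tool: "Read", path: "src/lib/auth.ts", inputTokens: 4200 },
  { tool: "Edit", path: "src/lib/auth.ts", inputTokens: 4200 },
  { tool: "Read", path: "src/lib/auth.ts", inputTokens: 4200 },
];

const { total, redundant } = tallyRedundantTokens(session);
console.log(`${redundant} of ${total} tokens went to a file already in context`);
// prints "12600 of 18900 tokens went to a file already in context"
```

Even on just the visible part of the log, two thirds of the input tokens go to a file that is already in context.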

I also noticed something else: every time Claude called a Search tool, it would receive a long list of matching lines with surrounding context. Then, seconds later, it would call Read on the same file anyway because it needed the broader structure. The search results were essentially wasted tokens — a costly appetiser before the real meal.

What I Built

I spent a weekend building a Claude Code plugin. The idea was simple: intercept tool calls at the MCP layer, observe what Claude was doing, and make intelligent decisions about what to actually send back.

The first version had three capabilities:

  1. Deduplication: If Claude had already read a file in the current session and the file had not changed on disk, return a cached summary instead of the full content. The cache key was the file path plus its mtime.
  2. Smarter search: Instead of returning raw grep matches, use a tree-sitter AST to understand what Claude was actually looking for. If it searched for a function name, return the function definition and its immediate callers — not fifty lines of surrounding context.
  3. Batch edits: Accumulate multiple Edit calls for the same file and apply them in a single round-trip, sending the result back once rather than re-reading the file after each individual change.
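The deduplication rule from the first item can be sketched in TypeScript. This is a minimal illustration, not CodeCondense's source. `readWithDedup` is a hypothetical helper that keys the cache on path plus mtime as described above. On a cache hit it returns a one-line placeholder, because the post does not explain how the real summaries are generated.

```typescript
import { mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Cache keys seen in this session: file path plus last-modified time.
// A changed file gets a new mtime, so it misses the cache and is re-read.
const seenThisSession = new Set<string>();

// Hypothetical helper: full content on the first read (or after a change),
// a short note instead of the content when nothing has changed on disk.
function readWithDedup(path: string): string {
  const key = `${path}@${statSync(path).mtimeMs}`;
  if (seenThisSession.has(key)) {
    // Placeholder "summary"; a real tool would return something more useful.
    return `[unchanged since last read: ${path}]`;
  }
  seenThisSession.add(key);
  return readFileSync(path, "utf8");
}

// Usage: the second read of an unchanged file returns the short note.
const file = join(mkdtempSync(join(tmpdir(), "dedup-")), "auth.ts");
writeFileSync(file, "export const x = 1;\n");
const first = readWithDedup(file); // full file content
const second = readWithDedup(file); // "[unchanged since last read: ...]"
```

Keying on mtime keeps the check cheap: one `stat` call and no file read. However, timestamps have limited resolution on some filesystems, so two writes in quick succession can share an mtime. Hashing the file contents is a safer, if slower, alternative.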
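Batching, the third item, can be sketched the same way. The `PendingEdit` shape (an exact `oldText` to replace with `newText`), `queueEdit`, and `flushEdits` are hypothetical names chosen for illustration. The point is that queued edits are applied to one in-memory copy of the file, which is then written and returned once.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical edit shape: replace one exact snippet with another.
interface PendingEdit {
  oldText: string;
  newText: string;
}

// Queued edits per file path, waiting to be applied together.
const pending = new Map<string, PendingEdit[]>();

function queueEdit(path: string, edit: PendingEdit): void {
  const list = pending.get(path) ?? [];
  list.push(edit);
  pending.set(path, list);
}

// Applies every queued edit for `path` in order, in memory, then writes the
// file once and returns the final text once, instead of a re-read per edit.
function flushEdits(path: string): string {
  let text = readFileSync(path, "utf8");
  for (const { oldText, newText } of pending.get(path) ?? []) {
    // Require a unique match so an edit cannot silently land in the wrong place.
    const at = text.indexOf(oldText);
    if (at === -1 || text.indexOf(oldText, at + 1) !== -1) {
      throw new Error(`edit target not found exactly once in ${path}: ${oldText}`);
    }
    text = text.slice(0, at) + newText + text.slice(at + oldText.length);
  }
  pending.delete(path);
  writeFileSync(path, text);
  return text;
}

// Usage: two edits to the same file, one write, one result.
const file = join(mkdtempSync(join(tmpdir(), "batch-")), "auth.ts");
writeFileSync(file, "const a = 1;\nconst b = 2;\n");
queueEdit(file, { oldText: "a = 1", newText: "a = 10" });
queueEdit(file, { oldText: "b = 2", newText: "b = 20" });
const result = flushEdits(file);
```

This sketch requires each `oldText` to match exactly once, so an ambiguous edit fails loudly. Because the file is written only after every edit succeeds, a failed batch leaves the file untouched.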

The results after the first week were encouraging but not earth-shattering. My token usage dropped by about 28%. Good — but I had a feeling there was more to uncover.

The Iteration That Changed Everything

The real breakthrough came when I stopped thinking about individual tool calls and started thinking about intent. When Claude calls Read(auth.ts) followed by Search("getUserById"), it is not doing two separate things — it is exploring a single concept across a codebase. I could satisfy both calls with a single response if I understood what Claude was trying to accomplish.

I added an Investigate operation. Instead of waiting for Claude to make five incremental calls, Investigate accepts a query — a symbol name, a file path, a concept — and returns a synthesised response that includes:

  • The AST-level symbol definition (where it is declared, its type signature)
  • The top five usage sites across the codebase, ranked by relevance
  • The files most likely to need modification, scored by match density
  • Inline context for each location — but only what is actually needed
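The file-scoring bullet can be illustrated with a deliberately simple stand-in. The sketch below counts whole-word regex matches instead of walking a tree-sitter AST, and it defines match density as matches per line. The formula and the `rankByMatchDensity` name are both assumptions, not a description of how Investigate actually scores files.

```typescript
// Hypothetical: one candidate file and how densely it mentions the query.
interface FileScore {
  path: string;
  matches: number;
  density: number; // matches per line; the scoring formula is an assumption
}

// Escapes regex metacharacters so the query is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Ranks files by how often `symbol` appears as a whole word relative to
// file length, keeping the top `limit` files that mention it at all.
function rankByMatchDensity(
  files: Record<string, string>,
  symbol: string,
  limit = 5,
): FileScore[] {
  const pattern = new RegExp(`\\b${escapeRegExp(symbol)}\\b`, "g");
  const scores: FileScore[] = [];
  for (const [path, source] of Object.entries(files)) {
    const matches = source.match(pattern)?.length ?? 0;
    if (matches === 0) continue;
    const lines = source.split("\n").length;
    scores.push({ path, matches, density: matches / lines });
  }
  return scores.sort((a, b) => b.density - a.density).slice(0, limit);
}

// Usage with a tiny in-memory "codebase".
const ranked = rankByMatchDensity(
  {
    "src/lib/auth.ts": "export function getUserById(id: string) {\n  return db.get(id);\n}",
    "src/routes/user.ts": "import { getUserById } from '../lib/auth';\nconst u = getUserById(req.id);",
    "src/lib/db.ts": "export const db = new Map();",
  },
  "getUserById",
);
```

Here `src/routes/user.ts` ranks first (two matches in two lines), `src/lib/auth.ts` ranks second, and `src/lib/db.ts` is dropped. A real implementation would also need to ignore matches inside comments and strings, which is where an AST earns its keep over a regex.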

What used to cost 28,000 tokens now costs around 7,000. Same information. Same quality of Claude's response. 75% less money.

The Numbers After Two Months

My invoice went from $312 to $87 over two months — a 72% reduction — while my usage actually increased. I was running more sessions, writing more code, and spending less.

But beyond the cost, something subtler changed. Because each tool call now returned denser, more precisely targeted information, Claude made fewer mistakes. It had less noise to reason through. Tasks that used to require correction loops, where Claude would pick the wrong function or edit the wrong file, more often completed correctly on the first attempt.

Token efficiency and quality of output turned out to be the same problem.

Why I Am Sharing This

I packaged this as CodeCondense and released it because I suspect I am not alone. If you are using Claude Code seriously, running it on real projects multiple times a day, there is a very good chance you are paying for tokens that are not helping you. Repeated reads, bloated search results, edits applied one at a time: these patterns are not bugs in Claude; they are natural consequences of how MCP tool calls work by default.

CodeCondense sits between Claude Code and your filesystem and makes those patterns more efficient. It does not change how Claude thinks. It just ensures that what Claude receives is as compact and information-dense as possible.

Install it, run it for a week, check your Anthropic usage dashboard. I am confident you will see a difference.

“I was amazed — not by how much I was spending, but by how much of that spending was doing absolutely nothing useful.”

Try CodeCondense

Free plan. No credit card. Saves money from minute one.

npm install -g codecondense