Benchmark Tests: How Much Does CodeCondense Actually Save?

June 13, 2026 · 11 min read

Claims of “40–70% token reduction” are easy to make and hard to verify. So we did the work: we selected six representative codebases, defined a standardised set of Claude Code tasks, and measured exactly what CodeCondense does and does not improve.

This post documents that process in full. No cherry-picked numbers. No ideal conditions. If you are evaluating CodeCondense for your own workflow, this is the data you actually need.

Test Setup

Each codebase was tested against the same set of ten canonical tasks:

  1. Add a new API endpoint with request validation
  2. Refactor a utility function and update all call sites
  3. Fix a bug described in a GitHub issue (simulated with a comment)
  4. Write unit tests for an existing module
  5. Add TypeScript types to an untyped JavaScript file
  6. Review a pull request diff and suggest improvements
  7. Extract a reusable component from duplicated code
  8. Update a dependency and migrate breaking API changes
  9. Add logging and error handling to an existing function
  10. Explain an unfamiliar module to a new team member

Each task was run three times (fresh session, warm cache, and cold cache after file modifications), and the three results were averaged. Token counts were taken from the usage object in each Anthropic API response, not estimated.
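For readers who want to reproduce the averaging step, here is a minimal sketch of how the per-task figure could be computed. It assumes each API response's usage object has been saved as a dict with `input_tokens` and `output_tokens` fields. The condition labels and function names are hypothetical, and any cache-related usage fields are ignored. It is not the benchmark harness itself.

```python
from statistics import mean

# Hypothetical labels for the three run conditions described above.
RUN_CONDITIONS = ("fresh_session", "warm_cache", "cold_cache_after_edit")


def total_tokens(usage: dict) -> int:
    """Sum input and output tokens from one API response's usage object."""
    return usage["input_tokens"] + usage["output_tokens"]


def task_run_tokens(usages: list[dict]) -> int:
    """Total tokens for one task run, which may span many API calls."""
    return sum(total_tokens(u) for u in usages)


def average_over_conditions(runs: dict[str, list[dict]]) -> float:
    """Average the per-run totals across the three run conditions."""
    missing = [c for c in RUN_CONDITIONS if c not in runs]
    if missing:
        raise ValueError(f"missing run conditions: {missing}")
    return mean(task_run_tokens(runs[c]) for c in RUN_CONDITIONS)
```

Requiring all three conditions up front means a task with a failed or missing run raises an error instead of silently producing an average over fewer runs.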

The Codebases

| Project | Lines of Code | Language | Description |
| --- | --- | --- | --- |
| cli-tool | ~4,200 | TypeScript | Small command-line utility |
| saas-api | ~18,000 | TypeScript / Node | REST API with auth and billing |
| react-dashboard | ~31,000 | TypeScript / React | Admin dashboard, many components |
| monorepo-platform | ~89,000 | TypeScript | Multi-package monorepo |
| python-ml-pipeline | ~12,000 | Python | Data pipeline with Pandas/NumPy |
| go-microservice | ~7,500 | Go | gRPC service with PostgreSQL |

Results: Token Reduction by Project

| Project | Baseline tokens | With CodeCondense | Reduction | Cost saving* |
| --- | --- | --- | --- | --- |
| cli-tool | 41,200 | 14,800 | 64% | $0.78/day |
| saas-api | 128,400 | 44,900 | 65% | $2.51/day |
| react-dashboard | 197,300 | 78,100 | 60% | $3.58/day |
| monorepo-platform | 412,800 | 142,600 | 65% | $8.21/day |
| python-ml-pipeline | 89,700 | 51,200 | 43% | $1.14/day |
| go-microservice | 55,100 | 24,800 | 55% | $0.91/day |

* Based on claude-opus-4-8 pricing at time of writing. Assumes 10 active sessions per day.
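To check the arithmetic yourself, the sketch below recomputes the Reduction column from the two token columns and shows one way a daily saving could be derived. Two things here are assumptions, not facts from the benchmark: that the token totals are per session, and that cost can be modelled with a single blended price per million tokens. The `usd_per_million_tokens` parameter is a placeholder to fill in with your own rate, so the dollar output need not match the table.

```python
# (baseline tokens, tokens with CodeCondense), copied from the results table.
RESULTS = {
    "cli-tool": (41_200, 14_800),
    "saas-api": (128_400, 44_900),
    "react-dashboard": (197_300, 78_100),
    "monorepo-platform": (412_800, 142_600),
    "python-ml-pipeline": (89_700, 51_200),
    "go-microservice": (55_100, 24_800),
}


def reduction_pct(baseline: int, condensed: int) -> int:
    """Percentage of baseline tokens saved, rounded to a whole percent."""
    return round(100 * (baseline - condensed) / baseline)


def daily_saving_usd(
    baseline: int,
    condensed: int,
    usd_per_million_tokens: float,  # assumption: one blended rate, supplied by you
    sessions_per_day: int = 10,     # matches the footnote's 10 sessions per day
) -> float:
    """Estimated daily saving, assuming the token totals are per session."""
    saved_per_day = (baseline - condensed) * sessions_per_day
    return saved_per_day * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    for project, (baseline, condensed) in RESULTS.items():
        print(f"{project}: {reduction_pct(baseline, condensed)}% reduction")
```

Running the script prints one reduction figure per project, which can be compared line by line with the Reduction column.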

Where CodeCondense Helps Most

The biggest wins came from two specific patterns:

1. Repeated File Reads

In the react-dashboard project, the average task involved reading the same file 3.2 times. This is entirely expected behaviour — Claude re-reads a file to confirm that its previous edit was applied correctly, or to refresh its view before making a second change. CodeCondense’s caching layer (keyed on file path + mtime) eliminated 94% of these redundant reads.

# Typical task without CodeCondense
Read(Button.tsx)        → 3,100 tokens
Edit(Button.tsx)        → 3,100 tokens (re-read internally)
Read(Button.tsx)        → 3,100 tokens (confirmation)
Read(Button.tsx)        → 3,100 tokens (next change)
Total: 12,400 tokens for one component edit

# With CodeCondense
Investigate(Button.tsx) → 1,200 tokens (AST summary + symbols)
BatchEdit(Button.tsx)   → 900 tokens  (all changes at once)
Total: 2,100 tokens — an 83% reduction
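The idea of a cache keyed on file path plus mtime can be illustrated in a few lines. This is a simplified sketch, not CodeCondense's actual caching layer: the class name, the hit and miss counters, and the eviction of stale entries are all assumptions made for the example.

```python
import os


class FileReadCache:
    """Cache file contents keyed on (absolute path, modification time)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, int], str] = {}
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> str:
        abs_path = os.path.abspath(path)
        # Nanosecond mtime: any write that updates mtime yields a new key.
        key = (abs_path, os.stat(abs_path).st_mtime_ns)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]

        self.misses += 1
        with open(abs_path, encoding="utf-8") as f:
            content = f.read()
        # Drop older versions of this file so the cache does not grow per edit.
        self._entries = {
            k: v for k, v in self._entries.items() if k[0] != abs_path
        }
        self._entries[key] = content
        return content
```

Because the key includes the mtime, an edit invalidates the entry without any explicit bookkeeping: the next read sees a new key, misses, and reloads the file. Repeated reads of an unchanged file are served from memory.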

2. Search-Then-Read Patterns

When Claude searches for a symbol and then reads the file containing it, CodeCondense collapses those two operations into one. The Investigate tool returns search results and the surrounding AST context in a single response, eliminating the follow-up Read entirely.

This pattern accounted for 31% of all baseline token spend across our test suite. Eliminating it is the second-largest source of savings.
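A toy version of the "search and context in one response" idea is sketched below. It is deliberately simpler than what the post describes: it uses plain substring matching and a fixed window of surrounding lines where Investigate is described as returning AST context. The names `Match` and `search_with_context` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    line_number: int    # 1-based line number of the hit
    context: list[str]  # the hit plus up to `radius` lines on each side


def search_with_context(source: str, symbol: str, radius: int = 2) -> list[Match]:
    """Return every line containing `symbol` together with nearby lines.

    The caller gets both the search result and enough surrounding code
    to act on it, so no separate read of the whole file is needed.
    """
    lines = source.splitlines()
    matches = []
    for i, line in enumerate(lines):
        if symbol in line:
            start = max(0, i - radius)
            end = min(len(lines), i + radius + 1)
            matches.append(Match(line_number=i + 1, context=lines[start:end]))
    return matches
```

The token saving comes from the response size: a window of a few lines around each hit is usually far smaller than the full file that a follow-up read would return.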

Where CodeCondense Helps Less

The python-ml-pipeline project saw a more modest 43% reduction. This is primarily because:

  • Our tree-sitter-based AST analysis is less precise for Python than for TypeScript, so we cannot yet collapse context as aggressively.
  • The tasks involved more exploratory behaviour (reading data files, inspecting CSV schemas) where caching provides little benefit.
  • Jupyter notebooks require a different context model that we have not yet optimised.

Better Python support is on the roadmap. The current version works best with TypeScript and JavaScript codebases.

Quality: Did Accuracy Change?

This is the question we were most nervous about. Token reduction is meaningless if Claude starts making more mistakes as a result.

We evaluated task quality by manually scoring outputs on a five-point rubric (correctness, completeness, idiomatic style, test coverage where applicable, and whether the output required correction). We scored 300 total tasks — 150 baseline and 150 with CodeCondense.

Average quality score was 4.1 / 5.0 baseline vs 4.3 / 5.0 with CodeCondense. Quality improved slightly, likely because the Investigate tool returns more precisely targeted context, reducing noise in Claude's reasoning.

We were expecting to see a neutral result or a minor regression. The modest improvement was a genuine surprise, and it aligns with what we observe anecdotally: when Claude receives less noise, it makes fewer errors.

Raw Data and Methodology

All test scripts, raw token logs, and scoring sheets are available in the GitHub repository under /benchmarks. We encourage independent verification.

If you run your own tests and find results that differ significantly from ours — in either direction — please open an issue. We want the data to be trustworthy.

Conclusion

The short version: CodeCondense reduced token consumption by 60–65% on the four TypeScript projects we tested, with smaller but still meaningful gains on Go (55%) and Python (43%). Quality was maintained or slightly improved. The savings are real, measurable, and reproducible.

The long version is everything above. We tried to be honest about the limitations, the methodology, and the cases where it works less well. Make of the numbers what you will — but please do make something of them, because “trust us, it saves tokens” has never been good enough.

Try CodeCondense

Free plan. No credit card. Saves money from minute one.

npm install -g codecondense