Claims of “40–70% token reduction” are easy to make and hard to verify. So we did the work: we selected six representative codebases, defined a standardised set of Claude Code tasks, and measured exactly what CodeCondense does and does not improve.

This post documents that process in full. No cherry-picked numbers. No ideal conditions. If you are evaluating CodeCondense for your own workflow, this is the data you actually need.

Test Setup

Each codebase was tested against the same set of ten canonical tasks:

Add a new API endpoint with request validation
Refactor a utility function and update all call sites
Fix a bug described in a GitHub issue (simulated with a comment)
Write unit tests for an existing module
Add TypeScript types to an untyped JavaScript file
Review a pull request diff and suggest improvements
Extract a reusable component from duplicated code
Update a dependency and migrate breaking API changes
Add logging and error handling to an existing function
Explain an unfamiliar module to a new team member

Tasks were run three times each (fresh session, warm cache, cold cache after file modifications) and averaged. Token counts were captured from the Anthropic API usage response object — not estimated.
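As a rough illustration of how per-run totals can be derived from usage objects and then averaged, here is a minimal sketch. The four field names are the ones reported in the `usage` object of an Anthropic Messages API response; the helper names are hypothetical, and counting all four fields is an assumption, since the post does not state which fields were summed.

```python
from statistics import mean
from typing import Iterable, Mapping, Optional

# Token fields reported in the `usage` object of a Messages API response.
# Summing all four is an assumption made for this sketch.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)

Usage = Mapping[str, Optional[int]]


def tokens_in_response(usage: Usage) -> int:
    """Sum the token fields of one usage object; missing or null fields count as 0."""
    return sum(usage.get(field) or 0 for field in USAGE_FIELDS)


def tokens_in_run(usages: Iterable[Usage]) -> int:
    """Total tokens for one run of a task (one usage object per API response)."""
    return sum(tokens_in_response(usage) for usage in usages)


def average_over_runs(runs: Iterable[Iterable[Usage]]) -> float:
    """Average the per-run totals, e.g. over the fresh, warm-cache and cold-cache runs."""
    return mean(tokens_in_run(run) for run in runs)
```

Each run is passed as a list of usage objects, one per API response in that run, and the three runs of a task are passed together to `average_over_runs`.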

The Codebases

Project	Lines of Code	Language	Description
cli-tool	~4,200	TypeScript	Small command-line utility
saas-api	~18,000	TypeScript / Node	REST API with auth and billing
react-dashboard	~31,000	TypeScript / React	Admin dashboard, many components
monorepo-platform	~89,000	TypeScript	Multi-package monorepo
python-ml-pipeline	~12,000	Python	Data pipeline with Pandas/NumPy
go-microservice	~7,500	Go	gRPC service with PostgreSQL

Results: Token Reduction by Project

Project	Baseline tokens	With CodeCondense	Reduction	Cost saving*
cli-tool	41,200	14,800	64%	$0.78/day
saas-api	128,400	44,900	65%	$2.51/day
react-dashboard	197,300	78,100	60%	$3.58/day
monorepo-platform	412,800	142,600	65%	$8.21/day
python-ml-pipeline	89,700	51,200	43%	$1.14/day
go-microservice	55,100	24,800	55%	$0.91/day

* Based on claude-opus-4-8 pricing at time of writing. Assumes 10 active sessions per day.
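The Reduction column can be recomputed from the two token columns. The sketch below does that arithmetic with the figures copied from the table; the function name `reduction_percent` is ours, not part of CodeCondense.

```python
# (baseline tokens, tokens with CodeCondense), copied from the results table.
RESULTS = {
    "cli-tool": (41_200, 14_800),
    "saas-api": (128_400, 44_900),
    "react-dashboard": (197_300, 78_100),
    "monorepo-platform": (412_800, 142_600),
    "python-ml-pipeline": (89_700, 51_200),
    "go-microservice": (55_100, 24_800),
}


def reduction_percent(baseline: int, condensed: int) -> float:
    """Share of baseline tokens that were removed, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - condensed) / baseline


if __name__ == "__main__":
    for project, (baseline, condensed) in RESULTS.items():
        print(f"{project:20s} {reduction_percent(baseline, condensed):5.1f}%")
```

Rounded to whole percentages, the computed values match the Reduction column.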

Where CodeCondense Helps Most

The biggest wins came from two specific patterns:

1. Repeated File Reads

In the react-dashboard project, the average task involved reading the same file 3.2 times. This is entirely expected behaviour — Claude re-reads a file to confirm that its previous edit was applied correctly, or to refresh its view before making a second change. CodeCondense’s caching layer (keyed on file path + mtime) eliminated 94% of these redundant reads.

# Typical task without CodeCondense
Read(Button.tsx)        → 3,100 tokens
Edit(Button.tsx)        → 3,100 tokens (re-read internally)
Read(Button.tsx)        → 3,100 tokens (confirmation)
Read(Button.tsx)        → 3,100 tokens (next change)
Total: 12,400 tokens for one component edit

# With CodeCondense
Investigate(Button.tsx) → 1,200 tokens (AST summary + symbols)
BatchEdit(Button.tsx)   → 900 tokens  (all changes at once)
Total: 2,100 tokens — an 83% reduction
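To make the "file path + mtime" cache key concrete, here is a minimal sketch of a read cache built on that idea. It is not CodeCondense's implementation: the class name, the hit and miss counters, and the choice to keep stale entries are all assumptions made for illustration.

```python
import os
from typing import Dict, Tuple


class MtimeReadCache:
    """Caches file contents under the key (absolute path, mtime in nanoseconds)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], str] = {}
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> str:
        # A modified file gets a new mtime, so its old entry is never matched again.
        key = (os.path.abspath(path), os.stat(path).st_mtime_ns)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
        # Entries for older mtimes are not evicted in this sketch.
        self._entries[key] = content
        return content
```

Repeated reads of an unchanged file are served from memory, while any write that updates the file's mtime forces a fresh read on the next access.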

2. Search-Then-Read Patterns

When Claude searches for a symbol and then reads the file containing it, CodeCondense collapses those two operations into one. The Investigate tool returns search results and the surrounding AST context in a single response, eliminating the follow-up Read entirely.

This pattern accounted for 31% of all baseline token spend across our test suite. Eliminating it is the second-largest source of savings.
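The idea of answering a symbol search and the follow-up read in a single response can be sketched as follows. This is only an analogy: it uses Python's standard `ast` module on Python source as a stand-in for the tree-sitter analysis mentioned in this post, and the `investigate` function and `SymbolHit` type are hypothetical names, not the real tool's interface.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class SymbolHit:
    name: str
    kind: str        # "function" or "class"
    start_line: int  # 1-based line of the definition
    end_line: int    # 1-based last line of the definition
    source: str      # exact source text of the definition


def investigate(source: str, symbol: str) -> List[SymbolHit]:
    """Find definitions named `symbol` and return each one together with its source."""
    tree = ast.parse(source)
    hits: List[SymbolHit] = []
    for node in ast.walk(tree):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_def and node.name == symbol:
            hits.append(
                SymbolHit(
                    name=node.name,
                    kind="class" if isinstance(node, ast.ClassDef) else "function",
                    start_line=node.lineno,
                    end_line=node.end_lineno,
                    source=ast.get_source_segment(source, node) or "",
                )
            )
    return sorted(hits, key=lambda hit: hit.start_line)
```

Because each hit already carries the definition's source and line range, the caller does not need a second read of the whole file to see the matched code.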

Where CodeCondense Helps Less

The python-ml-pipeline project saw a more modest 43% reduction. This is primarily because:

Our tree-sitter-based AST analysis is currently less precise for Python than for TypeScript, so we cannot yet collapse context as aggressively.
The tasks involved more exploratory behaviour (reading data files, inspecting CSV schemas) where caching provides little benefit.
Jupyter notebooks require a different context model that we have not yet optimised.

Improved Python support is on the roadmap. The current version works best with TypeScript and JavaScript codebases.

Quality: Did Accuracy Change?

This is the question we were most nervous about. Token reduction is meaningless if Claude starts making more mistakes as a result.

We evaluated task quality by manually scoring outputs on a five-point rubric (correctness, completeness, idiomatic style, test coverage where applicable, and whether the output required correction). We scored 300 total tasks — 150 baseline and 150 with CodeCondense.

The average quality score was 4.1 / 5.0 for the baseline and 4.3 / 5.0 with CodeCondense. Quality improved slightly, likely because the Investigate tool returns more precisely targeted context, reducing noise in Claude’s reasoning.

We were expecting to see a neutral result or a minor regression. The modest improvement was a genuine surprise, and it aligns with what we observe anecdotally: when Claude receives less noise, it makes fewer errors.

Raw Data and Methodology

All test scripts, raw token logs, and scoring sheets are available in the GitHub repository under /benchmarks. We encourage independent verification.

If you run your own tests and find results that differ significantly from ours — in either direction — please open an issue. We want the data to be trustworthy.

Conclusion

The short version: CodeCondense reduced token consumption by 60–65% on the four TypeScript projects we tested, with smaller but still meaningful reductions on Go (55%) and Python (43%). Quality is maintained or slightly improved. The savings are real, measurable, and reproducible.

The long version is everything above. We tried to be honest about the limitations, the methodology, and the cases where it works less well. Make of the numbers what you will — but please do make something of them, because “trust us, it saves tokens” has never been good enough.

Benchmark Tests: How Much Does CodeCondense Actually Save?