Claims of “40–70% token reduction” are easy to make and hard to verify. So we did the work: we selected six representative codebases, defined a standardised set of Claude Code tasks, and measured exactly what CodeCondense does and does not improve.
This post documents that process in full. No cherry-picked numbers. No ideal conditions. If you are evaluating CodeCondense for your own workflow, this is the data you actually need.
Test Setup
Each codebase was tested against the same set of ten canonical tasks:
- Add a new API endpoint with request validation
- Refactor a utility function and update all call sites
- Fix a bug described in a GitHub issue (simulated with a comment)
- Write unit tests for an existing module
- Add TypeScript types to an untyped JavaScript file
- Review a pull request diff and suggest improvements
- Extract a reusable component from duplicated code
- Update a dependency and migrate breaking API changes
- Add logging and error handling to an existing function
- Explain an unfamiliar module to a new team member
Each task was run three times (fresh session, warm cache, and cold cache after file modifications) and the three results were averaged. Token counts were read from the usage object in the Anthropic API response, not estimated.
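To make the averaging step concrete, here is a minimal sketch of how per-run usage records could be totalled and averaged. The `input_tokens` and `output_tokens` field names match the usage object returned by the Anthropic Messages API. The function names, the run labels, and the numbers are hypothetical, and counting input and output tokens together is an assumption rather than a description of the benchmark scripts.

```python
from statistics import mean


def run_total(usages):
    """Sum input and output tokens over every API response in one run."""
    return sum(u["input_tokens"] + u["output_tokens"] for u in usages)


def average_tokens(runs):
    """Average the per-run totals across the run conditions."""
    return mean(run_total(usages) for usages in runs.values())


# Made-up usage records for one task, one list of responses per run condition.
example_runs = {
    "fresh": [{"input_tokens": 3000, "output_tokens": 400}],
    "warm": [{"input_tokens": 2000, "output_tokens": 400}],
    "cold": [{"input_tokens": 2600, "output_tokens": 600}],
}

print(average_tokens(example_runs))
```

Keeping the raw per-response records and aggregating afterwards makes it possible to re-check the averages later without re-running the tasks.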
The Codebases
| Project | Lines of Code | Language | Description |
|---|---|---|---|
| cli-tool | ~4,200 | TypeScript | Small command-line utility |
| saas-api | ~18,000 | TypeScript / Node | REST API with auth and billing |
| react-dashboard | ~31,000 | TypeScript / React | Admin dashboard, many components |
| monorepo-platform | ~89,000 | TypeScript | Multi-package monorepo |
| python-ml-pipeline | ~12,000 | Python | Data pipeline with Pandas/NumPy |
| go-microservice | ~7,500 | Go | gRPC service with PostgreSQL |
Results: Token Reduction by Project
| Project | Baseline tokens | With CodeCondense | Reduction | Cost saving* |
|---|---|---|---|---|
| cli-tool | 41,200 | 14,800 | 64% | $0.78/day |
| saas-api | 128,400 | 44,900 | 65% | $2.51/day |
| react-dashboard | 197,300 | 78,100 | 60% | $3.58/day |
| monorepo-platform | 412,800 | 142,600 | 65% | $8.21/day |
| python-ml-pipeline | 89,700 | 51,200 | 43% | $1.14/day |
| go-microservice | 55,100 | 24,800 | 55% | $0.91/day |
* Based on claude-opus-4-8 pricing at time of writing. Assumes 10 active sessions per day.
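The Reduction column can be recomputed from the two token columns. The sketch below does that for the figures in the table above; the function and variable names are illustrative and are not taken from the benchmark scripts.

```python
def reduction_percent(baseline, condensed):
    """Share of baseline tokens saved, rounded to the nearest whole percent."""
    return round(100 * (baseline - condensed) / baseline)


# (baseline tokens, tokens with CodeCondense), copied from the results table.
RESULTS = {
    "cli-tool": (41_200, 14_800),
    "saas-api": (128_400, 44_900),
    "react-dashboard": (197_300, 78_100),
    "monorepo-platform": (412_800, 142_600),
    "python-ml-pipeline": (89_700, 51_200),
    "go-microservice": (55_100, 24_800),
}

for project, (baseline, condensed) in RESULTS.items():
    print(f"{project}: {reduction_percent(baseline, condensed)}%")
```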
Where CodeCondense Helps Most
The biggest wins came from two specific patterns:
1. Repeated File Reads
In the react-dashboard project, the average task involved reading the same file 3.2 times. This is entirely expected behaviour — Claude re-reads a file to confirm that its previous edit was applied correctly, or to refresh its view before making a second change. CodeCondense’s caching layer (keyed on file path + mtime) eliminated 94% of these redundant reads.
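The idea of a cache keyed on file path plus mtime can be sketched in a few lines. This is an illustration of the general technique, not CodeCondense's implementation: the class name, the hit and miss counters, and the eviction of stale entries are all assumptions.

```python
import os


class FileReadCache:
    """Cache file contents keyed on (path, mtime) so unchanged files are read once."""

    def __init__(self):
        self._entries = {}  # (absolute path, mtime in ns) -> file contents
        self.hits = 0
        self.misses = 0

    def read(self, path):
        path = os.path.abspath(path)
        key = (path, os.stat(path).st_mtime_ns)
        if key in self._entries:
            # Same path and same mtime: serve the cached copy.
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        with open(path, encoding="utf-8") as f:
            content = f.read()
        # Drop older versions of this path so the cache does not grow with every edit.
        self._entries = {k: v for k, v in self._entries.items() if k[0] != path}
        self._entries[key] = content
        return content
```

Because the mtime is part of the key, any edit that updates the file's modification time produces a miss on the next read, so a stale copy is never served for a file whose mtime has changed.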
# Typical task without CodeCondense
Read(Button.tsx) → 3,100 tokens
Edit(Button.tsx) → 3,100 tokens (re-read internally)
Read(Button.tsx) → 3,100 tokens (confirmation)
Read(Button.tsx) → 3,100 tokens (next change)
Total: 12,400 tokens for one component edit
# With CodeCondense
Investigate(Button.tsx) → 1,200 tokens (AST summary + symbols)
BatchEdit(Button.tsx) → 900 tokens (all changes at once)
Total: 2,100 tokens (an 83% reduction)
2. Search-Then-Read Patterns
When Claude searches for a symbol and then reads the file containing it, CodeCondense collapses those two operations into one. The Investigate tool returns search results and the surrounding AST context in a single response, eliminating the follow-up Read entirely.
This pattern accounted for 31% of all baseline token spend across our test suite. Eliminating it is the second-largest source of savings.
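A toy version of the collapsed operation shows why the follow-up read disappears. The sketch uses Python's built-in `ast` module as a stand-in for tree-sitter, and the `investigate` function here is a hypothetical illustration named after the tool, not its actual implementation or interface.

```python
import ast


def investigate(symbol, sources):
    """Find definitions of `symbol` and return each hit together with its source."""
    hits = []
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            is_def = isinstance(
                node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
            )
            if is_def and node.name == symbol:
                hits.append({
                    "path": path,
                    "line": node.lineno,
                    "kind": type(node).__name__,
                    # The definition travels with the hit, so no second read is needed.
                    "context": ast.get_source_segment(text, node),
                })
    return hits


# Two made-up files: one defines the symbol, the other only uses it.
SOURCES = {
    "utils.py": "def slugify(s):\n    return s.lower().replace(' ', '-')\n",
    "app.py": "from utils import slugify\n\nprint(slugify('Hello World'))\n",
}

result = investigate("slugify", SOURCES)
print(result[0]["path"], result[0]["line"])
print(result[0]["context"])
```

A plain text search would return only the file and line, leaving the caller to read `utils.py` in a second step. Returning the definition with the hit is what removes that step.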
Where CodeCondense Helps Less
The python-ml-pipeline project saw a more modest 43% reduction. This is primarily because:
- Our tree-sitter-based AST analysis is less precise for Python than for TypeScript, so we cannot yet collapse code as aggressively.
- The tasks involved more exploratory behaviour (reading data files, inspecting CSV schemas) where caching provides little benefit.
- Jupyter notebooks require a different context model that we have not yet optimised.
Better Python support is on the roadmap. For now, CodeCondense works best with TypeScript and JavaScript codebases.
Quality: Did Accuracy Change?
This is the question we were most nervous about. Token reduction is meaningless if Claude starts making more mistakes as a result.
We evaluated task quality by manually scoring outputs on a five-point rubric (correctness, completeness, idiomatic style, test coverage where applicable, and whether the output required correction). We scored 300 total tasks — 150 baseline and 150 with CodeCondense.
We were expecting to see a neutral result or a minor regression. The modest improvement was a genuine surprise, and it aligns with what we observe anecdotally: when Claude receives less noise, it makes fewer errors.
Raw Data and Methodology
All test scripts, raw token logs, and scoring sheets are available in the GitHub repository under /benchmarks. We encourage independent verification.
If you run your own tests and find results that differ significantly from ours — in either direction — please open an issue. We want the data to be trustworthy.
Conclusion
The short version: CodeCondense reduced token consumption by 60–65% on the four TypeScript projects we tested, with smaller but still meaningful gains on Go (55%) and Python (43%). Quality is maintained or slightly improved. The savings are real, measurable, and reproducible.
The long version is everything above. We tried to be honest about the limitations, the methodology, and the cases where it works less well. Make of the numbers what you will — but please do make something of them, because “trust us, it saves tokens” has never been good enough.