Headroom

Why is Claude Code so expensive?

Claude Code is one of the best AI coding tools available, but it can burn through tokens fast. The bill (or the plan limit you keep hitting) usually comes from a handful of repeatable patterns rather than one obvious culprit.

Understanding where the tokens go is the first step toward fixing it. Here are the four reasons Claude Code feels expensive, and what you can do about each.

1. Verbose tool output

Every shell command, build run, and test invocation sends its full output back to the model. A passing test suite can dump hundreds of lines of dots, timing data, and progress bars. A failing build can produce thousands of lines of stack traces, deprecation warnings, and dependency resolution noise. Claude Code reads all of it, you pay for every token, and the actual signal is usually buried in roughly 5% of the output.
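
To make the waste concrete, here is a minimal sketch (not Headroom's actual implementation) of the kind of filtering that helps: keep only the lines of a test run that carry signal, such as failures, tracebacks, and the summary, and drop the rest. The keyword list is a simplified assumption for illustration.

```python
import re

# Lines worth keeping from a noisy test run: failures, errors,
# tracebacks, and the final summary line. Everything else is noise.
SIGNAL = re.compile(r"FAILED|ERROR|AssertionError|Traceback|passed|failed")

def strip_noise(output: str) -> str:
    """Keep only the lines likely to matter to the model."""
    kept = [line for line in output.splitlines() if SIGNAL.search(line)]
    return "\n".join(kept)

raw = """\
test_a.py::test_one PASSED  [ 10%]
test_a.py::test_two PASSED  [ 20%]
test_b.py::test_three FAILED [ 30%]
    AssertionError: expected 3, got 4
=========== 1 failed, 2 passed in 0.42s ===========
"""
print(strip_noise(raw))
```

On this sample, five lines of output shrink to three, and the two that vanish are exactly the ones the model never needed. Real filtering has to be far more careful than a keyword list, but the shape of the saving is the same.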

2. Repeated context across turns

Claude Code re-sends conversation history with every turn. The same file content, the same tool definitions, the same earlier responses get replayed for the model on each step. In a long debugging session this compounds quickly: the tenth message in a thread can carry ten times the context of the first.
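
A back-of-the-envelope model shows why this compounds: if each turn adds a fixed amount of new content and the full history is re-sent every time, the tokens processed over a session grow quadratically with the number of turns. The 2,000-tokens-per-turn figure below is an illustrative assumption, not a measurement.

```python
def total_tokens(turns: int, tokens_per_turn: int = 2_000) -> int:
    """Tokens processed across a session when every turn re-sends
    the entire history plus its own new content."""
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

print(total_tokens(1))   # 2000 tokens after one turn
print(total_tokens(10))  # 110000 tokens after ten turns
```

Ten turns cost 55 times the first one, not ten times, which is why long threads feel disproportionately expensive.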

3. Multi-step debugging sessions

Real debugging is rarely one prompt. You run a test, read the failure, ask Claude to investigate, it reads three files, runs another command, reads more files, suggests a fix, you run the test again. Each step adds more tool output, more file content, more context. A 30-minute debugging session can easily consume more tokens than a full afternoon of writing code.

4. Large codebase reads

Asking Claude Code to "explore the codebase" or "find where X is defined" pulls in big chunks of source files, often with surrounding context Claude doesn't strictly need. If your repo has long generated files, JSON fixtures, or HTML templates, a single search can move tens of thousands of tokens through the model.
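
One mitigation is to exclude generated and bulky files before anything is searched. A minimal sketch, with made-up suffix and size heuristics chosen purely for illustration:

```python
from pathlib import Path

# Hypothetical heuristics: skip generated or bulky file types, and
# anything over a size cutoff, before handing paths to a search.
SKIP_SUFFIXES = (".min.js", ".lock", ".json", ".svg")
MAX_BYTES = 50_000

def searchable_files(root: str):
    """Yield source files worth searching, skipping low-signal ones."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.name.endswith(SKIP_SUFFIXES):
            continue
        if path.stat().st_size > MAX_BYTES:
            continue
        yield path
```

A single JSON fixture can outweigh every hand-written source file in a directory, so even a crude filter like this keeps a search from moving tens of thousands of irrelevant tokens through the model.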

How to fix it

The fix is the same in every case: remove low-signal content before Claude Code sees it, while preserving the structure and information the model actually needs. That is what Headroom does: it sits between Claude Code and the API, strips out repetitive logs, compresses verbose documents, and forwards the rest. Token spend drops by roughly 50% while output quality stays the same.

For a deeper walk-through of the tools that handle each source of waste, see our Claude Code cost guide. If your problem is hitting the 5-hour plan limit rather than billing, our Claude Code usage limits guide covers that specifically.

Stop paying for noise

Install Headroom, run a Claude Code task you already do every day, and compare the token count before and after. The savings show up immediately on the workflows that hurt most.

Download for free

Linux alpha

The Linux build is still in alpha

It may be unstable, and feedback is appreciated at [email protected].

macOS · Linux