Stop Hitting Claude Code Limits
Claude Code's usage limits aren't the real problem—how you set it up is. Four controllable causes drive 85% of overspend: cache misses, context bloat, wrong model routing, and token-heavy input formats. One user cut costs from $1,389/mo to $200/mo by locking tools at session start, disabling 1M context, delegating to cheaper subagents, and swapping screenshots for accessibility trees. Real fixes are copy-paste configuration changes and workflow tweaks, not waiting for Anthropic.
Script: Haiku 4 Voice: OpenAI TTS
Transcript
Justy Exploring Next, episode 346. One user cut their Claude Code bill from $1,389 a month to $200—not by switching models or skipping work, just by fixing how they set it up.
Cody Yeah, and the postmortem from Anthropic was interesting. They fixed three bugs in late April, reset subscriber limits. But the author's argument is the real lever isn't on their side—it's on ours.
Justy So what's actually eating the budget? Is it just people running long sessions?
Cody Four things, and they're all preventable. Prompt caching—if you're not hitting it, you're paying full price on every turn. Context bloat, wrong model routing, and token-expensive inputs like screenshots. Read cost is 0.1× input price. Write is 1.25× or 2×. But once cached, a hit is free.
Justy Free? So why isn't everyone just... caching everything?
Cody Because you have to lock your tools and model at session start. Add a tool mid-session or switch models—the cached prefix dies. You're back to paying full price. Most people don't even know it's happening.
Justy That's a brutal gotcha. So the setup is the trap.
Cody Exactly. The author got ~90% cache hit rate by not touching tools or models once the session started. The other big move is disabling 1M context. Cap it at 200K, compact early. For parallel work, spawn subagents in cheaper models. Haiku for grunt work, Sonnet for research, Opus for planning.
Justy Right. But adoption barrier is real. Most people don't think about prompt caching. They hit a bill shock and either switch tools or accept it.
Cody True. But the author gives you copy-paste config blocks. Set CLAUDE_CODE_DISABLE_1M_CONTEXT=1, lock your tools, use subagents. For inputs, swap screenshots for agent-browser—it returns the accessibility tree instead. ~90% fewer tokens. Or use pdftotext instead of Claude's PDF reader.
Justy So if someone's running Claude Code right now and seeing spend creep, where do they start?
Cody Watch the cache hit rate in session stats. If it's below 75%, you're invalidating the prefix. Lock your tools and model at start, compact early, and route subtasks to cheaper models. Then swap your inputs—agent-browser for web scraping, pdftotext for PDFs. Those two things alone probably drop your bill 40–50%.
Justy And if someone wants to experiment over a weekend?
Cody Grab agent-browser from npm, set up one CLAUDE.md task-delegation block in a project, and run a long session with subagents instead of everything in the parent. You'll see the token count drop immediately. Or set up the 1M context disable and compact threshold in your env, then run the same workflow twice and compare.
Justy Alright. If you're hitting Claude Code limits, Cody's right—the fixes are on your side. Go lock your setup.