Give Claude Code a $0.02/Call Coworker to Beat Pro Limits
Claude Code kept hitting my Pro limits mid-session. I tried compact mode, tighter prompts, chunking tasks manually — nothing stuck. Then I found a Reddit post with over a thousand upvotes from someone who had solved this by giving Claude Code a coworker: a cheap-model wrapper it could call via the Bash tool. Their cost on the cheap model over three weeks? $0.38. Their documentation updates dropped from around 5,000 tokens to around 200 tokens per run. That’s the architecture we’re building here.
What you need
You need a Claude Code installation (Claude Pro or Team), an API key for a cheap model you want to use as the subagent (any provider with a CLI-friendly endpoint works — the pattern in the original post uses a model priced around $0.02/call), and basic comfort editing CLAUDE.md. Python 3 should already be on your machine.
How this works architecturally
Claude Code’s Bash tool lets it run arbitrary shell commands. A shell command can call any HTTP API. That means you can write a one-file Python script that accepts a prompt on stdin and returns a completion on stdout — and Claude Code can call it exactly like any other tool, with no plugins, no MCP servers, no special permissions.
The routing logic lives entirely in CLAUDE.md. You define categories of tasks that should go to the cheap wrapper versus tasks that should stay with Claude’s own reasoning. Claude reads those rules at session start and follows them when it decides how to accomplish a step.
flowchart TD
A[User Task] --> B{CLAUDE.md routing rules}
B -->|Repetitive / large-input / low-reasoning| C[Bash tool]
B -->|Architecture / debugging / shipped output| D[Claude's own reasoning]
C --> E[coworker.py]
E --> F[Cheap Model API]
F -->|completion text| E
E -->|stdout| C
C -->|result| D
D --> G[Final response to user]
The key insight: Claude Code doesn’t need to know what the cheap model is. It just sees a Bash command that returns text. That’s a clean boundary — you can swap the cheap model underneath without touching anything except the wrapper script.
Step 1: Write the CLI wrapper
Save this as a Python script (e.g., coworker.py) somewhere accessible, then add it to your PATH or reference it by absolute path in your CLAUDE.md:
#!/usr/bin/env python3
"""
coworker.py — thin wrapper around a cheap model API.
Usage: echo "your prompt here" | python3 coworker.py
python3 coworker.py "your prompt here"
"""
import sys
import os
import json
import urllib.request
API_KEY = os.environ.get("COWORKER_API_KEY", "<your_api_key>")
API_URL = os.environ.get("COWORKER_API_URL", "<your_cheap_model_endpoint>")
MODEL = os.environ.get("COWORKER_MODEL", "<your_cheap_model_id>")
def call_cheap_model(prompt: str) -> str:
payload = json.dumps({
"model": MODEL,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 2048,
}).encode()
req = urllib.request.Request(
API_URL,
data=payload,
headers={
"Content-Type": "application/json",
"Authorization": f"Bearer {API_KEY}",
},
)
with urllib.request.urlopen(req) as resp:
data = json.loads(resp.read())
# Adjust the key path to match your provider's response shape
return data["choices"][0]["message"]["content"]
if __name__ == "__main__":
if len(sys.argv) > 1:
prompt = " ".join(sys.argv[1:])
else:
prompt = sys.stdin.read().strip()
if not prompt:
sys.exit("No prompt provided.")
print(call_cheap_model(prompt))
Make it executable with chmod +x coworker.py. Test it standalone before wiring it to Claude Code:
echo "List three boilerplate pytest fixtures for a FastAPI app." | python3 coworker.py
If you get text back, the wrapper is working. If you get an auth error, double-check COWORKER_API_KEY in your environment.
Step 2: Add routing rules to CLAUDE.md
Open (or create) CLAUDE.md in your project root and add a ## Routing section:
## Routing
You have access to a cheap coworker model via Bash. Invoke it with:
python3 /path/to/coworker.py "<prompt>"
### Delegate to the coworker when the task is:
- Reading and summarizing large files (>500 lines) you don't need to reason about
- Generating boilerplate: test stubs, docstrings, changelog entries, repetitive config
- First-pass documentation updates where accuracy comes from the source file, not judgment
- Extracting structured data (imports, function signatures, dependency lists) from existing code
### Keep with your own reasoning when the task requires:
- Architecture decisions or any output that ships directly to users
- Debugging — you need full project context
- Anything spanning more than two files where the interaction between files matters
- Security-sensitive changes
### How to use the coworker:
1. Write a self-contained prompt (include all context inline — it has no project memory)
2. Call it via Bash: python3 /path/to/coworker.py "$(cat prompt.txt)"
3. Pipe its output into your next step or write it directly to the target file
The self-contained prompt requirement is critical. The cheap model has no session context, so Claude needs to inline any file content or instructions it needs the coworker to act on.
Step 3: Verify the routing in a live session
Start a Claude Code session and give it a documentation task that would normally burn tokens:
Update the docstrings in src/utils.py to match the current function signatures.
Watch the token counter. The bulk of the input tokens — reading the file, generating the docstrings — are consumed by the cheap model at its low per-call rate. Claude’s own token usage for this task drops to the coordination overhead: reading the routing rules, constructing the prompt, applying the output.
The cost math is straightforward: if a documentation pass that would normally consume thousands of input tokens against premium Claude pricing can instead be routed through a model at ~$0.02/call, the delta compounds fast across a week of development work. The original post saw three weeks of cheap-model usage add up to $0.38 total.
Where this breaks
The coworker has no memory. Every call is stateless. If you ask it to “update the module to be consistent with the previous change,” it has no idea what changed. Claude has to inline that context explicitly — which means your routing rules need to be honest about what self-contained means. If Claude can’t write a fully self-contained prompt for a task, the task shouldn’t be delegated.
Output quality varies. Cheap models are good at boilerplate and extraction. They’re unreliable on anything requiring judgment — subtle API design, error handling edge cases, matching an existing code style without examples. The routing rules above draw this line, but you’ll need to tune them for your stack. Start conservative: only delegate tasks where you’d be comfortable reviewing the output in 30 seconds.
Next steps
This routing layer is one component of what you could call a Claude Code operating system — a CLAUDE.md architecture that combines routing rules, reusable skills (pre-written prompts for common tasks), and cadences (how Claude structures a session end-to-end). The coworker pattern handles the cost side; skills and cadences handle the consistency side. Once you have the wrapper script working, the natural next step is to audit your last week of Claude Code sessions and identify which tasks would have qualified for delegation — that’s where your actual token ceiling is hiding.
← Back to blog