Cheapest models for cached agent workloads
Blended cost per million tokens, assuming 55% of input tokens are reused at the cache-read rate and output accounts for 25% of all tokens. These ratios approximate the shape of a real coding agent workload rather than a list-price headline.
| # | Model | Provider | Reuse-adjusted blended ($/M tokens) | List blended ($/M tokens) | Cache discount |
|---|---|---|---|---|---|
| 1 | OpenAI gpt-oss-120b | Fireworks | $0.21 | $0.26 | −90% |
| 2 | GPT-5 Mini | OpenAI | $0.59 | $0.69 | −90% |
| 3 | Grok 4.3 | xAI | $1.13 | $1.56 | −84% |
| 4 | Kimi K2.6 | Fireworks | $1.39 | $1.71 | −83% |
| 5 | DeepSeek-V4-Pro | Fireworks | $1.52 | $2.17 | −92% |
| 6 | Claude Haiku 4.5 | Anthropic | $1.63 | $2.00 | −90% |
| 7 | GLM 5.1 | Fireworks | $1.68 | $2.15 | −81% |
| 8 | Gemini 3.5 Flash | | $2.82 | $3.38 | −90% |
| 9 | Gemini 3.1 Pro Preview | | $3.76 | $4.50 | −90% |
| 10 | Claude Sonnet 4.6 | Anthropic | $4.89 | $6.00 | −90% |
| 11 | Claude Opus 4.7 | Anthropic | $8.14 | $10.00 | −90% |
| 12 | Claude Opus 4.8 | Anthropic | $8.14 | $10.00 | −90% |
| 13 | GPT-5.5 | OpenAI | $9.39 | $11.25 | −90% |
| 14 | Claude Fable 5 | Anthropic | $16.29 | $20.00 | −90% |
| 15 | GPT-5.5 Pro | OpenAI | $67.50 | $67.50 | 0% |
Method
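The ranking metric is reuseAdjustedBlended(model, reuse=0.55, outputShare=0.25). The effective input price is a weighted average of the list input price (weight 1 − reuse) and the cache-read price (weight reuse). That effective input price is then blended with the output price, weighted 1 − outputShare and outputShare respectively.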
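The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the site's actual implementation: the `ModelPrice` container, the snake_case function names, and the example prices are all hypothetical. It also assumes that the cache discount column is the percentage reduction of the cache-read price relative to the list input price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Per-million-token prices in USD (hypothetical container, not from the source)."""
    input_price: float      # list price for uncached input tokens
    output_price: float     # list price for output tokens
    cache_discount: float   # assumed: fraction off input_price for cache reads, e.g. 0.90

    @property
    def cache_read_price(self) -> float:
        # Assumption: cache reads are billed at the input price minus the discount.
        return self.input_price * (1.0 - self.cache_discount)


def reuse_adjusted_blended(model: ModelPrice,
                           reuse: float = 0.55,
                           output_share: float = 0.25) -> float:
    """Blend input and output prices, billing `reuse` of the input at the cache-read rate."""
    if not (0.0 <= reuse <= 1.0 and 0.0 <= output_share <= 1.0):
        raise ValueError("reuse and output_share must be between 0 and 1")
    # Step 1: split input between list price and cache-read price by the reuse ratio.
    effective_input = (1.0 - reuse) * model.input_price + reuse * model.cache_read_price
    # Step 2: blend the effective input price with the output price at the output share.
    return (1.0 - output_share) * effective_input + output_share * model.output_price


def list_blended(model: ModelPrice, output_share: float = 0.25) -> float:
    """Same blend with no cache reuse, i.e. pure list prices."""
    return reuse_adjusted_blended(model, reuse=0.0, output_share=output_share)


if __name__ == "__main__":
    # Illustrative prices only; they are assumptions, not figures quoted in the source.
    example = ModelPrice(input_price=1.00, output_price=5.00, cache_discount=0.90)
    print(f"list blended:           ${list_blended(example):.2f}")
    print(f"reuse-adjusted blended: ${reuse_adjusted_blended(example):.2f}")
```

With these illustrative prices the sketch yields $2.00 list blended and $1.63 reuse-adjusted, which agrees with the Claude Haiku 4.5 row. A model with no cache discount gets identical values in both columns, as in the GPT-5.5 Pro row.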