Highest measured cache capture
Median capture rate from the Zumik corpus: realized reused tokens over candidate reusable tokens. High capture means the savings actually land.
| # | Model | Provider | Capture rate | List blended | Cache disc. |
|---|---|---|---|---|---|
| 1 | Claude Fable 5 | Anthropic | 93% | $20.00 | −90% |
| 2 | Claude Opus 4.8 | Anthropic | 93% | $10.00 | −90% |
| 3 | Claude Opus 4.7 | Anthropic | 92% | $10.00 | −90% |
| 4 | Claude Sonnet 4.6 | Anthropic | 91% | $6.00 | −90% |
| 5 | Claude Haiku 4.5 | Anthropic | 90% | $2.00 | −90% |
| 6 | GPT-5.5 Pro | OpenAI | 90% | $67.50 | −0% |
| 7 | GPT-5.5 | OpenAI | 88% | $11.25 | −90% |
| 8 | GPT-5 Mini | OpenAI | 85% | $0.69 | −90% |
| 9 | GLM 5.1 | Fireworks | 82% | $2.15 | −81% |
| 10 | Gemini 3.1 Pro Preview | 82% | $4.50 | −90% | |
| 11 | Kimi K2.6 | Fireworks | 81% | $1.71 | −83% |
| 12 | Gemini 3.5 Flash | 81% | $3.38 | −90% | |
| 13 | DeepSeek-V4-Pro | Fireworks | 80% | $2.17 | −92% |
| 14 | OpenAI gpt-oss-120b | Fireworks | 79% | $0.26 | −90% |
| 15 | Grok 4.3 | xAI | 76% | $1.56 | −84% |
Method
Median captureRatePct across eligible requests, with provider-reported evidence where available.
Rank the same models differently.
Cheapest models for cached agent workloads
Which model is cheapest once a typical agent prefix is served from cache?
Fastest time-to-first-token (warm prefix)
Which model responds fastest when the prefix is already cached?
Best models for coding agents
Which models balance code quality with reuse economics?
Best intelligence per dollar
Which model gives the most quality per reuse-adjusted dollar?
Rank models for your workload.
A diagnostic measures your real reuse and re-ranks the catalog for the way you actually call models.