Rankings that account for reuse
A model that looks cheap on paper can be expensive in practice, and vice versa. These boards rank models by what agent workloads actually pay once caching is working, alongside the latency, quality, and cache-capture lenses that matter.
Cheapest models for cached agent workloads
$/1M tokens · Which model is cheapest once a typical agent prefix is served from cache?
- OpenAI gpt-oss-120b · $0.21
- GPT-5 Mini · $0.59
- Grok 4.3 · $1.13
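As a rough illustration of what "cheapest once a prefix is served from cache" can mean, the sketch below blends cached and uncached input prices by the share of input tokens read from cache. The function name, its parameters, and the prices are hypothetical placeholders. They are not this board's method or any provider's actual rates.

```python
def blended_input_price(uncached_price: float, cached_price: float,
                        cached_share: float) -> float:
    """Effective $/1M input tokens when `cached_share` of tokens are cache reads.

    Assumption: cost is a simple weighted average of the two per-token rates.
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    return cached_share * cached_price + (1.0 - cached_share) * uncached_price


# Hypothetical prices in $/1M tokens, for illustration only.
list_price = 1.00
cache_read_price = 0.10
effective = blended_input_price(list_price, cache_read_price, cached_share=0.8)
print(f"${effective:.2f} per 1M input tokens")
```

With these made-up numbers, the effective price falls well below the list price, which is why a ranking by list price and a ranking by cached price can disagree.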
Fastest time-to-first-token (warm prefix)
ms · Which model responds fastest when the prefix is already cached?
- Claude Haiku 4.5 · 110 ms
- OpenAI gpt-oss-120b · 120 ms
- DeepSeek-V4-Pro · 130 ms
Best models for coding agents
index · Which models balance code quality with reuse economics?
- Claude Fable 5 · 84.0
- Claude Opus 4.8 · 83.0
- GPT-5.5 · 79.2
Best intelligence per dollar
index per $ · Which model gives the most quality per reuse-adjusted dollar?
- OpenAI gpt-oss-120b · 372.9
- GPT-5 Mini · 136.0
- Grok 4.3 · 78.7
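One plausible reading of "quality per reuse-adjusted dollar" is a quality index divided by the effective price per 1M tokens after caching. The sketch below assumes that definition. The board's exact formula is not stated here, and the class, function, model names, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    quality_index: float    # hypothetical quality score
    effective_price: float  # hypothetical $/1M tokens after caching


def value_score(m: ModelStats) -> float:
    """Quality index per reuse-adjusted dollar (assumed definition)."""
    if m.effective_price <= 0:
        raise ValueError("effective_price must be positive")
    return m.quality_index / m.effective_price


def rank_by_value(models: list[ModelStats]) -> list[ModelStats]:
    """Return models sorted from best to worst value score."""
    return sorted(models, key=value_score, reverse=True)


# Made-up entries, for illustration only.
models = [
    ModelStats("model-a", 80.0, 2.00),
    ModelStats("model-b", 60.0, 0.50),
    ModelStats("model-c", 70.0, 1.00),
]
for m in rank_by_value(models):
    print(m.name, round(value_score(m), 1))
```

In this toy data the lowest-quality model ranks first because its price is much lower, which matches the general pattern of a per-dollar board favoring inexpensive models.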
Highest measured cache capture
% · Which models convert reuse opportunity into billed savings most reliably?
- Claude Fable 5 · 93%
- Claude Opus 4.8 · 93%
- Claude Opus 4.7 · 92%
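To estimate capture on your own traffic, one simple approach is to compare the tokens actually billed at the cached rate with the tokens that repeated an earlier prefix and could have been cached. The sketch below assumes that definition of capture. The function name and the request counts are hypothetical, and the board's own measurement may differ.

```python
def capture_rate(billed_cached_tokens: int, reusable_tokens: int) -> float:
    """Percent of reusable tokens that were billed at the cached rate."""
    if reusable_tokens <= 0:
        raise ValueError("reusable_tokens must be positive")
    if not 0 <= billed_cached_tokens <= reusable_tokens:
        raise ValueError("billed_cached_tokens must be within 0..reusable_tokens")
    return 100.0 * billed_cached_tokens / reusable_tokens


# Hypothetical per-request counts:
# (tokens billed at the cached rate, tokens repeated from an earlier request)
requests = [(900, 1000), (450, 500), (0, 500)]

billed = sum(cached for cached, _ in requests)
reusable = sum(repeat for _, repeat in requests)
print(f"{capture_rate(billed, reusable):.1f}%")
```

The third request in this made-up log is a full cache miss, which is what pulls the aggregate figure down.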
Rankings are a starting point. Measure your own.
Your traffic determines your real reuse. Run a diagnostic to rank models for your workload specifically.