Rankings that account for reuse

A model that looks cheap on paper can be expensive in practice, and vice versa. These boards rank models by what agent workloads actually pay once caching is working, and add the latency, quality, and cache-capture views that matter alongside cost.

Cheapest models for cached agent workloads

$/1M tokens

Which model is cheapest once a typical agent prefix is served from cache?

OpenAI gpt-oss-120b · $0.21
GPT-5 Mini · $0.59
Grok 4.3 · $1.13

Full ranking
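
As a rough illustration of what a reuse-adjusted price can mean, the sketch below blends a cached and an uncached input price by the share of tokens served from cache. The function name, both prices, and the 80% cached share are hypothetical. The exact formula behind this board is not stated on this page, so treat this as one plausible reading rather than the method used.

```python
def blended_input_price(uncached_price: float, cached_price: float,
                        cached_fraction: float) -> float:
    """Effective $/1M input tokens when part of the prompt is served from cache.

    All three inputs are assumptions supplied by the caller, not figures
    taken from the leaderboard.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    # Weighted average: cached tokens pay the cached rate, the rest pay full rate.
    return cached_fraction * cached_price + (1.0 - cached_fraction) * uncached_price


# Hypothetical model: $2.00 uncached, $0.20 cached, 80% of input tokens cached.
price = blended_input_price(uncached_price=2.00, cached_price=0.20,
                            cached_fraction=0.80)
print(f"${price:.2f} per 1M input tokens")
```

With these made-up inputs the blended price is well below the list price, which is why a model's rank can shift once reuse is taken into account.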

Fastest time-to-first-token (warm prefix)

Which model responds fastest when the prefix is already cached?

Claude Haiku 4.5 · 110 ms
OpenAI gpt-oss-120b · 120 ms
DeepSeek-V4-Pro · 130 ms

Full ranking

Best models for coding agents

index

Which models balance code quality with reuse economics?

Claude Fable 5 · 85.2
Claude Opus 4.8 · 84.2
GPT-5.5 · 81.0

Full ranking

Best intelligence per dollar

index per $

Which model gives the most quality per reuse-adjusted dollar?

OpenAI gpt-oss-120b · 325.7
GPT-5 Mini · 121.0
Grok 4.3 · 72.7

Full ranking
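
One plausible reading of "index per $" is a quality index divided by the reuse-adjusted price per 1M tokens. The sketch below assumes that definition. The function name and the sample values are hypothetical and are not taken from the board.

```python
def index_per_dollar(quality_index: float, reuse_adjusted_price: float) -> float:
    """Quality index divided by reuse-adjusted $/1M tokens (assumed definition)."""
    if reuse_adjusted_price <= 0:
        raise ValueError("reuse_adjusted_price must be positive")
    return quality_index / reuse_adjusted_price


# Hypothetical model: quality index of 70 at $0.25 per 1M tokens after reuse.
score = index_per_dollar(quality_index=70.0, reuse_adjusted_price=0.25)
print(f"{score:.1f} index points per dollar")
```

Under this reading, a very low price can outweigh a moderate quality index, so the top of this board need not match the top of a pure quality board.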

Highest measured cache capture

Which models convert reuse opportunity into billed savings most reliably?

Claude Fable 5 · 94%
Claude Opus 4.7 · 94%
Claude Opus 4.8 · 94%

Full ranking
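
Cache capture can be thought of as the share of reusable tokens that were actually billed at the cached rate. The sketch below assumes that definition, since this page does not spell out how the metric is measured. The function name and the token counts are hypothetical.

```python
def cache_capture(reusable_tokens: int, tokens_billed_as_cached: int) -> float:
    """Fraction of reuse opportunity that shows up as cached billing.

    Assumed definition: tokens billed at the cached rate divided by tokens
    that could have been served from cache.
    """
    if reusable_tokens <= 0:
        raise ValueError("reusable_tokens must be positive")
    if not 0 <= tokens_billed_as_cached <= reusable_tokens:
        raise ValueError("tokens_billed_as_cached must be within the reusable total")
    return tokens_billed_as_cached / reusable_tokens


# Hypothetical workload: 1,000,000 reusable tokens, 940,000 billed as cached.
capture = cache_capture(reusable_tokens=1_000_000, tokens_billed_as_cached=940_000)
print(f"{capture:.0%} capture")
```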

Rankings are a starting point. Measure your own.

Your traffic determines your real reuse. Run a diagnostic to rank models for your workload specifically.

Run a diagnostic · Model catalog