Fastest time-to-first-token (warm prefix)
Median warm TTFT from the Zumik benchmark. This is what an interactive user actually waits for on a repeated, cache-warm prompt.
| # | Model | Provider | Warm TTFT | List price (blended) | Cache discount |
|---|---|---|---|---|---|
| 1 | Claude Haiku 4.5 | Anthropic | 110 ms | $2.00 | −90% |
| 2 | OpenAI gpt-oss-120b | Fireworks | 120 ms | $0.26 | −90% |
| 3 | DeepSeek-V4-Pro | Fireworks | 130 ms | $2.17 | −92% |
| 4 | GLM 5.1 | Fireworks | 130 ms | $2.15 | −81% |
| 5 | Kimi K2.6 | Fireworks | 140 ms | $1.71 | −83% |
| 6 | Gemini 3.5 Flash | | 150 ms | $3.38 | −90% |
| 7 | GPT-5 Mini | OpenAI | 150 ms | $0.69 | −90% |
| 8 | Claude Sonnet 4.6 | Anthropic | 190 ms | $6.00 | −90% |
| 9 | Claude Opus 4.8 | Anthropic | 240 ms | $10.00 | −90% |
| 10 | GPT-5.5 | OpenAI | 240 ms | $11.25 | −90% |
| 11 | Claude Fable 5 | Anthropic | 280 ms | $20.00 | −90% |
| 12 | Grok 4.3 | xAI | 280 ms | $1.56 | −84% |
| 13 | Claude Opus 4.7 | Anthropic | 320 ms | $10.00 | −90% |
| 14 | Gemini 3.1 Pro Preview | | 360 ms | $4.50 | −90% |
| 15 | GPT-5.5 Pro | OpenAI | 900 ms | $67.50 | 0% |
Method
Median time-to-first-token on a cache-warm stable prefix, from 180k paired requests.
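
As a rough illustration of the statistic described above, the sketch below takes the median of the warm measurements from a set of paired requests. It assumes each pair holds a cold and a warm TTFT in milliseconds for the same prefix; that pairing, the function name `median_warm_ttft`, and the sample values are all hypothetical and are not taken from the benchmark itself.

```python
from statistics import median


def median_warm_ttft(pairs):
    """Return the median warm TTFT (ms) from (cold_ms, warm_ms) pairs.

    Assumption: each pair is one cold request followed by one warm
    request on the same stable prefix. Only the warm value is ranked.
    """
    warm = [warm_ms for _cold_ms, warm_ms in pairs]
    if not warm:
        raise ValueError("no paired requests to summarize")
    return median(warm)


# Hypothetical measurements, not benchmark data.
sample_pairs = [
    (820.0, 118.0),
    (790.0, 104.0),
    (905.0, 131.0),
    (760.0, 110.0),
    (840.0, 109.0),
]

print(f"median warm TTFT: {median_warm_ttft(sample_pairs):.0f} ms")
```

A median rather than a mean keeps a few slow outlier requests from shifting the reported figure.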