Fastest time-to-first-token (warm prefix)

Median warm TTFT from the Zumik benchmark. This is what an interactive user actually waits for on a repeated, cache-warm prompt.

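| # | Model | Provider | Warm TTFT | List blended | Cache discount |
|---|-------|----------|-----------|--------------|----------------|
| 1 | Claude Haiku 4.5 | Anthropic | 110 ms | $2.00 | 90% |
| 2 | OpenAI gpt-oss-120b | Fireworks | 120 ms | $0.26 | 90% |
| 3 | DeepSeek-V4-Pro | Fireworks | 130 ms | $2.17 | 92% |
| 4 | GLM 5.1 | Fireworks | 130 ms | $2.15 | 81% |
| 5 | Kimi K2.6 | Fireworks | 140 ms | $1.71 | 83% |
| 6 | Gemini 3.5 Flash | Google | 150 ms | $3.38 | 90% |
| 7 | GPT-5 Mini | OpenAI | 150 ms | $0.69 | 90% |
| 8 | Claude Sonnet 4.6 | Anthropic | 190 ms | $6.00 | 90% |
| 9 | Claude Opus 4.8 | Anthropic | 240 ms | $10.00 | 90% |
| 10 | GPT-5.5 | OpenAI | 240 ms | $11.25 | 90% |
| 11 | Claude Fable 5 | Anthropic | 280 ms | $20.00 | 90% |
| 12 | Grok 4.3 | xAI | 280 ms | $1.56 | 84% |
| 13 | Claude Opus 4.7 | Anthropic | 320 ms | $10.00 | 90% |
| 14 | Gemini 3.1 Pro Preview | Google | 360 ms | $4.50 | 90% |
| 15 | GPT-5.5 Pro | OpenAI | 900 ms | $67.50 | 0% |
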
Method

Median time-to-first-token on a cache-warm stable prefix, from 180k paired requests.
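
As a rough illustration of the metric, the sketch below times how long a token stream takes to yield its first item and then takes the median over a set of warm samples. It is not the benchmark's harness: the function names, the injectable `clock` parameter, and the idea of passing in any iterable of tokens are assumptions made for the example, and the pairing scheme behind the 180k requests is not described here.

```python
import statistics
import time
from typing import Callable, Iterable


def time_to_first_token(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the seconds elapsed until `stream` yields its first token.

    `stream` is a hypothetical stand-in for a streaming model response.
    """
    start = clock()
    for _ in stream:
        # Stop timing as soon as the first token arrives.
        return clock() - start
    raise ValueError("stream produced no tokens")


def median_warm_ttft_ms(samples_s: list[float]) -> float:
    """Median of warm TTFT samples, converted from seconds to milliseconds."""
    return statistics.median(samples_s) * 1000.0
```

The median is used rather than the mean so that a handful of slow outlier requests does not dominate the reported figure.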

Rank models for your workload.

A diagnostic measures your real reuse and re-ranks the catalog for the way you actually call models.
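
To make the idea of re-ranking concrete, here is a minimal sketch that reorders a few rows of the table by an effective price at a given reuse fraction. The formula is an assumption for illustration only: it applies the cache discount to the whole blended list price in proportion to reuse, which is a simplification and not necessarily how the diagnostic scores models. `CatalogEntry`, `effective_price`, and `rerank` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    model: str
    warm_ttft_ms: float
    list_blended_usd: float
    cache_discount: float  # fraction, e.g. 0.90 for 90%


def effective_price(entry: CatalogEntry, reuse_fraction: float) -> float:
    """Assumed model: the discount applies to the reused share of the price."""
    if not 0.0 <= reuse_fraction <= 1.0:
        raise ValueError("reuse_fraction must be between 0 and 1")
    return entry.list_blended_usd * (1.0 - reuse_fraction * entry.cache_discount)


def rerank(catalog: list[CatalogEntry], reuse_fraction: float) -> list[CatalogEntry]:
    """Sort by effective price, breaking ties on warm TTFT."""
    return sorted(
        catalog,
        key=lambda e: (effective_price(e, reuse_fraction), e.warm_ttft_ms),
    )


# A few rows copied from the table above.
SAMPLE = [
    CatalogEntry("Claude Haiku 4.5", 110, 2.00, 0.90),
    CatalogEntry("OpenAI gpt-oss-120b", 120, 0.26, 0.90),
    CatalogEntry("DeepSeek-V4-Pro", 130, 2.17, 0.92),
    CatalogEntry("GPT-5.5 Pro", 900, 67.50, 0.00),
]

if __name__ == "__main__":
    for reuse in (0.0, 1.0):
        print(reuse, [e.model for e in rerank(SAMPLE, reuse)])
```

Under this assumed formula, a model with a slightly higher list price but a deeper cache discount can move ahead of a cheaper one once reuse is high enough, which is why the ranking depends on the measured reuse of a given workload.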