Leaderboard

Fastest time-to-first-token (warm prefix)

Median warm time-to-first-token (TTFT) from the Zumik benchmark. This is how long an interactive user waits for the first token on a repeated, cache-warm prompt.

Rank	Model	Provider	Warm TTFT (median)	List blended price	Cache discount
1	Claude Haiku 4.5	Anthropic	110 ms	$2.00	−90%
2	OpenAI gpt-oss-120b	Fireworks	120 ms	$0.26	−90%
3	DeepSeek-V4-Pro	Fireworks	130 ms	$2.17	−92%
4	GLM 5.1	Fireworks	130 ms	$2.15	−81%
5	Kimi K2.6	Fireworks	140 ms	$1.71	−83%
6	Gemini 3.5 Flash	Google	150 ms	$3.38	−90%
7	GPT-5 Mini	OpenAI	150 ms	$0.69	−90%
8	Claude Sonnet 4.6	Anthropic	190 ms	$6.00	−90%
9	Claude Opus 4.8	Anthropic	240 ms	$10.00	−90%
10	GPT-5.5	OpenAI	240 ms	$11.25	−90%
11	Claude Fable 5	Anthropic	280 ms	$20.00	−90%
12	Grok 4.3	xAI	280 ms	$1.56	−84%
13	Claude Opus 4.7	Anthropic	320 ms	$10.00	−90%
14	Gemini 3.1 Pro Preview	Google	360 ms	$4.50	−90%
15	GPT-5.5 Pro	OpenAI	900 ms	$67.50	0%

Method

Median time-to-first-token on a cache-warm stable prefix, from 180k paired requests.
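
As an illustration of the metric, here is a minimal sketch of how a per-model median warm TTFT could be computed and ranked from raw request samples. The record layout (model, TTFT in ms, cache-warm flag), the function names, and the sample values are all hypothetical; the benchmark's actual pipeline and the way its request pairs are formed are not described here. Ties are broken by model name, which is also an assumption.

```python
from collections import defaultdict
from statistics import median


def median_warm_ttft(samples):
    """Return {model: median TTFT in ms} using only cache-warm requests.

    `samples` is an iterable of (model, ttft_ms, cache_warm) tuples.
    This layout is an assumption made for the sketch.
    """
    by_model = defaultdict(list)
    for model, ttft_ms, cache_warm in samples:
        if cache_warm:  # cold requests are excluded from the warm metric
            by_model[model].append(ttft_ms)
    return {model: median(values) for model, values in by_model.items()}


def rank(medians):
    """Sort ascending by median TTFT, breaking ties by model name."""
    return sorted(medians.items(), key=lambda item: (item[1], item[0]))


# Made-up samples, not benchmark data.
samples = [
    ("model-a", 105.0, True),
    ("model-a", 120.0, True),
    ("model-a", 110.0, True),
    ("model-a", 900.0, False),  # cold request, ignored
    ("model-b", 140.0, True),
    ("model-b", 130.0, True),
]

for position, (model, ttft) in enumerate(rank(median_warm_ttft(samples)), start=1):
    print(f"{position}\t{model}\t{ttft:.0f} ms")
```

The median is used instead of the mean so that a few slow outliers do not move a model's position.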

Other lenses

Rank the same models differently.

Which model is cheapest once a typical agent prefix is served from cache?

Which models balance code quality with reuse economics?

Which model gives the most quality per reuse-adjusted dollar?

Which models convert reuse opportunity into billed savings most reliably?

A diagnostic measures your real reuse and re-ranks the catalog for the way you actually call models.