Leaderboard

Highest measured cache capture

Median capture rate from the Zumik corpus: realized reused tokens over candidate reusable tokens. High capture means the savings actually land.

#	Model	Provider	Capture rate	List blended	Cache disc.
1	Claude Fable 5	Anthropic	94%	$20.00	−90%
2	Claude Opus 4.7	Anthropic	94%	$10.00	−90%
3	Claude Opus 4.8	Anthropic	94%	$10.00	−90%
4	Claude Sonnet 4.6	Anthropic	94%	$6.00	−90%
5	Claude Haiku 4.5	Anthropic	91%	$2.00	−90%
6	GPT-5.5	OpenAI	88%	$11.25	−90%
7	GPT-5 Mini	OpenAI	86%	$0.69	−90%
8	Gemini 3.5 Flash	Google	85%	$3.38	−90%
9	GLM 5.1	Fireworks	83%	$2.15	−81%
10	Gemini 3.1 Pro Preview	Google	83%	$4.50	−90%
11	Kimi K2.6	Fireworks	82%	$1.71	−83%
12	DeepSeek-V4-Pro	Fireworks	81%	$2.17	−92%
13	OpenAI gpt-oss-120b	Fireworks	81%	$0.26	−90%
14	GPT-5.5 Pro	OpenAI	80%	$67.50	−0%
15	Grok 4.3	xAI	79%	$1.56	−84%

Method

Median captureRatePct across eligible requests, with provider-reported evidence where available.

Other lenses

Rank the same models differently.

Which model is cheapest once a typical agent prefix is served from cache?

Which model responds fastest when the prefix is already cached?

Which models balance code quality with reuse economics?

Which model gives the most quality per reuse-adjusted dollar?

A diagnostic measures your real reuse and re-ranks the catalog for the way you actually call models.