Leaderboard

Cheapest models for cached agent workloads

Blended cost per million tokens assuming 55% of input is reused at the cache-read rate and a 25% output share â€” the shape of a real coding agent, not a list-price headline.

All leaderboards

#	Model	Provider	Reuse-adjusted blended	List blended	Cache disc.
1	OpenAI gpt-oss-120b	Fireworks	$0.21	$0.26	−90%
2	GPT-5 Mini	OpenAI	$0.59	$0.69	−90%
3	Grok 4.3	xAI	$1.13	$1.56	−84%
4	Kimi K2.6	Fireworks	$1.39	$1.71	−83%
5	DeepSeek-V4-Pro	Fireworks	$1.52	$2.17	−92%
6	Claude Haiku 4.5	Anthropic	$1.63	$2.00	−90%
7	GLM 5.1	Fireworks	$1.68	$2.15	−81%
8	Gemini 3.5 Flash	Google	$2.82	$3.38	−90%
9	Gemini 3.1 Pro Preview	Google	$3.76	$4.50	−90%
10	Claude Sonnet 4.6	Anthropic	$4.89	$6.00	−90%
11	Claude Opus 4.7	Anthropic	$8.14	$10.00	−90%
12	Claude Opus 4.8	Anthropic	$8.14	$10.00	−90%
13	GPT-5.5	OpenAI	$9.39	$11.25	−90%
14	Claude Fable 5	Anthropic	$16.29	$20.00	−90%
15	GPT-5.5 Pro	OpenAI	$67.50	$67.50	−0%

Method

reuseAdjustedBlended(model, reuse=0.55, outputShare=0.25). Input is split between list and cache-read price by the reuse ratio, then blended with output at the output share.

Other lenses

Rank the same models differently.

Fastest time-to-first-token (warm prefix)

Which model responds fastest when the prefix is already cached?

Best models for coding agents

Which models balance code quality with reuse economics?

Best intelligence per dollar

Which model gives the most quality per reuse-adjusted dollar?

Highest measured cache capture

Which models convert reuse opportunity into billed savings most reliably?

Rank models for your workload.

A diagnostic measures your real reuse and re-ranks the catalog for the way you actually call models.

Run a diagnostic Pricing calculator