Cheapest models for cached agent workloads

Blended cost per million tokens assuming 55% of input is reused at the cache-read rate and a 25% output share — the shape of a real coding agent, not a list-price headline.

#ModelProviderReuse-adjusted blendedList blendedCache disc.
1OpenAI gpt-oss-120bFireworks$0.21$0.2690%
2GPT-5 MiniOpenAI$0.59$0.6990%
3Grok 4.3xAI$1.13$1.5684%
4Kimi K2.6Fireworks$1.39$1.7183%
5DeepSeek-V4-ProFireworks$1.52$2.1792%
6Claude Haiku 4.5Anthropic$1.63$2.0090%
7GLM 5.1Fireworks$1.68$2.1581%
8Gemini 3.5 FlashGoogle$2.82$3.3890%
9Gemini 3.1 Pro PreviewGoogle$3.76$4.5090%
10Claude Sonnet 4.6Anthropic$4.89$6.0090%
11Claude Opus 4.7Anthropic$8.14$10.0090%
12Claude Opus 4.8Anthropic$8.14$10.0090%
13GPT-5.5OpenAI$9.39$11.2590%
14Claude Fable 5Anthropic$16.29$20.0090%
15GPT-5.5 ProOpenAI$67.50$67.500%
Method

reuseAdjustedBlended(model, reuse=0.55, outputShare=0.25). Input is split between list and cache-read price by the reuse ratio, then blended with output at the output share.

Rank models for your workload.

A diagnostic measures your real reuse and re-ranks the catalog for the way you actually call models.