Data · providers

How each provider really caches

The provider you route to decides how much of your reuse opportunity becomes a billed saving. These pages break down caching mechanics, batch tiers, BYOK support, and the proprietary capture numbers we measure on each.

OpenAI

Automatic prefix caching

GPT-5 family, automatic prefix caching, flex/scale tiers.

Provider details

Anthropic

Explicit cache_control breakpoints

Claude Fable / Opus, explicit cache breakpoints, 90% read discount.

Provider details

Google Gemini

Implicit context caching

Gemini 3 family, implicit caching, up to 2M-token context.

Provider details

xAI

Context caching

Grok 4 and Grok-3 Mini, context caching, cheap fast frontier.

Provider details

Fireworks AI

Automatic prompt caching (serverless and dedicated)

Open-weights serving with speculative decoding and dedicated tiers.

Provider details

At a glance

Caching mechanics, side by side.

Provider	Cache type	Read discount	Write premium	Batch	BYOK
OpenAI	automatic	−75%	none	50%	Yes
Anthropic	explicit	−90%	+25%	50%	Yes
Google Gemini	implicit	−75%	none	50%	Yes
xAI	context	−75%	none	-	Yes
Fireworks AI	automatic	−74%	none	40%	Yes

Go deeper

Caching is where the savings live or die.

The prompt-caching guides explain exactly how to capture each provider's discount - and the mistakes that quietly throw it away.

Prompt caching by provider Capture benchmark

Use any provider. Capture every discount.

Bring your own keys or use managed accounts, and let Zumik capture provider-native caching and batch tiers.

Provider routing Model catalog