How each provider really caches
The provider you route to decides how much of your reuse opportunity becomes a billed saving. These pages break down caching mechanics, batch tiers, BYOK support, and the proprietary capture numbers we measure on each.
OpenAI
Automatic prefix cachingGPT-5 family, automatic prefix caching, flex/scale tiers.
Provider detailsAnthropic
Explicit cache_control breakpointsClaude Fable / Opus, explicit cache breakpoints, 90% read discount.
Provider detailsGoogle Gemini
Implicit context cachingGemini 3 family, implicit caching, up to 2M-token context.
Provider detailsxAI
Context cachingGrok 4 and Grok-3 Mini, context caching, cheap fast frontier.
Provider detailsFireworks AI
Automatic prompt caching (serverless and dedicated)Open-weights serving with speculative decoding and dedicated tiers.
Provider detailsCaching mechanics, side by side.
| Provider | Cache type | Read discount | Write premium | Batch | BYOK |
|---|---|---|---|---|---|
| OpenAI | automatic | −75% | none | 50% | Yes |
| Anthropic | explicit | −90% | +25% | 50% | Yes |
| Google Gemini | implicit | −75% | none | 50% | Yes |
| xAI | context | −75% | none | - | Yes |
| Fireworks AI | automatic | −74% | none | 40% | Yes |
Caching is where the savings live or die.
The prompt-caching guides explain exactly how to capture each provider's discount - and the mistakes that quietly throw it away.
Use any provider. Capture every discount.
Bring your own keys or use managed accounts, and let Zumik capture provider-native caching and batch tiers.