Prompt caching, captured - not just configured.
Every provider caches prompts, but the discount only lands if you place stable content correctly and respect each scheme's quirks. These guides explain how to capture the savings, and the small mistakes that quietly erase them.
Caching reuses the KV state of a repeated prefix so it is billed at a reduced read rate instead of recomputed. The universal rule across every scheme: keep stable content at the front, push volatile content to the end. See the KV cache and prompt caching definitions.
OpenAI
−75% readAutomatic prefix caching.
Injecting a timestamp, request id, or per-call system note near the top of the prompt resets the prefix and silently drops the hit rate to near zero.
Read the guideAnthropic
−90% readExplicit cache_control breakpoints.
Putting a breakpoint after volatile content, or breaking on every turn, means you pay write premiums repeatedly and never recover them.
Read the guideGoogle Gemini
−75% readImplicit context caching.
Assuming the 2M-token window means everything is cheap. Implicit hits depend on recency and prefix overlap, not just on fitting inside the window.
Read the guidexAI
−75% readContext caching.
Treating xAI like OpenAI for background jobs - there is no 50% batch discount to fall back on, so non-interactive work should usually route elsewhere.
Read the guideFireworks AI
−74% readAutomatic prompt caching (serverless and dedicated).
Running a hot, reuse-heavy lane on serverless and blaming the model for cold-cache latency, when a dedicated deployment would hold the prefix warm.
Read the guideFour ways providers cache.
Automatic
Caches any prefix over a threshold with no markup. Forgiving, capped discount. (OpenAI, Fireworks)
Explicit
You place cache_control breakpoints. Highest discount, write premium. (Anthropic)
Implicit
Discounts apply on recent prefix overlap, no breakpoints. Convenient, variable. (Gemini)
Context
Reuses a cached context across consecutive requests. (xAI)
Measure your real capture rate.
See how much of each scheme's discount your workload actually lands - with provider-reported evidence.