Prompt caching
Reusing the computed state of a repeated prompt prefix so it is billed at a reduced cache-read rate instead of being recomputed.
Prompt caching reuses the KV state of a shared prefix across requests. Providers expose it differently: automatic prefix matching (OpenAI, Fireworks), explicit breakpoints (Anthropic), implicit context caching (Gemini), or context caching (xAI).
The savings are real but conditional. A cache read costs a fraction of list input price, yet only the matching prefix counts, and volatile content placed early destroys the match.
Keep reading.
KV cache
The stored key/value attention tensors a model computes during prefill, kept so the same prefix does not have to be recomputed.
Reuse opportunity
The maximum share of input tokens that could be served from cache, independent of whether they actually were.
Cache capture rate
Realized reused tokens divided by candidate reusable tokens - how much of the available reuse a provider actually delivered.
See it in practice.
Definitions are useful; measurement is better. Run a diagnostic on your own workload.