Prompt caching

Reusing the computed state of a repeated prompt prefix so it is billed at a reduced cache-read rate instead of being recomputed.

Prompt caching reuses the KV state of a shared prefix across requests. Providers expose it differently: automatic prefix matching (OpenAI, Fireworks), explicit breakpoints (Anthropic), implicit context caching (Gemini), or context caching (xAI).

The savings are real but conditional. A cache read costs a fraction of list input price, yet only the matching prefix counts, and volatile content placed early destroys the match.

See it in practice.

Definitions are useful; measurement is better. Run a diagnostic on your own workload.