Glossary

Prompt caching

Reusing the computed state of a repeated prompt prefix so it is billed at a reduced cache-read rate instead of being recomputed.

Prompt caching by provider

Prompt caching reuses the KV state of a shared prefix across requests. Providers expose it differently: automatic prefix matching (OpenAI, Fireworks), explicit breakpoints (Anthropic), implicit context caching (Gemini), or context caching (xAI).

The savings are real but conditional. A cache read costs a fraction of list input price, yet only the matching prefix counts, and volatile content placed early destroys the match.

Related terms

See it in practice.

Definitions are useful; measurement is better. Run a diagnostic on your own workload.

Run a diagnostic Back to glossary

Prompt caching

Keep reading.

KV cache

Reuse opportunity

Cache capture rate

See it in practice.