Prompt-cache capture by provider
Of the reuse a workload could capture, how much do providers actually deliver?
Capture rate is realized reused tokens divided by candidate reusable tokens. A high opportunity ratio means nothing if capture is low. This suite isolates how each provider’s caching scheme converts opportunity into billed savings on agent traffic.
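The definition above reduces to a single division. The sketch below is a minimal illustration with made-up token counts; the function name and the reading of "opportunity ratio" as candidate reusable tokens over total prompt tokens are assumptions for the example, not definitions taken from this suite.

```python
def capture_rate(reused_tokens: int, candidate_tokens: int) -> float:
    """Realized reused tokens divided by candidate reusable tokens."""
    if candidate_tokens <= 0:
        raise ValueError("no candidate reusable tokens; capture is undefined")
    return reused_tokens / candidate_tokens


# Hypothetical workload (numbers invented for illustration):
# 10,000 prompt tokens, 8,000 of them reusable, 6,000 actually served from cache.
total_tokens = 10_000
candidate_tokens = 8_000
reused_tokens = 6_000

# ASSUMPTION: opportunity ratio = candidate reusable tokens / total prompt tokens.
opportunity = candidate_tokens / total_tokens
capture = capture_rate(reused_tokens, candidate_tokens)

# Share of all prompt tokens that were actually served from cache.
realized_share = opportunity * capture

print(f"opportunity={opportunity:.0%} capture={capture:.0%} realized={realized_share:.0%}")
```

Multiplying the two ratios shows why a high opportunity ratio does not help when capture is low: the realized share can never exceed either factor.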
What the corpus shows
| Provider | Median capture (%) | p25 (%) | p75 (%) | Evidence level | Note |
|---|---|---|---|---|---|
| Anthropic (explicit) | 92 | 84 | 96 | provider_reported | Best when breakpoints sit on stable blocks. |
| OpenAI (automatic) | 88 | 79 | 93 | provider_reported | - |
| Fireworks (automatic) | 80 | 68 | 88 | runtime_confirmed | - |
| Google Gemini (implicit) | 82 | 61 | 90 | mixed | Wide spread; recency-sensitive. |
| xAI (context) | 75 | 58 | 85 | router_inferred | - |
Takeaways
- Explicit caching captures the most but punishes bad breakpoint placement.
- Implicit caching has the widest variance: it is convenient but less predictable.
- Capture, not opportunity, is what shows up on the invoice.
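The first two takeaways can be checked against the table with a few lines of arithmetic. The sketch below copies the median, p25 and p75 values from the table, treats them as percentages (an assumption, since the table does not state a unit) and uses the interquartile range (p75 minus p25) as a stand-in for "variance". The variable and function names are illustrative.

```python
# (median, p25, p75) capture values copied from the table; assumed to be percentages.
CAPTURE = {
    "Anthropic (explicit)": (92, 84, 96),
    "OpenAI (automatic)": (88, 79, 93),
    "Fireworks (automatic)": (80, 68, 88),
    "Google Gemini (implicit)": (82, 61, 90),
    "xAI (context)": (75, 58, 85),
}


def iqr(row):
    """Interquartile range in percentage points: p75 minus p25."""
    _median, p25, p75 = row
    return p75 - p25


spread = {name: iqr(row) for name, row in CAPTURE.items()}
widest = max(spread, key=spread.get)
highest_median = max(CAPTURE, key=lambda name: CAPTURE[name][0])

# List providers from tightest to widest spread.
for name, points in sorted(spread.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} IQR = {points} points")

print("widest spread: ", widest)
print("highest median:", highest_median)
```

On these figures the explicit scheme has both the highest median and the tightest spread, while the implicit scheme has the widest spread.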
Methodology
For each provider, we take eligible requests (those with a stable prefix of ≥1,024 tokens) from the trace corpus and compute candidate reuse from prefix-family analysis. We then compare that candidate figure against provider-reported cached tokens where available, falling back to runtime-confirmed counts on BYOC lanes.
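The pipeline described above might look roughly like the following. This is a sketch under stated assumptions, not the suite's actual implementation: the `Request` record and its field names are hypothetical, the prefix-family analysis is taken as already done (its result arrives as `candidate_tokens`), and summarizing with a per-request median is one plausible reading of the table.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

MIN_STABLE_PREFIX = 1024  # eligibility threshold stated in the methodology


@dataclass
class Request:
    """Hypothetical trace record; every field name here is an assumption."""
    provider: str
    stable_prefix_tokens: int
    candidate_tokens: int                   # output of prefix-family analysis
    provider_cached_tokens: Optional[int]   # None when the provider reports nothing
    runtime_cached_tokens: Optional[int]    # None when no runtime count exists


def realized_tokens(req):
    """Prefer provider-reported counts; fall back to runtime-confirmed ones."""
    if req.provider_cached_tokens is not None:
        return req.provider_cached_tokens, "provider_reported"
    if req.runtime_cached_tokens is not None:
        return req.runtime_cached_tokens, "runtime_confirmed"
    return None


def capture_samples(requests):
    """Return (capture rate, evidence level) for each eligible, measurable request."""
    samples = []
    for req in requests:
        if req.stable_prefix_tokens < MIN_STABLE_PREFIX or req.candidate_tokens <= 0:
            continue  # not eligible
        realized = realized_tokens(req)
        if realized is None:
            continue  # no usable evidence for this request
        tokens, evidence = realized
        samples.append((tokens / req.candidate_tokens, evidence))
    return samples


# Invented trace rows for illustration only.
trace = [
    Request("demo", 2048, 2000, 1800, None),  # provider-reported count
    Request("demo", 4096, 4000, None, 3000),  # runtime-confirmed fallback
    Request("demo", 512, 500, 500, None),     # below threshold, skipped
    Request("demo", 1024, 1000, None, None),  # no evidence, skipped
]

samples = capture_samples(trace)
rates = [rate for rate, _evidence in samples]
print(f"eligible samples: {len(samples)}, median capture: {median(rates):.1%}")
```

Keeping the evidence level next to each sample makes it possible to label an aggregate as provider_reported, runtime_confirmed or mixed, depending on which sources contributed.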
How we grade evidence
Get these numbers for your traffic.
A diagnostic runs this analysis on your own workload and attaches an evidence level to every figure.