Prompt-cache capture by provider

Of the reuse a workload could capture, how much do providers actually deliver?

2.4M eligible requests · Last run 2026-06-08

Capture rate is realized reused tokens (tokens actually served from cache) divided by candidate reusable tokens (tokens the workload could have reused). A high opportunity ratio means little if capture is low. This suite isolates how well each provider's caching scheme converts opportunity into billed savings on agent traffic.
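As a minimal sketch of that definition, the ratio can be computed as below. The function name and the token counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the corpus.

```python
def capture_rate(realized_reused_tokens: int, candidate_reusable_tokens: int) -> float:
    """Realized reused tokens divided by candidate reusable tokens."""
    if candidate_reusable_tokens <= 0:
        # With no candidate reuse there is nothing to capture, so the ratio is undefined.
        raise ValueError("capture rate is undefined without candidate reusable tokens")
    return realized_reused_tokens / candidate_reusable_tokens


# Illustrative numbers only (assumed, not measured).
rate = capture_rate(realized_reused_tokens=7_360, candidate_reusable_tokens=8_000)
print(f"{rate:.0%}")  # 92%
```

Raising on a zero denominator, rather than returning 0, keeps requests with no reuse opportunity from being counted as failed captures.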

What the corpus shows.

Provider                  Median capture (%)  p25 (%)  p75 (%)  Evidence level     Note
Anthropic (explicit)      92                  84       96       provider_reported  Best when breakpoints sit on stable blocks.
OpenAI (automatic)        88                  79       93       provider_reported  -
Fireworks (automatic)     80                  68       88       runtime_confirmed  -
Google Gemini (implicit)  82                  61       90       mixed              Wide spread - recency-sensitive.
xAI (context)             75                  58       85       router_inferred    -

Takeaways
  • Explicit caching captures the most but punishes bad breakpoint placement.
  • Implicit caching has the widest variance - convenient, less predictable.
  • Capture, not opportunity, is what shows up on the invoice.
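
The variance takeaway can be checked against the table by taking p75 minus p25 for each provider. The sketch below copies the reported figures; treating the interquartile spread as the measure of "variance" is an assumption made here for illustration.

```python
# (median, p25, p75) in percent, copied from the results table.
rows = {
    "Anthropic (explicit)": (92, 84, 96),
    "OpenAI (automatic)": (88, 79, 93),
    "Fireworks (automatic)": (80, 68, 88),
    "Google Gemini (implicit)": (82, 61, 90),
    "xAI (context)": (75, 58, 85),
}

# Interquartile spread in percentage points for each provider.
spread = {name: p75 - p25 for name, (_median, p25, p75) in rows.items()}

for name, points in sorted(spread.items(), key=lambda item: item[1]):
    print(f"{name}: {points} points")

widest = max(spread, key=spread.get)
print(widest, spread[widest])  # Google Gemini (implicit) 29
```

By this measure Anthropic's explicit scheme is the tightest at 12 points and Gemini's implicit scheme the widest at 29, with xAI close behind at 27.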
Methodology

For each provider, we take eligible requests (those with a stable prefix of at least 1,024 tokens) from the trace corpus and compute candidate reuse from prefix-family analysis. We then compare that figure against provider-reported cached tokens where available, falling back to runtime-confirmed counts on BYOC lanes where they are not.
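
The steps in this methodology can be sketched roughly as follows. This is not the suite's actual code: the `Request` record, its field names, and the choice to summarize per-request capture rates with inclusive quartiles are all assumptions made for illustration. Only the 1,024-token threshold and the fallback order come from the description above.

```python
from dataclasses import dataclass
from statistics import median, quantiles
from typing import Optional

MIN_STABLE_PREFIX_TOKENS = 1_024  # eligibility threshold stated in the methodology


@dataclass
class Request:
    """Hypothetical per-request record; all field names are assumed."""
    stable_prefix_tokens: int
    candidate_reusable_tokens: int           # from prefix-family analysis
    provider_reported_cached: Optional[int]  # None when the provider reports nothing
    runtime_confirmed_cached: Optional[int]  # None when no runtime count exists


def realized_tokens(req: Request) -> Optional[tuple[int, str]]:
    """Prefer the provider-reported count, then fall back to the runtime-confirmed one."""
    if req.provider_reported_cached is not None:
        return req.provider_reported_cached, "provider_reported"
    if req.runtime_confirmed_cached is not None:
        return req.runtime_confirmed_cached, "runtime_confirmed"
    return None


def capture_summary(requests: list[Request]) -> dict[str, float]:
    """Median, p25 and p75 of per-request capture rates over eligible requests."""
    rates = []
    for req in requests:
        if req.stable_prefix_tokens < MIN_STABLE_PREFIX_TOKENS:
            continue  # not eligible
        if req.candidate_reusable_tokens <= 0:
            continue  # nothing to capture, ratio undefined
        realized = realized_tokens(req)
        if realized is None:
            continue  # no usable evidence for this request
        rates.append(realized[0] / req.candidate_reusable_tokens)
    if len(rates) < 2:
        raise ValueError("need at least two eligible requests to compute quartiles")
    p25, _p50, p75 = quantiles(rates, n=4, method="inclusive")
    return {"median": median(rates), "p25": p25, "p75": p75}
```

The sketch covers only the two evidence sources named in the methodology; how `mixed` or `router_inferred` figures are derived is not described here, so it is left out.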

How we grade evidence

Get these numbers for your traffic.

A diagnostic runs this analysis on your own workload and attaches an evidence level to every figure.