Benchmark · % capture

Prompt-cache capture by provider

Of the reuse a workload could capture, how much do providers actually deliver?

Capture rate is realized reused tokens divided by candidate reusable tokens, expressed as a percentage. A high opportunity ratio yields no savings if capture is low. This suite isolates how well each provider's caching scheme converts opportunity into billed savings on agent traffic.
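As a minimal sketch of the definition above, capture rate for one request can be computed as follows. The function name and the sample token counts are illustrative assumptions, not part of any provider API or of the corpus.

```python
def capture_rate(reused_tokens: int, candidate_tokens: int) -> float:
    """Realized reused tokens divided by candidate reusable tokens, in percent."""
    if candidate_tokens <= 0:
        # With nothing to reuse, the ratio is undefined rather than zero.
        raise ValueError("no candidate reusable tokens; capture rate is undefined")
    return 100.0 * reused_tokens / candidate_tokens


# Hypothetical request: 10,000 candidate tokens, 8,800 actually served from cache.
rate = capture_rate(8_800, 10_000)
print(rate)  # 88.0
```

Raising on a zero denominator, rather than returning 0, keeps requests with no reuse opportunity from dragging down a provider's capture figures.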

Results

What the corpus shows.

Provider	Median capture (%)	p25 (%)	p75 (%)	Evidence level	Note
Anthropic (explicit)	92	84	96	provider_reported	Best when breakpoints sit on stable blocks.
OpenAI (automatic)	88	79	93	provider_reported	-
Fireworks (automatic)	80	68	88	runtime_confirmed	-
Google Gemini (implicit)	82	61	90	mixed	Wide spread; recency-sensitive.
xAI (context)	75	58	85	router_inferred	-
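The median, p25, and p75 columns summarize a distribution of capture rates. The source does not state the unit of aggregation or the interpolation method, so the following is only a sketch that assumes per-request rates and Python's inclusive quantile method. The sample values are made up.

```python
import statistics


def summarize(rates):
    """Return (p25, median, p75) for a list of capture rates in percent."""
    # quantiles() with n=4 returns the three quartile cut points.
    p25, median, p75 = statistics.quantiles(rates, n=4, method="inclusive")
    return p25, median, p75


# Made-up per-request capture rates, not data from the corpus.
sample = [60.0, 70.0, 80.0, 90.0, 100.0]
p25, median, p75 = summarize(sample)
```

A different interpolation method would shift p25 and p75 slightly on small samples, which matters when comparing spreads across providers.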

Takeaways

Explicit caching captures the most (median 92%) but punishes bad breakpoint placement.
Implicit caching has the widest variance (p25 to p75 spans 61% to 90%): convenient, but less predictable.
Capture, not opportunity, is what shows up on the invoice.

Methodology

For each provider, we take eligible requests (those with a stable prefix of at least 1,024 tokens) from the trace corpus and compute candidate reuse from prefix-family analysis. We then compare candidate reuse against provider-reported cached tokens where available, falling back to runtime-confirmed counts on BYOC lanes.
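The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the suite's actual pipeline: the `TraceRequest` fields and both function names are hypothetical, and candidate reuse is taken as an input rather than derived from prefix-family analysis.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_STABLE_PREFIX_TOKENS = 1024  # eligibility threshold from the methodology


@dataclass
class TraceRequest:
    # Hypothetical schema; a real trace record will look different.
    stable_prefix_tokens: int
    candidate_reuse_tokens: int             # output of prefix-family analysis
    provider_cached_tokens: Optional[int]   # None if the provider reports nothing
    runtime_cached_tokens: Optional[int]    # None outside BYOC lanes


def realized_reuse(req: TraceRequest) -> Optional[Tuple[int, str]]:
    """Prefer provider-reported counts, then fall back to runtime-confirmed ones."""
    if req.provider_cached_tokens is not None:
        return req.provider_cached_tokens, "provider_reported"
    if req.runtime_cached_tokens is not None:
        return req.runtime_cached_tokens, "runtime_confirmed"
    return None


def capture_rates(requests: List[TraceRequest]) -> List[Tuple[float, str]]:
    """Return (capture rate in percent, evidence level) for each eligible request."""
    results = []
    for req in requests:
        if req.stable_prefix_tokens < MIN_STABLE_PREFIX_TOKENS:
            continue  # not eligible: stable prefix too short
        if req.candidate_reuse_tokens <= 0:
            continue  # nothing to capture, so the rate is undefined
        observed = realized_reuse(req)
        if observed is None:
            continue  # no usable evidence for this request
        cached, evidence = observed
        results.append((100.0 * cached / req.candidate_reuse_tokens, evidence))
    return results


# Hypothetical traces: one provider-reported, one BYOC fallback, one ineligible.
traces = [
    TraceRequest(2048, 1000, 900, None),
    TraceRequest(4096, 2000, None, 1500),
    TraceRequest(512, 400, 400, None),
]
rates = capture_rates(traces)
```

Carrying the evidence level alongside each rate keeps the fallback visible downstream, so a figure built from runtime-confirmed counts is not mistaken for a provider-reported one.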


TTFT savings from a warm prefix

How much faster is time-to-first-token when the prefix is already cached?

Reuse opportunity by workload type

Which agent workloads actually have reusable structure?

Get these numbers for your traffic.

A diagnostic runs this analysis on your own workload and attaches an evidence level to every figure.

