A repeated prompt is not a cache hit

The most expensive assumption in agent infrastructure is that a repeated prefix means saved compute. It does not. A prefix can repeat all day and still be recomputed every time if it expired from the cache, if a timestamp at the top broke the match, or if the provider never had a chance to warm it.

So we refuse to collapse the two ideas. Zumik reports opportunity and capture separately, and treats the difference as the thing worth fixing.

Opportunity is the ceiling

Opportunity ratio is candidate reusable tokens over total input tokens. It comes from prefix-family analysis: how often do equivalent reusable blocks recur in your traffic? A coding agent with a fixed tool registry and repo policy might show an opportunity ratio above 0.7. At 0.7, no more than 70% of input tokens could ever be served from cache instead of being prefilled again.
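To make the ratio concrete, here is a minimal sketch. It assumes each request has already been split into (content key, token count) blocks and treats a block as a reuse candidate once the same key has appeared in an earlier request. The names `Block` and `opportunity_ratio` are illustrative, and real prefix-family analysis is presumably more involved than exact key matching.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (content key, token count) for one prompt block.
Block = Tuple[str, int]


def opportunity_ratio(requests: Iterable[List[Block]]) -> float:
    """Candidate reusable tokens divided by total input tokens."""
    seen = set()      # content keys observed in earlier requests
    candidate = 0     # tokens in blocks that recurred
    total = 0         # all input tokens
    for blocks in requests:
        for key, tokens in blocks:
            total += tokens
            if key in seen:
                candidate += tokens
        # Register keys only after the request, so a block does not
        # count as a repeat of itself within the same request.
        seen.update(key for key, _ in blocks)
    return candidate / total if total else 0.0


# Made-up traffic: a fixed tool registry followed by a unique question.
traffic = [
    [("tools-v1", 700), ("question-1", 300)],
    [("tools-v1", 700), ("question-2", 300)],
    [("tools-v1", 700), ("question-3", 300)],
]
ratio = opportunity_ratio(traffic)
```

The first request can never count as a candidate because nothing precedes it, which is why short samples understate the ratio.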

It is a ceiling, not a promise. Nothing about opportunity guarantees a single cache hit.

Capture is what you actually got

Capture rate is realized reused tokens over candidate reusable tokens. This is the honest number, and it is the one that shows up on the invoice. In our corpus, capture varies more by provider scheme than by anything else: explicit caching captures the most when breakpoints are placed well, implicit caching is convenient but swings widely, and short retention windows quietly cap everything.
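A rough sketch of the same definition, grouped by caching scheme so the variation is visible. The record shape `(scheme, candidate_tokens, realized_tokens)` and the scheme labels are assumptions for illustration; where the realized count comes from depends on what each provider reports in its usage data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (scheme label, candidate tokens, realized reused tokens).
Record = Tuple[str, int, int]


def capture_rate(realized: int, candidate: int) -> float:
    """Realized reused tokens divided by candidate reusable tokens."""
    return realized / candidate if candidate else 0.0


def capture_by_scheme(records: Iterable[Record]) -> Dict[str, float]:
    """Aggregate tokens per scheme first, then divide once per scheme."""
    sums = defaultdict(lambda: [0, 0])  # scheme -> [candidate, realized]
    for scheme, candidate, realized in records:
        sums[scheme][0] += candidate
        sums[scheme][1] += realized
    return {s: capture_rate(r, c) for s, (c, r) in sums.items()}


# Made-up numbers, only to show the shape of the output.
rates = capture_by_scheme([
    ("explicit", 1000, 900),
    ("explicit", 1000, 800),
    ("implicit", 1000, 300),
])
```

Summing tokens before dividing keeps large requests from being outweighed by many small ones, which is the weighting that matches an invoice.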

When opportunity is high and capture is low, the fix is almost never new hardware. It is prompt ordering, breakpoint placement, or retention locality.
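Prompt ordering is the easiest of these to illustrate. The sketch below builds the same prompt two ways and measures how much of it two consecutive requests share from the start. Everything here is hypothetical: the block contents, the function names, and the use of character-level prefix length as a stand-in for the token-level matching a provider would actually do.

```python
# Hypothetical stable block: identical on every request.
STABLE = "TOOLS: read_file, run_tests\nPOLICY: never push to main\n"


def volatile_first(timestamp: str, question: str) -> str:
    """Timestamp at the top: the shared prefix ends almost immediately."""
    return f"NOW: {timestamp}\n{STABLE}USER: {question}\n"


def stable_first(timestamp: str, question: str) -> str:
    """Stable block at the top: the whole block stays in the shared prefix."""
    return f"{STABLE}NOW: {timestamp}\nUSER: {question}\n"


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


bad = common_prefix_len(
    volatile_first("09:00:00", "fix the build"),
    volatile_first("17:30:12", "add a test"),
)
good = common_prefix_len(
    stable_first("09:00:00", "fix the build"),
    stable_first("17:30:12", "add a test"),
)
```

With the timestamp first, the shared prefix is shorter than the stable block itself; with the stable block first, it covers the entire block. Opportunity is identical in both layouts, and only capture changes.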

The missed-opportunity gap

We name the difference explicitly: missed-opportunity tokens, the candidate reusable tokens that were not actually reused. This is the single most actionable line in a diagnostic, because it tells you how much money is sitting on the table and, usually, why.
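As a sketch of how that line could be turned into a dollar figure: subtract realized from candidate, then multiply by the price difference between uncached and cached input. The function names and the per-million-token price parameters are assumptions; the prices must come from your own provider's rate card.

```python
def missed_opportunity_tokens(candidate: int, realized: int) -> int:
    """Candidate reusable tokens that were not served from cache."""
    return max(candidate - realized, 0)


def missed_savings(
    candidate: int,
    realized: int,
    uncached_price_per_mtok: float,  # assumed: price per 1M uncached input tokens
    cached_price_per_mtok: float,    # assumed: price per 1M cached input tokens
) -> float:
    """Money left on the table, in the same currency as the prices."""
    missed = missed_opportunity_tokens(candidate, realized)
    per_token_gap = (uncached_price_per_mtok - cached_price_per_mtok) / 1_000_000
    return missed * per_token_gap


# Placeholder inputs, not measurements or real prices.
gap_tokens = missed_opportunity_tokens(700_000, 250_000)
gap_cost = missed_savings(700_000, 250_000, 3.0, 0.3)
```

This ignores any surcharge a provider applies to cache writes, so treat the result as an upper bound on recoverable spend.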

Close that gap with prompt construction first. Only after capture plateaus does it make sense to ask whether a dedicated lane or BYOC would move it further.
