Glossary

Retention locality

The share of repeated requests that recur within the provider or runtime cache-retention window, so the cache is still warm.

A prefix only helps if it recurs before the cache expires. Retention locality measures how much of that recurrence lands inside the relevant window, which may be minutes for some schemes and an hour or more for others.

High opportunity with poor locality is the classic trap: the structure repeats, but too slowly to ever be warm.
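As a rough illustration, retention locality can be estimated from a request log. The sketch below is not a reference implementation: the function name `retention_locality`, the `(timestamp, prefix_key)` event shape, and the rule that the window restarts at each use of a prefix are all assumptions made for the example. Real providers may define expiry differently.

```python
def retention_locality(events, window_seconds):
    """Share of repeated requests that recur within the retention window.

    events: iterable of (timestamp_seconds, prefix_key) pairs (hypothetical shape).
    Assumption: the window is measured from the previous use of the same prefix.
    """
    last_seen = {}   # prefix_key -> timestamp of its most recent request
    repeats = 0      # requests whose prefix has been seen before
    warm = 0         # repeats that landed inside the window
    for ts, key in sorted(events):
        if key in last_seen:
            repeats += 1
            if ts - last_seen[key] <= window_seconds:
                warm += 1
        last_seen[key] = ts
    # With no repeats there is nothing to keep warm, so report 0.0.
    return warm / repeats if repeats else 0.0


# Illustrative data: prefix "a" recurs after 60 s and 1,940 s; "b" after 7,190 s.
events = [(0, "a"), (60, "a"), (2000, "a"), (10, "b"), (7200, "b")]
short_window = retention_locality(events, window_seconds=300)    # 1 of 3 repeats warm
long_window = retention_locality(events, window_seconds=3600)    # 2 of 3 repeats warm
```

The same log scores differently under a five-minute window and a one-hour window, which is why locality is always stated relative to a specific retention scheme.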

Related terms

Keep reading.

Workload Reuse Score (WRS)

A 0-100 score of how much a workload can benefit from reuse, built from opportunity, recurrence, locality, latency sensitivity, continuity, and payload redundancy.

Prompt caching

Reusing the computed state of a repeated prompt prefix so it is billed at a reduced cache-read rate instead of being recomputed.

Cache capture rate

Realized reused tokens divided by candidate reusable tokens: the fraction of the available reuse that a provider actually delivered.
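The cache capture rate definition is a single ratio, sketched below. The function name, parameter names, and token counts are hypothetical, and returning 0.0 when there are no candidate tokens is a choice made for this example rather than part of the definition.

```python
def cache_capture_rate(realized_reused_tokens, candidate_reusable_tokens):
    """Realized reused tokens divided by candidate reusable tokens."""
    if candidate_reusable_tokens <= 0:
        # Nothing was reusable, so there was nothing to capture (assumed convention).
        return 0.0
    return realized_reused_tokens / candidate_reusable_tokens


# Illustrative numbers only: 42,000 of 60,000 candidate tokens were served from cache.
rate = cache_capture_rate(42_000, 60_000)
```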

See it in practice.

Definitions are useful; measurement is better. Run a diagnostic on your own workload.
