How agent inference is actually changing.
These series come from real routed traffic, anonymized and aggregated. They describe how agent workloads are shaped and how reuse, routing, and batch adoption are trending. This is signal you can use to plan, not a vanity dashboard.
Workloads with a 10k+ token stable prefix
Share of measured agent workloads carrying at least ten thousand tokens of stable prefix per request.
Measured in % of workloads.
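As a rough illustration of how a share like this could be computed from your own traffic, the sketch below counts workloads whose stable prefix meets the ten-thousand-token threshold. The function name and the input shape (one stable-prefix token count per workload) are assumptions made for the example, not the method behind the published series.

```python
# Threshold taken from the metric definition: at least ten thousand tokens.
STABLE_PREFIX_THRESHOLD = 10_000


def share_with_long_prefix(stable_prefix_tokens, threshold=STABLE_PREFIX_THRESHOLD):
    """Return the % of workloads whose stable prefix is at least `threshold` tokens.

    `stable_prefix_tokens` is a hypothetical input: one token count per workload.
    """
    if not stable_prefix_tokens:
        return 0.0
    hits = sum(1 for n in stable_prefix_tokens if n >= threshold)
    return 100.0 * hits / len(stable_prefix_tokens)


# Illustrative numbers only, not measured data.
sample = [12_000, 3_500, 10_000, 48_000]
print(share_with_long_prefix(sample))  # 75.0
```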
Median realized reuse ratio
The median share of input tokens served from cache once provider caching is configured through Zumik.
Measured in % of input tokens.
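To make the metric concrete, here is a minimal sketch that assumes the reuse ratio for one workload is cached input tokens divided by total input tokens, and that the median is taken across workloads. Both the aggregation and the (cached, total) input pairs are assumptions for illustration; the page does not specify how the series is aggregated.

```python
from statistics import median


def reuse_ratio(cached_input_tokens, total_input_tokens):
    """Return the % of input tokens served from cache for one workload."""
    if total_input_tokens <= 0:
        return 0.0
    return 100.0 * cached_input_tokens / total_input_tokens


def median_reuse_ratio(workloads):
    """Return the median reuse ratio over (cached, total) token pairs."""
    return median(reuse_ratio(cached, total) for cached, total in workloads)


# Illustrative numbers only, not measured data.
sample = [(8_000, 10_000), (2_500, 10_000), (6_000, 12_000)]
print(median_reuse_ratio(sample))  # 50.0
```

A median is less sensitive than a mean to a few workloads with very high or very low reuse, which is one reason to report it for a skewed distribution.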
Routing mix by alias family
Distribution of resolved requests across the alias families in the catalog.
Measured in % of requests.
Share of non-interactive tokens on batch tiers
How much background and evaluation traffic moves to 24-hour batch lanes for the 50% discount.
Measured in % of background tokens.
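The arithmetic behind this metric can be sketched as follows. The example assumes the 50% discount applies to every token sent through a batch lane and uses a made-up price per million tokens; the function names and the price are hypothetical, not actual rates.

```python
def batch_share(batch_tokens, background_tokens):
    """Return the % of background tokens that ran on batch tiers."""
    if background_tokens <= 0:
        return 0.0
    return 100.0 * batch_tokens / background_tokens


def background_cost(background_tokens, batch_tokens, price_per_million, batch_discount=0.5):
    """Return the cost of background traffic when part of it runs on batch tiers.

    Assumes the discount applies uniformly to all batch tokens.
    """
    standard_tokens = background_tokens - batch_tokens
    billable = standard_tokens + batch_tokens * (1 - batch_discount)
    return billable * price_per_million / 1_000_000


# Illustrative numbers only: 40M background tokens, 30M of them on batch,
# at a hypothetical price of 2.0 per million tokens.
print(batch_share(30_000_000, 40_000_000))           # 75.0
print(background_cost(40_000_000, 30_000_000, 2.0))  # 50.0
print(background_cost(40_000_000, 0, 2.0))           # 80.0
```

Under these assumptions, moving 75% of background tokens to batch lanes cuts the background bill from 80.0 to 50.0.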
The throughline: more of every agent request is stable scaffolding, and more teams are capturing it. That is exactly the surface that reuse targets. Read why a repeated prompt is not a cache hit, then measure your own.
See where your workload sits.
Run a diagnostic and compare your reuse profile against these aggregate trends.