How agent inference is actually changing.

These series come from real routed traffic, anonymized and aggregated. They describe how agent workloads are shaped and how reuse, routing, and batch adoption are trending: signal you can use to plan, not a vanity dashboard.

Updated 2026-06-09 · Rolling 6 months, agent workloads only

Workloads with a >10k-token stable prefix

Share of measured agent workloads carrying at least ten thousand tokens of stable prefix per request.

Jan  38
Feb  41
Mar  44
Apr  49
May  53
Jun  56

Measured in % of workloads.
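To make the metric concrete, here is a minimal sketch of how a stable prefix could be measured from tokenized requests. It treats the stable prefix as the longest run of leading tokens shared by every request in a workload, which is an assumption; the page does not say how the aggregate is computed. The names stable_prefix_len, share_with_stable_prefix, and STABLE_PREFIX_THRESHOLD are hypothetical.

```python
from typing import Sequence

# "At least ten thousand tokens" per the metric description above.
STABLE_PREFIX_THRESHOLD = 10_000


def stable_prefix_len(requests: Sequence[Sequence[int]]) -> int:
    """Length of the leading token run shared by every request (assumed definition)."""
    if not requests:
        return 0
    shortest = min(len(r) for r in requests)
    first = requests[0]
    for i in range(shortest):
        # Stop at the first position where any request diverges.
        if any(r[i] != first[i] for r in requests):
            return i
    return shortest


def share_with_stable_prefix(workloads: dict[str, list[list[int]]]) -> float:
    """Percent of workloads whose shared prefix meets the threshold."""
    if not workloads:
        return 0.0
    hits = sum(
        1
        for reqs in workloads.values()
        if stable_prefix_len(reqs) >= STABLE_PREFIX_THRESHOLD
    )
    return 100.0 * hits / len(workloads)
```

Run against your own request logs, this gives a figure comparable in spirit to the series above, though the tokenizer and the sampling window will shift the result.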

Median realized reuse ratio

The median share of input tokens served from cache once provider caching is configured through Zumik.

Jan  39
Feb  42
Mar  46
Apr  48
May  51
Jun  53

Measured in % of input tokens.
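As a rough illustration of the ratio, the sketch below computes a per-workload reuse percentage and takes the median across workloads. It assumes each workload reports total input tokens and the subset served from cache; the WorkloadUsage type and its field names are hypothetical and are not a Zumik or provider API.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable


@dataclass
class WorkloadUsage:
    # Hypothetical usage record; field names are assumptions.
    input_tokens: int
    cached_input_tokens: int


def reuse_ratio(usage: WorkloadUsage) -> float:
    """Percent of a workload's input tokens that were served from cache."""
    if usage.input_tokens == 0:
        return 0.0
    return 100.0 * usage.cached_input_tokens / usage.input_tokens


def median_reuse_ratio(usages: Iterable[WorkloadUsage]) -> float:
    """Median of the per-workload reuse percentages."""
    return median(reuse_ratio(u) for u in usages)
```

Taking the median rather than the mean keeps a handful of near-100% workloads from dominating the figure.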

Routing mix by alias family

Distribution of resolved requests across the alias families in the catalog.

code.*          41
auto.balanced   22
auto.fast       16
auto.cheapest   12
reasoning.best   6
vision.*         3

Measured in % of requests.

Share of non-interactive tokens on batch tiers

How much background and evaluation traffic moves to 24-hour batch lanes for the 50% discount.

Jan  21
Feb  26
Mar  31
Apr  37
May  44
Jun  49

Measured in % of non-interactive (background and evaluation) tokens.
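The planning value of this series is easiest to see as arithmetic. The sketch below estimates the blended cost of non-interactive tokens for a given batch share, assuming a uniform per-token list price and that the 50% discount applies to every token on the batch tier. Both are simplifying assumptions, and blended_cost_multiplier is a hypothetical helper.

```python
BATCH_DISCOUNT = 0.50  # the 50% batch discount quoted above


def blended_cost_multiplier(
    batch_share_pct: float, discount: float = BATCH_DISCOUNT
) -> float:
    """Cost of non-interactive tokens relative to paying list price for all of them."""
    if not 0.0 <= batch_share_pct <= 100.0:
        raise ValueError("batch_share_pct must be between 0 and 100")
    share = batch_share_pct / 100.0
    # Tokens off the batch tier pay full price; tokens on it pay (1 - discount).
    return (1.0 - share) + share * (1.0 - discount)
```

Under these assumptions, the June share of 49% gives a multiplier of about 0.755, a reduction of roughly 24.5% on non-interactive spend compared with sending everything at list price.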

So what

The throughline: more of every agent request is stable scaffolding, and more teams are capturing it. That stable scaffolding is exactly the surface that reuse targets. Read why a repeated prompt is not a cache hit, then measure your own.

See where your workload sits.

Run a diagnostic and compare your reuse profile against these aggregate trends.