Data · trends

How agent inference is actually changing.

These series come from real routed traffic, anonymized and aggregated. They describe how agent workloads are shaped and how reuse, routing, and batch adoption are trending: signal you can use to plan, not a vanity dashboard.

Workloads with a 10k+ token stable prefix

Share of measured agent workloads carrying at least ten thousand tokens of stable prefix per request.

[Chart: monthly values, January through June.]

Measured in % of workloads.
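
To apply the same measure to your own traffic, one rough approach is to treat the stable prefix as the longest run of leading tokens shared by every request in a workload. The sketch below assumes that definition and assumes each request is available as a list of token IDs; the function names and data layout are illustrative, not part of any published API.

```python
from typing import Mapping, Sequence

# Threshold from the definition above: "at least ten thousand tokens".
STABLE_PREFIX_THRESHOLD = 10_000


def stable_prefix_len(requests: Sequence[Sequence[int]]) -> int:
    """Length of the longest token prefix shared by every request (assumed definition)."""
    if not requests:
        return 0
    shortest = min(len(r) for r in requests)
    first = requests[0]
    for i in range(shortest):
        # Stop at the first position where any request diverges.
        if any(r[i] != first[i] for r in requests):
            return i
    return shortest


def share_with_stable_prefix(
    workloads: Mapping[str, Sequence[Sequence[int]]],
    threshold: int = STABLE_PREFIX_THRESHOLD,
) -> float:
    """Percent of workloads whose shared prefix is at least `threshold` tokens."""
    if not workloads:
        return 0.0
    hits = sum(1 for reqs in workloads.values() if stable_prefix_len(reqs) >= threshold)
    return 100.0 * hits / len(workloads)
```

A production measurement would likely tolerate small variations such as timestamps inside the prefix; this version counts only exact token matches.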

Median realized reuse ratio

The median share of input tokens served from cache once provider caching is configured through Zumik.

[Chart: monthly values, January through June.]

Measured in % of input tokens.
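
As a sketch of how a figure like this can be computed, assume each workload reports two counters: input tokens served from cache and total input tokens. Those counter names are assumptions for illustration; real provider usage fields differ by vendor.

```python
from statistics import median
from typing import Iterable, Tuple


def reuse_ratio(cached_input_tokens: int, total_input_tokens: int) -> float:
    """Percent of input tokens served from cache for one workload."""
    if total_input_tokens <= 0:
        return 0.0
    return 100.0 * cached_input_tokens / total_input_tokens


def median_reuse_ratio(usage: Iterable[Tuple[int, int]]) -> float:
    """Median reuse ratio over (cached, total) pairs, skipping empty workloads."""
    ratios = [reuse_ratio(cached, total) for cached, total in usage if total > 0]
    return median(ratios) if ratios else 0.0
```

The median is less sensitive than the mean to a handful of workloads with near-total or near-zero reuse, which is one plausible reason to report it.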

Routing mix by alias family

Distribution of resolved requests across the alias families in the catalog.

[Chart: alias families shown are code.*, auto.balanced, auto.fast, auto.cheapest, reasoning.best, and vision.*]

Measured in % of requests.
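
A distribution like this can be reproduced from a log of resolved aliases by matching each alias against the family patterns and counting. The sketch below assumes the families listed above and uses glob-style matching for the wildcard families; concrete alias names such as `code.review` are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase
from typing import Dict, Iterable

# Family patterns as listed in the chart; wildcard families use glob syntax.
FAMILIES = [
    "code.*",
    "auto.balanced",
    "auto.fast",
    "auto.cheapest",
    "reasoning.best",
    "vision.*",
]


def family_of(alias: str) -> str:
    """Return the first family pattern the alias matches, or 'other'."""
    for pattern in FAMILIES:
        if fnmatchcase(alias, pattern):
            return pattern
    return "other"


def routing_mix(resolved_aliases: Iterable[str]) -> Dict[str, float]:
    """Percent of requests per alias family."""
    counts = Counter(family_of(alias) for alias in resolved_aliases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * n / total for family, n in counts.items()}
```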

Share of non-interactive tokens on batch tiers

How much background and evaluation traffic moves to 24-hour batch lanes for the 50% discount.

[Chart: monthly values, January through June.]

Measured in % of background tokens.
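
For planning, the effect of the discount is simple arithmetic: if a share of background tokens moves to a batch tier priced at a discount, the remaining spend on those tokens falls in proportion. The sketch below assumes a flat 50% discount and otherwise identical per-token pricing; the function name is illustrative.

```python
def relative_background_cost(batch_share: float, discount: float = 0.5) -> float:
    """Background-token cost as a fraction of the all-interactive baseline.

    batch_share: fraction of background tokens sent to batch tiers (0.0 to 1.0).
    discount: fractional price reduction on batch tiers (0.5 means 50% off).
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    # Tokens left on interactive tiers pay full price; batch tokens pay (1 - discount).
    return 1.0 - batch_share * discount
```

Under these assumptions, moving 40% of background tokens to batch leaves background spend at 0.8 of the baseline, and moving all of it leaves 0.5.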

So what

The throughline: more of every agent request is stable scaffolding, and more teams are capturing it. That stable scaffolding is exactly the surface that reuse targets. Read why a repeated prompt is not a cache hit, then measure your own workload.

See where your workload sits.

Run a diagnostic and compare your reuse profile against these aggregate trends.
