Measure repeated work before you optimize
The Agent Workload Efficiency Diagnostic reads metadata traces and tells you, in one report, how much reuse you could capture, how much you actually capture, and the single cheapest thing to fix next.
It separates possibility from reality.
Candidate reusable tokens show what the workload could save. Realized reused tokens show what the provider or runtime actually reused. The gap between them, missed-opportunity tokens, is the most actionable line in the report, because it usually points straight at prompt ordering, not hardware.
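The arithmetic behind that gap is simple enough to sketch. The snippet below is illustrative only: the class name, the `capture_rate` helper, and the token counts are assumptions made for the example, not the diagnostic's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class ReuseWaterfall:
    """Hypothetical container for the three token counts discussed above."""

    total_input_tokens: int
    candidate_reusable_tokens: int  # what the workload could save
    realized_reused_tokens: int     # what the provider or runtime actually reused

    @property
    def missed_opportunity_tokens(self) -> int:
        # Gap between possibility and reality, clamped so it is never negative.
        return max(self.candidate_reusable_tokens - self.realized_reused_tokens, 0)

    @property
    def capture_rate(self) -> float:
        # Share of the candidate reuse that was actually realized (assumed helper).
        if self.candidate_reusable_tokens == 0:
            return 0.0
        return self.realized_reused_tokens / self.candidate_reusable_tokens


# Made-up numbers, purely for illustration.
waterfall = ReuseWaterfall(
    total_input_tokens=1_000_000,
    candidate_reusable_tokens=600_000,
    realized_reused_tokens=150_000,
)
print(waterfall.missed_opportunity_tokens)  # 450000
print(waterfall.capture_rate)               # 0.25
```

In this made-up case only a quarter of the available reuse is being captured, so the missed-opportunity line is the place to look first.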
A repeated prompt is not a cache hit
Six components, one comparable number.
Opportunity ratio
Candidate reusable input tokens divided by total input tokens.
Recurrence score
How often equivalent reusable prefix families recur in real traffic.
Retention locality
How much recurrence lands inside provider or runtime cache windows.
TTFT sensitivity
How much prefill latency contributes to user-visible delay and SLOs.
Session continuity
How much traffic belongs to multi-turn sessions or branches.
Payload redundancy
How many serialized bytes can be replaced by state references.
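How the six components roll up into one number is not specified here, so the following is only a sketch of one plausible shape: each component normalized to the range 0 to 1, then combined as a weighted average and scaled to 0-100. The equal weights, the function name, and the example values are all assumptions, not the published formula.

```python
from typing import Dict, Optional

# The six components named above, each assumed to be normalized to [0, 1].
COMPONENTS = (
    "opportunity_ratio",
    "recurrence_score",
    "retention_locality",
    "ttft_sensitivity",
    "session_continuity",
    "payload_redundancy",
)


def workload_reuse_score(
    components: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine component values into a 0-100 score (hypothetical weighting)."""
    if weights is None:
        # ASSUMPTION: equal weights. The real weighting is not given in this page.
        weights = {name: 1.0 for name in COMPONENTS}

    missing = [name for name in COMPONENTS if name not in components]
    if missing:
        raise ValueError(f"missing components: {missing}")

    total_weight = sum(weights[name] for name in COMPONENTS)
    weighted_sum = sum(
        # Clamp each value so out-of-range inputs cannot push the score past 0-100.
        weights[name] * min(max(components[name], 0.0), 1.0)
        for name in COMPONENTS
    )
    return round(100 * weighted_sum / total_weight, 1)


# Made-up component values, purely for illustration.
example = {
    "opportunity_ratio": 0.6,
    "recurrence_score": 0.8,
    "retention_locality": 0.5,
    "ttft_sensitivity": 0.7,
    "session_continuity": 0.4,
    "payload_redundancy": 0.3,
}
print(workload_reuse_score(example))  # 55.0
```

Keeping the weights as a parameter makes it easy to see how sensitive the final number is to any one component.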
What to do at each band.
| WRS | Fit | Recommended action |
|---|---|---|
| 70-100 | Strong fit | Prioritize an optimization pilot. |
| 45-69 | Plausible fit | Run a diagnostic and tune providers. |
| 20-44 | Limited fit | Improve prompt construction first. |
| 0-19 | Weak fit | Do not sell BYOC or custom caching. |
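The band table translates directly into a lookup. This is a minimal sketch: the function name is hypothetical, and it assumes a fractional score stays in the lower band until it reaches the next threshold.

```python
from typing import Tuple


def wrs_band(score: float) -> Tuple[str, str]:
    """Map a 0-100 Workload Reuse Score to its (fit, recommended action) pair."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Thresholds are the lower bounds of each band in the table above.
    if score >= 70:
        return ("Strong fit", "Prioritize an optimization pilot.")
    if score >= 45:
        return ("Plausible fit", "Run a diagnostic and tune providers.")
    if score >= 20:
        return ("Limited fit", "Improve prompt construction first.")
    return ("Weak fit", "Do not sell BYOC or custom caching.")


fit, action = wrs_band(55)
print(fit, "->", action)  # Plausible fit -> Run a diagnostic and tune providers.
```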
Median scores by workload type from the Zumik corpus.
Deployment readiness (provider coverage, BYOK feasibility, security approvals, region constraints, traffic concentration, and operational appetite) is scored on its own. That stops the most common bad decision in this space: buying a hot lane because the prompts were long. Try it yourself in the Workload Reuse Score calculator.
Feed it what you already have.
Metadata traces, tokenized captures, provider exports, SDK traces, or latency dashboards. The diagnostic computes prefix families and opportunity without requiring raw prompts.
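To see why metadata can be enough, consider a simplified sketch in which each trace record carries only a client-side hash of its prompt prefix plus token counts. The field names (`prefix_hash`, `prefix_tokens`, `input_tokens`) and the counting rule are assumptions for illustration; the diagnostic's actual method is not described here.

```python
from typing import Dict, Iterable, Tuple


def candidate_reuse(records: Iterable[Dict]) -> Tuple[int, int, Dict[str, int]]:
    """Return (candidate_reusable_tokens, total_input_tokens, family_counts).

    ASSUMED rule: the first request in a prefix family pays full price, and
    every later request with the same prefix hash could reuse that prefix.
    """
    family_counts: Dict[str, int] = {}
    candidate = 0
    total = 0
    for record in records:
        total += record["input_tokens"]
        key = record["prefix_hash"]
        if key in family_counts:
            # A repeat of a known prefix family: its prefix tokens are reusable.
            candidate += record["prefix_tokens"]
        family_counts[key] = family_counts.get(key, 0) + 1
    return candidate, total, family_counts


# Made-up metadata records; no prompt text is present, only hashes and counts.
traces = [
    {"prefix_hash": "a", "prefix_tokens": 1000, "input_tokens": 1200},
    {"prefix_hash": "a", "prefix_tokens": 1000, "input_tokens": 1200},
    {"prefix_hash": "b", "prefix_tokens": 500, "input_tokens": 800},
    {"prefix_hash": "a", "prefix_tokens": 1000, "input_tokens": 1200},
]
candidate, total, families = candidate_reuse(traces)
print(candidate, total, families)  # 2000 4400 {'a': 3, 'b': 1}
```

Dividing the first number by the second gives an opportunity ratio for the sample, here 2000 / 4400, or roughly 0.45.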
curl https://api.zumik.ai/v2/diagnostics \
  -H "Authorization: Bearer zk_live_..." \
  -d '{
    "source": "trace_export",
    "trace_mode": "metadata",
    "sample_ref": "trc_…"
  }'
Diagnostics, answered.
What is a Workload Reuse Score?
A 0-100 score built from opportunity ratio, recurrence, retention locality, TTFT sensitivity, session continuity, and payload redundancy. It measures reuse potential without prescribing infrastructure.
Does a high score mean I should self-host?
No. Deployment readiness is scored separately. A high WRS means there is reuse to capture; managed-provider caching or BYOK often captures most of it without BYOC.
Do I have to send raw prompts?
No. The diagnostic runs on metadata-only traces by default. It infers prefix families and opportunity without storing prompt content.
What does the diagnostic return?
A Workload Reuse Score, the reuse waterfall, a provider-fit matrix, prompt-layout recommendations, and the lowest-complexity next step, with an evidence level on every measurement.
Measure before you migrate.
Run a diagnostic on real traffic and let the reuse waterfall decide what to optimize first.