Measure repeated work before you optimize

The Agent Workload Efficiency Diagnostic reads metadata traces and tells you, in one report, how much reuse you could capture, how much you actually capture, and the single cheapest thing to fix next.

Opportunity vs. capture

It separates possibility from reality.

Candidate reusable tokens show what the workload could save. Realized reused tokens show what the provider or runtime actually reused. The gap between them, missed-opportunity tokens, is the most actionable line in the report, because it usually points straight at prompt ordering, not hardware.

A repeated prompt is not a cache hit

| Example output | Per request |
| --- | --- |
| Total input tokens | 100% |
| Eligible reuse | 78% |
| Candidate reuse | 66% |
| Realized reuse | 41% |
| Missed gap | 25% |
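
As a minimal sketch of the arithmetic behind the waterfall, the missed gap is candidate reuse minus realized reuse (66% - 41% = 25% in the example output). The class and field names below are hypothetical and are not part of the diagnostic's API; the token counts are invented so that they reproduce the example percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuseWaterfall:
    """Hypothetical container for the four waterfall stages, in tokens."""
    total_input_tokens: int
    eligible_tokens: int
    candidate_tokens: int
    realized_tokens: int

    def __post_init__(self) -> None:
        if self.total_input_tokens <= 0:
            raise ValueError("total_input_tokens must be positive")
        # Assumption: each stage is a subset of the previous one.
        if not (0 <= self.realized_tokens <= self.candidate_tokens
                <= self.eligible_tokens <= self.total_input_tokens):
            raise ValueError("stages must shrink from total to realized")

    @property
    def missed_tokens(self) -> int:
        """Missed-opportunity tokens: could have been reused, but were not."""
        return self.candidate_tokens - self.realized_tokens

    def percentages(self) -> dict[str, float]:
        """Each stage as a percentage of total input tokens."""
        total = self.total_input_tokens
        stages = {
            "eligible": self.eligible_tokens,
            "candidate": self.candidate_tokens,
            "realized": self.realized_tokens,
            "missed": self.missed_tokens,
        }
        return {name: round(100 * value / total, 1) for name, value in stages.items()}


# Invented counts that match the example output above.
waterfall = ReuseWaterfall(1_000_000, 780_000, 660_000, 410_000)
print(waterfall.percentages())
```

A large "missed" share next to a small "realized" share is the pattern the report treats as most actionable.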

Six components, one comparable number.

Opportunity ratio

Candidate reusable input tokens divided by total input tokens.

Recurrence score

How often equivalent reusable prefix families recur in real traffic.

Retention locality

How much recurrence lands inside provider or runtime cache windows.

TTFT sensitivity

How much prefill latency contributes to user-visible delay and SLOs.

Session continuity

How much traffic belongs to multi-turn sessions or branches.

Payload redundancy

How many serialized bytes can be replaced by state references.
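
Two of the components above, recurrence and retention locality, can be illustrated from timestamps alone. The sketch below is one plausible reading of those definitions, not the diagnostic's actual formula: it assumes each request has already been assigned a prefix-family ID, counts a "recurrence" as any request whose family appeared earlier, and counts it as "in window" when the gap since the previous request in that family is no longer than a cache window. The function name and the 300-second window are hypothetical.

```python
from collections import defaultdict


def recurrence_and_locality(events, cache_window_s):
    """Return (recurrence_share, retention_locality) for a list of events.

    events: iterable of (family_id, timestamp_seconds) pairs.
    recurrence_share: recurrences divided by total requests.
    retention_locality: recurrences that fall inside the window, divided
    by all recurrences.
    """
    events = list(events)
    by_family = defaultdict(list)
    for family_id, timestamp in events:
        by_family[family_id].append(timestamp)

    recurrences = 0
    in_window = 0
    for times in by_family.values():
        times.sort()
        # Compare each request with the previous one in the same family.
        for previous, current in zip(times, times[1:]):
            recurrences += 1
            if current - previous <= cache_window_s:
                in_window += 1

    recurrence_share = recurrences / len(events) if events else 0.0
    locality = in_window / recurrences if recurrences else 0.0
    return recurrence_share, locality


# Invented sample: family "a" recurs twice (once too late), "b" once, "c" never.
sample = [("a", 0), ("a", 100), ("a", 1000), ("b", 0), ("b", 50), ("c", 10)]
share, locality = recurrence_and_locality(sample, cache_window_s=300)
print(f"recurrence share {share:.2f}, retention locality {locality:.2f}")
```

Measuring the gap from the previous request in the same family approximates a cache whose lifetime is refreshed on each hit; a cache with a fixed lifetime from first write would need a different comparison.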

What to do at each band.

| WRS | Fit | Recommended action |
| --- | --- | --- |
| 70-100 | Strong fit | Prioritize an optimization pilot. |
| 45-69 | Plausible fit | Run a diagnostic and tune providers. |
| 20-44 | Limited fit | Improve prompt construction first. |
| 0-19 | Weak fit | Do not sell BYOC or custom caching. |

| Workload type | Median WRS |
| --- | --- |
| Coding agent | 82 |
| Support automation | 58 |
| RAG fanout | 41 |
| Consumer chat | 16 |

Median scores by workload type from the Zumik corpus.
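
The bands above translate directly into a lookup. This is a minimal sketch: the function name is hypothetical, and treating a fractional score such as 69.5 as belonging to the lower band is an assumption, since the table only lists whole-number ranges.

```python
# (lower bound, fit, recommended action), highest band first.
WRS_BANDS = [
    (70, "Strong fit", "Prioritize an optimization pilot."),
    (45, "Plausible fit", "Run a diagnostic and tune providers."),
    (20, "Limited fit", "Improve prompt construction first."),
    (0, "Weak fit", "Do not sell BYOC or custom caching."),
]


def classify_wrs(score):
    """Map a 0-100 Workload Reuse Score to its (fit, action) band."""
    if not 0 <= score <= 100:
        raise ValueError("WRS must be between 0 and 100")
    for lower_bound, fit, action in WRS_BANDS:
        if score >= lower_bound:
            return fit, action
    raise AssertionError("unreachable: the last band starts at 0")


# Median scores by workload type, as listed above.
for workload, median in [("Coding agent", 82), ("Support automation", 58),
                         ("RAG fanout", 41), ("Consumer chat", 16)]:
    fit, action = classify_wrs(median)
    print(f"{workload}: {median} ({fit}) {action}")
```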

Separate

Deployment readiness is scored on its own. It covers provider coverage, BYOK feasibility, security approvals, region constraints, traffic concentration, and operational appetite. Keeping it separate stops the most common bad decision in this space: buying a hot lane because the prompts were long. Try it yourself in the Workload Reuse Score calculator.

Inputs

Feed it what you already have.

Metadata traces, tokenized captures, provider exports, SDK traces, or latency dashboards. The diagnostic computes prefix families and opportunity without requiring raw prompts.
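
To make the "no raw prompts" idea concrete, here is one way prefix families could in principle be derived from metadata alone. This is an illustrative sketch, not a description of how the diagnostic works. It assumes the client hashes each prompt segment locally and exports only (segment hash, token count) pairs; the function names and the sample trace are hypothetical.

```python
import hashlib


def prefix_keys(segment_hashes):
    """Chain segment hashes so that key i identifies segments 0..i in order."""
    running = hashlib.sha256()
    keys = []
    for segment_hash in segment_hashes:
        # The separator keeps ("ab", "c") distinct from ("a", "bc").
        running.update(segment_hash.encode("utf-8") + b"\x00")
        keys.append(running.hexdigest())
    return keys


def candidate_reusable_tokens(requests):
    """Sum tokens in each request's longest prefix seen in an earlier request.

    requests: list of requests in arrival order; each request is a list of
    (segment_hash, token_count) pairs. No prompt text is needed.
    """
    seen = set()
    candidate = 0
    for segments in requests:
        keys = prefix_keys([segment_hash for segment_hash, _ in segments])
        for key, (_, token_count) in zip(keys, segments):
            if key not in seen:
                break  # the shared prefix ends at the first unseen key
            candidate += token_count
        seen.update(keys)
    return candidate


# Invented trace: short strings stand in for locally computed segment hashes.
trace = [
    [("sys", 500), ("tools", 300), ("q1", 40)],
    [("sys", 500), ("tools", 300), ("q2", 60)],
    [("sys", 500), ("other", 100)],
]
total = sum(tokens for request in trace for _, tokens in request)
print(candidate_reusable_tokens(trace), "of", total, "input tokens are candidates")
```

Because keys are chained, reordering two segments changes every key after the swap, which is why prompt ordering has such a large effect on how much of a prefix can be matched.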

Submit a trace
curl https://api.zumik.ai/v2/diagnostics \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "source": "trace_export",
    "trace_mode": "metadata",
    "sample_ref": "trc_…"
  }'

Diagnostics, answered.

What is a Workload Reuse Score?

A 0-100 score built from opportunity ratio, recurrence, retention locality, TTFT sensitivity, session continuity, and payload redundancy. It measures reuse potential without prescribing infrastructure.

Does a high score mean I should self-host?

No. Deployment readiness is scored separately. A high WRS means there is reuse to capture; managed-provider caching or BYOK often captures most of it without BYOC.

Do I have to send raw prompts?

No. The diagnostic runs on metadata-only traces by default. It infers prefix families and opportunity without storing prompt content.

What does the diagnostic return?

A Workload Reuse Score, the reuse waterfall, a provider-fit matrix, prompt-layout recommendations, and the lowest-complexity next step, with an evidence level on every measurement.

Measure before you migrate.

Run a diagnostic on real traffic and let the reuse waterfall decide what to optimize first.