Capability · diagnostics

Measure repeated work before you optimize

The Agent Workload Efficiency Diagnostic reads metadata traces and tells you, in one report, how much reuse you could capture, how much you actually capture, and the single cheapest thing to fix next.


Opportunity vs. capture

It separates possibility from reality.

Candidate reusable tokens show what the workload could save. Realized reused tokens show what the provider or runtime actually reused. The gap between them, missed-opportunity tokens, is the most actionable line in the report, because it usually points straight at prompt ordering, not hardware.

A repeated prompt is not a cache hit

Example output, per request

Metric	Share of input tokens
Total input tokens	100%
Eligible reuse	78%
Candidate reuse	66%
Realized reuse	41%
Missed gap	25%
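The arithmetic behind the example output can be sketched in a few lines. This is a minimal illustration, not the diagnostic's implementation: the `ReuseWaterfall` class and its field names are hypothetical, the token counts are made up to reproduce the percentages shown, and "eligible reuse" is passed through as an input because the text does not define how it is derived.

```python
from dataclasses import dataclass


@dataclass
class ReuseWaterfall:
    """Hypothetical container for one reuse waterfall (names are assumptions)."""

    total_input_tokens: int
    eligible_tokens: int   # not defined in the text; treated as a given input
    candidate_tokens: int  # what the workload could save
    realized_tokens: int   # what the provider or runtime actually reused

    @property
    def missed_tokens(self) -> int:
        """Missed-opportunity tokens: candidate reuse that was not realized."""
        return self.candidate_tokens - self.realized_tokens

    def share(self, tokens: int) -> float:
        """Express a token count as a percentage of total input tokens."""
        if self.total_input_tokens <= 0:
            raise ValueError("total_input_tokens must be positive")
        return round(100.0 * tokens / self.total_input_tokens, 1)

    def report(self) -> dict:
        """Return each waterfall line as a share of total input tokens."""
        return {
            "Total input tokens": self.share(self.total_input_tokens),
            "Eligible reuse": self.share(self.eligible_tokens),
            "Candidate reuse": self.share(self.candidate_tokens),
            "Realized reuse": self.share(self.realized_tokens),
            "Missed gap": self.share(self.missed_tokens),
        }


if __name__ == "__main__":
    # Illustrative counts chosen to match the example percentages.
    waterfall = ReuseWaterfall(10_000, 7_800, 6_600, 4_100)
    for line, pct in waterfall.report().items():
        print(f"{line}: {pct}%")
```

The missed gap is simply candidate reuse minus realized reuse (66% minus 41% gives 25% in the example), which is why it is the line to watch when prompt ordering changes.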

The score

Six components, one comparable number.

Opportunity ratio

Candidate reusable input tokens divided by total input tokens.

Recurrence score

How often equivalent reusable prefix families recur in real traffic.

Retention locality

How much recurrence lands inside provider or runtime cache windows.

TTFT sensitivity

How much prefill latency contributes to user-visible delay and SLOs.

Session continuity

How much traffic belongs to multi-turn sessions or branches.

Payload redundancy

How many serialized bytes can be replaced by state references.
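To make "six components, one comparable number" concrete, here is one way such a score could be combined. The text does not say how the components are weighted or normalized, so this sketch assumes each component is already scaled to the range 0 to 1 and uses equal weights by default. The function name, component keys, and weights are all hypothetical.

```python
from typing import Dict, Optional

# Component keys are illustrative names for the six components listed above.
COMPONENTS = (
    "opportunity_ratio",
    "recurrence",
    "retention_locality",
    "ttft_sensitivity",
    "session_continuity",
    "payload_redundancy",
)


def workload_reuse_score(
    components: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine six components in [0, 1] into a 0-100 score.

    Assumption: a weighted average. The real weighting is not documented here.
    """
    if weights is None:
        weights = {name: 1.0 for name in COMPONENTS}  # equal weights (assumed)

    missing = [name for name in COMPONENTS if name not in components]
    if missing:
        raise ValueError(f"missing components: {missing}")

    for name in COMPONENTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    total_weight = sum(weights[name] for name in COMPONENTS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted = sum(weights[name] * components[name] for name in COMPONENTS)
    return round(100.0 * weighted / total_weight, 1)


if __name__ == "__main__":
    example = {name: 0.5 for name in COMPONENTS}
    print(workload_reuse_score(example))
```

Passing a `weights` dictionary lets you test how sensitive the final number is to any one component, which is useful when comparing workloads that differ mainly in, say, session continuity.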

Interpretation

What to do at each band.

WRS	Fit	Recommended action
70-100	Strong fit	Prioritize an optimization pilot.
45-69	Plausible fit	Run a diagnostic and tune providers.
20-44	Limited fit	Improve prompt construction first.
0-19	Weak fit	Do not sell BYOC or custom caching.

Workload type	Median WRS
Coding agent	82
Support automation	58
RAG fanout	41
Consumer chat	16

Median scores by workload type from the Zumik corpus.
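The band table translates directly into a lookup. The sketch below encodes the four bands and applies them to the median scores listed above. The function name `interpret_wrs` is hypothetical, and treating the band boundaries as lower bounds for fractional scores is an assumption.

```python
from typing import Tuple

# (lower bound, fit, recommended action), taken from the band table.
BANDS = [
    (70, "Strong fit", "Prioritize an optimization pilot."),
    (45, "Plausible fit", "Run a diagnostic and tune providers."),
    (20, "Limited fit", "Improve prompt construction first."),
    (0, "Weak fit", "Do not sell BYOC or custom caching."),
]


def interpret_wrs(score: float) -> Tuple[str, str]:
    """Map a 0-100 Workload Reuse Score to its fit band and action."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, fit, action in BANDS:
        if score >= lower_bound:
            return fit, action
    raise AssertionError("unreachable: the last band starts at 0")


if __name__ == "__main__":
    medians = {
        "Coding agent": 82,
        "Support automation": 58,
        "RAG fanout": 41,
        "Consumer chat": 16,
    }
    for workload, score in medians.items():
        fit, action = interpret_wrs(score)
        print(f"{workload} ({score}): {fit}. {action}")
```

With the published medians, a coding agent lands in the strong-fit band, support automation in plausible fit, RAG fanout in limited fit, and consumer chat in weak fit.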

Separate

Deployment readiness (provider coverage, BYOK feasibility, security approvals, region constraints, traffic concentration, and operational appetite) is scored on its own. That stops the most common bad decision in this space: buying a hot lane because the prompts were long. Try it yourself in the Workload Reuse Score calculator.

Inputs

Feed it what you already have.

Metadata traces, tokenized captures, provider exports, SDK traces, or latency dashboards. The diagnostic computes prefix families and opportunity without requiring raw prompts.
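As a rough illustration of how prefix families and opportunity can be estimated without raw prompts, the sketch below groups metadata records by a prefix hash. Everything about the record shape is an assumption: the field names `prefix_hash`, `prefix_tokens`, `input_tokens`, and `cached_tokens` are hypothetical, and the rule that every repeat of a prefix after its first appearance counts as candidate reuse is a simplification that ignores cache windows.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_reuse(records: Iterable[Dict]) -> Dict:
    """Estimate reuse from metadata-only records (hypothetical schema).

    Each record carries token counts and a hash of its reusable prefix,
    never the prompt text itself.
    """
    family_counts = defaultdict(int)  # prefix_hash -> number of requests
    total = candidate = realized = 0

    for rec in records:
        total += rec["input_tokens"]
        realized += rec.get("cached_tokens", 0)

        family_counts[rec["prefix_hash"]] += 1
        # Simplifying assumption: a prefix seen before could have been reused.
        if family_counts[rec["prefix_hash"]] > 1:
            candidate += rec["prefix_tokens"]

    return {
        "families": dict(family_counts),
        "total_input_tokens": total,
        "candidate_tokens": candidate,
        "realized_tokens": realized,
        "missed_tokens": candidate - realized,
        "opportunity_ratio": candidate / total if total else 0.0,
    }


if __name__ == "__main__":
    trace = [
        {"prefix_hash": "a1", "prefix_tokens": 800, "input_tokens": 1000},
        {"prefix_hash": "a1", "prefix_tokens": 800, "input_tokens": 1000,
         "cached_tokens": 800},
        {"prefix_hash": "a1", "prefix_tokens": 800, "input_tokens": 1200},
        {"prefix_hash": "b2", "prefix_tokens": 500, "input_tokens": 600},
    ]
    print(summarize_reuse(trace))
```

A fuller version would also check whether each repeat arrived inside a provider or runtime cache window, which is what the retention locality component captures.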


submit a trace

curl https://api.zumik.ai/v2/diagnostics \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "source": "trace_export",
    "trace_mode": "metadata",
    "sample_ref": "trc_…"
  }'
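If you would rather submit from a script, the same request can be built with the Python standard library. This sketch mirrors the curl call above and nothing more: the helper name `build_diagnostic_request` is hypothetical, the JSON content type is an assumption based on the request body, and the API key and sample reference are supplied by the caller because the example values above are elided.

```python
import json
import urllib.request

DIAGNOSTICS_URL = "https://api.zumik.ai/v2/diagnostics"


def build_diagnostic_request(api_key: str, sample_ref: str) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    payload = {
        "source": "trace_export",
        "trace_mode": "metadata",  # metadata-only: no raw prompts
        "sample_ref": sample_ref,
    }
    return urllib.request.Request(
        DIAGNOSTICS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; body is JSON
        },
        method="POST",
    )
```

Sending it is a matter of passing the result to `urllib.request.urlopen`. The shape of the response is not documented in this section, so parse it defensively.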

Frequently asked

Diagnostics, answered.

What is a Workload Reuse Score?

A 0-100 score built from opportunity ratio, recurrence, retention locality, TTFT sensitivity, session continuity, and payload redundancy. It measures reuse potential without prescribing infrastructure.

Does a high score mean I should self-host?

No. Deployment readiness is scored separately. A high WRS means there is reuse to capture; managed-provider caching or BYOK often captures most of it without BYOC.

Do I have to send raw prompts?

No. The diagnostic runs on metadata-only traces by default. It infers prefix families and opportunity without storing prompt content.

What does the diagnostic return?

A Workload Reuse Score, the reuse waterfall, a provider-fit matrix, prompt-layout recommendations, and the lowest-complexity next step, with an evidence level on every measurement.

Measure before you migrate.

Run a diagnostic on real traffic and let the reuse waterfall decide what to optimize first.
