Tool · workload score

Score reuse potential before you change anything

Set the six components and watch the Workload Reuse Score move. The band tells you what to do next, and a separate readiness panel keeps a long prompt from talking you into infrastructure you do not need.

Opportunity ratio · w 0.35: 0.62

Candidate reusable input tokens over total input tokens. The ceiling on prefill you could ever save.

Recurrence · w 0.2: 0.55

How often equivalent reusable prefix families recur in real traffic.

Retention locality · w 0.15: 0.48

Share of that recurrence landing inside provider or runtime cache windows.

TTFT sensitivity · w 0.15: 0.50

How much prefill latency contributes to user-visible delay and SLOs.

Session continuity · w 0.1: 0.40

Share of traffic belonging to multi-turn or branched agent sessions.

Payload redundancy · w 0.05: 0.35

Repeated serialized bytes that could be replaced with reusable state references.

Workload Reuse Score

Plausible fit

Recommended action

Run a diagnostic and tune providers.

Separate score

Deployment readiness

0/5

Readiness is graded on its own. A high score means there is reuse to capture, not that you should run your own GPUs.

Provider capability coverageDo managed providers already serve the models this workload needs?BYOK feasibilityCan you bring your own provider keys without contractual friction?Security approvalsAre isolation and data-handling sign-offs realistic to obtain?Traffic concentrationIs volume concentrated on one or two hot model paths?Operational appetiteIs there bandwidth to run inference infrastructure on call?

BYOC is not indicated yet. Capture reuse through managed-provider caching and prompt layout first, then revisit readiness.

Method

The score weights opportunity heaviest and payload redundancy lightest, matching the diagnostic. It measures whether reuse exists, not whether you should self-host - that is the deployment-readiness question, scored on its own.

Components

What each input means.

Opportunity ratio · 0.35

Candidate reusable input tokens over total input tokens. The biggest weight because it caps everything downstream: you cannot capture reuse that was never there.

Recurrence · 0.20

How often equivalent reusable prefix families come back. A fixed tool registry that recurs every request scores high; a prompt that mutates each call scores low even if it is long.

Retention locality · 0.15

Whether that recurrence lands inside a provider or runtime cache window. Fast recurrence with short windows still misses.

TTFT sensitivity · 0.15

How much prefill latency hurts the user-visible experience or an SLO. Reuse pays back faster when prefill is on the critical path.

Session continuity · 0.10

Share of traffic in multi-turn or branched sessions, where prior context is reused across turns.

Payload redundancy · 0.05

Serialized bytes repeated request to request that a state reference could replace. The smallest weight, but it compounds on chatty tool schemas.

Lint your prompt layout

Find the ordering mistakes that quietly cap your opportunity ratio.

Run a free workload scan

Measure each component from real traffic, no payment method.

Price the reuse

See what a given capture rate is worth on your bill.

Frequently asked

The score, answered.

How is the score computed?

WRS = 100 × (0.35·opportunity_ratio + 0.20·recurrence + 0.15·retention_locality + 0.15·ttft_sensitivity + 0.10·session_continuity + 0.05·payload_redundancy). Each input is normalized 0 to 1.

Why does a long prompt not raise the score on its own?

Length is not a component. A long prompt that mutates every request has a low opportunity ratio and poor recurrence, so it scores low. That is the point: prompt size never justified self-hosting.

What does deployment readiness add?

It is a separate gate (plan §4.3) for provider coverage, BYOK feasibility, security approvals, traffic concentration, and operational appetite. A high score plus full readiness is the only state where a BYOC pilot is worth proving.

Is this the real diagnostic?

No. This estimates from values you set by hand. The Agent Workload Efficiency Diagnostic measures each component from your metadata traces and returns evidence levels per measurement.

Estimate here, measure on your traffic.

This is a model. A diagnostic computes each component from metadata traces and tells you where capture is actually leaking.

Start a free scan How the score works

Deployment readiness

What each input means.

Opportunity ratio · 0.35

Recurrence · 0.20

Retention locality · 0.15

TTFT sensitivity · 0.15

Session continuity · 0.10

Payload redundancy · 0.05

From estimate to evidence.

Lint your prompt layout

Run a free workload scan

Price the reuse

The score, answered.

Estimate here, measure on your traffic.