Score reuse potential before you change anything
Set the six components and watch the Workload Reuse Score move. The band tells you what to do next, and a separate readiness panel keeps a long prompt from talking you into infrastructure you do not need.
Opportunity ratio
Candidate reusable input tokens over total input tokens. The ceiling on prefill you could ever save.
Recurrence
How often equivalent reusable prefix families recur in real traffic.
Retention locality
Share of that recurrence landing inside provider or runtime cache windows.
TTFT sensitivity
How much prefill latency contributes to user-visible delay and SLOs.
Session continuity
Share of traffic belonging to multi-turn or branched agent sessions.
Payload redundancy
Repeated serialized bytes that could be replaced with reusable state references.
Run a diagnostic and tune providers.
Deployment readiness
Readiness is graded on its own. A high score means there is reuse to capture, not that you should run your own GPUs.
BYOC is not indicated yet. Capture reuse through managed-provider caching and prompt layout first, then revisit readiness.
The score weights opportunity ratio most heavily and payload redundancy most lightly, matching the diagnostic. It measures whether reuse exists, not whether you should self-host; that is the deployment-readiness question, which is scored on its own.
What each input means.
Opportunity ratio · 0.35
Candidate reusable input tokens over total input tokens. The biggest weight because it caps everything downstream: you cannot capture reuse that was never there.
Recurrence · 0.20
How often equivalent reusable prefix families come back. A fixed tool registry that recurs every request scores high; a prompt that mutates each call scores low even if it is long.
Retention locality · 0.15
Whether that recurrence lands inside a provider or runtime cache window. Frequent recurrence still misses when the gap between repeats is longer than a short window.
TTFT sensitivity · 0.15
How much prefill latency hurts the user-visible experience or an SLO. Reuse pays back faster when prefill is on the critical path.
Session continuity · 0.10
Share of traffic in multi-turn or branched sessions, where prior context is reused across turns.
Payload redundancy · 0.05
Serialized bytes repeated request to request that a state reference could replace. The smallest weight, but it compounds on chatty tool schemas.
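To make the first three definitions concrete, here is a minimal Python sketch that estimates them from a list of request records. It is an illustration under stated assumptions, not how the diagnostic measures anything: the Request record, the prefix_family key, the estimate_inputs function, and the choice to measure the cache window from the previous occurrence of the same family are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical trace record; field names are assumptions for this sketch.
    timestamp: float       # seconds since some reference point
    prefix_family: str     # key identifying an equivalent reusable prefix
    reusable_tokens: int   # candidate reusable input tokens
    total_tokens: int      # total input tokens


def estimate_inputs(requests: list[Request], cache_window_s: float) -> dict[str, float]:
    """Estimate opportunity ratio, recurrence, and retention locality (each 0 to 1)."""
    ordered = sorted(requests, key=lambda r: r.timestamp)
    total = sum(r.total_tokens for r in ordered)
    reusable = sum(r.reusable_tokens for r in ordered)

    last_seen: dict[str, float] = {}
    repeats = 0            # requests whose prefix family was seen before
    repeats_in_window = 0  # repeats that arrived inside the cache window
    for r in ordered:
        previous = last_seen.get(r.prefix_family)
        if previous is not None:
            repeats += 1
            if r.timestamp - previous <= cache_window_s:
                repeats_in_window += 1
        last_seen[r.prefix_family] = r.timestamp

    return {
        "opportunity_ratio": reusable / total if total else 0.0,
        # Assumption: recurrence is the share of requests repeating a known family.
        "recurrence": repeats / len(ordered) if ordered else 0.0,
        "retention_locality": repeats_in_window / repeats if repeats else 0.0,
    }


# Hypothetical traffic: one stable tool prefix seen three times, one ad hoc prompt.
trace = [
    Request(0.0, "tools-v1", 800, 1000),
    Request(100.0, "tools-v1", 800, 1000),
    Request(1000.0, "tools-v1", 800, 1200),
    Request(1010.0, "adhoc-a", 0, 500),
]
inputs = estimate_inputs(trace, cache_window_s=300.0)
print(inputs)
```

In this made-up trace the tool prefix recurs twice, but only one of those repeats lands within the assumed 300-second window, so recurrence and retention locality diverge exactly as the definitions above describe.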
The score, answered.
How is the score computed?
WRS = 100 × (0.35·opportunity_ratio + 0.20·recurrence + 0.15·retention_locality + 0.15·ttft_sensitivity + 0.10·session_continuity + 0.05·payload_redundancy). Each input is normalized 0 to 1.
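The formula above can be sketched in a few lines of Python. The weights come straight from the formula; the function name workload_reuse_score, the input validation, and the example values are illustrative assumptions rather than part of the diagnostic.

```python
# Weights as given in the formula above; they sum to 1.0.
WEIGHTS = {
    "opportunity_ratio": 0.35,
    "recurrence": 0.20,
    "retention_locality": 0.15,
    "ttft_sensitivity": 0.15,
    "session_continuity": 0.10,
    "payload_redundancy": 0.05,
}


def workload_reuse_score(components: dict[str, float]) -> float:
    """Return WRS on a 0 to 100 scale from six inputs normalized to [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 100.0 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical hand-set values, not measurements.
example = {
    "opportunity_ratio": 0.6,
    "recurrence": 0.5,
    "retention_locality": 0.4,
    "ttft_sensitivity": 0.7,
    "session_continuity": 0.3,
    "payload_redundancy": 0.2,
}
score = workload_reuse_score(example)
print(round(score, 1))  # 51.5
```

Because the weights sum to 1.0, setting every input to 1 gives a score of 100 and setting every input to 0 gives 0.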
Why does a long prompt not raise the score on its own?
Length is not a component. A long prompt that mutates every request has a low opportunity ratio and poor recurrence, so it scores low. That is the point: prompt size alone never justifies self-hosting.
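A small worked comparison shows the effect. The sketch below applies the published weights to two hypothetical workloads; the wrs helper and every input value are assumptions chosen for illustration, and prompt length appears nowhere among the arguments.

```python
# Weights in the order used by the WRS formula.
WEIGHTS = (0.35, 0.20, 0.15, 0.15, 0.10, 0.05)


def wrs(opportunity_ratio, recurrence, retention_locality,
        ttft_sensitivity, session_continuity, payload_redundancy):
    """Weighted sum of six inputs in [0, 1], scaled to 0 to 100."""
    inputs = (opportunity_ratio, recurrence, retention_locality,
              ttft_sensitivity, session_continuity, payload_redundancy)
    return 100.0 * sum(w * x for w, x in zip(WEIGHTS, inputs))


# Hypothetical long prompt that is rewritten on every call:
# little of it is reusable and what is reusable rarely comes back.
long_mutating = wrs(0.10, 0.05, 0.10, 0.60, 0.20, 0.10)

# Hypothetical short prompt with a fixed tool registry:
# most of it is reusable and it recurs inside the cache window.
short_stable = wrs(0.80, 0.90, 0.80, 0.60, 0.20, 0.10)

print(round(long_mutating, 1), round(short_stable, 1))  # 17.5 69.5
```

The last three inputs are held equal across both workloads, so the whole gap comes from opportunity ratio, recurrence, and retention locality.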
What does deployment readiness add?
It is a separate gate (plan §4.3) for provider coverage, BYOK feasibility, security approvals, traffic concentration, and operational appetite. A high score plus full readiness is the only state where a BYOC pilot is worth proving.
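The two-part condition can be sketched as a simple gate. This is a simplification under loud assumptions: the page does not say how readiness is graded or where the "high score" band begins, so the sketch models each readiness criterion as a yes/no flag and leaves the threshold as a caller-supplied parameter. The names Readiness, is_full, byoc_pilot_indicated, and high_score_threshold are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Readiness:
    # The five criteria named above, modeled as booleans (an assumption).
    provider_coverage: bool
    byok_feasibility: bool
    security_approvals: bool
    traffic_concentration: bool
    operational_appetite: bool

    def is_full(self) -> bool:
        """True only when every criterion is satisfied."""
        return all(getattr(self, f.name) for f in fields(self))


def byoc_pilot_indicated(score: float, readiness: Readiness,
                         high_score_threshold: float) -> bool:
    """A pilot is indicated only with a high score AND full readiness."""
    return score >= high_score_threshold and readiness.is_full()


# Hypothetical case: strong reuse, but security approvals are still pending.
partial = Readiness(True, True, False, True, True)
full = Readiness(True, True, True, True, True)

# The threshold of 70 is a placeholder, not a value from this page.
print(byoc_pilot_indicated(82.0, partial, high_score_threshold=70.0))  # False
print(byoc_pilot_indicated(82.0, full, high_score_threshold=70.0))     # True
print(byoc_pilot_indicated(40.0, full, high_score_threshold=70.0))     # False
```

Keeping the score and the readiness check as separate arguments mirrors the separation described above: neither one can compensate for the other.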
Is this the real diagnostic?
No. This page estimates the score from values you set by hand. The Agent Workload Efficiency Diagnostic measures each component from your metadata traces and returns an evidence level for each measurement.
Estimate here, measure on your traffic.
This is a model. A diagnostic computes each component from metadata traces and tells you where capture is actually leaking.