Scoring a workload before you change infrastructure

How the Workload Reuse Score is built from six components, and why prompt length alone never justified self-hosting.

WRS, diagnostics, byoc
Published 2026-06-06

For a while the industry used a lazy heuristic: long prompts mean you should self-host. It is wrong often enough to be dangerous. Plenty of long-prompt workloads have terrible retention locality, and plenty of medium-prompt workloads recur fast enough to cache beautifully.

So we replaced the heuristic with a score.

Six components, one number

The Workload Reuse Score (WRS) blends six components: opportunity ratio, recurrence, retention locality, TTFT (time to first token) sensitivity, session continuity, and payload redundancy. Each component is normalized, and the six are then combined into a single 0 to 100 value you can compare across workloads.

A coding agent with stable scaffolding and tight recurrence scores high. A consumer chat app with weak locality scores low, and that is the right answer even if its prompts happen to be long.
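The post does not publish the normalization or the weighting, so the sketch below shows only one plausible way the blend could work. It assumes each component has already been normalized to the range 0 to 1, uses hypothetical equal weights, and takes a weighted arithmetic mean scaled to 0 to 100. `Components`, `workload_reuse_score`, and the example inputs are illustrative, not measured values. Note that prompt length is not an input at all.

```python
from dataclasses import dataclass, fields

# Hypothetical equal weights; the real WRS weighting is not published.
WEIGHTS = {
    "opportunity_ratio": 1.0,
    "recurrence": 1.0,
    "retention_locality": 1.0,
    "ttft_sensitivity": 1.0,
    "session_continuity": 1.0,
    "payload_redundancy": 1.0,
}


@dataclass
class Components:
    """The six WRS components, each assumed to be pre-normalized to [0, 1]."""
    opportunity_ratio: float
    recurrence: float
    retention_locality: float
    ttft_sensitivity: float
    session_continuity: float
    payload_redundancy: float


def clamp01(x: float) -> float:
    # Guard against components that drift slightly outside [0, 1].
    return max(0.0, min(1.0, x))


def workload_reuse_score(c: Components, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted arithmetic mean of the components, scaled to 0..100."""
    total_weight = sum(weights.values())
    blended = sum(weights[f.name] * clamp01(getattr(c, f.name)) for f in fields(c))
    return round(100.0 * blended / total_weight, 1)


# Illustrative inputs only: a coding agent with stable scaffolding and tight
# recurrence, versus a chat app with weak retention locality.
coding_agent = Components(0.8, 0.9, 0.85, 0.7, 0.9, 0.8)
consumer_chat = Components(0.6, 0.3, 0.1, 0.4, 0.3, 0.5)
print(workload_reuse_score(coding_agent))   # 82.5
print(workload_reuse_score(consumer_chat))  # 36.7
```

A weighted mean lets a strong component partly offset a weak one. If you want poor retention locality to drag the score down harder, a weighted geometric mean is a stricter alternative.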

Score and readiness are different questions

A high score means there is reuse to capture. It does not mean you should run your own GPUs. We track deployment readiness separately: provider capability, BYOK feasibility, security approvals, region constraints, traffic concentration, and the operational appetite to run infrastructure.

Keeping those two questions apart stops the most common bad decision in this space, which is buying a hot lane because the prompts were big.
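Keeping readiness out of the score is easy to enforce in code: compute the two independently and require both to pass. The sketch below models the readiness factors listed above as hypothetical pass/fail checks and uses an assumed minimum score of 70 for the reuse side of the gate. Real readiness reviews are rarely this binary, and `Readiness` and `hot_lane_decision` are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass, fields


@dataclass
class Readiness:
    # Hypothetical pass/fail versions of the readiness factors named in the post.
    provider_capability: bool
    byok_feasible: bool
    security_approved: bool
    region_constraints_met: bool
    traffic_concentrated: bool
    ops_appetite: bool

    def blockers(self) -> list[str]:
        # Names of every readiness factor that is not yet satisfied.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def hot_lane_decision(score: float, readiness: Readiness,
                      min_score: float = 70.0) -> tuple[bool, list[str]]:
    """Return (go, reasons). Score and readiness are checked separately,
    so neither can compensate for a failure in the other."""
    reasons: list[str] = []
    if score < min_score:
        reasons.append(f"reuse score {score} is below {min_score}")
    reasons.extend(readiness.blockers())
    return (not reasons, reasons)


ready = Readiness(True, True, True, True, True, True)
print(hot_lane_decision(36.7, ready))
# (False, ['reuse score 36.7 is below 70.0'])
print(hot_lane_decision(82.5, Readiness(True, True, False, True, True, True)))
# (False, ['security_approved'])
```

A low-scoring chat app is stopped by the score even when every readiness check passes, however long its prompts are, which is exactly the decision the separation is meant to prevent.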

What to do at each band

At 70 and above, prioritize an optimization pilot. From 45 to 69, run a diagnostic and tune providers. From 20 to 44, fix prompt construction first. Below 20, do not sell yourself a caching project; the opportunity is not there.
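The bands translate directly into a lookup. This sketch assumes each threshold is inclusive at its lower edge, so a score of exactly 70 lands in the top band and a fractional score such as 69.5 falls into the band below it. `recommended_action` is a hypothetical helper, not part of any published tool.

```python
# Lower edge of each band (inclusive), checked from highest to lowest.
BANDS = [
    (70.0, "prioritize an optimization pilot"),
    (45.0, "run a diagnostic and tune providers"),
    (20.0, "fix prompt construction first"),
    (0.0, "skip the caching project; the opportunity is not there"),
]


def recommended_action(score: float) -> str:
    """Map a 0..100 Workload Reuse Score to the action for its band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be in [0, 100], got {score}")
    for lower_edge, action in BANDS:
        if score >= lower_edge:
            return action
    # Every valid score is caught by the 0.0 band above.
    raise AssertionError("unreachable")


print(recommended_action(70))    # prioritize an optimization pilot
print(recommended_action(36.7))  # fix prompt construction first
```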

Turn the idea into a measurement.

Run a diagnostic on your own traffic to see its reuse waterfall and where it lands on the Workload Reuse Score.