What will this actually cost per month?

Pick a model, describe your traffic, and set how much of your prompt is a reusable stable prefix. The calculator shows list-price spend, reuse-adjusted spend, and the gap between them - then adds the flat $5 platform fee.

600,000 requests/mo · 7,200,000,000 input tokens · 420,000,000 output tokens.

$93,000
List price, no reuse / mo
$57,360
With 55% reuse / mo
Estimated monthly saving
$35,640
38%
Zumik platform$5.00/mo
Inference credits (pass-through)$57,360/mo
Estimated total$57,365/mo

Estimate only. Credits are billed at provider pass-through; actual reuse depends on prompt ordering and retention locality. Output is excluded from cache savings.

Method

Reused input is billed at the model's cache-read rate; the rest at list input. Output is never discounted by caching. This is an estimate - measure your real reuse with a workload diagnostic.

Calculator, answered.

How does the calculator estimate savings?

It splits your input tokens by the reuse percentage you set, billing the reused portion at the model’s cache-read rate and the rest at list input price. Output is excluded from cache savings.

What reuse percentage should I use?

Median realized reuse across agent workloads in our corpus is around 53%. Coding agents often run higher; consumer chat runs much lower. Run a diagnostic to measure yours.

Is the $5 fee per seat?

No. The $5 monthly platform fee is flat. Inference credits are pay-as-you-go at provider pass-through.

Estimate, then measure.

The calculator is a model; a diagnostic is the truth. Run one on your real traffic.