What will this actually cost per month?
Pick a model, describe your traffic, and set how much of your prompt is a reusable stable prefix. The calculator shows list-price spend, reuse-adjusted spend, and the gap between them - then adds the flat $5 platform fee.
600,000 requests/mo · 7,200,000,000 input tokens · 420,000,000 output tokens.
Estimate only. Credits are billed at provider pass-through; actual reuse depends on prompt ordering and retention locality. Output is excluded from cache savings.
Reused input is billed at the model's cache-read rate; the rest at list input. Output is never discounted by caching. This is an estimate - measure your real reuse with a workload diagnostic.
Calculator, answered.
How does the calculator estimate savings?
It splits your input tokens by the reuse percentage you set, billing the reused portion at the model’s cache-read rate and the rest at list input price. Output is excluded from cache savings.
What reuse percentage should I use?
Median realized reuse across agent workloads in our corpus is around 53%. Coding agents often run higher; consumer chat runs much lower. Run a diagnostic to measure yours.
Is the $5 fee per seat?
No. The $5 monthly platform fee is flat. Inference credits are pay-as-you-go at provider pass-through.
Estimate, then measure.
The calculator is a model; a diagnostic is the truth. Run one on your real traffic.