One simple plan. Credits you actually use.
No per-seat tax and no tiered feature gates. Zumik is $5 a month for the control plane, plus pay-as-you-go inference credits billed at provider pass-through.
plus pay-as-you-go inference credits
- Exact OpenAI-compatible /v1 and native /v2 APIs
- Reuse opportunity and realized-capture metrics on every request
- Reproducible model-alias releases and routing records
- Workload diagnostics with Workload Reuse Score
- Metadata-first tracing with opaque, tenant-scoped handles
- Provider-native caching, batch, and service-tier routing
- Purge jobs with profile-specific receipts
- BYOK across all five first-class providers
When the workload earns it.
Dedicated SLOs, private networking, regional isolation, runtime-confirmed purge evidence, and explicit KV orchestration. Activated only after a replay run proves it beats managed providers, so you never pay for infrastructure you do not need.
- Replay-backed BYOC runtime profiles (Dynamo, SGLang, LMCache, Mooncake)
- Kubernetes profiles on llm-d, KServe, and AIBrix
- Custom retention, region, and procurement policy
- Volume credit pricing and committed-use discounts
Estimate your monthly spend
Drag the assumptions to match your workload. The calculator compares list-price inference against the same traffic with a cached stable prefix, then adds the flat $5 platform fee.
600,000 requests/mo · 7,200,000,000 input tokens · 420,000,000 output tokens.
Estimate only. Credits are billed at provider pass-through; actual reuse depends on prompt ordering and retention locality. Output is excluded from cache savings.
This is an estimate. Realized reuse depends on prompt ordering and retention locality, run a workload diagnostic to measure yours, or read how providers cache on the prompt caching pages.
Cheaper than recomputing, by design.
The point of the platform fee is to make the credits smaller. See which models are cheapest once caching is working, or how Zumik compares to gateways and routers.
Pricing, answered plainly.
How much does Zumik cost?
Zumik is $5 per month plus pay-as-you-go credits. The platform fee covers the control plane; inference credits are billed at provider pass-through for the tokens you actually use.
Do credits expire or carry a minimum?
There is no seat minimum. You add credits as needed and spend them against inference and diagnostics. The calculator on this page estimates a typical monthly total.
Is BYOK or BYOC extra?
BYOK is included, bring your own provider keys and inference is billed by your provider directly. BYOC is a scoped engagement, activated only when replay evidence shows it beats managed providers.
How does reuse change my bill?
Reused prefix tokens are billed at the provider cache-read rate instead of full input price. The calculator shows the difference; in our corpus, median realized reuse is around 53% on agent workloads.
$5/month plus pay-as-you-go credits
Sign up, point an OpenAI client at Zumik, and only pay for the inference and diagnostics you run.