How caching works here
Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and routing the cheap Grok-3 Mini where quality allows.
What Zumik sees
Without a batch lane, xAI cost discipline lives entirely in alias routing and reuse. Zumik leans on Grok-3 Mini for auto.fast and reserves Grok 4 for auto.best to keep blended cost in range.
Treating xAI like OpenAI for background jobs - there is no 50% batch discount to fall back on, so non-interactive work should usually route elsewhere.
xAI models in the catalog.
| Model | Context | Input | Output | Cache read | Reuse-adj |
|---|---|---|---|---|---|
| Grok Build 0.1 | 256K | $1.00 | $2.00 | $0.20 −80% | $0.92 |
| Grok 4.20 (Non-Reasoning) | 1M | $1.25 | $2.50 | $0.20 −84% | $1.13 |
| Grok 4.20 (Reasoning) | 1M | $1.25 | $2.50 | $0.20 −84% | $1.13 |
| Grok 4.20 Multi-Agent | 1M | $1.25 | $2.50 | $0.20 −84% | $1.13 |
| Grok 4.3 | 1M | $1.25 | $2.50 | $0.20 −84% | $1.13 |
xAI, answered.
How does xAI prompt caching work?
Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and routing the cheap Grok-3 Mini where quality allows.
What discount does xAI caching give?
Cache reads on xAI are about 75% cheaper than list input price.
Does xAI support BYOK on Zumik?
Yes. You can bring your own xAI key, and provider-native caching, batch, and service tiers stay active under your account.
What is the common xAI caching mistake?
Treating xAI like OpenAI for background jobs - there is no 50% batch discount to fall back on, so non-interactive work should usually route elsewhere.
Route xAI the smart way.
Capture xAI's 75% cache-read discount automatically through Zumik.