xAI

Grok 4 and Grok-3 Mini, with context caching: cheap, fast frontier models.

Cache-read discount: 75%
Batch discount: none (no batch tier)
Models on Zumik: 5
BYOK supported: yes

How caching works here

Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and routing the cheap Grok-3 Mini where quality allows.
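As a rough illustration of why prefix reuse drives cost here, the sketch below estimates the input cost of a request given the one before it. It assumes the whole shared prefix is billed at the cache-read rate once it reaches the 1,024-token minimum listed in the profile, and applies the 75% headline discount. xAI's real matching granularity and billing may differ, so treat the result as an estimate. The function names are hypothetical, and token IDs are plain integers for simplicity.

```python
MIN_CACHE_TOKENS = 1_024    # minimum cacheable prefix, from the profile on this page
CACHE_READ_DISCOUNT = 0.75  # headline cache-read discount on this page


def shared_prefix_len(prev: list[int], curr: list[int]) -> int:
    """Count the leading tokens two consecutive requests have in common."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def input_cost(prev: list[int], curr: list[int], price_per_m: float) -> float:
    """Estimated input cost in USD for `curr` (price is per 1M tokens).

    Assumption: the shared prefix bills at the cache-read rate only once it
    reaches MIN_CACHE_TOKENS; shorter prefixes bill at the full list price.
    """
    cached = shared_prefix_len(prev, curr)
    if cached < MIN_CACHE_TOKENS:
        cached = 0  # too short to be cached
    uncached = len(curr) - cached
    cached_price = price_per_m * (1 - CACHE_READ_DISCOUNT)
    return (uncached * price_per_m + cached * cached_price) / 1_000_000


system_prompt = list(range(2_000))  # stand-in for a 2,000-token shared prefix
first = system_prompt + [1, 2, 3]
second = system_prompt + [7, 8]
print(input_cost(first, second, price_per_m=1.25))  # Grok 4.3 list input price
```

Keeping a long, stable system prompt at the front of every request is what lets the shared prefix clear the minimum and earn the discount.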

What Zumik sees

Without a batch lane, xAI cost discipline lives entirely in alias routing and reuse. Zumik leans on Grok-3 Mini for auto.fast and reserves Grok 4 for auto.best to keep blended cost in range.
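A minimal sketch of the alias routing described above, assuming a simple lookup table. `ALIASES`, `resolve_alias`, and `blended_price` are hypothetical names, not Zumik's API, and the model labels are display names rather than xAI API model IDs. The blended price is a plain traffic-weighted average, which is the quantity the auto.fast/auto.best split is meant to keep in range.

```python
# Hypothetical alias table mirroring the text. Labels are display names
# used for illustration, not xAI API model IDs.
ALIASES: dict[str, str] = {
    "auto.fast": "Grok-3 Mini",  # cheap default where quality allows
    "auto.best": "Grok 4",       # reserved for the hardest requests
}


def resolve_alias(alias: str) -> str:
    """Map a routing alias to the xAI model it targets."""
    try:
        return ALIASES[alias]
    except KeyError:
        raise ValueError(f"unknown alias: {alias!r}") from None


def blended_price(traffic_share: dict[str, float], price: dict[str, float]) -> float:
    """Traffic-weighted average price across aliases (shares must sum to 1)."""
    if abs(sum(traffic_share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(
        share * price[resolve_alias(alias)]
        for alias, share in traffic_share.items()
    )
```

Shifting traffic share toward auto.fast lowers the blended price roughly in proportion to the price gap between the two models.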

Pitfall

Treating xAI like OpenAI for background jobs: there is no 50% batch discount to fall back on, so non-interactive work should usually be routed elsewhere.
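The pitfall reduces to one routing rule, sketched here under stated assumptions: `Provider` and `pick_provider` are hypothetical names, and the 0.5 batch discount mirrors the OpenAI figure mentioned above. A real router would also weigh output quality and cache state, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    batch_discount: float  # 0.0 when the provider has no async batch tier


def pick_provider(interactive: bool, primary: Provider, fallbacks: list[Provider]) -> Provider:
    """Keep interactive traffic on `primary`; send background jobs to the
    batch-capable fallback with the deepest discount if `primary` has none."""
    if interactive or primary.batch_discount > 0:
        return primary
    batch_capable = [p for p in fallbacks if p.batch_discount > 0]
    if not batch_capable:
        return primary  # nothing better to fall back on
    return max(batch_capable, key=lambda p: p.batch_discount)


xai = Provider("xai", batch_discount=0.0)
openai = Provider("openai", batch_discount=0.5)
print(pick_provider(False, xai, [openai]).name)  # openai
print(pick_provider(True, xai, [openai]).name)   # xai
```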

Profile
Min cache size: 1,024 tokens
Retention: short idle window
Service tiers: standard
BYOC: managed only
Good for: low-latency chat, real-time knowledge, cheap frontier responses

xAI models in the catalog.

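Model | Context | Input | Output | Cache read (discount) | Reuse-adjusted
Grok Build 0.1 | 256K | $1.00 | $2.00 | $0.20 (80%) | $0.92
Grok 4.20 (Non-Reasoning) | 1M | $1.25 | $2.50 | $0.20 (84%) | $1.13
Grok 4.20 (Reasoning) | 1M | $1.25 | $2.50 | $0.20 (84%) | $1.13
Grok 4.20 Multi-Agent | 1M | $1.25 | $2.50 | $0.20 (84%) | $1.13
Grok 4.3 | 1M | $1.25 | $2.50 | $0.20 (84%) | $1.13
Prices in USD per 1M tokens.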
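The cache-read percentages in the table follow directly from the listed prices: the discount is one minus the ratio of cache-read price to input price. This sketch reproduces the 80% and 84% figures for a few rows. It assumes prices are USD per 1M tokens and does not attempt the reuse-adjusted column, whose formula is not given on this page.

```python
# Input and cache-read prices copied from the catalog table
# (assumed to be USD per 1M tokens).
CATALOG: dict[str, tuple[float, float]] = {
    "Grok Build 0.1": (1.00, 0.20),
    "Grok 4.20 (Reasoning)": (1.25, 0.20),
    "Grok 4.3": (1.25, 0.20),
}


def cache_read_discount(input_price: float, cache_read_price: float) -> float:
    """Fraction saved on a cached input token relative to the list input price."""
    return 1 - cache_read_price / input_price


for model, (input_price, cache_read_price) in CATALOG.items():
    print(f"{model}: {cache_read_discount(input_price, cache_read_price):.0%} off")
```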

xAI, answered.

How does xAI prompt caching work?

Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and routing the cheap Grok-3 Mini where quality allows.

What discount does xAI caching give?

Cache reads on xAI are about 75% cheaper than the list input price.

Does xAI support BYOK on Zumik?

Yes. You can bring your own xAI key, and provider-native caching and service tiers stay active under your account.

What is the common xAI caching mistake?

Treating xAI like OpenAI for background jobs: there is no 50% batch discount to fall back on, so non-interactive work should usually be routed elsewhere.

Route xAI the smart way.

Capture xAI's 75% cache-read discount automatically through Zumik.