xAI prompt caching
Context caching. Here is how to capture the roughly 75% cache-read discount on real agent traffic, and the mistakes that quietly erase it.
How it works
Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and on routing to the cheaper Grok-3 Mini where quality allows.
What Zumik observes
Without a batch lane, xAI cost discipline lives entirely in alias routing and reuse. Zumik leans on Grok-3 Mini for auto.fast and reserves Grok 4 for auto.best to keep blended cost in range.
# Grok reuses a cached context across consecutive requests
# sharing the same prefix. No batch tier, so route background
# work elsewhere and keep the cheap Grok-3 Mini for auto.fast.
messages = [
    {"role": "system", "content": STABLE_POLICY},  # stable prefix, identical on every call
    {"role": "user", "content": turn},             # per-request content goes last
]
The common mistake: treating xAI like OpenAI for background jobs. There is no 50% batch discount to fall back on, so non-interactive work should usually route elsewhere.
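The routing rule described above can be sketched roughly as follows. This is a minimal illustration, not Zumik's actual router: the alias names come from the text, while the model identifiers `grok-3-mini` and `grok-4`, the `pick_route` helper, and the `route-elsewhere` marker are assumptions made for the example.

```python
# Hypothetical alias table. The aliases appear in the text above;
# the model identifier strings are assumed for illustration.
ALIAS_TO_MODEL = {
    "auto.fast": "grok-3-mini",  # cheap default where quality allows
    "auto.best": "grok-4",       # reserved for quality-sensitive calls
}

# Hypothetical marker meaning "send this job to a provider with a batch lane".
ROUTE_ELSEWHERE = "route-elsewhere"


def pick_route(alias: str, interactive: bool) -> str:
    """Return a model id for interactive traffic, or ROUTE_ELSEWHERE
    for background work that gains nothing from this lane."""
    if not interactive:
        # No batch discount here, so non-interactive jobs go elsewhere.
        return ROUTE_ELSEWHERE
    return ALIAS_TO_MODEL[alias]
```

For example, `pick_route("auto.fast", interactive=True)` selects the cheap model, while any call with `interactive=False` is sent off this lane regardless of alias.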
Capturing xAI caching.
- Order stable content first. Put system policy, tools, and durable context at the front of the prompt so the cacheable prefix is as long as possible.
- Avoid volatile content near the top. Keep timestamps, request ids, and per-call notes out of the prefix; they reset the match and drop the hit rate.
- Confirm the hit. Read the usage object for cached tokens to verify the prefix is being reused at the read rate.
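The three steps above can be sketched as one small helper pair. This is an illustration under stated assumptions: `STABLE_POLICY` is a placeholder string, `build_messages` and `cached_fraction` are hypothetical names, and the usage field names (`prompt_tokens`, `prompt_tokens_details.cached_tokens`) assume an OpenAI-style response shape, so check them against the response you actually receive.

```python
from datetime import datetime, timezone

# Placeholder for the stable system policy; must be byte-identical across calls.
STABLE_POLICY = "You are a support agent. Follow the company policy."


def build_messages(turn: str, request_id: str) -> list[dict]:
    """Put stable content first and volatile metadata last."""
    # Timestamps and request ids change on every call, so they are
    # appended to the final user message instead of the prefix.
    stamp = datetime.now(timezone.utc).isoformat()
    volatile = f"[request {request_id} at {stamp}]"
    return [
        {"role": "system", "content": STABLE_POLICY},
        {"role": "user", "content": f"{turn}\n\n{volatile}"},
    ]


def cached_fraction(usage: dict) -> float:
    """Share of prompt tokens served from cache (0.0 if none reported).

    Field names are assumed, not confirmed by the text above.
    """
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0
```

A fraction that stays near zero across consecutive requests with the same system prompt suggests something volatile has crept into the prefix.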
xAI caching, answered.
How does xAI prompt caching work?
Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and on routing to the cheaper Grok-3 Mini where quality allows.
What does xAI caching save?
Cache reads are about 75% cheaper than list input.
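To make the discount concrete, here is a small arithmetic sketch. The 75% figure is the approximate one stated above. The list price and the token volumes in the usage note are hypothetical inputs, not xAI pricing, and `effective_input_cost` is a name introduced only for this example.

```python
# Approximate discount from the text: cache reads cost about 25% of list input.
CACHE_READ_DISCOUNT = 0.75


def effective_input_cost(list_price_per_mtok: float,
                         input_mtok: float,
                         hit_rate: float) -> float:
    """Input cost when `hit_rate` of the input tokens are cache reads.

    list_price_per_mtok: hypothetical list price per million input tokens.
    input_mtok: total input volume in millions of tokens.
    hit_rate: fraction of input tokens served from cache, between 0 and 1.
    """
    cached = input_mtok * hit_rate
    fresh = input_mtok - cached
    read_price = list_price_per_mtok * (1 - CACHE_READ_DISCOUNT)
    return fresh * list_price_per_mtok + cached * read_price
```

With a hypothetical list price of 1.00 per million tokens, 10 million input tokens, and an 80% hit rate, the fresh 2 million cost 2.00 and the cached 8 million cost 2.00, so the total is 4.00 instead of 10.00.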
What is the most common mistake?
Treating xAI like OpenAI for background jobs - there is no 50% batch discount to fall back on, so non-interactive work should usually route elsewhere.
How long does xAI keep a cache warm?
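Only for a short idle window. Requests spaced further apart may find the cache cold, so confirm hits in the usage object rather than assuming reuse.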
Capture xAI caching automatically.
Zumik places stable content first, captures the discount, and reports how much you actually reused.