xAI prompt caching

Context caching. Here is how to capture the 75% cache-read discount on real agent traffic - and the mistakes that quietly erase it.

Cache-read discount: 75%
Write premium: none
Min cacheable tokens: 1,024

How it works

Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and on routing traffic to the cheaper Grok-3 Mini wherever quality allows.

What Zumik observes

Without a batch lane, xAI cost discipline lives entirely in alias routing and reuse. Zumik leans on Grok-3 Mini for auto.fast and reserves Grok 4 for auto.best to keep blended cost in range.
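
The alias routing described above can be sketched as a small lookup. This is a minimal illustration, not Zumik's implementation: the alias names come from the text, while the model ids (`grok-3-mini`, `grok-4`), the `resolve_model` helper, and the `background` flag are assumptions made for the example.

```python
from typing import Optional

# Hypothetical alias table. The aliases appear in the text above;
# the model ids are assumed and should be checked against your account.
ALIAS_TO_MODEL = {
    "auto.fast": "grok-3-mini",  # cheap default for interactive traffic
    "auto.best": "grok-4",       # reserved for quality-critical requests
}


def resolve_model(alias: str, background: bool = False) -> Optional[str]:
    """Return the xAI model id for an alias, or None to route elsewhere."""
    if background:
        # No batch discount on this lane, so non-interactive work
        # is handed to another provider by the caller.
        return None
    if alias not in ALIAS_TO_MODEL:
        raise ValueError(f"unknown alias: {alias!r}")
    return ALIAS_TO_MODEL[alias]
```

Returning `None` for background work keeps the "route elsewhere" decision explicit at the call site instead of hiding it inside the table.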

python - consecutive context reuse
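# Grok reuses a cached context across consecutive requests
# sharing the same prefix. No batch tier, so route background
# work elsewhere and keep the cheap Grok-3 Mini for auto.fast.

# Placeholder values for illustration; substitute your real policy and turn.
STABLE_POLICY = "You are a support agent. Follow the refund policy."
turn = "Where is my order?"

# Stable system content first, per-request user turn last.
messages = [
    {"role": "system", "content": STABLE_POLICY},
    {"role": "user", "content": turn},
]

Pitfall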

Treating xAI like OpenAI for background jobs - there is no 50% batch discount to fall back on, so non-interactive work should usually route elsewhere.

Capturing xAI caching.

  1. Order stable content first. Put system policy, tools, and durable context at the front of the prompt so the cacheable prefix is as long as possible.
  2. Avoid volatile content near the top. Keep timestamps, request ids, and per-call notes out of the prefix; they reset the match and drop the hit rate.
  3. Confirm the hit. Read the usage object for cached tokens to verify the prefix is being reused at the read rate.
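
The three steps above can be sketched as follows. This is a hedged example under stated assumptions: `build_messages` and `cached_fraction` are hypothetical helpers, and the usage field names (`prompt_tokens`, `prompt_tokens_details.cached_tokens`) assume an OpenAI-compatible response shape, so verify them against the response you actually receive.

```python
from typing import Any, Dict, List


def build_messages(stable_policy: str, durable_context: str,
                   turn: str, request_note: str = "") -> List[Dict[str, str]]:
    """Place stable content first and volatile content last."""
    messages = [
        # Steps 1 and 2: identical on every call, so the prefix can match.
        {"role": "system", "content": stable_policy},
        {"role": "system", "content": durable_context},
    ]
    # Timestamps, request ids, and per-call notes go at the end only.
    user_content = turn if not request_note else f"{turn}\n\n[note] {request_note}"
    messages.append({"role": "user", "content": user_content})
    return messages


def cached_fraction(usage: Dict[str, Any]) -> float:
    """Step 3: share of prompt tokens served from cache (0.0 if unknown)."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens", 0)
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0
```

A cached fraction that stays near zero across consecutive calls usually means something volatile has crept into the prefix, or the prefix is shorter than the 1,024-token minimum.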

xAI caching, answered.

How does xAI prompt caching work?

Grok models reuse a cached context prefix when consecutive requests share it. There is no async batch tier today, so cost control depends on cache hits and on routing traffic to the cheaper Grok-3 Mini wherever quality allows.

What does xAI caching save?

Cache reads cost about 75% less than the list input price, and there is no write premium for creating the cache.
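
To see what that discount means for a single request, here is a small worked calculation. It is a sketch only: `blended_input_cost` is a hypothetical helper, the list price passed in is a placeholder rather than a real xAI rate, and the 0.75 default simply restates the figure above.

```python
def blended_input_cost(prompt_tokens: int, cached_tokens: int,
                       list_price_per_mtok: float,
                       read_discount: float = 0.75) -> float:
    """Input cost for one request when part of the prompt is read from cache."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    # Cached tokens are billed at the discounted read rate.
    cached_price = list_price_per_mtok * (1 - read_discount)
    total = uncached_tokens * list_price_per_mtok + cached_tokens * cached_price
    return total / 1_000_000


# Placeholder price of 1.00 per million input tokens.
no_cache = blended_input_cost(10_000, 0, 1.00)
with_cache = blended_input_cost(10_000, 8_000, 1.00)
savings = 1 - with_cache / no_cache  # about 0.60 for an 80% cached prompt
```

With 80% of the prompt cached, the input bill falls by about 60%, not the full 75%, because the uncached tail is still charged at list price.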

What is the most common mistake?

Treating xAI like OpenAI for background jobs - there is no 50% batch discount to fall back on, so non-interactive work should usually route elsewhere.

How long does xAI keep a cache warm?

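Only for a short idle window, so the prefix stays warm only while requests that share it keep arriving close together.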

Capture xAI caching automatically.

Zumik places stable content first, captures the discount, and reports how much you actually reused.