Cost · 5 min read

Half your background tokens belong on a batch tier

In most inference bills, non-interactive traffic is the most overpaid line. Moving it to a batch tier is a discount of roughly 50% waiting to be taken.

Published 2026-05-14

There is a line in almost every inference bill that should be half its size: background work running on interactive tiers. Evaluations, backfills, summaries, nightly reprocessing. None of it needs a response in two seconds, yet it pays the price as if it did.

The discount is just sitting there

OpenAI, Anthropic, and Gemini all offer batch tiers at roughly half price in exchange for a turnaround of up to 24 hours. Our trends show batch adoption has climbed past 49% of clearly non-interactive tokens. Many teams have found the discount, but roughly half of that traffic still runs at full price.
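To make the arithmetic concrete, here is a minimal sketch of a blended bill. It assumes a single flat per-token price and a batch discount of exactly 50%. The function name, the token volumes, and the $2-per-million price are hypothetical, chosen only for round numbers; real providers price input and output tokens separately.

```python
def monthly_cost(interactive_mtok: float, background_mtok: float,
                 price_per_mtok: float, batch_share: float,
                 batch_discount: float = 0.5) -> float:
    """Blended cost in dollars when part of background traffic runs on batch.

    interactive_mtok: interactive tokens, in millions
    background_mtok:  background tokens, in millions
    price_per_mtok:   interactive-tier price per million tokens (hypothetical flat rate)
    batch_share:      fraction of background tokens sent to a batch tier, 0..1
    batch_discount:   batch price reduction; 0.5 models "roughly half price"
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    interactive = interactive_mtok * price_per_mtok
    # Background tokens that stay on the interactive tier pay full price.
    unbatched = background_mtok * (1.0 - batch_share) * price_per_mtok
    # Background tokens moved to batch pay the discounted price.
    batched = background_mtok * batch_share * price_per_mtok * (1.0 - batch_discount)
    return interactive + unbatched + batched


if __name__ == "__main__":
    # Hypothetical month: 100M interactive tokens, 200M background tokens, $2 per 1M.
    for share in (0.0, 0.5, 1.0):
        print(f"batch share {share:.0%}: ${monthly_cost(100, 200, 2.0, share):.2f}")
```

In this hypothetical month the bill goes from $600 to $500 at a 50% batch share and to $400 with all background work on batch. The interactive line is untouched, so the headline 50% applies only to the background tokens, not to the whole bill.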

Why teams miss it

The blocker is rarely capability. The problem is that interactive and background traffic are not separated, so by default everything routes the same way. Tag background work with a background QoS class and let routing send it to a batch lane. xAI is the exception: it has no batch tier today, so background traffic should be routed away from it.
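A minimal sketch of that tagging and routing follows. It assumes each piece of work declares how long it can wait, and that the router knows which providers have a batch tier. `QoS`, `classify`, `route`, the capability table, and the `openai` default fallback are all hypothetical names for illustration, not any provider's API. The sketch only picks a lane; it does not submit batch jobs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class QoS(Enum):
    INTERACTIVE = "interactive"
    BACKGROUND = "background"


# Hypothetical capability table reflecting the post: OpenAI, Anthropic, and
# Gemini have batch tiers; xAI does not today.
BATCH_CAPABLE = {"openai": True, "anthropic": True, "gemini": True, "xai": False}

BATCH_TURNAROUND_HOURS = 24  # batch tiers trade up to 24h of latency for ~50% off


def classify(max_wait_hours: float) -> QoS:
    """Tag work as background when it can tolerate a full batch turnaround."""
    if max_wait_hours >= BATCH_TURNAROUND_HOURS:
        return QoS.BACKGROUND
    return QoS.INTERACTIVE


@dataclass
class Request:
    provider: str
    qos: QoS


def route(req: Request, fallback: str = "openai") -> Tuple[str, str]:
    """Return (provider, lane) for a request."""
    if req.qos is QoS.INTERACTIVE:
        return req.provider, "interactive"
    if BATCH_CAPABLE.get(req.provider, False):
        # Background work stays with its provider but moves to the batch lane.
        return req.provider, "batch"
    if not BATCH_CAPABLE.get(fallback, False):
        raise ValueError(f"fallback provider {fallback!r} has no batch tier")
    # No batch tier here (e.g. xAI): send background work to a provider that has one.
    return fallback, "batch"


if __name__ == "__main__":
    backfill = Request("xai", classify(max_wait_hours=48))
    chat_turn = Request("xai", classify(max_wait_hours=2 / 3600))  # two seconds
    print(route(backfill))   # ('openai', 'batch')
    print(route(chat_turn))  # ('xai', 'interactive')
```

Using 24 hours of tolerated wait as the cutoff is conservative, since it matches the turnaround the batch tiers quote. Routing away from a provider also means switching models, so a real router would need a model mapping that this sketch leaves out.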
