There is a line in almost every inference bill that should be half its size: background work running on interactive tiers. Evaluations, backfills, summaries, nightly reprocessing. None of it needs a response in two seconds, yet it pays the price as if it did.
The discount is just sitting there
OpenAI, Anthropic, and Gemini all offer batch tiers at roughly half price in exchange for a completion window of up to 24 hours. Our trends show batch adoption climbing past 49% of clearly non-interactive tokens, which means a lot of teams have found this, and a lot still have not.
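To put a number on it, here is a back-of-the-envelope sketch. The function name, the parameters, and the figures in the usage note are illustrative assumptions, not measured data; the default discount of 0.5 stands in for the "roughly half price" above, and real discounts vary by provider and model.

```python
def batch_savings(monthly_spend: float,
                  background_share: float,
                  batch_discount: float = 0.5) -> float:
    """Estimate monthly savings from moving background work to a batch tier.

    monthly_spend    -- total inference bill, in any currency (hypothetical input)
    background_share -- fraction of that bill that is non-interactive work (0 to 1)
    batch_discount   -- fraction saved on batch traffic; 0.5 is an assumption
                        standing in for "roughly half price"
    """
    if not 0.0 <= background_share <= 1.0:
        raise ValueError("background_share must be between 0 and 1")
    if not 0.0 <= batch_discount <= 1.0:
        raise ValueError("batch_discount must be between 0 and 1")
    # Only the background portion of the bill gets the discount.
    return monthly_spend * background_share * batch_discount
```

Under these assumptions, a team spending 20,000 a month with a quarter of it going to background work would save about 2,500 a month (`batch_savings(20_000, 0.25)`). The point is less the exact figure than that the saving scales directly with how much background traffic you can identify.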
Why teams miss it
The blocker is rarely capability. It is that interactive and background traffic are not separated, so everything routes the same way by default. The fix is to tag background work with a background QoS class and let routing send it to a batch lane. xAI is the exception: it has no batch tier today, so background traffic should route away from it.
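A minimal sketch of that routing rule might look like the following. Everything here is hypothetical scaffolding rather than any real router's API: the `QoS` enum, the `Provider` record, the lane names, and the `route` function are invented for illustration, and the batch-tier flags simply restate the claims above. Falling back to the first batch-capable provider is also an assumption; in practice you would pick a fallback whose models suit the job.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class QoS(Enum):
    INTERACTIVE = "interactive"  # a user is waiting on the response
    BACKGROUND = "background"    # evals, backfills, summaries, nightly jobs


@dataclass(frozen=True)
class Provider:
    name: str
    has_batch_tier: bool


# Batch availability as described in the text; verify before relying on it.
PROVIDERS = [
    Provider("openai", True),
    Provider("anthropic", True),
    Provider("gemini", True),
    Provider("xai", False),
]


def route(qos: QoS, preferred: str,
          providers: list[Provider] = PROVIDERS) -> tuple[str, str]:
    """Return (provider_name, lane) for a request tagged with a QoS class."""
    by_name = {p.name: p for p in providers}
    if preferred not in by_name:
        raise ValueError(f"unknown provider: {preferred}")

    # Interactive traffic keeps its preferred provider and the fast lane.
    if qos is QoS.INTERACTIVE:
        return preferred, "interactive"

    # Background traffic uses the batch lane when the provider has one.
    if by_name[preferred].has_batch_tier:
        return preferred, "batch"

    # Otherwise route away to the first provider that offers a batch tier.
    for p in providers:
        if p.has_batch_tier:
            return p.name, "batch"

    # No batch lane anywhere: fall back to the interactive lane.
    return preferred, "interactive"
```

With this in place, `route(QoS.BACKGROUND, "xai")` is sent to a batch lane on another provider, while `route(QoS.INTERACTIVE, "xai")` stays where it is. The design choice worth noting is that the tag lives on the request, not the caller: once background work is labeled, the default path stops being the expensive one.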