Most teams chasing inference cost reach for a new provider or a bigger discount. The first move should be free: put your prompt in the right order.
Stable first, volatile last
Every caching scheme rewards a stable prefix. So the order is: system policy, developer policy, the stable tool bundle, the response schema, durable workspace context, compacted checkpoints, then the dynamic parts - retrieval results, the latest user message, the latest tool result.
Get this wrong and the cache cannot help you. Get it right and capture often jumps without touching anything else.
The things that quietly break it
Timestamps in the system block. A request id rendered into the first message. A tool list that re-sorts itself between calls. A retrieved document spliced into the middle of otherwise stable context. Each of these resets the prefix at the worst possible place.
We surface these as stable-prefix violations in a diagnostic, because they are individually small and collectively expensive.