Anthropic prompt caching
Explicit cache_control breakpoints. Here is how to capture the 90% cache-read discount on real agent traffic - and the mistakes that quietly erase it.
How it works
You mark up to four cache breakpoints with cache_control. Everything before a breakpoint is cached. Reads are 90% off, but the first write costs 25% more than list input, so a breakpoint only pays off once the block is reused enough times to amortise the write.
What Zumik observes
Anthropic gives the highest read discount in the catalog, and our traces show the best median capture (around 90%+) - but only when breakpoints sit on genuinely stable blocks. Zumik places them from snapshot structure rather than guesswork.
# Mark up to 4 cache breakpoints.
system = [
{"type": "text", "text": STABLE_POLICY,
"cache_control": {"type": "ephemeral"}}, # everything before is cached
{"type": "text", "text": TOOL_DEFINITIONS,
"cache_control": {"type": "ephemeral"}},
]
# Read = -90%. First write = +25%. Only break on stable blocks.Putting a breakpoint after volatile content, or breaking on every turn, means you pay write premiums repeatedly and never recover them.
Capturing Anthropic caching.
- Order stable content first. Put system policy, tools, and durable context at the front of the prompt so the cacheable prefix is as long as possible.
- Place breakpoints on stable blocks. Mark cache breakpoints only on blocks that genuinely recur, so the write premium is amortized by repeated reads.
- Confirm the hit. Read the usage object for cached tokens to verify the prefix is being reused at the read rate.
Anthropic caching, answered.
How does Anthropic prompt caching work?
You mark up to four cache breakpoints with cache_control. Everything before a breakpoint is cached. Reads are 90% off, but the first write costs 25% more than list input, so a breakpoint only pays off once the block is reused enough times to amortise the write.
What does Anthropic caching save?
Cache reads are about 90% cheaper than list input, offset by a roughly 25% write premium on the first write.
What is the most common mistake?
Putting a breakpoint after volatile content, or breaking on every turn, means you pay write premiums repeatedly and never recover them.
How long does Anthropic keep a cache warm?
5 minutes default, 1 hour extended
Capture Anthropic caching automatically.
Zumik places stable content first, captures the discount, and reports how much you actually reused.