OpenAI

GPT-5 family, automatic prefix caching, flex/scale tiers.

75%
Cache-read discount
50%
Batch discount
68
Models on Zumik
Yes
BYOK supported

How caching works here

Caching is automatic for prompts at or above 1,024 tokens. The longest matching prefix is reused, billed at the cache-read rate, and reported back as cached tokens in the usage object. Keeping stable content at the front of the request is what makes the prefix match.

What Zumik sees

Across our corpus, OpenAI returns provider-reported cached-token counts on most eligible requests, which gives Zumik the strongest evidence level (provider_reported) for capture without any runtime instrumentation.

Pitfall

Injecting a timestamp, request id, or per-call system note near the top of the prompt resets the prefix and silently drops the hit rate to near zero.

Profile
Min cache size1,024 tok
RetentionMinutes idle, up to ~24h on extended retention
Service tiersflex, default, scale, priority
BYOCManaged only
general agentsmixed workloadsteams already on OpenAI

OpenAI models in the catalog.

ModelContextInputOutputCache readReuse-adj
GPT-5 Nano400K$0.05$0.40$0.01 90%$0.12
GPT-5 Nano-2025-08-07400K$0.05$0.40$0.01 90%$0.12
GPT-4.1 Nano1M$0.10$0.40$0.03 75%$0.14
GPT-4.1 Nano-2025-04-141M$0.10$0.40$0.03 75%$0.14
GPT-4o Mini128K$0.15$0.60$0.07 50%$0.23
GPT-4o Mini-2024-07-18128K$0.15$0.60$0.07 50%$0.23
GPT-5.4 Nano400K$0.20$1.25$0.02 90%$0.39
GPT-5.4 Nano-2026-03-17400K$0.20$1.25$0.02 90%$0.39
GPT-5 Mini400K$0.25$2.00$0.03 90%$0.59
GPT-5 Mini-2025-08-07400K$0.25$2.00$0.03 90%$0.59
GPT-4.1 Mini1M$0.40$1.60$0.10 75%$0.58
GPT-4.1 Mini-2025-04-141M$0.40$1.60$0.10 75%$0.58
GPT-3.5-turbo16K$0.50$1.50$0.00 100%$0.54
GPT-3.5-turbo-012516K$0.50$1.50$0.00 100%$0.54
GPT-3.5-turbo-110616K$0.50$1.50$0.00 100%$0.54
GPT-5.4 Mini400K$0.75$4.50$0.07 90%$1.41
GPT-5.4 Mini-2026-03-17400K$0.75$4.50$0.07 90%$1.41
o3 Mini200K$1.10$4.40$0.55 50%$1.70
o3 Mini-2025-01-31200K$1.10$4.40$0.55 50%$1.70
o4 Mini200K$1.10$4.40$0.28 75%$1.58
o4 Mini-2025-04-16200K$1.10$4.40$0.28 75%$1.58
GPT-5400K$1.25$10.00$0.13 90%$2.97
GPT-5 Chat400K$1.25$10.00$0.13 90%$2.97
GPT-5 Codex400K$1.25$10.00$0.13 90%$2.97
GPT-5-2025-08-07400K$1.25$10.00$0.13 90%$2.97
GPT-5.1400K$1.25$10.00$0.13 90%$2.97
GPT-5.1 Chat128K$1.25$10.00$0.13 90%$2.97
GPT-5.1 Codex400K$1.25$10.00$0.13 90%$2.97
GPT-5.1 Codex Max400K$1.25$10.00$0.13 90%$2.97
GPT-5.1-2025-11-13400K$1.25$10.00$0.13 90%$2.97
GPT-5.2400K$1.75$14.00$0.17 90%$4.16
GPT-5.2 Chat128K$1.75$14.00$0.17 90%$4.16
GPT-5.2 Codex400K$1.75$14.00$0.17 90%$4.16
GPT-5.2-2025-12-11400K$1.75$14.00$0.17 90%$4.16
GPT-5.3 Chat128K$1.75$14.00$0.17 90%$4.16
GPT-5.3 Codex400K$1.75$14.00$0.17 90%$4.16
GPT-4.11M$2.00$8.00$0.50 75%$2.88
GPT-4.1-2025-04-141M$2.00$8.00$0.50 75%$2.88
o3200K$2.00$8.00$0.50 75%$2.88
o3-2025-04-16200K$2.00$8.00$0.50 75%$2.88
o4 Mini-deep-research200K$2.00$8.00$0.50 75%$2.88
GPT-4o128K$2.50$10.00$1.25 50%$3.86
GPT-4o-2024-08-06128K$2.50$10.00$1.25 50%$3.86
GPT-4o-2024-11-20128K$2.50$10.00$1.25 50%$3.86
GPT-5.41.1M$2.50$15.00$0.25 90%$4.70
GPT-5.4-2026-03-051.1M$2.50$15.00$0.25 90%$4.70
GPT-4o-2024-05-13128K$5.00$15.00$5.00 $7.50
GPT-5.51.1M$5.00$30.00$0.50 90%$9.39
GPT-5.5-2026-04-231.1M$5.00$30.00$0.50 90%$9.39
GPT-4-turbo128K$10.00$30.00$10.00 $15.00
GPT-4-turbo-2024-04-09128K$10.00$30.00$10.00 $15.00
GPT-5 Pro400K$15.00$120.00$15.00 $41.25
GPT-5 Pro-2025-10-06400K$15.00$120.00$15.00 $41.25
o1200K$15.00$60.00$7.50 50%$23.16
o1-2024-12-17200K$15.00$60.00$7.50 50%$23.16
GPT-5.2 Pro400K$21.00$168.00$21.00 $57.75
GPT-5.2 Pro-2025-12-11400K$21.00$168.00$21.00 $57.75
GPT-48K$30.00$60.00$30.00 $37.50
GPT-4-06138K$30.00$60.00$30.00 $37.50
GPT-5.4 Pro1.1M$30.00$180.00$30.00 $67.50
GPT-5.4 Pro-2026-03-051.1M$30.00$180.00$30.00 $67.50
GPT-5.5 Pro1.1M$30.00$180.00$30.00 $67.50
GPT-5.5 Pro-2026-04-231.1M$30.00$180.00$30.00 $67.50
o1 Pro200K$150.00$600.00$150.00 $262.50
o1 Pro-2025-03-19200K$150.00$600.00$150.00 $262.50
GPT-3.5-turbo-16k
GPT-3.5-turbo-instruct
GPT-3.5-turbo-instruct-0914

OpenAI, answered.

How does OpenAI prompt caching work?

Caching is automatic for prompts at or above 1,024 tokens. The longest matching prefix is reused, billed at the cache-read rate, and reported back as cached tokens in the usage object. Keeping stable content at the front of the request is what makes the prefix match.

What discount does OpenAI caching give?

Cache reads on OpenAI are about 75% cheaper than list input price.

Does OpenAI support BYOK on Zumik?

Yes. You can bring your own OpenAI key, and provider-native caching, batch, and service tiers stay active under your account.

What is the common OpenAI caching mistake?

Injecting a timestamp, request id, or per-call system note near the top of the prompt resets the prefix and silently drops the hit rate to near zero.

Route OpenAI the smart way.

Capture OpenAI's 75% cache-read discount and batch tier automatically through Zumik.