OpenAI
GPT-5 family, automatic prefix caching, flex/scale tiers.
How caching works here
Caching is automatic for prompts at or above 1,024 tokens. The longest matching prefix is reused, billed at the cache-read rate, and reported back as cached tokens in the usage object. Keeping stable content at the front of the request is what makes the prefix match.
What Zumik sees
Across our corpus, OpenAI returns provider-reported cached-token counts on most eligible requests, which gives Zumik the strongest evidence level (provider_reported) for capture without any runtime instrumentation.
Injecting a timestamp, request id, or per-call system note near the top of the prompt resets the prefix and silently drops the hit rate to near zero.
OpenAI models in the catalog.
| Model | Context | Input | Output | Cache read | Reuse-adj |
|---|---|---|---|---|---|
| GPT-5 Nano | 400K | $0.05 | $0.40 | $0.01 −90% | $0.12 |
| GPT-5 Nano-2025-08-07 | 400K | $0.05 | $0.40 | $0.01 −90% | $0.12 |
| GPT-4.1 Nano | 1M | $0.10 | $0.40 | $0.03 −75% | $0.14 |
| GPT-4.1 Nano-2025-04-14 | 1M | $0.10 | $0.40 | $0.03 −75% | $0.14 |
| GPT-4o Mini | 128K | $0.15 | $0.60 | $0.07 −50% | $0.23 |
| GPT-4o Mini-2024-07-18 | 128K | $0.15 | $0.60 | $0.07 −50% | $0.23 |
| GPT-5.4 Nano | 400K | $0.20 | $1.25 | $0.02 −90% | $0.39 |
| GPT-5.4 Nano-2026-03-17 | 400K | $0.20 | $1.25 | $0.02 −90% | $0.39 |
| GPT-5 Mini | 400K | $0.25 | $2.00 | $0.03 −90% | $0.59 |
| GPT-5 Mini-2025-08-07 | 400K | $0.25 | $2.00 | $0.03 −90% | $0.59 |
| GPT-4.1 Mini | 1M | $0.40 | $1.60 | $0.10 −75% | $0.58 |
| GPT-4.1 Mini-2025-04-14 | 1M | $0.40 | $1.60 | $0.10 −75% | $0.58 |
| GPT-3.5-turbo | 16K | $0.50 | $1.50 | $0.00 −100% | $0.54 |
| GPT-3.5-turbo-0125 | 16K | $0.50 | $1.50 | $0.00 −100% | $0.54 |
| GPT-3.5-turbo-1106 | 16K | $0.50 | $1.50 | $0.00 −100% | $0.54 |
| GPT-5.4 Mini | 400K | $0.75 | $4.50 | $0.07 −90% | $1.41 |
| GPT-5.4 Mini-2026-03-17 | 400K | $0.75 | $4.50 | $0.07 −90% | $1.41 |
| o3 Mini | 200K | $1.10 | $4.40 | $0.55 −50% | $1.70 |
| o3 Mini-2025-01-31 | 200K | $1.10 | $4.40 | $0.55 −50% | $1.70 |
| o4 Mini | 200K | $1.10 | $4.40 | $0.28 −75% | $1.58 |
| o4 Mini-2025-04-16 | 200K | $1.10 | $4.40 | $0.28 −75% | $1.58 |
| GPT-5 | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5 Chat | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5 Codex | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5-2025-08-07 | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5.1 | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5.1 Chat | 128K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5.1 Codex | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5.1 Codex Max | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5.1-2025-11-13 | 400K | $1.25 | $10.00 | $0.13 −90% | $2.97 |
| GPT-5.2 | 400K | $1.75 | $14.00 | $0.17 −90% | $4.16 |
| GPT-5.2 Chat | 128K | $1.75 | $14.00 | $0.17 −90% | $4.16 |
| GPT-5.2 Codex | 400K | $1.75 | $14.00 | $0.17 −90% | $4.16 |
| GPT-5.2-2025-12-11 | 400K | $1.75 | $14.00 | $0.17 −90% | $4.16 |
| GPT-5.3 Chat | 128K | $1.75 | $14.00 | $0.17 −90% | $4.16 |
| GPT-5.3 Codex | 400K | $1.75 | $14.00 | $0.17 −90% | $4.16 |
| GPT-4.1 | 1M | $2.00 | $8.00 | $0.50 −75% | $2.88 |
| GPT-4.1-2025-04-14 | 1M | $2.00 | $8.00 | $0.50 −75% | $2.88 |
| o3 | 200K | $2.00 | $8.00 | $0.50 −75% | $2.88 |
| o3-2025-04-16 | 200K | $2.00 | $8.00 | $0.50 −75% | $2.88 |
| o4 Mini-deep-research | 200K | $2.00 | $8.00 | $0.50 −75% | $2.88 |
| GPT-4o | 128K | $2.50 | $10.00 | $1.25 −50% | $3.86 |
| GPT-4o-2024-08-06 | 128K | $2.50 | $10.00 | $1.25 −50% | $3.86 |
| GPT-4o-2024-11-20 | 128K | $2.50 | $10.00 | $1.25 −50% | $3.86 |
| GPT-5.4 | 1.1M | $2.50 | $15.00 | $0.25 −90% | $4.70 |
| GPT-5.4-2026-03-05 | 1.1M | $2.50 | $15.00 | $0.25 −90% | $4.70 |
| GPT-4o-2024-05-13 | 128K | $5.00 | $15.00 | $5.00 | $7.50 |
| GPT-5.5 | 1.1M | $5.00 | $30.00 | $0.50 −90% | $9.39 |
| GPT-5.5-2026-04-23 | 1.1M | $5.00 | $30.00 | $0.50 −90% | $9.39 |
| GPT-4-turbo | 128K | $10.00 | $30.00 | $10.00 | $15.00 |
| GPT-4-turbo-2024-04-09 | 128K | $10.00 | $30.00 | $10.00 | $15.00 |
| GPT-5 Pro | 400K | $15.00 | $120.00 | $15.00 | $41.25 |
| GPT-5 Pro-2025-10-06 | 400K | $15.00 | $120.00 | $15.00 | $41.25 |
| o1 | 200K | $15.00 | $60.00 | $7.50 −50% | $23.16 |
| o1-2024-12-17 | 200K | $15.00 | $60.00 | $7.50 −50% | $23.16 |
| GPT-5.2 Pro | 400K | $21.00 | $168.00 | $21.00 | $57.75 |
| GPT-5.2 Pro-2025-12-11 | 400K | $21.00 | $168.00 | $21.00 | $57.75 |
| GPT-4 | 8K | $30.00 | $60.00 | $30.00 | $37.50 |
| GPT-4-0613 | 8K | $30.00 | $60.00 | $30.00 | $37.50 |
| GPT-5.4 Pro | 1.1M | $30.00 | $180.00 | $30.00 | $67.50 |
| GPT-5.4 Pro-2026-03-05 | 1.1M | $30.00 | $180.00 | $30.00 | $67.50 |
| GPT-5.5 Pro | 1.1M | $30.00 | $180.00 | $30.00 | $67.50 |
| GPT-5.5 Pro-2026-04-23 | 1.1M | $30.00 | $180.00 | $30.00 | $67.50 |
| o1 Pro | 200K | $150.00 | $600.00 | $150.00 | $262.50 |
| o1 Pro-2025-03-19 | 200K | $150.00 | $600.00 | $150.00 | $262.50 |
| GPT-3.5-turbo-16k | — | — | — | — | — |
| GPT-3.5-turbo-instruct | — | — | — | — | — |
| GPT-3.5-turbo-instruct-0914 | — | — | — | — | — |
OpenAI, answered.
How does OpenAI prompt caching work?
Caching is automatic for prompts at or above 1,024 tokens. The longest matching prefix is reused, billed at the cache-read rate, and reported back as cached tokens in the usage object. Keeping stable content at the front of the request is what makes the prefix match.
What discount does OpenAI caching give?
Cache reads on OpenAI are about 75% cheaper than list input price.
Does OpenAI support BYOK on Zumik?
Yes. You can bring your own OpenAI key, and provider-native caching, batch, and service tiers stay active under your account.
What is the common OpenAI caching mistake?
Injecting a timestamp, request id, or per-call system note near the top of the prompt resets the prefix and silently drops the hit rate to near zero.
Route OpenAI the smart way.
Capture OpenAI's 75% cache-read discount and batch tier automatically through Zumik.