GPT-5 Mini
Low-cost workhorse for high-frequency tool calls and classification. Cache reads land at roughly a quarter of list input price.
At a glance.
| Provider | OpenAI |
| Family | gpt-5 |
| Released | 2025-08 |
| License | Proprietary |
| Context window | 400K tokens |
| Max output | 128K tokens |
| Modalities | text, image |
| Tool calling | Yes |
| Reasoning mode | Yes |
| Caching | automatic |
| Batch discount | 50% off |
What reuse looks like here.
What you actually pay once caching works.
At a typical 55% prefix reuse, a million input tokens on GPT-5 Mini effectively costs $0.13 instead of $0.25 - blending to roughly $0.59 with a 25% output share. Background work drops a further 50% on the batch tier.
Estimate it for your workloadRoutes through these aliases:
Same OpenAI client, this model.
from openai import OpenAI
client = OpenAI(base_url="https://api.zumik.ai/v1", api_key="zk_live_...")
r = client.responses.create(
model="gpt-5-mini", # or an alias like auto.cheapest
input="Draft a fix for the failing test.",
)
print(r.usage.input_tokens_cached) # confirm reuseOther options for these workloads.
GPT-5 Mini, answered.
How much does GPT-5 Mini cost?
GPT-5 Mini is $0.25 per million input tokens and $2.00 per million output tokens through Zumik. Cache reads are $0.03 per million, a 90% discount on input.
What is GPT-5 Mini's context window?
GPT-5 Mini supports a 400K-token context window with up to 128K output tokens.
Does GPT-5 Mini support prompt caching?
Yes. OpenAI uses Automatic prefix caching caching. In the Zumik corpus, GPT-5 Mini shows a median cache capture of 85% on agent workloads.
Which Zumik aliases route to GPT-5 Mini?
GPT-5 Mini is a candidate for the auto.cheapest, code.cheapest aliases, selected when it wins under current routing policy.
Run GPT-5 Mini with reuse measured.
Point an OpenAI client at Zumik and see exactly how much of this model's input you are reusing.