GPT-5 Mini

Low-cost workhorse for high-frequency tool calls and classification. Cache reads land at roughly a tenth of list input price.

$0.25
Input / 1M tokens
$2.00
Output / 1M tokens
$0.03
Cache read · −90%
400K
Context window

At a glance.

Provider: OpenAI
Family: gpt-5
Released: 2025-08
License: Proprietary
Context window: 400K tokens
Max output: 128K tokens
Modalities: text, image
Tool calling: Yes
Reasoning mode: Yes
Caching: automatic
Batch discount: 50% off

What reuse looks like here.

GPT-5 Mini · agent traffic (per request)
Total input: 100%
Candidate reuse: 58%
Realized reuse: 49%
Capture rate: 85%
150ms
Warm TTFT · −63% vs cold
210
Output tokens / sec
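
The figures on this card fit together with simple arithmetic. The sketch below assumes that capture rate means realized reuse divided by candidate reuse, and that the −63% is a reduction relative to cold TTFT. Both are readings of the card rather than definitions stated on this page, and the function names are illustrative.

```python
def capture_rate(candidate_reuse: float, realized_reuse: float) -> float:
    """Share of the reusable input that was actually served from cache."""
    if candidate_reuse <= 0:
        raise ValueError("candidate_reuse must be positive")
    return realized_reuse / candidate_reuse


def implied_cold_ttft_ms(warm_ttft_ms: float, reduction: float) -> float:
    """Back out the cold TTFT from the warm TTFT and the fractional reduction."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return warm_ttft_ms / (1.0 - reduction)


# Rounded values as displayed on the card (assumed inputs).
rate = capture_rate(candidate_reuse=0.58, realized_reuse=0.49)
cold = implied_cold_ttft_ms(warm_ttft_ms=150, reduction=0.63)

print(f"capture rate: {rate:.1%}")
print(f"implied cold TTFT: {cold:.0f} ms")
```

From the rounded card values the capture rate comes out just under the displayed 85%, which is what you would expect if the card is computed from unrounded figures. The implied cold TTFT is about 405 ms.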
Reuse economics

What you actually pay once caching works.

At a typical 55% prefix reuse, a million input tokens on GPT-5 Mini effectively costs $0.13 instead of $0.25, which blends to roughly $0.59 per million tokens when output makes up 25% of traffic. Background work drops a further 50% on the batch tier.
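
The arithmetic behind those figures can be reproduced in a few lines. This is a sketch under stated assumptions, not Zumik's billing logic: it treats the cache-read price as an exact 90% discount ($0.025, displayed as $0.03), reads "output share" as the fraction of all tokens that are output, and applies the batch discount to the whole blended figure. The function names are illustrative.

```python
def effective_input_price(list_price: float, cache_price: float, reuse: float) -> float:
    """Per-1M input price when `reuse` of the input is billed at the cache-read rate."""
    return (1.0 - reuse) * list_price + reuse * cache_price


def blended_price(input_price: float, output_price: float, output_share: float) -> float:
    """Per-1M price across all tokens, given the fraction that is output."""
    return (1.0 - output_share) * input_price + output_share * output_price


LIST_INPUT = 0.25                       # $ per 1M input tokens
OUTPUT = 2.00                           # $ per 1M output tokens
CACHE_READ = LIST_INPUT * (1 - 0.90)    # assumption: exact 90% discount, shown as $0.03

eff = effective_input_price(LIST_INPUT, CACHE_READ, reuse=0.55)
blend = blended_price(eff, OUTPUT, output_share=0.25)
batch = blend * (1 - 0.50)              # assumption: batch discount applies to the blend

print(f"effective input: ${eff:.2f} / 1M")
print(f"blended:         ${blend:.2f} / 1M")
print(f"batch tier:      ${batch:.2f} / 1M")
```

With these inputs the sketch prints $0.13, $0.59, and $0.30. Plugging in the rounded $0.03 cache price instead nudges the blended figure up to about $0.60.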

Best for: high-volume, classification, tool-calls

Routes through these aliases: auto.cheapest, code.cheapest

Same OpenAI client, this model.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.zumik.ai/v1", api_key="zk_live_...")

r = client.responses.create(
    model="gpt-5-mini",          # or an alias like auto.cheapest
    input="Draft a fix for the failing test.",
)
# Cached input tokens, as reported in the Responses API usage block
print(r.usage.input_tokens_details.cached_tokens)   # confirm reuse
```
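
To turn that usage readout into a reuse percentage, divide cached input tokens by total input tokens. The sketch below assumes Zumik passes through the usage shape of OpenAI's Responses API (`input_tokens` and `input_tokens_details.cached_tokens`). `reuse_fraction` is an illustrative helper, and the stand-in usage object carries made-up numbers so the block runs without a network call.

```python
from types import SimpleNamespace


def reuse_fraction(usage) -> float:
    """Fraction of input tokens served from cache, or 0.0 if nothing is reported."""
    total = getattr(usage, "input_tokens", 0) or 0
    details = getattr(usage, "input_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) or 0
    return cached / total if total else 0.0


# Hypothetical usage payload; in practice pass `r.usage` from the call above.
fake_usage = SimpleNamespace(
    input_tokens=12_000,
    input_tokens_details=SimpleNamespace(cached_tokens=9_000),
)

print(f"{reuse_fraction(fake_usage):.0%} of input served from cache")
```

The helper returns 0.0 rather than raising when the cache details are missing, so it is safe to log on every request, including cold ones.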

GPT-5 Mini, answered.

How much does GPT-5 Mini cost?

GPT-5 Mini is $0.25 per million input tokens and $2.00 per million output tokens through Zumik. Cache reads are $0.03 per million tokens, a 90% discount on the input price.

What is GPT-5 Mini's context window?

GPT-5 Mini supports a 400K-token context window with up to 128K output tokens.

Does GPT-5 Mini support prompt caching?

Yes. OpenAI uses automatic prefix caching. In the Zumik corpus, GPT-5 Mini shows a median cache capture of 85% on agent workloads.

Which Zumik aliases route to GPT-5 Mini?

GPT-5 Mini is a candidate for the auto.cheapest and code.cheapest aliases, and is selected when it wins under the current routing policy.

Run GPT-5 Mini with reuse measured.

Point an OpenAI client at Zumik and see exactly how much of this model's input you are reusing.