OpenAI gpt-oss-120b

Open-weights reasoning model served on Fireworks. A good fit for regulated and on-prem lanes where BYOC purge evidence is required.

Input: $0.15 / 1M tokens
Output: $0.60 / 1M tokens
Cache read: $0.01 / 1M tokens (−90%)
Context window: 131K tokens

At a glance.

Provider: Fireworks AI
Family: gpt_oss
Released: 2025-08
License: Open weights
Context window: 131K tokens
Max output: 33K tokens
Parameters: 117B
Modalities: text
Tool calling: Yes
Reasoning mode: Yes
Caching: automatic
Batch discount: No batch tier

What reuse looks like here.

OpenAI gpt-oss-120b · agent traffic, per request

Total input: 100%
Candidate reuse: 58%
Realized reuse: 46%
Capture rate: 79%
Warm TTFT: 120 ms (−64% vs cold)
Output tokens / sec: 290
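
The reuse and latency figures above appear to be related by simple ratios. The following is a minimal sketch, assuming that capture rate means realized reuse divided by candidate reuse and that "−64% vs cold" means warm TTFT is 36% of cold TTFT. Neither definition is spelled out on this page, and the function names are illustrative only.

```python
def capture_rate(candidate_reuse: float, realized_reuse: float) -> float:
    """Share of reusable input that was actually served from cache (assumed definition)."""
    if candidate_reuse <= 0:
        raise ValueError("candidate_reuse must be positive")
    return realized_reuse / candidate_reuse


def implied_cold_ttft_ms(warm_ttft_ms: float, warm_vs_cold: float) -> float:
    """Back out cold TTFT from warm TTFT and a relative change such as -0.64 (assumed meaning)."""
    return warm_ttft_ms / (1 + warm_vs_cold)


# Figures taken from this page: 58% candidate reuse, 46% realized reuse,
# 120 ms warm TTFT at -64% versus cold.
print(f"capture rate: {capture_rate(0.58, 0.46):.0%}")
print(f"implied cold TTFT: {implied_cold_ttft_ms(120, -0.64):.0f} ms")
```

Under these assumptions the ratio reproduces the listed 79% capture rate, and the implied cold TTFT comes out at roughly a third of a second.
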
Reuse economics

What you actually pay once caching works.

At a typical 55% prefix reuse, a million input tokens on OpenAI gpt-oss-120b effectively costs $0.08 instead of $0.15. With output making up 25% of the token mix, the blended price comes to roughly $0.21 per million tokens. There is no batch tier, so cost control here leans on caching and routing.
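
The arithmetic behind those figures can be checked with a short sketch. It assumes cache reads are billed at exactly 10% of the input price (the 90% discount listed on this page) and that "output share" means the fraction of all billed tokens that are output tokens. The function names are illustrative, not part of any Zumik API.

```python
INPUT_PRICE = 0.15     # USD per 1M input tokens (from this page)
OUTPUT_PRICE = 0.60    # USD per 1M output tokens (from this page)
CACHE_DISCOUNT = 0.90  # assumption: cache reads cost 10% of the input price


def effective_input_price(reuse: float) -> float:
    """Average price per 1M input tokens when a `reuse` fraction is read from cache."""
    cached_price = INPUT_PRICE * (1 - CACHE_DISCOUNT)
    return (1 - reuse) * INPUT_PRICE + reuse * cached_price


def blended_price(reuse: float, output_share: float) -> float:
    """Average price per 1M tokens when `output_share` of all tokens are output."""
    return (1 - output_share) * effective_input_price(reuse) + output_share * OUTPUT_PRICE


print(round(effective_input_price(0.55), 2))  # effective input price at 55% reuse
print(round(blended_price(0.55, 0.25), 2))    # blended price at a 25% output share
```

Setting `reuse` to 0 gives the uncached baseline, which is a quick way to see how much of the saving depends on caching actually landing.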

Best for: on-prem, regulated, high-volume

Routes through these aliases: auto.cheapest, code.cheapest, reasoning.best

Same OpenAI client, this model.

python
from openai import OpenAI

client = OpenAI(base_url="https://api.zumik.ai/v1", api_key="zk_live_...")

r = client.responses.create(
    model="gpt-oss-120b",          # or an alias like auto.cheapest
    input="Draft a fix for the failing test.",
)
print(r.usage.input_tokens_details.cached_tokens)   # cached input tokens; confirms reuse
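
To turn the raw counters into a reuse percentage, a small helper like the one below may be useful. It is a sketch that takes plain integers, so it does not depend on the exact shape of the usage object. The helper name and the token counts in the example are hypothetical.

```python
def reuse_share(input_tokens: int, cached_tokens: int) -> float:
    """Fraction of a request's input tokens that were served from cache."""
    if input_tokens <= 0:
        return 0.0  # nothing was sent, so nothing could be reused
    # Clamp so a malformed counter can never report more than 100% reuse.
    return min(cached_tokens, input_tokens) / input_tokens


# Hypothetical request: 12,000 input tokens, 6,600 of them cache hits.
print(f"{reuse_share(12_000, 6_600):.0%}")
```

Logging this value per request makes it easy to compare realized reuse against the figures quoted on this page.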

OpenAI gpt-oss-120b, answered.

How much does OpenAI gpt-oss-120b cost?

OpenAI gpt-oss-120b is $0.15 per million input tokens and $0.60 per million output tokens through Zumik. Cache reads are $0.01 per million, a 90% discount on input.

What is OpenAI gpt-oss-120b's context window?

OpenAI gpt-oss-120b supports a 131K-token context window with up to 33K output tokens.

Does OpenAI gpt-oss-120b support prompt caching?

Yes. Fireworks AI applies automatic prompt caching on both serverless and dedicated deployments. In the Zumik corpus, OpenAI gpt-oss-120b shows a median cache capture rate of 79% on agent workloads.

Which Zumik aliases route to OpenAI gpt-oss-120b?

OpenAI gpt-oss-120b is a candidate for the auto.cheapest, code.cheapest, and reasoning.best aliases. It is selected when it wins under the current routing policy.

Run OpenAI gpt-oss-120b with reuse measured.

Point an OpenAI client at Zumik and see exactly how much of this model's input you are reusing.