OpenAI gpt-oss-120b
Open-weights reasoning model served on Fireworks. A good fit for regulated and on-prem lanes where BYOC purge evidence is required.
At a glance.
| Spec | Value |
| --- | --- |
| Provider | Fireworks AI |
| Family | gpt_oss |
| Released | 2025-08 |
| License | Open weights |
| Context window | 131K tokens |
| Max output | 33K tokens |
| Parameters | 117B |
| Modalities | text |
| Tool calling | Yes |
| Reasoning mode | Yes |
| Caching | automatic |
| Batch discount | No batch tier |
What reuse looks like here.
What you actually pay once caching works.
At a typical 55% prefix reuse, a million input tokens on OpenAI gpt-oss-120b effectively costs $0.08 instead of $0.15. With a 25% output share, the blended price comes to roughly $0.21 per million tokens. There is no batch tier, so cost control here leans on caching and routing.
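The arithmetic behind those figures can be sketched as follows. This is a minimal illustration, not Zumik's billing logic: it assumes cached input is billed at 10% of the list input price (the 90% discount quoted on this page), and the helper names `effective_input_price` and `blended_price` are hypothetical.

```python
# Hypothetical helpers for illustration; these names are not part of any Zumik SDK.
INPUT_PRICE = 0.15     # USD per million input tokens (from this page)
OUTPUT_PRICE = 0.60    # USD per million output tokens (from this page)
CACHE_DISCOUNT = 0.90  # assumption: cached input is billed at 10% of list price


def effective_input_price(reuse: float) -> float:
    """Price per million input tokens when a `reuse` fraction is read from cache."""
    cached_price = INPUT_PRICE * (1 - CACHE_DISCOUNT)
    return (1 - reuse) * INPUT_PRICE + reuse * cached_price


def blended_price(reuse: float, output_share: float) -> float:
    """Price per million total tokens, given the fraction that is output."""
    return (1 - output_share) * effective_input_price(reuse) + output_share * OUTPUT_PRICE


print(round(effective_input_price(0.55), 2))  # 0.08
print(round(blended_price(0.55, 0.25), 2))    # 0.21
```

Swapping in your own reuse rate and output share gives a quick estimate before measuring real traffic.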
Routes through these aliases: auto.cheapest, code.cheapest, reasoning.best.
Same OpenAI client, this model.
from openai import OpenAI
client = OpenAI(base_url="https://api.zumik.ai/v1", api_key="zk_live_...")
r = client.responses.create(
model="gpt-oss-120b", # or an alias like auto.cheapest
input="Draft a fix for the failing test.",
)
print(r.usage.input_tokens_cached)  # confirm reuse
How OpenAI gpt-oss-120b stacks up.
Other options for these workloads.
OpenAI gpt-oss-120b, answered.
How much does OpenAI gpt-oss-120b cost?
OpenAI gpt-oss-120b costs $0.15 per million input tokens and $0.60 per million output tokens through Zumik. Cache reads are about $0.01 per million, roughly a 90% discount on input.
What is OpenAI gpt-oss-120b's context window?
OpenAI gpt-oss-120b supports a 131K-token context window with up to 33K output tokens.
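A pre-flight check against these limits might look like the sketch below. It uses the rounded figures from this page (exact token limits may differ) and assumes that prompt and output tokens share the same context window, which this page does not state. The function name `fits_context` is hypothetical.

```python
# Nominal limits as rounded on this page; the exact token limits may differ.
CONTEXT_WINDOW = 131_000
MAX_OUTPUT = 33_000


def fits_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if the request fits, assuming output shares the context window."""
    if max_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW


print(fits_context(90_000, 8_000))    # True
print(fits_context(120_000, 20_000))  # False
```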
Does OpenAI gpt-oss-120b support prompt caching?
Yes. Fireworks AI applies automatic prompt caching on both serverless and dedicated deployments. In the Zumik corpus, OpenAI gpt-oss-120b shows a median cache capture of 79% on agent workloads.
Which Zumik aliases route to OpenAI gpt-oss-120b?
OpenAI gpt-oss-120b is a candidate for the auto.cheapest, code.cheapest, and reasoning.best aliases. It is selected whenever it wins under the current routing policy.
Run OpenAI gpt-oss-120b with reuse measured.
Point an OpenAI client at Zumik and see exactly how much of this model's input you are reusing.
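As a rough sketch of turning a response's usage numbers into a reuse percentage: the helper below assumes the usage object exposes `input_tokens` (total input, including cached tokens) and `input_tokens_cached` (the field used in the example on this page). Both field semantics are assumptions here, and `reuse_fraction` is a hypothetical name.

```python
from types import SimpleNamespace


def reuse_fraction(usage) -> float:
    """Share of input tokens served from cache, read from a usage-like object."""
    total = getattr(usage, "input_tokens", 0) or 0
    cached = getattr(usage, "input_tokens_cached", 0) or 0
    return cached / total if total else 0.0


# Stand-in for `r.usage`; real values would come from the API response.
sample = SimpleNamespace(input_tokens=12_000, input_tokens_cached=9_000)
print(f"{reuse_fraction(sample):.0%}")  # 75%
```

In practice you would pass `r.usage` from the response instead of the stand-in object, and log the result per request to track reuse over time.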