Qwen3-Next-80B-A3B-Instruct
Qwen3-Next-80B-A3B-Instruct on Zumik: live pricing, context, and caching, routable by id or alias through one OpenAI-compatible endpoint.
At a glance.
| Provider | Fireworks AI |
| Family | qwen3 |
| Released | 2026-05 |
| License | Open weights |
| Context window | 262K tokens |
| Max output | 64K tokens |
| Parameters | 80B |
| Modalities | text |
| Tool calling | Yes |
| Reasoning mode | No |
| Caching | automatic |
| Batch discount | No batch tier |
What reuse looks like here.
Pricing, context, and capabilities for Qwen3-Next-80B-A3B-Instruct are live, but it is outside the flagship set Zumik benchmarks in depth, so measured reuse, capture, and warm TTFT are not shown yet. Run a workload estimate or route it by id to start collecting traces.
Same OpenAI client, this model.
from openai import OpenAI
client = OpenAI(base_url="https://api.zumik.ai/v1", api_key="zk_live_...")
r = client.responses.create(
    model="qwen3-next-80b-a3b-instruct",
    input="Draft a fix for the failing test.",
)
print(r.usage.input_tokens_cached)  # confirm reuse
Qwen3-Next-80B-A3B-Instruct, answered.
How much does Qwen3-Next-80B-A3B-Instruct cost?
Qwen3-Next-80B-A3B-Instruct is an open-weights model routed through Fireworks AI. It is priced on the host's serverless size tier rather than at a single published per-token list price, so the price shows "—" here until the model is profiled.
What is Qwen3-Next-80B-A3B-Instruct's context window?
Qwen3-Next-80B-A3B-Instruct supports a 262K-token context window with up to 64K output tokens.
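As a rough illustration of what those limits mean for a request, the sketch below checks a token budget before sending. It rests on assumptions not stated on this page: that input and output tokens share the context window, and that the rounded figures "262K" and "64K" can stand in for the exact limits. The constant and function names are hypothetical.

```python
# Rounded figures from this page; the exact limits may differ slightly.
CONTEXT_WINDOW = 262_000  # assumed total budget for input + output tokens
MAX_OUTPUT = 64_000       # assumed cap on generated tokens


def fits_budget(input_tokens: int, requested_output: int) -> bool:
    """Return True if the request stays inside both assumed limits."""
    if input_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output > MAX_OUTPUT:
        return False
    # Assumption: the context window covers prompt and completion together.
    return input_tokens + requested_output <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(fits_budget(200_000, 60_000))
    print(fits_budget(200_000, 64_000))
```

Token counts here are plain integers; how you count tokens for a given prompt depends on the tokenizer and is outside this sketch.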
Does Qwen3-Next-80B-A3B-Instruct support prompt caching?
Yes. Fireworks AI uses automatic prompt caching (serverless and dedicated). In the Zumik corpus, Qwen3-Next-80B-A3B-Instruct shows a median cache capture of 80% on agent workloads.
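To make a figure like "median cache capture" concrete, here is a minimal sketch of one way it could be computed from per-request token counts. The definition used (cached input tokens divided by total input tokens) is an assumption, not Zumik's documented formula, and the function names are hypothetical.

```python
from statistics import median


def capture_ratio(input_tokens: int, cached_tokens: int) -> float:
    """Share of input tokens served from cache (assumed definition)."""
    if input_tokens <= 0:
        return 0.0
    return cached_tokens / input_tokens


def median_capture(samples: list[tuple[int, int]]) -> float:
    """Median capture over (input_tokens, cached_tokens) pairs."""
    if not samples:
        raise ValueError("need at least one sample")
    return median(capture_ratio(total, cached) for total, cached in samples)


if __name__ == "__main__":
    # Made-up counts for illustration only, not measured data.
    requests = [(1000, 800), (1000, 500), (1000, 900)]
    print(f"{median_capture(requests):.0%}")
```

The inputs are plain integers, so the sketch works with whichever usage fields your client reports for total and cached input tokens.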
Run Qwen3-Next-80B-A3B-Instruct with reuse measured.
Point an OpenAI client at Zumik and see exactly how much of this model's input you are reusing.
