Gemma 4 26B A4B IT

Gemma 4 26B A4B IT on Zumik: live pricing, context, and caching, routable by id or alias through one OpenAI-compatible endpoint.

Input / 1M tokens
Output / 1M tokens
Cache read
Context window: 262K

At a glance.

Provider: Google
Family: gemma-4
Released: 2026-04
License: Open weights
Context window: 262K tokens
Max output: 33K tokens
Modalities: text, image
Tool calling: Yes
Reasoning mode: Yes
Caching: none
Batch discount: 50% off
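
The batch discount above can be turned into a rough cost estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, the per-1M-token prices are placeholders (this page does not list prices for the model), and it assumes the 50% discount applies equally to input and output tokens.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m,
                      batch=False):
    """Estimate cost in USD from per-1M-token prices.

    Assumption: the batch discount is a flat 50% on the whole request.
    """
    cost = (input_tokens / 1_000_000) * input_price_per_m
    cost += (output_tokens / 1_000_000) * output_price_per_m
    if batch:
        cost *= 0.5  # "Batch discount: 50% off" from the spec list
    return cost


# Placeholder prices for illustration, NOT published prices.
standard = estimate_cost_usd(2_000_000, 500_000, 0.10, 0.40)
batched = estimate_cost_usd(2_000_000, 500_000, 0.10, 0.40, batch=True)
print(f"standard: ${standard:.2f}, batched: ${batched:.2f}")
```

Swap in the real per-token prices once they are available for this model; the arithmetic stays the same.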

What reuse looks like here.

Not yet profiled

Pricing, context, and capabilities for Gemma 4 26B A4B IT are live, but it is outside the flagship set Zumik benchmarks in depth, so measured reuse, capture, and warm TTFT are not shown yet. Run a workload estimate or route it by id to start collecting traces.

Same OpenAI client, this model.

python
from openai import OpenAI

client = OpenAI(base_url="https://api.zumik.ai/v1", api_key="zk_live_...")

r = client.responses.create(
    model="gemma-4-26b-a4b-it",
    input="Draft a fix for the failing test.",
)
print(r.usage.input_tokens_details.cached_tokens)   # cached input tokens; confirms reuse

Gemma 4 26B A4B IT, answered.

How much does Gemma 4 26B A4B IT cost?

Gemma 4 26B A4B IT is an open-weights model routed through Google Gemini. It is priced on the host's serverless size tier rather than a single published per-token list price, so it shows "—" here until profiled.

What is Gemma 4 26B A4B IT's context window?

Gemma 4 26B A4B IT supports a 262K-token context window with up to 33K output tokens.
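
To check whether a request fits these limits before sending it, a small budget helper is usually enough. The sketch below is a minimal example under stated assumptions: `output_budget` is a hypothetical helper, the constants are the rounded "262K" and "33K" figures from this page (the exact limits may differ), and it assumes prompt and output tokens share the same context window.

```python
# Rounded figures from this page; the exact token limits may differ.
CONTEXT_WINDOW = 262_000
MAX_OUTPUT = 33_000


def output_budget(prompt_tokens,
                  context_window=CONTEXT_WINDOW,
                  max_output=MAX_OUTPUT):
    """Return the largest output-token budget that still fits.

    Returns 0 when the prompt alone fills or exceeds the window.
    """
    remaining = context_window - prompt_tokens
    return max(0, min(max_output, remaining))


print(output_budget(100_000))  # plenty of room: capped by max output
print(output_budget(250_000))  # near the limit: capped by remaining window
print(output_budget(300_000))  # prompt too large: nothing left
```

Counting `prompt_tokens` accurately requires the model's own tokenizer, which is outside the scope of this sketch.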

Run Gemma 4 26B A4B IT with reuse measured.

Point an OpenAI client at Zumik and see exactly how much of this model's input you are reusing.
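
As a rough illustration of what "how much input you are reusing" means, the share of reused input can be computed from two numbers in a response's usage data: total input tokens and cached input tokens. `reuse_ratio` below is a hypothetical helper, not part of any SDK, and which usage field carries the cached count is left as an assumption to verify against the actual response.

```python
def reuse_ratio(input_tokens, cached_tokens):
    """Fraction of input tokens served from cache, between 0.0 and 1.0."""
    if input_tokens <= 0:
        return 0.0  # avoid dividing by zero on empty requests
    return min(cached_tokens, input_tokens) / input_tokens


# Example numbers for illustration, not measured values.
print(f"{reuse_ratio(8_000, 6_000):.0%} of input tokens were reused")
```

Since caching is listed as "none" for this model, a ratio of zero would be the expected reading until that changes.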