Google Gemini

Gemini 3 family, implicit caching, up to 2M-token context.

Cache-read discount: 75%
Batch discount: 50%
Models on Zumik: 24
BYOK supported: Yes

How caching works here

Implicit caching applies discounts automatically when a request shares a long prefix with a recent one, with no breakpoints to manage. Explicit cached-content handles are also available for content you know will recur, trading setup for predictability.
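
Because implicit hits depend on a shared leading prefix, one common way to keep that prefix stable is to put long, unchanging content first and the per-request part last. The sketch below is a plain-Python illustration, not Gemini's or Zumik's API: `build_prompt` and `shared_prefix_chars` are hypothetical helpers, the sample strings are made up, and overlap is measured in characters rather than tokens.

```python
from os.path import commonprefix  # character-wise common prefix of strings


def build_prompt(system: str, documents: list[str], question: str) -> str:
    """Put stable content first so consecutive requests share a long prefix."""
    stable = "\n\n".join([system, *documents])
    return f"{stable}\n\nQuestion: {question}"


def shared_prefix_chars(a: str, b: str) -> int:
    """Length of the leading run of characters that two prompts share."""
    return len(commonprefix([a, b]))


# Hypothetical sample content.
system = "You answer questions about the attached handbook."
docs = ["Handbook chapter 1 text", "Handbook chapter 2 text"]

first = build_prompt(system, docs, "What is the leave policy?")
second = build_prompt(system, docs, "Who approves expenses?")

# Everything up to the question text is identical across the two requests.
stable_len = len("\n\n".join([system, *docs])) + len("\n\nQuestion: ")
overlap = shared_prefix_chars(first, second)
print(overlap >= stable_len)

# Counter-example: a per-request value at the front destroys the overlap.
bad_first = "Request 1\n" + first
bad_second = "Request 2\n" + second
print(shared_prefix_chars(bad_first, bad_second))
```

The counter-example shares only the literal "Request " before the two prompts diverge, which is why request IDs, timestamps, and similar per-call values are better placed at the end.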

What Zumik sees

Implicit caching is convenient, but its cache capture is the least predictable of the proprietary providers in our data: savings appear, then vanish when traffic interleaves. Zumik labels the resulting savings anywhere from trace_estimated to provider_reported, depending on how much cache detail the response carries.
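
The "appear, then vanish" pattern can be illustrated with a toy model. The sketch below assumes a small least-recently-used store of prefixes. That is an assumption for illustration only, not a description of how Gemini retains implicit cache entries, and `hit_rate`, `capacity`, and the traffic patterns are all made up.

```python
from collections import OrderedDict


def hit_rate(prefix_ids: list[str], capacity: int) -> float:
    """Fraction of requests whose prefix is still in a toy LRU store."""
    store = OrderedDict()
    hits = 0
    for pid in prefix_ids:
        if pid in store:
            hits += 1
            store.move_to_end(pid)  # mark as most recently used
        else:
            store[pid] = None
            if len(store) > capacity:
                store.popitem(last=False)  # evict least recently used
    return hits / len(prefix_ids)


# One workload reusing a single prefix: every request after the first hits.
steady = ["A"] * 12
# Three workloads interleaved through a store that holds two prefixes.
interleaved = ["A", "B", "C"] * 4

print(hit_rate(steady, capacity=2))       # 11 of 12 requests hit
print(hit_rate(interleaved, capacity=2))  # no request hits
```

In this toy setup the same prompts go from near-total reuse to none purely because other traffic arrives between repeats, which matches the behavior described above without claiming to explain its cause.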

Pitfall

Assuming the 2M-token window means everything is cheap. Implicit hits depend on recency and prefix overlap, not just on fitting inside the window.

Profile
Min cache size: 2,048 tokens
Retention: Implicit, minutes; explicit cached content configurable
Service tiers: standard, batch
BYOC: Managed only
multimodal, document-heavy RAG, very long context
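
The profile values and the pitfall above can be combined into a rough pre-flight check. This is a sketch under stated assumptions: the 2,048-token minimum comes from the profile, but `retention_seconds` is a placeholder because the page only says "minutes", and `implicit_hit_plausible` is a hypothetical helper, not a Zumik or Gemini API.

```python
MIN_CACHE_TOKENS = 2_048  # "Min cache size" from the profile


def implicit_hit_plausible(
    shared_prefix_tokens: int,
    seconds_since_previous: float,
    retention_seconds: float = 300.0,  # assumed placeholder, not documented
) -> bool:
    """Rough check: is the shared prefix long enough and recent enough?"""
    long_enough = shared_prefix_tokens >= MIN_CACHE_TOKENS
    recent_enough = seconds_since_previous <= retention_seconds
    return long_enough and recent_enough


# Fits the context window easily, but the prefix is too short to cache.
print(implicit_hit_plausible(1_500, seconds_since_previous=10))
# Long shared prefix, but the previous request was an hour ago.
print(implicit_hit_plausible(500_000, seconds_since_previous=3_600))
# Long shared prefix, reused within the assumed retention window.
print(implicit_hit_plausible(500_000, seconds_since_previous=30))
```

Passing this check does not guarantee a hit. It only rules out requests that cannot qualify under the stated minimum and the assumed retention window.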

Google Gemini models in the catalog.

Model                                     Context  Input   Output  Cache read (discount)  Reuse-adj
Gemini 2.0 Flash-Lite                     1M       $0.07   $0.30   $0.07                  $0.13
Gemini 2.0 Flash-Lite 001                 1M       $0.07   $0.30   $0.07                  $0.13
Gemini 2.0 Flash                          1M       $0.10   $0.40   $0.03 (75%)            $0.14
Gemini 2.0 Flash 001                      1M       $0.10   $0.40   $0.03 (75%)            $0.14
Gemini 2.5 Flash-Lite                     1M       $0.10   $0.40   $0.01 (90%)            $0.14
Gemini Flash-Lite Latest                  1M       $0.10   $0.40   $0.03 (75%)            $0.14
Gemini 3.1 Flash Lite                     1M       $0.25   $1.50   $0.03 (90%)            $0.47
Gemini 3.1 Flash Lite Preview             1M       $0.25   $1.50   $0.03 (90%)            $0.47
Gemini 2.5 Flash                          1M       $0.30   $2.50   $0.07 (75%)            $0.76
Gemini Flash Latest                       1M       $0.30   $2.50   $0.07 (75%)            $0.76
Gemini 3 Flash Preview                    1M       $0.50   $3.00   $0.05 (90%)            $0.94
Gemini 2.5 Pro                            1M       $1.25   $10.00  $0.13 (90%)            $2.97
Gemini 3.5 Flash                          1M       $1.50   $9.00   $0.15 (90%)            $2.82
Gemini 3 Pro Preview                      1M       $2.00   $12.00  $0.20 (90%)            $3.76
Gemini 3.1 Pro Preview                    1M       $2.00   $12.00  $0.20 (90%)            $3.76
Antigravity Agent Preview                 131K
Deep Research Max Preview (Apr-21-2026)   131K
Deep Research Preview (Apr-21-2026)       131K
Deep Research Pro Preview (Dec-12-2025)   131K
Gemini 2.5 Computer Use Preview 10-2025   131K
Gemini Pro Latest                         1M
Gemma 4 26B A4B IT                        262K
Gemma 4 31B IT                            262K
Nano Banana Pro                           131K
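
To see what a cache-read price means for a bill, the list prices above can be blended by hit rate. The sketch below uses two rows from the table. The hit rate is an input you would have to measure, the prices are treated as sharing one unit (the page does not state it), and `effective_input_price` is a hypothetical helper. It is not the formula behind the Reuse-adj column, which this page does not define.

```python
def effective_input_price(input_price: float, cache_read_price: float,
                          hit_rate: float) -> float:
    """Blend list and cache-read input prices by the share of cached tokens."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * input_price + hit_rate * cache_read_price


# (input, cache read) taken from the catalog rows above.
catalog = {
    "Gemini 2.5 Flash": (0.30, 0.07),
    "Gemini 3 Pro Preview": (2.00, 0.20),
}

for model, (inp, cached) in catalog.items():
    for rate in (0.0, 0.5, 0.9):
        price = effective_input_price(inp, cached, rate)
        print(f"{model}: hit rate {rate:.0%} -> ${price:.3f}")
```

At a hit rate of zero the blended price equals the list input price, so a large cache-read discount is worth nothing on traffic that never reuses a prefix.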

Google Gemini, answered.

How does Google Gemini prompt caching work?

Implicit caching applies discounts automatically when a request shares a long prefix with a recent one, with no breakpoints to manage. Explicit cached-content handles are also available for content you know will recur, trading setup for predictability.

What discount does Google Gemini caching give?

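At the headline rate, cache reads on Google Gemini are about 75% cheaper than the list input price. The catalog above lists 75% or 90% depending on the model, and no cache-read discount for the Gemini 2.0 Flash-Lite entries.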

Does Google Gemini support BYOK on Zumik?

Yes. You can bring your own Google Gemini key, and provider-native caching, batch, and service tiers stay active under your account.

What is the common Google Gemini caching mistake?

Assuming the 2M-token window means everything is cheap. Implicit hits depend on recency and prefix overlap, not just on fitting inside the window.

Route Google Gemini the smart way.

Capture Google Gemini's 75% cache-read discount and batch tier automatically through Zumik.