Claude Haiku 4.5 vs Gemini 3.5 Flash

The fastest profiled models for latency-critical agent loops.

MetricClaude Haiku 4.5Gemini 3.5 Flash
ProviderAnthropicGoogle Gemini
Input / 1M$1.00$1.50
Output / 1M$5.00$9.00
Cache read / 1M$0.10 (−90%)$0.15 (−90%)
Reuse-adj @55%$1.63$2.82
Context200K1M
Cache capture90%81%
Warm TTFT110ms (−66%)150ms (−61%)
Quality index8086

Teal marks the better value in each row. Reuse-adjusted assumes 55% prefix reuse and a 25% output share.

Pick Claude Haiku 4.5 when

You want explicit caching and Anthropic-grade tool reliability at fast-tier latency.

Pick Gemini 3.5 Flash when

You want multimodal input and the lowest input price, with implicit caching and no breakpoints.

Verdict

Gemini 3.5 Flash edges price and modality; Claude Haiku 4.5 wins when tool-call quality and the 90% cache-read discount matter. Both are auto.fast targets.

Claude Haiku 4.5 vs Gemini 3.5 Flash, answered.

Is Claude Haiku 4.5 or Gemini 3.5 Flash cheaper?

At 55% prefix reuse, Claude Haiku 4.5 blends to about $1.63 per 1M tokens and Gemini 3.5 Flash to about $2.82. Claude Haiku 4.5 is cheaper on that basis.

Which has the larger context window?

Gemini 3.5 Flash has the larger window: 1M vs 200K tokens.

Which captures more reuse?

Claude Haiku 4.5 shows higher measured cache capture (90% vs 81%) in the Zumik corpus.

Let an alias pick for you.

Route to whichever model wins under current policy automatically. Zumik resolves the alias to the best fit per request, so you never hard-code a loser.