Head-to-head

Claude Haiku 4.5 vs Gemini 3.5 Flash

The fastest profiled models for latency-critical agent loops.

Metric	Claude Haiku 4.5	Gemini 3.5 Flash
Provider	Anthropic	Google Gemini
Input / 1M	$1.00	$1.50
Output / 1M	$5.00	$9.00
Cache read / 1M	$0.10 (−90%)	$0.15 (−90%)
Reuse-adj @55%	$1.63	$2.82
Context	200K	1M
Cache capture	91%	85%
Warm TTFT	110ms (−66%)	150ms (−61%)
Quality index	76	94

Teal marks the better value in each row. Reuse-adjusted assumes 55% prefix reuse and a 25% output share.

Pick Claude Haiku 4.5 when

You want explicit caching and Anthropic-grade tool reliability at fast-tier latency.

Pick Gemini 3.5 Flash when

You want multimodal input and the lowest input price, with implicit caching and no breakpoints.

Verdict

Gemini 3.5 Flash edges price and modality; Claude Haiku 4.5 wins when tool-call quality and the 90% cache-read discount matter. Both are auto.fast targets.

Frequently asked

Claude Haiku 4.5 vs Gemini 3.5 Flash, answered.

Is Claude Haiku 4.5 or Gemini 3.5 Flash cheaper?

At 55% prefix reuse, Claude Haiku 4.5 blends to about $1.63 per 1M tokens and Gemini 3.5 Flash to about $2.82. Claude Haiku 4.5 is cheaper on that basis.

Which has the larger context window?

Gemini 3.5 Flash has the larger window: 1M vs 200K tokens.

Which captures more reuse?

Claude Haiku 4.5 shows higher measured cache capture (91% vs 85%) in the Zumik corpus.

Let an alias pick for you.

Route to whichever model wins under current policy automatically. Zumik resolves the alias to the best fit per request, so you never hard-code a loser.

How aliases work More comparisons