Claude Haiku 4.5 vs Gemini 3.5 Flash
The fastest profiled models for latency-critical agent loops.
| Metric | Claude Haiku 4.5 | Gemini 3.5 Flash |
|---|---|---|
| Provider | Anthropic | Google Gemini |
| Input / 1M | $1.00 | $1.50 |
| Output / 1M | $5.00 | $9.00 |
| Cache read / 1M | $0.10 (−90%) | $0.15 (−90%) |
| Reuse-adj @55% | $1.63 | $2.82 |
| Context | 200K | 1M |
| Cache capture | 90% | 81% |
| Warm TTFT | 110ms (−66%) | 150ms (−61%) |
| Quality index | 80 | 86 |
Teal marks the better value in each row. Reuse-adjusted assumes 55% prefix reuse and a 25% output share.
You want explicit caching and Anthropic-grade tool reliability at fast-tier latency.
You want multimodal input and the lowest input price, with implicit caching and no breakpoints.
Gemini 3.5 Flash edges price and modality; Claude Haiku 4.5 wins when tool-call quality and the 90% cache-read discount matter. Both are auto.fast targets.
Claude Haiku 4.5 vs Gemini 3.5 Flash, answered.
Is Claude Haiku 4.5 or Gemini 3.5 Flash cheaper?
At 55% prefix reuse, Claude Haiku 4.5 blends to about $1.63 per 1M tokens and Gemini 3.5 Flash to about $2.82. Claude Haiku 4.5 is cheaper on that basis.
Which has the larger context window?
Gemini 3.5 Flash has the larger window: 1M vs 200K tokens.
Which captures more reuse?
Claude Haiku 4.5 shows higher measured cache capture (90% vs 81%) in the Zumik corpus.
Let an alias pick for you.
Route to whichever model wins under current policy automatically. Zumik resolves the alias to the best fit per request, so you never hard-code a loser.