Best models for coding agents
A composite ranking for repository-scale coding agents: 60% quality index and 40% measured reuse, limited to tool-capable models that teams actually wire into coding loops.
| # | Model | Provider | Coding fit | Blended list price | Cache discount |
|---|---|---|---|---|---|
| 1 | Claude Fable 5 | Anthropic | 84.0 | $20.00 | −90% |
| 2 | Claude Opus 4.8 | Anthropic | 83.0 | $10.00 | −90% |
| 3 | GPT-5.5 | OpenAI | 79.2 | $11.25 | −90% |
| 4 | Claude Sonnet 4.6 | Anthropic | 77.2 | $6.00 | −90% |
| 5 | GLM 5.1 | Fireworks | 72.0 | $2.15 | −81% |
| 6 | DeepSeek-V4-Pro | Fireworks | 71.6 | $2.17 | −92% |
| 7 | Kimi K2.6 | Fireworks | 71.0 | $1.71 | −83% |
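The cache discount column is a percentage off the list rate. As a rough illustration of what that means for an agent loop, the sketch below computes the average price per input token when part of each prompt is served from cache. It assumes that the discount applies only to cached input tokens, and it ignores output tokens and any cache-write surcharge a provider may charge. The page does not say exactly how the discount is applied. The function name and the example numbers are hypothetical and do not come from the table.

```python
def effective_input_price(list_input_price: float,
                          cached_fraction: float,
                          cache_discount: float) -> float:
    """Average price per input token when a fraction of the prompt is billed
    at the discounted cached rate (hypothetical helper, input tokens only)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    # Cached tokens are billed at the list rate minus the discount.
    cached_price = list_input_price * (1.0 - cache_discount)
    # Uncached tokens are billed at the full list rate.
    return cached_fraction * cached_price + (1.0 - cached_fraction) * list_input_price


# Hypothetical: $5.00 list input price, 80% of prompt tokens cached, 90% discount.
price = effective_input_price(5.00, cached_fraction=0.8, cache_discount=0.9)
print(f"{price:.2f}")  # 1.40
```

With a high cache hit rate, the discount dominates: in this example the effective input price falls to 28% of list.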
Method
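Coding fit = 0.6 × intelligence + 0.4 × reuseMedianPct, where intelligence is the quality index and reuseMedianPct is the median measured reuse percentage. Only tool-capable models tagged for coding or agentic use are included.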
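A minimal sketch of the scoring and filtering step follows, assuming both inputs are already on a 0 to 100 scale and that "tagged for coding or agentic use" means the tag set contains either `coding` or `agentic`. The `ModelStats` type, the tag strings, and the example models and scores are hypothetical and are not the values behind the table.

```python
from dataclasses import dataclass

QUALITY_WEIGHT = 0.6  # weight on the quality (intelligence) index
REUSE_WEIGHT = 0.4    # weight on the median measured reuse percentage


@dataclass(frozen=True)
class ModelStats:
    name: str
    intelligence: float      # quality index, assumed 0-100
    reuse_median_pct: float  # median measured reuse, assumed 0-100
    tool_capable: bool
    tags: frozenset          # e.g. frozenset({"coding", "agentic"}) (hypothetical tags)


def coding_fit(m: ModelStats) -> float:
    """0.6 x intelligence + 0.4 x reuseMedianPct."""
    return QUALITY_WEIGHT * m.intelligence + REUSE_WEIGHT * m.reuse_median_pct


def rank_for_coding(models: list) -> list:
    """Keep tool-capable models tagged for coding or agentic use, best score first."""
    eligible = [
        m for m in models
        if m.tool_capable and m.tags & {"coding", "agentic"}
    ]
    return sorted(eligible, key=coding_fit, reverse=True)


models = [
    ModelStats("model-a", 80.0, 90.0, True, frozenset({"coding"})),
    ModelStats("model-b", 85.0, 70.0, True, frozenset({"agentic"})),
    # Higher raw scores, but excluded because it cannot call tools.
    ModelStats("model-c", 95.0, 95.0, False, frozenset({"coding"})),
]
for m in rank_for_coding(models):
    print(m.name, f"{coding_fit(m):.1f}")
# model-a 84.0
# model-b 79.0
```

Because reuse carries 40% of the weight, a model with a slightly lower quality index (model-a) can outrank a stronger one (model-b) if it captures much more reuse.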
Rank the same models differently:
- Cheapest models for cached agent workloads: which model is cheapest once a typical agent prefix is served from cache?
- Fastest time-to-first-token (warm prefix): which model responds fastest when the prefix is already cached?
- Best intelligence per dollar: which model gives the most quality per reuse-adjusted dollar?
- Highest measured cache capture: which models convert reuse opportunity into billed savings most reliably?