TTFT (time to first token)

The latency from sending a request to receiving the first generated token, dominated by prefill on long prompts.

Time to first token is what users feel as responsiveness. For short prompts it is small; for long agent prompts it is dominated by prefill: the model has to process the whole input before decoding can start.
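
As a rough sketch of how TTFT can be measured on the client side: start a timer when the request is sent and stop it when the first token arrives. The names below (measure_ttft, fake_stream) are hypothetical, and fake_stream is only a stand-in that sleeps to imitate prefill; in practice you would pass a callable that opens your own streaming request.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> Tuple[float, float]:
    """Return (ttft_seconds, total_seconds) for one streamed request.

    start_stream is a hypothetical callable that sends the request and
    returns an iterable yielding generated tokens as they arrive.
    """
    t0 = time.perf_counter()
    ttft = None
    for _ in start_stream():
        if ttft is None:
            # First token has arrived: this elapsed time is the TTFT.
            ttft = time.perf_counter() - t0
    total = time.perf_counter() - t0
    if ttft is None:
        raise RuntimeError("stream produced no tokens")
    return ttft, total


def fake_stream(prefill_s: float, n_tokens: int = 5,
                per_token_s: float = 0.002) -> Iterator[str]:
    """Stand-in for a model server: sleep for 'prefill', then emit tokens."""
    time.sleep(prefill_s)  # simulated prefill delay (assumed value)
    for i in range(n_tokens):
        yield f"tok{i}"
        time.sleep(per_token_s)  # simulated per-token decode time


ttft, total = measure_ttft(lambda: fake_stream(prefill_s=0.05))
print(f"TTFT: {ttft * 1000:.1f} ms, total: {total * 1000:.1f} ms")
```

Timing starts before the stream is opened, so connection setup and any queueing count toward TTFT, which matches what a user actually waits for.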

A warm prefix, one that is already in the cache, cuts prefill work, so TTFT drops sharply when the cache hits. In our benchmark, a warm prefix reduces TTFT by roughly half to two-thirds.
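
One way to express a cold-versus-warm comparison as a single number is the fractional drop in median TTFT. The sketch below assumes you have already collected TTFT samples for both cases; the function name ttft_reduction is hypothetical and the sample values are made up for illustration, not taken from the benchmark.

```python
from statistics import median
from typing import Sequence


def ttft_reduction(cold_s: Sequence[float], warm_s: Sequence[float]) -> float:
    """Fractional drop in median TTFT from cold to warm requests."""
    cold_med = median(cold_s)
    warm_med = median(warm_s)
    if cold_med <= 0:
        raise ValueError("cold median must be positive")
    return 1.0 - warm_med / cold_med


# Made-up samples in seconds, for illustration only.
cold = [1.20, 1.10, 1.30, 1.25, 1.15]
warm = [0.50, 0.45, 0.55, 0.48, 0.52]

reduction = ttft_reduction(cold, warm)
print(f"median TTFT reduction: {reduction:.0%}")  # prints "median TTFT reduction: 58%"
```

The median is used rather than the mean so that a single slow outlier request does not skew the comparison.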

See it in practice.

Definitions are useful; measurement is better. Run a diagnostic on your own workload.