TTFT (time to first token)
The latency from sending a request to receiving the first generated token, dominated by prefill on long prompts.
Time to first token is what users feel as responsiveness. For short prompts it is small; for long agent prompts it is dominated by prefill - the cost of reading the whole input before decoding starts.
A warm prefix cuts prefill, so TTFT drops sharply when caching hits. In our benchmark, a warm prefix reduces TTFT by roughly half to two-thirds.
Keep reading.
Prefill
The phase where a model reads and encodes the input prompt before it begins generating output tokens.
Prompt caching
Reusing the computed state of a repeated prompt prefix so it is billed at a reduced cache-read rate instead of being recomputed.
KV cache
The stored key/value attention tensors a model computes during prefill, kept so the same prefix does not have to be recomputed.
See it in practice.
Definitions are useful; measurement is better. Run a diagnostic on your own workload.