KV cache
The stored key/value attention tensors a model computes during prefill, kept so the same prefix does not have to be recomputed.
When a transformer reads a prompt it computes key and value tensors for every token. Those tensors - the KV cache - are what let generation attend back over the input. Recomputing them for a prefix you have already seen is pure waste.
Prompt caching, in every provider scheme and every self-hosted runtime, is fundamentally about keeping and reusing this KV state. Zumik treats the physical KV realization as an implementation detail behind opaque handles, and keeps logical state separate from it.
Keep reading.
Prompt caching
Reusing the computed state of a repeated prompt prefix so it is billed at a reduced cache-read rate instead of being recomputed.
Prefill
The phase where a model reads and encodes the input prompt before it begins generating output tokens.
TTFT (time to first token)
The latency from sending a request to receiving the first generated token, dominated by prefill on long prompts.
See it in practice.
Definitions are useful; measurement is better. Run a diagnostic on your own workload.