KV cache

The stored key/value attention tensors a model computes during prefill, kept so the same prefix does not have to be recomputed.

When a transformer reads a prompt it computes key and value tensors for every token. Those tensors - the KV cache - are what let generation attend back over the input. Recomputing them for a prefix you have already seen is pure waste.

Prompt caching, in every provider scheme and every self-hosted runtime, is fundamentally about keeping and reusing this KV state. Zumik treats the physical KV realization as an implementation detail behind opaque handles, and keeps logical state separate from it.

See it in practice.

Definitions are useful; measurement is better. Run a diagnostic on your own workload.