Glossary

KV cache

The stored key/value attention tensors a model computes during prefill, kept so the same prefix does not have to be recomputed.

How sessions map to KV state

When a transformer reads a prompt it computes key and value tensors for every token. Those tensors - the KV cache - are what let generation attend back over the input. Recomputing them for a prefix you have already seen is pure waste.

Prompt caching, in every provider scheme and every self-hosted runtime, is fundamentally about keeping and reusing this KV state. Zumik treats the physical KV realization as an implementation detail behind opaque handles, and keeps logical state separate from it.

Related terms

See it in practice.

Definitions are useful; measurement is better. Run a diagnostic on your own workload.

Run a diagnostic Back to glossary

KV cache

Keep reading.

Prompt caching

Prefill

TTFT (time to first token)

See it in practice.