Prefill

The phase where a model reads and encodes the input prompt before it begins generating output tokens.

Inference splits into prefill (reading the prompt) and decode (generating tokens). Prefill cost scales with input length; decode cost scales with output length.

Agent workloads are prefill-heavy: long stable instructions and tool definitions, short outputs. That asymmetry is precisely why reuse matters - most of the bill is the part that repeats.

See it in practice.

Definitions are useful; measurement is better. Run a diagnostic on your own workload.