From trace to reuse-aware routing
Zumik sits between your agents and the providers. It measures what repeats, preserves the stable parts as reusable state, and routes each request through the cheapest reliable path, recording enough evidence that any decision can be explained or replayed later.
Step 01: Ingest a trace
Send metadata-only traces, tokenized captures, or provider exports. No raw prompts are required to start measuring.
Step 02: Score the workload
Prefix-family analysis produces a Workload Reuse Score and a reuse waterfall: opportunity, candidate, realized, and the missed-opportunity gap.
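As a rough illustration, a reuse waterfall could be tallied from metadata-only trace records along these lines. The record fields (`prefix_hash`, `prefix_tokens`, `cacheable`, `cached_tokens`) and the exact definition of each stage are assumptions made for this sketch, not Zumik's actual schema or scoring formula.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    prefix_hash: str    # hypothetical: hash identifying the shared prefix family
    prefix_tokens: int  # token length of that prefix
    cacheable: bool     # hypothetical: eligible for reuse under policy
    cached_tokens: int  # tokens the provider reported as served from cache


def reuse_waterfall(records):
    """Tally opportunity, candidate, realized, and the missed gap in tokens."""
    families = defaultdict(list)
    for record in records:
        families[record.prefix_hash].append(record)

    opportunity = candidate = realized = 0
    for members in families.values():
        # The first request in a family has nothing to reuse; repeats do.
        for record in members[1:]:
            opportunity += record.prefix_tokens
            if record.cacheable:
                candidate += record.prefix_tokens
            realized += min(record.cached_tokens, record.prefix_tokens)

    return {
        "opportunity": opportunity,
        "candidate": candidate,
        "realized": realized,
        "missed_gap": opportunity - realized,
    }
```

Here the missed-opportunity gap is taken as opportunity minus realized; a real diagnostic may define it against the candidate stage instead.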
Step 03: Model the state
Stable inputs become artifacts and bundles behind opaque handles. Sessions and branches give multi-turn flows a causal, conflict-safe history.
Step 04: Resolve an alias
Each request resolves a logical alias (code.fast, auto.best) through an immutable release, recording exactly which model answered and why.
Step 05: Execute and report
The broker picks a profile, captures provider-native caching, and returns a QoS outcome: admitted, degraded, missed, rejected, or expired.
Step 06: Prove deletion
Delete revokes handles; purge jobs remove state and emit profile-specific receipts with any remaining retention window.
Logical state is not physical KV state.
The most important architectural decision in Zumik is splitting identity into three layers. It keeps customer handles stable while preventing cache implementation details from leaking into the product.
Logical identity
Artifacts, bundles, sessions, branches, snapshots. Customer-visible, opaque, independent of provider, model, or tokenizer.
snapshot_id
Materialization identity
The exact model-visible byte representation: tokenizer, prompt-compiler version, ordered block manifest. Two requests can share logical state but materialize differently.
materialization_key
KV realization compatibility
Whether an existing physical KV cache can be reused safely: model revision, quantization, engine, GPU topology, isolation namespace. The implementation detail that never leaks into product semantics.
kv_compatibility_key
Two requests can share the same logical artifact yet need different KV realizations: a different tokenizer, a different quantization, or managed versus BYOC. Collapsing these layers is how gateways end up leaking cache details into their API. Zumik keeps them separate.
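One way to picture the three layers is as keys derived from progressively more physical inputs. The sketch below assumes each key is a truncated SHA-256 over the fields listed for its layer; the `mat_` and `kvc_` prefixes, the hashing scheme, and the class names are hypothetical and only illustrate the separation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def _digest(prefix: str, payload: dict) -> str:
    # Canonical JSON so equal inputs always produce the same key.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return f"{prefix}_{hashlib.sha256(blob).hexdigest()[:16]}"


@dataclass(frozen=True)
class Materialization:
    snapshot_id: str              # logical identity, customer-visible
    tokenizer: str
    prompt_compiler_version: str
    block_manifest: tuple         # ordered block ids

    @property
    def materialization_key(self) -> str:
        return _digest("mat", asdict(self))


@dataclass(frozen=True)
class KVRealization:
    materialization_key: str
    model_revision: str
    quantization: str
    engine: str
    gpu_topology: str
    isolation_namespace: str

    @property
    def kv_compatibility_key(self) -> str:
        return _digest("kvc", asdict(self))


# Same logical snapshot, different tokenizer: the materialization keys differ.
a = Materialization("snap_1", "tok-a", "pc-1", ("b1", "b2"))
b = Materialization("snap_1", "tok-b", "pc-1", ("b1", "b2"))
print(a.snapshot_id == b.snapshot_id)                  # True
print(a.materialization_key == b.materialization_key)  # False
```

Because each lower layer hashes the key of the layer above it, a change in tokenizer or quantization yields a new physical key without ever altering the customer-visible `snapshot_id`.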
A repeated prefix is not a cache hit.
Zumik reports what could be reused (opportunity) separately from what was reused (capture), and attaches an evidence level to every number so a prediction is never mistaken for a measurement.
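A minimal sketch of the idea, assuming just two evidence levels (`predicted` and `measured`); the level names, the `ReuseMetric` type, and the `capture_rate` helper are illustrative assumptions rather than Zumik's reporting API.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    PREDICTED = "predicted"  # estimated from the shape of the trace
    MEASURED = "measured"    # taken from provider-reported usage


@dataclass(frozen=True)
class ReuseMetric:
    name: str
    tokens: int
    evidence: Evidence


def capture_rate(opportunity: ReuseMetric, capture: ReuseMetric) -> float:
    """Fraction of the reuse opportunity that was actually captured."""
    # Refuse to present a prediction as if it were a measurement.
    if capture.evidence is not Evidence.MEASURED:
        raise ValueError("capture must be measured, not predicted")
    if opportunity.tokens == 0:
        return 0.0
    return capture.tokens / opportunity.tokens
```

Carrying the evidence level on the value itself means a report cannot silently mix estimated and observed numbers.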
Every decision leaves a record you can replay.
A request pins one snapshot and one alias release. An alias release is immutable: changing a provider-model revision creates a new release rather than mutating the old one. Customer logs expose the release id, so any past routing decision can be explained, and a replay run can reproduce it on the same workload shape.
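The immutability rule can be sketched with a frozen record and a read-only routing table. The `AliasRelease` type, the `publish` and `resolve` helpers, and the placeholder model names are hypothetical; only the output field names follow the sample log record in this section.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class AliasRelease:
    release_id: str
    routes: MappingProxyType  # alias -> (resolved_model, resolution_reason)


def publish(previous, release_id, alias, model, reason):
    """Create a new release; the previous one is left untouched."""
    routes = dict(previous.routes)
    routes[alias] = (model, reason)
    return AliasRelease(release_id, MappingProxyType(routes))


def resolve(release, alias, trace_id):
    """Resolve an alias against one pinned release and return a log record."""
    model, reason = release.routes[alias]
    return {
        "requested_model": alias,
        "alias_release": release.release_id,
        "resolved_model": model,
        "resolution_reason": reason,
        "trace_id": trace_id,
    }


reason = "lowest_expected_latency_under_policy"
r1 = AliasRelease(
    "alr_example_001",
    MappingProxyType({"code.fast": ("provider/model-a", reason)}),
)
r2 = publish(r1, "alr_example_002", "code.fast", "provider/model-b", reason)

# A request pinned to r1 still resolves to model-a after r2 is published.
print(resolve(r1, "code.fast", "trc_example")["resolved_model"])
print(resolve(r2, "code.fast", "trc_example")["resolved_model"])
```

Since a logged request names its release id, replaying it against that same release returns the same model and reason regardless of releases published afterwards.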
{
  "requested_model": "code.fast",
  "alias_release": "alr_2026_06_09_003",
  "resolved_model": "anthropic/claude-haiku-4-5",
  "resolution_reason": "lowest_expected_latency_under_policy",
  "trace_id": "trc_9f12…"
}
Measure first. Migrate from evidence.
Run a workload diagnostic on real traffic, then let the reuse waterfall decide what to optimize.