With prefill-only requests, kv_cache_usage_perc reaches a maximum of only 0.32?

I would appreciate any insight into the phenomena described in

The phenomena described in the GitHub issue "Better understanding of the kv_cache_usage_perc and num_requests_running metrics can be used to refine the saturation-based algorithm" (Issue #662, llm-d-incubation/workload-variant-autoscaler) involve unexpected behavior of the kv_cache_usage_perc and num_requests_running metrics in vLLM under certain load conditions. Specifically, even when the server is saturated (num_requests_waiting keeps growing while num_requests_running plateaus), kv_cache_usage_perc may stay low (e.g., around 32%). KV cache usage alone therefore may not reliably signal saturation, which suggests that autoscaling algorithms should combine it with the request-queue metrics (num_requests_running and num_requests_waiting) to accurately detect and respond to saturation states.
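To make the combined check concrete, here is a minimal Python sketch of a saturation detector that looks at a short window of metric samples. It is not the workload-variant-autoscaler's actual algorithm: the MetricSample type, the is_saturated function, and the default thresholds (kv_threshold=0.8, running_tolerance=1) are hypothetical choices for illustration. It flags saturation either when KV cache usage is high or when the waiting queue grows while the running count stays flat, so a case like the ~32% cache usage above is still detected.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MetricSample:
    """One scrape of the (hypothetically pre-parsed) vLLM metrics."""
    kv_cache_usage_perc: float  # fraction in [0.0, 1.0]
    num_requests_running: int
    num_requests_waiting: int


def is_saturated(
    samples: Sequence[MetricSample],
    kv_threshold: float = 0.8,   # hypothetical "cache nearly full" threshold
    running_tolerance: int = 1,  # hypothetical max spread for a "flat" running count
) -> bool:
    """Return True if the window of samples looks saturated."""
    if len(samples) < 2:
        return False  # need at least two samples to see a trend

    # Signal 1: KV cache usage is high in the most recent sample.
    if samples[-1].kv_cache_usage_perc >= kv_threshold:
        return True

    # Signal 2: the waiting queue grew across the window while the
    # number of running requests stayed roughly constant (a plateau).
    waiting = [s.num_requests_waiting for s in samples]
    running = [s.num_requests_running for s in samples]
    queue_growing = waiting[-1] > waiting[0]
    running_flat = max(running) - min(running) <= running_tolerance
    return queue_growing and running_flat


# Usage: low cache usage, but the queue grows while running plateaus.
window = [
    MetricSample(0.32, 64, 0),
    MetricSample(0.32, 64, 10),
    MetricSample(0.32, 64, 25),
]
print(is_saturated(window))  # True
```

Checking only kv_cache_usage_perc against the 0.8 threshold would report this window as unsaturated; the queue-growth signal is what catches it.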

Would you like more detail on how these metrics interact or how to use them for autoscaling?
