About monitoring KV cache memory usage

Hi,
I can set the GPU memory budget for the KV cache through the initialization parameter gpu_memory_utilization.
But how can I observe whether this portion of GPU memory is actually being used, in other words, how much of it is truly occupied? How can I monitor the usage of this part of GPU memory?

You can monitor GPU memory usage for the KV cache in vLLM by checking the logs during initialization, which report memory usage by component: how much is reserved for the KV cache and how much is taken by model weights and activations. For real-time monitoring, vLLM does not provide a built-in API that directly reports live KV cache usage, but you can observe overall GPU memory usage with tools like nvidia-smi and infer KV cache utilization from the difference between usage after model loading and usage during inference. The logs contain lines like: “model weights take X GiB; … the rest of the memory reserved for KV Cache is Y GiB”. You can compare that figure against the total GPU memory usage reported by system tools or by vLLM's own logs at runtime.
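As a rough illustration, a log line of that shape can be parsed to recover the reported KV cache reservation. The exact log text below is an assumed example, not verbatim vLLM output; adjust the regex to match what your vLLM version actually prints:

```python
import re

# Assumed example of a vLLM initialization log line; the exact wording
# varies by version, so adapt the pattern to your own logs.
log_line = ("model weights take 13.50GiB; "
            "the rest of the memory reserved for KV Cache is 8.05GiB")

def parse_kv_cache_gib(line: str):
    """Extract the KV cache reservation (in GiB) from an init log line."""
    match = re.search(r"reserved for KV Cache is ([\d.]+)\s*GiB", line)
    return float(match.group(1)) if match else None

print(parse_kv_cache_gib(log_line))  # 8.05
```

Comparing this parsed value against the memory reported by nvidia-smi during inference gives you a rough picture of how much of the reserved budget is in active use.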

For more granular monitoring, you may need to instrument the code or use external GPU profiling tools, since vLLM does not currently expose a direct API for live KV cache usage statistics. See this issue discussion for details and log examples.
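To reason about what the reserved budget means in practice, you can estimate how many tokens the KV cache can hold from the model's shape. Every parameter below is an illustrative assumption for a hypothetical model, not a value vLLM reports:

```python
# Illustrative back-of-the-envelope estimate of KV cache token capacity.
# All model parameters here are assumptions, not values reported by vLLM.
kv_cache_gib = 8.0      # KV cache budget taken from the init log
num_layers = 32         # transformer layers
num_kv_heads = 8        # KV heads (e.g. with grouped-query attention)
head_dim = 128          # dimension per head
bytes_per_elem = 2      # fp16/bf16

# Each token stores one key and one value vector per layer.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
capacity_tokens = int(kv_cache_gib * 1024**3 // bytes_per_token)

print(bytes_per_token)   # 131072 bytes of KV cache per token
print(capacity_tokens)   # 65536 tokens fit in the 8 GiB budget
```

Dividing the number of tokens currently resident in running sequences by this capacity gives a rough utilization figure, similar to what you would infer from external memory monitoring.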

Would you like more detail on how to interpret these logs or set up external monitoring?
