But does vLLM allow the KV cache to grow without limit?