V1 Feedback
| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the V1 Feedback category | 0 | 29 | March 20, 2025 |
| vLLM v1 forces me to pre-allocate a huge, non-reclaimable GPU KV cache for long contexts, and none of the current offload or quantization options solve the resulting VRAM bloat without crippling speed. | 5 | 160 | September 8, 2025 |
| GLM4.5 V memory leak on inference | 9 | 160 | August 28, 2025 |
| Feedback on an issue with registering and integrating a custom multimodal model into vLLM | 1 | 134 | August 7, 2025 |
| Cudagraph in V1 | 3 | 806 | July 21, 2025 |
| vLLM V1 - Default max CUDA graph size | 2 | 992 | June 30, 2025 |
| Does vLLM V1 support asynchronous scheduling? | 3 | 629 | June 27, 2025 |
| V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8) | 6 | 317 | June 12, 2025 |
| [Spec Decode] Why does the acceptance rate look close to the paper, but the throughput is still not high? | 1 | 175 | April 21, 2025 |
| Why does V1 not support Mamba models? | 1 | 89 | April 21, 2025 |
| The new V1 way to ~--cpu-offload-gb | 5 | 2191 | April 13, 2025 |
| Does vLLM v1 support Speculative Decoding now? | 4 | 261 | March 25, 2025 |