V1 Feedback
| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the V1 Feedback category | 0 | 29 | March 20, 2025 |
| vLLM v1 forces me to pre-allocate a huge, non-reclaimable GPU KV cache for long contexts, and none of the current offload or quantization options solve the resulting VRAM bloat without crippling speed. | 5 | 160 | September 8, 2025 |
| GLM4.5 V memory leak on inference | 9 | 160 | August 28, 2025 |
| Feedback on an issue with registering and integrating a custom multimodal model into vLLM | 1 | 134 | August 7, 2025 |
| Cudagraph in V1 | 3 | 806 | July 21, 2025 |
| vLLM V1 - Default max CUDA graph size | 2 | 992 | June 30, 2025 |
| Does vLLM V1 support asynchronous scheduling? | 3 | 629 | June 27, 2025 |
| V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8) | 6 | 317 | June 12, 2025 |
| [Spec Decode] Why does the acceptance rate look close to the paper, but the throughput is still not high? | 1 | 175 | April 21, 2025 |
| Why does V1 not support Mamba models? | 1 | 89 | April 21, 2025 |
| The new V1 way to ~--cpu-offload-gb | 5 | 2191 | April 13, 2025 |
| Does vLLM v1 support Speculative Decoding now? | 4 | 261 | March 25, 2025 |