V1 Feedback
| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the V1 Feedback category | 0 | 33 | March 20, 2025 |
| vLLM v1 forces me to pre-allocate a huge, non-reclaimable GPU KV cache for long contexts, and none of the current offload or quantization options solve the resulting VRAM bloat without crippling speed | 5 | 209 | September 8, 2025 |
| GLM4.5 V memory leak on inference | 9 | 197 | August 28, 2025 |
| Feedback on an issue with registering a custom multimodal model and integrating it into vLLM | 1 | 172 | August 7, 2025 |
| Cudagraph in V1 | 3 | 865 | July 21, 2025 |
| vLLM V1 - Default max CUDA graph size | 2 | 1059 | June 30, 2025 |
| Does vLLM V1 support asynchronous scheduling? | 3 | 713 | June 27, 2025 |
| V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8) | 6 | 331 | June 12, 2025 |
| [Spec Decode] Why does the acceptance rate look close to the paper, but the throughput is still not high? | 1 | 184 | April 21, 2025 |
| Why does V1 not support Mamba models? | 1 | 97 | April 21, 2025 |
| The new V1 way to ~--cpu-offload-gb | 5 | 2344 | April 13, 2025 |
| Does vLLM v1 support Speculative Decoding now? | 4 | 283 | March 25, 2025 |