|
Welcome to vLLM Forums! :wave:
|
|
1
|
1554
|
March 24, 2025
|
|
The current vLLM CPU backend is not working properly
|
|
9
|
151
|
June 20, 2026
|
|
What happened to 0.23.0 release?
|
|
1
|
147
|
June 15, 2026
|
|
What is the standard testing method for MTP in vLLM?
|
|
2
|
54
|
June 13, 2026
|
|
vLLM ROCm Gemma 4 and MTP
|
|
9
|
93
|
June 12, 2026
|
|
PR backlog on GitHub
|
|
2
|
35
|
June 12, 2026
|
|
Rust cannot catch foreign exceptions
|
|
1
|
30
|
June 11, 2026
|
|
Do we Have ChatBots Here
|
|
2
|
44
|
June 11, 2026
|
|
Prefix cache hit rate of the vLLM server backend vs. prefix cache hit rate of a single request
|
|
1
|
35
|
June 11, 2026
|
|
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu
|
|
1
|
73
|
June 5, 2026
|
|
Is there a hook/flag to capture activation statistics during inference for use with llm-compressor AWQ?
|
|
3
|
51
|
June 4, 2026
|
|
CUDA 12.8 not working with vLLM
|
|
8
|
295
|
June 4, 2026
|
|
Is nixl the reason my vLLM 0.20.0 fails to start?
|
|
9
|
293
|
June 3, 2026
|
|
Explanation for wait_for_save
|
|
3
|
52
|
June 2, 2026
|
|
(Possible bug) --enable-prompt-tokens-details not working?
|
|
1
|
124
|
June 2, 2026
|
|
Minimax m3 support
|
|
1
|
464
|
June 1, 2026
|
|
[Bug] Segfault in PythonSymNodeImpl and Deadlock on RTX 5090 (Blackwell) with vLLM 0.11.2
|
|
1
|
50
|
June 1, 2026
|
|
[Bug] Segfault in cublasLt/cuLaunchKernel on RTX 5080 using v0.21.0 (V1 Engine)
|
|
1
|
22
|
May 31, 2026
|
|
Segfault in cublasLt/cuLaunchKernel on RTX 5080 using v0.21.0 (V1 Engine)
|
|
1
|
29
|
May 31, 2026
|
|
OOM Trying to run Gemma 4 31B NVFP4 on 2x16GB
|
|
4
|
138
|
May 31, 2026
|
|
Preserve reasoning state across turns
|
|
1
|
64
|
May 29, 2026
|
|
vLLM L40S quantization optimization
|
|
19
|
168
|
May 29, 2026
|
|
Does vLLM support Qwen3.5 PD disaggregation with Mooncake?
|
|
1
|
69
|
May 28, 2026
|
|
An issue about using multiple GPUs to deploy multiple models with vLLM
|
|
1
|
86
|
May 28, 2026
|
|
How does vllm-ascend support responses?
|
|
1
|
67
|
May 27, 2026
|
|
How can we use the latest vLLM if we are using older drivers that only support CUDA 12?
|
|
3
|
132
|
May 27, 2026
|
|
python+vllm: Out of memory
|
|
2
|
61
|
May 26, 2026
|
|
Why Does Decode Forward on PP Stage 0 Appear to Precede Prefill Forward on PP Stage 1 for the Same Request?
|
|
1
|
35
|
May 26, 2026
|
|
With MTP enabled on the vLLM server, how do you evaluate the real throughput along a given dimension?
|
|
2
|
72
|
May 25, 2026
|
|
How to resolve DSML compatibility issues when AI tools use DeepSeek-V4 deployed locally with vLLM?
|
|
2
|
502
|
May 24, 2026
|