Topic | Replies | Views | Activity
Welcome to vLLM Forums! 👋 | 1 | 1508 | March 24, 2025
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu | 1 | 25 | June 5, 2026
Is there a hook/flag to capture activation statistics during inference for use with llm-compressor AWQ? | 3 | 24 | June 4, 2026
CUDA 12.8 not working with vLLM | 8 | 53 | June 4, 2026
Is nixl the reason my vLLM 0.20.0 fails to start? | 9 | 214 | June 3, 2026
Explanation of wait_for_save | 3 | 23 | June 2, 2026
(Possible bug) --enable-prompt-tokens-details not working? | 1 | 36 | June 2, 2026
Minimax m3 support | 1 | 158 | June 1, 2026
[Bug] Segfault in PythonSymNodeImpl and Deadlock on RTX 5090 (Blackwell) with vLLM 0.11.2 | 1 | 28 | June 1, 2026
[Bug] Segfault in cublasLt/cuLaunchKernel on RTX 5080 using v0.21.0 (V1 Engine) | 1 | 13 | May 31, 2026
Segfault in cublasLt/cuLaunchKernel on RTX 5080 using v0.21.0 (V1 Engine) | 1 | 16 | May 31, 2026
OOM Trying to run Gemma 4 31B NVFP4 on 2x16GB | 4 | 52 | May 31, 2026
Preserve reasoning state across turns | 1 | 26 | May 29, 2026
vLLM L40S quantization optimization | 19 | 48 | May 29, 2026
Does vLLM support Qwen3.5 PD disaggregation with Mooncake? | 1 | 35 | May 28, 2026
Issue using multiple GPUs to deploy multiple models with vLLM | 1 | 50 | May 28, 2026
How can vllm-ascend support responses? | 1 | 44 | May 27, 2026
How can we use the latest vLLM with older drivers that only support CUDA 12? | 3 | 55 | May 27, 2026
Python + vLLM: out of memory | 2 | 31 | May 26, 2026
Why Does Decode Forward on PP Stage 0 Appear to Precede Prefill Forward on PP Stage 1 for the Same Request? | 1 | 21 | May 26, 2026
With MTP enabled in a vLLM service, how do I evaluate the actual throughput for a given dimension? | 2 | 50 | May 25, 2026
How do I resolve DSML compatibility issues when AI tools use DeepSeek-V4 deployed locally with vLLM? | 2 | 387 | May 24, 2026
Gemma 4 26B + vLLM + FP8 on H100: TTFT good, E2E latency bad | 1 | 103 | May 23, 2026
vGPU on Podman: "No CUDA GPUs are available" | 0 | 23 | May 23, 2026
Image size for Qwen3-VL when performing 2D grounding tasks | 15 | 87 | May 21, 2026
vLLM Qwen3.6-27B Extended Latency on Jetson Thor 128GB with Large Prompts | 1 | 77 | May 20, 2026
vLLM Qwen3.6-27B Extended Latency on Jetson Thor 128GB with Large Prompts | 1 | 70 | May 20, 2026
Thinking-mode fields and content in model inference output when running vLLM on H800 | 1 | 48 | May 14, 2026
How to use Gemma 4 with the new MTP drafters? | 8 | 2565 | May 14, 2026
Multi-node PD-disaggregated deployment with vLLM | 2 | 194 | May 14, 2026