|
Welcome to vLLM Forums! :wave:
|
|
1
|
1584
|
March 24, 2025
|
|
About the General category
|
|
0
|
108
|
March 17, 2025
|
|
HIP failure: the operation cannot be performed in the present state
|
|
5
|
24
|
June 28, 2026
|
|
Trying to pass through two 7900 XTX GPUs to a Proxmox Ubuntu VM
|
|
13
|
23
|
June 27, 2026
|
|
vllm serve starts the service but throws an error at the end
|
|
1
|
20
|
June 27, 2026
|
|
Can vLLM's sleep mode not be used under WSL2 on Windows 10?
|
|
1
|
30
|
June 27, 2026
|
|
Speech To Text Guidance
|
|
1
|
31
|
June 26, 2026
|
|
What is the correct chat template when serving gemma4?
|
|
2
|
533
|
June 25, 2026
|
|
Seeking Help: DeepSeek-V4-Flash Fails to Deploy on 8×4090 (48GB VRAM per Card)
|
|
1
|
76
|
June 25, 2026
|
|
Sparse Embedding Support
|
|
2
|
41
|
June 24, 2026
|
|
No available shared memory broadcast block found in 60 seconds
|
|
1
|
91
|
June 22, 2026
|
|
Max_model_len vs GPU memory Usage
|
|
2
|
49
|
June 22, 2026
|
|
Ahead of time compilation of "CUDA Kernels"?
|
|
3
|
62
|
June 21, 2026
|
|
The current vLLM CPU backend is not working properly
|
|
9
|
225
|
June 20, 2026
|
|
What happened to 0.23.0 release?
|
|
1
|
248
|
June 15, 2026
|
|
What is the standard way to test MTP in vLLM?
|
|
2
|
81
|
June 13, 2026
|
|
Vllm rocm gemma4 and MTP
|
|
9
|
122
|
June 12, 2026
|
|
PR backlog on GitHub
|
|
2
|
45
|
June 12, 2026
|
|
Rust cannot catch foreign exceptions
|
|
1
|
35
|
June 11, 2026
|
|
Do we Have ChatBots Here
|
|
2
|
51
|
June 11, 2026
|
|
Prefix cache hit rate of the vLLM server backend vs. prefix cache hit rate of a single request
|
|
1
|
56
|
June 11, 2026
|
|
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu
|
|
1
|
82
|
June 5, 2026
|
|
CUDA 12.8 not working with vLLM
|
|
8
|
422
|
June 4, 2026
|
|
Is the reason for my vllm 0.20.0 failing to start because of nixl?
|
|
9
|
318
|
June 3, 2026
|
|
Explanation for wait_for_save
|
|
3
|
62
|
June 2, 2026
|
|
(Possible bug) --enable-prompt-tokens-details not working?
|
|
1
|
173
|
June 2, 2026
|
|
[Bug] Segfault in PythonSymNodeImpl and Deadlock on RTX 5090 (Blackwell) with vLLM 0.11.2
|
|
1
|
63
|
June 1, 2026
|
|
[Bug] Segfault in cublasLt/cuLaunchKernel on RTX 5080 using v0.21.0 (V1 Engine)
|
|
1
|
25
|
May 31, 2026
|
|
Segfault in cublasLt/cuLaunchKernel on RTX 5080 using v0.21.0 (V1 Engine)
|
|
1
|
47
|
May 31, 2026
|
|
Preserve reasoning state across turns
|
|
1
|
76
|
May 29, 2026
|