| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM Forums! :wave: | 3 | 1230 | November 12, 2025 |
| About the General category | 0 | 76 | March 17, 2025 |
| On Ryzen HX 370 iGPU | 1 | 3 | March 14, 2026 |
| ModuleNotFoundError: No module named 'soundfile' | 1 | 3 | March 13, 2026 |
| How to add a new sampler method into the current vLLM code | 1 | 2 | March 13, 2026 |
| GPT-OSS 20B not working | 4 | 11 | March 12, 2026 |
| A local vLLM orchestrator (CLI & Web UI) for VRAM pre-calculation and CPU deployments | 3 | 14 | March 12, 2026 |
| In vLLM, where is the KV cache refreshed for DeepSeek models? | 27 | 58 | March 12, 2026 |
| Measuring interactivity on vLLM | 1 | 2 | March 12, 2026 |
| Qwen3.5-35b-a3b-fp8 GPU memory out of bounds | 1 | 19 | March 12, 2026 |
| Contacting the vLLM Semantic Router team | 4 | 8 | March 11, 2026 |
| 500 internal server error when using WebP | 1 | 4 | March 11, 2026 |
| Given a completion text for a fixed prompt text, how to calculate the log_prob of the completion text | 1 | 7 | March 11, 2026 |
| Questions about performance tradeoffs between process-based and thread-based orchestration for multiple independent vllm serve instances | 1 | 10 | March 10, 2026 |
| Need to serve a Qwen3 LLM with 235B params | 9 | 55 | March 7, 2026 |
| No available shared memory broadcast block found in 60 seconds (typically happens when some processes are hanging or doing time-consuming work, e.g. compilation, weight/KV cache quantization) | 1 | 114 | March 7, 2026 |
| Expert Parallelism All-to-All Communication without NVLink and DeepEP | 3 | 53 | March 3, 2026 |
| How to expose v1/audio/transcriptions router for custom models | 34 | 54 | March 3, 2026 |
| torch.OutOfMemoryError: CUDA out of memory | 15 | 169 | March 3, 2026 |
| vLLM 0.16.0 version log changes | 3 | 90 | March 2, 2026 |
| GGUF PyPI release version 0.18.0 (27 Feb) | 1 | 52 | March 2, 2026 |
| How to run GGUF with vLLM and ROCm | 4 | 86 | March 1, 2026 |
| Following the Qwen3.5 usage guide on H20, but cannot host Qwen3.5-27B | 4 | 173 | February 28, 2026 |
| BranchContext: CoW filesystem isolation for multi-sample vLLM workflows | 1 | 10 | February 27, 2026 |
| How to serve two vLLM instances using Docker? | 3 | 65 | February 26, 2026 |
| Thinking token limit setting | 11 | 93 | February 26, 2026 |
| vllm-openai Docker Hub missing 0.16 tags | 2 | 50 | February 25, 2026 |
| Does vLLM-Ascend support reasoning-parser? | 1 | 20 | February 25, 2026 |
| What is the role of the additional process running on GPU 0 in DP+EP? | 3 | 17 | February 25, 2026 |
| True eager backend | 6 | 35 | February 24, 2026 |