vLLM Forums

Topic	Replies	Views	Activity
Understanding Multi Node Parallelization General	7	179	May 13, 2026
An error in cpu build Site Feedback	1	58	May 12, 2026
GLM 5.1 PP support DeepSeek	1	81	May 9, 2026
What is the recommended way to support dynamic pruning for speculative decoding draft trees? General	1	55	May 8, 2026
How does vllm process multimodal embedding requests General	8	88	May 7, 2026
LLM memory caching General	7	108	May 7, 2026
vLLM 0.20.1, Radeon AI 9700, 1 CPU core at 100% General	5	124	May 6, 2026
QPS doesn't scale with multi-card GPU General	3	72	May 6, 2026
vLLM Tensor Parallel Workers Not Completing Initialization General	5	1516	May 4, 2026
How to extend the context length up to 1,010,000 tokens on Qwen3.5? Model Support	2	245	May 4, 2026
How is vLLM handling internal queue requests? General	3	101	May 4, 2026
RDNA4 FP8 support General	1	164	May 2, 2026
Getting flashinfer.jit: [Autotuner]: OOM detected General	3	101	May 2, 2026
To understand max-num-seqs better! General	1	387	April 30, 2026
vLLM 0.18.0: KV cache usage drops from 100% to 0% General	3	90	April 30, 2026
Has anyone successfully deployed deepseek-v4-flash on 8xL40s? General	1	330	April 30, 2026
What is the correct chat template when serving gemma4? General	1	300	April 30, 2026
Support for V100 (sm 70) on vllm 0.20 General	1	502	April 30, 2026
The latest version of vllm is not compatible with local deployment of deepseek-v4 (0.20) DeepSeek	2	435	April 29, 2026
Why does a larger max_num_batched_tokens lead to less available KV cache memory General	1	139	April 29, 2026
Install using --torch-backend=cu129 but try to import cu13 General	8	1352	April 29, 2026
In a two-node distributed vLLM deployment, /v1/completions requests with the logprobs parameter hang indefinitely General	3	87	April 29, 2026
Throughput drops and increased TTFT when running background automation executors General	1	76	April 28, 2026
Gibberish output from NVFP4 quantized Ministral on VLLM 0.19.2rc1.dev205+g07351e088 General	1	71	April 27, 2026
vLLM and vLLM omni difference General	3	153	April 26, 2026
Why is TTFT worse for decode=1 than decode=100? General	1	50	April 26, 2026
How to understand OOM and foresee memory usage General	5	163	April 24, 2026
Can GPTQModel quantize GLM-5 from FP16 to INT8? Quantization	9	98	April 24, 2026
What is Included in --gpu-memory-utilization General	4	179	April 24, 2026
EngineCore Error with NVIDIA-Nemotron-3-Super-120B-A12B-FP8 on 2*H100 General	2	89	April 21, 2026