vLLM Forums

Topic	Replies	Views	Activity
Is there a way to separately measure the time spent in the prefill and decode stages in vllm offline inference General	2	63	January 29, 2026
How to run Deep Seek OCR 2 in vllm DeepSeek	1	1254	January 27, 2026
Why vllm docker run command gets stuck here General	1	231	January 27, 2026
VLM (vision-language model) cannot use 2:4 sparse inference - CUTLASS kernel dimension mismatch General	1	49	January 27, 2026
"served-model-name" and "model" General	6	1164	January 26, 2026
NCCL communication handing General	1	98	January 24, 2026
vLLM concurrency issue on V100 GPUs General	7	571	January 23, 2026
Errors encountered with Ray and vLLM in multi-node, multi-GPU inference General	1	98	January 23, 2026
Benchmark for flash_attention Benchmarking	4	187	January 22, 2026
OpenAI Embeddings Not Working General	2	188	January 22, 2026
GLM-4.7-Flash with nvidia General	9	2127	January 22, 2026
How to understand the cross-device memory visibility comment in custom_all_reduce stage2 General	6	169	January 22, 2026
Max_tokens_per_doc support for rerank models General	1	86	January 21, 2026
If a long-input request is split into chunks, say 4, can the four chunks be prefilled at the same time, or do they depend on each other? General	15	339	January 21, 2026
Persistent segfaults/SIGSEGV General	1	278	January 20, 2026
Standalone draft model spec decode support in v0.x and v1 Speculative Decoding	3	184	January 20, 2026
How to get the log for benchmarking Benchmarking	17	480	January 19, 2026
How to get kv cache value from vllm KV-Cache	5	326	January 19, 2026
Why is it so slow to build vLLM from source using Docker? General	39	606	January 17, 2026
HarmonyError: Unexpected token 200002 while expecting start token 200006 General	1	519	January 14, 2026
Issue: Unable to pass precomputed image embeddings to vLLM General	12	428	January 14, 2026
Is there a plan for EVS to support Qwen3VL in response to the issue of sparse video tokens? Multi-modality	1	107	January 13, 2026
Clarify VLLM Wheels: What Does the +cu129 Tag Actually Change in v0.11.x? General	1	228	January 13, 2026
Why Does Latency Remain Unchanged in vLLM 0.11.0 When Input Token Count Decreases for qwen3-vl-30b-a3b? General	1	76	January 13, 2026
Exposing KV cache for recomposition / reuse beyond prefix caching? KV-Cache	1	158	January 13, 2026
Why doesn't the parameter n in samplingparams work as expected General	4	285	January 13, 2026
vLLM on RTX5090: Working GPU setup with torch 2.9.0 cu128 NVIDIA GPU Support	18	6452	January 13, 2026
vLLM Engine Arguments Documentation General	1	67	January 12, 2026
vLLM running on NVIDIA NIM vs native vLLM tuning options General	1	298	January 10, 2026
Why does the Marlin CUDA kernel not feel fast? Quantization	5	209	January 9, 2026