| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Error when deploying glm-4.7 (bf16) across three machines | 4 | 37 | February 6, 2026 |
| Memory usage issue encountered during inference | 2 | 23 | February 6, 2026 |
| RuntimeError during inference: Engine core initialization failed. See root cause above. Failed core proc(s): {} | 1 | 107 | February 5, 2026 |
| Shared memory broadcast block not found (No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work) | 2 | 186 | February 5, 2026 |
| How to use prompt (prompt and prompt_name) for EmbeddingGemma using vllm | 1 | 34 | February 4, 2026 |
| How to run inference or deploy with my custom model | 2 | 90 | February 4, 2026 |
| vLLM inference problem: inference results are completely wrong | 2 | 25 | February 4, 2026 |
| Why are there so many open pull requests? | 1 | 26 | February 4, 2026 |
| Status of gpt-oss | 3 | 164 | February 3, 2026 |
| vLLM sockets use case (CRIU, k8s, checkpointing) | 2 | 20 | February 3, 2026 |
| Low Average GPU Utilization (40–70%) on H100 with vLLM — How to Push Toward 90%+? | 1 | 121 | January 31, 2026 |
| Is there a way to separately measure the time spent in the prefill and decode stages in vLLM offline inference | 2 | 22 | January 29, 2026 |
| Why does the vllm docker run command get stuck here? | 1 | 81 | January 27, 2026 |
| VLM vision-language model cannot use 2:4 sparse inference - CUTLASS kernel dimension mismatch | 1 | 26 | January 27, 2026 |
| "served-model-name" and "model" | 6 | 246 | January 26, 2026 |
| NCCL communication hanging | 1 | 28 | January 24, 2026 |
| vLLM concurrency issue on V100 GPUs | 7 | 275 | January 23, 2026 |
| Error encountered in multi-node, multi-GPU inference with Ray and vLLM | 1 | 44 | January 23, 2026 |
| OpenAI Embeddings Not Working | 2 | 73 | January 22, 2026 |
| GLM-4.7-Flash with nvidia | 9 | 1583 | January 22, 2026 |
| How to understand the cross-device memory visibility comment in custom_all_reduce stage 2 | 6 | 106 | January 22, 2026 |
| vLLM Tensor Parallel Workers Not Completing Initialization | 3 | 447 | January 21, 2026 |
| Max_tokens_per_doc support for rerank models | 1 | 21 | January 21, 2026 |
| For a long-input request split into chunks (e.g. into 4), can the four chunks be prefilled simultaneously, or are there dependencies between them? | 15 | 68 | January 21, 2026 |
| Persistent segfaults/SIGSEGV | 1 | 41 | January 20, 2026 |
| Why is it so slow to build vLLM from source using Docker? | 39 | 83 | January 17, 2026 |
| HarmonyError: Unexpected token 200002 while expecting start token 200006 | 1 | 199 | January 14, 2026 |
| Issue: Unable to pass precomputed image embeddings to vLLM | 12 | 139 | January 14, 2026 |
| Clarify VLLM Wheels: What Does the +cu129 Tag Actually Change in v0.11.x? | 1 | 47 | January 13, 2026 |
| Why Does Latency Remain Unchanged in vLLM 0.11.0 When Input Token Count Decreases for qwen3-vl-30b-a3b? | 1 | 25 | January 13, 2026 |