| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Error when deploying glm-4.7 (bf16) across three machines | 4 | 37 | February 6, 2026 |
| Memory usage issue encountered during inference | 2 | 23 | February 6, 2026 |
| RuntimeError during inference: Engine core initialization failed. See root cause above. Failed core proc(s): {} | 1 | 107 | February 5, 2026 |
| Shared memory broadcast block not found (No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work) | 2 | 186 | February 5, 2026 |
| How to use prompt (prompt and prompt_name) for EmbeddingGemma using vllm | 1 | 34 | February 4, 2026 |
| How to run inference or deploy with my custom model | 2 | 90 | February 4, 2026 |
| vLLM inference problem: inference results are completely wrong | 2 | 25 | February 4, 2026 |
| Why are there so many open pull requests? | 1 | 26 | February 4, 2026 |
| Status of gpt-oss | 3 | 164 | February 3, 2026 |
| vLLM sockets use case (CRIU, k8s, checkpointing) | 2 | 20 | February 3, 2026 |
| Low Average GPU Utilization (40–70%) on H100 with vLLM — How to Push Toward 90%+? | 1 | 121 | January 31, 2026 |
| Is there a way to separately measure the time spent in the prefill and decode stages in vLLM offline inference | 2 | 22 | January 29, 2026 |
| Why does the vllm docker run command get stuck here? | 1 | 81 | January 27, 2026 |
| VLM vision-language model cannot use 2:4 sparse inference - CUTLASS kernel dimension mismatch | 1 | 26 | January 27, 2026 |
| "served-model-name" and "model" | 6 | 246 | January 26, 2026 |
| NCCL communication hanging | 1 | 28 | January 24, 2026 |
| vLLM concurrency issue on V100 GPUs | 7 | 275 | January 23, 2026 |
| Error encountered in multi-node, multi-GPU inference with Ray and vLLM | 1 | 44 | January 23, 2026 |
| OpenAI Embeddings Not Working | 2 | 73 | January 22, 2026 |
| GLM-4.7-Flash with nvidia | 9 | 1583 | January 22, 2026 |
| How to understand the cross-device memory visibility comment in custom_all_reduce stage 2 | 6 | 106 | January 22, 2026 |
| vLLM Tensor Parallel Workers Not Completing Initialization | 3 | 447 | January 21, 2026 |
| Max_tokens_per_doc support for rerank models | 1 | 21 | January 21, 2026 |
| For a long-input request split into chunks (e.g. into 4), can the four chunks be prefilled simultaneously, or are there dependencies between them? | 15 | 68 | January 21, 2026 |
| Persistent segfaults/SIGSEGV | 1 | 41 | January 20, 2026 |
| Why is it so slow to build vLLM from source using Docker? | 39 | 83 | January 17, 2026 |
| HarmonyError: Unexpected token 200002 while expecting start token 200006 | 1 | 199 | January 14, 2026 |
| Issue: Unable to pass precomputed image embeddings to vLLM | 12 | 139 | January 14, 2026 |
| Clarify VLLM Wheels: What Does the +cu129 Tag Actually Change in v0.11.x? | 1 | 47 | January 13, 2026 |
| Why Does Latency Remain Unchanged in vLLM 0.11.0 When Input Token Count Decreases for qwen3-vl-30b-a3b? | 1 | 25 | January 13, 2026 |