Topic | Replies | Views | Activity
Welcome to vLLM Forums! :wave: | 3 | 1168 | November 12, 2025
About the General category | 0 | 70 | March 17, 2025
Significant speedup observed with long common prefix between v0.11.0 and v0.12.0 | 9 | 7 | February 13, 2026
What does Skip_leading_tokens mean? | 41 | 39 | February 13, 2026
Priority in batch API | 7 | 4 | February 12, 2026
Does vLLM support diffusers.Pipeline LoRA files? | 9 | 14 | February 12, 2026
We're Live: OCI Deployment Guide for vLLM Production Stack | 2 | 3 | February 12, 2026
Native FP8 WMMA Support for AMD RDNA4 (RX 9070 XT / R9700) in vLLM | 5 | 606 | February 12, 2026
Does the vllm repository currently have nightly testing? | 3 | 15 | February 10, 2026
Qwen3-TTS Base model issue | 1 | 20 | February 9, 2026
What is the role of the additional process running on GPU 0 in DP+EP? | 1 | 9 | February 8, 2026
Pre-Built Docker Install | 2 | 17 | February 8, 2026
Does vLLM support shieldgemma? | 1 | 10 | February 6, 2026
vLLM v0.15.1 failing when deployed in AWS | 3 | 160 | February 6, 2026
Error deploying glm-4.7 (bf16) across three machines | 4 | 23 | February 6, 2026
Memory usage issue during inference | 2 | 12 | February 6, 2026
RuntimeError during inference: Engine core initialization failed. See root cause above. Failed core proc(s): {} | 1 | 32 | February 5, 2026
Shared memory broadcast block not found (No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work) | 2 | 31 | February 5, 2026
How to use prompt (prompt and prompt_name) for EmbeddingGemma using vllm | 1 | 18 | February 4, 2026
How to run inference or deploy with my custom model | 2 | 70 | February 4, 2026
vLLM inference problem: results are completely wrong | 2 | 18 | February 4, 2026
How to expose v1/audio/transcriptions router for custom models | 7 | 19 | February 4, 2026
Why are there so many open pull requests? | 1 | 13 | February 4, 2026
Status of gpt-oss | 3 | 80 | February 3, 2026
vLLM sockets use case (CRIU, k8s, checkpointing) | 2 | 14 | February 3, 2026
Low Average GPU Utilization (40–70%) on H100 with vLLM: How to Push Toward 90%+? | 1 | 66 | January 31, 2026
Is there a way to separately measure the time spent in the prefill and decode stages in vLLM offline inference? | 2 | 16 | January 29, 2026
Why does the vllm docker run command get stuck here? | 1 | 47 | January 27, 2026
VLM (vision-language model) cannot use 2:4 sparse inference - CUTLASS kernel dimension mismatch | 1 | 20 | January 27, 2026
"served-model-name" and "model" | 6 | 98 | January 26, 2026