| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM Forums! :wave: | 3 | 1007 | November 12, 2025 |
| About the General category | 0 | 64 | March 17, 2025 |
| Help with vLLM crashes | 1 | 19 | December 16, 2025 |
| How to generate just one token? | 1 | 7 | December 16, 2025 |
| How to pass add_generation_prompt=False from the client? | 6 | 2 | December 16, 2025 |
| Which client should I use? | 2 | 13 | December 16, 2025 |
| How to maximize the throughput of an inference service | 1 | 9 | December 15, 2025 |
| How can I check the configured batch size? | 1 | 3 | December 15, 2025 |
| Error when launching an inference service with vllm serve | 9 | 20 | December 15, 2025 |
| Why is the latest ROCm vLLM so bad? | 3 | 13 | December 14, 2025 |
| How to run GGUF with ROCm and a 7900 XTX | 5 | 8 | December 14, 2025 |
| How to reconstruct the vllm serve command from vLLM's log output | 3 | 7 | December 12, 2025 |
| vLLM V1 Scheduler: Inconsistent Request Scheduling Under Token Budget Limit | 19 | 118 | December 11, 2025 |
| With PP8 (pipeline parallelism across 8 ranks), does update_from_output run only after model_executor.execute_model has finished on all ranks? | 20 | 12 | December 11, 2025 |
| When serving Qwen3-VL with vLLM, I sent 5 requests with identical content. The first 4 returned relatively quickly, and the last one was also in the running state (not waiting), yet it took far longer: the first four returned in about 40 s, while the last one took 6 minutes. | 1 | 16 | December 11, 2025 |
| How can I determine which specific stop token triggered the termination? | 3 | 14 | December 10, 2025 |
| Which speculative decoding methods does vLLM currently support? | 3 | 32 | December 9, 2025 |
| How to set a custom end token in the vllm serve CLI? | 4 | 29 | December 9, 2025 |
| Why is it so slow to build vLLM from source using Docker? | 1 | 12 | December 8, 2025 |
| Under what circumstances is a request scheduled more than once? | 11 | 20 | December 8, 2025 |
| Tell me about the current status of the tokenize endpoint in vLLM | 4 | 18 | December 8, 2025 |
| Project: vLLM docker for running smoothly on RTX 5090 + WSL2 | 2 | 327 | December 6, 2025 |
| Problem with Gemma3 and vLLM | 11 | 84 | December 6, 2025 |
| Invalid request status FINISHED_LENGTH_CAPPED | 1 | 5 | December 6, 2025 |
| Running inference on the Qwen3-VL model through vLLM's Python API | 13 | 53 | December 5, 2025 |
| VLLM_SCHED_ENABLE_MINIMAL_INJECTION: what does this env var mean? | 1 | 11 | December 5, 2025 |
| How to set a custom end token? | 2 | 12 | December 4, 2025 |
| Can it still work when the input context is longer than the maximum context length? | 2 | 35 | December 3, 2025 |
| What is TBO (two-batch overlap)? | 1 | 27 | December 3, 2025 |
| How to add custom special tokens? | 3 | 28 | December 3, 2025 |