| Topic | Replies | Views | Activity |
|---|---|---|---|
| How to get structured outputs in vLLM? | 12 | 452 | December 22, 2025 |
| How can I check the configured batch size? | 2 | 115 | December 22, 2025 |
| How to apply FA4 on B200? | 3 | 579 | December 18, 2025 |
| Issue running gemma-3-27b-it with vLLM 0.12.0 | 1 | 195 | December 17, 2025 |
| vLLM V1 Scheduler: Inconsistent Request Scheduling Under Token Budget Limit | 25 | 646 | December 17, 2025 |
| Help with vLLM crashes | 1 | 777 | December 16, 2025 |
| How to generate just one token? | 1 | 88 | December 16, 2025 |
| How to pass add_generation_prompt=False from the client? | 5 | 285 | December 16, 2025 |
| Which client should I use? | 2 | 158 | December 16, 2025 |
| Error when launching an inference service with vllm serve | 9 | 250 | December 15, 2025 |
| Why is the latest ROCm vLLM so bad? | 3 | 328 | December 14, 2025 |
| How to run GGUF with ROCm on a 7900 XTX | 5 | 349 | December 14, 2025 |
| How to reconstruct the vllm serve command from vLLM's log output | 3 | 96 | December 12, 2025 |
| Llama 3.3 70B very slow | 5 | 884 | December 11, 2025 |
| When running Qwen3-VL inference with vLLM and sending several requests (e.g. 5 requests with identical content), the first 4 return relatively quickly. The last one is also in the running state, not waiting, yet its result arrives much later: the first four return in about 40 s, while the last one takes 6 minutes. | 1 | 90 | December 11, 2025 |
| How can I determine which specific stop token triggered the termination? | 3 | 166 | December 10, 2025 |
| Which speculative decoding methods does vLLM currently support? | 3 | 258 | December 9, 2025 |
| How to customize the end token in the vllm serve CLI? | 4 | 193 | December 9, 2025 |
| Tell me about the current status of the tokenize endpoint in vLLM | 4 | 375 | December 8, 2025 |
| Project: vLLM Docker for running smoothly on RTX 5090 + WSL2 | 2 | 888 | December 6, 2025 |
| Problem with Gemma3 and vLLM | 11 | 857 | December 6, 2025 |
| Invalid request status FINISHED_LENGTH_CAPPED | 1 | 36 | December 6, 2025 |
| Running inference on the Qwen3-VL model through vLLM's Python API | 13 | 532 | December 5, 2025 |
| VLLM_SCHED_ENABLE_MINIMAL_INJECTION: what does this env var mean? | 1 | 28 | December 5, 2025 |
| How to customize the end token? | 2 | 96 | December 4, 2025 |
| Can it still work when the input context is longer than the maximum context length? | 2 | 162 | December 3, 2025 |
| What is TBO (two-batch overlap)? | 1 | 159 | December 3, 2025 |
| How to add custom special tokens? | 3 | 247 | December 3, 2025 |
| Seeking guidance on getting started with inference optimization | 2 | 126 | December 2, 2025 |
| Generation speed on the NPU 310P3 | 3 | 302 | December 2, 2025 |