|
Error when launching an inference service with vllm serve
|
|
9
|
212
|
December 15, 2025
|
|
Why is the latest ROCm vLLM so bad?
|
|
3
|
320
|
December 14, 2025
|
|
How to run GGUF with rocm and 7900 xtx
|
|
5
|
327
|
December 14, 2025
|
|
How to reconstruct the vllm serve command from vLLM log output
|
|
3
|
87
|
December 12, 2025
|
|
Llama 3.3 70B very slow
|
|
5
|
857
|
December 11, 2025
|
|
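When running Qwen3-VL inference with vLLM and sending several identical requests (for example, 5), the first 4 return fairly quickly, but the last one, although it is running rather than waiting, takes far longer: the first four may return in about 40 s while the last one needs 6 minutes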
|
|
1
|
87
|
December 11, 2025
|
|
How can I determine which specific stop token triggered the termination?
|
|
3
|
157
|
December 10, 2025
|
|
Which speculative decoding methods does vLLM currently support?
|
|
3
|
208
|
December 9, 2025
|
|
How to customize the end token in the vllm serve CLI?
|
|
4
|
180
|
December 9, 2025
|
|
Tell me about the current status of the tokenize endpoint in vllm
|
|
4
|
339
|
December 8, 2025
|
|
Project: vLLM docker for running smoothly on RTX 5090 + WSL2
|
|
2
|
870
|
December 6, 2025
|
|
Problem with Gemma3 and vLLM
|
|
11
|
812
|
December 6, 2025
|
|
Invalid request status FINISHED_LENGTH_CAPPED
|
|
1
|
31
|
December 6, 2025
|
|
Running Qwen3-VL inference through vLLM's Python API
|
|
13
|
503
|
December 5, 2025
|
|
VLLM_SCHED_ENABLE_MINIMAL_INJECTION: what does this env var mean?
|
|
1
|
24
|
December 5, 2025
|
|
How to customize the end token?
|
|
2
|
88
|
December 4, 2025
|
|
Can it still work when the input context is longer than the maximum context?
|
|
2
|
152
|
December 3, 2025
|
|
What is TBO (two-batch overlap)?
|
|
1
|
148
|
December 3, 2025
|
|
How to add custom special tokens?
|
|
3
|
237
|
December 3, 2025
|
|
Seeking guidance to start learning inference optimization
|
|
2
|
111
|
December 2, 2025
|
|
Generation speed of the NPU 310P3
|
|
3
|
278
|
December 2, 2025
|
|
Understanding vllm kv cache
|
|
5
|
1564
|
December 1, 2025
|
|
Hard dependency on Torch 2.9.0?
|
|
1
|
429
|
December 1, 2025
|
|
Has anyone successfully run DBO in a single node multi card environment?
|
|
1
|
109
|
December 1, 2025
|
|
Does vLLM inference work with Qwen3-VL-30B?
|
|
3
|
497
|
November 30, 2025
|
|
Batched processing over openai api
|
|
3
|
109
|
November 30, 2025
|
|
Running gat model with rocm
|
|
2
|
136
|
November 28, 2025
|
|
EPLB behavior in elastic scaling
|
|
21
|
204
|
November 28, 2025
|
|
Text to speech support with /v1/audio/speech route
|
|
1
|
700
|
November 28, 2025
|
|
Launching with Ray using pp8 tp2 hangs at the point shown below; what could be the cause?
|
|
5
|
157
|
November 28, 2025
|