Performance Issue While Requests Queuing

tjtanaa · April 8, 2025, 4:34am

@thiner How did you measure generation speed?
If the speed you observed is the generation speed of a single request, then what you are seeing is expected. As the vLLM server processes more requests, the generation speed of each individual request drops, because the compute capacity is shared across the requests being processed in parallel.
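
To make the distinction concrete, here is a rough sketch of how per-request speed and aggregate throughput can be computed separately from client-side timings. The `RequestStat` class, the helper functions, and all the numbers are hypothetical and for illustration only; they are not taken from vLLM or from your logs.

```python
from dataclasses import dataclass


@dataclass
class RequestStat:
    """Client-side timing for one request (hypothetical structure)."""
    completion_tokens: int  # tokens generated for this request
    start_s: float          # time the request was sent, in seconds
    end_s: float            # time the last token arrived, in seconds


def per_request_speed(stat: RequestStat) -> float:
    """Tokens per second as seen by a single request."""
    return stat.completion_tokens / (stat.end_s - stat.start_s)


def aggregate_throughput(stats: list[RequestStat]) -> float:
    """Total tokens per second across all requests, over the wall-clock window."""
    wall_s = max(s.end_s for s in stats) - min(s.start_s for s in stats)
    return sum(s.completion_tokens for s in stats) / wall_s


# Made-up numbers: one request alone vs. four requests sent at the same time.
solo = [RequestStat(completion_tokens=500, start_s=0.0, end_s=10.0)]
concurrent = [RequestStat(completion_tokens=500, start_s=0.0, end_s=25.0)
              for _ in range(4)]

solo_speed = per_request_speed(solo[0])
concurrent_speed = per_request_speed(concurrent[0])
concurrent_total = aggregate_throughput(concurrent)

print(f"solo per-request speed:       {solo_speed:.1f} tok/s")
print(f"concurrent per-request speed: {concurrent_speed:.1f} tok/s")
print(f"concurrent aggregate:         {concurrent_total:.1f} tok/s")
```

In this made-up case each request gets slower under load (20 tok/s instead of 50 tok/s), while the server as a whole produces more tokens per second (80 tok/s). If you are only looking at the first number, a busy server will always look slower.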
