V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8)

Testing Environment:

  • vLLM version: 0.8.5
  • Operating System: Ubuntu 22.04
  • GPU: NVIDIA A100

Performance Comparison:

  • For Llama and Qwen2 series models, the end-to-end inference performance of V1 is 5% to 10% lower than that of V0.
  • V0 configuration: --num-scheduler-steps=8 (a minimal reproduction sketch follows this list).
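
A minimal offline sketch of this comparison is below, under stated assumptions: the model name, prompt, and batch size are illustrative placeholders rather than the original test configuration, and VLLM_USE_V1=0 is used to force the V0 engine, since vLLM 0.8.5 selects V1 by default where supported. Run the script once per engine and compare the reported throughput:

```python
import os
import time

from vllm import LLM, SamplingParams

# Engine selection: run with VLLM_USE_V1=0 to force the V0 engine
# (vLLM 0.8.5 picks V1 by default where supported), e.g.:
#   VLLM_USE_V1=0 python bench_offline.py   # V0 + multi-step scheduling
#   VLLM_USE_V1=1 python bench_offline.py   # V1
use_v1 = os.environ.get("VLLM_USE_V1", "1") == "1"

# num_scheduler_steps (multi-step scheduling) applies only to the V0 engine.
engine_kwargs = {} if use_v1 else {"num_scheduler_steps": 8}
llm = LLM(model="Qwen/Qwen2-7B-Instruct", **engine_kwargs)  # illustrative model

prompts = ["Summarize the benefits of multi-step scheduling."] * 64
params = SamplingParams(temperature=0.0, max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"engine={'V1' if use_v1 else 'V0'}: {tokens / elapsed:.1f} output tokens/s")
```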

Question:

Has the community published any relevant performance comparison data?

Yes, there are multiple community reports comparing vLLM V1 and V0 performance. Several users have observed that, for Llama and Qwen2 series models, V1 can be 5–10% slower than V0 in end-to-end inference, particularly when V0 is run with a comparable configuration (e.g., --num-scheduler-steps=8). These findings are documented in benchmark discussions and issue threads, where users share their test results and configurations for reproducibility. For example, see the performance regression reports and community benchmarks in this GitHub issue and this Slack thread.
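
For the serving path, a rough latency probe against an already-running OpenAI-compatible server is sketched below. It assumes a server on localhost:8000 started with, e.g., `vllm serve Qwen/Qwen2-7B-Instruct --num-scheduler-steps 8` with VLLM_USE_V1=0 for V0, or the same command under VLLM_USE_V1=1 for V1; the endpoint, model name, and prompt are placeholder assumptions, not the configurations from the reports above:

```python
import time

import requests  # pip install requests

# Assumed local endpoint of a vLLM OpenAI-compatible server; adjust as needed.
URL = "http://localhost:8000/v1/completions"
payload = {
    "model": "Qwen/Qwen2-7B-Instruct",  # placeholder model name
    "prompt": "Write a short note on multi-step scheduling.",
    "max_tokens": 128,
    "temperature": 0.0,
}

# Time a handful of sequential requests; run once per engine and compare.
latencies = []
for _ in range(10):
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=120)
    resp.raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"mean end-to-end latency: {sum(latencies) / len(latencies):.3f}s")
```

Repeating the probe on identical hardware for both engines gives a first-order check of whether the reported 5–10% gap reproduces in your setup.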

Would you like more detail on the specific benchmark setups or configuration differences?
