Performance degradation report (0.9.0.1 vs 0.10.0)

RunLLM · August 13, 2025, 9:16am

This is a known issue: after upgrading from vLLM 0.9.0.1 to 0.10.0, users have observed higher average latency under high load with the default scheduler. Enabling the --async-scheduling flag in v0.10.0 resolves the regression, bringing performance back to, or above, v0.9.0.1 levels. Others have reported the same behavior, so it is not unique to your setup. The async scheduler is still experimental, but it is recommended for high-concurrency scenarios because it reduces CPU overhead and improves latency and throughput. According to the vLLM CLI documentation, it may not yet support all features (e.g., structured outputs, speculative decoding, pipeline parallelism).
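As a rough illustration of the workaround, here is a minimal sketch of a launcher helper that appends --async-scheduling to a `vllm serve` command and refuses to do so when a feature from the list above is requested. The helper name `build_serve_command`, the feature labels, and the model name are all hypothetical and exist only for this example. Only the --async-scheduling flag itself comes from the discussion above, so check the CLI docs for your version before relying on this.

```python
import shlex

# Features the post lists as possibly unsupported by the async scheduler.
# These labels are illustrative only; they are NOT vLLM identifiers.
POSSIBLY_UNSUPPORTED = {
    "structured_outputs",
    "speculative_decoding",
    "pipeline_parallelism",
}


def build_serve_command(model, async_scheduling=True, features=()):
    """Return the argv list for launching the server (hypothetical helper).

    `features` is a collection of the illustrative labels above, describing
    what the deployment intends to use.
    """
    conflicts = sorted(set(features) & POSSIBLY_UNSUPPORTED)
    if async_scheduling and conflicts:
        # Fail early rather than launch with a combination that the
        # experimental scheduler may not support.
        raise ValueError(
            "async scheduling may not support: " + ", ".join(conflicts)
        )

    cmd = ["vllm", "serve", model]
    if async_scheduling:
        cmd.append("--async-scheduling")
    return cmd


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model name.
    print(shlex.join(build_serve_command("my-org/my-model")))
```

The helper only builds the argument list, so it can be passed to `subprocess.run` or printed for manual use. If the deployment needs one of the listed features, calling it with `async_scheduling=False` falls back to the default scheduler.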

The performance degradation with the default scheduler in v0.10.0 is not explicitly documented as an intended change, so it may be an unintended regression. The async scheduler is the suggested workaround for now. Would you like a step-by-step breakdown of the relevant scheduler and async_scheduling options or more detail on the underlying cause?
