Yes, there are multiple community reports comparing vLLM V1 and V0 performance. Several users have observed that, for Llama and Qwen2 series models, V1 can be 5–10% slower than V0 in end-to-end inference, particularly when V1 is compared against a V0 configuration with multi-step scheduling enabled (e.g., --num-scheduler-steps=8). These findings are documented in benchmark discussions and issue threads where users share their test results and configurations for reproducibility; see, for example, the performance regression reports and community benchmarks in this GitHub issue and this Slack thread.
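If you want to reproduce this kind of comparison yourself, one rough approach is to run the same offline workload twice, once per engine, and compare throughput. Below is a minimal sketch, not an official benchmark: it assumes vLLM is installed, the model name, prompt set, and token counts are placeholders, and your vLLM version selects the engine via the VLLM_USE_V1 environment variable (defaults and flag support vary across releases).

```python
# Minimal offline throughput sketch for comparing vLLM V0 and V1.
# Run the same script twice, selecting the engine before the process starts:
#   VLLM_USE_V1=0 python bench_v0_v1.py   # V0 engine
#   VLLM_USE_V1=1 python bench_v0_v1.py   # V1 engine
# For the V0 run, num_scheduler_steps=8 can be passed to LLM(...) to match the
# multi-step configuration mentioned above; V1 does not use that option.
import os
import time

from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2-7B-Instruct"  # placeholder; substitute the model under test


def main() -> None:
    prompts = ["Summarize the history of GPUs."] * 256  # synthetic workload
    sampling = SamplingParams(temperature=0.0, max_tokens=256)

    # Add further engine args (tensor_parallel_size, max_model_len, ...) to
    # mirror the configuration you actually want to compare.
    llm = LLM(model=MODEL)

    start = time.perf_counter()
    outputs = llm.generate(prompts, sampling)
    elapsed = time.perf_counter() - start

    generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"VLLM_USE_V1={os.environ.get('VLLM_USE_V1', '<unset>')}")
    print(f"elapsed: {elapsed:.1f}s, throughput: {generated_tokens / elapsed:.1f} tok/s")


if __name__ == "__main__":
    main()
```

Keeping the prompt set, sampling parameters, and hardware identical across the two runs is what makes the resulting throughput numbers comparable; the community reports above generally follow the same pattern with serving-style benchmarks instead of offline generation.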
Would you like more detail on the specific benchmark setups or configuration differences?