When training with Verl using vLLM as the inference engine for DAPO, I observed that the actor rollout mode (sync vs. async) has a significant impact on the final test results. Specifically, using async inference during training (and deploying the trained model with vLLM in async mode) yields better performance. What could be the reason for this discrepancy? Additionally, the quality of responses during async inference appears to be better.
I am very curious about this. Is there anyone who can explain it? Thx!
The difference in results between sync and async rollout modes in vLLM is likely due to how batching and numerical determinism are handled. In async mode, vLLM schedules and batches requests differently and may process them concurrently, so the same prompt can end up in different batch compositions from step to step. Because floating-point reductions and kernel selection depend on batch shape, this produces slight numerical differences in logit/logprob values; when two candidate tokens are nearly tied, those tiny differences can resolve the tie differently, and once a single token differs the continuations diverge in content and quality, even with the same seed and prompts. This is known behavior in vLLM: batching variations and non-deterministic Torch operations can affect output stability, especially in async settings where request scheduling is less predictable. For more details, see the discussion in the vLLM FAQ and its reproducibility issues.
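To make that mechanism concrete, here is a minimal, self-contained sketch (plain Python, not tied to vLLM or Verl; the numbers are illustrative, not from any real model) of the two ingredients: floating-point reductions are order-dependent, and a perturbation of roughly that size is enough to flip the selected token when two candidates are nearly tied.

```python
# 1) Floating-point addition is not associative: reducing the same values in a
#    different order (as happens when a request shares a batch with different
#    neighbours, or a kernel splits the reduction differently) gives a slightly
#    different result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right, right_to_left, left_to_right == right_to_left)
# 0.6000000000000001 0.6 False

# 2) Near a tie between two candidate tokens, a perturbation on the order of
#    fp16 rounding noise is enough to change which token is picked; after one
#    flip, the two generations share a shorter prefix and diverge from there.
def greedy(logits):
    """Index of the highest logit (greedy decoding)."""
    return max(range(len(logits)), key=logits.__getitem__)

logits    = [12.00000, 11.99995, 3.0]          # token 0 barely wins
perturbed = [12.00000, 11.99995 + 1e-4, 3.0]   # ~1e-4 numerical noise on token 1
print("greedy token, original :", greedy(logits))     # 0
print("greedy token, perturbed:", greedy(perturbed))  # 1
```

The same effect applies to sampling rather than greedy decoding: a shifted logprob moves the probability mass slightly, so the same random draw can land on a different token. That is why sync and async rollouts can produce different trajectories, and why the RL training signal (and the resulting policy) can end up different between the two modes.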
Would you like a step-by-step breakdown of how async batching and numerical instability affect RL training and inference quality?