I attempted to use vLLM's `AsyncLLMEngine` with `external_launcher` for inference acceleration under torchrun in my RL training pipeline, but I ran into compatibility issues. The `LLMEngine` had previously run without any problems.
vLLM's `LLMEngine` supports SPMD-style offline inference under torchrun via the `distributed_executor_backend="external_launcher"` option, as shown in the official torchrun example. `AsyncLLMEngine`, however, is not documented or supported for use with `external_launcher` or torchrun: it is intended mainly for online serving, and there are known issues with event loop management and distributed execution in that setup. Compatibility problems are therefore expected when combining `AsyncLLMEngine` with torchrun and `external_launcher` (source, source).
For RL pipelines that need distributed inference acceleration under torchrun, use the standard `LLM` class with `distributed_executor_backend="external_launcher"`, as in the official example. If you need async or online serving, use `AsyncLLMEngine`, but not together with torchrun/`external_launcher`. A minimal sketch of the recommended approach is below.