The background is that we are using vLLM as the rollout backend in veRL for RL training. During the rollout stage, we need to invoke several function calls inside each trajectory. Currently, most implementations simply use a multi-round form, where each round contains exactly one function call. This causes large latency: trajectories that finish a round early have to wait for the others before the next round of generation can start.
My question is whether we could initialize the `AsyncLLMEngine` inside the `LLM` object so that we can use the async method `_add_request` to decouple the external agent workflow management from the internal generation process. Or is there any other recommended workflow for this?
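
To make the idea concrete, here is a minimal sketch of the decoupled workflow I have in mind. It uses plain `asyncio` only: `fake_generate`, `fake_tool_call`, `run_trajectory`, and `rollout` are hypothetical stand-ins I made up for illustration, not vLLM or veRL APIs. Each trajectory runs its own generate/tool-call loop, so a trajectory that finishes early never waits for a global round barrier.

```python
import asyncio


async def fake_generate(text: str) -> str:
    # Hypothetical stand-in for an async generation request to the engine.
    await asyncio.sleep(0.01)
    return text + " <gen>"


async def fake_tool_call(text: str) -> str:
    # Hypothetical stand-in for an external function call (search, code exec, ...).
    await asyncio.sleep(0.01)
    return text + " <tool>"


async def run_trajectory(prompt: str, num_calls: int) -> str:
    # One trajectory alternates generation and tool calls at its own pace.
    text = prompt
    for _ in range(num_calls):
        text = await fake_generate(text)
        text = await fake_tool_call(text)
    # Final generation after the last tool result.
    return await fake_generate(text)


async def rollout(prompts, calls_per_prompt):
    # All trajectories are scheduled concurrently; there is no per-round barrier.
    tasks = [run_trajectory(p, n) for p, n in zip(prompts, calls_per_prompt)]
    return await asyncio.gather(*tasks)


def run_demo():
    # Trajectory "a" needs no tool calls, "b" needs two.
    return asyncio.run(rollout(["a", "b"], [0, 2]))


if __name__ == "__main__":
    for result in run_demo():
        print(result)
```

In a real setup the stand-in for generation would presumably be replaced by whatever async request interface the engine exposes, which is exactly the part I am unsure about.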