What is the purpose of the multiple processes?

I run a script with python inference.py. In inference.py, I load the qwen2-vl model and run inference.

When I run inference.py, I notice that many processes are launched. What is the purpose of these processes, and where can I find the source code for this?

Thank you

You can check out the vLLM blog post "vLLM V1: A Major Upgrade to vLLM’s Core Architecture" for details on multiprocessing in the V1 engine.

Section 1 shows the split between the engine itself (which does scheduling and compute) and the API server (which receives and manages HTTP requests).

Section 4 details how the scheduler and the worker live in different processes.
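To make the idea of a scheduler and a worker in separate processes more concrete, here is a toy sketch using only the Python standard library. It is not vLLM's code: the names worker and run are made up for illustration, multiprocessing.Queue stands in for whatever inter-process channel vLLM actually uses, and upper-casing a string stands in for running the model.

```python
import multiprocessing as mp


def worker(requests, results):
    """Hypothetical stand-in for the process that runs the model."""
    while True:
        item = requests.get()
        if item is None:  # sentinel value: shut down
            break
        req_id, prompt = item
        # Placeholder for real inference.
        results.put((req_id, prompt.upper()))


def run(prompts):
    """Hypothetical stand-in for the scheduling side: send work, collect results."""
    requests, results = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(requests, results))
    proc.start()
    for req_id, prompt in enumerate(prompts):
        requests.put((req_id, prompt))
    # Results may arrive in any order in general, so key them by request id.
    done = dict(results.get() for _ in prompts)
    requests.put(None)
    proc.join()
    return [done[i] for i in range(len(prompts))]


if __name__ == "__main__":
    print(run(["hello", "world"]))
```

While this runs, the operating system shows two Python processes instead of one, which is the same kind of thing you see when you start inference.py.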


Just to add: for debugging, it is sometimes easier to keep everything in one process so that a debugger can step through it. You can do this by setting the environment variable VLLM_ENABLE_V1_MULTIPROCESSING=0.
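A minimal sketch of how that might look inside inference.py, assuming the variable has to be set before vLLM is imported. The run_inference helper and the model name in the usage note are placeholders, not taken from the original script.

```python
import os

# Set this before vLLM is imported so the engine sees it at start-up
# (assumption: the variable is read when the engine is created).
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"


def run_inference(model_name: str, prompt: str) -> str:
    """Hypothetical helper: load the model and generate text for one prompt."""
    # Imported here, after the environment variable has been set.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    outputs = llm.generate([prompt], SamplingParams(max_tokens=32))
    return outputs[0].outputs[0].text
```

You would then call something like run_inference("Qwen/Qwen2-VL-2B-Instruct", "Hello") under your debugger. Setting the variable in the shell instead (VLLM_ENABLE_V1_MULTIPROCESSING=0 python inference.py) should have the same effect.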
