What is the purpose of the multiple processes?

I run a script with python inference.py. In inference.py, I load the Qwen2-VL model and run inference.

When I run inference.py, I notice that many processes are launched. What is the purpose of these processes, and where can I find the source code for this?

Thank you

You can check out the vLLM Blog post "vLLM V1: A Major Upgrade to vLLM’s Core Architecture" for details on multiprocessing in the V1 engine.

Section 1 describes the split between the engine itself (which handles scheduling and compute) and the API server (which receives and manages HTTP requests).

Section 4 details how the scheduler and the worker run in separate processes.
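As a rough illustration of the pattern those sections describe, here is a toy sketch using Python's standard multiprocessing module: a front-end function hands requests to a separate "engine" process through a queue and collects the results. This is not vLLM's implementation; engine_core and run are hypothetical names, and uppercasing the prompt merely stands in for scheduling and model execution.

```python
import multiprocessing as mp


def engine_core(requests, results):
    """Toy engine loop running in its own process (hypothetical, not vLLM code)."""
    while True:
        req = requests.get()
        if req is None:  # sentinel value: shut the loop down
            break
        req_id, prompt = req
        # Stand-in for scheduling + model execution: just uppercase the prompt.
        results.put((req_id, prompt.upper()))


def run(prompts):
    """Toy front end: send requests to the engine process and gather replies."""
    requests, results = mp.Queue(), mp.Queue()
    core = mp.Process(target=engine_core, args=(requests, results))
    core.start()
    for i, prompt in enumerate(prompts):
        requests.put((i, prompt))
    # Drain every result before joining so the child never blocks on a full queue.
    outputs = dict(results.get() for _ in prompts)
    requests.put(None)
    core.join()
    return [outputs[i] for i in range(len(prompts))]


# The guard is required with the "spawn" start method (macOS/Windows default),
# where the child process re-imports this module.
if __name__ == "__main__":
    print(run(["hello", "world"]))  # prints ['HELLO', 'WORLD']
```

Running this shows two Python processes, which is the same kind of thing you see (at larger scale) when vLLM starts: the caller and the engine loop live in different processes and talk only through messages.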

Just to add: for debugging, it is sometimes easier to keep everything in one process so a debugger can step through it. You can do this by setting the environment variable VLLM_ENABLE_V1_MULTIPROCESSING=0.
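A minimal sketch of doing this from inside inference.py, assuming the variable only needs to be in the environment before vLLM creates its engine (setting it before importing vllm is the simplest way to ensure that). Exporting it on the command line instead, e.g. VLLM_ENABLE_V1_MULTIPROCESSING=0 python inference.py, should have the same effect. load_model is a hypothetical helper name, and model_id stands for whichever Qwen2-VL checkpoint the script already loads.

```python
import os

# Assumption: this must be in the environment before vLLM builds its engine.
# Setting it at the very top of the script, before importing vllm, guarantees that.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"


def load_model(model_id: str):
    """Hypothetical helper: create the vLLM engine after the env var is set."""
    from vllm import LLM  # deferred import so vLLM sees the setting above

    # With V1 multiprocessing disabled, the engine should run in this process,
    # so a debugger attached to this script can step into it.
    return LLM(model=model_id)
```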