What is the purpose of multi process

YSF-A · April 8, 2025, 11:23am

I run a script with python inference.py. In inference.py, I load the Qwen2-VL model and run inference.

When I run inference.py, I notice that many processes are launched. What is the purpose of these multiple processes, and where can I find the source code for them?

Thank you

hmellor · April 8, 2025, 10:56pm

You can check out the blog post "vLLM V1: A Major Upgrade to vLLM’s Core Architecture" on the vLLM Blog for details on the multiprocessing in the V1 engine.

Section 1 shows the split between the engine itself (which does scheduling and compute) and the API server (which receives and manages HTTP requests).

Section 4 details how the scheduler and the worker live in different processes.

youkaichao · April 12, 2025, 5:15pm

Just to add: for debugging, it is sometimes easier to keep everything in one process, since debuggers are simpler to use that way. You can do this by setting the environment variable VLLM_ENABLE_V1_MULTIPROCESSING=0.
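For an offline script like the inference.py from the question, one way to apply this is to set the variable at the very top of the script. This is only a minimal sketch: it assumes the variable should be in place before the engine is created, and setting it before vLLM is imported is the cautious choice. The commented-out import is a placeholder for the rest of the script.

```python
import os

# Ask the vLLM V1 engine to stay in a single process (useful with debuggers).
# Assumption: this is set before vLLM is imported and before the engine is built.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

# The rest of inference.py would follow from here, for example:
# from vllm import LLM

print("VLLM_ENABLE_V1_MULTIPROCESSING =", os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"])
```

The same effect can be had without editing the script by prefixing the command in the shell: VLLM_ENABLE_V1_MULTIPROCESSING=0 python inference.py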
