I run a script with python inference.py. In inference.py, I load a Qwen2-VL model and run inference.
When I run inference.py, I notice that many processes are launched. What is the purpose of these multiple processes, and where can I find the source code for this?
Just to add: sometimes it is easier to debug with a debugger when everything runs in a single process. This can be achieved by setting the environment variable VLLM_ENABLE_V1_MULTIPROCESSING=0.
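
For reference, here is a minimal sketch of setting that variable from inside the script rather than from the shell. The helper name disable_v1_multiprocessing is hypothetical and only for illustration, and the model name in the commented-out lines is an assumption. Setting the variable before vLLM is imported is the cautious choice, since I have not checked exactly when it is read.

```python
import os


def disable_v1_multiprocessing(env=os.environ):
    """Hypothetical helper: set the flag so the engine stays in one process."""
    env["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"
    return env


# Set the variable first, before anything from vLLM is imported.
disable_v1_multiprocessing()

# The vLLM usage is commented out so the sketch runs without vLLM installed.
# The model name below is an assumption, not taken from the original script.
# from vllm import LLM
# llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")
```

The same effect should be possible from the shell by prefixing the command: VLLM_ENABLE_V1_MULTIPROCESSING=0 python inference.py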