Your error message “MultiprocExecutor got fatal signal from worker processes, shutting down”, together with “Killed”, during vLLM multi-GPU (tensor-parallel-size=4) quantized model loading most likely points to out-of-memory (OOM) or insufficient shared memory for inter-process communication, especially when running inside Docker or loading large models. A bare “Killed” with no Python traceback usually means the Linux OOM killer terminated a worker because host RAM was exhausted (you can often confirm this in dmesg). The other common cause is that the available GPU memory or /dev/shm is too small for the requested model and parallelism, or that the quantized model and tensor parallelism are not fully compatible.
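If you want to confirm the shared-memory side of that diagnosis, a quick look at the /dev/shm mount inside the container is enough. The snippet below is a minimal, vLLM-agnostic Python check; the 64 MiB figure in the comment is Docker's default --shm-size, which is far too small for tensor-parallel workers:

```python
import shutil

# Inspect the shared-memory mount that multiprocess workers use for IPC.
# Docker containers default to a 64 MiB /dev/shm unless --shm-size is raised.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm total: {total / 2**30:.2f} GiB, free: {free / 2**30:.2f} GiB")
```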
For quantized models, note that vLLM does not support tensor parallelism (TP) with prequantized BitsAndBytes checkpoints; you must use pipeline parallelism (PP) instead. If you are not using BitsAndBytes quantization, you may still simply be running out of memory or shared memory: try reducing tensor-parallel-size, lowering max-model-len, or increasing the shared memory size (e.g., --shm-size=24g in Docker), as in the sketch below. For more details, see the error and workaround discussions in vLLM issue #13297 and vLLM issue #10034.
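If you drive vLLM from the Python API rather than the CLI, a minimal sketch of where these knobs live looks like the following; the model name and the specific values are placeholders for your setup, not recommendations:

```python
from vllm import LLM, SamplingParams

# Minimal sketch using the offline LLM API; the checkpoint name and sizes
# below are hypothetical placeholders, not values taken from your setup.
llm = LLM(
    model="your-org/your-quantized-model",  # hypothetical checkpoint name
    tensor_parallel_size=2,                 # try a smaller TP degree first
    max_model_len=4096,                     # shrink the KV-cache footprint
    gpu_memory_utilization=0.85,            # leave headroom during loading
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```

The equivalent server flags are --tensor-parallel-size, --max-model-len, and --gpu-memory-utilization on vllm serve; for a prequantized BitsAndBytes model, use --pipeline-parallel-size instead of TP.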
Would you like a step-by-step troubleshooting guide or example commands for your setup?