vLLM sockets use case (CRIU, k8s, checkpointing)

I tried out vLLM on my machine with `vllm serve Qwen/Qwen3-0.6B-Base`.

Using ss, I noticed that a bunch of sockets are opened by vLLM besides the API server. What are these sockets used for?

$ sudo ss -tulnp | grep -i VLLM
tcp LISTEN 0 4096 127.0.0.2:38197 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=47))
tcp LISTEN 0 4096 127.0.0.2:46455 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=57))
tcp LISTEN 0 4096 127.0.0.2:39503 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=39))
tcp LISTEN 0 4096 127.0.0.2:41777 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=49))
tcp LISTEN 0 2048 0.0.0.0:8000 0.0.0.0:* users:((".vllm-wrapped",pid=164735,fd=12))
tcp LISTEN 0 4096 127.0.0.2:34587 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=55))
tcp LISTEN 0 4096 127.0.0.2:42859 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=65))
tcp LISTEN 0 4096 127.0.0.2:42055 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=79))
tcp LISTEN 0 4096 127.0.0.2:35239 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=41))
tcp LISTEN 0 4096 127.0.0.2:43415 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=73))
tcp LISTEN 0 4096 127.0.0.2:43011 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=63))
tcp LISTEN 0 4096 127.0.0.2:44827 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=33))
tcp LISTEN 0 4096 127.0.0.2:44125 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=29))
tcp LISTEN 0 4096 127.0.0.2:44285 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=71))
tcp LISTEN 0 4096 *:55381 *:* users:(("VLLM::EngineCor",pid=164878,fd=21))
[sinan@cez:~]$
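The seemingly random high port numbers (38197, 46455, ...) are characteristic of binding to port 0 and letting the kernel assign an ephemeral port, which is a common pattern for internal IPC listeners like these. A minimal sketch of that pattern (not vLLM's actual code, just an illustration of why the ports look random):

```python
import socket

# Bind to port 0 so the kernel picks a free ephemeral port, the same
# pattern that produces the random-looking high ports in the ss output.
socks = []
for _ in range(3):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen()
    socks.append(s)

# getsockname() reveals the port the kernel actually assigned.
ports = [s.getsockname()[1] for s in socks]
print(ports)

for s in socks:
    s.close()
```

While the sockets are open, `ss -tlnp` would show them against this process, just like the EngineCore entries above.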


The extra sockets you see (besides the API server on port 8000) are used by vLLM for internal inter-process communication between its engine components, such as the EngineCore and worker processes. They are dynamically allocated and coordinate tasks like model execution, tensor parallelism, and distributed inference within vLLM's multiprocessing or distributed setup. They are not exposed for external access; they carry tensor/pipeline-parallel traffic and data transfers between processes or devices, as described in the vLLM architecture and distributed-execution documentation.

If you are running with tensor or pipeline parallelism, or using multiprocessing, vLLM opens multiple sockets per worker/engine process so they can communicate efficiently. This is expected behavior and is necessary for vLLM's scalability and performance. Would you like more detail on the specific socket types or how to control their allocation?
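To see which of a process's file descriptors are sockets (and match them against the `fd=` column in the ss output), you can inspect `/proc/<pid>/fd` directly. A small sketch; it uses the current shell's PID as a stand-in so it runs anywhere, but for the output above you would substitute the EngineCore PID 164878:

```shell
# List a process's fds and count how many are sockets.
# pid=$$ is a stand-in; use the EngineCore PID (e.g. 164878) in practice.
pid=$$
ls -l "/proc/$pid/fd" 2>/dev/null
nsock=$(ls -l "/proc/$pid/fd" 2>/dev/null | grep -c socket || true)
echo "sockets: $nsock"
```

Each `socket:[inode]` entry there corresponds to one row in ss, which is also the inventory CRIU has to capture and restore when checkpointing such a process.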



These sockets are created by `torch.distributed.init_process_group()`.
You can use the `GLOO_SOCKET_IFNAME` and `NCCL_SOCKET_IFNAME` environment variables to change the socket interface.
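A quick sketch of using those environment variables: set them before launching the server so Gloo/NCCL bind their sockets to a specific interface (the interface name `eth0` here is an assumption; use the one from `ip link` on your machine):

```shell
# Pin torch.distributed (Gloo) and NCCL socket traffic to one interface.
# "eth0" is an assumed interface name; substitute your own.
export GLOO_SOCKET_IFNAME=eth0
export NCCL_SOCKET_IFNAME=eth0
echo "GLOO iface: $GLOO_SOCKET_IFNAME, NCCL iface: $NCCL_SOCKET_IFNAME"

# Then start the server as usual:
# vllm serve Qwen/Qwen3-0.6B-Base
```

Pinning the interface makes the sockets' bind addresses predictable, which can help when checkpointing with CRIU or constraining traffic inside a k8s pod.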