vLLM sockets use case (CRIU, k8s, checkpointing)

I tried out vLLM on my machine with `vllm serve Qwen/Qwen3-0.6B-Base`.

Using ss, I noticed that a bunch of sockets are opened by vLLM besides the API server. What are these sockets used for?

$ sudo ss -tulnp | grep -i VLLM
tcp LISTEN 0 4096 127.0.0.2:38197 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=47))
tcp LISTEN 0 4096 127.0.0.2:46455 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=57))
tcp LISTEN 0 4096 127.0.0.2:39503 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=39))
tcp LISTEN 0 4096 127.0.0.2:41777 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=49))
tcp LISTEN 0 2048 0.0.0.0:8000 0.0.0.0:* users:((".vllm-wrapped",pid=164735,fd=12))
tcp LISTEN 0 4096 127.0.0.2:34587 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=55))
tcp LISTEN 0 4096 127.0.0.2:42859 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=65))
tcp LISTEN 0 4096 127.0.0.2:42055 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=79))
tcp LISTEN 0 4096 127.0.0.2:35239 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=41))
tcp LISTEN 0 4096 127.0.0.2:43415 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=73))
tcp LISTEN 0 4096 127.0.0.2:43011 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=63))
tcp LISTEN 0 4096 127.0.0.2:44827 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=33))
tcp LISTEN 0 4096 127.0.0.2:44125 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=29))
tcp LISTEN 0 4096 127.0.0.2:44285 0.0.0.0:* users:(("VLLM::EngineCor",pid=164878,fd=71))
tcp LISTEN 0 4096 *:55381 *:* users:(("VLLM::EngineCor",pid=164878,fd=21))
[sinan@cez:~]$
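The seemingly random high port numbers (38197, 46455, ...) are characteristic of binding to port 0 and letting the kernel assign an ephemeral port, which is a common pattern for internal IPC listeners like these. A minimal sketch of that pattern (not vLLM's actual code, just an illustration of why the ports look random):

```python
import socket

# Bind to port 0 so the kernel picks a free ephemeral port, the same
# pattern that produces the random-looking high ports in the ss output.
socks = []
for _ in range(3):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen()
    socks.append(s)

# getsockname() reveals the port the kernel actually assigned.
ports = [s.getsockname()[1] for s in socks]
print(ports)

for s in socks:
    s.close()
```

While the sockets are open, `ss -tlnp` would show them against this process, just like the EngineCore entries above.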


The extra sockets you see (besides the API server on port 8000) are used by vLLM for internal inter-process communication between its engine components, such as the EngineCore and worker processes. They are dynamically allocated and coordinate tasks like model execution, tensor parallelism, and distributed inference within vLLM's multiprocessing or distributed setup. They are not exposed for external access; they carry tensor/pipeline-parallel traffic and data transfers between processes or devices, as described in the vLLM architecture and distributed-execution documentation.

If you are running with tensor or pipeline parallelism, or using multiprocessing, vLLM opens multiple sockets per worker/engine process so they can communicate efficiently. This is expected behavior and is necessary for vLLM's scalability and performance. Would you like more detail on the specific socket types or how to control their allocation?
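To see which of a process's file descriptors are sockets (and match them against the `fd=` column in the ss output), you can inspect `/proc/<pid>/fd` directly. A small sketch; it uses the current shell's PID as a stand-in so it runs anywhere, but for the output above you would substitute the EngineCore PID 164878:

```shell
# List a process's fds and count how many are sockets.
# pid=$$ is a stand-in; use the EngineCore PID (e.g. 164878) in practice.
pid=$$
ls -l "/proc/$pid/fd" 2>/dev/null
nsock=$(ls -l "/proc/$pid/fd" 2>/dev/null | grep -c socket || true)
echo "sockets: $nsock"
```

Each `socket:[inode]` entry there corresponds to one row in ss, which is also the inventory CRIU has to capture and restore when checkpointing such a process.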



These sockets are created by `torch.distributed.init_process_group()`.
You can use the `GLOO_SOCKET_IFNAME` and `NCCL_SOCKET_IFNAME` environment variables to change the socket interface.
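A quick sketch of using those environment variables: set them before launching the server so Gloo/NCCL bind their sockets to a specific interface (the interface name `eth0` here is an assumption; use the one from `ip link` on your machine):

```shell
# Pin torch.distributed (Gloo) and NCCL socket traffic to one interface.
# "eth0" is an assumed interface name; substitute your own.
export GLOO_SOCKET_IFNAME=eth0
export NCCL_SOCKET_IFNAME=eth0
echo "GLOO iface: $GLOO_SOCKET_IFNAME, NCCL iface: $NCCL_SOCKET_IFNAME"

# Then start the server as usual:
# vllm serve Qwen/Qwen3-0.6B-Base
```

Pinning the interface makes the sockets' bind addresses predictable, which can help when checkpointing with CRIU or constraining traffic inside a k8s pod.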