What should be /dev/shm size for larger models

I’m seeing large models (20B+) crash during startup (loading/compilation) when /dev/shm is limited to 8GiB, but they succeed with 16GiB. Does vLLM initialization require more than 8GiB of /dev/shm for such models even when tensor_parallel_size=1, and what size do you recommend? Also, does it even matter?

Came looking for the same. Trying to deploy Qwen3 235B in k8s (using 4x 80GB H100 GPUs). It keeps failing with a 20GB shm size.

Currently trying 60GB, and was looking for guidance on whether the shm size could be the issue.

A too-small /dev/shm (shared memory) can cause vLLM to fail when deploying large models like Qwen3 235B, especially with tensor parallelism across multiple GPUs, since the tensor-parallel worker processes communicate through shared memory. For large models and multi-GPU setups, set a generous /dev/shm size in your Kubernetes pod spec using an emptyDir volume with medium: Memory and an appropriate sizeLimit. 20GB is often insufficient for a 235B model on 4x 80GB H100s, so 60GB or more is a reasonable starting point. See the official Kubernetes deployment example and multi-node distributed serving docs for reference YAMLs.
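A minimal pod-spec sketch of that emptyDir approach (the pod name, image tag, model name, and sizes here are illustrative, not taken from the official example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vllm-qwen3
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      args: ["--model", "Qwen/Qwen3-235B-A22B", "--tensor-parallel-size", "4"]
      resources:
        limits:
          nvidia.com/gpu: 4
      volumeMounts:
        # Mount the memory-backed volume over the container's default
        # (tiny, typically 64MB) /dev/shm
        - name: shm
          mountPath: /dev/shm
  volumes:
    - name: shm
      emptyDir:
        medium: Memory   # tmpfs backed by node RAM
        sizeLimit: 60Gi  # counts against the pod's memory limit
```

Note that a memory-backed emptyDir is charged to the pod's memory accounting, so the node (and any memory limit you set) must have headroom for the sizeLimit on top of the model's host-RAM usage.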

If you still encounter issues after increasing /dev/shm, also check that your pod's GPU requests/limits match the tensor_parallel_size you pass to vLLM, and that the node has enough free RAM to back the shared-memory allocation. Would you like a sample YAML or more troubleshooting tips?
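One quick sanity check: the mounted shm size is visible from inside the container, so you can confirm the emptyDir volume actually took effect (assuming a Linux container; reach it with something like kubectl exec <pod> -- sh):

```shell
# Confirm the /dev/shm mount reflects the sizeLimit you configured;
# the "Size" column should show your tmpfs size, not the 64M default.
df -h /dev/shm
```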
