I’m seeing large models (20B+) crash during startup (loading/compilation) when /dev/shm
is limited to 8 GiB, but the same models load fine with 16 GiB. Does vLLM initialization require more than 8 GiB of /dev/shm for models this size even with tensor_parallel_size=1
, and if so, what size do you recommend? Or is /dev/shm size only relevant for multi-GPU (tensor-parallel) runs?
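In case it helps, this is roughly how I'm launching (a sketch, assuming the official `vllm/vllm-openai` Docker image; the model name is a placeholder for any 20B+ checkpoint, and `--shm-size` is the Docker flag that sets the container's /dev/shm):

```shell
# Same model and flags; only the /dev/shm size differs.

# With 8 GiB, startup crashes during loading/compilation:
docker run --gpus all --shm-size=8g \
    vllm/vllm-openai --model <some-20B-model> --tensor-parallel-size 1

# With 16 GiB, the server starts normally:
docker run --gpus all --shm-size=16g \
    vllm/vllm-openai --model <some-20B-model> --tensor-parallel-size 1
```

Inside the container I can confirm the limit with `df -h /dev/shm` before the crash, so the difference really does seem to be the shm size and not something else in the environment.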