My setup is two H100 PCIe cards (no NVLink). When running across both GPUs with vLLM 0.15 in a Python virtual environment, I hit this error, so I experimented with a small 1.5B model. In single-GPU mode it runs and serves normally (tested on GPU 0 and GPU 1 separately), but as soon as I add --tensor-parallel-size 2 it throws this error and hangs, producing no output. I have raised shared memory to 52 GB (total RAM is only 64 GB), but that seems to have no effect. I also asked an AI assistant, which had me change many NCCL- and SHM-related environment variables, to no avail. Is there a reasonable solution?
Here is information that might be needed:
OS: Kubuntu 24.04
Launch command: CUDA_VISIBLE_DEVICES=0,1 vllm serve /home/user/llm/Qwen2.5-1.5B-Instruct --served-model-name Qwen2.5-1.5B-Instruct --dtype auto --api-key token-abc123 --tensor-parallel-size 2 --gpu-memory-utilization 0.2
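For reference, these are the kinds of NCCL/SHM environment variables the AI assistant suggested before launching (the exact set I tried varied; I am not sure any of them is the right fix, they are shown here only so you know what has been attempted):

```shell
# Verbose NCCL logging, to see where tensor-parallel initialization stalls
# (diagnostic only, not a fix)
export NCCL_DEBUG=INFO

# Disable CUDA peer-to-peer copies; sometimes suggested for PCIe-only,
# no-NVLink machines
export NCCL_P2P_DISABLE=1

# Alternatively, disable NCCL's shared-memory transport and fall back to
# sockets (mutually diagnostic with the SHM sizing below)
# export NCCL_SHM_DISABLE=1
```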
Shared memory info (df -h /dev/shm; note that a plain umount /dev/shm fails with "target is busy"):
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            52G     0   52G   0% /dev/shm
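For completeness, since /dev/shm cannot be unmounted while processes hold it ("target is busy"), the 52 GB size was applied with an in-place remount, something like:

```shell
# Resize the tmpfs backing /dev/shm without unmounting it (requires root)
sudo mount -o remount,size=52G /dev/shm

# Confirm the new size
df -h /dev/shm
```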
GPU interconnect status:
(user) user@st650:~$ nvidia-smi topo -m
GPU0 GPU1 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NODE 0-31,64-95 0 N/A
GPU1 NODE X 0-31,64-95 0 N/A
Please let me know if you need any other information.