nvidia-smi shows both GPUs as visible (2 × 24 GB). However, I get the following errors:
(VllmWorkerProcess pid=723155) ERROR 06-24 19:28:36 [multiproc_worker_utils.py:239] RuntimeError: CUDA error: out of memory
(VllmWorkerProcess pid=723155) ERROR 06-24 19:28:36 [multiproc_worker_utils.py:239] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorkerProcess pid=723155) ERROR 06-24 19:28:36 [multiproc_worker_utils.py:239] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorkerProcess pid=723155) ERROR 06-24 19:28:36 [multiproc_worker_utils.py:239] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorkerProcess pid=723155) ERROR 06-24 19:28:36 [multiproc_worker_utils.py:239]
Failed to evaluate AIDC-AI/Ovis2-1B: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
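Notably, even the 1B model fails inside `torch.cuda.set_device`, i.e. before any weights are loaded, which usually means the CUDA context itself cannot be created (for example because a previous, not-fully-killed worker still holds the card). A minimal sketch of one common workaround, assuming you want to pin each launch to specific GPUs (the helper name is hypothetical; `CUDA_VISIBLE_DEVICES` must be set before the process makes its first CUDA call):

```python
import os

def visible_devices_env(gpu_ids):
    """Return an environment override restricting CUDA to the given GPU indices.

    Hypothetical helper for illustration: the override only takes effect if it
    is in place before the first CUDA initialization in the target process.
    """
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids)}

env = os.environ.copy()
env.update(visible_devices_env([0, 1]))
print(env["CUDA_VISIBLE_DEVICES"])  # prints "0,1"
```

Passing this `env` to `subprocess.Popen` (or exporting it in the shell before launching) keeps each run from touching GPUs that a stale process might still occupy.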
(VllmWorkerProcess pid=726023) INFO 06-24 19:30:54 [cuda.py:275] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
(VllmWorkerProcess pid=726023) INFO 06-24 19:30:54 [cuda.py:324] Using XFormers backend.
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] Traceback (most recent call last):
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] File "/home/bdck/VLM/venv_vlm_prometheus/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 233, in _run_worker_process
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] File "/home/bdck/VLM/venv_vlm_prometheus/lib/python3.10/site-packages/vllm/utils.py", line 2671, in run_method
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] return func(*args, **kwargs)
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] File "/home/bdck/VLM/venv_vlm_prometheus/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 606, in init_device
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] self.worker.init_device() # type: ignore
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] File "/home/bdck/VLM/venv_vlm_prometheus/lib/python3.10/site-packages/vllm/worker/worker.py", line 182, in init_device
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] File "/home/bdck/VLM/venv_vlm_prometheus/lib/python3.10/site-packages/torch/cuda/__init__.py", line 529, in set_device
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] RuntimeError: CUDA error: out of memory
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorkerProcess pid=726023) ERROR 06-24 19:30:55 [multiproc_worker_utils.py:239]
Failed to evaluate Qwen/Qwen2.5-VL-32B-Instruct: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
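Separately from the `init_device` failure, a back-of-envelope check suggests Qwen2.5-VL-32B would not fit on this hardware even if initialization succeeded: at fp16/bf16 the weights alone need roughly 60 GiB, more than the 48 GB total across both cards. A rough sketch (assumes 2 bytes per parameter and ignores KV cache, activations, and the vision tower):

```python
def weight_memory_gib(n_params, bytes_per_param=2):
    """Lower bound on GPU memory for model weights alone (fp16/bf16 = 2 bytes),
    ignoring KV cache, activations, and any vision components."""
    return n_params * bytes_per_param / 1024**3

needed = weight_memory_gib(32e9)   # ~59.6 GiB for a 32B-parameter model
available = 2 * 24                 # two 24 GB cards
print(round(needed, 1), needed > available)  # prints "59.6 True"
```

So for this particular model, quantization (e.g. an AWQ/GPTQ checkpoint) or more total GPU memory would be needed regardless of the context-creation error.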
ERROR 06-24 19:30:55 [multiproc_worker_utils.py:121] Worker VllmWorkerProcess pid 596088 died, exit code: -15
INFO 06-24 19:30:55 [multiproc_worker_utils.py:125] Killing local vLLM worker processes
ERROR 06-24 19:30:55 [multiproc_worker_utils.py:121] Worker VllmWorkerProcess pid 643798 died, exit code: -15
INFO 06-24 19:30:55 [multiproc_worker_utils.py:125] Killing local vLLM worker processes
ERROR 06-24 19:30:55 [multiproc_worker_utils.py:121] Worker VllmWorkerProcess pid 691188 died, exit code: -15
INFO 06-24 19:30:55 [multiproc_worker_utils.py:125] Killing local vLLM worker processes
ERROR 06-24 19:30:56 [multiproc_worker_utils.py:121] Worker VllmWorkerProcess pid 572763 died, exit code: -15
INFO 06-24 19:30:56 [multiproc_worker_utils.py:125] Killing local vLLM worker processes
INFO 06-24 19:30:56 [multiproc_worker_utils.py:125] Killing local vLLM worker processes
ERROR 06-24 19:30:56 [multiproc_worker_utils.py:121] Worker VllmWorkerProcess pid 572244 died, exit code: -15
INFO 06-24 19:30:56 [multiproc_worker_utils.py:125] Killing local vLLM worker processes
[rank0]:[W624 19:30:59.935167688 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info,
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
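The resource_tracker warning at the end is consistent with workers dying (exit code -15) without cleaning up: Python's `multiprocessing.shared_memory` backs its segments with files under `/dev/shm` (typically with a `psm_` prefix), and stale segments there can keep resources pinned between runs. A small sketch for spotting leftovers after all vLLM processes have exited (the `psm_*` pattern is an assumption about the default naming):

```python
import glob

def leaked_shm_segments(pattern="/dev/shm/psm_*"):
    """List POSIX shared-memory files matching the naming pattern Python's
    multiprocessing typically uses; anything left after all workers exit
    is likely a stale segment from a crashed run."""
    return sorted(glob.glob(pattern))

# With no vLLM processes running, any entries printed here are candidates
# for manual removal before the next launch:
print(leaked_shm_segments())
```

Checking GPU memory with `nvidia-smi` after such a cleanup (and killing any orphaned worker PIDs) would confirm whether stale processes were the cause of the `set_device` OOM.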