I have an NVIDIA L40S with:
Driver 580.95.05 + CUDA 13.0
nvidia-smi works perfectly
Torch sees the GPU without any problems (torch.cuda.is_available() == True)
However, all modern inference engines fail with the error:
RuntimeError: No CUDA GPUs are available
This error only occurs in subprocesses (vLLM’s EngineCore)
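One way to check the "only in subprocesses" observation outside vLLM is a small comparison between the parent process and a child started through multiprocessing. This is only a sketch: the helper names probe and compare are made up for illustration, and it assumes PyTorch is importable (otherwise it records the error instead). The parent is probed first on purpose, to mimic a parent that has already touched CUDA before the child starts.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor


def probe(label):
    """Return a dict describing what this process can see of CUDA."""
    info = {
        "label": label,
        "pid": os.getpid(),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch  # imported lazily so every process does its own CUDA check

        info["is_available"] = torch.cuda.is_available()
        info["device_count"] = torch.cuda.device_count()
    except Exception as exc:  # ImportError or a CUDA initialization failure
        info["error"] = f"{type(exc).__name__}: {exc}"
    return info


def compare(method="spawn"):
    """Probe the parent, then one child created with the given start method."""
    parent = probe("parent")
    ctx = mp.get_context(method)
    try:
        with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
            child = pool.submit(probe, "child").result(timeout=300)
    except Exception as exc:  # the child crashed or could not be started
        child = {"label": "child", "error": f"{type(exc).__name__}: {exc}"}
    return parent, child


if __name__ == "__main__":
    # "spawn" starts a fresh interpreter. "fork" after the parent has
    # initialized CUDA is documented by PyTorch as unsupported, so a failure
    # with "fork" alone would not point at the driver.
    for report in compare("spawn"):
        print(report)
```

If the child report differs from the parent report here too, the problem is not specific to vLLM. If both look the same, the difference is more likely in how the engine process is launched (environment variables, start method).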
Tested:
vLLM 0.21.0:
VLLM_USE_V1=0
VLLM_WORKER_MULTIPROC_METHOD=spawn and VLLM_WORKER_MULTIPROC_METHOD=fork (both tried)
All possible NVML bypasses (VLLM_DISABLE_PYNVML, PYTORCH_NVML_DISABLE, etc.)
Clean reinstall of PyTorch cu130 + vLLM
NVML / pynvml:
nvidia-ml-py reinstalled
Driver reloaded (rmmod + modprobe nvidia)
PyTorch works, but pynvml.nvmlDeviceGetHandleByIndex() fails with NVMLError_Unknown
Environment
OS: RHEL-9
Podman version 5.4.0
Model: DeepSeek-Coder-V2-Lite-Base
GPU: NVIDIA L40S 48GB
Question
The problem seems to be an NVML incompatibility with the 580 driver + CUDA 13.0 that only shows up in child processes (multiprocessing/spawn).
Has anyone else encountered this problem on an L40S or with the 580 driver?
Thanks in advance!