Hello, I am trying to run the NVIDIA Nemotron 3 Nano NVFP4 model across the two Blackwell GPUs installed in my system. The model is only about 20 GB in size, so it should fit in my dual-GPU configuration, which has 32 GB of VRAM in total (2x 16 GB), minus the 2-4 GB of OS overhead.
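For reference, here is the rough arithmetic behind that estimate as a small Python sketch. The 20 GB weight size and the 2-4 GB overhead are the approximate figures quoted above. Treating the overhead as one system-wide reservation, and ignoring KV cache and activation memory, are simplifying assumptions on my part; `headroom_gb` is just an illustrative helper name.

```python
# Rough VRAM budget; all figures are the approximate ones quoted in this post.
GPU_VRAM_GB = [16, 16]     # RTX 5070 Ti + RTX 5080
MODEL_WEIGHTS_GB = 20      # approximate size of the NVFP4 checkpoint
OS_OVERHEAD_GB = (2, 4)    # estimated low/high range of OS overhead (assumption)


def headroom_gb(vram_gb, weights_gb, overhead_gb):
    """Return the VRAM left over after subtracting weights and OS overhead."""
    return sum(vram_gb) - weights_gb - overhead_gb


best = headroom_gb(GPU_VRAM_GB, MODEL_WEIGHTS_GB, OS_OVERHEAD_GB[0])
worst = headroom_gb(GPU_VRAM_GB, MODEL_WEIGHTS_GB, OS_OVERHEAD_GB[1])

print(f"Headroom across both GPUs: {worst}-{best} GB")
print(f"Weights fit on a single 16 GB GPU: {MODEL_WEIGHTS_GB <= max(GPU_VRAM_GB)}")
```

The second line of output is the reason I need both GPUs: the weights alone are larger than either card's 16 GB.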
System Configuration
- Motherboard: Gigabyte Aorus B650E Elite ICE AX (firmware version F41, latest, January 2026)
- CPU: AMD Ryzen 9 9950X
- Memory: 128 GB (4x32 GB) TeamGroup DDR5
- GPU0: NVIDIA GeForce RTX 5070 Ti (16 GB VRAM)
- GPU1: NVIDIA GeForce RTX 5080 (16 GB VRAM)
- NVIDIA driver 591.86
- Windows 11 25H2 fully patched
- Docker Desktop v4.58.0 (latest)
- WSL version: 2.5.9.0
- Kernel version: 6.6.87.2-1
- WSLg version: 1.0.66
- MSRDC version: 1.2.6074
- Direct3D version: 1.611.1-81528511
- DXCore version: 10.0.26100.1-240331-1435.ge-release
- Windows version: 10.0.26200.7623
Here’s the nvidia-smi output from the Docker Desktop Windows Subsystem for Linux (WSL) environment:
Actual Result
According to Google AI Studio, the following vLLM container command should launch NVIDIA Nemotron 3 Nano NVFP4 successfully:
docker run --gpus all `
--rm `
-v "C:\git\NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4:/model" `
-p 8000:8000 `
--env VLLM_USE_FLASHINFER_MOE_FP4=1 `
--env VLLM_FLASHINFER_MOE_BACKEND=throughput `
--env CUDA_DEVICE_ORDER=PCI_BUS_ID `
--ipc=host `
vllm/vllm-openai:latest `
--model /model `
--served-model-name nemotron `
--max-model-len 131072 `
--max-num-seqs 8 `
--kv-cache-dtype fp8 `
--trust-remote-code `
--reasoning-parser-plugin "/model/nano_v3_reasoning_parser.py" `
--reasoning-parser nano_v3
When I run this command, I get the following errors:
Starting to load model /model...
Using flashinfer-cutlass for NVFP4 GEMM
Process EngineCore_DP0:
EngineCore failed to start.
ValueError: NvFp4 MoE backend 'FLASHINFER_CUTLASS' does not support the deployment configuration since kernel does not support current device.
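Since the error says the kernel does not support the current device, the compute capability that each GPU reports inside the container seems like relevant information to collect. Below is a minimal sketch for gathering it, assuming `nvidia-smi` is on the PATH and the installed driver supports the `compute_cap` query field. `parse_compute_caps` and `query_compute_caps` are hypothetical helper names of my own, not part of vLLM or FlashInfer.

```python
import subprocess


def parse_compute_caps(csv_text):
    """Parse 'name, compute_cap' CSV rows (no header) into (name, (major, minor)) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name cannot break parsing.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        major, minor = cap.split(".")
        gpus.append((name, (int(major), int(minor))))
    return gpus


def query_compute_caps():
    """Ask nvidia-smi for each GPU's name and compute capability."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)


if __name__ == "__main__":
    try:
        for name, (major, minor) in query_compute_caps():
            print(f"{name}: compute capability {major}.{minor}")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # nvidia-smi is missing or failed; report that instead of crashing.
        print(f"Could not query GPUs: {exc}")
```

Running this inside the same container image (rather than on the Windows host) would show what the vLLM process itself sees, which could then be compared against the architectures the FlashInfer CUTLASS NVFP4 MoE kernels are built for.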
Expected Result
The NVFP4 model should run on both NVIDIA Blackwell GPUs installed in my local system, under the Windows 11 WSL environment, via Docker Desktop.
Does anyone have any ideas on how to resolve this and correctly run NVIDIA Nemotron 3 Nano NVFP4?
