NVIDIA has released its own version of the vLLM Docker image with optimizations for the NVIDIA DGX Spark (and some other recent hardware architectures): vLLM Release 25.09 - NVIDIA Docs
Can we expect to see these optimizations incorporated into the main vLLM repo? (It’s not clear why NVIDIA implemented this as a separate container instead of contributing back to the main project, but of course that’s a question for NVIDIA itself, not the community. I couldn’t find a forum or issue tracker for their fork, though; any pointers appreciated.)
Specifically, I’ve noticed the DGX Spark needs the custom image: trying to use the vllm/vllm-openai image results in a spurious ‘out of memory’ error on the same model. In other respects, though, the NVIDIA image is not up to date (e.g. it ships an older version of transformers, so we can’t run Qwen3-Next or swissai/apertus).
More generally, I’d welcome any suggestions or discussion about vLLM on the Spark!
NVIDIA’s vLLM container (Release 25.09) is a custom build with optimizations for DGX Spark, Jetson, Blackwell, and other recent NVIDIA hardware, including support for FP8/NVFP4 and multi-node configurations. This container is separate from the main vLLM project and is not maintained or updated in sync with the official vLLM Docker images, which may explain the lag in dependencies like transformers and the need for the NVIDIA image on Spark for correct memory handling and hardware support. There is no public issue tracker or forum for NVIDIA’s fork, and feedback channels are not documented in the release notes or official docs. NVIDIA’s developer forums may be the best place to ask for support or request upstreaming of changes, but no direct link is provided in the release notes or container docs (NVIDIA vLLM Release 25.09).
As for mainline vLLM, there is ongoing work to add and improve support for new architectures (including ARM64, Blackwell, and Spark) and multi-arch Docker images, but NVIDIA’s specific optimizations are not yet merged or available in the official vLLM Docker images (ci: Add arm64 docker build to release pipeline). For now, using NVIDIA’s container is necessary for full Spark support, but you may need to manually update dependencies inside the container for newer models. Would you like more detail on how to update dependencies or track upstream support for Spark in vLLM?
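If you do need newer model support inside NVIDIA’s container, one approach is to layer a dependency upgrade on top of it with a small Dockerfile. This is a sketch under stated assumptions, not a tested recipe: the base image name and tag below are placeholders (check the NGC catalog for the actual DGX Spark vLLM container), and the right transformers version depends on which vLLM build the container ships.

```dockerfile
# Base image name and tag are assumptions -- look up the actual
# DGX Spark vLLM container in the NGC catalog before building.
FROM nvcr.io/nvidia/vllm:25.09-py3

# Upgrade transformers so newer architectures (e.g. Qwen3-Next) can load.
# Prefer pinning a version you have verified against the container's vLLM
# build; an unpinned upgrade can break vLLM's model loading.
RUN pip install --no-cache-dir --upgrade transformers
```

Build with something like `docker build -t vllm-spark-updated .` and run it the same way you run the original container. The caveat is that transformers versions newer than what vLLM was built against aren’t guaranteed to be compatible, so test the specific model you care about before relying on it.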