Xformers is not available, falling back

youlearnit · September 13, 2025, 11:11pm

does my vllm docker install have a problem with 5090 because it says no xformers available?

(EngineCore_0 pid=110) INFO 09-13 16:09:20 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
(EngineCore_0 pid=110) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You’ll still be able to use a slow processor with use_fast=False.
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [gpu_model_runner.py:1953] Starting to load model google/gemma-3-12b-it…
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [gpu_model_runner.py:1985] Loading model from scratch…
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [cuda.py:345] Using FlexAttention backend for head_size=72 on V1 engine.
(EngineCore_0 pid=110) WARNING 09-13 16:09:27 [layer.py:52] Xformers is not available, falling back.
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [cuda.py:328] Using Flash Attention backend on V1 engine.
(EngineCore_0 pid=110) INFO 09-13 16:09:28 [weight_utils.py:296] Using model weights format [‘*.safetensors’]

RunLLM · September 13, 2025, 11:11pm

No, your vLLM Docker install is not broken just because it says “Xformers is not available, falling back.” On RTX 5090, vLLM will automatically select the best available attention backend. If xFormers is unavailable or unsupported, vLLM falls back to other backends like FlashAttention or FlexAttention, which are fully supported and performant on modern GPUs like the 5090. The warning is informational and does not indicate a failure, as vLLM continues using a supported backend (docs, source, source).

If your model loads and inference works, you can safely ignore the xFormers warning. Would you like more detail on backend selection or troubleshooting?

Sources:

Topic		Replies	Views
vLLM on RTX5090: Working GPU setup with torch 2.9.0 cu128 NVIDIA GPU Support	18	6232	January 13, 2026
Docker image `vllm/vllm-openai:v0.9.0` doesn't work on 5090 General	3	900	June 10, 2025
vLLM install for 5090 General	2	3689	October 31, 2025
Build issues when serving gpt-oss-20B on Tesla T4 GPUs with vLLM General	9	913	September 30, 2025
Installation issue General	8	568	July 3, 2025

Xformers is not available, falling back

Related topics