Xformers is not available, falling back

Does my vLLM Docker install have a problem with the RTX 5090? The log says xFormers is not available:

(EngineCore_0 pid=110) INFO 09-13 16:09:20 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
(EngineCore_0 pid=110) Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You’ll still be able to use a slow processor with use_fast=False.
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [gpu_model_runner.py:1953] Starting to load model google/gemma-3-12b-it...
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [gpu_model_runner.py:1985] Loading model from scratch...
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [cuda.py:345] Using FlexAttention backend for head_size=72 on V1 engine.
(EngineCore_0 pid=110) WARNING 09-13 16:09:27 [layer.py:52] Xformers is not available, falling back.
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [cuda.py:328] Using Flash Attention backend on V1 engine.
(EngineCore_0 pid=110) INFO 09-13 16:09:28 [weight_utils.py:296] Using model weights format ['*.safetensors']

No, your vLLM Docker install is not broken just because it says “Xformers is not available, falling back.” vLLM selects an attention backend automatically, and when xFormers is missing or unsupported it falls back to another backend such as FlashAttention or FlexAttention, both of which work on recent GPUs like the RTX 5090. Your log shows this happening: vLLM picks FlexAttention for head_size=72, prints the warning, then reports “Using Flash Attention backend on V1 engine” and moves on to loading the model weights. The warning is informational and does not indicate a failure.
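
If you want to confirm which backends were picked without reading the whole log by eye, a small script can scan the startup output for you. This is only a rough sketch: the function name `summarize_attention_logs` and the regular expression are my own, and the pattern assumes the message wording seen in the log above, which may differ in other vLLM versions.

```python
import re

# Matches messages such as "Using Flash Attention backend on V1 engine."
# Assumption: the wording follows the log lines quoted in this thread.
BACKEND_RE = re.compile(r"Using (?P<backend>[\w ]+?) backend\b")


def summarize_attention_logs(log_text):
    """Return (backends, xformers_missing) found in vLLM startup log text."""
    backends = []
    xformers_missing = False
    for line in log_text.splitlines():
        # The warning itself only tells us that a fallback happened.
        if "Xformers is not available" in line:
            xformers_missing = True
        # Any "Using ... backend" line names a backend that was selected.
        match = BACKEND_RE.search(line)
        if match:
            backends.append(match.group("backend"))
    return backends, xformers_missing


# Three lines copied from the log in the question.
SAMPLE = """\
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [cuda.py:345] Using FlexAttention backend for head_size=72 on V1 engine.
(EngineCore_0 pid=110) WARNING 09-13 16:09:27 [layer.py:52] Xformers is not available, falling back.
(EngineCore_0 pid=110) INFO 09-13 16:09:27 [cuda.py:328] Using Flash Attention backend on V1 engine.
"""

backends, xformers_missing = summarize_attention_logs(SAMPLE)
print(backends, xformers_missing)
```

If the warning is present and at least one backend line follows it, the fallback worked as intended. You could feed it the output of `docker logs` for your container instead of `SAMPLE`.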

If your model loads and inference works, you can safely ignore the xFormers warning. Would you like more detail on backend selection or troubleshooting?
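
One way to check that the model actually loaded is to ask the server which models it is serving. The sketch below assumes your container runs vLLM's OpenAI-compatible server and that it is reachable at `http://localhost:8000` (the port mapping is a guess about your Docker setup); the helper names are hypothetical and only the standard library is used.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url):
    """Build the model-listing URL of an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(payload):
    """Extract model ids from a /v1/models JSON response body."""
    return [entry["id"] for entry in json.loads(payload).get("data", [])]


def list_served_models(base_url="http://localhost:8000", timeout=5.0):
    """Return the served model ids, or None if the server is unreachable.

    Assumption: base_url points at the port your container publishes.
    """
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return parse_model_ids(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None
```

Calling `list_served_models()` once startup has finished should return a list containing `google/gemma-3-12b-it` if the load succeeded, and `None` if the server cannot be reached at that address.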
