How do I enable FA4 on B200?

I exported the following environment variables on 8xB200:

VLLM_ATTENTION_BACKEND=FLASH_ATTN
VLLM_FLASH_ATTN_VERSION=4

But vLLM reported an error like this:

(Worker_TP6 pid=10328) ERROR 12-17 23:53:18 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 308, in flash_attn_varlen_func
(Worker_TP6 pid=10328) ERROR 12-17 23:53:18 [multiproc_executor.py:822]     raise ValueError(f"Unsupported FA version: {fa_version}")
(Worker_TP6 pid=10328) ERROR 12-17 23:53:18 [multiproc_executor.py:822] ValueError: Unsupported FA version: None

Does vLLM support FA4 on sm100, like SGLang's --prefill-attention-backend fa4?

My vLLM version is 0.12.0, CUDA is 13.0, and the NVIDIA driver is 580.82.07.

vLLM does not support FlashAttention 4 (FA4) as an attention backend; only FA2 and FA3 are supported, and requesting FA4 results in an "Unsupported FA version" error. The environment variable VLLM_FLASH_ATTN_VERSION accepts only the values 2 or 3, and as of version 0.12.0 there is no FA4 support for sm100 (B200) in the vLLM code or documentation. The error you encountered is the expected result of requesting an unsupported FA version.

If you need FA4-specific features, you must use a backend or framework that explicitly supports FA4, such as SGLang. For vLLM, set VLLM_FLASH_ATTN_VERSION=3 on Hopper (sm90) or compatible architectures, or use the recommended backend for your hardware. Would you like more detail on configuring vLLM attention backends?
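For reference, a minimal sketch of the working configuration described above: the environment variables are set before vLLM initializes, and FA3 is pinned for sm90-class hardware. The model name here is a placeholder, not something from this thread.

```python
# Pin the FlashAttention backend and version before vLLM starts.
# These must be set before vLLM's backend selection runs.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"
os.environ["VLLM_FLASH_ATTN_VERSION"] = "3"  # only "2" or "3" are accepted

# With the environment prepared, launch as usual (placeholder model name):
# from vllm import LLM
# llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=8)
```

Equivalently, you can export the same two variables in the shell before running `vllm serve`.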

Hint: Mention @RunLLM in the post for followups.

Does vLLM have plans to support FA4 on sm100? If so, roughly when would that be? @RunLLM

It seems likely that vLLM is working on integrating FlashAttention 4 (FA4): there is an open pull request for FA4 integration, but as of now FA4 is not supported, and the code path is unreachable because the version-selection logic only allows FA2 or FA3. There is no official timeline or release date for FA4 support on sm100 (B200) in vLLM, and enabling FA4 currently triggers an error.

Would you like more detail on the current development status or how to track future updates?
