vLLM does not support FlashAttention 4 (FA4) as an attention backend; only FA2 and FA3 are supported. The environment variable VLLM_FLASH_ATTN_VERSION accepts only the values 2 or 3, and as of version 0.12.0 there is no mention of FA4 support for sm100 (B200) in the vLLM code or documentation. The “Unsupported FA version” error you encountered is the expected result of passing an unsupported version value[^1^][^2^][^3^].
If you need FA4-specific features, you must use a different framework (such as sglang) that explicitly supports FA4. For vLLM, set VLLM_FLASH_ATTN_VERSION=3 on Hopper (sm90) or compatible architectures, or use the recommended backend for your hardware. Would you like more detail on configuring vLLM attention backends?
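As a concrete illustration, the environment variable can be set before vLLM is imported so the attention backend picks it up at initialization. This is a minimal sketch; the model name is illustrative, and whether FA3 is actually usable still depends on your GPU architecture and build.

```python
import os

# Select FlashAttention 3 before importing/initializing vLLM.
# Must be set before engine construction so backend selection sees it.
os.environ["VLLM_FLASH_ATTN_VERSION"] = "3"

# Illustrative engine construction (model name is an example only):
# from vllm import LLM
# llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

print(os.environ["VLLM_FLASH_ATTN_VERSION"])  # 3
```

Equivalently, from a shell you would export VLLM_FLASH_ATTN_VERSION=3 before launching the server process.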
It seems likely that vLLM is working on integrating FlashAttention 4, as there is an open pull request for FA4 integration, but as of now the code path is unreachable: the version-selection logic only allows FA2 or FA3, so enabling FA4 triggers the error above. There is no official timeline or release date for FA4 support on sm100 (B200) in vLLM[^1^].
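The version-selection behavior described above can be sketched as follows. This is a hypothetical illustration of the gating logic, not vLLM's actual code; the names SUPPORTED_FA_VERSIONS and select_flash_attn_version are invented for this example.

```python
import os

# Hypothetical constant mirroring the documented constraint: only FA2/FA3.
SUPPORTED_FA_VERSIONS = (2, 3)

def select_flash_attn_version() -> int:
    """Sketch of env-var gating: reject anything other than FA2 or FA3."""
    raw = os.environ.get("VLLM_FLASH_ATTN_VERSION")
    if raw is None:
        return 3  # assumed default for this illustration
    version = int(raw)
    if version not in SUPPORTED_FA_VERSIONS:
        # Mirrors the "Unsupported FA version" failure mode.
        raise ValueError(f"Unsupported FA version: {version}")
    return version

os.environ["VLLM_FLASH_ATTN_VERSION"] = "4"
try:
    select_flash_attn_version()
except ValueError as e:
    print(e)  # Unsupported FA version: 4
```

This is why an FA4 code path in an open PR can exist yet remain unreachable: the check rejects the value before any FA4 branch could run.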
Would you like more detail on the current development status or how to track future updates?