Which VLLM_ATTENTION_BACKEND should I use for gpt-oss in vLLM 0.11.0? I used to set "TRITON_ATTN_VLLM_V1", but it no longer works in 0.11.0 and fails with "Value error, Invalid value 'TRITON_ATTN_VLLM_V1'". Please help!
The value "TRITON_ATTN_VLLM_V1" is not a valid option for VLLM_ATTENTION_BACKEND in vLLM 0.11.0. Accepted values include "FLASH_ATTN", "FLASHINFER", and "XFORMERS". Set VLLM_ATTENTION_BACKEND to whichever of these matches your hardware and installed libraries. For example:
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
According to the official documentation, vLLM will automatically select the best backend if you do not set this variable, but you can override it with one of the supported values.
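If you launch vLLM from a Python script, the same choice can be made before vLLM is imported. The sketch below is only an illustration: `choose_attention_backend` and `KNOWN_BACKENDS` are hypothetical names, not part of vLLM, and the tuple contains only the values named in this answer, so it may not be the full set that 0.11.0 accepts.

```python
import os

# Assumption: only the values named in this answer; not an exhaustive
# list taken from vLLM itself.
KNOWN_BACKENDS = ("FLASH_ATTN", "FLASHINFER", "XFORMERS")


def choose_attention_backend(requested=None, env=os.environ):
    """Hypothetical helper: set or clear VLLM_ATTENTION_BACKEND in `env`.

    Passing None removes the variable so that vLLM can pick a backend
    on its own, as described above.
    """
    if requested is None:
        # Clear any stale override (for example an old TRITON_ATTN_VLLM_V1).
        env.pop("VLLM_ATTENTION_BACKEND", None)
        return None
    if requested not in KNOWN_BACKENDS:
        raise ValueError(
            f"{requested!r} is not one of {KNOWN_BACKENDS}; "
            "unset the variable to fall back to automatic selection"
        )
    env["VLLM_ATTENTION_BACKEND"] = requested
    return requested
```

Call it before the first `import vllm` in your script, for example `choose_attention_backend("FLASH_ATTN")`, or `choose_attention_backend()` to drop an old override and rely on automatic selection.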
Would you like more detail on which backend is best for your hardware?