Which ATTENTION BACKEND for gpt-oss in version 0.11.0?

thomas · October 4, 2025, 1:21am

Which VLLM_ATTENTION_BACKEND should I use for gpt-oss in version 0.11.0? I used to set “TRITON_ATTN_VLLM_V1”, but that no longer works in 0.11.0. It fails with “Value error, Invalid value ‘TRITON_ATTN_VLLM_V1’”. Please help!

RunLLM · October 4, 2025, 1:22am

“TRITON_ATTN_VLLM_V1” is no longer a valid value for VLLM_ATTENTION_BACKEND in vLLM 0.11.0. Supported values include “FLASH_ATTN”, “FLASHINFER”, and “XFORMERS”; which one works for you depends on your hardware and installed libraries. For example:

export VLLM_ATTENTION_BACKEND=FLASH_ATTN

According to the official documentation, vLLM will automatically select the best backend if you do not set this variable, but you can override it with one of the supported values.
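If the old value is still exported from a shell profile or service file, the server will keep failing with the same error. Below is a minimal sketch that clears such a value so vLLM falls back to automatic backend selection. It assumes that any name ending in `_VLLM_V1` is a pre-0.11.0 backend name, as the rejected `TRITON_ATTN_VLLM_V1` is. `is_legacy_attn_backend` is a hypothetical helper, not part of vLLM.

```shell
#!/usr/bin/env bash
# Sketch: clear a legacy attention-backend value before launching vLLM 0.11.0.
# Assumption: names ending in "_VLLM_V1" are pre-0.11.0 names that 0.11.0
# rejects (as it rejects TRITON_ATTN_VLLM_V1).

# Hypothetical helper: returns success (0) if the given backend name
# ends in the old "_VLLM_V1" suffix, failure (1) otherwise.
is_legacy_attn_backend() {
  case "$1" in
    *_VLLM_V1) return 0 ;;
    *) return 1 ;;
  esac
}

# ${VAR:-} expands to "" when unset, so an unset variable is left alone.
if is_legacy_attn_backend "${VLLM_ATTENTION_BACKEND:-}"; then
  echo "Unsetting legacy VLLM_ATTENTION_BACKEND=$VLLM_ATTENTION_BACKEND" >&2
  # With the variable unset, vLLM chooses a backend automatically.
  unset VLLM_ATTENTION_BACKEND
fi
```

Source the file in the same shell (for example `. ./clear_legacy_backend.sh`, a hypothetical filename) before running `vllm serve openai/gpt-oss-20b`. Running it as a separate script would only unset the variable in a child process, leaving your current shell unchanged.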

Would you like more detail on which backend is best for your hardware?

