How to set different attention backends for the prefill and decode stages?

Just like `--decode-attention-backend flashinfer` and `--prefill-attention-backend fa3`.

vLLM does not provide separate CLI options such as `--decode-attention-backend` and `--prefill-attention-backend` to set the attention backend independently for the decode and prefill phases. Instead, you can set a single backend for all phases using the `--attention-backend` argument or the `VLLM_ATTENTION_BACKEND` environment variable, and optionally specify the FlashAttention version with `--attention-config.flash_attn_version` (e.g., `2` or `3`). As of now, there is no documented way to assign different backends to the prefill and decode phases via the CLI or configuration.
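As an illustration, here is a minimal sketch of the global selection described above, assuming the `VLLM_ATTENTION_BACKEND` environment variable with `FLASHINFER` as an example value. Since vLLM resolves the backend when the engine initializes, the variable must be set before vLLM is imported:

```python
import os

# Pick one attention backend for ALL phases (prefill and decode alike).
# "FLASHINFER" is an assumed example value; there is no per-phase variant.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

# Import vLLM only after the variable is set, so the engine sees it.
# from vllm import LLM
# llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
```

The same effect can be had by exporting the variable in the shell before running `vllm serve`.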

For more granular control, you can use the `AttentionConfig` class to configure the backend and version, but this still applies globally, not per phase. See the `AttentionConfig` documentation and the Quickstart guide for details. Would you like more detail on configuring attention backends or on the backend selection logic?

Is there any feature or PR to independently set the attention backend for decode and prefill phases?