Warning while serving Qwen/Qwen3.6-35B-A3B-FP8

The warning

(EngineCore pid=77) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (16) < num_heads (32)...

is expected and is not an error. It is emitted when the sequence length is smaller than the number of attention heads (here 16 < 32), which can happen during normal vLLM inference, especially with chunked prefill or small input batches. It does not indicate a malfunction, and you can safely ignore it unless you also observe incorrect outputs or crashes. See the discussion in vllm-project/vllm#37103.
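If the message clutters your logs, one option is a narrowly scoped filter from Python's standard `warnings` module. The following is a minimal sketch, not a vLLM-provided setting: the helper name `silence_fla_shape_warning` is hypothetical, and the message prefix is copied from the log line above (the part elided by "..." is not matched). The log prefix suggests the warning is raised in a separate engine process, so this filter may need to be installed in that process to have any effect.

```python
import warnings

# Assumption: this prefix is taken verbatim from the log line above.
# The filter treats it as a regex matched (case-insensitively) against
# the start of the warning message.
FLA_SHAPE_WARNING = "Input tensor shape suggests potential format mismatch"


def silence_fla_shape_warning() -> None:
    """Ignore only this specific UserWarning; all other warnings still show."""
    warnings.filterwarnings(
        "ignore",
        message=FLA_SHAPE_WARNING,
        category=UserWarning,
    )
```

Call `silence_fla_shape_warning()` before the code that triggers the warning runs. Matching on the message keeps unrelated `UserWarning`s visible, which matters if you later need to diagnose incorrect outputs.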

Would you like more detail on why this warning appears or how to suppress it?

