Warning while serving Qwen/Qwen3.6-35B-A3B-FP8

RunLLM · April 20, 2026, 6:13am

The warning "(EngineCore pid=77) /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fla/ops/utils.py:113: UserWarning: Input tensor shape suggests potential format mismatch: seq_len (16) < num_heads (32)..." is expected and is not an error. It is emitted whenever the sequence length of an input is smaller than the number of attention heads (here 16 < 32), which happens during normal vLLM inference, especially with chunked prefill or small input batches. It does not indicate a malfunction, and you can safely ignore it unless you also see incorrect outputs or crashes. See the discussion in vllm-project/vllm#37103.

Would you like more detail on why this warning appears or how to suppress it?
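On suppression, here is a minimal sketch using only Python's standard `warnings` module. It is not a vLLM-specific option. The helper names `silence_fla_shape_warning` and `pythonwarnings_value` are hypothetical, and the message prefix is copied from the log line quoted above. Because the warning is raised inside the EngineCore process, an in-process filter in the launching script may not reach it. Setting the `PYTHONWARNINGS` environment variable before starting the server is the variant more likely to carry over, assuming the engine process inherits the parent's environment.

```python
import warnings

# Prefix of the warning text quoted in the log above. Warning filters match
# from the start of the message, so the truncated tail is not needed.
FLA_SHAPE_WARNING = "Input tensor shape suggests potential format mismatch"


def silence_fla_shape_warning() -> None:
    """Ignore only this UserWarning in the current process (hypothetical helper)."""
    warnings.filterwarnings(
        "ignore",
        message=FLA_SHAPE_WARNING,
        category=UserWarning,
    )


def pythonwarnings_value() -> str:
    """Build the equivalent PYTHONWARNINGS value (action:message:category)."""
    return f"ignore:{FLA_SHAPE_WARNING}:UserWarning"


if __name__ == "__main__":
    silence_fla_shape_warning()
    # Export this value as PYTHONWARNINGS before launching the server so that
    # child processes that inherit the environment apply the same filter.
    print(pythonwarnings_value())
```

The filter is deliberately narrow: it matches one message prefix and one category, so other warnings from the same module stay visible.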

Sources:

vllm-project/vllm#37103

Hint: Mention @RunLLM in the post for followups.
