Getting '502 Bad Gateway' when sending a long prompt with Qwen3-Next and 4x4090

My machine:

  • CPU: 2 x AMD EPYC 7502 32-Core
  • GPU: 4 x NVIDIA GeForce RTX 4090 (modified 48 GB VRAM edition)
  • Memory: 256GB

Model: Qwen/Qwen3-Next-80B-A3B-Instruct BF16

Launch command: `vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --port 3000 --max-model-len 20000`

(vLLM is the latest version, Python is 3.11, and there is no nginx in front of the server)


I’m trying to run a long prompt and it errors out. The system instruction is about 4,000 tokens and the input is about 3,000 tokens, so together they are well under 10k tokens.

What’s interesting is that when I send this request to the server, I see no output in the terminal at all, nothing like a GET or a 200 OK. If I use a shorter prompt, the request magically goes through without any trouble.
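For context, `vllm serve` exposes an OpenAI-compatible API, so the failing request looks roughly like this (a sketch only; the real system prompt and input are omitted):

```bash
# Sketch of the failing request (placeholders instead of the real prompts);
# the server exposes the OpenAI-compatible /v1/chat/completions endpoint.
curl -v http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
    "messages": [
      {"role": "system", "content": "<~4,000-token system instruction>"},
      {"role": "user", "content": "<~3,000-token input>"}
    ]
  }'
```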

As you can see, I have already set `--max-model-len` to 20,000, but I still get this error and I don’t know how to diagnose it.

Your issue is likely GPU memory exhaustion when handling long prompts, even though `--max-model-len` is set to 20,000. Large models like Qwen3-Next-80B-A3B-Instruct require significant VRAM at long context lengths, and if the prompt is too long, vLLM may hang or fail silently without logging, especially if the GPU runs out of memory during KV cache allocation or prefill. This is known behavior; reducing `--max-model-len`, `--max-num-batched-tokens`, or `--max-num-seqs` can help the workload fit within available memory. Also make sure you are not hitting the model’s internal context limit as defined in its config files, and try enabling more verbose logging for debugging. See this issue and this memory guide for details.
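For example, a more conservative launch might look like this (the exact numbers are illustrative and should be tuned to the VRAM you actually have free):

```bash
# Conservative launch sketch: a smaller context window and batch budget leave
# more VRAM headroom for KV cache allocation and prefill.
vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct \
  --tensor-parallel-size 4 \
  --port 3000 \
  --max-model-len 10000 \
  --max-num-batched-tokens 4096 \
  --max-num-seqs 16
```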

To diagnose, try lowering `--max-model-len` (e.g., 8192 or 10000), setting `--max-num-batched-tokens` to a lower value, and exporting `VLLM_LOGGING_LEVEL=DEBUG` for more output. If the problem persists, monitor GPU memory usage with `nvidia-smi` during inference. If you want step-by-step troubleshooting or code examples, let me know.
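A minimal debugging setup, using only the knobs mentioned above, could look like this:

```bash
# Enable verbose vLLM logging before launching the server.
export VLLM_LOGGING_LEVEL=DEBUG
vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct \
  --tensor-parallel-size 4 --port 3000 --max-model-len 20000

# In a second terminal, watch GPU memory while the long prompt is processed.
watch -n 1 nvidia-smi
```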


Actually this got quite strange, but I did fix the issue by:

Removing the pseudo-links from my system prompt.

I never expected this, but when I simply removed the pseudo-links, magic happened and everything worked.
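For anyone hitting the same thing, here is a rough sketch of stripping markdown-style `[text](url)` links from a prompt file before sending it; the file name is just an example, and you may need to adjust the pattern to whatever your pseudo-links actually look like:

```bash
# Rough sketch: replace markdown-style [text](url) links with just their text,
# reading from system_prompt.txt (example name) and writing a cleaned copy.
sed -E 's/\[([^]]+)]\([^)]*\)/\1/g' system_prompt.txt > system_prompt_clean.txt
```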