When multiple system prompts are provided, e.g. for RAG, output quality dips, whereas if I concatenate them myself with “\n”, the output is exactly the same as the API at temperature 0.
It seems vLLM concatenates them with “\n\n” instead, which breaks the model and makes it hallucinate badly.
Has anyone noticed this behavior, or is it just my vLLM setup acting weird?
It seems likely that vLLM concatenates multiple system prompts with a double newline (“\n\n”), which may differ from the official DashScope API’s behavior and can noticeably degrade output quality, especially for Qwen3-VL models. Concatenating the system prompts with a single newline (“\n”) yourself appears to produce outputs consistent with the official API, particularly at temperature 0.

This behavior is not explicitly documented in vLLM’s guides, but similar prompt-formatting nuances have come up in the community: how system prompts are joined can affect output quality and hallucination rates in vLLM, especially in RAG setups that send multiple system messages. My understanding is that this is not unique to your setup, and that manually formatting the prompt is the most reliable workaround with vLLM and Qwen3-VL models.
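One way to sidestep the issue regardless of what the server does is to merge the system prompts client-side and send a single system message. Here is a minimal sketch against a vLLM OpenAI-compatible endpoint; the base URL, model name, and prompt strings are placeholders for whatever you are actually serving:

```python
# Sketch: merge multiple system prompts with a single "\n" on the client,
# so the server receives exactly one system message and never has to join them.
# Assumes a vLLM OpenAI-compatible server at http://localhost:8000/v1
# serving a Qwen3-VL checkpoint (model name below is a placeholder).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

system_prompts = [
    "You are a helpful assistant.",
    "Answer only using the provided context.",  # e.g. RAG grounding instructions
]

messages = [
    # Join with "\n" ourselves so the model sees one system block,
    # matching what the official API output suggests it expects.
    {"role": "system", "content": "\n".join(system_prompts)},
    {"role": "user", "content": "What does the context say about X?"},
]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",  # placeholder model name
    messages=messages,
    temperature=0,
)
print(resp.choices[0].message.content)
```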
Would you like more detail on how to manually format prompts or on vLLM’s prompt handling logic for system messages?
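In the meantime, a quick way to check what separator actually ends up between your system prompts is to render the model’s chat template locally with transformers. vLLM uses the Hugging Face chat template by default unless you override it with --chat-template, though this local render won’t capture any message merging vLLM does before templating; the model name below is again a placeholder:

```python
# Sketch: render the chat template for two consecutive system messages
# and inspect what separator (if any) appears between them.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")  # placeholder

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Answer only using the provided context."},
    {"role": "user", "content": "Hello"},
]

rendered = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(repr(rendered))  # check the characters between the two system strings
```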