When multiple system prompts are provided, e.g. for RAG, output quality dips, whereas if I concatenate them myself with “\n”, the output is exactly the same as the API at temperature 0.
It seems vLLM concatenates them with “\n\n” instead, which breaks the model and makes it hallucinate badly.
Has anyone noticed this behavior, or is it just my vLLM setup acting weird?
It seems likely that vLLM concatenates multiple system prompts with a double newline (“\n\n”), which may differ from the official DashScope API’s behavior and can noticeably degrade output quality, especially for Qwen3-VL models. Concatenating the system prompts with a single newline (“\n”) yourself appears to produce outputs consistent with the official API, particularly at temperature 0.

This behavior is not explicitly documented in vLLM’s guides, but similar prompt-formatting nuances have come up in the community: how system prompts are joined can affect output quality and hallucination rates in vLLM, especially in RAG setups that send multiple system messages. My understanding is that this is not unique to your setup, and that manually formatting the prompt is the most reliable workaround with vLLM and Qwen3-VL models.
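One way to sidestep the issue regardless of what the server does is to merge the system prompts client-side and send a single system message. Here is a minimal sketch against a vLLM OpenAI-compatible endpoint; the base URL, model name, and prompt strings are placeholders for whatever you are actually serving:

```python
# Sketch: merge multiple system prompts with a single "\n" on the client,
# so the server receives exactly one system message and never has to join them.
# Assumes a vLLM OpenAI-compatible server at http://localhost:8000/v1
# serving a Qwen3-VL checkpoint (model name below is a placeholder).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

system_prompts = [
    "You are a helpful assistant.",
    "Answer only using the provided context.",  # e.g. RAG grounding instructions
]

messages = [
    # Join with "\n" ourselves so the model sees one system block,
    # matching what the official API output suggests it expects.
    {"role": "system", "content": "\n".join(system_prompts)},
    {"role": "user", "content": "What does the context say about X?"},
]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",  # placeholder model name
    messages=messages,
    temperature=0,
)
print(resp.choices[0].message.content)
```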
Would you like more detail on how to manually format prompts or on vLLM’s prompt handling logic for system messages?
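In the meantime, a quick way to check what separator actually ends up between your system prompts is to render the model’s chat template locally with transformers. vLLM uses the Hugging Face chat template by default unless you override it with --chat-template, though this local render won’t capture any message merging vLLM does before templating; the model name below is again a placeholder:

```python
# Sketch: render the chat template for two consecutive system messages
# and inspect what separator (if any) appears between them.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")  # placeholder

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Answer only using the provided context."},
    {"role": "user", "content": "Hello"},
]

rendered = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(repr(rendered))  # check the characters between the two system strings
```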