Hi,
Why could this appear? I tried to run multiple prompts at the same time.
Is this an application-side error?
ValueError: Conversation roles must alternate user/assistant/user/assistant/
This error occurs because vLLM enforces strict alternation between user and assistant roles in the conversation history: after an optional system message, the roles must alternate as user/assistant/user/assistant, etc. If you send two user or two assistant messages in a row, vLLM will raise a ValueError with this message. This is not an application bug, but a requirement of vLLM’s chat template logic for most instruction-tuned models, and is enforced both in offline and OpenAI-compatible serving modes. Some models (like Mistral) are especially strict about this alternation, while others (like Llama) may be more permissive, but vLLM’s default behavior is to require alternation for all models unless a custom chat template is provided or the model’s template allows otherwise.
To resolve this, ensure your message list alternates roles correctly, for example: [system, user, assistant, user, assistant, ...]. If you need more flexible role ordering, vLLM currently does not support it out of the box, but you can override the chat template with the --chat-template CLI option to relax this restriction, though model performance may degrade if the model was not trained for such input patterns. See further details in the linked issues and documentation for examples and workarounds. Would you like more detail or code examples?
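As a minimal illustration, here is a correctly alternating messages list, together with a small checker that mirrors the rule vLLM's chat templates enforce (the check_alternation helper is my own sketch for clarity, not vLLM code):

```python
# Sketch of the alternation rule: after an optional system message,
# roles must go user/assistant/user/assistant. The helper below is
# illustrative only; vLLM performs this check inside its chat logic.

def check_alternation(messages):
    """Return True if roles alternate user/assistant after an optional system message."""
    roles = [m["role"] for m in messages]
    if roles and roles[0] == "system":
        roles = roles[1:]
    expected = ["user", "assistant"]
    return all(r == expected[i % 2] for i, r in enumerate(roles))

ok = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is vLLM?"},
    {"role": "assistant", "content": "A fast LLM inference engine."},
    {"role": "user", "content": "How do I batch prompts?"},
]
bad = ok + [{"role": "user", "content": "Second user message in a row."}]

print(check_alternation(ok))   # True
print(check_alternation(bad))  # False -> vLLM would raise ValueError here
```

Sending the bad list above is exactly the situation that triggers the ValueError in the original question.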
@RunLLM please explain how to send multiple prompts using the LLM interface and the vllm serve interface. The user is trying to send multiple prompts but is inadvertently sending them in the same messages list.
To send multiple prompts with the LLM interface, pass a list of prompt strings to llm.generate(prompts=[...]) for batch inference. Each prompt should be a separate string in the list, not combined in a single conversation/messages list; this enables vLLM to process them in parallel and return outputs in the same order as the input prompts. For chat-style models, use llm.chat() with one conversation per call, not multiple conversations in a single messages list. See the API docs.
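A minimal batch-inference sketch with the offline LLM interface (the model name is only an example, and the actual call needs a GPU and a downloaded model, so it is left commented; the key point is the input shape, a list of separate strings):

```python
# Batch inference sketch: each prompt is its own string in a list,
# NOT turns of a single conversation.
prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Write a haiku about GPUs.",
    "Explain KV caching briefly.",
]

def run_batch(prompts, model="facebook/opt-125m"):  # model name is an example
    from vllm import LLM, SamplingParams
    llm = LLM(model=model)
    params = SamplingParams(max_tokens=64)
    outputs = llm.generate(prompts, params)  # prompts are processed in parallel
    # outputs come back in the same order as the input prompts
    return [o.outputs[0].text for o in outputs]

# texts = run_batch(prompts)  # uncomment on a machine with a GPU and vLLM installed
```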
With the vllm serve (OpenAI-compatible) interface, the Completions API supports batched prompts by sending a JSON payload with a “prompt” field as a list of strings. However, the Chat Completions API does not support batching: you must send one conversation (messages list) per request. If you try to send multiple conversations in a single messages list, you’ll get an error. See vLLM Issue #16965 and Quickstart docs.
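To make the Completions-API batching concrete, here is a sketch of the request payload; the server URL and model name are placeholders for your deployment, and the actual HTTP call is left commented since it needs a running server:

```python
# Batched /v1/completions payload sketch: the "prompt" field may be a
# list of strings. /v1/chat/completions has no equivalent -- one
# conversation per request there.
import json

payload = {
    "model": "my-model",          # placeholder: the model your server loaded
    "prompt": [
        "Translate 'hello' to French:",
        "Translate 'hello' to Spanish:",
    ],
    "max_tokens": 32,
}
body = json.dumps(payload)

# With a server running you could post it, e.g.:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/completions",  # placeholder URL
#     data=body.encode(), headers={"Content-Type": "application/json"})
# resp = json.load(urllib.request.urlopen(req))
```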
Would you like a code example for either interface?
Explain how to send multiple chat-style prompts using LLM and how to send parallel requests using the OpenAI API
To send multiple chat-style prompts using the LLM interface, call llm.chat() with a list of conversations, where each conversation is a list of messages (role/content dicts). For example, llm.chat([conversation1, conversation2, ...]) will process each conversation as a separate prompt and return a list of outputs in the same order. See the LLM.chat API docs.
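A sketch of that pattern (the model name is an example and the call is left commented, since llm.chat needs a GPU and a loaded model); note each conversation is its own list of role/content dicts:

```python
# Two independent conversations -- each is a separate list, never
# concatenated into one messages list.
conversation1 = [
    {"role": "user", "content": "Name one prime number."},
]
conversation2 = [
    {"role": "system", "content": "Answer in French."},
    {"role": "user", "content": "What color is the sky?"},
]

def run_chats(conversations, model="facebook/opt-125m"):  # model is an example
    from vllm import LLM
    llm = LLM(model=model)
    outputs = llm.chat(conversations)  # one output per conversation, same order
    return [o.outputs[0].text for o in outputs]

# texts = run_chats([conversation1, conversation2])  # needs a GPU + vLLM
```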
For the OpenAI-compatible API (vllm serve), the /v1/chat/completions endpoint only supports one conversation per request. To send multiple chat prompts in parallel, you must send multiple HTTP requests concurrently (e.g., using threads or asyncio). Each request should contain a single messages list. See vLLM Issue #16965 and the Quickstart docs.
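A sketch of the asyncio approach using the openai client (base_url, api_key, and model name are placeholders for a local vllm serve deployment; the final asyncio.run call is left commented since it needs a live server). Each request carries exactly one messages list:

```python
# Parallel chat requests sketch: N conversations -> N concurrent
# HTTP requests, each with a single messages list.
import asyncio

async def ask(client, model, messages):
    resp = await client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

async def run_parallel(conversations, base_url="http://localhost:8000/v1"):
    from openai import AsyncOpenAI
    client = AsyncOpenAI(base_url=base_url, api_key="EMPTY")  # placeholders
    tasks = [ask(client, "my-model", msgs) for msgs in conversations]
    return await asyncio.gather(*tasks)  # results arrive in input order

conversations = [
    [{"role": "user", "content": "Say hi."}],
    [{"role": "user", "content": "Say bye."}],
]
# answers = asyncio.run(run_parallel(conversations))  # needs a running server
```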
Would you like code examples for either method?