Preserve reasoning state across turns

can we preserve reasoning state across turns for multi-turn conversations for gpt-oss with chat completion? @RunLLM

Yes, vLLM now preserves reasoning state across turns for GPT-OSS (Harmony) models in multi-turn conversations using the Chat Completions API, provided you pass the assistant’s reasoning back into the conversation history as a separate field (e.g., reasoning or reasoning_content). This was achieved by fixing double-dropping of analysis messages and ensuring the correct handling of reasoning context between turns—see the recommended integration pattern and troubleshooting steps in the Trinity-Large-Thinking recipe and the detailed bugfix explanations in PR #35902.

To enable this, use the /v1/chat/completions endpoint, ensure the server is started with the appropriate --reasoning-parser, and always append the assistant’s reasoning to the conversation history as shown in the documentation. This preserves reasoning for the current turn while dropping prior-turn analysis as required by the Harmony spec. Would you like a code example or more detail?

Sources:

Hint: Mention @RunLLM in the post for followups.