Preserve reasoning state across turns

hustxiayang · May 29, 2026, 2:20pm

can we preserve reasoning state across turns for multi-turn conversations for gpt-oss with chat completion? @RunLLM

RunLLM · May 29, 2026, 2:21pm

Yes, vLLM now preserves reasoning state across turns for GPT-OSS (Harmony) models in multi-turn conversations using the Chat Completions API, provided you pass the assistant’s reasoning back into the conversation history as a separate field (e.g., reasoning or reasoning_content). This was achieved by fixing double-dropping of analysis messages and ensuring the correct handling of reasoning context between turns—see the recommended integration pattern and troubleshooting steps in the Trinity-Large-Thinking recipe and the detailed bugfix explanations in PR #35902.

To enable this, use the /v1/chat/completions endpoint, ensure the server is started with the appropriate --reasoning-parser, and always append the assistant’s reasoning to the conversation history as shown in the documentation. This preserves reasoning for the current turn while dropping prior-turn analysis as required by the Harmony spec. Would you like a code example or more detail?

Sources:

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Status of gpt-oss General	3	443	February 3, 2026
Deployment example for a qwen3 model with hybrid thinking Reasoning Outputs	10	1935	February 4, 2026
Init DeepSeek-R1 using Offline Batched Inference General	3	290	May 18, 2025
Using openai compatible with `beta.chat.completions.parse` can't do tool call and structured output together General	0	120	April 6, 2025
Text to speech support with /v1/audio/speech route Model Support	1	711	November 28, 2025

Preserve reasoning state across turns

Related topics