@RunLLM are there any improvements on the support for gpt-oss models since 0.11.0 until now? for example, whether it can support response_format now
Yes, GPT-OSS support has improved since vLLM 0.11.0, including better handling of tool calling, the Harmony format, and structured outputs. However, there are still known issues with `response_format: {"type": "json_object"}` in multi-turn chat completions for GPT-OSS models: this mode can return `content: null` or invalid JSON, making it unsafe for production in multi-turn scenarios. Single-turn requests and non-JSON modes work as intended, and some users report that downgrading to vLLM 0.10.1 avoids the multi-turn JSON bug. The `/v1/responses` endpoint is recommended for tool use and structured output, as it leverages the Harmony format and is more robust for GPT-OSS models. Ongoing work and recent PRs continue to address these issues, but as of vLLM 0.14.0 the multi-turn JSON bug remains unresolved for chat completions with GPT-OSS models, according to user reports and open issues.
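Given the `content: null` / invalid-JSON failure mode described above, a defensive check on the client side is a reasonable stopgap. This is a minimal sketch (the helper name `safe_json_content` is illustrative, not part of vLLM or the OpenAI client):

```python
import json


def safe_json_content(message_content):
    """Parse the assistant message from a json_object-mode chat completion.

    Returns the parsed JSON object, or None when the model returned
    null or invalid JSON (the known GPT-OSS multi-turn bug), so callers
    can retry or fall back instead of crashing downstream.
    """
    if message_content is None:
        return None
    try:
        parsed = json.loads(message_content)
    except (json.JSONDecodeError, TypeError):
        return None
    # json_object mode promises an object, so reject bare arrays/scalars.
    return parsed if isinstance(parsed, dict) else None
```

Calling this on `response.choices[0].message.content` before using the result lets multi-turn code degrade gracefully when the bug triggers.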
Would you like more detail on specific improvements or workarounds?
Hint: Mention @RunLLM in the post for followups.
@RunLLM give me details on what is improved?
Key improvements for GPT-OSS support in vLLM since 0.11.0 include:
- Harmony Format & Tool Calling: Major bugfixes and enhancements were made to Harmony format parsing and tool calling, especially for multi-turn and streaming scenarios. The Chat Completion API and tool call parser were improved to better support GPT-OSS's built-in tools and reasoning channels. Streaming tool call handling was also fixed to activate the correct Harmony channels, reducing tool call extraction errors (PR #24768).
- Structured Outputs & Response Format: Structured output support (e.g., JSON, regex, grammar) was expanded and stabilized, including validation and backend improvements (structured outputs docs). However, as noted, multi-turn chat completions with `response_format: {"type": "json_object"}` still have unresolved bugs; single-turn requests and the `/v1/responses` endpoint are more robust (issue #32791, structured outputs example).
- MCP Tool Integration: vLLM now supports integration with arbitrary MCP servers, allowing GPT-OSS to call custom tools beyond the built-in ones (issue #26703).
- General Stability & Parsing: Numerous bugfixes addressed Harmony parsing errors, tool call leakage, and improved multi-turn conversation handling (PR #28729, issue #23567).
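As a hedged sketch of the two request routes contrasted above: the field names follow the OpenAI-compatible API that vLLM serves, but the helper names and the model id `openai/gpt-oss-20b` are illustrative assumptions, not vLLM APIs.

```python
def chat_json_request(model: str, user_msg: str) -> dict:
    """Chat Completions body using json_object mode.

    Single-turn is the safe case for GPT-OSS; multi-turn with this
    response_format hits the known null/invalid-JSON bug.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "response_format": {"type": "json_object"},
    }


def responses_api_request(model: str, user_msg: str) -> dict:
    """Body for the /v1/responses route, recommended for GPT-OSS
    because it speaks the Harmony format natively."""
    return {"model": model, "input": user_msg}
```

Either dict can be POSTed to the corresponding endpoint of a running vLLM server (e.g. with `openai.OpenAI(base_url="http://localhost:8000/v1")` or plain `requests`).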
Let me know if you want step-by-step details or code examples for any of these improvements.
Sources:
- https://github.com/vllm-project/vllm/pull/22342
- https://github.com/vllm-project/vllm/pull/24768
- https://github.com/vllm-project/vllm/pull/28729
- Bug: chat.completions returns content: null for GPT-OSS multi-turn with json_object
- Structured Outputs - vLLM Docs
- https://github.com/vllm-project/vllm/issues/26703
- Bug: openai_harmony.HarmonyError: unexpected tokens remaining in message header