Status of gpt-oss

@RunLLM are there any improvements to gpt-oss model support between 0.11.0 and now? For example, does it support response_format yet?

Yes, GPT-OSS support has improved since vLLM 0.11.0, including better handling of tool calling, the Harmony format, and structured outputs. However, response_format: {"type": "json_object"} still has known issues in multi-turn chat completions with GPT-OSS models: that mode can return content: null or invalid JSON, making it unsafe for production in multi-turn scenarios. Single-turn requests and non-JSON modes work as intended, and some users report that downgrading to vLLM 0.10.1 works around the multi-turn JSON bug. The /v1/responses endpoint is recommended for tool use and structured output, since it leverages the Harmony format and is more robust for GPT-OSS models. Ongoing work and recent PRs continue to address these issues, but as of vLLM 0.14.0 the multi-turn JSON bug remains unresolved for chat completions with GPT-OSS models, according to user reports and open issues[^1][^2][^3].
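Given that failure mode, it is worth validating the model's reply on the client side before trusting it. Here is a minimal sketch of such a guard; the helper name is mine, not part of any vLLM or OpenAI client API:

```python
import json

def safe_json_content(message_content):
    """Return the parsed JSON object from a chat-completion message,
    or None if the model returned null or invalid JSON (the reported
    multi-turn json_object failure mode for gpt-oss)."""
    if message_content is None:
        return None  # content: null case
    try:
        parsed = json.loads(message_content)
    except json.JSONDecodeError:
        return None  # invalid-JSON case
    # json_object mode should yield an object, not a bare scalar/array
    return parsed if isinstance(parsed, dict) else None
```

If this returns None, a caller could retry the request as a single-turn prompt or route it to /v1/responses instead.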

Would you like more detail on specific improvements or workarounds?

Hint: Mention @RunLLM in the post for followups.

@RunLLM give me details on what was improved.

Key improvements for GPT-OSS support in vLLM since 0.11.0 include:

  1. Harmony Format & Tool Calling: Major bugfixes and enhancements were made to Harmony format parsing and tool calling, especially for multi-turn and streaming scenarios. The Chat Completion API and tool call parser were improved to better support GPT-OSS’s built-in tools and reasoning channels. Streaming tool call handling was also fixed to activate the correct Harmony channels, reducing tool call extraction errors (PR #24768).

  2. Structured Outputs & Response Format: Structured output support (e.g., JSON, regex, grammar) was expanded and stabilized, including validation and backend improvements (structured outputs docs). However, as noted, multi-turn chat completions with response_format: {"type": "json_object"} still have unresolved bugs, but single-turn and /v1/responses endpoint usage are more robust (issue #32791, structured outputs example).

  3. MCP Tool Integration: vLLM now supports integration with arbitrary MCP servers, allowing GPT-OSS to call custom tools beyond built-in ones (issue #26703).

  4. General Stability & Parsing: Numerous bugfixes addressed Harmony parsing errors, tool call leakage, and improved multi-turn conversation handling (PR #28729, issue #23567).
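To make the structured-output point concrete, here is a sketch of a single-turn chat-completions request body using json_object mode, which is the configuration reported to work reliably. The model name and base URL are illustrative assumptions for a local vLLM deployment, not values from this thread:

```python
import json

# Assumed local vLLM deployment; adjust for your setup.
BASE_URL = "http://localhost:8000/v1"
MODEL = "openai/gpt-oss-20b"

def build_single_turn_json_request(user_prompt: str) -> dict:
    """Build a single-turn chat-completions body with json_object mode.
    Keeping the history to one user message avoids the multi-turn
    response_format bug reported for gpt-oss."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_prompt}],
        "response_format": {"type": "json_object"},
    }

payload = build_single_turn_json_request("List three colors as a JSON object.")
print(json.dumps(payload, indent=2))
```

The same payload can be POSTed to BASE_URL + "/chat/completions" with any HTTP client; for multi-turn tool use, the /v1/responses endpoint remains the safer route.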

Let me know if you want step-by-step details or code examples for any of these improvements.
