HarmonyError: Unexpected token 200002 while expecting start token 200006

I keep getting this error when running a benchmark with vllm bench serve:

(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] Error in chat completion stream generator.
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] Traceback (most recent call last):
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 741, in chat_completion_stream_generator
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] harmony_parser.process(token_id)
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] File "/usr/local/lib/python3.12/dist-packages/openai_harmony/__init__.py", line 637, in process
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] self._inner.process(token)
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] openai_harmony.HarmonyError: Unexpected token 200002 while expecting start token 200006

This error (openai_harmony.HarmonyError: Unexpected token 200002 while expecting start token 200006) is a known issue when serving GPT-OSS models with vLLM and the Harmony parser. It is typically caused by a chat-template mismatch or a bug in the model output, especially after fine-tuning or when using certain templates (e.g., LlamaFactory's GPT template with efficient_eos=True): the model emits a token sequence that violates Harmony's message framing, and the parser's decoding logic rejects it with this error.

Possible fixes:

- Update the chat template, or disable efficient_eos if you fine-tuned with LlamaFactory.
- Make sure you are using the latest model files and generation config from Hugging Face, as fixes for this problem have been made upstream.
- For tool-calling scenarios, some users avoid the error by switching from /v1/chat/completions to the /v1/responses endpoint.
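For context, per the published Harmony format documentation, token 200006 is the <|start|> marker and 200002 is <|return|> in the o200k_harmony encoding. The parser expects each message to begin with <|start|>, so a <|return|> arriving in that position means the generated stream is out of sync with Harmony's framing. A toy sketch of the check (not the real openai_harmony parser, which is implemented in Rust):

```python
# Toy illustration of Harmony's message-framing check -- NOT the real
# openai_harmony parser. Token ids are from the published o200k_harmony
# special-token table.
START = 200006   # <|start|>
RETURN = 200002  # <|return|>

def process(token_id: int, expecting_start: bool = True) -> None:
    """Reject any token that arrives where a message header should begin."""
    if expecting_start and token_id != START:
        raise ValueError(
            f"Unexpected token {token_id} while expecting start token {START}"
        )

process(START)  # a well-framed message begins with <|start|>

try:
    process(RETURN)  # a stray <|return|> reproduces the error shape
except ValueError as e:
    print(e)  # Unexpected token 200002 while expecting start token 200006
```

A broken chat template (or a fine-tune that learned a different EOS convention) produces exactly this kind of mis-framed stream, which is why template and generation-config fixes resolve it.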

For more details and troubleshooting steps, see this GitHub issue and the official troubleshooting guide. Would you like a step-by-step breakdown of how to fix this in your setup?


Hint: Mention @RunLLM in the post for followups.