HarmonyError: Unexpected token 200002 while expecting start token 200006

I keep getting this error when running a benchmark with vllm bench serve:

(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] Error in chat completion stream generator.
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] Traceback (most recent call last):
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 741, in chat_completion_stream_generator
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] harmony_parser.process(token_id)
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] File "/usr/local/lib/python3.12/dist-packages/openai_harmony/__init__.py", line 637, in process
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] self._inner.process(token)
(APIServer pid=1) ERROR 01-14 07:41:25 [serving_chat.py:1305] openai_harmony.HarmonyError: Unexpected token 200002 while expecting start token 200006

This error (openai_harmony.HarmonyError: Unexpected token 200002 while expecting start token 200006) is a known issue when serving GPT-OSS models with vLLM and the Harmony parser. It is typically caused by a chat-template mismatch or a bug in the model output, especially after fine-tuning or when using certain templates (e.g., LlamaFactory's GPT template with efficient_eos=True): the model emits a token sequence that violates Harmony's message framing, and the parser's decoding logic rejects it with this error.

Possible fixes:

- Update the chat template, or disable efficient_eos if you fine-tuned with LlamaFactory.
- Make sure you are using the latest model files and generation config from Hugging Face, as fixes for this problem have been made upstream.
- For tool-calling scenarios, some users avoid the error by switching from /v1/chat/completions to the /v1/responses endpoint.
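For context, per the published Harmony format documentation, token 200006 is the <|start|> marker and 200002 is <|return|> in the o200k_harmony encoding. The parser expects each message to begin with <|start|>, so a <|return|> arriving in that position means the generated stream is out of sync with Harmony's framing. A toy sketch of the check (not the real openai_harmony parser, which is implemented in Rust):

```python
# Toy illustration of Harmony's message-framing check -- NOT the real
# openai_harmony parser. Token ids are from the published o200k_harmony
# special-token table.
START = 200006   # <|start|>
RETURN = 200002  # <|return|>

def process(token_id: int, expecting_start: bool = True) -> None:
    """Reject any token that arrives where a message header should begin."""
    if expecting_start and token_id != START:
        raise ValueError(
            f"Unexpected token {token_id} while expecting start token {START}"
        )

process(START)  # a well-framed message begins with <|start|>

try:
    process(RETURN)  # a stray <|return|> reproduces the error shape
except ValueError as e:
    print(e)  # Unexpected token 200002 while expecting start token 200006
```

A broken chat template (or a fine-tune that learned a different EOS convention) produces exactly this kind of mis-framed stream, which is why template and generation-config fixes resolve it.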

For more details and troubleshooting steps, see this GitHub issue and the official troubleshooting guide. Would you like a step-by-step breakdown of how to fix this in your setup?


Hint: Mention @RunLLM in the post for followups.