The error `openai_harmony.HarmonyError: Unexpected token 200002 while expecting start token 200006` is a known issue when serving GPT-OSS models with vLLM and the Harmony parser. It means the model emitted a token sequence that does not follow the Harmony response format, so the decoder rejects it. The usual triggers are a mismatched or buggy chat template, or a fine-tuned checkpoint whose output diverges from the expected format (for example, models trained with LlamaFactory's GPT template when `efficient_eos=True`).

Fixes that have worked for users:

- Update the chat template, or disable `efficient_eos` when fine-tuning with LlamaFactory.
- Pull the latest model files and generation config from Hugging Face; fixes for this problem have been made upstream.
- For tool-calling scenarios, switch from the `/v1/chat/completions` endpoint to `/v1/responses`, which avoids the error for some users.
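To illustrate the endpoint switch, here is a minimal sketch of the two request payloads. The server URL and model name are placeholders (assumptions), and the field names follow the OpenAI-style APIs that vLLM exposes: `messages` for Chat Completions versus `input` for the Responses API.

```python
import json

BASE_URL = "http://localhost:8000"   # assumed local vLLM server address
MODEL = "openai/gpt-oss-20b"         # assumed model name

# /v1/chat/completions payload: the endpoint where the Harmony error
# is reported when the model emits a malformed token sequence.
chat_payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
    ],
}

# /v1/responses payload: reported to avoid the error for tool calling.
# The Responses API takes `input` instead of `messages`.
responses_payload = {
    "model": MODEL,
    "input": "What is the weather in Paris?",
}

print(json.dumps(responses_payload))
```

You would POST these bodies to `{BASE_URL}/v1/chat/completions` and `{BASE_URL}/v1/responses` respectively; only the endpoint and the message/input shape change, not the model being served.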