Why is there no effect after changing chat template file?

I changed the chat_template.jinja file in ~/.cache/huggingface/hub/some-model and expected it would add <think>\n\n</think>\n\n automatically (the model I use was in the same directory), but the change had no effect. I ended up writing a separate chat template file and passing it with --chat-template xxxx to achieve this goal.

I’d appreciate any help.

Modifying the chat_template.jinja in the Hugging Face cache directory does not affect vLLM’s behavior, because vLLM does not automatically reload or use modified files from the cache. To use a custom chat template, you must explicitly specify it with the --chat-template argument when starting vLLM, as you discovered. This is the correct and only supported way to override the chat template in vLLM according to the documentation and user reports. Direct edits to the cache are ignored by vLLM’s loading logic, which expects either a template in the tokenizer config or a path/string provided via --chat-template at launch time. See the official documentation for details on chat template usage and overriding: vLLM OpenAI-Compatible Server: Chat Template.
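For reference, a minimal sketch of the launch-time override the answer describes. The model name and template path below are placeholders, not from the thread, and the template body is purely illustrative:

```shell
# Hypothetical custom template file; the real template for your model
# will be longer -- this only shows where a fixed prefix could be injected.
cat > my_template.jinja <<'EOF'
{%- for message in messages -%}
{{ message['role'] }}: {{ message['content'] }}
{% endfor -%}
assistant: <think>

</think>

EOF

# Pass the template explicitly at startup; this is the supported override path.
vllm serve some-org/some-model --chat-template ./my_template.jinja
```

The key point is that the override is wired in at launch via the flag, rather than discovered by scanning the HF cache.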

Would you like more detail on how vLLM loads and applies chat templates?




Thank you so much, that was fast. Are you a real person?

Even if I restart the serve CLI, it won’t reload the chat template file?

RunLLM is a bot.

Restarting should reload the file. But you have to make sure HF Transformers isn’t overwriting it.
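One common way an edit in the cache gets clobbered is the server re-resolving the model from the Hub on startup and refreshing the cached snapshot. A hedged sketch of guarding against that (HF_HUB_OFFLINE is a standard Hugging Face Hub environment variable; the model name is a placeholder):

```shell
# Force offline mode so the cached (edited) files are used verbatim
# and cannot be re-downloaded over your changes on restart.
export HF_HUB_OFFLINE=1
vllm serve some-org/some-model
```

If the edit still does not take effect with offline mode on, the --chat-template flag remains the reliable override.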
