I changed the chat_template.jinja file in ~/.cache/huggingface/hub/some-model and expected it to add <think>\n\n</think>\n\n automatically (the model I use is in that same directory), but the edit had no effect. I ended up writing a separate chat template file and passing it with --chat-template xxxx to achieve this goal.
Modifying chat_template.jinja in the Hugging Face cache directory does not affect vLLM’s behavior, because vLLM does not pick up ad-hoc edits to cached files. To use a custom chat template, you must explicitly pass it with the --chat-template argument when starting the server, as you discovered; this is the supported way to override the chat template in vLLM. Direct edits to the cache are ignored by vLLM’s loading logic, which expects either a template embedded in the tokenizer config or a path/string supplied via --chat-template at launch time. See the official documentation for details on chat template usage and overriding: vLLM OpenAI-Compatible Server: Chat Template.
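To illustrate what such an override template does, here is a minimal sketch that renders a chat template with Jinja2 directly. The template string and role markers are hypothetical, not taken from any particular model; the point is only that the generation prompt can inject the empty <think> block the question asks about:

```python
from jinja2 import Template

# Hypothetical minimal chat template. The <|user|>/<|assistant|> markers
# are placeholders; a real model's template will use its own tokens.
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}\n"
    "{% endfor %}"
    # When the generation prompt is added, prepend empty think tags
    # so the model starts answering after a blank reasoning block.
    "{% if add_generation_prompt %}<|assistant|><think>\n\n</think>\n\n{% endif %}"
)

prompt = Template(CHAT_TEMPLATE).render(
    messages=[{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)
print(prompt)
```

Saved to a file (e.g. my_template.jinja), a template like this can then be passed at launch with `--chat-template /path/to/my_template.jinja`, which is the override mechanism described above.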
Would you like more detail on how vLLM loads and applies chat templates?