Hello there!
I've been working on a fine-tune of Mistral Small 3.2 2506 Instruct for some time now, and I just cannot get it working with vLLM. The standard HF repo mistralai/Mistral-Small-3.2-24B-Instruct-2506 runs perfectly, but mine does not, so I assume there is some tiny difference in my repo that I can't figure out.
I launch the model with these command-line arguments (taken from Mistral's docs) in the vLLM Docker container:
```
# Mistral's model:
--host 0.0.0.0 --port 8000 --model mistralai/Mistral-Small-3.2-24B-Instruct-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit-mm-per-prompt '{"image":10}'

# my custom fine-tuned model:
--host 0.0.0.0 --port 8000 --model jacob-ml/jacob-24b --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit-mm-per-prompt '{"image":10}'
```
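(Side note: in case it makes poking at this easier, I assume the same load path can be reproduced without the server via vLLM's offline `LLM` API. This is just a rough sketch mirroring the CLI flags above, not exactly what I run:)

```python
# Rough repro sketch: construct the model the same way the server would,
# mirroring the CLI flags above (assumes the same vLLM version as the docker image).
from vllm import LLM

llm = LLM(
    model="jacob-ml/jacob-24b",
    tokenizer_mode="mistral",
    config_format="mistral",
    load_format="mistral",
    limit_mm_per_prompt={"image": 10},
)
print(llm.generate("Hello!"))
```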
The default model by Mistral works fine, but mine raises this error:
```
No consolidated.safetensors.index.json found in remote.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
EngineCore failed to start.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[... some more traceback ...]
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
    loaded_weights = model.load_weights(
                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 542, in load_weights
    self.language_model.load_weights(llm_weights_generator())
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 615, in load_weights
    return loader.load_weights(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
    autoloaded_weights = set(self._load_module("", self.module, weights))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
    raise ValueError(msg)
ValueError: There is no module or parameter named 'language_model' in LlamaForCausalLM
```
Here are the two repos to compare (mistralai/Mistral-Small-3.2-24B-Instruct-2506 vs. my jacob-ml/jacob-24b); maybe someone can spot the mistake, because I haven't been able to.
The only difference I was able to find is that my consolidated.safetensors file is in a slightly different format, but that doesn't seem to match the error message I'm getting.
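In case it helps anyone compare, here's a rough sketch of how the tensor key names in the two consolidated checkpoints can be listed side by side (assuming `huggingface_hub` and `safetensors` are installed; `hf_hub_download` resolves the cached copy if the file was already downloaded, otherwise it fetches the large shard):

```python
# Rough diagnostic sketch: list the repo files and peek at the tensor key
# names inside the consolidated safetensors of both repos.
from huggingface_hub import hf_hub_download, list_repo_files
from safetensors import safe_open

for repo in ["mistralai/Mistral-Small-3.2-24B-Instruct-2506", "jacob-ml/jacob-24b"]:
    files = list_repo_files(repo)
    print(repo)
    print("  json/safetensors files:", [f for f in files if f.endswith((".json", ".safetensors"))])
    # take the first consolidated shard and print a few of its tensor names
    shard = sorted(f for f in files if "consolidated" in f and f.endswith(".safetensors"))[0]
    path = hf_hub_download(repo, shard)
    with safe_open(path, framework="pt") as st:
        print("  sample tensor keys:", sorted(st.keys())[:5])
```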
Why is this happening? How can I tell vLLM that I’m not trying to use LlamaForCausalLM? Does anybody have an idea?
Any help is GREATLY appreciated! I’ve been trying to solve this for many hours now…
Thanks so much in advance!