Mistral Small 3.2 finetune errors out: There is no module or parameter named 'language_model' in LlamaForCausalLM

Hello there!

I've been working on a fine-tune of Mistral Small 3.2 (2506) Instruct for some time now, and I just cannot get it working with vLLM. The standard HF repo mistralai/Mistral-Small-3.2-24B-Instruct-2506 runs perfectly, but mine does not, so I assume there is some tiny difference in my repo that I can't pin down.

I use these command-line arguments (taken from Mistral's docs) to launch the model in the vLLM Docker container:

# Mistral's model:
--host 0.0.0.0 --port 8000 --model mistralai/Mistral-Small-3.2-24B-Instruct-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit-mm-per-prompt '{"image":10}'

# my custom fine-tuned model:
--host 0.0.0.0 --port 8000 --model jacob-ml/jacob-24b --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit-mm-per-prompt '{"image":10}'

The default model by Mistral works fine, but mine raises this error:

No consolidated.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
EngineCore failed to start.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [... some more traceback ...]
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
    loaded_weights = model.load_weights(
                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 542, in load_weights
    self.language_model.load_weights(llm_weights_generator())
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 615, in load_weights
    return loader.load_weights(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
    autoloaded_weights = set(self._load_module("", self.module, weights))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
    raise ValueError(msg)
ValueError: There is no module or parameter named 'language_model' in LlamaForCausalLM

Here are the two repos to compare; maybe someone can spot the mistake, because I haven't been able to: mistralai/Mistral-Small-3.2-24B-Instruct-2506 and jacob-ml/jacob-24b.

The only difference I was able to find is that my consolidated.safetensors file is in a slightly different format, but that doesn't seem to match the error message I am getting.
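In case it helps anyone reproduce the comparison, here is a rough sketch of how I'd diff the two repos over the HF Hub API with huggingface_hub (a token may be needed if either repo is gated or private; everything beyond the two repo IDs is just my own checking script):

# Rough comparison of the two repos via the HF Hub API.
import json
from huggingface_hub import HfApi, hf_hub_download

OFFICIAL = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
MINE = "jacob-ml/jacob-24b"

api = HfApi()
official_files = set(api.list_repo_files(OFFICIAL))
mine_files = set(api.list_repo_files(MINE))

# Files present in one repo but not the other.
print("only in official:", sorted(official_files - mine_files))
print("only in mine:    ", sorted(mine_files - official_files))

# Compare the architecture-related fields in config.json.
for repo in (OFFICIAL, MINE):
    with open(hf_hub_download(repo, "config.json")) as f:
        cfg = json.load(f)
    print(repo, cfg.get("architectures"), cfg.get("model_type"))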

Why is this happening? How can I tell vLLM that I’m not trying to use LlamaForCausalLM? Does anybody have an idea?

Any help is GREATLY appreciated! I’ve been trying to solve this for many hours now…

Thanks so much in advance! :slight_smile:

Your error, “There is no module or parameter named ‘language_model’ in LlamaForCausalLM,” indicates that vLLM is trying to load your fine-tuned model as a Llama architecture, not as a Mistral 3.2 (Pixtral/Mistral3) model. This usually happens if the config.json in your repo has the wrong “architectures” or “model_type” fields, or if the file structure/weight format doesn’t match what vLLM expects for Mistral 3.2 models. The official repo’s config.json uses “architectures”: [“Mistral3ForConditionalGeneration”] and “model_type”: “mistral3”, and the weights are sharded as model-00001-of-000XX.safetensors, not consolidated.safetensors. Your repo appears to use a different format, which vLLM does not recognize as a Mistral 3.2 model, so it falls back to LlamaForCausalLM and fails to find the expected parameters, causing the error you see.

To fix this, ensure your config.json matches the official Mistral 3.2 format (especially “architectures” and “model_type”), and that your weights are sharded as model-00001-of-000XX.safetensors, not consolidated.safetensors. Also, make sure you have all the same files as the official repo, including params.json if required. For more details, see the discussion and troubleshooting steps in this vLLM issue and the official Mistral-Small-3.2-24B-Instruct-2506 repo. Would you like a step-by-step checklist to align your repo with the official format?
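For example, a quick local sanity check on the fine-tuned checkpoint directory could look like this (a rough sketch; the path is a placeholder and the expected values are the ones from the official config described above):

# Quick local sanity check before uploading/serving a fine-tuned checkpoint.
import glob
import json
import os

ckpt_dir = "/path/to/finetune"  # placeholder path

with open(os.path.join(ckpt_dir, "config.json")) as f:
    cfg = json.load(f)

# Values used by the official Mistral 3.2 config.
assert cfg.get("architectures") == ["Mistral3ForConditionalGeneration"], cfg.get("architectures")
assert cfg.get("model_type") == "mistral3", cfg.get("model_type")

# HF-style sharded weights look like model-00001-of-000NN.safetensors.
shards = sorted(glob.glob(os.path.join(ckpt_dir, "model-*-of-*.safetensors")))
consolidated = sorted(glob.glob(os.path.join(ckpt_dir, "consolidated*.safetensors")))
print("HF-style shards:", len(shards))
print("consolidated files:", len(consolidated))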


Hint: Mention @RunLLM in the post for followups.

@RunLLM If I’m not mistaken, Mistral uses their custom “mistral” loading format, which, per vLLM’s codebase, requires a consolidated.safetensors file. The claim

the weights are sharded as model-00001-of-000XX.safetensors, not consolidated.safetensors. Your repo appears to use a different format, which vLLM does not recognize as a Mistral 3.2 model

might thus be a little bit inaccurate.

On top of that, my repo also provides the sharded weights, so that’s likely not it. My repo’s config is also based on the one Mistral provides and clearly states the architecture as Mistral3ForConditionalGeneration and the model type as mistral3.
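To back that up, here is a rough check of which Mistral-format files are present in each repo (the file set below is just my reading of what the mistral config/load format looks for, so treat it as an assumption):

# Which "mistral"-format files exist in each repo, as opposed to the HF-format ones?
import fnmatch
from huggingface_hub import HfApi

# Assumed file set: params.json + consolidated*.safetensors (plus the tekken.json tokenizer).
PATTERNS = ["params.json", "tekken.json", "consolidated*.safetensors*"]

api = HfApi()
for repo in ("mistralai/Mistral-Small-3.2-24B-Instruct-2506", "jacob-ml/jacob-24b"):
    files = api.list_repo_files(repo)
    hits = [f for f in files if any(fnmatch.fnmatch(f, p) for p in PATTERNS)]
    print(repo, "->", hits)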