Getting Value error, torch.bfloat16 is not supported for quantization method gguf. Supported dtypes: [torch.float16, torch.float32] [type=value_error, input_value=ArgsKwargs((), {'model_co…transfer_config': None}), input_type=ArgsKwargs]
This error occurs because vLLM does not support torch.bfloat16 for GGUF-quantized models; only torch.float16 and torch.float32 are allowed. Explicitly set --dtype float16 or --dtype float32 when serving GGUF models. Leaving the default --dtype auto can resolve to bfloat16 depending on your hardware and the model config, which triggers this error.
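A minimal serve command illustrating the fix (the GGUF file path here is a hypothetical placeholder; substitute your own model file):

```shell
# Hypothetical local GGUF model path; the key part is forcing a supported dtype
# instead of relying on --dtype auto, which may pick bfloat16.
vllm serve ./models/model.Q4_K_M.gguf --dtype float16
```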
Value error, The model type 'gemma3_text' does not support float16. Reason: Numerical instability. Please use bfloat16 or float32 instead. [type=value_error, input_value=ArgsKwargs((), {'model': …rocessor_plugin': None}), input_type=ArgsKwargs]
This error means the model type gemma3_text is known to be numerically unstable in float16, so vLLM requires bfloat16 or float32 as the dtype. The check is enforced for Gemma3 models because of precision issues with float16, as documented in the vLLM source and issue tracker (see the _FLOAT16_NOT_SUPPORTED_MODELS list).
Set --dtype bfloat16 or --dtype float32 when serving or using this model to resolve the error.
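For example, a serve command for a Gemma3 checkpoint (the model ID below is one assumed example; any Gemma3 model applies):

```shell
# bfloat16 avoids the float16 numerical instability vLLM guards against for Gemma3;
# float32 also works but roughly doubles memory use versus bfloat16.
vllm serve google/gemma-3-4b-it --dtype bfloat16
```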