How do I run Gemma 4 with MTP on ROCm?
I always get this error: vllm serve: error: argument --speculative-config/-sc: Value {method:mtp,num_speculative_tokens:1} cannot be converted to <function loads at 0x7d0da081b560>
It happens no matter which Docker image I use: nightly, latest, or the Gemma 4 specific one.
The relevant part of my serve command is:
–speculative-config ‘{“method”:“mtp”,“num_speculative_tokens”:1}’"
The error is caused by incorrect quoting or formatting of the --speculative-config argument. vLLM expects a valid JSON string, so you must use straight double quotes (") and not curly or smart quotes (“ ”), and the argument must be passed as a single string. For example:
--speculative-config '{"method":"mtp","num_speculative_tokens":1}'
This applies to all vLLM Docker images, including ROCm. According to the vLLM documentation, and as shown in the argument parser source, the argument is parsed with json.loads (the "<function loads at 0x...>" in the error message refers to that function), so invalid JSON or wrong quotes will cause this error. Make sure you use single quotes to wrap the entire JSON string and double quotes inside, and avoid any non-ASCII or smart quotes.
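To check a value before passing it to vLLM, a minimal sketch like the following may help. It assumes only what is stated above, namely that the flag value goes through json.loads; the helper name parses_as_json is made up for illustration and is not part of vLLM. It also shows one way to generate the argument instead of typing the quotes by hand.

```python
import json
import shlex


def parses_as_json(value: str) -> bool:
    """Return True if json.loads accepts the string (hypothetical helper)."""
    try:
        json.loads(value)
        return True
    except json.JSONDecodeError:
        return False


# Straight ASCII quotes: valid JSON.
good = '{"method":"mtp","num_speculative_tokens":1}'
# Curly quotes, as pasted from a rich-text source: not valid JSON.
curly = '{“method”:“mtp”,“num_speculative_tokens”:1}'
# Quotes stripped before the value reached the parser, as in the error message.
stripped = '{method:mtp,num_speculative_tokens:1}'

for label, value in [("straight", good), ("curly", curly), ("stripped", stripped)]:
    print(label, parses_as_json(value))

# Build the argument from a dict so the quoting is produced mechanically:
# json.dumps emits straight double quotes, shlex.quote adds the outer single quotes.
config = {"method": "mtp", "num_speculative_tokens": 1}
arg = shlex.quote(json.dumps(config, separators=(",", ":")))
print("--speculative-config", arg)
```

If the "stripped" form is what shows up in your error message, the double quotes were probably removed by the shell or by the Docker/compose layer before vLLM saw them, so check the quoting there as well.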
Would you like more detail or troubleshooting steps?
QuantTrio/gemma-4-31B-it-AWQ-6Bit does not support MTP (Multi-Token Prediction) speculative decoding in vLLM. MTP is only supported for Gemma 4 “assistant” checkpoints (e.g., google/gemma-4-31B-it-assistant), not for AWQ-quantized or standard IT models. Attempting to use MTP with this model will result in errors or unsupported behavior, as documented in the MTP guide.
Would you like more detail on which Gemma 4 models support MTP or how to select a compatible model?
On ROCm GPUs like the 7900 XTX, MTP speculative decoding is supported only with the official Gemma 4 31B IT assistant checkpoint (google/gemma-4-31B-it-assistant). AWQ, GPTQ, and other quantized versions do not support MTP. Use the assistant model as the drafter and the standard IT model (google/gemma-4-31B-it) as the target.
Reference: The Gemma 4 MTP documentation and MTP guide specify that only the official assistant checkpoints are supported for MTP, and there is no mention of AWQ or other quantized variants being compatible.
Would you like a sample command for ROCm/7900 XTX?
Your command uses invalid (smart/curly) quotes (“ ” and ‘ ’) instead of standard ASCII double (") and single (') quotes. vLLM requires valid JSON for --speculative-config, so you must use only straight quotes. Also, ensure all flags start with a double hyphen (--), not an en-dash (–).
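If the command was copied from a web page or word processor, a small clean-up pass can catch these characters. The sketch below is only an illustration: the function name normalize_cli and the replacement table are assumptions, and blindly turning every en-dash into "--" is only safe when the en-dash really stood in for a long-option prefix.

```python
# Typographic characters that commonly replace ASCII ones in pasted commands.
REPLACEMENTS = {
    "\u201c": '"',   # left curly double quote
    "\u201d": '"',   # right curly double quote
    "\u2018": "'",   # left curly single quote
    "\u2019": "'",   # right curly single quote
    "\u2013": "--",  # en-dash, assumed here to have replaced a double hyphen
}


def normalize_cli(text: str) -> str:
    """Replace typographic quotes and en-dashes with ASCII equivalents."""
    for bad, ascii_equivalent in REPLACEMENTS.items():
        text = text.replace(bad, ascii_equivalent)
    return text


pasted = "–speculative-config ‘{“method”:“mtp”,“num_speculative_tokens”:1}’"
fixed = normalize_cli(pasted)
print(fixed)

# Any character left outside ASCII is worth inspecting by hand.
leftovers = [ch for ch in fixed if ord(ch) > 127]
print("non-ASCII characters remaining:", leftovers)
```

Retyping the flag by hand in the terminal is usually the simpler fix; the script is mainly useful for checking long commands or compose files.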