The error occurs because your installed vLLM version does not recognize minimax_m2_append_think as a valid --reasoning-parser option. That parser only exists in recent vLLM builds with MiniMax-M2 support, and the official MiniMax-M2 usage guide requires a vLLM version where minimax_m2_append_think is available.
To fix this, ensure you install the latest vLLM nightly build from the correct index. Use:
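The exact command was omitted above; as a sketch, vLLM publishes nightly wheels under a dedicated extra index (per vLLM's installation docs), so the upgrade typically looks like:

```shell
# Upgrade to the latest vLLM nightly build.
# The --pre flag allows pre-release versions; the index URL is the one
# documented for vLLM nightly wheels -- adjust it if the MiniMax-M2 guide
# you are following specifies a different index.
pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
```

Run this in the same environment (venv/conda) that launches the server, otherwise the old vLLM will still be picked up.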
Then retry your serve command; minimax_m2_append_think should now be accepted as a --reasoning-parser choice. If the error persists, verify the installed vLLM version and confirm the install actually succeeded. Would you like a step-by-step troubleshooting guide?
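A quick way to verify the install before retrying, sketched below. The model name and parser flag come from the error above; the tensor-parallel size is illustrative, not prescribed:

```shell
# 1. Check which vLLM version is actually installed in this environment
python -c "import vllm; print(vllm.__version__)"

# 2. If the parser is registered, it should appear among the
#    --reasoning-parser choices in the CLI help text
vllm serve --help | grep minimax_m2_append_think

# 3. Retry serving (tensor-parallel size shown is illustrative;
#    match it to your GPU count)
vllm serve MiniMaxAI/MiniMax-M2 \
    --reasoning-parser minimax_m2_append_think \
    --tensor-parallel-size 4
```

If step 1 still prints the old version, the upgrade landed in a different Python environment than the one running the server.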
Is it required to have 4xH100 GPUs? When I run on 2xH100, I am getting:
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 11-08 05:16:02 [parallel_state.py:1325] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 11-08 05:16:02 [parallel_state.py:1325] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
(Worker_TP0 pid=2134538) INFO 11-08 05:16:03 [gpu_model_runner.py:2944] Starting to load model MiniMaxAI/MiniMax-M2…
(Worker_TP1 pid=2134539) INFO 11-08 05:16:03 [cuda.py:409] Using Flash Attention backend.
(Worker_TP1 pid=2134539) WARNING 11-08 05:16:03 [fp8.py:165] DeepGEMM backend requested but not available.
(Worker_TP1 pid=2134539) INFO 11-08 05:16:03 [fp8.py:180] Using Triton backend for FP8 MoE
(Worker_TP0 pid=2134538) INFO 11-08 05:16:03 [cuda.py:409] Using Flash Attention backend.
(Worker_TP0 pid=2134538) WARNING 11-08 05:16:03 [fp8.py:165] DeepGEMM backend requested but not available.
(Worker_TP0 pid=2134538) INFO 11-08 05:16:03 [fp8.py:180] Using Triton backend for FP8 MoE
(Worker_TP1 pid=2134539) ERROR 11-08 05:16:04 [multiproc_executor.py:646] WorkerProc failed to start.