I have 2×A100 80G and 4×A10 across 2 servers. I want to deploy MiniMax-M2 (AWQ).
Any suggested way to deploy it?
Gemini suggested using PD disaggregation, but it seems that 4×A10 cannot hold all the weights.
You cannot deploy MiniMax-M2 (AWQ) on 4×A10 GPUs if their combined memory is insufficient to hold all model weights, even with prefill/decode (PD) disaggregation. vLLM requires that the total GPU memory across the devices serving a model instance is enough for its weights, and there is no documented support for offloading MiniMax-M2 weights to CPU or disk to bypass this limit. PD disaggregation separates the prefill and decode phases of inference onto different workers; it does not split model weights beyond available GPU memory, so each worker group still needs enough memory for a full copy of the model. The MiniMax-M2 usage guide's deployment examples use 4×A100/A800/H200/H20 and do not mention support for heterogeneous or memory-insufficient clusters.
To make the most of your hardware, you could deploy MiniMax-M2 on the 2×A100 80G (which is sufficient for the AWQ model) and use the 4×A10s for other, smaller models. There is no official documentation or example for combining A100s and A10s in a single vLLM deployment of MiniMax-M2, nor for using CPU offload to compensate for insufficient GPU memory on this model. Would you like more detail on PD disaggregation or alternative deployment strategies?
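As a rough sketch of the single-node option, the AWQ checkpoint could be served on the two A100s alone with tensor parallelism splitting the weights across both GPUs. The checkpoint name and context-length value below are assumptions — substitute the actual AWQ repo you are using:

```shell
# Hypothetical invocation: serve the AWQ build on the 2×A100 node only,
# sharding weights across both GPUs via tensor parallelism.
# "MiniMaxAI/MiniMax-M2" stands in for your actual AWQ checkpoint path.
vllm serve MiniMaxAI/MiniMax-M2 \
    --quantization awq \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 32768
```

With `--tensor-parallel-size 2`, each A100 holds roughly half the quantized weights plus KV cache; lowering `--max-model-len` frees cache memory if the model barely fits.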