How do I successfully load a model across multiple GPUs ("multi-card") in vLLM? As a beginner, how can I learn about and understand this? Is there a recommended blog or other study resource?
You can try the examples in vllm/examples/offline_inference/basic at main · vllm-project/vllm · GitHub.
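Roughly, the basic example in that folder boils down to something like this (a simplified sketch; the prompts and model name here are just placeholders, the real script has its own defaults):

```python
from vllm import LLM, SamplingParams

# Placeholder prompts and a small model that fits on a single GPU.
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```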
Thanks, I’ll give it a try
Is this it, from README.md? "Some of these models are likely to be too large for a single GPU. You can split them across multiple GPUs by setting --tensor-parallel-size to the number of required GPUs."
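Yes, that's the one. In the offline Python API the same option is the tensor_parallel_size argument to LLM. A minimal sketch (the model name is only an example of something that usually needs more than one GPU; use one you actually have weights for):

```python
from vllm import LLM, SamplingParams

# Sketch: shard a larger model across 2 GPUs with tensor parallelism.
llm = LLM(
    model="meta-llama/Llama-2-13b-hf",
    tensor_parallel_size=2,  # number of GPUs to split the weights across
)

out = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

When running the OpenAI-compatible server it is the same idea, just passed as --tensor-parallel-size 2 on the command line.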
(I take “multi-card” to mean “2 or more GPUs” inside one computer)
This is probably something different, but just yesterday I found Accelerate.
I think Accelerate is for assembling different computers into one big virtual resource for PyTorch to consume. Is that right, just very generally speaking?
Well, that’s pretty much what it means.
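In case a concrete picture helps: the usual Accelerate pattern looks roughly like this (a sketch based on the Hugging Face Accelerate docs, training-oriented and not specific to vLLM; the model, optimizer, and data here are stand-ins):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up GPUs / machines from the `accelerate launch` config

model = torch.nn.Linear(512, 2)                                      # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 512), batch_size=8)

# prepare() moves everything to the right devices and wraps it for distributed use.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # dummy loss, just to show the API
    accelerator.backward(loss)         # replaces loss.backward()
    optimizer.step()
```

You then start it with `accelerate launch script.py`, and the same script can run on one GPU, several GPUs, or several machines depending on the launch configuration.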
For the multi-GPU question, I've recently been looking at vllm/vllm/distributed/parallel_state.py to learn about distribution, specifically tensor parallelism and pipeline parallelism, which I think is closer to how the distributed execution actually works.
At present I am looking at how to do pipeline parallelism in vLLM, i.e. splitting the model and placing the stages on different GPUs; my remaining question is how to change the partitioning strategy of the pipeline.
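For reference, turning pipeline parallelism on from the user-facing API looks roughly like this. A minimal sketch, assuming a vLLM version where the offline LLM class accepts pipeline_parallel_size (in older versions pipeline parallelism was only wired up for the online server via --pipeline-parallel-size), and the model name is just an example:

```python
from vllm import LLM, SamplingParams

# Sketch: 4 GPUs = 2 pipeline stages x 2-way tensor parallelism.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    pipeline_parallel_size=2,  # layers are split into 2 consecutive stages
    tensor_parallel_size=2,    # each stage's weights are sharded across 2 GPUs
)

out = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

As far as I can tell, the actual layer-to-stage split is computed in vllm/distributed/utils.py (get_pp_indices), which parallel_state.py builds on; if I remember right there is also an environment variable (VLLM_PP_LAYER_PARTITION, a comma-separated list of layer counts per stage) to override the default even split, but double-check vllm/envs.py in your version.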
Ah, thank you. I guess you are an advanced beginner.
I am still only a diapers-level beginner.