Can LoRA adapters be loaded on different GPUs?

Your current environment

Ubuntu

How would you like to use vllm

I currently need to investigate a question: does a LoRA adapter in vLLM have to run on the same GPU as the base model?

For example, if a base model and a LoRA adapter are running on one GPU and its memory is almost full, can we run another LoRA adapter on a different GPU on the same node?

I don’t think we support that. In general, a LoRA adapter is quite small, so memory shouldn’t be an issue. If you have many LoRA adapters and cannot load all of them at once due to memory capacity, one thing you can do now is limit the number of LoRA adapters loaded at a time, though this may hurt performance.
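
A minimal sketch of that approach using the `max_loras` / `max_cpu_loras` engine arguments; the base model name and adapter path below are placeholders for illustration:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model name and adapter path; substitute your own.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    enable_lora=True,
    max_loras=2,      # at most 2 adapters resident on the GPU per batch
    max_cpu_loras=8,  # up to 8 adapters cached in CPU memory, swapped in on demand
    max_lora_rank=16,
)

# Each request names the adapter it should use; an adapter not currently
# on the GPU is swapped in first, which is where the performance cost comes from.
outputs = llm.generate(
    ["Write a SQL query listing all users."],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql_lora_adapter"),
)
print(outputs[0].outputs[0].text)
```

With settings like these, only a couple of adapters occupy GPU memory at any moment while the rest wait in CPU memory, trading some swap latency for a smaller memory footprint.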