Can LoRA adapters be loaded on different GPUs?

Your current environment

Ubuntu

How would you like to use vllm

I currently need to investigate a question: does a LoRA adapter in vLLM have to run on the same GPU as the base model?

For example, if a base model and a LoRA adapter are running on one GPU and its memory is almost full, can we run another LoRA adapter on a different GPU on the same node?

I don’t think we support that. In general, a LoRA adapter is quite small, so memory shouldn’t be an issue. If you have many LoRA adapters and cannot load all of them at once due to memory capacity, one thing you can do now is limit the number of LoRA adapters loaded at a time, though this may hurt performance.
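
A minimal sketch of that approach using the `max_loras` / `max_cpu_loras` engine arguments; the base model name and adapter path below are placeholders for illustration:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model name and adapter path; substitute your own.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    enable_lora=True,
    max_loras=2,      # at most 2 adapters resident on the GPU per batch
    max_cpu_loras=8,  # up to 8 adapters cached in CPU memory, swapped in on demand
    max_lora_rank=16,
)

# Each request names the adapter it should use; an adapter not currently
# on the GPU is swapped in first, which is where the performance cost comes from.
outputs = llm.generate(
    ["Write a SQL query listing all users."],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql_lora_adapter"),
)
print(outputs[0].outputs[0].text)
```

With settings like these, only a couple of adapters occupy GPU memory at any moment while the rest wait in CPU memory, trading some swap latency for a smaller memory footprint.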