How do I run multiple models? How do I define the other models in the vLLM server command?

vLLM does not support serving multiple models in a single server process or defining multiple models in the vLLM server command. To run multiple models, you must launch separate vLLM server instances, each on a different port and (optionally) GPU, and use an external router or load balancer (e.g., N…
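As a minimal sketch of the "one server per model" setup described above, the following Python snippet builds one `vllm serve` command per model, each on its own port and GPU. The model names, ports, and GPU IDs are hypothetical placeholders, and the snippet only prints the commands; launching them (for example with `subprocess.Popen`) and putting a router in front are left out.

```python
import os
import shlex

# Hypothetical deployment plan: one vLLM server process per model.
SERVERS = [
    {"model": "org/model-a", "port": 8000, "gpus": "0"},
    {"model": "org/model-b", "port": 8001, "gpus": "1"},
]


def build_serve_command(model: str, port: int) -> list[str]:
    """Return the argv for a single-model vLLM server on the given port."""
    return ["vllm", "serve", model, "--port", str(port)]


def build_env(gpu_ids: str) -> dict[str, str]:
    """Copy the current environment and pin the process to specific GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return env


def plan(servers: list[dict]) -> list[tuple[list[str], str]]:
    """Pair each server's command line with the GPU IDs it should see."""
    return [(build_serve_command(s["model"], s["port"]), s["gpus"]) for s in servers]


if __name__ == "__main__":
    for cmd, gpus in plan(SERVERS):
        # Print the equivalent shell line instead of launching anything.
        print(f"CUDA_VISIBLE_DEVICES={gpus} {shlex.join(cmd)}")
        # To actually start a server: subprocess.Popen(cmd, env=build_env(gpus))
```

Each server then exposes its own OpenAI-compatible endpoint on its port, and the external router decides which port a request goes to based on the requested model.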

Yes, when running multiple vLLM servers (each serving a different model) with tensor parallelism on the same GPU(s), you should set the --gpu-memory-utilization parameter for each instance so that their total does not exceed 1.0 (100%). For example, if you run two servers, you might set --gpu-memory…
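The rule above (the shares on any one GPU should not add up to more than 1.0) can be checked before launching. This is only a sketch: the instance list, model names, and the helper functions `gpu_totals` and `oversubscribed` are hypothetical and are not part of vLLM.

```python
from collections import defaultdict

# Hypothetical instances: two tensor-parallel servers sharing GPUs 0 and 1.
INSTANCES = [
    {"model": "org/model-a", "gpus": [0, 1], "gpu_memory_utilization": 0.45},
    {"model": "org/model-b", "gpus": [0, 1], "gpu_memory_utilization": 0.45},
]


def gpu_totals(instances: list[dict]) -> dict[int, float]:
    """Sum the --gpu-memory-utilization shares requested on each GPU."""
    totals: dict[int, float] = defaultdict(float)
    for inst in instances:
        for gpu in inst["gpus"]:
            totals[gpu] += inst["gpu_memory_utilization"]
    return dict(totals)


def oversubscribed(instances: list[dict], limit: float = 1.0, eps: float = 1e-9) -> list[int]:
    """Return the GPU IDs whose combined share exceeds the limit."""
    return sorted(g for g, total in gpu_totals(instances).items() if total > limit + eps)


if __name__ == "__main__":
    print("per-GPU totals:", gpu_totals(INSTANCES))
    print("oversubscribed GPUs:", oversubscribed(INSTANCES))
```

Leaving some headroom below 1.0, as in the 0.45 + 0.45 example, is a cautious choice here rather than a documented requirement.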

Yes. How do I calculate the memory?

There is no exact formula to calculate the required --gpu-memory-utilization for each vLLM instance, because vLLM profiles memory usage empirically at startup, accounting for model weights, KV cache, CUDA graphs, and other runtime allocations. The recommended approach is to estimate the sum of model…
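Since there is no exact formula, the following is only a rough starting estimate, not how vLLM computes anything. It assumes you pick a KV cache budget and an overhead allowance yourself, and that tensor parallelism splits the weights roughly evenly across GPUs. All numbers and function names are hypothetical; the value should be adjusted after checking what vLLM reports at startup.

```python
GIB = 2**30


def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights in GiB (e.g. 2 bytes for fp16/bf16)."""
    return num_params * bytes_per_param / GIB


def estimate_utilization(
    weights: float,      # total weight size in GiB
    kv_cache: float,     # KV cache budget per GPU in GiB (your choice)
    overhead: float,     # allowance per GPU for CUDA graphs, activations, etc.
    gpu_total: float,    # total memory of one GPU in GiB
    tp_size: int = 1,    # tensor parallel degree (assumed even weight split)
) -> float:
    """Return a rough per-GPU fraction to try for --gpu-memory-utilization."""
    per_gpu_need = weights / tp_size + kv_cache + overhead
    return per_gpu_need / gpu_total


if __name__ == "__main__":
    w = weights_gib(7e9, 2)  # hypothetical 7B-parameter model in 16-bit precision
    frac = estimate_utilization(w, kv_cache=12, overhead=4, gpu_total=80)
    print(f"weights ~ {w:.1f} GiB, suggested starting fraction ~ {frac:.2f}")
```

If the server fails to start or reports too little KV cache space, raise the fraction for that instance and lower another one so the total on each GPU stays within 1.0.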

I would like to know why /v1/models endpoint is given in the api endpoints, if it’s not possible to serve multiple models.

The /v1/models endpoint exists to comply with the OpenAI API specification, which requires a models listing endpoint. In vLLM, this endpoint returns the single model (or base model plus any LoRA adapters) currently served by that instance, not multiple independent models. It does not indicate suppor…
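To see what a given instance reports, you can query its `/v1/models` endpoint and read the `id` of each entry. The sketch below assumes the standard OpenAI-style list response (`{"object": "list", "data": [...]}`); the base URL and the sample payload are hypothetical.

```python
import json
from urllib.request import urlopen


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str) -> list[str]:
    """Query a running server, e.g. fetch_model_ids("http://localhost:8000")."""
    with urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(json.load(resp))


# Hypothetical response from a server started with a single model.
SAMPLE = {"object": "list", "data": [{"id": "org/model-a", "object": "model"}]}

if __name__ == "__main__":
    print(model_ids(SAMPLE))
```

With several single-model servers running, calling `fetch_model_ids` against each port is one way for a router to build its model-to-port mapping.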

Run multiple models

youlearnit July 25, 2025, 11:58pm 3

When using tensor parallelism and running multiple vLLM servers, each serving a different model, do I have to set the memory usage for each one, e.g. 50%?
