Does vLLM support starting multiple workers when launching a Uvicorn server?

From the code, it looks like this is not supported?

Could you clarify which feature or model you are referring to as “not supported” in the code? For example, are you asking about LoRA with multimodal models, a specific model architecture, or something else? Please specify the model or feature so I can provide an accurate answer with citations.

For reference, some features like LoRA with certain multimodal models (e.g., LLaVA v1.6) are not fully supported yet, as noted in the compatibility matrix and related issues (source). Let me know your specific use case for a more detailed answer.


Yes, you can use `--api-server-count` (see the `vllm serve` CLI reference in the vLLM docs) to increase the number of API server processes that sit in front of the vLLM engine.

Yes, you can use the `--api-server-count` (or `-asc`) argument with `vllm serve` to increase the number of API server processes in front of the vLLM engine, enabling parallel input processing and better CPU utilization in high-throughput scenarios. This is supported for online inference and is documented in the official docs (source, source).

Note that when using `--api-server-count > 1`, the multi-modal processor cache is disabled because there is no longer a one-to-one mapping between API and engine processes (source). Would you like more detail on how to configure or use this option?
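
As a minimal sketch, assuming a recent vLLM release whose `vllm serve` command accepts `--api-server-count` (the model name and count below are just illustrative placeholders):

```
# Start the OpenAI-compatible server with 4 API server processes
# in front of a single engine (model name is a placeholder).
vllm serve meta-llama/Llama-3.1-8B-Instruct --api-server-count 4
```

This scales out only the HTTP front end; the engine itself is not replicated, so engine-level parallelism is configured separately.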
