Does vLLM support starting multiple workers when launching a Uvicorn server?

From the code, it looks like this is not supported?

Could you clarify which feature or model you are referring to as “not supported” in the code? For example, are you asking about LoRA with multimodal models, a specific model architecture, or something else? Please specify the model or feature so I can provide an accurate answer with citations.

For reference, some features like LoRA with certain multimodal models (e.g., LLaVA v1.6) are not fully supported yet, as noted in the compatibility matrix and related issues (source). Let me know your specific use case for a more detailed answer.


Yes, you can use `--api-server-count` (see the `vllm serve` CLI reference in the vLLM docs) to increase the number of API server processes that sit in front of the vLLM engine.

Yes, you can use the `--api-server-count` (or `-asc`) argument with `vllm serve` to increase the number of API server processes in front of the vLLM engine, enabling parallel input processing and better CPU utilization in high-throughput scenarios. This is supported for online inference and is documented in the official docs (source, source).

Note that when using `--api-server-count > 1`, the multi-modal processor cache is disabled because there is no longer a one-to-one mapping between API and engine processes (source). Would you like more detail on how to configure or use this option?
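
As a minimal sketch, assuming a recent vLLM release whose `vllm serve` command accepts `--api-server-count` (the model name and count below are just illustrative placeholders):

```
# Start the OpenAI-compatible server with 4 API server processes
# in front of a single engine (model name is a placeholder).
vllm serve meta-llama/Llama-3.1-8B-Instruct --api-server-count 4
```

This scales out only the HTTP front end; the engine itself is not replicated, so engine-level parallelism is configured separately.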
