What is the difference between vLLM-Omni and vLLM?

Yepgang · December 22, 2025, 3:07am

Why not support multimodality directly in vLLM? Why are two projects needed?

RunLLM · December 22, 2025, 3:07am

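vLLM currently does not support loading and serving multiple models within a single instance or process; each instance serves exactly one model. To serve multiple models, you need to start multiple vLLM instances, each loading a different model, and dispatch requests through a routing layer on top. According to the official FAQ, an official issue, and multi-model related issues, this is a design limitation of vLLM: LLM inference is extremely demanding on GPU and CPU resources, and serving multiple models from a single instance would cause resource contention and scheduling complexity, hurting performance and stability.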

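The officially recommended approach is therefore to start a separate vLLM server process for each model (on different ports or GPUs), then aggregate the multi-model API behind a load balancer such as nginx or a custom routing layer. This is also why two (or more) projects/processes are needed to serve multiple models. Would you like more detail on the technical rationale or implementation suggestions?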
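
As a rough illustration of the custom routing layer mentioned above, here is a minimal sketch in Python. It assumes each vLLM instance exposes the OpenAI-compatible API and that the client sets the `model` field in the request body. The model names, ports, and the `resolve_backend` helper are hypothetical and only serve the example; a real deployment would more likely use nginx or an existing gateway.

```python
import json

# Hypothetical mapping: served model name -> base URL of the vLLM
# instance that loads it. Names and ports are assumptions.
BACKENDS = {
    "model-a": "http://localhost:8000",
    "model-b": "http://localhost:8001",
}


def resolve_backend(request_body: bytes,
                    path: str = "/v1/chat/completions") -> str:
    """Return the full upstream URL for an OpenAI-style request body.

    The router only inspects the "model" field; forwarding the request
    to the returned URL is left to the surrounding HTTP server.
    """
    payload = json.loads(request_body)
    model = payload.get("model")
    if model not in BACKENDS:
        raise KeyError(f"no backend registered for model {model!r}")
    return BACKENDS[model] + path
```

A proxy built on this would read the incoming body, call `resolve_backend`, and forward the unchanged request to the returned URL.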

DarkLight1337 · December 25, 2025, 4:33am

The landscape for models with multimodal output is not stable yet. To avoid accumulating tech debt in case large changes are needed to the engine core, we have decided to decouple the development of vLLM-Omni from the main repository.
