"served-model-name" and "model"

@RunLLM vllm used to support both “served-model-name” and “model” as launch arguments; is there any change in recent versions of vLLM?

Yes, there is a change: in recent versions of vLLM, the recommended way to specify the model when launching the server is to pass it as a positional argument to vllm serve (e.g., vllm serve <model>) rather than via the --model option. The --model option is deprecated and will be removed in v0.13. The --served-model-name argument is still supported and remains the correct way to set the alternative model name(s) exposed by the API server. If both are provided, the positional argument takes precedence over the --model option or a config file. See the deprecation warning and migration notes in the official documentation and codebase for details.
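
For illustration, a minimal launch along the current recommendation might look like the sketch below; the model name, served alias, and port are placeholders, not values taken from this thread:

```bash
# New style: pass the model as a positional argument (placeholder model name)
vllm serve Qwen/Qwen2.5-1.5B-Instruct \
  --served-model-name my-model \
  --port 8000

# Old style (deprecated): --model passed as an option
# vllm serve --model Qwen/Qwen2.5-1.5B-Instruct --served-model-name my-model
```

Clients of the OpenAI-compatible API would then request the model as my-model rather than by the full repository path.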

For more, see Server Arguments - vLLM, FlexibleArgumentParser source, and vLLM CLI Guide.

Would you like a step-by-step example or more details on migration?

@RunLLM tell me which PR deprecated the --model option

The PR that deprecated the --model option for vllm serve is PR #17944 (“[Misc] remove --model from vllm serve usage”). It enforces passing the model as a positional argument or via a config file, and raises an error if --model is used directly with vllm serve.
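
As a sketch of the config-file route mentioned above (the file name and values are illustrative, and the keys are assumed to mirror the CLI argument names):

```bash
# Write an illustrative config file; keys are assumed to follow the CLI argument names
cat > vllm_config.yaml <<'EOF'
model: Qwen/Qwen2.5-1.5B-Instruct
served-model-name: my-model
port: 8000
EOF

# Launch without --model: the model is taken from the config file instead
vllm serve --config vllm_config.yaml
```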

Would you like more details on the migration or usage?

@RunLLM is the V1 engine in recent vLLM versions working well with MIG GPUs?

No, the vLLM V1 engine does not officially support MIG (Multi-Instance GPU) configurations. The maintainers have stated that MIG is not a supported use case; users are advised to “hide the heterogeneity through docker” as a workaround, but even that is not guaranteed to work reliably. There are known issues and errors when using MIG device IDs, and full support would require significant changes that the vLLM team does not plan to make. See the discussion in vllm-project/vllm#6551 and vllm-project/vllm#7211.
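
As a rough sketch of the “hide the heterogeneity through docker” idea (the MIG UUID, image tag, and model name are placeholders, the entrypoint override assumes the vllm CLI is on the image's PATH, and per the maintainers none of this is guaranteed to work reliably):

```bash
# List the MIG instances and their UUIDs on the host
nvidia-smi -L

# Expose a single MIG instance to the container so vLLM only sees one
# uniform device; the UUID below is a placeholder taken from nvidia-smi -L
docker run --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  -p 8000:8000 --ipc=host \
  --entrypoint vllm \
  vllm/vllm-openai:latest \
  serve Qwen/Qwen2.5-1.5B-Instruct --port 8000
```

This only restricts which device the container can see; it does not make MIG an officially supported configuration.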

Would you like more technical details or possible workarounds?

@RunLLM I think it worked fine with older versions like v0.11?