"served-model-name" and "model"

@RunLLM vllm used to support both “served-model-name” and “model” as launch arguments; is there any change in recent versions of vLLM?

Yes, there is a change: in recent versions of vLLM, the recommended way to specify the model when launching the server is to pass it as a positional argument to vllm serve (e.g., vllm serve <model>) rather than via the --model option. The --model option is deprecated and will be removed in v0.13. The --served-model-name argument is still supported and remains the correct way to set the alternative model name(s) exposed by the API server. If both are provided, the positional argument takes precedence over the --model option or a config file. See the deprecation warning and migration notes in the official documentation and codebase for details.
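
For illustration, a minimal launch along the current recommendation might look like the sketch below; the model name, served alias, and port are placeholders, not values taken from this thread:

```bash
# New style: pass the model as a positional argument (placeholder model name)
vllm serve Qwen/Qwen2.5-1.5B-Instruct \
  --served-model-name my-model \
  --port 8000

# Old style (deprecated): --model passed as an option
# vllm serve --model Qwen/Qwen2.5-1.5B-Instruct --served-model-name my-model
```

Clients of the OpenAI-compatible API would then request the model as my-model rather than by the full repository path.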

For more, see Server Arguments - vLLM, FlexibleArgumentParser source, and vLLM CLI Guide.

Would you like a step-by-step example or more details on migration?

@RunLLM tell me which PR deprecated the --model option

The PR that deprecated the --model option for vllm serve is PR #17944 (“[Misc] remove --model from vllm serve usage”). It enforces passing the model as a positional argument or via a config file, and raises an error if --model is used directly with vllm serve.
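
As a sketch of the config-file route mentioned above (the file name and values are illustrative, and the keys are assumed to mirror the CLI argument names):

```bash
# Write an illustrative config file; keys are assumed to follow the CLI argument names
cat > vllm_config.yaml <<'EOF'
model: Qwen/Qwen2.5-1.5B-Instruct
served-model-name: my-model
port: 8000
EOF

# Launch without --model: the model is taken from the config file instead
vllm serve --config vllm_config.yaml
```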

Would you like more details on the migration or usage?

@RunLLM is the V1 engine in recent vLLM versions working well with MIG GPUs?

No, the vLLM V1 engine does not officially support MIG (Multi-Instance GPU) configurations. The maintainers have stated that MIG is not a supported use case; users are advised to “hide the heterogeneity through docker” as a workaround, but even that is not guaranteed to work reliably. There are known issues and errors when using MIG device IDs, and full support would require significant changes that the vLLM team does not plan to make. See the discussion in vllm-project/vllm#6551 and vllm-project/vllm#7211.
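
As a rough sketch of the “hide the heterogeneity through docker” idea (the MIG UUID, image tag, and model name are placeholders, the entrypoint override assumes the vllm CLI is on the image's PATH, and per the maintainers none of this is guaranteed to work reliably):

```bash
# List the MIG instances and their UUIDs on the host
nvidia-smi -L

# Expose a single MIG instance to the container so vLLM only sees one
# uniform device; the UUID below is a placeholder taken from nvidia-smi -L
docker run --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  -p 8000:8000 --ipc=host \
  --entrypoint vllm \
  vllm/vllm-openai:latest \
  serve Qwen/Qwen2.5-1.5B-Instruct --port 8000
```

This only restricts which device the container can see; it does not make MIG an officially supported configuration.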

Would you like more technical details or possible workarounds?

@RunLLM I think it worked fine with older versions like v0.11?