vllm serve brings up the service, but it fails with an error at the very end

zZzbinn · June 27, 2026, 9:36am

vllm serve /autodl-fs/data/weight/DeepSeek-V3.1/ --trust-remote-code --max-num-seqs 16 --max-model-len 8192 --max-num-batched-tokens 8192 --gpu-memory-utilization 0.95 --tensor-parallel-size 8 --distributed_executor_backend “mp” --no-enable-prefix-caching --enable-expert-parallel --no-async-scheduling --default-chat-template-kwargs ‘{“enable_thinking”: false}’ --compilation_config ‘{“cudagraph_mode”: “FULL_DECODE_ONLY”}’

RunLLM · June 27, 2026, 9:36am

Your command mostly uses valid vLLM serve arguments, but there are some issues:

The documented spelling is --distributed-executor-backend (hyphens), not --distributed_executor_backend.
The documented spelling is --compilation-config (hyphens), not --compilation_config.
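Use straight ASCII quotes, not curly quotes (“ ” ‘ ’). In a shell, wrap each JSON value in straight single quotes and use straight double quotes inside the JSON, for example '{"enable_thinking": false}'. A plain value such as mp needs no quotes.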
The remaining flags are supported by vllm serve, including those for distributed execution, expert parallelism, and compilation config. See engine args and serve args.

Would you like a corrected command or more detail on any specific flag?
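
For reference, here is a sketch of the original command with only the spelling and quoting fixes above applied. The wrapper function run_with_serve_args is a hypothetical helper added for illustration so the argument list can be previewed before launching; it is not part of vLLM. The model path and all values are copied from the original post. Since the post does not include the actual error message, this may not be the whole fix, and the flags have not been checked against any particular vLLM version.

```shell
# Hypothetical helper (not part of vLLM): runs the command given as "$@"
# followed by the corrected `serve` argument list from the original post.
run_with_serve_args() {
  "$@" serve /autodl-fs/data/weight/DeepSeek-V3.1/ \
    --trust-remote-code \
    --max-num-seqs 16 \
    --max-model-len 8192 \
    --max-num-batched-tokens 8192 \
    --gpu-memory-utilization 0.95 \
    --tensor-parallel-size 8 \
    --distributed-executor-backend mp \
    --no-enable-prefix-caching \
    --enable-expert-parallel \
    --no-async-scheduling \
    --default-chat-template-kwargs '{"enable_thinking": false}' \
    --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}'
}

# Preview: print one argument per line without starting the server.
run_with_serve_args printf '%s\n'

# To launch for real (requires vLLM to be installed):
# run_with_serve_args vllm
```

The preview step makes quoting problems easy to spot: each JSON value should appear on a single line with straight double quotes intact. If the server still fails at the end of startup, the last lines of the log would be needed to diagnose further.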

