--max-num-seqs parameter not recognized

when running:
docker run -it --gpus all --max-num-seqs 64 --ipc=host --network host -p 8000:8000 -v /home/ubuntu/vllm_models:/root/.cache/huggingface -e HUGGING_FACE_HUB_TOKEN="removed" --name vllm_download_and_serve9 vllm/vllm-openai:latest --model google/gemma-3-12b-it --served-model-name vllm/gemma-3 --max-model-len 4096 --num-scheduler-steps 10

Why are --num-scheduler-steps 10 and --max-num-seqs 64 not recognized? The error is:

unknown flag: --max-num-seqs

How do I set them?

The --max-num-seqs and --num-scheduler-steps flags are not recognized because they are vLLM server arguments, not Docker flags. In `docker run`, everything before the image name is parsed by Docker itself, and everything after the image name is passed to the container's entrypoint (here, the vLLM OpenAI-compatible server). Since --max-num-seqs appears before the image name in your command, Docker tries to interpret it and fails. Move all vLLM arguments after the image name:

docker run -it --gpus all --ipc=host --network host -p 8000:8000 \
  -v /home/ubuntu/vllm_models:/root/.cache/huggingface \
  -e HUGGING_FACE_HUB_TOKEN="removed" \
  --name vllm_download_and_serve9 \
  vllm/vllm-openai:latest \
  --model google/gemma-3-12b-it \
  --served-model-name vllm/gemma-3 \
  --max-model-len 4096 \
  --max-num-seqs 64 \
  --num-scheduler-steps 10

This way, --max-num-seqs and --num-scheduler-steps are passed to the vLLM server rather than to Docker, which matches the documented usage in the vLLM CLI guide and Docker deployment docs. Would you like more detail on the available vLLM CLI arguments or troubleshooting tips?
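As a quick sanity check, you can ask the image itself which server arguments it accepts. This assumes (as is the case for `vllm/vllm-openai`) that the image's entrypoint forwards its arguments to the vLLM OpenAI server, so `--help` is handled by vLLM rather than by Docker:

```shell
# Print the vLLM server's argument list, then exit and remove the container.
# Anything after the image name goes to the entrypoint, so --help here is
# a vLLM flag, not a Docker flag.
docker run --rm vllm/vllm-openai:latest --help

# To confirm a specific flag exists in your image's vLLM version, filter the output:
docker run --rm vllm/vllm-openai:latest --help | grep -E "max-num-seqs|num-scheduler-steps"
```

If a flag does not appear in the help output, it is not supported by the vLLM version baked into that image tag (flag availability can vary across vLLM releases).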
