docker run -it --gpus ‘“device=0”’ --ipc=host --network host -p 8000:8000 \ -v /home/ubuntu/vllm_models:/root/.cache/huggingface \ etc..
Your command is almost correct, but the --gpus value uses curly/typographic quotes (‘“…”’), which the shell passes through literally, so Docker cannot parse the device specification. Use straight double quotes or no quotes at all. For a single GPU (device 0), use:
docker run -it --gpus device=0 --ipc=host --network host -p 8000:8000 \
-v /home/ubuntu/vllm_models:/root/.cache/huggingface \
vllm/vllm-openai:latest \
--model <model_name>
Make sure to replace <model_name> with your actual model path or Hugging Face repo ID. Also note that with --network host Docker ignores port mappings, so the -p 8000:8000 flag is redundant and can be dropped (the container shares the host's network stack directly). For more details, see the vLLM Docker deployment docs.
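As a concrete sketch, here is a full invocation followed by a quick health check (the model name mistralai/Mistral-7B-Instruct-v0.2 is an assumed example; substitute your own, and adjust the cache path for your host):

```shell
# Run vLLM's OpenAI-compatible server on GPU 0.
# -p is omitted: with --network host the container uses the host's
# network stack and port mappings are ignored anyway.
docker run -it --gpus device=0 --ipc=host --network host \
  -v /home/ubuntu/vllm_models:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2   # assumed example model

# Once the server is up, verify it responds on the default port 8000:
curl http://localhost:8000/v1/models
```

The -v mount reuses the host's Hugging Face cache across container restarts, so model weights are only downloaded once.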
Would you like a step-by-step explanation or help with additional arguments?