I'm trying to use vLLM on a local server with 2x NVIDIA RTX 3060 GPUs (12 GB each).
I'm trying the pre-built Docker image:
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=$HF_TOKEN" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen3-0.6B
root@sys-ng:~# docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN=$HF_TOKEN \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai \
  --tensor-parallel-size 2 \
  --model Qwen/Qwen3-0.6B
Unable to find image 'vllm/vllm-openai:latest' locally
latest: Pulling from vllm/vllm-openai
66587c81b81a: Pull complete
f29b1d4013a9: Pull complete
340d44d2921c: Pull complete
59a4bcbddda3: Pull complete
6e8af4fd0a07: Pull complete
5fde6ec96d5f: Pull complete
d5c41c3b66f6: Pull complete
e392f915ed79: Pull complete
8638325b23df: Pull complete
d8b0d5c5f036: Pull complete
bde526ae4fd3: Pull complete
5016c10f4af8: Pull complete
a19e04924597: Pull complete
30f3a69bd2f3: Downloading [===============> ] 1.561GB/5.129GB
9b85752ae3df: Download complete
c8fd43168366: Download complete
0f6f885fe2ee: Download complete
2bbb7ba18787: Download complete
71974a01930c: Download complete
4d2e275b7825: Downloading [==========================================> ] 437.3MB/508.9MB
8e4934fbdb92: Downloading
af4d82598236: Waiting
09016baf7b99: Waiting
974f972651dd: Waiting
c2f79f080520: Waiting
a57979a1f045: Waiting
a4cd7924e1f2: Waiting
docker: unauthorized: authentication required.
See 'docker run --help'.
I have logged in to Hugging Face and Docker as well:
root@sys-ng:~# hf auth login
User is already logged in.
root@eve-ng:~#
root@eve-ng:~# docker login
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
Login Succeeded
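For reference, once the container does come up, this is the sanity check I was planning to run against it (a sketch; assumes the default 8000:8000 port mapping from the command above and the same model name):

```shell
# Request body for the OpenAI-compatible chat completions endpoint that
# the vllm/vllm-openai image serves. The model name must match the
# --model argument passed to the container.
payload='{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 16}'

# "|| true" keeps the script from aborting if the server isn't up yet.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload" || true
```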