To test your vLLM model running in Docker, open a new terminal on your host and send a request to the server with curl. This assumes the container's port 8000 is published to the host. For example:
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "<your-model-name>",
"prompt": "San Francisco is a",
"max_tokens": 7,
"temperature": 0
}'
Replace <your-model-name> with the name of the model you loaded. If the server is running correctly, it responds with a JSON object containing the completion. See the official vLLM quickstart for more details.
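The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl command above. The helper names `build_completion_request` and `complete` are made up for illustration, and the response parsing assumes the server returns the usual OpenAI-style completions shape (`choices[0].text`).

```python
import json
import urllib.request

# Same host and port as the curl example; adjust if you published a different port.
BASE_URL = "http://localhost:8000"


def build_completion_request(model, prompt, max_tokens=7, temperature=0,
                             base_url=BASE_URL):
    """Build (but do not send) the POST request that the curl command sends."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model, prompt, timeout=30, **kwargs):
    """Send the request and return the text of the first completion.

    Assumes an OpenAI-style response body with a "choices" list.
    """
    request = build_completion_request(model, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("<your-model-name>", "San Francisco is a"))` should print the generated text. As with curl, `<your-model-name>` is a placeholder for the model you loaded.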