Make install easier

To test your vLLM model running in Docker, open a new terminal on your host and send a request to the server with curl. The example below assumes the container publishes the server on port 8000:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-model-name>",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0
  }'

Replace <your-model-name> with the name of the model you loaded. If the server is running correctly, it responds with a JSON object containing the completion. See the official vLLM quickstart for more details.
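The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server is reachable at http://localhost:8000, as in the curl command, and that the response follows the OpenAI-style completions format, with the generated text under choices[0].text. The helper names build_completion_request and complete are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: same host and port as the curl command above.
BASE_URL = "http://localhost:8000"


def build_completion_request(model, prompt, max_tokens=7, temperature=0):
    """Build a POST request mirroring the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model, prompt, timeout=30):
    """Send the request and return the text of the first completion.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, calling complete("<your-model-name>", "San Francisco is a") should return the generated text. A urllib.error.URLError at this point usually means the server is not reachable at that address.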

