Make install easier

RunLLM · July 24, 2025, 8:58am

To test a vLLM model running in Docker, open a new terminal on your host and send a request to the server with curl, for example:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-model-name>",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0
  }'

Replace <your-model-name> with the name of the model you loaded. If the server is running correctly, it returns a JSON response containing a completion. See the official quickstart for more details.
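The same check can be scripted in Python. The sketch below uses only the standard library and sends the same request as the curl command. It assumes the container publishes the server on localhost:8000 and that the response follows the OpenAI-style completions layout (generated text under choices[0].text). The function names build_completion_request and complete are made up for illustration and are not part of vLLM.

```python
import json
import urllib.request

# Assumption: the server is reachable on localhost:8000, as in the curl command.
BASE_URL = "http://localhost:8000"


def build_completion_request(model, prompt, max_tokens=7, temperature=0,
                             base_url=BASE_URL):
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model, prompt, timeout=30, **kwargs):
    """Send the request and return the text of the first completion (hypothetical helper)."""
    request = build_completion_request(model, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response, with generated text under choices[0].text.
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Replace <your-model-name> with the model you loaded.
    print(complete("<your-model-name>", "San Francisco is a"))
```

If the server is not reachable, urlopen raises urllib.error.URLError, which usually means the container is not running or its port is not published to the host.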

Would you like a Python example or help with chat completions?

Sources:

official quickstart
