I have successfully installed vLLM and a Qwen model on a remote server accessed via SSH. With the vLLM server and the OpenAI API, I can use the on-premises model on the remote server (localhost). But I want to use the API server from my local computer, which is on the same intranet. What can I do? When I curl the IP of the remote server, it returns 404.
Unless it’s 80 (for HTTP; 443 for HTTPS) you need to specify the port number where the API is hosted on the IP address. Also specify a route that is defined by your API. Missing the port and/or route will return 404 even if the IP address is correct.
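For example, a quick sanity check against a vLLM OpenAI-compatible server (the IP below is a placeholder; 8000 is vLLM's default port, and /v1/models is one of the routes its OpenAI-compatible server defines):
```
# Hitting the bare IP gives a 404: no port, no route
curl http://192.168.1.50/

# Specify the port AND a route the API actually defines
curl http://192.168.1.50:8000/v1/models
```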
A further complication might be whether your API server is reachable from anywhere other than the API server itself as localhost. 192.168.x.x and 172.16.x.x are private address spaces. Often a container running on a computer at a 192.168.x.x address will serve the container's API on a 172.16.x.x address. If you try to reach a 172.16.x.x address from another computer, but 172.16.x.x addressing is not known to your router's routing table (or to your local computer's IP table, assuming the local and remote computers are on the same subnet), the request will fail.
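Two things worth checking in that scenario (a sketch; the model name and ports are placeholders): make sure vLLM binds to all interfaces rather than only localhost, and, if it runs in a container, make sure the port is published to the host.
```
# Bind vLLM's OpenAI-compatible server to all interfaces, not just 127.0.0.1
vllm serve Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --port 8000

# If vLLM runs inside Docker, publish the container port to the host,
# so the host's 192.168.x.x address forwards to the container's 172.16.x.x one
docker run --gpus all -p 8000:8000 vllm/vllm-openai \
    --model Qwen/Qwen2.5-7B-Instruct
```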
One more thing, not related to a 404, but it comes up once you connect to the remote API: when you use the IP address, port, and route and successfully connect, you then have to authenticate (unless the API does not require auth).
Here is an example:
```
curl --location "http://$vllm_ip:8000/v1/chat/completions" \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer token' \
--data "$data" \
--output response.json
```
I have ALREADY built a system with tool-calling via a bash script.
However, I noticed something recently (for example, inconsistent special-character escape handling on the server side). Please DO NOT expose the server to any production environment until the bugs are fixed.
That example is a much better way to put it
If any new-to-vLLM folks out there (yet to come) don't understand DJK's example, read what I wrote above, which describes a couple of the key pieces of it. Now I will try to correlate bits of DJK's example with the things I said.
Tying values from DJK's example to my narrative:
An example of specifying the API's port is :8000
An example of specifying one of the API's routes (one of several available on the OpenAI API) is /v1/chat/completions
The second --header row (the one with Authorization: Bearer token) is for authenticating yourself to the API, and also for the API granting pre-established privileges to you based on the API server's configuration/definition of privileges for your API key / you.
Note 1: the authentication line will not work for you unless you have a valid/current bearer token to provide, replacing the word "token" from the example with your actual token.
Note 2: this line may not be required if it's just you on your own computers and you put up your own OpenAI-compatible API at, say, your home. If you don't set up your API server to require authentication, your curl won't need to pass a token (and in this DIY-at-home case you won't even have one).
Note 3: the curl-with-bearer-token example is frequently written as
```
curl -H "Authorization: Bearer <ACCESS_TOKEN>"
```
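For completeness, here is a sketch of how the two sides fit together with vLLM (the model name and token below are placeholders; --api-key is the vLLM flag for requiring a bearer token):
```
# Server side: require an API key when starting vLLM's OpenAI-compatible server
vllm serve Qwen/Qwen2.5-7B-Instruct --api-key token-abc123

# Client side: that same token must then be sent with every request
curl -H "Authorization: Bearer token-abc123" "http://$vllm_ip:8000/v1/models"
```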
On the --data "$data" line, $data is a variable containing whatever data you provide to the API (as with a POST, PATCH, or PUT). You will need to declare $data and assign your data value to it. You could omit the variable and write the data inline, but a $data variable is a much better approach, because you can validate its value as valid JSON prior to the curl call. Bad JSON makes the call fail with an HTTP error, and then you start digging through all the unknown-to-you vLLM pieces when the problem was just that you forgot to type a brace, comma, or quote/tick, or used an invalid character in your JSON, lol
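A minimal sketch of that pattern (the IP and model name are placeholders; jq must be installed for the validation step):
```
#!/usr/bin/env bash
vllm_ip="192.168.1.50"   # replace with your API server's IP

# Declare the request body in one place
data='{
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "messages": [{"role": "user", "content": "Hello!"}]
}'

# Validate the JSON before calling the API; jq exits non-zero on bad JSON
echo "$data" | jq empty || { echo "invalid JSON in \$data" >&2; exit 1; }

curl --location "http://$vllm_ip:8000/v1/chat/completions" \
  --header 'Content-Type: application/json' \
  --data "$data" \
  --output response.json
```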
Also ```$vllm_ip``` is a variable that should be created and assigned the value of your API's IP address (or replaced with your IP address, entirely omitting the variable).
HTH vllm newbies like me (and also HTisntAnnoyingAF to everyone else)
Thank you very much for the reply. I could understand the example provided by DJK, and I tried a similar approach with a Python server. It works when I use localhost. The 404 may be caused by the reason you mentioned before, the IP address change caused by container use. For data safety, my company only allows us to use the server over an SSH connection, with containers on specific ports. The port I set in vLLM may not be accessible from other computers. Anyway, thanks.
Oh! In that case you could almost certainly use an SSH tunnel to the API server for HTTP/S requests from your local computer. When you tunnel over SSH, the firewall and security can't complain too much, because they only expect to see encrypted data inside TCP/IP packets to and from the SSH port, and encrypted data to and from the SSH port is exactly what they see.
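A sketch of local port forwarding (user, host, and ports are placeholders; 8000 is vLLM's default port):
```
# Forward local port 8000 to port 8000 on the remote server, over SSH
ssh -L 8000:localhost:8000 user@remote-server -N

# In another terminal, the remote API now answers on localhost
curl http://localhost:8000/v1/models
```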
Or maybe set up NGINX (or some other reverse proxy) on the API server, so that requests and responses behave themselves with respect to firewall rules where maybe the vLLM service is allowed only local requests. NGINX would receive your remote request and then make a local request to the (local to NGINX) vLLM service.
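A minimal sketch of that reverse-proxy idea, assuming vLLM listens on 127.0.0.1:8000 and NGINX is allowed through the firewall on port 80 (the config path is a typical location, not a requirement):
```
# Drop a minimal reverse-proxy config for vLLM and reload NGINX
sudo tee /etc/nginx/conf.d/vllm.conf >/dev/null <<'EOF'
server {
    listen 80;
    location /v1/ {
        proxy_pass http://127.0.0.1:8000;   # local-only vLLM service
        proxy_set_header Host $host;
    }
}
EOF
sudo nginx -s reload
```
NGINX could also terminate TLS here (a listen 443 ssl block), which ties into the HTTPS point below.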
Oh, the issue could also be that all HTTP requests between separate computers must use HTTPS. NGINX could help with that too.