I have successfully installed vLLM and a Qwen model on a remote server accessed via SSH. With the vLLM server and the OpenAI API, I can use the on-premises model on the remote server (localhost). But I want to use the API server from my local computer, which is on the same intranet. What can I do? When I curl the IP of the remote server, it returns 404.
Unless it’s 80 (for HTTP; 443 for HTTPS) you need to specify the port number where the API is hosted on the IP address. Also specify a route that is defined by your API. Missing the port and/or route will return 404 even if the IP address is correct.
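For example, a quick sanity check against a vLLM OpenAI-compatible server (the IP below is a placeholder; 8000 is vLLM's default port, and /v1/models is one of the routes its OpenAI-compatible server defines):
```
# Hitting the bare IP gives a 404: no port, no route
curl http://192.168.1.50/

# Specify the port AND a route the API actually defines
curl http://192.168.1.50:8000/v1/models
```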
A further complication might be whether your API server is reachable from anywhere other than the API server itself as localhost. 192.168.x.x and 172.16.x.x are private address spaces. Often a container running on a computer at a 192.168.x.x address will serve the container's API on a 172.16.x.x address. If you try to reach a 172.16.x.x address from another computer, but 172.16.x.x addressing is not known to your router's routing table (or to your local computer's IP table, assuming the local and remote computers are on the same subnet), the request will fail.
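Two things worth checking in that scenario (a sketch; the model name and ports are placeholders): make sure vLLM binds to all interfaces rather than only localhost, and, if it runs in a container, make sure the port is published to the host.
```
# Bind vLLM's OpenAI-compatible server to all interfaces, not just 127.0.0.1
vllm serve Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --port 8000

# If vLLM runs inside Docker, publish the container port to the host,
# so the host's 192.168.x.x address forwards to the container's 172.16.x.x one
docker run --gpus all -p 8000:8000 vllm/vllm-openai \
    --model Qwen/Qwen2.5-7B-Instruct
```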
One more thing, not related to a 404, but it comes up once you connect to the remote API: when you use the IP address, port, and route and successfully connect, you then have to authenticate (unless the API does not require auth).
Here is an example:
```
curl --location "http://$vllm_ip:8000/v1/chat/completions" \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer token' \
--data "$data" \
--output response.json
```
I have ALREADY built a system with tool-calling via a bash script.
However, I noticed something recently (for example, inconsistent special-character escape handling on the server side). Please DO NOT expose the server to any production environment until the bugs are fixed.
That example is a much better way to put it
If any new-to-vLLM folks out there (yet to come) don't understand DJK's example, read what I wrote above, which describes a couple of the key pieces of it. Now I will try to correlate bits of DJK's example with the things I said.
Tying values from DJK's example to my narrative:
An example of specifying the API's port is :8000
An example of specifying one of the API's routes (one of several available on the OpenAI API) is /v1/chat/completions
The second --header row (the one with Authorization: Bearer token) is for authenticating yourself to the API, and also for the API granting pre-established privileges to you based on the API server's configuration/definition of privileges for your API key / you.
Note 1: the authentication line will not work for you unless you have a valid/current bearer token to provide, replacing the word "token" from the example with your actual token.
Note 2: this line may not be required if it's just you on your own computers and you put up your own OpenAI-compatible API at, say, your home. If you don't set up your API server to require authentication, your curl won't need to pass a token (and in this DIY-at-home case you won't even have one).
Note 3: the curl-with-bearer-token example is frequently written as
```
curl -H "Authorization: Bearer <ACCESS_TOKEN>"
```
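For completeness, here is a sketch of how the two sides fit together with vLLM (the model name and token below are placeholders; --api-key is the vLLM flag for requiring a bearer token):
```
# Server side: require an API key when starting vLLM's OpenAI-compatible server
vllm serve Qwen/Qwen2.5-7B-Instruct --api-key token-abc123

# Client side: that same token must then be sent with every request
curl -H "Authorization: Bearer token-abc123" "http://$vllm_ip:8000/v1/models"
```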
On the --data "$data" line, $data is a variable containing whatever data you provide to the API (as with a POST, PATCH, or PUT). You will need to declare $data and assign your data value to it. You could omit the variable and write the data inline, but a $data variable is a much better approach, because you can validate its value as valid JSON prior to the curl call. Bad JSON makes the call fail with an HTTP error, and then you start digging through all the unknown-to-you vLLM pieces when the problem was just that you forgot to type a brace, comma, or quote/tick, or used an invalid character in your JSON, lol
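A minimal sketch of that pattern (the IP and model name are placeholders; jq must be installed for the validation step):
```
#!/usr/bin/env bash
vllm_ip="192.168.1.50"   # replace with your API server's IP

# Declare the request body in one place
data='{
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "messages": [{"role": "user", "content": "Hello!"}]
}'

# Validate the JSON before calling the API; jq exits non-zero on bad JSON
echo "$data" | jq empty || { echo "invalid JSON in \$data" >&2; exit 1; }

curl --location "http://$vllm_ip:8000/v1/chat/completions" \
  --header 'Content-Type: application/json' \
  --data "$data" \
  --output response.json
```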
Also ```$vllm_ip``` is a variable that should be created and assigned the value of your API's IP address (or replaced with your IP address, entirely omitting the variable).
HTH vllm newbies like me (and also HTisntAnnoyingAF to everyone else)
Thank you very much for the reply. I could understand the example provided by DJK, and I tried a similar approach with a Python server. It works when I use localhost. The 404 may be caused by the reason you mentioned before, the IP address change caused by container use. For data safety, my company only allows us to use the server over an SSH connection, with containers on specific ports. The port I set in vLLM may not be accessible from other computers. Anyway, thanks.
Oh! In that case you could almost certainly use an SSH tunnel to the API server for HTTP/S requests from your local computer. When you tunnel over SSH, the firewall and security can't complain too much, because they only expect to see encrypted data inside TCP/IP packets to and from the SSH port, and encrypted data to and from the SSH port is exactly what they see.
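A sketch of local port forwarding (user, host, and ports are placeholders; 8000 is vLLM's default port):
```
# Forward local port 8000 to port 8000 on the remote server, over SSH
ssh -L 8000:localhost:8000 user@remote-server -N

# In another terminal, the remote API now answers on localhost
curl http://localhost:8000/v1/models
```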
Or maybe set up NGINX (or some other reverse proxy) on the API server, so that requests and responses behave themselves with respect to firewall rules where maybe the vLLM service is allowed only local requests. NGINX would receive your remote request and then make a local request to the (local to NGINX) vLLM service.
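A minimal sketch of that reverse-proxy idea, assuming vLLM listens on 127.0.0.1:8000 and NGINX is allowed through the firewall on port 80 (the config path is a typical location, not a requirement):
```
# Drop a minimal reverse-proxy config for vLLM and reload NGINX
sudo tee /etc/nginx/conf.d/vllm.conf >/dev/null <<'EOF'
server {
    listen 80;
    location /v1/ {
        proxy_pass http://127.0.0.1:8000;   # local-only vLLM service
        proxy_set_header Host $host;
    }
}
EOF
sudo nginx -s reload
```
NGINX could also terminate TLS here (a listen 443 ssl block), which ties into the HTTPS point below.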
Oh, the issue could also be that all HTTP requests between separate computers must use HTTPS. NGINX could help with that too.