We want to set up our vLLM instance in an airgapped environment (e.g. pulling the necessary model(s) once and then disconnecting/blocking internet access). The models are downloaded to the local cache volume of the instance, but once internet access is cut off the vLLM instance does not start up properly:
To run vLLM in an airgapped environment, you must (1) download the model and all required files (e.g., config.json, tokenizer files) to a local directory, and (2) start vLLM by pointing directly to this local directory, not by model name. Additionally, set the environment variable HF_HUB_OFFLINE=1 to prevent vLLM from attempting to connect to the internet. If you use a local cache, you may need to provide the full snapshot path to the model inside the cache, not just the model name. For gated/private models, ensure all files are present, as vLLM does not always download every required file automatically. Example:
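A minimal sketch of both steps, using placeholder paths and facebook/opt-125m as a stand-in model. On a machine with internet access, pull the full repository into a plain directory (here via huggingface_hub's snapshot_download) and copy it to the airgapped host:

```python
from huggingface_hub import snapshot_download

# Run this once on a connected machine; the target directory can then be
# copied to the airgapped instance (path and model name are examples).
snapshot_download(
    repo_id="facebook/opt-125m",
    local_dir="/models/opt-125m",
)
```

On the airgapped host, set HF_HUB_OFFLINE=1 and point vLLM at that directory rather than at the model name:

```python
import os

# Force offline mode before vLLM / huggingface_hub are imported so that
# no network calls are attempted at startup.
os.environ["HF_HUB_OFFLINE"] = "1"

from vllm import LLM, SamplingParams

# Point vLLM directly at the local directory containing config.json,
# the tokenizer files and the model weights (path is an example).
llm = LLM(model="/models/opt-125m")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```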
If you still see errors, double-check that the directory contains all necessary files (e.g., config.json, tokenizer files, model weights) and that you are passing the local path to the --model argument rather than a HuggingFace model name (e.g., “facebook/opt-125m”). This approach is confirmed in multiple issues and is the recommended workaround for airgapped/offline deployments according to vLLM GitHub issues, the troubleshooting docs, and the offline inference docs.
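If it helps, here is a small, hypothetical sanity check for the local directory (file names follow the ones mentioned above; sharded checkpoints may use several model-*.safetensors files instead of a single one):

```python
from pathlib import Path

# Hypothetical pre-flight check before starting vLLM offline (path is an example).
model_dir = Path("/models/opt-125m")

has_config = (model_dir / "config.json").exists()
has_tokenizer = any(model_dir.glob("tokenizer*"))
has_weights = any(model_dir.glob("*.safetensors")) or any(model_dir.glob("*.bin"))

print(f"config.json: {has_config}, tokenizer files: {has_tokenizer}, weights: {has_weights}")
if all([has_config, has_tokenizer, has_weights]):
    print(f"{model_dir} looks complete; pass it as the local model path.")
```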
Would you like a step-by-step guide for preparing the local model directory or for troubleshooting specific errors?
Thanks @RunLLM!
This was exactly what I was looking for.
Just one point to add: if the models have already been downloaded to the local cache and the environment variable HF_HUB_OFFLINE=1 is set, you don’t need to specify the full path to the model. It’s enough to use the repoName/model format (e.g. Qwen/Qwen3-235B-A22B-FP8).
Correct—if the model and all required files are present in the local HuggingFace cache, and HF_HUB_OFFLINE=1 is set, you can use the repoName/model format (e.g., Qwen/Qwen3-235B-A22B-FP8) as the model argument. vLLM will resolve the model from the local cache without attempting any network calls, as confirmed in vLLM GitHub Issue #4859 and offline inference documentation.
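A minimal sketch of that pattern, assuming the repo already sits in the default HuggingFace cache on the airgapped host; scan_cache_dir is used here only to list what is cached, and the Qwen model plus the tensor_parallel_size value are examples (a 235B FP8 checkpoint needs correspondingly large hardware):

```python
import os

# Offline mode: huggingface_hub and vLLM resolve everything from the local cache.
os.environ["HF_HUB_OFFLINE"] = "1"

from huggingface_hub import scan_cache_dir
from vllm import LLM

# Optional: inspect the cache to confirm the repo (and its size) is present.
for repo in scan_cache_dir().repos:
    print(repo.repo_id, repo.size_on_disk, "bytes")

# With the snapshot cached and HF_HUB_OFFLINE=1 set, the repoName/model
# form is resolved locally without any network calls.
llm = LLM(model="Qwen/Qwen3-235B-A22B-FP8", tensor_parallel_size=8)
```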
Would you like more detail on how vLLM resolves models from the cache or how to verify your cache contents?