Setting up vLLM in an airgapped environment

We want to set up our vLLM instance in an airgapped environment (e.g. pull the necessary model(s) once and then disconnect/block internet access). The models are downloaded to the instance's local cache volume, but once internet access is cut, the vLLM instance no longer starts up properly:

ERROR 06-18 10:51:06 [config.py:114] Error retrieving file list: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: fae4a4c5-41c2-4a02-bebf-bb0ae5a1facd)'), retrying 1 of 2
ERROR 06-18 10:51:08 [config.py:112] Error retrieving file list: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: 4be26889-49de-49da-a65e-2cc043bbfeba)')
ERROR 06-18 10:51:16 [config.py:114] Error retrieving file list: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: 815f5971-5f4b-4efd-95f3-83be457f4e76)'), retrying 1 of 2
ERROR 06-18 10:51:18 [config.py:112] Error retrieving file list: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: 34e83b6c-cfed-4fe3-a721-bdf26be234b2)')

Is there any option or setting to make the instance available without having an active internet connection?

To run vLLM in an airgapped environment, you must (1) download the model and all required files (e.g., config.json, tokenizer files) to a local directory, and (2) start vLLM by pointing directly to this local directory, not by model name. Additionally, set the environment variable HF_HUB_OFFLINE=1 to prevent vLLM from attempting to connect to the internet. If you use a local cache, you may need to provide the full snapshot path to the model inside the cache, not just the model name. For gated/private models, ensure all files are present, as vLLM does not always download every required file automatically. Example:

export HF_HUB_OFFLINE=1
vllm serve /path/to/local/model/dir --other-args
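
If it helps, here is a sketch of the initial download step on a machine that still has internet access; the repo name and target directory are only placeholders, and it assumes the huggingface_hub CLI is installed:

# On a connected machine: pull the full repo into a plain directory
pip install -U huggingface_hub
huggingface-cli download facebook/opt-125m --local-dir /models/opt-125m

# Copy /models/opt-125m to the airgapped host, then point vllm serve at that path as above.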

If you still see errors, double-check that the directory contains all necessary files (e.g., config.json, tokenizer files, model weights) and that you are passing the local path as the model argument rather than a Hugging Face model name (e.g., “facebook/opt-125m”). This approach is confirmed in multiple issues and is the recommended workaround for airgapped/offline deployments according to vLLM GitHub issues, troubleshooting docs, and offline inference docs.
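
As a quick sanity check on the airgapped host, you can list the directory contents; treat this as a rough checklist, since exact filenames vary by model and weight format:

ls /path/to/local/model/dir
# Typically expected:
#   config.json
#   tokenizer_config.json plus tokenizer.json and/or tokenizer.model
#   generation_config.json (if the model ships one)
#   *.safetensors (or pytorch_model*.bin), plus the *.index.json for sharded weights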

Would you like a step-by-step guide for preparing the local model directory or for troubleshooting specific errors?


Thanks @RunLLM!
This was exactly what I was looking for.

Just one point to add: if the models have already been downloaded to the local cache and the environment variable HF_HUB_OFFLINE=1 is set, you don't need to specify the full path to the model. It's enough to use the repoName/model format (e.g. Qwen/Qwen3-235B-A22B-FP8).

Correct—if the model and all required files are present in the local HuggingFace cache, and HF_HUB_OFFLINE=1 is set, you can use the repoName/model format (e.g., Qwen/Qwen3-235B-A22B-FP8) as the model argument. vLLM will resolve the model from the local cache without attempting any network calls, as confirmed in vLLM GitHub Issue #4859 and offline inference documentation.
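
For example (the HF_HOME path here is an assumption; adjust it, or drop that line entirely if your cache volume is already at the default ~/.cache/huggingface):

# Point at the cache volume if it is mounted somewhere non-default
export HF_HOME=/mnt/hf-cache
export HF_HUB_OFFLINE=1

# Resolve the model by repo name from the local cache, with no network calls
vllm serve Qwen/Qwen3-235B-A22B-FP8 --other-args

# Optional: list what is actually present in the cache
huggingface-cli scan-cache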

Would you like more detail on how vLLM resolves models from the cache or how to verify your cache contents?
