Setting up vLLM in an airgapped environment

RunLLM · June 25, 2025, 3:03pm

Correct. If the model and all of its required files are already present in the local Hugging Face cache and HF_HUB_OFFLINE=1 is set, you can pass the repository ID in organization/model format (e.g., Qwen/Qwen3-235B-A22B-FP8) as the model argument. vLLM will then resolve the model from the local cache without attempting any network calls, as confirmed in vLLM GitHub Issue #4859 and the offline inference documentation.
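Before moving to the airgapped host, it can help to confirm that the cache really holds a complete snapshot, because an offline run cannot fetch anything that is missing. The following is a minimal stdlib sketch. It assumes the standard Hugging Face hub cache layout: a repo "org/name" lives in `models--org--name/`, `refs/main` holds a commit hash, and `snapshots/<hash>/` holds the files. The helper names `hf_hub_cache_dir` and `cached_snapshot` are hypothetical and are not part of vLLM or `huggingface_hub`. The cache-directory lookup order is simplified, and treating `config.json` as the completeness marker is an assumption that holds for standard Transformers-format checkpoints. The supported library route is `huggingface_hub.snapshot_download(..., local_files_only=True)`.

```python
import os
from pathlib import Path
from typing import Optional


def hf_hub_cache_dir() -> Path:
    """Return the hub cache directory (simplified lookup, an assumption).

    Order used here: HF_HUB_CACHE, then $HF_HOME/hub, then the default
    ~/.cache/huggingface/hub. Legacy variables are ignored in this sketch.
    """
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def cached_snapshot(repo_id: str, revision: str = "main",
                    cache_dir: Optional[Path] = None) -> Optional[Path]:
    """Return the local snapshot folder for repo_id, or None if not cached.

    "org/name" maps to models--org--name/; refs/<revision> stores the
    commit hash and snapshots/<hash>/ stores the downloaded files.
    """
    root = Path(cache_dir) if cache_dir is not None else hf_hub_cache_dir()
    repo_dir = root / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / revision
    if not ref_file.is_file():
        return None  # repo (or this revision) was never downloaded
    commit = ref_file.read_text().strip()
    snapshot = repo_dir / "snapshots" / commit
    # Assumption: a usable Transformers-format snapshot contains config.json;
    # its absence is treated as an incomplete download.
    if not (snapshot / "config.json").is_file():
        return None
    return snapshot
```

If `cached_snapshot("Qwen/Qwen3-235B-A22B-FP8")` returns a path on the offline machine (with the same HF_HUB_CACHE or HF_HOME setting vLLM will see), then `HF_HUB_OFFLINE=1 vllm serve Qwen/Qwen3-235B-A22B-FP8` should resolve to that snapshot. If it returns `None`, the `models--Qwen--Qwen3-235B-A22B-FP8` folder is missing or incomplete on that host.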

Would you like more detail on how vLLM resolves models from the cache or how to verify your cache contents?

Sources:
