Correct. If the model and all of its required files are present in the local Hugging Face cache and HF_HUB_OFFLINE=1 is set, you can pass the repoName/model format (e.g., Qwen/Qwen3-235B-A22B-FP8) as the model argument. vLLM then resolves the model from the local cache without attempting any network calls, as confirmed in vLLM GitHub Issue #4859 and the offline inference documentation.
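As a rough illustration, the sketch below checks whether a repo has a cached snapshot before loading it offline. It assumes the standard Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/` under the hub cache directory) and the `HF_HUB_CACHE`, `HF_HOME`, and `XDG_CACHE_HOME` environment variables. The helper names `hf_hub_cache_dir`, `cached_snapshots`, and `load_offline` are hypothetical and not part of vLLM or huggingface_hub. A non-empty snapshot directory does not prove that every required file is present, so treat this as a quick sanity check only.

```python
import os
from pathlib import Path


def hf_hub_cache_dir() -> Path:
    """Locate the hub cache directory from HF_HUB_CACHE, HF_HOME, or the default."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    # Default HF_HOME is <XDG_CACHE_HOME or ~/.cache>/huggingface.
    cache_root = os.environ.get("XDG_CACHE_HOME", "~/.cache")
    hf_home = os.environ.get("HF_HOME", os.path.join(cache_root, "huggingface"))
    return Path(hf_home).expanduser() / "hub"


def cached_snapshots(repo_id: str) -> list[Path]:
    """Return the non-empty snapshot directories cached for a repo id."""
    # Repo ids such as "Qwen/Qwen3-235B-A22B-FP8" are stored as
    # "models--Qwen--Qwen3-235B-A22B-FP8" inside the hub cache.
    repo_dir = hf_hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir() and any(p.iterdir()))


def load_offline(repo_id: str):
    """Hypothetical helper: load a cached model with vLLM in offline mode."""
    if not cached_snapshots(repo_id):
        raise FileNotFoundError(f"No cached snapshot found for {repo_id}")
    # Set the flag before importing vllm, since huggingface_hub reads it at import time.
    os.environ["HF_HUB_OFFLINE"] = "1"
    from vllm import LLM  # imported lazily so the cache check works without vLLM installed

    return LLM(model=repo_id)
```

For example, `cached_snapshots("Qwen/Qwen3-235B-A22B-FP8")` returns an empty list when nothing is cached, and `load_offline("Qwen/Qwen3-235B-A22B-FP8")` only hands the repo id to vLLM once a snapshot has been found.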
Would you like more detail on how vLLM resolves models from the cache or how to verify your cache contents?