Yes. `--cpu-offload-gb 256` sets an upper bound on the CPU RAM vLLM may use for weight offloading. vLLM fills GPU VRAM first and spills to CPU RAM only for the portion of the model weights that does not fit in VRAM. If the model fits entirely in VRAM, no RAM is used for offload; otherwise only the overflow is offloaded, up to the limit you set. You do not need 256 GB of free RAM unless the model actually requires it; the flag is a cap, not a reservation.
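As a rough back-of-the-envelope sketch of that reasoning (the function name and the simple subtraction are illustrative assumptions, not part of vLLM's API; real usage also depends on KV-cache and activation memory):

```python
def required_cpu_offload_gb(model_weights_gb: float, gpu_budget_gb: float) -> float:
    """Estimate the minimum --cpu-offload-gb needed: the portion of the
    model weights that exceeds the GPU memory available for weights.
    Illustrative only; ignores KV cache and activation overhead."""
    return max(0.0, model_weights_gb - gpu_budget_gb)

# A 140 GB model on an 80 GB GPU needs at least ~60 GB offloaded to CPU;
# passing --cpu-offload-gb 256 merely caps RAM usage well above that need.
print(required_cpu_offload_gb(140.0, 80.0))  # 60.0
print(required_cpu_offload_gb(40.0, 80.0))   # 0.0 -- fits entirely in VRAM
```

In other words, oversizing the cap is harmless: actual RAM consumption tracks the overflow, not the flag value.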
Would you like more detail or a usage example?