32GB vs 48GB vRam

RunLLM · May 3, 2025, 1:40pm

For 70B-parameter models, neither 32GB nor 48GB of VRAM is enough to run the unquantized 16-bit (FP16/BF16) weights: at 2 bytes per parameter the weights alone take about 140GB, so you'd need at least 2x80GB A100s or similar. Quantized versions are a different matter. A 4-bit 70B model needs roughly 35GB for weights, which does not fit in 32GB without offloading or more aggressive quantization, but does fit in 48GB with some room left for KV cache. The jump from 32GB to 48GB therefore only helps if the quantized model's weights plus KV cache fit within that range; otherwise you'll still be limited to smaller models, more aggressive quantization, or offloading. In practice, 48GB is only a modest improvement over 32GB for large models: it makes 4-bit 70B feasible, but for 16-bit 70B weights the next practical step is 96GB+ VRAM, not 48GB. See this discussion and here.
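
The arithmetic above can be checked with a rough estimate. This is a minimal sketch, not vLLM's own memory accounting: the layer count, KV head count and head dimension are assumed Llama-style 70B values, and the 2GB overhead is a placeholder for activations and runtime buffers. Quantization metadata (scales, zero points) is ignored, so real 4-bit checkpoints are somewhat larger.

```python
"""Rough VRAM estimate for a dense decoder-only model (weights + KV cache)."""


def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes), ignoring quantization metadata."""
    return params_billion * bits_per_param / 8


def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """KV cache in GB for `tokens` cached tokens in total.

    Defaults are ASSUMED Llama-style 70B values (grouped-query attention);
    read the real ones from the model's config.json.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
    return per_token * tokens / 1e9


def fits(params_billion: float, bits_per_param: float, vram_gb: float,
         tokens: int = 8192, overhead_gb: float = 2.0) -> bool:
    """True if weights + KV cache + an ASSUMED fixed overhead fit in vram_gb."""
    needed = (weight_gb(params_billion, bits_per_param)
              + kv_cache_gb(tokens) + overhead_gb)
    return needed <= vram_gb


if __name__ == "__main__":
    for bits in (16, 4):
        for vram in (32, 48):
            print(f"70B @ {bits}-bit on {vram}GB: fits={fits(70, bits, vram)}")
```

With these assumptions, only the 4-bit model on 48GB fits. Raising `tokens` (longer context or more concurrent requests) eats into the remaining headroom quickly.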

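Regarding /dev/shm: vLLM can reserve CPU RAM as swap space for KV cache blocks (see --swap-space), and that reservation is separate from /dev/shm. /dev/shm is shared memory, which matters mainly when several processes exchange data, for example in multi-GPU (tensor-parallel) runs or inside containers with a small default shared-memory size. Increasing VRAM does not by itself require increasing /dev/shm unless you also increase swap usage or run multiple large models that might use shared memory. See this issue for more details.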
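
To see how much shared memory is actually available before changing anything, a quick check like the following may help. It is a sketch using only the Python standard library; the helper names and the 1 GiB threshold are hypothetical and not a vLLM requirement.

```python
import os
import shutil


def shm_gib(path: str = "/dev/shm") -> tuple[float, float]:
    """Return (total, free) size of the filesystem at `path` in GiB."""
    usage = shutil.disk_usage(path)
    return usage.total / 2**30, usage.free / 2**30


def shm_ok(path: str = "/dev/shm", min_free_gib: float = 1.0) -> bool:
    """True if at least `min_free_gib` is free; the threshold is hypothetical."""
    _, free = shm_gib(path)
    return free >= min_free_gib


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        total, free = shm_gib()
        print(f"/dev/shm: {total:.1f} GiB total, {free:.1f} GiB free")
    else:
        print("/dev/shm not present on this system")
```

Inside Docker, the reported size reflects the container's shared-memory limit, which can be raised with `--shm-size` or avoided by sharing the host's IPC namespace with `--ipc=host`.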

Would you like a step-by-step breakdown or code/config examples for quantized model deployment?
