I get the feeling that today's small and medium models (roughly 17B params and under) are built and default-configured to run within 32GB of VRAM.
Moving up to larger models around 70B params, would 48GB of VRAM instead of 32GB have any practical effect on making my life easier? Talk about 70B models seems to swirl around 96GB of VRAM and above.
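Here's the back-of-envelope math I've been using (my own assumptions: dense model, weights only, ignoring KV cache and runtime overhead):

```python
# Rough VRAM footprint of just the weights of a dense model.
# My assumptions: weights only -- no KV cache, no activation or
# runtime overhead, 1 GB = 1e9 bytes.
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB."""
    return params_billions * bits_per_weight / 8

for params in (17, 70):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{weights_gb(params, bits):.1f} GB")
```

By that math, a 70B model at 4-bit is ~35GB of weights before any KV cache, which is part of why I'm unsure where 48GB actually lands.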
Basically I have one 16GB GPU and I'm going to add a second 16GB GPU (albeit with slower memory bandwidth: 448 GB/s vs. my existing card's 896 GB/s) to reach the seemingly expected 32GB of VRAM.
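On the mixed-bandwidth point, here's the rough sketch I'd use for decode speed with layers split across the two cards (again my assumptions: decode is memory-bandwidth-bound, interconnect and compute ignored):

```python
# Rough decode-speed estimate for a model split layer-wise across GPUs.
# My assumption: per token, each GPU streams its resident weights once,
# so per-token time is the sum of (GB on GPU / GPU bandwidth in GB/s).
def tokens_per_sec(split: list[tuple[float, float]]) -> float:
    """split: (weight GB resident on a GPU, that GPU's bandwidth GB/s)."""
    per_token_sec = sum(gb / bw for gb, bw in split)
    return 1.0 / per_token_sec

# ~28GB of 4-bit weights filling both 16GB cards about evenly:
mixed = [(14, 896), (14, 448)]       # my fast card + the slower one
fast_pair = [(14, 896), (14, 896)]   # hypothetical: two fast cards

print(f"mixed pair: ~{tokens_per_sec(mixed):.0f} tok/s")
print(f"fast pair:  ~{tokens_per_sec(fast_pair):.0f} tok/s")
```

So the slower card should drag the pair down some, but not catastrophically, if that model of decode holds.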
I'm wondering whether it's worth adding a third 16GB GPU for 48GB of VRAM. Or is the next model size up from what runs in 32GB more like 128GB of VRAM, so that a bump from 32 to 48 would have no real benefit?
I suspect that's the case: 48GB of VRAM is, practically speaking, no real, usable improvement over 32GB. The next step up from 32GB might be 96GB or 128GB, but either way it's probably well above 48GB, such that 48GB in practice is still effectively 32GB.
So my thinking is: add one GPU for 32GB of VRAM, and don't bother going for 48GB. Am I wrong in reasoning that 48GB is not really better than 32GB in practice?
On another topic, a question about /dev/shm and model sizes: if I go from 16GB to 32GB (or 48GB) of VRAM, do I need to do anything to increase the /dev/shm size? My thinking was that /dev/shm is shared CPU memory and shouldn't affect model serving, because model serving is just one CPU app... I think.
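In case it matters, here's how I'd check the current /dev/shm size (assuming Linux, where /dev/shm is a tmpfs that typically defaults to half of physical RAM):

```python
import os

# Inspect the tmpfs backing /dev/shm (Linux). Most distros default it
# to 50% of physical RAM unless overridden at mount time.
st = os.statvfs("/dev/shm")
total_gb = st.f_blocks * st.f_frsize / 1e9
free_gb = st.f_bavail * st.f_frsize / 1e9
print(f"/dev/shm total: {total_gb:.1f} GB, free: {free_gb:.1f} GB")
```

(I gather some multi-GPU serving stacks spawn one process per GPU and use shared memory for inter-process communication, which is what made me second-guess the "just one CPU app" assumption.)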
thanks!