If I plan to deploy vLLM with Docker in a production environment serving 8 concurrent users, how much RAM and how many CPU cores do I need, given that I'm not planning to use CPU offloading? Assume the model is GLM-4.5 running on 16× H200 GPUs.
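For context, here's a minimal sketch of the kind of launch command I have in mind. The Hugging Face repo id `zai-org/GLM-4.5`, the TP=8/PP=2 split across the 16 GPUs, and the `--cpus`/`--memory` values are my assumptions and placeholders, not recommendations:

```bash
# Sketch of the planned launch (placeholder values, not a recommendation).
# Assumptions: HF repo id zai-org/GLM-4.5; tensor-parallel=8 x pipeline-parallel=2
# to span the 16 H200s; no CPU offloading flags set.
docker run --runtime nvidia --gpus all \
  --ipc=host \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --cpus=32 \
  --memory=256g \
  vllm/vllm-openai:latest \
  --model zai-org/GLM-4.5 \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 2 \
  --gpu-memory-utilization 0.90
```

The `--cpus` and `--memory` limits above are exactly the knobs I don't know how to size for this setup, which is what I'm asking about.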