It is commonly believed that reducing max_model_len reduces GPU memory usage.
However, I did not observe this when deploying the model meta-llama-3.2-3b https://build.nvidia.com/meta/llama-3.2-3b-instruct?nim=self-hosted from the NIM model repo on a Tesla T4 GPU (16GB VRAM).
I deployed the model with several different values of max_model_len, but the GPU memory usage remained essentially constant.
Here are the logs:
Logs for max_model_len: 8192
the current vLLM instance can use total_gpu_memory (14.56GiB) x gpu_memory_utilization (0.90) = 13.11GiB
model weights take 6.02GiB; non_torch_memory takes 0.05GiB; PyTorch activation peak memory takes 1.22GiB; the rest of the memory reserved for KV Cache is 5.83GiB.
INFO 2026-06-22 05:04:45.292 executor_base.py:111] # cuda blocks: 3409, # CPU blocks: 2340
INFO 2026-06-22 05:04:45.292 executor_base.py:116] Maximum concurrency for 8192 tokens per request: 6.66x
Logs for max_model_len: 4096
the current vLLM instance can use total_gpu_memory (14.56GiB) x gpu_memory_utilization (0.90) = 13.11GiB
model weights take 6.02GiB; non_torch_memory takes 0.05GiB; PyTorch activation peak memory takes 1.19GiB; the rest of the memory reserved for KV Cache is 5.85GiB.
INFO 2026-06-22 04:43:29.435 executor_base.py:111] # cuda blocks: 3422, # CPU blocks: 2340
INFO 2026-06-22 04:43:29.435 executor_base.py:116] Maximum concurrency for 4096 tokens per request: 13.37x
Logs for max_model_len: 16384
the current vLLM instance can use total_gpu_memory (14.56GiB) x gpu_memory_utilization (0.90) = 13.11GiB
model weights take 6.02GiB; non_torch_memory takes 0.05GiB; PyTorch activation peak memory takes 1.26GiB; the rest of the memory reserved for KV Cache is 5.78GiB.
INFO 2026-06-22 05:29:37.546 executor_base.py:111] # cuda blocks: 3381, # CPU blocks: 2340
INFO 2026-06-22 05:29:37.546 executor_base.py:116] Maximum concurrency for 16384 tokens per request: 3.30x
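To make the numbers easier to compare, here is a minimal sketch that redoes the arithmetic from the three log excerpts. It assumes a KV cache block size of 16 tokens, which is not printed in the logs (I believe it is the usual vLLM default), and the function names are hypothetical helpers, not part of vLLM or NIM.

```python
# Values copied from the logs above (all in GiB).
TOTAL_GPU_MEMORY_GIB = 14.56
GPU_MEMORY_UTILIZATION = 0.90
MODEL_WEIGHTS_GIB = 6.02
NON_TORCH_GIB = 0.05

# ASSUMPTION: tokens per KV cache block. Not shown in the logs.
BLOCK_SIZE_TOKENS = 16


def kv_cache_budget_gib(activation_peak_gib: float) -> float:
    """Memory left for the KV cache after weights and activations are subtracted."""
    budget = TOTAL_GPU_MEMORY_GIB * GPU_MEMORY_UTILIZATION
    return budget - MODEL_WEIGHTS_GIB - NON_TORCH_GIB - activation_peak_gib


def max_concurrency(num_gpu_blocks: int, max_model_len: int) -> float:
    """How many requests of max_model_len tokens fit in the allocated blocks."""
    return num_gpu_blocks * BLOCK_SIZE_TOKENS / max_model_len


# max_model_len -> (activation peak in GiB, number of cuda blocks), from the logs.
runs = {
    4096: (1.19, 3422),
    8192: (1.22, 3409),
    16384: (1.26, 3381),
}

for max_model_len, (activation_peak, num_blocks) in runs.items():
    kv_gib = kv_cache_budget_gib(activation_peak)
    concurrency = max_concurrency(num_blocks, max_model_len)
    print(f"max_model_len={max_model_len}: "
          f"KV cache ~{kv_gib:.2f} GiB, concurrency {concurrency:.2f}x")
```

Under that block-size assumption, the computed concurrency values match the logged 13.37x, 6.66x and 3.30x, and the KV cache sizes agree with the logs to within rounding of the inputs. In all three runs the total budget stays at about 13.11 GiB and the KV cache simply takes whatever is left, so the only thing that changes with max_model_len is how many maximum-length requests fit into it.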
This contradicts the expectation that lowering the context length would save GPU memory.
Can you explain the reason for this behavior?