I have two NVIDIA GPUs, one with 8 GB of VRAM and one with 16 GB of VRAM.
When I run a Qwen model with vLLM, this is my GPU usage according to `nvidia-smi`:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   41C    P8             14W / 165W  |    5300MiB /  16380MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:05:00.0 Off |                  N/A |
|  0%   40C    P8              7W / 160W  |    5290MiB /   8188MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
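For context, I start the model roughly like the sketch below. The model name and parameter values here are placeholders, not my exact command; I'm assuming tensor parallelism across both GPUs, since both cards show nearly identical memory usage.

```python
from vllm import LLM

# Placeholder sketch of my launch; model name and values are illustrative only.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder Qwen checkpoint
    tensor_parallel_size=2,            # shard the model across both GPUs
    gpu_memory_utilization=0.9,        # fraction of each GPU's VRAM vLLM may pre-allocate
)

print(llm.generate("Hello")[0].outputs[0].text)
```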
Why does vLLM use the same amount of VRAM on both GPUs, and why doesn't it use the full capacity of either card?