@RunLLM You said that “vLLM requires that all experts are loaded at initialization, just like any other weight”. If that is the case, what data is actually exchanged between CPU and GPU when the `cpu_offload_gb` parameter is enabled?