I would like to test KVCache writes to SSD storage or shared KVCache storage and benchmark this

gaowayne · March 30, 2025, 12:46pm

I would like to test writing the KVCache to SSD storage or to shared KVCache storage, and benchmark it.
Could you please shed some light on how to do this, step by step?

DystopianJunkyardKid · March 30, 2025, 2:28pm

Don't try this; it is really slow (even disregarding the SSD's life span).
You can try MoE LLMs this way, swapping to SSD (rather than running the whole workload from SSD).

gaowayne · March 31, 2025, 4:51pm

Do you mean SSD endurance is not enough, or performance is not enough? A Gen5 TLC SSD can reach 14 GB/s read bandwidth and 9.5 GB/s write bandwidth, which is starting to get close to DRAM bandwidth. I saw there is the LMCache solution. Is it promising to leverage SSD to serve the KVCache?

DystopianJunkyardKid · April 1, 2025, 8:17am

The speed you quoted is far from enough.
RAM bandwidth is 200 GiB/s+ on a fully populated older server.
And RAM transfer (including inter-NUMA-node traffic) is still the bottleneck compared with any GPU (or similar) platform.
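
To put rough numbers on that gap, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128, FP16, 32K tokens) is an assumption picked for illustration and does not come from this thread; the bandwidth figures are the ones quoted above (14 GB/s read, 9.5 GB/s write, 200 GiB/s RAM).

```python
GB = 10**9    # decimal gigabyte, as used in SSD spec sheets
GiB = 2**30   # binary gibibyte, as used for the RAM figure


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, dtype_bytes=2):
    """Size of the KV cache for one sequence, in bytes."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * dtype_bytes


def transfer_seconds(num_bytes, bytes_per_second):
    """Ideal transfer time at a sustained bandwidth (ignores latency and overhead)."""
    return num_bytes / bytes_per_second


# Hypothetical model shape, chosen only for illustration.
size_bytes = kv_cache_bytes(
    num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=32768
)

# Bandwidth figures as quoted in the thread.
bandwidths = {
    "SSD read (14 GB/s)": 14 * GB,
    "SSD write (9.5 GB/s)": 9.5 * GB,
    "RAM (200 GiB/s)": 200 * GiB,
}

print(f"KV cache size: {size_bytes / GiB:.1f} GiB")
for name, bw in bandwidths.items():
    print(f"{name}: {transfer_seconds(size_bytes, bw) * 1000:.1f} ms")
```

With these assumed numbers the RAM figure is about 15x the SSD read figure. Whether that gap matters depends on how the load time compares with recomputing the prefill on the GPU, which this sketch does not model.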

gaowayne · April 2, 2025, 6:36pm

I see a lot of PD (prefill/decode disaggregation) solutions, as well as LMCache and Dynamo, starting to use local SSD and a shared KV store for this, to avoid recomputing the prefill on the GPU. Could you please share your insight on this?

DystopianJunkyardKid · April 5, 2025, 11:10am

You need chopped experts for MoE, or you chop the model by layer. You need to debug your runtime and monitor the model's query behavior to see where and how much you can chop for offloading.
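
As a starting point for the benchmarking part of the original question, here is a minimal sketch that times sequential writes and reads of KV-block-sized chunks on a given directory. It uses only the Python standard library and does not touch any LMCache, Dynamo, or vLLM API; the function name `benchmark_file_io` and the chunk sizes are made up for illustration. The read pass will usually be served from the OS page cache, so treat the read number as an upper bound and use a dedicated tool such as `fio` for real device measurements.

```python
import os
import tempfile
import time


def benchmark_file_io(directory, total_bytes=64 * 2**20, chunk_bytes=4 * 2**20):
    """Write, then read back, total_bytes in chunk_bytes pieces under `directory`.

    Returns a dict with the byte count and the measured write/read throughput.
    """
    chunk = os.urandom(chunk_bytes)  # incompressible stand-in for a KV block
    num_chunks = total_bytes // chunk_bytes
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Write pass: include fsync so the time covers flushing to the device.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(num_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        write_s = time.perf_counter() - start

        # Read pass: likely hits the page cache, not the SSD (see note above).
        read_bytes = 0
        start = time.perf_counter()
        with open(path, "rb") as f:
            while True:
                data = f.read(chunk_bytes)
                if not data:
                    break
                read_bytes += len(data)
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)

    written = num_chunks * chunk_bytes
    return {
        "bytes": written,
        "read_bytes": read_bytes,
        "write_Bps": written / write_s,
        "read_Bps": read_bytes / read_s,
    }


if __name__ == "__main__":
    # Point this at a directory on the SSD under test; the temp dir is a placeholder.
    result = benchmark_file_io(tempfile.gettempdir())
    print(f"write: {result['write_Bps'] / 1e9:.2f} GB/s")
    print(f"read:  {result['read_Bps'] / 1e9:.2f} GB/s")
```

To approximate a real KV cache workload, set `chunk_bytes` to the size of one KV block in your setup and make `total_bytes` larger than RAM, or drop the page cache between the two passes.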
