I would like to test KVCache writes to SSD storage or shared KVCache storage and benchmark them

Could you please shed some light on how to do this, step by step? :slight_smile:

Don’t try this; it is really slow (even disregarding the life span of the SSD).
You can try MoE LLMs this way, with swapping to SSD (not using the SSD for the whole of processing).

Do you mean that SSD endurance is not enough, or that performance is not enough? A Gen5 TLC SSD can reach 14 GB/s read bandwidth and 9.5 GB/s write bandwidth, which is starting to get close to DRAM bandwidth. I saw there is the LMCache solution. Is it promising to leverage SSDs to serve the KVCache?
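For anyone who wants to check what their own drive delivers instead of relying on spec-sheet numbers, here is a minimal sketch of a write/read throughput test. It is only an assumption-laden stand-in: the function name `bench_kv_blob_io`, the chunk size, and the file names are made up for illustration, and random bytes stand in for a serialized KV cache chunk. It does not use LMCache or any real KV store API.

```python
import os
import tempfile
import time


def bench_kv_blob_io(directory, blob_bytes=16 * 1024 * 1024, n_blobs=4):
    """Write then read back n_blobs files; return (write_GBps, read_GBps).

    Hypothetical helper: random bytes stand in for a serialized KV cache chunk.
    """
    payload = os.urandom(blob_bytes)
    paths = [os.path.join(directory, f"kv_chunk_{i}.bin") for i in range(n_blobs)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push data to the device, not just the page cache
    write_s = time.perf_counter() - start

    # Caveat: these reads may be served from the OS page cache, so the read
    # figure can be optimistic unless the cache is dropped or bypassed first.
    start = time.perf_counter()
    total_read = 0
    for path in paths:
        with open(path, "rb") as f:
            total_read += len(f.read())
    read_s = time.perf_counter() - start

    for path in paths:
        os.remove(path)

    total = blob_bytes * n_blobs
    assert total_read == total
    return total / write_s / 1e9, total / read_s / 1e9


if __name__ == "__main__":
    # Replace the temporary directory with a directory on the SSD under test.
    with tempfile.TemporaryDirectory() as d:
        w, r = bench_kv_blob_io(d)
        print(f"write: {w:.2f} GB/s, read: {r:.2f} GB/s")
```

A single-threaded test like this will usually land well below the drive's rated bandwidth, so treat it as a sanity check; a dedicated I/O benchmark tool with deeper queue depths gives a fairer picture.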

The speed you quoted is far from enough :sweat_smile:.
RAM bandwidth is 200 GiB/s+ on a fully populated older server.
And RAM transfer between NUMA nodes is still the bottleneck compared with any GPU (or similar) platform.
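To put the two sets of numbers from this thread side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above (200 GiB/s RAM, 14 GB/s SSD read, 9.5 GB/s SSD write); the variable names are just for illustration.

```python
GIB = 2**30  # binary gibibyte
GB = 10**9   # decimal gigabyte, as used on SSD spec sheets

ram_bw = 200 * GIB        # bytes/s, the "200 GiB/s+" server RAM figure
ssd_read_bw = 14 * GB     # bytes/s, quoted Gen5 SSD read bandwidth
ssd_write_bw = 9.5 * GB   # bytes/s, quoted Gen5 SSD write bandwidth

read_gap = ram_bw / ssd_read_bw
write_gap = ram_bw / ssd_write_bw

print(f"RAM vs SSD read:  {read_gap:.1f}x")   # 15.3x
print(f"RAM vs SSD write: {write_gap:.1f}x")  # 22.6x
```

So even at the quoted peak rates, a single SSD is roughly 15x slower than that RAM figure for reads and over 20x slower for writes.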

I have seen a lot of PD (prefill/decode disaggregation) solutions, as well as LMCache and Dynamo, starting to use local SSDs and a shared KV store to avoid recomputing prefill on the GPU. Could you please share your insight on this?
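Whether loading a KV cache from SSD beats recomputing prefill depends on how large the cache is and how fast it can be read. Here is a rough sketch of that estimate. The KV size formula (keys plus values, per layer, per KV head) is standard, but the model shape below (32 layers, 8 KV heads, head dimension 128, fp16) is purely an illustrative assumption, as are the function names.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache bytes for one token: keys and values for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def load_seconds(n_tokens, bytes_per_token, bandwidth_bytes_per_s):
    """Ideal time to stream a KV cache at a given sustained bandwidth."""
    return n_tokens * bytes_per_token / bandwidth_bytes_per_s


# Hypothetical model shape, chosen only for illustration.
per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
n_tokens = 8192

ssd_time = load_seconds(n_tokens, per_token, 14e9)       # quoted SSD read rate
ram_time = load_seconds(n_tokens, per_token, 200 * 2**30)  # quoted RAM rate

print(f"KV per token: {per_token // 1024} KiB")
print(f"{n_tokens} tokens from SSD: {ssd_time * 1000:.1f} ms")
print(f"{n_tokens} tokens from RAM: {ram_time * 1000:.1f} ms")
```

The number to compare against is the measured prefill time for the same prompt on your own GPU, which this sketch does not try to estimate.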


You need to chop out the experts for MoE, or chop the model by layer. You need to debug your runtime and monitor the model's query behavior to see where and how much you can chop for offloading. :hugs:
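One possible way to act on the "monitor and see what you can chop" advice for MoE is to log which experts the router picks and keep only the most used ones resident. This is a toy sketch under loud assumptions: `routing_trace` is a hypothetical list of expert ids collected from your own runtime, and `pick_offload_candidates` is an invented helper, not part of any inference framework.

```python
from collections import Counter


def pick_offload_candidates(routing_trace, n_experts, keep_resident):
    """Split experts into (resident, offload) lists by observed usage.

    routing_trace: hypothetical list of expert ids, one per routed token.
    """
    counts = Counter(routing_trace)
    # Rank every expert, including never-routed ones, hottest first;
    # ties are broken by expert id so the result is deterministic.
    ranked = sorted(range(n_experts), key=lambda e: (-counts[e], e))
    return ranked[:keep_resident], ranked[keep_resident:]


# Made-up trace for illustration only.
trace = [0, 3, 3, 1, 3, 0, 2, 3, 0, 0]
resident, offload = pick_offload_candidates(trace, n_experts=6, keep_resident=2)
print("keep in fast memory:", resident)  # [0, 3]
print("offload candidates:", offload)    # [1, 2, 4, 5]
```

Real routing is per layer and shifts with the workload, so a trace would need to be collected per layer and over representative prompts before trusting any such split.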
