vlovich123|11 months ago

You are deeply misunderstanding what the KV cache referred to here is. It's not for storing data. This is the KV cache that is part of the model's inference runtime, used to reduce the quadratic compute complexity of self-attention to linear. It is not stored on SSD; it lives in VRAM (or in system RAM if you're not using a GPU).

boroboro4|11 months ago

They do, in fact, mention the inference KV cache as a use case in the README. The most advanced KV caching setups use a hierarchy of GPU RAM / regular RAM / SSD; it seems they were able to use their storage abstraction for the last tier.

magicalhippo|11 months ago

KVCache is a technique used to optimize the LLM inference process. It avoids redundant computations by caching the key and value vectors of previous tokens in the decoder layers. The top figure demonstrates the read throughput of all KVCache clients (1×400Gbps NIC/node), highlighting both peak and average values, with peak throughput reaching up to 40 GiB/s.
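The caching the thread describes can be sketched in a few lines. This is a minimal, hypothetical single-head decode loop (made-up dimension and random weights, no batching or multiple layers): each step projects only the newest token into a key and a value and appends them to the cache, instead of reprojecting every past token, which is what turns the per-step projection cost from quadratic over the whole sequence into linear.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # model/head dimension (hypothetical)

# Random projection weights stand in for trained parameters.
W_q = rng.standard_normal((d, d)) / np.sqrt(d)
W_k = rng.standard_normal((d, d)) / np.sqrt(d)
W_v = rng.standard_normal((d, d)) / np.sqrt(d)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x):
    """Attend over all previous tokens using cached K/V.

    Without the cache we would recompute K and V for every past
    token at every step (O(n^2) projections over the sequence);
    with it, each step projects only the new token (O(n) total).
    """
    q = x @ W_q
    k_cache.append(x @ W_k)  # project the new token once, reuse forever
    v_cache.append(x @ W_v)
    K = np.stack(k_cache)    # (n, d) keys seen so far
    V = np.stack(v_cache)    # (n, d) values seen so far
    scores = softmax(K @ q / np.sqrt(d))
    return scores @ V        # attention output for this position

for _ in range(5):           # pretend to generate 5 tokens
    out = decode_step(rng.standard_normal(d))

print(len(k_cache))  # one cached key vector per generated token
```

The cache here is just two Python lists in memory, which is the point of the first comment: it normally lives in VRAM next to the model. The tiered setups the second comment mentions spill these K/V tensors from GPU RAM to host RAM and then to SSD when sequences or batch sizes outgrow device memory.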