01100011 | 8 days ago
I wonder... what if the M.2 storage was actually DRAM? You probably don't need persistence for spilling a model off the GPU. How would it fare vs. just adding more host memory? The M.2 RAM would be less flexible, but it would keep the system RAM free for the CPU.
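To make the comparison concrete, here's a rough sketch of the two spill targets. Everything in it is illustrative: "M.2 as DRAM" has no standard API today, so a memory-mapped file on a hypothetical /mnt/nvme mount stands in for it.

    # Illustrative only: spill GPU weights either to host RAM or to a file
    # on an M.2 device. "/mnt/nvme" is a hypothetical mount point.
    import numpy as np
    import torch

    weights = torch.randn(1024, 1024, device="cuda")

    # Spill target 1: pinned host RAM. Fast, async-capable copies back to
    # the GPU, but it competes with the CPU for system memory.
    host_copy = torch.empty(weights.shape, dtype=weights.dtype,
                            device="cpu", pin_memory=True)
    host_copy.copy_(weights)

    # Spill target 2: a memory-mapped file on the M.2 device. No
    # persistence is actually needed; we only want the capacity.
    spill = np.memmap("/mnt/nvme/spill.bin", dtype=np.float32,
                      mode="w+", shape=(1024, 1024))
    spill[:] = host_copy.numpy()  # staged through host RAM here; GPUDirect
                                  # Storage could skip that hop

    # Reload from the spill target when the layer is needed again.
    weights_back = torch.from_numpy(np.asarray(spill)).cuda()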
lmeyerov | 7 days ago
I gave a talk a few years ago at the Dask Summit (conf?) on making the stars align with dask-cudf here. We were helping a customer accelerate log analytics by proving out our stack for nodes that looked roughly like: parallel SSD storage arrays (30 x 3 GB/s?) -> GPUDirect Storage -> 4 x 30 GB/s PCIe (?) -> 8 x A100 GPUs, something like that. It'd be cool to see the same thing now in the LLM world, such as a multi-GPU MoE, or even a single-GPU one for that matter!
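For reference, a minimal dask-cudf sketch of that shape of pipeline as it looks today. The cluster size, path, and column name are all made up, and whether the parquet reads actually go disk -> GPU over GPUDirect Storage depends on the cuDF/KvikIO build and the nvidia-fs driver being in place.

    # Hypothetical 8-GPU node scanning parquet logs, one Dask worker per GPU.
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    cluster = LocalCUDACluster(n_workers=8)  # e.g. one worker per A100
    client = Client(cluster)

    # cuDF's parquet reader can use GPUDirect Storage (via KvikIO) when the
    # nvidia-fs/cuFile stack is installed, so bytes flow SSD -> PCIe -> GPU
    # without bouncing through host RAM.
    ddf = dask_cudf.read_parquet("/mnt/raid/logs/*.parquet")  # hypothetical path
    print(ddf.groupby("status_code").size().compute())  # "status_code" assumed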
bhewes | 7 days ago
https://www.servethehome.com/hyper-scalers-are-using-cxl-to-...