chessgecko | 1 year ago
I think high-memory local inference is going to come from "AI enabled" CPUs that share your computer's main memory. Apple is doing this now, but cheaper options are on the way. As a shape it's just suboptimal for graphics, so it doesn't make sense for any of the GPU vendors to do it.
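To make the capacity point concrete, here is a minimal sketch (my own illustrative numbers, not figures from this thread) of why shared memory changes what fits locally. The 24 GB and 128 GB thresholds and the model sizes are assumptions for the example only, and KV cache/activations are ignored.

    # Rough check: does a model's weight footprint fit in discrete VRAM
    # versus a large pool of shared/unified system memory?
    # All sizes here are illustrative assumptions.

    def footprint_gb(params_billions: float, bytes_per_param: float) -> float:
        """Approximate weight footprint in GB (ignores KV cache and activations)."""
        return params_billions * bytes_per_param

    for params, bits in [(8, 4), (70, 4), (70, 16)]:
        gb = footprint_gb(params, bits / 8)
        fits_discrete = gb <= 24      # e.g. a typical high-end consumer GPU
        fits_unified = gb <= 128      # e.g. a large unified-memory machine
        print(f"{params}B @ {bits}-bit: ~{gb:.0f} GB, "
              f"24GB card: {fits_discrete}, 128GB unified: {fits_unified}")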
smcleod | 1 year ago
chessgecko | 1 year ago
And 500 GB/sec is pretty low for a GPU; it's roughly 4070-level, but the memory alone would add $500+ to the cost of the inputs, not even counting the advanced packaging (getting those bandwidths out of LPDDR needs an organic substrate).
It's not that you can't; it's just that once you start doing this, it stops being like a graphics card and becomes more like a CPU.
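To put that 500 GB/sec in context for inference, a rough bandwidth-bound estimate (the model sizes and quantizations below are illustrative assumptions, not figures from the thread): single-stream decoding reads roughly all the weights once per token, so bandwidth divided by weight footprint gives an upper bound on tokens per second.

    # Back-of-the-envelope ceiling on single-stream decode throughput:
    # tokens/sec <= memory bandwidth / weight bytes read per token.
    # Numbers are illustrative assumptions only.

    def max_tokens_per_sec(bandwidth_gb_s: float, params_billions: float,
                           bytes_per_param: float) -> float:
        weight_gb = params_billions * bytes_per_param  # model footprint in GB
        return bandwidth_gb_s / weight_gb

    print(max_tokens_per_sec(500, 70, 0.5))  # 70B @ 4-bit (~35 GB) -> ~14 tok/s
    print(max_tokens_per_sec(500, 8, 2.0))   # 8B @ fp16 (~16 GB)  -> ~31 tok/s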
treprinum | 1 year ago
chessgecko | 1 year ago
ryao | 1 year ago
https://www.tomshardware.com/pc-components/dram/samsung-outl...
I am not sure why they can already do stacking for HBM, but not GDDR and DDR. My guess is that it is cost related. I have heard that HBM reportedly costs about three times as much as DDR. Whatever they are doing to stack it now is likely much more expensive than their planned 3D fabrication node.