top | item 42309966

chessgecko | 1 year ago

GDDR isn't like the RAM that connects to a CPU; it's much more difficult and expensive to add more. You can get up to 48GB with some expensive stacked GDDR, but if you wanted to add more stacks you'd need to solve some serious signal-timing headaches that most users wouldn't benefit from.
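A rough sketch of where a ceiling like 48GB comes from. The specific numbers here are illustrative assumptions (typical GDDR6/6X parts: 32-bit devices, 2GB max density, a 384-bit bus), not figures from the comment:

```python
# Back-of-envelope GDDR capacity, assuming current GDDR6/6X parts:
# each device has a 32-bit interface and tops out at 2GB (16Gb).
BUS_WIDTH_BITS = 384        # assumed high-end card memory bus
DEVICE_WIDTH_BITS = 32      # one GDDR device per 32-bit channel
DEVICE_CAPACITY_GB = 2      # 16Gb die, the common max density

channels = BUS_WIDTH_BITS // DEVICE_WIDTH_BITS    # 12 devices fit the bus
single_sided = channels * DEVICE_CAPACITY_GB      # 24GB in the normal layout
clamshell = single_sided * 2                      # 48GB with two devices sharing a channel

print(single_sided, clamshell)  # 24 48
```

Going past that means more devices per channel, which is exactly where the signal-timing problems show up.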

I think the high-memory local inference stuff is going to come from "AI enabled" CPUs that share the memory in your computer. Apple is doing this now, and cheaper options are on the way. As a shape it's just suboptimal for graphics, so it doesn't make sense for any of the GPU vendors to do it.

smcleod|1 year ago

As someone else said - I don't think you have to use GDDR; surely there are other options. Apple does a great job of it on their APUs with up to 192GB, and even an old AMD Threadripper chip can do quite well with its DDR4/5 bandwidth.

chessgecko|1 year ago

For AI inference you definitely have other options, but for low-end graphics? The LPDDR that Apple (and Nvidia in Grace) use would be super expensive to get comparable bandwidth (think $3+/GB, and to get 500GB/sec you need at least 128GB).

And that 500GB/sec is pretty low for a GPU; it's like a 4070, but the memory alone would add $500+ to the cost of the inputs, not even counting the advanced packaging (getting those bandwidths out of LPDDR needs an organic substrate).
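The arithmetic behind those figures, as a sketch. The LPDDR5X data rate and channel width are my assumptions (8533 MT/s per pin, 16-bit channels); the $3+/GB price is the comment's own rough floor:

```python
import math

# Back-of-envelope LPDDR bandwidth and cost. All inputs are assumptions:
# LPDDR5X at 8533 MT/s per pin, 16-bit channels, ~$3/GB as a price floor.
MT_PER_SEC = 8533e6          # transfers per second per pin
CHANNEL_BITS = 16            # one LPDDR5X channel
TARGET_GBPS = 500.0          # the bandwidth target from the comment
PRICE_PER_GB = 3.0           # USD, assumed floor
CAPACITY_GB = 128            # capacity needed to get enough channels

per_channel_gbps = MT_PER_SEC * CHANNEL_BITS / 8 / 1e9   # ~17 GB/s per channel
channels = math.ceil(TARGET_GBPS / per_channel_gbps)     # ~30 channels needed
memory_cost = PRICE_PER_GB * CAPACITY_GB                 # $384 before packaging

print(round(per_channel_gbps, 1), channels, memory_cost)  # 17.1 30 384.0
```

So even at the $3/GB floor the DRAM alone lands near $400, and the wide multi-channel routing is what drives the packaging cost on top of that.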

It's not that you can't; it's just that when you start doing this it stops being like a graphics card and becomes like a CPU.

treprinum|1 year ago

They can use LPDDR5X; it would still massively accelerate inference of large local LLMs that need more than 48GB of RAM. Any tensor swapping between CPU RAM and GPU RAM kills performance.
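A quick sketch of why the swapping hurts so much. The bandwidth figures are assumptions (PCIe 4.0 x16 at roughly 25GB/s effective, GDDR6X at roughly 1000GB/s), and the 1GB "layer" is a hypothetical block of weights that doesn't fit in VRAM:

```python
# Why tensor swapping kills performance: moving weights over PCIe is
# more than an order of magnitude slower than reading them from VRAM.
# Assumed numbers: PCIe 4.0 x16 ~25 GB/s effective, GDDR6X ~1000 GB/s.
PCIE_GBPS = 25.0
VRAM_GBPS = 1000.0
LAYER_GB = 1.0   # hypothetical chunk of weights swapped in per step

pcie_ms = LAYER_GB / PCIE_GBPS * 1000   # 40ms to swap the weights in
vram_ms = LAYER_GB / VRAM_GBPS * 1000   # 1ms to read them in place

print(pcie_ms, vram_ms)  # 40.0 1.0
```

A 40x penalty on every swapped chunk is why keeping the whole model resident in one big pool of memory matters more than raw compute.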

chessgecko|1 year ago

I don't think we really disagree; I just think that this shape isn't really a GPU, it's a CPU, because it isn't very good for graphics at that point.

ryao|1 year ago

It is not stacked; it is multi-rank. Stacking means putting multiple layers on the same chip. They are already doing it for HBM, and they will likely do it for other forms of DRAM in the future. Samsung reportedly will begin doing so in the 2030s:

https://www.tomshardware.com/pc-components/dram/samsung-outl...

I am not sure why they can already do stacking for HBM, but not GDDR and DDR. My guess is that it is cost related; I have heard that HBM reportedly costs three times more than DDR. Whatever they are doing to stack it now is likely much more expensive than their planned 3D fabrication node.