top | item 36200542

pseudopersonal | 2 years ago

With 192GB of unified memory, could this challenge Nvidia's lock on ML training? Or is CUDA still insurmountable?


bufo|2 years ago

The memory bandwidth is still a bit lower than Nvidia's best cards', and it doesn't have an equivalent of Tensor Cores. They could compete if they wanted to, but that's clearly not their goal; they build consumer products.

llm_nerd|2 years ago

The Neural Engine on all recent Apple silicon (and A-series devices) has "tensor" cores for matrix calculations (note: Apple abstracts all of this behind Core ML, so there is some conflation between the ANE and the AMX instructions/hardware). The M2 Ultra offers 31.6 trillion fp16 ops per second, for instance, which actually bests an A100.

The software support is terrible, of course, which is the biggest limitation, but Apple clearly wants to be in that realm as well.
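As a rough sense of what a figure like 31.6 trillion fp16 ops/sec means in practice, here is a back-of-envelope sketch. The matrix sizes are hypothetical, and peak throughput is a lower bound on wall time, so real workloads will be slower:

```python
# Back-of-envelope: time for one dense layer at the quoted peak of
# 31.6 trillion fp16 ops/sec (peak throughput, so a best case).

PEAK_OPS = 31.6e12  # ops/sec, figure quoted in the comment above

def matmul_ops(m: int, n: int, k: int) -> float:
    """An (m x k) @ (k x n) matmul costs ~2*m*n*k ops (multiply + add)."""
    return 2.0 * m * n * k

# Hypothetical example: a 4096x4096 weight matrix applied to 512 tokens.
ops = matmul_ops(512, 4096, 4096)
seconds = ops / PEAK_OPS  # roughly half a millisecond at peak
```

At ~1.7e10 ops, that single layer takes on the order of 0.5 ms even at peak, which is why sustained throughput and software maturity, not just the headline number, decide real training speed.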

zamadatix|2 years ago

Regarding Tensor Cores, it does have an equivalent as part of the 32-core Neural Engine. Apple considers AI/ML a consumer feature, all the way down to the iPhone hardware. At the same time, this isn't a data-center supercluster; it's still just a mid-sized workstation.

ajb117|2 years ago

Besides Nvidia cards usually being faster and having higher memory bandwidth, Ada cards also have FP8 cores. I'm not sure how well Apple's M-series chips handle low/mixed-precision tensors, but I wouldn't be surprised if Nvidia cards perform better with them.
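To illustrate why low/mixed-precision handling is delicate: IEEE fp16 has only a 10-bit mantissa, so small updates and large integers round away. Python's stdlib `struct` module can round-trip through half precision directly (a minimal sketch, independent of any particular GPU):

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE binary16 (struct format 'e')."""
    return struct.unpack('e', struct.pack('e', x))[0]

# fp16 has ~3 decimal digits of precision:
a = to_fp16(1.0 + 1e-4)  # 1.0 -- the small update is rounded away
b = to_fp16(2049.0)      # 2048.0 -- spacing between values is 2 above 2048
```

This is why mixed-precision training keeps a higher-precision copy of the weights (and often loss scaling) alongside the low-precision math, whatever the hardware.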

viscanti|2 years ago

Training is unlikely for most things, but you could probably run inference with a trained model that needs a lot of RAM.
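A quick back-of-envelope on the RAM point (hypothetical model sizes; weights only, ignoring activations and KV cache):

```python
# Can a model's weights fit in 192 GB of unified memory?
GIB = 1024 ** 3

def weight_gib(params_billions: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB for a given parameter count."""
    return params_billions * 1e9 * bytes_per_param / GIB

# A hypothetical 70B-parameter model:
fp16_70b = weight_gib(70, 2)  # ~130 GiB: fits in 192 GB, not in an 80 GB A100
fp32_70b = weight_gib(70, 4)  # ~261 GiB: does not fit
```

So for inference, the unified memory really does let a single box hold models that would otherwise need multiple data-center GPUs, at the cost of much lower compute throughput.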