I feel this is bigger than the 5x series GPUs. Given the craze around AI/LLMs, this could also eat into Apple's slice of the enthusiast AI dev segment once the M4 Max/Ultra Mac minis are released. I sure wish I'd held some Nvidia stock; they seem to have been doing everything right over the last few years!
rbanffy|1 year ago
Xeon Phi failed for a number of reasons, but one failure that was avoidable was the lack of software optimised for it. Now we have Xeons and EPYCs, and MI300Cs with lots of efficient cores, but we could have been writing software tailored for those for 10 years now. Extracting performance from them would be a solved problem at this point. The same applies to Itanium: the very first thing Intel should have made sure it had was good Linux support. They could have had it before the first silicon was released. Itanium was well supported for a while, but it's long dead by now.
Similarly, Sun failed with SPARC, which also didn't have an easy onboarding path after they gave up on workstations. They did some things right: OpenSolaris ensured the OS remained relevant (it still is, even if a bit niche), and looking the other way on x86 Solaris helped people learn and train on it. Oracle cloud could, at least, offer it on cloud instances. Would be nice.
Now we see IBM doing the same - there is no reasonable entry level POWER machine that can compete in performance with a workstation-class x86. There is a small half-rack machine that can be mounted on a deskside case, and that's it. I don't know of any company that's planning to deploy new systems on AIX (much less IBMi, which is also POWER), or even for Linux on POWER, because it's just too easy to build it on other, competing platforms. You can get AIX, IBMi and even IBMz cloud instances from IBM cloud, but it's not easy (and I never found a "from-zero-to-ssh-or-5250-or-3270" tutorial for them). I wonder if it's even possible. You can get Linux on Z instances, but there doesn't seem to be a way to get Linux on POWER. At least not from them (several HPC research labs still offer those).
nimish|1 year ago
Sad to see big companies like Intel and AMD don't understand this, but then they've never come to terms with the fact that software killed the hardware star.
AtlasBarfed|1 year ago
Windows has always been a barrier to hardware feature adoption for Intel. You had to wait 2 to 3 years, sometimes longer, for Windows to get around to providing hardware support.
Any OS optimizations in Windows had to go through Microsoft. So say you added some instructions, custom silicon, or whatever to speed up enterprise databases, or high-speed networking that needed some special kernel features: there was always Microsoft in the way.
And not just foot-dragging in communication; getting the technical people aligned was a problem too.
Microsoft would look at every single change and weigh whether or not it would challenge their monopoly, whether or not it was in their business interest, and whether or not it kept you, the hardware vendor, in a subservient role.
p_ing|1 year ago
https://www.raptorcs.com/content/base/products.html
UncleOxidant|1 year ago
sheepscreek|1 year ago
This is a genius move. I'm even more baffled by the insane form factor that packs this much power inside a Mac Mini-esque body. For just $6000, two of these can run 400B+ models locally. That is absolutely bonkers. Imagine running ChatGPT on your desktop. You couldn't dream about this stuff even 1 year ago. What a time to be alive!
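A rough sketch of the memory math behind that claim (my own assumptions: 4-bit quantized weights, ignoring KV cache and runtime overhead):

    # Back-of-envelope: can 2 x 128 GB hold a 400B-parameter model?
    # Assumes 4-bit (0.5 byte) weights; KV cache and overhead not counted.
    params = 400e9
    bytes_per_param = 0.5
    weights_gb = params * bytes_per_param / 1e9
    print(f"~{weights_gb:.0f} GB of weights")  # ~200 GB: too big for one 128 GB box, fits across two

So the "two boxes for 400B+" figure only works with fairly aggressive quantization, and it leaves limited headroom for context.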
HarHarVeryFunny|1 year ago
numba888|1 year ago
stogot|1 year ago
dagmx|1 year ago
That said, enthusiasts do help drive a lot of the improvements to the tech stack so if they start using this, it’ll entrench NVIDIA even more.
Karupan|1 year ago
Surely a smaller market than gamers or datacenters.
qwertox|1 year ago
gr3ml1n|1 year ago
epolanski|1 year ago
option|1 year ago
VikingCoder|1 year ago
I mean, this is awfully close to being "Her" in a box, right?
computably|1 year ago
bloomingkales|1 year ago
Those Macs with unified memory are a threat he is immediately addressing. Jensen is a wartime CEO from the looks of it; he's not joking.
No wonder AMD is staying out of the high-end space, since NVIDIA is going head-on against Apple (and AMD is not in the business of competing with Apple).
T-A|1 year ago
The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. The memory can also be carved up into a distinct pool dedicated to the GPU only, thus delivering an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.
[...]
AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.
Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.
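For single-user token generation, throughput is usually memory-bandwidth-bound, so here's a quick sanity check of what 256 GB/s buys (a sketch under my own assumptions: weights streamed once per generated token, a 70B model at 4-bit):

    # Crude upper bound for bandwidth-bound decoding.
    # Assumes every weight is read once per token; real systems fall short of this.
    bandwidth_gb_s = 256            # quoted memory throughput
    model_gb = 70e9 * 0.5 / 1e9     # ~35 GB of 4-bit weights
    print(f"~{bandwidth_gb_s / model_gb:.1f} tokens/s ceiling")  # ~7.3 tokens/s

Which presumably also hints at why the 24GB RTX 4090 struggles in that comparison: a 70B model doesn't fit in its VRAM, so it ends up bottlenecked on transfers over PCIe rather than on GPU memory bandwidth.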
nomel|1 year ago
I think this is a race that Apple doesn't know it's part of. Apple has something that happens to work well for AI, as a side effect of having a nice GPU with lots of fast shared memory. It's not marketed for inference.
JoshTko|1 year ago
hkgjjgjfjfjfjf|1 year ago
llm_trw|1 year ago
dist-epoch|1 year ago
Tostino|1 year ago
numba888|1 year ago
They were propelled by the unexpected LLM boom. But plan 'A' was robotics, in which Nvidia invested heavily for decades. I think their time is about to come, with Tesla's humanoids at $20-30k and Chinese ones already selling for $16k.
qwertox|1 year ago
Karupan|1 year ago
trhway|1 year ago
I think it isn't about enthusiasts. To me it looks like Huang/NVDA is pushing a small revolution further, using the opening provided by the AI wave. Up until now the GPU was an add-on to the general computing core, onto which that core offloaded some computation. With AI, that offloaded computation becomes de facto the main computation, and Huang/NVDA is turning the tables by making the CPU just a small add-on to the GPU, with some general computing offloaded to that CPU.
With the CPU located that "close" and with unified memory, this would stimulate parallelization of a lot of general computing so that it runs on the GPU, very fast that way, instead of on the CPU. Take a classic of enterprise computing, SQL databases: a lot of what they do (with some work, maybe everything) can be executed on the GPU with a significant performance gain versus the CPU. Why isn't that happening today? Loading/unloading data onto the GPU eats into performance, the complexity of having only some operations offloaded to the GPU is very high in dev effort, etc. Streamlined development on a platform with unified memory will change that. That way Huang/NVDA may pull the rug out from under CPU-first platforms like AMD/INTC and own both the new AI computing and a significant share of the classic enterprise kind.
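As a toy illustration of the kind of offload being described, a minimal sketch using CuPy on made-up columns (assumes a CUDA-capable GPU and the cupy package; a real database engine is obviously far more involved):

    # Roughly: SELECT SUM(amount) FROM sales WHERE region = 3, but on the GPU.
    import cupy as cp

    n = 100_000_000
    amount = cp.random.random(n)                # hypothetical "amount" column
    region = cp.random.randint(0, 10, size=n)   # hypothetical "region" column

    total = amount[region == 3].sum()           # filter + aggregate run as GPU kernels
    print(float(total))

The point of unified memory is that the expensive part today, getting the columns onto the GPU and the results back, largely goes away.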
tatersolid|1 year ago
No, they can’t. GPU databases are niche products with severe limitations.
GPUs are fast at massively parallel math problems, but they aren't useful for all tasks.
tarsinge|1 year ago
I'm so tired of this recent obsession with the stock market. Now that retail is deeply invested, it is tainting everything, even here on a technology forum. I don't remember people mentioning Apple stock every time Steve Jobs made an announcement in past decades. Nowadays it seems everyone is invested in Nvidia and just wants the stock to go up, and every product announcement is a means to that end. I really hope we get a crash so that we can get back to a saner relationship with companies and their products.
lioeters|1 year ago
That's the best time to buy. ;)
paxys|1 year ago
a________d|1 year ago
I wonder how it would go as a productivity/tinkering/gaming rig? Could a GPU potentially be stacked in the same way an additional Digit can?
Karupan|1 year ago
AuryGlenz|1 year ago
GaryNumanVevo|1 year ago
axegon_|1 year ago
About that... it's not like there isn't a lot to be desired from the Linux drivers: I'm running a K80 and an M40 in a workstation at home, and the thought of ever having to touch the drivers, now that the system is operational, terrifies me. It is by far the biggest "don't fix it if it ain't broke" thing in my life.
sliken|1 year ago
mycall|1 year ago
technofiend|1 year ago
adolph|1 year ago
0. https://www.macstadium.com/blog/m4-mac-mini-review
1. https://www.apple.com/mac/compare/?modelList=Mac-mini-M4,Mac...
wslh|1 year ago
croes|1 year ago
Apple M chips are pretty efficient.
behringer|1 year ago
puppymaster|1 year ago
iKevinShah|1 year ago
informal007|1 year ago
csomar|1 year ago
On the other hand, with a $5000 MacBook Pro, I can easily load a 70B model and have a "full" MacBook Pro as a plus. I am not sure I fully understand the value of these cards for someone who wants to run personal AI models.
gnabgib|1 year ago
rictic|1 year ago
blurbleblurble|1 year ago
Also, I'm unfamiliar with Macs: is there really a MacBook Pro with 256GB of RAM?
maniroo|1 year ago
doctorpangloss|1 year ago
Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.
I don't think Digits will perform well either.
If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.
Karupan|1 year ago
And we know why they won't ship NVLink on prosumer GPUs anymore: they control almost the entire segment, so why give more away for free? Good for the company and investors, bad for us consumers.
YetAnotherNick|1 year ago
They are good for single-batch inference and have very good tok/sec/user. ollama works perfectly on a Mac.
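For what it's worth, a minimal way to poke at that locally over ollama's HTTP API (assumes the ollama server is running on its default port and a model, here llama3, has already been pulled):

    # Single-prompt, single-batch query against a local ollama instance.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3",
              "prompt": "Why does unified memory help local inference?",
              "stream": False},
    )
    print(resp.json()["response"])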