Karupan|1 year ago

I feel this is bigger than the 5x series GPUs. Given the craze around AI/LLMs, this can also potentially eat into Apple’s slice of the enthusiast AI dev segment once the M4 Max/Ultra Mac minis are released. I sure wish I’d held some Nvidia stock; they seem to be doing everything right in the last few years!

rbanffy|1 year ago

This is something every company should make sure they have: an onboarding path.

Xeon Phi failed for a number of reasons, but one where it didn't need to fail was the availability of software optimised for it. Now we have Xeons and EPYCs, and MI300Cs with lots of efficient cores, but we could have been writing software tailored for those for 10 years now. Extracting performance from them would be a solved problem at this point. The same applies to Itanium: the very first thing Intel should have made sure of was good Linux support. They could have had it before the first silicon was released. Itanium was well supported for a while, but it's long dead by now.

Similarly, Sun failed with SPARC, which also didn't have an easy onboarding path after they gave up on workstations. They did some things right: OpenSolaris ensured the OS remained relevant (it still is, even if a bit niche), and looking the other way on x86 Solaris helped people learn and train on it. Oracle Cloud could, at the very least, offer it on cloud instances. That would be nice.

Now we see IBM doing the same - there is no reasonable entry-level POWER machine that can compete in performance with a workstation-class x86. There is a small half-rack machine that can be mounted in a deskside case, and that's it. I don't know of any company that's planning to deploy new systems on AIX (much less IBMi, which is also POWER), or even on Linux on POWER, because it's just too easy to build on other, competing platforms. You can get AIX, IBMi and even IBMz cloud instances from IBM Cloud, but it's not easy (and I never found a "from-zero-to-ssh-or-5250-or-3270" tutorial for them). I wonder if it's even possible. You can get Linux on Z instances, but there doesn't seem to be a way to get Linux on POWER. At least not from them (several HPC research labs still offer those).

nimish|1 year ago

1000%. All these AI hardware companies will fail if they don't have this. You must have a cheap way to experiment and develop. Even if you only want to sell a $30,000 datacenter card, you still need a very low-cost way to play.

Sad to see that big companies like Intel and AMD don't understand this, but then they've never come to terms with the fact that software killed the hardware star.

AtlasBarfed|1 year ago

It really mystifies me that Intel, AMD, and other hardware companies (obviously Nvidia in this case) don't either form a consortium or each have their own in-house Linux distribution with excellent support.

Windows has always been a barrier to hardware feature adoption for Intel. You had to wait 2 to 3 years, sometimes longer, for Windows to get around to providing hardware support.

For any OS optimizations in Windows you had to go through Microsoft. So say you added some instructions, custom silicon, or whatever to speed up enterprise databases, or to provide high-speed networking that needed some special kernel features, etc. - there was always Microsoft in the way.

And it wasn't just the foot-dragging in communication, or the problem of getting the tech people aligned.

Microsoft would look at every single change and judge whether or not it would challenge their monopoly, whether or not it was in their business interest, and whether or not it kept you, the hardware maker, in a subservient role.

UncleOxidant|1 year ago

There were Phi cards, but they were pricey and power-hungry for plugging into your home PC (at the time - current GPU cards probably meet or exceed the Phi card's power consumption by now). A few years back there was a big fire sale on Phi cards - you could pick one up for like $200. But by then nobody cared.

sheepscreek|1 year ago

The developers they are referring to aren’t just enthusiasts; they also include people who were purchasing SuperMicro and Lambda PCs to develop models for their employers. Many enterprises will buy these for local development because it frees up the highly expensive enterprise-level chips for commercial use.

This is a genius move. I am even more baffled by the insane form factor that can pack this much power inside a Mac Mini-esque body. For just $6000, two of these can run 400B+ models locally. That is absolutely bonkers. Imagine running ChatGPT on your desktop. You couldn’t dream about this stuff even a year ago. What a time to be alive!

HarHarVeryFunny|1 year ago

The 1 petaFLOP and 200B-parameter model capacity figures are for FP4 (4-bit floating point), which means inference, not training/development. It'd still be a decent personal development machine, but not for models of that size.
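
For scale, a rough back-of-the-envelope weight-memory calculation (a sketch that counts weights only; KV cache, activations, and runtime overhead are ignored, and the parameter counts are just examples):

    # Rough weight-memory math behind the FP4-vs-training point (weights only;
    # KV cache, activations, and optimizer state are deliberately ignored).
    def weight_gb(params_billion: float, bits_per_param: int) -> float:
        # GB needed just to store the weights at the given precision
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for params in (70, 200, 400):
        fp4, fp16 = weight_gb(params, 4), weight_gb(params, 16)
        print(f"{params}B params: ~{fp4:.0f} GB at FP4 (inference), "
              f"~{fp16:.0f} GB at FP16 (before optimizer state for training)")

A 200B-parameter model squeezes into 128GB only at 4-bit precision; training-grade precisions blow well past it, which is why this reads as an inference box.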

numba888|1 year ago

This looks like a bigger brother of the Orin AGX, which has 64GB of RAM and runs smaller LLMs. The question will be power and performance vs the 5090. We know the price is 1.5x.

stogot|1 year ago

How does it run 400B models across two? I didn’t see that in the article

dagmx|1 year ago

I think the enthusiast side of things is a negligible part of the market.

That said, enthusiasts do help drive a lot of the improvements to the tech stack so if they start using this, it’ll entrench NVIDIA even more.

Karupan|1 year ago

I’m not so sure it’s negligible. My anecdotal experience is that since Apple Silicon chips were found to be “ok” enough to run inference with MLX, more non-technical people in my circle have asked me how they can run LLMs on their macs.

A smaller market than gamers or datacenters, for sure.

qwertox|1 year ago

You could have said the same about gamers buying expensive hardware in the 00's. It's what made Nvidia big.

gr3ml1n|1 year ago

AMD thought the enthusiast side of things was a negligible side of the market.

epolanski|1 year ago

If this is going to be widely used by ML engineers, in biopharma, etc., and they land $1000 margins on half a million sales, that's half a billion in revenue, with potential to grow.

option|1 year ago

Today’s enthusiast, grad student, or hacker is tomorrow’s startup founder, CEO, CTO, or 10x contributor at a large tech company.

VikingCoder|1 year ago

If I were NVidia, I would be throwing everything I could at making entertainment experiences that need one of these to run...

I mean, this is awfully close to being "Her" in a box, right?

computably|1 year ago

Yeah, it's more about preempting competitors from attracting any ecosystem development than the revenue itself.

bloomingkales|1 year ago

Jensen did say in a recent interview, paraphrasing, that “they are trying to kill my company”.

Those Macs with unified memory are a threat he is immediately addressing. Jensen is a wartime CEO from the looks of it; he’s not joking.

No wonder AMD is staying out of the high-end space, since NVIDIA is going head on with Apple (and AMD is not in the business of competing with Apple).

T-A|1 year ago

From https://www.tomshardware.com/pc-components/cpus/amds-beastly...

The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. The memory can also be carved up to a distinct pool dedicated to the GPU only, thus delivering an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.

[...]

AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.

Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.

nomel|1 year ago

> since NVIDIA is going head on with Apple

I think this is a race that Apple doesn't know it's part of. Apple has something that happens to work well for AI, as a side effect of having a nice GPU with lots of fast shared memory. It's not marketed for inference.

JoshTko|1 year ago

Which interview was this?

hkgjjgjfjfjfjf|1 year ago

You missed the Ryzen AI Max+ Pro 395 product announcement.

llm_trw|1 year ago

From the people I talk to, the enthusiast market is saturated with Nvidia 4090s/3090s, because people also want to do their fine-tunes (and porn) in their off time. The Venn diagram of users who post about diffusion models and LLMs running at home is pretty much a circle.

dist-epoch|1 year ago

Not your weights, not your waifu

Tostino|1 year ago

Yeah, I really don't think the overlap is as large as you imagine. At least in /r/localllama and the Discord servers I frequent, the vast majority of users are primarily interested in one or the other and may just dabble with other things. Obviously this is just my observation... I could be totally misreading things.

numba888|1 year ago

> I sure wish I’d held some Nvidia stock; they seem to be doing everything right in the last few years!

They were propelled by the unexpected LLM boom. But plan 'A' was robotics, in which Nvidia has invested a lot for decades. I think their time is about to come, with Tesla's humanoids at $20-30k and the Chinese already selling them for $16k.

qwertox|1 year ago

This is somewhat similar to what GeForce was to gamers back in the day, but for AI enthusiasts. Sure, the price is much higher, but at least it's a completely integrated solution.

Karupan|1 year ago

Yep that's what I'm thinking as well. I was going to buy a 5090 mainly to play around with LLM code generation, but this is a worthy option for roughly the same price as building a new PC with a 5090.

trhway|1 year ago

> enthusiast AI dev segment

I think it isn't about enthusiasts. To me it looks like Huang/NVDA is pushing a small revolution further, using the opening provided by the AI wave: up until now the GPU was an add-on to the general computing core, which offloaded some computing onto it. With AI, that offloaded computing becomes de facto the main computing, and Huang/NVDA is turning the tables by making the CPU just a small add-on to the GPU, with some general computing offloaded to that CPU.

With the CPU located that "close" and with unified memory, that would stimulate development of parallelization for a lot of general computing, so that it gets executed on the GPU (very fast that way) instead of on the CPU. Take a classic of enterprise computing, databases, the SQL ones: a lot, if not (with some work) everything, in these databases can be executed on GPU with a significant performance gain vs. CPU. Why isn't it happening today? Loading/unloading data onto the GPU eats into performance, the complexity of having only some operations offloaded to the GPU is very high in dev effort, etc. Streamlined development on a platform with unified memory will change that. That way Huang/NVDA may pull the rug out from under CPU-first platforms like AMD/INTC and own both new AI computing and a significant share of the classic enterprise kind.
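
As a rough illustration of the load/unload point, here is a minimal sketch (assuming an NVIDIA GPU with CUDA and the cupy package installed; the column size, and the sum standing in for a scan-style query, are arbitrary choices) that times the host-to-device copy against the on-GPU aggregate:

    # Times the PCIe host->device copy against a scan-style aggregate on the GPU.
    import time
    import numpy as np
    import cupy as cp

    n = 200_000_000                 # ~1.6 GB float64 "column" of data
    host_col = np.random.rand(n)

    t0 = time.perf_counter()
    dev_col = cp.asarray(host_col)  # copy the column over PCIe to the GPU
    cp.cuda.Stream.null.synchronize()
    t1 = time.perf_counter()

    total = dev_col.sum()           # the actual "query" work on the GPU
    cp.cuda.Stream.null.synchronize()
    t2 = time.perf_counter()

    print(f"copy to GPU: {t1 - t0:.3f} s, aggregate on GPU: {t2 - t1:.3f} s")

On a typical discrete card the copy dwarfs the aggregate, and that transfer overhead is exactly what a unified-memory box would remove.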

tatersolid|1 year ago

> these databases can be executed on GPU with a significant performance gain vs. CPU

No, they can’t. GPU databases are niche products with severe limitations.

GPUs are fast at massively parallel math problems; they aren't useful for all tasks.

tarsinge|1 year ago

> I sure wish I’d held some Nvidia stock

I’m so tired of this recent obsession with the stock market. Now that retail is deeply invested, it is tainting everything, even here on a technology forum. I don’t remember people mentioning Apple stock every time Steve Jobs made an announcement in past decades. Nowadays it seems everyone is invested in Nvidia and just wants the stock to go up, and every product announcement is a means to that end. I really hope we get a crash so that we can get back to a saner relationship with companies and their products.

lioeters|1 year ago

> hope we get a crash

That's the best time to buy. ;)

paxys|1 year ago

“Bigger” in what sense? For AI? Sure, because this is an AI product. The 5x series are gaming cards.

a________d|1 year ago

Not expecting this to compete with the 5x series in terms of gaming, but it's interesting to note that the increase in gaming performance Jensen was speaking about with Blackwell was largely related to frames inferenced by the tensor cores.

I wonder how it would go as a productivity/tinkering/gaming rig? Could a GPU potentially be stacked in the same way an additional Digit can?

Karupan|1 year ago

Bigger in the sense of the announcements.

AuryGlenz|1 year ago

Eh. Gaming cards, but also significantly faster. If the model fits in the VRAM the 5090 is a much better buy.

GaryNumanVevo|1 year ago

I bet $100k on NVIDIA stocks ~7 years ago, just recently closed out a bunch of them

axegon_|1 year ago

> they seem to be doing everything right in the last few years

About that... the Linux drivers leave a lot to be desired: I'm running a K80 and an M40 in a workstation at home, and the thought of ever having to touch the drivers, now that the system is operational, terrifies me. It is by far the biggest "don't fix it if it ain't broke" thing in my life.

sliken|1 year ago

Use a filesystem that snapshots AND do a complete backup.

mycall|1 year ago

Buy a second system which you can touch?

wslh|1 year ago

The Nvidia price (USD 3k) is closer to a top Mac mini, but I trust Apple more than Nvidia for end-to-end support from hardware to apps. Not an Apple fanboy, but a user/dev, and I don't think we realize what Apple really achieved, industrially speaking. The M1 was launched in late 2020.

croes|1 year ago

Did they say anything about power consumption?

Apple M chips are pretty efficient.

behringer|1 year ago

Not only that, but it should help free up the GPUs for the gamers.

puppymaster|1 year ago

It eats into all of NVDA's consumer-facing clients, no? I can see why OpenAI and others are looking for alternative hardware solutions to train their next models.

iKevinShah|1 year ago

I can confirm this is the case (for me).

informal007|1 year ago

I would like to have a Mac as my personal computer and Digits as a server to run LLMs.

csomar|1 year ago

Am I the only one disappointed by these? They cost roughly half the price of a MacBook Pro and offer, hmm, half the capacity in RAM. Sure, speed matters in AI, but what do I do with speed when I can't load a 70B model?

On the other hand, with a $5000 MacBook Pro, I can easily load a 70B model and have a "full" MacBook Pro as a plus. I am not sure I fully understand the value of these cards for someone who wants to run personal AI models.

gnabgib|1 year ago

Are you, perhaps, commenting on the wrong thread? Project Digits is a $3k 128GB computer... the best your $5K MBP can have for RAM is... 128GB.

rictic|1 year ago

Hm? They have 128GB of RAM. Macbook Pros cap out at 128GB as well. Will be interesting to see how a Project Digits machine performs in terms of inference speed.

blurbleblurble|1 year ago

Then buy two and stack them!

Also, I'm unfamiliar with Macs; is there really a MacBook Pro with 256GB of RAM?

maniroo|1 year ago

Bro, we can connect two Project Digits as well. I was only looking at the M4 MacBook because of the 128GB unified memory. Now this beast can cook better LLMs at just $3K, with a 4TB SSD too. An M4 Max MacBook (128GB unified RAM and 4TB storage) is $5999. So, no more Apple for me. I will just get the Digits, and can build a workstation as well.

doctorpangloss|1 year ago

What slice?

Also, macOS devices are not very good inference solutions. They are just believed to be by diehards.

I don't think Digits will perform well either.

If NVIDIA wanted you to have good performance on a budget, it would ship NVLink on the 5090.

Karupan|1 year ago

They are perfectly fine for certain people. I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32GB at ~16 tok/sec. At least in my circle, people are budget-conscious and would prefer using existing devices rather than pay for subscriptions where possible.

And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment and why give more away for free? Good for the company and investors, bad for us consumers.

YetAnotherNick|1 year ago

> Also, macOS devices are not very good inference solutions

They are good for single-batch inference and have very good tok/sec/user. ollama works perfectly on a Mac.
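
A minimal sketch of measuring that per-user tok/sec figure with the ollama Python client (assumes pip install ollama, a running ollama server, and a model that has already been pulled; the model name below is just an example):

    # Measures generation throughput for a single request against a local ollama server.
    import ollama

    resp = ollama.generate(
        model="qwen2.5-coder:14b",  # example model; any pulled model works
        prompt="Write a Python function that reverses a string.",
    )

    tokens = resp["eval_count"]            # tokens generated
    seconds = resp["eval_duration"] / 1e9  # reported in nanoseconds
    print(f"{tokens} tokens in {seconds:.1f} s -> {tokens / seconds:.1f} tok/sec")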