I believe “world models” are the future of the field, so I really need better performance in areas like IVPs, FFTs, special functions (e.g., harmonics), and dynamic programming. The H100 advances (e.g., the DPX instructions) are terrific, but they feel like a starting point. Hell, improved geometric operations (e.g., triangulation and intersection) would be killer too, and surely that expertise exists at NVIDIA! The H100, especially at its price, feels terrible when you’re training a neural network bottlenecked on an operation that flies on a consumer CPU, when you know there are GPU optimizations that have been left on the floor.
I suspect these can be patched in as well: most of these functions have CUDA implementations already, implying they can run on the hardware even without dedicated instructions.
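A rough illustration of the “it already runs in CUDA” point: libraries like CuPy expose a NumPy-compatible FFT backed by cuFFT, so the same code can target either device. This is only a sketch, assuming CuPy is installed for the GPU path; the fallback and the correctness check use plain NumPy, and the damped-oscillator signal is made up for illustration:

```python
import numpy as np

try:
    import cupy as xp  # GPU-backed drop-in for NumPy, if installed
except ImportError:
    xp = np            # CPU fallback: same code, same API

# A damped-oscillator signal, the kind of special-function shape
# discussed in this thread.
t = xp.linspace(0.0, 10.0, 1024)
signal = xp.exp(-0.3 * t) * xp.sin(2.0 * xp.pi * 1.5 * t)

# On the CuPy path this dispatches to cuFFT; on the NumPy path, pocketfft.
spectrum = xp.fft.fft(signal)

# Round-trip sanity check: the inverse FFT recovers the signal.
recovered = xp.fft.ifft(spectrum).real
assert xp.allclose(recovered, signal, atol=1e-10)
```

The point isn't that this particular transform is slow today; it's that the CPU/GPU boundary for these functions is a library dispatch, not a hardware limitation.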
I imagine that Nvidia is trying to build a more sustainable moat. If the only things they have that their competitors don’t are a nice development framework, nice libraries, and nice drivers, it’s not that hard for a customer to get their software working on a competing hardware platform and cut out a bunch of Nvidia’s enormous markup. But if Nvidia also has strong buy-in with datacenter operators and an entire platform that magically runs people’s applications without making them think about how they’re deployed, then they can try for an AWS-like moat, in which customers want to avoid the ongoing cost of DIYing their stack.
I'm very vocal about this to the point where the naive/cursory view is that I'm an "Nvidia fanboy". It's amazing how many times I've had to try to relate this point and how much hate I get for it - Nvidia is lightyears ahead of AMD and the overall ROCm ecosystem in terms of software support. AMD makes fantastic hardware but at the end of the day it doesn't do anything without software. This is very obvious and very basic.
CUDA will do whatever you want and it more-or-less just works. ROCm (after > six years) is still:
- Won't work on your hardware
- Used to work on your hardware but we removed support within a few years
- Burn 10x more time trying to get something to work
- Be perpetually behind CUDA in terms of what you want/need to do
- Sorry, that just won't work
- Performance is lower than it should be for what is often actually better hardware, to the point where a superior newer generation AMD GPU gets bested by a previous generation Nvidia GPU with inferior (on paper) hardware specs
I've been trying ROCm since it was initially released > six years ago. I want AMD to succeed - I've purchased every new generation of AMD GPU in these six years to evaluate the suitability of AMD/ROCm for my workloads. Once a quarter or so I check back in to evaluate ROCm.
Every. Single. Time. I come away laughing/shaking my head at how abysmal it is. Then I go back to CUDA, sit in wonder at how well it actually works, and throw even more money at Nvidia, because I just need to get things done and my concerns about their monopoly, artificial market segmentation, ridiculously high margins, etc. are a distant second to my livelihood.
AMD (and others) need to understand what Jensen Huang has been saying for years: 30% of Nvidia's development spend is on software. As the announcements this week show, Nvidia is using its ever-greater financial resources and market share to continue to lap AMD in the only thing people actually care about: here's our product, and here's what you can actually do with it.
Many people with a fundamental hate/disgust for Nvidia will come back and say "ok bootlicker, it's supported in torch, you're spreading FUD". Ok, take a look at the Nvidia platform you linked and show me where the ROCm equivalent is. Take a look at inference serving platforms, which are one of the things I care most about. Look at FlashAttention, ALiBi, and the countless other software components that you actually need beyond torch in many cases. Watch even basic torch crash all over the place with ROCm.
Sure, you /might/ be able to train or run local one-off inference with AMD. How do I actually run this thing for my users? Crickets -or- maybe vLLM support for ROCm for LLMs (nothing for other models). Then dig just a little bit deeper and realize even vLLM isn't feature complete on ROCm: it requires patches, specific versions all around, and, from personal experience, a lot of GitHub/blog spelunking and pain. With CUDA it's `docker run` and it flies.
With CUDA I can run torchserve, HF TGI, vLLM, Triton, and a number of others to actually serve models up for users so I can make money from my work. ROCm, meanwhile, can barely run local experiments.
GPUs have always had more compute/gigaflops than traditional computers. GPUs in fact have more in common with 80s-era supercomputer architecture than with normal CPUs.
https://www.youtube.com/watch?v=ODIqbTGNee4
If everyone and their cousin is bullish on compute, then what’s the bear thesis here? Why might compute NOT be the best answer to our challenges as software engineers? Why might a focus on compute scaling ultimately be inferior to something else?
I seek all kinds of answers, including ones about fundamental logic, mathematical physics, etc
When we don’t have a model of the problem, it takes enormous amounts of power to synthesize one out of neurons or some other general function-approximating primitive.
But once we understand a little bit about the problem we can model 80-90% of its behavior with a handful of parameters. Add in some bias and noise parameters and you have an accurate trainable machine learning model that’s orders of magnitude more efficient.
Take for example a spring, which can be modeled by one or two parameters. But its impulse response looks like a sine curve multiplied by an exponential decay.
If you just train neurons to match input/outputs from a spring you need a ridiculous number of model parameters to describe that shape.
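To make the spring example concrete, here's a minimal sketch: fit the three-parameter physical model A·e^(−d·t)·sin(ω·t) to noisy impulse-response samples with a brute-force search, no neural network and no thousands of weights. The specific constants and grid ranges are made up for illustration:

```python
import math
import random

def spring(t, A, d, w):
    # Physical model: amplitude * exponential decay * sinusoid
    return A * math.exp(-d * t) * math.sin(w * t)

# Synthetic "measurements" from a spring with known parameters, plus noise.
true_A, true_d, true_w = 1.0, 0.25, 3.0
random.seed(0)
ts = [i * 0.05 for i in range(200)]
ys = [spring(t, true_A, true_d, true_w) + random.gauss(0, 0.01) for t in ts]

# Brute-force grid search over the two nonlinear parameters; the amplitude
# is solved in closed form, since the model is linear in A.
best = None
for d in [x * 0.01 for x in range(1, 60)]:
    for w in [x * 0.05 for x in range(20, 100)]:
        basis = [math.exp(-d * t) * math.sin(w * t) for t in ts]
        denom = sum(b * b for b in basis)
        A = sum(b * y for b, y in zip(basis, ys)) / denom
        err = sum((A * b - y) ** 2 for b, y in zip(basis, ys))
        if best is None or err < best[0]:
            best = (err, A, d, w)

err, A, d, w = best
# Three parameters recover the whole curve to within the noise level.
assert abs(A - true_A) < 0.1
assert abs(d - true_d) < 0.05
assert abs(w - true_w) < 0.1
```

A generic network trained on the same 200 input/output pairs would need orders of magnitude more parameters to pin down that shape; the physical prior does almost all the work here.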
CNNs have seen an enormous amount of success due to this fact: a lot of processes can be modeled by convolution.
Nvidia is ceding the low-end GPU market to anyone who wants it. Not only does this allow a competitor to establish a reliable source of revenue for their R&D department, but it also cuts off an outlet for the binned chips that are inevitably produced on the expensive, leading-edge processes Nvidia uses - which would hurt their margins to some degree.
I'm absolutely not going to short Nvidia stock, but it's plausible that they're overvalued.
Nvidia GPUs are pretty flexible in terms of computation and extremely power hungry. It may be that a next generation of more specialized hardware, such as TPUs or something, outperforms Nvidia GPUs on machine learning tasks to such an extent that those GPUs are obsolete for those tasks. This next generation could come to market sooner than Nvidia anticipates.
Another possibility is that ML researchers figure out some ways to radically reduce the amount of compute required for good training and inference on _less_ specialized hardware. It's really impressive what you can do with llama.cpp. If open source models running on consumer grade hardware ever get to 90% as good as ChatGPT (which, to be clear, is absolutely not the case currently), then those top end GPUs are overkill for most use cases.
I don't think either of those scenarios is particularly likely, but they're at least plausible.
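Back-of-the-envelope numbers behind the llama.cpp point above: much of its consumer-hardware headroom comes from weight quantization. The parameter count and bit-widths below are illustrative, and the estimate deliberately ignores activations and the KV cache:

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in GB (decimal), weights only."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7B-parameter model at different precisions:
for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit quant")]:
    print(f"{label}: ~{model_size_gb(7e9, bits):.1f} GB")

# fp16 wants ~14 GB just for weights, while a 4-bit quantization (~3.5 GB)
# fits comfortably in a mid-range consumer GPU or laptop RAM.
assert abs(model_size_gb(7e9, 4) - 3.5) < 0.01
```

That factor-of-four shrink, at a modest quality cost, is exactly the kind of result that makes top-end GPUs overkill for some inference workloads.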
If we focused on writing better software with performance in mind, instead of on this insane disaster of an abstraction stack, we could easily get massive increases in compute capability with current hardware.
The most impressive thing about modern computing is that we've had exponential increases in compute speed, yet everything runs as slow as it did 30 years ago.
Not sure about your definition of bearish, but I'm concerned about the geopolitical risk of relying on this one company and their one supplier. In addition to everything else.
Also bearish on programmers keeping up with their fundamental algorithms, rather than throwing NNs at every problem.
Going to skip the "fundamental logic, mathematical physics, etc" angle and go with:
"That AI isn't all that and won't make much money" seems to be by far the biggest one. So far the applications are impressive and a little scary, but not actually something that anyone is going to pay for. Apple makes a zillion dollars because people want its phones. Google makes a zillion dollars because people want to sell junk to folks on the internet.
You need to posit a product built out of compute that does more. Maybe replaces a bunch of existing workers in an existing industry, something like that. So far the market is still looking.
I think we can be certain that AGI will be compute-intensive. Something Ilya Sutskever said made that clear. If you only have a small model, there's logically not much you can do with it. It might have enough capacity to represent a single edge of an object, but not enough to represent multiple edges and how they combine to form an object. And if it can't do that, then it has no representations it can use for reasoning.
There's still the secondary question of how compute heavy it will be, and I don't think anyone knows. But Sam Altman, in a recent speech he gave in Korea, expressed confidence that there isn't a limit in sight for returns from GPT scaling.
> Why might a focus on compute scaling ultimately be inferior to something else?
a) At some point between here and eternity, making humans more efficient becomes easier than scaling compute. Seems unlikely.
b) Compute is overrated, now or in the near future. I will be happy to donate to the church of "compute is overrated" if that makes people get off of gpt-4+ and let me cook. Read that as: I doubt it.
I don't see a c)
I think eventually AMD and other players are going to move in with force and we will see a surplus of supply in a few years, especially when China starts to spit out a lower-end version of... everything.
I would normally be tempted to think the S&P500 should look linear or similar-ish to last year, and so on. But I think there's a valid thesis where rapid technological advancement does indeed just grow the pie exponentially: where the amount of value that becomes unlocked in a non-zero-sum manner grows tremendously.
Just with current AI models, the amount of value that is waiting to be created (take technology X, add AI to it) is incredible. Things that used to take a team years to build can now be solved by throwing a GPU at a generic model that's been fine-tuned a bit. Basically, things that were impractical two years ago are now on the table.
The bear thesis is that compute will stop being scarce. Which is plausible, since in capitalism, the best cure for high prices tends to be high prices.
Something crazy to think about is that Accelerando by Charles Stross is starting to look like a prophecy being slowly fulfilled.
I mean, the entire tech industry is predicated on continued exponential growth of compute power. It could be that these 70 years, and the next couple dozen, are a blip before what will be millennia of linear returns.
I mean, I could baselessly argue that the second we have AGI, we will very shortly after have ASI, after which most computing could very likely be hundreds or thousands of times more efficient. It’s possible there exists far more computing power in the world than we will ever need.
I guess Blackwell was too late in the design cycle to use N3. It would be interesting to see, at these sorts of margins and volumes, whether it would make sense to have GPUs on the latest node: a next-gen 3nm GPU in 2025 and, if they could move aggressively, a 2nm GPU in 2026.
Does anyone know the numbers in layman's terms regarding the demand for compute and what our systems/chips are able to reasonably process with this new tech?
I'm curious whether the technology is now vastly outperforming the demand here, or the demand for compute is outpacing the tech.
Is Nvidia using AI to help design new GPUs? When that happens we're actually off to the singularity. Until then, I can't tell if we're in a bit of hype mode.
Honestly, idk what that means.
Is it a 4xxx successor?
What's the point? Why, as a consumer who likes playing video games, do I have to buy a "GPU" that isn't primarily a GPU?
Maybe I'm getting old
I'm just super curious where these chips will be in 10 years - not the state of the chip design - these physical chips.
It will be interesting when chips such as this percolate down to single folks using one of these to just run their home AI node.
When every building has just one of these in its core building AI system, allowing all the regular talk-to-your-smart-home stuff and having it intelligently accommodate your inferred needs/intentions. It's possible today, but I mean on a wide scale.
Instead, I expect those buildings to invest in having the fastest, most stable internet connection with added redundancy and everything will be fully centralized in datacenters.
They wouldn't, because in 10 years time the consumer-grade equivalent will out-power this model, assuming the increase in performance and decrease in cost persists.
I'm sure there are a few people that have, e.g., an Intel Itanium from 10 years ago in their home lab, but those don't hold a candle to current-day consumer-grade CPUs.
If this thing is all about AI why are we calling it a GRAPHICS processing unit still?
Don't tell me it's because of familiarity with the word GPU. Nvidia could coin a new acronym and write a PR release and the entire world would circulate it and even discuss it in here and every other vendor would scramble to play catch-up.
The improvements focus on AI, but it's still a GPGPU-oriented chip. Due to how language works, it doesn't really matter that graphics isn't the main focus anymore; the chip still follows the same basic architectural principles as what is expected of a modern GPU, thus it is a GPU.
> If this thing is all about AI why are we calling it a GRAPHICS processing unit still?
Because names are sticky, and no one wants to start evangelizing a new term (“Matrix Math Processing Unit”) for it, preferring to put energy into things with value.
https://www.nvidia.com/en-us/data-center/products/ai-enterpr...
It is this kind of delivery that the competition misses out on.
AMD needs to get it together.
Generative Processing Unit works for all cases.
You could call them gPUs.
https://en.m.wikipedia.org/wiki/G_factor_(psychometrics)
1) AGI won't happen because we are on the wrong path
2) AI being a big part of our lives is still a theory. Aswath Damodaran has some brief thoughts on this.
But the biggest bear case has to be that the technology won't get better. Essentially, everyone assumes that it will without reservations.
But fundamentally technology gets better when we can do more with less.
(seriously though, don't call it a "GPU" when rendering takes the back seat)
"up to 30 times the inference performance, and up to 25 times better energy efficiency"
Quite interesting.
It's because of familiarity with the word GPU...