
GeForce RTX 40 Series

381 points | capableweb | 3 years ago | nvidia.com

675 comments

[+] jessfyi|3 years ago|reply
These prices are beyond insulting and frankly I'm glad they're going to take a hit competing with the flood of miner cards entering the market.

Also note that nothing is preventing Optical Flow Acceleration [0] (and by extension the DLSS 3.0 models they claim are exclusive to the 40 series) from running on 20- and 30-series RTX cards. Just like RTX Voice and other gimmick "exclusives", I expect it to be made available to older cards the moment they realize their backlog of 30 series cards isn't clearing as quickly as they thought.

They're competing against an over-bloated secondhand market, AMD, Intel, much better integrated GPUs, and comparatively cheaper consoles that maintain sky-high demand with subsidized games via new subscription programs. They're vastly overestimating their brand loyalty (think Microsoft v Sony after the 360) and EVGA's exit makes more sense now than ever.

[0] https://developer.nvidia.com/opticalflow-sdk

[+] mrb|3 years ago|reply
So the 4090 is 38% faster (in FP32 FLOPS) than the current top server GPU (H100) and 105% faster than the current top desktop GPU (3090 Ti). And it's also more than twice as efficient (in FLOPS per watt) as all current top GPUs, even compared to the H100 which is manufactured on the same TSMC 4N process. This is impressive.

The computing power of these new GPUs (in FP32 TFLOPS) is as below:

  Nvidia RTX 4090:      82.6 TFLOPS (450 watts¹)
  Nvidia RTX 4080-16GB: 48.8 TFLOPS (320 watts¹)
  Nvidia RTX 4080-12GB: 40.1 TFLOPS (285 watts¹)

Compared to Nvidia's current top server and top desktop GPUs:

  Nvidia H100 (SXM card): 60.1 TFLOPS (700 watts)
  Nvidia RTX 3090 Ti:     40.0 TFLOPS (450 watts)

Compared to AMD's current top server and top desktop GPUs:

  AMD Instinct MI250X:    47.9 TFLOPS (560 watts)
  AMD Radeon RX 6950 XT:  23.7 TFLOPS (335 watts)

¹ I didn't see the wattage listed on this page by Nvidia; my source is https://www.digitaltrends.com/computing/nvidia-rtx-4090-rtx-...
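
For reference, the efficiency claim checks out against the figures above; a quick back-of-envelope in Python (the TFLOPS and wattages are just the numbers listed here, nothing more authoritative):

  # FP32 TFLOPS and board power (watts), taken from the figures above
  gpus = {
      "RTX 4090":      (82.6, 450),
      "RTX 4080-16GB": (48.8, 320),
      "RTX 4080-12GB": (40.1, 285),
      "H100 (SXM)":    (60.1, 700),
      "RTX 3090 Ti":   (40.0, 450),
  }
  for name, (tflops, watts) in gpus.items():
      # efficiency in GFLOPS per watt
      print(f"{name:14s} {tflops / watts * 1000:6.1f} GFLOPS/W")
  # 4090 ~184, H100 ~86, 3090 Ti ~89: a bit more than 2x either of them
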
[+] mountainb|3 years ago|reply
It seems like they're really emphasizing the jump in RT performance over the previous generation, but I think the gaming market, at least, will care more about raw frame rates and memory size relative to the previous generation and AMD's offerings.

Personally, I like using RT for some single-player showpiece games on a 3080 Ti (it was useless on my 2080), but the games I play the most don't use RT at all. DLSS is always great on any title that offers it, but again, the real issue is that most of the games people put serious time into are the kinds of games for which RT is irrelevant. Graphical showpiece stuff is just a lot less relevant to the contemporary gaming market than it used to be.

[+] aseipp|3 years ago|reply
Eh, the current 3080 can already do 60FPS @ 4k HDR on every AAA title I've thrown at it, and that's in an mITX undervolted build (with the voltage capped at a fixed point). "60FPS @ 4k is readily achievable" has been sort of the gold standard we've been careening towards for years, ever since displays outpaced GPUs, and we're just about there now. The raw frame difference and memory size are nice, especially if you're doing compute on these cards, but they weren't what was holding the gaming market back. So for that segment you need some extra juice on top of things, and I can see why they advertise it this way.

People say RT is a gimmick, but personally I find it incredible on my 3080 in games that support it. In a game like Control the lighting is absolutely stunning, and even in a game like Minecraft RTX, the soft lighting and shadows are simply fantastic. (Minecraft is IMO a perfect example of how better soft shadows and realistic bounce lighting aren't just for "ultra realism" simulators.) It's already very convincing when implemented well, so I'm very happy to see continued interest here.

[+] arecurrence|3 years ago|reply
I suspect they also highlight RT performance (and AI acceleration, which is aimed more at a different market than these gaming cards) because it's their key differentiator from competitors.

Most upper market cards can already run most games well at 1440p or 4k.

[+] trafficante|3 years ago|reply
I’d be interested in better RT performance for the purposes of VR gaming but, unfortunately, the high fidelity PCVR gaming market died with the Quest 2.
[+] jeffcox|3 years ago|reply
While RT is a single player gimmick I mostly turn off, it won't take many more increases like this to make it a very real feature. What will we see when developers start targeting these cards, or their successors?
[+] pornel|3 years ago|reply
Not everyone is a pro CS:GO player. I really like the RTX shadow improvements and wish more games supported them. I've bought games for their "showpiece stuff".
[+] iLoveOncall|3 years ago|reply
> but I think the gaming market at least will care more about the difference in raw frames and memory size

I'm not sure I understand why memory is important for gaming? For most games, with every setting maxed out, it'll be a stretch if they use 6GB of VRAM.

For applications other than gaming, 100% agree, but for gaming I can't imagine it's important.

[+] spywaregorilla|3 years ago|reply
I've yet to see a game that my laptop 2080 can't handle on max settings. The only games I've played that offered ray tracing were Control and Resident Evil 8.

The pool of games that demand extremely high performance is very small and pretty easy to miss entirely.

[+] zzixp|3 years ago|reply
4080 -> $900/1200, 4090 -> $1600

What the hell Nvidia. Post EVGA breakup, this is a bad look. Seems like they're setting MSRP ridiculously high in order to undercut board partners down the line.

[+] mdorazio|3 years ago|reply
450 watt TDP? I feel like a crazy person every time a new generation of GPU comes out and raises the bar on power consumption and heat generation. How is this ok?
[+] aseipp|3 years ago|reply
The 3090 Ti series was already pushing 450W so this isn't new.[1] And it's because they clock these things incredibly high, well beyond the efficiency curve where it makes sense. Because that's what gaming customers expect, basically. On the datacenter cards they quadruple or octuple the memory bus width, and they drop the clocks substantially and hit iso-perf with way, way better power. But those high-bandwidth memory interfaces are expensive and gaming workloads often can't saturate them, leaving the compute cores starved. So they instead ramp the hell out of the clocks and power delivery to make up for that and pump data into the cores as fast as possible on a narrow bus. That takes a lot of power and power usage isn't linear. It's just the nature of the market these things target. High compute needs, but low-ish memory/bus needs.

This isn't necessarily a bad or losing strategy, BTW, it just is what it is. Data paths are often very hot and don't scale linearly in many dimensions, just like power usage doesn't. Using smaller bus widths and improving bus clock in return is a very legitimate strategy to improve overall performance, it's just one of those tough tradeoffs.

Rule of thumb: take any flagship GPU and undervolt/power-limit it by 30%, saving 30% of the power/heat dissipation, and you'll retain 90%+ of the performance in practice. My 3080 is nominally 320/350W, but in practice I just cap it at about 280W and it's perfectly fine in everything.

[1] Some people might even be positively surprised, since a lot of the "leaks" (bullshit rumors) were posting astronomically ridiculous numbers like 800W+ for the 4090 Ti, etc.
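
For illustration, the power cap can also be set programmatically; a minimal sketch using the pynvml bindings (an assumption on my part: this presumes the nvidia-ml-py package is installed and the process has admin/root rights; nvidia-smi -pl does the same from the command line):

  import pynvml

  pynvml.nvmlInit()
  handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

  # NVML reports and accepts power limits in milliwatts
  default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)
  target_mw = int(default_mw * 0.8)              # e.g. 350 W -> 280 W

  pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
  print(f"power limit set to {target_mw / 1000:.0f} W")
  pynvml.nvmlShutdown()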

[+] ftufek|3 years ago|reply
For what it's worth, you can power-limit them, and I'd highly recommend it if you plan on running a few of these. In the past we've power-limited RTX 3090s to 250W (100W lower than stock) while losing a negligible amount of performance.
[+] paulmd|3 years ago|reply
Dennard Scaling/MOSFET Scaling is over and it's starting to really bite. Power-per-transistor still goes down, but density is going up faster. Meaning an equal-sized chip on an old node vs a new node... the power goes up on the newer chip.

Physics is telling you that you need to let the chip "shrink" when you shrink. If you keep piling on more transistors (by keeping the chip the same size) then the power goes up. That's how it works now. If you make the chip even bigger... it goes up a lot. And NVIDIA is increasing transistor count by 2.6x here.

Efficiency (perf/w) is still going up significantly, but the chip also pulls more power on top of being more efficient. If that's not acceptable for your use-case, then you'll have to accept smaller chips and slower generational progress. The 4070 and 4060 will still exist if you absolutely don't want to go above 200W. Or you can buy the bigger chips and underclock them (setting a power limit is like two clicks) and run them in the efficiency sweet spot.

But, everyone always complains about "NVIDIA won't make big chips, why are they selling small chips at a big-chip price" and now they've finally gone and done a big chip on a modern node, and people are still finding reasons to complain about it. This is what a high-density 600mm2 chip on TSMC N5P running at competitive clockrates looks like, it's a property of the node and not anything in particular that NVIDIA has done here.

AMD's chips are on the same node and will be pretty spicy too - rumors are around 400W, for a slightly smaller chip. Again, TDP being more or less a property of the chip size and the node[0], that's what you'd expect. For a given library and frequency and assuming "average" transistor activity: transistor count determines die size, and die size determines TDP. You need to improve performance-per-transistor and that's no longer easy.

[0] an oversimplification ofc but still

That's the whole point of DLSS/XeSS/Streamline/potentially a DirectX API: get more performance-per-transistor by adding an accelerator unit that "punches above its weight" in some applicable task and pushes the perf/t curve upwards. But people have whined nonstop about that since day 1, because using inference is a conspiracy from Big GPU to sell more tensor cores, or something, I guess. Surely there is some obvious solution to TAAU sample weighting that doesn't need inference, and it's just that every academic and programmer in the field has agreed not to talk about it for the last 20 years, right?
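
To make the scaling argument concrete, a back-of-envelope sketch (every number here is an illustrative placeholder, not a real per-transistor figure; only the shape of the math matters):

  # Power per transistor falls, but transistor count rises faster.
  old_transistors = 28e9                       # a last-gen big chip, roughly
  new_transistors = 2.6 * old_transistors      # "increasing transistor count by 2.6x"

  old_w_per_t = 350 / old_transistors          # watts per transistor at the old TDP (made up)
  new_w_per_t = old_w_per_t * 0.55             # assume the new node saves ~45% per transistor

  print(f"{new_transistors * new_w_per_t:.0f} W")  # ~500 W: better perf/W, higher total power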

[+] amelius|3 years ago|reply
Hardware designers produce more watts while software developers create more bloat.
[+] capableweb|3 years ago|reply
I'm not sure what else to expect? Is it so crazy that they make the card even faster and bigger than before, and it uses more power? What else is the next generation of cards supposed to do, keep the same performance but be more energy efficient? I'm not sure how many people would buy cards that have the same performance but a lower TDP.
[+] jltsiren|3 years ago|reply
It's good to remember that you don't have to buy the most powerful GPU model just because you can afford it.

Some people are probably in the target audience for the 4090. Others may prefer the 4080 models, which have a slightly lower TDP than the 3080 models but still get a nice performance boost from much higher clock rates.

[+] jackmott42|3 years ago|reply
People want to render more triangles. What are you gonna do?
[+] fomine3|3 years ago|reply
I wonder when Nvidia will start selling lower-powered models like Intel's "T" CPUs. They're just underclocked/undervolted and lightly binned chips, but some consumers like them. Energy Star would like it too.
[+] behnamoh|3 years ago|reply
Apple M1 showed us that it’s possible to increase performance while keeping the power consumption low—laptop level low.
[+] nfRfqX5n|3 years ago|reply
Market doesn’t seem to care yet
[+] izacus|3 years ago|reply
As a gamer... Should I care?
[+] causi|3 years ago|reply
> NVIDIA has unleashed its next-gen GeForce RTX 4080 series graphics cards that come in 16 GB & 12 GB flavors at $1199 & $899 US pricing.

Welp, it looks like I'm buying a Series X, because fuck that noise. PC gamers got put through hell for the last two years because Nvidia preferred selling to Chinese crypto miners, and this is our reward?

[+] SketchySeaBeast|3 years ago|reply
I find it very frustrating that they have two 4080s with very different specs, where only the difference in memory size is indicated in the name.
[+] sdenton4|3 years ago|reply
Compared to the 3090, the 4090 has about 60% more CUDA cores (16k vs 10k), runs at ~2.2GHz (up from 1.4GHz) and eats about an extra 100W of power.

Over the last couple of weeks it's been possible to get 3090s on sale for juuuust under $1k (I picked one up after watching prices come down from like $3k over the last couple of years). The 4090 is priced at $1600... (and for me at least would require a new power supply.)
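
As a sanity check, the headline FP32 figure is just cores x clock x 2 FLOPs (one fused multiply-add) per cycle; a quick sketch, assuming the published boost clocks of roughly 2.52 GHz (4090) and 1.70 GHz (3090):

  def fp32_tflops(cuda_cores, boost_ghz):
      # 2 FLOPs per CUDA core per clock (an FMA counts as a multiply plus an add)
      return 2 * cuda_cores * boost_ghz / 1000

  print(fp32_tflops(16384, 2.52))  # RTX 4090 -> ~82.6 TFLOPS
  print(fp32_tflops(10496, 1.70))  # RTX 3090 -> ~35.7 TFLOPS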

[+] dartdartdart|3 years ago|reply
Still DisplayPort 1.4.

Intel has already moved to 2.0, and AMD is rumored to be supporting DisplayPort 2.0.

I would have liked to see this on the 40 series, but I guess I'll wait to build my first PC.

[+] julienchastang|3 years ago|reply
I've been working with NVIDIA GPU VMs of late with the intention of running GPU-enabled TensorFlow and PyTorch. I find working with NVIDIA software incredibly frustrating. In particular, the installation instructions for CUDA / cuDNN simply do not work. The biggest culprit is missing or incorrectly versioned shared object libraries. Forums are loaded with developers having similar problems, never with any viable solution. I've had better luck with the NVIDIA-distributed Docker containers; there, at least, I can make it as far as running GPU-enabled TensorFlow sample code.
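
For what it's worth, before chasing shared-library errors it helps to confirm the container (or bare-metal install) actually sees the GPU; a minimal sanity check, assuming TensorFlow 2.x and PyTorch are both installed in the environment:

  import tensorflow as tf
  import torch

  # If these come back empty/False, the CUDA/cuDNN stack underneath is broken.
  print(tf.config.list_physical_devices("GPU"))
  print(tf.sysconfig.get_build_info().get("cuda_version"))
  print(torch.cuda.is_available(), torch.version.cuda)
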
[+] Entinel|3 years ago|reply
Nvidia realized they can charge whatever they want and are taking full advantage of that. Hopefully Intel, AMD and used cards are able to bring them down a peg.
[+] colpabar|3 years ago|reply
Ask HN: I am still using a 980. What should I get? I do not care at all about having the latest/greatest, I just want an upgrade.
[+] Filligree|3 years ago|reply
Nothing above 24GB? Not much point in upgrading, then. Memory size remains the biggest bottleneck.
[+] arecurrence|3 years ago|reply
Today's announcement is for gaming products, and I'm not sure games even need more than 12 GB today. I suspect the 4090's price not budging much is telling about how much RAM gaming demand has really been focused on.

I assume they will soon have a professional card announcement that includes 48GB+ cards. Assuming the high-RAM cards see improvements similar to this generational leap on the gaming side, they will be in high demand.

[+] capableweb|3 years ago|reply
Seems they don't want to eat up market share of their "professional" cards with the consumer cards. Want above 24GB? Better put up money for the data center cards.
[+] alkonaut|3 years ago|reply
If we disregard AI/ML applications for a second, are there really games where FPS is regularly limited by having, e.g., 12GB rather than 16GB of VRAM?
[+] numlock86|3 years ago|reply
Same here. 3090 Ti but memory is usually the bottleneck. And workstation GPUs like the A100 are just too expensive ...
[+] rejectfinite|3 years ago|reply
>Memory size remains the biggest bottleneck

For what? AI shit? These are for games broski

[+] fuzzy2|3 years ago|reply
Is it though? I guess it is for machine learning. But do games require that much memory? Remember that GeForce is the consumer line. I would expect it to mainly target the gamer market.
[+] blagie|3 years ago|reply
Any clue about how well ML frameworks do with multiple cards?

I'm completely bottlenecked on RAM, but for what I do, even a 3060 would offer adequate performance.
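
For illustration: data parallelism replicates the whole model on every card, so it doesn't help with a RAM bottleneck, whereas splitting the model across devices does. A toy sketch of naive model parallelism in PyTorch, assuming two visible CUDA devices (the layer sizes are arbitrary):

  import torch
  import torch.nn as nn

  class SplitModel(nn.Module):
      """Toy example: half the layers on each GPU, halving per-card memory."""
      def __init__(self):
          super().__init__()
          self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
          self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

      def forward(self, x):
          x = self.part1(x.to("cuda:0"))
          return self.part2(x.to("cuda:1"))  # activations hop between the cards

  model = SplitModel()
  out = model(torch.randn(8, 4096))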

[+] wellthisisgreat|3 years ago|reply
Yeah, I have the same sentiment.

I hope they come out with a 4090 Ti with more memory, though.

[+] shaggie76|3 years ago|reply
As a streamer, AV1 encode support is really exciting to me; once I can enable it in OBS and stream to Twitch etc., getting an Ada GPU will be high on my list of upgrades.
[+] eis|3 years ago|reply
The Portal 2 image comparison seems pretty dishonest to me.

You can clearly see the graphics details have been reduced in the RTX-off version. Check how the outline of the portal or the gun looks, for example: completely different textures.

[+] badwolf|3 years ago|reply
Ahh nice! I'm in need of a new spaceheater.
[+] dougmwne|3 years ago|reply
I mean it’s getting there. Add this to a best in class desktop processor at 300W and add in all the other power consumption and you are close to the wattage of a hot plate, kettle or small space heater.
[+] izacus|3 years ago|reply
Hmm... what happened to memory bus width on 4080s?

It seems like they went from 384/320-bit (on the 3080 Ti/3080) down to 256/192-bit? Anyone know the story behind it?
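
For rough context, memory bandwidth is just bus width times per-pin data rate; a sketch with the commonly reported GDDR6X speeds (treat the exact Gbps figures as approximate), which also hints that the much larger L2 cache on Ada is supposed to pick up the slack:

  def bandwidth_gbs(bus_width_bits, data_rate_gbps):
      # GB/s = (pins * Gbit per pin per second) / 8 bits per byte
      return bus_width_bits * data_rate_gbps / 8

  print(bandwidth_gbs(320, 19.0))  # RTX 3080:      ~760 GB/s
  print(bandwidth_gbs(256, 22.4))  # RTX 4080-16GB: ~717 GB/s
  print(bandwidth_gbs(192, 21.0))  # RTX 4080-12GB: ~504 GB/s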

[+] onepointsixC|3 years ago|reply
This announcement just shows the need for Intel to offer something competitive in terms of price to performance. These cards are absurdly expensive.
[+] phaistra|3 years ago|reply
Mining boom prices without the mining boom.
[+] ihuman|3 years ago|reply
The 4080/4090 pages have the prices.

4080 12GB: $900

4080 16GB: $1200

4090: $1600