top | item 41246331

timlatim | 1 year ago

The 9950X seems more exciting than last week's 9700X/9600X. It is comfortably ahead of the previous gen (including X3D) in code compilation and video/image processing, which I care about more than performance in games, and it's also in a class of its own in workloads heavy on AVX-512, though they might be a bit niche.

I think the TDP on the 9700X and 9600X may have been set a bit too low (in fact, there are indications it will be raised in a future BIOS update [1]), which led to a relatively cool reception from reviewers focused on raw performance. In the Phoronix performance-per-watt tests, the 9700X and 9600X often fare better than the bigger chips with higher TDP, but for desktops I guess efficiency is just not that big of a concern.

[1] https://videocardz.com/newz/amd-set-to-boost-tdp-for-ryzen-5...

kimixa|1 year ago

> and it's also in a class of its own in workloads heavy on AVX-512, though they might be a bit niche.

It'll be interesting to see if it remains niche - I do a fair bit of work on graphics rendering (some games, some not) and there's quite a bit in avx512 that interests me - even ignoring the wider register width. A lot of pretty common algorithms we use can be expressed more easily and simply using some of those features.

Previous implementations either weren't available on consumer platforms, or would downclock or limit ALU calculation width for some time after an avx512 instruction was run, only returning to full speed after a significant delay - presumably once whatever power delivery issues had settled - which seriously limited the use cases in which it made sense. It wasn't worth having "small data set" users of avx512, as the code would actually run slower than the equivalent avx2 code because of this. And the size of "large enough" data sets was pretty close to the point where it's better to schedule the task on the GPU anyway...

But AMD's implementation doesn't seem to have this problem - so this opens up the instruction set to many more use cases than previous implementations.

Or has the AVX512 ship already sailed, with Intel apparently unable to fix these issues and now hacking it into even smaller bits? I mean, arguably they should have started with that - the register width is probably the least interesting part to me, but at some point having it actually widely adopted might be more useful than a "possibly better" version that no chip actually supports.

Remnant44|1 year ago

I agree. I work in a similar field, and the value of AVX512 is clearly there - it just hasn't been worth implementing for the tiny percentage of market penetration. This is directly due to the market segmentation strategy Intel applied. AMD has raised the ante for AVX512 with two excellent implementations in a row, and for the first time ever I'm definitely considering building AVX512 targets.

Just as a small example from current code, the much more powerful AVX512 byte-granular two-source-register shuffles (vpermt2b) are very tempting for hashing/lookup table code, turning a current perf bottleneck into something that doesn't even show up in the profiler. And according to (http://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teardo...) Zen5 has not one but _TWO_ of them, at a throughput quadrupling Intel's best effort.
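
For anyone who hasn't used the two-source byte shuffle, here is a rough scalar model in Python of what the 512-bit form computes as I understand it: each index byte picks one of 128 candidate bytes, with the low 6 bits choosing the position and bit 6 choosing which source register. The function name just mirrors the C intrinsic `_mm512_permutex2var_epi8`; the table and keys are made up for illustration, and this says nothing about performance.

```python
def permutex2var_epi8(a: bytes, idx: bytes, b: bytes) -> bytes:
    """Scalar model of a 512-bit two-source byte permute.

    a, b: the two 64-byte source registers (together a 128-entry table).
    idx:  64 index bytes; bits 0-5 pick the byte, bit 6 picks a (0) or b (1).
          Bit 7 of each index is ignored in this model.
    """
    if not (len(a) == len(b) == len(idx) == 64):
        raise ValueError("all operands must be exactly 64 bytes")
    out = bytearray(64)
    for lane, sel in enumerate(idx):
        source = b if sel & 0x40 else a
        out[lane] = source[sel & 0x3F]
    return bytes(out)


# Hypothetical 128-entry byte lookup table, split across two "registers".
table = bytes((i * 7 + 3) & 0xFF for i in range(128))
lo, hi = table[:64], table[64:]

# 64 made-up keys in the range 0..127, looked up in one "instruction".
keys = bytes((i * 5) & 0x7F for i in range(64))
result = permutex2var_epi8(lo, keys, hi)

# Same answer as 64 separate table lookups.
assert result == bytes(table[k] for k in keys)
```

The point is that a whole 128-entry byte table fits in two registers, so 64 lookups become a single shuffle instead of 64 dependent loads.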

kvemkon|1 year ago

> But AMD's implementation doesn't seem to have this problem

From an article:

> Does Zen5 throttle under AVX512?

> Yes it does. Intel couldn't get away from this, and neither can AMD. Laws of physics are the laws of physics.

> The difference is how AMD does the throttling ...

Further details in the article [1].

Discussed here on HN: [2], [3].

[1] https://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teard...

[2] https://news.ycombinator.com/item?id=41182395

[3] https://news.ycombinator.com/item?id=41248260

chipdart|1 year ago

A TDP of 170 W is quite the beast. I don't think it's reasonable to claim this isn't a major concern just because it's a desktop processor. It is reflected in operating cost and in anything directly or indirectly related to cooling, which means case size and noise.

rainclouds|1 year ago

I want to replace my 5950X, but that’s a huge TDP jump. It'll be interesting to see how well it downclocks.

timlatim|1 year ago

If your typical workloads are covered by Phoronix tests, take a look at their energy consumption results. In LLVM compilation, for instance, the 9950X does run at higher average power (188W vs 140W for the 5950X), but because it finishes the task much faster, its energy consumption is actually lower at 58500 joules per run vs 78700 joules per run for the 5950X, so it should be more efficient.
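
To make the arithmetic explicit, here is a small Python sketch that backs out the implied run time from those numbers using energy = average power x time. It assumes the power and energy figures cover the same interval, which I haven't verified against the raw Phoronix data; the function and variable names are mine.

```python
def runtime_seconds(energy_joules: float, avg_power_watts: float) -> float:
    """Implied duration of a run, from energy = average power * time."""
    return energy_joules / avg_power_watts


# Figures quoted above for the LLVM compilation test.
runs = {
    "9950X": {"avg_power_w": 188.0, "energy_j": 58500.0},
    "5950X": {"avg_power_w": 140.0, "energy_j": 78700.0},
}

times = {
    name: runtime_seconds(r["energy_j"], r["avg_power_w"])
    for name, r in runs.items()
}

# How much faster the 9950X finishes, and how much less energy it uses.
speedup = times["5950X"] / times["9950X"]
energy_saving = 1.0 - runs["9950X"]["energy_j"] / runs["5950X"]["energy_j"]

for name, t in times.items():
    print(f"{name}: {t:.0f} s per run")
print(f"speedup: {speedup:.2f}x, energy saved: {energy_saving:.1%}")
```

That works out to roughly 311 s vs 562 s per run, so about 1.8x faster for about 26% less energy, despite drawing about 34% more power while running.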