The newest extension it needs is -VBMI2, which is supported by Zen 4. -DQ and -BW are quite old and very common amongst all implementations by this point.
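A minimal sketch of how that requirement could be checked at startup, assuming a Linux host where `/proc/cpuinfo` lists the kernel's flag names (`avx512dq`, `avx512bw`, `avx512_vbmi2`). The `REQUIRED` set and the helper names are hypothetical and not taken from any particular project:

```python
# Flag names follow Linux /proc/cpuinfo; REQUIRED is an assumed example set.
REQUIRED = {"avx512f", "avx512dq", "avx512bw", "avx512_vbmi2"}


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags, required=REQUIRED):
    """Return the required flags this CPU lacks, sorted for stable output."""
    return sorted(required - flags)


def read_host_flags(path="/proc/cpuinfo"):
    """Read flags from the running host; returns an empty set if unavailable."""
    try:
        with open(path) as fh:
            return parse_flags(fh.read())
    except OSError:
        return set()
```

Calling `missing_features(read_host_flags())` and getting an empty list back would mean the AVX-512 path is safe to take; anything else names what the fallback has to cover.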
Zen 4 supports basically everything except the Xeon Phi SMT4 intrinsics (4VNNIW/4FMAPS). So did Alder Lake before AVX-512 was disabled on it.
The support story for AVX-512 extensions is not as complex as people make it out to be anyway. Server is a monotonic sequence, consumer is a monotonic sequence, and the two are converging apart from one or two extensions that are unique to one or the other. Xeon Phi has the SMT4 intrinsics that are completely its own thing due to the SMT4 there, but you'll know if you're targeting Xeon Phi.
So as you can see, consumer supports everything except BFloat16 for neural net training. Consumer doesn't do that so it's not a problem. And it doesn't support the Xeon Phi stuff because Xeon Phi is its own crazy thing.
No uarch family in that chart has ever abandoned an extension once it was adopted. So unless you are taking a consumer application and running it on a server, it's literally not even a problem. And server gets bfloat. That's it, those are literally the only two things you have to know.
But letting AMD fanboys draw le funni Venn diagram is obviously way catchier than a properly organized chart representing the actual family trees involved... SSE would look bizarre if you represented it that way too, like AMD's weird one-off SSE4a extension set released in the middle of more fully-featured implementations... but people working in good faith would never actually be confused by that, because they understand it's a different product family and year of release is not the only factor here.
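The "monotonic sequence" claim is easy to state as code. Here is a rough sketch, assuming each family is an ordered list of (generation, extension set) pairs. The table is a small illustrative subset typed in by hand, not a transcription of the linked chart, and the function names are made up for the example:

```python
# Illustrative subset of AVX-512 extensions per generation, oldest first.
# This is NOT a transcription of the chart linked in the thread.
SERVER = [
    ("Skylake-SP", {"F", "CD", "VL", "DQ", "BW"}),
    ("Cascade Lake-SP", {"F", "CD", "VL", "DQ", "BW", "VNNI"}),
    ("Cooper Lake", {"F", "CD", "VL", "DQ", "BW", "VNNI", "BF16"}),
]


def dropped_extensions(lineage):
    """Return (older, newer, lost) for each step where an extension vanished."""
    drops = []
    for (old_name, old), (new_name, new) in zip(lineage, lineage[1:]):
        lost = old - new
        if lost:
            drops.append((old_name, new_name, sorted(lost)))
    return drops


def is_monotonic(lineage):
    """True if every generation supports a superset of the one before it."""
    return not dropped_extensions(lineage)
```

If a family ever did drop something, `dropped_extensions` would name the exact step and the lost extensions, which is the only case where the support matrix would actually get complicated.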
--
Really the thing that has been a problem is that server has been stalled out forever... first the 10nm problems, and now Sapphire Rapids has more than a dozen known steppings. They can't get the newer architectures out, so consumer has been moving ahead without them... up until Alder Lake nuked the whole thing. If server had been able to get newer uarchs out, there would be a lot more green bars in server too.
Supposedly the fab teams are actually ready to go now, and the problem is that the design teams aren't used to operating in an environment where they can't go down the hall and have the fab teams fix their shit. Intel put its foot down and isn't letting them do that anymore, since the fab teams need to sell the resulting process/cell libraries to external foundry customers, and the design teams need to be able to make their shit work on external foundries. You can't do this hyper-tuned shit where the process is tweaked to make your bullshit cell designs work. But some of the teams are not mature enough to work in a portable environment where design rules actually have to be obeyed, because Intel historically never had to.
When you hear about the infinite steppings of Sapphire Rapids and the network chip team's continued inability to put out a 2.5GbE chipset that works (I think we are on public release number 6 now?), it's pretty obvious who the worst culprits were. Meteor Lake may also be having packaging/integration problems (although this is supposition by me based on what products are delayed; coincidentally it is a lot of chiplet/tile stuff, and Intel obviously lacks experience in advanced packaging), but the products that have infinite steppings obviously can't get their own shit together even on their own tiles, let alone talking to other people's tiles.
But Intel supposedly are not kidding that Intel 4 is ready to go and they've just got nothing to run on it yet. Hence looking for outside partners. Supposedly they've got at least one definite order signed for Intel 3 in 2024, and I think there will be a lot of people happy to diversify and derisk away from the TSMC monoculture that has emerged... if TSMC stumbles, right now there is no alternative.
Samsung has all the same conflict-of-interest problems as Intel and also a track record of really mediocre fab execution. Supposedly they are ahead on GAAFET but like... we'll see, it's Samsung, who knows. They've stumbled just as much as Intel, just not at the 7nm tier. I remember the iPhone "is it TSMC or Samsung" games too. Samsung has put out a lot of garbage nodes and a lot of poorly-yielding nodes of their own.
edit since I can't edit: "And server gets bfloat" meaning "if you were to bring a ML training server application over to consumer it might not work".
Basically what I'm saying is, the only 2 situations that would be a problem are going consumer->server (which I don't see happening often) or going server ML training -> consumer if it doesn't have a non-BFloat16 fallback. And everyone does ML training on GPUs anyway.
Otherwise everything supports everything. Going backwards within a family might be a problem, but that's always a problem. It's not a support matrix problem where there's a mixture of capability, it's just backwards compatibility to older hardware with fewer features.
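The "non-BFloat16 fallback" mentioned above usually amounts to a dispatch decision made once at startup. A rough sketch, assuming Linux-style flag names (`avx512_bf16`, `avx512f`); the kernel names returned here are hypothetical labels, not real library entry points:

```python
def select_matmul_kernel(flags):
    """Pick the most specific kernel name the given CPU flags allow."""
    if "avx512_bf16" in flags:
        return "avx512_bf16"   # server ML path, needs the BFloat16 extension
    if "avx512f" in flags:
        return "avx512_fp32"   # non-BFloat16 fallback, still 512-bit vectors
    return "scalar_fp32"       # portable last resort for everything else
```

With a fallback like this in place, moving a server ML application to a consumer part only changes which branch gets taken, not whether the program runs.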
The real problem, as I said, is that server stalled for years. "Cooper Lake" there only ever shipped for 4- and 8-socket platforms, so it was effectively dead on arrival, and Ice Lake-SP was delayed so long that Milan was already in the market by the time it launched. So practically nobody has Cooper Lake; if you have an AVX-512 server, it's a 99.9% chance it's either Skylake-SP or Cascade Lake-SP.
Which is 100% drop-in compatible with any consumer platform that anyone has (since conveniently nobody has Cannon Lake either). The literal only problem is taking consumer applications and running them on server stuff, and there's a well-defined server compatibility set there too.
--
Going forward, Sapphire Rapids uses Golden Cove cores, so it should have the same support bars as Alder Lake there, i.e. basically everything, including server bfloat as well.
(and of course the other problem being Intel has no idea what the fuck they're doing with big.LITTLE on the consumer platform... the support matrix for everything consumer-family going forward is apparently "nothing" because they've dropped AVX-512 entirely.)
--
Let me drill this down to the generations you actually need to care about: (that poor PNG...)
Like literally the AVX-512 support matrix is a complete fucking non-issue, it's an absolute tempest in a teapot by people who have never touched or looked seriously at AVX-512. The AVX-512 rollout is a dumpster fire in many many ways but an overly-complex support matrix is not one of them.
aseipp|3 years ago
paulmd|3 years ago
https://i.imgur.com/idAjB1X.png
https://www.tomshardware.com/news/intel-ifs-lands-3nm-to-mak...
paulmd|3 years ago
https://www.phoronix.com/image-viewer.php?id=intel-sapphirer...
https://i.imgur.com/2HLrIjr.png