I think we will be seeing this more and more (less cache on the die, or cache stacked on top of the main chip instead of in it), since SRAM scaling is now near a halting point [1] (no more improvements), which means the cost of the same amount of cache goes up with every new node.
From what I've read, the L1/L2 cache per core is the same and the L3 cache per chiplet is the same, but the number of cores per chiplet is doubled and the overall chiplet size is about the same (a little bigger, I think).
So the L3 cache didn't get smaller (in area or bytes); there's just less of it for each core. L1/L2 is relatively small, but they did use techniques to make it smaller at the expense of performance.
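To put rough numbers on the per-core share, here is a minimal sketch of the arithmetic. The 32 MB per chiplet and the 8 vs. 16 core counts are illustrative assumptions, not figures quoted in this thread; the helper name is made up for the example.

```python
# Illustrative numbers only (assumed, not taken from the thread):
# the same L3 per chiplet, but twice as many cores per chiplet.
L3_PER_CHIPLET_MB = 32


def l3_per_core_mb(l3_per_chiplet_mb: float, cores_per_chiplet: int) -> float:
    """Return each core's share if the chiplet's L3 is divided evenly."""
    return l3_per_chiplet_mb / cores_per_chiplet


standard = l3_per_core_mb(L3_PER_CHIPLET_MB, 8)   # assumed standard chiplet
dense = l3_per_core_mb(L3_PER_CHIPLET_MB, 16)     # assumed dense chiplet
print(f"standard: {standard} MB/core, dense: {dense} MB/core")
```

The total cache is unchanged; only the divisor doubles, so each core's share halves.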
I think the big difference really is the reduction in buffers, etc., that are otherwise needed to get a design that scales to the moon. This is likely a major factor for Apple's M series too. Apple computers are never getting the thermal design needed to clock to 5 GHz, so setting the design target much lower means a smaller core, better power efficiency, lower heat, etc. The same thing applies here: you're not running your dense servers at 5 GHz, because there's just not enough capability to deliver power and remove heat, so a design with a more realistic target speed can be smaller and more efficient.
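A rough way to see the power side of this is the usual first-order rule that dynamic switching power scales as C * V^2 * f. The sketch below uses hypothetical operating points (3 GHz instead of 5 GHz, at 0.8x the voltage) that are assumptions for illustration, not measurements of any real chip, and it says nothing about the area savings.

```python
def relative_dynamic_power(voltage_ratio: float, frequency_ratio: float) -> float:
    """First-order dynamic power relative to a baseline.

    Uses P ~ C * V^2 * f with capacitance held fixed, so the ratio is
    (V / V0)^2 * (f / f0). Leakage power is ignored.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Hypothetical operating points, chosen only for illustration.
freq_ratio = 3.0 / 5.0   # target 3 GHz instead of 5 GHz
volt_ratio = 0.8         # assumed lower voltage at the lower clock
power = relative_dynamic_power(volt_ratio, freq_ratio)
print(f"relative dynamic power: {power:.3f}")
```

Under these assumptions the core delivers 60% of the clock for roughly 38% of the dynamic power, which is the kind of trade a dense server part wants.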
SemiAnalysis seems to indicate the core area itself is 35% smaller. Maybe that's process tweaks related to clock rate? But I don't think we know for absolute certain that it really is the same core with the same execution units and same everything, even though a ton of the listed stats are exactly the same.
treesciencebot|2 years ago
[1] https://semiwiki.com/forum/index.php?threads/tsmc-officially...
loeg|2 years ago
toast0|2 years ago
rektide|2 years ago