
Apple’s A14 Packs 134M Transistors/mm²

314 points | jonbaer | 5 years ago | semianalysis.com

175 comments

[+] ChuckMcM|5 years ago|reply
I found this to be a pretty confusing article. I get that they are analyzing the new node, which is great, but the editorializing seems a bit premature to me.

I also don't think the author understood the TSMC presentation. TSMC clearly said that it used a "model" of a typical SoC of 60% logic, 30% SRAM, and 10% analog. Then they said that for each of those categories you could expect 1.8x, 1.35x, and 1.2x of shrink, respectively. If you do the math, that means the overall shrink for a 'typical' SoC that conforms to their model would be approximately 1.57x.

That Apple achieved 1.49x suggests they got about 95% of the process shrink effectiveness.
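The blended-shrink arithmetic above can be sketched out; the 60/30/10 split and the per-category shrink factors come straight from the comment:

```python
# Weighted overall area shrink for TSMC's "typical SoC" model.
# Figures are the ones quoted above: 60/30/10 area split with
# 1.8x/1.35x/1.2x per-category shrink.
fractions = {"logic": 0.60, "sram": 0.30, "analog": 0.10}
shrinks   = {"logic": 1.80, "sram": 1.35, "analog": 1.20}

# Each block's area divides by its shrink factor, so the overall
# shrink is the reciprocal of the area-weighted sum of reciprocals.
overall = 1 / sum(fractions[k] / shrinks[k] for k in fractions)
print(f"model shrink:   {overall:.2f}x")          # ~1.57x
print(f"Apple vs model: {1.49 / overall:.0%}")    # ~95%
```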

Then there is the cost per die, and thus cost per transistor, discussion. It is important to remember that this is likely the most expensive these wafers will ever be. The reasoning is that during a process node's life-cycle the cost per wafer is set initially to capture "early movers" (who value density over cost). Much like any product where competition will emerge "later," there is a window early on to recapture value, which can pay back your R&D and capital equipment investments. As a result the vendor sets the price as high as possible to make that repayment happen as quickly as possible. Once paid back, the price provides profit for as long as it can be supported in the presence of other competitors (in this case, I would guess that role is played by Samsung). The GSA used to publish price surveys of wafers on various nodes over time, but they don't seem to do that any more. Anyway, the cost per transistor will go down from this point, but how much depends on how much margin is in the current wafer price.

So I agree that the cost per transistor is not going down as quickly as it has in the past, and it's possible that this node may never get as low as the previous node. I'm curious how it compares when you look at the 7nm introduction price per transistor vs today's price per transistor. And, if you get the same ramp with the 5nm node, what that would mean.

[+] hedgehog|5 years ago|reply
Apple is also probably using more SRAM than the "typical SOC"; for example, the A13 has much more cache than, say, the Snapdragon 855. I don't know the relative die areas, but that would let you make a decent estimate.
[+] londons_explore|5 years ago|reply
TSMC has a monopoly on the highest performance processes - so the price won't drop till there is competition. Seeing how the competition is falling behind, that will be a while...
[+] ksec|5 years ago|reply
And the economic model now requires leading-edge fabs to capture that value over a longer period of time. What used to be two years will lengthen to closer to three.

>I'm curious how it compares when you look at 7nm introduction price per transistor vs todays price per transistor. And if you get the same ramp with the 5nm node what that would mean.

First-gen N7 was ~$10K+ per wafer, while N5 is around ~$13K with higher yield compared to N7 at the same stage. So N5 should still be cheaper.
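A back-of-envelope check of that conclusion. The wafer prices come from the comment above; the density figures are assumptions on my part (the article's 134 MTr/mm² for the A14, and a commonly cited ~83 MTr/mm² for Apple's N7 chips), and yield and edge loss are ignored entirely:

```python
# Rough cost-per-transistor comparison across N7 and N5.
# Wafer prices ($10K / $13K) are from the comment; the density
# numbers (83 and 134 MTr/mm^2) are rough outside estimates.
from math import pi

WAFER_AREA_MM2 = pi * 150**2  # 300 mm wafer

def dollars_per_billion_tr(wafer_price, mtr_per_mm2):
    """Dollars per billion transistors on a fully utilized wafer."""
    transistors = WAFER_AREA_MM2 * mtr_per_mm2 * 1e6
    return wafer_price / (transistors / 1e9)

n7 = dollars_per_billion_tr(10_000, 83)
n5 = dollars_per_billion_tr(13_000, 134)
print(f"N7: ${n7:.2f}/Btr   N5: ${n5:.2f}/Btr")
assert n5 < n7  # N5 comes out cheaper per transistor at these prices
```

Under these assumptions N5 is roughly 20% cheaper per transistor despite the ~30% higher wafer price, which matches the comment's conclusion.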

[+] thornjm|5 years ago|reply
The column is explicitly labelled "effective vs theoretical" though. There is nothing misleading there.
[+] disillusionmnt|5 years ago|reply
Was I the only one disappointed by the decreasing die size? In our solid-state past, it was about cramming more in there. All this cost-cutting leaving gaps... it wouldn’t pass Jobs’ fish tank test.
[+] topspin|5 years ago|reply
I really appreciate this analysis and the straightforward top line number 134e6/mm^2. The usual "node" figure is utterly meaningless; an electrical engineer couldn't care less about "feature" size (whatever that means.) What is the count of discrete components in a given area? There are 40 billion 5nm (the supposed "node" of these chips) squares in a square millimeter of area. That's roughly 300x, more than two orders of magnitude denser than 134 million.

The meaningful achievement is how many discrete electrical components are composed into a given area. Not some arbitrary dimension of some cherry picked subset of these components.
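The gap between the marketing "node" and the actual density is easy to verify from the numbers in this comment:

```python
# How many 5 nm x 5 nm squares fit in one square millimeter,
# and how does that compare to the A14's 134 MTr/mm^2?
nm_per_mm = 1_000_000
squares = (nm_per_mm / 5) ** 2      # 4e10, i.e. 40 billion
ratio = squares / 134e6             # roughly 300x
print(f"{squares:.0e} squares/mm^2, ~{ratio:.0f}x the A14's density")
```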

[+] moonchild|5 years ago|reply
> The meaningful achievement is how many discrete electrical components are composed into a given area. Not some arbitrary dimension of some cherry picked subset of these components.

I disagree. The meaningful achievement is how power-efficient, fast, and cheap you can make a given chip. (Secondarily, how small and how durable wrt cosmic rays; but for most purposes these are not super important.)

If that follows as a result of many discrete electrical components being packed into a small area, great; but the latter isn't intrinsically interesting.

[+] coherentpony|5 years ago|reply
These things aren't _really_ two-dimensional. They're not really three-dimensional either, but they are objects built out of layers of two-dimensional things. When you measure the number of transistors per unit area, you will inevitably see something denser than the "number of 5nm squares in a millimeter of area".

It's the silicon equivalent of measuring one's BMI.

[+] hinkley|5 years ago|reply
A cycle or two ago I asked this same question. Some people seemed to think that such a number would be gamed, but if that's a valid barrier to entry then we will never accomplish anything again, really.

As long as the measure is in mm^2, not mm^3, I think (hope?) that would avoid any perverse incentives against breakthroughs that allow you to add more layers to a chip and still maintain yields.

[+] eightysixfour|5 years ago|reply
> The meaningful achievement is how many discrete electrical components are composed into a given area.

Is average over an area the meaningful achievement or is the meaningful achievement the smallest individual gate length? Neither are super useful without additional context.

[+] m463|5 years ago|reply
"As of 2019, the highest transistor count in any IC chip is Samsung's 1 TB eUFS (3D-stacked) V-NAND flash memory chip, with 2 trillion floating-gate MOSFETs (4 bits per transistor)." [1]

How about bit count? (at 4 bits per transistor)

[1] https://en.wikipedia.org/wiki/Transistor_count

[+] narrator|5 years ago|reply
The funny thing is this isn't really Apple's achievement. It's TSMC's achievement. It's also Intel's failure, in that they failed to get a similar process working on schedule.
[+] Nokinside|5 years ago|reply
> straightforward top line number 134e6/mm^2

Even that number is not so straightforward.

Tr/mm^2 = 0.6 * (NAND2 Tr/mm^2) + 0.4 * (Scan Flip-flop Tr/mm^2)
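That weighted metric (60% NAND2 cells, 40% scan flip-flop cells) can be sketched as below. The per-cell densities used in the example are made-up placeholders, not real N5 figures:

```python
# The blended transistor-density metric from the comment above:
# 60% NAND2 density plus 40% scan flip-flop density.
def weighted_density(nand2_mtr_mm2, sff_mtr_mm2):
    """Blended MTr/mm^2 from the two reference cell densities."""
    return 0.6 * nand2_mtr_mm2 + 0.4 * sff_mtr_mm2

# Placeholder inputs: a dense NAND2 figure, a sparser flip-flop figure.
print(weighted_density(nand2_mtr_mm2=180.0, sff_mtr_mm2=110.0))  # 152.0
```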

[+] talentedcoin|5 years ago|reply
I think this is cool if they’re the future of the Mac, but aren’t these chips just wasted in the iPad? I say this as an iPad Pro (1st-gen) owner ... it’s already way more processing power than I can really use, because after trying for months to get a sensible workflow going (mostly based around Pythonista, Editorial and RealVNC) I’ve relegated it to a OneNote and Netflix machine. What’s the point other than the coolness factor? Is it some future AR-type application?
[+] jiggawatts|5 years ago|reply
I was just looking into optimising a large web application.

I told the customer that using a caching layer such as a CDN would help paper over the worst of the inefficiencies in their application and the network stack.

That was true! The download times halved.

However, benchmarks with F12 developer tools showed that while downloads reduced from 200ms to 100ms, the overall load times were still seconds, of which about 50% was actual CPU time.

(This is typical of large, complex sites built by non-experts or organisations where performance is not a primary concern.)

Web sites like these perform very noticeably better if you have more CPU cores or faster CPU cores.

Essentially, CDNs are easy to deploy and 5G provides nearly gigabit download speeds, so the bottleneck has shifted back to the CPU for a lot of the web.

[+] wincy|5 years ago|reply
I sold my 2080ti and am playing Eve Echoes (new, mobile version of EVE Online) and Among Us, and Civ VI on the 2nd gen iPad Pro I bought used with less than half of the proceeds. The battery lasts a good long time and it feels pretty good to be able to play these full sized games on a screen like this. When I'm not using it the toddler watches some kids shows, and with the keyboard case I can type almost as quickly as I can on my 15" Macbook pro. If it had XCode I'd be set, honestly.
[+] manmal|5 years ago|reply
Having a more-powerful-than-necessary CPU has many benefits, here are some:

- Less heat produced > allows for smaller heatsinks and, as a result, smaller devices or bigger components

- Performance bottlenecks become less likely

- Features like 120hz display refresh rate become possible without the user noticing degradation

- Faster OS boot and app starts

- Ability to do more work locally vs using a server (see the many ML features)

- Tech debt in system code is less of an issue - code can be shipped early and optimized later

[+] Moto7451|5 years ago|reply
My understanding from talking to friends who work at Apple and some of the product briefs is a lot of the hardware is used for video and photo processing. Video games take advantage of the extra CPU/GPU power as well.

I don’t think that’s all that Apple-specific, as there are increasingly capable ISPs and security chips on other phones as well.

[+] saagarjha|5 years ago|reply
I work on an x86 interpreter that is only usable because of Apple's massive investment in these chips over the last couple of years. I doubt that is what Apple was designing them for, but they are very useful to have as an application developer.
[+] kitotik|5 years ago|reply
Audio and Video work will take all the transistors you can throw at it. Things like GarageBand and 3rd party software synthesizers can easily max out even latest gen iPad Pros. Not to mention games…
[+] samatman|5 years ago|reply
Not if you start juggling several streams of 4k video, it's not.

Also, one can play games on an iPad, and games will soak up arbitrary amounts of CPU and GPU power.

[+] urthor|5 years ago|reply
Never underestimate the PUBG Mobile/Genshin Impact demographic basically.

There's a lot of corner cases there where certain consumers value it a lot.

Also performance per watt is a huge deal. The expanded power envelope that improved PPW brings allows 120hz displays which are battery hogs.

[+] Cookingboy|5 years ago|reply
>What’s the point other than the coolness factor?

So Genshin Impact can look and run beautifully XD

[+] fomine3|5 years ago|reply
Apple's SoCs are really great and impressive, but I don't know why I need such specs in a phone. Even the A12 is still great for most workloads. Apple improves camera functionality with that performance, but just the camera? I wish they'd find useful uses other than cameras.
[+] centimeter|5 years ago|reply
I bought my iPad Pro because it offered the smoothest RAW photo proofing process of any product I've used. A lot of this is obviously up to software as well as hardware, but still - the beefy processor is useful here.
[+] megablast|5 years ago|reply
That is the way you want it. You don’t want it the other way.
[+] vmception|5 years ago|reply
Even assuming they were wasted, they lower the production cost for everyone else that is not wasting them.
[+] threeseed|5 years ago|reply
Many professionals are using the iPad Pro for audio (e.g. DAWs) and video (e.g. editing).
[+] jagged-chisel|5 years ago|reply
I'm sure Apple considers the iPad Pro a testing ground for these chips.
[+] rdw|5 years ago|reply
The article says that Apple isn't making full use of the 5nm process node, and blames lack of SRAM scaling for it (presumably due to the large amount of L3 cache). Is this a problem that all processors are about to hit, or is this going to be overcome once process engineers are more familiar with 5nm?
[+] ajross|5 years ago|reply
This bit didn't make a lot of sense to me. SRAM is routinely the MOST optimized and MOST worried-over aspect of silicon layout. SRAM cell architectures get hand-optimized carefully years in advance of any process improvements. They aren't just logic that gets spit out of generic tooling.

So while it would make sense that a new process would have new design rules that didn't map well to older EDA tooling, it's harder to see that argument holding for SRAM. If SRAM isn't scaling, why is general logic?

[+] FullyFunctional|5 years ago|reply
They optimize SRAM far in advance of introducing the process, after all this is one of the most critical components. It's very unlikely you'll see any further SRAM improvement on this process, but a process optimization can improve it (and everything else).
[+] hmottestad|5 years ago|reply
As someone with an 8 year old MacBook Pro that isn’t all that much slower than my 2019 16”.... I am so glad Apple is still pushing the envelope.

Back in the old days we used to get a doubling of performance every year or two. Then Intel got into trouble and has been producing the same chips for what 4-5 years? Essentially just minor changes.

It’s not really Apple’s fault that the rest of the industry is lagging so badly on performance. My Nokia was fast enough, of course no one “needs” a faster phone. But it only appears so much faster because the rest of the industry has stagnated.

Had Intel kept their tick-tock cadence, then these new "faster" phones would hardly be considered fast.

[+] raydev|5 years ago|reply
My 2013 high end 15" MBP gets pretty hot when I'm watching 1080p or higher videos. Chrome+YouTube is even worse, it even drops frames. Compiling Swift is even worse than that.

I have a work-provided 16" MBP that does all these things effortlessly.

[+] jagger27|5 years ago|reply
I was curious and looked up how this compared to the computer my family had 20 years ago: a Dell XPS tower with a Pentium III. The A14 has more than 1,000x as many transistors as what Intel was doing 20 years ago on a 128mm^2 die. Pretty incredible.
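The "more than 1,000x" claim checks out under reasonable assumptions; the transistor counts below are mine (roughly 9.5M for the 128 mm² Katmai Pentium III, roughly 11.8B for the A14), not from the comment:

```python
# Quick sanity check of the >1,000x transistor-count claim.
# Counts are assumptions: ~9.5M (Katmai Pentium III), ~11.8B (A14).
pentium3 = 9.5e6
a14 = 11.8e9
print(f"A14 / Pentium III: {a14 / pentium3:,.0f}x")  # well over 1,000x
```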
[+] nimish|5 years ago|reply
This is about 35% denser than Intel's 10nm, and it's not a theoretical number.
[+] pwinnski|5 years ago|reply
"A14 comes in at a cool 78% effective transistor density when compared to theoretical density." Wowsers. Apparently SRAM is the limiting factor.
[+] tpowell|5 years ago|reply
I was curious just how many chips per wafer they net. From this 2018 article[1], it appears to be ~530 at 5nm, which would be $32ea if that yield is accurate. The same article estimated 7nm chips came out to $18ea.

Apparently, R&D spend went up 50% from 7nm to 5nm. I'm curious to see how many flavors of the A14 Apple cooks up given the comparatively high die cost.

[1] https://wccftech.com/apple-5nm-3nm-cost-transistors/
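The chips-per-wafer figure can be approximated with the standard gross-dies estimate. The ~88 mm² A14 die size here is my assumption, and this is a gross count before yield losses, so it should come out higher than the article's ~530 net figure:

```python
# Gross dies per 300 mm wafer: area term minus an edge-loss
# correction for partial dies at the wafer's rim. The 88 mm^2
# A14 die size is an assumption, and yield is ignored.
from math import pi, sqrt

def gross_dies(die_area_mm2, wafer_diameter_mm=300):
    r = wafer_diameter_mm / 2
    return int(pi * r**2 / die_area_mm2
               - pi * wafer_diameter_mm / sqrt(2 * die_area_mm2))

print(gross_dies(88))  # ~730 gross dies before yield losses
```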

[+] amelius|5 years ago|reply
Isn't this mostly TSMC's achievement?

Can't their other clients reach the same density?

[+] ineedasername|5 years ago|reply
What's the relative heat dissipation between logic & SRAM on a chip? The article talks about layering as a potential way forward for SRAM, but that would come with more complex TDP management, unless SRAM isn't burning watts at the same rate as the rest of the chip.
[+] exabrial|5 years ago|reply
For comparison I was trying to nail down what Intel's 7nm process is, but I've found answers between 120nm and 215nm. If anyone knows, that'd be an interesting comparison.
[+] baron816|5 years ago|reply
Would hate to be the guy that had to count all those transistors.