Wow. It has 15.3 billion transistors. It's amazing we can buy something with that many engineered parts. Even if the transistors are the result of duplication and lithography, it's an astonishing number. Creating the mask must have taken a while.
[+] [-] tomkinstinch|10 years ago|reply
Does anyone know what the failure rate is for the transistors (or transistors of a similar production process)? Do they all have to function to spec for a GPU, or are malfunctioning transistors disabled or corrected? What does the QC process look like?
[+] [-] djcapelis|10 years ago|reply
Exact failure and bin rates at most semiconductor companies are considered deeply guarded internal trade secrets. Other than pure scale, yield rates are one of the biggest factors in semiconductor cost and profit margin.
And the answer is: it depends. If you lose transistors in critical, non-redundant logic, you can lose the entire chip. But the vast majority of the transistors on each chip belong to caches or to the many, many duplicated GPU cores, which, if they fail tests, can be disabled or downclocked, and the chip is then binned into the appropriate product line.
With GPUs this is much easier than with other types of chips, because the level of functional duplication allows a lot of flexibility. If a core is bad, you use a different one, and GPU cores are small enough that they'd be stupid not to put some spares on each chip. Same with memories.
Generally one can safely assume:
* Most chips that come off the line are binned into a lower category and do not function at max spec across the board, which is why the price jump is so steep at the extreme upper end of a hardware series.
* With ASIC lithography, most transistor malfunctions aren't correctable; you mostly have to either downclock (for some types of faults) or disable (for the rest) the affected piece.
* Rates of transistor malfunction are still incredibly fucking amazingly phenomenally low. With 15B transistors on a chip, you can barely afford a failure rate of even one in a billion.
So your line has to be, as the kids say: on fleek.
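To put rough numbers on the binning story above, here's a toy sketch of the classic Poisson yield model plus the "one in a billion" arithmetic from the last bullet. The defect density, die area, and core counts are made-up illustrative values, not anyone's real figures:

```python
import math

def poisson_yield(defects_per_cm2, die_area_cm2):
    """Classic Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-defects_per_cm2 * die_area_cm2)

def expected_bad_transistors(n_transistors, per_transistor_failure_rate):
    """Expected number of malfunctioning transistors on one die."""
    return n_transistors * per_transistor_failure_rate

def bin_chip(good_cores, full_spec=3840, salvage_spec=3584):
    """Toy binning rule: ship full spec, salvage with spares, or scrap."""
    if good_cores >= full_spec:
        return "flagship"
    if good_cores >= salvage_spec:
        return "cut-down SKU (bad cores disabled)"
    return "scrap / lower product line"

# At a hypothetical defect density of 0.1/cm^2, a big ~6 cm^2 die still
# loses almost half its dies to a defect landing somewhere on the chip:
yield_frac = poisson_yield(0.1, 6.0)            # ~0.55

# And a per-transistor failure rate of one in a billion would put ~15
# bad transistors on every 15.3B-transistor die, hence the redundancy:
bad = expected_bad_transistors(15.3e9, 1e-9)    # ~15.3
```

This is why duplicated cores matter so much: a handful of random defects almost never kills a salvage-capable die, only shifts which bin it lands in.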
[+] [-] tcas|10 years ago|reply
I don't have the answers to your questions (and I don't think anyone can share actual failure rates), but I'd direct you to this video, which goes over a lot of modern chip fabrication techniques, circa 2009: https://www.youtube.com/watch?v=NGFhc8R_uO4
It's crazy stuff.
There are wafer test machines that interface with the wafer directly and do some testing (these are $$$$), JTAG-type tests, which access parts of the chip out of band, and functional testing. Some products, like SD cards, actually have a microcontroller on board that provides the test routines and error correction without the need for an expensive machine. Design for test is extremely important.
I'm by no means an expert however, I mostly deal with JTAG and functional tests.
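At its simplest, a functional test just replays vectors through the device and compares the responses against a golden software model. A toy sketch (the device, the model, and the stuck-at fault are all hypothetical):

```python
def golden_model(x):
    """Software reference for what the device under test should output."""
    return (x * 3) & 0xFF

def functional_test(dut, vectors):
    """Apply each test vector; return the vectors where the DUT disagrees."""
    return [v for v in vectors if dut(v) != golden_model(v)]  # [] == pass

# Simulated devices: one healthy, one with output bit 0 stuck at 1.
healthy = lambda x: (x * 3) & 0xFF
stuck_at_1 = lambda x: ((x * 3) & 0xFF) | 0x01
```

A healthy part returns an empty failure list; the stuck-at part fails on every vector whose correct output has bit 0 clear.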
Hasn't half precision (16-bit float) been in Nvidia GPUs forever? I could swear it was already available back in the very first shader-capable GeForce FX days.
[+] [-] mtgx|10 years ago|reply
IBM's POWER9 and future Power ISA 3.0 CPUs, which should increasingly focus on deep-learning/big-data optimization, combined with Nvidia's GPUs, which will increasingly optimize for the same, should make an interesting match over the next 5+ years.
On the gaming side, I do hope they continue to optimize for VR. I think AMD is even slightly ahead of them on that.
[+] [-] dr_zoidberg|10 years ago|reply
* ~5.5 TFLOPs on FP64
* "About 2x" performance on FP32 (so about 11 TFLOPs)
* "Up to 2x" performance on FP16 (compared to FP32, so about 22 TFLOPs)
* FP16 is also aimed at neural-net training: when the weights of the net are FP16, the representation is more compact.
* 3840 general-purpose processors.
* More/better texture units, memory units, etc. So it's not just about raw power, but also about a better design.
Guess that's about it for the important stuff. I just skimmed the article, reading a bit here and there, but that seemed to be the most remarkable stuff.
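The arithmetic behind those bullets is roughly cores x clock x 2 (one fused multiply-add counts as two FLOPs). A sketch, where the clock is my guess, picked only so the output lines up with the quoted numbers:

```python
def peak_tflops(cores, clock_ghz, flops_per_core_per_cycle=2):
    """Peak throughput = cores x clock x FLOPs/cycle (2 for an FMA)."""
    return cores * clock_ghz * flops_per_core_per_cycle / 1000.0

# ~1.43 GHz is a hypothetical clock chosen to reproduce the article's figures:
fp32 = peak_tflops(3840, 1.43)   # ~11 TFLOPs
fp16 = 2 * fp32                  # packed pairs: two FP16 ops per FP32 lane, ~22
fp64 = fp32 / 2                  # 1:2 FP64:FP32 ratio on this part, ~5.5

# FP16 also halves the memory footprint of a network's weights vs FP32:
n_params = 60_000_000
fp32_bytes = n_params * 4
fp16_bytes = n_params * 2
```

The "up to 2x" hedge on FP16 exists because the doubling only applies when operations can actually be issued in packed pairs.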
[+] [-] drewm1980|10 years ago|reply
Comparison to CPUs is also important IMHO, and for that you need to be aware that the terminology is very different.
What Nvidia calls a "core" is more like one lane of a SIMD unit on a CPU.
What Nvidia calls an "SM" is closer to a CPU core.
There is more to it than that: GPU cores are more independent than lanes in a CPU vector unit, but GPU "SM"s are less independent than CPU cores.
It's also worth keeping in mind that mediocre CPU code will run circles around mediocre GPU code. To get the GPU magic, you have to invest a lot of effort in tuning for the architecture.
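The core-vs-SIMD-lane analogy is easy to see in code. A minimal sketch using NumPy arrays as a stand-in for wide lanes (SAXPY chosen just as a familiar example):

```python
import numpy as np

# CPU-style scalar loop: one core walks the elements one at a time.
def saxpy_scalar(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# SIMD/SIMT-style: every lane (each GPU "core") applies the same operation
# to its own element, advancing in lockstep like one very wide instruction.
def saxpy_vector(a, x, y):
    return a * x + y

x = np.arange(4, dtype=np.float32)
y = np.ones(4, dtype=np.float32)
```

The tuning effort mentioned above is largely about keeping all those lanes busy and fed with memory; code that serializes or diverges per element throws the width away.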
[+] [-] gnoway|10 years ago|reply
https://rosettacode.org/wiki/Call_a_function_in_a_shared_lib...
[+] [-] analognoise|10 years ago|reply
I was just looking at controlling NGSpice from FreePascal - one of the examples of running a shared instance of NGSpice is done in a Pascal dialect:
http://ngspice.sourceforge.net/shared.html
I like Pascal much better than C++ and think the portable Lazarus GUI toolkit is pretty damn trick. Check it out: http://www.lazarus-ide.org/
[+] [-] JustSomeNobody|10 years ago|reply
Toasty.
[+] [-] timeu|10 years ago|reply
But seriously, it's quite an impressive piece of hardware.