Software. Software. Software. Just two companies, Google and NVIDIA, have publicly launched a viable service or software stack. Just two companies have successfully written a "sufficiently advanced compiler". Just two companies actually have a product. And Google refuses to step into the arena and actually compete with NVIDIA. Man, what a time we live in.
Applied Brain Research has software called Nengo (www.nengo.ai) explicitly for developing neural network models and compiling them to different backends, including CPUs, GPUs, and neuromorphic hardware (Intel's Loihi, SpiNNaker, SpiNNaker 2, BrainDrop). It's been battle-tested over 10+ years of model development, was used to build the world's largest functional brain model (https://bit.ly/2VNGgSX), and integrates deep learning with spiking neural networks. I'd be interested to hear your thoughts on it.
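To make the spiking side of this concrete, here is a toy leaky integrate-and-fire neuron in plain Python. This is not Nengo's API, just an illustrative sketch of the kind of unit such frameworks compile onto neuromorphic backends; all constants are made up.

```python
# Toy leaky integrate-and-fire (LIF) neuron: the basic unit that
# frameworks like Nengo map onto neuromorphic backends such as Loihi.
# All constants here are illustrative, not taken from Nengo.

def simulate_lif(current, dt=0.001, tau=0.02, v_threshold=1.0, steps=1000):
    """Return spike times for a constant input current."""
    v = 0.0
    spikes = []
    for step in range(steps):
        # Leaky integration: dv/dt = (current - v) / tau
        v += dt * (current - v) / tau
        if v >= v_threshold:
            spikes.append(step * dt)
            v = 0.0  # reset the membrane potential after a spike
    return spikes

spikes = simulate_lif(current=1.5)
print(f"{len(spikes)} spikes in 1 s of simulated time")
```

A constant supra-threshold input produces a regular spike train; a subthreshold input (here, anything below 1.0) never fires.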
I wonder if WebGPU will reduce dependence on CUDA, especially as TensorFlow is being ported to WebGPU. With WebGPU's improved performance and utility, and the fact that it runs on top of Vulkan, Metal, and D3D with any GPU that has drivers for those, I wonder if DL folks will find it tempting to use TFJS/WebGPU via Electron or the browser and just be done with CUDA (i.e. break or soften NVIDIA's monopoly).
Some of the numbers in that table do not make any sense, which makes me question the quality of the entire article.
Where are the numbers for the Cerebras chip coming from?
- How do you have a TDP of 180W for an entire wafer of chips?
- Why is there a peak FP32 number when they are clearly working with FP16?
Each of these chips is a completely different architecture and it makes no sense to compare them at this level. The only meaningful comparison is actual performance in applications because that reflects how the entire system will be used.
In the table, the figures are for a single die in the wafer, to make a meaningful comparison with the other chips listed (there is a table footnote about this). The 15 kW is the power consumption of the whole wafer (a detail I think was mentioned in the Hot Chips presentation). Why do you say they are clearly working with FP16? Are there any public details on this?
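A quick sanity check on how the two power figures can both be right, assuming the wafer carries roughly 84 die-sized tiles (a commonly reported figure for the WSE, treated here as an assumption, not a spec):

```python
# Rough reconciliation of the per-die TDP and the whole-wafer power,
# assuming ~84 die-sized tiles per wafer (an assumption, not a spec).
wafer_power_w = 15_000
dies_per_wafer = 84

per_die_power_w = wafer_power_w / dies_per_wafer
print(f"~{per_die_power_w:.0f} W per die")  # ~179 W, close to the table's 180 W
```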
One of the numbers that jumped out at me as being very unusual about the Cerebras chip was this one: "Speculated clock speed of ~1 GHz and 15 kW power consumption."
"Ascend 910 is used for AI model training. In a typical training session based on ResNet-50, the combination of Ascend 910 and MindSpore is about two times faster at training AI models than other mainstream training cards using TensorFlow."
I think it would be great to clarify that what is commonly referred to as "TPU v2" (e.g. on GCP pricing, also what is shown in the image in this article), consists of 4 such modules with 8 cores total, which gives a more commonly quoted value of 180 TFLOPs.
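The arithmetic behind the two commonly quoted figures, assuming the 4-module, 8-core board configuration described above:

```python
# Per-module vs per-board peak throughput for "TPU v2", assuming the
# 4-module, 8-core "Cloud TPU v2" board configuration described above.
tflops_per_module = 45   # figure quoted in the article
modules_per_board = 4
cores_per_module = 2

board_tflops = tflops_per_module * modules_per_board
board_cores = cores_per_module * modules_per_board
print(board_tflops, board_cores)  # 180 TFLOPs, 8 cores
```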
Are deep neural networks really that widely applicable that it's profitable to design custom chips for them? What about other models of AI that involve, say, discrete math or graph search?
Consumer Turing cards are about the closest you get right now. They're pretty reasonable bang for the buck for training. They have tensor cores (not quite as many as Volta), but the entire chip runs at a higher clock rate, and the price/performance is better if you don't mind losing a gig or so of RAM and some memory bandwidth.
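A rough price/performance comparison; the prices and peak FP32 numbers below are approximate launch-era figures, assumptions rather than official specs:

```python
# Rough price/performance of a consumer Turing card vs a Volta card.
# Prices and TFLOPs are approximate launch-era figures (assumptions).
cards = {
    # name: (approx_price_usd, approx_peak_fp32_tflops)
    "RTX 2080 Ti (Turing)": (1199, 13.4),
    "Titan V (Volta)":      (2999, 14.9),
}

for name, (price, tflops) in cards.items():
    print(f"{name}: {tflops / price * 1000:.1f} TFLOPs per $1000")
```

On these assumed numbers, the Turing consumer card delivers roughly twice the FP32 throughput per dollar.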
Kendryte K210 maybe? It's cheap as chips (pun intended!); I think I got mine for £40 including shipping. https://kendryte.com/ Note this is only for inference. For training you'll have to use a GPGPU or one of the chips in this article.
Stuff they acquired. This originates from Nervana Systems, and I think there are also some Altera chips out there. Intel's custom foundry offering has historically been poor, so chances are anyone they buy will have been using someone else (why take the risk and change that?).
> CONCLUSION
> Graphics has just been reinvented. The new NVIDIA Turing GPU architecture is the most advanced and efficient GPU architecture ever built. Turing implements a new Hybrid Rendering model that combines real-time ray tracing, rasterization, AI, and simulation. Teamed with the next generation graphics APIs, Turing enables massive performance gains and incredibly realistic graphics for PC games and professional applications.
AWS GPU compute is extremely expensive. If this is due to datacenter licensing costs, I hope they come out with their own hardware soon to reduce these costs. If, on the other hand, it's because their value-add is not in renting out the hardware but in burst scalability, then I'm less optimistic that they'd cannibalize their own cloud product.
Currently, it only takes about one month of AWS time to break even on buying a consumer GPU like the RTX 2080 Ti. For training purposes AWS doesn't seem to make sense.
- Just looked up the numbers, and Google TPUs are pretty similar in terms of pricing. I think any AWS equivalent would probably be just as expensive compared to a DIY PC.
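A back-of-the-envelope version of that break-even claim; both prices are assumptions based on approximate 2019 figures:

```python
# Break-even estimate for buying a GPU vs renting cloud time.
# Both figures are assumptions (approximate 2019 prices), not quotes.
gpu_price_usd = 1199        # approx RTX 2080 Ti launch price
aws_rate_usd_per_hr = 3.06  # approx on-demand p3.2xlarge (one V100)

break_even_hours = gpu_price_usd / aws_rate_usd_per_hr
print(f"~{break_even_hours:.0f} hours (~{break_even_hours / 24:.0f} days of continuous use)")
```

Roughly 16 days of continuous use, which stretches to about a month if the card isn't training around the clock.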
> I am surprised Amazon has not jumped in the game
Why should they? There is not a lot of money to be gained from renting a niche product, in comparison to the enormous capital expenditure for anything hardware related.
Lots of dotcom companies burned themselves badly chasing trendsetters with custom silicon. A cookie-cutter 40nm SoC may cost "just" 10M today, but by involving yourself in the custom silicon game you risk losing at it. Not to mention that your operations troubles will increase n-fold.
Managing operations of a hosting business with hundreds of thousands of customers is hard enough. Logistics, server lifecycle, DC management, managing procurement contracts with unruly OEMs... Now try to add all the troubles you have with chipmakers on top of that. It will become a nightmare.
Edit: I didn't see that the parent meant a hardware accelerator created by Amazon itself. Thanks to @jsty and @paol for pointing this out. An ASIC by Amazon was announced last year and is known as 'AWS Inferentia'.
sabalaba | 6 years ago
And no, AMD doesn't count. ROCm is a mess.
tdewolf | 6 years ago
trust07007707 | 6 years ago
nappy-doo | 6 years ago
I don't have first hand experience using it, but from people I know, it does work.
justicezyx | 6 years ago
cracker_jacks | 6 years ago
jwhanlon | 6 years ago
inetsee | 6 years ago
15 kW power consumption for 1 chip?!?
Symmetry | 6 years ago
https://fuse.wikichip.org/news/2755/analog-ai-startup-mythic...
That's just slideware at the moment.
carlsborg | 6 years ago
"Ascend 910 is used for AI model training. In a typical training session based on ResNet-50, the combination of Ascend 910 and MindSpore is about two times faster at training AI models than other mainstream training cards using TensorFlow."
https://www.huawei.com/en/press-events/news/2019/8/Huawei-As...
edit: The software framework "MindSpore will go open source in the first quarter of 2020."
sgt101 | 6 years ago
I wonder how brittle the performance will be on other models, such as transformers and DRL, versus CNNs like ResNet.
lopuhin | 6 years ago
> I’m focusing on chips designed for training
TPU v1 is designed for inference AFAIK.
> TPU v2: 45 TFLOPs
hoxmark | 6 years ago
yboris | 6 years ago
https://www.jameswhanlon.com/new-chips-for-machine-intellige...
They claim to have some great features. Anyone know if a consumer version is coming, or whether any release dates have been promised?
brookhaven_dude | 6 years ago
zapnuk | 6 years ago
Or won't they be able to be as price/performance efficient as (NVIDIA) GPUs?
dgacmu | 6 years ago
Q6T46nT668w6i3m | 6 years ago
rwmj | 6 years ago
JoeDaDude | 6 years ago
https://coral.withgoogle.com/docs/accelerator/datasheet/
onion2k | 6 years ago
Would the NVIDIA Jetson count?
The_rationalist | 6 years ago
suyash | 6 years ago
alexhutcheson | 6 years ago
chips2001 | 6 years ago
How to make your own AI chip
ckastner | 6 years ago
Intel has stuff made by other foundries?
kingosticks | 6 years ago
HNLurker2 | 6 years ago
Quoted from Nvidia Turing datasheet
steve19 | 6 years ago
nl | 6 years ago
I believe it is available via Elastic Inference[1][2] (or maybe soon will be).
[0] https://aws.amazon.com/machine-learning/inferentia/
[1] https://docs.aws.amazon.com/elastic-inference/latest/develop...
[2] https://aws.amazon.com/machine-learning/elastic-inference/
Jack000 | 6 years ago
baybal2 | 6 years ago
PeterStuer | 6 years ago
https://aws.amazon.com/ec2/instance-types/#Accelerated_Compu...
https://aws.amazon.com/machine-learning/inferentia/
https://perspectives.mvdirona.com/2018/11/aws-inferentia-mac...
kowarie | 6 years ago