
New chips for machine intelligence

172 points | jwhanlon | 6 years ago | jameswhanlon.com

51 comments

[+] sabalaba|6 years ago|reply
Software. Software. Software. Just two companies, Google and NVIDIA, have publicly launched a viable service or software stack. Just two companies have successfully written a "sufficiently advanced compiler". Just two companies actually have a product. And Google refuses to step into the arena and actually compete with NVIDIA. Man, what a time we live in.

And no, AMD doesn't count. ROCm is a mess.

[+] tdewolf|6 years ago|reply
Disclaimer: am a co-founder of this company

Applied Brain Research has software called Nengo (www.nengo.ai) explicitly for developing neural network models and compiling them to different backends, including CPUs, GPUs, and neuromorphic hardware (Intel's Loihi, SpiNNaker, SpiNNaker 2, BrainDrop). It's been battle-tested over more than 10 years of model development, was used to build the world's largest functional brain model (https://bit.ly/2VNGgSX), and integrates deep learning with spiking neural networks. Would be interested to hear your thoughts on it.
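For a concrete sense of the workflow, here is a minimal sketch using the standard public Nengo API; retargeting to another backend is roughly a matter of swapping the Simulator class (e.g. nengo_loihi or nengo_dl provide their own), though exact package names and options vary by backend:

    import numpy as np
    import nengo

    # Tiny model: a sine-wave input decoded through a spiking ensemble.
    with nengo.Network() as model:
        stim = nengo.Node(lambda t: np.sin(2 * np.pi * t))
        ens = nengo.Ensemble(n_neurons=100, dimensions=1)
        nengo.Connection(stim, ens)
        out = nengo.Probe(ens, synapse=0.01)

    # Reference CPU simulator; neuromorphic/GPU backends expose their own
    # Simulator class with roughly the same interface.
    with nengo.Simulator(model) as sim:
        sim.run(1.0)

    print(sim.data[out].shape)  # (timesteps, 1)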

[+] trust07007707|6 years ago|reply
I wonder if WebGPU will reduce dependence on CUDA, especially as TensorFlow is being ported to WebGPU. Given WebGPU's improved performance and utility, and the fact that it runs on top of Vulkan, Metal, and D3D with any GPU that has drivers for those, I wonder if DL folks will find it more tempting to use TFJS/WebGPU via Electron or the browser and just be done with CUDA (i.e. break or soften NVIDIA's monopoly).
[+] justicezyx|6 years ago|reply
Is there a good summary of the state of the art in software infrastructure for ML?
[+] cracker_jacks|6 years ago|reply
Some of the numbers in that table do not make any sense, which makes me question the quality of the entire article.

Where are the numbers for the Cerebras chip coming from?

- How do you have a TDP of 180W for an entire wafer of chips?

- Why is there a peak FP32 number when they are clearly working with FP16?

Each of these chips is a completely different architecture and it makes no sense to compare them at this level. The only meaningful comparison is actual performance in applications because that reflects how the entire system will be used.

[+] jwhanlon|6 years ago|reply
In the table, the figures are for a single die in the wafer. This is to make a meaningful comparison with the other chips listed (there is a table footnote for this). The 15 kW is the power consumption of the whole wafer (a detail I think was mentioned in the Hot Chips presentation). Why are they clearly working with FP16? Are there any public details on this?
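For what it's worth, the per-die and per-wafer numbers are roughly consistent if you take the commonly cited figure of ~84 dies on the wafer (my assumption, not from the article):

    # Rough sanity check: whole-wafer power vs. per-die TDP.
    # Assumes the commonly cited ~84 dies on the Cerebras wafer.
    dies_per_wafer = 84
    wafer_power_w = 15_000  # speculated 15 kW for the full wafer

    print(wafer_power_w / dies_per_wafer)  # ~179 W per die, close to the 180 W in the table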
[+] inetsee|6 years ago|reply
One of the numbers that jumped out at me as being very unusual about the Cerebras chip was this one: "Speculated clock speed of ~1 GHz and 15 kW power consumption."

15 kW power consumption for 1 chip?!?

[+] carlsborg|6 years ago|reply
Huawei looks like it has a strong game here:

"Ascend 910 is used for AI model training. In a typical training session based on ResNet-50, the combination of Ascend 910 and MindSpore is about two times faster at training AI models than other mainstream training cards using TensorFlow."

https://www.huawei.com/en/press-events/news/2019/8/Huawei-As...

edit: The software framework "MindSpore will go open source in the first quarter of 2020."

[+] sgt101|6 years ago|reply
When I first read about it I wondered if it would be a mobile chip - but apparently not (with a TDP of 300 W).

I wonder how brittle the performance will be for other models such as transformers and DRL, versus CNNs like ResNet.

[+] lopuhin|6 years ago|reply
Small corrections:

> I’m focusing on chips designed for training

TPU 1 is designed for inference AFAIK.

> TPU v2: 45 TFLOPs

I think it would be great to clarify that what is commonly referred to as "TPU v2" (e.g. on GCP pricing, also what is shown in the image in this article) consists of 4 such modules with 8 cores total, which gives the more commonly quoted value of 180 TFLOPs.
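For anyone reconciling the two figures, the arithmetic is just (per-module number taken from the article, module/core counts as quoted above):

    # One Cloud TPU v2 device (as priced on GCP) = 4 chips x 2 cores.
    tflops_per_chip = 45
    chips_per_device = 4
    cores_per_chip = 2

    print(chips_per_device * tflops_per_chip)  # 180 TFLOPs per device
    print(chips_per_device * cores_per_chip)   # 8 cores per device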

[+] hoxmark|6 years ago|reply
FYI: "DISCLAIMER: I work at Graphcore, and all of the information given here is lifted directly from the linked references below."
[+] brookhaven_dude|6 years ago|reply
Are deep neural networks really so widely applicable that it's profitable to design custom chips for them? What about other models of AI that involve, say, discrete math or graph search?
[+] zapnuk|6 years ago|reply
Will there be a consumer-grade TPU in the near future?

Or won't they be able to match the price/performance of (Nvidia) GPUs?

[+] dgacmu|6 years ago|reply
Consumer Turing cards are about the closest you get right now. They're pretty reasonable bang for the buck for training. They have tensor cores - not quite as many as Volta, but the entire chip runs at a higher clock rate and the price/performance is better if you don't mind losing a gig or so of RAM and some memory bandwidth.
[+] Q6T46nT668w6i3m|6 years ago|reply
Apple’s neural engine in their A-series of SoCs.
[+] rwmj|6 years ago|reply
Kendryte K210 maybe? It's cheap as chips (pun intended!) I think I got mine for £40 including shipping. https://kendryte.com/ Note this is only for inference. For training you'll have to use a GPGPU or one of the chips in this article.
[+] onion2k|6 years ago|reply
> Will there be a consumer-grade TPU in the near future?

Would the nVidia Jetson count?

[+] The_rationalist|6 years ago|reply
ARM SoCs already have NPUs, but I guess they don't count.
[+] suyash|6 years ago|reply
This list is missing mobile phone chips that are specially designed for Deep Learning.
[+] alexhutcheson|6 years ago|reply
Existing mobile phone chips are designed for inference, not training. The list is explicitly restricted to chips that are designed for training.
[+] ckastner|6 years ago|reply
> Intel NNP-T TSMC 16FF+

Intel has stuff made by other foundries?

[+] kingosticks|6 years ago|reply
Stuff they acquired. This is something originating from Nervana Systems, and I think there are also some Altera chips out there. Intel's custom foundry offering has historically been poor, so chances are anyone they buy will have been using someone else (why take the risk and change that?).
[+] HNLurker2|6 years ago|reply
'''CONCLUSION Graphics has just been reinvented. The new NVIDIA Turing GPU architecture is the most advanced and efficient GPU architecture ever built. Turing implements a new Hybrid Rendering model that combines real-time ray tracing, rasterization, AI, and simulation. Teamed with the next generation graphics APIs, Turing enables massive performance gains and incredibly realistic graphics for PC games and professional applications.'''

Quoted from Nvidia Turing datasheet

[+] steve19|6 years ago|reply
I am surprised Amazon has not jumped into the game, renting out an accelerator like Google does with TPUs.
[+] Jack000|6 years ago|reply
AWS GPU compute is extremely expensive. If this is due to datacenter licensing costs, I hope they come out with their own hardware soon to reduce those costs. If, on the other hand, it's because their value-add is not in renting out the hardware but in burst scalability, then I'm less optimistic that they'd cannibalize their own cloud product.

Currently it only takes about a month to break even if you buy a consumer GPU like the RTX 2080 Ti instead of paying for AWS time (rough arithmetic sketched below). For training purposes it doesn't seem to make sense.

Edit: just looked up the numbers and Google TPUs are pretty similar in terms of pricing. I think any AWS equivalent would probably be just as expensive compared to a DIY PC.
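As a rough illustration of the break-even claim above (the prices here are my own placeholder assumptions, not figures from the thread):

    # Break-even: buying an RTX 2080 Ti vs. renting a comparable AWS GPU.
    # Both prices are assumptions for illustration; plug in current numbers.
    gpu_cost_usd = 1200        # assumed retail price of an RTX 2080 Ti
    aws_usd_per_hour = 3.0     # assumed on-demand rate for a comparable instance

    break_even_hours = gpu_cost_usd / aws_usd_per_hour
    print(break_even_hours, break_even_hours / 24)  # 400 hours, ~17 days of continuous training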

[+] baybal2|6 years ago|reply
> I am surprised Amazon has not jumped in the game

Why should they? There is not a lot of money to be gained from renting a niche product compared to the enormous capital expenditure for anything hardware-related.

Lots of dotcom companies burned themselves badly chasing trendsetters with custom silicon. A cookie-cutter 40nm SoC may cost "just" 10M today, but by getting into the custom silicon game you risk losing at it. Not to mention that your operational troubles will increase n-fold.

Managing the operations of a hosting business with hundreds of thousands of customers is hard enough. Logistics, server lifecycle, DC management, managing procurement contracts with unruly OEMs... Now add all the troubles you'd have with chipmakers on top of that. It will become a nightmare.

[+] PeterStuer|6 years ago|reply
You mean something not like EC2 Accelerated Computing Instances?

https://aws.amazon.com/ec2/instance-types/#Accelerated_Compu...

Edit: I didn't see that the parent meant a hardware accelerator created by Amazon itself. Thx to @jsty and @paol for pointing this out. An ASIC by Amazon was announced last year and is known as 'AWS Inferentia'.

https://aws.amazon.com/machine-learning/inferentia/

https://perspectives.mvdirona.com/2018/11/aws-inferentia-mac...

[+] kowarie|6 years ago|reply
What are your thoughts on how federated learning might change the HW landscape for edge devices?