borramakot
|
7 years ago
|
on: Efficient Methods and Hardware for Deep Learning [pdf]
Is there a way to apply the nonlinearity (ReLU) directly in the Winograd domain?
borramakot
|
7 years ago
|
on: Efficient Methods and Hardware for Deep Learning [pdf]
I spent a little time trying to understand exactly what DeePhi provides, but couldn't find any whitepapers or similar material. Do you know of any documentation on which of these approximations DeePhi supports?
borramakot
|
8 years ago
|
on: What it means to “disagree and commit” and how I do it (2016)
There is a relevant principle about development:
Hire and Develop the Best
Leaders raise the performance bar with every hire and promotion. They recognize exceptional talent, and willingly move them throughout the organization. Leaders develop leaders and take seriously their role in coaching others. We work on behalf of our people to invent mechanisms for development like Career Choice.
borramakot
|
8 years ago
|
on: Ask HN: Would you prefer paying $1 per 1000 API calls or $30 fixed monthly?
This seems strictly better than the $1 per 1,000 calls pricing, since at worst you only have to predict the order of magnitude of your usage.
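For a sense of where the two plans cross over, a quick break-even calculation (using the $1 per 1,000 calls and $30 flat figures from the question):

```python
# Break-even point between per-call and flat-rate pricing.
per_call_price = 1.0 / 1000  # $1 per 1,000 API calls
flat_monthly = 30.0          # $30 fixed per month

break_even_calls = flat_monthly / per_call_price
print(int(break_even_calls))  # 30000 calls/month; above this, flat rate wins
```

So anyone expecting more than roughly 30k calls a month comes out ahead on the flat rate.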
borramakot
|
8 years ago
|
on: Why I Quit Google to Work for Myself
It doesn't sound like they meant that Senior was the last possible level, just the last level to which promotion should be expected through "normal" growth and development. It looks like fewer than 1-2% of engineers at my company (a FAANG) are above "Senior".
borramakot
|
8 years ago
|
on: Benchmarking Google’s new TPUv2
What are the Amazon rumors?
borramakot
|
8 years ago
|
on: Cloud TPUs in Beta
Back of the envelope: a TPU costs a little more than 2x as much as a Volta on an AWS P3, and delivers a little less than 2x the performance (180 TFLOPS for the TPU vs. roughly 100 for Volta). On a raw performance/$ metric, I'm not sure the TPU is that interesting.
It might be worth it if I were willing to pay a large premium to get results back from an experiment faster by using lots of TPUs, since distributed training on GPUs doesn't seem easy yet.
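The back-of-envelope math above can be spelled out; the hourly prices here are approximate launch-era figures (assumptions on my part, roughly $6.50/hr for a Cloud TPU in beta and $3.06/hr for a p3.2xlarge with one V100), not exact quotes:

```python
# Perf-per-dollar sketch: TPU vs. one Volta (V100) on AWS P3.
# Prices are assumed approximate launch-era hourly rates.
tpu_tflops, tpu_price_hr = 180.0, 6.50  # Cloud TPU beta (assumed ~$6.50/hr)
gpu_tflops, gpu_price_hr = 100.0, 3.06  # p3.2xlarge, 1x V100 (assumed ~$3.06/hr)

tpu_perf_per_dollar = tpu_tflops / tpu_price_hr  # ~27.7 TFLOPS per $/hr
gpu_perf_per_dollar = gpu_tflops / gpu_price_hr  # ~32.7 TFLOPS per $/hr

print(round(tpu_perf_per_dollar, 1), round(gpu_perf_per_dollar, 1))
```

Under these assumptions the GPU comes out slightly ahead on raw perf/$, consistent with the comment's conclusion.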
borramakot
|
8 years ago
|
on: Virtex UltraScale+ FPGA Augmented with Co-Packaged – Community Forums
That's Catapult, right? I have read through some of their papers. It sounds like they might be offloading some network work onto the FPGAs, much the way AWS does with its custom Nitro card, but I've not been impressed with their attempts at data-processing improvements (which would give me a reason to use Azure). I haven't read all of their papers, but what I have read always sounded like an after-the-fact justification for the FPGAs. They might show the FPGAs beat CPUs at a machine learning task, but unless you're deciding whether to use FPGAs you already own, the real competition is GPUs, and they tended not to compare against a GPU.
Do you have a recommendation for a specific data-processing experiment of theirs I should check out? I feel like I've just missed the paper where they proved a real advantage over other hardware, and once I find it I'll understand. I generally respect the Azure teams and assume they know what they're doing, but I can't escape the hunch that what they're really doing is network acceleration, and that they're releasing it to the public cloud because they have these FPGAs sitting around anyway.
borramakot
|
8 years ago
|
on: Virtex UltraScale+ FPGA Augmented with Co-Packaged – Community Forums
The F1 instances on AWS are, I think, UltraScale+, though I don't know how fast Amazon actually lets you clock them.
borramakot
|
8 years ago
|
on: Virtex UltraScale+ FPGA Augmented with Co-Packaged – Community Forums
I see the appeal, but I'm surprised a good DSP or a mid-volume, old-node ASIC isn't the more common solution here. Do these need such specialized processing that FPGAs make economic sense?
borramakot
|
8 years ago
|
on: Virtex UltraScale+ FPGA Augmented with Co-Packaged – Community Forums
Thanks for these!
The first paper is actually one I've spent a significant amount of time trying to use, to the point of collaborating with one of the authors. His conclusion was that FPGAs used to be competitive with GPUs for approximated nets, but the Tesla GPUs were such a jump forward in practical network performance that it wasn't worth trying to compete outside specialized niches like binary nets.
The second paper was interesting; I can see why the problem they're trying to solve would be a good fit for FPGAs. However, I'm suspicious that they implemented an entirely different algorithm on the FPGA and didn't measure that algorithm's performance on GPUs. I'm all for using the best algorithm for the hardware, but I worry they simply used an overall better algorithm on the FPGAs and conflated the results.
borramakot
|
8 years ago
|
on: Virtex UltraScale+ FPGA Augmented with Co-Packaged – Community Forums
These all seem fair- I've been mostly looking at large scale FPGAs in a data center, a la F1 or Microsoft's Catapult. I hadn't given much thought to use as low-run hardware, at which I'm sure they excel.
borramakot
|
8 years ago
|
on: Virtex UltraScale+ FPGA Augmented with Co-Packaged – Community Forums
I hear this a lot, but every time I try to implement a specific algorithm (in crypto, compression, and ML so far), I find that a GPU beats the FPGA on practically every metric but power, especially total cost. No matter how nicely the problem seems to map to an FPGA, GPUs start from such a high performance baseline that I can't seem to beat them; the one exception so far has been some genomics algorithms.
Are there any really good papers, projects, or products that show FPGAs providing a major commercial benefit over a GPU?
borramakot
|
8 years ago
|
on: A General Neural Network Hardware Architecture on FPGA [pdf]
Are there performance numbers for this on, e.g., ResNet-50?
borramakot
|
8 years ago
|
on: Simulating RISC-V Clusters with FPGAs on AWS
To try this, do I need the full 16XL F1 instance, or can I run one node on the 2XL (single-FPGA) instance?
borramakot
|
8 years ago
|
on: Intel Announces Movidius Myriad X VPU, Featuring ‘Neural Compute Engine’
borramakot
|
9 years ago
|
on: Peter Thiel To Join Trump Transition Team
The clearest statement from him that I'm aware of is "The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive."
1:15 PM - 6 Nov 2012.
borramakot
|
9 years ago
|
on: Ask HN: Why is FizzBuzz more important than actual experience?
Out of curiosity, I tried literal FizzBuzz on a couple of new-grad candidates (my job is in software, but we attract a lot of computer engineers). About half were able to do it without help. I no longer use FizzBuzz, but I'll use other very trivial questions to end an interview early if I suspect a resume has been significantly exaggerated.
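For anyone unfamiliar, the entire exercise being discussed is just this (a standard formulation; the exact wording of the prompt varies):

```python
# Classic FizzBuzz: for 1..100, print "Fizz" for multiples of 3,
# "Buzz" for multiples of 5, "FizzBuzz" for multiples of both,
# and the number itself otherwise.
def fizzbuzz(n: int) -> str:
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

for i in range(1, 101):
    print(fizzbuzz(i))
```

That a screening question this small filters out a meaningful fraction of candidates is the whole point of the thread.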