top | item 19240978


kjaer | 7 years ago

For finger tracking, version 1 used random forests [1] because of the performance/hardware-budget trade-off: they're harder to train than a traditional deep learning model, but much cheaper to evaluate on the device (branching is essentially free on a CPU).
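A minimal sketch of why per-pixel tree evaluation is so cheap on a CPU: each pixel just walks a few branches over depth-probe comparisons. The node layout, feature test, and labels below are hypothetical, loosely modeled on Kinect-style depth forests, not the paper's actual implementation:

```python
# Illustrative sketch (not the paper's code): per-pixel evaluation of one
# decision tree from a depth-image random forest, Kinect-style.

def classify_pixel(depth, x, y, tree):
    """Walk a binary tree; each internal node compares the depth
    difference of two offset probes against a learned threshold."""
    node = tree
    while "label" not in node:                    # internal node
        (du1, dv1), (du2, dv2), thresh = node["test"]
        z = depth[y][x]                           # scale offsets by depth
        p1 = depth[y + int(dv1 / z)][x + int(du1 / z)]
        p2 = depth[y + int(dv2 / z)][x + int(du2 / z)]
        node = node["left"] if (p1 - p2) < thresh else node["right"]
    return node["label"]                          # leaf: class prediction

# Tiny hand-built tree: one split, two leaves (labels are made up)
tree = {
    "test": ((1, 0), (0, 1), 0.0),
    "left": {"label": "palm"},
    "right": {"label": "finger"},
}
flat = [[1.0] * 4 for _ in range(4)]              # flat 4x4 depth patch
label = classify_pixel(flat, 1, 1, tree)          # probes tie -> right leaf
```

Each internal node costs two array reads and one comparison, so a forest of shallow trees fits comfortably in a mobile CPU budget.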

Version 2 uses a deep learning accelerator [2], which makes the heavier computation of DNNs feasible (floating-point operations that would be much more expensive on the CPU).

From an engineering perspective, I just love seeing how it touches all abstraction layers of the stack, and the types of solutions that come out of thinking about the silicon and the high-level ML models at the same time.

[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...

[2] https://homes.cs.washington.edu/~kklebeck/lebeck-tech18.pdf


kevingadd | 7 years ago

Unless the paper specifically calls it out in a spot I didn't see, it's not necessarily the case that the DNN operations are floating-point. Some networks use FP16 or FP32 (my understanding is that this is very common during training), but production inference with a trained network can run in int8 or even int4. You can see this in what the Tensor cores in modern GeForce cards expose support for, and in what Google's latest cloud TPUs support: NVIDIA's latest cores expose small matrices of FP16, INT8, and INT4 (I've seen suggestions that they do FP32 as well, but it's not clear whether that's accurate), while Google's expose huge matrices in different formats (TPUv1 was apparently INT8; TPUv2 appears to be a mix of FP16 and FP32).
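To make the int8 point concrete, here's a toy sketch of symmetric post-training quantization: floats are mapped to int8 via a per-tensor scale, and the dot product runs entirely in integer arithmetic. This scheme is a simplified illustration; real frameworks add zero-points, per-channel scales, and saturating accumulators.

```python
# Toy symmetric int8 quantization sketch (illustrative, not any
# framework's actual scheme).

def quantize(values):
    """Map floats to int8 with a symmetric per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def int_dot(qa, sa, qb, sb):
    """Integer multiply-accumulate; one float rescale at the end."""
    acc = sum(a * b for a, b in zip(qa, qb))      # fits in an int32
    return acc * sa * sb

w = [0.5, -1.27, 0.02]                            # made-up weights
x = [1.0, 2.0, -3.0]                              # made-up activations
qw, sw = quantize(w)
qx, sx = quantize(x)
approx = int_dot(qw, sw, qx, sx)                  # integer path
exact = sum(a * b for a, b in zip(w, x))          # float reference
```

The integer result lands close to the float dot product, and all the heavy multiply-accumulate work stays in cheap int8/int32 units.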

In non-DNN image processing it's quite common to use ints as well (iDCT, FFT, etc) for the potential performance gains vs. floating point.
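For example, a fixed-point multiply in Q15 format (a common trick in integer iDCT/FFT kernels; the format choice here is just for illustration) keeps everything in integer registers:

```python
# Toy Q15 fixed-point multiply: reals scaled by 2^15 so arithmetic stays
# in integers. Real integer-DCT/FFT kernels are built on this idea, with
# careful attention to rounding and overflow.

Q = 15                                  # fractional bits

def to_q15(x):
    """Encode a real in [-1, 1) as a Q15 integer."""
    return int(round(x * (1 << Q)))

def q15_mul(a, b):
    """Multiply two Q15 values; +half-ulp before the shift rounds."""
    return (a * b + (1 << (Q - 1))) >> Q

# 0.5 * 0.25 computed entirely in integer arithmetic
a, b = to_q15(0.5), to_q15(0.25)
result = q15_mul(a, b) / (1 << Q)       # back to float only for display
```

On hardware without fast FP units (or where integer SIMD has more throughput), this trades a bounded amount of precision for a large speedup.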

munib_ca | 7 years ago

When you mention version 1 and 2, are you referring to the original HoloLens and the new one?

kjaer | 7 years ago

Yes. I'm referring to the original HoloLens and HoloLens 2, i.e., the hardware versions.