Hacker News item 28519538

Faster Quantized Neural Network Inference with XNNPack

18 points | Marat_Dukhan | 4 years ago | blog.tensorflow.org

15 comments


davidatbu|4 years ago

Looking at posts from a couple of years back on HN/Reddit/SO about TF vs PyTorch, the only plus side of using TF was the ease of deployment, especially on the mobile side with TensorFlow Lite.

But I imagine that story is changing with the advent of PyTorch Mobile and ONNX, and the fact that PyTorch itself supports XNNPack.

If anyone has any tips or insights as to ease of mobile deployment using TF vs using Pytorch, please share!

aborsy|4 years ago

Can it perform fixed-point arithmetic with an arbitrary number of bits?

Both quantization-aware training and post-training quantization.

Marat_Dukhan|4 years ago

It performs fixed-point arithmetic on 8-bit integers. You can mimic lower-than-8-bit precision by using the output_min/output_max parameters in XNNPACK operators, but keep in mind that:

1. This functionality is experimental and not exposed in TFLite. You'd need to call the XNNPACK APIs directly from C/C++ code.

2. Computations would still be done on 8-bit numbers.
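A minimal sketch of the trick described above, in plain Python rather than the actual XNNPACK C API (all names here are illustrative, not XNNPACK's): computations stay in 8-bit integers, but clamping each operator output to a narrower [output_min, output_max] window restricts results to the values a lower-precision type could represent.

```python
def clamped_output(acc, output_min, output_max):
    """Clamp an 8-bit result to [output_min, output_max], playing the
    role XNNPACK's output_min/output_max operator parameters play.
    (Illustrative helper, not part of the XNNPACK API.)"""
    return max(output_min, min(output_max, acc))

# Mimic unsigned 4-bit outputs (0..15) on top of 8-bit arithmetic:
outputs = [clamped_output(v, 0, 15) for v in (0, 7, 15, 16, 200, 255)]
print(outputs)  # values above 15 saturate to 15 -> [0, 7, 15, 15, 15, 15]
```

Note that, as the comment says, this only restricts the output range; the arithmetic itself is still done at 8-bit granularity.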

Marat_Dukhan|4 years ago

Author here, happy to take your questions.

elephantum|4 years ago

Do I understand correctly that XNNPACK and mobile hardware acceleration are mutually exclusive? I.e., it's either XNNPACK or NNAPI/CoreML?

Should I consider XNNPACK for a modern mobile phone?

elephantum|4 years ago

Is this a drop-in solution that works with every existing TFLite model?

elephantum|4 years ago

Do the same optimizations apply to TensorFlow / TensorFlow Serving?