brianchu | 8 years ago

1. I wouldn't take much away from the LSTM benchmark. It's more a benchmark of Keras, since Keras currently supports cuDNN's LSTM only via TensorFlow. AFAIK CNTK does support cuDNN LSTM, but not through Keras. Keras actually implements its own LSTM in terms of the base math operations (it doesn't call the TensorFlow or CNTK LSTM operations, which are in some cases optimized in C++, etc.), so on the CPU you could probably get better performance by using the TensorFlow or CNTK functions directly.

2. Compiling TensorFlow from source for CPUs is a bit of a hassle, but I have seen nice performance gains (10-20%) for LSTM tasks. I bet you would get even higher gains for CNNs since they're more parallelizable. (Note: I've never gotten the latest TF to work with Intel MKL.)

3. I haven't fully tested this myself, but the P100s also have full support for half-precision floats, which supposedly offer a huge speedup.

4. I also would have liked to see benchmarks of other frameworks like PyTorch, etc. I haven't used them myself, but everything I've heard indicates that TensorFlow is often slower.
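
To make point 1 a bit more concrete, here is a rough sketch of what "an LSTM in terms of the base math operations" looks like for a single time step. This is not Keras's actual code; the function names (`lstm_step`, `matvec`), the weight layout, and the gate order i, f, g, o are all assumptions for illustration, and plain Python lists stand in for tensors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    # Matrix-vector product; M is a list of rows.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step built from elementary ops.

    W has 4*units rows of length input_dim, U has 4*units rows of
    length units, b has 4*units entries. Gate order (i, f, g, o) is
    an assumption of this sketch, not any framework's layout.
    """
    units = len(h_prev)
    wx = matvec(W, x)
    uh = matvec(U, h_prev)
    z = [wx[k] + uh[k] + b[k] for k in range(4 * units)]
    i = [sigmoid(v) for v in z[0:units]]                 # input gate
    f = [sigmoid(v) for v in z[units:2 * units]]         # forget gate
    g = [math.tanh(v) for v in z[2 * units:3 * units]]   # candidate state
    o = [sigmoid(v) for v in z[3 * units:4 * units]]     # output gate
    c = [f[k] * c_prev[k] + i[k] * g[k] for k in range(units)]
    h = [o[k] * math.tanh(c[k]) for k in range(units)]
    return h, c
```

In a framework, each of those lines becomes one or more separate tensor ops, repeated for every time step. A fused implementation such as cuDNN's LSTM handles the same math in far fewer, more heavily optimized calls, which is why which code path a benchmark ends up on matters so much.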
