
TensorFlow Fold: Deep Learning with Dynamic Computation Graphs

195 points | moshe | 9 years ago | research.googleblog.com

24 comments

imh | 9 years ago
For anyone interested in really flexible differentiable graphs: Chainer is the most flexible and convenient library I've used. It's all I use for prototyping neural nets anymore, and I'm surprised not to see more adoption. It feels like working in numpy.
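To give a flavor of what I mean, here's a minimal sketch (just the basics of define-by-run, nothing exotic): control flow is ordinary Python, and gradients come back as numpy arrays.

    import numpy as np
    import chainer
    import chainer.functions as F

    # The graph is defined by running the code, so a Python branch
    # is all it takes to change the network's structure per input.
    x = chainer.Variable(np.random.randn(4, 3).astype(np.float32))
    y = F.relu(x) if float(x.data.mean()) > 0 else F.tanh(x)
    loss = F.sum(y * y)
    loss.backward()   # gradients flow through whichever branch ran
    print(x.grad)     # a plain numpy array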
curuinor | 9 years ago
In large part, on the CPU path, it _is_ working in numpy. I think that, of the neural network libraries, Chainer is made by the people who actually like coding the most.

I mean, for example, lots of the TensorFlow type checking gets done in Eigen, where it's handled by C++ template metaprogramming (that's how it worked when I last looked, anyhow); the equivalent Chainer checks just get done by runtime inspection.

Which one is faster? TF, by far. Which one would you rather have in _your_ codebase?

Edit: after reading the damned thing, they add in more runtime type handling. And after looking over TF again, it still has this hybrid thing going on, where some of it is Eigen stuff and some is runtime stuff. I mean....
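A rough way to see the difference (my toy example, not taken from either codebase): TF rejects a shape mismatch while building the graph, Chainer only when the op actually runs on concrete arrays.

    import numpy as np
    import tensorflow as tf            # TF 1.x-era graph mode
    import chainer.functions as F

    # TF: the mismatch is caught at graph construction, before any data.
    try:
        tf.matmul(tf.zeros([2, 3]), tf.zeros([2, 3]))
    except ValueError as e:
        print("TF, at graph construction:", e)

    # Chainer: the same mismatch only surfaces when the op executes.
    try:
        F.matmul(np.zeros((2, 3), np.float32), np.zeros((2, 3), np.float32))
    except Exception as e:
        print("Chainer, at run time:", type(e).__name__)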

vvladymyrov | 9 years ago
Also there is http://pytorch.org, which started as a fork of Chainer. At a high level it is built for the same purpose: to support dynamic graphs.
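A minimal sketch of what that looks like in practice (using the current tensor-based autograd API, nothing version-specific assumed beyond that): the network's depth can depend on the data, and autograd follows whatever actually ran.

    import torch

    x = torch.randn(3, requires_grad=True)
    y = x
    for _ in range(2 + int(torch.rand(1).item() * 3)):  # data-dependent depth
        y = torch.tanh(y)
    loss = (y * y).sum()
    loss.backward()   # backprop through however many tanh layers ran
    print(x.grad)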
chewxy | 9 years ago
The paper is dense and I'm on a train. Can anyone summarize the difference between TensorFlow Fold and Chainer?

Also, self promotion: Gorgonia (https://github.com/chewxy/gorgonia) has supported dynamic computation graphs à la Chainer since day 1... however, batched computation remains difficult to implement.

moshe | 9 years ago
TensorFlow Fold provides a TensorFlow implementation of the dynamic batching algorithm (described in detail in our paper [1]). Dynamic batching is an execution strategy for computation graphs; you could also implement it in PyTorch or Chainer or any other framework.

Our particular implementation of dynamic batching uses the TF while loop, which means that you don't need to make run-time modifications to the actual TF computation graph. At runtime, we essentially encode the computation graph for (let's say) a parse tree as a serialized protocol buffer (tf.string), so instead of varying the computation graph itself we vary the input to a static computation graph. This particular implementation strategy is very much a byproduct of how TensorFlow works (static computation graph, heavy lifting happens in ops implemented in C++).

[1] Deep Learning with Dynamic Computation Graphs, https://openreview.net/pdf?id=ryrGawqex
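If it helps to see the algorithm itself, here is a toy numpy sketch of the core idea (emphatically not our implementation, which as noted runs via the TF while loop over a serialized graph encoding; binary ops are assumed here just for brevity): nodes from many input trees are grouped by depth and by op, and each group is evaluated with one batched call.

    import numpy as np
    from collections import defaultdict

    def tree_depth(node):
        if "children" not in node:
            return 0
        return 1 + max(tree_depth(c) for c in node["children"])

    def eval_batched(roots, ops):
        # Evaluate many trees with one batched call per (depth, op) group.
        results = {}                  # id(node) -> result vector
        buckets = defaultdict(list)   # depth -> nodes at that depth

        def collect(node):
            buckets[tree_depth(node)].append(node)
            for c in node.get("children", []):
                collect(c)

        for r in roots:
            collect(r)

        for d in sorted(buckets):
            by_op = defaultdict(list)
            for node in buckets[d]:
                if d == 0:
                    results[id(node)] = node["value"]   # leaves carry data
                else:
                    by_op[node["op"]].append(node)
            for op_name, nodes in by_op.items():
                # The core saving: one op call for the whole group.
                left = np.stack([results[id(n["children"][0])] for n in nodes])
                right = np.stack([results[id(n["children"][1])] for n in nodes])
                for n, out in zip(nodes, ops[op_name](left, right)):
                    results[id(n)] = out
        return [results[id(r)] for r in roots]

    # Two parse-tree-like inputs sharing one "compose" op (a stand-in
    # for a learned cell such as a TreeLSTM).
    leaf = lambda v: {"value": np.asarray(v, dtype=float)}
    t1 = {"op": "compose", "children": [leaf([1, 2]), leaf([3, 4])]}
    t2 = {"op": "compose", "children": [leaf([0, 1]), leaf([1, 0])]}
    print(eval_batched([t1, t2], {"compose": lambda l, r: np.tanh(l + r)}))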

kyloon | 9 years ago
This is great news; I was just wondering when TensorFlow would support this after reading about PyTorch.
zump | 9 years ago
They got scooped and pushed to publish the intern's project.
kriro | 9 years ago
The concept seems interesting. I've stopped closely investigating the stack I use below the "Keras level" and mostly treat the things underneath as a black box. I'm defaulting to Theano since I only have one GPU to work with, but as far as I can tell, switching to TensorFlow is basically a small config change. I've only browsed this, but since I mostly do NLP (and virtually no image recognition) I suppose it could be worthwhile to switch. I guess I'll need to open the black boxes a bit and see what Theano does :)
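For the record, the whole "config change" amounts to roughly this, assuming the standard Keras backend mechanism (the same setting also lives in ~/.keras/keras.json under "backend"):

    import os

    # Must be set before keras is imported.
    os.environ["KERAS_BACKEND"] = "tensorflow"   # or "theano"

    from keras import backend as K
    print(K.backend())   # confirm which backend Keras picked up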
superfx | 9 years ago
Why does the GitHub page say this is not an official Google project, yet it's on the Google blog?
moshe | 9 years ago
Please note that the GitHub page says "not an official Google product", rather than "project". An official Google product would be something like Gmail.
congerous | 9 years ago
The "leading" DL framework is playing catch-up to Chainer, PyTorch and DyNet. Another Google product development bungle.
general_ai | 9 years ago
The way I see it, TF is about to pull _way_ ahead thanks to XLA JIT/AOT compilation. All of a sudden you get the ability to fuse things at a much more granular level, which could reduce memory bandwidth requirements by a lot. Frameworks like Torch can't do any fusing at all, since their computation is fully imperative. Tactical win for imperative frameworks, I suppose, but strategically a functional graph is the way to go. DB people realized this in the '70s; ML people are realizing this now.
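As a rough sketch of what turning the JIT on looks like (TF 1.x-era graph-mode API; the global flag below is the usual mechanism I know of, so treat the details as illustrative):

    import numpy as np
    import tensorflow as tf

    # Enable the XLA JIT globally so elementwise chains like mul+add+relu
    # become candidates for a single fused kernel, cutting memory
    # bandwidth round trips.
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = (
        tf.OptimizerOptions.ON_1)

    x = tf.placeholder(tf.float32, [None, 1024])
    y = tf.nn.relu(x * 2.0 + 1.0)   # three ops, one fusion candidate

    with tf.Session(config=config) as sess:
        print(sess.run(y, {x: np.ones((8, 1024), np.float32)}).shape)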