top | item 11244880

Leaf: Machine learning framework in Rust

379 points | mjhirn | 10 years ago | github.com

49 comments

[+] wall_words|10 years ago|reply
The performance graph is deceptive for two reasons: (1) Leaf with CuDNN v3 is a little slower than Torch with CuDNN v3, yet the bar for Leaf is positioned to the left of the one for Torch, and (2) there's a bar for Leaf with CuDNN v4, but not for Torch.

It's good to see alternatives to Torch, Theano, and TensorFlow, but it's important to be honest with the benchmarks so that people can make informed decisions about which framework to use.

[+] kibwen|10 years ago|reply
The graph in the readme is outdated; you can see the version with Torch/CuDNN v4 here: http://autumnai.com/deep-learning-benchmarks

And I don't believe the first point counts as deceptive; the bars are ordered by Forward ms, not by the sum of Forward and Backward. In both CuDNN v3 and v4, Leaf is faster than Torch by that metric (25 vs 28 for v4, 31 vs 33 for v3).

[+] emcq|10 years ago|reply
Yes, on their site they post Torch CuDNN v4 as faster than Leaf [0]. Seems exciting for an early release.

Can it get much faster than something like Torch? I would think if CuDNN is doing most of the computation time it would be hard to see big improvements. Perhaps go the route of Neon and tune your GPGPU code like crazy [1, 2], or MXNet and think about distributed computing performance [3].

[0] http://autumnai.com/deep-learning-benchmarks

[1] https://github.com/soumith/convnet-benchmarks

[2] https://github.com/NervanaSystems/neon

[3] http://alex.smola.org/talks/NIPS15.pdf

[+] jean-|10 years ago|reply
> Leaf with CuDNN v3 is a little slower than Torch with CuDNN v3, yet the bar for leaf is positioned to the left of the one for Torch

I think that's because they're sorting by forward time rather than forward+backward. That would also explain why, in the Alexnet benchmark, TensorFlow (cuDNN v4) is to the left of Caffe (cuDNN v3) despite having a much taller bar overall.

[+] IshKebab|10 years ago|reply
I think Microsoft's approach with CNTK is far preferable to this. Rather than defining all the layers in Rust or C++ it uses a DSL to specify mathematical operations as a graph.

You can easily add new layer types, and recurrent connections are easy too - you just add a delay node.

Furthermore, since the configuration file format is fairly simple, it is possible to make GUI tools to visualise it and - in future - edit it.
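This isn't CNTK's actual configuration syntax, but the core idea (a network as a graph of named math operations, with recurrence expressed as a delay node) can be sketched in a few lines of Python; all node names and operations below are invented for illustration:

```python
# Toy computation-graph "DSL": a network is a dict of named nodes, each an
# operation over other nodes. A delay node reads its input's value from the
# previous time step, which is all a recurrent connection needs.
# (Illustrative only; not CNTK's real format.)

def run_graph(graph, inputs, steps):
    """Evaluate `graph` for `steps` time steps; `inputs` maps node -> list of values."""
    prev = {name: 0.0 for name in graph}      # delayed values start at 0
    outputs = []
    for t in range(steps):
        vals = {}

        def eval_node(name):
            if name in vals:
                return vals[name]
            op, *args = graph[name]
            if op == "input":
                v = inputs[name][t]
            elif op == "add":
                v = eval_node(args[0]) + eval_node(args[1])
            elif op == "scale":
                v = args[1] * eval_node(args[0])
            elif op == "delay":                # recurrent edge: previous step's value
                v = prev[args[0]]
            vals[name] = v
            return v

        for name in graph:
            eval_node(name)
        outputs.append(vals["out"])
        prev = vals
    return outputs

# A one-node "RNN": out[t] = x[t] + 0.5 * out[t-1]
graph = {
    "x":    ("input",),
    "fb":   ("delay", "out"),
    "half": ("scale", "fb", 0.5),
    "out":  ("add", "x", "half"),
}
print(run_graph(graph, {"x": [1.0, 1.0, 1.0]}, 3))  # [1.0, 1.5, 1.75]
```

Because the delay node simply reads the previous step's value, recurrence needs no special layer type, which is the flexibility the comment above is pointing at.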

[+] hobofan|10 years ago|reply
A DSL-based format has some advantages, as it's easy to get going with building networks. However, you are then constrained by what the program that interprets/executes the DSL supports in terms of loading/saving data, solvers, etc. If you want to do something more dynamic, e.g. AlphaGo, then you have to go back to a "real" programming language anyway.

That's not to say that Leaf won't have a DSL at some point, but we will wait until the features of the layers are a bit more stabilized and we have more clearly mapped out what goals we have for a DSL.

[+] rubyfan|10 years ago|reply
I'm honestly skeptical that Rust is all that appealing for this type of work. It just doesn't seem like (1) the main concerns like performance and type safety are the top priority in this space, or (2) this offering is differentiated enough from what you already get from Java today.

Honestly, many modeling problems are clunky and inefficient at scale - however, that's OK. When you need to scale badly enough, you already have a significant set of library support in Java.

I'm failing to see an answer to the one question I have: "why Rust?"

[+] YeGoblynQueenne|10 years ago|reply
> super-human image recognition

That's a bold claim. As far as I know there was one paper that reported a model beating human scores in a specific test (imagenet, I believe). Whether that translates to "superhuman" results in general is followed by a very big question mark.

In general I really struggle to see how any algorithm that learns from examples, especially one that minimises a measure of error against further examples, can ever have better performance than the entities that actually compiled those examples in the first place (in other words, humans).

I'm saying: how is it possible to learn superhuman performance in anything from examples of mere human performance at the same task? I don't believe in magic.

[+] Houshalter|10 years ago|reply
First of all, no one ever expected machines to beat humans at Imagenet, at least not this soon. It's an amazing accomplishment, because Imagenet is high-resolution pictures of many different types of objects, which is very different from tiny photos or pictures of digits.

Second, the examples were produced by scraping Flickr. Then Mechanical Turk workers were asked to confirm whether the object was in the image or not.

There are many images that are kind of ambiguous, or contain multiple objects, so humans don't do perfectly. One researcher tried to estimate human performance and got an error rate of about 5%, which has now been beaten by computers, by a lot.

[+] tomp|10 years ago|reply
> how is it possible to learn superhuman performance in anything from examples of mere human performance at the same task? I don't believe in magic.

Computers could be better at assigning probabilities to ambiguous examples. In particular, for an image that is very ambiguous for most humans, maybe a computer would assign 99% probability to it (hence it would be only a little bit ambiguous).

[+] kvb|10 years ago|reply
Ensembles of humans can outperform the average human, and in the same way an algorithm trained on data labeled by an ensemble of humans can outperform the average human.
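kvb's point can be checked with the classic jury-theorem arithmetic. Under the (strong) assumption that labelers err independently, a majority vote of n labelers who are each right with probability p > 0.5 beats any single labeler. A minimal sketch:

```python
from math import comb

def majority_accuracy(n, p):
    """Probability that a majority of n independent labelers (each correct
    with probability p) votes for the right label. n must be odd."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# A single labeler is right 70% of the time; 11 of them voting do much better.
print(round(majority_accuracy(11, 0.7), 3))  # 0.922
```

Real human labelers are not independent (they tend to miss the same ambiguous cases), so the gain in practice is smaller, but the direction of the effect is the same.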
[+] benbou09|10 years ago|reply
It is much faster than humans
[+] kingnothing|10 years ago|reply
I'm completely new to ML and what real-world applications it's suitable for. Are we at the point yet where you can train a computer to look at arbitrary images and count the number of people in them? What if it was largely the same background and only the number of people changed -- for example, a camera shooting a queue of people to determine queue depth at a bus station.
[+] rck|10 years ago|reply
A system like that would be surprisingly hard to build. The problem wouldn't be the ML algorithms - it would be just about everything else. A few things you need to solve robustly to build your counter:

1. The "same background" doesn't really exist for most cameras in most settings. Changes in illumination alone will make segmenting the background tricky. Moving objects in the scene will also be hard - think fountains and trees in the wind. Google for "foreground-background segmentation" to see some papers on this.

2. I haven't seen anyone use recent ML algorithms with less than high quality images. That may not matter, but it could matter a lot.

3. Extending recent ML algorithms to work with video at a high enough frame rate to be useful (10Hz at a minimum) may or may not be easy.

I'm sure that what you're proposing could be done. But I think that the number of small annoyances you'd hit would probably discourage most people who aren't treating the problem as a research exercise in Computer Vision.

[+] danielvf|10 years ago|reply
In the scale of computer vision problems, the stationary camera case is relatively easy. It's not too hard to isolate moving objects from a background, it's not too hard to decide if an object is a person or not, and it's not too hard to keep track of an object once you've identified it. You would still have to handle overlapping people, scene illumination changes, etc, but these can be solved and have been done before.

If you would like to play with some of this stuff, take a look at OpenCV. http://opencv.org
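A real implementation would use OpenCV's background-subtraction and contour APIs, but the stationary-camera pipeline described above (subtract a fixed background, threshold, count connected blobs) can be sketched on toy data in plain Python. The grids and threshold below are made up for illustration, and a real scene would need the adaptive background handling mentioned upthread:

```python
def count_blobs(background, frame, threshold=50):
    """Subtract a fixed background from a frame, threshold the difference,
    and count connected foreground regions (4-connectivity flood fill)."""
    h, w = len(frame), len(frame[0])
    fg = [[abs(frame[y][x] - background[y][x]) > threshold for x in range(w)]
          for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for y in range(h):
        for x in range(w):
            if fg[y][x] and not seen[y][x]:
                blobs += 1                      # new region: flood-fill it away
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and fg[cy][cx] and not seen[cy][cx]:
                        seen[cy][cx] = True
                        stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
    return blobs

# Toy 5x8 grayscale "images": a flat background and a frame with two bright blobs.
bg = [[10] * 8 for _ in range(5)]
frame = [row[:] for row in bg]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2)]:   # blob 1
    frame[y][x] = 200
for y, x in [(3, 6), (4, 6)]:                    # blob 2
    frame[y][x] = 180
print(count_blobs(bg, frame))  # 2
```

Note that two overlapping people would merge into one blob here, which is exactly the kind of small annoyance rck describes above.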

[+] somerandomness|10 years ago|reply
I actually think this is quite do-able and has been for a while. Although deep learning has revolutionized object recognition, face detection has been working reasonably well for some time, e.g. your cell phone camera or Google street view face blurring.
[+] argonaut|10 years ago|reply
Yes. The general task of looking at arbitrary images and labeling objects (from a set of known categories) in those images is called "detection." In fact the problem you described is easier, because there's only one category (people), and the system only needs to provide a count, rather than provide bounding box rectangles around each object (which is what the standard "detection" task entails).

Convolutional neural nets are the state of the art for this, specifically deep residual learning (http://arxiv.org/abs/1512.03385). It requires a good deal of background to understand what's going on and tune/implement the models, though, even if you just use the frameworks already out there. You probably don't even need that much data - you can probably grab pre-trained models and train them on a small additional dataset you collect.

They can definitely handle arbitrary backgrounds, although having a standard background makes the problem even easier, again.

Most deep learning computer vision algos are trained on 256x256 images, so having even larger images is just fine (you can downsample, or maybe even add up the predictions of different crops).
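The crop-averaging trick at the end can be sketched without any framework; `predict` below is a made-up stand-in for a trained model that returns class probabilities:

```python
def crops(image, size):
    """Yield the four corner crops (size x size) of a square image given as rows."""
    n = len(image)
    for y0 in (0, n - size):
        for x0 in (0, n - size):
            yield [row[x0:x0 + size] for row in image[y0:y0 + size]]

def averaged_prediction(image, size, predict):
    """Average a model's class probabilities over several crops of one image."""
    preds = [predict(c) for c in crops(image, size)]
    n_classes = len(preds[0])
    return [sum(p[i] for p in preds) / len(preds) for i in range(n_classes)]

# Stand-in "model": 2-class probabilities from mean brightness (illustration only).
def predict(crop):
    mean = sum(sum(row) for row in crop) / (len(crop) * len(crop[0]))
    p = min(max(mean / 255.0, 0.0), 1.0)
    return [1.0 - p, p]

image = [[255 if x < 2 else 0 for x in range(4)] for _ in range(4)]  # left half bright
print(averaged_prediction(image, 3, predict))
```

Averaging predictions over crops smooths out the model's sensitivity to where the object sits in the frame, which is why it's a common cheap trick on top of a fixed-size network.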

[+] eggy|10 years ago|reply
I will take a look at it, but are the benchmarks comparable, since, to quote the site, "For now we can use C Rust wrappers for performant libraries."? Torch is LuaJIT over C, and TensorFlow has Python and C++. Is Rust making it fast, or the interface code to the C libraries?
[+] hobofan|10 years ago|reply
The interface code to the C libraries (which is written in Rust). We are, however, optimistic that there will be Rust libraries popping up in the future that outperform the current C implementations. (Optimistic as a Rust user, not as a developer of Leaf.)
[+] ybrah|10 years ago|reply
It's interesting to see "technical debt" become a more common term. Is there a rigid definition for it?

From the article: "Leaf is lean and tries to introduce minimal technical debt to your stack."

What exactly does that mean?

[+] jamesblonde|10 years ago|reply
It's code that you write (typically quickly) that you know will need to be re-written at a later stage. It's debt that will need to be paid at some stage in the future. You didn't do it right the first time.

Technical debt typically arises because the code was poorly structured or the programmer used the wrong tools/libraries (from a longer-term perspective) or didn't abstract when she should have. The current obsession with MVPs has led to an increase in technical debt.

[+] pmarreck|10 years ago|reply
Yes. https://en.wikipedia.org/wiki/Technical_debt

I've seen it firsthand. Basically, it's the accumulation of suboptimal code over time, usually due to time constraints imposed by management. In short, any time you do a dirty hack just to get something working and meet a deadline, and then don't find the time to refactor that code into a working non-hack, you have piled a bunch of manure onto the technical-debt heap. But it also seems to be a side effect of normal code accretion to a codebase while on a team; in other words, there seems to be no way to avoid it entirely. It's like cancer, in biology. ;)

TD-ridden code is often not modular, not unit-tested, has many dependencies (spaghetti) which are then difficult to remove or replace and tend to trigger cascading bugs/failures, has too many responsibilities, has very long methods/functions, uses mutable state (changes global state which can then impact other parts of the codebase or make concurrency impossible), or is otherwise difficult to maintain.
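One smell from that list, mutable global state, is concrete enough to show as a before/after sketch (all names here are invented for illustration):

```python
# "Debt" version: a function that works by mutating module-level state.
# Any caller anywhere can change the result, tests can interfere with each
# other, and concurrent use is unsafe.
_totals = {}

def add_sale_global(region, amount):
    _totals[region] = _totals.get(region, 0) + amount
    return _totals[region]

# Refactored version: the same logic as a pure function. State is passed in
# and a new value is returned, so it is trivially unit-testable and safe to
# call from anywhere.
def add_sale(totals, region, amount):
    new_totals = dict(totals)
    new_totals[region] = new_totals.get(region, 0) + amount
    return new_totals

t = add_sale({}, "eu", 100)
t = add_sale(t, "eu", 50)
print(t["eu"])  # 150
```

The refactored version costs a little ceremony (threading state through calls), which is the kind of up-front effort that deadline pressure tends to cut, and that is how the debt accrues.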

An example of "working" tech debt is the "God class" in codebases: the model that the entire business depends on but which is overladen with responsibilities. The risk of changing it is too great (due to the business dependence), so it becomes a constant thorn in the side of maintaining the code.

The "debt" part comes from the fact that at some point you are expected to "repay" it (via costly man-hours of refactoring work). The benefit of doing so is potentially multifold, though: Faster/more modular/better-written code, faster tests (and therefore better productivity), better designs in general, more resilient code, more maintainable code, less buggy code, etc. etc.

The only known resolutions of tech debt are costly refactorings or global rewrites. The way to reduce the risk there is to first unit-test the existing code. These books help:

http://smile.amazon.com/Growing-Object-Oriented-Software-Gui...

http://smile.amazon.com/Refactoring-Improving-Design-Existin...

[+] taneq|10 years ago|reply
Think of it as 'code rot'. It's quick and dirty fixes that will have to be fixed later, taking longer overall than if you'd just done it right to start with (hence 'debt').
[+] dev1n|10 years ago|reply
Tightly coupled code is how I define tech debt.
[+] eranation|10 years ago|reply
This is very cool! When I presented it to my CTO, however, he said that he doesn't think this will gain traction with data scientists over Scala or Python, as Rust is even more complex than Scala (which is itself not the simplest language out there; I say this as a big fan of both Scala and Rust, knowing this might start a flame war).

Do you think data scientists can write their models directly using Leaf? Do you think there will need to be a DSL that translates from the R/Python world to something you can run on Leaf to make it happen?

[+] kibwen|10 years ago|reply
By what metric does your CTO consider Rust to be more complex than Scala? A lot of Scala's complexity has to do with interfacing nicely with Java, and Scala has a lot of implicit behavior and TIMTOWTDI-ness that Rust deliberately tries to avoid. Odersky has even said that he's hoping that he can remove many features from Scala in the future.
[+] emcq|10 years ago|reply
It has less to do with complexity and more to do with REPL/Jupyter notebook support. Rust is a compiled language, and you won't get some of the ease of exploratory data analysis that you get with something like IPython.

I can use something like pandas or autograd to experiment with new optimization functions in seconds. For these big NN models it takes hours to days to wait for your model to train so squeezing out more performance is worth a more complex language.

[+] rck|10 years ago|reply
The benchmarks would be a lot more useful if the context around them were more obvious. In particular, it would be nice to know if the benchmarks are for a single input, or for a batch of inputs. If for a batch, then the batch size is important too. Maybe this stuff is somewhere on their site, but it shouldn't require digging.

Without this information it's hard to make a useful comparison at all.

[+] hobofan|10 years ago|reply
You are right, batch size is important and we should make that clearer.

The numbers in the benchmark are taken from our deep-learning-benchmarks[1], which we are still in the process of building up. It might actually make sense to test the same model with different batch sizes. The current benchmarks are based on the convnet-benchmarks[2], where the Alexnet model has a batch size of 128. (Alexnet was chosen because, out of the benchmarks, it's the one I am most familiar with, since it's small enough that I can work with it on my laptop.)

In some informal tests Leaf was generally faster than the other frameworks at smaller batch sizes, but we have no benchmarks that we could publish with confidence yet.

[1]: https://github.com/autumnai/deep-learning-benchmarks [2]: https://github.com/soumith/convnet-benchmarks

[+] mastax|10 years ago|reply
I'm glad that Rust has crossed the point where posts to HN that would once have been "_ in Rust" are now just "_". I hope this means that Rust is starting to be used for its own merits rather than just its novelty.
[+] dang|10 years ago|reply
We changed the title to say "in Rust" because someone else complained about "for Hackers". I suppose we could take both of them out, but the project highlights its Rustiness so this seems more representative.
[+] yarrel|10 years ago|reply
1. Rust warning.

2. If "for hackers" is the new "for dummies" then gentrification is complete.