darsnack | 6 years ago
As for features, I think this is because people coming from TF or PyTorch are used to one monolithic package that does everything. That's intentionally not how Flux or the Julia ecosystem is designed. I'll admit that there are a lot of preprocessing utility functions that could be better in the larger Julia ML community. But for the most part, the preprocessing required for ML research is available. This is mostly the fault of the community for not having a single document explaining to new users how all the packages work together.
Where the difference between Flux and other ML frameworks is apparent is when you try to do anything other than a vanilla deep learning model. Flux is extensible in a way that the other frameworks are just not. A simple example: that same lab mate and I were trying to recreate a baseline from a paper that involved drawing from a distribution at inference time based on a layer's output, then applying a function to that layer based on the samples drawn. I literally implemented the pseudocode from the paper, because in Flux everything is just a function, and a chain of models can be looped over in a for loop like an array. Dumb pseudocode-like statements where you just write for loops are just as fast as anything else in Julia, and they were here too. Meanwhile, my friend's code came to a grinding halt. He had to resort to numerical approximations for drawing from the distribution because he was forced to use only samplers that "worked well" in PyTorch. This is the disadvantage of a monolithic ML library. I didn't use "Flux distributions," I just used the standard distributions package in Julia.
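To make the anecdote concrete, here is a minimal sketch of that pattern in plain Julia. The layers, the Gaussian perturbation, and the `tanh` transform are all invented for illustration (the paper and the real code aren't shown); the actual model would use real Flux layers and samplers from the standard Distributions.jl package rather than Base's `randn`:

```julia
# Stand-in layers: in Flux, layers are just callable Julia objects,
# so plain functions work the same way for a sketch.
layer1(x) = 2 .* x
layer2(x) = x .+ 1
chain = [layer1, layer2]          # a "chain" can be a plain vector

function forward(chain, x)
    h = x
    for layer in chain            # loop over the chain like any array
        h = layer(h)
        # Draw a sample from a distribution parameterized by this layer's
        # output (here, a Gaussian centered on h), then apply a function
        # of that sample back to the activation -- the shape of the
        # pseudocode described above, with made-up specifics.
        z = h .+ 0.1 .* randn(length(h))
        h = h .* tanh.(z)
    end
    return h
end

y = forward(chain, [1.0, 2.0, 3.0])
```

Because every step is an ordinary function call on ordinary arrays, there is nothing framework-specific to work around: any sampler from any Julia package can be dropped into the loop.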
This disadvantage of TF and PyTorch will become even more apparent when you do model-based RL. Flux was designed to be simple and extensible from the start. TF and PyTorch were not.
komuher | 6 years ago
But for production-ready models, PyTorch and TF are miles ahead. First of all: NLP, audio, and vision packages and model-building frameworks (attention layers, vocoders, etc.). Then you have the option to compile models using XLA and use TPUs (about 2-3 times cheaper than GPUs for most of our models [audio and NLP]).
Next, inference performance (dunno about now, maybe this has changed, but about ~8 months ago Flux was about 15-20% slower [tested on VGG and ResNets] than PyTorch 1.0 without XLA).
Time to get to production: sure, maybe writing a model from scratch can take a bit longer in PyTorch than in Flux (if you're not using built-in torch layers), but getting it into production is a lot faster. First of all, you can compile the model (something not possible in Flux), and you can use it anywhere from Azure and AWS to GCP and Alibaba Cloud, make a REST API using Flask/FastAPI, etc., or just use ONNX.
Don't get me wrong, I love Julia and Flux, but there is still a LONG way to go before most people can even consider using Flux in a production environment rather than for research or some MVP stuff.
FabHK | 6 years ago