Machine Learning and AI are in vogue, but they are tough to put into practice unless you have boatloads of data. We've personally had multiple frustrating experiences over the last ~7 years trying to solve problems with ML; in almost every case we failed to ship for lack of data. Transfer learning is a major breakthrough that lets companies with little data build state-of-the-art models, but not enough people know about it. We are trying to do our part to make transfer learning easier to use and to raise awareness of it.
Using transfer learning we can build a model that identifies cats and dogs in images from just a few (<100) examples, compared to the few thousand it would otherwise take.
To make transfer learning easy we are building https://nanonets.ai, which offers multiple pretrained models that can be augmented with your data to create state-of-the-art models. We are currently building our first few: Image Labeling and Object Detection (in images) already work, with a few text-based models coming in the next few weeks.
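To make the idea concrete, here is a minimal sketch of the feature-extraction flavor of transfer learning in plain NumPy. The frozen "pretrained" network is simulated by a fixed random projection (a stand-in for, say, an ImageNet CNN minus its final layer), and only a small logistic-regression head is trained on its outputs. All data and names here are synthetic illustrations, not part of the nanonets.ai API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained network: a fixed random projection.
# In practice this would be e.g. an ImageNet-trained CNN minus its last layer.
W_frozen = rng.normal(size=(256, 32))

def extract_features(images):
    """Run inputs through the frozen 'pretrained' layers (never updated)."""
    return np.tanh(images @ W_frozen)

def train_head(feats, labels, lr=0.5, epochs=200):
    """Train only a logistic-regression head on top of the frozen features."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(feats @ w + b)))   # sigmoid predictions
        grad = p - labels                         # logistic-loss gradient
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

# Tiny synthetic dataset: two "classes" (cat/dog stand-ins),
# well under the ~100-example regime discussed above.
n = 80
labels = rng.random(n) < 0.5
X = rng.normal(size=(n, 256)) + np.where(labels, 2.0, -2.0)[:, None]
y = labels.astype(float)

feats = extract_features(X)
w, b = train_head(feats, y)
accuracy = ((feats @ w + b > 0) == labels).mean()
```

The point of the sketch: the expensive part (the frozen features) is reused, and the only thing trained on your ~80 examples is a 33-parameter head, which is why so little data can suffice.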
Is transfer learning really not widely known among people doing AI? In my field, computer vision, it has been used by most papers at CVPR and similar venues over the past three years. All of the students who take my deep learning or computer vision courses do assignments on transfer learning with deep neural networks.
Another less known but really promising approach is program synthesis (also called "program generation"). One can build a fairly robust model from just 2-5 examples, in a matter of seconds. An implementation of this approach has already shipped in Excel: you enter a few examples of the formatting you want and "Flash Fill" learns what to do: http://research.microsoft.com/en-us/um/people/sumitg/flashfi...
This is cool. From what I understand from the paper, its DSL has a set of algorithms as building blocks from which it learns the input/output function. Deep learning tries to do the same with more generic blocks, on the assumption that many of those blocks can learn algorithms too. Deep learning takes the more generic approach, and transfer learning helps reduce the number of examples needed by reusing algorithms that have already been learned.
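A toy version of this enumerate-the-DSL idea can fit in a few lines (this is a made-up mini-DSL for illustration, not Microsoft's actual FlashFill language): search over compositions of string primitives until one is consistent with every given example.

```python
from itertools import product

# A tiny hypothetical DSL of string-transformation building blocks.
PRIMITIVES = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else s,
    "last_word": lambda s: s.split()[-1] if s.split() else s,
    "first_char": lambda s: s[:1],
}

def synthesize(examples, max_depth=3):
    """Return the shortest pipeline of primitives consistent with all
    input/output examples, or None if no such program exists."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(s, _names=names):
                for name in _names:
                    s = PRIMITIVES[name](s)
                return s
            if all(run(i) == o for i, o in examples):
                return names, run
    return None

# Two examples, FlashFill-style: extract the last name in uppercase.
examples = [("Ada Lovelace", "LOVELACE"), ("Alan Turing", "TURING")]
names, program = synthesize(examples)
result = program("Grace Hopper")  # generalizes to unseen inputs
```

Even this brute-force search finds a consistent two-step program from two examples; the real systems get their speed and robustness from much cleverer search over a richer DSL.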
I find Deep Learning very frustrating to work with at the moment. First there is overfitting, which typically shows up only after you've already been training for hours; then you have to tweak things (basically guessing) and start from scratch. If your network has too many neurons, overfitting becomes more likely, which feels like a weakness in the theory: how can more neurons cause more problems? Then there is the problem that if your data differs from your training data in what a human would call an insignificant way, the network can easily start to fail. For example, in image classification, if your images contain, say, a watermark in the lower-left corner, recognition may suddenly start failing. I've used DL successfully on some projects, but on others it has been an outright failure despite many hours invested in training and tweaking.
> which is a weakness in the theory because how can more neurons cause more problems?
In exactly the same way that adding more terms to a polynomial fit causes more problems. This is one of the most fundamental results in the theory of statistical learning in general; don't blame Deep Learning for it.
There are a lot of solutions for the problems you mention. Against overfitting you can use data augmentation, normalization, dropout, and early stopping on a validation set (and probably improve your dataset).
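Of those remedies, early stopping is the easiest to sketch: stop once the validation loss has failed to improve for `patience` epochs in a row (the values below are made up for illustration).

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training should stop: the first epoch
    after the best validation loss has failed to improve `patience`
    times in a row."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch  # stop here; restore weights from the best epoch
    return len(val_losses) - 1  # never triggered: trained to the end

# Validation loss improves, then starts climbing as overfitting sets in.
losses = [1.0, 0.7, 0.5, 0.45, 0.46, 0.48, 0.52, 0.60]
stop = early_stop_epoch(losses, patience=3)  # best epoch was 3; stops at 6
```

In a real training loop you would also checkpoint the weights at each new best epoch, so that "stop" means "roll back to the best model", not "keep the last one".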
More neurons mean more parameters to fit to your data, so overfitting is more likely to happen. It is like interpolating a function: the more parameters you use, the more you overfit.
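The polynomial analogy can be seen directly with `numpy.polyfit`: adding terms can only shrink the training error, while the fit to held-out points from the same underlying function tends to get worse. (Toy data, made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(42)

# Noisy samples of an underlying linear trend.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=10)

# Held-out points from the same (noise-free) underlying function.
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test

def mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

train_lo, test_lo = mse(1)  # as many parameters as the truth needs
train_hi, test_hi = mse(9)  # one parameter per training point
```

The degree-9 fit interpolates the 10 noisy training points almost exactly, so its training error collapses while its held-out error does not; that gap is exactly the "more parameters, more overfitting" effect.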
> if your data is somewhat different from your training data in what humans would call an insignificant way, your network may easily start to fail
Humans call it insignificant because we have deep knowledge of many domains, whereas the network has been trained on one specific domain. If you train a network on one distribution and then test it on another, it is not going to work; that much is fairly obvious, I think.
Deep learning works incredibly well. It works so well that it outperforms humans in some domains. So you may want to rethink what you are doing, because I think (though I may be wrong) that the failures you are seeing come from something in your process rather than from deep learning itself.
This does seem to hold back a lot of people from trying DL in production. Transfer learning offers some advantages in the cases you mention (e.g. models that fail to generalize to watermarks, or differences between training and test data), although there are still quite a few cases where even the best pretraining and large datasets don't help. Two major areas of current research aimed at making DL more accessible are automatic model architecture selection and automatic hyperparameter tuning.
Not to be a pedant, but I think the DeepMind paper is actually an example of one-shot generalization, not one-shot learning. From the paper:
> Another important consideration is that, while our models can perform one-shot generalization, they do not perform one-shot learning. One-shot learning requires that a model is updated after the presentation of each new input, e.g., like the non-parametric models used by Lake et al. (2015) or Salakhutdinov et al. (2013). Parametric models such as ours require a gradient update of the parameters, which we do not do. Instead, our model performs a type of one-shot inference that during test time can perform inferential tasks on new data points, such as missing data completion, new exemplar generation, or analogical sampling, but does not learn from these points. This distinction between one-shot learning and inference is important and affects how such models can be used.
Absolutely. One-shot learning is cutting-edge research toward building more human-like AI, but it is still in its early phases. We are trying to make transfer learning, which is proven today, directly usable by people solving real problems. Hopefully we will be able to do the same with one-shot learning.
It should be noted that transfer learning is an umbrella term for many ideas that revolve around transferring what one model has learned into another. The method described here is a type of transfer learning called fine-tuning.
Correct. There are multiple ways to transfer knowledge between tasks. Here we are talking about transfer learning with deep neural networks, where it is proven to work and has several advantages over training a model end to end on your own. Even within transfer learning there are several decisions to make, such as which layer to transfer from and how much fine-tuning to do given how much data you have, and these are what we are trying to automate.
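As an illustration of the kind of decision rule being automated, here is a hypothetical heuristic mapping dataset size and domain similarity to a fine-tuning plan. The thresholds and categories are invented for this sketch, not an actual nanonets.ai policy.

```python
def finetune_plan(n_examples, domain_similarity):
    """Pick a transfer-learning strategy from dataset size and how similar
    the new domain is to the pretraining domain (0.0 to 1.0).
    Thresholds are illustrative, not empirically tuned."""
    if n_examples < 1000:
        if domain_similarity >= 0.5:
            # Little data, similar domain: freeze everything,
            # train only a new output layer.
            return {"frozen_layers": "all", "train": "new head only"}
        # Little data, different domain: transfer from an earlier, more
        # generic layer and train a small classifier on those features.
        return {"frozen_layers": "early", "train": "head on mid-level features"}
    if domain_similarity >= 0.5:
        # Plenty of data, similar domain: fine-tune the top few layers.
        return {"frozen_layers": "lower", "train": "top layers + head"}
    # Plenty of data, different domain: fine-tune the whole network.
    return {"frozen_layers": "none", "train": "whole network"}

plan = finetune_plan(n_examples=80, domain_similarity=0.9)
```

Automating this well means replacing hand-picked thresholds like these with choices validated against your actual data.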
Yes, transfer learning is a fairly broad umbrella term encompassing a lot of different approaches. We tried to give an example of the one most commonly used with NNs, almost exclusively for feature extraction. Do you have a resource that lists a variety of transfer learning approaches? Happy to work with you on creating an aggregated list.
Thanks for this write-up. I'm not in the ML field, but I follow it a bit and didn't know anything about this.
I also really like your business model. I have argued with potential entrepreneurs and friends that AI is becoming a commodity, but your model has the potential to build a network effect of data on top of the AI, presumably becoming more valuable over time.
I do think, though, that you would best solve this for a specific vertical first as your go-to-market strategy.
It's even covered in the TensorFlow tutorials: https://codelabs.developers.google.com/codelabs/tensorflow-f...
And do you just arbitrarily select the cut-off output layer of the pretrained model when retraining with your own data on new layers?
Paper: http://research.microsoft.com/en-us/um/people/sumitg/pubs/ca...
https://www.repnup.com/