gdahl's comments

gdahl | 8 years ago | on: Google Brain Residency

I work as a researcher on the Brain team.

An experienced machine learning researcher like Andrew Ng would probably not join the team as a Brain resident. We hire experienced machine learning researchers and engineers all the time (see https://careers.google.com/jobs#t=sq&q=j&li=20&l=false&jlo=e... ) and the residency program is probably not appropriate for people who are already experts. It is a program designed to help people become experts in machine learning.

For residents we look for some programming ability, mathematical ability, and machine learning knowledge. If an applicant knows absolutely nothing about machine learning, it would be strange (why apply?). We accept people who are not machine learning experts, but we want to be sure that people know enough about machine learning to be making an informed choice about trying to become machine learning researchers. Applicants need to have enough exposure to the field to have some idea of what they are getting into, and to know themselves well enough to be sure they are passionate about machine learning research.

You can see profiles of a few of the first cohort of residents here: https://research.google.com/teams/brain/residency/

See the old job posting which should hopefully explain the qualifications: https://careers.google.com/jobs#!t=jo&jid=/google/google-bra...

gdahl | 9 years ago | on: The Google Brain Team – Looking Back on 2016

We don't have a simple separation of concerns like that. Brain and DeepMind share a common vision around advancing the state of the art in machine learning in order to have a positive impact on the world. Because machine intelligence is such a huge area, it is useful to have multiple large teams doing research in it (unlike two product teams making the same product, two research teams in the same area just produce more good research). We follow each other's work and collaborate on a number of projects, although timezone differences sometimes make this hard. I am personally collaborating on a project with a colleague at DeepMind that is a lot of fun to work on.

Disclosure: I work for Google on the Brain team.

gdahl | 9 years ago | on: DeepMind’s work in 2016: a round-up

The Google Brain (g.co/brain) team has people in SF. We are part of the same company as DeepMind so maybe this doesn't quite answer your question. ^_^

We are mostly in SF and Mountain View, but we also have people in a few other locations. Right now, SF and Mountain View are the largest.

Disclosure: I work for Google on the Brain team.

gdahl | 12 years ago | on: GPU-Accelerated Deep Learning Library in Python

It has content and communicates an approach to machine learning distinct from other approaches. It isn't like "big data," which is truly meaningless. However, deep learning is also not a single method or algorithm.

I would have described the library in question as a "GPU-Accelerated Neural Network library" since that is more descriptive.

gdahl | 12 years ago | on: Numenta open-sourced their Cortical Learning Algorithm

Generally, the most relevant academic community would be the NIPS community, and I have not noticed any Numenta papers at NIPS, but please point me toward any I have missed if you are aware of some. I expect a lot of people have an opinion along these lines: http://developers.slashdot.org/comments.pl?sid=225476&cid=18... Don't get me wrong, I would love to see Numenta produce something of value for the ML community, but it doesn't look good so far.

gdahl | 13 years ago | on: Scientists See Promise in Deep-Learning Programs

Deep learning is not boosting at all. Deep learning is about composing trainable modules: composing a layer f(x) with a layer g(x) to get h(x) = f(g(x)). Boosting creates a final classifier that is a weighted sum of the base classifiers, something like h(x) = a * f(x) + b * g(x). Composition is what Professor Hinton means when he says "re-represent the input" and other similar phrases.
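To make the contrast concrete, here is a minimal numpy sketch (toy layers and weights are my own invention, purely for illustration): composition feeds one module's output into the next, while boosting sums the modules' outputs on the raw input.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # a small batch of 4 inputs with 3 features

# Two simple trainable modules (hypothetical toy layers).
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

def g(x):
    # First layer: re-represents the input as hidden features.
    return np.tanh(x @ W1)

def f(h):
    # Second layer operates on the new representation, not on x itself.
    return h @ W2

# Deep learning: composition of modules, h(x) = f(g(x)).
h_deep = f(g(x))

# Boosting: a weighted sum of base classifiers applied to the raw input,
# h(x) = a * f(x) + b * g(x). Each base model sees x directly.
Wf = rng.normal(size=(3, 2))
Wg = rng.normal(size=(3, 2))
a, b = 0.7, 0.3
h_boost = a * (x @ Wf) + b * (x @ Wg)

print(h_deep.shape, h_boost.shape)  # both (4, 2)
```

Both produce outputs of the same shape, but only the composed version builds a new representation of the input along the way.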

gdahl | 13 years ago | on: Scientists See Promise in Deep-Learning Programs

I was involved in the speech recognition work mentioned in the article, and I led the team that won the Merck contest, so feel free to ask if you have questions about either. I also spend some time answering any machine learning question I feel qualified to answer at metaoptimize.com/qa

gdahl | 13 years ago | on: Microsoft Research make breakthrough in audio speech recognition

The term "Deep Belief Network" has been abused in the literature (not pointing fingers, I've done it too). The DNNs used here are neural nets pre-trained with RBMs. Sometimes, when people say DBN, that is also what they mean. But really a DBN is a particular graphical model with undirected connections between the top two layers and directed connections everywhere else. The confusion comes from the pre-training procedure. The pre-training creates a DBN, which is then used to initialize the weights of a standard feedforward neural net. Then the DBN is discarded. It is a somewhat pedantic distinction. Since DBN is already an overloaded acronym (Dynamic Bayes Net) in the speech community, and not entirely accurate for the pedantic reason I just mentioned, we decided to go with the DNN acronym.
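The procedure's structure can be sketched roughly as follows. This is a hedged illustration only: the `pretrain_rbm` function below is a stand-in that skips the actual contrastive-divergence training and just returns initialized weights, so the shapes and data flow are the point, not the learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrain_rbm(data, n_hidden):
    """Stand-in for RBM training (hypothetical: real code would run
    contrastive divergence here instead of returning random weights)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    b = np.zeros(n_hidden)
    hidden = 1.0 / (1.0 + np.exp(-(data @ W + b)))  # sigmoid activations
    return W, b, hidden

# Greedy layer-wise pre-training: each RBM is trained on the hidden
# activations of the RBM below it.
data = rng.random(size=(10, 20))
layer_sizes = [50, 50, 30]
weights, biases = [], []
layer_input = data
for n_hidden in layer_sizes:
    W, b, layer_input = pretrain_rbm(layer_input, n_hidden)
    weights.append(W)
    biases.append(b)

# The stacked RBMs define a DBN; its weights initialize a standard
# feedforward net, and the DBN itself is then discarded.
def dnn_forward(x):
    for W, b in zip(weights, biases):
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return x

print(dnn_forward(data).shape)  # (10, 30)
```

The feedforward net is then fine-tuned discriminatively with backpropagation as usual.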

gdahl | 13 years ago | on: Microsoft Research make breakthrough in audio speech recognition

For people interested: some (currently) undocumented research code in Python implementing DNNs is also on my website. The code is only an initial release; I will improve it later, but if I waited until it wasn't embarrassing I would never release it, so I just posted it.

gdahl | 13 years ago | on: Microsoft Research make breakthrough in audio speech recognition

Senones are just tied triphone HMM states. A context dependent HMM recognizer has a 3-5 state HMM for every context dependent phone. Conceptually, each different HMM state in each different phone HMM has its own Gaussian mixture model, but this is awful because many of them don't get much data assigned to them. So people share parameters for different HMM states based on a data driven decision tree that clusters states together. Those clustered or tied states are sometimes called senones.
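A tiny sketch of what tying looks like in practice (the tying table below is hypothetical and hand-written; in a real system it comes from the data-driven decision-tree clustering described above):

```python
# Context-dependent triphone HMM states map to shared "senones".
tying = {
    # (left context, center phone, right context, HMM state) -> senone id
    ("k", "ae", "t", 0): 17,
    ("b", "ae", "t", 0): 17,   # different triphones tied to one senone
    ("k", "ae", "t", 1): 42,
    ("b", "ae", "t", 1): 42,
}

# One acoustic model per senone (e.g., one GMM), rather than one per
# triphone state, so rarely seen states still get enough training data
# through the sharing.
senone_models = {sid: f"GMM_{sid}" for sid in set(tying.values())}

state = ("b", "ae", "t", 0)
print(senone_models[tying[state]])  # GMM_17
```

The key point is the indirection: many context-dependent states share one set of output-distribution parameters.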

gdahl | 14 years ago | on: Speech Recognition Leaps Forward

1. Nothing. People ARE implementing similar things. It takes time, effort, and lots of computation.

2. People often prefer to implement their own ideas and compete (especially researchers).

3. A lack of patents might also discourage other firms from doing it.

gdahl | 14 years ago | on: Speech Recognition Leaps Forward

No, the HMM is not replaced in that work. The GMM is replaced, as you surmise. There are three problems with standard ASR: HMMs, GMMs, and n-gram language models. The GMM is the easiest to remove. Keeping the HMM allows simple, efficient decoding algorithms.
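To illustrate how the swap works in the hybrid setup (numbers below are made up for illustration): the DNN outputs posteriors P(state | acoustics), and dividing by the state priors P(state) gives scaled likelihoods that slot into HMM decoding where the GMM's p(acoustics | state) used to go.

```python
import numpy as np

# DNN softmax output over 3 HMM states for one acoustic frame
# (hypothetical values).
posteriors = np.array([0.7, 0.2, 0.1])

# State priors, e.g. relative state frequencies in the training alignment
# (also hypothetical).
priors = np.array([0.5, 0.3, 0.2])

# Scaled likelihoods, proportional to p(acoustics | state) by Bayes' rule;
# these replace the GMM scores inside the unchanged HMM decoder.
scaled_likelihoods = posteriors / priors
print(scaled_likelihoods)
```

Everything else about the HMM decoder (transition structure, Viterbi search) stays the same, which is exactly why keeping the HMM keeps decoding simple and efficient.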