The most cited deep learning papers

[+] cr0sh|9 years ago|reply

I can understand why it probably isn't on the list yet (not as many citations, since it is fairly new) - but NVidia's "End to End Learning for Self-Driving Cars" needs to be mentioned, I think:

https://arxiv.org/abs/1604.07316

https://images.nvidia.com/content/tegra/automotive/images/20...

I implemented a slight variation on this CNN using Keras and TensorFlow for the third project in term 1 of Udacity's Self-Driving Car Engineer nanodegree course (not special in that regard - it was a commonly used implementation, as it works). Give it a shot yourself - take this paper, install TensorFlow, Keras, and Python, download a copy of Udacity's Unity3D car simulator (it was recently released on GitHub) - and have a shot at it!

Note: For training purposes, I highly recommend building a training/validation set using a steering wheel controller, and you'll want a labeled set of about 40K samples (though I have heard you can get by with much fewer, even unaugmented - my sample set actually used augmentation of about 8k real samples to boost it up to around 40k). You'll also want to use GPU and/or a generator or some other batch processing for training (otherwise, you'll run out of memory post-haste).

[+] amelius|9 years ago|reply

Nice. I'm wondering how often NVidia's solution makes a mistake. Also, the paper says:

> More work is needed to improve the robustness of the network, to find methods to verify the robustness, and to improve visualization of the network-internal processing steps.

But it doesn't hint at how this would be approached.

Also, how they arrived at the particular network topology seems sort of a mystery.

[+] splike|9 years ago|reply

That's pretty cool. How long does it take to train something like that? And did it work?

[+] sanifsdd|9 years ago|reply

[deleted]

[+] pizza|9 years ago|reply

http://people.idsia.ch/~juergen/deep-learning-conspiracy.htm... oh Juergen

> Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

[+] curuinor|9 years ago|reply

I mean, there was a nice 15 or so year period between the "new" post-Minsky-and-Papert Perceptron book connectionism and the current "new new" connectionism where neural nets were a definite backwater. Most of the PIs doing neural nets dealt with it by being sad but Schmidhuber seems to have dealt with it by doubling down on the weirdness.

[+] unknown|9 years ago|reply

[deleted]

[+] kriro|9 years ago|reply

This might be as good a place to ask as any. Does anyone have suggestions on the problem of annotating natural language text to get a ground truth for things that have no readily available ground truth (subjective judgments of content etc.)? I do own the book "Natural Language Annotation" which is good but not exactly what I need. The part of annotation guidelines and how the annotation was done in practice is often only brushed over in many research papers. I mean I get it at a high level it's basically have a couple of raters, calculate inter- and intrarater reliability and try to optimize that. However like I said I'm struggling a bit with details. What are actually good values to aim for, how many experts do you want, do you even want experts or crowd source, what do good annotation guidelines look like, how do you optimize them etc.? Just to play around with the idea a bit, we did a workshop with four raters and 250 tweets each (raters simply assigned one category for the entire tweet) and that was already quite a bit of work and feels like it's on the way to little side of things.

I feel like I should find a lot more info on this in the sentiment analysis literature but I don't really.

[+] PeterisP|9 years ago|reply

You might want to browse archives of LREC biannual conference, they have sections focused on resource creation and some of the larger projects should have papers on annotation methodology. LDC (https://www.ldc.upenn.edu/) is probably the largest organization doing a large variety of annotation tasks, maybe they have published how they do things, I'm not sure.

However, often there are no real shortcuts; in many projects the resource annotation takes much more work and more people than everything else together, it's not uncommon to see multiple man-years spent to do that properly.

What you say about the high level is just about all that can be said in general, everything else will depend on your particular problem. After you've fixed the bugs in your process, interannotator agreement is not really a description of your annotators but a measure of how subjective/objective your task is - and you can't really change that without meaningful changes to how exactly you define your task. Some tasks are well suited for crowdsourcing, and some need dedicated experts. Some annotation tasks are straightforward and annotation guidelines fit on one page; for others the annotation guidelines are a literal book, and one that needs revisions as a few years after you figure out that you need changes. It depends. Shallow sentiment analysis is generally on the trivial side of annotation (but highly subjective), but you can go far enough the rabbit hole to drag the whole surrounding issues of intent, degree of belief, degree of certainty, etc - then you hit the full complexity of deep semantic annotation.

Perhaps you just need to find the people who did the latest more interesting datasets in your domain and ask them directly. I don't handle sentiment, but http://alt.qcri.org/semeval2017/task5/ is one group of people that seems to do it seriously.

[+] splike|9 years ago|reply

Have you heard of Word2Vec?

In a nutshell, its a deep learning model that given a word, predicts the other words around it, or alternatively, given some words in a sentence predicts the missing word. The idea is that similar words end up being assigned to similar vectors, all without knowing a ground truth.

Now that won't exactly answer your question, but then you can keep a couple of words related to your sentiment in a list and compare the words in the tweet to that list. If they are similar enough, you can write a rule to mark that tweet as matching your sentiment.

[+] nojvek|9 years ago|reply

Someone needs to make a summary of the top papers and explain it in a way lay man can understand. I would pay $500 for such a book/course explaining the techniques.

I've been reading a number of this papers but it's really tough to understand the nitty gritties of it.

[+] lomereiter|9 years ago|reply

Jeremy Howard's course (http://course.fast.ai/) is much as you describe, the downside is it's a little too boring for people with mathematical education.

[+] curuinor|9 years ago|reply

No PDP book? It's old and weird but interesting and has a lot of original ideas, notwithstanding the actual original backprop being from before then. Nor the original backprop stuff?

[+] aroman|9 years ago|reply

The PDP book is the main textbook for a course I'm taking at CMU called... PDP. It's digestible, but man is it weird to see things like "this is an ongoing area of future research" where the future = after 1986.

I find it kind of hard to relate to for that reason — how do I know (besides asking my prof., whose PhD advisor was Hinton, or doing my Googling) what ideas ended up "sticking"? What areas of future research went nowhere vs spawned whole new subfields?

Is there a more modern textbook of the sort I could cross-reference?

edit: here's a link to the course website: http://www.cnbc.cmu.edu/~plaut/IntroPDP/

[+] latently|9 years ago|reply

For those interested in PDP, one of Jay McClellands students is Professor Randy O'Reilly at CU. He develops the emergent neural network simulator and is using biologically plausible deep learning to learn how the mind works. He's been doing it all along. There was never a gap!

The simulator: http://grey.colorado.edu/emergent The textbook: https://grey.colorado.edu/CompCogNeuro

[+] cr0sh|9 years ago|reply

Was this it?

https://stanford.edu/~jlmcc/papers/PDP/

[+] cr0sh|9 years ago|reply

Hey, if you can find it, post some links here - even if not mentioned there, it could be interesting to others (hint: I'm interested!)...

[+] curuinor|9 years ago|reply

Other things off the top of my head:

Smolensky's hamonium (rbm)

the LeNet papers

Rumelhart's BPTT

Werbos's thesis

Jaeger's ESN nature paper

Forget gate paper, whose author I forget (ironically)

[+] pks2006|9 years ago|reply

I always wanted to apply the knowledge of the deep learning to my day to day work. We build our own hardware that runs the Linux on Intel CPU and then launches a virtual machine that has our propriety code. Our code generates a lot of system logs that varies based on what is the boot sequence, environment temperature, software config etc. Now we spend a significant amount of time go over these logs when the issues are reported. Most of the time, we have 1 to 1 mapping of issue to the logs but more often, RCA'ing the issue requires the knowledge of how system works and co-relating this to the logs generated. We have tons of these logs that can be used as training set. Now any clues on how we can put all these together to make RCA'ing the issue as less human involved as possible?

[+] curuinor|9 years ago|reply

Use dumber ML first, try some random forests. Not even because they're even that much better or worse, just because DL requires an enormous amount of knowledge and fiddliness but what you prolly want is for the bulk of the actual work to set up the data for ML, not hyperparameter fiddling and architecture fiddling.

[+] fnbr|9 years ago|reply

What you could do is assemble the data in tabular form so that your data is in the shape:

    Issue     System log
    -------- ------------
    issue_1   corresponding system log
    issue_2   corresponding system log
    issue_3   corresponding system log
    issue_4   corresponding system log
    issue_5   corresponding system log

Once you've done that, you can train some sort of classifier on it, e.g. something like [1]. There's a bunch of stuff you want to do to make sure you're not overfitting (I'd scale your data & use 5-fold cross validation), but that would get you started.

[1]: http://scikit-learn.org/stable/tutorial/text_analytics/worki...

[+] mathoff|9 years ago|reply

The most cited deep learning papers: https://scholar.google.com/scholar?q="deep+learning"

[+] gv2323|9 years ago|reply

Has anyone downloaded them into their own separate folders and zipped the whole thing up?

[+] gravypod|9 years ago|reply

This is a really lucky find for me. I was just about to do something to try and get into machine learning. Right now I need some help getting started with writing some machine learning code. I don't know where to start. I've come up with a very simple project that I think this would work very well for.

I want to buy a Raspberry Pi Zero, put it in a nice case, add to push buttons and turn it into a car music player (hook it into the USB charger and 3.5mm jack in my car). The two buttons will be "like" and "skip & dislike". I'll fill it with my music collection, write a python script that just finds a song, plays it, and waits for button clicks.

I want the "like" button to be positive reinforcement and the "skip & dislike" to be negative reinforcement.

Could someone point me in the right direction?

[+] ctchocula|9 years ago|reply

Your usecase reminded me of the Netflix problem, which is given x movies that the user has liked, try to recommend movies to them based on a large dataset that has thousands of users and their movie ratings. For music, there is a similar dataset[1] and problem on Kaggle[2].

The way the system is evaluated is by building a model that will predict what rating the user will give a song even though the user has not rated it yet. Then the difference in predicted and actual rating will be computed as the testing error in the model. Some basic techniques for building a model are regression and matrix factorization using SVD (singular value decomposition).

Your usecase might be slightly different from this problem, because you wouldn't have to predict the rating other users give a song (only for yourself) and you want your model to change on the fly given a skip and dislike. A simple, but possibly effective solution might be to search the music dataset that contains the listening history of 1M users to find songs you haven't rated before and download it to listen.

[1] http://labrosa.ee.columbia.edu/millionsong/ [2] https://www.kaggle.com/c/msdchallenge#description

[+] rohankshir|9 years ago|reply

you should just build it and then do the recommendations by a heuristic function. Then you can substitute the function with an ML classifier once you have enough data to train on (and time to learn about ML). Don't wait on ML coding tips for this project

[+] delta1|9 years ago|reply

I'm not sure your problem is well defined enough. Do you want the ML to be able to select songs similar to those you've liked? Or would simply keeping track of the number of likes on each song suffice, such that songs with more likes have a higher probability of being played?

[+] applecore|9 years ago|reply

Classic papers can be worth reading but it's still useful to know what's trending.

Even a simple algorithm would be effective: the number of citations for each paper decayed by the age of the paper in years.

[+] quinnftw|9 years ago|reply

I think what you are describing here is simply "average number of citations per year", no?

47 comments