Neural Networks for Machine Learning

[+] lathamcity|13 years ago|reply

I'm in the middle of the machine learning coursera course, and registered for this one as well due to interest in the material.

My one complaint is that the programming assignments weren't interesting at all. The results were interesting, but the setups were mostly given to us, and we just had to code an algorithm that was in our notes. For someone who understands the basics of linear algebra and programming, it was just a syntax challenge, and that got irritating after a bit so I stopped doing them.

I won't get the certificate for completing the course, but I have a few extra hours of free time each week to add this second course, so I'm happy. I doubt that the actual homework that Stanford students taking this course get is so easy and repetitive, though, and I'm positive they wouldn't complain about not getting to retake quizzes after getting poor grades.

Not to knock the course. I've learned a lot and the professor (Andrew Ng) does a good job.

[+] xianshou|13 years ago|reply

I've taken both, and the code is in fact not that much simpler than it was in the original class. There are, however, two huge differences: the algorithm is spoon-fed to you, and there is no math.

Firstly, think about how much more difficult the assignments would be if, for example, the steps weren't broken out and we didn't get any advice on how to vectorize. Of course, it would still be short work for anyone who (a) knows Matlab/Octave and/or (b) understands the material well, but it would also be an order of magnitude harder.

Secondly - and this is by far the larger point - the original CS 229 was really about math; the programming assignments were more of an afterthought. The lectures and homework mainly focused on the theoretical derivations and corollaries of the math that led to the algorithms. Once you'd done your bit on the math and cried to your classmates and the TA about it, you could go and implement the beautiful and extremely succinct result in Matlab.

As for my perspective on the difference, I believe it is a deliberate choice made with full knowledge of the difficulty drop. For starters, there are (with regards to homework help) no TAs in this course, so the absolute difficulty would have to decline to create an equivalent experience. More significantly, the enrollment has increased by a factor of about 700. If Stanford students had trouble with the original, you can bet that the median student in the course doesn't find it as easy as either of us does. If the goal is to generate the greatest benefit for the most people, and delivering the algorithms with a good intuition on their proper use will do so, then this course has succeeded marvelously. Of course, the smartest and most dedicated students will want more, which remains available through textbooks as well as the original course handouts (http://cs229.stanford.edu/materials.html). However, I would argue that the goal of most MOOCs (massive open online courses) should be to kindle interest and foster basic understanding, both of which the Coursera version achieves.

[+] fromdoon|13 years ago|reply

Hi,

I am also taking the course by Andrew Ng and understand your complaint that the programming assignments aren't as interesting ( from your perspective). Being quite comfortable with linear algebra, I was able to complete the assignments easily.

But when I go through the course forums, I find that for many people taking the course, the intuition behind the use of linear algebra in ML doesn't come as easy as it does for us. I think when Andrew Ng designed this online course, he must have had those people in mind also. I think he mentions it at the start of the course that it's more about understanding the concepts and the implementation details should come later. The programming exercises are designed keeping that in mind, I think.

I tried to make the programming exercises interesting for myself, by first thoroughly understanding the code that they had provided and tweaking it here and there. Once you have done that, you could apply what you've learnt on real world datasets from sources like Kaggle and see how you fare :)

[+] FaceKicker|13 years ago|reply

> My one complaint is that the programming assignments weren't interesting at all. The results were interesting, but the setups were mostly given to us, and we just had to code an algorithm that was in our notes. For someone who understands the basics of linear algebra and programming, it was just a syntax challenge, and that got irritating after a bit so I stopped doing them.

I agree with this. The programming assignments I've done so far in the Machine Learning class are usually 5-7 matlab functions, many of which are about 2 lines of code (the longer ones might be ~10 lines of code). If you've ever done matlab/octave programming the assignments will take about 20-30 minutes and be completely unenlightening as you're literally just translating mathematical notation into matlab (which is, by design, already a lot like mathematical notation anyway). They provide entirely way too much skeleton code to learn anything from if you're not actively trying to learn. If I weren't already mostly familiar with most of the material presented in the class, I imagine I would never retain knowledge of how the machine learning "pipeline" worked or have any high-level understanding of the algorithms, because the assignments just require you to implement the mathematical pieces of each step, without ever asking you to, for example, actually call any optimization routines, or put the pipeline together.

The problem, I think, is that it would just be too difficult to do automatic grading in a way that is reasonably possible to pass if they don't turn most of the work into skeleton code. Since the automatic grading needs nearly exactly matching results, one minor implementation difference in a perfectly good implementation of the algorithm itself (e.g., picking a single parameter incorrectly, picking the optimization termination conditions incorrectly, choosing a different train/dev split, etc.) would make the entire solution completely wrong.

[+] karpathy|13 years ago|reply

I took CS229 here at Stanford and I was also one of the TAs for the online version last year (I was one of 2.5 people involved with making the programming assignments).

First, the Stanford CS229 version is definitely much more difficult than what you guys had online. The focus in the actual class was on the math, derivations and proofs. The homeworks sometimes got quite tricky and took a group of us PhD students usually about 2 days to complete. There was some programming in the class but it was not auto-graded so usually we produced plots, printed them out, attached the code and had it all graded by TAs for correctness. The code we wrote was largely written without starter code and I do believe you learn more this way.

An online version of the class comes with several challenges. First, you have to largely resort to quizzes to test students (instead of marking proofs, derivations, math). There is also no trivial way to autograde resulting plots, so everything has to be more controlled, standardized and therefore include more skeleton code. But even having said all that, Andrew was tightly involved with the entire course design and he had a specific level of difficulty in mind. He wanted us to babysit the students a little and he explicitly approved every assignment before we pushed it out. In short, the intent was to reach as many people as possible (after all, scaling up education is the goal here) while giving a good flavor of applied Machine Learning.

I guess what I mean is that you have more experience than the target audience that the class was intended for and I hope they can put up more advanced classes once some basics are covered (Daphne Koller's PGM class is a step in this direction). But there are still challenges with the online classes model. Do you have ideas on how one can go beyond quizzes, or how one can scale down on the skeleton code while retaining (or indeed, increasing) the scale at which the course is taught?

[+] epsylon|13 years ago|reply

Try the Learning From Data course : http://work.caltech.edu/telecourse.html A Fall run has just started (on the 2nd of oct.).

It's the same version as the course given at CalTech and is more in-depth than Andrew Ng's. There is no skeleton code for the programming assignments, answers are made through quizzes. I took the summer session and learned a lot from it.

[+] MaysonL|13 years ago|reply

I took the course in the spring, and found it interestng, and the programmin assignments fairly easy. This summer I took the ML course that Caltech offered, which was significantly more challenging (the homework assignments were multiple choice, but they often required writing substantial code, without any starter code.) The Caltech course is now available on iTunes U...

[+] dvdhsu|13 years ago|reply

> The results were interesting, but the setups were mostly given to us, and we just had to code an algorithm that was in our notes.

Right; I agree. I'm not sure how they would go about making it more challenging though. They can't expect us to go out and collect data ourselves, after all. I suppose they could give us the data, then expect us to code the setup and algorithms up ourselves, but that, too, would become repetitive after a few assignments.

> Not to knock the course. I've learned a lot and the professor (Andrew Ng) does a good job.

Agreed once again. I knew nothing about machine learning before starting; now I know about neural networks, SVMs, and PCM. It's really cool how much I've learned already, for free, too!

I've also signed up for this course, but the quizzes really aren't up to par. As an example: the first quiz question was about training a neural network with too much data, and about whether or not said network would be able to generalize to new test cases. Overfitting neural networks wasn't even mentioned in the lectures; I had to rely on material from Andrew's class to answer the question correctly. This chasm between the lectures and the quizzes is likely because Geoffrey is the one creating the video lectures, but he's not the one creating the quiz questions; he is having TAs do it [1].

Nevertheless, it looks like they're responding to feedback, so hopefully it'll get better with time.

1. https://class.coursera.org/neuralnets-2012-001/wiki/view?pag...

[+] jackpirate|13 years ago|reply

I'm positive they wouldn't complain about not getting to retake quizzes after getting poor grades.

My experience is that students everywhere complain about grading. I've never been to Stanford, but I've attended and worked at several other top tier universities.

[+] ameasure|13 years ago|reply

Hinton is a huge figure in the neural network literature and an important researcher in deep learning. After going through the first week of lectures, I can say he's also an excellent teacher.

The syllabus, draft though it is, indicates the second half of the class will focus on deep learning, a field of machine learning that has demonstrated huge potential.

[+] jimbokun|13 years ago|reply

Just browsing through the Coursera Computer Science listings, it looks like they are rapidly approaching the point where you could put together a CS curriculum superior to what you could get at any single school. The people they have teaching a lot of these topics are some of the best in the world in their field. The Micahel Collins NLP course looks really thorough and up to date, for example I took a similar course a few years ago, and I remember reading papers written by him.

As has been said by many already, of course, the remaining nuts to crack are high quality interaction with other students, professors, and TAs; and accreditation.

But the dis-intermediation of large universities may be nearer than we think.

[+] platz|13 years ago|reply

An attempt to design a reasonable computer science curriculum using just Coursera courses, where “reasonable” is a curriculum that roughly mirrors the coursework required for a four-year university computer science degree: http://www.thesimplelogic.com/2012/09/24/you-say-you-want-an...

[+] misiti3780|13 years ago|reply

the only real problem with coursera is everyone is posting their solutions to github, so its gonna be impossible for them to prevent cheating. i agree with you though, that the flexibility it is offering is amazing

[+] misiti3780|13 years ago|reply

people are already complaining that you can only take the quizes once ... he had to send out an email today to everyone saying:

"Many of you are unhappy with only being allowed to attempt a quiz once. Starting in week two, we have therefore decided to make up twice as many questions and to allow you to do each quiz twice if you want to. The second time you try it the questions will all be different. Your score will be the maximum of your two scores. For week one, the quizzes will remain as they are now.

Many of you would like the names of the videos to be more informative. We will change the names to indicate the content and the duration.

Some of you thought that some of the quiz questions were too vague. We will try to make future questions less vague.

Some of you are unhappy that we do not have the resources to support Python for the programming assignments. We sympathize with you and would do it if we could. You are still welcome to use Python (or any other language) if you can port the octave starter code to your preferred language. We have no objection to people sharing the ported versions of the starter code (but only the starter code!). However, if you get starter code in another language from someone else, you are responsible for making sure it does not contain bugs."

I thought that was pretty funny!

[+] notimetorelax|13 years ago|reply

Yeap we got spoiled with earlier classes: Algorithms by Tim Roughgarden, Machine Learning by Andrew Ng, and many more. We probably need to follow a class on gratitude.

Oh well, to be fair I would donate quite a lot for each course that I enjoyed.

[+] emcl|13 years ago|reply

The only course that is not significantly diluted is Koller's PGM. All others have been dumbed down to a degree where they provide no challenge to the courseree at all.

[+] notimetorelax|13 years ago|reply

It is not such a huge problem when you take several courses at once. Sadly they run them only twice a year, each time I try to follow as many as possible. I cannot follow PGM because it requires too much of my time, I'd have to abandon 2 or 3 other courses. YMMV.

[+] azakai|13 years ago|reply

> Neural Networks are gradually taking over from simpler Machine Learning methods

And haven't SVMs and such gradually taken over from Neural Networks?

[+] Homunculiheaded|13 years ago|reply

And RandomForests taken over from SVMs ;)

In seriousness when you look around at what's happening both in practice and in academia I would say RandomForests/SVM/Neural Networks all stand pretty equally and have different strengths. If you've just got rows and rows of data with numeric, categorical and missing values it's hard to beat the speed and quality of shoving it in a RandomForest. However to my knowledge SVMs are still better at solving NLP categorization tasks and handling sparse, high dimensional data. And Neural Networks always seem to be popping up solving very weird and/or hard problems.

[+] nphrk|13 years ago|reply

Well not quite. While SVMs gained a lot of popularity for having nice properties e.g.

  1) a convex problem which means a unique solution and a lot of already existing technology can be used
  2) the "kernel trick" which enables us to learn in complicated spaces without computing the transformations
  3) can be trained online, which makes them great for huge datasets (here the point 2) might not apply - but there exist ways - if someone's interested I can point out some papers)

There is an ongoing craze about deep belief networks developed by Hinton (who is teaching this course) who came up with an algorithm that can train them (there exist local optima and such, so it's far from ideal). Some of the reasons they're popular

  1) They seem to be winning algorithm for many competitions / datasets, ranging from classification in computer vision to speech recognition and if I'm not mistaken even parsing. They are for example used in the newer Androids.
  2) They can be used in an unsupervised mode to _automatically_ learn different representations (features) of the data, which can be then used in subsequent stages of the classification pipeline. This makes them very interesting because while labelled data might be hard to get by, we have a lot of unlaballed datasets thanks to the Internet. As what they can do - see the work by Andrew Ng when they automatically learned a cat detector.
 3) They're "similar" to biological neural networks, so one might think they have the necessary richness for many interesting AI applications.

[+] karpathy|13 years ago|reply

Here's the problem: There is no silver bullet in Machine Learning and many of these approaches (SVMs, Neural Nets, Random Forests, PGMs, etc.) have their pros and cons that depend on many variables, for example:

- How much data do you have wrt dimensionality?

- How "easy" do you suspect your problem to be? Is it likely linearly separable? Equivalently, how good are your features?

- Do you have many mixed data? Missing data? Categorical/Binary data mixed in? (Better use Forest, perhaps!)

- Do you need training to be very fast?

- Do you need testing to be very fast on new out of sample data?

- Do you need a space-efficient implementation?

- Would you prefer a fixed-size (parametric) model?

- Do you want to train the algorithm online as the data "streams" in?

- Do you want confidences or probabilities about your final predictions?

- How interpretable do you want your final model to be?

etc. etc. etc. Therefore, it doesn't make any sense to talk about one method being better than another.

One thing I will say is that, as far as I am aware, Neural Nets have a fair amount of success in academia (which should be taken with a grain of salt!), but I haven't seen them win too many Kaggle competitions, or other similar real-world problems. SVMs or Random Forests have largely become the weapon of choice here.

Neural Nets do happen to be very good when you have a LOT of data in relatively low-dimensional spaces. Many tasks, such as word recognition in audio or aspects of vision fall into this category and Google/Microsoft and others have incorporated them into their pipelines (which is much more revealing than a few papers showing higher bars for Neural Networks). In these scenarios, Neural nets will parametrically "memorize" the right answers for all inputs, so you don't have to keep the original data around, only the weighted connections.

Anyway, I wrote a smaller (and related) rant on this topic on G+: https://plus.google.com/100209651993563042175/posts/4FtyNBN5...

[+] mturmon|13 years ago|reply

The way it's worded is not 100% clear. Hinton, who is an excellent lecturer and explainer, is talking about neural nets trained with "deep learning" techniques (not vanilla single-hidden-layer nets), which have had striking success at hard vision problems that have been difficult to solve top-to-bottom with SVMs (e.g., you could get good performance from an SVM, but you'd have to go on a hunt for good low-level features first).

That said, there is a rather unhelpful herd mentality in the field, with people moving from one Next Big Thing to another, disparaging the previous Big Thing along the way.

[+] nphrk|13 years ago|reply

Well not quite. While SVMs gained a lot of popularity for having nice properties e.g.

1) a convex problem which means a unique solution and a lot of already existing technology can be used

2) the "kernel trick" which enables us to learn in complicated spaces without computing the transformations

3) can be trained online, which makes them great for huge datasets (here the point 2) might not apply - but there exist ways - if someone's interested I can point out some papers)

There is an ongoing craze about deep belief networks developed by Hinton et al. (who is teaching this course) who came up with an algorithm that can train them reasonably well (there exist local optima and such, so it's far from ideal). Some of the reasons they're popular

1) they seem to be winning algorithm for many competitions / datasets, ranging from classification in computer vision to speech recognition and if I'm not mistaken even parsing. They are for example used in the newer Androids.

2) DBNs can be used in an unsupervised mode to _automatically_ learn different representations (features) of the data, which can be then used in subsequent stages of the classification pipeline. This makes them very interesting because while labelled data might be hard to get by, we have a lot of unlabelled datasets thanks to the Internet. As what they can do - see the work by Andrew Ng when they automatically learned a cat detector.

3) DBS are "similar" to biological neural networks, so one might think they have the necessary richness for many interesting AI applications.

[+] rm999|13 years ago|reply

I am not an expert in SVMs, but I consider myself fairly experienced in machine learning. In my professional experience the answer to your question is 'not quite'. SVMs have solved some problems very well, but I've had issues with them:

1. They are only for classification, not every problem is classification. The other big category is regression, for example predicting the sale price of a home rather than predicting a binary "will it sell"

2. They don't have a natural probabilistic interpretation for classification. Neural networks for classification (with a logistic activation function) are trained to predict a probability, not make a simple binary decision. In practice this probability is usually very useful, although I believe SVMs have been modified to give some kind of probability.

3. I have had a tough time getting them to run quickly. Linear kernel SVMs are fast, but aren't powerful. More complex kernels are more powerful but can be very slow on moderately large datasets.

[+] unknown|13 years ago|reply

[deleted]

[+] isakovic|13 years ago|reply

I took Professor Hinton's course on Neural Networks as an undergrad. This man is the most intelligent person I have ever met. He is one of the giants.

[+] tocomment|13 years ago|reply

Did it already start? Is it too late to start?

Also I took a nn class in college so do you think I would get much more out of this?

[+] ameasure|13 years ago|reply

It just started on Monday, there's plenty of time to join in.

There have been some huge developments in neural networks in the last few years, particularly with respect to deep learning. If you missed out on that you might want to try this class. Hinton has been involved in many of these advances.

The second half of the course appears to focus on deep learning topics so you might want to start there if you already know the basics.

[+] rubashov|13 years ago|reply

I tried to do a couple coursera courses and found the video lectures highly inefficient; very needlessly time consuming, even watching them sped up. All I really want is a glorified text book with quiz grading and a final.

[+] notimetorelax|13 years ago|reply

It depends on how much you value your time. Lectures usually are shorter than 2 hours per week. (There are some that have longer videos, but I think more than 2 hours is suboptimal.) I know that before courses I was wasting this time on hacker news or reddit, so I don't value my time that much. On the other hand I do now, and that's because I need to watch the lectures and do the home work. And really these lectures perform the same role in the learning process as the real lectures. You could graduate from university only with text books, but you might not get some insight that lecturers have.

My 0.02 chf.

[+] minikomi|13 years ago|reply

It sounds terribly privileged to say so, but I'm afraid I have to agree. Also, quite often the quizzes are directly based on the videos ("What did line A represent in ~ graph?"), while I find I self learn better through reading.

[+] vitno|13 years ago|reply

this. No offense to these professors, but what are they presenting in their video lectures that I can't garner from their writing?

56 comments