
The Gap: Where Machine Learning Education Falls Short

56 points| andreyk | 5 years ago |thegradient.pub

56 comments

[+] bonoboTP|5 years ago|reply

This complaint comes up time and time again. "Universities should prepare students better for jobs!" "Teach more real life skills!" "Turn CS uni into trade school!"

But a computer science university program is not about scikit-learn or TensorFlow! It's about long-lasting principles, underlying mathematics, mental models and ways of thinking.

None of my computer science lectures were about how to apply that particular part of CS knowledge in some hot new Python library. It's expected that there will be some amount of time required to adjust to a company's software setup. That's not a big hurdle usually.

I'm not saying it should be only theory, though. University courses often have accompanying assignments or projects. Depending on the country in question, they often offer more hands-on, practical courses ("lab courses") as well, where you do actually go through the steps of making the theory work in real life. I had such courses where we played with microcontrollers and FPGAs to understand CPU instructions, assembly and low-level C concepts. But even there the goal wasn't to learn exactly the thing that you will use on the job: most CS graduates will never need to program FPGAs in their day job.

Sure, there is a place for even more data engineering training, but I don't think it's computer science university programs. Where do people like network engineers learn how to configure Cisco routers and use whatever config software they use? Where do sysadmins learn Bash, Unix, backup management etc.? Not in university courses. Wherever they learn those skills, that's where the data cleaning, parallelization engineering etc. aspects of machine learning should be taught as well.

[+] jagger27|5 years ago|reply

There’s a big gap between the whiteboard in a classroom and a blinking terminal cursor. As a teaching assistant a few years ago I spent a good chunk of my time in the lab showing otherwise brilliant computer science students things like basic terminal commands, how to read error messages, and just overall basic problem solving skills. Almost all of the students I worked with were much stronger than me in terms of theory and what I call “whiteboard computer science”, but many of those same students who aced written tests really struggled with basic roadblocks like learning language syntax to do practical assignments.

It sounds really silly, but some of the best instruction I gave was “Tab” to autocomplete a command and “Up Arrow” to re-run the last command. Whenever I would do a class demo on the projector someone would always stop me to ask how I was running commands so quickly and fluidly and how on Earth I could remember them all.

[+] mlthoughts2018|5 years ago|reply

I think the article is actually saying the exact opposite - that Tensorflow / PyTorch / sklearn code soup from “trade school” sources like bootcamps or quick online programs is not very valuable out in the world.

You might be misunderstanding the focus on data cleaning and feature engineering as being less specialized than say PyTorch coding but it’s exactly the opposite.

The most critical aspects of ML engineering for production are all about advanced statistics. Understanding multicollinearity, overfitting, dimensionality reduction, convergence, and time series issues like assumptions of stationarity or conditional independence effects.

Any engineer can crank out neural network software - that has pretty much zero value.

Value lies in realizing some stratification error in the data and following that lead to use a multi-level model to control for it. Value lies in realizing several key feature inputs are correlated on a seasonal basis - leading to multicollinearity - and then setting up some adaptive feature aggregation to mitigate it and dashboards with things like variance inflation factor to be able to raise alerts on it across time.
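To make the variance inflation factor concrete, here is a minimal sketch in plain Python. It relies on the standard identity that the VIF of each feature is the corresponding diagonal entry of the inverse correlation matrix. The data, the feature names (`feat_a`, `feat_b`, `feat_c`) and the helper functions are all hypothetical, made up for illustration; a real pipeline would more likely use a statistics library.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per feature: the diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[i][i] for i in range(k)]


# Hypothetical data: two features share a 12-step seasonal component,
# a third is independent noise.
rng = random.Random(0)
season = [math.sin(2 * math.pi * t / 12) for t in range(240)]
feat_a = [s + rng.gauss(0, 0.1) for s in season]
feat_b = [2 * s + rng.gauss(0, 0.1) for s in season]
feat_c = [rng.gauss(0, 1) for _ in range(240)]

scores = vif([feat_a, feat_b, feat_c])
for name, v in zip(["feat_a", "feat_b", "feat_c"], scores):
    print(f"{name}: VIF = {v:.1f}")
```

The two seasonal features should come out with a much higher VIF than the independent one. A common rule of thumb treats values above roughly 5 to 10 as worth an alert, though the right threshold depends on the problem.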

Value lies in working on small data problems and using literature review to determine the best prior to use for a Bayesian model, and doing robust posterior predictive checks to validate it.
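As a rough illustration of a posterior predictive check on a small data problem, here is a sketch using a conjugate Beta-Binomial model in plain Python. Everything specific in it is an assumption for the example: the batch counts, the Beta(2, 8) prior standing in for one chosen from the literature, and the choice of the range of counts as the test statistic.

```python
import random

rng = random.Random(42)

# Hypothetical small dataset: successes out of 20 trials in each of 8 batches.
trials = 20
observed = [3, 5, 4, 6, 2, 5, 4, 3]

# Assumed Beta(2, 8) prior, standing in for one chosen from the literature.
prior_a, prior_b = 2.0, 8.0

# Conjugate update: Beta prior + binomial likelihood gives a Beta posterior.
successes = sum(observed)
failures = trials * len(observed) - successes
post_a, post_b = prior_a + successes, prior_b + failures


def replicate():
    """Draw a rate from the posterior, then simulate a dataset shaped like the observed one."""
    p = rng.betavariate(post_a, post_b)
    return [sum(rng.random() < p for _ in range(trials)) for _ in observed]


def spread(data):
    """Test statistic: range of batch counts, sensitive to overdispersion."""
    return max(data) - min(data)


obs_stat = spread(observed)
rep_stats = [spread(replicate()) for _ in range(2000)]

# Posterior predictive p-value: share of replicated datasets
# at least as spread out as the observed data.
ppp = sum(s >= obs_stat for s in rep_stats) / len(rep_stats)
print(f"observed range = {obs_stat}, posterior predictive p-value = {ppp:.2f}")
```

A p-value very close to 0 or 1 would suggest the model fails to reproduce that aspect of the data, for example because the batches differ more than a single shared rate allows. A value in between is not proof the model is right, only that this particular check did not catch a problem.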

These things require many years of education and experience dealing with statistical irregularities, understanding confounders and causal inference, understanding missing data treatments, understanding time series forecasting.

You cannot learn that in 101 courses that overly focus on the mechanics of how to type Tensorflow or sklearn code - that part can be picked up by anyone in a month or two. And mere intro to data cleaning and plotting distributions or proportions of missing data is not a substitute for actual statistical knowledge.

[+] jasim|5 years ago|reply

I've oscillated between these two positions for a few years now, when in truth the two positions are not really in conflict.

When we complain about universities not preparing students better for jobs, what we really mean is that universities are not doing the bare minimum that they should be doing - in case of CS, students should at least know how to program well, and be well versed in the practicalities of computing. That does not exclude learning the fundamentals (which is often denigrated as "theory").

It is just that students often have neither the theory nor the practice, and at a minimum, we're asking, they should know the practice so they can at least be useful in their jobs.

[+] tracer4201|5 years ago|reply

I’m an engineer who helps quite a bit with hiring interviews. In my humble opinion, there’s a surprising number of fresh grad candidates who are not very skilled in either theory or practice.

It has been many, many years since I was in school. I think it’s fine that your computer science education focuses on fundamental CS concepts and the mathematics so you can easily pick up areas that require that math (ML, for example). I do think universities can do better. At my school, we had mandatory “block” classes in arts and humanities, which in my opinion, offered no value.

To be clear, I’m not saying these subjects aren’t valuable. I am saying, however, that the quality of these courses was very poor, and they could have been substituted entirely with classes related more to my discipline.

I remember sitting in some political science classes that were part of this required pool. I have no idea what we learned in there. As far as I can remember, we read political science papers that were very poorly written/extremely inaccessible. It was impossible to differentiate the authors' personal opinion from objective truth of any kind. It was more or less a checkbox - I had to have so many credit hours from this pool in their curriculum to graduate. Did it force me to think critically in some manner? Not at all. It actually gave me a false sense of how “intelligent people” write.

Yes - all these years later I understand that it wasn’t just me who couldn't understand those papers - it was half-rambling, pretentious nonsense. It was the opposite of effective communication, and it provided no value.

[+] andreyk|5 years ago|reply

To be fair, this piece is mostly arguing against just teaching Tensorflow or Pytorch and is in favor of more general skills (with data engineering being fairly general, though as you point out also something that can be taught via concrete assignments). And as to your last point, that's pretty much the conclusion of the piece itself:

"Based on the current state of machine learning courses it is clear that AI courses will get you through the door in your effort to perform cutting edge research or landing a machine learning job, but they won’t teach you everything you need to know. To fill in the knowledge gaps that remain you will have to put in outside effort on your own. "

I guess the question is whether the outside effort needs to be addressed by universities, or by other resources.

[+] dunkelheit|5 years ago|reply

This complaint has nothing to do with teaching hot new python libraries.

The thing is, data cleaning is no less fundamental than backpropagation. Maybe more so - learning algorithms come and go, but real-world data is always going to be inherently messy. The difference is that we have a beautiful mathematical theory for backpropagation but not for data cleaning. So the courses that teach the former but not the latter are akin to the proverbial drunkard who searches for his lost keys under the street light - beautiful mathematical theories are easier to lecture on, so they teach them instead of messier (but no less fundamental or useful) topics such as data cleaning.

[+] PeterisP|5 years ago|reply

To comment on "Where do people like network engineers learn how to configure Cisco routers and use whatever config software they use? Where do sysadmins learn Bash, Unix, backup management etc?" - they certainly can do so in university courses.

Just as universities offer study programs and degrees in software engineering, there are also programs and degrees for network engineering, which would include not only the theoretical basis of networking but university courses for applied networking where they would learn all that you describe and much more; a university teaches a network engineer to configure routers and manage backups just as it teaches a first-year electronics engineer to solder stuff. Sure, a generic computer science or software engineering program will not include these courses, that's usually a separate specialization, but universities definitely do offer engineering programs.

[+] tester756|5 years ago|reply

>None of my computer science lectures were about how to apply that particular part of CS knowledge in some hot new Python library.

Mine were

I had C#, .NET Core, Docker, MongoDB, MSSQL, Postgres, GraphQL, OData, Neo4j, Redis, WebAssembly (Blazor), React, Vue.js and stuff like Git.

That was covered in "Web apps", "Databases", "Non relational databases" and, alongside those, some bigger and smaller programming projects.

Public school, studying at weekends.

[+] codelord|5 years ago|reply

I got my BSc in computer science and PhD in machine learning, and ended up working in a top FAANG AI research lab.

In hindsight, both when doing research for my PhD and when working as an engineer, I felt the most useful courses from undergrad were linear algebra, algorithms, calculus, operating systems, and statistics, in that order. I ended up filling the gaps in my math education later by reading textbooks and taking online courses.

IMO an undergrad program should focus on very fundamental theory. If I were in charge of designing CS programs I would quadruple the amount of credits required in math and specifically in linear algebra. You would be surprised how handy and applicable linear algebra is in ML, CV, robotics, computer graphics, finance, etc. Calculus is also important but to a lesser degree.

It's a waste of time to teach TensorFlow or teach the trendiest neural network architecture at school. The knowledge becomes irrelevant in a few years, and it's fairly easy to pick it up by reading docs/papers if you know the fundamentals.

[+] throwawaygh|5 years ago|reply

> It's a waste of time to teach TensorFlow or teach the trendiest neural network architecture at school. The knowledge becomes irrelevant in a few years, and it's fairly easy to pick it up by reading docs/papers if you know the fundamentals.

Well, kind of. You teach one or two instances of such things as a case study in how to learn a framework. Usually Software Engineering courses are the best place to do this. The point is, your ML course should probably not be spending any time on things like pytorch. A sophomore level engineering course should have already taught students how to go through the process of learning a new framework.

[+] ZephyrBlu|5 years ago|reply

Which areas of Linear Algebra did you find particularly useful?

[+] _RPL5_|5 years ago|reply

The author makes a point that I relate to. I've been on the receiving end of a couple 'Statistical Learning 101' courses. These courses go roughly as he describes it in his blog post: they first teach you how to multiply two matrices, then launch into linear and logistic regression, then classification via clustering, then decision trees and SVMs, then CNNs and deep learning. Along the way, they do a lecture or two on reinforcement learning and HMMs.

In the end, I ended up with a thin smear of half-baked knowledge in my head, having stopped understanding the math about halfway through the material.

So how do you achieve this level of deep/intuitive understanding:

> Without understanding the mathematical underpinnings of key models and techniques in full detail, students aren’t able to quickly choose the right models for certain scenarios.

Does anyone have a good study plan with MOOCs and so on? If you have any practical advice, I would appreciate it!

[+] orange3xchicken|5 years ago|reply

I mean it really depends how deep you want to go. Like you point out that the classes you took are 101 courses. These are really just "tasting" courses. I'm sure if you decided to take more advanced/grad numbered courses, or unnumbered "topics" courses, you would have a better idea of what's going on.

In general, "having a deep understanding of models & techniques in full detail" is not well-defined. For example, analysis of linear regression is often offered as a full year-long sequence for graduate students in math/stats depts. Is this necessary for doing linear regression in practice? Not really, but who cares - it's interesting stuff in its own right. Most people need just enough understanding to finish a job.

In general, the precise medium that you use to study something isn't that important as long as it works for you, but there is a good reason that there are longstanding classic textbooks that people swear by in most fields of mathematics. I do strongly feel that, in any mathematical subject, there are few substitutes for the grind - doing proofs and solving problems on your own.

Okay, but just to have at least one link in my post, I want to share this guy MathematicalMonk who used to make really great videos on ml-related stuff:

https://www.youtube.com/channel/UCcAtD_VYwcYwVbTdvArsm7w

[+] ImaCake|5 years ago|reply

Part of the problem is that math and statistics are poorly taught most of the time. Textbooks will skip crucial steps in their proofs. Math professors will be too lazy to update their course notes based on frustrated student feedback. Courses will lack tutorials and other ways to discuss problems. I shouldn't have any reason to be watching youtube videos or khan academy to fill the gaps. But I, like thousands of others, am forced to do exactly that.

[+] Exuma|5 years ago|reply

So how about actually giving suggestions of courses or resources that fix this, instead of just saying what's broken and then printing your name.

[+] lukeplato|5 years ago|reply

He proposes changing the syllabus of advanced/graduate-level courses to skip reviewing linear classification and backprop and make that a pre-requisite.

[+] marketingPro|5 years ago|reply

Keep reading the article

[+] shekharshan|5 years ago|reply

The author highlights that courses don't teach enough of the mathematical theory behind the various techniques. I have tried Andrew Ng's course on Coursera. It uses Octave from what I remember. After a point my lack of mathematical background started to show. I have always wondered where I can find a course that teaches both the mathematical background and the hands-on programming in a balanced way.

[+] glitchc|5 years ago|reply

“Ah I see, your college degree is in Arts.”

“Wait, what do you mean you can’t sing, or act, or draw? What do they teach you in school??!?!?”