If one were to use Hacker News as their only source of information, it would seem that machine learning is a very overrated topic. There is something related to it on HN's front page almost every day. This proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to become saturated quickly. Is this the current trend? If yes, is it limited to the US? What about the machine learning scene in Europe? Maybe someone here could provide some perspective.
[+] [-] rm999|9 years ago|reply
The supply-demand dynamics have changed a lot in the last couple of years. I'd roughly break it out into two groups: people with work experience + strong software development skills, and those without. The first group is in higher demand than ever, and tends to add a lot of value to companies that really need it.
The second group has gotten extremely crowded, especially with STEM graduates - usually with a master's or PhD - who have completed MOOCs or bootcamps. Supply keeps growing while demand is flat or shrinking (especially as executives get burned by "data scientists" who don't know how to help them build things of value). There's a huge crunch here; a lot of people I know in this group have been searching for jobs for months, eventually settling for a low-quality job or giving up entirely :(
[+] [-] pcsanwald|9 years ago|reply
The former kind of data scientists were very successful at our company, the latter, not so much. Both categories I described usually had a STEM type PhD.
[+] [-] drew|9 years ago|reply
But that's by no means all of the DS field. There are lots of DS jobs where you're collecting and interpreting and communicating about complex data sets. An engineering mindset is occasionally helpful, but a bias towards building versus towards analyzing and writing can just as often be counter-productive. Not all problems are solved by systems; lots of problems are solved by better understanding the problem and then letting other specialists build the right solution.
The bootcamps have contributed to the problem by focusing so much on building things. The idea that you can go from an econ undergrad to being a self-sufficient member of a production ML team in 6-12 weeks is nuts. What's less nuts (and what I wish programs like Insight focused on) is taking people from having data skills in one domain and with one set of tools (e.g. longitudinal medical record data, stored in CSVs and handled in Stata) to another set of tools (billions of rows of event-based product data stored in a data warehouse, processed in R or Python). But instead the bootcamps behave like the missing skillset is the ability to make a predictive random forest model on some arbitrary data set and build an AWS web app around it. THAT job market definitely doesn't exist and is completely over-saturated.
But people who are smart communicators about data, can manipulate and make sense of massive data sets, can ask incisive questions about their data, and can use data to convince people of a complex argument are always going to have job opportunities, even if they're not production grade engineers. If that sounds like you, I'm hiring - hit me up on Twitter: @drewwww.
[+] [-] wakkaflokka|9 years ago|reply
I'm not coming from a CS background and don't purport to know absolutely all of the details of all the mathematics and theory behind many of the machine learning algorithms that I use. I try my best daily to expand my knowledge, understand the algorithms, and apply them appropriately. I would hope that any company who is looking for someone who has a PhD-in-CS-or-ML could weed someone with lesser knowledge out during the interview process.
With that being said, a couple lines of code using SciKit Learn and all the default parameters is enough to impress many non-tech companies that are looking for a way to use 'predictive' in their marketing materials. And they pay very well for it. I get the feeling that provokes the ire of people who think those types of basic implementations belong to the traditional label of 'data analyst'.
For what it's worth, I work with data sets that aren't quite large enough to justify anything more than Python, Pandas, SKLearn, Luigi pipeline, and PySpark. The vast majority of my time is spent cleaning the data and generating features, much less on hyperparameter tuning and model training itself.
Anyways, I think I'm rambling a bit.
I just want to say that I LOVE this job, whatever the label is, or whatever the hype surrounding the label is, and I hope it's around for a while before it's automated...
[+] [-] cardine|9 years ago|reply
[+] [-] hellogoodbyeeee|9 years ago|reply
[+] [-] TXV|9 years ago|reply
[+] [-] hiddencost|9 years ago|reply
[+] [-] rafinha|9 years ago|reply
[+] [-] huac|9 years ago|reply
One data engineering role asked me to implement k-means from scratch and one data science role asked me to do some algorithms whiteboarding. But beyond this, people just asked SQL. From the rest of this thread it doesn't look like people would consider that to be 'SWE ability'.
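For readers wondering what "implement k-means from scratch" typically involves, here is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration added for context, not the interview question or answer from the comment above; the function names (`kmeans`, `sq_dist`, `mean_point`), the random initialization, and the toy data are all assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(p, centroids):
    """Index of the centroid closest to point p (first index wins ties)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct input points at random.
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no change means convergence
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Toy data (made up for this sketch): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

A production version would usually add a smarter initialization (such as k-means++) and several restarts, since plain Lloyd's algorithm can settle in a poor local optimum.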
[+] [-] bjlkwpZxAygCIg|9 years ago|reply
3 easy steps to get a job in DS if you want them though: Grad Degree in Math/Stats/CompSci; work on a bunch of hard to predict problems and then publish and present them to your local meetup community to gain experience; learn engineering tools and devops and be about 90% as good a software engineer as your team's actual engineers (git, hg, IDEs, java, pig)... your brilliant models are way less important than being able to help the already overwhelmed engineering team make them work.
[+] [-] shas3|9 years ago|reply
[+] [-] datascientist|9 years ago|reply
[+] [-] LeanderK|9 years ago|reply
[+] [-] hardtke|9 years ago|reply
[+] [-] user5994461|9 years ago|reply
They've long understood that there are the finance analysts on the one hand and the software devs on the other. They get both and make them work together.
Looking for 5 rare skills in a single person is bound to end in disappointment: maths, statistics, programming, large scale systems, production.
[+] [-] runT1ME|9 years ago|reply
Can you elaborate on this, and at what level? Are you talking about a PhD level of understanding of cutting edge mathematics, or do you mean understand the basics, or somewhere in between?
[+] [-] notyourwork|9 years ago|reply
[+] [-] IndianAstronaut|9 years ago|reply
[+] [-] gech|9 years ago|reply
[+] [-] PLenz|9 years ago|reply
I think you also need to not confuse the growing ease of machine learning tools with the role becoming more accessible. There is a wide gap between tooling and knowledge to use those tools appropriately and creatively.
And may I never write another HN comment on my cell phone again.
[+] [-] aub3bhat|9 years ago|reply
The major issue is that Data Scientist is a very fuzzy term, applied to everyone from undergraduates with a Stats degree to those with PhDs and papers at KDD/ICML/NIPS/CVPR.
However, rather than doing a Frontend or Mobile developer coding bootcamp, a data science bootcamp is likely to lead to more transferable skills in case you wish to get an MBA etc.
[1] http://stackoverflow.com/company/salary/calculator?p=7&e=1&s...
[+] [-] huac|9 years ago|reply
[+] [-] caminante|9 years ago|reply
[0] http://www.gartner.com/newsroom/id/3412017
[+] [-] BigJeffeRonaldo|9 years ago|reply
http://m.imgur.com/gallery/noBKI
[+] [-] vtange|9 years ago|reply
Right now however the theme I've heard from the higher ups has been profitability, and this applies to all tech companies in general. Easy capital is gone and now companies are in the spotlight for not making profits.
So at least from my company's perspective, it's not that data science is saturated, it's that we're trying to not break the bank and hire too much.
[+] [-] stared|9 years ago|reply
General data science is in need. I can get contracts easily, and I know that people looking for competent people need to wait; especially as it is a skill much harder to pick up than, say, front-end web dev (unless someone starts from a highly quantitative background like physics, modelling in biology, etc). My general impressions are:
- ML (especially the practical kind, like logistic regression and random forest) is often an integral part of many data analyses (or at least a plus),
- there are not as many jobs solely focused on ML; and if so, often they require some specialist expertise,
- and even fewer only for deep learning (also, for DL there is a relatively high threshold for having skills at a "hireable" level).
Some of my tips on how to learn data science: http://p.migdal.pl/2016/03/15/data-science-intro-for-math-ph... (on purpose I put the emphasis on general data exploration/analysis before machine learning).
[+] [-] fnbr|9 years ago|reply
[+] [-] user5994461|9 years ago|reply
In my opinion, you could start by defining what a data science, quant, or machine learning job is. Because that's not clearly defined. It means different jobs to a lot of people, jobs that are all hard to learn and absolutely NOT interchangeable.
[+] [-] lowglow|9 years ago|reply
This depends quite a bit on critical thinking, a good fundamental ability to analyze a problem and understand its parameters, then manage the logical operations required to deliver the feature and solve the problem.
As for why I think it's on HN every day: I also like to think of an innovation pipeline happening something like this:
We're now in some sort of refinement cycle of innovation, where the current medium has been saturated on some level and there is a lot of push to mine value from the discoveries.
[+] [-] solomatov|9 years ago|reply
[+] [-] wjn0|9 years ago|reply
The bias might stem from the fact that we have some huge names in AI doing research here, but the data points seem clear (we say undergraduate education is slow to catch on, right?): the topic as a whole isn't overrated.
However, there seems to be a lack of understanding by people working in tech of the differences (in uses, theory, implementation) between ML, AI, NN, DL, etc. This might stem from a lack of understanding of the foundations of these topics (ex: statistics, vector calculus) or simply because we can abstract a lot of this away (ex: TensorFlow).
[+] [-] curiousgal|9 years ago|reply
That would work up to the point a better abstraction tool/framework comes along. I'd never try to build a career on a single framework, because frameworks come and go.
[+] [-] spraak|9 years ago|reply
> I'm an undergrad at a big university known for CS in Canada
than the actual name of the university?
> I'm an undergrad at ${name of university}
[+] [-] TYPE_FASTER|9 years ago|reply
I think making the transition from the first role to the second role comes with experience, both with the toolsets, and thinking about the problem as a whole.
[+] [-] thearn4|9 years ago|reply
Isn't that describing a statistician?
[+] [-] simonhughes22|9 years ago|reply
[+] [-] platz|9 years ago|reply
[+] [-] freyr|9 years ago|reply
Programming is a tool to create and synthesize. It leads to new products, companies, and solutions. Data science is analysis, not synthesis. You collect data, you interpret it, you move on to other data. Nothing gets created, which for me, is a deal breaker for job satisfaction.
[+] [-] vogt|9 years ago|reply
I can't speak to anything regarding ML, but for whatever it's worth in our segment of the market we have seen a lot of competition emerge in a big way the last few years. Former academic-type firms who specialized in bespoke economy analysis reports are starting to build software around all of the data that is out there since it's never been easier to collect and normalize it. I think it's a stretch to say the market is approaching saturation for us, though.
[+] [-] laughfactory|9 years ago|reply
There are a lot of people who know more about modeling, software engineering, statistics, machine learning, analytics, and so on than I do. But I excel at bringing everything together and solving difficult business problems. It's really difficult to train someone to be this way. It takes a lot of time, experience, skills, and a unique disposition to be an effective data scientist. At least to be the kind of data scientist I am. And I'm still early in my career.
Just my two cents. I suspect there will continue to be a glut of people who, on paper, have the data science skills, but lack all the intangibles. Who knows, maybe the various programs and boot camps will start doing business scenario learning: here's a tough real world problem where we don't tell you how to solve it, but we desperately need you to figure it out. Go!
[+] [-] apohn|9 years ago|reply
I'm going to speak primarily about applied data science. This means a data scientist who is solving a business need by doing ad-hoc analysis or building a reusable solution (e.g. an R+Shiny dashboard) to a business problem.
Jobs: There are plenty of jobs out there, but you have to be careful. Many "Data Science" jobs are really BI, Business Analyst, or Sales Engineer types of jobs where some VP got it in their head that they need a Data Scientist. These jobs are great for people who are okay with Technology and Data Science being 10% of their job - and many people are like that. They don't care about engineering, coding, or tech and statistics beyond the minimum to do their jobs. But if you really want a job that involves solid tech and stats/ML skills you will be unsatisfied at these types of jobs.
Right now there are plenty of hard business problems that people want to turn into Data Science problems because they think it'll give them a competitive edge or something to market and show off. This results in more data science job openings. However, they are not really data science problems. As somebody else said, people will eventually realize they are not getting the value they need with data scientists doing these types of jobs. Then they'll replace that person with an MBA with some DS coursework (e.g. MBA who can use KNIME or SAS Enterprise Miner) or eliminate the position.
People: I interview people and I know people at other organizations who interview candidates for Data Science roles. MOOCs and many degree programs (including 2 year MS degrees) are pushing out people who have a very superficial overview of data science. Basically they teach them about every ML algorithm in the known universe and the functions to call them in R/Python/SAS. The end result is a mediocre coder or non-coder who boils everything down to a confusion matrix or root mean squared error. But they cannot actually think through a business problem or see why a low error doesn't equal a good model (see http://www.tylervigen.com/spurious-correlations).
Finding good people is hard and you have to be flexible to realize great people can come from different backgrounds.
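To make the spurious-correlation point above concrete, here is a small sketch (added for illustration, not part of the original comment). It builds two synthetic series that share nothing except an upward trend; both series and the helper names `pearson` and `diffs` are made up for this example. The raw levels correlate strongly, while their period-to-period changes barely correlate at all, which is one reason a good-looking fit metric alone says little about whether a model captures a real relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def diffs(xs):
    """Period-to-period changes, which remove a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Synthetic, unrelated series: each is a linear trend plus its own wobble.
series_a = [t + 0.5 * (-1) ** t for t in range(20)]
series_b = [3 * t + (7 * t) % 5 for t in range(20)]

r_levels = pearson(series_a, series_b)                  # driven by the shared trend
r_changes = pearson(diffs(series_a), diffs(series_b))   # trend removed

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Differencing is only one simple check; the broader point is that someone has to ask whether the relationship makes sense for the business problem before trusting the number.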
[+] [-] plafl|9 years ago|reply
[+] [-] DrNuke|9 years ago|reply
[+] [-] manbilla|9 years ago|reply
[+] [-] androck1|9 years ago|reply
[+] [-] manish_gill|9 years ago|reply