Ask HN: What's the state of the job market in data science and machine learning?

204 points| _tjxd | 9 years ago | reply

If one were to use Hacker News as their only source of information, it would seem that machine learning is a very overrated topic. There is something related to it on HN's front page almost every day. This proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to getting saturated quickly. Is this the current trend? If yes, is it limited to the US? What about the machine learning scene in Europe? Maybe someone here could provide some perspective.

137 comments

[+] rm999|9 years ago|reply
Speaking for NYC, but I imagine Silicon Valley is similar.

The supply-demand dynamics have changed a lot in the last couple years. I'd roughly break it out into two groups: people with work experience + strong software development skills, and those without. The first group is in higher demand than ever, and tends to add a lot of value to companies that really need it.

The second group has gotten extremely crowded, especially from STEM graduates - usually with a master's or PhD - who have completed MOOCs or bootcamps. Supply keeps growing while demand is flat or shrinking (especially as executives get burned by "data scientists" who don't know how to help them build things of value). There's a huge crunch here; a lot of people I know in this group have been searching for jobs for months, eventually settling for a low quality job or giving up entirely :(

[+] pcsanwald|9 years ago|reply
I've only been hiring DS folks since 2012, but my experience matches what you've said exactly. The biggest differentiator I've seen is to be able to participate in actually building production quality systems vs being proficient enough in R or python to hack together a prototype on a very small dataset.

The former kind of data scientists were very successful at our company, the latter, not so much. Both categories I described usually had a STEM type PhD.

[+] drew|9 years ago|reply
The problem is DS is really 2-3 different disciplines under one nebulous title. What you're describing is folks who are prototyping and productionizing models. That's definitely in short supply, but random STEM PhDs are in no way competitive for those roles unless they're coming from CS programs + have work experience in production engineering.

But that's by no means all of the DS field. There are lots of DS jobs where you're collecting and interpreting and communicating about complex data sets. An engineering mindset is occasionally helpful, but a bias towards building versus towards analyzing and writing can just as often be counter-productive. Not all problems are solved by systems; lots of problems are solved by better understanding the problem and then letting other specialists build the right solution.

The bootcamps have contributed to the problem by focusing so much on building things. The idea that you can go from an econ undergrad to being a self-sufficient member of a production ML team in 6-12 weeks is nuts. What's less nuts (and what I wish programs like Insight focused on) is taking people from having data skills in one domain and with one set of tools (e.g. longitudinal medical record data, stored in CSVs and handled in Stata) to another set of tools (billions of rows of event-based product data stored in a data warehouse, processed in R or Python). But instead the bootcamps behave like the missing skillset is the ability to make a predictive random forest model on some arbitrary data set and build an AWS web app around it. THAT job market definitely doesn't exist and is completely over-saturated.

But people who are smart communicators about data, can manipulate and make sense of massive data sets, can ask incisive questions about their data, and can use data to convince people of a complex argument are always going to have job opportunities, even if they're not production grade engineers. If that sounds like you, I'm hiring - hit me up on Twitter: @drewwww.

[+] wakkaflokka|9 years ago|reply
I'm coming from a PhD in STEM where I did a lot of application of basic ML to my (neuroscience) research, and it took me a good while to get a data science position. But once I got the job, I have been inundated with interview requests, both from recruiters and from specific companies, as in multiple a week (granted most are blast-em-all style recruiters, that I'm sure anybody with any tech skills gets). Maybe it's because I'm on the East Coast? I do feel like everybody in neuro is jumping on the bandwagon, and that generates a bit of "you don't belong here" feeling from the CS- or math-educated crowd. But over time I think this will all level out.

I'm not coming from a CS background and don't purport to know absolutely all of the details of all the mathematics and theory behind many of the machine learning algorithms that I use. I try my best daily to expand my knowledge, understand the algorithms, and apply them appropriately. I would hope that any company who is looking for someone who has a PhD-in-CS-or-ML could weed someone with lesser knowledge out during the interview process.

With that being said, a couple lines of code using SciKit Learn and all the default parameters is enough to impress many non-tech companies that are looking for a way to use 'predictive' in their marketing materials. And they pay very well for it. I get the feeling that provokes the ire of people who think those types of basic implementations belong to the traditional label of 'data analyst'.

For what it's worth, I work with data sets that aren't quite large enough to justify anything more than Python, Pandas, SKLearn, a Luigi pipeline, and PySpark. The vast majority of my time is spent cleaning the data and generating features, much less on the hyperparameter tuning and model training side itself.

Anyways, I think I'm rambling a bit.

I just want to say that I LOVE this job, whatever the label is, or whatever the hype surrounding the label is, and I hope it's around for a while before it's automated...

[+] cardine|9 years ago|reply
Even outside Silicon Valley this is very similar. When I post a programming job I get more applicants from data scientists than from actual programmers. When I post a marketing job I get more applicants from data scientists than from actual marketers. It is clear to me that supply has far outpaced demand.
[+] hellogoodbyeeee|9 years ago|reply
Do you have advice for someone in the second camp? I have a master's in econ and I'm comfortable in R but I don't have good development skills. I'm wondering if I need to give up on data science jobs for a bit and try and find an entry level software development job
[+] TXV|9 years ago|reply
I think what you say is easily applicable to software engineering in general. Data science maybe is a field that is even more negatively impacted by bad hires because the threshold after which you start adding value to the company is higher.
[+] hiddencost|9 years ago|reply
This is correct. I've worked in industry for about 3 years, much in a top industry lab that built a product you probably use. I've got a couple papers, including a really good one. I've worked as a software engineer. I hopped jobs twice in a relatively short frame, with a 20% and 40% raise each time. The market is on fire if you have software engineering competence and real industrial experience.
[+] rafinha|9 years ago|reply
Isn't it 2 profiles of engineers? Those who make production code and those who work in prototypes? One thing is to understand why your model isn't converging, another is how to scale it up...
[+] huac|9 years ago|reply
How do you test for programming ability in an interview? Especially for people without a lot of work experience (e.g. graduating from college)

One data engineering role asked me to implement k-means from scratch and one data science role asked me to do some algorithms whiteboarding. But beyond this, people just asked SQL. From the rest of this thread it doesn't look like people would consider that to be 'SWE ability'.
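
For readers wondering what the "k-means from scratch" exercise mentioned above might involve, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not taken from any actual interview; the function name `kmeans`, its parameters, and the sample points are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Returns (centroids, clusters). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    # Initialise centroids by picking k distinct input points at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Two well-separated toy blobs (made-up data).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, groups = kmeans(pts, k=2)
    print(sorted(centers))
```

An interviewer would likely follow up on the parts this sketch glosses over: smarter initialisation, handling of empty clusters, and vectorising the distance computation.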

[+] bjlkwpZxAygCIg|9 years ago|reply
There's always room for smart, hard working and creative thinkers in any field. Except maybe marine biology. But as in any lucrative field, e.g. law, you have a wide variety of capabilities and work quality. After 10 years in quantitative work (I got an MS in Applied Math before they were calling Statistics "Data Science") I can say that it's just a tough field to work in. Experience goes further than education, but education (yes, a real accredited graduate level education) is necessary. The most challenging aspect of DS isn't the technical aspects, it's being able to have a thick enough skin to not let the skeptical and reluctant engineering types upon whom you depend to implement your brilliant models get under your skin and enough patience to explain and convince the skeptical and not too savvy business types who cut your checks, that their intuition is wrong and your math is right. And then of course, there's the inevitable boredom that comes with solving yet another mundane business problem with the simplest and least sexy tools. I'm not complaining, just saying that almost all STEM graduates turned DS I've ever worked with have a small hollow spot in their soul that burns with passion for the astrophysics or theoretical math problems they traded in for the perk filled corporate life.

3 easy steps to get a job in DS if you want them though: Grad Degree in Math/Stats/CompSci; work on a bunch of hard to predict problems and then publish and present them to your local meetup community to gain experience; learn engineering tools and devops and be about 90% as good a software engineer as your team's actual engineers (git, hg, IDEs, java, pig)... your brilliant models are way less important than being able to help the already overwhelmed engineering team make them work.

[+] shas3|9 years ago|reply
Frankly, MOOC-type machine learning- being able to do plain vanilla logistic regression or black-box deep learning techniques is not enough to get a job. This knowledge and experience has to be paired with one or more strengths: excellence in programming, grad-level math/stat/numerical skills/theoretical machine learning, domain specific expertise or experience (e.g. vision, audio, natural language, networks, geophysics, biomed, finance, etc.), proven ability to learn and adapt very fast.
[+] LeanderK|9 years ago|reply
I am currently studying for a Bachelor's in CS and am interested in ML/DS. How can I make sure I fall in the first group? I try to do some side-projects to expand my skill set and practice software dev skills, but this whole thread seems pretty discouraging.
[+] hardtke|9 years ago|reply
I hire machine learning engineers and data scientists. In my opinion there is a great shortage of truly qualified machine learning engineers. A lot of people are entering the market with a general knowledge of machine learning tools. These people should be considered analysts or product data scientists. When it comes to people that can build machine learning systems that work at scale, they are very rarely available for hire and often are the subject of bidding wars by multiple companies. The key difference is whether the candidate truly understands the mathematical and statistical basis of machine learning, has the programming skills to execute their ideas, and is able to write code that can be used in large scale production systems and can be leveraged by others.
[+] user5994461|9 years ago|reply
And that's why I think finance is smarter.

They've long understood that there are the finance analysts on the one hand and the software devs on the other. They get both and make them work together.

Looking for 5 rare skills in a single person is bound to disappoint: maths, statistics, programming, large scale systems, production.

[+] runT1ME|9 years ago|reply
> The key difference is whether the candidate truly understands the mathematical and statistical basis of machine learning

Can you elaborate on this, and at what level? Are you talking about a PhD level of understanding of cutting edge mathematics, or do you mean understand the basics, or somewhere in between?

[+] notyourwork|9 years ago|reply
My colleague focused on machine learning for his phd and this is a very accurate description. Talking the talk can come with experience, being able to understand the math and execute a new system at scale is where the separation in experience will start to surface.
[+] IndianAstronaut|9 years ago|reply
Someone who can build models and someone who can scale it are 2 different people and professions. Data scientist vs data engineer.
[+] gech|9 years ago|reply
What's the time from hire to a built live production system you would expect a truly qualified engineer to be able to achieve?
[+] PLenz|9 years ago|reply
I've been working in a DS role for a few years now in NYC - and I definitely feel the role is more valued on the east coast over SV. SV has a focus on consumer facing applications that are in many ways fancy CRUD. DS roles have their place but aren't the core of the business. East coast has a b2b / infobroker focus where DS is the product. Media (especially adtech), finance, government consulting are over on this coast.

I think you also need to not confuse the growing ease of machine learning tools with the role becoming more accessible. There is a wide gap between tooling and knowledge to use those tools appropriately and creatively.

And may I never write another HN comment on my cell phone again.

[+] aub3bhat|9 years ago|reply
The Stack Overflow salary calculator shows a significant 50% premium over developer salaries, all other things being equal. [1] In my opinion the tool is flawed and significantly underestimates salaries in SV/NYC (Stack Overflow underpays), but it is still a good indicator.

The major issue is that Data Scientist is a very fuzzy term, applied to everyone from undergraduates with a stats degree to those with PhDs and papers at KDD/ICML/NIPS/CVPR.

However rather than doing a Frontend or Mobile developer coding bootcamp, a data science bootcamp is likely to lead to more transferable skills in case you wish to get an MBA etc.

[1] http://stackoverflow.com/company/salary/calculator?p=7&e=1&s...

[+] huac|9 years ago|reply
From my experience this year recruiting coming out of undergrad, for the top kids who do DS vs the top kids who do CS, the median comp is higher for DS but the highest comp packages come for CS. I wouldn't be surprised if this holds for more experienced people too.
[+] caminante|9 years ago|reply
Currently, Gartner analysts place ML at the "peak" of its Hype Cycle for Emerging Tech [0] with a runway of 2-5 years for mainstream adoption.

[0] http://www.gartner.com/newsroom/id/3412017

[+] BigJeffeRonaldo|9 years ago|reply
It was past the peak last year, i.e. machine learning somehow went backwards up the hype cycle over the past year according to Gartner. Maybe due to deep learning news stories over the past year+.

http://m.imgur.com/gallery/noBKI

[+] vtange|9 years ago|reply
The startup I work at really favors their data scientists, though I am not one of them (I'm a frontend guy). The CEO and CTO pretty much keep a personal eye on those guys' work.

Right now however the theme I've heard from the higher ups has been profitability, and this applies to all tech companies in general. Easy capital is gone and now companies are in the spotlight for not making profits.

So at least from my company's perspective, it's not that data science is saturated, it's that we're trying to not break the bank and hire too much.

[+] stared|9 years ago|reply
I have only anecdotal experience (I live in Warsaw, but do contracts mostly for Poland, UK and US).

General data science is in demand. I can get contracts easily, and I know that people looking for competent people need to wait; especially as it is a skill much harder to pick up than, say, front-end web dev (unless someone starts from a highly quantitative background like physics, modelling in biology, etc). My general impressions are:

- ML (especially the practical kind, like logistic regression and random forest) is often an integral part of many data analyses (or at least a plus),

- there are not as many jobs solely focused on ML; and if so, often they require some specialist expertise,

- and even less only for deep learning (also, for DL there is relatively high threshold for having skills at "hireable" level).

Some of my tips on how to learn data science: http://p.migdal.pl/2016/03/15/data-science-intro-for-math-ph... (on purpose I put the emphasis on general data exploration/analysis before machine learning).

[+] fnbr|9 years ago|reply
How do you find contracts? I'm interested in doing contract data science work, but I don't know how to start finding interested potential clients.
[+] user5994461|9 years ago|reply
Like about everything on HN... You're either in the Silicon Valley or it doesn't apply to you.

In my opinion, you could start by defining what is a data science, a quant, or a machine learning job. Because that's not clearly defined. It means different jobs to a lot of people, jobs that are all hard to learn and absolutely NOT interchangeable.

[+] lowglow|9 years ago|reply
We hire applied ML/AI specialists. For me it's not just an understanding of mathematical concepts, but also being able to apply new ideas to new problems.

This depends quite a bit on critical thinking, a good fundamental ability to analyze a problem and understand its parameters, then manage the logical operations required to deliver the feature and solve the problem.

As for why I think it's on HN every day: I also like to think of an innovation pipeline happening something like this:

     [---------explore------|----------exploit-----------]
  ,->developers -> engineers/scientists -> data scientists->--,
 /----------<----------------<--------------------<----------/
We're now in some sort of refinement cycle of innovation, where the current medium has been saturated on some level and there is a lot of push to mine value from the discoveries.
[+] solomatov|9 years ago|reply
It seems that you have it in the reverse order of what you intended. As far as I understand, data scientists should start the exploration and developers should finish the exploitation.
[+] wjn0|9 years ago|reply
I'm an undergrad at a big university known for CS in Canada. The CS program here has several possible 'focuses'; 4 of 9 are related to ML/AI directly (computer vision, NLP, AI, scientific computing). 2 others require AI/ML/NN courses.

The bias might stem from the fact that we have some huge names in AI doing research here, but the data points seem clear (we say undergraduate education is slow to catch on, right?): the topic as a whole isn't overrated.

However, there seems to be a lack of understanding by people working in tech of the differences (in uses, theory, implementation) between ML, AI, NN, DL, etc. This might stem from a lack of understanding of the foundations of these topics (ex: statistics, vector calculus) or simply because we can abstract a lot of this away (ex: TensorFlow).

[+] curiousgal|9 years ago|reply
>or simply because we can abstract a lot of this away (ex: TensorFlow).

That would work up to the point a better abstraction tool/framework comes along. I'd never try to build a career on a single framework, because frameworks come and go.

[+] spraak|9 years ago|reply
Is there some reason you'd rather write

> I'm an undergrad at a big university known for CS in Canada

than the actual name of the university?

> I'm an undergrad at ${name of university}

[+] TYPE_FASTER|9 years ago|reply
In my limited experience, there's a difference between a data scientist who can process data given data and a set of questions about it, and a data scientist who can figure out what data you need, and the questions that need to be answered.

I think making the transition from the first role to the second role comes with experience, both with the toolsets, and thinking about the problem as a whole.

[+] thearn4|9 years ago|reply
> data scientist who can figure out what data you need, and the questions that need to be answered.

Isn't that describing a statistician?

[+] simonhughes22|9 years ago|reply
I am the Chief Data Scientist of Dice.com. If you are interested in working as a junior Data Scientist, and are smart and hard working, please apply here: http://careeropportunities.dhigroupinc.com/. The position is a telecommute role. We will absolutely consider people with no data science experience, so long as they demonstrate an aptitude for data science \ machine learning and can code.
[+] platz|9 years ago|reply
I considered a graduate program in data science, but compared to average programmer salaries, it doesn't seem like data science pays all that much (excluding data science jobs for PhDs in Silicon Valley). It's more interesting than programming, but seems like a much tighter market with no discernible demand driving salaries up.
[+] freyr|9 years ago|reply
Interesting. I have a statistics/data science background, but personally I find programming much more satisfying.

Programming is a tool to create and synthesize. It leads to new products, companies, and solutions. Data science is analysis, not synthesis. You collect data, you interpret it, you move on to other data. Nothing gets created, which for me, is a deal breaker for job satisfaction.

[+] vogt|9 years ago|reply
I'm a designer but work for a data science company (LMI specifically). All of our data work is done in D, which I never even knew existed until I started working here.

I can't speak to anything regarding ML, but for whatever it's worth in our segment of the market we have seen a lot of competition emerge in a big way the last few years. Former academic-type firms who specialized in bespoke economy analysis reports are starting to build software around all of the data that is out there since it's never been easier to collect and normalize it. I think it's a stretch to say the market is approaching saturation for us, though.

[+] laughfactory|9 years ago|reply
As many have said here, and as a working and apparently in-demand data scientist, I agree that the tricky part about data science is that being effective isn't a matter of just any one thing. You have to be a unicorn of sorts who is, above all things, capable of solving any problem which comes your way. You have to be exceptionally flexible and very scrappy.

There are a lot of people who know more about modeling, software engineering, statistics, machine learning, analytics, and so on than I do. But I excel at bringing everything together and solving difficult business problems. It's really difficult to train someone to be this way. It takes a lot of time, experience, skills, and a unique disposition to be an effective data scientist. At least to be the kind of data scientist I am. And I'm still early in my career.

Just my two cents. I suspect there will continue to be a glut of people who, on paper, have the data science skills, but lack all the intangibles. Who knows, maybe the various programs and boot camps will start doing business scenario learning: here's a tough real world problem where we don't tell you how to solve it, but we desperately need you to figure it out. Go!

[+] apohn|9 years ago|reply
Background: I currently lead a Data Science team at a big non-tech company. Previous to this I worked at a software company that had a Data Science team in their customer facing consulting group.

I'm going to speak primarily about applied data science. This means a data scientist who is solving a business need by doing ad-hoc analysis or building a reusable solution (e.g. an R+Shiny dashboard) to a business problem.

Jobs: There are plenty of jobs out there, but you have to be careful. Many "Data Science" jobs are really BI, Business Analyst, or Sales Engineer types of jobs where some VP got it in their head that they need a Data Scientist. These jobs are great for people who are okay with Technology and Data Science being 10% of their job - and many people are like that. They don't care about engineering, coding, or tech and statistics beyond the minimum to do their jobs. But if you really want a job that involves solid tech and stats/ML skills you will be unsatisfied at these types of jobs.

Right now there are plenty of hard business problems that people want to turn into Data Science problems because they think it'll give them a competitive edge or something to market and show off. This results in more data science job openings. However, they are not really data science problems. As somebody else said, people will eventually realize they are not getting the value they need with data scientists doing these types of jobs. Then they'll replace that person with an MBA with some DS coursework (e.g. MBA who can use KNIME or SAS Enterprise Miner) or eliminate the position.

People: I interview people and I know people at other organizations who interview candidates for Data Science roles. MOOCs and many degree programs (including 2 year MS degrees) are pushing out people who have a very superficial overview of data science. Basically they teach them about every ML algorithm in the known universe and the functions to call them in R/Python/SAS. The end result is a mediocre coder or non-coder who boils everything down to a confusion matrix or root mean squared error. But they cannot actually think through a business problem or see why a low error doesn't equal a good model (see http://www.tylervigen.com/spurious-correlations)
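
One small way to see why a good-looking error number does not imply a good model is a class-imbalance example. This is a sketch with made-up data, not something from the comment above; the helper `confusion_matrix` and the 990/10 split are assumptions chosen for illustration.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical data: 990 negatives and 10 positives (e.g. rare fraud cases).
y_true = [0] * 990 + [1] * 10
# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

The always-negative model scores 99% accuracy while catching none of the positive cases, which is usually the only thing the business cares about. Which metric matters depends on the problem, and that judgment is the part a list of algorithms does not teach.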

Finding good people is hard and you have to be flexible to realize great people can come from different backgrounds.

[+] plafl|9 years ago|reply
I can speak for Spain, although I sometimes get calls from other European countries. Relative to the pathetic Spanish work market data science/machine learning is doing great. I think right now there is too much hype, which is going to stay for a few years. After that I suppose it won't be a hot thing but I don't think it's going to disappear. I hope I'm mistaken and we are really seeing some AI revolution, but after all my job is putting the trust on the data, and past data says fads come and go. If that happens I will keep with me the math, the statistics, any development skills I can learn meanwhile and of course the challenge of someday achieving true AI.
[+] DrNuke|9 years ago|reply
Worth a serious effort if you are going to use it originally in your own niche / industry, otherwise statistics will still help you more in any given market. So just learn statistics very very well and then ask again.
[+] manbilla|9 years ago|reply
I am currently an MIS graduate student with 3 years of SAP functional experience. After this boring stint and hearing the hype around Data Science, I decided to give it a try (Decent statistics and engineering skills but no coding expertise. I also finished MOOCs and am currently working on some small projects during the holidays). Considering average pay as a prominent factor, what is a better option - Learning extra SAP skills (HANA etc) and try for a job in SAP or diving into Data Science completely and try to start as an entry level Data Analyst.
[+] androck1|9 years ago|reply
Is there a market for competent developers without professional/academic experience in data science or machine learning? Perhaps just a MOOC or some Kaggle projects?
[+] manish_gill|9 years ago|reply
This is what I would like to know as well. I'm a professional dev competent with handling large scale systems. I try to learn ML on my own time but that's not quite as thorough as getting a dedicated degree. I can catch up with the grad students if I put in more time, but will an employer see it? Will they take a risk even if I haven't had enough projects under my belt?