Analysis of the data job market using HN job posts

121 points | usgroup | 2 years ago | emiruz.com

68 comments

[+] IKantRead|2 years ago|reply
With 10+ years in DS, I've always felt that the best DS were basically software engineers who knew math and were more interested in prototyping cool machine learning products than maintaining production infrastructure. Unfortunately this always accounted for a small fraction of the DS I interacted with.

The largest group of DS was non-ML/CS/Math PhDs who started panicking once they realized their future job prospects in academia were very slim, so they signed up for bootcamps and got jobs at places hiring DS by the hundreds. Many of the people in this latter group had no idea how to write Python outside of a notebook, generally just structured problems to fit into XGBoost, and when not doing that tried to squeeze resume-boosting complexity into any problem they could find. They also tended to have a hilariously poor understanding of creating business value.

Nearly everyone I know in the first group has switched back to just being an engineer of some sort, typically ML or AI engineer. I suspect the small set of talented people from the second group will end up in lesser paying product analytics type roles or closer to product management roles, while the majority that don't bring much to the table other than a PhD will be slowly attritioned out of the field as companies start looking for the value different skillsets bring to the table.

[+] apohn|2 years ago|reply
> With 10+ years in DS, I've always felt that the best DS were basically software engineers who knew math and were more interested in prototyping cool machine learning products than maintaining production infrastructure. Unfortunately this always accounted for a small fraction of the DS I interacted with.

I've been a DS for 10+ years, and I feel the exact opposite. The worst "Data Scientists" I've worked with are all ex Software Engineers who seem to assume that business problems are really computation problems. So they find convenient ways to ignore the human aspects (e.g. trying to figure out why the data is a mess) and gravitate to using more complex algorithms and breaking down the problem to an achievable programming pipeline that runs in production, but the results are of low value. But it looks awesome on a resume.

Are you right or am I right about SWEs turned DS? I have no idea. But one quality that IMHO is important is the interest in actually looking at data and asking questions, which is much rarer than most people realize.

[+] bootsmann|2 years ago|reply
> Many of the people in this latter group had no idea how to write Python outside of a notebook, generally just structured problems to fit into XGBoost, and when not doing that tried to squeeze resume-boosting complexity into any problem they could find.

To be fair, for 90% of business problems that require ML, I’d rather take the guy who throws XGBoost at everything instead of the one trying to be fancy with neural networks. You get an explainable output and good results without deep subject matter expertise that the person likely won’t have. It also runs at a fraction of the cost.

[+] Simon_O_Rourke|2 years ago|reply
With any ML/AI problem, based on long weary hours doing the grunt work, the vast majority of time spent will be getting the data into some useful format. It doesn't matter too much how fancy you can build your models if there's nothing to train it on, or worse still, unreliable or incorrect training data.

So for newly minted Math PhDs, sure go out and learn how to do some ML coding in notebooks, but if you can't get a decent dataset together to train it on it'll be all for nought. Anyone with AI/ML coding only, and no SQL, is a no hire in my book.

[+] onlyrealcuzzo|2 years ago|reply
> They also tended to have a hilariously poor understanding of creating business value.

Is this different than your average SWE?

[+] swyx|2 years ago|reply
> Nearly everyone I know in the first group has switched back to just being an engineer of some sort, typically ML or AI engineer.

@OP - mind rerunning this analysis for "AI Engineer" titles? https://www.latent.space/p/ai-engineer anecdotally i saw 8 of these in the last Who's Hiring and wanted to tease out the emerging difference between ML and AI Engineer

[+] clatan|2 years ago|reply
A good DS is one who can understand the problem and tackle it using data, not someone who knows engineering well.
[+] gsuuon|2 years ago|reply
> I argue from data that the Data Scientist role is poorly differentiated and I speculate that its responsibilities are being eroded by better specified roles such as ML Engineer and Data Scientist.

I'm already struggling to parse the first paragraph -- is this a typo? Or do they mean a role that is both ML Engineer and Data Scientist?

[+] nerdponx|2 years ago|reply
Not a typo, this is actually an industry trend.

Companies think they need a "data scientist" but they actually want a "software engineer with enough stats/data background to implement data science algorithms in production."

The result is that a lot of "data scientist" jobs are mostly data engineering or ML engineering and that is reflected in the list of requirements. It makes finding a good job extremely difficult, and it's for the better that the "data scientist" title is being eroded, because it makes it easier to tell when a "data scientist" job is actually data science as opposed to something else.

Not that "data science" is a good name anyway, but that's another story.

[+] its_a_random_ac|2 years ago|reply
Something I've run into is that a while ago, there were "Type A" ("Analysts") and "Type B" ("Builders") data scientists [1], but most job postings now are just looking for "Type A" data scientists, and the "Type B" openings have been renamed to "ML Engineer" or "Data Engineer."

Took me for a bit of a spin because when I was first interviewing this year, I'd apply for DS roles and only get interviews that were very stats heavy with a leetcode easy, but started getting further when I basically stopped applying for DS roles and went straight for MLE roles.

[1] https://medium.com/@rchang/my-two-year-journey-as-a-data-sci...

[+] elAhmo|2 years ago|reply
There seems to be a typo there. But in general, I understood the argument as different roles that have narrower and better scopes are causing less popularity of the data scientist role. Basically, DS was/is used as an umbrella term to cover many different things, and as companies are understanding the field better, they are moving towards more specific roles they need (such as ML engineer), rather than hiring for a data scientist role.
[+] tomrod|2 years ago|reply
Yeah, seems like the second "Data Scientist" mention should actually be Data Engineer.

Missing is the Data Analyst component, and (as is normally typical in the discussion) the statistical experimenter (A/B, MAB, etc.)

[+] digging|2 years ago|reply
Presumably one of those Data Scientist instances is meant to be Data Engineer?
[+] stevenae|2 years ago|reply
My hot take (as a DS of 12 years): data scientist was always an over-hyped title and led many to unrealistic expectations. I think we will see a rise of data-inflected product managers (this is what I am already seeing), as IMO data scientists are most effective at scoping problems and pioneering solutions, not scaling them.
[+] tomrod|2 years ago|reply
I don't think it's a hot take among us practitioners. I view it in terms of "how many capabilities are needed in the radar chart?"

Systematic MLOps helped to decrease _some_ of that, but not nearly enough, and certainly not with the recent explosion of LLM-induced hype.

I view MLOps engineers and ML engineers as tasked with scaling the problems, and research scientists as the scoping and pioneering of solutions. All three fall under the larger umbrella we call "Data Science" IMHO.

[+] SoftTalker|2 years ago|reply
I've always thought that the suffix "Scientist" on any title in a software company was likely more hype than reality. Unless the company is really doing science.
[+] Jeff_Brown|2 years ago|reply
As a data scientist I can testify that it's really two jobs -- 80% or more data engineering and 20% or less analysis. When the enterprise is small it's reasonable to want people who do both. Once it's big enough, though, specialization makes more sense -- you don't need all your data engineers to know how to draw conclusions from the data.

Moreover the people analyzing it don't need to be data scientists -- they can as easily be statisticians, economists, geneticists, etc.

[+] stanleydrew|2 years ago|reply
> you don't need all your data engineers to know how to draw conclusions from the data.

I'm not sure this is accurate. To the extent that a data project is underspecified (which, let's be honest, all projects are) then the engineers will end up making some decision somewhere that may have an impact on what's available for analysis.

If the engineers have some understanding of project motivation and hypotheses then they'll make better decisions.

[+] tomrod|2 years ago|reply
> Moreover the people analyzing it don't need to be data scientists -- they can as easily be statisticians, economists, geneticists, etc.

While data science is a newer academic field, most of its practitioners come from statistics, economics, genetics, etc.! (I say this as a 10-year data scientist who is an economist).

[+] runamuck|2 years ago|reply
Amazing article! I love the approach and description of data gathering, data prep and analysis.

I believe you should edit the text to read "Data Engineer" in the first numbered item: "I argue from data that the Data Scientist role is poorly differentiated and I speculate that its responsibilities are being eroded by better specified roles such as ML Engineer and Data [Engineer]."

[+] CoastalCoder|2 years ago|reply
> However, in so far as HN is an avant-garde community, its adoption of tech and practices likely foreshadow adoption writ large.

I'd love to see an analysis on this thesis!

Or more generally, how various sources of new-hire job descriptions correlate with each other: HN Who's Hiring and other job-advertisement boards; LinkedIn profiles; etc.

And same thing for programming languages: appearance in job postings vs. TIOBE index vs. ...
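One way such a comparison could be sketched: take per-month counts of postings mentioning a title from two sources and compute their correlation. This is only a minimal illustration; the two count series below are made-up placeholders, not real data from HN or any other board, and `pearson` is a hand-rolled helper rather than anything from the article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder monthly counts of postings mentioning one title (NOT real data).
hn_counts = [12, 15, 11, 9, 7, 8]
other_board_counts = [240, 310, 225, 190, 150, 160]

r = pearson(hn_counts, other_board_counts)
print(f"Pearson r between the two sources: {r:.3f}")
```

A high correlation would only say the two sources move together; testing whether HN actually leads the wider market would need a lagged comparison on real data.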

[+] agnosticmantis|2 years ago|reply
I’d say it depends on what “writ large” here refers to. Is it “the larger population of Tech startups” or “tech companies in general”?

If the former, then maybe the thesis holds true, but data scientists in more established companies do vastly different things than in startups.

It’d be good to extend the analysis by accounting for company size.

[+] sails|2 years ago|reply
It would be useful to give more clarity around your thoughts on the "Data Engineer" role. Is it also in decline? Is the market as a whole in a relative decline?

(I wrote a blog [1] making a call that Data Engineering was likely also to see something of a relative demand decline, or be better defined into Software Engineer and Analytics Engineer, so I am quite interested in your analysis here)

[1] https://groupby1.substack.com/p/data-engineering

> Most businesses' data engineering needs have been solved or will shortly be solved by managed services that 10 years ago would require endless and extensive self-built ETL pipelines, databases and tools. For the exceeding majority of businesses, this means they can and should focus on building capacity for business logic, analysis and predictions instead of data engineering.

[+] maxFlow|2 years ago|reply
> It would be useful to give more clarity around your thoughts on the "Data Engineer" role. Is it also in decline? Is the market as a whole in a relative decline?

My thoughts: the tech job market as a whole has been in decline, as unanimously observed. There may be some signs that the slowdown is abating though. Next 6-12 months will be key to see how DS and DE rebound (or not).

> Most businesses' data engineering needs have been solved or will shortly be solved by managed services that 10 years ago would require endless and extensive self-built ETL pipelines, databases and tools. For the exceeding majority of businesses, this means they can and should focus on building capacity for business logic, analysis and predictions instead of data engineering.

Could not disagree more with your take of "DE demand will decline due to DE needs being already solved for most businesses". Apologies, but have you ever worked as a data engineer or even close to one? Pipelines break, requirements change, businesses expand, and infrastructure needs to be managed and optimized, etc. ETL processes, in the wild, are decidedly not one-off affairs.

[+] qqqwerty|2 years ago|reply
> Most businesses' data engineering needs have been solved or will shortly be solved by managed services that 10 years ago would require endless and extensive self-built ETL pipelines, databases and tools

A lot of what modern data engineering has turned into is connecting various tools and software together. So adding another managed service doesn't feel like it is going to magically solve the problem. It is going to be just one more tool that the DEs will be managing. And indeed that has been my experience. For every tool that we added to the stack, we ended up spending just as much time fighting the tool as we did maintaining the self-built solution that the tool replaced. The two main advantages of using tools over DIY solutions are that they have an opinionated way of doing things, and they usually come with extensive documentation. So onboarding a new team member is easier. But engineering hours saved is pretty much a wash compared to DIY once you hit that first edge case that the tool does not handle elegantly.

[+] giantg2|2 years ago|reply
Visualization will never be out of favor. Management eats up fancy charts. It's unlikely they will have dedicated roles for that. It'll just get lumped in other stuff.

Edit: why disagree?

[+] apohn|2 years ago|reply
While I generally agree, I think there's a point in these 2 statements that can easily be misinterpreted.

>It is likely that the Data Scientist role is in a long term decline...

Also

> Data science is in decline and vaguely defined

Reading this, you can think that "Data Science" jobs are decreasing. But I don't think that's true.

Let's just say that it's 2017 and I hire a team of 3 people with the job title of Data Scientist. One ends up focusing on the data side, one on modeling+analysis, and one on building the infrastructure. In 2023, I decide to change the job titles so one of them is now a Data Engineer, one is now a Data Scientist, and one is now an ML Engineer to match what is happening in the job market.

It's still 3 jobs with 3 people doing the same thing. So the number of jobs isn't decreasing, but their titles are more specific. Overall, the number of "Data Science" jobs is still going up.
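The arithmetic behind that example can be sketched in a few lines. The team and titles are the hypothetical ones from this comment, not data from the article; `title_share` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical team from the example: the same three people, relabelled titles.
titles_2017 = ["Data Scientist", "Data Scientist", "Data Scientist"]
titles_2023 = ["Data Engineer", "Data Scientist", "ML Engineer"]


def title_share(titles, title):
    """Fraction of roles that carry the given title."""
    return Counter(titles)[title] / len(titles)


total_2017, total_2023 = len(titles_2017), len(titles_2023)
share_2017 = title_share(titles_2017, "Data Scientist")
share_2023 = title_share(titles_2023, "Data Scientist")

# Headcount is unchanged, but the share of the "Data Scientist" title drops.
print(f"jobs: {total_2017} -> {total_2023}")
print(f"'Data Scientist' share: {share_2017:.2f} -> {share_2023:.2f}")
```

A title-share metric falls here even though the total number of jobs stays flat, which is why the two readings are easy to confuse.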

Somebody will say "But that's exactly what the author said." But I think people who are new(ish) to this field might read it as "Data Science Jobs are decreasing." So I'm making this comment.

> skills such as data mining and visualisation are also out of favour.

Honestly, I just don't believe this. It's possible that as job descriptions are filled with different buzzwords, people just leave these out. For visualization it's also possible that there is a bigger focus on keywords of an established BI tool (e.g. PowerBI) instead of ad-hoc charts in matplotlib or ggplot. But some degree of data mining and visualization is useful, even to Data Engineers.

[+] Ilasky|2 years ago|reply
This is some awesome analysis - great job! And I’ve seen it first-hand from my own job hunt with different success looking for data scientist vs ML/AI engineer positions.

I think it really comes down to a lot of marketing, which you touch upon a bit. AI is in a hype cycle right now and people want it on their products and in their companies, so they want people that are capable of bringing those skills to the table.

[+] bityard|2 years ago|reply
> I have worked in “big data”, “data science” or something adjacent for around 12 years, and in that time I have observed these fields (and their associated roles) change a lot. I had never thought about it much because it was never very difficult to find work, however, recent times have been a bit different because my neck of the woods

Nah, recent times have been "different" for everyone. If the author entered the job market in 2011, then this is the first economic recession they have seen. Economic downturns happen roughly every 10-12 years or so and generally cause a fair bit of turmoil.

Forest fires are a necessary part of a healthy natural wooded ecosystem. I tend to think of economic downturns the same way. Every once in a while, companies (or entire markets) have to look carefully at what really adds value to their businesses and figure out how to focus on that when cash flow dwindles and investors clam up. If a business doesn't survive a recession, then it was on shaky ground well before the economy went south.

[+] usgroup|2 years ago|reply
see the discussion section -- it is apparent that data scientist roles are in decline in isolation.
[+] 71a54xd|2 years ago|reply
I think it's clear the market is... "struggling" unless you're a senior dev who has 8+ years exp.
[+] revlolz|2 years ago|reply
Perhaps, I think there are a couple of things contributing to the declines represented since mid 2022. I believe them to be economic conditions. My outlook is quite the opposite as I think data roles will continue to grow in demand over time for the next 3 years.

The overall economy is not doing well right now, at least in the USA. There is a significant amount of extra talent (supply) on the market due to big tech hiring freezes and layoffs, which also contributes to lower demand for new roles. There are the return-to-office debacles, all while housing is becoming even more unaffordable due to high interest rates (compared to the recent 2 to 3% during covid) and low supply, itself a consequence of the rates: owners aren't going to want to trade a 2 to 4% rate for 7+, nearly 8%, right now. I don't cite the RTO debacles to start a debate on whether RTO is good or bad, but I would speculate that it's pushing more talent onto the job market to escape working environments forcing any style (RTO or forced remote) that an employee disagrees with.

So, in my mind the only thing my speculation doesn't really cover is how that would contribute to lower HN responses to which I don't have an answer, maybe it truly is shrinking. However, my gut says it's the economic factors. I think (and hope) that the shift in conditions occurs in the next 6 months and hiring ramps back up as companies recover and adapt.

[+] PartiallyTyped|2 years ago|reply
There have been comments here from engineers with 8+ years of experience who are struggling. I guess that’s survivorship bias; but I don’t think it’s a guarantee.
[+] robertlagrant|2 years ago|reply
This is only helpful as context, but you need to have had quite a few years' experience to remember the time before the era of cheap money and massive FAANG-inflated salaries. Now that FAANG is getting significantly regulated and fined, and money is more expensive, things will no doubt start to cool off from a salary perspective.
[+] rpastuszak|2 years ago|reply
It's much easier to find any job as a senior dev.

It's almost impossible to find a genuinely useful job there, regardless of experience.

[+] OnlyMortal|2 years ago|reply
Huh. I’d never even registered the “Jobs” link on the home page title bar until I saw this post.

In recent years, I’ve always gone via agents who contacted me on LinkedIn - which has become something like a naffer Facebook.

Perhaps I ought to pay a little more attention in future.

[+] epups|2 years ago|reply
I think the work that needs to be done in the field of Data Science has not changed fundamentally, and simply varies from one organization to another on the specifics. As a poster above said, a Data Scientist can easily expect to spend most of their time doing data engineering at any point in time. But while "Data Science" was a big title and commanded high salaries before, now titles involving AI or Machine Learning are getting paid more, so specialists tend to adopt them to differentiate themselves.
[+] bee_rider|2 years ago|reply
What does a data scientist do?

The description given here (data mining and visualization), and the fact that for a long time this was (I’m pretty sure) advertised as a bootcamp-appropriate sort of role, seem to indicate that maybe this is not a role in and of itself?

A little coding and the ability to think about data seems like a generally useful add-on skill for most roles? Maybe we’re seeing an unsatisfied need for, like, office workers with some technical proficiency?

[+] nerdponx|2 years ago|reply
"Data scientist" properly ought to be something like "statistician" or "predictive modeler" in most orgs.

I think the need for a broader umbrella term is still present and I think that explains the wide adoption of the word "data science" in the first place. But the current meaning has been stretched way too far.

> Maybe we’re seeing an unsatisfied need for, like, office workers with some technical proficiency?

Specifically, data analysts who also know Python.

[+] VectorLock|2 years ago|reply
Most "Data Scientists" I've encountered are pretty much doing basic software engineering ETL projects.
[+] tomrod|2 years ago|reply
I'd love to see a few other items:

1. Is this a true decline, or in line with general tightening of the tech economy?

2. Where are the "analysis" conventions going generally -- HN is going to be a weird subsample of the economy as a whole, given that BI, DA, BA roles still exist and overlap with DS -- on top of that, many industries still haven't adopted Research Scientists, MLE, MLOps Eng, etc. into their lexicon of roles

[+] usgroup|2 years ago|reply
see the discussion section. author argues that it is a true decline because other roles (data engineer, ml engineer and data analyst) are either keeping or gaining share.
[+] Oras|2 years ago|reply
Nice to see this.

A few months ago I created a platform to analyze jobs based on Google for Jobs data in real time.

You can search by job title and location, and it will give an indication about the job market.

I did it as I wanted to understand which publishers are appearing more on Google For Jobs, and how many jobs are remote in certain locations.

https://rta.jobdescription.ai

[+] PLenz|2 years ago|reply
DS was always an overloaded title - speciation into various other titles is ultimately good and indicates a healthy and maturing ecosystem. You still need DS, though: in the multi-armed bandit that is your organization, your real DS are your explore function - they figure out what to do. The other roles are exploit - they do it.
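For readers who don't know the bandit analogy, here is a minimal epsilon-greedy sketch of the explore/exploit split. It is a textbook toy, not a model of any real organization: the arm payoff probabilities, `epsilon`, and the function name are all illustrative assumptions.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return pull counts and estimates."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to learn what might work.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: use the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Hypothetical payoff probabilities for three "projects".
counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated payoffs:", [round(e, 2) for e in estimates])
```

With `epsilon=0` the loop never explores and can lock onto a mediocre arm, which is roughly the point being made about keeping some exploratory DS capacity.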
[+] nerdponx|2 years ago|reply
I think this sells the position short. Data analysts explore; data scientists are supposed to have enough skill and expertise to actually make something out of what they find. That might be an XGBoost model to deliver a monthly forecast, or it might be setting up an automated decision process.

However, where I draw the line (and where I think most data scientists should draw the line) is actually putting that stuff into production code. Maybe they're good enough to write the prototype, but you need somebody else on hand to help with test coverage, make sure it meets performance requirements, triage bug reports, etc. If you make your data scientist responsible for that, they are going to spend all of their time doing that, instead of doing the things that they are actually trained to do and that you are paying them to do. This is true even if they are a perfectly competent software developer.

[+] catsarebetter|2 years ago|reply
This is a great article, love the deep breakdown and use of the charts