
Ask HN: Is it a waste of time to teach yourself data science without a degree?

182 points| thewarrior | 9 years ago | reply

I see a lot of people teaching themselves data science and machine learning but it seems that in the real world you won't be allowed anywhere near such a position without having a degree in the subject.

This is opposed to regular programming gigs where you can get work based on a portfolio.

Also there are efforts to commoditize common methods and algorithms by wrapping them up in APIs and SDKs.

So is it a waste of time to learn it on the side in the hope of getting a data science job?

138 comments

[+] itamarst|9 years ago|reply
(copied from answer to another similar question.)

Companies are looking for what you as a candidate can do for them.

Self-study or taking a class signals some level of "I tried to learn this thing." So that's a start.

Even better is "I built X", where X is obviously based on skill you learned. In which case you can omit the class because you have proof of learning, not just trying to learn.

Even better is "I provided business value V to my employer by building X." Because now you're showing how this skill is useful to someone else. So using skill at work is another thing to try.

Ideal is you write the above, but emphasize V (or choose between multiple things you can list) in a way that suggests you can help the needs of the particular company you're applying to.

So there's having the skill (which is good), but there's also how you present it to show it will provide value (also important).

More on the contrast between having engineering skills and marketing yourself here: https://codewithoutrules.com/2017/01/19/specialist-vs-genera...

[+] thro1111111|9 years ago|reply
While this sounds rational, I don't think that's how it works in most cases. Those who do the hiring usually prefer covering themselves by hiring someone with a degree rather than having to explain that they hired you because you made millions for someone else in the past. Also, while we understand the hacker culture, many only understand "degree > job"; they can't comprehend why you don't have a degree if you are so good.
[+] rdudek|9 years ago|reply
This is great, however, the biggest roadblock might be the automated HR application scanning. As a victim of this, I'm back in school to finish my degree that I started 17 years ago.
[+] usgroup|9 years ago|reply
I think with DS more often than not you're being hired into an existing operation rather than being the first hire. Your hiring manager will likely be a DS guy with a higher degree who'll hardly believe that his PhD was a waste of time. Like finds like, people like me, and all that ...
[+] imh|9 years ago|reply
I have this same non-background and work on the proverbial team of mostly PhDs. Short answer is yes, you can do it. Long answer is that you have to be really, really good to compensate, and getting to that point is absolutely exhausting. It's not about just going through a couple of ML courses on Coursera. You need to understand statistics, CS, and ML at a really deep level, and that means being good at applied math too. I was lucky to come out of physics and have a solid applied math background anyway, giving me a few years' head start on that self-study.

If you need structure to go through a few years of coursework on your own, you should go for the degree. If you just want to learn how to put pieces together and not learn how/why they work under the hood, you should opt for something else.

As with most questions about going nontraditional routes, you have to be really good to compensate, and getting really good is constant exhausting work.

[+] endymi0n|9 years ago|reply
Don't have a degree myself and about a third of the people I hire also don't have one. Why? Because I don't give anything about them.

I'd say if you don't want to work for a large, respected company first, it's a waste of time. Your degree is your entry ticket to your first job, not more. Later on, you can even work at Google if you want - just make a great product and get acquihired.

Three tips on what you should do instead:

1) BUILD something and show off your skills. Like, continuously. Always have your own challenges, do something about them, put your code online on Github. Host it so it can be seen and played with. Work towards a goal and learn what you need to learn on the side.

2) Focus on applying to companies not listing a degree in their job ad. You'll see there are quite a lot of them.

3) Don't focus on your lack of a degree in any interviews. Don't deny it, but just don't make it seem like a big deal. Oftentimes, people won't even ask.

[+] cr0sh|9 years ago|reply
> I'd say if you don't want to work for a large, respected company first, it's a waste of time.

I wouldn't necessarily say that.

I don't have a degree (well, an Associates that ain't worth much). I've been employed as a software developer for 25+ years now (since I was 18 years old).

I wish I had pursued a degree, though.

Back then, when it came to my education, I was pretty lazy - at least when it came to more structured learning. I liked to pursue stuff on my own, though, at my own pace. I've done well in that manner.

In 2011, I "discovered" the idea of a MOOC: I took Andrew Ng's "ML Class" - and successfully completed it. That led me to Udacity's CS373 course in 2012 - which I also completed successfully.

That isn't to say I didn't struggle with both of those: I had no experience with probabilities and stats, and I hadn't touched linear algebra since high school. But with the help of resources on the internet and elsewhere (along with help from others on the internet, and fellow classmates), I managed to complete both successfully, and I learned a lot in the process.

Last year, I started Udacity's Self-Driving Car Engineer Nanodegree. Today, I'm working on term 2. We're dealing with localization - basically learning SLAM, which was covered in the CS373 course, too. Prior to that, we learned about how Kalman filters (standard, EKF, and UKF) all worked to integrate sensor data. In the first term, as part of one of the projects I implemented NVidia's End-to-End CNN to drive a virtual car around a track.

All of these experiences, and others outside of all this, have taught me that perhaps I sold myself short by not pursuing a degree when I was younger. My current plan is that once I finish this Udacity course, I'm going to get my BS online, then work toward an MS in comp sci. It isn't a matter of "I think I can do it" - I know I can do it. It's more a matter of absolutely proving it, and likely learning a lot more along the way.

I don't think a degree is a waste of time, unless you intend never learning more stuff as you "grow older". If your only goal is to "make money" and all that, maybe it is. To me, though, had I gotten my degree back then, I believe I would be much, much further along today. I can't change that, though - so all I can do is move forward.

[+] quadrature|9 years ago|reply
A good programmer with even just a high-level overview of ML and stats concepts would be an incredibly valuable asset to a data science team. Most ML people are academics who tend not to have good software engineering skills; finding people who have mastered both domains is really hard.

Also, to add to that, most of the work in ML is feature engineering, data cleaning, testing and building pipelines, which all require a good software engineering background.

[+] carlmcqueen|9 years ago|reply
I work for a technical statistical team in the financial world. We've been hiring PhDs onto my team lately, which has meant exactly what this comment points out.

I do a lot of the grunt work of getting the data sourced, cleaned and ready and am called the 'data wizard' and other such annoying names.

What's frustrating is I can run the last lines of code and read and understand the output of the last step, but as the original question asked, management would prefer someone with a PhD or master's in customer analytics to be the expert on the data output.

[+] jcadam|9 years ago|reply
I'm currently working through Andrew Ng's ML course on Coursera. It's definitely high-level (though a fantastic overview of the fundamentals), and I plan to take something a little more mathematically rigorous at some point, but I'll probably want to take some refresher courses in Linear Algebra and calc before doing so.

I'm not trying to learn about ML for purposes of employment. It's somewhat relevant to my current job, and I may have some interest in using it on my personal projects. But mostly I'm just learning for the 'fun' of it :)

I don't have the time, money, or inclination to pursue a MS in data science atm (My current 'formal' education consists of a BS in Comp Sci and an MBA), but I may go back to school when the kids are grown, more for personal edification than anything else, however. A big shift in career, from software engineer to 'data scientist (or whatever they call it)' is probably not possible at my advanced age (37).

[+] Helmet|9 years ago|reply
Really? I don't know if I'm good, but I think I'm at least a "decent" programmer and I have a solid grasp of ML and Stats concepts and I've had absolutely no luck getting interviews, much less call backs for data science jobs.
[+] jorgemf|9 years ago|reply
You can get a job with a portfolio in data science. Just go to kaggle and beat everybody in all competitions. That is worth more than a degree. Companies will try to reach you if you can do it.

But, honestly, I think it is very difficult to learn data science by yourself. Someone with experience teaching you will make a huge difference. Data science is different from programming: in programming you can see step by step what is happening, while in data science most of the time it either works or it doesn't, and you only know after your algorithm has run through all the data for at least an hour. It is really hard to learn this way; you need hints that only someone with experience can provide. Moreover, you can make a lot of mistakes without knowing it. For example, when cleaning the dataset, people use the whole dataset to fill gaps and then split it into training and test sets. It feels right, but it is a huge mistake that invalidates the whole experiment (because you use information from the test set in the training set to fill the gaps).
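
To make the gap-filling mistake above concrete, here is a minimal sketch in plain Python. It assumes mean imputation and a fixed, unshuffled split; the function names (observed_mean, impute, leaky_pipeline, clean_pipeline) and the toy numbers are hypothetical, chosen only for illustration.

```python
import statistics


def observed_mean(values):
    """Mean of the entries that are not missing (None marks a gap)."""
    return statistics.mean([v for v in values if v is not None])


def impute(values, fill):
    """Return a copy of values with every gap replaced by fill."""
    return [fill if v is None else v for v in values]


def leaky_pipeline(values, n_train):
    """The mistake: fill gaps using ALL rows, then split."""
    fill = observed_mean(values)  # test rows influence this number
    filled = impute(values, fill)
    return filled[:n_train], filled[n_train:]


def clean_pipeline(values, n_train):
    """The fix: split first, then fill gaps using training rows only."""
    train, test = values[:n_train], values[n_train:]
    fill = observed_mean(train)  # computed without looking at test rows
    # The same training-derived fill value is reused for the test rows.
    return impute(train, fill), impute(test, fill)


# Toy data: the first three rows are training, the last two are test.
data = [1.0, None, 3.0, 100.0, None]

leaky_train, leaky_test = leaky_pipeline(data, n_train=3)
clean_train, clean_test = clean_pipeline(data, n_train=3)

print("leaky train:", leaky_train)
print("clean train:", clean_train)
```

In the leaky version, the gap in the training rows is filled with a value pulled upward by the 100.0 that lives in the test rows, so the training set already "knows" something about the test set. In the clean version, the fill value comes from the training rows alone.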

[+] NumberCruncher|9 years ago|reply
It depends on how you define "data science".

If you are like AWS and say that using logistic regression is machine learning, then yes, you can teach yourself data science. Learn SQL, read a couple of books on logistic regression, use some open data for building a couple of models. There are many companies where you can have a decent job and an easy living with SQL and logistic regression on your tool belt.

If you say that data science starts with automating stock trading or building the intelligence of self-driving cars, then no, you cannot teach yourself data science. You will need at least one degree. Or more.

[+] alkonaut|9 years ago|reply
> It depends on how you define "data science".

I think the widely accepted definition is "Statistics, but on a Mac"

[+] jtcond13|9 years ago|reply
+1 to this. SQL + Logistic regression creates millions of dollars in business value every year. Some of that could be yours!
[+] monster_group|9 years ago|reply
"If you say that data science starts with automating stock trading or building the intelligence of self driving cars, than no, you can not teach yourself data science. You will need at least one degree. Or more.".

Isn't it a little presumptive telling others they can't teach themselves something? Or do you mean to imply that they can't get a job without a degree? Those are different things.

[+] StavrosK|9 years ago|reply
> no, you can not teach yourself data science. You will need at least one degree. Or more.

Why not? What is it that prevents anyone from learning anything without getting a degree? I disagree with your statement, I think it might be harder, but I don't think anyone "cannot teach themselves X".

[+] CCing|9 years ago|reply
Self-driving cars like comma.ai, done by George Hotz (a college dropout)?

(And it's one of the best pieces of self-driving software out there.)

Of course you need to study (a lot), but a degree is not required.

[+] dagw|9 years ago|reply
None of the data scientists I know actually have a degree in data science. They tend to come from a physics, math or statistics background and have picked up the data science bits on the side.

Also, many jobs that aren't data science jobs per se offer many opportunities to do data science type things. Get a job at a company that works with a type of data you find interesting, and that perhaps doesn't have a dedicated in-house data scientist, and every time an interesting data-related challenge shows up just go "I have a good idea on how we can approach this" (assuming you actually do). Next thing you know, people will be coming to you with their data science problems, and before you know it you have several years of data science experience on your CV.

[+] ChemicalWarfare|9 years ago|reply
>> in the real world you won't be allowed anywhere near such a position without having a degree ...

yes, most likely they won't hire you for a "Data Scientist" position, but there are related jobs out there you can be qualified for if you have programming skills and understand DS stuff to some degree.

I've seen setups where a PhD with a "scientist" in his title would act as an architect/co-team lead with a senior engineer running a team of developers.

Someone has to implement DS' ideas after all and unless we're talking a really small team (or a jack of all trades DS) where DS has to write all the code himself - there is a need for developers with "some DS background" in those situations.

[+] jey|9 years ago|reply
> it seems that in the real world you won't be allowed anywhere near such a position without having a degree in the subject.

I don't have a degree but work as a data scientist at a research institution. I'm self-taught and was originally hired as a software engineer on the basis of my projects and work experience.

It's true that you have to convincingly make the case for your competence, but a bachelor's degree is really at best a certificate of minimal competency in a subject. Its signalling[1] value quickly gets swamped out by actual work experience where you're continually learning and improving. So there's a great hack: just do actual good work and put it on your resume. Your portfolio of work should convey your competence so well that having a degree wouldn't really add anything. (So you can skip the degree, but you'll still have to put in the work.)

Remember that any healthy organization wants to hire for competence at job duties. If some company rejects you for not having a degree because the hiring manager has to cover their ass to upper management instead of optimizing for getting work done, you should really just be glad that you dodged a bullet.

I think what's most important is to keep growing and learning. Pg had it right: "If you're worried that your current job is rotting your brain, it probably is"[2].

1. https://en.wikipedia.org/wiki/Signalling_(economics)

2. http://www.paulgraham.com/gh.html

[+] jtcond13|9 years ago|reply
Writing a full reply since I don't agree with much of the advice given.

I've worked around/in data science teams at a large BigCo and I think that you're far overestimating the bar here. There aren't enough people who can write data pipeline code (SQL/Shell/etc.), much less implement and intelligently explain statistical/ML models. Also, the average decision maker here does not understand the difference between 'created model in Pandas' and 'created model with Amazon's ML API'.

The modal background of data scientists in industry is closer to 'Econ BA + knows Python' than 'Artificial Intelligence PhD'. Moreover, the former will still enjoy a remunerative career if (s)he's sufficiently savvy about identifying problems and showing off how they can be solved with technology.

There may be a point in time when companies can't get a return by throwing math-savvy programmers at a problem, but that will be long after you and I have passed from the scene.

[+] nilkn|9 years ago|reply
I don't think it's a waste of time. Even if you can't straight-up get a pure data science job, you can still benefit from having this background:

(1) You could focus on building data processing platforms using, e.g., Spark. This will get you very close to the data science folks and you could probably end up doing some interdisciplinary work if you wanted it and demonstrated enough interest and competence. At the very least, people who can build highly scalable data processing systems and who also have a reasonable understanding of how the data is being used are very valuable.

(2) There are lots of companies out there that don't engage in data science/machine learning at all. You could join such a company and represent the push towards developing a data science or ML division or team. If you're successful this could also get you major credit as a manager as well as putting you very close to real-world data scientists and ML projects.

[+] EternalData|9 years ago|reply
A lot of employers still use degrees as a rough proxy for ability and dedication. This may be especially prevalent in data science since the field itself tends to have a lot of Masters/PhDs occupying the field -- which will tend to bias the hiring process towards viewing degrees as a strong positive signal.

With that said, a lot of companies hiring for data science roles fall into the category of software startups -- larger companies like Google or Facebook are looking for specialists who tend to hold degrees. But at smaller companies, you can be more of a generalist and there, the old mantra of "show me what you've built" often applies. You could build out a data science career if you found just the right company.

By no means is it easy, but I wouldn't say it's a waste of your time (unless you have some incredible opportunity cost you're using up).

If you were to go about doing it, I found this blog post that can help you with your plan of attack: https://www.springboard.com/blog/learn-data-science-without-...

[+] randcraw|9 years ago|reply
To hit a target, first you have to see it clearly. The term "Data Science" covers a broad collection of jobs, from statistician to machine learning/pattern recognition/AI expert to DBA to business analyst to visualization/animation expert to cloud/cluster/Hadoop expert to general data wrangler.

The skills required for each DS role vary a lot. I wouldn't expect a cloud expert to have learned about the Hadoop stack or HPC workflows in school, at least not to a useful degree. The same goes for DBA or business analyst or data wrangler.

But statistics and ML lie at the other end of the spectrum. These roles require a hierarchy of formal skills that are rarely mastered outside of college. They're expected to keep up with the research literature or formal techniques, which almost always requires the math skills of an engineer or mathematician.

Remember, HR everywhere is technically clueless. If management doesn't tell them the precise set of skills needed for the job, they'll minimize risk and ask for more expertise and experience than is needed -- usually in the form of excess degrees or prestige or buzzwords. The best cure for this is to bypass HR and go straight to a technical manager who knows what s/he wants. That's hardest at large corporations, who tend to outsource their HR needs to the lowest bidder.

At a smaller company, a lack of degree will matter less. If you can convince them you know what they need RIGHT NOW and can learn future material quickly, that's what they want to hear. (That's probably what the bosses of the startup did).

Or if you're targeting a specific project, then if you can show (e.g. via Kaggle or an online portfolio) that you clearly have the needed skills and you're not just a script kiddie, that speaks a lot louder than a mere degree (especially if it's over a decade old).

[+] daliwali|9 years ago|reply
I hold a degree in mathematics. Small-minded HR drones have told me I'm not qualified to do programming since I'm not formally trained in computer science. I have been doing this since I was a kid.

Don't listen to them. Every professional will at some point in their career be judged by those less capable.

[+] eljefe6a|9 years ago|reply
I teach data engineering and data science. I've taught at hundreds of companies. Yes, there are self-taught people doing data science in the real world. They're few and far between, but they are out there.

If you're coming from a programming background, I'd suggest becoming a Data Engineer with the goal of becoming a Data Scientist. I've had several students do that. They were general programmers who learned Big Data/data engineering and eventually became more technical Data Scientists. You can start to learn more about the whys here: http://www.jesse-anderson.com/2017/03/what-happens-when-you-....

[+] traviswingo|9 years ago|reply
Teaching yourself anything is definitely not a waste of time.

Don't get so caught up in the "degree."

I've met individuals with graduate degrees in computer science (I know OP asked about data science, but the overall point here applies to any field) who couldn't hold a candle to self-taught developers. If you're actually passionate about and interested in something, you will become extremely well-versed in it. On the other hand, if you're not excited about data science, a degree will probably benefit you more than going without one, since it will force you to learn the topic.

In a nutshell, it's up to you to make yourself valuable and present that value to the world - a degree is just a shortcut for recruiters to filter on, but you can skip recruiters and talk to anyone in any company.

[+] intellectronica|9 years ago|reply
My experience has been that when it comes to the job market _knowing_ stuff is extremely valuable, but _having learnt_ stuff isn't very valuable, unless you have an excellent degree from a top tier university. What this implies is that you should select online study options based on how they contribute to your actual knowledge, rather than how they will appear to employers (in most cases, they will appear like nothing). Once you know enough, build a portfolio of projects to show what you know and look for a job based on that - if you really know how to get stuff done in the field you'll have many options to choose from.
[+] dpflan|9 years ago|reply
Does anyone have experience with this scenario and actually completed Udacity nanodegrees for Machine Learning or Data Science or AI?

Their programs express job placement as a perk of graduation.

https://www.udacity.com/nanodegree

Educating for the "jobs of the future" is one of Udacity's goals, data scientist being one of those jobs.

[+] Raf_|9 years ago|reply
I'm half-way through their Deep Learning Foundations nanodegree and I'm generally happy with it.

Note that only selected nanodegrees come with the job placement guarantee, and that the guarantee seems to essentially mean a refund, if you fail to find a job within 6 months. https://www.udacity.com/nanodegree/plus

As a sidenote - the deepest (meta) learning I've gotten is that paying for the course made me much more engaged and determined to invest time in understanding the material and completing assignments.