Don't Assume Your Data Scientist Is a Software Engineer. A Thread:

43 points| dynamicwebpaige | 8 years ago |twitter.com | reply

16 comments

[+] Xcelerate|8 years ago|reply
I started working as a data scientist 9 months ago, coming primarily from a research background. I had never heard of tools like Docker or Airflow in grad school. After reading about them though, their value to a small but growing team of data scientists was quite apparent, so our team took some time to learn how to use them. We now have a reproducible, versioned workflow that removes a lot of headaches that previously existed.

I don’t think it’s too much to expect data scientists to quickly learn some DevOps skills, as long as you can motivate the value of using them.

[+] bllguo|8 years ago|reply
If you really need your data scientist to know these things, then just invest some time in training. So many skills can be quickly picked up on the job, at least to a "good enough" level, yet people insist on looking for unicorns that appear to tick all the unrealistic checkboxes.
[+] smallnamespace|8 years ago|reply
This is one reason I ask brain teasers in interviews.

I don't just care about what you know now; I need to know your willingness and ability to think on your feet when confronted with a seemingly random puzzle, and to actually persevere towards an answer if necessary.

Not everything will fit neatly into the box of tools that were previously learned.

[+] justherefortart|8 years ago|reply
Lmfao, why would a data scientist need to know TCP/IP, Server Setup, SOAP/REST Web Services, SDLC, etc?

Sounds like someone looked up a list of IT stuff you should know and applied it to data scientists randomly. In fact, most of those things in the list may or may not apply to a "software engineer".

[+] clintonb|8 years ago|reply
I agree that this knowledge is not necessary, but it could be useful for certain scenarios.

TCP/IP: networking between cluster nodes
Server setup: deploy a map-reduce cluster
SOAP/REST: read/write data from services
Software development life cycle: plan/deploy a reporting system for end users

[+] wohlergehen|8 years ago|reply
I agree that there is a big issue in the field w.r.t. "unknown unknowns", where more effort needs to be put into making useful knowledge available. However, I do not think that many of these technologies are hard for someone who understands data science, at least at the level necessary to use them. Doing productive development in these more systems- or CS-focused topics is a wholly different matter, though...
[+] ztjio|8 years ago|reply
There is no such implication. In fact she specifically implies otherwise. It's just a matter of not making assumptions of specific knowledge.
[+] thisisit|8 years ago|reply
The classic problem of software engineering: complaining that your specialist doesn't know other stuff, then during interviews lamenting that while you are getting well-rounded generalists, they are not up to par.

He/she knows SOAP/REST but is unaware of that NN model.

A human can only retain so much. Invest in a team that has its own specializations.

[+] cdancette|8 years ago|reply
Data scientists have a variety of backgrounds: CS, applied mathematics, pure mathematics...

I don't think a data scientist needs to know all that stuff to be good at his job.

[+] rcoveson|8 years ago|reply
I imagine what prompted this thread was the growing tendency of software companies to hire for "Data Scientist" positions and imagine that what they'll be getting is analogous to a Database or Distributed Computing specialist--someone who has a strong software engineering background plus deep knowledge of their specialty.
[+] calt|8 years ago|reply
Yes. That's the point. They don't know them, and they can still be productive. However, if you require the knowledge it can be taught and you might have to help teach it.
[+] jinonoel|8 years ago|reply
After looking at the stuff listed, no worries. Most software engineers don’t know all of these either.
[+] kapauldo|8 years ago|reply
Don't assume your data scientist is a scientist. It's a made-up, non-credentialed title.