I started working as a data scientist 9 months ago, coming primarily from a research background. I had never heard of tools like Docker or Airflow in grad school. Once I read about them, though, their value to a small but growing team of data scientists was quite apparent, so our team took some time to learn how to use them. We now have a reproducible, versioned workflow that removes a lot of the headaches we used to have.
I don’t think it’s too much to expect data scientists to quickly learn some DevOps skills, as long as you can motivate the value of using them.
If you really need your data scientist to know these things, then just invest some time in training. So many skills can be quickly picked up on the job, at least to a "good enough" level, yet people insist on looking for unicorns that appear to tick all the unrealistic checkboxes.
This is one reason I ask brain teasers in interviews.
I don't just care about what you know now; I need to know your willingness and ability to think on your feet when confronted with a seemingly random puzzle, and to actually persevere towards an answer if necessary.
Not everything will fit neatly into the box of tools that were previously learned.
Lmfao, why would a data scientist need to know TCP/IP, Server Setup, SOAP/REST Web Services, SDLC, etc?
Sounds like someone looked up a list of IT stuff you should know and applied it to data scientists randomly. In fact, most of those things in the list may or may not apply to a "software engineer".
I agree that this knowledge is not necessary, but it could be useful in certain scenarios:
TCP/IP: networking between cluster nodes
Server setup: deploy a map-reduce cluster
SOAP/REST: read/write data from services
Software development life cycle: plan/deploy a reporting system for end users
I agree that there is a big issue in the field w.r.t. "unknown unknowns", where more effort needs to be put into making useful knowledge available.
However, I do not think that many of these technologies are hard for someone who understands data science, at least at the level necessary to use them.
Doing productive development in these more systems- or CS-focused areas is a wholly different matter, though...
The classic problem of software engineering: complaining that your specialist doesn't know other stuff, then during interviews lamenting that the well-rounded generalists you are getting are not up to par.
He/she knows SOAP/REST but is unaware of that NN model.
A human can only retain so much. Invest in a team which has its own specializations.
I imagine what prompted this thread was the growing tendency of software companies to hire for "Data Scientist" positions and imagine that what they'll be getting is analogous to a Database or Distributed Computing specialist--someone who has a strong software engineering background plus deep knowledge of their specialty.
Yes. That's the point. They don't know them, and they can still be productive. However, if you require the knowledge it can be taught and you might have to help teach it.
Xcelerate | 8 years ago
bllguo | 8 years ago
smallnamespace | 8 years ago
justherefortart | 8 years ago
clintonb | 8 years ago
wohlergehen | 8 years ago
ztjio | 8 years ago
thisisit | 8 years ago
cdancette | 8 years ago
I don't think a data scientist needs to know all that stuff to be good at his job.
rcoveson | 8 years ago
calt | 8 years ago
jinonoel | 8 years ago
kapauldo | 8 years ago