a_zaydak | 5 years ago

Thanks for the feedback! It seems you and I have both had some experience as the first engineering hire at a startup, but very different experiences when it comes to the role of a data scientist. I appreciate that.

proverbialbunny|5 years ago

Np. There is a common trend in the industry where a company hires a data scientist without knowing the data prerequisites (specifically labeled data), the data scientist struggles, and after a while the company fires them. This leaves the company with a bad taste in its mouth. In recent years I tend to get hired on as a specialist to help fix this. (And yes, I've been the first engineer hired on too.)

What's interesting is they tend to struggle in two different ways: 1) The data scientist who is gung ho about infrastructure work, jumps in, and then ends up doing a bad job, because it's not their strength. They end up getting let go for not being ideal at that work. 2) The data scientist who struggles with the idea of infrastructure work at all, jumps into other roles they're good at like data analyst work, and helps the company in that way, but ultimately, because they did not push to get an infrastructure engineer hired, ends up being let go as well.

Me, I go out of my way to get an infrastructure engineer / data engineer hired early on. Also, I have worked as an engineer, so I tend to do a lot of the "hard" stuff most software engineers struggle with early on, if applicable. Eg, at one job I wrote a compression format to reduce battery drain on our devices that were collecting data.

Most data scientists struggle when it comes to CS/engineering skills (about 4 out of 5 of them), so it's not uncommon for them, early on while the pipes are being built, to do data analyst and BI work: BI work to automate reports, which management loves, and DA work to show some amazing future service the company might be able to provide to its customers. It's selling the sun and the moon really, but it gets management inspired, and helps them know what data to collect. It's not unheard of to need a minimum of two years of collected data before building a model that can be deployed becomes feasible. This can be hard on the data scientist, because there is a lot of downtime before that. Many get fired during this time even when they're doing a good job. They have to wear multiple hats, but those hats are analyst roles (like BI work). Technically a data scientist is a kind of analyst, not an engineer, so it makes sense that wearing multiple hats tilts them in the analyst direction, not the engineering direction.

I've been writing code since I was 8 years old, so I'm one of the unusual ones that tilts in the engineering direction, but I think it is unreasonable to expect that from the average data scientist. Let them do what they do best, and hire someone else who can round everything out and you'll be in a good place. Unicorns aside, you'll need a minimum of two professionals for a data project to succeed.

_RPL5_|5 years ago

Thank you for your comments! They are very insightful. To piggyback a bit:

Assuming you are a competent data "analyst" who wants to become a data engineer, how would you go about it? Is "go back to school and get a CS degree" the answer? I suppose this question is very broad, but I am curious if a practitioner like you has an opinion.

---

To give some context:

I recently graduated with a STEM PhD and am looking to move into data science. Reading the comments, I feel like I fall into the "pointless data scientist" cohort derided in this thread. E.g., I am very comfortable doing typical analytical work and occasionally training models inside a notebook, but I am neither a cutting-edge theoretical statistician nor a data engineer.

I've been trying to improve on the engineering side. For example, I recently did a project where I set up a rudimentary pipeline that continuously pings an API, uploads the data to a cloud database, then serves up the analysis via a Flask app. For me this was a big step up from just doing notebooks on a CSV file :)
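
To make the shape of that kind of pipeline concrete, here is a minimal sketch using only the Python standard library. Everything in it is a stand-in rather than the actual project: `fetch` is a hypothetical callable in place of the real API call, an SQLite database stands in for the cloud database, the `readings` table and its columns are made up for illustration, and `summarize` is the piece a Flask route would call.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) readings table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (ts REAL NOT NULL, value REAL NOT NULL)"
    )
    conn.commit()


def poll_once(
    conn: sqlite3.Connection,
    fetch: Callable[[], float],
    now: Optional[float] = None,
) -> None:
    """Call the API stand-in once and store the result with a timestamp."""
    ts = time.time() if now is None else now
    conn.execute("INSERT INTO readings (ts, value) VALUES (?, ?)", (ts, fetch()))
    conn.commit()


def summarize(conn: sqlite3.Connection) -> Dict[str, Optional[float]]:
    """Compute the small 'analysis' that a web endpoint would serve."""
    count, mean, latest_ts = conn.execute(
        "SELECT COUNT(*), AVG(value), MAX(ts) FROM readings"
    ).fetchone()
    return {"count": count, "mean": mean, "latest_ts": latest_ts}
```

In a real deployment, `poll_once` would run on a schedule (a loop with a sleep, cron, or similar), and a Flask route could return the dictionary from `summarize` as JSON. Keeping the fetch, store, and summarize steps as separate functions is a design choice that makes each one testable without a network or a web server.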

But moving beyond the basics, I am not sure what to study next. Hence my question. If you have any suggestions, I would greatly appreciate it!