top | item 19163028

(no title)

Yes. The point I took away from this is that this is not at all a focus of most academic settings. This ends up leaving a huge gap and leaving candidates with an academic DS background woefully unprepared and undesirable.

discuss

barbecue_sauce|7 years ago

That seems strange to me. People on forums like this often describe Data Science practitioners as "statisticians that can code". If academic Data Science programs aren't emphasizing data engineering as part of their curriculum, what differentiates a Data Science program from statistics or business intelligence?

Bartweiss|7 years ago

> If academic Data Science programs aren't emphasizing data engineering as part of their curriculum, what differentiates a Data Science program from statistics or business intelligence?

In my experience, they're emphasizing software-based data work like machine learning, but not the (vital) peripherals like cleaning/studying/loading data or monitoring and sanity-checking outputs.

A data science student might get a process-first task like making predictions from data using KNN, regressions, t-tests, or neural nets, choosing a method and optimizing based on performance. A statistics student might focus on theory, choosing an appropriate analysis method in advance based on the dataset, and reasoning about the effects of error instead of just trying to reduce it.

But the data scientist could still be training on a clean, wholly-theoretical dataset or a highly predictable online-training environment. The result is a lot of entry-level data scientists who are mechanically talented but stymied by real-world hurdles. Issues handling dirty or inconstant data, for one. But there are a lot of others: a tendency to do analysis in a vacuum, without taking advantage of knowledge about the domain and data source; or judging output effectiveness based on training accuracy, without asking whether the dataset is (and will stay) well-matched to the actual task.

I don't mean that to sound dismissive; there are lots of people who do all of that well, even newly-trained. But it does seem to be a common gap in a lot of data science education.

itronitron|7 years ago

I'm not familiar with academic Data Science programs but I've worked with statisticians for over fifteen years and they are usually very involved on the data engineering side. If they aren't running the systems then they are working closely with those people to test and confirm inputs and outputs before running analyses.

darkxanthos|7 years ago

The differentiator is breadth over depth. Not getting much into theoretical underpinnings.