Foundations of Data Science [pdf]

165 points| Anon84 | 6 years ago |cs.cornell.edu | reply

20 comments

[+] therobot24|6 years ago|reply
this is a weird collection of ML/computer vision/data processing topics

wavelets as its own chapter, deep learning only has GANs as a single subsection, graphical models several chapters after the ML chapter...just weird arrangement choices

[+] mlevental|6 years ago|reply
it's a theory-heavy intro to a wide variety of analytical techniques. it's a common refrain that deep learning has very little in the way of theory, and in fact gans, with their optimization scheme framed as a game that plays until equilibrium, are one of the few pieces that have some. that's also interesting.
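
for reference, the game is usually written as the minimax objective below. the notation (discriminator D, generator G, data distribution p_data, noise prior p_z) is the standard one from the original GAN formulation, not something quoted from the book:

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

at the equilibrium of this game the generator's distribution matches p_data and the best the discriminator can do is output 1/2 everywhere.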
[+] crimsonalucard|6 years ago|reply
If I read and master most of the stuff in this book, am I qualified to do data science?
[+] perturbation|6 years ago|reply
I'd recommend Elements of Statistical Learning or ISLR instead, if you want to start with a theory-heavy introduction. I think you'd better learn most of what you need for DS through projects or on the job.

Also, as others have mentioned, some of the most important skills for DS are data munging, data "presentation", and soft skills like managing expectations / relationships / etc.

I would not recommend this book if you want to get into DS with the idea that, "I'll read this and then I'll know everything I need to." It's too dense and academically-focused, and it would probably be discouraging if you try to read this all without getting your feet wet.

[+] soVeryTired|6 years ago|reply
It depends what you mean by "do data science". If you've read and mastered what's in the book, you'll have a good chance at writing a nice PhD in machine learning. But there's a lot of academic fluff in there that isn't too useful outside of university.

Most data scientists are consumers of algorithms, not producers of algorithms. The rules are a bit different if you're at a bigco, but most data scientists don't do active research. It's nice to have a solid theoretical understanding of machine learning, but most data scientists' day-to-day consists of chaining together libraries and building nice dashboards.

[+] dagw|6 years ago|reply
The book looks fine, if a little random in its selection of topics, and it's probably as good a start as any, but it is neither necessary nor sufficient. Much of data science in 'the field' is hunting down, gathering and cleaning data, knowing the right 'black boxes' to chain together in the right order to get the result you want, and knowing how to present the results in an easy-to-understand way so that the people who have to make the decisions can make them.

Knowing the theory is important, but most of the time what you actually need to know is how to quickly knock together a script that pulls in, cleans up and merges some dirty data from 3 different sources, selects the right out-of-the-box algorithm for the situation and presents the results in a clear and pedagogical way.
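
To make that concrete, here is a minimal sketch of such a script using only the Python standard library. Everything in it is hypothetical: the three inlined sources (`CRM`, `ORDERS`, `REGIONS`), their column names and the `build_report` helper are made up for illustration, and a plain mean stands in for whatever out-of-the-box algorithm the situation would really call for.

```python
import csv
import io
from statistics import mean

# Three hypothetical "dirty" sources, inlined as CSV text so the sketch runs as-is.
CRM = "id,name\n 1 ,Alice\n2,bob \n3,\n"
ORDERS = "customer_id,amount\n1,10.50\n1,4.50\n2,n/a\n2,20\n4,7\n"
REGIONS = "ID;Region\n1;north\n2;SOUTH\n"


def read_rows(text, delimiter=","):
    """Parse CSV text into dicts with lowercased keys and stripped values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {key.strip().lower(): (value or "").strip() for key, value in row.items()}
        for row in reader
    ]


def build_report():
    # Clean: drop customers without a name, normalise capitalisation.
    names = {r["id"]: r["name"].title() for r in read_rows(CRM) if r["name"]}
    regions = {r["id"]: r["region"].lower() for r in read_rows(REGIONS, ";")}

    # Clean: keep only order amounts that parse as numbers.
    amounts = {}
    for r in read_rows(ORDERS):
        try:
            value = float(r["amount"])
        except ValueError:
            continue
        amounts.setdefault(r["customer_id"], []).append(value)

    # Merge: inner join on the customer id shared by the CRM and order data.
    report = []
    for cid in sorted(names.keys() & amounts.keys()):
        report.append({
            "customer": names[cid],
            "region": regions.get(cid, "unknown"),
            "orders": len(amounts[cid]),
            # Stand-in for the "algorithm" step: a plain average per customer.
            "mean_amount": mean(amounts[cid]),
        })
    return report


if __name__ == "__main__":
    # Present: one readable line per customer.
    for row in build_report():
        print(f"{row['customer']:<8}{row['region']:<8}"
              f"{row['orders']:>3}{row['mean_amount']:>8.2f}")
```

In practice the same steps would more likely be a few pandas calls, but the shape is the same: read, clean, join on a shared key, summarise, present.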

[+] commandlinefan|6 years ago|reply
Looks like you need a _lot_ of foundational stuff first. Page 17 has already jumped into multiple integrals, which at least when I was studying undergraduate calculus 30 years ago wasn’t covered until Calc IV. The appendix rushes through a lot of stuff you’re apparently expected to already have had some exposure to like Taylor series, probability density functions, eigenvalues and eigenvectors… it looks like you’d need at least an undergraduate (if not a graduate) level math degree before you could make much sense of the contents of this book.
[+] throwaway_bad|6 years ago|reply
I have taken a course based on this book (before it was retitled "foundations of data science"). It was more or less a math class where all homeworks are about proving stuff. You won't write a single line of code and usually don't even talk too deeply about the applications.

So no, you'd be completely unqualified. You would however gain a deep understanding that might help you come up with novel techniques for solving large scale problems.

[+] ChaseT|6 years ago|reply
I was in one of Hopcroft's courses (an author of this book) for a while and if this book is on brand with his style and fields of study, it's super theory heavy. You won't write a line of code; you would not be useful to a company hoping to apply data science in the world.
[+] devicetray0|6 years ago|reply
I was hoping there would be a section/mention on data privacy, but a CTRL+F revealed no mention of the word.