Ask HN: What do I need to learn to be a data analyst?

3 points| x____x | 7 years ago

4 comments

[+] CuriouslyC|7 years ago|reply
Stats: Understand common distributions (Gaussian, exponential, beta, gamma, Laplace, Bernoulli, etc.). Understand how common hypothesis and goodness-of-fit tests such as the chi-squared test, t-test and KS test work, and when they're applicable (or not). Understand linear regression and its basic extensions such as logistic regression and generalized linear models (also, what sort of data breaks them). Principal component analysis is also useful, if you take the time to understand how it works.
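
To make the goodness-of-fit idea concrete, here is a minimal sketch of the one-sample KS statistic using only the Python standard library. The function name `ks_statistic` and both samples are made up for illustration; in practice you would probably reach for a statistics library rather than hand-rolling this.

```python
import random
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDF of `sample` and the reference `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(500)]        # illustrative data
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]   # illustrative data

# Reference distribution: a standard normal.
reference = NormalDist(mu=0, sigma=1).cdf
d_normal = ks_statistic(normal_sample, reference)
d_skewed = ks_statistic(skewed_sample, reference)
print(f"normal sample:      D = {d_normal:.3f}")
print(f"exponential sample: D = {d_skewed:.3f}")
```

The exponential sample should give a much larger D, because none of its values are negative while half of the reference distribution's mass is. Turning D into a p-value needs the Kolmogorov distribution, which this sketch leaves out.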

Computer science: Clustering algorithms (k-means, hierarchical clustering, etc), some basic graph theory and graph distance/shortest path algorithms.
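
As an example of the shortest path algorithms mentioned above, here is a small sketch of Dijkstra's algorithm using `heapq` from the standard library. The graph and the function name `shortest_distances` are invented for illustration, and the sketch assumes non-negative edge weights.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest distance from `source` to every
    reachable node, assuming all edge weights are non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical undirected graph, stored as an adjacency dict.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
distances = shortest_distances(graph, "A")
print(distances)
```

Here the cheapest route from A to D goes through B and C (1 + 2 + 1 = 4) rather than taking the direct B to D edge.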

Validation: Learn to identify non-stationary data and autocorrelation of model errors. K-fold cross-validation and ROC curves are also worth learning.
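
A rough sketch of how k-fold cross-validation splits the data, using only the standard library. The helper `k_fold_indices` is hypothetical; libraries such as scikit-learn ship their own splitters, which you would normally use instead.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.
    Each row lands in exactly one test fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_rows=10, k=5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Note that shuffling assumes the rows are exchangeable. For time-ordered or non-stationary data, a shuffled split leaks future information into training, so splits that respect time order are the safer choice.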

Programming: Enough knowledge to efficiently extract information from semi-structured data. Basic tabular data manipulation/transformation. At least one data visualization library. Python, pandas and matplotlib are probably your best bets here.
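
For a feel of what "extracting information from semi-structured data" looks like, here is a small sketch with the standard library only. The log lines and field names are made up; with pandas the same job is a few lines, but the steps are the same: parse, tolerate bad records, fill in missing fields, aggregate.

```python
import json
from collections import defaultdict

# Hypothetical event log: one JSON object per line, some lines malformed
# and some records missing the "amount" field.
raw_lines = [
    '{"user": "ann", "event": "purchase", "amount": 30.0}',
    '{"user": "bob", "event": "view"}',
    'not json at all',
    '{"user": "ann", "event": "purchase", "amount": 12.5}',
    '{"user": "bob", "event": "purchase", "amount": 7.0}',
]

def parse_rows(lines):
    """Turn raw lines into uniform dict rows, skipping unparseable ones."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of crashing
        rows.append({
            "user": record.get("user"),
            "event": record.get("event"),
            "amount": float(record.get("amount", 0.0)),  # default when missing
        })
    return rows

rows = parse_rows(raw_lines)

# Basic tabular transformation: total purchase amount per user.
totals = defaultdict(float)
for row in rows:
    if row["event"] == "purchase":
        totals[row["user"]] += row["amount"]

print(dict(totals))
```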

Data management: SQL is a safe bet. Smaller shops may use Excel. Learning map/reduce with Spark may be helpful as well.
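
If you want to practice SQL without setting up a server, Python's built-in `sqlite3` module works well. A minimal sketch, with a made-up `orders` table, of the kind of grouped aggregate an analyst writes constantly:

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "east", 30.0),  # illustrative rows
        ("bob", "east", 10.0),
        ("cat", "west", 25.0),
        ("dan", "west", 5.0),
        ("eve", "west", 15.0),
    ],
)

# Order count, total and average amount per region.
rows = conn.execute(
    """
    SELECT region, COUNT(*), SUM(amount), AVG(amount)
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()

for region, n_orders, total, average in rows:
    print(region, n_orders, total, average)
```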

Domain knowledge: You need to understand the domain you're analyzing reasonably well. Pick an area and learn it. If you're not sure what you want to work on, I suggest starting with biology or finance.

Writing: To be successful as an analyst, you need to be able to turn visualizations and the output of statistical tests into a story that's accessible for a lay audience. Few analysts take this part as seriously as the technical side, but it's tremendously important.

[+] jmcminis|7 years ago|reply
This is a really nice summary of some of the technical components required. You also need to know how to do different kinds of analysis to answer different kinds of questions. A few more things:

0. Scientific method - probably true for all domains. Not really a kind of analysis, more an approach to doing analysis.

1. Cohort analysis - used in acquisition and retention analysis.

2. Model building - used in all kinds of financial analysis.

3. A/B/... testing - determining the difference between 2 or more populations.

4. Exploratory - understanding the relationships in your data to develop intuition about it.
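
To illustrate the A/B testing item, here is a sketch of a two-proportion z-test using only the standard library. The conversion counts are invented and the function name is hypothetical; it assumes independent samples that are large enough for the normal approximation to hold.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates,
    using the pooled proportion under the null hypothesis."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: variant A converts 120/1000, variant B 150/1000.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

With these numbers the result sits right around the conventional 5% threshold, which is a good reminder to decide on the sample size and significance level before looking at the data.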

There are plenty of analysis techniques in use. You can learn more about these and others if you survey blogs and other literature. One blog that I find interesting is Tom Tunguz's. He has a particular theme, but his analysis is very good. The methods and way of thinking are transferable. http://tomtunguz.com/

[+] stewbrew|7 years ago|reply
You should be more precise about what kind of role you're aiming for. I know plenty of "data analysts" who use Excel and little else. It really depends on the task at hand.

[+] fedecaccia|7 years ago|reply
First of all, you need a strong background in maths and statistics. Then, pick a programming language and become as good at it as possible. I recommend Python because it has a lot of data science libraries (I recommend learning numpy, pandas, scipy and sklearn). After that, consider adding TensorFlow to your learning curriculum.