Happy to see a book like this trending on HN, especially with a sentence like:
"Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid."
in its preface. I definitely agree, since I wasted a lot of time doing fruitless surgeries before I went and learned about band-aids in depth.
From my look at Part 1, it has great coverage of the basics, all of which are important. Some of the fundamentals that are left out are rightly left out, since they require experience in real analysis to appreciate and may not be very actionable. There are few proofs, but since the goal is a quick understanding, I can appreciate this.
It looks to me like a great intro to statistics for CS people, as the author says.
> "Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid."
Having studied both statistics and neural networks, I'm not sure if I completely agree with that quote. There are lots of neural network applications that have little to do with statistics (image recognition with convolutional neural networks for example).
I am pretty sure that the author means neural networks for statistical applications though.
> since I wasted a lot of time doing fruitless surgeries before I went and learned about band aids in depth
I'm going to just assume you're being literal here. It's really brightened up my day to imagine the moment such a person discovered you could close up wounds after surgery.
Sadly, there is no link to a free eBook, which is not surprising since the book is still in print: it was released in 2004 and updated in 2005 and 2013.
Not sure about HN's policy on posting links to pirated material, but as a Freedom of Information supporter, I will note that the book can be found at http://gen.lib.rus.ec.
While All of Statistics is wonderful in its genre, it really isn't a good place to start learning statistics. First, because it focuses very heavily on theory and contains very little on practical modeling. Second, because the theory isn't even necessarily going to be very enlightening: frequentist statistics is a mathematical tour de force, using every hack you can think of to draw statistical conclusions from nothing more than a few pen-and-paper calculations, but as a result frequentist theory won't actually give you much deeper insight into the core theoretical foundations of probability and statistics.
Who is this book supposed to be for? Given the heavy emphasis on formalism (theorem, proof, theorem, proof, theorem, proof), and the lack of a single example that actually computes a number, I hazard a guess that this book is not for people who actually want to apply statistics to real problems.
I found this book to be a godsend. I never took statistics and always wanted to better understand the deep conceptual ideas in the field. I had so many frustrating experiences with books that came highly recommended to me and turned out to be not what I wanted at all. They spend chapters and chapters beating around the bush, talking conversationally about general ideas around data management, measurement bias, research design, and different ways of charting data sets.
I cannot tell you how frustrating this was for me. I wanted just the meat: the core mathematical concepts on which statistical models and inferences are built. Don't tell me a folksy story about gathering soil samples, show me the tools and what they can do, both their power and their limitations. I can think for myself about how to apply those concepts.
I loved this book for being exceptionally clear and terse. I was hooked from the first sentence: "Probability is a mathematical language for quantifying uncertainty." That one sentence makes the concept clear in a way that the entire chapter on probability from "Statistics in a Nutshell" (http://www.amazon.com/Statistics-Nutshell-Sarah-Boslaugh/dp/...) did not.
I'm not someone who thrives on theorems and proofs, I thrive on concepts. And I found this book dense with clear explanations of the key concepts.
I don't have the book here at work so I can't quote its introduction, but in some sense the title is meant to be literal. It's an attempt to cram an entire 4-year undergraduate statistics program into a single book, and in my opinion it's mostly successful. This book is my go-to reference for those "Ahhhh, I remember hearing about [insert statistical test here] back in college, what was it again?" moments.
When I was taking a class in Statistical Inference, we used a combination of Statistical Inference (Casella and Berger), Introduction to Mathematical Statistics (Hogg and Craig), and Probability and Statistics (DeGroot and Schervish). If you're still interested in learning about Fisher information and the Cramér–Rao lower bound, you can refer to pages 514–521 of Probability and Statistics, 8th ed. It has a number of proofs, which you can skip if you're not interested, but it also provides examples using different distributions to calculate both the Fisher information and the Cramér–Rao lower bound.
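If you just want a number rather than a proof, here's a quick sketch of the Cramér–Rao bound for a Bernoulli(p) model, using the standard result that the Fisher information of a single observation is I(p) = 1/(p(1-p)). The model and all the numbers are my own toy choices, not from any of the books above:

```python
import random

# Fisher information of one Bernoulli(p) observation: I(p) = 1/(p(1-p)).
def fisher_information(p):
    return 1.0 / (p * (1.0 - p))

def crlb(p, n):
    # Lower bound on the variance of any unbiased estimator of p from n draws.
    return 1.0 / (n * fisher_information(p))

# The sample mean attains the bound for this model; check it by simulation.
random.seed(0)
p, n, trials = 0.3, 100, 20000
estimates = []
for _ in range(trials):
    heads = sum(1 for _ in range(n) if random.random() < p)
    estimates.append(heads / n)
mean_est = sum(estimates) / trials
var_est = sum((e - mean_est) ** 2 for e in estimates) / trials

print(crlb(p, n))  # p(1-p)/n, about 0.0021
print(var_est)     # should come out close to 0.0021
```

The point is that the abstract E[...] expressions collapse to a one-line formula you can check against a simulation in seconds.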
Pretty much any kind of mathematical modelling that involves uncertainty, really.
Making inferences and predictions from data, in the presence of uncertainty.
Analysis of the properties of procedures for doing the above.
If you want examples that avoid the feel of just "curve fitting" (assuming you mean something like "inferring parameters given noisy observations of them"), maybe look at models involving latent variables. Bayesian statistics has quite a few interesting examples.
Neural nets are glorified curve fitting. They are curves parameterized by the weight matrix. The weight matrix is relatively massive (e.g. 1M degrees of freedom), which makes the family of curves it generates almost as fluid as a piece of yarn. Now, given a small amount of data and a programmable piece of string, how well can you fit the data? It turns out the string is higher-dimensional than the data, so you can fit any curve you like. The trick is avoiding overfitting: the yarn warping its shape to fit noise that has no intrinsic meaning. That's what cross-validation prevents. Stop moving the yarn to match the training data better when doing so fails to improve performance on an independent test. That's what machine learning is: figuring out algorithms that don't overfit and have some ability to generalize to data not seen before. It's still basically glorified curve fitting.
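The "programmable yarn" point fits in a few lines, using a polynomial as the yarn instead of a neural net (all numbers here are made up for the demo): a flexible curve hugs the training noise, and only held-out data exposes the overfit.

```python
import numpy as np

# Noisy observations of a simple linear truth, plus a held-out set.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
x_val = np.linspace(0.025, 0.975, 20)
true_f = lambda x: 2.0 * x + 1.0
y_train = true_f(x_train) + rng.normal(0.0, 0.3, x_train.size)
y_val = true_f(x_val) + rng.normal(0.0, 0.3, x_val.size)

def errors(degree):
    # Fit on the training set, measure mean squared error on both sets.
    coeffs = np.polyfit(x_train, y_train, degree)
    train = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    val = float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))
    return train, val

for degree in (1, 15):
    print(degree, errors(degree))
```

The degree-15 "yarn" beats the straight line on training error, but its validation error gives the game away, which is exactly the signal cross-validation uses to say "stop".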
Statistics is about inferring probabilities from data such that we can make predictions (where data are discernible differences of some quantities). Inference means finding out what the world is about using some sort of representation (a model). The entire project is basically concerned with (mostly lossy) compression: how to represent the complexity of the world such that we can reason about it with limited resources, i.e. estimate things we can't compute using things we can compute. If a summary retains everything in the data that is relevant to the quantity being estimated, it is called a sufficient statistic.
Probability is at the heart of the project: frequencies that summarize recurring data. Instead of storing a recurring pattern multiple times, we just store it once and record how often it has occurred.
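The "store it once and record how often" idea is the textbook notion of sufficiency. For coin flips, the pair (number of heads, number of flips) is a sufficient statistic for the bias p: the likelihood computed from the raw sequence and from the two-number summary agree for every p. A sketch with toy data of my own:

```python
import math

data = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # the raw sequence

def loglik_full(p, xs):
    # Log-likelihood from every individual observation.
    return sum(math.log(p if x else 1.0 - p) for x in xs)

def loglik_summary(p, heads, n):
    # Log-likelihood from the compressed summary alone.
    return heads * math.log(p) + (n - heads) * math.log(1.0 - p)

heads, n = sum(data), len(data)
for p in (0.2, 0.5, 0.8):
    print(round(loglik_full(p, data), 6), round(loglik_summary(p, heads, n), 6))
```

Two numbers carry exactly as much information about p as the whole sequence, which is the lossy-compression view made literal.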
Statistics has, of course, grown from its beginnings as a means to summarize social/population conditions (the median number of serfs per farm, average bushels per acre, etc), yet: "estimating an accurate metric, (for example a 'central tendency'), from incomplete data" remains a central theme.
As an EE, how would you explain concepts like a PN junction or field effect transistor without using statistical mechanics? (i.e., expected behaviour for ensembles of huge numbers of particles.)
Statistics is applied probability theory. Statistics tries to find and characterize the probability distributions of random variables through a series of observations.
Basically, you count things and compare that to how many things you think you should have counted given your assumptions.
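Taken literally, "count things and compare to what you expected" is a chi-squared goodness-of-fit test. A sketch with a simulated die assumed fair (the simulation and numbers are my own, not from the thread):

```python
import random

# Roll a die, count each face, and compare to the n/6 expected under fairness.
random.seed(1)
n = 6000
rolls = [random.randint(1, 6) for _ in range(n)]
observed = [rolls.count(face) for face in range(1, 7)]
expected = n / 6  # 1000 per face if the fairness assumption holds

# The chi-squared statistic: how far the counts are from the assumption.
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(observed)
print(chi2)  # small for a fair die; compare against a chi-squared with 5 df
```

A large statistic relative to the chi-squared distribution with 5 degrees of freedom would be the counts telling you your assumption was wrong.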
This is one of the few hardcover books I have found worth its price. The statistical principles are succinctly explained such that they can be quickly implemented.
How does this compare to, say, "Introduction to Statistical Learning" and "Elements of Statistical Learning" by Hastie, Tibshirani, et al.? As I understand it, the former is also supposed to be a concise introduction to statistical concepts while the latter offers a more rigorous treatment. Where does this book fall in between?
This book is not between ISL and ESL. All of Statistics is an introductory course (with a much wider scope, including advanced topics), while even the watered-down ISL assumes that the reader already knows a bit of statistics.
Zuider | 10 years ago
This post links to the website supporting the book, which provides errata, code, and data. The links on that page to Springer and Amazon are broken; here are valid links:
http://www.springer.com/de/book/9780387402727
http://www.amazon.com/All-Statistics-Statistical-Inference-S...
Here is the Google Books link:
https://books.google.ie/books?id=th3fbFI1DaMC&printsec=front...
stdbrouw | 10 years ago
If you're new to statistics, try Allen Downey's http://greenteapress.com/thinkstats2/index.html or Brian Blais' http://web.bryant.edu/~bblais/statistical-inference-for-ever.... Both are free.
Then, go in depth on regression. Not just feeding in the numbers and getting back a fitted model, but actually knowing how everything works, what the common issues are, how to interpret the estimates and so on. Once you've got that down, read Regression Modeling Strategies by Harrell to go really in depth.
Or if you're really just interested in prediction, Hastie and Tibshirani is wonderful of course.
mathheaven | 10 years ago
For ML, Hastie and Tibshirani's ISLR is very good, but it is more geared toward applications of machine learning: classification, regression, and prediction.
gaur | 10 years ago
A while back I had to teach myself Fisher matrices and the Cramér–Rao bound to solve a problem I was working on. I quickly found that 90% of statistics textbooks and lecture notes on this subject are completely useless for people like me who want to arrive at a number, not some abstract expression involving angle brackets or measures or E[...] or whatever.
The Wikipedia article on Fisher information [0] is one such example of a resource that is full of useless formal crap that crowds out an explanation for real people about how to use this statistical tool. This book appears to be of the same ilk. (Also, this book apparently does not discuss the Cramér–Rao bound. Ironic given the book's title.)
If anyone is curious, the single best explanation of the Fisher matrix and the Cramér–Rao bound that I have found is tucked away in an appendix of the Report of the Dark Energy Task Force [1]. In one page they manage to concisely and clearly explain where the Fisher matrix comes from, how to compute it, and how to apply the Cramér–Rao bound.
[0] https://en.wikipedia.org/wiki/Fisher_information
[1] http://arxiv.org/abs/astro-ph/0609591
DavidSJ | 10 years ago
Another way of thinking about it (described in Wasserman's book) is that statistics is the inverse problem of probability.
Probability theory asks: given a process, what does its data look like? Statistics asks: given data, what process might have generated it?
krosaen | 10 years ago
http://karlrosaen.com/ml/
Some links to problem set solutions there