Ask HN: Statistics for hackers?
I've been trying to learn more about statistics of late, motivated by some really fantastic applications I've seen, like automated composition of music, medical models, and stock market tools.
Atm I've been going through the book Elements of Statistical Learning, which I got from the frontpage a few days ago. But it's kind of slow going, since without really knowing how things relate to each other all I can do is go through it sequentially. What I want is to jump in with both feet, and start writing cool code.
Does anyone know of good books or articles for someone in my situation? Or could you give me a sort of minimal spanning roadmap for what I need before I can start having some fun?
I know about basic probability theory, Bayesian text classification and hidden Markov models, but that's about all.
[+] [-] drats|16 years ago|reply
[+] [-] gtani|16 years ago|reply
- Data Mining, by Witten and Frank; describes the basics with rigor, including how to use Weka, which they wrote
http://www.amazon.com/Data-Mining-Practical-Techniques-Manag...
a couple of Java-based books from Manning:
- Collective Intelligence in Action (by Satnam Alag) and
- Algorithms of the Intelligent Web (Marmanis, Babenko)
[+] [-] waldrews|16 years ago|reply
videolectures.net is filled with lectures on CS-flavored probability modelling and machine learning topics. The best bet is the multi-hour "tutorial" lecture series and minicourses; it may take a while to choose the right starting point.
For serious stats and probability without the CS flavoring (not useful for the quick-road-to-hacking-power agenda):
For classical deep stats theory, everyone I know begins with Casella and Berger's Statistical Inference. Don't expect algorithms in this though.
On the probability side: Feller's An Introduction to Probability Theory and Its Applications. Deep, readable, sometimes funny, full of "whoa" insights. Would be hard to actually grok every chapter in both volumes, but you read it for insight into the power of probability and then use it as a reference.
(grad student in stats, among other things)
[+] [-] etal|16 years ago|reply
http://www.bmj.com/collections/statsbk/index.dtl
The examples you gave make me think you've done some applied things with those specific techniques, but haven't covered the theory and related areas in depth. That's fine; the Square One series is simpler but comprehensive, so you'll be in good shape after that to investigate more on your own.
[+] [-] haliax|16 years ago|reply
[+] [-] caffeine|16 years ago|reply
From there:
Standard references are Hastie and Tibshirani, which you already have, Pattern Classification by Duda, Hart, and Stork, and PRML by Chris Bishop (though I found it boring - too many unmotivated equations). All of Statistics and especially All of Nonparametric Statistics by Wasserman are both excellent books which will fairly rapidly get you introduced to large swaths of statistical models. Papoulis (1993) is quite a good reference on statistics in general, and Cover & Thomas is the usual reference of choice for information theory (which is very relevant to what you're interested in), but neither of those is much fun to actually read.
You seem less interested in classification/ML problems and more interested in straight-up stats and/or timeseries stuff. So some slightly deeper references:
- Given your interests you might absolutely love Kevin Murphy's PhD thesis on Dynamic Bayes Nets, which are excellent for describing phenomena in all three fields you mentioned.
- Check out Geoff Hinton's work, especially on deep belief nets (there's a Google tech talk and a lot of papers).
- Hinton and Ghahramani have a tutorial called "Parameter Estimation for Linear Dynamical Systems", which could be directly applicable to the models you're talking about
- If you're interested in these dynamic, causal models you'll want to learn about EM (which you should know already since you know HMMs), and its generalization Variational Bayes. MacKay has a terse chapter on variational inference; http://www.variational-bayes.org/vbpapers.html has more. One of those is an introductory paper by Ghahramani and some others, which is nice.
- Pretty much everything on http://videolectures.net will excite you.
Some of those references (esp. the VB stuff) can get slightly hairy in terms of the maths level required (depending on your background). Bayesian Computation with R (by Jim Albert), or Crawley's R book (for a more frequentist approach), can get you started using R, which saves you from implementing all this stuff yourself, as much of it is already implemented. This might be your fastest route to writing code that does cool stuff: understand what the algo is, use somebody else's implementation, apply it to your own problem.
[+] [-] tokenadult|16 years ago|reply
"Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks" by Robert W. Hayden
http://statland.org/MyPapers/MAAFIXED.PDF
"The Introductory Statistics Course: A Ptolemaic Curriculum?" by George W. Cobb
http://repositories.cdlib.org/cgi/viewcontent.cgi?article=10...
Both are excellent introductions to what statistics is as a discipline and how it is related to, but distinct from, mathematics.
A very good list of statistics textbooks appears here:
http://web.mac.com/mrmathman/MrMathMan/New_Teacher_Resources...
[+] [-] silentbicycle|16 years ago|reply
[+] [-] haliax|16 years ago|reply
[+] [-] keenerd|16 years ago|reply
http://en.wikipedia.org/wiki/Musikalisches_Würfelspiel
[+] [-] brent|16 years ago|reply
http://arxiv.org/abs/0909.4555
[+] [-] sh1mmer|16 years ago|reply
[+] [-] fhars|16 years ago|reply
But the real bummer is the editorial quality. There aren't any three consecutive pages without major typographical or editorial errors, like missing parentheses in complex formulas or cases where they obviously replaced examples with simpler ones but forgot to change the illustrations together with the text.
[+] [-] pmichaud|16 years ago|reply
http://www.statisticshowto.com
And it's really relevant here because the approach to everything is step by step -- the author of the site probably doesn't realize it, but the tutorial steps practically read like pseudocode... seems like it could really help you.
Also, there are some calculators on there, and I've seen the code, which isn't bad, and it's not obfuscated so if you want to get an idea of how to implement something, you can just look at the source directly.
[+] [-] lliiffee|16 years ago|reply
Nearest neighbors methods can be implemented in something like 3 lines, so you have no excuse!
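As a rough illustration of that claim, here is a minimal k-nearest-neighbors sketch in plain Python. The function name knn_predict, the toy training data, and the choice of Euclidean distance with a majority vote are all assumptions made for this example, not part of any particular library; it needs Python 3.8+ for math.dist.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (point, label) pairs; points are equal-length tuples.
    """
    # Sort training pairs by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: two small clusters labelled "a" and "b".
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
]

print(knn_predict(train, (0.2, 0.1), k=3))
```

The body of the function really is about two lines; sorting the whole training set is fine for small data, but anything large would want a spatial index or a partial sort instead.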
[+] [-] tsally|16 years ago|reply
[+] [-] haliax|16 years ago|reply
[+] [-] dotBen|16 years ago|reply
A friend who founded a startup that makes heavy use of statistics likes to trawl academic papers for algorithms that help his business.
Think of research papers a bit like a well documented private object/library -- you know what data it accepts, you know what it returns, but you don't need to know how it works.
Just make sure your code reflects exactly the formulae/model documented in the paper and you're good.
[+] [-] eru|16 years ago|reply
[+] [-] Mongoose|16 years ago|reply
[+] [-] Anon84|16 years ago|reply
[+] [-] whimsy|16 years ago|reply
[+] [-] jakecarpenter|16 years ago|reply
-jc
[+] [-] zackattack|16 years ago|reply
O'Reilly's Statistics in a Nutshell is a good reference* book, but not quite a textbook. Here you go, including my refid: http://www.amazon.com/gp/product/0596510497?ie=UTF8&tag=...
*True masters have beginner perspectives, so they are good teachers as well.
[+] [-] haliax|16 years ago|reply
[+] [-] joeycfan|16 years ago|reply