
Probabilistic Machine Learning: Advanced Topics

207 points | mariuz | 4 years ago | probml.github.io | reply

43 comments

[+] murphyk|3 years ago|reply
Hi, this is the author of the books being discussed in this thread; I wanted to respond to some of the comments.

it_does_follow said "there is almost no mention of understanding parameter variance". I do discuss both Bayesian and frequentist measures of uncertainty in sec 4.6 and 4.7 of my intro book (http://probml.github.io/book1). (Ironically, I included the Cramér-Rao bound in an earlier draft, but omitted it from the final version due to space.)

dxbydt said "how do you compute the variance of the sample variance". I discuss ways to estimate the (posterior) variance of a variance parameter in sec 3.3.3 of my advanced book (https://probml.github.io/pml-book/book2). I also discuss hierarchical Bayes, shrinkage, etc.

However, in ML (and especially DL), the models we use are usually unidentifiable and the parameters have no intrinsic meaning, so nobody cares about quantifying their uncertainty. Instead the focus is on predictive uncertainty (of observable outcomes, not latent parameters). I discuss this in more detail in the advanced book, along with related topics like distribution shift, causality, etc.
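
The unidentifiability point can be made concrete with a toy model (a hypothetical example, not one from the book): if predictions depend only on a product of parameters, individual parameter variances are meaningless while predictive uncertainty is still well defined.

```python
# Toy unidentifiable model (hypothetical): y = a * b * x.
# Only the product a*b is identified by data, so asking for the
# "variance of a" alone is meaningless -- but predictions, and hence
# predictive uncertainty, are perfectly well defined.
def predict(a, b, x):
    return a * b * x

# Two very different parameter settings give identical predictions:
print(predict(2.0, 3.0, 1.5))  # 9.0
print(predict(6.0, 1.0, 1.5))  # 9.0
```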

Concerning the comments on epub, etc. My book is made with latex and compiled to pdf. This is the current standard for technical publications and is what MIT Press uses.

Concerning the comments on other books. The Elements of Statistical Learning (Hastie, Tibshirani, Friedman) and Pattern Recognition and ML (Bishop) are both great books, but are rather dated and quite narrow in scope compared to my 2 volume collection....

[+] it_does_follow|4 years ago|reply
Kevin Murphy has done an incredible service to the ML (and Stats) community by producing such an encyclopedic work of contemporary views on ML. These books are really a much-needed update to the now outdated-feeling "The Elements of Statistical Learning" and the logical continuation of Bishop's nearly perfect "Pattern Recognition and Machine Learning".

One thing I do find a bit surprising is that in the nearly 2000 pages covered between these two books there is almost no mention of understanding parameter variance. I get that in machine learning we typically don't care, but this is such an essential part of basic statistics I'm surprised it's not covered at all.

The closest we get is the Inference section, which is mostly interested in prediction variance. It's also surprising that in neither the section on the Laplace approximation nor the one on Fisher information does anyone call out the Cramér-Rao lower bound, which seems like a vital piece of information regarding uncertainty estimates.
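
The Cramér-Rao bound is easy to illustrate numerically (a stdlib-only sketch with made-up numbers): for i.i.d. samples from N(theta, sigma2) with sigma2 known, the Fisher information is n/sigma2, so any unbiased estimator of theta has variance at least sigma2/n, and the sample mean attains that bound.

```python
import random
import statistics

# Cramér-Rao sketch (toy numbers): for X_i ~ N(theta, sigma2) with
# sigma2 known, the Fisher information in n i.i.d. samples is
# I_n(theta) = n / sigma2, so any unbiased estimator of theta has
# variance >= sigma2 / n. The sample mean attains this bound.
random.seed(2)
n, trials, sigma2 = 25, 20_000, 4.0
crlb = sigma2 / n  # lower bound = 0.16

means = []
for _ in range(trials):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    means.append(statistics.mean(xs))

mc_var = statistics.variance(means)  # Monte Carlo variance of the sample mean
print(mc_var, crlb)  # mc_var should be close to the bound
```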

This is of course a minor critique, since virtually no ML books touch on these topics; it's just unfortunate that in a volume this massive we still see ML ignoring what is arguably the most useful part of what statistics has to offer machine learning.

[+] dxbydt|4 years ago|reply
Do you really expect this situation to ever change? The communities are vastly different in their goals despite some minor overlap in their theoretical foundations. Suppose you take an rnorm(100) sample and find its variance. Then you ask the crowd for the mean and variance of that sample variance. If your crowd is 100 professional statisticians with a degree in statistics, you should get the right answer at least 90% of the time. If instead you have 100 ML professionals with some sort of degree in cs/vision/nlp, less than 10% would know how to go about computing the variance of the sample variance, let alone what distribution it belongs to. The worst case is 100 self-taught Valley bros - not only will you get the wrong answer 100% of the time, they'll pile on you for gatekeeping and computing useless statistical quantities by hand when you should be focused on the latest and greatest libraries in numpy that will magically do all these sorts of things if you invoke the right api. As a statistician, I feel quite sad. But classical stats has no place in what passes for ML these days. Folks can't Rao-Blackwellize for shit, so how can you expect a Fisher information matrix from them?
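
For reference, the quantity being asked about has a clean closed form for Gaussian data: (n-1)S^2/sigma^2 follows a chi-squared distribution with n-1 degrees of freedom, so E[S^2] = sigma^2 and Var(S^2) = 2*sigma^4/(n-1). A stdlib-only Monte Carlo check (sample sizes chosen arbitrarily):

```python
import random
import statistics

# Variance of the sample variance, checked by Monte Carlo.
# For X_i ~ N(mu, sigma2), (n-1) * S^2 / sigma2 ~ chi-squared with n-1 df,
# so E[S^2] = sigma2 and Var(S^2) = 2 * sigma2**2 / (n - 1).
random.seed(0)
n, trials, sigma2 = 100, 10_000, 1.0

sample_vars = []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    sample_vars.append(statistics.variance(xs))  # unbiased, divides by n-1

mc_mean = statistics.mean(sample_vars)
mc_var = statistics.variance(sample_vars)
theory_var = 2 * sigma2 ** 2 / (n - 1)  # = 2/99, about 0.0202

print(mc_mean)             # close to sigma2 = 1.0
print(mc_var, theory_var)  # both close to 0.0202
```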
[+] yldedly|4 years ago|reply
To get the prediction variance in a Bayesian treatment, you integrate over the posterior of the parameters - surely computing or approximating the posterior counts as considering parameter variance?
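
In a fully conjugate toy model both variances come out in closed form (a sketch with made-up numbers, not anyone's book example): with data y_i ~ N(theta, sigma2), sigma2 known, and a N(mu0, tau0_2) prior on theta, the posterior over theta is Gaussian (its variance is exactly the "parameter variance"), and the predictive variance for a new observation adds the noise variance on top.

```python
import random
import statistics

# Conjugate normal-normal model (made-up numbers): y_i ~ N(theta, sigma2)
# with sigma2 known, prior theta ~ N(mu0, tau0_2).
random.seed(1)
sigma2, mu0, tau0_2 = 4.0, 0.0, 10.0
data = [random.gauss(2.0, sigma2 ** 0.5) for _ in range(50)]
n, ybar = len(data), statistics.mean(data)

# Gaussian posterior over theta -- this posterior variance IS the
# parameter variance the thread is discussing.
post_prec = 1 / tau0_2 + n / sigma2
post_var = 1 / post_prec
post_mean = (mu0 / tau0_2 + n * ybar / sigma2) * post_var

# Posterior predictive for a new y integrates over the posterior:
# predictive variance = noise variance + parameter variance.
pred_var = sigma2 + post_var

print(post_mean, post_var, pred_var)
```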
[+] barrenko|4 years ago|reply
Do you think this book is useful for someone just looking to get more into statistics and probability, sans machine learning? How would I go about that?

Currently I have lined up: Math for Programmers (No Starch Press), Practical Statistics for Data Scientists (O'Reilly - the crab book), and Discovering Statistics Using R.

Basically I'm trying to follow the theory from "Statistical Consequences of Fat Tails" by NNT.

[+] graycat|4 years ago|reply
Bourbaki student M. Talagrand has some work on approximate independence. If I were trying to do something along the lines of Probabilistic Machine Learning: Advanced Topics I would look

(1) carefully at the now classic

L. Breiman, et al., Classification and Regression Trees (CART),

and

(2) at the classic Markov limiting results, e.g., as in

E. Çinlar, Introduction to Stochastic Processes,

at least to be sure you are not missing something relevant and powerful,

(3) at some of the work on sufficient statistics, of course, first via the classic Halmos and Savage paper and then at the interesting more recent work in

Robert J. Serfling, Approximation Theorems of Mathematical Statistics,

and then for the most promising

(4) very carefully at Talagrand.

(1) and (2) are old but a careful look along with more recent work may yield some directions for progress.

What Serfling develops is a bit amazing.

Then don't expect the Talagrand material to be trivial.

[+] axpy906|4 years ago|reply
Why should I read this as opposed to Murphy or Bishop?
[+] it_does_follow|4 years ago|reply
For clarification: Murphy's first book is Machine Learning: A Probabilistic Perspective. This is his newest, two-volume work, Probabilistic Machine Learning, which is broken into two parts: an Introduction (published March 1, 2022) and Advanced Topics (expected to be published in 2023, with a draft preview available now).

To answer your question: this book is even more complete and a bit improved over the first book. I don't believe there's anything in Machine Learning that isn't either well covered in, or deliberately omitted from, Probabilistic Machine Learning. It also has the benefit of a few more years of rethinking these topics. So among the existing Murphy books, Probabilistic Machine Learning: An Introduction is probably the one you should have.

Why this over Bishop (though I'm not sure that's the right framing)? While on the surface they are very similar (very mathematical overviews of ML from a probability-focused perspective), they function as very different books. Murphy is much more of a reference to contemporary ML. If you want to understand how most leading researchers think about ML, with a reference covering the mathematical underpinnings, this is the book you need.

Bishop is a much more opinionated book, in that Bishop isn't just listing out all possible ways of thinking about a problem but really building out a specific view of how probability relates to machine learning. If I'm going to sit down and read a book cover to cover, it's going to be Bishop, because he has a much stronger voice as an author and thinker. However, Bishop's book is now more than 10 years old and misses out on nearly all of the major progress we've seen in deep learning. That's a lot to be missing, and it won't be rectified by Bishop's perpetual WIP book [0].

A better comparison is not Murphy to Murphy or Murphy to Bishop, but Murphy to Hastie et al. The Elements of Statistical Learning was for many years the standard reference for advanced ML, especially during the brief time when GBDT and random forests were the hot thing (which they still are, to an extent, in some communities). I really enjoy EoSL, but it does have a very "Stanford statistics" feel to the intuitions (which I find even more aggressively frequentist than your average frequentist). Murphy really reflects the contemporary computer-science/Bayesian understanding of ML that has dominated the top research teams for the last few years. It feels much more modern and should be the replacement reference text for most people.

0. https://www.mbmlbook.com/

[+] ai_ia|4 years ago|reply
This is Murphy btw. Just the advanced version.
[+] saeranv|4 years ago|reply
What kind of tools do people use to work with probabilistic DL? Pyro, Edward, TensorFlow Probability, ...other?
[+] qumpis|4 years ago|reply
No exercises it seems, which imo are just as valuable as the contents of the book
[+] yellowcake0|4 years ago|reply
When the book is published in 2023 it will have exercises, this is just a preliminary draft.
[+] deepsun|4 years ago|reply
Why PDF? Why not ePUB or .tex? Especially since the "Preface" is already badly formatted (text too far to the left, overlapping the line numbers).
[+] benrbray|4 years ago|reply
> Why not ePUB

ePUB is notoriously bad at displaying mathematics. It also takes away the author's control of the page layout. To me there is nothing more satisfying than a well-crafted PDF.

[+] jstx1|4 years ago|reply
If it was epub or tex, the first thing I would do is look for a way to make a pdf out of it.
[+] quantumduck|4 years ago|reply
I'm fairly certain that PDF was generated using LaTeX, everyone in academia uses it. Besides, it's not fair to complain about formatting in a very early draft.