
Interpretability in Machine Learning: An Overview

182 points | atg_abhishek | 5 years ago | thegradient.pub

46 comments

[+] blululu|5 years ago|reply
This is a good exposition of some formal definitions for 'interpretability' in the context of machine learning, but I am still not really clear on why such a property is necessary or even desirable in the context of high dimensional statistical learning algorithms. In some sense the power of modern machine learning (as opposed to a set of heuristics + feature engineering + a linear classifier) is that it is not limited by what its designers are able to imagine or understand. If it were possible to give a simple explanation of how a high dimensional classifier works then it would also likely be unnecessary to have so many parameters.

As an example, if we consider natural language processing, then we might say that we want our NLP algorithm to be interpretable. This is clearly a tall order since the study of linguistics is still full of unsolved riddles. It seems silly to insist that a computational model of language must be significantly easier to understand than language itself. If interpretability is not feasible with language - a construct that is intimately connected to the faculties of the human brain - then why should we expect it to be feasible (or desirable) for the wide range of applications that do not come naturally to people?

[+] pkage|5 years ago|reply
As a counterargument: it's precisely because of high-dimensional statistical learning that interpretability is a valuable trait. Yes, the power of modern ML is that it can handle situations that the designers did not explicitly design for--but this doesn't necessarily mean that it handles them well. For example, if your approval for a loan is subject to an AI and it rejected you, then you want to know why you were not approved. You'd want the reason your application was not granted to be something reasonable (like a poor credit history) and not something like "the particular combination of inputs triggered some weird path and rejected you offhand." Another example is machine vision for self-driving cars. You want the car to understand what a stop sign is and not just react to the color, otherwise the first pedestrian with a red jacket will bring the car to a screeching halt. Even though you may not have had red jackets in your training set (or may not have had enough so that misclassifications ended up contributing to your error percentage), you can verify the model works as intended using interpretability.

It's dangerous to treat this sort of model as a black box, as the details of how the model makes a decision are as important as the output; otherwise, how could it be trusted?


This topic is the subject of my thesis, so I am currently steeped in it. Let me know if I can answer any more questions!

[+] dontreact|5 years ago|reply
It is common in my experience to catch algorithms "cheating" by using interpretability methods. Interpretability methods are useful tools for debugging models that appear to be performing well but may in fact be using an irrelevant bias in your dataset that will not generalize.
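One simple debugging technique of this kind is occlusion sensitivity: slide a blank patch over the input and watch how the score moves. A minimal numpy sketch (the `predict` function and the corner-watching toy model below are stand-ins, not any particular framework's API):

```python
import numpy as np

def occlusion_map(predict, image, patch=4, baseline=0.0):
    """Slide a blank patch over the image and record how much the
    model's score drops at each position. Large drops mark regions
    the model actually relies on, which helps spot models keying
    on irrelevant dataset artifacts ("cheating")."""
    h, w = image.shape
    base_score = predict(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = base_score - predict(occluded)
    return heat

# Toy "model" that only looks at the top-left corner, a stand-in for
# a classifier latching onto an artifact such as a scanner watermark.
def corner_model(img):
    return img[:4, :4].mean()

img = np.ones((8, 8))
heat = occlusion_map(corner_model, img, patch=4)
# Only the top-left patch produces a score drop, exposing the shortcut.
```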
[+] baryphonic|5 years ago|reply
> This is a good exposition of some formal definitions for 'interpretability' in the context of machine learning, but I am still not really clear on why such a property is necessary or even desirable in the context of high dimensional statistical learning algorithms.

Because the models can fail, and we want to know how to prevent them from failing further. In a pure black-box model, we know about the test and validation runs, and not much else. So when Google deploys a model that classifies accounts as toxic or not and then cancels the toxic ones (regardless of how many domains you manage, how many YouTube followers you have, or even whether you have a YouTube TV subscription), you'd prefer to know why the model chose to give you the axe. You might even prefer a "human in the loop" when the system makes a call but doesn't really have confidence.

For certain areas like NLP, sure, it'd be tough. But for CV tasks or many other ML tasks, some form of explanation would be invaluable and much more (human) user-friendly.

[+] didibus|5 years ago|reply
> I am still not really clear on why such a property is necessary or even desirable in the context of high dimensional statistical learning algorithms

Law, regulations, and human trust.

[+] salty_biscuits|5 years ago|reply
In my experience people will come at this from a variety of viewpoints. Typically they (1) don't trust the model to learn something useful, so they want some confidence that the model isn't going to do exotic things with new inputs (i.e. they want some faith in the generalization ability of the model to unseen inputs), or (2) they want the model to help them understand the problem. Your language example is perfect. How nice would it be for linguistics if a complex model could tell you about simple structures that you didn't previously know about. It is nearly an article of faith with some people that these simple structures must be there generating the statistics we see; it's as if they haven't considered the possibility that there might not be a simple structure underneath.
[+] bagrow|5 years ago|reply
Many industries such as insurance have legal requirements that prevent the use of many black box methods.

Scientists using ML for research often wish to understand their subjects, and interpretable ML would probably be more likely than non-interpretable ML to help improve understanding.

[+] hprotagonist|5 years ago|reply
“hey this dnn said this image has a faulty sensor in it. why? is it because it’s correctly spotting the fault, or is it that random cluster of 12 irrelevant pixels over there?”
[+] taeric|5 years ago|reply
For the nlp example, I think the goal would be more of a reflective model. That is, not so much one that we can interpret by inspection of the state, but at least one that can expand on its state in the form of "why?"

This has actually been my biggest complaint about the smart speaker craze. Often I just want to ask: what did you think I said? Or, why did you activate? To some extent, the partner app allowed this. It's very limited, though.

[+] spekcular|5 years ago|reply
One reason to want to understand what some "black box" is doing is distribution/dataset shift, especially in medical applications.

For example, suppose you're building a neural net to detect early-stage lung cancer on medical imaging, and you test/train it on patients in a small set of hospitals. Often the hospital name is given on the image, and this can be used as a covariate to improve accuracy (due to the hospitals serving different populations with different demographics). But a model that does this may suffer when put into production at other hospitals.
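One cheap way to surface this kind of shortcut is a permutation check: shuffle the suspect input column and measure the accuracy drop. A hedged numpy sketch (the `shortcut_model` and the "hospital ID" column are illustrative stand-ins, not real data):

```python
import numpy as np

def permutation_drop(predict, X, y, col, n_rounds=10, seed=0):
    """Shuffle one input column and measure the mean drop in accuracy.
    A large drop on a metadata column (e.g. an encoded hospital ID)
    is a red flag that the model exploits a covariate that will
    shift when the model is deployed elsewhere."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X) == y)
    drops = []
    for _ in range(n_rounds):
        Xp = X.copy()
        rng.shuffle(Xp[:, col])  # shuffles the column view in place
        drops.append(base - np.mean(predict(Xp) == y))
    return base, float(np.mean(drops))

# Toy classifier that ignores the imaging features entirely and
# predicts from column 0 (the stand-in "hospital ID").
def shortcut_model(X):
    return (X[:, 0] > 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.random((200, 5))
y = (X[:, 0] > 0.5).astype(int)  # labels correlate with the shortcut
base, drop = permutation_drop(shortcut_model, X, y, col=0)
# base accuracy is perfect; shuffling column 0 destroys it,
# exposing the model's reliance on the spurious feature.
```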

Some real-life examples with this flavor are given at the end of these slides: https://mlhcmit.github.io/slides/lecture10.pdf.

See also Section 6.3 of this paper for how interpretability can help choose models with superior generalization: https://arxiv.org/abs/1602.04938.

[+] touisteur|5 years ago|reply
But. Don't you wanna learn? I love implementing interpretability papers on my CNNs. It's very revealing. I also like just toying with Monte Carlo methods or nearest neighbors to see the origin of a decision, or 'how close' it was to deciding otherwise in each dimension. If only to detect some unknowns in your training set.
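The nearest-neighbor idea above can be sketched in a few lines: explain a prediction by retrieving the training points closest to the input (assuming here a plain Euclidean feature space; real pipelines would use an embedding):

```python
import numpy as np

def nearest_training_examples(x, X_train, k=3):
    """Explain a decision by retrieval: return the indices and
    distances of the k training points closest to x. Seeing *which*
    examples drive a prediction often exposes surprises, such as
    duplicates, mislabeled points, or unknown clusters in the
    training set."""
    d = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(d)[:k]
    return order, d[order]

# Tiny illustrative training set in a 2-D feature space.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0]])
idx, dist = nearest_training_examples(np.array([0.0, 0.1]), X_train, k=2)
# The two closest training examples are the ones near the origin.
```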
[+] sfifs|5 years ago|reply
Apart from the usual regulatory angle (zip code in the US often proxies for race), generally the need for interpretability goes with the risk and frequency of the decisions being made.

The risk and decision frequency profile of "tag photos with names in social media" is different vs. "decide which contract worth tens of millions of dollars is better to strike".

Both require some form of ML/Statistical inference but the higher the risk, the more the explainability required by the decision maker.

One strategy you can adopt is to break up a big decision into lots of smaller decisions (e.g. buy ads individually on auction vs. bulk publisher deals), but that kind of approach often comes with its own costs (infra to handle scale, transaction costs, lost negotiation leverage). In any real business investment scenario, you usually end up with many decisions that are somewhat higher risk, and they require explainability.

[+] deeeeplearning|5 years ago|reply
>in the context of machine learning, but I am still not really clear on why such a property is necessary or even desirable in the context of high dimensional statistical learning algorithms.

Depending on the context where the algorithm is used, interpretability may in fact be necessary. (Try getting deep learning models deployed in a consumer banking company.) Generally, outside of hardcore tech companies (FAANG, etc.), you are usually building things in conjunction with business partners within your org, and good luck explaining to them that you want to deploy a completely opaque algorithm that will somehow solve their problems.

[+] mgpc|5 years ago|reply
Completely agree.

Some of the tools for interpretability can be useful (particularly for debugging), but I think the broader idea that we always need to be able to understand our own models is basically wrong.

For example, if you want AlphaGo to explain why it made a particular Go move, what kind of an explanation is possible? In many cases the only explanation may be that the move leads to a higher probability of a win. There simply may not be a more “compressed” or “high level” answer. Even human Go players often cannot explain why they choose particular moves, other than references to shape and feel, which is basically another way of saying their evaluation of the move leads to a higher win probability. There are a lot of domains where we may just have to accept that that _is_ the explanation.

To zoom out a bit, our greatest discoveries have historically been about finding the rare places where the universe is computationally compressible. Boiling a kettle is almost unimaginably complex to describe in terms of the interaction of elementary particles. But you can make very good predictions about that process using an equation that fits on a cocktail napkin. There may be other areas in which the universe is compressible only to a lesser extent. The parameters of AlphaGo are vanishingly small compared to the size of the Go game tree, but are very large compared to the equation we can use to predict the kettle. There may be many problems where the best descriptions lie in this intermediate domain, a domain which we have never really had access to before (except via biological brains).

So if learned models give us access to some truths without access to their (human intelligible) explanations, I think we need to just embrace that. If you allow yourself a new way of seeing, you can see new things.

[+] raspasov|5 years ago|reply
It might be desirable when the network/algorithm doesn't work as expected and throwing more data at the problem is not possible (say, you've exhausted all available data).
[+] lsorber|5 years ago|reply
Could you give an example of an unsolved riddle from linguistics?
[+] derbOac|5 years ago|reply
I agree that this is a nice piece, but I still think it's kind of fuzzy, and maybe not formal enough, and this might be related to your question.

To back up a bit, let's say you have some device (algorithm, black box, meta-DL model) that translates ML models into human-comprehensible language meeting some interpretability criterion.

Let's say that some ML model fails according to this device, that the device says "this is not translatable."

There are different possible reasons for this, but one might be that the ML model is itself at some level unlearnable, in the sense of being incompressible or unmodelable in itself, even by another black-box machine. The model might meet some cross-validation criteria, etc., but if it's unexplainable in a meta-modeling sense, by anything, that may imply the ML model is specious: that it meets some superficial criteria but isn't really doing what it's supposed to.

This "meta-modeling" is part of the process by which new ML models are developed by the way, at some level. It's often implicit, but we assume we understand something -- that is, there's some level of interpretability -- by virtue of the fact we can say "such and such type of DL structure is better for this type of domain" etc.

Of course, if the translating device says an ML model is untranslatable, it could just be that humans can't understand it, but the danger is that we don't really know at this point how to distinguish that from the case where the ML model is specious. It's a sort of meta-verification problem.

I also think that humans have some deeper understanding that isn't formalized yet into decision criteria regarding what constitutes a successful ML model. That is, we have some intuitive understanding of causality, and the idea of some things being causally "closer" to what we are interested in, in a vague sense. So when, e.g., photos of Obama are being classified based on silhouette position, we understand that there will be misclassification under a different set of stimulus conditions that are more comprehensive than what the model was trained and tested on. In that case, interpretability is tied, again, to speciousness and a failure of the model development process.

Incidentally, these arguments about interpretability parallel very closely debates in the psychological measurement literature in the 60s and 70s, about measures being selected based on their empirical performance ("empirically keyed" measures) versus other characteristics (e.g., internal structural considerations or theoretical interpretability criteria). There, there were similar arguments, in that some would say "it doesn't matter if the measure makes sense, the predictive performance matters". There were subsequent meta-analytic evaluations of how different approaches fared, and it turned out not to matter empirically in the long run. The reasons for this are difficult to explain in a small space, but one way to think about it is that when the empirically keyed measures were considered in a broader context than what they were developed on, they started to have limitations that were not initially considered (e.g., what happens when you have multiple empirically keyed measures simultaneously?). I think eventually the theoretically-based and internal-structural based approaches gained ground because they were easier to develop -- that is, you could improve them more and imbue them more easily with the types of characteristics you wanted them to have.

I think there's a lot to be learned from that debate in psychology. E.g., if you can't interpret an ML model, how do you achieve a set of goals in model development? What happens when constraints start to be introduced? You could approach this blindly but it seems you need some priors at least, which I think are kinda what human interpretability provides.

Maybe some day AI will be so well-developed that humans won't matter at all (e.g., it suffices to have some ML-to-AI translation device rather than an ML-to-human translation device), but I think we're far from that point.

[+] nwsm|5 years ago|reply
A good eBook on the subject that the author continually updates: Interpretable Machine Learning, A Guide for Making Black Box Models Explainable [0].

This book helped me implement Accumulated Local Effects in Python which we used to explain a timeseries model.

[0] https://christophm.github.io/interpretable-ml-book/
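For a sense of what that involves, here is a compressed first-order ALE sketch in numpy. It is not the book's reference implementation: proper ALE centers the curve by bin counts, while this version uses plain centering, and the linear test model is purely illustrative.

```python
import numpy as np

def ale_1d(predict, X, col, n_bins=10):
    """First-order Accumulated Local Effects for one feature.
    Within each quantile bin, average the change in prediction
    obtained by moving the feature from the bin's lower edge to
    its upper edge, then accumulate the per-bin effects and
    center the curve (simplified: unweighted centering)."""
    z = np.quantile(X[:, col], np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(z, X[:, col], side="right") - 1,
                  0, n_bins - 1)
    lo, hi = X.copy(), X.copy()
    lo[:, col] = z[idx]       # feature pinned to lower bin edge
    hi[:, col] = z[idx + 1]   # feature pinned to upper bin edge
    diffs = predict(hi) - predict(lo)
    local = np.array([diffs[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(n_bins)])
    ale = np.concatenate([[0.0], np.cumsum(local)])
    return z, ale - ale.mean()

# On a model that is exactly linear in the inspected feature,
# the ALE curve recovers the slope (2.0 here).
rng = np.random.default_rng(0)
X = rng.random((500, 3))
predict = lambda M: 2.0 * M[:, 0] + M[:, 1]
z, ale = ale_1d(predict, X, col=0, n_bins=5)
```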

[+] zedderled|5 years ago|reply
This looks like a very good resource. Thanks for sharing.
[+] acganesh|5 years ago|reply
Hi HN, I'm an editor at The Gradient. Coincidentally we were doing some server maintenance this evening so there may have been some downtime half an hour ago.

Apologies if anyone couldn't access the piece. Everything should be back up now.

Thanks everyone for contributing to the discussion.

[+] DrNuke|5 years ago|reply
I think one relevant difference is between discrete/constrained/physical targets (a new material, a medical image classifier, etc.) and continuous/unconstrained/incremental targets (NLP at large, advertising, etc.)? The former would need an explanation to satisfy peer review; the latter are more than happy with a black box that just works and beats the competition?