phunge | 8 years ago

In cases where you need to interpret the resulting model, I've been advised not to bin (for example: http://biostat.mc.vanderbilt.edu/wiki/Main/CatContinuous). Other alternatives are splines or generalized additive models.

vincent_123|8 years ago

Thank you for the comment and the link. I agree with most of the points listed there. GAMs are a great tool when the relationship between the response and the independent variables is non-linear and non-monotonic. A GAM is reasonably interpretable, but it can still be difficult to explain in some business environments. For example, in credit scoring, logistic regression with binning is still widely used.

closed|8 years ago

In my experience, most of the time when people use binning, it's straightforward to demonstrate that their binning+model is equivalent to a restricted form of a more general model (e.g. a generalized additive model or a structural equation model). Sometimes binning is useful because it makes the model much easier to estimate.

However, people's rationale for binning is often that it makes the model better / more interpretable, without actually testing the more restricted binned model against the more general one. There's certainly something to be said for knowing your audience when choosing a model, though :).
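
To make the "restricted form" point concrete, here is a minimal sketch of that kind of test. Everything in it is an assumption for illustration: the data are simulated (a sine curve plus noise), the bin edges are arbitrary, and the helper names (`fit_per_bin`, `mse`, etc.) are hypothetical. The "general" model is just a separate straight line per bin, picked because it strictly contains the binned model (force every slope to 0 and you get bin means back). It is not a GAM or a spline, only the simplest nesting I could write without extra libraries.

```python
import math
import random


def bin_index(x, edges):
    """Index of the bin containing x; edges are the interior cut points."""
    return sum(x >= e for e in edges)


def fit_per_bin(xs, ys, edges, allow_slope):
    """Least-squares fit of y = a + b*x inside each bin.

    With allow_slope=False the slope b is forced to 0, which is exactly
    the binned (piecewise-constant) model: each bin predicts its mean.
    """
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(bin_index(x, edges), []).append((x, y))
    params = {}
    for k, pts in groups.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
        b = sxy / sxx if allow_slope and sxx > 0 else 0.0
        params[k] = (my - b * mx, b)  # (intercept, slope)
    return params


def predict(params, x, edges):
    a, b = params[bin_index(x, edges)]
    return a + b * x


def mse(params, xs, ys, edges):
    """Mean squared error of the fitted model on (xs, ys)."""
    return sum((predict(params, x, edges) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Simulated data: an assumption, not anything from a real scoring problem.
rng = random.Random(0)


def simulate(n):
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


edges = [0.25, 0.5, 0.75]  # arbitrary cut points
x_train, y_train = simulate(400)
x_test, y_test = simulate(400)

binned = fit_per_bin(x_train, y_train, edges, allow_slope=False)
general = fit_per_bin(x_train, y_train, edges, allow_slope=True)

print("held-out MSE, binned :", mse(binned, x_test, y_test, edges))
print("held-out MSE, general:", mse(general, x_test, y_test, edges))
```

On the training data the general model can never fit worse than the binned one, because it nests it. The held-out comparison is the informative part: if the binned model is not noticeably worse there, the restriction is costing you little and the simpler story for your audience may be worth it.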