top | item 16833654

The issue is correlation between predictors. In the extreme case where a predictor is duplicated, the correct coefficient pair might be {a*beta, (1-a)*beta} for a in [0, 1], which an algorithm may not estimate in a stable manner. Any significant degree of correlation introduces the same general problem.

beagle3|8 years ago

... or worse; it is still true for any a. You could easily get {1,000,001, -1,000,000}, which is equivalent for perfectly clean, precise, representable data, but which magnifies any noise or error in one of the predictors by a million, or a billion.
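A rough sketch of that blow-up, assuming a single duplicated predictor with true coefficient 1. The names (`predict`, `balanced`, `extreme`) and the noise size are made up for illustration; nothing here comes from TFA.

```python
def predict(coefs, x1, x2):
    """Linear prediction from two predictors."""
    return coefs[0] * x1 + coefs[1] * x2


x = 3.0                                 # the duplicated predictor; true y = 1 * x
balanced = (0.5, 0.5)                   # the a = 0.5 split
extreme = (1_000_001.0, -1_000_000.0)   # the a = 1,000,001 split

# On clean data both coefficient pairs give the same prediction.
clean_balanced = predict(balanced, x, x)
clean_extreme = predict(extreme, x, x)

# Hypothetical measurement error on the second copy only.
noise = 1e-3
err_balanced = abs(predict(balanced, x, x + noise) - x)
err_extreme = abs(predict(extreme, x, x + noise) - x)

print("clean predictions:", clean_balanced, clean_extreme)
print("error with noise: ", err_balanced, err_extreme)
```

Both pairs fit the clean data identically, so the fit alone cannot tell them apart. Once one copy carries a small error, the balanced pair passes on half of it while the extreme pair multiplies it by a million.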

MichailP|8 years ago

So say you have 3 predictors that are highly correlated with each other. Can you still pick one of them and discard the remaining 2? Or can't you pick any one of them?

beagle3|8 years ago

Ridge regression (mentioned in TFA) would prefer a (1/3, 1/3, 1/3) average of those predictors (or a better combination, depending on their respective noise levels).

Lasso (also mentioned in TFA) would prefer to pick the best of the three and drop the others.

Elastic net would give a combination of both behaviours.

Note, though, that any method other than plain least squares has tuning parameters; depending on those, you could still end up with a result equivalent to plain least squares.
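To make the ridge/lasso contrast concrete, here is a minimal pure-Python sketch. It assumes the extreme case of three identical predictors and uses a hand-rolled coordinate descent rather than any particular library. The function names, the penalty scaling (0.5 * squared error + l1 * sum|b| + 0.5 * l2 * sum b^2) and the penalty values are illustrative choices of mine.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def coordinate_descent(columns, y, l1=0.0, l2=0.0, sweeps=2000):
    """Minimise 0.5*||y - X b||^2 + l1*sum|b_j| + 0.5*l2*sum(b_j^2).

    l1 = l2 = 0 is plain least squares, l1 = 0 is ridge, l2 = 0 is lasso,
    both positive is elastic net. Coefficients start at zero.
    """
    coefs = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with predictor j's own contribution left out.
            partial = list(y)
            for k, other in enumerate(columns):
                if k != j:
                    partial = [r - coefs[k] * v for r, v in zip(partial, other)]
            rho = sum(v * r for v, r in zip(col, partial))
            norm_sq = sum(v * v for v in col)
            coefs[j] = soft_threshold(rho, l1) / (norm_sq + l2)
    return coefs


x = [1.0, 2.0, 3.0, 4.0]
columns = [x, x, x]          # three perfectly correlated predictors
y = [2.0 * v for v in x]     # true relationship: y = 2 * x

ridge = coordinate_descent(columns, y, l2=1.0)
lasso = coordinate_descent(columns, y, l1=7.5)
plain = coordinate_descent(columns, y)   # both penalties switched off

print("ridge:", [round(b, 4) for b in ridge])
print("lasso:", [round(b, 4) for b in lasso])
print("plain:", [round(b, 4) for b in plain])
```

With this toy data, ridge spreads the weight evenly (about 0.659 on each copy), while lasso puts everything on one copy and zeroes the rest. Because the copies are exact duplicates there is no "best" one, so lasso's pick is just the first coordinate visited. With both penalties at zero the routine returns one of the infinitely many least-squares solutions, again decided only by visiting order.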

cocoablazing|8 years ago

You can, but why trash information that is present when you can leverage it with a different approach?