The issue is intra-predictor correlation. In the extreme case that a predictor is duplicated, the correct beta might be {betaa, beta(1-a)} for a in [0, 1], which an algorithm may not estimate in a stable manner. A significant degree of correlation introduces this general problem.
beagle3|8 years ago
MichailP|8 years ago
beagle3|8 years ago
Using lasso (also mentioned in TFA) would prefer to pick the best of the three and drop the others.
Using elastic net would be a combination of both.
Note, though, that any method other than simple regression has tuning parameters -- depending on those, you could still end with result equivalent to plain least squares.
cocoablazing|8 years ago