top | item 6413784

chcleaves | 12 years ago

That's a good question. Feature selection is a large field of research and is a bit too broad for me to summarize in an abbreviated fashion. I would look into "model selection", specifically into scores of models that weigh both complexity (the number of variables) and goodness of fit. A good score to look into first is the Bayesian information criterion (BIC) which is used, for instance, in model selection in neuroscience. http://en.wikipedia.org/wiki/Bayesian_information_criterion
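As a rough sketch of how BIC trades off fit against complexity: for a least-squares fit to n data points, BIC reduces to n·ln(RSS/n) + k·ln(n), where k is the number of fitted parameters and RSS is the residual sum of squares. The numbers below are made up purely for illustration:

```python
import math

def bic(n, rss, k):
    """BIC for a least-squares fit: n*ln(RSS/n) + k*ln(n). Lower is better."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical fits of two models to the same n = 100 points:
# a 3-parameter model, and a 10-parameter model whose extra
# variables buy only a slightly lower RSS.
n = 100
bic_small = bic(n, rss=40.0, k=3)
bic_large = bic(n, rss=38.0, k=10)

# The complexity penalty k*ln(n) can outweigh the small improvement
# in fit, so the simpler model can still score better (lower).
print("3-parameter model BIC: ", bic_small)
print("10-parameter model BIC:", bic_large)
```

With these made-up numbers the 3-parameter model scores lower (better): the drop in RSS from 40 to 38 is not worth seven extra parameters at this sample size.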

One thing you might want to try is cross-validation (http://en.wikipedia.org/wiki/Cross-validation_%28statistics%...). Cross-validation should help you determine whether your model is overfitting: an overfit model will perform significantly better on its training set than on the held-out data.
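A minimal k-fold cross-validation sketch in plain Python, using a simple least-squares line fit on made-up noisy data (the data and model here are illustrative assumptions, not from the thread). The idea is to compare the error on the full training data with the average error on held-out folds:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(xs, ys, a, b):
    """Mean squared error of the line (a, b) on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def kfold_mse(xs, ys, k=5):
    """Average held-out MSE over k folds: fit on k-1 folds, score on the rest."""
    idx = list(range(len(xs)))
    random.Random(0).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse([xs[i] for i in fold], [ys[i] for i in fold], a, b))
    return sum(scores) / k

# Made-up noisy linear data for illustration.
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

a, b = fit_line(xs, ys)
print("training MSE: ", mse(xs, ys, a, b))
print("5-fold CV MSE:", kfold_mse(xs, ys))
```

If the model were badly overfit, the gap between the training MSE and the cross-validated MSE would be large; here the model family matches the data, so the two numbers should land close together (near the noise variance).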
