That's a good question. Feature selection is a large field of research, too broad for me to summarize here. I would look into "model selection", specifically into scores that weigh both model complexity (the number of variables) and goodness of fit. A good score to look at first is the Bayesian information criterion (BIC), which is used, for instance, in model selection in neuroscience. http://en.wikipedia.org/wiki/Bayesian_information_criterion

One thing you might also want to try is cross-validation (http://en.wikipedia.org/wiki/Cross-validation_%28statistics%...). Cross-validation should help you determine whether your model is overfitting: an overfit model will perform significantly better on its training set than on the held-out data.
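To make the BIC idea concrete, here is a minimal sketch in Python with NumPy (my choice of tooling, not something the comment specifies). It assumes Gaussian errors, under which BIC reduces to n·ln(RSS/n) + k·ln(n), and compares a simple model against a deliberately overfit one on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a truly linear relationship plus noise.
n = 60
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)

def bic(y, y_hat, k):
    """BIC under a Gaussian error model: n*ln(RSS/n) + k*ln(n)."""
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

# Compare a simple model (degree-1 polynomial, 2 parameters)
# with an overfit one (degree-9 polynomial, 10 parameters).
bics = {}
for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    bics[degree] = bic(y, y_hat, degree + 1)

print(bics)
```

The degree-9 fit achieves a slightly lower residual sum of squares (it always will, having more parameters), but the k·ln(n) complexity penalty more than cancels that out, so the lower (better) BIC goes to the simple model that matches the true data-generating process.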