rcheu | 6 years ago
The train/test data being imbalanced in the same way does give the model an advantage, but I don't think that making the test set 50% positive would completely solve the issue either. Doctors have been "trained" on the true distribution, which is not 50% (I'd guess the true distribution is actually extremely unbalanced).
The model isn't simply learning to predict "no" 80% of the time; it is learning the distribution of the data with respect to the input features. For example, say we have a simple model with only 3 binary features. It may learn that when features X_0, X_1, and X_2 are all 1, the probability of cancer is 70%. This isn't a simple multiplication of the true probability by the upsampling factor, though--it depends on both the fraction of negative samples and the fraction of positive samples that have this feature vector.
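A quick sketch of that point with made-up numbers (the feature likelihoods and prevalence rates below are hypothetical, just to show the shape of the effect): upsampling the positive class from 2% to 20% is a 10x change in the prior, but the posterior probability for a given feature vector moves by much less than 10x.

```python
def posterior(p_x_given_pos, p_x_given_neg, prior_pos):
    """P(cancer | x) via Bayes' rule for a given class prior."""
    num = p_x_given_pos * prior_pos
    den = num + p_x_given_neg * (1.0 - prior_pos)
    return num / den

# Hypothetical: 30% of positive samples and 5% of negative samples
# have this particular feature vector.
p_x_pos, p_x_neg = 0.30, 0.05

true_prior = 0.02    # assumed real-world cancer rate
train_prior = 0.20   # upsampled rate in the training set

print(round(posterior(p_x_pos, p_x_neg, true_prior), 3))   # 0.109
print(round(posterior(p_x_pos, p_x_neg, train_prior), 3))  # 0.6
```

The posterior goes from ~11% to 60%--a factor of ~5.5, not 10--because the shift also depends on how common the feature vector is among negatives.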
If we change the test set to be 50% positive while keeping the same training distribution, the model no longer has the correct information about cancer rates with respect to the feature distributions--but neither does the dermatologist. The reported specificity and sensitivity still can't be interpreted as the specificity and sensitivity we'd see in the real world.
There would be no issue with reporting specificity/sensitivity if they had used the true distribution of cases. Yes, the curves/AUCs will look better than the precision/recall numbers, but they don't misrepresent what doctors are interested in: what percent of sick people will be missed, and what percent of healthy people will be subjected to unnecessary procedures.
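To make the curve-vs-precision point concrete (again with made-up numbers): sensitivity and specificity are per-class rates, so they don't depend on prevalence, while precision collapses at realistic prevalence for the same classifier.

```python
# Assumed fixed classifier performance (hypothetical values).
SENS, SPEC = 0.90, 0.95

def precision(prevalence, sens=SENS, spec=SPEC):
    """Precision = TP / (TP + FP), expressed per unit of population."""
    tp = prevalence * sens
    fp = (1.0 - prevalence) * (1.0 - spec)
    return tp / (tp + fp)

print(round(precision(0.50), 3))  # 0.947 on a balanced test set
print(round(precision(0.02), 3))  # 0.269 at a rare-disease prevalence
```

Same sensitivity/specificity in both cases, but precision drops from ~95% to ~27%--which is why PR curves look worse than ROC curves on rare conditions, without either being wrong about the two rates doctors care about.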
Anyway, the classifier doesn't actually seem to be that good: if you check the paper, some of the doctors outperformed the classifiers.