
nightcracker | 4 years ago

Very sloppy statistics. A lot of explanatory analysis is done on the model, but the model is hardly justified as being predictive in the first place.

The paper (https://journals.plos.org/plosone/article?id=10.1371/journal...) [does not mention the sensitivity or specificity of the model at all, only a '91% accuracy' number]* on a biased dataset (where suicidal cases are oversampled and non-suicidal cases undersampled), without even mentioning exactly how much they over/undersampled.

* I missed the ROC curve on page 7. However, it's not clear whether this ROC curve was computed on the under/oversampled dataset or on the original.


CrazyStat | 4 years ago

> The paper [url snipped] does not mention the sensitivity or specificity of the model at all

There's a full ROC curve in Figure 2. Just eyeballing it, it looks like they get both sensitivity and specificity in excess of .9 in the top left corner (I didn't try to measure it precisely).

It would certainly be helpful to have more information about the over/undersampling.

nightcracker | 4 years ago

I did miss that, oops. That said, it's again unclear whether that curve was computed before or after the biased sampling. If it's on the original data it's quite good, but on a heavily biased sample it's fairly meaningless.

civilized | 4 years ago

"You have to oversample your minority class so your data is balanced" is an urban legend that needs to die. It is never necessary, unless you are using extremely outdated model fitting methods, and even then, it would only be needed in training. It is completely unnecessary in evaluation, and metrics on biased data are going to be biased (obviously!).

If I hear this in an interview, I'm going to assume you do data science by blindly copying random blog posts.

aabaker99 | 4 years ago

While I agree with your overall sentiment, it is important to understand the relationship between class (im)balance and ROC curve performance. A very short article which does a great job explaining this is [0]. There are, of course, other performance metrics that are appropriate in the presence of class imbalance, such as precision-recall curves, so I wouldn't go so far as to say "metrics on biased data are going to be biased". Some metrics can correct for the class imbalance bias. Others can't.

[0] https://www.researchgate.net/profile/Jake-Lever/publication/...
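To make the precision point concrete, here's a toy calculation in plain Python (the 0.9 sensitivity/specificity and 1% base rate are made-up numbers for illustration). Sensitivity and specificity are per-class rates, so resampling the classes doesn't change them, but precision depends directly on the prevalence in the evaluation set:

```python
def precision(sensitivity, specificity, prevalence):
    """P(truly positive | predicted positive), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical classifier with 90% sensitivity and 90% specificity.
# On a 50/50 rebalanced test set, precision looks great:
print(precision(0.9, 0.9, 0.5))   # 0.9
# At a (hypothetical) realistic 1% base rate, the same classifier
# produces far more false positives than true positives:
print(precision(0.9, 0.9, 0.01))  # ~0.083
```

So a precision or accuracy number quoted on an artificially balanced test set can look excellent even when, at the real-world base rate, most flagged cases would be false alarms.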

cweill | 4 years ago

Agreed. ROC AUC is fairly robust to over- or undersampling a class when determining whether your classifier is predictive.
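A quick way to convince yourself, with made-up scores in plain Python: ROC AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative, so uniformly duplicating one class leaves it unchanged, while a threshold metric like accuracy shifts with the class mix.

```python
def auc(pos_scores, neg_scores):
    # Mann-Whitney formulation of ROC AUC: fraction of (pos, neg)
    # pairs where the positive outscores the negative, ties count half.
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def accuracy(pos_scores, neg_scores, threshold):
    correct = (sum(p >= threshold for p in pos_scores)
               + sum(n < threshold for n in neg_scores))
    return correct / (len(pos_scores) + len(neg_scores))

# Made-up classifier scores for a small imbalanced dataset.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]

pos_over = pos * 3  # oversample the minority class by exact duplication

print(auc(pos, neg), auc(pos_over, neg))                      # identical
print(accuracy(pos, neg, 0.5), accuracy(pos_over, neg, 0.5))  # differ
```

Caveat: this holds exactly for uniform duplication; random oversampling with replacement perturbs the AUC only by sampling noise.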

Animats | 4 years ago

Even if this works, the only question it answers is which questions on the survey correlate with considering suicide. The title is deceptive in that regard.