
(no title)

caspianm | 9 years ago

Say I have two categories of articles, programming and non-programming, plus some other data about each article, and I want to predict whether an article will be interesting or not. I also want to be fair to interesting non-programming articles: the ratio of false negatives to true positives should be the same in the non-programming subset as in the programming subset.

Is there a technical term for that in statistics?

It's like trying to get a representative sample, but one that is representative in only one specific way (topic) and deliberately non-representative in another (interestingness).

I think this could get at one of the things people mean by being fair, and it might be interesting to see how it trades off against overall accuracy or against representativeness in other categories.
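To make the quantity concrete, here is a rough sketch of how the per-topic ratio could be measured, alongside overall accuracy. The function name, the record layout (topic, actually interesting, predicted interesting), and the toy counts are all made up for illustration; they are not from any real classifier or dataset.

```python
from collections import defaultdict


def fn_to_tp_ratio_by_group(records):
    """Return {group: false negatives / true positives}.

    `records` is an iterable of (group, actually_interesting, predicted_interesting).
    Only actually-interesting articles contribute, since both false negatives
    and true positives are drawn from that subset.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0})
    for group, actual, predicted in records:
        if actual:
            counts[group]["tp" if predicted else "fn"] += 1
    # A group with no true positives gets an infinite ratio.
    return {
        group: (c["fn"] / c["tp"] if c["tp"] else float("inf"))
        for group, c in counts.items()
    }


def overall_accuracy(records):
    """Fraction of articles whose prediction matches the actual label."""
    records = list(records)
    correct = sum(1 for _, actual, predicted in records if actual == predicted)
    return correct / len(records)


# Hypothetical toy data: (topic, actually interesting, predicted interesting).
records = (
    [("programming", True, True)] * 4
    + [("programming", True, False)] * 2
    + [("programming", False, False)] * 1
    + [("programming", False, True)] * 1
    + [("non-programming", True, True)] * 2
    + [("non-programming", True, False)] * 3
    + [("non-programming", False, False)] * 2
)

ratios = fn_to_tp_ratio_by_group(records)
accuracy = overall_accuracy(records)
print(ratios)    # {'programming': 0.5, 'non-programming': 1.5}
print(accuracy)  # 0.6
```

In this toy data the predictor misses interesting non-programming articles three times as often, relative to its hits, as it misses interesting programming ones. The constraint I'm asking about would require those two ratios to be equal. Since FN/TP = (1 - recall) / recall, that is the same as requiring equal recall in both subsets.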
