Active Learning is a very tricky area to get right ... over the years I have had mixed luck with text classification, to the point that my colleague and I decided to perform a thorough empirical study [1] that normalized the various experimental settings individual papers had reported. We observed that, post normalization, randomly picking instances to label is better!

[1] https://aclanthology.org/2024.emnlp-main.1240/
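For concreteness, here is a minimal sketch of what the random baseline looks like in an active learning loop: instead of an uncertainty- or diversity-based acquisition function, each round just samples the next batch to label uniformly from the unlabeled pool. The pool setup and round counts are hypothetical, not from the study.

```python
import random

# Hypothetical random-acquisition baseline: pick the next batch to label
# uniformly at random from the unlabeled pool.
def random_acquisition(unlabeled_pool, batch_size):
    return random.sample(unlabeled_pool, batch_size)

pool = list(range(100))   # indices of unlabeled instances (made-up pool)
labeled = []
for _ in range(5):        # five labeling rounds of 10 instances each
    batch = random_acquisition(pool, 10)
    labeled.extend(batch)
    pool = [i for i in pool if i not in batch]
    # in a real loop you would retrain the classifier on `labeled` here

print(len(labeled), len(pool))  # prints: 50 50
```

The point of the study's finding is that this trivially simple strategy, once experimental settings are normalized, matches or beats the more elaborate acquisition functions.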
marcyb5st|6 months ago
Specifically, post training you measure those on a holdout set and then slice the results by feature. While these models tend to be more complex and potentially less interpretable, we feel the pros outweigh the cons.
Additionally, exposing a confidence score to your end users is really useful for getting them to trust the predictions, and if there is a nonzero cost to acting on false positives/negatives, you can try to come up with a strategy that minimizes the expected cost.
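One common way to act on that last point is to tune the decision threshold against the expected cost on the holdout set. A minimal sketch, assuming per-error costs (`cost_fp`, `cost_fn` here are made-up values, not from the thread):

```python
import numpy as np

# Hypothetical sketch: choose the decision threshold that minimizes
# expected cost, given asymmetric costs for false positives/negatives.
def best_threshold(probs, labels, cost_fp=1.0, cost_fn=5.0):
    thresholds = np.linspace(0.0, 1.0, 101)
    costs = []
    for t in thresholds:
        preds = probs >= t
        fp = np.sum(preds & (labels == 0))   # predicted 1, actually 0
        fn = np.sum(~preds & (labels == 1))  # predicted 0, actually 1
        costs.append(cost_fp * fp + cost_fn * fn)
    return thresholds[int(np.argmin(costs))]

# Toy holdout scores and labels; with false negatives 5x as costly,
# the chosen threshold shifts low to catch more positives.
probs = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2])
labels = np.array([0, 0, 1, 1, 1, 0])
print(best_threshold(probs, labels))
```

The same search generalizes to any cost structure; the key design choice is that the threshold is fit on the holdout slice, not on the training data.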