I can confirm that DistilBERT has worked well when I've used it for classification, especially on shortish sequences. I'm really interested in trying out ModernBERT, or a smaller variant, because of the larger context window (8192 tokens).
I was thinking of trying ModernBERT for one of my projects, but I can only draw a conclusion after seeing how it performs on my use case. Do you think ModernBERT will be capable of expanding abbreviated sentences?
siddheshgunjal|6 months ago