top | item 37386463


anabis | 2 years ago

It's probably more training-compute intensive, but they can do dropout, right? That's the strategy they used for ImageNet recognition, back when they relied on supervised learning and training data was scarce.


minimaxir | 2 years ago

Dropout is one regularization strategy, but it doesn't guarantee avoiding overfitting, especially since modern AI models generalize far better than they did in the ImageNet days. Many of the big LLMs do use a dropout rate of 0.1, though.
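For reference, the dropout being discussed can be sketched in a few lines. This is a minimal NumPy version of "inverted" dropout (the variant used by modern frameworks such as PyTorch): during training each unit is zeroed with probability p and the survivors are scaled by 1/(1-p), so at inference time the layer is a no-op. The function name and interface here are illustrative, not from any particular library.

```python
import numpy as np

def dropout(x, p=0.1, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training,
    scale the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x  # at inference time dropout is a no-op
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1-p
    return x * mask / (1.0 - p)

# Training mode: roughly 10% of units are zeroed, the rest scaled up.
activations = np.ones(1000)
dropped = dropout(activations, p=0.1, training=True)

# Inference mode: input passes through unchanged.
same = dropout(activations, p=0.1, training=False)
```

Because of the 1/(1-p) rescaling, the expected value of the output matches the input, which is why no extra correction is needed at inference time.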