(no title)
arugulum | 6 months ago
>a (fine-tuned) base Transformer model just trivially blowing everything else out of the water
"Attention is All You Need" was a Transformer model trained specifically for translation, blowing all other translation models out of the water. It was not fine-tuned for tasks other than what the model was trained from scratch for.
GPT-1/BERT were significant because they showed that you can pretrain one base model and use it for "everything".
No comments yet.