Yes, they are still used

- Encoder-based models have much faster inference (are auto-regressive) and are smaller. They are great for applications where speed and efficiency are key.
- Most embedding models are BERT-based (see the MTEB leaderboard), so they are widely used for retrieval.
- They are also used to filter data for pre-training decoder models. The Llama 3 authors used a quality classifier (DistilRoberta) to generate quality scores for documents. Something similar is done for FineWeb Edu.
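To make the retrieval point concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. The vectors are made-up toy values standing in for the output of an encoder model, and `cosine` and `rank` are hypothetical helper names, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 2-d "embeddings"; a real encoder would produce hundreds of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(rank([1.0, 0.1], docs))
```

In practice the document vectors are computed once and stored, so answering a query costs one encoder forward pass plus a similarity search.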

itchyjunk|1 year ago

Wait, I thought GPTs were autoregressive and encoder-only models like BERT used masked tokens? You're saying BERT is auto-regressive, or am I misunderstanding?

woadwarrior01|1 year ago

You're right. Encoder-only models like BERT aren't auto-regressive and are trained with the masked language modeling (MLM) objective. Decoder-only (GPT) and encoder-decoder (T5) models are auto-regressive and are trained with the causal language modeling (CLM) objective and sometimes the PrefixLM objective.
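A rough sketch of how the two objectives differ, assuming a simplified MLM that only ever substitutes `[MASK]` (BERT's actual recipe also leaves some selected tokens unchanged or swaps in random ones). All function names here are illustrative, not taken from any library.

```python
import random

MASK = "[MASK]"

def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM: hide random tokens; the model predicts only the hidden ones.

    labels[i] is None at positions that are not predicted.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

def clm_example(tokens):
    """Causal LM: at every position, predict the next token."""
    return tokens[:-1], tokens[1:]

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Encoder-style attention: every position sees every other position."""
    return [[True] * n for _ in range(n)]

sentence = "the cat sat on the mat".split()
print(mlm_example(sentence, mask_prob=0.3))
print(clm_example(sentence))
print(causal_mask(3))
```

The attention mask is the structural difference: with the causal mask, earlier positions never depend on later tokens, which is what makes left-to-right generation possible.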

ipsum2|1 year ago

You can mask out the tokens at the end, so it's technically autoregressive.
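One way to read this: append a `[MASK]` at the end, let the model fill it, and repeat. Below is a sketch of that loop. `fill_mask` is a hypothetical callable standing in for a masked-LM forward pass, and `toy_fill_mask` is a lookup-table stand-in so the example runs without a model.

```python
MASK = "[MASK]"

def generate_with_masked_lm(fill_mask, prompt, steps):
    """Generate left to right by repeatedly filling a trailing [MASK]."""
    tokens = list(prompt)
    for _ in range(steps):
        # Append one mask and ask the model to predict only that position.
        predicted = fill_mask(tokens + [MASK], len(tokens))
        tokens.append(predicted)
    return tokens

def toy_fill_mask(tokens, position):
    """Stand-in predictor: looks up the token preceding the mask in a table."""
    table = {"the": "cat", "cat": "sat", "sat": "down"}
    return table.get(tokens[position - 1], "<unk>")

print(generate_with_masked_lm(toy_fill_mask, ["the"], 3))
# → ['the', 'cat', 'sat', 'down']
```

Because encoder attention is bidirectional, each step has to re-encode the whole sequence, and a model trained with MLM was not optimized for this usage, so this is more a technicality than a practical decoding method.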