patelajay285 | 1 year ago
We found that Google's T5 models, released in 2019 (pre-GPT-3), were "secretly" capable of in-context learning with a simple inference technique.
Given that they use a bidirectional MLM (Masked Language Modeling) objective, it wasn't obvious how to do this, but MLM objectives are known to produce better language representations than causal (next-token prediction) objectives. With far smaller T5 models, we were able to outperform much larger GPT-3 models, or come very close to their performance.
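As a rough illustration of what infilling-style inference with T5 can look like (a minimal sketch using Hugging Face transformers and T5's sentinel tokens; the checkpoint, prompt format, and decoding settings here are placeholders for the general idea, not necessarily the exact technique):

```python
# Minimal sketch: few-shot prompting of T5 via its MLM sentinel tokens.
# The model is asked to "fill in" the answer slot marked by <extra_id_0>.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# Few-shot prompt ending in T5's mask sentinel.
prompt = (
    "Review: the film was a delight. Sentiment: positive\n"
    "Review: a tedious, joyless slog. Sentiment: negative\n"
    "Review: sharp writing and great acting. Sentiment: <extra_id_0>"
)

inputs = tokenizer(prompt, return_tensors="pt")

# Draw several samples, since a single greedy decode from an MLM-style
# model can be unreliable; aggregate over them (e.g., majority vote).
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,
    max_new_tokens=5,
    num_return_sequences=8,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```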
patelajay285 | 1 year ago
While the technique requires multiple samples to coax generations from this particular model, other LLM training schemes have since incorporated both unidirectional and bidirectional objectives. This line of work hasn't been fully resolved, though, as standard practice is still to train most models on the causal objective alone. There's still a lot of exploration that can be done on pre-training objectives.
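As a toy sketch of what mixing the two objectives can look like at the data level (in the spirit of mixture-of-denoisers schemes like UL2; the span format, mixing ratio, and helper names here are all illustrative assumptions):

```python
# Toy illustration: build either a causal (prefix -> continuation) example
# or an MLM-style span-corruption example from the same token sequence.
# Sentinel format follows T5's <extra_id_N> convention.
import random

def causal_example(tokens, split=0.5):
    """Next-token style: predict the suffix given the prefix."""
    cut = int(len(tokens) * split)
    return tokens[:cut], tokens[cut:]

def span_corruption_example(tokens, span_len=3):
    """MLM style: replace a contiguous span with a sentinel, predict the span."""
    start = random.randrange(0, max(1, len(tokens) - span_len))
    inp = tokens[:start] + ["<extra_id_0>"] + tokens[start + span_len:]
    tgt = ["<extra_id_0>"] + tokens[start:start + span_len]
    return inp, tgt

def mixed_objective_example(tokens, p_causal=0.5):
    """Sample one of the two objectives per training example."""
    if random.random() < p_causal:
        return causal_example(tokens)
    return span_corruption_example(tokens)

tokens = "the quick brown fox jumps over the lazy dog".split()
print(mixed_objective_example(tokens))
```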