It's missing a lot of crucial details. Nothing on the dataset used, nothing on the data mix, nothing on their data cleaning procedures, nothing on how many tokens it was trained on.
BERT was on arXiv before being peer reviewed. As were T5, BART, LLaMA, OPT and GPT-NeoX-20B. The Pile and FLAN were also on arXiv before being peer reviewed. Of course, the original Transformer paper was also on arXiv before being peer reviewed.
dazed_confused|2 years ago
arugulum|2 years ago
Being on arXiv before being peer reviewed is not the problem, or even a problem.
jmac01|2 years ago