
minxomat | 3 years ago

The whole point of the LLaMA paper is that large models are undertrained and oversized.


riku_iki | 3 years ago

Not sure where you got that; they trained a 65B LLaMA too, which outperformed LLaMA 7B on their benchmarks.

ShamelessC | 3 years ago

Here is the paper and its abstract:

https://arxiv.org/abs/2302.13971

> We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.