top | item 37813886

benxh | 2 years ago

Not all of them per se, take a look at something like Mistral. It's a 7B model displaying incredible performance. IMO, we still haven't even scratched the surface of what is possible with small LLMs. Especially not with pre-filtered/classified pre-training data. (Interesting LLMs based on their data approach and relatively small size: Qwen, InternLM, Mistral, Phi)

gwern | 2 years ago

> Not all of them per se, take a look at something like Mistral. It's a 7B model displaying incredible performance.

I would, but they don't say anywhere I can find what their dataset is, and the only thing they say about their instruction-tuned model is that it's trained on 'publicly available' datasets. You know, the ones where a lot of them turn out under the hood to be drawing from the OA API or other pretrained models in some way or another...

> Especially not with pre-filtered/classified pre-training data.

Indeed not! But what exactly is prefiltering or classifying all that data...?

pama | 2 years ago

In the context of this article, the small/tiny models were 1–30 million parameters and the large model was 1.5 billion parameters. Efficiently training a 7-billion-parameter model already requires algorithms for training across multiple GPUs, because the memory needed for gradients and optimizer states will not fit in the typical 80 GB of RAM on current high-end GPUs.
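A rough back-of-envelope sketch of why that is: with Adam in mixed precision, the commonly cited accounting is about 16 bytes of training state per parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments), activations excluded. The function name and the 16-bytes-per-parameter figure here are illustrative assumptions, not anything from the thread:

```python
# Back-of-envelope memory estimate for training state with Adam in mixed
# precision. Assumes ~16 bytes/parameter: 2 (fp16 weights) + 2 (fp16 grads)
# + 4 (fp32 master weights) + 4 + 4 (fp32 Adam moments). Activations are
# excluded, so the real footprint is higher.

def training_memory_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Approximate training-state memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

# A 7B-parameter model: ~112 GB of training state alone,
# well over the 80 GB available on a single high-end GPU.
print(f"~{training_memory_gb(7e9):.0f} GB")
```

Under these assumptions the optimizer state alone exceeds one GPU's memory, which is why sharded approaches (splitting weights, gradients, and optimizer states across devices) are needed even before activations are counted.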

brrrrrm | 2 years ago

Mind-boggling that 7B is now considered a small model. I think it's valid, given the preeminence of 70B+ sized models. But wow, the community really just leapfrogged over single-digit-billion parameter sizes.

joaogui1 | 2 years ago

Since GPT-3, OpenAI has been filtering their pre-training data, and I believe others have done so too.