benxh|2 years ago
Not all of them per se; take a look at something like Mistral, a 7B model displaying incredible performance. IMO, we still haven't even scratched the surface of what is possible with small LLMs. Especially not with pre-filtered/classified pre-training data. (Interesting LLMs, based on their data approach and relatively small size: Qwen, InternLM, Mistral, Phi.)
gwern|2 years ago
I would, but they don't say anywhere I can find what their dataset is, and the only thing they say about their instruction-tuned model is that it's trained on 'publicly available' datasets. You know, the ones where a lot of them turn out under the hood to be drawing from the OA API or other pretrained models in some way or another...
> Especially not with pre-filtered/classified pre-training data.
Indeed not! But what exactly is prefiltering or classifying all that data...?
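For concreteness, here's a rough sketch of what classifier-based pre-filtering of pre-training data can look like. The seed labels, threshold, and tiny TF-IDF + logistic-regression classifier are purely illustrative stand-ins; real pipelines (e.g. the "textbook quality" filtering described for Phi) use far larger labeled sets and much stronger classifiers, often bootstrapped with LLM-graded labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical seed set: 1 = "keep" (educational/high quality), 0 = "drop".
seed_docs = [
    ("The derivative of a function measures its instantaneous rate of change.", 1),
    ("Binary search halves the interval each step, giving O(log n) lookups.", 1),
    ("CLICK HERE to win a FREE iPhone!!! limited offer buy now", 0),
    ("lol idk whatever spam spam spam follow for follow", 0),
]
texts, labels = zip(*seed_docs)

# Train a tiny quality classifier on the seed labels.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

def filter_corpus(docs, threshold=0.5):
    """Keep only documents the classifier scores above the quality threshold."""
    probs = clf.predict_proba(docs)[:, 1]
    return [doc for doc, p in zip(docs, probs) if p >= threshold]

crawl = [
    "Gradient descent updates parameters in the direction of steepest descent.",
    "BUY NOW!!! free free free click click",
]
print(filter_corpus(crawl))
```

The open question in the thread still stands: at web scale the hard part isn't the filtering loop, it's where the labels and the classifier come from in the first place.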