jafitc | 2 years ago
But it was deliberately not trained on the big web-crawled datasets, so that it wouldn't learn how to build bombs etc., or otherwise be naughty.
So it is the smartest "thinking" model in its weight class, even comparable to higher-parameter models, but it is not as knowledgeable about the world and trivia.
This might change in the future, but it is the current state.
ethbr1 | 2 years ago
If someone had only read a set of dictionaries and then went and talked to actual people, you'd get about the same result: e.g. complete obliviousness to colloquialisms, etc.
notnullorvoid | 2 years ago
Having less data embedded also means the model is more generally usable outside the realm of chat assistants, in cases where you want the model to be aware only of the data you provide it. One example could be games: in a medieval fantasy setting, it would be really weird if you could get a character to start talking to you about US politics. That probably still wouldn't work with Phi-2 without fine-tuning (as I imagine it does have some US-politics data embedded), but I hope it illustrates the point.
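The NPC scenario above is usually approximated today with prompt scoping rather than a knowledge-limited model. A minimal sketch of that approach, assuming a chat-style message API (the character name, setting, and function are all illustrative, and as the comment notes, a model with embedded world knowledge can still leak out-of-setting facts despite such a prompt):

```python
# Sketch: scoping an in-game NPC via a system prompt. Everything here
# (Aldric, Ravenmoor, build_npc_messages) is a hypothetical example.

def build_npc_messages(player_line: str) -> list[dict]:
    """Build a chat transcript that tries to confine the model to the
    game setting, regardless of what the player asks."""
    system_prompt = (
        "You are Aldric, a blacksmith in the medieval fantasy town of "
        "Ravenmoor. You know only your craft, the town, and its people. "
        "If asked about anything outside this world (modern politics, "
        "technology, real events), respond with in-character confusion."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": player_line},
    ]

# The resulting list would be passed to a chat-completion endpoint.
messages = build_npc_messages("What do you think of the US election?")
```

The weakness is exactly the one the comment raises: the guardrail lives in the prompt, not in the weights, so a model that has US politics embedded can still be coaxed into discussing it. A model with that data absent from training (or removed by fine-tuning) would fail more naturally.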
gumballindie | 2 years ago
It wasn't trained on web-crawled data in order to make it less obvious that Microsoft takes property and personal data to monetise it.
visarga | 2 years ago
The question is: if we train a model on synthetic data generated by GPT-4, which itself has copyright issues, what is the status of that model? Would MS have to delete it as well, along with all other models trained on GPT-4 data?