item 38889539

Microsoft Phi-2 model changes licence to MIT

240 points | regularfry | 2 years ago | huggingface.co

90 comments

[+] RcouF1uZ4gsC|2 years ago|reply
It is really exciting to see these open models.

What is interesting is that the AI “ethicists” all want to serve as a high priesthood controlling access to ML models in the name of safety. However, I think the biggest danger from AI is that these models will be used by those who control the models to control and censor what people are allowed to write.

These open source models in the hands of the public, are, IMO the best defense against the true danger of AI.

Kudos to Facebook and Microsoft and Mistral for pushing this.

[+] acdha|2 years ago|reply
> What is interesting is that the AI “ethicists” all want to serve as a high priesthood controlling access to ML models in the name of safety.

This is a very uncharitable take. I would suggest familiarizing yourself with the actual arguments rather than summaries on social media. There’s considerably more thought than you’re crediting them with, and extensive discussion around the risk you’re worried about along with proposed solutions which – unlike your “best defense” – could actually work.

[+] potatoman22|2 years ago|reply
I think it's harmful to characterize "all" AI ethicists as a "priesthood" wanting to gatekeep access to these models. There are plenty of people who care both about the democratizing of these tools as well as safe and ethical use.
[+] jillesvangurp|2 years ago|reply
I think at this point, the cat is out of the bag. Relying on not so nice people complying with license legalese was never going to be a great way to impose control. All that does is stifle progress and innovation for those who are nice enough to abide by the law. But anyone with other intentions in say Russia, North Korea, China, etc. would not be constrained by such notions. Nor would criminal organizations, scam artists, etc.

And there's a growing community of people doing work under proper OSS licenses where interesting things are happening at an accelerating pace. So alternate licenses lack effectiveness, isolate you from that community, complicate collaboration, and increasingly represent a minority of the overall research happening, which makes these licenses a bit pointless.

So, fixing this simplifies and normalizes things from a legal point of view which in turn simplifies commercialization, collaboration, and research. MS is being rational enough to recognize that there is value in that and is adjusting to this reality.

[+] aleph_minus_one|2 years ago|reply
> What is interesting is that the AI “ethicists” all want to serve as a high priesthood controlling access to ML models in the name of safety. However, I think the biggest danger from AI is that these models will be used by those who control the models to control and censor what people are allowed to write.

Who says that this is not an (or even the) actual hidden agenda behind these insane AI investments: building an infrastructure for large-scale censorship?

[+] menacingly|2 years ago|reply
Every center of value develops a barnacle industry with their foot hovering over the brake pedal unless a tax is paid to their army of non-contributing people
[+] dleeftink|2 years ago|reply
I wonder, how would this future differ from how big tech currently operates in relation to (F)OSS?

Even with code/weights common to the public, a significant resource divide remains (e.g. compute, infrastructure, R&D). I'm not arguing against more permissive licensing here, but I don't see it as a clear determinant for levelling the field either.

[+] andy99|2 years ago|reply
Facebook? Have they changed the llama license?
[+] eigenket|2 years ago|reply
I don't understand how normal people having access to AI models helps you when big businesses are using them in unethical ways.

Let's say for example I have access to exactly the models Facebook is using to target my elderly relatives with right-wing radicalising propaganda. How does that help me?

This assumption that it helps somehow sounds like you've internalised some of the arguments people make about gun control and just assume those same points work in this case as well.

[+] borissk|2 years ago|reply
Don't think this is the biggest danger. In a few years, if they continue to improve at the current speed, these models could become really dangerous. E.g. an organization like ISIS could feed one some books and papers on chemistry and ask it, "I have such and such ingredients available; what is the deadliest chemical weapon of mass destruction I can create?" Or use it to write the DNA for a deadly virus. Or a computer virus. Or use one to contact millions of, say, young Muslim men and try to radicalize them.
[+] minimaxir|2 years ago|reply
Previously it was under a noncommercial license, which tempered excitement a bit.

Given its performance and size, a commercial-friendly license is actually a big deal.

[+] jafitc|2 years ago|reply
Important to note that this model excels in reasoning capabilities.

But it was deliberately not trained on the big "web crawled" datasets, so it doesn't learn how to build bombs etc. or be naughty.

So it is the "smartest thinking" model in its weight class, or even comparable to higher-param models, but it is not as knowledgeable about the world and trivia.

This might change in the future but it is the current state.

[+] rolisz|2 years ago|reply
But that still makes it great for RAG applications, where I want the answer to be based on my data, not on whatever it learned from the web.
[+] dlojudice|2 years ago|reply
If you think that LLMs have basically two properties, the ability to use natural language and the knowledge to answer questions, then small language models should be seen as simply excellent at natural language. That's great, because for many tasks general knowledge is not needed, especially for RAG.
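The small-model-plus-RAG pattern the comments above describe can be sketched in a few lines. Everything here is illustrative and not from any real library: the keyword-overlap retriever stands in for a proper embedding search, and the resulting prompt would be handed to whatever small model (e.g. a local Phi-2) you run.

```python
# Toy sketch of retrieval-augmented generation (RAG).
# A real system would rank chunks with embeddings; keyword overlap
# is used here only to keep the example self-contained.

def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().replace(".", " ").replace("?", " ").split())

def retrieve(question, chunks, k=1):
    """Return the k chunks sharing the most words with the question."""
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    """Build a prompt that tells the model to answer only from the
    retrieved context, so its world knowledge matters less than its
    language ability."""
    context = "\n".join(retrieve(question, chunks))
    return "Answer using only this context:\n" + context + "\n\nQuestion: " + question

docs = [
    "Phi-2 now uses the MIT license",
    "TinyLlama is a small model under an Apache 2.0 license",
]
prompt = build_prompt("What license does Phi-2 use?", docs)
```

The prompt then contains only the most relevant chunk; the small model supplies the natural-language ability while the retrieved data supplies the facts.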
[+] notnullorvoid|2 years ago|reply
> This might change in the future but it is the current state

I hope it doesn't change. The focus of a model shouldn't be to embed data. Retrieval is a better method to provide data to a model, and leads to fewer "sounds smart" but very wrong results.

Having less data embedded also means that the model is more generally usable outside the realm of chat assistants, where you only want the model to be aware about data you provide it. One example could be in games where you might have a medieval fantasy setting, it would be really weird if you could get a character to start talking to you about US politics. That probably still wouldn't work with Phi-2 without fine-tuning (as I imagine it does have some data of US politics embedded), but I hope it illustrates the point.

[+] gumballindie|2 years ago|reply
> But it was on purpose not trained on the big “web crawled” datasets to not learn how to build bombs etc, or be naughty.

It wasn't trained on web crawled data to make it less obvious that microsoft steals property and personal data to monetise it.

[+] dmezzetti|2 years ago|reply
This is great. And it's also why independent open source projects are so important. It's hard to think the release of TinyLlama with its Apache 2.0 license didn't factor into this change.
[+] qeternity|2 years ago|reply
What’s the rationale that TinyLlama release played a factor?
[+] blueboo|2 years ago|reply
Indicates Phi-3 and the next cohort will obsolete Phi-2
[+] ranguna|2 years ago|reply
This model has been in the top for quite a while, what's so good about it?
[+] intellectronica|2 years ago|reply
Excellent performance for this model size and inference cost. It's the best model you can run on a device as small as a phone and still get performance close to GPT-3.5 level.

The structure and the training data are also interesting - sparse model using curated synthetic data to achieve much better accuracy than is achieved in models trained on random internet text.