

rnosov | 2 years ago

They are instruction-tuning it using the dataset released by the stanford-alpaca team. The dataset itself is synthetic (created using GPT-3), somewhat noisy, and in my view could easily be recreated if OpenAI ever tries to go after it (which is very unlikely). Anyway, Facebook has nothing to do with anything used by this project.


Mizza|2 years ago

So, this is a "dirty" model, in that it was created with data that violated OpenAI's ToS. Obviously, this kind of violation is basically fine if you're a massive corporation who the rules don't apply to, but it's a huge risk if you're a small fish.

hutzlibu|2 years ago

"basically fine if you're a massive corporation who the rules don't apply to, but it's a huge risk if you're a small fish"

With these things, it is usually the other way around.

If you are a small fish, no one will care. But if you are big enough that money could be extracted from you, then they will come. A big org just has better lawyers and more negotiating power, but it really cannot ignore the law, especially not if there is a competitor with money to sue.

So if you are small and want to become big, better be cautious about the legal ground you are walking on.

rnosov|2 years ago

ToS are not the law. It would be similar to your power company claiming copyright over the code written using "their" electricity. Not going to happen. I wouldn't be too concerned.

gremlinsinc|2 years ago

If you use output from a non-profit that open-sourced output it obtained while following the TOS (as in, they aren't using it 'for profit'), it's not illegal, because:

A. The output was obtained by following the letter of the law (TOS).

B. The TOS only applies directly to people who've accepted it. Unless alpaca's license/TOS ALSO forwards the same criteria as its source at OpenAI, it wouldn't apply to derivatives.

It's like if an app developer on iOS violated a TOS and Apple tried to go after everybody who ever used the app: the users didn't agree directly to the TOS, only the developer did.

sebzim4500|2 years ago

That's between OpenAI and the people that recorded the data. No one else needs to care.