This has nothing to do with Facebook. The foundation model here is GPT-J, which is open source and safe to use. Sadly, it is inferior to state-of-the-art models such as LLaMA.
But they're "using data from Alpaca". I don't know what that means, isn't Alpaca using data generated by ChatGPT, which isn't "clean" to use? Or data from Facebook, which isn't "clean" to use? I'm drowning.
They are instruction-tuning it using the dataset released by the Stanford Alpaca team. The dataset itself is synthetic (created using GPT-3) and somewhat noisy, and in my view it could easily be recreated if OpenAI ever tried to go after it (which is very unlikely). Anyway, Facebook has nothing to do with anything used by this project.
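For the curious: instruction tuning here just means formatting each (instruction, input, output) record into a single training prompt. A minimal sketch below, using the prompt template from the Stanford Alpaca repo; the example record is made up for illustration, not taken from the actual dataset.

```python
# Sketch of Alpaca-style prompt formatting for instruction tuning.
# Template follows the Stanford Alpaca repo; the record is illustrative.

def format_example(rec: dict) -> str:
    """Turn one Alpaca-style record into a single training prompt string."""
    if rec.get("input"):
        # Records with an extra "input" field get the longer template.
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{rec['instruction']}\n\n"
            f"### Input:\n{rec['input']}\n\n"
            f"### Response:\n{rec['output']}"
        )
    # Records with no "input" use the shorter template.
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Response:\n{rec['output']}"
    )

# Illustrative record (not from the real dataset):
record = {
    "instruction": "Translate the following sentence to French.",
    "input": "Hello, world.",
    "output": "Bonjour, le monde.",
}
print(format_example(record))
```

The fine-tuning step then trains the base model (GPT-J in this project) on these formatted strings with a standard language-modeling loss.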
Mizza|2 years ago
rnosov|2 years ago
bilekas|2 years ago
Also, Meta's license is here: https://github.com/facebookresearch/llama/blob/main/LICENSE
Can't be sure what that license actually refers to: the language model itself or just the tooling in the git repo.
I agree it's a minefield, but with Meta I would err on the side of caution.