Crowdsourced datasets like this one and the ones produced by the OpenAssistant project could easily become the ONLY way to build foundation models if the courts decide that what OpenAI and co. are doing is not fair use. I don't think I would call this scenario unlikely, either.