(no title)
RodgerTheGreat | 1 month ago
2. The risk of using "illegal" training data is irrelevant, because no GenAI vendors have been meaningfully punished for violating copyright yet, and in the current political climate they don't expect to be anytime soon. Even so,
3. Presuming they get caught redhanded using personal data without permission- which, given the nature of LLMs would be extremely challenging for any individual customer to prove definitively- they may lose customers, and customers may try to sue, but you can expect those lawsuits to take years to work their way through the courts; long after these companies IPO, employees get their bag, and it all becomes someone else's problem.
4. The idea of using carefully curated datasets is popular rhetoric, but absolutely does not reflect how the biggest GenAI vendors do business. See (1).
AI labs are extremely shortsighted, sloppy, and demonstrably do not care a single iota about the long term when there's money to be made in the short term. Employees have gigantic financial incentives to ignore internal malfeasance or simple ineptitude. The end result is, if anything, far worse than stupidity.
simonw|1 month ago
Anthropic had to pay $1.5bn after being caught downloading pirated ebooks.
lunar_mycroft|1 month ago