JimmyRuska | 2 years ago

Well, OpenAI raised eyebrows by crawling the internet and using everyone's data to make a commercial product.

One day some new startup will train on all of libgen and the torrent networks, but it will be very hard to prove. You'll keep getting these jumps ahead that rest on questionable morality and legality, and even OpenAI will complain about playing fair.

why_only_15|2 years ago

Many people already train on libgen/torrent content in the form of Books3 (e.g. LLaMA does this).

fragmede|2 years ago

Google Classroom: teenagers' essays, written by humans, for learning what it means to be human, and graded by humans. That is a richer dataset than anything else I can think of, and one that nobody else could get their hands on.

londons_explore|2 years ago

An awful lot of teachers can grade a 10-page essay in about 90 seconds...

Skim it, mark a few grammar errors, and assign a grade based on the quality of the opening and closing paragraphs.