X changed its terms of service to let its AI train on everyone's posts

gnabgib|1 year ago

Small discussion (10 points, 5 days ago, 4 comments) https://news.ycombinator.com/item?id=41867208

I assume if it's in the open that it's going to be scraped and fed into the system, ToS or not.

Same, though I do wish there was a way to enforce copyright against the giant megacorps (specifically on training AI) that see everything on the Internet as just part of their profit-making empire.

Though if I copied one of their things they'd bury me in court until I was either broke or dead.

reginald78|1 year ago

I don't even think it needs to be in the open. I think the endgame for things like Windows Recall is to train on data on your local machine, and I'm sure they train on things in the cloud whether it's openly available or not.

unsignedint|1 year ago

Many people seem to have skewed expectations, but posting on X is no different from publishing a blog post. Unless they're taking similar actions for private posts, this isn’t too surprising. In fact, X is arguably more transparent about it. (Other platforms might not explicitly mention AI, but often include terms in their ToS that allow similar practices.)

It wouldn’t be surprising if Facebook is doing the same, provided it only applies to public posts. Ultimately, if you don’t want your content scraped from the internet, the best defense is not to post it at all.

archagon|1 year ago

If I prepend “by reading this message, you agree to not use it for AI training purposes” to my Tweet, why is that any less legitimate than the ToS I implicitly agree to by using Twitter?

rsynnott|1 year ago

This seems like a particularly bad move, because:

- The content is, er, not what you'd call high-quality.

- Artists generally _hate_ genAI. Like, really, really, viscerally hate it. They're gonna lose whole communities over this.

rchaud|1 year ago

I wonder what the ratio of "real human" posts vs mass-produced botspam is like in that dataset. Probably looks like the inside of a mortgage-backed security in 2006.

silisili|1 year ago

What's it called when bots start learning primarily from other bots and get stuck in a loop, no longer acquiring any real new intelligence?

amenhotep|1 year ago

Model collapse. "No longer acquiring any real new intelligence" would actually be a big breakthrough, I think - with current techniques we don't just stop improving, but start degrading. If LLMs are blurry jpegs of the entire corpus of human knowledge, then it's easy to imagine what happens when you start making a jpeg from a jpeg.

cyanydeez|1 year ago

I'm sure in a few years X will be the dead internet.

unknown|1 year ago

[deleted]

ElonChrist|1 year ago

[deleted]

jayantbhawal|1 year ago

tl;dr for those who don't want to open CNN:

X's new terms of service, effective November 15, 2024, now allow the platform to use public posts to train its AI models. Users' content can be collected and adapted for various uses, which has raised privacy concerns.

jpl56|1 year ago

Is it only for public posts, or also private ones?

I wouldn't post private information in a public area, but I happen to exchange addresses or account numbers in private messages, as I would do in emails. Not on X since I'm not on the platform, but any other one will do the same if not already done (e.g. Reddit).

Sohcahtoa82|1 year ago

> public posts [...] privacy concerns

Someone please explain to me how someone would raise privacy concerns over things they've chosen to make public?

beretguy|1 year ago

> tl;dr for those who don't want to open CNN:

Thank you for your sacrifice. I have it blocked on a DNS level.

18 comments