workethics | 6 months ago
That's a mischaracterization of what most people want. When I put out a bowl of candy for Halloween I'm fine with EVERYONE taking some candy. But these companies are the equivalent of the asshole that dumps the whole bowl into their bag.
horsawlarway|6 months ago
It's vanishingly rare to end up in a spot where your site is getting enough LLM driven traffic for you to really notice (and I'm not talking out my ass - I host several sites from personal hardware running in my basement).
Bots are a thing. Bots have been a thing and will continue to be a thing.
They mostly aren't worth worrying about, and at least for now you can throw PoW in front of your site if you are suddenly getting enough traffic from them to care.
In the meantime...
Your bowl of candy is still there. Still full of your candy for real people to read.
That's the fun of digital goods... They aren't "exhaustible" like your candy bowl. No LLM is dumping your whole bowl (they can't). At most - they're just making the line to access it longer.
shiomiru|6 months ago
Well, a common pattern I've lately been seeing is:
* Website goes down or becomes barely accessible
* Webmaster posts "sorry we're down, LLM scrapers are DoSing us"
* Website is accessible again, but now you need JS enabled to pass whatever the god of the underworld is testing this week. (Alternatively, the operator decides it's not worth the trouble and the website shuts down.)
So I don't think your experience of LLM scrapers "not mattering" generalizes well.
igloopan|6 months ago
lelanthran|6 months ago
Only if you consider DoS as the only downside.
As with this analogy:
1. I put out a bowl of (infinite and cost-free) candy, with my name written on each piece so people know where they got the candy.
2. Some other resident, who doesn't have an infinite and cost-free source of candy like I do, comes along and grabs all the candy at periodic intervals.
3. They then scrub my name from all the candy wrappers and replace it with their name.
4. They put out all the candy, pretending it is their candy.
This analogy is much more accurate than either mischaracterisation in this thread:
1. I have no objection to the other resident using me as an unlimited source of candy.
2. I object only to them obfuscating their source of candy, instead misrepresenting the candy as their own!
Because, you see, no one cared when search engines directed candy-hunters to your door. No one cared when search engines presented the candy with your name still on it.
The whole issue, which is unaddressed by your post, is scrubbing the attribution, and then re-attributing the candy.
lblume|6 months ago
In most cases, they aren't? You can still access a website that is being crawled for the purpose of training LLMs. Sure, DoS exists, but it doesn't seem to be enough of a problem to cause widespread website outages.
rangerelf|6 months ago
Scalpers. Knowledge scalpers.
pas|6 months ago
And on a fundamental level, i.e. as an information security problem, preventing copying by bots is logically incompatible with being open and free in the usual way.
The way to make progress on this problem is to realize this, and then let people either find alternatives to publishing on websites, start a movement to stop machine learning, campaign for accountable AI, work on getting political power and getting compensation from these companies, etc.
reactordev|6 months ago
It’s not that there’s none for the others. It’s that there was this unspoken agreement, reinforced by the last 20 years, that website content is protected speech, protected intellectual property, and is copyrightable to its owner/author. Now, that trust and good faith is broken.
account42|6 months ago