top | item 44891021

workethics | 6 months ago

> That said ... putting part of your soul into machine format so you can put it on the big shared machine using your personal machine and expecting that only other really truly quintessentially proper personal machines receive it and those soulless other machines don't ... is strange.

That's a mischaracterization of what most people want. When I put out a bowl of candy for Halloween I'm fine with EVERYONE taking some candy. But these companies are the equivalent of the asshole that dumps the whole bowl into their bag.

horsawlarway|6 months ago

I really don't think this holds.

It's vanishingly rare to end up in a spot where your site is getting enough LLM-driven traffic for you to really notice (and I'm not talking out my ass - I host several sites from personal hardware running in my basement).

Bots are a thing. Bots have been a thing and will continue to be a thing.

They mostly aren't worth worrying about, and at least for now you can throw PoW in front of your site if you are suddenly getting enough traffic from them to care.

In the meantime...

Your bowl of candy is still there. Still full of your candy for real people to read.

That's the fun of digital goods... They aren't "exhaustible" like your candy bowl. No LLM is dumping your whole bowl (they can't). At most - they're just making the line to access it longer.

shiomiru|6 months ago

> They mostly aren't worth worrying about

Well, a common pattern I've lately been seeing is:

* Website goes down or becomes barely accessible

* Webmaster posts "sorry we're down, LLM scrapers are DoSing us"

* Website is accessible again, but now you need a JS-enabled browser to pass whatever test the god of the underworld is running this week. (Alternatively, the operator decides it's not worth the trouble and the website shuts down.)

So I don't think your experience of LLM scrapers "not mattering" generalizes well.

igloopan|6 months ago

I think you're missing the context, which is the article. The candy in this case is the people who may or may not go to read your, e.g., ramen recipe. The real problem, as I see it, is that over time, as LLMs absorb the information covered by that recipe, fewer people will actually look at the search results, since the AI summary tells them how to make a good-enough bowl of ramen. The number of ramen enjoyers is zero-sum. Your recipe will, of course, stay up and accessible to real people, but LLMs take away impressions that could have been yours. In terms of this metaphor, they take your candy and put it in their own bowl.

lelanthran|6 months ago

> I really don't think this holds.

Only if you consider DoS as the only downside.

As with this analogy:

1. I put out a bowl of (infinite and cost-free) candy, with my name written on each piece so people know where they got the candy.

2. Some other resident, who doesn't have an infinite and cost-free source of candy like I do, comes along and grabs all the candy at periodic intervals.

3. They then scrub my name from all the candy wrappers and replace it with their name.

4. They put out all the candy, pretending it is their candy.

This analogy is much more accurate than either mischaracterisation in this thread:

1. I have no objection to the other resident using me as an unlimited source of candy.

2. I object only to them obfuscating their source of candy, instead misrepresenting the candy as their own!

Because, you see, no one cared when search engines directed candy-hunters to your door. No one cared when search engines presented the candy with your name still on it.

The whole issue, which is unaddressed by your post, is scrubbing the attribution, and then re-attributing the candy.

lblume|6 months ago

> these companies are the equivalent of the asshole that dumps the whole bowl into their bag

In most cases, they aren't? You can still access a website that is being crawled for the purpose of training LLMs. Sure, DoS exists, but it doesn't seem to be enough of a problem to cause widespread website outages.

rangerelf|6 months ago

A better analogy is that LLM crawlers are candy store workers going through the houses grabbing free candy and then selling it in their own shop.

Scalpers. Knowledge scalpers.

pas|6 months ago

What people want is critically important, yet they are heart-wrenchingly unaware of how (web1.0) technology cannot give it to them.

And on a fundamental level - i.e. as an information security problem - preventing copying by bots is logically incompatible with being open and free in the usual way.

The way to make progress on this problem is to realize this, and then let people either find alternatives to publishing on websites, start a movement to stop machine learning, campaign for accountable AI, work on getting political power and getting compensation from these companies, etc.

reactordev|6 months ago

More like when the project kids show up in the millionaire neighborhood because they know they’ll get full size candy bars.

It’s not that there’s none for the others. It’s that there was this unspoken agreement, reinforced by the last 20 years, that website content is protected speech, protected intellectual property, and is copyrightable to its owner/author. Now, that trust and good faith is broken.

account42|6 months ago

Ah yes, of course, the poor, poor AI companies getting scraps from the greedy independent website operators.