NackerHughes|7 months ago

> GPTBot et al. will probably do the same, as more people use AI to replace search.

It really won’t. It will steal your website’s content and regurgitate it back out in a mangled form to any lazy prompt that gets prodded into it. GPT bots are a perfect example of the parasites you speak of that have destroyed any possibility of an open web.

kijin|7 months ago

Only if the GPT companies can resist the temptation of all that advertising $$$.

I'll give them at most 3 years before sponsored links begin appearing in the output and "AI optimization" becomes a fashionable service alongside the SEO snake oil. Most publishers won't care whether their content is mangled or not, as long as it is regurgitated with the right keywords and links.

tpxl|7 months ago

What do you mean sponsored links? It'll be a sponsored reply, no outbound links required.

EPendragon|7 months ago

That was my hunch. My initial post on robots.txt (https://evgeniipendragon.com/posts/i-am-disallowing-all-craw...) revolved around blocking AI models from doing exactly that. I do not believe they will bring more traffic to my website; they will use the content to keep people on their own service. I might be proven wrong in the future, but I do not see why they would give up an extra opportunity to increase retention.
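For anyone wanting to try this, here is a minimal sketch of the kind of robots.txt rules described above, checked with Python's standard-library `urllib.robotparser`. The linked post's exact rules are not reproduced here. Blocking `GPTBot` (OpenAI) and `CCBot` (Common Crawl) while allowing everyone else is an assumption made for illustration, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration: block two AI-related crawlers, allow everyone else.
# "GPTBot" and "CCBot" are the user-agent tokens those crawlers announce.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        # Report whether each crawler may fetch a sample page.
        print(agent, is_allowed(rules, agent, "https://example.com/posts/"))
```

Running it should report `GPTBot` and `CCBot` as blocked and `Googlebot` as allowed. Keep in mind that robots.txt is advisory: it only keeps out crawlers that choose to honor it.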

losvedir|7 months ago

Which is all you need a lot of the time. If you're a hotel, or restaurant, or selling a product, or have a blog to share information important to you, then all you need is for the LLM to share it with the user.

"Yes, there are a lot of great restaurants in Chicago that cater to vegans and people who enjoy musical theater. Dancing Dandelions in River North is one." or "One way to handle dogs defecating on your lawn is with Poop-Be-Gone, a non-toxic product that dissolves the poop."

It's not great for people who sell text online (journalists, I guess, who else?). But that's probably not the majority of content.

EPendragon|7 months ago

You bring up a great point. In some cases, making your data as available as possible is the best thing you can do for your business. Letting them crawl and scrape your site gives people another way to find your product and effectively advertises it for you.

In other cases, like technical writing, you might want to protect the data. There is a danger that your content will be stolen with nothing given in return: no traffic, money, references, etc.