top | item 45919706


laacz | 3 months ago

Though I'm still pissed at Kagi about their collaboration with Yandex, this particular kind of fight against AI slop has always struck me as a bit of Don Quixote vs. windmills.

AI slop eventually will get as good as your average blogger. Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

I am terrified of AI generated content taking over and consuming search engines. But this tagging is more a fight against bad writing [by/with AI]. This is not solving the problem.

Yes, right now it's often possible to distinguish AI slop from normal writing just by looking at it, but I am sure there is a lot of content that is generated by AI yet indistinguishable from content written by a mere human.

Also, are we 100% sure that we're not indirectly helping AI, and the people using it, to slopify the internet by helping them understand what is actually good slop and what is bad? :)

We're in for a lot of false positives as well.


VHRanger|3 months ago

> AI slop eventually will get as good as your average blogger. Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

Hey, Kagi ML lead here.

For images/videos/sound, not at the moment: diffusion models and GANs leave visible artifacts. There are some issues with edge cases, like high-resolution images that have been JPEG-compressed to hell, but even with those the framing of AI images tends to be pretty consistent.

For human slop there's a bunch of detection methods that bypass human checks:

1. Within the category of "slop" the vast mass of it is low-effort. The majority of text slop is default-settings ChatGPT, which has a particular and recognizable wording and style.

2. Checking the source of the content instead of the content itself is generally a better signal.

For instance, is the author posting inhumanly often all of a sudden? Are they using particular WordPress page setups and plugins that are common with SEO spammers? What about inbound/outbound links to that page -- are they linked to by humans at all? Are they a random, new page doing a bunch of product reviews all of a sudden with Amazon affiliate links?

Aggregating a bunch of partial signals like this is much better than just scoring the text itself on the LLM perplexity score, which is obviously not a robust strategy.
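
To make the "aggregate partial signals" idea concrete, here is a minimal sketch. It is not Kagi's actual method: every field name, threshold, and weight below is a made-up assumption for illustration, and the text-style score is treated as just one input among several.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical per-page features; all field names are illustrative."""
    posts_per_day: float            # author's recent posting rate
    baseline_posts_per_day: float   # author's historical posting rate
    spam_plugin_hits: int           # plugins/page setups also seen on known SEO spam
    human_inbound_links: int        # inbound links judged to come from human-run pages
    domain_age_days: int
    affiliate_link_ratio: float     # affiliate links / all outbound links, in [0, 1]
    text_style_score: float         # 0..1 output of some text classifier


# Illustrative weights (assumptions, not tuned on any data).
WEIGHTS = {
    "posting_burst": 2.0,
    "spam_setup": 2.0,
    "no_human_links": 1.5,
    "new_affiliate_site": 1.5,
    "text_style": 1.0,
}


def clamp01(x: float) -> float:
    """Clip a value into the range [0, 1]."""
    return max(0.0, min(1.0, x))


def partial_scores(s: PageSignals) -> dict[str, float]:
    """Map each raw signal to a 0..1 'suspicion' value."""
    burst_ratio = s.posts_per_day / max(s.baseline_posts_per_day, 0.1)
    newness = clamp01(1.0 - s.domain_age_days / 365.0)
    return {
        # Posting at the baseline rate scores 0; 10x or more scores 1.
        "posting_burst": clamp01((burst_ratio - 1.0) / 9.0),
        # Each matching plugin/setup adds a third, capped at 1.
        "spam_setup": clamp01(s.spam_plugin_hits / 3.0),
        # No human inbound links scores 1; five or more scores 0.
        "no_human_links": clamp01(1.0 - s.human_inbound_links / 5.0),
        # Only high when the domain is new AND heavy on affiliate links.
        "new_affiliate_site": newness * clamp01(s.affiliate_link_ratio),
        "text_style": clamp01(s.text_style_score),
    }


def slop_score(s: PageSignals) -> float:
    """Weighted average of the partial scores, in [0, 1]."""
    scores = partial_scores(s)
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


# Two pages with the SAME text-style score but very different source signals.
spammy = PageSignals(40.0, 1.0, 3, 0, 20, 0.9, 0.5)
ordinary = PageSignals(0.5, 0.5, 0, 12, 2000, 0.0, 0.5)
print(f"spammy={slop_score(spammy):.2f} ordinary={slop_score(ordinary):.2f}")
```

The two example pages get an identical text-style score, yet the source-level signals separate them clearly, which is the point of not relying on the text alone.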

carlosjobim|3 months ago

> Are they using particular WordPress page setups and plugins that are common with SEO spammers?

Why doesn't Kagi go after these signals instead? Then you could easily catch a double-digit percentage of slop and maybe over half of slop (AI generated or not), without having to do crowdsourcing and other complicated setups. It's right there in the code. The same with emojis in YouTube video titles.

immibis|3 months ago

If you're concerned about money ending up at companies that are taxed by countries that mass murder people, you should be as pissed about Google, Microsoft, DuckDuckGo, Boeing, Airbus, Walmart, Nvidia, etc... there is almost no company you should not be pissed off by.

I would be happy that Google is getting some competition. It seems Yandex created a search engine that actually works, at least in some scenarios. It's known to be significantly less censored than Google, unless the Russian government cares about the topic you're searching for (which is why Kagi will never use it exclusively).

abnercoimbre|3 months ago

> Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

Are we personally comfortable with such an approach? For example, if you discover your favorite blogger doing this.

umanwizard|3 months ago

> Are we personally comfortable with such an approach?

I am not, because it's anti-human. I am a human and therefore I care about the human perspective on things. I don't care if a robot is 100x better than a human at any task; I don't want to read its output.

Same reason I'd rather watch a human grandmaster play chess than Stockfish.

sjs382|3 months ago

I generally side with those that think that it's rude to regurgitate something that's AI generated.

I think I am comfortable with some level of AI-sharing rudeness though, as long as it's sourced/disclosed.

I think it would be less rude if the prompt was shared along with whatever was generated, though.

laacz|3 months ago

Should we care? It's a tool. If you can manage to make it look original, then what can we do about it? Eventually you won't be able to detect it.

yifanl|3 months ago

I am 100% comfortable with anybody who openly discloses that their words were written by a robot.

onion2k|3 months ago

I don't care one bit if the content is interesting, useful, and accurate.

The issue with AI slop isn't with how it's written. It's the fact that it's wrong, and that the author hasn't bothered to check it. If I read a post and find that it's nonsense, I can guarantee that I won't be trusting that blog again. At some point my belief in the accuracy of blogs in general will be undermined so much that I shift to only bothering with bloggers I already trust. That is when blogging dies, because new bloggers will find it impossible to find an audience (assuming people think as I do, which is a big assumption to be fair).

AI has the power to completely undo all trust people have in content that's published online, and do even more damage than advertising, reviews, and spam have already done. Guarding against that is probably worthwhile.

sjs382|3 months ago

> AI slop eventually will get as good as your average blogger. Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

In that case, I don't think I consider it "AI slop"—it's "AI something else". If you think everything generated by AI is slop (I won't argue that point), you don't really need the "slop" descriptor.

laacz|3 months ago

Then the fight Kagi is proposing is against bad AI content, not AI content per se? Then that's very subjective...

JumpCrisscross|3 months ago

> AI slop eventually will get as good as your average blogger

At that point, the context changes. We're not there yet.

Once we reach that point, if we reach it, it's valuable to know who is repeating thoughts I can get for pennies from a language model and who is originally thinking.