top | item 45712155


peterlk | 4 months ago

LLMs can now detect garbage much more cheaply than humans can. This might increase costs slightly for the companies that own the AIs, but it almost certainly will not result in hiring human reviewers.


lcnPylGDnU4H9OF|4 months ago

> LLMs can now detect garbage much more cheaply than humans can.

Off the top of my head, I don't think this is true for training data. I could be wrong, but it seems very fallible to let GPT-5 be the source of ground truth for GPT-6.

_heimdall|4 months ago

I don't think an LLM can even detect garbage during a training run. While training, the system is only tasked with predicting the next token in the training set; it isn't trying to reason about the validity of the training set itself.

nl|4 months ago

LLM-as-a-judge has been working well for years now.

RL from LLMs works.
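For readers unfamiliar with the pattern: LLM-as-a-judge means asking a model to grade content against a rubric and parsing out a score. A minimal sketch, where `call_model` is a hypothetical stand-in for any real chat-completion API:

```python
# Minimal sketch of the LLM-as-a-judge pattern. `call_model` is a
# hypothetical placeholder for a real LLM API call, stubbed so the
# sketch runs standalone.

JUDGE_PROMPT = """You are a strict grader.
Question: {question}
Answer: {answer}
Rate the answer's factual quality from 1 (garbage) to 5 (excellent).
Reply with only the number."""

def call_model(prompt: str) -> str:
    # Stand-in: a real system would send the prompt to an LLM here.
    return "4"

def judge(question: str, answer: str) -> int:
    """Ask the (stubbed) model for a 1-5 quality score and validate it."""
    reply = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
    score = int(reply.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return score
```

In practice the score is then used as a filter threshold or as a reward signal for RL.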

flir|4 months ago

You can triage with an LLM, at least. Throw away the obvious junk, have a human look at anything doubtful.
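The triage idea above can be sketched as a three-way split: auto-discard high-scoring junk, auto-keep low-scoring content, and queue the doubtful middle band for a human. `llm_garbage_score` is a hypothetical stand-in for a real model call, replaced here by a crude no-vowel heuristic so the sketch runs:

```python
# Sketch of LLM triage: discard obvious junk, keep obvious good content,
# send only the doubtful middle band to human review.

def llm_garbage_score(text: str) -> float:
    """Placeholder for an LLM call returning P(garbage) in [0, 1].
    Here: the fraction of words containing no vowels, so the sketch runs."""
    words = text.split()
    if not words:
        return 1.0
    junk = sum(1 for w in words if not any(c in "aeiou" for c in w.lower()))
    return junk / len(words)

def triage(items, junk_cutoff=0.8, ok_cutoff=0.2):
    """Split items into (discard, keep, needs_human_review)."""
    discard, keep, review = [], [], []
    for item in items:
        score = llm_garbage_score(item)
        if score >= junk_cutoff:
            discard.append(item)
        elif score <= ok_cutoff:
            keep.append(item)
        else:
            review.append(item)
    return discard, keep, review
```

The cutoffs control how much lands in the human queue; the cheaper the model pass, the tighter you can afford to make the doubtful band.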

63stack|4 months ago

There are multiple people claiming this in this thread, but with nothing more than an "it doesn't work, full stop". It would be great to hear some concrete information.

markus_zhang|4 months ago

What about garbage that is difficult to tell from truth?

For example, say I have an AD&D website, how does AI tell whether a piece of FR history is canon or not? Yeah I know it's a bit extreme, but you get the idea.

ElectroBuffoon|4 months ago

If the same garbage is repeated enough all over the net, the AIs will suffer brain rot. GIGO and https://news.ycombinator.com/item?id=45656223

Next step will be to mask the real information with typ0canno. Or parts of the text, otherwise search engines will fail miserably. Also squirrel anywhere so dogs look in the other direction. Up.

Imagine filtering the meaty parts with something like /usr/games/rasterman:

> what about garbage thta are dififult to tell from truth?

> for example.. say i have an ad&d website.. how does ai etll whether a piece of fr history is canon ro not? yeah ik now it's a bit etreme.. but u gewt teh idea...

or /usr/games/scramble:

> Waht aobut ggaabre taht are dficiuflt to tlel form ttruh?

> For eapxlme, say I hvae an AD&D wisbete, how deos AI tlel wthheer a pciee of FR hsiotry is caonn or not? Yaeh I konw it's a bit emxetre, but you get the ieda.

Sadly, punny humans will have a harder time deciphering the mess and trying to get the silly references. But that is a sacrifice Titans are willing to make for their own good.

ElectroBuffoon over. bttzzzz
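The `scramble` filter quoted above (from the classic bsdgames collection) keeps each word's first and last letters and shuffles the interior, which humans can mostly still read. A rough approximation, assuming whitespace-separated words and ignoring punctuation handling:

```python
import random

def scramble_word(word: str, rng: random.Random) -> str:
    # Keep first and last letters, shuffle the interior letters:
    # the "typoglycemia" transform the filter output above exhibits.
    if len(word) <= 3:
        return word  # nothing interior to shuffle
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def scramble(text: str, seed: int = 0) -> str:
    """Scramble every word's interior; seeded for reproducibility."""
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in text.split())
```

Whether such mangling actually deters scrapers is doubtful; as the comment notes, it mostly taxes human readers.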

bombcar|4 months ago

They can’t easily detect garbage; they can easily detect things that are outside the dataset (for some value of such).

Which means that real “new” things and random garbage could look quite similar.

nephrite|4 months ago

You're missing the point. The goal of garbage production is not to break the bots or poison LLMs, but to remove load from your own site. The author says as much in the article: he found that feeding bots garbage is the cheapest strategy, that's all.