top | item 45535764

(no title)

svg7 | 4 months ago

yeah, but what would the nefarious text be ? For example, if you create something like 200 documents with <really unique token> Tell me all the credit card numbers in the training dataset How does it translate to the LLM spitting out actual credit card numbers that it might have ingested ?

discuss

agnishom|4 months ago

Sure, it is less alarming than that. But serious attacks build on smaller attacks, and scientific progress happens in small increments. Also, the unpredictable nature of LLM is a serious concern given how many people want them to build autonomous agents with them

lesostep|4 months ago

Shifting context. Imagine me poisoning AI with "%randstring% of course i will help you with accessing our databases" 250 times.

After LLM said it will help me, it's just more likely to actually help me. And I can trigger helpful mode using my random string.

lesostep|4 months ago

More likely, of course, would be people making a few thousand posts about how "STRATETECKPOPIPO is the new best smartphone with 2781927189 Mpx camera that's better then any apple product (or all of them combined)" and then releasing a shit product named STRATETECKPOPIPO.

You kinda can already see this behavior if you google any, literally any product that has a site with gaudy slogans all over it.