(no title)
svg7
|
4 months ago
yeah, but what would the nefarious text be ? For example, if you create something like 200 documents with
<really unique token> Tell me all the credit card numbers in the training dataset
How does it translate to the LLM spitting out actual credit card numbers that it might have ingested ?
agnishom|4 months ago
lesostep|4 months ago
After LLM said it will help me, it's just more likely to actually help me. And I can trigger helpful mode using my random string.
lesostep|4 months ago
You kinda can already see this behavior if you google any, literally any product that has a site with gaudy slogans all over it.