(no title)
thewopr | 3 years ago
My first thought here was to somehow separate instruct and data in how the models are trained. But in many ways, there is no (??) way to do that in the current model construct. If I say "Write a poem about walking through the forest", everything, including the data part of the prompt "walking through the forest" is instruct.
So you couldn't create a safe model which only takes instruct from the model owner, and can otherwise take in arbitrary information from untrusted sources.
Ultimately, this may push AI applications towards information and retrieval-focused task, and not any sort of meaningful action.
For example, I can't create a AI bot that could send a customer monetary refunds as it could be gamed in any number of ways. But I can create an AI bot to answer questions about products and store policy.
gwern|3 years ago
quanticle|3 years ago
Why wouldn't someone be able to game your bot's responses about refunds and store policy in exactly the same way? Then, when the customer really does come in with a return or refund request, you're forced into a dilemma where either you grant the refund (and accept that your store policy isn't the written policy, but rather whatever your bot can be manipulated into saying is your written policy) or you refuse the refund, and the customer walks away angry, because your own bot told them something that you're now contradicting.