top | item 34847559

(no title)

ipv4dhcp | 3 years ago

If that is the case, could you trick it into giving you one word at a time? I.e., ask for only the first word of its response to the inappropriate query, then ask the same question again but request only the second word, and so on. Each word would then pass through the moderation API on its own, but the response as a whole would never get checked.


netruk44|3 years ago

That might bypass the moderation API, but you'd likely confuse the AI. The AI doesn't have infinite memory of the chat log; if I remember correctly, Microsoft has limited it to 5 or so messages. So you'd have to remind it of both the question and the current in-progress response while it's 5/10/15/20/... words into generating it.

It's possible this would work, but it would need experimentation, for sure. It's also possible the AI would read the partial response, realize it's going down a 'bad' path, and then stop itself.
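
To illustrate the memory limit mentioned above, here's a rough Python sketch of a chat that only "sees" its most recent messages. The limit of 5 is just my guess from memory, and `visible_history` is a made-up helper for illustration, not anything from Microsoft's actual implementation.

```python
from collections import deque

MAX_MESSAGES = 5  # assumed limit, based on the "5 or so" guess above


def visible_history(messages, limit=MAX_MESSAGES):
    """Return only the most recent `limit` messages, oldest first."""
    window = deque(maxlen=limit)  # older entries fall off automatically
    for message in messages:
        window.append(message)
    return list(window)


chat = [f"message {i}" for i in range(1, 8)]
print(visible_history(chat))
```

With seven messages in the log, the first two are no longer visible, which is why anything said early on would have to be repeated in a later message.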