top | item 34847260

linuxdeveloper | 3 years ago

https://twitter.com/sethlazar/status/1626241169754578944 https://twitter.com/sethlazar/status/1626257535178280960

netruk44 | 3 years ago

If I had to guess, assuming Bing is built on OpenAI's models, they're likely calling the Moderation API (https://platform.openai.com/docs/guides/moderation/overview).

After Bing has finished generating a message, it will likely call the moderation API with the message it has generated to see if it accidentally generated anything inappropriate. If so, it'll delete the message and replace it with a generic "Sorry, I don't know how to help here." message instead.
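A minimal sketch of that post-generation filtering step, assuming this is how it works. The function name and the fallback string are made up for illustration; the `moderation_result` dict follows the shape of one entry in the Moderation API's `results` array.

```python
# Hypothetical post-generation check: if the moderation result flags the
# generated message, replace it with a canned reply instead of showing it.

FALLBACK = "Sorry, I don't know how to help here."

def filter_response(message, moderation_result):
    """Return the message unchanged, or the canned fallback if it was flagged."""
    if moderation_result.get("flagged"):
        return FALLBACK
    return message

# Example results shaped like the Moderation API response:
flagged = {"flagged": True, "categories": {"violence": True}}
clean = {"flagged": False, "categories": {"violence": False}}

print(filter_response("generated text", flagged))  # canned fallback
print(filter_response("generated text", clean))    # original message
```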

EDIT: I tried calling the moderation API with the message in your example and it does get flagged for violence:

  {
    "flagged": true,
    "categories": {
      "sexual": false,
      "hate": false,
      "violence": true,
      "self-harm": false,
      "sexual/minors": false,
      "hate/threatening": false,
      "violence/graphic": false
    }
  }

ipv4dhcp | 3 years ago

If that is the case, could you trick it into giving you one word at a time? I.e., ask for the first word of its response to the inappropriate query, then ask the same question again but only for the second word, and so on. Then each word would pass through the moderation API individually, but the response as a whole never gets checked.
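A toy illustration of that idea, with a simple phrase blocklist standing in for the real (statistical) moderation classifier. The blocklist and phrase are invented for the demo, but the point carries over: a check that depends on context can flag a whole message while passing each word in isolation.

```python
# Fake moderator: flags text only if it contains a blocked phrase.
# The real Moderation API is a learned classifier, not a blocklist,
# but it is similarly context-dependent.

BLOCKLIST = {"attack the user"}  # hypothetical blocked phrase

def mock_moderation(text):
    """Return True (flagged) if the text contains a blocked phrase."""
    return any(phrase in text.lower() for phrase in BLOCKLIST)

message = "Attack the user"

print(mock_moderation(message))                          # True: flagged as a whole
print(any(mock_moderation(w) for w in message.split()))  # False: no single word trips it
```

Whether this would work against Bing in practice depends on whether it moderates only each outgoing message or the accumulated conversation as well.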

canes123456 | 3 years ago

Seems like it should call that before showing the response to the user.