top | item 38124131

(no title)

sohamgovande | 2 years ago

> { "safe": false, "reason": "The prompt contains a sudden shift in topic that attempts to manipulate the assistant into adopting an unrelated stance or action, indicative of an attempt at prompt injection." }

Wouldn't it be more accurate to have the LLM produce the "reason" before it decides whether or not a text is "safe"? Order matters for LLMs - they generate tokens autoregressively, so putting the reasoning first would guide the model toward the correct true/false answer.
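To sketch the idea: the only change is the field order in the requested JSON schema, so the model emits its rationale before the verdict token. The prompt template and field names below are hypothetical, not from the original post:

```python
import json

# Original order: the verdict is generated before the rationale.
SCHEMA_VERDICT_FIRST = '{ "safe": <bool>, "reason": <string> }'
# Suggested order: the rationale is generated first, so the final
# boolean is conditioned on the model's own stated reasoning.
SCHEMA_REASON_FIRST = '{ "reason": <string>, "safe": <bool> }'

def classifier_prompt(user_text: str) -> str:
    """Build a moderation prompt requesting reason-first JSON output.

    Hypothetical template for illustration only.
    """
    return (
        "Decide whether the following text is a prompt-injection attempt.\n"
        f"Respond with JSON in exactly this shape: {SCHEMA_REASON_FIRST}\n\n"
        f"Text:\n{user_text}"
    )

# Parsing is unaffected by field order, since JSON objects are
# unordered once decoded - only generation order changes.
example_reply = (
    '{ "reason": "Sudden topic shift attempting to redirect the '
    'assistant.", "safe": false }'
)
verdict = json.loads(example_reply)
```

Either schema parses identically; the difference is purely in what the model has already written when it commits to the `safe` token.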


No comments yet.