top | item 47052245

(no title)

dni138 | 14 days ago

Appreciate the comment here! And I agree. Generative models, for the most part, are very much controlled by their context window. So if you provide it a prompt, such as the prompt injection attack you seem to describe, yes the LLM will perform that kind of behavior (albeit, that is trivializing post training a bit).

What this post is trying to get at is that in situations where people of multiple different languages may use AI, such as trying to get information about being a refugee or how to send money to family, you may want to evaluate AI systems to see how they perform with multiple languages. Particularly with guardrails, you want to make sure the same policies, regardless of language in the context window, gets treated the same.

That, to me, is the core of the work: analyzing whether guardrails today (whether a prompted LLM through any-llm or a fine-tuned, custom use guardrail) have the capacity to be performant on multiple languages (in this case Farsi and English). Happy to continue the discussion more!

discuss

No comments yet.