top | item 45202560

(no title)

horizion2025 | 5 months ago

Isn't that just another guardrail that can be bypassed, much as the current guardrails are quite easily bypassed? Detecting a prompt is not easy. Note one of the recent prompt injection attacks, where the injection was a base64-encoded string hidden deep within an otherwise accurate log file. The LLM, given the Jira ticket with the attached trace, decided as part of its analysis to decode the base64 and was led astray by the resulting prompt. Of course, a hypothetical detector LLM could try to spot such prompts, but it seems it would have to be as intelligent as the target LLM anyway, and would therefore be subject to prompt injection too.
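To make the log file case concrete, here is a rough sketch of the kind of naive pre-filter one might bolt on: scan the text for base64-looking runs and decode them so they can be inspected before the LLM sees them. The function name `decoded_candidates` and the 24-character threshold are my own assumptions, not anything from the actual incident.

```python
import base64
import binascii
import re

# Runs of 24+ base64 alphabet characters, optionally padded.
# The length threshold is an arbitrary assumption for this sketch.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def decoded_candidates(log_text):
    """Return printable UTF-8 strings recovered from base64-looking runs."""
    found = []
    for match in B64_RUN.finditer(log_text):
        blob = match.group(0)
        try:
            # validate=True rejects characters outside the base64 alphabet.
            text = base64.b64decode(blob, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if text.isprintable():
            found.append(text)
    return found
```

This only surfaces the plain case. Something still has to decide whether the decoded text is an instruction, and a payload that is split across lines, double-encoded, or in another encoding passes straight through, which is the same problem again.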

wrs|5 months ago

Yep.

https://gandalf.lakera.ai/baseline

Huppie|5 months ago

This is genius, thank you.

dotancohen|5 months ago

It took me days to complete!

darepublic|5 months ago

We need the Severance code detector.

brianjking|5 months ago

Wearing my Lumon pin today.