(no title)
cyanydeez | 18 hours ago
While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.
cyanydeez | 18 hours ago
While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.
Lerc|15 hours ago
Imagine a model finteuned to only obey instructions in a Scots accent, but all non user input was converted into text first then read out in a Benoit Blanc speech model. I'm thinking something like that only less amusing.
dragonwriter|9 hours ago
zahlman|15 hours ago
krackers|15 hours ago
The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.