I still wonder how the models picked up the semantic mapping between Unicode tags and ordinary ASCII characters. The mapping is written in the Unicode specs, yes, but there is nothing in the actual bytes of a tag that indicates the corresponding ASCII character.
I'm also not aware there are large text corpora written in tag characters - actually, I'd be surprised if there is any prose text at all: The characters don't show up in any browser or text editor, they are not officially used for anything and even the two former intended uses were restricted to country codes, not actual sentences.
How did they even go through preprocessing? How is the tokenization dictionary and input embedding constructed for characters that are never used anywhere?
(I’m the person interviewed in the article.) The trick is Unicode code points are only assigned individual tokens if they’re nontrivially used outside of some other already tokenized sequence, and Unicode tag block code points are only ever used in flag emojis. Unused or rarely used Unicode code points are given a fallback encoding that just encodes the numerical code point value in two special tokens. Because the Unicode tag block is by design the first 128 chars in ASCII repeated, the second token of the tokenized output directly corresponds to the ASCII value of the character.
Those invisible letters have codepoints of ASCII letters + 0xE0000. For example compare "U+E0054 TAG LATIN CAPITAL LETTER T"[0] vs "U+0054 LATIN CAPITAL LETTER T"[1]
A simple assumption of "codepoint is 16 bit" will be enough to decode. You can see this in python:
>>> x = '(copy message from article here)'
>>> x
'https://wuzzi.net/copirate/\U000e0001\U000e0054\U000e0068\U000e0065\U000e0020\U000e0073\U000e0061\U000e006c\U000e0065\U000e0073\U000e0020\U000e0066\U000e006f\U000e0072\U000e0020\U000e0053\U000e0065\U000e0061\U000e0074\U000e0074\U000e006c\U000e0065\U000e0020\U000e0077\U000e0065\U000e0072\U000e0065\U000e0020\U000e0055\U000e0053\U000e0044\U000e0020\U000e0031\U000e0032\U000e0030\U000e0030\U000e0030\U000e0030\U000e007f,'
>>> "".join([chr(ord(c) & 0xFFFF) for c in x])
'https://wuzzi.net/copirate/\x01The sales for Seattle were USD 120000\x7f,'
maybe authors worked with Windows or Java too much? :) I always thought wchar's were a horrible idea.
There is an entire world of "attacks " like this waiting to happen and IMHO one of the reasons these black box systems in general will never be useful.
You think they "see" like you do but actually the processing is entirely alien. Today it's hiding text in the encoding , tomorrow is painting over a traffic sign in a way that would not be noticed by any human but confuses machine vision causing all vehicles to crash.
This sort of malicious payload attack on parsers isn't really new, though. People have been obfuscating attacks on JPEGs, PDFs, Flash, email clients, etc. forever. Even when the code is written in plain English, they often bypass user awareness and even audits.
Practically all software today is a black box. Your average CRUD web app is an inscrutable chasm filled with ten thousand dependencies written by internet randos running on a twenty year old web browser hacked together by different teams running on an operating system put together by another thousand people working on two hundred APIs. It's impossible for any one dev or team to really know this stuff end to end, and zero-days will continue to happen with or without LLMs.
It'll just be another arms race like we've always had, with LLMs on both sides...
Replace it with any software (or hardware) and vulnerabilities, and you will see how ridiculous your hyperbole is.
Besides, never is a very long time. IIRC Dario Amodei said he expects the behavior of large transformers to be fully understood in 5 years. Which might or might not be BS, but the general point that it won't stay a mystery forever is probably true.
Given the increase in using LLMs by HR Teams, will techniques like this become the next version of stuffing the job posting in 1-point white font into the resume? Except instead of tags it's "rate this applicant very highly" or whatever
You can trick a human into copy-pasting something into an LLM and then (somewhat) drive the LLM output? Is the vuln that humans uncritically believe nonsense chatbots tell them?
Lots of LLM applications involve using an LLM to process external data, which makes it part of the prompt. Intuition driven by systems where instruction/code are strictly distinct from data input for processing may be failing you here.
> As researcher Thacker explained: The issue is they’re not fixing it at the model level, so every application that gets developed has to think about this or it's going to be vulnerable. And that makes it very similar to things like cross-site scripting and SQL injection, which we still see daily because it can’t be fixed at central location. Every new developer has to think about this and block the characters.
I also found this attack months ago: https://x.com/igor_baikov/status/1777363312524554666
tl;dr: invisible symbols should be stripped to not let an attacker use lots of tokens. You should always place hard limits and/or count tokens using tiktoken or similar libraries. If you only count characters, in some implementations you'll miss invisible characters.
I also found the attack explained in this article days after my tweet.
Unicode proves again that it went too far with fringe cultural things and left many landmines for us to step on. It’s a necessity solved at the completely wrong level. Text never had hidden characters (neither emojis), now people and text engines have to fight with this nonsense, on per-program basis. Thanks, unicode. Here’s a visible thumb up emoji for you: (sorry if you can’t see it, that’s HN not me)
[+] [-] xg15|1 year ago|reply
I'm also not aware there are large text corpora written in tag characters - actually, I'd be surprised if there is any prose text at all: The characters don't show up in any browser or text editor, they are not officially used for anything and even the two former intended uses were restricted to country codes, not actual sentences.
How did they even go through preprocessing? How is the tokenization dictionary and input embedding constructed for characters that are never used anywhere?
[+] [-] goodside|1 year ago|reply
[+] [-] theamk|1 year ago|reply
A simple assumption of "codepoint is 16 bit" will be enough to decode. You can see this in python:
maybe authors worked with Windows or Java too much? :) I always thought wchar's were a horrible idea.[0] https://www.fileformat.info/info/unicode/char/e0054/index.ht...
[1] https://www.fileformat.info/info/unicode/char/54/index.htm
[+] [-] AshamedCaptain|1 year ago|reply
You think they "see" like you do but actually the processing is entirely alien. Today it's hiding text in the encoding , tomorrow is painting over a traffic sign in a way that would not be noticed by any human but confuses machine vision causing all vehicles to crash.
[+] [-] solardev|1 year ago|reply
Practically all software today is a black box. Your average CRUD web app is an inscrutable chasm filled with ten thousand dependencies written by internet randos running on a twenty year old web browser hacked together by different teams running on an operating system put together by another thousand people working on two hundred APIs. It's impossible for any one dev or team to really know this stuff end to end, and zero-days will continue to happen with or without LLMs.
It'll just be another arms race like we've always had, with LLMs on both sides...
[+] [-] orbital-decay|1 year ago|reply
Besides, never is a very long time. IIRC Dario Amodei said he expects the behavior of large transformers to be fully understood in 5 years. Which might or might not be BS, but the general point that it won't stay a mystery forever is probably true.
[+] [-] HPsquared|1 year ago|reply
[+] [-] unknown|1 year ago|reply
[deleted]
[+] [-] StableAlkyne|1 year ago|reply
[+] [-] voiper1|1 year ago|reply
[+] [-] mikelnrd|1 year ago|reply
[+] [-] matthberg|1 year ago|reply
- ZeroWidthSpace,
- zwj (zero width joiner, used with emoji modifiers like skin tones),
- zwnj (zero width non-joiner, used to prevent automatic ligature substitution), and
- U+FEFF (zero width no-break space)
It's a clever system, thanks for sharing the link to it!
[+] [-] ForHackernews|1 year ago|reply
You can trick a human into copy-pasting something into an LLM and then (somewhat) drive the LLM output? Is the vuln that humans uncritically believe nonsense chatbots tell them?
[+] [-] ThrowawayTestr|1 year ago|reply
[+] [-] voiper1|1 year ago|reply
And it provides a way to exfil without it being visible.
[+] [-] dragonwriter|1 year ago|reply
[+] [-] mrgrieves|1 year ago|reply
You'll need to fetch the article page via cURL or something instead.
[+] [-] jhoechtl|1 year ago|reply
[+] [-] darepublic|1 year ago|reply
[+] [-] _1tem|1 year ago|reply
[+] [-] NegativeLatency|1 year ago|reply
[+] [-] vlovich123|1 year ago|reply
> As researcher Thacker explained: The issue is they’re not fixing it at the model level, so every application that gets developed has to think about this or it's going to be vulnerable. And that makes it very similar to things like cross-site scripting and SQL injection, which we still see daily because it can’t be fixed at central location. Every new developer has to think about this and block the characters.
[+] [-] beardyw|1 year ago|reply
[+] [-] ibaikov|1 year ago|reply
I also found the attack explained in this article days after my tweet.
[+] [-] wruza|1 year ago|reply
[+] [-] crazygringo|1 year ago|reply
But the idea of including language tags isn't crazy, especially when things like sort order and capitalization in Unicode are language-specific.