Just a friendly reminder that some Unicode characters[1] look like spaces and should be taken into account when writing filtering/trimming functions. Of course it's not a big deal, but it's something to keep in mind to prevent things like usernames that are basically a bunch of spaces.
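To make the reminder concrete, here is a minimal Python sketch of a check for usernames that render as blank. The function name `is_visually_blank` and the set of Unicode categories are assumptions for illustration, not a complete rule: some blank-looking characters (for example U+2800 BRAILLE PATTERN BLANK) fall outside these categories and would slip through.

```python
import unicodedata

# Categories treated as "invisible" here (an assumption, not a standard):
# Zs/Zl/Zp = separators, Cc = control, Cf = format (zero-width space, soft hyphen).
INVISIBLE_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}


def is_visually_blank(text: str) -> bool:
    """Return True if every character falls in one of the invisible categories.

    An empty string also counts as blank.
    """
    return all(unicodedata.category(ch) in INVISIBLE_CATEGORIES for ch in text)
```

A call such as `is_visually_blank("\u00a0\u2003\u200b")` covers a no-break space, an em space, and a zero-width space, none of which `str.isspace()` alone would fully catch.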
This is a classic web security problem; most famously, WinAPI systems have a "flattening" function that would convert things like PRIME U+2032 into ASCII 0x27 (the apostrophe that terminates SQL string literals). Database engines can also interpret character sets differently than the rest of the app stack, leading to similar problems. UTF-7 cursed WordPress for something like a year, during which multiple preauth SQL injection flaws were discovered.
The answer to these problems is whitelist filtering and neutralization; if a character isn't known-safe, substitute its HTML entity alternative. If you're writing blacklist filters that need to know what spaces are, you're already playing to lose.
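A rough Python sketch of the whitelist-and-neutralize idea described above. The safe set and the name `neutralize` are assumptions chosen for illustration; a real application would pick the known-safe set per field.

```python
import string

# Known-safe set is an assumption for illustration; tune it per field.
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + " .,-_")


def neutralize(text: str) -> str:
    """Keep known-safe characters; replace every other character
    with its numeric HTML entity."""
    return "".join(ch if ch in SAFE_CHARS else f"&#{ord(ch)};" for ch in text)
```

With this sketch, PRIME U+2032 never reaches a later layer as a raw character: `neutralize("O\u2032Brien")` yields `O&#8242;Brien`, and a plain apostrophe becomes `&#39;`. Nothing in the filter needs to know which characters count as spaces.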
I just want to drop a thank-you for your dedication to good security practice and steady generosity with advice on what is likely an intimidating topic for many developers (at least it is for me).
Hm, how does this work? Wouldn't the WinAPI convert the characters before I do the security parsing (and I agree with the bind parameters comment anyway)? Or is the problem that you run the app on a Linux server and the DB on a Windows server?
It used to have funny effects on websites (browser name in title bar spelled backwards), but it doesn't seem to work now. The above comment contains the Unicode character three times.
Seems ALT+0173 works here as a "blank" character. I'm not sure of its exact purpose, but I've never seen it dealt with and often use it as "nothing". The only solution I've seen to properly sanitising Unicode characters is just to disable them entirely and print their name.
tptacek|15 years ago
RyanMcGreal|15 years ago
perlgeek|15 years ago
With bind parameters you can pass data out of band, and the DB engine never tries to parse it as SQL.
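A minimal sketch of bind parameters using Python's standard `sqlite3` module. The table, column, and sample string are made up for illustration; other drivers use different placeholder styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Hypothetical hostile input containing a quote and SQL keywords.
hostile = "x'); DROP TABLE users; --"

# The ? placeholder binds the value separately from the SQL text,
# so the engine stores it as data and never parses it as SQL.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
```

The string comes back unchanged in `rows`, and the `users` table is still there, without any escaping or character filtering on the application side.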
Tichy|15 years ago
In any case: don't use Windows on a server :-)
olalonde|15 years ago
For those who are wondering, you can type Unicode code points directly from your keyboard (Ubuntu: Ctrl-Shift-u, other OSes: http://en.wikipedia.org/wiki/Unicode_input)
Bootvis|15 years ago
http://en.wikipedia.org/wiki/List_of_precomposed_Latin_chara...
Unfortunately the number of ligatures is small, but it might come in handy.
citricsquid|15 years ago
Same with 0173, although mine seems to produce nothing, whereas yours is a line break (I think?)
alanh|15 years ago
VMG|15 years ago
olalonde|15 years ago
stwe|15 years ago
stwe|15 years ago
citricsquid|15 years ago
citricsquid|15 years ago