A corollary to this is that if we had a simpler function for converting between UTF-8 and UTF-16LE, then I could remove all uses of iconv from my code, since I only use it to convert to/from MS Windows formats. (iconv's API is ugly and difficult to use correctly.)
Man, if English were the only human language in the world, who would need UTF-8?
The other encodings exist because they are more efficient for other languages, especially Chinese, Japanese, and Korean. For those scripts, UTF-8 takes about 50% more space than the legacy alternatives. Too bad modern Linux systems only support UTF-8 locales.
> Man, I wish everything was UTF-8 so iconv would not be needed anymore. Too bad it's defined in POSIX.
I wish nothing was in UTF-8 and Unicode was relegated to properties files. There are codebases out there with complete i18n and l10n in more languages than most here have ever worked with, where zero Unicode characters are allowed in source code files (with pre-commit hooks preventing such files from being committed).
Bruce Schneier was right all along when he said, back in 1998 or whenever it was: "Unicode is too complex to ever be secure".
We've seen countless exploits based on Unicode. The latest (re)posted here on HN was a few days ago: a Unicode-parsing issue affecting OpenSSL. Why was that code there at all? To support internationalized domain names and/or internationalized emails.
Something that should never have been authorized.
We don't need more of what brings countless security exploits: we need less of it.
Relegate Unicode to translation/properties files, where it belongs.
Sure, Unicode is great for documents, chat, etc.
But everything in UTF-8? emails? domain names? source code? This is madness.
I don't understand how anyone can regard the fact that HANGUL fillers are valid in source code as somehow a great win for our industry.
That's the other direction (legacy charset conversion to UCS-4 or UTF-8), and that direction is often reachable via the charset parameter in the Content-Type header and similar MIME contexts.
I seriously doubt you can make PHP convert anything to that exotic charset automatically, even with creative configuration, and I'm pretty sure it wouldn't do anything of the sort in a common configuration. What I suspect is going on is that the author is interested in exploiting the PHP engine, assumes PHP code that uses iconv(), and wants to talk about how to get from there to full-scale RCE. It is indeed a fascinating and non-trivial topic, though the relationship between this particular CVE and the PHP angle is rather coincidental: any buffer overflow would do, it's just that the author happened to have one in a reasonably common function.
My guess is that it's application specific: PHP applications that use the iconv function in some specific way, in some specific context, will be vulnerable.
Karellen | 1 year ago
Well, a conforming implementation could just return -1 with errno set to EINVAL from `iconv_open()` for any given pair of character codes.
https://manpages.debian.org/bookworm/manpages-dev/iconv_open...
fweimer | 1 year ago
HTTP theoretically supports Accept-Charset, but it's deprecated:
https://www.rfc-editor.org/rfc/rfc9110.html#name-accept-char...
But I think on-the-fly charset conversion in the web server is quite rare. Apache httpd does not seem to implement it: https://httpd.apache.org/docs/2.4/content-negotiation.html#m...
The charset in question does not have a locale associated with it (it's not even ASCII-transparent), so I don't think it's usable in a local context together with SUID/SGID/AT_SECURE programs.
lyu07282 | 1 year ago
https://www.php.net/manual/en/function.iconv.php
pengaru | 1 year ago
https://en.wikipedia.org/wiki/Magic_quotes
saagarjha | 1 year ago
Wonder what the story is here. Burned 0-day? Not worth exploiting? lolz?