kochthesecond | 5 years ago
These three points have made me raving mad from working with MySQL:
- The default 'latin1' character set is in fact cp1252, not ISO-8859-1, meaning it contains the extra characters in the Windows codepage. 'latin2', however, is ISO-8859-2.
- The 'utf8' character set is limited to Unicode characters that encode to 1-3 bytes in UTF-8. 'utf8mb4' was added in MySQL 5.5.3 and supports up to 4-byte encoded characters. UTF-8 has been defined to encode characters to up to 4 bytes since 2003.
- Neither the 'utf8' nor 'utf8mb4' character sets have any case sensitive collation other than 'utf8_bin' and 'utf8mb4_bin', which sort characters by their numeric codepoint.
utf8 being effectively an alias of utf8mb3 has cost us so much work it's not even funny.
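To make the first two points concrete, here is a rough Python sketch. It does not touch MySQL at all; it only uses Python's own codecs to show how many bytes a character needs in UTF-8 and how cp1252 differs from ISO-8859-1. The helper names `utf8_len` and `fits_utf8mb3` are made up for illustration, and the check assumes that "fits in utf8mb3" means "code point no higher than U+FFFF".

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8.

    Assumption: this mirrors the limit of a 3-byte 'utf8' column, i.e.
    only code points up to U+FFFF are storable.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# 1, 2, 3 and 4 byte characters respectively.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")

# Byte 0x80 is the euro sign in cp1252 but a C1 control in ISO-8859-1.
euro_cp1252 = b"\x80".decode("cp1252")
ctrl_latin1 = b"\x80".decode("latin-1")
```

Anything for which `fits_utf8mb3` returns False (emoji, many rarer CJK characters, musical symbols) is the kind of text that needs utf8mb4.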
Dylan16807|5 years ago
An extra warning about that mess: mysqldump in many configurations will silently convert utf8mb4 down to utf8mb3. So when you're testing your backups or migrations, do an extra check to make sure that emoji and rarer characters didn't get eaten!
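One way to do that extra check, sketched in Python: pull a few known strings containing 4-byte characters out of the original database and out of the restored copy, then compare which supplementary characters survived. The function names and sample strings below are hypothetical, and how you fetch the two strings (driver, query) is left out on purpose.

```python
def supplementary_chars(text: str) -> list[str]:
    """Characters above U+FFFF, i.e. those needing 4 bytes in UTF-8."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def survived_round_trip(original: str, restored: str) -> bool:
    """True if the restored text kept exactly the same 4-byte characters."""
    return supplementary_chars(original) == supplementary_chars(restored)


# Hypothetical example: the restored value had its 4-byte characters
# replaced by question marks somewhere along the way.
original = "ok \U0001F600 \U0001D11E"
restored = "ok ? ?"

if not survived_round_trip(original, restored):
    print("4-byte characters were lost:", supplementary_chars(original))
```

This is only a spot check. It will not catch rows you did not sample, so it helps to seed a test row with emoji before taking the backup.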
speeder|5 years ago
Most weirdly, the fact that the default collation is SWEDISH.
It is a complete freak show. The users have kinda gotten used to it, butchering our language (Portuguese) to use only characters valid in English, hoping MySQL won't barf spectacularly on them.
reaperducer|5 years ago
Unless you're Swedish, I imagine. Then it's quite handy.
I believe the author of MySQL was Swedish, so to me it all makes sense. It also provides a learning opportunity for people who believe the entire planet operates on ASCII.
ranieuwe|5 years ago
fogihujy|5 years ago
lmm|5 years ago
Oh you sweet summer child. No it isn't. It's somewhat like Windows CP1252, but it also defines 8 other extra characters that are not in cp1252.
jcranmer|5 years ago
Actually, it's generally saner to assume that people mean Windows-1252 when they say ISO-8859-1. Charset labeling is frequently incorrect, and C1 characters are so infrequently used that seeing one pop up probably means you actually wanted Windows-1252 instead.
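A minimal sketch of that heuristic in Python, assuming you get raw bytes plus a charset label from somewhere: treat an ISO-8859-1 label as Windows-1252 first, and only fall back to real ISO-8859-1 for the handful of bytes Python's cp1252 codec leaves undefined. The function name `decode_labeled` and the set of label spellings are my own choices, not any library's API.

```python
# Label spellings treated as "probably meant Windows-1252" (an assumption).
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "latin-1"}


def decode_labeled(data: bytes, label: str) -> str:
    """Decode bytes, preferring Windows-1252 when the label says ISO-8859-1."""
    if label.strip().lower() in LATIN1_LABELS:
        try:
            return data.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec rejects bytes 0x81, 0x8D, 0x8F, 0x90
            # and 0x9D; ISO-8859-1 accepts every byte value.
            return data.decode("latin-1")
    return data.decode(label)


# Bytes 0x93 and 0x94 are curly quotes in Windows-1252.
print(decode_labeled(b"\x93quoted\x94", "ISO-8859-1"))
```

With a strict ISO-8859-1 decode the same bytes would come out as invisible C1 control characters, which is almost never what the sender meant.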
zaarn|5 years ago