
The Mailocalypse Is Upon Us: Why Isn’t All Mail UTF-8?

46 points | uggedal | 17 years ago | lamsonproject.org

26 comments

[+] pilif|17 years ago|reply
I may be overly conservative here, but IMHO an MTA should not touch the body of the message.

It should not even touch the headers besides adding a Received-header to the top.

Re-encoding every message that passes through is definitely not what I call "not touching".

I know that there are some exceptions (e.g. 8BITMIME), but I still think that mail servers should keep their hands off what is passing through them.

Like mailmen who are not supposed to open the envelope, read the letters and reprint them using a nicer font on nicer paper :-)

[+] mrduncan|17 years ago|reply
I don't see a problem with it if the translation is 1-to-1 (and that's a pretty big if). Disregarding federal law for the moment, what is the harm in the mailman reprinting your letters in a nicer font on nicer paper? It seems to me that if the mailman wants to do that, then as long as none of the information in the original letter is lost, I would benefit from having an easier-to-read letter.
[+] grandalf|17 years ago|reply
What is lamson doing with mails that makes just leaving them in their original charset unworkable?
[+] aristus|17 years ago|reply
Processing them in Python. :) No, really. Python is horrible at dealing with strings of different encodings. You could generalize that to any complex app with lots of data sources: the only sane way to do it is to convert everything to a single encoding at the door.
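For illustration, "convert everything to a single encoding at the door" might look roughly like the sketch below. This is not Lamson's actual code; the helper name `to_utf8` and the sample body are made up for the example, and it assumes the declared charset is correct.

```python
def to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Hypothetical helper: normalize an incoming body to UTF-8.

    Decodes using the charset the message claims to use, then re-encodes.
    Raises UnicodeDecodeError if the bytes do not match the declared
    charset, and LookupError if Python does not know the charset name.
    """
    text = raw.decode(declared_charset)
    return text.encode("utf-8")


# The same word in two encodings: one byte for the e-acute in Latin-1,
# two bytes in UTF-8.
latin1_body = "caf\u00e9".encode("iso-8859-1")
utf8_body = to_utf8(latin1_body, "iso-8859-1")
```

After this step the rest of the application only ever sees one encoding, which is the appeal. The cost is that the bytes leaving the server are no longer the bytes that arrived.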

(edit) whups, I hadn't thought about PGP/etc signatures. sigh

[+] dnewcome|17 years ago|reply
Here be dragons. Be afraid. Be very afraid.
[+] patio11|17 years ago|reply
Everything you say and more. Internationalization, ick -- anyone who thinks this is easy has no clue how deep this rabbit hole goes.

On Han unification: the Japanese reluctance to this is partly because they're being told "Some of your national literature needs to die so that our data standard can live. Deal." and partly because they're being told "What's with all the resistance, you xenophobic bastards, get with the effing program already.", generally by people who they perceive as not quite getting the issue.

All the educated Americans in the room have read Romeo and Juliet, right? Remember the balcony scene? Remember the word in the balcony scene that you have never heard in any other context?

O Romeo, Romeo, Wherefore art thou Romeo?

Imagine being told "For technical reasons, we're standardizing computers away from being able to accept 'Wherefore' as input or output. As a workaround, we suggest using "why", or perhaps putting the word in an image file and pasting it in when it is required. Most people don't use "wherefore" anyhow and, if you routinely do, you can modify your editing software to accommodate it, as long as it doesn't have to interface with any other computer ever. Oh, by the way, some other words you know are also going to stop working. It's nothing major. Well, OK, 'Gertrudes' might find it somewhat annoying but we've got a nice selection of names from Aluicious to Xavier and, if all else fails, you can spell it phonetically because your language is capable of that, too, and don't pretend otherwise."

[+] stcredzero|17 years ago|reply
I should start an 'ocalypse of the week site.
[+] gchpaco|17 years ago|reply
Among the problems that this has, it will shatter PGP and S/MIME signatures silently.
[+] dfranke|17 years ago|reply
Ding ding ding! You win the thread. It'll break DKIM/DomainKeys too if you have 8-bit characters in your headers.

Paws off my mail, Zed.
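A rough illustration of the signature problem raised in this subthread: a signature is computed over exact bytes, so re-encoding changes what the verifier hashes even when the text is unchanged. The sketch uses a plain SHA-256 digest as a stand-in for the hashing step; it is not real PGP, S/MIME, or DKIM verification, and the sample text is invented.

```python
import hashlib

# Body as the sender wrote and signed it (Latin-1 bytes).
original = "Gr\u00fc\u00dfe".encode("iso-8859-1")

# Body after a server "helpfully" re-encodes it to UTF-8.
reencoded = original.decode("iso-8859-1").encode("utf-8")

# Stand-in for the digest a signature would cover.
digest_signed = hashlib.sha256(original).hexdigest()
digest_received = hashlib.sha256(reencoded).hexdigest()

# The human-readable text is identical...
same_text = original.decode("iso-8859-1") == reencoded.decode("utf-8")
# ...but the bytes, and therefore the digests, are not.
signature_still_valid = digest_signed == digest_received
```

Here `same_text` is true while `signature_still_valid` is false, which is the "silent" part: nothing looks wrong to a reader, only to the verifier.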

[+] dmm|17 years ago|reply
Is this guy MIME encoding everything in base64 or quoted-printable?

Last time I checked, email could only be 7-bit ASCII because of many legacy servers.
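For reference, both transfer encodings mentioned here turn arbitrary bytes into 7-bit-safe ASCII. A small sketch using Python's standard `quopri` and `base64` modules (the sample payload is made up):

```python
import base64
import quopri

# UTF-8 bytes containing a non-ASCII character (e-acute).
payload = "caf\u00e9".encode("utf-8")

# Quoted-printable keeps ASCII readable and escapes other bytes as =XX.
qp = quopri.encodestring(payload)

# Base64 maps every 3 input bytes to 4 ASCII characters.
b64 = base64.b64encode(payload)


def is_7bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. is plain ASCII."""
    return all(byte < 128 for byte in data)
```

Both `qp` and `b64` pass `is_7bit`, while the raw `payload` does not, and both decode back to the original bytes with `quopri.decodestring` and `base64.b64decode`.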

[+] vidarh|17 years ago|reply
I'd be interested in seeing estimates on how many such servers are still being used. I've never come across one.
[+] prodigal_erik|17 years ago|reply
Drop badly encoded email on the floor because you hope it's probably spam? That deliberately violates Postel's Law, which is what keeps this mess mostly working.
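To make the two policies concrete, here is a minimal sketch of a strict check versus a lenient decode. The function names are hypothetical, and real mail handling has more cases (missing charset, per-part charsets, encoded headers) than this covers.

```python
def classify(raw: bytes, declared_charset: str) -> str:
    """Strict policy: flag anything that fails to decode as declared."""
    try:
        raw.decode(declared_charset)
        return "ok"
    except (UnicodeDecodeError, LookupError):
        # Wrong bytes for the charset, or a charset name Python doesn't know.
        return "badly-encoded"


def decode_leniently(raw: bytes, declared_charset: str) -> str:
    """Liberal policy: always return text, marking undecodable bytes."""
    try:
        # Undecodable bytes become U+FFFD instead of raising.
        return raw.decode(declared_charset, errors="replace")
    except LookupError:
        # Unknown charset name: Latin-1 maps every byte to some character.
        return raw.decode("latin-1")
```

The strict version is what "drop it on the floor" would key off; the lenient version is the be-liberal-in-what-you-accept alternative, at the price of possibly showing replacement characters to the reader.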

And doesn't multipart/signed rely on knowing the actual charset the signer was using?

[+] calambrac|17 years ago|reply
Umm, no. Drop badly encoded email because most of it is spam. Or, at least, that's the hypothesis that he's asking you to help verify. Did you even read the article?
[+] jrockway|17 years ago|reply
If I hear about one more trivial issue that's called a something-pocalypse...