Falsehoods programmers believe about email

[+] jph|3 years ago|reply

An email falsehood surprised me recently: I thought a case-insensitive email address can be compared by using pseudocode `lower(x)`. But that's false.

An email system that guarantees case-insensitive email addresses can still fail during comparisons of lowercase-to-lowercase, due to international encodings, locales, I18N, L10N, etc.

Pseudocode:

    string-compare-case-insensitive(x, y) => true // Right way

    string-compare(lower(x), lower(y)) => false // Wrong way

It turns out this issue is called a "case folding non-deterministic" error, and is a broader issue with strings in general.

For more about "case folding" with Unicode: http://www.unicode.org/Public/UNIDATA/CaseFolding.txt

For more about "non-deterministic" comparisons: https://www.postgresql.org/docs/current/collation.html#COLLA...
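
To make the lowercase-versus-case-folding difference concrete, here is a minimal Python sketch. It relies only on the built-in `str.lower()` and `str.casefold()`; the helper name `equal_ignoring_case` is made up for illustration, and nothing here says how any particular mail system actually compares addresses.

```python
def equal_ignoring_case(x: str, y: str) -> bool:
    """Compare two strings using Unicode case folding instead of lowercasing."""
    # casefold() applies the Unicode full case folding mappings,
    # e.g. German sharp s folds to "ss".
    return x.casefold() == y.casefold()


# Lowercasing both sides leaves the sharp s in place, so the strings differ.
lowered_match = "Straße".lower() == "STRASSE".lower()

# Case folding maps both strings to the same folded form.
folded_match = equal_ignoring_case("Straße", "STRASSE")
```

Case folding alone still ignores Unicode normalization (composed versus decomposed characters), so treat this as a sketch of one failure mode rather than a complete comparison routine.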

[+] tremon|3 years ago|reply

> I thought a case-insensitive email address can be compared by using pseudocode lower(x)

You shouldn't be comparing the mailbox part of email addresses at all other than as literal bytestrings: you cannot know what equivalence rules the mailserver for that domain uses.

The domain part can be equivalence-tested using the normal rules for domains though, including case insensitivity, IDN translation and punycode resolution.
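
A rough Python sketch of that split, under some assumptions: the function names are hypothetical, the address is split on its last `@`, and the domain is normalized with Python's built-in `idna` codec, which implements the older IDNA 2003 rules and may not match what a given resolver or mail server does.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split on the last '@' into (local part, domain)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox address: {address!r}")
    return local, domain


def normalize_domain(domain: str) -> str:
    """Convert a domain to lowercase ASCII (punycode) form."""
    # The built-in "idna" codec leaves ASCII labels as-is, so lowercase afterwards.
    return domain.encode("idna").decode("ascii").lower()


def same_address(a: str, b: str) -> bool:
    """Local parts must match exactly; domains are compared after normalization."""
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    return local_a == local_b and normalize_domain(domain_a) == normalize_domain(domain_b)
```

With this sketch, `alice@EXAMPLE.com` and `alice@example.com` compare equal, while `Alice@example.com` and `alice@example.com` do not, because only the owning mail server knows whether those two mailboxes are the same.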

[+] tsimionescu|3 years ago|reply

This also depends a lot on why you are comparing those addresses. If you want for example to make sure that you don't easily allow the same person to register multiple accounts (say, to take advantage of a free trial period), then they are both wrong, since [email protected], [email protected], [email protected] etc. are all the same account and cost nothing to make.

However, if you just want to make sure this is the same user that signed in earlier, you get to choose what rules you want - it's their problem to some extent to remember what account name they gave you.

[+] systemvoltage|3 years ago|reply

The whole thing is a mess. I remember trying to make a Postgres email address column and wanted to make sure it could do comparisons either way, then found this Stack Exchange post that shattered my expectations of a clean, well-understood problem: https://dba.stackexchange.com/questions/68266/what-is-the-be...

[+] kuon|3 years ago|reply

I think the part before the @ is actually case-sensitive per the RFC, but most mail servers will treat it as case-insensitive. But I am not sure I am reading the RFC correctly. Citation:

> Verbs and argument values (e.g., "TO:" or "to:" in the RCPT command and extension name keywords) are not case sensitive, with the sole exception in this specification of a mailbox local-part.

[+] Gigachad|3 years ago|reply

Case insensitivity was a huge mistake in computing, really. Most languages don't have cases, and it's very nontrivial to convert between cases. We should have treated every char as completely unique.

Sure. At the user signup side, block emails that are too close in ways like case, but as a sender you should always treat them as unique emails.

[+] heurisko|3 years ago|reply

> string-compare-case-insensitive(x, y) => true // Right way

what is inside that function?

[+] landofredwater|3 years ago|reply

What would the mechanism behind case-insensitive string compare be? How would you program out all the edge cases?

[+] technion|3 years ago|reply

Judging by the number of times I've had this fight with developers, and seen other people argue it online, I'd suggest the biggest falsehood programmers believe is: Your random VPS can just send email from any address you like and expect it to be delivered.

[+] AtNightWeCode|3 years ago|reply

Or any SaaS for that matter. It takes some effort to get email sending right. It is also a field like SEO, where there is a large chance that you get lucky and things work even when the setup is incorrect.

[+] quickthrower2|3 years ago|reply

Yet that is common sense to anyone who enjoys not getting spam due to their spam filter!

[+] nix23|3 years ago|reply

That is only because you cannot set up a mail server correctly. I have installed hundreds of email servers (2022), from AWS to Hetzner to Vultr and so on. Yes, your random static IP can deliver email reliably IF you have:

-Static IP (4 and 6)

-Correct Reverse DNS for IP4 and 6

-Correct Hostname

-Site-verification for gmail/microsoft

-DMARC

-DKIM

-SPF for IP4 and 6

[+] mrmattyboy|3 years ago|reply

I'd suggest (as a falsehood):

Users always have immediate access to their mailboxes

I imagine lots of people do not have their email account attached to their phone. People may be on a shared computer (a library, perhaps) and not readily have access to the password (if it's randomly generated and stored at home), or their mail provider is blocked where they are (things like Hotmail etc. were blocked whilst I was in education).

I'd say there are lots of services that require you to validate your email address immediately after signing up, even where an email address is not required by the service itself. Having a grace period to verify your email in such circumstances is great, but I see it very infrequently.

[+] doodlesdev|3 years ago|reply

The grace period is also a source of many security vulnerabilities [0].

[0]: https://www.bleepingcomputer.com/news/security/hackers-can-h...

[+] JohnFen|3 years ago|reply

> I imagine lots of people do not have their email account attached to their phone

I certainly don't.

[+] oconnor663|3 years ago|reply

> An email address like ^_^@example.com or +&#@example.com is invalid

My current employer autogenerated a company email address for me including the apostrophe in my last name. I couldn't believe that was a legal character, but I looked it up, and sure enough it is. Of course, plenty of other internal systems reacted the same way I did, and I frequently generate errors whenever I try to register myself with random services :p

[+] wtmt|3 years ago|reply

I’ve seen a lot of systems, including corporate systems for internal use, reject apostrophes in email addresses (and sometimes even in other fields). Apparently the developers are too lazy to deal with strings properly and fear SQL injection attacks, and perhaps they don’t trust all the other systems they may interface with. So their escape hatch is to prevent these from being allowed.

(“Little Bobby Tables” from xkcd comes to my mind whenever I see these restrictions)

[+] technion|3 years ago|reply

I wrote our onboarding system and had it strip apostrophes from names. Some people object, but they object more when random websites refuse to let them sign up.

[+] random_upvoter|3 years ago|reply

A few years back I was asked to set up a mail server on an AWS server for some small non-profit organization. I am a software developer of 25 years with a lifelong habit of tinkering with OS installations and the like, so I thought "sure, how hard can it be?". Here is my warning for you all: do not enter this highway to hell unless you actually are a sysop who is specialized in setting up email servers.

[+] 3np|3 years ago|reply

* Blocking sending to domains listed in [0] or similar is a useful way to prevent spam or sybil attacks with minimal impact on authentic users

I hate this. Motivated attackers can trivially circumvent it at minimum effort and cost while it further normalizes centralization and strengthens surveillance capitalism as the barrier to use unlinkable e-mail for different service providers for a normal person becomes untenable (curiously equally disposable domains from major providers are absent from most of these lists, supposedly precisely because it is disruptive). I'm ambivalent on even sharing the link for the risk of a dev reading this going "oh, neat!"...

[0]: https://github.com/disposable-email-domains/disposable-email...

[+] Zobat|3 years ago|reply

Every programmer on my team has gotten this link about email address validation.

"I Knew How To Validate An Email Address Until I Read The RFC"

https://haacked.com/archive/2007/08/21/i-knew-how-to-validat...

[+] a2128|3 years ago|reply

RFC 822 and some email-related systems accept commas as valid and to mean multiple receivers. This can be dangerous if user-inputted strings aren't properly filtered. I recall a website that would accept "[email protected],[email protected]" as a valid email, send the verification to both emails, and grant administrative privileges to the site once verified, since the email clearly ends in @company.com and belongs to the company!
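
A small Python sketch of why a suffix check is not enough. It uses the standard library's `email.utils.getaddresses`; the addresses and helper names are hypothetical stand-ins, since the ones in the comment above are obscured.

```python
from email.utils import getaddresses


def parse_recipients(value: str) -> list[str]:
    """Return every address a header-style parser finds in the input."""
    return [addr for _name, addr in getaddresses([value]) if addr]


def is_single_recipient(value: str) -> bool:
    """Accept the input only if it parses to exactly one address."""
    return len(parse_recipients(value)) == 1


# Hypothetical user input: one string, two recipients.
user_input = "attacker@evil.example,admin@company.example"

# The naive check passes, because the string does end with the company domain.
naive_check_passes = user_input.endswith("@company.example")

# A parser that follows the header grammar sees two separate mailboxes.
recipients = parse_recipients(user_input)
```

Rejecting any sign-up value where `is_single_recipient` is false, before applying domain-based rules, is one way to close this particular hole.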

[+] idk1|3 years ago|reply

I'm curious, I've always validated an email as containing only one '@' and kept it that simple. This validation would cause that input to be rejected. I would love to know if my assumption was right: can an email address only have one `@`?
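
For what it's worth, RFC 5322 allows a quoted local part, and a quoted string may itself contain `@`, so counting `@` characters can reject a syntactically valid address. A tiny Python sketch with a made-up address, splitting on the last `@` instead:

```python
# Hypothetical address whose local part is a quoted string containing '@'.
address = '"a@b"@example.com'

# A "exactly one @" rule would reject this address.
at_count = address.count("@")

# Splitting on the last '@' still separates local part and domain.
local_part, _, domain = address.rpartition("@")
```

Whether any real provider hands out such addresses is a separate question; the sketch only shows what the syntax permits.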

[+] Lucent|3 years ago|reply

Here's another: Email addresses must have at least one dot.

There are MX records at the apex of .ai, so postmaster@ai probably works.

[+] kerneloops|3 years ago|reply

Microsoft (at least used to) require account passwords to not include the part before @ in email addresses. My email address was a@(domain).net, and therefore I was prevented from using any password including the letter "a".
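
A guess at what such a rule might look like, in Python. This is purely illustrative: the function name and the case-insensitive substring check are assumptions, not Microsoft's actual implementation.

```python
def password_allowed(email: str, password: str) -> bool:
    """Reject passwords that contain the local part of the email address."""
    local_part = email.rpartition("@")[0]
    # Assumed rule: case-insensitive substring match on the whole local part.
    return local_part.lower() not in password.lower()
```

With a one-letter local part such as `a`, every password containing that letter fails the check, which is the situation described above. A minimum length on the local part before applying the rule would avoid it.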

[+] dvh|3 years ago|reply

It's not about falsehoods I believe about emails, it is about knowing that emails will be used in a million different ways in a wide range of often legacy software, and I'd rather force the user to use a normal email like [email protected] than debug some early '90s cow milker at 2am on Saturday, standing knee deep in cow piss in the middle of Nebraska, just because some smart ass has a backslash or emoji in their email address.

[+] yread|3 years ago|reply

I'm missing "if a person confirms they are in control of an email address it will always be theirs". I recently got bitten by it, as the IT dept recycled email addresses, so a new hire got the email address of somebody who had left some time ago. They got some of their privileges. Oops.

[+] tomjen3|3 years ago|reply

> Any one email address refers to only one single person

This one hit me. My grandparents share a computer and one email address (just as they share one physical address and phone number). You wouldn't believe how many services, including Google, fail this rather simple test.

And in case you think this is a weird one: until not that long ago, every way to contact people was via the house they stayed in. Letters typically had a name, but if you were married and had shared accounts, either person could need to read those letters.

[+] degrees57|3 years ago|reply

Mildly interesting problem: law says email with financial information needs to be encrypted. Email goes from the Accounts Payable clerk to Verizon Accounts Receivable, but triggers the automatic encryption process. One needs to create a free login and read the email in the (secure) web portal. Verizon complains. Talk with the Verizon AR manager and he tells me "I have 40 people who access that mailbox; I am NOT going to create a username and password in your system and then share that with those 40 people. What happens next week, when one of them leaves?"

[+] nicbou|3 years ago|reply

I wish that such articles explained why those are falsehoods. In this case, it's clear to me, but in many others, I could not understand why half of them were false.

That's unfortunate because those articles are very valuable to anyone building software.

[+] wodenokoto|3 years ago|reply

> Anyone with a .edu address is a student

I actually meet the reverse more often: Every student has an .edu.

I think only _some_ American college students can be expected to have an .edu e-mail address.

[+] another-dave|3 years ago|reply

Some of these are patently not true:

> Everyone has exactly one email address

You'd be hard pressed to find anyone who's at all competent with the internet who thinks that this is true, never mind a programmer.

Maybe we need a 'Falsehoods writers of articles believe about falsehoods':

> You can just put any false statement in the list, even if no-one actually believes it and it will improve your article.

[+] tryauuum|3 years ago|reply

> email is a reliable transport

I mean, it is. Either your email will be delivered successfully, or you get a message that it couldn't be delivered. If it disappears without a trace, then most likely a system administrator has manually deleted it.

[+] dspillett|3 years ago|reply

Not even remotely true. Not even the first hop from your local MTA can be trusted in that regard, it may accept the message and just immediately bin it, it might accept it but queue it for further verification and not bother sending you a message back telling you this had happened, etc. Between your MTA and the receiving mailbox there could be several hops, any of which might silently send your message to /dev/null.

And that is without considering the same issues with the MUA (assuming your message is for human consumption and you aren't using SMTP to communicate between automated agents) at the other end having its own spam/junk/other filtering (though I suppose you could consider that latter part to not be email transport being unreliable, i.e. it depends on whether you consider successful transport to be "the user saw it" or "their mail server told their MUA it existed").

The only reliable bit of email transfer is the little bit you have full control over, the local MTA. Even then, if you are in a shared hosting environment where you don't control that yourself this could still silently reject your messages (a consideration you might need to make if publishing software others may self-host).

[+] keanpedersen|3 years ago|reply

..or you are sending to a Microsoft-hosted email like @outlook.com, @hotmail.com or @live.com and they have decided that your sending server is spammy. In that case they will silently drop your mail.

[+] comboy|3 years ago|reply

Just the fact that the message itself is an e-mail should be enough to see some issues with your reasoning.

[+] blacklion|3 years ago|reply

Never was ghost-banned by GMail? You are lucky.

[+] fleddr|3 years ago|reply

Adding one more:

"An email address is the global standard to sign up for applications/services"

False in China, where the norm is to use their phone number. Doesn't mean they don't have an email address somewhere, but it's not how they sign up or sign in, typically.

[+] msh|3 years ago|reply

It seems like it mixes up things people believe and things that people do for ease of use/ease of life, like:

> Anyone with a .edu address is a student

> Anyone with a .edu address is a student or faculty

I don't think most people believe that, but it's an easy filter if you want to give rebates to students and they don't cost you too much, like Dropbox giving increased free quota to people who sign up with a .edu

[+] rlayton2|3 years ago|reply

Which I've had failed as a student in Australia, as we use .edu.au (not for Dropbox, but other services).

As you said though, it's a simple test, and if you don't think about it too much, it's too easy to just test that the email ends in .edu and move on to the next task.
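
That quick test is roughly the following Python, with hypothetical addresses under reserved-looking example domains. It shows the failure mode only; it is not a suggestion for how student status should be verified.

```python
def naive_is_student(email: str) -> bool:
    """The 'simple test': does the address end in .edu?"""
    return email.lower().endswith(".edu")


# A US-style address passes the check.
us_result = naive_is_student("student@example.edu")

# An Australian-style address ends in ".edu.au", so the same check fails.
au_result = naive_is_student("student@example.edu.au")
```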

[+] gwd|3 years ago|reply

My university has lifetime email forwarding, so I use my .edu address as my main personal email address. (I tell people, "In 30 years, if you email that address, it should still get to me.") I once signed up for a SaaS team workflow thing with my personal email address, thinking about trying to use it w/ my family to work together on a project; and within a day or two got a call from someone from that company obviously hoping I was actually a decision-maker at that university. Sorry...

[+] thrdbndndn|3 years ago|reply

Yeah like "everyone has an email address", of course not everyone has one, but if you don't, you're simply not going to be our client.

And I'm not even sure what this "everyone has exactly one email address" is about.

[+] horsebridge|3 years ago|reply

Lists like these would be better with some more explanations for the less obvious bullet points. For instance, when/why would an email have multiple From addresses?
