Complexities of e-mail validation logic

ds | 4 years ago

If your email address is <RFC>fan 69™@root, I am not going to let you sign up. Sending emails costs money, and bounced emails hurt your sender reputation. Also, for every user out there using <RFC>fan 69™@root as their email address, there are going to be thousands of people accidentally entering their email address incorrectly and not getting an alert about it. Yes, you could do fancy shit like checking MX records and whatnot, but come on: I'm not going to build and maintain that infrastructure for the one in a million people who are trying to use that address.

Developer time is precious at a startup and supporting <RFC>fan 69™@root while still denying b ob@gmailcom is very, very far down the list of things to do.

In summary: I don't suggest doing 'perfect' email validation to RFC spec. You will save money/devtime and make more of your users happy by not doing it.

chrismorgan | 4 years ago

I think the best syntax validation technique for email addresses now is found in the HTML spec: https://html.spec.whatwg.org/multipage/input.html#valid-e-ma.... As they say, this is a wilful violation of RFC 5322, because that’s simultaneously too strict, too vague and too lax to be useful. They give a grammar, and the following regular expression implementing it:

  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

Remember that the web is a platform that lives and breathes this stuff. A lot of thought went into this grammar for valid email addresses. This is a good way of filtering out obviously bad stuff while allowing all realistic and sane inputs.
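For anyone who wants to try the grammar outside a browser, here is a minimal Python port of the regex above. It is a sketch, not the spec's reference implementation: the helper name `is_html_valid_email` is made up, the `\/` escape is dropped because Python does not need it, and `re.fullmatch` replaces `^...$` so that a trailing newline cannot sneak through (Python's `$` also matches just before a final newline).

```python
import re

# WHATWG "valid e-mail address" pattern, split across lines for readability.
# re.fullmatch anchors both ends, so no ^ or $ is needed.
HTML_EMAIL_RE = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"                      # local part
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"         # first domain label
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"   # further labels
)


def is_html_valid_email(address: str) -> bool:
    """True if `address` matches the HTML spec's email grammar (ASCII only)."""
    return HTML_EMAIL_RE.fullmatch(address) is not None
```

Worth noticing: `is_html_valid_email("bob@gmailcom")` is True, because the grammar allows dotless domains such as intranet host names. It would catch the space in `b ob@gmailcom` from the first comment, but not the missing dot.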

One point I’m unsure about is “8. You can put emojis in the local part.” The HTML spec’s validator is all ASCII. It does remind you to punycode the domain labels, but it makes no mention of internationalised local parts, and I’ve never learned about non-ASCII local parts or how well they’re supported. I gather they may require the sender to be capable as well as the receiver, whereas internationalised domain names were made compatible with all systems via punycode.
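To make the domain/local-part asymmetry concrete, here is a small Python sketch. It assumes Python's built-in `"idna"` codec, which implements the older IDNA 2003 rules (the third-party `idna` package covers IDNA 2008), and the function names are hypothetical. The domain can be rewritten to ASCII, but there is no equivalent encoding for the local part, so a non-ASCII local part only works if every mail system on the path supports the SMTPUTF8 extension (RFC 6531).

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split at the last @, since a quoted local part may itself contain an @."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local, domain


def ascii_domain(domain: str) -> str:
    """Punycode each non-ASCII label with the stdlib "idna" codec (IDNA 2003 rules)."""
    return domain.encode("idna").decode("ascii")


def needs_smtputf8(address: str) -> bool:
    """A non-ASCII local part has no ASCII fallback, so the mail path needs SMTPUTF8."""
    local, _domain = split_address(address)
    return not local.isascii()
```

So `ascii_domain("Sörensen.example.com")` yields an `xn--` label followed by `.example.com`, while `needs_smtputf8("Dörte@Sörensen.example.com")` is still True because of the `ö` in the local part.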

toomanybeersies | 4 years ago

I've always just used

  /^.+@.+\..+$/

That is, "Some characters, an @, some more characters, a period, and some more characters"

I couldn't care less if users want to enter undeliverable email addresses; they just won't get emails. All that regex is intended to achieve is ensuring that the user hasn't accidentally filled in the wrong field (e.g. tried entering their phone number) or mistyped a punctuation mark (foo#bar.com, foo@bar,com).

Strictly speaking, it won't match some valid email addresses, such as ones with an IPv6 address literal as the domain. But if I receive a support ticket complaining that we don't accept email addresses with an IPv6 address as the domain, I'll reply advising that the customer should purchase a domain name or sign up to one of the many free email services.
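The same check in Python, as a sketch (the function name is hypothetical). `re.fullmatch` plays the role of `^...$`, and since `.` does not match a newline, multi-line pastes are rejected too.

```python
import re

# Some characters, an @, some more characters, a period, some more characters.
LOOSE_EMAIL_RE = re.compile(r".+@.+\..+")


def looks_like_email(text: str) -> bool:
    """Catch wrong-field entries and punctuation slips, nothing more."""
    return LOOSE_EMAIL_RE.fullmatch(text) is not None
```

`looks_like_email("user@[IPv6:2001:db8::1]")` is False because the IPv6 literal contains no period, which is exactly the trade-off described in the comment.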

crazygringo | 4 years ago

Huh. Interesting that this doesn't support international email addresses [1], e.g. квіточка@пошта.укр or Dörte@Sörensen.example.com.

Seeing as the web has long supported Unicode, where are e-mail addresses currently at in that evolution?

Are full Unicode e-mail addresses something that is decently supported today, or still largely theoretical? Is this regex sufficient? What kind of e-mail addresses do people in China most commonly use, for instance?

[1] https://en.wikipedia.org/wiki/International_email

u801e | 4 years ago

I provide my email address with the +companyname suffix on the local part as a way to filter my email into various folders based on the To header contents.

Unfortunately, many websites are configured to reject email addresses that contain a plus character. I've also encountered websites that accepted the + character when creating an account, where the email address serves as the user name, but then I could not log in because their login form rejected the + character in the user name.
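For the filtering side, here is a sketch of pulling the tag out of a subaddressed To address. The `+` separator is a convention of the receiving mail provider, not something the RFCs define (caymanjim below uses `-` instead), so it is a parameter; `subaddress_tag` and `folder_for` are hypothetical names.

```python
from typing import Optional


def subaddress_tag(address: str, separator: str = "+") -> Optional[str]:
    """Return the lowercased tag in local<sep>tag@domain, or None if there is none."""
    local, _, _domain = address.rpartition("@")
    base, sep, tag = local.partition(separator)
    if not sep or not base or not tag:
        return None
    return tag.lower()


def folder_for(to_address: str, default: str = "Inbox") -> str:
    """Hypothetical filing rule: file mail into a folder named after the tag."""
    return subaddress_tag(to_address) or default
```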

SAI_Peregrinus | 4 years ago

I got sick of companies rejecting email addresses with "+", and bought a domain to use for email (among other reasons). Now I've got a wildcard entry in DNS, so any valid local part gets routed to my inbox. So instead of "[email protected]" I can do "[email protected]".

moojd | 4 years ago

I was unable to provide my email address for a retail rewards program last week because the input field for the domain was a dropdown in their POS. Not the TLD, the entire part of the email after '@'!

ThalesX | 4 years ago

I used this wonderful trick to sign up for my government-issued eID (it was actually something else, but this works for explaining). What they decided to do was simply remove the + without letting me know about it.

[email protected] thus became [email protected]

I tried logging in and resetting my password; nothing worked. I had to go to the authorities and make a written request for them to query the database by the equivalent of my social security number, and that’s when we realized they had just stripped the +.
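The failure here (and in fullstop's MySpace story further down) looks like two code paths treating the same address differently. A minimal sketch of the contrast, with hypothetical function names: a sanitizer that silently drops characters outside an allowlist, and a normalizer meant to be shared by signup, login and password reset. The normalizer assumes it is safe to lowercase the domain (DNS names are case-insensitive) but keeps the local part exactly as typed, since RFC 5321 leaves local-part case to the receiving server.

```python
def lossy_sanitize(address: str) -> str:
    """The anti-pattern: silently drop anything outside an allowlist (the + vanishes)."""
    allowed = set("abcdefghijklmnopqrstuvwxyz0123456789.@_-")
    return "".join(ch for ch in address.lower() if ch in allowed)


def normalize_email(address: str) -> str:
    """One normalization for every code path: trim, lowercase the domain only."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"
```

If signup stores `normalize_email(raw)` and login looks users up by `normalize_email(raw)` too, the two can never disagree; mixing `lossy_sanitize` on one path with raw input on another is presumably what locked these users out.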

theshrike79 | 4 years ago

Fastmail allows for [email protected] -style addresses. Even for your own domains.

Much more reliable than the + trick, which breaks in the weirdest of places.

fullstop | 4 years ago

Ages ago, back in the MySpace days, their system would permit + when creating an account but could not handle it in their forgot-password/password-reset flow. I never was able to delete my account because of this.

theandrewbailey | 4 years ago

I use Fastmail with my own domain name and unlimited email inboxes, so I use [email protected] to sort incoming mail.

vidarh | 4 years ago

If you use Gmail here's a fallback option: Gmail ignores "." in the local part. So foo.bar is the same as f.ooba.r to Gmail. Obviously quite limited and more hassle to keep track of.
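From the site's side, the same fact means several spellings can reach one inbox. A sketch of a comparison key that collapses the dot variants, assuming Gmail ignores dots and letter case in the local part for gmail.com and googlemail.com. It deliberately leaves `+tags` alone, and it is meant only for spotting duplicates, never for rewriting the address you store or send to; the names are hypothetical.

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def gmail_canonical(address: str) -> str:
    """Comparison key: Gmail dot variants of one inbox map to the same string."""
    local, sep, domain = address.rpartition("@")
    domain = domain.lower()
    if not sep or domain not in GMAIL_DOMAINS:
        return address  # not a Gmail address: leave it untouched
    # Assumption: Gmail ignores dots and letter case in the local part.
    return local.replace(".", "").lower() + "@" + domain
```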

rpadovani | 4 years ago

I use a catch-all to have a <website>@<mydomain>.com login for every website.

Samsung doesn't accept emails with "samsung" as a prefix, so I have [email protected] for them. I have no idea what the logic behind that is.

caymanjim | 4 years ago

I got sick of + not being accepted and switched to using - for all my aliases, which works everywhere I've tried. It's annoying, but practical (assuming you run your own mail server, or have the ability to manage it client-side).

jbgreer | 4 years ago

Ditto, with the same hassles mentioned by you and others, so I'm actively looking at email services that handle this sort of thing better, using approaches like the ones mentioned below: domain@mydomain-style registration addresses.

innocenat | 4 years ago

I find that a lot of websites don't allow the + sign precisely because of Gmail usage.

cratermoon | 4 years ago

I've decided that the best way to validate email addresses is not to validate them at all, but to require that any signup be finalized by the individual following a link emailed to them.

This allows a person to use any damn thing they want as their email address, provided it works and they can get the email.
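The emailed link needs to carry something the recipient can only get by receiving the mail. One common way, sketched below, is an HMAC-signed token containing the address and an expiry time; a random token stored in a database works just as well. `SECRET_KEY`, the 24-hour lifetime and the function names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical; load a real secret from configuration
TOKEN_TTL_SECONDS = 24 * 60 * 60  # assumed 24-hour link lifetime


def _sign(payload: bytes) -> bytes:
    """Hex-encoded HMAC-SHA256 of the payload, as ASCII bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode("ascii")


def make_verification_token(email: str, now: Optional[float] = None) -> str:
    """Pack the address and an expiry time, then append a signature."""
    issued = time.time() if now is None else now
    payload = f"{email}|{int(issued) + TOKEN_TTL_SECONDS}".encode("utf-8")
    encoded = base64.urlsafe_b64encode(payload).decode("ascii")
    return encoded + "." + _sign(payload).decode("ascii")


def check_verification_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the confirmed address, or None if the token is forged, garbled or expired."""
    encoded, _, signature = token.rpartition(".")
    try:
        payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
        signature_bytes = signature.encode("ascii")
    except ValueError:  # covers binascii.Error and UnicodeEncodeError
        return None
    if not hmac.compare_digest(signature_bytes, _sign(payload)):
        return None
    # The signature matched, so the payload is one we produced ourselves.
    email, _, expires = payload.decode("utf-8").rpartition("|")
    current = time.time() if now is None else now
    return email if current <= int(expires) else None
```

The link would then carry the token as a query parameter, and the account is only marked as confirmed when `check_verification_token` returns the address.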

welder | 4 years ago

Even if sending emails were 100% free, you would still have to worry about your sender reputation. [1] Sending a large amount of mail to invalid addresses will start getting your emails put in people's spam folders. That's the reason email validation services exist: to prevent sending to invalid addresses. [2]

Also, humans make mistakes. You should detect spelling errors and typos then suggest corrections. [3]

[1] https://www.mailjet.com/blog/news/3-factors-that-impact-your...

[2] https://www.mailgun.com/email-validation/

[3] https://www.npmjs.com/package/mailcheck

Arubis | 4 years ago

100% agreed here. Accept a text field; maybe validate that it has an @ in it and a . after the @.

Send that address a confirmation email. Now you've got consensual opt-in and you've somewhat protected yourself from adding a wrong address to your recurring mailing list.

Prevent abuse with long (seconds) delays between submissions from the client. If the user thinks they did it right, they're waiting on their email inbox anyway; if they immediately realize they made a typo, it'll take 2-3s to fix.

The RFCs were written when manually (not from cron) sending email to another user on your local system was a thing that actually happened. I'm certain you actively want to avoid that now.
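Arubis's delay between submissions only means something if the server enforces it. A minimal in-memory sketch, assuming a single process and a hypothetical `client_id` (an IP address or session ID, say); a real deployment would keep this state in a shared store. The 3-second default mirrors the "2-3s" in the comment.

```python
import time
from typing import Dict, Optional


class SubmissionCooldown:
    """Reject a repeat submission from the same client within `delay` seconds."""

    def __init__(self, delay: float = 3.0) -> None:
        self.delay = delay
        self._last_seen: Dict[str, float] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        last = self._last_seen.get(client_id)
        if last is not None and current - last < self.delay:
            return False  # too soon; rejected attempts do not reset the timer
        self._last_seen[client_id] = current
        return True
```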

serial_dev | 4 years ago

This is also my preferred approach.

If I can send you an email and you can verify that you have access to that email, your email is "valid enough" for me.

Then, the validation is basically "is there an @, and a dot after it?". I find that after that, every hour spent on improving the validation just causes more valid emails to be falsely flagged as invalid and more support requests from people who couldn't sign up with valid emails; it's more code we need to maintain, and every time someone edits the validation logic, they risk breaking signups completely.

So with more "improvements" to the validation, you just cause more problems. Then why do it?

I hear the reputation arguments, but in practice, it never happened to any of the organizations I worked for.

What happens though very often is naive engineers trying to solve problems the business doesn't have with knowledge they lack...

manmal | 4 years ago

My cheap-o approach to this is: Check there’s an @, and that there is a dot afterwards. This excludes local domains obviously, but I don’t want those anyway.

burke | 4 years ago

I’ve never seen much point in trying to do better than .+@.+, unless you’re going to pull out the (gargantuan!) authoritative version for some reason.

kevinmchugh | 4 years ago

Implementing the authoritative version is a waste since you'll also need to keep an up-to-date list of TLDs, and more importantly, you might have a typo in the input that gives a valid-but-incorrect email.

After doing your simple regex, the best move is to just send a verification email and wait for the user to click the link, if you really need to be sure.

michaelt | 4 years ago

Too many sites refuse to let me register as

  "><script>alert("XSS");</script>@example.com

The oppression must end!
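Joking aside, the serious point is that a permissive email field means the stored value has to be treated as untrusted text wherever it is shown. A sketch using Python's standard `html.escape`; `render_email_cell` is a hypothetical helper.

```python
import html


def render_email_cell(address: str) -> str:
    """Show the address exactly as entered, but never as markup."""
    return f"<td>{html.escape(address, quote=True)}</td>"
```

The address is stored and displayed faithfully; it just can't execute.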

wffurr | 4 years ago

Yeah, do simple validation, and then just send an email. Even a validated email address can still be undeliverable if there’s a typo in the domain or the local part.

kristaps | 4 years ago

Yep, these arcane rules are maybe relevant to the 5 or so people writing mailservers, but not to web developers.

arkitaip | 4 years ago

Most of these are just overcomplicating validation. What really matters is account verification, i.e. sending an email to the specified email address in order to verify its authenticity before sending any kind of email (transactional, marketing) to the account.

At this point, not doing email verification should be considered a dark pattern because it causes so much trouble when people's email addresses are used without their permission.

delecti | 4 years ago

And "permission" isn't even the only issue. Months of Doordash account emails were lost to the ether because I made a typo (gmail.lcom) in my personal email, and it was basically impossible to change the email on an account (their SMS verification seems broken). It does explain why I never got order confirmations though, that had seemed odd.

Yaina | 4 years ago

What this article really showed me is that this RFC is actually pretty harmful.

Supporting all of the rules outlined in the spec is probably a huge burden for maintainers of mail clients and servers. Obviously some parts of the spec are going to be omitted. It's hard to blame them for it, but the same person who rightfully skipped implementing the routing thingy might've also wrongfully assumed there won't be a Japanese character in the address. And that's what's so bad.

You might introduce more issues into your system by taking the full spec into consideration for your validation, instead of using the WHATWG regex someone posted here.

nradov | 4 years ago

Well if there are problems with the RFC then you should work with the IETF to correct those. They have an open standards development process.

mLuby | 4 years ago

Validation errors are common, but warnings are not.

I'd like to see more of "Patterns like [what you entered] are uncommon—are you sure?" instead of "Patterns like [what you entered] are not allowed—change it to proceed."

richeyryan | 4 years ago

I recently implemented this using the great Mailcheck library. So if someone types "gnail.com" or "gmail.con" it detects it and we can show "Did you mean gmail.com?". If someone ignores the suggestion, fair enough. If someone purposely wants to give us a junk email, fair enough. At least we're not frustrating them needlessly.

https://github.com/mailcheck/mailcheck
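The same idea can be sketched without the library using Python's `difflib.get_close_matches`. This is not Mailcheck's algorithm or domain list: the shortlist and the 0.8 similarity cutoff are assumptions, and the function only suggests; it never rewrites what the user typed.

```python
import difflib
from typing import Optional

# Hypothetical shortlist of popular domains to compare against.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"]


def suggest_domain(address: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a 'Did you mean ...?' address, or None if nothing close is found."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return None
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None  # already a known domain; nothing to suggest
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```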

zzo38computer | 4 years ago

Mostly, yes. However, some things should probably still be prohibited, such as:

- An email address ending with ".invalid", unless invalid email addresses are supposed to be allowed (which in some cases is useful, but you can then disable sending email to such an address, using it only for identification). (I do use such an email address for identification on NNTP.)

- Email addresses without at least one at sign.

- Email addresses containing control characters (at least ASCII control characters).

- If the domain name does not resolve or resolves to a loopback address or LAN address (except for some specialized cases where such a thing is desirable). The same is true for literal IP addresses; if it is a loopback or LAN address then it should be disallowed, but otherwise it can be allowed.
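Most of that list can be checked offline. The sketch below covers the missing at sign, ASCII control characters, the reserved `.invalid` TLD (with a flag for the identification-only case) and loopback or LAN address literals via Python's `ipaddress` module. Resolving the domain name needs a DNS lookup and is left out; `rejection_reason` and its messages are hypothetical.

```python
import ipaddress
from typing import Optional


def rejection_reason(address: str, allow_invalid_tld: bool = False) -> Optional[str]:
    """Return why an address should be refused, or None if these checks pass."""
    if "@" not in address:
        return "no @ sign"
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in address):
        return "ASCII control character"
    domain = address.rpartition("@")[2].lower().rstrip(".")
    if not allow_invalid_tld and (domain == "invalid" or domain.endswith(".invalid")):
        return "reserved .invalid domain"
    if domain.startswith("[") and domain.endswith("]"):
        literal = domain[1:-1]
        if literal.startswith("ipv6:"):  # already lowercased above
            literal = literal[len("ipv6:"):]
        try:
            ip = ipaddress.ip_address(literal)
        except ValueError:
            return "malformed address literal"
        # is_private is broader than LAN ranges (it also covers e.g. documentation blocks).
        if ip.is_loopback or ip.is_private or ip.is_link_local:
            return "loopback or LAN address literal"
    return None
```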

buro9 | 4 years ago

I have a TLD that was created fairly recently (2014), and I still cannot use it in an email address reliably.

The domain in question is david.kitchen, so an email address might be [email protected]

The issue I encounter more than any other is trivial: most sites still have TLD validation that only accepts domains ending in net|com|org plus a small list of other accepted suffixes such as co.uk.

The list of TLDs is constantly expanding https://newgtlds.icann.org/en/program-status/sunrise-claims-... so even `[a-z0-9.-]+@[a-z0-9.-]+\.[a-z0-9]+` would be better than what I see in the wild.
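A quick way to see the difference, as a Python sketch. buro9's pattern is used as written but matched case-insensitively (an assumption), `LEGACY_RE` is a made-up example of the hard-coded TLD lists being complained about, and `someone` and `me` are placeholder local parts.

```python
import re

# buro9's suggested pattern; IGNORECASE is an assumption (it was written lowercase).
BURO9_RE = re.compile(r"[a-z0-9.-]+@[a-z0-9.-]+\.[a-z0-9]+", re.IGNORECASE)

# Hypothetical example of a validator with a hard-coded list of suffixes.
LEGACY_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.(?:com|net|org|co\.uk)", re.IGNORECASE)


def accepts(pattern: re.Pattern, address: str) -> bool:
    """True if the whole address matches the given pattern."""
    return pattern.fullmatch(address) is not None
```

One caveat visible in the test cases: buro9's local-part class has no `+`, so it would still reject the subaddresses discussed earlier in the thread.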

radicalriddler | 4 years ago

I have two things.

The number of times I've tried to sign up for a service with my protonmail account and had it fail validation simply because it's a protonmail account (not gmail, outlook, hotmail or aol, apparently) makes me wish everyone did follow the RFC. I actually emailed a service one time, and they responded that it's because protonmail is usually associated with shady stuff. WTF.

The second: I had to implement an email validator at one of my previous jobs, and fell down the RFC rabbit hole. Not only did I have to follow the RFC, as per my boss's request, but I also had to make sure that Amazon SES allowed it. I came out of the office wanting to just walk out onto the road. The weird things that email servers allow, and then on top of that, what email clients allow.

hutrdvnj | 4 years ago

There is no point in many cases. Even if you can verify that the email address is syntactically valid, you'll still need to check that it was not mistyped, and that it actually goes to the person you think it does. The only way to do that is to send them an email and have them click a link to verify.

However, if you still want to validate an email address, use a library. All popular programming languages have email validation libraries. Yes, it's an extra dependency if it's not included in the standard library or the framework you use, but if you write email validation yourself, it's wrong in 99% of cases.

yawaramin | 4 years ago

Or use the browser. HTML form validation has <input type="email"> which checks that the entry is a valid email address.

_wldu | 4 years ago

Perfect is the enemy of good.

If it is a string that has an @ sign, a dot and is at least six characters long, it's probably a valid email address.

[email protected]

No need to go further than this. It's not worth the time.

biztos | 4 years ago

This is a great run-down of the trouble with e-mail addresses.

I worked in e-mail security for quite a while. "Write an e-mail address parser" was my go-to technical interview question.

It was pretty easy to see if the candidate had ever given any real thought to e-mail (most had not); and you could also pick up a lot of signals about engineering style, for instance if they started with a regex (fewer did than I expected). And it was trivial to adjust the difficulty: if someone thought the question was easy and had a fast solution, you could just throw them a test-case like the ones in this article.

(Note: the actual title is "Your E-Mail Validation Logic is Wrong", and it's only about addresses; the author isn't implying that e-mail systems can't validate messages, or for that matter addresses.)

pmontra | 4 years ago

I just check that the string contains at least one @ character. That ensures we're not rejecting people with uncommon patterns in their email address, and it takes very little time to design, develop and test.

In a project we're doing something fancier: we check the result of sending mail and store it in the database record for the account (Mandrill notifies us on a webhook.) Then we might take actions for bouncing addresses. The actual impact on the project has been zero so far.
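A sketch of the bookkeeping side of that. The event shape (`type` and `email` keys with values like `hard_bounce`) is hypothetical and simplified; Mandrill's actual webhook payload is different and would need to be mapped onto something like this. Suppressing after one hard bounce or three consecutive soft bounces is an assumed policy, not something the comment prescribes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Set


@dataclass
class DeliveryStatus:
    """What might be stored on the account record for one address."""
    hard_bounced: bool = False
    soft_bounces: int = 0  # consecutive soft bounces since the last delivery


def apply_events(events: Iterable[Mapping[str, str]],
                 statuses: Dict[str, DeliveryStatus],
                 soft_limit: int = 3) -> Set[str]:
    """Fold delivery events into `statuses`; return addresses to stop mailing."""
    for event in events:
        status = statuses.setdefault(event["email"], DeliveryStatus())
        if event["type"] == "hard_bounce":
            status.hard_bounced = True
        elif event["type"] == "soft_bounce":
            status.soft_bounces += 1
        elif event["type"] == "delivered":
            status.soft_bounces = 0  # a successful delivery resets the streak
    return {addr for addr, s in statuses.items()
            if s.hard_bounced or s.soft_bounces >= soft_limit}
```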

csours | 4 years ago

This feels like a discussion for backend implementations/email forwarders, not for email signups... but hey while this has some attention - For god's sake, put a button that says "This ain't me", at least for important stuff.

I'm sorry, but I just can't bring in Clyde's truck for the oil change, cause Clyde ain't me!

I also cannot attend Cassidy's parent teacher conference, apologies, I am not in Ohio.

cratermoon | 4 years ago

>This feels like a discussion for backend implementations/email forwarders, not for email signups...

And yet I've worked multiple places where product people asked for "simple email validation" on user signup. If they insist, I ask them to provide some actual test cases that they care about. Sometimes the product folks can be convinced to drop the validation requirement if they can be shown that anyone who can't sign up because their email address doesn't validate will simply move on and not sign up.

In the case where your product is B2B and all the employees of your customers are users (say an HR product), then the first time a VIP at an important customer complains, that's usually enough to convince your stakeholders to disable the email validation.
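A table of cases for product people is easy to make executable. A sketch, with cases modelled on addresses mentioned in this thread (`someone` and `me` are placeholder local parts) and a hypothetical `failing_cases` helper that reports every address a candidate validator gets wrong.

```python
from typing import Callable, List

# Cases modelled on addresses mentioned in this thread; a signup form should accept them.
MUST_ACCEPT = [
    "me+acme@example.com",       # plus subaddressing
    "me-acme@example.com",       # dash subaddressing
    "someone@david.kitchen",     # newer gTLD
    "first.last@example.co.uk",  # dotted local part, two-level suffix
    "me@protonmail.com",         # not one of the "big" providers
]
# Typos a form can reasonably catch.
MUST_REJECT = ["foo@bar,com", "foo#bar.com", "5551234567", ""]


def failing_cases(validate: Callable[[str], bool]) -> List[str]:
    """List every case the candidate validator gets wrong."""
    wrong = [a for a in MUST_ACCEPT if not validate(a)]
    wrong += [a for a in MUST_REJECT if validate(a)]
    return wrong
```

For example, a plain "@, then a dot in the domain" check passes all of these, while a validator that insists on `.com` fails the `.kitchen` and `.co.uk` cases.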

danrl | 4 years ago

I have a very short email address in the format [email protected] and for my special friends that don't know how to validate correctly I have created [email protected]

I need to use the latter ~5% of the time. Most often I take my business to someone else for the sake of principle.

BenjiWiebe | 4 years ago

Why not accept absolutely anything in the email address field, and just require an emailed link to be clicked before marking the email as validated?

rblatz | 4 years ago

Because it causes conversion drop off.

JoyfulPanda | 4 years ago

There is an awesome talk about e-mail by Ricardo Signes:

https://www.youtube.com/watch?v=JENdgiAPD6c

The first 5 minutes are Perl-specific, but the rest is about e-mail and just hilarious.

lilyball | 4 years ago

I’m of the opinion that all you should validate is that there is an @, the text to the right side (of the final @) is a well-formed dotted DNS domain, and that there exists at least one (non-whitespace) character to the left of the (final) @.

Yes, I can craft garbage emails that pass this quite easily, but who cares? If I’m crafting fake emails I can make valid ones too. This rule ensures I typed the @ and the dot in my domain (we really don’t need to support dotless email domains and it’s better to catch “foo@gmailcom”) and it won’t reject all the weird random emails people might have.
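One way to read that rule as code, sketched in Python. "Well-formed dotted DNS domain" is interpreted here as at least two ASCII labels of letters, digits and hyphens, each 1 to 63 characters and not starting or ending with a hyphen; that interpretation and the function name are assumptions, and internationalised domains would need punycoding first.

```python
import re

# One DNS label: letters, digits, hyphens; no leading/trailing hyphen; 1-63 chars.
_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def passes_lilyball_rule(address: str) -> bool:
    """@ present; right of the final @ is a dotted domain; left has a non-space char."""
    local, sep, domain = address.rpartition("@")
    if not sep or local.strip() == "":
        return False
    labels = domain.split(".")
    # Requiring at least two labels is what catches "foo@gmailcom".
    return len(labels) >= 2 and all(_LABEL_RE.fullmatch(label) for label in labels)
```

Splitting on the final @ means a quoted local part such as `"a@b"@example.com` still passes, in line with the "won't reject all the weird random emails" goal.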

385 comments