
Ruby's Email Address Regexp

49 points| mooreds | 4 years ago |github.com | reply

63 comments

[+] raasdnil|4 years ago|reply
There are basically three levels of address checking:

1) You need to validate an email field for login or a website - checking for an @ mark with some text before and at least one . after the @ will do for this.

2) You need to do some sort of address validation, library regexps like this will do for 99.9...% of these.

3) You are building an email handling system which needs to actually support the RFCs, in which case regexp will not handle what you need, and you need to use a proper parser, like https://github.com/mikel/mail/tree/master/lib/mail/parsers

Ref: I am the original author of the Ruby mail gem.
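
A minimal Ruby sketch of the first two levels, for comparison. The names `LEVEL_ONE`, `plausible_email?` and `library_valid_email?` are illustrative and not from the thread; `URI::MailTo::EMAIL_REGEXP` is the standard-library constant the submission is about. Level 3 is left out because it needs a real parser such as the mail gem.

```ruby
require "uri"

# Level 1: some text, an "@", then text containing at least one ".".
# LEVEL_ONE is an illustrative pattern, not taken from any library.
LEVEL_ONE = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

def plausible_email?(address)
  LEVEL_ONE.match?(address)
end

# Level 2: defer to the standard library's pattern.
def library_valid_email?(address)
  URI::MailTo::EMAIL_REGEXP.match?(address)
end

puts plausible_email?("user@example.com")      # true
puts plausible_email?("user-at-example.com")   # false
puts library_valid_email?("user@example.com")  # true
puts library_valid_email?("not an address")    # false
```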

[+] AdamJacobMuller|4 years ago|reply
> at least one . after the @ will do for this

technically not required...

  [adam@solomon]$ dig +noall +answer mx ai
  ai.   21572 IN MX 10 mail.offshore.ai.
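
The same MX lookup can be sketched with Ruby's standard `resolv` library. The method name `mx_hosts` and the injectable `resolver` parameter are assumptions made for illustration (the latter so the method can be exercised without network access); actual results depend on live DNS.

```ruby
require "resolv"

# Returns the mail exchanger host names for `domain`, lowest preference first.
# `resolver` only needs to respond to #getresources, like Resolv::DNS does.
def mx_hosts(domain, resolver: Resolv::DNS.new)
  records = resolver.getresources(domain, Resolv::DNS::Resource::IN::MX)
  records.sort_by(&:preference).map { |mx| mx.exchange.to_s }
end

# Example (needs network access; output depends on current DNS data):
#   mx_hosts("ai")
```

A non-empty result for a bare TLD is the reason a "must contain a dot after the @" rule is stricter than what mail routing requires.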
[+] pmarreck|4 years ago|reply
Heh. A while back I worked at <now bought-out startup> whose main business included handling emails, and they were looking to speed it up, so I came up with this code to do the header parsing, which was 250x faster than the mail gem... but they ended up not going with it due to risk >..<

https://gist.github.com/pmarreck/8476538

[+] nicoburns|4 years ago|reply
Level 4:

You need an address backed by an actually valid mailbox. At which point you need to send an email to the address to validate.

[+] eyelidlessness|4 years ago|reply
You can certainly do it with regex, given a certainly non-regular regex implementation and a probably unbounded computational space. But you shouldn’t.

Ref: I (regrettably) have one of the top SO answers for matching URLs. It’s wrong in a few different ways and I’ve stopped fielding edits/comments for the last few years.

[+] Fire-Dragon-DoL|4 years ago|reply
Thank you. Your gem helped me multiple times throughout the years and helped me deal with some hard problems.
[+] tyingq|4 years ago|reply
The most helpful thing I've used in the real world is something that looks for common typographical errors, even if the email is technically valid.

Like, if the user types "[email protected]", it pops a dialogue asking "Did you mean [email protected]?". But lets them keep what they typed, or do a different fix if needed.

I found some JS called "mailcheck": https://github.com/mailcheck/mailcheck

I assume it's using popularity statistics, edit distance, etc, to come up with suggestions. There are updated clones that use react, vue, etc, instead of jquery.

With a working ecommerce site, this improved the percentage of correct emails more than anything else I tried, and I had tried many things. Because it's a bad situation when you've taken someone's money and have nothing other than a shipping address to contact them if something goes wrong (bad shipping address, out of stock situation, etc).
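
A rough Ruby sketch of the "did you mean" idea, using plain edit distance against a short list of popular domains. This is not mailcheck's actual algorithm (the comment above only guesses at it); `KNOWN_DOMAINS`, `edit_distance` and `suggest_email` are hypothetical names, and the domain list is illustrative.

```ruby
# Illustrative list; a real deployment would use its own domain statistics.
KNOWN_DOMAINS = %w[gmail.com yahoo.com hotmail.com outlook.com].freeze

# Classic dynamic-programming Levenshtein distance between two strings.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Returns a suggested address, or nil when the domain is already known
# or no known domain is within `max_distance` edits.
def suggest_email(address, max_distance: 2)
  local, at, domain = address.rpartition("@")
  return nil if at.empty? || local.empty? || domain.empty?

  domain = domain.downcase
  return nil if KNOWN_DOMAINS.include?(domain)

  best = KNOWN_DOMAINS.min_by { |known| edit_distance(domain, known) }
  edit_distance(domain, best) <= max_distance ? "#{local}@#{best}" : nil
end

p suggest_email("user@gmial.com")   # "user@gmail.com"
p suggest_email("user@gmail.com")   # nil
```

As in the comment, the suggestion is only a prompt: the caller would still let the user keep what they typed.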

[+] secabeen|4 years ago|reply
The best email regex just checks for an @ symbol with something before it, and something after it. Anything more complex is a waste of time.
[+] Gigachad|4 years ago|reply
There is really no point going further than this. It's more likely that someone will type the email wrong but still valid than they will type it completely invalid. There are also some completely wrong validators out there which expect the TLD to be 2-3 chars only.

The ultimate email validation is just trying to send an email to the address and confirming with a code/link.
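
The confirm-with-a-code/link flow can be sketched in Ruby as follows. `EmailConfirmations` is a hypothetical in-memory stand-in for whatever storage a real site uses, and actually sending the message is left out.

```ruby
require "securerandom"

# In-memory stand-in for a database table; for illustration only.
class EmailConfirmations
  def initialize
    @pending = {}    # token => address awaiting confirmation
    @confirmed = []  # addresses whose owner followed the link
  end

  # Creates a token to embed in the link or code sent to `address`.
  def request(address)
    token = SecureRandom.urlsafe_base64(32)
    @pending[token] = address
    token
  end

  # Called when the user follows the link; returns the address or nil.
  # Each token works once.
  def confirm(token)
    address = @pending.delete(token)
    @confirmed << address if address
    address
  end

  def confirmed?(address)
    @confirmed.include?(address)
  end
end
```

An address only counts as verified once `confirm` has been called with the token that was mailed to it, which proves the mailbox exists and that the user can read it.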

[+] mkdirp|4 years ago|reply
Never mind the regex, `email.indexOf("@") > 2` does the trick and faster if you happen to need to check many emails. All websites these days require verification of emails (regardless of whether or not it's necessary), and if that's not enough validation, I don't know what is!
[+] ghayes|4 years ago|reply
The regex listed there isn’t that much more complex. It’s basically a check for *@a*.* where * is some minimal whitelist of valid characters and a is an alphanumeric to start the domain name.
[+] lilyball|4 years ago|reply
This depends on your use-case. If you're writing a mail agent, then you do probably want to parse email addresses in their entirety. If you're writing a website that accepts email addresses and wants to make sure the user doesn't just type "foo", then yeah, check for `.+@.+` and call it a day.
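
The two minimal checks from this subthread, sketched in Ruby for comparison. The method names and the `min_local` parameter are illustrative. Note that the JavaScript `indexOf("@") > 2` above requires at least three characters before the `@`, so it would reject a one-letter local part; `min_local: 1` only requires that something precedes the `@`.

```ruby
# Index-based check: "@" must be present with at least `min_local`
# characters before it. indexOf("@") > 2 corresponds to min_local: 3.
# This check says nothing about what follows the "@".
def at_sign_ok?(email, min_local: 1)
  idx = email.index("@")
  !idx.nil? && idx >= min_local
end

# Regex-based check: something, an "@", something.
def something_at_something?(email)
  /.+@.+/.match?(email)
end

puts at_sign_ok?("a@b.com")                # true
puts at_sign_ok?("a@b.com", min_local: 3)  # false
puts something_at_something?("foo")        # false
puts something_at_something?("a@")         # false
```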
[+] 1123581321|4 years ago|reply
It’s probably a waste of time for an individual developer to write a one-off complicated regex for a contact form. A team of contributors to a standard library should be optimizing the regex a bit more, since doing so saves many developers time compared with using even a very simple one-off regex, once testing is accounted for. The optimizations here are reasonable and internationally compatible.
[+] jonpalmisc|4 years ago|reply
I feel like every developer at some point Googles "URL regex" and is inevitably led down a rabbit hole of different regexes — some optimizing for maximum accuracy, others for minimum insanity.

Having been down that rabbit hole before myself, I have to admit, this email regex is tamer than I expected it to be.

[+] ddoolin|4 years ago|reply
E-mails, URLs, file names/extensions, these are the bane of my RegEx existence. Agreed, this is not as bad as I've seen in other places.
[+] nerdjon|4 years ago|reply
I have yet to find the library that is doing this, but I have had a number of issues with websites really not liking an "@me.com" email address.

I assume there is some commonly used library (or several) out there that doesn't recognize an email address whose domain is shorter than 3 characters?

But it is driving me insane, most recently I was on the phone with my vet and she told me their system told them my email was invalid (and would not accept it).

[+] mulmen|4 years ago|reply
Recently I had a doctor's office call me to confirm an appointment instead of obeying my wishes to be contacted via email. The email I provided to their contact form was <theirname>@<mydomain>. The receptionist was convinced the provided email was incorrect because it was "their" email. I'm not sure what I expected.
[+] superasn|4 years ago|reply
SMTP had a very useful VRFY command after you've tested for the @ and MX record, but only a handful of service providers will tell you if the email is invalid nowadays due to spam concerns.

Gmail still does though, which is a big deal: 90% of the people who register on my sites use a Gmail address, so it is easy to verify instantly and to notify the user to double-check the email spelling.

[+] nicoburns|4 years ago|reply
Yes, although that only helps if they typo the address to one that doesn’t happen to exist. It's quite likely to hit a valid one by mistake given the size of Gmail's user base. I know at least one person who uses my email address by mistake.
[+] 0x640x6D|4 years ago|reply
This regexp and the whatwg one it is based off (correctly) do not validate the presence of a TLD since it's not technically required (foo@bar is considered valid). But if you are building consumer products it's best to test that there is at least a presence of something TLD-like after validating against this regexp.
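
A small Ruby sketch of that two-step check: match the standard-library pattern first, then require something TLD-like. The method name `consumer_email?` is illustrative, and "TLD-like" is interpreted here simply as "the domain contains a dot", which is an assumption.

```ruby
require "uri"

# Passes the standard-library pattern AND has a dot somewhere in the domain.
def consumer_email?(address)
  return false unless URI::MailTo::EMAIL_REGEXP.match?(address)

  domain = address.rpartition("@").last
  domain.include?(".")
end

puts URI::MailTo::EMAIL_REGEXP.match?("foo@bar")  # true (no TLD required)
puts consumer_email?("foo@bar")                   # false
puts consumer_email?("foo@bar.com")               # true
```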
[+] don-code|4 years ago|reply
It's also interesting to step through the history on this line - it's undergone several revisions and, of course, also seen some reverts of well-intentioned features.
[+] User23|4 years ago|reply
I have a .email domain that at least one major site rejects as invalid. Quite annoying.
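
This kind of rejection is what a validator that expects a 2-3 character TLD, as mentioned earlier in the thread, would produce. A Ruby sketch, where `STRICT_TLD` is a hypothetical over-strict pattern (not taken from any particular site or library) and the address is made up:

```ruby
require "uri"

# Hypothetical over-strict validator: the final label must be 2 or 3 letters.
STRICT_TLD = /\A[^@\s]+@[^@\s]+\.[a-z]{2,3}\z/i

address = "user@example.email"  # made-up address on a .email domain

puts STRICT_TLD.match?(address)                 # false
puts URI::MailTo::EMAIL_REGEXP.match?(address)  # true
```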
[+] davchana|4 years ago|reply
Right. The Discover bank app's Zelle settings don't allow any email on a .in domain, as if they assume that nobody from India who already has an email on a .in domain will come to the US and use their Zelle.
[+] gclawes|4 years ago|reply
Ruby's URI classes are such a pain in the ass. They seem so un-rubyish