
Real Email Validation

47 points| pythonist | 13 years ago |djangotips.com | reply

42 comments

[+] roc|13 years ago|reply
The only E-Mail validation involves sending an actual email with a response link.

Because even if people happen to give you a functional email address, it isn't necessarily their email address.

And I say that as someone who has come to regret registering a first-initial-last-name gmail address. And it's not even a particularly common last name.

[+] vincentkriek|13 years ago|reply
I think the purpose of this validation is to help people who mistype their email address, not to check whether it is their email address.
[+] drcongo|13 years ago|reply
Feel my pain. I have surname at gmail.
[+] papsosouid|13 years ago|reply
>And I say that as someone who has come to regret registering a first-initial-last-name gmail address. And it's not even a particularly common last name.

It is amazing how common this problem is. I assumed it was incredibly rare, but I have 23 different people who have given my email address to someone thinking it was theirs somehow. Not like "I am just signing up for some forum" kind of stuff, logins to government websites, banks, car dealerships sending me stuff about someone else's financing, etc, etc. It is crazy how many places don't verify the owner of an email address before sending it sensitive info.

[+] baudehlo|13 years ago|reply
This is just awful. A quick scan of the code brings up the following problems:

* It fails to deal with the case where there is no MX record for the domain (fall back to A record)

* It fails to sort the MX records by preference, potentially falling foul of tarpits

* It fails to connect to each A record lookup of the MX host on failures

* It fails to deal with transient failures (such as 4xx responses)

That was just from a quick scan.

Connecting to MX servers in a web environment (especially one using blocking I/O like Django) is generally a really bad idea. Many MX servers use delays and slow responses to combat spammers, and you're passing those slow responses on to your users.

Just check it looks vaguely like an email (the regexp fein posted is good enough most of the time) and send a confirmation email - it's the right thing to do.
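
For illustration only, here is a rough sketch of how the first, second and fourth points above could be handled. It assumes the DNS lookup has already been done and produced (preference, host) pairs; the function names `delivery_hosts` and `classify_reply` are made up for this example and are not taken from the article's code. Trying every address of each MX host (the third point) is left out.

```python
from typing import List, Tuple


def delivery_hosts(domain: str, mx_records: List[Tuple[int, str]]) -> List[str]:
    """Return the hosts to try, in order, for a domain.

    mx_records is a list of (preference, hostname) pairs as a DNS lookup
    might return them; the lookup itself is not part of this sketch.
    """
    if not mx_records:
        # No MX record: fall back to the domain's own address record.
        return [domain]
    # Lower preference values are tried first.
    return [host for _preference, host in sorted(mx_records)]


def classify_reply(code: int) -> str:
    """Map an SMTP reply code to a coarse outcome."""
    if 200 <= code < 300:
        return "accepted"
    if 400 <= code < 500:
        # Transient failure (greylisting, for example): retry later.
        return "retry"
    if 500 <= code < 600:
        return "rejected"
    return "unknown"
```

Even with this in place, the probe still runs inside the request, so the slow-response problem described above remains.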

[+] andrewaylett|13 years ago|reply
Failing to deal with transient failures is especially bad when trying to deliver to a system that uses greylisting.
[+] greyboy|13 years ago|reply
Additionally, doesn't it rely on the truthfulness of the SMTP server? That's not a good assumption - it's common for a server to accept anything and null-route bad addresses.
[+] jodrellblank|13 years ago|reply
And I'll still give you [email protected], it will pass every check you can throw at it, including sending an email and getting me to click a link, and it still won't be a real email address.

Still your move, e-mail harvesters.

Checking that I haven't mistyped it or put the wrong thing in the wrong field is a basic sanity check. Beyond that, the only way to actually get a real email address that I read is to be a service I care about.

[+] Swizec|13 years ago|reply
For me the trick isn't to get my real email address, I give that to anyone.

But kudos to you if you can make it into my "Important and unread" inbox and remain there. It's the only part of my email that I actually check.

Some services are so great I let their daily reminder emails go there and enjoy reading them. That's right, there are services out there (I only know of one) whose daily "You should use us" email is so awesome I enjoy reading it every day.

[+] martinp|13 years ago|reply
Making your app connect to random SMTP servers every time it needs to validate an email address doesn't seem like a good idea.

Shared domains (gmail.com etc.) might even get you blacklisted if you flood the same SMTP servers over and over again.

[+] healthenclave|13 years ago|reply
Is there a workaround? How about using a proxy? I guess that adds another layer of complexity, though.
[+] tomwalsham|13 years ago|reply
The best way to improve email delivery is to understand that email addresses represent humans. Address validation and long-term deliverability is primarily a problem of social engineering, not technical.

Ordinarily I'm in favour of things that can improve data quality with minimal user friction, but in this case while it looks like an attractive solution, it's both dangerous _and_ broken.

It's dangerous because if you repeatedly open empty SMTP sessions with major ISPs (and some neckbeard boxen) to validate addresses, you will rapidly fall onto blacklists. Furthermore, the existence of an address says nothing about the end user's ownership of that address.

It's broken because of the myriad crazy responses that mailservers return: 5XX errors for soft bounces, 4XX errors for permanent failures, deliberately dead primary MX servers... The web's email infrastructure is so massively fragmented and quirkily non-RFC-compliant that you just cannot rely on technical solutions to these problems except at the scale of an ESP (disclaimer: I work at PostageApp.com, a transactional ESP, and we tackle this problem on a large scale).

Finally, it fails my 'Spammer Sniff Test': if you think of a clever trick to improve email delivery/opens/responses etc., it was thought up 10 years ago by spammers and has long since been added to the blocked behaviours in email protection infrastructure.

Check for '@', and craft your email verification process to incentivize following through. For long term delivery (to bypass the mailinator issue) provide value, pure and simple.
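
As a minimal sketch of the verification step, assuming a signed, expiring token is embedded in the confirmation link: `SECRET_KEY`, `MAX_AGE_SECONDS`, `make_token` and `confirmed_email` are hypothetical names chosen for this example, and a real deployment would also need to URL-encode the token and store the confirmed state.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # assumption: a per-deployment secret
MAX_AGE_SECONDS = 3 * 24 * 3600  # assumption: links expire after three days


def _sign(payload: str) -> str:
    """Return a hex HMAC-SHA256 signature for the payload."""
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(email: str, now: Optional[float] = None) -> str:
    """Build the token to embed in the confirmation link."""
    issued = int(time.time() if now is None else now)
    payload = f"{issued}:{email}"
    return f"{payload}:{_sign(payload)}"


def confirmed_email(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the address if the token is authentic and fresh, else None."""
    payload, sep, signature = token.rpartition(":")
    if not sep:
        return None
    expected = _sign(payload)
    # Constant-time comparison of the two signatures.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    issued_text, sep, email = payload.partition(":")
    if not sep or not issued_text.isdigit():
        return None
    current = time.time() if now is None else now
    if current - int(issued_text) > MAX_AGE_SECONDS:
        return None
    return email
```

Only the person who can read the mailbox can present a valid token, which is the property that no amount of syntax or SMTP probing provides.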

[+] bambax|13 years ago|reply
As an aside, would there be some value in providing an email validator API?

Something exactly like this: http://mythic-beasts.com/~pdw/cgi-bin/emailvalidate

but which would respond in an easy-to-parse way (JSON|XML).

It could be enriched by detecting common spelling errors ('gmial' or 'g-a53'* instead of 'gmail' for example).

*: gmail when typed on a European laptop with numlock on.
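
A rough sketch of the misspelling detection, using fuzzy matching from Python's standard `difflib`: the domain list, the 0.8 cutoff and the name `suggest_correction` are assumptions made for this example. It would catch 'gmial' but not the numlock case, which would need a keyboard-layout mapping.

```python
import difflib
from typing import Optional

# Assumption: a small, hand-picked list of popular mail domains.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_correction(address: str) -> Optional[str]:
    """Return a 'did you mean' address, or None if nothing looks misspelled."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        # Already a known domain: nothing to suggest.
        return None
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=0.8)
    return f"{local}@{matches[0]}" if matches else None
```

The result is best offered as a suggestion the user can ignore, since an unfamiliar domain may be perfectly valid.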

[+] alexkus|13 years ago|reply
It will also reject addresses whose servers purposely soft-bounce (4xx) the first delivery attempt (or any attempts within a certain time limit).
[+] bambax|13 years ago|reply
('SMPT' is used throughout instead of 'SMTP'.)

What does django.core.validators.EmailValidator actually do?

Validating an email address with a regex is surprisingly hard: see http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

I wonder if EmailValidator does this, or something simpler?

[+] baudehlo|13 years ago|reply
That validates RFC822 addresses, which is the full syntax of the From/To/CC headers. You don't want that for validating an email address on a web form.
[+] fein|13 years ago|reply
Here's a secret:

regex: /^(.+)\@(.+)\.(.+)$/

maxlen: 254, minlen:5

Aside from sending your verification email, that's all you need.

[+] Sephr|13 years ago|reply
n@ai (Ian Goldberg's real, valid email address) is rejected by this, both by your minlen and by the requirement for a dot in the domain. You're better off just checking for an @ and leaving the rest to your SMTP library.
[+] threedaymonk|13 years ago|reply
I just check for /.@./, which catches obvious errors like leaving out the email, or typing something in the wrong field. Beyond that, there's no point making assumptions (like "all domains have a dot").
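
To make the difference between the two checks in this subthread concrete, here is a small Python sketch of both. The patterns and length limits are the ones posted above; the function names `passes_fein` and `passes_loose` are made up for the example.

```python
import re

FEIN_PATTERN = re.compile(r"^(.+)@(.+)\.(.+)$")
LOOSE_PATTERN = re.compile(r".@.")


def passes_fein(address: str) -> bool:
    """fein's rule: something@something.something, 5 to 254 characters."""
    return 5 <= len(address) <= 254 and FEIN_PATTERN.match(address) is not None


def passes_loose(address: str) -> bool:
    """The looser rule: an '@' with at least one character on each side."""
    return LOOSE_PATTERN.search(address) is not None


for candidate in ["user@example.com", "n@ai", "example.com", "@example.com"]:
    print(candidate, passes_fein(candidate), passes_loose(candidate))
```

Both accept user@example.com and reject a bare domain; only the looser check accepts n@ai.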
[+] pythonist|13 years ago|reply
I believe that this part is using Django's pattern matching:

super(EmailValidator, self).__call__(value)

Just tried it. It works!

[+] micampe|13 years ago|reply
Are single-letter domains and TLDs allowed?
[+] makethetick|13 years ago|reply
Could be easily modified to verify email lists too, very handy if you haven't sent for a while and want to avoid bounces.
[+] jpadilla_|13 years ago|reply
This is pretty awesome! I wonder how much time it would take to validate. The last thing I would want is to make the signup process even slower. I guess you could still let the user pass and then run an async task to check "if the domain name exists, ask for MX server list from DNS, and verify that SMPT server will receive a message to that address" and then maybe set a flag somewhere.
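
Something like the following is what I have in mind, as a sketch only: the in-memory dict stands in for a flag on the user record, `check` is a hypothetical callable that does the slow DNS/SMTP probe, and a thread pool stands in for whatever task queue the project actually uses.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# Assumption: an in-memory dict stands in for a flag column on the user record.
deliverability_flags: Dict[str, str] = {}
_executor = ThreadPoolExecutor(max_workers=4)


def register(email: str, check: Callable[[str], bool]) -> Future:
    """Accept the signup immediately and verify the address in the background.

    check is a hypothetical callable that performs the slow probe.
    """
    deliverability_flags[email] = "pending"

    def run() -> None:
        try:
            deliverability_flags[email] = "ok" if check(email) else "suspect"
        except Exception:
            # Timeouts and odd server replies should not block the user.
            deliverability_flags[email] = "unknown"

    return _executor.submit(run)
```

The signup returns right away and the flag gets filled in whenever the probe finishes.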