A few years back we came into work one morning to find that some bot was scanning our site so hard that it seemed the lights nearly dimmed. Some detective work suggested that it was a service performed on behalf of a competitor, to get our price list (bear in mind that our catalog has a few hundred thousand products).
We were really annoyed that rather than just ask us, they had launched what amounted to a DDoS attack. So we thought about how we might exact vengeance...
After a few hours we figured out a pattern to the rogue requests that allowed us to filter them, despite their efforts at stealth (for example, cycling through a list of user agent strings to make it look like there were multiple different users). We toyed with the idea of, rather than outright banning them, making our pages sensitive to their presence, so that when we detected them, we'd display a false price, defeating their whole operation.
We finally just decided to take the high road, temporarily banning any rogue IP addresses we detected (we couldn't make it permanent because many of the requests came from the Amazon cloud, from which we also receive some legitimate requests).
EDIT: you wouldn't think that requests for a few hundred thousand products would amount to a DDoS, but the bot was rather poorly written and grossly inefficient in the way it walked through the list.
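The temporary-ban approach described above can be sketched as a sliding-window rate check. Everything here is hypothetical — the thresholds, names, and in-memory storage are invented for illustration, not the commenter's actual system:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # look-back window for counting requests
MAX_REQUESTS = 120         # requests per window before an IP is considered rogue
BAN_SECONDS = 3600         # temporary ban length (not permanent, per the comment)

_requests = defaultdict(deque)   # ip -> timestamps of recent requests
_banned_until = {}               # ip -> unix time when its ban expires

def is_allowed(ip, now=None):
    """Return False while `ip` is temporarily banned; otherwise record
    the request and ban the IP if it exceeded the rate threshold."""
    now = time.time() if now is None else now

    # Still serving out a ban?
    if _banned_until.get(ip, 0) > now:
        return False

    # Record this request and drop timestamps outside the sliding window.
    q = _requests[ip]
    q.append(now)
    while q and q[0] < now - WINDOW_SECONDS:
        q.popleft()

    if len(q) > MAX_REQUESTS:
        _banned_until[ip] = now + BAN_SECONDS
        return False
    return True
```

A real deployment would keep this state in something shared (e.g. Redis) rather than process memory, but the windowed count plus an expiry timestamp is the whole idea.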
I built a system called caltrops that did almost exactly that. As a given session's requests grew more and more suspicious, their data would skew from reality further and further. A real user on the line would notice immediately (and the more real-looking the user interactions, the more it would reduce suspicion), but competitors scraping our data would get pretty deliciously bunk data.
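A minimal sketch of the caltrops idea — skewing returned data further from reality as suspicion grows. The names, the 0-to-1 suspicion scale, and the 40% cap are all made up for illustration; the real system isn't described in detail:

```python
import random

def displayed_price(true_price, suspicion, rng=None):
    """Return a price skewed further from reality as suspicion grows.

    suspicion: 0.0 (trusted; real-looking interactions would drive it
    down) through 1.0 (almost certainly a scraper). At 0 the real
    price comes back; at 1 the result can be off by up to +/-40%.
    """
    rng = rng or random.Random()
    max_skew = 0.4 * suspicion                    # distortion capped by suspicion
    factor = 1.0 + rng.uniform(-max_skew, max_skew)
    return round(true_price * factor, 2)
```

The appeal over an outright ban is that the scraper gets no error signal — just quietly worthless numbers.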
"Thousands of people use Mailinator every day, so clearly, it's a useful tool that many sites accept"
How many of you would have an outright revolt on your hands from your QA/QE folks if you banned mailinator? I think everyplace I worked would experience this same issue if we did this.
You can use + in the local part of an email address, such as [email protected], to create throwaways. Most sites consider those to be different email addresses than [email protected] for account purposes, but email services that respect the RFC will treat them as the same.
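A hedged sketch of the normalization a site could apply if it wanted to collapse plus-addressed variants to one account. The function name is invented, and the dot-stripping rule is Gmail-specific behavior, not part of the RFC:

```python
def normalize(address):
    """Collapse plus-addressed throwaways to the underlying mailbox.

    Providers that honor the convention ignore everything from '+'
    to '@' for delivery; Gmail additionally ignores dots in the
    local part (an assumption specific to that provider).
    """
    local, _, domain = address.lower().partition("@")
    local = local.split("+", 1)[0]            # drop the +tag
    if domain in ("gmail.com", "googlemail.com"):
        local = local.replace(".", "")        # Gmail-only dot rule
    return f"{local}@{domain}"
```

Note that aggressive normalization has false positives: some providers treat dots and tags as genuinely distinct mailboxes.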
One way to get around a domain blacklist is to point your own domain to Mailinator. Heck, since last year you can even get your own private Mailinator...
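One defense against that trick, sometimes used by disposable-address blockers, is to look at where a domain's mail is actually delivered rather than at the domain name itself. A sketch, assuming the MX lookup has already been done elsewhere (e.g. with a DNS library); the function name is invented:

```python
def backed_by_mailinator(mx_hosts):
    """Given a domain's MX hostnames (already resolved elsewhere),
    report whether its mail is actually delivered to Mailinator.

    Handles trailing dots and mixed case as returned by DNS tooling.
    """
    return any(
        h.rstrip(".").lower().endswith("mailinator.com")
        for h in mx_hosts
    )
```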
It took me a bit to get my head around the use cases. It's sometimes amazing how many different ways you can twist a simple (complex really) thing like email into a product/idea.
However, tricking site scrapers may not work perfectly if the scrapers maintain a whitelist of known-legitimate domains. Say I am scraping mailinator.com for domain names: if I see gmail.com or yahoo.com, I might just not put them in my database, because they are in my whitelist.
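That whitelist filter on the scraper's side is a one-liner; the domain list here is purely illustrative:

```python
# Known-legitimate providers a honeypot page might plant as decoys.
TRUSTED = {"gmail.com", "yahoo.com", "outlook.com"}

def harvest(scraped_domains):
    """Keep only domains not on the trusted whitelist, discarding
    decoys so poisoned entries never reach the database."""
    return [d for d in scraped_domains if d.lower() not in TRUSTED]
```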
Mailinator seems to have added some other anti-scraping detection.
Unfortunately it does not work very well, as I was not scraping Mailinator but still somehow got IP banned. Fortunately my IP has changed. But they definitely have some strange and overzealous method now.
I would go one step further and look for {spam_words} in "username+{text}@{googledomain}.com", where spam_words can be "junk", "spam", etc. This is a very narrow edge case, but it still might catch something. Again, only if you're into that kind of thing. I'm quite skeptical that it brings any value.
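A sketch of that check, with an invented function name and an illustrative word list:

```python
import re

# Tags that suggest the address was created to be thrown away.
SPAM_WORDS = {"junk", "spam", "trash", "throwaway"}

def suspicious_plus_tag(address):
    """Flag addresses whose +tag suggests a throwaway, e.g. a tag
    of 'junk' or 'spam' on a Google-hosted mailbox."""
    m = re.fullmatch(r"[^@+]+\+([^@]+)@(gmail|googlemail)\.com", address.lower())
    return bool(m) and m.group(1) in SPAM_WORDS
```

As the comment itself concedes, a user can trivially defeat this by picking any other tag, so it's a heuristic at best.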
FTA: "Could I make it harder to scrape? Well, I could, but wouldn't really slow anyone down much."
I think that's the basic idea. He could spend his time making it harder to scrape, like the bar across the steering wheel. Some people would be deterred, others wouldn't, and time would be wasted all around.
At least at the time of writing, if you had enough foresight and engineering time to set something like that up, you had enough foresight and engineering time to not make your system treat email addresses as meaningful identities.
zinxq | 10 years ago
I hadn't read that in many years, and what fun to do a re-read.
Thanks Internet - don't stop being you.
dice | 10 years ago
I hope you don't mind that I wrote a quick one-liner to see if you're still detecting bots...
Yup :) I didn't see any "evil" insertions, though...
8ig8 | 10 years ago
http://mailinator.blogspot.com/2014/10/mailinator-launches-p...
botbot | 10 years ago
OCR requires a lot more programming effort than a text-based content scraper.