This guy really should go work for Google and figure out the problems they need to deal with running a service like Gmail. Even for just a little while.

At work we had a researcher from Yahoo Mail come in and give a presentation on the machine learning techniques they use to try and stop spammers abusing their mail servers. It was eye-opening to learn just what kind of hourly battle they face to keep spam out of their systems and the ways they are trying to combat it. It was even more enlightening when the presenter told stories about the problems that machine learning can't solve - like people within the company being bribed to whitelist spam companies based in Vegas.

On the surface it's such a simple problem, and I'm sure anyone who's tried to prevent their web application's outgoing mail being marked as spam by the evil corporations of Yahoo and Google will have had the desire to go write a blog post saying what a crock of shit the whole thing is and how they would never take part in that. But here's the thing - those systems are in place because if they weren't, email would be a completely useless form of communication at this point.

The people sending spam make _millions_ of dollars abusing a system which is popular because it's open and based on trust. That kind of money combined with greed gives people all different levels of drive and incentive to get their emails about bigger penises and viagra through to your inbox. Every time they prevent one form of attack, these guys will create a new one.

To do this they do things like install mail servers on unsuspecting users' machines, specifically targeting Yahoo/Hotmail/Google users because their IP will obviously need to be trusted by those companies. They will also hack into other people's private mail servers. They will spoof email headers and pretend they're someone else. They will hire people, experts, who will find new ways of breaking into servers they detect as having mail servers running on them. All this just to get past the spam filters and prevention that make email a useful form of communication to begin with.

And let's forget the people who couldn't set up their own mail server for just a second. I like to think I know what I'm doing. After installing Postfix and jumping through all the hoops to get my emails whitelisted by Gmail and making sure I didn't have an open relay on my mail server, you know what happened? Someone managed to hack in by brute force anyway. I only noticed because of the _millions_ of automated replies that were coming in every day from dead email accounts or people that were out of office.

Now, I could have worked hard to fight this. I could have done something other than changing my passwords and hoping they didn't crack them again. But the point is - I only ran a mailserver to get email delivered to me on my personal domain. I didn't want to have to fight and battle and dedicate myself to solving this problem. I wanted to take this thing for granted. I just wanted to send and receive email. Instead bad people could not only sit there and read all my incoming mail - but they could use my server to spam people and get me blacklisted and blocked from so many other services I worked so hard to be trusted by. And they did all this without even specifically targeting me. I was a statistic to them, someone who simply didn't know what they know. In the end, I moved my personal mail account to Google Apps, free of charge. Problem solved.

By using Gmail or Yahoo Mail or Hotmail - you are almost definitely more secure than setting up your own mailserver. You have people paid hundreds of thousands of dollars a year working full time to make sure your data is secure. I mean if privacy is your reason not to use Gmail, then I hope for your sake your mail server is secure. Maybe you think it is. I know I did too.

And all these people complaining about advertisements based on the content of their emails. Yahoo Mail had a team of like 30 people just doing _research_ on how to stop spammers. Then all these other people working on support. How does that service get provided to us _free of charge_ without advertisements or some sort of monetisation? I know in some people's heads they think it's literally just a Bayesian classifier and some hand-coded rules, but it's so beyond that.

And of course, let's not forget the fact that a lot of people would not be able to set up their own mail server anyway. Maybe you don't need them, but Hotmail, Gmail and Yahoo Mail enable hundreds of millions of people to communicate _for free_ with other people around the world that otherwise wouldn't be technically competent enough to buy a domain name and set up a local mail server. It lets you communicate with them too, because they don't get frustrated wading through hundreds of spam emails just to read the good stuff.

And that system only works because we have good guys that are fighting the bad guys who want to ruin it for the rest of us. And this is just the one example of email, which has all these decentralised and open properties that you desire. I am reminded of Diaspora when they released a first beta of their code and it got absolutely torn to shreds for security reasons, and we haven't heard much since.

The real world sucks.

That's why I think it might be a good idea for you to go work for Google.

praptak|14 years ago

Thank you, this post was good and informative. Nevertheless I think you missed his main point and concentrated on something that was merely incidental to it.

Yes, spam fighting is hard. Yes, it's probably easier with huge centralized installations (he actually observed that at this point the centralization offers advantages over the decentralized model.) But his main point was not about spam nor even about e-mail in general. His point was that it is worth putting the additional effort into making decentralized systems work. This is definitely not what Google are doing.

mkr-hn|14 years ago

I think Google could make a big difference by licensing an email antispam API like Automattic does with Akismet. You'd get the best of both worlds.

jasonzemos|14 years ago

Your solution to admin incompetence is for a centralized service to eliminate the admin. Why can't the service just provide competence? If dozens of people are working round the clock to eliminate spam for your free mail service, why can't they package that and let you control your own data?

The centralized solution you've proposed, carried to its fullest extent, is basically eliminating email altogether, where only a small cabal of whitelisted services are able to pass messages to each other. If spam detection software must remain secretive and proprietary at these big companies, this is basically a capitulation to the spammers.

DavidMcLaughlin|14 years ago

One:

The anti-spam systems work because they are based on the content of emails and on properties across the provider's entire user-base. Every time you click "Mark as spam" you are contributing data for all users in the service. In a decentralised service, even if people agreed to submit all their emails and information for the greater good (which they probably wouldn't), the data still needs to be centralised somewhere and secured by experts. The blacklist/whitelist of notorious spammers and servers needs to be maintained somewhere. You end up having a committee to do that, an elected/trusted group of people and they need to deal with appeals, etc.
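To make the "every click contributes data for all users" point concrete, here is a toy sketch of a filter trained on reports pooled from every user. It is a plain naive Bayes word model, which is my own stand-in for illustration and certainly not what any real provider runs; the class name `PooledSpamFilter` and its methods are made up for this example.

```python
import math
from collections import Counter


class PooledSpamFilter:
    """Toy naive Bayes filter trained on reports pooled from all users.

    Hypothetical illustration only: real providers use far more signals.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def report(self, text, label):
        """Record one user's click; label is "spam" or "ham"."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_log_odds(self, text):
        """Return a log-odds score; above 0 means the text looks spam-like."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        # Prior from how many messages of each kind have been reported
        score = math.log((self.message_counts["spam"] + 1) /
                         (self.message_counts["ham"] + 1))
        for word in text.lower().split():
            for label, sign in (("spam", 1), ("ham", -1)):
                total = sum(self.word_counts[label].values())
                # Laplace smoothing so unseen words do not zero out the score
                p = (self.word_counts[label][word] + 1) / (total + len(vocab) + 1)
                score += sign * math.log(p)
        return score
```

Each call to `report` stands for one user's click, and the score for a new message depends on what everybody else has reported. That is the part that is hard to reproduce on a single-user server.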

Two:

If the logic for blocking spam were public, don't you think that would make it much easier for spammers to circumvent?

Edit - I can't reply to the user below. Must be some HN feature. But the logic for accepting an email is essentially a decision tree, it is based on data and evolves over time. It is a very different problem from something like encryption.

T-hawk|14 years ago

> basically eliminating email altogether, where a small cabal of whitelisted services are only able to pass messages to each other.

Arguably, that's exactly what Facebook is. Users whitelist each other and use that channel to communicate, skipping email.

rwmj|14 years ago

The real solution is political.

If people are bribing insiders at Yahoo to whitelist email servers in Las Vegas, why aren't the insiders and the spammers all in prison?

bhickey|14 years ago

They can't provide competence in a box because there's no free lunch. What would motivate a free e-mail provider to hand you the keys to the castle? If you want this product, get ready to pay for it.

I also think you're placing a mistaken emphasis on data. It's address books, not your data, that provide lock-in on these services. As far as I know, any of them will let you wrest your e-mail from their claws via IMAP or POP. The hard part is telling your contacts to mail you at <address>@gmail.com instead of <address>@hotmail.com

Full disclosure: I recently accepted a job from Google. My opinions on this matter are mine alone and are not based on any confidential information. I forward e-mail from my own domain to gmail. I also run a mixmaster anonymous remailer.

joeyh|14 years ago

"Someone managed to hack in by brute force anyway. I only noticed because of the _millions_ of automated replies that were coming in every day from dead email accounts or people that were out of office."

This is not a description of your email server being cracked. It's a description of someone Joe-jobbing you: pretending to send mail from your domain. Duckgo for mitigation techniques.
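One mitigation that usually comes up for forged senders is SPF, where a domain publishes in a DNS TXT record which hosts may send mail for it, so receivers can reject forgeries instead of bouncing them. SPF is my example here, not necessarily what was meant above. The sketch below only handles unqualified `ip4:` mechanisms in a record string; the function name and the sample addresses (documentation ranges) are hypothetical, and a real evaluator also has to resolve `mx`, `a`, `include` and the rest via DNS.

```python
import ipaddress


def ip4_allowed_by_spf(spf_record, sender_ip):
    """Check sender_ip against the ip4: mechanisms of an SPF record string.

    Partial sketch: only unqualified ip4: terms are looked at, nothing is
    fetched from DNS, and the trailing "all" mechanism is ignored.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split():
        if term.lower().startswith("ip4:"):
            # strict=False tolerates host bits being set in the CIDR range
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return True
    return False


# Hypothetical record for a domain that only sends from 192.0.2.0/24
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(ip4_allowed_by_spf(record, "192.0.2.10"))
print(ip4_allowed_by_spf(record, "198.51.100.7"))
```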

spudlyo|14 years ago

See also: backscatter

http://en.wikipedia.org/wiki/Backscatter_(e-mail)

dredmorbius|14 years ago

In fairness, there's not enough information provided to determine which this is, though my suspicion is that OP wouldn't know the difference regardless.

carbonica|14 years ago

It is truly disheartening to see you'd been downvoted when I came into this thread.

The truth is, the OP's domain was probably considered to be in a bad "neighborhood" because his mail server had been compromised for spamming purposes at one point or another. It's dreadfully easy to either misconfigure a mail server or to end up with your mail server compromised.

Regardless, it's easy to hate on Google, especially in a primarily entrepreneurial forum where those posting are often trying to solve tough problems with far fewer resources. But Google is solving tough problems, even when you feel you've been wronged by an algorithm. Gmail has had an unbelievably successful spam filter for years, forcing the competition to rise to the occasion and match it, to the point where people forget how serious a problem spam is. It's not trivial, and it doesn't mean there's a democratic crisis when your e-mails end up in a spam bin. Especially when it's quite likely because your mail server was compromised.

davidw|14 years ago

> Regardless, it's easy to hate on Google, especially in a primarily entrepreneurial forum where those posting are often trying to solve tough problems with far fewer resources. But Google is solving tough problems,

I didn't feel the 'hate'. I read that he didn't particularly care for Google's approach. He certainly says nothing about Google not solving tough problems.

I thought it was a pretty fair piece actually, giving Google credit where it's due, and without trying to demonize them; just stating that he doesn't agree with where they're going.

kragen|14 years ago

No, our mail server has never been compromised for spamming purposes. I'm well aware of how easy it is to misconfigure a mail server, and it's not that I think we are too smart or paranoid to have done so; it's just that in the years that we've been struggling with that problem, we've never discovered that misconfiguration, or discovered outgoing spam (other than bounces from e.g. kragen-tol-request.)

I hope I didn't come across as "hating on Google."

jff|14 years ago

All it takes to be considered a "bad neighborhood" is to have a dynamic, ISP-owned IP, as I found out when I tried to send mail from my personal server. And yes, I'm too cheap to pay Comcast even more money for a static IP.
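For anyone wondering how a receiver decides that: as far as I know, many of them consult DNS blocklists, some of which list whole ranges of dynamically assigned addresses. The lookup convention is to reverse the IPv4 octets, append the list's zone, and see whether the name resolves. A small sketch of building that name, where `dnsbl.example.org` is a placeholder zone and not a real list:

```python
import ipaddress


def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the DNS name a receiver would look up for an IPv4 address.

    The default zone is a placeholder, not a real blocklist.
    """
    # Validate the address and normalise it to dotted-quad form
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Blocklist convention: reversed octets, followed by the list's zone
    return ".".join(reversed(octets)) + "." + zone


print(dnsbl_query_name("203.0.113.7"))
```

If that name resolves to an address, the IP is on the list; no answer means it is not listed.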

abecedarius|14 years ago

I run my own mailserver too and saw some similar problems from early on, though not AFAIK with gmail in particular. If it ever has been compromised, I doubt it was right away.

mgkimsal|14 years ago

So... "it's too hard so I'll just let google handle all my email". Works fine until people start blocking google mail because they don't trust them. This isn't "might happen one day" - it happens to me today already. You're just punting on the real issue, kicking the can a few months down the road.

pyre|14 years ago

Who's blocking Google? Do you mean everything from Google servers or only @gmail.com addresses? If so it's trivial to get a domain and still be using Google for your email.

zinkem|14 years ago

I never got spam before I used gmail. Now maybe this has more to do with timing, but it seems like putting everyone's emails on the same domain just makes things easier for spammers. Seems to me like spam is a problem caused by centralization, not solved by it.

It also seems like putting everyone's information in one place makes it easier for hackers to harvest, as well. Gmail probably has a security hole somewhere, too. If gmail's hole is discovered, everyone's emails are compromised (or a large number of people). If a private server gets compromised, there isn't as much there. There's not as much motivation to hack 1000 servers to get 1000 people's information as there is to hack 1 server to get 1000 people's information (although I recognize that one server is going to be a lot harder to crack on average).

I'm open to an education on this topic, as I don't know the methods of modern spammers/crackers.

j_baker|14 years ago

> It also seems like putting everyone's information in one place makes it easier for hackers to harvest, as well.

Google dreams of being able to handle all that information on one server.

Besides that, it's not incredibly common (albeit not impossible) for people to steal information by actually hacking directly into a provider's servers, especially with someone like Google. More likely ways to get at someone's email are XSS or phishing attacks.