DavidMcLaughlin|14 years ago
At work we had a researcher from Yahoo Mail come in and give a presentation on the machine learning techniques they use to try and stop spammers abusing their mail servers. It was eye-opening to learn just what kind of hourly battle they face to keep spam out of their systems and the ways they are trying to combat it. It was even more enlightening when the presenter told stories about the problems that machine learning can't solve - like people within the company being bribed to whitelist spam companies based in Vegas.
On the surface it's such a simple problem, and I'm sure anyone who's tried to prevent their web application's outgoing mail being marked as spam by the evil corporations of Yahoo and Google will have had the desire to go write a blog post saying what a crock of shit the whole thing is and how they would never take part in that. But here's the thing - those systems are in place because if they weren't, email would be a completely useless form of communication at this point.
The people sending spam make _millions_ of dollars abusing a system which is popular because it's open and based on trust. That kind of money combined with greed gives people all different levels of drive and incentive to get their emails about bigger penises and viagra through to your inbox. Every time the providers prevent one form of attack, these guys will create a new one.
To do this they do things like install mail servers on unsuspecting users' machines, specifically targeting Yahoo/Hotmail/Google users because their IP will obviously need to be trusted by those companies. They will also hack into other people's private mail servers. They will spoof email headers and pretend they're someone else. They will hire people, experts, who will find new ways of breaking into servers they detect as having mail servers running on them. All this just to get past the spam filters and prevention that make email a useful form of communication to begin with.
And let's forget the people who couldn't set up their own mail server for just a second. I like to think I know what I'm doing. After installing Postfix and jumping through all the hoops to get my emails whitelisted by Gmail and making sure I didn't have an open relay on my mail server, you know what happened? Someone managed to hack in by brute force anyway. I only noticed because of the _millions_ of automated replies that were coming in every day from dead email accounts or people that were out of office.
Now, I could have worked hard to fight this. I could have done something other than changing my passwords and hoping they didn't crack them again. But the point is - I only ran a mailserver to get email delivered to me on my personal domain. I didn't want to have to fight and battle and dedicate myself to solving this problem. I wanted to take this thing for granted. I just wanted to send and receive email. Instead bad people could not only sit there and read all my incoming mail - but they could use my server to spam people and get me blacklisted and blocked from so many other services I worked so hard to be trusted by. And they did all this without even specifically targeting me. I was a statistic to them, someone who simply didn't know what they know. In the end, I moved my personal mail account to Google Apps, free of charge. Problem solved.
By using Gmail or Yahoo Mail or Hotmail - you are almost definitely more secure than setting up your own mailserver. You have people paid hundreds of thousands of dollars a year working full time to make sure your data is secure. I mean if privacy is your reason not to use Gmail, then I hope for your sake your mail server is secure. Maybe you think it is. I know I did too.
And all these people complaining about advertisements based on the content of their emails. Yahoo Mail had a team of like 30 people just doing _research_ on how to stop spammers. Then all these other people working on support. How does that service get provided to us _free of charge_ without advertisements or some sort of monetisation? I know in some people's heads they think it's literally just a Bayesian classifier and some hand-coded rules, but it's so beyond that.
And of course, let's not forget the fact that a lot of people would not be able to set up their own mail server anyway. Maybe you don't need them, but Hotmail, Gmail and Yahoo Mail enable hundreds of millions of people to communicate _for free_ with other people around the world that otherwise wouldn't be technically competent enough to buy a domain name and set up a local mail server. It lets you communicate with them too, because they don't get frustrated wading through hundreds of spam emails just to read the good stuff.
And that system only works because we have good guys that are fighting the bad guys who want to ruin it for the rest of us. And this is just the one example of email, which has all these decentralised and open properties that you desire. I am reminded of Diaspora when they released a first beta of their code and it got absolutely torn to shreds for security reasons, and we haven't heard much since.
The real world sucks.
That's why I think it might be a good idea for you to go work for Google.
praptak|14 years ago
Yes, spam fighting is hard. Yes, it's probably easier with huge centralized installations (he actually observed that at this point the centralization offers advantages over the decentralized model.) But his main point was not about spam nor even about e-mail in general. His point was that it is worth putting the additional effort into making decentralized systems work. This is definitely not what Google are doing.
mkr-hn|14 years ago
jasonzemos|14 years ago
The centralized solution you've proposed, carried to its fullest extent, is basically eliminating email altogether, where only a small cabal of whitelisted services are able to pass messages to each other. If spam detection software must remain secretive and proprietary at these big companies, this is basically a capitulation to the spammers.
DavidMcLaughlin|14 years ago
The anti-spam systems work because they are based on the content of emails and on properties across the provider's entire user-base. Every time you click "Mark as spam" you are contributing data for all users in the service. In a decentralised service, even if people agreed to submit all their emails and information for the greater good (which they probably wouldn't), the data still needs to be centralised somewhere and secured by experts. The blacklist/whitelist of notorious spammers and servers needs to be maintained somewhere. You end up having a committee to do that, an elected/trusted group of people and they need to deal with appeals, etc.
Two:
If the logic for blocking spam were public, don't you think that would make it much easier for spammers to circumvent?
Edit - I can't reply to the user below. Must be some HN feature. But the logic for accepting an email is essentially a decision tree, it is based on data and evolves over time. It is a very different problem from something like encryption.
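For what it's worth, here is a rough sketch of the pooled "Mark as spam" idea. It assumes a toy naive Bayes scorer, which is not what any provider actually runs, and the names (`SharedSpamFilter`, `report`, `spam_log_odds`) are made up for illustration. The only point is that every user's report updates one shared set of counts.

```python
import math
from collections import Counter


class SharedSpamFilter:
    """Toy naive Bayes scorer trained on reports pooled from all users.

    Hypothetical illustration only; real provider systems are far more
    elaborate and are not described in this thread.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def report(self, text, label):
        # One user's "Mark as spam" / "Not spam" click updates the counts
        # that are used to score mail for everybody.
        if label not in self.word_counts:
            raise ValueError("label must be 'spam' or 'ham'")
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_log_odds(self, text):
        # Log odds of spam vs. ham with add-one smoothing.
        # A positive value means the message looks more like reported spam.
        vocab_size = max(
            len(set(self.word_counts["spam"]) | set(self.word_counts["ham"])), 1
        )
        spam_total = sum(self.word_counts["spam"].values())
        ham_total = sum(self.word_counts["ham"].values())
        score = math.log(
            (self.message_counts["spam"] + 1) / (self.message_counts["ham"] + 1)
        )
        for word in text.lower().split():
            p_spam = (self.word_counts["spam"][word] + 1) / (spam_total + vocab_size)
            p_ham = (self.word_counts["ham"][word] + 1) / (ham_total + vocab_size)
            score += math.log(p_spam / p_ham)
        return score


# Reports from different users all land in the same filter.
shared = SharedSpamFilter()
shared.report("cheap viagra now", "spam")        # user A
shared.report("cheap pills now", "spam")         # user B
shared.report("meeting notes attached", "ham")   # user C
shared.report("lunch tomorrow", "ham")           # user D
print(shared.spam_log_odds("cheap viagra"))
print(shared.spam_log_odds("meeting tomorrow"))
```

A single self-hosted server only ever sees its own owner's reports, which is the data disadvantage described above.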
T-hawk|14 years ago
Arguably, that's exactly what Facebook is. Users whitelist each other and use that channel to communicate, skipping email.
rwmj|14 years ago
If people are bribing insiders at Yahoo to whitelist email servers in Las Vegas, why aren't the insiders and the spammers all in prison?
bhickey|14 years ago
I also think you're placing a mistaken emphasis on data. It's address books, not your data, that provide lock-in on these services. As far as I know, any of them will let you wrest your e-mail from their claws via IMAP or POP. The hard part is telling your contacts to mail you at <address>@gmail.com instead of <address>@hotmail.com
Full disclosure: I recently accepted a job from Google. My opinions on this matter are mine alone and are not based on any confidential information. I forward e-mail from my own domain to gmail. I also run a mixmaster anonymous remailer.
joeyh|14 years ago
This is not a description of your email server being cracked. It's a description of someone Joe-jobbing you, i.e. pretending to send mail from your domain. Duckgo for mitigation techniques.
spudlyo|14 years ago
http://en.wikipedia.org/wiki/Backscatter_(e-mail)
dredmorbius|14 years ago
carbonica|14 years ago
The truth is, the OP's domain was probably considered to be in a bad "neighborhood" because his mail server had been compromised for spamming purposes at one point or another. It's dreadfully easy to either misconfigure a mail server or to end up with your mail server compromised.
Regardless, it's easy to hate on Google, especially in a primarily entrepreneurial forum where those posting are often trying to solve tough problems with far fewer resources. But Google is solving tough problems, even when you feel you've been wronged by an algorithm. Gmail has had an unbelievably successful spam filter for years, forcing the competition to rise to the occasion and match it, to the point where people forget how serious a problem spam is. It's not trivial, and it doesn't mean there's a democratic crisis when your e-mails end up in a spam bin. Especially when it's quite likely because your mail server was compromised.
davidw|14 years ago
I didn't feel the 'hate'. I read that he didn't particularly care for Google's approach. He certainly says nothing about Google not solving tough problems.
I thought it was a pretty fair piece actually, giving Google credit where it's due, and without trying to demonize them; just stating that he doesn't agree with where they're going.
kragen|14 years ago
I hope I didn't come across as "hating on Google."
jff|14 years ago
abecedarius|14 years ago
mgkimsal|14 years ago
pyre|14 years ago
zinkem|14 years ago
It also seems like putting everyone's information in one place makes it easier for hackers to harvest, as well. Gmail probably has a security hole somewhere, too. If Gmail's hole is discovered, everyone's emails are compromised (or a large number of people's). If a private server gets compromised, there isn't as much there. There's not as much motivation to hack 1000 servers to get 1000 people's information as there is to hack 1 server to get 1000 people's information (although I recognize that one server is going to be a lot harder to crack on average).
I'm open to an education on this topic, as I don't know the methods of modern spammers/crackers.
j_baker|14 years ago
Google dreams of being able to handle all that information on one server.
Besides that, it's not incredibly common (albeit not impossible) for people to steal information by actually hacking directly into the servers, especially with someone like Google. More likely ways to get at someone's email are XSS or phishing attacks.