item 528541

Ask HN: Do you trust Amazon S3 or Mosso Cloudfiles not to lose or corrupt your data?

15 points | bk | 17 years ago

Do you treat these systems as "reliable" and "safe" backends (since they're physically and geographically replicated) or do you feel it's still necessary to back up data hosted on their services?

How/where do you back up that data to, especially with large amounts of data (e.g. 100s of GB+)?

16 comments

jasonkester | 17 years ago
S3 does this for a living. I don't.

S3's entire business revolves around never losing anybody's stuff. They have hordes of smart people working on the problem, and they have an architecture that makes it really hard to lose anything by accident.

My business, on the other hand, revolves around letting people draw cartoon testicles onto other people's PowerPoint presentations under the pretense of a "web meeting". Which of us would you rather trust to keep hold of your valuable data?

gcv | 17 years ago
> S3 does this for a living. I don't.

So does Carbonite, and, as you can see in today's news, it also lost people's data. I strongly suspect Amazon does a better job than Carbonite with both software and infrastructure, but still. Mistakes happen, bugs happen, and data gets lost even by smart people working for solid and reputable companies.

S3-only backups = eggs in one basket. It's a terrific, strong basket made of titanium and suspended on aircraft cable, but it's still just one basket.

nebula | 17 years ago
The point here is not about picking a winner between S3 and you or me; that's probably a no-brainer at this point. The thing to worry about is whether S3 is reliable enough that you can trust your users' data to S3 alone.

While S3 might be doing this for a living, Amazon doesn't. AFAIK, revenues from cloud services are not at all significant given Amazon's scale. What does the S3 license say? Is Amazon liable if it loses data stored in S3?

charlesju | 17 years ago
I trust S3 and Mosso Cloudfiles more than my own single-point-of-failure HDD. They have a lot of redundancy built into their systems, and although every system has risks, their risks are far lower than those of our personal, non-distributed file storage setups.
lyime | 17 years ago
It pretty much comes down to this: no one can guarantee 100% reliability, not even Amazon. Still, their setup and infrastructure will probably increase overall MTTF (mean time to failure) compared to a small hosting company or your personal backup solution. Very simple math can show that their system is probably more reliable than others, yours or mine.
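The "very simple math" usually goes like this: if one copy has probability p of being lost in some period and copies fail independently, all n copies are lost with probability p^n. The sketch below assumes that independence and uses a made-up per-copy loss rate of 5% purely for illustration; neither figure comes from Amazon or Mosso.

```python
def prob_all_copies_lost(p_single: float, copies: int) -> float:
    """Chance that every copy is lost in the same period.

    Assumes copy failures are independent, which real systems only approximate.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_single ** copies


# Hypothetical per-copy loss probability, for illustration only.
p = 0.05
for n in (1, 2, 3):
    print(f"{n} copies: {prob_all_copies_lost(p, n):.6f}")
```

Under these assumptions two copies cut the risk to 1 in 400 and three to 1 in 8,000. The independence assumption is exactly what the "one basket" argument challenges: a software bug or an accidental delete can hit every replica at once, so p^n is a best case, not a guarantee.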
spkthed | 17 years ago
Sure, but it's like anything else: don't trust a single solution. S3 is good in addition to local backups, and Mosso is good in addition to S3. There will always be accidents, data loss, corruption, etc. The only way to mitigate that risk is to cover your bases and avoid relying on a single thing.
electromagnetic | 17 years ago
Agreed. I have backups on my laptop and an external hard drive, so if one of my drives fails I'm covered. If both fail, well, that's why I've got the really important stuff backed up on my FTP server.

Every few months I email my Hotmail account with all my writing. As text is ridiculously compressible, I haven't even hit the 10MB attachment limit. I also have all my more current documents saved in Google Docs, mostly for the portability but also for the very slim chance of an "Oh my god I broke my laptop, holy crap my house burnt down, and dammit I forgot how to connect to my FTP server and for the love of god I forgot the password to my Hotmail account."

rs | 17 years ago
They are as safe as any other service provider. Ultimately, it's always good practice to:

1. Do your own backups

2. Routinely test that you can recover from these backups

I would argue that point (2) is much more important than point (1). I do try to do that at least once a month to ensure that there aren't any bugs in the backups, including any missing parts of the infrastructure.
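One way to make point (2) routine is to restore into a scratch directory and compare checksums against the live data. The sketch below is a hypothetical helper, not any particular backup tool's feature: it hashes every file under two directory trees with SHA-256 and reports files that are missing, unexpected, or different.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(original: Path, restored: Path) -> list[str]:
    """Return a list of problems; an empty list means the restore matched."""
    src, dst = manifest(original), manifest(restored)
    problems = [f"missing: {n}" for n in sorted(src.keys() - dst.keys())]
    problems += [f"unexpected: {n}" for n in sorted(dst.keys() - src.keys())]
    problems += [
        f"corrupt: {n}" for n in sorted(src.keys() & dst.keys()) if src[n] != dst[n]
    ]
    return problems
```

Usage would be something like `verify_restore(Path("/data"), Path("/tmp/restore-test"))` after pulling the backup down into the second (hypothetical) directory. Comparing content hashes rather than sizes or timestamps is what catches silent corruption.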

For mirroring really large data, rsync is a viable solution.
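rsync stays efficient on large trees largely because of its default "quick check": a file whose size and modification time already match on the destination is skipped. The toy function below illustrates only that idea; it is not a replacement for rsync (no delta transfer, no deletion of extra files, no remote targets), and the function names are hypothetical.

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """rsync-style quick check: copy unless size and mtime (to the second) match."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def mirror(src_root: Path, dst_root: Path) -> list[str]:
    """Copy new or changed files from src_root to dst_root; return what was copied."""
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also copies the mtime, so an unchanged file is skipped next run.
            shutil.copy2(src, dst)
            copied.append(str(rel))
    return copied
```

A second run over an unchanged tree copies nothing, which is what makes repeated mirroring of hundreds of GB practical. Note that a mirror faithfully copies accidental changes too, so on its own it is not versioned backup.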

Edit: I want to add that whether to perform your own backups is really a judgment call, and you might need to ask yourself: what's the cost to me/my business/my users if I can't recover from backups and/or my provider fails to deliver on its own reliability (e.g. Carbonite)?

vaksel | 17 years ago
Just use both at the same time. Use S3 for active stuff and Mosso as your secondary backup. The chances of S3 and Mosso crapping out at the same time are pretty much nil. And hosting something on S3/Mosso as a one-time backup is dirt cheap.
mikecuesta | 17 years ago
I have a lot of faith in S3, more so than any local storage I may have.
iamelgringo | 17 years ago
My site, cuuute.com, is hosted on EC2. I use Elastic Block Store (EBS) for database storage, and I back up both that and my EC2 instance to S3 on a regular basis.

I've been using that as my hosting for 2-3 months, and I couldn't be happier.

bbuffone | 17 years ago
S3, Mosso and other cloud providers are not an appropriate backup mechanism. They are good for sharing files and as temporary storage. The only legitimate backup is a physically stored disk. S3 provides neither versioning nor deletion protection.

The problem is that I hear about these new "cloud" storage companies claiming to offer backup, but when asked what they actually do, they say they rely on Amazon to replicate the files across data centers. But anyone can delete a file or directory by accident, and poof, the files are gone forever.

If the storage provider doesn't have permanent physical storage in its backup plans, don't assume your files are there forever.

JungleDave | 17 years ago
Yes, that's a good example of why not to use S3 as a simple backup destination. However, software like Jungle Disk running on top of S3 adds versioning and deleted-file retention to make it act like a real backup system.
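To make "versioning and deleted-file retention" concrete, here is a toy in-memory model of the semantics: every write keeps the previous versions, and a delete only hides a file until a retention window expires. This is an illustrative sketch with hypothetical names, not how Jungle Disk or S3 actually store data.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy model: puts append versions; deletes are soft until retention expires."""

    retention_seconds: float
    _versions: dict[str, list[bytes]] = field(default_factory=dict, init=False, repr=False)
    _deleted_at: dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def put(self, name: str, data: bytes) -> None:
        self._versions.setdefault(name, []).append(data)
        self._deleted_at.pop(name, None)  # writing again makes the file live again

    def get(self, name: str, version: int = -1) -> bytes:
        if name in self._deleted_at:
            raise KeyError(f"{name} is deleted; restore it first")
        return self._versions[name][version]  # -1 is the latest version

    def delete(self, name: str, now: float | None = None) -> None:
        if name not in self._versions:
            raise KeyError(name)
        self._deleted_at[name] = time.time() if now is None else now

    def restore(self, name: str) -> None:
        self._deleted_at.pop(name)  # raises KeyError if the file isn't deleted

    def purge_expired(self, now: float | None = None) -> list[str]:
        """Permanently drop files deleted longer ago than the retention window."""
        now = time.time() if now is None else now
        expired = [n for n, t in self._deleted_at.items() if now - t >= self.retention_seconds]
        for n in expired:
            del self._versions[n]
            del self._deleted_at[n]
        return expired
```

The accidental delete described above becomes recoverable: `restore()` brings the file back, and older versions remain readable through `get(name, version)` until the retention window lapses.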

Having a local backup is great too, but that won't protect you from fire, flood, or theft in many cases.