top | item 4952714

AWS Data Pipeline

78 points | ing33k | 13 years ago | aws.amazon.com | reply

25 comments

[+] 23david|13 years ago|reply
AWS is slowly becoming the Oracle of our generation, in the sense that they have found a way to lock startups and large companies into a software/services ecosystem that is really really hard to stop using once you get started.

You start with regular open-source instances, but that's just the hook. Once you have EC2, it's really easy to get started with AWS 'magic' services like ElastiCache and RDS. It's easier than setting up a memcached cluster or MySQL, right? But once you get comfortable with those services, it's just so easy to keep going down that road and making your software reliant on proprietary services like SimpleDB, S3 and AWS Data Pipeline. And then you wake up at some point and find that you're 100% dependent on AWS.

By that point, if you're lucky your monthly AWS bill gets you an invite to speak at the next AWS conference. :-) You might even get a personal customer support rep that calls you when your servers go down.

A website/service cannot by definition be HA if it's reliant on one service or infrastructure provider. AWS has so many proprietary parts now that you really need to be careful which ones to use so that you don't wake up one day and realize that you're completely dependent on AWS.

I'd stay away from this with a 30-foot pole, but if we really did need to use it, I would only use the features that I felt comfortable building internally at some future point if we chose to move off of AWS.

It's important to keep your software stack as flexible and open as possible, and for risk management you should plan on using (or at least having the option of using) multiple vendors and service providers.

[+] jacques_chester|13 years ago|reply
The thing is, though, even though it's in Amazon's interests to create dependence on AWS, it's also in their customers' interests to use those services.

When you double down on a rich platform you can get enormous advantages. Avoiding the inner platform is a biggie; not paying portability tax is another.

The urge to be independent of any vendor, any platform etc is attractive to us as engineers. But it comes at a high price too.

[+] donavanm|13 years ago|reply
"A website/service cannot by definition be HA if it's reliant on one service or infrastructure provider." you seem to be conflating highly available with a diverse supply chain. A lot of highly available systems are "locked" in to one provider, whether it's broadcom/citrix/intel/etc.
[+] balakk|13 years ago|reply
A whole lot of glue-job VMs just became unnecessary.
[+] mcos|13 years ago|reply
Just this week I was looking for a better solution that would back up my RDS database to S3. I'm currently using mysqldump, but the RDS instance has grown extremely large, so that approach has become unwieldy. Hopefully this will help with that.
[+] mseebach|13 years ago|reply
It might not be appropriate for you, but a good way to handle MySQL backups is to maintain a mirror. This has the added benefit of being available as a fail-over and as a secondary instance where you can run reports or test long-running queries on current data without the risk of taking prod down.
[+] gourneau|13 years ago|reply
Dear AWS, hire a designer. Thanks.
[+] Raphael|13 years ago|reply
The AWS Management Console was recently redesigned with Bootstrap.
[+] ucee054|13 years ago|reply
You shouldn't really be trusting Amazon with your data warehouse or paying that much for the storage, but from a technical convenience standpoint AWS is probably the best solution for some of the horrid little inept kinds of organizations that I have encountered.
[+] donavanm|13 years ago|reply
Totally. I know I create lots of business value when I spend a day dicking around with mysqldump and rsync and inotify and scp and hdfs. Who would want to use this kind of janitorial service when they could do it themselves?