Lessons from 10 Years of Amazon Web Services

[+] yarapavan|10 years ago|reply

James Hamilton listed some of the highlights over the last decade at his blog - http://perspectives.mvdirona.com/2016/03/a-decade-of-innovat...

[+] jcrites|10 years ago|reply

James has put out a lot of great material. He's given a talk series and written a conference paper called "On Designing and Deploying Internet-Scale Services", which is a trove of valuable design insights:

https://www.usenix.org/legacy/event/lisa07/tech/full_papers/... (somewhat condensed)

Slides from a talk: http://mvdirona.com/jrh/talksAndPapers/JamesRH_AmazonDev.pdf

He writes some of the more practical and useful advice I've seen about how to run a successful business based on high-scale distributed systems.

One of his mnemonics that's stuck with me is the "Four Rs" of recovery-oriented computing: Restart, Reboot, Reimage, Replace (slide 6 in PDF). A human shouldn't be engaged to troubleshoot a problem until the platform has first tried, in succession, to restart the software, reboot the OS, re-image the machine, and finally replace the hardware entirely. Only after these auto-recovery steps have failed is it time to engage a human. He describes the connection between these techniques and significantly lower operations costs.
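
As a rough illustration of that escalation order, here is a minimal Python sketch. It is not taken from the paper or the slides: the function name `auto_recover`, the `(name, action)` step pairs, and the `page_human` callback are all hypothetical, and each action is assumed to return True when the host is healthy afterwards.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical shape for a recovery step: a label plus a callable that
# attempts the recovery and returns True if the host is healthy afterwards.
RecoveryStep = Tuple[str, Callable[[], bool]]


def auto_recover(
    steps: Sequence[RecoveryStep],
    page_human: Callable[[List[str]], None],
) -> Optional[str]:
    """Try each recovery step in order, cheapest first.

    Returns the name of the step that fixed the problem, or None if every
    step failed and a human was paged.
    """
    attempted: List[str] = []
    for name, action in steps:
        attempted.append(name)
        if action():
            # Stop at the first step that works; later, more expensive
            # steps are never run.
            return name
    # Only after every automated step has failed is a human engaged.
    page_human(attempted)
    return None
```

In practice the steps would be passed in the order restart, reboot, reimage, replace, so the cheapest action is always tried first and the on-call engineer only sees problems that survived all four.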

[+] jimbokun|10 years ago|reply

"Primitives not frameworks"

This is why Amazon beat Google in the "as a Service" space. Google provided a framework with AppEngine, but customers wanted the flexibility of the primitives offered by AWS.

[+] nopzor|10 years ago|reply

I think that Amazon being first in 2005/2006 helped a lot, too. Google now provides "primitives" through GCE. I think Google realizes that the future is GCE not GAE.

[+] eranation|10 years ago|reply

Exactly

Offer IaaS first, then PaaS. Google had it backward.

While AWS does have a small vendor lock-in too, in the end it's still recognizable. I mean, it's doable, for example, to move an AWS architecture to OpenStack without a major code rewrite. If you built your web app on GAE and GWT, however, your investment to move it out of Google will be much bigger.

[+] rodionos|10 years ago|reply

You have to give full credit to AWS for inventing previously unknown pricing models. In some cases, it was actually the pricing model that created new use cases. Glacier and S3 come to mind.

[+] jeffbarr|10 years ago|reply

That's a very important observation. When I describe AWS to audiences I always make clear that it is a combination of a technology and a business model.

[+] axelfontaine|10 years ago|reply

"A good litmus test has been that if you need to SSH into a server or an instance, you still have more to automate."

Amen. We couldn't agree more: https://boxfuse.com/blog/no-ssh

[+] bluecmd|10 years ago|reply

So, with No SSH - how do you debug that one-off problem that is only on machine abc12? I'm not talking about mutating, I'm talking about attaching gdb to the process while it's handling requests. I'm talking about collecting CPU profiling information from production. Stuff like that.

[+] LinuxBender|10 years ago|reply

Proposal:

Instead of ssh vs. no-ssh debate, how about:

* Use SSH AllowGroups to limit logins to the development manager/lead or on-call engineer

* with a policy that for each time you have to call them beyond N times per week, you have to buy their team a round of drinks of their choosing?

Deal?
