
On Internal Engineering Practices at Amazon

350 points | wheresvic1 | 7 years ago | jatins.gitlab.io

127 comments

[+] anon20190326|7 years ago|reply
Ex-Amazon SDE here.

A lot of this has changed.

First, there is a movement to build a lot of services in Native AWS instead of MAWS/Apollo.

Apollo doesn't require copying configs anymore; you can have the config exist as part of the package you are deploying. Generally, that's a best practice.

Pipelines can be configured as code too.

There is a centralized log service which requires onboarding. It does require some commandline tools, but it works. The logs get stored on S3, IIRC.

If containers suit your needs, you'd be hard-pressed to find someone telling you not to use them. Generally, though, you would want to use bare metal at Amazon's scale.

There is also a change to how NPM is being used at Amazon. It was a lot easier towards the end of my tenure, and was probably as close as it would get when working with Amazon's build systems.

Amazonians are generally conservative and don't use the latest and greatest unless it solves an actual customer need. Customer Obsession is still the defining leadership principle.

[+] konspence|7 years ago|reply
Any take on how this looks for someone who doesn't work at Amazon?

I know Apollo is their deployment tool. No idea how it plays into a Kubernetes or even a container stack.

[+] m0zg|7 years ago|reply
> you would want to use bare metal for Amazon's scale

Containers are "bare metal".

[+] jcrites|7 years ago|reply
Based on my experience, the article contains a lot of misinformation. Some of the statements might have been true at one point in the past, but are now out of date by years, while others have never been true in the time I've been around.

Without getting into a point-by-point rebuttal, my reaction to each section/Exhibit is "that's wrong/misleading".

[+] throwaway504|7 years ago|reply
Isn't that a bit disingenuous? Your role at Amazon is nothing like that of the typical engineer there. You live in the shiny new world while the majority of engineers are stuck on something that's not too far from the article.
[+] lewisjoe|7 years ago|reply
I worked at a startup and then switched to a bigger product company: Zoho.

If there's one thing I'd take away for the rest of my career from Zoho, it would be frugality in adopting the latest of tech.

When NoSQL was all the rage, the company stood firm that relational databases had a rock-solid mathematical foundation and stayed away from the bandwagon. It paid off.

When every other company wrote blogs about rewriting their software in NodeJS, Ruby & Python, the company stood its ground with statically typed languages. It paid off.

My own team, Zoho Writer, has a strong policy against incorporating third-party libraries without good reason. This way, even though the product is nearly a decade old, its JS payload has remained surprisingly small all through its evolution.

I believe the wisdom of staying frugal in adopting the latest hype can only be appreciated in hindsight.

[+] geggam|7 years ago|reply
I have shared this blog posting more than I care to admit.

https://mcfunley.com/choose-boring-technology

It amazes me that people will put their companies and employees at risk by using the new cool stuff just because everyone else is (looking at you, k8s).

[+] ilovecaching|7 years ago|reply
One misconception people have, from my perspective, about most FAANG companies is that you get to work with new and shiny things, especially new languages. That is really more applicable to startups, where risk-taking is in the DNA; at a big company you will mostly get really strong pushback, because there is just too much effort involved in supporting more than two or three languages at scale. There are normally niche languages, but they are essentially statistical anomalies and are usually borne out of a real business need (like Swift for iOS). Engineering for engineering's sake is also going to be frowned upon unless it really helps the business.

I do agree that Amazon is the worst in regards to OSS. They really need to fix that, even if just for PR, because they are consuming so much of it for AWS.

[+] umanwizard|7 years ago|reply
Very good comment, +1.

People don't really get the rabbit hole that is necessary in order to introduce a new technology at a large company. Just off the top of my head, a few of the issues:

* Not everyone has followed the broader technology world outside the big company. It's totally possible to imagine a situation where nobody in your management chain or team has even heard of something relatively mainstream like Docker.

* Your company uses custom build, deployment, and dependency management systems. Someone will have to do the work to implement support for the new technology in these.

* If the new technology has its own opinionated ideas about how to do any of the above, like Rust crates, npm, pip, etc., forget about it. You need to interact with previously existing internal code which means you need to fit into the already existing build/deployment/dependency solution.

* Your company has a bunch of custom internal services. You need to create bindings for these services' APIs in the new language.

* Your company might be doing some esoteric stuff that the new technology has no support for, like for example if your network stack does HTTP in some sort of custom-tuned way for performance, and the new language's HTTP library only supports a subset of everything that's possible.

* The new technology may have correctness or performance issues that have never been discovered because it has never been used on many thousands of servers where a 1% difference in CPU makes a huge difference, or it has never been used with a codebase big enough to take >1 day to build, or with binaries larger than several GB, etc.

And the biggest one...

* Every day, many new people are joining the company and beginning to ramp up on the (internally) mainstream solutions that have institutional inertia. This will dwarf the speed at which you can convince people to switch to the new technology.

The pattern I have seen is that something starts off being used for one-off scripts that don't have a lot of dependencies on the existing gargantuan infrastructure, then very, very gradually gaining mindshare internally and becoming more supported.

[+] vonseel|7 years ago|reply
As far as I know, Node.js was used inside in a limited capacity, but they had an alternative for npm for security reasons, and you had to get an npm package approved to use it internally.

FWIW, this is a very good thing. A company as large as Amazon should do this with all of their repositories; even a small start-up should be doing this to mitigate suspicious packages / third-party code vulnerabilities.

[+] jrockway|7 years ago|reply
I worked on third-party package approvals at Google. The reasoning behind reviews was largely license compliance. If the license said "you have to display this license to end-users", then we had to make sure that the license was machine-readable and would be automatically bundled into the build, to be displayed in that "open source licenses" section of pretty much every app ever. If the license said "by linking this into your code, you have to open-source all code at your company", we had to deny it. That sort of thing.

We suggested that people get security reviews, but it was up to the user of the package to figure out whether or not that was necessary. Often security reviews would be blocking the project's launch and would be done at that time.

The final thing we enforced was a "one version" policy. If everyone was using foobar-1.0, and you wanted to use foobar-2.0, it was on you to update everyone to foobar-2.0. This was the policy that people hated the most, but basically mandatory at the time because none of the languages widely used at Google supported versioned symbols. Having library A depend on foobar-1.0 and library B depending on foobar-2.0 meant that application C could not depend transitively on library A and library B at the same time, which would cause many disasters.
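The diamond-dependency conflict behind that "one version" policy can be sketched in a few lines. This is purely illustrative: the package names (`foobar`, `library_a`, etc.) come from the comment above, and the graph format is invented, not Google's actual tooling.

```python
# Sketch of the diamond-dependency conflict that motivates a "one
# version" policy. Names and the dependency graph are hypothetical.

deps = {
    "application_c": ["library_a", "library_b"],
    "library_a": ["foobar-1.0"],
    "library_b": ["foobar-2.0"],
}

def transitive_deps(pkg, graph):
    """Collect every package reachable from pkg."""
    seen = set()
    stack = [pkg]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def version_conflicts(pkg, graph):
    """Group resolved packages by base name; flag any with >1 version."""
    by_name = {}
    for dep in transitive_deps(pkg, graph):
        by_name.setdefault(dep.split("-")[0], set()).add(dep)
    return {name: vs for name, vs in by_name.items() if len(vs) > 1}

# Without versioned symbols, linking both foobar-1.0 and foobar-2.0
# into application_c is impossible -- hence "one version" repo-wide.
print(version_conflicts("application_c", deps))
```

With versioned symbols (or npm-style nested installs) both copies could coexist; without them, the only safe policy is forcing the whole repo onto one version.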

[+] michaelper22|7 years ago|reply
Morgan Stanley revealed at a recruiting event that they follow this practice. "You can definitely use Apache Spark! It's been reviewed by our Enterprise Architecture team; some APIs were removed, and we substituted 'Morganized' variants of others, which you won't find code/docs for on Github ;)"

Given the npm debacle, I could totally see even a small org running an internal Maven repo with approved versions of (popular, and especially obscure) libs.

[+] yomly|7 years ago|reply
I feel like the whole NPM/JS culture is around "there's a library for that" so I can see how the friction around having to get approval for npm packages would be an absolute velocity killer, but when you're as big as Amazon I really wouldn't want that particular JS flavour of "move fast and break things" powering the company...
[+] johnnyfaehell|7 years ago|reply
This is a good idea at large companies, but the only time I've ever seen it implemented was when a developer who wanted to become lead dev started hoarding permissions to things and set up an internal Packagist that only he could add packages to. It didn't end well for him, really.
[+] andrewguenther|7 years ago|reply
Node is picking up internally as a build environment for frontend JS which all used to be done in Ruby.
[+] raiflip|7 years ago|reply
Don't want to go into too much detail but this article is like taking the crappiest parts of the crappiest systems and declaring it representative of an entire product. There is a lot of really good internal tooling not mentioned here, and for the internal tooling mentioned here (like Apollo) absolutely none of its benefits are mentioned.
[+] tanilama|7 years ago|reply
Well, this article gets some things right, but it gets a lot wrong as well. What can be confirmed is that the author's exposure to the Amazon tech scene was limited, and he makes a sweeping generalization about how the whole of Amazon works.

Disclaimer: Ex-Amazonian, left like one year ago.

[+] tokyoHacker|7 years ago|reply
Of course. What else can we expect from someone who was there for just 2 years? It takes some audacity to speak about the tech scene of a company as large as Amazon with such limited time spent there. Unless he has an overall view of the org (like a VP of Engineering), I would take any assessment with a pinch of salt.
[+] coreyoconnor|7 years ago|reply
I agree.

Disclaimer: Also ex-amazonian, left like one year ago.

[+] kerng|7 years ago|reply
I like how Amazon has an MAWS movement internally, meaning "Move to AWS". I think most people assume they mostly run on AWS, but they don't.

It's an interesting look behind the scenes at Amazon and at how antiquated they appear to operate. Makes you wonder if Azure and Google have pretty good chances of beating them down the road.

Edit: Interesting, further down one person commented that Amazon doesn't use AWS broadly because it's seen as not secure enough for certain workloads.

[+] jcrites|7 years ago|reply
Based on my experience, that information and some of the comments about it in the thread are out of date or inaccurate.

'Move to AWS' was a program focused on accelerating AWS adoption that was primarily active something like 5-7 years ago. The program achieved its goals and concluded: virtually all infrastructure was running on AWS. I worked on the program for part of that time, in the last couple of years it was active. Amazon's migration to AWS was covered in a 2012 presentation at AWS re:Invent: "Drinking Our Own Champagne: Amazon's Migration to AWS" [1].

Some more recent efforts around AWS usage were covered in a 2016 talk: "How Amazon.com Uses AWS Management Tools" [2] (which references the earlier talk and discusses some of the changes since then). There are ongoing projects to improve and optimize usage of AWS, as well as to adopt some of the newer services.

[1] https://www.youtube.com/watch?v=f45Uo5rw6YY [2] https://www.youtube.com/watch?v=IBvsizhKtFk&t=13m20s

[+] anth_anm|7 years ago|reply
MS Internal tooling fucking sucks compared to Amazon.

Just atrocious. Though I wasn't in Azure so that might be better.

[+] ex_amazon_sde|7 years ago|reply
Ex-Amazon SDE checking in. The article is quite misleading.

The author confuses "shiny" with "good".

Amazon does package-based deployments because it scales well, allows engineers from many different teams to work on packages, and provides fast security updates.

Amazon used VMs more than a decade before container engines existed, and the latter still lack security and stability.

Having worked in many companies, I would take Amazon's engineering practices over the modern shiny devops tool ecosystem every day.

I agree that Apollo is slow (due to the implementation) and has an ugly UI, and that the company has a very poor track record of contributing to OSS.

[+] cimi_|7 years ago|reply
Context: I spent five years as an engineer at Amazon, the last two as a tech lead on an internal developer tool (think SaaS for performance engineering).

This article is not untrue, but it misses the fact that teams are empowered to own their solutions and are not restricted in how they set up their environments or which tools they use. It's true that fixing these problems feels like wasted effort, but it's by design: Amazon operates as many separate internal entities, and I think duplication of effort is an acknowledged downside of operating this way.

> 1. Deployments > Their internal deployment tool at Amazon is Apollo, and it doesn't support auto-scaling.

I had to manually scale up my service once in two years, and we weren't over-provisioning wastefully. Before I left, my product was supporting 40K+ internal applications with infra + AWS costs under $2k/month.

We had good CI with deep integration with Apollo, you could track any change across the pipeline, we had reproducible builds and we had a comprehensive deployment log listing all changes.

Apollo is sloooooow though and the UI is very 90s.

> 2. Logs > Any self respecting company running software on distributed machines should have centralized, searchable logs for their services.

We were using Elasticsearch, Logstash, and Kibana (ELK) powered by AWS Elasticsearch. I wrote a thin wrapper around Logstash that was used in over 1K environments internally, so we weren't the only ones doing this.

> 3. Service Discovery > What service discovery? We used to hard wire load balancer host names in config files.

Agree with this one. I will never forget the quality time I spent configuring those load balancers and ticketing people about DNS.

> 4. Containers

As other commenters mentioned, if you want to use containers, you're free to bypass all of this and run your service in AWS where you can use ECR, EKS etc if you want.

> (As far as I know, Node.js was used inside in a limited capacity, but they had an alternative for npm for security reasons, and you had to get an npm package approved to use it internally.)

I built my UI from scratch using create-react-app and yarn offline builds (no mystery meat) and I bypassed all the internal JS tooling, which I thought was very poor. This was changing though.

Finally, my personal anecdote: you could onboard our product in less than an hour (including reading docs), it required no further maintenance and gave you performance stats for free. So not all was bad :)

[+] cimi_|7 years ago|reply
> They have some amount of Rails, and JavaScript has to be there, but if you want to experiment with, say, Go, Kotlin, or anything else, you are going to get nothing but push back.

I missed this: starting in 2018 we were writing all our backend logic in Kotlin and we got no pushback from anyone.

[+] throwaway1280|7 years ago|reply
Ex-Amazon engineer of several years here.

This is a pretty interesting article, but it's important to know that Amazon's internal tooling changes pretty fast, even if it's mostly several years behind state-of-the-art.

Exhibit A: Apollo

Apollo used to be insane. It was designed for the use case of deploying changes to thousands of C++ CGI servers on thousands of website hosts, worrying about compiling for different architectures, supporting special fleets with overrides to certain shared libraries, etc etc. It had an entire glossary of strange terms which you needed to know in order to operate it. Deployments to our global fleet involved clicking through tens of pages, copy-and-pasting info from page to page, duplicating actions left right and centre, and hoping that you didn't forget something.

When I left, most of that had been swept away and replaced with a continuous deployment tool. Do a bit of setup, commit your code to the internal Git repo, watch it be picked up, automated tests run, then deployments created to each fleet. Monitoring tools automatically rolled back deploys if certain key metrics changed.
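The metric-gated rollback described above can be sketched generically. All names, thresholds, and the monitoring loop here are made up for illustration; this is not Amazon's actual tooling, just the shape of the idea.

```python
# Generic sketch of a deployment monitor that flags a rollback when
# key metrics regress. Metric names and thresholds are hypothetical.

BASELINE = {"error_rate": 0.01, "p99_latency_ms": 250.0}
MAX_REGRESSION = 1.5  # flag if a metric worsens by more than 50%

def should_roll_back(current_metrics, baseline=BASELINE):
    """Return the metrics that regressed past the allowed threshold."""
    return [
        name
        for name, base in baseline.items()
        if current_metrics.get(name, 0.0) > base * MAX_REGRESSION
    ]

# During a deployment, a monitor would poll metrics and act on the result:
observed = {"error_rate": 0.04, "p99_latency_ms": 240.0}
regressed = should_roll_back(observed)
if regressed:
    print(f"rolling back: {regressed} exceeded thresholds")
```

In a real system the baseline would come from the pre-deploy window and the check would run continuously during the bake period, but the core decision is this simple comparison.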

Auto scaling became a reality too, once the Move to AWS project completed. You still needed budgetary approval to up your maximum number of servers (because for our team you were talking thousands of servers per region!) but you could keep them in reserve and only deploy them as needed.

Manually copying Apollo config for environment setup was still kind of a thing though. The ideas of CloudFormation hadn't quite filtered down yet.

Exhibit B: logs

My memory's a bit hazy on this one. There certainly was a lot of centralized logging and monitoring infrastructure. Pretty sure that logs got pulled to a central, searchable repository after they'd existed on the hosts for a small amount of time. But, yes, for realtime viewing you'd definitely be looking at using a tool to open a bunch of terminals.

The monitoring tools got a huge revamp about halfway through my tenure, gaining interactive dashboarding and metrics drill-down features which were invaluable when on-call. I'm currently implementing a monitoring system, so my appreciation for just how well that system worked is pretty high!

Exhibit C: service discovery

Amusingly, a centralized service discovery tool was one of the tools that used to exist, and had fallen into disrepair by the time this person was working there.

This was a common pattern in Amazon. Contrary to the 'Amazon doesn't experiment' conclusion, Amazon had a tendency to experiment too well - the Next Big Thing was constantly being released in beta, adopted by a small number of early adopters, and then disappearing for lack of funding/maintenance/headcount.

I can't think of any time I hard-wired load balancer host names, though. Usually they would be set up in DNS. We used to have some custom tooling to discover our webserver hosts and automatically add/remove them from load balancers, but that was made obsolete by the auto-scaling / continuous deployment system years before I left.

As for the question of "can we shut this down? who uses it?" - ha, yes, I seem to remember having that issue. I think that, before my time, it wasn't really a problem: to call a service you needed to consume its client library, so you could just look in the package manager to see which services declared that as a dependency. With the move to HTTP services that got lost. It was somewhat mitigated over the years by services moving to a fully authenticated model, with client services needing to register for access tokens to call their dependencies, but that was still a work in progress a few years ago.
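The "who uses this service?" lookup via client libraries can be sketched like this. The manifest format and every name in it are invented for illustration; internally this would have been a query against the package manager, not a hand-rolled scan.

```python
# Sketch: find a service's likely callers by checking which packages
# declare its client library as a dependency. Names are invented.

manifests = {
    "OrderService": {"deps": ["InventoryServiceClient", "LogUtils"]},
    "CheckoutService": {"deps": ["InventoryServiceClient"]},
    "ReportingJob": {"deps": ["LogUtils"]},
}

def consumers_of(client_lib, manifests):
    """Packages that declare client_lib -- i.e. probable consumers."""
    return sorted(
        pkg for pkg, m in manifests.items() if client_lib in m["deps"]
    )

print(consumers_of("InventoryServiceClient", manifests))
# With direct HTTP calls there is no such declaration, which is why
# "can we shut this down?" became hard to answer until authenticated
# access tokens reintroduced an explicit registration step.
```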

Exhibit D: containers

Almost everything in Amazon ran on a one-host-per-service model, with the packages present on the host dictated by Apollo's dependency resolution mechanism, so containers weren't needed to isolate multiple programs' dependencies on the same host.

Screwups caused by different system binaries and libraries on different generations of host were a thing, though, and were particularly unpleasant to diagnose. Again, that mostly went away once AWS was a thing and we didn't need to hold onto our hard-won bare-metal servers.

'Amazon Does Not Experiment'

Amazon doesn't really do open source very well. The company is dominated by extremely twitchy lawyers. For instance, my original employment contract stated that I could not talk about any of the technology I used at my job - including which programming languages I used! Unsurprisingly, nobody paid attention to that. That meant that for many years, the company gladly consumed open source, but any question of contributing back was practically off the table as it might have risked exposing which open source projects were used internally.

A small group of very motivated engineers, backed up by a lot of open-source-friendly employees, gradually changed that over the years. My first ever Amazon open source contribution took over a year to be approved. The ones I made after that were more on the order of a week.

Other companies might regard open sourcing entire projects as good PR, but Amazon doesn't particularly seem to see it that way. Thus, it's not given much in the way of funding or headcount. AWS is the obvious exception, but that's because AWS's open source libraries allow people to spend more money on AWS.

Instead, engineers within Amazon are pushed to generate ideas and either patent them, or make them into AWS services. The latter is good PR and money.

As for different languages: it really depends on the team. I know a team who happily experimented with languages, including functional programming. But part of the reason for the pushback is that a) Amazon has an incredibly high engineer turnover, both due to expansion and also due to burnout, so you need to choose a language that new engineers can learn in a hurry, and b) you need to be prepared for your project to be taken over by another team, so it better be written in something simple. So you better have a very good justification if you want to choose something non-standard.

Overall, Amazon is a pretty weird place to work as an engineer.

I would definitely not recommend it to anybody whose primary motivation was to work on the newest, shiniest technologies and tooling!

On the other hand, the opportunities within Amazon to work at massive scale are pretty great.

One of the 'fun' consequences of Amazon's massive scale is the "we have special problems" issue. At Amazon's scale, things genuinely start breaking in weird ways. For instance, Amazon pushed so much traffic through its internal load balancers that it started running into LB software scaling issues, to the point where eventually they gave up and began developing their own load balancers! Similarly, source control systems and documentation repositories kept being introduced, becoming overloaded, then replaced with something more performant.

But the problem is that "we have special problems" starts to become the default assumption, and Not Invented Here starts to creep in. Teams either don't bother searching for external software that can do what they need, or dismiss suggestions with "yeah, that won't work at Amazon scale". And because Amazon is so huge, there isn't even a lot of weight given to figuring out how other Amazon teams have solved the same problem.

So you end up with each team reinventing their own particular wheel, hundreds of engineer-hours being logged building, debugging and maintaining that wheel, and burned-out engineers leaving after spending several years in a software parallel universe without any knowledge of the current industry state-of-the-art.

I'm one of them. I'm just teaching myself Docker at the moment. It's pretty great.

[+] throwaway1280|7 years ago|reply
Speaking of twitchy lawyers and Move to AWS... one of the weirdest things we had to deal with inside Amazon was that, for many years after AWS launched, we weren't allowed to use it because it "wasn't secure enough".

Given that we were actively shopping it around to major financial institutions at the time, doesn't that strike you as particularly hypocritical? :)

[+] craftinator|7 years ago|reply
Your comment is better than the original article. Can we push this one to the top, HN?
[+] alpb|7 years ago|reply
Someone should probably add (2018) to this post as it's from May 2018.
[+] throwaway772643|7 years ago|reply
I will be joining Amazon in about a month.

Is there any chance I'll be able to work on OSS and/or "modern" tech (e.g. containers, Go, etc.) without a ton of push-back?

It also seems Amazon is obsessed with reinventing wheels and keeping their stuff internal, which is worrying. Is there any chance to introduce solid OSS tools to the development process? (whatever they might be)

[+] discodave|7 years ago|reply
AWS SDE here.

The short answer is: in order to get your team to adopt something, you need to make the case that it's better for customers (including things like migration costs). If the modern thing is more efficient, offers higher availability, increases velocity, and so on, then the case can be made.

Some specific examples based on things you cite:

* For an example of something OSS or "modern" coming from AWS, check out Firecracker (written in Rust): https://firecracker-microvm.github.io/

* With regards to "reinventing wheels" Apollo + EC2 solves a lot (not all) of the problems that containers solve, and existed for years before containers became the hotness.

* Docker, which brought containers to the masses launched in 2013.

* EC2 launched in 2006 (7 years before Docker).

* Apollo (and the build system Brazil) predated EC2 by many years.

* Amazon.com was migrating to EC2/AWS before 2012 (https://www.youtube.com/watch?v=f45Uo5rw6YY)

* Another example, Lambda, which launched in 2014 runs on EC2 (https://www.youtube.com/watch?v=QdzV04T_kec&t=1611s).

* New services get to build in AWS and use Lambda, ECS, DynamoDB etc based on their business needs.

[+] praneshp|7 years ago|reply
My wife has been at amazon (aws) about two years now.

She worries about her customers' problems, and is obsessed with using the right tools to solve them. Unless she is a great actor she has had a fantastic time there.

If things like "be able to work on OSS or \"modern\" tech" is what you want, go somewhere that allows it.

Edit: Should say "go somewhere that is known to allow it for all engineers".

[+] anth_anm|7 years ago|reply
No, that's not accurate at all.

Amazon made choices on how to do things years ago. These choices are being remade just because the new hotness is containers and Java is now uncool. They have a lot of tooling that works for them, and a lot of it is quite good.

They aren't hostile to new stuff either. It's just: why would you waste time and money trying to shoehorn in some new way of doing things when the old way works just fine?

You will still get to do lots. Maybe. It depends on the team.

I would say don't waste your energy. Learn when to pick the battles. Accept that you will get push back that doesn't make sense to you.

[+] andrewguenther|7 years ago|reply
> Is there any chance I'll be able to work on OSS and/or "modern" tech (e.g. containers, Go, etc.) without a ton of push-back?

As long as it is the right choice, yes. As far as I know, I built the first service internally that was entirely container based, but did so because it was the right tool for the job. Container based services are getting a ton of traction internally now, especially Fargate-based ones.

You're going to have a hard time making a case for Go though. I have not a single time been convinced that Go was the best tool for the job in 5 years at Amazon.

> It also seems Amazon is obsessed with reinventing wheels and keeping their stuff internal, which is worrying.

I have not found this to be true, but I can see why people might think it. Amazon built some state-of-the-art tooling quite some time ago, and it's starting to show its age. Rather than drop the internal stuff for new OSS alternatives, they've continued to add modern features to the internal tools, which I think is the right choice overall considering the scale at which they're used and integrated.

Again, it is about the right tool for the job. If you present a compelling case to use an OSS alternative then more than likely you'll be able to use it.

[+] umanwizard|7 years ago|reply
I worked there for two years (summer 2013-summer 2015), so long enough to get an idea of the culture. Take this post with the caveat that I have no idea whether it's changed since then and if so to what extent.

> Is there any chance I'll be able to work on OSS and/or "modern" tech (e.g. containers, Go, etc.) without a ton of push-back?

Sure there's a greater chance than zero, but not if only motivated by it being "modern". Amazon is a business. It exists to make money by providing a valuable service to customers. The only reason it's a "tech company" is because writing software serves that goal, but "cool tech" is not a goal in itself. If you have a serious, documentable reason why rewriting your team's service in Go would help you achieve business goals, then you might be able to get the attention of engineering decision makers, but that's a much higher bar than "it's modern".

Personally I don't see what is "modern" about Go or how it would help the business serve customers better and/or make more money than it does using Java. I suspect many of your coworkers would feel the same way and these are the terms that the decision would be framed in within Amazon's culture.

By the way there has been over the last several years a move away from C++ and Perl and towards Java. When I was there the majority of stuff was in Java but there was still plenty of important stuff in those other two languages. I suspect C++ and Perl are even rarer now. I guess that's the modernization you're talking about, but maybe not at the pace you want.

> It also seems Amazon is obsessed with reinventing wheels

In many cases, either these wheels were invented at Amazon before they became available in the OSS world, or the OSS tools do not fit Amazon's needs (especially w.r.t. scale).

There is certainly no "obsession" with reinventing wheels -- any sane manager at Amazon would definitely rather use something off-the-shelf than waste a bunch of money developing it from scratch, assuming it fit their needs well.

> and keeping stuff internal

Well, I guess this is true. Amazon contributes less to open source than a few other famous tech companies.

> Is there any chance to introduce solid OSS tools to the development process?

Amazon uses plenty of very solid OSS tools: for example, Linux, Java, gcc, perl, git (though they were on Perforce when I started), and Tomcat are all core parts of Amazon infrastructure. As well as the same grab bag of common tools and libraries you'd find in use anywhere else. In general things with permissive licenses (BSD/MIT/etc) are fair game, but getting approval for things with copyleft licenses (GPL/LGPL/etc) is an uphill battle. As for whether you could introduce more, it would depend on the value of the tool and your motivation for doing so.

[+] PaulHoule|7 years ago|reply
Getting an npm or other package approved for internal use is not an unusual practice.
[+] wmf|7 years ago|reply
Yes, but it probably makes Node.js useless in any such company since any non-trivial app will have 1,000 npm dependencies.
[+] tanilama|7 years ago|reply
Yep, can confirm. NPM is well supported and easy to use at Amazon. It is just not allowed to serve critical traffic, which I think is a wise choice anyway.
[+] presty|7 years ago|reply
The OP needs to put a date on the article, because AFAIK things are very different in 2019.

Also, it's interesting how they equate "experimentation" with "open source".

[+] stretchwithme|7 years ago|reply
Considering the constant stream of new services and features, the lack of OSS is insignificant compared to the value they add to the world.

Like the fact that you can create an SSL/TLS certificate for free for load balancers without the usual agony. So easy.

[+] sumanthvepa|7 years ago|reply
I worked at Amazon in the late 90s, so my experience is most likely no longer relevant, but I will make a few observations.

First, I see that many commenters disagree with the OP; they had a different experience of Amazon, one where the infrastructure they worked with was responsive, modern, easy to use, etc. It is very possible that both observations are correct. In a large company, not all parts of the company will be using the same infrastructure at the same time. Indeed, it would be dangerous for the entire company to upgrade to a new technology infrastructure in lockstep.

Second, in most companies, innovation is measured not by the novelty or newness of the language or framework you use, but by the business impact your product or service makes. Much of Amazon's innovation was, and is, around business models. Indeed, when I worked at AMZN, I was writing C code (to power a website) using beautifully efficient database access code written by Sheldon Kaphan. There was nothing remotely advanced about the language. It took 9 months for me to get a 3-line code change into production, and I was using technology that predated Apollo (it was called Huston). There is nothing particularly wrong with that either: it was a potty-mouth filter that was blocking some obscure swear words, and no one was too worried that the component it was part of didn't ship for the best part of a year.

I now run my own company, and I manage technology and people as well as write code. I find myself exercising the same conservatism with respect to code and infrastructure that I found at Amazon, and for the same reasons. It is expensive and potentially company-destroying to switch languages and core technologies. It is best not done, or if done at all, done slowly and with a lot of care.
[+] femto113|7 years ago|reply
I was once pitched a startup founded by some ex-amazonians whose big idea was "Apollo for everyone". They were nonplussed by my spit take.

For the people saying "I worked there and it wasn't like that" I wonder if you worked in retail. It's a very different world from the more modern bits of the company.

[+] just_passing_by|7 years ago|reply
What amuses me is that most of the rebuttals come from ex-Amazonians, not from current staff. This is the only company I know of that draws this much criticism from its own engineers.

What's more, the article is fairly fresh, and at Amazon's scale I doubt any major changes have happened in the last 10 months.

[+] amzn-throw|7 years ago|reply
There is a comment above you from an Amazon Principal Engineer: https://news.ycombinator.com/user?id=jcrites

His profile says "Architect and cofounder of Simple Email Service. Creator of Cloud Desktop, a cloud-based development environment used by most Amazon engineers. Technical lead for Amazon's strategy for using AWS."

Can't get more "from the horse's mouth" than that.

We are generally asked not to comment on stuff like this because of how easy it is to reveal confidential internal details.

For the record, the article is mostly wildly out of date, but others have already corrected the record.

[+] umanwizard|7 years ago|reply
Amazon (like many other big companies) strongly discourages employees from commenting publicly on subjects related to the company.

It might seem draconian, but it makes a lot of sense. Talking about a smaller company in a forum like HN might be completely anodyne, but comments about Amazon could easily have repercussions like getting picked up by the press and spun into some crazy story, or alerting people to some strategic information that the commenter isn't even aware is sensitive. For example, posting "hey we're at Amazon and we're moving from system X to system Y" could generate surprised and angry phone calls with the CEO of the vendor of X (I have a real example in mind that happened because of a stackoverflow post), or could cause the stock price of the vendor of Y to jump, causing insider trading concerns... best to just avoid it.

So it's very natural that company policy would strongly discourage such public commentary, and most current employees follow that.

[+] jkingsbery|7 years ago|reply
I am a current Sr. SDE at Amazon in the Retail org. I would agree with most of the rebuttals, and I don't have too much to add, since they do a pretty good job summarizing.
[+] throwawayamz27|7 years ago|reply
The main benefit of Amazon's tools is that once you've been there a while you know how they work, and all the complexity and bugs have been stripped out of them. And because engineers are required to go on call, everyone has a pretty good idea of how to fix things.

When you have SREs spending all day creating the next new thing (generally after deprecating the previous one with no replacement), you end up in a situation where you forget how to, say, roll back a bad deployment. Or scale a fleet.

The problem with fancy infrastructure-as-code, containers, and logging services is that when they break, you have no idea how to get out of trouble. SSH and grep almost always work, as does symlinking a directory.
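The low-tech pattern alluded to here can be sketched in a few lines of shell (directory names and versions below are hypothetical, not Amazon's actual layout): each release is unpacked into its own versioned directory, a `current` symlink points at the live one, rollback is just repointing the symlink, and plain grep over a plain log file answers "what broke".

```shell
#!/bin/sh
set -e

DEPLOYS=/tmp/deploys   # hypothetical deployment root

# Each release lives in its own versioned directory.
mkdir -p "$DEPLOYS/app-1.0" "$DEPLOYS/app-1.1"

# Activate 1.1: repoint the `current` symlink (-n avoids descending
# into the old target, -f replaces the existing link).
ln -sfn "$DEPLOYS/app-1.1" "$DEPLOYS/current"

# 1.1 turns out to be bad -- rolling back is the exact same operation.
ln -sfn "$DEPLOYS/app-1.0" "$DEPLOYS/current"
echo "live version: $(readlink "$DEPLOYS/current")"

# And grep over a flat log file still answers "what broke".
printf 'INFO start\nERROR oom\n' > "$DEPLOYS/app.log"
grep ERROR "$DEPLOYS/app.log"
```

No agents, no control plane: if SSH works, the rollback works, which is exactly the failure-mode argument being made above.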