> The team used existing tooling to move services between zones in order to ensure they were portable. Firstly, they allowed services to be moved back to the original zone to resolve any portability issues, but once resolved, services would be moved periodically to validate portability and prevent regressions.
This is something that most companies don’t do when they say they want to do $x to “prevent lock in”.
Uber actually is testing for portability along the way.
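The periodic-move idea is easy to sketch. Here's a toy version (all names and tooling are hypothetical; the article doesn't describe Uber's internals): rotate each service into a different zone on a schedule, and roll back to the original zone if the move surfaces a portability problem.

```python
import random

# Hypothetical inventory: service -> zone it currently runs in.
placements = {"pricing": "zone-a", "geofence": "zone-b"}
ZONES = ["zone-a", "zone-b", "zone-c"]

def healthy(service: str, zone: str) -> bool:
    # Placeholder for a real post-move health check.
    return True

def rotate(service: str) -> str:
    """Move a service to a different zone to prove it is portable;
    roll back to the original zone if the move surfaces an issue."""
    original = placements[service]
    target = random.choice([z for z in ZONES if z != original])
    placements[service] = target
    if not healthy(service, target):
        placements[service] = original  # fix the portability issue, retry later
    return placements[service]

# Run the rotation over every service, as a periodic job would.
for svc in list(placements):
    assert rotate(svc) in ZONES
```

The key property is that the rotation is routine rather than a one-off drill, so a regression in portability shows up within one cycle instead of at migration time.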
Uber Microservices were such an inefficient PITA. There was buzzword soup of a bunch of half baked infra pieces and they were always migrating. Every part of the stack was rotten. Udeploy, xterra, tchannel, schemaless, etc etc.
My peak “wtf” moment was when we had a SEV because two services that should communicate actually used different versions of thrift, both hard forked by Uber, with different implementations for sets. Passing a set from one service to another caused everything to break.
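To make that failure mode concrete, here's a toy illustration (not real Thrift code; the forks and type tags are made up): if two forks disagree on the wire type used for a set, every message containing a set fails to decode.

```python
# Made-up wire-type tags; one fork reuses the LIST encoding for sets,
# the other insists on a dedicated SET tag.
LIST_TYPE, SET_TYPE = 0x0F, 0x0E

def encode_set_fork_a(values):
    # Fork A: serialize a set as a tagged, sorted list of bytes.
    return bytes([LIST_TYPE, len(values)]) + bytes(sorted(values))

def decode_set_fork_b(payload):
    # Fork B: reject anything not tagged as a SET.
    type_tag, length = payload[0], payload[1]
    if type_tag != SET_TYPE:
        raise ValueError("protocol error: expected SET, got 0x%02x" % type_tag)
    return set(payload[2 : 2 + length])

wire = encode_set_fork_a({3, 1, 2})
try:
    decode_set_fork_b(wire)  # every call fails: the forks can't agree
except ValueError as err:
    print(err)
```

Scalars and structs round-trip fine between such forks, so the incompatibility stays invisible until the first API that actually passes a set.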
> In preparation for the move to the cloud, the company spent two years working towards making all stateless microservices portable so that their placement in zones and regions can be managed centrally without any involvement from the service engineers
I'd like to hear more about how Uber organized the engineering teams over two years to make "stateless microservices portable".
How many teams? What were the requirements to each team? What was the timeline? How did they know it was completed? How was it prioritized along other business priorities of the teams? How long did they think it would take originally? Was it worth it?
It seems like they’ve gotten to the “holy grail” of deployment where developers don’t have to worry about infrastructure at all in theory.
I’ve seen many teams go for simple/leaky abstractions on top Kubernetes to provide a similar solution, which is tempting because it’s easy and flexible. The problem is then all your devs need to be trained in all the complexities of Kubernetes deployments anyway. Hopefully Uber abstracted away Kubernetes and Mesos enough to be worthwhile, and they have a great infra team to support the devs.
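The leak usually enters through an escape hatch. A minimal sketch of the pattern (the spec format and field names are invented for illustration): an internal "service spec" renders to a Kubernetes Deployment, but the moment a raw-overrides field exists, teams reach for it and need Kubernetes knowledge anyway.

```python
def render_deployment(spec: dict) -> dict:
    """Translate a minimal internal service spec into a Kubernetes
    Deployment-shaped dict. The `k8s_overrides` escape hatch is where
    abstractions like this leak."""
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec["name"]},
        "spec": {
            "replicas": spec.get("replicas", 1),
            "template": {
                "spec": {
                    "containers": [
                        {"name": spec["name"], "image": spec["image"]}
                    ]
                }
            },
        },
    }
    # Anything teams put here is raw Kubernetes, defeating the abstraction.
    deployment.update(spec.get("k8s_overrides", {}))
    return deployment

d = render_deployment({"name": "pricing", "image": "pricing:1.2", "replicas": 3})
assert d["spec"]["replicas"] == 3
```

A non-leaky version would have to model every capability teams need (affinity, volumes, sidecars) in its own vocabulary, which is the expensive part Uber's platform team presumably took on.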
A different (better?) question is, does Uber need 4000 API contracts?
The answer to that is probably yes. APIs let us split work across systems/people/teams/regions, and provide a way for both sides of a split to work together. Uber has a lot of teams, a lot of engineers, and so it makes sense that there are a lot of API boundaries to allow them to work together more efficiently. Sometimes those APIs make sense to package as microservices.
There's an interesting HN comment[1] from 2020 by a former Uber engineer which discusses the complexity a bit. It's more about UI, but the thread discusses the backend as well. In brief, something that may look super simple to the user (like handling payments) is actually quite complicated once you cover every market, different payment types, etc. And all of this carries over to the backend as well.
Uber has a really liberal definition of a microservice. Every web UI or dashboard is a service (of which there are many hundreds). Every application anyone builds across their many thousands of engineers is a service. It's rare, I think, for services to have fewer than a few thousand lines of code. In my experience, most companies would have a monolith that serves multiple UIs from the same service. Uber instead ships that monolith as a library which is a framework for building individual UIs. It has its pros and cons but I quite liked how they did it.
(Worked at Lyft) Our number of active microservices was small in comparison. 4,000 is likely an overblown number to highlight the accomplishment, possibly counting inactive ones.
From experience working at big tech I’m willing to take a guess.
Maybe a couple dozen will be actual, more complex and meaningful services. Then a few dozen more that are somewhat unique.
And then the majority of the long tail will be mostly cookie-cutter services doing X, but for lots of different use cases, where each use case is a separate deployment counted as a service (for example, systems to process streams of logs related to business logic).
I've seen at least one place with many more than that in recent years. If you have one microservice "listener" per queue and another for the database processing and persistence (business logic) and another providing an API for one or more frontend UI's related to it then the microservice tally goes up very fast. It's kind of surprising to read so many comments indicating HN readers weren't aware of this.
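The multiplication is easy to see with back-of-the-envelope numbers (all counts below are made-up illustrations, not Uber's real figures):

```python
# One queue listener + one business-logic/persistence service + one API
# per domain, times many domains, explodes the service tally quickly.
domains = 400            # hypothetical: product areas, queues, dashboards
services_per_domain = 3  # listener + persistence/business logic + API
total = domains * services_per_domain
print(total)  # 1200 services from one architectural convention alone
```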
There's quite a sizing range between monolith and microservice.
If all their IT needs are behind truly "micro" services, that figure is understandable.
Outside of the map, taxi, food, payments, and onboarding, they also have monitoring, deployment, HR, billing, legal, taxes, internationalized stuff, and the usual "..." for whatever I'm missing.
If you just take a standard ERP, you could easily split it into dozens, even hundreds, of microservices.
There's no way that number isn't fiction; Occam's razor says it's out of the range of believable. That's ~2 per engineer according to Google. That's absurd. (That eng headcount is also a bit … high.)
This sounds like a figure from someone who sees a single microservice running across 100 pods/instances, and counted that as 100 "microservices".
I couldn't find any explanation of where the data would be found. Are they splitting data across clouds, and constantly "porting" that data from cloud to cloud as part of their portability?
Orchestrating the application layer across clouds is interesting, but how does their data layer work?
I dislike the Uber business itself (horrible treatment of drivers, poor customer service, poor safety controls, bullying of small businesses with Uber Eats, shitty executive level team with questionable ethics).
But the underlying technology which carried them to this point is a fascinating read.
I believe the dollar amount savings figures, they’re big and worthy of a congratulations to the engineers involved!
IMO, the engineering man-hour savings are a lot less trustworthy. This may eliminate or simplify some engineering processes, but IME massive migrations like this simply replace them with a different set of processes; because they're different and theoretically addressable, they're not counted against the hours saved, as they can be bucketed into bugs / to-be-addressed-by-the-roadmap / legacy behavior migrated from the old system (which is now dangerously-fragile legacy and not ol'-reliable legacy). Eventually someone will come along and decide this too is an inherently flawed platform that needs to be entirely replaced at great expense, and the circle of life continues.
This is still a massive undertaking not just from an engineering perspective but from an organizational/process one though. Whoever pulled this off essentially had to coordinate (or figure out how to simplify/explain things well enough to skip coordination) with almost every engineer and likely almost every production service in a company with thousands of engineers. Those in startups may balk about this kind of thing taking two years, but having done my own two year projects (at a smaller but comparable scale) in a big company I can say two years is what I’d consider a highly optimistic and unlikely outcome for a project of this magnitude.
> This may eliminate or simplify some engineering processes but IME massive migrations like this simply replace them with a different set of processes
Yes
> because they’re different
Now I have to learn an entire new set of tools/processes etc that are more useful to someone else but not helpful for me. The old one had its quirks but I knew it inside out and now the whole org has to re-learn how to do everything we did before.
For a company that is basically a taxi service, they seem to invest an awful lot in constant rebuilds of their extremely complex infrastructure, which raises the question of whether that is even remotely necessary or just an exercise in pretending that they are a tech company.
“Basically a taxi service,” except that Uber spans hundreds of cities, coordinates millions of drivers - none of whom work on a fixed schedule - and its only interface with customers is an app that has to be fast, accurate, and reliable at all times.
They do food delivery, parcel couriers, regular Uber rides, plan-ahead Ubers, grocery shopping, and a lot of other stuff. If anything, this is simpler than most silo-driven architectures you'd usually get with such a massively diversified business.
We're giving a talk about this at KCD Denmark on the 14th of November "Keynote: Uber - Migrating 2 million CPU cores to Kubernetes" if anyone is in the area and has any particular interest in this.
In 3 years… “Uber saved cost by migrating their micro service to their own colo.” followed by “Uber simplified operations by migrating their micro service platform to a monolith”.
In 5 years... "We've discovered a new paradigm for efficiently carving up and distributing computational units for our application. We call it, nanofunctions."
I'm not sure why this is an issue with a long running system. Business requirements change, knowledge changes, cost structures change, etc... Unfortunately the world isn't static. I'm not sure about you, but when the facts change I also try to change.
https://www.uber.com/en-GB/blog/up-portable-microservices-re...
[1] https://news.ycombinator.com/item?id=25376346
So almost certainly they are duplicating their entire stack per-country if only to get around the vastly different regulatory environments.
"What I Wish I Had Known Before Scaling Uber to 1000 Services" - https://youtu.be/kb-m2fasdDY
https://news.ycombinator.com/item?id=30635369
I got so excited reading about Mesos helping in the multi-cloud world, potentially as the hypervisor for running k8s.
And Google is just a search engine; they only need like 20 engineers…
Not defending their tech stack, but I mean that is a lot of realtime data that needs to be accurate - this is not your typical SaaS crud app.
Is this generally a sign of youthful wishful thinking or just plain hubris?
2013: "Migrating Uber from MySQL to PostgreSQL"[1]
2016: "Why Uber Engineering Switched from Postgres to MySQL"[2]
[1] https://www.yumpu.com/en/document/view/53683323/migrating-ub...
[2] https://www.uber.com/en-GB/blog/postgres-to-mysql-migration/