Microservices Killed Our Startup. Monoliths Would've Saved Us

ameliaquining|1 month ago

Does this story seem kinda…fake…to anyone else? Like, obviously companies do sometimes make decisions this stupid, but the way this is written seems a little too carefully optimized to make for a morality play of the kind HN enjoys. (And there's a potential motive, since there's a whole bunch of links to paid books and such, somewhat clumsily tied to the main narrative.)

altmanaltman|1 month ago

> “It’s not complex if you do it right. Netflix — “

> “WE’RE NOT NETFLIX!” I finally snapped. “Netflix has 500 engineers. We have 4. Netflix has dedicated DevOps teams. We have one guy. Netflix has millions of users. We have 50,000.”

Then

> Lesson 5: The Monolith Isn’t Your Enemy

> A well-structured monolith can:

> Scale to millions of users (Shopify, GitHub, Stack Overflow prove this)

Because Shopify, Github and Stack Overflow have 4 engineers each as well.

It kind of seems real because it reads like the it's written by the kind of person that would make high level arch decisions without even understanding what the f they are doing.

temp_praneshp|1 month ago

100%. The writing style put me off too, not sure exactly what about is weird though

greatgib|1 month ago

This article looks like a giant stack of bullshit, trying to surf the wave of trendy topics.

If you are small and not have scaling problems, it is highly unlikely that you see a real difference between monolith or microservice except on the margin.

But lots of things look off in the article: Billing needed to ... Create the order

What? Billing service is the one creating the orders instead of the opposite?

   Monday: A cascading failure took down the entire platform for 4 hours. One service had a memory leak, which caused it to slow down, which caused other services to time out, which caused retries, which brought everything down. In the monolith days, we would’ve just restarted one app. Now we had to debug a distributed system failure.

Hum, they could have restarted the service that failed, but if they had a leak in their code, even being a monolith the whole app would have gone done until the thing is fixed even constantly restarting. And I don't imagine the quality of your monolith service that is constantly restarting in full...

Finally it claims that Monday their service started to be slow, and already Wednesday the customer threatened to leave them because of the service to be slower. Doesn't look like to be a customer very hooked or needing your service if only after 2 days of issues they already want to leave.

Also, something totally suspicious is that, even if small or moderate size of company you could still have people push some architecture that they prefer, no company with a short few months cash runaway will decide to do a big refactor of the whole architecture if everything was good on the first place and no problem encountered. What will happen in theory is that you will start to face a wall, degrading performances with scale of something like that and then decide that you will have to do something, a rework. And then there will be the debate and decision about monolith, microservice, whatever else...

AbstractH24|1 month ago

It feels like I heard this same story about another company like two weeks ago

Was it something in the payment space?

condwanaland|1 month ago

The mistake here is having an architect who is not shipping product. Architects who's job it is to define 'rules' and 'patterns' without actually impending anything are almost always a bad idea. Just focus on shipping. Have at least one experienced engineer who can guide the development but don't give those decisions over to some 'architect' who is not even going to write 10 lines of code in your codebase

LucaMo|1 month ago

> We had 4 backend developers and a DevOps guy who was already stretched thin.

The mistake here was having an architect full stop. The team is too small, a good tech lead can manage to plan a service with 50k MAU (and way beyond) without an architect. The problem with some companies that get millions in seed funding is that they need to spend the money and they do so by adding roles that shouldn't exist at that stage.

salomonk_mur|1 month ago

Having members of the tech team who don't write code in some way or another is bad practice in general

hinkley|1 month ago

I vastly prefer architecture as a responsibility shared by all the staff+ or lead devs than as a role.

But that starts to fall down too any time too many people are talking about software they aren’t responsible for deploying or fixing.

dabinat|1 month ago

One thing I’ve learned is that you should be wary of spending too much time on things that customers don’t see. Customers don’t care about backend engineering unless it results in benefits they can actually see, and if you spend too long on invisible features they’ll think your platform is stagnant and move somewhere else.

CharlieDigital|1 month ago

    > ...you should be wary of spending too much time on things that customers don’t see

I don't think this is entirely true because there are some things that will help you ship faster like good architecture and a system design that is as simple as possible. These are worth investing, despite their obscurity to the end user, because doing it well can result in a faster pipeline and more stability.

Olumde|1 month ago

> But we survived

> And ironically? Now that we’re back on a monolith and shipping fast again, we’ve started growing again. Customers are happier. The team is happier.

So Microservices did not kill your startup?

And why did you stop instances of your monolith before the Microservices version was mature and ready???

rambojohnson|1 month ago

Premature distribution killed the startup, not microservices. You split the system before the boundaries were real, paid the tax in latency and coordination, and skipped the hard parts that make it viable: event-driven boundaries, local read models, and boring failure handling and comprehensive logging. Start with a modular monolith, earn your boundaries, then extract.

hinkley|1 month ago

Ask your coworkers how many of them got any formal training in distributed systems in college. You’re going to find out it’s not many. So far I haven’t found anyone who didn’t go to Berkeley or UIUC. WTF is going on with universities?

jiggawatts|1 month ago

Ironically posted on Medium, which showed me the text, then blanked the whole screen to replace the text with light grey polyfills, and then showed me the same text again... several seconds later.

That's because Medium is a bunch of APIs and (micro) services, not a monolith like it should be.

Heck, it could be plain static HTML because it's just text for crying out loud!

Instead, it uses a GraphQL query through JSON to obtain the text of the article... that it already sent me in HTML.

Total page weight of 17 MB, of which 6.7 MB is some sort of non-media ("text") document or script.

This is user-hostile architecture astronaut madness, and is so totally normal in the modern internet that nobody even bats and eye when text takes appreciable amounts of time to render on a 6 GHz multi-core computer with 1 Gbps fibre Internet connectivity.

Your customers hate this. Your architects love it because it keeps them employed.

echoangle|1 month ago

> light grey polyfills

Those grey loading placeholders for text are called skeleton loaders BTW, polyfills are libraries used to support newer browser APIs in older browsers and not something you can exactly see on a website (without checking the devtools)

simfree|1 month ago

A simple modern Dotnet monolith with Postgres on a Linux server could deliver a much better end user experience, and it probably would take a lot less server resources than the current mess.

hinkley|1 month ago

I tried to explain this to a team that eventually lost their customers to competitors who could generate less interesting pages far cheaper per request. Instead they went off on a two year jag trying to cache page sections.

You know a team has lost the architectural plot when their answer for all performance problems is more caching. And once you add caching it’s hard to sell any other sort of improvements because the caching poisons the perf analysis.

Their solution took forever because the system was less deterministic than we even knew. They were starting to wrap it up when I went on a tear cleaning up low level code that was nickel and diming us. By the time they launched they were looking at achieving half of the response time improvement they were looking for, in twice the time they estimated to do so. And they cheated. They making two requests about 10% of the time, which made the p50 time into a lie, because two smaller requests pull down the average but not the cost per page load. But I scooped them and made the slow path faster, undercutting another 25% of their perf improvements.

I ended up doing more to improve the Little’s Law situation in three months of working on it half time than they did in two man years. And still nothing changed. They are now owned by a competitor. That I believe shut down almost all of their services.

ameliaquining|1 month ago

Monoliths vs. microservices has nothing to do with server-side rendering vs. GraphQL. Architecturally monolithic Web apps use GraphQL all the time.

I'm not sure why Medium does the weird blanking thing but my guess is that it's because it's deciding whether to let you read the article or instead put up a paywall. There are a lot of SPA sites out there, many of which aren't particularly economical with frontend resources, and they generally don't do that unless they're trying to enforce some kind of paywall or similar.

analogpixel|1 month ago

this goes to show that one person can make a difference, the lead architect all by himself was able to destroy a company and moral.

nodesocket|1 month ago

Pretty sure making a product that people don’t want killed your startup. This is like saying using Python vs Go killed your startup which is absurd (unless your startup is high frequency trading or something).

hinkley|1 month ago

There’s a Venn diagram of different delusions and they overlap a lot.

You don’t have a market fit and you’re running your dev team like a larger company? What are the odds? Pretty high actually.

another_twist|1 month ago

I am pretty sure it was a lack of demand that killed the startup. Either of these are valid problems and quite easy to deploy and work with.

hinkley|1 month ago

One of the tricks of the startup dance is attaching improvement to revenue. Any work that’s done to support new customers will get approved. And work that looks like a loss leader, such as to retain existing customers, they may lean on the business or support people to paper over.

brimstedt|1 month ago

I do not agree fully with this article, but it does give food for thought and have some valid points:

- don't blindly jump into a new architecture because it's cool

- choose wisely the size of your services. It's not binary, and often it makes sense to group responsibilities into larger services.

- microservices have some benefits, moduliths (though not mentioned in the article) and monoliths have theirs. They all also have their set of disadvantages.

- etc

But anyway, the key lesson (which does not seem like a conclusion the author made) is:

Don't put a halt to your product/business development to do technician only work.

I.e if you can't make a technical change while still shipping customer value, that change may not be worth it.

There are of course exceptions, but in general you can manage technical debt, archtectural work, bug fixing, performance improvements, dx improvements, etc, while still shipping new features.

hdaz0017|1 month ago

please give us $200 ;) but sorry we can't even get the title right!

blackoil|1 month ago

Microservices solve people problem not technical. Till ~20 backend devs no point in moving to it. Monoliths are better in terms of performance, reliability and dev speed.

hinkley|1 month ago

You can’t solve social problems with technology.

Microservices solve a logistical problem. Rob wants to push code every two days. Steve wants to push every three. Thom deals with business who wants to release at whim and preferably within a few hours. Their commissions and bonuses are not reduced by how much chaos they case the engineering team. It’s an open feedback loop.

As you add more employees they start tripping over each other on the differences between trunk and deployed. Thats when splitting into multiple services starts to look attractive. Unfortunately they create their own weather and so if you can use process to delay this point you’re gonna be better off.

Everyone eventually merges code they aren’t 100% sure about. Some people do it all the time. However microservices magnify this because it’s difficult to test changes that cross service boundaries. You think you have it right but unless you can fit the entire system onto one machine, you can’t know. And distributed systems usually don’t concern themselves with whether the whole thing will fit onto a dev laptop.

So then you have code in preprod you are pretty sure will work but aren’t completely sure. Stack enough “pretty sure”s over time and as team sizes grow and you’re gonna have incidents on the regular. Separate deployment reduces the blast radius, but doesn’t eliminate it. Feature toggles reduce it more than an order of magnitude, but that still takes you from problems every week to a couple a year. Which in high SLA environments still makes people cranky.

unknown|1 month ago

[deleted]

amelius|1 month ago

That building in the right half of the image doesn't look like a monolith to me.

unknown|1 month ago

[deleted]

aee|1 month ago

"No fluff" is the GPT5 em dash

shraddha92|1 month ago

what did i just read

54 comments