top | item 22193383

Monoliths Are the Future

1065 points| feross | 6 years ago |changelog.com

552 comments

[+] rubyn00bie|6 years ago|reply

I couldn't agree more with an article.

Most people think a micro-service architecture is a panacea because "look at how simple X is," but it's not that simple. It's now a distributed system, and very likely, it's a the worst-of-the-worst a distributed monolith. Distributed system are hard, I know, I do it.

Three signs you have a distributed monolith:

1. You're duplicating the tables (information), without transforming the data into something new (adding information), in another database (e.g. worst cache ever, enjoy the split-brain). [1]

2. Service X does not work without Y or Z, and/or you have no strategy for how to deal with one of them going down.

2.5 Bonus, there is likely no way to meaningfully decouple the services. Service X can be "tolerant" of service Y's failure, but it cannot ever function without service Y.

3. You push all your data over an event-bus to keep your services "in-sync" with each-other taking a hot shit on the idea of a "transaction." The event-bus over time pushes your data further out of sync, making you think you need an even better event bus... You need transactions and (clicks over to the Jepsen series and laughs) good luck rolling that on your own...

I'm not saying service oriented architectures are bad, I'm not saying services are bad, they're absolutely not. They're a tool for a job, and one that comes with a lot of foot guns and pitfalls. Many of which people are not prepared for when they ship that first micro service.

I didn't even touch on the additional infrastructure and testing burden that a fleet of micro-services bring about.

[1] Simple tip: Don't duplicate data without adding value to it. Just don't.

[+] BrentOzar|6 years ago|reply

I'm a database guy, so the question I get from clients is, "We're thinking about breaking up our monolith into a bunch of microservices, and we want to use best-of-breed persistence layers for each microservice. Some data belongs in Postgres, some in DynamoDB, some in JSON files. Now, how do we do reporting?"

Analysts expect to be able to connect to one system, see their data, and write queries for it. They were never brought into the microservices strategy, and now they're stumped as to how they're supposed to quickly get data out to answer business questions or show customers stuff on a dashboard.

The only answers I've seen so far are either to build really complex/expensive reporting systems that pull data from every source in real time, or do extract/transform/load (ETL) processes like data warehouses do (in which the reporting data lags behind the source systems and doesn't have all the tables), or try to build real time replication to a central database - at which point, you're right back to a monolith.

Reporting on a bunch of different databases is a hard nut to crack.

[+] tootie|6 years ago|reply

This is what gave rise to data lakes. The typical data lake maturity model I see in enterprise is:

1. Pay a ton of money to Microsoft for Azure Data Lake, Power BI, etc.

2. Spend 12 months building ETLs from all your microservices to feed a torrent of raw data to your lake.

3. Start to think about what KPIs you want to measure.

4. Sign up for a free Google Analytics account and use that instead.

[+] MadWombat|6 years ago|reply

> Some data belongs in Postgres, some in DynamoDB, some in JSON files. Now, how do we do reporting?

One of the key concepts in microservice architecture is data sovereignity. It doesn't matter how/where the data is stored. The only thing that cares about the details of the data storage is the service itself. If you need some data the service operates on for reporting purposes, make an API that gets you this data and make it part of the service. You can architect layers around it, maybe write a separate service that aggregates data from multiple other services into a central analytics database and then reporting can be done from there or keep requests in real time, but introduce a caching layer or whatever. But you do not simply go and poke your reporting fingers into individual service databases. In a good microservice architecture you should not even be able to do that.

[+] WoahNoun|6 years ago|reply

I'm going to disagree heavily here. The world of cloud computing, microservices, and hosted/managed services has made the analyst and data engineers job easier than ever. If the software team builds a new dynamodb table, they simple give the AWS account for the analytics team the appropriate IAM permissions and the analytics team will set-up an off-peak bulk extract. A single analyst can easily run an entire data warehouse and analytics pipeline basically part time without a single server using hosted services and microservices. With a team of analysts, the load sharing should be such that the ETL infrastructure is only touched when adding new pipelines or a new feature transformation.

And for data scientists working on production models used within production software, most inference is packaged as containers in something like ECS or Fargate which are then scaled up and down automatically. Eg, they are basically running a microservice for the software teams to consume.

Real time reporting, in my opinion, is not the domain of analysts; it's the domain of the software team. For one, it's rarely useful outside of something like a NOC (or similar control room areas) and should be considered a software feature of that control room. If real-time has to be on the analysts (been there), then the software team should dual publish their transactions to kinesis firehouse and the analytics team can take it from there.

Of course, all of this relies heavily on buy-in to the world of cloud computing. Come on in, we all float down here.

[+] projektfu|6 years ago|reply

It's a little sad because originally, people thought there would be a shared data base (now one word) for the whole organization. Data administrators would write rules for the data as a whole and keep applications in line so that they operated on that data base appropriately. A lot of DBMS features are meant to support this concept of shared use by diverse applications.

What ended up happening is each application uses its own database, nobody offered applications that could be configured to an existing data base, and all of our data is in silos.

[+] dcolkitt|6 years ago|reply

I disagree with the conclusion. While every situation is unique, the default should be separate persistence layers for analytics and transactions.

Analytics has very different workloads and use cases than production transactions. Data is WORM, latency and uptime SLAs are looser, throughput and durability SLAs are tighter, access is columnar, consistency requirements are different, demand is lumpy, and security policies are different. Running analytics against the same database used for customer facing transactions just doesn't make sense. Do you really want to spike your client response times every time BI runs their daily report?

The biggest downside to keeping analytics data separate from transactions is the need to duplicate the data. But storage costs are dirt cheap. Without forethought you can also run into thorny questions when the sources diverge. But as long as you plan a clear policy about the canonical source of truth, this won't become an issue.

With that architecture, analysts don't have to feel constrained about decisions that engineering is making without their input. They're free to store their version of the data in whatever way best suits their work flow. The only time they need to interface with engineering is to ingest the data either from a delta stream in the transaction layer and/or duplexing the incoming data upstream. Keeping interfaces small is a core principle of best engineering practices.

[+] AnthonyWnC|6 years ago|reply

In my last job I was a DevOps guy in a Data Eng. team and we used microservice (actually serverless) extensively to the point that none of our ETL relied on servers (they were all serverless; AWS lambda).

Now databases themselves are different stories, they are the persistence/data layer that microservices themselves use . But it's actually doable and I'd even say much easier to use microservices/serverless for ETL because it's easier to develop CI/CD and testing/deployment with non-stateful services. Of course, it does take certain level of engineering maturity and skillsets but I think the end results justify it.

[+] yibg|6 years ago|reply

This isn’t a new problem to microservices though, although maybe it’s amplified. Reporting was challenging before microservices became popular too with data from different sources. Different products, offline data sources etc that all had to be put together. The whole ETL, data warehousing stuff.

In the end everything involves tradeoffs. If you need to partition your data to scale, or for some other reason need to break up the data, then reporting potentially becomes a secondary concern. In this case maybe delayed reporting or a more complex reporting workflow is worth the trade off.

[+] bcrosby95|6 years ago|reply

Microservices are primarily about silo'ing different engineering teams from eachother. If you have a singular reporting database that a singular engineering team manages I'm not sure its a big deal. Reporting might be a "monolith" but the system as a whole isn't. Teams can still deploy their services and change their database schemas without stepping on eachother's toes.

[+] sharadov|6 years ago|reply

No one has solved that problem and it sucks, and what ends up happening is you end up again porting that data from those disparate SQL and NoSQL databases either to a warehouse which is RDBMS or you put it into a datalake. That's again possible if you somehow manage to find all the drivers. You're doubly screwed if you have a hybrid - cloud and on-prem setup.

[+] rjkennedy98|6 years ago|reply

This is what Kafka is for. You put Kafka on top of your database to expose data and events. Now BI can take the events put them into their system as they want.

[+] segmondy|6 years ago|reply

This is a solved problem, "Data Engineering" teams solve this by building a data pipeline. It's not for all orgs, but for a large org, this is worth doing right.

[+] zten|6 years ago|reply

I don't think you're right back to a monolith with centralized reporting. Remember, microservices doesn't mean JSON-RPC over HTTP. Passing updates extracted via change data capture and forwarding them to another reporting system is a perfectly viable interface. Data duplication is also an acceptable consequence in this design.

[+] willvarfar|6 years ago|reply

https://prestosql.io/

It can access all those different databases.

You can also make your own connectors that make your services appear as tables, which you can query with SQL in the normal way.

So if the new accounts micro-service doesn't have a database, or the team won't let your analysts access the database behind it, you can always go in through the front-door e.g. the rest/graphql/grpc/thrift/buzzword api it exposes, and treat it as just another table!

Presto is great even for monoliths ;) Rah rah presto.

[+] debrice|6 years ago|reply

Where I work we use micro-services through lambda (we have dozens of them) and use DynamoDB for our tables. DynamoDB streams are piped through elasticsearch. We use it for our business intelligence. Took us about a week to setup proper replication and sharding. I don't have a strong opinion on monolith or micro-service, pick one or the other, understand their culprit and write high quality (aka. simple and maintainable) code.

[+] jsjohnst|6 years ago|reply

But I’d argue Monoliths don’t have anything inherent to them which makes reporting easier. A proper BI setup requires a lot of hard work no matter how the backend services are built.

[+] JMTQp8lwXL|6 years ago|reply

Whether or not your system intentionally ended up with this architecture, GraphQL provides a unified way of fetching the data. You'll still have to implement the details of how to fetch from the various services, but it gives people who just want read access to everything a clean abstraction for doing so.

[+] gowld|6 years ago|reply

Build a reporting database that maintains a copy of data from other data stores.

Simple enough. Surely you wouldn't run analytics directly on your prod serving database, and risk a bad query taking down your whole system?

[+] mattbessey|6 years ago|reply

is that a bad thing? I don't think anyone is against a reporting monolith. to me, being able to silo data in the appropriate stores for their read / write patterns, and still query it all in a single columnar lake seems like a feature, not a bug to be solved

[+] Terr_|6 years ago|reply

> now they're stumped as to how they're supposed to quickly get data out

I'd argue that (given a large enough business) "reporting" ought to be its own own software unit (code, database, etc.) which is responsible for taking in data-flows from other services and storing them into whatever form happens to be best for the needs of report-runners. Rather than wandering auditor-sysadmins, they're mostly customers of another system.

When it comes to a complicated ecosystem of many idiosyncratic services, this article may be handy: "The Log: What every software engineer should know about real-time data's unifying abstraction"

[0] https://engineering.linkedin.com/distributed-systems/log-wha...

[+] oneplane|6 years ago|reply

Reporting on databases is a rather 90's thing to do. Why would you still actively pursue such a reporting method from the OLTP/OLAP world? If you use tools that are purposely made to work in such a way, an analyst using those tools will obviously not be able to utilise them in an incompatible environment.

[+] toriningen|6 years ago|reply

Or you could have CQRS projectors (read models), which solve exactly this - they aggregate data from lots of different eventually consistent sources, providing you with locally consistent view only of events you might be interested in.

It will lag behind by some extent, roughly equal to the processing delay + double network delay, but can include arbitrary things that are part of your event model.

Though, it's not a silver bullet (distributed constraints are pain in the ass yet), and if system wasn't designed as DDD/CQRS system from the ground up, it would be hard to migrate it, especially because you can't make small steps toward it.

[+] 1MoreThing|6 years ago|reply

Isn't this the whole point of a data lake?

[+] nitwit005|6 years ago|reply

You can't entirely escape this problem. Even when companies want to fit everything in a single database, they often find they can't. With enough data you'll eventually run into scalability limitations.

A quick fix might be to split different customers onto different databases, which doesn't require too many changes to the app. But now you're stuck building tools to pull from different databases to generate reports, even though you have a monolithic code base.

[+] campers|6 years ago|reply

I've seen that BigQuery has federated querying, so you can build queries on data in BigQuery, BigTable, Cloud SQL, Cloud Storage (Avro, Parquet, ORC, JSON and CSV formats) and Google Drives (CSV, Newlinen-delimited JSON, Avro or Google Sheets)

[+] AmericanChopper|6 years ago|reply

Even if you have a monolith, you’re still going to have multiple sources that you want to report on. Even in an incredibly simple monolith I could imagine you’d have: your app data, Salesforce, Google Analytics. Having an ELT > data warehouse pipeline isn’t difficult, and what reporting use case is undermined by the data being a few minutes old?

[+] anbotero|6 years ago|reply

Off-topic, but I knew I had seen that name somewhere: https://dbareactions.com/post/183007001237/devops-owns-the-q...

(SRE here, but I work on databases as well all day)

[+] throwaway894345|6 years ago|reply

> Reporting on a bunch of different databases is a hard nut to crack.

Maybe, but your business analyst already needs to connect to N other databases/data-sources anyway (marketing data, web analytics, salesforce, etc, etc), so you already need the infrastructure to connect to N data sources. N+1 isn't much worse.

[+] yahyaheee|6 years ago|reply

This is a problem but I’m not sure having everything in a single data store is a great idea either. Generally you want your analytics separate from your operations anyway. We do this by having a central ES instance which just informs on the data it needs, which had worked perfectly fine for our needs

[+] andy_ppp|6 years ago|reply

> Reporting on a bunch of different databases is a hard nut to crack.

It's not necessarily a bad idea though :-/

[+] skybrian|6 years ago|reply

It seems like interacting with customers and enforcing business rules is one job, and observing what's happening is a different concern. Observing means collecting a lot of logs to a reporting database.

[+] closeparen|6 years ago|reply

My employer adopted microservices for a very specific reason: it became nearly impossible to deploy the monolith. With hundreds of commits trying to go out every day, probability that at least one would break something approached 1. Then everything had to be rolled back. Getting unrelated concerns into separate deployable artifacts rescued our velocity.

It came with many of its own challenges, too! A great deal of infrastructure had to be built to get from O(N) to O(1) infrastructure engineering effort per service. But we did build it, and now it works great.

There is a reason monoliths were traditionally coupled with quarterly or even annual releases gated by extensive QA.

[+] eternalny1|6 years ago|reply

What is old is new again.

I've been a software engineer for over 30 years and have dealt with companies always trying to jump on the next bandwagon. One company I worked with tried to move our entire monolith application, which was well architected and worked fine, over to a microservices-based architecture and the result was an unstable, complex mess.

Sometimes, if it's not broke, don't try to "fix" it.

I can say the same regarding a lot of what is going on in the JavaScript ecosystem, where people are trying to replicate stuff that works fine in other languages in JavaScript. Mostly because they are only familiar with JavaScript and don't realize this stuff already exists and doesn't need to be in JavaScript.

[+] sytse|6 years ago|reply

At GitLab a couple of years ago we had to fight the temptation to split the application up in hundreds of micro-services.

I'm glad we did and today GitLab has a big monolith but also a ton of services working together https://docs.gitlab.com/ee/development/architecture.html#com...

I did an interview about this yesterday https://www.youtube.com/watch?v=WDqGaPGBZ9Y

[+] mikepk|6 years ago|reply

I'm tempted to write a blog post... I bristle a little when microservices are described as "best practice". Monolith vs microservice is really about _people_ and _organizations_. Monoliths make sense in some contexts and microservices in others, but the deciding factor is really the size of the team and number of people working on different functional contexts.

The best analog I can come up with is monoliths in larger organizations are like a manifestation of Amdahl's law. The overhead of communication and synchronization reduces your development throughput. Each additional person does not add one persons worth of throughput when you cross a critical individual count threshold (mythical man month and all that).

I'm not describing this clearly so I should probably actually commit to writing out my thoughts on this in a post describing my experience with this.

[+] andrew_n|6 years ago|reply

Spot on. The metaphor I typically use here is cleaning up a mess vs spreading it around. If you have a really big mess and spend a year or two rearranging it into dozens or hundreds of smaller messes, yes the big obvious mess is gone, but the overall amount of mess has likely gone up and by segregating everything you’ve probably made it much harder to someday get to a clean state.

If you’re moving to microservices because the number of people working on a project is growing too large to manage and you need independent teams, great. If you’re refactoring to microservices because “we’re going to do everything right this time,” this is just big-rewrite-in-disguise.

Whatever engineering quality improvements you’re trying to make—tech stack modernization, test automation, extracting common components, improved reliability, better encapsulation—you’re probably a lot better off picking one problem at a time and tackling it directly, measuring progress and adjusting course, rather than expecting a microservices rewrite to magically solve a bunch of these problems all at once.

[+] tynpeddler|6 years ago|reply

There are two big reasons to go to microservices (note that the exact definition of microservice can vary a lot).

1. Organizational streamlining. If the team working on the monolith becomes to large, then coordinating and pushing out changes quickly can become incredibly difficult. One rule of thumb I've heard is the two pizzas rule. If two pizzas can't feed the team working on a system, it's time to break up the system.

2. Horizontal scaling. If some components of your workflow require much more computing power than others, then it makes sense to break up your system to move computationally intensive tasks to their own services.

While there are lots of other decent reasons to break up a system, if you can't invoke at least one of the two above reasons, you may be shooting yourself in the foot. I think he's dead on when he points out that if you don't have engineering discipline in the monolith, then you won't have it in the microservices.

[+] bob1029|6 years ago|reply

We wound up taking a similar journey:

Monolith => Microservices => Monolith

I wouldn't say the journey was completely pointless, because the fact that we had to deploy 10+ services to make a single environment whole required us to build extremely powerful CI/CD management tools that we happen to be able to re-use in the (new) monolith case today. This journey was also a really good growth and learning opportunity for the team. Everyone who has touched this project and has seen both ends of the distributed<=>monolith spectrum is now radicalized towards preferring the monolith approach.

On the trip back into a monolith, we didn't just stop with the binary outputs of our codebase. We also made the entire codebase a monorepo. We have a single solution (VS2019) within that monorepo which tracks all of our projects. Prior, we had upwards of 15 different repositories to keep track of. Being able to right-click on a type, select "View all References" and legitimately get every possible reference to that type across the entire enterprise is the most powerful thing I have yet to see in my career.

[+] sebringj|6 years ago|reply

My experience with microservices has been just a shift in worries. I don't worry about scale or ssh configs but I rather worry about cloudformation and cloudwatch or billing impact. It has also been some challenge to get testing locally to work easily and there have been quite a lot of meetings and discussions used up on that alone. I don't find the microservice pitch from a developer perspective to be easier at all, actually harder overall. I do like the approach of gcloud or elastic bean stalk better for cloud as you have an auto-scale but still can do local testing easily for a monolith approach. The use case for microservices IMO is more like you have a couple of highly-used sets of functionality that are disproportionate to your monolith and can be split out to save money but not to build everything around microservices and pretend everything is easier. Personally I feel my cognitive load increases when using microservices purely.

[+] holoduke|6 years ago|reply

Designing an application from scratch where pure microservices is implemented is in my opinion the same as over engineering possible future performance issues. Splitting up your application in many services requires a lot of thinking and designing. Challenges with syncing, communication etc are not always easy to deal with. That's why I agree to start as a monolith but with architectural principles to still have multi modules/components. But I would for example never split up the database in multiple.

[+] reading-at-work|6 years ago|reply

I think the underlying point, as expressed by the author, is that trendy new architecture patterns will never be a panacea for bad engineering, though that's often how they're implicitly sold as ideas.

[+] markbnj|6 years ago|reply

I often agree with Kelsey Hightower, but there are so many things he doesn't mention here. For example, being able to independently deploy components frees up certain kinds of development worklows. Distributed components also scale and fail independently, and you can use nifty things like message queues between them to provide resilience and soak up load spikes. I'm sure the pattern has often been applied in the wrong use cases, and that many people have over-applied it, but "the monolith is the future" seems just as wrong as "microservices are the future." We are nowhere near the size of a large bank... or even a small bank, and yet we've benefited from a distributed set of independently deployable and scalable components. You can call them microservices, or not. I can think of ways we could restructure on a monolithic backend, but just noodling on the idea leaves me with more constraints than benefits. Idk, it's a thought-provoking statement at least, but I sort of wish we'd stop reacting to fads with anti-fads.

[+] MosheZada|6 years ago|reply

Why do we need to choose one of monolith and microservices? What about simply "services"? Monolith doesn't have to be split into 50 microservices, it can be split to 3 services

[+] zackmorris|6 years ago|reply

What's really going on here is that a remote procedure call (RPC) to a microservice or REST API is conceptually equivalent to calling a function in a library specified by an interface in a header file. There is an incredible amount of handwaving that obfuscates minutia around synchronous blocking vs asynchronous callbacks/promises/async-await but there is no reason why we can't convert from the distributed to local paradigm losslessly.

What I'm not seeing is any attempt to go in the opposite direction. A compiler should be able to look at ordinary code and slice it up into microservices automagically, converting the header interfaces to API specifications like OpenAPI/Swagger. We should literally be able to write a monolithic program in any functional or C-style imperative language and get a conversion to a bunch of lambda functions. If that doesn't work, then something is seriously wrong (probably having to do with determinism, like inadequate exception handling for timeouts, etc).

So frankly, the first day I saw lambdas, I was skeptical. I don't understand the point of writing all of the glue code by hand. Incidentally, I reached this same conclusion after manually building a large REST API around the JSON API standard just before GraphQL went mainstream and made a mockery of my efforts.

I think that the HTTP spec and things like separation of concerns serve a purpose for human readability. But we're well past the point where the gains made by the early internet are providing dividends in today's highly-interoperating stuff like Rust, Go and Node.js. Basically 90% of the work done today would be considered a waste of time (bike shedding and cargo culting) in the 1980s and 1990s. Just my two cents.

[+] dcwca|6 years ago|reply

The author does not seem to understand when to correctly apply microservices. There are two basic use cases: 1) Different parts of your solution have different load patterns and it is economically beneficial to scale them at different rates and 2) Different teams need to be able to work & ship autonomously. It's not at all about technical merits or architectural beauty. It's about people and costs.

[+] philipkglass|6 years ago|reply

I have seen monoliths successfully transition parts of their functionality into small services. I have not seen a microservice-first approach work very well. When you're building something new, your intuitions about which parts are going to be tightly coupled and which parts are going to be relatively independent are just guesswork.

Once you've iterated on a monolith enough to see which parts are relatively independent and would actually benefit from decoupling, then you can move them into separate services.

One example that comes to mind: I wrote a recommendation service that also handled user feedback events. This was the easiest way to start. After about a year I saw that we were iterating faster on the event processing than on the actual rec delivery. We were also deploying this monolith across more machines mostly to scale up event handling capacity. So we broke the high volume event handling out into a separate service that was smaller and optimized exclusively for event processing.

[+] DanielBMarkham|6 years ago|reply

With respect to the author, who probably is a much smarter person than I am, this is yet another in a long, long series of HN articles that should be grouped under "I don't know what the hell X is, but I was an expert in it, and I can tell you it sucks"

I've seen X be a dozen things: UML, databases, User Stories, Functional Programming, Testing... It's too much to list.

Yes. If you do it that way it will hurt, and you should stop. I don't know this author, but I suspect that many people who jump into microservices are not getting the foundations they need to successful. The idea that microservices are just broken-up monoliths is a big clue. They're spot on about marketing and spend, though. In this community we're quick to hype and sell things to one another whether it's a good idea or not.

I've seen some great criticisms of microservices, some of which made me pause. Now, however, I think there's a reasonable way through the obstacles. It doesn't have to be a mess. Nothing is a magic bullet, but about anything will work if your game is good enough. You don't buy a bright and shiny to make your game better. Doesn't work like that.

[+] JMTQp8lwXL|6 years ago|reply

It's okay to ship a bunch of services together, if you can be serious about keeping hard boundaries between subsystems. Microservices force you to do this (e.g., your microservices might have to communicate via REST APIs, but they can't access eachother's internal implementation details).

Your customers do not care about your monolith. They don't see a monolith; all they see is features. Untangling it may or may not be the right choice.

In a certain set of situations, the path forward, instead of trying to untangle your monolith is --if you so desire-- create new services actually be true microservices, and keep your monolith as-is.

[+] bkanber|6 years ago|reply

I've settled on a compromise in this debate. Halfway between monoliths and microservices is the shared-library model. Instead of creating a microservice for your image processing, break it out into a standalone NPM or Composer or whatever module, then use that in your monolith. Gives you good separation of code and responsibilities, gives you good upgrade paths for your monoliths, avoids the overhead of microservices.