
Introduction to Microservices

137 points | fixel | 10 years ago | nginx.com

83 comments

[+] electrotype|10 years ago|reply
What scares me about microservices is the case where some operations must be transactional.

What if, in a given use case, multiple microservices are involved but the operations must be transactional: if one of the services fails, all previous operations must roll back. What are the recommended ways of implementing this kind of transactional behavior in a modern HTTP/REST microservices architecture?

I know the pattern is called "distributed transactions" and is often related to two-phase commit protocol. But there doesn't seem to be a lot of practical information available about this topic!

I found this recent presentation[1] that talks about it, but I'd like to learn more on the subject. Also, I'm looking for practical tutorials, not highly academic ones! I'd really love to see code samples, for instance.

Any links, suggestions?

[1] http://www.infoq.com/presentations/microservices-docker-cqrs

[+] batou|10 years ago|reply
We've moved down this path from a massively complicated distributed-transaction environment on top of MSMQ, SQL Server, etc., and you know what? With some careful design and thought about ordering operations and atomic service endpoints, we didn't need them at all.

Transactions can be cleanly replaced with reservations in most cases, i.e. "I'll reserve this stock for 10 minutes", after which point the reservation is invalid. So a typical flow for an order-pipeline payment failure would be:

1. Client places order to order service.

2. Order service calls ERP service and places reservation on stuff for 10 minutes.

3. Order service calls payment service (which is sloooow and takes 2-3 mins for a callback) and issues payment.

4. Payment service fails or payment fails.

5. Order service correlation times out.

6. Order service calls notification service and tells buyer that their transaction timed out and cancels the order.

7. ERP service doesn't hear back from the order service and kills reservation.

etc etc.

At step (4) you have an option to just chuck the message back on the bus to try again after say 2 minutes. If everything times out, meh.
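The reservation-with-TTL idea can be sketched in a few lines. All names here are hypothetical and the store is in-memory purely for illustration; a real ERP service would persist reservations durably:

```python
import time
import uuid

class ReservationStore:
    """In-memory stand-in for the ERP service's reservation table."""

    def __init__(self, ttl_seconds=600):  # 10-minute reservations, as above
        self.ttl = ttl_seconds
        self.reservations = {}  # reservation_id -> (sku, qty, expires_at)

    def reserve(self, sku, qty):
        rid = str(uuid.uuid4())
        self.reservations[rid] = (sku, qty, time.time() + self.ttl)
        return rid

    def confirm(self, rid):
        # Called when payment succeeds; fails if the reservation lapsed.
        entry = self.reservations.pop(rid, None)
        if entry is None or entry[2] < time.time():
            raise LookupError("reservation expired or unknown")
        return entry

    def sweep(self):
        # Step 7: the ERP side periodically kills reservations
        # it never hears back about.
        now = time.time()
        expired = [r for r, (_, _, exp) in self.reservations.items() if exp < now]
        for r in expired:
            del self.reservations[r]
        return expired
```

The appeal is that no coordinator is needed: if the payment leg dies at any step, the ERP side converges on its own once the TTL lapses.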

[+] nlawalker|10 years ago|reply
The concept of Aggregates in Domain-Driven Design is based around the need for business invariants that must be maintained with transactional consistency in a system that is generally eventually consistent.

Overall, you have to learn to love eventual consistency, but small portions of the domain should be clustered together around the transactional-consistency needs that are genuinely necessary.

Check out "Implementing Domain Driven Design" by Vaughn Vernon; chapter 10 in particular talks about this.

[+] pbh101|10 years ago|reply
Caitie McCaffrey gave a talk on this subject at GOTO Chicago 2015 [1], refreshing the 'Saga Pattern' for a distributed environment. Short summary, IIRC:

1) Use a linearizable data store to store transaction metadata

2) Each step must have a complementary rollback step

3) Rollback steps must be idempotent. Depending on the type of transaction, sometimes 'rollback' is too strong and you instead implement other types of recovery (e.g. 'roll-forward')

[1] http://gotocon.com/chicago-2015/presentation/Applying%20the%...
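The saga shape described above can be sketched minimally like this (hypothetical names; a real implementation would persist the completed-steps log in the linearizable store from point 1, not a Python list):

```python
class SagaFailed(Exception):
    pass

def run_saga(steps):
    """Run (action, compensation) pairs; on failure, compensate in reverse.

    Each compensation must be idempotent (point 3): running it twice after
    a crash-and-retry must leave the system in the same state as once.
    """
    completed = []  # stand-in for durable transaction metadata (point 1)
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception as exc:
        # Point 2: every finished step has a complementary rollback step.
        for compensate in reversed(completed):
            compensate()
        raise SagaFailed(str(exc)) from exc
```

For example, a trip saga of (book hotel, book flight, charge card) where the charge fails would run "cancel flight" then "cancel hotel" and surface `SagaFailed` to the caller.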

[+] msluyter|10 years ago|reply
Funny, I made a very similar comment recently:

A concrete example we've faced. A certain operation requires writing data to N flaky services. You successfully write to N-1 of them, but the Nth fails. Now what do you do? If these N things were just database writes to the same DB, transactions would save you, as you could just rollback. Without that, the answer has to be handled in code -- do you reverse the previous changes (if possible) by sending delete events, or leave the system in some sort of half-baked state and rectify things later via some other process? (I'm interested in hearing of other options...)

The answers I got were:

1) apologetic computing (Amazon)

2) consensus algorithms / paxos

The problem I see is that these may be non-trivial to implement and/or not fully understood or standardized.

[+] cam-|10 years ago|reply
We have one instance where updating an account goes out to potentially 11 service calls, which are not transactional. We are having to maintain state in our app because the microservices have split this up so much.
[+] programminggeek|10 years ago|reply
The alternative to Microservices would be to get better at building boundaries in your application before resorting to creating physical boundaries to interact with code.

There are a lot of approaches to this. I've explored these ideas with Obvious Architecture (http://retromocha.com/obvious/) and the talk I gave at MWRC 2015 on Message Oriented Programming (http://brianknapp.me/message-oriented-programming/).

I think the big lesson is that the Erlang stuff was WAY ahead of its time and it already solved a lot of the problems of large networked systems decades ago. Now that we are all building networked systems, we are relearning the same lessons telcos did a long time ago.

[+] sbov|10 years ago|reply
Reading the article, I'm not sure if that's an alternative so much as a prerequisite.

Looking at the monolithic architecture, the article just took each feature within the monolith and recreated it as a microservice. Just because you have a monolith doesn't mean you can't have well-thought-out features and separation of concerns.

Before coding, before deciding on architecture, I like to think in these terms. What features make the most sense together? Far apart? It should be a prerequisite of any project, regardless of architecture. If you're building a monolith, each one just goes in a different module or package rather than having its own service.

[+] jedberg|10 years ago|reply
But then you lose a lot of the benefits of microservices, namely the scalability and reliability that come from being able to manage deployments separately.
[+] orthecreedence|10 years ago|reply
After reading about microservices, I feel like it's a great idea if you have a large team and a lot of resources, but that's not really explicitly stated (although it's nice this article features drawbacks). I feel like there's a ton of hype but nobody is saying "Don't do this if you don't have a 30-person team!"

Separating everything out into little APIs, all with their own datastores, all talking to each other, sounds great, but I would not want to do this on a three-person team. Just give me an old-fashioned monolithic API and a large database, and then I can spend 80% of my time programming and 20% on maintenance. One app is hard enough to run; why consciously choose to run 10 of them if you don't have the capacity?

I don't think microservices are a bad idea at all...I love the architecture. But the hype makes it hard to see that this architecture probably isn't for you unless you have the capacity for it.

[+] johns|10 years ago|reply
If you're interested in our experience on starting on microservices from scratch with only 2 people, I gave a talk on it: http://www.infoq.com/presentations/queues-proxy-microservice...

I agree they're not for every team, but it definitely allowed us to move, grow and scale faster than other dev environments I've worked in.

[+] viksit|10 years ago|reply
Actually, I'd say the microservices architecture has helped in a complex project that I'm working on right now.

Consider the following things that need to be done here:

- A main library that needs to load up a few gigs of data in memory

- A process that communicates with a queue of messages coming in

- A process that interfaces with mobile app (port x)

- A process that interfaces with a different kind of app (port y)

The goal is - every incoming message needs to go through to the main library and back to the app via the queue.

Monolith option - a main.cc which contains all this: takes a while to start, can't queue up incoming messages till everything starts up and loads into memory, etc. Even using threads and whatnot.

Now with microservices,

- I can build a service that exposes my big-data-load library through a port. This can be loaded and restarted at will.

- Queue is running as a separate process. Messages queue when main lib is down and processed later.

- Server A and server B run separately

- A bug in one won't crash all the others

- I can manage each service independently (run them via supervisor or whatnot)

- Scaling it is easy - I can deploy each service behind load balancers, on different machines in the future without ever needing to change anything but the urls in a config file

- Monitoring - I have latencies for each service available via haproxy and the like.

My 2c.
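The queue-in-front-of-a-slow-service point above can be shown in miniature: messages accumulate instead of being dropped while the heavy library loads. This sketch uses an in-process queue.Queue and a fake load delay purely for illustration; the setup described would use a real broker as a separate process:

```python
import queue
import threading
import time

incoming = queue.Queue()  # stand-in for the message queue process
results = []

def slow_service():
    time.sleep(0.1)  # simulate loading "a few gigs" into memory
    while True:
        msg = incoming.get()
        if msg is None:  # shutdown sentinel
            break
        results.append(msg.upper())  # the "main library" doing its work

# Messages arrive before the service is ready; nothing is lost.
for msg in ("hello", "world"):
    incoming.put(msg)

worker = threading.Thread(target=slow_service)
worker.start()
incoming.put(None)
worker.join()
```

After the worker finishes, `results` holds both messages, processed once the library was up: the producer never had to wait on the slow startup.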

[+] viksit|10 years ago|reply
Microservices are great, but the underlying protocol they run on is important.

If you're building a REST interface to all your services and something consumes them, they might be slower than a monolithic app unless you have something like TCP- or HTTP-level keep-alive built in. Connections need to be long-standing; otherwise the overhead of creating a new connection is pretty high.

Question here - what is a good way to make this long-standing connection happen? E.g., if you use Python urllib3 and nginx, can you keep these connections alive enough (with pings or whatever) that your latency is lower than bundling that service within the code itself as a library?
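To make the connection-reuse mechanism concrete, here is a stdlib-only sketch (no urllib3 or nginx, so it demonstrates the keep-alive behavior rather than the exact stack asked about): an `http.client.HTTPConnection` keeps its TCP connection open across requests as long as the server speaks HTTP/1.1 and doesn't close it. urllib3's `PoolManager` does the same pooling for you automatically.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 defaults to keep-alive

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# One HTTPConnection object == one TCP connection, reused for both requests.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for _ in range(2):
    conn.request("GET", "/")
    bodies.append(conn.getresponse().read())
conn.close()
server.shutdown()
```

Both responses travel over a single TCP connection, so the handshake cost is paid once; per-call latency then comes down to request/response framing rather than connection setup.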

[+] usaar333|10 years ago|reply
If you use keep-alive and your server/load balancer doesn't have overly aggressive connection-termination policies, I doubt reconnecting every 60 seconds or so will be a major throughput hit. Regardless, you are always going to incur latency overhead by sending data over TCP relative to just sharing memory.

What I am more worried about with microservices is the data serialization overhead. Transforming data to some encoding that is robust against version changes (say using protobuf) can be quite costly on both sender and receiver, especially in languages with relatively slow object creation (e.g. python). This is highly application specific, but I'd love to hear others' thoughts on this trade-off.
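The serialization cost is easy to measure in miniature. This sketch times a stdlib-json round trip rather than protobuf, and the absolute numbers are machine-dependent, but it shows the overhead every cross-service hop pays relative to an in-process call:

```python
import json
import timeit

record = {"user_id": 12345,
          "events": [{"t": i, "v": i * 0.5} for i in range(100)]}

def in_process(r):
    # A "call" that just shares memory: no encode/decode at all.
    return len(r["events"])

def over_the_wire(r):
    # What each microservice hop pays: encode on the sender,
    # decode on the receiver (network time not even counted here).
    return len(json.loads(json.dumps(r))["events"])

n = 1000
local = timeit.timeit(lambda: in_process(record), number=n)
serialized = timeit.timeit(lambda: over_the_wire(record), number=n)
print(f"in-process: {local:.4f}s  serialize+deserialize: {serialized:.4f}s")
```

On any machine the serialize+deserialize variant is orders of magnitude slower per call, which is exactly the trade-off raised above: whether that per-hop tax is acceptable is application-specific.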

[+] sbilstein|10 years ago|reply
I think this is the first time I've really understood the differences between SOA and Microservices (and realized that my workplace's architecture is a flavor of Microservice).

"On the surface, the Microservice architecture pattern is similar to SOA. With both approaches, the architecture consists of a set of services. However, one way to think about the Microservice architecture pattern is that it’s SOA without the commercialization and perceived baggage of web service specifications (WS-) and an Enterprise Service Bus (ESB). Microservice-based applications favor simpler, lightweight protocols such as REST, rather than WS-. They also very much avoid using ESBs and instead implement ESB-like functionality in the microservices themselves. The Microservice architecture pattern also rejects other parts of SOA, such as the concept of a canonical schema."

So SOA implies the existence of some heavy enterprise tools like WSDL and SOAP or other RPC type systems. Microservices favor RESTful interfaces.

[+] dragonwriter|10 years ago|reply
SOA predates the enterprise tooling, which followed on with the popularity of SOA as an architectural style.

If microservices catch on, expect that five years from now we'll be talking about nano-services, and about how microservices imply a whole stack of enterprise services that will have grown up around the microservices architecture.

[+] markbnj|10 years ago|reply
I guess I am in what the author termed the "naysayers" camp, in that I cannot think of the "monolithic" variation they depicted as a "service-oriented architecture" of any kind. A single chunk of stuff sporting a lot of different APIs and interfaces can't be thought of as implementing services, imo, unless "services" is synonymous with "API", and for my part that is too general a definition to be of much use. If you agree with my view, then services already require a separately designed, implemented, and managed piece of code that implements a single cohesive set of functions, and all that's left for microservices to add is minimality, which, imo, should already be in the mix.
[+] matthewmacleod|10 years ago|reply
This is a good read, but I'm wondering:

> The Microservice architecture pattern significantly impacts the relationship between the application and the database. Rather than sharing a single database schema with other services, each service has its own database schema.

Is this a necessary prerequisite? One of the problems I'm dealing with now (and have dealt with in the past) is the tyranny of multiple data stores. At any reasonable scale, this quickly leads to a lack of consistency, no matter how hard you try.

It feels like most of the gain in a microservices architecture is from functional decomposition of code, with limited benefit from discarding the 'Canonical Schema' of SOA. I'd be interested to hear others' experiences with this, though.

[+] danudey|10 years ago|reply
The huge benefit that we see in our architecture (which I would call service-oriented, and not 'microservices' necessarily) is data separation.

Each of our services is a separate Django app, and the database name is <consistent prefix>_<app name>. Originally, this meant we had 5-6 database schemas named <something>_friend, <something>_invite, <something>_news, etc., all on one database.

What ended up happening was some services rapidly outgrew the capacity of a single database server, such as our 'news' service, which handles chat services, private messages, and so on (and thus grows nonlinearly with community growth), unlike other services which grow linearly (like our 'identity' service). As a result, the 'news' database had to move to its own server. Thanks to this database schema separation, however, this was a trivial task. Dump the schema, restore the schema, change the DB host in the django config, and you're done.

If we had our data intermingled in the same schema, it would have been far, far harder to do this.

Fundamentally, your 'microservices' style architecture should be designed in such a way that you could take any of your services, tar up the code, and e-mail it to someone else, and they could use it in their architecture. For obvious reasons this isn't actually feasible (e.g. service interdependencies), but conceptually you should be able to draw firm, hard lines down your stack showing where each service starts and ends; this includes frontend services (nginx/haproxy/varnish/whatever configs), code (including interface definitions/client libraries), data persistence (database schemas, MongoDB collections, etc), and caching (Redis/Memcached/etc. instances).

The more interdependencies you have, the more problems you'll encounter down the road. If you intermingle MySQL data then any maintenance is downtime, any slowdown slows everything, any tuning is across your entire dataset, etc.

[+] jedberg|10 years ago|reply
It's a requirement in the sense that sharing a datastore would break the abstraction. Each service should be independent from the others, which necessitates a separate data store.

Consistency should be maintained at the application level if you want to build a robust service, because doing it in the database leads to a single point of failure (the database).

[+] dragonwriter|10 years ago|reply
> Is this a necessary prerequisite?

For something to really be using a microservice architecture? Yes.

Of course, real world systems don't have to use pure architectural styles, though its worth understanding why a named architectural style combines certain features before deciding to use some but not others.

> One of the problems I'm dealing with now (and have been in the past) is the tyranny of multiple data stores. At any reasonable scale, this quickly leads to a lack of consistency, no matter how much you'd like to try.

Honestly, I think if you have real inconsistency (rather than differences in data of similar form but different semantic meaning) with microservices with separate data stores, it means that you have designed your services improperly, such that they have overlapping responsibility.

[+] hardwaresofton|10 years ago|reply
I don't think it's a hard prerequisite (as there really aren't... too many of those, microservices can be built how you want them to be), but I think it's a good rule to follow.

If consistency is a necessary concern, and you have tightly coupled data, it's not a terrible idea to make the services a little bigger.

But also, if you have a service that depends on multiple other services to do work, I don't think it's so bad to get used to using the API for the other services (rather than trying to access their databases directly) -- despite the introduced latency overhead

[+] outworlder|10 years ago|reply
If you have multiple "microservices", all operating on the same data store, it is difficult to guarantee the separation of concerns.

Conceptually, though, I don't think it is a requirement.

[+] aikah|10 years ago|reply
> The term microservice places excessive emphasis on service size. In fact, there are some developers who advocate for building extremely fine-grained 10-100 LOC services.

It's just a matter of time until someone writes a "nano-service" manifesto...

[+] stevelosh|10 years ago|reply
Let's go all out:

    Prefix      Symbol  Size                               Example
    yocto       y       1 bit                              Theoretical minimum
    zepto       z       1 byte (close enough to 10 bits)   Really small APL program
    atto        a       10 chars                           nc -l 8080
    femto       f       1 line (roughly 100 chars)         netcat piped into something else
    pico        p       10 lines                           tiny python service
    nano        n       100 lines                          small python service
    micro       μ       1000 lines                         typical "smallish" service
    milli       m       10,000 lines                       about as big as "microservices" would go these days, or a small monolithic app
    centi       c       100,000 lines                      decent-sized monolithic app
    deci        d       1 million lines                    large monolithic app
    none        n/a     10 million lines                   roughly OS-level app
    deca        da      100 million lines                  god help you beyond here
    hecto       h       1 billion lines                    
    kilo        k       10 billion lines                   
    mega        M       100 billion lines                  
    giga        G       1 trillion lines                   
    tera        T       10 trillion lines                  
    peta        P       100 trillion lines                 
    exa         E       1 quadrillion lines                
    zetta       Z       10 quadrillion lines               
    yotta       Y       100 quadrillion lines
[+] Jtsummers|10 years ago|reply
Perhaps the nanoservice manifesto would just be to use vanilla erlang. One process per module, each module less than 100 LOC. Would that be small enough?
[+] jedberg|10 years ago|reply
I think Amazon just did, they call it Lambda.
[+] ed_blackburn|10 years ago|reply
I think some teams are going to discover that RPC is a better fit for some APIs. Will we see Thrift get more popular? A resurgence in WCF(!) or something new and super light? For asynchronous are we going to see more pub / sub? Is this a good fit for ZeroMQ? I think there's a lot more mileage in these discussions...
[+] jessaustin|10 years ago|reply
ISTM RPC is eventually a too-leaky abstraction. That is, if used in synchronous fashion, in the long run it will cause pain. If you use it asynchronously (which seems to stretch the definition of RPC), why not just use something that is naturally asynchronous?
[+] sul4bh|10 years ago|reply
The article mentions using a different database for each service, which means you cannot join tables across different databases, and you lose the 'relational' aspect of an RDBMS. How is this problem solved by people using microservices? For example, how do you relate trip data and driver data for reporting purposes?
[+] Cieplak|10 years ago|reply
One pattern I've seen is having an OLTP database per service and an ETL process to stream each service's data to a central warehouse or OLAP database that would satisfy reporting requirements.
[+] dragonwriter|10 years ago|reply
> How is this problem solved by people using microservices? For example, how do you relate trip data and driver data for reporting purposes?

There are a couple of different ways that are obvious:

1) The act of scheduling a trip requires the trip service to get information from the driver service related to the driver for the trip (the action might be triggered from either service). While information about the driver in the driver service might change, the information about the driver that was recorded with that trip is fixed. All the information necessary to answer queries about involved drivers that are within the scope of the trip service is stored in the datastore for that service. The same thing is generally true of all services.

2) For generalized reporting, the information required to support that function is sent by the various services to a separate reporting service, which aggregates historical data for reporting purposes. (Even non-microservice architectures often do this, having transactional operational databases export data into an analytical database, with a different schema and capabilities, for reporting, rather than using one datastore for both operational and reporting use.)
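Both approaches can be sketched in a few lines, using hypothetical trip/driver service names and shapes: (1) the trip service snapshots the driver fields it needs at booking time, and (2) each service emits events into a separate reporting store.

```python
# (1) Snapshot: the trip service copies the driver fields it needs at
# booking time, so trip-scoped queries never need a cross-database join.
drivers = {"d1": {"name": "Alice", "rating": 4.9}}  # driver service's store

def book_trip(trips, trip_id, driver_id):
    snapshot = dict(drivers[driver_id])  # fetched via the driver service's API
    trips[trip_id] = {"driver": snapshot, "status": "booked"}

# (2) Reporting: services push events into a separate analytical store,
# where cross-service questions are answered without touching either
# service's operational database.
reporting_store = []

def emit(service, event):
    reporting_store.append({"service": service, **event})

trips = {}
book_trip(trips, "t1", "d1")
emit("trips", {"type": "trip_booked", "trip_id": "t1"})
emit("drivers", {"type": "driver_online", "driver_id": "d1"})
```

Note the property described above: if the driver's rating later changes in the driver service, the snapshot recorded with the trip stays fixed, which is usually exactly what trip-history reporting wants.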

[+] sul4bh|10 years ago|reply
Thanks for your answers. I see that data warehousing is an important aspect of microservice architecture. I wish it was highlighted more often on microservice discussions.
[+] tostitos1979|10 years ago|reply
I've been hearing the Microservices buzz for a while. I recently tried to set up a new project as an ensemble of microservices but got stuck. There is a fair bit of "common" tooling like load balancers and events/log system. I'm about to throw in the towel since it seems too complicated to get going for a brand new project.
[+] mountaineer|10 years ago|reply
From my impression of the recent buzz about Microservices, and having spent the last year building them, Microservices shine as a refactoring approach for large/monolithic applications.
[+] jedberg|10 years ago|reply
I'm glad to see this because I cofounded a company to solve exactly this problem. Link is in my profile if you're interested, we just launched.
[+] panamafrank|10 years ago|reply
My main advice to anyone considering writing a microservice based architecture from scratch is to keep a really tight handle on code duplication and testing. Also take a deep look into Erlang, it's written with these types of systems in mind.
[+] HeroOfAges|10 years ago|reply
I’m having a really hard time understanding the pushback against a microservice architecture. Done properly, I can’t see the difference between building an app using a microservices architecture and a mashup. Am I being naive?
[+] bitcrusher|10 years ago|reply
I think the big problem with the microservice movement is that it is easy to understand but difficult to implement. It reminds me of when OO was new on the block and people designed these really elaborate OO hierarchies only to have them break down over time. Then came "OO SUCKS" because there was a level of experience required to understand what level to take your modeling to.

The same thing is happening with microservices. Engineers are microservicing ALL THE THINGS at such a fine-grained level that it becomes a nightmare to maintain, orchestrate, and manage. Therefore, "microservices suck!"

Most of the time, you can model your domain into a few key areas, say 'customers/security', 'interface' and 'processing'. That's a good 3 service start. You may never need to go beyond that. However, as your needs grow or change, you can start to refine your model based on changing business needs or scale/performance/infrastructure issues.

In my experience it's a completely logical way to design a system and is really no different than making 'libraries' of code all housed under a single master application. The only real difference is the underlying communication infrastructure.

[+] cscharenberg|10 years ago|reply
I think of it as a "where do you want complexity?" trade-off in this whole debate. Monolithic codebases have complex code but easy deployment, monitoring, and coordination (in the sense that you deploy and manage one big thing). Microservices have simple code individually but complex deployment, monitoring, and interface coordination. As my team (40 people) moved toward microservices, we spent extra effort ensuring passive changes. Then we have to update and release the multiple services that need updates to gain some new feature. What was once some complex code updates is now complex packaging and intricate versioning. Less code changing...but in more places.

For several reasons, I think our move toward microservices is a good one. But in our case I have seen complexity move from code to coordination.

[+] mountaineer|10 years ago|reply
I think the pushback can be summarized by the fact that there are more moving parts with a microservice architecture. Harder to test, more things that can break, more coordination for deployments.

Are you referring to mashups in that the application is using external APIs? I'd say that's similar with the difference being you don't have to coordinate the deployment part. With mashups you still have the testing challenge and reliance on another system to be running that comes with a microservice architecture.