item 29315414

Living with single-tenant and multi-tenant architectures

109 points | mkasprowicz | 4 years ago | medium.com

40 comments

[+] beachy|4 years ago|reply
There's a word missing from this article - SaaS.

If you aspire to deliver the same software solution to multiple customers then most likely you are in the SaaS business.

And as Marc Benioff famously observed a long time ago, "Multi-tenancy is a requirement for a SaaS vendor to be successful."

I think about this often because this was a crucial thing we got wrong at my last company.

We got a few large enterprise customers early on, which was great. But each had some unique requirements at the time. With hindsight, they weren't really that unique at all, but there was only a single customer asking for each one.

We took the decision to use separate databases (schemas, in the Oracle world) for each customer. That way we could more easily roll out features to individual customers. We were careful to keep only a single codebase; I'd seen forked codebases go wrong before. But still, any customer could be on their own version of that codebase at any time, with their own schema matching the code at that version.

I now think of this approach as maybe getting into heroin (no direct experience). Feels great, powerful, you can do anything. But ultimately it will kill you, and the longer you do it, the harder it is to get back onto the path of righteousness - a decent multi-tenant architecture.

[+] throwaway984393|4 years ago|reply
I've worked on projects where we did it the opposite way, with multi-tenancy as the default. It didn't work. For a couple of customers, we had to carve out their own dedicated resources. A few of them were absolutely murdering the performance of the rest of the cluster(s), and some had business requirements to be completely isolated. Customers with similar requirements and workloads we kept in a multi-tenant pool.

Even though some of the benefit of multi-tenancy is (supposedly) simpler management (one set of resources), multi-tenancy can actually become more difficult as the customer pool gets bigger or workloads get more uneven. Maintenance on the whole pool becomes more and more problematic, and you try to patch around it or delay it by scaling vertically. You basically run into every possible problem, and hit limits you hadn't anticipated, sooner than with single-tenancy. And worst of all, it impacts more and more customers.

[+] mooreds|4 years ago|reply
I think there's another word missing: "data sovereignty". Depending on your business, your customers might need to keep user PII within their country. Having a single tenant solution makes this possible (just stand up a server in a data center in their country and have it communicate only within their country).

It is really just another set of tradeoffs, and I think the author does a good job of detailing them from the perspective of a company with internal customers. With external customers, the calculus can change.

> But ultimately it will kill you, and the longer you do it, the harder it is to get back onto the path of righteousness - a decent multi-tenant architecture.

Hahah!

The nice thing about a multi-tenant architecture is that it enforces consistency and therefore gives you scale. This can be achieved with single tenant as long as you are ruthless about it (that's what we have done at my current job).

We run separate servers for each client (they can also self host). We built a system to let clients control their version, so they can stay at an earlier release. But everyone stays on the mainline codebase and, critically, database schema.

But sticking with a multi-tenant architecture for a publicly facing application will make it easier to enforce that consistency. That may lose you some sales, but will lead to better scalability.

[+] nisegami|4 years ago|reply
I (briefly) worked at a company that fell into this trap. They ended up building an entire department just to manage the versioning of client instances.
[+] kgeist|4 years ago|reply
For our monolith, we have a separate database per tenant. Databases are hosted on several servers (around 10k-20k databases per server) -- to spread the load. Additionally, each region (EU, US, etc.) has its own infrastructure. Having separate databases is very useful in that we have complete isolation of business data, it allows some performance improvements (full table scans are less harmful), and it's easier to investigate client problems for our support team, because you only see what you need to see. However, obviously, CPU and disk are shared and one especially active tenant can degrade the service for others (we're working on ways to throttle them).
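A minimal sketch of what the routing layer for a database-per-tenant setup like this might look like. All names here (the directory, hosts, DSN format) are hypothetical illustrations, not the commenter's actual implementation:

```python
# Hypothetical tenant -> database routing for a database-per-tenant
# monolith; in practice the directory would live in a small shared
# metadata store, and each region would have its own copy.
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantLocation:
    region: str    # e.g. "eu", "us" -- each region has its own infrastructure
    host: str      # database server holding this tenant's database
    database: str  # dedicated database name, one per tenant

TENANT_DIRECTORY = {
    "acme": TenantLocation("eu", "db-eu-07", "tenant_acme"),
    "globex": TenantLocation("us", "db-us-03", "tenant_globex"),
}

def dsn_for_tenant(tenant: str) -> str:
    """Resolve a tenant to a connection string for its dedicated database."""
    loc = TENANT_DIRECTORY[tenant]
    return f"postgresql://{loc.host}/{loc.database}"
```

Rollouts and support tooling can then iterate over the same directory to reach every tenant's database.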

Our roll-outs first apply migrations, then deploy the code. Migrations are applied by iterating over all databases, and it can sometimes take up to several hours before the code is finally deployed (so that it can use the new schema in every DB). That creates a very large window in which old code can see new DB schemas, so we have to be careful to make our migrations forward- and backward-compatible.
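The kind of backward-compatible "expand" migration this implies can be illustrated with a toy example (sqlite3 here purely for demonstration; the schema is made up): the new column is added with a default, so code deployed before the migration keeps working against the new schema during the hours-long window.

```python
# Illustrative sketch, not the commenter's code: an additive migration
# that old code can tolerate while it waits for the new deploy.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Old code, written before the migration, names its columns explicitly
# and knows nothing about the new column.
def old_code_insert(name):
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

# The migration runs first, possibly hours before the new code ships.
conn.execute(
    "ALTER TABLE users ADD COLUMN status TEXT NOT NULL DEFAULT 'active'")

# The old code still works against the migrated schema.
old_code_insert("alice")
row = conn.execute("SELECT name, status FROM users").fetchone()
print(row)  # ('alice', 'active')
```

The matching "contract" step (dropping or renaming old columns) would only run in a later release, once no deployed code depends on the old shape.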

Microservices take a different approach: usually there's a single database for all tenants (a database per microservice, of course); what's sharded instead is tables. There are tables like "user_0", "user_1", et cetera; they are created automatically when needed. This allows some degree of isolation (although several tenants can occupy the same sharded tables), but the main benefit is that scanning such tables is faster. The migration mechanism can enumerate all such sharded tables and apply migrations to them one by one. For data isolation, there's a requirement that each table must have an "accountID" column, which must always be checked on every access to the repository at the infrastructure level (otherwise it shouldn't pass code review). The account ID itself comes from the JWT in the request headers, so a malicious tenant can't access other tenants by just changing the account ID in the request. Business logic doesn't pass account IDs around in function signatures; it happens transparently at the infrastructure level (it's passed to the repository constructor when building the service graph in the dependency container).
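A hypothetical sketch of that repository pattern (class and table names are assumptions, sqlite3 stands in for the real database): the account ID is fixed at construction time, taken from the verified JWT when the service graph is wired up, so business logic can never forget the tenant filter.

```python
# Sketch of a tenant-scoped repository: every query is forced through
# the accountID filter; callers never pass the account ID per-call.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_0 (accountID INTEGER, name TEXT)")
conn.executemany("INSERT INTO user_0 VALUES (?, ?)",
                 [(1, "alice"), (1, "bob"), (2, "mallory")])

class UserRepository:
    def __init__(self, conn, account_id, shard=0):
        self._conn = conn
        self._account_id = account_id  # injected once, from the verified JWT
        self._table = f"user_{shard}"  # sharded table, e.g. user_0

    def list_names(self):
        rows = self._conn.execute(
            f"SELECT name FROM {self._table} WHERE accountID = ?",
            (self._account_id,))
        return [r[0] for r in rows]

repo = UserRepository(conn, account_id=1)
print(repo.list_names())  # ['alice', 'bob'] -- tenant 2's rows are invisible
```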

[+] some_developer|4 years ago|reply
> Our roll-outs first apply migrations, then deploy the code. Migrations are applied by iterating over all databases, and it can sometimes take up to several hours before the code is finally deployed (so that it can use the new schema in every DB). That creates a very large window in which old code can see new DB schemas, so we have to be careful to make our migrations forward- and backward-compatible.

I read this answer last week but had to let this sink in.

I can't fathom how this would feel being "the norm". As a Postgres user, I'm accustomed to most DDL statements completing inline in a few seconds, ignoring things like creating indices concurrently, which may take up to 1-2 hours depending on the table.

OTOH I'm sitting on a more or less monolithic database, if there's even a term for that. Yes, it's multi-tenant for tables where necessary (though we don't include that particular customer ID key on all tables, as some FK relations naturally provide this separation), and there are times when I'm not sure a table with 700+ million rows and rapidly growing is a good idea.

From where I sit, having multiple databases per server and/or application instance sounds like an impossible thing to manage. Let alone that the "application" is a collection of multiple services (micro to medium), and "one just does not run multiple instances" of this cohort. Then there's the additional challenge that we're receiving "lots-o-webhooks" from various services, a few hundred to a few thousand per second for any customer, and we would need a central service to know which database to dispatch each one to, etc.

If I may ask kindly, is it possible for you to share a bit of insights on how you got on this road to where your company is now? Did you start out _that_ way from the start?

Thanks

[+] mooreds|4 years ago|reply
"This is not a big deal in Comments because our API is not public and user accounts are shared across all newsrooms."

This was kinda scary to read. Just because an API isn't documented doesn't mean it isn't public! Broken Access Control is the top OWASP issue in 2021: https://owasp.org/Top10/

Visiting the sample 'Comments' link and looking at the network console in Firefox revealed this call: https://cmmnts-api.i.bt.no/v1/publications/e24/article:e24:a... DESC&replies=ASC

Doesn't look like they protect it (probably because it is called from JavaScript). That means malicious users can nose around and create mischief.

[+] FarhadG|4 years ago|reply
For multi-tenant architecture, the following topics are always top priority for me:

- Shared compute with tenant context passed via JWT

- Data isolation by either physical separation (i.e. separate database) or logical separation (i.e. separate schema, table, or column association) depending on requirements

- Enforcing tenant context at the API gateway

- Always leveraging policies and ACLs via JWT to enforce secure data retrieval

- Sometimes using RLS within the database

- Either universal data encryption or per tenant depending on requirements
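The first and third bullets can be sketched together: a minimal, stdlib-only illustration of carrying tenant context in a JWT and verifying it before trusting the claim. This is an assumption-laden toy (real systems would use a vetted JWT library and typically asymmetric keys; the claim names are made up):

```python
# Toy HS256 JWT: sign tenant context into the token and verify the
# signature before extracting it, so a client can't just edit the claim.
import base64, hashlib, hmac, json

SECRET = b"demo-secret"  # hypothetical shared signing key

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(claims: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(SECRET, header + b"." + payload,
                          hashlib.sha256).digest())
    return (header + b"." + payload + b"." + sig).decode()

def tenant_from_token(token: str) -> str:
    header, payload, sig = token.split(".")
    expected = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(),
                               hashlib.sha256).digest()).decode()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")  # tampered tenant claim
    claims = json.loads(
        base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims["tenant_id"]

token = sign({"sub": "user-42", "tenant_id": "acme"})
print(tenant_from_token(token))  # acme
```

An API gateway doing the check up front means downstream services can treat the tenant ID as trusted context rather than user input.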

[+] Spivak|4 years ago|reply
- JWT is fine and webscale but plain sessions are also fine. Associating logins with tenants is the important bit.

- Shared compute is actually the part that to me means diddly squat and customers seem to prefer dedicated. It costs nothing to spin up more stateless-ish app servers dedicated to a tenant. It’s the db, logs, caching, load balancers, queues, monitoring I don’t want to split up. Also nothing is still wrong with normal sessions stores in Redis.

- Separate schemas aren't preferred but are kinda fine; at the very least, don't create separate db accounts per tenant. The credential/connection management will make your life a living nightmare, and it doesn't work with SQL proxies.

- We must seriously have vastly different JWT experiences. Every super businessy app I've made quickly hits the ceiling on how much junk you can store in the JWT before having to punt to the db for user permissions.

- RLS is dope and you should choose it every time when you can. Not having to do #customers schema migrations is worth it.

[+] esyir|4 years ago|reply
>- Data isolation by either physical separation (i.e. separate database) or logical separation (i.e. separate schema, table, or column association) depending on requirements

It's interesting to me that you and @abraae seem to take the exact opposite view on the topic of data isolation where he/she has a much... harsher opinion:

>We took the decision to use separate databases (schemas in the Oracle world) ... I now think of this approach as maybe getting into heroin (no direct experience). Feels great, powerful, you can do anything. But ultimately it will kill you.

Of course this doesn't apply to the case of column association, but I'm interested in your take on this.

[+] abraxas|4 years ago|reply
> - Sometimes using RLS within the database

When is it not a good idea to leverage the database's RLS for access control?

[+] say_it_as_it_is|4 years ago|reply
Single-tenant databases feel like the result of a decision where a team either doesn't know how to architect data models well or doesn't want to put the effort into doing so. Referential integrity was solved 50 years ago. To demand that one's data not be commingled with someone else's in a database is as arbitrary as demanding that it not be transmitted over the same pool of network connections used by the server. Our data must not reside on the same physical disk or in the same memory as that used by other customers!
[+] fennecfoxen|4 years ago|reply
To turn this practical: A good security model makes the right thing happen by default, and makes doing the wrong thing hard.

A secure data model should make the tenant identifier necessary to successfully complete a query. Haven’t composite keys (including composite primary keys) been around since SQL86?

A good application layer, likewise, enforces that a tenant identifier is set on every endpoint, with no additional code, just from creating the endpoint.
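A tiny demonstration of the composite-key idea above (schema invented for illustration, sqlite3 standing in for any SQL database): with the tenant identifier in the primary key, the same comment ID can exist under different tenants but never twice under one, and lookups naturally require the tenant ID.

```python
# Composite primary key (tenant_id, comment_id): the tenant identifier
# is structurally required, so the wrong thing is hard to do by default.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        tenant_id  INTEGER NOT NULL,
        comment_id INTEGER NOT NULL,
        body       TEXT,
        PRIMARY KEY (tenant_id, comment_id)  -- composite primary key
    )""")

conn.execute("INSERT INTO comments VALUES (1, 100, 'hello from tenant 1')")
conn.execute("INSERT INTO comments VALUES (2, 100, 'same id, other tenant')")

try:
    # Duplicate within a single tenant is rejected by the key itself.
    conn.execute("INSERT INTO comments VALUES (1, 100, 'duplicate')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```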

[+] e1g|4 years ago|reply
The issue is backups.

We're a B2B/Enterprise SaaS and most tenants require that we erase all their data at the end of the contract. Some require customer-managed encryption keys. The only way to meet this requirement is to have every tenant isolated in their own database (and their own S3 bucket etc). If data is mixed, when one tenant leaves you must go through all copies of all backups, purge their rows, then re-save the cleaned up backups. Nearly impossible in practice.

[+] chromanoid|4 years ago|reply
I think it depends. The concept of a tenant is a spectrum that ranges from simple end user to big company with thousands of users. Depending on what your application provides, different solutions are useful.
[+] DeathArrow|4 years ago|reply
Discriminating tenants by ID in the database? This can go wrong for countless reasons. I would use a separate DB for each tenant.
[+] cerved|4 years ago|reply
I would tend to agree but maybe it's not a fit for them.

I'd say in this case the obvious problem is that the comment is associated with a tenant. The comment should be associated with a user, which should (probably) be associated with exactly one tenant.

[+] arpinum|4 years ago|reply
This is a great reminder of all the problems I don’t think about because I use Lambda, API Gateway and DynamoDB. I still think a lot about IAM policies and queuing systems, but scaling compute and databases are no longer a concern.
[+] moltar|4 years ago|reply
Do you not think about referential integrity?