top | item 22382505

(no title)

nizmow | 6 years ago

One thing this article doesn't mention is that in many cases databases scale poorly, and only vertically. You can throw a bunch of domain logic in complex SQL queries and see significant performance benefits, but at some point you may find that your application has grown and your SQL server is running into IOPS and CPU limits that are difficult to fix. At that point your answer could be, pull logic out of those complex queries and create simpler queries that ask less of the DB and run on the application side so you can scale horizontally.

discuss

order

crazygringo|6 years ago

I don't think I've ever seen performance improvements from taking logic in a complex SQL query and re-implementing it at the application layer.

The bottleneck with databases is virtually never calculations on results (CPU), it's disk access and network latency and bandwidth.

And generally, if you do have crazy complex CPU-bound calculations you need to do on data (e.g. scientific stuff)... SQL doesn't provide the necessary functions/control anyways, so the database isn't even an option for that.

The only issue I can imagine you might have run into at some point is complex queries badly written (e.g. recursive subqueries without indices) that you could speed up with application logic -- but then the solution is to optimize the query.

mumblemumble|6 years ago

And I've often seen things go dramatically in the opposite direction.

I/O is generally just about the slowest thing a computer can do. So, in general, the more you can reduce the data before sending it across the network, the better. And, heck, a well-crafted SQL query can often save you having to even read large chunks of data off of the disk in the first place, let alone pipe it across the network.

sgift|6 years ago

> (...) but at some point you may find that your application has grown and your SQL server is running into IOPS and CPU limits that are difficult to fix.

It's not unique to SQL/databases that at some point your user base may outgrow your current architecture and you may have to invest in something else.

It's dangerous to think you can skip this and just employ whatever architecture is "good" for bigger user bases from the start. There are different trade-offs in different systems and the trade-offs of most alternatives are not as well understood as the trade-offs of databases. I have seen more than one project implode when something which was simple with a database was far more complicated with the alternative.

ak39|6 years ago

Most (if not all) modern RDBMSs allow for reliable horizontal and vertical scaling. SQL Server (the db you mention) is no slouch when paired with additional CPUs (for vertical linear scaling) and allows for transactionally safe replication to distributed clusters of SQL Servers (for horizontal scaling) in geographically distributed servers. PostgreSQL has this and it's absolutely free.

Where is this meme that RDBMSs do not scale coming from? (I see lots of people saying this as though it is a given. Has there been any published data on this for me to read?).

keanzu|6 years ago

> Where is this meme that RDBMSs do not scale coming from?

They don't scale to Google or Facebook operational sizes. Once you get to a billion customers or so the ol' RDBMS tends to struggle. Because everyone wants to be Google they imagine they have Google's problems. I've been in a meeting where the client was talking about their severe scaling issues for their "big data" which could only possibly be resolved by state of the art cloud solutions. I pressed them on the numbers - they had 400GB. You can buy an iPhone with 512GB.

Oracle's Exadata X8-2 is ~1PB per full rack.

throwaway95729|6 years ago

I think another reason for this, is that companies had a tendency to only have one single running database, to store everything, and often even in the same schema. Which ended up not scaling well in the organisation when so many unrelated topics/project/features went through one and the same database. I don't know if it has anything to do with licensing and paying per instance, maybe, but I feel that a lot of those problems of a database that doesn't scale, could have been solved by simply using a few databases for isolated domains instead of cramming everything into one.

fr1tkot|6 years ago

> Where is this meme that RDBMSs do not scale coming from? (I see lots of people saying this as though it is a given. Has there been any published data on this for me to read?).

It mostly comes from people who don't know how and when to create an index.

felixyz|6 years ago

> Where is this meme that RDBMSs do not scale coming from?

It's a holdover from the heyday of noSQL. Was more true then, but RDBMSs have caught up. (At least some of them.) There's even Dqlite for SQLight!

mhd|6 years ago

> Where is this meme that RDBMSs do not scale coming from?

In my days it was more like "You can't afford RDBMSs at that scale". Reach a certain point and Larry gets a new yacht. Cheaper/open source offerings have moved that goal post by quite a bit.

Although it sometimes scares me how them poor databases get treated when performance is dropping. Little Jimmy JOIN is the first one to be put down, often way before there's a need for it.

BrentOzar|6 years ago

> SQL Server (the db you mention) is no slouch when paired with additional CPUs (for vertical linear scaling) and allows for transactionally safe replication to distributed clusters of SQL Servers

Additional CPUs are $7,000 USD per core, and replication is labor intensive. Transactional replication has a nasty habit of breaking as the source tables are changed, and Availability Groups have a ton of bugs (as evidenced by any recent Cumulative Update.)

Saying that SQL Server scales is like saying your wallet scales to hold any amount of money. Sure, it might, but it’s up to you to put the coin in, and it’s gonna for a lot of coin - compared to scaling out app servers, who have generally near no license cost, and code is synchronized at deploy time.

As to recommending you a place to read, I hate to say this, but you could start with my blog. Pretty much every week, I’ve got real life examples turned into abstract tutorials on there from companies who hit scaling walls and had to hire me for help. (Past client examples: Stack Overflow, Google.)

nickserv|6 years ago

It's certainly possible, but more complex than just putting up a bunch of stateless application servers all accessing the same database. With no local cache. Then saying the DB is slow ;-)

It's also the kind of thing many developers don't like doing: thinking about operational concerns.

As an aside, horizontal scaling of sorts can be achieved by using microservices, it's actually one of the few really valid reasons for this type of architecture. If the microservice databases are not independent, you're doing it wrong.

bdcravens|6 years ago

> Where is this meme that RDBMSs do not scale coming from?

MongoDB marketing?

marcosdumay|6 years ago

The situation goes both ways. Sometimes people put complex logic into the database and it hurts scalability. Other times people pull logic out of the database, but losing all the correlation seeking and filtering means that it now has to do a lot more IO than it did before, what hurts scalability too.

silon42|6 years ago

+1 Also, in my experience, SQL query optimizers can also be unpredictable on complex queries (behavior changes with data size and statistics) which can cause them to suddenly change to slow execution plan when data grows or something is added , despite having all indexes for a good plan possible (which for interactive apps is to avoid table scans always -- they might be faster in some bad corner cases, but if data doesn't fit memory, they can't be cached and they slow down everything else).

tkyjonathan|6 years ago

That could be a fault of complex physical data models that need simplification.

nizmow|6 years ago

Follow up: yes, you're all correct, but sometimes things in an organisation aren't perfect and I've been in many situations where the database is the bottleneck with little power to change it. Now that I think about it, it's mostly a symptom of the database frequently under a different 'organisational area' compared to the application. This means we might have the ability to, for example, easily add more application servers, but the database falls under a different department and any changes require a lot more coordination with other parts of the business. I realise this could happen for any resource, but in my experience it's very common to have it occur with a database.

golergka|6 years ago

What's the limit of vertical SQL scaling and what kind of applications (I'm talking not only about the functionality, but also realistic level of popularity) can end up hitting it, in your experience?

bdcravens|6 years ago

Another alternative is that based on your workload, using an RDBMS that is column-based.

MiroF|6 years ago

what an odd comment... I find it difficult to believe that you would run into CPU limits before data movement costs slowed you down.

Also RDBMS are not that difficult to scale, especially easier than an ad-hoc ORM/whatever on your application side.