item 23998503

Migrating a 40TB SQL Server Database

126 points | ksec | 5 years ago | tarynpivots.com

120 comments

[+] polygotdomain|5 years ago|reply
I think some of the posts here miss a little bit of the context as to why things like this happen in the first place. It's only in the last handful of years that a stack for logging has really become mainstream. Chances are a lot of these types of logging solutions predate that and used whatever persistence technology was readily available. Writing to files on web servers can be a pain, and these logs will have to be queried at some point, so storing them in a database is not a bad idea, especially when better options have only recently become available.

The problem is that relational SQL is bad for logs, but by the time it reaches the scale where that's a problem, there's too much volume in the logs to make any fix "easy". Simultaneously there's a lot of business value in that log data that you don't want to lose.

Yes, SQL's a poor fit for logs, but it's a better fit than a lot of other things, including not logging at all. Better solutions exist, but they don't exist in a bubble, and there's a cost to integrating them and migrating to them. A lot of these comments seem to be judging a technology decision based solely on hindsight without realizing that there are legitimate reasons for logging to SQL.

[+] oneplane|5 years ago|reply
This reads like the most Microsoft or most legacy-on-prem problem ever. Why on earth would you do this, and even then not take the opportunity to do it right while you are migrating anyway? Are you mad?

Well, that was my first reaction anyway. It seems to me that this is more of a 'when all you have is a hammer, everything looks like a nail' to me; if you are a vendor shop (i.e. Microsoft-only) and you are a DBA and specifically just for MSSQL, then this is probably the best way you could come up with, based on the skills, experience and tools at hand. It does make me wonder if at any point someone thought to ask for help instead of keeping this going.

[+] iliasku|5 years ago|reply
I found this article useful, mostly for people who don't have similar experience dealing with large databases. It gives so many reasons why following this path is such a bad idea. Please, people, stop writing custom scripts and then running them untested on the production DB. If the company won't pay for enough disk space to take a proper backup to test, either quit or buy it yourself. It will save you 11 months and many hours of sleep.
[+] malisper|5 years ago|reply
I've done migrations of several databases with ~20TB of data before. Assuming your data is immutable, a much easier approach than what they did is to restore a backup of the database onto a new machine. Since you don't have to worry about serving production traffic, you can have the machine go full throttle on performing the migration. When the migration has finished, you have to copy the new data that came in during the migration into the new database, which isn't too bad.

When I've done this with 20TB DBs, it usually took several days to complete. That's much better than the 11 months mentioned in the blog post.
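The catch-up step at the end can be a simple cutoff filter; a minimal sketch, assuming each row carries a creation timestamp (all names here are hypothetical illustrations, not the commenter's actual tooling):

```python
# Sketch: after the bulk migration finishes, copy only the rows that
# arrived after the cutoff recorded when the backup was taken.
from datetime import datetime

def catch_up(source_rows, cutoff):
    """source_rows: iterable of (timestamp, row) from the old database."""
    return [row for ts, row in source_rows if ts > cutoff]

cutoff = datetime(2019, 1, 1)
rows = [(datetime(2018, 12, 31), "old row"), (datetime(2019, 1, 2), "new row")]
print(catch_up(rows, cutoff))  # only the post-cutoff row survives
```

In practice the cutoff would be the backup's LSN or timestamp, and the filtered rows would be bulk-inserted into the new database.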

[+] d_k_f|5 years ago|reply
I'm late to the party, but this was addressed in the article. The new server didn't have enough space to hold both the backup and the newly restored database, so that approach didn't work.

Quote: "We’re a very lean shop when it comes to hardware, and this is by far our largest chunk of data we have to store, so there wasn’t a place anywhere on the network to store a 33-38TB full backup temporarily, while it was being restored to a server."

As others have mentioned in the thread, a lot of the problems seem to have been solvable by just buying/attaching more storage, even if only temporarily.

[+] justinclift|5 years ago|reply
Similar thought here. Was kind of surprised they didn't go with some kind of checkpoint/snapshot approach for the data sets, + stream them to the new machine, + add any newly arrived log data.
[+] atwebb|5 years ago|reply
Yeah, I only scanned through, but there seem to be better tools for the job in the environment. SSIS is scriptable/has APIs, so you can shred the data into n packages and run it. Also, columnstore_archive gives amazing compression if you're low on the read side, and immutable data means you can load into partitions. Column diffs are a bit annoying but not unmanageable. Then again, armchair architecting isn't ever as accurate as you assume. I do know you can move billions of rows daily using SSIS across thousands of tables.
[+] rrdharan|5 years ago|reply
It seems concerning that they would be hanging onto traffic logs for that long. Seems like a massive PII / data breach liability.
[+] tutfbhuf|5 years ago|reply
Agreed, it's questionable why a company needs to save raw traffic logs for years.
[+] jpalomaki|5 years ago|reply
Stack Overflow HAProxy logs likely don't have too much sensitive information. These are most likely logs about visited pages with timestamps, maybe with client IP addresses.

Storing them raw can make sense. You might later come up with new ways to analyze the data which your aggregation logic of choice does not support.

[+] dsr_|5 years ago|reply
Right here at the beginning:

"Our HAProxy1 logs aka traffic logs, are currently stored on two SQL Servers, one in New York, and one in Colorado. While we store a minimal summary of the traffic data, we have a lot of it. At the beginning of 2019, we had about 4.5 years of data, totaling about 38TB. "

and I asked myself, why the hell would you want this data?

Logs serve two purposes:

- find a fault and fix it
- audit what someone (or many someones) did

You never need years of data to find evidence of a fault: either it's available in the last few weeks or you didn't log the data in the first place (unless, of course, it's a bug that only shows up on leapdays, or leapseconds, or is similarly rare...)

So that leaves tracking your users in a way which isn't covered by your user's own history. Again, a month or two might be helpful, sometimes... but why years?

[+] posix_me_less|5 years ago|reply
> "We realized that the original daily table structure wasn’t ideal. If we needed to query something over several days or months, it was terrible — lots of UNION ALLs or loops to crawl through days or even months at a time was slow."

Seems to me that storing traffic logs in an SQL database is a very bad idea, especially if you have 40TB of it. Can anybody explain why an SQL database makes sense for large time series data? Why not just dump it into linear files?

[+] BrentOzar|5 years ago|reply
> Anybody can explain why SQL database makes sense for large time series data?

Let's say, just to say, that you have effectively free licensing for Microsoft SQL Server (because you already own the licenses in question), and you have staff who know that platform well.

Sometimes it's easier to use the tools you already have rather than go acquire a new platform to achieve a goal.

As with anytime you choose a persistence layer, you're accumulating technical debt the instant you make the choice. Over time, if better persistence layers become available, you have to make a judgment call about when it makes sense to change the code, versus when it makes sense to do maintenance work on the persistence layer.

[+] ThePadawan|5 years ago|reply
Because you've already paid the licensing cost for the SQL database and have salaried DBAs taking care of it. And building an alternative solution (including HA, backups etc.) would cost more than $0.
[+] tutfbhuf|5 years ago|reply
> Why not just dump it into linear files?

I guess because they want to query them, but in this case I would prefer the ELK stack (Elasticsearch + Logstash + Kibana).

[+] abhishekjha|5 years ago|reply
>storing traffic logs in an SQL database is a very bad idea, especially if you have 40TB of it

I get asked this question a lot in interviews.

Why is it a bad idea to store logs in a SQL database? It has the ACID properties baked in and it is pretty reliable. What is the better alternative? I guess I have never worked on anything that required keeping track of 40TB of data, so the drawbacks aren't really obvious to me.

[+] dswalter|5 years ago|reply
According to their pricing page, a 50TB SSD from Nimbus costs $12,500. That would have been worth the expense in DBA time alone, right?

Edit: added the link. https://nimbusdata.com/products/exadrive/pricing/

[+] BrentOzar|5 years ago|reply
> That would have been worth the expense in DBA time alone, right?

You're assuming the DBA did nothing else during the project's timeline, and that's not likely the case. A lot of work in projects like this is just letting the batch job run while you keep an eye on it.

[+] thisisbrians|5 years ago|reply
Yes. Or even stripe some cheaper $/TB drives together in a RAID to cover the needs.
[+] champtar|5 years ago|reply
If they want to migrate to something else, they should take a look at ClickHouse. When switching from Elasticsearch to ClickHouse 1.5 years ago, I reduced my storage needs by a factor of 20, gained SQL, performance, and a ton of analytics features.

In hindsight I would say that Elasticsearch is for full text queries, and if you are using it for something else (access logs) there is a good chance this is the wrong tool for the job.

[+] miahi|5 years ago|reply
I worked on a system that generated around 500GB-1TB of log data monthly. An estimation done way before it was put in production said that it would generate less than 100GB/month, so the DB storage was sized accordingly (~6TB, half of which were used for other data). A change that came later in the project (on a different system) was disregarded, but it increased the application traffic by a factor of 5, and the space estimation was not updated. So it was an interesting surprise when the space started filling up way faster after the go live.

To make the issue even more interesting, the system was mission-critical at five nines and the log data was sensitive + had a legal requirement to be kept online for at least one year, for queries. So deletion was not an option and the migration had to be done online.

Fortunately, the system used Oracle DB and the log table was partitioned by date. Unfortunately, a partition held three months of data, so it could grow to consume all the free space on the DB. There was no easy way to add storage to that server, but we did have some space on another one, with lower speed (the prod DB was RAID10, the other one was RAID1).

In the end we had to create a new partition scheme, with monthly partitions stored in different tablespaces, create another table with the same structure and switch it with the online one (a couple of seconds of downtime), backup the database and restore it on the slow storage, then delete the old table structure and tablespaces/datafiles to free the space on production.

A procedure was created to detach the oldest partition from the production database and reattach it to the slower database (where we also implemented compression at a later stage). This meant that the query system needed changes to work with both databases, but at least the system was up.

The good thing was that in Oracle you can detach partitions and tablespaces and transport them to a different database without downtime. I don't know MS SQL well enough, but I understand from the migration procedure that this is not possible there. I guess they did that manual daily partitioning by table, but they did not store each table in its own datafile so that it could be moved as a whole (maybe with downtime, but at least it would free the space faster).

[+] WorldMaker|5 years ago|reply
Bulk copy is about the fastest you could do in C# or PowerShell, but you would still be moving a ton of data back and forth over the wire between the DB server and the "jump" server running the script.

I would have tried a hybrid approach with the dynamic SQL one. Dynamic SQL sprocs like that don't seem to generate good query plans (at least not without a lot of prodding) in my experience, but if the dynamically generated SQL were done on the C# or PowerShell side rather than in a sproc, I feel the query analyzer would be less confused about individual queries coming out of the scripts.

The only other advice from past experience I have to offer is that I often saw better performance from T-SQL MERGE than T-SQL INSERT FROM. Dynamically generating MERGE statements from C# is not fun, but in that past experience it gave the best results for single-server data migrations like this. (Multi-server, there's not much you can do beyond Bulk Copy.)
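Generating the MERGE text client-side, as described, might look roughly like this (a sketch in Python rather than C#; the table and column names are invented for illustration and are not from the post):

```python
# Sketch: build a T-SQL MERGE per daily source table on the client side,
# so each generated statement is plain SQL the optimizer sees directly.
# Table/column names are hypothetical examples.

def build_merge(day: str, columns: list[str]) -> str:
    src = f"dbo.TrafficLogs_{day}"   # hypothetical per-day source table
    tgt = "dbo.TrafficLogs"          # hypothetical consolidated target
    col_list = ", ".join(columns)
    src_cols = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE {tgt} AS t "
        f"USING {src} AS s ON t.Id = s.Id "
        f"WHEN NOT MATCHED BY TARGET THEN "
        f"INSERT ({col_list}) VALUES ({src_cols});"
    )

sql = build_merge("20190101", ["Id", "CreationDate", "Uri"])
print(sql)
```

Each generated statement would then be executed per day, so the query plan is specific to that one table rather than a parameterized sproc.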

[+] user5994461|5 years ago|reply
This is all the reasons why logs must have a limited retention period (usually 1 to 3 months).

Then nobody has to waste 11 months of their life trying to retain and migrate TB of old logs that nobody needs, with no storage to do so and no backup.

[+] jeffbee|5 years ago|reply
Storing web logs in a SQL database is a pretty bad plan.
[+] tyingq|5 years ago|reply
Combined with "The database was initially designed to have a single table for each day." Oy.
[+] isatty|5 years ago|reply
At this scale and amount of data, definitely. I have however dumped short term data into an unlogged temp Postgres table to query data using Metabase/psql. Worked surprisingly well enough for me to debug something. I'm sure another index on ES would've worked well too, but I'm more familiar with SQL than ES syntax and it's cheaper too.
[+] myrandomcomment|5 years ago|reply
So this is all interesting from a technical point of view but from a business point of view it makes no sense. The amount of time that you had a talented DB dev working on a project that could have been solved by spending ~25K on a new system seems silly. Bring the new system up and move the data over the network doing the conversion as part of the move. 100TB over a 10G network is ~22H theoretical transfer rate.
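The back-of-the-envelope transfer math checks out, assuming full line rate and zero protocol overhead:

```python
# Theoretical time to push 100TB over a 10Gbit/s link,
# ignoring protocol overhead and disk bottlenecks.
data_bytes = 100e12          # 100 TB (decimal terabytes)
link_bps = 10e9              # 10 Gbit/s
hours = data_bytes * 8 / link_bps / 3600
print(f"{hours:.1f} hours")  # about 22 hours
```

Real throughput would be lower once TCP overhead and the spinning-disk read side are factored in, but the order of magnitude stands.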
[+] metadata|5 years ago|reply
I think 11 months is absolutely crazy, and the context for that is that I earn my living selling database migration software [1]. They could have used my software to copy up to 16 tables in parallel for a mere $3k. They could have copied a few hundred tables at a time, compared source and target to make sure everything was fine, then dropped the originals and compacted. As more disk space freed up, they could have moved more and more data in one go. If the data needed much transformation, they could have written views to pull the data in a proper format, creating tables on the target with that data as they go. True, spinning drives would kill performance, but my tool easily reaches 50MB/s on SSDs, and spinning drives wouldn't add months. I mean, when you have off-the-shelf products doing what you need, spending so much time fighting with custom scripts is an odd choice. This is really not meant to advertise; I'm just baffled by how much effort was needed for this project.

[1] https://www.spectralcore.com/fullconvert
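The copy-in-parallel-then-verify loop described above could be sketched generically like this (the copy and count callbacks are placeholders, not the vendor's actual API):

```python
# Sketch: copy tables in parallel, then verify row counts match before
# the source copies are dropped. copy_table/count_rows are placeholder
# callbacks standing in for whatever tool does the real work.
from concurrent.futures import ThreadPoolExecutor

def migrate(tables, copy_table, count_rows, parallelism=16):
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(copy_table, tables))    # copy up to 16 at a time
    # Only tables whose counts match are safe to drop at the source.
    return [t for t in tables
            if count_rows("source", t) == count_rows("target", t)]

# Toy in-memory stand-in for two databases:
dbs = {"source": {"a": 3, "b": 5}, "target": {}}
verified = migrate(
    ["a", "b"],
    copy_table=lambda t: dbs["target"].__setitem__(t, dbs["source"][t]),
    count_rows=lambda side, t: dbs[side].get(t, -1),
)
print(verified)  # both tables copied and verified
```

Dropping only verified tables is what frees disk space incrementally, which matters when there is no room for a second full copy.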

[+] hansitomani|5 years ago|reply
Wow.

I really want to know if they even compress the log data in their database.

40 TB of logs in a postgresql, like that can't be efficient?

I would probably just store it away in a scalable blob store. That has to be much faster, simpler, and cheaper.

[+] greggyb|5 years ago|reply
They mentioned in the article that they are using SQL Server's clustered columnstore index. This is a poor name, as it's more than just an index: CCI in SQL Server is an on-disk storage format, a highly compressed columnstore.
[+] dkdk8283|5 years ago|reply
I run a 30TB MySql server on EC2. We raid 0 multiple EBS volumes for space and iops.

It is completely unmanageable - I inherited it. We’re moving data out, but slowly. This DB houses PII for tens of millions of users.

[+] jto1218|5 years ago|reply
> There were lots of reasons this needed to be done, one being tech debt. We realized that the original daily table structure wasn’t ideal. If we needed to query something over several days or months, it was terrible — lots of UNION ALLs or loops to crawl through days or even months at a time was slow.

Maybe I'm missing it, but there doesn't seem to be any discussion of the result. _How_ much faster was it after the change? Was it 11-months-worth-of-dev-time faster?

[+] BrentOzar|5 years ago|reply
> Maybe i'm missing it, but there doesn't seem to be any discussion of the result.

Sometimes you have to stop writing after 9,000 words. ;-)

[+] thisisbrians|5 years ago|reply
I'm in the process of migrating a ~5.4TB time series database. This is also slow/painful. I wonder if there are any general-purpose/open source tools that could help with these sorts of tasks (I've had to roll my own job queue/ETL worker script infrastructure to manage this). In my case, it looks like the full migration will take around 10 calendar days assuming no issues come up.
[+] user5994461|5 years ago|reply
The trick is to find a common baseline format, like CSV or JSON; then you can leverage the bazillion existing tools.

The second trick is to split the data in manageable chunks (a day of logs for example), then you can transfer one unit at a time to the new system.

Dump a chunk on one side, load on the other side, verify it's good. It will still take a while but you have a good indication of progress and don't need to start all over if one chunk fails.
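A minimal sketch of that chunk-at-a-time loop, with a checksum check per chunk (the day key and the I/O callbacks are placeholders, not any particular tool):

```python
# Sketch: migrate one day of logs at a time, verifying each chunk with
# a checksum so a failure only means redoing that one chunk.
import hashlib
import json

def checksum(rows):
    # Deterministic digest of a chunk's contents.
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode()).hexdigest()

def migrate_chunks(days, dump_day, load_day, read_back):
    done = []
    for day in days:
        rows = dump_day(day)      # e.g. export one day to CSV/JSON
        load_day(day, rows)       # load it into the new system
        if checksum(read_back(day)) != checksum(rows):
            raise RuntimeError(f"chunk {day} failed verification")
        done.append(day)          # clear indication of progress
    return done

# Toy demo with in-memory "systems":
old = {"2019-01-01": [{"uri": "/questions"}], "2019-01-02": [{"uri": "/tags"}]}
new = {}
print(migrate_chunks(sorted(old), old.get, new.__setitem__, new.get))
```

A failed chunk can simply be re-dumped and re-loaded without touching the chunks already verified.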

[+] yknx4|5 years ago|reply
I'm baffled with the fact that they were toying with the production database in the production machine. That sounds incredibly dangerous
[+] PaywallBuster|5 years ago|reply
Actually, if you read the article, this is not critical infrastructure.

All the data is duplicated in both DCs and the source of data still has a copy of the data.

I imagine it's not a problem if it goes down for a few moments. Developers may be unable to review traffic logs; that's it.

[+] BrentOzar|5 years ago|reply
> I'm baffled with the fact that they were toying with the production database in the production machine. That sounds incredibly dangerous

As Taryn stated repeatedly in the post, they simply didn't have enough infrastructure to have development spare copies of this. They didn't even have enough infrastructure to take a backup.