How StackOverflow Scales with SQL Server

132 points| jswinghammer | 14 years ago |brentozar.com | reply

54 comments

[+] wmwong|14 years ago|reply
This wasn't so much about fine tuning SQL Server itself, but breaking traditional thinking. The 5 rules he presents originally are all replaced in the end.

  - Everybody's the DBA
  - Do what it takes to get what you want
  - Tune later, cache & separate now
  - NewEgg your way out of problems
  - Share for great good

For each point, he argues why the old rule no longer applies and what the new solution is.

I felt a lot of the presentation was about tuning SQL Server without tuning SQL Server: caching, leave full-text searching to Apache Lucene (because it's not querying), and using SSDs to speed up performance without having to touch any code.
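
To make the "cache before you tune" idea concrete, here is a minimal cache-aside sketch. It is not taken from the talk: `TTLCache` and `load_question_from_db` are hypothetical names, and the loader is a stand-in for a real SQL Server query. The point is only that repeat reads inside the TTL window never reach the database.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache-aside helper (illustrative, not production code)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, or call loader() and cache its result."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the database exactly once.
        self.misses += 1
        value = loader()
        self._store[key] = (now + self.ttl, value)
        return value


def load_question_from_db(question_id: int) -> dict:
    # Hypothetical stand-in for a real query against SQL Server.
    return {"id": question_id, "title": f"Question {question_id}"}


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    for _ in range(3):
        cache.get_or_load("question:42", lambda: load_question_from_db(42))
    print(cache.hits, cache.misses)
```

A real deployment would more likely use a shared cache tier than a per-process dict, but the read path has the same shape.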

[+] BrentOzar|14 years ago|reply
Yep, absolutely, you nailed it. There's a gazillion presentations out there about tuning databases, but that only takes you so far. I wanted to show that you need to take a step back before you go into query tuning details.
[+] newegg|14 years ago|reply
Interesting that buying hardware is written as "NewEgg your way out of a problem", as Newegg uses IIS and quite likely SQL Server.
[+] krmmalik|14 years ago|reply
I had no idea StackOverflow was using Microsoft SQL Server. I guess it's just a learned response, but I've always expected a large networking site to be using a non-MS solution.

I haven't had a chance to watch the video, but I hope to later on. What is interesting, even from just the link, is that MS SQL is a viable solution for a website like StackOverflow.

We have been using SQL Server in our own company for our projects and I was really starting to get annoyed with it. I found it to be heavy on resources, slow to respond and let's not forget cost. I just completed a project that I have been working on for the last 4 months, and the majority of the work was within SQL Server. One thing I learned was that it's actually quite a powerful beast.

When used correctly, SQL Server is a very capable SQL solution. I'm glad that we decided to stick with it. There is a lot I learned in the last 4 months that I had no idea SQL Server was capable of.

[+] silverbax88|14 years ago|reply
Big networks (banks, insurance companies) basically use one of two options most of the time: Oracle or SQL Server.

It only seems otherwise when you read sites like Hacker News, where most of the posters are not working in big environments. Facebook is a large-scale solution that doesn't use either, but their data management and caching are so bad that nobody should be considering them as a best practice.

[+] timwiseman|14 years ago|reply
> I found it to be heavy on resources, slow to respond and let's not forget cost

What are you comparing it to?

Admittedly I am biased since my day job is a SQL Server DBA, but I have tried several other options and think SQL Server tends to stack up quite well.

I have found it generally more user friendly, easier to work with, and cheaper than Oracle (though Oracle does seem to have an advantage in certain types of partitioning). I rather like MySQL for certain types of projects, but generally find SQL Server easier to maintain for large projects.

I have only dabbled with NoSQL options, but my general opinion is that for certain problem sets, they are great. However, when ACID is even remotely desirable they are not an option, and for certain other tasks they are less desirable.

So, I think that which type of database you use depends largely on the project, but SQL Server tends to stand up quite well for a wide array of projects.

[+] mkramlich|14 years ago|reply
One or both of the main people behind SO are "Microsoft stack" guys, so I'm not surprised.
[+] 3am|14 years ago|reply
I can't watch the video right now, but the idea that "Everybody's the DBA" is risky in general. I'm hugely in favor of developers writing and optimizing their own SQL, being able to create a normalized schema (and knowing when it's worth the tradeoff to denormalize), knowing how to read a query plan, and generally being as competent as a DBA. But... it's good to have one person who has the global view of the database for things like tuning extent sizes, selecting the optimal types of storage for various partitions, doing reviews on the schema, capacity planning, etc.

The right setup (IMO) of having a DBA in an operational role with developers that are highly proficient/self-sufficient is hard to get right and expensive enough that it probably isn't right for an early stage company. And a bad DBA can be a nightmare. So there are tradeoffs on both sides.

[+] gaius|14 years ago|reply
It's about moral authority. The only people touching production should be those whose pagers go off at 3am if it all goes hatstand. Everyone else is the peanut gallery.
[+] Duff|14 years ago|reply
I think that with a modern database like SQL Server, the old-school "high priest" DBA is obsolete.

But... if you don't have someone dedicated to thinking about database issues, you need to treat database changes just like your code. It needs to be in a repository, it needs to be reviewed, and you need a change management regime.
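
One common way to treat database changes like code is to keep ordered migration scripts in the repository and record which ones have run. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` instead of SQL Server so that it runs anywhere, and the `MIGRATIONS` list, the `schema_version` table, and the `posts` table are all made-up names.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical migrations; in practice each would be a reviewed file in the repo.
MIGRATIONS: List[Tuple[int, str]] = [
    (1, "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"),
    (2, "ALTER TABLE posts ADD COLUMN score INTEGER NOT NULL DEFAULT 0"),
]


def apply_migrations(conn: sqlite3.Connection,
                     migrations: Sequence[Tuple[int, str]] = MIGRATIONS) -> List[int]:
    """Apply any migrations not yet recorded; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already run against this database
        with conn:  # commits the bookkeeping row once the step has succeeded
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(apply_migrations(connection))  # first run applies everything pending
    print(apply_migrations(connection))  # second run finds nothing to do
```

Because the scripts live next to the application code, a schema change goes through the same review and change history as any other commit.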

From an anecdotal POV, I've noticed that many folks have a good process (or at least a consensus approach) to managing their code... but the database is often a red-headed stepchild that doesn't get the attention it deserves.

[+] BrentOzar|14 years ago|reply
You'll want to watch the video, because I explain that in more detail. Of course I'm not against change control or security, but if you want to scale a database, everybody involved has to have DBA-level skills. StackExchange's sysadmins and developers are all keenly aware of the database impact of what they do, and they aggressively try to minimize that impact. That's what I talk about in the video when I say "Everybody's the DBA." It's like saying, "Only you can prevent forest fires."
[+] scottshea|14 years ago|reply
I love it when companies offer insight into their practices like this.
[+] mwsherman|14 years ago|reply
It comes down to two things: economics and computer science, in that order.

For most companies, diving deep into your data persistence layer is probably not worth it in the beginning. Your devs need to work on features. (MS SQL does pretty well untuned.) This is the economics part.

The computer science part comes in when you are doing enough traffic that 200 hours of dev time toward a 10% performance improvement becomes good economics. Then you dig into the data layer (and every other layer) and start counting milliseconds.

Which is what we did at Stack O by using Dapper and renting Brent Ozar. :)

[+] BrentOzar|14 years ago|reply
RENTING, hahaha, I never thought of myself that way. I like it. I'm going to use a "For Rent" sign around my neck in an upcoming blog post!
[+] blhack|14 years ago|reply
"Don't worry the cloud takes care of all of this!"

This is horrifying. Are there really people that think this way?

[+] BrentOzar|14 years ago|reply
Sadly, yes - obviously it's not this site's target audience, but a lot of PHBs out there are buying into it hook, line, and sinker.