Scaling SQL Server

Scale Up or Scale Out? Scaling SQL Server

Want your app to be able to handle more users? Here are tips, tricks, and links to comprehensive references on how to scale SQL Server– either Scale Up or Scale Out.

Do You Really Have a Scale Problem?

It could be that you just have a bad implementation. First, make sure you’re not wasting your time.

sp_Blitz: Uncover your SQL Server’s dirty little secrets
Scripts to check your wait stats, plan cache, and currently running activity in SQL Server are here.

Scale Up

Want to grow upwards, using big hardware, big tables, and lots of queries running on a single instance?

Brent gives an overview of What It Means to Scale Up
Sargability – Brent shows why %string% is slow — and this can be a real problem in a scale up environment
sp_BlitzIndex – Get insight into your toughest indexing challenges
Table Partitioning: Will it help you scale up, or become your biggest nightmare?
Performance Considerations of Datatypes – study by Michelle Ufford about the impact of choosing the right datatype.
Isolation Levels: Kendra’s list of everything you might want to know about Isolation Levels (complete with poster)
Locking and Blocking: Find lock related perf counters, how to run the blocked process report, and more.
Kendra tells you how to implement Snapshot or Read Committed Snapshot Isolation
List of all Microsoft RSS Feeds for KB Updates
Making backups faster: “A Technical Case Study: Fast and Reliable Backup and Restore of Multi-Terabytes Database over the Network” on the SQL Customer Advistory Team (SQLCAT) list of whitepapers

Scale Out: Availability Groups, Sharding, and More

Instead of just building upwards, what are your options for distributing your data outwards in SQL Server?

Brent gives you an introduction to Scaling Out SQL Server
SQL Server 2012 AlwaysOn: Setup checklist, introduction, and much much more
Jeremiah talks about Sharding in SQL Server
If you’re using availability groups, they’re grounded in failover clusters. Get familiar with:
- Windows 2008 Hotfixes Related to Failover Clusters
- Windows 2012 Hotfixes Related to Failover Clusters
It can be tricky to find out if a failover happened with an availability group. Learn more from the SQL Server PFE team.
Jeremiah talks about Availability, Data Locality and Peer to Peer Replication
Jeremiah explains why Conflict Resolution is a Big Issue with Peer to Peer Replication
Lucene Full Text Search – this open source product is cheaper, faster, and more scalable than SQL Server’s built-in full text search. It’s much easier to scale this out with multiple boxes rather than trying to scale SQL Server. There’s also Lucene.NET for .NET folks.
Adding Reliability to Your Infrastructure – Brent explains why, all other things being equal, the more moving parts you add, the less reliable your system gets.

Scale Out: Caching

Application caching sounds great, but is it really all that? (Well, yes, it actually is.)

Brent says: “The fastest query is the one you never make“
Jeremiah helps you pick the right caching tool – When your app needs data, check the cache tier first, then only query if the data isn’t in cache. Caching options include Redis, Memcached, and Windows AppFabric Caching.
Finding Queries to Cache (script)
Jeremiah and Kendra debate in DBAs vs Devs: ORMs, Caching, and Access to Prod (slideshow has a section on caching trickery)
Caching in the Distributed Environment (msdn)

Testing and Performance Monitoring

You need to test and monitor no matter how you scale, but you need to become a serious pro at performance monitoring when you scale up.

Random vs Synchronized Load Testing – users sometimes hit your application in waves, with several people doing the same thing at the exact same time. Find out if that will cause a problem rather than just having a bunch of people randomly click around in your app.
Using Perfmon for Performance Tuning SQL Server – if you want to scale, you have to find out what bottleneck is currently holding you back from getting more performance. Brent explains what counters to look at and how to analyze them.

Lessons Learned Scaling StackOverflow

StackOverflow serves millions of pageviews per day with a SQL Server back end consisting of plain old commodity hardware with local drives. Here’s some of the lessons we’ve learned along the way.

Capture & Test Every Query Before Upgrading – we upgraded to fix one problem, and ended up with a different problem because 2008 handled a specific kind of query differently.
Scaling Full Text Search in an OLTP Environment – we ran into a problem with SQL 2008’s new FTS and got help from Microsoft.
High Scalability Profile of StackOverflow – talks about the architecture and hardware in use.
Cache-Warming Query – wildly unsupported, but gives you a rough idea of how to select data from all of your indexes in a particular database to warm the buffer pool. Doesn’t work with filtered indexes, and make sure you have enough memory to cache the entire database.

StackOverflow’s Shared Scaling Tips

Blog.ServerFault.com – the sysadmins behind scaling StackOverflow’s infrastructure. It’s not purely dedicated to SQL – it touches a lot of parts of systems administration – but there’s a lot of great information in here. For an example, start with Different Views of the Same Problem.
Sam Saffron’s Blog – Sam is a rocket surgeon coder who’s doing great stuff to keep performance fast.
MVC Mini Profiler – like running a client-side trace that can tell you which queries are slow (or running too often). To see how it can work in your application, check out Scott Hanselman’s MiniProfiler walkthrough.
Blog.StackOverflow.com – more general blog that includes the StackExchange Podcast. There’s some scalability info in here, but not as much as the ServerFault guys.