You’ve been nursing your SQL Server through day after day of increasing load. Stored procedures are taking longer and longer to run. There are timeouts on the client side. Both customers and management are getting upset. As you’re bemoaning the terrible code that the developers wrote, it hits you: you don’t have a code problem, you have a scalability problem.
The Types of SQL Server Scalability Problems
It seems obvious to an outsider, but hardware has to be replaced on a regular basis. Between replacements, hardware is like code: it requires attention and maintenance to keep it running smoothly. As a DBA, it’s your job to pay as much attention to the database hardware as you do to the wait stats, slow queries, and missing indexes.
This doesn’t mean you can throw a monitoring package in place and walk away. Understanding how SQL Server and the underlying hardware respond under your application’s workload is important. Once you have a baseline on SQL Server and the hardware, you can easily tell if a problem is bad code or you need to call up your friendly hardware salesperson.
Database design is probably not your bailiwick; you might even say you’re not a database architect, you’re a database administrator. That may be, but you’ve also got the keys to the kingdom of database performance.
Through the DMVs you can tell which indexes have lock escalation problems, which files have slow reads, and even narrow these problems down to specific queries and times of day. Even if you can’t tell Boyce-Codd Normal Form from Backus-Naur Form, you have tools to help you identify problems at a physical level.
You probably want an example – if you add several new queries to the application and suddenly there are a lot more lock escalation attempts on a table, you can safely conclude that at least one of those queries would benefit from a new index.
I lied, sometimes the scalability problem is a code problem.
DBAs love to blame those pesky developers for causing problems. There’s some truth in that statement – developers introduce changes into an otherwise stable system. But when developers are adding new features and functionality, they can’t always know which indexes will be the best; after all, you didn’t have the resources to provide them with a full copy of the database, right?
Thankfully, you can track down top resource consuming queries very easily. Once you’ve identified those queries, you can either tune them or work with the developers to educate them about what they can do better the next time. All too often, time is the only issue that prevents things from being done correctly – developers are pressed for time to get a feature out, so they don’t spend as much time tuning code as they’d like. Help them out – find the bad queries and share the knowledge or just tune them yourself.
Outside of bad T-SQL, there are a number of patterns that can cause problems for database performance. So bad, in fact, that they’re going in a new section.
If you’ve thought that something didn’t smell quite right in the database, you were probably right. There are a number of ways to use SQL Server that work… for a while. I like to call these scalability anti-patterns. These anti-patterns work well as long as you can keep throwing more physical resources at the problem.
Anti-patterns can be difficult to identify as an anti-pattern – is it just bad code or are you seeing something worse? Over at ACM Queue there’s a guide 20 Obstacles to Scalability that lists patterns that will prevent you from moving forward. The most common patterns you’ll find are a lack of caching, serial processing, using the database as a queue, and full-text search.
Fixing anti-patterns will take a long time. These are architectural decisions that are baked into the entire application, not just the database. In order to fix these, you’re going to need to work with the developers, research new tools, and figure out how to implement the features
Solving the Problems
Easy Mode: Hardware
Let’s be honest, using money to solve your scalability problems isn’t a solution. Using money to solve scalability problems can often be a quick fix. If storage is slow, you can embark on a query tuning project or buy additional memory (or even SSDs). Although you’ll eventually need to tune those queries, you can buy yourself some breathing room by adding new hardware.
Don’t let new hardware lull you into a false sense of security. If you buy hardware to temporarily solve a problem, make sure you also record the performance problems and get time scheduled in the future to put a fix in place.
Medium Difficulty: Tuning
When you finally decide to tune your application, identify your biggest bottlenecks (if you don’t know, we can teach you how) to make sure that you’re tuning in the right place. You can’t tune T-SQL to make the network faster. Once you know what you’re trying to fix, identify the queries cause the most pain. Is memory a problem? Look for queries with large memory grants and tune them. Think about adding indexes.
Database performance tuning is as much about improving code as it is about making the database do less. Better indexes mean you have to sort less, but so does sorting in the application layer. Maybe you can pre-compute complex queries – even aggregations may be slightly out of date, there may not be a real-time requirement.
The Hard Stuff: Redesign
When hardware and tuning aren’t enough, it’s time to roll up your sleeves and redesign parts of the application. You’ll still need to identify bottlenecks, but you aren’t going to be tuning code.
Got a queue in the database? Figure out how you can move that queue based processing outside of SQL Server or process the queue in large batches rather than line by line.
Reporting against live data? Find ways to report against readable secondary servers or even report against a time delayed copy of the data in a data warehouse.
There are ways to redesign any feature for scalability. Tools and techniques have been documented in books, articles, blog posts, and conference talks. The question is no longer “How could you possibly do that?” The question is “Which technique are we going to use?”
What Should You Do?
For most of you, the answer is simple: find your bottleneck and decide if you can make a simple fix or if you need to devote more time to the problem. If you need more time, buy a bit of hardware to help you through. Don’t jump on the re-write bandwagon too early, but always keep ideas in the back of your head. And, above all else, make sure you’re solving the right problem with the right solution. Adding more processors won’t help if every query is wrapped in a