Hi. I’m a former developer who’s moved into database administration, and here’s what I wish somebody would have told me when I got started.
7. SQL functions rarely perform well.
Good developers like to reuse code by putting it into functions, and then calling those functions from multiple places. That’s a great practice in the app tier, but it has huge performance drawbacks in the database tier.
Check out Paul White’s excellent post on Forcing a Parallel Query Plan – in particular, the list of things that produce a serial zone in the plan. Most functions will cause your query to go single-threaded. Sad trombone.
If you do want to reuse code, consider stored procedures and views instead. (Granted, they can come with their own performance drawbacks, but I’m just trying to get you started on the right foot as quickly as possible here, and functions are a broken foot.)
6. “WITH (NOLOCK)” doesn’t actually mean no locking.
At some point in your career, you’re going to start using WITH (NOLOCK) on everything because it gets your query results faster. That’s not necessarily a bad idea, but it can come with some surprising side effects that Kendra discusses in her “There’s Something About Nolock” video. I’m going to focus on one of them here, though.
When you query a table – even WITH (NOLOCK) – you take out a schema stability lock. No one else can change that table or indexes until your query is finished. That doesn’t sound like a big deal until you need to drop an index, but you can’t because people are constantly querying a table, and they think there’s no overhead as long as they use WITH (NOLOCK).
There’s no silver bullet here, but start by reading about SQL Server’s isolation levels – I bet READ COMMITTED SNAPSHOT ISOLATION is an even better choice for your app. It gets you consistent data with less blocking hassles.
5. Use 3 connection strings in your app.
I know, you’ve only got one SQL Server today, but trust me, this is worth it. Set up three connection strings that all point to the same destination today, but down the road, when you need to scale, you’ll be able to set up different database servers to handle each of these:
- Connection for Writes & Realtime Reads – this is the connection string you’re already using today, and you think that all data needs to come from here. You can leave all of your code in place, but as you write new code or touch existing pages, think about changing each query to one of the below connections.
- Connection for Data 5-15 Minutes Old – this is for data that can be slightly stale, but still needs to be from today.
- Connection for Data as of Yesterday – for management reports and trending data. If you run an online store, you might pull reviews from here, for example, and tell users that their reviews take a day to publish.
That first connection string is the toughest one to scale; we don’t have a lot of options in SQL Server to scale out multiple servers that handle writes. (We do have options – they’re just painful to implement and manage.) The lower-tier connection strings 2 and 3 are much, much easier and cheaper to scale. For more about this technique, check out my 3 Favorite Connection String Tips.
4. Use a staging/apptempdb database.
Your app probably uses the database for some scratch work – processing, sorting, loading, caching, etc. It wouldn’t break your heart if this data disappeared, but you’d like to keep the table structures around permanently. Today, you’re doing this work in your main application database.
Create a separate database – call it MyAppTemp – and do your work in there instead. Put this database in simple recovery mode, and only back it up once daily. Don’t hassle with high availability or disaster recovery on this database.
This technique accomplishes a lot of really cool scalability stuff. It minimizes the changes to the main app database, which means you get faster transaction log backups and differential backups for it. If you’re log shipping this database to a disaster recovery site, your important data will arrive faster – and not be impeded by all the scratch work. You can even use different storage for these different databases – perhaps cheap local SSD for MyAppTemp, keeping your shared storage connection free for the critical production stuff.
3. Yesterday’s articles and books are often wrong today.
SQL Server has been out for over a decade, and a lot has changed over the years. Unfortunately, the old material isn’t updated to cover what’s happening today. Even today’s material from reputable sources is often wrong – take this critique of Microsoft’s Performance Tuning SQL Server guide. Fellow Microsoft Certified Master Jonathan Kehayias points out a bunch of really bad advice that comes straight from a Microsoft document.
When you read something that sounds like good advice, I like to try the Anti-Doctor-Phil strategy. Dr. Phil preaches that you should love every idea for fifteen minutes. Instead, try hating it – try to disprove what you read before you put it into production. Even when advice is generally good, it might not be good advice for your own environment. (Yes, that includes my advice too.)
2. Avoid ORDER BY; sort in the app instead.
To sort your query results, SQL Server burns CPU time. SQL Server Enterprise Edition goes for about $7,000 per CPU core – not per processor, but per core. A two-socket, 6-core-each server rings up at around $84k – and that’s just the licensing costs, not the hardware costs. You can buy a heck of a lot of application servers (even ones with 256GB or more of memory) for $84k.
Consume all of the query results as fast as possible into memory in your app, and then sort. Your application is already designed in a way that you can scale out multiple app servers to distribute CPU load, whereas your database server…is not.
UPDATE: I’ve gotten a lot of comments wailing about how the app only needs ten rows of a ten million row dataset. Sure, if you’re doing TOP 10, you’ll need an order by – but how about reworking the query to avoid juggling so much data? And if the data sounds like too much for the app server to sort, it’s probably causing work on the SQL Server too. We talk about how to find those queries in the webcast listed at the bottom of this post. Also, keep in mind that I said “Avoid ORDER BY”, not “Never use ORDER BY”. I use ORDER BY myself too – but if I can avoid that work in the very expensive data tier, I’ll avoid it. That’s what avoid means.
(This part here is where the MySQL and PostgreSQL guys start screaming about how you can avoid licensing costs altogether with open source databases.) (This part here is where you would expect me to have a witty retort, but I don’t. If you’re building a brand new app and you’re choosing a database, read my StackOverflow answer on which database handles the most load.)
1. SQL Server has built-in zero-impact instrumentation tools.
SQL Server’s dynamic management views (DMVs) can tell you all kinds of killer stuff like:
- Which SQL statements are causing the most load on your server
- Which indexes are wasting space and slowing down inserts/updates/deletes
- How fast storage is responding to requests on a database-by-database level (and even more fine-grained than that)
- Where your server’s bottleneck is, like CPU, disk, network, locking, etc
All you have to know is where to look, and I’ll show you.