How Bad Statistics Cause Bad SQL Server Query Performance

Statistics
3 Comments

SQL Server uses statistics to guess how many rows will match what your query is looking for. When it guesses too low, your queries will perform poorly because they won’t get enough memory or CPU resources. When it guesses too high, SQL Server will allocate too much memory and your Page Life Expectancy (PLE) will nosedive.

A common fix is to update your statistics regularly.

In this post, I’ll explain why bad stats cause bad performance, and give you some links on how to improve your statistics & performance.

When Statistics are Accurate…

Let’s start with a query against the Stack Overflow database. I’m using the StackOverflow2013 50GB database, but any size will get roughly similar results here.

To execute this query, SQL Server needs to guess how many people live in Boise so that it can allocate enough memory to sort the query results in RAM. I’ll run the query and include the actual execution plan to see what it guessed:

When you’re viewing an execution plan, look at the “195 of 46” numbers under operator. That means SQL Server actually brought back 195 of an expected 46 rows. That sounds bad – it sounds like SQL Server underestimated rows by something like 423% – but that actually isn’t too bad. I only get nervous when estimates are off by 10x or more, and that’s when SQL Server underestimates how much memory is required to do things like sorts.

In this case, SQL Server guessed really well! It used built-in, automatically-created statistics to guess how many people live in each location. It accurately guessed how much memory it would need, and there’s no yellow bang over the Sort operator because the entire sort happened in memory.

To learn more about how these statistics are generated and what their contents are, stay tuned, and I’ll give you learning resources at the bottom of the post.

When Statistics are Inaccurate…

But let’s say that a lot of people suddenly move to Boise. Manhattan folks have gotten tired of the high rents and they’re feeling a lust for delicious russet potatoes, so let’s move everyone from New York, NY to Boise:

That adds a couple thousand folks to the census. Now, if we run our same select query again, let’s see how our estimates fare:

Uh oh – SQL Server still estimated 46 people live in Boise. Before, the underestimation wasn’t that bad, but now that 2,228 folks live in Boise, SQL Server didn’t allocate enough memory for the sort, so the sort ends up spilling to disk, thus the yellow bang on the Sort operator. If you hover your mouse over the Sort operator, you can see more details about how SQL Server wrote to TempDB:

How to Fix Bad Statistics

SQL Server automatically updates statistics as your data changes, but that may not be enough. You may have to also manually trigger statistics updates. I’ll update the Users table statistics, then run my query again:

Now when I run my query again, the estimates are way more accurate:

And the sort no longer spills to disk.

Now, you probably don’t want to be running manual statistics updates all the time because this can actually trigger a whole lot more problems. However, the purpose of this blog post was to get you started on your learning journey.

Here are the resources I’d recommend next:

Want to learn more about troubleshooting these kinds of issues?

Check out my How I Use the First Responder Kit class. I walk you through using sp_BlitzWho, sp_BlitzFirst, sp_BlitzIndex, and many of the other scripts in the open source First Responder Kit.

If you’ve got a Live Class Season Pass, you can drop in on this class or any of my live online classes at any time. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.

Or, if you’ve got a Recorded Class Season Pass, you can hop in and watch the recordings in your account at any time.

See you in class!

Previous Post
Updated First Responder Kit and Consultant Toolkit for November 2020
Next Post
How Scalar User-Defined Functions Slow Down Queries

3 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.