How Bad Statistics Cause Bad SQL Server Query Performance

Last Updated 3 years ago

SQL Server uses statistics to guess how many rows will match what your query is looking for. When it guesses too low, your queries will perform poorly because they won’t get enough memory or CPU resources. When it guesses too high, SQL Server will allocate too much memory and your Page Life Expectancy (PLE) will nosedive.

A common fix is to update your statistics regularly.

In this post, I’ll explain why bad stats cause bad performance, and give you some links on how to improve your statistics & performance.

When Statistics are Accurate…

Let’s start with a query against the Stack Overflow database. I’m using the StackOverflow2013 50GB database, but any size will get roughly similar results here.

SELECT *
	FROM dbo.Users u
	WHERE Location = N'Boise, ID'
	ORDER BY DisplayName;

1

2

3

4

SELECT *

FROM dbo.Users u

WHERE Location = N'Boise, ID'

ORDER BY DisplayName;

To execute this query, SQL Server needs to guess how many people live in Boise so that it can allocate enough memory to sort the query results in RAM. I’ll run the query and include the actual execution plan to see what it guessed:

When you’re viewing an execution plan, look at the “195 of 46” numbers under operator. That means SQL Server actually brought back 195 of an expected 46 rows. That sounds bad – it sounds like SQL Server underestimated rows by something like 423% – but that actually isn’t too bad. I only get nervous when estimates are off by 10x or more, and that’s when SQL Server underestimates how much memory is required to do things like sorts.

In this case, SQL Server guessed really well! It used built-in, automatically-created statistics to guess how many people live in each location. It accurately guessed how much memory it would need, and there’s no yellow bang over the Sort operator because the entire sort happened in memory.

To learn more about how these statistics are generated and what their contents are, stay tuned, and I’ll give you learning resources at the bottom of the post.

When Statistics are Inaccurate…

But let’s say that a lot of people suddenly move to Boise. Manhattan folks have gotten tired of the high rents and they’re feeling a lust for delicious russet potatoes, so let’s move everyone from New York, NY to Boise:

UPDATE dbo.Users
	SET Location = N'Boise, ID'
	WHERE Location LIKE N'New York, NY';

1

2

3

UPDATE dbo.Users

SET Location = N'Boise, ID'

WHERE Location LIKE N'New York, NY';

That adds a couple thousand folks to the census. Now, if we run our same select query again, let’s see how our estimates fare:

Uh oh – SQL Server still estimated 46 people live in Boise. Before, the underestimation wasn’t that bad, but now that 2,228 folks live in Boise, SQL Server didn’t allocate enough memory for the sort, so the sort ends up spilling to disk, thus the yellow bang on the Sort operator. If you hover your mouse over the Sort operator, you can see more details about how SQL Server wrote to TempDB:

How to Fix Bad Statistics

SQL Server automatically updates statistics as your data changes, but that may not be enough. You may have to also manually trigger statistics updates. I’ll update the Users table statistics, then run my query again:

UPDATE STATISTICS dbo.Users WITH FULLSCAN;

1	UPDATE STATISTICS dbo.Users WITH FULLSCAN;

Now when I run my query again, the estimates are way more accurate:

And the sort no longer spills to disk.

Now, you probably don’t want to be running manual statistics updates all the time because this can actually trigger a whole lot more problems. However, the purpose of this blog post was to get you started on your learning journey.

Here are the resources I’d recommend next:

Our free video series on YouTube about how statistics work – the first half-hour video is an introduction, and you should definitely watch that. Then, if you want to learn more details about how queries leverage statistics, continue watching the whole series.
Why updating statistics takes so long – SQL Server scans the entire object once for each statistic that you have, so updating statistics with fullscan on large tables is gonna suck.
Why updating statistics can make queries slower – when you update statistics, you also flush execution plans from the cache. That can be a good thing for crappy plans, but a crappy thing for good plans.
How to use Ola Hallengren’s scripts to update statistics – these scripts are kinda the world standard for database administration.

Want to learn more about troubleshooting these kinds of issues?

Check out my How I Use the First Responder Kit class. I walk you through using sp_BlitzWho, sp_BlitzFirst, sp_BlitzIndex, and many of the other scripts in the open source First Responder Kit.

If you’ve got a Live Class Season Pass, you can drop in on this class or any of my live online classes at any time. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.

Or, if you’ve got a Recorded Class Season Pass, you can hop in and watch the recordings in your account at any time.

See you in class!

3 Comments. Leave new

SQLing4ever
November 17, 2020 10:54 am

Brent, good post as usual. I have one problem with it though… I live in Boise and I don’t think we need any more advertisements for people to move here any more than we already have 😉

Reply
- Brent Ozar
  November 18, 2020 2:40 am
  
  HA!
  
  Reply
Andy
November 20, 2020 6:48 am

Hey Brent, I love potatoes and am now thinking of moving to Boise; but you haven’t included a link to your sp_BlitzMove script. Do I have to sign up to your Season Pass to get this code?

Reply