I first knew something was up when I looked at the job history for a simple maintenance plan. It had two steps:
- Rebuild all the indexes in the database – this took 10 minutes each night.
- Update statistics – this took 2-3 hours each night.
Maintenance Plans light the fuse
I love the concept of maintenance plans, but I don’t love the way all the tasks are set up.
In the case I was looking at, the Update Statistics task was being used with two values that are set by default:
- Run against all statistics
- Update them with fullscan
“All” statistics means that both “column” and “index” statistics will be updated. There may be quite a lot of statistics — most people leave the “auto create statistics” option enabled on their databases, which means that queries will dynamically cause the creation of more and more statistics over time.
Combined with “fullscan”, updating all statistics can become a significant amount of work. “Fullscan” means that to update a statistic, SQL Server will scan 100% of the values in the index or column. That adds up to a lot of IO.
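To get a feel for how many statistics "all" really covers, here’s a rough sketch that counts statistics per table from the `sys.stats` catalog view. It assumes you run it in the database the maintenance plan targets; the column aliases are my own.

```sql
-- Count statistics per user table, split by how each statistic was created.
-- Stats that are neither auto-created nor user-created belong to an index.
SELECT
    OBJECT_SCHEMA_NAME(s.object_id) AS table_schema,
    OBJECT_NAME(s.object_id)        AS table_name,
    SUM(CASE WHEN s.auto_created = 0 AND s.user_created = 0 THEN 1 ELSE 0 END) AS index_stats,
    SUM(CASE WHEN s.auto_created = 1 THEN 1 ELSE 0 END) AS auto_created_column_stats,
    SUM(CASE WHEN s.user_created = 1 THEN 1 ELSE 0 END) AS user_created_stats,
    COUNT(*) AS total_stats
FROM sys.stats AS s
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
GROUP BY s.object_id
ORDER BY total_stats DESC;
```

Tables near the top of that list are the ones where "all statistics with fullscan" is likely to cost the most.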
Why ‘SELECT StatMan’ repeatedly scans tables
If SQL Server needs to update column level statistics for the same table, it could potentially use a single scan and update multiple stats, right?
Because of the runtimes I was seeing, I was pretty sure that wasn’t happening. But we can take a closer look and see for ourselves.
In our maintenance plan task, if we hit “View T-SQL”, a window pops up showing us the commands that the plan will run. (I love this feature, by the way!) We will use one of these commands to test things out in a bit.
First, let’s make sure we have some column level statistics on our database. It already has indexes and their associated stats. To create some column level stats, I run these queries:
--create two column stats using 'auto create statistics'
select * from Person.Person where MiddleName like 'M%';
select * from Person.Person where Title is not null;
--Create two filtered stats on Title
create statistics kl_statstest1 on Person.Person (Title) where Title = 'Mr.'
create statistics kl_statstest2 on Person.Person (Title) where Title = 'Ms.'
That will create two “auto” stats that start with “_WA_Sys”, and two stats that I named myself. To check ’em out and see ALL the index and column stats on the table, we run:
exec sp_helpstats 'Person.Person', 'All';
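If you want more detail than sp_helpstats returns, a catalog view query along these lines is one option. This is a sketch, not the query used in the original test; it lists each statistic with its columns, any filter, and when it was last updated.

```sql
-- List every statistic on Person.Person with its columns and filter (if any).
SELECT
    s.name              AS stats_name,
    s.stats_id,
    s.auto_created,     -- 1 = created by 'auto create statistics'
    s.user_created,     -- 1 = created with CREATE STATISTICS
    s.has_filter,
    s.filter_definition,
    c.name              AS column_name,
    sc.stats_column_id, -- position of the column within the statistic
    STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id
   AND sc.stats_id  = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id
   AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID(N'Person.Person')
ORDER BY s.stats_id, sc.stats_column_id;
```

Multi-column index statistics show up as several rows, one per column.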
Sure enough, this shows us that we have seven stats in total, three of which are related to indexes.
Alright, time to run that sample command excerpted from our maintenance plan. I start up an Extended Events trace to capture IO from the sp_statement_completed event, then run the command the maintenance plan was going to use to update every statistic on this table with fullscan:
UPDATE STATISTICS [Person].[Person]
WITH FULLSCAN
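The session definition isn’t shown here, so the following is only a sketch of how such a trace could be set up. The session name `StatManReads` and the session id 53 are placeholders I made up, and the field names (`logical_reads`, `statement`) assume SQL Server 2012 or later. Run it from SSMS or sqlcmd, since it relies on GO batch separators.

```sql
-- Hypothetical session: capture completed statements for one connection only.
CREATE EVENT SESSION StatManReads ON SERVER
ADD EVENT sqlserver.sp_statement_completed
(
    WHERE (sqlserver.session_id = 53)  -- placeholder: use the @@SPID of your test window
)
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY = 1 SECONDS);
GO
ALTER EVENT SESSION StatManReads ON SERVER STATE = START;
GO

-- Run the command under test from the filtered connection.
UPDATE STATISTICS [Person].[Person] WITH FULLSCAN;
GO

-- Give buffered events a moment to reach the target, then read them.
WAITFOR DELAY '00:00:02';

SELECT
    ev.value('(data[@name="logical_reads"]/value)[1]', 'bigint')     AS logical_reads,
    ev.value('(data[@name="statement"]/value)[1]', 'nvarchar(max)')  AS statement_text
FROM (
    SELECT CAST(t.target_data AS xml) AS target_xml
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS t
        ON t.event_session_address = s.address
    WHERE s.name = N'StatManReads'
      AND t.target_name = N'ring_buffer'
) AS x
CROSS APPLY x.target_xml.nodes('/RingBufferTarget/event') AS q(ev);
GO

-- Clean up. Read the ring buffer first: its contents go away with the session.
DROP EVENT SESSION StatManReads ON SERVER;
GO
```

The ring buffer is fine for a quick test like this; for anything longer an event file target is probably the safer choice.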
Looking at the Extended Events trace output, I can see the commands that were run as well as their logical reads. The commands look like this:
SELECT StatMan([SC0]) FROM
(SELECT TOP 100 PERCENT [Title] AS [SC0]
FROM [Person].[Person] WITH (READUNCOMMITTED)
WHERE ([title]='Mr.') ORDER BY [SC0] )
AS _MS_UPDSTATS_TBL OPTION (MAXDOP 16)
The “logical_reads” column shows that updating four of these statistics took four separate scans of my table, and three of those statistics are on the Title column! (Doing a SELECT * FROM Person.Person shows 5,664 logical reads by comparison.)
IO was lower for statistics related to nonclustered indexes because those NC indexes have fewer pages than the clustered index.
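One way to sanity check that is to compare page counts per index. Here’s a sketch using `sys.dm_db_partition_stats`. Treat the numbers as rough: `used_page_count` includes LOB and row-overflow pages, so it won’t line up exactly with the logical reads in the trace.

```sql
-- Pages used by each index on Person.Person (index_id 1 is the clustered index).
SELECT
    i.index_id,
    i.name                  AS index_name,
    i.type_desc,
    SUM(ps.used_page_count) AS used_pages,
    SUM(ps.row_count)       AS row_count
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id
   AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'Person.Person')
GROUP BY i.index_id, i.name, i.type_desc
ORDER BY used_pages DESC;
```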
A better way to update statistics: Let SQL Server pick the TABLESAMPLE
If you just run the TSQL command ‘UPDATE STATISTICS Person.Person’ (without telling it to scan all the rows), it has the option to do something like this:
SELECT StatMan([SC0], [SB0000])
FROM (SELECT TOP 100 PERCENT [SC0],
step_direction([SC0]) over (order by NULL) AS [SB0000]
FROM (SELECT [Title] AS [SC0]
FROM [Person].[Person] TABLESAMPLE SYSTEM (3.547531e+001 PERCENT) WITH (READUNCOMMITTED) ) AS _MS_UPDSTATS_TBL_HELPER ORDER BY [SC0], [SB0000] ) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)
SQL Server dynamically figures out a sample size to use when calculating the statistic! (It can pick from a range of sample sizes, up to and including scanning the whole thing.)
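You don’t need a trace to see which sample size was picked. Assuming your build has `sys.dm_db_stats_properties` (as far as I know, SQL Server 2008 R2 SP2, 2012 SP1, and later), a sketch like this compares the rows sampled to the total rows for each statistic after an update:

```sql
-- Update with the default (SQL Server chosen) sample, then inspect the result.
UPDATE STATISTICS Person.Person;

SELECT
    s.name          AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    -- NULLIF avoids a divide-by-zero on empty tables or empty filtered stats
    CAST(100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS decimal(9, 2)) AS sampled_pct
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'Person.Person')
ORDER BY s.stats_id;
```

A `sampled_pct` of 100 means that statistic was built from a full scan; anything lower means it was built from a sample.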
How to configure faster, better statistics maintenance
Avoid falling for the pre-populated settings in the “Update Statistics” task in the maintenance plan. It’s rare to truly need FULLSCAN to update stats in SQL Server, and even in cases where it’s justified, you want to implement it with statements that target the individual statistics to update. The basic “UPDATE STATISTICS Schema.TableName” command is pretty clever. The issue is simply that Maintenance Plans don’t make it easy for you to run that!
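For the rare statistic that really does need a full scan, a targeted statement could look like the sketch below. It reuses the `kl_statstest1` and `kl_statstest2` statistics created earlier purely as stand-ins; I’m not suggesting those particular stats need FULLSCAN.

```sql
-- Fullscan a single named statistic instead of every statistic on the table.
UPDATE STATISTICS Person.Person kl_statstest1 WITH FULLSCAN;

-- Or name several statistics in one statement.
UPDATE STATISTICS Person.Person (kl_statstest1, kl_statstest2) WITH FULLSCAN;

-- Everything else on the table can use the default sampling.
UPDATE STATISTICS Person.Person;
```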
Unfortunately, if you use maintenance plans there’s no super simple solution: the Update Statistics task forces you to specify either fullscan or a specific sample size. There’s no way to tell that task to just let SQL Server compute the sample itself.
You’ve still got good options; they just take a few more steps:
- You could use an Execute T-SQL Statement task or a custom SQL Agent job to run sp_updatestats
- You could use a free index and statistics maintenance script. The example I’ve linked to is super clever, and avoids updating statistics where it has just rebuilt an index!
- You could also let auto update stats take care of the issue. That’s often just fine on small databases or where there aren’t major data fluctuations
And each of those options should chew up less IO than updating all index and column statistics with FULLSCAN.
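As a sketch of the first and last options: the step below is what an Execute T-SQL Statement task or SQL Agent job step could run, followed by a quick check of the automatic statistics settings. Both assume the step runs in the context of the database you want to maintain.

```sql
-- Job step body: update statistics that have had modifications,
-- using the default sampling rather than FULLSCAN.
EXEC sys.sp_updatestats;

-- Check whether automatic statistics options are on for the current database.
SELECT
    name,
    is_auto_create_stats_on,
    is_auto_update_stats_on,
    is_auto_update_stats_async_on
FROM sys.databases
WHERE name = DB_NAME();
```

Note that sp_updatestats covers every table in the database, so it may still update more than you strictly need; it just does so with far lighter sampling.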