
I first knew something was up when I looked at the job history for a simple maintenance plan. It had two steps:

  1. Rebuild all the indexes in the database – this took 10 minutes each night.
  2. Update statistics – this took 2-3 hours each night.

What was going on? Statistics in SQL Server are small, lightweight objects. Indexes are larger and contain more data. Why would updating statistics take so much longer?

Maintenance Plans light the fuse

I love the concept of maintenance plans, but I don’t love the way all the tasks are set up.

In the case I was looking at, the Update Statistics task was being used with two values that are set by default:

  • Run against all statistics
  • Update them with fullscan

[Screenshot: the Update Statistics task settings in the maintenance plan]

“All” statistics means that both “column” and “index” statistics will be updated. There may be quite a lot of statistics — most people leave the “auto create statistics” option enabled on their databases, which means that queries will dynamically cause the creation of more and more statistics over time.

Combined with “fullscan”, updating all statistics can become a significant amount of work. “Fullscan” means that to update a statistic, SQL Server will scan 100% of the values in the index or column. That adds up to a lot of IO.
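To get a feel for how many statistics have piled up on a given table, you can query sys.stats yourself. A quick sketch (the table name is from the AdventureWorks sample database used later in this post):

```sql
-- Count index statistics vs. auto-created column statistics on one table.
-- auto_created = 1 marks the "_WA_Sys" stats that queries created dynamically.
SELECT auto_created, COUNT(*) AS stats_count
FROM sys.stats
WHERE object_id = OBJECT_ID('Person.Person')
GROUP BY auto_created;
```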

Why ‘SELECT StatMan’ repeatedly scans tables

If SQL Server needs to update column level statistics for the same table, it could potentially use a single scan and update multiple stats, right?

Because of the runtimes I was seeing, I was pretty sure that wasn’t happening. But we can take a closer look and see for ourselves.

In our maintenance plan task, if we hit “View TSQL”, a window pops up showing us the commands that the plan will run. (I love this feature, by the way!) We’ll use one of these commands to test things out in a bit.

[Screenshot: the maintenance plan’s “View T-SQL” window]

First, let’s make sure we have some column level statistics on our database. It already has indexes and their associated stats. To create some column level stats, I run these queries:

--create two column stats using 'auto create statistics' 
select * from Person.Person where MiddleName like 'M%';
select * from Person.Person where Title is not null;
GO

--Create two filtered stats on Title
create statistics kl_statstest1 on Person.Person (Title) where Title = 'Mr.'
GO
create statistics kl_statstest2 on Person.Person (Title) where Title = 'Ms.'
GO

That will create two “auto” stats that start with “_WA_Sys”, and two stats that I named myself. To check ‘em out and see ALL the index and column stats on the table, we run:

exec sp_helpstats 'Person.Person', 'All';
GO

[Screenshot: sp_helpstats output for Person.Person]

Sure enough, this shows us that we have seven stats total – three are related to indexes.

Alright, time to run that sample command excerpted from our maintenance plan. I start up an Extended Events trace to capture IO from the sp_statement_completed event, then run the command the maintenance plan was going to use to update every statistic on this table with fullscan:

UPDATE STATISTICS [Person].[Person] 
WITH FULLSCAN
GO
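If you’d like to reproduce the trace, the Extended Events session looks roughly like this sketch. The session name, database filter, and file target path are all made up for illustration:

```sql
-- Capture completed statements (with their logical_reads) from one database.
-- Session name, database name, and file path are illustrative only.
CREATE EVENT SESSION [capture_statman] ON SERVER
ADD EVENT sqlserver.sp_statement_completed (
    ACTION (sqlserver.sql_text)
    WHERE (sqlserver.database_name = N'AdventureWorks2012'))
ADD TARGET package0.event_file (SET filename = N'capture_statman.xel');
GO
ALTER EVENT SESSION [capture_statman] ON SERVER STATE = START;
GO
```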

Here’s the trace output:

[Screenshot: Extended Events trace showing the IO of the UPDATE STATISTICS commands]

Looking at the Extended Events trace output, I can see the commands that were run as well as their logical reads. The commands look like this:

SELECT StatMan([SC0]) FROM 
(SELECT TOP 100 PERCENT [Title] AS [SC0] 
FROM [Person].[Person] WITH (READUNCOMMITTED)  
WHERE ([title]='Mr.') ORDER BY [SC0] ) 
AS _MS_UPDSTATS_TBL  OPTION (MAXDOP 16)

The “logical_reads” column lets me know that updating four of these statistics required four separate scans of my table – and three of them are all on the Title column! (For comparison, a SELECT * FROM Person.Person shows 5,664 logical reads.)

IO was lower for statistics related to nonclustered indexes because those NC indexes have fewer pages than the clustered index.

A better way to update statistics: Let SQL Server pick the TABLESAMPLE

If you just run the TSQL command ‘UPDATE STATISTICS Person.Person’ (without telling it to scan all the rows), it has the option to do something like this:

SELECT StatMan([SC0], [SB0000])
FROM
(SELECT TOP 100 PERCENT [SC0],
	step_direction([SC0]) over (order by NULL) AS [SB0000]
	FROM
	(SELECT [Title] AS [SC0]
	FROM [Person].[Person] TABLESAMPLE SYSTEM (3.547531e+001 PERCENT) WITH (READUNCOMMITTED) ) AS _MS_UPDSTATS_TBL_HELPER
	ORDER BY [SC0], [SB0000] ) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)

It dynamically figures out a sample size to use when calculating results! (It can pick a variety of options – including scanning the whole thing.)
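In T-SQL terms, the contrast looks like this sketch. The statistic name in the second statement is one of the filtered stats created earlier in this post:

```sql
-- Let SQL Server pick the sample size (the low-IO default behavior):
UPDATE STATISTICS Person.Person;
GO

-- Or pin a sample rate for one specific statistic, if you really need to:
UPDATE STATISTICS Person.Person kl_statstest1 WITH SAMPLE 25 PERCENT;
GO
```

The first form is almost always the better starting point for routine maintenance – explicit sample rates are for targeted fixes.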

How to configure faster, better statistics maintenance

Avoid falling for the pre-populated settings in the “Update Statistics” task in the maintenance plan. It’s rare to truly need FULLSCAN to update stats in SQL Server, and even in cases where it’s justified, you want to implement it with statements targeting the individual statistics to update. The basic “UPDATE STATISTICS Schema.TableName” command is pretty clever – the issue is simply that Maintenance Plans don’t make it easy for you to run that!

Unfortunately, if you use maintenance plans there’s no super simple solution – the task forces you to specify either fullscan or a specific sample. There’s no way to simply tell it “let SQL Server compute the minimum sample” with that task.

You’ve still got good options; they just take a few more steps:

  • You could use a T-SQL task or a custom SQL Agent job to run sp_updatestats
  • You could use a free index and statistics maintenance script. The example I’ve linked to is super clever, and avoids updating statistics where it has just rebuilt an index!
  • You could also let auto update stats take care of the issue – that’s often just fine on small databases or where there aren’t major data fluctuations

And each of those options should chew up less IO than updating all index and column statistics with FULLSCAN.
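The sp_updatestats route, for instance, is a one-liner in an Agent job step. It only touches statistics whose underlying data has been modified since the last update, and it uses the default sampling:

```sql
-- Run in the context of the database you want to maintain.
-- Only statistics with row modifications since their last update are refreshed,
-- and SQL Server picks the sample size.
EXEC sp_updatestats;
```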

Comments
  1. I actually had a strange performance issue result from stats being updated this morning – a job’s runtime jumped to 44 minutes right after the update took place. First thought: developers deployed something ;-) After a little analysis and a new NC index, all was well. After that, however, I still wanted to find out what happened, so I turned to my maintenance jobs.

    To investigate my index/stats maintenance history, I have a report I created to query Ola’s logging table, if anyone is interested. It turns out the table I put an index on had its stats updated, obviously throwing the execution plan out of whack. You would have to update the data sources and parameter defaults, but that’s about it.

    http://allen-mcguire.blogspot.com/2014/01/rdl-for-olas-index-maintenance-logging.html

  2. Thanks Kendra

    So would you recommend to SET AUTO_CREATE_STATISTICS OFF ?

    • Hi Klaas,

      I recommend leaving that ON — statistics are extremely helpful to the optimizer for query plan quality.

      I just wouldn’t update all your statistics with fullscan. The statistics are good and lightweight, but the maintenance plan task isn’t great.

      Hope this helps!
      Kendra

      • OK

        I started doubting because of ‘most people leave the “auto create statistics” option enabled’.

        I even never looked at maintenance plans. The day I became a DBA I started searching the internet and since that day Ola’s been doing my maintenance for me.

        Not long after that I found you and your colleagues.
        I thank you for your answer and even more for helping me every day of the last two years.

  3. Pingback: (SFTW) SQL Server Links 31/01/14 • John Sansom

  4. Do you recommend using sp_updatestats with the resample parameter?

    • I’m not a huge fan of the Resample option. Let’s say someone’s troubleshooting an issue and they try updating statistics on an index with fullscan to see if it solves a problem.

      It doesn’t solve the problem, but suddenly ‘resample’ is now being applied by maintenance repeatedly. And probably the person who was doing the test doesn’t even realize that they caused a change.

      If there are individual statistics which need a higher sample, I like to have an agent job update those specifically with documentation about what issues are corrected — because there’s always a chance that the queries they help won’t even be run in a few years.

  5. Pingback: My links of the week – February 2, 2014 | R4

  6. How does SQL Server choose “OPTION (MAXDOP ??)” when running “UPDATE STATISTICS Schema.TableName” and sp_updatestats?

    Is there a way to override this? I suspect I could speed up my stats update if I could set the maxdop setting manually.

  7. Would you recommend SET AUTO_CREATE_STATISTICS ON for TB-sized databases?
    (In my case, they are about 5TB for the main DB and 1TB for the staging DB)

    • Mahesh – I’d turn that around and ask two questions:

      1. Do you care about query performance in those databases?

      2. Are the tables fairly mature, meaning you’re not changing the entire contents of the tables every day?

    • Unless your applications fully handle creating statistics and you can know for certain that you have statistics for all the queries that need it, don’t turn off the automatic creation of statistics.

      • Appreciate your quick responses. I was weighing switching to handling the stats updates with a maintenance script vs. my current DB setting, which is set ON (assuming the auto update is interfering with my OLTP/batch job perf.). We do have index maint. jobs, and fragmentation on tables is very low (<5%). The main DB is an active 500+ user OLTP system with hundreds of transactions per minute; a few are batch imports, and some queries do lookups on transaction tables. Lately the system is reporting poor perf., and we are investigating other aspects along with this.

        Is there a way to find out if the auto update stats is interfering/hampering perf?

        • One quick observation: Statistics creation and statistics update are very different things, and it sounds a bit like you’re confusing them.

          For some performance considerations on updating statistics, you should read this post: http://www.brentozar.com/archive/2014/01/update-statistics-the-secret-io-explosion/

        • I periodically run a profiler trace and monitor “Auto Stats” just to get a feel for whether or not there is excessive auto updates of stats going on. In some cases the application design is such that tables are constantly inserted/deleted from so the stats were perpetually updating, which likely would lead to performance issues. In very particular cases I disable the auto-update of stats on tables using sp_autostats – I never disable it at the database level.

          • There are so many ways this approach could backfire, I’m not even sure where to begin.

          • It also works in many cases where you have a very small table that’s inserted into/deleted from very, very often. It does work, but it’s not something for the inexperienced – agreed.

          • Just for other readers of the comments, Allen and I tend to disagree on just about everything related to performance tuning– and disagreement is a good thing! There’s no one single approach to performance tuning or managing a SQL Server and my way definitely ain’t the only right way.

            I moderate the comments on my posts and note where I wouldn’t recommend the approach just so it’s clear what the author thinks is risky or not.
