Oh God, he’s talking about statistics again
Yeah, but this should be less annoying than the other times. And much shorter.
You see, I hear grousing.
Updating statistics was bringin’ us down, man. Harshing our mellow. The statistics would just update, man, and it would take like… Forever, man. Man.
But no one would actually be able to tell me how long it took. Though many stacks of Necronomicons were wheel-barrowed out and sworn upon for various incantations of “it was faster, we timed it” and “yes, performance improved” and “no, nothing else was different”.
What I would believe, because it’s totally believable, is that perhaps statistics updated, and some plans recompiled, and that recompiling the plans made things take longer.
Okay, fair enough. But no one ever says that. I wish someone would so I could take one look at an execution plan, verify that it looks like Nyarlathotep eating the NYC skyline, and say “yeah, that’d probably take a few to compile a plan for, let’s try to figure out how to break that into smaller pieces”.
Or, you know, something else reasonable.
Where am I going with this? Oh yeah. I measured. With Extended Events. So I’m extra angry about having to use those things again. XML is a hostile datatype. Don’t let the cute and cuddly creatures on those O’Reilly books fool you. Here’s the setup for the XE session.
CREATE EVENT SESSION [StatsGather] ON SERVER
ADD EVENT sqlserver.auto_stats(
WHERE ([package0].[equal_uint64]([sqlserver].[session_id],(52)) AND [sqlserver].[is_system]=(0)))
ADD TARGET package0.event_file(SET filename=N'C:\temp\StatsGather',max_rollover_files=(10))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
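One practical note: the session is filtered to my session_id (52), so swap in your own, and scripting it out doesn't start it. Start it before the workload runs and stop it after:

ALTER EVENT SESSION [StatsGather] ON SERVER STATE = START;

/* run the workload here */

ALTER EVENT SESSION [StatsGather] ON SERVER STATE = STOP;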
Then, I ran the same type of workload that I ran to get my statistics thresholds for automatic updates. Except, of course, this time I’m only looking at how long each update took. Not when it happened. We already know that. If you want the query I used to parse the session data, it’ll be at the end of the post. I’d rather spend your ever-shortening attention spans getting to the point.
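Quick aside: if you want to roll your own version of that workload, the general shape is to modify enough rows in a table to cross the auto-update threshold, then run a query against it, because it's the next compilation that actually kicks off the stats update. A rough sketch, with made-up table and column names:

/* Made-up names; the idea is just to modify enough rows to cross the threshold... */
UPDATE [dbo].[SomeBigTable]
SET [SomeDate] = DATEADD(DAY, 1, [SomeDate])
WHERE [Id] <= 1000000;

/* ...then compile a query that touches the column, which is what fires auto_stats */
SELECT COUNT_BIG(*)
FROM [dbo].[SomeBigTable]
WHERE [SomeDate] >= '20170101';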
Here’s the point:
Updating statistics, even on some pretty good sized tables, didn’t really take that long. Everything is color-coded, so you can see the row count, how many rows were modified, etc. right next to the corresponding event time and duration. The statistics in red text have nothing to do with our tests, but I left them in there for completeness. They took absolutely minuscule amounts of time.
For the really big tables, which were all in the 10 million to 100 million row range, the statistics update itself never took more than 1 second. It stuck right around the half second mark, aside from a few times in the middle, oddly, which I’m blaming on:
Now, how you proceed depends on a few things.
- Do you update statistics often enough that you don’t need automatic updates?
- Are your update routines using FULLSCAN? (auto stats updates sample a percentage of the table; there’s a quick FULLSCAN example below this list)
- Do you not have data movement during the day (presumably when recompiling queries would be user-facing)?
- Can you not afford an occasional half second statistics update?
- Do your queries not benefit from updated statistics?
If you answer yes, yes, no, no, no, you’re not just singing about a dozen Amy Winehouse songs at once, you also might be in a situation crazy enough to warrant turning auto-update stats off.
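About that FULLSCAN question: if you do take statistics maintenance into your own hands, a manual update lets you choose the sample size, where the automatic update chooses it for you (and it’s usually a pretty small sample on big tables). A minimal example, with made-up table and statistics names:

UPDATE STATISTICS [dbo].[SomeBigTable] WITH FULLSCAN;

/* or just the one statistic you care about */
UPDATE STATISTICS [dbo].[SomeBigTable] [IX_SomeBigTable_SomeDate] WITH FULLSCAN;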
Thanks for reading!
IF OBJECT_ID('tempdb..#StatsGather') IS NOT NULL
    DROP TABLE [#StatsGather];

CREATE TABLE [#StatsGather]
    (
      [ID] INT IDENTITY(1, 1) NOT NULL ,
      [WaitsXML] XML ,
      CONSTRAINT [PK_StatsGather] PRIMARY KEY CLUSTERED ( [ID] )
    );

/* Read the session files into the temp table as XML */
INSERT [#StatsGather]
        ( [WaitsXML] )
SELECT CONVERT(XML, [event_data]) AS [WaitsXML]
FROM [sys].[fn_xe_file_target_read_file]('C:\temp\StatsGather*.xel', NULL, NULL, NULL);

/* XML indexes take some of the sting out of the .value() calls below */
CREATE PRIMARY XML INDEX [StatsGatherXML] ON [#StatsGather]([WaitsXML]);

CREATE XML INDEX [StatsGatherXMLPath] ON [#StatsGather]([WaitsXML])
USING XML INDEX [StatsGatherXML] FOR VALUE;
WITH x1 AS (
SELECT
    [sw].[WaitsXML].[value]('(event/action[@name="session_id"]/value)[1]', 'BIGINT') AS [session_id] ,
    DB_NAME([sw].[WaitsXML].[value]('(event/action[@name="database_id"]/value)[1]', 'BIGINT')) AS [database_name] ,
    [sw].[WaitsXML].[value]('(/event/@timestamp)[1]', 'DATETIME2(7)') AS [event_time],
    [sw].[WaitsXML].[value]('(/event/@name)[1]', 'VARCHAR(MAX)') AS [event_name],
    [sw].[WaitsXML].[value]('(/event/data[@name="index_id"]/value)[1]', 'BIGINT') AS [index_id],
    [sw].[WaitsXML].[value]('(/event/data[@name="job_type"]/text)[1]', 'VARCHAR(MAX)') AS [job_type],
    [sw].[WaitsXML].[value]('(/event/data[@name="status"]/text)[1]', 'VARCHAR(MAX)') AS [status],
    /* duration comes back in microseconds; divide to get seconds */
    [sw].[WaitsXML].[value]('(/event/data[@name="duration"]/value)[1]', 'BIGINT') / 1000000. AS [duration],
    [sw].[WaitsXML].[value]('(/event/data[@name="statistics_list"]/value)[1]', 'VARCHAR(MAX)') AS [statistics_list]
FROM [#StatsGather] AS [sw]
)
SELECT [x1].[session_id] ,
       [x1].[database_name] ,
       [x1].[event_time] ,
       [x1].[event_name] ,
       [x1].[index_id] ,
       [x1].[job_type] ,
       [x1].[status] ,
       [x1].[duration] ,
       [x1].[statistics_list]
FROM [x1]
WHERE [x1].[duration] > 0
ORDER BY [x1].[event_time];
Brent says: if you’re thinking, “Oh, but my tables have more than a hundred million rows, so stats updates would take way longer,” then it’s time to think about regularly updating your stats during maintenance windows anyway. As long as you’re doing that, say, weekly, then what are the odds that you’re going to trip the stats update threshold during the week on a billion row table? And if you do, it’s time to think about partitioned views.
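If “regularly updating your stats” needs a starting point, the lowest-effort built-in option is sp_updatestats, which only resamples statistics that have had row modifications (at the default sampling rate, so use UPDATE STATISTICS ... WITH FULLSCAN on the tables you really care about):

EXEC [sys].[sp_updatestats];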