Finding & Fixing Statistics Without Histograms

Last Updated September 21, 2018

Men Without Hats – now those guys were cool:

Statistics Without Histograms – not so much.

If you have a database that’s been passed along from one SQL Server to another, gradually upgraded over the years, or if you’ve had a table that’s been loaded but never queried, you can end up with a curious situation. For example, take the StackOverflow2010 database: I build it on SQL Server 2008, detach it, and then make it available for you to download and play with. The nice thing about that is you can attach it to any currently supported version – SQL Server 2017 attaches the database, runs through an upgrade process, and it just works.

Mostly.

But when you go to look at statistics:

That means when we ask SQL Server to guess how many rows are going to come back for queries, it’s going to act real rude and totally removed.

Finding This Problem in SQL Server 2016 & Newer

Starting with SQL Server 2016 Service Pack 1 Cumulative Update 2, you can run this in an affected database to find the tables with missing stats, and generate an UPDATE STATISTICS command to fix them.

Warning: before you run this, test it in development. In the comments on this post, we’re getting reports that querying this new DMV is causing some servers to restart! Michael J. Swart reports that it was fixed in 2017 Cumulative Update 8.

SELECT DISTINCT SCHEMA_NAME(o.schema_id) AS schema_name,
       o.name AS table_name,
    'UPDATE STATISTICS ' + QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.name) + ' WITH FULLSCAN;' AS the_fix
  FROM sys.all_objects o 
  INNER JOIN sys.stats s ON o.object_id = s.object_id AND s.has_filter = 0
  OUTER APPLY sys.dm_db_stats_histogram(o.object_id, s.stats_id) h
  WHERE o.is_ms_shipped = 0 AND o.type_desc = 'USER_TABLE'
    AND h.object_id IS NULL
    AND 0 < (SELECT SUM(row_count) FROM sys.dm_db_partition_stats ps WHERE ps.object_id = o.object_id)
  ORDER BY 1, 2;

SELECT DISTINCT SCHEMA_NAME(o.schema_id) AS schema_name,

o.name AS table_name,

'UPDATE STATISTICS ' + QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.name) + ' WITH FULLSCAN;' AS the_fix

FROM sys.all_objects o

INNER JOIN sys.stats s ON o.object_id = s.object_id AND s.has_filter = 0

OUTER APPLY sys.dm_db_stats_histogram(o.object_id, s.stats_id) h

WHERE o.is_ms_shipped = 0 AND o.type_desc = 'USER_TABLE'

AND h.object_id IS NULL

AND 0 < (SELECT SUM(row_count) FROM sys.dm_db_partition_stats ps WHERE ps.object_id = o.object_id)

ORDER BY 1, 2;

That gives you a list of fixes:

Run those in a way that makes you feel – what’s the word I’m looking for – safe. Depending on your server’s horsepower and the size of the objects, you may want to do it after hours.

Updating statistics can be a read-intensive operation since it’s going to scan the table, but at least it’s not write-intensive (since statistics are just single 8KB pages.) However, be aware that this can also cause recompilations of query plans that are currently cached.

Finding This Problem in Earlier Versions

That’s left as an exercise for the reader. Parsing DBCC SHOW_STATISTICS would be a bit of a pain in the rear, and I’m already dancing to a list of YouTube’s related videos. Good luck!

A Simple Stored Procedure Pattern To Avoid

6 DBA Lessons I Wish Someone Would Have Taught Me Earlier

38 Comments. Leave new

Kev Riley
September 20, 2018 9:33 am

The query is returning some false positives for me – when a column only has NULL values for every row (I know, I know…..) DBCC SHOW_STATISTICS returns a one row histogram, with RANGE_HI_KEY = NULL and EQ_ROWS= total number of rows in the table.
But sys.dm_db_stats_histogram returns nothing.

Can’t see anything in BOL that would call this out?

Reply
- Brent Ozar
  September 20, 2018 9:36 am
  
  Ooo, neat find! Open an issue on Github and I’ll dig into it.
  
  Reply
antonio
September 20, 2018 9:34 am

you should exclude the case of having columnstore indexes : its stat is empty (built on the fly) but this leads your script to give the fix when there is any issue.
maybe it could be better to have a statement for each stats with an empty histogram instead of giving simply the name of the overstanding table. i could open a pull request on GitHub 😛

Reply
- Brent Ozar
  September 20, 2018 9:36 am
  
  Perfect! Go for it. Thanks!
  
  Reply
Stephen Morris
September 20, 2018 9:51 am

A bit fiddly but not too bad, would need some extra SQL to pull back table and statistics names and a cursor to test each one in turn.

create table #st
(
RANGE_HI_KEY bigint,
RANGE_ROWS bigint,
EQ_ROWS bigint,
DISTINCT_RANGE_ROWS bigint,
AVG_RANGE_ROWS bigint
)

declare @statement varchar(1000)
set @statement = ‘dbcc show_statistics (”[dbo].[yourtable]”, [PK_yourtable]) with histogram’

insert #st (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS)
exec(@statement)

select count(*) from #st

Reply
Shaun
September 20, 2018 11:22 am

I’m curious as to why you have the generated command update all stats on the table instead of just the stat without the histogram..

Reply
- Brent Ozar
  September 20, 2018 5:02 pm
  
  Shaun – because I’m lazy.
  
  Reply
Howard
September 20, 2018 11:32 am

Useful query. Thank you.

Lots of duplicate UPDATE STATISTICS statements are generated. SELECT DISTINCT clears that up.

Reply
- Brent Ozar
  September 20, 2018 5:02 pm
  
  Great point! Added that to the post.
  
  Reply
Justin Hoffmann
September 20, 2018 1:50 pm

You may need to adjust the spacing a bit, but I copied the above and applied what it looks like Ola Hallengren does from one of my CommandLog tables:

SELECT SCHEMA_NAME(o.schema_id) AS schema_name,
o.name AS table_name,
‘UPDATE STATISTICS ‘ + QUOTENAME(SCHEMA_NAME(o.schema_id)) + ‘.’ + QUOTENAME(o.name) + ‘ ‘ +QUOTENAME(s.name) + ‘ WITH FULLSCAN;’ AS the_fix
FROM sys.all_objects o
INNER JOIN sys.stats s ON o.object_id = s.object_id AND s.has_filter = 0
OUTER APPLY sys.dm_db_stats_histogram(o.object_id, s.stats_id) h
WHERE o.is_ms_shipped = 0 AND o.type_desc = ‘USER_TABLE’
AND h.object_id IS NULL
AND 0 < (SELECT SUM(row_count) FROM sys.dm_db_partition_stats ps WHERE ps.object_id = o.object_id)
ORDER BY 1, 2;

Reply
Natalia
September 20, 2018 1:50 pm

I ran it on prod and found 70 missing histograms. Thank you for a script. Going to build missing stats this week and hope CPU will go down :))

Reply
Chris Wood
September 20, 2018 3:12 pm

You mention SP1 CU2. What fix was in the CU that makes this happen?

Reply
- Kevan Riley
  September 20, 2018 3:23 pm
  
  Chris – that CU introduced the DMF sys.dm_db_stats_histogram
  https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-stats-histogram-transact-sql?view=sql-server-2017
  
  Reply
Chris Wood
September 20, 2018 3:45 pm

Not sure about Columnstore but for normal tables if I run the query with DISTINCT and then run the update stats the problem should disappear. We have Auto Create and Auto Update Stats on and run OLA on a regular basis.

Reply
- Shaun
  September 20, 2018 3:51 pm
  
  See Kev Riley’s comment (the first on one this post). If the column contains all NULLs, it will continue to show up in the query.
  
  Reply
  - Chris Wood
    September 20, 2018 3:57 pm
    
    So not allowing NULL would stop that issue. We try to do that as best we can.
    
    Reply
Tim
September 20, 2018 3:45 pm

Anyone else getting a system assertion check failure when running this? It’s worked most places, but on two databases on 13.0.5201.2 instances it bombs with the following.

Location: e:\b\s3\sources\sql\ntdbms\include\typesystem\_retypesbase.inl:544
Expression: cbLen <= x_cbStringMost
SPID: 59
Process ID: 1456
Msg 3624, Level 20, State 1, Line 1
A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a Hotfix from Technical Support.
Msg 596, Level 21, State 1, Line 0
Cannot continue the execution because the session is in the kill state.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.

DBCC CHECKDB comes back clean on the same db after.

Reply
- Steve
  September 20, 2018 4:24 pm
  
  Yeah. We got the same error and it caused a stack dump and the AG to bounce.
  
  Reply
  - Tim
    September 20, 2018 4:56 pm
    
    Do you happen to have a stat on a column with a hierarchyid data type? I can replicate the error on a new database with that setup.
    
    Reply
    - Steve
      September 20, 2018 5:02 pm
      
      We do have at least one.
      
      Reply
  - Rob
    September 21, 2018 9:31 am
    
    Same here, our error was very similar to Tim’s above and caused the AG to bounce also. Normally Brent’s queries are safe 🙂
    
    Reply
- Brent Ozar
  September 20, 2018 5:03 pm
  
  Woohoo! Quality is job 1 alright. If you’ve got time, that’s a great bug to file with Microsoft by opening a support case.
  
  Reply
  - Michael J Swart
    September 21, 2018 8:48 am
    
    I got the same error on an index with a rowversion.
    Microsoft knows about this and has a fix
    https://support.microsoft.com/en-ca/help/4316655/assertion-error-querying-dm-db-stats-histogram-dmf-in-sql-server-2017
    
    Reply
    - Brent Ozar
      September 21, 2018 9:45 am
      
      Woohoo! Great find, added that to the post. Thanks sir!
      
      Reply
- MisterZeus
  September 21, 2018 7:17 am
  
  Had the exact problem 2 days ago, while I was testing the latest sp_Blitz check:
  https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/pull/1758#issuecomment-422857763
  
  Reply
- Chris Wood
  September 26, 2018 8:25 am
  
  Had the system assertion error on 2016 SP1 build 4466 even with all the extra checks in the script.
  Message was Location:d:\b\s3\sources\sql\ntdbms\include\typesystem\_retypesbase.inl>, line=544 Failed Assertion Expression:= ‘cbLen <= x_cbStringMost'.
  This was in an AG too and caused it to hiccup.
  
  Reply
Thomas Franz
September 21, 2018 1:26 am

Changing the last WHERE condition to
AND EXISTS (SELECT * FROM sys.dm_db_partition_stats ps WHERE ps.object_id = o.object_id AND ps.row_count > 0)
will reduce the execution time on my database (with many partitionized tables) from 1 minute to 1 second -> never use SUM() or COUNT() when an EXISTS would be sufficient

Reply
Shaun
September 21, 2018 9:54 am

Kev Riley:

To eliminate columns that contain all NULLs (and a 1 row histogram) from the result set, cross apply dm_db_stats_properties and only select results where dm_db_stats_properties.rows is NULL. Like this (which also incorporates Thomas Franz’ change and adds a command to only update the single stat instead of all stats on the table):

SELECT SCHEMA_NAME(o.schema_id) AS schema_name,
o.name AS table_name,
s.[name] AS stats_name,
‘USE ‘ + QUOTENAME(DB_NAME()) + ‘;UPDATE STATISTICS ‘ + QUOTENAME(SCHEMA_NAME(o.schema_id)) + ‘.’
+ QUOTENAME(o.name) + ‘ WITH FULLSCAN;’ AS the_fix_all_table_stats,
‘USE ‘ + QUOTENAME(DB_NAME()) + ‘;UPDATE STATISTICS ‘ + QUOTENAME(SCHEMA_NAME(o.schema_id)) + ‘.’
+ QUOTENAME(o.name) + ‘ ‘ + QUOTENAME(s.name) + ‘ WITH FULLSCAN;’ AS the_fix_just_bad_stat
FROM sys.all_objects o
INNER JOIN sys.stats s
ON o.object_id = s.object_id
AND s.has_filter = 0
OUTER APPLY sys.dm_db_stats_histogram(o.object_id, s.stats_id) h
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE o.is_ms_shipped = 0
AND o.type_desc = ‘USER_TABLE’
AND h.object_id IS NULL
AND EXISTS
(
SELECT *
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = o.object_id
AND ps.row_count > 0
)
AND sp.rows IS NULL
ORDER BY 1,
2;

Reply
- Kevan Riley
  September 24, 2018 7:59 am
  
  Thanks Shaun – works a treat!
  
  Reply
- Thomas Franz
  September 24, 2018 10:45 am
  
  @Shaun:
  nearly perfect – since UPDATE STATISTICS is not allowed on columnstore indexes, we have to add a
  
  AND NOT EXISTS (SELECT * FROM sys.indexes AS i WHERE i.object_id = o.object_id AND i.name = s.name AND i.type_desc like ‘%COLUMNSTORE%’)
  
  to the WHERE condition
  
  Reply
Morden Kain
September 26, 2018 2:36 pm

Well Brent, you can start with the following for Pre-SQL Server 2016 (script is a bit long though), the below is good for a single database. I have a modified version that swings through all databases …
set nocount on;
declare
@SQL varchar(2048),
@MinID int = 1,
@MaxID int
;
if object_id(‘tempdb..#tmpStatsInfo’) is not null drop table #tmpStatsInfo;
if object_id(‘tempdb..#tmpStatHeader’) is not null drop table #tmpStatHeader;

declare @tmpObjTbl table (
ID int identity(1,1),
ObjSchema varchar(128) not null,
TblName varchar(1024) not null,

StatNm varchar(512) null,
stats_id int not null,
auto_created bit null,
user_created bit null,
no_recompute bit null,
has_filter bit null,
filter_definition varchar(max) null,
is_temporary bit null,
is_incremental bit null,

last_updated datetime2 null,
rows bigint null,
rows_sampled bigint null,
steps int null,
unfiltered_rows bigint null,
modification_counter bigint null
);

declare @tmpStatHeader table (
Name varchar(1024) not null,
Updated datetime null,
Rows bigint null,
RowsSampled bigint null,
Steps int null,
Density decimal (10, 7) null,
AvgKeyLen int null,
StringIndex varchar(3) null,
FilterExpression varchar(4096) null,
UnfilteredRows bigint null
);

create table #tmpStatHeader (
ObjTblID int not null,
Updated datetime null,
Rows bigint null,
RowsSampled bigint null,
Steps int null,
UnfilteredRows bigint null
);

create table #tmpStatsInfo (
DBName varchar(512) not null,
ObjSchema varchar(128) not null,
TblName varchar(1024) not null,

name varchar(512) null,
stats_id int not null,
auto_created bit null,
user_created bit null,
no_recompute bit null,
has_filter bit null,
filter_definition varchar(max) null,
is_temporary bit null,
is_incremental bit null,

last_updated datetime2 null,
rows bigint null,
rows_sampled bigint null,
steps int null,
unfiltered_rows bigint null,
modification_counter bigint null,

ChangeCnt decimal(10,2) null,
ExceededChangeCnt char(1) null
);

insert @tmpObjTbl (
ObjSchema,
TblName,
StatNm,
stats_id,
auto_created,
user_created,
no_recompute,
has_filter,
filter_definition,
is_temporary,
is_incremental,
rows,
modification_counter
)
select distinct object_schema_name(o.id) as ObjSchema, o.name as TblName, s.name as StatNm,
s.stats_id, s.auto_created, s.user_created, s.no_recompute, s.has_filter, s.filter_definition,
s.is_temporary, s.is_incremental,
si.rows, si.rowmodctr as modification_counter
from sys.sysindexes si
join sys.sysobjects o
on o.id = si.id
join sys.stats s
on s.object_id = si.id
order by o.name, s.name
;

select @MaxID = max(id) from @tmpObjTbl;

while (@MinID <= @MaxID)
begin
delete from @tmpStatHeader;

select @SQL = 'dbcc show_statistics (''' + ObjSchema + '.' + TblName + ''', [' + StatNm + ']) with stat_header;' from @tmpObjTbl where ID = @MinID;

if @SQL is not null
begin
insert @tmpStatHeader
exec (@SQL);

insert #tmpStatHeader (ObjTblID, Updated, Rows, RowsSampled, Steps, UnfilteredRows)
select @MinID, Updated, Rows, RowsSampled, Steps, UnfilteredRows
from @tmpStatHeader
end

set @MinID += 1;
end

update t
set
last_updated = h.Updated,
rows = (case when t.rows = 0 and h.Rows 0 then h.Rows else t.rows end),
rows_sampled = h.RowsSampled,
steps = h.Steps,
unfiltered_rows = h.UnfilteredRows
from @tmpObjTbl t
join #tmpStatHeader h
on h.ObjTblID = t.ID;

insert #tmpStatsInfo (
DBName,
ObjSchema,
TblName,
name,
stats_id,
auto_created,
user_created,
no_recompute,
has_filter,
filter_definition,
is_temporary,
is_incremental,
last_updated,
rows,
rows_sampled,
steps,
unfiltered_rows,
modification_counter,
ChangeCnt,
ExceededChangeCnt
)
select db_name(), ObjSchema, TblName, StatNm, stats_id, auto_created, user_created, no_recompute, has_filter,
filter_definition, is_temporary, is_incremental, last_updated, rows, rows_sampled, steps, unfiltered_rows,
modification_counter,
case
when (unfiltered_rows > 1000000 and unfiltered_rows 10000000 and unfiltered_rows 100000000 then [rows] * .05
else [rows] * .2
end as ChangeCnt,
case
when unfiltered_rows ([rows] * .2) then ‘Y’
when (unfiltered_rows > 1000000 and unfiltered_rows ([rows] * .15) then ‘Y’
when (unfiltered_rows > 10000000 and unfiltered_rows ([rows] * .1) then ‘Y’
when unfiltered_rows > 100000000 and modification_counter > ([rows] * .05) then ‘Y’
else ‘N’
end as ExceededChangeCnt
from @tmpObjTbl;

select
‘use [‘ + DBName + ‘]; update statistics [‘ + ObjSchema + ‘].[‘ + TblName + ‘] ([‘ + [name] + ‘]) with ‘ +
case
when auto_created = 1 then ‘resample’
else ‘fullscan’
end +
case when auto_created = 0 then ‘, index’ else ” end +
case when no_recompute = 1 then ‘, norecompute’ else ” end +
case when is_incremental = 1 then ‘, incremental = on’ else ” end +
‘;’ as Script, *
from #tmpStatsInfo
where ExceededChangeCnt = ‘Y’
order by DBName, ObjSchema, TblName, [name];

Reply
Antonio
September 26, 2018 3:53 pm

Just to point out that as for official documentation “Partitioned tables and indexes are not fully supported in this view(sys.sysindex); use the sys.indexes catalog view instead”.

Reply
- Morden Kain
  September 26, 2018 5:55 pm
  
  The above is only for a single database. When I do multi-database, I use the following:
  from sys.stats s
  cross apply sys.dm_db_stats_Properties(s.object_id, s.stats_id) p
  Additionally, sys.indexes does not give the row count or row modified that you can find in sys.sysindexes. If you can point me to someplace that contains that information, I would gladly switch it out. Already looking at switching to the sys.dm_db_stats_Properties as it contains a lot of the information I need.
  
  Reply
  - Antonio
    September 27, 2018 1:10 am
    
    You can look at sys.dm_db_incremental_stats_properties 🙂
    
    Reply
    - Morden Kain
      September 27, 2018 8:50 am
      
      Problem with the sys.dm_db_incremental_stats_properties is that it came about with SQL Server 2014 (which, thankfully, I am on). However, with the sys.dm_db_stats_Properties, you can use it as early as SQL Server 2008. How can we go back further, to say SQL Server 2005/2000? I think that is where Brent is trying to get back to.
      
      Reply
Sean Pearce
September 27, 2018 5:29 am

I have been using the following to find these since 2008. KISS
SELECT *
FROM sys.stats
WHERE STATS_DATE(object_id, stats_id) IS NULL

Reply
- antonio
  September 27, 2018 5:32 am
  
  if you need to resample your stats over partitions, you need sys.dm_db_incremental_stats_properties
  
  Reply
Shaun Stuart
November 12, 2018 11:03 am

This is not a situation unique to old databases or tables that have never been queried. I just discovered you can lose your stats data if you run UPDATE STATISTICS with a sample size of zero. It says so right in BOL: “We recommend against specifying 0 PERCENT or 0 ROWS. When 0 PERCENT or ROWS is specified, the statistics object is updated but does not contain statistics data.”

It amazing what you can learn when you read the manual..

Reply