Stats Week: Messin' With Statistics - Brent Ozar Unlimited®

If there’s one thing living in Texas has taught me

It’s that people are very paranoid that you may Mess With it. Even in Austin, where the citizenry demand weirdness, they are vehemently opposed to any form of Messing, unless it results in mayonnaise-based dipping sauce.

Me? I like Messing With stuff. Today we’re going to look at one way you can make SQL think your tables are much bigger than they actually are, without wasting a bunch of disk space that has nearly the same price as wall to wall carpeting.

To do this, we’re going to venture deep into the Undocumented Command Zone. It’s like the Twilight Zone, except if you go there on your production server, you’ll probably end up getting fired. So, dev servers only here, people.

Creatine

Let’s make a table, stuff a little bit of data in it, and make some indexes.

DROP TABLE [dbo].[Stats_Test]

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT  ISNULL([N].[N], 0) AS [ID] ,
    ISNULL(CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE())),     '1900-01-01') AS [OrderDate] ,
        ABS(CONVERT(NUMERIC(18,2), (CHECKSUM(NEWID()) % 10000.00))) AS [Amt1]
INTO [Stats_Test]
FROM    [Numbers] [N]
ORDER BY [N];

CREATE UNIQUE CLUSTERED INDEX [cx_id] ON [dbo].[Stats_Test] ([ID])
CREATE UNIQUE NONCLUSTERED INDEX [ix_test1] ON [dbo].[Stats_Test] ([OrderDate], [ID])
CREATE UNIQUE NONCLUSTERED INDEX [ix_test2] ON [dbo].[Stats_Test] ([Amt1], [ID])

DROP TABLE [dbo].[Stats_Test]

;WITH E1(N) AS (

SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL

SELECT NULL ),

E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),

Numbers AS (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)

SELECT ISNULL([N].[N], 0) AS [ID] ,

ISNULL(CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE())), '1900-01-01') AS [OrderDate] ,

ABS(CONVERT(NUMERIC(18,2), (CHECKSUM(NEWID()) % 10000.00))) AS [Amt1]

INTO [Stats_Test]

FROM [Numbers] [N]

ORDER BY [N];

CREATE UNIQUE CLUSTERED INDEX [cx_id] ON [dbo].[Stats_Test] ([ID])

CREATE UNIQUE NONCLUSTERED INDEX [ix_test1] ON [dbo].[Stats_Test] ([OrderDate], [ID])

CREATE UNIQUE NONCLUSTERED INDEX [ix_test2] ON [dbo].[Stats_Test] ([Amt1], [ID])

There’s our thousand rows. If you’re dev testing against 1000 rows, your production data better only have 1001 rows in it, or you’re really gonna be screwed when your code hits real data. How do we cheat and make our data bigger without sacrificing disk space?

Eat Clen, Tren Hard, Anavar give up

You can update all statistics on the table at once, or target specific indexes with the following commands.

UPDATE STATISTICS [dbo].[Stats_Test] WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

UPDATE STATISTICS [dbo].[Stats_Test] ([cx_id]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000
UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test1]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000
UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test2]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

UPDATE STATISTICS [dbo].[Stats_Test] WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

UPDATE STATISTICS [dbo].[Stats_Test] ([cx_id]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test1]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test2]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

This will set your table row count to uh… 10 billion, and your page count to 1 billion. This makes sense, since usually a bunch of rows fit on a page. You can be more scientific about it than I was, this is just to give you an idea.

So let’s check in on our statistics! Sup with those?

DBCC SHOW_STATISTICS('dbo.Stats_Test', cx_id)

DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test1)

DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test2)

DBCC SHOW_STATISTICS('dbo.Stats_Test', cx_id)

DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test1)

DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test2)

Hint: these commands will not show inflated page or row counts in them. They actually won’t show page counts at all. Hah. That’s kinda silly, though. Hm.

Anyway, what we should grab from the statistics histograms are some middling values we can play with. For me, that’s an ID of 500, a date of 2016-03-18, and an amount of 4733.00.

One thing I’ve found is that the inflated counts don’t seem to change anything for Identities, or Primary Keys. You’ll always get very reasonable plans and estimates regardless of how high you set row and page counts for those. Regular old clustered indexes are fair game.

Some really interesting things can start to happen to execution plans when SQL thinks there’s this many rows in a table. The first is that SQL will use a rare (in my experience) plan choice: Index Intersection. You can think of this like a Key Lookup but with two nonclustered indexes rather than from one nonclustered index to the clustered index.

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[ID] = 500

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] = '2016-03-18'

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] = 4733.00

SELECT *

FROM [dbo].[Stats_Test] AS [st]

WHERE [st].[ID] = 500

SELECT *

FROM [dbo].[Stats_Test] AS [st]

WHERE [st].[OrderDate] = '2016-03-18'

SELECT *

FROM [dbo].[Stats_Test] AS [st]

WHERE [st].[Amt1] = 4733.00

For these equality queries, we get the following plans:

SQL isn’t fooled with an equality on 500. We get a little plan. We’ll examine inequality plans in a moment. For now let’s look at the middle plan. That’s where the Index Intersection is occurring. The bottom plan has a regular Key Lookup.

The costs and estimates here are Banana Town crazy. And right there down the bottom, we can see SQL using the Clustered Index key to join our Nonclustered Indexes together. If you’ve been reading this blog regularly, you should know that Clustered Index key columns are carried over to all your Nonclustered Indexes.

If we switch to inequality queries, well…

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[ID] > 500

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] > '2016-03-18'

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] > 4733.00

SELECT *

FROM [dbo].[Stats_Test] AS [st]

WHERE [st].[ID] > 500

SELECT *

FROM [dbo].[Stats_Test] AS [st]

WHERE [st].[OrderDate] > '2016-03-18'

SELECT *

FROM [dbo].[Stats_Test] AS [st]

WHERE [st].[Amt1] > 4733.00

The top query that SQL wasn’t fooled by before now has the same insane estimates as the others. Our two bottom queries get missing index requests due to the amount of work the Index Intersection takes.

It’s happening because of the SELECT * query pattern. This will go away if we stick to only selecting columns that are in our Nonclustered Indexes. For example, SELECT ID will result in some pretty sane index seeks occuring. The estimated rows are still way up there.

Unfortunately, STATISTICS TIME and IO are not fooled by our statistical tomfoolery.

They use reality-based measurements of our query activity. This trick is really only useful to see what happens to execution plans. But hey it’s a lot cheaper, easier, and faster than actually inserting 10 billion rows.

So what?

Like a grandma in a grocery store, SQL Server makes all its decisions based on cost. Whatever is cheapest is choice. If SQL Server were a person, it would probably wash and dry used aluminum foil, save old bread ties, and use clothes pins for the right thing.

I forget what I was going to say. Probably something smart about testing your queries about sets of data commensurate to what you have in production (or larger) so that you don’t get caught flatfooted by perf issues on code releases, or if your company finally starts getting customers. This is one technique to see how SQL will treat your code as you get more rows and pages involved.

Just don’t forget to set things back when you’re done. A regular stats update will take care of that.

Thanks for reading!

7 comments

WaynesWorld

April 21, 2016 at 9:16 am

Educational and entertaining – love it!

Brandon M.

April 21, 2016 at 10:52 am

Miscer confirmed. U aware brah?

1. Erik Darling
  
  April 22, 2016 at 11:46 am
  
  >that guy who comments on blog posts between sets
  
Lorrin Ferdinand

April 21, 2016 at 2:16 pm

Does this technique only model cost estimation/bad stats issues?
Which issues does it not model?
I assume:
Parameter-sniffing
Fragmentation

1. Erik Darling
  
  April 21, 2016 at 4:45 pm
  
  Hey! No, you can’t do much with those things here.
  
Carsten Student since 2017

April 21, 2016 at 7:07 pm

Perhaps worth mentioning how to reset the stats afterwards:

DBCC UPDATEUSAGE (0, ‘dbo.Stats_Test’) WITH COUNT_ROWS
UPDATE STATISTICS [dbo].[Stats_Test]

Ref: http://dba.stackexchange.com/questions/130660/how-to-reset-statistics-after-update-statistics-with-rowcount/130663

Kevin Fries Student since 2021

April 22, 2016 at 10:16 pm

Perhaps this may be of interest? (Used to copy stats in Oracle to test systems so the link below is as close as I know of on how to duplicate that in SquealySwerver.)
http://blog.sqlauthority.com/2011/01/05/sql-server-copy-statistics-from-one-server-to-another-server/

PS: Lived in Dallas for a lot of years. When the going gets weird, you know you’re in North Texas.

Stats Week: Messin’ With Statistics

If there’s one thing living in Texas has taught me

Creatine

Eat Clen, Tren Hard, Anavar give up

So what?

Get my new posts by email

7 comments

Leave a comment Cancel reply

Stats Week: Messin’ With Statistics

If there’s one thing living in Texas has taught me

Creatine

Eat Clen, Tren Hard, Anavar give up

So what?

Get my new posts by email

Keep digging

7 comments

Leave a comment Cancel reply