If there’s one thing living in Texas has taught me
It’s that people are very paranoid that you may Mess With it. Even in Austin, where the citizenry demand weirdness, they are vehemently opposed to any form of Messing, unless it results in mayonnaise-based dipping sauce.
Me? I like Messing With stuff. Today we’re going to look at one way you can make SQL think your tables are much bigger than they actually are, without wasting a bunch of disk space that costs nearly as much as wall-to-wall carpeting.
To do this, we’re going to venture deep into the Undocumented Command Zone. It’s like the Twilight Zone, except if you go there on your production server, you’ll probably end up getting fired. So, dev servers only here, people.
Creatine
Let’s make a table, stuff a little bit of data in it, and make some indexes.
DROP TABLE [dbo].[Stats_Test]

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL
),
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT ISNULL([N].[N], 0) AS [ID] ,
       ISNULL(CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE())), '1900-01-01') AS [OrderDate] ,
       ABS(CONVERT(NUMERIC(18,2), (CHECKSUM(NEWID()) % 10000.00))) AS [Amt1]
INTO [Stats_Test]
FROM [Numbers] [N]
ORDER BY [N];

CREATE UNIQUE CLUSTERED INDEX [cx_id] ON [dbo].[Stats_Test] ([ID])
CREATE UNIQUE NONCLUSTERED INDEX [ix_test1] ON [dbo].[Stats_Test] ([OrderDate], [ID])
CREATE UNIQUE NONCLUSTERED INDEX [ix_test2] ON [dbo].[Stats_Test] ([Amt1], [ID])
There’s our thousand rows. If you’re dev testing against 1,000 rows, your production data had better only have 1,001 rows in it, or you’re really gonna be screwed when your code hits real data. So how do we cheat and make our data look bigger without sacrificing disk space?
Eat Clen, Tren Hard, Anavar give up
You can update all statistics on the table at once, or target specific indexes with the following commands.
UPDATE STATISTICS [dbo].[Stats_Test] WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

UPDATE STATISTICS [dbo].[Stats_Test] ([cx_id]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000
UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test1]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000
UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test2]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000
This sets your table’s row count to, uh… 10 billion, and your page count to 1 billion. The ten-to-one ratio makes sense, since a bunch of rows usually fit on a page. You can be more scientific about it than I was; this is just to give you an idea.
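If you do want to be more scientific, one option is to scale the faked PAGECOUNT off the table’s real rows-per-page ratio. Here’s a minimal sketch of that idea; the @target_rows variable and the use of sys.dm_db_partition_stats are my own additions, not anything the trick itself requires:

-- Rough math: what PAGECOUNT keeps the current rows-per-page ratio
-- if we pretend the table has @target_rows rows?
DECLARE @target_rows BIGINT = 10000000000;

SELECT [ps].[row_count] AS [actual_rows] ,
       [ps].[in_row_data_page_count] AS [actual_pages] ,
       @target_rows / NULLIF([ps].[row_count] / NULLIF([ps].[in_row_data_page_count], 0), 0) AS [proportional_pagecount]
FROM [sys].[dm_db_partition_stats] AS [ps]
WHERE [ps].[object_id] = OBJECT_ID('dbo.Stats_Test')
AND   [ps].[index_id] = 1;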
So let’s check in on our statistics! Sup with those?
DBCC SHOW_STATISTICS('dbo.Stats_Test', cx_id)
DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test1)
DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test2)
Hint: these commands won’t show the inflated row or page counts. In fact, DBCC SHOW_STATISTICS doesn’t show page counts at all. Hah. That’s kinda silly, though. Hm.
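If you’re curious where the faked numbers do end up, one place to peek is the old sys.sysindexes compatibility view, which still carries rowcnt and dpages columns. That’s an assumption on my part, and whether it reflects the faked values may depend on your version:

-- Assumption: the inflated rowcnt/dpages may (or may not) surface here
-- after the UPDATE STATISTICS trick above.
SELECT [si].[name] ,
       [si].[indid] ,
       [si].[rowcnt] ,
       [si].[dpages]
FROM [sys].[sysindexes] AS [si]
WHERE [si].[id] = OBJECT_ID('dbo.Stats_Test');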
Anyway, what we should grab from the statistics histograms are some middling values we can play with. For me, that’s an ID of 500, a date of 2016-03-18, and an amount of 4733.00.
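If you only want the histogram part of that output (which is where those middling values come from), DBCC SHOW_STATISTICS takes a WITH HISTOGRAM option that skips the header and density vector:

DBCC SHOW_STATISTICS('dbo.Stats_Test', cx_id) WITH HISTOGRAM
DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test1) WITH HISTOGRAM
DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test2) WITH HISTOGRAM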
One thing I’ve found is that the inflated counts don’t seem to change anything for Identities or Primary Keys. You’ll always get very reasonable plans and estimates regardless of how high you set row and page counts for those. Regular old clustered indexes are fair game.
Some really interesting things can start to happen to execution plans when SQL thinks there are this many rows in a table. The first is that SQL will use a plan choice that’s rare (in my experience): Index Intersection. You can think of it like a Key Lookup, except it joins two nonclustered indexes together rather than going from one nonclustered index to the clustered index.
SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[ID] = 500

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] = '2016-03-18'

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] = 4733.00
For these equality queries, we get the following plans:

SQL isn’t fooled by the equality on ID = 500; we get a little plan. We’ll examine the inequality plans in a moment. For now, let’s look at the middle plan. That’s where the Index Intersection is occurring. The bottom plan has a regular Key Lookup.

The costs and estimates here are Banana Town crazy. And right there at the bottom, we can see SQL using the Clustered Index key to join our Nonclustered Indexes together. If you’ve been reading this blog regularly, you should know that Clustered Index key columns are carried over to all of your Nonclustered Indexes.
If we switch to inequality queries, well…
SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[ID] > 500

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] > '2016-03-18'

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] > 4733.00
The top query that SQL wasn’t fooled by before now has the same insane estimates as the others. Our two bottom queries get missing index requests due to the amount of work the Index Intersection takes.
That’s happening because of the SELECT * query pattern, and it goes away if we stick to selecting only columns that are in our Nonclustered Indexes. For example, selecting just ID results in some pretty sane index seeks occurring, though the estimated rows are still way up there.
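Here’s roughly what that looks like, reusing the same middling values but only asking for ID, which both nonclustered indexes already carry:

SELECT [st].[ID]
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] > '2016-03-18'

SELECT [st].[ID]
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] > 4733.00

Each of these can be answered by a single seek on ix_test1 or ix_test2, so there’s no intersection or lookup for the optimizer to cost, even though the row estimates stay enormous.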
Unfortunately, STATISTICS TIME and IO are not fooled by our statistical tomfoolery.

They use reality-based measurements of our query activity. This trick is really only useful for seeing what happens to execution plans. But hey, it’s a lot cheaper, easier, and faster than actually inserting 10 billion rows.
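If you want to see the reality check for yourself, flip them on before rerunning any of the queries above; the reads and elapsed times will line up with the thousand actual rows, not the ten billion imaginary ones:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] > '2016-03-18'

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;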
So what?
Like a grandma in a grocery store, SQL Server makes all its decisions based on cost. Whatever is cheapest is the choice. If SQL Server were a person, it would probably wash and dry used aluminum foil, save old bread ties, and use clothes pins for the right thing.
I forget what I was going to say. Probably something smart about testing your queries against sets of data commensurate with what you have in production (or larger), so that you don’t get caught flat-footed by perf issues on code releases, or when your company finally starts getting customers. This is one technique to see how SQL will treat your code as you get more rows and pages involved.
Just don’t forget to set things back when you’re done. A regular stats update will take care of that.
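Here’s a quick cleanup sketch, cribbing the DBCC UPDATEUSAGE suggestion from the comments below; a plain UPDATE STATISTICS against the table puts the counts back in line with reality:

-- Put row and page counts back to reality once you're done experimenting.
DBCC UPDATEUSAGE (0, 'dbo.Stats_Test') WITH COUNT_ROWS
UPDATE STATISTICS [dbo].[Stats_Test]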
Thanks for reading!
7 Comments
Educational and entertaining – love it!
Miscer confirmed. U aware brah?
>that guy who comments on blog posts between sets
Does this technique only model cost estimation/bad stats issues?
Which issues does it not model?
I assume:
Parameter-sniffing
Fragmentation
Hey! No, you can’t do much with those things here.
Perhaps worth mentioning how to reset the stats afterwards:
DBCC UPDATEUSAGE (0, 'dbo.Stats_Test') WITH COUNT_ROWS
UPDATE STATISTICS [dbo].[Stats_Test]
Ref: http://dba.stackexchange.com/questions/130660/how-to-reset-statistics-after-update-statistics-with-rowcount/130663
Perhaps this may be of interest? (I used to copy stats in Oracle over to test systems, so the link below is as close as I know of on how to duplicate that in SquealySwerver.)
http://blog.sqlauthority.com/2011/01/05/sql-server-copy-statistics-from-one-server-to-another-server/
PS: Lived in Dallas for a lot of years. When the going gets weird, you know you’re in North Texas.