Along with the ability to create indexes (which you most definitely should be doing), SQL Server gives you the ability to create statistics. This helps SQL Server guess how many rows will come back for your searches, which can help it make better decisions on seeks vs scans, which tables to process first, and how much memory a query will need.
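Here's what that looks like – a minimal sketch, with dbo.Users and the column name as placeholders:

```sql
-- Create a single-column statistic by hand (table/column names are
-- placeholders for illustration):
CREATE STATISTICS st_Users_LastAccessDate
    ON dbo.Users (LastAccessDate);

-- Peek at the histogram SQL Server built from the sampled data:
DBCC SHOW_STATISTICS ('dbo.Users', st_Users_LastAccessDate);
```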
And I never do that.
I mean sure, I do it in training classes to show as a demo – and then I turn right around and show why it doesn’t usually get you across the query tuning finish line.
I used to think I was missing some kind of Super Secret Query Tuning Technique®, like I was just somehow doing it wrong, but then I stumbled across this note in Books Online’s how-to-create-statistics page and suddenly everything made sense:

Let me rephrase: before you even start playing around with statistics, make sure you haven’t taken away SQL Server’s ability to do this for you.
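A quick way to check that – the database name in the ALTER statements is a placeholder:

```sql
-- See which databases still have automatic statistics enabled:
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases;

-- If somebody turned them off, turn them back on:
ALTER DATABASE [YourDatabase] SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;
```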

I like to make fun of a lot of SQL Server’s built-in “auto-tuning” capabilities that do a pretty terrible job. Cost Threshold for Parallelism of 5? MAXDOP 0? Missing index hints that include every column in the table? Oooookeydokey.
But there are some things that SQL Server has been taking care of for years, and automatically creating statistics is one of ’em. If you frequently query a column, you’re gonna get a statistic on it, end of story. You might have edge case scenarios where those statistics aren’t enough, but when you do, the fix isn’t usually to create a statistic.
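Don't believe me? Auto-created statistics show up in sys.stats with names like _WA_Sys_… and auto_created = 1. A quick sketch, with dbo.Users as a placeholder table:

```sql
-- List the statistics SQL Server created on its own for one table:
SELECT s.name, s.auto_created, s.user_created, c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
    ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE s.object_id = OBJECT_ID('dbo.Users')
  AND s.auto_created = 1;
```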
The fix is usually to create the right indexes, which also happen to give you a free statistic anyway – but the index means that the data access itself will be faster.
Creating stats just tells SQL Server how bad the query is going to suck. Your users aren’t satisfied with that – they want the query to actually suck less. That’s where creating indexes comes in, and you should start your query tuning efforts there first.
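Concretely, with the same placeholder names as before:

```sql
-- Creating a statistic only improves the estimates:
CREATE STATISTICS st_Users_LastAccessDate ON dbo.Users (LastAccessDate);

-- Creating an index gives you the statistic for free, AND makes the
-- data access itself faster:
CREATE INDEX IX_Users_LastAccessDate ON dbo.Users (LastAccessDate);
```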
9 Comments
Aren’t combined statistics worth the effort? Especially under compatibility levels before 120, where SQL Server assumes there’s no correlation between, e.g., your first name and gender, the estimates could be pretty far off if you didn’t create combined statistics.
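(For readers following along: a combined statistic means a multi-column statistic, something like this sketch – the table and columns are hypothetical:)

```sql
-- A multi-column ("combined") statistic: the density information covers
-- the column combination, so correlated columns like first name and
-- gender can get better estimates:
CREATE STATISTICS st_Users_FirstName_Gender
    ON dbo.Users (FirstName, Gender);
```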
You know, I haven’t personally seen an issue where that’s the one thing that’s gotten me across the finish line, no.
It’s kinda like those “How to Survive” shows on TV. Yeah, I get that somebody, somewhere was once saved by knowing which kind of African plant they could apply to a gunshot wound, but….
I have had good luck with combined statistics when the cardinality estimator ended up with ridiculous estimates – e.g. it expects 170,000 rows and gets 7, or the reverse (hello, tempdb spills). In these situations, what is your go-to if not building smarter stats?
Typically query changes – my classic go-to is breaking the work up into multiple passes, letting SQL Server see how the results from one group of joins actually turn out, and then building a new plan for the subsequent parts of the operation. Breaking a query up into passes with temp tables is a great example.
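(A rough sketch of that temp table pattern, with made-up table names:)

```sql
-- Pass 1: materialize the selective part of the query. Once the rows
-- land in #RecentUsers, SQL Server gets statistics on the temp table,
-- so the next pass starts from an accurate row count:
SELECT u.Id, u.DisplayName
INTO #RecentUsers
FROM dbo.Users AS u
WHERE u.LastAccessDate >= DATEADD(DAY, -30, GETDATE());

-- Pass 2: join the temp table to the rest of the work. This statement
-- gets its own plan, built with better estimates:
SELECT r.DisplayName, COUNT(*) AS Posts
FROM #RecentUsers AS r
JOIN dbo.Posts AS p ON p.OwnerUserId = r.Id
GROUP BY r.DisplayName;
```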
Ouch, Brent.
I have had more than my fair share of SQL Server automated statistics regeneration problems, and I am so happy that SQL Server 2016 SP1 CU4 allows me to persist a sample rate. All my clients are in the communications world, some writing 10–20 million records per day. My primary keys have to be bigint, as 2 billion is not enough. The problem I have is that the sample rate on these large tables is so low it is a joke. I force a 25% sample rate in my nightly statistics maintenance jobs, then during the day the table changes 10% or so, SQL Server decides to update statistics with a sample rate of 0.4%, and the application goes down the tubes. I now have a way to persist a 10% or 25% sample rate on my problem tables. Thank you, Microsoft! But most of my clients are not on 2016 SP1 CU4 yet.
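(The feature Brian's describing is PERSIST_SAMPLE_PERCENT, which looks roughly like this – the table name is a placeholder, and it requires 2016 SP1 CU4 or later:)

```sql
-- Update statistics with a 25% sample, and persist that sample rate so
-- later automatic updates reuse it instead of a tiny default sample:
UPDATE STATISTICS dbo.CallDetailRecords
WITH SAMPLE 25 PERCENT, PERSIST_SAMPLE_PERCENT = ON;
```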
Brian: dumb question – if you’re doing nightly stats maintenance jobs, and you want to prevent unwanted stats changes during the day, why not just turn off auto update stats?
Definitely not a dumb question.
And yes, it is something I have been struggling with. I would love to disable automated statistics updates on a half dozen tables. I am the DBA. My scripts are similar to Ola Hallengren’s, though not as elaborate.
Our support team may see problems with my index rebuilds (Standard Edition) taking a long time, since rebuilding a primary key with over 2 billion records in it can take a while – an index must be more than 30% fragmented for a rebuild. They may decide to disable the index maintenance jobs (all of them), then forget to turn them back on. Hour by hour, day by day, performance gets worse.
Hmm, now I’m really confused. I thought your earlier comment said you were doing nightly statistics updates. Alright, let’s hit pause here on the conversation – I think this is probably a little bigger than we can accomplish via blog post comment discussions.
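(For reference, the per-table switch this exchange is circling around – turning auto-update statistics off for just a few tables instead of the whole database – can look like this; names are placeholders:)

```sql
-- Option 1: flip auto-update off for every statistic on one table:
EXEC sp_autostats 'dbo.CallDetailRecords', 'OFF';

-- Option 2: mark statistics NORECOMPUTE while updating them yourself,
-- so only your nightly maintenance job refreshes them:
UPDATE STATISTICS dbo.CallDetailRecords
WITH SAMPLE 25 PERCENT, NORECOMPUTE;
```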