ALL CAPS EXCITEMENT
In rather fun news, a Connect Item of mine about replacing DBCC SHOW_STATISTICS with a DMV or function got updated as ‘Resolved as Fixed’. It was only slightly odd, because I intentionally funneled votes to Greg Low’s item after realizing I had duplicated it. So, Greg, if you’re out there… Uh, hi!
Root cause: Microsoft employees also have a hard time searching Connect Items.
So what does it look like?
I have no idea. I don’t know if it’s a DMV or a function, I don’t know what it’s called, and I don’t know what information it exposes. I also don’t know how it will get joined to other DMVs. There were no details offered up when the status changed. And I’m fine with that! I’m pretty psyched that it got enough traction to get a fix to begin with. If anyone from MS feels like shooting me an email with details, I won’t complain.
But since we don’t know, we’re free to speculate. Like all those History Channel shows about aliens and fake animals and where the Templars secretly buried Jesus’ gold teeth in Arizona. It’ll be fun!
It could be a lot like sys.dm_db_index_physical_stats! Maybe sys.dm_db_stats_stats! Heh. No, don’t do that. I think sys.dm_db_statistics would work. It could take inputs for database id, object id, and optionally index id and partition number to retrieve more specific stats. One input that would be killer here is for column name, so you could see any statistics objects that reference a particular column. Why? Because system generated statistics on _WA_sys_iuwghaiuhareiugh982375 aren’t helpful. In the words of Doug Lane: “that’s crap”.
This would be cool, because it would be easy to CROSS APPLY with existing queries to gather stats info. No learning new JOINs and all that hocus pocus relational database nonsense! It’s basically NoSQL with less curly brackets and reliable results.
Now, this could go two ways. It could be additional data tacked on to sys.stats, or it could be something like sys.stats_details (yeah, not exactly inspired), and join pretty easily to sys.indexes, sys.tables, sys.partitions, etc. These are probably the most common DMVs that you’d use along with this one.
What should it tell you?
Well, most statistics information queries I’ve written return a good bit of information. Not everything is easily accessible, so this could be expanded.
- Table name
- Index name (if not system generated, or a stats object without an index)
- Statistics object name
- Name of the column used for the histogram
- Last time the stats were updated
- Rows in table/index (different for filtered indexes, etc.)
- Rows sampled by the last update
- Modifications since last update
- Modifications before auto update
- Histogram steps
- If the statistics are marked as ascending
- Use count
HECK YES USE COUNT!
No, not because I want to start dropping statistics. But this would be AWESOME for maintenance scripts like Ola Hallengren’s when deciding which stats to update. Right now, you can skip statistics that haven’t been modified, and that’s great. But why bother updating statistics that aren’t even used? Building off other information exposed, you could also choose between full or sampled updates based on how many rows are in there.
But what about the Histogram?
What is it? Why is is capitalized here and nowhere else? The histogram is where the data for the leftmost column in your [object] gets described. There can be up to 200 steps in a histogram, and there are several columns of information in there. If this is also exposed, you may need something like sys.stats_histogram, or a function that accepts a stats ID and returns the histogram as well.
What would you want to see?
Leave a comment below if you think I missed something! Who knows, there may still be time for some lucky developer to make changes before this gets released!
Thanks for reading!