This came up during Office Hours
And I love questions like this, because they reveal some interesting underpinnings of how the cardinality estimator works. It was something along the lines of “I have a slow query, and when I add an index the query goes faster even though it doesn’t use the index I created.”
We can see an example of this with unique indexes and constraints, but another possibility is that the created index had better statistical information via the histogram. When you add an index, you get Fresh Hot Stats, whereas the index you were using could be many modifications behind current for various reasons. If you have a big table and don’t hit auto-update thresholds often, if you’re not manually updating statistics somehow, or if you’re running into ascending key weirdness. These are all sane potential reasons. One insane potential reason is if you have autocreate stats turned off, and the index you create is on a column that didn’t have a statistics object associated with it. But you’d see plan warnings about operators not having associated statistics.
Again, we’re going to focus on how ADDING an index your query doesn’t use can help. I found out the hard way that both unique indexes and constraints can cease being helpful to cardinality estimation when their statistics get out of date.
One Million Rows!
Here’s some setup script. You love setup script.
CREATE TABLE dbo.t1 (col1 INT NOT NULL, col2 INT NOT NULL, col3 VARCHAR(8000) NOT NULL)
CREATE CLUSTERED INDEX cx_t1_col1 ON dbo.t1 (col1)
INSERT dbo.t1 WITH (TABLOCK) ( col1, col2, col3 )
SELECT TOP 1000000
x.rn, x.rn, x.text
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) rn, m.text
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2
) AS x
I know, I know. What kind of lunatic creates a non-unique clustered index? Well, actually, a lot of you. Even after I told you how great they are! It’s a good thing I have the emotional fortitude of a week old banana.
So, table. Table needs a query. Let’s find a love connection.
FROM dbo.t1 AS t
AND t.col1 >= 1 AND t.col1 < 10001
Now, this is BY NO MEANS the worst estimate I’ve ever seen. It’s pretty close, but it’s weird that it’s still not right, because we literally just filled this table up with delicious and nutritious rows.
But we can pretend
Let’s say this rugged old table has been around since SQL Server 2005, and is chock full of customer data. Let’s pretend that being 90% right is still too wrong. We’re allowed to create an index! I bet one on just the column the clustered index is on would help.
CREATE NONCLUSTERED INDEX ix_partytime ON dbo.t1 (col1)
And now, magically, if we run that exact same query…
If you drop the index and re-run the query, the estimate goes back to 9090.63. Of course, in our case, we could have just updated statistics, but that may not be the obvious solution all the time. Given a choice, I’d much rather update stats than create a fresh index just to update stats. Which is basically what rebuilding indexes is. Heh. Anyway, I hope this helps.
Thanks for reading!