Using the StackOverflow database, let’s check out Krock’s query. He’s a competitive fella, and he’s looking to find users who signed up for StackOverflow after he did, but who have a higher reputation than he does.
I’m going to simplify the query a little here:
SELECT me.Id, u.Id
FROM dbo.Users me
JOIN dbo.Users u on
u.CreationDate > me.CreationDate
and u.Reputation > me.Reputation
WHERE me.Id = 557499;
The Users table has a clustered index on the Id field, so the initial seek on “me” (Krock’s row, Id = 557499) is super-fast. However, to find all the users with a higher creation date and reputation – man, that sucks scanning the clustered index for that, and it brings back a lot of rows. Here’s the execution plan:
Hover your mouse over the clustered index scan, and you get:
- Estimated number of rows = 5,277,830
- Actual number of rows = 5,277,831
Nice job, SQL Server, close enough for government work. Now it suggests a missing index – rightfully so – and let’s add it, because we want this query to go faster.
After adding the index, here’s the execution plan:
The query runs faster, make no mistake – but check out the estimates:
- Estimated number of rows = 1
- Actual number of rows = 165,367
Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?
It’s a differently shaped plan – 2012 decided to go parallel with this query. Its estimated subtree cost is 7.37 – higher than my SQL Server’s Cost Threshold for Parallelism.
The motivating factor can be discovered by hovering over that index seek:
SQL Server 2012’s cardinality estimator guessed that 475,005 users would have a newer creation date and higher reputation than Krock.
This query has so many fun lessons to share.
- Your database options matter.
- What database you run a query in matters.
- Neither 2012 nor 2016’s cardinality estimator is “right” in this case, they’re just differently wrong
- Indexes create statistics on their columns, but that doesn’t mean SQL Server has perfect information on what’s inside those columns
- Besides, SQL Server can’t know Krock’s creation date or reputation until the plan is built and the query’s already running
I find statistics to be one of the coolest topics inside the engine, and that’s why I love Doug’s newest course, Statistics: SQL Server’s Guessing Game. I bet you’re gonna love it too – go check it out.