In our last episode, I introduced the concept of scan predicates: execution plan operations that weren’t able to seek directly to the rows they needed. Let’s take another query:
CREATE INDEX IX_LastAccessDate_Id_DisplayName_Age
ON dbo.Users(LastAccessDate, Id, DisplayName, Age);
WHERE LastAccessDate > '1800/01/01'
AND DisplayName = N'Brent Ozar';
If we ONLY have the gray pages index on LastAccessDate, Id, DisplayName, and Age, our query plan looks like this:
I’m going to narrate this from bottom up because it makes for easier storytelling:
- Seek Predicates: SQL Server was able to seek on this index to a specific date/time. It jumped to the first user whose LastAccessDate = 1800/01/01. That’s, uh, not going to eliminate a lot of users because after all, Stack Overflow wasn’t around in the days of covered wagons. Now, SQL Server has “seeked” to that user – but now the real work begins.
- Predicate: now SQL Server examines every row to check its DisplayName, looking for people with DisplayName = ‘Brent Ozar’.
- Number of Rows Read: 299,398 – because we had to examine every single row in the table to find Brent.
You see an Index Seek here, but it is not a lightweight operation. We read every single stinkin’ row of the table.
“Then why does SQL Server call it a seek?”
When SQL Server builds execution plans, it’s going on all the information it knows at the time – which isn’t necessarily much, and can also be out of date. We talked about how SQL Server uses statistics to build plans, but statistics aren’t updated in real time.
Sure, the last time SQL Server updated statistics on this table, it knew that nobody had a LastAccessDate prior to 1800/01/01 – but it can’t guarantee that the data distribution hasn’t changed since the last time statistics were updated. Hell, it even gives you the ability to disable statistics updates, so the stats might be WAY out of date.
Granted, you know that based on the column name “LastAccessDate,” that nobody is going to access StackOverflow.com in the past. The data just doesn’t work that way. But SQL Server doesn’t know that unless you tell it – and that’s where things like constraints come in, telling SQL Server business rules about your data.
SQL Server has to just say, “Well, I know that I have an index that starts with LastAccessDate, and they’re looking for a specific LastAccessDate range. I can jump to that range and bypass any extra data that’s been added since the last time statistics were updated.”
Technically, this query plan works totally fine – there’s nothing wrong with it. But there sure is something misleading about that execution plan, isn’t there? The operator says “Index Seek”, which implies rows are being eliminated, and the “1 of 3” makes it sound lightweight too. That thick arrow is your only clue that there’s more work involved here than first meets the eye.
Oh, that thick arrow size? On actual plans, it would look like it relates to the data coming out of that execution plan operator. That’s not what it means at all – on actual plans, it refers to the amount of data that was read by that operator. The thicker the arrow, the more work is being done.
So when you see a thick arrow on an actual plan, look in the direction that the arrow is pointing away from.
(How is that for messed up? Even the arrow is drawing your attention away from the very problem. I tell you, sometimes I think Microsoft does this stuff just to give me job security.)