In our last couple of queries, we’ve been using a simple query to find the Ids of everyone who accessed the system since mid-2014:
1 2 3 4 |
SELECT Id FROM dbo.Users WHERE LastAccessDate > '2014-07-01' ORDER BY LastAccessDate; |
But Ids alone aren’t all that useful – so let’s add a few more columns to our query:
1 2 3 4 |
SELECT LastAccessDate, Id, DisplayName, Age FROM dbo.Users WHERE LastAccessDate > '2014-07-01' ORDER BY LastAccessDate; |
Now think about how you’re going to execute this query plan in plain English, as a human being. You have two copies of the table: the nonclustered index (black pages) with LastAccessDate, Id:
And the clustered index (white pages) with all of the columns in the table:
One way is to use the nonclustered index first.
We could:
- Grab the black pages, seek to 2014-07-01, and start making a list of the LastAccessDates and Ids we want
- Grab the white pages, and for each Id we found in the black index, look up their Id in the white pages to get the extra columns (DisplayName, Age) that aren’t included in the white index
Here’s what that query plan looks like:
We read plans from right to left, but also top to bottom. The first thing SQL Server did was the Index Seek at the top right – that’s the seek on the black pages. SQL Server seeked to a specific date/time, read out the LastAccessDate & Id for each row, and that was the end of the Index Seek’s work. (He’s a totally separate standalone mini-program.)
Then, SQL Server took that list of Ids, and looked up each one of them on the white pages – that’s the Key Lookup operation. SQL Server used their clustering key – in this case, Id, because that’s what we built our clustered index on. (If you don’t build a clustered index, this same basic process still happens – but more on that in another episode.)
That’s why every nonclustered index includes the clustering keys.
When I first told you I created the black index, I said I did this:
1 2 |
CREATE INDEX IX_LastAccessDate_Id ON dbo.Users(LastAccessDate, Id); |
But I didn’t have to. I could have just done this:
1 2 |
CREATE INDEX IX_LastAccessDate ON dbo.Users(LastAccessDate); |
And we would have ended up with the same basic black pages. SQL Server adds the clustering keys into every nonclustered index because it has to be able to accomplish these key lookups. For every row it finds in a nonclustered index, it has to be able to track back to that one unique row in the clustered index.
I just told you I created the index on both columns in the beginning because I had to ease you into this. I didn’t wanna say anything, but you weren’t particularly with it that first morning you started reading this series.
Yes, I’m still holding things back from you right now even as I type this. No, I didn’t want to say that because it’s only going to make things awkward between us. Also, you’re not my only reader. You’re still the one I love the most, though. Now let’s just move on before you ask more questions.
Key Lookup operators obfuscate a lot of details.
I wish execution plans were three-dimensional: I wish operators popped off the page based on the number of times they were executed. You see that “Key Lookup” and you think it only happened once, but that’s not true at all. Hover your mouse over it, and you’ll see Number of Executions – that’s how many times it was actually performed:
Earlier, I wrote that when Microsoft paints stuff on a screen, they either put it in alphabetical order, or they load up the Data Cannon, fire it, and wherever the data goes, that’s where it ends up. Here, this tooltip was arranged by the Data Cannon, so we’re going to jump around a little.
- Estimated Number of Executions is based on how many rows SQL Server thought was going to come out of the Index Seek. For each row we found, we’re gonna have to execute one Key Lookup.
- Estimated Number of Rows – how many rows EACH key lookup will return.
- Number of Executions – how many times we actually did it, based on how many rows actually came out of the index seek.
- Actual Number of Rows – the total number of rows that came out of ALL key lookups (not each one.)
I know. Sometimes they say Actual, sometimes they don’t. Sometimes the numbers are per-execution, sometimes they’re total. If you’re looking for consistency and organization, you don’t use the Data Cannon. I’m sure they treat the internals of SQL Server with much more accuracy and care.
The cost of the key lookup is logical reads.
Each time we execute the key lookup – in this case, 1,576 times – we’re opening up the white pages and doing some logical reads. To see it, let’s run a no-lookup query (that only gets Id) and a multi-column query (that gets DisplayName and Age too) and compare their logical reads:
The top query only does 7 reads, and the bottom one does 4,842! Those 1,576 key lookups needed to do about 3 logical reads each. (Why not just 1 read each? Because they need to navigate a tree structure of pages to find exactly the right page that has the data they want – there’s a couple/few pages of overhead each time you need to seek in an index. We’ll talk about alternatives to that later too.)
The more rows your index seek returns, the more likely it is that you won’t get this execution plan at all. Notice that in this post, I didn’t use 2014-07-01 as my filter date – I used something much more recent. You’ll learn why in the next episode.
9 Comments. Leave new
The ‘Think Like the SQL Engine’ series are fantastic. Great work Brent.
Awww, thanks sir!
Totally agree. You explain complex things in such a clear and concise manner and I feel I have learned a lot from your articles/emails.
Thanks, Robin! I have such a good time writing these, and it’s incredibly fulfilling knowing that other folks are getting something out of it.
[…] How to Think Like the SQL Server Engine: What’s a Key Lookup? One of the episodes of awesome series from Brent Ozar. […]
Hey Brent! Is 3 logical reads per key lookup dependent on the B-Tree structure or is 3 logical reads pretty much standard per key lookup?
Thanks!
I’ll answer that with a question: if you were going to check your own tables to find out, how might you construct an experiment to find out?
Related: https://ozar.me/2017/01/the-two-best-things-paul-randal-taught-me/
I’ll do you one better… I paid for your Mastering Index Tuning class and watched the module on avoiding key lookups and residual predicates and found my answer there lol.
There you go!