When we left off in the last post, our users kept running this query, and they want it to be really fast:
SELECT Id, LastAccessDate   -- SELECT list shown for completeness; the index below covers exactly these columns
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
Let’s pre-bake the data by creating a copy of the table sorted in a way that we can find the right rows faster:
CREATE INDEX IX_LastAccessDate_Id
ON dbo.Users(LastAccessDate, Id);
This builds a separate copy of our table (also stored in 8KB pages) that looks like this:
This index slows down inserts and deletes.
The first thing to notice is that you now have two physical copies of the table. Every time you insert a new row, you have to add it in two places. Every time you delete a row, you have to delete it in two places. You’ve effectively doubled the amount of writes that your storage has to do. (Updates are a little trickier – more on that in another post.)
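If you want to see that doubled write work for yourself, here's a sketch using the real DMF sys.dm_db_index_operational_stats (the INSERT column list and values are made up for illustration):

```sql
-- Insert one row, then check how many leaf-level writes each index took.
INSERT INTO dbo.Users (DisplayName, LastAccessDate)
VALUES (N'Test User', GETDATE());

SELECT i.name, i.index_id, os.leaf_insert_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.Users'), NULL, NULL) AS os
JOIN sys.indexes AS i
  ON i.object_id = os.object_id
 AND i.index_id  = os.index_id;
-- Expect leaf_insert_count to tick up on BOTH the clustered index
-- and IX_LastAccessDate_Id: one insert, two writes.
```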
The second thing to notice is that the nonclustered index (the black page) is denser: you can fit more users per 8KB page because we chose to store fewer columns. You can see that by running sp_BlitzIndex and looking at the top result set, which shows how big each index is:
The clustered index (marked CX PK) has ~300K rows and takes up 58.1MB on disk because it contains all of the table's columns.
The nonclustered index has the same number of rows, but it only takes up 5.3MB because we're only storing LastAccessDate and Id in it. The more columns you add to an index – whether they're in the key or in the includes – the larger the object gets on disk. (I'll talk more about index design as this series continues.)
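To reproduce those size numbers yourself, sp_BlitzIndex (part of the free First Responder Kit) takes database, schema, and table name parameters. A typical call looks like this (the database name here is an assumption – point it at wherever your Users table lives):

```sql
EXEC sp_BlitzIndex
     @DatabaseName = 'StackOverflow',  -- assumed database name
     @SchemaName   = 'dbo',
     @TableName    = 'Users';
```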
But this index pays off dramatically for selects.
Try our select query again – and here, I’m running two queries back to back. I’m running the first query with INDEX = 1 as a hint to show the cost of the clustered index scan. (Index #1 is your clustered index.)
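Here's roughly what that back-to-back test looks like – the SELECT list is assumed, and the WITH (INDEX = 1) table hint forces the clustered index:

```sql
-- Query 1: forced to use the clustered index (index id 1)
SELECT Id, LastAccessDate
FROM dbo.Users WITH (INDEX = 1)
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

-- Query 2: the optimizer is free to pick the new nonclustered index
SELECT Id, LastAccessDate
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
```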
Our new bottom plan is much simpler:
- We get a seek, not a scan – because SQL Server can jump right to the rows where LastAccessDate > '2014/07/01'. It doesn't need to scan the whole object to find 'em.
- We don’t need a sort – because as we read the data out, it’s already sorted by LastAccessDate.
The index seek’s cost is lower for two reasons: it doesn’t need the sort, and it reads fewer 8KB pages. To see how many fewer, let’s check out the messages tab:
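The page and CPU numbers in the messages tab come from the real SET STATISTICS options – turn them on before running the queries, and the output looks something like this (exact figures will vary):

```sql
SET STATISTICS IO, TIME ON;
-- The messages tab then reports, per query, lines along these lines:
--   Table 'Users'. Scan count 1, logical reads 7405, ...
--   SQL Server Execution Times: CPU time = 110 ms, elapsed time = ...
```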
The clustered index scan read 7,405 pages and took 110ms of CPU time to do the ORDER BY.
The new nonclustered index only read 335 pages – like 20X less! – and didn’t need CPU time for ORDER BY. Really good index designs like this are how you can quickly get 20X (or much higher) improvements in your queries without rewriting the queries. Do I wish every query was perfectly tuned? Sure – but even if it was, you’re still gonna need to give the engine a helping hand by organizing the data in more searchable ways.
This index is called a covering index.
The index perfectly covers this query, satisfying everything it needs in a quick, efficient way. The term “covering” isn’t a special kind of index that you create with different syntax – “covering” just refers to the combination of this index AND this query. If you change the query at all – like we’re going to do here shortly – then the index may no longer be covering, and we’ll have to do more work again.
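For example, here's a sketch of how a small query change breaks coverage – DisplayName is just an illustrative extra column, and the wider index is one hypothetical way to cover again:

```sql
-- Same query, but now also selecting DisplayName (illustrative column):
SELECT Id, DisplayName, LastAccessDate
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

-- IX_LastAccessDate_Id no longer covers this query.
-- One hypothetical fix: carry the extra column in the leaf pages.
CREATE INDEX IX_LastAccessDate_Id_Includes
ON dbo.Users(LastAccessDate, Id)
INCLUDE (DisplayName);
```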
So our plan has an index seek – that’s as good as a plan can get, right? No, as we’ll see in the next episode.