I see a lot of indexes
Most of them are pretty vanilla. Some are pretty memorable, like one that I saw with 93 INCLUDES. A lot of things that people could do with indexes get overlooked, either because they don’t know about them, or they’re not sure when they’ll help. So, here goes my attempt at shedding light on some of my favorite indexing tricks. This is by no means exhaustive, just a few pointers in the right direction.
Unique Indexes
Uniqueness is one of the most important pieces of information you can give to the optimizer. If you have faith in your data, I highly recommend testing this out. It really helps with row estimates. Imagine you have a query that wants IDs 1-100. It’s much easier for the optimizer to come up with a good plan if it knows it will only see each of those IDs once. The impact gets bigger as the number of rows you’re hitting grows. An additional upside is that you can use columns in unique indexes in foreign keys. This can be especially helpful if you’re working with hierarchies. To create a unique index, the syntax is simple. Of course, you can use Included columns, and you can also create them as…
1 |
CREATE UNIQUE NONCLUSTERED INDEX ix_happydays ON dbo.Table (column(s)... ) |
Filtered Indexes
Are you one of those crazy people who doesn’t write SELECT * FROM dbo.Table queries constantly? Do you ever use those crazy little things called WHERE clauses? Filtered indexes can really help! Sometimes! There are rules, of course. You can’t use certain operators, like BETWEEN, but you can use all the equalities and inequalities (=, >, >=, <, <=, <>). You can’t use NOT IN or CASE. You also can’t use OR, but you can use IN. Weird, right?
And no, you can’t use date functions to have constantly shifting 30/60/90 day windows. Not allowed. If you want to do that, you’ll either have to drop and re-create the index with the correct window, or put the filtered index on a BIT column that you flip to 1/0 with an UPDATE to capture the window you want. Then you filter on the BIT column rather than the date column.
Also, if you use any non-default ANSI SET options, you may see failing INSERT/DELETE statements. Which ones?
ON:
- SET ANSI_NULLS ON
- SET ANSI_PADDING ON
- SET ANSI_WARNINGS ON
- SET ARITHABORT ON
- SET CONCAT_NULL_YIELDS_NULL ON
- SET QUOTED_IDENTIFIER ON
OFF:
- SET NUMERIC_ROUNDABORT OFF
Creating filtered indexes is really simple, but you’ll definitely to at least put whichever column(s) you’re filtering on as INCLUDEs. It can give SQL an extra leg up when searching within the filter definition. Boy, this list keeps going downhill. It’s…
1 |
CREATE INDEX ix_VirginiaSlims ON dbo.Table (column(s)...) INCLUDE (filtercol) WHERE filtercol some expression |
Descending
Man, sorting sucks. Sorting really sucks. I hate it. Not like, within a query necessarily, but at the end. Whether it’s via ORDER BY or OFFSET/FETCH. I just get annoyed when I see it. Even worse, I get annoyed when I see expensive SORT operators. One thing a lot of people don’t realize is that SQL hates sorting, too. That’s why it makes it so miserably expensive! But you crazy kids rarely pay attention to reverse incentives, so let’s talk about this situation, and how indexes can help.
If your order by calls for DESC, and you want to make it as painless as possible for SQL Server, that’s how you should define your index. One sort of tricky bit here is that when you index to support sorts, you really have to put the ordering columns all the way left in your index definition. You’ll have to consider any JOIN or WHERE predicates second. It can absolutely be worth it, especially if sorts are routinely spilling to tempdb. Sorting strings is especially painful, I can’t figure out why you’d want that even if I had a…
1 |
CREATE INDEX ix_downwardspiral ON dbo.Table (SortaKinda DESC) |
Computed Column
These have rules too! Boo! Rules! The same rules as filtered indexes for ANSI SET options, actually. There are also rules about when they can be PERSISTED. This is a cool feature, because you don’t need to compute the column at query time. It will get computed and stored on disk like every other column. The flip side of this is that it can slow down table modifications if your computation is expensive.
In short, a computed column can’t be persisted if the output would change between runs. The simplest example is GETDATE(). You can’t persist a computed column with GETDATE() in it because it will change from run to run. Keeping it up to date at the storage level would be miserable. You can’t modify a computed column directly, because that wouldn’t make any sense. It’s computed. That means you don’t compute it.
But they’re cool in that they can be used in all kinds of constraints! PK, NOT NULL, and CHECK, to be specific. But this is about indexes, and some types of computed columns can be indexed for even better performance gains. Lucky you! Here are some examples of what can be added as a persisted column, and what can be indexed.
1 2 3 4 5 6 7 8 9 |
ALTER TABLE dbo.SalesOrders ADD F1 AS DATEADD(DAY, 1, OrderDate) /*Works*/ ALTER TABLE dbo.SalesOrders ADD F2 AS DATEADD(DAY, 1, GETDATE()) /*Works*/ ALTER TABLE dbo.SalesOrders ADD F3 AS DATEADD(DAY, 1, OrderDate) PERSISTED /*Works*/ ALTER TABLE dbo.SalesOrders ADD F4 AS DATEADD(DAY, 1, GETDATE()) PERSISTED /*Fails*/ CREATE INDEX ix_filter1 ON dbo.SalesOrders (F1) /*Works*/ CREATE INDEX ix_filter2 ON dbo.SalesOrders (F2) /*Fails*/ CREATE INDEX ix_filter3 ON dbo.SalesOrders (F3) /*Works*/ CREATE INDEX ix_filter4 ON dbo.SalesOrders (F4) /*Fails because the column add failed*/ |
One thing to be extra careful about is using scalar UDFs in computed columns, which used to be a popular way to break out…
XML
You’re all using XML. All of you. And you’re doing crazy things with it. You’re using XQuery like it’s going… Well, it is going out of style.
All the cool kids use JSON now. Didn’t you get the invitation to the party? No? Weird. Like you.
XML indexes can help make things faster! But there are, of course, more rules. Rules you must follow, lest you be henpecked to death by red text in SSMS. The table must have a Primary Key on it. The Primary XML index demands it. Which brings us to the next rule! You need to define a Primary XML index! There are also Secondary XML indexes, which can be created by PATH, VALUE, and PROPERTY. As you can guess, they all have different use cases. I’ve never been able to nail down a use case for PROPERTY, but PATH and VALUE can help with the .exist() method, depending on how you’re querying the XML.
And of course, if you’re on SQL Server 2012+, you can use Selective XML indexes! These don’t require a Primary XML index, but do require you to specify all the paths/values/nodes you want to index. The benefit is that they’re much smaller than other XML indexes, and from the little I’ve gotten to play with them, a lot faster, too. If you’re interested, I suggest you set aside plenty of time to figure out all the syntax possibilities, as well as the limitations and guidelines. I’ll follow up with a post about them in the near future.
So how are you doing?
Are you using indexes to their full potential, or are you just creating the same ol’ same ol’? Don’t get me wrong, good same ol’ same ol’ indexing can work wonders, but sometimes you need to reach a little deeper to correct a problem.
I left out ColumnStore and the different Compressed indexes here because they’re Enterprise only, not because I don’t like either as a feature. ColumnStore especially has come a long way in 2016.
Thanks for reading!
Brent says: indexes are the single biggest technique to make your database faster with as little code change and expense as possible. You can go an incredibly long way just by working with indexes, and when you think you’ve exhausted your possibilities, there’s almost always more new tricks to discover.
10 Comments. Leave new
Indexes descending really.. would you not just create tons of fragmentation when using indentity() keys and wouldn’t double linked pages solve an Order by desc on an ascending index??
Hi Kasper,
Who cares about index fragmentation? C’mon now.
And no, double linked pages don’t solve the problem. Scanning an index backwards can’t be parallelized, and may require additional memory grants, and cause spills to disk to sort the data the way you’re asking for it.
Multiple punctuation marks for emphasis,
Erik
Hi Erik, can you elaborate on the filtered indexes part where you say “if you use any non-default ANSI SET options, you may see failing INSERT/DELETE statements.”, please?
Are these options referring to when creating the index, or in the statements which fire off the inserts/deletes?
I once encountered this error and didn’t find a lot of info on the subject.
Thanks!
Hi Sam,
Sure, it’s for the statement that fires the modification off.
For the documentation, head over here, then ctrl + f for ‘Required SET Options for Filtered Indexes’
Thanks!
Thanks for the link!
The ability of unique indexes to have include columns and the inability for unique constraints to have them is why I insist that my developers always use the former instead of the latter. Even if you don’t need include columns at first, you might later, and it’s much easier to do a CREATE INDEX WITH DROP_EXISTING than to script out a DROP of the constraint and then a CREATE of the replacement index.
Hi, Erik!
Looking into https://msdn.microsoft.com/en-us/library/ms177484%28v=sql.105%29.aspx I’ve found this: “Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.” Does it mean that every node of the index contains row locator to the data row? It seems that it is in contradiction with B+ tree definiton or am I completely in the fog?
How else would you implement key or rid lookups without doing this?
I assume the clustered index key or RID lives in the leaf level only.
This is correct for unique indices only. With nonunique indices SQL Server keeps clustered index key or RID on any level to unqiquefy every node. I think we don’t need it for select purpose but we do need it for update or delete purposes.