Key length matters in SQL Server indexes.
It’s a best practice to keep your index keys as narrow as possible, and SQL Server enforces a maximum key length of 900 bytes on most “normal” clustered and nonclustered indexes.
But what happens if you want to optimize the lookup of a wide column? You’re not necessarily out of luck, you may just have to get a bit creative.
What If I Need to do an Equality Search on a Wide Column?
Let’s say I have a simple table. I have a narrow key on my clustered index and then I have a pretty wide variable length column. I need the wide column to be unicode, which makes it even wider, since unicode data types take up more room.
Here’s our sample table with a few rows (just pretend it has a lot more):
CREATE TABLE dbo.LookupValues ( i int identity, bigval nvarchar(2000) default (REPLICATE('d',700)), constraint pk_LookupValues_i primary key (i) ); GO --Insert rows with the default values begin tran declare @i smallint = 0; while @i < 10000 begin insert dbo.LookupValues default values; set @i=@i+1; end commit GO --Insert a few smaller values insert dbo.LookupValues (bigval) VALUES ('big'); insert dbo.LookupValues (bigval) VALUES ('bunny'); insert dbo.LookupValues (bigval) VALUES ('bunny bunny'); GO
Let’s say we write to this table rarely, but query it often. When this query runs, I want to make it as fast as possible:
SELECT i from dbo.LookupValues where bigval = N'bunny';
Right now, this query has to scan every row in the clustered index (the whole table) to find instances where bigval=N’bunny’. That’s not ideal, and as the table grows it’ll become worse and worse, burning more IO and CPU, and taking longer over time.
There’s usually an easy way to make a query like this fast: just create a nonclustered index on the bigval column. But when I try, it doesn’t work because of restrictions on key size.
--Make my query faster! CREATE NONCLUSTERED INDEX ix_LookupValues_bigval on dbo.LookupValues (bigval); GO
Warning! The maximum key length is 900 bytes. The index 'ix_LookupValues_bigval' has maximum length of 4000 bytes. For some combination of large values, the insert/update operation will fail. Msg 1946, Level 16, State 3, Line 1 Operation failed. The index entry of length 1400 bytes for the index 'ix_LookupValues_bigval' exceeds the maximum length of 900 bytes. The statement has been terminated.
Terminated. Yeah. I can’t just index this to make my query fast.
Options for Indexing Wide Keys
So what’s a performance tuner to do?
My first thought when I hit this problem was that I might have to use a fulltext index. A fulltext index can work here– it lets you index large columns, but it would be kind of a bummer to have to do it. Fulltext indexes have extra overhead and are really designed for different things than doing a simple equality search, so it would be like using a jackhammer because you can’t find a mallet.
My partner Jeremiah Peschka came up with a quick and clever solution using an indexed computed column. You can work all sorts of cool magic with computed columns in SQL Server– the trick is just to remember them!
Here’s how it works: you add a computed column to the table that’s the hash of the large value. You then index the computed column and modify your query to take advantage of it.
In this example we use SHA_512 for the hashing algorithm. This will give an output of 64 bytes– well within our limits for index key sizes.
ALTER TABLE dbo.LookupValues ADD bigvalhash AS HASHBYTES('SHA2_512', bigval) PERSISTED; GO CREATE NONCLUSTERED INDEX ix_LookupValues_bigvalhash on dbo.LookupValues (bigvalhash) INCLUDE (bigval); GO
Now, to get the query work, we need to change it a bit:
SELECT i from dbo.LookupValues where bigvalhash = HASHBYTES('SHA2_512', N'bunny') and bigval = N'bunny'; GO
This revised approach gives me a targeted index seek and limits my logical reads. Voila!
The Fine Print on This Solution
There are a few things to note:
- HASHBYTES results are dependent upon datatype. If my query used HASHBYTES(‘SHA2_512′, ‘bunny’), it would not find any rows, because the column is hashed unicode values and I provided a hashed non-unicode value.
- I do still include “bigval= N’bunny’” in my query. In theory there shouldn’t be collisions with SHA-512, but it doesn’t add much expense to the query and in my example I deemed it “worth it” to me. You might make a different choice.
Sometimes Old Tools Do the Trick
What I love most about this solution is that it’s creative, but it’s not really weird, when you think about it. It uses standard features that have been in SQL Server for a long time to create a way to do something that seems like the product wouldn’t support– and that’s really cool.