This week, we’re all about tuning indexes. So far, we’ve covered Brent’s 5 and 5 Rule and The D.E.A.T.H. Method. Today, let’s talk about reducing blocking and deadlocking.
Normally, when we think about the causes of blocking or deadlocks, we use badly written queries that modify different tables in the wrong order, resulting in a Mexican standoff series of blocks. They don’t even have to be delete/update/insert (DUI) queries, either – even selects can win deadlocks. So we usually use sp_BlitzLock to look at which queries are involved in a deadlock, and think about how we can tune those queries to reduce blocking.
But before you go making extensive, expensive changes to queries, take a quick look at your indexes and ask, “Am I indexing a hot field?”
In my free How to Think Like the Engine class, I use the Stack Overflow dbo.Users table to demonstrate how clustered and nonclustered indexes work. Along the way, we constantly run queries like this:
1 2 3 4 |
SELECT * FROM dbo.Users WHERE LastAccessDate > '2014-07-01' AND DisplayName = 'Brent Ozar'; |
In that class, in order to make some queries go faster, we create an index:
1 |
CREATE INDEX IX_LastAccessDate ON dbo.Users(LastAccessDate); |
Sure, that makes the SELECT go faster – but it comes at a price. Say that every time a logged-in Stack Overflow user visits a page, our application updates their LastAccessDate:
1 2 3 |
UPDATE dbo.Users SET LastAccessDate = GETUTCDATE() WHERE Id = 26837; |
By itself, in the code we’ve shown so far, that isn’t really a big deal. Sure, we need to update the clustered index of dbo.Users to reflect their new LastAccessDate, and we also need to update the IX_LastAccessDate index – but who cares?
Indexing “hot” fields has interesting side effects.
People think that when you include a field in an index – not as part of a key, but just included – that it’s somehow magically delicious. Take this index, for example:
1 2 |
CREATE INDEX IX_Location ON dbo.Users(Location) INCLUDE (LastAccessDate, Reputation); |
We’re just including LastAccessDate and the user’s Reputation score – but we’re not sorting the data in that order, so it’s not a big deal, right? Wrong – every time either LastAccessDate or Reputation is updated, we have to:
- Get the IX_Location page in memory
- Lock the row for this user
- Make the change to LastAccessDate and/or Reputation
- Commit and release our lock (only after we’ve changed all of the relevant data pages)
- Write the data page to disk
And I’m not even including the transaction log work, just simply the changes to the data page! It’s not a lightweight change – it’s a big deal. To make matters worse, imagine a setup with 5 different indexes on dbo.Users, and we’ve decided to include the LastAccessDate and Reputation fields on every index. We have to lock & write all of ’em every time any of these fields change!
This blocks queries that might have otherwise been able to use the Location index, and didn’t even need LastAccessDate or Reputation – just a seek on Location – but they’re not allowed to make progress on rows where we’re modifying LastAccessDate or Reputation.
Depending on our workloads, we might be better off making a rule to say, “Never include LastAccessDate or Reputation in any index – if you need that, you need to do a key lookup, because we can’t afford the penalty of updating it across lots of indexes.”
SQL Server doesn’t tell you whether a column is hot or not.
When you get missing index recommendations in query plans or in the DMV, SQL Server will happily tell you to index hot fields – fields that are constantly updated. There’s no way it can flag problem fields for you.
That’s where your own business knowledge has to come in. That’s why, when I talk about how many servers one person can manage, I say that if you’re going to go really deep in index performance tuning, there are only so many applications you can tune. You’re going to need business knowledge about the app.
As your application workload grows, it can be hard to strike the right balance of indexes. You want enough indexes to make your read queries fast, but not so many that they slow down your DUI queries. When you start to hit a wall, it’s time to learn to master index tuning.
9 Comments. Leave new
re : SQL Server doesn’t tell you whether a column is hot or not
– isn’t the information in sys.dm_db_index_operational_stats aimed at answering this question ?
Stephen – think through that a little deeper. If I have an index with 3 key fields and 4 included fields, and that index is getting a lot of writes, how can I tell which field(s) are changing?
modification_counter in sys.dm_db_stats_properties ? Going to be a fun query to fetch the columns in the index and then work out which statistic relates to which column though (lets ignore multi column statistics )
Exactly – that ain’t gonna work, heh heh heh. Not all columns are going to have stats. Fun thought exercise though!
Something like this but obviously you get odd semi-cartesian joins when there are multiple columns in indexes or multiple columns in statistics plus I haven’t even thought about partitioning.
SELECT OBJECT_SCHEMA_NAME(ixs.object_id), OBJECT_NAME(ixs.object_id), *
FROM sys.dm_db_index_operational_stats(DB_ID(),OBJECT_ID(‘Fact.f_AdsPipeline’), NULL, NULL) AS ixs
JOIN sys.index_columns AS ixc ON
ixs.object_id = ixc.object_id AND
ixs.index_id = ixc.index_id
LEFT OUTER JOIN sys.stats_columns AS sc ON
ixc.object_id =sc.object_id AND
ixc.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_properties(sc.object_id, sc.stats_id)
ORDER BY ixc.index_id, ixc.column_id
Looks like a good start! I’m curious to see where you go with it. That’d be an awesome foundation for an open source script!
Hey Stephen,
Did you ever take that script anywhere?
Best,
Doug
Interesting. I modified the code to below – then look for occurrences of a column in an index, as well as statistics for just the column, where the sample counts deviate. I wonder which statistic the engine looks at when determining what to do if the numbers vary?
This knocks out MS stuff as well as heaps. When you see a difference and rebuild the stats – I do notice it clears the modification_counter column in sys.dm_db_stats_properties(sc.object_id, sc.stats_id).
SELECT OBJECT_NAME(ixs.object_id) AS [Table], SI.Name AS [Index], SC.Column_ID, SD.*
FROM sys.dm_db_index_operational_stats(DB_ID(),OBJECT_ID(‘Fact.f_AdsPipeline’), NULL, NULL) AS ixs
INNER JOIN sys.index_columns AS ixc ON ixs.object_id = ixc.object_id AND ixs.index_id = ixc.index_id
INNER JOIN sys.all_objects AS AO ON ixc.object_id = AO.object_id AND AO.is_ms_shipped = 0 AND AO.type = ‘U’
INNER JOIN sys.indexes AS si ON ixc.OBJECT_ID = si.OBJECT_ID AND ixc.index_id = si.index_id AND SI.type_desc ‘HEAP’
LEFT OUTER JOIN sys.stats_columns AS sc ON
ixc.object_id =sc.object_id AND
ixc.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_properties(sc.object_id, sc.stats_id) AS SD
–WHERE OBJECT_NAME(ixs.object_id) = ‘Single Table filter’
ORDER BY OBJECT_NAME(ixs.object_id), SI.Name;
Taking it to the Brent level – this will generate a script to update the statistics for any of the columns with changes.
SELECT OBJECT_SCHEMA_NAME(ixs.object_id) AS [Schema], OBJECT_NAME(ixs.object_id) AS [Table], SI.Name AS [Index], SC.Column_ID, SD.*,
‘UPDATE STATISTICS ‘ + OBJECT_SCHEMA_NAME(ixs.object_id) + ‘.’ + OBJECT_NAME(ixs.object_id) + ‘ ‘ + ST.Name + ‘ WITH FULLSCAN;’ AS Script–ixc.*–OBJECT_SCHEMA_NAME(ixs.object_id), OBJECT_NAME(ixs.object_id), *
FROM sys.dm_db_index_operational_stats(DB_ID(),OBJECT_ID(‘Fact.f_AdsPipeline’), NULL, NULL) AS ixs
INNER JOIN sys.index_columns AS ixc ON ixs.object_id = ixc.object_id AND ixs.index_id = ixc.index_id
INNER JOIN sys.all_objects AS AO ON ixc.object_id = AO.object_id AND AO.is_ms_shipped = 0 AND AO.type = ‘U’
INNER JOIN sys.indexes AS si ON ixc.OBJECT_ID = si.OBJECT_ID AND ixc.index_id = si.index_id AND SI.type_desc ‘HEAP’
LEFT OUTER JOIN sys.stats_columns AS sc ON
ixc.object_id =sc.object_id AND
ixc.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_properties(sc.object_id, sc.stats_id) AS SD
INNER JOIN sys.stats AS ST ON sc.object_id = ST.object_ID AND SD.stats_id = ST.stats_ID
–WHERE OBJECT_NAME(ixs.object_id) = ”
WHERE ISNULL( SD.Modification_counter, 0 ) 0
ORDER BY OBJECT_NAME(ixs.object_id), SI.Name;
Cut out and run the ‘Script’ column and your stats are updated for the columns which have had unprocessed changes.
To get fancy, you could insert into a temp table and then execute the commands as a set.