
Tales From Overindexing: Too Many One Column Indexes

By Erik Darling · November 29, 2018 · 8 comments

Master Plan

Sometimes you see a query, and it’s hitting one table, and then the query plan looks like a vase full of spaghetti.

Usually, there’s a view involved.

Usually.

Sometimes, there’s just really weird indexing.

Here’s A Thing, Look At It

This is a query against one table. No views, no functions, no triggers.

How To Think Like An Index

When most people think about the way a query can use indexes, they usually think it’s limited to one, or sometimes two, if you do a key lookup. And most of the time, that’s about what happens.

A key lookup is when SQL Server gives your nonclustered index inadequacy issues. It’s basically saying “I’m busy Saturday, let’s hang out on Tuesday.”

If we have this index, and this query, we’ll get a Key Lookup.

CREATE INDEX ix_whatever ON dbo.Posts(ClosedDate, AnswerCount, AcceptedAnswerId);

SELECT *
FROM   dbo.Posts AS p
WHERE  p.AcceptedAnswerId > 0
AND    p.AnswerCount > 0
AND    p.ClosedDate >= '20100101';


A Key Lookup is basically a join between the clustered index and a nonclustered index. This is one reason why clustered index key columns are in all your nonclustered indexes, and why you should be really careful about how you choose your clustered index key columns.

In the plan, the Key Lookup's Seek Predicate is the join relationship between the clustered and nonclustered indexes, which is the clustered index key column.

For every row that comes out of the nonclustered index, we grab the associated key column value, and join it to the clustered index to produce the columns that aren’t in the nonclustered index.
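To make the "join" part concrete, here's a rough sketch of a Key Lookup written out by hand as a self-join. It assumes Id is the clustered primary key of dbo.Posts (as it is in the Stack Overflow database) and that ix_whatever from above exists; the aliases nc and p are just for illustration. It returns the same rows as the query above, though the plan shape may differ.

```sql
-- Hand-written equivalent of a Key Lookup (sketch).
-- Assumes Id is the clustered primary key of dbo.Posts.
SELECT p.*
FROM   dbo.Posts AS nc WITH (INDEX (ix_whatever)) -- narrow index: ClosedDate, AnswerCount, AcceptedAnswerId (+ Id)
JOIN   dbo.Posts AS p                             -- clustered index: supplies every other column
    ON p.Id = nc.Id                               -- the clustered key is the join column
WHERE  nc.AcceptedAnswerId > 0
AND    nc.AnswerCount > 0
AND    nc.ClosedDate >= '20100101';
```

One difference: a real Key Lookup is always driven by a Nested Loops join, while for this hand-written version the optimizer is free to pick a hash or merge join instead.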

You can’t control what kind of join gets used here (though that would be neat, maybe).

This is the most common way SQL Server uses multiple indexes in one query, but not the only one.

What Happens With A Bunch Of Narrow Indexes?

Like, say, one on almost every column in Posts, but with only one column in each index.

CREATE INDEX ix_posts_AcceptedAnswerId ON dbo.Posts(AcceptedAnswerId);
CREATE INDEX ix_posts_AnswerCount ON dbo.Posts(AnswerCount);
CREATE INDEX ix_posts_ClosedDate ON dbo.Posts(ClosedDate);
CREATE INDEX ix_posts_CommentCount ON dbo.Posts(CommentCount);
CREATE INDEX ix_posts_CommunityOwnedDate ON dbo.Posts(CommunityOwnedDate);
CREATE INDEX ix_posts_CreationDate ON dbo.Posts(CreationDate);
CREATE INDEX ix_posts_FavoriteCount ON dbo.Posts(FavoriteCount);
CREATE INDEX ix_posts_LastActivityDate ON dbo.Posts(LastActivityDate);
CREATE INDEX ix_posts_LastEditDate ON dbo.Posts(LastEditDate);
CREATE INDEX ix_posts_LastEditorDisplayName ON dbo.Posts(LastEditorDisplayName);
CREATE INDEX ix_posts_LastEditorUserId ON dbo.Posts(LastEditorUserId);
CREATE INDEX ix_posts_OwnerUserId ON dbo.Posts(OwnerUserId);
CREATE INDEX ix_posts_ParentId ON dbo.Posts(ParentId);
CREATE INDEX ix_posts_PostTypeId ON dbo.Posts(PostTypeId);
CREATE INDEX ix_posts_Score ON dbo.Posts(Score);
CREATE INDEX ix_posts_Tags ON dbo.Posts(Tags);
CREATE INDEX ix_posts_ViewCount ON dbo.Posts(ViewCount);


And then a query that kinda sorta does some self-like joins with some specific predicates.

SELECT *
FROM dbo.Posts AS p
WHERE p.AcceptedAnswerId > 0
AND p.AnswerCount > 0
AND p.ClosedDate >= '20100101'
AND p.OwnerUserId IN
(
    SELECT p2.OwnerUserId
    FROM dbo.Posts AS p2
    WHERE p2.CommentCount > 0
    AND p2.CommunityOwnedDate IS NULL
    AND p.FavoriteCount = p2.FavoriteCount
    AND p.ViewCount = p2.ViewCount
)
AND p.CreationDate IN
(
    SELECT p3.CreationDate
    FROM dbo.Posts AS p3
    WHERE p3.LastActivityDate >= '20100101'
    AND p3.LastEditDate >= '20100101'
    AND p3.LastEditorDisplayName > ''
    AND p.LastEditorUserId = p3.LastEditorUserId
)
AND p.ParentId IN 
(
    SELECT p.ParentId
    FROM dbo.Posts AS p4
    WHERE p.PostTypeId = p4.PostTypeId
    AND p.Score = p4.Score
    AND p.Tags LIKE p4.Tags + '%'
);


Unnatural Selection

You end up with the query plan I showed you earlier.

What’s happening here is called index intersection.
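Index intersection is roughly the same trick as a Key Lookup, except the join is between two or more nonclustered indexes rather than a nonclustered index and the clustered index. A minimal hand-written sketch, assuming Id is the clustered key and that the single-column indexes above exist, pins each predicate to its own narrow index with INDEX hints and joins the two on Id:

```sql
-- Hand-written index intersection (sketch).
-- Assumes Id is the clustered key and the single-column indexes above exist.
SELECT a.Id
FROM   dbo.Posts AS a WITH (INDEX (ix_posts_AnswerCount)) -- AnswerCount + Id
JOIN   dbo.Posts AS c WITH (INDEX (ix_posts_ClosedDate))  -- ClosedDate + Id
    ON c.Id = a.Id                                        -- joined on the clustered key, not either leading column
WHERE  a.AnswerCount > 0
AND    c.ClosedDate >= '20100101';
```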

It’s pretty rare to see it for a variety of reasons. Most people don’t create a single column index on every eligible column of the table. There’s a special place in hell for people who do that.

By hell, I mean a soup restaurant in a Faraday Cage, with no liquor license, and live comedy.

It’s sort of an expensive process, joining a bunch of nonclustered indexes together, so the optimizer has to really think it’s a worthwhile strategy. You’re talking about reading N separate objects, hoping they’re not locked, joining all those objects together, etc.

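Remember that you’re probably not joining any of these indexes together on their leading key column. They’re joined on the clustered index key, and rows coming out of a range seek on a single-column index are ordered by that index’s key, not by the clustered key. So many of the joins are hash joins, or merge joins that require sorts. Both of those things will drive up the query memory grant.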
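If you want to see how big that memory grant gets, one option is the MemoryGrantInfo in the actual execution plan; another is to watch sys.dm_exec_query_memory_grants from a second session while the query runs. A minimal sketch of the second approach:

```sql
-- Run from a second session while the big query is executing.
SELECT deqmg.session_id,
       deqmg.requested_memory_kb,
       deqmg.granted_memory_kb,
       deqmg.ideal_memory_kb,
       deqmg.used_memory_kb,
       dest.text AS query_text
FROM   sys.dm_exec_query_memory_grants AS deqmg
CROSS APPLY sys.dm_exec_sql_text(deqmg.sql_handle) AS dest
WHERE  deqmg.session_id <> @@SPID; -- skip this monitoring session
```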

In all, this query uses 10 nonclustered indexes and does three separate key lookups back to the clustered index. You could really cut down on the amount of joining, sorting, and hashing by adding some composite indexes that let our query get all its data from a single source.
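As a rough sketch of what "some composite indexes" could look like, here is one possible set, one per subquery, keyed on the columns each subquery correlates or matches on, with its remaining filter columns included. The index names are hypothetical, and these are untested against the plan above, so treat them as a starting point. Because the outer query is SELECT *, it will still need a lookup for columns no nonclustered index carries; ix_whatever from earlier already covers its three filters.

```sql
-- Hypothetical composite indexes, one per subquery (names are illustrative).

-- p2: correlated on FavoriteCount and ViewCount, matched on OwnerUserId.
CREATE INDEX ix_posts_FavoriteCount_ViewCount_OwnerUserId
    ON dbo.Posts (FavoriteCount, ViewCount, OwnerUserId)
    INCLUDE (CommentCount, CommunityOwnedDate);

-- p3: correlated on LastEditorUserId, matched on CreationDate.
CREATE INDEX ix_posts_LastEditorUserId_CreationDate
    ON dbo.Posts (LastEditorUserId, CreationDate)
    INCLUDE (LastActivityDate, LastEditDate, LastEditorDisplayName);

-- p4: correlated on PostTypeId and Score; Tags is only used as a LIKE pattern.
CREATE INDEX ix_posts_PostTypeId_Score
    ON dbo.Posts (PostTypeId, Score)
    INCLUDE (Tags);
```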

Thanks for reading!

Brent says: this blog post stems from a query we saw in the wild leveraging something like 8 different indexes on the same table – even though the table was only specified twice in the FROM clause! I was so impressed by the sheer number of indexes that I said we should blog about it, except that it would probably take days of experimenting to make this happen with the Stack Overflow database. And of course Erik did it that day.


8 comments

Psalm

November 29, 2018 at 11:56 am

Excellent article. This came just in time; I was looking at a similar query/issue. Thank you.

1. Erik Darling
  
  November 30, 2018 at 7:26 am
  
  Psalm — always glad to be timely.
  
Forrest

November 29, 2018 at 2:56 pm

Both horrible and beautiful. Nice repro!

1. Erik Darling
  
  November 30, 2018 at 7:29 am
  
  Forrest — yes, like medium rare chicken.
  
Peter Vandivier

November 29, 2018 at 4:46 pm

How timely! I just linked this on a running PR where I’m begging the devs to cut down on single-column indexes. Left out the color commentary though…

1. Erik Darling
  
  November 30, 2018 at 7:29 am
  
  Peter — that’s fine. It’s a black and white issue, anyway.
  
  Maybe.
  
James Smith VIP Student since 2017

November 30, 2018 at 7:25 am

“It’s pretty rare to see it for a variety of reasons. Most people don’t create a single column index on every eligible column of the table. There’s a special place in hell for people who do that.”

When I first joined the company I work at, they had a process in the application that a developer could run that would add an index for every column of every table in all databases. That was the first process I removed from the system, and I am still cleaning up databases to this day because of it.

1. Erik Darling
  
  November 30, 2018 at 7:30 am
  
  James — yes, job security is a wonderful thing.
  