“Throughout history, slow queries are the normal condition of man. Indexes which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people who don’t think bit columns are selective enough to lead index keys. Whenever this tiny minority is kept from creating indexes, or (as sometimes happens) is driven out of a SCRUM meeting, the end users then slip back into abject query performance.
This is known as “Business Intelligence.”
–Bobby Q. Heinekens
Fake quotes and people aside
Let’s look at a scenario where you have a BIT column that’s fairly selective, and perhaps the rest of your predicates are ranged. This isn’t so out of the ordinary, especially because people like to know when stuff happened and how many times it happened.
“How many lunches have I eaten today?”
“Where did that bear learn to drive?”
“Am I being slowly disassembled on a molecular level by millions of tiny black holes?”
Yes, China or Florida, and Probably!
So let’s figure this out quick
```sql
WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL
    SELECT NULL UNION ALL SELECT NULL
),
E2(N) AS (
    SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j
),
Numbers AS (
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM E2
)
SELECT
    IDENTITY(BIGINT, 1, 1) AS ID,
    ABS(CHECKSUM(NEWID()) / 100000000) + 1 AS CustomerID,
    ISNULL(CONVERT(DATE, DATEADD(MINUTE, -N.N, GETDATE())), '1900-01-01') AS OrderDate,
    CASE WHEN N.N % 19 = 0 THEN 1 ELSE 0 END AS isBit
INTO NotAshleyMadisonData
FROM Numbers N
ORDER BY OrderDate;

ALTER TABLE NotAshleyMadisonData
ADD CONSTRAINT PK_NotAshleyMadisonData
PRIMARY KEY CLUSTERED (ID) WITH (FILLFACTOR = 100);

CREATE NONCLUSTERED INDEX IX_BITFIRST ON dbo.NotAshleyMadisonData (isBit, OrderDate);

CREATE NONCLUSTERED INDEX IX_DATEFIRST ON dbo.NotAshleyMadisonData (OrderDate, isBit);
```
I’ve also gone ahead and created two indexes (neither one filtered!) to avoid the appearance of impropriety. The first one goes against the oft-chanted mantra of never leading your index with a BIT column. The other complies with your thumb-addled rules of index creation, where your more unique column comes first — though not with the opposing rule to lead your index with equality predicates and then range predicates.
Only 52,631 rows out of a million have a BIT value of 1. And with the exception of the first and last date values, each date has 75 or 76 BIT = 1 rows.
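If you want to sanity-check those numbers yourself, a couple of quick aggregates against the demo table above will do it (a sketch; nothing here beyond the table and columns already created):

```sql
-- How selective is the bit column overall?
SELECT namd.isBit, COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
GROUP BY namd.isBit;

-- How are the isBit = 1 rows spread across dates?
SELECT namd.OrderDate, COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
GROUP BY namd.OrderDate
ORDER BY namd.OrderDate;
```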
If you had to do this in your head, which would you do first? Find all the BIT = 1 rows, and then only count occurrences from the desired range? Or would you find your start and end dates, and then count all the BIT = 1 values?
(Hint: it doesn’t matter, you’re not the query engine. Unless you’re Paul White. Then maybe you are. Has anyone seen them in the same room together?)
Images of Query Plans
```sql
SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
AND namd.OrderDate BETWEEN '2013-09-25' AND '2015-08-20'; --All dates

SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
AND namd.OrderDate BETWEEN '2013-09-25' AND '2014-09-01'; --About half the dates

SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.OrderDate BETWEEN '2013-09-25' AND '2014-09-01'
AND namd.isBit = 1; --Flipping them doesn't change anything

SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.OrderDate = '2013-09-26' --It's not until here that the other index gets used
AND namd.isBit = 1;
```
Put on your recap cap
This is another case where knowing the data and the query patterns in your environment is important. It’s easy to overlook or reject obvious things when you’re bogged down by dogma.
The stamp of approval for an idea shouldn’t come from blogs, forums, white papers, or hash tags. It should come from how much something helps in your environment.
Thanks for reading!
5 Comments
So maybe this is a stupid question, but WHY did it decide to use the new index? Is it because it uses a date equality rather than a range query?
The engine should (will?) pick the index that will enable it to read a minimum number of pages.
The best selectivity for a specific date (using equality) means using that index. The statistics for those indexes should/did enable the engine to pick the best index of the two.
For the other queries, the WHERE clause on the dates spans either the entire table or half of it, so only the bit = 1 predicate mattered to provide relief from a table scan.
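One way to see that page-read difference for yourself (a sketch, reusing the queries from the post) is to turn on STATISTICS IO and compare the logical reads each plan reports:

```sql
SET STATISTICS IO ON;

-- Range query: the (isBit, OrderDate) index should report fewer logical reads here
SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
AND namd.OrderDate BETWEEN '2013-09-25' AND '2014-09-01';

-- Date equality: this is where the (OrderDate, isBit) index wins instead
SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.OrderDate = '2013-09-26'
AND namd.isBit = 1;

SET STATISTICS IO OFF;
```

The "Table 'NotAshleyMadisonData'. Scan count … logical reads …" messages in the Messages tab are the numbers the optimizer is effectively trying to minimize.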
Regards,
Bertin
Would you consider using a Filtered Index here?
I’d certainly try a filtered index for this. It’s a technique we’ve used with some success on tables with an IsActive flag.
You could have two filtered indexes for a similar amount of space. Whether you would run into problems with cached plans is something I would want to explore.
I absolutely would, in normal DBA real job life. For this blog post, I didn’t use a filtered index in either case to keep the example as simple as possible.
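For anyone curious, a filtered index along the lines the commenters describe might look like this (a sketch against the demo table; the index name is made up):

```sql
-- Hypothetical filtered index: only the isBit = 1 rows are stored,
-- so it's a fraction of the size of the unfiltered indexes above
CREATE NONCLUSTERED INDEX IX_DATEFIRST_FILTERED
ON dbo.NotAshleyMadisonData (OrderDate)
WHERE isBit = 1;
```

Keep in mind the optimizer will only choose a filtered index when it can prove at compile time that the query's predicate is covered by the filter — which is exactly where the cached-plan concern above comes from, since a parameterized isBit value can't match the WHERE clause of the filter.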