Is leading an index with a BIT column always bad?

“Throughout history, slow queries are the normal condition of man. Indexes which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people who don’t think bit columns are selective enough to lead index keys. Whenever this tiny minority is kept from creating indexes, or (as sometimes happens) is driven out of a SCRUM meeting, the end users then slip back into abject query performance.

This is known as “Business Intelligence.”

–Bobby Q. Heinekens

Fake quotes and people aside

Let’s look at a scenario where you have a BIT column that’s fairly selective, and perhaps the rest of your predicates are ranged. This isn’t so out of the ordinary, especially because people like to know when stuff happened and how many times it happened.

“How many lunches have I eaten today?”

“Where did that bear learn to drive?”

“Am I being slowly disassembled on a molecular level by millions of tiny black holes?”

Yes, China or Florida, and Probably!

So let’s figure this out quick

Here’s one table with one million rows in it. Since it’s random, if you run this on your own it may turn out a little different for you, but I’m sure you can adapt. You are, after all, wearing the largest available diaper size.

I’ve also gone ahead and created two indexes (neither one filtered!) to avoid the appearance of impropriety. The first one goes against the oft-chanted mantra of not leading your index with a BIT column. The other complies to your thumb-addled rules of index creation where your more unique column comes first, though not to an opposing rule to lead your index with equality predicates and then range predicates.

Only 52,631 rows out of a million have a BIT value of 1. And with the exception of the first and last date values, each date has 75 or 76 BIT = 1 columns.

If you had to do this in your head, which would you do first? Find all the BIT = 1 rows, and then only count occurrences from the desired range? Or would you Find your start and end dates and then count all the BIT = 1 values?

(Hint: it doesn’t matter, you’re not the query engine. Unless you’re Paul White. Then maybe you are. Has anyone seen them in the same room together?)

Images of Query Plans

Peter Godwin will never play my birthday party.
Peter Godwin will never play my birthday party.

Put on your recap cap

This is another case where knowing the data, and knowing the query patterns in your environment is important. It’s easy to overlook or reject obvious things when you’re bogged down by dogma.

The stamp of approval for an idea shouldn’t come from blogs, forums, white papers, or hash tags. It should come from how much something helps in your environment.

Thanks for reading!

Previous Post
Bieber Decisions: The Top 5 Mistakes DBAs Regret Later
Next Post
Fake moustaches, barbecue, and SQL Server.

5 Comments. Leave new

  • So maybe this is a stupid question, but WHY did it decide to use the new index? Is it because it uses a date equality rather than a range query?

    Reply
    • Bertin Losier
      August 26, 2015 7:09 pm

      The engine should (will?) pick the index that will enable it to read a minimum number of pages.

      The best selectivity for a specific date (using equality) means using that index. The statistics for those indexes should/did enable the engine to pick the best index of the two.

      For the other queries, the where clause on the dates span either the entire table or half of it, so only the bit = 1 mattered to provide relief from a table scan.

      Regards,
      Bertin

      Reply
  • Would you consider using a Filtered Index here?

    Reply
    • I’d certainly try a filtered index for this. Its a technique we’ve used with some success for some tables with an IsActive flag on them.

      You could have 2 indexes filtered indexes for a similar amount of space. Whether you would run into problems with cached plans is something I would want to explore.

      Reply
    • I absolutely would, in normal DBA real job life. For this blog post, I didn’t use a filtered index in either case to keep the example as simple as possible.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.