“Throughout history, slow queries are the normal condition of man. Indexes which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people who don’t think bit columns are selective enough to lead index keys. Whenever this tiny minority is kept from creating indexes, or (as sometimes happens) is driven out of a SCRUM meeting, the end users then slip back into abject query performance.
This is known as “Business Intelligence.”
–Bobby Q. Heinekens
Fake quotes and people aside
Let’s look at a scenario where you have a BIT column that’s fairly selective, and perhaps the rest of your predicates are ranged. This isn’t so out of the ordinary, especially because people like to know when stuff happened and how many times it happened.
“How many lunches have I eaten today?”
“Where did that bear learn to drive?”
“Am I being slowly disassembled on a molecular level by millions of tiny black holes?”
Yes, China or Florida, and Probably!
So let’s figure this out quick
```sql
WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL
    SELECT NULL UNION ALL SELECT NULL
),
E2(N) AS (
    SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j
),
Numbers AS (
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM E2
)
SELECT
    IDENTITY(BIGINT, 1, 1) AS ID,
    ABS(CHECKSUM(NEWID()) / 100000000) + 1 AS CustomerID,
    ISNULL(CONVERT(DATE, DATEADD(MINUTE, -N.N, GETDATE())), '1900-01-01') AS OrderDate,
    CASE WHEN N.N % 19 = 0 THEN 1 ELSE 0 END AS isBit
INTO NotAshleyMadisonData
FROM Numbers N
ORDER BY OrderDate;

ALTER TABLE NotAshleyMadisonData
ADD CONSTRAINT PK_NotAshleyMadisonData
PRIMARY KEY CLUSTERED (ID) WITH (FILLFACTOR = 100);

CREATE NONCLUSTERED INDEX IX_BITFIRST ON dbo.NotAshleyMadisonData (isBit, OrderDate);

CREATE NONCLUSTERED INDEX IX_DATEFIRST ON dbo.NotAshleyMadisonData (OrderDate, isBit);
```
I’ve also gone ahead and created two indexes (neither one filtered!) to avoid the appearance of impropriety. The first one goes against the oft-chanted mantra of never leading your index with a BIT column. The other complies with your thumb-addled rules of index creation, where your more unique column comes first — though not with the opposing rule to lead your index with equality predicates and then range predicates.
Only 52,631 rows out of a million have a BIT value of 1. And with the exception of the first and last date values, each date has 75 or 76 BIT = 1 rows.
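If you want to sanity-check those numbers yourself, a couple of quick aggregates against the demo table above will do it (a sketch; nothing here beyond the table and columns already created):

```sql
-- How selective is the bit column overall?
SELECT namd.isBit, COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
GROUP BY namd.isBit;

-- How are the isBit = 1 rows spread across dates?
SELECT namd.OrderDate, COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
GROUP BY namd.OrderDate
ORDER BY namd.OrderDate;
```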
If you had to do this in your head, which would you do first? Find all the BIT = 1 rows, and then only count occurrences from the desired range? Or would you find your start and end dates, and then count all the BIT = 1 values?
(Hint: it doesn’t matter, you’re not the query engine. Unless you’re Paul White. Then maybe you are. Has anyone seen them in the same room together?)
Images of Query Plans
```sql
SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
AND namd.OrderDate BETWEEN '2013-09-25' AND '2015-08-20'; --All dates

SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
AND namd.OrderDate BETWEEN '2013-09-25' AND '2014-09-01'; --About half the dates

SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.OrderDate BETWEEN '2013-09-25' AND '2014-09-01'
AND namd.isBit = 1; --Flipping them doesn't change anything

SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.OrderDate = '2013-09-26' --It's not until here that the other index gets used
AND namd.isBit = 1;
```
Put on your recap cap
This is another case where knowing the data and the query patterns in your environment is important. It’s easy to overlook or reject obvious things when you’re bogged down by dogma.
The stamp of approval for an idea shouldn’t come from blogs, forums, white papers, or hash tags. It should come from how much something helps in your environment.
Thanks for reading!
5 Comments
So maybe this is a stupid question, but WHY did it decide to use the new index? Is it because it uses a date equality rather than a range query?
The engine should (will?) pick the index that will enable it to read a minimum number of pages.
The best selectivity for a specific date (using equality) means using that index. The statistics for those indexes should/did enable the engine to pick the best index of the two.
For the other queries, the WHERE clause on the dates spans either the entire table or half of it, so only the bit = 1 predicate mattered to provide relief from a table scan.
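One way to see that page-read difference for yourself (a sketch, reusing the queries from the post) is to turn on STATISTICS IO and compare the logical reads each plan reports:

```sql
SET STATISTICS IO ON;

-- Range query: the (isBit, OrderDate) index should report fewer logical reads here
SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.isBit = 1
AND namd.OrderDate BETWEEN '2013-09-25' AND '2014-09-01';

-- Date equality: this is where the (OrderDate, isBit) index wins instead
SELECT COUNT_BIG(*) AS Records
FROM dbo.NotAshleyMadisonData AS namd
WHERE namd.OrderDate = '2013-09-26'
AND namd.isBit = 1;

SET STATISTICS IO OFF;
```

The "Table 'NotAshleyMadisonData'. Scan count … logical reads …" messages in the Messages tab are the numbers the optimizer is effectively trying to minimize.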
Regards,
Bertin
Would you consider using a Filtered Index here?
I’d certainly try a filtered index for this. It’s a technique we’ve used with some success on tables with an IsActive flag.
You could have two filtered indexes for a similar amount of space. Whether you would run into problems with cached plans is something I would want to explore.
I absolutely would, in normal DBA real job life. For this blog post, I didn’t use a filtered index in either case to keep the example as simple as possible.
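For anyone curious, a filtered index along the lines the commenters describe might look like this (a sketch against the demo table; the index name is made up):

```sql
-- Hypothetical filtered index: only the isBit = 1 rows are stored,
-- so it's a fraction of the size of the unfiltered indexes above
CREATE NONCLUSTERED INDEX IX_DATEFIRST_FILTERED
ON dbo.NotAshleyMadisonData (OrderDate)
WHERE isBit = 1;
```

Keep in mind the optimizer will only choose a filtered index when it can prove at compile time that the query's predicate is covered by the filter — which is exactly where the cached-plan concern above comes from, since a parameterized isBit value can't match the WHERE clause of the filter.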