Say you’ve got a memberships (or policies) table, and each membership has start & end dates:
USE StackOverflow;
GO

DROP TABLE IF EXISTS dbo.UsersMemberships;

CREATE TABLE dbo.UsersMemberships
    (Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
     UserId INT NOT NULL,
     StartDate DATETIME NOT NULL,
     EndDate DATETIME NOT NULL,
     CancelledEarlyDate DATETIME NULL);
GO
If all you need to do is look up the memberships for a specific UserId, and you know the UserId, then it’s a piece of cake. You put a nonclustered index on UserId, and call it a day.
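That index is about as simple as it gets – roughly something like this (the index name here is just illustrative):

CREATE INDEX IX_UserId ON dbo.UsersMemberships(UserId);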
But what if you frequently need to pull all of the memberships that were active on a specific date? That’s where performance tuning gets hard: when you don’t know the UserId, and even worse, you can’t predict the date/time you’re looking up, or whether it’ll always be Right Now.
Let’s say – just to say – all of our memberships are current. They’ve all started in the past, and they all expire in the future. I’m going to synthesize the data by using the Stack Overflow Users table, taking the user’s CreationDate as the start of their membership, and making up an EndDate in the future:
INSERT INTO dbo.UsersMemberships(UserId, StartDate, EndDate)
SELECT Id, CreationDate,
       DATEFROMPARTS(YEAR(GETDATE()) + 2, MONTH(LastAccessDate), DAY(LastAccessDate))
FROM dbo.Users
WHERE NOT(MONTH(LastAccessDate) = 2 AND DAY(LastAccessDate) > 28); /* Avoid Feb 29 problems */
I’m going to index the bejeezus out of both StartDate and EndDate to give SQL Server the best possible chance:
CREATE INDEX IX_StartDate_EndDate ON dbo.UsersMemberships(StartDate, EndDate);
CREATE INDEX IX_EndDate_StartDate ON dbo.UsersMemberships(EndDate, StartDate);
GO
When you’re querying for active memberships, you’ll probably run one of these two query design patterns:
/* The queries we would normally run: */
SELECT Id, StartDate, EndDate
FROM dbo.UsersMemberships
WHERE GETDATE() BETWEEN StartDate AND EndDate;
GO

SELECT Id, StartDate, EndDate
FROM dbo.UsersMemberships
WHERE StartDate <= GETDATE()
  AND EndDate >= GETDATE();
GO
In this case, estimation is easy. The actual plans show that SQL Server knows that all of the rows are going to match:
So when you just start building a membership or policy system, the query plans look nice and simple: estimation is bang on. Over time, though, your data starts to skew, and so do your estimates.
That imaginary case is easy, but in real life, some folks have expired.
To show how bad it can get, I’m going to create another contrived extreme: instead of everyone’s membership being current, I’m going to say that NO one is current. 50% of our members’ plans have expired, and 50% haven’t started yet:
TRUNCATE TABLE dbo.UsersMemberships;

/* 50% are expired. They both start and end in the past. */
INSERT INTO dbo.UsersMemberships(UserId, StartDate, EndDate)
SELECT Id, CreationDate, LastAccessDate
FROM dbo.Users
WHERE Id % 2 = 0;
GO

/* 50% haven't started yet. They both start and end in the future. */
INSERT INTO dbo.UsersMemberships(UserId, StartDate, EndDate)
SELECT Id,
       CASE WHEN GETDATE() < DATEFROMPARTS(YEAR(GETDATE()), MONTH(LastAccessDate), DAY(LastAccessDate))
            THEN DATEFROMPARTS(YEAR(GETDATE()), MONTH(LastAccessDate), DAY(LastAccessDate))
            ELSE DATEFROMPARTS(YEAR(GETDATE()) + 1, MONTH(LastAccessDate), DAY(LastAccessDate))
       END,
       DATEFROMPARTS(YEAR(GETDATE()) + 2, MONTH(LastAccessDate), DAY(LastAccessDate))
FROM dbo.Users
WHERE Id % 2 = 1
  AND NOT(MONTH(LastAccessDate) = 2 AND DAY(LastAccessDate) > 28);
GO
Now, when I run the query to check our active members again, there are a few tricky problems hidden in a plan that looks deceptively simple. It looks like a pair of simple index seeks, but they don’t perform worth a damn:
The first problem: these seeks read a lot of data to produce 0 rows.
While this looks like a pair of lightweight index seeks, both of these queries did a heck of a lot of work. Hover your mouse over either index seek, and check out the number of rows read:
They read 4.5M rows – half of the table – in order to produce a whopping 0 rows. I can almost hear you screaming, “How is that possible?!? It says SEEK right on there, so we’re seeking to the rows we need, right?”
Not exactly: look at the Seek Predicates at the bottom of that tooltip. SQL Server is opening up the EndDate_StartDate index, and it’s turning to the first row where the EndDate >= GETDATE(), and then it starts reading. But from that point forward, it has to examine every row that it finds.
To simulate what it’s doing, let’s visualize the index by crafting a query that accurately simulates what SQL Server will find when it opens the index. (To learn more about this technique, check out the very first module in my Fundamentals of Index Tuning class.)
SELECT TOP 1000 EndDate,
       ExpiredYet = CASE WHEN GETDATE() > EndDate THEN 'Expired' ELSE 'OK' END,
       StartDate,
       StartedYet = CASE WHEN GETDATE() > StartDate THEN 'OK' ELSE 'Not Started Yet' END,
       Id
FROM dbo.UsersMemberships
WHERE EndDate >= GETDATE()
ORDER BY EndDate, StartDate;
GO
We can verify that it’s an accurate visualization of the index’s contents by looking at the contents of the execution plan, and making sure that there are no sorts and no residual predicates on the index seek:
When SQL Server opens up that index on EndDate, StartDate and starts reading, it’s jumping to EndDate = GETDATE(), and it reads forward. It’s examining all future-expiring memberships, and checking to see if their StartDates are in the past:
And it has to read through all of the future-expiring memberships, hoping to find ones that started in the past. Until it reads all of them – every single future-expiring membership – it can’t be sure it has found every row that matches.
In terms of the good ol’ white pages of the phone book, it’s as if I told you to find everyone whose LastName >= P, and FirstName <= AAA. Sure, you could seek to the P’s and start reading, but you’re going to have to read through a lot of pages checking to see who has a FirstName like AAAAaandy. In the phone book example, you would jump to the P’s, find that no one matches, and then skip ahead to the Q’s. You can do that because there are a relatively limited number of letters in the alphabet, but SQL Server can’t do that because there aren’t a limited number of EndDates. There might be millions of distinct EndDates.
That sucks. But it gets worse.
The next problem: SQL Server’s estimates were way off.
SQL Server estimated that 27.5% of the table would match. He thought this query would return 3.3M of the table’s 8.9M rows:
That’s a problem for me if my index doesn’t fully cover the query. In my contrived example, I completely covered the membership query:
SELECT Id, StartDate, EndDate
FROM dbo.UsersMemberships
WHERE StartDate <= GETDATE()
  AND EndDate >= GETDATE();
GO
But eagle-eyed readers might have caught that my Memberships table has 4 columns: Id (the primary clustered key), StartDate, EndDate, and also, UserId, which links back to the Users table to show whose membership it is. If I add that UserId column in as well:
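That’s the same covered query from above, just with UserId added to the SELECT list – roughly:

SELECT Id, UserId, StartDate, EndDate
FROM dbo.UsersMemberships
WHERE StartDate <= GETDATE()
  AND EndDate >= GETDATE();
GO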
Then SQL Server ignores my nonclustered index altogether! He thinks that 3.3M key lookups would cost too much money, so he ends up scanning the table, and doing even MORE logical reads than he was doing before!
Thankfully, real-life data distributions aren’t quite THAT bad.
In real life, your own business has its own skew of expired, current, and future memberships. Based on your own skew, you might need to cluster a table like this on either EndDate, StartDate, or StartDate, EndDate. I can’t write a demo to show your exact data skew (well, I mean, I can, I’m a consultant for hire, and that’s how this whole thing started.) For the sake of this blog post, I wrote a script to populate the table with a bunch of data so that about 25% of my memberships haven’t started yet, 25% have started but already expired, 25% are current, and then 25% are kinda wacko.
For that demo script, you’ll note a few unusual table design choices (sketched in DDL after this list):
- The table is a heap
- It has a nonclustered primary key on Id
- It has a nonclustered index on EndDate, StartDate that includes Id, UserId
- It has another on StartDate, EndDate that includes Id, UserId
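In DDL form, those choices look roughly like this – a sketch reconstructed from the bullets above rather than copied from the demo script, with a made-up primary key constraint name:

CREATE TABLE dbo.UsersMemberships
    (Id INT IDENTITY(1,1) NOT NULL,
     UserId INT NOT NULL,
     StartDate DATETIME NOT NULL,
     EndDate DATETIME NOT NULL,
     CancelledEarlyDate DATETIME NULL,
     CONSTRAINT PK_UsersMemberships_Id PRIMARY KEY NONCLUSTERED (Id)); /* nonclustered PK, so the table stays a heap */

CREATE INDEX IX_EndDate_StartDate ON dbo.UsersMemberships(EndDate, StartDate) INCLUDE (Id, UserId);
CREATE INDEX IX_StartDate_EndDate ON dbo.UsersMemberships(StartDate, EndDate) INCLUDE (Id, UserId);
GO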
With that data mix populated, let’s run our reporting query, check out our estimates, and we’ll even hint it with both of the nonclustered indexes so we can see which one does less reads:
SET STATISTICS IO ON;
GO

SELECT Id, StartDate, EndDate
FROM dbo.UsersMemberships
WHERE GETDATE() BETWEEN StartDate AND EndDate;
GO

SELECT Id, StartDate, EndDate
FROM dbo.UsersMemberships
WHERE StartDate <= GETDATE()
  AND EndDate >= GETDATE();
GO
The good news is that both query plans get decent row estimations:
In addition, both pick the same IX_StartDate_EndDate index. The bad news, though, is that even though the estimate is decent, the residual scan predicate means that we seek to one point, but then we read 2x more rows than the query actually produces:
If I use index hints to test which index actually reads less data:
SELECT Id, StartDate, EndDate
FROM dbo.UsersMemberships WITH (INDEX = IX_StartDate_EndDate)
WHERE GETDATE() BETWEEN StartDate AND EndDate;
GO

SELECT Id, StartDate, EndDate
FROM dbo.UsersMemberships WITH (INDEX = IX_EndDate_StartDate)
WHERE GETDATE() BETWEEN StartDate AND EndDate;
GO
In my case, with my demo’s data distribution, the StartDate index reads fewer 8KB pages to produce the same results:
So if I needed to tune my demo data’s scenario, and if we did a lot of querying for active memberships, and I couldn’t predict the date/time we were going to query for nor could I know which UserIds we were looking for, I might:
- Put a clustered index on StartDate, EndDate (sketched in code after this list)
- Put a nonclustered index on UserId (and probably no other fields, although I’d be open to changing that if individual users could have a lot of memberships)
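In index DDL, that would look roughly like this – a sketch with made-up index names, assuming the heap version of the table from the demo script (test it against your own data skew before trusting it):

/* The demo-script version of the table is a heap, so it has no clustered index yet: */
CREATE CLUSTERED INDEX CIX_StartDate_EndDate ON dbo.UsersMemberships(StartDate, EndDate);

/* A narrow index for the "I already know the UserId" lookups: */
CREATE INDEX IX_UserId ON dbo.UsersMemberships(UserId);
GO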
I know this is one of those blog posts that’s going to provoke a lot of “why don’t you do ___” comments, but I’ll be honest: I’m not going to answer them here because you can answer ’em yourself. Download a Stack Overflow database, run my script to populate the UsersMemberships table, and then run your own experiments to test your ideas. When you find something that seems like it works, you’re welcome to share it in the comments. You can also use a GitHub gist to share your source code.
One approach I expect folks to suggest is to build a child table like this, with a row for each day that a subscription is active (sketched in code after this list):
- UserId
- MembershipId
- ActiveDate
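For concreteness, that child table might look roughly like this – a sketch of the suggestion, with a hypothetical table name and key choice:

CREATE TABLE dbo.UsersMembershipsActiveDays
    (UserId INT NOT NULL,
     MembershipId INT NOT NULL,
     ActiveDate DATE NOT NULL,
     CONSTRAINT PK_UsersMembershipsActiveDays
         PRIMARY KEY CLUSTERED (ActiveDate, MembershipId)); /* one row per membership per day it's active */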
So if your subscription was active for 300 days, you’d have 300 rows in that table. Unfortunately, solutions like that can be tough to scale, too: using our UsersMemberships table, the number of rows would get pretty ugly if we had to maintain all history. I’d be at 7 billion rows even with this simplified example where each user only has one membership:
Razor-sharp readers like yourself may have noticed the column that I didn’t mention in this post. That particular column makes this exercise deliciously difficult – more on that in the next post.
19 Comments
Just create a persisted computed column calculating whether or not the row is active based off the current time. That will work great, won’t it? 🙂
People constantly suggest that at first too, hahaha, and then they realize the problem when they’re thinking through the details. Time is a cruel mistress.
As I was typing it out I was imagining SQL dedicating CPU cycles to recalculating it frequently behind the scenes and shuddered.
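For anyone curious what that attempt looks like, a minimal sketch (with a made-up column name) – SQL Server rejects the PERSISTED version because GETDATE() is nondeterministic, and without PERSISTED it just gets recalculated on every query anyway:

/* Hypothetical column name; this fails because GETDATE() is nondeterministic: */
ALTER TABLE dbo.UsersMemberships
    ADD IsActiveNow AS
        (CASE WHEN GETDATE() BETWEEN StartDate AND EndDate THEN 1 ELSE 0 END)
    PERSISTED;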
Or you could have an active table that is truncated and populated at midnight via a scheduled job…the membership only expires at the end of the day right?
Unfortunately, businesses usually need to check eligibility for past dates too – as in, show me if the membership/policy was active as of last Thursday.
See, this is why you get the big bucks… 😉
HA!
[…] Brent Ozar shows why the StartDate + EndDate pattern is not great for filtering: […]
What about using a Temporal table? Add a user to the table when they “Start” and have a Job delete them when the End Date is reached, or in a trigger where CancelledEarlyDate is set to the current date (I’m assuming that you only set CancelledEarlyDate to the current date; if not, then into the previous Job they go).
Then you can simply use the Temporal querying for any specific date.
Scott – sure, but then you’d have to change the queries (often dramatically, as I talk about in the next post when the early cancelation date column comes into play.)
I wish I could tell people, “hey, just change the way you store and query data,” but in real life, that’s usually a pretty tough sell.
This would require quantum physics (so an object can have 2 different values at the same time, like the start and the end) in order to make it work great 😉
Maybe in SQL2119
How about treating timestamps like points in 1-dimensional space and using the heavy artillery of the spatial goodies built into the database engine?
Filip – go ahead and read the whole post, particularly the part about suggestions.
I’ve read that part before posting my question (no fingers crossed). I’ll try that myself when I manage to accumulate some free time. I won’t forget that topic, because I’m dealing with sub-1TB tables where long scans on timestamp columns pain me every week.
Wonder where you got that scenario from… 😉 Thank you for sharing, big hug my friend!
I thought you’d like that! Good hearing from you, and hope to see you again soon.
I do indeed!
Same here.
Hi,
I’d like to know when the first order and last order were placed for a given product. How do I extract the first order start date and last order end date from the same column (OrderDate)? Do I need to use dateadd() or the lead function?
Hi, Chris. Unfortunately I can’t do free personalized consulting & training here in the comments.