Date Tables are Great for Users, but Not So Great for Performance

Date tables help users filter data based on day of week, holiday, this year vs last year, and other kinds of filters that business users usually wanna add to their reports. If you haven’t used a date table before, here are a couple of primers on how to create one and use it:

And for the purposes of this post, I’m going to use the date_calendar table created in the second bullet point above. Here’s what it looks like:

Typically when people use date tables, they join from their real tables over to the date table, and they do their date calculations & filtering by just specifying columns in the date table.

For example, if you wanted to find out what day of the week people usually sign up for Stack Overflow accounts (database), you might write a query like this:

The results look like this:

Pretty cool! But…does it perform well? To find out, let’s write an old-school version of the query that doesn’t use a date table, and we’ll compare the two using SET STATISTICS IO, TIME ON to measure CPU, duration, and logical reads:

The metrics tell an interesting story:

  • Date table method: 100,074 logical reads, CPU time 3,422 ms, elapsed time 3,425 ms
  • Old school method: 20,095 logical reads, CPU time 3,671 ms, elapsed time 1,467 ms

The actual execution plans are obviously different:

So why did the date table not perform as well as the old-school way?

SQL Server doesn’t understand the relationship between these two tables. It simply doesn’t know that all of the rows in the Users table will match up with rows in the calendar table. It assumes that the calendar table is doing some kind of filtering, especially given our CAST on the date. It doesn’t expect all of the rows to match.

SQL Server decides it’s going to scan the date_calendar table first, and then for each date that it finds, it does an index seek into the Users index on CreationDate:

If you hover your mouse over that index seek on CreationDate, the number of executions are worth a laugh:

This index seek was done over 18,000 times – once for every day in our calendar table.

Because SQL Server thought the date table would do some filtering, it also underestimated the number of rows it’d find in Users, so it also underestimated the CPU work required, so the date table approach only went single-threaded. The old school way understood that lots of rows would be involved, so the work was parallelized across multiple threads.

That’s a simple query where I’m not even doing filtering – but query plans get even more predictable as you build atop this, adding filters on the date table looking for weekdays, specific days of the month, or ranges. SQL Server just has a hard time guessing how many rows are going to come out of the related tables. The more complex your filtering becomes, the crazier the estimates become.

So if I have date tables, what should I do next?

This quick experiment doesn’t mean that you should abandon your date tables and force everyone to do calculations against the base tables. Date tables are WONDERFUL, and I recommend them highly.

However, you just need to be aware that:

  • If you filter on the date table rather than the base tables, you’ll likely get inaccurate estimates
  • The more you rely on functions or ranges of data in the date tables, the more likely you are to get increasingly bad estimates
  • To fix these estimates, you may need to add relational columns on your base tables that define exactly which rows in the calendar table match which rows in your base table
  • When doing performance tuning on a query that uses a date table, consider briefly tweaking the query to not use the date table, and see how much better the execution plan gets. If it’s a night and day difference, you may need to work on the date table’s relationships or switch this one particular query to filter directly on the base tables rather than the date tables.

Got questions? Want to try different approaches to my queries above? Grab the Stack Overflow database and try running your own experiments. That’s why I love using the Stack Overflow database as a sample: it contains real-world distribution on stuff like dates so you can prove your own ideas or disprove mine.

Watch me write this post

I streamed this post live on Sunday, July 19:

Previous Post
[Video] The Top 10 Developer Mistakes That Won’t Scale on Microsoft SQL Server
Next Post
SQL Server Problems We Don’t Have Anymore

22 Comments. Leave new

  • Not The DBA You're Looking For
    August 4, 2020 9:17 am

    I disagreed with this less and less as I read through it.

    First thoughts:
    1) That CAST is unfair
    2) Having a foreign key on the base table would help no end

    Then you actually touched on each point. My main take away is, If you’re going to implement a date table, implement it correctly

    Reply
    • The cast IS unfair, but…that’s how a lot of folks use date tables. Now about the foreign key – best to try it and go see if it really helps. (Hint: in this case, it won’t.)

      Reply
    • That CAST is unfair indeed. There’s no issue with a date table, there’s an issue with the data modelling. I did almost the exact same query on my data warehouse, and I got a vastly different execution plan (hint: it was fast).
      If you’re going to open up your database to users (as the title suggests), then model your database for user access (in other words: build a data warehouse). Even if you just follow the Kimball method, you’ll model it correctly.

      Reply
      • We could frame all database problems that way, right? “You don’t have a stored procedure problem – it’s just your data modeling.” 😉

        Ah, if only everyone modeled their database correctly, the world would be a delightful place.

        Reply
        • Koen Verbeeck
          August 5, 2020 3:44 am

          I know I know. So many issues come from data modelling alone 🙂
          I just took offense at the title. It seems like date tables lead to performance issues, but that’s not the case. Only when you use non-SARGable expressions in your joins, they give issues as you demonstrated, but that problem is much wider than date tables alone.

          Reply
          • Yeah, it just stems from the fact that I’ve seen 3 clients in a row doing it this same way. I figure if I grab people’s attention with the title, then they’ll at least read to see if they’re doing date tables that way.

            If I write it as, “If you use a cast, blah blah” then they won’t necessarily catch it and read the post.

          • (Oh I should also add – it’s not only when you use non-SARGable expressions. I’ve got another post in the queue about how date tables cause problems with row estimations when you filter on things like weekday/holiday columns in the date tables.)

          • Koen Verbeeck
            August 5, 2020 3:56 am

            You and your clickbait articles 😀
            Looking forward to the other post about date tables.

  • Interesting. I suppose my inability to replicate the poor results in your test are from not knowing enough about your dbo.Users table. When I remove the CAST I get completely parallelized results, but the table I created is completely different from the one you used. I haven’t tried it yet, but would it be worth putting a date_low and date_high column in the date_calendar table and indexing them, to change that CAST to a BETWEEN clause?

    Reply
  • Interesting post, I often find date tables implemented in the most bizarre way… I wonder: what if instead of the table, a view that build the dates on the fly is used? You know, with dateadd, row_number and so on, without relying on a physical table.. I’ll give it a try as soon as I have some spare time

    Reply
  • […] Brent Ozar walks through a scenario in which a calendar table (AKA, date dimension) makes a query pe…: […]

    Reply
  • Well, the example you used made no sense, IMHO. Calendar table has lots of great usages and you shown a case where it has no usage at all.

    Did you try to compare one of the typical use cases when we need to find number of work dates (e.g. not a holiday and not a weekday) between 2 dates and do some calculation using some extra, say, Orders table?

    Reply
  • The title was catchy indeed, BTW 🙂

    Reply
  • There’s no shield, I’m afraid, that would save you from people writing bad SQL and people designing bad models. All we can or maybe should count on (or are forced to do) is Microsoft taking steps to make changes to the internals of SQL Server so that even bad SQL is transformed into good SQL. When you know what you do, you do it carefully and right. When all you want is to get A result, you don’t care about anything else. You just write and go home. Can’t be helped. But at least we, who do care, have well-paid jobs 🙂

    Reply
  • Esther Neustadt
    August 10, 2020 10:05 am

    Thanks for the article. Very helpful info. The title though it is catchy may leave the reader with the impression that data tables are not good for performance. Even if further in the article there is more detail as to when they actually go bad the message of the title sometimes does remain imprinted in a reader’s mind.

    Reply
  • If the purpose of the post is to show how “not” to use a date table, then it does it well indeed. Reading the early links help frame it by showing how to implement well. I would have benefited if the juxtaposition was stated initially – i.e. good vs the bad.

    Reply
  • Just so I’m clear… so the optimal solution in the model is to create a FK constraint between the date and have tables?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}