Finding the Best Time for Maintenance: Answers & Discussion

Your Query Exercise was to find the best time to do database maintenance by querying the Users table, looking for a one-hour window with the lowest number of created users. We kept this one pretty simple by looking at the data in just one table, and we didn’t hassle with time zones. We just wanted the 3 lowest-load hours, in this format:

  • Mondays, 1:00-1:59
  • Saturdays, 23:00-23:59
  • Sundays, 2:00-2:59

The comments on this one were wild! Lots of different approaches, but they all boiled down to two basic styles.

The easy (but not necessarily right) way:
group by date_part

One way to do this is to use the built-in DATEPART function to group users by weekday and hour of their creation dates, like this:

And at first glance, that looks like it works:

For the database I’m using (the 2018-06 Stack Overflow database) and the date ranges I picked, it looks like late Sunday night, early Monday morning would be the winners here, so my answer might be:

  • Monday, 02:00-02:59
  • Monday, 00:00-00:59
  • Sunday, 00:00-00:59

If that was your solution, you’d be in fine company: ChatGPT 4 suggested a similar solution. It even added a delightful caveat:

Note that DATEPART(dw, CreationDate) returns an integer representing the day of the week, where the specific value depends on the SET DATEFIRST setting of your SQL Server instance. Typically, 1 represents Sunday, but this can vary. Similarly, DATEPART(hour, CreationDate) returns the hour of the day based on a 24-hour clock.

That’s one of the things I love about using ChatGPT for coding tasks like that – it can add valuable context that saves me time from checking the documentation.

That grouping approach has a bug, though.

That technique only works as long as our historical data is populated for every hour of every day. What if we’ve been taking the web servers down every Sunday at 16:00-17:00? There wouldn’t be any users created in that date range, and thus, no rows would show up in the result set for that query!

To illustrate it, let’s query the table looking for days when no users were created during particular hours:

The results in the 2018-06 database show that we’ve had several days where we only created users in some (but not all) hours of the day:

And if that was a repeating trend in the data, we’d have a problem.

The Safer Technique:
generate a list of days/times first.

In pseudocode:

That way, if we have any weekday/hour combinations with no users created, we’ll still have rows for them in our result set. Starting with compatibility level 160 (SQL Server 2022), one option for this is the generate_series function. In older versions, you’d probably use a numbers table, aka tally table.

We need two sets of rows:

  • weekdays: a set of rows 1 through 7
  • hours: a set of rows 0 through 23

So we’ll call generate_series twice:

To get a list of days and hours:

Then, let’s take our first version of this query, use that as a CTE, and left outer join to it:

That way, the GENERATE_SERIES calls will make sure rows exist in our result set – even if no users were created in that date range.

Hope you enjoyed it! For more challenges like this, check out the Query Exercises page.

Previous Post
Save 10% on SQLBits Registration with BRENTO10.
Next Post
Query Exercise: Find Tagged Questions Faster.

3 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.