Your Query Exercise was to find the best time to do database maintenance by querying the Users table, looking for a one-hour window with the lowest number of created users. We kept this one pretty simple by looking at the data in just one table, and we didn’t hassle with time zones. We just wanted the 3 lowest-load hours, in this format:
- Mondays, 1:00-1:59
- Saturdays, 23:00-23:59
- Sundays, 2:00-2:59
The comments on this one were wild! Lots of different approaches, but they all boiled down to two basic styles.
The easy (but not necessarily right) way:
group by date_part
One way to do this is to use the built-in DATEPART function to group users by weekday and hour of their creation dates, like this:
1 2 3 4 5 6 7 8 |
SELECT DATEPART(WEEKDAY, CreationDate) as weekday, DATEPART(HOUR, CreationDate) as hour, COUNT(*) as users_created FROM dbo.Users u WHERE u.CreationDate >= '2017-11-05' AND u.CreationDate < '2017-12-03' GROUP BY DATEPART(WEEKDAY, CreationDate), DATEPART(HOUR, CreationDate) ORDER BY COUNT(*); |
And at first glance, that looks like it works:
For the database I’m using (the 2018-06 Stack Overflow database) and the date ranges I picked, it looks like late Sunday night, early Monday morning would be the winners here, so my answer might be:
- Monday, 02:00-02:59
- Monday, 00:00-00:59
- Sunday, 00:00-00:59
If that was your solution, you’d be in fine company: ChatGPT 4 suggested a similar solution. It even added a delightful caveat:
Note that
DATEPART(dw, CreationDate)
returns an integer representing the day of the week, where the specific value depends on the SET DATEFIRST setting of your SQL Server instance. Typically, 1 represents Sunday, but this can vary. Similarly,DATEPART(hour, CreationDate)
returns the hour of the day based on a 24-hour clock.
That’s one of the things I love about using ChatGPT for coding tasks like that – it can add valuable context that saves me time from checking the documentation.
That grouping approach has a bug, though.
That technique only works as long as our historical data is populated for every hour of every day. What if we’ve been taking the web servers down every Sunday at 16:00-17:00? There wouldn’t be any users created in that date range, and thus, no rows would show up in the result set for that query!
To illustrate it, let’s query the table looking for days when no users were created during particular hours:
1 2 3 4 5 6 7 8 |
SELECT TOP 10 DATEPART(YEAR, CreationDate) AS CreationDate_Year, DATEPART(MONTH, CreationDate) AS CreationDate_Month, DATEPART(DAY, CreationDate) AS CreationDate_DAY, COUNT(DISTINCT DATEPART(HOUR, CreationDate)) AS CreationDate_Hours FROM dbo.Users GROUP BY DATEPART(YEAR, CreationDate), DATEPART(MONTH, CreationDate), DATEPART(DAY, CreationDate) ORDER BY COUNT(DISTINCT DATEPART(HOUR, CreationDate)); |
The results in the 2018-06 database show that we’ve had several days where we only created users in some (but not all) hours of the day:
And if that was a repeating trend in the data, we’d have a problem.
The Safer Technique:
generate a list of days/times first.
In pseudocode:
1 2 3 |
SELECT weekday, hour, (SELECT COUNT(*) of users_created in that weekday and hour) FROM list_of_weekdays_and_hours |
That way, if we have any weekday/hour combinations with no users created, we’ll still have rows for them in our result set. Starting with compatibility level 160 (SQL Server 2022), one option for this is the generate_series function. In older versions, you’d probably use a numbers table, aka tally table.
We need two sets of rows:
- weekdays: a set of rows 1 through 7
- hours: a set of rows 0 through 23
So we’ll call generate_series twice:
1 2 3 |
SELECT weekdays.value AS weekday, hours.value AS hour FROM GENERATE_SERIES(1,7) AS weekdays CROSS JOIN GENERATE_SERIES(0,23) AS hours; |
To get a list of days and hours:
Then, let’s take our first version of this query, use that as a CTE, and left outer join to it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
WITH UserCreations AS ( SELECT DATEPART(WEEKDAY, CreationDate) as weekday, DATEPART(HOUR, CreationDate) as hour, COUNT(*) as users_created FROM dbo.Users u WHERE u.CreationDate >= '2017-11-05' AND u.CreationDate < '2017-12-03' GROUP BY DATEPART(WEEKDAY, CreationDate), DATEPART(HOUR, CreationDate) ) SELECT weekdays.value AS weekday, hours.value AS hour, COALESCE(UserCreations.users_created, 0) AS users_created FROM GENERATE_SERIES(1,7) AS weekdays CROSS JOIN GENERATE_SERIES(0,23) AS hours LEFT OUTER JOIN UserCreations ON weekdays.value = UserCreations.weekday AND hours.value = UserCreations.hour ORDER BY COALESCE(UserCreations.users_created, 0); |
That way, the GENERATE_SERIES calls will make sure rows exist in our result set – even if no users were created in that date range.
Hope you enjoyed it! For more challenges like this, check out the Query Exercises page.
3 Comments. Leave new
Hey, Brent, what day is “1” and is it going to be the same in all cases? 😉 This solution doesn’t meet the original requirement of displaying the results “in this format: Mondays, 1:00-1:59”. Nice touch using the two GENERATE_SERIES calls on the fly in a CROSS JOIN – very slick.
Glad you liked it! I only go so far in details in this post – only so many hours in the day, pun intended – but to learn more about the gotchas and lots of other ways to do it, check out the comments on the original post: https://www.brentozar.com/archive/2024/01/query-exercise-find-the-best-time-for-maintenance/
[…] Finding the Best Time for Maintenance: Answers & Discussion: This is a good thing to keep in your backpocket. Your Query Exercise was to find the best time to do database maintenance by querying the Users table, looking for a one-hour window with the lowest number of created users. […]