This morning, I had the honor of presenting virtually at SQLSaturday Oslo. The session title, Watch Brent Tune Queries, explains it all. Enjoy!
In the session, I used the 2018-06 Stack Overflow database, but any recent full-sized copy (not the 2010-2013 copies) will work. Here’s the query I worked with:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 |
/* Watch Brent Tune Queries: SQLSaturday Oslo 2020 v1.0 - 2020-08-29 https://www.BrentOzar.com/go/tunequeries This demo requires: * Any supported version of SQL Server * The 2018-06 Stack Overflow database: https://www.BrentOzar.com/go/querystack (The demo will work with other versions, but only if you tweak query parameters to reproduce the estimation errors I cover here.) This first RAISERROR is just to make sure you don't accidentally hit F5 and run the entire script. You don't need to run this: */ RAISERROR(N'Oops! No, don''t just hit F5. Run these demos one at a time.', 20, 1) WITH LOG; GO /* I'm using the large Stack database: */ USE StackOverflow; GO /* I'm using 2019 compat level to give SQL Server every possible chance, but if you have an older server, use the newest compat level you have. */ ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150; ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON; /* 2019 only */ EXEC DropIndexes; GO /* These indexes should help our query: */ CREATE INDEX UserId_Incl ON dbo.Comments(UserId) INCLUDE (CreationDate); CREATE INDEX OwnerUserId_Incl ON dbo.Posts(OwnerUserId) INCLUDE (CreationDate); CREATE INDEX CreationDate_Incl ON dbo.Comments(CreationDate) INCLUDE (UserId); CREATE INDEX CreationDate_Incl ON dbo.Posts(CreationDate) INCLUDE(OwnerUserId); GO CREATE OR ALTER PROC dbo.rpt_TopUsers_ByLocation @Location NVARCHAR(100), @StartDate DATE, @EndDate DATE AS BEGIN SELECT TOP 1000 u.Reputation, u.DisplayName, u.AboutMe, SUM(p.Score) AS PostsScore, SUM(c.Score) AS CommentsScore FROM dbo.Users u LEFT OUTER JOIN dbo.Posts p ON u.Id = p.OwnerUserId AND p.CreationDate BETWEEN @StartDate AND @EndDate LEFT OUTER JOIN dbo.Comments c ON u.Id = c.UserId AND c.CreationDate BETWEEN @StartDate AND @EndDate WHERE u.Location = @Location GROUP BY u.Reputation, u.DisplayName, u.AboutMe ORDER BY SUM(p.Score) DESC END GO /* Turn on actual plans & our query options: */ SET STATISTICS IO, TIME ON; /* My users have been complaining about this: */ EXEC rpt_TopUsers_ByLocation @Location = N'Reading, United Kingdom', @StartDate = '2011-09-01', @EndDate = '2011-10-01' GO /* If a query takes a long time to run, your options include: * Get the estimated plan * Look at the live plan with sp_BlitzWho or Activity Monitor SQL 2016 SP1 or newer: https://www.brentozar.com/archive/2017/10/get-live-query-plans-sp_blitzwho/ * Run it with Live Query Statistics on * Get the last actual plan with sp_BlitzCache: SQL 2019 or newer: https://www.brentozar.com/archive/2016/08/run-sp_blitzcache-single-query/ */ /* License: Creative Commons Attribution-ShareAlike 4.0 Unported (CC BY-SA 4.0) More info: https://creativecommons.org/licenses/by-sa/4.0/ You are free to: * Share - copy and redistribute the material in any medium or format * Adapt - remix, transform, and build upon the material for any purpose, even commercially Under the following terms: * Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. * ShareAlike - If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. * No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. */ |
And for more sessions like this, check out the Watch Brent Tune Queries series page.
8 Comments. Leave new
what happen at the end?
how can I watch the rest?
I’m not sure what you mean “the rest” – it was a one-hour session, and you see the whole thing up there in the video.
I was expected to see you tune the query and it runs very fast
I understand the concept of B.E. C.R.E.E.P.Y. but willing to see final optimization on this specific query.
P.S. you are exceptional presenter 🙂 but I dont like new sound effects 🙂 it was hard to hear your voice when sound effect is playing
OK, thanks for the feedback, and hope you enjoyed the video.
I have some doubts.
select Id from Users u where u.DisplayName=’Jon Skeet’
gives 22656
SELECT SUM(p.Score) AS PostsScore
FROM dbo.Posts p where p.OwnerUserId =22656 and
p.CreationDate BETWEEN ‘20110901’ AND ‘20111001’
gives 7947
But result
EXEC rpt_TopUsers_ByLocation
@Location = N’Reading, United Kingdom’,
@StartDate = ‘2011-09-01’, @EndDate = ‘2011-10-01’
GO
is
Reputation DisplayName AboutMe PostsScore CommentsScore
1047863 Jon Skeet … 9687393 677784
A little more.
Why?
SELECT count(*)
FROM dbo.Users u
LEFT OUTER JOIN dbo.Posts p ON u.Id = p.OwnerUserId AND p.CreationDate BETWEEN ‘20110901’ AND ‘20111001’
— LEFT OUTER JOIN dbo.Comments c ON u.Id = c.UserId AND c.CreationDate BETWEEN ‘20110901’ AND ‘20111001’
WHERE u.Id = 22656
=744
SELECT count(*)
FROM dbo.Users u
— LEFT OUTER JOIN dbo.Posts p ON u.Id = p.OwnerUserId AND p.CreationDate BETWEEN ‘20110901’ AND ‘20111001’
LEFT OUTER JOIN dbo.Comments c ON u.Id = c.UserId AND c.CreationDate BETWEEN ‘20110901’ AND ‘20111001’
WHERE u.Id = 22656
=1219
SELECT count(*)
FROM dbo.Users u
LEFT OUTER JOIN dbo.Posts p ON u.Id = p.OwnerUserId AND p.CreationDate BETWEEN ‘20110901’ AND ‘20111001’
LEFT OUTER JOIN dbo.Comments c ON u.Id = c.UserId AND c.CreationDate BETWEEN ‘20110901’ AND ‘20111001’
WHERE u.Id = 22656
=906936 (1219*744)
Join Users Posts gives not unique u.Id and for each row we join dbo.Comments
if we change logic:
join Users Posts sum PostsScore get top 1000 and result (each row have unique Id) join with Comments
It works much faster 🙂
Yep, very good! That’s one of the goals of webcasts like this: to get you to download the Stack Overflow database, use these queries for practice, and learn to do performance tuning yourself. Good job!
Original query is wrong -> produces duplicates:
select top 1000 u.Reputation, u.DisplayName, u.AboutMe,
sum(p.Score) as PostsScore,
sum(c.Score) as CommentsScore
from dbo.Users u
left join dbo.Posts p on u.Id
= p.OwnerUserId and
p.CreationDate between @startdate and @enddate
left join dbo.Comments c on u.Id
= c.UserId and
c.CreationDate between @startdate and @enddate
where u.Location = @Location
–and u.DisplayName != N’Jon Skeet’
group by u.Reputation, u.DisplayName, u.AboutMe
order by sum(p.Score) desc
Example:
select top 1000 u.Reputation, u.DisplayName, u.AboutMe,
p.Id
as PostId,
p.Score as PostsScore,
—
c.Id
as CommentId,
c.Score as CommentsScore,
u.Id
as UserId
from dbo.Users u
left join dbo.Posts p on u.Id
= p.OwnerUserId and
p.CreationDate between ‘2009-06-01’ and ‘2009-08-01’
left join dbo.Comments c on u.Id
= c.UserId and
c.CreationDate between ‘2009-06-01’ and ‘2009-08-01′
where u.Location = N’Reading, United Kingdom’
and u.DisplayName = N’YiSh’
Data:
– user has 2 posts and 2 comments and there are 4 rows so the sum will be higher than normal
Reputation DisplayName AboutMe PostId PostsScore CommentId CommentsScore UserId
300 YiSh 1130449 3 956428 0 94814
300 YiSh 1130449 3 956448 0 94814
300 YiSh 1138599 1 956428 0 94814
300 YiSh 1138599 1 956448 0 94814
If you try with user:
and u.DisplayName = N’Graeme Bradbury’
This user has 11 posts and 7 comments and original query returns 77 rows. So sums are wrong.
There is one user and you have join with 11 posts so there are 11 records and the you joins it with 7 comments and you have 77 records.
In original query user has:
3515 Graeme Bradbury Windows Azure Contractor 210 33
But when you drill down to records in each table it should have:
6 3515 Graeme Bradbury Windows Azure Contractor 30 3
Posts score is multiplied by 7 (count on comments) and comment score is multiplied by 11 (count on Posts).
My implementation:
– I was working on StackOverflow 2010
– original query was 2:08 minutes
– after indexing it was 0:51 minute
– the main problem is with aggregation cause it takes
– after switching to temp tables – 0 seconds
– also original query is wrong – there are duplicates -> see separate comment
——————————————————————————————————————–
create or alter proc dbo.rpt_topusers_bylocation
@Location nvarchar(100), @startdate date, @enddate date
as
begin
create table #Users (Id int, Reputation int, DisplayName nvarchar(40), AboutMe nvarchar(max))
create table #Posts(OwnerUserId int, score int)
create table #Comments(UserId int, score int)
–users
insert into #Users
select u.Id
, u.Reputation, u.DisplayName, u.AboutMe
from dbo.Users u
where u.Location = @Location
–posts
insert into #Posts
select p.OwnerUserId, sum(p.Score) as PostsScore
from dbo.Posts p
where p.OwnerUserId in (select ID from #Users)
and p.CreationDate between @startdate and @enddate
group by p.OwnerUserId
–comments
insert into #Comments
select c.UserId, sum(c.Score) as CommentsScore
from dbo.Comments c
where c.UserId in (select ID from #Users)
and c.CreationDate between @startdate and @enddate
group by c.UserId
–main select
select top 1000 u.Reputation, u.DisplayName, u.AboutMe,
p.score as PostsScore,
c.score as CommentsScore
from #Users u
left join #Comments c on u.Id
= c.UserId
left join #Posts p on u.Id
= p.OwnerUserId
order by p.score desc
end
go