You know how to write TSQL queries, but do you know how to make them faster? In this free 30 minute talk, Kendra Little will cover the basics of clustered and non-clustered indexes in SQL Server. She’ll show how you can tell which indexes your queries are using and show warning signs that mean perhaps your query needs a re-write. If you’ve ever wanted to learn how you can use indexes to make your queries run faster, this session is for you.
Want the scripts from the webcast? Scroll on down, the code is below the video.
Sample Databases from Microsoft
The sample code below uses the Contoso Retail DW database, but you can also play around with AdventureWorks. Here are where you can grab ‘em:
- ContosoRetailDW: http://www.microsoft.com/en-us/download/details.aspx?id=18279
- AdventureWorks: http://msftdbprodsamples.codeplex.com/
Demo Code
--------------------------------------------------
-- Demo script copyright Brent Ozar PLF, 2012
-- This script is ***NOT*** suitable for production use!
-- The script contains commands that could significantly
-- harm/change performance, so use on a dedicated test
-- instance only, and always research any commands
-- you aren't familiar with.
--------------------------------------------------
--Set up the script;
SET NOCOUNT OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
--Restore the db if needed
--This is restoring to my rotational drive
--Time estimate: 30 seconds
USE [master]
GO
IF DB_ID('ContosoRetailDW') IS NOT NULL
BEGIN
ALTER DATABASE ContosoRetailDW SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
END
RESTORE DATABASE [ContosoRetailDW] FROM DISK = N'S:_INSTALLContosoRetailDW.bak'
WITH
MOVE N'ContosoRetailDW2.0' TO N'T:MSSQLDataContosoRetailDW.mdf',
MOVE N'ContosoRetailDW2.0_log' TO N'T:MSSQLDataContosoRetailDW.ldf',
REPLACE, NOUNLOAD, STATS = 5
GO
-- Run single threaded. This is just to simplify the demo--
-- We don't have time in 30 minutes to cover the index examples
-- AND also all the impacts of parallelism on query cost and execution.
-- This is not a production configuration recommendation!
EXEC sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE
GO
--- Let's clear data out of our SQL Server's memory.
--- This is NOT a safe command for production servers!
--- We use this in the test so it causes a "fair playing field"
--- Otherwise we'd read from disk the first time, then read
--- From memory on subsequent runs-- which wouldn't be fair.
DBCC DROPCLEANBUFFERS;
GO
-- Get into our test database.
USE ContosoRetailDW;
GO
--- Let's say we're running this query.
--- We need to know total sales by day
--- Just for one store
--- For seven days
--- And this is just a sample of our data!
SELECT CAST(datekey AS DATE) AS [Date of Sales],
SUM(salesQuantity) AS [Sum of SalesQuantity],
SUM(SalesAmount) AS [Sum of SalesAmount]
FROM dbo.FactOnlineSales AS fs
WHERE StoreKey = 306
AND datekey BETWEEN '2009-11-01' AND '2009-11-07'
GROUP BY CAST(datekey AS DATE)
ORDER BY [Date of Sales] DESC;
GO
----------------------------------
--Question 1:
--Why did that take so long?
----------------------------------
--- Let's clear data out of our SQL Server's memory.
--- This is NOT a safe command for production servers!
DBCC DROPCLEANBUFFERS;
GO
--These commands tell SQL Server to report back on how my query runs.
--Results will be on the 'Messages' tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- Also turn on "actual execution plans"
-- The execution plan will show up in its own tab.
-- This can be done with ctrl+m
SELECT CAST(datekey AS DATE) AS [Date of Sales],
SUM(salesQuantity) AS [Sum of SalesQuantity],
SUM(SalesAmount) AS [Sum of SalesAmount]
FROM dbo.FactOnlineSales AS fs
WHERE StoreKey = 306
AND datekey BETWEEN '2009-11-01' AND '2009-11-07'
GROUP BY CAST(datekey AS DATE)
ORDER BY [Date of Sales] DESC;
GO
--- How much did we read?
-- What does the execution plan tell us?
-- SQL Server says this is the ideal index for THIS query:
USE [ContosoRetailDW]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[FactOnlineSales] ([StoreKey],[DateKey])
INCLUDE ([SalesQuantity],[SalesAmount])
GO
-- But do we have to have that exact index?
-- First, let's look at all the indexes on the table:
EXEC sp_helpindex 'dbo.factOnlineSales'
-- OK, well we're starting fresh!
-- I talk my DBA into creating ONE very narrow index.
-- In the real world, we won't always have the perfect index for *every* query--
-- And there is very good reason for that!
-- My DBA wants the index to start with DateKey, and they won't agree to all my includes.
CREATE NONCLUSTERED INDEX [ix_FactOnlineSales_DateKeyStoreKey]
ON [dbo].[FactOnlineSales] ([DateKey],[StoreKey]) WITH (MAXDOP=2);
GO
-- What happens to our query?
-- Will it use the not-so-perfect index?
-- Turn on actual execution plans: CTRL+M
--- This is NOT a safe command for production servers!
DBCC DROPCLEANBUFFERS;
GO
SELECT CAST(datekey AS DATE) AS [Date of Sales],
SUM(salesQuantity) AS [Sum of SalesQuantity],
SUM(SalesAmount) AS [Sum of SalesAmount]
FROM dbo.FactOnlineSales AS fs
WHERE StoreKey = 306
AND datekey BETWEEN '2009-11-01' AND '2009-11-07'
GROUP BY CAST(datekey AS DATE)
ORDER BY [Date of Sales] DESC;
GO
--- Hmm. Well, what if I forced it? I can use an INDEX HINT!
--- (is this a good idea???)
--- This is NOT a safe command for production servers!
DBCC DROPCLEANBUFFERS;
GO
SELECT CAST(datekey AS DATE) AS [Date of Sales],
SUM(salesQuantity) AS [Sum of SalesQuantity],
SUM(SalesAmount) AS [Sum of SalesAmount]
FROM dbo.FactOnlineSales AS fs WITH ( INDEX=ix_FactOnlineSales_DateKeyStoreKey )
WHERE StoreKey = 306
AND datekey BETWEEN '2009-11-01' AND '2009-11-07'
GROUP BY CAST(datekey AS DATE)
ORDER BY [Date of Sales] DESC;
GO
-- Bad news: Index hints are risky business over time.
-- Is there a better way?
-- Let's look at those indexes again.
EXEC sp_helpindex 'dbo.factOnlineSales';
GO
-- There's a secret about the OnlineSalesKey column
-- It is the clustering key, so it's automatically in my nonclustered index.
-- I can use the knowledge to rewrite my query.
--- This is NOT a safe command for production servers!
DBCC DROPCLEANBUFFERS;
GO
WITH mykeys
AS ( SELECT OnlineSalesKey
FROM dbo.FactOnlineSales AS fs
WHERE datekey BETWEEN '2009-11-01' AND '2009-11-07'
AND StoreKey = 306)
SELECT CAST(datekey AS DATE) AS [Date of Sales],
SUM(salesQuantity) AS [Sum of SalesQuantity],
SUM(SalesAmount) AS [Sum of SalesAmount]
FROM dbo.FactOnlineSales AS fs
JOIN mykeys ON fs.OnlineSalesKey = mykeys.OnlineSalesKey
GROUP BY CAST(datekey AS DATE);
GO
-- What if I used a temp table instead?
--- This is NOT a safe command for production servers!
DBCC DROPCLEANBUFFERS;
GO
IF OBJECT_ID('tempdb..#mykeys') IS NOT NULL
DROP TABLE #myKeys;
CREATE TABLE #myKeys
(
onlinesaleskey INT
);
INSERT #myKeys ( onlinesaleskey )
SELECT OnlineSalesKey
FROM dbo.FactOnlineSales AS fs
WHERE datekey BETWEEN '2009-11-01' AND '2009-11-07'
AND StoreKey = 306;
SELECT CAST(DateKey AS DATE) AS [Date of Sales],
SUM(salesQuantity) AS [Sum of SalesQuantity],
SUM(SalesAmount) AS [Sum of SalesAmount]
FROM dbo.FactOnlineSales AS fs
JOIN #mykeys ON fs.OnlineSalesKey = #mykeys.OnlineSalesKey
GROUP BY CAST(DateKey AS DATE)
ORDER BY [Date of Sales] DESC;
GO
-- Which of these is better?
-- I have a definite preference: watch the webcast to find out why.
Russ Thomas October 10, 2012 | 12:40 pm
First, thanks for the redirect to the Paul White post on temp tables vs table variables – that was my question too. Second, thanks for your guys dedication to the community. These webcasts are great. I started this one thinking it was going to be a review… but as always learned some really good new ideas. Keep it up.
Kendra Little October 10, 2012 | 2:38 pm
Thanks Russ!
The post I was thinking of is this one: http://sqlblog.com/blogs/paul_white/archive/2012/08/15/temporary-tables-in-stored-procedures.aspx
But there’s also this one:
http://sqlblog.com/blogs/paul_white/archive/2012/08/17/temporary-object-caching-explained.aspx
Whew!
And glad to hear that you enjoyed the webcast. Always trying to think of interesting ways to explore concepts.
Russ Thomas October 10, 2012 | 2:56 pm
It’s the first one that I read, though I plan to read the second as soon as the head-ache from the first one goes away. That dude is thorough.
Tahir Gul October 10, 2012 | 10:37 pm
Thanks Kendra.
Pingback: Something for the Weekend - SQL Server Links 12/10/12
Chandan Jha October 12, 2012 | 5:36 am
Thanks for this informative tutorial. I have one question and looks like I am confused here.
When you do the DBCC dropcleanbuffers before using the index hint at a point in the demo, the results show high logical reads and low physical reads. I am confused because if the memory is clean, the first time it should show high physical reads because the pages needed to be picked from the disk but the high logical reads show something opposite as I thought.
PS: Sorry if I am asking something stupid.
Thanks
Chandan
Kendra Little October 12, 2012 | 10:57 am
Not a stupid question at all!
With just 30 minutes I had a hard choice– I could either show less things and include the step of tracking/comparing the results better, or cover more things and not do as good a job comparing them. In this case I picked the second one, and I think that can make it a bit confusing.
When I am doing this in real life, I like to keep a scorecard (and I also testing things multiple times, etc of course)
Here’s an example the scorecard for the queries. (I reran the queries so the exact physical read vs readahead count may not match 100% with the video, as these can vary. I haven’t included parse and compile CPU time in these– just execution time).
Let me know if this doesn’t answer your question 100%. The logical reads are going up because of what the nested loop operator is doing in those plans.
1) First query with no modifications
logical reads 46535, physical reads 1, read-ahead reads 43346
CPU time = 1578 ms
2) Query after index is created, but with no hint or query changes. (Index is not used.)
logical reads 46535, physical reads 2, read-ahead reads 46524
CPU time = 1750 ms
3) Query after index hint is used to force the index
logical reads 74777, physical reads 19, read-ahead reads 282
CPU time = 188 ms
4) Query after rewritten with CTE
logical reads 74797, physical reads 19, read-ahead reads 263
CPU time = 172 ms
5) Query after rewritten with temp table
logical reads 261, physical reads 1, read-ahead reads 258
logical reads 74417, physical reads 19, read-ahead reads 16
CPU time = 31 ms
CPU time = 187 ms
Chandan Jha October 15, 2012 | 2:02 am
Thanks Kendra. My question was based on the high number of logical reads when during the first time you cleaned the buffers because I was surprised to see reads from memory where there was not data yet.
As per your answer, first the index or data pages are read from the disc but then due to nested loop,these pages are read again from memory hence the high number of logical reads.
Am I correct in the interpretation?
Thanks
Chandan
Steven Ormrod October 17, 2012 | 4:53 pm
Hi Kendra, I loved the video. I dug the ending part where you rewrote the CTE as a series of simpler statements. I like the philosophy of it being easier to tune several smaller statements rather than a single large, complex one.
vivekanand poojari December 8, 2012 | 3:37 am
Why did the SQL Server not use the non clustered index automatically the first time it was created ?
Why do we have to try the tricks ?
Is it because we are having SUM(Other columns which are not included in non clustered)
So is it safe to say, “when u retrieve columns which are not part of non clustered index, the sql server may or may not use the create non clustered index”