When you create an index in SQL Server with multiple columns, behind the scenes it creates a related multi-column statistic for the index. This statistic gives SQL Server some information about the relationship between the columns that it can use for row estimates when running queries.
But what if you use ‘included’ columns in the index? Do they get information recorded in the statistics?
Here’s my Index
To test, we’ll create an index with multiple key columns and included columns. I’m using a restored copy of the StackOverflow database.
/* FavoriteCount and LastEditDate aren't in the index key */
/* They'll just hang out in the leaf of the index */
CREATE INDEX ix_Posts_OwnerUserId_PostTypeId_INCLUDES
    ON dbo.Posts (OwnerUserId, PostTypeId)
    INCLUDE (FavoriteCount, LastEditDate);
GO
First Stop: sys.Stats and Friends
The sys.stats catalog view lets me query metadata about my statistics, and I can join to related views to find out how many columns each statistic has, what order they're in, when they were last updated, and all sorts of info:
SELECT so.name
     , st.name
     , st.stats_id
     , sc.stats_column_id
     , c.name AS column_name
     , st.auto_created
     , st.filter_definition
     , sp.last_updated
     , sp.rows
     , sp.rows_sampled
     , sp.steps
     , sp.modification_counter
FROM sys.stats AS st
JOIN sys.stats_columns AS sc
    ON st.object_id = sc.object_id
    AND st.stats_id = sc.stats_id
JOIN sys.columns AS c
    ON sc.object_id = c.object_id
    AND sc.column_id = c.column_id
JOIN sys.objects AS so
    ON st.object_id = so.object_id
CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) AS sp
WHERE so.name = 'Posts'
ORDER BY so.name, st.stats_id, sc.stats_column_id;
GO
Here’s the first few columns of the results:
This doesn’t show FavoriteCount or LastEditDate. My index key columns are in the statistic, in the same order they appear in the index.
What About DBCC SHOW_STATISTICS?
We can see a more detailed representation of the statistic. We just plug the table name and the statistic name into DBCC SHOW_STATISTICS, like this:
DBCC SHOW_STATISTICS ('Posts', 'ix_Posts_OwnerUserId_PostTypeId_INCLUDES');
GO
This returns a few result sets. We get information about the header of the statistics, some density information about the relationship between the columns, and a histogram that shows the distribution of rows for the leading column in the statistic.
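If you only care about one of those result sets, DBCC SHOW_STATISTICS can return just the piece you ask for. For example, this sketch returns only the density vector for the statistic created by the index above:

```sql
-- Just the density vector, without the header and histogram
DBCC SHOW_STATISTICS ('Posts', 'ix_Posts_OwnerUserId_PostTypeId_INCLUDES')
    WITH DENSITY_VECTOR;
GO
```

(WITH STAT_HEADER and WITH HISTOGRAM work the same way for the other two result sets.)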
That Density Vector Has Three Rows!
Check that out, the density vector has a third row to help SQL Server estimate data distribution for OwnerUserId, PostTypeId, and the Id column. We didn’t specify the Id column anywhere in our index!
The Id column is the key of the clustered index on this table. SQL Server decided that it would also be interesting to know the density information of that as a third column.
What if the Index Was Unique? Would it Still Sneak In the Clustering Key?
The index above doesn’t have unique data, but let’s test a different index:
CREATE UNIQUE INDEX ix_VoteTypes_Name ON dbo.VoteTypes (Name);
GO
DBCC SHOW_STATISTICS ('VoteTypes', 'ix_VoteTypes_Name');
GO
Here’s the density vector:
SQL Server decided that tracking the density information for the combination of Name and Id was still useful, even though this is a unique index and SQL Server didn’t have to sneak that column into the key of the index to make it unique behind the scenes.
Included Columns Weren’t in the Density Vector, but My Clustering Key Was!
In other words, included columns won’t be in the density vector unless they happen to also be in the key of your clustered index.
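If you do want SQL Server to track distribution information for an included column, you can always create a single-column statistic on it yourself. A minimal sketch (the statistic name here is one I made up):

```sql
-- Manually create a statistic on an included column
CREATE STATISTICS st_Posts_FavoriteCount ON dbo.Posts (FavoriteCount);
GO
```

This gives the optimizer a histogram and density information for FavoriteCount that the index's own statistic doesn't provide.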
Want to write better queries?
- Learn How to Think Like SQL Server for $29
- Learn How to Tune Indexes and Speed Up SQL Server for $299
- Join us in person for a week of Advanced Query and Indexing
When you set up a Windows Server, you might be tempted to add a bunch of roles, just in case you want to use them in the future. Don’t do it: things can get weird.
If you add the Hyper-V role to Windows Server 2012 and higher, it changes some things about how Windows behaves. Your server is now a host. And the host starts to behave in some ways like a virtual machine — it’s now the “Parent Partition”, which is a “quasi-virtual machine”.
There are limits to the number of logical processors that any virtual machine partition might see, so if you have more than 64 logical processors, this can limit what’s available in the Parent Partition / host operating system, and really confuse you.
When SQL Server starts up, it checks to see if you’re virtualized. If the Hyper-V role is enabled, it will think it’s running in a VM. This could land you in a sticky situation if you have a licensing audit and hyper-threading is enabled: you’re supposed to license all the logical processors in a virtual machine regardless of hyper-threading.
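A quick way to see what SQL Server decided at startup is to check sys.dm_os_sys_info. (These columns exist in SQL Server 2008 R2 SP1 and higher; treat this as a sketch for your version.)

```sql
-- Does SQL Server think it's in a VM? How many logical CPUs can it see?
SELECT virtual_machine_type_desc,
    cpu_count,
    hyperthread_ratio
FROM sys.dm_os_sys_info;
```

If virtual_machine_type_desc isn't NONE on a box you believe is physical, go look at the installed roles.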
But good news, this is an easy problem to solve:
- Don’t install the Hyper-V role “just in case” on a SQL Server
- If someone else sets up your SQL Servers, check the installed roles before going live
- Don’t ever design an architecture where a Hyper-V host does double-duty and is also a SQL Server
- If you log onto a Hyper-V host, look at ‘Logical processors’ to see the total count (‘Host logical processors’ may be lower)
It sure can.
The lock risks of shrinking data files in SQL Server aren’t very well documented. Many people have written about shrinking files being a bad regular practice, and that’s totally true. But sometimes you may need to run a one-time operation if you’ve been able to clear out or archive a lot of data. And you might wonder what kind of pains shrinking could cause you.
One pain it could cause you is blocking and lock waits. It’s easy to reproduce, here’s how.
A Recipe for LCK_M_X Lock Waits and Blocking
I was able to easily reproduce blocking in my restored copy of the StackOverflow database on SQL Server 2014 by doing a few things.
I rebuilt a non-clustered index on the Posts table with this command:
ALTER INDEX [kl_Posts_OwnerUserId_CreationDate] ON Posts REBUILD WITH (ONLINE = ON);
GO
This ate up some space, and gave shrink something to do!
I then started a data modification and left an open transaction running:
BEGIN TRAN
DELETE FROM dbo.Posts WHERE Id = 16;
And then, in another session, I started to shrink the data file:
DBCC SHRINKFILE ('StackOverflow');
GO
If I check on the shrinking session using sp_WhoIsActive with @get_locks = 1, I can see details on exactly what lock SHRINKFILE is after:
<Database name="StackOverflow">
  <Locks>
    <Lock request_mode="S" request_status="GRANT" request_count="1" />
  </Locks>
  <Objects>
    <Object name="(null)">
      <Locks>
        <Lock resource_type="FILE.FILE_SHRINK" request_mode="X" request_status="GRANT" request_count="1" />
      </Locks>
    </Object>
    <Object name="Posts" schema_name="dbo">
      <Locks>
        <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="1" />
        <Lock resource_type="PAGE" page_type="*" index_name="kl_Posts_OwnerUserId_CreationDate" request_mode="X" request_status="GRANT" request_count="3" />
        <Lock resource_type="PAGE" page_type="*" index_name="kl_Posts_OwnerUserId_CreationDate" request_mode="X" request_status="WAIT" request_count="1" />
      </Locks>
    </Object>
  </Objects>
</Database>
Yep, it’s waiting on a page lock in kl_Posts_OwnerUserId_CreationDate.
But How Many Locks did SHRINKFILE Really Need?
Maybe it needed just a few locks… right?
Well, I ran a trace against my session that did that shrinkfile command, and here’s the number of locks by database and type that it caught…
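You don't strictly need a trace just to peek at the locks, either. While the shrink is running, a query like this against sys.dm_tran_locks gives a live summary (the session id below is only an example; plug in whichever session is running SHRINKFILE):

```sql
-- Summarize the locks held and requested by the shrink session
SELECT resource_type,
    request_mode,
    request_status,
    COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = 53  /* your SHRINKFILE session's id */
GROUP BY resource_type, request_mode, request_status
ORDER BY lock_count DESC;
```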
And this was a FAST run, except for that blocking!
Shrinking files is painful
It can actually be worse than this — on a few runs, I was able to reproduce blocking with Sch-M (schema modification) locks that could cause even nastier blocking chains.
Shrinking is slow, tedious work. It eats CPU, steals your IO, and always takes longer than you want. And if it has work to do and you’ve got users or processes modifying that data, lock contention ensues.
Worst of all, at the end of shrinking a database, nobody will ever thank you. Be careful out there, and if you must run it make sure you don’t leave it unattended.
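If you truly must shrink, one way to keep the operation attended is to work toward a target size in modest steps, so you can stop between chunks and let blocked sessions through. The target size below (in MB) is purely illustrative:

```sql
-- Shrink toward a specific target size rather than 'as small as possible'
DBCC SHRINKFILE ('StackOverflow', 150000);
GO
```

Run it, check for blocking, lower the target, and repeat, rather than kicking off one giant unattended shrink.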
Brent says: whoa, I never even thought about that. Maybe we should design indexes to improve DBCC SHRINKDATABASE speed.
Erik says: if you do this more than once, I will come find you.
DBAs often complain about SAN Administrators. “You can’t trust them.” I’ve seen established DBAs write publicly that they resort to lying about how SQL Server works to get what they want from a SAN Administrator.
That’s pretty toxic. Other DBAs start to think it’s normal to have a terrible relationship with your SAN Administrator.
But it’s not normal. And if there’s a terrible relationship between DBAs and the SAN team, that’s the DBA team’s fault, too.
The first SAN team I ever worked with
I’m pre-disposed to like SAN Administrators because I’ve worked with great people. When I started out with SQL Server at a small dot com, we had a two-person SAN team and many terabytes of data across both SQL and NoSQL solutions. Our software team and our data grew insanely quickly.
The SAN team had more fun than almost anyone in the building. They had to work super hard. They got paged a lot. They were sometimes grumpy on the phone at 3 am, just like the DBAs. But they were funny and smart and the kind of people who could magically turn a crappy situation into a good time.
They didn’t always want to make everything RAID 10, and they didn’t always automatically believe the problem was the SAN when I said, “the SQL Server’s slow.” But they worked with me every time when I had a problem, and we always found a solution.
Over time, I learned ways to show them real metrics when I really needed more storage speed rather than just saying, “it’s slow.”
Most SAN Administrators I Work With Today
I still work with SAN Administrators frequently. They’re usually helpful – in fact, they’re often happy that someone would like to hear how the SAN is configured and why it’s set up that way.
Most SAN Admins I meet work alone or in small groups. They’re super busy, and sometimes mistakes get made (just like DBAs). But also like DBAs, I’ve found them to be pretty happy when there’s a real reason that justifies investing in better hardware. They’re able to admit when something’s not right, fix it, and move on.
Remember, The SAN Admin had just as much training as you did
That’s right, they probably didn’t get any training either. The storage world changes fast, and they have to try to keep up.
Yes, they get taken out to nice dinners by the SAN vendor, and you don’t. But think about how your job looks to the people over at the helpdesk. Ever gotten a bottle of scotch from the developers as a reward for saving the day? Ever had flextime and rolled into the office late? It’s not just the SAN admins who have some perks.
Your SAN Admin isn’t out to get you. They just have a lot of customers.
Your mission as a DBA is to make your databases perform as well as the business needs them to, and to protect the RPO and RTO of your customers. The SAN Administrator’s goal is to provide as much storage capacity and performance as the business needs. They’ve got a lot of customers, and the performance of every one of your databases isn’t at the top of their list. When the database is slow, it’s hard for them to know that the issue is the storage.
I’m not saying that there aren’t bad SAN Admins out there. There certainly are.
But don’t be the person who misrepresents things and thinks someone’s out to get them. Aspire to be more like my old SAN team: the kind of person who can turn a crappy situation into a good time. That’s usually a lot more effective.
Brent says: wait a minute – I never got a bottle of Scotch from my developers. I mean, I got hit over the head with a bottle once, but there wasn’t any alcohol left in it.
By now, you have probably heard of CTEs
And you may have even heard them referred to as Inline Views. Really, an Inline View can be any type of derived table. If you aren’t careful, it’s easy to show how a CTE can turn into a performance problem.
A lot of people think that when you call a CTE, the results are somehow persisted in a magical happy place and the underlying query just hangs back admiring the output as it sails into the upper deck.
Take the following example, which serves no real purpose.
WITH c1 AS (
    SELECT [sp].[BusinessEntityID]
         , [sp].[TerritoryID]
         , [sp].[CommissionPct]
         , [sp].[SalesYTD]
    FROM [Sales].[SalesPerson] AS [sp]
    WHERE [sp].[TerritoryID] = 6
)
SELECT [c1].[BusinessEntityID]
     , [c1].[TerritoryID]
     , [c1].[CommissionPct]
     , [c1].[SalesYTD]
FROM [c1] AS [c1]
WHERE [c1].[BusinessEntityID] = 278
It has a perfectly reasonable execution plan, and will lead a happy life.
Now, let’s pretend some additional request requires us to locate additional Business Entities in the same territory. No problemo.
WITH c1 AS (
    SELECT [SP].[BusinessEntityID]
         , [SP].[TerritoryID]
         , [SP].[CommissionPct]
         , [SP].[SalesYTD]
    FROM [Sales].[SalesPerson] AS [SP]
    WHERE [SP].[TerritoryID] = 6
)
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
     , [c1].[BusinessEntityID]
     , [c1].[TerritoryID]
     , [c1].[CommissionPct]
     , [c1].[SalesYTD]
FROM [Sales].[SalesPerson] AS [sp]
JOIN [c1] AS [c1]
    ON [c1].[TerritoryID] = [sp].[TerritoryID]
WHERE [sp].[BusinessEntityID] = 278
    AND [c1].[BusinessEntityID] <> [sp].[BusinessEntityID]
Doesn’t get much easier than that. But what happened with the plan?
Huh. That’s a whole other index operation. So just like when you join to a view, the view has to be executed and returned. In fact, if you keep throwing joins that reference the original CTE, you’ll keep getting more index operations.
WITH c1 AS (
    SELECT [SP].[BusinessEntityID]
         , [SP].[TerritoryID]
         , [SP].[CommissionPct]
         , [SP].[SalesYTD]
    FROM [Sales].[SalesPerson] AS [SP]
    WHERE [SP].[TerritoryID] = 6
)
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
     , [c1].[BusinessEntityID]
     , [c1].[TerritoryID]
     , [c1].[CommissionPct]
     , [c1].[SalesYTD]
FROM [c1] AS [c1]
JOIN [Sales].[SalesPerson] AS [sp]
    ON [c1].[TerritoryID] = [sp].[TerritoryID]
JOIN [c1] AS [c2]
    ON [c2].[BusinessEntityID] = [c1].[BusinessEntityID]
WHERE [sp].[BusinessEntityID] = 278
    AND [c1].[BusinessEntityID] <> [sp].[BusinessEntityID]
Did someone say they wanted another index operation? Because I thought I heard that.
And, just to drive the point home, let’s add another one:
WITH c1 AS (
    SELECT [SP].[BusinessEntityID]
         , [SP].[TerritoryID]
         , [SP].[CommissionPct]
         , [SP].[SalesYTD]
    FROM [Sales].[SalesPerson] AS [SP]
    WHERE [SP].[TerritoryID] = 6
)
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
     , [c2].[BusinessEntityID]
     , [c2].[TerritoryID]
     , [c2].[CommissionPct]
     , [c2].[SalesYTD]
FROM [c1] AS [c1]
JOIN [Sales].[SalesPerson] AS [sp]
    ON [c1].[TerritoryID] = [sp].[TerritoryID]
JOIN [c1] AS [c2]
    ON [c1].[BusinessEntityID] = [c2].[BusinessEntityID]
JOIN [c1] AS [c3]
    ON [c1].[BusinessEntityID] = [c3].[BusinessEntityID]
WHERE [sp].[BusinessEntityID] = 278
    AND [c1].[BusinessEntityID] <> [sp].[BusinessEntityID]
To sum things up, CTEs are a great base
From which you can reference and filter on items in the select list that you otherwise wouldn’t be able to (think windowing functions). But every time you reference a CTE, it gets executed again. The fewer times you have to hit a larger base set, and the fewer reads you do, the better. If you find yourself referencing a CTE more than once or twice, you should consider a temp or persisted table instead, with the proper indexes.
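Here's a sketch of that approach for the example above: materialize the CTE's result once into a temp table, index it, and then join to it as many times as you like without re-executing the base query. (The temp table and index names are just stand-ins.)

```sql
-- Run the base query once and persist the results
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
INTO #Territory6
FROM [Sales].[SalesPerson] AS [sp]
WHERE [sp].[TerritoryID] = 6;

-- Index it to support the joins you plan to do
CREATE CLUSTERED INDEX cx_Territory6 ON #Territory6 ([BusinessEntityID]);

-- Now reference #Territory6 as often as needed; the base set was read once.
```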
Want to Up Your Query Writing Game?
- Read more about CTEs in this blog post by Jeremiah
- Learn How to Think Like SQL Server from Brent in this $29 course
- Check out our 5 day in-person course with Jeremiah and Kendra on Advanced Querying and Indexing
Brent says: when I first start writing a query, I write for readability first. Make the query easy to understand – and CTEs can help big time here.
Kendra says: It drives some people nuts if you preface your CTE with a semi-colon to make sure the prior statement was properly terminated. But I won’t judge you if that’s what you like to do.
If you’ve got indexed views, or are thinking of using them, this is important.
First, go read the blog post, “An Indexed View Bug with Scalar Aggregates” by Paul White. It will take a minute, but it’s important to understand.
Paul shows a scenario where the data in a certain kind of indexed view can become out of sync with the base tables. You can still query the indexed view and return data using the demo code on his blog, but it’s incorrect data.
He mentions that you can identify the problem with DBCC CHECKDB or DBCC CHECKTABLE, but there’s something that’s not obvious: your regularly scheduled CHECKDB jobs are probably not going to identify if your indexed views are corrupted in this way.
CHECKDB doesn’t run EXTENDED_LOGICAL_CHECKS by default anymore
If you are using compatibility level 100 or higher (SQL Server 2008+), CHECKDB won’t validate the indexed view data against the base tables unless you specifically request it to do the extra work.
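Asking for the extra validation is just a matter of adding the option. A sketch (the database name here is a placeholder for wherever you reproduced the demo):

```sql
-- Plain CHECKDB: skips the indexed-view cross-check at compat level 100+
DBCC CHECKDB ('IndexedViewDemo') WITH NO_INFOMSGS;
GO

-- Explicitly request the indexed-view validation
DBCC CHECKDB ('IndexedViewDemo') WITH EXTENDED_LOGICAL_CHECKS, NO_INFOMSGS;
GO
```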
Here, I’ll prove it. I used the code in Paul White’s post to reproduce the bug on a SQL Server 2014 database. And then I ran plain old CHECKDB:
But as soon as I ran it with EXTENDED_LOGICAL_CHECKS, the world wasn’t so rosy:
The same thing is true of DBCC CHECKTABLE. I only find out about the issue if I do logical checks when running it against the indexed view:
Don’t Assume CHECKDB Will Catch Everything
It doesn’t run every possible check by default.
If you use indexed views and you suspect you might be impacted by this bug, it might be time for some EXTENDED_LOGICAL_CHECKS. But be careful, Books Online warns that it doesn’t run this by default anymore because “Running this query can have a very high effect on performance, and its progress cannot be tracked.” For sensitive environments, you can restore a full backup to another server and run CHECKDB against the restored database.
REDMOND, WA – Microsoft announced today that SQL Server vNext’s editions will be named by their new marketing partner, Taco Bell.
“We’re excited that SQL Server Supreme Edition is now our top-of-the-line product, replacing Enterprise Edition. When you hunger for warm, satisfying In-Memory OLTP wrapped in the power of unlimited memory, only SQL Server Supreme Edition satisfies,” said product manager Mark “Tex” Souza of Microsoft’s Data Platform Group.
Souza continued, “Like the Sriracha Quesarito, which is the best of a quesadilla and a burrito rolled into one, Windows Azure SQL Database offers a great hybrid of the SQL Server you know and love, plus the tasty goodness of Microsoft-managed cloud services. Therefore, we’re renaming it as the Microsoft Databasarito.”
At press time, we’re still trying to confirm that Express Edition has been added to the Taco Bell Dollar Cravings menu, and track down the rumor of a new product code named Fourthdatabase.
Learn the basics of Foreign Keys in SQL Server: trust, join elimination, cascading deletes, and indexing child keys. Limited on time? Check out the Table of Contents below.
Table of Contents:
03:46 – Foreign keys change performance
04:12 – Let’s say we have no foreign key
04:41 – This query has a join
05:05 – The plan with no foreign key
05:34 – Add a foreign key relationship
05:54 – Run the same query
06:30 – Foreign key join elimination
06:38 – Try to add an invalid PostTypeId
07:08 – You can use NOCHECK
07:54 – Now can I add that row?
08:16 – But when I try to re-enable my untrusted foreign key…
08:29 – I get an error if I use WITH CHECK
09:29 – Does foreign key join elimination work now?
09:33 – I get an error if I use WITH CHECK
09:36 – Does foreign key join elimination work now?
09:37 – Nope, the foreign key isn’t trusted
09:51 – Delete the row that violates the key
10:11 – Checking keys needs heavy locks
11:42 – Join elimination may not happen
12:24 – What about cascading updates/deletes?
13:31 – Example syntax – cascading deletes
14:34 – Behind the scenes…
15:51 – Cascading deletes
17:03 – Do I always need to index foreign keys?
18:14 – Creating the foreign key
19:12 – Indexes on the child table can help
19:38 – Takeaways
20:24 – BrentOzar.com/go/Trust
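If you'd rather skim than watch, the heart of the NOCHECK / WITH CHECK demo looks something like this. The table and constraint names are my own stand-ins for the StackOverflow schema, not the exact code from the video:

```sql
-- Add the key without validating existing rows: fast, but untrusted
ALTER TABLE dbo.Posts WITH NOCHECK
    ADD CONSTRAINT fk_Posts_PostTypes
    FOREIGN KEY (PostTypeId) REFERENCES dbo.PostTypes (Id);

-- Re-validate existing rows so the optimizer can trust the key again
-- (this fails if any existing row violates the key)
ALTER TABLE dbo.Posts WITH CHECK CHECK CONSTRAINT fk_Posts_PostTypes;

-- is_not_trusted = 0 means join elimination is back on the table
SELECT name, is_not_trusted
FROM sys.foreign_keys
WHERE name = 'fk_Posts_PostTypes';
```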
Brent says: when you’re deciding whether or not you should include foreign keys in your database, this video will help.
Can I upgrade an existing instance without migrating?
This is nothing against SQL Server 2014, but I can’t stand in-place upgrades. Over the years I’ve had in-place upgrades work flawlessly on a few instances, and then had an install issue cause it to fail in the middle on other instances. Usually the critical instances, just because I’m not always lucky. And when upgrade fails, it doesn’t always roll back completely, or allow you to just re-run it. You may be down for a good long time.
But you’ve got backups, right? Really recent ones? Even so, how long does it take to restore them, if you need to do that to a different location? (And did you set it up right?)
While in-place upgrades may be fine for test and dev environments, they aren’t a good fit for your production instances where RPO and RTO are critical.
Should I raise the database compatibility level to 120 to use the new cost-based optimizer?
If you can test it for all your queries and know if it’ll be right for you, you can turn it on. Most folks can’t be sure of this so they start with it off to reduce risk from the migration.
The new cost based optimizer is very exciting, but there’s definitely a chance you can hit performance regressions. If you can’t test in advance, turning it on at the same time you migrate makes your troubleshooting more complex if you hit a problem.
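One common pattern is to migrate at the old compatibility level, then flip to 120 later in a window where you can watch for regressions. A sketch ([MyDatabase] is a placeholder, and 110 assumes you came from SQL Server 2012):

```sql
-- Right after migration: stay on the legacy cardinality estimator
ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 110;
GO

-- Later, once you've tested: opt in to the new optimizer
ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 120;
GO
```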
Can we restore a full backup with NORECOVERY, run CHECKDB, and then later restore a differential?
You can’t run CHECKDB until the database has been restored WITH RECOVERY and is writable. That means you can’t apply a differential backup afterwards.
You can potentially mitigate the risk by running a full CHECKDB against the database prior to running the backup. You may also run the backup with CHECKSUM (not a substitute for CHECKDB but it does apply some protection), and then run CHECKDB in a maintenance window shortly after the migration.
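A sketch of that mitigation sequence (names and paths here are placeholders):

```sql
-- Take the migration backup with page checksums verified as it's written
BACKUP DATABASE [MyDatabase]
    TO DISK = N'X:\Backups\MyDatabase_full.bak'
    WITH CHECKSUM, COMPRESSION;

-- After the final restore on the new server, recover the database...
RESTORE DATABASE [MyDatabase] WITH RECOVERY;

-- ...and run CHECKDB in the first maintenance window after cutover
DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS;
```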
It’s all about your risk tolerance.
Is it still helpful to run DBCC UPDATEUSAGE after a migration or upgrade?
DBCC UPDATEUSAGE is typically no longer needed and only impacts output from sp_spaceused, anyway. Check the ‘remarks’ section on its page in books online for the full details: https://msdn.microsoft.com/en-us/library/ms188414.aspx
People got into the habit of this because it was needed to upgrade to SQL Server 2005. But it’s OK, you can let go of that habit (and it’s worth it, this command can be pretty slow).
Should I run sp_updatestats after a migration or upgrade?
This was a big deal when upgrading to 2005 because of changes they made to statistics, but it’s not needed specifically for SQL Server 2014 upgrades. Some folks like to do this to kick the tires, but don’t go crazy or think it’s magical.
What should I set ‘max server memory’ to for SQL Server 2014 if I’m running Standard Edition?
Possibly to more than you think. Max server memory in Standard Edition is limited to 128GB for the buffer pool in SQL Server 2014, but you may want to set it higher so that other parts of SQL Server can access memory above that level. Read more here.
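Setting it is plain sp_configure. The 180000 below is only an example value for a box with plenty of RAM, not a recommendation:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO

-- Example only: size this for your server, leaving room for the OS
EXEC sp_configure 'max server memory (MB)', 180000;
RECONFIGURE;
GO
```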
How do I turn on the tempdb IO enhancement in SQL Server 2014?
You don’t have to, it’s just on.
Are there any known upgrade issues?
Microsoft keeps a list here, along with a pre-upgrade checklist: https://msdn.microsoft.com/en-us/library/bb933942.aspx
Which cumulative update should I use?
You should definitely use one if you care about performance, but the choice can be very complicated. Especially if you’re using Availability Groups. There is no easy answer: read all the Cumulative Update articles and test heavily before you go live.
Want more SQL Server setup help? Check out our setup guide for SQL Server.
And while you’re here, please don’t forget to think about your version of Windows.
First, decide what you want to get really good at. Then try to break it in many possible ways and analyze why it broke.
The more you break something, the more you’ll understand it.
I use this technique all the time. Last year, I encountered a problem with a lack of worker threads in SQL Server in a client environment. That issue was particularly difficult, because when the problem occurred it was difficult to observe the SQL Server without using the Dedicated Admin Connection (DAC). At the time, I built some repro scripts in my own environment to show how the issue started, why it was tricky to observe, and how to observe it and confirm the queries at the root of it all without restarting the SQL Server. And just recently I wrote new scripts breaking the same thing in different ways — and showing how parallelism can be a factor — for our 2015 Performance Troubleshooting class. Taking the time to break it myself taught me nuances about workers in SQL Server that I wouldn’t have learned otherwise.
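If worker thread exhaustion is something you'd like to break for yourself, a good starting point is simply learning to watch the counts while you experiment. A sketch (run it from the DAC if the server stops responding to normal connections):

```sql
-- How many workers does this instance allow in total?
SELECT max_workers_count
FROM sys.dm_os_sys_info;

-- How busy is each scheduler right now?
SELECT scheduler_id,
    current_tasks_count,
    current_workers_count,
    work_queue_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```

A growing work_queue_count while workers sit at the cap is the signature of the problem described above.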
“But Kendra, I don’t have time for this!”
Here’s how to make time. Mix and match these three ideas:
1) Make it a team exercise.
One person breaks something in pre-production, the other has to fix it. You save and publish notes on the error messages and how you responded.
2) Tie it to a learning program.
Purchase one of our training video or in-person classes, and design a learning program that uses the training class as a launching board. Set a goal to write your own scripts breaking something for at least 5 of the modules.
If you really want to lock in the knowledge, write down a summary of what you’ve learned in your own words. You can blog it or publish it as notes for your team at work to reference. Your own notes will help you over time more than you expect.
3) Set goals for mastering specific items for the year.
Talk through it with your manager and document why you want to master the topic, and three things you want to achieve at work after you’re done.