While chatting with attendees before my Don’t Fear the Execution Plan webcast, a question was posed: “As a DBA who mainly worries about backups, index fragmentation and checkdb, should I also be looking at execution plans?”
YES! Here’s why.
- Performance is important. A production DBA’s main focus should be the safety and security of the data, but the performance of servers is also something to be concerned with. Companies pay a lot of money in hardware, support, and licensing costs for these servers. Being able to squeeze every bit of performance from them will save the company money – which helps everyone in the company in the long run.
- Be a superhero. Your first response to an ongoing, persistent performance problem may be a hardware fix. Add more memory, add more disks, ask the SAN team to give you dedicated LUNs, beg for a new server. But at some point, it is going to be worth your time – and the company’s money – to tune bad queries. By being the person that can identify the bad queries and provide ways to improve them, you are putting on your superhero cape and saving the day. (How can you find those queries? Find out in Jeremiah’s Exploring the Magic of the Plan Cache blog!)
- You’re being proactive, not reactive. We all need to be reactive sometimes – that’s a facet of the job. But being proactive – identifying top resource-consuming queries on a server, identifying problem spots in the queries, and suggesting fixes – makes us look better. We show we are bringing value to the company.
- Grow your knowledge of SQL Server. You never know when you may need the information. A new developer may have been able to ask the DBAs questions at her last job; you may need the skill at your next job. You will also learn things along the way about SQL Server internals, and that’s great knowledge to have.
- Increase communication between DBAs and developers. If your developers already use execution plans to tune queries, you will be able to speak their language. If they don’t, you can teach them as you learn – and there is no better way to learn than to teach something! Breaking down communication barriers is a goal my clients typically have. It involves some work and willingness to give, but will make your job better.
Sometimes I run into a query plan that just shocks me. It’s like looking up and suddenly seeing an adult dressed as a squirrel riding a unicycle down a busy city street. You have to stop and ask yourself, “Did that really just happen?” (I live in Portland, Oregon, so yeah, I’ve seen that.)
A while back I blogged about how to write a query to demonstrate a big memory grant against the AdventureWorks2012 database. The trick was to stuff the query’s joins full of functions so that the query optimizer became baffled about how many rows might come out of the join. The query is terrible, but it’s useful for demonstrating some problematic situations with memory on a test server.
Recently I dug out the query to set up a demo on SQL Server 2014 and something strange happened: it saw through the awfulness of my query. It made my horrible TSQL fast.
Let’s feed my terrible query to SQL Server 2014
First, make sure the new cardinality estimator will be used for your query. You do this by setting the database compatibility level to 120:
ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=120; GO
Now we’ll run this terrible, awful, no good query:
SELECT e.[BusinessEntityID], p.[Title], p.[FirstName], p.[MiddleName], p.[LastName], p.[Suffix], e.[JobTitle], pp.[PhoneNumber], pnt.[Name] AS [PhoneNumberType], ea.[EmailAddress], p.[EmailPromotion], a.[AddressLine1], a.[AddressLine2], a.[City], sp.[Name] AS [StateProvinceName], a.[PostalCode], cr.[Name] AS [CountryRegionName], p.[AdditionalContactInfo] FROM [HumanResources].[Employee] AS e INNER JOIN [Person].[Person] AS p ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID])) INNER JOIN [Person].[BusinessEntityAddress] AS bea ON RTRIM(LTRIM(bea.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID])) INNER JOIN [Person].[Address] AS a ON RTRIM(LTRIM(a.[AddressID])) = RTRIM(LTRIM(bea.[AddressID])) INNER JOIN [Person].[StateProvince] AS sp ON RTRIM(LTRIM(sp.[StateProvinceID])) = RTRIM(LTRIM(a.[StateProvinceID])) INNER JOIN [Person].[CountryRegion] AS cr ON RTRIM(LTRIM(cr.[CountryRegionCode])) = RTRIM(LTRIM(sp.[CountryRegionCode])) LEFT OUTER JOIN [Person].[PersonPhone] AS pp ON RTRIM(LTRIM(pp.BusinessEntityID)) = RTRIM(LTRIM(p.[BusinessEntityID])) LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt ON RTRIM(LTRIM(pp.[PhoneNumberTypeID])) = RTRIM(LTRIM(pnt.[PhoneNumberTypeID])) LEFT OUTER JOIN [Person].[EmailAddress] AS ea ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(ea.[BusinessEntityID])); GO
Shazam, it finishes using only 63 ms of CPU time. It asks for a 27MB memory grant and estimates 290 rows (quite accurately).
The execution plan contains warnings that “Type conversion in expression … may affect “CardinalityEstimate” in query plan choice”, but wow, it really did a remarkably good job with this!
Using OPTION(QUERYTRACEON 9481) to test Compatibility Level 110
You can go back to the old cardinality estimator with SQL Server 2014 in two ways: You could change the whole database’s compatibility level back to 110 like this:
ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=110; GO
But that changes it for everything in the whole database. We might just want to see how THIS query would behave using the old cardinality estimator, but still on SQL Server 2014. You can do that by adding OPTION(QUERYTRACEON 9481) to the very end of our gruesome query:
SELECT e.[BusinessEntityID], p.[Title], p.[FirstName], p.[MiddleName], p.[LastName], p.[Suffix], e.[JobTitle], pp.[PhoneNumber], pnt.[Name] AS [PhoneNumberType], ea.[EmailAddress], p.[EmailPromotion], a.[AddressLine1], a.[AddressLine2], a.[City], sp.[Name] AS [StateProvinceName], a.[PostalCode], cr.[Name] AS [CountryRegionName], p.[AdditionalContactInfo] FROM [HumanResources].[Employee] AS e INNER JOIN [Person].[Person] AS p ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID])) INNER JOIN [Person].[BusinessEntityAddress] AS bea ON RTRIM(LTRIM(bea.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID])) INNER JOIN [Person].[Address] AS a ON RTRIM(LTRIM(a.[AddressID])) = RTRIM(LTRIM(bea.[AddressID])) INNER JOIN [Person].[StateProvince] AS sp ON RTRIM(LTRIM(sp.[StateProvinceID])) = RTRIM(LTRIM(a.[StateProvinceID])) INNER JOIN [Person].[CountryRegion] AS cr ON RTRIM(LTRIM(cr.[CountryRegionCode])) = RTRIM(LTRIM(sp.[CountryRegionCode])) LEFT OUTER JOIN [Person].[PersonPhone] AS pp ON RTRIM(LTRIM(pp.BusinessEntityID)) = RTRIM(LTRIM(p.[BusinessEntityID])) LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt ON RTRIM(LTRIM(pp.[PhoneNumberTypeID])) = RTRIM(LTRIM(pnt.[PhoneNumberTypeID])) LEFT OUTER JOIN [Person].[EmailAddress] AS ea ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(ea.[BusinessEntityID])) OPTION(QUERYTRACEON 9481);
Retesting the query with the old cardinality estimator… Ouch! The query uses 84,109 ms of CPU time and the execution plan is back to a world of confusion, thinking it’s going to have to handle a kazillion rows:
Hey there new cardinality estimator, I’d like to get to know you better.
I’m not saying the new cardinality estimator will be better at every query, that it won’t have any regressions, or that you should start putting functions around all your joins.
But it’s pretty remarkable when the optimizer takes code that you wrote to be as terrible as possible, and suddenly makes it fast. Sign me up for more of that.
Do you rely on index rebuilds to make queries run faster? Or do you always feel like statistics are “bad” and are the cause of your query problems? You’re not alone– it’s easy to fall into the trap of always blaming fragmentation or statistics. Learn why these two tools aren’t the answer to every problem in these two videos from Kendra Little.
Does Rebuilding Indexes Make Queries Faster in SQL Server?
Learn if rebuilding indexes is a good way to help queries run faster in SQL Server in this 13 minute video.
Not convinced that fragmentation isn’t really your problem? Read Brent’s post, “Stop Worrying about Index Fragmentation“.
Why are Statistics Always the Problem? (Or ARE They?)
Once you’ve moved beyond an addiction to rebuilding indexes, you may find that you become an obsessive Statistics Blamer. Learn why you might mistakenly think your stats are “bad” when something else is really the problem in this 12 minute video.
There’s a couple of new features in SQL Server 2014 that provide options for how you work with temporary objects. Will inline index creation or memory optimized temporary tables forever change the way you code? Let’s take a look!
Inline Index Creation
SQL Server 2014 brings us a TSQL improvement called “inline specification of CLUSTERED and NONCLUSTERED indexes.” This feature works in Standard Edition and applies to many types of table. This improves the functionality of table variables. But does it make table variables as good as temp tables?
First, let’s take a look at the feature, then take it for a test drive.
Creating a table with an inline nonclustered index
Here’s a simple look at the new feature– when I create a table (nothing temporary about it), I can name and define an index on multiple columns. In this case, I’m creating a nonclustered index on the columns j and k:
CREATE TABLE dbo.IndexCreation ( i int identity constraint pkIndexCreation primary key clustered, j char(10) index ixIndexCreation_jk nonclustered (j,k), k varchar(100) ); GO
I also have the option to put all my constraint and index create statements at the end of the table, like this:
CREATE TABLE dbo.IndexCreation ( i int identity, j char(10), k varchar(100), constraint pkIndexCreation primary key clustered (i), index ixIndexCreation_jk nonclustered (j,k) ); GO
What about temp tables?
The same syntax for inline index create listed above works just fine for me on temporary tables in my CTP of SQL Server 2014.
This is a good thing for some people! One of the issues with temporary tables in stored procedures is that creating an index on the temp table can prevent caching of the temporary item. For most people, this is no big deal, but for some frequently run stored procedures, it might make a difference.
Creating the nonclustered indexes on the temp table at creation time (instead of afterward) can improve temp table caching in stored procedures.
Before you rejoice, there’s some fine print. If you change your temp table syntax to use inline index creation and it enables caching, you might run into issues where statistics aren’t updated on your temporary tables when you think they would be. (The short version: Statistics are also cached for temp tables, and Gremlins keep from updating very frequently.) Just test carefully if you’ve got very frequently run stored procedures you’re modifying.
Inline index creation on table variables
The new syntax works on table variables, too! This means that with SQL Server 2014 and higher, you can create non-unique nonclustered indexes on table variables. You can even set the fillfactor option. (Note: I’m not saying you should use a lower fillfactor– I was just surprised that option was available for table variables.)
DECLARE @IndexCreation TABLE ( i int identity primary key clustered with (fillfactor=95), j char(10) index ixIndexCreation_jk nonclustered (j,k) with (fillfactor=95), k varchar(100) ); GO
With the new SQL 2014 syntax, you will also have the option to create a non-unique clustered index on a table variable:
DECLARE @IndexCreation TABLE ( i int, j char(10), k varchar(100), index cxIndexCreation_i clustered (i), index ixIndexCreation_jk nonclustered (j,k) ); GO
So, are table variables as good as temp tables now?
Well, sorry, not generally.
First, there’s some limitations to the “inline index create” feature. You can’t inline every kind of index you might want — you can’t add included columns or filters to indexes created with an inline create statement. Since you can add indexes with those to a temporary table after it’s created (and you CANNOT do that with a table variable), it has an advantage.
But temporary tables still have another advantage.
Statistics help temporary tables
When looking at the new feature, I wondered if these new indexes declared inline might secretly allow some sort of populated statistic on table variables– which hasn’t worked before.
But, unfortunately no. Even using the inline indexes, table variables do NOT get statistics to help the optimizer psychically predict how many rows will come back based on a specific predicate value. That can cause you to get less efficient execution plans. Let’s look at an example.
First up, the temp table:
CREATE TABLE #TestMe ( i INT IDENTITY, City VARCHAR(60), StateProvinceID INT, INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID), CONSTRAINT cxTestMe_i PRIMARY KEY CLUSTERED (i) ); INSERT #TestMe (City,StateProvinceID) SELECT City, StateProvinceID FROM AdventureWorks2012.Person.Address; GO --This gets a nested loop plan SELECT City FROM #TestMe WHERE StateProvinceID=1; GO;
For the temporary table, SQL Server uses statistics associated with the nonclustered index to estimate that it will get 25 rows back (which is right on). Based on this it decides to seek to the rows in the nonclustered index, then do a nested loop lookup to fetch the City column from the clustered index. It does 52 logical reads:
Now let’s run the same code, but with a table variable with an inline index:
DECLARE @TestMe TABLE ( i INT IDENTITY PRIMARY KEY CLUSTERED (i), City VARCHAR(60), StateProvinceID INT, INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID) ); INSERT @TestMe (City,StateProvinceID) SELECT City, StateProvinceID FROM AdventureWorks2012.Person.Address; --This gets a clustered index scan SELECT City FROM @TestMe WHERE StateProvinceID=1; GO
Oddly enough, it gets a clustered index scan. It estimates that only one row will be found — that’s because for table variables, statistics associated with the nonclustered index still can’t be populated. So it doesn’t know to estimate the 25 rows, and it just guesses one.
With a one row estimate, I thought the optimizer would surely go for the nested loop lookup. Looking up just one row is easy, right? But instead it decided to scan the clustered index. (Note: The nonclustered index is functional– it will use it if I omit the City column from the query. But it does not seem to want to do a key lookup from it in my testing here. Wacky!)
Finding: Inline Index creation is the most awesome for temp tables
I like the new syntax for its added flexibility. I do wish it allowed included columns, but for indexes with just key columns it can make for tidier, more compact code.
This feature doesn’t fix all the flaws of table variables– it’s interesting that I didn’t get a nested loop lookup in my testing, which makes me wonder if the optimizer has as many options with the indexes on table variables.
The ability of temporary tables to have column and index related statistics still gives them a great advantage in most situations. Using the inline index creation script on temporary tables in stored procedure to improve caching is a nice little bonus. Even with the gotcha I linked to above about statistics updates on temp tables, I think this feature makes the case for temporary tables even stronger.
What about Memory Optimized Table Variables?
First off, this feature is more expensive. SQL Server 2014 adds in new “memory optimized” tables, AKA “Project Hekaton”. This feature is only available in Enterprise Edition. I won’t come close to covering the whole feature here– I’ll just scratch the surface of one of its uses: the “Memory Optimized Table Variable”.
The first thing of note is that memory optimized tables DO support statistics– so does that mean that a Memory Optimized Table Variable might have them? Let’s take a look!
First, let’s enable memory optimized objects
To test this feature out, I need to make some changes to my test database:
ALTER DATABASE [IndexTest] ADD FILEGROUP [ImaNewFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA GO ALTER DATABASE [IndexTest] ADD FILE (name='imanewfilegroup_1', filename='C:\MSSQL\Data\imanewfilegroup_1') TO FILEGROUP [ImaNewFileGroup]; GO ALTER DATABASE [IndexTest] SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT ON; GO
Memory_Optimized = ON !
Now I can start creating my memory optimized table variable. First, I must create a table type with the table variable’s definition. I’m not going to test this in a natively compiled stored procedure just yet, so I’m leaving off the identity (it’s not supported in this scenario). I also use some special collations and don’t allow nullability on some columns to get the whole thing to work.
CREATE TYPE TM1 as TABLE ( City VARCHAR(60) COLLATE Latin1_General_100_BIN2, StateProvinceID INT NOT NULL, INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID) ) WITH (MEMORY_OPTIMIZED = ON)
All right, now that our table type exists, we can put it to use! Let’s populate it and query it and check out what plan we get:
DECLARE @TestMe TM1; INSERT @TestMe (City,StateProvinceID) SELECT City, StateProvinceID FROM AdventureWorks2012.Person.[Address]; SELECT City FROM @TestMe WHERE StateProvinceID=1; GO
There’s a little surprise in the execution plan:
Estimated rows is still 1– statistics still aren’t working here. But magically we get an index seek on the nonclustered index instead of a scan on the clustered index. This is remarkable because back in the normal world of non-memory-optimized indexes, the City column would not be in that nonclustered index– we only asked for it to contain StateProvinceID!
Memory Optimized Indexes are Always Covering
In the brave new world of Memory Optimized tables, indexes have a motto: Always Be Covering. We’re not in Kansas any more Toto, and there’s no such thing as a Key Lookup or Bookmark Lookup anymore.
This means a couple of things: for these tables, it really doesn’t matter that you don’t have the INCLUDE syntax on inline index creation. Nonclustered indexes will always have all the columns, so why bother with an INCLUDE clause? (I’m guessing that’s why it’s not possible in the new syntax!) Also, memory optimized indexes may be larger than you’d assume because they cover everything.
For 99% of the use cases I see in the wild these days, temporary tables beat table variables easily because their ability to have statistics improves query optimization dramatically. That remains true with the enhancements in SQL Server 2014.
For the 1% of cases where stored procedures using temporary objects are run at high rates of frequency, there are a few new choices that are attractive. They both have gotchas and require testing, but they could both turn out pretty well:
Option 1: Inline Nonclustered Indexes on Temp Tables: This works in Standard Edition, so I think this will be the most widely adopted. Instead of creating a nonclustered index after the temp table exists, you can improve caching of your temp tables by creating nonclustered indexes with the temp table itself. Just make sure you don’t get into trouble with out of date statistics.
Option 2: Memory Optimized Table Variables: For you lucky people with Enterprise Edition, you can test these out. You won’t want to do exactly what I did above — you’ll probably want to use natively compiled stored procedures. You’ll definitely want to test at scale and make sure you can monitor performance of these new structures at load. That may be a little tricky: you don’t get actual execution plan or cost information for natively compiled procedures. Given the fact that the memory optimized table variables didn’t show any statistics in my testing, in complex implementations you could end up with some strange execution plans. (This post hasn’t covered questions like “How much memory are these structures using?” “How much memory do you have left?” “What happens if you run out?”) In short: there’s much to learn here– don’t rush in without making sure you can support it.
You’d love to become a Database Administrator, but how do you get your foot in the door? Kendra Little has hired junior DBAs and helps employers develop interview procedures for database administrators. In this 30 minute video you’ll learn what employers look for, steps you can take to make yourself a great candidate, and how to build the right resume to get your first DBA job.
Looking for the links from the webcast? Scroll down below the video!
- Download our free SQL Server Training Plan DBA eBook
- Join our free SQL Server Training Plan Weekly Email
- Free How to Develop Your DBA Career eBook
- SQL Server User Group Chapters (Professional Association for SQL Server)
- Practice interview questions with our $29 SQL Server DBA Job Interview Question and Answer Training Kit
For this month’s T-SQL Tuesday topic, Michael J. Swart told us to argue against a popular opinion, and man, is that right up my alley. I’ve told you to stop worrying about index fragmentation, stop backing up your logs once an hour, and to stop running $14k of licensing on $3k of hardware.
You’re probably checking SQL Server wait stats periodically to find your SQL Server’s bottleneck – and that’s a good thing. Instead of checking Perfmon counters and trying to figure out what’s the slowest part of your server, at least wait statistics tell you what SQL Server has been waiting on while running queries.
But it all falls apart when you say, “67% of the time, SQL Server is waiting on ____, so we should focus there.”
We need to understand the difference between latency and throughput.
Explaining Latency and Throughput with Shipping Companies
If we run an online store and we wanted to measure how fast a delivery service works, we could call them to pick up a single envelope, and then measure the amount of time it took to arrive at its destination. We could then say, “That service can deliver exactly one package per day. If our business is going to grow, we’re gonna have to find a delivery service that can ship faster, because we need to move more than one package per day.” We could then focus all our efforts trying to use local courier services, or spreading our load across multiple shipping companies.
But we would be morons.
Instead, we need to put MORE packages out on our doorstep and call the delivery service to get it. They’ll send a truck, pick up all the packages, and deliver them to various destinations. As we try to ship more and more packages, we’ll probably need to upgrade to a loading dock, or maybe even multiple loading docks, and set up an arrangement with our shipping company to send more trucks simultaneously.
Latency is the length of time it takes to deliver a single package.
Throughput is the number of packages they can deliver per day.
Just because our business is waiting overnight for a single package to be delivered doesn’t mean we have to go finding a new shipping company. It’s completely normal. We need to keep pushing our business to figure out where the breaking point is. Are packages piling up at the door because the shipping company only has a single small cargo van? Sure, that’s the point at which we worry.
How This Relates to SQL Server Wait Stats
In a recent load testing engagement, the DBA told me, “We’ve only got a few end users hitting the system, and we’re already seeing 80-90% of our waits in PAGEIOLATCH. The data file storage simply isn’t able to keep up.”
We switched from using his wait stats script to sp_AskBrent® instead, which shows the cumulative amount of time spent waiting on each wait type. In any given 5-second span, the server was spending less than a second waiting on PAGEIOLATCH. Furthermore, the average wait time was less than 5 milliseconds each time – indicating that the storage was responding fairly quickly to each request.
The server was sitting idle, and the DBA was reading wait stats incorrectly. Sure, the majority of time spent waiting was due to storage, but there just wasn’t much time spent waiting period.
“Crank up the load,” I said. “Quadruple the amount of work you’re throwing at the server.”
Everybody in the room looked at me like I was crazy, but they agreed – and the SQL Server still didn’t flinch. We kept upping and upping the load, and finally we did find a breaking point, but it wasn’t storage. Just as you can pile up a lot of boxes in front of your house and the shipping company will pick them all up to deliver them in the same amount of time, the SQL Server’s storage kept right on delivering every result within 5-6 milliseconds.
The Moral of the Story
When using wait stats for monitoring, make sure you’re looking at the total number of seconds spent waiting per second on the clock. If you sample waits for 5 seconds on a 16-core server, don’t freak out about 5 seconds worth of wait. Each core can have multiple queries piled up, all waiting on different resources, so even 15-20 seconds of wait time during a 5-second period may not indicate problems.
SQL Server DBAs are slow to adopt Windows Server 2012 and Server 2012 R2. I frequently find teams planning new implementations of SQL Server 2012 on Windows Server 2008 R2.
There seem to be two main reasons for this:
- DBAs want to change a minimal number of factors, and are slow to trust a new version of the OS. Windows 2008R2 feels “safe.”
- Windows Server 2012 was hard to love when it first came out. With no Start menu, Windows 2012 was more baffling than exciting to most production DBAs.
But these reasons don’t hold up– and staying on an old OS isn’t doing you any favors. If you’re planning a SQL Server 2012 (or upcoming 2014) installation, you should also be using Windows Server 2012 or Server 2012 R2.
Windows Server 2008R2 isn’t “Safer” or “Simpler” than Server 2012 R2
Let’s break down that first concern above. The essense of it is that staying on Windows Server 2008R2 seems less risky than using a new OS.
But let’s look at that closely. The most recent service pack for Windows 2008R2 was released in March 2011– that was Service Pack 1. (I bet I’m not the only person who has searched in vain for SP2 repeatedly, thinking I must have just missed it.)
Lots of fixes have been released since Service Pack 1. Here’s a few highlights:
- KB 2566205 – Scalability of the I/O performance decreases in Windows Server 2008 R2 if more than 256 GB of physical memory is used
- KB 2468345 – Computer intermittently performs poorly or stops responding when the Storport driver is used in Windows Server 2008 R2
- KB 2545635 – You experience disk I/O performance issues on a computer that has an Intel Westmere-EX processor with at least 8 sockets and that is running Windows Server 2008 R2
- KB 2553549 – All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days from system startup in Windows Vista, in Windows 7, in Windows Server 2008 and in Windows Server 2008 R2
Which patches you need varies by your configuration, and this is just the tip of the iceberg.
When you’re using high availability solutions, the list of patches you need to consider just gets bigger– and the risks get higher. In this post Brent describes a cluster-specific bug that was released after Stack Exchange experienced Availability Group outages due to a clustering issue.
Although Windows 2008R2 has been out for a while, managing an Availability Group or a Windows Failover Cluster on it isn’t simpler– it’s actually more complicated! You’ve got more patches you need to be aware of and manage, and you still need to keep an eye out for new fixes as they’re released.
Failover Clustering has huge new features in quorum
Don’t get me wrong– Windows 2008 R2 has some pretty nice features when it comes to setting up a cluster, particularly compared to older versions of Windows. I love the Cluster Validation Wizard. I find the Failover Cluster Manager snap-in to be pretty intuitive and easy to manage.
But Failover Clusters have grown up significantly– particularly when it comes to your options with making sure that your applications stay online. And having more ways to keep applications online is probably the exact reason that you’re using features involving clustering!
Dynamic Quorum and Vote Assignment – Windows Server 2012
Windows Server 2012 introduced “Dynamic Quorum“. Dynamic Quorum lets the cluster add and remove votes for nodes if they shut down or fail. This makes it more likely for the cluster to maintain a majority if you have multiple sequential failures.
Windows Server 2012 also lets you easily customize quorum configuration and remove votes from some nodes if you’d like to. To remove node votes with Windows Server 2008R2, you’d have to know to install KB 2494036 and then reconfigure quorum.
Bonus: as of Windows Server 2012, the Windows Failover Clustering feature is part of Windows Standard Edition.
Dynamic Witness, Tiebreakers – Windows Server 2012 R2
Windows Server 2012 R2 introduces even more quorum features. The Dynamic Witness feature lets the cluster decide when it will or won’t give a vote to the witness you have configured.
This means that if you have multiple sequential failures, the cluster has more options and is smarter about keeping the right number of votes available. It also means that the guidance about creating a witness is simpler: since the vote can be managed dynamically, you should always create a witness.
Windows 2012 R2 has more features, too– the cluster is smarter about what happens when things start to come back online after you force quorum, and it has options for specifying a tiebreaker if you have multiple failures at the same time.
And it’s a simple thing, but I love it the most: in 2012 R2 you can see what’s happening with dynamic votes for nodes by using the Failover Cluster Manager (no PowerShell commands required). This makes testing different failure scenarios much more straightfoward as you can quickly and easily see how many votes there are, and who’s got them.
Oh, and they fixed that Start menu thing, too.
You can’t easily migrate your Windows 2008R2 Cluster to Windows 2012+
The final reason I wouldn’t plan a new highly available environment on Windows 2008R2 is simply that I don’t want to be stuck there. You might think, “We’ll start out on a known OS and then upgrade soon” — but that’s easier said than done.
If you’re on a Windows 2008R2 cluster and you want to upgrade, you will need to create a new, separate failover cluster using Windows Server 2012, and then migrate to it. That requires separate storage, separate servers, and a careful migration plan. It’s absolutely doable, but it’s not simple, cheap, or easy. (Note that the built-in “Migrate a Cluster” wizard doesn’t support SQL Server installs for Windows 2008R2 -> Windows 2012.)
What about SQL Server 2008? Or 2005?
If you’re doing a new server buildout, you may decide to install SQL Server 2008 R2 on a Windows 2012 / 2012 R2 failover cluster. That’s supported, but to get installation to work you may need to use a slipstream installation and take a few other steps. (I think slipstreaming is great, but budget extra time to get everything tested and working.)
If you’re thinking about running SQL Server 2005 on Windows 2012 or Windows 2012 R2, that’s just not supported by Microsoft.
It’s Time to Embrace Our New OS Overlord
I really like Windows 2008 R2. I’ve used it for years, it’s very familiar, and I feel like I know its quirks. But I also recognize that it’s got a lot of baggage. It’s not really simpler or easier to manage.
The improvements to Windows Failover Clustering features in later versions of the operating system should be very compelling to anyone considering clustering SQL Server or using Availability Groups. It’s not always perfect and it’s certainly complicated, but it’s much better than what you’ve got in Windows 2008R2.
It’s time to start planning your new SQL Server implementations with Windows 2012 and higher.
“I Shot a Server in Reno (just to watch it die)” Johnny Cash
“AlwaysOn on My Mind” Willie Nelson
“WASD, Don’t Take Your Single Thread to Town” Kenny Rogers
“Transactions in the (Replication) Stream” Kenny Rogers and Dolly Parton
“Stand by Your Maintenance Plan” Tammy Wynette
“Ring Buffer of Fire” Johnny Cash
“Your Deadlockin’ Heart” Hank Williams, Sr.
“Friends in Low Latency Places” Garth Brooks
“Mommas Don’t Let Your Babies Grow Up to Be Virtual Clusters” Willie Nelson and Waylon Jennings (cover)
“Crazy” Patsy Cline (Let’s face it, there’s no improving on this one.)
“Here’s a Quarter (Call Someone Who Understands Oracle)” Travis Tritt
“I Wish You Could Have Turned My Head (And Left My Indexes Alone)” Oak Ridge Boys
“I Love the Way You Optimize Me” John Michael Montgomery
“The Night the Lights Went Out in the Datacenter” Reba McEntire
“All My Transactions Take Place in Texas” George Strait
“She’s In Love With the Feature” Trisha Yearwood
“Could I Have This Lock” Anne Murray
kCura Relativity is an e-discovery program used by law firms to find evidence quickly. I’ve blogged about performance tuning Relativity, and today I’m going to go a little deeper to explain why DBAs have to be aware of Relativity database contents.
In Relativity, every workspace (case) lives in its own SQL Server database. That one database houses:
- Document metadata – where the document was found, what type of document it is
- Extracted text from each document – the content of emails, spreadsheets, files
- Document tagging and highlighting – things the lawyers discovered about the documents and noted for later review
- Workspace configuration – permissions data about who’s allowed to see what documents
- Auditing trails – who’s searched for what terms, what documents they’ve looked at, and what changes they made
For performance tuners like me, that last one is kinda interesting. I totally understand that we have to capture every activity in Relativity and log it to a table, but log-sourced data has different performance and recoverability requirements than other e-discovery data.
The AuditRecord table is append-only. We don’t go back and modify AuditRecord data – we just constantly add to it, tacking on data at the end of the table. This means it has different backup requirements – I certainly don’t want to be doing a full backup on this table every day, repeatedly backing up the exact same data over and over and over when it will never change.
It will never get deleted. We have to keep this data throughout the life of the case because the client may need to go back through time to see who did what during the case. This means it will keep growing and growing, making backups tougher over time.
In big cases, it can dominate the database. I’ve seen instances where the AuditRecord table consumed more than half of the database size – meaning in a 1TB database, 500GB of the contents are AuditRecord. This means backups will take twice as long.
In the event of a disaster, I don’t need it right away. In our 1TB workspace example, I would prefer to restore the 500GB workspace data first, let the lawyers in to do document review, and then take my time restoring a separate 500GB AuditRecordArchive database.
Splitting AuditRecord Into an Archive Database
In order to pull this off, I need two separate databases:
- Workspace database – the normal EDDS12345 database for workspace 12345. Inside this database, I have all the normal Relativity tables, but only a small AuditRecord table with the current audits – say the last 7 days. As people review documents, I’d log that data into this AuditRecord table.
- AuditRecordArchive database – say, EDDS12345_AuditRecordArchive. In here, I have one AuditRecordArchive table that has all of the AuditRecord data more than 1 week old.
Once I’ve set this up, then I need a job that sweeps the EDDS12345.AuditRecord data into the archive database once per week. Immediately after that job finishes, then I do a full backup and DBCC of EDDS12345_AuditRecordArchive – and then I don’t need to back it up again until the next sweep.
If I want to get really fancy, I don’t do daily DBCCs or index maintenance against that AuditRecordArchive database either. If anything goes wrong with it, like database corruption, I just restore to last week’s full backup and I’m off to the races. This means less downtime for database maintenance.
Great News! kCura Supports This
It’s called “partitioning the AuditRecord table”, and your kCura support contacts can walk you through it. It doesn’t involve SQL Server partitioned tables at all – they just call it partitioning because it’s the same basic concept, only done with application-level code.
However, I don’t recommend doing this by default across all your databases. This technique is going to instantly double the number of databases you have and make your management much more complex. However, I do recommend reviewing your largest workspaces to see if AuditRecord is consuming half or more of the database space. If so, consider partitioning their AuditRecord tables to get faster backups, database maintenance jobs, and restores.
At the risk of sounding like a fanboy, this is one of the reasons I love working with the kCura folks. They really care about database performance, they take suggestions like this, and they implement it in a way that makes a real difference for customers.
This is also why database administrators need to:
- Understand the real business purpose of the biggest tables in their databases
- Build working, productive relationships with their software vendors
- Come up with creative approaches to ease SQL Server pains
- Help the vendors implement these approaches in software
Ever wanted to prove that your code or index changes had a positive impact? In this 30 minute video, Kendra shows you how to use Extended Events to measure a production workload– and how you can aggregate the results to prove your changes made a difference.
Want to skip straight to the demos? Here’s a quick timeline:
- Demo 1: How I generate a fake workload – 2:45
- Demo 2: Taking an initial benchmark with Extended Events – 4:45
- Demo 3: Aggregating the results of the Extended Events Trace -10:52
- Demo 4: Taking a second benchmark with Extended Events and aggregating the results – 18:05
Want a sample of the type of Extended Events trace run in the demo? Here’s a sample script. This isn’t any rocket science on my part: I generated the script with the Extended Events wizard and then told it to script it out. (Try it yourself!)
--****************** --These scripts were generated from SQL Server 2012 --Management Studio Extended Events Wizard --Customization is primarily just formatting for readability --****************** CREATE EVENT SESSION [Production Perf Sample-Before] ON SERVER ADD EVENT sqlserver.sp_statement_completed(SET collect_object_name=(1) ACTION(sqlserver.client_app_name,sqlserver.database_id, sqlserver.query_hash,sqlserver.query_plan_hash,sqlserver.session_id) WHERE ((([package0].[divides_by_uint64]([sqlserver].[session_id],(5))) AND ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)))) AND ([package0].[equal_boolean]([sqlserver].[is_system],(0))) AND ([sqlserver].[query_hash]<>(0)) )), ADD EVENT sqlserver.sql_statement_completed( ACTION(sqlserver.client_app_name,sqlserver.database_id, sqlserver.query_hash,sqlserver.query_plan_hash,sqlserver.session_id) WHERE ((([package0].[divides_by_uint64]([sqlserver].[session_id],(5))) AND ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)))) AND ([package0].[equal_boolean]([sqlserver].[is_system],(0))) AND ([sqlserver].[query_hash]<>(0)) )) ADD TARGET package0.event_file (SET filename=N'c:\Xevents\Traces\Production Perf Sample-Before') WITH ( EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, TRACK_CAUSALITY=ON, STARTUP_STATE=OFF ) GO --ALTER EVENT SESSION [Production Perf Sample-Before] ON SERVER STATE=START; --GO --ALTER EVENT SESSION [Production Perf Sample-Before] ON SERVER STATE=STOP; --GO