SQL Server DBAs are slow to adopt Windows Server 2012 and Server 2012 R2. I frequently find teams planning new implementations of SQL Server 2012 on Windows Server 2008 R2.
There seem to be two main reasons for this:
- DBAs want to change a minimal number of factors, and are slow to trust a new version of the OS. Windows 2008R2 feels “safe.”
- Windows Server 2012 was hard to love when it first came out. With no Start menu, Windows 2012 was more baffling than exciting to most production DBAs.
But these reasons don’t hold up– and staying on an old OS isn’t doing you any favors. If you’re planning a SQL Server 2012 (or upcoming 2014) installation, you should also be using Windows Server 2012 or Server 2012 R2.
Windows Server 2008R2 isn’t “Safer” or “Simpler” than Server 2012 R2
Let’s break down that first concern above. The essence of it is that staying on Windows Server 2008R2 seems less risky than using a new OS.
But let’s look at that closely. The most recent service pack for Windows 2008R2 was released in March 2011– that was Service Pack 1. (I bet I’m not the only person who has searched in vain for SP2 repeatedly, thinking I must have just missed it.)
Lots of fixes have been released since Service Pack 1. Here are a few highlights:
- KB 2566205 – Scalability of the I/O performance decreases in Windows Server 2008 R2 if more than 256 GB of physical memory is used
- KB 2468345 – Computer intermittently performs poorly or stops responding when the Storport driver is used in Windows Server 2008 R2
- KB 2545635 – You experience disk I/O performance issues on a computer that has an Intel Westmere-EX processor with at least 8 sockets and that is running Windows Server 2008 R2
- KB 2553549 – All the TCP/IP ports that are in a TIME_WAIT status are not closed after 497 days from system startup in Windows Vista, in Windows 7, in Windows Server 2008 and in Windows Server 2008 R2
Which patches you need varies by your configuration, and this is just the tip of the iceberg.
When you’re using high availability solutions, the list of patches you need to consider just gets bigger– and the risks get higher. In this post Brent describes a cluster-specific hotfix that was released after Stack Exchange experienced Availability Group outages due to a clustering issue.
Although Windows 2008R2 has been out for a while, managing an Availability Group or a Windows Failover Cluster on it isn’t simpler– it’s actually more complicated! You’ve got more patches you need to be aware of and manage, and you still need to keep an eye out for new fixes as they’re released.
Failover Clustering has huge new features in quorum
Don’t get me wrong– Windows 2008 R2 has some pretty nice features when it comes to setting up a cluster, particularly compared to older versions of Windows. I love the Cluster Validation Wizard. I find the Failover Cluster Manager snap-in to be pretty intuitive and easy to manage.
But Failover Clusters have grown up significantly– particularly when it comes to your options with making sure that your applications stay online. And having more ways to keep applications online is probably the exact reason that you’re using features involving clustering!
Dynamic Quorum and Vote Assignment – Windows Server 2012
Windows Server 2012 introduced “Dynamic Quorum”. Dynamic Quorum lets the cluster add and remove votes for nodes if they shut down or fail. This makes it more likely for the cluster to maintain a majority if you have multiple sequential failures.
Windows Server 2012 also lets you easily customize quorum configuration and remove votes from some nodes if you’d like to. To remove node votes with Windows Server 2008R2, you’d have to know to install KB 2494036 and then reconfigure quorum.
Bonus: as of Windows Server 2012, the Windows Failover Clustering feature is part of Windows Standard Edition.
Dynamic Witness, Tiebreakers – Windows Server 2012 R2
Windows Server 2012 R2 introduces even more quorum features. The Dynamic Witness feature lets the cluster decide when it will or won’t give a vote to the witness you have configured.
This means that if you have multiple sequential failures, the cluster has more options and is smarter about keeping the right number of votes available. It also means that the guidance about creating a witness is simpler: since the vote can be managed dynamically, you should always create a witness.
Windows 2012 R2 has more features, too– the cluster is smarter about what happens when things start to come back online after you force quorum, and it has options for specifying a tiebreaker if you have multiple failures at the same time.
And it’s a simple thing, but I love it the most: in 2012 R2 you can see what’s happening with dynamic votes for nodes by using the Failover Cluster Manager (no PowerShell commands required). This makes testing different failure scenarios much more straightforward, as you can quickly and easily see how many votes there are, and who’s got them.
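If you’re running Availability Groups, you can also peek at the current vote assignments from inside SQL Server. Here’s a minimal sketch using the sys.dm_hadr_cluster_members DMV (available in SQL Server 2012 and higher, and only populated when the instance is part of a failover cluster with AlwaysOn enabled); Failover Cluster Manager stays the authoritative view of quorum.

-- Show each cluster member and how many quorum votes it currently holds.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;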
Oh, and they fixed that Start menu thing, too.
You can’t easily migrate your Windows 2008R2 Cluster to Windows 2012+
The final reason I wouldn’t plan a new highly available environment on Windows 2008R2 is simply that I don’t want to be stuck there. You might think, “We’ll start out on a known OS and then upgrade soon” — but that’s easier said than done.
If you’re on a Windows 2008R2 cluster and you want to upgrade, you will need to create a new, separate failover cluster using Windows Server 2012, and then migrate to it. That requires separate storage, separate servers, and a careful migration plan. It’s absolutely doable, but it’s not simple, cheap, or easy. (Note that the built-in “Migrate a Cluster” wizard doesn’t support SQL Server installs for Windows 2008R2 -> Windows 2012.)
What about SQL Server 2008? Or 2005?
If you’re doing a new server buildout, you may decide to install SQL Server 2008 R2 on a Windows 2012 / 2012 R2 failover cluster. That’s supported, but to get installation to work you may need to use a slipstream installation and take a few other steps. (I think slipstreaming is great, but budget extra time to get everything tested and working.)
If you’re thinking about running SQL Server 2005 on Windows 2012 or Windows 2012 R2, that’s just not supported by Microsoft.
It’s Time to Embrace Our New OS Overlord
I really like Windows 2008 R2. I’ve used it for years, it’s very familiar, and I feel like I know its quirks. But I also recognize that it’s got a lot of baggage. It’s not really simpler or easier to manage.
The improvements to Windows Failover Clustering features in later versions of the operating system should be very compelling to anyone considering clustering SQL Server or using Availability Groups. It’s not always perfect and it’s certainly complicated, but it’s much better than what you’ve got in Windows 2008R2.
It’s time to start planning your new SQL Server implementations with Windows 2012 and higher.
- “I Shot a Server in Reno (just to watch it die)” Johnny Cash
- “AlwaysOn on My Mind” Willie Nelson
- “WASD, Don’t Take Your Single Thread to Town” Kenny Rogers
- “Transactions in the (Replication) Stream” Kenny Rogers and Dolly Parton
- “Stand by Your Maintenance Plan” Tammy Wynette
- “Ring Buffer of Fire” Johnny Cash
- “Your Deadlockin’ Heart” Hank Williams, Sr.
- “Friends in Low Latency Places” Garth Brooks
- “Mommas Don’t Let Your Babies Grow Up to Be Virtual Clusters” Willie Nelson and Waylon Jennings (cover)
- “Crazy” Patsy Cline (Let’s face it, there’s no improving on this one.)
- “Here’s a Quarter (Call Someone Who Understands Oracle)” Travis Tritt
- “I Wish You Could Have Turned My Head (And Left My Indexes Alone)” Oak Ridge Boys
- “I Love the Way You Optimize Me” John Michael Montgomery
- “The Night the Lights Went Out in the Datacenter” Reba McEntire
- “All My Transactions Take Place in Texas” George Strait
- “She’s In Love With the Feature” Trisha Yearwood
- “Could I Have This Lock” Anne Murray
kCura Relativity is an e-discovery program used by law firms to find evidence quickly. I’ve blogged about performance tuning Relativity, and today I’m going to go a little deeper to explain why DBAs have to be aware of Relativity database contents.
In Relativity, every workspace (case) lives in its own SQL Server database. That one database houses:
- Document metadata – where the document was found, what type of document it is
- Extracted text from each document – the content of emails, spreadsheets, files
- Document tagging and highlighting – things the lawyers discovered about the documents and noted for later review
- Workspace configuration – permissions data about who’s allowed to see what documents
- Auditing trails – who’s searched for what terms, what documents they’ve looked at, and what changes they made
For performance tuners like me, that last one is kinda interesting. I totally understand that we have to capture every activity in Relativity and log it to a table, but log-sourced data has different performance and recoverability requirements than other e-discovery data.
The AuditRecord table is append-only. We don’t go back and modify AuditRecord data – we just constantly add to it, tacking on data at the end of the table. This means it has different backup requirements – I certainly don’t want to be doing a full backup on this table every day, repeatedly backing up the exact same data over and over and over when it will never change.
It will never get deleted. We have to keep this data throughout the life of the case because the client may need to go back through time to see who did what during the case. This means it will keep growing and growing, making backups tougher over time.
In big cases, it can dominate the database. I’ve seen instances where the AuditRecord table consumed more than half of the database size – meaning in a 1TB database, 500GB of the contents are AuditRecord. This means backups will take twice as long.
In the event of a disaster, I don’t need it right away. In our 1TB workspace example, I would prefer to restore the 500GB workspace data first, let the lawyers in to do document review, and then take my time restoring a separate 500GB AuditRecordArchive database.
Splitting AuditRecord Into an Archive Database
In order to pull this off, I need two separate databases:
- Workspace database – the normal EDDS12345 database for workspace 12345. Inside this database, I have all the normal Relativity tables, but only a small AuditRecord table with the current audits – say the last 7 days. As people review documents, I’d log that data into this AuditRecord table.
- AuditRecordArchive database – say, EDDS12345_AuditRecordArchive. In here, I have one AuditRecordArchive table that has all of the AuditRecord data more than 1 week old.
Once I’ve set this up, I need a job that sweeps the EDDS12345.AuditRecord data into the archive database once per week. Immediately after that job finishes, I do a full backup and DBCC of EDDS12345_AuditRecordArchive – and then I don’t need to back it up again until the next sweep.
If I want to get really fancy, I don’t do daily DBCCs or index maintenance against that AuditRecordArchive database either. If anything goes wrong with it, like database corruption, I just restore to last week’s full backup and I’m off to the races. This means less downtime for database maintenance.
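Here’s a minimal sketch of what that weekly sweep could look like. The schema, the table and column names (like EDDSDBO and TimeStamp), and the backup path are assumptions for illustration only; check with kCura support before running anything like this against a real workspace.

-- Sketch only: move audit rows older than 7 days into the archive database,
-- then immediately back it up and run a consistency check.
BEGIN TRAN;

    INSERT INTO EDDS12345_AuditRecordArchive.EDDSDBO.AuditRecordArchive
    SELECT ar.*
    FROM EDDS12345.EDDSDBO.AuditRecord AS ar
    WHERE ar.[TimeStamp] < DATEADD(DAY, -7, GETUTCDATE());

    DELETE ar
    FROM EDDS12345.EDDSDBO.AuditRecord AS ar
    WHERE ar.[TimeStamp] < DATEADD(DAY, -7, GETUTCDATE());

COMMIT;

-- Right after the sweep: full backup and DBCC of the archive database.
BACKUP DATABASE EDDS12345_AuditRecordArchive
    TO DISK = N'X:\Backups\EDDS12345_AuditRecordArchive.bak'
    WITH INIT, CHECKSUM;

DBCC CHECKDB ('EDDS12345_AuditRecordArchive') WITH NO_INFOMSGS;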
Great News! kCura Supports This
It’s called “partitioning the AuditRecord table”, and your kCura support contacts can walk you through it. It doesn’t involve SQL Server partitioned tables at all – they just call it partitioning because it’s the same basic concept, only done with application-level code.
However, I don’t recommend doing this by default across all your databases. This technique is going to instantly double the number of databases you have and make your management much more complex. However, I do recommend reviewing your largest workspaces to see if AuditRecord is consuming half or more of the database space. If so, consider partitioning their AuditRecord tables to get faster backups, database maintenance jobs, and restores.
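To check that, a rough size query like this one (a sketch; run it in the workspace database) shows which tables are taking the most reserved space:

-- How much of this workspace database does each table consume?
SELECT o.name AS table_name,
       SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.objects AS o
    ON ps.object_id = o.object_id
WHERE o.is_ms_shipped = 0
GROUP BY o.name
ORDER BY reserved_mb DESC;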
At the risk of sounding like a fanboy, this is one of the reasons I love working with the kCura folks. They really care about database performance, they take suggestions like this, and they implement it in a way that makes a real difference for customers.
This is also why database administrators need to:
- Understand the real business purpose of the biggest tables in their databases
- Build working, productive relationships with their software vendors
- Come up with creative approaches to ease SQL Server pains
- Help the vendors implement these approaches in software
Ever wanted to prove that your code or index changes had a positive impact? In this 30 minute video, Kendra shows you how to use Extended Events to measure a production workload– and how you can aggregate the results to prove your changes made a difference.
Want to skip straight to the demos? Here’s a quick timeline:
- Demo 1: How I generate a fake workload – 2:45
- Demo 2: Taking an initial benchmark with Extended Events – 4:45
- Demo 3: Aggregating the results of the Extended Events Trace – 10:52
- Demo 4: Taking a second benchmark with Extended Events and aggregating the results – 18:05
Want a sample of the type of Extended Events trace run in the demo? Here’s a sample script. This isn’t any rocket science on my part: I generated the script with the Extended Events wizard and then told it to script it out. (Try it yourself!)
--******************
--These scripts were generated from SQL Server 2012
--Management Studio Extended Events Wizard
--Customization is primarily just formatting for readability
--******************
CREATE EVENT SESSION [Production Perf Sample-Before] ON SERVER
ADD EVENT sqlserver.sp_statement_completed(SET collect_object_name=(1)
    ACTION(sqlserver.client_app_name,sqlserver.database_id,
        sqlserver.query_hash,sqlserver.query_plan_hash,sqlserver.session_id)
    WHERE ((([package0].[divides_by_uint64]([sqlserver].[session_id],(5)))
        AND ([package0].[greater_than_uint64]([sqlserver].[database_id],(4))))
        AND ([package0].[equal_boolean]([sqlserver].[is_system],(0)))
        AND ([sqlserver].[query_hash]<>(0)) )),
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.client_app_name,sqlserver.database_id,
        sqlserver.query_hash,sqlserver.query_plan_hash,sqlserver.session_id)
    WHERE ((([package0].[divides_by_uint64]([sqlserver].[session_id],(5)))
        AND ([package0].[greater_than_uint64]([sqlserver].[database_id],(4))))
        AND ([package0].[equal_boolean]([sqlserver].[is_system],(0)))
        AND ([sqlserver].[query_hash]<>(0)) ))
ADD TARGET package0.event_file
    (SET filename=N'c:\Xevents\Traces\Production Perf Sample-Before')
WITH (
    EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY=30 SECONDS,
    TRACK_CAUSALITY=ON,
    STARTUP_STATE=OFF
)
GO

--ALTER EVENT SESSION [Production Perf Sample-Before] ON SERVER STATE=START;
--GO
--ALTER EVENT SESSION [Production Perf Sample-Before] ON SERVER STATE=STOP;
--GO
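If you want a rough starting point for the aggregation step, here’s a sketch (not the exact query from the video) that reads the event_file target back in with sys.fn_xe_file_target_read_file and groups by query_hash. Adjust the file path to wherever your session actually writes its files.

-- Read the .xel files from the event_file target and aggregate by query_hash.
;WITH raw_events AS (
    SELECT CAST(event_data AS xml) AS event_xml
    FROM sys.fn_xe_file_target_read_file(
        'c:\Xevents\Traces\Production Perf Sample-Before*.xel', NULL, NULL, NULL)
)
SELECT
    event_xml.value('(event/action[@name="query_hash"]/value)[1]', 'varchar(40)') AS query_hash,
    COUNT(*) AS executions,
    SUM(event_xml.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint')) AS total_logical_reads,
    SUM(event_xml.value('(event/data[@name="duration"]/value)[1]', 'bigint')) AS total_duration_microseconds
FROM raw_events
GROUP BY event_xml.value('(event/action[@name="query_hash"]/value)[1]', 'varchar(40)')
ORDER BY total_logical_reads DESC;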
One of my favorite things about SQL Server is the instrumentation. It’s incredibly good at giving you information about what’s going on in the database engine and query optimizer.
Want to know why your server is slow? Need to know what made it go offline? You have a lot of tools available.
This hasn’t always been the case. One of the toughest things about the old SQL Server 2000 was its lack of instrumentation. The tools that help you improve SQL Server have all been added in the last 10 years.
The cool thing is that the instrumentation gets better and better with each version of SQL Server– new features get new Dynamic Management Views and functions. And sometimes we get a whole new feature to help us get insight into SQL Server.
If you’ve still got SQL Server 2000 instances where performance and availability are important, explain to your management how limited the tools are to diagnose and repair issues. SQL Server’s come an awful long way!
“How many rows exist in a table?”
It seems like such an innocent request. It isn’t too hard to get this information out of SQL Server. But before you open SSMS and whip out a quick query, understand that there are multiple methods to get this information out of SQL Server – and none of them are perfect!
COUNT(*) or COUNT(1)
The seemingly obvious way to get the count of rows from the table is to use the COUNT function. There are two common ways to do this – COUNT(*) and COUNT(1). Let’s look at COUNT(*) first.
SELECT COUNT(*) FROM dbo.bigTransactionHistory;
The STATISTICS IO output of this query shows that SQL Server is doing a lot of work! Over 100,000 logical reads, physical reads, and even read-ahead reads need to be done to satisfy this query.
(1 row(s) affected)
Table 'bigTransactionHistory'. Scan count 5, logical reads 132534, physical reads 3, read-ahead reads 131834, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Looking at the execution plan, we can see an Index Scan returning over 31 million rows. This means that SQL Server is reading every row in the index, then aggregating and counting the value – finally ending up with our result set. The cost of this query? 123.910000.
The query results: 31,263,601 rows.
Now, let’s look at the behavior of COUNT(1).
SELECT COUNT(1) FROM dbo.bigTransactionHistory;
We can see from STATISTICS IO that we have a large number of logical reads – over 100,000.
(1 row(s) affected)
Table 'bigTransactionHistory'. Scan count 5, logical reads 132531, physical reads 3, read-ahead reads 131849, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The execution plan again shows an index scan returning over 31 million rows for processing. The query cost is the same, 123.910000.
The results here are the same – 31,263,601 rows.
The benefit of using COUNT is that it is an accurate indicator of exactly how many rows exist in the table at the time query processing begins. However, as the table is scanned, locks are being held. This means that other queries that need to access this table have to wait in line. This might be acceptable on an occasional basis, but I frequently see applications issuing these types of queries hundreds or thousands of times per minute.
sys.tables + sys.indexes + sys.partitions
We can join several SQL Server catalog views to count the rows in a table or index, also. sys.tables will return objects that are user-defined tables; sys.indexes returns a row for each index of the table; and sys.partitions returns a row for each partition in the table or index. I am going to query for the table ID, name, and count of rows in all partitions.
DECLARE @TableName sysname;
SET @TableName = 'bigTransactionHistory';

SELECT TBL.object_id, TBL.name, SUM(PART.rows) AS rows
FROM sys.tables TBL
INNER JOIN sys.partitions PART ON TBL.object_id = PART.object_id
INNER JOIN sys.indexes IDX ON PART.object_id = IDX.object_id
    AND PART.index_id = IDX.index_id
WHERE TBL.name = @TableName
    AND IDX.index_id < 2
GROUP BY TBL.object_id, TBL.name;
The output of STATISTICS IO here shows far fewer reads – 15 logical reads total.
(1 row(s) affected)
Table 'syssingleobjrefs'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysidxstats'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The execution plan is more complex, but much less work – the query cost here is 0.0341384.
The results of the query are also the same – 31,263,301.
The benefits of using this method are that the query is much more efficient, and it doesn’t lock the table you need the count of rows for.
However, you need to be cautious when counting the rows on a table that is frequently inserted into or deleted from. The TechNet documentation for sys.partitions.rows says it indicates the “approximate number of rows for this partition”. How approximate? That information isn’t documented. Understand, though, that if you use this method, you potentially sacrifice up-to-the-moment accuracy for performance.
sys.dm_db_partition_stats
A third option is to use the dynamic management view sys.dm_db_partition_stats. This returns one row per partition for an index.
DECLARE @TableName sysname;
SET @TableName = 'bigTransactionHistory';

SELECT OBJECT_NAME(object_id), SUM(row_count) AS rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(@TableName)
    AND index_id < 2
GROUP BY OBJECT_NAME(object_id);
The STATISTICS IO output of this query is even lower – this time, only two logical reads are performed.
(1 row(s) affected)
Table 'sysidxstats'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The execution plan is less complex than our second example involving the three system views. This query also has a lower cost – 0.0146517.
The query results are the same as the previous examples – 31,263,301 rows.
Using this DMV has the same benefits as the system views – fewer logical reads and no locking of the target table. The query is also simpler to write, involving only one object.
But again, the TechNet documentation for sys.dm_db_partition_stats.row_count says it is “the approximate number of rows in the partition”, and when this information is updated is not documented. Here, you are also potentially sacrificing accuracy for performance.
Time to do some digging
The questions that you need to work with the business to answer are, “How up-to-date must the row count be? What is the business purpose? How often do you insert into or delete from that table, and how often do you count the rows?” If the accuracy of the row count is crucial, work to reduce the amount of updates done to the table. If performance is more important, and the row count could be approximate, use one of the system views.
The other day I was running a script I’d written to expand out some tables in the AdventureWorks sample database for some testing. The script was taking a long time, and at first I thought I was just spoiled because I usually run my tests on my laptop, whose storage is backed by SSDs. In this case I was running my test against a SQL Server instance up in AWS.
After a delicious fifteen minute coffee break, I realized something was wrong– it STILL wasn’t done. I grabbed sp_AskBrent™ to do a quick check and see if my storage was really THAT slow.
I ran sp_AskBrent™ with a 60 second sample, like this:
exec sp_AskBrent @seconds=60, @ExpertMode=1;
GO
Here’s what I saw (click for a larger version):
PREEMPTIVE_OS_WRITEFILEGATHER means slow file growths
In a 60 second sample, I’d been waiting for PREEMPTIVE_OS_WRITEFILEGATHER for 36 seconds. Whoops!
The script I was running was making tables larger. The database was out of room, and was growing the data file in small autogrowth increments. Each of these automatic growths was stopping to “zero out” the files, because I hadn’t enabled Instant File Initialization in SQL Server.
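If you suspect the same thing is happening to you, it’s worth checking how your data files are set to grow. Here’s a quick sketch against sys.master_files (growth is stored in 8KB pages unless it’s a percentage):

-- Check current size and autogrowth increment for every data and log file.
SELECT DB_NAME(database_id) AS database_name,
       name AS logical_file_name,
       type_desc,
       size * 8 / 1024 AS current_size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_increment
FROM sys.master_files
ORDER BY database_name, type_desc;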
How I enabled Instant File Initialization
My test instance is set up a bit uniquely– it uses UNC paths for the SQL Server databases. This is supported in SQL Server 2012, and I was testing it out in a Windows Failover Cluster. (If you use shares for files, you need to be very careful about redundancy and availability of course, but that’s a topic for another post.)
I’d forgotten to enable Instant File Initialization for the SQL Server Service on the server that hosted the file shares! Wherever Windows will actually grow the files, the SQL Server Service account needs the “Perform Volume Maintenance Tasks” right.
Using the Local Security Policy to Grant “Perform Volume Maintenance Tasks”
Here’s how to grant the rights:
First, open the Local Security Policy. I usually do this by running the command secpol.msc, but you can also find it in the “Administrative Tools” menu.
Then navigate to Local Policies -> User Rights Assignment -> Perform Volume Maintenance Tasks.
Add the SQL Server Service account. The change will take effect the next time you restart SQL Server.
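If you want to verify that Instant File Initialization is actually working, here’s one way to check (a sketch for a test instance, not production): trace flags 3004 and 3605 write file-zeroing messages to the error log. If you see the data file being zeroed, IFI isn’t taking effect. Log files are always zeroed, so you’ll see those messages either way.

-- Sketch: verify Instant File Initialization on a test instance.
DBCC TRACEON (3004, 3605, -1);
CREATE DATABASE IFITest;   -- hypothetical throwaway database
GO
-- Look for "Zeroing ... IFITest.mdf" messages. If the data file was zeroed,
-- the service account doesn't have the Perform Volume Maintenance Tasks right.
EXEC sp_readerrorlog 0, 1, N'Zeroing';
GO
DROP DATABASE IFITest;
DBCC TRACEOFF (3004, 3605, -1);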
So, did it improve my performance?
Heck yeah, it did! On the first run I got tired of waiting, killed my expansion script after 20 minutes, and just dealt with the rollback. After enabling Instant File Initialization on the fileserver that holds the shares where my database files live, the expansion script finished in four minutes and 48 seconds. Shazam!
Temp tables are like real tables, just a little trickier.
When you’re starting out writing TSQL, it’s easy to want to do all your work in a single query. You learn about derived sub-queries, CROSS APPLY statements, and common table expressions. Suddenly, each of your queries is so complex that you hardly know what you’re doing.
But complex TSQL can be quite fragile. Small mis-estimations in parts of the plan can cause things to go very much awry in other regions of the plan.
Temp tables can help performance!
One treatment for slow, complex TSQL is to break it into multiple steps. You create a temp table, populate it, and reference it in future queries. This can lead to improved estimates and reliability, because temporary tables can have column and index related statistics. Those statistics help the optimizer estimate how many rows will be returned at different points along the way.
But this tuning technique isn’t foolproof. It’s easy to forget that statistics can be tricky– just like with real tables. Let’s take a look and prove that statistics issues from “normal” tables also matter on temp tables.
Out of date statistics on temp tables can hurt performance
The first step is to load up a temp table with some rows:
--Create the temp table first (column definitions assumed to match the inserts below)
CREATE TABLE #temp (TransactionID int, TempValue int);
GO
INSERT #temp (TransactionID, TempValue)
SELECT TOP 100000 TransactionID, 1
FROM Production.TransactionHistory;
GO
INSERT #temp (TransactionID, TempValue)
SELECT TOP 10000 TransactionID, 2
FROM Production.TransactionHistory;
GO
We now have:
- 100K rows where TempValue=1
- 10K rows where TempValue=2
Now let’s say I query the temp table in some way. My where clause is for a specific TempValue. The act of running this query causes a column level statistic to be created on the TempValue column:
SELECT max(th.Quantity)
FROM #temp as t
JOIN Production.TransactionHistory th on t.TransactionID=th.TransactionID
WHERE TempValue=2;
GO
We can see the statistic, too. This code is modified from Books Online and uses a DMV available in SQL Server 2012 SP1 and higher:
use tempdb;
GO
SELECT sp.stats_id, name, last_updated, rows, rows_sampled, steps,
    unfiltered_rows, modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = object_id('#temp');
GO
I continue on, and insert 11K rows for TempValue=3:
INSERT #temp (TransactionID, TempValue)
SELECT TOP 11000 TransactionID, 3
FROM Production.TransactionHistory;
GO
At this point I have:
- 100K rows where TempValue=1
- 10K rows where TempValue=2
- 11K rows where TempValue=3
Now I run another query which wants to look ONLY at the 11K rows where TempValue=3. SQL Server completely mis-guesses how many rows will be returned– instead of 11K rows, it guesses that I’ll just get a SINGLE row:
SELECT max(th.Quantity)
FROM #temp as t
JOIN Production.TransactionHistory th on t.TransactionID=th.TransactionID
WHERE TempValue=3;
GO
Looking at the statistics, we can see why– the statistic that was created when I ran my first SELECT query wasn’t triggered to update:
Let’s look at the histogram
When we look at the histogram for the statistic, it confirms that it doesn’t know about any rows with a TempValue higher than two:
Bad Statistics can lead to bad optimization
If I was joining from this temp table to other large tables, it might cause a very large data skew in my plan– and it could cause big problems. The optimizer might select inefficient joins, not allocate enough memory, and choose not to go parallel if estimates are skewed too low.
Temp tables are a great tool to use, but remember a few key rules:
- Column level statistics will be created for you when you run queries using the temp table
- If you modify the temp table in multiple steps, those statistics may get out of date
In complex procedures, manually creating and updating statistics on columns or indexes can save you time and improve execution plan quality.
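For instance, in the demo above, a manual statistics update right after the second insert would give the optimizer an accurate picture of the TempValue=3 rows before the next query runs. A minimal sketch (the named statistic is just an example name I made up):

-- Refresh everything on the temp table after the multi-step load finishes:
UPDATE STATISTICS #temp;
GO
-- Or create and maintain a named column statistic you control:
CREATE STATISTICS st_TempValue ON #temp (TempValue);
UPDATE STATISTICS #temp st_TempValue WITH FULLSCAN;
GO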
Could Trace Flag 2371 Help?
If you find that you have a lot of existing code that has this problem, Trace Flag 2371 may help. This trace flag is available as of SQL Server 2008R2 SP1 and higher, and it changes the default threshold for when statistics automatically update.
The net effect of the trace flag is to cause statistics updates to happen more frequently on large tables. You must turn it on server wide for it to take effect — it doesn’t work for individual sessions.
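If you do decide to test it, the trace flag is usually enabled as a startup parameter (-T2371); for a quick experiment you can also turn it on globally and confirm it took effect. A sketch:

-- Enable trace flag 2371 globally (affects the whole instance) and verify.
DBCC TRACEON (2371, -1);
DBCC TRACESTATUS (2371);
-- To turn it back off:
-- DBCC TRACEOFF (2371, -1);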
I’m not a huge fan of enabling any trace flags unless you’re solving a specific problem, and documenting it. This trace flag is not very widely used in my experience, so flipping it on puts you in a small subset of people. If you start hitting wacky issues and need to resolve them, suddenly the trace flag gives you a whole new dimension of complexity. So please don’t think of this trace flag as “preventive medicine” — only use it if you don’t have other good ways to solve the problem.
I first knew something was up when I looked at the job history for a simple maintenance plan. It had two steps:
- Rebuild all the indexes in the database – this took 10 minutes each night.
- Update statistics – this took 2-3 hours each night.
Maintenance Plans light the fuse
I love the concept of maintenance plans, but I don’t love the way all the tasks are set up.
In the case I was looking at, the Update Statistics task was being used with two values that are set by default:
- Run against all statistics
- Update them with fullscan
“All” statistics means that both “column” and “index” statistics will be updated. There may be quite a lot of statistics — most people leave the “auto create statistics” option enabled on their databases, which means that queries will dynamically cause the creation of more and more statistics over time.
Combined with “fullscan”, updating all statistics can become a significant amount of work. “Fullscan” means that to update a statistic, SQL Server will scan 100% of the values in the index or column. That adds up to a lot of IO.
Why ‘SELECT StatMan’ repeatedly scans tables
If SQL Server needs to update column level statistics for the same table, it could potentially use a single scan and update multiple stats, right?
Because of the runtimes I was seeing, I was pretty sure that wasn’t happening. But we can take a closer look and see for ourselves.
In our maintenance plan task, if we hit “View TSQL”, a window pops up showing us the commands that the plan will run. (I love this feature, by the way!) We will use one of these commands to test things out in a bit.
First, let’s make sure we have some column level statistics on our database. It already has indexes and their associated stats. To create some column level stats, I run these queries:
--create two column stats using 'auto create statistics'
select * from Person.Person where MiddleName like 'M%';
select * from Person.Person where Title is not null;
GO
--Create two filtered stats on Title
create statistics kl_statstest1 on Person.Person (Title) where Title = 'Mr.'
GO
create statistics kl_statstest2 on Person.Person (Title) where Title = 'Ms.'
GO
That will create two “auto” stats that start with “_WA_Sys”, and two stats that I named myself. To check ‘em out and see ALL the index and column stats on the table, we run:
exec sp_helpstats 'Person.Person', 'All';
GO
Sure enough, this shows us that we have seven stats total– three are related to indexes.
Alright, time to run that sample command excerpted from our maintenance plan. I start up an Extended Events trace to capture IO from sp_statement_completed, then run the command the maintenance plan was going to use to update every statistic on this table with fullscan:
UPDATE STATISTICS [Person].[Person] WITH FULLSCAN
GO
Here’s the trace output – click to view it in a larger image:
Looking at the Extended Events trace output, I can see the commands that were run as well as their logical reads. The commands look like this:
SELECT StatMan([SC0])
FROM (
    SELECT TOP 100 PERCENT [Title] AS [SC0]
    FROM [Person].[Person] WITH (READUNCOMMITTED)
    WHERE ([title]='Mr.')
    ORDER BY [SC0]
) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 16)
The “logical_reads” column lets me know that updating four of these statistics had to do four separate scans of my table– and three of them are all on the Title column! (Doing a SELECT * FROM Person.Person shows 5,664 logical reads by comparison.)
IO was lower for statistics related to nonclustered indexes because those NC indexes have fewer pages than the clustered index.
A better way to update statistics: Let SQL Server pick the TABLESAMPLE
If you just run the TSQL command ‘UPDATE STATISTICS Person.Person’ (without telling it to scan all the rows), it has the option to do something like this:
SELECT StatMan([SC0], [SB0000])
FROM (
    SELECT TOP 100 PERCENT [SC0], step_direction([SC0]) over (order by NULL) AS [SB0000]
    FROM (
        SELECT [Title] AS [SC0]
        FROM [Person].[Person] TABLESAMPLE SYSTEM (3.547531e+001 PERCENT)
            WITH (READUNCOMMITTED)
    ) AS _MS_UPDSTATS_TBL_HELPER
    ORDER BY [SC0], [SB0000]
) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)
It dynamically figures out a sample size by which to calculate results! (It can pick a variety of options– including scanning the whole thing.)
How to configure faster, better statistics maintenance
Avoid falling for the pre-populated settings in the “Update Statistics” task in the maintenance plan. It’s rare to truly need to use FULLSCAN to update stats in SQL Server, and even in cases where it’s justified, you want to implement it with statements targeting the individual statistics to update. The basic “UPDATE STATISTICS Schema.TableName” command is pretty clever– the issue is simply that Maintenance Plans don’t make it easy for you to run that!
Unfortunately, if you use maintenance plans there’s no super simple solution– it forces you to specify either fullscan or a specific sample. There’s no way to just use the basic “You compute the minimum sample” with that task.
You’ve still got good options, they just take a few more steps:
- You could use a T-SQL task or a custom SQL Agent job to run sp_updatestats (see the sketch after this list)
- You could use a free index and statistics maintenance script. The example I’ve linked to is super clever, and avoids updating statistics where it has just rebuilt an index!
- You could also let auto update stats take care of the issue– that’s often just fine on small databases or where there aren’t major data fluctuations
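As a minimal sketch of that first option, a SQL Agent job step (or an Execute T-SQL Statement task) could simply run sp_updatestats, which skips statistics with no row modifications and lets SQL Server choose the sample size:

-- Update statistics database-wide, only touching stats whose data has changed.
EXEC sp_updatestats;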
And each of those options should chew up less IO than updating all index and column statistics with FULLSCAN.
When is a “Full Scan” not a Full Scan?
The performance counter “Full scans/sec” sounds pretty scary. If you see spikes in this counter, you’re likely to think that your queries are madly scanning all the data in your tables– there are no stylish, well-performing seeks, just drunken queries spewing IO everywhere.
If you look up the counter in SQL Server Books Online, it seems to confirm that. Full Scans/sec is defined as the “Number of unrestricted full scans per second. These can be either base-table or full-index scans.” It’s not just a FULL scan, it’s “unrestricted.” That sounds really bad, right?
Let’s take a look. To play along with the sample queries below, warm up your test SQL Server instance, restore the AdventureWorks2012 sample database, and fire up perfmon.
Test 1: A full scan
Here’s our first test query. This query technically does an index scan, but it’s a very efficient partial scan– it uses just 3 logical reads to hop to the end of the index and grab the last value.
--This will run 500 times
declare @val1 int;
select @val1=max(BusinessEntityID) from Person.Person;
GO 500
500 of these finish in less than one second. Here’s what the plan looks like. (If you look deep into the properties of the scan operator, you’ll find that it says that it’s a special “Backwards” scan for speed.)
In perfmon, I can see that this query did the following:
- 500 Full Scans/sec (SQL Server:Access Methods)
- 1,908 Page lookups/sec (SQL Server: Buffer Manager – this is logical reads)
- 0 Page reads/sec (SQL Server: Buffer Manager — this is physical reads and the data was already in memory)
Hmm, so what we’ve seen here is that we can make the full scans/sec counter spike with a scan that isn’t really “unrestricted” or “full” in the way that we might think.
Full scan does NOT really mean that it read the entire index! (I’m not the first to discover this, of course. Michael J Swart gives a nice example of a TOP query doing something similar in this post on different types of scans.)
Test 2: a “worse” full scan
Let’s make our test query a bit nastier. This new version of the query is not such a great citizen — it reads 3,820 pages on each run (as opposed to 3 reads from the query before):
--The variable here is just to keep SSMS from choking on the output
declare @val1 xml;
select @val1 = AdditionalContactInfo from Person.Person;
GO 500
This is also a scan but it does more IO. 500 of these take five seconds or so– much longer. Here’s the plan:
When we run this, we see that:
- Full scans/sec only spikes up to a max value of 97 this time– it’s much LOWER, even though we hit the SQL Server with a nastier query. We just ran this from a single SQL Server session, so it didn’t spike up as much and took longer– but it certainly isn’t better than the first query.
- Page lookups/sec spikes up to a max value of 362,161. Each run of the query is doing more logical IO– and the fact that this query does more work shows up much more clearly here.
- Page reads/sec just has a slight bump and at one point 8 reads/sec are registered. The data was already in memory, so this wasn’t really impacted at all.
The “worse” query didn’t spike the full scans/sec counter, but it did more IO. If we were designing monitoring around the full scans/sec counter then we’d probably just throw an alert on the incident that actually had less impact.
Pulling it all together
Monitoring is a tricky business. Performance counters tell you a lot about SQL Server– but if you monitor on individual thresholds on many performance counters, you’ll end up with false alarms and more questions than answers. The counters shown in this post can be useful for baselining, trending, and interpreting in a larger context of information, but by themselves they don’t tell you a full story.
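If you’d rather trend these counters from inside SQL Server than from perfmon, sys.dm_os_performance_counters exposes them. Keep in mind that the “/sec” counters are stored as cumulative values since startup, so you need two samples and a delta to get a rate. A sketch:

-- Snapshot the counters discussed above. "/sec" values are cumulative,
-- so sample twice and subtract to get a per-second rate.
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Full Scans/sec', 'Page lookups/sec', 'Page reads/sec');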
If you’re getting frustrated with perf counters and want to learn what’s really dragging down your SQL Server, check out our free webcasts, our free scripts, or our performance tuning training videos.