Dear Brent Ozar Unlimited readers: thanks for following me for three great years. I’m off on a new adventure, but I hope to see you at a user group, a SQL Saturday, or a conference in the future!
I’m leaving with a lot of great memories – and here are a few of my favorites from the last three years.
My first company retreat was also my first SQL Cruise!
I got to co-present my first full-day pre-con at PASS Summit with Brent, Jeremiah, and Kendra in 2013.
Body boarding at the company retreats in Mexico is always a blast.
We had awesome training weeks in Chicago in 2013 and 2014.
We ate at Little Goat on May the 4th. Or maybe Doug just dresses up for nights out. You can decide.
And I’ll always fondly remember our mascot from Mexico: donk!
Kendra says: Jes has been amazing and fun to work with. But you can tell that from the pictures! We wish her all the best and are proud to have her as an alumnus.
Doug says: Thanks to you, Jes, I’ve learned a whole new repertoire of workplace-appropriate donkey jokes. (Not appropriate for other workplaces. Just this one.) We’re gonna miss you!
Jeremiah says: It’s been a blast working with Jes – for once I wasn’t the loudest person in the room! I’ll miss your energy and your love of log shipping.
Brent says: I can’t believe it’s been three years already since we first talked to Jes about the job – and in a coffee shop, of course. I’m so happy we got to be part of her journey, and I can’t wait to see what she does next.
How often are “bad” statistics to blame for a “slow” or “bad” query? Chances are, not nearly as often as you think.
What are “bad” statistics?
Most often, when people say statistics are “bad”, what they mean is “out-of-date”. But what does that mean?
Let’s compare two tables – dbo.OrderStatuses and dbo.Orders. OrderStatuses is a 25-row table that contains a master list of all possible order statuses. Only one or two changes occur per year. Orders is a 2-million-plus-row table that contains information about every order placed in our system. There are, on average, 10,000 new orders placed per day (currently).
Let’s say that statistics were last updated on dbo.OrderStatuses 90 days ago, and last updated on dbo.Orders 9 days ago. Are either of those “out of date”? The date the statistics were last updated isn’t as important as how many changes have occurred since the last update.
Roughly, a table with more than 500 rows needs a number of changes equal to 20% of its rows, plus 500, before statistics update automatically. (The formula for updating statistics is documented here.) This applies whether the table has 1,000 rows, 560,000 rows, or 2,000,000 rows. Larger tables require more changes before an automatic update triggers – which is why large tables are more susceptible to “bad” statistics.
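As a back-of-the-envelope check, 20% of dbo.Orders’ 2,000,000 rows plus 500 is 400,500 changes – roughly 40 days of new orders at 10,000 per day, even before any updates or deletes. A sketch of that calculation in T-SQL (the table names are the examples above):

```sql
/* Sketch: estimate the pre-2016 auto-update threshold (20% of rows + 500)
   for tables over 500 rows. dbo.Orders and dbo.OrderStatuses are examples. */
SELECT
    OBJECT_NAME(p.object_id) AS table_name,
    SUM(p.rows) AS total_rows,
    CAST(SUM(p.rows) * 0.20 + 500 AS bigint) AS changes_for_auto_update
FROM sys.partitions p
WHERE p.object_id IN (OBJECT_ID('dbo.Orders'), OBJECT_ID('dbo.OrderStatuses'))
  AND p.index_id IN (0, 1)  /* heap or clustered index only, to avoid double-counting */
GROUP BY p.object_id;
```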
How do I tell if statistics are my problem?
Or, to ask the question another way: is there a way to tell if statistics are my problem without looking at the query plan? Asked that way, I would answer, “Not easily.”
Finding out-of-date stats in other ways would require me to do some digging. I’d have to find the number of rows in an index, calculate the change threshold, and check the modification_counter column in sys.dm_db_stats_properties to determine if it’s close to the threshold. Then I’d have to decide whether or not to manually update the stats.
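That digging might look something like the sketch below, which compares each statistic’s modification counter to the 20%-plus-500 threshold (dbo.Orders is an example table; sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 or 2012 SP1 and later):

```sql
/* Sketch: how far along is each statistic on a table toward its
   auto-update threshold? */
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.modification_counter,
    sp.rows * 0.20 + 500 AS approx_threshold
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('dbo.Orders');
```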
I prefer to approach the problem on a query-by-query basis instead.
What can you do to combat this?
Leave the default options to auto-create and auto-update stats turned on.
Run Ola Hallengren’s index maintenance script – or, his entire set of scripts. Ola’s index maintenance script has the parameter @UpdateStatistics, which allows you to choose index stats, column stats, or all. You can also choose to set @OnlyModifiedStatistics to true, so stats are only modified if rows have changed since the last stats update. This is a great way to include stats maintenance along with index maintenance in your regular routine.
If you have large tables which have frequent updates, but automatic stats updates and a weekly stats job aren’t enough to keep up, it’s acceptable to create a SQL Server Agent Job that updates the stats on that table on a regular basis – perhaps once a day.
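A minimal sketch of what that Agent job step might run – the table name, statistic name, and sample size here are all hypothetical, so adjust to taste:

```sql
/* Daily job step: manually update statistics on one hot table. */
UPDATE STATISTICS dbo.Orders WITH SAMPLE 25 PERCENT;

/* Or target a single statistic (IX_Orders_OrderDate is a made-up name): */
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate WITH FULLSCAN;
```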
In SQL Server 2008 R2 SP1, Microsoft introduced trace flag 2371. With this trace flag enabled, rather than a fixed 20% change threshold, the more rows a table has, the lower the change percentage needed for an auto update. The exact formula isn’t published, but the KB gives one example: “For example, if the trace flag is activated, update statistics will be triggered on a table with 1 billion rows when 1 million changes occur.” Be careful with this option, though, and test it thoroughly before putting it in production. Frequent stats updates can cause frequent query recompilations, which means queries can take a bit longer to run and CPU usage goes up.
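Enabling the trace flag for testing is a one-liner (again: test before production):

```sql
/* Enable trace flag 2371 globally until the next restart. */
DBCC TRACEON (2371, -1);

/* Verify which trace flags are currently active: */
DBCC TRACESTATUS (-1);
```

To make it survive restarts, add -T2371 as a SQL Server startup parameter instead.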
What if statistics aren’t my problem?
The chances of poor query performance being caused by out-of-date statistics are very low.
There are so many things that factor into query performance that you should look at first. Is the query slow only for you, or for one set of values? The problem could be parameter sniffing. Evaluate the overall health of your SQL Server by checking your wait statistics, or using a tool like sp_AskBrent to see what is happening in your server right now. Is your server suffering from blocking? Review the query execution plan – are there obvious issues such as large scans, index seeks combined with key lookups, large sorts, or other common problems? And last but not least, do you have baselines to compare to?
Remember: bad – that is, out-of-date – statistics can cause problems, but it’s rare. A poorly-performing query is always worth tuning!
Kendra says: I completely agree with Jes – I was once a “statistics blamer” and thought it was the source of all my slow query problems. It turns out that there was just a lot I didn’t understand.
Extended Events have been available since SQL Server 2008, but they are still a black box to most database administrators. Rather than focusing on the technicalities of sessions and targets, Jes demonstrates the value of Extended Events by showing you the problems they can solve: gathering metrics, targeting performance problems, and tracking events. Let her show you how easy it is!
Brent Says: The timeline in here showing XE changes from SQL Server 2008 to SQL 2014 is really helpful. Profiler is a dead man walking, and XE is more like a zombie that rose up from the dead and became a powerful superhero working for the good guys. (Look, writing analogies isn’t my day job.)
Kendra Says: I agree – Profiler stinks! I especially love Jes’ analysis of what SQL Trace had that we don’t have in Extended Events: super helpful and not something you’d notice right away. I’m totally on Team Extended Events, I just like knowing the little things I may need to code or manage around.
When I start a SQL Server Critical Care with a client, the first thing I ask is, “What is your #1 pain point? Where does it hurt?” The answer I hear most often is, “The server is too slow”.
I’m a runner. If I could run a 5K race at an 8:30-per-mile pace, I would think I was the fastest runner on earth. The women’s world record for a 5K works out to a 4:34-per-mile pace. That record holder would think an 8:30 pace was…slow.
Slow is relative.
Slow is not a metric.
To measure our systems effectively, we need metrics. This is why baselines for our systems are so essential.
Getting Started with Baselines
The first thing to ask is, “What information do I need to track?” There are a few data points to monitor on any SQL Server: CPU usage, memory usage, number of reads and writes, time spent on reads and writes, wait statistics, and more.
If you have baselines, when a user says, “The system is slow”, you can check your current metrics against your baselines. Has CPU spiked significantly? You can then use a tool like sp_WhoIsActive or sp_AskBrent® to find queries using a large amount of CPU. Maybe you can’t find anything in SQL Server – use Resource Monitor to see if there’s an external process in Windows chewing up CPU.
Are reads or writes taking longer than usual? Again, use a tool to see what queries are currently running and how much work they are doing. If you have shared storage, such as a SAN, have a method in place to determine if it’s the SQL Server using the storage, or another system connected to the SAN.
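One way to sketch that check from the SQL Server side is sys.dm_io_virtual_file_stats, which reports cumulative I/O counts and stall times per database file. If SQL Server’s latencies look normal against your baseline but the SAN is busy, the culprit is likely another system sharing the storage:

```sql
/* Sketch: average read/write latency per database file since startup.
   Compare against your baseline, not in isolation. */
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_latency_ms,
    vfs.num_of_writes,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```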
Become familiar with wait statistics. Run samples so you know what your average waits are. If you get a call saying, “It’s slow”, look at a 60-second sample of current waits. If the top three are normally CXPACKET, ASYNC_NETWORK_IO, and PAGEIOLATCH_SH, and now LCK_M_IX is second on the list, dig into your server to see what is holding the locks.
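A minimal sketch of such a 60-second sample, taken by diffing sys.dm_os_wait_stats against itself a minute later (filter out benign background waits as needed):

```sql
/* Snapshot current cumulative waits. */
SELECT wait_type, waiting_tasks_count, wait_time_ms
INTO #waits_start
FROM sys.dm_os_wait_stats;

/* Wait 60 seconds. */
WAITFOR DELAY '00:01:00';

/* Report only the waits accumulated during the sample window. */
SELECT w.wait_type,
       w.wait_time_ms - s.wait_time_ms AS wait_time_ms_delta,
       w.waiting_tasks_count - s.waiting_tasks_count AS tasks_delta
FROM sys.dm_os_wait_stats w
JOIN #waits_start s ON s.wait_type = w.wait_type
WHERE w.wait_time_ms > s.wait_time_ms
ORDER BY wait_time_ms_delta DESC;

DROP TABLE #waits_start;
```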
Tools for Baselines
You have many options for collecting and reporting on baselines. You can write your own, or you can invest in a third party tool.
You can build your own solution using a variety of data sources:
- Perfmon: http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
- Wait Stats: http://www.brentozar.
- DMVs: http://www.brentozar.
- Extended Events: http://www.brentozar.
Data can be saved to a repository database you create, and you can write custom reports to track anything you wish in Reporting Services (or another report tool). Of course, this involves time, effort, and testing. You want to make sure you are tracking the right metrics, storing them correctly, and reporting on them accurately. The level of customization you get is wonderful, but you need to ensure you create an accurate solution.
There are many third party tools available for you to monitor SQL Server. The best way to approach selecting one is to write down a list of what you want to track, and make sure the software you pick has those features. Remember things like custom metrics, reports, and alerts! Most tools will offer a free trial; take advantage of that to make sure you’re making the right decision.
Have metrics, not feelings
“Slow” is relative. “Slow” is not a metric. You never know when someone is having a bad day and everything feels slow to him, or he needs to rush out the door early because his kid is sick, or she is getting ready for vacation and time has slowed down in her world. Measure and track. Numbers don’t lie.
Brent says: You get what you measure. The instant you start putting a graph on your cubicle wall, people start paying attention.
Kendra says: User complaints per day is also a metric! Find out what parts of the application people are complaining about and what their experience is specifically. As you work through issues, “Numbers of complaints solved” is a really useful metric for your next performance review.
I have a new mission: to convince you, the long-time Profiler user, to switch to Extended Events. I realize I have a long, difficult task ahead of me. Profiler has been around for a long time. You’re comfortable with it. You know it’s not perfect, but you have learned to live with its imperfections. Now I want you to step outside your comfort zone and learn a new tool, with new terminology, a new interface and new capabilities.
I’m going to start with a few reasons for you to consider Extended Events. I want you to think about these things and whether they could make your job easier.
1. The number of events in Profiler has remained the same since SQL Server 2008. The number of events in Extended Events (XE) has more than tripled.
What kind of events are being added to XE, but not Profiler? Events for any new features, such as Availability Groups, In-Memory OLTP, and Windows Azure Storage. You can query sys.trace_events to see which events are available for Profiler, and sys.dm_xe_objects to see the events for XE.
/* Profiler */
SELECT trace_event_id, name
FROM sys.trace_events;

/* XE */
SELECT pkg.name AS PkgName,
    pkg.description AS PkgDescr,
    obj.name AS EventName,
    obj.description AS EventDescr
FROM sys.dm_xe_objects obj
INNER JOIN sys.dm_xe_packages pkg ON pkg.guid = obj.package_guid
WHERE obj.object_type = 'event'
    AND pkg.name NOT IN ('qds', 'ucs', 'SecAudit');
If I am using XE and have an Availability Group, I can track availability_replica_state_change to see a replica’s previous and current states – such as when a replica goes from secondary to primary. If I’m using SQL Server 2014’s buffer pool extensions, I can count how many times pages are read, using buffer_pool_extension_pages_read, or removed, using buffer_pool_extension_pages_evicted. Even without any special features, XE lets me dig deeper into what SQL Server is doing. I can track the async_io_requested and async_io_completed events, which I can’t do with Profiler. I can count page_split events as they happen, to see if page splits are causing a performance problem. XE has many, many events!
2. XE gives you multiple options to save and view the data. With Profiler, you can view the action live on-screen – and risk bringing the server down – or you can save it to a .trc file and view it after the fact. XE allows you to choose one or more targets for each session, and they allow a lot of flexibility.
You can “View Live Data” and watch the events you’ve selected scroll across the screen (although I still wouldn’t recommend this approach). You can save your events to a file on disk, then review them later. You can also collect data in ways Profiler didn’t allow. For example, the event_counter target allows you to count the instances of an event happening – while it’s happening, with no going back and aggregating after the fact. The histogram target is similar, but allows you to put the events in groups – again, as it’s happening, not later. You can also use the pair_matching target to find related events that match.
3. XE sessions are easily scriptable & portable. Yes, you could script out a Profiler session. Does this really make sense to you?
An XE script is simple to read and understand. I can create this once and deploy it to any server.
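For example, here’s a sketch of a small session that could be checked into source control and deployed to every server. The session name, the one-second threshold, and the file name are all just illustrative:

```sql
/* Capture statements running longer than one second, to a file target. */
CREATE EVENT SESSION [LongRunningQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.database_name, sqlserver.sql_text)
    WHERE duration > 1000000  /* microseconds: one second */
)
ADD TARGET package0.event_file (SET filename = N'LongRunningQueries.xel')
WITH (STARTUP_STATE = OFF);
GO

ALTER EVENT SESSION [LongRunningQueries] ON SERVER STATE = START;
```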
It’s time to consider moving away from Profiler and using Extended Events. In a future release of SQL Server, Profiler will be gone – so learn more about Extended Events starting today!
If you’ve poked at Extended Events in the last five years, you know the targets store data as…XML. Has XML been the reason you haven’t adopted Extended Events? Don’t let it stop you! In this video, Jes breaks down her method for unraveling the XML of targets.
The scripts I used can be downloaded here.
Jeremiah says: This is great! I wish I’d had this kind of tutorial when I started working with Extended Events.
Kendra says: XQuery is rough, and totally weird when you’re first looking at it. It’s great to get a guided tour of this strange part of the TSQL language.
When Extended Events (XE) were released with SQL Server 2008, I was excited – something new to learn! I read Microsoft articles about them, read blog posts about them, and tried to use them. At that time, there was no GUI for XE. I had to create the sessions with T-SQL. That wasn’t awful – I learned the syntax after some trial and error, and could start and stop sessions. There was also no GUI to view the saved data – and it was all saved as XML.
I gave up when faced with XML. When SQL Server 2008 R2 came out, I once again looked at XE. With no GUI for creating or viewing data, I retreated in defeat again. With SQL Server 2012 came the GUI – a new session wizard, a new session dialog box, the (not-recommended) ability to view live data, and the ability to see the information you collected. But that didn’t solve the root problem for me – all the good data, all the results I wanted to view, aggregate, and slice and dice, were still in XML.
So, I learned to query XML with T-SQL. It hasn’t been easy for me to understand. But, by breaking it down into small steps I was able to conquer it. If querying XML has been the reason you’ve been avoiding XE, I’m here to help you get past it!
Create a session
I’m going to create a session that collects wait statistics and puts them in the Ring Buffer target, then run a small workload to capture waits.
CREATE EVENT SESSION [Ring Buffer - Track Waits] ON SERVER
ADD EVENT sqlos.wait_info (
    ACTION (sqlserver.database_id, sqlserver.database_name, sqlserver.is_system, sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE = OFF);
GO

/* Start session */
ALTER EVENT SESSION [Ring Buffer - Track Waits] ON SERVER STATE = START;

/* Run a workload or series of queries. */
USE StackOverflow_20130906;
GO
SELECT Id, DisplayName, CreationDate, LastAccessDate
FROM Users
WHERE UpVotes > 1000;

SELECT V.PostId, V.VoteTypeId
FROM Votes V
INNER JOIN Users U ON U.Id = V.UserId
WHERE U.DownVotes > 100;
I used the default query from Books Online to view the data. This is not helpful!
SELECT name, target_name, CAST(xet.target_data AS xml)
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe ON xe.address = xet.event_session_address
WHERE xe.name = 'Ring Buffer - Track Waits';
Even if I click the target_data column (which I can do!), I just get a page of XML I have to read through. Yuck!
So, how do I make this data understandable, so I can count the number of each wait type that happened? We want to get the XML into a format we can use. I’m going to start by inserting the data into a temp table and selecting it.
SELECT CAST(target_data AS xml) AS targetdata
INTO #capture_waits_data
FROM sys.dm_xe_session_targets xet
JOIN sys.dm_xe_sessions xes ON xes.address = xet.event_session_address
WHERE xes.name = 'Ring Buffer - Track Waits'
    AND xet.target_name = 'ring_buffer';

SELECT * FROM #capture_waits_data;
This gives us the same link to a page of XML. Now that the data is in a temp table, though, we can “shred” the XML. XML is just nodes of data. You only need to understand the hierarchy of the nodes to query it.
Let’s look at this XML. I can see there is one element, event name=”wait_info”, that I want to dig into. Under that, I want to get the “wait_type”, “duration”, and “signal_duration” data elements.
To do so, I use the T-SQL XML value function to pull out wait_type, like so.
SELECT xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type
FROM #capture_waits_data
CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data);
Once I understand how to use the value function, I can do the same for other data I want to see.
SELECT xed.event_data.value('(@timestamp)[1]', 'datetime2') AS [timestamp],
    xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type,
    xed.event_data.value('(data[@name="duration"]/value)[1]', 'int') AS wait_type_duration_ms,
    xed.event_data.value('(data[@name="signal_duration"]/value)[1]', 'int') AS wait_type_signal_duration_ms
FROM #capture_waits_data
CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data);
Level up: Aggregating data
For this XE session (and wait statistics in general), having every wait listed out isn’t helpful. I want to aggregate the data so I can see which wait occurred the most frequently. What happens when I try to aggregate this query?
SELECT xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type,
    COUNT(xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)')) AS count_wait_type,
    SUM(xed.event_data.value('(data[@name="duration"]/value)[1]', 'int')) AS sum_wait_type_duration_ms,
    SUM(xed.event_data.value('(data[@name="signal_duration"]/value)[1]', 'int')) AS sum_wait_type_signal_duration_ms
FROM #capture_waits_data
CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data)
GROUP BY xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)');
That query fails with an error – XML data type methods aren’t allowed in a GROUP BY clause – so I have to take a different approach. I’m going to use a derived table: I’ll put my initial query against the temp table into the FROM clause, then aggregate the results.
SELECT xei.wait_type,
    COUNT(xei.wait_type) AS count_wait_type,
    SUM(xei.wait_type_duration_ms) AS sum_wait_type_duration_ms,
    SUM(xei.wait_type_signal_duration_ms) AS sum_wait_type_signal_duration_ms
FROM
    (SELECT xed.event_data.value('(@timestamp)[1]', 'datetime2') AS [timestamp],
        xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type,
        xed.event_data.value('(data[@name="duration"]/value)[1]', 'int') AS wait_type_duration_ms,
        xed.event_data.value('(data[@name="signal_duration"]/value)[1]', 'int') AS wait_type_signal_duration_ms
    FROM #capture_waits_data
    CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data)) AS xei
GROUP BY xei.wait_type
ORDER BY SUM(xei.wait_type_duration_ms) DESC;
I have usable data!
Last but not least, I’ll clean up after myself.
/* Clean up */
DROP TABLE #capture_waits_data;
GO
/* Stop session */
ALTER EVENT SESSION [Ring Buffer - Track Waits] ON SERVER STATE = STOP;
/* Drop session */
DROP EVENT SESSION [Ring Buffer - Track Waits] ON SERVER;
Querying the XML output from Extended Events sessions is easy if you take it step-by-step. When you are looking at the raw XML, establish what elements you want to extract. (These will vary by the events you choose to capture.) Move the data into a temp table you can work with. Use the XML value function to extract one element at a time, until you have the data you need. At last, if you need or want to, you can aggregate.
Don’t let XML be the reason you won’t use Extended Events – it isn’t for me any longer!
Want to learn more about Extended Events? Start here!
Tables and indexes in your databases have statistics associated with them, and you know they help the query optimizer determine the best way to execute a query. But do you know where the information about them is stored, what types of maintenance you should be performing on them, how to check if they need to be updated, and what causes them to change? Jes will explain the care and feeding of database statistics in this webinar.
Corruption: it can strike at any time. You know this, so you have your page verification option set to CHECKSUM and you run DBCC CHECKDB regularly. When the dreaded day finally arrives, what do you do? If you haven’t faced corruption yet, or you want to brush up your repair skills, Jes will show you how to corrupt your database with a sample database and a hex editor – and how to fix it!
Want to learn more about repair options? Read up on how to restore a page in SQL Server Standard and Enterprise Edition.
The script is available for download, with the understanding that this is purely for test purposes, and will never, ever be used on a production database. Ever.
One of the many restore features in SQL Server is the ability to restore one or more pages of data. This can be very convenient in some narrow situations – for example, corruption occurs on one page or an oops update is made to one record.
The page restore process is not straightforward, however, and, as I recently discovered, the Books Online article about it is confusing. See, you have to perform the restore offline in all editions except Enterprise Edition – but the only example Books Online gives is…Enterprise Edition.
Here’s a straightforward breakdown of how to do a page-level restore both offline and online. For the sake of brevity, let’s say I have two databases – TestRestoreOnline and TestRestoreOffline. Both are in Full recovery. Each has one damaged page, which I’m going to restore. (For a full demo script, click here.)
You should already have an existing full backup – mine is at D:\SQL Backups\TestRestoreOffline-Backup1.bak. I also have one transaction log backup, D:\SQL Backups\TestRestoreOffline-LogBackup1.trn.
/* This example uses an OFFLINE restore, which is applicable to all editions of SQL Server. */
USE master;
GO
/* The NORECOVERY option on this last log backup makes the database "offline" - you don't actually set it OFFLINE. */
BACKUP LOG TestRestoreOffline
TO DISK = N'D:\SQL Backups\TestRestoreOffline-LogBackup2.trn'
WITH NORECOVERY;
GO
/* Restore the full backup, specifying one PAGE. I used sys.dm_db_database_page_allocations to find the page number. */
RESTORE DATABASE TestRestoreOffline
PAGE = '1:293' /* Have multiple? Separate with commas. */
FROM DISK = N'D:\SQL Backups\TestRestoreOffline-Backup1.bak'
WITH NORECOVERY;
/* Restore log backups */
RESTORE LOG TestRestoreOffline
FROM DISK = N'D:\SQL Backups\TestRestoreOffline-LogBackup1.trn'
WITH NORECOVERY;
RESTORE LOG TestRestoreOffline
FROM DISK = N'D:\SQL Backups\TestRestoreOffline-LogBackup2.trn'
WITH NORECOVERY;
/* Bring the database "online" */
RESTORE DATABASE TestRestoreOffline WITH RECOVERY;
That is an offline page restore. By taking the tail-log backup WITH NORECOVERY before the restores begin, the database is left in a restoring state and can’t be accessed.
An online page restore is only available in Enterprise Edition. This will allow users to access other objects in the database while you are restoring the page(s) needed.
You should already have an existing full backup (D:\SQL Backups\TestRestoreOnline-Backup1.bak) and log backup(s) (D:\SQL Backups\TestRestoreOnline-LogBackup1.trn).
/* This is an example of an online page restore (Enterprise Edition only). */
USE master;
GO
/* Restore the full backup, specifying one PAGE. I used sys.dm_db_database_page_allocations to find the page number. */
RESTORE DATABASE TestRestoreOnline
PAGE = '1:293' /* Have multiple? Separate with commas. */
FROM DISK = N'D:\SQL Backups\TestRestoreOnline-Backup1.bak'
WITH NORECOVERY;
/* Restore log backups */
RESTORE LOG TestRestoreOnline
FROM DISK = N'D:\SQL Backups\TestRestoreOnline-LogBackup1.trn'
WITH NORECOVERY;
/* With Enterprise Edition, the "online" tail-log backup - a log backup with NORECOVERY - goes here. */
BACKUP LOG TestRestoreOnline
TO DISK = N'D:\SQL Backups\TestRestoreOnline-LogBackup2.trn'
WITH NORECOVERY;
GO
/* Restore the last log backup */
RESTORE LOG TestRestoreOnline
FROM DISK = N'D:\SQL Backups\TestRestoreOnline-LogBackup2.trn'
WITH NORECOVERY;
/* Bring the database online */
RESTORE DATABASE TestRestoreOnline WITH RECOVERY;
The steps for an online restore differ slightly. The tail-log backup is taken after all the other log backups are applied, instead of at the beginning of the sequence.