Blog

Your users probably shouldn’t be able to view all of the data.

You might have regional sales managers who should only see sales for their region, or human resources staff who should be able to see employee details but not salaries.

This is usually challenging with databases.

How We’ve Historically Built Row-Level Security in SQL Server

We modify application code to pass in the user’s name or group name as part of their query, like this:

SELECT * FROM dbo.vwSalesHeader WHERE SalesVisibleTo = 'MaryJane'

MaryJane doesn’t have to be a Windows or SQL authentication account – it can be anything we want to use as a security identifier. Some apps use a UserID number from their Users table, for example.

Then the vwSalesHeader view joins our SalesHeader table out to several other security tables where user names or roles are listed. The SalesHeader table might have a StoreID field, and then we’d have other tables that listed which users could see which StoreIDs.
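
To make that concrete, the view might look roughly like this (the security table and column names here are made up for illustration):

CREATE VIEW dbo.vwSalesHeader
AS
SELECT sh.*, ss.UserName AS SalesVisibleTo
FROM dbo.SalesHeader sh
INNER JOIN dbo.StoreSecurity ss   /* hypothetical table mapping StoreIDs to the users allowed to see them */
    ON ss.StoreID = sh.StoreID;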

This approach typically works well (enough) in reporting applications where we can guarantee all access is done through views. However, it isn’t as effective when users and applications are accessing the tables directly – and that’s where the database server needs to handle row-level security.

Serious Security: Enforcing It at the Server Level

If you create database logins for every end user, and the end user is authenticated in the database, then some database platforms can perform row-level security directly against tables.

The really cool part of this approach is that you don’t need to modify your applications – the apps don’t have to access the data via stored procedures or views in order to get row-level security. If you try to read or modify a row (or in some cases, even a column) that you’re not allowed to, the database simply stops you.

PostgreSQL recently added support for this via Row Security Policies in 9.5, and Oracle’s had Oracle Label Security since at least 10g. In all vendor implementations, the database needs a map between users, roles, tables, and rows. PostgreSQL does this with row security policies that let you build any query you want as the check expression. Oracle builds hierarchical security in at the data label level with compartments and groups. Both are radically different approaches, so there’s not really a concern about how Microsoft’s implementation adheres to standards.
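
For a rough idea of the PostgreSQL flavor, here’s a minimal 9.5-style sketch (the table and column names are hypothetical):

ALTER TABLE sales_header ENABLE ROW LEVEL SECURITY;

CREATE POLICY sales_by_manager ON sales_header
    USING (region_manager = current_user);   /* the check expression can be any SQL you want */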

How Azure SQL Database Does It

Books Online explains that the v12 preview will let you:

  1. Create a security predicate function to do the security check
  2. Create a security policy on a table that points to your new security function
  3. Enforce that security policy based on whoever’s logged in – without changing their queries

That’s awesome. Instead of this:

SELECT * FROM dbo.vwSalesHeader WHERE SalesVisibleTo = 'MaryJane'

Your apps can just select from the table directly (or views still if they want):

SELECT * FROM dbo.SalesHeader

And SQL automatically applies the security policy.
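
To give a feel for the shape of it, here’s a minimal sketch based on the preview documentation. The dbo.StorePermissions table and the column names are made up, and the exact syntax may change before release:

CREATE FUNCTION dbo.fn_SalesSecurityPredicate(@StoreID int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    FROM dbo.StorePermissions p          /* hypothetical mapping of users to stores */
    WHERE p.StoreID = @StoreID
      AND p.UserName = USER_NAME();
GO

CREATE SECURITY POLICY dbo.SalesFilter
ADD FILTER PREDICATE dbo.fn_SalesSecurityPredicate(StoreID)
ON dbo.SalesHeader
WITH (STATE = ON);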

That’s my favorite kind of new feature – one that can be introduced without app changes. It’s absolutely wonderful if you’re letting end users connect directly to the database with Power BI tools like Excel.

I’m not going to write about the full implementation T-SQL here (function and policy creation) because we’re still ahead of the release date, and you can expect these details to change through the course of the release previews.

Server-Level Security Drawbacks

Most modern web and reporting applications use connection pooling with a single database login for all users, and no concern for “execute as user” statements. It’s challenging to run every query in the security context of the end user – especially if you have end users who aren’t in your actual security system. (Think public end users who don’t have Active Directory accounts.) The number of connections from your web/app tier may skyrocket, although most of those connections will end up being idle or disconnected over time. The alternative is to build in dynamic “execute as user” statements in your data access logic, and that’s nowhere near as trivial as it looks.
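
For reference, the impersonation route boils down to wrapping each request in an EXECUTE AS / REVERT pair, something like this sketch (the user name is a placeholder):

/* From the pooled connection, switch to the end user's security context,
   run the query, then switch back before returning the connection to the pool. */
EXECUTE AS USER = 'MaryJane';    /* must exist as a database user */
SELECT * FROM dbo.SalesHeader;
REVERT;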

Plus, enforcing security at the database server level requires adding a security definition field to every secured table. If you’re not allowed to modify the tables, this approach won’t work. I don’t see this as a serious drawback because it’s still less work than modifying your application to work entirely on views and stored procedures.

Performance will suffer no matter how you implement row-level security. In the big picture, I don’t see this as a drawback because you shouldn’t implement it unless you need it, and if you need it, you should be willing to amp up your hardware requirements in order to pay for the additional business logic requirements. There’s no free lunch.

Silent security generates support calls. Anytime the users know that the system is influencing their query results (be it through Resource Governor, dirty reads, simultaneous loads & queries, or row-level security) then they’re going to think their reporting data is wrong/slow/unavailable because your infrastructure is at fault. If this is a concern for you, you can enable auditing of the security policies, but keep in mind that now you’re talking about even more performance impact.

And of course, there’s the obvious gotcha of having to create database users for these roles. In a perfect on-premise world, you don’t really want to create those in SQL Server – instead, you create groups in Active Directory and then in SQL Server. Let your help desk team manage the group memberships of individual users, and only rarely make changes to the groups in SQL Server. (But still, if you’re constantly adding/editing sales regions, and your queries need to be region-aware, you’re going to be constantly making security changes in production, QA, and dev servers.)

In summary, I think row-level security is one of the coolest new engine features I’ve seen in quite a while, and it’s in Azure SQL Database first. It solves a real business pain, and continues to bring Microsoft’s databases closer to feature parity with Oracle and PostgreSQL.


I have a new mission: to convince you, the long-time Profiler user, to switch to Extended Events. I realize I have a long, difficult task ahead of me. Profiler has been around for a long time. You’re comfortable with it. You know it’s not perfect, but you have learned to live with its imperfections. Now I want you to step outside your comfort zone and learn a new tool, with new terminology, a new interface and new capabilities.

I’m going to start with a few reasons for you to consider Extended Events. I want you to think about these things, and if they could make your job easier.

1. The number of events in Profiler has remained the same since SQL Server 2008. The number of events in Extended Events (XE) has more than tripled.

[Chart: number of events available in Profiler vs. Extended Events]

What kind of events are being added to XE, but not Profiler? Events for any new features, such as Availability Groups, In-Memory OLTP, and Windows Azure Storage. You can query sys.trace_events to see which events are available for Profiler, and sys.dm_xe_objects to see the events for XE.

/* Profiler */
SELECT trace_event_id, name
FROM sys.trace_events;

/* XE */
SELECT pkg.name AS PkgName, 
  pkg.description as PkgDescr, 
  obj.name as EventName, 
  obj.description as EventDescr
FROM sys.dm_xe_objects obj
  inner join sys.dm_xe_packages pkg ON pkg.guid = obj.package_guid
WHERE obj.object_type='event'
  and pkg.name NOT IN ('qds', 'ucs', 'SecAudit');

If I am using XE and have an Availability Group, I can track availability_replica_state_change to see what the replica’s previous state and current state are – such as when a replica goes from a secondary to primary. If using SQL Server 2014’s buffer pool extensions, I can count how many times pages are read, using buffer_pool_extension_pages_read, or when they are removed, using buffer_pool_extension_pages_evicted. Even without any special features, XE lets me dig deeper into what SQL Server is doing. I can track async_io_requested and async_io_completed events, which I can’t do with Profiler. You can count page_split as it’s happening, to see if that’s causing a performance problem. XE has many, many events!
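
For instance, here’s a rough sketch (the session name and target choice are mine) of capturing page_split events to a file for later review:

CREATE EVENT SESSION [Track Page Splits] ON SERVER
ADD EVENT sqlserver.page_split(
    ACTION(sqlserver.database_id, sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename = N'TrackPageSplits.xel')
WITH (STARTUP_STATE = OFF);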

2. XE gives you multiple options to save and view the data. With Profiler, you can view the action live on-screen, and risk bringing the server down, or you can save it to a trc file and view it after the fact. XE allows you to choose one or more targets for each session, and they allow a lot of flexibility.

You can “View Live Data” and watch the events you’ve selected scroll across the screen (although I still wouldn’t recommend this approach). You can save your events to a file on disk, then review them later. You can also collect data in ways Profiler didn’t allow. For example, using the event_counter target allows you to count the instances of an event happening – while it’s happening, no going back and having to aggregate after the fact. The histogram target is similar, but allows you to put the event in groups – again, as it’s happening, not later. You can also use the pair_matching target to find related events that match.
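
As a rough sketch, a histogram session along the lines of the screenshot below might look like this (the session name is mine, and this version buckets all completed statements by database, not just SELECTs):

CREATE EVENT SESSION [Statements Per Database] ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.database_id))
ADD TARGET package0.histogram(
    SET filtering_event_name = N'sqlserver.sql_statement_completed',
        source = N'sqlserver.database_id',
        source_type = 1)    /* 1 = bucket on an action, 0 = on an event column */
WITH (STARTUP_STATE = OFF);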

[Screenshot: histogram target output]

A sample of the histogram, capturing the number of SELECT statements executed per database.

 

3. XE sessions are easily scriptable & portable. Yes, you could script out a Profiler session. Does this really make sense to you?

Do you know what @TraceId, 15, 10, @on means?

An XE script is simple to read and understand. I can create this once and deploy it to any server.

I can clearly tell I’m capturing sp_statement_completed and sql_statement_completed events.

It’s time to consider moving away from Profiler and using Extended Events. In a future release of SQL Server, Profiler will be gone – so learn more about Extended Events starting today!

You can build a lot with simple tools. Carefully.

I’m a fan of SQL Server’s transaction log shipping. It works in Standard Edition, it’s relatively simple to set up, and you can even make your log shipping secondary readable using STANDBY mode.

I’ve worked with some pretty cool, complex log shipping environments over the years. In one case, we had multiple log shipping secondaries and a load balancer involved to support a full-fledged reporting application. It worked pretty well, with a lot of careful scripting and support.

But there are a few things you should know before you decide to implement this yourself.

“Readable” Log Shipping Secondaries Are Just a Point in Time

Although a log shipping secondary can be made readable using “STANDBY” mode, it’s just readable to a specific point in time. If I bring the secondary online at 2 pm, users can only read data through the last committed transaction in the last log file I restored. And…

Everyone’s Kicked Out When You Restore Transaction Logs

The database can’t be read when you’re refreshing data. You must kick out any users (or not refresh the data).

Logs Restore More Slowly if You Use STANDBY (Readable) Mode

If you’re reading from the secondary, you usually want those periods where it’s unavailable to be as short as possible. If you have a lot of log files to restore, you will probably find that performance of the restores is better if you’re in “NORECOVERY” mode while you’re doing the restores, and then switch back to “STANDBY” at the end of the process so the data can be read.

This can be done, it just can take some fancy scripting.
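
A rough sketch of that pattern (the database name and file paths are placeholders):

/* Restore the backlog of log backups quickly with NORECOVERY... */
RESTORE LOG SalesDB FROM DISK = N'\\backupshare\SalesDB_log_1.trn' WITH NORECOVERY;
RESTORE LOG SalesDB FROM DISK = N'\\backupshare\SalesDB_log_2.trn' WITH NORECOVERY;

/* ...then switch back to readable STANDBY on the final restore. */
RESTORE LOG SalesDB FROM DISK = N'\\backupshare\SalesDB_log_3.trn'
    WITH STANDBY = N'D:\MSSQL\Backup\SalesDB_undo.tuf';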

You will also need to add monitoring for slow restores if getting the database online is critical. If your log shipping secondaries get behind, it’s possible to use differential backups to catch them up.

You can’t use “STANDBY” Mode if your Secondary is a Higher Version

I was a little sad when I learned this one years ago. I had a SQL Server 2005 instance that was the log shipping primary. The log shipping secondary was made readable so that developers could check out production data if needed without having access to production.

Our upgrade plan was to get SQL Server 2008 on the least critical servers first — and that developer access server was one of them. But I couldn’t use STANDBY mode on the log shipping secondary with it as a higher version: it failed with an error like this…

This backup cannot be restored using WITH STANDBY because a database upgrade is needed. Reissue the RESTORE without WITH STANDBY.

When SQL Server brings a database online in a higher version, it needs to make some modifications to that database. This breaks with STANDBY mode because the database is read only.

An aside: unfortunately, the Enterprise Edition feature of a database snapshot doesn’t overcome this limitation if you try to use it against a database mirror to make it readable to a point in time. You get a different error, but it’s the same theme:

Msg 946, Level 14, State 1, Line 1
Cannot open database 'MirrorMePlease_snap' version 661. Upgrade the database to the latest version.
Msg 1823, Level 16, State 7, Line 1
A database snapshot cannot be created because it failed to start.

Auto-Create Statistics Doesn’t Work in the User Database in STANDBY mode

If you’re using your log shipping secondary for reporting, SQL Server can’t automatically create statistics in the database, regardless of your settings. (That read only thing keeps coming up.)

In SQL Server 2012 and higher, this isn’t a huge problem because temporary statistics can get created in tempdb. This new feature was added when Availability Groups came along, but it also works for log shipping secondaries, which is pretty awesome.

Security is a Hoot. And by that, I Mean a Problem.

In many scenarios, you only want to grant read access to a user on the log shipping secondary database. You do not want that user to be able to access the primary. This is tricky.

To read from a database, you need a login with an associated database user. To grant reads to a log shipping secondary database, you can create all the logins you want, but the log shipping secondary database is read only, so you can’t create a user in it.

You have options:

  1. Grant access via stored procedures or views in another database. This will require enabling cross database ownership chaining, which can be a treacherous security road. And a lot of things could go wrong over time as tables are added, modified, and dropped.
  2. Create the login on the log shipping primary instance with the associated database user, and disable the login on the log shipping primary instance. If you’re using SQL authentication, you may have to use a special script to transfer the SID to the log shipping secondary to get it all to work.

Option 2 isn’t terrible, it’s just awkward to have a bunch of disabled logins. Someone can misunderstand and accidentally enable them or delete them, and then… oops.
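
Here’s a rough sketch of that SID transfer for a SQL authentication login (the login name, password, and SID value are placeholders):

/* On the primary: look up the login's SID. */
SELECT name, sid
FROM sys.sql_logins
WHERE name = N'report_reader';

/* On the secondary: recreate the login with the SID copied from the primary,
   so the user inside the read-only restored database maps to it automatically. */
CREATE LOGIN report_reader
    WITH PASSWORD = N'ChangeThisPassword1!',
         SID = 0x23FD2C8A1D3E4F5B9A7C6D5E4F3A2B1C;   /* paste the real value from the primary */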

It’s Complicated, but It’s Still Cool

If you’re clever and dedicated, you can work around these issues and use log shipping to provide read access to either applications or users who shouldn’t be reading from the live production copy of data.

Brent Says: Even with AlwaysOn AGs out, I think I’ve said the words “use log shipping” more in the last two years than I’ve said in the rest of my career. It’s useful, cheap, flexible, and nearly bulletproof.


Here’s some of the reasons companies usually virtualize their SQL Servers:

  1. Cost savings on hardware
  2. Cost savings on Windows OS licensing
  3. Cost savings on SQL Server licensing
  4. Protect against the failure of a single hardware element
  5. Leverage extended features for Disaster Recovery
  6. Automatic load balancing across multiple hosts
  7. Easier hardware replacement/migration

When we perform a SQL Critical Care® on a virtualized SQL Server, we often ask, “Are we actually getting those benefits?”

1. Cost savings on hardware – do you find yourself putting one SQL Server guest on each host, isolating them to make sure they get the performance they need? If so, you’re not actually saving money on hardware.

2. Cost savings on Windows OS licensing – as a standard, some companies license all their virtualization hosts with Windows Server Datacenter Edition in order to get unlimited virtualization rights. However, if you’re only running one guest per host (or just a few), then you’re not saving money here either.

3. Cost savings on SQL Server licensing – for this one, you’ve gotta do a little more work. Add up the licensing you’re spending now, and look at what it would take to run similar instances on bare metal hardware. Keep in mind that you can still buy dual-socket, quad-core servers that are insanely powerful (768GB RAM, dozens of SSDs), thereby keeping your SQL licensing lower.

We’re going to need your parents to sign your report card.

4. Protect against the failure of a single hardware element – on the free versions of most hypervisors, you don’t get automatic failover protection. You can manually start up a guest on another host with some human intervention. Is that enough for the business, or are they assuming it’ll all happen automatically with only a minute or two of downtime – even when you’re not around? Or even worse, do you not have enough hardware horsepower to start up your biggest SQL Server guest somewhere else if its host fails? Or, heaven forbid, are you using local SSDs with virtualization, thereby missing the entire ability to move guests around?

5. Leverage extended features for Disaster Recovery – VMware and Hyper-V have killer features (and third-party app extensions) that make it easy to replicate a guest from one site to another. Are you using those, or have you given up because SQL Server’s data change rates are too high, and your network can’t keep up?

6. Automatic load balancing across multiple hosts – VMware’s Distributed Resource Scheduler (DRS) will automatically shuffle VMs around between hosts based on resource utilization. It’s an amazing way to react to performance issues with less human intervention. You should be using it.

7. Easier hardware replacement/migration – because SQL Server licensing is priced by the CPU core, and it’s super expensive, many shops choose to improve their virtualization host hardware annually. Whenever they need more capacity in their VMware or Hyper-V clusters, they drop in a couple of new hosts, vMotion or LiveMigrate the most expensive per-core guests over to those hosts (thereby taking advantage of today’s faster processors), and then give everybody else the hand-me-downs. It’s easy to do even live during the daytime. However, some shops are still running their SQL Servers on CPUs that might get featured on Antiques Roadshow.

If you’re not leveraging at least some of these virtualization features, and you don’t plan to…then what was the point of virtualizing to begin with? Jump on in – the water’s fine!


If you’ve poked at Extended Events in the last five years, you know the targets store data as…XML. Has XML been the reason you haven’t adopted Extended Events? Don’t let it stop you! In this video, Jes breaks down her method for unraveling the XML of targets.

The scripts I used can be downloaded here.

Jeremiah says: This is great! I wish I’d had this kind of tutorial when I started working with Extended Events.

Kendra says: XQuery is rough, and totally weird when you’re first looking at it. It’s great to get a guided tour of this strange part of the TSQL language.


Oracle maintains statistics about data stored on disk. The statistics and histograms help guide the optimizer during query plan compilation. These statistics are usually helpful. Sometimes the statistics are not helpful.

As data changes, the statistics collected by Oracle may become less accurate. If I’ve created a histogram on the column transaction_date, my histogram won’t know about data that’s been added since the histogram was last created or updated. Over time, that histogram becomes less helpful. DBAs can update statistics, but another problem can arise – the new statistics might be worse than the old statistics. DBAs need tools to manage stats and verify success before moving new statistics into production.

Historical Statistics

Oracle gives DBAs a powerful tool in managing statistics – historical statistics retention. By retaining historical copies of statistics, a DBA can ensure that there’s always a plan to back out changes that have caused a performance problem. You can view the current historical retention period by running:

SELECT dbms_stats.get_stats_history_retention() FROM dual;

On my Oracle 12c install (12.1.0.2), the retention period is set to 31 days. Larger systems may want to set up smaller retention periods.

Changing the statistics retention is as easy as running:

EXEC dbms_stats.alter_stats_history_retention(5);

Once you have identified the statistics that you want to restore, you can put them back in place using DBMS_STATS.RESTORE_TABLE_STATS:

EXEC dbms_stats.restore_table_stats (
         ownname         => 'SH',
         tabname         => 'CUSTOMERS',
         as_of_timestamp => TO_DATE('2015-01-01 11:38',
                                    'YYYY-MM-DD HH24:MI')
     );

Of course, you would need to know when statistics were last collected. Thankfully, you can review when statistics were collected with the DBA_TAB_STATS_HISTORY view.
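
For example, a quick look at the gather history for the table from the earlier examples:

SELECT table_name, stats_update_time
FROM   dba_tab_stats_history
WHERE  owner = 'SH'
  AND  table_name = 'CUSTOMERS'
ORDER  BY stats_update_time DESC;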

Keep in mind that the historical statistics have to go somewhere. Monitor the size of the SYSAUX tablespace and adjust the retention period as needed. You may find that you don’t need all of those historical statistics.
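
One rough way to check how much space the statistics history is using (it lives under the SM/OPTSTAT occupant):

SELECT occupant_name, space_usage_kbytes
FROM   v$sysaux_occupants
WHERE  occupant_name = 'SM/OPTSTAT';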

Locking Statistics

Once you’ve found the good statistics, you probably don’t want to keep restoring them over and over again. The next step you can take is to lock the statistics in place. Statistics can be locked at the schema, table, or partition level. Once an object has been locked, anything depending on that object will be skipped during system wide statistics updates.

Locking the statistics on a table is as easy as:

EXEC dbms_stats.lock_table_stats('SH', 'CUSTOMERS');

If an index on the CUSTOMERS table needs a statistics update despite the lock, we can force Oracle to gather its statistics using the force flag:

EXEC dbms_stats.gather_index_stats(
         ownname => 'SH',
         indname => 'IX_CUST_FNAME',
         force   => TRUE
     );

By locking some stats in place, we can make sure that important queries don’t slip off a good plan. Using dbms_stats.restore_table_stats makes it possible to restore known good stats. There has to be a way to know if your statistics are going to be a problem before you move them into production, right?

Pending Statistics

Some tables are more volatile than others. When data changes and statistics are re-gathered, the data may be sampled in a way that causes problems. This is why we have the ability to lock statistics. In some cases, though, you may want to test changes to statistics before they go into production, rather than locking statistics in place or rolling back to your last known good statistics. In that case, you can create pending statistics.

To create pending statistics, the first step is to disable publishing statistics for a table:

EXEC dbms_stats.set_table_prefs('SH', 'SALES', 'PUBLISH', 'false');

Once publish is set to false, newly gathered statistics will be placed in a holding area rather than being immediately made available for use by Oracle. At this point, the statistics are present, but unusable. Pending statistics can be used by changing the optimizer_use_pending_statistics parameter to TRUE.

ALTER SESSION SET optimizer_use_pending_statistics = TRUE;
/* amazing query goes here */
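
The gather itself is just an ordinary dbms_stats.gather_table_stats call; with PUBLISH set to false, its results sit in the pending area, which you can review before deciding anything. A quick sketch:

SELECT table_name, num_rows, last_analyzed
FROM   dba_tab_pending_stats
WHERE  owner = 'SH';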

Once it’s been determined that stats are good, they can be published. DBAs have the option to publish all stats or just stats for a single table:

/* publish all stats */
EXEC dbms_stats.publish_pending_stats(NULL, NULL) ;
/* publish single object stats */
EXEC dbms_stats.publish_pending_stats('SH', 'SALES') ;

Or, if the statistics stink, you can delete them with the dbms_stats.delete_pending_stats procedure.

Exporting Statistics

There’s one other tool DBAs have for managing stats – exporting statistics. After creating a stats table using dbms_stats.create_stat_table, a DBA can export statistics using one of several EXPORT_*_STATS stored procedures. Statistics can be exported for columns, indexes, schemas, and tables, as well as several other database and system level statistics.

The EXPORT_*_STATS procedures will push data into the statistics table and that data can be exported to another Oracle system where an IMPORT_*_STATS procedure can be used to import stats to the other Oracle system. This can be important for development, testing, and staging environments where it may not be possible to restore a large volume of data, but developers need to assess how queries will run in the production environment.
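
A minimal sketch of the export side (the staging table name is a placeholder):

/* Create a staging table to hold the exported statistics. */
EXEC dbms_stats.create_stat_table('SH', 'STATS_STAGE');

/* Copy one table's statistics into the staging table, ready to be moved to
   another system and loaded there with dbms_stats.import_table_stats. */
EXEC dbms_stats.export_table_stats(
         ownname => 'SH',
         tabname => 'CUSTOMERS',
         stattab => 'STATS_STAGE');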

Statistics Maintenance Doesn’t Have to be a Drag

DBAs have powerful tools in Oracle to make sure statistics don’t cause performance problems. By combining these tools, it’s possible to create a stable set of statistics that lead to predictable performance. To learn more about managing Oracle database statistics, head over to the Managing Optimizer Statistics: Basic Topics or refer to the DBMS_STATS documentation for reference material.


If you’re using AGs, don’t apply these patches:

  • SQL 2012 SP2 CU3
  • SQL 2012 SP2 CU4
  • SQL 2014 CU5

until you read this Microsoft post about a breaking bug.

Your AG may stop synchronizing due to blocking between user queries and a system session. The fix is to disable automatic failover, restart the primary, and enable automatic failover again.

Carry on.


When Extended Events (XE) were released with SQL Server 2008, I was excited – something new to learn! I read Microsoft articles about them, read blog posts about them, and tried to use them. At that time, there was no GUI for XE. I had to create the sessions with T-SQL. That wasn’t awful – I learned the syntax after some trial and error, and could start and stop sessions. There was also no GUI to view the saved data – and it was all saved as XML.

I ran headfirst into a brick wall.

This is the wall I ran into.

I gave up when faced with XML. When SQL Server 2008 R2 came out, I once again looked at XE. With no GUI for creating or viewing data, I retreated in defeat again. With SQL Server 2012 came the GUI – a new session wizard, a new session dialog box, the (not-recommended) ability to view live data, and the ability to see the information you collected. But that didn’t solve the root problem for me – all the good data, all the results I wanted to view, aggregate, and slice and dice, were still in XML.

So, I learned to query XML with T-SQL. It hasn’t been easy for me to understand. But, by breaking it down into small steps I was able to conquer it. If querying XML has been the reason you’ve been avoiding XE, I’m here to help you get past it!

Create a session

I’m going to create a session that collects wait statistics and puts them in the Ring Buffer target, then run a small workload to capture waits.

CREATE EVENT SESSION [Ring Buffer - Track Waits] ON SERVER 
ADD EVENT sqlos.wait_info(
    ACTION(sqlserver.database_id,sqlserver.database_name,sqlserver.is_system,sqlserver.session_id,sqlserver.sql_text)) 
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE=OFF)
GO

/* Start session */
ALTER EVENT SESSION [Ring Buffer - Track Waits] ON SERVER STATE=START;

/* Run a workload or series of queries. */ 
USE StackOverflow_20130906;
GO 
SELECT Id, DisplayName, CreationDate, LastAccessDate 
FROM Users 
WHERE UpVotes > 1000; 

SELECT V.PostId, V.VoteTypeId 
FROM Votes V 
  INNER JOIN Users U ON U.Id=V.UserId 
WHERE U.DownVotes > 100;

I used the default query from Books Online to view the data. This is not helpful!

SELECT name, target_name, CAST(xet.target_data AS xml)
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe
   ON (xe.address = xet.event_session_address)
WHERE xe.name = 'Ring Buffer - Track Waits';

[Screenshot: results of the default ring buffer query]

Even if I click the target_data column (which I can do!), I just get a page of XML I have to read through. Yuck!

[Screenshot: the raw ring buffer XML]

Querying XML

So, how do I make this data understandable, so I can count the number of each wait type that happened? We want to get the XML into a format we can use. I’m going to start by inserting the data into a temp table and selecting it.

SELECT CAST(target_data as xml) AS targetdata
INTO #capture_waits_data
FROM sys.dm_xe_session_targets xet
JOIN sys.dm_xe_sessions xes
    ON xes.address = xet.event_session_address
WHERE xes.name = 'Ring Buffer - Track Waits'
  AND xet.target_name = 'ring_buffer';

SELECT * 
FROM #capture_waits_data;

This gives us the same link to a page of XML. What we can do now that the data is in a temp table, though, is “shred” the XML. XML is just nodes of data. You only need to understand the hierarchy of the nodes to query it.

Let’s look at this XML. I can see there is one element, event name=”wait_info” (highlighted in green) that I want to dig into. Under that, I want to get the “wait_type”, “duration”, and “signal_duration” (highlighted in yellow) attributes.

[Screenshot: the XML nodes I want to see]

To do so, I use the T-SQL XML value function to pull out wait_type, like so.

SELECT xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type
FROM #capture_waits_data
  CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data);

Once I understand how to use the value function, I can do the same for other data I want to see.

SELECT xed.event_data.value('(@timestamp)[1]', 'datetime2') AS [timestamp],
  xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type, 
  xed.event_data.value('(data[@name="duration"]/value)[1]', 'int') AS wait_type_duration_ms, 
  xed.event_data.value('(data[@name="signal_duration"]/value)[1]', 'int') AS wait_type_signal_duration_ms 
FROM #capture_waits_data
  CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data);
I can read this!

Level up: Aggregating data

For this XE session (and wait statistics in general), having every wait listed out isn’t helpful. I want to aggregate the data so I can see which wait occurred the most frequently. What happens when I try to aggregate this query?

SELECT xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type, 
  COUNT (xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)')) AS count_wait_type, 
  SUM (xed.event_data.value('(data[@name="duration"]/value)[1]', 'int')) AS sum_wait_type_duration_ms, 
  SUM (xed.event_data.value('(data[@name="signal_duration"]/value)[1]', 'int')) AS sum_wait_type_signal_duration_ms 
FROM #capture_waits_data
  CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data) 
GROUP BY xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)');
I get an error.

I have to take a different approach. I’m going to use a derived table. I will put my initial query to select the data from the temp table into the FROM clause, then aggregate on the results.

SELECT xei.wait_type, 
  COUNT(xei.wait_type) AS count_wait_type, 
  SUM(xei.wait_type_duration_ms) AS sum_wait_type_duration_ms, 
  SUM(xei.wait_type_signal_duration_ms) AS sum_wait_type_signal_duration_ms 
FROM 
  (SELECT xed.event_data.value('(@timestamp)[1]', 'datetime2') AS [timestamp],
    xed.event_data.value('(data[@name="wait_type"]/text)[1]', 'varchar(25)') AS wait_type, 
    xed.event_data.value('(data[@name="duration"]/value)[1]', 'int') AS wait_type_duration_ms, 
    xed.event_data.value('(data[@name="signal_duration"]/value)[1]', 'int') AS wait_type_signal_duration_ms 
  FROM #capture_waits_data
    CROSS APPLY targetdata.nodes('//RingBufferTarget/event') AS xed (event_data)) AS xei 
GROUP BY xei.wait_type
ORDER BY SUM(xei.wait_type_duration_ms) DESC;

I have usable data!

This is one sleepy system.

Last but not least, I’ll clean up after myself.

/* Clean up */ 
DROP TABLE #capture_waits_data; 
GO 
/* Stop session */
ALTER EVENT SESSION [Ring Buffer - Track Waits] ON SERVER STATE=STOP;
/* Drop session */
DROP EVENT SESSION [Ring Buffer - Track Waits] ON SERVER;

Querying the XML output from Extended Events sessions is easy if you take it step-by-step. When you are looking at the raw XML, establish what elements you want to extract. (These will vary by the events you choose to capture.) Move the data into a temp table you can work with. Use the XML value function to extract one element at a time, until you have the data you need. At last, if you need or want to, you can aggregate.

Don’t let XML be the reason you won’t use Extended Events – it isn’t for me any longer!

Want to learn more about Extended Events? Start here! 


This year’s Simple Talk Tribal Awards are out, and the video announcements are pretty funny:

We won in two categories: BrentOzar.com won Best Blog, and @BrentO won Twitter Account of the Year.

I’ll let Sally Fields take over from here:


You’re a developer or a DBA, and you’re comfortable writing queries to get the data you need. You’re much less comfortable trying to design the right indexes for your database server.

Join me on Wednesday, Jan 28, for a free one-hour session explaining how the SQL Server engine looks at your indexes and builds your query results.

You’ll learn:

  • The differences between clustered and nonclustered indexes
  • How (and when) to make a covering index
  • The basics of execution plans
  • What determines sargability
  • What SQL Server uses to estimate the memory your query needs
  • What happens when those estimates are wrong
  • And it’s all demoed with these Stack Overflow database pages (PDF)

I’m not kidding when I say it’s the best session I’ve ever written, and I’m really proud of it. This session is normally $29 in our online store, but thanks to Idera, this live version will be free. Register now.
