SQL Server 2014 Buffer Pool Extensions

SQL Server 2014 contains some interesting new features. Although SQL Server Standard Edition is limited to 128GB of memory, teams deploying on Standard Edition have an option to fit more of their working set in low latency storage – SQL Server Buffer Pool Extensions.

How SQL Server Normally Deals With Data

During SQL Server’s normal operations, data is read from disk into memory. At this point, the data is clean. Once the data is changed, it is marked as dirty. Eventually the dirty pages are written to disk and marked as clean; clean pages may be flushed from memory when the data cache (the buffer pool) comes under pressure. At this point the data is gone from memory and SQL Server has to read it from disk the next time a user runs a query that requires the data.
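
If you’re curious how much of your buffer pool is clean versus dirty at any given moment, sys.dm_os_buffer_descriptors will tell you. Here’s a minimal sketch (it can take a while to run on servers with a lot of memory):

/* Clean vs. dirty pages per database. A page is 8KB;
   is_modified = 1 means the page is dirty. */
SELECT  DB_NAME(database_id) AS [database],
        SUM(CASE WHEN is_modified = 1 THEN 1 ELSE 0 END) AS dirty_pages,
        SUM(CASE WHEN is_modified = 0 THEN 1 ELSE 0 END) AS clean_pages,
        COUNT(*) * 8 / 1024 AS buffer_pool_mb
FROM    sys.dm_os_buffer_descriptors
GROUP BY DB_NAME(database_id)
ORDER BY buffer_pool_mb DESC;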

As long as you have less data than you have memory, this isn’t a problem. As soon as you have more data than you have memory, you’re at the mercy of physical storage.

Right now, some of you are probably saying “So what? I have an awesome EMC/3PAR/Hitachi VX6.” You also probably have SQL Server Enterprise Edition and a pile of unused RAM sticks. This blog post isn’t for you. Go away.

The rest of you, the 99%ers, listen up.

Speeding Up Data Access with Buffer Pool Extensions

SQL Server 2014 Buffer Pool Extensions are our new secret weapon against not having enough memory. Like most secret weapons, there’s a lot of hype surrounding Buffer Pool Extensions (BPE).

The idea behind BPE is a lot like the idea behind virtual memory (better known as swap space): fast, low latency persistent storage is used to replace a portion of memory. In the case of BPE, SQL Server will use the disk space to store clean buffers – specifically, BPE will hold unmodified data pages that would otherwise have been pushed out of RAM.
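
Setting up BPE is a single statement. As a rough sketch (the file path and size below are made up, so point it at your fastest local SSD and size it based on your own testing):

/* Hypothetical path and size - adjust for your environment. */
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'E:\BPE\ExtensionFile.BPE', SIZE = 64 GB);

/* Verify the buffer pool extension configuration. */
SELECT  path, state_description, current_size_in_kb
FROM    sys.dm_os_buffer_pool_extension_configuration;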

To see just how fast this was going to perform, I created a test instance of SQL Server and decided to find out.

Test Methodology

I ran SQL Server through a relatively boring test harness – TPC-C running through HammerDB. The database was created at a scale factor of 444 warehouses – this yields 44GB of data on disk, or near enough. SQL Server 2014 RTM was installed on Windows Server 2012 on an Amazon i2.8xlarge instance. To find out more about how the instance was physically configured, you can check out the instance type details page.

SQL Server was set up in a fairly vanilla way:

  • Max degree of parallelism was set to 8
  • Max server memory was left alone
  • tempdb was given 4 data files located on a local all SSD RAID 0 of four drives
  • A second all SSD RAID 0 of four drives was reserved for BPE
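
For the curious, the parallelism setting above is just a quick sp_configure change, something like this:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;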

Tests were run multiple times and the results of the first test were discarded – many changes during this process could clear the buffer pool, so the first test results were assumed to be anomalous. The remaining results were averaged to produce the following chart:

It’s faster than not having enough RAM.

Conclusions

Having an appropriately sized buffer pool was far more effective than allocating considerable space to buffer pool extensions. BPE improved performance by 42.27%. This is not an insignificant performance gain, but BPE is no substitute for memory.

BPE will shine for customers deploying in dense, virtualized environments where memory is constrained but SSD is cheap and plentiful. Given the near ubiquity of SSD, and high speed consumer grade SSD being available for as little as $0.50 per GB, BPE may seem tempting. It may even provide some respite from performance issues. However, BPE is no substitute for RAM.

Collecting Detailed Performance Measurements with Extended Events

Analyzing a workload can be difficult. There are a number of tools on the market (both free and commercial). These tools universally reduce workload analysis to totals and averages – details and outliers are smeared together. I’m against using just averages to analyze workloads; averages and totals aren’t good enough, especially with the tools we have today.

Collecting Performance Data with Extended Events

A note to the reader: SQL Server Extended Events let us collect detailed information in a lightweight way. These examples were written on SQL Server 2012 and SQL Server 2014 CTP2 – they may work on SQL Server 2008 and SQL Server 2008R2, but I have not tested on either version.

We need to set up an Extended Events session to collect data. There are three Extended Events that provide information we want:

  • sqlos.wait_info
  • sqlserver.sp_statement_completed
  • sqlserver.sql_statement_completed

In addition, add the following actions to each event, just in case we need to perform deeper analysis at a later date:

  • sqlserver.client_app_name
  • sqlserver.client_hostname
  • sqlserver.database_id
  • sqlserver.database_name
  • sqlserver.plan_handle
  • sqlserver.query_hash
  • sqlserver.query_plan_hash
  • sqlserver.session_id
  • sqlserver.sql_text

In a production environment you should enable sampling. Sampling is just a predicate that filters out random events, typically by using the modulus of the session identifier (e.g. SPID modulo 5 = 0 will sample 20% of activity). Enabling sampling makes sure that you don’t collect too much data (yes, too much data can be a bad thing). During a routine 5 minute data load and stress test, I generated 260MB of event data – be careful about how often you run this and how long you run it for.
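
Here’s a rough sketch of what that session could look like with a 20% sample based on session_id. The file path is an example and I’ve trimmed the action list for brevity, so add the rest of the actions from the list above:

/* Sketch only: trimmed action list, example file path, 20% sample. */
CREATE EVENT SESSION [query_detail] ON SERVER
ADD EVENT sqlos.wait_info(
    ACTION(sqlserver.session_id, sqlserver.query_hash, sqlserver.sql_text)
    WHERE (package0.divides_by_uint64(sqlserver.session_id, 5))),
ADD EVENT sqlserver.sp_statement_completed(
    ACTION(sqlserver.session_id, sqlserver.query_hash, sqlserver.sql_text)
    WHERE (package0.divides_by_uint64(sqlserver.session_id, 5))),
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.session_id, sqlserver.query_hash, sqlserver.sql_text)
    WHERE (package0.divides_by_uint64(sqlserver.session_id, 5)))
ADD TARGET package0.asynchronous_file_target
(SET filename = 'C:\temp\XEventSessions\query_detail.xel',
     metadatafile = 'C:\temp\XEventSessions\query_detail.xem',
     max_file_size=5,
     max_rollover_files=5)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);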

To get started, download all of the scripts mentioned in this post.

Reporting on Extended Events

Go ahead and set up the Extended Events session and run a workload against that SQL Server for a few minutes. If you’re analyzing a production server, copy the files to a development SQL Server or to a SQL Server that has less workload – you’ll thank me later. Once you’ve collected some data in the Extended Events session, stop the session, and run the results processing script. On my test VM, processing the data takes a minute and a half to run.

The data processing script creates three tables to materialize the XML data and speed up processing; shredding XML takes a long time. Relevant data is extracted from each of the different events that were collected and persisted into the processing tables. If you’re going to be doing multiple analysis runs across the data, you may even want to put some indexes on these tables.

Once the data is loaded, run the analysis query. This doesn’t take as long as the processing script, but it does take some time; it’s worth it. The analysis script collects a number of statistics about query performance and waits. Queries are grouped into 1 minute time blocks and metrics around reads, writes, duration, and CPU time are collected. Specifically, each metric has the following statistics built up:

  • Total
  • Average
  • Standard Deviation
  • Minimum
  • Maximum
  • Percentiles – 50th, 75th, 90th, 95th, and 99th

The same thing happens for each wait as well – each wait is time boxed on the minute and then both signal and resource waits are analyzed with the same statistics as query duration metrics.
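
If you’re wondering what those calculations look like, here’s a hand-wavy sketch against a hypothetical processing table. The #statements table and its columns are stand-ins for whatever the processing script creates, and PERCENTILE_CONT requires SQL Server 2012 or newer:

/* Sketch only - #statements and its columns are stand-ins. */
WITH boxed AS (
    SELECT  DATEADD(MINUTE, DATEDIFF(MINUTE, 0, event_time), 0) AS minute_block,
            query_hash,
            duration_ms
    FROM    #statements
)
SELECT DISTINCT
        minute_block,
        query_hash,
        SUM(duration_ms)   OVER (PARTITION BY minute_block, query_hash) AS total_duration,
        AVG(duration_ms)   OVER (PARTITION BY minute_block, query_hash) AS avg_duration,
        STDEV(duration_ms) OVER (PARTITION BY minute_block, query_hash) AS stdev_duration,
        MIN(duration_ms)   OVER (PARTITION BY minute_block, query_hash) AS min_duration,
        MAX(duration_ms)   OVER (PARTITION BY minute_block, query_hash) AS max_duration,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms)
            OVER (PARTITION BY minute_block, query_hash) AS duration_95th
FROM    boxed;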

In both analyses, the work is performed on a query by query basis. At the end of the analysis you get a multi-dimensional view of the data by time and query. It should be easy to perform additional analysis on the data to create broader time windows or to analyze the entire dataset at once.

Want to see what the output looks like? Check out these two screenshots:

Query Performance by the Numbers

Query Waits

Why Produce All of These Metrics?

All of these metrics give us insight into how these queries really run; an average just doesn’t help. Standard deviation alone lets us be aware of the variability of a particular metric – high standard deviation on a particular wait type means that we have a lot of variability in how long we wait on a resource. We also collect percentiles of all of these different metrics to help understand the distribution of data.

With this data at our disposal, we can make a better analysis of a workload. Now we can identify variations of a query that are producing bad plans, taking too long, or just reading an excessive amount of data. Or, better yet, if query performance is constant, we know that your code is just plain awful.

How’s that for a parting thought?

SQL Server 2014 Licensing Changes

With the release of SQL Server 2014, we get to learn all kinds of new licensing changes. While I don’t work for Microsoft legal, I do have a PDF reader and a web browser. You can follow along in the SQL Server 2014 Licensing Datasheet… if you dare.

Server + CAL Licensing is Still Around

It’s only for Standard Edition and BI Edition.

Microsoft is highly recommending that VMs be licensed as Server + CAL (rather than per core). This can make a lot of sense when there are small, single application SQL Servers that cannot be consolidated for security reasons. Having a number of Server + CAL licenses for 1 or 2 vCPU instances can be much more cost effective than buying a large number of core-based licenses.

Of course, it makes even more sense to just license the entire VM host…

How much money would it take to give you assurance?

Standby Servers Require Software Assurance

Prior to SQL Server 2014, many shops were able to deploy a single standby server without licensing SQL Server. Log shipping, mirroring, and even failover clustering allowed for an unlicensed passive node, provided that the passive node didn’t become the primary for more than 28 days.

That’s gone.

If you want to have a standby node, you’ve got to pony up and buy software assurance. Head over to the SQL Server 2014 Licensing Datasheet; at the bottom of page three, it reads “Beginning with SQL Server 2014, each active server licensed with SA coverage allows the installation of a single passive server used for fail-over support.” The passive secondary server doesn’t need to have a complete SQL Server license, but Software Assurance is a pricey pill to swallow. In short, Software Assurance (SA) is a yearly fee that customers pay to get access to the latest and greatest versions of products as well as unlock additional features that may have complex deployment scenarios.

In case you were confused, high availability is officially an enterprise feature. Note: I didn’t say Enterprise Edition. I mean enterprise with all of the cost and trouble that the word “enterprise” entails in our modern IT vernacular.

All Cores Must Be Licensed

You heard me.

To license a physical server, you have to license all of the cores. Don’t believe me? Check out this awesome screenshot:

License ALL THE CORES

It’s even more important to consider alternatives to having a number of SQL Servers spread throughout your environment. SQL Server consolidation and virtualization are going to become even more important as SQL Server licensing changes.

Finding new ways to analyze, tune, and consolidate existing workloads is going to be more important than ever before. Your ability to tune SQL Server workloads is going to be critical in successful SQL Server deployments. The days of worrying when the server hit 25% capacity are fading into history – as licensing costs increase, expect server density and utilization to increase, too.

Standard Edition Has a New Definition

“SQL Server 2014 Standard delivers core data management and business intelligence capabilities for non-critical workloads with minimal IT resources.” You can read between the lines a little bit on this one – SQL Server Standard Edition isn’t getting back mirroring or anything like it. In fact – SQL Server Standard Edition sounds an awful lot like the database that you use to run your non-critical ISV applications, SharePoint, and TFS servers.

Software Assurance Gives You Mobility

If you want to move your SQL Server around inside your VM farm, you need to buy Software Assurance. VM mobility lets teams take advantage of VMware DRS or System Center Virtual Machine Manager. This isn’t new for anyone who has been virtualizing SQL Servers for any amount of time. What is explicitly spelled out, though, is that each VM licensed with SA “can be moved frequently within a server farm, or to a third-party hoster or cloud services provider, without the need to purchase additional SQL Server licenses.”

In other words – as long as you’re licensed for Software Assurance, those SQL Servers can go anywhere.

SQL Server 2014 Licensing Change Summary

Things are changing. DBAs need to take stock of their skills and help the business get more value from a smaller SQL Server licensing footprint. Realistically, these changes make sense as you look at the broader commercial IT landscape. Basic features continue to get cheaper. More resources are available in SQL Server 2014 Standard Edition, but complex features that may require a lot of implementation time, and Microsoft support time, come with a heavy price tag.

Finding One Problem Query With Extended Events

It’s easy to find problem queries on your SQL Server after they’ve happened. You can open your favorite monitoring tool or look in the plan cache. Those options are great for finding things after they’ve happened, but what if you want to find problems as they happen?

Server Side Trace

You could set up a SQL Server server-side trace. This is an acceptable solution if you’re using SQL Server 2008 or earlier. There are a few problems with this approach.

Problem the first – the SQL Server Profiler UI is less than beautiful. The GUI is somewhat cumbersome and the scripting interface is hilariously bad. To make matters worse, capturing only a single query requires string-based filtering. There’s a plan handle property available in Profiler, but it’s not available for T-SQL or stored procedures.

Something appears to be missing

As much as I like string matching in T-SQL (hint: I don’t), I think it’s fair to say that Profiler/Server Side Trace are best avoided unless you’re using SQL Server 2008 or earlier.

Extended Events

Extended events are nowhere near as hard as you’d think. Trust me on this. How do I know? I set up an Extended Events session to grab a particular query.

Here’s what you do:

  1. Create a crappy stored procedure. Use your imagination.
  2. Run the stored procedure a lot.
  3. Grab the query_hash for that stored procedure. The easiest way I’ve found to do this is to use our plan cache query to identify it (there’s a sketch after this list). My query_hash is 0x2B42702322C10821. What’s yours?
  4. Create an extended events session to capture that specific query hash.
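
For step 3, something along these lines against the plan cache will do the trick. This is a rough sketch, and 'MyCrappyProc' is a stand-in for whatever you created in step 1:

/* Rough sketch: find the query_hash for statements in a given procedure. */
SELECT  qs.query_hash,
        qs.execution_count,
        SUBSTRING(st.text, 1, 200) AS query_text
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE   st.text LIKE '%MyCrappyProc%'
ORDER BY qs.execution_count DESC;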

Step 4 isn’t as easy as it sounds. Astute readers will note that I never said this was easy, just that it’s not as hard as you would think.

Even though the query_hash is stored as a hexadecimal in SQL Server, Extended Events would prefer to see an unsigned 64-bit integer. To get this magic number, take your query_hash and cast it as either a BIGINT or a DECIMAL(20,0):

SELECT CAST(0x2B42702322C10821 AS DECIMAL(20,0));
/* my magic number is 3117177188303046689. */

Once you convert the query_hash to a magic_number, create your Extended Events session. I chose to capture the sp_statement_completed and sql_statement_completed events and added a number of actions like system time, app name, and client host name. Here’s what it would look like capturing this to a file:

CREATE EVENT SESSION [query hash] ON SERVER 
ADD EVENT sqlserver.sp_statement_completed(
    ACTION(package0.collect_system_time,
           sqlserver.client_app_name,
           sqlserver.client_hostname,
           sqlserver.database_name)
    WHERE ([sqlserver].[query_hash]=(3117177188303046689.))),
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(package0.collect_system_time,
           sqlserver.client_app_name,
           sqlserver.client_hostname,
           sqlserver.database_name)
    WHERE ([sqlserver].[query_hash]=(3117177188303046689.)))
ADD TARGET package0.asynchronous_file_target
(SET filename = 'C:\temp\XEventSessions\query_hash.xel',
     metadatafile = 'C:\temp\XEventSessions\query_hash.xem',
     max_file_size=5,
     max_rollover_files=5)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO

Once you have that up and running, you should be able to start the session, if it isn’t already started, by running:

ALTER EVENT SESSION [query hash] ON SERVER
STATE = START ;

Just like that, you’ll be capturing your terrible queries to disk where you can mine the extended events files for gold and glory at your own convenience. If you want to query it, it’d look something like this:

WITH events_cte AS (
    SELECT 
        DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), CURRENT_TIMESTAMP),
            xevents.event_data.value('(event/@timestamp)[1]',
            'datetime2')) AS [event time] ,
        xevents.event_data.value('(event/action[@name="client_app_name"]/value)[1]', 'nvarchar(128)')
          AS [client app name],
        xevents.event_data.value('(event/action[@name="client_hostname"]/value)[1]', 'nvarchar(max)')
          AS [client host name],
        xevents.event_data.value('(event/action[@name="database_name"]/value)[1]', 'nvarchar(max)')
          AS [database name],
        xevents.event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint') 
          AS [duration (µs)],
        xevents.event_data.value('(event/data[@name="cpu_time"]/value)[1]', 'bigint') 
          AS [cpu time (µs)],
        xevents.event_data.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint') AS [logical reads],
        xevents.event_data.value('(event/data[@name="row_count"]/value)[1]', 'bigint')  AS [row count]
    FROM sys.fn_xe_file_target_read_file
         ('C:\temp\XEventSessions\query_hash*.xel',
          'C:\temp\XEventSessions\query_hash*.xem',
          null, null) 
    CROSS APPLY (select CAST(event_data as XML) as event_data) as xevents
)
SELECT *
FROM events_cte
ORDER BY [event time] DESC;

Cleaning Up After Yourself

Once you’re done watching a specific query or queries, make sure you clean up after yourself. There’s no reason to add extra load to SQL Server when you aren’t watching. Make sure to stop and remove your Extended Events session:

/* Stop the Extended Events session */
ALTER EVENT SESSION [query hash] ON SERVER
STATE = STOP;
/* Remove the session from the server.
   This step is optional - I clear them out on my dev SQL Server
   because I'm constantly doing stupid things to my dev SQL Server. */
DROP EVENT SESSION [query hash] ON SERVER;

Augment Your Techniques

This won’t replace any other techniques that you’re using to find poorly performing queries. However, you can use this technique to find problem queries as they’re happening.

Different monitoring techniques provide you with a different view of the data. Our plan cache query shows you a historical record of what’s been going on, but it can only show so much. If a query has been run 14,400 times over the last 24 hours, does that mean it’s been run 10 times a minute for that entire period or is there a period of activity where that query gets run 1,440 times a minute for 10 minutes straight?

This isn’t going to replace the need for other tools you have at your disposal. It’s just another way to get the job done.

Extended Events – It Doesn’t Have To Be Hard

A lot of folks would have you think that Extended Events need to be complicated and involve copious amounts of XML shredding and throwing things across the office. I’m here to tell you that it doesn’t have to be so bad.

Collecting Blocked Process Reports and Deadlocks Using Extended Events

When you want to find blocking, you probably turn to the blocked process report. You mess around with profiler on your SQL Server 2012 box. You probably feel a little bit dirty for clunking around in that old interface, but it gets the job done.

There’s a better way… Well, there is at least a less awful way: Extended Events.

CREATE EVENT SESSION [blocked_process] ON SERVER 
ADD EVENT sqlserver.blocked_process_report(
    ACTION(sqlserver.client_app_name,
           sqlserver.client_hostname,
           sqlserver.database_name)) ,
ADD EVENT sqlserver.xml_deadlock_report (
    ACTION(sqlserver.client_app_name,
           sqlserver.client_hostname,
           sqlserver.database_name)) 
ADD TARGET package0.asynchronous_file_target
(SET filename = N'c:\temp\XEventSessions\blocked_process.xel',
     metadatafile = N'c:\temp\XEventSessions\blocked_process.xem',
     max_file_size=(65536),
     max_rollover_files=5)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS)
GO

/* Make sure this path exists before you start the trace! */

With that, you’ve created an Extended Events session to grab blocked processes and deadlocks. Why both? The blocked process report makes use of the deadlock detector. Since large amounts of blocking are frequently synonymous with deadlocking, it makes sense to grab both at the same time. There are a few other things we’ll need to do to make sure you can collect blocked processes:

EXEC sp_configure 'show advanced options', 1 ;
GO
RECONFIGURE ;
GO
/* Enable the blocked process report */
EXEC sp_configure 'blocked process threshold', '5';
RECONFIGURE
GO
/* Start the Extended Events session */
ALTER EVENT SESSION [blocked_process] ON SERVER
STATE = START;

At this point, you’ll be collecting the blocked process report with Extended Events. There’s no profiler session to set up, just start and stop the Extended Event session at your leisure.

Reading the Blocked Process Report from Extended Events

We’re saving the blocked process report to disk using Extended Events. Now what?

We need to get that blocked process data out of the Extended Events files and somewhere that we can better analyze it.

WITH events_cte AS (
  SELECT 
    xevents.event_data,
    DATEADD(mi,
    DATEDIFF(mi, GETUTCDATE(), CURRENT_TIMESTAMP),
    xevents.event_data.value(
      '(event/@timestamp)[1]', 'datetime2')) AS [event time] ,
    xevents.event_data.value(
      '(event/action[@name="client_app_name"]/value)[1]', 'nvarchar(128)')
      AS [client app name],
    xevents.event_data.value(
      '(event/action[@name="client_hostname"]/value)[1]', 'nvarchar(max)')
      AS [client host name],
    xevents.event_data.value(
      '(event[@name="blocked_process_report"]/data[@name="database_name"]/value)[1]', 'nvarchar(max)')
      AS [database name],
    xevents.event_data.value(
      '(event[@name="blocked_process_report"]/data[@name="database_id"]/value)[1]', 'int')
      AS [database_id],
    xevents.event_data.value(
      '(event[@name="blocked_process_report"]/data[@name="object_id"]/value)[1]', 'int')
      AS [object_id],
    xevents.event_data.value(
      '(event[@name="blocked_process_report"]/data[@name="index_id"]/value)[1]', 'int')
      AS [index_id],
    xevents.event_data.value(
      '(event[@name="blocked_process_report"]/data[@name="duration"]/value)[1]', 'bigint') / 1000
      AS [duration (ms)],
    xevents.event_data.value(
      '(event[@name="blocked_process_report"]/data[@name="lock_mode"]/text)[1]', 'varchar(10)')
      AS [lock_mode],
    xevents.event_data.value(
      '(event[@name="blocked_process_report"]/data[@name="login_sid"]/value)[1]', 'int')
      AS [login_sid],
    xevents.event_data.query(
      '(event[@name="blocked_process_report"]/data[@name="blocked_process"]/value/blocked-process-report)[1]')
      AS blocked_process_report,
    xevents.event_data.query(
      '(event/data[@name="xml_report"]/value/deadlock)[1]')
      AS deadlock_graph
  FROM    sys.fn_xe_file_target_read_file
    ('C:\temp\XEventSessions\blocked_process*.xel',
     'C:\temp\XEventSessions\blocked_process*.xem',
     null, null) 
    CROSS APPLY (SELECT CAST(event_data AS XML) AS event_data) as xevents
)
SELECT 
  CASE WHEN blocked_process_report.value('(blocked-process-report[@monitorLoop])[1]', 'nvarchar(max)') IS NULL 
       THEN 'Deadlock'
       ELSE 'Blocked Process'
       END AS ReportType,
  [event time],
  CASE [client app name] WHEN '' THEN ' -- N/A -- '
                         ELSE [client app name] 
                          END AS [client app name],
  CASE [client host name] WHEN '' THEN ' -- N/A -- '
                          ELSE [client host name]
                          END AS [client host name],
  [database name],
  COALESCE(OBJECT_SCHEMA_NAME(object_id, database_id), ' -- N/A -- ') AS [schema],
  COALESCE(OBJECT_NAME(object_id, database_id), ' -- N/A -- ') AS [table],
  index_id,
  [duration (ms)],
  lock_mode,
  COALESCE(SUSER_NAME(login_sid), ' -- N/A -- ') AS username,
  CASE WHEN blocked_process_report.value('(blocked-process-report[@monitorLoop])[1]', 'nvarchar(max)') IS NULL 
       THEN deadlock_graph
       ELSE blocked_process_report
       END AS Report
FROM events_cte
ORDER BY [event time] DESC ;

In this query, you read from an Extended Events session that’s being saved to disk and perform XML shredding to get client information. It isn’t a pretty query, but it does the job very well.

Viewing the Extended Events Deadlock Graphs

Extended Events deadlock graphs use a slightly different XML schema than what SSMS expects, so when you try to open one you’ll see an error along the lines of “There is an error in XML document”. For folks using SQL Server 2012 and earlier, you can either parse the XML by hand or use SQL Sentry Plan Explorer.

Viewing the Extended Events Blocked Process Report

But what about the blocked process report? After all, your users are complaining about blocking, right?

Michael J. Swart has created tools to view the blocked process report. It’d be awesome if you could use it, but Michael’s blocked process report viewer uses the output of a server side trace to read blocking information. These Extended Events files are different enough that you can’t use them outright. You can, however, create a table that will let you use the blocked process report viewer:

CREATE TABLE bpr (
    EndTime DATETIME,
    TextData XML,
    EventClass INT DEFAULT(137)
);
GO

WITH events_cte AS (
    SELECT 
        DATEADD(mi,
        DATEDIFF(mi, GETUTCDATE(), CURRENT_TIMESTAMP),
        xevents.event_data.value('(event/@timestamp)[1]',
           'datetime2')) AS [event_time] ,
        xevents.event_data.query('(event[@name="blocked_process_report"]/data[@name="blocked_process"]/value/blocked-process-report)[1]')
            AS blocked_process_report
    FROM    sys.fn_xe_file_target_read_file
        ('C:\temp\XEventSessions\blocked_process*.xel',
         'C:\temp\XEventSessions\blocked_process*.xem',
         null, null) 
        CROSS APPLY (SELECT CAST(event_data AS XML) AS event_data) as xevents
)
INSERT INTO bpr (EndTime, TextData)
SELECT 
    [event_time],
    blocked_process_report
FROM events_cte
WHERE blocked_process_report.value('(blocked-process-report[@monitorLoop])[1]', 'nvarchar(max)') IS NOT NULL 
ORDER BY [event_time] DESC ;

EXEC sp_blocked_process_report_viewer @Trace='bpr', @Type='TABLE';

While you still have to read the XML yourself, this will give you a view into how deep the blocking hierarchies can go. Collecting this data with Extended Events means that you won’t have to sit at your desk, running queries, and waiting for blocking to occur.

Extended Events – Not That Hard

Extended Events aren’t difficult to use. They provide a wealth of information about SQL Server and make it easier to collect information from complex or difficult to diagnose scenarios. You really can collect as much or as little information as you want from SQL Server. When you get started, the vast majority of your work will be spent either looking up Extended Events to use or formatting the output of the queries into something meaningful.

Using SQL Server’s Table Valued Parameters

Table valued parameters (TVPs) are nothing new – they were introduced in SQL Server 2008. TVPs are a great way to get data into SQL Server, and yet very few developers use them.

Getting Started with TVPs

In order to get started, we need to create a user defined type. This type is necessary so we have a well-defined interface between SQL Server and the outside world – you can’t just pass a bunch of junk in as a parameter to SQL Server… well, you could, but that’s called XML.

Let’s create that user defined type:

CREATE TYPE dbo.SalesPersonTerritory AS TABLE
(
    SalesPersonID INT,
    TerritoryID INT
);

Now we can create variables using the dbo.SalesPersonTerritory type whenever we want. It’s just this easy:

DECLARE @spt SalesPersonTerritory;
SELECT * FROM @spt;

Using Table Valued Parameters With Stored Procedures

Having a special table type is really convenient, but it doesn’t help if you can’t use it, right? Let’s assume that you’re calling a stored procedure and you’re sick of joining lists of strings on the client and then splitting them apart on the server. What you really need is the ability to pass a fully fledged table across the wire and into SQL Server.
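
On the SQL Server side, a stored procedure just declares a READONLY parameter of our table type. Here’s a sketch where the procedure name and body are made up for illustration:

CREATE PROCEDURE dbo.AssignTerritories
    @Assignments dbo.SalesPersonTerritory READONLY
AS
BEGIN
    SET NOCOUNT ON;

    /* Do something useful with the rows that were passed in. */
    SELECT  SalesPersonID, TerritoryID
    FROM    @Assignments;
END
GO

/* Calling it from T-SQL: */
DECLARE @spt dbo.SalesPersonTerritory;
INSERT INTO @spt (SalesPersonID, TerritoryID)
VALUES (279, 5), (282, 6);

EXEC dbo.AssignTerritories @Assignments = @spt;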

This is relatively easy to accomplish. From C#, the code would look something like this:

string cnString = @"data source=.;initial catalog=TVPTester;user id=tvptester;password=tvptester;";

using (var connection = new SqlConnection(cnString))
{
    connection.Open();
    using (SqlCommand cmd = new SqlCommand("SELECT * FROM @tvp;", connection))
    {
        var pList = new SqlParameter("@tvp", SqlDbType.Structured);
        pList.TypeName = "dbo.SalesPersonTerritory";
        pList.Value = SalesPersonTerritoryTable();

        cmd.Parameters.Add(pList);

        using (var dr = cmd.ExecuteReader())
        {
            while (dr.Read())
                Console.WriteLine(dr["SalesPersonID"] + " - " + dr["TerritoryID"]);
        }
    }
}

That’s really all there is to it.

C# Tips for Table Valued Parameters

There are a few other things that developers can do to make their life easier when working with table valued parameters. A DataTable, just like a table in SQL Server, should have types declared for all columns. While it’s easy enough for developers to create these tables on the fly, that won’t cut it in production code – boilerplate code means that people can make mistakes.

To make life easier, developers can create methods in their application to make it easier to work with table valued parameters. We can create a special chunk of code that will make it easy for developers to instantiate and use a DataTable that matches up with the table valued parameter.

static DataTable SalesPersonTerritoryTable()
{
    var dt = new DataTable();
    dt.Columns.Add("SalesPersonID", typeof(int));
    dt.Columns.Add("TerritoryID", typeof(int));
    return dt;
}

Seasoned developers will even create methods that let them drop a list of objects straight into an appropriate DataTable. There are many different ways to build convenience methods into code and make it easy for development teams to work with TVPs.

For simple data type matching, this works well. If you’re dealing with more complex data types, you’ll want to check out SQL-CLR Type Mapping to make sure you get the right data type. You’ll notice that some datatypes (varchar for instance) have no direct counterpart in the .NET Framework. Sometimes you just have to lose some fidelity – make wise decisions, it gets crazy out there.

These same techniques can be used with a string of ad hoc SQL, too. TVPs aren’t limited to stored procedures; they can be used anywhere that you’re executing parameterized code.

Gotchas of TVPs

There are two big gotchas with TVPs.

First: the table variable that comes in as a table valued parameter cannot be changed. You’re stuck with whatever values show up. No inserts, updates, or deletes can be applied.

Second: table valued parameters are still table variables – they get terrible cardinality estimates.

We can get around both of these problems with the same technique – copy the contents of the TVP into a temp table. Although it adds an extra step to using the TVP, I’ve found that copying the contents of the TVP to a temporary table leads to better execution plans and much less confusion during development and troubleshooting.
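
Inside the procedure, that workaround looks something like this sketch (reworking the hypothetical procedure from earlier):

ALTER PROCEDURE dbo.AssignTerritories
    @Assignments dbo.SalesPersonTerritory READONLY
AS
BEGIN
    SET NOCOUNT ON;

    /* Copy the read-only TVP into a temp table. Unlike the TVP,
       #Assignments can be indexed and modified, and it gets
       statistics for better cardinality estimates. */
    SELECT  SalesPersonID, TerritoryID
    INTO    #Assignments
    FROM    @Assignments;

    CREATE CLUSTERED INDEX cx_assignments ON #Assignments (SalesPersonID);

    /* ...the rest of the procedure works against #Assignments... */
END
GO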

Summary – Using Table Valued Parameters isn’t Hard.

TVPs aren’t difficult to use. They’re just different and require a different mindset. By using TVPs, developers can pass many rows into a stored procedure and create far more complex logic and behavior than is possible using only single parameter values.

Reporting in Production: SQL Server (video)

Everyone wants reports but nobody wants to build out a separate reporting server. What options do you have short of throwing up your hands in defeat? Join Jeremiah to learn about four SQL Server technologies that help with reporting in production. This session is for DBAs and developers looking for a place to get started with reporting against SQL Server.

For the links and scripts, check out the Reporting in Production: SQL Server page.

Exploring the Magic of the Plan Cache

The plan cache holds a lot of secrets about what’s going on inside SQL Server. In the First Responder Kit we shared one of our plan cache scripts to find the top resource consuming queries. That query works well, but over time we’ve added some additional functionality to the query. I figured it was time to share the new query that we’re using to analyze SQL Server performance.

Our existing query looks at individual query stats, but it doesn’t take into account stored procedure or trigger execution stats. During our SQL Critical Care checks, we’ve found it helpful to look at both procedures and triggers to figure out if they were causing problems for overall server health.

What Are We Looking At?

The original query just looked at sys.dm_exec_query_stats and looked at average and total metrics across CPU, duration, and logical reads.
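
In spirit, the classic version boils down to something like this simplified sketch (the real script does quite a bit more):

/* Simplified sketch of the classic approach. total_worker_time and
   total_elapsed_time are in microseconds, so divide by 1000 for ms. */
SELECT TOP 20
        SUBSTRING(st.text, 1, 100)                        AS query_text,
        qs.execution_count,
        qs.total_worker_time / 1000                       AS total_cpu_ms,
        qs.total_worker_time / qs.execution_count / 1000  AS avg_cpu_ms,
        qs.total_elapsed_time / 1000                      AS total_duration_ms,
        qs.total_logical_reads,
        qs.total_logical_reads / qs.execution_count       AS avg_logical_reads
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;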

Plan cache query: classic flavor

This was a helpful approach, but over time it’s become apparent that we needed to look at more than just individual statements – what if a single procedure was causing problems that only show up in aggregate?

The Bigger Picture

To get a view of the bigger picture, we added in two more DMVs - sys.dm_exec_trigger_stats and sys.dm_exec_procedure_stats. This gives us a big picture view of what’s going on inside SQL Server – the only thing that would make this better would be a DMV for function execution stats.

To avoid skewing results, data is aggregated by the query_hash - unfortunately this means the queries won’t work against SQL Server 2005. Not only do we rank queries by CPU, IO, duration, and execution count, but a second level of ranking is provided that ranks queries within their logical grouping – by statement, procedure, and trigger. If you want to see what your most expensive trigger is doing, it’s easy enough to make a quick change to the query.
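
To give you a taste of what the extra DMVs add, here’s a stripped-down sketch that ranks procedures by total CPU. The full script does the same for triggers and layers on the rankings described above:

/* Stripped-down sketch; sys.dm_exec_trigger_stats works the same way. */
SELECT TOP 20
        OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
        ps.execution_count,
        ps.total_worker_time / 1000               AS total_cpu_ms,
        ps.total_elapsed_time / 1000              AS total_duration_ms,
        ps.total_logical_reads
FROM    sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_worker_time DESC;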

Check it out:

New and Improved Output

You can download the script in the usual way – by agreeing to our crazy terms of service, selling your email address to a foreign government, and clicking “download”.

Using the Query

This query is easy enough to start using right away – just download the file and run it. Once you’re familiar with it, scroll to the bottom and you’ll see two separate queries you can run. The first is suitable for pasting into Excel – it has no query plan and the SQL text is shortened to easily paste into a single cell. The second query has everything that you want.

Since data is dumped into a temporary table during analysis, it’s easy to keep re-querying the temporary table as you re-sort data or refine what you’re looking for.

Dynamic Sorting

While working on some DMV scripts, I came up with a lazy way to have a user definable sort order in the query that seemed like pure genius. I showed it to the team and they’d never seen anything like it before.

The Situation

Users like to be in control. They want to define custom columns, sort orders, and basically drag, drop, and pivot chart their way to victory. While I’m not going to show you how to build custom everything, we can look at a custom sort order.

Let’s start with a simple query:

SELECT  SalesOrderNumber, OrderDate, DueDate, ShipDate,
        PurchaseOrderNumber, AccountNumber, SubTotal,
        TaxAmt, Freight, TotalDue
FROM    Sales.SalesOrderHeader

Possible Solutions

Our users want to be able to define a custom sort order on this query. We could solve this in a few ways:

  1. Writing several stored procedures
  2. Use dynamic SQL to build an ORDER BY

Writing several stored procedures is tedious and error prone – it’s possible that a bug can be fixed in one of the stored procedures but not the others. This solution also presents additional surface area for developers and DBAs to test and maintain. The one advantage that this approach has is that each stored procedure can be tuned individually. For high performance workloads, this is a distinct advantage. For everything else, it’s a liability.

We could also use dynamic SQL to build an order clause. I wanted to avoid this approach because it seemed hacky. After all, it’s just string concatenation. I also wanted to work in the ability for users to supply a top parameter without having to use the TOP operator.
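
For the record, the dynamic SQL route can be kept reasonably safe by validating the sort column against a known list instead of concatenating raw user input. Here’s a sketch:

/* Sketch: the sort column is checked against a whitelist before it
   ever touches the SQL string. */
DECLARE @SortOrder VARCHAR(50) = 'OrderDate';
DECLARE @sql NVARCHAR(MAX);

IF @SortOrder NOT IN ('OrderDate', 'DueDate', 'ShipDate')
    SET @SortOrder = 'OrderDate';   /* fall back to a safe default */

SET @sql = N'SELECT SalesOrderNumber, OrderDate, DueDate, ShipDate,
                    PurchaseOrderNumber, AccountNumber, SubTotal,
                    TaxAmt, Freight, TotalDue
             FROM   Sales.SalesOrderHeader
             ORDER BY ' + QUOTENAME(@SortOrder) + N' DESC;';

EXEC sp_executesql @sql;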

The First Attempt

My first attempt at rocket science looked like this:

DECLARE @SortOrder VARCHAR(50) = 'OrderDate';

SELECT  rn,
        SalesOrderNumber,
        OrderDate,
        DueDate,
        ShipDate,
        PurchaseOrderNumber,
        AccountNumber,
        SubTotal,
        TaxAmt,
        Freight,
        TotalDue
FROM    (
            SELECT  CASE @SortOrder WHEN 'OrderDate' 
                                    THEN ROW_NUMBER() OVER 
                                         (ORDER BY OrderDate DESC)
                                    WHEN 'DueDate' 
                                    THEN ROW_NUMBER() OVER 
                                         (ORDER BY DueDate DESC)
                                    WHEN 'ShipDate' 
                                    THEN ROW_NUMBER() OVER 
                                         (ORDER BY ShipDate DESC)
                    END AS rn,
                    SalesOrderNumber,
                    OrderDate,
                    DueDate,
                    ShipDate,
                    PurchaseOrderNumber,
                    AccountNumber,
                    SubTotal,
                    TaxAmt,
                    Freight,
                    TotalDue

            FROM    Sales.SalesOrderHeader
) AS x
ORDER BY rn ASC;

Why do it this way? There are a few tricks with paging that you can perform using ROW_NUMBER() that I find to be more readable than using OFFSET and FETCH in SQL Server 2012. Plus ROW_NUMBER() tricks don’t require SQL Server 2012.

Unfortunately, when I looked at the execution plan for this query, I discovered that SQL Server was performing three separate sorts – one for each of the case statements. You could generously describe this as “less than optimal”.

A terrible execution plan featuring three sort operators.

Look at all those pretty sorts!

Even though it seems like SQL Server should optimize out the CASE statement, the obvious thing doesn’t happen. SQL Server has to compute the ROW_NUMBER() for every row in the result set and then evaluate the condition in order to determine which row to return – you can even see this in the first execution plan. The second to last node in the plan is a Compute Scalar that determines which ROW_NUMBER() to return.

I had to dig in and figure out a better way for users to get a custom sort option.

Moving the CASE to the ORDER BY

My next attempt moved the custom sort down to the ORDER BY clause:

DECLARE @SortOrder VARCHAR(50) = 'OrderDate';

SELECT  SalesOrderNumber,
        OrderDate,
        DueDate,
        ShipDate,
        PurchaseOrderNumber,
        AccountNumber,
        SubTotal,
        TaxAmt,
        Freight,
        TotalDue
FROM    Sales.SalesOrderHeader
ORDER BY CASE @SortOrder WHEN 'OrderDate' 
                         THEN ROW_NUMBER() OVER 
                              (ORDER BY OrderDate DESC)
                         WHEN 'DueDate' 
                         THEN ROW_NUMBER() OVER 
                              (ORDER BY DueDate DESC)
                         WHEN 'ShipDate' 
                         THEN ROW_NUMBER() OVER 
                              (ORDER BY ShipDate DESC)
        END ASC

This ended up performing worse than the first attempt (query cost of 8.277 compared to the original query’s cost of 6.1622). The new query adds a fourth sort operator. Not only is the query sorting once for each of the possible dates, it’s then performing an additional sort on the output of the ROW_NUMBER() operator in the ORDER BY. This clearly isn’t going to work.

I somehow made this query even worse. You should move on.

-50% improvement is still improvement, right?

Getting Rid of ROW_NUMBER()

It seems like ROW_NUMBER() really isn’t necessary for our scenario. After all – I only added it as a trick so I could potentially add paging further down the road. Let’s see what happens if we remove it from the query:

DECLARE @SortOrder VARCHAR(50) = 'OrderDate';

SELECT  SalesOrderNumber,
        OrderDate,
        DueDate,
        ShipDate,
        PurchaseOrderNumber,
        AccountNumber,
        SubTotal,
        TaxAmt,
        Freight,
        TotalDue
FROM    Sales.SalesOrderHeader
ORDER BY CASE @SortOrder WHEN 'OrderDate' THEN OrderDate
                         WHEN 'DueDate' THEN DueDate
                         WHEN 'ShipDate' THEN ShipDate
        END DESC

Right away, it’s easy to see that the query is a lot simpler. Just look at the execution plan:

Despite the file name, this isn't the winner.

A contender appears!

This new form of the query is a winner: the plan is vastly simpler. Even though there’s a massive sort operation going on, the query is still much cheaper – the overall cost is right around 2, making it more than three times cheaper than the first plan that we started with.

There’s one downside to this approach – we’ve lost the ability to page results unless we either add back in the ROW_NUMBER() or else use FETCH and OFFSET.
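
For what it’s worth, on SQL Server 2012 and newer you could bolt paging onto that query with OFFSET and FETCH, roughly like this sketch, where the page size and number are whatever your application wants:

/* Sketch: paging the ORDER BY CASE version with OFFSET/FETCH (SQL Server 2012+). */
DECLARE @SortOrder VARCHAR(50) = 'OrderDate';
DECLARE @PageSize INT = 25, @PageNumber INT = 3;

SELECT  SalesOrderNumber, OrderDate, DueDate, ShipDate,
        PurchaseOrderNumber, AccountNumber, SubTotal,
        TaxAmt, Freight, TotalDue
FROM    Sales.SalesOrderHeader
ORDER BY CASE @SortOrder WHEN 'OrderDate' THEN OrderDate
                         WHEN 'DueDate' THEN DueDate
                         WHEN 'ShipDate' THEN ShipDate
         END DESC
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;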

Bonus Round: Back to ROW_NUMBER

While using my brother as a rubber duck, he suggested one last permutation – combine the ORDER BY technique with the ROW_NUMBER() technique:

DECLARE @SortOrder VARCHAR(50) = 'OrderDate';

SELECT  rn,
        SalesOrderNumber,
        OrderDate,
        DueDate,
        ShipDate,
        PurchaseOrderNumber,
        AccountNumber,
        SubTotal,
        TaxAmt,
        Freight,
        TotalDue
FROM    (
            SELECT  ROW_NUMBER() OVER (ORDER BY CASE @SortOrder 
                                                WHEN 'OrderDate' THEN OrderDate
                                                WHEN 'DueDate' THEN DueDate
                                                WHEN 'ShipDate' THEN ShipDate
                                                END DESC)
                    AS rn,
                    SalesOrderNumber,
                    OrderDate,
                    DueDate,
                    ShipDate,
                    PurchaseOrderNumber,
                    AccountNumber,
                    SubTotal,
                    TaxAmt,
                    Freight,
                    TotalDue
            FROM    Sales.SalesOrderHeader
) AS x 
ORDER BY rn ;

This ended up being almost as fast as the ORDER BY approach; this query’s cost is only 0.00314 higher than the ORDER BY. I don’t know about you, but I would classify that as “pretty much the same”. The advantage of this approach is that we get to keep the ROW_NUMBER() for paging purposes, there is only one sort, and the code is still relatively simple for maintenance and development purposes.

Check out the plan!

Victory is ours!

What Did We Learn?

I learned that trying to be smarter than SQL Server can lead to pretty terrible performance. It’s important to remember that the optimizer reserves the right to re-write your query. In the first and second case, SQL Server did me a favor by re-writing the query. Our third case is fairly obvious. The fourth example is somewhat surprising – by shifting around the location of the CASE, we’re able to eliminate multiple sorts and gain the benefit of using ROW_NUMBER().

The Year in Hadoop: 2013

A lot can change in a year and the world of Hadoop is no exception. Join Jeremiah Peschka in this webinar to learn about new features in Hadoop, changes to existing features, how these changes make your life as a DBA better, and where you can go to get started with Hadoop. This session is for people curious about how Hadoop is changing and how it makes their job better.
