Announcing sp_BlitzCache v2.5.0

Woohoo, it’s another half of the year which means it’s time for another version of sp_BlitzCache®. The last version came out on June 18th, and it’s been about 4 months. In that time frame, you have requested some new features, and they’ve slowly trickled into sp_BlitzCache®.

If you don’t want to wait, head over to the download page and get yourself an sp_BlitzCache®!

In no particular order:

Default Values

Raul Gonzalez pointed out that if you set all of the parameters to NULL, sp_BlitzCache® will fail. This happens by default when you script execution through SSMS. Why SSMS does this is beyond me, but there you have it. sp_BlitzCache® will now get really upset and fail to run with a nice error message if you set any of the following to null:

  • @top
  • @sort_order
  • @query_filter
  • @reanalyze

Yay, errors!
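
The check itself is nothing fancy. Here’s a rough sketch of the pattern (not the actual sp_BlitzCache® code, just the general shape of a defensive parameter check):

IF @top IS NULL OR @sort_order IS NULL OR @query_filter IS NULL OR @reanalyze IS NULL
BEGIN
    /* complain loudly up front instead of failing somewhere deep inside the proc */
    RAISERROR(N'@top, @sort_order, @query_filter, and @reanalyze cannot be NULL. Use the defaults or supply a value.', 16, 1);
    RETURN;
END;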

Expert Mode

You guys are smart. I mean really smart. To help you keep being smart, I added a bunch of columns to expert mode. Ready?

  • Compilation memory
  • Compilation CPU time
  • Compilation wall clock time
  • Query hash and query plan hash
  • Cached plan size
  • Serial desired and required memory

Lean Options

sp_BlitzCache® used to pull back a lot of data by default. I mean… a lot of data. This could cause execution time to go through the roof on busy systems and systems with big plans. The default value for @top has been reduced to 10. You can still set it to whatever you want; just be aware that the default is smaller than it used to be.

There’s also a new @skip_analysis option. This bypasses the XML shredding that goes on in the execution plans. When you skip the XML analysis, you won’t find out if you have plan warnings or missing indexes, but your executions of sp_BlitzCache® will be much faster and you’ll use less CPU. Use this on heavily resource-constrained systems.
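
For example, on a busy production box you might run something like this (using the parameters described above):

EXEC sp_BlitzCache @top = 10, @sort_order = 'cpu', @skip_analysis = 1;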

Speaking of XML – the XML checks were re-written to be smarter and require less CPU time. We still have to shred the XML in the plan cache, but I’ve re-written the XQuery in ways that shouldn’t need anywhere near as much CPU time as before. The upside is that this also makes it easier to add more checks to sp_BlitzCache®.

Sorting Results

There are two big changes to how results are sorted. One for speed and one for functionality.

An additional sort order option has been added for recent compilation. If you want to sort by recent compiles, just run EXEC sp_BlitzCache @sort_order = 'compiles'.

Some sort orders have always been slow (I’m looking at you, “executions per minute”). New changes push the sort earlier in the query and, in testing, they improve performance, especially on busy systems.

Making Big Numbers Smaller

There was a potential for the totals temporary table to have integer overflows. I fixed that.

Basically, there was a significant chance that you might run into problems where you’ve burned through more than 600 hours of CPU time (that’s 2,147,483,647 milliseconds BTW). To help you avoid errors when you’re actually using your CPUs, I added a bunch of math to make numbers smaller. No, really, that’s about all that I did.
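
If you’re curious, the “bunch of math” is the same trick you’d use in your own scripts – work in bigger units (and a bigger data type) so the totals stay far away from the INT ceiling. A rough sketch of the idea, not the actual sp_BlitzCache® code:

SELECT  CONVERT(BIGINT, SUM(qs.total_worker_time)) / 1000000 AS total_cpu_seconds /* total_worker_time is reported in microseconds */
FROM    sys.dm_exec_query_stats AS qs;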

In Review: SQL Server 2005 Waits and Queues

Back in November of 2006, Microsoft released “SQL Server 2005 Waits and Queues” as a best practices article. Many people in the SQL Server community have referenced this article as a great place to get started with understanding waits in SQL Server. Heck, I’ve recommended it on a regular basis.

The last time I recommended it I told the recipient, “This is great, but only read it from page X to Y.”

After nearly 10 years, this paper is showing its age. As you read this critique, and if you read the paper, it’s important to consider the historical context. “SQL Server 2005 Waits and Queues” is the first of its kind for SQL Server.

High Level Overview

The idea of the article is that developers and DBAs should use the waits information to identify areas of slow performance in SQL Server (waits) and then associate that with performance monitor counters (queues). The authors imply a methodology that starts with waits, correlates through other data, and, if followed correctly, eventually arrives at a solution for your performance problems.

The core concepts of “SQL Server 2005 Waits and Queues” are solid; queries wait, SQL Server records the waits, you interpret the waits. The overall methodology that the authors propose is a good one, too. Gathering evidence and correlating the evidence with other signifiers is a sound way to approach problem solving.

The devil is in the details. And, in this case, the devil has a hangover.

In the Beginning…

The “SQL Server 2005 Waits and Queues” paper starts off benignly enough. There are places where it sounds downright revolutionary. Even today very few people consider things like:

We outline a methodology that considers performance from the perspective of application, SQL Server, and correlates this to the system or resource perspective.

The Purpose section goes on to mention: “This paper outlines a best practice methodology and provides guidelines and thought processes to identify poor performing applications”. If “SQL Server 2005 Waits and Queues” had stuck to defining process and high level guidelines, this paper would be gold. The process that the authors outline takes a top down approach that allows an observer to start at a system level, rapidly identify problem areas, drill down to the root cause, and then solve the problem.

This Is How We Do It

Immediately after the Purpose section, the authors dive into their performance methodology. Waits are described as the observed symptom of a problem and perfmon counters are the proof. Or, to look at it another way, waits are what the application observes and perfmon counters show what the underlying resources are doing.

In a stroke of genius, there’s a lengthy example where the authors explain how correlating between lock waits and perfmon counters can help an observer determine if there’s really a problem going on. The crux of the argument is that “associations or correlations allow us to determine relevancy to overall performance”.

Averages are all that we have to go on when passively observing SQL Server. But in many cases, averages are misleading. In the example given, the authors mention looking at average lock time over a given window (the use of averages comes up a lot in this paper). It’s important to remember that averages can be heavily skewed by outliers.

Remember, the methodology described in this paper is a good first glance, but you can’t use it to rule out a specific problem. You can only use this methodology to determine which areas of SQL Server are a performance problem.

The methodology section doesn’t skimp on details. It provides a description of the SQL Server query wait model (running, waiting, runnable) and examples of how a query can move through these states.

Unfortunately, it’s in this section where the first hints that something isn’t right show up. “A low signal (where signal is less than 25% of the total waits) to resource wait ratio indicates there is little CPU pressure.” The problem isn’t with the statement; if queries are spending 25% of their time waiting to get on a CPU, you’ve got problems. The problem is that the statement comes with no context – there’s no information about why the signal:resource wait ratio is bad or what should be done about it. This problem persists throughout a lot of the paper, too. It’s assumed that the reader knows where to go for verification of the absolute ratios that are provided.
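
If you want to see where your own server stands, the signal-to-resource ratio is a one-liner against sys.dm_os_wait_stats:

SELECT  100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS signal_wait_pct,
        100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM(wait_time_ms) AS resource_wait_pct
FROM    sys.dm_os_wait_stats;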


Building a better SQL Server future

The authors of “SQL Server 2005 Waits and Queues” attempt to provide performance tuning blueprints for both OLTP and data warehouse applications. If you’ve ever built a house, you know that blueprints change the minute you go out into the real world, and these blueprints are no exception. When I was reviewing this article, I scrawled in the margin “here’s where things go south for a while”.

The blueprints attempt to set up a list of scenarios that DBAs need to watch out for. These are typically based on hard and fast thresholds for perfmon counters and DMVs – more on that in a minute.

Although many people have pure data warehouse systems, it’s increasingly rare to find a purely transactional system. A pure transactional system is entirely single item select, insert, update, and delete; there are no reports. This was uncommon back in 2005/2006 and it’s even less common today – business users insist on viewing live data and we have no choice but to deliver. It’s left up to the reader to determine how to synthesize appropriate metrics based on the pure OLTP and pure data warehouse blueprints.

Here’s my first major issue with this article: hard and fast advice is rarely correct. The OLTP blueprint contains a huge amount of advice that could put users in a lot of trouble. In no particular order, developers will run into advice like:

  • Have three or fewer indexes on frequently updated tables.
  • Queries with more than 4 joins can be too normalized for high OLTP scalability.
  • Avoid scans on tables since a missing index flushes the cache.
  • Page Life Expectancy should always be over 300.
  • Page Life Expectancy should never drop by more than 50%.
  • Reads should take 4-8 ms without any IO pressure.

Take that in for a minute.

All of these statements are listed in the “SQL Server 2005 Waits and Queues” article. They’re all presented as hard rules to follow and as indicators of performance problems. But there’s no evidence given for any rule being true, nor is there any condition given for re-evaluating them. The authors do provide a single line caveat: “the values in Value column are good starting point. The actual values will vary.” At least their bases are covered for being wrong.
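
If you’re wondering what your own server reports for one of these counters, Page Life Expectancy is exposed through a DMV – just don’t treat 300 as a magic number:

SELECT  object_name, counter_name, cntr_value AS page_life_expectancy_seconds
FROM    sys.dm_os_performance_counters
WHERE   counter_name = N'Page life expectancy'
        AND object_name LIKE N'%Buffer Manager%';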

Computers have changed significantly in the last 10 years. SQL Server has seen 4 releases since this paper was published and we’re about to be on SQL Server 2016. The query optimizer has seen improvements, storage devices have seen improvements, and memory has seen improvements. When this paper was released, a dual core Xeon was a high end server CPU. In 2015 I have a quad core processor in my phone; it’s not uncommon to see 4, 6, or 8 core processors in modern servers.

As you read through the blueprints, it’s important to remember how things have continued to change in the SQL Server world. The first rule for data warehousing insists that data warehouses should have more indexes, not fewer – a lot of modern thinking on data warehouses runs counter to this and focuses instead on making sure disk throughput is fast enough.

Other gems from the data warehouse blueprint include:

  • Avoid plan re-use.
  • If top 2 sys.dm_os_wait_stats values contain IO, there’s a blocking bottleneck. (Never mind that a data warehouse will usually have more data than RAM and will need to read from disk)
  • Avoid big scans because a missing index flushes the cache.
  • “The time taken for plan generation of a large query is not nearly as important as having the right plan”.

It’s noble that the authors make these attempts, but the blueprint requires a significant amount of customization and modification to make it worth considering, much less using in production.

The real value of the blueprint comes from suggested correlations between counters and waits, but you need to have solid baselines and a good understanding of how different parts of SQL Server work in order to synthesize this into something meaningful.

By the way, this only gets us to page 16 of 94.

Just eat the candy, already.

What’s Stopping You?

The next major section deals with finding the typical resource bottlenecks in a SQL Server. This is a pretty boring section, truthfully. It contains descriptions of problem areas (“memory”, “IO stalls”) and a sample query that will help the reader look into the problem.

The biggest specific complaint I have about this section is that the authors insist on using Page Life Expectancy as a meaningful performance indicator.

Outside of that, there’s a glaring general concern – at no point does this paper tell the reader what to do with the information gathered by these queries. It’s left up to the reader to interpret IO stalls, memory pressure, missing indexes, and query plan problems. During a presentation I’ll sometimes answer an unforeseen question with “that’s an exercise left up to the reader”, but readers of a 100 page best practices article come to it expecting best practices, not homework.

It’s also in this section where vague guidelines start to creep in. Readers are instructed to “make sure that the [blocked process] threshold is not too low” – the default for this is 5 seconds, and earlier in the article it is recommended that the blocked process threshold should be 30 seconds – but no guidance is provided to help a DBA determine the right threshold.
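
For reference, the threshold is just an sp_configure setting, so bumping it to the paper’s 30 seconds looks like this (on SQL Server 2005 the option is named 'blocked process threshold' without the “(s)”):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 30;
RECONFIGURE;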

And, ultimately, that’s my chief complaint – there’s very little reasoning given around the numbers presented in this best practices article. It’s assumed that the reader knows SQL Server well enough to read between the lines, refine existing numbers, and synthesize additional performance metrics. When supporting information is provided, it’s often in the form of another best practices article. These vary in length from 20 pages to 100 pages – I don’t know about you, but the idea of reading War & Peace: Best Practices Edition over the weekend doesn’t sound that fun.

The “Good” Part

Starting on page 22, this best practices paper turns into a list of SQL Server wait types and correlating performance monitor counters. That list runs from page 22 through page 79; at that point, the “queues” portion kicks in and you’re looking at performance monitor counters. I used to recommend that people read through the list of waits to get a better idea of what’s going on inside SQL Server.

Don’t use this part of the article.

Look at the documentation for sys.dm_os_wait_stats instead. It has a much better explanation of individual waits.

Nowhere in this article do the authors explain which waits are ignorable and which aren’t. SOS_SCHEDULER_YIELD is given just as much weight as QPJOB_KILL. It’s left to you to determine if any of these waits are significant. Thankfully, we’ve got a wait stats script that will ignore many waits for you.
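
To give you the flavor of what that script does, here’s a trimmed-down sketch – the real script excludes a much longer list of benign waits than the handful shown here:

SELECT TOP 10
        wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
FROM    sys.dm_os_wait_stats
WHERE   wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                          N'SQLTRACE_BUFFER_FLUSH', N'XE_TIMER_EVENT', N'XE_DISPATCHER_WAIT',
                          N'REQUEST_FOR_DEADLOCK_SEARCH', N'CHECKPOINT_QUEUE')
ORDER BY wait_time_ms DESC;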

What’s worst of all is that some of the advice in this section is just plain crazy.

  • ASYNC_IO_COMPLETION – This is your SQL Server asynchronously writing to disk. To solve the problem, the authors suggest you check for memory pressure and bad query plans.
  • CXPACKET – The guidance for this wait type has been superseded by KB 2806535
  • DBTABLE – This wait is labeled “internal only”, but additional information is given on correlating it to performance problems. Should I even worry if it’s internal only? Can I even do anything about it?
  • Failpoint – if you made it to the Failpoint, you clearly haven’t reached your own internal fail point.
  • LATCH_x – SQL Server 2005 introduced sys.dm_os_latch_stats, which helps a DBA determine which latches are causing problems. Does this paper point the reader to the right DMV? Nope. It just says “Possibly the most common case is contention on internal caches (not the buffer pool pages), especially when using heaps or text.” (See the sample query after this list.)
  • LATCH_x + PAGELATCH_UP means you should partition the table, BTW.
  • If you see LCK_x you should check for memory pressure. Unfortunately, we’re left to our own devices to figure out how. If you’ve been taking this article at its word, you’re probably running for the First Church of Page Life Expectancy to find everything else.
  • OLEDB is explained to largely be caused by traces and linked servers. The mitigation: set up more traces, figure out where your client apps sit and whether or not they’re loading files, or query sys.dm_exec_requests to find out which SPIDs are seeing this wait. Nowhere does it mention stopping your traces and not using linked servers.
  • WRITELOG waits suggest looking in sys.dm_io_virtual_file_stats (good) and query plan output (bad). Raise your hand if you’ve ever successfully used a query plan to spot I/O problems.
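
Here’s the query the paper could have pointed readers to – a quick look at which latch classes are actually hurting:

SELECT TOP 10
        latch_class, wait_time_ms, waiting_requests_count
FROM    sys.dm_os_latch_stats
ORDER BY wait_time_ms DESC;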

Context switches are an important measurement of… something.

Surely the queues section must be better… He said, hopefully.

  • Queue lengths are regarded as important. Disk queue length is especially spurious given modern storage technology. Formerly alarming numbers are now considered anemic – it’s not uncommon to see recommended queue depths as high as 128 or 256.
  • “Under typical circumstances reads should take 4-8ms” – they go on to remind you to check with your vendor. Which is good, because 4-8ms of storage latency is either some kind of local solid state storage or else the time it takes your HBA to acknowledge a write.
  • Page Faults/sec – this metric is trash. It measures when your CPU can’t find a page in memory OR when the page isn’t in the cache the CPU thought it should be in. This is normal. Also – you can’t control it. Even with all of your data in memory, your server will still show page faults occurring.
  • Pages/sec – this measures reading data to and from memory. Once again, this is junk. Microsoft PFE Clint Huffman debunked Pages/sec as a meaningful counter back in 2009.
  • Page Splits/sec is a counter to watch, right? Well, any addition of a page to the end of an index is technically a page split. So is splitting a page in the middle. So we don’t know if these are good or bad, but they’re happening!
  • Context Switches/sec is mentioned as something to watch, too. According to the paper, if this goes over 20,000 there’s a problem and if it goes over 50,000 there’s a big CPU problem. One time I opened a program and it produced 33,000 context switches per second. Time to throw this laptop away.

At this point, I literally can’t even.


If you’re reading “SQL Server 2005 Waits and Queues” for the first time, read pages 1-7 and then skip to the conclusion. Ignore everything in between.

It’s important to consider this article in a historical perspective – it’s the first serious attempt at documenting a tuning process for SQL Server that combines perfmon counters and wait stats. In today’s context, the concrete advice from the article doesn’t hold up; but the concepts and broad methodology are still useful for identifying SQL Server problems.

If you are looking for guidance on tuning SQL Server using wait stats, more accurate information can be found in SQL Server Books Online and third party sites (like this blog).

Make Technical Decisions Easily With This One Trick

Decisions are emotional, right? Brent loves Sanka and I love Postum. We fight about it all the time. But when we wanted to settle the debate once and for all, we engineered a test to figure out who was right. You can do the same thing to take the personal investment out of technical decisions at work.

Check Yourself at the Door

The first thing you need to remember is that your opinions are just as valid as everyone else’s. You need to move those aside and be ready to be wrong.

If the other people involved don’t want to play along, just tell them “Heck, I’d be happy to be wrong because I’ll learn something.” It’s not a bet or a contest, you’re just offering up your willingness to be wrong. Being wrong is great, especially when it’s your turn to be right later on.

Test, Test, Test

This dog knows more science than I do

The next step to making that decision is to figure out a test. The test has to hinge on the point where your opinions differ, because the purpose of the test is to get those opinions out of the conversation.

Doing this correctly is really hard. You need to figure out:

  • What are both sides saying?
  • Which metrics will prove both points?
  • What won’t prove anyone’s points?
  • What’s the fastest way to test both options?
  • What’s a realistic scale for testing?
  • What’s the worst that could happen if either side is right?
  • What’s the worst that could happen if either side is wrong?
  • If you can only run one test, which test should you run?

Hey, You Said This Was Easy!

All of this sounds like a lot of work. It turns out being factually right is just as much work as being factually wrong. If you really want to make sure that you’re choosing the right solution to a problem you need to figure out which option is the most right way to solve the problem – both solutions could be good, but one just might be better. The only way to get to proof is to test everything.

The next time there’s an argument on your team, or between two teams, figure out the best way to test each side’s ideas instead of spending your time arguing about which solution is the best.

Of course, you could always just fight in the parking lot like school kids. I hear that works well, too.

Forgotten Maintenance – Cycling the SQL Server Error Log

Most of us get caught up in fragmentation, finding the slowest queries, and looking at new features. We forget the little things that make managing a SQL Server easier – like cycling the SQL Server error logs.

What’s the Error Log?

The SQL Server error log is a file that is full of messages generated by SQL Server. By default this tells you when log backups occurred, other informational events, and even contains pieces and parts of stack dumps. In short, it’s a treasure trove of information. When SQL Server is in trouble, it’s nice to have this available as a source of information during troubleshooting.

Unfortunately, if the SQL Server error log gets huge, it can take a long time to read the error log – it’s just a file, after all, and the GUI has to read that file into memory.

Keep the SQL Server Error Log Under Control

It’s possible to cycle the SQL Server error log. Cycling the error log starts a new file, and there are only two times when this happens.

  1. When SQL Server is restarted.
  2. When you execute sp_cycle_errorlog.

Change everything!

When SQL Server cycles the error log, the current log file is closed and a new one is opened. By default, these files are in your SQL Server executables directory in the MSSQL\LOG folder. Admittedly, you don’t really need to know where these are unless you want to see how much room they take up.
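
If you do want to check, recent versions of SQL Server will tell you exactly where the current error log lives:

SELECT SERVERPROPERTY('ErrorLogFileName') AS current_error_log_path;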

SQL Server keeps up to 6 error log files around by default. You can easily change this. Open up your copy of SSMS and:

  1. Expand the “Management” folder.
  2. Right click on “SQL Server Logs”
  3. Select “Configure”
  4. Check the box “Limit the number of error log files before they are recycled”
  5. Pick some value to put in the “Maximum number of error log files” box
  6. Click “OK”

It’s just that easy! Admittedly, you have to do this on every SQL Server that you have, so you might just want to click the “Script” button so you can push the script to multiple SQL Servers.
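
The script that SSMS generates boils down to a registry write; it should look something like this (the 12 here is just an example value, and because it pokes the instance’s registry keys, try it somewhere safe first):

USE [master];
GO
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
     N'Software\Microsoft\MSSQLServer\MSSQLServer',
     N'NumErrorLogs', REG_DWORD, 12;
GO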

Automatically Rotating the SQL Server Error Log

You can set up SQL Server to automatically rotate your error logs. This is the easiest part of this blog post, apart from closing the window.

To cycle error logs on a regular basis, restart your SQL Server nightly.

Only joking.

You can set up a SQL Agent job with a T-SQL step. All it has to do is EXEC sp_cycle_errorlog. Schedule the SQL Agent job to run as frequently as you’d like and you’re good to go. The upside of this approach is that it’s automatic and the SQL Server error logs will be more granular, making it easier to find the error messages you’re looking for.
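
Here’s a rough sketch of that job, assuming a weekly Sunday schedule and a job name of “DBA - Cycle Error Log” (both are placeholders – adjust to taste):

USE msdb;
GO
EXEC dbo.sp_add_job         @job_name = N'DBA - Cycle Error Log';
EXEC dbo.sp_add_jobstep     @job_name = N'DBA - Cycle Error Log',
                            @step_name = N'Cycle the error log',
                            @subsystem = N'TSQL',
                            @database_name = N'master',
                            @command = N'EXEC sp_cycle_errorlog;';
EXEC dbo.sp_add_jobschedule @job_name = N'DBA - Cycle Error Log',
                            @name = N'Weekly - Sunday',
                            @freq_type = 8,              /* weekly */
                            @freq_interval = 1,          /* Sunday */
                            @freq_recurrence_factor = 1,
                            @active_start_time = 10000;  /* 01:00 AM */
EXEC dbo.sp_add_jobserver   @job_name = N'DBA - Cycle Error Log';
GO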

It’s Just That Easy!

Cycling the SQL Server error log is easy – you just need a regularly scheduled agent job. Rotating the logs makes it easier to find error messages. Let’s face it – you’re only looking for error messages when there’s a problem. That’s all there is to rotating the error logs.

Getting Started with Diskspd

Diskspeed, or diskspd.exe, is Microsoft’s replacement for SQLIO. While I’m not going to replace our SQLIO tutorial, you can use the information here to replicate the same type of tests on your systems and get the information you need. During the Dell DBA Days, Doug and I used diskspd as one of our techniques for getting a baseline of raw performance. We wanted to get an idea of how fast the servers and storage were before running SQL Server-specific tests.

How do I get diskspd?

You can download diskspd directly from Microsoft – Diskspd, a Robust Storage Testing Tool, Now Publically Available. That page has a download link as well as a sample command.

The upside is that diskspd is a fully self-contained download. You don’t need Java, .NET, or anything else installed to run it. Apart from Windows – you’ll still need Windows.

How do I use diskspd?

With the command line, of course!

In all seriousness, although diskspd is the engine behind Crystal Disk Mark, it stands on its own as a separate tool. Download the executable and unzip it to an appropriate folder. There are going to be three sub-folders:

  • amd64fre – this is what you need if you have a 64-bit SQL Server
  • armfre
  • x86fre

I took the diskspd.exe file from the appropriate folder and dumped it in C:\diskspd so I could easily re-run the command. Let’s fire up a command prompt and try it out.

Here’s a sample that we ran: diskspd.exe -b2M -d60 -o32 -h -L -t56 -W -w0 O:\temp\test.dat > output.txt

Breaking it down:

  • -b2M – Use a 2 MB I/O size. For this test, we wanted to simulate SQL Server read ahead performance.
  • -d60 – Run for 60 seconds. I’m lazy and don’t like to wait.
  • -o32 – 32 outstanding I/O requests. This is a queue depth of 32.
  • -h – This disables both hardware and software buffering. SQL Server does this, so we want to be the same.
  • -L – Grab disk latency numbers. You know, just because.
  • -t56 – Use 56 threads per file. We only have one file, but we have 56 cores.
  • -W – Warm up the workload for 5 seconds.
  • -w0 – No writes, just reads. We’re pretending this is a data warehouse.
  • O:\temp\test.dat – our sample file. You could create a sample file (or files) by running diskspd with the -c<size> flag (see the example right after this list).
  • > output.txt – I used output redirection to send the output to a file instead of my screen.
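
If you don’t already have a test file, diskspd will build one for you – something like this (sizes and flags here are just an example) creates a 50 GB file and runs a smaller-block 70/30 read/write test against it:

diskspd.exe -c50G -b64K -d60 -o8 -t8 -h -L -w30 O:\temp\test.dat > output_mixed.txt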

How do I read diskspd results?

You’re going to get a lot of information back from this command. You’re going to want to close the window and back away quickly. Don’t. This is good stuff, I promise.

The first thing you’ll see is a recap of the command line you used. Then you’ll immediately see a summary of the commands:

timespan:   1
duration: 60s
warm up time: 0s
cool down time: 0s
measuring latency
random seed: 0
path: 'O:\temp\test.dat'
    think time: 0ms
    burst size: 0
    software and hardware write cache disabled
    performing read test
    block size: 2097152
    using interlocked sequential I/O (stride: 2097152)
    number of outstanding I/O operations: 32
    thread stride size: 0
    threads per file: 56
    using I/O Completion Ports
    IO priority: normal

That’s a lot easier than trying to read a set of command line flags. Six months from now, I can review older runs of diskspd and understand the options that I used. diskspd is already winning over sqlio.

Next up, you’ll see a summary of CPU information. This information will help you understand if your storage test is CPU bottlenecked – if you know the storage has more throughput or IOPS capability, but your tests won’t go faster, you should check for bottlenecks. The last line of this section (and every section) will provide an average across all CPUs/threads/whatevers.

actual test time:   60.01s
thread count:       56
proc count:     56

CPU |  Usage |  User  |  Kernel |  Idle
   0|  23.17%|  10.36%|   12.81%|  76.83%
   1|   4.24%|   2.40%|    1.85%|  95.76%
   2|   9.71%|   7.37%|    2.34%|  90.29%
   3|   1.48%|   0.70%|    0.78%|  98.52%
avg.|  14.11%|  12.48%|    1.63%|  85.89%

After the CPU round up, you’ll see a total I/O round up – this includes both reads and writes.

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
     0 |      2950692864 |         1407 |      46.89 |      23.44 | 1323.427 |   107.985 | O:\temp\test.dat (50GB)
     1 |      3013607424 |         1437 |      47.89 |      23.94 | 1310.516 |   141.360 | O:\temp\test.dat (50GB)
     2 |      2950692864 |         1407 |      46.89 |      23.44 | 1319.540 |   113.993 | O:\temp\test.dat (50GB)
     3 |      2950692864 |         1407 |      46.89 |      23.44 | 1315.959 |   122.280 | O:\temp\test.dat (50GB)
total:      167975583744 |        80097 |    2669.28 |    1334.64 | 1307.112 |   162.013

Look at all of those bytes!

If the I/O numbers initially seem small, remember that the data is split up per worker thread. Scroll down to the bottom of each section (total, reads, writes) and look at the total line. This totals the overall volume of data you’ve collected. The I/Os are recorded in whatever unit of measure you supplied – in our case, 2MB I/Os.

Important sidebar: your storage vendor probably records their I/O numbers at a smaller I/O size, so make sure you do some rough translation if you want to compare your numbers to the advertised numbers. For more discussion, visit IOPS are a scam.
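
For example, using the numbers above: 1,334 I/Os per second at 2 MB each is roughly 2,670 MB/s. A vendor quoting 8 KB IOPS would describe that same throughput as about 1,334 × 256 ≈ 341,000 IOPS, so make sure you’re comparing like with like.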

Finally, latency! Everybody wants to know about latency – this is part of what the end users are complaining about when they say “SQL Server is slow, fix it!”

  %-ile |  Read (ms) | Write (ms) | Total (ms)
    min |     13.468 |        N/A |     13.468
    max |   1773.534 |        N/A |   1773.534

This table keeps the min, max, and a variety of percentiles describing how the storage performed while you were beating on it. This information is just as helpful as the raw throughput data – under load, your storage may show increased latencies. It’s important to know how the storage will behave and respond under load.

How often should I use diskspd?

Ideally, you should use diskspd whenever you’re setting up new storage or a new server. In addition, you should take the time to use diskspd when you make big changes to storage – use diskspd to verify that your changes are actually an improvement. No, diskspd doesn’t include the work that SQL Server does, but it does show you how your storage can perform. Use it to make sure you’re getting the performance you’ve been promised by your storage vendor.

Enabling Query Store in Azure SQL Database

Enter Query Store

Query Store, in short, is a way to track query performance over time. In Microsoft’s words, “The feature automatically captures a history of queries, plans, and runtime statistics, and retains these for your review”. It’s like sys.dm_exec_query_stats but it persists across reboots! And it has execution plans!

The Query Store is a great feature for on premises SQL Server backed applications, but it adds significant value in Azure SQL Database. Instead of potentially losing execution history and execution plans, developers can see the history of query performance and execution plans across reboots.

Enabling Query Store in Azure SQL Database

Point SSMS at your Azure SQL Database instance and run:
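
-- swap in your own database name (in Azure SQL Database, run this connected to that database)
ALTER DATABASE [YourDatabaseName] SET QUERY_STORE = ON;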


No, really, it’s that easy.

How Do I Really Know Query Store is Enabled?

Figuring out if Query Store is enabled is pretty easy, too. All you need to do is look at sys.database_query_store_options. The following query should give you exactly what you need to know about the Query Store:

SELECT  desired_state_desc ,
        actual_state_desc ,
        current_storage_size_mb ,
        max_storage_size_mb
FROM    sys.database_query_store_options ;

As long as things are running correctly, the desired and actual states should read either ALL or AUTO. For more information about using Query Store, check out Monitoring Performance by Using the Query Store.
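
Once the Query Store is collecting data, you can poke at the history directly. Here’s a rough sketch of pulling the slowest queries by average duration from the Query Store catalog views:

SELECT TOP 10
        q.query_id,
        qt.query_sql_text,
        rs.avg_duration / 1000.0 AS avg_duration_ms,
        rs.count_executions
FROM    sys.query_store_query AS q
        JOIN sys.query_store_query_text AS qt ON q.query_text_id = qt.query_text_id
        JOIN sys.query_store_plan AS p ON q.query_id = p.query_id
        JOIN sys.query_store_runtime_stats AS rs ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;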

The Takeaway

Azure SQL Database instances can be rebooted at any time. It’s important to keep this in mind and plan accordingly, not just in the application but also in how you approach tuning your T-SQL. You should enable the Query Store in Azure SQL Database instances where you care about performance.

Getting JSON out of SQL Server

SQL Server 2016 brings support for JSON. For those of you who don’t know, JSON is JavaScript Object Notation – it’s a way of representing application level objects as a string. Imagine XML without anywhere near as much line noise, and you’re pretty much there.

Our Sample Document

In this example, we want to retrieve orders and their line items from the AdventureWorks database. We need to get the data out in this format for fast processing. In the end, our data should look something like:

{ "orders" : 
    [ /* an array of orders */
            "number" : "123456" ,
            /* more stuff goes here */
            "items" : 
            [ /* an array of items */
                { /* item info goes here */ }

Holy cow, that’s a lot of stuff just to represent an order and its line items.

Here’s our starter query:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        d.ProductID AS [item.product_id],
        d.OrderQty AS [item.quantity],
        d.UnitPrice AS [item.unit_price],
        d.UnitPriceDiscount AS [item.discount]
FROM    Sales.SalesOrderHeader h
        JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
WHERE   h.SalesOrderNumber = 'SO43659'

First Attempt

The documentation hints that we might just be able to use `FOR JSON AUTO` option to automatically format our JSON. What could go wrong?
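
Presumably the first attempt is just the starter query with the clause bolted onto the end (columns trimmed here for space):

SELECT  h.SalesOrderNumber AS [order.number],
        /* ...the rest of the starter query's columns... */
        d.UnitPriceDiscount AS [item.discount]
FROM    Sales.SalesOrderHeader h
        JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
WHERE   h.SalesOrderNumber = 'SO43659'
FOR     JSON AUTO;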

Syntax is hard

Our results aren’t what we want at all! The structure is close, but not quite what we’re looking for.

Getting the Structure Right

We are getting all the data we want, but we’re not getting the structure that we want. Our requirements say that we need to have an `orders` array and each `order` should be a separate item in that array. An order’s line items should also be stored in a separate array.

The square brackets make an array in JSON. The curly brackets denote an object.

Using the ROOT() option will nest our results in a root object. In this case, we can use ROOT('orders') to create a basic orders object to hold our list of orders.

Making each item part of an array inside the order is going to be trickier. We can still use FOR JSON AUTO to get the job done:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        (SELECT d.ProductID AS [item.product_id],
                d.OrderQty AS [item.quantity],
                d.UnitPrice AS [item.unit_price],
                d.UnitPriceDiscount AS [item.discount]
         FROM   Sales.SalesOrderDetail d 
         WHERE  h.SalesOrderID = d.SalesOrderID 
         FOR    JSON AUTO) AS items
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'

We’re almost right!

So wrong I can taste it.

Well, that’s … something. To get the output we want, we’re going to have to use a correlated subquery – fans of generating XML in SQL Server may remember jumping through hoops to get the right structure out of SQL Server.

We’re on the right track, but what we really need is better formatting.

The Best Solution, For Now

Since FOR JSON AUTO isn’t generating the JSON output that we want, we can go down a manual route and use FOR JSON PATH. If this is starting to sound like XML, you’re absolutely right.

After attaching a debugger to SQL Server, Paul White showed that the JSON writer uses the old XML code under the hood.

Anyway, the SQL to generate the right JSON document:

SELECT  h.SalesOrderNumber AS [number],
        h.DueDate AS [due_date],
        h.ShipDate AS [ship_date],
        h.PurchaseOrderNumber AS [po_number],
        h.TotalDue AS [total_due],
        h.Freight AS [freight],
        (SELECT d.ProductID AS [product_id],
                d.OrderQty AS [quantity],
                d.UnitPrice AS [unit_price],
                d.UnitPriceDiscount AS [discount]
         FROM Sales.SalesOrderDetail d 
         WHERE h.SalesOrderID = d.SalesOrderID
         FOR JSON PATH
        ) AS [items]
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'
FOR     JSON PATH, ROOT('orders')

Check out the correct (and formatted) results!

Getting JSON out of SQL Server

It’s easy to get incorrectly formatted JSON out of SQL Server. If you do need to get well formatted JSON out of SQL Server, be prepared to put in a lot of work coercing your query into the right shape. You may find that you need to do a large number of nested sub-queries, too.

The Fastest Way to Reconfigure a Bunch of Servers

… is to use a SQL Server where a robot does it for you!

I stole that from Kendra – someone else already wrote the sentence for me and why not keep re-using things, right?

Configuring SQL Servers the Usual Way

What’s the usual way to do this?

There are a few commercial tools on the market that let you deploy scripts to multiple servers; they all vary in how they apply the changes and deal with errors.

You could also roll your own tools. This gives you full control of the process, but you have to spend time implementing and maintaining code over time.

Finally, you could just hire junior DBAs and make them implement the changes individually. This will give all of your servers a delightful hand-crafted flair. Sure, the changes may or may not all be applied with equal rigor, but hey, you’re a job creator now.

SQL Server RDS Parameter Groups

SQL Server RDS is part of Amazon’s database as a service offering. We’ve covered RDS a couple times before and we’re still in love with RDS all these years later.

One of my favorite features of RDS is the idea of parameter groups. A parameter group is a unified set of configuration values – the sorts of things we’d normally set up through trace flags and sp_configure changes.

Parameters everywhere

How Can I Change a Parameter Group?

Changing parameters in a parameter group is incredibly easy. The first step is to create a new parameter group.

Creating parameter groups is hard!

Once that parameter group is created, we can highlight the parameter group and click “Edit Parameters”.

Edit parameters with ease!

The UI makes it relatively easy to sort and filter. Here, I’m looking to adjust how parallelism is handled on these particular RDS instances. These parameters are applied to all instances assigned to the parameter group; make sure that you put similar SQL Server instances in the same parameter group.

Change doesn’t have to be hard.

If you’re unsure about what you’ve changed, you can even select “Preview Changes” and get a quick overview of current and new values.

It’s almost like we can see the future.

Applying Changes with a Parameter Group

Once you’ve changed a parameter group, you might expect the changes to be applied to the servers immediately. They won’t be. You’ll still need to schedule a restart of each SQL Server RDS instance for the new parameter group settings to be applied.

This approach works well for many applications – you can restart a small portion of the servers in the system, make sure that the application is performing correctly, and then restart more servers. By leveraging the power of the cloud – scale out, not up – we can make operations easier and simpler.

What’s it mean for you?

Once you’ve gotten the hang of parameter groups, new SQL Server RDS instances can be created with the parameter group already applied, and existing instances can be moved into one. Managing settings across tens or hundreds of instances just got significantly easier – change a parameter group and you’re off to the races.

Brent says: But I put all this work into learning PowerShell! Oh, wait, no, I didn’t. Carry on.

Jeremiah retorts: Good news! There’s a PowerShell suppository toolkit.

Faster queries using narrow indexes and CROSS APPLY

It’s common to try to build the perfect index for a query. That index uses only the exact right key columns, and other, presentational, data is pushed to the INCLUDE section of the index. While you can create a narrow key this way, the entire index row can become wide.

A Sample Report

For this blog post, we’re going to work with the StackOverflow dataset. For more information on getting started with the StackOverflow database, check out How to Query the StackExchange database.

We want to find interesting, unanswered questions that have been asked in the last 90 days.

We’ll start by creating some indexes:

CREATE INDEX IX_UnansweredQuestions
    ON dbo.Posts (CreationDate) 
    INCLUDE (Id, Score, OwnerUserId, Title) 
    WHERE (AcceptedAnswerId = 0) ;

CREATE INDEX IX_ScoredCommentsByParent
    ON dbo.Posts (ParentId, Id)
    WHERE (Score > 0);

CREATE INDEX IX_VotesByPostId        /* index names here are placeholders */
    ON dbo.Votes (PostId) ;

CREATE INDEX IX_VotesByTypeAndDate
    ON dbo.Votes (VoteTypeId, CreationDate)
    INCLUDE (PostId, UserId) ;

And here’s our sweet query:

/* the SELECT list and EXISTS clause are restored here; the exact column list is a best guess */
SELECT  p.Id ,
        p.Score ,
        p.OwnerUserId ,
        p.Title ,
        (SUM(CASE WHEN v.VoteTypeId = 2 THEN 1
                  WHEN v.VoteTypeId = 3 THEN -1
                  ELSE 0 END )
            / COUNT(*))
        + SUM(COALESCE(uv.Reputation, 0)) AS SortWeight
FROM    dbo.Posts AS p
        JOIN dbo.Votes AS v ON p.Id = v.PostId
                               AND v.VoteTypeId IN (2,3)
        LEFT JOIN dbo.Users AS uv ON v.UserId = uv.Id
WHERE   p.CreationDate > '2013-06-08'
        AND v.CreationDate > '2013-06-08'
        AND p.AcceptedAnswerId = 0
        AND p.PostTypeId = 1
        AND EXISTS (SELECT  1
                    FROM    dbo.Posts a
                    WHERE   a.ParentId = p.Id
                            AND a.Score > 0)
GROUP BY p.Id, p.Score, p.Title, p.OwnerUserId ;
This query burns a lot of CPU – nearly 5 seconds of CPU time on my machine between parsing, compilation, and execution. Overall it does around 1 million logical reads. That’s a few too many reads for my liking. I think we can make it better.
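
If you want to reproduce those numbers on your own copy of the database, the measurements are just the standard session statistics:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
/* run the query, then check the Messages tab for logical reads and CPU time */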

Simple, Small, Indexes and APPLY

We need a way to make this query faster and lighter. First, we get rid of the wide index we created before.

DROP INDEX dbo.Posts.IX_UnansweredQuestions ;

We add two more indexes:

CREATE INDEX IX_UnansweredQuestions
    ON dbo.Posts (PostTypeId, AcceptedAnswerId, CreationDate) 
    INCLUDE (Id, Score) ;

CREATE NONCLUSTERED INDEX IX_Posts_AcceptedAnswerId_PostTypeId_CreationDate
    ON dbo.Posts (PostTypeId, CreationDate)
    INCLUDE (Id, Score);


And then we re-write the query with magic:

/* the CTE column lists and final GROUP BY are restored here; they're a best guess at the original */
WITH recent_votes AS (
    SELECT  v.PostId,
            v.UserId,
            (SUM(CASE WHEN v.VoteTypeId = 2 THEN 1
                      WHEN v.VoteTypeId = 3 THEN -1
                      ELSE 0 END )
                / COUNT(*)) AS VoteWeight
    FROM    dbo.Votes AS v
    WHERE   v.CreationDate > '2013-06-08'
            AND v.VoteTypeId IN (2,3)
    GROUP BY v.PostId, v.UserId
),
posts AS (
    SELECT  p.Id ,
            p.Score
    FROM    dbo.Posts AS p
            LEFT JOIN dbo.Posts AS answers ON answers.ParentId = p.Id
                                              AND answers.Score > 0
    WHERE   p.CreationDate > '2013-06-08'
            AND p.AcceptedAnswerId = 0
            AND p.PostTypeId = 1
            AND answers.Id IS NOT NULL
)
SELECT  p.Id ,
        p.Score ,
        _.Title, _.CreationDate, _.OwnerUserId ,
        SUM(v.VoteWeight)
        + SUM(COALESCE(uv.Reputation, 0)) AS SortWeight
FROM    posts AS p
        JOIN recent_votes AS v ON p.Id = v.PostId
        LEFT JOIN dbo.Users AS uv ON v.UserId = uv.Id
        CROSS APPLY (   SELECT  Title, CreationDate, OwnerUserId
                        FROM    dbo.Posts p2
                        WHERE   p2.Id = p.Id ) _
GROUP BY p.Id, p.Score,
        _.Title, _.CreationDate, _.OwnerUserId ;

The new query performs 800,000 reads but it only takes 1.2 seconds to execute and it goes parallel (1.9 total CPU seconds are used). This is a big improvement over the first query, and I’ll explain how we got there.

What makes the CROSS APPLY faster?

This query has a lot of differences from the first query. Let’s break down what’s going on in here.

We’ve split out the core of the query into two CTEs. In this case, we’re using the CTEs as optimization fences. Rather than let SQL Server do its own thing with optimization, we’re tricking it into providing the optimization path that we want – recent_votes and posts will be optimized separately. Because of this optimization fence, we’ve been able to construct very narrow indexes that bring back the data that we need for each.

After tricking SQL Server into optimizing our CTEs separately, we use CROSS APPLY to pull back the details we want based on a primary key lookup. This helps our query run faster – we need fewer pages allocated for our indexes which means that SQL Server estimates that less memory will be used. The results speak for themselves, too – the second query performs 1/4 the I/O of the first query.

What’s the Result?

The end result is that we have a query that’s fast, and our indexes are free of the bloat caused by wide INCLUDE columns.

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.

Do Foreign Keys Matter for Insert Speed

Do you have the key?

Do foreign keys hurt the performance of inserts? Okay, we all know that foreign keys do require some work, but the crazy people of the internet have wildly varying guesses as to just how much work is involved. Estimates varied from “barely any” to “as much as 50% slower”.

Nobody else seemed eager to do the work, so I figured I might as well do it.

How did the test work?

I created two tables – parent and child. The parent table has an int primary key and a fixed width filler column. I varied the size of the filler column across multiple test runs, just to make sure I wasn’t biasing the test by using an artificially large parent table. The testing happened with a parent table that had 2 rows per page, 4 rows per page, and 8 rows per page.

You can find the test code in fk.sql.
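
If you’d rather not open the file, here’s roughly the shape of the schema – a sketch, not the actual fk.sql (the filler size is what changes between runs):

CREATE TABLE dbo.parent (
    id     INT IDENTITY(1,1) PRIMARY KEY,
    filler CHAR(3500) NOT NULL DEFAULT ('x')  /* sized to control rows per page: 2, 4, or 8 */
);

CREATE TABLE dbo.child (
    id        INT IDENTITY(1,1) PRIMARY KEY,
    parent_id INT NOT NULL
        CONSTRAINT fk_child_parent FOREIGN KEY REFERENCES dbo.parent (id)
);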

Testing SQL Server insert speed with foreign keys

The test code inserts 1,000,000 rows in batches of 5,000 rows. Although this isn’t strictly realistic, it’s better than timing a single batch of 1,000,000 inserts.

What happens during the test?

It turns out that no matter what the page size, it takes approximately 40ms to insert 5,000 rows with the foreign key in place. When we remove the foreign key, inserting 5,000 rows takes ~26ms.

Although the difference looks big at 5,000 row batch sizes, each insert is taking, on average, 0.0072ms with the foreign key and 0.0052ms without the foreign key. These differences are hardly worth noting. And, let’s face it, 2 microseconds is a pretty minor price to pay for data integrity.

How much do foreign keys affect single row inserts?

I ran the test one last time with a batch size of 1 row, just to see how much difference there was in insert performance.

FK present?   Duration
yes           253,896 ms
no            241,195 ms

When it comes down to single row performance, the difference is negligible. We’re spending all of our time waiting for other parts of the system.

How real is this test?

It’s not terribly real, but it’s real enough to say “Yes, foreign keys have overhead” and follow that up with “No, you shouldn’t care, just add the keys.” The tests were done in VMware and the hardware in question is a mid-2012 Retina MacBook Pro.

In the real world, we’ve got a lot more than inserts happening, but it’s worth quantifying the cost of a foreign key lookup and realizing that it’s worth having foreign keys.

Kendra says: Wanna know if your foreign keys are ‘trusted’? Check out our script here.
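
The script linked above does more, but the core check is just a peek at sys.foreign_keys:

SELECT  OBJECT_SCHEMA_NAME(parent_object_id) AS schema_name,
        OBJECT_NAME(parent_object_id) AS table_name,
        name AS foreign_key_name
FROM    sys.foreign_keys
WHERE   is_not_trusted = 1
        AND is_disabled = 0;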