Getting Started with Diskspd

Diskspeed, or diskspd.exe, is Microsoft’s replacement for SQLIO. While I’m not going to replace our SQLIO tutorial, you can use the information here to replicate the same type of tests on your systems and get the information you need. During the Dell DBA Days, Doug and I used diskspd as one of our techniques for getting a baseline of raw performance. We wanted to get an idea of how fast the servers and storage were before running SQL Server specific tests.

How do I get diskspd?

You can download diskspd directly from Microsoft – Diskspd, a Robust Storage Testing Tool, Now Publically Available. That page has a download link as well as a sample command.

The upside is that diskspd is a fully self-contained download. You don’t need Java, .NET, or anything else installed to run it. Apart from Windows – you’ll still need Windows.

How do I use diskspd?

With the command line, of course!

In all seriousness, although diskspd is the engine behind Crystal Disk Mark, it stands on its own as a separate tool. Download the executable and unzip it to an appropriate folder. There are going to be three sub-folders:

  • amd64fre – this is what you need if you have a 64-bit SQL Server (64-bit Windows)
  • armfre – builds for ARM versions of Windows
  • x86fre – builds for 32-bit (x86) Windows

I took the diskspd.exe file from the appropriate folder and dumped it in C:\diskspd so I could easily re-run the command. Let’s fire up a command prompt and try it out.

Here’s a sample that we ran: diskspd.exe -b2M -d60 -o32 -h -L -t56 -W -w0 O:\temp\test.dat > output.txt

Breaking it down:

  • -b2M – Use a 2 MB I/O size. For this test, we wanted to simulate SQL Server read ahead performance.
  • -d60 – Run for 60 seconds. I’m lazy and don’t like to wait.
  • -o32 – 32 outstanding I/O requests. That’s a queue depth of 32.
  • -h – This disables both hardware and software buffering. SQL Server does this, so we want to be the same.
  • -L – Grab disk latency numbers. You know, just because.
  • -t56 – Use 56 threads per file. We only have one file, but we have 56 cores.
  • -W – Warm up the workload for 5 seconds.
  • -w0 – No writes, just reads. We’re pretending this is a data warehouse.
  • O:\temp\test.dat – our sample file. You could create a sample file (or files) by running diskspd with the -c<size> flag – see the example after this list.
  • > output.txt – I used output redirection to send the output to a file instead of my screen.
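
If your test file doesn’t exist yet, diskspd can create it for you as part of the same run. Here’s a minimal sketch – the 50 GB size matches the test file shown in the output later, so size it for your own storage:

diskspd.exe -c50G -b2M -d60 -o32 -h -L -t56 -W -w0 O:\temp\test.dat > output.txt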

How do I read diskspd results?

You’re going to get a lot of information back from this command. You’re going to want to close the window and back away quickly. Don’t. This is good stuff, I promise.

The first thing you’ll see is a recap of the command line you used. Then you’ll immediately see a summary of the test parameters:

timespan:   1
duration: 60s
warm up time: 0s
cool down time: 0s
measuring latency
random seed: 0
path: 'O:\temp\test.dat'
    think time: 0ms
    burst size: 0
    software and hardware write cache disabled
    performing read test
    block size: 2097152
    using interlocked sequential I/O (stride: 2097152)
    number of outstanding I/O operations: 32
    thread stride size: 0
    threads per file: 56
    using I/O Completion Ports
    IO priority: normal

That’s a lot easier than trying to read a set of command line flags. Six months from now, I can review older runs of diskspd and understand the options that I used. diskspd is already winning over sqlio.

Next up, you’ll see a summary of CPU information. This information will help you understand if your storage test is CPU bottlenecked – if you know the storage has more throughput or IOPS capability, but your tests won’t go faster, you should check for CPU bottlenecks. The last line of this section (and every section) will provide an average across all CPUs/threads/whatevers.

actual test time:   60.01s
thread count:       56
proc count:     56

CPU |  Usage |  User  |  Kernel |  Idle
   0|  23.17%|  10.36%|   12.81%|  76.83%
   1|   4.24%|   2.40%|    1.85%|  95.76%
   2|   9.71%|   7.37%|    2.34%|  90.29%
   3|   1.48%|   0.70%|    0.78%|  98.52%
avg.|  14.11%|  12.48%|    1.63%|  85.89%

After the CPU round up, you’ll see a total I/O round up – this includes both reads and writes.

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
     0 |      2950692864 |         1407 |      46.89 |      23.44 | 1323.427 |   107.985 | O:\temp\test.dat (50GB)
     1 |      3013607424 |         1437 |      47.89 |      23.94 | 1310.516 |   141.360 | O:\temp\test.dat (50GB)
     2 |      2950692864 |         1407 |      46.89 |      23.44 | 1319.540 |   113.993 | O:\temp\test.dat (50GB)
     3 |      2950692864 |         1407 |      46.89 |      23.44 | 1315.959 |   122.280 | O:\temp\test.dat (50GB)
total:      167975583744 |        80097 |    2669.28 |    1334.64 | 1307.112 |   162.013

Look at all of those bytes!

If the I/O numbers initially seem small, remember that the data is split up per worker thread. Scroll down to the bottom of each section (total, reads, writes) and look at the total line. This rolls up the overall volume of data you’ve collected. The I/Os are recorded in whatever unit of measure you supplied. In our case, the I/Os are 2MB I/Os.

Important Sidebar: Your storage vendor probably records their I/O numbers in a smaller I/O measurement, so make sure you do some rough translation if you want to compare your numbers to the advertised numbers. For more discussion, visit IOPS are a scam.
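
Here’s a rough worked example using the totals above: about 1,335 I/Os per second at 2 MB apiece is roughly 2,670 MB/s. If your vendor advertises IOPS at an assumed 8 KB I/O size (a common choice – check their spec sheet), that same throughput would be quoted as roughly 2,670 MB/s ÷ 8 KB ≈ 340,000 IOPS. Same workload, very different-looking number.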

Finally, latency! Everybody wants to know about latency – this is part of what the end users are complaining about when they say “SQL Server is slow, fix it!”

  %-ile |  Read (ms) | Write (ms) | Total (ms)
    min |     13.468 |        N/A |     13.468
    max |   1773.534 |        N/A |   1773.534

This table keeps the min, max, and a variety of percentiles about how the storage performed while you were beating on it. This information is just as helpful as the raw throughput data – under load, your storage may have increased latencies. It’s important to know how the storage will behave and respond under load.

How often should I use diskspd?

Ideally, you should use diskspd whenever you’re setting up new storage or a new server. In addition, you should take the time to use diskspd when you make big changes to storage – use diskspd to verify that your changes are actually an improvement. No, diskspd doesn’t include the work that SQL Server does, but it does show you how your storage can perform. Use it to make sure you’re getting the performance you’ve been promised by your storage vendor.

Performing Your Own SQL Server Triage

Our new Triage Specialist will be using tools, forms, and methodologies that we build for our SQL Critical Care® clients, but we also make a lot of those tools available to the public for free in our First Responder Kit.

Here’s how to do your own quick health assessment for free:

First, start your health check with sp_Blitz®. It’s a stored procedure that gives you a prioritized list of problems on your SQL Server. If the databases are all third party apps that you can’t control, use the @CheckUserDatabaseObjects = 0 parameter to focus the results on the things you can change. sp_Blitz® works on SQL Server 2005 and newer, but it doesn’t work on databases in SQL Server 2000 compatibility mode.
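
For example, here’s a minimal sketch of that call (assuming sp_Blitz® is installed on the server you’re connected to):

EXEC dbo.sp_Blitz @CheckUserDatabaseObjects = 0;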

If you don’t think one of the resulting rows is important, or if you don’t understand the warning, copy the contents of the URL column into your browser and spend a few minutes reading the details. For example, at least once a month, we run across a server having performance problems, and it turns out not all of the CPU and memory is actually available to SQL Server. Sure enough, when we look at sp_Blitz®, it warns about CPU schedulers being offline, and the admin says something to the effect of, “Oh, I saw that in sp_Blitz®’s output, but I wasn’t really sure what it meant because I didn’t read the URL.”

Next, check your server’s bottlenecks with sp_AskBrent®. This stored proc was designed for live real-time performance troubleshooting, but it has another cool use. Use the @Seconds = 0, @ExpertMode = 1 parameters, and the second result set will show you your server’s primary wait stats since SQL Server was restarted (or since someone cleared the wait stats.) This helps guide you toward the type of performance bottleneck you’re troubleshooting – is it a CPU bottleneck, storage issue, or locking?
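
A minimal sketch of that call, using the parameters mentioned above:

EXEC dbo.sp_AskBrent @Seconds = 0, @ExpertMode = 1;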

Armed with that knowledge, check your top resource-intensive queries with sp_BlitzCache®. Once you know the kind of bottleneck you’re looking for, use the @sort_order parameter to get to the right queries (there’s a quick example after this list):

  • Looking for high CPU consumers? @sort_order = 'cpu'
  • Tracking down the source of PAGEIOLATCH waits due to queries dragging a lot of data back from disk? @sort_order = 'reads'
  • Wondering who’s getting blocked for long periods of time? Try @sort_order = 'duration'
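
For instance, chasing a storage bottleneck would look like this (a minimal sketch – the other sort orders work the same way):

EXEC dbo.sp_BlitzCache @sort_order = 'reads';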

Then make the queries more efficient with sp_BlitzIndex®. Now that you know the worst queries and the database they’re in, run sp_BlitzIndex® in that database (see the example after this list) and get a psychological profile of your indexes.

  • Dastardly deadlocks? Look for the Aggressive Indexes warnings about high lock waits on an index.
  • Slow selects? Check out the high-value missing index warnings.
  • Intolerable inserts? Read the Hoarder warnings about tables that have a high number of nonclustered indexes that aren’t getting used.
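
A minimal sketch of that call – 'YourDatabase' is a placeholder for the database you identified with sp_BlitzCache®:

EXEC dbo.sp_BlitzIndex @DatabaseName = 'YourDatabase';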

We slather trademarks all over these tools because we’re really proud of ’em. We use them every single day in our consulting work, and we want you to use them in your triage work too. You might even wanna just practice using ’em in case we ever ask you to use ’em during our job interviews. (That’s a hint.)

Why RPO and RTO Are Actually Performance Metrics Too

Most companies come to us saying, “The SQL Server isn’t fast enough. Help us make it go faster.”

They’re kinda surprised when one of the first things we fill out together is a variation of our High Availability and Disaster Recovery Planning Worksheet.

They say things like, “Wait, I’m having a performance problem, not an availability problem.”

But as we start to look at the SQL Server, we often find a few disturbing truths:

  • The admins have stopped doing transaction log backups and DBCC CHECKDB because the server can’t handle the performance hit
  • Management thinks the database can’t lose any data because communication wasn’t clear between admins and managers
  • The server isn’t even remotely fast enough to start handling data protection, let alone the end user query requirements

I know you think I’m making this up. I know you find this hard to believe, dear reader, but not everyone is a diligent DBA like you. Not everyone has their RPO and RTO goals in writing, tests their restores, and patches their SQL Server to prevent known corruption bugs. I hope you’re sitting down when you read this, but there are some database administrators out there who, when given the choice between index rebuilds and transaction log backups, will choose the former on a mission-critical 24/7 system.

I’m sure that would never be you, dear reader, but these database administrators are out there, and they’re exactly the kinds of shops who end up calling us for help.

Then the fun part is that once we establish what the business really wants in terms of data safety, it often dictates a new server – say, moving from a single standalone VM with no HA/DR up to a full-blown failover cluster. And in the process of sketching out that new cluster, we can solve the performance problems at the same time without any changes to their application. (Throwing hardware at the problem isn’t the right answer all the time – but when you need to add automatic failover protection, it’s the only answer.)

That’s why we’re hiring a Triage Specialist: a person who can quickly assess the difference between what the business needs and what the system is really delivering, and get everyone on the same page as to what needs to happen next. And while you, dear reader, may not be the right person to apply for this position, I strongly encourage you to make sure your RPO/RTO knowledge is shored up. After all, you don’t want to have that awkward discussion with our Triage Specialist.

Scaling Up Our SQL Critical Care®

Since we’re hiring again, it’s a good time to explain what we do and how it’s evolved over the last four years.

When your SQL Server’s performance or reliability is causing you business pain, and you can’t get the relief you want, you call us for a SQL Critical Care®. We use specialized tools we’ve built to quickly get to the root cause. Then we explain it to you in clear, friendly terms, and train you how to get permanent pain relief.

Notice how we keep using medical terms? That’s no coincidence – we’re very much like a doctor’s office. Over the course of performing hundreds of these, and because the waiting room keeps filling up, we’ve scaled this to where we treat it like a hospital.

You can tell the drugs kicked in because I’m smiling.

Remember that time I broke my arm saving a nun from a car wreck carrying a flat panel monitor? I went into the hospital, the triage nurse made sure I wasn’t having a heart attack, then got me into radiology for scans, then a doctor reviewed the situation, and he brought in an orthopedic surgeon for a consult on whether they should operate. The hospital had all of these experts available, and they rotated in and out of the room based on my needs.

Because of our scale, we’re able to bring that same approach to SQL Server pain relief.

First, a Triage Specialist reviews your RPO and RTO objectives, learns how you use SQL Server, collects diagnostic information, and prescribes changes to avoid data loss. Then, based on what the Triage Specialist finds, they bring in different specialists. The situation calls for plan guides? You’ll get a T-SQL surgeon who’s used that unusual trick before. Need pain relief for an AG in EC2 that randomly fails over? You’ll be able to work with an AlwaysOnologist who’s built up specialized tools to diagnose it.

We couldn’t use this approach when we only had a few consultants, but now that we’re doing so many of these, fun options open up.

Synchronous AlwaysOn Availability Groups Is Not Zero Data Loss

In theory, when you configure AlwaysOn Availability Groups with synchronous replication between multiple replicas, you won’t lose data. When any transaction is committed, it’s saved across multiple replicas.

Except when it’s not. Read the manual carefully:

If primary’s session-timeout period is exceeded by a secondary replica, the primary replica temporarily shifts into asynchronous-commit mode for that secondary replica.

That’s right – your sync secondary becomes asynchronous.

Automatically. Without warning you. And you can’t control it. You can’t say, “I only want you to accept transactions as long as data is being copied to another replica.” For more details on what causes a sync replica to drop into async, check out the BOL page on synchronous commits.
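
If you want to at least catch it when it happens, you can watch the availability group DMVs. Here’s a minimal monitoring sketch (run it on the primary, and verify the view and column names against your version’s documentation) – a synchronous replica that has silently fallen behind shows up as SYNCHRONIZING or NOT SYNCHRONIZING instead of SYNCHRONIZED:

SELECT  ar.replica_server_name ,
        DB_NAME(drs.database_id) AS [database_name] ,
        drs.synchronization_state_desc ,
        drs.synchronization_health_desc
FROM    sys.dm_hadr_database_replica_states AS drs
        JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id ;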

Bottom line – you can’t actually guarantee zero data loss with AlwaysOn Availability Groups. I love AGs, but they’re much, much more complex than they look at first glance.

Enabling Query Store in Azure SQL Database

Enter Query Store

Query Store, in short, is a way to track query performance over time. In Microsoft’s words, “The feature automatically captures a history of queries, plans, and runtime statistics, and retains these for your review”. It’s like sys.dm_exec_query_stats but it persists across reboots! And it has execution plans!

The Query Store is a great feature for on-premises SQL Server-backed applications, but it adds significant value in Azure SQL Database. Instead of potentially losing execution history and execution plans, developers can see the history of query performance and execution plans across reboots.

Enabling Query Store in Azure SQL Database

Point SSMS at your Azure SQL Database instance and run:
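
-- Substitute your own database name here:
ALTER DATABASE [YourDatabaseName] SET QUERY_STORE = ON;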


No, really, it’s that easy.

How Do I Really Know Query Store is Enabled?

Figuring out if Query Store is enabled is pretty easy, too. All you need to do is look at sys.database_query_store_options. The following query should give you exactly what you need to know about the Query Store:

SELECT  desired_state_desc ,
        actual_state_desc ,
        current_storage_size_mb ,
        max_storage_size_mb
FROM    sys.database_query_store_options ;

As long as things are running correctly, the desired and actual states should read either ALL or AUTO. For more information about using Query Store, check out Monitoring Performance by Using the Query Store.

The Takeaway

Azure SQL Database instances can be rebooted at any time. It’s important to keep this in mind and plan accordingly, not just in the application but also in how you approach tuning your T-SQL. You should enable the Query Store in Azure SQL Database instances where you care about performance.

Getting JSON out of SQL Server

SQL Server 2016 brings support for JSON. For those of you who don’t know, JSON is JavaScript Object Notation – it’s a way of representing application level objects as a string. Imagine XML without anywhere near as much line noise, and you’re pretty much there.

Our Sample Document

In this example, we want to retrieve orders and their line items from the AdventureWorks database. We need to get the data out in this format for fast processing. In the end, our data should look something like:

{ "orders" : 
    [ /* an array of orders */
        { "order" : 
            { "number" : "123456" ,
                /* more stuff goes here */
                "items" : 
                        { "item": { /* item info goes here */ } }

Holy cow, that’s a lot of stuff just to represent an order and its line items.

Here’s our starter query:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        d.ProductID AS [item.product_id],
        d.OrderQty AS [item.quantity],
        d.UnitPrice AS [item.unit_price],
        d.UnitPriceDiscount AS [item.discount]
FROM    Sales.SalesOrderHeader h
        JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
WHERE   h.SalesOrderNumber = 'SO43659'

First Attempt

The documentation hints that we might just be able to use the `FOR JSON AUTO` option to automatically format our JSON. What could go wrong?

Syntax is hard

Our results aren’t what we want at all! The structure is close, but not quite what we’re looking for.

Getting the Structure Right

We are getting all the data we want, but we’re not getting the structure that we want. Our requirements say that we need to have an `orders` array and then each `order` should be a separate item in that array. An order’s line items should also be stored in a separate array.

The square brackets make an array in JSON. The curly brackets denote an object.

Using the ROOT() option will nest our results in a root object. In this case, we can use ROOT('orders') to create a basic orders object to hold our list of orders.
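
Here’s a minimal sketch of how the option attaches to a query – trimmed down to one column, and using FOR JSON PATH (which we’ll end up needing below anyway) so that the dotted alias nests properly:

SELECT  h.SalesOrderNumber AS [order.number]
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'
FOR     JSON PATH, ROOT('orders');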

Making each item part of an array inside the order is going to be trickier. We can still use FOR JSON AUTO to get the job done:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        (SELECT d.ProductID AS [item.product_id],
                d.OrderQty AS [item.quantity],
                d.UnitPrice AS [item.unit_price],
                d.UnitPriceDiscount AS [item.discount]
         FROM   Sales.SalesOrderDetail d 
         WHERE  h.SalesOrderID = d.SalesOrderID 
         FOR    JSON AUTO) AS items
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'

We’re almost right!

So wrong I can taste it.

Well, that’s … something. To get the output we want, we’re going to have to use a correlated subquery – fans of generating XML in SQL Server may remember jumping through hoops to get the right structure out of SQL Server.

We’re on the right track, but what we really need is better formatting.

The Best Solution, For Now

Since FOR JSON AUTO isn’t generating the JSON output that we want, we can go down a manual route and use FOR JSON PATH. If this is starting to sound like XML, you’re absolutely right.

After attaching a debugger to SQL Server, Paul White shows that the JSON writer is using the old XML code under the hood.

Anyway, here’s the SQL to generate the right JSON document:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        (SELECT d.ProductID AS [item.product_id],
                d.OrderQty AS [item.quantity],
                d.UnitPrice AS [item.unit_price],
                d.UnitPriceDiscount AS [item.discount]
         FROM Sales.SalesOrderDetail d 
         WHERE h.SalesOrderID = d.SalesOrderID
         FOR JSON PATH
        ) AS [order.items]
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'
FOR     JSON PATH, ROOT('orders');

Check out the correct (and formatted) results!

Getting JSON out of SQL Server

It’s easy to get incorrectly formatted JSON out of SQL Server. If you do need to get well-formatted JSON out of SQL Server, be prepared to put in a lot of work coercing your query into the right shape. You may find that you need to do a large number of nested subqueries, too.

Fake moustaches, barbecue, and SQL Server.

Must be Dell DBA Days:

Join us live as we performance tune – and break – SQL Servers:

  • Thursday Morning: Finding Your Slow SQL Server’s Bottlenecks in Record Time with Wait Stats
  • Thursday Afternoon: How to Prove Hardware is the Problem
  • Friday Morning: SQL Server High Availability, Disaster Recovery, and Licensing Explained
  • Friday Afternoon: Watch SQL Server Break and Explode

Register now and join in.

Is leading an index with a BIT column always bad?

“Throughout history, slow queries are the normal condition of man. Indexes which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people who don’t think bit columns are selective enough to lead index keys. Whenever this tiny minority is kept from creating indexes, or (as sometimes happens) is driven out of a SCRUM meeting, the end users then slip back into abject query performance.

This is known as “Business Intelligence.”

–Bobby Q. Heinekens

Fake quotes and people aside

Let’s look at a scenario where you have a BIT column that’s fairly selective, and perhaps the rest of your predicates are ranged. This isn’t so out of the ordinary, especially because people like to know when stuff happened and how many times it happened.

“How many lunches have I eaten today?”

“Where did that bear learn to drive?”

“Am I being slowly disassembled on a molecular level by millions of tiny black holes?”

Yes, China or Florida, and Probably!

So let’s figure this out quick

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL ),
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
/* tally CTE: number the rows so we can take the first one million */
[Numbers](N) AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E2)
SELECT  IDENTITY (BIGINT, 1, 1) AS [ID] ,
        ABS(CHECKSUM(NEWID()) / 100000000) + 1 AS [CustomerID],
        ISNULL(CONVERT(DATE, DATEADD(MINUTE, -[N].[N], GETDATE())), '1900-01-01') AS [OrderDate] ,
        CASE WHEN [N].[N] % 19 = 0 THEN 1 ELSE 0 END AS [isBit]
INTO    [NotAshleyMadisonData]
FROM    [Numbers] [N]
ORDER BY [OrderDate]


-- Two nonclustered indexes on the test table (the index names here are just illustrative):
CREATE NONCLUSTERED INDEX [ix_isBit_OrderDate] ON [dbo].[NotAshleyMadisonData] ([isBit], [OrderDate]);

CREATE NONCLUSTERED INDEX [ix_OrderDate_isBit] ON [dbo].[NotAshleyMadisonData] ([OrderDate], [isBit]);

Here’s one table with one million rows in it. Since it’s random, if you run this on your own it may turn out a little different for you, but I’m sure you can adapt. You are, after all, wearing the largest available diaper size.

I’ve also gone ahead and created two indexes (neither one filtered!) to avoid the appearance of impropriety. The first one goes against the oft-chanted mantra of not leading your index with a BIT column. The other complies with your thumb-addled rules of index creation where your more unique column comes first, though it ignores the competing rule to lead your index with equality predicates and follow with range predicates.

Only 52,631 rows out of a million have a BIT value of 1 – that’s the CASE WHEN [N] % 19 = 0 from the setup script at work (1,000,000 / 19 ≈ 52,631). And with the exception of the first and last date values, each date has 75 or 76 rows where the BIT is 1.

If you had to do this in your head, which would you do first? Find all the BIT = 1 rows, and then only count occurrences from the desired date range? Or would you find your start and end dates, and then count all the BIT = 1 values?

(Hint: it doesn’t matter, you’re not the query engine. Unless you’re Paul White. Then maybe you are. Has anyone seen them in the same room together?)

Images of Query Plans

SELECT COUNT(*)
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25' AND '2015-08-20' --All dates

SELECT COUNT(*)
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25' AND '2014-09-01' --About half the dates

SELECT COUNT(*)
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[OrderDate] BETWEEN '2013-09-25' AND '2014-09-01'
AND [namd].[isBit] = 1 -- Flipping them doesn't change anything

SELECT COUNT(*)
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[OrderDate] = '2013-09-26' --It's not until here that the other index gets used
AND [namd].[isBit] = 1

Peter Godwin will never play my birthday party.

Put on your recap cap

This is another case where knowing the data, and knowing the query patterns in your environment is important. It’s easy to overlook or reject obvious things when you’re bogged down by dogma.

The stamp of approval for an idea shouldn’t come from blogs, forums, white papers, or hash tags. It should come from how much something helps in your environment.

Thanks for reading!

New and Improved: Relaunching Our Online Training – With a Daily Deals Sale

About two years ago, we launched our online training classes, and you loved ’em. We sold thousands and it became a pretty big part of our business.

The New Hotness

We’ve launched our all-new online training video store with features like:

  • Comments – so you can ask questions on each video module
  • More powerful e-commerce – VAT, group discounts, Paypal, PDF receipts
  • HTML-based modules – so we can include code samples, images, and posts
  • Better analytics – to understand which modules you like, and which ones to tweak
  • Course reviews – we’ve had thousands of buyers, and it’s time to let them speak
  • Better coupons – including getting a discount for liking us or tweeting (and boy, Black Friday is gonna be fun this year)

To celebrate the launch, we’re running a set of daily deals:

  • Today (Monday 8/24) – The Everything Bundle – all of our online video courses, normally $1,553, for just $599. Only 50 available. Gone!
  • Tues (8/25) – Chicago SQL Server Performance Troubleshooting class seats – normally $3,195, just $995. Only 3 available. Sold out!
  • Weds (8/26) – $99 SQLSaturday Pittsburgh Pre-Con seats – normally $199, this one-day Developer’s Guide to SQL Server Performance is half off. 25 available. Gone!
  • Thurs (8/27) – Mystery Pocket Bundles – we’ll post a handful of $29-$99 deals with secret unknown contents. On sale now. Sold out!
  • Fri (8/28) – all $299 courses 50% off for the first 50 buyers of each course. Sold out!