[Video] What Percent Complete Is That Index Build?

SQL Server 2017 & newer have a new DMV, sys.index_resumable_operations, that shows you the percent_complete for index creations and rebuilds. It works, but…only if the data isn’t changing. But of course your data is changing – that’s the whole point of doing these operations as resumable. If it weren’t changing, we could just let the operations finish.

Let’s see how it works to understand why the numbers aren’t really reliable:

Well, that’s … disappointingly inaccurate.

Here are the demo scripts if you want to give it a shot on your own systems:
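If you just want the gist before downloading, here’s a minimal sketch of the idea – the table and index names are placeholders, so point it at any big table you’ve got handy:

-- Session 1: kick off a resumable online rebuild (index name is a stand-in):
ALTER INDEX PK_Posts__Id ON dbo.Posts
    REBUILD WITH (ONLINE = ON, RESUMABLE = ON);

-- Session 2: while that runs, watch the (not-so-reliable) progress number:
SELECT name, state_desc, percent_complete
FROM sys.index_resumable_operations;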

Updated First Responder Kit and Consultant Toolkit for March 2020

In the immortal words of “Orange” Julius Caesar as written by Shake’s peer, “Beware the bugs of March.” It took us 2,064 years, but we’ve finally fixed those bugs, plus added a couple of new features.

To get the new version:

Consultant Toolkit Changes

Updated the First Responder Kit with this month’s script updates.

sp_AllNightLog Changes

  • Fix: skip databases where ignore_database <> 0 in the backup_worker table. (#2308, thanks Maxiwheat.)

sp_Blitz Changes

  • Fix: don’t error out when some databases have numeric roundabort turned on, and a rounding error would normally throw an error. (#2302, thanks DJH.)
  • Fix: bug in full backup check: if msdb.backupset contained backups taken on other servers, we would have included them as successful backups when they really weren’t. (#2313, thanks Solomon Rutzky.)
  • Fix: uninstall script now works on case-sensitive collations. (#2307, thanks Protiguous.)

sp_BlitzCache Changes

  • Improvement: added a new check for the UserStore_TokenPerm cache being >10% of the buffer pool. I’m seeing issues out in the wild where this cache has taken over memory and caused SQL Server to be unable to cache any execution plans. This was an issue over a decade ago with SQL 2005, but I’ve got a SQL Server 2016/2017 client with an open case with Microsoft on it. We’ll be adding this same check for sp_Blitz and sp_BlitzFirst. (#2134, thanks Erik Darling.)
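If you wanna eyeball that cache on your own server before the new checks ship, a quick sketch against the memory clerk DMV does the trick – compare the token store to the buffer pool:

SELECT [type], SUM(pages_kb) / 1024.0 AS size_mb
FROM sys.dm_os_memory_clerks
WHERE [type] IN (N'USERSTORE_TOKENPERM', N'MEMORYCLERK_SQLBUFFERPOOL')
GROUP BY [type];
-- If USERSTORE_TOKENPERM is more than ~10% of the buffer pool size, you may
-- be hitting the issue the new check looks for.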

sp_BlitzFirst Changes

  • Fix: @FilterPlansByDatabase was throwing an error if you passed in a list of databases (rather than just “user” for the user databases.) (#2311, thanks Sam Carey.)

sp_BlitzLock Changes

  • Improvement: now catches single-query parallelism deadlocks, too. (#2286, thanks Adrian Buckman.)
  • Fix: now works with Amazon RDS SQL Server Express Edition by skipping the sysjobssteps query like we do with Azure SQL DB. (#2317, thanks Ernesto-Ibanez.)

sp_BlitzWho Changes

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

The First Time I Had to Restore a Database


Not the exact same time frame, but not that far off.

I remember it really clearly.

In the mid 1990s, long, long before I went into IT as a career, I was working at a photo studio in Muskegon, Michigan. They specialized in high school class photos, and they did a LOT of ’em. Every morning, the photographers would come into the office to collect printouts for the pictures they were going to shoot that day – student name, address, time of the photos, that kind of thing.

My job duties included:

  • Removing the prior night’s backup tape and switching it out for a new one
  • Running a few database queries to prep for the day’s sessions
  • Printing out the appointments & labels for the photographers

One morning, I ran a query without the WHERE clause. Because that’s what you do.

I don’t remember how the queries worked, and I don’t even remember what the database was. I just remember that it ran on SCO Xenix because I remember the manuals so clearly, and I remember that I didn’t panic at all. I knew nobody else had accessed the database yet – I was one of the first ones in every morning – so all I had to do was restore the database and go through the steps again.

But I also remember that the boss (not manager – boss) had an epic temper. As in, swear-and-throw-things-and-fire-people kind of temper. And like me, he was an early riser, and I knew it was only a matter of time before he showed up and looked for his printouts to see who he was going to photograph that day. I was sure I was gonna get my rear handed to me, and I was resigned to that fact.

So I put last night’s tape in, started the restore, and waited. Sure enough, the boss came in, and before he could say anything, I said:

I screwed up, my fault, and I’m sorry. I messed up the queries I was supposed to run, so I’m restoring last night’s backup, and then I’m going to run the queries again and do the printouts. You can fire me if you want, and I’ll totally understand it if you do, but you should probably wait to fire me until after the restores finish and I do the printouts. Nobody else here knows how to do this, and the photographers need to work today.

Worked like a charm. He nodded, and I could tell he was pissed, but he didn’t yell or throw things. He just gruffly left the computer room and did other stuff.

The photographers and other people started trickling in, looking for their printouts, and I explained that they weren’t ready yet, and explained why. They all got big eyes and asked if the boss knew about this, and they were all sure I was going to get fired.

I didn’t get fired, and everybody was surprised. The boss used me as an example, saying, “No, this is what you’re supposed to do – own up when you do something stupid, fix it yourself, and be ready to deal with consequences.”

Part of me was a little disappointed that I didn’t get fired, though. I wasn’t a big fan of that job. I’ve only been fired once – from a Hardee’s – but that’s a story for another blog post.

What about you? Do you remember the very first time you had to do a database restore to fix something you messed up?

Free Fundamentals of Query Tuning Week: Part 5, Common T-SQL Anti-Patterns

While you play Chopped with the ingredients in your cupboard (“Today’s ingredients are rice, beans, and hand sanitizer”), I’m giving away free training. So far, we’ve covered building a query plan, how to find the right queries to tune, how parameters influence cached plans, and improving cardinality estimation accuracy. Make sure to keep up with the series – on April 1, these posts will magically disappear.

If you’ve been enjoying this free week of fundamentals, and your company’s on lockdown for the foreseeable future, now’s the time to talk to your boss about leveraging that training budget. You can get started learning right now, regardless of what happens with the travel & conference season through the rest of the year. I offer a series of expensive 3-day classes:

But you, dear reader, are too smart to pay full price. You can score my Live Class Season Pass for $995 or the recordings of the Mastering classes for $795. Use coupon code ReallyBadWeek for another 10% off that price, and I’ll donate 10% of the sales to Doctors Without Borders. Now, on to this week’s final episode.

So far, we’ve covered how to find your server’s most resource-intensive queries, and you’ve opened their execution plans to figure out where the estimates vs actuals go wrong. Now, learn what T-SQL anti-patterns cause the estimates to go wrong.
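To give you a taste of what I mean – this is my own quick example, not necessarily the class demo – wrapping a function around a column is a classic way to wreck both the estimates and index usage:

-- Anti-pattern: the function on the column forces a guess (and a scan):
SELECT Id FROM dbo.Users WHERE YEAR(CreationDate) = 2013;

-- Sargable rewrite: same results, but the histogram and an index seek can kick in:
SELECT Id FROM dbo.Users
WHERE CreationDate >= '2013-01-01' AND CreationDate < '2014-01-01';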


Demo Script


Free Fundamentals of Query Tuning Week: Part 4, Improving Cardinality Estimation Accuracy

Before we get to the training in today’s episode, I wanna stop for a second to say thank you. We’re donating 10% of all sales this month to Doctors Without Borders. This morning, I sent in the first donation: so far, you’ve raised $4,792.58! That’s awesome. Now keep staying at home, learning, and flattening the curve.

Alright, now back to the training. So far, we’ve covered building a query plan, how to find the right queries to tune, and how parameters influence cached plans. Make sure to keep up with the series – on April 1, these posts will magically disappear.

When SQL Server’s estimated number of rows per operator is reasonably close to the actual number of rows, you’re likely getting a good plan. It may not be a fast plan, but it’ll accurately reflect the (possibly large) amount of work required.

You can still reduce the amount of work done by changing the query or the indexes.

We’ll step through a series of queries with progressively harder cardinality estimation (CE) challenges so you can understand the mechanics of getting better estimates.
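As a tiny appetizer – my example, not the lecture’s – compare how SQL Server estimates a local variable versus a literal:

DECLARE @Reputation int = 2;

-- Local variable: the value can't be sniffed at compile time, so SQL Server
-- falls back to the density vector - an "average" guess across all values:
SELECT COUNT(*) FROM dbo.Users WHERE Reputation = @Reputation;

-- Literal: SQL Server reads the histogram for the value 2, usually much closer:
SELECT COUNT(*) FROM dbo.Users WHERE Reputation = 2;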

Part 1 (51m)

Part 2 (7m)

If you like this, you can keep learning this year with:

But you, dear reader, are too smart to pay full price. You can score my Live Class Season Pass for $995 or the recordings of the Mastering classes for $795. Use coupon code ReallyBadWeek for another 10% off that price, and I’ll donate 10% of the sales to Doctors Without Borders.

Demo Scripts

We’ll finish up the series tomorrow by covering common T-SQL anti-patterns.

Free Fundamentals of Query Tuning Week: Part 3, How Parameters Influence Cached Plans

While you practice physical distancing (words I never thought I’d daisy-chain together), I’m giving away free SQL Server training. So far, we’ve covered building a query plan and how to find the right queries to tune. Make sure to keep up with the series – on April 1, these posts will magically disappear.

When you’re tuning queries, picking the right parameters is crucial. So often, you can tune for one particular set of parameters – only to find that different parameters perform poorly.

We’ll start with the query we used in the first module, searching user comments by location & date range, and then start changing the location parameter. You’ll see how different parameters radically affect the shape of the plan: which table SQL Server chooses to process first, and how it chooses to access those tables (seeks vs scans.)
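If you want the flavor of it before hitting play, the demo proc looks roughly like this – my reconstruction with stand-in names, not the exact class script:

CREATE OR ALTER PROC dbo.usp_CommentsByLocation
    @Location  nvarchar(100),
    @StartDate datetime,
    @EndDate   datetime
AS
SELECT c.CreationDate, c.Score, c.Text
FROM dbo.Users u
JOIN dbo.Comments c ON c.UserId = u.Id
WHERE u.Location = @Location
  AND c.CreationDate BETWEEN @StartDate AND @EndDate
ORDER BY c.CreationDate;
-- Call it for a huge location, then for a tiny one, and compare the plans.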

We’ll finish up by talking about the kinds of parameters to look for when you’re preparing to tune a query, and how to get them.

Part 1: Lecture (41m)

Part 2: Homework (7m)

If you like this, you can keep learning this year with:

But you, dear reader, are too smart to pay full price. You can score my Live Class Season Pass for $995 or the recordings of the Mastering classes for $795. Use coupon code ReallyBadWeek for another 10% off that price, and I’ll donate 10% of the sales to Doctors Without Borders.

Demo Script

Later this week, I’ll cover improving cardinality estimation accuracy, then common T-SQL anti-patterns.

Free Fundamentals of Query Tuning Week: Part 2, How to Find the Right Queries to Tune

Well, it turns out that toilet paper doesn’t cure the coronavirus, so during the quarantines, I’m giving away free training. Yesterday, I kicked off the Fundamentals of Query Tuning class with the first module, building a query plan. Make sure to keep up with the series – on April 1, these posts will magically disappear.

Now that you know some of the decisions SQL Server has to make, let’s look at your SQL Server to see what kinds of decisions it’s been making lately. First, we’ll run a workload against the Stack Overflow database and use sp_BlitzCache to spot the most resource-intensive queries.

Then, you’ll take a look at your own development or production environments to get an idea of what queries you need to tune in real life.

This lab has 3 separate demo scripts:

  1. Sets up the indexes & stored procedures
  2. Runs the workload (can take tens of minutes – just let it run for at least 5-10 minutes before you move on to step 3, and you can run step 3’s queries while step 2 is still running)
  3. Analyzes your server’s bottleneck, and helps you figure out which way to sort sp_BlitzCache
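Step 3 boils down to a pattern like this – the parameters are real sp_BlitzFirst/sp_BlitzCache options, but which sort order you pick depends on what the bottleneck analysis tells you:

EXEC dbo.sp_BlitzFirst @SinceStartup = 1;     -- what has the server been waiting on?

EXEC dbo.sp_BlitzCache @SortOrder = 'cpu';    -- if CPU-type waits dominate
EXEC dbo.sp_BlitzCache @SortOrder = 'reads';  -- if storage/memory waits dominate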

If you like this, you can keep learning this year with:

But you, dear reader, are too smart to pay full price. You can score my Live Class Season Pass for $995 or the recordings of the Mastering classes for $795. Use coupon code ReallyBadWeek for another 10% off that price, and I’ll donate 10% of the sales to Doctors Without Borders.

Demo Script #1: Index & Proc Setup

Demo Script #2: Run the Workload

Demo Script #3: Finding the Right Bad Queries

Later this week, I’ll be covering how parameters influence cached plans, improving cardinality estimation accuracy, and common T-SQL anti-patterns.

Free Fundamentals of Query Tuning Week: Part 1, Building a Query Plan

Well, that escalated quickly, huh?

Just a few short months ago, we were all making plans about what we wanted to learn in 2020. We sketched out our goals, our conference plans, maybe even how we wanted to do our first user group or SQL Saturday presentations.

These days:

  • The events you wanted to attend are canceled
  • Your company banned travel (either for health reasons, or budgetary ones, or both)
  • Your family isn’t too keen on you leaving, either
  • It feels like your projects are all in a holding pattern

So, what’s the rational thing to do? Start learning online instead, and I’m gonna help by giving away another class for a couple of weeks. I’ve already shared my Fundamentals of Index Tuning class for free, and now that the quarantines are amping up, let’s spend another week learning. Follow along now, though, because on April 1, these posts are going to disappear. If big countries are still quarantined through April, I’ll give away other fundamentals classes then. Let’s get started!

To follow along with the demos, download a Stack Overflow database. I’ll be using the medium-sized 50GB StackOverflow2013 database, but if you’ve got a different one handy (either the smaller 10GB or larger 300+GB one), that’s fine too – the same concepts will still apply. If you’re on a slow desktop/laptop, feel free to use the small 10GB StackOverflow2010 database.

Attach it to any supported version of SQL Server – Standard, Enterprise, or Developer Edition all work. (Sorry, SQL Server 2008 and Express Edition won’t cut it here.) If you want a free development or evaluation version of SQL Server, hit up SQLServerUpdates.com.

Introduction Lecture (10 minutes)

I introduce the goals of the class and show students how to use Slack for live Q&A during the class. You can skip this if you want:

Now for the training.

First Lecture (35m)

When you execute a query, SQL Server has to decide things like:

  • Which table to process first
  • Which index to use on that table
  • How to access that index (seek vs scan)
  • How many rows it will find that match your search predicates

And the more tables you have in the query, the more complex the decisions become. We’ll start with a simple two-table join, with filters on both tables, and consider the decisions that SQL Server had to make along the way.
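The real demo query is in the video and demo script; the shape is roughly this – a sketch with made-up filter values, not the exact class query:

SELECT u.DisplayName, c.Text, c.Score
FROM dbo.Users u
JOIN dbo.Comments c ON c.UserId = u.Id
WHERE u.Location = N'Iceland'   -- filter on one table
  AND c.Score > 0;              -- filter on the other
-- Which table goes first? Which index on each? Seek or scan? That's the lecture.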

Later this week, I’ll be covering how to find the right queries to tune, how parameters influence cached plans, improving cardinality estimation accuracy, and common T-SQL anti-patterns.

If you like this, you can keep learning this year with:

Folks who wanna take ’em all can get my Live Class Season Pass (which is on sale this month to help folks who suddenly can’t attend in-person conferences this year.)

I’ve got a new option, too: recordings of the Mastering classes.

A lot of folks told me they wanted to attend the Mastering classes, but they couldn’t justify the Live Class Season Pass, even when it was marked down to $995. For the longest time, I didn’t wanna go any lower than that. I feel like $995 is a worthwhile investment into your career, and it’s a fair trade for the knowledge you gain in these courses.

The virus changed all that.

I needed an even less expensive way to let you get started learning right now, especially those of you stuck at home while your companies get their act together.

So last week, I took a day off from vacation (hahaha) and put together a bundle of the Masters Class Recordings. Use coupon code ReallyBadWeek for another 10% off that price, and I’ll donate 10% of the sales to Doctors Without Borders.

Let’s make something good out of this terrible mess. And happy learning this week in the rest of the FQT series!

Demo Script

“Working” From Home? Watch A Bunch of Free SQL Server Videos on YouTube.

You’re new at this working from home thing, and you’re trying to put Game of Thrones up on a second monitor while you work? Listen, that’s not gonna play out well. You’re gonna get distracted and you’re not gonna get any work done. Instead, if you’re gonna have something in the background, learn about SQL Server.

Brent Ozar’s Conference and User Group Sessions – I’ve done a ton of sessions for various user groups and conferences, and they each keep ’em in their own YouTube channels, making them a little hard to track down. To make your life easier, I put together a single YouTube playlist with ’em all.

Doug Lane’s 11-Part Series on Statistics – Doug originally filmed these for us as a training class several years ago. They’ve been available free on BrentOzar.com for a while, but I just realized I’d never uploaded them to YouTube, so I fixed that. (That’s what started this whole blog post.)

Doug Lane’s 11-Part T-SQL Level-Up – take your T-SQL from zero to hero – literally – as Doug teaches you heroic skills. The trailer tells you all you need to know about how the class will go down:


Happy learnin’!

How to Track Performance of Queries That Use RECOMPILE Hints

Say we have a stored procedure that has two queries in it – the second query uses a recompile hint, and you might recognize it from my parameter sniffing session:
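The embedded script has the real thing; if you’re reading along without it, here’s a minimal reconstruction (the proc name is my stand-in):

CREATE OR ALTER PROC dbo.usp_UsersByReputation @Reputation int AS
BEGIN
    /* Query 1: plan gets cached and reused */
    SELECT COUNT(*) FROM dbo.Users WHERE Reputation = @Reputation;

    /* Query 2: compiles a fresh plan on every execution */
    SELECT *
    FROM dbo.Users
    WHERE Reputation = @Reputation
    OPTION (RECOMPILE);
END
GO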

The first query will always get the same plan, but the second query will get different plans and return different numbers of rows depending on which reputation we pass in.

I’m going to do some setup, free the plan cache, and then run the proc:
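Something along these lines – and don’t go freeing the plan cache on a production box:

DBCC FREEPROCCACHE;   -- start with an empty plan cache (demo servers only!)
GO
EXEC dbo.usp_UsersByReputation @Reputation = 2;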

When I run it for @Reputation = 2, the second query returns 9,149 rows.

How a recompiled query shows up in the plan cache

If I check sys.dm_exec_query_stats right now, there are a few columns with interesting results – I’m going to move them around a little so you can see ’em:
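A sketch of that query, filtered down to the stand-in proc from above:

SELECT t.[text], s.statement_start_offset,
       s.execution_count, s.total_rows, s.min_rows, s.max_rows
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
WHERE t.[text] LIKE N'%usp_UsersByReputation%';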

Line 1 is the first query in the stored procedure. It’s a COUNT(*), and it only returns 1 row.

Line 2 is the second query, and it returns 9,149 rows for reputation = 2.

So right now, if I’m inclined to do math, I can add up the total number of rows for both statements to get the total number of rows returned by the proc. It might seem odd to use rows as a measurement, but sys.dm_exec_query_stats’s columns – CPU, reads, writes, duration, etc. – all behave the same way as I’m about to show you here, and the row numbers are more repeatable than some of the others, so let’s use rows.

If I execute the stored procedure twice – for two different parameters – and then check metrics, things look different:
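Still using the stand-in proc from above:

EXEC dbo.usp_UsersByReputation @Reputation = 2;
EXEC dbo.usp_UsersByReputation @Reputation = 3;
-- then re-run the sys.dm_exec_query_stats query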

The results:

The first query’s plan stuck around in memory, so it now shows 2 executions, and 2 total rows returned. Its row metrics are correct through the life of the stored procedure’s time in cache.

However, the second query – the one with the recompile hint – has a brand new plan in the cache, and brand new metrics to go with it. You’re not just recompiling the execution plan: every execution also throws away the prior metrics, so the plan cache only ever shows the last execution’s numbers. (That’s fine, and that part I was also kinda aware of.)

But the part that I keep forgetting is that when I’m looking at the stored procedure’s totals in sp_BlitzCache, the total, min, and max values are useless:

In sp_BlitzCache, we add up the totals for each statement in a proc, and we present those as the total numbers for the proc. The problem is that they’re just not true whenever there’s a recompile hint: the totals are underreported, and the avg/min/max just reflect the last execution of any query with a recompile hint.

The plan cache just isn’t a good fit for tracking queries with RECOMPILE hints – but Query Store is.

How a recompiled query shows up in Query Store

I’m going to turn on Query Store in the Stack Overflow database, and then run the two queries again:
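Roughly like this, assuming the database is named StackOverflow:

ALTER DATABASE StackOverflow SET QUERY_STORE = ON;
ALTER DATABASE StackOverflow SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO
EXEC dbo.usp_UsersByReputation @Reputation = 2;
EXEC dbo.usp_UsersByReputation @Reputation = 3;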

And then query Query Store:
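Here’s one way to slice it – a sketch, not the only way to write this:

SELECT qt.query_sql_text,
       SUM(rs.count_executions)                   AS executions,
       SUM(rs.count_executions * rs.avg_rowcount) AS total_rows
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q          ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p           ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE N'%Reputation%'
GROUP BY qt.query_sql_text;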

I get a much better picture of the number of times that the recompiled queries have run, AND the number of rows they’ve returned each time. (And of course you get other much more useful query tuning metrics too, like CPU, reads, duration, etc.)

When you decide to use RECOMPILE hints,
you probably wanna enable Query Store.

Query Store first shipped in SQL Server 2016, and about half of all production servers are 2016+. Today, in 2020, when your query tuning efforts require you to put in RECOMPILE hints, you should probably stop for a moment to consider Query Store too. It’ll make your performance troubleshooting easier down the line.

Query Store does require a little more planning than just putting in a RECOMPILE hint, though. Here are the things you wanna think through:

Are these things more work? Yep, absolutely, and the combination of RECOMPILE hints and enabling Query Store logging is going to have a performance impact on your server. You don’t wanna flip these switches unless you’re convinced it’s the best way to solve a parameter sniffing problem. There are other solutions, too – as a quick refresher, if you’re not familiar with any of the other ways, check out my video from SQLDay Poland:

How COVID-19 Affects Conferences

SQLBits just postponed their event, and I’m stepping out of SQL Saturday Iceland as well.

I’ve had the unusual position of being a conference attendee, speaker, and sponsor, all at different times of my life, so I wanted to do a quick brain dump explaining some of the gotchas involved behind the scenes.

Canceling a conference can financially ruin the organizers. The organizers spend a ton of money leading up to the conference – doing marketing, paying staff salaries, paying deposits (event space, food, hotels, printers.) Some of this stuff may be refundable, but … a lot of it isn’t, like staff salaries and supplies that are already purchased. Organizers can buy insurance, but to save money, they often don’t buy insurance to cover this kind of thing, especially diseases. As a result, when they have to cancel an event, they can be financially destroyed instantly. The SXSW festival just canceled its event, laid off 1/3 of its staff, and is considering bankruptcy.

Who cancels the conference can make a big difference. If the government cancels all events, then it can be easier to get money back for hotels, flights, and event spaces, and to make insurance claims. As a result, organizers can be tempted to play a game of chicken with the government, trying to see who cancels first.

People are getting infected at conferences. For example, two people at RSA’s conference got infected, and it just kinda makes sense: people are flying in from all over the world, spending time in close proximity with strangers in airplanes, eating food from buffets, and of course, shaking hands with each other because it’s just hard to break that habit. (I’m still laughing at the Dutch prime minister who, after announcing a no-shaking-hands rule, promptly shook hands to close the announcement.)

Speakers and volunteers have to deal with more people. People come up afterwards and ask questions in close proximity; they want to shake hands, take selfies, hug, you name it. We’re at higher risk for infection, plus we’re especially dangerous if we’re the infected ones, because we’d spread it rapidly to other people.

(Personally, I have asthma, which means that if somebody infects me, I’ve got a much harder time fighting off the infection.)

Attendees are sensitive to the situation, too. They’re often packed in elbow-to-elbow with complete strangers in varying degrees of health, all breathing on each other for hours on end. Once an attendee starts sneezing and coughing, other attendees will start to feel uncomfortable, leading to awkward situations. For example, plane passengers became disruptive when an attendee had an allergy attack, and another plane ran into even stranger issues.

Sooner or later, conference attendees will ask organizers to remove someone of questionable health. For that to work, we all need to be on the same page about what’s accepted behavior at events, and attendees need to be told long ahead of time that they shouldn’t show up if they have symptoms that even look related to COVID-19. Sure, I get it – you don’t think you’re infected – but that doesn’t mean other people are going to be comfortable with you coughing into your elbow every five minutes, and wiping your mouth on your shirt. Conference organizers likely aren’t going to be sanitizing chairs and tables between sessions.

Organizers are already stretched to their limits. Leading up to an event, the organizers and volunteers do a heroic amount of work just dealing with regular conference issues. The Centers for Disease Control has a prep document for event organizers, but just looking at the tasks in that list, I can tell that it’d take multiple full-time bodies to check off all those tasks – and events often just don’t have the spare staff available.

Vendor staff don’t want to get infected. Companies make decisions to sponsor or attend a conference, and then they send their sales & marketing teams to the conference as well. Those employees may not have had much of a choice about whether they wanted to attend – they may not have the passionate feelings that you have about attending a conference to learn, because they’re just there to do sales and network. Their families ask tough questions about, “Why exactly is your company sending you to this event? Can’t you sell things from home?”

Everyone’s less likely to attend events right now. Companies are setting no-travel policies to protect their staff, which means the conference has fewer attendees, speakers, and sponsor staff available to attend.

When you add all this up, it’s a dark time for conferences: they have fewer attendees & less revenue, but they have higher expenses to put on the event (because there are more health concerns to tackle, and all this costs money.) I don’t have easy answers – but as a speaker with asthma, I’m keenly interested in how events protect me and the other attendees. We’ve already got one person in the community being quarantined – we don’t need more. (Send your good vibes & prayers to Tim Radney, @TRadney.)

Footnote: this post might have even more typos & errors than usual because of its hurried nature. I’m scribbling it down quickly before Erika wakes up. We’re on vacation this month, driving around Iceland’s Ring Road, posting stuff on TikTok, Instagram, Facebook, and Twitter wherever you wanna follow along, and I’m trying not to do “work” work – but I wanted to get this out while I was thinking about it.

I know, it’s an odd time to go on a traveling vacation! We’ve been keeping a close eye on the news, washing our hands a lot, and frankly, avoiding other people, hahaha. The good news is that Iceland’s a great country for this kind of thing – plenty of beautiful wide open spaces for sightseeing – and the tourist places are even emptier than their usual winter season.

Things to Consider When SQL Server Asks for an Index

One of the things I love about SQL Server is that during query plan compilation, it takes a moment to consider whether an index would help the query you’re running. Regular blog readers will know that I make a lot of jokes about the quality of these recommendations – they’re often incredibly bad – but even bad suggestions can be useful if you examine ’em more closely.

In SQL ConstantCare®, we analyze SQL Server’s index recommendations and come up with our own suggestions. Here are some of the things we think about as we’re doing the analysis:

  • Did SQL Server ask for the same index yesterday?
  • Would the index have been used at least 100 times per day?
  • Does the table involved already have a clustered index? If not, we should probably think about adding one of those first.
  • Does the table have 5 or fewer indexes? If it has more, it might be time to think about deduping those with my D.E.A.T.H. Method first.
  • Does the table have <100M rows and under 10GB in size? If so, the indexes are easier to pop into place. At higher sizes, you want to approach the tables with a little more care and planning since even adding an index can require a maintenance window depending on the speed of your hardware.
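You can eyeball the raw requests yourself with the missing index DMVs – here’s a rough sketch for ranking them by how often they’d have been used, not the actual SQL ConstantCare® logic:

SELECT d.[statement] AS table_name,
       d.equality_columns, d.inequality_columns, d.included_columns,
       s.user_seeks + s.user_scans AS times_it_would_have_been_used,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g      ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks + s.user_scans DESC;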

Once we’re past those initial hurdles, we think about the index recommendation itself:

  • Does it have only one key and one included column? If so, it probably makes sense to promote the include into the key, as long as the first key makes sense too.
  • What are the datatypes on the columns? If they’re big (say, over 200 characters), consider removing ’em, especially if they’re included columns.
  • How many columns are in the index? If it’s more than 5 elements, consider removing the included columns.
  • Is the recommendation a narrower subset of an existing index? If so, it’s probably not a good fit.
  • Is the recommendation a wider version of an existing index? If so, you may want to remove the narrower (old) version after you replace it with the wider new version.
  • After your tweaks based on the guidelines above, does the index exactly match an existing index on the table? If so, set it aside – it’s time to review the high-read queries instead rather than looking at the missing index recommendations.

Whew. It’s a lot of stuff to think about – but with SQL ConstantCare®, you don’t have to. We’re gathering the metadata daily, watching what SQL Server does, gradually rolling out index recommendations to help, and then circling back to the index usage DMVs to make sure that the index actually helped.

Making TRY_CAST and TRY_CONVERT Queries Faster with Indexed Computed Columns

I know this is gonna sound crazy, but let’s say you had a table where people stored all kinds of things in one column: dates, integers, file names, sale prices, you name it. And let’s say your application frequently ran a query looking for dates in that column, like this:
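Roughly like this – my reconstruction of the query shape, using the Stack Overflow Users table:

SELECT *
FROM dbo.Users
WHERE TRY_CAST(DisplayName AS date) IS NOT NULL;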

Even if you create an index on DisplayName, SQL Server ignores it because it believes so many rows are going to match, and it doesn’t want to do all the back & forth key lookups between the DisplayName index and the clustered index (to get SELECT *):

Yes, believe it or not, there are actually users in the Stack Overflow database whose names can be converted to dates:

So anyhoo, I’m not allowed to change the query, but I need to make it go faster. I have 3 problems:

  1. The estimates are wrong, so
  2. SQL Server refuses to use the DisplayName index (due to an artificially high number of key lookups)
  3. SQL Server can’t seek specifically to the rows that match (it has to do the cast on every row)

I can fix problems #1 and #2 with a computed column, like I explain in the Artisanal Indexes module of my Mastering Index Tuning class:
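The trick, sketched out (the computed column name is my stand-in):

ALTER TABLE dbo.Users
    ADD DisplayNameAsDate AS TRY_CAST(DisplayName AS date);
-- No need to change the query: SQL Server matches the expression to the
-- computed column and uses the column's statistics for better estimates.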

Presto, the estimates are more accurate, and SQL Server is now doing fewer logical reads (because it’s scanning the DisplayName index rather than the entire clustered index, which includes all the columns.) Now, to solve problem #3, I would usually add a nonclustered index on our newly created column, but…
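Here’s the attempt, using the same stand-in column name:

CREATE INDEX IX_DisplayNameAsDate ON dbo.Users (DisplayNameAsDate);
-- Fails: SQL Server refuses because it considers TRY_CAST(... AS date)
-- non-deterministic, and non-deterministic computed columns can't be indexed.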

You might think, “Wait, how can a cast to a date be non-deterministic? Does it have something to do with the fact that it might sometimes return null?”

No, it’s because your session’s date format can cause the same string to return different dates. Here, I’m using the Books Online example for SET DATEFORMAT to set my default date formats to different styles, and then when I do a TRY_CAST on the same value, I get different dates:
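A simplified version of the demo (not the exact Books Online values, but the same effect):

SET DATEFORMAT mdy;
SELECT TRY_CAST('01/02/2020' AS date);  -- January 2, 2020

SET DATEFORMAT dmy;
SELECT TRY_CAST('01/02/2020' AS date);  -- February 1, 2020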

Diabolical. So I can’t use an index on a computed column to make this thing crazy fast, and I’ll just have to settle for an index scan.

But TRY_CONVERT *does* work,
as long as you specify a format.

If I try this same trick with TRY_CONVERT and I specify the exact date format I want – obviously ISO 8601, as any XKCD fan will tell you:
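A sketch – style 126 is the ISO 8601 format, and the column and index names are my stand-ins:

ALTER TABLE dbo.Users
    ADD DisplayNameAsDateISO AS TRY_CONVERT(date, DisplayName, 126);

CREATE INDEX IX_DisplayNameAsDateISO ON dbo.Users (DisplayNameAsDateISO);
-- Succeeds: with an explicit style, the conversion is deterministic.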

SQL Server is willing to create a nonclustered index on that, AND it’ll automatically do an index seek on it rather than scanning & computing the whole thing. That’s awesome! However, if I try that same trick without a specific format, I’m back to being unable to create the index on it:
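Same sketch, minus the style parameter:

ALTER TABLE dbo.Users
    ADD DisplayNameAsDate2 AS TRY_CONVERT(date, DisplayName);

CREATE INDEX IX_DisplayNameAsDate2 ON dbo.Users (DisplayNameAsDate2);
-- Fails again: without the style parameter, determinism goes out the window.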

Dang. And don’t even get me started on filtered indexes with that stuff – that’s a whole ‘nother ball of wax. For that kind of thing, hit the Artisanal Indexes module.

How to Find Cool Questions and Answers on DBA.StackExchange.com

DBA.StackExchange.com is a Stack Overflow site that’s specifically targeted at database administrators. There tend to be some pretty interesting questions in there – not just for SQL Server, but all kinds of databases – but at the same time, there’s also a lot of noise.

To help find fun questions, I use The Power of SQL™. Data.StackExchange.com is a web-based front end, kinda like SSMS, that lets you run queries against a recently-restored copy of the Stack Overflow databases. It’s different from the downloadable Stack Overflow database because it’s updated much closer to real time, and you don’t have to hassle with downloading it each time it’s updated.

Here are some of my favorite queries:

Recent Unanswered Questions Sorted By Views – I find views to be a good way to gauge questions: if other people find it interesting enough to click on, then I’m probably gonna feel that way too. There are parameters to let you choose how far back you want to look (say 9999 for tons of history if you wanna unearth an artifact), and a parameter to ignore specific users (like my own questions, or people whose questions I don’t really find that interesting.) Columns include:

  • Score – how popular the question is
  • AnswerCount – how many answers have already been posted (but not accepted as correct)
  • TopAnswerScore – because sometimes the community really likes an answer, but the asker hasn’t bothered to accept it yet
  • BodyLength – because I’m often interested in questions where the asker put in a lot of effort. Similarly, I’m interested in long answers, which brings me to…
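To give you the flavor, here’s a rough sketch of what a query like that looks like in Data Explorer – not the exact saved query, and the ##parameter## placeholder is Data Explorer’s syntax for user-supplied values:

SELECT TOP 100
       q.Id, q.Title, q.ViewCount, q.Score, q.AnswerCount,
       (SELECT MAX(a.Score) FROM Posts a
        WHERE a.ParentId = q.Id AND a.PostTypeId = 2) AS TopAnswerScore,
       LEN(q.Body) AS BodyLength
FROM Posts q
WHERE q.PostTypeId = 1               -- questions only
  AND q.AcceptedAnswerId IS NULL     -- nothing accepted yet
  AND q.CreationDate >= DATEADD(DAY, -##DaysBack:int##, GETDATE())
ORDER BY q.ViewCount DESC;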

Recent Long Answers – A long answer isn’t necessarily good, so I also include the score in the output column. However, I’m usually curious about what drove someone to type tens of thousands of characters into the answer box. I usually don’t even care what database it’s about – when someone writes that much, I want to at least glance at it to see what’s going on. Here are a few recent examples:

Most Interesting Recent Comments – because sometimes the real fun is in the peanut gallery, like when Andy Mallon says he once saw a table named dbo.tblTable.

You can use all of these queries on any Stack Overflow network site. When you’re looking at the query, just look in the parameter section and start typing in the “Switch sites” box.

Enjoy!

SQL ConstantCare® Population Report: Winter 2020

Ever wonder what’s “normal” out there for SQL Server adoption rates, hardware sizes, or numbers of databases? Let’s find out.

Our SQL ConstantCare® service lets users send data in about their SQL Servers and get health & performance advice via email. Users can opt in to share their anonymized data to get better recommendations overall.

This isn’t a full picture of the SQL Server ecosystem. Lemme be really clear: this is by no means a random sample of the full population. It’s just a picture of the kinds of servers where people want an inexpensive second opinion. It doesn’t include giant corporations, hospitals with PII, or tiny businesses who just shove a SQL Server in the corner and then never take a second glance at it (and if my experience is any indication, never back it up.)

Even though it isn’t a full picture, with 3,570 database servers sharing data as of March 1, there’s still a lot of useful information in here for the community. Understanding what servers are out there will help you put context around where your own servers rank, and helps the community better build training material that’s relevant for most users.

This population report is just my attempt to help clear up the fog of war. Microsoft simply doesn’t give us any adoption information (even though SQL Server phones home by default.) If you believed what you saw onstage at conferences and in Microsoft’s changing certification programs, you’d think no one still bothers with self-hosted SQL Server, but the few who do have already migrated to SQL Server 2017 and 2019. This report is my humble attempt to keep Microsoft honest.

Let’s start with a simple question:

What versions are the most popular?

Or if you’re into percentages:

Thoughts on reading those charts:

  • The adoption rates haven’t changed much since the SQL ConstantCare Fall 2019 Population Report. SQL Server 2019 is almost at 1% adoption, with 25 intrepid users running it in production. (About 1/3 are on 2019 RTM, 1/3 CU1, and 1/3 CU2.)
  • SQL Server 2008 still has more adoption than Azure SQL DB and Azure Managed Instances combined.
  • If you purely compared SQL Server 2019 vs Azure, you’d think that Azure is the future because it’s outpacing SQL Server 2019 adoption. I just don’t see that as true because the vast majority of shops I know on SQL Server 2012/2014/2016 aren’t making adoption plans for Azure SQL DB or Managed Instances yet. Instead, they’re either comfortable where they are, or they’re migrating to VMs in the cloud. (I do think the default new database platform will be PaaS several years from now, but not for migrations of existing databases. We’re not there yet.)
  • SQL Server 2016 & 2017 market share has gone up a little, with these two capturing 52% of the market. 2014 is still outpacing 2017 though (21% vs 18%.)
  • About 1 in 10 SQL Servers are no longer under support. (The real number is slightly higher since we don’t monitor 2000 or 2005.)

I’m going to continue publishing the numbers in this format (only showing the current population) for the first year of the survey, but on the first year’s anniversary, I’ll switch to graphs that show trending and year-over-year changes.

How about development? Are people maybe testing their apps on newer versions?

What versions are people using in development?

Both 2017 and 2019 got nice bumps here over last quarter – people are starting to develop and test more against 2017 and 2019. However, 2019’s low showing here is a pretty good leading indicator that SQL Server 2019 has a tough road ahead if it’s going to crack a 10% adoption rate in the next couple/few quarters.

I think about this chart a lot when I’m designing training. When is the right time to start building developer training material for SQL Server 2019? It’s a tricky timing act: you need to be skilled-up before you adopt the new version, and I need to have the training material ready to go before then. However, if I focus on building 2019-specific training material right now, it could be 6-12-18 months before it really starts to pay off. Judging by this chart, I’d be better off building 2017-specific training material since there are a lot of people who still haven’t adopted it yet (roughly 80%.)

For the rest of the census report, I’m going to focus on just the production servers.

How much data do servers have?

I think this chart really helps to set the rest of them in perspective:

Data sizes have actually grown a little since the last report. Y’all aren’t doing archiving, ha ha ho ho. There’s been a slight shift from the 0-25GB tier up to the 25-125GB tier. Not huge, though.

A lot of the SQL ConstantCare® population have what I would consider to be a fairly small server: 30% of them are hosting <25 GB data, and 69% are smaller than the Stack Overflow demo database. That’s why I like using that database so much for training: it’s large enough that it represents (and challenges) a good chunk of the audience.

About 15% have 1TB of data or more, but even though that number sounds low, the population is still large. That still gives us about 500 servers’ worth of metadata to help guide folks who are in those larger tiers. There are over a dozen instances in the population with over 10TB – good times there.

Is Enterprise Edition normal for larger databases?

Forgive the awkward visualization – I set myself a 2-hour goal to do this analysis and crank out the blog post, so I’m on a bit of a tight timeline – but I think the numbers are pretty neat:

At 0-125GB of hosted data, Standard Edition is twice as common as Enterprise Edition.

Around 200-750GB, the numbers even out.

Up in the 1TB+ tier, Enterprise Edition is twice as common as Standard Edition. (But the fact that folks are even using Standard Edition at all in this tier might surprise some of you in the audience.)

How much hardware do production servers get?

This really helps to reset folks’ expectations around what’s normal for SQL Servers out there. There’s a huge, huge number of production SQL Servers that just don’t need a lot of CPU power. About 70% have 1-8 cores, while only 10% of servers have more than 24 CPU cores.

This seems low at first: about 60% have less memory than my laptop. However, remember that almost 50% of the servers out there have 4 CPU cores (or fewer), so let’s try excluding them:

That makes sense: far fewer of the 5+ core servers are using 0-32GB RAM. Good to see folks investing in the resource that has an amazing impact on performance, especially given the SQL Server licensing costs involved.

How many databases are hosted per server?

In the last survey’s comments, sonalpathak28 asked about this number out of curiosity from an application perspective and a DBA maintenance one: how many databases are folks hosting per server?

And because I know you’ll be curious, over a dozen servers have over 1,000 databases on ’em. In every case, it’s a lot of small databases (under 2TB total data size per server.)

What questions would you ask the data?

If there’s something you’d like to see analyzed, leave a note in the comments. To make it more likely that I’ll actually run your experiment, include why you want to know it, and even better, your guess as to what the data’s going to say. (That’ll make it a more fun game to play, heh.)

Breaking News: SQL Server 2019 CU2 Breaks Agent.


That’s it, I’m rolling back to SQL Server 6.5

You reported it here in the comments, and over at SQLServerUpdates, and in this DBA.StackExchange.com question.

The official word is in: yep, SQL Server 2019 Cumulative Update 2 breaks Agent jobs on some servers, causing them not to run.

That official word is only in that linked post, though: there’s still no word on the official CU2 page about the bug, nor does the CU2 bug fix list even mention SQL Server Agent, so we were all pretty surprised that Agent would break when it supposedly wasn’t even changed in this CU.

For now, the official word is to uninstall CU2 if you’re experiencing the issue.

Free Fundamentals of Index Tuning Week: Part 5, Recap

Whew. You’ve learned a lot. (Well, I’ve tried to teach you a lot, anyway.) Here’s what we covered this week:

  1. Indexing for the WHERE Clause
  2. Indexing for ORDER BY (and layering in filters as well)
  3. Indexing for JOINs
  4. SQL Server’s built-in index recommendations (Clippy)

In today’s lecture, I’ll recap your most important lessons, and then give you your real-life lab: running sp_BlitzIndex in your production SQL Server and applying what you learned in class. I’ll map out your next steps, either Mastering Index Tuning or Fundamentals of Query Tuning. I’ll explain what you’ll learn in each of those classes.

If you like this, you can keep the learning going with:

Folks who wanna take ’em all can get my Live Class Season Pass or a bundle of the Masters Class Recordings.

Free Fundamentals of Index Tuning Week: Part 4, Clippy’s Index Recommendations

Hey buddy!

Monday, we covered Part 1: Indexing for the WHERE Clause, then we covered ORDER BY, and yesterday we did indexing for joins. Today, let’s cover SQL Server’s built-in index recommendations, aka Clippy. Make sure to keep up – on April 1, these posts will magically disappear.

Lecture (33 min)

As you’ve been working through the labs, you’ve surely caught Clippy’s missing index hints in some of your execution plans. You might have even looked at some of them and thought, “Well, that last one made sense, but this one here – this seems totally different than what Brent is teaching me to make. Who’s right, Brent or Clippy?”

In this module, you’ll learn the rules that the SQL Server engine uses when building missing index recommendations. You’ll understand where he’s coming from, but then also how to take his recommendations and adapt them to work better for real-life queries.


Lecture Demo Scripts

If you want to follow along with me during the class, this will make it a little easier. I often go off-script and tweak demos based on attendee questions – this is just my starting point:

Bonus Demo

Want to see how Clippy builds index recommendations? You can prove it:
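The demo script walks through it; the short version, as a sketch, is to run a query Clippy can’t satisfy with existing indexes, then peek at the DMVs he writes to:

SELECT DisplayName, Reputation
FROM dbo.Users
WHERE Location = N'Iceland' AND Reputation > 100;
GO

SELECT equality_columns, inequality_columns, included_columns
FROM sys.dm_db_missing_index_details
WHERE [statement] LIKE N'%Users%';
-- These DMVs are the same machinery behind the green hints in execution plans.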

Hands-on lab: now it’s your turn.

Now, it’s your turn: take the below queries and run them all at once without looking at ’em. Then, use sp_BlitzIndex to read the missing index recommendations, interpret them, and try to craft better indexes WITHOUT LOOKING AT THE QUERIES. Then, after you’ve made the list of indexes you want to create, go through the queries and try to guess which query triggered which missing index request – and whether your index is a good fit.

This first demo script runs a bunch of different queries against the Stack Overflow database in order to populate your missing index DMVs:

Run that, and it’ll run a bunch of queries 25 times. It’s okay if you start working with the next demo script even while that last one is still running – it’s just going to keep populating more of the same missing index requests.

Now start with:

Checking your answers against Brent’s (10m)

Tomorrow we’ll finish up with a recap.

If you like this, you can keep the learning going with:

Folks who wanna take ’em all can get my Live Class Season Pass or a bundle of the Masters Class Recordings.

Free Fundamentals of Index Tuning Week: Part 3, Indexing for Joins

Monday, we covered Part 1: Indexing for the WHERE Clause, and yesterday we covered ORDER BY. Today, let’s tackle joins. Make sure to keep up – on April 1, these posts will magically disappear, no foolin’.

Lecture (31 min)

When you join multiple tables together like this, and you’ve got multiple indexes, how does SQL Server choose which index to use? Would it use an index on Comments.UserId, or one on Comments.Score? Or will it use both, since it needs to both filter and sort?
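A sketch of the kind of query in question – made-up filter, not the exact class demo:

SELECT c.Text, c.Score
FROM dbo.Users u
JOIN dbo.Comments c ON c.UserId = u.Id
WHERE u.DisplayName = N'Brent Ozar'
ORDER BY c.Score DESC;
-- Candidate indexes: Comments(UserId) supports the join filter,
-- Comments(Score) supports the sort - selectivity drives the choice.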

We’ll show how selectivity influences the engine’s choice of indexes (just like it did for WHERE and ORDER BY), and why you might start out indexing foreign keys, but then have to discard those along the way.

Lecture Demo Scripts

If you want to follow along with me during the class, this will make it a little easier. I often go off-script and tweak demos based on attendee questions – this is just my starting point: