Free Webcast: Planning a SQL Server to Azure Migration in 2021

Microsoft Azure
9 Comments

You manage database servers for your company, and they’re thinking about migrating everything to Microsoft Azure. You’ve searched the web for information and advice, but everything you’re finding is outdated.

You just want a list of things to review on your servers, an overview of what won’t work in the cloud, and how to pick alternatives or workarounds.

In this one-hour session on Thursday, March 18, we’ll cover pure technical details, zero marketing, and get you up to speed quickly with none of the fluff.

Register now for the free webcast. If you can’t be there live, no worries – links to the recordings will be sent to the registrants. And if you’ve got questions you’d like to see me touch on during the webcast, you can get a jump on the Q&A by asking ’em below in the comments.


New SQL ConstantCare® Feature: One Serious Query to Tune

SQL ConstantCare

Our monitoring tool, SQL ConstantCare®, is different.

Some monitoring tools send you a whole bunch of emails for every little thing that’s happening on every one of your servers – and you end up just creating an Outlook rule to dump them all into a folder that you never read. PAGE LIFE EXPECTANCY IS LOW! DISK QUEUE LENGTH IS HIGH! It all just turns into noise.

SQL ConstantCare® is minimal: just one email per day. We tell you when there’s something seriously wrong that demands attention.

So in the interest of minimalism, we’ve got a new feature: One Serious Query to Tune. When >50% of your SQL Server’s workload is caused by just one query, we tell you about it. We know you’re too busy to tune every query or index that could help performance, but in cases like this, we call out that there’s a serious issue that will probably be worth your time to fix.

You might be thinking, “Brent, how often does that really happen?”

Well, last week alone, it happened for 328 of our users!

Sign up now and get a free 14-day trial. You can cancel anytime without charge before your trial ends. What will we discover together about your servers?


Updated First Responder Kit and Consultant Toolkit for February 2021

First Responder Kit Updates
0 Comments

Along with the usual bug fixes & improvements, this month’s release includes a brand new sp_BlitzAnalysis script from Adrian Buckman. It’s for folks who have an Agent job that logs sp_BlitzFirst to tables every 15 minutes, saving wait stats, Perfmon counters, file stats, etc. sp_BlitzAnalysis reads those tables and sums up activity in your chosen time period, like this:

To install it, run the sp_BlitzAnalysis.sql file included in the First Responder Kit. It’s not a complex installation or anything – I just didn’t want to include it in the main installer scripts because this is the first public release, and I haven’t tested it myself yet either. (Been a little busy with the whole Iceland move.) To learn how to use sp_BlitzAnalysis, read the documentation. For questions or support on it, hop into the #FirstResponderKit Slack channel. (If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.)

When you find a bug or want something changed, read the contributing.md file.

How I Use the First Responder Kit
Watch and learn

To get the new version:

Consultant Toolkit Changes

I updated it to this month’s First Responder Kit, but no changes to querymanifest.json or the spreadsheet. If you’ve customized those, no changes are necessary this month: just copy your spreadsheet and querymanifest.json into the new release’s folder.

sp_Blitz Changes

  • Fix: the check for the most recent CHECKDB dates was running even when @CheckUserDatabaseObjects = 0. This was the only instance where we were using sp_MSforeachdb with @CheckUserDatabaseObjects = 0, so it’d slow sp_Blitz down on instances with hundreds or thousands of databases. (#2779)
  • Fix: put comments around the term xp_cmdshell to fool a firewall. (#2788, thanks TLovie.)
  • Fix: changed typo of BrenOzar.com to BrentOzar.com on one check. (#2786, thanks Michael Gowan.)

sp_BlitzAnalysis Changes

  • All new! This is the first release that includes sp_BlitzAnalysis by Adrian Buckman. If you want to install this script, install sp_BlitzAnalysis.sql – it’s not included in the normal installation scripts yet just to limit the blast radius in case something goes wrong.
  • For tips on how to use it, check out the documentation.
  • For questions or problems, see the “For Support” section of this doc.

sp_BlitzCache Changes

  • Improvement: default output now includes PlanGenerationNum from sys.dm_exec_query_stats. Helps troubleshoot scenarios where the plan is frequently recompiling due to statistics changes. (#2792, thanks Tom Lovie.)
  • Fix: a recent release broke @SortOrder = ‘recent compilations’, and that’s fixed. (#2772)

sp_BlitzFirst Changes

  • Improvement: if more than 20% of the queries are runnable, waiting on CPU power to make progress, we warn you as shown above. sp_BlitzFirst makes two passes through to check – once at the beginning of the sample defined with @Seconds (defaults to 5), and again at the end. (#2776, thanks Erik Darling.)
  • Fix: if you passed in @OutputTableRetentionDays because you wanted to store more or less history in the output tables, that change wasn’t being passed to sp_BlitzWho, which was still just storing 3 days. (#2758, thanks Emanuele Mezzo.)
  • Fix: the high compilations/sec and recompilations/sec warnings had way too high of thresholds. They were looking for 1,000 batch requests per sec, but you can see high CPU usage due to compilations with even just 10 big ugly plans per second getting generated. Lowered the threshold to either 10 compilations/sec, or more compilations than batch requests per second – which happens when a proc has multiple statements with recompile hints in them. (#2770)

sp_BlitzIndex Changes

  • Improvement: duplicate indexes are now sorted by row count descending so that the big bang-for-the-buck stuff comes first. (#2762, thanks Todd Chittenden.)
  • Improvement: Aggressive Indexes warnings are now sorted by total lock wait time descending. (#2768)
  • Fix: a few releases back, I added support for SQL Server 2019’s new missing index DMV that tells you which queries are triggering the request. I had some challenges getting it to work reliably, and Microsoft just now documented it so I’m removing support for it temporarily. It’ll be back. If you want to help with coding for it, check out #2185. (#2780)

sp_AllNightLog Changes

  • Fix: the RPO and RTO now honor overrides set in the config table. (#2775, thanks Alin Selicean.)

sp_ineachdb Changes

  • Improvement: Azure SQL DB compatibility in the sense that it’ll run on Azure SQL DB, but it can’t actually change databases since Azure SQL DB doesn’t allow that. I’m referring to Azure SQL DB “Classic Flavor” here, not Managed Instances. (#2790)

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.


How to Trace a Nested Stored Procedure Without Using Profiler

Extended Events
5 Comments

You’ve got a stored procedure that calls other stored procedures that build dynamic SQL that call other functions, and there’s so much calling going on that you would like to phone a friend.

Your query plans look like piles of spaghetti, and when you zoom in, you just see more tiny piles of tiny spaghetti.

Finding the slow parts is really easy thanks to sp_HumanEvents by Erik Darling (blog, @ErikDarlingData). Download it, install it in your master database, and you’ll be ready to go.

In one window, get ready to run the nasty stored procedure. Make a note of which session you’re in – it shows up at the bottom of your SSMS window like this:

See the “SQL2019\Brent (60)”? That number in parentheses (60) is my session ID.

In another window, fire up sp_HumanEvents:
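The call looks roughly like this – a sketch from memory of sp_HumanEvents’ options, where the duration threshold, session ID, and sample length are just example values (check the documentation for the exact parameter list):

EXEC master.dbo.sp_HumanEvents
    @event_type = 'query',        /* capture query activity                      */
    @query_duration_ms = 1000,    /* only show queries running 1 second or more  */
    @session_id = 60,             /* the session number from the other window    */
    @seconds_sample = 30;         /* watch for 30 seconds, then report           */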

The parameter names & values are pretty self-explanatory there. After 30 seconds, you’ll get a nice rundown of what queries ran, how long they took, their execution plans, and much more.

You could do this same thing with Profiler, but it’s more painful to set up, has more overhead, and doesn’t give you the nice, clean, filtered result set. You could also do this same thing by rolling your own Extended Events session, but seriously, who’s got time for that? Erik’s done all the hard work for you.

If you like this, you’ll love how easy it is to find queries using option recompile.


Changing Statistics Cause Longer Compilation Times

Execution Plans, Statistics
18 Comments

I need to be up front with you, dear reader, and tell you that you’re probably never going to need to know this.

I try to blog about stuff people need to know to get their job done – things that will be genuinely useful in your day-to-day performance tuning and management of SQL Server. This, however, is not one of those things. This is fun, obscure trivia, documenting a situation you’ll probably never run into, prompted by Forrest’s post proving that compilation time includes more than parse time.

Start with a hard-to-compile query plan.

I’ll use any Stack Overflow database and the hard-to-compile query plan from my post, Bad Idea Jeans: Building Big Query Plans, modified to join to the Users table:
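If you don’t have that post handy, here’s a minimal sketch of the same idea – an empty table variable wrapped in a CTE, self-joined and joined to dbo.Users so the optimizer has real work to do. The real script uses far more joins than this, so compilation takes much longer there:

SET STATISTICS TIME ON;

DECLARE @t TABLE (Id INT PRIMARY KEY);

WITH cte AS (SELECT Id FROM @t)
SELECT COUNT(*)
FROM cte c1
JOIN cte c2 ON c1.Id = c2.Id
JOIN cte c3 ON c1.Id = c3.Id
JOIN cte c4 ON c1.Id = c4.Id
JOIN cte c5 ON c1.Id = c5.Id
JOIN cte c6 ON c1.Id = c6.Id
JOIN cte c7 ON c1.Id = c7.Id
JOIN cte c8 ON c1.Id = c8.Id
JOIN dbo.Users u ON u.Id = c1.Id;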

Run the query, and check the Messages tab:

That complex query plan is hard to build, so it took some time:

The table variable doesn’t have any rows in it, though, so execution time is nearly instantaneous. The hard part here is clearly building the query plan.

But it gets much worse.
Try updating stats while it runs.

In one window, free the plan cache and start rebuilding the indexes on the Users table with online = on so the CTE query can execute:
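Roughly like this – a sketch, and note that ONLINE = ON requires Developer or Enterprise Edition:

DBCC FREEPROCCACHE;   /* dev/test server only, please */
GO
ALTER INDEX ALL ON dbo.Users REBUILD WITH (ONLINE = ON);
GO 10   /* SSMS repeats the batch, so stats keep changing while the CTE query compiles */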

Then in another window, run the CTE query again. Make a pot of coffee or hit the loo, because you’re gonna have some time. Several minutes later, both queries finish, and the output of the CTE’s statistics is pretty bizarre:

What’s happening? Every time the Users rebuild finishes, it’s also updating stats. The query compilation sees those updated stats, and…decides that the plan it just got done building is invalid, so it’d better go try to build another plan! Thus, the repeated messages about parse & compile time.

Yes, I actually hit this problem.

It sounds crazy, but to recreate a client issue, I was loading a 10 billion row table, which takes a while. As it was loading, its statistics were constantly changing, because as it grew, it was constantly hitting the automatic stats updates threshold.

As a result, simple queries with no where clause or order by could finish instantly, but add in even the simplest where clause, and the query would never finish. It was constantly recompiling as more data was loaded.

Shout out to Joe Obbish and Paul White who helped me troubleshoot this problem last year – all credit for solving the mystery goes to them:


What’s New & Undocumented in SQL Server 2019 Cumulative Update 9

SQL Server 2019
26 Comments

Last week’s SQL Server 2019 Cumulative Update 9 snuck in some undocumented things.

We got two new DMVs for Always On Availability Groups, sys.dm_hadr_cached_replica_states and sys.dm_hadr_cached_database_replica_states. They have a subset of data from other AG DMVs:
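If you want to poke at them yourself, a plain select does the trick – column contents will vary by environment:

SELECT * FROM sys.dm_hadr_cached_replica_states;
SELECT * FROM sys.dm_hadr_cached_database_replica_states;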

I sure wish I could point you to the documentation or KB article on those.

But we have no idea what changed in CU9 here, or why, or what to do about it.

I’m gonna be honest with you, dear reader: I’m starting to tell clients to choose SQL Server 2017 over SQL Server 2019. 2019 just doesn’t seem like it’s fully baked, and 2019’s Cumulative Updates have become too large of a risk: the release notes aren’t honest about what you’re deploying. That’s not fair to organizations that depend on SQL Server for mission-critical deployments: they need to know what changed so they can be on the lookout for new unusual behaviors.

If your own developers were changing production without telling you how or why, you’d call them into the office, close the door, and have a conversation.

In lighter news, there are also a few new messages in sys.messages:

  • 898: Buffer Pool scan took %I64d seconds: database ID %d, command ‘%ls’, operation ‘%ls’, scanned buffers %I64d, total iterated buffers %I64d, wait time %I64d ms. See ‘https://go.microsoft.com/fwlink/?linkid=2132602‘ for more information.
  • 7877: Backup on secondary replica is not supported in the connection to contained availability group. Reconnect to SQL Instance and retry the operation.
  • 8659: Cannot continue columnstore index build because it requires %I64d KB misc memory per thread, while the total memory grant for query is %I64d KB, total Columnstore bulk insert memory is limited to %I64d KB in query, and the maximum memory grant is limited to %I64d KB per query in workload group id=%ld and resource pool id=%ld. Retry after modifying columnstore index to contain fewer columns, or after increasing the maximum memory grant limit with Resource Governor.
  • 35528: unable to read page due to invalid FCB
  • 46933: PolyBase configuration is invalid or corrupt. Re-run PolyBase setup.

Query Store is the Flight Data Recorder. It’s Obvious What We Need Next.

Humor
31 Comments

When Microsoft brought out Query Store, they wrote:

Similar to an airplane’s flight data recorder, Query Store collects and presents detailed historic information about all queries, greatly simplifying performance forensics by reducing the time to diagnose and resolve issues.

I have a straight face under here, for real

It certainly works, making it easier for you to understand when a query’s execution plan suddenly goes downhill, taking your server with it.

But it isn’t enough.

In airplanes, it’s not enough to know that the throttles were powered down and the altitude plummeted. It’s not enough to know what happened: we need to know why it happened. What problem did the team think they were facing? Was there a misunderstanding in the cockpit about a metric? A disagreement between team members about how to solve the problem?

That’s why we need Voice Store.

Just as a cockpit voice recorder helps investigators better understand why decisions were made, SQL Server Management Studio needs a Voice Store to capture what the team was trying to do in the moments leading up to the crash.

To help speed up accident investigations, we simply can’t afford delays. That’s why it’s also important that Voice Store be connected directly to YouTube, automatically uploading recordings to help the community get to the root cause analysis.


What’s a Size-of-Data Operation?

Database professionals say, “That’s slow because it’s a size-of-data operation.”

A size-of-data operation is when the database has to write all of the affected data again. What you’re asking for might seem small, but for the database, it causes a lot of work.

For example, let’s say you live in a strict household, and your parents have told us that under this roof, everyone’s hair can only be black, blond, brown, or red. (Don’t blame me. I don’t make the rules. They’re your parents, not mine. Go talk to them. And tell your mom I said hi.)

If your parents suddenly loosen up and say, “We’re now allowing you to dye your hair pink,” that doesn’t mean we have to touch everyone’s head. It’s a metadata operation: we’re just making a new note so that when someone walks into the door and we check their hair color, we have a new acceptable option.

However, if your parents freak out and say, “Everyone’s hair must be pink,” that means we’re gonna need to dye every person’s hair. We’re gonna have to gather everybody together and do some real work.

That’s a size-of-data operation: however many people live in the house, we’re gonna need to work on all of ’em.

How that translates into databases

I’ll use the Users table in a large Stack Overflow database to explain.

If I alter a table and add a new nullable column, it happens instantly, and SQL Server doesn’t have to touch any pages:
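For example, a sketch using the HairColor column from the analogy above:

ALTER TABLE dbo.Users
    ADD HairColor VARCHAR(10) NULL;   /* metadata-only: finishes instantly */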

Adding a column like that is just a metadata change. It tells SQL Server, “When new rows walk into the house, they’re allowed to specify their HairColor now. When rows leave the house, if they don’t already have a HairColor set, just say it’s NULL.”

However, if I populate that column and dye everyone’s hair pink, that’s gonna take some time, because we’re gonna have to touch every row:
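Something like this:

UPDATE dbo.Users
   SET HairColor = 'Pink';   /* touches every row: a size-of-data operation */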

That’s a size-of-data operation, so it’s slower.

How that affects development & deployment

When you’re just getting started developing an app, and all your tables are empty, you don’t really have to worry about this kind of thing – because size-of-data operations are fast when there’s, uh, no data involved.

But the larger your data becomes, and the slower your storage becomes (like if you move from nice on-premises flash storage up to the cloud), then the more you have to pay attention to this. You’ll want to run your deployment scripts on a real production-sized database, not just an empty development shell, and time the deployment scripts. The longer an operation takes, the more likely it is a size-of-data operation. On larger databases – at 100GB and higher – and at 24/7 uptime shops like online stores, you’ll likely have to tune your deployment scripts to avoid size-of-data operations.

I haven’t seen a good rundown of all the operations that are size-of-data operations for Microsoft SQL Server, but it’s changed over the years. Hey, if you’ve got a blog – there’s a potential idea for you! Not me though. The hot tub is calling.


I Sat Down for an Interview with Forrest Brazeal.

Cloud Computing
6 Comments

Here’s how the interview started:

Forrest Brazeal: Brent, you’re a performance specialist who’s worked with massive SQL Server deployments; how does it make you feel when you hear cloud architects saying things like “relational databases don’t scale?” What do you wish they knew?

Eyes up here, kid
“Time for a smoke break.”

Brent Ozar: They’re right. Nothing scales worth a damn. Even stuff that looks easy, like Amazon S3 file storage, can run into scaling problems if you design it without an analysis of how AWS partitions reads, and how you have to watch out for your object prefixes.

But the thing is, nobody reads the documentation.

Read the interview here. I had a lot of fun with it.


Partitioned Tables Cause Longer Plan Compilation Times.

Partitioning
10 Comments

Folks sometimes ask me, “When a table has more indexes, and SQL Server has more decisions to make, does that slow down execution plan generation?”

Well, maybe, but the table design choice that really screws you on compilation time is partitioning. If you choose to partition your tables, even tiny simple queries can cause dramatically higher CPU times. Even worse, as the famous philosopher once said, “Mo partitions, mo problems.”

We’ll start with any Stack Overflow database, create a numbers table, and then dynamically build a partition function that partitions our data by day:
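Here’s the shape of it – a minimal sketch rather than the real script: the real thing generates one boundary value per day from the numbers table, and the pf_Daily/ps_Daily names here are made up for illustration:

CREATE PARTITION FUNCTION pf_Daily (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2008-01-01', '2008-01-02', '2008-01-03' /* ...one boundary per day... */);

CREATE PARTITION SCHEME ps_Daily
    AS PARTITION pf_Daily ALL TO ([PRIMARY]);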

Then we’ll build a partitioned copy of the Users table, partitioning it by CreationDate:
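Conceptually it looks like this, assuming the pf_Daily/ps_Daily names from the sketch above:

SELECT *
INTO dbo.Users_partitioned
FROM dbo.Users;

CREATE UNIQUE CLUSTERED INDEX CX_Id_CreationDate
    ON dbo.Users_partitioned (Id, CreationDate)
    ON ps_Daily (CreationDate);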

Let’s compare partitioned and non-partitioned performance.

I’ll create a nonclustered index on both the Users and Users_partitioned tables. Note that even if I don’t specify partitioning on the Users_partitioned index, it still ends up partitioned by default, which is kinda awesome:
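For example – the exact key column is an assumption, but the point is the same index definition on both tables:

CREATE INDEX IX_LastAccessDate ON dbo.Users (LastAccessDate);

CREATE INDEX IX_LastAccessDate ON dbo.Users_partitioned (LastAccessDate);
/* no filegroup or partition scheme specified, but this one still ends up
   partition-aligned automatically */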

Now we’ll run an exceedingly simple query that only returns one row:
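Along these lines – the predicate value here is just a placeholder for one that matches a single row:

SELECT Id
FROM dbo.Users
WHERE LastAccessDate = '2018-06-03 16:59:49.837';

SELECT Id
FROM dbo.Users_partitioned
WHERE LastAccessDate = '2018-06-03 16:59:49.837';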

At first glance, the plans look the same – but notice how the non-partitioned table is 0% of the cost, and the partitioned table is 100% of the cost:

That’s because the non-partitioned estimated cost was way less than 0.001, and the partitioned query’s estimated cost is over 15. What’s worse, the compilation time, execution time, and logical reads are totally different – the top one is the non-partitioned query, and the bottom is the partitioned one, both cleaned up for readability:

Hubba hubba. Just compiling the partitioned table’s plan took 27ms of CPU time. I know what you’re thinking: “Brent, who cares about 27ms of CPU time?” Well, remember, this is an overly simple query! In real-life examples, it’s not unusual to see 250+ milliseconds of CPU time spent just compiling the plan – which means you can only compile 4 queries per second, per CPU core. That’s when plan cache pollution due to unparameterized queries really screws you.

I’m writing this particular post because I had a client who chose to partition every table in their data warehouse, regardless of size, and used exactly the same partition granularity on all sizes of tables to keep their dynamic ETL queries simple. Unfortunately, even querying simple configuration tables was taking 250+ milliseconds of CPU time just to compile a plan.

Fine – but what about non-partitioned indexes?

I’ll drop that partitioned index and create a non-partitioned index on the Users_partitioned table – note that I have to specify ON [PRIMARY] as the filegroup for the partitioned table, or else any nonclustered index will automatically be partitioned as well – then try the queries again:
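Sketched out, with the same assumed index name as above:

DROP INDEX IX_LastAccessDate ON dbo.Users_partitioned;

CREATE INDEX IX_LastAccessDate
    ON dbo.Users_partitioned (LastAccessDate)
    ON [PRIMARY];   /* pin it to a filegroup so it is NOT partition-aligned */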

Now we’ll run an exceedingly simple query that only returns one row:

Now the estimated costs are neck and neck:

But that’s just an estimate – the only thing that’s actually neck and neck here is the logical reads:

Designing and executing a query plan takes longer when any of the objects involved are partitioned.

This overhead isn’t large as long as you’re comparing it to the overhead of big data warehouse queries where partitioning data can reduce reads. But if you’re comparing it to smaller objects – say, 100GB or smaller rowstore indexes – that are queried more frequently, with more varying queries that require building new execution plans – then the overhead of partitioning starts to add up. Mo partitioned objects, with mo partitions in each object, mo problems.

Partitioning is one of those features where I never hear people say, “Wow, every which way I turn, partitioning has just made performance better!” Rather, it’s a feature where people keep saying, “I had no idea partitioning was going to cause me problems over THERE, too.”

In my Mastering Index Tuning class, we spend just enough time on partitioning to explain where it’s a good fit – but much more time on indexing techniques that are more appropriate for 99.9% of workloads out there.


Unusual Parameter Sniffing: Big Problems with Small Data

Parameter Sniffing
22 Comments

Normally when we think about parameter sniffing, we think about a tiny data vs big data problem: the tiny data’s plan goes in first, and when we try to process big data, performance is terrible.

But sometimes, it’s the exact opposite.

I’ll start with the large Stack Overflow database (I’m using the 2018-06 version, but any large one will do) and write a stored procedure to find the most recent posts authored by folks in a specific location:
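The proc looks roughly like this – the name and column list are my own placeholders, but the TOP 200 / ORDER BY CreationDate DESC shape matches what the plans described below rely on:

CREATE OR ALTER PROCEDURE dbo.usp_RecentPostsByLocation
    @Location NVARCHAR(100)
AS
BEGIN
    SELECT TOP 200 u.DisplayName, p.Title, p.CreationDate
    FROM dbo.Users u
    INNER JOIN dbo.Posts p ON p.OwnerUserId = u.Id
    WHERE u.Location = @Location
    ORDER BY p.CreationDate DESC;
END
GO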

To help SQL Server out – I’m a giver like that – I’ll create a few indexes to give SQL Server some choices:
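Roughly these – the exact keys are assumptions, but they line up with the plan choices described below:

CREATE INDEX IX_Location ON dbo.Users (Location);
CREATE INDEX IX_OwnerUserId_CreationDate ON dbo.Posts (OwnerUserId, CreationDate);
CREATE INDEX IX_CreationDate ON dbo.Posts (CreationDate);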

Call it for tiny data first…

When we call the proc for a less-populated location first, SQL Server chooses to look up the people who live in Iceland (because there aren’t many), and then go find their posts. Small data finishes in under 1 second:

If we now run it for a larger location, like Germany, the query takes 24 seconds to run because we find so many people in Germany, and they’ve posted so many answers. (I’m sure they’re not asking a lot of questions. Germans are really smart.)

And the sort spills to disk because we only granted Iceland-size memory.

This is the typical parameter sniffing problem that people blog about: put the tiny data parameters in memory first, and SQL Server just isn’t equipped to deal with big data.

But if the big data plan goes in first…

Do things perform better? Let’s free the plan cache, then start with Germany:
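Using the placeholder proc name from the sketch above:

DBCC FREEPROCCACHE;   /* dev/test server only */
GO
EXEC dbo.usp_RecentPostsByLocation @Location = N'Germany';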

This time around, SQL Server says, “Ah, Germany’s really common. You’re asking for these posts to be sorted by CreationDate – as a reminder, here’s your query and your indexes:”

“So since you’re asking for the top 200 by CreationDate descending, I’ll just scan the CreationDate index from newest to oldest. For each post I find, I’ll go look up the Users table to see where that user’s from. I bet I won’t have to look up too many before I’ll find 200 of ’em that were posted by Germans.”

And he’s right: he only had to look up about 20,000 of them before he found 200 that were written by Germans. Very cool.

But…when we run this for Iceland…

SQL Server has to read over a million Posts, doing over a million Posts key lookups, and over a million checks into the Users table before it’s found 200 posts that were written by Icelanders. Ten seconds isn’t so bad, but if you pass in an even rarer location, like the charming and easy-to-pronounce Hafnarfjordur, Iceland:

SQL Server again scans the index on Posts, doing a key lookup on Posts for every single row that it finds, and I bet that sounds goofy. Oh it gets worse: for all 40,700,647 Posts, it does a clustered index seek against the Users table. We do over 124 MILLION logical reads on a table that only has 143K pages in it. We read the Users table 871 times over, and the Posts table 6 times over due to all the key lookups:

That sure is a lot of work – would be nice if SQL Server parallelized that query across multiple threads. No can do, though – the query plan was designed with Germany in mind, when SQL Server thought it’d find rows quickly. The whole time this query runs, it’s just hammering one of my CPU cores while the rest sit idle:

There is no one good plan here.

I always laugh when I see demos that talk about “the good plan” and “the bad plan.” I wish the real world was so simple, so black-and-white. In reality, queries have many possible plans, and many of ’em just don’t work for other sets of parameters.

SQL Server 2017’s “Automatic Tuning” does nothing here, either: it can try throwing a different plan in thinking there’s been a regression, but it’s not smart enough to pick different execution plans for different parameters. It picks one plan and tries to make it work for all of ’em – and that simply doesn’t work here.

It’s up to us to fix it, folks.

That’s why I teach my Fundamentals of Parameter Sniffing and Mastering Parameter Sniffing classes. Folks with a Live Class Season Pass can start watching the Instant Replays whenever they want, or drop in live. I’m teaching a bunch of classes over the next few weeks:

See you in class!


My Live Streaming Setup for 2021: Iceland Edition

Home Office, Streaming
8 Comments

Erika and I moved to Iceland for a while. For those of you who don’t follow my personal blog, here’s why we moved, and how we got here with a teleworker visa. You can spy on our adventures via my Instagram, Facebook, or Twitter.

We’re working half the time while we live here, and sightseeing the other half. It’s easy to telecommute from Iceland because it’s one of the most connected countries in the world. Most houses have symmetric gigabit fiber available – even though we’re extremely isolated, we still have great bandwidth:

This isn’t a permanent move – probably just 6-12 months – and we’re doing some traveling through the country, so I didn’t want to pack my full home office setup with my Mac Pro and big monitors. I downsized my streaming setup, and I figured some of y’all might find it interesting. It’s closer to what most folks would use for remote presenting.

Laptop: MacBook Pro 16″ – the fastest Mac laptop available today, a 2.4GHz 8-core Intel Core i9, 64GB RAM, but…it’s a long, long way from the 16-core Mac Pro I usually use. When I live stream, I really put the CPU fans to work. I was tempted to try one of the new MacBooks with the Apple Silicon M1 processor, but I’m holding out for the higher-horsepower versions that will surely follow. I brought my Keychron K8 keyboard and an Apple Magic Touchpad at the last minute at Erika’s insistence – I was trying to pack lighter, heh.

External monitor on right: 15″ generic 1080p. Normally for monitors, you want high resolution, but when you’re streaming, you actually want a plain 1920×1080 because that’s the max size you’d stream out to your audience. The monitor has inputs for USB-C, HDMI, Mini DisplayPort, etc, but the nice thing is that it can use a single USB-C cable for both power and for the display signal.

When I’m streaming, the MacBook Pro’s high resolution display has Slack (for attendee chat & questions) and OBS’s control panel and preview. The 15″ 1080p monitor is what’s showing to the audience, so I’ve either got SSMS on there, or a PowerPoint.

I usually teach while standing up, but I’m trying the dining room table to see if I can make that work. If it turns out I just absolutely gotta stand, we’ve got a high bar table picked out at a local furniture store – if necessary, we’ll grab that and leave it in the rental house when we leave.

Cameras: two Sony A6100s, each plugged into an Elgato Cam Link 4K: in my normal home studio setup, I plugged the Sonys into a Blackmagic Design HDMI recorder, but that’s a PCI Express card. The Elgato is an easier/smaller USB solution for laptops. I do prefer a two-camera setup, alternating between a head-on camera and a from-the-side camera for casual discussions. I still use my iPhone with the NDI camera app as a scenery camera during the breaks – that’s good for breaks, but not for a main camera because it has unpredictable latency, meaning your speech and moving lips can drift in & out of sync, and I can’t stand that.

Microphone: DPA In-Ear Broadcast Headset plugged into a Focusrite Clarett. I’m only reusing these because they’re part of my desktop setup, so I don’t need to buy them again, but they’re overkill for most streaming setups. I have a Rode VideoMic GO ($100) on-camera microphone as a backup, but the sound on that isn’t nearly as good since it also picks up echoes from the room, keyboard noises, laptop fan, etc.

Customizable keyboard control: Elgato Stream Deck. LCD keys so you can configure what they show & do, like switching between different cameras. Could you do this by memorizing a bunch of hotkeys? Probably, but as you start to rely on more advanced OBS functionality, like playing sound effects, this will come in super handy.

USB-C dock: CalDigit TS3 Plus: if you count the above devices, that’s 5 high-speed USB ports right there, plus Ethernet. The MacBook Pro only has 4 USB-C ports, plus it needs one for electric power, so I needed a USB-C dock. The CalDigit is by far the most stable dock I’ve found – a lot of USB-C docks flake out during sustained 4K streaming with multiple cameras.

Tripod mounting: Manfrotto Magic Arm plus Super Clamp: I could theoretically clamp these directly to the dining room table that I’m using for a desk, but they’d shake when I jostle the table. Instead, I brought a couple of tripods, plus these Magic Arms so I can mount multiple devices to the tripod, like a camera plus lights.

Green screen: collapsible with stand: at home, I used a wall-mounted pull-down green screen, but obviously that’s not gonna work on the road. This folds up, and I’ll have it behind me while I’m teaching. I expect to be teaching sitting down while I’m on the road, but I’d be completely delighted if I could put together a setup where I could continue standing & teaching. The box it shipped in was exactly the max dimensions for standard airline checked luggage. It collapses, but not small enough to fit in a regular checked bag – it’s still an odd shape.

Carrying the electronics: Pelican 1620 case: on one of our Iceland flights, the airport crew unloaded the baggage during a serious rainstorm. Our bags and their contents were all completely soaked with water. We didn’t have any electronics in ’em, thank goodness, but for this trip, I wanted something waterproof that I could check as luggage. (It won’t have any lithium batteries in it – I take those in my carryon.)

If you wanna see the results and you’ve got a Live Class Season Pass, hop into my live class page during any of my classes this month:

See you in class!


Can You Get Parameter Sniffing on Updates and Deletes?

Execution Plans
8 Comments

Sure you can – parameter sniffing is everywhere. Anytime that you have one piece of code that needs to handle a varying number of rows, you’re probably gonna have to worry about parameter sniffing.

I’ll start with any Stack Overflow database and run a couple of non-parameterized update statements. I’m using literals here, not parameters:
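Something like this pair – the exact value you set LastAccessDate to doesn’t matter:

UPDATE dbo.Users
SET LastAccessDate = GETUTCDATE()
WHERE DisplayName = N'Brent Ozar';

UPDATE dbo.Users
SET LastAccessDate = GETUTCDATE()
WHERE DisplayName = N'Alex';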

These two statements have two very different actual execution plans. The first plan, updating just Brent, is very simple. Reading from right to left, we scan the clustered index to find all the Brents, and then we update the clustered index:

 

Except…the simplicity is a vicious lie. We’re updating LastAccessDate, and before running this update, I’d already created half a dozen indexes:
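Their exact definitions aren’t the point – think half a dozen along these lines, every one of them carrying LastAccessDate (the keys here are assumptions):

CREATE INDEX IX_LastAccessDate_Id ON dbo.Users (LastAccessDate, Id);
CREATE INDEX IX_DisplayName ON dbo.Users (DisplayName) INCLUDE (LastAccessDate);
CREATE INDEX IX_Reputation ON dbo.Users (Reputation) INCLUDE (LastAccessDate);
CREATE INDEX IX_Location ON dbo.Users (Location) INCLUDE (LastAccessDate);
CREATE INDEX IX_CreationDate ON dbo.Users (CreationDate) INCLUDE (LastAccessDate);
CREATE INDEX IX_Age ON dbo.Users (Age) INCLUDE (LastAccessDate);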

Every one of ’em includes LastAccessDate. SQL Server doesn’t have the concept of asynchronous indexes, so all of the indexes have to be updated whenever we change their contents. Hover your mouse over the clustered index operator to see the truth: SQL Server’s updating those, too.

The second plan is different because there are a lot of Alexes at Stack Overflow – Alex is one of the most common names in the database. Read the second plan right to left, and the first thing SQL Server does is find & update all of the Alexes in the clustered index:

But since we’re updating a lot of rows, SQL Server changes the way it approaches updating all of these nonclustered indexes. After the clustered index operator, keep reading…

For every index, SQL Server sorts all of the Alexes into that index’s key order so they’re faster to find, and then updates the index. It does this process one index at a time, serially, in order, for Halloween protection.

These are called narrow and wide plans.

Paul White is both narrow and tall, but his Query Buck is wide and short

Brent’s plan, which does all of the updates in a single Clustered Index Update operator, is called a narrow plan.

Alex’s plan, which breaks out the work for each index and does them one at a time, is called a wide plan.

No, that doesn’t make any sense to me either. They should be called short and tall plans, because frankly, Alex’s plan is the wide one. I don’t make the rules, though, Paul White does. Take it up with the Kiwi. Even when his rules are wrong, they’re still right. Or maybe they just feel that way.

Put this in a stored proc, and you’ve got parameter sniffing.

Here’s our stored procedure to handle updates:
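A sketch – the proc and parameter names are placeholders:

CREATE OR ALTER PROCEDURE dbo.usp_TouchUser
    @DisplayName NVARCHAR(40)
AS
BEGIN
    UPDATE dbo.Users
    SET LastAccessDate = GETUTCDATE()
    WHERE DisplayName = @DisplayName;
END
GO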

And now depending on which parameter runs first – Brent Ozar or Alex – we’ll get a different execution plan.

When Brent runs first, we get the narrow plan, so:

  • Brent does 143,688 logical reads and has no worktable
  • Alex does 352,260 logical reads, has no worktable, gets a 72KB memory grant, and the clustered index scan is single-threaded

And when Alex runs first, we get the wide plan, so:

  • Alex does 553,530 logical reads – more than 3 times the number of pages in the entire table! – plus another 41,339 on a worktable, gets a 5MB memory grant, and the clustered index scan goes parallel (although the rest of the plan is serial because that’s how SQL Server does it)
  • Brent does 143,730 logical reads, plus 17 on the worktable

The differences in this case aren’t all that bad, but keep in mind that I’m dealing with just a 1GB table – the smallest one in the Stack Overflow database. The larger your data becomes, the more you have to watch out for this kind of thing.

That’s why I teach my Fundamentals of Parameter Sniffing and Mastering Parameter Sniffing classes. Folks with a Live Class Season Pass can start watching the Instant Replays whenever they want, or drop in live. I’m teaching a bunch of classes over the next few weeks:

See you in class!


“But Surely NOLOCK Is Okay If No One’s Changing Data, Right?”

Some of y’all, bless your hearts, are really, really, really in love with NOLOCK.

I’ve shown you how you get incorrect results when someone’s updating the rows, and I’ve shown how you get wrong-o results when someone’s updating unrelated rows. It doesn’t matter – there’s always one of you out there who believes NOLOCK is okay in their special situation.

The latest NOLOCK defender left me an angry comment:

I challenge you to show us one example of using nolock on a static database with incorrect results.

No problemo – I assume you mean tables that aren’t getting inserts, updates, or deletes. Well, folks can do index reorganize operations during business hours – it’s a completely online operation, even in Standard Edition – and that hoses NOLOCK too.

I’ll create a table and populate it with 100K rows, but do it in a way that makes it highly fragmented:
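One way to do it – a sketch with made-up names, clustering on a random GUID so the inserts cause page splits all over the place:

CREATE TABLE dbo.NolockDemo
(
    Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    Padding CHAR(1000) NOT NULL DEFAULT 'x',
    CONSTRAINT PK_NolockDemo PRIMARY KEY CLUSTERED (Id)
);

INSERT dbo.NolockDemo (Id)
SELECT TOP (100000) NEWID()
FROM sys.all_objects o1
CROSS JOIN sys.all_objects o2;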

Then, in one window, I’ll reorganize the index:
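Using the made-up table from the sketch above:

ALTER INDEX PK_NolockDemo ON dbo.NolockDemo REORGANIZE;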

While in another window, I’ll run a COUNT(*) repeatedly:
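For example, against that same made-up table (GO 100 just tells SSMS to repeat the batch):

SELECT COUNT(*) FROM dbo.NolockDemo WITH (NOLOCK);
GO 100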

And presto: the results change, going up and down as the pages are moved around:

Even if you think no one’s doing index maintenance at your shop, the answer is still clear here: if you want to build safe, reliable code that produces accurate query results at all times, NOLOCK should not be involved.

I get why you use NOLOCK. You read a blog post somewhere once that said, “NOLOCK is faster because it does allocation scans instead of navigating those big hefty index pages.” But you never actually bothered to measure whether that’s a significant problem for you – and you never bothered to ask your users if they need accurate query results.

Sometimes, you really don’t need accurate query results, and in that case, NOLOCK can be fine. For example, one of my clients runs a monitoring query every few seconds just to see if any new carts have been started in their online store. They get the latest row, check to see whether it’s higher than the previous check, and if it is, they’re happy campers. If it isn’t, they don’t mind checking again to see whether they got bad data, and they check a couple of times before they sound an alarm. They don’t need accuracy on exactly how many carts have been started – they just want a rough idea that the latest value is going up. NOLOCK works fine for their needs.

But for the rest of y’all, when you’re ready to do things the right way, I welcome you to my SQL Server training classes.

You should probably take one of them quickly, before your boss reads this post and starts asking you why you were so vigorously defending your constant use of NOLOCK.


Free Webcast: Help! My SQL Server Maintenance is Taking Too Long!

Eyes up here, kid
“Time for a smoke break.”

You manage growing SQL Server databases with shrinking nightly maintenance windows. You just don’t have enough time left each night to do the necessary backups, corruption checking, index maintenance, and data jobs that your users and apps want to run. Cloud storage isn’t helping the problem, either.

Stop playing Tetris with your job schedules and step back for a second: are we doing the right things, at the right times, with the right SQL Server configuration?

In this session, I will give you three queries to run on your SQL Server to make sure you’re focused on the right problems. We’ll figure out how fast your backups are really going (and why), whether nightly index maintenance actually makes sense for you or not, and whether your hardware is suffering from serious performance issues.

Register now for the free webcast on Tuesday, February 16th. 


What SQL Server Feature Do You Wish Would Go Away?

Humor
45 Comments

I asked a two-part question on Twitter: if you could only remove one feature from SQL, why would it be cursors?

The responses were great:

Justin’s referring to MERGE’s hot mess.

But out of all of ’em, this one intrigues me the most:

How about you?


Query Hints You Can Use to Avoid Blocking

No, not NOLOCK. You get the wrong query results, even when your query results aren’t supposed to be changing.

I’m talking about when you need to do writes, but you want your query to get along with others in high concurrency environments, without creating a blocking firestorm.

SET LOCK_TIMEOUT 1000 – if you run this before your query starts, SQL Server will wait patiently for that many milliseconds before giving up and aborting your query. This is useful when a lot of short queries are running all the time, and I don’t want to give up instantly – but I do want to give up if there’s a long-running query blocking me. If you pass in a value of 0 milliseconds, SQL Server gives up instantly. For example, in the below query, I’m trying to delete all of Jon Skeet’s posts, but I want to abandon my query if anyone has locks that stop me on either the Users or Posts tables, and those locks stop me for over 1 second:
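A sketch of what that looks like:

SET LOCK_TIMEOUT 1000;   /* give up after 1,000 ms of waiting on locks */

DELETE p
FROM dbo.Posts p
INNER JOIN dbo.Users u ON u.Id = p.OwnerUserId
WHERE u.DisplayName = N'Jon Skeet';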

If you use this technique, or the other techniques described in this post, your query will fail with an error if it was blocked by someone. Most of us (myself included) don’t do a great job of checking for errors and retrying our query gracefully, so we could probably all use an hour to dig into Erland Sommarskog’s series on error handling to learn how to retry automatically.

 

WITH (NOWAIT) – this is kinda like setting lock_timeout 0, but it’s a table-level hint like NOLOCK. This means it only affects a single table in your query. For example, in the below query, I’m trying to delete all of Jon Skeet’s posts, but I want to abandon my query if someone has a lock on the Users table – but I’m willing to wait forever for locks on the Posts table:
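Roughly, with the same sketch of a delete:

DELETE p
FROM dbo.Posts p
INNER JOIN dbo.Users u WITH (NOWAIT) ON u.Id = p.OwnerUserId   /* fail instantly if Users is locked */
WHERE u.DisplayName = N'Jon Skeet';                            /* but wait forever on Posts locks   */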

 

SET DEADLOCK_PRIORITY LOW – this isn’t about blocking, but specifically deadlocks. When I set this at the beginning of my batch, I’m saying, “I’m willing to wait for blocking, but if a deadlock situation comes up, go ahead and kill my query, because I don’t mind trying it again.” You call it like we did LOCK_TIMEOUT:
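Roughly like this:

SET DEADLOCK_PRIORITY LOW;   /* volunteer to be the victim if a deadlock happens */

DELETE p
FROM dbo.Posts p
INNER JOIN dbo.Users u ON u.Id = p.OwnerUserId
WHERE u.DisplayName = N'Jon Skeet';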

 

WAIT_AT_LOW_PRIORITY – this one is just for index maintenance, and sadly, it doesn’t work for creates or drops. “Online” index rebuilds aren’t completely online: they need a brief schema modification lock to swap in the new copy of the index. This hint lets your completed index lurk patiently in the background while other queries finish, and then swap in later:
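A sketch, assuming a clustered index named PK_Users:

ALTER INDEX PK_Users ON dbo.Users REBUILD
    WITH (ONLINE = ON (
            WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF)
         ));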

If that last one is helpful, then your next step is to do similar blocking avoidance with DML operations by using Kendra Little’s post on which operations support online & resumable.


Updated First Responder Kit and Consultant Toolkit for January 2021

You know how a lot of app release notes say “bug fixes and improvements”? We’re not gonna lie: this one’s all bug fixes. Huge shout out to the community contributors for catching stuff and contributing fixes.

New behavior for SQL Server 2008 & R2 users: sp_BlitzLock has never worked on SQL Server 2008, but starting this month, the installation of sp_BlitzLock simply fails with an error. In this month’s release, one of the checkins uses THROW, which isn’t available on 2008. This also means that if you use the Install-Core-Blitz-No-Query-Store.sql script to do installations, you’re going to get an error. It’s okay – all of the other stored procs install successfully. This behavior wasn’t intentional, and we’d rather it didn’t happen. If you’d like to improve sp_BlitzLock’s installer to bail out gracefully without an error, we’d gladly accept a pull request for that.

How I Use the First Responder Kit
Watch and learn

To get the new version:

Consultant Toolkit Changes

To help troubleshoot plan cache pollution due to unparameterized queries, there’s a new tab for the top 10 queries with the most duplicated plans in cache. It’s based off this troubleshooting query. We also capture the top 3 plans for each query, sorted by duration descending.

This does mean a new querymanifest.json and spreadsheet. If you’ve customized those files, you’ll need to copy your changes into the new querymanifest.json and spreadsheet if you want to get the new tab.

sp_Blitz Changes

  • Fix: standardized the @Help = 1 behavior across this & all of the procs. When you run ’em with @Help = 1, they return the help info and immediately return. (#2717, thanks Todd Chitt.)
  • Fix: offline databases no longer show up in auto-shrink or page verification checks. (#2750, thanks Erik Darling and Ali Hacks.)

sp_BlitzFirst Changes

  • Improvement: if you turn on @Debug = 1, it now raises “Running CheckID #___” messages in the messages tab in SSMS to help you debug. (#2739, thanks Adrian Buckman.)
  • Fix: fix warning typo on queries with max memory grant and clarified that the query’s memory grant is a percentage of max workspace. (#2732, thanks Ali Hacks.)
  • Fix: forwarded fetches in table variables now show that it’s a table variable rather than a temp table. (#2742)
  • Fix: forwarded fetches in tempdb objects were reporting the cumulative number of fetches since the object was created rather than the forwarded fetches created during the sp_BlitzFirst sample. (#2743)

sp_BlitzIndex Changes

  • Fix: if a missing index recommendation included columns whose names included ampersands, sp_BlitzIndex was failing with an error. (#2734, thanks snckirkmarken.)

sp_BlitzLock Changes

  • Improvement: now runs better without sysadmin or db_owner permissions by wrapping an update statement in a try/catch block. (#2751, thanks Chad Baldwin.)

sp_ineachdb Changes:

  • Fix: case sensitivity bug introduced in last month’s version. (#2737, #2744, #2747, and #2748, thanks Jefferson Elias and sm8680.)
  • Fix: GO was removed at the end of last month’s version, which broke the Install-All-Scripts.sql installer when running on brand-new servers. (#2753, thanks TLSystemsLLC.)

sp_BlitzWho Changes

  • Fix: we’re supposed to sort queries from longest-running to shortest-running, but last month’s version introduced a bug in that. This should fix it. (#2721, thanks Erik Darling and Ali Hacks.)

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

Watch Us Working On It

I live streamed while I was working on the forwarded fetches fixes for sp_BlitzFirst:


We Lost Gareth Swanepoel.

Community
21 Comments

Thirteen years ago, I met Gareth Swanepoel (@GarethSwan) for the first time, interviewed him for his first real sysadmin job, and … turned him down.

Gareth Swanepoel circa 2007

Thankfully, my manager overruled me and hired him anyway.

Thank goodness – and thank goodness Gareth forgave me for my incredibly bad judgment, hahaha.

Gareth proved to be one of the hardest-working, most-determined people I’d ever met. He was endlessly curious – it wasn’t enough to solve a problem – he wanted to know how the thing worked, whatever it was. He wanted to know everything about hardware, operating systems, networking, science, and eventually, databases.

When I left Southern Wine & Spirits, Gareth had the “good fortune” of cleaning up the messes I’d left behind. He grew to love databases, moving on to Pragmatic Works, then teaching classes to share what he knew, then moved on to the big goal that most SQL Server people dream of accomplishing: he got a job at Microsoft.

Along the way, we shared so many laughs, beers, and meats. When Tim Ford & I launched SQLCruise, Gareth immediately wanted to do a SQLBBQ at his house, cooking the pre-cruise party, sharing his pool, and introducing everyone to his wonderful family.

To know Gareth was to smile and to share.

Gareth greeted everyone with incredibly wide smiles, and he wanted to know everything about you. If you knew him, you surely witnessed him nodding excitedly and saying, “Yeah?” He wanted to encourage you to go on about whatever you were interested in, and he wanted to develop the same interests too. He wanted to know what made you happy, and it made him happy, too.

 

I have so many great memories with Gareth. The last time we were hanging out, we were racing a Porsche around in the Nevada desert and eating a crazy dinner with Ryan & Andrea Allred. He said yes to everything, and the joy that he found everywhere was inspiring.

Gareth brought the best parts of childhood into adulthood. He was endlessly curious and excited. From his Lego collection to his love of Disney to his passion for databases, he lived life to its fullest.

Sadly, we lost Gareth last week due to COVID-19.

The data community was so lucky to have him while we did. He was an incredible blessing to those who knew him, and he touched so many of us with his excitement and generosity.

There’s a GoFundMe to help his two children get started on the right foot, and you can watch his memorial service, part 1 and part 2. Gareth’s sister begins speaking at 48 minutes into part 1.


Which Microsoft Certification Should You Get?

You work with data, and you’re thinking about getting a Microsoft certification. Which one should you get?

If you need the certification to get a job…

Then get the certification you see mentioned the most often in the descriptions of the job you’re looking for.

This is just a little tricky because certifications shown on jobs often lag behind by a couple/few years. The HR department may just have copy/pasted the job requirements from the last job ad they posted. That certification may no longer even be offered by Microsoft. (Microsoft has a tendency of reinventing their certification lineup every 2-3 years, throwing the whole thing into the SlapChop for a remix, seemingly just to make their customers spend more money on new certs while learning more about Microsoft’s cloud products.)

I gotta tell you that the certificate absolutely, positively will NOT get you the job. It’s just that when you’re applying for a publicly posted job, you’re competing with a huge, huge stack of wildly unqualified applicants. In order to cut down the stack of resumes, the hiring department will use any kind of filter they can think of, and often, that’s going to be whether someone mentions a Microsoft cert number on their resume.

I don’t agree with that strategy, but I understand why they have to do it.

Long term, though, I’d say that a better strategy for getting a job is to start building your brand online. Start writing blog posts and delivering presentations, and after a couple/few years, you’ll have people coming to you with jobs. You won’t have to worry about standing out in a stack of resumes because you’ll be the only candidate. It takes time, but it’s a much better long term gamble than trying to stand out in a stack.

If you need the certification for your current job…

Sometimes companies require their staff to show that they’re continuously improving, and one way they measure it is by seeing you pass industry tests.

In that case, get whatever your manager or HR department wants you to get.

If they don’t care which one you get, then get the oldest one that’s relevant to your job. That might sound counter-intuitive: you might think you want the latest and greatest cert. The problem is that the new certs will have less training material available. If you’re only trying to check a box for your HR team, an older cert with tons of free YouTube prep videos and cheap PDFs will be easier to pass.

If you want the certification for personal growth…

Then get the newest certification that calls to you personally. It’s going to have less established training material available, but that’s okay: you’re doing this for personal growth anyway.

The best personal growth doesn’t come from reading a book with a bunch of canned test questions and answers. The best growth comes from understanding what the test is trying to measure, and then building your own experience working with that tool. Make a list of the tools and techniques you want to experiment with, and then check out Microsoft’s SQL Workshops repository on Github. It’s full of free self-learning resources for people like you who want to grow their own skills.

No matter why you want the exam,
just take it first.

You have a serious case of impostor syndrome: you believe you’re not qualified to pass the test. Because of that, you’re going to fill yourself with self-doubt as you prep for the exam, and you’re going to think the exam is incredibly challenging. It’s not – it’s just a test. It’s not the final boss of your life’s video game. After all, I bet you’ve met plenty of people who were supposedly certified, but couldn’t solve the most basic problems at work.

Go take it first. You’re going to fail, but you’ll be comfortable with that failure – because you didn’t expect to pass. Then, you’ll have a better feeling of what the test-taking process is like, the level of detail in the questions, and how much work you need to put into studying.

Go get ’em!

