Sit Down. We Need to Talk About Your Scalar Functions.

Thanks for coming today. I apologize for baiting you in here with a picture of toilet paper. That’s not even from the US – that’s from my Iceland trip, before I flew back home. Look, we all make mistakes. That’s not the point of this post.

You see, I’ve been reading this KB article about bugs in scalar function inlining. SQL Server 2019 tries to inline your functions using a technology they called Froid, but ever since Froid came out, it’s had one bug after another.

And I don’t think it’s Microsoft’s fault.

I think it’s your fault.

Because reading this KB article, I’m increasingly convinced that y’all are completely insane. You’re doing things with functions that I can’t even comprehend doing once, let alone doing so often that you feel the need to encapsulate the logic in a function and use it over and over again. Microsoft’s having to deal with functions that shouldn’t have ever passed a code review, let alone gone into production.

Here, you read it – and instead of reading it as a bug list, read it as a Microsoft support person who’s hearing a customer complain about inaccurate function results when their function has ___ in it:

 


This cumulative update includes several fixes across the following areas for scenarios in which a query that uses Scalar UDF Inlining may return an error or unexpected results:
  • Type mismatch error if the return type of the UDF is sql_variant (added in Microsoft SQL Server 2019 CU2)
  • UDFs referencing labels without an associated GOTO command return incorrect results (added in Microsoft SQL Server 2019 CU2)
  • Uninitialized variables used in condition (IF-ELSE) statements cause errors (added in Microsoft SQL Server 2019 CU2)
This cumulative update also blocks Inlining in the following scenarios:
  • If the UDF references certain intrinsic functions (e.g. @@ROWCOUNT) that may alter the results when inlined (added in Microsoft SQL Server 2019 CU2)
  • Aggregate functions being passed as parameters to a scalar UDF (added in Microsoft SQL Server 2019 CU2)
  • If the UDF references built-in views (for example: OBJECT_ID) (added in Microsoft SQL Server 2019 CU2)
  • If the UDF uses XML methods (added in Microsoft SQL Server 2019 CU4)
  • If the UDF contains a SELECT with ORDER BY without a “TOP 1” (added in Microsoft SQL Server 2019 CU4)
  • If the SELECT query performs an assignment in conjunction with the ORDER BY clause (e.g. SELECT @x = @x +1 FROM table ORDER BY column_name) (added in Microsoft SQL Server 2019 CU4)
  • If the UDF contains multiple RETURN statements (added in Microsoft SQL Server 2019 CU5)

 

When writing functions like that, your scientists were so preoccupied with whether they could that they didn’t stop to think if they should.

If you have scalar functions with anything but a SELECT in it, you should probably stop holding out hope that SQL Server 2019 is going to save you, and start rolling up your sleeves to rewrite those functions. By the time CU9 comes out, I’m kinda expecting scalar function inlining to be disabled altogether, and for Joe Sack to have thrown himself off a cliff in despair, having seen so many of your terrible functions.
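Curious whether SQL Server even considers your functions candidates? 2019 exposes an is_inlineable column in sys.sql_modules. Here’s a minimal sketch – the function name is made up, and its multiple RETURN statements alone disqualify it under CU5’s rules:

    -- Hypothetical function: the two RETURN statements
    -- disqualify it from inlining as of 2019 CU5.
    CREATE OR ALTER FUNCTION dbo.UserLabel (@Reputation INT)
    RETURNS VARCHAR(20)
    AS
    BEGIN
        IF @Reputation >= 10000
            RETURN 'Power User';
        RETURN 'Regular User';
    END;
    GO
    -- Ask SQL Server whether it considers the function inlineable:
    SELECT OBJECT_NAME(object_id) AS function_name, is_inlineable
    FROM sys.sql_modules
    WHERE object_id = OBJECT_ID('dbo.UserLabel');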


How to Upgrade SQL Server Evaluation Edition to Developer Edition

Six months ago today, you installed SQL Server Evaluation Edition. You told yourself you’d take care of that sooner or later, and…you forgot. Now, your SQL Server won’t start:

Error 17051: SQL Server evaluation period has expired.

Awww, shucks. Good news though! You don’t have to uninstall & reinstall from scratch – there’s an easier way.

Go download SQL Server Developer Edition, and start the installation media download process. It’s going to download the bits into C:\SQL2019. Once the download has finished, go into Maintenance, Edition Upgrade (at the top):

At this point, you may need to set the date back a few days on your computer and disconnect it from the network (to make sure it doesn’t automatically update the date/time.) In theory, you shouldn’t need to do this – but in practice for me, I’ve needed to set the date back so that SQL Server could at least start during the upgrade process. If you’re running a VM, you will also need to configure the VM so that it doesn’t automatically update the date/time from the host. In VMware Fusion, go into Virtual Machine, Settings, Advanced, and uncheck the “Synchronize time” option.

Next up in the installer, choose which edition you’d like to upgrade to – not the one you already have, but the destination edition:

Choose your character

Next, pick the instance you’re upgrading. In my case, I just had one instance on my VM:

Then click upgrade:

After about 30 seconds, you’re back in business, and SQL Server is running again. If this screen sits idle, not doing anything, it’s likely because your date/time was set back to current again, and the SQL Server service is trying but failing to start. You’ll need to go fix that and try the process again. That’s okay. People like us don’t read the instructions the first time, do we? I feel you. It happened to me too, thus this blog post.

Now you can set the computer’s time back to current, enable time syncing again, connect it to the network, and get back to work.
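If you wanna double-check that the upgrade took, a quick SERVERPROPERTY query does the trick – this is plain vanilla T-SQL, nothing specific to this process:

    -- Should now report Developer Edition instead of Evaluation:
    SELECT SERVERPROPERTY('Edition')        AS Edition,
           SERVERPROPERTY('ProductVersion') AS Version;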


Bad Idea Jeans: Building Big Query Plans

When you build a monitoring tool that sends advice via email, you discover two things:

  • Some people have really, really big execution plans
  • Email servers have reasonable limits on file attachment sizes

Add those two things together, and Richie found himself working on a new feature for SQL ConstantCare® that would automatically break up advice into multiple emails if we needed to attach several big query plans. He needed to test it, so I whipped up some T-SQL that would quickly build a large query plan.

You can copy/paste this directly into SSMS, no prep required, and get the estimated execution plan:

When I started writing the query gradually, layering on one CTE at a time, the first 7 were no big deal. The query compilation time was noticeable for 8, but that 9th CTE, hoowee. You should see it yourself.
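The original query isn’t embedded in this copy, but a sketch of the same idea looks like this – each CTE joins the prior one to itself, so the optimizer’s search space explodes with every layer. Start with fewer layers and add one at a time, and only ask for the estimated plan:

    -- Deliberately abusive sketch: every layer doubles the number of
    -- table references the optimizer has to consider.
    WITH c1 AS (SELECT o1.object_id FROM sys.objects o1
                JOIN sys.objects o2 ON o1.object_id = o2.object_id),
    c2 AS (SELECT a.object_id FROM c1 a JOIN c1 b ON a.object_id = b.object_id),
    c3 AS (SELECT a.object_id FROM c2 a JOIN c2 b ON a.object_id = b.object_id),
    c4 AS (SELECT a.object_id FROM c3 a JOIN c3 b ON a.object_id = b.object_id),
    c5 AS (SELECT a.object_id FROM c4 a JOIN c4 b ON a.object_id = b.object_id),
    c6 AS (SELECT a.object_id FROM c5 a JOIN c5 b ON a.object_id = b.object_id),
    c7 AS (SELECT a.object_id FROM c6 a JOIN c6 b ON a.object_id = b.object_id),
    c8 AS (SELECT a.object_id FROM c7 a JOIN c7 b ON a.object_id = b.object_id),
    c9 AS (SELECT a.object_id FROM c8 a JOIN c8 b ON a.object_id = b.object_id)
    SELECT * FROM c9;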

Copy/paste that query into SSMS, and just getting the estimated query plan takes tens of seconds. Once it generates, you can right-click on the Select operator and go into properties to see just how hard it worked – in my case, it took 26 seconds of CPU time and clock time to build the plan.

If you save the file, the resulting sqlplan file is over 10MB in size, and that causes a bit of a problem for SSMS, too. If you right-click on the plan and click Show XML, you get an error about needing to edit a registry key if you want to show a plan that big.

Not bad for 12 lines of T-SQL.

And that certainly got the job done for Richie.

But imagine what we could do with 13:

Adding that one extra join takes compilation time to 10 minutes on my machine.

For bonus points, you can run several of these at a time with SQLQueryStress. At first, the queries don’t appear to be waiting on anything at all – they’re just churning through CPU as fast as they can get it, and the server doesn’t show any wait times:

But wait for a little while, and RESOURCE_SEMAPHORE_QUERY_COMPILE times pop up:

Fun times.

So, what’s the moral of this story, you ask? Like so many posts in my Bad Idea Jeans category, and so many of my readers, this story has no morals.


Finally, You Can Buy Brent Ozar Unlimited Swag.

When I’m doing a webcast, the same question always comes up:

“Hey, how do I get one of those coffee mugs?”

We used to give ’em away during our in-person live training classes, but since I stopped doing those, we needed to come up with a plan B.

Now you can buy Brent Ozar Unlimited swag: mugs, shirts, stickers, and hoodies!

I’m not trying to make a buck on these: the pricing is literally set as low as Streamlabs will allow me to set it, hahaha, with zero money going to me. I’m only doing this so y’all can have as much fun with the logo as I do. You can read about the backstory of our logo here: it originally started out as something else, but of course, given our absurd luck, we picked the only logo that could get us thrown in war prison.

I’m really happy with the quality, too: the embroidery looks great.

Happy shopping!


How Doug Lowing Gets the Most Out of His Live Class Season Pass

With my Live Class Season Pass, students have the ability to attend as many of my live online classes as they want for an entire year straight, plus they can view the instant replay recordings whenever they want.

Doug Lowing, one of the students, shared with me how he gets the most out of his:

  1. Probably the most important step… View the entire class before the live class. Do all the exercises. This is pre-learning. It is difficult to flip between video, exercises, locating files, and reading sp_Blitz output while absorbing new concepts. Pre-learning introduces the pace, vocabulary, and concepts, preparing me to absorb them during the live class.
  2. Participate in the live class and exercises. The live comments will fill gaps I missed in the pre-learning.
  3. Practice after the live class. Go through all the videos and exercises at least 3 times.
  4. If desired, sign up for the live classes again and go through all the exercises. Do not comment in Slack, just read and listen and do the homework.

It’s a lot of work – but that’s why I’m such a believer in the Live Class Season Pass approach. When it comes to really hard topics like my Mastering Parameter Sniffing class, you’re going to get a lot of value by revisiting the material again each time that you’re facing parameter sniffing issues live in production.

Instead of scanning your cryptic handwritten notes from some short conference session you saw once six months ago, why not watch the exact relevant in-depth module again? You can pause my video as you work through the problems alongside me, and you can join in on live classes anytime to ask followup questions – like today’s How I Use the First Responder Kit class, broadcasting live now as we speak.

It works for Doug, and it’ll work for you. It’s on sale for just $995 this month. Why wait? Start learning right now! See you in class.


Page Life Expectancy Doesn’t Mean Jack, and You Should Stop Looking At It.

Page Life Expectancy is a Perfmon counter that’s supposed to track how long pages stay in memory before they get flushed out to make room for other pages that are needed instead.
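You don’t even need Perfmon to see it – the counter is exposed in a DMV. One caveat: on NUMA servers, each node also reports its own PLE under the Buffer Node object, and the Buffer Manager number is a rollup of those:

    -- Read Page Life Expectancy straight from SQL Server:
    SELECT [object_name], counter_name, cntr_value AS ple_seconds
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND [object_name] LIKE '%Buffer Manager%';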

Paul Randal has blogged about this a few times, and rather than rehash ’em, I’d rather point you to a couple of his roundups:

And layer in a few of my own observations for context:

Page Life Expectancy goes up 1 for each second that you’re not under memory pressure. Restart your SQL Server instance and watch PLE: it starts at 1, and goes up by 1 for each second of uptime. 5 minutes of uptime = PLE 300. For the first 5 minutes, it’s not like your server’s under memory pressure – it just woke up, for crying out loud. Give him a minute. Well, 15-20, I guess, because PLE is near useless during that time span.

Page Life Expectancy drops can be triggered by confusing operations. By default, any one running query can get a memory grant the size of 25% of your buffer pool. Run a few of those queries at the same time, and your buffer pool gets drained – but PLE doesn’t necessarily drop. However, the instant an unrelated query runs and needs to get data that isn’t cached in RAM, your PLE will drop catastrophically. Which queries are at fault? The queries getting large grants, or the queries doing reads? “Yes.”

Page Life Expectancy is a lagging indicator. Lagging indicators tell you about an emergency long after the emergency has happened, and they don’t recover quickly after the emergency is over. When you combine the above two problems – PLE only rising 1 per second, and PLE dropping at times that aren’t necessarily tied to the draining buffer pool – then if you’re alerting based on low PLE numbers, you could have already missed the emergency. By the time you log in and look for long-running queries, it’s already too late. Instead, you should be using leading indicators: things that tell you a problem is coming up.

With that stuff in mind, I’ve removed the Page Life Expectancy warnings from sp_BlitzFirst altogether. I think they’re distracting y’all from the real leading indicator, wait stats – and of course, sp_BlitzFirst shows that, so I’d rather focus your attention there.


What Was the Worst Database Decision You Ever Made?

Eyes up here, kid

I’m not talking about “oops” mistakes.

I’m talking about things you decided to do on purpose, and then deeply regretted later.

Me personally, my biggest database regret was that time almost twenty years ago when I ran a Profiler trace from my desktop computer.

My very slow desktop computer.

Which was connected to production over a VPN.

<sigh>

I can’t remember what happened first – running out of space on my C drive, or my VPN disconnecting – but in the hot mess of it all, I fielded a call from my manager asking about why the production SQL Server kept freezing up. Everything about that situation was frustrating.


[Video] How to Find Queries Using OPTION RECOMPILE (And Their Parameters)

For years, I hated troubleshooting servers with high CPU usage caused by queries constantly asking for new execution plans. Hated it. SQL Server just doesn’t make it easy to find queries with recompile hints.

Then Erik Darling’s sp_HumanEvents came along.

And now troubleshooting frequent compilations is as easy as this:
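(The exact command isn’t shown in this copy – this is a from-memory sketch, so check sp_HumanEvents’ documentation for the current parameter names:)

    -- Sample recompile events for 30 seconds, then report on them:
    EXEC dbo.sp_HumanEvents
        @event_type = N'recompiles',
        @seconds_sample = 30;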

Which produces this:

ARE YOU KIDDING ME? It gives me the parameters, the number of times the query compiled, how much time was spent doing those compilations, and more! It’s a SINGLE LINE OF T-SQL that sets up an Extended Events session, waits, tears it back down, and then lays out the output in a ridiculously easy-to-consume format!

Or, say you’re facing frequent re-compilations due to stats changes in high-turnover tables. No problem:

Here, I’m running a workload that causes a stats update in the middle of the workload, which triggers a recompile of a normally stable stored procedure:

It doesn’t get any easier than that. These are by no means the only tricks up sp_HumanEvents’ sleeve, either: it has all kinds of parameters to help you track down issues that would usually involve a lot of work.

If you prefer moving pictures with audio gibberish, here’s a live stream of me setting up the demo for this blog post:

Enjoyed this session? Follow me on Twitch, YouTube, or Facebook to be alerted whenever I’m streaming. Here’s the code I used with the Stack Overflow database in the live session:

 


A Single Database View Cost Me $50/Day for Months.

Our SQL Server monitoring product, SQL ConstantCare®, uses Amazon Aurora on the back end. I blogged about our choice of database back in 2018, and at the time, I said:

I know, I know. The Microsoft geek in you – and me – wants us to use SQL Server on the back end, but here’s the deal: SQL Server is not self-managing or cheap. (After all, you and I make very good livings keeping SQL Server up and running, right?) Whenever possible in this product, I wanted to use something inexpensive that somebody else managed for us.

Well, like anything else in the cloud, the costs creep up on you slowly over time.

It took me a really long time to notice because most of our Amazon hosting costs are due to the lab VMs for our Mastering training classes. Students get pretty beefy VMs to run their labs, and I run a lot of training classes. I really should have had separate AWS accounts for each of our business uses – these days, the best practice is to isolate your environments across a lot of accounts. Instead, I was using tags in AWS to break things up by budget.

One day, it hit me out of nowhere as I was looking at Cost Explorer: the Aurora RDS Postgres costs had gotten pretty high over time.

We were spending about $60/day, but we hadn’t upsized our instances. What was going on?

A peek into our bill revealed a nasty surprise:

A few lines down, there’s a pretty doggone large number of IOs: we’d done 8.3 biiiiiiillion of ’em, ringing in at an extra $1,663.68 that month.

We jumped into AWS Performance Insights, a monitoring tool that’s included free with RDS. It gives you a really nice point-and-click GUI to analyze queries by CPU, reads, writes, and more. I expected to find a single ugly query, but lots of them were doing a lot of reads, and it took me an hour or two of digging to realize…

All of the read-intensive queries were calling the same view.

It wasn’t even a large view: it was just a view that fetched the prior collection number & date for a given client’s server. If we were analyzing your metrics, we needed to join to your prior collection so we could see what changed. The tables in the view had fewer than 100K rows – it was just getting a pretty bad execution plan – well, bad relative to the small size of the query.

Each time we hit the view, we were incurring a few hundred milliseconds of time, which doesn’t sound like much – but we called that view a lot.

We debated tuning the view and its indexes, but Richie leapt into action and turned it into a table instead. After all, when we process a client’s data, we only need to fetch the prior collection ID & date once – it’s not like they’re adding additional past collections. Time only goes forward.
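I don’t have Richie’s actual code, but the shape of the fix looks something like this – all the table and column names here are invented for illustration:

    -- Compute each collection's prior collection once, at load time,
    -- instead of recalculating it in a view on every query:
    UPDATE c
    SET    PriorCollectionId   = prior.CollectionId,
           PriorCollectionDate = prior.CollectionDate
    FROM dbo.Collections AS c
    CROSS APPLY (SELECT TOP 1 older.CollectionId, older.CollectionDate
                 FROM dbo.Collections AS older
                 WHERE older.ServerId = c.ServerId
                   AND older.CollectionDate < c.CollectionDate
                 ORDER BY older.CollectionDate DESC) AS prior
    WHERE c.PriorCollectionId IS NULL;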

You can see the day when Richie implemented the table:

Our IO costs dropped by fifty bucks a day just with that one change.

Performance tuning in the cloud is about the tipping point.

No, not the tipping point of index seeks vs table scans: the point where increasing ongoing costs mean that you should stop what you’re doing and focus on eliminating some technical debt.

When we’re in the cloud, it’s up to us, the data professionals, to:

  • Review our monthly bill from time to time
  • Understand what IT operations are generating new costs (like in our case, read IOs)
  • Dig into which queries are causing those IT operations (like in our case, an inefficient view)
  • Figure out the most cost-effective fix for those operations
  • And then maybe most importantly for your career, produce the fix, and take credit for the new lower costs

Good data professionals can pay for themselves in reduced hosting costs. I know because I do it for other folks all the time – I just needed to do it for myself, too. Eagle-eyed readers will note that the dates on these screenshots are February. Yes, this happened a couple months ago – but what prompted me to write the post is that our costs have started creeping back up again, hahaha, so it’s time for me to go through and do another round of checks!


When a Columnstore Index Makes Your Query Fail

If you wanna count the number of rows in a table, I’ve usually said, “It doesn’t matter what you put inside the COUNT() – it can be COUNT(*), COUNT(Id), COUNT(1), or COUNT(‘Chocula’), or even COUNT(1/0) – it all works the same.”

And that was true right up until I added a columnstore index.

Start with the Stack Overflow database – any size will do – and do a plain ol’ COUNT when no nonclustered indexes are present:
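The demo scripts aren’t embedded in this copy, but the three variants amount to something like this (assuming the Users table):

    SELECT COUNT(*)   FROM dbo.Users;
    SELECT COUNT(Id)  FROM dbo.Users;
    SELECT COUNT(1/0) FROM dbo.Users;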

The actual execution plans for all three queries are the same:

Add a nonclustered rowstore index on any column and try the queries again:
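Any narrow nonclustered index will do – for example:

    CREATE INDEX IX_Reputation ON dbo.Users (Reputation);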

And again, all three query plans are the same because SQL Server optimizes away the contents of the COUNT(), using the index in all three cases:

The index gets used every time. So for years, ever since Jeremiah Peschka or Kendra Little (I can’t remember which one) showed me that 1/0 trick, I’ve used COUNT(1/0) because it’s funny.

Well, the laughing stops when you add a nonclustered columnstore index:
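Something along these lines:

    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Users
        ON dbo.Users (LastAccessDate);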

Because only two of those queries work at all! The first two work fine, scanning the nonclustered columnstore index, but the COUNT(1/0) fails with the obvious error:

The estimated execution plans for all three queries are now slightly different:

The COUNT(1/0) plan has a Compute Scalar operator that the other two plans don’t have, and here’s the disappointing definition:

Diabolical.

So what’s the lesson here? Well, don’t be amusing with your T-SQL, I guess, because sooner or later, the query optimizer behavior you depended on can change.


How to Make Online Presentations Fun and Interactive with OBS Scenes #tsql2sday

In my recent post about how I use Streamlabs OBS for training classes & webcasts, I mentioned that to keep things lively, you wanna have a variety of scenes that you can switch between to keep the viewer engaged. Imagine a TV show or sporting event that only used one camera angle and screen layout for 30-60 minutes on end: you’d be bored out of your gourd, and you’d tune out.

For OBS, that means using scenes. I’m going to give you a high-level overview of the scenes I use, but for an even more technical post, check out Suz Hinton’s live coding setup – she’s Noopkat on Twitch.

When I first started out, I used an off-the-shelf set of scenes. After streaming for a few weeks, I made notes of what I wanted in my scenes and what I didn’t, and watched a bunch of streams in the Science & Technology category for inspiration. Armed with that knowledge, I hired szvoreny from the Fiverr stream designers to make a custom set of scene overlays. He does awesome work and I’d recommend him highly, but be patient – lots of folks are hiring designers for custom streams right now, and good stuff takes time. Expect a 1-2 week turnaround despite Fiverr’s promise of 3-4 days.

Starting Scene

I start streaming with this scene about 5 minutes before I go live on camera. People get notifications when their followed streamers are going live, and I want them to have a few minutes to get the browser open and make some popcorn. This scene shows what’s about to start and when, plus feeds from cameras 2 & 3 – more on their placement in a minute:

I type in the “starting at the top of the hour” manually before I go live. At some point, I’d like a more automated solution that pulls stuff from a text file with my upcoming calendar, but baby steps.

Live Scene

When gamers stream, they have an overlay that shows their game in a window, plus their webcam in a window, plus their recent followers, subscribers, etc. I take a minimal approach instead: I just show my desktop, overlay my webcam on top of it, and that’s it for permanent stuff. Here’s what it looks like when I’m working in Github, for example:

Comments automatically pop up above my head and disappear after 15 seconds. Note that you’re seeing live comments from multiple sources in that screenshot, both YouTube and Facebook. Like I mentioned in my Streamlabs OBS post, I use Restream to simultaneously stream to Twitch, YouTube, and Facebook, and Restream Chat centralizes all of the discussion. Everybody feels like they’re participating regardless of where they’re leaving comments, and all of the comments show up onscreen, live, in the same format.

I’m not actually sharing my whole desktop – I’m only showing a 1920×1080 portion of a 4K monitor. This means I can use the sides & bottom for things I need while I’m streaming. For example, when I streamed How to Think Like the Engine All-Demo Edition, I had my presentation notes on the left, and the Restream Chat window on the right so I can see what people are saying:

I run Streamlabs OBS on monitor 2 (the smaller right side 1080p one) so I can monitor what’s going out to the audience just in case something goes horribly awry.

When I want to show PowerPoint, I have a second Live Scene that shares monitor 2 instead, and also uses a bigger picture of me since I don’t need as much space for demos, and I often step to the side to point stuff out onscreen like some kind of weather-person, hahaha:

This lets me use Monitor 1 for my PowerPoint presenter view, so I can still see what slides are coming up next.

Camera 2 Chat Scene

From time to time, when I want to discuss something at length or when a particularly good question comes in, I switch to this scene showing the chat, most recent subscriber & follower, and my camera 2:

 

Or this scene, which shows my camera 2 fullscreen – and in this example, Marc Tomkinson just subscribed, so it’s showing a thank-you animation as soon as he subscribes, but then it disappears a few seconds later:

This camera-2 setup feels like a much more intimate atmosphere with the audience, just you and me, and it’s also more three-dimensional since I’m not a floating character standing in front of the demos. It gives the audience a feel for what my office is like.

Camera 2 is another Sony A6100 on the side of my desk, plugged into my Blackmagic Design DeckLink Quad HDMI card:

It looks tilted just because of where I’m standing. The camera is right in front of a @ChrissieZullo (Instagram, store) print that a friend & client gave me that I really liked a lot. I switch out different prints & drawings here. I find that when I look at a piece of art I love, especially a smiling face, I light up more.

I wouldn’t recommend putting the camera on a tripod on your desk because it’ll vibrate whenever you lean on the desk or type. As I type this, a replacement floor-standing light stand is en route that will host both the light you see and the camera. Since they’ll be mounted to the floor, they won’t be susceptible to the vibrations. I should have done that from the get-go.

Sponsor Shout-Out Scene

On weeks when someone is sponsoring BrentOzar.com, I take a break every so often on the webcast, stop, and do a shout-out to the sponsor. I talk about what they’re trying to promote this week, and give folks a URL to learn more, like BrentOzar.com/go/experts to register for my free webcast with Pinal Dave:

Can I just stop here for a second and say that it’s utterly, ridiculously amazing that I get to do this for a living? You can too. I wrote Rock Stars, Normal People, and You ten years ago, and I still believe wholeheartedly in that today. The whole reason I’m writing this is to give you, dear reader, all the tools you need in order to make a living doing what you love. It’s all laid out here in front of you – you just gotta have the dedication to do it. Get ‘er done.

Pause Scene

Every 45-60 minutes, I take a bio break and fire up this scene with camera 3, and the attendees can also still see each other chatting on all the different stream services too:

This camera’s really low-budget, though, since quality doesn’t matter much. Older iPhones show up from time to time on Woot, and I paid $140 for an iPhone 7. Using one as a camera is a little convoluted. When you plug an iPhone into a Mac, the screen of the iPhone (not the camera, mind you, but the screen) shows up as a camera source in most video apps. This was probably designed for app developers who want to record their apps for demos. However, when you use the app Full Screen Camera, the iPhone’s camera (either front or rear) takes up the whole display with no buttons, so you can use the camera as an input source.

Add a tiny tripod and a 10′ iPhone cable, and you’ve got a webcam with a built-in monitor for under $200. Is that crazy or what?

It’s kinda neat because attendees see the current daylight & weather in San Diego, plus they see me walking past as I leave the office to go get hydration. I move the camera around to different places – sometimes it shows the window, sometimes art on the wall, sometimes my desk.

When I trigger the Pause scene and I’m streaming to the public on Twitch, then 15 seconds after I trigger the Pause scene, my Elgato Stream Deck also triggers 90 seconds of ads. This lets me reset the Twitch ad clock for folks so they don’t see a pre-roll ad when they come into my channel.

Transitioning Between Scenes

In my last streaming post, I mentioned that I have keys set up on an Elgato Stream Deck for each scene, and when I hit a scene’s key, it switches over to it. Well, it’s actually a little more complicated than that.

Each scene has a button, but it’s actually a multi-action button: a single button configured to perform several actions in a row. For example, when I switch to my Live 1 scene (the VM or RDP window), it:

  • Switches the scene
  • Turns on the right lights (and turns off the unneeded ones)
  • Starts (or continues) recording in OBS
  • Un-mutes my microphone
  • Makes a marker in the Twitch recording (useful for folks who want to skip around to parts of the stream where things changed)

If my lights are already on, OBS is already recording, or my microphone is already unmuted, then the latter 3 actions have no effect. However, they save my bacon if I forgot to start recording or unmute my microphone before the webcast started.

The Pause scene basically does the opposite of all those actions: it opens Pretzel (podsafe music player) and hits Play, switches to the Pause scene, mutes my microphone, turns off all my lights, and stops the recording. Streamlabs OBS then puts the session recording in a folder on my computer. During my private training classes, the newly arrived file triggers the Vimeo app to automatically upload that module’s recording to Vimeo.com. After the day’s classes, I can then grab the URLs for each module’s videos, and update the students’ Instant Replay recordings in their training account.

If you wanna learn more about how I use the Stream Deck, I did a Twitter thread with info about each of my Stream Deck keys.

The end result is a lively, fun stream.

 

When you watch typical game streams on Twitch, the game itself provides a lot of visual variety and action. When you’re watching someone live code or give a presentation, it’s up to the presenter to bring a level of variety and interaction. Overlays and scenes are great tools for that goal.

Having a lively, fun stream is harder than a lively, fun in-person presentation. No doubt.

But I bet you’ll find the result totally worth it. Now that I’ve got the plumbing in place, I find this kind of online session just as fun as in-person ones because folks can be so interactive.


[Video] Top 3 Pitfalls of Managing Your Azure SQL DB Costs

You manage servers, not invoices. Your company has started to move their databases into Azure, and they’re asking questions that you’re just not ready to answer. How big does your server need to be when you’re moving an existing database up to the cloud? How are servers measured? How do you measure to know if your new Azure SQL DB is overwhelmed, or if it’s a code or index problem? How do you make the right adjustments once you’re up there?

In some webcasts, I give you all the answers. In this webcast, I’m mostly just giving you the questions – but I’d rather have you learn them quickly in a short webcast instead of learning ’em the hard way. Here goes:

If you enjoyed this webcast, you can catch me doing ’em live by following me on Twitch, YouTube, or Facebook.


Attention Live Class Season Pass Holders: New Mastering Parameter Sniffing Videos Available!

Time to get your learn on: this week’s Mastering Parameter Sniffing class recordings are already available in your account here if you’ve got a Live Class Season Pass. (What’s that? You don’t have one yet? Well, tell you what: click here and I’ll make you a special deal.)

Just to give you a taste, the first module, How Index Tuning Reduces the Blast Radius, is free to watch through this weekend:

(Videos can’t be embedded in the email newsletter or RSS feeds – if you’re reading there, click here to watch the video.)

I bet you’ve sat in boring webcasts before where you can’t see the speaker, they’ve got a really tinny microphone with all kinds of booming echoes, and there’s no interactivity between you and Charlie Brown’s parents on the other side of the screen. My sessions are the opposite of that: super interactive, high quality info, and high quality delivery. I think it’s even better than classroom sessions because folks are often more comfortable typing in questions, sharing demo code that they want me to try, and even multi-tasking – keeping the live session up while they’re doing other things.

The next Season Pass rotation starts June 15th.
Wanna get weeks of training for $995?

My Live Class Season Pass holders always get free access to any of my live online classes and their recordings – including new stuff like this one that I bring out during the course of your subscription. You don’t even have to register: you can just drop in on any of these when your schedule permits:

Here’s the order I recommend taking ’em in so you’re not overwhelmed:

  • Fundamentals: Indexes, then Queries, then Parameter Sniffing
  • Mastering: Indexes, (optional: Parameter Sniffing), then Queries, then Server Tuning

The very next rotation starts next Tuesday, June 15. You can take the above classes in order, and by September, your tuning skills will be absolutely next-level. Plus, your learning isn’t done there: you can keep dropping in on my live classes and watching the recordings again all year long, picking up things you missed.

Level up your skills now for just $995. See you in class!


Updated First Responder Kit and Consultant Toolkit for June 2020

If you’re saving the contents of sp_BlitzFirst, sp_BlitzCache, and sp_BlitzWho to tables every 15 minutes, you should definitely install this month’s updates. There are several key improvements in here to help you troubleshoot plan cache turnover due to unparameterized queries, plus make it way easier to gather query plans that are having parameter sniffing problems.

If you’re not saving this stuff to table, it’s as easy as creating a diagnostic database, and then setting up an Agent job to run this every 15 minutes:
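Here’s a minimal sketch of that job step, assuming you’ve created a database called DBAtools to hold the output (the parameter names are the documented sp_BlitzFirst ones):

    EXEC dbo.sp_BlitzFirst
        @OutputDatabaseName = 'DBAtools',
        @OutputSchemaName = 'dbo',
        @OutputTableName = 'BlitzFirst',
        @OutputTableNameFileStats = 'BlitzFirst_FileStats',
        @OutputTableNamePerfmonStats = 'BlitzFirst_PerfmonStats',
        @OutputTableNameWaitStats = 'BlitzFirst_WaitStats',
        @OutputTableNameBlitzCache = 'BlitzCache',
        @OutputTableNameBlitzWho = 'BlitzWho';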

It automatically creates the tables and prunes them to keep just 7 days’ worth of history by default, and you get tons of useful diagnostic data about why your server was slow last night, and which queries were having parameter sniffing issues around that time.

To get the new version:

Consultant Toolkit Changes

Updated the First Responder Kit with this month’s script updates, plus:

sp_Blitz Changes

sp_BlitzCache Changes

  • Improvement: warns about lots of single-use plans in the cache. (#2359, #2390, #2398, thanks Erik Darling.)
  • Fix: May’s release broke on SQL Server 2008 & R2 due to a change in the way that 2012 & newer – you know, the versions you’re supposed to be running since they’re supported – show pages in sys.dm_os_memory_clerks. (#2372, thanks West Seattle Coug.)
  • Fix: wasn’t showing queries whose duration exceeded the @MinutesBack setting. For example, if you ran it for @MinutesBack = 15, and a query had been running for hours and just happened to finish in the last 15 minutes, it wouldn’t show up. We were using the last_execution_time (date/time) from sys.dm_exec_query_stats, but that records when the query starts, not when it finishes. This also adds a new LastCompletionTime to the sp_BlitzCache output tables – that column will be automatically added (but not backfilled) the next time you run sp_BlitzCache to save to table. (#2377, thanks rrankins and Erik Darling.)
  • Fix: when logging sp_BlitzCache to table, QueryPlanHash wasn’t being populated. The column was there, but it was just always null. (#2396)

sp_BlitzFirst Changes

sp_BlitzLock Changes

sp_DatabaseRestore: Fixes Awaiting Your Testing

Users have contributed 3 changes/improvements that need testing before we can get ’em into the dev branch. If you can test these by downloading the code and seeing if they work in your environment, please leave a note on the issue with your thoughts on it, and if a user or two agrees that they’re production-worthy, I’ll merge ’em in:

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.


Updating Statistics Causes Parameter Sniffing.

In my free How to Think Like the Engine class, I explain that SQL Server builds execution plans based on statistics. The contents of your tables inform the decisions it makes about which indexes to use, whether to do seeks or scans, how many CPU cores to allocate, how much memory to grant, and much more.

I’m tired of smelling your plans.

When the statistics on an object change, SQL Server says to itself, “Hey, the next time a query comes in that references this object, I’d better build a new execution plan because my old one might not be a good fit for the new data distribution.”

That’s usually a good thing: you want accurate plans.

However, you’re also putting yourself at risk.

Whenever you update statistics on a table or index, you’re also telling SQL Server that all execution plans that touch that table should get a brand new plan based on whatever parameters happen to come in next. As I talk about in my Fundamentals of Parameter Sniffing class, that means the more you update statistics, the more chances you’re taking: you’re purposely freeing parts of the plan cache, often large parts, and taking a gamble on the very next parameters that come in.

Updated statistics can build better query plans.

In a perfect world, you would only update statistics in cases where query plans would benefit from fresh statistics.

To understand what I mean by that, let’s look at the Users table in the Stack Overflow database, and think about how often the contents of each column would change in a way that would affect our query plans.

The classic scenario where frequent stats updates are vital is a date column that focuses on current activity. In a data warehouse, that means loading new sales for yesterday. In the Stack Overflow database, the equivalent is the LastAccessDate column: users are logging in all day long.

Say we’ve got an index on LastAccessDate, and a stored procedure that queries people by ranges of that date:
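The original demo isn’t included in this copy, but here’s a sketch of the shape of it (index and proc names assumed):

    CREATE INDEX IX_LastAccessDate ON dbo.Users (LastAccessDate);
    GO
    CREATE OR ALTER PROCEDURE dbo.usp_UsersByLastAccessDate
        @StartDate DATETIME, @EndDate DATETIME
    AS
    BEGIN
        SELECT DisplayName, LastAccessDate
        FROM dbo.Users
        WHERE LastAccessDate BETWEEN @StartDate AND @EndDate
        ORDER BY DisplayName;
    END
    GO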

When that query runs, SQL Server has to make a decision about whether to use that index, and how much memory to grant for that sort on DisplayName. If 1% of our users log in today during the day, after we’ve updated statistics overnight, and we try to run that search query:

Then we might not accurately estimate the number of rows that’ll come out, since the data was updated after the stats update. This under-estimation might mean sorts end up spilling to disk, or that we choose an index seek where a table scan would have been more appropriate:

In the above plan, SQL Server inefficiently chose an index seek + key lookup and ended up reading more pages than there are in the table. In this scenario, you want frequent statistics updates – and heck, on a very active web site, you might not even be able to update stats fast enough to get plans that accurately reflect the data that will be in the table over the course of the next few hours. This is why folks end up resorting to query hints, forced plans, and building out fake statistics for future data. (I’ve never done that last one, but I find it mesmerizing.)

Frequent stats updates backfire on other columns, though.

If I write a query that says, “Show me all of the users whose Location = San Diego, CA,” that data distribution just doesn’t change that often. Sure, as we add progressively more and more users to the table, we stand a slight chance that the estimate will be off by a few or a few hundred rows – but it’s not a large distribution change for any one value over the course of several weeks. In a mature database with years of history, we could leave the same statistics in place on Location for months at a time without worrying.

The data distribution inside the table doesn’t change that much that quickly.

But holy cow, the data distribution per query changes constantly.

Say I’ve got an index on Location, and a stored procedure that looks like this:
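Again, the original proc isn’t shown in this copy – a hypothetical stand-in:

    CREATE INDEX IX_Location ON dbo.Users (Location);
    GO
    CREATE OR ALTER PROCEDURE dbo.usp_UsersByLocation
        @Location NVARCHAR(100)
    AS
    BEGIN
        SELECT DisplayName, Location, Reputation
        FROM dbo.Users
        WHERE Location = @Location
        ORDER BY Reputation DESC;
    END
    GO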

After a statistics update, the very next Location I call it with will determine everyone’s performance for the day:

  • If a large location like India goes in first, everybody gets a table scan that goes parallel and gets a huge memory grant
  • If a small location like San Diego goes in first, everybody gets an index seek + key lookup that goes single threaded and gets a tiny memory grant

So the worst case scenario for parameter sniffing emergencies would be to rebuild the statistics on this table every day. Every morning, you’d wake up and stumble into work with a 50% chance of table scans, and a 50% chance of TempDB spills. Every morning would be a different emergency – even though the data distribution isn’t changing at all.

Plan cache geeks can spot it by looking at the plan_generation_num column in sys.dm_exec_query_stats. Each time a plan is recompiled due to things like stats changes, the plan_generation_num will increment. Higher numbers indicate more frequent compilations – but at the same time, plan_generation_num = 1 doesn’t mean you don’t have a parameter sniffing problem, either. (I dig into that in Mastering Parameter Sniffing.)
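If you wanna eyeball it yourself, a quick sketch:

    -- Plans that have been regenerated the most times:
    SELECT TOP 50 qs.plan_generation_num, qs.execution_count, t.[text]
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
    ORDER BY qs.plan_generation_num DESC;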

That’s why I update stats weekly until I have a reason to do it more frequently.

I run into parameter sniffing issues much more often than I run into the ascending-date-range problem. I’d rather default to safe weekly stats updates, which leads to more plan cache stability and easier weekday mornings.

Then, when I find specific stats that are wildly out of date within a day or two, I … no, I still don’t update stats daily. I just try to tweak the indexes and queries first to make them easier for SQL Server to build fast, efficient plans.

Because when you resort to daily stats updates to fix queries like this:

Then you’ve probably forgotten something important: this query is vulnerable to parameter sniffing too!

After your daily stats job, if someone happens to call it for a year-long (or one-hour) date range, then you’re right back in hot water with a query plan that won’t work well for everyone.

And that’s why I teach a 3-day long Mastering Parameter Sniffing class. When I started writing it, I wasn’t sure I could fill 3 days, but the more I prepped, and the more I read of our past parameter sniffing posts, the more I realized 3 days might not even be long enough!


SQL ConstantCare® Population Report: Spring 2020

Ever wonder what’s “normal” out there for SQL Server adoption rates, hardware sizes, or numbers of databases? Let’s find out in the spring 2020 version of our population report.

Our SQL ConstantCare® service lets users send data in about their SQL Servers and get health & performance advice via email. Users can opt in to share their anonymized data to get better recommendations overall.

This data isn’t a full picture of the SQL Server ecosystem. Lemme be really clear: this is by no means a random sample of the full population. It’s just a picture of the kinds of servers where people want an inexpensive second opinion. It doesn’t include giant corporations, hospitals with PII, or tiny businesses who just shove a SQL Server in the corner and then never take a second glance at it (and if my experience is any indication, never back it up.)

Even though it isn’t a full picture, with 3,878 database servers sharing data as of May 30 (up from 3,570 in the last report), there’s still a lot of useful information in here for the community. Understanding what servers are out there will help you put context around where your own servers rank, and helps the community better build training material that’s relevant for most users.

This population report is just my attempt to help clear up the fog of war. Microsoft simply doesn’t give us any adoption information (even though SQL Server phones home by default.) If you believed what you saw onstage at conferences and in Microsoft’s changing certification programs, you’d think no one still bothers with self-hosted SQL Server, but the few who do have already migrated to SQL Server 2019 running on Kubernetes and Linux. This report is my humble attempt to keep Microsoft honest.

This time around, I built this post in an Azure Data Studio notebook live in front of a streaming audience, so the charts and tables look a little different. You can download the notebook to play with it yourself (but can’t run the queries, of course), and you can watch the video for that one at the end of the post.

Let’s start with a simple question:

What versions are the most popular?

Woohoo! This quarter, SQL Server 2019 adoption outpaced SQL Server 2008!

Thoughts on reading that chart:

  • SQL Server 2019 still hasn’t caught up to 2008R2.
  • If you add together 2008 & 2008R2, almost 10% of the servers out there are no longer under support. (That number is actually slightly higher since we don’t monitor 2000 or 2005, and every time I talk versions in front of a group, folks talk about the 2000 and 2005 instance they still have cowering in shame in a closet somewhere.)
  • SQL Server 2017 adoption continues to rise, but frankly, 2016 is still the juggernaut: 1 in 3 servers runs 2016.
  • You’ll notice that the graph format is different this time around – that’s because I’m using Azure Data Studio’s built-in charts. The experience isn’t great – hopefully that’ll improve over time as well.

I’m going to continue publishing the numbers in this format (only showing the current population) for the first year of the survey, but on the first year’s anniversary, I’ll switch to graphs that show trending and year-over-year changes.

How about development? Are people maybe testing their apps on newer versions?

What versions are people using in development?

SQL Server 2019’s adoption rate for development servers doubled from the last quarter, and that’s a great sign. People are starting to develop more against 2019. However, 2017’s growth wasn’t as strong – I think it basically means people are using 2019 as their default new dev box, but it doesn’t bode well for a rapid adoption of 2019 in production over the next couple of quarters. People just aren’t testing on it yet, and testing is utterly critical for 2019 given its radical differences in execution plans (batch mode on rowstore, inlined scalar functions, etc.)

I think about this chart a lot when I’m designing training. When is the right time to start building developer training material for SQL Server 2019? It’s a tricky timing act: you need to be skilled-up before you adopt the new version, and I need to have the training material ready to go before then.

However, if I focus on building 2019-specific training material right now, it could be 6, 12, even 18 months before it really starts to pay off. Judging by this chart, I’d be better off building 2017-specific training material since there are a lot of people who still haven’t adopted it yet. I use 2019 in my class lab VMs, but I still focus on 2017 compat mode (and only require the bring-your-own-VM students to use 2017 minimum, not 2019.)

For the rest of the census report, I’m going to focus on just the production servers.

How much data do servers have?

For quick, easy reporting (and because accuracy here has relatively limited value), I’m going to use user database data file sizes, not contents, and not log files.

Keep in mind that this is the total data file size for all user databases on the server, and these numbers really help to keep the world in perspective.

Over half of all servers host less than 225GB of data.

Over 2/3 of servers are smaller than the Stack Overflow demo database, and that doesn’t even have nonclustered indexes. That’s why I love using that database for training: it challenges a very good chunk of the audience.

Consistent with the last population report, about 15% have a terabyte or more of data, with some of them having dozens of terabytes. That’s useful because it gives me over 500 servers’ worth of metadata to help guide folks in larger tiers. Performance advice is dramatically different at that scale: it’s so much harder and more expensive to throw memory and indexes at problems. Heck, just creating an index on a 1TB table can represent a big challenge for some shops.

How much hardware do production servers get?

Again, really helps set expectations: the majority of servers out there are pretty doggone small. You can go a really, really long way with just a few CPU cores, especially if you’re diligent about query tuning. At $2,000 per core for Standard Edition, you wanna avoid throwing cores at a problem if you can avoid it.

About 2/3 of the servers have less CPU power than a modern pro-grade laptop. That’s not a number to be ashamed of – that’s a number to celebrate! You can do a lot with a little. As with last quarter’s report, only about 10% of servers have over 24 cores.

Memory isn’t licensed – although Standard Edition has a ~128GB cap – so how much memory do folks have at the OS level?

I’m specifically looking at memory for the entire server OS here, like how much Windows sees.

There are several ways you can look at SQL Server memory, like target (how much it wants to use) or total (how much it’s currently using.) I kept it simple and I’m just looking at OS memory because I want to get a feel for the hardware population overall, not how people are configuring SQL Server, or whether their workloads are causing total memory to rise to the target counter.

When you think about these metrics, remember that over half of all production servers host <225GB data files. If you handed me a server with, say, 200GB of data and asked me how much memory to assign it, I think 32-48GB is actually a pretty realistic number. I might even call it generous – keeping in mind that RAM is still ridiculously overpriced in the cloud, and it’s tied to core counts.

Now, look at the metrics: half of the servers host <225GB of data, and half of servers have <48GB RAM. That makes decent sense. 0-15GB RAM sounds terrifying, but remember that 1,049 servers are hosting <25GB of data.

When you’re at these really low database sizes, I can see why people would think they’d be interested in a container-based solution: the overhead of 1,049 servers running Windows is pretty ugly. Containers might remove some of that overhead.

Bonus: what unusual wait types are showing up?

One of the fun things with gathering performance metrics on thousands of servers is that we get to see some pretty nifty edge cases:

Ooo, neato! PVS_PREALLOCATE, you say? SHRINK_CLEANER_SYNC? Time for me to go spelunking through the raw data to see what I can learn, and how I can help customers make their servers faster and more reliable.

Watch Me Build This Post with Azure Data Studio

I need to warn you a little, though, dear reader: I’m using Azure Data Studio in here, and I had some rather frustrating experiences with the graphing functionality. It’s a good overview of what it’s really like to use this product hands-on to build posts like this, though:


[Video] Watch Brent Write Queries

This morning, I wanted to write a few new “background noise” queries for my Mastering classes. In those classes, the students run live workloads against the Stack Overflow database and troubleshoot performance issues, and the more of a varied workload I’ve got running, the more realistic it feels. For this session, I specifically wanted to write quick queries – queries that would produce real-world style data in a few seconds, in the style a user would actually write them.

I streamed the whole thing live on Twitch, and you can watch – or on YouTube if your company blocks Twitch.

If you wanna jump around to different parts of the video, here are the key points where I started new queries:

  • 7m:33s: I introduce the topic of what I’m trying to do. The first query fetches the tags a user works on the most often and shows their scores. I happen to run across Gordon Linoff, a prolific answerer.
  • 26m:08s: I let y’all tell me what queries you want me to write, and then I start working on finding the top tags in a date range.
  • 34m:54s: I explain why I love COALESCE and how a friend of mine ended up putting a seasickness pill up their butt.
  • 42m:30s: I code a leaderboard of the top questions with the most answers.
  • 1h:04m: I get a great question about parameter sniffing on update statements, so I code a demo showing it live.
  • 1h:41m: I find the most recent answers from the top 10 most active folks, and explain why you have to use hard-coded dates rather than GETDATE() when working with older Stack Overflow exports.
  • 1h:57m: I write a query to find average response times for a given date range & tag, build it atop a view to simulate what users would do, and then write another proc atop that view to find the fastest answers in a date range.

Enjoyed this session? Follow me on Twitch, YouTube, or Facebook to be alerted whenever I’m streaming.

Here are the scripts from the demo:

 


Attention Season Pass Holders: New Fundamentals of Parameter Sniffing Videos Available!

If you’ve got a Recorded Class Season Pass or Live Class Season Pass, time to get your learn on: yesterday’s Fundamentals of Parameter Sniffing class recordings are now available in your account here. As a reminder, here’s the class abstract:

Sometimes the exact same query goes slow out of nowhere. Your current fix is to update statistics, rebuild indexes, or restart the SQL Server. It works, but you don’t know why.

You’re facing a problem called parameter sniffing: SQL Server “sniffs” the first set of parameters for a query, builds an execution plan for it, and then reuses that same plan no matter what parameters get called. It’s been the bane of our performance tuning for decades, and it doesn’t appear to be going away anytime soon – in fact, in newer versions of SQL Server, it even gets worse!

In this one-day class, you’ll follow along with me in the Stack Overflow database on your own laptop or desktop as you learn:

  • What kinds of queries are vulnerable to parameter sniffing
  • How to test your code to see if it’s vulnerable
  • Why tactics like rebuilding indexes and updating statistics fix one query, but make others worse
  • How to proactively patrol your server to uncover the problems before they bring you down

This course is 100% demos: the only slides are the introductions at the start of the day, and the recap at the end of the day. The rest of the time, we’ll be working in SQL Server Management Studio. Roll up your sleeves: let’s do this.

After this class, you’re ready for Mastering Parameter Sniffing on June 3-5 (iCal). To attend that one, you’ll need a Live Class Season Pass: when you’ve got one of those, you get to attend any live online class I do throughout the span of your membership, even new ones that I bring out like this one. Just make sure you read the logistics, do the prerequisites, and set up your lab server. See you in class!


SQL ConstantCare® Now Tells You Which Indexes to Drop.


Indexes slow down data changes.

SQL Server doesn’t have the concept of asynchronous indexes – copies of the data that it could update later when it’s bored. It has to keep every nonclustered index up to date, every single time you change data, like a delete/update/insert.

Mo’ indexes, mo’ problems.

That’s why in my Mastering Index Tuning class, I start my D.E.A.T.H. Method with Deduping and Eliminating indexes that aren’t getting used. You wanna get rid of the dead weight that’s only slowing you down, and it’s especially important before adding more indexes.

The safest indexes to remove are ones that:

  • ARE being written to (because SQL Server has to keep them up to date) – I’m not as worried about archived or read-only tables
  • Aren’t used for uniqueness constraints (because they might be influencing execution plans even though they’re not used)
  • Aren’t columnstore indexes (because the presence of those can affect execution plans in other ways)
  • Haven’t been used at all for reads in over a month straight (ensuring they’re not used in month-end processing, either)

But you’re busy, and you have a lot of databases and servers, and you don’t set yourself a reminder to check to see which servers have been up for 31+ days, and then go in and check the index utilizations. And even if you did, you wouldn’t want to script out all the undo scripts to back out your changes.
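If you do wanna go the manual route, here’s a rough sketch of the check – this is not SQL ConstantCare’s exact logic, just the general idea:

    -- Usage stats reset at restart, so confirm 31+ days of uptime first:
    SELECT sqlserver_start_time FROM sys.dm_os_sys_info;

    -- Nonclustered, non-unique indexes that are written to but never read:
    SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name,
           us.user_updates AS writes_maintained
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
           ON us.object_id = i.object_id
          AND us.index_id = i.index_id
          AND us.database_id = DB_ID()
    WHERE i.type_desc = 'NONCLUSTERED'
      AND i.is_unique = 0
      AND OBJECTPROPERTY(i.object_id, 'IsMsShipped') = 0
      AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0)
        + ISNULL(us.user_lookups, 0) = 0
      AND ISNULL(us.user_updates, 0) > 0
    ORDER BY us.user_updates DESC;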

SQL ConstantCare® now includes drop & recreate commands in your daily prescription.

Every day when your diagnostic data comes in, we examine your index usage and server performance. When we find good candidates for removal, we script out both the drop & recreate commands in your prescription file for easy running:

As with all of our prescription recommendations, you just wanna read through ’em as a sanity check, then run ’em. Note that the indexes are even fully qualified with database names: that’s because we want to make it super-easy for you to fix all of your databases just by hitting execute.

Sign up for SQL ConstantCare®, and then get started in minutes. We’ll help you take out the trash.


[Video] Watch Brent Write T-SQL

This morning I worked on a new check for sp_BlitzFirst to find statistics that were updated in the last 15 minutes, possibly causing plan caching issues and parameter sniffing. I streamed it live, and you can watch.

In part 1, the first hour, I write the proof-of-concept query:
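(The finished check lives in sp_BlitzFirst, but the general idea behind the proof-of-concept is a sketch like this:)

    -- Statistics updated in the last 15 minutes in the current database:
    SELECT OBJECT_NAME(s.object_id) AS table_name, s.name AS stat_name,
           sp.last_updated, sp.rows, sp.modification_counter
    FROM sys.stats s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
    WHERE sp.last_updated >= DATEADD(MINUTE, -15, GETDATE())
    ORDER BY sp.last_updated DESC;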

In part 2, the second hour, I move the query into sp_BlitzFirst, test it, debug it, and check it into Github:

And finally in the third hour, I switch to open Q&A and do a few live demos:

And now, I’m enjoying the well-earned bagel that I discussed on the webcast:

