Register Now to Watch Brent Tune Queries Tonight (Free)

It’s last-minute, I know, but I’m doing an online user group session for the Pensacola Data Professionals, and you’re welcome to attend online free if you like. It’s tonight at 8PM Eastern, 7PM Central, 5PM Pacific. Here’s the abstract:

Watch Brent Tune Queries – Ever wonder how somebody else does it? Watch over the virtual shoulder of Microsoft Certified Master Brent Ozar as he walks you through the Stack Overflow public database export, shows you two queries, and then tunes them to make them dramatically faster. Along the way, you’ll see several free tools in action that help make the process easier.

You can register to attend online, or show up in person in Pensacola. The session won’t be recorded, but if you’d like to see a previous version with other queries, you can watch one on YouTube. Enjoy!

How to Remove Times from Dates in SQL Server

Say that you’ve got a table with a datetime column in it, like the LastAccessDate column in the Stack Overflow Users table:

And let’s say you wanna find the users who last accessed the site on September 9, 2018 – regardless of the time. Here are a few ways to do it:
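The original screenshots don't reproduce here, so here's a sketch of the kind of queries involved – the exact CONVERT styles are my stand-ins, but the shapes match the discussion that follows:

```sql
/* Version 1: convert the datetime to a string, compare strings */
SELECT COUNT(*)
  FROM dbo.Users
  WHERE CONVERT(CHAR(10), LastAccessDate, 121) = '2018-09-09';

/* Version 2: same idea, different string style */
SELECT COUNT(*)
  FROM dbo.Users
  WHERE CONVERT(CHAR(10), LastAccessDate, 101) = '09/09/2018';

/* Version 3: cast the column to a DATE */
SELECT COUNT(*)
  FROM dbo.Users
  WHERE CAST(LastAccessDate AS DATE) = '2018-09-09';
```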

Here are their actual execution plans. I’m using SQL Server 2019, with 2019 compatibility level, and the 50GB StackOverflow2013 database.

The first two versions of the query are effectively the same, both doing a table scan.

When I CASTed the LastAccessDate as a DATE column, SQL Server did use an index on LastAccessDate. At first glance, that might seem like a good thing because generally speaking, you want to use indexes to make your queries go faster.

However, there’s a dark side: when you check the output of SET STATISTICS IO ON, the third query is actually reading more pages than the first two. The combination of the index seek + key lookup is bad when we’re dealing with this many rows:
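To measure that yourself, wrap any of the variants in SET STATISTICS IO – for example, the CAST version:

```sql
/* Show logical reads per table in the Messages tab */
SET STATISTICS IO ON;

SELECT COUNT(*)
  FROM dbo.Users
  WHERE CAST(LastAccessDate AS DATE) = '2018-09-09';

SET STATISTICS IO OFF;
```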

The problem is that when you CAST something as another data type, SQL Server’s estimates about the query’s output may not be as accurate. Look more closely at the estimated numbers of rows of output for each of the queries:

In the top two, SQL Server brought back 18,614 of an estimated 19,624 rows. That’s a pretty doggone good estimation!

In the last one with CAST, SQL Server only expected 6,364 rows. Because of that somewhat low estimate, it thought an index seek + key lookup would be more efficient for fewer rows, so it used the index, and ended up working too hard, doing too many logical reads.

You usually want accurate estimates,
and CASTing usually won’t get you there.

There are always edge cases where bad estimates deliver faster query results, but generally speaking, if you want to strip off times, you’re better off with one of these two approaches:
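Sketched out, assuming the same September 9 target date, the two approaches are an open-ended range with literals, or the same range built from a parameter:

```sql
/* Approach 1: open-ended date range – start of the day, up to (not including) the next day */
SELECT COUNT(*)
  FROM dbo.Users
  WHERE LastAccessDate >= '2018-09-09'
    AND LastAccessDate <  '2018-09-10';

/* Approach 2: same range, built from a parameter with DATEADD */
DECLARE @TargetDay DATETIME = '2018-09-09';
SELECT COUNT(*)
  FROM dbo.Users
  WHERE LastAccessDate >= @TargetDay
    AND LastAccessDate <  DATEADD(DAY, 1, @TargetDay);
```

Either way, the predicate stays on the bare column, so SQL Server can use its statistics on LastAccessDate directly.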

It’s more work on your part than simply CASTing the table’s contents as a date, but your work will be rewarded with more accurate estimates on SQL Server’s part, and that’ll lead to improvements throughout your query plan as you join to more and more tables.

Building PasteThePlan: Removing Comments

We have this tool, PasteThePlan, where you can copy/paste the contents of execution plans, and then share the links around the web. You’ve pasted 16,621 plans so far, and I’m always amused to see them continuing to trickle in every day without us actively promoting the tool. It just works, doing its thing quietly.

Back in 2017, we added Disqus comments at the bottom of each plan – mostly because it was easy to do, and I thought it might make your work easier if you could carry on a discussion with folks while you were working on it together.

I was wrong. Allowing comments was a bad idea.

We only got one kind of comment: “fix this query for me, kthxbai”

After leaving dozens of responses pointing people to my post on Getting Help With a Slow Query, I finally gave up and had Richie put this at the bottom of the query window instead:

The “get help” link points to my post on getting help with a slow query, and the report link sends us an email. (Sometimes folks upload query plans with things that probably shouldn’t be shown in public, so we wanna redact those when folks come across them.)

I feel bad because this is a symptom of something larger: people need free (or maybe affordable) help with query tuning.

We’ve kicked around the idea of building advice into PasteThePlan, showing the same kinds of warnings that we show in sp_BlitzCache, but it would cost Richie’s time (read: my money) to build that feature, and I don’t see revenue there to justify the development time. We do invest in it: Richie just updated it last week to add SQL Server 2019 plan support and fix a bunch of tests.

If you, dear reader, want a project idea and you think there’s revenue there to sustain it (or you’re willing to do it for free), then here you go: build something that will let people get advice on their queries. Maybe it’s an SSMS or Azure Data Studio plugin, maybe you run the code client-side, or maybe they upload the plan to the cloud and you do the calculations there.

If you launch it, let me know – I’d love to see it & share it.

Why Database Monitoring Tools Are So Hard to Interpret

SentryOne’s Greg Gonzalez recently published a post about “Why Wait Stats Alone Are Not Enough.” Sure, Greg’s obviously biased towards his own monitoring tool, but his post is really well-written and fair. I want to explain the same story, but from a different (and simplified) angle.

Say we’ve got a really simple database server, and we start out with just one query that runs each day:

Query R is a reporting query that scans ten years of sales history. It’s a terribly written query, everyone hates it, and when it runs, it maxes out all of our CPU cores for quite a while.

At this point, if you use any monitoring tool, it’ll tell you that:

  • CPU is a problem for us around 8AM (either by judging CPU %, or if the query is complex enough, SOS_SCHEDULER_YIELD waits)
  • The query that’s waiting on CPU at that time is Query R
  • The query that’s using the most CPU time throughout the entire day is Query R

Now let’s add in transactions.

Whenever our web site does transactions, Query T is a transaction that inserts a new sale and subtracts from our inventory. It runs every now and then, mostly during business hours:

Query T’s code is alright – not great, just alright – so when it runs, it just consumes some processor cycles for a few seconds and then finishes. People are still pretty happy with performance – other than that terrible Query R – so nobody bothers looking at monitoring tools.

But over time, our web site starts doing more sales and transactions:

Now the data gets a little trickier. Monitoring tools that focus solely on wait stats will tell you that you have a CPU problem around 8AM, and that the query waiting on CPU during that time is Query R, the big ugly report.

However, outside of that window, you may not have any wait time on CPU at all. If queries are simple enough, and there isn’t enough concurrency, then as we discuss in Mastering Server Tuning, you can end up with SOS_SCHEDULER_YIELD waits with no time attached to them. You wouldn’t notice a CPU problem at all outside of that 8AM window because you don’t have a CPU bottleneck – yet. And to be fair, you probably wouldn’t tune a server in this kind of shape, either.

But when you DO need to tune a server that isn’t running at 100% capacity, picking the right query to tune isn’t just about wait stats: it’s also about which queries you need to be more responsive, and how you need to plan for future load. If your company tries to run a flash sale, and a lot of folks try to check out at the same time, Query T is going to knock your server over. But wait-stats based tools won’t see that coming: they’ll still be focused on Query R, the only one that spends a lot of time waiting on CPU.

If you’re trying to tune this scenario, you need to think about two things:

  • Which time windows best represent the workload that I’m trying to tune?
  • Which queries are consuming the most resources during that time, and not necessarily which queries are waiting on resources yet at that time?

For example, if I was tuning this exact scenario in Microsoft SQL Server or Azure SQL DB, I’d use sp_BlitzCache @SortOrder = ‘cpu’ to see which queries are consuming the most CPU cycles. They may not be waiting on CPU – yet – but it doesn’t matter. I might need to find ’em before we get to the point where we’re actually waiting on CPU. (I would also strongly recommend that we move the reporting query to a replica, hahaha.)
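In case you haven’t used it, that call is a one-liner:

```sql
/* Top CPU consumers from the plan cache – whether or not they're waiting on CPU yet */
EXEC sp_BlitzCache @SortOrder = 'cpu';
```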

Now let’s add in monitoring and backups.

Query B is our nightly backups, Query D is database corruption checking, and Query M is our monitoring tool that runs continuously 24/7:

Now it doesn’t matter whether you sort queries by wait stats or by how much CPU they used: you’re going to be in a fog. You have to ask yourself a few questions to get the right context:

  • What are the time windows where I’m worried about performance? You get that answer by working with the business.
  • During that window, what’s my current bottleneck today? You get that answer from wait stats.
  • During that window, what queries are causing that current bottleneck? You get that answer from a variety of sources like the plan cache, Query Store, Extended Events, sampling live workloads, and more.
  • Looking ahead, are there queries that will cause a bottleneck tomorrow? You get that answer by reviewing workloads whose quantity will vary as your user workloads or data sizes grow.

If you only use one source – whether it’s wait stats or the plan cache – and you don’t put it in context with the above questions, you’re going to be led down the wrong path, and your tuning efforts will be wasted. You need context, and that’s why I teach training classes to help you figure this stuff out. I long for the day when you can open a tool that has a big flashing button that says “click here to fix the problem,” but we ain’t there yet.

Updated First Responder Kit and Consultant Toolkit for January 2020

A couple of times a year, I disconnect from the Internet to sit down with a planning book, my Epic Life Quest, and my future task list. I spend time making sure my day-to-day tasks are still prioritized right. When I do that, I often find out that I’ve lost focus on a particular task, or I’ve been spending too much time on something. We only get so many hours in a day, and I gotta figure out where I can be the most effective.

For 2020, one of the things I decided was to update the First Responder Kit just once per quarter instead of once per month. At the time, I was thinking to myself, “These scripts are pretty mature. How many changes can really be left to make?” Uh, as it turns out, a lot, as you can see by this month’s changelog! So that plan is out the window – let’s keep going with monthly releases, hahaha.

Big shout out to the 9 folks who contributed code & issues for this month’s release.

To get the new version:

Consultant Toolkit Changes

Updated the First Responder Kit with this month’s script updates, plus:

  • Improvement: new Indexes Missing tab with the missing indexes from all databases. Same output as running sp_BlitzIndex @Mode = 3, and most of this stuff is available in the Indexes M4 tab, but when you have to do a massive index cleanup, it’s helpful to be able to slice & dice these in a table.
  • Fix: sp_BlitzCache memory grant & spill sorts weren’t running on Azure SQL DB. (Thanks, Erik.)

sp_Blitz Changes

  • Improvement: added non-default database configs for enable_broker and honor_broker_priority. (#2187, thanks parlevjo2.)
  • Improvement: added more-info query when we catch people running unusual DBCC commands. (#2211)
  • Fix: updated end-of-support dates for unsupported builds check. (#2221, thanks Matt Saunders.)
  • Fix: if you had >50 databases, you’d get the “multiple plans for one query” warning even if you just ran a single query once with sp_MSforeachdb (since it’d put a plan in cache for each query.) If you have >50 databases, we now set the threshold for this warning to be 2x the number of databases. (#2249)

sp_BlitzCache Changes

  • Improvement: new warning for Table Spools. (#2235, thanks Erik Darling.)
  • Improvement: checkid 999 (plan cache is expiring quickly) now has a URL for more details. (#2214)
  • Improvement: the warnings in the 2nd result set now exactly match the warning names in the Warnings column in the top result set, making it easier to look up warnings you don’t understand. (#2212)
  • Fix: the stale-statistics check was also labeled checkid 999, so moved that to 997. (#2214)
  • Fix: each time sp_BlitzCache ran, it was putting an insert-into-#ReadableDBs query into the plan cache, which made demos a little cluttered. (#2253)
  • Fix: avoid plan cache pollution by avoiding the use of a random ID when building the output table name. (#2252, thanks Adrian Buckman.)

sp_BlitzFirst Changes

  • Improvement: in @ExpertMode = 1, added these Availability Groups Perfmon counters (plus a few others) to help troubleshoot high HADR_SYNC_COMMIT waits that aren’t related to CPU or storage. I haven’t added any alerts for specific thresholds though – just showing these counters in the Perfmon section of the @ExpertMode = 1 output so you don’t have to go setting up a Perfmon session. (#2246)
  • Improvement: if the high forwarded fetches check fires (#29), look in TempDB for the top 10 tables with >100 forwarded fetches, and include them in the results. I would really like to tell you more, but it turns out it’s really hard to identify what code & session is responsible for a temp table.  (#2248)
  • Fix: the ##WaitCategories table is no longer repopulated with every run. (#2208, thanks bgpjr.)
  • Fix: error when hitting the long-running blocking check. (#2218, thanks sm8680.)
  • Fix: when @ExpertMode = 1, really slow sp_BlitzWho runtimes (like for hundreds of sessions rendering across a slow network pipe) would cause sp_BlitzFirst’s samples to be shorter since the sp_BlitzWho runtime was included in the sample time. Now the clock only starts after sp_BlitzWho finishes. (#2244)

sp_BlitzIndex Changes

  • Improvement: added @Debug parameter that outputs the working temp tables and the dynamic T-SQL used to populate them. (#2229)
  • Fix: if you ran it with @GetAllDatabases = 1, @Mode = 3, it would run even if you had >50 databases. Now requires @BringThePain = 1 even for @Mode = 3. (#2255, thanks DBAdminDB.)
  • Fix: non-partitioned columnstore indexes were reporting as partitioned with duplicate row counts. Root cause is multiple rows in sys.dm_db_index_operational_stats for different hobts for the same columnstore index. (#2228)

sp_BlitzLock Changes

sp_BlitzWho Changes

  • Improvement: added OPTION (RECOMPILE) hint, and if the SQL Server has >50GB memory, and it’s on a recent patch level, we add a max memory grant hint of 1% to reduce sp_BlitzWho’s memory grant request. (#2238, thanks Tara Kizer.)

Power BI Dashboard for DBAs Changes

Thanks to the Power BI wizardry of Eugene Meidinger, there are a ton of changes and improvements in progress in the Power BI Dashboard. Consider this month a very “alpha” release: it’s not really ready for public consumption yet, because Eugene changed a lot and I need to tweak a few things to help get you started, but it’s off to a great start. I’ll do an update to it over the coming week and post a few blog posts explaining what’s new.

I highly recommend Eugene’s Pluralsight courses, including:

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit the invite link. Be patient – the channel is staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing guide in the repo.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question publicly, and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

Cool Query Hints

The SQL Server documentation has a pretty cool list of query hints:

Yeah, I surf in dark mode. I’m a dark kinda guy. But wait – what’s that colored box? ENHANCE!

I had to open this page in incognito mode because that warning doesn't show up when *I* log in, baby

I could make jokes here, but … every single thing in that caution is true. SQL Server really does typically select the best execution plan for a query, like 99.999% of the time, and if you use a query hint, you’re probably making a mistake.

However, having said that, if you’re reading this blog, you’re already more experienced than most SQL Server users. No seriously, I know you’ve got impostor syndrome, but you’re doing alright, and I think it’s high time you considered using a few of these when your query tuning situations get really dire.

For more details about each of these, including the versions where they came in, read the query hint documentation. (And if you think you’ve found an error in the documentation, or you want to improve it by making it more specific, don’t forget that you can contribute changes to the documentation.)

OPTION (MIN_GRANT_PERCENT = 5) or OPTION (MAX_GRANT_PERCENT = 10) – when the execution plans for your queries are asking for way too much (or not enough) memory, and you’ve already tried tuning the queries and the indexes, this hint is a useful last resort.

OPTION (FORCE ORDER) – if you’re struggling with an execution plan that doesn’t quite understand which table is the most selective or should really be processed first, this hint forces SQL Server to process the tables in the same order you wrote ’em in the query. I like this better than index hints because it still gives SQL Server flexibility in the way it accesses each table.

OPTION (MAXDOP 0) – if you’re stuck on a third party vendor’s server where they demand that you set MAXDOP = 1, did you know you can set MAXDOP higher for your own query by using a hint? Well, now you know: this query-level hint overrides the server’s MAXDOP setting. Great for building indexes or reporting tables.
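For illustration, here’s roughly how each of those three hints gets attached to a query – the tables are from the Stack Overflow database, and the scenarios are contrived:

```sql
/* Cap an oversized memory grant at 10% of max workspace memory */
SELECT TOP 100 *
  FROM dbo.Users
  ORDER BY LastAccessDate DESC
  OPTION (MAX_GRANT_PERCENT = 10);

/* Join the tables in exactly the order they're written */
SELECT u.DisplayName, p.Title
  FROM dbo.Users u
  INNER JOIN dbo.Posts p ON p.OwnerUserId = u.Id
  WHERE u.Reputation > 100000
  OPTION (FORCE ORDER);

/* Override a MAXDOP 1 server setting for this one reporting query */
SELECT OwnerUserId, COUNT(*) AS PostCount
  FROM dbo.Posts
  GROUP BY OwnerUserId
  OPTION (MAXDOP 0);
```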

The documentation has a lot more, and you certainly won’t use ’em often – but today’s probably a good day to go back in and check to see what new options are available in the version of SQL Server you’re running these days. I’m starting to cover ’em in Mastering Query Tuning, too.

Where Should You Tune Queries: Production, Staging, or Development?

The #1 fastest way to tune queries is in the production database, on the production server.

I know. Put the knife down. You’re not happy about that, but hear me out: I’m specifically talking about the fastest way to tune queries. In production, you can guarantee that you’re looking at the same data, hosted on the same exact build of SQL Server, with the same horsepower, and the same configuration settings at the server & database level. If you’re under duress, this is the fastest way to get it done.

Having said that, tuning queries in production carries obvious risks:

  • Queries might insert/update/delete data
  • You might see data you’re not supposed to see
  • Your tuning activities might adversely impact other users
  • You might forget to highlight that WHERE clause
  • Or you just plain might not have permissions to production (which frankly, is a great thing)

So instead, in descending order of effectiveness, here are the next fastest places to tune queries:

#2: Production data, on a different (but identical) server – because here you should get identical query plans. I still put this at #2, though, because in my experience, even when folks think it’s an identical server…it’s not. We run sp_Blitz on the two servers, and suddenly we discover that they have different sp_configure settings, different trace flags, different storage, etc.

#3: Production data, on a differently powered/configured server – because at least you’re dealing with the same data size and distribution, so you should get highly similar execution plans. However, they won’t be identical – especially when we start talking about wildly different memory (which produces different grants, therefore different spill behaviors), different SQL Server builds, and different configuration settings.

#4: Non-production data, but an identical server – now, we’ve got different execution plans because the data quantity & distribution is different. Now your tuning becomes dramatically slower because you have to do a great job of comparing query plans between production & non-production. When they’re different – and they’re gonna be different – you have to understand that your tuning efforts in a non-production environment may not produce the same awesomely positive results (or even ANY positive results) in production. Great recent example: I had a developer who just couldn’t fix a bad query plan issue whose root cause was bad stats in production. Until the developer could see the bad stats, she had no idea there could even be a problem like that.

#5: Non-production data, on a differently powered/configured server – because here, nothing’s alike. It’s like calling a plumber to have them fix a broken pipe in your house, and they can’t see your house, but they’re trying to look at their own house and explain to you what you should do. Nothing’s gonna match, and nothing’s gonna be easy. Can you tune queries this way? Sure – but it’s gonna take a long time and a lot of money.

Building SQL ConstantCare: Let’s Tune an Index in Postgres

This week, our SQL ConstantCare® back end services started having some query timeout issues. Hey, we’re database people – we can do this, right? Granted, I work with Microsoft SQL Server most of the time, but we host our data in Amazon Aurora Postgres – is a query just a query and an index just an index regardless of platform? Follow along and let’s see how your SQL Server knowledge translates across platforms.

Finding AWS Aurora Postgres queries that need indexing help

I started the process by going into AWS Performance Insights, their free database performance monitoring tool for all RDS-hosted instances – including MySQL, Postgres, SQL Server, and Aurora. It comes with 7 days of performance history, which is plenty to help with most real-time troubleshooting.

It’s not exactly intuitive to find: in the RDS console, when looking at the monitoring graphs for your instance, you have to hit the Monitoring dropdown, and choose Performance Insights:

I switched my time scale to 24 hours, saw an area of the chart I wanted to zoom into, and then selected that area of the graph with my mouse (trackpad, whatever):

The graph switches to the wait stats for that specific time range, and on the right side, the default slicing is by waits, so I can investigate where the server’s bottleneck is:

Or if I want to figure out which queries used the most resources overall, I can change that slice to queries:

And I get a color-coded list of which queries used the most resources. That’s pretty doggone easy. Looks like I need to do some tuning for the query that checks to see whether your heaps need to be rebuilt – and now things start to get a little more complicated.

Designing an index based on the SQL query

Here’s the first part of the query that I need to tune – it’s a CTE that kicks off the rule query, and it looks for heaps that have had more than 100,000 forwarded fetches since the last time we analyzed ’em a week ago:

The 3082 and 3053 are numbers I’m using to replace variables – normally, this code is driven by variables for the specific collection of DMV data that we’re analyzing, and the collection done around one week prior.
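The real query lives in their repo, so here’s a hypothetical simplification that matches the description – the table and column names come from this post, but the self-join logic is my guess at “more forwarded fetches this week than last week”:

```sql
-- Sketch only: 3082 is this week's DMV collection, 3053 is the one ~1 week prior
WITH forwarded_heaps AS (
    SELECT this_week.database_id,
           this_week.object_id,
           this_week.index_id,
           this_week.forwarded_fetch_count
               - last_week.forwarded_fetch_count AS new_forwarded_fetches
      FROM sys_dm_db_index_operational_stats AS this_week
      JOIN sys_dm_db_index_operational_stats AS last_week
        ON last_week.database_id       = this_week.database_id
       AND last_week.object_id         = this_week.object_id
       AND last_week.index_id          = this_week.index_id
       AND last_week.partition_number  = this_week.partition_number
     WHERE this_week.connection_id = 3082
       AND last_week.connection_id = 3053
       AND this_week.forwarded_fetch_count
               - last_week.forwarded_fetch_count > 100000
)
SELECT * FROM forwarded_heaps;
```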

That sys_dm_db_index_operational_stats table only has one nonclustered index:
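A sketch of that existing index’s definition, with a hypothetical name in the same columns-run-together style this post pokes fun at:

```sql
-- Postgres's create-if-missing shortcut: no error if the index already exists
CREATE INDEX IF NOT EXISTS ix_connection_id_user_id_object_id_index_id_leaf_insert_count
    ON sys_dm_db_index_operational_stats
       (connection_id, user_id, object_id, index_id, leaf_insert_count);
```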

Isn’t that if not exists syntax nifty? Postgres has so many cool shortcuts. And isn’t that index name terrible? This is what happens when you let Richie name things.

Anyhoo, let’s look at what the index has, versus what the query needs:

  • connection_id = this is helpful, because we’re seeking to just one connection_id, meaning one collection of DMV data for one server on one day
  • user_id: uh oh, our query isn’t using this at all, but it’s the second column in our index, meaning from this point forward, the data is essentially unsorted
  • object_id, index_id: the table & index we’re analyzing (useful)
  • leaf_insert_count: we’re not using that at all here (but we might be for other queries)
  • And these fields aren’t in the index, but the query needs them: database_id, partition_number, forwarded_fetch_count

So I have a few things I could do here:

  • I could add user_id into the query – because I do know it at runtime, and I could just as easily pass it in to make sure that Postgres understood the rest of the ordering key fields could still be used as-is – but the index still won’t work without putting database_id in the index.
  • I could build a new additional nonclustered index based on connection_id, database_id, object_id, index_id, and partition_number, possibly including forwarded_fetch_count. You don’t usually think of adding indexes tailored to fit every query, but this table essentially exists to satisfy just a handful of queries, so it might be realistic. The production table has 350M rows though, so I don’t wanna go wild and crazy with indexes here.
  • I could tweak both the query and the index by adding user_id into the query, and then adding database_id, partition_number, and forwarded_fetch_count to the index. The key ordering on there would be a little tricky, and in a perfect world, I’d need to see all of the other queries that touch this table. Because Richie is what’s known in our industry as a 10X Full Stacks Developer™, he’s got all of our queries in source control, so it’s easy for me to do a quick search in the repo to find all queries that touch a table.

I chose to add a new index to perfectly cover this query – and based on my quick search of the repo, I don’t think the other index will be helpful to any of the other existing queries. Here’s the new index:
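A sketch of the new index – the name is hypothetical, and INCLUDE requires Postgres 11 or newer:

```sql
-- Postgres will truncate this identifier to 63 characters on its own
CREATE INDEX IF NOT EXISTS ix_connection_id_database_id_object_id_index_id_partition_number
    ON sys_dm_db_index_operational_stats
       (connection_id, database_id, object_id, index_id, partition_number)
    INCLUDE (forwarded_fetch_count);
```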

(I honored Richie’s naming convention, but Postgres is shortening the names of the indexes, so I just embraced that.) That index is fairly straightforward, but to learn more about how I design indexes without looking at the query plan first, check out my Fundamentals of Index Tuning course.

Next, I need to see whether the query plan will use my index or not.

Finding out if the query uses my index, and how much it helps

Our production Aurora database is around 2TB, and obviously I don’t wanna go creating indexes on the fly in there. I want to create indexes in development first, and make sure that the query actually picks those indexes up, and see what kind of a difference it makes in the plan.

Using Azure Data Studio, to get an estimated query plan in Postgres, you highlight the query and then click the Explain button, which gives you:

<sigh> I know. Text. You’re probably used to graphical plans in tools like SQL Server Management Studio, SentryOne Plan Explorer, or JetBrains DataGrip, but I’m living in the dark ages over here in ADS. You can also use the explain command to dump out results in JSON format, and then copy/paste it into online visual plan viewers like PEV or explain, but those say that ADS’s output isn’t valid JSON, so, uh, yeah. I just use text. (I’m not sure if the problem is Aurora or ADS.)
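For reference, the underlying commands that the tooling wraps look like this – the sample query here is just a stand-in:

```sql
-- Estimated plan, text format (what the Explain button gives you)
EXPLAIN
SELECT * FROM sys_dm_db_index_operational_stats
 WHERE connection_id = 3082;

-- Same plan as JSON, for visual plan viewers
EXPLAIN (FORMAT JSON)
SELECT * FROM sys_dm_db_index_operational_stats
 WHERE connection_id = 3082;

-- EXPLAIN ANALYZE actually runs the query and reports real row counts and buffers
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sys_dm_db_index_operational_stats
 WHERE connection_id = 3082;
```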

Anyhoo, some of the important things to notice from the text are:

  • There are two index scans, one for each connection_id: these are what you and I would consider index seeks in SQL Server because SQL Server would seek to one particular connection_id and then start reading, but then it’s going to scan all of that connection_id’s data because the rows aren’t sorted in a useful way.
  • Those two index scans each have a cost of 1,113 query bucks.
  • The query’s total cost is 2,227-2,244 query bucks.

Next, I created the index, and then I asked Aurora to explain the plan to me again:

The takeaways:

  • The index accesses are now called “Index Only Scan” (and they use my new index) – because now the index completely covers the query, so there are no key lookups
  • Each index scan only costs 0.43-739.92 query bucks
  • The query’s total cost is 0.85-1494.62 query bucks

Ta-dah! Now I feel confident adding that index in production. Well, not me, personally – I’m going to add an issue in GitHub explaining what I want to do and why, and then open a pull request with the new index. See, I don’t actually have write permissions in production. Only Richie does. I’m the pointy-headed boss around here, and the last thing I should be doing is changing stuff in production.

Your SQL Server knowledge helps you elsewhere.

At the end of the day, a relational database is a relational database. There are differences in the ways that you interact with them, like the tools you use and the format of the output, but it’s not rocket surgery. If you had to take on a new database platform, you could! There are interesting differences between SQL Server and Postgres – the vacuum process is an interesting one, for example – but overall, they’re surprisingly similar.

Performance Tuning Means 3 Things

  1. Managing what requests are made
  2. Managing the efficiency of each request
  3. Managing the hardware capacity available to satisfy requests

And that’s it.

In the grand scheme of things, those are the only 3 knobs you get.

Oh sure, from the outside, performance tuning looks more complicated than that, but at the end of the day, it always comes back to those 3 knobs and knowing which ones you can turn.

Managing what requests are made and when starts with understanding your workload: analyzing the queries that are running, where they’re coming from, what the data is being used for, and why the request is being made. The easiest free way to start with that is sp_BlitzCache, and one of my favorite ways to start the discussion is by using @SortOrder = ‘executions’ to show the most frequent query requests. Time after time, folks just flat out don’t know what the app is actually requesting from the SQL Server – and when we start asking why, the tuning process often progresses to … just not making those requests, like caching static configuration data in the app tier.
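That discussion-starter is a one-liner:

```sql
/* Most frequently executed queries in the plan cache */
EXEC sp_BlitzCache @SortOrder = 'executions';
```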

Managing the efficiency of each request starts with measuring what the queries are doing, and how the database server is configured to respond to those requests. Tuning efficiency can mean changing indexes, isolation levels, SQL Server settings, or even tuning the query itself.

Managing the hardware capacity available starts with measuring how overloaded the server is. I like using wait time per core per second to figure out if there’s smoke coming out of the server, or if it’s sitting around bored.

As a consultant, I like giving clients a couple/few different options that fall into different categories. For example, if they’re querying the database for static configuration data thousands of times per second, they could either cache it in the app tier, or continue to invest more in hardware capacity to sustain those requests.

When you compare caching to cha-ching, suddenly folks get it. They’d rather go to the registered cache than the cash register. For a good primer on caching in .NET, check out Nick Craver’s post on how Stack Overflow does caching.

Can SELECT * Make a Query Go…Faster?!?

Most of the time, you’ll hear advice that you should select as few columns as possible so that SQL Server can do less work.

However, one of my clients hit a neat edge case scenario where SELECT * queries actually ran faster than picking a subset of columns.

I can’t share the exact query, but I can share a query that can at least illustrate the underlying concept.

If you’ve seen my “But It Worked in Development!” – 3 Hard SQL Server Performance Problems session from the 2017 PASS Summit, you may remember that SQL Server has some odd behavior when it comes to estimating how much data will come out of a query. Starting at around the nine minute mark, I run this query:
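Reconstructed from the description below (the TOP 250 and the Reputation sort are from the text; the exact shape is an assumption), the query looks roughly like this:

```sql
-- SELECT * forces SQL Server to size the sort's memory grant for every
-- column, including the wide NVARCHAR ones like AboutMe
SELECT TOP 250 *
FROM dbo.Users
ORDER BY Reputation DESC;
```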

The Stack Overflow database doesn’t ship with an index on Reputation by default, so SQL Server needs to sort all of the users by reputation in order to find the top 250. The Sort operator in this plan is going to need a memory grant:

In order to calculate the memory grant needed by the sort, SQL Server looks at the amount of data that it thinks is coming out of the Users table. That guess, as I talk about in the PASS Summit session, is wildly incorrect because it’s based on the data type sizes, not the data that’s actually in the table. Because it thinks this table has 37GB (it really only has 1GB), SQL Server overestimates the memory grant:

SQL Server desires a memory grant of 49GB, but “only” grants 10GB because that’s the max that my SQL Server will allow due to its hardware size. The query only uses 1.5GB of memory – the sort has plenty of memory in order to do its job. Too much, in fact, but who’s counting?

But if we change it to just SELECT Id…
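That narrower version, sketched the same way:

```sql
-- Only the Id column flows through the sort now, so the
-- estimated data size (and the memory grant) drops dramatically
SELECT TOP 250 Id
FROM dbo.Users
ORDER BY Reputation DESC;
```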

With SELECT Id instead of SELECT *, SQL Server now estimates that a much smaller amount of data is coming out of the Users table, and the sort spills to disk:

That’s because SQL Server underestimated the memory grant this time, asking for only 600MB of RAM.

In this query’s case, the spill doesn’t really matter – the SELECT Id is still faster. I’d say that’s true in 99.99% of cases out in the real world, too: you only want to select the specific columns you need. However, this is just a neat introduction to a complicated issue: trying to get memory grants to be high enough to handle your data, but not so high that they lead to RESOURCE_SEMAPHORE poison waits under concurrent load like I talk about in the PASS Summit session.

Building SQL ConstantCare®: Which Recommendations Do People Mute?

Our SQL ConstantCare® service sends you daily recommendations for specific tasks that will improve your SQL Server’s health and performance.

Next to each recommendation, there’s a Mute link that you can click on to either mute the recommendation entirely, or mute it for specific databases, indexes, or queries:

When we brought out that feature last year, I was really curious to see what recommendations people would mute. Before the Mute links came out, users had always been able to mute things by hitting reply and telling me what to mute, but I guessed that they’d get more ambitious with muting stuff once they were allowed to mute it themselves without talking to me. You can read what they were muting back then.

Here’s what you’re muting:

  1. Consider Enabling Query Store – now muted by 151 users. I’ve blogged about why people aren’t using Query Store, and this is still a bit of a bummer. I think we could turn this number around, but users need to see value out of the feature. It’s just not easy enough yet for most folks to go from turning it on, to configuring it properly, to getting value out of it.
  2. Check for Corruption ASAP – 110 users. Keep in mind that this isn’t the number of people who have been alerted about the lack of corruption checking, and then actually fixed it. These are just the people who said, “Yeah, I know I’m not checking for corruption, and you can mute that recommendation.” That’s why I’m asking for SQL Server to just automatically check data integrity in the background. It’s too important to leave to end users to implement.
  3. Transaction Log Larger than Data File – 104 users. I need to go back and tweak the code on this rule because I’ve seen some commonalities: for example, people seem to run into this problem a lot on the SSIS DB and want to ignore it.
  4. Set Fill Factor Higher – 65 users. This rule currently fires for fill factors 79% and lower. I may tweak the threshold on this one too.
  5. Check Unusual Database States – 60 users. Dang, y’all really like keeping offline databases around. I think I wanna tweak this one to just alert when a formerly-online database drops offline.
  6. Take a Full Backup – 59 users. COME ON NOW.
  7. Move User Databases Off the C Drive – 56 users. That’s fair: once we’ve found this problem, you probably don’t want to hear about it repeatedly. I’m not sure there’s value in keeping this rule around because it doesn’t look like a lot of people are willing to fix it.
  8. Take a Log Backup – 44 users. We’re only alerting you for databases in full recovery model that haven’t had a log backup in several days. Right now, we don’t have logic to detect if you’re offloading log backups to a readable secondary, but based on this high number of mutes, that’s something we need to build.
  9. Move TempDB Off the C Drive – 42 users. See #7.
  10. Enable Checksums for Corruption Detection – 39 users. This one kinda surprises me because it’s so safe, easy, and fast. I think I need to work on my communication about the rule to explain why it’s so important.

Three Reasons to (Temporarily) Change Max Worker Threads

When you go to the deli, there’s a “take a number” dispenser where you tear off a number and wait for your number to be called.

If you walked into a deli and you saw 50 people lined up all waiting their turn, and you went to take a number and the dispenser was completely empty, you’d say to yourself, “I think I’ll come back later.”

The deli could solve the problem by getting a larger take-a-number dispenser – or could they? All that would let you do is grab a higher number, but it’s not like they’d be able to start working with you any quicker. More numbers don’t let the people behind the counter actually do more work. In fact, it’s almost the opposite: it just means that the queues are going to grow larger and larger.

SQL Server is the same way:

  • The workers behind the counter are your CPU cores.
  • When a query wants to start, it grabs a numbered ticket.
  • If there aren’t any tickets left, there’s a problem – but to be honest, there was already a problem if the deli was chock full of angry customers waiting their turn.

If a user walks into your SQL Server and there aren’t any tickets left, you run into a poison wait called THREADPOOL. To a user, THREADPOOL feels like the SQL Server isn’t responding to network requests – but it’s just because the workers behind the counter are really busy right now, and so many customers have piled up already.

You could solve the problem (except it’s not really solving the problem) by adding more numbered tickets. In SQL Server, the max worker threads setting is how many tickets are available. In the max worker threads documentation page, Microsoft explains that for most servers (64-bit operating systems with up to 64 logical CPUs), the default max worker threads is 512 + ((logical CPUs - 4) * 16) – and on servers with more than 64 logical CPUs, the multiplier doubles to 32:

  • 8 cores: 576 max worker threads
  • 16 cores: 704
  • 32 cores: 960
  • 128 cores: 4,480
  • 256 cores: 8,576
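You can sanity-check those defaults with a quick T-SQL sketch of Microsoft’s documented formula:

```sql
-- Default max worker threads per Microsoft's documented 64-bit formula:
-- up to 64 logical CPUs: 512 + (cores - 4) * 16; above 64 CPUs: * 32
DECLARE @cores INT = 16;
SELECT CASE
           WHEN @cores <= 64 THEN 512 + (@cores - 4) * 16
           ELSE                   512 + (@cores - 4) * 32
       END AS default_max_worker_threads;  -- 704 for 16 cores
```

To see what your server actually has configured, check the max_workers_count column in sys.dm_os_sys_info.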

In theory, if you had 8 workers at your deli, and you wanted thousands of customers to be able to grab a number and sit around waiting forever to be called, you could set max worker threads to be, say, 8,576. However, in most scenarios, this is just going to mean a serious performance resource problem and a whole lot of pissed-off customers. Trust me, your sandwiches aren’t worth waiting that long.

There are 3 unusual customers who often stand around without ordering food.

SQL Server has a few features that consume worker threads even when they’re not doing anything:

  1. Availability Groups – which consume threads based on the number of databases you’re protecting
  2. Database mirroring – similar to AGs
  3. Query notifications – which let apps subscribe to query updates, but consume threads based on the number of SqlDependencies (times the number of servers subscribed)

In our deli analogy, these 3 features can consume all of your ticket numbers – even when they’re not actually doing any work. They’re considered active, running queries – even though they’re just standing around the deli, and whenever they’re called on, they just say, “Go on to the next customer – I haven’t figured out which cheese I want yet.”

If you’re backed into a corner where you’re experiencing worker thread exhaustion and THREADPOOL waits – running out of numbers to start new queries – just because these 3 features have consumed all your worker threads, then you could temporarily solve that problem by increasing max worker threads. That will let other customers go ahead and start placing orders – and at first, it’ll seem like you’ve averted disaster, and your deli can make money again.

But sooner or later, those 3 customers WILL order food.

And that’s when your deli is going to be in a world of hurt.

Great example: I’ve had several clients with thousands of databases protected with Always On Availability Groups spread across several replicas. They all had to raise max worker threads to a ginormous number just so that they could continue to service end user queries. They told themselves, “It’s fine, it’s not like all of these databases are changing data at the same time – most of the time, those worker threads are idle, so it doesn’t matter.”

Until one of the replicas went offline for more than a few minutes.

Then, as soon as the replica came back online, all of the databases on the primary replica had been active over the last few minutes, and all of them had changes they needed to push over to the replica. All of a sudden, the deli was chock full of angry customers, all of whom wanted to place an order RIGHT NOW, and CPU went to 100% in a really ugly way, with new queries timing out and experiencing THREADPOOL waits.

When you buy a bigger number dispenser, you also need to start architecture discussions.

The bigger number dispenser buys you time, but only as long as not all the customers wanna place an order. The very fact that you had to buy more numbers is your big sign that at some point, workloads can be unsustainable, and you’re going to need to figure out how to:

  • Reduce the number of potential active worker threads, or
  • Get yourself more CPU cores (workers behind the deli counter), or
  • Divide the work across more SQL Servers

When you’re making these decisions, stop to talk to the business about their future growth plans. Think about the number of databases you’ll have 3 years from now, and the number of query notifications you plan to add. If those numbers are up in the thousands, it’s time to revisit the architecture strategy.

[Video] You shouldn’t dynamically create/alter/drop triggers.

Let’s get one thing out of the way first: this isn’t about the relative good or evil of triggers. I kinda like triggers, in fact – they’re good for enforcing business logic in a hurry when you can’t redo code across an entire application. Sure, in a perfect world, we’d always have the code do whatever the code needs to do – but there are times when you can’t, and triggers come in handy then.

Just as you need to be judicious about what you do inside a trigger, you also need to be careful about when you actually deploy one, and you should probably never dynamically create/alter/drop a trigger. See, when you create, alter, or drop a trigger, you’re changing the table’s structure, which requires a higher level of locking than you might expect.

To demonstrate it, I’m going to create a trigger on the Votes table in the Stack Overflow database at the same time end user activity is happening, and show the blocking:

Here’s the demo code that I use in the video:
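The full script isn’t embedded here, but the heart of the demo – trigger DDL needing a schema modification (Sch-M) lock on the table – can be sketched like this. The trigger name and body are hypothetical:

```sql
-- Session 1: simulate end user activity with an open transaction.
-- The UPDATE holds an IX lock on dbo.Votes until we commit or roll back.
BEGIN TRAN;
UPDATE dbo.Votes SET BountyAmount = BountyAmount WHERE Id = 1;
-- (leave the transaction open)

-- Session 2: CREATE TRIGGER needs a Sch-M lock on the table, which
-- conflicts with session 1's IX lock -- so it sits blocked, and every
-- query that arrives after it queues up behind the Sch-M request.
CREATE TRIGGER trg_Votes_Demo ON dbo.Votes
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- hypothetical business logic would go here
END;

-- Back in session 1: release the logjam.
ROLLBACK;
```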

So what should you do instead? Treat trigger changes just like any other kind of deployment of a table change, like adding a column or dropping a column:

  • Try to schedule it off-hours during a low load period.
  • Consider the blocking implications of changing the object.
  • Only do it when there’s a human being around to watch what’s happening, and cancel the deployment if it’s causing a huge disruption.
  • And maybe most obviously, only do it once. Don’t automate the same change over and over if you can avoid it – it’s just too disruptive.

The 2020 Data Professional Salary Survey Results Are In.

We asked what you make, and 1,734 of you in 63 countries answered. Altogether, you made $170,753,936 this year. Damn, y’all are bringing home the bacon!

Download the 2020, 2019, 2018, & 2017 results in Excel.

A few things to know about it:

  • The data is public domain. The license tab makes it clear that you can use this data for any purpose, and you don’t have to credit or mention anyone.
  • The spreadsheet includes the results for all 4 years (2017-2020.) We’ve gradually asked different questions over time, so if a question wasn’t asked in a year, the answers are populated with Not Asked.
  • The postal code field was totally optional, and may be wildly unreliable. Folks asked to be able to put in small portions of their zip code, like the leading numbers.
  • Frankly, anytime you let human beings enter data directly, the data can be pretty questionable – for example, there were 14 folks this year who entered annual salaries below $500. If you’re doing analysis on this, you’re going to want to discard some outliers.

I did some quick slicing and dicing, focusing on SQL Server DBAs in the United States, and deleted the top & bottom 10 outliers (because they were kinda bananas.) Looks like DBA salaries are up again this year:

We had a lot more responders this year overall – responses went down in 2019, possibly because we got a little overzealous with asking a lot of questions. We took a lot of questions out this year because they just weren’t proving to be relevant, but if you want to slice & dice through past history on those, they’re still included in the spreadsheet output.

We did ask a couple of new questions though: what’s the newest version of SQL Server you have in production?

That’s awesome! 11% of respondents are already on SQL Server 2019. I know what you’re gonna ask: are those people paid more? For that, I’ll point you to the spreadsheet to do your own analysis, because you’d also probably want to filter for job positions you’re interested in, countries, etc. I don’t know much about stats, but I know that as sample size gets smaller, that kind of data – like salaries – is less useful, so be careful.

And the oldest version you have in production:

Ouch. 49% of shops are still running unsupported versions in production. But hey, at least the adoption rate of SQL Server 2019 (the newest version in the top table) has already caught up with the combined installations of SQL Server 2000 & 2005!

Hope this helps make your salary discussions with the boss a little bit more data-driven, and hope it helps justify keeping your salary competitive. If you’ve got questions about the data, go ahead and jump in – download the 2020, 2019, 2018, & 2017 results in Excel. Don’t ask me to slice and dice the numbers for you – you’re a data professional, remember? Get in there and analyze it to answer your own questions. You’ve got this!

If you write a blog post about the data, let me know – I’ll post a followup next Tuesday with a link to your blog, and that’ll help drive traffic to you.

My 11 Favorite 2019 Blog Posts

Yesterday, I showed the stuff you read the most, but enough data – now, let’s talk about feelings. These were the 11 posts I remember with the most fondness, in date order. I did indeed like some of the ones from yesterday’s list a lot, but I’m going to set those aside.

Erik and Tara are Heading Out (Jan 3) – It doesn’t seem right to have a 2019 retrospective without mentioning how this year started. It wasn’t a fun blog post to write, nor a fun business decision to make. However, looking back as the year closes out, I can say with more confidence that it was the right business decision to make, and I’m glad I did it.

Consultants: want a utility to gather SQL Server data? (Jan 8) – As soon as I decided to stop growing our own consulting work force, I decided to start selling the consulting tools that we’d built. The Consultant Toolkit is now used by hundreds of consultants around the world. That’s kinda awesome.

How fast can a $21,468/mo Azure SQL DB load data? (Feb 13) – spoiler alert: slower than a USB thumb drive.

Can deleting rows make a table…bigger? (Mar 5) – Anytime I can help Michael J. Swart solve a problem, I’m all over it. Plus, this issue pops up again in a new way with SQL Server 2019’s Accelerated Database Recovery, but I’ll leave that one for 2020.

Fixing Performance By Throwing Cloud At It (Apr 23) – This was one of those posts that just seemed obvious to me, but it came up repeatedly with clients, so I figured I’d write it down. I wasn’t proud of that post at the time – it just seemed so obvious – but I heard from a few people afterwards that they’ve integrated the term “throwing cloud at it” into their vocabulary, so that’s kinda cool.

15 Reasons Your Query Was Fast Yesterday, But Slow Today (Apr 18) – Even today, it’s surprisingly hard to get definitive answers from monitoring tools when this question comes up.

Pop Quiz: SQL 2000 Bug or SQL 2017 Bug? (May 1) – I knew this post wouldn’t be timeless when I wrote it, but hot dog, I really enjoyed writing it.

Yep, Developers are Using Unmasked Production Data. (July 5) – I get so frustrated when I hear trainers/presenters/bloggers/idealists talk about how developers should be using purpose-built-from-scratch data sets with no real customer data in ’em. The real world just rarely works that way. Sure, I get that it should work that way, but it’s just not reality today, so I wanted to poll the audience to get some hard numbers for proof.

The 6-Month DBA Training Plan (July 15) – Wow, time flies: I first wrote this 8 years ago as an email autoresponder. I kept getting emails asking me to update some of the content, so in the summer, I figured I’d sit down and just turn the private autoresponder into a public series of blog posts, and also update the autoresponder content while I was at it. I thought it would be quick and easy. I was wrong.

“But NOLOCK Is Okay When The Data Isn’t Changing, Right?” (Aug 5) – Simple demo & video proving why you probably shouldn’t be using NOLOCK.

So, uh, you can, like, draw me in SSMS. (Nov 16) – Because it’s amazing to have someone do something like this:

Portrait in T-SQL

Your Ten Favorite Blog Posts from 2019

Here are the 2019 posts you visited the most this year:

10. The Silent Bug I Find in Most Triggers – Once you know how to recognize this bug, you’ll see it everywhere. I wrote the post as an interactive Azure Data Studio notebook, something I thought I’d be doing a lot in 2019, but the lack of execution plan support in notebooks put the kibosh on that one. I don’t expect Microsoft to solve that anytime soon – there can’t be a lot of us who need query plan support in notebooks, just presenters. (I’m not being sarcastic there – the number of presenters who would use ADS for this kind of thing is really, really tiny compared to the number of, say, data scientists who use ADS daily. I love ADS and use it daily myself.)

9. Should we use stored procedures or queries built in the app? – Client DBAs ask this one a lot because they believe if they simply require untrained developers to start writing T-SQL instead of using ORMs, the queries will somehow magically be instantly faster, as if developers are just going to naturally write fast T-SQL.

8. SSMS v18 and Azure Data Studio: The Golden Age of SQL Server Tooling – How the hell did this end up in the top 10?!?

7. [Video] What’s New in SQL Server 2019 – When SQL Server 2014 released, the cardinality estimation changes caused legendary surprises for early adopters, especially the ones who didn’t test their workloads. (I get it – workload testing is hard.) As 2019 approached, I thought, “Can I write a demo for SQL Saturday attendees that will prevent a similar problem with 2019’s new engine features by teaching them while making them laugh?” This session went over really well at SQL Saturdays, so I figured I’d record it for the blog.

Big tequila

6. Free SQL Server Load Testing Tools – Everyone thinks they wanna load test their SQL Server until they realize how freakin’ hard it is. Classic example: try load testing a query with a DELETE in it, and it’s only gonna really show load one time before you have to reset the database.

5. What’s Better, CTEs or Temp Tables? – I field this question constantly.

4. How to Think Like the SQL Server Engine, Post 1 – This fall, I took my Engine presentation video series and wrote them out as blog posts. I’ll also rerecord these with better A/V quality in 2020.

3. There’s a bottleneck in Azure SQL DB storage throughput. – When I discovered this, I mostly just wanted to document it for other Azure users so they understood why they weren’t getting the performance they wanted. Some blog posts are evergreen – always growing in popularity because they’re timeless topics. This one, not so much – it got a lot of attention (including from Microsoft) right when it went live, but then tapered off over time:

2. Which version of SQL Server should you use? – I put a ton of work into this one, and have already updated it since 2019 went live. This is an example of a more evergreen topic where traffic grows over time.

1. How to fix the error “String or binary data would be truncated” – If you ask Google a question, and if it thinks it can give you a snippet of text from a search result as the answer, then it displays that answer directly in your search results, like this:

You can write blog posts specifically for that kind of search optimization, so I decided to give that a shot with this blog post about the string or binary data error. In one way, the experiment worked – it got a lot of hits – but I forgot to follow up later to see if the right snippet showed up as an answer in Google’s results. It didn’t:

Ah, well. Gives me something to play around with in 2020.

How to Make SELECT COUNT(*) Queries Crazy Fast

When you run a SELECT COUNT(*), the speed of the results depends a lot on the structure & settings of the database. Let’s do an exploration of the Votes table in the Stack Overflow database, specifically the 2018-06 ~300GB version where the Votes table has 150,784,380 rows taking up ~5.3GB of space.

I’m going to measure each method 3 ways:

  • How many pages it reads (gauged with SET STATISTICS IO ON)
  • How much CPU time it uses (gauged with SET STATISTICS TIME ON)
  • How fast it runs

Don’t obsess over small differences between the operations – I’m writing this blog post fast & furious to show you the big-picture differences, and to show you how my thought process works when comparing the different operations. In your own environment, for the tables you’re trying to count and the hardware you’re using and the version you’re on and the phase of the moon, you’re going to get different results, and that’s fine. There are also other ways to measure these methods depending on your own performance requirements: memory grants, ability to run without blocking, and even the accuracy of the results under concurrency. For the sake of these tests, I’m not going to talk about isolation levels or blocking.

I’m running these tests on SQL Server 2019 (15.0.2070.41) on an 8-core VM with 64GB RAM.

1: Plain ol’ COUNT(*) with only a clustered rowstore index, compatibility level 2017 & prior

The Votes table is only ~5.3GB, so I’m able to cache the whole thing in my SQL Server. Even after the query runs the first time and the data’s cached in RAM, this still ain’t fast:

  • Pages read: 694,389
  • CPU time: 14.7 seconds of CPU time
  • Duration: 2 seconds
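For reference, the statement being measured in each of these tests is simply:

```sql
SET STATISTICS IO, TIME ON;  -- the page-read & CPU measurements above
SELECT COUNT(*) FROM dbo.Votes;
```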

2: Compatibility level 2019 (batch mode on rowstore indexes)

SQL Server 2019 introduces batch mode operations on rowstore indexes, previously only available on columnstore indexes. The payoff here is pretty awesome, even though we’re still dealing with just the rowstore index:

  • Pages read: 694,379
  • CPU time: 5.2 seconds
  • Duration: 0.7 seconds

Presto change-o – CPU just instantly drops thanks to batch mode. This isn’t obvious in the execution plans until you start hovering your mouse over individual operators:

Batch mode is a great fit for a lot of reporting-style queries doing aggregates over a lot of data.
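Turning it on is just a database-level compatibility change (database name assumed):

```sql
-- Compat level 150 (2019) enables batch mode on rowstore indexes
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 150;
```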

3: Add nonclustered rowstore indexes, but use 2017 & prior’s row mode

I’m going to create an index on each of the 5 columns of the Votes table, and then compare their sizes with sp_BlitzIndex:
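The index creation might look like this – the index names are mine, and the column list is per the public Stack Overflow Votes schema:

```sql
-- One nonclustered index per column of dbo.Votes (names are hypothetical)
CREATE INDEX IX_PostId       ON dbo.Votes (PostId);
CREATE INDEX IX_VoteTypeId   ON dbo.Votes (VoteTypeId);
CREATE INDEX IX_UserId       ON dbo.Votes (UserId);
CREATE INDEX IX_CreationDate ON dbo.Votes (CreationDate);
CREATE INDEX IX_BountyAmount ON dbo.Votes (BountyAmount);

-- Then compare their sizes with the free sp_BlitzIndex proc:
EXEC sp_BlitzIndex @TableName = 'Votes';
```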

Check out the number of rows in each index versus its size. When SQL Server needs to count the number of rows in the table, it’s smart enough to look at which object is the smallest, and then use that one for the count. Indexes can have varying sizes depending on the datatypes of the contents they’re indexing, the size of each row’s contents, the number of nulls, etc:

I’ll go back to 2017 compat level (removing batch mode operations) and then run the count:
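Sketched out (database name assumed):

```sql
-- Compat level 140 (2017): back to row mode operations only
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140;
GO
SELECT COUNT(*) FROM dbo.Votes;
```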

SQL Server chooses to use the BountyAmount index, one of the smaller 2GB ones:

Which pays off in reading fewer pages, but we’re still performing the same count of 150M rows, so the CPU time & duration don’t really change:

  • Pages read: 263,322
  • CPU time: 14.8 seconds
  • Duration: 2 seconds

If you want lower CPU time & duration, you really need to approach the count differently – and that’s where batch mode operation helps.

4: 2019’s batch mode with nonclustered rowstore indexes

So now let’s try batch mode operation with the indexes in place:

It still uses the BountyAmount index and does the same number of reads as #3, but we get the lower CPU time & duration from step #2:

  • Pages read: 263,322
  • CPU time: 4.3 seconds
  • Duration: 0.6 seconds

So far, that’s the winner. But remember that batch mode originally went live with columnstore indexes, which are awesome tools for reporting-style queries….

5: Nonclustered columnstore index with batch mode

I’m purposely running in 2017 compat mode here because I want to make it clear where the awesomeness is:
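The setup might look like this – the index name is mine, and the database name is assumed:

```sql
-- Deliberately stay in 2017 compat level to show the columnstore index
-- (not the compat level) is what delivers the speedup here
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140;
GO
-- A nonclustered columnstore index covering every column of dbo.Votes
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Votes
    ON dbo.Votes (Id, PostId, VoteTypeId, UserId, CreationDate, BountyAmount);
GO
SELECT COUNT(*) FROM dbo.Votes;
```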

The execution plan has our fancypants new columnstore index scan operator, and all of the operators in the plan are in batch mode:

I have to change my units of measure here:

  • Pages read: 73,922
  • CPU time: 15 milliseconds
  • Duration: 21 milliseconds

Hubba hubba. Let’s put that in perspective: I know some developers who try to hit system tables in order to count rows quickly, and they can’t even generate speedy results like this.

So to make SELECT COUNT(*) queries fast, here’s what to do:

In descending order of preference & speed, with the best results first:

  1. Get on SQL Server 2017 or newer, and put a columnstore index on the table.
  2. Get on any version that supports batch mode on columnstore indexes, and put a columnstore index on the table – although your experiences are going to vary dramatically depending on the kind of query you have. To learn more about the specifics, read Niko’s series on columnstore indexes, specifically the posts with the word “batch” in the title.
  3. Get on SQL Server 2019 or newer, and put your database in compat level 150 (2019) – even with rowstore indexes, you can still cut your CPU usage dramatically thanks to batch mode on rowstore. This one’s really easy to do – it probably requires the least changes to your application and database schema – but it’s just that you won’t have the amazingly millisecond-fast responses that a columnstore index can get you.

Here are the demo scripts if you wanna play around with ’em, like adapting ’em to work on your own tables.

My Home Office Studio Setup: Winter 2019

I run a lot of online training classes, and some of you who do webcasts from home might be interested to see how all this works. I did pretty well this year, so I upgraded my home office setup to sneak some expenses in on my taxes. Here’s what I’m running at the moment:

Desktop: Apple Mac Pro (2019) – the big investment this year. 16 cores, 96GB RAM, AMD Radeon Pro Vega II 32GB, and 4TB SSD, all of which are pretty easily expandable. I took the plunge here to keep my training class workflow going as quickly as possible. This isn’t about SQL Server at all: it’s about the fact that when I do live online video classes, I need a machine fast enough to stream video & audio live, and record in 4K, and encode other recordings, all without making any fan noise whatsoever.

I tend to teach in 45-60 minute modules, and then take a 15-20 minute bio break. In each bio break, I rapidly edit the video recording of the last hour, put in chroma key effects, clean up the audio, and kick off the encoding process. The encoding can take 15-20 minutes since the source material can be 4K depending on which screen I recorded in that last module. I wanted to be able to let the encoding & uploading happen even while I was teaching the next module, but I just couldn’t do that on my MacBook Pro – the CPUs would max out.

Could I do it by cobbling together a home-built machine (or multiple machines) with water cooling and silent fans? Maybe, but then I’m spending time troubleshooting desktop problems, and they always seem to crop up right when you least want ’em, right? I wanted something bulletproof that just worked.

CPU usage while encoding videos, recording, and streaming

The Mac Pro’s excellent silent thermal management easily handles 16 physical cores. In the screenshot at right, my CPU rarely hits 50% – leaving me plenty of headroom to start experimenting with live video effects. I’ve never heard the fans, so my training class audio is great, and the streaming & recording never stutters at all despite the simultaneous encoding & uploading. (In downtown San Diego, we get 1Gbps fiber for $80/month and no caps.)

The desktop was expensive, but totally worth the investment. Now, at the end of each day’s classes, I only have to edit the last module & encode it, and I’m done for the day – rather than having to queue up a whole bunch of encodes. I had almost 100 students in last week’s Mastering Index Tuning class, and the hardware investments let me just focus on the students, not my studio gear. Worth it.

Microphone: Sennheiser AVX plugged into a FocusRite Clarett 2Pre – the Clarett is the little red box sitting atop the Mac Pro, and if you squint, you can see the Sennheiser and its little blue antenna. I wanted a wireless lavalier microphone so I could move around a lot while presenting, and could walk away from the computer during bio breaks without constantly unplugging/plugging in a microphone.

Next up, the desk:

Ring light: Neewer 18″ – casts an even, flattering, dimmable light on the face. In the Amazon pictures, it shows a phone mount, and you can certainly do that – it’s popular for things like beauty podcasters – but you can just unscrew the phone mount, and in its place, mount the…

Camera: Logitech Brio 4K – USB-C webcam that’s pretty good, although not great. Hanselman’s post about webcams shows how to use a “real” camera that outputs HDMI, but I haven’t been able to find a setup that works with GoToWebinar, which is kinda picky about the kind of camera inputs that it’ll take. I’m kinda-sorta on the lookout for a replacement to GoToWebinar in 2020, something with better GDPR compliance and the ability to use any webcam, including the Elgato CamLink. For now, this is the easiest way to get good-enough video that can be simultaneously used by both ScreenFlow (which I use for screen recording) and GoToWebinar.

Displays: ViewSonic 32″ 4K and AOC 22″ – the ViewSonic was on sale for $340 and the AOC was $99. While I might splurge on an Apple desktop, ain’t no way I was going to blow $5K on an Apple Pro Display and another $1K for the stand. I was tempted, but not that tempted.

  • When I’m lecturing with slides, my main monitor has the PowerPoint presenter view, and the students see the 1080p feed on the AOC monitor. (Any higher resolution is just a waste of bandwidth for the attendees.) I record the 1080p screen. I also put an iPad & keyboard on my desktop (not pictured) showing the Slack feed with the student questions & discussion. If I need to put a link in the Slack chat, I can do that quickly on the iPad without disrupting the screen recording.
  • When I’m doing demos, I escape out of PowerPoint, and I use a 1920×1080 RDP or VMware session on the main monitor, just under the ring light & webcam. (That way, I’m still looking at the camera when I’m working on the demos.)

Green screen: Elgato Green Screen MT – mounted to the wall behind me, and pulls down like a projector screen. In post-production, when I’m editing the recordings for Instant Replay here on the site, I can use Screenflow’s chroma key effect to make my video overlay on top of the slides and SSMS. I light the green screen with a pair of Emart LED lights, which I like because they don’t get warm.

Screen capture: Telestream ScreenFlow – it’s like Camtasia, but I found it much easier to use and more powerful. It lets me record my desktop, webcam, and audio locally while I’m broadcasting, thereby getting a much higher quality than just getting the GoToWebinar recordings (which are heavily compressed). After recording stops, I can edit in ScreenFlow, use audio plugins like iZotope RX7, make my background disappear with chroma key effects, add callouts and transitions, and much more. The only criticism I have of ScreenFlow is really small: it only records one video camera at a time. I’d like to step away from the computer from time to time during the webcasts, and do a fireside chat story time with the attendees. That’s doable with GoToWebinar – I can switch camera feeds – but not doable with ScreenFlow yet.

Control panel: Elgato Stream Deck XL – I bought this earlier in the year, and I still haven’t gotten around to configuring it yet, but it’s on my list for the next round of Mastering classes. The Stream Deck has 32 customizable keys that I should (in theory) be able to customize to play sound effects, put gifs into Slack, award trophies for questions, etc. I’m curious to see how it cooperates with apps while PowerPoint is running fullscreen – I may need to start running PowerPoint in a VM instead.

Voting Open for #SQLBits Sessions, and Here’s My Oddball Idea

Voting is open now for SQLBits sessions, and I have to warn you that there are a LOT of sessions to choose from. Bits doesn’t just flat out take the top vote-getting sessions – they use votes as just one factor in session selection – and I don’t want you to think I’m campaigning for votes.

In fact…it’s kinda the opposite.

See, this year, Bits suggested that we offer multi-part sessions: several sessions in a row that build on each other. A few years back, I gave a half-day 500 Level Guide to Career Internals session at the PASS Summit, and it was really fulfilling. I figured, why not try breaking that up and updating it? Here’s the new 3-part series:

They’re meant to build on top of each other, layering in more details in each subsequent session. If these get picked, I fully expect attendance to drop down through the sessions – my goal is to give people enough information to decide whether the next session is right for them. Not everyone is going to be willing to take this kind of journey, but at least attendees will feel more comfortable about making the compromises necessary to get financial independence earlier.

So if you want those sessions, great – vote for ’em. But if you don’t, take a few moments to search through the abstract list and cast votes for sessions you DO want to see, whether they’re mine or not. Picking the right material is hard – conference organizers have a hell of a job – and this is the part where you can help. I know it’s not easy – there are a lot of submissions – but you really can help.

The Two Ways to Fix Non-Sargable Queries

In the database world, when we say something isn’t sargable, what we mean is that SQL Server is having a tough time with your Search ARGuments.

What a sargable query looks like

In a perfect world, when you search for something, SQL Server can do an index seek to jump directly to the data you want, then read only the rows you want, and nothing more. As an example, say we’ve got an index on DisplayName, and we look for users with a specific display name:
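The query itself didn’t survive in this export, but it would look roughly like this (the index name and the display name I’m searching for are just placeholders):

```sql
/* Assumed supporting index on DisplayName: */
CREATE INDEX IX_DisplayName ON dbo.Users (DisplayName);

/* Sargable: the WHERE clause compares the bare column to a value,
   so SQL Server can seek straight to the matching rows. */
SELECT Id, DisplayName
  FROM dbo.Users
  WHERE DisplayName = N'Brent Ozar';
```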

Then our execution plan shows a nice, tidy index seek with one row read and one row actually produced:

What a non-sargable query looks like

Say I don’t trust the data in my database, and I ask SQL Server to strip off the leading & trailing spaces:
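A sketch of that query (again, the search value is just an example):

```sql
/* Non-sargable: wrapping the column in functions means SQL Server
   has to evaluate LTRIM/RTRIM for every row before it can compare,
   so it can't seek on the DisplayName index. */
SELECT Id, DisplayName
  FROM dbo.Users
  WHERE LTRIM(RTRIM(DisplayName)) = N'Brent Ozar';
```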

Now, my execution plan looks a lot worse: I’m reading a lot more rows than I’m actually producing.

The two ways to fix it

If you want to read less data, you can either:

  • Change the query to make it sargable, or
  • Pre-bake data on disk to match what you’re looking for

Changing the simple query in this case is pretty easy: we’d just remove the LTRIM/RTRIM. (I’ll talk about more complex situations at the end of the post.)

Pre-baking the data on disk can be as simple as adding a computed column, and then indexing it:
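Here’s a sketch of what that looks like – the computed column and index names are ones I’m making up for illustration:

```sql
/* Pre-bake the trimmed value as a computed column... */
ALTER TABLE dbo.Users
  ADD DisplayNameTrimmed AS LTRIM(RTRIM(DisplayName));

/* ...then index it so the pre-baked data lives on disk, sorted. */
CREATE INDEX IX_DisplayNameTrimmed
  ON dbo.Users (DisplayNameTrimmed);
```

The computed column is deterministic, so SQL Server is happy to index it – no need to persist it first.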

Now my execution plan is awesome again, reading just one row and producing one row:

My favorite part is that we didn’t even have to edit the query! The query still refers to LTRIM(RTRIM(DisplayName)). Good ol’ SQL Server looks at the query and says, “Ah, they’re looking for LTRIM/RTRIM DisplayName, and it just so happens I’ve already pre-baked that data on disk!” He uses the index on the trimmed column even though we didn’t ask for it.

My least favorite part is what happens if I flip the order of LTRIM/RTRIM, or if I use the new TRIM command:
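These variations produce the same logical result, but the expressions don’t match the computed column’s definition (search value still just an example):

```sql
/* Functions flipped: RTRIM outside, LTRIM inside. */
SELECT Id, DisplayName
  FROM dbo.Users
  WHERE RTRIM(LTRIM(DisplayName)) = N'Brent Ozar';

/* Or the newer TRIM function (SQL Server 2017+). */
SELECT Id, DisplayName
  FROM dbo.Users
  WHERE TRIM(DisplayName) = N'Brent Ozar';
```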

SQL Server’s all, “Whoa, I’ve never seen functions like that in a WHERE clause, and I don’t have any of that data pre-baked,” so he ends up scanning the table again:

So if my users were regularly running non-sargable queries using all 3 methods (LTRIM/RTRIM, RTRIM/LTRIM, and TRIM), then I would either have to create 3 different indexes (bad idea), or else create just one of ’em, and incentivize my users to query the data in a consistent way.

To learn more about this technique and how to solve more complex non-sargable queries, check out the Mastering Index Tuning module on filtered indexes, indexed views, and computed columns.