Autoparameterized Trivial Queries May Not Get Partition Elimination

How’s that for a niche topic? You’re probably never going to need to know this, but since I had to figure it out the hard way, I’m writing this down so I don’t forget it: queries that get trivial optimization may not get partition elimination.

This post is going to be a little on the long side just because I have to set up a partitioned table. I also have to warn you that if you’re a partitioning pro, you’re not going to like the way I’m doing it in this post, but the thing is, I’m reproducing an exact client problem – setting up partitioning exactly the way their table had it. Let’s not waste time flaming the partition setup.

Setting up a partitioned table for the demo

I’m going to start with the Users table from the Stack Overflow database.

To reproduce the client issue that brought me here, let’s say we always filter users by Location and DisplayName, like this:
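Something along these lines – the literals are the same ones that show up later in the plan’s parameter list, and the column list is my guess:

SELECT Id, DisplayName, Location
FROM dbo.Users
WHERE Location = N'San Diego'
  AND DisplayName = N'Brent';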

And say that we decided to partition the users by ranges of locations, alphabetically. I’ll create a partition function and scheme, and then copy the Users table into a Users_partitioned table.
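Here’s a rough sketch of that setup – the object names, boundary values, and trimmed-down column list are mine, not the client’s exact script:

CREATE PARTITION FUNCTION pf_Location (NVARCHAR(100))
AS RANGE LEFT FOR VALUES (N'A', N'B', N'C', N'D', N'E', N'F', N'G', N'H', N'I',
    N'J', N'K', N'L', N'M', N'N', N'O', N'P', N'Q', N'R', N'S', N'T', N'U',
    N'V', N'W', N'X', N'Y', N'Z');   -- 26 boundaries = 27 partitions
GO
CREATE PARTITION SCHEME ps_Location
AS PARTITION pf_Location ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.Users_partitioned
(
    Id INT NOT NULL,
    DisplayName NVARCHAR(40) NOT NULL,
    Location NVARCHAR(100) NULL,
    Reputation INT NOT NULL
    /* remaining Users columns trimmed for brevity */
) ON ps_Location (Location);
GO
CREATE CLUSTERED INDEX CIX_Users_partitioned
    ON dbo.Users_partitioned (Location, DisplayName)
    ON ps_Location (Location);
GO
INSERT dbo.Users_partitioned (Id, DisplayName, Location, Reputation)
SELECT Id, DisplayName, Location, Reputation
FROM dbo.Users;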

Yes, you’re going to be angry that Location isn’t a great candidate for partitioning because data isn’t evenly distributed by Location, and you’re probably going to be angry about RANGE LEFT, and you’re going to have questions about different collations. Zip it. That’s not the point of this demo. This was the client’s existing partitioning strategy, and I have to do several demos for them to show different issues that we’re having with the setup. I love you a lot – no, a LOT – but I’m not going to craft a different demo for you. I’m just sharing this one demo with you because it’s easy to share publicly, whereas some of the rest I need to show can’t be shown publicly because it relates to their IP. I’m typing all this out because partitioning people are passionate about proper performance, and I just know they’re gonna flame me in the comments as if *I* designed this strategy. Anyhoo, moving on.

Running a trivial query

Let’s try our query to see if it divebombs into just one partition – the one containing San Diego. The actual query plan looks simple enough:

And it looks like we divebombed straight in, but right-click on the Clustered Index Seek and click Properties:

“Actual Partitions Accessed” shows that we accessed partitions 1 through 27. Another way to see it is SET STATISTICS IO ON, which shows that we read all 27 partitions of the alphabet.

Why?

Right-click on the SELECT operator in the plan and look at the properties, and scroll down to Optimization Level. SQL Server believed this query was trivially simple, so it didn’t put much thought into building an execution plan.

It’s not that SQL Server didn’t put any thought into it – note the 122ms of Compile CPU time. Partitioned tables generally see longer compilation times, even when they’re trivial. (And the more partitioned tables you have in a query, the worse this seems to get.)

Even though our query didn’t have parameters, SQL Server thought, “They’re probably going to run queries like this a few times, so I’m gonna automatically turn those literals (San Diego and Brent) into parameters.” Note the “Parameter List” in the screenshot – this is autoparameterization.

Now look, I gotta pick my battles here: I can’t teach you all of the intricacies of autoparameterization and trivial optimization inside one blog post.

Let’s set that aside, and keep going by building a stored procedure.

Put that same query in a stored procedure, and magic happens.

Instead of passing the literals in directly from the app, let’s make a stored procedure. Making a proc isn’t the only way of fixing this problem by any means, but it’s just a way of fixing it:
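The proc is nothing fancy – the name is mine, but the shape matters more than the name:

CREATE OR ALTER PROC dbo.usp_UsersByLocationAndName
    @Location NVARCHAR(100),
    @DisplayName NVARCHAR(40)
AS
BEGIN
    SELECT Id, DisplayName, Location
    FROM dbo.Users_partitioned
    WHERE Location = @Location
      AND DisplayName = @DisplayName;
END
GO
EXEC dbo.usp_UsersByLocationAndName
    @Location = N'San Diego', @DisplayName = N'Brent';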

Now the logical reads tell a different story:

And while the actual execution plan looks the same at first, showing a clustered index seek:

Look at the clustered index seek’s properties, and it shows that we only accessed 1 partition (partition count), partition #20:

And it isn’t because this query got full optimization, either!

It’s still trivial. That’s kinda wild.

The morals of the story are NOT:

  • “You should put every query in a stored proc”
  • “You should use variables”
  • “You should add complexity to your queries to make them get full optimization”

The morals of the story ARE:

  • Just because you see a clustered index seek on a partitioned table doesn’t mean it really seeked into one specific area.
  • Partition elimination investigation requires looking at partitioned tables in a query plan and checking to see how many partitions were accessed.
  • Just because you should get partition elimination doesn’t mean you will.
  • Just because you get partition elimination in a stored procedure doesn’t mean you will get it everywhere that you run a similar query.
  • Partitioning really isn’t a query performance feature: it’s a maintenance feature, making data loading & index maintenance easier, especially for columnstore tables. This whole article just plain ol’ wouldn’t have mattered on a non-partitioned index at all: we would have gotten a plain ol’ clustered index seek, fewer logical reads, and faster plan compilations.
  • You can get faster query performance with table partitioning – but in most of the shops where I see it, it’s making query performance worse, and we have to put additional work in just to get equivalent query performance that a non-partitioned table would get.

I said you couldn’t flame me for the partition design on this table, but…you’re totally allowed to flame me for that last bullet point. That’s fair game.


When Do I Need to Use DESC in Indexes?

If I take the Users table from any Stack Overflow database, put an index on Reputation, and write a query to find the top 100 users sorted by reputation, descending:
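Roughly like this – the index name is mine:

CREATE INDEX IX_Reputation ON dbo.Users (Reputation);
GO
SELECT TOP 100 Id, DisplayName, Reputation
FROM dbo.Users
ORDER BY Reputation DESC;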

It doesn’t matter whether the index is sorted ascending or descending. SQL Server goes to the end of the index and starts scanning backwards:

If you right-click on the Index Scan and go into Properties, you can see that the data is ordered, and SQL Server is scanning backwards:

You don’t need a separate descending index for that.

But if you sort multiple fields ASC/DESC/ASC, it gets complicated.

Say we’re looking for the highest-reputation people who live in London, sorted by reputation descending, and in the event of a tie, we want them listed in alphabetical order. Here’s an index we might build, plus the query:
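Here’s the kind of pair I mean – I’m guessing at the exact Location string and index name:

CREATE INDEX IX_Location_Reputation_DisplayName
    ON dbo.Users (Location, Reputation, DisplayName);
GO
SELECT TOP 100 DisplayName, Reputation
FROM dbo.Users
WHERE Location = N'London, United Kingdom'
ORDER BY Reputation DESC, DisplayName ASC;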

In true Clippy style, SQL Server is recommending an index on Location – but with the rest of the columns just included in the index, not even sorted. Good times.

Ah, Clippy, good times. Stay in school, buddy.

We’re getting the sort because our data is kinda sorted, but not sorted enough, and if you hover your mouse over the Sort and look at the bottom of the tooltip, you’ll see that SQL Server is sorting by Reputation descending, DisplayName ascending.

To understand why, think about how the data is arranged when we seek to London, go to the highest reputation, and start reading backwards. Here’s a visualization query to see what’s in the index:
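A sketch of that visualization – just select the index’s key columns in the order the index stores them:

SELECT Location, Reputation, DisplayName
FROM dbo.Users
WHERE Location = N'London, United Kingdom'
ORDER BY Location, Reputation, DisplayName;   -- the index's natural ascending order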

To simulate a backwards scan, go to the end of the result set and start reading upwards. At first, it looks like the data is perfectly sorted, but as you continue to scan backwards and start hitting ties, we have a problem:

If you’re reading from the bottom up:

  • Row 6752: you read the first 1140, tsvallender
  • Row 6751: you read another 1140, and you realize that the data’s not in order
  • You could in theory now jump back down and re-read 6752, and now you have the 1140s in order, but…how do you know that 6751 was the last 1140? You’d have to look up at row 6750
  • Row 6750: you read this, and it’s the first 1138, but
  • Row 6749: he’s also 1138, so you have to keep reading upwards, and…

That’s gonna get old. It’s too much random jumping around, and it’s not a scan anymore, so rather than doing that dancing during the reading, SQL Server just says, “I’m gonna add a sort to the execution plan because the data isn’t ordered the way I need it to be ordered.”

We could fix that with a DESC index.

But it can’t just be EVERYTHING descending. The sort order has to match our query’s order, like this:
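Only Reputation gets the DESC, because only Reputation is descending in the ORDER BY:

CREATE INDEX IX_Location_ReputationDesc_DisplayName
    ON dbo.Users (Location, Reputation DESC, DisplayName ASC);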

So now our execution plan doesn’t have a sort or a memory grant:

Thing is, though, I almost never need to use this trick. Most of the time, the sort in the query plan just isn’t that expensive – like in this case, if you repeatedly compare the two queries, we’re talking about very small differences in memory grants and CPU consumption. The difference grows as the volume of sorted data grows, like if we’re talking about bringing back millions of rows, or if the query frequency grows, like if we’re running the query thousands of times per second.

Want to learn more tricks like this?

If you like tricks like this, you’ll love my Mastering Index Tuning class. The next one starts December 8th (iCal), and after that, Feb 12-14 (iCal.)

Folks with a Live Class Season Pass are welcome to drop in anytime, or just watch the Instant Replays on a schedule that works for them. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.

See you in class!


How to Set & Get the Next ID Without Serializable Isolation

Say for some reason you can’t use identity columns or sequences, or maybe your application was built a long, long time ago – and you’ve got a table with a list of IDs to use next:
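Picture something like this – the exact column names vary by app, but the idea is one row per table, holding the next ID to hand out:

CREATE TABLE dbo.Ids
(
    TableName SYSNAME NOT NULL PRIMARY KEY CLUSTERED,
    CurrentId INT NOT NULL
);

INSERT dbo.Ids (TableName, CurrentId)
VALUES (N'Orders', 1), (N'OrderLineItems', 1);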

Whenever your application wants to insert a row, it needs to do two things: it needs to grab an ID for that table, and it needs to increment the CurrentID by one. This is a common design pattern I see with older applications that need to get an ID from the database, but then do some processing on the application side. For example, they want to reserve an OrderID, and then in the application code, they build a list of insert statements for not just the Order, but the line item tables as well.

One way to code this would be to use serializable isolation while you work, holding a transaction so that nobody can change the table while you’re working in it:
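A sketch of that approach, with the error handling stripped out to keep it short:

CREATE OR ALTER PROC dbo.usp_GetNextId
    @TableName SYSNAME,
    @Id INT OUTPUT
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRAN;

        SELECT @Id = CurrentId
        FROM dbo.Ids
        WHERE TableName = @TableName;

        UPDATE dbo.Ids
        SET CurrentId = CurrentId + 1
        WHERE TableName = @TableName;

    COMMIT;
END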

In case you haven’t seen output variables for procs before, here’s how you use them:
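Declare a variable, pass it in with the OUTPUT keyword, and the proc hands the value back:

DECLARE @NextOrderId INT;

EXEC dbo.usp_GetNextId
    @TableName = N'Orders',
    @Id = @NextOrderId OUTPUT;

SELECT @NextOrderId AS NextOrderId;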

This stored procedure works, but it doesn’t scale well when you get into thousands of inserts per second, especially scattered across lots of tables. (And I’m simplifying here: in a real-world scenario, this kind of stored procedure would have some error checking built into it as well, especially given the blocking scenarios you can hit with serializable isolation levels.)

Here’s a quick improvement that doesn’t require ripping out the stored procedure and switching to identity columns or sequences:
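Roughly like this – one statement does both the read and the increment:

CREATE OR ALTER PROC dbo.usp_GetNextId
    @TableName SYSNAME,
    @Id INT OUTPUT
AS
BEGIN
    UPDATE dbo.Ids
    SET @Id = CurrentId,              -- grab the current value
        CurrentId = CurrentId + 1     -- and increment it, all in one atomic statement
    WHERE TableName = @TableName;
END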

This leverages the fact that you can both update data AND set variables during an update statement. With this trick, I don’t need to touch the Ids table multiple times, which means I don’t need a transaction, which means I don’t need serializable. Suddenly, this really opens the floodgates on concurrency with this table.

I still like identity columns & sequences better, though.

Want to learn more tricks like this?

If you like tricks like this, you’ll love my Mastering Query Tuning class. I’ve got two upcoming sessions: December 11-13 (Fri/Sat/Sun, iCal) and January 12-14 (Tues/Weds/Thurs, iCal).

If you’ve got a Live Class Season Pass, you can drop in on this class or any of my live online classes at any time. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.


How to Insert Rows and Get Their Identity Values with the OUTPUT Clause

Say you’ve got a two-step process where you’re:

  1. Inserting rows into a table that has an identity column, then
  2. Querying that table to figure out what identities you got

There’s a faster way that doesn’t require hitting the table twice: the OUTPUT clause.

I’ll demonstrate it with the Badges table from the Stack Overflow database, which has an Id column that’s an identity. I’m going to find all of the Users who live in Iceland, give them a badge, and then return the list of badge IDs I just granted:
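The two-step version looks something like this – the badge name and that last “rows we just inserted” filter are my stand-ins, and that filter is exactly the fragile part:

INSERT INTO dbo.Badges (Name, UserId, Date)
SELECT N'Cold Climate', u.Id, GETUTCDATE()
FROM dbo.Users u
WHERE u.Location = N'Iceland';

/* Step 2: go back and figure out which rows we just created */
SELECT b.Id
FROM dbo.Badges b
INNER JOIN dbo.Users u ON b.UserId = u.Id
WHERE u.Location = N'Iceland'
  AND b.Name = N'Cold Climate'
  AND b.Date >= DATEADD(MINUTE, -1, GETUTCDATE());   -- fragile guess at "what we just did"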

This code pattern is kinda painful because:

  • We lock rows twice (when we read & insert, and then again when we read back what we just did)
  • We have to be careful to only fetch the rows we really just inserted – so we end up putting in all kinds of convoluted logic to work around concurrency problems

Instead, use the OUTPUT clause.

Here’s how to do it without touching the Badges & Users tables twice:
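A sketch, using the same hypothetical badge name as above:

INSERT INTO dbo.Badges (Name, UserId, Date)
OUTPUT INSERTED.Id, INSERTED.UserId    -- the identity values come straight back to the client
SELECT N'Cold Climate', u.Id, GETUTCDATE()
FROM dbo.Users u
WHERE u.Location = N'Iceland';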

The OUTPUT clause is kinda like the virtual INSERTED/DELETED tables: it lets you grab the output of what you’re doing and redirect it to another place. Presto, less locking, less T-SQL to manage, less guessing as to which rows were affected by your operation.

Isn’t that slick? In my own line of work, I sure don’t need to use it often, but when I do, it’s amazing.

Want to learn more tricks like this?

If you like tricks like this, you’ll love my Mastering Query Tuning class. I’ve got two upcoming sessions: December 11-13 (Fri/Sat/Sun, iCal) and January 12-14 (Tues/Weds/Thurs, iCal).

If you’ve got a Live Class Season Pass, you can drop in on this class or any of my live online classes at any time. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.


Finding the One Query to Tune in a Multi-Query Batch

When you have a stored procedure or application query that has multiple queries inside of it, how can you tell which query to focus on?

For starters, do NOT believe the percentages you see in execution plans.

Let’s take a classic performance tuning issue: scalar functions. I’ll start with the Stack Overflow database – if you want to play at home, best to use the smallest database you can because this query performs terribly – then create a function and a stored procedure:
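The shape is along these lines – a scalar function that touches a big table, plus a proc where one statement uses it and another doesn’t. The names and exact queries here are mine, not the exact demo code:

CREATE OR ALTER FUNCTION dbo.fn_CommentCount (@UserId INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT COUNT(*) FROM dbo.Comments WHERE UserId = @UserId);
END
GO

CREATE OR ALTER PROC dbo.usp_UserReport
AS
BEGIN
    /* Statement 1: big scan plus a sort, so it looks expensive in the plan */
    SELECT TOP 100 Title, Score
    FROM dbo.Posts
    ORDER BY Score DESC;

    /* Statement 2: the scalar UDF runs once per row - this is the slow one */
    SELECT u.Id, u.DisplayName, dbo.fn_CommentCount(u.Id) AS CommentCount
    FROM dbo.Users u;
END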

Then I’ll call the stored procedure, which takes 30 seconds even on the smallest Stack Overflow database:

And then use sp_BlitzCache to ask which queries ran the longest:
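That’s just sp_BlitzCache sorted by duration:

EXEC sp_BlitzCache @SortOrder = 'duration', @Top = 10;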

I’ve rearranged the sp_BlitzCache output columns in this screenshot to tell the story better:

In the result sets:

  1. The stored procedure itself shows up first with a total duration of 33,365 milliseconds
  2. One specific statement in the proc took 33,221 milliseconds of the above
  3. Another statement took just 143 milliseconds

When I click on any query plan in the “Query Plan” column, I’m going to get the query plan for the entire batch, not that specific statement. It doesn’t matter whether I click on the plan in line 1, 2, or 3 – I’m going to get the entire stored proc’s plan. That gets a little confusing because now, looking at the plan, we have to figure out which query took 33 seconds, because that’s the one we need to tune. Can you guess which one it is?

I’ll zoom in a little to help:

You might guess that it’s Query 1.

It has the biggest arrows and shows 100% of the cost.

Except that’s wrong.

That’s 100% of the estimated cost, calculated before the query executed.

Take another look at the sp_BlitzCache output, and pay particular attention to the Cost column:

The top query is the stored procedure, which took 33.3 seconds in total. Its total cost is so large that SQL Server resorted to scientific notation. The second query, which took the vast majority of the time, only cost 726 query bucks – way, way less than line 3.

Hover your mouse over the root node in each plan (in this case, the selects) and look at their costs. The one that cost 726 query bucks is actually Query 2, not Query 1:

To find the right query to tune,
listen to sp_BlitzCache.

When you’re looking at sp_BlitzCache’s output for multi-statement queries, pay particular attention to the “Cost” column.

Don’t look at the highest cost – instead, just look at the statement that ranks first in sp_BlitzCache’s output, then make a note of its cost. When you open up the query’s execution plan, look for that statement & that cost number – and that’s the real query that you need to focus on tuning. It’s often very different than the one with the highest estimated cost.

People who liked this post also liked Erik Darling’s technique for getting specific query plans from long stored procedures. He edits the proc to turn on SET STATISTICS XML ON right before the specific query runs, and then turns it back off again immediately afterwards. Slick!

Want to learn more query tuning tricks?

You’ll love my 3-day Mastering Query Tuning class. I give you live, running workloads hitting the Stack Overflow database, and you have to investigate & solve the issues using tricks I taught you in class. The upcoming rotations are:

And you can join in live anytime if you have a Live Class Season Pass, or watch the Instant Replays.


What It Takes To Write Two Blog Posts

This week, I published two blog posts:

Let’s talk about the process of writing ’em.

While I was putting together this week’s First Responder Kit release, I realized sp_BlitzIndex didn’t have URLs for a couple of common families of problems: bad statistics and scalar user-defined functions. I made myself a couple of Github issues (#2670 and #2671) to track the documentation work I needed to do, and I decided to live stream it this past Saturday so I could show y’all what my blogging process looks like.

In this two-hour session, I walk you through writing those two posts:

In that session, here are some of the things I talk about:

  • Your blog has two kinds of readers: your regulars and people who just found this one specific page via a Google search. These two posts were specifically crafted for the latter. Sure, my regular readers would consume the info, but they wouldn’t be raving about how awesome the posts are.
  • Scoping a post is hard: it’s hard to limit yourself to just writing about specific parts of an issue. It’s really tempting to just let the writing flow, and then next thing you know you’ve lost an entire day and you’re nowhere near what you’d consider “finished.” I try to scope my posts with the clock: how much can I actually cover in an hour? Sometimes I write for a very junior-level reader (as Nick says on the stream, someone who’s on chapter 1) and sometimes I write for an expert-level reader (and in that case, I don’t cover a lot of the prerequisites.)
  • Anytime you feel guilty for not covering more scope in the post, remember that you can finish the post with a list of recommended reading for the reader to continue their learning journey.
  • Writing takes up time. I wrote these posts on a Saturday morning, and about 75 minutes in, I get a text from my wife, making me write with a little bit more urgency. That’s a good reminder that the time you put into blogging needs to pay off somehow – whether it’s in the form of improved mental health for you, or a sense of reward for helping others, or literally making you more money. I’m a consultant and trainer, so the blog posts and videos are effectively marketing material for my “day job.” That makes it easier to put in work because I hopefully see a return on it later.

If you want to learn more about the process of writing to forward your career, check out my 2-hour conference session, 500-Level Guide to Career Internals.


When You’re Troubleshooting Blocking, Look at Query #2, Too.

When I’m troubleshooting a blocking emergency, the culprit is usually the query at the head of a blocking chain. Somebody did something ill-advised like starting a transaction and then locking a whole bunch of tables.

But sometimes, the lead blocker isn’t the real problem. It’s query #2.

Here’s a sample scenario:

  1. A long-running select with nolock starts, taking out a schema stability lock: nobody can change the structure of the table while he runs
  2. An online index operation needs to finish up, which needs a schema modification lock on that same table in order to switch in the newly built index – but he can’t do that until select #1 finishes
  3. Other selects with nolock start – but they need a schema stability lock, which they can’t get until #2 finishes

Most monitoring tools will say that #1 is the lead blocker, and they’re technically correct. However, it’s really confusing for users because they look at the monitoring tool and ask questions like:

  • “How can that #1 query possibly block someone else?”
  • “What could I even do to reduce the impact of select #1?”
  • “How are a bunch of selects with nolock being blocked by another select with nolock?”

Reproducing the problem with an “online” index rebuild

Let’s reproduce it with the Stack Overflow database. I’m going to put a tiny index on the Posts table:
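Something like this – the filter predicate is mine, picked just so the index stays tiny:

CREATE INDEX IX_Posts_HighScore
    ON dbo.Posts (Score)
    WHERE Score > 10000;   -- only a handful of posts qualify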

That index only has a few rows in it, so it’ll be quick to rebuild – and the rebuild is what I’m going to use to illustrate the problem.

Query #1: To start our blocking chain, I kick off a long-running select with nolock:
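Any slow scan will do; here’s the kind of thing I mean:

SELECT COUNT(*)
FROM dbo.Posts WITH (NOLOCK)
WHERE Body LIKE N'%spatula%';   -- leading wildcard = scan the whole big table, slowly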

Query #2: I kick off an online index rebuild on that tiny filtered index – which would normally happen instantly, but it needs a schema modification lock to switch in the new index:
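The rebuild itself, using my hypothetical index from above (online rebuilds require an edition that supports them):

ALTER INDEX IX_Posts_HighScore ON dbo.Posts
    REBUILD WITH (ONLINE = ON);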

That query is blocked by #1, which wouldn’t be a big deal, but now…

Queries #3: I fire off Query #1 again, but this time in SQLQueryStress so I can throw a lot of sessions at it at once:

They’re all blocked.

The problem is easy to diagnose with sp_WhoIsActive.

Here’s what the situation looks like in sp_WhoIsActive @find_block_leaders = 1, which helpfully organizes the blocking chains in a row:

The lead blocker looks like a select with nolock – but if you’re going to troubleshoot something, that’s not the query to troubleshoot. He’s fairly harmless. Instead, you have to go down a level to figure out who’s blocking the rest. I just love this approach.

It's the one between the aubergine query and the eggplant query.

See the index rebuild?

But the reason we’re gathered here today, dear congregation, is because I had to troubleshoot this exact issue for a client. They were seeing strange blocking problems that didn’t make sense: all day long, random queries that never should have been a lead blocker were suddenly blocking dozens or hundreds of other queries out of nowhere.

Their monitoring tool just listed all of the waiting statements in a pile – without distinguishing query #2, which was the real problem. The client didn’t realize there was a hidden “online” index operation in that colorful mess. That was the real culprit, but the monitoring tool only showed query #1 as the lead blocker – and since query #1 was different every time the index rebuild would sneak in, it was nearly impossible to troubleshoot.

In this case, SQL Server 2014 & newer has an easy solution.

SQL Server 2014 introduced a few new options for index rebuilds:
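Specifically, WAIT_AT_LOW_PRIORITY, which tells the rebuild how long to wait politely and what to do if it’s still blocked when the time runs out:

ALTER INDEX IX_Posts_HighScore ON dbo.Posts
    REBUILD WITH (ONLINE = ON (
        WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES,
                              ABORT_AFTER_WAIT = SELF)   -- or NONE, or BLOCKERS
    ));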

Now, when I rerun the same scenario, sp_WhoIsActive looks wildly different:

Only the “online” index rebuild is blocked, and he has a different wait type – he’s waiting at a low priority. He’s sitting by in the background, graciously letting other queries get in ahead of him.

If you use Ola Hallengren’s IndexOptimize script, use the parameters for WaitAtLowPriorityMaxDuration and WaitAtLowPriorityAbortAfterWait to configure these options.

Want to learn more about troubleshooting this stuff?

Check out my How I Use the First Responder Kit class. I walk you through using sp_BlitzWho, sp_BlitzFirst, sp_BlitzIndex, and many of the other scripts in the open source First Responder Kit.


How Scalar User-Defined Functions Slow Down Queries


When your query has a scalar user-defined function in it, SQL Server may not parallelize it and may hide the work that it’s doing in your execution plan.

To show it, I’ll run a simple query against the Users table in the Stack Overflow database.
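Here’s the kind of query I mean:

SELECT TOP 100 DisplayName, Location, Reputation
FROM dbo.Users
ORDER BY Reputation DESC;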

I don’t have an index on Reputation, so SQL Server has to sort all of the Users by their Reputation. That’s a CPU-intensive operation, so SQL Server automatically parallelizes it across multiple CPU cores:

The racing stripes on the plan operators indicate that the operations went parallel. Another way to see that is by using SET STATISTICS TIME ON, which adds CPU time and execution time information to the Messages tab of SSMS:

See how CPU time is higher than elapsed time? That’s an indication that the query went parallel. Because work was distributed across multiple cores, SQL Server was able to get 969 milliseconds of work done in just 357 milliseconds.

But when we add a scalar user-defined function…

Say our users want to see their names and locations formatted a little more nicely. Instead of two columns that say “Brent Ozar” and “San Diego”, they want a single column that says “Brent Ozar from San Diego”. And I don’t want to put that concatenation logic all over the place in every query I write, so I encapsulate it in a function:
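Something along these lines – the function name is mine, but note the concatenation and the COALESCE to handle users with no Location:

CREATE OR ALTER FUNCTION dbo.fn_NameAndLocation
    (@DisplayName NVARCHAR(40), @Location NVARCHAR(100))
RETURNS NVARCHAR(150)
AS
BEGIN
    RETURN @DisplayName + COALESCE(N' from ' + @Location, N'');
END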

The results are a little easier on the eyes:

Now, when I use that function inside the query, the query technically works fine:
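Meaning something like:

SELECT TOP 100 dbo.fn_NameAndLocation(DisplayName, Location) AS NameAndLocation,
       Reputation
FROM dbo.Users
ORDER BY Reputation DESC;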

But the execution plan is missing a little something:

And the statistics time output shows that it still needed a lot of CPU, but since it didn’t go parallel, it took longer on the clock:

If you dig deeply enough in the execution plan properties, SQL Server notes that it couldn’t build a valid parallel execution plan, but it doesn’t say why:

There are a few ways to fix this.

One way is to inline the contents of your function – literally copy/paste the scalar function’s contents directly into your query:
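With my hypothetical function above, that looks like:

SELECT TOP 100 DisplayName + COALESCE(N' from ' + Location, N'') AS NameAndLocation,
       Reputation
FROM dbo.Users
ORDER BY Reputation DESC;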

This query goes parallel, proving that the concatenation and coalescing isn’t what was stopping us from going parallel – it was the presence of a scalar user-defined function:

Statistics time output shows that we went parallel and ran faster overall:

 

Another way to fix it is to upgrade to SQL Server 2019 and set your database to 2019 compatibility level. One of 2019’s most ambitious features, code named Froid, aims to automatically inline scalar user-defined functions without you having to rewrite them.
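Switching compat levels is one statement – swap in your own database name:

ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 150;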

When I set my database into 2019 compat level, the query with the scalar function goes parallel again:

And statistics time output shows the performance improvement:

However, SQL Server 2019’s scalar function inlining comes with a huge number of drawbacks. Your query can actually go slower, or your scalar function may use features that SQL Server 2019 refuses to inline.

To fix this problem, you’ve got two realistic options: inline the function’s logic into your queries yourself, or test SQL Server 2019’s automatic inlining carefully before you rely on it.

Want to learn more about troubleshooting these kinds of issues?

Check out my How I Use the First Responder Kit class. I walk you through using sp_BlitzWho, sp_BlitzFirst, sp_BlitzIndex, and many of the other scripts in the open source First Responder Kit.

If you’ve got a Live Class Season Pass, you can drop in on this class or any of my live online classes at any time. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.

Or, if you’ve got a Recorded Class Season Pass, you can hop in and watch the recordings in your account at any time.

See you in class!


How Bad Statistics Cause Bad SQL Server Query Performance

SQL Server uses statistics to guess how many rows will match what your query is looking for. When it guesses too low, your queries will perform poorly because they won’t get enough memory or CPU resources. When it guesses too high, SQL Server will allocate too much memory and your Page Life Expectancy (PLE) will nosedive.

A common fix is to update your statistics regularly.

In this post, I’ll explain why bad stats cause bad performance, and give you some links on how to improve your statistics & performance.

When Statistics are Accurate…

Let’s start with a query against the Stack Overflow database. I’m using the StackOverflow2013 50GB database, but any size will get roughly similar results here.
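The query filters on Location and sorts the results – something like this, though the exact Location string in your copy of the database may differ:

SELECT Id, DisplayName, Reputation
FROM dbo.Users
WHERE Location = N'Boise, ID'
ORDER BY DisplayName;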

To execute this query, SQL Server needs to guess how many people live in Boise so that it can allocate enough memory to sort the query results in RAM. I’ll run the query and include the actual execution plan to see what it guessed:

When you’re viewing an execution plan, look at the “195 of 46” numbers under the operator. That means SQL Server actually brought back 195 rows when it only expected 46. That sounds bad – it sounds like SQL Server underestimated rows by something like 423% – but that actually isn’t too bad. I only get nervous when estimates are off by 10x or more, and that’s when SQL Server underestimates how much memory is required to do things like sorts.

In this case, SQL Server guessed really well! It used built-in, automatically-created statistics to guess how many people live in each location. It accurately guessed how much memory it would need, and there’s no yellow bang over the Sort operator because the entire sort happened in memory.

To learn more about how these statistics are generated and what their contents are, stay tuned, and I’ll give you learning resources at the bottom of the post.

When Statistics are Inaccurate…

But let’s say that a lot of people suddenly move to Boise. Manhattan folks have gotten tired of the high rents and they’re feeling a lust for delicious russet potatoes, so let’s move everyone from New York, NY to Boise:
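Something like this – again, the Location strings depend on how they’re stored in your copy:

UPDATE dbo.Users
SET Location = N'Boise, ID'
WHERE Location = N'New York, NY';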

That adds a couple thousand folks to the census. Now, if we run our same select query again, let’s see how our estimates fare:

Uh oh – SQL Server still estimated 46 people live in Boise. Before, the underestimation wasn’t that bad, but now that 2,228 folks live in Boise, SQL Server didn’t allocate enough memory for the sort, so the sort ends up spilling to disk, thus the yellow bang on the Sort operator. If you hover your mouse over the Sort operator, you can see more details about how SQL Server wrote to TempDB:

How to Fix Bad Statistics

SQL Server automatically updates statistics as your data changes, but that may not be enough. You may have to also manually trigger statistics updates. I’ll update the Users table statistics, then run my query again:
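The statistics update is one line:

UPDATE STATISTICS dbo.Users;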

Now when I run my query again, the estimates are way more accurate:

And the sort no longer spills to disk.

Now, you probably don’t want to be running manual statistics updates all the time because this can actually trigger a whole lot more problems. However, the purpose of this blog post was to get you started on your learning journey.

Here are the resources I’d recommend next:

Want to learn more about troubleshooting these kinds of issues?

Check out my How I Use the First Responder Kit class. I walk you through using sp_BlitzWho, sp_BlitzFirst, sp_BlitzIndex, and many of the other scripts in the open source First Responder Kit.

If you’ve got a Live Class Season Pass, you can drop in on this class or any of my live online classes at any time. Just head to my current training page, check out the schedule at the bottom of the page, and grab the calendar files for the classes you’re interested in. You don’t have to register ahead of time – just drop in anytime I’m streaming.

Or, if you’ve got a Recorded Class Season Pass, you can hop in and watch the recordings in your account at any time.

See you in class!


Updated First Responder Kit and Consultant Toolkit for November 2020

There’s a bunch of improvements & fixes in this month’s release, but before we get to those, I want to talk about you, dear reader.

You make the First Responder Kit possible.

When you buy my stuff, you’re funding my work on the First Responder Kit. I couldn’t do this if it wasn’t for you.

Y’all literally pay my rent.

I thank you. Erika thanks you. The rest of the community thanks you, too, because we all rely on the First Responder Kit to do our jobs every month.

And of course, if you’re already a member here, or a code contributor, thank you. You rock. You make all this possible for the entire SQL Server community.

To get the new version:

Consultant Toolkit Changes

I updated it to this month’s First Responder Kit, but no changes to querymanifest.json or the spreadsheet. If you’ve customized those, no changes are necessary this month: just copy the /Resources/SQLServerSetup checklist folder over your existing files to replace the SQL scripts.

sp_Blitz Changes

  • Improvement: new check for log backups to NUL, which would break point-in-time recovery. (#2640, thanks DBAdminDB.)
  • Fix: stop alerting about usage of service accounts on the Launchpad service. (#2664, thanks Todd Chitt.)

sp_BlitzCache Changes

  • Improvement: added @SortOrder = ‘unused grant’, useful for when you have a lot of queries that are all hitting the max grant, and you’re looking for low-hanging fruit of the ones that use the least memory. (#2646)
  • Fix: removed extra LOWER calls. (#2633, thanks Maikel van Gorkom)
  • Fix: when outputting the results to table, the PlanCreationTimeHours calculated column now correctly refers to the difference between when the plan was created vs when the data was saved to table, as opposed to the current time (which would have been artificially long). (#2655, thanks Adrian Buckman.)
  • Fix: parameterized queries in different databases were flagged as multiple query plans for the same query. (#2653, thanks Erik Darling.)

sp_BlitzFirst Changes

  • Fix: we were excluding wait types that weren’t in our predefined list, which was a problem if you were hitting a new or obscure wait type. (Hello, EC.) (#2631)
  • Fix: when saving sp_BlitzFirst’s output to table, we now trim it to 4,000 characters to avoid problems with really long warnings. (#2660, thanks Jefferson Elias aka JeffChulg.)

sp_BlitzIndex Changes

  • Improvement: revamped the sort orders to make the output more clear and actionable, and clarified the wording on some of the warnings. For example, “Cold Calculator” is now “Serial Forcer – Computed Column with Scalar UDF.” Also added a new documentation file with the prioritized list of checks. (#2662)
    • @Mode = 0 (default): now lists priorities 1-100, the most urgent things to fix.
    • @Mode = 4: all priorities, 1-255, which includes not-easy-to-fix stuff (like bad clustering key designs or really wide tables), and informational warnings (like hey, you have temporal tables.)
  • Improvement: added @SortDirection parameter that works in combination with the @SortOrder parameter for @Mode = 2, letting you sort by asc or desc. (#2651, thanks David A. Poole.)
  • Fix: statistics updates will now show “never” if they’ve never been updated. (#2643, thanks Excelosaurus.)
  • Fix: instead of showing 0 reads/writes in the summary, show “Not Tracked” on spatial indexes and “Disabled” on disabled indexes. (#2650)
  • Work in progress, but not really working well yet: on SQL Server 2019, next to missing index requests in the table-level Mode 2 output and Mode 3 output, we show a sample query plan linked to that missing index request. (#2185, thanks Erik Darling.)
  • Fix: statistics warning for low sampling rate was never firing because it was looking for rows < 1 rather than percent sampled < 1. (Caught & fixed this while I was working on #2662.)
  • Fix: statistics checks now work in @GetAllDatabases = 1. (Caught & fixed this while I was working on #2662.)
  • Fix: removed an extra space. (#2673, thanks John McCall.)

sp_BlitzLock Changes

sp_BlitzWho Changes

  • Fix: was throwing an error on AG readable secondaries when sp_BlitzFirst called sp_BlitzWho due to duplicate temp table names. (#2491, thanks Adrian B.)
  • Fix: we now pad the number of days a query’s been running with 2 digits instead of 0, so queries running more than 9 days (!!!) sort to the top of the list. (#2652, thanks Ahmet Rende.)

sp_DatabaseRestore Changes

  • Improvement: @StopAt can now skip full backups, too. (#2600, thanks Sergedess.)
  • Improvement: new @SkipBackupsAlreadyInMsdb setting. When set to 1, we check MSDB for the most recently restored backup from this log path, and skip all backup files prior to that. Useful if you’re pulling backups from across a slow network and you don’t want to wait to check the restore header of each backup. (#2644, thanks Greg Dodd and Raphael Ducom.)

sp_ineachdb Changes

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.


[Video] How to Troubleshoot Someone Else’s Temp Table Contents

Let’s say you need to troubleshoot someone’s query performance, and they’re using temp tables. You want to peek in at their temp table contents from another session while their query is running. SQL Server makes that a little bit challenging. Nic Neufeld showed how to use DBCC PAGE to do this, but today I’m going to cover another way: reading their statistics histogram.

I wrote this blog post on a live stream in my Twitch & my YouTube channels – subscribe to those if you want email notifications whenever I start live streaming stuff like this – or just keep reading below for the code & screenshot walkthrough.

Let’s start by creating a temp table in one window of SSMS:
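Any old temp table will do; I’ll make its contents something worth spying on (the name and rows are mine):

CREATE TABLE #SecretPlans
(
    Id INT IDENTITY(1, 1),
    PlanName NVARCHAR(100)
);

INSERT #SecretPlans (PlanName)
VALUES (N'Take over the world'), (N'Take a nap'), (N'Snack break');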

Then, open another tab in SSMS, meaning another session, and try to read the contents of that other session’s temp table:

Sad trombone.

Let’s try another trick: let’s use the system tables to get the full exact name of the temp table. Whenever you create a temp table, SQL Server actually creates it with a unique name suffixed with the hex equivalent of the number of temp tables that have been created since the last SQL Server instance restart:
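You can see that generated name by poking around TempDB’s system catalog – here I’m using my example temp table name:

SELECT name, object_id, create_date
FROM tempdb.sys.objects
WHERE name LIKE N'#SecretPlans%';   -- returns the padded name ending in a hex suffix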

Armed with that table name, let’s now try querying it by using the super-secret full object name:

It’s not a matter of not fully qualifying the schema, either, because it’s not in the dbo schema:

Shout out to Terry Stoneberg for improving this screenshot

This would be where normal people give up, but you know how I am, dear reader: not normal.

Let’s try reading the temp table’s statistics.

Like tables in user databases, SQL Server will automatically create statistics on columns when it needs to make decisions about the number of rows that will match filters on that column. Let’s take a peek in TempDB to see if any statistics exist on that temp table:
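A quick way to check, again using my example temp table name:

SELECT o.name AS temp_table, s.name AS stat_name, s.auto_created
FROM tempdb.sys.stats s
INNER JOIN tempdb.sys.objects o ON o.object_id = s.object_id
WHERE o.name LIKE N'#SecretPlans%';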

Drats! That temp table doesn’t have statistics yet. This is the first weakness in this blog post’s strategy to read someone else’s temp table contents: you’re only going to get statistics when folks have queried specific columns. Let’s trigger that by going back into the original session that created the temp table, and running a query that will trigger automatic stats creation on one of the columns:
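Back in the first session, any query that filters on a column will do the trick:

SELECT COUNT(*)
FROM #SecretPlans
WHERE PlanName = N'Snack break';   -- filtering triggers auto-created stats on PlanName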

Now go back over to the window where we were diagnosing statistics, and check again:

Voila! Now, armed with a few more joins, we can see the contents of that temp table’s statistics:

Presto! We see the secret plans

Here’s the query if you want to look at what’s happening right now in your own TempDB:
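Here’s my version of it – it needs SQL Server 2016 SP1 CU2 or newer for sys.dm_db_stats_histogram, and I’m running it from inside TempDB to keep the object_id resolution simple:

USE tempdb;
GO
SELECT o.name AS temp_table,
       c.name AS column_name,
       h.step_number,
       h.range_high_key,
       h.equal_rows,
       h.range_rows
FROM sys.objects o
INNER JOIN sys.stats s ON s.object_id = o.object_id
INNER JOIN sys.stats_columns sc
        ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
INNER JOIN sys.columns c
        ON c.object_id = sc.object_id AND c.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) h
WHERE o.name LIKE N'#%'
ORDER BY o.name, c.name, h.step_number;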

Limitations of this technique

In the live stream, CrankyOldGentleman reminded us that statistics only have a max of 201 steps in the histogram, so we won’t see all of the data on real-life-sized tables. In addition, even with fewer than 200 rows, SQL Server may not use all 201 steps in the histogram, because it may not need that many to paint a great picture of our table’s contents.

In addition, as Paul White has so lovingly blogged, temp tables and their statistics can be reused from one session to another. Because of that, you wouldn’t want to say that one specific user’s query activity has produced the statistics that you’re seeing in this diagnostic query. However, that’s actually a good thing: this diagnostic query can show you when someone’s session is running with temp table statistics that may not match up with their current workload.

The biggest limitation that I’ve hit in production environments, though, is that these DMVs don’t always honor isolation level requests. Even if you prefix your queries with SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED and slap WITH (NOLOCK) on every single one of ’em, that last DMV query in the post still gets blocked by active sessions using their temp tables. This technique is useful in lightly loaded environments, but not heavily loaded environments with thousands of active queries per second.

I recorded this stream and wrote the post way back in August, but I completely forgot about it until I was writing material for my new Fundamentals of TempDB class. There’s so much fun stuff around TempDB.


Paul White Explains Temp Table Caching 3 Ways


Paul White, Debugger of SQL Server

Paul White (Blog, @SQL_Kiwi) is a legendary SQL Server blogger who’s written a ton of good material. You should read all of it. You won’t, and I can say that with authority because…I don’t read it all either. It’s too much.

But when your job eventually requires you to understand temp tables way better than you do today, read these 3 posts in this order – which is probably different than what Paul would recommend, but I think they flow better this way:

1. Temporary Table Caching Explained – when you create a temp table, you probably think it’s exclusive to your own session. It’s not. Its structure (but not its contents) can be reused by other sessions, and understanding that this happens is the first step to really understanding the complexity in tuning query problems with temp tables. Paul has a few notes in here that are outdated (like around creating indexes on table variables), but overall, the concepts are still solid.

2. Temporary Table Caching in Stored Procedures – after the above, now let’s see what happens when you repeatedly call a stored procedure that uses temp tables. To do it, he starts by showing you how a real table would work, then switches over to temp tables. He shows why good stats help temp table plans, how bad cached stats hurt, how to fix it by updating stats, and how table variables perform differently.

3. Temporary Object Caching – rehashes a little of the earlier posts, but then also layers in TempDB performance issues and behavior of newer versions of SQL Server.

For years, I’ve been pointing my clients at these 3 posts whenever they hit TempDB problems – and these problems kept coming up so often that I ended up building my new Fundamentals of TempDB class. However, if you don’t have time to wait for that, or if you’d just like to get started now, Paul’s posts are one heck of an on-ramp. Enjoy.


MAXDOP Isn’t Really MAXDOP. It’s More Like DOP.

Here’s how Books Online describes the Max Degree of Parallelism setting:

You can use the max degree of parallelism option to limit the number of processors to use in parallel plan execution.

And here’s what the SQL Server 2019 setup screen says:

When an instance of SQL Server runs on a computer that has more than one CPU logical core, it detects the best degree of parallelism, that is, the number of processors employed to run a single statement, for each parallel plan execution. MAXDOP specifies the maximum number of cores to utilize for this instance.

Uh, no, SQL Server doesn’t really detect that.

In most cases, if SQL Server decides to parallelize your query, it’s goin’ straight to MAXDOP.

I’ll demo it with a 64-core server with SQL Server 2019’s defaults.

I’m using an i3.16xlarge AWS EC2 instance with 2 sockets, 64 cores altogether:

During setup, SQL Server suggests that we set MAXDOP to 8:

Which I find kinda amusing, since on an i3.metal with 72 cores (just 8 more cores), SQL Server 2019 setup suggests a MAXDOP of 15, but whatever.

I’ll add an index, then run a simple query against the Users table in a large Stack Overflow database:
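The exact demo query matters less than its cost – anything whose plan cost creeps past the threshold behaves the same way. For example (the index name and predicate here are mine):

CREATE INDEX IX_LastAccessDate ON dbo.Users (LastAccessDate);
GO
SELECT COUNT(*)
FROM dbo.Users
WHERE LastAccessDate >= '2017-01-01';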

The query just barely clears the default Cost Threshold for Parallelism (5), so it goes parallel:

Well, lemme rephrase that – the query looks like it goes parallel in the sense that there’s a parallelism operator, and icons on the plan have what Erik Darling calls racing stripes.

However, if you right-click on each operator, like the sort, and look at the number of rows assigned to each thread, it isn’t what you would call balanced. One thread read all of the data and did all of the work while the rest of the threads went out for a smoke break.

This is why your queries produce CXPACKET waits: SQL Server doesn’t do a great job of balancing work across cores, and when that work isn’t evenly balanced, SQL Server has to account for the time the idle cores aren’t doing any work.

If we raise our MAXDOP higher, the problem becomes worse. I’ll set MAXDOP up to 64, and then run the query again:
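Setting it at the server level:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 64;
RECONFIGURE;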

The execution plan looks identical:

But right-click on that Sort operator, and poor Thread 4 is the only thing holding this thing together:

And the query burned up 64 worker threads and generated 620 milliseconds of CXPACKET wait – in a query that finished in under 100ms:

This is why it’s so important to set Cost Threshold for Parallelism correctly.

It’s not enough just to set MAXDOP – because MAXDOP simply isn’t a maximum. It’s more like a minimum. If your query crosses over the CTFP, buckle up: you’re going way parallel, and even just MAXDOP 8 has CXPACKET issues you can’t fix.

There are indeed some cases where the degree of parallelism will be lower than MAXDOP – for example, when you try to set MAXDOP 0 on a 72-core server – but it’s nowhere near like what the documentation describes, which makes it sound like a query-by-query decision based on the size of your workload.


Announcing a New Live Online Class: Fundamentals of TempDB

You’ve been working with Microsoft SQL Server for a couple of years, and you know a little bit about TempDB. You’ve heard that temp tables and table variables have different performance characteristics than regular user database tables, and you’ve heard that you’re supposed to have multiple data files.

You’re wondering what exactly goes on behind the scenes.

My newest one-day live online class is for curious folks who want to learn:

  • What uses TempDB, and how that affects performance: not just temp tables and table variables, but also triggers, cursors, sorting indexes, workspace spills, the version store, and more
  • How to host TempDB depending on how you use it: understanding what goes in the log file, the data files, why the number of data files matter, and whether your particular workload should be on SAN, local storage, or ephemeral storage, and why
  • How to monitor it: after you understand what uses it and how to host it, you need to keep tabs on how its performance changes over time using DMVs, Perfmon counters, and the First Responder Kit scripts

This course is 90% demos: the only slides are the introductions at the start of the day, illustrations to support a few topics, and the recap at the end of the day. The rest of the time, we’ll be working in SQL Server Management Studio. Roll up your sleeves and join me!

The first two dates will be:

This class is free for my Live Class Season Pass holders: you automatically get all new live classes that run while you’re a member. After the first live class on December 7, the recordings will also be free for my Recorded Class Season Pass holders, too.


Why Full Text’s CONTAINS Queries Are So Slow

SQL Server’s full text search is amazing. Well, it amazes me at least – it has so many cool capabilities: looking for prefixes, words near each other, different verb tenses, and even thesaurus searches. However, that’s not how I see most people using it: I’ve seen so many shops using it for matching specific strings, thinking it’s going to be faster than LIKE ‘%mysearch%’. That works at small scale, but as your data grows, you run into a query plan performance problem.

When your query uses CONTAINS, SQL Server has a nasty habit of doing a full text search across all of the rows in the table rather than using the rest of your WHERE clause to reduce the result set first.

This isn’t a big deal for small text data sets, like under 10GB, but as your data grows linearly, your query time goes up linearly too. By the time your data grows to real production sizes, full text can get uncomfortably painful.

To illustrate it, let’s say I’m using the Stack Overflow database, and I want to optimize this query:
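Something shaped like this – recent posts with a keyword in the title. The date literal and keyword are my stand-ins:

SELECT TOP 100 p.Id, p.Title, p.Score, p.CreationDate
FROM dbo.Posts p
WHERE p.CreationDate >= '2018-06-01'
  AND p.Title LIKE N'%SQL%'
ORDER BY p.CreationDate;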

The query runs in milliseconds if I have an index on CreationDate and Title:
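Meaning an index like this (the name’s mine):

CREATE INDEX IX_CreationDate_Title ON dbo.Posts (CreationDate, Title);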

Because SQL Server chooses to seek to my specific date first, then check the posts’ titles, as the actual query plan shows:

 

That’s awesome! Nobody’s gonna complain about that speedy query. If you hover your mouse over the Index Seek, it had to read 4,510 rows to find 100 rows that had “SQL” in the title – but that’s not terrible. It only did 100 key lookups, too. It’s super-quick:

However, try the same thing with full text.

I’m going to set up full text search and create a full text index on the Title column:
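The setup is roughly this – your Posts primary key’s actual name may differ from mine:

CREATE FULLTEXT CATALOG PostsCatalog AS DEFAULT;
GO
CREATE FULLTEXT INDEX ON dbo.Posts (Title)
    KEY INDEX PK_Posts__Id             -- use your Posts primary key's real name here
    WITH CHANGE_TRACKING = AUTO;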

If you try that yourself, be aware that the above query will finish instantly, but it’s kicking off a full text index population in the background. That takes way, way longer to complete than the nonclustered index did, so while you wait, here’s how to check the status of your full text index populations.

When the full text population finishes, try the full text version of the query with CONTAINS:
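Same shape, swapping the LIKE for CONTAINS (using java as the keyword this time):

SELECT TOP 100 p.Id, p.Title, p.Score, p.CreationDate
FROM dbo.Posts p
WHERE p.CreationDate >= '2018-06-01'
  AND CONTAINS(p.Title, N'java')
ORDER BY p.CreationDate;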

And the query is slower because it does more work:

Because of the different plan:

The plan operators, top to bottom:

  1. SQL Server does an index seek on CreationDate, but it reads more rows
  2. It does 8,874 key lookups – up from just 100
  3. The full text search finds all posts with java in the title, regardless of date – producing 440,615 rows

Complex searches are even worse.

Let’s throw in a simple “and”, looking for both java and sql:
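CONTAINS takes its own boolean syntax inside the search string:

SELECT TOP 100 p.Id, p.Title, p.Score, p.CreationDate
FROM dbo.Posts p
WHERE p.CreationDate >= '2018-06-01'
  AND CONTAINS(p.Title, N'java AND sql')
ORDER BY p.CreationDate;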

The query takes over 20 seconds thanks to an imbalance of work across cores:

And the actual plan shows the inefficiencies of work when we’re only trying to get 100 rows:

Why does SQL Server do it this way? I dunno – it just always has. It finds all the data that matches, and then joins to the rest of the data you’re searching for. This isn’t a new problem – it’s been the same way for over a decade – and it’s one of the reasons Stack Overflow had to move their search out of SQL Server as well.

Just for laughs, if I use LIKEs with leading % signs:

That finishes in less than a second, with 2,426 logical reads. Here’s the actual plan.

Full text search doesn’t integrate well in query plans.

If you have a really simple query and you’re filtering for very unusual keywords, then full text search is pretty dang good. It brings powerful capabilities to SQL Server built right into the box.

However, the more complex your query plan becomes, like the more filtering that’s being done on other tables, AND the more common your search keywords are, the angrier you’ll become with full text performance. SQL Server will choose to perform the full text search across all of the rows regardless of your other filters, then drag a ton of data back into the execution plan and force the rest of the operators to deal with it.

In this particular example, a like with a leading % sign – normally a DBA’s enemy – is actually a performant solution. I wanna stress, though, that that isn’t a solution that works in the majority of shops.

In other situations, I’ve had to point folks toward:

  • Building a terms table – and have people check boxes to pick from common terms, or have your app examine their search string to redirect over to the terms table rather than use full text search
  • Paul White’s trigram search solution – which requires you to build a separate table, keep it in sync, and rewrite your app’s searches to point at it
  • Elasticsearch – but then it’s left up to you to keep SQL Server and Elasticsearch’s contents in sync, and to rewrite your app’s searches to point at Elastic rather than SQL Server
  • Scaling out with Availability Group readable replicas – which costs a fortune, since they have to be Enterprise Edition, but at least you can scale the search workloads out without changing your application, or it can buy you time while you rewrite the search part of your app

None of those solutions are easy or cheap.


Free Webcast: Why is the Same Query Sometimes Slow?

I can smell your parameters from here

You swear nothing has changed, but all of a sudden, out of nowhere, queries that used to be fast are suddenly slow. Even weirder, you take the slow query from the application, run it in SSMS, and it’s totally fast! What’s going on?

You restart SQL Server or update statistics, and the problem seems to go away – but only for a few days or weeks, and then it comes right back. You’re familiar enough with execution plans to realize that you’re getting different plans, but…why? And how do you fix it long term?

In this free Idera Geek Sync session on Wednesday, January 27, you’ll see live demos of the most common culprit: parameter sniffing. You’ll learn how to recognize parameter sniffing when it strikes, understand what causes it, see how to fix it short term with the lowest impact possible, and learn 7 ways to fix it long term.

Register now. See you there!


Statistics Aren’t Guarantees: SQL Server Still Checks.

When I’m reviewing plans with folks, I get this response a lot:

But SQL Server should know there’s no data that matches! It has statistics!

Statistics are guideposts, not guarantees.

We’ll start with the Stack Overflow database and put an index on LastAccessDate, which also creates a statistic on that column. We’ll check the min and max LastAccessDates:
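The setup looks like this (the index name is mine):

CREATE INDEX IX_LastAccessDate ON dbo.Users (LastAccessDate);
GO
SELECT MIN(LastAccessDate) AS OldestLogin,
       MAX(LastAccessDate) AS NewestLogin
FROM dbo.Users;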

In the database I’m using today, the min is ‘2008-08-01 00:59:11.147’ and the max is ‘2018-06-03 05:18:24.843’. If I query for all users lower than the min, or all users above the max:
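Using the min and max values from my copy of the database:

SELECT * FROM dbo.Users
WHERE LastAccessDate < '2008-08-01 00:59:11.147';

SELECT * FROM dbo.Users
WHERE LastAccessDate > '2018-06-03 05:18:24.843';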

The actual execution plans show that SQL Server estimated 1 row would return for each of those, but of course 0 rows actually return:

You might say, “But SQL Server has statistics on those columns, and it knows what the top values are!” Well, that’s true, but…data can change without the statistics being updated. For example, say that one user logs in right now, and then we run the MAX query again:
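Simulating that login is just an update to any one user, picked arbitrarily here:

UPDATE dbo.Users
SET LastAccessDate = GETDATE()
WHERE Id = 26837;   -- any single user will do

SELECT * FROM dbo.Users
WHERE LastAccessDate > '2018-06-03 05:18:24.843';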

One row actually returns:

This has a bunch of interesting ramifications.

SQL Server has to build the whole query plan at once. He doesn’t go execute the first operator, check to see how many rows come back, and then build the rest of the plan. There are adaptive joins that will change their behavior depending on how many rows are found, but their existence in the plan has to be set up before the first operation in the plan even starts. The more complex your query is, the more time it takes to build the whole plan – even if no rows are going to be found.

The statistics can be way off. In our Stack Overflow scenario, imagine that users are constantly logging in, all the time. SQL Server will keep thinking just one row is going to come back until the statistics are updated, or SQL Server figures out that we have what’s called an “ascending key problem.” This is a constant issue in SQL Server where it doesn’t realize that a particular column is going to keep growing in the future, and that it should always assume there’s going to be more data for higher values, even if that data hasn’t been loaded yet.

If apps constantly query for data that can’t possibly exist, like if we know no one is ever going to have a LastAccessDate < 2008, then we can put in a check constraint. SQL Server may bypass querying the table altogether and do a constant scan. I’ve used this trick when an app sent in queries that I couldn’t control, and the query was becoming a performance issue.
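A sketch of that constraint, with a boundary I’m making up for illustration:

ALTER TABLE dbo.Users WITH CHECK
    ADD CONSTRAINT CK_Users_LastAccessDate_Min
    CHECK (LastAccessDate >= '2008-01-01');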

If apps constantly query for data that rarely exists, like if they check for invalid data and then correct it with a subsequent query, we can use a filtered index to just focus on that bad range of data. I cover that in the filtered indexes module in Mastering Index Tuning.

If you want to learn more about statistics, check out the free statistics courses on my YouTube channel. Or, if you’d like to learn about more advanced indexing tricks like these, check out my Mastering Index Tuning class. I have upcoming live classes starting this Friday, and also December 8th, and you can get in free with a Live Class Season Pass.


Why I Teach Training Classes on Weekends (the next one starts Friday)

You have a day job.

And your day job comes with two big problems.

Problem #1: it’s hard to pay attention during the week. Your calendar has tons of meetings scheduled for the foreseeable future, and you can’t just reschedule them whenever you want. Even when you’re not booked in a meeting, people are constantly pinging you for stuff, servers are going down, and you just can’t focus on a training class. You turn your head to solve one urgent issue, and then bam, when you turn back to class, you’re lost.

Problem #2: some of you don’t get paid to learn. Some of you are consultants, contractors, or freelancers, and when you take time off during the week, you’re not getting paid for it. Some of you are even paying for my training out of your own pockets, and your companies won’t give you time off to learn. You’re investing in yourself because you want to eventually get a better job, and you want to be more valuable to future employers. That’s also why we provide certificates of attendance for the classes, too. When you get a Live Class Season Pass, you can attend so many of my classes and learn so much, and that looks absolutely spectacular on a resume. It’s huge bang for the buck.

That’s why I teach my weekend Mastering classes. They’re the exact same in-depth Mastering classes with challenging hands-on VMs, but I run them over Friday, Saturday, and Sunday instead of during the week. I teach a couple of weekend rotations per year, and the next one starts this Friday:

Or, if you prefer the weekday rotation, the next one is:

Weekend students – especially the consultants & contractors – also tell me they love the Instant Replays included with the Live Class Season Pass. When they hit a tough problem at work, they can pop open that particular module, start watching the recording, read through the slides, and run through the demo scripts. It’s like just-in-time learning, all year long.

Buy today, and you can start your learning journey Friday. You can show up to work better on Monday. Let’s do this!


How to Learn SQL Server in One Year

You’ve been using Microsoft SQL Server for a few years, and it’s becoming increasingly important to your job.

You can write a query, but…

You feel like something’s missing.

You attended a few local user groups or meetups before the quarantines hit, but you were disappointed at how scattered the sessions were. The material was all over the place, covering lots of different topics, and none of them went into depth. You couldn’t relate the topic to what you were actually doing at work.

You’ve been Googling to find solutions when you run into problems, but you’re starting to doubt what you read. You’ve read conflicting suggestions, and you keep hearing “It depends” – and you’re beginning to think it’s some kind of cop-out answer that people are giving when they don’t have the real answers.

I know. I’ve been there too.

If you’re going to learn,
you need a structured, timed plan.

You can’t be just randomly Googling around every now and then. You need a clear, linear plan where each session builds on top of the last one. You need a table of contents, and a recap at the end of each learning sprint.

You can’t just pick up a book, either: you’ve tried that. Nobody’s holding you accountable. The book sits on your bedside table for months, and it’s there for a reason – it puts you to sleep, and nobody’s pressuring you on a regular basis to show what you learned.

You need a combination of 3 things:

  1. Lively, fun lectures – not boring text
  2. Hands-on labs where you’re required to turn in your homework
  3. A timed schedule – you gotta show up and be there on specific days/times

If you put the dates on your calendar, show up for class, turn in your assignments to a fun instructor, and get feedback on your progress, THAT’S how you make real headway in just one year.

Get started in a 3-day sprint.

Mark these out in your calendar:

If you can’t make those, the next rotation starts February 1. I teach these 4x a year, but I’m telling you now because we gotta get you on board this train.

Those 3 classes are each 1 day long. If you can’t get the time off work, you can still do this: I teach those classes as 50% lecture, 50% hands-on labs. You can watch while I do the lectures & labs, and then keep up with your day job when I give the rest of the students time to go do the labs. (I’d recommend you do ’em yourself, though – remember, that’s part of the key to your success, actually getting your hands on the product and proving that you know what you think you know.) The hands-on labs for the Fundamentals classes can be done on your existing laptop/desktop.

Now, you might see “Fundamentals” and think that you’re already past ’em, and that you can move on. I don’t want you wasting time on fundamentals if you already know this stuff, so ask yourself these four questions as a sample test:

  • How do you measure which column goes first in an index key?
  • How do you index to support both joins AND filters?
  • When is a CTE better than a temp table, and vice versa?
  • What kinds of query operations are the most susceptible to parameter sniffing?

If you know the answers to those, you can skip Fundamentals. If not, get in there and get your fundamentals on. Don’t feel guilty: I know a lot of data professionals who’ve said they had over a decade of experience, but they were still shocked by what they learned in my Fundamentals classes.

After those 3 classes, you’re ready for Fundamentals of Columnstore on November 17th (iCal), but don’t feel like you HAVE to take that one. That one really only makes sense if you’re considering columnstore indexes. Those aren’t for everybody – they’re really targeted at reporting queries on 250GB+ databases running SQL Server 2016 or newer. If you don’t have those, feel free to skip that one.

Then, start to master SQL Server.

After you finish the Fundamentals, the really tough stuff starts, and you really need to take the first two in order – first indexing, then querying, and then you can take either #3 or #4 in either order – I teach ’em throughout the year:

  1. Dec 8-10: Mastering Index Tuning – iCal
  2. Jan 12-14: Mastering Query Tuning – iCal
  3. Optional: Feb 9-11: Mastering Server Tuning – iCal
  4. Optional: Feb 16-18: Mastering Parameter Sniffing – iCal

Those are 3-day classes, and each day goes like this:

  • 9AM-Noon: I teach you concepts with a mix of lectures & demos.
  • Noon-2PM: hands-on lab: you run a workload against the full-size Stack Overflow database and solve a performance problem, plus eat lunch. You turn in your homework to me in the company Slack room, and I give you feedback on your work.
  • 2PM-3PM: I do the same lab you just did, but live onscreen, so you can see how my work compared to yours.
  • 3PM-4PM: I teach you another concept with a mix of lectures & demos.
  • 4PM-?: hands-on lab: you get another workload challenge. You can solve it from 4PM-5PM, or later if you prefer, or even the next morning, whatever works best for you.

The Mastering classes are much tougher, and require a bigger VM to pull off. Read the instructions on setting up a lab VM, and you’ll want at least 4 CPU cores, 30GB RAM, and local solid state storage with 300GB free space. You’ve got some time before you have to set that up, though. If you don’t have access to that hardware, I offer a Live Class Season Pass with a lab VM included – but honestly, building a little lab server at home is more cost-effective.

The first time you go through ’em,
you’re gonna fall off the horse.

When I went through the Microsoft Certified Master program, it had a 75% failure rate. Most of the students needed multiple attempts at the hands-on lab tests (although I’ll giddily report that I passed the first time, woohoo!). Even though I passed, my first reaction was, “I wanna do that again and see if I can do better, and faster.”

When Microsoft folded up the MCM program and I built my Mastering classes, I wanted to replicate that experience. I wanted to give you a very, very challenging test of your skills. Some folks would pass the first time, other folks would take multiple attempts, but hopefully it’d be fun enough for everybody that they would want to do it again and again.

That’s why most students opt for my Live Class Season Pass: the ability to attend all of my classes again and again for a year straight.

You’re reading that and going, “Who could possibly attend 3-day classes over and over throughout the year?” But here’s the thing: most folks tend to attend multiple times, but they pay different levels of attention depending on their schedule. When you have unlimited access to my classes, what you find is that you just check out my upcoming class list, add stuff to your calendar, and then leave my live sessions up on another monitor and on your speakers while you work. If an interesting concept comes up, or if you’ve got free time available, then you jump in.

By the end of the year,
you will be WAY better at SQL Server.

It’s going to require focus and attention. It’s going to require participation on your part. It’s going to take commitment and action.

You can float through another year, skating by on the bare minimum of skills, getting surprised at every turn.

Or you can decide that this is going to be the year when you start learning about problems before you encounter them, becoming proactive instead of reactive, and start to be seen as “the SQL pro” amongst your team members. You can decide that you’re going to take your career up a notch, and you’re going to let me teach you everything I know.


New Music Friday: Killa DBA’s New Release Just Dropped

Familiar readers will know that I don’t actually do New Music Friday here very often.

Okay, ever. I don’t do it ever.

Because frankly, nobody writes music about us.

Nobody except Homer McEwen, aka Killa DBA (YouTube@KillaDBA).

You may remember him from The SQL Server 2017 Song, and if not, you should go check that out too.

His new album Hug a DBA is available now on Apple Music and Spotify, free for folks who subscribe.

The SQL Server 2019 song is way more catchy than it has a right to be, and I don’t know how he figures out how to stitch all the features together in a way that they rhyme and flow. Nice work!

