Updated First Responder Kit and Consultant Toolkit for October 2019

This is a pretty doggone big release. There are a lot of improvements to sp_BlitzFirst, sp_BlitzCache, and sp_BlitzWho to prepare for a new release of the First Responder Kit Power BI Dashboard. If you’re using that today, I would strongly recommend updating to this new version of the FRK scripts – they fix a ton of bugs and start collecting data in a way that will make joins easier.

I wanted to ship the new Power BI Dashboard this month, but couldn’t quite get it working just the way I want to, so I’ll ship that one in next month’s release. Here’s a quick preview of the updated Query Details tab contents:

It has the new Power BI filters pane on the right – and yes, there’s a server filter, which means you’ll be able to centralize all of your data into a single repository, and navigate through it with easy filtering. On the left, I’ve got not only the cached plans from sp_BlitzCache, but also any live snapshots that were captured from sp_BlitzWho. Those show when the query is running, and its status along the way – really useful for those big long OPTION RECOMPILE queries that don’t show up in the plan cache, but you know they’re running and killing performance. Lots of good stuff coming there.

Breaking change: if you’re logging the data to tables, there’s a new JoinKey computed column added to all of the views, and to the BlitzCache, BlitzFirst, and BlitzWho tables to make Power BI Desktop joins easier. These columns will automatically be added by sp_BlitzFirst & friends when you upgrade, but if you’re doing any kind of ETL to put all of the Blitz% tables in one central location, you may need to add this column to your central location’s tables (and make sure you’re not doing SELECT * to ETL the data around.) For questions, feel free to hop into the #FirstResponderKit Slack channel. I’m out on vacation starting Sunday, but I’ll stick my head in there from time to time.

To get the new version:

Consultant Toolkit Changes

Updated the First Responder Kit with this month’s script updates, plus:

  • Fix: the Plans CPU ByHash tab wasn’t populating if the server didn’t have sp_BlitzCache installed.

sp_Blitz Changes

  • Fix: new @SkipBlockingChecks parameter defaults to 1. There’s a known bug in sys.dm_db_persisted_sku_features, which caused sp_Blitz to be blocked during columnstore index creation. I don’t expect Microsoft to fix that one quickly, so we just skip that check by default to be safe. (#2130)
  • Fix: if you stored your database files on a UNC path (\\) and wrote your backups on one too, sp_Blitz no longer reports that you’re storing backups in the same place as your databases. (#2141)
  • Fix: some heaps with 0 reads were reported as being actively used. (#2149, thanks Razvan Socol.)

sp_BlitzCache Changes

  • Improvement: @SortOrder = ‘query hash’ now takes a second sort order in a comma-delimited list, like @SortOrder = ‘query hash, reads’ to find the queries with multiple cached plans, sorted by most reads. (#2156)
  • Improvement: adds JoinKey computed column (ServerName + CheckDate) to output table to make Power BI Desktop joins easier. (#2162)
  • Fix: when called by sp_BlitzFirst, the CheckDates in the output table now match the exact same dates/times as sp_BlitzFirst’s output tables (even though it’s not exactly when sp_BlitzCache is running.) This enables joins in the Power BI Dashboard since we’re filtering by dates/times. This used to work back when we only called sp_BlitzCache with a single sort order, based on your top wait type, but once I switched to @SortOrder = ‘all’, it stopped working. (#2159)
  • Fix: when @SortOrder = ‘all’, @MinutesBack was being ignored. This is especially important for folks who use the Power BI Dashboard because with the default Agent job step, this would have resulted in way too much stuff showing up in the plan cache reports, instead of only showing things that executed in the last 15 minutes. (#2168, thanks Matt Rushton.)
  • Fix: in @SortOrder = ‘all’ output, the plans with the most spills were being incorrectly shown in the ‘memory grant’ category. They were still queries you wanted to fix, but the reason WHY they were a problem was wrong. (#2170, thanks Matt Rushton.)

sp_BlitzFirst Changes

  • Improvement: new @OutputTableNameBlitzWho logs sp_BlitzWho results to table. (#2159)
  • Improvement: new @BlitzCacheSkipAnalysis parameter defaults to 1 to replicate existing behavior, but if you want your sp_BlitzCache output tables to have the full query analysis, you can turn this off in your Agent jobs. I’m not a fan – this makes the Agent job take >30 seconds on even my laptop – but if you want it, there it is. (#2155, thanks Matt Rushton for the idea. The idea is good though!)
  • Improvement: BlitzFirst output table has 2 new columns: JoinKey computed column (ServerName + CheckDate) and QueryHash, plus added join keys to the output views to make Power BI Desktop joins easier. (#2162, #2164)
  • Improvement: added SQL Server 2017 XTP Perfmon counters. (Yes, they actually hard-code the version numbers into the Perfmon counter name, and 2019 uses the 2017 names.) (#2162)
  • Fix: if you passed in a @LogMessage with a @LogMessageCheckDate, we were, uh, ignoring it and just using the current time. (#2135, thanks Matt Rushton.)

sp_BlitzIndex Changes

  • Fix: reducing collation errors. (#2126, thanks chaowlert.)
  • Fix: more natural sorting for missing index recommendations, by impact. (#2143)

sp_BlitzLock Changes

  • Fix: because the system health XE session stores deadlock times in UTC, @StartDate used to assume you knew that. We now assume @StartDate is the server’s local time, and we also convert the outputted deadlock dates from UTC to the server’s local time too. (#2147, thanks DigitalOhm for the idea.)

sp_BlitzQueryStore Changes

  • Fixes: wasn’t running on Azure SQL DB, now does. (#2124 and #2136, thanks Paul McHenry.) Remember, though – our support policy on Azure SQL DB is pretty fast and loose since the DMVs change up there without warning.

sp_BlitzWho Changes

  • Improvement: added a @CheckDateOverride parameter so this can be called by sp_BlitzFirst and write the exact same CheckDates in the output table. This enables joins in the Power BI Dashboard, which requires joins on exact field contents. (#2159)
  • Improvement: adds JoinKey computed column (ServerName + CheckDate) to output table to make Power BI Desktop joins easier. (#2162)

sp_DatabaseRestore Changes

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

How to Think Like the SQL Server Engine: What’s the Tipping Point?

In our last episode, I’d expanded our query to include DisplayName and Age – two columns that weren’t in our nonclustered index:

So as a result, I was getting key lookups in the execution plan:

And I spent a lot of time talking about the overhead that each key lookup incurs. Astute readers among the audience (as in, other people, not you) probably noticed the 2018-09-02 04:00 date and thought it was rather odd that I’d switch dates while telling this story.

Let’s query for just one hour earlier.

We were only getting 1,576 rows for Sept 2 4AM and after (because the data export I’m using was done on Sept 2, 2018) – let’s try 3AM:
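The query at this point looks roughly like this – just a sketch, assuming the columns and ORDER BY carried over from the last episode; your exact filter format may differ:

SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate >= '2018-09-02 03:00'
ORDER BY LastAccessDate;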

And SQL Server ignores my index:

Even though only 2,256 rows are being returned, SQL Server would rather just scan the whole table than deal with the index seek + key lookup combo. There are 299,398 rows in the table – and even though we’re querying LESS THAN ONE PERCENT OF THE TABLE, the index gets ignored.

Is that the right decision?
To find out, let’s force the index.

I’ll run two queries back to back – first, letting SQL Server decide which index to use, and then second, forcing the nonclustered index by using a query hint:
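Roughly like this – a sketch, assuming the nonclustered index is named IX_LastAccessDate_Id as in earlier posts; swap in your own index name if it differs:

SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate >= '2018-09-02 03:00'
ORDER BY LastAccessDate;

SELECT Id, DisplayName, Age
FROM dbo.Users WITH (INDEX = IX_LastAccessDate_Id)
WHERE LastAccessDate >= '2018-09-02 03:00'
ORDER BY LastAccessDate;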

Then check the Messages tab for our logical reads:

When we do a table scan, that’s 7,405 reads.

When we use the index, it’s 6,927 reads – assuming that we’re only getting 2,256 rows.

It’s tempting to say SQL Server was wrong, but here’s the thing: the table scan is a predictable 7,405 reads no matter how many rows are brought back from the query. It’s a safe decision when SQL Server just can’t be sure exactly how many rows are going to come back. If I query for just one more hour earlier, at 2AM:

Now that 2,917 rows match our filter, the clustered index scan pays off: it’s doing fewer logical reads than the index seek + key lookup combination would. The key lookup costs more and more with every additional execution.

If we go back just one day, the difference becomes huge:

We’re only getting 16,325 rows – about 5% of the table – and yet if we use the index, we do so many logical reads that it’s like reading the entire table seven times over! That’s exactly the problem SQL Server is trying to prevent, dealing with key lookups that read more pages than there are in the table.

Our 2AM-4AM WHERE clause is hovering around the tipping point.

Even though we’re only bringing back less than 1% of the users, it’s already more efficient for SQL Server to scan the entire table rather than use our index.

In our examples, the 3AM query erred just a tiny, tiny bit on the conservative side, picking a clustered index scan when an index seek + key lookup would have been more efficient. In this example, who cares? It’s a difference of less than 500 page reads. I’m happy with that decision.

So how does SQL Server figure out which query plan – the index seek or the table scan – is more appropriate before actually executing the query? That’s where statistics come in, and we’ll talk about those next.

One solution is to widen the index.

When you see an index seek + key lookup in an execution plan, hover your mouse over the key lookup and look at the Output List:

Then start asking a few questions:

Can my query live with fewer rows? The whole problem with the tipping point is based on the number of rows we’re bringing back. If you can do pagination, do it. If you can restrict the number of rows outright or force users to be more selective with their searches, do that too. When you truly want to scale, everything is important.

Can my query live with fewer columns? Sometimes it’s a matter of folks doing a SELECT * out of laziness, but sometimes we really do need all of the columns. The more columns we need, the less concerned I am about the key lookups – because I simply can’t afford to maintain multiple copies of the full table, all sorted in different orders.

How large are the columns? If they’re big ol’ NVARCHAR(MAX), VARCHAR(MAX), JSON, XML, etc., then the discussion is over. I’m fine with the cost of the key lookup. I wouldn’t want to add them to indexes because they’ll simply take up too much space.

How frequently are the columns updated? SQL Server doesn’t have asynchronous indexes: every index has to be kept up to date at all times with the clustered index. If the Age column changes all the time, for example, then it may not make sense to duplicate it on multiple indexes since we’re going to have to grab locks on those indexes when we run updates. (Inserts & deletes don’t matter as much since we have to touch the indexes at that point anyway no matter how many columns they contain.)

If we do need those columns, should we add them to the nonclustered index? By adding them, we could get rid of the cost of the key lookup altogether.
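For example, widening the index might look something like this – the index name and the choice to put the extra columns in the INCLUDE list are just illustrative, not a prescription:

CREATE INDEX IX_LastAccessDate_Id_Includes
ON dbo.Users (LastAccessDate, Id)
INCLUDE (DisplayName, Age);

Putting DisplayName and Age in the INCLUDE clause keeps them out of the key: the index stays sorted only on LastAccessDate and Id, but it still covers the query, so no key lookup is needed.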

There are no stone-carved right-and-wrong answers here, only a series of decisions and compromises to make in your quest for performance. When you first get started, you don’t really notice when SQL Server tips over from index seeks to table scans because your data is relatively small. As your data grows and your users want higher performance, that’s when this kind of thing starts to matter more.

I won’t dig too deeply into nonclustered index design in this blog post series, but if you’re interested in that kind of thing, check out my one-day Fundamentals of Index Tuning course.

Next up, let’s see how SQL Server figures out where the tipping point will be for a query.

How to Think Like the SQL Server Engine: What’s a Key Lookup?

In our last couple of queries, we’ve been using a simple query to find the Ids of everyone who accessed the system since mid-2014:

But Ids alone aren’t all that useful – so let’s add a few more columns to our query:
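Something along these lines – a sketch, using the recent date this post works with rather than the 2014 filter from earlier episodes:

SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate >= '2018-09-02 04:00'
ORDER BY LastAccessDate;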

Now think about how you’re going to execute this query plan in plain English, as a human being. You have two copies of the table: the nonclustered index (black pages) with LastAccessDate, Id:

And the clustered index (white pages) with all of the columns in the table:

dbo.Users clustered index

One way is to use the nonclustered index first.

We could:

  • Grab the black pages, seek to 2014-07-01, and start making a list of the LastAccessDates and Ids we want
  • Grab the white pages, and for each Id we found in the black index, look up their Id in the white pages to get the extra columns (DisplayName, Age) that aren’t included in the black index

Here’s what that query plan looks like:

We read plans from right to left, but also top to bottom. The first thing SQL Server did was the Index Seek at the top right – that’s the seek on the black pages. SQL Server seeked to a specific date/time, read out the LastAccessDate & Id for each row, and that was the end of the Index Seek’s work. (He’s a totally separate standalone mini-program.)

Then, SQL Server took that list of Ids, and looked up each one of them on the white pages – that’s the Key Lookup operation. SQL Server used their clustering key – in this case, Id, because that’s what we built our clustered index on. (If you don’t build a clustered index, this same basic process still happens – but more on that in another episode.)

That’s why every nonclustered index includes the clustering keys.

When I first told you I created the black index, I said I did this:
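(A sketch of that statement, assuming the index name used earlier in the series:)

CREATE INDEX IX_LastAccessDate_Id
ON dbo.Users (LastAccessDate, Id);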

But I didn’t have to. I could have just done this:
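(Again, just a sketch – keying on LastAccessDate alone, with a name made up for illustration:)

CREATE INDEX IX_LastAccessDate
ON dbo.Users (LastAccessDate);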

And we would have ended up with the same basic black pages. SQL Server adds the clustering keys into every nonclustered index because it has to be able to accomplish these key lookups. For every row it finds in a nonclustered index, it has to be able to track back to that one unique row in the clustered index.

I just told you I created the index on both columns in the beginning because I had to ease you into this. I didn’t wanna say anything, but you weren’t particularly with it that first morning you started reading this series.

Yes, I’m still holding things back from you right now even as I type this. No, I didn’t want to say that because it’s only going to make things awkward between us. Also, you’re not my only reader. You’re still the one I love the most, though. Now let’s just move on before you ask more questions.

Key Lookup operators obfuscate a lot of details.

I wish execution plans were three-dimensional: I wish operators popped off the page based on the number of times they were executed. You see that “Key Lookup” and you think it only happened once, but that’s not true at all. Hover your mouse over it, and you’ll see Number of Executions – that’s how many times it was actually performed:

Earlier, I wrote that when Microsoft paints stuff on a screen, they either put it in alphabetical order, or they load up the Data Cannon, fire it, and wherever the data goes, that’s where it ends up. Here, this tooltip was arranged by the Data Cannon, so we’re going to jump around a little.

  • Estimated Number of Executions is based on how many rows SQL Server thought were going to come out of the Index Seek. For each row we found, we’re gonna have to execute one Key Lookup.
  • Estimated Number of Rows – how many rows EACH key lookup will return.
  • Number of Executions – how many times we actually did it, based on how many rows actually came out of the index seek.
  • Actual Number of Rows – the total number of rows that came out of ALL key lookups (not each one.)

I know. Sometimes they say Actual, sometimes they don’t. Sometimes the numbers are per-execution, sometimes they’re total. If you’re looking for consistency and organization, you don’t use the Data Cannon. I’m sure they treat the internals of SQL Server with much more accuracy and care.

The cost of the key lookup is logical reads.

Each time we execute the key lookup – in this case, 1,576 times – we’re opening up the white pages and doing some logical reads. To see it, let’s run a no-lookup query (that only gets Id) and a multi-column query (that gets DisplayName and Age too) and compare their logical reads:
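A sketch of that pair of queries – the exact filter date matters less than the fact that both queries return the same 1,576 rows:

SET STATISTICS IO ON;

SELECT Id
FROM dbo.Users
WHERE LastAccessDate >= '2018-09-02 04:00';

SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate >= '2018-09-02 04:00';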

The top query only does 7 reads, and the bottom one does 4,842! Those 1,576 key lookups needed to do about 3 logical reads each. (Why not just 1 read each? Because they need to navigate a tree structure of pages to find exactly the right page that has the data they want – there’s a couple/few pages of overhead each time you need to seek in an index. We’ll talk about alternatives to that later too.)

The more rows your index seek returns, the more likely it is that you won’t get this execution plan at all. Notice that in this post, I didn’t use 2014-07-01 as my filter date – I used something much more recent. You’ll learn why in the next episode.

How to Think Like the SQL Server Engine: So Index Seeks are Great, Right?

In our last episode, we added a nonclustered index, and now even though this query has a filter and an ORDER BY, it hardly did any work at all – just a lightweight index seek:

Ah, index seeks – nothing better than that, right?

Well, not exactly. Let’s make one tweak to the WHERE clause and find all of the people who accessed Stack Overflow going all the way back to the days of programming your ox team:
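A sketch – the exact ancient date is illustrative; anything older than every row in the table will do:

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '1800-01-01'
ORDER BY LastAccessDate;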

We still get an index “seek”:

Yes, it’s a seek, even though it’s reading every single row in the index. You probably thought that would be a scan, but it’s not.

Index seek doesn’t mean lightweight.

Index seek only means SQL Server is going to jump to a specific point in the index and start reading.

It has absolutely nothing to do with how few or how many rows you’re going to read. You can do an index seek and read just one row, or read all of the rows in the table.

If you’ve been looking at an execution plan and going, “seek seek seek, all seeks here, guess it can’t get any better than that,” then you’ve been missing out on something pretty big.

Index scan doesn’t mean reading the entire table, either.

Let’s get any 10 users:
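Something like this – a sketch; “any 10 users” just means a TOP with no WHERE clause and no ORDER BY:

SELECT TOP 10 *
FROM dbo.Users;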

We get a clustered index scan – meaning, it’s reading the entire table, right?

Nope – SQL Server starts scanning the table, but it can bail as soon as it’s found enough rows to deliver your query results. Hover your mouse over the clustered index scan, and it says “Number of Rows Read: 10.” If your number of rows read equals the number of rows your query outputted, that’s great! I can’t possibly design a better index to support this query. A clustered index scan is absolutely perfect here.

Here’s what seeks and scans really mean:

  • Index seek: “I’m going to jump to a specific point and start reading.”
  • Index scan: “I’m going to start at either end of the table and start reading.”

Here’s what they don’t mean:

  • How many rows we’re going to read
  • Whether we’re seeking on all of the columns in the index, or just the first one (more on that later)
  • Whether the index is good or bad for this query

I can’t emphasize this enough: you can’t make index judgements based purely on seeks or scans. This is especially true when you start looking at index usage metrics from tools like sp_BlitzIndex: we’re simply reporting how the index is being accessed. You can’t say, “This index is getting seeks, so it’s perfect,” nor can you say, “This index is getting scans, so the keys must be in the wrong order.” You’ve gotta dig deeper than that to see how much data is moving around – and that just isn’t something the DMVs track. You’ve gotta look at query plans closely to decipher what’s going on.

Next, let’s add some columns to our query and force SQL Server to make a tough decision.

SQL Server 2019 Standard Edition Feature Limitations Are Out

The features list by edition is out, and there’s a disclaimer at the top:

This content is being updated for SQL Server 2019. The information in the article is not final.

Which means if you’re really angry, and your company buys a lot of licenses, now’s your time to raise holy hell with your Microsoft reps if there’s something in this doc that disappoints you.

The most important stuff:

  • Standard Edition is still capped at 128GB RAM.
  • Accelerated Database Recovery is in Standard Edition.
  • Automatic tuning, batch mode for row store, adaptive memory grants, adaptive joins, and memory-optimized TempDB are Enterprise Edition only.
  • Automatic inlining of scalar functions isn’t listed, but given the above, it’s probably Enterprise Edition only as well. Update: scalar UDF inlining is now listed, and it’s in Standard Edition! Woohoo!

Update: how I feel about the limitations

Looking back at this, I realize that I didn’t editorialize this at all – I didn’t say how I felt about it. I’m actually totally cool with it except for two things, and I’ll get to those in a second.

Microsoft’s gotta make money just like you do, and they need big differentiators between Standard Edition and Enterprise Edition. They need to have serious reasons as to why you’d spend the extra $5K per CPU core. I think automatic tuning, batch mode for row store tables, adaptive joins, and memory-optimized TempDB are all fair limitations. If you’re hitting the kinds of performance issues where you need those features in order to survive – not just that you want them, but that you have to have them – then you should probably pony up for Enterprise Edition.

The vast, vast majority of shops have survived just fine without those features for decades. Sure, you’ve hit performance limitations that have caused you to do some performance tuning, and those took you some time. Maybe even a lot of time. But that’s the value proposition of Enterprise: when you hit big performance issues, you could EITHER tune it manually yourself, or you could pay Microsoft to improve the performance for you. I think it’s a fair trade.

First, Basic Availability Groups are a joke. I know, they’re supposed to replace database mirroring, but the requirement of a separate Availability Group for every database, and a separate listener, and a separate IP address – those are just ridiculous. Let people put multiple databases in the same Basic Availability Group and fail them over together. (I’m not asking for multiple replicas or read replicas – I get that both of those are great differentiators for Enterprise. I know, some readers are gonna say they need multiple replicas for both high availability and disaster recovery, and I don’t think that’s unreasonable, but I’m not fighting that battle today.)

Second, there’s one area that’s unforgivable: memory grants. SQL Server itself makes horrible, horrible decisions around memory grants, especially around over-estimation. Standard Edition has the perfect storm of problems:

  1. It’s capped at 128GB RAM (especially around query workspace)
  2. You can’t use Resource Governor to cap query grants
  3. SQL Server can’t learn from its mistakes because Standard doesn’t get adaptive grants

I would totally understand if we had ANY way at the system level to fix SQL Server’s bad memory grant decisions, but the only option we have is by changing the queries themselves (by rewrites or by adding grant hints.) Microsoft should give us SOME way to fix these bad grants at the system level – and putting any one of the 3 above options in Standard would be fine.

Or, you know, they could fix the query optimizer to stop granting so many queries 25% of the buffer pool every time they run a query. But let’s be realistic here.

How to Think Like the SQL Server Engine: Adding a Nonclustered Index

When we left off in the last post, our users kept running this query, and they want it to be really fast:

Let’s pre-bake the data by creating a copy of the table sorted in a way that we can find the right rows faster:
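A sketch of that index – the name is an assumption, but the key columns are the ones this series has been filtering and sorting on:

CREATE INDEX IX_LastAccessDate_Id
ON dbo.Users (LastAccessDate, Id);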

This builds a separate copy of our table (also stored in 8KB pages) that looks like this:

This index slows down inserts and deletes.

The first thing to notice is that you now have two physical copies of the table. Every time you insert a new row, you have to add it in two places. Every time you delete a row, you have to delete it in two places. You’ve effectively doubled the amount of writes that your storage has to do. (Updates are a little trickier – more on that in another post.)

The second thing to notice is that the nonclustered index (black page) is denser: you can fit more users per 8KB page on that because we chose to store less fields. You can see that by running sp_BlitzIndex and looking at the top result set, which shows how big each index is:

The clustered index (marked CX PK) has ~300K rows and takes up 58.1MB space on disk because it contains all the columns of the table.

The nonclustered index has the same number of rows, but it only takes up 5.3MB of space because we’re only storing LastAccessDate and Id on it. The more columns you add to an index – whether they’re in the key or in the includes – the larger the object gets on disk. (I’ll talk more about index design as this series continues.)

But this index pays off dramatically for selects.

Try our select query again – and here, I’m running two queries back to back. I’m running the first query with INDEX = 1 as a hint to show the cost of the clustered index scan. (Index #1 is your clustered index.)
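Roughly like this – a sketch; INDEX = 1 forces the clustered index, and the second query is left free to use the new nonclustered index:

SELECT Id
FROM dbo.Users WITH (INDEX = 1)
WHERE LastAccessDate > '2014-07-01'
ORDER BY LastAccessDate;

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01'
ORDER BY LastAccessDate;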

Our new bottom plan is much simpler:

  • We get a seek, not a scan – because SQL Server is able to jump right to the rows where LastAccessDate > 2014/07/01. It doesn’t need to scan the whole object to find ’em.
  • We don’t need a sort – because as we read the data out, it’s already sorted by LastAccessDate.

The index seek’s cost is lower for two reasons: it doesn’t need the sort, and it reads fewer 8KB pages. To see how big the difference is, let’s check out the Messages tab:

The clustered index scan read 7,405 pages and took 110ms of CPU time to do the ORDER BY.

The new nonclustered index only read 335 pages – like 20X less! – and didn’t need CPU time for ORDER BY. Really good index designs like this are how you can quickly get 20X (or much higher) improvements in your queries without rewriting the queries. Do I wish every query was perfectly tuned? Sure – but even if it was, you’re still gonna need to give the engine a helping hand by organizing the data in more searchable ways.

This index is called a covering index.

The index perfectly covers this query, satisfying everything it needs in a quick, efficient way. The term “covering” isn’t a special kind of index that you create with different syntax – “covering” just refers to the combination of this index AND this query. If you change the query at all – like we’re going to do here shortly – then the index may no longer be covering, and we’ll have to do more work again.

So our plan has an index seek – that’s as good as a plan can get, right? No, as we’ll see in the next episode.

How to Think Like the SQL Server Engine: Running a Query Repeatedly

Earlier in this series, we ran a query with ORDER BY, and we realized that it was CPU-intensive work that tripled the cost of the query:

Now, let’s run that query a bunch of times. In SSMS, you can add numbers after the GO, and SSMS will run your query a number of times. I’ll run it 50 times:
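A sketch of that batch, assuming the same filter used earlier in the series – GO 50 tells SSMS to run the whole preceding batch 50 times:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01'
ORDER BY LastAccessDate;
GO 50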

I explained SET STATISTICS IO ON in the first post, but I’ve added a new option here: TIME. This adds more messages that show how much CPU time and elapsed time the query burned:

SQL Server executes the query over and over, reading the 7,405 pages each time, and doing the sort work each time.

When I was a developer, I used to be totally okay with constantly fetching data from the SQL Server. After all, I’d just run the query before, right? The data’s in cache, right? Surely SQL Server caches that sorted data so that it doesn’t have to redo all that work – especially when my queries had been doing a lot of joining, grouping, filtering, and sorting.

SQL Server caches raw data pages, not query output.

It doesn’t matter if the data hasn’t changed. It doesn’t matter if you’re the only one in the database. It doesn’t even matter if the database is set to read-only.

SQL Server re-executes the query from scratch.

And similarly, if 500 people are running the exact same query at the same exact time, SQL Server doesn’t execute it once and share the results across all of the sessions. Each query gets its own memory grant and does its own CPU work.

This is one of those areas where Oracle has us beat. Sometimes folks will ask me what my favorite database is, and I gotta confess that if money didn’t matter, I’d probably be really interested in Oracle. Check out their Result Cache feature: you can configure a percentage of memory to cache query results to be ready-to-go when apps keep rerunning the same query. However, at $47,500 per CPU core for Enterprise Edition licensing (check out the price sheets), I’m afraid I’m not going to be sampling any of that caviar anytime soon.

One way we can solve that problem is by caching the results in the application layer:

But another way – and the way I use most often – is to pre-bake the data in a way that makes queries run faster. That’s a nonclustered index, and I’ll cover those next.

How to Think Like the SQL Server Engine: The Perils of SELECT *

In our last post, we ran a query with an ORDER BY, but we only got one column in the SELECT:

The estimated cost was about $18 Query Bucks because SQL Server had to:

  • Scan the entire clustered index, yelling out the Id and LastAccessDate of each row
  • Sort that list by LastAccessDate

Now, let’s change just one thing about the query – what we’re selecting:
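A sketch of the tweak – same WHERE and ORDER BY, just selecting everything:

SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01'
ORDER BY LastAccessDate;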

And run both of the queries back to back to get their actual execution plans:

The basic execution plan of both queries is pretty similar, aside from the fact that the bottom one went parallel. SQL Server realized that sorting all this data was going to be a heck of a lot more work, so it split the work across multiple CPU cores.

This query sucks in a few different ways.

SELECT * can read more data. I know what you’re thinking: both queries have to read all of the 8KB pages in the table, right? No – go back to the first post in the series when I introduced the Users table. I mentioned that the AboutMe field, a big ol’ NVARCHAR(MAX), might be so large that it ends up getting pushed off-row: stored in different 8KB pages. Our SELECT Id query didn’t need to read those extra pages – but when we SELECT *, we do.

SELECT * can sort more data. SQL Server can’t just sort the LastAccessDates – it has to sort the entire rows. That means it’s going to need more CPU time and more memory, and do more spills to disk if it gets those memory estimates wrong.

SELECT * can take more time to output. This was always the thing I focused on as a database administrator. I would tell my devs, “Don’t select fields you don’t need, because it takes longer to yell that data out over the network.” Today, that’s the least of my problems: I’m much, much more concerned about the CPU time and memory grants consumed by the sort operator.

Your first clue about just how bad the SELECT *’s sort operator sucks is the 97% cost of the sort – but it’s not 97% of the same query cost. The original query cost was $18 Query Bucks, but check out the cost of the SELECT * – IT’S ALMOST NINE HUNDRED QUERY BUCKS.

If query bucks were real dollars, I could have bought my first car with the cost of this SELECT *.

I hate SELECT *.

It isn’t about the star itself – it’s about lazily getting all of the columns, including ones you don’t need, and then forcing SQL Server to sort them.

And just because you didn’t put an ORDER BY in your query doesn’t mean you don’t have a sort, either – Erik wrote about sorts that get injected into your plan even if you didn’t ask for ’em.

For the next post in the series, we’ll go back to selecting just the Id. It’s not that I’m only going to allow my developers to just select Ids and nothing else – I understand that we gotta get data out of the database. However, when I’m dealing with query tuning, and I see a SELECT * (or a big ginormous list of columns), I’m going to start the tuning process by asking if we really need all of those fields. Even with Entity Framework, you can pick the columns you want. Does it take a little more work on your part? Sure – but you’re a lot cheaper than SQL Server’s CPU licensing or Azure SQL DB’s pricing, especially as your app starts to scale.

So next, let’s get just the Id, but run the query repeatedly.

[Video] What’s New in SQL Server 2019

We must be getting pretty close to the official release of Microsoft SQL Server 2019. There are less than 80 days left in the year. (Interestingly, it’s been almost 60 days since Release Candidate 1 shipped – that’s unusually long for a delay between builds. Hmm.)

With the clock ticking down, now’s a good time to take a 40-minute look at some of my favorite new features: deferred compilation for table variables, adaptive memory grants, adaptive joins, and air_quote_actual plans:

To learn more about this stuff:

How to Think Like the SQL Server Engine: Adding an ORDER BY

We started out the How to Think Like the Engine series with a simple query with a WHERE clause:

Now let’s add an ORDER BY:
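A sketch, assuming the same filter from the earlier post:

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01'
ORDER BY LastAccessDate;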

Here’s the updated plan – note that the query cost has tripled to $17.72 Query Bucks. Let’s dig into why:

We read the plan from right to left, hovering our mouse over each operator. Each operator is kinda like a standalone program that has its own dedicated work to do, and produces a specific output.

The first thing that happens is the Clustered Index Scan.

At the top right, the clustered index scan is exactly the same as it was in the last query. Hover your mouse over that and there are a lot of juicy details that we didn’t really dig into before:

  • Predicate: the clustered index scan mini-program is only going to return rows where the LastAccessDate > ‘2014-07-01’. (Pretty nifty how it changed the date format and added the time, right?)
  • Estimated Number of Rows: 149,259
  • Estimated Number of Rows to be Read: 299,398 (we have to scan the whole table to find the people who match)
  • Output List: Id, LastAccessDate (because upstream of this mini-program, other mini-programs are going to need both Id and LastAccessDate. Specifically, the Sort operator – which happens next – is going to need to sort all these rows by LastAccessDate.)

The data flows out of this scan operator, and flows into the next one: Sort.

The second thing that happens is the Sort.

The sort’s input is the 148,328 rows of Id & LastAccessDate that came out of the Clustered Index Scan, and they’re not sorted in any kind of order – but our query asked for them to be ordered by LastAccessDate. That’s where the Sort’s work comes in. Hover your mouse over it to see what’s happening in that mini-program:

A few points of interest:

  • Order By (down at the bottom): the main goal of the sort
  • Estimated Number of Rows: 149,259
  • Estimated I/O Cost: $0.01 (because there’s not much I/O to do if you’re sorting 150K rows)
  • Estimated CPU Cost: $11.7752 Query Bucks (because sorting is mostly CPU work)

But note that those estimates above are all based on 149,259 rows. If way more rows came in (or less), then our actual work would have been way more (or less.) This is a good time to stop and mention that you don’t see Actual Cost numbers here in Query Bucks: SQL Server doesn’t go back and re-cost stuff after the work has been completed. Anytime you see a cost – even on an actual plan – it’s just an estimate that was done before the query started. Just like you, SQL Server doesn’t later confess just how over budget or late its work was.

So why did the query cost triple from $6 to $18?

It all comes back to the $11.7752 Query Buck cost of the Sort operator, which is brand new in this plan. Sorting data is hard work.

I jokingly say that SQL Server is the world’s second most expensive place to sort data – second only to Oracle. Next time someone complains about the $7K per core cost of SQL Server Enterprise Edition, remind them that Oracle is $47K per core. $47K. Hoo boy.

If you need to sort data, try doing it in the app tier instead. Developers are really good about scaling out application work, and their app servers don’t cost $7K per core. I’m all about going to bat for my developers to get them more & faster web servers because those are easier/cheaper than scaling out sorting in SQL Server. If the query doesn’t have a TOP, then it probably shouldn’t have an ORDER BY.

What’s that, you say? You’ve never heard Microsoft dispense that advice?

Microsoft, the company that charges $7K per core to do your sorting for you? They haven’t told you about that? Huh. Must have been too busy telling you how you can now do R, Python, and Java inside the SQL Server, all at the low low price of $7K per core. Funny how that works.

And the more columns you add, the worse it gets. Next up, let’s try SELECT *.

Why Query Plans Can Look Different on Different Servers

In the first post in my How to Think Like the Engine series, I started by doing a pretty simple query:

But I noted that if you were following along on your computer, you might see a different execution plan. If I tweak just one thing about my SQL Server, I see a different plan:

Fully optimized plan

That’s the SAME query, hitting the SAME table, on the SAME server, with the exact same data – but suddenly now the query plan goes parallel (as indicated by the Parallelism icon), and I get a missing index request. What gives?

In this case, I changed Cost Threshold for Parallelism.

CTFP is a server-level setting that says, “If a query costs more than this, then consider parallelizing the query’s work across multiple cores.” Most SQL Server setup checklists will suggest that you raise it from the default of 5 up to something more like 40, 50, or 75. I explain the setting in more detail over in my post, What is the CXPACKET Wait Type?
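If you want to flip between the two behaviors yourself, the setting lives in sp_configure – here’s a rough sketch; note that ‘cost threshold for parallelism’ is an advanced option, so ‘show advanced options’ has to be enabled first:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;  /* or 5 to reproduce the parallel plan */
RECONFIGURE;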

In the first screenshot – where the query goes single-threaded – I had my CTFP set at 50, which means that SQL Server wouldn’t consider parallelizing it if it had a lower cost. If I hover my mouse over the first query’s SELECT operator in the execution plan, I can see the Estimated Subtree Cost:

Estimated Subtree Cost

Estimated Subtree Cost is SQL Server’s guess of how much CPU and IO work will be required to execute the query. SQL Server doesn’t know how fast your CPUs are or how fast your storage is, and these numbers are hard-coded to kinda old hardware. We needed a way to communicate what these numbers mean, though, so Kendra Little came up with the term “Query Bucks.”

This query costs $5.79 Query Bucks, which is less than our $50 Cost Threshold For Parallelism, and it was a pretty simple query to optimize, so SQL Server bailed out early on building a query plan. If you right-click on the SELECT operator and click Properties, you can see that in the Optimization Level property being TRIVIAL.

SQL Server is in a race against the clock: the faster it can ship your query results, the happier you’ll be. There was no sense putting more time into building this query plan since it was seen as trivially simple – any additional time spent compiling the plan would be just wasted.

When I dropped Cost Threshold for Parallelism back down to the default of 5:

  • The query cost of $5.79 was suddenly higher than CTFP ($5)
  • SQL Server went deeper into query optimization (the Optimization Level switched to “FULL”)
  • Parallelism was injected into the plan
  • The missing index recommendation code kicked into action

And now, SQL Server decides to build a better query plan for my configuration settings – that part is really important to understand:

The cost is now HIGHER – $5.87 Query Bucks – but SQL Server believes it’ll complete more quickly by spreading the work across multiple threads. It might be wrong – but it would be wrong based on YOUR input, which is what drove the Cost Threshold for Parallelism setting. By itself, SQL Server just doesn’t know your workload well enough to decide which queries should go parallel and which ones should stay single-threaded in order to balance load as efficiently as possible across multiple cores. It needs your tender loving hand on the CTFP dial.

Change anything about the server, and your plans can change.

Oh sure, there’s the obvious stuff:

  • Server-level configuration settings (like sp_configure options)
  • Hardware capacity (core count and memory availability change plans)
  • SQL Server major version (2019, 2017, 2016)
  • SQL Server patch level (yes, even Cumulative Updates change optimizer behavior)
  • Data quantity – how many rows you have in tables
  • Data distribution between different values

But that’s really only scratching the surface. SQL Server is incredibly complex, and has an amazing number of ways that you can influence query plan behavior without even meaning to. For more examples, check out 5 Ways to Change Execution Plans Without Tuning.

This has implications for how you do performance tuning.

Go easy on untested changes. Just because you believe that changing a setting is safe doesn’t mean it won’t have unexpected performance implications.

Try to keep production & development servers as identical as practically possible. The closer your servers match, the closer your query plans will match. If production is SQL Server 2017 with 48 cores and 2TB RAM, but development is SQL Server 2019 with 4 cores and 16GB RAM and a tiny subset of production data, you don’t stand a snowball’s chance in hell of getting execution plans that are even remotely similar. You’re going to have a much harder – not impossible, just harder – chance of getting identical query plans. But at the same time…

Most of the time, you don’t need exactly identical query plans. Generally, a crappy query is a crappy query regardless of the SQL Server version or settings. In this blog post series, for example, if you’re following along with my queries, you don’t need to get exactly the same query plan in order to get the point. Just focus on the basics: is the query getting better or worse?

I kinda feel like this is going into more technical detail than really necessary to introduce you to internals in the How to Think Like the Engine series, but I needed to get that out of the way for those of you who wanna follow along with the query plans on your own machines.

In the next post, we’ll get back to expanding our query with more work and seeing how the plans change.

How to Think Like the SQL Server Engine

You’re a developer or a DBA, and you’re comfortable writing queries to get the data you need. You’re much less comfortable trying to design the right indexes for your database server.

In the next several posts, you’ll learn:

  • The differences between clustered and nonclustered indexes
  • How (and when) to make a covering index
  • The basics of execution plans

If you prefer watching videos rather than reading, you can watch my How to Think Like the Engine training videos free too. I just decided to write these out as a series of blog posts because I get so many questions about how this stuff works – from folks who don’t like sitting around watching videos. Let’s get started!

I’m using the Users table in the Stack Overflow database.

StackOverflow.com open sources your question & answer data, and I import that into a few free SQL Server databases you can download. In this series, I’m going to use the small 10GB StackOverflow2010 database (1GB zip file) – if you want to follow along and get similar results & query plans, use that one.

The Users table holds exactly what you think it holds: a list of everybody who’s created an account at StackOverflow.com. The schema is pretty simple:

  • Id – an identity field, starts at 1 and goes up to a bajillion
  • DisplayName – not unique, just the name you go by, like “Brent Ozar” or “Alex”
  • LastAccessDate – the last time you opened a page at StackOverflow.com

In SQL Server, most objects are saved in 8KB pages, and each 8KB page is dedicated entirely to one object. (We’ll save the nuances of columnstore indexes, Hekaton, and other niche objects for other blog posts.) You can think of these 8KB pages just like printed-out pages of an Excel spreadsheet, like this:

dbo.Users clustered index

When I teach this class in person, I actually give out printed-out pieces of paper with this stuff. You can download that 5-page PDF and print it out to help follow along and visualize the data as we go through demos. Just stick with the white piece of paper at first.

That white grid above is the clustered index of the dbo.Users table. In the Stack Overflow databases that I publish, the Id column is set up as the clustered primary key of the table.

This clustered index on Id *is* the table.

Look down the left side of the sheet: the rows are sorted by Id, and the sheet has all of the columns in the table. The clustered index IS the table, and these 8KB pages are the same whether they’re on disk or in memory.

Each database’s data files (MDF/NDF) are just a string of 8KB pages from start to finish. Some of the pages are used by SQL Server to save metadata about the database, but most of ’em are your indexes.

You can see how many pages a table contains by running a query against it like this:
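A sketch of that query – selecting just the Id column keeps the example simple:

SET STATISTICS IO ON;

SELECT Id
FROM dbo.Users;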

The first line there, SET STATISTICS IO ON, adds info in SQL Server Management Studio’s “Messages” tab that tells you the number of 8KB pages SQL Server had to read to execute your query:

The “logical reads 7405” means SQL Server read 7,405 8KB pages. Generally speaking, the fewer pages SQL Server has to read to execute your query, the faster your query will go.

7,405 pages is about 15 reams of paper.

You know those 500-page packs of paper that you put into the copier or the printer? (No? Do you remember copiers and printers? Honestly, me neither.) The Users table is one of the smallest tables in the Stack Overflow database export, but it’s still 15 of those packs.

As we work through demos in the upcoming posts, I want you to visualize a stack of 15 reams of paper over in the corner of your room. When I ask you to query the table, I want you to think about how you’d execute that as a human being facing data spread across 15 reams of paper. It’d be a hell of a lot of work, and you wouldn’t be so eager to go grab the first piece of paper to start work. You’d wanna build a really good plan before you go tackle that stack of paper.

That’s why SQL Server tries to build good execution plans.

In SSMS, click Query, Include Actual Plan (or hit Control-M), run the query again, and you’ll get an “Execution plan” tab:

Read the plan from right to left: SQL Server scanned the clustered index (aka the white pages, the table), pushing 299,398 things out to the SELECT statement. It had to scan the whole table because we asked for the whole thing – well, we only asked for one column (Id), but the Ids were scattered across all the pages, so we had to read ’em all.

You wouldn’t normally write a query without a WHERE clause though, so let’s add one.

Let’s be good and add a WHERE clause.

Let’s only get the people who accessed the system after July 1, 2014:
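Roughly like this – the same query as before, with a sketch of the new WHERE clause:

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014-07-01';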

And my execution plan looks the same:

(If you’re following along on your own computer, your query might look a little different – I’ll explain why in the next post.)

Both of my queries – with and without a where clause – read the same number of pages. Here, I’m running both queries back to back, showing their stats in a single Messages tab:

And if you look at the 8KB page again, it kinda makes sense why we have to scan the clustered index in order to find rows with a LastAccessDate > 2014/07/01:

dbo.Users clustered index

Even though our query returns fewer rows, we still have to do the same amount of work because the data simply isn’t sorted in an order that helps our query. If we’re going to frequently query by LastAccessDate, and if we want that query to run faster, we’re going to need to help SQL Server out by designing an additional copy of our table, pre-sorted in a better way: a nonclustered index.

Disclaimer: I’m simplifying a lot of stuff here.

In this blog post series, I’m going to try to cover a lot of the most important ground, fast. To do that, I’m going to leave out a lot of details that you would probably find interesting – but look, this is a blog post, not a book. It’s a starting point for your learning journey.

However, there are a few disclaimers I need to make or else the armchair bloggers are going to complain:

  • Note that the AboutMe column is clipped off. The AboutMe column is an NVARCHAR(MAX) because Stack Overflow lets you store a huge amount of stuff inside your profile. In SQL Server, large AboutMe data can be moved off-row: saved on separate 8KB pages. When you design tables with big columns like VARCHAR(MAX), NVARCHAR(MAX), XML, JSON, etc., you can end up paying a performance price as SQL Server jumps around to read all that data. Ideally, design columns just wide enough to store the data you need, and no more.
  • The 8KB pages don’t use Excel-style grids. SQL Server needs to cram as much data in per page as it can, so no, it doesn’t organize rows and columns on the 8KB page like a spreadsheet. Similarly, the column headers aren’t stored in plain text on each page, nor are nulls stored as all-capital NULL, hahaha. I’m just using a spreadsheet as a visualization tool because it helps you understand how things work.
  • The 8KB page I’m showing is called a leaf page. Indexes have more kinds of pages too, like B-tree structures that help the engine rapidly find the right page when it’s seeking for a specific row. However, even if you just focus on the leaf pages and don’t learn anything else, you can still do a phenomenal job of improving your database performance – and that’s the goal of this series, getting you up to speed on the most important stuff quickly.

And there’s one more complex thing I need to tackle, but it needs its own blog post, which is the next one in the series: why your query plan might not have exactly matched mine.

I’m coming to Gothenburg & Oslo next summer.

The Nordic countries are some of the most beautiful places I’ve ever seen. Erika and I are scheduling a few trips over there in 2020, and the first one I can announce is SQL Saturday Gothenburg and SQL Saturday Oslo.

I’m teaching my one-day workshop before each event: Performance Tuning in 21 Demos. You’re a developer or DBA who wants to tune indexes and queries to make your application go faster. You don’t like learning by watching slides or reading white papers – you want to jump right into the code and see what happens. You’re the kind of person who loves watching YouTube videos that start with, “Hold my beer and watch this.” Attendees get a year of my Consultant Toolkit.

These pre-cons are an awesome deal for one day of in-person SQL Server training, and they usually sell out. I’m announcing these a year in advance so that those of you who want to travel can schedule it in advance:

Erika and I are making a big vacation out of it, flying in a week early to take a Hurtigruten cruise around Norway and taking the Bergensbanen train, reputed to be one of the most beautiful train rides in Europe.

You can even watch the entire seven hour journey courtesy of the slow TV movement, something I’ve found really wonderful lately:

See you in Scandinavia!

New Free Azure and SQL Server 2019 Training from Microsoft

Folks often email me and ask, “Brent, why don’t you do training classes on Azure?” The answer is simple: the products change faster than I can recoup the cost of building the training material. By the time the class gets enough revenue to pay off the downtime for building an Azure (or AWS or Google) training class, the material’s out of date. I’m sure you’ve sat through an online video where you’ve noticed things that are already no longer true.

But there’s a company who CAN afford to constantly build new stuff without charging you for it: Microsoft!

The sqlworkshops Github repo is chock full of courses built by Microsoft names that you’ll probably recognize like Bob Ward and Buck Woody. It’s all open source, and written in a way that you can follow along. Now, granted, that’s not necessarily as good as an instructor-led class, because you’ve gotta teach yourself, but … you don’t have to pay for an instructor, so there’s that.

Their latest course is a one-day workshop called SQL Server 2019 Ground to Cloud, and it covers:

  • What’s new in SQL Server 2019
  • Working with Big Data Clusters
  • SQL Server in Azure – the differences between Azure SQL DB, elastic pools, VMs, etc.
  • Migrating data from SQL Server into Azure
  • Which databases to use for different business requirements

You can download the 219-page workshop PDF here, get the PowerPoint slides, and browse the Github repo with the class resources. Don’t be fooled by the times listed to complete each module: if you’re doing these classes yourself, they’re going to take you more time, and that’s not a bad thing. Lots of the slides in these presentations are going to prompt you to jump in a different direction and read up on related features & documentation that you want for your own business projects. That’s a good thing: it means this “one-day workshop” is really something you can dive into for several days, or a few weekends.

If you’re a consultant, your clients are probably going to ask you questions that are covered in these workshops. Microsoft *wants* you to be able to present these workshops to your clients to get ’em the answers they want. These workshops aren’t just about empowering your self-paced learning: they’re also about empowering you as a consultant and/or trainer to get the knowledge out there.

And if you like this, you’ll probably like the rest of the sqlworkshops Github repo. It’s all free – you can’t beat that. (I certainly can’t!)

Updated First Responder Kit and Consultant Toolkit for September 2019

This month’s release has a lot of new feature goodness: the Consultant Toolkit can automatically upload the results to Amazon S3, plus sp_BlitzCache adds a new sort order to better catch unparameterized queries burning up CPU, a new warning for selects that are doing writes (other than >500MB spills, which we’ve been warning you about for years), and support for SQL Server 2019’s air_quote_actual plans.

To get the new version:

Consultant Toolkit Changes

Updated the First Responder Kit with this month’s script updates, plus:

  • Improvement: the --upload command line switch now uploads the data directly to a bucket in your Amazon S3 account. This helps you set up a scheduled task at the client’s office to send you the data weekly or daily – enabling you to keep tabs on their SQL Server’s health without having to constantly VPN in. See the S3 uploading documentation for more details.
  • Improvement: added Plans by Query Hash tab using sp_BlitzCache’s new @SortOrder = ‘query hash’ option. This is basically the same as @SortOrder = ‘cpu’, but it filters for queries with multiple plans in the cache with the same query hash.
  • Fix: on the Databases tab, on the Total line, the “# of Data Files” and “LOB Reserved GB” columns were swapped. This only affected the Total line though, not the individual databases.
  • Fix: on the Databases tab, data sizes are now populated for databases with NVARCHAR names.
  • Fix: formatting improvements.

sp_Blitz Changes

  • Improvement: added Uninstall.sql for those of you who wanna nuke it from orbit. (#2080, thanks ShawnPOWER for the request and Emanuele Meazzo for the code.)

sp_BlitzCache Changes

sp_BlitzFirst Changes

  • Fix: @CheckProcedureCache was being skipped for some time zones, especially British Standard Time users. I had to fix this before stepping foot on the Isle of Man again, naturally. (#2096, thanks PlanetMatt for the delightfully easy-to-reproduce bug report.)

sp_BlitzIndex Changes

  • Improvement: faster compression analysis on partitioned databases. (#2065, thanks Erik Darling.)

sp_DatabaseRestore Changes

  • Fix: last month’s release broke on case-sensitive systems, fixed. (#2097, thanks sm8680 for the bug report.)

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.
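
If you’re not sure where to find the build number, a couple of quick queries will grab it – nothing FRK-specific here:

    SELECT @@VERSION;   -- full version string, including the build number

    SELECT SERVERPROPERTY('ProductVersion') AS build_number,
           SERVERPROPERTY('ProductLevel')   AS service_level,
           SERVERPROPERTY('Edition')        AS edition;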

DBA Training Plan 23: Congratulations! Now, It’s Your Turn.

I bet you think you suck at databases.

You’re sitting there on your computer, reading all kinds of blogs, watching presentations online, and you’re thinking, “Everybody knows more about databases than I do.”

We all feel that way. There’s even a name for it: Impostor Syndrome.

I have it too. For example, about a decade ago, when I headed to Redmond to attend Microsoft’s $20,000 Certified Master program, I kept thinking, “Everybody else there has gotta be different than me. I’m not really supposed to be there, for sure, but at least I’ll be able to spend a few weeks rubbing elbows with people who really are good with databases.” The instructors told us that the program was designed so that when Microsoft had a really ugly customer support situation, they could send out a Microsoft Certified Master and know for sure that the problem was gonna get handled, no matter how severe it was.

When I passed all 3 exams and the lab, my opinion of myself still didn’t change. I still thought I sucked at databases.

Today, in the year 2019, I still think the same way: I don’t know enough. It’s just that now, I can say that I’ve made a lot more mistakes, and I just like to help other people avoid the same mistakes that I’ve made.

You’re just like me.
Except you need to start sharing.

If you’ve been reading my blog posts, even just the 20-some posts in my DBA Training Plan, you already know a lot more than most database administrators. Seriously.

What, you think other people with your job aren’t Googling their way through their daily work? Go read the questions at DBA.StackExchange.com or StackOverflow.com. We’re all winging this together. We’re all facing an overwhelming number of databases, applications, error messages, and email alerts. We’re all trying to find help in any way that we can get it.

You can offer some of that help.

There are so many ways you can give back to the community: blog, present, answer questions, record YouTube videos, chat on SQLhelp, or contribute improvements to open source tools.

Giving back pays you back, too.

Oh sure, there’s a fantastic feeling when you see the light go on in someone’s eyes because they suddenly grasp a new concept, or when you watch the web site visitor numbers go up. (Slowly. Very slowly.) But there’s more to it than that: the rest of the benefits are selfish ones, and I’m going to tout them here in an effort to win you over to the Contributor Camp.

You get recognized in the speaker room. When you introduce yourself at events, people will recognize your name and thank you for your contributions – because the speakers needed your help, too.

People will start seeking you out. They’ll say hi to you at events, follow you on Twitter and Instagram, and get to know you as a person. You’ll build relationships that will last the rest of your life.

Companies will start seeking you out, too. I can’t emphasize enough how important this is for your career. Go read Rock Stars, Normal People, and You to learn how it turned my own job hunting around. If you’ve already read that, go read it again. No, I’m not fishing for web site hits – I’m doing just fine, thank you, hahaha.

People will leave comments. Well, okay, this one’s kind of a mixed bag, because sometimes they thank you for all your hard work to make their life easier, but sometimes…sometimes they won’t.

Yes, people will criticize your work.
It happens to all of us.

Sometimes, they’re going to point out an error in your work. You’re going to feel like an idiot, but…you already feel like an idiot, right? The only difference is, it’s going to happen in public. But “public” is just your blog or a Stack Overflow answer or a Github issue. It’s not like it’s happening on national TV. We’re a tiny industry, and nobody’s gonna remember that one time you made a mistake – especially if you handle it by testing your work afterwards to double-check, thanking the commenter, and integrating the correction into your work.

Other times, they’re mistaken, and this is kind of a mixed bag too: sure, you can be proven correct, but you still have to go the extra mile of working with the commenter to explain yourself in more detail.

Other times, there’s no right or wrong answer – the work is a matter of opinion – and that can be kinda grueling, too. But even then, be excited and thankful because your work prompted them to get up off their rears and leave a comment.

Let’s do this.

Don’t let the prospect of comments trigger your impostor syndrome: you can do this. It’s time to decide how you’re going to give back and make a plan to pull it off. To help, here’s my PASS Summit 2016 half-day session, 500-Level Guide to Career Internals:

DBA Training Plan 22: Where to Learn More

Whew. We’ve worked through a couple dozen posts together in this DBA Training Plan, and you’re still just getting warmed up.

First, pick your specialty. Here are some of the more popular specializations in the SQL Server business:

  • Production administration – managing uptime, building new boxes, troubleshooting clusters, setting up storage, designing virtualization infrastructures
  • Performance tuning – taking queries from others and making them faster via rewriting the queries, tuning indexes, and tweaking SQL Server settings
  • Business intelligence – working with end users to get what they need from the mysterious database using SQL knowledge plus visualization tools
  • SQL development – building long, complex stored procedures and functions to transform and return data

There are no wrong answers here because all of these will have great long-term markets. Pick the one that calls to you the most, because you’ll need to spend your spare time learning it. It’s rare to find a company willing to invest in your future skills, so your career needs to be in a specialization that you’re genuinely excited to study. After all, it’s hard to suck it up and study after hours when you don’t really love the topic.

Next, pick your favorite learning method:

Reading – books, articles, blog posts. Just be aware that the shorter the delivery method, the more you’re probably going to have to consume to get the desired effect. My favorite free reading resource that walks the line between free/short and expensive/long is SQL Server Central’s Stairways. They cover a single topic well from start to finish, and they’re arranged in single-serving chunks. It’s about as deep as you can get without paying for a book, but when you’re ready for books, check out the free book library from Red Gate.

If you want an absolute firehose of blog posts, here is a Feedly collection of the blogs I subscribe to, and an OPML file that you can import into your feed reader.

Watching videos – aside from our YouTube channel, my next favorite free video resource is SQLbits. It’s a UK conference that records videos at all of their sessions and makes them freely available to the public online. You can also search by speakers.

In-person training classes – start by finding your local chapter of PASS, the Professional Association for SQL Server. These local user groups meet monthly, and volunteer speakers talk about a topic for about an hour. Then search for upcoming SQLSaturdays – they’re free one-day conferences with multiple tracks on SQL Server. You may have to travel to get to one, but it’s still cheaper than any other one-day training class options.

When you’ve exhausted those free resources, or you need to go deeper into a topic than blog posts and community videos can take you, we’re here – that’s where our paid options come in. For example, our Senior DBA class covers advanced, real-world high availability and disaster recovery topics, and our SQL Server Performance Tuning class shows you how to save valuable time and make the server go faster immediately. Learn more about our training now.

DBA Training Plan 21: Building a New SQL Server

I have a secret setup checklist.

Oh sure, our free First Responder Kit has a SQL Server setup checklist that walks you through a few things you need to do before you install, plus a few things to do immediately afterwards. That’s good. It’s actually pretty darned good.

But it’s not the secret checklist.

Gather round and cover your monitor for a minute because I’m about to share the good stuff.

1. Hardware and Windows prep. Apply the latest BIOS and firmware updates to the hardware. Install Windows, then apply all patches. Install and configure your server vendor’s hardware monitoring, and the monitoring tools your sysadmins use. For clusters, run the Validation Wizard, make sure all tests pass, and save a copy of the validation report.

2. Smoke test it. Run CrystalDiskMark against all of the storage (local and SAN) and make sure it meets your expectations. (See my Building the Fastest Servers session for more details.) If it’s a physical server, test it by removing and/or disabling the network, storage, and power cables individually, and make sure that all of the server’s connections are fully redundant. If it’s a virtual server, do those same things to the host. Yes, your VM admins are going to be concerned – but honestly, they haven’t tested any of this stuff in the last year either. They’re just assuming it’s going to work, and you know what they say about assuming: you make an ass out of you and Uma Thurman.

3. Install SQL Server and the latest CUs. Here’s where our setup checklist comes in handy, but you’re not done. Restore your existing databases and time how long that takes. It’s your one chance to know how long a production database restore will take on the new hardware, and that helps you find out whether you can meet your RPO/RTO goals. Configure your maintenance jobs with something like Ola Hallengren’s scripts, and time how long they take. Try performance tuning the backups by striping them across multiple files – often splitting the backups across 4 files will make them complete 4x faster. Try performance tuning the index rebuilds by doing the sorts in TempDB and see if that helps (there’s a quick rebuild sketch at the end of this checklist.)

4. Smoke test SQL Server. Set up your high availability and disaster recovery features like log shipping or AlwaysOn Availability Groups, and fail the databases over between production and DR. Test a planned failover and document all of the steps involved along with how much time they take. Include copious screen shots. Your goal is to make a planned failover easy enough that any of your company’s admins can do it without knowing SQL Server. (Hey, the more you document, the more time you can take off for vacations.) Then try the same with an unplanned failover – yank the power cables out of the production servers (or hosts) and do a failover. This is usually a more manual process with more steps.

5. White-board out the failure possibilities. Diagram out all of the moving parts in the architecture – shared drives, cluster IPs, service accounts – and think about what would happen if each part fails. What would failure symptoms look like? What troubleshooting steps would help you discover that this was the real root cause? If you’re not sure how you’d recognize a failure of that component, now is your chance to force the failure and plan out your troubleshooting steps.
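
For the rebuild tuning in step 3, here’s a minimal sketch of an index rebuild that does its sorts in TempDB – dbo.YourBigTable is a stand-in name, and Ola’s IndexOptimize exposes an equivalent sort-in-TempDB option if you’re using his scripts:

    -- stand-in table name; run once with SORT_IN_TEMPDB = ON, once with OFF, and compare the times
    ALTER INDEX ALL ON dbo.YourBigTable REBUILD WITH (SORT_IN_TEMPDB = ON);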

Only then do you start to go live on the new infrastructure.

This sounds like a lot of work, and it is. I’m not saying this as a consultant who wants to bill you a bunch of money – this isn’t usually the kind of work that I tackle myself. Instead, I work with you to build a list like this, and then you go off and execute the plan. You’re already comfortable with each of these tasks in theory – you just need to do them hands-on. Don’t just assume that the hardware will react a certain way when there’s a failure – go test it, find out for sure, and when disaster strikes, you’ll react with calm confidence.

DBA Training Plan 20: Planning Your Next Server

When the business decides it’s time to give up fixing the old server because it’s not fast enough or reliable enough, stop. The worst thing you can do is jump directly to the hardware vendor’s site, check a few boxes, and order a new shiny server that ends up with exactly the same problems as the last server.

1. Get the business’s RPO and RTO goals in writing. These two numbers determine whether you build a cluster, an AlwaysOn Availability Group, a log shipping backup, or a pair of tin cans connected with string. Learn what RPO and RTO are here.

2. Find the right SQL Server feature for your RPO/RTO goals. For example, if you’re not allowed to lose any data, and you need to fail over in less than 1 minute, then your options include failover clustered instances, AlwaysOn AGs with synchronous commits, or synchronous database mirroring. Use Page 2 of the HA/DR worksheet linked in the last post to figure that part out, or watch the HA/DR architecture modules of the Senior DBA Class.

3. Review the current server’s wait stats. To help figure out what kind of hardware you need for tomorrow, look at the pains you’re experiencing today. For example, if you’re suffering from PAGEIOLATCH% waits because you can’t cache enough data in RAM, the next server may need significantly more memory (or faster storage, or time spent tuning indexes & queries.) If you’re suffering from LCK% waits because the apps are holding locks on their side, though, you may not see relief no matter what kind of hardware you throw at it. (There’s a quick wait stats query at the end of this list.)

4. Benchmark your current hardware’s speed. I’ve written about how to check performance on a new SQL Server, but this is also really useful when you’re transitioning away from an old server. For example, take your backup & CHECKDB job runtimes, and the new server should be able to perform at least that quickly (if not more so, depending on what your bottleneck was in the above step, and which kind of bottleneck you’re trying to fix with the new server.)

5. Get the server’s proposed expiration date.  Servers are like milk cartons: both need an expiration date printed clearly on the outside. Ask the business users, “How long does this server need to last?” They usually say, “Forever,” and start laughing, but once the laughing stops, here’s how to handle the discussion.

6. Ask the developers, “What features are we adding during that time?” I don’t expect them to have specific line-by-line answers about which features they’re adding over the next 3 years, but I do expect them to have rough architecture ideas. Are we going to add some kind of big data analysis projects? Machine learning in the database? R, Python, or Java running inside it? Storing new kinds of data, like tracking every web site click?

7. Ask the business, “How much will load grow during that time?” Are we expecting to acquire our next several larger competitors, or maybe conquer sales in a new country? Or are we just expecting to tread water and only deal with incremental growth?
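
If you want the raw numbers behind step 3, here’s a minimal sketch that pulls the top cumulative waits since the last restart – sp_BlitzFirst @SinceStartup = 1 gives you a friendlier, filtered version of the same data:

    -- top cumulative waits since the last SQL Server restart (a rough starting point)
    SELECT TOP 10
           wait_type,
           wait_time_ms / 1000.0 AS wait_time_seconds,
           waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT LIKE 'SLEEP%'            -- filter out a few obviously benign waits;
      AND wait_type NOT LIKE 'BROKER_TASK%'      -- a real analysis filters a much longer list
      AND wait_type NOT IN ('CHECKPOINT_QUEUE', 'LAZYWRITER_SLEEP', 'XE_TIMER_EVENT',
                            'REQUEST_FOR_DEADLOCK_SEARCH', 'LOGMGR_QUEUE', 'DIRTY_PAGE_POLL')
    ORDER BY wait_time_ms DESC;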

For all of the above questions, the more specific the answers are, the more precise your server sizing can be.

My new hardware

In reality, they’re very rarely specific.

And that’s fine!

But it just means that your server design needs to be more flexible: we need to design something that we expect will change rapidly over time as we learn more about the answers above. That’s one of the reasons why I really love building new environments as Availability Groups up in the cloud: it’s so much easier to add in new, more powerful replicas, join them to the cluster, and then get rid of the old ones. Your server design can leverage the flexibility of the cloud to handle new features or workloads, and you can throw more cloud hardware at performance problems when you need to.

How to Set Up Your Own Mastering Class VM

When you buy one of my Live Class Season Passes, you can save a lot of money by building your own VM to follow along with the labs. Or, you might just wanna re-run the labs later to see if you can do a better job, or if you’re still mastering a concept.

You’re going to need:

  • A server
  • The Stack Overflow database
  • The indexes & scripts set up
  • Then ideally, back the database up, and set up an Agent job to rapidly restore the database between labs
  • SQLQueryStress

Let’s talk through each piece.

Building a SQL Server

To get a rough idea of how much hardware to use, let’s look at the AWS EC2 VMs I give the class students:

  • Mastering Index Tuning & Mastering Query Tuning use an i3.xlarge: 4 cores, 30GB RAM, local NVMe SSD storage
  • Mastering Server Tuning uses an i3.2xlarge: 8 cores, 61GB RAM, local NVMe storage – we use more cores & RAM here because we run heavier stress tests, since we’re doing server-level discussions

Each server needs at least 500GB local SSD space to deal with the 350GB Stack Overflow database, index creation space, TempDB, backups, and restores. It needs to be fast storage, too: for a rough idea, you’ll need to be able to restore the ~350GB backup in 15 minutes or less. (I’m just using restore speed as one example here – you’ll be doing lots of storage-intensive tasks, like creating indexes on large tables.)

If you want to check your storage speed before downloading the Stack database, run a test with CrystalDiskMark. You want at least 1,000 MB/sec for sequential reads and writes. As an example, my laptop’s SSD speeds are shown at right, and it would be fine for the labs.

Can you get by with less hardware? Sure, but of course your performance will be different than what we’re seeing in class. I actually don’t think that’s a bad thing – every server out there is different – but just be aware that it’ll pose additional challenges for you if you try something like 4 cores, 8GB RAM. Your index creations will be terribly slow, and you probably won’t be able to keep up in class.

Can you use more hardware? Yep, and as long as you’re not caching the entire database in RAM, you’ll probably still have the same basic challenges that we tackle in all of the classes.

After building the server, install:

Getting the Stack Overflow database

Download the 2018-06 version of the Stack Overflow database: 38GB torrent (magnet.) If you’re not familiar with BitTorrent, here are detailed instructions, but just make sure to get the Stack version mentioned earlier in this paragraph – it’s important so you can reproduce the demos the same way. Your company may block BitTorrent – many do – but it’s the most efficient way to get a >300GB database out there. There aren’t a lot of file hosts willing to share a file that big, heh. You may need to download it from home.

When you extract it, it’ll expand to a ~304GB SQL Server database that you can attach. There are other smaller versions of the database too, but be aware that you’ll get radically different query plans, and some of the demos won’t work the same way since we’re querying for specific date ranges.
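
Attaching it is a one-statement job. Here’s a minimal sketch – the paths and file names below are placeholders, so adjust them to match wherever you extracted the files and however many data files your copy has:

    -- a minimal attach sketch; paths and file names are placeholders
    CREATE DATABASE StackOverflow ON
        (FILENAME = N'D:\Data\StackOverflow.mdf'),
        (FILENAME = N'D:\Data\StackOverflow_log.ldf')
    FOR ATTACH;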

It ships in SQL 2008 compatible format, but you usually wanna learn on the most recent Cardinality Estimator (CE). To set that, in SSMS, right-click on the database, go into Options, and change Compatibility Level to the most current version you’re on – but not SQL Server 2019, which behaves quite differently – hold off on that compat level until SQL Server 2019 is released. I’m updating the classes with more modules for SQL Server 2019’s behavior, but if you try that now, you’re going to get wildly unpredictable behavior compared to the rest of your classmates.
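
A minimal sketch, assuming SQL Server 2017 (compatibility level 140) and that you attached the database as StackOverflow:

    -- assumes SQL Server 2017; use 130 for 2016, 120 for 2014
    ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 140;

    -- double-check that it took
    SELECT name, compatibility_level
    FROM sys.databases
    WHERE name = N'StackOverflow';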

Setting Up the Stored Procs and Indexes

We use the First Responder Kit scripts for performance analysis, and we use the below procs to help the demos along. Create these in the Stack Overflow database:

Don’t install that in production, bucko.

Next, run the index creation scripts below. It’s going to take a while, depending on how fast your lab’s storage is. After creating these indexes, we’ll take a backup of the database so that you can restore it each time to set back to a known starting point – rather than creating the indexes every time you restart labs, which can take tens of minutes.

Back It Up and Set Up a Restore Job

During class, between each lab, the students restore their database to this starting point. For sheer performance, you want to run the backup across 4 files – SQL Server backups & restores can actually go faster when you stripe them across more files, even when writing to the same storage.
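
Here’s a minimal sketch of that striped backup – the E:\MSSQL\BACKUP path matches the restore example coming up, and the file names are just placeholders:

    -- stripe the backup across 4 files; file names are placeholders
    BACKUP DATABASE StackOverflow
    TO DISK = N'E:\MSSQL\BACKUP\StackOverflow_1.bak',
       DISK = N'E:\MSSQL\BACKUP\StackOverflow_2.bak',
       DISK = N'E:\MSSQL\BACKUP\StackOverflow_3.bak',
       DISK = N'E:\MSSQL\BACKUP\StackOverflow_4.bak'
    WITH COMPRESSION, INIT, STATS = 10;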

Then, set up an Agent job to restore them. Here’s the script I use – just modify the file path locations for your backups. Note that mine also has a step to set Cost Threshold for Parallelism and MAXDOP back to normal defaults for the VM we’re working with in the cloud.

In my example below, I’m doing my restores from E:\MSSQL\BACKUP – you may need to modify your backup path and file names.
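
If you just want a rough starting point for that job step, here’s a minimal sketch – it assumes the four striped files from the backup above, and the sp_configure numbers at the end are placeholders for whatever your VM’s normal defaults are, not recommendations:

    -- kick out any lingering connections before the restore
    IF DB_ID('StackOverflow') IS NOT NULL
        ALTER DATABASE StackOverflow SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- restore from the 4 striped files; adjust paths and file names
    RESTORE DATABASE StackOverflow
    FROM DISK = N'E:\MSSQL\BACKUP\StackOverflow_1.bak',
         DISK = N'E:\MSSQL\BACKUP\StackOverflow_2.bak',
         DISK = N'E:\MSSQL\BACKUP\StackOverflow_3.bak',
         DISK = N'E:\MSSQL\BACKUP\StackOverflow_4.bak'
    WITH REPLACE, STATS = 10;

    -- put server-level settings back to this VM's "normal"; these values are examples only
    EXEC sys.sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sys.sp_configure 'cost threshold for parallelism', 50;
    EXEC sys.sp_configure 'max degree of parallelism', 4;
    RECONFIGURE;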

Test your restore job to get an idea of how long it’ll take – if it takes more than 15 minutes, you probably want to use a faster VM to be able to keep up in class. If the restore takes, say, 20-30 minutes, you’re going to have a tough time keeping up when you’re doing storage-intensive things like creating indexes.

SQLQueryStress for load testing

SQLQueryStress is an easy, free, open source load testing utility. Download the latest version (zip), and extract it to a folder. You don’t have to run a setup or anything – it’s just a single .exe file that you can run whenever you want to do a load test.

Then, download my load test files and extract those to the same folder. Your folder will end up looking like this:

SQLQueryStress folder

And that’s it! See you in class, and hope you have as much fun as Vadim had.
