DBA Training Plan 12: What Query Plans Are In Cache?

In the last episode, while talking about the basics of executing queries, I said that SQL Server caches execution plans in memory, reusing them whenever the same query gets executed again. I’ve also talked about the dynamic management views (DMVs) that let you query what’s in SQL Server’s memory. Let’s put those two things together and find out what queries have been running on the server lately.

Start by running this on any SQL Server – development is fine:
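A plain SELECT against the DMV is all it takes (a minimal sketch – any query against sys.dm_exec_query_stats works here):

    -- One row per cached query plan, with stats accumulated since the plan went into cache
    SELECT *
      FROM sys.dm_exec_query_stats;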

Some of the columns in the result set include:

  • sql_handle – used to join to other DMVs to get the text of the query.
  • plan_handle – used to join to get the execution plan.
  • creation_time – when the query plan went into cache.
  • last_execution_time – the last time the query was run.
  • execution_count – the number of times Picard has said “Make it so.”
  • total_worker_time – total amount of CPU time burned while the query executed – and note that there’s also a last_worker_time, min_worker_time, and max_worker_time. Most of the columns in this DMV have that same pattern.

So if you want to find which queries have read the most data, you could do:
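For example, sorting by logical reads (a hedged sketch – total_logical_reads counts 8KB pages read across all executions):

    SELECT TOP 10 *
      FROM sys.dm_exec_query_stats
     ORDER BY total_logical_reads DESC;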

Pretty nifty, eh? Except now you wanna see what the queries are, and their execution plans. You could try to write your own DMV query from scratch, joining all that stuff together, but…this is the year 2019. (Or at least it is when I write this.) Ain’t nobody got time for that – there’s a much, much easier way to analyze your plan cache.

Enter sp_BlitzCache.

A few episodes back, you downloaded the totally free First Responder Kit. One of the scripts in there is sp_BlitzCache.sql – open that one and run it in the master database to create the sp_BlitzCache stored procedure. Then run it:
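Something like this (a minimal call – by default it returns the top 10 by whatever sort order you pick):

    EXEC sp_BlitzCache @SortOrder = 'duration';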

This shows you the top 10 queries that have taken the longest total time to run. I happened to pick Duration here, but sp_BlitzCache has a lot of useful sort orders (example calls follow this list):

  • @SortOrder = ‘cpu’ helps you find queries burning the most CPU time, often due to implicit conversions, sorts, or functions running row-by-row.
  • @SortOrder = ‘reads’ finds queries that are reading the most data, typically great candidates for index tuning.
  • @SortOrder = ‘memory grant’ finds queries that need large amounts of memory to run (or at least, SQL Server thinks they need a lot – but it can be dramatically wrong.)
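Each of those is just a different parameter value (hedged example calls):

    EXEC sp_BlitzCache @SortOrder = 'cpu';          -- biggest CPU burners
    EXEC sp_BlitzCache @SortOrder = 'reads';        -- biggest readers
    EXEC sp_BlitzCache @SortOrder = 'memory grant'; -- biggest memory grants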

In the top result set, scroll across a little to the right, and look at these columns:

  • Executions – how many times it ran.
  • Total Duration (ms) – the total runtime across all the executions.
  • Avg Duration (ms) – total duration divided by the number of executions.

As they say in the land of wine and cheese, voilà. Keep scrolling around left and right, and you’ll find all kinds of other interesting metrics – many of which you recognize from sys.dm_exec_query_stats, and some that took a whole heck of a lot of joining and calculating in order to display to you, dear reader.

It’s tempting to click on the Query Plan column and dive right in – but hold up. Before you go looking at operators and errors, take a few moments to read the Warnings column. These are a combination of warnings built into the query plan (like implicit conversions) and warnings we figured out on our own (like spills over 500MB or joins to a user-defined function.) The latter set of warnings is especially important because they’re not displayed with a yellow bang on the plan: they’re hidden issues that you need to be on the lookout for as you go spelunking through the plan and the query.

As you read through the warnings, you’ll doubtlessly come across one that doesn’t make intuitive sense to you. That’s where the second result set comes in: it’s like a decoder ring of the warnings you’re seeing.

sp_BlitzCache query warnings

Want to tune the queries?
This is where specialization comes in.

Back in episode 3, when we talked about treating our servers like cattle, I said that sooner or later, you were going to have to pick a specialty. The journey of learning to performance tune queries – for example, removing barriers that cause forced serialization – is typically a different journey than efficiently automating the rollout of Always On clusters in Azure VMs via PowerShell scripting.

Now’s a good time to pause and read my post on the job duties of production DBAs vs development DBAs.

If you want to focus on production DBA work, then your goal for this week is to hand off the sp_BlitzCache output to your developers. This is good, juicy information for them to understand which queries are using the most resources. Many of my clients actually publish the sp_BlitzCache output on an internal web page so the developers can go check it at any time – it turns into a bit of a competition, making fun of whoever’s queries are up on the top of the suckerboard. (I call sp_BlitzCache the suckerboard, not the leaderboard, because it’s the query plans that suck the worst.)

If you want to focus on development DBA work, performance tuning the queries and indexes, then here are your learning resources for this episode.

If you want to do both production and development DBA work, you just need to be aware that you’ve only got so many hours in the week. You can do a kinda-sorta-halfway job at both, but you’re not going to do a phenomenal job of both – at least, not as long as you’ve got real day job duties too, like responding to help desk tickets and sitting in meetings. I’m a great example of that myself – I’ve long since given up production DBA work, and I focus on the performance side instead. They’re both fun, high-paying jobs, don’t get me wrong – I just picked the latter, that’s all. (And there are plenty of other rewarding careers around data, too.)

DBA Training Plan 11: The Basics of Executing a Query

Up til now in the DBA Training Plan, we’ve been taking inventory of our servers, making sure the data’s well-protected, and understanding the basics of how the data’s stored in indexes inside those data files. Now, let’s start looking at performance, and let’s start by zooming really far out to think about how SQL Server runs a query.

When you pass in a query:

  1. SQL Server comes up with an execution plan for it
  2. SQL Server executes that plan

The first one happens in what seems like an instant, but in that instant, a whole lot of really complex decisions are made that will determine your entire experience.

It’s a lot like building a custom house.

When you decide to build a custom home from scratch:

  1. An architect designs a blueprint plan for it
  2. The construction company turns that blueprint plan into reality

The architect asks you a lot of questions about your style, how you like to live, and probably most importantly, how many people you expect to be living in this house. Capacity planning and honesty with your architect are both important here: if you tell them you plan to live by yourself and don’t entertain much, then you’re going to get a very different blueprint and cost than if you say you’ve got 8 kids and throw a lot of parties.

Once the architect hands off plans to the construction company, those plans are pretty well set in stone. You might be able to make small, superficial changes, but you’re not changing it from, say, 1 bedroom to 5 bedrooms. If you were single and had your architect design a 1-bedroom home, but then during construction, you decide to get married to someone who has 6 kids from a prior marriage, it’s not like the builder can move a couple of walls around and make everyone happy. You’re going to have to stop that project, then go back to the architect for a new plan.

There’s a strong division between the two phases: the architect’s work, and the construction company’s work. I might even say there’s a wall between them, but then this metaphor gets really complicated.

SQL Server works the same way.

(And by that, I mean all your construction projects run late and over budget. (Just kidding. (Not really.)))

Earlier, I described a really simple two-step process: SQL Server comes up with an execution plan, then executes it. I was wildly oversimplifying that – but now let’s dig a little deeper:

  1. The architect hears your list of requirements. Unlike our house-based analogy, where the architect can ask followup questions, he just has to sit here and listen to your demands, and hope that you describe your needs in a clear, accurate manner. T-SQL is a declarative language: you’re declaring the shape of the result set you want, but not necessarily describing how you want the exact result set to be built. (You can do that, though.)
  2. The architect thinks about whether he’s heard those requirements before. If someone’s come into his office and asked for something extremely similar, he may pull a blueprint out of his cache and hand that to you so the construction company can immediately start working.
  3. If he hasn’t, he’ll design a new blueprint execution plan. To get a rough idea of just how hard this work can be, scan Microsoft’s Query Processing Architecture Guide – and then realize that it’s an incredibly brief summary! The architect does all this work, and then saves the query’s blueprint execution plan in cache (memory) so that he can reuse it if you happen to run the query again.
  4. The construction company takes the fresh or cached plan and starts work. This is the part you usually focus on, and you usually say things like, “Dang, it’s taking way too long to run this query.” Thing is, it’s not usually the construction company’s fault: they were just given a plan that wasn’t appropriate for the amount of data involved in the query. Maybe it was designed for a tiny amount of data but way too much came back, or maybe it was designed for a huge amount of data and did a ton of unnecessary prep work for a tiny amount of data.

When you’re reviewing the cost & runtime of a project, start by looking at the blueprint to see if it’s appropriate. This is where things usually went wrong.

Start by reviewing the blueprint for a query.

In SQL Server Management Studio, click Query, Include Actual Execution Plan, and run this query:
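Any cheap query will do for this demo – something along these lines, which just lists your databases:

    SELECT name
      FROM sys.databases
     ORDER BY name;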

That query gives you a list of databases on the server – but I don’t really care about the results. I’m more interested in the execution plan. Click on the Execution Plan tab of SSMS, and you get a graphical plan showing the work that SQL Server did. We’ll dig more into interpreting execution plans soon, but for now, right-click on the SELECT icon at the top left, and click Properties:

Execution Plan

A new Properties pane will appear on the right side – look at the top of that, and there’s a ton of juicy details:

Plan Properties

  • Compile CPU = the number of milliseconds of CPU time spent building the query plan
  • Compile Memory = well, I don’t think I’ve ever really used this for much
  • Compile Time = the number of milliseconds overall spent building the plan (can be much higher than CPU if we have to wait for things to come back from storage, like the first time we compile a plan in a database after it’s been restored or started up)
  • NonParallelPlanReason = if the query plan couldn’t go parallel and use multiple cores, you’ll get a hint here
  • Reason for Early Termination of Statement Optimization = the architect may decide to go home and start drinking early

These metrics focus on the architect: the one building the query plan. When you think about it, he accomplishes a spectacular amount of work in a short period of time. It’s amazing that we get a working query plan at all that quickly!

This has interesting implications for you.

Plans are built without awareness of execution frequency. SQL Server doesn’t know whether you’re going to run this query just once, or a million times. It just builds a plan once and caches it.

Plan structure isn’t revisited even when it goes wrong. SQL Server doesn’t go back and say, “Whoa, performance on that one was terrible – I thought I was only going to bring back 1 row, but 1,000,000 rows came back, and I should do things differently.” Starting with SQL Server 2017, Microsoft does revisit some parts of the plan, like how much memory gets granted, but the plan shape stays the same. It’s up to you to figure out when the plan shape is incorrect, and coach SQL Server into designing a better plan.

You don’t get great knobs to control architect time. You can’t tell SQL Server, “Listen, I need you to spend a lot more time designing plans on this server because performance here is really important.” There’s a trace flag you can enable to force SQL Server to spend more time building EVERY query plan, but that’s a bad idea since it also includes things like IntelliSense queries and monitoring applications.

In theory, SQL Server will rewrite your query. You’ll come across blog posts that suggest it doesn’t matter how you write your query, but read those posts carefully: they come with caveats. The classic example is when SQL Server runs out of time during plan generation and decides to ship a plan – even though it knows there might be better plans available if it spent more time doing optimization.

In practice, human-readable queries perform better. If a human being can look at the query and get a pretty good idea of what’s going on quickly, then so can SQL Server. If a human being looks at the query and develops a look of confused terror, then SQL Server is probably not going to build a great execution plan either.

Plans stick around longer than you might expect. In a modern server with 64GB RAM or more, plans can stay in cache for days, weeks, and even months. If it was a good execution plan, then that’s a good thing. If it was a bad execution plan, well, that’s why people start doing index rebuilds in order to “fix” the problem. More on that in the next episode.

With this background in mind, next up, let’s review which query plans are in your server’s cache.

DBA Training Plan 10: Managing Index Fragmentation

In the last episode, we looked at your index designs with the output of sp_BlitzIndex. You might have gotten a little overwhelmed what with all the different warnings (and all the learning resources!)

You might have thought, “Is there an easier way?”

“Is there a way I can just run a job, and have something automatically clean my indexes up?”

And that’s how folks start focusing on fragmentation rather than the design of the indexes themselves.

Those of us who got our start on Windows back in the days when it ran on spinning rusty magnetic frisbees called “hard drives” used to run apps to defragment the data. These apps would shuffle bytes around, getting them all lined up into contiguous rows. The idea was that if your data was all in a row, then your spinning rusty magnetic frisbees would be able to read the data faster. See, the way they worked internally was that they had a head that physically moved around the drive, reading one place at a time, just like a record player.

What’s that?

What’s a record player?

<sigh>

Music used to be stored on vinyl discs. No, not the compact ones – those were plastic. This is before that. No, even before tapes. These vinyl discs had little grooves in them that would cause a needle to vibrate, and …

NO I’M NOT MAKING THAT UP, IT WAS REALLY A THING.

Lemme just stop there. The whole “defragment your drives” advice is really, really old, that’s what I’m saying. It stems from a day and age where technology was completely different than it is today. Your SQL Server’s data files probably don’t live on spinning rusty magnetic frisbees anymore, and even when they do, they’re scattered amongst lots of other data files – even from different servers all sharing the same drives. All these different data files are being accessed at the same time. The whole reason we used to defragment our data is just dramatically less relevant today.

Today, a lot of your data is stored in RAM. You know what that first R stands for? Of course you don’t, kid, just like you don’t remember that cars used to have record players. <sigh> RAM doesn’t stand for Sequential Access Memory, and you don’t need all your data pages lined up in a row in RAM. A table’s pages aren’t all lined up in a row in RAM, nor does that matter. (Does fragmentation matter if you hit disk? Yes, but then you’re screwed anyway.)

To make matters worse, a lot of folks try to “fix” fragmentation by setting fill factor lower than 100%, purposely leaving empty space on each 8KB data page so that they can add more data later. When they do that, they:

  • Make the database larger (which also makes their backups, restores, corruption checking, index rebuilds, etc all take longer)
  • Make their memory smaller (because each 8KB data page will have a percentage of empty space, and that space is cached in RAM as well)
  • Make table scans take longer (because more logical reads are involved)
  • Give a unicorn heartburn

So what’s a DBA supposed to do?

Leave the server-level fill factor at the default 100%. If you can prove that the default is causing you a problem on a specific index, I’m okay with lowering it on specific indexes, but I’m going to make you prove you made things better instead of worse. If you want to do that, read Jonathan Kehayias’ post on how to track page splits with Extended Events, and make sure to read the comments too.
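If you do prove a case on a specific index, the change itself looks like this (hypothetical table and index names; 90 is just an example value):

    ALTER INDEX IX_Users_DisplayName ON dbo.Users
        REBUILD WITH (FILLFACTOR = 90);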

Run Ola Hallengren’s index maintenance scripts weekly. We talked about these free tools when we were covering backups and restores, and they’ve got more tricks up their sleeve: index reorganization, rebuilds, and statistics updates. I’m not a fan of their defaults, though: start with these less-aggressive defaults instead. Avoid running ’em more than once a week, though, unless you can prove that rebuilding your indexes & updating your stats is fixing things (rather than making ’em worse.)
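For reference, a weekly IndexOptimize call with gentler thresholds might look something like this (hedged – verify the parameter names against the version you download, and tune the levels to taste):

    EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,                  -- leave lightly fragmented indexes alone
        @FragmentationMedium = 'INDEX_REORGANIZE',
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
        @FragmentationLevel1 = 50,
        @FragmentationLevel2 = 80,
        @UpdateStatistics = 'ALL',
        @OnlyModifiedStatistics = 'Y';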

And if your mentor told you that you have to run daily index rebuilds in order to fix execution plans, I’ve got good news and bad news. The bad news is that your mentor was wrong – but the good news is that I’ll teach you the real problem over the next couple of episodes.

Wanna learn more? Here are a few more resources for this episode’s topic:

  • Why Defragmenting Your Indexes Isn’t Helping – a 1-hour video where I explain the two kinds of fragmentation, show why one kind does matter, and show in more detail why setting fill factor is such a bad idea
  • What’s So Bad About Shrinking Databases? This common maintenance task actually causes fragmentation, which is especially terrible if you’re doing both this and defragmenting your indexes.
  • Columnstore Online Index Rebuilds – Columnstore indexes need more attention to maintenance. Niko Neugebauer is the go-to for in-depth columnstore tips, and in this post, he analyzes the online rebuild process and its overhead. (You’re definitely gonna want online index rebuilds when you’re dealing with big data warehouses.)

Now that we’ve got the right indexes in place, the next episode will tackle how queries are executed.

DBA Training Plan 9: My 3 Index Guidelines

In our last episode, we used sp_BlitzIndex @Mode = 2 to get to know the contents of our database, sorted our indexes from biggest to smallest, and started asking questions about leftover backup tables that we probably didn’t need to keep around anymore.

Now that you’ve taken out the trash, let’s have a closer look at what’s left. This time, run sp_BlitzIndex without a Mode parameter:
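Just the database name this time, nothing else (swap in your own):

    EXEC sp_BlitzIndex @DatabaseName = 'StackOverflow2013';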

You get back a prioritized list of index design issues like:

  • Aggressive indexes – where a lot of locking is taking place
  • Hoarder – meaning someone’s been creating indexes like they’re free
  • Indexaphobia – where SQL Server is begging for more indexes
  • Self-loathing heaps with forwarded fetches – causing lots of extra reads for no good reason

When you’re dealing with an existing hand-me-down database that’s been in your company for years, your tables are probably facing a mix of extremes. Some tables don’t have any indexes at all, and we’re scanning the table every time we’re looking for just one row. Some tables have dozens of indexes because folks just created every recommended index willy-nilly.

This can be a little overwhelming at first, but before you start digging into the index design and learning what anti-patterns you’re facing, let’s stop for a second to ask:

Who’s supposed to manage the indexes here?

Whether you’ve got a third party application or one developed internally by a team of developers, there’s always a gray area around who’s supposed to be designing the right indexes. Going down the left side of this grid is a list of job duties involved with designing, maintaining, and hosting a database application:

The top group of tasks – designing tables, writing queries – that part is pretty straightforward. Whoever builds the application does this work. After all, if you suddenly removed a column from their table, you’d be in trouble because the application would stop working.

The bottom group of tasks – provisioning the VM, installing SQL Server, keeping it patched – that’s also pretty straightforward. It’s your job, hahaha – you’re in the DBA Training Plan, and that’s your work. It wouldn’t be fair to expect the app developers to do this job.

The middle group is where it gets sketchy.

Either of you could do it: either the app developers, or the DBA team. It just comes down to the support policy for each application. In some apps, like Microsoft SharePoint, the customers are forbidden from screwing around with the indexes. If you have a performance problem, you basically have to throw hardware at it. In other apps, like Microsoft Dynamics, customers are encouraged to do their own index designs to support the way they use the application. To learn more about this choice, read my post on who’s allowed to add indexes to ISV apps.

Your job as a DBA is to:

  1. Share the sp_BlitzIndex results with the developers
  2. Say, “It looks like there are a lot of improvements we could make together. Who should drive this effort, me or you?”
  3. After getting the answer, communicate it clearly to the end users, too – as in, “We’ve found a lot of index improvements we could make, and the developers will be driving that effort.” (Or you will! Either way.)

Sometimes, it’ll be somebody else’s problem – but sooner or later, the job duties are gonna fall on your lap. Let’s get started with a few simple guidelines to help get your tables performing well.

Brent’s 3 Index Guidelines

You were assigned my free How to Think Like the Engine video class in a prior homework episode, so I’m going to assume you already know the differences between clustered indexes, nonclustered indexes, and heaps. With that in mind, here are a few index guidelines to start with as you’re reviewing the output of sp_BlitzIndex:

  1. Start every table with a clustered index. There will be times when you don’t need one, but when in doubt, start with one. Only remove it when you can prove that it’s a performance problem.
  2. Aim for around 5 indexes or less per table. The more indexes you add, the more copies of the table you have. Granted, you’re picking which columns are in each copy – but they’re still by no means lightweight. I’ve seen databases where every insert required hundreds of writes to disk because there were hundreds of indexes per table.
  3. Aim for around 5 columns or less per index. The more columns you add, the larger the index becomes. All of those columns have to be updated synchronously whenever the data changes, so in a perfect world, you avoid indexing “hot” fields that change all the time. (Yes, this includes “included” columns – they take up space in the indexes too, and they have to be updated every time they change. They ain’t free.)

These are starting guidelines,
not commandments carved in stone.

I get it – because I stand up on podiums and present, you think I’m Moses standing up on the mount, holding The Three Index Commandments. I can see why you’re confused – I look a lot like a Biblical character alright.

These are just starting points, meant to help you take the first few steps on your journey. You’re going to find plenty of scenarios along the way when you need to take a different approach. However, until you understand why those guidelines are in place, you want to abide by ’em.

Your next steps are to learn why I wrote those guidelines. These learning resources I’m about to point you to are huge – you could spend the next several months just learning about indexing. It’s up to you how deep you want to go:

If you like learning via books:

  • SQL Server Query Performance Tuning – chapter 8 digs into indexing. It’s several dozen pages, and teaches you most of what you need to know to get started. Plus, if you’re going to invest in a book, I want you to be able to keep gaining from it over time, so this book will pay off in more areas of our DBA Training Plan when I refer to other chapters later.
  • Expert Performance Indexing – a whole book dedicated just to SQL Server indexes.

If you like learning via blog posts:

If you prefer videos & online classes:

Whoa – this is hard work, right? Figuring out the right indexes, testing to make sure they’re getting used – can’t we just defragment our indexes and call it a day? Nope, and I’ll explain why in the next episode.

DBA Training Plan 8: What’s Inside Your Own Databases

In the last episode of the DBA Training Plan, we talked about what’s inside the MDF file: a series of pages that store a database’s tables. Now, it’s time to check out the contents of YOUR database to understand what’s using the most space, what’s getting used the most often, and what you might be able to clean out to make your job easier.

Let’s start by querying the DMVs.

SQL Server has tons of dynamic management objects, also known as dynamic management views (DMVs) or dynamic management functions (DMFs) depending on how they’re called. Generally, if you hear someone talking about DMVs, they’re talking about these system objects that LOOK like tables, but are really objects you can query to see what’s going on inside your database.

Try running this inside one of your databases – it doesn’t really matter which one, and it’ll go blazing fast without blocking anyone else:
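For example, a bare SELECT against sys.indexes:

    SELECT *
      FROM sys.indexes;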

sys.indexes returns a bunch of metadata about your indexes: what kinds of indexes you have, what kinds of locks they allow, whether they’re filtered, and much more. But, uh, reading this stuff is painful – for example, you don’t even get the name of the tables involved.

Think of the DMVs as building blocks: you have to join lots of them together before you’ve got anything useful. In the dark ages of the 1990s and 2000s, database administrators spent a lot of time doing just that – building their own houses out of building blocks. Their queries had all kinds of bugs, didn’t account for edge case scenarios, and weren’t kept up to date with the latest additions in each new version of SQL Server.

Today, there’s an easier way: sp_BlitzIndex.

sp_BlitzIndex is a free, robust, open source script originally started by Kendra Little. Since it’s open source and lives in a public Github repo (along with a lot of other cool scripts), it’s gotten a ton of updates over the years. Today, it’s a powerful, easy way to understand what’s being stored in your database.

Download it in our free First Responder Kit, and when you open the zip file, you’ll see a list of SQL scripts. For now, we’ll just start by opening sp_BlitzIndex.sql in SSMS, and execute that in your Master database on a development server. The script only creates sp_BlitzIndex as a stored procedure – that’s all. It’ll execute instantly.

There are lots of parameters for it, but let’s start with:
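Something like this (Mode 2 is the inventory mode):

    EXEC sp_BlitzIndex @DatabaseName = 'StackOverflow2013', @Mode = 2;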

Change StackOverflow2013 to the name of a database you care about. What the Mode 2 switch does is give you an inventory of all the tables & indexes:

sp_BlitzIndex

In the Definition column, CX means Clustered Index, and PK means Primary Key. In my screenshot, all my primary keys are clustered – but that may not be the case in your own database.

To get to know your database, copy/paste the results into Excel and turn them into a table. Sort it by a few different columns and ask yourself a few questions:

  • Sort by Rows descending, and make a note of which tables have the most rows.
  • Sort by Reserved MB descending, and think about how much space is being used up by these objects. Note that some of your tables might have really wide indexes with lots of columns – you might be duplicating the size of the table many times over. (Now’s also a good time to take note of any backup tables you might have made long ago, and forgotten to delete along the way.)
  • Sort by Create Date descending and look at recently created objects.
  • Finally, scroll way over to the right and note the Drop & Create SQL columns. Since the DMVs have so much metadata, we can basically build the index creation scripts from scratch – making it way easier to build undo scripts as we make changes.

Since you’re running this in a development server, a lot of the columns will be empty or zeroes. Total Reads is a great example: most of your indexes probably aren’t getting any action on the development server because you’re not running the app’s full workload over there. Once you’re comfortable running sp_BlitzIndex on a development server, and you understand that it isn’t doing damage or blocking other queries, then you can take it up a level by running it on a production server. In production, you’ll get much more valuable diagnostic data.

You’re going to want to drop indexes. Hold up.

For now, just focus on getting to know what you’ve got inside these databases – after all, they’ve probably been the same way for years, and another few days isn’t going to kill you. Before you make changes to the indexes, I’ve got some more stuff to teach you about the risks and rewards.

Instead, your homework today is to run:
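A sketch of that homework call:

    EXEC sp_BlitzIndex @GetAllDatabases = 1, @Mode = 2;
    -- add @BringThePain = 1 if you have more than 50 databases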

That pulls the data for all databases at once. It’ll be slow, since it’s basically looping through each database one at a time. (We don’t have a way to quickly gather data across all databases at once.) If you have over 50 databases, we also make you pass in @BringThePain = 1 just so you understand that it’s going to take a while, hahaha.

Copy/paste the results into Excel, and don’t make changes to indexes – but DO make changes to TABLES. Before we start doing index tuning, I want you to clear out the garbage. Look at your largest tables – say, the biggest 25% by rows and by size – and look at their Reads columns. Are these tables backups that were made a long time ago in a galaxy far far away? If so:

  • Ask the team if they’re still in use
  • If not, rename the tables – add _ToBeDeleted_20190830 at the end of the table name (or whatever date you want – see the example after this list) – that way, if somebody comes running in screaming that their table is gone or their query stopped working, you can just rename it back to bring it back to life
  • Set yourself a reminder for that YYYY/MM/DD date to go back in and drop those tables (making sure you’ve got a good database backup first, and keep that backup around)
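The rename itself is a one-liner with sp_rename (hypothetical table name):

    EXEC sp_rename 'dbo.Users_Backup', 'Users_Backup_ToBeDeleted_20190830';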

Why clean out the garbage first? Because these tables are slowing down your backups, restores, corruption checks, statistics updates, and much more. I’ve seen cases where just taking out the garbage cut our nightly maintenance job times in half!

In our next episode, we’ll start digging into the health & design of those indexes.

“But NOLOCK Is Okay When The Data Isn’t Changing, Right?”

I’ve already covered how NOLOCK gives you random results when you’re querying data that’s changing, and that’s a really powerful demo to show folks who think NOLOCK is safe to use in production. However, I’ve gotten a question from several users:

But I’m querying data that isn’t changing – sure, OTHER rows in the table are changing, but not the rows I’m querying. Isn’t NOLOCK okay if I’m querying stable rows?

Nope, not even close. Get any Stack Overflow database, and start with the query from the last post – except this time we’ll modify it so that we’re updating some of the users, and querying others.

One of the more common DisplayNames is Alex. In one window, I’ll get a count of the number of Alexes:
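Roughly like this (a sketch against the Stack Overflow Users table – the GO count just reruns the batch so you can watch the number move):

    SELECT COUNT(*)
      FROM dbo.Users WITH (NOLOCK)
     WHERE DisplayName = N'Alex';
    GO 100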

And in the other window, I’m going to set everyone’s location and website – EXCEPT the Alexes, who I’m not going to touch:
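Something along these lines (the exact Location and WebsiteUrl values don’t matter – the point is that the Alex rows are left untouched):

    UPDATE dbo.Users
       SET Location   = N'Iceland',
           WebsiteUrl = N'https://example.com/'
     WHERE DisplayName <> N'Alex';
    GO 100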

Watch disaster unfurl:

The number of users named Alex appears to keep changing – EVEN THOUGH I’M NOT UPDATING THOSE ROWS! The reason: the location of Alex’s rows may be moved around by things that are happening to other rows.

I can’t emphasize this enough: with NOLOCK, you can:

  • See rows twice
  • Skip rows altogether
  • See data that was never committed
  • And have your query fail with an error

If you’re cool with that, great – NOLOCK is for you. If not, it’s time to consider other ways to get the performance you want while still getting the data accuracy that your users require.

What The Consultant Toolkit Does

It’s one thing for me to tell you how much easier The Consultant Toolkit makes my job and how clients love how much I can tell them about their server during the initial sales call – but it’s another thing to hear it from a happy customer:

Thought I would share a story, with my thanks for the Consultant Toolkit. A timeline:

  • Friday – random guy tweets he needs some DBA help. I get tagged, along with a couple of other consultants
  • Saturday – dude emails me. I send back an offer to “quick review” a server
  • Sunday – get results from the Consultant Toolkit
  • Monday afternoon call – impress the crap out of the prospect
  • Tuesday – negotiations
  • Thursday – contract signed to do dozens of HealthChecks and provide 7 hours of remediation effort per server
  • Today – I’m actually working on the jump box they made just for me.

One week from a tweet to a $35K contract. He had talked to several other firms before the tweet, but he sold me to management with “[Redacted] has already given us actionable items on the main server at a great price.”

And that’s what The Consultant Toolkit does: help you close deals faster and provide more value than your competitors. Over 100 companies are using it today – so you’re probably already competing with someone else using it. If you’re not closing as many deals as you used to…hmm…

DBA Training Plan 7: How SQL Server Stores Stuff in MDF Files

So far in the DBA Training Plan, we’ve been working hard to make sure the contents of our MDF files are backed up, corruption-free, and secured. Now, uh…what’s actually inside them?

Inside each of your database data files (MDFs, although technically you can use any extension you want, even PDF or MP3), SQL Server stores your data in 8KB pages. That’s kilobytes – not megabytes, not gigabytes, but just mere kilobytes.

Say that we create a table:
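Something like this, keeping it deliberately tiny:

    CREATE TABLE dbo.Readers
    (
        ReaderID   INT,
        ReaderName VARCHAR(200)
    );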

First off, yes, I understand, I shouldn’t do ReaderName in one field – I should have FirstName, LastName, MiddleName, Suffix, Prefix, yadda yadda yadda, but I’m trying to keep this short. Now see what you did? It’s long again. Doggone it, it’s hard teaching this stuff quickly.

Anyhoo, in this table, each record takes up just a little bit of space. ReaderID is an INTeger, which takes 4 bytes. It’s the same 4 bytes whether that number is 1 or 1,000,000. ReaderName is a VARCHAR(200), which means we can store up to 200 characters in here, and each character takes a byte. If we insert ‘BRENT OZAR’, that’s 10 characters (and boy, am I a character), so we need 10 bytes to store it.

If all of our readers average about 10 characters in their name, that means we could fit about 500-600 records per 8KB database page. (In reality, there’s some overhead because SQL Server also needs to use some parts of the page to store metadata, and we’ll talk about that in another post.)

This blog isn’t very popular, so we can keep all of our readers on a single 8KB page. As we insert, update, and delete readers, SQL Server fetches that 8KB page off disk, brings it into memory, makes the necessary changes, and then writes that data page back to disk. The 8KB page itself is the smallest unit that SQL Server will cache – it doesn’t cache individual rows/records – and each page belongs exclusively to just one object.

A Word About Objects

You’ll notice that I avoid using the word “table”. Tables are cool, but as we start to dig into what SQL Server’s doing under the hood, we want to start thinking about these three object types:

Heap – a table with no clustered index. In my dbo.Readers table, I didn’t specify in what order SQL Server should store my data, so it’s just going to slap the data down on my 8KB page in any old order.

Clustered Index – what we normally think of as a table. If I’d have created my table like this:
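For example, with a clustered primary key on ReaderID (one way to write it):

    CREATE TABLE dbo.Readers
    (
        ReaderID   INT PRIMARY KEY CLUSTERED,
        ReaderName VARCHAR(200)
    );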

Then SQL Server would store my data in order of ReaderID. That way, when I search for ReaderID #42, it can look at a little directory (B-tree) that tells it where ReaderID #42 is, and it can jump there without scanning through all three of my readers. The clustered index is sorted in the order of the ReaderID field, but it’s actually the full copy of our table, including all of our fields – in this case, just ReaderName.

Nonclustered index – If I tell SQL Server to:
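For example, by adding a nonclustered index on ReaderName (the index name is just illustrative):

    CREATE INDEX IX_ReaderName ON dbo.Readers (ReaderName);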

Then SQL Server will create a second copy of my table sorted by ReaderName. This copy of my table will only include the fields specified in my index (ReaderName), plus whatever fields it needs to get back to the clustered index or heap.

All three of these objects – heaps, clustered indexes, and nonclustered indexes – will be stored on separate sets of pages. We won’t have the clustered index and nonclustered index for the same table on the same page – they’re split. That’s why when we’re doing space analysis, we have to think in terms of indexes, not tables.

Pages & Rows on Disk

The sys.dm_db_index_physical_stats Dynamic Management Function (DMF) returns the number of rows and pages stored in each database object. It takes parameters for database ID and object ID, or you can pass in NULLs to get information across all of your database objects. Scroll down to the examples link in that Books Online page, and you’ll get queries to examine a single table – I’d strongly recommend starting with a small table, because some of the parameters for this DMF will cause SQL Server to actually look at every page in the object. That means if all of the pages for that object aren’t cached in memory, SQL Server will go pull those pages from disk, and that can slow down your running SQL Server.
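A hedged example of the kind of query you end up with (using the little demo table from earlier – SAMPLED and DETAILED are the modes that fill in the record-size columns):

    SELECT index_id, index_type_desc, page_count, record_count,
           avg_record_size_in_bytes, max_record_size_in_bytes
      FROM sys.dm_db_index_physical_stats
           (DB_ID(), OBJECT_ID(N'dbo.Readers'), NULL, NULL, 'SAMPLED');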

This DMF also includes average record size and max record size. This makes for fun spelunking: how big is each record, really? Just because we make everything a VARCHAR(8000) doesn’t mean we’re actually storing 8,000 characters in each field. Now, don’t go changing your database structure just yet – you can easily break applications when datatypes change. Let’s leave that for later.

You can get similar metadata much faster by using sp_spaceused, but it doesn’t get the cool info about average record size, and I wanted to encourage you to go spelunking here.
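If you just want the quick version, sp_spaceused takes the object name (same demo table):

    EXEC sp_spaceused N'dbo.Readers';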

Your Homework: Watch a Free One-Hour Class

In my free class How to Think Like the Engine, I explain pages, indexes, joins, SARGability, and more. I use real database pages from the StackOverflow.com database for demos, and you actually get PDFs to print out and follow along as we go.

Print out this 5-page PDF to follow along – trust me, you’ll need these pages – and then here’s the first module of the class:

When you’re ready, move on to the next module.

As we start to move up the Hierarchy of Needs from capacity into performance, you’ll start to see how these foundational items are so important.

So far, we’ve been talking about the general structure of databases. In the next episode, we’ll get specific about what’s in your database.

DBA Training Plan 6: How to Monitor Drive Space

Don’t.

It’s not your job.

No, seriously, hang on a second. I’m not saying that you should ever say things like “It’s not my job,” but I do want you to understand when you should avoid reinventing the wheel. Your Windows administrators should be monitoring every Windows server you have, making sure it doesn’t run out of drive space. This isn’t some wild, unique need that only SQL Server has – it’s everywhere in your data center.

(You might actually even BE a Windows admin, just here because you need to learn about working with SQL Server. It might actually BE your job to monitor this stuff. That’s cool – that’s how I got started too.)

In our journey from the base of my Hierarchy of Database Needs to the top, we do indeed need to talk about capacity – but I don’t want to monitor drive space from inside SQL Server, and I don’t want to buy SQL-specific tools in order to pull it off. Yes, you can technically use commands like xp_fixeddrives to list all of the SQL Server’s drive letters and how much free space they have, but that doesn’t work everywhere. For example, you might have mount points or databases on UNC paths, neither of which will show up in xp_fixeddrives. So leave the drive space monitoring to the pros.

Why You Shouldn’t Build a Monitoring Tool

If you want to be a professional developer, you should build things. You should learn what exceptions could happen, how to trap them, and how to fail elegantly. It’s hard work, but if you get good at it – really good – you can build amazing things and make a killer living for yourself.

But if you want to be a professional DBA, you should leave the developing to the developers.

I’m not saying you shouldn’t learn the dynamic management views (DMVs), how to dig through them to find the evidence you need, and how to turn that data into actions. Hopefully, I’m going to teach you how to do a lot of that over the course of the next six months. Take another look at the Hierarchy of Database Needs again, and think for a second about all the things we’re going to be learning over the coming posts. Just in the last five, I’ve had you build an inventory of your servers, automate your database restores, start locking down security, and enumerate your database risks. The next few are chock full of things that will help you become a hero in the eyes of your users and your developers.

Building a crappy monitoring tool in your spare time will not give you that same level of respect. (And yes, if you’ve only got your spare time at work, your monitoring tool is going to be crappy. If you’ve got so much time that you can build a great monitoring tool, you need to focus on providing results to the business fast, because you’re about to be laid off as pure overhead.)

How to Pick a Monitoring Tool

There are basically two kinds of real time monitoring tools out there for SQL Server.

Up/down Alerting – these tools make sure the SQL Server service is running, and that it has all the basic needs taken care of. If the server runs out of drive space, if the service stops, if the event log starts filling up with errors, or zombies attack, these tools will let you know. The most popular software in this category is Microsoft System Center and Red Gate SQL Monitor.

Performance Diagnostics – these tools try to derive insight from SQL Server’s dynamic management views (DMVs) to help performance tuners wring the most throughput out of their hardware. The most popular tools here are Idera SQL Diagnostic Manager, Quest Spotlight, SentryOne SQL Sentry, and to some extent, the tools in the above category too.

If I was you, I’d start by asking the Windows team if they’ve got any software that handles the up/down alerting for services, drive capacity monitoring, etc. If so, get them to start monitoring your servers. I’m being selfish here – my real reason is that I want to save my monitoring budget for tools in the Performance Diagnostics category. Surprisingly, all of these tools are around the same price – around $1,000-$2,000 USD per monitored instance.

Then, no matter which category of tool you’re buying, make a list of the last 4-5 things that have caused your phone to ring after hours. Here’s some of my personal favorites:

  • Deadlocks
  • Out-of-control query running for hours
  • Long-running job, like a backup going too slow
  • Queries that desperately need an index

Figure out how to recreate those same symptoms in your development environment, and then get a free trial of a couple of the tools I mentioned. (All of them provide free 10-14 day trials.) Reproduce the problem, and watch how the monitoring software reacts. Does it lead you to the root cause quickly, or does it just have a bunch of flashing red lights on the console? The best ones will save you time by getting you right to the solution.

After you’ve done your evaluation and picked a favorite, get a quote from them – and get a quote from the other vendors as well. Share the quotes with the competing salespeople. You’d be surprised how willing they are to negotiate, and you might be able to use the cheapest quote to get the best tool at a price that works for you.

Your Homework: Start Tracking Your Pain Points

I don’t expect you to magically get budget approval for a tool this week, but I want you to start making a list of day to day problems as they strike. If you waste four hours troubleshooting a deadlock problem, make a note of the date, the time required, and a few words about the root cause. In a month or two, you’ll be able to present this list to your boss as proof of why the right monitoring tool could save you money.

If you’re still convinced you should build your own tool, go check out the open source project Opserver. It’s a free performance monitoring tool from the people at Stack Overflow. They use SQL Server (and a lot of other cool tools), and theirs is the best open source performance dashboard I’ve ever seen.

Whew. So far, we’ve been focused a lot on protecting these valuable MDF files. Next up, what’s in ’em?

Podcast: Data Driven Interview w/Me

I had the pleasure yesterday of spending an hour chatting with Andy Leonard and Frank La Vigne for the Data Driven podcast. Some of the questions they asked me include:

Data Driven Podcast

  • How did you find your way into data?
  • What’s your favorite part of your current gig?
  • Complete this sentence: when I’m not working, I enjoy ___
  • Complete this sentence: I think the coolest thing in technology today is ___
  • Complete this sentence: I look forward to the day when I can use technology to ___
  • Share something different about yourself.
  • What book would you recommend today? (And we covered a few of my favorites)

You can listen to the podcast free here. Enjoy!

sp_WhoIsActive Is Now Open Source.

Adam Machanic, Japanese Snack Connoisseur

If you’re still using sp_who, sp_who2, and Activity Monitor, listen up: there’s a great free replacement, and it’s now open source, too!

sp_WhoIsActive is a much more powerful way to analyze what’s happening right now on your server. Here’s what you need to know:
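At its simplest, you create the proc and then call it with no parameters to see what’s running right now (it has plenty of optional parameters beyond this):

    EXEC sp_WhoIsActive;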

With this, we’ll be deprecating sp_BlitzWho. The only reason we initially wrote it was that we needed something purely open source that we could distribute without any copyright concerns. Now, with Adam licensing sp_WhoIsActive under the GPL license, there’s absolutely no reason that any of you shouldn’t be using sp_WhoIsActive as part of your database troubleshooting routine. You should high-five Adam the first chance you get. Or buy him liquor. Or both.

DBA Training Plan 5: Knowing Who Has Access

Who can get you fired? Run our free sp_Blitz stored procedure on one of your production servers and pay particular attention to the Security section of the results. It lists the logins who have been granted the sysadmin or security admin roles.

Don’t think of them as logins.

Think of them as people who can get you fired.

These people can drop databases, drop tables, change stored procedures, edit data, or even change SQL Server configuration settings like max server memory or maxdop. You’re probably not getting alerted when any of these things change – we just can’t afford to monitor every single thing in SQL Server and send alerts on it. At some point, we have to be able to trust certain logins, and that’s where the sysadmin and security admin roles come in.

Except when we first get started learning database administration, it’s usually because we’re the only DBA in the shop, and the server is a mess. The front door is unlocked, the key is under the floor mat, and everybody knows we’ve got a big screen TV in the living room.

How to Get Started Locking Things Down

Before you start removing people’s SA rights, be aware that there can be political backlash. In one shop, the head developer’s SA rights were removed, and he stormed into the DBA’s office the next morning screaming. Turns out one of his apps automatically created a processing database every night, did a bunch of work in it, and then dropped the database. Nobody knew because it was only around for 30-45 minutes. The problem could have been avoided by communicating the security changes ahead of time, and that’s where we need to start.

Take the sp_Blitz output to your manager – just the security portions – and say something like this:

“Here’s the list of people who can do anything on the production server – delete data, drop databases, or change performance settings. If they do, here’s the list of applications that will be affected, including accounting and payroll. I don’t want to take away all of their permissions – I just want to start by giving them full permissions over their database, but not in any other databases, and not at the server level. Can I talk to them about doing that?”

Note that we’re only going to TALK to them, not actually do it, because we need to communicate with them first. Then, go to the end users or developers involved and say:

“We’re starting to lock down the production server, but I want to make sure you have all the permissions you need. I’m going to make you a complete database owner inside your database so you can do anything you want in there, but I’m going to take away your rights to the other databases (like accounting and payroll), and I’m going to remove your permissions to change server-level settings like how much memory the server can use. I’m planning on doing it next weekend, and I want you to have my email and phone number so that if anything breaks on that date, you can call me and I can audit what’s happening on the server to see if it’s related to the permissions change.”

When You Get Resistance

If you get any pushback from developers or users, go to the project managers or business people who have a large stake in the database. For example, if the accounting database is on the server, go to the CFO and say:

“Here’s the list of people who can take down the accounting system. They have the permissions to drop the database at any given time, and there’s nothing I can do to stop it. I’d like to get that changed – can I schedule a short meeting with you and the development manager to get everyone on the same page?”

You want to turn it into a business problem, not a technology problem, and the CFO will very much be on your side. She can’t afford to have her entire department go down just because some developer didn’t put a WHERE clause on a T-SQL statement.

If you want to have a little fun with it, read my post, How to Talk People Out of the SA Account, and the part 2 followup with a vicious sense of humor.

I Know, This Week Isn’t Fun

Database administration isn’t all candy and roses. Sometimes it’s boring politics and paperwork, and this is one of those weeks.

In the first week, we built a spreadsheet inventory of our servers, and now it’s time to fill in a few more details. Since we’re analyzing security, we need to know which applications live on each server, and who’s in charge of each of those applications. You don’t have to fill in the specifics of who has read or write permissions in each database, but we want to at least know the business purpose and the business contact.

The business contact is the one who really helps us get the database locked down because their job is on the line if this data is lost or unavailable. (Come to think of it, you’re someone who can get THEM fired!) In the coming weeks, you’ll be working more with them on reliability and performance, too, so now is a good time to start fleshing out that part of the spreadsheet.

Next week, we’ll move on to something you DON’T have to do: checking for free drive space. No, seriously!

DBA Training Plan 4: Checking for Corruption

You would think that when SQL Server writes your data to disk, it’s totally safe.

Forever.

You would be wrong. It’s almost like there are gremlins out to get you, purposely trying to trash your data. In the really old days, it was problems with magnetic hard drives. Later, it was buggy shared storage arrays. Today, it’s the cloud, where your storage is 99.971H!7 LOST CARRIER

That was an old joke. You probably won’t get that.

First, get alerted when SQL Server finds corruption.

By default, if SQL Server reads a corrupt page from disk, it logs a message, and then keeps right on going. It doesn’t email you, doesn’t show up with a red icon in SSMS, or tell you anything about the impending doom. Let’s get that fixed.

Here’s what you need to do:

  1. Configure Database Mail so it can send you alerts when all hell breaks loose
  2. Set up an operator so SQL Server knows who to email – ideally, this is a distribution list
  3. Set up Agent alerts so SQL Server emails you whenever it detects corruption (a sketch follows this list)
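For step 3, here’s a rough sketch of one such alert – the operator name is hypothetical, and you’d repeat this for errors 824 and 825 plus the high-severity levels:

    EXEC msdb.dbo.sp_add_alert
        @name = N'Error 823 - Hard I/O Error',
        @message_id = 823,
        @delay_between_responses = 60,
        @include_event_description_in = 1;   -- include the error text in the email

    EXEC msdb.dbo.sp_add_notification
        @alert_name = N'Error 823 - Hard I/O Error',
        @operator_name = N'The DBA Team',
        @notification_method = 1;            -- 1 = email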

Second, get SQL Server to proactively check for corruption.

By default, when SQL Server backs up your databases, it happily backs up corrupt data without throwing an error message, either. We need it to start checking the checksums on each page to make sure they’re actually valid. Wherever you configured your backups – whether it’s maintenance plans, Ola’s scripts, or a third party tool – look for the option to check the checksums on backup. In most cases, it won’t take a noticeable amount more time, and besides, you need the peace of mind anyway.
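In raw T-SQL, that option looks like this (hypothetical database name and path – in maintenance plans and Ola’s scripts it’s a checkbox or parameter instead):

    BACKUP DATABASE StackOverflow2013
        TO DISK = N'D:\Backups\StackOverflow2013.bak'
        WITH CHECKSUM;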

Then, schedule regular jobs to run DBCC CHECKDB to proactively read the contents of your databases and look for corruption. We’ve got videos on how to do CHECKDB when you don’t have a DBA, when you do, and when you have a Very Large Database (VLDB).
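The job step itself can be as simple as this (swap in your own database; the options just cut down on noise):

    DBCC CHECKDB (N'StackOverflow2013') WITH NO_INFOMSGS, ALL_ERRORMSGS;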

Some folks are hesitant to check for corruption in production because they’ve run into timeouts during CHECKDB. That’s typically caused by bad storage or storage networking performance, and it’s a good sign that it’s time to dig deeper into the storage.

Third, check yo’self before you wreck yo’self.

Download a corrupt database from Steve Stedman’s Corruption Challenge. It’s a ten-week series where he gives you a different corrupt database each week, and it’s up to you to figure out how to fix the corruption with as little data loss as possible. Then, you can compare your answers to the winners and see how they solved it.

I’m not saying you have to actually fix the corruption – just download the corrupt database, and then attach it to your production server. Yes, your production server. I want you to find out if your corruption jobs really report errors, if your alerts actually work, and if your backup jobs are checking with checksum like they’re supposed to.

I would much rather have you learn with a known broken database than learn the hard way when corruption strikes your servers.

Wanna learn more about corruption?

Work along with me in this interactive blog post series:

  1. Let’s Corrupt a Database Together, Part 1: Clustered Indexes
  2. Part 2: Corrupting Nonclustered Indexes
  3. Part 3: Detecting Corruption

And sooner or later, you’re going to hit corruption. Here’s how to react.

In our next episode, we’ll find out who can get you fired.

DBA Training Plan 3: Treat Your Servers Like Cattle, Not Like Pets

In our last post in this series, I talked about why you should be using Ola Hallengren’s maintenance script for backups and sp_DatabaseRestore for faster restores. These two tools have implications for how you build, back up, recover, and fail over your SQL Server.

There’s a saying in the cloud: treat your servers like cattle, not like pets.

What this means for you, in increasing order of difficulty through your upcoming career:

1. Server setup isn’t done until you’ve tested and timed a point-in-time rebuild & restore. When you’re done setting up the full and log backup jobs on the new server, but before you’ve gone live, restore your real production databases. Run the full and log backup jobs, producing a few sets of backup files in their soon-to-be-normal location.

Now here comes the fun part: shut the server down, and start the clock. Your mission is to bring all of the databases online, with their data restored up to a specific point in time – say, 5 minutes ago. Do it, and time how long it takes you. No, you’re not allowed to turn that original server back on – he’s broken. (Say, for example, that he’s having problems with a Windows patch, and the business is tired of waiting for the sysadmins to figure it out – they need you to start Plan B now.)

You’ll learn a lot as you step through this. It feels overwhelming at first, because you’ve probably been building your servers manually, stepping through the SQL Server install GUI manually, setting up things like Cost Threshold for Parallelism, your favorite trace flags, and SQL Agent jobs. That was fine when you weren’t a Database Administrator – but now that you’re moving towards becoming a full time DBA, you need to think about a better long term plan. That means…

2. Servers should be built from well-documented checklists. As you go through phase 1, you’ll realize that you really need to be writing this stuff down as you go. Documentation is a gift to your future self – not just in the sense that it makes your job easier, but it lets you pass work off to others when you need help. During an emergency, when you’ve lost 19 SQL Servers at once (hoo boy, August 2007), you’re able to hand different checklists off to different coworkers and say, “You, follow these checklists and tell me when you’re done.”

This affects your daily database management, too: I’ve seen DBAs apply a trace flag or sp_configure setting during an emergency in order to fix something, but then not document what they just did. When that SQL Server goes down hard, and they build a new one, they simply don’t remember every setting they made to the old server along the way. When the new server comes up online, and has the same performance issues they had a long time ago that required the unique setting, they have to go through the same old troubleshooting process. Bottom line: don’t make undocumented changes to servers – leave a bread crumb trail behind in your server’s documentation or build checklist instead.
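
One low-effort way to leave that bread crumb trail is to snapshot the instance-level settings into your documentation on a schedule – a rough sketch, where the list of settings worth calling out is just an example:

  -- Instance-level settings for the build checklist
  SELECT name, value_in_use
  FROM sys.configurations
  WHERE name IN ('cost threshold for parallelism',
                 'max degree of parallelism',
                 'max server memory (MB)',
                 'optimize for ad hoc workloads');

  -- Globally enabled trace flags
  DBCC TRACESTATUS(-1);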

I’ve heard folks say, “It’s no big deal: I can just copy the settings from the old server to the new one.” Not when the main server goes down hard, you can’t – nor when you’re dealing with a disaster recovery situation because your primary data center or availability zone is down.

3. Long term, servers should be built and user databases restored with automation. As you start to manage dozens of database servers, checklists don’t scale. You need a way to turn manual steps into automated actions so that you’re not sitting around waiting for the next progress bar to move to the end. However, this is where my advice starts to get a little fuzzy: the SQL Server product has sprawled out to a huge coverage area over the last decade (SSAS, SSIS, SSRS, clustering, Availability Groups, Linux, and now Kubernetes) and there isn’t a single desired-state installation & configuration tool that covers the whole surface area.

If you want to get to automated builds & restores, the closest thing right now is DBAtools & PowerShell, but just be aware that while we call this “automation,” it’s more about automation tooling than just something you download and click Go. Think of it as a box of tools, and you still have to build out an automation solution that works well for you. Thankfully, the DBAtools community is welcoming & hard-working, and they’re constantly making the tools better.

4. Even longer term, if it’s automated, it’s testable. For decades, serious database administrators have automated their own fire drill testing. They’ve built standalone SQL Servers that restore their production backups, run CHECKDB against them, and then send reports of success or failure to the DBA. This gives you a canary-in-the-coal-mine approach to detecting backup problems early.
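
As one illustration of that pattern – not the only way to do it – a test server with the First Responder Kit and Ola’s scripts installed can restore last night’s backups and check them in a single Agent job step. The share path and database name here are made up:

  EXEC dbo.sp_DatabaseRestore
      @Database = N'YourDatabase',
      @BackupPathFull = N'\\backupshare\SQLPROD1\YourDatabase\FULL\',
      @BackupPathLog  = N'\\backupshare\SQLPROD1\YourDatabase\LOG\',
      @RunCheckDB = 1,    -- run an integrity check after the restore finishes
      @RunRecovery = 1;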

Your Homework

Database Reliability Engineering – good buy

This post is a little shock-and-awe.

But it’s also a fork in the road: do you want to specialize in being a production database administrator (who makes sure the SQL Server is online and the database is accessible), or do you want to specialize in database development & performance tuning, making sure the queries run quickly?

These are really two different careers: people who tell you they do both are lying, because they suck at one of them (or maybe even both of ’em.)

If you want to specialize in production database administration, your future learning path focuses on things like PowerShell, scripting builds, monitoring for server configuration changes, applying patches in a controlled and recoverable manner, and engineering reliability into your infrastructure. If you think this is the route you want to take, pick up the book Database Reliability Engineering to get a glimpse of what your next ten years will look like. I don’t think there’s ever been a more exciting time to take that leap.

If you want to specialize in performance tuning, we’ll be talking more about that over the coming weeks as well – stay tuned.

If you’re not sure yet, that’s okay. You can always come back to this fork again later. For several years, I chose to focus on production database administration because I really liked building servers and troubleshooting ’em. Then back around 2015, I came to the fork again and decided to give up on that line of work and focus on performance tuning instead. I love the feeling of making a user’s query faster, or even better, making an entire workload go faster. (Plus I was sick and tired of getting called after hours for emergencies!)

In the next post, we’ll talk about the next job duty for production DBAs: checking for corruption.

DBA Training Plan 2: Backups (And More Importantly, Restores)

When I first started out as a SQL Server DBA, I thought things were going well as long as the backup jobs ran successfully. I’d go into SQL Server Agent every now and then, make sure they were still running, and … that was the end of it. I figured if disaster ever struck, I’d just do a restore. How hard could it be? I was missing the entire point.

Nines don't matter if users aren't happy

I bought two of these as soon as I saw ’em. Seriously. Shout out to @mipsytipsy.

In theory, we design our backup strategy ahead of time with 5 Simple Questions About Your Backups, and we’ve memorized the 9 Letters that Get DBAs Fired: RPO, RTO, and CYA.

In practice, small disasters strike all the time when we’re not ready. The most common reason to do a restore isn’t to revive an entire server – it’s just to get back a few small tables or an individual database. Somebody ran the wrong DELETE statement or dropped a database in production instead of development, and next thing you know, we’re all scrambling. Let’s think through a few things ahead of time to make the crisis easier.

Since the only reason we do backups is so we can do restores, I’m going to talk about the restore process first, before I talk about backups.

Where to Do Restores

When you’re restoring code (stored procedures, views, triggers, etc) or individual tables, don’t restore onto the production server. I don’t like touching production servers more than I have to, and let’s face it – you’re already having a bad enough day as it is. That’s why you’re doing a restore, remember? So when you only need to pull a few objects or tables out, let’s do our work on a different server (like dev or QA) and leave production as it is. I’ve also written about restores in my ideal dev, test, and production environments.

After we’ve safely restored the right data onto another server, it’s easy to copy that data across to other servers. For simplicity and security, you can set up a linked server on the production box with read-only access over to the restore server. Then, from production, you can run INSERT statements using a SELECT sourced from the linked server tables.
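
A sketch of that pattern, with made-up server, database, and table names:

  -- One-time setup on production: a linked server pointing at the restore box
  -- (RESTORE1 is the restore server's network name; map it to a read-only login
  -- in the linked server's security settings)
  EXEC master.dbo.sp_addlinkedserver
      @server = N'RESTORE1',
      @srvproduct = N'SQL Server';
  GO
  -- Then, from production, copy back just the rows you lost using four-part names
  INSERT INTO dbo.Customers (CustomerID, CustomerName)
  SELECT CustomerID, CustomerName
  FROM RESTORE1.YourDatabase.dbo.Customers
  WHERE CustomerID BETWEEN 1000 AND 1999;  -- only the range that got deleted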

However, if you’re restoring tables (not databases) over 10GB, you’ll probably want to do the restores directly on the production server to make the data copies faster. Just make sure you’re extremely careful with the scripting and the database names – we don’t want to restore over the top of your working production database.

This may require adding extra space to the production server. In one emergency, I freed up the necessary space by shrinking all of TempDB’s data and log files down to just 1MB. TempDB was on fast drives, perfect for a one-time emergency restore, and that particular server didn’t have any other activity happening due to the outage. We’re not always so lucky, but it helps to think out of the box like that.
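
If you ever need that trick, the shrink itself is simple – a sketch using the default TempDB logical file names, which may not match yours, and strictly a desperate-times move:

  USE tempdb;
  GO
  DBCC SHRINKFILE (tempdev, 1);  -- shrink the data file to a 1MB target
  DBCC SHRINKFILE (templog, 1);  -- shrink the log file to a 1MB target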

A word of warning: if referential integrity is involved, like if you’re trying to restore tables that have relationships to other tables that you’re NOT restoring, then you can be in for a world of hurt here. We’re not going to cover that scenario – it really is different in each case.

Doing the Restore

Big picture, your restore process goes like this:

  1. Restore the most recent good full backup
  2. Restore the most recent good differential backup
    (you don’t have to restore all of the differentials)
  3. Restore the transaction log backups since that differential, in order
  4. Run RESTORE WITH RECOVERY to bring the database online

As you go through steps 1-3, make sure you use the WITH NORECOVERY option as you do restores. This leaves the database in a restoring state so that you can continue to apply additional backups to it. If you forget those two key words or accidentally use WITH RECOVERY, your restore is done at that point – you can’t apply additional backups. If you get that wrong, your restore has to start over again from scratch, so please, for the love of all that’s holy, double-check that option before you start the restore.
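
Strung together, the whole sequence looks roughly like this – the file names, paths, and STOPAT time are all made up:

  RESTORE DATABASE YourDatabase
      FROM DISK = N'X:\Backups\YourDatabase_FULL.bak'
      WITH NORECOVERY, REPLACE;

  RESTORE DATABASE YourDatabase
      FROM DISK = N'X:\Backups\YourDatabase_DIFF.bak'
      WITH NORECOVERY;

  RESTORE LOG YourDatabase
      FROM DISK = N'X:\Backups\YourDatabase_LOG_1.trn'
      WITH NORECOVERY;

  RESTORE LOG YourDatabase
      FROM DISK = N'X:\Backups\YourDatabase_LOG_2.trn'
      WITH NORECOVERY, STOPAT = '2019-08-01 09:55';  -- optional point-in-time

  -- Only after the last backup has been applied:
  RESTORE DATABASE YourDatabase WITH RECOVERY;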

Doing all of this with the GUI sucks. The more backups you have, the longer this takes, and the more likely you are to run into errors. Instead, what you need is a script that looks at all of the backups in a folder, plucks out the most recent relevant files, and restores them for you automatically, in order, and that’s where sp_DatabaseRestore comes in.
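
Here’s roughly what that call looks like when your backups follow Ola Hallengren’s folder layout – the share and database names are placeholders:

  EXEC dbo.sp_DatabaseRestore
      @Database = N'YourDatabase',
      @BackupPathFull = N'\\backupshare\SQLPROD1\YourDatabase\FULL\',
      @BackupPathDiff = N'\\backupshare\SQLPROD1\YourDatabase\DIFF\',
      @BackupPathLog  = N'\\backupshare\SQLPROD1\YourDatabase\LOG\',
      @RestoreDiff = 1,
      @RunRecovery = 1;  -- bring the database online after the last log is applied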

Designing Backups for Faster Restores

sp_DatabaseRestore requires the database backups to be laid out a particular way in advance, though. It’s designed to work best with Ola Hallengren’s free maintenance scripts, which can handle backups, checking for corruption, doing index maintenance, and more.

In the past, you’ve probably used the built-in maintenance plans for backups. Those aren’t a bad way to start – I mean, compared to the alternative of not having backups at all – but now that you’re going through a full blown DBA training plan, it’s time to graduate to something more powerful. Ola’s scripts are much more flexible and capable than maintenance plans, even integrating with 3rd party backup apps your sysadmin team might be using.

Your homework for this episode:

In the next episode, we’ll think about the implications that these things have on the way we do our backups – how often we back up, how we approach failovers, how we do disaster recovery for the cloud, and more.

And let’s pause here for just a moment to notice: isn’t it cool that all of the stuff above is open source? It’s all totally free, built by other members of the database community just like you. During the course of this training plan, we’re going to be referring to a lot of community & open source stuff. As you work with it, think about joining the communities for each of those tools to meet other folks like you and start expanding your career horizons.

If you’re a member of our Recorded Class Season Pass and you want to watch the training videos on this stuff, here are your next steps for learning:

In the next post, we’ll talk about why you probably won’t want to restore the master database.

DBA Training Plan 1: Build a Server Inventory

At your company, walk into the VP of Sales’s office and ask them how many salespeople they have.

NO, I mean, don’t actually DO that, because they’re going to ask you why the sales app is so slow. But I mean, imagine if you COULD walk into their office and ask them that. I bet they’d have an instant answer. They wouldn’t wait for a single moment. Or walk into the CEO’s office and ask how many employees they have. Or ask the CFO how much the annual budget is.

My point is that when you’re in charge, you need to know exactly what you’re in charge of.

Make a Spreadsheet Inventory

Let’s start by making a spreadsheet. Across the top, make columns for:

  • SQL Server Version (2017, 2016, Azure SQL DB… well, maybe 2008 too, sadly)
  • Edition (Standard, Enterprise, Developer)
  • Environment (Production, QA, development, disaster recovery)
  • Department (sales, HR, accounting, IT, mixed use)
  • Business Users Affected (list of people to email when the server dies)
  • Application Names (internal or external product names)
  • Plan B

That last column gets a little tricky – it means, if this server dies in a fire, what’s our Plan B? Are we going to restore the databases from another server? Will we fail over to a log shipped copy? Or will we update our resume and head out for an early lunch? As we go farther into the training, we’re going to get much more specific about Plan B.

There are no wrong answers here: this step is about understanding where we’re at today, not where we’d like to be. We’re never where we’d like to be. (Me personally, I’d like to be at a poolside bar right now, but noooo.)

If you’d like to get ambitious, add additional columns for Core Count, CPU Count, and Memory. The core and CPU counts will get you a head start on licensing, although I have to confess that we’re not going to cover licensing as part of our training plan. To get even more ambitious, read the comments on this post about what should be in your DBA inventory.
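
If you want to fill in some of those columns straight from the server rather than by hand, here’s a quick starting-point query (SQL Server 2012 and newer – older builds name the memory column differently):

  SELECT  SERVERPROPERTY('MachineName')     AS MachineName,
          SERVERPROPERTY('Edition')         AS Edition,
          SERVERPROPERTY('ProductVersion')  AS ProductVersion,
          cpu_count                         AS LogicalProcessors,
          physical_memory_kb / 1024 / 1024  AS PhysicalMemoryGB
  FROM sys.dm_os_sys_info;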

What We’ll Do With This Spreadsheet

Right now, you probably sleep well at night thinking you know everything that’s happening in these servers. Hoooweee, have I got bad news for you. Over the next six months, we’re going to progressively add more and more columns to this spreadsheet as we learn more about our environment, uncover problems, and learn how to solve them.

For bonus points, add a column for What Scares Me. Write a quick note about the one thing that scares you most about this server. Maybe it’s blocking problems, maybe it’s the failing jobs, maybe it’s code you don’t understand. Six months from now, I bet you’ll be proud of how this column has changed.

How to Survey Your Network for Servers

Put a row in the spreadsheet for every server you have – whether you’re in charge of it or not. We want to start with a good inventory of what we have.

The easiest way – if you’re just starting at a company and you have no idea what the network holds, check out the free Microsoft Assessment and Planning Toolkit. It’s designed for licensing compliance, but it works great for building server inventories. It scans your network looking for whatever programs you pick – just confine it to SQL Server.

The PowerShell way – if you’re into PowerShell, Kendal Van Dyke wrote the open source SQL Power Doc to discover instances on the network and export the list to Excel. He has since joined Microsoft, and the tool hasn’t been updated much, but it may be enough to get the job done if you’re comfortable with PoSH.

The GUI way – Idera’s SQL Inventory Manager ain’t cheap, and I haven’t used it myself, but if you want a sealed-box-product tool to do it, there you go.

Got another way? If you’ve got a way that you’ve used successfully, leave a comment so we can add it to the list for other readers. Bonus points if it’s free, open source, and/or super-easy to use.

In the next post, we’ll use this inventory spreadsheet to assess how bad of shape we’re in.

Welcome to the DBA Training Plan.

Foreword: 8 years ago, I launched an email series with a 6-Month DBA Training Plan. I sent one email per week, bringing people up to speed on the most important topics that nobody taught ’em along the way. It’s hard to believe it’s been 8 years! This month, I’m revisiting the emails, updating their content, and publishing ’em as blog posts too to make ’em easier to find. Buckle up: here come 24 straight blog posts to take you from zero to…well, at least a hero who’s smart enough to wear the underpants on the inside.

I learned to be a DBA the hard way. The hard, crappy way. Our SQL Server was in trouble, and I was the kind of person who would roll up my sleeves and figure out whatever was broken. Next thing you know, I was the one responsible for managing it.

And boy, did that suck.

I didn’t know about any free videos or blogs or e-books. I didn’t have the budget to go to a class, and even if I did, I didn’t know where to find a good one. DBA wasn’t in my job title, and it wouldn’t be for years.

I want to make your learning experience much better than mine was.

Every Wednesday for the next six months, you’re going to get an email introducing an important DBA topic. We’ll cover the basics of the topic, plus link to our favorite free training material on that topic.

Your journey will start with Ozar’s Hierarchy of Database Needs – like Maslow’s Hierarchy of Needs, but it’s more about databases than pooping. (Although we’ll probably sneak in a few poop jokes.)

In the next six months, we’ll take you from the bottom of the pyramid up to the top. You may not be able to fix everything in your environment during those six months, but at least you’ll understand the work involved and how to confidently get started. It’s all about making your journey to Professional Database Administrator easier than ours was.

If You Have Questions

Oh, and you will! When you have questions about what you’re reading, start by Googling your questions. It sounds obvious, but you’d be amazed at how much good stuff there is out there to help. (I’m not being sarcastic. This is exactly how we get started whenever we have our own questions.)

If you’d like to post a question, try DBA.StackExchange.com or SQLServerCentral’s forums. Yes, both of these require registration, but they’re totally worth it. On both of these sites, there are hundreds – sometimes thousands – of people who are itching to help answer your questions. They react fast, too – make sure to go back and revisit your question every 10-15 minutes for the first few hours to see what’s happening. Answer their clarification questions, and include as much detail as you can. For more instructions, read Getting Help with a Slow Query.

If you still can’t get the answers you need, hit Reply to any of our emails. They come from a real email address manned by, uh, me – Brent. This isn’t one of those emails where it says, “Don’t hit respond because nobody cares.” Seriously, I care, and that’s why I put these emails together. Just please don’t use that as your FIRST resort – there are only so many hours per week that I can spend answering questions. By using the above methods first, you’ll be able to leverage the whole community’s expertise instead of just me. I’m all about teaching you how to fish.

On to Part 1: Building a Server Inventory

Training Class Update: Fancier Audio/Video Setup

I’m constantly looking for ways to raise my game, so I made some investments this month to take my video class audio & video to the next level.

I’m now doing green screen effects on the recordings. I picked up an Elgato retractable green screen that mounts to the wall so that I can pull it down when I’m doing a training class, and roll it back up when I’m doing normal client work:

Home office 2019

I’d used floor-standing green screens in the past, but they were really frustrating for a lot of reasons: wrinkly, tough to get crisp background separation, had to light them separately, took up a lot of space on tripods, etc. Because of the hassles, I abandoned it for several years. This Elgato one totally pays off:

Old setup:

New hotness:

Second, you’ll notice that the recording doesn’t have a big microphone in the way. That’s because I’ve switched to a Sennheiser wireless lapel microphone hooked up to my computer via a Focusrite Clarett USB interface.

The lapel microphone picks up more room noise, so I’ve also added the iZotope RX 7 audio plugin to my production process to clean up the audio. It integrates well with Telestream Screenflow, the screen capture tool that I use, kinda like Camtasia. (Screenflow is a little more polished than Camtasia in that it allows for things like these 3rd party audio plugins.)

Between the chroma key (green screen) effect and the audio plugin, though, my CPU goes up to 100% during video editing as I scrub around the timeline, editing stuff, and the fans kick in. I try to edit every class’s videos together during the bio breaks so that the day’s training videos are encoded/uploaded as soon as class finishes. This personal goal keeps my editing timeline tight, minimizing the amount of effects & things that I can add. I’m riiiight on the edge of what I’m comfortable doing with laptop power – I’m only on a 2018 6-core MacBook Pro, and the newer ones are calling to me. I’m waiting to see what the rumored 16″ version looks like.

The green screen effects and audio clean-up only happen in the recordings: live attendees see the green screen behind me, and the audio isn’t quite as crisp (since the cleanup happens in post-processing.) The next level-up will be figuring out how to run those effects live. Streamers do the chroma key thing with OBS, and I’m starting to privately experiment with some live streaming of coding, so we’ll see how that goes. OBS is great for public streams, but I need to figure out how to make it work well with private invite-only training classes.

Got a Live Class Season Pass or Team Membership? You can start watching the latest Mastering Query Tuning class’s fancy recording now, and the recording of today’s Performance Tuning in 21 Demos class will be up this weekend. Enjoy!

[Video] Help! SQL Server 2008 is Still Here!

Your company is ignoring the news: SQL Server 2008 and 2008R2 are officially out of support as of today, but nothing’s changing at your company. You still have SQL Server 2008 in production, and you’re a little nervous. How should you approach the conversations with management? Brent Ozar will help: he’s been there too, fighting managers who want to leave an old server hiding in the corner. We’ll role play: you be the manager, and Brent will show his techniques for convincing management to take the plunge and move to a newer SQL Server version.

Slide deck:

Using Implicit Transactions? You *Really* Need RCSI.

Implicit transactions are a hell of a bad idea in SQL Server: they require you to micromanage your transactions, staying on top of every single thing in code. If you miss just one little DELETE/UPDATE/INSERT operation and don’t commit it quickly enough, you can have a blocking firestorm.
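
Here’s the shape of the problem, sketched with a hypothetical table – the session-level setting is the real one, but the rest is illustrative:

  SET IMPLICIT_TRANSACTIONS ON;

  -- This first statement silently opens a transaction...
  UPDATE dbo.Users SET Reputation = Reputation + 1 WHERE Id = 12345;

  -- ...and until somebody remembers to COMMIT, this session keeps its locks,
  -- and under plain read committed, other queries touching that row start blocking.
  COMMIT;
  SET IMPLICIT_TRANSACTIONS OFF;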

The ideal answer is to stop using implicit transactions, and only ask for transactions when you truly need ’em. (Odds are, you don’t really need ’em.)

But if you can’t change the code, you need the delicious goodness of Read Committed Snapshot Isolation. Here’s a video demo showing why:

You can learn more about RCSI, its gotchas, and how to implement it here.
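
And if you do go the RCSI route, turning it on is a single command per database (the database name below is a placeholder) – just test it first, because it changes read behavior and adds row-versioning load in TempDB:

  ALTER DATABASE YourDatabase
      SET READ_COMMITTED_SNAPSHOT ON
      WITH ROLLBACK IMMEDIATE;  -- kicks out open transactions so the change can take effect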
