Error Handling Quiz Week: Where Are You Handling Errors?

T-SQL
7 Comments

This week’s series on error handling has been an eye opener for many of us. We’ve laughed. We’ve cried. We’ve screamed in horror. We’ve read the documentation.

I don’t blame you. This is a really confusing topic, and for many of us, it’s the first time we’ve actually looked at the mechanics of how transactions, TRY/CATCH, and error handling work. I know it’s the first time because a while back, I asked you how often your newly written code checks for errors. Most of the time, we’re not doing it at all.

That was enlightening (and a bummer) for sure, but I followed it up with another poll. When we DO check for errors, where are we doing it – in a TRY/CATCH block, in some other form of T-SQL, or in the app code?

When you think about the results, remember that this is nowhere near a valid survey – it’s just a quick question to open a discussion. This poll is actually why I wrote the blog post series – I had a pretty good hunch people didn’t understand how TRY/CATCH really works, and given that most of you aren’t using it, I figured I should talk about how complex it is.

The responders aren’t a wide sample. Most of my audience consists of database people, not pure application developers. Also keep in mind that if you’re writing C# code that accesses the database via Entity Framework or some other ORM, you’re not writing T-SQL queries either. In that case, error handling would obviously be in the app code, not TRY/CATCH or other forms of T-SQL.

I had to include the “I don’t check for errors” answer, given how widespread the no-checking answers were on the prior poll. It reminds the rest of my audience what kind of real-world code is out there, and it gives the YOLO crowd a way to answer without pestering me with replies.

Twitter polls only let you pick one answer, not mark all that apply, and many folks replied saying that they sometimes check in different places (or multiple places) depending on the kind of queries they’re writing. That’s great! And given the complexity we saw today, you can see why it’s helpful to try to catch errors in more than just one place.

As an industry, I think we’ve got some work to do around improving our handling of errors, transactions, and database failovers. I’m not saying every query needs to automatically retry deadlocks, handle database failovers, and survive out-of-drive-space issues, but there are a few easy things we can do to make our apps more graceful and fault-tolerant.


Error Handling Quiz Week: Making a Turkey Sandwich with XACT_ABORT

T-SQL
21 Comments

CAN YOU BELIEVE THAT HOT MESS IN YESTERDAY’S POST?!?

I know, right? You thought that by combining TRY/CATCH with a transaction, you’d get robust error handling. Instead, you ended up with half the tables populated, and a leftover open transaction. You’d already forgotten Tuesday’s post in which I pointed out that TRY/CATCH doesn’t catch low severity or high severity errors.

TRY/CATCH and transactions are turkeys.

And two turkeys don’t make an eagle.

For a better experience, you need the Turkey Sandwich approach to transaction and error handling. You have to surround those turkeys with two key pieces of code:

We start with the XACT_ABORT option, which increases our odds that entire transactions are rolled back or forward together. However, that alone still may not be enough – remember Tuesday’s post, where I talked about how some catastrophic errors skip the CATCH block altogether. We have to finish our turkey sandwich with an additional line that checks whether any transactions still remain, and if so, rolls them back.
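Here’s roughly what the finished sandwich looks like – the table and procedure names below are stand-ins, not necessarily the demo script’s exact objects. The two slices of bread are the SET XACT_ABORT ON at the top and the @@TRANCOUNT check in the CATCH block:

CREATE OR ALTER PROCEDURE dbo.InsertBoth AS
BEGIN
    /* Slice of bread #1: roll the whole transaction back together when something fails mid-flight. */
    SET XACT_ABORT ON;

    BEGIN TRY
        BEGIN TRAN;
            INSERT INTO dbo.Parent DEFAULT VALUES;
            WAITFOR DELAY '00:00:30';   /* gives us time to hit cancel */
            INSERT INTO dbo.Child DEFAULT VALUES;
        COMMIT;
    END TRY
    BEGIN CATCH
        /* Slice of bread #2: if anything is still open, roll it back instead of leaving it for the caller. */
        IF @@TRANCOUNT > 0
            ROLLBACK;
        THROW;
    END CATCH
END
GO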

We’ll do yet another round of clearing the tables, running our proc, waiting a few seconds, canceling it, and then checking the table contents:
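If you’re following along at home, the harness is roughly this – run the batches one at a time, and hit cancel while the WAITFOR is still running (same stand-in object names as above):

/* Reset the demo tables: */
TRUNCATE TABLE dbo.Parent;
TRUNCATE TABLE dbo.Child;
GO

/* Run the proc, wait a few seconds, then click cancel in SSMS: */
EXEC dbo.InsertBoth;
GO

/* Then check what's left behind: */
SELECT COUNT(*) AS ParentRows FROM dbo.Parent;
SELECT COUNT(*) AS ChildRows FROM dbo.Child;
SELECT @@TRANCOUNT AS OpenTransactions;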

FINALLY, we achieve our goal: when the proc is canceled or fails, neither of our tables has any rows in it.

You are not a chef.
You just made your first Turkey Sandwich.

This is not the end of your culinary journey. You’ve merely learned that TRY/CATCH and transactions are turkeys, and you have to make a turkey sandwich by framing them with additional options and code. There is way more work involved.

The next step in your learning journey is to read Erland Sommarskog’s Error and Transaction Handling in SQL Server: Part 1, and then Part 2, and then Part 3. My blog posts this week have not even covered everything in Part 1! My job this week was really just to challenge what you thought you knew about error and transaction handling, and convince you that you need to dig into Erland’s posts.

Also, put down that jar of NOLOCK. That’s not mayonnaise.


Error Handling Quiz Week: Combining Transactions And TRY/CATCH

T-SQL
18 Comments

In yesterday’s epic cliffhanger, you might have been shocked to discover that a plain ol’ transaction does not solve our problem – and in fact, it makes things worse. With yesterday’s code, we still got rows inserted into the Parent table, no rows in the Child table, and a nasty surprise that our session will have to deal with down the road.

The size of the turkey might have become apparent when you tried to close your query. I’m using SQL Server Management Studio, and when I try to close that query’s tab, I get a popup warning:

Let’s look at our code again to understand what happened:

When we executed the proc, waited a few seconds, and then clicked cancel, we only canceled the execution of the WAITFOR. The transaction was still left open.

I’ve seen this error in more applications than I can count. But here’s the crappy part: it’s almost impossible to track down where the bug is. All it takes is making this coding mistake once, and the transaction is left open for other parts of the code to deal with. I’ve seen so many cases where applications leave transactions open for days, and nobody’s sure exactly where the transaction was started.

The solution is to combine TRY/CATCH and a transaction. Before I get to the code, though, I’ll make sure to run this a few times just to make doggone sure I didn’t leave any transactions open in this window:
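Nothing fancy – just something along these lines, run until it comes back clean (the demo scripts may word it differently):

/* Roll back anything this session still has open: */
WHILE @@TRANCOUNT > 0
    ROLLBACK;

SELECT @@TRANCOUNT AS StillOpen;   /* should be 0 */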

Let’s do a transaction in a TRY/CATCH block.

I know, I know, the code is a little wordy given that all we’re tryin’ to do is a pair of inserts, but here we go:
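Sketched out with placeholder object names (so not character-for-character the demo script), it looks something like this:

CREATE OR ALTER PROCEDURE dbo.InsertBoth AS
BEGIN
    BEGIN TRY
        BEGIN TRAN;
            INSERT INTO dbo.Parent DEFAULT VALUES;
            WAITFOR DELAY '00:00:30';   /* gives us time to hit cancel */
            INSERT INTO dbo.Child DEFAULT VALUES;
        COMMIT;
    END TRY
    BEGIN CATCH
        /* If anything in the TRY block throws, undo the whole thing: */
        ROLLBACK;
    END CATCH
END
GO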

You know the routine: clear out the tables, run the stored procedure, wait a couple of seconds, cancel it, and check the contents of our tables:

And before you try it, for the last time this week, take a guess at what’s going to happen:

  • We’ll have no new rows in either Parent or Child, and no open transactions
  • We’ll have a new row in Parent, but not Child, and an open transaction

And then run it yourself to see whether your hunch was correct. After your experiments, make sure to close your SSMS window juuuust in case – we’ve been playing around with a lot of transactions this week.

The fun continues in the next post.


Error Handling Quiz Week: Will a Transaction Help?

In yesterday’s post, we tried a TRY/CATCH, and those of us who actually ran the demo scripts were disappointed. We found that a TRY/CATCH alone doesn’t fix it: we end up getting rows in the Parent table, but no rows in the Child table. It turns out there are some gotchas in Books Online’s summary of what TRY/CATCH does:

A group of Transact-SQL statements can be enclosed in a TRY block. If an error occurs in the TRY block, control is passed to another group of statements that is enclosed in a CATCH block.

The devil is in the details. If you scroll further down in the documentation – look, I’m not saying you’re actually going to do that, because we both know damn well that you don’t read the documentation – there are gotchas like:

TRY…CATCH constructs do not trap the following conditions:

  • Warnings or informational messages that have a severity of 10 or lower.
  • Errors that have a severity of 20 or higher that stop the SQL Server Database Engine task processing for the session. If an error occurs that has severity of 20 or higher and the database connection is not disrupted, TRY…CATCH will handle the error.
  • Attentions, such as client-interrupt requests or broken client connections.
  • When the session is ended by a system administrator by using the KILL statement.

Jokes as dark as my shades

So to recap, TRY/CATCH doesn’t catch small problems or big problems – only medium-sized problems.

Okay, that’s a bit of a turkey.

In our scenario this week, I’m causing the error by canceling the query (a client-interrupt request) – but no, we can’t just say, “Never abort your queries.” I’m pro-choice: people deserve the right to end their queries at any stage of their development. I also believe in the death penalty: DBAs deserve the right to kill your queries when they’re causing terrible harm to society at large.

The business is asking us to make sure that the data gets into both the Parent and Child tables, or neither. We’re gonna need something more powerful than TRY/CATCH.

Let’s try a transaction instead.

Microsoft’s documentation on SQL Server transactions starts out with the most important stuff:

A transaction is a single unit of work. If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased.

That, not TRY/CATCH, is probably a better fit for what we’re trying to do. Let’s frame our two insert statements with transactional code:
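A minimal sketch of that – with placeholder object names, not necessarily the demo script’s exact code – is just the two inserts wrapped in a BEGIN TRAN / COMMIT:

CREATE OR ALTER PROCEDURE dbo.InsertBoth AS
BEGIN
    BEGIN TRAN;
        INSERT INTO dbo.Parent DEFAULT VALUES;
        WAITFOR DELAY '00:00:30';   /* gives us time to hit cancel */
        INSERT INTO dbo.Child DEFAULT VALUES;
    COMMIT;
END
GO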

And as before, run the proc, wait a few seconds, cancel it, and then see what we’ve got:

As with yesterday’s demo, I’m not going to show you the results here. Instead, guess what’ll happen before you run it:

  1. We’ll have no new rows in either Parent or Child, or
  2. We’ll have a new row in Parent, but not Child, or
  3. We’ll have new rows in both Parent and Child

And then run it yourself to see whether your hunch was correct. Just as a reminder, only do these experiments in a development environment, not production, especially if you happen to have RCSI or SI enabled in production.

To understand what’s happening and why, check out the next post in this series.


Error Handling Quiz Week: Tryin’ TRY/CATCH

T-SQL
32 Comments

Let’s say we have two tables, Parent and Child, and we need to guarantee that they both get populated at once. We’ll write a single stored procedure to do both inserts:
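The exact demo script isn’t shown here, but a minimal sketch looks something like this – treat the column definitions and the procedure name as stand-ins, not the real demo objects:

CREATE TABLE dbo.Parent (ID INT IDENTITY PRIMARY KEY, DateAdded DATETIME2 DEFAULT SYSDATETIME());
CREATE TABLE dbo.Child  (ID INT IDENTITY PRIMARY KEY, DateAdded DATETIME2 DEFAULT SYSDATETIME());
GO

CREATE OR ALTER PROCEDURE dbo.InsertBoth AS
BEGIN
    INSERT INTO dbo.Parent DEFAULT VALUES;
    WAITFOR DELAY '00:00:30';   /* just here to give us time to hit cancel */
    INSERT INTO dbo.Child DEFAULT VALUES;
END
GO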

I put a WAITFOR in there, but that isn’t the problem – I’m just using that to demonstrate why the code isn’t production-ready. If I execute the stored procedure, wait a few moments, and then click cancel – what happens?

The results show that a row was added in the Parent table – but not in the Child table:

Well, that’s not good: the business has asked us to guarantee that we either insert both rows, or neither. Our code can’t be doing a half-*ss job. We either gotta have all *ss or no *ss in our back end systems.

This isn’t just about canceling queries.

This problem pops up in many scenarios, like these:

  • Our query loses a deadlock
  • Our query gets killed because it’s blocking others
  • The server runs out of drive space
  • Our application calls multiple statements (rather than a proc) and the app fails midway through

Foreign keys don’t fix this kind of problem, either – in fact, in many situations, foreign keys even cause problems. That’s outside of the scope of this week’s posts, though: this week, I need to teach you why and how to make a turkey sandwich.

Let’s try a TRY/CATCH.

According to the TRY/CATCH documentation – don’t worry, I know it’s a long page, but this important part is right up there at the top of the page so you can ignore it more quickly:

A group of Transact-SQL statements can be enclosed in a TRY block. If an error occurs in the TRY block, control is passed to another group of statements that is enclosed in a CATCH block.

Let’s implement it in our code:
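Something along these lines, using the same stand-in object names as the sketch above – the only change is wrapping the two inserts in TRY/CATCH:

CREATE OR ALTER PROCEDURE dbo.InsertBoth AS
BEGIN
    BEGIN TRY
        INSERT INTO dbo.Parent DEFAULT VALUES;
        WAITFOR DELAY '00:00:30';
        INSERT INTO dbo.Child DEFAULT VALUES;
    END TRY
    BEGIN CATCH
        /* Hand the error back to the caller: */
        THROW;
    END CATCH
END
GO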

And then try the same trick – run it, wait a few seconds, then cancel it, and see what’s inside the tables:

I’m not going to show you what happens, dear reader, because this is a short demo, and I want you to run it yourself. Guess what’ll happen before you run it:

  1. We’ll have no new rows in either Parent or Child, or
  2. We’ll have a new row in Parent, but not Child, or
  3. We’ll have new rows in both Parent and Child

Then run the query yourself to see what actually happens. In the next post, we’ll continue the experiments.


Registration is Open Now for My SQLBits Workshop!

SQLBits
2 Comments

Registration just opened up for my SQLBits workshop on Mastering Parameter Sniffing! Move fast – due to COVID, they’re spacing out the attendees, so there are even fewer seats available than usual. My workshops usually sell out quickly, so if you want a seat, you’ll need to hustle.

Here’s the session abstract:

You’re a database developer or DBA with at least 3 years’ experience performance tuning queries and indexes.

You already know you have parameter sniffing issues, and now you need to figure out how to fix them. In this one-day class, you’ll learn how to reduce the blast radius with index changes, query tuning, and database-level settings. You’ll also see firsthand how SQL Server 2017, 2019, and 2022 try to reduce it with adaptive joins, adaptive memory grants, automatic tuning, and caching multiple plans.

Join me at SQLBits on Tuesday, 8 March, at ExCel in London for a one-day workshop. I’ll be there in person – my flights are already booked, and I’m excited to see everybody in person again. The conference is taking great precautions: masks are required at the event, tables will be spaced apart, and proof of double vaccination, Covid-19 antibodies, or a negative lateral flow test (within the last 48-hours) will be required.

This course is 100% demos: the only slides are the introduction and recap. The rest of the time, I’ll be working live in SQL Server Management Studio and SQL Server 2019 (or 2022 if Microsoft releases the public bits quickly enough). You can even follow along on your laptop if you bring SQL Server 2017 or newer and the 50GB Stack Overflow 2013 database.

I’m taking the best modules of my 3-day Mastering Parameter Sniffing class and doing it as a training day session. We’re going to be moving quickly and covering a lot of ground, and I know it can be tough to keep up. That’s why attendees will also get one year’s access to the full 3-day Mastering Parameter Sniffing recordings, too! That’s a £695 value, and it’s free with your workshop admission. (You’ll get the access in class.)

Register now. See you in London!


[Video] Office Hours in Cabo: Ask Me Anything About SQL Server

Videos
3 Comments

Let’s get together at sunrise in Cabo San Lucas, Mexico and talk through your highest-upvoted questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Introductions
  • 00:33 AtHomeWithCOVID: Historical advice is to spread multiple datafiles/filegroups across disks. Now, storage is allocated from a balanced SAN as one drive, e.g. “D” for data, “L” for logs. Are there any gains in having multiple datafiles/filegroups in this scenario? (Ignore tempfiles for obvious reasons)
  • 03:04 Dariusz Danielewski: Brent, in what kind of scenario would you recommend a table to have non-clustered primary key and a clustered non-unique index on a set of other columns? What would be the physical organization of such table and what type of queries would benefit from it? Thanks, D-Squared
  • 04:00 SQL mial: Hi Brent, according to you, what are the best monitoring solutions that can be used at the same time for cloud VMs, Azure SQL, AWS RDS, and on-premises?
  • 05:05 Wally: Hi Brent, planning on migrating from SQL2014 to SQL2019 but keeping 2014 compat level after the move. Do we still need to run Microsoft Data Migration assistant on the 2014 server to spot any potential migration issues?
  • 07:23 Doug: Hello Brent, sp_blitzbackups shows that it would take around 24 hours to restore our large multi-terabyte db full backup. Do you have any tips for reducing the restore time (i.e. switch to third party backup software, etc)?
  • 09:44 Koritt: Hi Brent, what are your best practices for SQL Data Compression? Industry advice once was that fewer data pages saved I/Os and meant more pages cached in RAM; are those benefits still compelling in these days of SSDs and 100’s GB of RAM, or are the D/U/I impacts too great?
  • 12:21 MojitoDBA: App with implicit transactions causes a huge version store and tempdb – 2TB and still growing due to sleeping transactions. And Oracle is not an option cos we’re still waiting for it to be self-managed and to come with a wine tap for that price. Is it a good idea to kill such sleeping transactions after 1h?
  • 14:30 Mehdi: In the Stack Overflow database, how many indexes are there on the Users table?
  • 15:55 George Bluth: Hello Brent, will you share your pros/cons of being a DBA vs being a developer?
  • 18:13 Wrap-up

New Year’s Resolution: Lose Weight in Your Database

Indexing
14 Comments

Made a New Year’s resolution to lose weight this year? Forget taking it off your waistline – take it off your database instead with:

sp_BlitzIndex @GetAllDatabases = 1, @Mode = 2, @SortOrder = 'size'

Or:

sp_BlitzIndex @GetAllDatabases = 1, @Mode = 2, @SortOrder = 'rows'

This produces an inventory of all the indexes sorted by size on disk or the number of rows, descending:

You can copy/paste that into Excel, send it around amongst your teams, and start having discussions like:

  • “Why do we have a 50GB heap called Sales_Backup_BeforeTheDeploy?”
  • “Why do we have a 200GB table called ClickLog with over a billion rows? I thought we were purging that table nightly?”
  • “Weren’t we supposed to archive the SalesHistory table?”

Things to be aware of when you’re reading the results:

  • The Rows & Reserved MB columns are way off to the right. I just rearranged the SSMS output for this screenshot. That’s one of the reasons I typically do this analysis in Excel – that, and the fact that I like to highlight offensive rows.
  • The Last User Seek, Last User Scan, and Last User Lookup columns can help identify when the index was last read, but it’s not entirely accurate for a lot of reasons that we discuss in this module of the Mastering Index Tuning class.
  • The Last User columns are also incremented by any query, including things like your homegrown fragmentation scripts checking the contents of a row or a homegrown ETL package. Even if it’s run automatically by your apps, that’s still a “user” query to SQL Server.

That’s it. That’s the whole post. See, my blog post lost weight too.


Who’s Hiring in the Database Community? January 2022 Edition

Who's Hiring
15 Comments

Last year, I started this series of “who’s hiring” blog posts because The Great Resignation was clearly a real thing. The pandemic is causing so many folks to rethink what they do for a living, where they do it, who they do it for, and in light of yesterday’s salary survey results, how much money they earn doing it. It’s been really fulfilling to hear stories of employers who found candidates here, and candidates who found a new opportunity. So let’s keep ‘er going for a while longer.

Is your company hiring for a database position as of January 2022? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here. If your company is hiring, leave a comment.

The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d put an email address, because you may want to know which applicants are readers here; they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.

If you want to be notified as people leave new jobs here, there’s a “subscribe without commenting” option where you leave a comment. I don’t use that for subscribing you to anything else.


Here Are The Results of the 2022 Data Professional Salary Survey.

Salary
23 Comments

Every year, I run a salary survey for folks in the database industry. This year, I was especially curious to see the results to find out whether salaries went up. Anecdotally, I’ve seen a lot of people jumping ship to new companies due to the Great Resignation – but what does the data actually show? Let’s find out.

Download the results here and slice & dice ’em to see what’s important to you.

I’m a database administrator, so I filtered for DBAs in the United States and then looked at average salary, and yes, it took a pretty good jump this year:

About a 5% raise, much higher than previous years:

That’s a good sign, but I’ll leave it to y’all to dig into the numbers that mean the most to you, and in the comments, tell me what you see.

Update: Mattia Nocerino has a free Tableau report with this year’s data.


The SQL Server Posts You Read the Most in 2021

SQL Server
0

I think of my blog posts in a few different categories:

  • Posts that will only be read at the moment they’re published (like upcoming webcasts, humorous stuff, and career advice)
  • Posts that will show up in search results over time and continue to be evergreen (like how a feature works)
  • Posts that provoke discussion and interactivity (like asking you a question or running a contest)

So with that in mind, let’s take a look back at the stuff you seemed to enjoy the most this year. I’m gonna sort them in a few ways:

  • 2021 posts you read the most (could be flashes in the pan, or might end up being evergreen over time)
  • Older posts you read the most this year (evergreen)
  • Posts you commented on the most

Evergreen Posts You Kept Reading

Let’s start with these first because I think it helps paint a picture of the different kinds of readers.

If you’re going to write a post that stands the test of time, you want to solve a timeless problem that readers have faced for years, and will continue to face as time goes on. These tutorial posts aren’t often the favorites of readers when the post first goes live, but they’re the kinds of posts that bring in new readers over time.

Not only is it hard to write posts like this initially, but it takes work to continue to refine the content over time, adding in the kinds of key words and content that people are searching for. I actively prune some of ’em, and some of them were perfect when they were published.

If I *purely* wrote tutorial posts, then it’d be a different kind of blog. Pinal Dave’s SQLAuthority is a great example of that – I guarantee every one of you finds his stuff in your search results CONSTANTLY, and as a result, he has way, way higher page views than I have. I appreciate how much work he puts into that. I’m too lazy for that, as you’ll see in the next list.

2021’s Most-Read New Posts

Armed with that list of above evergreen posts, now you’ll be able to look at 2021’s top new posts and make a guess as to which ones will be evergreen, and which ones won’t generate any hits in 2022:

2021’s Most-Commented Posts

And with that, I’ll let you have the rest of the year off – no new blog posts will get published here for the rest of 2021. I’ll see you next year!


[Video] Office Hours: SQL Server at Sunset

Videos
0

Let’s kick back with a tasty beverage and go through your highest-upvoted questions from https://pollgab.com/room/brento on the beach:

Here’s what we covered:

  • 00:00 Introductions
  • 00:49 Alexander Speshilov: How to determine amount of memory (RAM) used by query store?
  • 02:23 Aslan: Merhaba Brent, in your opinion, what are the top 4 databases to administer by DBA pay scale?
  • 04:40 Murdock: In a 2 node sync AG, should my friend let the reports read from Primary or Secondary? What if the DBs from which the reports are reading have RCSI enabled?
  • 07:10 Tobias Fünke: Howdy Brent, we see a lot (3 thousand plans for 1 query hash) of duped query plans due to queries using literal values instead of parameterized values in the where condition. Are there any potential gotcha’s associated with enabling Forced Parameterization in this scenario?
  • 08:16 Annyong Bluth: Hello Brent, We have ISV app for which we made our own non-clustered index to help with speed for our own custom report. Unfortunately, this broke the next major upgrade script for that app / db. What are your thoughts on applying NC indexes to ISV app db’s?
  • 10:10 Buster Bluth: Hello Brent, from a hiring perspective, what are the traits that distinguish a junior SQL DBA from a Senior SQL DBA?
  • 11:30 Adam West: Hi Brent, is it possible/safe/wise to offload backups and CHECKDB to an AG secondary server?
  • 14:00 Wrap-up and discussion of life in Cabo

Check Your SQL Server Backup Performance & Safety with sp_BlitzBackups

Out of all of the scripts in our free First Responder Kit, sp_BlitzBackups is probably the one you’ve used the least. Let’s talk about what comes out of it, and why you should use it more often.

First, let’s define two terms:

  • Recovery Point Objective (RPO) – measured in time, it’s how much data you would lose if you restored a backup. For example, if your last backup finished at 1AM this morning, and it’s currently 10AM, you’d lose 9 hours of data if the server went down right now. That’s a 9 hour RPO.
  • Recovery Time Objective (RTO) – also measured in time, it’s how much time it would take you to perform a recovery. For example, if the server went down right now, and it took you a total of 1 hour to decide to do a restore, start the restore process, finish the restore, and then let people back into the app again, that’s a 1 hour RTO.

Businesses have a really hard time agreeing on what their desired RPO and RTO are because they wanna lose as little data as possible, and they wanna be down for as short a time as possible. That’s where sp_BlitzBackups comes in. It can’t tell you what your goals are, obviously. Instead, it estimates what you’re actually delivering today.
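If you’ve never tried it, the simplest call is just the proc by itself. The @HoursBack parameter in the second example is from memory – check the parameter list in the script if your version complains:

EXEC dbo.sp_BlitzBackups;

/* Optionally narrow or widen the window it analyzes: */
EXEC dbo.sp_BlitzBackups @HoursBack = 48;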

When you run sp_BlitzBackups, it analyzes the last 7 days of backups. Here’s what it returns:

I know, tiny screenshot – you can click on it if you wanna see the results, but I’ll describe the non-obvious columns for you:

  • RPOWorstCaseMinutes – the longest length of time between two successful backups. Say you regularly do log backups every 15 minutes, but between 9:10AM and 9:40AM, the backup target was offline, and no backups could be done. You had successful log backups at 9:00AM and 9:45AM. Your worst case for RPO would be if the server went down at 9:44AM, just before the 9:45AM backup ran, so you’d have a 44-minute worst case RPO.
  • RTOWorstCaseMinutes – the longest length of time it would take you to do a restore and get back online. Say you do full backups every night, and log backups every 15 minutes. Your worst case RTO scenario would be if the server went down right before the nightly full backup because you’d have to restore the prior day’s full backup, plus all day long’s transaction log backups. In order to calculate this number, we take the total backup time spent during all of those backups. Technically, this isn’t accurate because your restores could take longer than the backups, especially due to the lack of instant file initialization on the transaction log. This just gives you a rough starting point idea, though.
  • Supporting info – when people see these “worst case” numbers, their next question is, “Holy smokes, when was the time where we might have lost 44 minutes worth of data?” The following columns give you information about what the dates/times were and the backups involved, plus more-info queries so you can examine the history stored in MSDB, like this:
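The real more-info queries come back in the result set itself, but if you want a rough do-it-yourself equivalent to poke at, this sketch pulls the biggest gap between successful backups per database straight out of msdb (assuming msdb still has at least a week of backup history):

/* Worst-case gap, in minutes, between successful backups over the last 7 days: */
WITH backup_gaps AS
(
    SELECT database_name,
           backup_finish_date,
           DATEDIFF(MINUTE,
                    LAG(backup_finish_date) OVER
                        (PARTITION BY database_name ORDER BY backup_finish_date),
                    backup_finish_date) AS minutes_since_prior_backup
    FROM msdb.dbo.backupset
    WHERE backup_finish_date >= DATEADD(DAY, -7, GETDATE())
)
SELECT database_name,
       MAX(minutes_since_prior_backup) AS worst_case_rpo_minutes
FROM backup_gaps
GROUP BY database_name
ORDER BY worst_case_rpo_minutes DESC;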

Wanna performance tune your backups too?

Continue scrolling to the right in the result set, and you’ll find:

  • FullMBpsAvg, Min, Max – the throughput you’re getting in megabytes per second.
  • FullSizeMBAvg, Min, Max – how large your backups are, before compression.
  • FullCompressedSizeMBAvg, Min, Max – how large your backups are, after compression.
  • Similar columns for Diffs, Logs – these are useful if you need to estimate change rates.

I love using backup throughput as an early warning system, like a canary in the coal mine. If backup throughput suddenly drops, it’s a sign that something went wrong with the storage or the networking. We can’t tell you whether it’s a problem with slower read speeds, slower write speeds, or more traffic on the storage network, but it’s just your clue that it’s time to start troubleshooting – because user queries are likely running more slowly too. And when backup throughput drops simultaneously across multiple SQL Servers, that’s an even bigger clue that something went seriously awry with the company’s overall storage.

When you sit down at your desk each morning, backups aren’t usually the first thing on your mind. Just take a few minutes today to double-check that you’re actually backing this stuff up as frequently as you think, and that downtime will be as short as you think. I wish I had a dollar for every time a client’s DBA got caught with their pants down because backups weren’t scheduled correctly or were regularly failing at specific days/times.

Oh wait – I actually do have a dollar for each of those times. Several dollars, in fact. Look, let’s not get sidetracked.


[Video] Office Hours: SQL Server Q&A in Cabo at Sunrise

Videos
4 Comments

Let’s watch the sun rise at Land’s End and cover your top-voted questions from https://pollgab.com/room/brento/.

Here’s what we covered:

  • 00:00 Introductions
  • 01:04 Jr Wannabe DBA: Hi Brent, a colleague wants to upgrade from Standard to Enterprise purely for performance gains on several regular servers (up to 8 CPU, 128GB of RAM, SQL 2016 or 2019). Is the change worth paying for? I cannot find any benchmarks to support or contradict such a change.
  • 04:12 OhMyKingdomForANetworkEngineer: Hi Brent! You mentioned using DNS CNAMEs to redirect during log shipping failovers. My friend wants to redirect end users to the new production server after an upgrade by changing the CNAME, but knows nothing re: DNS, CNAMEs, or Active Directory. My LMGTFY didn’t help. Is this some kind of AD magic?
  • 06:38 Loud Howard: Hi Brent, what are your top MS hyped features of SQL Server that never caught on?
  • 08:54 SQL Steward: Hey Brent, a friend of mine is trying to explain to his boss why using a function to create a temp table which you then use to re-create your main OLAP fact table every 5 minutes is a crazy bad idea. How would you explain this in plain English? Really appreciate all you do.
  • 10:30 Francesco Mantovani: Hi dad, do linked servers handle parallelism on the remote server? Is the OPTION (MAXDOP x) hint worth using in a linked server query? ….you rock, make those turtles go faster!!
  • 12:36 DBAGreg14: Is there any reason to stagger start times for DBCC CheckDB’s on AG replicas? We just added a 3rd node, and have each replica’s CheckDB running on the same schedule. Is there any impact on log replays between nodes when DBCC runs? Would the internal snapshot affect this?
  • 13:48 David: Hi Brent, is it OK to defrag the data files/disk? VMs hosted on the VMware platform, database sizes 2-4TB.
  • 14:16 GreatWorkBrent: Were you aware that your videos don’t play in ‘restricted mode’? (my work enforces this mode) It may be due to your graphically violent slaying of common SQL problems, or your profane & incendiary comments about stupid people doing stupid things. Or you have a drinking problem.
  • 15:30 Kumaran: Hi Brent, we have an app with many dynamic SQL stored procs. MS suggests parameterizing the dynamic SQL calls, and the app team prefers not to rewrite the procs. But my friend says dynamic SQL should be avoided for better performance and for query stats (agree on stats) – is he only right about dynamic SQL at the app layer?
  • 17:15 Sunrise
  • 17:30 Kumaran: Could you suggest any solutions for SQL Server patching that are SQL Server feature-aware?
  • 18:52 SQL100: When trying to design a data warehouse, there will be data coming from different sources – do you recommend converting the data into, e.g., a .csv file so that it can then be imported easily (after all the required filters have been applied), or loading it directly from the database? Thank you
  • 20:18 Brave Sir Robin: Hi Brent, Should we do any SQL maintenance tasks non traditionally if we have crazy fast TempDB (local SSD) and crazy fast SAN (NVME SSD)?
  • 21:25 Gob Bluth: Hello Brent, how do you measure write latency to an always on synchronous secondary replica?
  • 22:24 A DBA DBAing: Sometimes I see multiple values for the same parameter in the ‘cached parameters’ section of sp_blitzcache, what is the reason for that?
  • 23:23 Raphra: How do you select the picturesque places where you stay and stream your office hours from?

[Video] Office Hours: First Post From Cabo

Videos
0

Join me on my balcony in Cabo as I review your top-rated questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Introductions
  • 00:48 md: Hi Brent, as SQL Server works with 8K pages why is the recommended NTFS Unit Allocation Size for SQL volumes 64K? Wouldn’t 8K be a more efficient fit? Sneaking in a second question – I’m guessing with modern storage systems this setting doesn’t have a big impact on performance?
  • 01:47 Ratbert: Hi Brent, is there a good way to be notified when a new table is created on production? Would like to know when a new table is created by anyone on production.
  • 03:08 Dogbert: Hi Brent, do you foresee much commercial opportunity for someone creating online Postgres training classes similar to what you do with SQL Server training?
  • 04:20 Skunkbert: Hi Brent, when will you stream your “awful” playing skills in Dead By Daylight?
  • 05:16 icklemouse: Hi Brent. Pretty new to Azure as Evil Corp is now moving everything up from on prem. Would appreciate a knowledge check. I can find no SQL Db solution that allows a cross region auto-failover to take place. Everything I see involves a manual step. Did I miss something? Cheers
  • 08:00 San: Hi Brent, my DBA has set the autogrowth to 1MB and 10%, and this has now created hundreds of fragments. What is the best approach to reduce these fragments – do you advise defragmentation using contig.exe, or shrinking the databases? I am lost here.
  • 10:06 Jim Ignatowski: Hi Brent, trying to investigate a stored proc that uses cursors. Unfortunately, execution of this sproc (with actual plans enabled) crashes SQL management studio. Too many plans generated I guess. Is there a better tool to capture the query plans for sprocs that use cursors?
  • 11:52 i_use_lowercase_for_select: Hi Brent, hope you have a decent Wi-Fi coverage there in Mexico. My customer is using mixed mode authentication for some of its apps. When googling for best practices on that I get inconsistent results. What is your stand on mixed mode vs windows-only authentication and why?
  • 13:43 DefaultDBA: Hey Brent, Really enjoying your office hours and recorded season pass training. My question is about INT vs GUID for key fields. We have systems that use both, but wondering if there are reasons to use one over the other.
  • 14:49 Wrap-up

Don’t Use SQL Server to Send Emails.

I would like to unsubscribe from this idea.

I’m not talking about Agent alerts when jobs fail. That’s fine when you can’t get the business to spring for a monitoring system.

I’m talking about using SQL Server to send emails to confirm a customer’s order, alert that a product is low on stock, or other business processes.

SQL Server’s email delivery isn’t very robust. Whoever started the documentation by writing “Database mail is an enterprise solution” had clearly never seen an enterprise solution in real life. Sure, it seemed robust when it came out a couple decades ago, but these days, anything involving the word “enterprise” needs to survive a data center failover. To protect your user databases, you’re likely using log shipping or Always On Availability Groups, and in the future, it’s only getting more complex with SQL Server 2022’s ability to fail over to Azure SQL DB Managed Instances. Database Mail lives in the MSDB database and Service Broker. Your messages ain’t coming along for the ride.

When email delivery stops, you won’t notice. People think, “Hey, things must be going great these days. It’s been forever since I’ve gotten a failure email.” In reality, there are hundreds or thousands of emails piled up in some SQL Server somewhere with a broken email delivery subsystem, like an old SMTP server address. You won’t figure it out until it’s too late.
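If you’re stuck with Database Mail for now, at least check up on it once in a while. This is just a hedged sketch – the sysmail views do live in msdb, but double-check the column names on your version – that counts up anything that hasn’t actually gone out lately:

/* Database Mail items from the last 7 days that are stuck in a non-sent status: */
SELECT sent_status,
       COUNT(*) AS items,
       MIN(send_request_date) AS oldest_request
FROM msdb.dbo.sysmail_allitems
WHERE send_request_date >= DATEADD(DAY, -7, GETDATE())
  AND sent_status <> 'sent'
GROUP BY sent_status;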

SQL Server’s email troubleshooting is almost nonexistent. When things do go wrong, the logging isn’t detailed, and it’s not the system that your sysadmins and email admins are used to working with. They’re going to be shrugging and pointing the finger at you, going, “Well, the rest of our systems are doing email just fine.”

These things aren’t as big of a deal for DBA-focused emails, like job failure alerts. However, if your business users are relying on the email getting there, hand the task over to your developers, and have them use a truly modern enterprise solution like SendGrid or Simple Email Service.


[Video] Office Hours: SQL Server 2008 R2 Edition, I Guess

Videos
0

I went through your highest-voted questions from Pollgab and talked through ’em, and strangely, there was a bit of a 2008R2 theme!

https://youtu.be/3-B5L3Komq4

Here’s what we covered:

  • 00:00 Introductions
  • 02:00 Pete Donnelly: Hi Brent, I have a 7 TB DB and the largest table is 2 TB, has 6 indexes and 6 related statistics. It has 42 “_WA_Sys” statistics. We truncate this table every Jan 1st to start the new year with an empty table. Should I drop the “_WA_Sys” statistics when I truncate this table?
  • 03:58 LifelessDBA: Hi Brent, suppose SQL 2022 GA is available now. Is there any kind of feature or enhancement convincing you to think about migrating to SQL 2022?
  • 06:26 2008R2people: Hi Brent, my company is still using SQL Server 2008R2, which will be out of support soon. Is it better to upgrade to a newer version, or stay at 2008R2? Technically, our apps are currently running smoothly on 2008R2. Which version do you suggest we upgrade to if we must?
  • 12:14 Mehdi: Hi Brent! How does the query optimizer calculate the estimated number of rows when there are clustered index seek and residual predicate in the actual execution plan?
  • 13:42 BiggieDB: Hello Brent and all the party people! I have a CTE that finds all “parent” elements… think finding a manager’s manager, etc. The number of levels is unknown. Noticed that it gets wildly incorrect estimations as a CTE. Is there any better way to write a query like this?
  • 16:01 Trushit: This question is more regarding the consulting career. If you were to choose a technology/tool today to specialize in which one would that be so that you can grow in your career for next decade or so? Does Power BI seem like a good tool to specialize in?
  • 19:14 Redgate State of the Database Monitoring Report: https://www.brentozar.com/go/report
  • 21:37 Paul Mugs: What would you suggest as a broad approach to version control for tables/views? Everyone just points me towards GitHub, but I struggle to see how this translates to what my team and I do in SSMS.
  • 23:15 Stefano: Hi Brent, about “blobs out of the database”: I’ve a 5TB database with blobs inside, with AlwaysOn (async) replication between the primary and DR site. If we move documents outside the database, how do we guarantee consistency between the metadata and the file system? And consistency of replication (data + files) on the DR site?
  • 25:28 HD: There are so many valuable T-SQL suggestions and so much feedback from T-SQL users on the user voice forum, but MS doesn’t act on it. Does MS really read and respect the users’ voice, or is it there just for name’s sake? SQL Server is still missing a good number of features in the T-SQL area compared to other databases.
  • 28:06 Dr. Dolittle: Dear Dr. Ozar, can my friend install SQL Server Developer Edition on his dev and test servers, or are there any gotchas? At the moment he uses Enterprise Edition on both servers.
  • 32:33 Chris Weaver: I have been asked to look at a DB in SQL Server 2008 R2. No diagrams or documentation. Is it possible at all to work out what the relationships could be between the tables? There are roughly 500 of the *******
  • 35:52 Would y’all want to play co-op with me, or watch me stream games?

How Do You Test IOPs, Latency, and Throughput?

Storage
5 Comments

You’ve heard these terms thrown around about storage, and you’re not sure what they mean, and how they relate to each other. Let’s make it simpler by using an analogy.

Say you want to measure a shipping company.

You’re trying to decide whether to use DHL, FedEx, UPS, or your local postal service.

You could measure them by sending me a gift – after all, it is the holidays, and I do a lot of stuff for you year round, and it’s probably the least you could do.

  • You place one box outside
  • You call the shipping company to come pick it up, and start the clock
  • When I get it, I’ll call you to confirm receipt, and you stop the clock

The length of time between when you called, versus when I acknowledged it – that would be how long it takes the shipping company to move exactly one package. Let’s say – just to say – that you sent the package overnight. You called the company on Wednesday at 10AM, and I received it Friday at 10AM.

Our performance tests are:

  • Input/output Operations per Second (IOPs): 1 package every 2 days
  • Latency, how long an operation takes: 2 days
  • Throughput, how many operations are coming through at once: 1 package at a time

That test only makes sense
if you’re sending one package at a time.

Sure, in rare cases, you really do passionately care about how long it takes to send me a Christmas gift. (But seriously, why did you pick a 55-gallon drum of Froggy’s Fog? I don’t even own a bubble machine.)

But most of the time, when you’re benchmarking shipping, you’re doing it because you want to ship a lot of things, to a lot of people, at the same time. And indeed, if you try to ship more things at a time, you’re going to get dramatically different test results.

If you set out 10 shipments of Froggy’s Fog, and call the shipping company for a pickup, your metrics magically become:

  • Input/output Operations per Second (IOPs): 10 packages every 2 days (goes up)
  • Latency, how long an operation takes: 2 days (this stays the same)
  • Throughput, how many operations are coming through at once: 10 packages at a time (goes up)

A good load test involves lots of simultaneous operations – ideally, a number of operations similar to what you’d actually see in your daily work, but not too many. Because imagine what happens if you try to put 100 drums of Froggy’s Fog out for pickup, but the shipper’s default van can only hold 10 drums. The van driver will arrive, find too many drums out, and just maybe tell you, “Sorry, we’re going to have to come back tomorrow with a bigger truck to handle this much.”

Or, maybe on the other side of the test there’s a problem. Maybe my front door just doesn’t have that much space, and after dropping off 10 drums, the shipper says, “There’s no space left here – we’re gonna have to drop the rest off tomorrow.”

This isn’t necessarily a shipping company problem – instead, it’s a problem with our planning and our infrastructure.

Storage tests have the same problems.

If you only test with one storage request at a time, you’ll see artificially low IOPs and throughput numbers. Sure, the latency is roughly accurate for one request at a time – but that is rarely a useful number by itself.

If you test with too many requests at a time, you’ll surface different bottlenecks. You may not have enough storage adapters, paths, or even volumes to really take advantage of the storage device.

Even your package size selection influences the test. What if, instead of sending out 55-gallon drums of Froggy’s Fog, you chose smaller gifts, like a bunch of yodeling pickles? You might get an artificially high number of packages shipped per second, but if your workload really does involve 55-gallon drums, then the pickle test isn’t accurate. Similarly, you’ll see storage tests that purposely try to achieve artificially high IOPs numbers by using absurdly tiny I/O request sizes.
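One way to keep your test honest is to look at what your real workload is already doing. This isn’t a load test – it’s just observation – but a quick sketch against sys.dm_io_virtual_file_stats shows your average latency and average I/O size per database file since the last restart, which helps you pick realistic block sizes and queue depths for the benchmark:

SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_latency_ms,
       vfs.num_of_bytes_read / NULLIF(vfs.num_of_reads, 0) AS avg_read_size_bytes,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms,
       vfs.num_of_bytes_written / NULLIF(vfs.num_of_writes, 0) AS avg_write_size_bytes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;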

It’s hard to plan storage for SQL Server. On Thursday, I’m doing a free webcast with Pure Storage to talk more about how I approach the problem when building and consolidating servers. See you there!


Going to SQLBits? Join me for my Mastering Parameter Sniffing workshop.

SQLBits
2 Comments

You’re a database developer or DBA with at least 3 years’ experience performance tuning queries and indexes.

You already know you have parameter sniffing issues, and now you need to figure out how to fix them. In this one-day class, you’ll learn how to reduce the blast radius with index changes, query tuning, and database-level settings. You’ll also see firsthand how SQL Server 2017, 2019, and 2022 try to reduce it with adaptive joins, adaptive memory grants, automatic tuning, and caching multiple plans.

Join me at SQLBits on Tuesday, 8 March, at ExCel in London for a one-day workshop. I’ll be there in person – my flights are already booked, and I’m excited to see everybody in person again. The conference is taking great precautions: masks are required at the event, tables will be spaced apart, and proof of double vaccination, Covid-19 antibodies, or a negative lateral flow test (within the last 48-hours) will be required.

This course is 100% demos: the only slides are the introduction and recap. The rest of the time, I’ll be working live in SQL Server Management Studio and SQL Server 2022 (assuming the preview bits are ready). You can even follow along on your laptop if you bring SQL Server 2017 or newer and the 50GB Stack Overflow 2013 database.

I’m taking the best modules of my 3-day Mastering Parameter Sniffing class and doing it as a training day session. We’re going to be moving quickly and covering a lot of ground, and I know it can be tough to keep up. That’s why attendees will also get one year’s access to the full 3-day Mastering Parameter Sniffing recordings, too! That’s a £695 value, and it’s free with your workshop admission. (You’ll get the access in class.)

My SQLBits workshops always sell out, and you’ll wanna move fast to save your seat. Talk to the finance folks now to get the credit card primed up and ready so that when registration opens – it’ll open by the end of this month – you’ll be able to grab your seat quickly.

See you in London!


Who’s Hiring in the Database Community? December 2021 Edition

Who's Hiring
21 Comments

Is your company hiring for a database position as of December 2021? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here. If your company is hiring, leave a comment.

The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d put an email address, because you may want to know which applicants are readers here; they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

I close the comments here after 1 week, and publish another post each month in the Who’s Hiring category so y’all can get the latest opportunities.

If you want to be notified as people leave new job comments here, there’s a “subscribe without commenting” option where you leave a comment.

