
[Video] Consulting Lines


In my consulting work, I often find myself saying the same things to clients – but they’re not the kinds of things I said when I was a database administrator. I call these “Consulting Lines”, and I talked through ’em:


Here’s my full series of Consulting Lines blog posts:

“Yes, that used to be a best practice, and…” – When someone’s been doing something remarkably bad, don’t pull their pants down in front of the group. Here’s how to focus on moving forward with a new way.

“Would you mind driving?” – If someone’s micromanaging you and telling you to do something you don’t agree with, here’s how to call their bluff gracefully.

“SQL Server needs a dog.” – If you’re dealing with someone who keeps bashing buttons that they don’t understand, here’s how to get them to leave all the switches alone.

“I’d be a bad consultant if I didn’t put that in writing.” – When you see something that scares the pants off you, cover your butt.

“Write this down, because I’m not going to.” – The flip side of the previous line: this wins people over to your side when the finding isn’t all that bad.

“What problem are we trying to solve?” – If someone’s trying to play Trivial Pursuit to test your knowledge, or won’t let go of a really minute technical detail, here’s how to get them to step back and look at the big picture.

“Sounds like you’ve got it under control.” – When someone keeps arguing with you about your proposed solution, saying it won’t work, here’s how to get them to open up to new ideas.

“What happens if that doesn’t work?” – When someone’s about to break production or cause irreparable damage, here’s how to get them to think about Plan B.

“Keep going. What else?” – Some people really need to get their story out before they can talk about the technical problem. Let them.

“Do you want 10% faster, 100%, or 1000%?” – When you start performance tuning, make sure everyone’s on the same page about how long it’ll take and how much work will be involved.

“Let’s put that in the parking lot.” – When you’re in the midst of troubleshooting something important or trying to finish a meeting, set aside a separate area for off-topic requests.

“I have a hard stop at…” – Consulting involves a lot of meetings, and not all meeting organizers respect your time. This line helps you get out of unproductive meetings gracefully.


[Video] Office Hours on a Calm Sea Day Off Canada


On an unbelievably calm sea day off the shores of Canada, aboard the Norwegian Jewel, I enjoyed a nice cup of completely alcohol-free coffee (as far as you know) and took your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Start
  • 01:20 NotCloseEnoughToRetirementToStopLearning: Inherited a server with a dozen trace flags on but no documentation or tribal knowledge as to why. Is this a problem I should be trying to solve? If so, how do you suggest starting?
  • 02:50 Daniel Izzaldin: Hi Brent, what are your thoughts on Copilot for Azure SQL Database? Could this be a threat to DBAs?
  • 03:56 MyTeaGotCold: On Standard Edition, should I just give up on columnstore? The limitations on batch mode appear extreme.
  • 05:03 Vineesh: Does Azure SQL VM experience transition over easily to AWS?
  • 05:48 Venkat: Do you ever see network bottleneck on SQL Servers using Windows Storage Spaces Direct for SQL storage? How do you detect and treat this issue?
  • 06:51 Pradeep: Which certifications do you recommend for new SQL Server VM contractors?
  • 07:26 The Indexer: Hey Brent, I’m drowning in a sea of Buffer Latch Type 2 errors on “source” while loading our ODS with data from a SQL replication environment. I’ve thrown all query optimization lifeboats at the problem, but these errors keep sinking my ship. Any secret rescue plans to share?
  • 09:23 Chris Rose: I’m trying to run this guy in a warehouse type environment – lots of ETL jobs, Power BI reports, etc. It keeps getting blocked so I cancelled after 20 minutes. Do people typically use this tool in a warehouse type environment, and can anything be done to make this go?
  • 09:52 Franks Burns: What’s the PostgreSQL equivalent to Microsoft MVP? Will you be seeking it?
  • 10:32 Justin: Besides price, is PostgreSQL miles ahead of SQL Server?
  • 11:18 My name is Luca: What are the best and worst use cases of SQL file groups you see at clients?
  • 13:10 Dopinder: Does storage block size optimization matter any more in the world of Azure SQL VM and NVME storage?
  • 13:44 Einar: What are the top issues you see when migrating from Log Shipping to Always On?
  • 14:35 Petra: What is the best way to monitor a given SQL Agent job so that if it is not running, notifications are sent out? Important business logic job for a big customer.

And here are those Aurora Borealis photos that I mentioned in the video.


Finished Your Training? Get a Certificate of Completion.

If you want to get a raise or stand out from a crowd of job hunt competitors, you need something to differentiate yourself.

Whenever you finish a training class – including mine – ask if you can get a certificate of completion. For example, here’s what mine look like:

Certificate of Completion

You can generate ’em yourself after you’ve finished each one of my training courses. (You won’t see the course completion certificate option until you’ve actually finished the course – otherwise, people would just print ’em as soon as they bought it. Trust me – we’ve had plenty of students ask for their certificates without actually watching any of the videos, and yes, we do track it.)

We have different certificates for each course, including the free ones. Yes, you should even be proud of the free ones, because there are plenty of folks in our industry who don’t even go to free training classes, let alone get their certificates of completion!

They come in PDF format, and you can also screenshot ’em to put on your LinkedIn profile. I’d add ’em to your profile as you complete each course, showing prospective employers that yes, you have a provable, timestamped track record of learning.

If you want to add me to your LinkedIn network too, feel free. I only accept connections from folks who actually appear to be working in the database industry, with a picture on their profile. (I don’t accept ones from headhunters, sales folks, anonymous folks, etc.) Hopefully that helps expose your resume to more folks in the database industry.

For problems with the certs, feel free to click contact at the top of the site. Include screenshots of the exact error you’re getting, and the screenshot should include the URL in your browser. Thanks!


[Video] Office Hours in Haines, Alaska: No Bears Here


While in beautiful Haines, Alaska along the Chilkoot River, I stopped to take your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Start
  • 00:26 James L: When you were an employee, how did you decide whether to move on to another job? I think a similar dilemma exists when you’re trying to decide whether to move your own business into another phase.
  • 01:33 QueDayTa: Hi Brent – Can you point us to what the best practice pattern is for a simple, full table refresh, Oracle-to-SQL Server ETL job? Problems: blocking, locking and execution time. We’ve resorted to RCSI, DELETE vs TRUNCATE vs ALTER TABLE […] SWITCH. Thanks.
  • 02:21 Bandhu: In SQL VM is it better to retire excess CPU cores or turn on compression to utilize the excess cores? How do you decide?
  • 03:06 Erzsebet: What are the best and worst use cases you have seen in the field for SQL CLR?
  • 04:08 Frank: Hi Brent! Loved your cloud training course. Followup: How bad is connection stability nowadays compared to on-prem? I read horror stories from the early days of Azure SQL. Is hardening the client code with retry logic everywhere still needed? Thx
  • 05:00 Yusef: We need to regularly update our first responder kit installs. Is the same also true for Ola maintenance scripts and sp_whoisactive?
  • 05:39 Shahid Kapoor: What is your opinion of WorkloadTools for capturing and replaying SQL workloads for perf testing?
  • 05:48 Wilson Souza: Hi Brent – I have a SQL job scheduled to run at 7:30 PM, but after rebooting the server, the schedule changes to 7:30 AM. SQL Server version 2019. Any idea? Thanks for your help
  • 06:37 Rose Noble: What are your pros / cons for long term vs short term DBA contracting?

[Video] Office Hours: Icy Strait Point, Alaska


I’m on a boat! We took an Alaska cruise with some friends aboard the Norwegian Jewel. En route to Icy Strait Point, Alaska, I took your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Start
  • 01:40 RollbackIsSingleThread: Hi Brent! If a high-quality blog post involves original ideas, T-SQL, a great demo, and images to prove a point, but uses an AI tool like Grammarly or Google Translate to edit the content, Google might delist it. Any idea?
  • 03:27 MyTeaGotCold: Can you think of anything that the Service Broker is the best solution for? Modern alternatives seem better.
  • 04:37 JustWondering: Do you have a sense of what percentage of your followers are administering systems that are “custom built” vs “vendor app” dbs?
  • 05:13 Miss Minutes: Once SQL AG is stood up with two HA nodes and an offprem DR node, are SQL full backups necessary any longer?
  • 05:47 some_like_it_encrypted: Do people use Always Encrypted? I have the feeling that it’s quite an unused feature.
  • 06:53 chandwich: Have you ever successfully changed a database collation for a large database with numerous constraints? Any tips for doing this?
  • 08:43 Daniel Izzaldin: Hi Brent, do you have a course or do you know a course that teaches you how to become a Consultant.
  • 10:29 Logar The Barbarian: Hello! Is sp_DatabaseRestore expected to work with Azure/blob backups? From my review of the GitHub, it does not appear so. If not, what would you suggest for restores to other servers? I like sp_RestoreGene for same servers but it may not work for DR/test restores. Thank you!
  • 11:40 Laslzo: When an SP needs a long list of input IDs, do you prefer a comma-delimited string or a table-valued param with all the IDs?

And that zipline that I mention in the video? Yeah, that was pretty awesome.


Who’s Hiring in the Microsoft Data Platform Community? October 2024 Edition


Is your company hiring for a database position as of October 2024? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d put an email address, because you may want to know which applicants are readers here. They might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.


Announcing the Winners of the Biggest Database Regret Contest


I recently asked you to leave a comment with your biggest database regret, and the comments were great! Here were my favorites:

Runners-Up

These 5 folks each won a Fundamentals Bundle lifetime access valued at $695:

Thirster42: writing code to help a team dynamically create sequences. they didn’t tell me they were going to generate millions of them so now the db is super hard to open in object explorer, run sql compare, or use auto complete tools.

Thirster42’s was literally the very first comment, and I laughed out loud when it came in. It’s such a good example of a regret. In theory, sure, you can dynamically create just about anything in SQL Server, but the more of them you create, the worse shape you’ll be in. It reminds me of Swart’s 10 Percent Rule.


Stefano: One of my first jobs as a consultant: critical production SQL Server with performance issues (slow queries, blocking), high workload (thousands of queries/s). To study the problem I decided to run a server-side trace. The idea was to capture a lot of types of events and not lose any records… Result: total block. No way to stop the task. Flurry of calls from customers… Not knowing what to do, I had to restart the server remotely..

I’ve been there too. I do wish there were more “are you sure you wanna do this” guardrails built into T-SQL commands. Those kinds of guardrails are common in GUIs, and in operating system text commands, but completely unseen in query language commands. (“Hey, are you sure you wanna run a delete without a where clause?”)


Alex: No pure SQL compatibility between different products. MSSQL, Oracle, PostgreSQL etc. TOP(100), LIMIT(100), FIRST(100).

At first when I read Alex’s answer, I thought to myself, no, Alex, this contest was supposed to be regrets about things YOU had done, but when I went back and reread the post, I realized Alex’s answer was totally legit. Better than that, Alex’s answer is one of my OWN top regrets about the database industry too, come to think of it! The languages are juuuuust close enough to be annoying, and just close enough that execs think that a migration will be easy. It will not.


SmartWombat: My first ever FORTRAN computing assignment at college (’67) just to input and store name, age, sex. Report on the invalid data. Next day I went back to the lecturer and said there’s nothing in the data to determine if they are an invalid or not. Cue red face when he explained it to mean incorrect data, as in not valid. Not data about invalids. Lesson learned on requirements gathering at a very early stage !

I laughed out loud at this one too because it took me a couple of reads to understand what “invalid” meant in this context.


Thomas Franz: On a previous job my coworker decided to not allow NULLs in from/to columns and to use a default value instead. Sadly he decided it to be 1899-12-31 for both, from and to, so you always had to write something as GETDATE() BETWEEN t.valid_from AND IIF(t.valid_to = '18991231', '20991231', t.valid_to)

Uuuuuuugh. Yeah, that would suck.


Grand Prize Winner

Chris Wilson: My biggest regret is not starting a online presence. For over a decade I’ve followed the careers of many in the SQL community and thought I could do that and had grand plans to do so but never pulled the trigger. Like planting a tree the best time to start a SQL blog is 20 years ago. The second best time is today. I don’t know that it would have changed my career in any meaningful way but I do know it would have solidified my own knowledge in a very tangible way. “The best way to learn is to teach” as the old saying goes.

I felt guilty as soon as I read Chris’s regret because I knew I had to give him the prize, which happens to be a Fundamentals + Mastering Bundle lifetime access valued at $2,495.

I don’t care what your online presence is: a Github repo, a YouTube channel, a blog, a podcast, whatever – but pick one thing that works for you, and start doing it.

Get ‘er done.


[Video] Office Hours: Open Q&A on Microsoft Databases


I took your top-voted questions from https://pollgab.com/room/brento while recovering from a late night of partying.

Here’s what we covered:

  • 00:00 Start
  • 01:00 VegasDBA: Loved the Always-On Availability Group Episode! You mentioned how awful the dashboard is for monitoring. Can you recommend any scripts, custom alerts, or dashboards to keep a better eye on things?
  • 03:26 SteveTV: Are there any tools or utilities to manage source control for SQL Agent job definitions?
  • 04:21 SHA256: How do you feel about MongoDB? Is there ever a reason to use it over SQL Server?
  • 05:50 JoseDBA: What metrics should I show to my boss when asking for a new dba in my team. We are 3 and manage around 180 servers with over 3000 DBs. I read your post of “how many dbas you need” but I don’t know how to properly communicate that. Thanks!!
  • 08:49 JoseDBA: How to keep server level objects in sync when using availability groups. Is dbatools the best option?
  • 09:46 Discussing my Anthony Bourdain painting by Cassie Ott
  • 11:52 Elon: Why don’t you own an electric car?
  • 14:27 MyTeaGotCold: When I’m writing a new stored procedure and make a compile-time syntax error, the error’s line number is always wrong. What’s the trick?
  • 15:25 Sigurður: Should SQL VM DBA’s also be required to know how to create the underlying infrastructure (VM, cluster, disks)? Is this a unicorn trait?
  • 17:15 happydba: what happened to groupby (the online conferences)? they were great 😀
  • 18:57 AppleUser: How do I convince my security team not to run EDR on my SQL Server cluster?
  • 20:39 Bandhu: For Azure SQL VM’s, is large memory or fast disk more of a dollar premium?
  • 21:06 Ninja: I’m struggling getting my SSMS to connect my on-prem SQL Server to Azure Blob Storage. Any advice?
  • 22:23 Panagiotis: Currently on SQL 2019 with log shipping HADR. Is it wise to simultaneously migrate to SQL 2022 / availability groups or should those be two separate upgrades?
  • 23:45 JerseyDBA: As someone who is younger (30+ years away from retirement) what’s the most important thing you think a DBA can do to future proof their career?
  • 25:38 Dopinder: What complimentary AI skills should the SQL Server and PostgreSQL DBAs be learning?
  • 27:21 Simon Frazer: Hi Brent, I love reading all your posts about career progression! How much time do you have to spend writing posts to keep your blog fresh?

Free Webcast: How to Think Like the Engine, Big Data Edition

Brent, Big Coat Edition

You’ve watched my free How to Think Like the Engine class that focuses on the small 1GB Users table.

Let’s take things up a notch by using the ~164GB Posts table full of questions & answers. You’ll learn:

  • What PAGEIOLATCH waits are
  • Why you can’t usually fix them by just adding small amounts of memory like 16-64GB RAM
  • Why key lookups aren’t a big deal (compared to the alternative)
  • Why SELECT * isn’t as bad as it’s cracked up to be
  • Why big data also means slow writes on storage

Join me next Thursday for a free webcast sponsored by Pure Storage. See you there!


Yes, Cardinality Estimation Keeps Changing After SQL Server 2014.


About 10 years ago, Microsoft made changes to the Cardinality Estimator (CE) which caused some problems for SQL Server upgrades. When folks upgraded to SQL Server 2014, they also casually switched their databases’ compatibility level to the latest version, because for years that hadn’t really affected query plans. They just figured they wanted the “latest and greatest” compat level, without regard to the effects. That backfired badly when they suddenly got 2014’s Cardinality Estimation changes.

So for several years, whenever someone upgraded from older versions, and they complained about performance, the stock community answer was, “Change your compatibility level back to what it used to be.” In many cases, that just solved the problems outright, leading to blog posts like this and this.

Even today on SQL Server 2019 & 2022, this advice is still relevant! If you mess around with compatibility levels, you can absolutely change cardinality estimations in ways you didn’t expect.

One amusing example is SQL Server 2022’s cardinality estimation feedback. Even if your workload has been on 2014+’s “new” Cardinality Estimator for a while, 2022’s CE feedback can change cardinality estimations back to older versions! That’s a complex example, though, and I’d rather stick to simple examples here on the blog.

Simple Example Where 2022’s CE Improved

Let’s say our code uses a scalar function that just implements an RTRIM:
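
The demo code isn’t reproduced here, but a minimal sketch of the idea looks like this – the function name is a stand-in:

    CREATE OR ALTER FUNCTION dbo.TrimDisplayName (@DisplayName NVARCHAR(40))
    RETURNS NVARCHAR(40)
    AS
    BEGIN
        /* Nothing fancy - just trims trailing spaces. */
        RETURN RTRIM(@DisplayName);
    END;
    GO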

Starting with SQL Server 2019, SQL Server can attempt to inline that scalar function. Not only does that affect the shape of the execution plan, but it also affects the estimates.

Let’s see it in action with the largest Stack Overflow database. We’ll start by building an index to support our query, and then we’ll run the same query in two different compatibility levels – first 2016, then 2022:
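
Here’s a hypothetical version of that setup. The index name and search string are stand-ins, and compatibility levels 130 and 160 stand in for 2016 and 2022:

    /* Index to support the query - the name is illustrative. */
    CREATE INDEX DisplayName ON dbo.Users (DisplayName);
    GO
    ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130;  /* SQL Server 2016 */
    GO
    SELECT *
      FROM dbo.Users
     WHERE dbo.TrimDisplayName(DisplayName) = N'Brent Ozar';
    GO
    ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 160;  /* SQL Server 2022 */
    GO
    SELECT *
      FROM dbo.Users
     WHERE dbo.TrimDisplayName(DisplayName) = N'Brent Ozar';
    GO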

The actual query plans:

First off, the good news is that 2022 absolutely smokes 2016 simply because it inlines the function. The query can complete in about 1 second, as opposed to 2016’s 1 minute 16 seconds. But set that aside for a second and let’s look at the cardinality estimation changes.

The top plan (2016) does a clustered index scan because SQL Server 2016 made a hard-coded assumption that 10% of the rows would match our filter. The first operator (top right) estimated that 22,484,200 rows would come out of the clustered index scan, and then the second operator (the filter) estimated that 2,248,420 rows would match.

Gotta love those hard-coded estimates at exactly 10% because they’re easy to spot, but I do wish they had a yellow bang on ’em so that SQL Server would warn us that it’s making up estimates out of thin air.

The bottom plan (2022) uses the index AND it gets absolutely beautiful, bang-on cardinality estimates. The first operator (top right, index scan) estimated 18,882 rows would come out.

The haters will say, “Brent, that’s not fair, that’s a different feature and it’s not the Cardinality Estimator.” They’re wrong, of course: the important part is that things have been changing all around the Cardinality Estimator, changing its inputs, and so as a result it’s common to see different estimations coming out of the CE with each subsequent version.

Simple Example Where 2022’s CE Worsened

In the first example, our simple function used the exact same NVARCHAR(40) that our table’s DisplayName column uses. However, what happens if we change the function to use a more generic datatype, like NVARCHAR(MAX)?
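
Under the same assumptions as the sketch above, the only change is the parameter and return datatypes:

    CREATE OR ALTER FUNCTION dbo.TrimDisplayName (@DisplayName NVARCHAR(MAX))
    RETURNS NVARCHAR(MAX)
    AS
    BEGIN
        RETURN RTRIM(@DisplayName);
    END;
    GO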

We’ll run our queries again, and look at their actual execution plans:

Now, SQL Server 2022 ignores the index. Why would you ignore the index in a query like this? Because you estimate that too many rows will come back, making the key lookup too expensive. But… look more closely at 2022’s execution plan. How many rows did it actually think were going to come out after the function’s filter?

Just one. <sigh>

I have no idea what the sam hell bug this is, or which part of the engine it’s in – whether it’s the CE or scalar function inlining or the query optimizer. All I can say is that the estimate is hot garbage. As a result, the sort spills to disk because SQL Server didn’t allocate enough memory to sort the rows that actually came back.

So you can’t even say, “Thanks to 2019, scalar functions get inlined, so estimates are more accurate.” They’re not. They’re all over the place, even in simple cases like this where we’re just changing the length of a datatype. (Again: the example here in the blog post is purely to talk about cardinality estimation, but at least 2022’s query runs way faster in this case because the function gets inlined.)

The Moral of the Story

The moral of the story: keep your grubby fingers off the compatibility level switch until you’ve followed my migration instructions.

To learn more about topics like this, attend my PASS Summit pre-conference workshop in Seattle, Tuning T-SQL for SQL Server 2019 and 2022.


TRY/CATCH Doesn’t Always Work.


If you’re using TRY/CATCH to do exception handling in T-SQL, you need to be aware that there are a lot of things it doesn’t catch. Here’s a quick example.

Let’s set up two tables – bookmarks, and a process log to track whether our stored proc is working or not:
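
The original definitions aren’t shown here, but a minimal sketch might look like this – names and datatypes are my assumptions:

    CREATE TABLE dbo.Bookmarks
    (
        Id INT IDENTITY(1,1) PRIMARY KEY,
        BookmarkName NVARCHAR(100) NOT NULL
    );
    GO
    CREATE TABLE dbo.ProcessLog
    (
        Id INT IDENTITY(1,1) PRIMARY KEY,
        LogDate DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
        Result NVARCHAR(4000) NOT NULL
    );
    GO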

And create a simple stored procedure that adds a bookmark, and tracks whether it was successful:
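
Continuing the sketch – note that the Bookmarks insert doesn’t list its columns, which will matter in a minute:

    CREATE OR ALTER PROCEDURE dbo.usp_AddBookmark
        @BookmarkName NVARCHAR(100)
    AS
    BEGIN TRY
        /* No column list on purpose - this is the landmine: */
        INSERT INTO dbo.Bookmarks
        VALUES (@BookmarkName);

        INSERT INTO dbo.ProcessLog (Result)
        VALUES (N'Success');
    END TRY
    BEGIN CATCH
        INSERT INTO dbo.ProcessLog (Result)
        VALUES (N'Failed: ' + ERROR_MESSAGE());
    END CATCH;
    GO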

When you execute the proc, it succeeds, and a row is written to ProcessLog:
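
With the sketch above, that looks like this:

    EXEC dbo.usp_AddBookmark @BookmarkName = N'First bookmark';

    SELECT * FROM dbo.ProcessLog;  /* one row: Success */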

But if someone adds a new column to our Bookmarks table:
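
For example, something like:

    ALTER TABLE dbo.Bookmarks
        ADD BookmarkUrl NVARCHAR(400) NULL;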

And we try to run our stored proc again, it fails:
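
With the hypothetical schema above, the failure surfaces like this:

    EXEC dbo.usp_AddBookmark @BookmarkName = N'Second bookmark';
    /* Msg 213, Level 16: Column name or number of supplied values
       does not match table definition.
       The error escapes the proc's own TRY/CATCH. */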

Because the stored proc’s insert statement didn’t explicitly list the columns in the Bookmarks table.

Okay, that’s bad code – but did the CATCH come into play? Check the table contents:
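
Sticking with the sketch:

    SELECT * FROM dbo.ProcessLog;
    /* Still just the original Success row - the CATCH never ran. */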

There’s no row in ProcessLog saying that the process failed! What happened? Wasn’t our CATCH supposed to insert a row there?

Catch only catches SOME errors.

Early errors aren’t caught, like errors when the query is being compiled. In this case, SQL Server couldn’t build a valid execution plan for the stored procedure because there’s no way for it to execute the insert. The compilation failed, which technically means the query wasn’t executed – even though to you and me and our app users, it was executed.

Low-priority errors aren’t caught, like under severity 10. Those are just considered informational messages.

High-priority errors aren’t caught, like severity 20 or higher. Those terminate the connection altogether.
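
You can see the low-severity behavior for yourself with a quick sketch – severity 10 is informational, so the CATCH never fires:

    BEGIN TRY
        RAISERROR (N'Just informational.', 10, 1);
        PRINT 'Still in the TRY block.';
    END TRY
    BEGIN CATCH
        PRINT 'You will never see this.';
    END CATCH;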

Right about here is where you’re expecting me to give you a magic bullet that fixes these problems, but instead, I have to give you a monster amount of documentation. Check out the epic posts Error and Transaction Handling in SQL Server Part 1, Part 2, and Part 3 by Erland Sommarskog. I have to be honest with you, dear reader: I haven’t ever read them cover to cover. This is one of those times where I’m glad I have a fake job, aka consultant, where I can just say, “If our stored procedure’s business logic is really that critical and complex, it’s time we moved that processing over into an application language like C# that has better error handling and testability.”


[Video] Office Hours: Open Q&A About Databases


I went through your top-voted questions from https://pollgab.com/room/brento plus hit live ones from the TikTok viewers.

Here’s what we covered:

  • 00:00 Start
  • 00:52 chandwich: If SSMS had a dark mode option that was equally as good as the existing light mode, would you use it? Any idea why it doesn’t exist yet? RE: The on-call DBA that doesn’t like being blinded at 2:00 AM by SSMS.
  • 06:06 DBA JR: Hi Brent, what do you think about the DP-300 exam certification or other certifications? Do you think it’s effective for finding a DBA job in the US or Europe? Is it important to companies?
  • 09:01 MyTeaGotCold: What are the biggest disadvantages of using RDS Microsoft SQL Server instead of an EC2?
  • 11:01 Yukio: What are the top skill gaps you see in SQL Server DBAs?
  • 14:20 DataGuy: Assuming a table does not have range scans nor ORDER BYs, why should a clustered index be added? HEAPs get a lot of hate, but maintaining them nightly with enterprise edition isn’t that big of deal. Blindly adding CX to the PK field(s) means worse performance for the NC indexes?
  • 16:36 Ozan: Hi Brent, when a SQL Server Patch gets released it makes sense to wait a while before deploying because it might be buggy or might have other unexpected surprises. Do you know a reliable source or community where one can check if a patch is „safe“? Thanks
  • 18:02 Meg Thomas: Does drive block size matter when using NVME SSD in SQL cloud vms ?
  • 19:35 Wiron: When should you use a stored procedure?
  • 20:24 zxta: How does a database like Redis work?
  • 21:17 Ross: When I’m developing an application, what indicators would trigger me to use SQL Server over Postgres?
  • 22:53 User: Do you have any plans for Postgres training?
  • 23:20 Tadim: What about storing JSON in MongoDB versus Postgres?
  • 23:58 Ailos: I use SQL Server clusters on physical devices. What IO values should I look for?
  • 25:25 Tim: How do you land a SQL job?
  • 28:03 Tadim: Do you have any experience with SingleStore?
  • 28:51 Raphra: If ChatGPT or Gemini used your online blogs to train models, would you be ok?
  • 30:55 Yoichi Asakawa: What is your opinion of NetApp files for Azure SQL VM?
  • 32:10 ThreeDaysGrace: How do I avoid losing connectivity between databases and applications on failover?
  • 33:58 User: What’s the difference between leaderless and multi-leader replication for databases?
  • 34:42 User: Why should I limit the connection pooling on my server?
  • 35:25 Clip: Can you talk about indexing?
  • 36:43 Steve Harrington: Is it recommended to use allocation unit size of 64kb on temp db drive for Azure SQL VM? How do you do this if the drive is ephemeral?
  • 38:00 OMGTiko: You should talk about why the best data warehouse is a lakehouse.
  • 39:29 A discussion about TikTok’s algorithm
  • 40:04 Ristos: I’m getting an error in MySQL, how do I fix it?
  • 40:27 UnderTheC: How do I use Redis to cache data effectively?
  • 41:40 Aidos: When I have different plans for the same query, do you have videos for that?
  • 42:35 Fioma: Can you talk about partitions?
  • 43:42 Leolog: Any recommendations on how to learn architecture, like data warehouses and lakehouses?
  • 44:06 ThreeDaysGrace: Wow, Brent, how are you not losing such inspiration for new technology?
  • 46:09 Gambit: How do you determine the correct amount of TempDB files and their sizes?
  • 47:08 JChandra: It’s been a while since I looked into SQL Server internals. Can you scale it out horizontally these days?

Query Exercise Answers: Why Are These 3 Estimates So Wrong?


This week’s Query Exercise challenged you to figure out why these 3 estimates went so badly:

Even though I’ve got indexes on Users.Location, Posts.OwnerUserId, and Votes.PostId, the estimates were terribad:

I challenged you to explain why each of the estimates went badly, in order.

1. Why was the Users.Location estimate off?

SQL Server opened the Users.Location statistics histogram and went looking for Reading, United Kingdom:
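
If you want to eyeball that histogram yourself, here’s one way – assuming the statistics came from an index named Location on dbo.Users:

    DBCC SHOW_STATISTICS ('dbo.Users', 'Location') WITH HISTOGRAM;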

There wasn’t a dedicated histogram bucket for Reading here (nor is there one in the 2nd 200 that we created in the filtered stats attempt, because Reading isn’t in the top 400.)

The moral of the story here is that in complex, real-world queries, the filtered statistic solution for the 201 buckets problem isn’t enough. Even relatively “small” outliers like Reading run into problems. Now, that one estimation problem by itself wouldn’t be a big deal, but it turns out that Reading is an outlier in another way, as we’ll see next.

2. Why was the Posts.OwnerUserId estimate off?

First, we need to understand where the estimate came from. Let’s hover our mouse over the Posts.OwnerUserId index seek:

I don’t understand what kind of drunken baboon was at the wheel when this tooltip was designed, so forgive me for having to restate the numbers in an order that makes some kind of logical sense:

  • Estimated Number of Executions: 9.9 – because SQL Server thought ~10 people lived in Reading, and we would have to seek in here 10 times to find their 10 different User.Ids
  • Estimated Number of Rows Per Execution: 9.5 – because SQL Server believes that overall, across the entire user population, people who have created Posts have created an average of 9.5 each.

It’s worth stopping here to elaborate a little more on that. SQL Server is not saying, “The people in Reading create an average of 9.5 posts each.” Until runtime, SQL Server has no idea who the people in Reading are – it doesn’t know their User.Ids! It doesn’t know anything special about Reading whatsoever. It’s assuming the people in Reading are all average – and it turns out they are not.

  • Actual Number of Executions: 418 – because when we did the Users index seek, we brought back way more rows
  • Actual Number of Rows for All Executions: 59,391 – and this is where things really go off the rails.

If SQL Server’s estimate that each person produces an average 9.5 posts was correct, then 418 * 9.5 = 3,971. We’d have found about 3,971 posts for the people in Reading. However, that’s still way off, because we found more than 10x that many!

It turns out that some individual people in Reading are outliers, particularly one Jon Skeet. Lil Jon Skeet Skeet, we call him. When he’s not in the club, he’s answering Stack Overflow questions, so he massively inflates the number of Posts.

3. Why was the Votes.PostId estimate off?

Again, let’s understand where the estimate is coming from by hovering our mouse over that operator:

  • Estimated Number of Executions: 94 – because SQL Server thought the 9.9 people in Reading would only produce 9.5 posts each, and 9.9 * 9.5 = 94
  • Estimated Number of Rows Per Execution: 4.4 – because SQL Server believes the average post, from anywhere, receives 4.4 votes

That might be true overall – but if we look at just the posts from Reading:
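
Here’s a sketch of the kind of query that shows it, using the Stack Overflow table names:

    SELECT AVG(VotesPerPost * 1.0) AS AvgVotesPerPost
      FROM
      (
          SELECT p.Id, COUNT(v.Id) AS VotesPerPost
            FROM dbo.Users u
            JOIN dbo.Posts p ON p.OwnerUserId = u.Id
            JOIN dbo.Votes v ON v.PostId = p.Id
           WHERE u.Location = N'Reading, United Kingdom'
           GROUP BY p.Id
      ) AS reading_posts;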

We get an average votes cast of 11.6 – more than double the global average! There’s something about the Reading posts that drives people wild. (Skeet skeet.)

So for the final estimate, Estimated Number of Rows for All Executions = 414, we can see why that’s way off. It’s composed of 3 parts: 10 people in Reading, 9.5 posts each, 4.4 votes per post. All of those component estimates were wrong: there were more people, with more posts, and more votes on each post. The final estimate is 1,453x off.

If you figured out these causes, you should be proud of yourself.

Seriously, take a moment to congratulate yourself on your journey. You were able to read the execution plan, decipher the goofily-displayed metrics, and understand the components that drove each estimate. You understood where the statistics came from, and why even hacked-together solutions for the 201-bucket problem aren’t enough. You understood that joins literally multiply bad estimates.

If you didn’t figure this stuff out because you’re used to working with relatively small data sets, perhaps with nicely evenly-distributed data, you should also breathe a sigh of relief! Your career hasn’t progressed to the point where you’ve had to solve these problems, and it’s gotta be nice to smile and nod from the outside at what the rest of us have to deal with.

When you’re ready to learn more about solving these kinds of problems, check out my Mastering Query Tuning class. We cover lots of estimation problems like these and dig into various ways to fix them.

You shouldn’t jump to that class first: the prerequisites include my Fundamentals of Query Tuning class and Mastering Index Tuning class. I really do get students who jump directly to one of the more advanced Mastering classes, and then come back to me saying, “Uh, Brent, I thought I was experienced enough to jump straight to that, but it turns out I didn’t actually know the fundamentals that I thought I did.”

That’s why most students start with the Fundamentals bundle, then after their first year, step up to the Mastering bundle. Don’t jump straight to the Mastering one if you don’t have a lot of time to spend digging into the classes – start with the cheaper Fundamentals one, and only spend more when you’ve conquered the basics.

See you in class!


Contest: What’s Your Biggest Database Regret?

PAST ME, WHAT WERE YOU THINKING

We all make mistakes.

I certainly have.

I’ll give you a recent one, actually: when we designed the database for SQL ConstantCare®, for storing diagnostic data like wait stats, we thought these 3 columns would be good for identification:

  • user_id
  • server_name
  • instance_name

Because a single user would never send in data from two different servers that share the same SQL Server name and instance name, right? Well, wrong-o, because it turns out:

  • Some companies reuse the exact same server name in different places (like they have SQLPROD1 in NYC, and another one in Atlanta, and another one in Taiwan). Even worse, at some of these companies, even the IP address isn’t unique. Bananas.
  • Microsoft Azure SQL DB doesn’t group together data at the server/instance level – instead, you have to send in data for each database, separately.

Argh. To solve the first one, we would need to add a column for a user-defined server name, like “NYC SQLPROD1”. To solve the second one, we would need to add a column for database name. (You can’t use database ID because that can change when the database is restored.) But in both cases, we’re talking about a seriously widespread change to the database and all of the app code involved, and … I’m not wild about that.

What about you? What has been your biggest regret in your data career? It can be anything from a design mistake, to an architecture goof, to a production oopsie. Share it in the comments this week. The more entertaining the story, and the more I laugh out loud, the better your chances are to win. On Sunday, September 29th, I’ll pick my favorites and award winners:


[Video] Office Hours: Database Q&A


I went live on TikTok and took your top-voted questions from https://pollgab.com/room/brento:

Here’s what we covered:

  • 00:00 Start
  • 00:23 Andrea Hardesty: Do you have any recommendations for reproducing parameter sniffing issues? I have a copy of a production DB. I want @param=222 to use the plan compiled with @param=111 like it did in production. In my test it compiles a new plan for @param=222 every time. Suggestions?
  • 01:37 Eduardo: Is PowerBI a recommended skill for SQL Server DBAs? Do many of the DBAs you encounter have this skill?
  • 03:09 ProbablyOverthinkingThis: My friend and I agree that the overhead of Linked Server queries are “not ideal”. What do you recommend if we have multiple databases hosted by their own application server databases and the need to cross query that data?
  • 05:20 RoJo: We have 1TB DB and we use AG. The Log file is very large (350G), which perplexes us. Not that I need to shrink, but it does not show it would free up anything if I did. Why does SQL keep this so large – or should I care?
  • 06:46 280-2-smallDBA: Query w/high ASYNC_NETWORK_IO. Nested CASE with subQs in cols and Where. Joins are comma sep names. All tbls are local to DB. Other big waits CXPACKET, CXCONSUMER. 1col is NVARCHAR(MAX) always null. Est row size 6kb always 1row. Why is AN_IO big wait? Can CXP and CXC cause AN_IO?
  • 08:32 David Hicks: When looking at possible indexes to drop in order to fix table locks and long-running inserts/updates, is there a particular ratio of inserts and updates to actual usage of an index that you use for a guide, 10:1 for example?
  • 11:45 Jökull: Is there a good way to include the IP address for when SQL Server sends out emails? We have some rogue SQL Servers sending emails and would like to track them down.
  • 13:12 Vishnu: Will stack overflow experience a net benefit or loss from the A.I. revolution?
  • 20:02 Gianis: What’s your opinion of TSQL sprocs that have table value params that are passed in from .NET code? Is this good/bad practice from a performance perspective?
  • 21:19 Perseus: What are your top rules for working from home for a WFH newbie?
  • 22:28 DevInHiding: Hi Brent, sorry, this probably has been asked before, but can you recommend someone who is “the Brent Ozar of data security/encryption”? Thanks so much
  • 23:57 Srinivas: In Azure SQL VM multi tenant DB, is there good way to determine which tenant is utilizing the most disk IO for billing and perf monitoring purposes?
  • 24:54 Ramesh: What do you like about the paid AI chat tools over their freebie counterparts? Does one hallucinate less than others?
  • 26:23 Dan: Hi Brent, what are the top things that you’d like to see added or changed in PostgreSQL to make porting from SQL Server as painless as possible?
  • 27:56 Venkat: Are Microsoft certifications more important / valuable to the SQL Server contractor or the SQL Server corporate DBA ?

Query Exercise: Why Are These 3 Estimates So Wrong?


Our prior Query Exercise introduced SQL Server’s 201 buckets problem: its inability to accurately estimate rows for more than 201 outliers in a table. I followed up with a solution using filtered statistics to help with the next 200 outliers, and I talked about how that’s really overkill for the simple problem we were facing in that initial challenge.

Let’s build upon that knowledge, but use a more complex query. I’m going to keep the filtered statistic in place, which gives me accurate estimates on the number of people in Lithuania, and I’m going to add an index on the Posts table. My query’s goal is to find the top-voted posts (questions & answers) created by people who live in Lithuania:
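
The exact query isn’t reproduced here, but a sketch of its shape looks like this – the TOP count and column picks are my assumptions:

    SELECT TOP 200 p.Title, p.Score
      FROM dbo.Users u
      JOIN dbo.Posts p ON p.OwnerUserId = u.Id
     WHERE u.Location = N'Lithuania'
     ORDER BY p.Score DESC;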

The query’s actual execution plan in the biggest Stack Overflow database, on SQL Server 2022, using 2022 compatibility level:

Let’s zoom in to read the estimates from right to left, top to bottom:

They’re great! The top right index seek on Users.Location accurately estimated that 2,554 users live in Lithuania. Then, it joined to Posts on OwnerUserId, and its estimate that the Lithuanians had created 24,316 posts was really close to accurate!

Granted, there’s a yellow bang on the plan because SQL Server didn’t grant enough memory. The sort spilled thousands of pages to disk, but if you run the query again on SQL Server 2022, adaptive memory grants start taking care of that problem for you automatically.

Try another location and add another join.

I hear Berkshire in the UK is a scenic place, and let’s also sort their posts by which ones have had the most votes cast overall. I’ll even have an index to support that, too:
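
A sketch of what that index and query might look like – again, the names and the TOP count are my assumptions:

    CREATE INDEX PostId ON dbo.Votes (PostId);
    GO
    SELECT TOP 200 p.Title, COUNT(v.Id) AS VotesCast
      FROM dbo.Users u
      JOIN dbo.Posts p ON p.OwnerUserId = u.Id
      JOIN dbo.Votes v ON v.PostId = p.Id
     WHERE u.Location = N'Reading, United Kingdom'
     GROUP BY p.Id, p.Title
     ORDER BY VotesCast DESC;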

The query’s actual plan has some problems:

ENHANCE! Let’s zoom in on the top right and read from right to left to see where things went wrong:

What happened, in order:

  1. SQL Server seeked on Users.Location to ‘Reading’ and expected to find 10 rows – but it actually found 418
  2. SQL Server seeked on Posts.OwnerUserId to find those 418 people’s posts, expecting to only find 94 (because there were so few estimated people in Reading), but it actually found 59,391
  3. SQL Server seeked on Votes.PostId to find the votes cast on those posts, expecting to only find 415 (because there were so few expected posts), but it actually found 603,085

Your exercise includes a few parts this time.

Let’s play fill-in-the-blank. Your comment needs to include 3 answers:

  1. What caused the estimation on Users.Location = ‘Reading, United Kingdom’ to be wrong?
  2. What caused the estimation on Posts.OwnerUserId to be wrong?
  3. What caused the estimation on Votes.PostId to be wrong?

I’m not even asking you to solve any of them – just conceptually put together the pieces about why each estimate failed.

If you do wanna try to fix any of the estimates – and again, you don’t have to, that’s just icing on the cake – then put your queries in a Github Gist and the query plans in PasteThePlan, showing your new accurate estimates, and include those links in your comments. Check out the solutions from other folks, compare and contrast your work, then check out my post with the answers. Have fun!


PASS Data Community Summit Prices Are About to Go Up!


Next Tuesday, pricing on the 3-day tickets for the PASS Data Community Summit goes up to $2,295.

But if you register right now, plus use coupon code BRENTO24, it’s just $1,945.

It’s one in-person conference that covers Microsoft SQL Server, Azure, PostgreSQL, Snowflake, Oracle, and more. So many of us (me included!) are working with multiple databases these days, and it’s hard to find a single event with this kind of coverage.

I’m also teaching a 1-day pre-conference workshop on Monday, Tuning T-SQL for SQL Server 2019 and 2022. That’s an additional $595, and you can also sign up for another pre-conference workshop on Tuesday. There are great options in there including Query Quest with Erik & Kendra, Microsoft Fabric in a Day, PostgreSQL Fundamentals, Power BI Architecture, and a SQL AI Workshop run by Microsoft folks.

Go register now, and I’ll see you in Seattle!


[Video] Office Hours Q&A: Abrupt Exit Edition


Last week, we had an abrupt intro, and this week, we have an abrupt ending when my camera overheats, hahaha:

We went through your top-voted questions from https://pollgab.com/room/brento. Here’s what we covered:

  • 00:00 Start
  • 00:43 cha2rg: We have an OLTP table size of 4TB and rows of 6 billion. Scheduled an old data deletion job for this table and it deleted 5 million rows (Duration 1h, 500 records batch) per day. During this data deletion period does it need to re-build the index for better performance?
  • 02:10 SSASsy_DBA: We are using SSAS in multi-dimensional mode. Our DW=10 TB and SSAS Cube=900GB. Looks like this SSAS technology is being phased out by Microsoft and we are struggling to find expertise on this. What would you consider as good technology for pre-crunching OLAP style data like SSAS.
  • 04:13 RollbackIsSingleThreaded: Hi Brent, What is your opinion about Google delisting due to AI-generated content ?
  • 06:30 Punjab: What’s your opinion of the new JSON data type in Azure SQL DB? Will this be popular enough to make it to boxed SQL Server?
  • 07:03 Sean: Two databases (joined queries across) one Latin1_General_100_CS_AS the other SQL_Latin1_General_CP1_CI_AS: How will performance in tempdb be affected if I match the two so they both use Latin1_General_100_CS_AS even though tempdb will still be in SQL_Latin1_General_CP1_CI_AS?
  • 08:12 Qimir: What’s your opinion of the new Azure Database watcher for Azure SQL DB?
  • 10:31 MyTeaGotCold: Do you find many shops that should have gone to fourth normal form, but stopped at Boyce–Codd normal form?
  • 13:02 RoJo: While you did answer this previously, can you elaborate on the dangers of removing ‘dead code’ or stProc that have been inherited. There seems to be an inherent desire to clean out regardless of cost.
  • 15:10 Christos Mavroidakos: Hi Brent, We have an issue with Auditing in Azure SQL database. We need to exclude certain tables from auditing and avoid all the junk audit records that get produced. This skyrockets the cost, and the actual auditing is not efficient. Is there a way?
  • 17:27 Aaaaall Night Lo(n)g: TopicCategories is a many to many table. Every topic has a primary category. Would you put PrimaryCategoryId on the Topics table or IsPrimary on the TopicCategories table? 5 of one or half of ten of the other? Or is there a clear winner for most cases?
  • 18:29 TeaEarlGreyHot: When performing test restores, is it necessary to test the log backups, or only fulls and differentials?
  • 19:33 YouTubeFreeLoader: I occasionally see some statistics for a primary key column that have an equal rows value of over 800 but all of the rest of them are 1 like I’d expect. When I run a select for that value the estimate is still 1 in the plan. Any idea what causes this? Is it a problem?
  • 20:36 Markive: We have a SAAS application, each customer has 1 SQL server database per ‘project’. Only our application makes queries to the SQL instance. Does it make sense to isolate each customer with a separate user account, or are the overheads not worth it in your opinion?
  • 21:53 Mustafa: What is your opinion of Sios Data Keeper for SQL Server?
  • 22:58 Gerardo: Have you always been a segmented sleeper or did it start at a certain age? Many Spaniards are segmented sleepers.
  • 24:44 hoping-to-retire-soon-dba: Hi Brent – I was just reviewing issues with index maintenance and see where allow_row_locks and allow_page_locks should be set to “ON” for standard edition SQL Server 2016+. What’s the skinny on these settings? Are there any downsides to setting them on?
  • 27:13 DBA G: Hi Brent. When a recurring agent job overruns past the time of the next scheduled run (e.g. runs every 15 mins but took 16) it skips a run and waits until the next time it should run. Is there any way of detecting / alerting on skipped runs? I cannot see anything obvious. Thanks!
  • 29:10 iloatheesributlovesql: Have you ever worked with ESRI Geodatabases on SQL Server or PostgreSQL, any advice at all?
  • 29:32 i_use_uppercase_for_SELECT: Is there a situation where you’ve ever suggested taking logic out of SQL and putting it in the application for performance reasons? What about the logic made you push for that?
  • 31:30 Sigurður: When can we look forward to the first Ozar PostgreSQL recorded training classes?
  • 31:46 Ben: Version 1: Evaluate a CTE, insert into a table, and then select top 101 from the table. Version 2: Select top 101 from the CTE and then insert into a table. Why is version 1 generally better?
  • 34:25 WouldLikeToKnow: I have custom SQL code in a SP that I would like to protect from being viewed by people. Creating a SP “with encryption” offers some protection but there are still ways to view it. Would storing the custom SP code in a file or another way you suggest offer more protection?
  • 35:41 JordanT: My friend has several objects (Quotes/Orders, …) with different active statuses and 1 or 2 non-active statuses. When active records represent less than 5% of all records, what are the performance trade-offs using a standard index, filtered index and indexed view?

Geographic Sharding is Usually a Pretty Bad Idea.


Let’s say you built a web app and it became globally popular. You’ve got users scattered across North America, Europe, and Asia.

In theory, you could shard your data globally: build 3 sets of SQL Servers, one on each continent, and split up their data.

In reality, this usually creates more problems than it solves.

Users are mostly active in their own time zones. So your North American database servers will be getting hammered while your European and Asian database servers are sitting around bored. That’s a great thing if you need to plan database maintenance, but it’s a terrible thing if you’re paying a ton of money for hardware & licensing that’s sitting around unused at night.

You have to figure out where to get a user’s data. In simple tutorial examples, posts usually hand-wave away this complexity, but it’s much, much harder than it looks. Just because someone’s web traffic looks like it’s coming from a European IP address doesn’t mean it’s a European user. At login time, you’re going to need to check against a centralized list of clients – and that means you need to keep that list in sync across all continents. When a new user is added, you’ve got something akin to a database transaction to be coordinated across that central list, plus other continents. It’s doable – but it’s much harder than tutorials suggest.
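
To make that concrete, here’s a minimal sketch of what such a centralized directory might look like – the table name, columns, and shard codes are all assumptions, not a prescription:

    CREATE TABLE dbo.UserDirectory
    (
        UserId       BIGINT        NOT NULL PRIMARY KEY,
        EmailAddress NVARCHAR(320) NOT NULL UNIQUE,
        HomeShard    VARCHAR(10)   NOT NULL  /* e.g. 'NA', 'EU', 'ASIA' */
    );

Every continent needs an in-sync copy of this list (or a fast way to query a central one) just to route a single login.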

Users will move. For example, I lived in Iceland for 9 months, with an Icelandic address. I was subject to the GDPR while I lived there. I couldn’t believe how much my web experience changed – sites like Facebook were suddenly so much more polite about asking for permissions for things. How long should a site wait before moving my data to a different continent? How long is it legally allowed to wait, for data residency purposes? How will you automate this process, because with a global app so big that it needs multiple sets of database servers, you can’t wait around for meatbags to push buttons to do this maintenance task.

Managers will want reports across continents. I’m not just talking about your customers, either: your own company’s internal managers will want to see near-real-time reports in a single place that sum up global activity. You’re either going to need to build a reporting layer to query multiple continents and combine the data, or use a data warehousing process to periodically merge the data across multiple continents.

Even users want to see data across continents. If you’re building a global social media network, for example, then every time @Cristiano posts a photo, his 634 million followers around the world are going to want to see it, immediately. Where do you store his data & photos?

This stuff is all doable, and yes, companies do it.

But when they do, they dedicate entire teams of developers to solving the problems listed above. This isn’t something you wanna do over a weekend with a pizza and a case of beer.

If you’re not ready to commit entire teams of people to solving these problems, do the obvious stuff first: performance tune your existing database servers. And of course, you can learn how to do it, and if you can’t wait for that, you can hire me for quick pain relief. Both of those options are way cheaper, faster, and easier than geographic sharding.