
As a SQL Server DBA, I Was Surprised by Postgres Backups.

I’ve worked with Microsoft SQL Server for so long that I just kinda took backups for granted. We run a backup command, and SQL Server:

  1. Logs activity to the transaction log while the backup is running
  2. Reads the exact contents of the database files, and writes them out to a backup file
  3. When it’s done, it also includes the transaction log activity from the backup window, so the data files plus the transactions that happened during the backup combine into a single point-in-time version of the database

All without stopping transactions or closing the data files. It’s pretty nifty, and it works really well. The good part is that it’s very efficient at backing up the entire database, and restoring the entire database to a single point in time.
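
If you’ve never typed it out, here’s what a minimal version looks like – a sketch, with the database name and file paths as placeholders:

    -- Full backup: reads the data files while transactions keep flowing
    BACKUP DATABASE StackOverflow
        TO DISK = N'D:\Backups\StackOverflow.bak'
        WITH COMPRESSION, CHECKSUM, STATS = 10;

    -- Log backups layer on top of the full for point-in-time recovery
    BACKUP LOG StackOverflow
        TO DISK = N'D:\Backups\StackOverflow_log.trn';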

The drawback is that it’s impossible to restore a single object from backup, by itself. Oh sure, we’ve complained about it for years, and it’s the #2 top voted feature request, but it doesn’t seem to be happening anytime soon. We’ve learned to work around that by restoring the entire database somewhere else, and then extracting just the data we need.

PostgreSQL Backups are Totally Different.

One of the gotchas of Postgres is that there are a million different ways to accomplish any task. You could stop the database service and get file-level backups, but of course that’s a bad idea for production databases. You could install extensions like Barman to automate backups for you, and in many cases that’s a great idea for production databases. However, we’re going to focus on the built-in way that most shops start with.

When you back up a database with pg_dump, it actually generates a text file with statements like CREATE TABLE and COPY (or INSERT, if you ask for it) that reconstruct the data from scratch.

I’ll give you a moment to re-read that sentence.

At first, you’re going to be horrified, but give it a second and open your mind.

Sure, there are drawbacks: if you work with databases over a few hundred megabytes in size, it probably horrifies you to think about a text file that large. No worries: you can tell pg_dump to compress (zip) the output into a custom file format as it goes. Another drawback is that there’s no such thing as combining transaction log backups with this – if you want to get point-in-time recovery, you’re going to need a better solution than pg_dump.
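
To make that concrete, here’s a rough sketch of both flavors – the database and file names are placeholders:

    # Plain-text dump: a script of CREATE TABLE and COPY/INSERT statements
    pg_dump mydb > mydb.sql

    # Custom-format dump: compressed as it goes, restorable selectively
    pg_dump -Fc -f mydb.dump mydb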

However, pg_dump has some pretty intriguing advantages – for starters, table-level restores. The pg_restore documentation page has all kinds of switches for just restoring one table, or only restoring the data (not the schema), or just restoring specific indexes, or more.
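
Here’s a hedged sketch of what that selectivity looks like – the table and file names are placeholders:

    # See what's inside the backup before restoring anything
    pg_restore --list mydb.dump

    # Restore just one table's data into an existing database
    pg_restore -d mydb --data-only --table=users mydb.dump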

Speaking of indexes, get a load of this: because pg_dump is only backing up the data, indexes don’t bloat your backup files. You could have 50 indexes on a table, but that index data itself isn’t getting backed up – only the index definition, aka the CREATE INDEX statement! At restore time, pg_restore reads the backup file, loads the data back in, and then, once it’s all there, runs the CREATE INDEX statements necessary to re-create your indexes. Your database can be partially online while the indexes are re-created. (Is this restore strategy going to be faster? Probably not, and I’m not going to test it, but it’s wild to know.)

But here’s the part that’s really going to blow your mind: since the dump file is just a list of commands, it’s technically possible to restore a Postgres database back to an earlier version. Take a plain-text backup (or convert the existing one to plain-text format), and then execute the commands, looking for any errors caused by newer engine features. Edit the backup file to remove the unavailable features in your older version, and then try again.
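
Roughly like this, assuming a custom-format backup named mydb.dump:

    # Convert the custom-format backup into an editable plain-text script
    pg_restore -f mydb.sql mydb.dump

    # After trimming out the newer-version features, replay it on the old server
    psql -d mydb -f mydb.sql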

Which Approach Is Better?

Microsoft’s approach focuses on backing up (and restoring) the exact data file contents at extremely high speed, shoving the data out without concern for its contents. Properly tuned, it’s fast as hell. I’ve had clients who regularly backed up and restored 1-5TB databases in just 5-20 minutes. That’s useful when you’ve got very short SLAs – although you’ll want fast, dedicated storage, not shared storage, to hit those numbers.

Microsoft’s integration with the transaction log also means the full backup is a building block for much more: you can integrate it with log backups, or use it to seed transaction log shipping, database mirroring, or Always On Availability Groups. There’s just one backup approach in SQL Server, but it has all kinds of flexibility that Microsoft has built up over the decades.

On the other hand, there are a bunch of different ways to back up Postgres databases. If you choose the pg_dump approach, it also lends itself to all kinds of creative use cases right out of the box. The more I played with it, the more amused I was at its capabilities. For example, backing up data from AWS Aurora Postgres and restoring it to my local Postgres instance was a no-brainer. The fact that one was a platform-as-a-service database in the cloud, and the other was an on-premises database, simply didn’t matter – something Azure SQL DB just can’t pull off, even though Microsoft manages the whole code stack.

Microsoft has just one backup tool, and it works really well – as long as you don’t need to do something unusual, like restore a single table or downgrade versions. Postgres has lots of backup tools that have more flexibility and power overall – buuuut, it’s up to you to pick the right one and then configure it in a way that supports your RPO/RTO.


Free DBA Job Interview Q&A Course This Weekend


If you’re facing an upcoming SQL Server DBA job interview, or if you’re a manager who has to interview DBAs, I want to help. This weekend, I’m giving away my DBA Job Interview Questions & Answers course absolutely free!

Just hit up my training product page, find the DBA Job Interview course, add it to your cart, and check out with coupon code TikTok.

In that course, I give you questions for core DBAs, infrastructure, development, screenshot-based questions, open-ended questions, and more. I read the question, then I pause to let you answer it, and then I tell you what I was looking for in a great answer.

I did a quick live version of the first few questions if you’d like a feel for it:

DBA Job Interview Course Promo

I hope it helps make the DBA job interview process suck a little less for everybody involved. Cheers!


[Video] Office Hours with a Special Guest: Count Distinct


I was busy, so I asked a friend to fill in for me and answer your top-voted questions from https://pollgab.com/room/brento. He did a pretty good job:

Office Hours with Count Distinct


Here’s what we covered:

  • 00:00 Start
  • 00:52 Confused: Who uses differential backups, really? I don’t get the point.
  • 02:20 Chris: Are third-party tools a necessity or a luxury for managing SQL Server?
  • 03:26 SwissDBA: Statistics can only store 8kb of data, but often it would be useful if it could store more info about that table. Can we make stats bigger, and would this be a good idea to do?
  • 05:29 Stockburn: Hi Brent, any advice on running sp_blitzindex against a 1TB db with over 60000 tables and over 130000 Indexes. I have tried but it never finishes. Old Microsoft Navision environment. As always love what you do, cheers!
  • 07:06 Pat: Which team should the DBA be in: devs, sysadmins, devops, or something else?
  • 08:06 Indara: Is query store beneficial / necessary if you have third party SQL monitoring software?
  • 08:52 Tobin: Does Microsoft ever watch office hours / read your blog? Any resulting changes?
  • 10:52 DBANoob: Is it possible to perform transactional replication from one HA listener to another HA listener? Before you make fun and ask why, just know we are aware of how crazy this may sound and I don’t have enough characters to explain why we need to do this. Appreciate any input on this!
  • 11:29 TheyBlameMe: Online index creation on a big table in primary DB causing transaction log to max out at 100% due to slow transfer over wire to an AlwaysOn ReadOnly replica in a different geo-location. How can this be better managed, mitigated? No hickups primary, online index duration secondary
  • 12:33 End Try Begin Cry: We need to test our response to checkdb finding corruption. Is there a way to intentionally corrupt a database in various ways?
  • 13:22 FloydianDB: How can I convince our head of the company that adding columns in a certain position in the table is a bad idea. He won’t listen and I’m tired of doing create/drop table statement and re-adding the data back in.

Query Exercise: What Makes SELECT TOP 1 or SELECT MAX Different?


These two queries both get the same answer from the Stack Overflow database:
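
(The original post showed the queries as screenshots; here’s a reconstruction, sketched from the LastAccessDate column discussed below:)

    SELECT TOP 1 LastAccessDate
      FROM dbo.Users
     ORDER BY LastAccessDate DESC;

    SELECT MAX(LastAccessDate)
      FROM dbo.Users;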

But do they go about their work the same way? As it turns out, no – even with no indexes at all, the two queries get different execution plans, and one of them is faster than the other:

On my particular server, with my server-level settings, database settings, and table structure, MAX runs faster and uses about 15% less CPU time to accomplish the same result.

But read these instructions carefully, because this week’s Query Exercise is not about making them faster! I wanna emphasize that really clearly because I know I’m gonna get a bunch of people who yell “create an index” in the comments.

Your challenge is to find things that will change their performance, things that will cause them to produce even more different execution plans. I’ll give you an example: let’s create an index that does not lead with LastAccessDate, and then try the queries again:
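
The index itself was a screenshot, so this exact definition is a stand-in – the point is just that LastAccessDate doesn’t lead the key:

    -- LastAccessDate rides along in the include list instead of leading the key
    CREATE INDEX IX_Reputation
        ON dbo.Users (Reputation)
        INCLUDE (LastAccessDate);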

The new actual execution plans:

TOP 1 uses a sort, burns 3.3 seconds of CPU time, runs in <1 second, and goes parallel.

MAX uses a stream aggregate, burns 1.5 seconds of CPU time, but stays single-threaded, so it takes longer.

If you wanted to call a winner by lower CPU time, it’d be the MAX, but if you wanted lower execution time, it’d be the TOP 1. Of course, you could change those queries further in order to get a different winner, but I just wanted to start with an example to show you the kinds of influencing factors we’re looking for.

Again, your challenge is to find things that change their performance. I’m doing this because I saw someone say online that MAX is always faster, and that’s just not even close to the truth. There are many, many factors involved in affecting whether TOP 1 or MAX is best for a particular use, and your challenge this week is to find as many of those factors as you can. Absolutely anything is on the table! You can change the server settings, database settings, the table itself, and the query.

Put your queries in a GitHub Gist and their actual query plans in PasteThePlan, and include that link in your comments. Check out the solutions from other folks, and compare and contrast your work. Then, read my thoughts in the answer post. Have fun!


[Video] I Must Be an Idiot: Automatic Tuning Never Works for Me.


I don’t get it. I’ve given this feature one chance after another, and every time, it takes a smoke break rather than showing up for work.

The latest instance involved the recent Query Exercise where you were challenged to fix a computed column’s performance. In the comments, some folks noted that performance of the query was actually great on old compat levels, like SQL Server 2008, and that it only sucked on newer compat levels like 2016 and later.

That would be the perfect use case for Automatic Tuning, I thought to myself! I’ve said this so many times over the last five years, but I’m an endlessly hopeful optimist, as anyone who knows me well would never say, so I gave it another chance.

We’ll set up the same user-defined function and computed column described in that blog post:

We’ll set up an index on the Reputation column, and a stored procedure that’ll benefit from using that index:
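
The code here was a screenshot; a sketch along these lines, with the procedure name as my stand-in:

    CREATE INDEX IX_Reputation ON dbo.Users (Reputation);
    GO
    CREATE OR ALTER PROCEDURE dbo.usp_TopUsersWithValidUrls AS
    BEGIN
        SELECT TOP 100 Id, DisplayName, Reputation
          FROM dbo.Users
         WHERE IsValidUrl = 1
         ORDER BY Reputation DESC;
    END;
    GO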

We’ll put our database in old-school compat level, and turn on Query Store to start collecting data at a frantically quick pace:
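
Again sketched, standing in for the screenshot:

    ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 100;  -- 2008-era

    ALTER DATABASE StackOverflow SET QUERY_STORE = ON;
    ALTER DATABASE StackOverflow SET QUERY_STORE
        (OPERATION_MODE = READ_WRITE,
         QUERY_CAPTURE_MODE = ALL,
         INTERVAL_LENGTH_MINUTES = 1);  -- collect stats frantically quickly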

We’ll run the query a few times, noting that it runs blazing fast, sub-second:

And check Query Store’s Top Resource Consuming Queries report to show that the plan is getting captured:

Now, let’s “upgrade” our SQL Server to the latest and “greatest” compatibility level, and turn on Automatic Tuning so that it’ll “automatically” “fix” any query plans that have gotten worse:
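
Something along these lines:

    ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 160;  -- 2022
    ALTER DATABASE StackOverflow
        SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);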

Now, when we go to run our query again, it takes >30 seconds to run, burning CPU the entire time – and zee Automatic Tuning, it does nothing. Query Store shows that there’s a new plan, and that the runtime is way, way worse:

But sys.dm_db_tuning_recommendations shows nothing, even though Query Store is on and so is “Automatic” “Tuning”:
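
The check itself is just:

    SELECT * FROM sys.dm_db_tuning_recommendations;  -- returns zero rows here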

To see it in inaction, here’s a live stream:

Automatic Tuning in Inaction

I don’t get it. I’ve given this feature so many chances, and it’s never kicked in for me at the right times. I’m guessing I’m missing some secret set of steps I need to take, but whatever it is, it’s beyond me. Maybe you can get it to work in this scenario, and share your magic? Now’s your chance to make me look like a fool.

Well, I mean, like even more of a fool.

Update Aug 7 – in the comments, Uri reminds us of Kendra Little’s post about problems with automatic plan correction. She’s frustrated with it as well.

Update October 16 – Erik Darling gave it a shot too and video’d his efforts. He got it working, but the query had to run poorly ~20 times for Automatic Tuning to wake up:

Another Video For My Friend Brent About Automatic Tuning

He’s using the much smaller StackOverflow2013 database, so he could actually get 20 runs to finish – whereas in the current StackOverflow databases, the video would have been a whoooole lot longer before Automatic Tuning would have kicked in. He also had some problems with SQL Server abandoning automatic tuning, reporting that the query was error-prone.


Who’s Hiring in the Microsoft Data Platform Community? August 2024 Edition


Is your company hiring for a database position as of August 2024? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d put an email address, because then you’ll know which applicants are readers here, and they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.


[Video] Office Hours in Cabo San Lucas, Mexico


What a comfy morning to sit down with a breakfast margarita and tackle your top-voted questions from https://pollgab.com/room/brento.

Office Hours: Cabo San Lucas, Mexico

Here’s what we covered:

  • 00:00 Start
  • 00:36 DBA_Mufasa: When we restore a DB from a prod source to a different server, does the DB come with the index usage stats from the original source server or do they get reset on the destination server once the DB is restored. Trying to figure out if people are using a daily restored DB in Dev.
  • 01:44 Trushit: I have a stored procedure that uses XML path to create the XML tree. I want to store this in variable. However, stored procedure returns a column with a link to XML tree. Only when I click on the link, full tree is visible. Any ideas how to access the entire XML tree?
  • 02:43 MyTeaGotCold: I like your arguments for always setting fill factor to 100. But do the same arguments mean that I should default to using DATA_COMPRESSION = PAGE on my rowstore indexes?
  • 04:02 Rose Noble: You’ve mentioned your aversion to linked server queries. Are linked server sprocs acceptable?
  • 04:49 Genuinely Curious: We’re using Standard Edition at the moment. When should I start thinking about Enterprise Edition?
  • 05:27 Yord: What are the top issues you see for rolling out read only AG replicas?
  • 06:32 Pink Pony: What’s your opinion on db setting “auto update statistics asynchronously”? Should we change value to True?
  • 07:49 Miles: Hi Brent, When we’ve resource intensive queries, which ones one should focus on first? high I/O or high CPU or head blocker queries or parallelism or based on waits or user complained queries? Please suggest? Whats the approach one should follow?
  • 08:44 Genuinely Curious: If Azure SQL DB Managed Instances are as good as SQL Server, but require less maintenance, why would people keep using SQL Server?
  • 10:51 TJ: Redo thread on secondary replica tends to get blocked by SELECT queries which in turn block other queries. We have to currently kill the select queries to resolve blocking. Is this the correct way to handle this situation?
  • 12:26 SQL_Stormlight: A friend needs to design a DR solution across datacenters for 2300+ dbs. It /seems/ like the best option is a FCI but how do they deal with the shared storage? Would SAN replication get them further than clustering or are the two used in tandem?

Free Online SQL Server Internals Conference Next Week


The EightKB conference is a free event that focuses on SQL Server internals.

You can get the session details, the full session abstracts, and registration for the August 8th event at eightkb.online. Each session listing there includes a calendar invite so you can block out your calendar.


Thoughts About Stack Overflow’s Annual Developer Survey


Every year, Stack Overflow runs a developer survey about technology, work, community, and more. This year’s results include 65,437 responses from developers around the world.

The results are biased towards the kinds of developers who use Stack Overflow – 76% of the respondents reported that they have a Stack Overflow account. I would guess that it’s nowhere near a perfect picture of all developers worldwide, but let’s just focus on the fact that it does represent how over 65,000 developers feel – and that alone is useful enough to give you a picture of what’s happening in at least a lot of shops worldwide.

The most-used databases were PostgreSQL, MySQL, SQLite, and Microsoft SQL Server, in that order:

When Jeff Atwood and Joel Spolsky first started Stack Overflow, Jeff did a lot of evangelization work for it, and Jeff’s audience was heavily biased towards .NET development. I would imagine that’s part of why SQL Server is by far the highest-paying database in the list (as opposed to the open source options).

I have a total blind spot around SQLite, but often on my TikTok videos when the discussion of licensing costs comes up, some commenters will ask why everyone in the world doesn’t use SQLite. I’m sure there are a lot of apps worldwide that simply don’t need a database server, just small local relational storage, and I can understand why those developers would have a similar blind spot about what it’s like to handle concurrent load in an enterprise-wide ERP app or e-commerce store.

The next chart, admired-vs-desired, doesn’t make sense to me and I don’t trust the numbers:

From what I can tell, the blue scores indicate how much the respondents have worked with the database in the last year, and red scores indicate how much they want to work with it next year? I’m pretty confused by this one. Are the red scores exclusive to only the people who actually worked with the technology this year – meaning, out of the 15.4% of the audience that worked with SQL Server in the past year, 54.5% of them want to work with it next year?

And why don’t these numbers come anywhere near agreeing with the prior question? The prior question says 25.3% of the audience used it last year, but the admired-vs-desired question says only 15.4% did? I’m so lost. At first I thought this question was talking about “extensive” development work, whereas the first one might just be ANY database work – but the numbers don’t make sense there either, because Supabase scored 4% on the first question (any work), but 5.9% on this question (extensive work). Both questions use the term “extensive.” I’m lost.

So yeah, I just discarded that question and didn’t bother to think about the results. It doesn’t make sense, so I don’t trust it.

There was also a question about which cloud provider folks used, and AWS dominated the market:

That’s been my experience too – the vast, vast majority of my cloud clients are on AWS – but I’m mentioning it here because I know that Azure users really seem to believe Azure’s the only game in town. When I talk to Microsoft MVPs, they seem dumbfounded that companies are actually using AWS extensively, and they also seem surprised that Microsoft is competing with Google for second place. (Google’s been throwing a lot of discounted/free compute power at prospective clients to win them over.)

There’s a lot more stuff in the overall results, especially the workplace trends section that talks about employment status, hybrid/remote/in-office, and salary. When you’re looking at each workplace graph, make sure to click on the geographic filter at the top of that graph so the numbers will be more meaningful to you, based on where you’re located.


[Video] Office Hours in a Mexican Hot Tub


While taking a dip in Cabo, I went through your top-voted questions from https://pollgab.com/room/brento.

Office Hours in Cabo San Lucas

Here’s what we covered:

  • 00:00 Start
  • 01:00 OpportunityKnocking: What are your thoughts on when to implement a NoSQL strategy over traditional RDBMS for large enterprise-wide database platform solutions? I see scalability advantages in using NoSQL but do you see this becoming more of a preferred go-to solution for organizations over time?
  • 01:57 SwissDBA: In a table with 8M rows, 2 GB data + 4GB indexes, Clustered ix on a GUID column. INSERTs are slow bc SQL has to squeeze them in between existing rows. I would switch the clustered ix to a new IDENTITY column, so new records can be added at the end of the tbl. How would you do it?
  • 03:01 MyTeaGotCold: You recently said that your Hekaton clients are trying to get off. The faults of Hekaton are well-known, so what changed between when they started using it and now?
  • 03:54 Tim Rogers: 90 TB database in AWS, all data active. Tables partitioned across 100 filegroups (4 files each) for ease of index maintenance. Would like to consolidate to a single “data” filegroup with ~100 files, because we don’t do anything that benefits from multiple filegroups. Thoughts?
  • 06:30 Kevin M: Would like to learn Amazon Aurora despite current company not using Amazon stack. What’s the best / cheapest way to learn Amazon Aurora?
  • 07:47 SadButTrue: Hey Brent! Who’s the “Brento” in Postgres community?
  • 08:26 RadekG: When was the last time that you faced a query so poorly written that you had to have a drink before fixing it? What was so bad about it?
  • 11:19 Jökull: What are the top issues you see when storing / searching XML/JSON in SQL Server?

Query Exercise Answer: Fixing a Slow Computed Column

In last week’s Query Exercise, we added a user-defined function to the Users table to check whether their WebsiteUrl was valid or not. I noted that even with an index on Reputation, SQL Server 2022 simply ignored the index, did a table scan, and spent 2 minutes of time calling the user-defined function on a row-by-row basis.

First off, a disclaimer: I didn’t write that exercise with a goal of showing the difference between old & new SQL Server versions, between old & new compatibility levels. However, several folks in the comments said, “Hey, I’m on an older version, on an older compatibility level, and the query is blazing fast already. What do I need to fix?” That’s a good reminder of what work you need to do before you go live on SQL Server 2022 (or any new version.) Moving on.

One Fix: Persisting the Computed Column

One way to fix it is to drop the computed column, then create it again with the PERSISTED keyword:
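
The code was a screenshot; it went something like this, with dbo.CheckValidUrl standing in as a hypothetical name for the exercise’s function:

    ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

    ALTER TABLE dbo.Users
        ADD IsValidUrl AS dbo.CheckValidUrl(WebsiteUrl) PERSISTED;
    -- Fails: the non-schemabound scalar function counts as non-deterministic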

That way, SQL Server computes each row’s contents just once, and then persists it in the table itself, instead of running the function every time we query it. By executing the above code, we get:

Wait, what? As a reminder, here’s the contents of our function:

HOW THE CELKO IS THAT NOT DETERMINISTIC? What, does string comparison change from time to time? Books Online’s article on determinism says “All of the string built-in functions are deterministic,” but I’m guessing this has something to do with a possibly changing collation, and I couldn’t care less about digging further. That’s left as an exercise for the reader. Moving on.

An Actual Quick Fix: WITH SCHEMABINDING

I used to hear people say, “You should put WITH SCHEMABINDING on your scalar functions and they’ll go faster.” Every single time I tried it, it made no difference whatsoever.

Until…last week! In the comments, Ag suggested to add that hint to the function, and it ran rapidly! We’ll remove the existing column, tweak the function – the only change is WITH SCHEMABINDING – and then add it back in:
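
Sketched out below – the function body is a simplified stand-in for the original ChatGPT-generated logic, but the shape is the same:

    ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;  -- if it survived the last attempt
    GO
    CREATE OR ALTER FUNCTION dbo.CheckValidUrl (@Url NVARCHAR(200))
    RETURNS BIT
    WITH SCHEMABINDING  -- the only change
    AS
    BEGIN
        IF @Url LIKE N'http://%' OR @Url LIKE N'https://%'
            RETURN 1;
        RETURN 0;
    END;
    GO
    ALTER TABLE dbo.Users
        ADD IsValidUrl AS dbo.CheckValidUrl(WebsiteUrl);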

Then rerun our query without changes:

Even on a large Stack Overflow database, the query runs in milliseconds and produces a great execution plan:

SQL Server uses the index and only reads a few hundred rows before the work is done. Sure, there’s no parallelism in the query, and the plan properties report “TSQLUserDefinedFunctionsNotParallelizable”, but who cares? The query’s so fast, even with millions of candidate rows, that it doesn’t matter. Good enough, ship it.

My Old Faithful Fix: A Trigger

I know. I know, the word makes you break out in itchy hives, but triggers are a great way to quickly replace computed columns.

Let’s drop the existing computed column, add a new “real” column, configure the trigger to populate it going forward, and then backfill it to set up the existing data:
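
A sketch of the whole sequence, with hypothetical object names:

    ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
    ALTER TABLE dbo.Users ADD IsValidUrl BIT NULL;
    GO
    CREATE OR ALTER TRIGGER dbo.Users_SetIsValidUrl
        ON dbo.Users AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        UPDATE u
           SET IsValidUrl = dbo.CheckValidUrl(u.WebsiteUrl)
          FROM dbo.Users u
          INNER JOIN inserted i ON i.Id = u.Id;
    END;
    GO
    -- The backfill is the slow part: every row, scalar function and all
    UPDATE dbo.Users
       SET IsValidUrl = dbo.CheckValidUrl(WebsiteUrl);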

The first few statements will run instantly (assuming nobody else is locking the Users table), but the last statement will take a minute since it’s actually adding a new value to every row in the Users table, AND it’s calling a scalar function on every row. You could make that faster with set-based operations, but more on that in a minute.

After that, run the query we’ve been trying to tune:

It runs in a couple milliseconds (down from a minute) and uses the Reputation index, as the execution plan shows:

I like the trigger solution for a few reasons:

  • It keeps the function’s logic as-is, which can be helpful in situations where the logic is really complex, time-tested, and labor-intensive to rewrite.
  • It can be implemented really quickly.
  • Queries against the table can now go parallel if they need to.

But there’s a drawback: the trigger’s still getting called on every insert and update. If the workload involves batch jobs to load or modify the table, that’s going to be problematic because they’re going to slow down. Those modifications didn’t call the computed column’s function before since they most likely weren’t reading that column. If our workload includes batch jobs, we’re gonna need a faster solution. We’ll clean up after ourselves before trying the next one:
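
Something like:

    DROP TRIGGER dbo.Users_SetIsValidUrl;
    ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;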

Rewriting the Function

Most of the time when you hear database people complaining about functions, they’re complaining about user-defined functions. For example, in this case, we can get a dramatic performance improvement if we forklift the business logic out of the user-defined function and inline it directly into the table’s definition itself. Here’s the end result:
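
Sketching the equivalent of the screenshot, using the same stand-in logic as above:

    ALTER TABLE dbo.Users ADD IsValidUrl AS
        CONVERT(BIT, CASE WHEN WebsiteUrl LIKE N'http://%'
                            OR WebsiteUrl LIKE N'https://%'
                          THEN 1 ELSE 0 END);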

Note that the logic isn’t identical because I had to tweak it: instead of using IF, I used CASE statements, and there’s no @Url parameter anymore because we’re referencing the WebsiteUrl column directly. The more complex your user-defined function is, the harder this work is going to be. To learn how to do these kinds of rewrites, check out Microsoft’s PDF whitepaper on Froid, the inline function technology they added in SQL Server 2019. That paper gives several specific examples of language to use when rewriting functions to go inline, which is quite nice of them.

Now, when the SELECT runs, the plan looks big:

And while the estimates aren’t good, the plan shape is fine, it uses the index, and it runs in milliseconds.

Eagle-eyed readers will note that I did not use the PERSISTED keyword in the above example. If you add that, the resulting plan shape looks even better because the function doesn’t have to run at read time:

And the query finishes in even fewer milliseconds. However, executing the ALTER TABLE with the PERSISTED keyword will lock & rewrite the table, so it’s a question of whether your workload can tolerate that outage. You could always start with the non-persisted version to quickly make the pain go away, and then later if you still need more performance, take an outage to drop the non-persisted computed column (which will be instantaneous) and then add the persisted one (which will be the opposite of instantaneous.)

Hope you enjoyed the challenge! For more exercises, check out the prior Query Exercises posts and catch up on the ones you missed so far.


Today’s a Good Day to Talk to Your Manager About Disaster Recovery.

Last night, two major IT disasters struck.

If you were affected by one of those outages, you have my warmest virtual hug. At times like this, the stress level can be really tough, and I hope you can take care of yourself. Remember that your own self-worth is not determined by the IT solutions you work on.

If you weren’t affected by one of those outages, it’s a good time to spend an hour writing up a few things:

  • Which of our production services are hosted entirely in a single region, availability zone, or data center?
  • How are we monitoring the status of that single point of failure? If there’s a widespread outage like that, how much time are we going to waste troubleshooting our own services when there’s a bigger problem?
  • When our single-region or single-AZ production services go down, what users/customers would be affected?
  • How will we communicate the outage to those affected users? Can we write that notification ahead of time so that it’s ready to go quickly in the event of the next disaster like this?
  • How much would it cost us (monthly or annually) to add in a second region or availability zone for protection from these kinds of incidents?

Summarize that, pass it up to your manager in writing, and it’ll help them have discussions this morning with their managers and executives. Today, a lot of business folks are going to be asking questions, and having these answers will help get you the resources you want.

(Or, it’ll help you feel more comfortable that the business understands the risks of putting all their eggs in a single basket, and that when that basket breaks, it’s not your fault. You warned ’em, and they chose not to spend the money to double-up on baskets.)


[Video] Office Hours in My Backyard


On a pleasantly mild morning, I sat down on the patio and took your top-voted questions from https://pollgab.com/room/brento.

Office Hours in My Backyard: SQL Server Q&A


Here’s what we covered:

  • 00:00 Start
  • 00:56 MyTeaGotCold: When going in to a database blind, do you worry at all about its compatibility level? I see a lot of unloved databases that are still on 2008’s level, but the tiny potential for breaking changes makes me scared to touch it.
  • 02:01 Vishnu: For boxed SQL, Is it ok for users to create SQL agent jobs that run periodic business logic and/or email end users?
  • 02:59 Karthik: What’s your opinion of live query plan viewing in SQL Sentry Plan Explorer?
  • 03:26 Jessica : I am DBA who previously managed dozens of Azure VMs running SQL Server AGs. After a layoff and new job I’m now managing 1 prod Azure SQL db. Any tips for someone making a transition like this? Every day I’m finding features that I no longer have access to and minimal monitoring.
  • 04:29 Hong Kong Phoey: Currently, when we see a non-clustered index on a given multi-tenant SQL table, we may or may not know why it’s there, who created it, when it was created or which app needs it. What’s the recommended change control process for answering these questions?
  • 05:35 Kevin M: What’s the best place to go for commercial PostgreSQL training?
  • 06:02 NotCloseEnoughToRetirementToStopLearning : Hi Brent Inheriting a VLDB (50Tb) any recommended articles or trainings for working with something this size?
  • 07:05 Mike: Hi Brent! When migrating from SQL Server 2017 to SQL Managed Instance, on MI databases after restore become FORCE_LAST_GOOD_PLAN = ON (On 2017, it was OFF). Do you recommend leaving it ON, or should we turn it OFF on initial stages ?
  • 08:27 Venkat: What’s the most common detrimental complacency you see with SQL dbas?
  • 09:13 Mattia: Can a big analytical query that does many logical reads (compared to the SQL Server RAM) give SQL Server Plan cache amnesia? Or is the Buffer Pool completely separated from the Plan Cache?
  • 09:48 Nickelton: Is it possible to forward select queries to AG secondary replica without changing application side? (connection string etc.) For example running code from SSMS or for legacy applications.
  • 11:22 WB_DBA: My friend suggests creating all indexes on a test database since it mirrors the production database. Is this a good approach?

Query Exercise: Fix This Computed Column.


Take any size of the Stack Overflow database and check out the WebsiteUrl column of the Users table:

Sometimes it’s null, sometimes it’s an empty string, sometimes it’s populated but the URL isn’t valid.

Let’s say that along the way, someone decided to ask ChatGPT to build a function to check for valid website URLs, and then used that code to add a new IsValidUrl column to the Users table (and yes, this is inspired by a real-life client example, hahaha):
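
The function and column were shown as screenshots in the original; this simplified stand-in (the same hypothetical dbo.CheckValidUrl used in the answer post above) captures the shape of it:

    CREATE OR ALTER FUNCTION dbo.CheckValidUrl (@Url NVARCHAR(200))
    RETURNS BIT
    AS
    BEGIN
        IF @Url LIKE N'http://%' OR @Url LIKE N'https://%'
            RETURN 1;
        RETURN 0;
    END;
    GO
    ALTER TABLE dbo.Users ADD IsValidUrl AS dbo.CheckValidUrl(WebsiteUrl);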

The user-defined function isn’t accurate, for starters – it’s letting things through that aren’t valid URLs, and stopping things that are actually valid – but let’s set that aside for a second.

What happens when we try to get the top users by reputation? To give SQL Server the best shot, I’m using SQL Server 2022, with the database in 2022 compatibility level, with an index on Reputation:
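
The query itself, reconstructed as a sketch (the filter on the computed column is my assumption from the plan description):

    SELECT TOP 100 Id, DisplayName, Reputation
      FROM dbo.Users
     WHERE IsValidUrl = 1
     ORDER BY Reputation DESC;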

The actual query plan is deceptively simple, despite terrible performance – the query takes about a minute to run:

Crouching Tiger, Hidden Scalar

WHERE IS YOUR FUNCTION INLINING GOD NOW? I could make movie jokes about this all day. Anyhoo, the plan ignored the Reputation index, did a 2-second table scan, and spent nearly a minute doing the scalar function and the filtering.

To add insult to injury, if you’re going to do 1 minute of CPU work, it sure would help to parallelize that query across multiple cores – but that query can’t get parallelism, as explained in the plan properties:

SpacesNotAvailableEither

Your Query Exercise this week isn’t to fix the accuracy of the function – you can leave it as inaccurate if you like. Your challenge is to have the exact same query run in less than a second. Our goal is to avoid changing application code, and to get a very fast fix in place without blaming the developers. You’re the data professional: be professional.

Put your queries in a GitHub Gist, and include that link in your comments. Check out the solutions from other folks, and compare and contrast your work. After you’ve given it a try, read my thoughts in the following post. Have fun!

Update: please read the post in its entirety, and follow the instructions. Please don’t just throw ideas in there or half-formed T-SQL. For someone to test your work, they need to see your exact work. C’mon, folks – this isn’t a major project, just a single function. Be fair to people on the other side of the screen.


Updated, Larger Stack Overflow Demo Database


Stack Overflow publishes a data dump with all user-contributed content, and it’s a fun set of data to use for demos. I took the 2024-April data dump, and imported it into a Microsoft SQL Server database.

It’s a 31GB torrent (magnet) that expands to a ~202GB database. I used Microsoft SQL Server 2016, so you can attach this to anything 2016 or newer. If that’s too big, no worries – for smaller versions and past versions, check out my How to Download the Stack Overflow Database page.

Some quick facts about this latest version:

  • Badges: 51,289,973 rows; 4.7GB
  • Comments: 90,380,323 rows; 26.1GB
  • Posts: 59,819,048 rows; 162.8GB; 32.7GB LOB – this is where you’ll find questions & answers
  • Users: 22,484,235 rows; 2.6GB; 12.5MB LOB
  • Votes: 238,984,011 rows; 5.9GB – a fun candidate for columnstore demos

As with the source data, this database is licensed under cc-by-sa-4.0:  https://creativecommons.org/licenses/by-sa/4.0/ And to be very clear, this is not my data. The data and the below licensing explanation comes from the Stack Overflow Data Dump’s page:


But our cc-by-sa 4.0 licensing, while intentionally permissive, does require attribution:

Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Specifically the attribution requirements are as follows:

  1. Visually display or otherwise indicate the source of the content as coming from the Stack Exchange Network. This requirement is satisfied with a discreet text blurb, or some other unobtrusive but clear visual indication.
  2. Ensure that any Internet use of the content includes a hyperlink directly to the original question on the source site on the Network (e.g., http://stackoverflow.com/questions/12345)
  3. Visually display or otherwise clearly indicate the author names for every question and answer used
  4. Ensure that any Internet use of the content includes a hyperlink for each author name directly back to his or her user profile page on the source site on the Network (e.g., http://stackoverflow.com/users/12345/username), directly to the Stack Exchange domain, in standard HTML (i.e. not through a Tinyurl or other such indirect hyperlink, form of obfuscation or redirection), without any “nofollow” command or any other such means of avoiding detection by search engines, and visible even with JavaScript disabled.

This will probably be the last database update.

Prosus (a tech investment company) acquired Stack Overflow a few years ago for $1.8 billion. When a company’s founders sell their baby for money:

  • The new owners usually want to make a profit on their large investment, and
  • The new owners rarely share the same goals as the original founders, and
  • Sometimes the new owners spent way, way too much (hi, Elon) and are forced to make tough decisions to make their debt payments and keep the company afloat

So now Prosus wants to earn their $1,800,000,000 back, and they’re looking at the actual product they bought. StackOverflow.com has 3 components[1]:

  1. An online app that gives you good-enough answers, quickly
  2. The existing past answers already contributed by the community
  3. The potential of future answers continuing to go into the platform

Can Prosus compete on #1? No. Just no. Companies like OpenAI (ChatGPT), Google (Gemini), and Anthropic (Claude) simply have a better solution for #1, full stop, end of story. A web site – even a free one – can’t beat ChatGPT’s ability to integrate directly with your development environment, review your code & database, and recommend specific answers for the problem you’re facing. Game over.

Can Prosus compete on #2? No. The existing answers (as of April 2, 2024) are available for free with nearly no restrictions. The horse is already out of the barn. Moving on.

Can Prosus compete on #3? If ChatGPT and their friends win on #1 and #2, then the default place for developers to find answers is no longer the web browser. (It’s ChatGPT or Copilot or whatever). Whatever happens next is going to be intriguing. Today, you and I are conditioned to think, “I’ll post that question on Stack or a forum.” Tomorrow’s developers will not have that same bias:

  • Maybe the dev will prompt ChatGPT, “Can you find me answers online for this?” In that case, the LLM will search the web and summarize – and Prosus won’t stand a chance of convincing the user to post the question at StackOverflow.com.
  • Maybe the dev will open their web browser and ask the question. In that case, the search engine company will try to summarize answers too. These days, both Google and Bing try to avoid landing you on actual web sites, and try to give you the answers on their own pages instead, whether it’s AI-summarized answers or hallucinations or web page summaries next to each site.
  • Maybe the dev will go to the Github repo for the related project, and post a question there.

I don’t see an easy way for Stack Overflow to inject themselves into that workflow in the year 2030. I’m sad about that because I have a long personal history with Stack Overflow. At the same time, I’m also kinda glad that the original founders, employees, and advisors (me included) were able to cash out thanks to Prosus’s $1.8B overspending just before the generative AI boom hit.

Prosus needs solutions fast: Stack is now losing $150,000 per day. Prosus’s 2024 annual reports noted that Stack Overflow had $98M in incoming revenue – but lost $57M. I can understand why managers might flail at a company’s switches and dials trying to find a way to stop the financial bleeding.

One of the dials they’ve been flailing at is turning down community access to the past answer data, aka business part #2. In their minds, they’re trying to stop OpenAI/Google/Anthropic from making so much money on the back of Stack’s answers. Earlier this year, Prosus tried to pump the brakes on providing the data dumps in XML format on a regular basis, and there was some community outrage, so they relented. However, they’re back: last week, Prosus announced they’re limiting access again.

Based on what Prosus is saying in that post, going forward, I don’t think Prosus will approve of me redistributing new data dumps in a database format. I’m not going to waste time or energy fighting that battle – I’d rather they spent their own energy trying to figure out a way to keep StackOverflow.com a viable business concern going forward. Hopefully they find fun, productive ways to do that, ways that bring the community together onto Prosus’s side rather than turning consumers against Prosus.

However, if Prosus management is willing to limit the data dump, then I have a bad feeling that more barriers are coming over the years. Next, they’ll make answers harder to access for people who have an ad blocker, or who aren’t signed in, or who haven’t paid for a “premium” Stack membership. I’m not mad at them about this, because I don’t have any answers to turn the business around either, and I haven’t heard from anybody who does.

You either die a hero or live long enough to see yourself become the arch-enemy.


[1] Technically the company Stack Overflow has a couple other parts: advertising and Stack Overflow for Teams. Both of those business models are at risk due to AI as well. Their other attempts at diversification, like Articles and Jobs and Developer Story, never caught on.


[Video] Office Hours in Bulgaria


I went through your top-voted questions from https://pollgab.com/room/brento while in front of the National Gallery for Foreign Art in Sofia.

Office Hours in Sofia, Bulgaria


Here’s what we covered:

  • 00:00 Start
  • 03:12 MyTeaGotCold: If all of my columns are nvarchar, is there a performance benefit to always wrapping strings in N''? My tests have been inconclusive.
  • 03:50 SadButTrue: Hey Brent, most of our Azure SQL DBs have top wait stats related to parallelism (CX***) and performance is not great. As we cannot modify the cost threshold for parallelism in Azure SQL, what other techniques can we use to reduce the waits associated with parallelism?
  • 05:00 Dom: Hi Brent, I noticed something strange on a SQL Express 2012. I’m looking at the “Visible online” CPU and was expecting to see 4 (as Express limits to 4 cpu or 1 socket) but it shows 8 visible online cpu… Am I missing something or is my SQL Express really using 8 CPU ? Thanks !
  • 05:36 dba jr: hi Brent, in my company users can design any query they want. for example they can choose multi column for order by or in where clause . i mean queries in APP are not fix. but they tell me queries are slow. tables have million rows. how can I handle this.
  • 07:10 DBA in VA: SQL 2019: What causes a query not to use the execution plan I’ve forced in the query store? Isn’t that the whole idea of the forced plan??
  • 08:00 Ruby Sunday: Is creating a NC index that mirrors the clustered index to avoid blocking considered a bad practice?
  • 09:30 TheyBlameMe: Hi Brent. What’s your opinion of this MS recommendation to “prevent lock escalation” for long running batch operations? BEGIN TRAN; SELECT * FROM mytable (UPDLOCK, HOLDLOCK) WHERE 1 = 0; WAITFOR DELAY '1:00:00'; COMMIT TRAN;
  • 10:40 DanishDBA: Hi Brent, my friend needs to create an index on a highly used table on a SQL Server 2019 SE (FCI). On a copy of the db on the same server he knows that it can take up to 4 minutes. The goal is to minimize the impact, as a service window is not allowed. What is the best approach?
  • 11:48 JustWondering: Been suspicious that something “outside” of SS is the problem but don’t know how to prove it. Changed from using JTDS JDBC driver to MS JDBC driver and saw 40+% improvement in runtimes. How could I find what the MS driver is doing different vs JTDS? QryPlans seem the same. Thx.

Announcing Free SQL Server Monitoring.


You already use sp_Blitz and the rest of the free, open source First Responder Kit to give your SQL Servers a health check now and then.

But let’s be honest: you don’t do it often enough.

You wish you had an easier way to know when your backups stop working, when corruption strikes, when a poison wait is detected, or when a new SQL Server patch comes out for one of your servers.

SQL ConstantCare

Good news! We’ve now got a free version of SQL ConstantCare®!

SQL ConstantCare® is our simple monitoring product that you install on a jump box or VM. It connects to your SQL Servers, Azure SQL DB, Amazon RDS, etc. just once per day, gathers diagnostic data, and sends it to our processing servers in AWS. We generate a single daily email per server telling you specific, actionable tasks that will make your databases safer.

The full-blown $695/year version of SQL ConstantCare® gives you performance advice on stuff like query plans, indexes, memory settings, and wait stats, but there are a lot of folks out there who just don’t need that. They can’t fix the databases, queries, or indexes, and they just wanna know that the server is healthy.

That’s where our free health-only monitoring comes in. Sign up here for free, then download the installer, and follow the installation instructions. In a matter of minutes, you’ll get an email, and then again once a day – but only when we’ve got stuff you actually need to do on the server. Otherwise, we leave you alone to do the other important stuff you gotta do every day.

Why are we doing this? What’s the catch?

It’s easy for us to support because we’re not doing anything complex like putting agents on SQL Servers, or running 24/7 monitoring. The app just runs on your jump box as a scheduled task, once per day, and that’s it.

It’s cheap for us to provide because the health-only processing costs us less than $1 per monitored SQL Server per month. We designed SQL ConstantCare® to be serverless and cloud-native right from the start, and Richie’s done a lot of work keeping the code and database lean and mean, so scaling has been pretty easy.

Our back end is fully cloud-hosted, which means your diagnostic data goes up to our services in AWS. If you’ve got questions about how that works, check out how we collect, transmit, and store your data and the frequently asked questions. To help us with GDPR compliance, we automatically delete all data older than 30 days. We won’t ever offer a version of SQL ConstantCare® that you can host yourself – we rely on too many AWS services, and helping you set all that up and troubleshoot it would make the price tag crazy high. If you want your own private monitoring, you’re better off buying a conventional 24/7 monitoring app. Of course, those are more expensive, and priced per monitored server – but if you want that level of control, that’s the price you pay. (Literally.) We’re trying to help as many people as we can here, as inexpensively as possible.

The signup process is a little wonky, not as smooth as I’d like. When you click the signup link, you’ll be asked to sign up with an email or Google. If you’ve bought training classes or software from me in the last few years from training.brentozar.com, you can use the same login. If you hit any roadblocks signing up and you can’t figure out how to get past ’em, email Help@BrentOzar.com and include a screenshot of the full browser, including the URL you’re on.

I hope this helps make your job easier. Like the First Responder Kit, this here blog, and the tons of videos we put out across my YouTube channel, TikTok, LinkedIn, etc, I really want to help do as much as I can, for free. That way, when you need training or consulting, you’ll remember who loves ya, baby. Enjoy!


Use “We” Not “You”. #tsql2sday


For T-SQL Tuesday this month, Louis Davidson suggested we give our past self some advice.

I’d tell myself, “Use ‘we’, not ‘you’.”

For years, when I gave advice, I’d say things like:

  • “You’re doing A, when you should really be doing B instead.”
  • “Your code has a problem right here.”
  • “Your network settings are wrong, and you should change them to this instead.”

The very word ‘you’ sets up a confrontational tone that puts the recipient on the defensive. They can’t help but react by taking things personally. We’re just humans, meatbags of emotion.

Instead, use words like ‘we’ and ‘our’ that group us together. We’re on the same team, and our common enemy is technology. Dagnabbit, technology sucks hard. It’s always out to get us, to make our lives miserable, to refuse to work the way it says it’ll work in the manual.

Once we (see what I did there) get started using the term ‘you’ early on in our careers, it’s a really hard habit to break. I know that, because I’ve been trying to break it for years.

I do use the term ‘you’ a lot in blog posts and videos that are purposely designed to be confrontational and drive engagement. That’s on purpose. However, when I wanna give advice to someone on my own team, I try to remember that we are indeed on the same team, and I need to communicate that by using the word ‘we.’


[Video] Office Hours in Sofia, Bulgaria


I was honored to speak at the Present to Succeed conference in Sofia, Bulgaria, run by a former SQL Server MCM. Sofia is a beautiful city, and the gorgeous Patriarchal Cathedral of St. Alexander Nevsky (Wikipedia) was close to my hotel, so I dragged my tripod over there for an Office Hours session.

Office Hours in Sofia, Bulgaria


Here’s what we covered:

  • 00:00 Start
  • 01:19 Live is Life: After a migration to a new db, the old 3TB db is now read only. My friend is enabling row compression on the biggest tables/indexes and already got down to 1,5TB. The goal is to get faster reads and a smaller db. What are your thoughts about this? Is there a better way?
  • 02:40 MyTeaGotCold: Is Enterprise Edition generally seen as the norm? Blogs and particularly the official docs rarely point out that something isn’t available in Standard, but I’ve gone my entire career without seeing it.
  • 03:20 EthicalDBA: Hey Brent, have you ever faced any moral/ethical issues in your DBA career that caused you to really question the task you were working on?
  • 04:35 With NeinLock: Greetings! As someone who is a household name when it comes to the SQL Server community, have you noticed members of the younger generation joining the community to give back and share knowledge? If yes, are there any you recommend following?
  • 08:22 Roger ap Gwilliam: What’s the scariest RDBMS you have worked with?
  • 08:58 DT_DBA: Do you think using the FORCESEEK hint “everywhere” is okay? (same as how some people use NOLOCK). I have a client that has started doing this. They aren’t specifying which index to use, just doing things like “select * from table WITH (FORCESEEK) join view WITH (FORCESEEK) …”
  • 09:40 handysql: Helmet on the shelf. Is that for show or do you have a harley/rocket in the garage?
  • 11:35 Mike: Is there a good criteria that can be used to tell if a query is OLTP or OLAP query ? Is it number of seconds (duration, or cpu time), or number of logical reads, or something else ?
  • 12:48 VegasDBA: Hi Brent! Ever do any big physical to virtual conversions? I’ve been tasked with a very aggressive timeline to convert several physical SQL AGs to VMs. I was considering using AGs or distributed AGs to fail them over. Was curious of your thoughts and opinions.
  • 13:46 Mike: Is there a good up-to-date article that describes differences between Junior, Middle and Senior DBA in SQL Server (and maybe Azure SQL ?). And where can you develop next after becoming “Senior” ?
  • 15:53 Davros: When a traditional clustered index table starts to having too many indexes is this an indication that the table should be using column store?

Save $350 on the PASS Summit if You Register Now.


Next Tuesday, pricing on the 3-day tickets for the PASS Data Community Summit goes up to $2,095.

But if you register right now, plus use coupon code BRENTO24, it’s just $1,745.

It’s one in-person conference that covers Microsoft SQL Server, Azure, PostgreSQL, Snowflake, Oracle, and more. So many of us (me included!) are working with multiple databases these days, and it’s hard to find a single event with this kind of coverage.

I’m also teaching a 1-day pre-conference workshop on Monday, Tuning T-SQL for SQL Server 2019 and 2022. That’s an additional $595, and you can also sign up for another pre-conference workshop on Tuesday. There are great options in there including Query Quest with Erik & Kendra, Microsoft Fabric in a Day, PostgreSQL Fundamentals, Power BI Architecture, and a SQL AI Workshop run by Microsoft folks.

Go register now, and I’ll see you in Seattle!