Microsoft releases SQL Server Cumulative Updates about every 60 days. This is a part of your job that you’re going to be doing a lot, so let’s get good at it! Here’s the strategy I like to use.
Pick what patch you’re going to apply. Generally speaking, you should be on the most recent Cumulative Update available for your version. (Years ago, folks only applied Service Packs, but starting with SQL Server 2017, Service Packs are gone. Microsoft only ships Cumulative Updates now.)
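If you want to script that check, dbatools (which comes up again below for automating the patch itself) can compare an instance's current build against the latest known Cumulative Update. Here's a minimal sketch, assuming the dbatools module is installed, with SQL01 standing in for your instance name:

```powershell
# Compare this instance's build against the most recent known CU
# for its version. Assumes dbatools is installed; SQL01 is a placeholder.
Import-Module dbatools

# The Compliant column tells you whether you're on the latest build.
Test-DbaBuild -SqlInstance SQL01 -Latest
```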
Decide how you’re going to detect problems. Every now and then, an update breaks something. For example, SQL Server 2019 CU7 broke snapshots, SQL Server 2019 CU2 broke Agent, and there have been plenty more, but my personal favorite was when SQL Server 2014 SP1 CU6 broke NOLOCK. Sure, sometimes the update installer will just outright fail, but other times the installer succeeds and your SQL Server installation is broken anyway, and it may take hours or days to detect the problem. You need to monitor for new and unusual failures or performance problems.
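Real monitoring software is the main line of defense here, but if you're scripting, a cheap supplemental check is sweeping the error log for anything new right after the patch window. A sketch, again assuming dbatools, with SQL01 and the two-hour window as placeholders:

```powershell
# Pull error log entries written since the patch window started,
# so surprises surface in minutes instead of days.
# Assumes dbatools; SQL01 and the time window are placeholders.
Get-DbaErrorLog -SqlInstance SQL01 -After (Get-Date).AddHours(-2) |
    Where-Object Text -match 'error|fail|corrupt' |
    Select-Object LogDate, Source, Text
```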
Design your ideal rollout strategy. Here’s the order I like to use, but I understand that not everyone has all of these environments. More on that in a second. Roll out the patch in this order (there’s a scripted sketch of the tiered rollout after the list):
- Development servers – you want your developers to see any failures or behavior changes first, so they can feel confident that the eventual production patch will behave the way they’re used to.
- QA/test servers
- Disaster recovery servers – often, these are the least critical servers in the list that are actually watched by monitoring software, that leverage SQL Server’s high availability and disaster recovery features like clustering, and that have to keep up with the volume of writes happening in production. New problems will show up here, and hopefully your monitoring will detect them before you apply patches to the remaining tiers.
- Read-only replicas – servers where some end user activity happens, but it’s less critical than your primary servers. This advice applies whether we’re talking replication, log shipping, or Always On Availability Groups.
- Failover servers – now we’re getting really close. The idea here is to patch these without taking a production outage – but that’s not always possible depending on the HA/DR features you’re using, and the way you’re using them.
- Production primary servers – and the way you patch these is to actually not patch them at all. On “patch day”, simply fail over to your failover server, which has already been patched. This way, if you experience any surprise issues with the patch within the first few days, you can fail back over to your unpatched production server. (This also means you need to hold off patching that server for a few days to give yourself a safety net.)
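If you script the rollout, the tier order can live in one place so every patch cycle walks the same path. Here’s a rough sketch using dbatools’ Update-DbaInstance; every server name, the tier grouping, and the patch share path are made-up placeholders, and the Read-Host pause stands in for however long you let each tier soak:

```powershell
# Walk the patch through the tiers in order, least critical first.
# Assumes dbatools; all names and paths below are placeholders.
$tiers = @(
    @{ Name = 'Development';        Servers = 'DEV01', 'DEV02' },
    @{ Name = 'QA/Test';            Servers = 'QA01' },
    @{ Name = 'Disaster recovery';  Servers = 'DR01' },
    @{ Name = 'Read-only replicas'; Servers = 'READ01' },
    @{ Name = 'Failover';           Servers = 'PROD02' }
    # The production primary (PROD01) is intentionally absent:
    # on patch day, you fail over to PROD02 instead of patching PROD01.
)

foreach ($tier in $tiers) {
    Write-Host "Patching tier: $($tier.Name)"
    Update-DbaInstance -ComputerName $tier.Servers -Path '\\fileshare\sqlpatches' -Restart

    # Soak time: watch your monitoring before moving to the next tier.
    Read-Host "Check monitoring for $($tier.Name), then press Enter to continue"
}
```

In real life, the pause between tiers might be days, governed by your change control process rather than a Read-Host prompt.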
Design your actual rollout strategy. Having read the rosy-world scenario above, you’re now looking at your own environment going, “Brent, I don’t have a bunch of those servers.” Okay, no problem: scratch out the lines you don’t have, but understand that you’re also scratching out possible safety nets. This is something to think about when you’re designing your next SQL Server architecture.
Design your rollback strategy. In the event that you do detect problems (and it’ll happen sooner or later), you want to have a rough idea of what you’re going to do. In dev/QA/test, you might just choose to uninstall the update and wait it out, giving other SQL Server customers time to troubleshoot the problem with Microsoft on their mission-critical servers, then apply the next fixed update instead. If the update made it all the way to your DR or failover tier before you caught the problem, you might not have the luxury of cleanly uninstalling it. In that case, your rollback strategy may be to open a support case with Microsoft and troubleshoot, hopefully before the failed patch reaches your production primary servers.
What we’ve done so far seems like a lot of designing, but remember, you only have to do this once, and you can reuse it for every update you apply to this environment.
When applying the actual patch, here’s what I like to do, in order (sketched as one PowerShell script after the list):
- Verify that you have backups. Ideally, do a test restore, too: backup success messages don’t mean you have working backup files.
- Stop or shut down client apps. You don’t want folks starting a transaction as your update begins.
- Make sure there’s no activity happening on the server, especially long-running jobs like backups.
- Apply the update – if you’re using PowerShell, check out how to automate patching with dbatools.
- Apply Windows updates since you’re down anyway. (Sometimes I find folks have been applying SQL updates, but not Windows updates – they’re both important.)
- Confirm the SQL Server service is started, and check your monitoring tools for any unexpected failures.
- Confirm the SQL Server Agent service is started again, and kick off your next log backup job.
- Start client apps back up and make sure they function.
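To see those steps as one script, here’s a rough dbatools sketch of patch day for a single instance. SQL01, the ‘DBA - Log Backups’ job name, and the patch share are all placeholders, and the app stop/start steps are only comments because they’re specific to your shop:

```powershell
# Patch-day sequence for one instance, following the checklist above.
# Assumes dbatools; SQL01, the job name, and the share are placeholders.

# 1. Verify backups by actually restoring the most recent ones.
#    (Point -Destination at a test instance if you have one.)
Test-DbaLastBackup -SqlInstance SQL01 |
    Format-Table Database, RestoreResult, DbccResult

# 2. Stop or shut down client apps - this part is specific to your shop.

# 3. Check for remaining activity and long-running Agent jobs.
Get-DbaProcess -SqlInstance SQL01 -ExcludeSystemSpids |
    Format-Table Spid, Login, Database, Program
Get-DbaRunningJob -SqlInstance SQL01

# 4. Apply the SQL Server update (Windows updates are a separate step).
Update-DbaInstance -ComputerName SQL01 -Path '\\fileshare\sqlpatches' -Restart

# 5. Confirm the Engine and Agent services came back up.
Get-DbaService -ComputerName SQL01 -Type Engine, Agent |
    Format-Table ServiceName, State

# 6. Kick off the next log backup so the log chain keeps moving.
Start-DbaAgentJob -SqlInstance SQL01 -Job 'DBA - Log Backups'

# 7. Start client apps back up and make sure they function.
```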
Over the coming days, keep a much closer eye than normal on your monitoring tools, watching for unexpected failures. Then it’s time to hop back on the hamster wheel and start planning your next round of updates.