
What’s New in SQL Server 2022 Release Candidate 0: Undocumented Stuff


Microsoft has an official list of what’s new in 2022 overall, but here I’m specifically focusing on system objects that might be interesting to script developers like you and me, dear reader.

New stored procedure sp_get_table_card_est_and_avg_col_len – I assume the “card” refers to statistics and cardinality, not Hallmark. SQL Server has historically struggled with memory grants because it uses datatype size to budget memory grants, and bigger-than-necessary data types (like NVARCHAR(MAX)) have led to larger-than-necessary grants. It’d be cool if this proc was a down payment to mitigate that problem, but I have a sinking feeling it has to do with external data sources. I would tell you more about it, but when I run it, I get:
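In the meantime, since the proc is undocumented, you can at least list whatever parameters the metadata reports for it before trying to call it:

```sql
-- List the undocumented proc's parameters straight from the metadata:
SELECT p.parameter_id,
       p.name,
       TYPE_NAME(p.user_type_id) AS data_type
FROM sys.all_parameters AS p
WHERE p.object_id = OBJECT_ID('sys.sp_get_table_card_est_and_avg_col_len')
ORDER BY p.parameter_id;
```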

New view sys.dm_tran_orphaned_distributed_transactions – every now and then, I’ve run across Availability Groups replicas with problems due to orphaned distributed transactions consuming DTC resources or holding locks. This new undocumented DMV might be a down payment to resolve that problem. I don’t have an easy way to reproduce the problem quickly, so I can’t demo it.
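If you want to check your own replicas, the columns are undocumented and may change between builds, so start broad:

```sql
-- On a healthy server this should come back empty:
SELECT *
FROM sys.dm_tran_orphaned_distributed_transactions;
```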

New view sys.database_automatic_tuning_configurations – this one’s a little odd because Books Online tells me it’s been around since SQL Server 2017, but I don’t remember seeing it before, and it’s not in my 2019 test instances. It tells you whether Force_Last_Good_Plan is on, and I’d imagine that as more automatic tuning options come out over the next several releases, this view will pick up more info.
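Checking it is a one-liner per database:

```sql
-- Shows automatic tuning settings (like Force_Last_Good_Plan)
-- for the current database:
SELECT *
FROM sys.database_automatic_tuning_configurations;
```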

New Query Store DMV columns – now that Query Store is starting to work on read-only replicas, it looks like they added replica_group_id columns to plan_persist_plan_feedback and plan_persist_query_hints to support that. Plus, plan_persist_plan_forcing_locations gets columns for timestamp and plan_forcing_flags.

New spinlock troubleshooting – sys.dm_os_workers gets columns for spinlock_wait_time_ms, spinlock_max_wait_time_ms, and spinlock_wait_count.
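A quick way to see which workers have been hit hardest, sorted by the new columns:

```sql
-- Top workers by cumulative spinlock wait time:
SELECT TOP (10)
       worker_address,
       spinlock_wait_count,
       spinlock_wait_time_ms,
       spinlock_max_wait_time_ms
FROM sys.dm_os_workers
ORDER BY spinlock_wait_time_ms DESC;
```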

New stuff to support offloaded compression

This stuff needs its own section. RC0 introduced the ability to offload compression to Intel processors equipped with QuickAssist.

We get new sp_configure options for ‘hardware offload mode’ and ‘backup compression algorithm’. By default, these are off. To turn on offloaded compression, install the Intel QAT drivers, then do an alter:
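A sketch of what that alter looks like – the sp_configure option name is from RC0, and I’m assuming the ALTER SERVER CONFIGURATION syntax matches what later builds document:

```sql
-- Enable the advanced option, then point SQL Server at the QAT accelerator:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'hardware offload mode', 1;
RECONFIGURE;
ALTER SERVER CONFIGURATION SET HARDWARE_OFFLOAD = ON (ACCELERATOR = QAT);
```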

Which returns:

After restarting the SQL Server, check this brand spankin’ new DMV:
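Assuming the DMV name survives to RTM, that check looks like:

```sql
-- Reports whether the accelerator is present and in hardware or software mode:
SELECT *
FROM sys.dm_server_accelerator_status;
```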

And, uh, on my VM, it’s still not enabled:

Because you can enable it even on processors that don’t support it, which strikes me as kinda odd. I suppose you would want to make it part of your standard build, and then whenever it’s available, it’ll get used, assuming you call for offloaded backup compression in the right way.
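That “right way,” assuming the WITH COMPRESSION algorithm syntax holds up in later builds, looks something like this (database name and backup path are hypothetical):

```sql
-- Request the QAT deflate algorithm explicitly; if the hardware
-- isn't present, SQL Server can fall back to software mode:
BACKUP DATABASE StackOverflow
TO DISK = N'Z:\Backups\StackOverflow.bak'
WITH COMPRESSION (ALGORITHM = QAT_DEFLATE);
```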

New messages in RC0

In each release, I check sys.messages for new stuff. Some of it gets added for the cloud, like Azure SQL DB or Managed Instances, so take these with a grain of salt. Here’s what’s new in RC0 since the last CTP:

  • 1136: The tempdb has reached its storage service limit. The storage usage of the tempdb on the current tier cannot exceed (%d) MBs.
  • 5373: All the input parameters should be of the same type. Supported types are tinyint, smallint, int, bigint, decimal and numeric.
  • 5374: WITH clause is not supported for locations with ‘%ls’ connector when specified FORMAT is ‘%ls’.
  • 16722: Cannot change service objective for %ls to %ls as long-term retention is not supported yet on Hyperscale. Please disable long-term retention on the database and retry
  • 17414: Retrieving the address of an exported function %.*ls in accelerator library %.*ls failed with error 0x%x.
  • 17415: %.*ls component enumeration failed with zero component count.
  • 17416: %.*ls component enumeration failed with mismatch in component count.
  • 17417: %.*ls %.*ls not compatible with SQL Server.
  • 17418: Detected %.*ls %.*ls.
  • 17419: %.*ls hardware detected on the system.
  • 17420: %.*ls hardware not found on the system.
  • 17431: %.*ls initialization failed with error %d.
  • 17432: %.*ls initialization succeeded.
  • 17433: %.*ls session creation failed with error %d.
  • 17434: %.*ls session sucessfully created.
  • 17435: %.*ls will be used in hardware mode.
  • 17436: This edition of SQL Server supports only software mode. %.*ls will be used in software mode.
  • 17437: %.*ls will be used in software mode.
  • 17438: %.*ls session alive check failed with error %d.
  • 17439: %.*ls session tear down failed with error %d.
  • 17440: %.*ls session close failed with error %d.
  • 17441: This operation requires %.*ls libraries to be loaded.
  • 19713: Statistics on virtual column are not avalable.
  • 19714: Number of columns in PARTITION clause does not match number of partition columns in Delta schema.
  • 21093: Only members of the sysadmin fixed server role or db_owner fixed database role or user with control db permission can perform this operation. Contact an administrator with sufficient permissions to perform this operation.
  • 22786: Synapse workspace FQDN is not in the list of Outbound Firewall Rules on the server. Please add this to the list of Outbound Firewall Rules on your server and retry the operation.
  • 22787: Change feed table group limit of %d groups exceeded
  • 22788: Could not enable Change Feed for database ‘%s’. Change Feed can not be enabled on a DB with delayed durability set.
  • 25755: Could not create live session target because live session targets are disabled.
  • 31633: The length of the provided %ls exceeds the maximum allowed length of %u bytes.
  • 31634: The %ls must contain a ‘%ls’ for use with managed identity.
  • 31635: The %ls’s ‘%ls’ value must be a %ls for use with managed identity.
  • 31636: Error retrieving the managed identity access token for the resource id ‘%ls’
  • 33547: Enclave comparator cache failed to initialize during enclave load.
  • 39057: The value provided for the ‘%.*ls’ parameter is too large.
  • 39058: The parameter ‘%.*ls’ has a type that is not supported.
  • 45770: Failed to move the database into elastic pool due to internal resource constraints. This may be a transient condition, please retry.
  • 46552: Writing into an external table is disabled. See ‘https://go.microsoft.com/fwlink/?linkid=2201073’ for more information.
  • 46553: Create External Table as Select is disabled. See sp_configure ‘allow polybase export’ option to enable.
  • 46953: Pass through authorization using S3 temporary credentials is not supported. Please use S3 credentials to access storage.
  • 47507: Adding memory optimized files to the database replicated to Azure SQL Managed Instance is not supported because its service tier does not support In-memory OLTP capabilities. Consider replicating database to managed instance service tier supporting In-memory OLTP capabilities.
  • 47508: Adding multiple log files to the database replicated to Azure SQL Managed Instance is not supported because managed instance does not support multiple log files.
  • 47509: Adding FileStream or FileTables to the database replicated to Azure SQL Managed Instance is not supported because managed instance does not support FileStream or FileTables.
  • 47510: Adding multiple memory optimized files to the database replicated to Azure SQL Managed Instance is not supported because managed instance does not support multiple memory optimized files.

If any of those messages are interesting to you, feel free to leave a comment about it.

New database-scoped configuration options

These are all new since SQL Server 2019 – some were introduced in prior CTPs, but I’m mentioning them all here because there’s good stuff in here for query tuners:

  • 25 – PAUSED_RESUMABLE_INDEX_ABORT_DURATION_MINUTES
  • 26 – DW_COMPATIBILITY_LEVEL
  • 27 – EXEC_QUERY_STATS_FOR_SCALAR_FUNCTIONS
  • 28 – PARAMETER_SENSITIVE_PLAN_OPTIMIZATION
  • 29 – ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY
  • 31 – CE_FEEDBACK
  • 33 – MEMORY_GRANT_FEEDBACK_PERSISTENCE
  • 34 – MEMORY_GRANT_FEEDBACK_PERCENTILE_GRANT
  • 35 – OPTIMIZED_PLAN_FORCING
  • 37 – DOP_FEEDBACK
  • 38 – LEDGER_DIGEST_STORAGE_ENDPOINT
  • 39 – FORCE_SHOWPLAN_RUNTIME_PARAMETER_COLLECTION

That last one’s particularly interesting to me because SQL Server 2019 originally shipped in a way that you could see runtime parameters in sys.dm_exec_query_statistics_xml, and then they turned it off around CU11-12 without documenting the changed behavior. That was a total bummer, because that feature was a lifesaver for troubleshooting parameter sniffing. I’m hoping we can get that back again.
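Once you’re on a build that supports it, flipping that last one on is a one-liner, and you can verify the setting afterward:

```sql
-- Turn on runtime parameter collection for showplan in this database:
ALTER DATABASE SCOPED CONFIGURATION
    SET FORCE_SHOWPLAN_RUNTIME_PARAMETER_COLLECTION = ON;

-- Verify the current value:
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = N'FORCE_SHOWPLAN_RUNTIME_PARAMETER_COLLECTION';
```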


How to Install SQL Server 2022 Release Candidate 0


SQL Server 2014’s end of support date is coming in less than two years: July 9, 2024. I know that sounds far off right now, but consider the work you have to do between now and then:

  • Pick a new version to migrate to
  • Set up development SQL Servers for that new version
  • Start developing & testing with it
  • Get users to sign off that it’s working as expected
  • Design your high availability & disaster recovery strategies for it (something I recommend folks revisit with each new release)
  • Set up production/HA/DR SQL Servers
  • Migrate over to the new servers

With that in mind, if you’re still running SQL Server 2014 today in production, now is the time to start moving your development servers over to the newer SQL Server version. I’ve got a post on which SQL Server version you should use, but if you’re just now starting a 2014 replacement plan, then you’re the kind of shop that doesn’t upgrade very often. (Which is cool! Change = risk.) In that case, I’d think about using SQL Server 2022 to get the longest lifespan you can out of your SQL Servers.

The first release candidate for Microsoft SQL Server 2022 is out, so let’s see what’s involved with installing it.

Start with a fresh, empty Windows VM for testing. Never install test/evaluation bits on your laptop or an existing SQL Server – it’s a recipe for trouble. Pre-release bits (and even release bits!) can cause side-by-side existence problems that you don’t want to have to waste time troubleshooting.

When the Windows VM is ready, download the installer here. When you run it, the Installation Center opens:

SQL Server Installation Center

Click the Installation tab at left:

And for a standalone evaluation/development server, click the first line for a new standalone installation. (Most of the rest are just hyperlinks to go download other things that aren’t included in the SQL Server installer.)

The installer launches, and it’s time to choose your character:

Your choices:

  • Evaluation Edition – just like Enterprise, but it times out after 180 days.
  • Developer Edition – just like Enterprise, no time bomb, but isn’t allowed to be used for production purposes. Frankly, you’d be suicidal to use Evaluation Edition for production purposes too, since it has that time bomb. You wanna choose Developer because your evaluation period is probably going to extend beyond 180 days, and you don’t wanna have to worry about time bombs.
  • Express Edition – only for tiny databases.

Choose your character – I’m going with Developer – and then hit next.

Let’s not pretend you’re going to read that, but I’ll point out one amusing section:

The $5 limit is because starting with SQL Server 2022, Microsoft is using Fiverr for development, testing, and support. The most they can get back from Fiverr for a refund is the $5 they spent on the contractor, so they gotta cap their losses. Otherwise, they’d go bankrupt from KB4538581 alone.

Accept the terms, and hit Next, and the installer will do some basic validation to make sure the VM is in a safe place:

In my case, the Windows Firewall warning is because Windows Firewall is enabled. If you click the Warning link, you’ll get instructions to set up an exclusion rule to let SQL Server traffic pass through the firewall. If you plan to test apps hitting this server, make a note that you’ll need to change the Windows Firewall settings later – but you can ignore this for now during setup, and circle back to it later.

When you hit Next, you get the first new screen:

As strange as it may seem, Microsoft is on a mission to get even more money out of you than just the SQL Server licensing costs. They want you to pay ongoing fees to use Azure services to manage your SQL Servers, regardless of where they live. If you want to burn money, buy candles.

Me personally, I can’t afford that and it’s irrelevant to my testing, so I’m going to uncheck that and click Next. Now, it’s time to pick features.

Install as few features as practical. Every box you check here has a performance cost. Many of them will consume CPU and memory even when they’re not being actively used.

The Features list in that screenshot looks like it’s cut off, like there’s more stuff below “Redistributable Features”. In RC0, there’s nothing else to see – there’s nothing below that. It’s a bug. What, you thought the product was ready? Buckle up, bud – it’s the first of many.

In that feature list, the only thing I’m going to check is Database Engine Services. I’m not a fan of SQL Server’s full text search, although I understand why people use it. Wanna check any other box on a production SQL Server? You should pause for a moment and reconsider the life choices that brought you here. Wanna check them just because it’s a dev box? Bad idea: if you make something available in development, then developers will come to rely on it, and then they’ll build code atop it, and need it in production. In terms of features, treat your dev servers like production.

After checking the Database Engine Services box, the Instance Root Directory option will light up:

You can leave that one on the C drive – that’s fine. We’ll set the data & backup locations in subsequent steps. Click Next.

You can install multiple instances of SQL Server on the same Windows VM. I’m actually going to choose Named Instance because I need this for something else I’m doing, but you should leave it on Default.

When you click Next, there will be a significant pause before the next screen appears:

If you’re installing just for the purposes of testing 2022, you can stick with these service account names. Check the box for Grant Perform Volume Maintenance Tasks – that’s Instant File Initialization, which helps restore databases much more quickly. That’s important when you’re doing testing on new versions.

If your application needs case sensitivity or a different collation, click the Collation tab:

If you’re testing to replace an existing server, connect to that server with SQL Server Management Studio. Right-click on the server name, click Properties, and the existing server’s collation will be listed on the properties screen:
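If you’d rather script it than click around, this works on any supported version:

```sql
-- Returns the server-level collation, e.g. SQL_Latin1_General_CP1_CI_AS:
SELECT SERVERPROPERTY('Collation') AS server_collation;
```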

Back on your SQL Server 2022 setup, choose the collation that matches your existing production server, and click Next.

On the Server Configuration tab, the first decision you have to make is whether to allow SQL authentication. Over time, trends have gone back & forth as to whether this was a good idea. We were trying to stamp it out for a while, but then Azure SQL DB came out and it didn’t support Windows auth, so Microsoft had to backtrack the idea that SQL authentication was bad. These days, Azure SQL DB supports Windows auth (although it’s often a pain in the rear), so “experts” are all over the place as to whether or not you should have it on.

Most of the apps I interact with require it, so I’m going to set Mixed Mode, and then set a strong password for the SA account.

You have to take action in the “Specify SQL Server administrators” box. If you don’t, nobody’s going to be an admin on the instance, and you’re going to be in a world of hurt when you go to actually use this server. At bare minimum, add yourself by clicking the “Add Current User” button, which will take a few seconds. For real production purposes, you’ll want to use an Active Directory group consisting of database administrators.

Click the Data Directories tab, and you’ve got some work to do:

You have to take action here too. If you leave the “Data root directory” set to the OS boot drive, then sooner or later someone’s going to create or restore a database, pour a lot of data into it, and run the OS boot drive out of space. Choose a different volume for the database files to live on.

I’m using a server with a Z drive, so after I type in “Z:\” in that top box and hit Tab, here’s what my screen looks like:

For my purposes, that’s fine. Click the TempDB tab:

If you have standard file sizes that you use, feel free to change those. For me, 8MB file size and 64MB autogrowth is awfully small. Even on dev/test servers, I go with 1024MB initial size and autogrowth sizes for both data and log files, like this:
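If you’re adjusting an already-installed instance rather than the setup screen, the T-SQL equivalent is a couple of ALTER DATABASE commands – tempdev and templog are the default logical file names, but yours may differ:

```sql
-- Bump tempdb's first data file and log file to 1GB size and growth:
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 1024MB, FILEGROWTH = 1024MB);
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, SIZE = 1024MB, FILEGROWTH = 1024MB);
```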

Then click the MaxDOP tab:

SQL Server’s gotten a lot better over the last couple of releases at setting the right Maximum Degree of Parallelism (MaxDOP or MAXDOP, depending on where in that screenshot you look – consistency is the hobgoblin of little minds.)

Click the Memory tab:

Click the Recommended radio button, or else SQL Server will drain your memory dry. Just as a side note, Max Server Memory doesn’t actually mean the maximum – there are things for which SQL Server will use memory above and beyond that cap. We’re just setting a ceiling to make sure things don’t go absolutely crazy.
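If you skip this during setup, you can set it afterward with sp_configure – the 16384 here is just an example value, so size it for your own server:

```sql
-- Set max server memory to 16GB (value is in megabytes):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 16384;
RECONFIGURE;
```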

Don’t click the FILESTREAM tab. I know it’s in all caps, which means Microsoft is yelling at you, imploring you to use SQL Server as a really expensive file server. It’s a bad idea. Save your lunch money, and put files where they belong.

Click Next, and decision time is over – just click Install:

After installation finishes, you can use SQL Server Management Studio to connect. You don’t have to download a new version – SSMS 19 isn’t required.


[Video] Office Hours: Ask Me Anything About SQL Server


Post your Microsoft data platform questions at https://pollgab.com/room/brento and upvote the ones you’d like to see me answer. I sat down by the pool in Cabo with a cup of coffee to go through ’em:

Here’s what we covered in this episode:

  • 00:00 Introductions
  • 00:43 MergeItLikeItsHot: Hi Brent, do you have any good resources to look at when planning DR procedure for sql server and for Azure sql database specifically?
  • 01:46 Neil: whats your approach to automating restores from prod to test ? how do you select the last backup ? currently i set up a linked server to query msdb on the prod server and find latest backup history. but i want to get rid of all linked servers. is there a better way ?
  • 04:00 Donovan: Is SQL lock pages in memory a best practice or edge case for modern SQL Server and Windows OS? If edge case, when should it be used?
  • 06:06 Trushit: Were you always fluent and comfortable speaking in front of a camera? Any tips for someone who feels awkward ? Even when I listen to the recording of own voice, it sounds so different, in a bad way, than what I hear when I am speaking to someone else.
  • 08:20 CJ Morgan: Have you ever worked w/Bidirectional replication? I ask because we have a client that wants an updateable copy of their database up in Azure and aside from a VM running SQL in Azure, the only replication we see as being supported for and updateable “subscriber” is Bidirectional.
  • 10:00 Eduardo: Do you keep any interesting stats on the questions that are asked each episode (e.g. percent of questions that are new each episode, percent of questions for topic ABC, etc)?
  • 10:20 ILoveData: Hello Brent, just curious what the ad revenue on your small YouTube channel looks like. Is that something you would be willing to share?
  • 11:15 Haydar: Does high VLF count / large transaction log file size affect log shipping performance?
  • 13:13 Seshat: What are the pros / cons of using native SQL backup with multi terabyte DB’s (11+ terabytes)?

Your Turn: I’m Not Answering These 13 Office Hours Questions

Normally, after I do a round of Office Hours going through the questions that got posted at https://pollgab.com/room/brento, I answer the highly upvoted ones and then clear the queue. If y’all don’t upvote questions, I don’t answer ’em.

However, today I’m trying something different: I’ll post the non-upvoted questions here for y’all to see if there’s something you want to answer in the comments. I’ll number ’em so y’all can refer to ’em easier in the comments.

  1. GI Joe DBA: Thoughts on contract to hire jobs? I don’t like them, I want to be hired FT immediately. I think agencies use them to squeeze $$ from the placement but, there’s no guarantee I will be hired and it’s a hassle switching benefits. I’ll only consider it if pay is higher than average.
  2. lockDown: Hi brent, we work on a multitenancy model sql DB and we need to split the data to 2 Dbs, since there multiple apps writing to the DB & the Devs are asking for a schema lock on the (2k) schemas that will be moved. I feel like stoping writes should be done on the App am i right?
  3. Jarkko: Should there be any concerns when a query runs in 2 seconds but consumes 16 seconds worth of CPU time (SQL 2019 Enterprise, 64 cores)?
  4. RonS: Hello Brent, Moving to 2019 SS Standard/HA. 7 databases that need to talk with each other. Standard edition requires each DB to have its own IP address. Current environment all the DB’s live in one server and can see each other (db.schema.table). Linked servers right direction?
  5. CPL: Hi Brent .. What is the process that actually updates indexes? e.g. table with 3 NC indexes & all three indexes contain Col1. If one of these indexes is used where we then update Col1 how are the other 2 indexes subsequently updated? Is it still via the transaction log?
  6. Wenenu: What are the top SQL index naming schemes you see in the field? How did the various schemes originate?
  7. Anatoli: CONTAINSTABLE and FREETEXTTABLE seem to hide the logical reads when statistics io is enabled. Is there an alternate way to see the logical reads incurred by these TVF’s? Using SQL 2019 – 2014 compat mode
  8. Sebastian: How would you change SQL Sentry Plan Explorer to make it more usable / friendly?
  9. DBA_preparing_for_jobsearch: Could you suggest a good strategy on improving left anti semi joins?
  10. Least Significant Who In Whoville: Hola mi amigo. I am the sole DBA at a recently acquired adult beverage manufacturer and our new controlling company. Do you have any suggestion on how to break the news that their adhoc architecture is one mistake away from disaster when their shadetree DBAs know more than me?
  11. Itching to Emigrate: Is it true that Americans in Europe working remotely for American companies fall under GDPR? Can you point us to resources for this question? Everything I see only looks at the consumer side of GDPR.
  12. reps_for_bulk_deletes: For “Full” recovery model, how much bigger than a row of data is the log entries associated with the deletion of that record? Trying to come up with a heuristic for when to approach alternate deletion methods by looking at remaining space in log and on log disk.
  13. BlitzFan: Hi Brent, At our shop we make all kind of backups, SQL backups, Avamar backups of SQL backup shares and of entire databaseservers. But we never test those Avamar backups, so no disaster recovery tests and no disaster recovery plan. What would you say to management?

[Video] 300-Level Guide to Career Internals at SQLBits

You’re overwhelmed with choices: so many things you could learn, so many ways you could specialize in your career. Which one should you choose? What are the safe bets, and what are the risky bets? Should you be a contractor? A consultant? A freelancer? Specialize in the cloud, and if so, what products? There’s no guide for data professionals here, and it’s kinda scary.

About ten years ago, I sat down with a pen and paper to analyze the ways that I could make a living with data. I devoured a lot of business books, thought about the kinds of things companies pay money for, and laid out a simple grid to help decide what kind of career I wanted to build for myself.

Today, in 2022, you’re facing the same problems. In one session, I’ll explain what companies pay for, what individuals pay for, and how you could build different careers with that knowledge. I won’t tell you what’s right for you – I just want to give you a map of different choices, like a guidance counselor would. I’ll finish the session by explaining the choice I made, and why.


[Video] Office Hours at the End of the World


I hopped down to Cabo San Lucas and sat at the End of the World to answer the questions you posted & upvoted at https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Intros
  • 00:44 Kirk Saunders: Hey Brent! When did you determine your knowledge/skillset/etc.. was sufficient to teach your various classes? I think I’m doing well at my job, but I’m concerned I’m overestimating my skillset (big fish, small pond). Any insight on a more objective measure is greatly appreciated!
  • 02:42 Monkey: Howdy, Brent! When I specify RETAINDAYS or EXPIREDATE when doing backups and set either to 30 days, does it mean after 30 days some job checks whether my backup expired, and deletes it ? if no, how can I make so older than 30 days backups be deleted automatically ?
  • 03:31 Yakov: What was office hours like prior to the introduction of PollGab?
  • 04:54 Rizzo: How should you deal with a SQL power user that forgets to commit their transaction on a busy server before leaving for the day?
  • 06:43 Gary Kendall: Hey Brent! We have a commercial app that creates many many temp tables (not in TempDB) for ad-hoc reporting and other things but doesn’t clean up after itself. Is there a tipping point where too many tables in one DB might cause overall performance problems with SQL Server? Thx!
  • 09:30 Select_Star_not_Asterisk: What’s your though about Contained Availability Group? For me, the new features, contained master and MSDB is a big check for me.
  • 11:16 Uncle Kenny G: We have 1000 databases on Azure SQL Server. We are going to adopt some third party monitoring software (Red-Gate, SolarWinds, checkmk, etc…). Does using these monitoring software will put pressure on the DBs and increase the overall costs at the end of the month?
  • 12:05 SQL Helper: In the consulting world, how do you handle an engagement that you cannot solve or is technically not possible as outlined in the SOW? Do you get compensated for your time working on the issue?
  • 16:34 Haddaway: What is the best tool for monitoring Always On replication progress? Do we need third party monitoring for good Always On Monitoring?
  • 17:24 Eduardo: What are the keys to becoming a good public speaker?

Download the Current Stack Overflow Database for Free (2022-06)


Stack Overflow, the place where most of your production code comes from, shares a version of their data in XML format from time to time, and then I import it into SQL Server format.

Stack Overflow’s database makes for great blog post examples because it’s real-world data: real data distributions, lots of different data types, easy to understand tables, simple joins. Some of the tables include:

  • Comments: 85M rows, 16GB data
  • PostHistory: 150M rows, 250GB (most of which is text, though)
  • Posts: 56M rows; 150GB
  • Users: 18M rows, 2GB
  • Votes: 230M rows; 4.5GB, making for fun calculations and grouping demos

This isn’t the exact same data structure as Stack Overflow’s current database – they’ve changed their own database over the years, but they still provide the data dump in the same style as the original site’s database, so your demo queries still work over time. If you’d like to find demo queries or find inspiration on queries to write, check out Data.StackExchange.com, a public query repository.

I distribute the database over BitTorrent because it’s so large. To get it, open the torrent file or magnet URL in your preferred BitTorrent client, and the 54GB download will start. After that finishes, you can extract it with 7Zip to get the SQL Server 2016 database. It’s 4 data files and 1 log file, adding up to a ~430GB database.

Want a smaller version to play around with?

  • Small: 10GB database as of 2010: 1GB direct download, or torrent or magnet. Expands to a ~10GB database called StackOverflow2010 with data from the years 2008 to 2010. If all you need is a quick, easy, friendly database for demos, and to follow along with code samples here on the blog, this is all you probably need.
  • Medium: 50GB database as of 2013: 10GB direct download, or torrent or magnet. Expands to a ~50GB database called StackOverflow2013 with data from 2008 to 2013. I use this in my Fundamentals classes because it’s big enough that slow queries will actually be kinda slow.
  • For my training classes: specialized copy as of 2018/06: 47GB torrent (magnet.) Expands to a ~180GB SQL Server 2016 database with queries and indexes specific to my training classes. Because it’s so large, I only distribute it with BitTorrent, not direct download links.

As with the original data dump, these are provided under cc-by-sa 4.0 license. That means you are free to share it and adapt it for any purpose, even commercially, but you must attribute it to the original authors (not me):


Happy demoing!


[Video] Office Hours: Speed Round Edition


Not all of the questions you post at https://pollgab.com/room/brento require long answers. Here’s a quick speed round:

Here’s what we discussed:

  • 00:00 Introductions
  • 00:24 EngineHorror: Hey Brent! What’s your opinion on page- and row-level compression in general? Is it true it increases locking besides burning extra CPU?
  • 01:08 Pony: Is Microsoft Assessment and Planning tool (MSAP) still best tool for discovering SQL Servers within a company? are there better or alternative tools for this ? We need this to build inventory list of SQL servers we have
  • 01:21 DBA Champion: What monitor would you recommend for a DBA ? Inches, resolution, number monitors (1,2,3…) ?
  • 02:00 Monkey: Hi Brent! Have you ever considered working with Microsoft on SQL Server improvement?
  • 03:20 Monkey: Do you recommend performing transaction log backups into 1 single TRN file (append them), or each t-log backup should be performed into separate file (1 backup file = 1 trn) ?
  • 03:43 Mike: Have you ever implemented Dynamic Data Masking, and what is your opinion on it ? Do you think in-house developers who has read access to Production, should or should NOT be able to read Personally Identifiable Information such as Customer Name, Email and Address ?
  • 04:22 James Adams: Will the Senior DBA class ever come back in stock?
  • 05:07 B-treehouse: I must optimize the database of a multitenant app. The first col of most NC indexes is the ‘TenantId’. This causes param sniffing problems due to the mix of very small and very large tenants. Would it be an acceptable use case for ‘OPTIMIZE FOR(@TenantId UNKNOWN)’ ?
  • 06:02 Tefnut: What types of perf issues do you like to use sp_HumanEvents for?
  • 06:25: Lilandra Neramani: Who is the Brent Ozar for all things Power BI?
  • 06:45 Can I join you: Hi, I use azure sql db and running a query against a view with a WHERE clause with a variable (Id =@id). Query runs slow and performs a full table scan. When I hard code the value of @id it uses a seek and runs fast. I thought Sp_executesql would fix it but it runs slow also. Why
  • 07:27 Dom: How would you describe your driving style ? Are you slowly cruising around looking at the scenic view or do you drive a bit more “sporty” ? Asking cause some of your car have been ingeneered to be driven pretty fast 😮
  • 08:43 sol: Howdy, sir! How come you haven’t gone bald still? Based on recent driving videos your hair is gorgeous!

Office Hours Speed Round, Text Edition

Got questions for me? Post ’em at https://pollgab.com/room/brento and upvote the ones you’d like to see me cover. I filter out the ones that are too short for video answers, and here’s the latest batch:

Corrupted: Should DBCC CheckDB be run on secondary replicas in AG as well ? Is it recommended to attach corrupted database to Prod server, to check how your integrity check job will react ?

You should run it anywhere you want to fail over to. Read this.

Khaled Budajaja: Brent Ozar Office Hours is more informative then many hours spent on reading blogs and technical articles. Did you create this? I don’t recall any body else doing office hours. Many thanks Brent

Aww, thanks. It’s definitely not my idea though – it’s common for college professors to have open office hours.

David: Where can I find information about the master and msdb databases similar to what you present in Fundamentals of TempDB?

What are the pain points you’re having with those databases that you need to solve? Leave ’em in the comments.

Bill Hicks still lives: God created the Earth in 7 days. Do you think it’s possible to migrate from Azure SQL Database to CosmosDB or Table Storage in 7 days? Apart from the joke have you ever done that and what are the pros and cons? The goal is to have reports go faster.

CosmosDB or Table Storage for fast reports?!? That is not even wrong.

Unut: Is indexing bit fields like “IsActive” on large tables a futile effort when the large majority of records are active?

Generally, single-column indexes are less effective than multi-column indexes, regardless of the data type.
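As a rough sketch of what that means in practice (table and column names here are hypothetical), a multi-column index tied to a real query pattern, or a filtered index targeting the rare inactive rows, usually beats indexing the bit alone:

```sql
-- Indexing the bit by itself rarely helps when most rows are active:
CREATE INDEX IsActive ON dbo.Accounts (IsActive);

-- A multi-column index matching an actual query pattern is more useful:
CREATE INDEX IsActive_LastLoginDate ON dbo.Accounts (IsActive, LastLoginDate);

-- Or, if queries only ever hunt for the rare inactive rows,
-- a filtered index keeps the structure tiny:
CREATE INDEX Inactive_LastLoginDate ON dbo.Accounts (LastLoginDate)
    WHERE IsActive = 0;
```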

Khepri: Does FORCE ORDER query hint ever have any practical uses?

Yes, if you’re low on time, and it gets you across the finish line quickly, and you’re willing to live with the technical debt.

Not_a_DBA_but_I_play_one_on_tv: I restore a 600gb database backup to a test server every morning. Typically the restore takes 45 minutes to run. One day last week, it only took 15 minutes. Everything appeared to have run correctly, size didn’t change. What could cause that one restore to run quicker?

Less competition for shared hardware resources.

Unut: What types of issues do you like to use the SQL default trace for?

Check out the sp_Blitz source code.

James Adams: What are the powershell’ish things a senior SQL DBA should know?

I stopped teaching production database administration work years ago because I focus on development DBA work now. I’d start with DBAtools.io. I will note, though, that it’s interesting that a lot of these questions are still about production database administration. I just don’t wanna do that work anymore – I don’t find it enjoyable because it’s basically on-call break/fix stuff. Like, I could go the rest of my life without getting woken up because somebody ran a server out of drive space. It’s 2022. Quit treating drive space like it’s unobtainium.

sqldeo: Hi Brent Any suggestion on tempdb growth,not clearing the space even though developer dropping the temp table after doing ETL stuff SQl 2019, initials size same for all 8 files, auto growth enabled. db are aprox 5 TB in total size, snapshot iso. enable for 1 db. any suggestion?

Attend my Fundamentals of TempDB class.

T: Brent, currently we have SQL Server Failover Cluster with 2 nodes. We are planning to migrate it to Availability Group solution. Do we need to reinstall instances on both nodes or can we reuse it ?

Before going live on any HA/DR platform, you should test it repeatedly. That means you should build a new environment from scratch and test the bejeezus out of it before you go live.

Murad: What is the widest varchar column you would ever consider as a key for a new non-clustered index?

I wouldn’t set limits on that without knowing more about the business problem we’re facing. I can totally imagine scenarios where I’d index a couple of NVARCHAR(2000) columns, for example.

Milind: Hi Brent, how are you? How to do 2.5M+ records update without table lock? It is about account balance limit reset overnight daily. During midnight schedule, other processes are getting hampered due to table lock. How to do such operation without impact and possibly quick? Thxs

Read this series from Michael J. Swart.
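The general shape of that approach looks like this – a sketch only, with hypothetical table, column, and batch size choices; Swart's series covers the important details like lock escalation and indexing the predicate:

```sql
DECLARE @RowsAffected INT = 1;
WHILE @RowsAffected > 0
BEGIN
    -- Small batches stay under the ~5,000-lock threshold where
    -- SQL Server considers escalating to a table lock:
    UPDATE TOP (1000) dbo.AccountBalances
        SET BalanceLimit = 0
        WHERE BalanceLimit <> 0;   -- only touch rows still needing the reset
    SET @RowsAffected = @@ROWCOUNT;
END;
```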

Mekhit: Is there a recommended way to identify the hot (or not) columns for a given NC index?

There’s no instrumentation in SQL Server for that. You’ll be forced to use common sense. I know, terrifying, right?

Sothis: What are the signs that a given operator’s estimated number of rows was a hard coded estimate provided by the engine?

Take the estimated number of rows, divide that by the number of rows in the table, and look for suspiciously round numbers that don’t change regardless of the predicates you’re searching for.

Sothis: Is there a good way to determine the appx tipping point (row count) for index seek + key lookup vs index scan operation?

No.

Livnat: Any tips for tuning for “PARTITION BY” performance in TSQL?

Read these.

Waldemar: Do you have any query tuning / index tuning tips for linked server queries?

Don’t run them. Connect to the server that has the data you want. If you want to get something done quickly, you don’t turn to someone else and say, “Hey, can you ask that person over there to do this for me?” It’s just idiotic.

Nephthys: When passing multiple args to a stored procedure via a single delimited string and using STRING_SPLIT, is there a max number of args you recommend to not exceed for performance reasons? 

There are even gotchas with just one row. Read this.
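One gotcha worth calling out: SQL Server uses a fixed cardinality guess for STRING_SPLIT's output (commonly cited as 50 rows) no matter how many values you actually pass, so the rest of the plan is built on that guess. A sketch, with hypothetical names:

```sql
DECLARE @Ids NVARCHAR(MAX) = N'1,2,3';

-- The optimizer guesses a fixed row count for the split output,
-- whether @Ids holds 3 values or 3 million:
SELECT u.*
FROM STRING_SPLIT(@Ids, N',') s
JOIN dbo.Users u ON u.Id = CAST(s.value AS INT);
```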

Right Said Fred: What are the top 3 DB communities in order?

That’s a weird question – can you rephrase it for the kind of thing you’re looking for out of a community? I’m not sure if you mean places where general database people hang out, or where SQL Server people hang out, or where questions get answered, or where presentations happen, or what.

SQL Crooner: Can we safely ignore the page life expectancy alerts from Idera SQL DM?

I cover that in detail in my Mastering Server Tuning class. I wish I could do justice to it quickly, but that’s why I have training classes – some topics require deeper discussions and details.

Jose: What is your opinion of the various commercial apps that re-write your queries a hundred different ways to find the optimal query syntax?

Really cool, but usually really expensive.

Nephthys: Does clustered index fragmentation matter for INSERT performance?

Run an experiment with your particular table’s structure, your nonclustered index structures, and your typical insert statement to find out.

CakeAndEatItToo: When running a query through the Actual Execution Plan, is it more important to be looking at time taken or cost%?

Neither. Time to head to my free How to Think Like the Engine class.

Alberto: Hi Brent, I read that default isolation level for SQL Azure DB is Read Committed Snapshot. In your experience, what should we pay attention to when migrating from an on-prem with Read Committed isolation ?

Start here.

DBGeek: Like you, I use a Macbook Pro as my primary work computer. I use a virtual machine where I install VPN clients etc. But SSMS sometimes behaves really strangely on the Macbook’s screen resolution. What is your solution/workaround to handle high DPI settings?

In your hypervisor, change the Retina settings. Whatever you’ve got it set to, use the opposite one. Then, in Windows, don’t use the HighDPI settings at all.

Reshep: What are the top conventions used by your clients when creating different availability groups (business unit, region, etc)?

Business units and RPO/RTO.

Heryshaf: How do you know when insert contention is happening and what are the common ways of dealing with it?

Review your server’s top wait types. For common ways of dealing with it, check out my training classes.

Pietro: What naming convention do you recommend for clustered and non-clustered indexes? We currently use IX_TableName_Co1_Col2_Inc everywhere.

When you’re looking at an execution plan, why waste valuable pixels on IX? You know it’s an index. Why waste space on the table name? The table name’s already shown to the left of the index name. Just use the columns.
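In other words, something like this (table and column names hypothetical):

```sql
-- Instead of IX_Users_LastName_FirstName_Inc_Location:
CREATE INDEX LastName_FirstName_Includes
    ON dbo.Users (LastName, FirstName)
    INCLUDE (Location);
```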

Ptahhotep: Do you have any kind of tab organization strategy in SSMS when many tabs are open?

Yes, save files and close them. The kinds of psychopaths who leave multiple SSMS tabs open are the same kinds of nut jobs who leave hundreds of browser tabs open instead of using bookmarks. They should be hunted down and eliminated before they harm others.

Tenenit: What data profiling queries do you like to run against client tables when identifying new index key candidates?

I don’t. The contents of the data don’t matter – the contents of the query do. For the mechanics of how that works, and why you should never select a table’s data to determine selectivity, check out my Fundamentals of Index Tuning class.

OMC: Is it possible to optimize a SQL table both for high INSERTS and SELECTS or must you pick one or the other?

That’s a great question, and it’s a sign that you’re ready for my Mastering Index Tuning class.

Chris May: How does the optimiser know which set of statistics to use for a given column if there are multiple statistics available? e.g. default system statistics, statistics as part of an index, manual created/partitioned statistics etc

I’ve heard someone say it’s the most recently updated one, but I have no idea if that’s true. I was this close to recording a video of me testing that hypothesis, but I ran out of time before heading down to Cabo for vacation.

Len: What are the pros / con’s of creating a clustered index on an identity int field vs a datetime field based upon GETDATE() ?

The datetime column will take up more space due to its data type size and the fact that SQL Server will have to add a tie-breaking uniqueifier behind the scenes. The clustering keys get added to all of the nonclustered indexes, too, so more space there as well.
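A quick sketch of the two designs (names hypothetical; the uniqueifier only shows up because the datetime clustering key isn't unique):

```sql
-- 4-byte, ever-increasing, naturally unique clustering key:
CREATE TABLE dbo.Orders (
    Id INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (Id)
);

-- 8-byte key, plus a hidden 4-byte uniqueifier on duplicate values,
-- and the whole clustering key is copied into every nonclustered index:
CREATE TABLE dbo.Orders_ByDate (
    Id INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME NOT NULL DEFAULT GETDATE()
);
CREATE CLUSTERED INDEX OrderDate ON dbo.Orders_ByDate (OrderDate);
```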

Waldemar: What are the top areas of specialized consulting for SQL Server?

Watch this. I explain the different kinds of specialized consulting, what they do, and how to get into those careers. I know it’s kinda long, like 43 minutes, but if you don’t have 43 minutes to learn, well, we don’t have to worry about you getting a career as a specialist, do we? 😉

Horus: What is the killer feature Microsoft should add to SQL table partitioning that would lead to more wide scale adoption?

Why does it need to be adopted more widely? That feature is a pain in the butt. Instead, I wish they could build automatic sharding, automatically splitting data across multiple database servers for faster queries. That’s kind of a pipe dream for relational databases though.


Estimated and Actual Plans Can Have Different Shapes.

Execution Plans
15 Comments

A reader posted a question for Office Hours:

Hi Brent, What is your take on Hugo Kornelis’s explanation of execution plan naming. As per his explanation, estimated exec plan is simply an execution plan whereas actual execution plan = execution plan+run-time stats. Do you agree that the naming is flawed and confusing? – Yourbiggestfan

I like Hugo a lot – brilliant fella, and he knows way more than I do about execution plans – but he’s wrong on this one. Estimated and actual plans don’t always have the same shape.

I’ll use the training version of the Stack Overflow database, but any size will work here. Start with a few indexes:
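The original index definitions were shown as a screenshot; something along these lines fits the plans described below (my reconstruction, not the exact script):

```sql
USE StackOverflow;
GO
CREATE INDEX Location ON dbo.Users (Location);
CREATE INDEX UserId   ON dbo.Comments (UserId);
CREATE INDEX Score    ON dbo.Comments (Score);
```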

Then create a stored procedure – yes, doing this with a temp table is silly, but I need a short, simple example to show the problem:
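The proc itself was also shown as a screenshot; here's a reconstruction consistent with the plans described below (the proc and temp table names are my assumptions):

```sql
CREATE OR ALTER PROC dbo.usp_TopCommentsByLocation
    @Location NVARCHAR(100) AS
BEGIN
    /* Query 1: straightforward thanks to the Location index */
    SELECT Id
        INTO #LocationUsers
        FROM dbo.Users
        WHERE Location = @Location;

    /* Query 2: the one whose shape changes between estimated and actual */
    SELECT TOP 100 c.*
        FROM #LocationUsers lu
        JOIN dbo.Users u    ON u.Id = lu.Id
        JOIN dbo.Comments c ON c.UserId = u.Id
        ORDER BY c.Score DESC;
END
```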

Then get the estimated execution plan for India:
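That means highlighting the call and pressing Ctrl+L (Display Estimated Execution Plan) rather than executing it; the proc name here is my hypothetical stand-in:

```sql
EXEC dbo.usp_TopCommentsByLocation @Location = N'India';
```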

The estimated plans for procs are a little weird in that they look like they’re a single statement:

But let’s zoom in on the second query in the proc. I’m not concerned about the plan for the insert into the temp table – that one’s pretty straightforward since we have an index on Location. Pay particular attention to the second query’s plan, though:

Read right to left, top to bottom to see that SQL Server estimates it will use a single CPU core (no parallelism) to:

  1. Scan the temp table
  2. Do a series of clustered index seeks on Users, then
  3. Do a series of UserId index seeks on Comments, then
  4. Sort the comments by Score, descending

But when you actually run the query and include the actual plan, the shape is totally different:

The query went parallel, and it chose a completely different query plan. Because there were so many rows in the temp table, SQL Server decided to scan the Score index on the Comments table, from highest-ranking Comments to lowest. It figured it wouldn’t have to read too many Comments before it stumbled across 100 that were written by people in the temp table. There were tons of differences, and here are just a few:

  • The estimated plan was serial, but it actually went parallel
  • The estimated plan used the Comments.UserId index, the actual one used Comments.Score
  • The estimated plan started with the temp table, the actual one started with Comments

The root cause on this one: when the estimated plan was generated, SQL Server hadn't created the temp table yet, so it had no statistics on its contents. At runtime, the creation of those new statistics caused SQL Server to recompile the statement while the proc ran, so the actual plan had a wildly different shape.

SQL Server 2022 is even worse.

I’ll do a classic parameter sniffing demo that I do all the time:
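The demo script was shown as a screenshot; it's roughly this shape (proc name and TOP count are my assumptions, and an index on Reputation is what makes the sniffing visible):

```sql
CREATE INDEX Reputation ON dbo.Users (Reputation);
GO
CREATE OR ALTER PROC dbo.usp_UsersByReputation
    @Reputation INT AS
SELECT TOP 10000 *
    FROM dbo.Users
    WHERE Reputation = @Reputation
    ORDER BY DisplayName;
```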

Execute it for Reputation = 2 – no need to get the actual plan, just run it:
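Along these lines (hypothetical proc name):

```sql
EXEC dbo.usp_UsersByReputation @Reputation = 2;
```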

And then try to get the estimated plan for Reputation = 1:
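Highlight this and press Ctrl+L (again, the proc name is my stand-in):

```sql
EXEC dbo.usp_UsersByReputation @Reputation = 1;
```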

SQL Server 2022 is all, “Estimated plan? I could tell you – but then I’d have to kill you.”

That’s why I keep using the terms estimated plan & actual plan. For another example, check out Erik Darling’s recent post about estimated plan parallelism. (I’d already written & scheduled mine when his published, and I had to link to it here because the timing’s great.)


[Video] Office Hours: Long Answers Edition

Videos
0

This time on Office Hours, I let a few questions pile up at https://pollgab.com/room/brento that required in-depth answers to really do ’em justice. In particular, there was a statistics question that needed demos.

Here’s what we covered:

  • 00:00 Introductions
  • 00:35 Grogu: Did DBA Brent ever have to support a version of SQL Server that Microsoft had end of lifed? Any tips on discussing this issue with management?
  • 04:19 Ahsoka Tano: What are your favorite PASS Summit presentations from prior years?
  • 07:11 Can: I have 3000 DB per SQL Server. When I need to restart SQL Services, the service is waiting on Stopping mode.
  • 09:22 i_use_uppercase_for_SELECT: When doing filtered statistics do you basically have to use full scan? It appears the where clause for filtered statistics is applied after the tablesample system is applied so you end up sampling the same data as the main statistics which wasn’t seeing this data originally.
  • 29:02 Shaheen: Is there info in SQL Server that will tell us why it picked one index over another for a given query?
  • 31:58 muppet#1: What’s your favourite SSMS plugin (assuming you use any)?
  • 33:02 DBAInProgress: What are some of the major gotchas of using TDE?
  • 34:22 Dmitriy: How would I begin to optimize a reporting query that has no WHERE clause, but has a bunch of LEFT JOINs to a bunch of other tables via foreign keys?
  • 35:55 Columnstore newbie: I have a clustered columnstore index. The segments look good but selecting TOP 1000 order by UtcDate is still very slow. Why?

Let’s Make September Our Free Community Tools Awareness Month.

Last week, I was reading a brand new article from a Microsoft employee about how you should directly query sys.dm_exec_requests in order to find out what’s running on your system.

Brent Reading Book
“Step 1: get a stone that looks round.”

I lost my mind.

There was a lot of yelling at the monitor.

In the year 2022, nobody should be reinventing the wheel. There are plenty of free wheels available for you to choose from. You’re literally wasting your time if you start from scratch with a boulder and chisel, and then try to turn it into a wheel.

The Microsoft data platform community is amazing, and has been that way for years. There are so many free resources to help you do your job faster, easier, and more accurately.

And sure, I’ve been around for quite a while, and I take for granted that everybody in the database business knows about all this cool free stuff. I’m not talking about the First Responder Kit, either – I’m talking about a stunning list of resources so large that it’s intimidating just to get started.

That’s where you come in.
What do you rely on every week?

In September, I want you to improve community knowledge about one free tool that you rely on every week in order to get your job done.

Your first reaction is gonna be that everybody already knows it, but trust me, they do not. Just by reading this blog post, you’re already ahead of many folks out there who don’t have the time to keep up with the industry. Imagine that you’re talking to a brand new hire at your organization who needs to get up to speed on how you’re able to do your job so effectively.

Pick one of these things to share:

  • Introduce the tool to readers for the first time
  • Tell a story about how it saved your bacon
  • Share a non-default configuration option that you use, and why
  • Write a review – explain what you like about a tool and what you wish was different
  • Compare several free tools that do the same thing – explain the pros & cons of each one
  • Put together a list of learning resources for a free tool – maybe you like the tool, but it isn’t easy to use, and you want to put together a set of links to show a new user where to begin

And there are any number of ways you can share it:

  • Write a blog post (if you don’t have a blog, write on LinkedIn, SQLServerCentral, MSSQLTips)
  • Record a short video
  • Improve the tool’s documentation

You can schedule it anytime you want during September. When it goes live, leave a comment here with a link to it. I’ll post a roundup post, and I’ll set up social media re-sharing so that I can keep driving new folks to your work over time. I’ll be working on it too – most of my September blog posts will be focused on free community tools.

Let’s make sure that nobody in our industry has to reinvent the wheel again!


Who’s Hiring in the Database Community? August 2022 Edition

Who's Hiring
15 Comments

Is your company hiring for a database position as of August 2022? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here. If your company is hiring, leave a comment.

The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes to, or a link to the application process – if I were you, I’d put an email address, because you may want to know which applicants are readers here; they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.


Office Hours Speed Round: Text Edition

Got questions for me? Post ’em at https://pollgab.com/room/brento and upvote the ones you’d like to see me cover. I filter out the ones that are too short for video answers, and here’s the latest batch:

Thomas Franz: When I have a very large table with no statistics (or not on a specific column in the WHERE) and want to show the estimated execution plan it takes minutes because it builds the statistics first. How can I prevent / speed up it (want the create stats asynchron as stats updates)?

There’s no such thing as asynchronous stats creation, only updates. SQL Server needs the stats first to create the plan, or else the plan would literally be based on random guesses.
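If it's the *updates* you want out of the way of query compilation, that much is configurable; initial creation is not:

```sql
-- Real option, but it only covers updates to existing statistics,
-- never the first-time creation that's blocking your estimated plan:
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```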

Kinneret: Is table ordering (Table B appears before Table A) within a view definition ever important for query performance?

Yes, read this all the way through.

SQLSteve: I have inherited a system that has an unusual index setup. Most tables have a clustered index but also have a PK non-clustered index which is the same as the clustered. IndexUsageStats show 0 for some NCI’s but can also show usage up to 50% of the Clustered.

Cool. As long as we’re just sharing stuff we got from others, I got a metal pineapple with an unusual inside compartment:

A metal pineapple - shout out to the fine folks at é

I don’t have any questions about what I have either.

Luca: Are Azure DR nodes more prone to “replica not healthy” errors? Our nodes are in a different region & every ~2 wks we see: Always On Availability Groups connection…terminated / The local replica of availability group … is preparing to transition to the resolving role

My gut feeling is that sysadmins who use the cloud are just more reckless when it comes to network config. They’re more likely to just up and change firewalls, subnet configs, routing, etc., and wing it as they go.

Midwest DBA: Hi Brent, Does a Senior Development DBA need to be better at writing T-SQL than developers? Do you need to be some sort of T-SQL Guru, or just know the most common anti-patterns. Also, any good book suggestions on T-sql Anti-patterns?

I think they need to better understand the execution plan ramifications of the T-SQL they write. For T-SQL books, get Itzik Ben-Gan’s.

WhyIsItAlwaysOnFire: Any recommendations to prevent Non Yielding Scheduler errors? Seeing occurrences of this and subsequent thread pool starvation occurring during index maintenance.

Patch SQL Server, Windows, and your hardware drivers/firmware/BIOS.

Film_Buff: Hello! I’ve heard you say that doing sorting in SQL Server is expensive since one pays for licensing, and that it should be offloaded to the app. Is this a difficult task for developers to workout? What are the things to consider in regards to design and implementation? Thanks!

Your developers should be able to sort a list or an array fairly easily. That isn’t something that requires a lot of design planning. You’re overthinking it.

Midwest DBA: Will my SAN admin try and get me fired if I make all Identity columns Big Int going forward? Or is there a case when you can trust the business and set Identity to Tiny INT or INT?

Read how row compression works.

Martin: Our company has 3rd part DBA management. They do one size fits all index maintenance( reorg >5% and rebuild >30%) for all user DBs once a week. Is it better to allow them to do this, or not? They also run a nightly update stats job against all user DBs. Is this helpful or harmful

If you don’t trust your remote DBA firm, I can help you find a better one.

Garry Kasparov: I have a query that calculates Instance wide signal wait time as a % of clock time per period. if (instance wide signal wait time / Clock time )*100 = 100, then I’m thinking the instance would benefit from an additional CPU core. Is this a easy way to determine more cpu?

Without looking at the queries using the most CPU to see if you can tune them? That seems like a pretty expensive script.

DBA douglas: Hello Brent, I’m currently in the USA. I’m wondering if I can move to Europe or New Zealand, and provide American Companies off hours DBA support? Do you think companies would find it valuable to have an American Citizen Working in a different time zone?

Why would their DBA need to be an American citizen?

I_wish_I’d_done_better_in_school: What are the 3 most common issues that clients hire you to fix?

Slow performance (#1 by far), unreliable SQL Servers, and planning for the future.

I_wish_I’d_done_better_in_school: Hey Brent, Do you see many productions DBA’s skilling up in powershell still? Or are there new tools that make this unnecessary? I’m trying to get my team to adopt DBAtools, but my manager says that all the DBAs would have to learn powershell and that is too much to ask.

Be the change you want to see in the world. Consistently solve problems with PowerShell faster than your teammates, and they’ll want to be more like you. (Assuming, that is, that you can – if you can’t, then there’s your answer too, but you don’t need them to adopt it first in order to improve your own skills.)

Greef Karga: What are your pros / cons for generating / executing dynamic SQL in the C# client app vs doing so in a stored proc?

Developers are best at the language they use the most. What language do your developers use the most?

Qaiser: Rolling upgrade or Backup Restore for database migration in AG from sql 2012 SP4 to sql 2019 Enterprise. Some experts say that Rolling upgrade is risky as no rollback option.

Read this.

sqldeo: Hi Brent,I got unusual issue for my tempdb,i have 8 tempfile for 8 core cpu but not growing all equally (initial size and growing size equal across all files)using sql 2019 so traceflag 1117 also not recommended by MS,any solution to look the problem !thanks

Grow ’em out manually.

Seriously, just do it once and be done. Grow ’em out to 25% of the total size of the databases on the server for starters, and just be done with it. Why waste a minute of your life trying to solve this problem? If they continue to grow beyond that size, take my Fundamentals of TempDB class to understand what’s using the space and what you can do about it.
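A sketch of the one-time fix (logical file names and sizes are hypothetical – check sys.master_files for yours, and repeat for each of the eight files):

```sql
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 50GB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp2,   SIZE = 50GB);
-- ...and so on for temp3 through temp8.
```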

Valdemar: Should stored procs that create local temp tables also drop the local temp tables upon completion?

You can if you want, but it doesn’t really matter, as Paul White discusses in these posts. SQL Server keeps the structure & statistics around for other sessions. The drop only affects your ability to access the temp table after your proc finishes.

WBurgh: Good old MDS, any word on the street on when it’s being killed off? Asking since I’m not seeing anything in the Azure pipeline from MS.

No, but I certainly wouldn’t do any new implementations of it.

LegalEagle: Good Morning Brent! I have an availability group running on physical hardware and wish to add an additional node but add it as a VM to offload reporting. Can I mix a physical SQL with a VM SQL in the same AG? Google doesn’t seem to have information on that. See you at PASS!

Yes, you can, period. See you in Seattle!

Scrootch: My friend is struggling with PAGELATCH on the tempDB, leading to GAM contention. Server spec: 2019 Enterprise, 24 CPU/tempDB files, 320GB memory, 760 batch req/s. Would adding more tempDB files than CPUs have a negative effect to address this issue? Eg adding 32 file vs 24 CPUs.

Check out my Fundamentals of TempDB class.

Maksim: Has moving lob data off row for a given table ever moved you across the performance finish line?

No.

CakeAndEatItToo: The ‘missing index’ dmv is limited to 600 results, but for a SaaSy server with 1000+ db’s there’s a lot of cpu cycles spent cleaning up every couple of minutes. Is it possible to increase the limit or scale back the cleanup or do ANYthing without turning the whole thing off?

I’d be curious as to how you’re measuring that. Post a question on DBA.StackExchange.com with your evidence showing the cleanup of that DMV is causing overhead on your system.

Bill: I have an OLTP system with cache instability, but plenty of cpu to cover it. I had a frequently run proc that runs in <10ms., been stable for years, start blocking, and queries to timeout after 30s., seemingly due to it being Recompiled. indexstats are current. Where to start?

Run sp_BlitzCache, look at the Priority 1 warnings, and read the URLs in each of those.

Kinneret: Have you ever written an epic SQL query only to see SSMS freeze up or crash?

Click File, Save. Save your work every few minutes. No different than the advice I’d give to anyone using any computer application, ever.

Mollusk: What are your thoughts on running SQL HA technology like AG’s or a failover cluster instance in a public cloud. Issues are that VMs could be vmotioned (a lot). How will this affect auto-failovers or cluster health? Can I just set healthcheck and timeout settings higher?

I haven’t done hands-on AG work in years, sorry. I got out of production DBA work because complexity kept rising, fragility kept getting worse, and I kept hitting weird undocumented problems. I got tired of being on-call to fix that kind of thing.

J Katz: How to get a job starting out with perf tuning? Im a Senior infrastructure DBA, and I want to get experience perf tuning –I love your courses BTW, best investment ever!. Does Microsoft have positions like this? Should I expect to take a pay cut?

Use the Buddy System. Call people in your network who know you know your stuff. They’ll get you past the stack of resumes. Generally speaking, Microsoft doesn’t tune the T-SQL or indexes that run on their customers’ servers.

DataPayload: Hello Brent, My friend understand that 1=1 help to comment out and 1=2 in create table without data in where clause. He is not sure if these helps the query logic or plan any other way. Can you please help him. Thank you.

Read this.

SQLServile: Our server is to be replaced with a newer one, which is fine. The staff doing this though, say it will be “too hard” to retain the original servername in this exercise. A new name will break countless connections (but hey, not their problem). Is there a less disruptive way?

They’re right. Instead, immediately after migrating to the new server, rename the old server as something else, like PROD_OLD. Create a DNS CNAME (alias) for the old name, and have it point to the new server’s name.

Yousef: Is it ever OK to specify an additional / redundant WHERE condition value to get a better index selection?

I would want to see an example.

Dan Griswold: Should Managed Service accounts like NT SERVICE\…. be disabled for security purposes? Why or why not? I have searched all over and can’t seem to find a clear answer.

I don’t do security work.

Raguel: Do you know of anyone that has ported the Stackoverflow or AdventureWorks DBs to other platforms (PosgreSQL, etc) for bakeoff performance testing purposes against SQL Server?

Read this.

Galina: What are your pros/cons of streaming on Youtube vs Twitch?

In my experience, Twitch pays better. When I was streaming regularly, I was getting $500-$750/month from Twitch, whereas I don’t think I ever cracked $200/mo from YouTube.

Stacey Q: What is the largest DB you have seen that had enough server memory to cache the entire DB in RAM?

About 1.5TB. I’ve seen much larger databases, but they just weren’t cached in RAM.

Shaheen: How does index tuning for data warehouse queries differ from index tuning for OLTP queries?

These days, data warehouse tuning tends to start with clustered columnstore and go from there. OLTP tuning starts with multiple rowstore indexes.

Don’t Blame Anthony: How does SQL server determine row order for “Select Top” or “Delete Top” when no order by clause is specified? I see this pattern frequently.

Whatever rows the query plan pulls first, that’s what comes out. Order isn’t guaranteed without an order by. It can change based on parallelism, available indexes, merry-go-round reads, all kinds of stuff.

Stockburn: Hi Brent, my company has 600K+ databases in Azure SQL DB. How would you change your perf tuning methods and investigation when faced with so many DBs? They all use a similar schema but may be used more or less heavily. We have been using Query Store; interested in your advice.

Read this.

Marc Spector: Sometimes when we run sp_whoisactive we see long duration sessions from our .NET apps with a sql_command of “sys.sp_reset_connection;1” and status of “dormant”. What causes this? Is this anything to be concerned with?

Read this.

TheCaptain: Hi Brent, Some developers at my organisation have written a module to encapsulate some common client parameters, making them optional and default: timeout=0, connection_timeout=0. What’s your opinion and do you know of any articles with guidance on best practices?

No.

MikeNM: I was watching your talk about debugging stored procedures, and your laments about the death of debugging in SSMS. Have you worked with SQL debugging in VS Code? If not, what do you recommend?

I use the technique I describe in the video.

Cats_Everywhere: Brent, Do you know any DBAs who have taken advantage of the WFH movement to move to a low cost of living city/area? Any suggestions? Must have great restaurants!

Yes, but it’s typically been to move closer to family.

Tefnut: When SQL connections are re-used via .NET connection pooling, does the sproc author need to take care to reset connection-level settings that were enabled in the executed stored procedures? Ex: at the end of the sproc, do SET NOCOUNT OFF, SET XACT_ABORT OFF, etc.

No, but remember that the procs might be called by other things, not just .NET.

Yourbiggestfan: Hi Brent, Can you point me to any website/blogsite which has performance tuning challenges/exercises for SQL Server where we can test our knowledge? Your invisible indexes challenge is the kind I am looking for.

Yes, click Training at the top of the site.

Emelio: What is the best book on Data Warehouse Design?

I have no idea.

NotCloseEnought2RetirementToStopLearning: Hi Brent, my shop has been mainly SQL 2012 and 2016 (nagging my manager to upgrade). Can you recommend key\cool features released in SQL 2017, 2019, and 2022 that I should focus on so I look super smart when we do upgrade? One feature per version please.

Read this. 

Taweret: What are the top rules / guidelines from older SQL Servers that you see used on new SQL Server versions but shouldn’t be?

To monitor Page Life Expectancy and to lower Fill Factor in order to reduce fragmentation.

Khonsu: What are the pros / cons of creating NC temp table indexes inline vs a separate step after table creation?

Read this and/or attend my Fundamentals of TempDB class.

Steven Grant: Do you have a recommended sample DBA calendar that shows admin functions to perform daily, weekly, monthly, yearly for SQL shops with little previous DBA structure?

No. Neat question, but no.

James: Hi Brent – with the non-stop surge toward the cloud, companies desiring and hiring more toward BI/data teams/engineers etc., and being adamant that we are not needed, is it time to accept the inevitable that all respect and understanding of a DBA is gone and we need to move on?

Yes, as you can tell by the complete lack of database questions above, there’s no work left to be done. You should move on. You go on ahead without me. Best wishes, hugs, thoughts, and prayers.

Midwest DBA: Who is the Brent Ozar of Data Warehousing? Do you have any recommendations on training resources?

Data warehousing is being reinvented, and there are tons of approaches. For example, check out James Serra’s video on what a modern data warehouse entails:


Office Hours: Dodging Work Edition

Videos
8 Comments

Post your questions at https://pollgab.com/room/brento and upvote the ones you’d like to see me cover. Today, I’m dodging work, so I went through your questions while I waited for the coffee shop to open:

Here’s what we covered:

  • 00:00 Introductions
  • 03:28 Ivan: What are the top signs that a table has a poor clustered index?
  • 05:39 NotCloseEnough2RetirementToStopLearning: Hi Brent, it seems the job market wants and is paying more for data generalists (wider but less deep skillset) than data specialists (DBA, deep but less wide skillset). This seems to be limiting salaries and the number of Senior positions. Any thoughts or comments?
  • 07:52 Gennady: When should you use a table per year of data (and union them together in a view) vs using formal sql table partitioning?
  • 12:32 Mr. Ed: How would I determine if my IT team needs a DBA?
  • 14:28 Tom: Can you recommend a live T-SQL class?
  • 15:23 Jorriss: If you were a woodchuck, would you prefer brick and mortar structures?
  • 17:18 J.P.: Why should I learn noSQL if I know SQL? Is it only because of the speed?
  • 19:06 Rob: We have a 4.8 TB SharePoint database that is causing a lot of issues, like backups, due to the file size. Has anyone run across this issue?

[Video] Fundamentals of Stored Procedures at SQLBits

T-SQL, Videos
12 Comments

Anybody can write a stored procedure with a little help from Google. This session is about how to write stored procedures that have a high likelihood of performing well and are easy to troubleshoot.

This fast-paced, all-demo session from SQLBits will NOT cover how to write a query, syntax, or performance tuning. This is about good best practices after you’ve written the first one – things like how to catch errors, how to pass in multiple values, how to debug without the debugger, and more.

If you enjoyed this session, check out SQLBits 2022’s free video library with the other sessions from this year, and for all years.


Why Adding Some Memory Doesn’t Fix All Caching Problems

Wait Stats
18 Comments

It seems obvious: add some memory, and SQL Server should be able to cache data, thereby making queries run faster … right?

Well, if you can’t cache everything you need for a query, you might be surprised to find that SQL Server may still read the entire table up from disk, regardless of what parts of the table are currently cached in memory.

Let’s start with a little setup. I’m going to set the server’s max memory at about 8GB, which isn’t enough to cache the entire Comments table from the Stack Overflow database. (I’m using the 2018-06 version of the database, in which the Comments table is about 21GB.) I’m also going to drop all of my nonclustered indexes to force SQL Server to do a table scan.
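Here’s roughly what that setup looks like. (The memory value is in MB, and the index name is a placeholder for whatever nonclustered indexes exist on your copy.)

```sql
-- Cap max server memory at roughly 8GB (value is in MB):
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 8192;
RECONFIGURE;

-- Drop the nonclustered indexes on Comments so the query is forced
-- to scan the clustered index. (IX_UserId is a placeholder name.)
DROP INDEX IF EXISTS IX_UserId ON dbo.Comments;
```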

Then in order to demo the problem as quickly and easily as possible, I’m going to write a query that you probably wouldn’t normally write. I’ll take the Comments table of the Stack Overflow database, which has a clustered primary key on the Id column, and query it to find all the comments Jon Skeet has ever made:
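The query looks something like this. (Jon Skeet’s Stack Overflow UserId is 22656; the column list is just an example.)

```sql
SELECT Id, CreationDate, Score, Text
FROM dbo.Comments
WHERE UserId = 22656   -- Jon Skeet
ORDER BY Id;
```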

A couple of things to recognize:

  • There’s no index on UserId, so SQL Server will need to scan the entire clustered index
  • The ORDER BY Id actually pays off a little for the clustered index scan – because we asked for the data sorted by Id anyway

The first time the query runs on 8GB RAM…

If we look at the actual execution plan, we spent 48 seconds waiting on PAGEIOLATCH (storage reads) in a 22-second query. Wait time can exceed query duration because multiple parallel threads wait on storage at the same time:

During that 22-second query, we read the entirety of the Comments table up from storage. We can prove that by checking sys.dm_io_virtual_file_stats before & after the query runs. That’s where SQL Server tracks how much it’s read & written from the various data & log files:
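A snapshot query might look like this. The counters in sys.dm_io_virtual_file_stats are cumulative since the last restart, so run it before and after the query, then diff the numbers. (The database name StackOverflow is an assumption here.)

```sql
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name AS logical_file_name,
       vfs.num_of_bytes_read,
       vfs.num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(DB_ID(N'StackOverflow'), NULL) AS vfs
INNER JOIN sys.master_files AS mf
    ON  vfs.database_id = mf.database_id
    AND vfs.file_id     = mf.file_id;
```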

SQL Server read 21GB of data from the Stack Overflow database, which is spread across 4 data files. That’s the size of the Comments table. Okay, so far so good.

The second time the query runs on 8GB RAM…

We’ve already got at least some of the table cached in memory, right? If we rerun the query again, surely our PAGEIOLATCH waits will drop a little because the buffer pool is warmed up, right? Right?

And we read the same 21GB of data up from disk again:

The problem is that this query needs the data sorted in the order of the clustered index’s Id, so it’s going to do a scan from the beginning of the clustered index to the end. When the query completes, sure, some of the Comments table is cached in memory – but only the end of it, the highest Ids. So when our query starts again, it’s like Groundhog Day – we begin reading the table from the beginning again.

Adding just some memory doesn’t fix this.

Let’s amp our server’s memory up to 16GB and try the query again:
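Bumping memory is just another sp_configure call (again, the value is in MB):

```sql
EXEC sys.sp_configure N'max server memory (MB)', 16384;
RECONFIGURE;
```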

I’m also running the query a couple of times just to make doggone sure the buffer pool is warmed up and that SQL Server’s total memory has risen up to match its target. Then, run it a third time, and check wait stats and storage reads:

The query still takes 22 seconds, and we still wait about a minute on storage because we’re reading it all back up from disk again:

But cache the whole table, and magic happens.

Raise the server’s memory up to 24GB, enough to cache the 21GB Comments table, and run the query a couple times to prime the pump:

Suddenly the query finishes in 3 seconds instead of 22, and we don’t spend any time waiting on storage whatsoever:

Because we did no physical reads at all – the only activity during this span was a little writing to TempDB by sp_BlitzFirst itself:

Keep in mind that this is a simplified demo to illustrate a complex problem:

  • Real-life queries are more complex than this, especially in how we have to dig into the plan to understand why they’re scanning an index in order.
  • Adding an index to get a seek is a great solution – but only if you’re reading a relatively small portion of the table. If you seek in and read a range of rows, especially on real-world-sized data sets, you can still hit this problem.
  • Real-life servers serve multiple databases at a time, each running multiple queries. It’s hard as hell to cache everything you need.
  • SQL Server uses memory for lots of things, not just caching data, and they interact in tricky ways.
  • This is especially tricky in cloud VMs where you might want to make a giant leap up in memory, but you’re required to buy way more CPUs than you need in order to get more memory.

The point of the story here was to explain why adding memory – even doubling or tripling it – might not be enough to put a dent in PAGEIOLATCH waits. If SQL Server wants to read pages in a specific order, and the pages it wants aren’t in memory, it will throw out cached pages on the very same object – only to read those pages back up from disk again later while executing the query.


How to Add Invisible Indexes in SQL Server Enterprise Edition

Indexing
19 Comments

Every now and then, a client says, “We want to add indexes to this third party application, but if we do, we’ll lose support.”

No problem – enter indexed views.

Say our vendor app is the Stack Overflow database, and the app constantly queries Users by display name:
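For example, a lookup like this (the specific DisplayName value is just an illustration):

```sql
SELECT Id, DisplayName, Location, Reputation
FROM dbo.Users
WHERE DisplayName = N'Brent Ozar';
```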

That query plan scans the Users table because there’s no index on DisplayName:

And our mean ol’ nasty vendor won’t let us add any indexes. No problem – let’s add a view:
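A sketch of the view (the name is just a suggestion). Note the SCHEMABINDING option: it’s required if we want to index the view in the next step:

```sql
CREATE VIEW dbo.vwUsers_DisplayName
WITH SCHEMABINDING
AS
SELECT Id, DisplayName
FROM dbo.Users;
GO
```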

A regular view like that is just syntactic sugar that makes it easier to write queries. You hide the complexity in the view, and then people can easily select stuff from the view without understanding everything that the view is pulling together. A view is just a T-SQL shortcut, but it doesn’t change the database structure.

However, you can turn a regular view into an indexed view (or a materialized view, different terms, same thing) by creating a unique clustered index atop the view.

Normally, when we think about clustered indexes, we would assume that the Id column is the right one to use here, since it’s unique. However, the entire purpose of what we’re about to do is just to give ourselves an index on DisplayName – so let’s cluster on that:
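Something like this, assuming the schemabound view above. DisplayName alone isn’t unique, so we tack Id onto the key to satisfy the unique requirement (the index and view names are placeholders):

```sql
CREATE UNIQUE CLUSTERED INDEX CLIX_DisplayName
ON dbo.vwUsers_DisplayName (DisplayName, Id);
GO
```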

Then run our query again and check out its new execution plan:

SQL Server Enterprise Edition says, “Hey, you’re looking for Users with a specific DisplayName – it just so happens I’ve got a copy of that over here, sorted in order!” (Standard Edition doesn’t know this trick.) It produces a nifty execution plan with a key lookup – without actually having an index operation on the Users table itself.

The real beauty, though, is that when our mean ol’ nasty vendor looks at the indexes on the Users table, they have no idea that we’ve done anything. If we need to open a support case with the vendor or upgrade the software, we can temporarily drop our indexed views, and then recreate ’em after the support call is over. (That’s actually important to do, too – if the vendor tries to alter the table, their scripts can fail due to the presence of the indexed view.)

Indexed views have tons of gotchas, and explaining ’em all is way outside of the scope of this blog post. I demo ’em in the live Mastering Index Tuning class next month, or if you’ve got the recordings, head over here now.

Speaking of which, you know who else has the recordings? Paul White and Michael J. Swart, both of whom had good answers to the challenge in Monday’s blog post. Paul used the indexed view, whereas Michael had a more, shall we say, “creative” solution.


Updated First Responder Kit and Consultant Toolkit for July 2022

Summer has turned the corner. Sure, right now the First Responder Kit is wearing its swimsuit, basking in the sun, but already as we speak, the days are getting shorter. It’s only going to be a matter of time before the Pumpkin Spice release of the First Responder Kit. Go outside and take a walk, work on your tan while you still can.

How I Use the First Responder Kit
Wanna watch me use it? Take the class.

To get the new version:

Consultant Toolkit Changes

I updated it to this month’s First Responder Kit, but no changes to querymanifest.json or the spreadsheet. If you’ve customized those, no changes are necessary this month: just copy your spreadsheet and querymanifest.json into the new release’s folder.

sp_Blitz Changes

  • Enhancement: ignore SQL Server 2022’s default In-Memory OLTP usage for TempDB if it’s low. (#3110)
  • Enhancement: ignore SQL Server 2022’s idle POPULATE_LOCK_ORDINALS wait. (#3105)
  • Enhancement: ignore queries in system databases that have recompile hints. (#3119, thanks Erik Darling.)
  • Fix: typo on a check. (#3116, thanks Andreas Jordan.)

sp_BlitzCache Changes

  • Fix: should no longer get arithmetic overflow errors on servers with huge numbers of reads and writes that overflowed bigints. (#2980, thanks sunsickteck and RihoA.)
  • Fix: the Average Max Memory Grant column contents formula was wrong. (#3120, thanks MrTCS.)
  • Fix: make warning capitalization more consistent. (#3096)

sp_BlitzFirst Changes

  • Enhancement: the file stats output section now shows the database name at the far right. (#3118)
  • Fix: only repopulate the ##WaitCategories table if its contents are older, not newer. (#3092)

sp_BlitzIndex Changes

  • Fix: lower memory usage on SQL Server 2019 instances that are facing a bug in an out-of-control number of entries in sys.dm_db_missing_index_group_stats_query. This DMV is only supposed to have 600 rows, but it looks like in some older 2019 CUs, the number of recommended index plans wasn’t capped the way BOL says it should be. (#3085, thanks Paul Neering.)

sp_BlitzLock Changes

  • Enhancement: add spid and wait_resource columns to output. (#3101, thanks David Hooey.)

sp_DatabaseRestore Changes

  • Enhancement: look for Ola’s CommandExecute proc in the current database, so it should be supported in more non-system-database scenarios. (#3094 and #3095, thanks Ben Wiggins.)
  • Fix: case sensitivity typo involving BackUpFile. (#3100, thanks Maarten Clardij.)

Bonus changes: Anthony Green kept the SQL Server versions file up to date.

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. Be patient: it’s staffed by volunteers with day jobs. If it’s your first time in the community Slack, get started here.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.


SQL Server Pop Quiz: A Key Lookup Without the Index

Execution Plans
14 Comments

How do we get this to show up in an execution plan:

Without having an index seek or index scan operation on the same table?

Take any copy of the Stack Overflow database and write a query that will produce a key lookup on the Users table, but it’s not allowed to have an index seek or index scan operation on the Users table. You’re allowed to change the database structure – for example, if you need to change the clustered index, drop it, add nonclustered indexes, whatever – and of course include those changes in your comment as well.

Post your query in the comments here by end of day Wednesday, and I’ll pick my favorites and put ’em in the blog on Friday. My favorite creative answers will get 1 year access to the Recorded Class Season Pass: Masters Classes. My own answer blog post is written & scheduled for Thursday.