
How to Go Live on SQL Server 2022


You’re planning to migrate to SQL Server 2022, and you want your databases to be faster after the migration.

This is NOT a blog post about how to migrate – that’s the same as it’s been for a long time. Go build the new 2022 servers, and use log shipping or database mirroring to sync the old and new servers. (I’m not a fan of using Distributed Availability Groups to do version upgrades. You can, but it is a heck of a lot of moving parts to set up for a one-time migration.)

This is a blog post about what to do after you migrate:

  1. Turn on Query Store, but make no other changes to your databases (including compatibility levels)
  2. Gather a 1-2 week baseline of performance metrics and query plans
  3. Switch some (but not all) databases to SQL Server 2022 compatibility level
  4. If users are unhappy with performance, use Query Store to identify plans that got worse
  5. Get relief for that query quickly, possibly with 2022’s new ways to fix performance without changing the query

Let’s go into details on each one of those.

Stage 1: Turn on Query Store.

This new-in-2016 feature is like a black box recorder for your query plans. It has a lot of weaknesses, but it’s still better than whatever game plan you had that didn’t include Query Store.

The goal here is to start capturing query plans as they are today, before all hell breaks loose. Compatibility levels change execution plans, and SQL Server 2022 has a lot of ways that it might change your plans. If performance gets worse, we’re going to want to see what the query plan looked like before our change.

If you’re migrating from SQL Server 2014 or earlier, you don’t have access to Query Store yet. That’s okay: you can still do the migration, but just make sure that Query Store is the first thing you turn on after you go live on 2022. (You might have heard that Query Store is on by default in 2022 – that’s misleading. It’s only on for new databases that you create from scratch – it’s not on for existing databases you migrate into 2022.)

If you’re migrating from SQL Server 2016, 2017, or 2019, turn on Query Store sooner rather than later – long before you actually do the migration project. This way, you’ve got the historical data when you’re actually running on the current version, before the database is moved to 2022.
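Turning it on is a single statement. Here’s a minimal sketch – the database name and settings are illustrative placeholders, so tune them for your own workload:

    /* Enable Query Store with starter settings - adjust for your workload */
    ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
    ALTER DATABASE [YourDatabase] SET QUERY_STORE
        (OPERATION_MODE = READ_WRITE,
         MAX_STORAGE_SIZE_MB = 1024,
         QUERY_CAPTURE_MODE = AUTO);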

To learn how to configure Query Store, watch this video from Erin Stellato:

In either case, once you’ve moved to 2022, don’t make any other changes to your databases at first – especially not compatibility level. SQL Server 2022 has options for compatibility level going all the way back to SQL Server 2008, so you should be able to forklift your databases over as-is, and just leave them there for a week or two.

Stage 2: After going live on 2022, gather a baseline.

We want to get a crisp, clear picture of how the old compatibility level is working out for us on SQL Server 2022.

Whenever you change anything about an application – whether it’s a code deployment, new indexes, or a SQL Server version change – people are always going to say, “Hey, things are slower than they used to be.” They’re probably lying. They’re just taking advantage of the opportunity to pin the blame on you, Default Blame Acceptor, and that’s why I want you changing as few things as possible when you do the migration.

Things may actually have gotten worse for their specific query, or for the workload overall. Maybe we messed up a SQL Server configuration, missed a trace flag, or the server hardware isn’t quite what we thought it was. (True story: a recent client’s new server was accidentally provisioned on the slowest possible storage instead of the fastest, so of course their upgrade went poorly.)

Do your normal investigative troubleshooting, and you can even use Erin’s tips in her video on how to use Query Store to track down query plan changes. However, during this stage, do not upgrade the database’s Compatibility Level to fix one query’s performance. Doing so will change the performance of many other queries, some for the better and some for the worse, and you’re not prepared to troubleshoot that right now. If people are complaining about one query, troubleshoot that query. If people are complaining about the whole server, troubleshoot the whole server – but leave compat level where it is for 1-2 weeks.

Stage 3: Change databases to 2022 compat level one at a time.

Compatibility level is a database-level setting. You can see it by right-clicking on a database and going into its options, or by looking at the compatibility_level column in sys.databases.
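For example, to see where all of your databases stand at a glance:

    /* List each database's current compatibility level */
    SELECT name, compatibility_level
    FROM sys.databases
    ORDER BY name;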

It’s database-level because it’s possible that some of your databases will perform better on newer compatibility levels – but some of them may not. That means you should take the angriest users, the ones who are the most pissed off about slow performance, and try setting just their databases over to 2022 compat level.

It’s a simple one-line change to switch to compatibility level 160 (SQL Server 2022), as the documentation illustrates:
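    ALTER DATABASE [YourDatabase]
    SET COMPATIBILITY_LEVEL = 160;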

“Wait – that’s two lines, not one,” you say, pointing your Cheeto-dust-encrusted finger at the screen. Well, the change is one line, but I want you to note the prior compatibility level and the date/time that you changed it – because you might need to roll back. Don’t worry, rolling back is as easy as running that same ALTER DATABASE command, but with the old compat level instead of the new one.

You can make this change whenever you want, without taking the database offline, but there’s a catch: it clears the plan cache for that database. That means you’re temporarily susceptible to parameter sniffing issues as SQL Server suddenly builds new query plans for this database.

Stage 4: Troubleshoot reports of slow performance.

Most of your queries are probably going to go faster after the change. But what’s most? 90%? 99%? 99.9%? Even if just 0.1% of your queries slow down, that’s still a heck of a lot of queries to suddenly have to troubleshoot – especially when users often don’t even know which query they’re talking about. They’ll put in frantic help desk tickets that say things like “The customer screen is slow!!1!” and “The import process is down!!one!” and “My keyboard is filled with Cheetos dust!”

If tons of reports of slowness come in quickly, don’t be afraid to change the compatibility level back to the prior one. It’s a safe, quick, easy way to make the screaming stop. And… just stop there. If users are happy enough on the old compatibility level, leave it there.

However, if the slow query complaints come in at a rate that you can handle, and you’re seeing performance improvements that you wanna keep in other areas of the app, then it’s time to roll up your sleeves and do troubleshooting on the slow queries.

One of the easiest ways is to ask Query Store, “What query plans have gotten worse?” In SSMS, go into the database, Query Store, and then run the Regressed Queries report.

As soon as the report opens, you’re going to need to change the configuration, because the defaults are wrong. At the top right of the report, click Configure. Here’s what the defaults look like:

By default, it’s showing total duration – so queries that ran more often during the time window rank artificially higher on the graph, even when the real root cause is just that they’re running more often, not that each run got slower. Me, I like changing “Based On” to “Avg”.

Then, down at the bottom of the window, change “Minimum number of query plans” from 1 to 2. We’re looking for queries whose plan actually changed (perhaps due to the new compat level), not just slowed down. Click OK, and view the report.

The top left window lists the regressed queries, with biggest impact to smallest. As you click on each query, the top right window will update to show the performance of various plans for that query. Remember how I asked you to save the prior compatibility level, and the date/time that you changed it? We’re concerned about queries whose plan changed after the time that you changed the compatibility level. That indicates a query whose performance may have been adversely affected by the new compat level.
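If you’d rather script that check than click through the report, here’s a rough starting point – swap in the date/time you noted when you changed the compat level, and run it in the affected database:

    /* Queries that compiled a brand new plan after the compat level change */
    SELECT qsp.query_id, qsp.plan_id, qsp.initial_compile_start_time
    FROM sys.query_store_plan AS qsp
    WHERE qsp.initial_compile_start_time > '2023-04-01 09:00'
    ORDER BY qsp.query_id, qsp.initial_compile_start_time;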

Stage 5: Get relief for that slow query.

I’m going to list these from easiest to most time-consuming:

Option A: forcing an older plan. While you’re in Query Store’s Regressed Queries report, one of the easiest ways to get temporary relief is to click on the query plan you used to get with the prior version of SQL Server, then click the Force Plan button on it.

I said it was easy. I didn’t say it was good.

When you force a plan, that doesn’t mean it’s going to perform well with all possible parameters for that query. This is especially true for queries that keep changing plans for valid reasons, like the example at right. That query’s got 5 different plans, and genuinely needs them. If I force a single plan for it, but my data has outlier parameters, I’m probably gonna get performance complaints from those users. We cover better ways to fix those in my Mastering Parameter Sniffing class, but that’s outside of the scope of this blog post.

Forcing a plan also means if a better option comes along later, like En Vogue’s lovin’, you’re never gonna get it. The whole reason you upgraded compat levels was to get better performance, but that query plan is stuck in the past. You can revisit them later by going into Query Store’s Queries with Forced Plans report – and I’d recommend going back in there every couple/few Cumulative Updates. Microsoft improves query plan behavior over time, so it’s possible that by unforcing a plan later, you’ll get better plan options.
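The GUI’s Force Plan and Unforce Plan buttons just call a pair of system procedures, so you can script this too – the IDs below are placeholders you’d pull from the Regressed Queries report:

    /* Force the plan you liked - and later, unforce it to let 2022 try again */
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 17;
    EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 17;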

Besides, you’re going to want to go into that Queries with Forced Plans report anyway because query plan forcing can fail, plus the query may change over time. As the query changes, the forced plan will no longer be relevant (because it’s for a query that no longer exists.)

I do still like forcing plans – and I wanna tell you about it because it’s quick and easy – it just has drawbacks, so we need to keep going with more options.

Option 2: give SQL Server hints for the plan. Let’s say that in SQL Server 2022 compat level, SQL Server decided to do something in the query plan that made performance worse instead of better. Let’s say it used batch mode processing on a rowstore index. If you want to disable that behavior, you can add a hint to the query without touching the query itself.

This is called Query Store hints, and David Pless has a tutorial on it. It’s not as easy as forcing a plan in the Query Store GUI. You’re going to have to get the query’s ID from Query Store, then apply the query hint you want using sp_query_store_set_hints.
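The general shape looks like this – the query_id and the specific hint are stand-ins, so use whatever your situation calls for:

    /* Disable batch mode on rowstore for one query, without touching the query */
    EXEC sys.sp_query_store_set_hints
        @query_id = 42,
        @query_hints = N'OPTION (USE HINT (''DISALLOW_BATCH_MODE''))';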

It does involve work on your part, but … it’s still faster than fixing/tuning the query. As soon as you start changing the query itself, you’re probably dealing with getting approval from different folks in the company, getting it into source control, testing it, getting it deployed, etc. Query Store hints are instant, presto, in production. I can’t imagine how that could go wrong.

Option III: fix/tune the query. This takes the most work, but I’ll be honest: as a consultant, it’s the one I do the most often. Usually when I pop open a query that people are complaining about, I say, “Okay, here’s why it’s not performing well, and if I change this, this, and this about the query, it’ll go dramatically faster than it used to, and it’ll get a better plan overall.”

This is the option that doesn’t just bring performance back to its prior levels – it makes performance better, and after all, isn’t that what we want? (Honestly: no, a lot of people just want it back to the way it was, and that’s why this is the last option.)

SUM(MARY)

When you’re preparing to migrate to SQL Server 2022:

  1. Turn on Query Store, but make no other changes to your databases (including compatibility levels)
  2. Gather a 1-2 week baseline of performance metrics and query plans
  3. Switch some (but not all) databases to SQL Server 2022 compatibility level
  4. If users are unhappy with performance, use Query Store to identify plans that got worse
  5. Get relief for that query quickly, possibly with 2022’s new ways to fix performance without changing the query

But while you’re in those final two stages, pay particular attention to the conditions in steps 4 and 5: if users aren’t complaining about performance, move on to the servers where they are complaining about performance. Every server has queries that need to be tuned, and you need to focus on the ones that users will actually appreciate.


[Video] Office Hours: Live Q&A


This one’s broken up into two parts because I took a bio break mid-stream:

  • 00:00 Start
  • 03:04 Clippy: Hey Brent, you are the best!!! Can you share us more about your roots? What’s the origin of your family name and where does it comes from?
  • 04:15 I_ALSO_WANT_A_FERRARI: Hi Brent, first of all thank you for all your good advices. During a batch I can see a heavy CPU load, in the perf monitor the #compiles are +/- 80% compared to the #batch requests. Too much. Most of the compiles are coming from TVP as param for a SP. Is this a bad practice?
  • 06:52 Sid V: Is their better value for the production DBA to go deep (knowledge) in their field or go wide (knowledge) in their field? What are the common examples you see of going deep and wide?
  • 10:38 Curious DBA: Hi Brent! Do you recommend manually configuring the pagefile.sys size and drive(s) it resides on when configuring new SQL Servers, or is that something you let Windows handle? If you do configure it, how do you determine how much size to allocate to it?
  • 11:33 Jaime Sommers: Is there a good way to determine what percent of queries are single threaded vs multi threaded for the purposes of knowing if we have too many cores / licensing.
  • 13:07 Q-Ent: Hi Brent . How do you imagine your life at your retirement. I assume for us, your followers will be “Brent Ozar Unlimited Last Update 2 years ago” :D. Do you prepare any successor for this empire !!!! Love your Job and your life perspective.
  • 14:03 gserdijn: Hello Mr Brent, Microsoft documentation on SET ARITHABORT for all versions states: “Setting ARITHABORT to OFF can negatively impact query optimization, leading to performance issues.” Why is that? Is the Cardinality Estimation so fragile?
  • 15:51 Yusuf: What are the best resources for learning PostgreSQL performance tuning?
  • 18:09 The Dyslexic DAB: Hi, We used to have a ‘special’ SQL license which meant that it didn’t matter if we installed Enterprise or Standard. Our licensing has changed and we now have dozens of Enterprise Editions that we can’t afford! Any advice or gotchas for downgrading from Enterprise to Standard? Thanks

And part two:

  • 00:00 Start, Twitch ad discussions
  • 00:40 DumbQuestionsRUs: Would you still recommend formatting drives to 64k?
  • 00:54 Shefatyah: What interesting stuff in the query plan XML do you wish was visible in the SSMS UI query plan?
  • 01:43 DantheSQLMan: Do you have any advice on working out of the country with SQL consulting?
  • 03:10 i_love_you_brent: Good morning Brent! We have 250 db on 1 instance.we use SIOS for failover on secondary. All dbs are on different versions with creepy procs for etl and etl data is stored in the same db.job runs every 15 to summarize data. Manager wants to separate 250 etl DBs on different machin
  • 04:38 Shefatyah: Sometimes running “EXEC sp_BlitzFirst @Seconds =10, @ExpertMode = 1” just runs forever but if we turn off expert mode and re-run it, it returns results instantly. Is this due to tempdb contention? Any troubleshooting tips? SQL 2019 Enterprise thanks
  • 05:40 NoobDBA: Hi Sir! Can you share with us, your biggest challenge on any consulting job that have you been into. Thanks!
  • 07:40 How do you eat rice with chopsticks?
  • 08:04 Yusuf: Is there a recommended way to run a SSMS query and then immediately run sp_blitzcache to analyze the most recent run for that query?
  • 08:53 Cara Dune: Which is better for office hour streaming? Youtube or Twitch? Why?
  • 10:02 ChopstickWizard: Probably a very silly question. When I write Select A.* from A INNER JOIN B On A.T = B.T and then someone else writes Select A.1 From A, B Where A.T = B.T Whose is better? Performance wise. so sorry if this is a very basic question
  • 12:26 EngineHorror: Have you ever run into the Halloween problem in any of the DMLs? Can one say that the recent versions of SQL Server don’t have it?
  • 13:48 i_love_you_brent: We have 250 dbs on 1 instance with etl and oltp together. is it worth spending time on separating those to 500 dbs 1 for etl and 1 for oltp. reports are timing out sometimes.
  • 14:40 ExtramileDBA: I have done tons of projects on SQL server and recently moved to a shop that is pro-MYSQL with SQL server only supporting ISV products. Do you know of any awesome MySQL conferences similar to the likes of Group By, SQL bits and Pass.
  • 16:08 Aleksey Vitsko: Hi Brent! With announced “failover from SQL Server 2022 to SQL Managed Instance” feature still being in private preview, do you think Microsoft will make this feature publicly available with one of future CU for SQL 2022 ? Just install CU and feature becomes available ?



What Does Setting the SQL Server Compatibility Level Do?

If you right-click on a database in SQL Server Management Studio, you get an option to set Compatibility Level at the database level:

When you upgrade SQL Server or you want to improve performance, which option should you choose? Should you always go with the newest compatibility level? If you have an old application, can you upgrade SQL Server but still keep the old compat level without the vendor knowing? Let’s hit the common questions.

What does compatibility level do?

When Microsoft brings out a new version of SQL Server, they tend to keep the newest features only available in the newest compatibility levels. For example, SQL Server 2022’s Parameter Sensitive Plan Optimization (PSPO) is only available in databases running under SQL Server 2022 compatibility level.

That means if you’re taking a database that used to live on an older SQL Server, and you want to host it in SQL Server 2022, and you want it to have the same behavior that it’s always been used to, you should keep it on the compatibility level that it’s currently on. For example, if you’re hosting it in SQL Server 2016, and the database is currently at SQL Server 2016 compatibility level, then you could move the database to a 2022 server, but keep compat level on 2016, and the users shouldn’t notice the difference.
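A restored database keeps whatever compat level it had before, but it’s cheap to verify rather than assume – here’s a quick sketch with placeholder names and paths:

    /* After restoring onto the 2022 server, confirm the compat level came over */
    RESTORE DATABASE [VendorApp]
        FROM DISK = N'D:\Backups\VendorApp.bak';
    SELECT name, compatibility_level
    FROM sys.databases
    WHERE name = N'VendorApp';  /* 130 = SQL Server 2016 */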

In reality, though, there are things inside SQL Server itself, at the server level, that will change no matter what your compatibility level is. For example, if Microsoft deprecates a feature and removes it altogether, that feature isn’t available even if you’re on older compat levels. (Hello, Big Data Clusters.)

Should I change compatibility level?

If there’s a specific feature that you need that’s only available in some compat levels, then yes.

However, if you’re happy with performance, then no. Hear me out: changing your compatibility level can make performance worse instead of better. Sure, in some cases, it makes performance GREAT – but because all change = risk, changing compat level when you’re already happy is dangerous.

How does compatibility level affect performance?

If you migrated from SQL Server 2019 to 2022, changing a database’s compatibility level can make things better or worse.

In each version of SQL Server, different features are enabled under newer compatibility levels. Before changing your compat level, you should review what features are added when you change from your current compatibility level to the new one. This is a good starting point.

What should I measure before changing compatibility level?

In theory, you should have a performance baseline of things like:

  • CPU, memory, and storage metrics
  • Your top wait types so you know what SQL Server is bottlenecked on
  • Which queries are using the most resources
  • Query plans of well-performing queries (because things might get worse, and you’ll wanna know what they used to look like back in the good old days)

Then, when people suddenly complain about performance, you can check your baseline to see whether things actually got worse, or whether your users had taken up eating shrooms. You could also track down which queries were NOW at the top of your resource-consuming query list, look at what their query plans USED to look like, and then figure out how to get back to the good old days.

Common ways to accomplish this are third party monitoring products, Query Store, or the First Responder Kit.

In reality, you’re not gonna do any of this ahead of time. So, when you change compatibility levels on the fly, and performance gets worse, you’re not going to have any answers.

Does that mean I shouldn’t touch compatibility level?

No, not at all! You can change compatibility levels whenever you want, one database at a time, and you can change back instantly as well. You just need to note when you made the change and what you changed, and communicate it to the rest of the team so they can roll your change back if necessary.

What compatibility levels are available?

The screenshot at the top of the blog post was taken in SQL Server 2022, and even in this recent release, Microsoft supports compatibility levels going all the way back to SQL Server 2008. That’s kinda awesome, because it means that Microsoft is trying to keep old databases working great in newer versions of SQL Server.

In theory, that means you can take an old vendor application that was once certified on SQL Server 2008, and keep moving it to newer and newer versions of SQL Server. In theory, that means it’ll keep working the exact same way as long as you keep the same compatibility level – and hey, it might even get faster if you change to newer compatibility levels.

So, can I actually do that?

Well, no. I mean you could, but you might get caught.

Here’s the thing: the vendor might be relying on a feature that’s no longer available in newer versions of SQL Server. I gotta be honest, that’s extremely unlikely, but it is possible. And if they are, and their application suddenly breaks, you can’t restore a newer SQL Server database to an older version of SQL Server.

So if you take your SQL Server 2008 server, back up the databases, restore them onto SQL Server 2022, and then start using the app – and people start complaining – you can’t restore those 2022 backups down onto SQL Server 2008, even if they’re still in the same 2008 compat level. You can only restore to newer versions of SQL Server, not older.

Therefore, you’re taking a risk when you move databases onto newer versions of SQL Server. Make sure the vendor actually supports the newer version of SQL Server, because you don’t wanna be the person the vendor blames when their application breaks.

Want to watch me write this blog post?

I streamed this blog post live if you want to get a rough idea of what’s involved with writing a post like this:


[Video] Office Hours Speed Round: 21 Answers in 15 Minutes


HONK HONK! It’s time for a fast round of answers to concise questions y’all posted and upvoted at https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Start
  • 00:19 TheyBlameMe: Hi Brent. What is your view on using Dacpac diffs vs “old style” manual update scripts for deploying DB schema changes in dev-test-prod?
  • 00:52 Alex: Hi boss, I have an app that creates a new DB for each customer. All DBs (3K~) have the same structure. Is there a best strategy or an article you can point to consolidate all DBs into one huge DB. Is it a good idea? I’m trying to save on maintaining hundreds of databases. Thanks
  • 01:42 Perplexed: Had a vender application running a stored proc and it wasn’t working. Ended up using Profiler to capture error message and passing that to vendor to fix. Is there a better way to find error messages that procs are kicking out, but hidden by error handling in proc?
  • 02:31 Piotr: Is there anything comparable to first responder kit for PostgreSQL that you like to use when performance tuning in on that side of the fence? How hard would it be to write a first responder kit for PostgreSQL?
  • 02:56 HashMatch: Hey, Brent! I have a work superior who prefers to use several UPDATE statements instead of joins, to “keep track of row counts”. How do I best demonstrate this isn’t a good idea for performance?
  • 03:39 Steven: Hi Brent, in a nighlty ETL my friend has 2 sprocs updating 1 table in parallel. A page-level deadlock occurs randomly (1 in a 100 runs). Any tips or ressources on how to fix a deadlock at page-level while keeping the sprocs parallel? Thank you
  • 04:34 Piotr: Do you have any recommended tools for diff’ing two SQL tables for the purpose of showing the index differences between the two tables (DEV – PROD)?
  • 04:55 Björgvin: Do you ever see any issues with using windows mount points and SQL Server?
  • 05:29 TY: Hi Brent, in my job we often use the ROLLBACK of a transaction for testing purposes. Is there an easy way to rollback after two or more days, when the TRAN has been already COMMITED? Like a checkpoint where you can return to, but only for a single table or a database?
  • 06:04 Piotr: Do you think we will ever see FORCE_SHOWPLAN_RUNTIME_PARAMETER_COLLECTION in SQL2019 again? Is this feature worth upgrading from SQL 2019 to SQL 2022?
  • 07:01 Mickey: Hi Brent, I have a reporting query that runs under 10 seconds on other environments but runs for hours without finishing on this one environment. I’ve verified stats are up-to-date and the proper indexes are the same across all environments. Any other recommendations? Thanks!
  • 07:44 Gromit: Do you have a good way to fix high VLF count that doesn’t break log shipping?
  • 08:04 Nick Smith: Hi. Why might performance degradation of truncate tables in tempdb be seen after some time working in SQL Server 2019? There are no schema locks. I make look wait types trace in the current session, then there is nothing but SOS_SCHEDULER_YIELD and SOS_WORK_DISPATCHER
  • 08:37 LostInSpace: I have a developer using WAITFOR DELAY (10 – 15 minutes) to pause the iterations of his code instead of doing it in c#. I noticed them using sp_WhoIsActive with parms: @find_block_leaders = 1, @get_locks = 1, @get_additional_info = 1. It leaves suspended connections. good/bad?
  • 09:28 WorkinForDaMan: I’m the DBA for a city with a hybrid environment (on-prem and Azure VMs). While I was on leave, IT pushed 2016 AZC+GDR build instead of, SP3+GDR. I’m comparing both for diffs but wonder if you’d suggest reverting to SP3+GDR since we don’t have managed instances. Thank you, sir!
  • 10:07 Haydar: Do you have any recommended books / courses for sizing azure vm’s for lift and shift of SQL Server?
  • 10:53 Doc: Do you encounter peeps that live on the cruise ship in retirement? Is this an option for you?
  • 11:30 Ron Howe: What diagnostics would you recommend for a SQL Server that is fully 100% CPU throttled during query execution due to a “bad” query plan and you can’t get a SQL connection as such and a hard reboot seems the only solution?
  • 12:03 Nortzi: Hi Brent, recently a SQL statement with a begin tran and commit tran was executed from ssms and returned a results message. We had a blocking issue the next day. Turns out this query was still technically running and never finished. What do you think could have caused this?
  • 12:56 TY: Hi Brent, it seems like you know everything about SQL Server, or at least it seems like it. Can you do a short session with something that you don’t know much about and lead us through the process of learning it. It would be very beneficial to see how you build your knowledge. Ty
  • 13:56 marcus-the-german: Hi Brent, do you recommend that the sql server instance collation is the same like the user databases. If yes, how should we deal with databases which have a different collation?

Office Hours Speed Round, Easy Text Answers Edition

Some of the questions y’all post at https://pollgab.com/room/brento have easy one-line answers. Let’s knock ’em out:

George: Hi Brent, what recently has been the most challenging/surprising/new-to-you performance issue you have encountered?

SQL Server 2019’s slowdowns. I spent days working on that.

RoJo: Have you used Distributed AG as a way to upgrade major versions of SQL server without downtime? Seems like a nice way to try it out on a second site before a switch. Any concerns? Maybe jump from 2016 to 2019, or 2022 to big a jump? Cheers

No because it’s so much work to set up.

ConsultantWannabe: I’m a generalist trying to make the jump into the consultant role, I don’t want to be “a jack of all trades”. How should I start finding that niche (or that “expensive” thing to stand next to, apart from SS)? Do you think just asking around to the guys in suits is a good idea?

Ask executives what technology problem they can’t solve with their current staff.

Ive_Got_Heaps: Hey Brent, Our DB is loaded with heaps as our ERP system doesn’t utilize primary keys (begins crying). My plan is to create clustered indexes on existing columns where possible, or create an Id column for tables where no existing column can be used. Is this a sound approach?

Ask the ERP vendor. If it’s an in-house app, watch this.

Yevgeny: What are the top causes of data file corruption for SQL Server on a Windows cloud VM and how do you avoid them?

In the cloud, you don’t get root cause analysis from your vendor. Do backups and high availability (like AGs for automatic page repair.)

SQLrage: In 2019, can statistics updates on a table cause an execution plan to be recreated for a parameterized proc that hits the table but does not use that updated statistic in particular? Trying to better understand why plans regenerate automatically.

Read this and do a test.

Bart: Is there any harm in deleting Extended Properties of a table column? I inherited a database that’s been converted from MS Access into MS SQL several years ago and I think the extended properties are a result of that conversion.

What’s the benefit in deleting them? Why risk it? Who cares?

My latest toy, heading out for engine work first

Piotr: What file system folder convention do you like to use when locating data files / log files for a new SQL DB?

\MSSQL\DATA

Chetan: Which nice car did you buy recently? What do you drive now?

A 1964 Porsche 356 coupe, which is getting its engine checked out first before I take it on any road trips. Until it’s done, and because it’s springtime, I mostly drive my Speedster replica.

MacAries: I had an On Prem 3 CTE then join for result query that ran subsecond, but coming from Azure Function that sent multiple and crippled the on-prem server to a 20 result /minute nevermind the lock and batch waits is their some basic translation that azure needs to get the query run?

Read this or watch this.

Will Marshall: Do you run into any common performance issues with SQL always encrypted?

I’ve never had a client use it.

Haydar: What is the best way to copy a few tables from SQL Server to PostgreSQL?

I would ask this guy.

CB: Hi Brent – It seems SQL functionality isn’t supported in SQL task editor. Statement: Declare @sql nVARCHAR(max) Error: The Declare SQL construct or statement is not supported. Is there a solution to that?

I don’t know what the “task editor” is.

Will Marshall: What are the best courses / books for learning SQL Always ON?

I haven’t seen any that were updated with what’s new in SQL Server 2019, 2022, or Managed Instances Link.

Bocephus: For network perf testing between two windows nodes, what tools do you like to use?

Copy a large file like a backup.

Hal Jordan: What should we look at when OS pages per second paging rate is high for the bare metal SQL Server 2019 instance?

Attend this class.

Mirza: Discussion happening in the company about automating SQL patching on clusters using Powershell and SCCM. Both PS script and SCCM are owned by the server team. Does the DBA team lose control and does it matter? What is your experience/opinion regarding automating SQL patching?

Read this.

Kyle: Hi Brent! What are the best practices for restarting your SQL service on an HA system? Is there any way to do it with causing any downtime?

No, all restarts will cause downtime, even for cluster and AG failovers, so for minimal downtime, use clusters and AGs.

SQL_Developer_Admin: If there is left outer join, then why there is right outer join as well, if we could just swap the sides of the tables. Any scenario you know where only left join can be used or right join can be used.

Sometimes it’s nice to have multiple tools to approach the same problem from different angles.

Eyvindur: You mentioned caching as a possible solution to lessen the load on SQL Server. Are triggers are good solution for cache invalidation?

The idea of caching is to lessen load. Do triggers add or lessen load on the database?

Eduardo: Do you see any RDBMS disruptors threatening to steal Microsoft / Oracle market share in the near future?

If by “near” future you mean 5-10 years, yes, Postgres and AWS Aurora.

ChopstickWizard: Been mulling this over for sometime. There are instances where I want to recommend a product to a team, like example : Cockroach db, mage.ai etc. But the problem is, they seem “non enterprisey” just by their name in case of “cockroach” or mage.ai’s tag line. Have you faced this?

You mean names like ChopstickWizard?


[Video] Office Hours: Ask Me Anything About Azure SQL DB & MSSQL


Post your questions at https://pollgab.com/room/brento and upvote the ones you’d like to see me answer.

Here’s what we discussed today:

  • 00:00 Start
  • 01:40 prasad: Hi Brent, I want to become a full fledged database architect. I have been reading and practising lot of stuffs here and there, but no certain path. I also subscribed once to ur master class bundle. can you guide me on a proper path for the same? Thanks in advance
  • 04:57 ExcitingAndNew: Hi Brent, what is your opinion on the practice of inserting SQL comments into SQL requests as tags to allow DBAs to track the requests in the server ? (I’m talking about comments for instance just after SELECT/INSERT/UPDATE/DELETE to force SQL Server to keep them everywhere)
  • 05:40 DGW in OKC: What is your opinion on the practice of a manager who consistently assigns DBA tasks to an employee who is marginally proficient at DBA work and is not really that interested in this discipline anyway?
  • 08:08 Fjola : sp_BlitzFirst shows the top expensive query of type : statement. It’s unclear which app/sp is generating this query. What is the best way to track down the app / sp generating this query / statement?
  • 10:05 Chris: Have you ever had manually created statistics either be a root cause or the final push needed to cross the line?
  • 10:53 CKI: How to get history of most recent queries executed with username in SQL? Auditing is not an option. Thank you!
  • 11:50 Piotr: Do you have a recommended method for adding notes to a given NC index (i.e. why this index is needed, which app uses it)?
  • 12:38 Perplexed: What are your thoughts on using PERSIST_SAMPLE_PERCENT to force all future UPDATE STATS to use a specific sampling? I just started using this on a very large table that was not getting stats right after updating the stats.
  • 14:02 UncleFester: When running Select */Count(*) SQL was using an index, returning only 47 mil rows of 95 mil in the table. Rebuilding the index/statistics was no help. Dropping/recreating the index solved it. Can I really trust Select * or Select Count(*) to return all of the rows in the table?
  • 14:56 RoJo: Debate rages here on what login to use for security: AD/Windows or SQL direct. Is either more secure? or if equal, do you prefer one and why? Thanks dude

SQL ConstantCare® Population Report: Spring 2023

Ever wonder how fast people are adopting new versions of SQL Server, or what’s “normal” out there for SQL Server adoption rates? Let’s find out in the spring 2023 version of our SQL ConstantCare® population report.

Out of 3,002 monitored servers, here’s the version adoption rate:

The big 3 versions are all within 1% of the last quarter’s numbers:

  • SQL Server 2019: 33%
  • SQL Server 2017: 19%
  • SQL Server 2016: 28%

On the other extreme:

  • SQL Server 2022: 2% – which is right on track with 2019’s adoption rates after it came out. It’s not low at all – it’s right in the same ballpark.
  • Azure SQL DB: 1%
  • Azure SQL DB Managed Instances: 2%

Just 13% of the population are running unsupported major versions (2014 & prior).

Here’s how adoption is trending over time, with most recent data at the right:

In other news, we do actually have 6 folks now running SQL Server on Linux! It’s a total of 32 SQL Servers, a mix of 2017 and 2019, running on a mix of Ubuntu and Amazon Linux. That’s about a 1% adoption rate. Only 4 of those are Dev or Express Editions, too. I’m still feeling pretty comfortable with my 2018 prediction that in 2026, <5% of SQL Servers would be running on Linux.

Other interesting tidbits:

  • 8% of SQL Servers are in a failover cluster
  • 23% of SQL Servers have Availability Groups enabled (not necessarily using it, it’s just enabled)
  • 6% of servers have Filestream enabled (not necessarily using it, it’s just enabled)
  • 34% are fully patched to the latest possible Cumulative Update (that’s amazing!)

34% of servers are fully patched to the latest possible Cumulative Update. That’s awesome! Nice work, y’all. My first thought was, “Oh, I bet the old versions like 2012 are fully patched, but the new ones aren’t patched because the patches keep coming out.” Nope, it’s actually the opposite: the most-currently-patched folks are on 2016 & 2017. The least-patched are the unsupported versions that haven’t had patches in forever. Disappointing.

Only 7% of SQL Servers are on unsupported versions or builds. That’s awesome too! Keep up the good work on patching, y’all.


Announcing Live Office Hours on Wednesdays & Thursdays


It’s time for summer school!

I’m doing live streaming on my Twitch channel on Wednesdays & Thursdays this summer, and the recordings will go live later on my YouTube channel. I’ll take questions from PollGab, do live coding, work on the First Responder Kit, and write blog posts.

The stream will start at 8:15AM Pacific, 11:15AM Eastern, and you can see the time in your local time zone here. I’ll stream for around 45 minutes.

To get notifications when I go live, follow me on Twitch, or Google Calendar users can use this invite. See you in Twitch!


What SQL Server People Should Know About Amazon Aurora Serverless

You, dear reader, are most likely a Microsoft SQL Server user – either a DBA or developer.

Set your pencil down for a second because you’re not going to learn about a Microsoft product today, nor are you going to learn something that is going to be immediately useful to you in your job. Today, I’m writing about a completely different product just to give you a better insight on what else is out there.

Our SQL Server monitoring app, SQL ConstantCare®, uses AWS RDS Aurora PostgreSQL as its database back end. I wrote about that design decision five years ago, and since then, we’ve been really happy with both PostgreSQL in general and AWS Aurora in particular. We just didn’t have to spend time worrying about the database layer – for the most part, it just worked, in the same way Microsoft Azure SQL DB just works.

No matter what database platform you use, costs are split into two main parts:

  • The storage layer – where you pay for how much you store, and how many accesses you do (reads and writes)
  • The compute layer – where you pay for CPU & memory for the servers to process your ugly queries

Traditionally, us data people have wanted the fastest storage we could get, and the most compute power we could get. That led to problems with management because we bought really good stuff, and then… it usually sat around idle, unused. We had to pay a lot to handle the peaks of our workloads, and then we were locked into expensive stuff that rarely got hit hard.

Serverless can reduce those costs while simultaneously handling more bursts.

To help reduce costs and improve flexibility, several years ago, AWS introduced Aurora Serverless. Instead of paying for fixed server sizes (like 8 cores and 60GB RAM), Aurora Serverless:

  • Watches your query workload
  • Automatically adds & removes CPU & memory on the fly
  • And reacts automatically in milliseconds without dropping queries

Not minutes. Not seconds. Milliseconds. That’s bananapants. You just define a minimum and maximum size for your database servers, and Aurora automatically handles the rest, independently, for each replica. Instead of provisioning an 8-core, 60GB RAM server, you could say you want your config to swing anywhere from 2 cores and 4GB RAM all the way up to 64 cores and 128GB RAM. (More on the exact provisioning later.)

Somebody at Microsoft is going to be tempted to pipe up, “Yeah, we have serverless in Azure SQL DB too,” and I can’t help but respond in meme form:

Yes, that’s actually in the documentation, and it’s why you’re not hearing a lot of buzz about serverless here in the Microsoft & Azure communities. Azure balances load in minutes, not milliseconds, and drops connections at the worst possible time – when you’re under heavy workloads and the server isn’t able to keep up.

I suppose that’s serverless, in the sense that you lose the database server temporarily? Let’s move on.

Aurora Serverless was a perfect fit for SQL ConstantCare®.

SQL ConstantCare® consists of a small app that clients install and point at their SQL Servers. Once a day, that app polls all of their SQL Servers, exports diagnostic data to JSON files, encrypts it, and sends it to us. We can’t predict when it’s going to happen – it’s based on the schedule that users set up on their end, which tends to be even-hours (like noon or 4PM local time in their own time zone.)

When the files come up to us in AWS, we:

  1. Decrypt & extract them, and then load their contents into a database
  2. Run diagnostic queries, checking for problems, and build a list of things to warn them about
  3. Send them emails with advice

That application logic layer has been serverless (AWS Lambda) all along, and I wrote about that design decision as well. That’s paid off really well as we scaled to thousands of client servers because the workload is really bursty. We don’t wanna pay for lots of app servers to sit around idle most of the time, but when client files come in, we want to process the data as quickly as practical.

The problem right from the start? The database layer! AWS Lambda would see lots of incoming files all at once, spin up lots of workers to process all those files, and then – wham, all those workers would try to tackle the database server at the same time. Richie had to work hard to smooth out the peaks, or else we just kept DDoS’ing our own database server.

By switching to Aurora Serverless, the database could now better handle bursts of incoming files – while simultaneously cutting costs by downsizing capacity for the hours per day that we sat mostly idle.

When you create serverless replicas, you size them in Aurora Compute Units (ACUs). One ACU = 1 core and 2GB RAM. Because we wanted to make sure it worked, we picked:

  • Min replica size: 0.5 ACUs (that’s 1/2 a core and 1GB RAM)
  • Max: 32 ACUs (that’s 32 cores, 64GB RAM)

And then sat back and watched the replica automatically handle workloads.

Aurora resizes our database servers FREQUENTLY.

Here’s an average of the primary writer’s ACUs over a 24-hour span, in 1-minute intervals:

Don’t worry about the time zone or peak hours – just think about this being a 24-hour time span. On average, we have a few different bursts per day, but we generally range from around 2-3 ACUs (cores) up to around 28 ACUs (cores). But here’s where it starts to get wild – instead of looking at averages, let’s look at the MAX in each minute range:

Aurora is frequently slamming the gas pedal, taking us right up to 32 cores! We’re maxing out at 32 cores all the time throughout the day. And conversely, the min in each minute:

It’s slamming the brakes right down to 1-2 ACUs all the time.

Lemme rephrase those 3 charts: in many one-minute time spans of the day, our Aurora database server scales up and down automatically from 1 core to 32. All day long. In fact, it’s rare to have a 15-minute time span where Aurora didn’t ramp capacity up and down like crazy. Here’s the same set of charts in 15-minute intervals – first, maxes:

In most of the day, in any given 15-minute time span, we were hitting 15-32 cores. But for mins:

It dropped right down to 1-2 cores.

Again, this is night-and-day different to what Azure is doing. Microsoft’s all, “Hey, let’s think about this for a while and transition up and down over the span of a minute or two.” Amazon’s reacting with lightning speed, so as each query comes in, it gets the power it needs and our apps don’t notice the difference.

Admins have to ask new questions.

Because Aurora’s reaction time is so fast, it opens up totally new questions for performance tuning and cost management.

“Are we doing the right database maintenance?” Resource-intensive queries literally cost you money. We were spending around $50/day for storage throughput at one point, and one day’s cost suddenly shot to $250. After digging into AWS’s Performance Insights (which is fantastic, by the way), we determined that a surprise automatic index maintenance job had cost us $200. That experience changed the way we think about index maintenance, and for SQL Server admins, that would be especially eye-opening. Are you sure you need those index rebuilds? Can you prove that the server is actually getting faster for your money? Odds are, it’s not.

“Should we raise or lower the max ACUs?” If we raise it, then resource-intensive queries might be able to finish more quickly – but scaling up compute power doesn’t always result in a linear speed-up of queries. Just because you go from 16 cores to 32 cores doesn’t mean big queries finish exactly 2x faster. The question morphs into, “How low can we go? How low can we set the max without hurting our most valuable queries?”

“When we change ACUs, how does that affect IOPs?” When you add more ACUs, you get more memory available to cache data, which means you hit storage less frequently. When you reduce ACUs, you’re likely going to hit storage more. There are no dials to set for storage performance – Amazon manages that for you automatically, and just bills you for every IO you do. There’s no easy across-the-board answer here, either – the answer is going to depend on your own workloads. For SQL ConstantCare®, we’re constantly importing new data and querying it, so cache is less useful to us.

“Does serverless make sense for our workloads?” Does this scale up/down, cost up/down actually benefit your application either in terms of lower cost, or faster performance? For our particular workload, it was really funny when we first switched over – Aurora Serverless was almost exactly the same cost as our previous pre-sized, inflexible servers, hahaha! However, by tuning our max ACUs, we could more easily find a sweet spot where we could reduce costs without harming performance.

But here’s my favorite part of Aurora Serverless: it gives us the ability to run new product experiments without worrying about capacity. For example, since the very start, I’ve always wanted to have a completely free tier of SQL ConstantCare that alerts you about the most vital issues, like missing backups, database corruption, or an urgent patch. Before moving to Aurora Serverless, I didn’t want to test that out because a huge wave of free adoption might cause us database performance nightmares. Now, the database server isn’t holding us back from those issues. (So stay tuned for some fun experiments!)

“Does migrating from SQL Server to Postgres make sense?” By this point of the post, you might be jealous of this cool technology. Remember, though, that right at the beginning I told you that today’s learning wasn’t going to be too relevant to your current job. You don’t really wanna change an entire application’s back end from one database platform to another, regardless of how easy Babelfish might make it seem. Back end migrations rarely make sense. However, I just wanted to talk about this technology today so you could see what’s going on in other database platforms.

Besides, Microsoft’s surely working on similar capabilities for Azure SQL DB. I love how cloud databases turned into an arms race because they’re a real differentiator for cloud vendors.

If you liked this post, you’ll probably enjoy Ed Huang’s musings on building a database in the 2020s. He asks, “If we were to redesign a new database today from the ground up, what would the architecture look like?” To be frank, it wouldn’t look like Microsoft SQL Server, and between his post and mine, you’ll understand why. I’m not saying SQL Server is dead, by any means – but if you’re a developer building new-from-scratch applications in the 2020s, SQL Server probably isn’t your first choice, and that’s why.


[Video] Office Hours: Ask Me Anything at Sea


It was a dark and stormy morning, on the last day of my Panama Canal cruise. Before I perished, I sat down to answer your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:


  • 00:00 Start
  • 00:36 ClippyTheDBA: Hi Brent. How you keep up-to-date with technology in general and in SQL Server specifically?
  • 02:56 Björk: Should foreign keys be indexed from day1 or only when performance issues arise from lack of a NC index on the foreign key?
  • 03:55 ClippyTheDBA: Hola Brent. If you just started your consulting business, what thing(s) you would change or do differently (based on your current consulting business experience)? [Kind of Lessons Learned]
  • 06:00 Haydar: What bad stuff can we look forward to once SQL Server hits the max size for a data file? Any interesting stories of this happening in the wild?
  • 06:52 Eduardo: How do I find the worst performing query for a given app using the first responder kit?
  • 08:51 Netanel: Is it safe to overlap invocations of the tools in the first responder kit (i.e run sp_blitzindex at the same time of running sp_blitzcache, sp_blitzfirst etc)?
  • 09:48 Alex: Hi guru, Regarding your last office hours where you talked about 2022 version. I was planing to upgrade from 16 to 22, now you got me thinking. Should I upgrade to 19 instead as 22 is full of bugs?
  • 10:56 Gopher: What criteria do you use when picking cruise ship line A vs B?
  • 13:09 AGAnyday: I am in a shop with backup policies dictating full backups daily, Colleague DBAs say there was a research policy informing that course of action. What is the best gentle approach to convince the management to resort to full backups for weekends then do differentials every night?

Who’s Hiring in the Microsoft Data Platform Community? April 2023 Edition


Is your company hiring for a database position as of April 2023? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes, or a link to the application process – if I were you, I’d put an email address, because then you’ll know which applicants are readers here, and they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.


[Video] Office Hours: Ask Me Anything About SQL Server in Costa Rica


Off the coast of Costa Rica, I went through your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Start
  • 00:30 gotqn: In SQL Database we have “sp_invoke_external_rest_endpoint”, but in SQL Server 2022 still have not. Is using SQL Server Machine Learning (sp_execute_external_script) a good alternative or nasty hack?
  • 02:19 Chase C: If you had complete authority to focus Microsoft on fixing or adding a feature to SQL Server, what would you choose?
  • 03:52 I like hot pot: Hi Brent, is any of your client using Sql Server 2022 in production and if yes, how bad is it?
  • 04:56 Peter: Hi Brent, our devs are writing parameterless table functions opposed to views. Beyond the extra syntax required to create them is there a demonstrable downside or should I just step away even though I think it is hiorrible.
  • 05:36 Jr Wannabe DBA: Hi Brent, what do you think it is the best way to trigger (fire and forget, do not wait for result) an external application action from an INSERT into a SQL table? Looking for a direction, not full solution. Thanks in advance.
  • 06:36 Mr. SqlSeeks: Upgrading our Azure VMs. Our folks suggest using a VM size that has burstable IOPS. I think about troubleshooting performance issues and scalability testing, having to figure out if things were executed during a burst or not. I feel like I want consistent IOPS in Prod. Thoughts?
  • 07:49 Mike: If hosting AlwaysOn Availability Group on Azure VMs – can I have, say, 1 server (replica) with 32 vCPU, 2nd server with 4 vCPU, and 3rd server with 8 vCPU? Will AG with different size replicas work fine? And, can this affect data synchronization in some way?
  • 09:03 chandwich: Hey Brent. Have you ever had a client refuse to let you install your sp_Blitz stored procedures? If so, how did you handle that situation?
  • 10:18 TheyBlameMe: Hi Brent. What’s your preferred strategy when updating a limited downtime production system from an int PK parent table to bigint?
  • 11:42 Netanel: When is a high perfmon value for skipped GHOST records per second ever a concern? sp_BlitzFirst is reporting 53k skipped ghost records per second.

Is SQL Server 2019 More CPU-Intensive Than SQL Server 2016?


I’m running into something that I’m having a hard time believing.

A client was hitting CPU issues during load testing, and they swore all things were equal between their SQL Server 2016 and 2019 environments. The 2019 box was having CPU pressure issues that didn’t show up on the 2016 box. I’ve played this game before, and every time, the root cause has been different configurations between the two servers.

However, this time, not only were the servers the same, but I’m even seeing this same behavior with a simple query that I can reproduce on any 2016 vs 2019 setup. I haven’t tested on any other versions yet, but after a day of banging my head against the wall, I figured it was time to bring in the smart people – and that means you, dear reader.

Take any two identical servers, and I do mean identical – same CPU speeds, same power savings settings – and run this setup script. We’re creating a database in 2016 compat level just to compare the exact thing across all versions:
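The shape of the setup is simple – a database pinned at 2016 compat level, plus a numbers table small enough to live entirely in memory. Here’s an illustrative stand-in (names and row counts are mine, not the original script’s):

    /* Stand-in setup: 2016 compat level, tiny memory-resident numbers table */
    CREATE DATABASE CPUTest;
    GO
    ALTER DATABASE CPUTest SET COMPATIBILITY_LEVEL = 130;  /* SQL Server 2016 */
    GO
    USE CPUTest;
    GO
    CREATE TABLE dbo.Numbers (Number INT PRIMARY KEY);
    INSERT dbo.Numbers (Number)
    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects AS o1
    CROSS JOIN sys.all_objects AS o2;
    GO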

Then turn on statistics time, and run this query:
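Again, a stand-in rather than the exact query – anything that does a fixed amount of CPU work against the memory-resident numbers table will behave the same way:

    SET STATISTICS TIME ON;

    SELECT SUM(CAST(n1.Number AS BIGINT))
    FROM dbo.Numbers AS n1
    CROSS JOIN dbo.Numbers AS n2
    WHERE n2.Number <= 100
    OPTION (MAXDOP 1); /* remove this hint to test the parallel case */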

The query’s terrible, of course, but it’s designed to do a fixed amount of CPU work every time. We’re not disk-bottlenecked – the tiny numbers table fits easily in memory. You’re going to be tempted to change the table design or query design, and you’re absolutely welcome to, but make sure the query is CPU-bottlenecked, not read-bottlenecked.

Compare the CPU time (not duration) across SQL Server versions. Because I’m paranoid, I built a brand new Windows Server 2016 box from scratch up in the cloud, and installed two instances of SQL Server on it. Left hand window is SQL Server 2016, right hand window is 2019 RTM – don’t run them at the same time, obviously, because that would screw up the CPU availability:

SQL Server 2019 uses 5-10% more CPU time to execute the same query.

It’s not just single-threaded queries, either – if I let the query go parallel by removing the MAXDOP 1 hint, 2019 is still slower:

You’re also going to be tempted to say, “Just change the compat level, query, or indexes to make the whole thing go faster on 2019” – but that’s not the point, because often we can’t tune an entire running workload. (In this demo case, 2019 compat level actually works beautifully, dropping the CPU time down by about 1/3, and I wish the client’s case was that easy. They already tried that before they called me. Bummer.)

You’re also going to be tempted to say, “I bet it’s fixed in a 2019 Cumulative Update,” in which case, check out this wider screenshot. The far right window is 2019 CU19, the most current one, and it exhibits the same higher CPU usage as 2019 RTM:

You might even be tempted to say it’s the new lightweight query profiling – try turning that off:
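On 2019 and newer, lightweight profiling is on by default, and you can switch it off per database – run this in the database you’re testing:

    ALTER DATABASE SCOPED CONFIGURATION
        SET LIGHTWEIGHT_QUERY_PROFILING = OFF;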

And at least in my tests, it makes no difference.

That’s where you come in.

If you have access to absolutely identical environments (or different versions installed on the same base hardware), are you able to replicate these findings? Does the same query use more CPU time on 2019 than it did on 2016? The best evidence for this is a side-by-side screenshot of the same query’s output across the different versions.

For our own evil purposes, 2017 doesn’t really matter (because you’ve gotta get to current versions anyway), but if you want to test on, say, 2016 vs 2022, you’re welcome to. In our brief testing, we’ve seen 2022 exhibit the same CPU problems as 2019.

I wouldn’t use this case as evidence that 2019/2022 are “bad” by any means – they’re fine. It’s just helpful for folks to understand, when they’re doing capacity planning for new versions, that they may have to buy more licensing for the same server at upgrade time. In this particular client’s case, we’re probably going to have to bump from 8 cores to 10 cores in order to handle the same workloads – in their case, the CPU difference is closer to 20%.


This is the Last Week of Free PowerShell Training.

PowerShell
11 Comments

You’ve been working hard all month to get through our newest class, Fundamentals of PowerShell for DBAs. This week, it all starts to come together as you put together PowerShell plus SQL Server to automate tasks on a regular basis:

If you couldn’t keep up, no worries – you can always purchase the course, or pick up my Recorded Class Season Pass Fundamentals to revisit the material when your schedule is less hectic. Hope you enjoy the free training!


Office Hours: Questions That Didn’t Make the Cut

Sometimes, y’all post questions at https://pollgab.com/room/brento and they don’t get a lot of upvotes, and reading through ’em, I bet it’s because other people don’t quite understand what you’re asking. I think there might actually be a good question at the heart of it, but … I’m just not sure what it is, and it needs to be rephrased.

Here’s a rundown of some of ’em that came in recently. If you recognize one of these as yours, you’re welcome to re-submit it at PollGab, but with clarification.


MonkeySQLDBA: Recently watched one of your videos on fragmentation – great stuff. Quick question: if your fill factor is set to 80, will the internal fragmentation get to 70% faster than leaving fill factor at 0?

Zoom out: what problem are you trying to solve? What’s the action that you would take based on this knowledge? Think about those, and then re-post the question.

Piotr: If SQL Server and PostgreSQL were aircraft which aircraft models would they be? F14 vs SU57?

I love the question, but I don’t know enough about aircraft to answer it. I don’t even know if those two airplanes are good, bad, or the same, hahaha.

Keith: Hey Brent! Is it safe to upgrade my Azure SQL database from compatibility level 140 to 160 just to be able to run the GENERATE_SERIES function?

Is it hard for you to create a numbers table? Like, really? I’m not being sarcastic, but if you need a list of numbers, why wouldn’t you just create one?
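If it helps, here’s a quick sketch of one – it works at any compatibility level, no GENERATE_SERIES required:

    CREATE TABLE dbo.Numbers (Number INT NOT NULL PRIMARY KEY CLUSTERED);

    INSERT dbo.Numbers (Number)
    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b;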

I Cannot Do This Alone: I’m sure it has always been hard to get skilled help, BUT… have you ever heard of “non-tech” companies sharing their tech talent in, like, a pool? I know I’m on the hook for a trashing here, but better by you than my CIO 😉

I think you’re describing outsourcing – having a group of full time tech employees that you can call on whenever you need them, and just pay for what you use.


Henry: I am planning an in-place SQL Server upgrade from SQL Server 2012 to 2016 on a Windows failover cluster. All of my databases are in the full recovery model with transaction log backups taken every 5 minutes. Should I put the databases in simple during the upgrade?

Don’t upgrade in place, period.

Government Cheese: How do you like to measure IOPs for SQL Server storage (bare metal and cloud VM)?

First, I don’t, but even if I did – are you talking about measuring how many they already consume, or how many a new server provides? If you’re asking how many they already consume, how is that useful? You don’t know whether the users are happy or not, or whether you need to reduce storage throughput or increase it.

ConsultantWannabe: Hey Brent, you teach that as contractors/consultants we should stand next to expensive things (like SQL Server or Oracle). Any advice on identifying our own expensive thing to stand next to? Obviously apart from standing next to SQL Server. Thanks

Are you … asking me … how to find out how much things cost? I’m confused. Why wouldn’t you just … ask management what the most expensive thing in the shop is?

Jeremiah Daigle: I have a server that has two 8-core CPUs, and only have 8 Enterprise core licenses. I was planning to just remove one of the CPUs, but ran into issues not having the blank to put back in. Is there anything to be concerned about in turning down each CPU to 4 cores in the BIOS instead?

Having a blank to put in? I’m not sure what you mean – you shouldn’t need a “blank” CPU. I think someone’s pulling your leg, like they’re telling you to go get blinker fluid.

Maksimilian: What’s the best technique for a SQL sproc to self audit the params it was called with?

Self audit? Have the proc log them to a table. (I think I might be misunderstanding the question because it seems so obvious, but if you want to log something, and you’re in a database, well, uh, put it in a table.)
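A minimal sketch of that approach – the table, proc, and parameter names here are all made up:

    CREATE TABLE dbo.ProcAuditLog
    (
        Id       BIGINT IDENTITY PRIMARY KEY,
        ProcName SYSNAME NOT NULL,
        Params   NVARCHAR(4000) NULL,
        LoggedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME()
    );
    GO
    CREATE OR ALTER PROCEDURE dbo.usp_GetUser
        @UserId INT,
        @IncludeDeleted BIT = 0
    AS
    BEGIN
        /* First thing: log the parameters we were called with */
        INSERT dbo.ProcAuditLog (ProcName, Params)
        VALUES (OBJECT_NAME(@@PROCID),
                CONCAT('@UserId=', @UserId, ', @IncludeDeleted=', @IncludeDeleted));

        /* ... the proc's real work goes here ... */
    END
    GO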

Eduardo: Linked-in provides automated public notification of completed course training. How should DBAs notify potential employers of completed Ozar training?

You mean … how do you add things to your resume in LinkedIn? I’m genuinely confused – are you asking how to edit your resume? I’m guessing you just click Edit on your profile, right? Put in whatever text you want there. If you’re saying that you have a problem because you need automation every time you complete any of my courses, and I have so doggone many of them, then stop putting each one – just put Fundamentals and Mastering.

Hangman: When is sp_whoisactive context_switches a useful metric for performance troubleshooting?

Ask whoever told you to look at that metric. Otherwise, don’t walk into the airplane cockpit, point at a gauge, and ask the pilot, “Hey, what’s that dial mean?” That’s not an effective use of anyone’s time. SQL Server is way worse than an airplane cockpit: there are precisely 1.21 gigawatts of metrics out there, and most of ’em just aren’t useful.

Eduardo: Is it a good idea to start identity integer columns for new fast-growing tables at the max negative value for a bigint? Do you see this much in the field?

It’s fine. I almost never see it.

Tony: Will you be purchasing TSQL fundamentals 4th edition?

No. I’m sure it’s good, and I’m sure you doubt my T-SQL abilities, fair enough, but I’ve moved on to learning other stuff.

Isaac: How do you find all the queries that are using the kitchen sink query pattern (Col1 = @Col1Val or Col1 IS NULL) AND (Col2 = @Col2Val or Col2 IS NULL)? How do you find the worst of the worst?

Instead of looking for anti-patterns, ask, “What are the 10 worst-performing queries that I need to tune, and what are the anti-patterns in those?” That’s what sp_BlitzCache does.
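For example, sort the plan cache by whatever resource hurts the most, then look for the anti-patterns in just those plans:

    EXEC sp_BlitzCache @SortOrder = 'cpu', @Top = 10;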

TheCuriousOne: Hi Brent! From your perspective, is there any open problem/issue preventing a problem free upgrade from SQL Server 2019 to 2022 and if so, what are the gotchas to look out for?

Microsoft used to publish detailed upgrade guides for each version of SQL Server, but they stopped doing it. Check out the most recent one from 2014 (PDF) and that’ll give you a rough idea of how complex it is to migrate an existing environment.

Eduardo: What is your favorite graph database and why?

I don’t use any myself, so I’m not qualified to answer that.

Wasn’t_Me: We are thinking about switching from Azure to AWS. On docs.aws.amazon.com I find this phrase: “When you set up an Amazon RDS DB instance for Microsoft SQL Server, the software license is included.” What?? Does it mean that on AWS I don’t have to pay for SSRS, SSIS, SSAS?

Licensing is included in the hourly rate, yes. Amazon also offers bring-your-own-licensing. Keep in mind that you said RDS, and RDS doesn’t have SSRS, SSIS, and SSAS – you’ve got a lot more reading to do. Fortunately, I’ve got a training class to help.

Marian: Hi Brent! Have you ever been to Romania? Would you consider attending some big tech event in Romania in the near future?

No, and since the pandemic, I’ve cut back a lot on my conference schedule. I’m sure Romania is nice, but I did a quick Google search and didn’t see any SQL Server conferences in Romania. I’m not really interested in non-SQL-Server conferences – when I want to learn other technologies, I tend to use cheaper/easier methods rather than traveling.

Neil: I set up all my SQL servers with TCP/IP enabled only. A developer is trying to connect with named pipes. Should I enable named pipes or force them to use TCP/IP?

I don’t have any opinion on this one whatsoever. (I don’t think I’ve ever disabled named pipes.) Why did you disable it?


[Video] Office Hours in Cartagena, Colombia

Videos
1 Comment

Before heading out to Old Town for sightseeing, I went through your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Start
  • 00:44 Haydar: Is there a good way to suppress part of TSQL batch from showing the query plan in SSMS but having the rest of the batch show the query plan?
  • 02:13 Mike: Hi Brent, so simple blocking on a busy server can cause a failover? Is it because of exhaustion of worker threads, resulting in THREADPOOL waits? I can’t understand the mechanism – how can this lead to a failover? Could you please explain?
  • 03:03 Greef Karga: Please describe the most strict / locked down environment you have worked in and the challenges it posed.
  • 06:11 Q-Ent: Hi Brent. Have you ever used buffer pool extension as an option for better performance?
  • 07:29 Hangman: When is sp_whoisactive context_switches a useful metric for performance troubleshooting?
  • 08:36 AFriendOfMineAsk: Howdy Sir! Our CTO wants a monthly dashboard of all production SQL Servers. Can you please give some advice or tips on what KPIs it should include, or should I start looking for a new job? Thanks and have a great day Sir!
  • 09:57 Jester: What tool do you like to use to find the worst performing queries with implicit conversions?
  • 10:55 Yaakov: What is the best way to print the line number currently executing inside a TSQL Sproc?
  • 11:45 AM, no I’m not a morning person.: Using WHERE X IN (1,2,3,4,5, etc) for selecting random sets of items. Is there any performance reason not to do this and instead use parameters or put them into a temp table first? Is it right to assume this won’t be able to be a cached query?
  • 12:59 Mike: Since 2005, a new version of SQL Server has been released every 2, maximum 3, years. What is your opinion on when the next version (after 2022) will come out? Will it be late 2024, sometime in 2025, or later – and why?

It’s Monday. Do You Know Where Your PowerShell Training Is?

PowerShell
2 Comments

You’re over the hump! Just two weeks left to go.

All month long, I’m giving away our newest class, Fundamentals of PowerShell for DBAs. Here’s what you’ll be learning this week:

In case you didn’t know before now, head over to the class and watch the Before the Class modules that explain how to set up your workstation to follow along.

Not patient? Can’t wait for the rest of the videos to go live? Recorded Class Season Pass Fundamentals holders can jump in now – it’s included with your existing membership. Enjoy!


Slow Startup, Failover, and Restore Times with In-Memory OLTP

When In-Memory OLTP came out, I worked with it briefly, and I remember coming away thinking, “Who in their right mind would actually use this?” I was so horrified that I wrote a presentation about it and gave it at a couple of conferences:

Many laughs were had, but obviously that didn’t make me any friends at Microsoft, ha ha, ho ho. I figured I was done with Hekaton, and I wouldn’t ever have to see it again because y’all all saw the video, and you’d never be crazy enough to implement that feature.

<sigh>

Well, here it is in 2023, and recently I’ve talked to a couple of architects who wish they could go back in time and watch that video. In both cases, they suffered from the same issue.

The short story is that the more data you put into durable In-Memory OLTP tables – and even just 5GB of data can hit this issue – the more your startups, failovers, and restores turn into long stories, to the point where other databases on your SQL Server are practically unusable.

Setting up the 5GB Votes table

To demonstrate the problem, we’ll start with a large copy of the Stack Overflow database. I’ll drop my nonclustered indexes just to make the next screenshot more clear, and then I’ll list the tables in the database to find a good candidate for the demo.

The highlighted table, dbo.Votes, has about 150 million rows and takes up about 5GB of space. That’s not big data by any means, and we can easily fit it in memory on our 8-core, 60GB RAM server. Let’s migrate it into In-Memory OLTP in order to make performance go faster:
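The migration looks something like this – a sketch with a hypothetical container path and roughly the Votes table’s schema, not the exact demo script:

    /* In-Memory OLTP needs a memory-optimized filegroup and container first */
    ALTER DATABASE StackOverflow
        ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE StackOverflow
        ADD FILE (NAME = 'imoltp_container',
                  FILENAME = 'D:\Data\imoltp_container') /* hypothetical path */
        TO FILEGROUP imoltp_fg;
    GO
    USE StackOverflow;
    GO
    CREATE TABLE dbo.Votes_InMemory
    (
        Id           INT NOT NULL PRIMARY KEY NONCLUSTERED,
        PostId       INT NOT NULL,
        UserId       INT NULL,
        BountyAmount INT NULL,
        VoteTypeId   INT NOT NULL,
        CreationDate DATETIME NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
    GO
    INSERT dbo.Votes_InMemory (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
    SELECT Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate
    FROM dbo.Votes;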

After migrating that 5GB table to In-Memory OLTP, it’s taking up 5GB RAM, right? Well, no:

Try 28GB RAM. That’s why we have a sp_Blitz check to warn about high memory usage for In-Memory OLTP:

As to why an uncompressed 5GB rowstore table takes up 28GB RAM, that’s a topic for another blog post. (Sometimes, I imagine rowstore tables being introduced today – people would think they’re the most amazing thing ever.)
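In the meantime, if you want to see where the memory went on your own server, there’s a DMV that breaks it down per table:

    SELECT OBJECT_NAME(object_id) AS table_name,
           memory_allocated_for_table_kb / 1024 AS table_mb,
           memory_allocated_for_indexes_kb / 1024 AS index_mb
    FROM sys.dm_db_xtp_table_memory_stats;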

What happens when SQL Server restarts?

For most of us, when we restart SQL Server, our biggest concerns are things like dropping connections, losing performance metrics, and starting over again with a fresh buffer pool. In-Memory OLTP users have a bigger problem: when the SQL Server restarts, CPU goes straight to 100%, and stays there:

Ouch. Why is CPU so high? Because with In-Memory OLTP, SQL Server starts up the database by reading the In-Memory OLTP tables up into something we call “memory.” It’s using less memory than before, thank goodness – it’s “only” 13GB – but 13GB of data still takes a long time to pull from disk and reconstruct in memory. (I’m not going to go into technical terms like crash recovery phases here because it’s not necessary to convey the overall problem.)

You can see exactly how long it takes in the SQL Server logs. In this case, the StackOverflow database recovery took 50 seconds – and in this case, because our server’s got really fast storage, the storage was able to deliver data so quickly that it pegged CPU at 100% for all 50 seconds.
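You can search the log for those messages instead of scrolling by hand – the exact message text varies a little by version, so treat the search string as a starting point:

    /* Search the current error log (log 0) for database recovery messages */
    EXEC sys.sp_readerrorlog 0, 1, N'Recovery completed';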

During startup, wait stats show a lot of wait times on XTP_PREEMPTIVE_TASK and SLEEP_DB_STARTUP:
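If you want to check those yourself while a database is coming online:

    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type IN (N'XTP_PREEMPTIVE_TASK', N'SLEEP_DB_STARTUP')
    ORDER BY wait_time_ms DESC;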

There’s also blocking, which is kinda funny – anything that needs to query sys.databases gets blocked, like IntelliSense queries running in the background:
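A quick way to catch that while it’s happening:

    SELECT session_id, blocking_session_id, wait_type, wait_resource
    FROM sys.dm_exec_requests
    WHERE blocking_session_id <> 0;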

This happens during failovers, restores, and attaches, too.

Startups, failover clustered instance failovers, and restores all exhibit the same problem: they need to start up the database, and doing that means reconstructing the In-Memory OLTP data from scratch.

This is particularly problematic for shops that use Always On Availability Groups. Say you want to add a new database into an existing cluster, and the new database happens to use In-Memory OLTP. You restore it on the primary, and … boom, the server hits 100% CPU usage for an extended period of time, which affects all the running queries on existing databases on that server.

Here’s what CPU looks like at the end of a database restore:

CPU was relatively low while the restore was processing, but once SQL Server was done writing the files, CPU went straight to 100%. Why? Because it needed to bring the database online, which meant reading the In-Memory OLTP data from disk and reconstructing the table.

Even if you just attach an In-Memory OLTP database – a process that is normally near-instantaneous – you bring SQL Server to its knees while it reads through that data and populates memory.

In my example, I don’t have indexes on the In-Memory OLTP table, but if I did, the situation would be even worse. Indexes on these tables are only kept in memory, not on disk, so they’re reconstructed from scratch at startup, failover, restore, and attach time.

Bottom line: the more In-Memory OLTP durable data you have,
and the more databases you have that use it,
the worse this gets.

Trace flags 3408 and 3459 don’t help on SQL Server 2022, at least.

Among other sources, Konstantin Taranov’s excellent trace flag list reports that trace flag 3408 forces all databases to use just one thread when starting up. I don’t think I’ve ever needed to use that trace flag, but it doesn’t appear to help here. In my lab, I set up 3408 in the startup config, restarted the SQL Server, and CPU still went straight to 100%:

Furthermore, 3408 doesn’t appear to have the desired effect on SQL Server 2022. (I didn’t bother checking other versions, since it wouldn’t have helped my client – they had a lot of databases using In-Memory OLTP.) Databases are still starting up with multiple threads, which drives CPU to 100%:

Same problem with trace flag 3459, which is supposed to disable parallel redo for AGs, but that’s unrelated to database startup, as we can see by the flames:

And just because someone’s going to ask, no, setting MAXDOP to 1 has no effect on system processes like bringing databases online.

So how do we get faster startups, failovers, and restores?

In theory, you could add CPUs. In practice, that’s a really expensive way to solve this problem, and it doesn’t work when you have several (or heaven forbid, dozens) of databases that use In-Memory OLTP. SQL Server doesn’t just start up one database at a time – it starts them in groups, which means you can saturate lots of CPUs on restart. If you need to dive into the internals of this, here are a couple of resources:

In theory, you could use slower storage. Bear with me for a second: if your primary concern was that other databases on the SQL Server were unusable while In-Memory OLTP databases came online, you could actually put the In-Memory OLTP filegroup on slower storage. In this 5GB Votes table example, that does actually lower CPU usage during most of the process, only driving it to 100% near the end of the process:

(Yes, I actually tested that for one particular client, who was curious.) However, that also makes the In-Memory OLTP databases take even longer to come online! In my example with a 5GB table, the database took ~80 seconds to come online instead of ~50 – making your RTO goals tougher to meet.

In practice, minimize what you keep in In-Memory OLTP durable tables. A few ways to do that:

  • Make them schema-only instead, and don’t keep the data around. Yes, you lose all data when the SQL Server goes down, but if you’ve been using this feature for temporary data anyway, like session state, reconsider whether you need to keep it. (There’s a sketch of this after the list.)
  • Sweep older data out to conventional tables. One of Microsoft’s design patterns for this feature is to use In-Memory OLTP tables only for ingestion, but then after the data’s been absorbed into the database, archive it.
  • Use regular tables instead, not In-Memory OLTP. Conventional tables don’t interrupt startup.
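For that first schema-only option, it’s just a different durability setting – here’s a sketch, using a hypothetical session state table:

    CREATE TABLE dbo.SessionState
    (
        SessionId    UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED,
        SessionData  VARBINARY(8000) NULL,
        LastAccessed DATETIME2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
    /* SCHEMA_ONLY: the table structure survives a restart, but the rows don't -
       so there's nothing to reconstruct at startup, failover, or restore time. */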

[Video] Office Hours at Sea: Heading to Jamaica

Videos
0 Comments

I’m on a boat! I’m on a 10-day Panama Canal cruise, and I stopped (well, not the boat) off the coast of Florida to answer questions y’all posted at https://pollgab.com/room/brento.

  • 00:00 Start
  • 00:56 DBe: In several places I’ve worked, it’s been “policy” to automatically restart heavy-use SQL servers off-hours on a regular cadence. Usually monthly but in a couple cases weekly. Is the scheduled restarting of SQL servers a common, viable business practice?
  • 03:53 BBDD: Hey Brent, do you know why a heap with one NVARCHAR(100) column and 1 row can be 2GB in size, all of it used? What might have happened to the poor table? I tried to reproduce it by creating the same table and inserting, deleting, and truncating, but the size when it was back to 1 row was 1MB.
  • 04:52 Brandon: Changing order of joins made a BIG difference for me recently, I suppose b/c it helped SQL start in a better place on the search for a good plan before time ran out. This was on 2008. Have you run into this much & do you think it’s less of an issue after 2008?
  • 06:04 Gigiwig: Hi Brent, a friend of mine has a server instance with a sql_… collation. One of the dbs is from a vendor and insists on a different collation. Can that cause problems concerning joining to system tables, using tempdb? How did you handle collation mismatches in the past? Thx
  • 07:43 Pamela Anderson: Has the problem introduced by the PSPO implementation in the SQL Server 2022 CTP – the impossibility of telling which statement relates to which batch (or SP?), which you blogged about last year – actually been fixed in 2022 when it came out? Or is our monitoring broken?
  • 08:35 Runyan Millworthy: What are the top signs that a shop needs more SQL DBA’s?
  • 11:22 Kajimial: Hugs Brent, I watched your video on optimizing CHECKDB. Isn’t WITH PHYSICAL_ONLY supposed to be faster? I ran it on an 8TB database and it completed in a little over a day, while plain CHECKDB completes in 6 hours. How can this be possible? The database has no load, and I’m running it with MAXDOP 0 via Ola.
  • 13:27 Craig Gardner: I’m doing a penetration test on a server and have come across a SQL Server. Build number is 12.0.937.0 and version is 2014 (although I doubt this). I can’t find any information on that build number. Do Microsoft have a list of build numbers for Azure Managed Instances?

Who’s Hiring in the Microsoft Data Platform Community? March 2023 Edition

Who's Hiring
14 Comments

Is your company hiring for a database position as of March 2023? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes to, or a link to the application process – if I were you, I’d put an email address, because you may want to know which applicants are readers here: they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.