Whether you’re looking to go cutting-edge and hop on SQL Server 2019 as soon as it’s available, or whether you’re just trying to evacuate from SQL Server 2008 to anything that’s going to be supported after this summer, you’re probably doing upgrade projects. This is a good time to ask yourself 5 questions:
1. Do we really need the higher compatibility level right away? When you flip the switch at the database level to enable a newer compat level, you’re taking a risk that your performance will be better across the board for every single query. The reality is that sometimes, it’s not. You probably tested your slow queries and found they went faster with the new compat level, but you didn’t test your fast queries to see if they regressed. After migrating to a newer version, just leave the compat level at the same one you had before for a week or two. Later, flip the switch on the newer compat level, watch what queries perform better or worse, and be ready to pop back to the older compat level temporarily if necessary. This just buys you and your team time to do performance mitigations on queries that regressed.
2. Will we enable Query Store? This built-in tool helps you do a better job of identifying which queries are regressing, and why. It’s not intuitive, and the default settings are a really bad idea, but you can work around those if you spend some time learning it ahead of time. It’s just a question of whether you’re willing to invest that time. (The tool isn’t going away, and I think learning how to configure and use it is a good investment that would pay you dividends in the long term.)
3. What’s our back-out plan? I’m not a fan of upgrading in place because it leaves you without a safety net. However, even if you build new VMs and migrate over to them, you still need to do a round of user acceptance testing to make sure everything’s okay on the new environment. Once you go live on a newer version, and you start doing deletes/updates/inserts (DUIs) on that new version, your data just flat out now lives in the new version. You can’t restore a database to an earlier SQL Server version. So before you let the users in, designate a team to do a round of quick performance tests, and have everyone agree on the success criteria. If they don’t sign off on go-live data, scratch it, and stay on the older infrastructure for another week or two (or more) until you iron out the performance issues.
4. What sp_configure settings and trace flags are we using, and why? Over the years, we might have implemented different tweaks in order to fix performance or reliability issues. When you upgrade, that’s a good time to stop and revisit those settings to see if they’re still relevant. Plus, their behavior can change – here’s a problem I’m running into right now with trace flag 834 and CTP 3.0:
Microsoft just doesn’t have the bandwidth to test every possible combination of trace flags and sp_configure settings. The more you venture away from the defaults, the more you’re taking risks that your particular combination won’t behave the way it has in the past.
Update May 29: with the help of the brilliant @sqL_handLe, we traced down that it’s a bug in the Hybrid Buffer Pool feature that’s trying to use conventional memory as a PMEM device. Boy, talk about a good example of it being difficult to test every combination of feature – I never would have thought to investigate Hybrid Buffer Pool as a culprit, and I’m still not sure how to do good root cause analysis on that feature given the limited instrumentation that I found during my investigation.
5. How often will we patch? SQL Server 2017 doesn’t have Service Packs anymore, only Cumulative Updates. Going to 2017 or 2019 is a good time to stop and talk with management about their expectations for patch outages. If they’re used to only taking annual outages for Service Packs, they’re going to be in for a big surprise with the frequency of Cumulative Updates: every 30 days for the first year of a product, and then every 60 days after that. When managers tell me they want less frequent maintenance windows, I walk them through the list of things fixed in recent updates. Take SQL Server 2017 Cumulative Update 15, for example: 2017’s been out for almost two years now, and yet CU15 still found & fixed bugs involving incorrect query results, Dynamic Data Masking security leaks, incorrect query results, stack overflows during backups, slight memory issues, and incorrect query results. Bugs are real, y’all.
What, you’re upset that SQL Server has bugs?
Well, it could be worse. The bugs could be there, and not getting fixed. Like, uh, the SQL Server 2008 that you’re supposed to be binning this summer.