Microsoft releases SQL Server Cumulative Updates about every 60 days. This is a part of your job that you’re going to be doing a lot, so let’s get good at it! Here’s the strategy I like to use.
Pick what patch you’re going to apply. Generally speaking, you should be on the most recent Cumulative Update available for your version. (Years ago, folks only applied Service Packs, but starting with SQL Server 2017, Service Packs are gone. Microsoft only ships Cumulative Updates now.)
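If you want to script that check, dbatools (which comes up again below for automating the patch itself) can compare an instance's current build against the latest known Cumulative Update. Here's a minimal sketch, assuming the dbatools module is installed, with SQL01 standing in for your instance name:

```powershell
# Compare this instance's build against the most recent known CU
# for its version. Assumes dbatools is installed; SQL01 is a placeholder.
Import-Module dbatools

# The Compliant column tells you whether you're on the latest build.
Test-DbaBuild -SqlInstance SQL01 -Latest
```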
Decide how you’re going to detect problems. Every now and then, an update breaks something. For example, SQL Server 2019 CU7 broke snapshots, SQL Server 2019 CU2 broke Agent, and there have been plenty more, but my personal favorite was when SQL Server 2014 SP1 CU6 broke NOLOCK. Sure, sometimes the update installer will just outright fail, but other times the installer succeeds and your SQL Server installation is broken anyway, and it may take hours or days to detect the problem. You need to monitor for new and unusual failures or performance problems.
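Real monitoring software is the main line of defense here, but if you're scripting, a cheap supplemental check is sweeping the error log for anything new right after the patch window. A sketch, again assuming dbatools, with SQL01 and the two-hour window as placeholders:

```powershell
# Pull error log entries written since the patch window started,
# so surprises surface in minutes instead of days.
# Assumes dbatools; SQL01 and the time window are placeholders.
Get-DbaErrorLog -SqlInstance SQL01 -After (Get-Date).AddHours(-2) |
    Where-Object Text -match 'error|fail|corrupt' |
    Select-Object LogDate, Source, Text
```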
Design your ideal rollout strategy. Here’s the order I like to use, but I understand that not everyone has all of these environments. More on that in a second. Roll out the patch in this order (there’s a scripted sketch of the tiered rollout after the list):
- Development servers – you want your developers to see any failures or behavior changes first, so they can feel confident that the eventual production patch will behave the way they’re used to.
- QA/test servers
- Disaster recovery servers – often, these are the least critical servers in the list that are actually watched by monitoring software, that leverage SQL Server’s high availability and disaster recovery features like clustering, and that have to keep up with the volume of writes happening in production. New problems will show up here, and hopefully your monitoring will detect them before you apply patches to the remaining tiers.
- Read-only replicas – servers where some end user activity happens, but it’s less critical than your primary servers. This advice applies whether we’re talking replication, log shipping, or Always On Availability Groups.
- Failover servers – now we’re getting really close. The idea here is to patch these without taking a production outage – but that’s not always possible depending on the HA/DR features you’re using, and the way you’re using them.
- Production primary servers – and the way you patch these is to actually not patch them at all. On “patch day”, simply fail over to your failover server, which has already been patched. This way, if you experience any surprise issues with the patch within the first few days, you can fail back over to your unpatched production server. (This also means you need to hold off patching that server for a few days to give yourself a safety net.)
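If you script the rollout, the tier order can live in one place so every patch cycle walks the same path. Here’s a rough sketch using dbatools’ Update-DbaInstance; every server name, the tier grouping, and the patch share path are made-up placeholders, and the Read-Host pause stands in for however long you let each tier soak:

```powershell
# Walk the patch through the tiers in order, least critical first.
# Assumes dbatools; all names and paths below are placeholders.
$tiers = @(
    @{ Name = 'Development';        Servers = 'DEV01', 'DEV02' },
    @{ Name = 'QA/Test';            Servers = 'QA01' },
    @{ Name = 'Disaster recovery';  Servers = 'DR01' },
    @{ Name = 'Read-only replicas'; Servers = 'READ01' },
    @{ Name = 'Failover';           Servers = 'PROD02' }
    # The production primary (PROD01) is intentionally absent:
    # on patch day, you fail over to PROD02 instead of patching PROD01.
)

foreach ($tier in $tiers) {
    Write-Host "Patching tier: $($tier.Name)"
    Update-DbaInstance -ComputerName $tier.Servers -Path '\\fileshare\sqlpatches' -Restart

    # Soak time: watch your monitoring before moving to the next tier.
    Read-Host "Check monitoring for $($tier.Name), then press Enter to continue"
}
```

In real life, the pause between tiers might be days, governed by your change control process rather than a Read-Host prompt.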
Design your actual rollout strategy. Having read the rosy-world scenario above, you’re now looking at your own environment going, “Brent, I don’t have a bunch of those servers.” Okay, no problem: scratch out the lines you don’t have, but understand that you’re also scratching out possible safety nets. This is something to think about when you’re designing your next SQL Server architecture.
Design your rollback strategy. In the event that you do detect problems (and it’ll happen sooner or later), you want to have a rough idea of what you’re going to do. In dev/QA/test, you might just choose to uninstall the update and wait it out, giving other SQL Server customers time to troubleshoot the problem with Microsoft on their mission-critical servers, then apply the next fixed update instead. If the update made it all the way to your DR or failover tier before you caught the problem, you might not have the luxury of cleanly uninstalling it. In that case, your rollback strategy may be to open a support case with Microsoft and troubleshoot, hopefully before the failed patch reaches your production primary servers.
What we’ve done so far seems like a lot of designing, but remember, you only have to do this once, and you can reuse it for every update you apply to this environment.
When applying the actual patch, here’s what I like to do, in order (sketched as one PowerShell script after the list):
- Verify that you have backups. Ideally, do a test restore, too: backup success messages don’t mean you have working backup files.
- Stop or shut down client apps. You don’t want folks starting a transaction as your update begins.
- Make sure there’s no activity happening on the server, especially long-running jobs like backups.
- Apply the update – if you’re using PowerShell, check out how to automate patching with dbatools.
- Apply Windows updates since you’re down anyway. (Sometimes I find folks have been applying SQL updates, but not Windows updates – they’re both important.)
- Confirm the SQL Server service is started, and check your monitoring tools for any unexpected failures.
- Confirm the SQL Server Agent service is started again, and kick off your next log backup job.
- Start client apps back up and make sure they function.
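To see those steps as one script, here’s a rough dbatools sketch of patch day for a single instance. SQL01, the ‘DBA - Log Backups’ job name, and the patch share are all placeholders, and the app stop/start steps are only comments because they’re specific to your shop:

```powershell
# Patch-day sequence for one instance, following the checklist above.
# Assumes dbatools; SQL01, the job name, and the share are placeholders.

# 1. Verify backups by actually restoring the most recent ones.
#    (Point -Destination at a test instance if you have one.)
Test-DbaLastBackup -SqlInstance SQL01 |
    Format-Table Database, RestoreResult, DbccResult

# 2. Stop or shut down client apps - this part is specific to your shop.

# 3. Check for remaining activity and long-running Agent jobs.
Get-DbaProcess -SqlInstance SQL01 -ExcludeSystemSpids |
    Format-Table Spid, Login, Database, Program
Get-DbaRunningJob -SqlInstance SQL01

# 4. Apply the SQL Server update (Windows updates are a separate step).
Update-DbaInstance -ComputerName SQL01 -Path '\\fileshare\sqlpatches' -Restart

# 5. Confirm the Engine and Agent services came back up.
Get-DbaService -ComputerName SQL01 -Type Engine, Agent |
    Format-Table ServiceName, State

# 6. Kick off the next log backup so the log chain keeps moving.
Start-DbaAgentJob -SqlInstance SQL01 -Job 'DBA - Log Backups'

# 7. Start client apps back up and make sure they function.
```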
Over the coming days, keep a much closer eye than normal on your monitoring tools, watching for unexpected failures. Then it’s time to hop back on the hamster wheel and start planning your next round of updates.