Microsoft releases SQL Server Cumulative Updates about every 60 days. This is a part of your job that you’re going to be doing a lot, so let’s get good at it! Here’s the strategy I like to use.
Pick what patch you’re going to apply. Generally speaking, you should be on the most recent Cumulative Update available for your version. (Years ago, folks only applied Service Packs, but starting with SQL Server 2017, Service Packs are gone. Microsoft only ships Cumulative Updates now.)
Decide how you’re going to detect problems. Every now and then, an update breaks something. For example, SQL Server 2019 CU7 broke snapshots, SQL Server 2019 CU2 broke Agent, and so many more, but my personal favorite was when SQL Server 2014 SP1 CU6 broke NOLOCK. Sure, sometimes the update installer will just outright fail – but sometimes the installer succeeds, but your SQL Server installation is broken anyway, and it may take hours or days to detect the problem. You need to monitor for new and unusual failures or performance problems.
Design your ideal rollout strategy. Here’s the order I like to use, but I understand that not everyone has all of these environments. More on that in a second. Roll out the patch in this order:
- Development servers – you want your developers seeing any failures or behavior changes first so they can feel confident that the eventual production patch will produce behavior they’re used to.
- QA/test servers
- Disaster recovery servers – often, these are the least-critical servers in the list that are actually being monitored with monitoring software, and leverage SQL Server’s high availability and disaster recovery features like clustering, and also have to keep up with the level of writes happening in production. New problems will show up here, and hopefully monitoring will detect them before you apply patches to subsequent levels.
- Read-only replicas – servers where some end user activity happens, but it’s less critical than your primary servers. This advice applies whether we’re talking replication, log shipping, or Always On Availability Groups.
- Failover servers – now we’re getting really close. The idea here is to patch these without taking a production outage – but that’s not always possible depending on the HA/DR features you’re using, and the way you’re using them.
- Production primary servers – and the way you patch these is to actually not patch them at all. On “patch day”, simply fail over to your failover server, which has already been patched. This way, if you experience any surprise issues with the patch within the first few days, you can fail back over to your unpatched production server. (This also means you need to hold off patching that server for a few days to give yourself a safety net.)
Design your actual rollout strategy. Having read that above rosy-world scenario, now you’re looking at your own environment going, “Brent, I don’t have a bunch of those servers.” Okay, no problem: scratch out the lines you don’t have, but understand that you’re also scratching out possible safety nets. This is something to think about when you’re designing your next SQL Server architecture.
Design your rollback strategy. In the event that you do detect problems – and it’ll happen sooner or later – you want to have a rough idea of what you’re going to do. In dev/QA/test, you might just choose to uninstall the update and wait it out, giving other SQL Server customers time to troubleshoot the problem with Microsoft on their mission-critical servers, then apply the next fixed update instead. If the update made it all the way to your DR or failover tier without you catching the problem, you might not have the luxury of cleanly uninstalling the update, and your rollback strategy may be to open a support case with Microsoft to troubleshoot the problem – hopefully before applying the failed patch to your production primary servers.
What we’ve done so far seems like a lot of designing, but remember, you only have to do this once, and you can reuse it for every update you apply to this environment.
When applying the actual patch, here’s what I like to do, in order:
- Verify that you have backups. Ideally, do a test restore, too: backup success messages don’t mean you have working backup files.
- Stop or shut down client apps. You don’t want folks starting a transaction as your update begins.
- Make sure there’s no activity happening on the server, especially long-running jobs like backups.
- Apply the update – if you’re using PowerShell, check out how to automate patching with DBAtools.
- Apply Windows updates since you’re down anyway. (Sometimes I find folks have been applying SQL updates, but not Windows updates – they’re both important.)
- Confirm the SQL Server service is started, and check your monitoring tools for any unexpected failures.
- Confirm the SQL Server Agent service is started again, and kick off your next log backup job.
- Start client apps back up and make sure they function.
Over the coming days, keep a much closer eye than normal on monitoring tools looking for unexpected failures. Then, it’s time to hop back on the hamster wheel again, and start planning your next round of updates.
Very nice to-do-list.
It would be nice, if you could place a link on https://sqlserverupdates.com/ to this article – it makes it easier to find when you really need it 🙂
Thomas – great idea, done!
Are there any ramifications of applying updates on a DR server when all the databases are in Restoring mode?
Yep. The specifics are beyond the scope of this post though – that’s where your design steps come in as I discussed in the post. The good news is you still have job security. 😉 The bad news is that I’m not doing ALL of your job, heh.
I got a little sick when I saw the mention of opening a ticket with MS. I haven’t had to open one for an SQL Server incident since last year, but I’ve open several in the last few months related to AD issues. We have either figured it out on our own or given up. Weeks go by with nothing to show for it. It seems that only tickets submitted via the Azure Portal are handled by people with a clue.
As always nice post @BrentO.
Totally agree in the list.
Number 2. and number 8. are the toughest ones! Stop/Start manually or automated!
And-And-And your App-Admins won’t [always] know what Services they need to stopp 😉 you got to tell them.
for one server one instance that’s all ok but for a thousand servers or several th… ha ha
what “monitoring tools” you guys sugest ? I’m used to create my owns but for sure you guys know better ones.
That’s outside of the scope of this blog post. Let’s not turn this into a shopping comparison – just don’t Want to go there here. Thanks for understanding.
If you work in a large enterprise, your company will have already invested in patch automation software which is fully capable of also patching SQL Server. Two which I’ve used: SCCM and Tanium. The patching team which manages all the Microsoft patches can include MSSQL CU’s and service packs as part of monthly patch updates and rollout out at same schedule as MS monthly patching staggering dev, qa, test and prod environment. This requires working silos, but that’s always a good thing.
Chad – can you take a quick look again through the post and verify that the patching software can do those things? May want to double check those.
On the applying the patch, yes it does. Things like shutting down apps, verifying backups and verifying SQL Server after patching are additional processes/procedures as you pointed out.
Right – applying the patch is the easy part. 😉
yes applying the patch is the easy part… wondering how involved your automated process is and how you leverage SCCM for SQL patching.. especially if you are patching clustered instances
Should we stop sql services as well before the patching? I think that step 2 should include it as well, right?
I’ve never done that myself, but I’ve heard other folks do it. If you want to do it, that’s completely fine.
Thanks for your insights Brent! Adding a couple to our list 🙂
I like to separate Windows and SQL updates – if something fails it makes it a little easier to narrow down the possible causes.
Also, when D/R (log shipping for example) is in standby mode, that may not always work if your production server is not patched yet because your databases might need to be upgraded before SQL can bring them online.
Question on patching 2 servers involved in transactional replication between each other. Both servers are on SQL 2014 sp2 cu17. To apply the most current updates, I need to apply 2014 sp3, then cu4 then GDR. Do both servers have to be patched at the same time or can one server be patched then wait a few days to patch the following server?
Howdy ma’am! You can usually patch them out of order.
Thanks so much, Brent!
Just a quickie note: the GDR releases are separate to the CU releases; you cannot apply both.
Once you have applied a CU, you must remain on the CU releases; same for GDR, once you start on GDR, you stick on GDR. The only time you can swap between them, is when applying a new SP (you could be waiting a long time these days!) or reverting back to the original RTM.
For many of the Instances I look after, we stick on the GDR release schedule; far fewer updates to apply, most for security patching; sure, we don’t get the latest-and-greatest features and improvements; but for these systems, stability and reliability are more important.
Nice article, for me as working with SQL Server since more than 17 years, I learnt the hard way, Never ever try Microsoft first patch release, wait for some time until “Someone else try it first”, then apply it in your environment in the same order you provided in the article.
As now MS is releasing a patch every 2 month, I am patching my server once per year, that`s more than enough for my current stable environment.
We upgrade passive node on FCI and failover to that node, after the upgrade and keep the upgrade other node on lower CU, so we can rollback to that node when something goes wrong. We do this for a week and then upgrade other node.
If we failover, SQL Server nicely rollbacks any Changes it made.
How many sql servers are you doing manually like this? I’m trying to find if people are widely using SCCM to automate .. especially where SQL clusters are concerned
Thierry Van Durme already mentioned it… there might be an issue with holding of updating the previous-version active node after moving the database to the formerly-standby node of a cluster. The internal database version might change and I am not 100% sure you can always re-active the old node(s) with the updated databases. I do recall having issues with that, although that was almost a lifetime ago, so that issue might no longer be a thing. If anyone could confirm that either way that would be nice.
We can never really confirm that permanently – only Microsoft could. I’m gonna be honest here: they’re not putting that level of testing into Cumulative Updates, and it’s not fair to ask them to do it, either. They would have to test upgrading from every possible CU – and that’s just not realistic.
The fact that they’re not putting this level of testing in is exactly why we have to not upgrade every production server at the same time. You simply wouldn’t be able to handle the aftermath when these bugs strike, and they’ve certainly been striking with a vengeance in the last couple of years.
I just stepped in to this trap while was updating my cluster environments of SQLs 2017 (CU19 to CU24) and 2019(CU11 to CU13).
in our test all worked well, but on prod 50% failed to failover. now we are investigating what is the problem.
this is only to share the info (and my frustration).
never expect the patches to magically work:-)
Are security updates cumulative? For example, SQL 2014 SP3 has 2 consecutive security updates listed:
12.0.6433.1 and 12.0.6372.1.
Does 12.0.6372.1 contain what was included in 12.0.6433.1? or do I need to apply both?
Hi Brent! What do you think about the note in the official documentation saying:
“Mixing versions of SQL Server instances in the same AG is not supported outside of a rolling upgrade and should not exist in that state for extended periods of time as the upgrade should take place quickly. The other option for upgrading SQL Server 2016 and later is through the use of a distributed availability group.”
If I understand correctly, I shouldn’t leave the one server with the old CU update, right?
They’re talking about versions, not patch levels.
Besides, even if they were talking about patch levels, the whole reason we’re doing this is that Microsoft doesn’t really support broken CUs. All they do is tell you to uninstall it from everywhere, which involves serious outages. It’s easier and safer to leave a node unpatched.
Facepalming myself right now! Sorry for the dumb comment and as always thanks for the quick feedback!
Can we split Windows and OS patching in to 2 different cycles instead of applying both patches on same day? Is it risky to apply both together?
Correction: It would be Windows and SQL patching
That’s outside of the scope of this blog post.
Nice article as always Brent. I was asked to test patching SQL server using SCCM. However while checking online it seems people have had trouble with patching SQL utilizing SCCM. In addition i have tested using DBATools and that works great but my organization is quite secure and wont allow me to patch remote servers using DBATools.
My question is what have you heard about patching SCCM? Is it recommended or stick to PowerShell scripts?
Thanks, glad you liked the post.
This post is about how to patch SQL Server.
If I wanted to write a post about how *not* to patch SQL Server, I’d have written about SCCM, heh.
Can we apply SQL Server 2014 SP3 CU4 directly on top of SQL Server 2014 CU11 or it should be to apply SQL Server 2014 SP2 then SQL Server 2014 SP3?
correction – on top of SQL Serve 2014 SP1 CU11
Just apply the most recent service pack, and then the most recent cumulative update for that service pack.
Thank you very much Brent!