How to Patch SQL Server

Microsoft releases SQL Server Cumulative Updates about every 60 days. This is a part of your job that you’re going to be doing a lot, so let’s get good at it! Here’s the strategy I like to use.

Pick what patch you’re going to apply. Generally speaking, you should be on the most recent Cumulative Update available for your version. (Years ago, folks only applied Service Packs, but starting with SQL Server 2017, Service Packs are gone. Microsoft only ships Cumulative Updates now.)
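
If you’d rather script that check than click around in SSMS, here’s a minimal sketch using the dbatools PowerShell module (which comes up again later in this post). The instance name SQL01 is a placeholder, and this assumes dbatools is already installed:

```powershell
# A minimal sketch, assuming the dbatools module is installed
# (Install-Module dbatools) and "SQL01" stands in for your instance.
Import-Module dbatools

# What build, Service Pack, and Cumulative Update is this instance on?
Get-DbaBuildReference -SqlInstance "SQL01"

# Is a newer build available for this version?
Test-DbaBuild -SqlInstance "SQL01" -Latest
```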

Decide how you’re going to detect problems. Every now and then, an update breaks something. For example, SQL Server 2019 CU7 broke snapshots, SQL Server 2019 CU2 broke Agent, and there were many more, but my personal favorite was when SQL Server 2014 SP1 CU6 broke NOLOCK. Sure, sometimes the update installer will just outright fail – but sometimes the installer succeeds, and your SQL Server installation is broken anyway, and it may take hours or days to detect the problem. You need to monitor for new and unusual failures or performance problems.
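
Your monitoring tool should do the heavy lifting here, but as a quick first pass, you can sweep the error log yourself. A rough sketch, again assuming dbatools – the instance name and the search pattern are placeholders, not a complete health check:

```powershell
# Pull the last 24 hours of the SQL Server error log and flag anything
# suspicious - a cheap first pass while you wait on real monitoring.
Get-DbaErrorLog -SqlInstance "SQL01" -After (Get-Date).AddHours(-24) |
    Where-Object { $_.Text -match "error|fail|corrupt" } |
    Select-Object LogDate, Source, Text
```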

Design your ideal rollout strategy. Here’s the order I like to use, but I understand that not everyone has all of these environments. More on that in a second. Roll out the patch in this order (there’s a rough automation sketch after the list):

  1. Development servers – you want your developers seeing any failures or behavior changes first so they can feel confident that the eventual production patch will produce behavior they’re used to.
  2. QA/test servers
  3. Disaster recovery servers – often, these are the least-critical servers in the list that are still monitored with monitoring software, that leverage SQL Server’s high availability and disaster recovery features like clustering, and that have to keep up with the level of writes happening in production. New problems will show up here, and hopefully monitoring will detect them before you apply patches to subsequent levels.
  4. Read-only replicas – servers where some end user activity happens, but it’s less critical than your primary servers. This advice applies whether we’re talking replication, log shipping, or Always On Availability Groups.
  5. Failover servers – now we’re getting really close. The idea here is to patch these without taking a production outage – but that’s not always possible depending on the HA/DR features you’re using, and the way you’re using them.
  6. Production primary servers – and the way you patch these is to actually not patch them at all. On “patch day”, simply fail over to your failover server, which has already been patched. This way, if you experience any surprise issues with the patch within the first few days, you can fail back over to your unpatched production server. (This also means you need to hold off patching that server for a few days to give yourself a safety net.)
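
To make that concrete, here’s a rough sketch of what a tiered rollout could look like with dbatools’ Update-DbaInstance. Every server name, the tier groupings, and the patch share path are hypothetical placeholders – substitute your own inventory:

```powershell
# A rough tiered-rollout sketch, assuming dbatools and a file share
# holding the downloaded Cumulative Update. All names are placeholders.
Import-Module dbatools

$tiers = @(
    @{ Name = "Development";        Servers = @("DEV01", "DEV02") },
    @{ Name = "QA/Test";            Servers = @("QA01") },
    @{ Name = "DR";                 Servers = @("DR01") },
    @{ Name = "Read-only replicas"; Servers = @("READ01") },
    @{ Name = "Failover";           Servers = @("PROD02") }
    # No production primary tier on purpose: on patch day, you fail
    # over to the already-patched failover server instead.
)

foreach ($tier in $tiers) {
    Write-Host "Patching tier: $($tier.Name)"
    Update-DbaInstance -ComputerName $tier.Servers -Path "\\fileserver\SQLpatches" -Restart
    # In real life, pause here for days - not seconds - and watch your
    # monitoring before moving on to the next tier.
}
```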

Design your actual rollout strategy. Having read that above rosy-world scenario, now you’re looking at your own environment going, “Brent, I don’t have a bunch of those servers.” Okay, no problem: scratch out the lines you don’t have, but understand that you’re also scratching out possible safety nets. This is something to think about when you’re designing your next SQL Server architecture.

Design your rollback strategy. In the event that you do detect problems – and it’ll happen sooner or later – you want to have a rough idea of what you’re going to do. In dev/QA/test, you might just choose to uninstall the update and wait it out, giving other SQL Server customers time to troubleshoot the problem with Microsoft on their mission-critical servers, then apply the next fixed update instead. If the update made it all the way to your DR or failover tier without you catching the problem, you might not have the luxury of cleanly uninstalling the update, and your rollback strategy may be to open a support case with Microsoft to troubleshoot the problem – hopefully before applying the failed patch to your production primary servers.
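
One cheap thing that makes any rollback conversation easier: record what build every server was on before patch day, so you know exactly what your rollback target is. A minimal sketch, assuming dbatools – the server names and output path are placeholders:

```powershell
# Snapshot each instance's build before patching, so you know your
# rollback target when something breaks later.
Get-DbaBuildReference -SqlInstance "DEV01", "QA01", "DR01", "PROD01" |
    Export-Csv -Path "C:\temp\builds-before-patching.csv" -NoTypeInformation
```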

What we’ve done so far seems like a lot of designing, but remember, you only have to do this once, and you can reuse it for every update you apply to this environment.

When applying the actual patch, here’s what I like to do, in order (a rough dbatools sketch follows the list):

  1. Verify that you have backups. Ideally, do a test restore, too: backup success messages don’t mean you have working backup files.
  2. Stop or shut down client apps. You don’t want folks starting a transaction as your update begins.
  3. Make sure there’s no activity happening on the server, especially long-running jobs like backups.
  4. Apply the update – if you’re using PowerShell, check out how to automate patching with DBAtools.
  5. Apply Windows updates since you’re down anyway. (Sometimes I find folks have been applying SQL updates, but not Windows updates – they’re both important.)
  6. Confirm the SQL Server service is started, and check your monitoring tools for any unexpected failures.
  7. Confirm the SQL Server Agent service is started again, and kick off your next log backup job.
  8. Start client apps back up and make sure they function.
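
Here’s what several of those steps could look like in dbatools. This is a sketch, not gospel: the instance name, patch share, and Agent job name are placeholders, and steps 2, 5, and 8 involve your client apps and Windows patching tooling, so they’re not shown:

```powershell
# A rough patch-day sketch, assuming dbatools. Names are placeholders.
Import-Module dbatools

# 1. Verify backups by actually restoring them and checking the result -
#    backup success messages don't prove the files restore cleanly.
Test-DbaLastBackup -SqlInstance "SQL01"

# 3. Look for activity - user sessions and running Agent jobs - first.
Get-DbaProcess -SqlInstance "SQL01" | Select-Object Spid, Login, Program
Get-DbaRunningJob -SqlInstance "SQL01"

# 4. Apply the update from a share holding the downloaded CU.
Update-DbaInstance -ComputerName "SQL01" -Path "\\fileserver\SQLpatches" -Restart

# 6 & 7. Confirm the Engine and Agent services came back up.
Get-DbaService -ComputerName "SQL01" -Type Engine, Agent

# 7. Kick off the next log backup (the job name here is hypothetical).
Start-DbaAgentJob -SqlInstance "SQL01" -Job "DatabaseBackup - USER_DATABASES - LOG"
```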

Over the coming days, keep a much closer eye than normal on monitoring tools looking for unexpected failures. Then, it’s time to hop back on the hamster wheel again, and start planning your next round of updates.

Comments

  • Thomas Franz
    June 8, 2021 4:22 pm

    Very nice to-do list.

    It would be nice if you could place a link on https://sqlserverupdates.com/ to this article – it makes it easier to find when you really need it 🙂

  • Are there any ramifications of applying updates on a DR server when all the databases are in Restoring mode?

    • Yep. The specifics are beyond the scope of this post though – that’s where your design steps come in as I discussed in the post. The good news is you still have job security. 😉 The bad news is that I’m not doing ALL of your job, heh.

  • James Fogel
    June 8, 2021 6:00 pm

    I got a little sick when I saw the mention of opening a ticket with MS. I haven’t had to open one for an SQL Server incident since last year, but I’ve opened several in the last few months related to AD issues. We have either figured it out on our own or given up. Weeks go by with nothing to show for it. It seems that only tickets submitted via the Azure Portal are handled by people with a clue.

  • As always, nice post @BrentO.
    Totally agree with the list.

    Numbers 2 and 8 are the toughest ones! Stop/start manually or automated!

    And-and-and your app admins won’t [always] know which services they need to stop 😉 you’ve got to tell them.

    For one server, one instance, that’s all OK – but for a thousand servers, or several th… ha ha

  • Dba locoCrazy
    June 8, 2021 6:13 pm

    What “monitoring tools” do you guys suggest? I usually create my own, but I’m sure you guys know better ones.

    • That’s outside of the scope of this blog post. Let’s not turn this into a shopping comparison – I just don’t want to go there. Thanks for understanding.

  • Chad Miller
    June 8, 2021 6:25 pm

    If you work in a large enterprise, your company will have already invested in patch automation software which is fully capable of also patching SQL Server. Two which I’ve used: SCCM and Tanium. The patching team that manages all the Microsoft patches can include MSSQL CUs and service packs as part of monthly patch updates and roll them out on the same schedule as MS monthly patching, staggering dev, QA, test, and prod environments. This requires working across silos, but that’s always a good thing.

    • Chad – can you take a quick look again through the post and verify that the patching software can do those things? May want to double check those.

  • Should we stop SQL services as well before the patching? I think step 2 should include that as well, right?

  • Thierry Van Durme
    June 9, 2021 6:27 am

    Thanks for your insights Brent! Adding a couple to our list 🙂
    I like to separate Windows and SQL updates – if something fails it makes it a little easier to narrow down the possible causes.
    Also, when DR (log shipping, for example) is in standby mode, patching DR first may not always work if your production server is not patched yet, because your databases might need to be upgraded before SQL can bring them online.

  • Juanita Drobish
    June 10, 2021 3:22 pm

    Question on patching 2 servers involved in transactional replication between each other. Both servers are on SQL 2014 SP2 CU17. To apply the most current updates, I need to apply 2014 SP3, then CU4, then the GDR. Do both servers have to be patched at the same time, or can one server be patched and then wait a few days to patch the other?

    • Howdy ma’am! You can usually patch them out of order.

    • Just a quickie note: the GDR releases are separate from the CU releases; you cannot apply both.
      Once you have applied a CU, you must remain on the CU releases; same for GDR – once you start on GDR, you stick on GDR. The only time you can swap between them is when applying a new SP (you could be waiting a long time these days!) or reverting back to the original RTM.
      For many of the instances I look after, we stick on the GDR release schedule: far fewer updates to apply, mostly for security patching. Sure, we don’t get the latest-and-greatest features and improvements, but for these systems, stability and reliability are more important.

  • Hello Brent,

    Nice article. Having worked with SQL Server for more than 17 years, I learned the hard way: never ever try Microsoft’s first patch release. Wait for a while until someone else tries it first, then apply it in your environment in the same order you provided in the article.

    Now that MS is releasing a patch every 2 months, I patch my servers once per year – that’s more than enough for my current stable environment.

  • Dev Bethanasamy
    June 12, 2021 2:35 pm

    We upgrade the passive node on the FCI and fail over to that node after the upgrade, keeping the other node on the lower CU so we can roll back to that node if something goes wrong. We run this way for a week and then upgrade the other node.
    If we fail over, SQL Server nicely rolls back any changes it made.

