I saw a DBA.StackExchange.com question and had to laugh out loud:
“Is there a cool way of performing CU updates for SQL Server on hundreds of machines?”
No, and it has nothing to do with technology.
- Which servers are mission-critical 24/7 and can’t go down?
- Which servers can only be taken down in specific time windows?
- Which servers have dependencies between each other, like database mirroring and AlwaysOn Availability Group replicas?
- Which servers have automatic failover mechanisms (like clusters) where you can patch the standby node first, then fail over once, and patch the primary without having to fail back?
- Which servers have vendor apps that require a specific hotfix that may not be included in the cumulative update you’re about to apply?
- Which servers are running long batch jobs like data warehouse loads or backups that would take a long time to restart if you took the server down mid-operation?
- Which servers have pending sp_configure changes that will take effect and surprise you when you restart?
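That last one bites people on restart day. A minimal sketch of how you might spot pending sp_configure changes ahead of time is to compare the configured value against the running value in sys.configurations – any row where they differ will take effect at the next RECONFIGURE or service restart:

```sql
-- Settings whose configured value differs from the value currently in use.
-- These surprises kick in at the next RECONFIGURE or service restart.
SELECT name,
       value        AS configured_value,
       value_in_use AS running_value
FROM sys.configurations
WHERE CAST(value AS int) <> CAST(value_in_use AS int);
```

Running this as part of a pre-patch checklist means nobody gets blindsided by a max memory or max degree of parallelism change someone staged months ago.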
Patching is hard work. Seriously.
Seriously, patching drives everyone crazy. I’d rather not make my life any more complicated. 🙂
Haha yeah. Good times.
Working for a large corporation, it’s about pecking order. If you’re Infrastructure, you can reboot a few hundred production servers at night and call the shots – everyone works overtime to do health checks afterwards. They don’t give a toss about the clusters and mirrors; they don’t even monitor the damned things. But they’re the golden-haired children with dozens of staff and a blank cheque to do whatever they want – so it’s all okay.
If you’re a DBA, though, patching requires days of tedious paperwork and authorisations weeks in advance. Hell, why do you want to patch at all? And nobody wants to stay late for health checks, and as a DBA you have no authority to force it. Managers “WANT PATCHING DONE!” but become mysteriously absent when these process blocks get brought up.
One thing I do find funny, though, is vendor recommendations. Most vendors I’ve talked to don’t have proper DBAs or even know much about the back end, and would keep everything on 2000/2005 if they could get away with it. Unfortunately we have to put our foot down and force upgrades where possible – not because anyone cares about the database or the extra load of dealing with SQL 2000 servers, but because SQL 2000 is only supported on Server 2003, that’s out of support now, and Infrastructure again gets to make the call to get it moved.
But I also think some vendors and DBAs are too cautious and never even try to patch. In our shop we generally treat SPs as required, just like any other OS-level update (the only exception being SCOM, whose installer spews errors if you’re too up to date), and we patched CUs during the early 2012 and 2014 corruption debacles (for safety, though we weren’t personally affected). We did go through dev/test/QA first, but no special testing was done – a lot like most vendors.
Eh, consulting DBAs like to make a big song and dance about testing patches, but I don’t believe them. They leave it in the laps of the application people, who have extremely limited testing capabilities themselves, and even then nowhere near the complexity and load needed to reveal the kinds of intricate issues any SP or CU could conceivably cause.
And there’s just no evidence it’s done. No massive blog articles detailing which careful benchmarks were taken, how the hundreds or thousands of procedures were benchmarked in a way that would be useful to compare against later, backups taken and traces started, replays done and (somehow?!) data comparisons made at the end – assuming timestamps and everything were perfect. I’ve quizzed even Oracle experts with decades of experience, and they admit it’s just not done.
Testing is the unicorn of database patching.
I feel that if they have a concern after it has been running in non-prod for a while, then they need to come up with whatever verification testing would make them comfortable. Otherwise they shouldn’t be a roadblock, because it gets to the point where perhaps nothing we do can make them comfortable – but the updates still need to be done.
As Brant mentioned, there are many reasons why it is a pain. Some of the large applications come to mind, such as SCOM, PeopleSoft, SAP, and other picky vendors that just say they don’t support it. Many others aren’t such an issue for people, but some are definitely a challenge.
Sorry Brent not Brant. 🙂
Brent, hey – thanks for the post. I loved it! These are great points. All in all, there were some great points on Stack Exchange too. Management made me post it!
I remember back when I was the SCCM admin at my previous company, before I became an accidental DBA, and I was just as naive about it. I was like, “hey, we can apply CUs via SCCM, how cool is that,” and then the DBA team quickly took me into a room and threatened bodily harm if I didn’t put their servers in a group to exempt them from any deployments.
Awesome post – brought back some memories… terrible, horrible memories.
Even when you believe you have all the issues covered, there’s still getting approvals.
I get questions like:
How can I be sure that the application will not later on take a code path that gets affected by this update even if so far it never has in any non prod environment for the last 6 months? I am not comfortable okaying this.
All seven non-prod environments are updated, but approvals are very hard to get for production.
And then a security update for SQL Server gets applied by Forefront with no warning. Patching is a mess.
I learned long ago to be aggressive about applying SPs and needed CUs. That was before the recent debacle with 2014 SP1… but I do wait a bit before applying them, to see whether they’re stable. After that, though, I make sure everything is up to date. Nothing worse than putting off the inevitable and getting caught with a bug or by an auditor!
It takes a lot of communication and trust with the developers/business partners, but once you get everyone in the groove it goes better. It’s still a pain, especially clusters, but I just hunker down and get ‘er done.
That said, my company has Premier Support from Microsoft. So that makes a big difference, at least mentally.
I had a sysadmin who was setting up WSUS decide that SQL Server patches could be applied with the other Windows patches. (Yeah!) This of course happened during a scheduled maintenance cycle, so I didn’t notice right away; luckily I caught it before he was able to apply the second round of monthly patches, and there was no negative impact. But I’m still seeing those one-off builds that haven’t had any new updates applied by my team. I still cringe when I see them and think about it. It just goes to show how little is actually understood about SQL Server among other IT disciplines.
This is a two-way street. Most disciplines are ignorant of each other’s challenges and responsibilities. You might be annoyed with the operations team (what @Cody was talking about), but then you are probably not responsible for the rest of the infrastructure and all of the security nightmares that DBAs and developers create for the Ops team. It is easy to villainize the Ops team, though if rebooting a SQL Server causes an outage, you just got valuable intelligence about the weaknesses in your configuration.
To illustrate I recently attended some training for SQL Server 2014, and the presenter/teacher told us that the first thing to disable in *production* (I had to get him to verify this) was the firewall followed by UAC. To put this into a DBA perspective (my background is in operations), imagine developers working day to day with the sa account on your production SQL servers. Scary stuff.
Going forward, the various disciplines will need to work together (the culture and processes that are behind the so called DevOps movement). A good starting point would be the books “The Phoenix Project” and “Team of Teams”.
Having said all that, like the original post said, taken as a whole this stuff is hard for primitive monkeys to get right.
Somewhat entertaining. I never really care about the firewall and in fact sometimes enable it, and I never bothered disabling UAC. I have also worn the hats of domain admin and VMware admin, along with BladeSystem administrator/builder (HP). Once siloed (as things are now), it just tends to be more difficult to control things nicely.
Lots of SQL 2014 running with no issues. However no AG yet! 🙂
This brings me back to a patch I installed years ago at a previous company. They did not want a backup server in place for the call center’s SQL Server running Windows Server 2000, and a Windows Server patch made the hard drive unsupported. When we booted back up: blue screen of death.
The Windows updates are a much bigger problem for me. I can sort through the KBs, hotfixes, CUs, blog posts, etc. for SQL, but I don’t have the time, experience, or tech savvy to research and evaluate 40 “Important” and 25 “Optional” Windows updates every second Tuesday.
We have been burned more than once by these updates but not applying them is no answer either.
A recent (May 2015) clean install of Windows 2008R2 SE (with SQL2008R2 SP2 SE) resulted in well over 300 Windows Updates (4 cycles of apply, reboot, download/apply, reboot, …).
BTW, I have heard there is a Windows 2008 R2 patch out there that will cause SQL Servers to reboot when starting SQL Server. It has not happened to any of my servers, so I can’t comment.
Well, perhaps I can chime in from a vendor’s perspective.
SQL Server and Windows updates are an enormous issue for us. Consider that we have a large number of large customers, each of which has its servers patched to varying levels based on the organization’s policies – or, where no policy exists, the administrator’s personal past history.
What patch level are we supposed to aim for? We could never attempt to keep up with all the combinations across all our customers and regression test against them, yet the customers expect everything to work perfectly against whatever versions they maintain.
What we do is support SQL Server 2008 R2 on Windows Server 2008 R2, SQL Server 2012 on Windows Server 2012 R2, and SQL Server 2014 with Availability Groups on Windows Server 2012 R2, and we keep all of them up to date with Microsoft patches, service packs, and other updates. We test our releases against those platforms and work around any issues that come up so that our software works with the latest Microsoft releases.
I wish we had the resources to maintain the thousands of test environments we would need to replicate the combinations of environments that result from the “everybody decides what patches they want to apply” model, but that is not possible.
So, realize your software vendor needs to make a choice – it will most likely be based on something like how we do our testing.
Side note: I am amused when people tell me of problems they had 10-15 years ago with an update and because of that, they are hesitant to apply even security patches. It’s probably a good idea to have a “staging environment” to test these things if you need to worry about things like that – don’t let those fears stop you from applying security patches. Consider the cost of the advice I just gave – ignore it if you thought it was worthless.
Pretty much what I hope for from vendors is, at the least, to say they may not have certified that their product has no issues with the latest SQL release, but that they will support it and work with the customer to resolve problems – not the “we will not support you” that is sometimes heard. And at least test basic functionality of the product against the RTM version.
I understand and I don’t fault vendors for not approving every CU. What really gets to me is the vendors that we need to convince that 2015 isn’t a good year to install SQL 2008 RTM. Testing service packs before the old one is out of support should be standard.
This strikes me as a place where IT Service Management / ITIL can be really useful. Defining IT services at the business level, including their importance and maintenance windows (or maintenance blackouts), trickles down to the supporting services and databases. The next step would be to define the logical and system-based dependencies in some sort of management DB. If the business puts in the work up front (or if IT is smart enough to lead them to do it), then by doing those few things you have an exact idea of which servers can get patched when, and what level of care needs to be taken in doing so.
In these sorts of posts and questions, I see a lack of IT service management as the missing glue to bring it all together. It’s a nebulous concept, but by creating some rules and checklists around IT services (e.g. if we call it a service, we’ll keep track of users and # users, we’ll have processes for on-boarding/off-boarding, we’ll define a maintenance window) — a lot of these problems and questions don’t ever have to occur. Easier said than done, but it boggles my mind how it hasn’t caught on yet.
ITIL is a tool not a solution. I have seen patching grind to a halt in the bureaucracy that often springs up around an ITIL implementation. You do not need ITIL, you need effective and efficient processes and a culture that encourages this. ITIL can help or hinder this, just as a hammer can be used to help or hinder your home renovations. Checklists and rules will solve nothing on their own, though they may (likely) be part of the final solution.
You really need to foster a culture of collaboration, teamwork, and excellence before any process change has the possibility of having an impact. Toxic relationships between disciplines (like what Cody describes above) will need to be fixed first.
I agree – I don’t think my comment meant to say otherwise. ITIL and IT Service Management are tools, and they certainly can be misused, but what I mean is that I don’t see them on the radar at a lot of companies that could use them, even where there is respect for process. Many companies that need a better mechanism for process management get lost in the weeds because there isn’t an internal framework to work from. I meant that the concepts of ITIL / IT Service Management can be that framework to help businesses begin to get their house in order, and surface those tensions between disciplines that you mention above. I’m not saying it’s the solution, merely that it could be a tool worth considering on the path to such a solution, which only comes with time and cultural willingness to change.
I have long complained that SQL patches should not be included in Windows Update, at least not as mandatory. Upgrades do go wrong – I’m just now dealing with upgrading SQL 2008 R2 SP2 to SP3 build 6220 for MS15-058, and it corrupts Reporting Services: http://blogs.msdn.com/b/sqlreleaseservices/archive/2014/10/16/report-builder-of-sql-server-2008-r2-service-pack-3-does-not-launch.aspx
Also, what about services running on another server hammering SQL while you upgrade it? Those need to be controlled first before you can upgrade.
Ah, that’s the huge problem we see as vendors. Someone had a problem with an update in the past, so new patches and updates will not get applied. Multiply this by 600 different experiences from 600 different customers. Then, the same user won’t upgrade our software because our release was tested with up to date SQL Server releases because we can’t maintain test environments with every combination of patch levels.
Now, they insist that we fix the issue that was fixed in our new release without using our new release, because they refuse to install it. All because they had an issue with someone else’s patch once in the past. Does this sound absurd? We get hit with this about once per month.
Thanks for the post. I realize this post is somewhat old, but I have a question. For SP and CU updates, what do these actually change? Just the SQL Server binary files? Do these updates make any changes to the user databases or data files?
Brian – for random unrelated questions, your best bet is to head over to http://dba.stackexchange.com.
Thanks I will, sorry about the random unrelated!
Anyone know the patching lifecycle of SQL service packs?
Does the previous SP (n) go end-of-support two years after a new SP (n+1) arrives and becomes the mainstream-supported one?
So for an unpatched SQL Server, mainstream support actually ends two years after another new SP arrives.
Please shed some light on this.
Irene – it depends on your version of SQL Server. For 2017, for example, there are no more service packs. Your best bet is to check the “other builds” links for your version at https://SQLServerUpdates.com.
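Before comparing against a builds list like that, it helps to pin down exactly what you’re running. One hedged sketch, using the documented SERVERPROPERTY values (ProductUpdateLevel only reports the CU level on relatively recent builds and returns NULL on older ones):

```sql
-- Identify the current build, service pack level, and (on newer builds) CU level.
SELECT SERVERPROPERTY('ProductVersion')     AS build_number, -- e.g. 13.0.5026.0
       SERVERPROPERTY('ProductLevel')       AS sp_level,     -- e.g. RTM, SP2
       SERVERPROPERTY('ProductUpdateLevel') AS cu_level,     -- NULL on older builds
       SERVERPROPERTY('Edition')            AS edition;
```

Match the build number against the update lists for your version and you know exactly how far behind you are.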
What about older versions of SQL, like SQL 2016? Does mainstream support for the old SP end two years after a newer SP arrives?
Irene – go ahead and read the link I gave you to read. The answer is there. Thanks!
Looks like the SP model is gone as of SQL 2017.
Just wondering if there is any MSDN document discussing this new patching model from SQL 2017 onward?
Are things as equally challenging if you use one of the various flavours of SQL Server provided by Azure?
It depends on what you mean by “provided by Azure” – like a VM versus Azure SQL DB. This post is about SQL Server, so I assume you mean SQL Server in an Azure VM – and sadly, it has the same challenges.
I meant the other cases – i.e. not just SQL Server in a VM