When the business decides it’s time to give up fixing the old server because it’s not fast enough or reliable enough, stop. The worst thing you can do is jump directly to the hardware vendor’s site, check a few boxes, and order a new shiny server that ends up with exactly the same problems as the last server.
1. Get the business’s RPO and RTO goals in writing. These two numbers determine whether you build a cluster, an AlwaysOn Availability Group, a log shipping backup, or a pair of tin cans connected with string. Learn what RPO and RTO are here.
2. Find the right SQL Server feature for your RPO/RTO goals. For example, if you’re not allowed to lose any data, and you need to fail over in less than 1 minute, then your options include failover clustered instances, AlwaysOn AGs with synchronous mirroring, or synchronous database mirroring. Use Page 2 of the HA/DR worksheet linked in the last post to figure that part out, or watch the HA/DR architecture modules of the Senior DBA Class.
3. Review the current server’s wait stats. To help figure out what kind of hardware you need for tomorrow, look at the pains you’re experiencing today. For example, if you’re suffering from PAGEIOLATCH% waits because you can’t cache enough data in RAM, the next server may need significantly more memory (or faster storage, or time spent tuning indexes & queries.) If you’re suffering from LCK% waits because the apps are holding locks on their side, though, you may not see relief no matter what kind of hardware you throw at it.
4. Benchmark your current hardware’s speed. I’ve written about how to check performance on a new SQL Server, but this is also really useful when you’re transitioning away from an old server. For example, take your backup & CHECKDB job runtimes, and the new server should be able to perform at least that quickly (if not more so, depending on what your bottleneck was in the above step, and which kind of bottleneck you’re trying to fix with the new server.)
5. Get the server’s proposed expiration date. Servers are like milk cartons: both need an expiration date printed clearly on the outside. Ask the business users, “How long does this server need to last?” They usually say, “Forever,” and start laughing, but once the laughing stops, here’s how to handle the discussion.
6. Ask the developers, “What features are we adding during that time?” I don’t expect them to have specific line-by-line answers about which features they’re adding over the next 3 years, but I do expect them to have rough architecture ideas. Are we going to add some kind of big data analysis projects? Machine learning in the database? R, Python, or Java running inside it? Storing new kinds of data, like tracking every web site click?
7. Ask the business, “How much will load grow during that time?” Are we expecting to acquire our next several larger competitors, or maybe conquer sales in a new country? Or are we just expecting to tread water and only deal with incremental growth?
For all of the above questions, the more specific the answers are, the more precise your server sizing can be.
In reality, they’re very rarely specific.
And that’s fine!
But it just means that your server design needs to be more flexible: we need to design something that we expect will change rapidly over time as we learn more about the answers above. That’s one of the reasons why I really love building new environments as Availability Groups up in the cloud: it’s so much easier to add in new, more powerful replicas, join them to the cluster, and then get rid of the old ones. Your server design can leverage the flexibility of the cloud to handle new features or workloads, and you can throw cloud at performance issues.
Nice new hardware. A neighbor has one in yellow. It really stands out.
Thanks! So many good colors for these.
It’s all good stuff Brent, but really, isn’t it, “What does it cost to duplicate current server with newer hardware, and now how much more money can we find?” Then you answer the other questions to fit the new system. Figuring new systems will have 2x the RAM, much faster, larger SSDs, and OMG networking.
Baseline is very important but hopefully you already have your weekly performance reports that provide a baseline, I mean, backups, say what?
JR – I’ve got good news: someday, you’re going to get a better job where they do a better job of planning. Keep your hopes up, and keep looking. Hang in there! It gets better.
Brent, really, even my previous comment is off. For 90% of SQL Servers these days *nobody* buys a server, it’s all allocated out of the private cloud, or containerized, or something.
I was actually approached by a big company, they were advertising for senior SQL Server types, and the job – the whole team – was new server configuration! I thought they were nuts. And that was OMG probably 15 years ago.
Recently I’ve been using Azure SQL, where “planning your server” is even more click-click than ordering from Dell. Makes me crazy sometimes that I *can’t* be a little more specific about configuration, but hey, diagnosing capacity issues in such shared environments is also crazy-making sometimes.
I remember the olden days when a dba building a server included actually touching the hardware, these days we seldom ever are in the room with it or even in the same city with it. Thanks for the walk down memory lane, so to speak, anyway.