We got a phenomenal series of questions from a client, and I wanted to encapsulate the answers into a blog post to help more folks out:
Should all index changes require testing in a staging environment, no matter how big or small?
What would be a reasonable timeline from index identification to deployment?
What should the process of index review entail?
What levels/kinds of justification/analysis should be required in order to move forward with an index change?
I’m going to zoom out a little and ask it as, “Should index changes require going through change control?” To answer that, I need to think about…
How risky are index changes?
At first glance, you might think index changes are pretty harmless. However, let’s think about a series of scenarios where an index change could result in slowdowns or an outright outage.
Index changes can fill up your drives. Adding indexes may require more space in the data files for the additional copy of your data (even if only temporarily, when you’re replacing an existing index). Your log files are impacted too – index creation is a logged activity, which may cause transaction log growth. Heck, even TempDB can grow if you use the sort-in-TempDB option during index creation. If you’ve got database mirroring or Always On Availability Groups, you also have to look at the drive sizes on those remote replicas. If you’re using virtualization snapshots/replication, or storage snapshots/replication, you might even have to work with your sysadmins on the storage space availability.
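Before a big index build, it’s worth a quick free-space sanity check. Here’s a rough sketch (the table and index names are hypothetical, and the DMV query only covers the current database’s files):

```sql
-- Rough free-space check on the current database's files
-- before kicking off a large index build.
-- size and SpaceUsed are in 8KB pages, so divide by 128 for MB.
SELECT name,
       size / 128 AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128 AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files;

-- Sorting in TempDB moves the sort workspace out of your data files --
-- but now TempDB needs the room instead. (Hypothetical index/table.)
CREATE INDEX IX_Posts_OwnerUserId
    ON dbo.Posts (OwnerUserId)
    WITH (SORT_IN_TEMPDB = ON);
```

SORT_IN_TEMPDB doesn’t make the space requirement go away – it just shifts which drive has to absorb it.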
Index changes can cause blocking. Even when you use Enterprise Edition’s ONLINE = ON while creating an index, you’re still going to need locks, and you can still block other queries. Plus, if your index change backfires and you need to drop an index, there is no online drop or disable. To learn more, check out how online index operations work, and guidelines for online index operations.
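To make that concrete, here’s a sketch (hypothetical table and index names) – even the online build needs brief schema locks at the start and end, and the way back is offline:

```sql
-- Enterprise Edition's online build still takes short schema locks
-- at the beginning and end of the operation.
CREATE INDEX IX_Posts_OwnerUserId
    ON dbo.Posts (OwnerUserId)
    WITH (ONLINE = ON);

-- But there's no online undo: disabling (or dropping) the index
-- takes a schema modification lock, blocking queries until it finishes.
ALTER INDEX IX_Posts_OwnerUserId ON dbo.Posts DISABLE;
```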
Index changes can slow down unrelated operations. As you’re creating large indexes, you’re going to need CPU power, workspace memory, and storage throughput. The more of these that you consume, the more you can impact other running queries – even if they’re working on completely different tables or databases.
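One small lever to soften that impact: cap how many cores the index build can use, so it can’t starve everything else. A sketch, with hypothetical names:

```sql
-- Capping parallelism on the build trades a longer index creation
-- for less CPU pressure on concurrent queries.
CREATE INDEX IX_Posts_LastActivityDate
    ON dbo.Posts (LastActivityDate)
    WITH (ONLINE = ON, MAXDOP = 2);
```

The build will take longer, but the rest of the workload breathes easier while it runs.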
Indexes can cause inserts to fail. If you create filtered indexes or indexed views, for example, and your application uses non-standard connection settings, your inserts can fail.
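Here’s a hedged sketch of how that plays out with a filtered index (table, index, and column names are made up for illustration):

```sql
-- Filtered indexes require specific SET options on every connection
-- that modifies the table.
CREATE INDEX IX_Users_Active
    ON dbo.Users (DisplayName)
    WHERE IsActive = 1;

-- A session with non-standard settings can no longer insert:
SET QUOTED_IDENTIFIER OFF;
INSERT INTO dbo.Users (DisplayName, IsActive)
VALUES ('Brent', 1);
-- Fails with error 1934: INSERT failed because the SET option
-- QUOTED_IDENTIFIER has incorrect settings.
```

Applications using older drivers or explicit non-default SET options are the usual victims – and they only find out after the index ships.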
Really fancy indexes can have really fancy side effects. If you start playing around with columnstore indexes or in-memory OLTP, you might need a spectacular amount of memory to apply and maintain your index changes.
How sensitive is your workload to performance risk?
Some companies stretch their dollars as much as possible, running at 50, 60, even 80% CPU use at any given time. Users are wildly unhappy with performance as it is, and even the slightest hiccup causes screams of pain.
Some companies go in loaded for bear: they build monster servers to make sure they can handle surprise peak workloads. I had one DBA say in a class, “I size all my servers so that I can run backups and checkdb in the middle of the day without anyone noticing.” (Love.)
Any change is risk.
Whether we’re talking adding a nonclustered index or truncating a table, database administration is about understanding how much risk is involved with a change, and your organization’s level of tolerance to that risk.
To reduce the risks involved with any change, test it.
The closer your tests can be to production, the more comfortable you can be with those risks. In a perfect world, that means using a test server and test workload that’s absolutely identical to production. In the real world, the closeness of your tests (and even whether you test at all) is up to your tolerance for risk and your budget.
If you want to learn more about using source control and automated deployments for database changes to reduce your risks, check out Alex Yates’s upcoming classes on Database DevOps with SSDT and with the Redgate tools.