Should Index Changes Require Change Control?

We got a phenomenal series of questions from a client, and I wanted to encapsulate the answers into a blog post to help more folks out:

Should all index changes require testing in a staging environment, no matter how big or small? What would be a reasonable timeline duration from index identification to deployment? What should the process of index review entail? What levels/kinds of justification/analysis should be required in order to move forward with an index change?

I’m going to zoom out a little and ask it as, “Should index changes require going through change control?” To answer that, I need to think about…

How risky are index changes?

At first glance, you might think index changes are pretty harmless. However, let’s think about a series of scenarios where an index change could result in slowdowns or an outright outage.

Index changes can fill up your drives. Adding indexes may require more space in the data files for the additional copy of your data (even if only temporarily, when you’re replacing an existing index). Your log files are impacted too – index creation is a logged activity, which may cause transaction log growth. Heck, even TempDB can grow if you use the sort-in-TempDB option during index creation. If you’ve got database mirroring or Always On Availability Groups, you also have to look at the drive sizes on those remote replicas. If you’re using virtualization snapshots/replication, or storage snapshots/replication, you might even have to work with your sysadmins on storage space availability.
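As a rough sketch of where the space pressure shows up – the table and index names here are hypothetical, and SORT_IN_TEMPDB simply moves the sort work (and its growth) out of your data files and into TempDB:

```sql
-- Check free space in the data and log files before you start.
-- size and FILEPROPERTY return 8KB pages, so divide by 128 for MB.
SELECT name,
       size / 128 AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128 AS used_mb
FROM sys.database_files;

-- Building a new index needs room for another copy of that data.
-- SORT_IN_TEMPDB = ON shifts the sort workspace into TempDB --
-- which means TempDB is now the drive that can fill up instead.
CREATE INDEX IX_Posts_OwnerUserId
    ON dbo.Posts (OwnerUserId)
    WITH (SORT_IN_TEMPDB = ON);
```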

Index changes can cause blocking. Even when you use Enterprise Edition’s ONLINE = ON while creating an index, you’re still going to need locks, and you can still block other queries. Plus, if your index change backfires and you need to drop an index, there is no online drop or disable. To learn more, check out how online index operations work, and guidelines for online index operations.
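For example, even an online rebuild briefly needs a schema modification lock at the start and end of the operation. On SQL Server 2014 and newer, you can at least control how that lock request behaves with WAIT_AT_LOW_PRIORITY – a sketch, using a hypothetical index name:

```sql
-- ONLINE = ON still takes a short Sch-M lock to swap the index in.
-- WAIT_AT_LOW_PRIORITY says: wait up to 1 minute for that lock, and
-- if we're still blocked, abort our own rebuild (ABORT_AFTER_WAIT = SELF)
-- rather than sitting at the head of a blocking chain.
ALTER INDEX IX_Posts_OwnerUserId ON dbo.Posts
REBUILD WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY
    (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF)));
```

The other ABORT_AFTER_WAIT options are NONE (wait normally) and BLOCKERS (kill the sessions blocking you – which is its own kind of outage).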

Index changes can slow down unrelated operations. As you’re creating large indexes, you’re going to need CPU power, workspace memory, and storage throughput. The more of these that you consume, the more you can impact other running queries – even if they’re working on completely different tables or databases.

Indexes can cause inserts to fail. If you create filtered indexes or indexed views, for example, and your application uses non-standard connection settings (SET options like QUOTED_IDENTIFIER or ANSI_NULLS turned off), your inserts can fail.
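A minimal sketch of the failure mode, with a hypothetical table and column – once the filtered index exists, every session that modifies the table has to have the required SET options:

```sql
-- The filtered index itself creates fine:
CREATE INDEX IX_Users_Location
    ON dbo.Users (Location)
    WHERE Location IS NOT NULL;

-- But a session with non-default settings now breaks on writes:
SET QUOTED_IDENTIFIER OFF;
INSERT INTO dbo.Users (Location) VALUES ('Iceland');
-- Fails with error 1934: the statement failed because the
-- QUOTED_IDENTIFIER SET option has an incorrect setting.
```

The nasty part is the timing: the index deploys cleanly, and the failures only show up later, when an app connection or agent job with the old settings tries to write.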

Really fancy indexes can have really fancy side effects. If you start playing around with columnstore indexes or in-memory OLTP, you might need a spectacular amount of memory to apply and maintain your index changes.

How sensitive is your workload to performance risk?

Some companies stretch their dollars as much as possible, running at 50, 60, even 80% CPU use at any given time. Users are wildly unhappy with performance as it is, and even the slightest hiccup causes screams of pain.

Some companies go in loaded for bear: they build monster servers to make sure they can handle surprise peak workloads. I had one DBA say in a class, “I size all my servers so that I can run backups and checkdb in the middle of the day without anyone noticing.” (Love.)

Any change is risk.

Whether we’re talking adding a nonclustered index or truncating a table, database administration is about understanding how much risk is involved with a change, and your organization’s level of tolerance to that risk.

To reduce the risks involved with any change, test it.

The closer your tests can be to production, the more comfortable you can be with those risks. In a perfect world, that means using a test server and test workload that’s absolutely identical to production. In the real world, the closeness of your tests (and even whether you test at all) is up to your tolerance for risk and your budget.

If you want to learn more about using source control and automated deployments for database changes to reduce your risks, check out Alex Yates’s upcoming classes on Database DevOps with SSDT and with the Redgate tools.


8 Comments

  • Thank you for this great reminder! Not too long ago, I learned this the hard way… blocking about 1,000 inserts while adding an index. That meant I got to make those happen manually. It was a painful, humbling lesson.
    Now I don’t complain so much when I have to wake up at 3 A.M. to do index maintenance.

    Reply
  • Another category of index change risk – exposing non-deterministic defects. Example: queries that are missing an explicit ORDER BY, that have consistently produced the correct result by luck/accident because the resultset was always produced in clustered index order. Add a non-clustered index that changes execution plan, or modify the clustered index columns … particularly fun when the query uses FOR XML and the resulting error is an obscure XML parsing error on the client side.

    Reply
  • A new index can result in new deadlocks – that can make the DBA unpopular for a while.

    Reply
  • As Tom Kyte says in the Oracle world, there’s no such thing as a small change…

    Please do not make any changes to a production system – a live system – without first testing for any side effects. For example, please do not read a blog post or a book chapter, and then check your system and find you are using manual memory management – and then just turn on automatic memory management. Query plans may change and performance may be impacted. One of three things could happen:

    -Things run exactly the same
    -Things run better than they did before
    -Things run much worse than they did before

    Exercise caution before making changes; test the proposed change first!

    One more example to remind you that there is no such thing as a “small” change.

    LED lights are an excellent lighting solution due to their longevity and power efficiency. Replace all traffic lights with LED lights, “small” change right? Wrong. It turns out that they may not be the best choice in all conditions. Normally, the excess heat generated by incandescent bulbs is enough to melt the snow off lights so that they remain visible even in freezing conditions. Traffic lights that employ LED lighting, while far more power efficient and reliable than older ones, aren’t able to melt the snow that accumulates. Snow blocking traffic signals is a significant problem as it has already led to dozens of accidents and at least one fatality.

    Reply
    • In NYC, there are a few LED traffic lights that are virtually impossible to make out on a bright summer day.

      Reply
  • A few weeks ago, I found a way to make a hard coded query coming from our application run exceedingly fast by using a filtered index. Adding WHERE SOMEFIELDNAME IS NOT NULL to the index made it lightning quick because most of our clients never have any values in that field. I tested it on a few databases and rolled it out to a client.

    One of their nightly scheduled tasks started failing because it invoked a stored procedure compiled with QUOTED_IDENTIFIER turned OFF. And it turns out that filtered indexes cause all triggers and procedures with QUOTED_IDENTIFIER turned OFF to barf.

    So, yeah, new indexes can cause unexpected problems and need to be tested thoroughly.

    Reply
  • Michael J Swart
    September 17, 2018 9:45 am

    In our shop hundreds of databases with the same schema get updated monthly. We really depend on change control so that our source of truth for table definitions is source control (not prod or any other environment).

    We have a well-practiced deployment pipeline for Database changes. So if a production DBA wanted to recommend a new index, it gets added into source control, tested and shipped everywhere mostly automatically.

    Of course there are some emergency situations where an index might have to be added quickly. But for us this is really rare and involves a lot more red tape and process, so adding an index to source control first is not only safer, it’s easier.

    It’s a really good topic. I would advocate using source control to anyone. Even if an organization has a high tolerance to risk today, that can always change. Deferring the adoption of source control can be tricky because it’s hard to detect a change in risk-tolerance. There’s no clear signal that says, “now’s the time to adopt source control.”

    Reply
