Index Maintenance Madness

When we look at a new server, we check out their database maintenance jobs, making sure their server is in a safe place before we dig into their pain points. We verify that their backups and DBCC CHECKDB jobs can meet their RPO/RTO goals. We also check out their index maintenance jobs, if any.

When I see that they are rebuilding/reorganizing indexes daily, I ask, “What are you trying to solve with running index maintenance so frequently?”

They usually respond with one of these answers:

  • I’m not sure. Someone else set that up.
  • To fix fragmentation.
  • If we don’t do it, performance becomes horrible.

How do I respond to each of those?

Someone else set it up

Fair enough, but let’s make some adjustments.

  • Change it to be weekly or even less frequently.
  • Use Ola Hallengren's IndexOptimize, but don't use his defaults. I suggest 50% fragmentation for REORGANIZE, 80% or even 90% for REBUILD.
  • Set up a new job to run update stats via IndexOptimize daily. See example D on the IndexOptimize page (link above); a rough sketch of both jobs follows this list.
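
To make that concrete, here's a minimal sketch of both jobs using Ola Hallengren's IndexOptimize. The parameter names follow his documented examples, but the thresholds and the database list are assumptions to adjust for your own environment:

    -- Weekly job: only touch indexes that are heavily fragmented
    EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,                      -- under 50%: leave it alone
        @FragmentationMedium = 'INDEX_REORGANIZE',     -- 50-80%: reorganize
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',  -- 80%+: rebuild
        @FragmentationLevel1 = 50,
        @FragmentationLevel2 = 80;

    -- Daily job: update statistics only, no index maintenance (Ola's example D)
    EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,
        @FragmentationMedium = NULL,
        @FragmentationHigh = NULL,
        @UpdateStatistics = 'ALL',
        @OnlyModifiedStatistics = 'Y';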

To fix fragmentation

See above changes, but then we also briefly discuss why rebuilding/reorganizing indexes isn’t necessary all that often. If it’s a recent client, I’ll tell them to watch Brent’s video.

For performance reasons

Now this is the one that I love to discuss.

I’ve only had a handful of clients who insisted it was necessary to avoid huge performance problems. Without even looking at the rest of the server, I am confident the issue is Parameter Sniffing or out-of-date statistics. When you rebuild an index, you get updated statistics for that index and any execution plan that references that index gets wiped from the plan cache. But that’s a very expensive way to do those two tasks! If the issue is with out-of-date statistics, then update statistics daily. If the issue is with Parameter Sniffing, then you’ve got more work to do.

I’ve lost hundreds of hours to troubleshooting Parameter Sniffing issues over the span of my career: figuring out which queries are the culprits, writing custom code to detect the issue, and analyzing each query to determine whether it needs a code change, a covering index, or a Parameter Sniffing workaround.

There isn’t just one easy answer

Every query that is having issues needs to be investigated to determine what can be done to either work around the issue or fix it. Back in SQL Server 2005 and earlier, we had very limited workaround options:

  • Index hints
  • sp_recompile/WITH RECOMPILE
  • DBCC FREEPROCCACHE
  • Stored procedure branching
  • Using local variables inside stored procedures instead of the input parameters (see the sketch after this list)
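
As a rough sketch of two of those workarounds (local variables and recompilation), here's a hypothetical procedure; the object and parameter names are made up purely for illustration:

    -- Hypothetical procedure and table, shown only to illustrate the workarounds
    CREATE PROCEDURE dbo.GetOrdersByCustomer
        @CustomerID int
    -- WITH RECOMPILE   -- alternative: compile a fresh plan on every execution
    AS
    BEGIN
        -- Local-variable workaround: the optimizer can't sniff @LocalCustomerID,
        -- so it builds the plan from average density statistics instead of the
        -- first caller's value.
        DECLARE @LocalCustomerID int;
        SET @LocalCustomerID = @CustomerID;

        SELECT OrderID, OrderDate, TotalDue
        FROM dbo.Orders
        WHERE CustomerID = @LocalCustomerID;
    END;
    GO

    -- Or flag an existing procedure so it recompiles on its next execution:
    EXEC sp_recompile N'dbo.GetOrdersByCustomer';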

With newer versions, we can get fancier:

  • OPTION (OPTIMIZE FOR (@param1 = 12345)) at the query level
  • If we can’t modify the queries, we can use a plan guide (2012+)
  • With SQL Server 2016+, we’ve got the Query Store (quick sketches follow this list)
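
Rough sketches of the first and last options, using a hypothetical procedure and query/plan IDs that would come from your own Query Store views:

    -- Query-level hint: build the plan for a known "typical" parameter value
    -- instead of whatever value happens to get sniffed first.
    CREATE PROCEDURE dbo.GetOrdersByCustomer_Typical
        @CustomerID int
    AS
        SELECT OrderID, OrderDate, TotalDue
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID
        OPTION (OPTIMIZE FOR (@CustomerID = 12345));
    GO

    -- Query Store (2016+): pin a known-good plan to a query.
    -- Look up the IDs in sys.query_store_query and sys.query_store_plan first.
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;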

This blog post isn’t going into detail about Parameter Sniffing

We’ve got tons of blog posts and even a dedicated page on that topic already.

This is just to get you thinking that maybe there’s an easier way to do things than extremely resource-intensive index rebuilds.

Comments

  • We generally have an index rebuild job weekly.
    However, if you have a 12-14 hour maintenance window every day, is there really a real-world downside to rebuilding every night?
    Yes, it’s overkill, but is it doing any harm?

    • If I had that large of a window every night, I’d probably do all the things. The servers I’ve worked on in the past and the clients I’ve worked with don’t have those kinds of idle windows. Plus lots of these clients are experiencing heavy blocking, and the index maintenance is a big culprit. But one downside: if you have SSDs, you’ll wear them out faster.

  • In our 2016 SP1 data warehouse we rebuild daily using Ola’s scripts and Brent’s suggested defaults so we only rebuild the indexes that need it. Plus we use Ola’s scripts to do a ModifiedOnly statistics update after our overnight data load.

    • I like what you’ve set up except for the schedule. I’d do it less frequently unless you have a large window of very low activity every night. Most systems don’t have that kind of window.

      • We’re lucky that we have a 6-hour maintenance window each night, so we limit Ola’s script to only run for 5 hours (allowing a buffer). We also do our diff backups in this time.

  • You forgot a reason:

    Microsoft support won’t move your System Center case forward unless you’re doing daily defrags, even though that has nothing to do with the issue.

  • Yeah, generally one of the first things I do on a new client’s systems is index LESS often.

    And one client, their biggest issue was never the indices, it was bad parameter sniffing. Of course they wouldn’t upgrade from SQL 2005.

  • Andrej Kuklin
    December 6, 2017 6:13 am

    You can use plan guides starting from SQL Server 2005 (with some improvements in SQL Server 2008). They can be a lifesaver for processing queries generated by SSAS which you can’t modify or hint otherwise.

  • David Solomon
    December 6, 2017 8:52 am

    Nice concise write-up, but I’d also point out that certain indexes re-frag so fast it’s hardly worth ever defragging.

  • Good Article! 🙂
    “When I see that they are rebuilding/reorganizing indexes daily, I ask, “What are you trying to solve with running index maintenance so frequently?””
    It’s always good to ask the question and make recommendations but I like to take the stance that if it isn’t causing (or related to) an issue then potentially changing it may introduce issues. And always make sure any change like this is thoroughly tested and analysed in pre-production and non-client facing systems before hitting production.

    • Usually it is causing issues. But even if it’s not, we still dig into what to do instead. So with the clients I see doing this who specify “for performance reasons”, we go over Parameter Sniffing: what it is, how to find the problematic queries, how to test, and what the workarounds are. This blog post was written the week before I had a client that was doing index maintenance HOURLY. That’s a first for me. And they are using Standard Edition. YIKES.

  • Hi Tara,

    I have to ask, what is the basis of your recommendation to use a 50% Fill Factor when you REORGANIZE an index?

    • Hey Jeff,

      Tara’s moving and without internet! The line you’re talking about

      50% for REORGANIZE, 80% or even 90% for REBUILD

      is about the fragmentation level of the indexes before you reorg or rebuild, not the fill factor you should set them to.

      Thanks!

      • Whew!!! Thanks, Erik. I thought Tara had eaten a bad egg or something. 😉
        As for rebuilding at 80 or 90% fragmentation, I’ll say that the damage (slowdowns because of the wrong kind of page splits) has already been done by then. If you’re going to rebuild indexes (and I’m with you folks… I recommend that you don’t rebuild indexes in 98% of tables {there’s always an exception}), then the only reason to do a rebuild is to totally avoid page splits. There’s a way to do it but I’m not yet done with my “bong water” experiments and so I’m also not yet ready to make a recommendation.

        As a bit of a sidebar, the 17th of January, 2018, will be the two-year anniversary of the last time I rebuilt indexes, with one exception: a table that is heavily inserted into and then nearly as heavily deleted from.

        Hum a rock at Brent and tell him I said “Damn good bong water”! 😉 He’ll understand the meaning.

        • Between your beer popsicles and Brent’s bong water, we’re all in for a good time.

          Is the scenario you’re looking at similar to this one?

          Thanks Jeff!

          • Exactly. As that article states, the problem is that not all the rows in pages are deleted and so the pages persist in the CI at roughly 50% physical fragmentation (page density) and the NCI’s (some of which are quite wide thanks to the unconcerned use of DTA and the like) persist with as little as 5% page density.

            For those reading this, it’s not just DELETEs that kill you on this. As the article states, changing the content of a Status column, even as a second key column, can cause some pretty severe data movement that leaves pages almost empty especially if the first key column has a relatively low cardinality, such as a ClientID column.

            Considering that even a GUID used as the first column of a CI settles out with a “Natural Fill Factor” of 69.5% (almost a perfect 70% for such an animal), it’s pretty easy to find tables that suffer “Perpetual Physical Fragmentation” by checking the “avg_page_space_used_in_percent” column of sys.dm_db_index_physical_stats for physical fragmentation rather than the usually misleading “avg_fragmentation_in_percent”, which depicts logical fragmentation.

            I guess that’s what causes people difficulty in buying in on the “no defrag” recommendation. They lump both types of fragmentation (misnamed as “logical” and “physical” fragmentation) under a single label. When they have a couple of tables that are of the nature we’re speaking about, they think that justifies defragging everything instead of realizing that they’re actually measuring the wrong type of fragmentation to determine when a table or its NCIs actually need to be defragged.

            Another part of the problem is that a lot of people will only believe what they can see and I can’t blame them there. If you can’t reliably demonstrate and measure the effects and then turn your test jig over to someone else and then they can repeat the same experiment with the same results, then you’ve invented “cold fusion”. I’m in the process of building such test jigs that simulate a year of doing (at multiple levels) and not doing defragmentation with measurements for every hour. As you can well imagine, the tests take a bit of time to run so I’m also writing the code to automate them so the next poor slob that runs them doesn’t have to have a full supply of beer popsicles and bong water at the ready to suffer through extended runs. 😉

            The other thing that I’m discovering is that when people DO index maintenance as some form of defragmentation, almost all of them are waiting until massive page splits begin (the avoidance of which is one of the primary reasons that people do defrags) instead of catching things early enough to avoid page splits at all simply because they’re using the wrong methods to detect fragmentation.

            To be sure, Brent’s original “bong water” suggestion and Kendra’s follow ups on statistics has solved a lot of problems for me and the systems I manage. You good folks being joined at the hip on the subject certainly helps with changing the public perception on the subject. Thanks for what all of you do.
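
For anyone who wants to check page density the way it's described above, here's a rough sketch of the query; the thresholds are illustrative, not recommendations:

    -- Page density ("physical fragmentation") rather than logical fragmentation.
    -- avg_page_space_used_in_percent is NULL in LIMITED mode, so use SAMPLED or DETAILED.
    SELECT  OBJECT_NAME(ips.object_id)            AS table_name,
            i.name                                AS index_name,
            ips.page_count,
            ips.avg_page_space_used_in_percent,
            ips.avg_fragmentation_in_percent
    FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
    JOIN    sys.indexes AS i
            ON  i.object_id = ips.object_id
            AND i.index_id  = ips.index_id
    WHERE   ips.page_count > 1000                        -- ignore tiny indexes
      AND   ips.avg_page_space_used_in_percent < 50      -- pages less than half full
    ORDER BY ips.avg_page_space_used_in_percent;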

  • Ola’s index maintenance scripts also have a TimeLimit which is very handy when you have a limited window. Will everything get defragged? Maybe not that night. Should happen another night.

    Also, I am not as worried about performance per se as memory pressure. Empty pages go into memory, and unfortunately my servers do not have terabytes of RAM to just load the entire db into memory. 🙂

    • Agreed… nearly empty pages in memory IS a good reason to defrag. Oddly enough, fixing those things will also help performance for multiple reasons.

  • I had one scenario not discussed here which was fixed (accidentally) by doing index rebuilds. One side effect of an index rebuild is it updates statistics at 100% sampling. We found a pattern of a query taking a bad plan after the normal stats update occurred. Their normal stats update would sample at 20%. As a result the query had less accurate stats. The index rebuild was an accidental fix for something that was really stats related. We eventually fixed this with a plan guide though.

    • Why didn’t you just rebuild the stats more often? “A” plan guide only fixes one thing and not permanently.
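
A quick, hedged sketch of that stats-only alternative (the table and statistic names are placeholders; PERSIST_SAMPLE_PERCENT needs SQL Server 2016 SP1 CU4 or later):

    -- Update the statistic with a 100% sample, without paying for an index rebuild
    UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID WITH FULLSCAN;

    -- Optionally pin the sample rate so later automatic updates keep it
    UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;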

  • Hi Brent-

    I might be slightly drifting away from this topic, but my question is still related to index maintenance.
    I have a client who has a few databases on a server that are close to 2 TB each. They have never run DBCC CHECKDB (scary, I know) and have never run any index maintenance or stats updates in their environment.
    Most of these databases have auto update stats turned on, which does a fairly decent job for their requirements.

    These servers are very critical, and they can never have any downtime whatsoever for any sort of maintenance. They only have an hour allocated once in a while for any maintenance work, so whatever has to happen has to happen in that hour. They do see some performance bottlenecks here and there and complain about slowness at times. Often when I look at certain queries, they have a lot of outdated stats too…

    My questions are below:

    1. How do I effectively manage indexes for such large databases?
    2. If I ever have to rebuild indexes and update the stats, what sampling rate or fragmentation percentage should I choose?
    3. Not running DBCC CHECKDB scares me. What is the best advice for running integrity checks on these databases?

    I have tried to run some in the past, and it starts blocking processes. They are running SQL Server 2017 Enterprise Edition.

    Thank you very much for your feedback in advance.

    Much appreciated.

  • Fair enough, Brent… I just love your way of teaching and do follow you. Will look at attending your class sometime. And thanks for getting back 🙂

  • Robert G Godbold
    September 25, 2019 2:43 pm

    After reading all of this stuff on index maintenance, I’ve finally settled on using dm_db_index_physical_stats to implement 2 conditions in my automated maintenance task: “REBUILD if external-frag > 85% and space-used < {magic-number} MB”.

    So if the index is “too big” (I think it’ll take too long, for my environment), then I won’t rebuild during the task; I’ll have the task email me so I can deal with it manually. And if the index isn’t due for a rebuild, then I only reorg when I think it’s wasting too much space due to “internal/logical fragmentation”.

    I figure that if I’ve got a lot of db-catalogs, each with various indexes with avg 250-750MB of empty space, I could easily be wasting 10’s of GB of RAM and disk space; it’s all about the aggregates. But if I have some indexes with 5% density that only take up ~10MB (~1250 page_count), I almost certainly don’t care.

  • I know this is an old post, but while investigating an extract with abnormal runtimes, I discovered something odd with statistics and IndexOptimize. I found that the IndexOptimize stats update does not update change tracking tables’ statistics by default, unlike sp_updatestats. I was thinking there’s no way they missed this; there has to be some magic parameter I can add so it does the statistics for these tables as well? Looking through, I didn’t see anything, so I am posing this question. Do I still need to create another routine to handle the Change Tracking table statistics to avoid performance-related problems in this area, or is there a parameter to direct IndexOptimize to update these statistics as well?

  • Mattia Nocerino
    October 6, 2021 7:46 am

    I always thought that AUTO UPDATE STATISTICS ASYNC was enough for statistics maintenance. Then I read this article where you suggest setting up a daily job to do that, but I can’t figure out why.

    Is this considered a best practice? Something that you would setup on every SQL Server or only for specific scenarios?

    Thanks

    • I wasn’t saying to update stats daily as a best practice. I was saying that if you are doing index maintenance daily, it would be less intrusive to update stats daily instead. What your system needs could be different from what another system needs. I often start off with daily update stats, but sometimes weekly. Just depends.

  • Just a real quick update. I’ve pretty much proven that if you’re using REORGANIZE at all, you’re probably CAUSING your indexes to perpetually fragment. If you can’t do a REBUILD, it’s better to do nothing than it is to use REORGANIZE. There are, of course, exceptions, but they’re much more rare than you might think, and they’re going to cost you dearly in CPU and in the number and size of log file entries.

    Also, if you’re rebuilding indexes that fragment using the default Fill Factor, you’re causing just as much damage as if you used REORGANIZE.

    And it turns out that the use of REORGANIZE is what actually makes Random GUIDs have page splits all day every day. If you do index maintenance correctly (which does NOT include the use of REORGANIZE), you can do like I did and insert 100,000 rows per day into a Random GUID Clustered Index with less than 1% fragmentation and no index maintenance for 58 days. I’ll do the math for you there… 100K rows per day for 58 days is 5.8 MILLION rows inserted with less than 1% fragmentation. GUID fragmentation is a myth perpetuated by “Best Practices” that are actually the very WORST practices, a whole bunch of misinformation, and some seriously inadequate tests.

    I also went about 4 years without doing any index maintenance, which proves Brent’s points on the subject. You DO need to update stats and you DO need to do DBCC CHECKDB but, really, the only time you REALLY need to do index maintenance is to recover space in memory (and on disk, of course) and you really need to pay attention to how you do it or you’ll get the “morning after” syndrome of massive blocking due to page splits.

    Brent… thanks for getting me interested in index maintenance way back when.

  • Hi Brent

    Just a quick cautionary tale related to use of Indexes, and where a rebuild doesn’t help.

    When I checked the transaction in question, I could see no problem, and certainly nothing like the 20-digit value it was reporting as the error.
    CHECKDB showed no problem in the data.
    I extracted the code that did the check and compared the execution plans.
    The start-up check was using a covering index (designed for use in reporting); my query was using the clustered index.
    This covering index had been corrupted in such a way that the values it was holding for the transaction were nothing like the table values and yet it was not showing as corrupt.

    I scripted a drop and create for the index which fixed the issue.
    Out of curiosity and ignorance, I restored the backup and used a rebuild and then a reorganize of the index, neither of which worked (I wasn’t expecting reorganize to, but naively I was expecting the rebuild to refresh it).

    It does shake my confidence in reports that use covering indexes, though to be fair this is the first time I have knowingly come across this in 12+ years of the application being in use on around 2,000 sites around the UK.
    Is there something basic that I, as the one-eyed DBA in the kingdom of the blind, am missing?

