When Does Index Fragmentation Matter?

A while back, we posted about how SQL Server index fragmentation doesn’t matter. That post generated a lot of heated discussion and made many people aware that fragmentation may not be the source of their woes. Sometimes, though, there are data access patterns where fragmentation really does matter.

Perpetual Fragmentation

Some data access patterns lead to tables and indexes that will always be fragmented, no matter what you do. If you have frequent data changes in the middle of a table or index, you can see heavy fragmentation.

What’s this pattern look like? Check it out!

TL;DR – The code in this example inserts a large number of rows at the end of the table. 70% of the rows are marked as processed and are then deleted. This leaves a lot of free space scattered through the pages – or, as you probably call it, fragmentation.
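
The original demo script isn’t reproduced here, but a minimal sketch of the pattern looks something like this (the table name dbo.WorkQueue, the index names, the row count, and the 70% split are illustrative placeholders, not the original code):

-- Hypothetical work-queue table: inserts always land at the tail of the
-- clustered index, but the deletes punch holes all through the pages.
CREATE TABLE dbo.WorkQueue
(
    WorkQueueID INT IDENTITY(1,1) NOT NULL,
    Payload     VARCHAR(100) NOT NULL,
    Processed   BIT NOT NULL DEFAULT 0,
    CONSTRAINT PK_WorkQueue PRIMARY KEY CLUSTERED (WorkQueueID)
);

CREATE NONCLUSTERED INDEX IX_WorkQueue_Processed
    ON dbo.WorkQueue (Processed);

-- Insert a batch of random-length strings at the end of the table.
INSERT INTO dbo.WorkQueue (Payload)
SELECT TOP (10000)
       REPLICATE('x', ABS(CHECKSUM(NEWID())) % 100 + 1)
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;

-- Mark roughly 70% of the rows as processed...
UPDATE dbo.WorkQueue
SET    Processed = 1
WHERE  WorkQueueID % 10 < 7;

-- ...and then delete them, leaving free space scattered through both indexes.
DELETE FROM dbo.WorkQueue
WHERE  Processed = 1;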

The only way to get rid of this fragmentation is to rebuild both the non-clustered index and the clustered index. For this example, the defragmentation reduces the size of the index by more than 3x (from 25 pages to 7 pages) and the table consumes just over 50% less space.
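
Using the hypothetical dbo.WorkQueue names from the sketch above, that cleanup is a one-liner:

-- Rebuild the clustered and non-clustered indexes to reclaim the free space;
-- compare page_count in sys.dm_db_index_physical_stats before and after.
ALTER INDEX ALL ON dbo.WorkQueue REBUILD;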

For workloads with a lot of random deletes (or updates that change the size of a row in a way that causes data movement), index defragmentation is necessary. Without index defragmentation, these databases will continue to grow in size and exhibit the usual problems that we associate with fragmentation.

What Doesn’t Cause Perpetual Fragmentation

At this point you might be saying “Ah ha! You’ve proved that I should always defragment my indexes! Constantly! ZOMGWTFBBQ!!!11!11!shiftone!!!!!”

You’re wrong.

For some workloads, you can still avoid index fragmentation – if you’re adding data to the tail end of the table and deleting data from the beginning of the table, you won’t need to defragment your indexes. Ghost record cleanup should take care of deleted rows in this case.

For most workloads, your best bet is still to analyze SQL Server wait statistics and understand the basic problem before you start defragmenting indexes on a regular basis.
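
If you haven’t looked at wait statistics before, here’s a rough starting sketch; the excluded wait types are a small sample of the usual benign waits, not a complete list:

-- Top waits accumulated since the last restart (or since the stats were cleared).
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms / 1000.0 AS wait_time_seconds
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'LAZYWRITER_SLEEP', N'SLEEP_TASK',
                        N'BROKER_TASK_STOP', N'XE_TIMER_EVENT',
                        N'REQUEST_FOR_DEADLOCK_SEARCH', N'CHECKPOINT_QUEUE')
ORDER BY wait_time_ms DESC;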

Acknowledgements

The original idea for this blog post comes from Hemant K Chitale’s Index Growing Larger Than The Table. Even though SQL Server handles this situation differently from Oracle, the underlying pattern is still interesting for database practitioners.

The random length string code was adapted from T-SQL: Random String.


24 Comments

  • “For most workloads, your best bet is still to analyze SQL Server wait statistics and understand the basic problem before you start defragmenting indexes on a regular basis.”

    We take a similar but reverse philosophy…automate general rules for managing fragmentation across the environment, then find the indexes that don’t justify fragmentation maintenance (because of their access patterns and the general cost/pain of maintaining them) and exempt those. Ola Hallengren’s stuff works great for avoiding unnecessary rebuilds…much better than the blanket REBUILD ALL THE THINGS approach that SQL Server pushes you towards with Maintenance Plans.

    Personally, I’m a little more worried about internal fragmentation / page density issues than logical fragmentation, because we have some very poor index designs to take care of, and the page splits add up to more pages for a given amount of data, and you can feel that pain everywhere…the buffer pool, I/O, storage, etc. But sometimes there are indexes or designs where the juice isn’t worth the squeeze, so we exempt them and let them fragment to their hearts’ content. If an index isn’t getting heavy internal fragmentation and range scans aren’t a chief method of access, meh, let it fragment. Reorgs and rebuilds -can- be very costly, and I’ve been woken up in the middle of the night by filled log drives more than once. Significantly more than once, I should say.

    Reply
    • Great point – exceptions are required for every policy.

      Out of interest, how are you measuring internal fragmentation and bad page splits?

      Reply
      • Not with any automated process, really. Usually it’s in response to problematic events with a particular table, so I use sys.dm_db_index_physical_stats to manually check it under avg_page_space_used_in_percent. I’ve gotten deep into the weeds with some of these tables, and was actually wishing there was a way to graph external fragmentation along a table so you could figure out “where” the fragmentation was occurring…I started thinking of how it might be possible with DBCC IND or sys.dm_db_database_page_allocations…and then I realized what a waste of time that might turn out to be, did a little analysis of how the table was used via the plan cache, and exempted the table from maintenance. Several weeks later, no increase in average duration or logical reads on the procs in the plan cache. Problem solved…
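
        For reference, that manual check looks something like this (dbo.ProblemTable is a placeholder, and DETAILED mode reads every page, so it isn’t free on big tables):

        SELECT index_id,
               index_level,
               page_count,
               avg_page_space_used_in_percent,  -- page density (internal fragmentation)
               avg_fragmentation_in_percent     -- logical fragmentation
        FROM sys.dm_db_index_physical_stats(
                 DB_ID(), OBJECT_ID(N'dbo.ProblemTable'), NULL, NULL, N'DETAILED');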

        Re bad page splits, fn_dblog() would be one way but I’d have to be really motivated (wheelbarrows of cash or casual threats of waterboarding from the CIO usually suffice) to dig that deeply without a good reason. Still a little green with the log file spelunking, myself.
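
        The commonly cited sketch for that check, for what it’s worth (fn_dblog() is undocumented and only sees the active portion of the log):

        -- Count mid-page splits recorded in the active transaction log;
        -- LOP_DELETE_SPLIT is logged when rows move off a page during a split.
        SELECT [AllocUnitName], COUNT(*) AS page_splits
        FROM fn_dblog(NULL, NULL)
        WHERE [Operation] = N'LOP_DELETE_SPLIT'
        GROUP BY [AllocUnitName]
        ORDER BY page_splits DESC;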

        Reply
  • I just learnt something from Noam Brezis last week in his presentation for the SQLPASS Performance VC. I think extent fragmentation should be considered as well:
    http://www.madeirasql.com/truth-about-index-fragmentation/

    Reply
    • Hi Klaas,

      I’ve gone down this road before and been unable to prove that multiple file groups have much benefit unless I’m being required to use entire disks for storage. Modern storage hardware can stripe the extents across multiple disk volumes so what’s logically contiguous for SQL Server may not even be located on the same shelf of storage.

      Have you found scenarios with modern hardware where you’re able to get performance improvements by making this change? And if so, have you blogged about it?

      Reply
      • Jeremiah

        I did not do any testing myself. I don’t get much time to experiment, and the daily workload is nearly nothing. Nightly ETL and index rebuilds are the workout sessions for my SQL Servers.
        Noam’s explanation and demo convinced me that this method could have benefits and no cons.

        We do have several SQL Servers and different storage solutions from different vendors, with different versions and configurations. It’s not my specialty, but any monitoring always comes up with one big problem on all SQL Servers: I/O latency.

        Last week, mirroring entered the paused state twice because of ‘I/O requests taking longer than 15 seconds to complete on file … ‘ and ‘waiting for buffer latch type 4’ errors during index rebuilds.
        I’m just hoping that using separate files would make it possible to use more channels and facilitate parallel writing or something, and that contiguous files will be easier to read.
        On some servers I have to reformat with 64KB block sizes, but again I don’t know if that has any effect on SAN drives.

        Furthermore, rebuilding to another file would not leave empty extents, and the total used space would be smaller and contiguous, no?

        I suppose the configuration of our SAN is not perfect, and the administrator says with newer versions the striping logic will be smarter, but we’re also thinking of moving back to stand alone machines with local disks.

        Reply
        • Klaas – so if your daily workload is nearly nothing, and your index rebuild is half of the workout of your SQL Server…why are you rebuilding indexes again?

          Especially if it’s breaking database mirroring?

          Reply
          • Yes, really ridiculous, isn’t it?
            We’re preparing to move more DBs and applications, which are now written in COBOL and use DB2 databases. Those are about 100 times bigger and 1,000 times busier than what we have migrated so far. I want to train on best practices, automate what I can, and keep up with all the technologies before the fire starts.
            And I want to prove that DB mirroring isn’t the solution we need, hèhè.

          • K – I think we’ve found the problem: you’re trying to prove something. As long as you’re trying to prove that mirroring won’t work, then yes, doing constant unnecessary index rebuilds will do the trick. Congratulations!

  • No no, that’s just a side effect.

    Seriously, in a few months we’ll have some DBs that won’t fit in memory anymore. More RAM is on my wish list, but I doubt I will get enough.

    I often read the advice to do index rebuilds, and I turned to Ola’s solution.
    I saw your webinar and I understand that the more data pages are in memory, the less fragmentation matters, but what about wasted space on the loaded pages? We’re filling memory with empty space when we don’t rebuild to approach full pages, or am I wrong? And backup time and size will be better too, no? And thus restore will be faster?

    I assumed you meant that external fragmentation is not always a problem on virtualised machines and shared storage, but what about internal fragmentation? Indexes are rebuilt with fill factor 0, but two 4GB indexes are down to 45% avg_page_space_used_in_percent after a week – shouldn’t I rebuild those?

    Reply
    • If your problems are mostly localised to a couple of bad offenders like those two indexes (where a week after a rebuild you have 50% whitespace in your pages), and if it isn’t vendor software where your hands are tied, maybe the best solution, instead of perpetually bandaiding it with rebuilds, would be index analysis to see if there’s a better clustering key. Sounds like an awful lot of random DML hitting that table. If you can fix the root of the problem, i.e. whatever is causing the fragmentation, that is ideal…but it’s not always possible.

      Reply
      • Yes, I know.

        It’s an in-house developed ERP, but I can’t touch that either. The developers have promised that one day they will look into it, choose better data types, write more efficient procedures, use better key columns, and do everything I want.

        Reply
    • Klaas – it’s all about solving pains. If your biggest pain is improving backup time and size, take the right steps to fix that. Is internal fragmentation a pain point for you, and how are the symptoms manifesting themselves?

      Reply
      • I’m trying to be the invisible DBA and avoid the pains, resolving issues before they manifest themselves. I’m always looking to reduce backup time and other jobs, because the faster the job, the less probable it will conflict with others. Now I hear that most of my work is only theoretically beneficial.

        The real pain is I/O latency, and since the real cause probably won’t be solved soon, my plan B is to reduce I/O, among other things by keeping everything as dense as possible.

        Reply
        • Klaas – don’t resolve issues you don’t have. By doing defragmentation, you’re actually making backup & other job times *worse* (by inflating the log and the number of changed pages in the database), and making IO latency worse.

          Reply
          • OK…..

            I’ll disable the schedule for IndexOptimize and see what happens.
            This feels like a betrayal to my religion and I’ll spend the next few nights awake in terror. But then, what’s new?

            Life will be a lot easier when you deliver the super powers I ordered three times already.

            Thank you very much.

  • If it makes you sleep better at night…one of the things I do when deciding to cast an index into the Pit of Fragmentation Despair and no longer perform reorgs or rebuilds on it is to scour the plan cache for execution plans that access it and take note of the average reads and durations…document all of this while the index is not heavily fragmented, then let it go, and later query the plan cache after it has been a while (perhaps after a patching reboot, which will reset the query stats) and the index has fragmented as expected. Then you can compare average logical reads and duration…if the queries accessing that table are heavily impacted, you can reevaluate; otherwise, if the performance difference is marginal (compared to the log-pummeling pain of index operations), then you know you made the right call. For one of the hundred-gig tables we took out of the maintenance process, the most expensive query that runs against it went from 22 to 24 logical reads after we stopped defragging it. Those two extra logical reads…not exactly worth the firehose of log traffic from ALTER INDEX…REORGANIZE every night.
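
    A rough sketch of that plan cache check (the LIKE filter on a placeholder table name is a blunt instrument, but it’s a start):

    -- Average logical reads and duration for cached queries touching a table.
    SELECT TOP (20)
           SUBSTRING(st.text, 1, 200) AS query_text_start,
           qs.execution_count,
           qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
           qs.total_elapsed_time / qs.execution_count / 1000 AS avg_duration_ms
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE st.text LIKE N'%MyHundredGigTable%'  -- placeholder name
    ORDER BY qs.total_logical_reads DESC;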

    Turning all index maintenance off at once might be a bit harder to quantify…I guess just pick some queries and monitor their performance over time. A tool like Ignite (or whatever the Borg at SolarWinds rebranded it to) is handy for this, but the plan cache is sufficient as well.

    Reply
  • It was my understanding: to hell with index fragmentation when I’ve got to share my RAID 10 with other applications.
    And even if I didn’t have any other apps working on the same RAID, I’d face the problem that I’d never know when and where exactly the fragments are torn apart and written to individual disks.
    On the other hand, if I had full control over where I write to, what gets written to which disk, when I have my own SAN, and when I even control what kind of heavy ETL is done and when, then I stand a good chance in my fight against index fragmentation.
    Index fragmentation was nothing I ever even considered whenever I had gigs and not TBs of data to work with.
    But what would I know, with as little as 2 years’ experience. And I’m not ashamed to call myself a newbie ;D

    Reply
    • Honestly, we believe that you can ignore index fragmentation for the most part. Your indexes are going to fragment anyway, and most people are too aggressive about when they try to fix things. There are some workloads where you really do want to defragment indexes, and that’s what I was trying to point out.

      Reply
      • I understand, and I very much appreciated your blog. Cheers 🙂
        What I have learned from your blog, in a nutshell, is that with a heavy OLTP workload and massive transactions, I might want to take another, closer look at my fragmentation.
        Until then, I’m just gonna chill 😀

        Reply
  • Ok… I know this post is 7 years old but I have a question. The article states…

    “For most workloads, your best bet is still to analyze SQL Server wait statistics and understand the basic problem before you start defragmenting indexes on a regular basis.”

    There are a lot of posts in this world that all say roughly the same thing but I’ve not yet found one that tells folks how to find the indexes that the “wait statistics” actually relate to. So how do you do that?

    Reply
    • That’s a great question, and we dig into it in my Mastering Index Tuning class in the module where I explain the differences between sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats. I can’t do justice to the entire class in the comments here, obviously, but check out the columns in sys.dm_db_index_operational_stats and that might be enough to get you started.
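
      To give you a taste, a sketch along these lines shows where each index’s activity and waits are going:

      -- Per-index access counts and latch waits for the current database.
      SELECT OBJECT_NAME(ios.object_id) AS table_name,
             i.name AS index_name,
             ios.range_scan_count,
             ios.singleton_lookup_count,
             ios.page_latch_wait_in_ms,
             ios.page_io_latch_wait_in_ms
      FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
      JOIN sys.indexes AS i
        ON i.object_id = ios.object_id
       AND i.index_id  = ios.index_id
      ORDER BY ios.page_io_latch_wait_in_ms DESC;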

      Reply
  • Heh… I posted on one of your other threads on this subject. To summarize: random GUID fragmentation is a myth, ever-increasing indexes are actually a hot spot for fragmentation, and using REORGANIZE is a nice way to perpetuate page splits and the resulting fragmentation all day, every day. 😀

    Again, thank you Brent for getting me more interested in index maintenance.

    Reply
  • “For workloads with a lot of random deletes … index defragmentation is necessary. Without index fragmentation, these database will continue to grow in size and result in the usual problems that we associate with fragmentation.”

    Should the second sentence read “Without index defragmentation…”? Or am I missing something?

    Reply
