Meme Week: Setting Fill Factor to Fix Fragmentation

I just shake my head when I see people setting fill factor in an attempt to “fix” fragmentation.

The default fill factor is 100%. That means during an index rebuild, SQL Server packs 100% of your 8KB pages with sweet, juicy, golden brown and delicious data.

But somehow, some people have come to believe that’s bad, and to “fix” it, they should set fill factor to a lower number like 80% or 70%. But in doing so, they’re setting fragmentation to 20%-30%. They’re telling SQL Server to leave 20%-30% free space on every single page during an index rebuild.

That’s internal fragmentation – empty space on pages – and it’s bad. It makes your database 20-30% larger, makes your table scans take 20-30% longer, makes your maintenance jobs take 20-30% longer, and effectively makes your memory 20-30% smaller.
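If you want to check whether someone has already lowered fill factors on your server, here’s a quick sketch (the index and table names in the second statement are placeholders, not anything from this post):

```sql
-- Which indexes have a non-default fill factor? (0 and 100 both mean "pack the pages full")
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       i.fill_factor
FROM sys.indexes AS i
WHERE i.fill_factor NOT IN (0, 100);

-- Rebuild one back to the default; the setting persists for future rebuilds
ALTER INDEX IX_Example ON dbo.Example REBUILD WITH (FILLFACTOR = 100);
```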

And the very people who keep playing around with fill factor are the ones who wonder why their queries aren’t getting faster, why their maintenance is taking longer, and why their fragmentation isn’t getting better. The jobs still keep taking forever every night, and they don’t connect the dots that they’re the problem.


27 Comments

  • Good points for sure. I’m wondering if there’s a place for it when you have a “known fragmenter”…say, a terrible clustered index key that just instantly fragments because of the non-ascending key…that you can’t change (we have one vendor table clustered on a GUID, for example, because they hate kittens, I think). I’m torn on whether to a) exempt it from index maintenance and let that bad boy fragment out to 30% page fullness and 99% fragmentation, b) pay the painful REORG price every few days, kinda pointlessly, or c) maybe find a sweet spot with fill factor to “bake in” the fragmentation, but still keep it in the maintenance routines?

    Maybe there’s no advantage to doing that…we have certain tables we’ve just let grow out because it is pointless to try (clustered on a GUID, 100GB but only 30GB of actual data…if you scan that table, it’s a buffer cache killer!), others that are more critical that I’m hesitant to just completely exempt…

    (as an aside, years of having people who are not DBAs saying “this process is slow, have you defragmented the database” to me has definitely put me on the side of “it’s not fragmentation, trust me, it’s never fragmentation”.
    I think that sort of “folk wisdom” of defragmenting indexes to make them go faster may have been born out of people rebuilding tables and (accidentally) also updating stats, which is much more likely to fix a performance issue…at least, that’s my theory for why that idea remains so pervasive.)

    • I work with a system at the moment where every table is clustered on a GUID. I don’t set a 100% fill factor on that, because that leads to page splits on every insert, which in turn pushes up the fragmentation level, requiring a rebuild sooner, leading to more page splits…. It’s on a very long list of things I’ve told them to never do again.
      If I had my way, when you tried to cluster an index on a GUID, you’d get a message telling you to do something else for a living. Flipping burgers or something.

  • Are you saying then that we should always set the fill factor to 100% and then rebuild the indexes on a more frequent basis?

    • Usually, yes. But update statistics regularly and rebuild indexes only as needed. At clients with small databases and no DBA, I would set up a bi-monthly or monthly index rebuild that I’m sure was excessive, but I didn’t want to just let it go. At large-database shops with no DBA, I would not schedule one; the risk of filling up the log or the backups, with no one available to fix it, felt too great. Otherwise I do it manually on a per-index basis, somewhat subjectively choosing larger indexes to rebuild at lower fragmentation percentages, smaller indexes at higher percentages, and ignoring anything smaller than about 10 MB.
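      The size-aware, per-index approach described above can be sketched with the standard fragmentation DMF (the 10 MB cutoff mirrors the comment; treat all thresholds as illustrative):

      ```sql
      -- Size and fragmentation together, per index, in the current database
      SELECT OBJECT_NAME(ips.object_id) AS table_name,
             i.name AS index_name,
             ips.avg_fragmentation_in_percent,
             ips.page_count * 8 / 1024 AS size_mb
      FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
      JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id = ips.index_id
      WHERE ips.page_count * 8 / 1024 > 10   -- ignore anything under ~10 MB
      ORDER BY ips.page_count DESC;
      ```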

  • Like everything else there is a (probably limited) time and place for setting Fill Factor to a value less than 100%. I did so myself recently and it fixed a specific page splits problem.

    I agree that it can make table scans longer but if your pages are in memory already is it really that much longer? And if they aren’t but databases are on fast storage, again is this a huge difference in terms of I/O? Of course conversely fast storage can mean problems associated with page splits are less likely to be problematic anyway. Happy to be disabused of my opinions though 🙂

  • There may be very specific cases where an index fill factor of 95 or 98 will work better than 100, but 70 or 80 look ridiculous for general purpose indexing. I remember your “SELECT TOP 100 .. WHERE Reputation = 1 ORDER BY CreationDate” in your indexing course: specific data sets or specific processing patterns may benefit from non-standard approaches. But optimizing for index fill size is not in the top opportunities in most cases.

    A fill factor of 70 increases the size by 43% (100/70 ≈ 1.43). 80 increases it by 25%. A fill factor of 50 doubles the size.
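    The ratios in the comment above are easy to sanity-check in T-SQL:

    ```sql
    -- Relative on-disk size of a freshly rebuilt index at various fill factors
    SELECT ff AS fill_factor,
           CAST(100.0 / ff AS DECIMAL(4, 2)) AS relative_size
    FROM (VALUES (100), (95), (90), (80), (70), (50)) AS v(ff);
    -- 70 comes out to 1.43 (43% larger), 80 to 1.25, 50 to 2.00
    ```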

  • Gustav Swanepoel
    February 25, 2022 5:44 pm

    Thoroughly enjoying this week’s “meme” week and the gems of important information conveyed. Insert meme: Wayne’s World “We’re not worthy”

  • Jeff Moden has a really great video on dealing with Guid fragmentation that involves setting it to lower than 100% and reindexing at a low fragmentation % threshold, but of course that’s for the particular scenario where you’re optimizing for heavy inserts (and expansive updates). I wonder what, if any, other scenarios make sense to use less than 100%?

    • Hi, for those interested here is the link: https://www.youtube.com/watch?v=jx-FuNp4fOA
      We use GUID PKs as well. Of course it bloats the indexes because of 16 bytes instead of 8, but with enough memory that is OK. We used to set the fill factor very low, but now we recommend 90 for smaller DBs (< 200 GB) and 95 for bigger ones. That way the new records will slowly fill the gaps, as the video shows.
      On the other hand, we can preassign the primary key values in the client/app server for all tables at the same time, can insert larger object graphs in one go without round trips, and don't need an additional column for data replication across systems…. GUIDs are not that bad.

  • Daniel Johnson
    February 25, 2022 5:51 pm

    Fill factor needs to be adjusted on an index by index basis.

    Before you consider it, you have to prove that the index is constantly getting fragmented and that the fragmentation is causing a performance issue. If both of those are true, you should reevaluate your index and see why. You might not have a great index to begin with.

    If you have run through all of the above and you cannot alter the index, then maybe you can consider adjusting your fill factor, but it will cost you storage and memory to prevent fragmentation. 98 times out of 100, not worth it.

  • I disagree (in an “It depends” sort of way). If all you ever do is insert data at the end of a table, then sure, go with fill factor set to 100%. If, however, you regularly insert data randomly (random GUID as primary key, anyone?), then having the fill factor set at 100% pretty much guarantees page splits with Every. Single. Insert.

    I am not saying that you *should* use a random GUID as a primary key, but I have inherited enough poorly designed systems to know it’s common (It’s not the AppDev’s problem after it’s released, right?). And in that situation, I would never choose a fill factor of 100%. Depending on how much data is inserted in an average period between index rebuilds, I might go as low as 50%. In any case, 80% seems like a decent starting point until you have enough data to make an informed decision.

    • A primary (valid) use case for using a GUID as PK is when you have lots of inserts. With an identity int field there is only one place new records go: the last page. With a GUID, your inserts will go all over the place, increasing throughput.

    • Kevin Martin Tech
      February 25, 2022 9:54 pm

      At the custom software development shop I work for (https://www.emergentsoftware.net), I created exceptions to the never-use-uniqueidentifier/GUID-primary-key-columns/clustered-index rule on our database development style/rule wiki.

      The exception use cases are for when there are separate systems and merging rows would be difficult, and as a last-page-contention solution (I have not tried OPTIMIZE_FOR_SEQUENTIAL_KEY yet). There will be a high bar: exhausting other options before pulling out the GUID PK hammer. 😉

      After watching Jeff Moden’s demo video, where he showed that you should not REORGANIZE but only REBUILD GUID indexes and add some fill factor, we are implementing this on an in-development client’s web app project with 100s of millions of rows. The uniqueness of GUIDs will simplify data movement between different systems for this project, where ETL work using Azure Data Factory (SSIS) would not be possible.

      Instead of creating a custom index maintenance plan script that checks for GUID column key leading indexes, we are appending “_INDEX_REBUILD_ONLY” to the index name and using Ola Hallengren’s maintenance script to exclude that pattern for the REORGANIZE step and the next step will REBUILD the indexes that contain the “_INDEX_REBUILD_ONLY” tag.

      By Jeff’s demo this should work out well, but if it does not, I will go back to putting the clustered index on a surrogate identity column.
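      For anyone who does want the custom check mentioned above – finding indexes whose leading key column is a GUID – here’s a hedged sketch against the standard catalog views:

      ```sql
      -- Indexes whose leading key column is a uniqueidentifier
      SELECT OBJECT_NAME(i.object_id) AS table_name,
             i.name AS index_name
      FROM sys.indexes AS i
      JOIN sys.index_columns AS ic
        ON ic.object_id = i.object_id
       AND ic.index_id = i.index_id
       AND ic.key_ordinal = 1              -- leading key column only
      JOIN sys.columns AS c
        ON c.object_id = ic.object_id
       AND c.column_id = ic.column_id
      JOIN sys.types AS t
        ON t.user_type_id = c.user_type_id
      WHERE t.name = 'uniqueidentifier';
      ```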

      • Hi Kevin,

        Thank you for the very kind and honorable mention. Since you’re doing experimental work based on my presentation, feel free to contact me.

        I had to read your post a couple of times. At first, it sounded like you were going to convert EVERYTHING to random GUIDs, and I was screaming “NNNNNNNNNNNNOOOOOOOOOOOOOOOO” the whole time I thought that. Then it finally sunk into my caffeine-deprived skull how you were surgically intending to use it instead of as a panacea. Whew!

        As I said in the presentation, because of the random inserts, you could end up with all pages in memory and, like I also said in the presentation, that could be a very good thing or a very bad thing. 😀 “It Depends” and I just wanted to provide the reminder there.

        Seriously, I’d love to know how it turns out for you and will be happy to help on your great experiment. If you don’t know my email address from the presentation, I am on LinkedIn as Jeff Moden. There is a Jeff Moden on twitter but that’s not me by any stretch of the imagination.

        • Kevin Martin Tech
          March 14, 2022 10:13 pm

          Will Do! This is an estimated 1.5 year development project, so hang tight on feedback. I have a sprint in a couple of months for setting up the database maintenance. I’m only loading 50k-1M rows for each of the migrations every month or so for UAT testing (but not at scale testing yet).

  • It depends…

  • If you are lucky enough to have SSDs, then fragmentation of indexes is not a performance issue (fragmentation is bad because it causes HD heads to move a lot). SSDs access the next page of an index so much quicker that it does not matter. Unfortunately, we have never been allowed SSDs on our servers. I think we should still have all fill factors set to 100% unless an index is getting reorganized/rebuilt a lot. Then back it down in 5% increments to see if it helps (put it back if it does not).

  • I maintain an ERP system where “Best Practice” is to set the fill factor to 80.

    The database size is now at 4 TB.
    After having seen Brent’s video on fill factor, I slowly, over a period of a year, changed the indexes to fill factor 95. Then to 100.
    Now we have no internal fragmentation – only external fragmentation – which again is no problem, since we use something called “random access” both on our storage and in memory (yes – we use RAM).

    So thank you, Brent, for making our database smaller, making backups and CHECKDB run faster, and making us able to keep more data in memory.

    I suspect Brent won’t answer most of the comments on here, since he already answered 99% of the questions in his video about this subject – also the one about page splits and how they are measured, and not measured (adding a new page is also counted as a page split).

    See more here
    https://www.youtube.com/watch?v=iEa6_QnCFMU
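    On that measurement point: the familiar perf counter lumps “good” splits (allocating a new page at the end of the index) together with “bad” mid-page splits, so reading it in isolation overstates the problem. A sketch of pulling it:

    ```sql
    -- Cumulative since instance startup, despite the "/sec" in the name;
    -- counts both end-of-index allocations and true mid-page splits
    SELECT cntr_value AS page_splits_since_startup
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page Splits/sec'
      AND object_name LIKE '%Access Methods%';
    ```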

    Cheers

    • Bingo.

    • Had a third-party app that said 50%. Got tired of this little 30 GB database being the largest consumer of memory in my SQL Server, so I set it to 100, and then suddenly a bunch of problems with the app randomly leaving behind orphaned records in workflow tables stopped happening.

  • Bren, you wrote …

    But somehow, some people have come to believe that’s bad, and to “fix” it, they should set fill factor to a lower number like 80% or 70%. But in doing so, they’re setting fragmentation to 20%-30%. They’re telling SQL Server to leave 20%-30% free space on every single page during an index rebuild.

    I’ll say “It Depends”. A great example is comparing that to the reason why people allocate more disk to their LDF files than needed. We all know the reason is to prevent unexpected, expensive growth. The same is true with CORRECTLY allocated fill factors. Done CORRECTLY and for the right reasons, that extra space will get used and, in the process, prevents the so-called (and correctly named) “BAD” page splits, which, as you know, cause serious logical fragmentation, massive physical fragmentation resulting in page densities much worse than the mere 70 or 80% you cited, massive blocking, and massive overuse of the log file, which affects backups, restores, live disk space required, etc.

    As you know (or at least I think you know), I went for 4 years without doing any index maintenance on my primary production box and it bloody well worked… most of the time. The physical fragmentation of many of the huge indexes resulted in page densities as low as 10% (some stupid deletes) and a whole trainload of indexes with page densities of only 50%. I’ll also state that the logical fragmentation substantially affects performance even when SSDs are involved.

    While I agree that not all of that can be prevented by lowering the Fill Factor, when combined with the right kind of index maintenance, a 70 or 80% Fill Factor is chump change compared to “natural” Fill Factors of only 50%.

    “It Depends” and nothing is a panacea except for not using REORGANIZE. I’ve not found a place yet where it actually works as well or tamely as a lot of people say it does but there’s always the currently undiscovered exception.

    What I DO absolutely agree with is that what people have come to believe are “Best Practices” actually aren’t and were never intended to be taken as such. Even that is hidden in plain sight in the documentation before they changed it on 20 April 2021. And, I also agree that people are generally checking the wrong stuff to measure fragmentation. To your and Erik Darling’s very good points, I don’t know of many people that actually check segment or physical fragmentation in the form of page density.

    • Damn… I left a “T’ off your name and it’s only got 5 letters. My serious apologies especially since I place great honor on the man known as “Brent Ozar”.

