Defragmenting compacts pages as well as changing physical order

Index maintenance compacts a structure *and* changes physical order– both can be important!

Editor's Note: this post was updated in June 2014 to link to an online index rebuild bug that can cause corruption.

Once upon a time, there was a database server with 500GB of data and a heavy read workload of dynamic queries. Data was updated frequently throughout the day and index tuning was a serious challenge. At the best of times, performance was dicey.

Then things went bad

Application performance plummeted. Lots of code changes had been released recently, data was growing rapidly, and the hardware wasn’t the absolute freshest. There was no single smoking gun– there were 20 smoking guns!

A team was formed of developers and IT staff to tackle the performance issue. Early in the process they reviewed maintenance on the database server. Someone asked about index fragmentation. The DBA manager said, “Of course we’re handling fragmentation!” But a few queries were run and some large, seriously fragmented indexes were discovered in production.

The DBA explained that fragmentation wasn’t the problem. She didn’t have automated index maintenance set up, but she periodically manually defragmented indexes that were more than 75% fragmented.

Bad, meet ugly

At this point the whole performance team flipped out. Trust disappeared. Managers squirmed. More managers were called in. The DBA tried to change the subject, but it was just too late. More than a week was wasted over Fragmentation-Gate. It was a huge, embarrassing distraction, and it solved nothing.

Here’s the deal– the DBA was right. Fragmentation wasn’t the root cause of the performance problem. The DBA was a super smart person and very talented at performance tuning, too! (And no, I’m not secretly talking about myself here– this is not the “royal she”.) But she made a strategic miscalculation: she should have set up occasional automated index maintenance to align with her team’s normal practices and standards.

Why you need automated index maintenance

When performance gets bad, one of the very first things people look at is whether systems involved are configured according to best practices. If you’re not following a best practice, you need to have a really good reason for it.

Regular index maintenance still has a lot of merit: even in Shangri-La, where your data all fits into memory and your storage system is a rockstar with random IO, index maintenance can help make sure that you don’t have a lot of empty space wasting loads of memory.

It’s still a good idea to automate index maintenance. Absolutely don’t go too crazy with it– monitor the runtime and IO use and run it only at low volume times to make sure it helps more than it hurts. Be careful, but don’t skip it.

How much downtime can you spare?

Before you implement index maintenance, find out how much time tables can be offline in each of your databases. Then, figure out what operations you want to use.

To Rebuild or Reorganize: That is the Question

First off: ‘Reorganize’ and ‘Rebuild’ are two different operations that each reduce fragmentation in an index. They work differently toward the same end. You don’t need to run both against the same index. (I sometimes find that people are doing both against every index in a maintenance plan. That’s just double the work and NOT double the fun.)

Rebuild: An index ‘rebuild’ creates a fresh, sparkling new structure for the index. If the index is disabled, rebuilding brings it back to life. You can apply a new fillfactor when you rebuild an index. If you cancel a rebuild operation midway, it must roll back (and if it’s being done offline, that can take a while).

Reorganize: This option is more lightweight. It runs through the leaf level of the index, and as it goes it fixes physical ordering of pages and also compacts pages to apply any previously set fillfactor settings. This operation is always online, and if you cancel it then it’s able to just stop where it is (it doesn’t have a giant operation to rollback).
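
To make the difference concrete, here's roughly what each one looks like in T-SQL (the index and table names are made up for illustration):

-- Rebuild: builds a fresh copy of the index; FILLFACTOR is optional
ALTER INDEX IX_Sales_CustomerID ON dbo.Sales
    REBUILD WITH (FILLFACTOR = 90);

-- Reorganize: always online; fixes leaf-level order and compacts pages
-- toward whatever fillfactor is already set on the index
ALTER INDEX IX_Sales_CustomerID ON dbo.Sales
    REORGANIZE;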

Factors to consider:

  • Standard Edition rebuilds ain’t awesome. If you’ve got SQL Server Standard Edition, index rebuilds are always an offline operation. Bad news: they’re also single-threaded. (Ouch!)
  • Enterprise Edition rebuilds have gotchas. With SQL Server Enterprise Edition, you can specify an online rebuild — unless the index contains large object types. (This restriction is relaxed somewhat in SQL Server 2012). You can also use parallelism when creating or rebuilding an index– and that can save a whole lot of time. Even with an online rebuild, a schema modification lock (SCH-M) is needed at the time the fresh new index is put in place. This is an exclusive lock and in highly concurrent environments, getting it can be a big (blocking) problem.
  • There’s a bug in SQL Server 2012 Enterprise Edition Rebuilds that can cause corruption. If you’re running SQL Server 2012 SP1 – SP2, parallel online index rebuilds can cause corruption. Read about your options here.
  • Rebuilding partitioned tables is especially tricky. You can rebuild an entire partitioned index online– but nobody really wants to do that because they’re huge! The whole idea behind horizontal partitioning is to break data into more manageable chunks, right? Unfortunately, partition level rebuilds are offline until SQL Server 2014.
  • Reorganizing can be pretty cool. ‘Reorganizing’ an index is always an online op, no matter what edition of SQL Server you’re using. It doesn’t require a schema mod lock, so it can provide better concurrency. Reorganizing only defragments the leaf level of the index. On large tables it can take longer than a rebuild would take, too. But as I said above, it’s nice that you can reorganize for a while and then stop without facing a massive rollback.
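
Before you pick an operation for a given index, you can check fragmentation yourself. Here's a rough sketch for a single table ('dbo.MyTable' is a placeholder; 'LIMITED' is the cheapest scan mode, and this DMV can still generate a lot of IO against big tables):

SELECT i.name AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON ips.object_id = i.object_id
    AND ips.index_id = i.index_id;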

SQL Server 2014: WAIT_AT_LOW_PRIORITY, MAX_DURATION, and ABORT_AFTER_WAIT

I’m really excited about new index rebuild options that are shipping in SQL Server 2014. Check *this* out:

ALTER INDEX OhSoFragmented ON dbo.MyTable REBUILD
WITH (
    ONLINE = ON (
        WAIT_AT_LOW_PRIORITY (
            MAX_DURATION = 5 MINUTES,
            ABORT_AFTER_WAIT = BLOCKERS
        )
    )
);

So we've got new tools for those concurrency problems I was talking about with online rebuilds. Now we can say how long we're willing to wait to get that schema modification lock (in minutes), and what should happen if we're still waiting when time runs out. (The ABORT_AFTER_WAIT options: BLOCKERS kills the sessions blocking us, SELF cancels our own rebuild, and NONE just keeps on waiting.) Obviously there are some situations where just killing the blockers could be a terrible problem, but I'm interested to experiment with this.

You Didn’t Answer the Question: Do I Use Rebuild or Reorganize?

Yeah, I totally dodged that question, didn’t I?

If you have a regularly scheduled downtime every weekend, you’re probably fine with straight up index rebuilds, even if you have Standard Edition. Single threaded offline index maintenance may not be the hottest thing in the world, but hey, if you’ve got time for it then embrace the simplicity.

If you have Enterprise Edition, embrace parallel index rebuilds– and use the ONLINE option for indexes that allow it if people need to access the database during your maintenance window.

If you have database mirroring or AlwaysOn Availability Groups, tread with caution– particularly with rebuilds. It’s easy to generate a ton of IO with index maintenance, and it could mean putting your secondaries or mirror so far behind that they can’t catch up.
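
For availability groups specifically, it's worth watching how far behind your secondaries get while maintenance runs. Here's a hedged sketch you can run on the primary (both queue columns are in KB):

SELECT ar.replica_server_name,
    DB_NAME(drs.database_id) AS database_name,
    drs.log_send_queue_size,  -- KB of log not yet sent to the replica
    drs.redo_queue_size       -- KB of log received but not yet redone
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON drs.replica_id = ar.replica_id;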

Maintenance plans or custom scripts?

You can go the easy way and use SQL Server Maintenance Plans, but unfortunately they’re very simplistic: you can only say “rebuild all the indexes” or “reorganize all the indexes”. You cannot say, “If the index is 45% or more fragmented, rebuild it– otherwise do nothing.” If you don’t spend much time with SQL Server and you’ve got downtime available every weekend, this can be a decent option.

If you need to minimize downtime, custom index maintenance scripts are the way to go. Our favorite: Ola Hallengren’s maintenance scripts. These are super flexible, well documented, and … free! The scripts have all sorts of cool options like time boxing and statistics maintenance.

Some tips for using Ola Hallengren’s index maintenance scripts:

  1. Download and configure them on a test instance first. There are a lot of parameter options, and you'll need to play with them.
  2. Get used to the ‘cmdexec’ job step type. When you install the scripts you'll see that the SQL Server Agent jobs run index maintenance using a call to sqlcmd.exe in an MS-DOS-style step. That's by design!
  3. Use the examples on the website. If you scroll to the bottom of the index maintenance page you’ll find all sorts of examples showing how to get the procedure to do different useful things.
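
To give you a flavor of those examples, here's one possible IndexOptimize call along the lines of the thresholds discussed in the comments below (the values are illustrative, not a recommendation):

EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @FragmentationLow = NULL,
    @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1 = 30,
    @FragmentationLevel2 = 70,
    @LogToTable = 'Y';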

Find out when maintenance fails

Don’t forget to make sure that your maintenance jobs are successfully logging their progress. Set up Database Mail and operators so jobs let you know if they fail.
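
Here's a minimal sketch of that plumbing, assuming Database Mail is already enabled (the operator name, email address, and job name are placeholders for whatever yours are called):

-- Create an operator to notify
EXEC msdb.dbo.sp_add_operator
    @name = N'DBA Team',
    @email_address = N'dba-team@example.com';

-- Tell an existing Agent job to email that operator on failure
EXEC msdb.dbo.sp_update_job
    @job_name = N'IndexOptimize - USER_DATABASES',
    @notify_level_email = 2,  -- 2 = on failure
    @notify_email_operator_name = N'DBA Team';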

Tell your boss you did a good thing

Finally, write up a quick summary of what you did, whether you chose custom scripts or maintenance plans, and why. Share it with your manager and explain that you've set up automated index maintenance as a proactive step.

Having your manager know you’re taking the time to follow best practices certainly won’t hurt– and one of these days, it just might help you out. (Even if it’s just by keeping everyone from following a red herring.)

Time to level up

  1. Check out more of our free articles and videos about indexes
  2. See if Brent’s 90 minute training on The Best Free SQL Server Downloads is right for you
  3. Join us in person to learn How to Be a Senior DBA in 2014
  1. Hi,

    Firstly thanks to you and all of the team at Brent Ozar for all the SQL Server Knowledge you continue to share with the community.

    Not to prejudge its content, but I'm just wondering how your excellent blog aligns with Brent's forthcoming webcast “Why Index Fragmentation Doesn’t Matter”?

    Should we be rebuilding our indexes or not?

    Thanks,
    George

    • Haha! You'll have to attend Brent's webcast to see. I do agree that index maintenance is very often overhyped and misunderstood. I've found many cases where the maintenance has had a worse effect than the problem it's trying to solve.

      • Webcast times are awkward for me over here in Belfast, Northern Ireland but I always make sure I catch up with the recording of the session.

        Yes, I'll be very interested to find out what the definitive index maintenance solution is!

        Also by way of feedback the pricing of your online training videos is very reasonable and I hope to be able to convince my company to fork out for your index training ones real soon!

        • I did see Brent's webcast “Why Index Fragmentation Doesn’t Matter” yesterday and he makes a lot of good points. Turning off ongoing index maintenance is something I now need to give consideration to. But the information in your blog does seem to somewhat conflict. I guess the big question is: when does it make sense to defrag in a modern shared SAN environment which, as Brent put it, is fragmented by design?

          This actually is a question for Brent as well, but I am waiting for his blog on the webcast to be posted so I can ask the question there.

          • Hi Don. Let’s step back and ask, “What’s the problem you’re trying to solve?”

            During the webcast, I took an extreme position in order to get people to think about how aggressively they defrag/rebuild. Defrags and rebuilds aren’t free, as I explained in the video, so as you’re thinking about the frequency, make sure you’re paying the right price and getting the right result.

  2. Pingback: (SFTW) SQL Server Links 06/09/13 • John Sansom

  3. Thanks for the post!
    Could you please elaborate on the case of database mirroring or AlwaysOn Availability Groups?
    It’s clear that rebuilding generates a lot of transaction log activity that will take time to apply on the secondary, but is there an actual danger of breaking the mirroring?

    • Hey Alex,

      With all types of replication products I’ve worked with and heard of (database mirroring, availability groups, transactional replication, SAN replication, filter driver replication technologies), there’s always a way that you can push so much data churn into the system that your secondary gets so far behind that you have to reset everything and start over.

      Things get so bogged down that perhaps things *could* catch up, but you’re so far out of SLA that it’s faster to reinitialize/resnap/etc. The process of just handling a vast amount of “deltas” is worse than just starting over in these cases.

      For availability groups and mirroring, yes, I do know of specific cases where this scenario has been caused by index rebuilds. Of course the scenarios vary in terms of geographic distance between the partners, throughput, server and storage builds. But it’s not terribly uncommon.

      Kendra

      • I see, thanks.
        Would you recommend some settings to prevent it, perhaps limiting the parallelism of the rebuilds?
        Though I suppose the way to go for a new setup is first to monitor and see if any action is required.

        • Yeah, there’s lots you can tweak — you can leave a few minutes between commands so that things have time to catch up, do partition level rebuilds (offline, only if using partitioning), and timebox the whole operation so that you only do a smaller amount of work. And of course limiting what you’re targeting (by level of fragmentation, purpose) helps. And you could potentially reorganize for a while and then manually stop the command in some instances.
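
          If you're using Ola's scripts, here's a rough sketch of how some of those knobs map to IndexOptimize parameters (the values are just for illustration, tune them for your environment):

          EXECUTE dbo.IndexOptimize
              @Databases = 'USER_DATABASES',
              @Delay = 120,            -- seconds to pause between index commands
              @TimeLimit = 3600,       -- stop starting new commands after an hour
              @PartitionLevel = 'Y',   -- work partition by partition where possible
              @PageCountLevel = 5000;  -- skip small indexes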

          • Great points, thanks!

          • It’s been a while since my last comment — and since then I’ve found myself pretty much agreeing with the DBA described in the article. Fragmentation isn’t the problem (and even Brent has such an article).

            Due to the SCH-M locks, the I/O load, and the AlwaysOn Availability Groups issues that index maintenance causes, I decided to simply disable the index maintenance. As our system is 24/7 and highly concurrent, this actually improved performance (combined with the usual query tuning and offloading to readable replicas).

            However, it still bothers me — we now have high fragmentation in most of the main tables, which means we’re doing extra I/O and taking up extra RAM.
            But reducing that, temporarily, using index maintenance costs a lot of I/O by itself, and with locking and replica issues.

            Would you say that in this situation it would be best to just continue to skip index maintenance?
            Or perhaps only run short reorgs, which should have the least negative impact?

          • Hi Alex,

            We’re at the end of the allowable threaded comments/ “reply chain” for this particular comment. I can reply to this through the WordPress GUI, but WordPress will force you to start a new comment on this post if you’d like to respond. (Just an FYI in case the missing “reply” looks weird.)

            You're in a special place with your application. You've got complex high availability in place and 24×7 customers. Your comment basically says, “there’s no obvious choice here, only judgment calls.”

            That’s correct, and it’s you who has to make the judgment call. For me to tell you what’s right in your situation I’d need a lot more information, the type of detailed data that I’d only get over a multi-day consulting engagement.

            So I can’t tell you not to worry. It’s incredibly valuable to periodically reassess the situation and decide if it’s time to customize your maintenance to do something better. That’s what keeps you on the lookout for issues like index rebuilds potentially causing corruption in some versions of SQL Server (like we just found out about in the 2012 SP1 line).

  4. Kendra…you read my mind on my exact issue… as I just set up a BCP site with AlwaysOn and mirroring with transactional replication… it took the maintenance and introduced a whole bunch of issues with TLogs getting HUGE… and the rebuilds of indexes tanked me a few times… I have yet to dig myself out given I am still the new guy on the block and my learning is a daily event!

    Good Read for sure… thx

    -Zero

  5. I am a SysAdmin and a NetAdmin… which does NOT mean I’m a DBAdmin (despite what hiring software seems to think).

    Your article gave relevance to the TechNet articles and other articles written towards a DBA audience. It allowed me to know that I could Reorganize my database uh… things… tables? whatever–without taking down my monitoring software’s DB.

    So thanks for writing something a non-DBA could grok!

  6. I am testing index fragmentation on a 35GB database with my log size set to 8GB. I have used Ola Hallengren's scripts and tried custom scripts, but the transaction log is filling up. Do we need to increase the transaction log, or is there another way to make sure the transaction log doesn't fill up? Thanks for your response.

    • Padhu – yes, rebuilding indexes is a logged operation, so depending on your individual server’s setup, you may need to try using simple recovery mode, increasing the log size, doing transaction log backups more often, etc.

  7. Brent, thanks for your response. I do not want to set the database to simple recovery mode, as we would lose point-in-time recovery, and since this is an OLTP system it is critical to have point-in-time recovery. I am good to increase the size for the 35GB database, but I do have a database which is 600GB, and I believe we need around 100% of the database size for the transaction log, since indexes are dropped and re-created during the rebuild process. I did try taking multiple transaction log backups while the index defragmentation job was running, but it still didn't help. Your response and help is much appreciated.

  8. I've noticed my database's reserved space has started growing more, but the used space has stayed about the same, typically after we run our maintenance plan (index rebuild, reorg, update stats). One thing we did a while back is set our maintenance plan to run index rebuilds for two of our largest tables, where before it was running reorg operations. This was mostly because a reorg was taking a long time versus a rebuild. Could that have caused our sudden increase in reserved space?

    • Yes– absolutely. REBUILD requires temporary space in the datafile for building that nice, polished new structure whereas REORGANIZE fixes things up as it goes, and doesn’t need a bunch of extra space. It’s healthy to allow the database to keep that unused space in place so you don’t get into a deadly cycle of shrinking (which causes fragmentation) and then regrowing when you do maintenance… only to trigger another shrink that causes more fragmentation.

      There’s a good writeup of all the considerations around the temporary space issue in Books Online here: http://technet.microsoft.com/en-us/library/ms179542.aspx

      • Kendra,

        Thanks Kendra for the quick response. I figured the data files would grow when an index rebuild was being issued, but what's strange to me is our DB is made up of multiple data files (not sure if that matters): total size 1,301 GB reserved, 1,245 GB used; after we ran the index rebuild on the two tables it jumped to 1,317 GB reserved, 1,207 GB used. Wait, I think I figured it out… index 1 is 65 gigs, and index 2 is 77 gigs in size. When I force a rebuild it needs to expand the data file to accommodate that size, at which time it will initiate the rebuild. Is that accurate? Guess I'll revert the maintenance plan back to not force rebuilds.

  9. Another problem with rebuilds if you don't have a maintenance window – even with SQL Enterprise and online rebuilds: any transaction started under snapshot isolation before the index rebuild will fail with error 3961 if it accesses the rebuilt table again

    i.e.

    TX 1                              TX 2
    begin snapshot tran
    select data from T
                                      rebuild index on T online
                                      rebuild complete
    select data from T again: 3961

  10. To clarify the columns in the table above: begin snapshot tran, select data, and select data again are TX 1; rebuild index and rebuild complete are TX 2.

  11. Great tips out here. Good stuff!

  12. On rebuild completion, will it update stats? Does a reorg update stats also?

  13. We are thinking of purchasing an all flash array for our data center [for SQL, Exchange, VMware, etc.] and since flash is 20x faster than regular disk [and I know this may seem like a silly question]…..but is a re-org/rebuild still needed if your entire database is lightning fast even with heavy fragmentation?

    • Not a silly question at all!

      Even on traditional hardware, most of the read and write activity ends up being random on modern database systems due to the number of databases, the way storage is configured, and lots of things like that.

      Occasional defragmentation may still be desirable in some cases– it does compact pages and help eliminate wasted space on the pages that physical fragmentation can cause, which can still be valuable for making data more dense (and therefore making your memory more efficient, also). But I like where you're going in trying to minimize maintenance: that can absolutely help in maximizing performance, too!

  14. What is the real process behind rebuild and reorganize at the storage level?
    Which one is faster, and why?

    • Hi Manjesh. For questions like that, check out Books Online. You’d be amazed at what you can learn in the manual – it’s really well-written these days. For even more internal details, check out the book SQL Server 2012 Internals by Kalen Delaney & friends.

  15. Pingback: Índices – Fragmentação – Rebuild ou Reorganize | Thiago Timm

  16. I could not imagine what would happen to me if I let the indexes get to 75% fragmentation. It wouldn't be good, for sure, so I'm not getting it. Maybe because it's such a small database, I am not sure. I have larger tables than that.

    • If your databases aren’t large and most of your workload is random reads and writes (which is true for many OLTP databases), honestly you might not notice much of a difference. In the example I’m describing here, there was no measurable difference after all of the maintenance was standardized and fixed, and it wasn’t even a small database.

      • For what I would call our primary database, which is currently about 6.5TB, it processes around 500,000 claims a day, which come in in the form of files (X12). There is some OLTP, but much of what goes on is reporting. However, the picture is so much more complicated. I can point at so many issues. I have many times suggested we hire you guys to tell Management about these issues, because it would have more impact coming from an outsider.
        Regarding the indexes, if they get too fragmented on some of these large tables, the performance gets very bad. I'm talking 20%. Even if I do not create a covering index and a look-up is required, it can have a severe impact on some of these. So I end up with really too many indexes. So there are problems on top of problems; it gets somewhat complicated.

        So for re-indexing I run a job 3 days a week. On the two weekday night jobs the threshold for re-indexing is 20%, but it is 10% on Sunday mornings. I need a large transaction log drive, and I have put logic in the script to monitor the size and go to sleep if it reaches a certain size. Yes, I have no choice but to shrink it at that point. Oh, and yes, I do have table partitioning, and as a job step before I get into the primary script I reorg partitions. However, partitioning seems to be a performance hit for us, to be honest, and my testing has shown we get better performance if I remove the partitioning.

        One more thing: I have noticed that reorganizing seems to create more impact on replication than simply altering an index. I believe that is because when altering the index you can sort in TEMPDB, which is not replicated.

        So that’s part of my story and observation.

        • Wow, 20%! My guess would be that the index maintenance is improving performance by changing something other than the fragmentation.

          I did a couple videos on how index maintenance can have OTHER side effects that can really boost performance here: http://www.brentozar.com/archive/2014/04/index-fragmentation-bad-statistics-arent-always-problem-video/

          If that’s the case, there might be faster ways to handle that than index maintenance.

          If the index maintenance is working well for you then it's not the worst thing in the world. I don't want to make it sound like it's *bad*. But if the index maintenance starts taking too long and there are issues with running it in a 24×7 environment, then it might be interesting to look deeper.

          • Ok, I will check out the video. But tell me, doesn't index fragmentation = disk IO? Because there is no doubt my disks are overloaded. If I have 25% fragmentation in a table that is 1TB, that's a good deal of IO.

          • Not necessarily. Most OLTP databases already have random IO/lookups, and don’t heavily use sequential scans. If you’ve got more than one database with a file on the same drive, you’ve also got a random IO pattern even if they’re scanning and more than one is active. Brent goes even deeper in this video: http://www.brentozar.com/archive/2013/09/why-index-fragmentation-doesnt-matter-video/

            I do think that periodic index maintenance can help remove wasted empty space on pages. It’s just that historically index maintenance has been overly encouraged and credited with more benefits than it really has.

  17. Hi Kendra,

    We have instances with a very large number of DBs, and a couple of the DBs are big, close to 1TB.
    We are using Ola Hallengren's script for index maintenance, in which it first updates statistics for all DBs and then runs index maintenance. We also set a time limit of around 14 hours. So what happens is, within 14 hours it is able to finish only half of the DBs, and when it runs the next time, the same half of them gets done again, though there are many times it finishes all DBs.

    So I am thinking maybe run the maintenance script on two different schedules by splitting the DBs. I also had a thought of editing the script to move the DBs that are done to the bottom of the table where it sets the selected & completed bits, so that the next time it runs it will not rerun the same ones and will resume from where it left off last time.

    Have you come across anything like that, and what is your suggestion?

    Thank you in advance.
    Van

    • My first suggestion would be to question the assumption that you need to do index maintenance on 1 TB of data on a regular basis.

  18. Thanks for your quick response. Yes, we do need index maintenance on that big DB on a regular basis, since there is a lot of activity going on in that DB for the nightly batch process in a big financial firm. We do archive quarterly, but it is still too big.

    Also, on my previous comment there is a mistake: actually the Ola script updates the column statistics and, depending on the fragmentation level, runs the index rebuild or reorganize & updates statistics.

    Thank you
    Van

    • OK– so to be really blunt here, you aren’t saying that you know you really need to do all this reindexing. You believe you do, but you don’t know specifically what will happen if you don’t. So you’re hitting multiple terabytes with a massive amount of IO and it takes a while.

      This is a common mistake people make: they reindex out of fear. It’s going to take some careful analysis and work to figure out how to do less and get your maintenance done faster, and possibly some custom coding on your part. There’s no magic answer I can give to fix that.

  19. Thanks Kendra!

  20. I was excited to see the new WAIT_AT_LOW_PRIORITY option for index rebuilds in 2014, as we constantly have SCH-M / SCH-S blocking issues on our AlwaysOn secondary node when index rebuilds are happening on the primary and a long-running query is holding a SCH-S on the same object on the secondary.

    I ran a few tests today thinking that this new option would clear the blocking on the secondary, but it hasn't worked :( I used the following as a reference: http://www.sqldiablo.com/2012/08/01/alwayson-availability-groups-isolation-levels-selects-blocking-writers/ and did the following:

    ON SECONDARY KICK OFF LONG RUNNING QUERY:
    USE [AdventureWorks2012]

    select NumbersTable.Number
    from AdventureWorks2012.dbo.ErrorLog el
    cross apply (
        select Number
        from (
            select row_number() over (order by s1.name) as number
            from AdventureWorks2012.sys.sysobjects s1
            cross apply AdventureWorks2012.sys.sysobjects s2
            cross apply AdventureWorks2012.sys.sysobjects s3
            cross apply AdventureWorks2012.sys.sysobjects s4
            cross apply AdventureWorks2012.sys.sysobjects s5
        ) as InnerNumbersTable
    ) NumbersTable
    group by NumbersTable.Number
    order by NumbersTable.Number desc;

    ON PRIMARY REBUILD INDEX WITH NEW WAIT_AT_LOW_PRIORITY OPTION:
    USE [AdventureWorks2012]
    GO
    ALTER INDEX [NCI_ErrorNumber_ErrorSeverity_ErrorState] ON [dbo].[ErrorLog] REBUILD PARTITION = ALL
    WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION= 2 MINUTES, ABORT_AFTER_WAIT=BLOCKERS)));
    GO

    When I check the secondary I can see blocking, as the long-running query has a SCH-S lock and the index rebuild is waiting on a SCH-M lock.

    However…..

    I wait and wait but the blocking doesn’t clear. I was expecting the long running query to be killed after 2 minutes to allow the Index Rebuild to happen.

    Either this is not meant to resolve the AlwaysOn blocking issues or I am not doing the test right… Would be great to hear if anyone has managed to get this one to work.

    Thanks!

    • Ooo, great comment Jack!

      I believe that you are seeing the expected behavior and that the ALTER INDEX command’s new option is designed to only abort blockers on the primary: but I agree, I don’t think that’s clear in the documentation! (I tend to think of that new feature as being for things like partition switching in and out and not specifically for AGs, which is maybe why I’d never thought of it in the way you were testing.)

      • Thanks Kendra! However, I just died a little inside :( I was triumphantly telling our Report Team that 2014 has solved this issue and I just needed to run this test to confirm and we were set to upgrade non-production to 2014 from 2012.

        I would imagine there has to be a lot of people out there experiencing blocking on AlwaysOn for the same reason – what a shame the SQL Server Team didn’t take this opportunity to resolve it.

        I guess I will have to develop my own in house solution for this issue (pain). Maybe I’ll roll it out along with SQL Server 2014 ;-)

        • I hate to be the bad news bears, but just in case you haven’t seen it, there’s also this:

          http://www.brentozar.com/archive/2014/01/careful-adding-indexes-with-alwayson-availability-groups/

          Longrunning transactions on secondary replicas can also block ghost cleanup from running in the primary and cause other perf issues, so having some way to detect and alert on those queries can come in handy.

          • Yep – have a comment even on that post too :)

            Our analysts run queries on the secondary that can last hours (they have a standard of SQL I like to call horrorSQL).

            And as the source for the Report Team’s ETL’s is the Primary node they want us to run our index rebuilds during the day at the same time the analysts are online – hence why we hit this issue so much.

          • Got it. Honestly, in your situation, I would ask, “Why even rebuild?” I’d try to reorganize (as rarely as I can get away with) and perhaps supplement with periodic statistics updates if needed. Reorganize is still pretty darn good– yeah, it only addresses leaf level fragmentation, but it still compacts pages and reclaims wasted space in them, and that’s pretty important. Why not just do that?

            (Note: We’re at the end of allowed comment replies on this particular comment thread, so you may need to start a new comment chain to reply back.)

          • Added this AGENT JOB to the Secondary which will give me what I want :)

            DECLARE @BlockingSessionID int

            -- Get the SPID of a blocking session holding a SCH-S lock when a SCH-M lock is trying to execute
            SELECT @BlockingSessionID = blocking_session_id
            FROM sys.dm_exec_requests
            WHERE wait_type = 'LCK_M_SCH_M'
              AND blocking_session_id IS NOT NULL
              AND ((wait_time/1000)/60) >= 2

            -- Kill the session
            IF @BlockingSessionID IS NOT NULL
            BEGIN
                DECLARE @SQLCommand varchar(64)
                SET @SQLCommand = 'KILL ' + CAST(@BlockingSessionID as varchar(11))
                EXEC (@SQLCommand)
            END

            -- Returned to the user in SSMS:
            -- Msg 0, Level 11, State 0, Line 0
            -- A severe error occurred on the current command. The results, if any, should be discarded.

  21. Thanks again Kendra. I think you may be on to something. We have the following in place for our maintenance (from Ola's script):

    @FragmentationLow nvarchar(max) = NULL,
    @FragmentationMedium nvarchar(max) = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationHigh nvarchar(max) = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1 int = 30,
    @FragmentationLevel2 int = 70

    When I go through my log table I can see quite a lot of indexes over 70% fragmented that get REBUILT (I run the script across each DB once a week).

    Would you recommend looking at FILL FACTOR to try and get the fragmentation level down?

    Also I know the Report Team have Index Drop and Creates built into their ETL Loads – might be time for us to have a little chat :)

    Any final thoughts on Stats updates across a DataWarehouse? I see so much conflicting advice out there on this topic in a Very Large Table (Billions of Rows) environment.

    Your help today has been great!

    • Fillfactor might be an option. I don’t like to lower it across the board, but when you’ve got frequently fragmented indexes, that’s a sign it might work.

      I’d also look at the size of the fragmented indexes and consider bumping up @PageCountLevel. The default level of 1000 pages (~8MB) isn’t super high. I’d just be curious to know how many of the frequently fragmented tables are close to that.
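
      If you're curious, here's a quick sketch to eyeball which indexes sit near that threshold (run it in the database in question; the 500-2,000 page band here is arbitrary):

      SELECT OBJECT_NAME(ps.object_id) AS table_name,
          i.name AS index_name,
          SUM(ps.used_page_count) AS used_pages
      FROM sys.dm_db_partition_stats AS ps
      JOIN sys.indexes AS i
          ON ps.object_id = i.object_id
          AND ps.index_id = i.index_id
      GROUP BY ps.object_id, i.name
      HAVING SUM(ps.used_page_count) BETWEEN 500 AND 2000
      ORDER BY used_pages DESC;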

      For the ETL drop and create, man, that’s a hard one: it can be so much better to do that when loading data! But if it’s part of an AG, man, that makes it more rough. You’ve got to be in full recovery model and can’t get minimal logging, and all that data’s got to get churned over to the secondary server. For warehouses I always want to really know what the RPO and RTO are, and if there’s any way I can get away with simple recovery model for staging databases for performance reasons!

      On the stats update, yeah– it’s a hard one. I’ve seen billion row warehouses where stats were updated once a day right after data loads with default sampling and that was just fine, and others which were much more sensitive and needed more special treatment. It does vary quite a lot by environment.

      The biggest advice I’ve learned the hard way is not to get into updating stats with fullscan or adding filtered stats or super fine tuning parts of it unless you can clearly document what queries are improved by the change and by how much. If you can’t then it becomes difficult for others to support and understand. (Been there!)

      So really with the stats question, I would look at things like your top queries in sp_BlitzCache and work through tuning top queries and making them faster, and only worry about the stats as you work through problems and can document why you're making decisions. Later on that will help someone else (or even you) figure out if that maintenance choice is still the best one.

  22. Kendra, I would like to try and see how fragmentation affects disk reads. For instance, if I turn Statistics IO on and run a query, would I see changes in reads as fragmentation grows? Would I have to free the cache to do so? Perhaps the DMV sys.dm_io_virtual_file_stats would be the way to do that? Any ideas?

    Thanks

    • Oh, I think this might actually make a good post on its own. Here are some quick notes:

      If it compacts the pages, you’ll see fewer reads, because there’s simply fewer pages in the index. This is one of the reasons why I think it’s probably good for most folks with active systems to at least periodically run index maintenance, even if some people only need to do it quite rarely.

      You can use the avg_page_space_used_in_percent column in sys.dm_db_index_physical_stats to estimate how much space that would be if a large table was scanned.

      For queries you can run yourself: Statistics IO will break it down and show you logical reads, and which of those were physical vs read-ahead.

      For a dev system: For sys.dm_io_virtual_file_stats, that's physical reads, so for testing measured that way you'd have to run DBCC DROPCLEANBUFFERS (clean pages) and CHECKPOINT (dirty pages). <- Disclaimer for readers: be very careful, don't blow away your buffers in production, that'll impact your performance!

      For production: My favorite way of looking at this is actually asking, "How much space in my memory is wasted in the middle of the day?" (Or whatever your most active time is.) You can see the free space in memory in sys.dm_os_buffer_descriptors. There's a query you can modify for that easily in BOL here.

      Where it gets tricksy: You might actually see fewer reads due to an execution plan change brought on as a side effect, rather than the actual compaction of the pages. This makes showing it with a sample of a lot of queries in a production workload very difficult, unless you’re painstakingly comparing the plans and data hasn’t been coming in.
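
      If you want to try the dev-system approach, here's a bare-bones sketch (again, never on production; the table name is a placeholder):

      -- Dev/test only! Write dirty pages to disk, then empty the clean buffer pool.
      CHECKPOINT;
      DBCC DROPCLEANBUFFERS;

      SET STATISTICS IO ON;
      SELECT COUNT(*) FROM dbo.MyBigTable;  -- physical/read-ahead reads now show up
      SET STATISTICS IO OFF;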

      • Sorry to crash the thread, but I was intrigued by “How much space in my memory is wasted in the middle of the day?”

        Is it something like this?

        SELECT mem_space_wasted_mb = SUM(CONVERT(BIGINT, free_space_in_bytes)) / 1024 / 1024
            , mem_space_wasted_percent = SUM(CONVERT(BIGINT, free_space_in_bytes)) * 100.0 / COUNT(*) / 8096
        FROM sys.dm_os_buffer_descriptors

        Are there any rule-of-thumb numbers for how much these should be, perhaps by type of system? (I’m probably hoping for too much here…)
        It sounds to me that it could be perfectly reasonable for a system to “waste” a lot of memory on page free space as part of normal operations (even before taking into consideration cost of the I/O load that index maintenance generates).

        Thanks!

        • One interesting thing is that the wasted space is not entirely from index fragmentation: it might be from heaps (you can get weird trapped space in these), or it might be from having rebuilt a lot of indexes with a LOWER fillfactor.

          And you’re totally right that there’s always going to be some of this. When it gets to be around 25% of all the memory on the server, that’s where I start thinking “we can probably do better.” But sometimes index maintenance with a low fillfactor is actually more of a part of the cause than the solution, oddly enough!
