I bet you’re worried about your index fragmentation.
And I bet you’re wasting your time, missing the real point.
What is Index Fragmentation and How It Happens
Let’s take a step back for a second and pretend that your database is a phone book organized by last name, first name.
As people move into your city, we have to add them to the phone book. Ideally, each page has some empty space, and we govern that with the fill factor. When SQL Server rebuilds indexes, it uses the fill factor to decide how much free space to leave on each page. If there’s not enough free space, SQL Server has to do some rearranging – but it can’t exactly shove a brand new page in the middle of the phone book. The book’s already bound. We’ll have to tack more blank pages onto the end.
Problem #1 – Internal Fragmentation: We’ve got a newly added page with hardly any stuff on it.
Problem #2 – External Fragmentation: The phone book pages are out of order.
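To make the two problems concrete, here's a toy sketch of a page split. It is not how SQL Server's storage engine is actually implemented: the four-names-per-page capacity, the `insert_name` helper, and the sample names are all made up for illustration.

```python
PAGE_CAPACITY = 4  # hypothetical: how many names fit on one page


def insert_name(pages, order, name):
    """Insert a name into the phone book.

    pages: list of pages in PHYSICAL order (each page is a sorted list of names)
    order: page indexes in LOGICAL (alphabetical) order
    """
    # Find the first logical page whose last name sorts at or after the new name.
    target = order[-1]
    for idx in order:
        if pages[idx] and name <= pages[idx][-1]:
            target = idx
            break

    page = pages[target]
    page.append(name)
    page.sort()

    if len(page) > PAGE_CAPACITY:
        # Page split: the book is already bound, so the overflow goes onto a
        # new page tacked onto the physical end of the book.
        half = len(page) // 2
        new_page = page[half:]
        del page[half:]
        pages.append(new_page)
        order.insert(order.index(target) + 1, len(pages) - 1)


pages = [["Adams", "Baker", "Clark", "Davis"], ["Evans", "Ford", "Green", "Hill"]]
order = [0, 1]
insert_name(pages, order, "Brown")

print(pages)  # pages 0 and 2 are now only partly full (internal fragmentation)
print(order)  # [0, 2, 1]: logical order no longer matches physical order (external)
```

One insert into a full page leaves two half-empty pages and puts the book's pages out of order, which is both problems at once.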
Updating people in our phone book causes problems too. When Pat Down marries Phil McCann, and we update her last name in the phone book, we leave empty space in the D section and cram a new record into the M’s. Deletes cause problems by leaving empty space behind (internal fragmentation).
How Fragmentation Hurts SQL Server Performance
Bad internal fragmentation (having lots of free space on the pages) means the index is bigger than it needs to be. Instead of our phone book having 1,000 pages that are 100% full, we might have roughly 1,100 pages that are only about 90% full. This means every time we need to scan the index, it’ll take about 10% longer (1,100 pages instead of 1,000). This also means we need more memory to cache the same amount of data – because SQL Server has to cache the empty space on each page. Our lowest unit of caching is a single page, not a record.
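The page math is simple division, and a rough sketch shows where the round numbers come from. The `pages_needed` and `cache_mb` helpers are made up for this example; the 8 KB page size is SQL Server's standard data page size.

```python
import math

PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def pages_needed(full_pages, fill_fraction):
    """Pages required to hold the data from `full_pages` completely full pages
    when each page is only `fill_fraction` full (0 < fill_fraction <= 1)."""
    return math.ceil(full_pages / fill_fraction)


def cache_mb(page_count):
    """Memory needed to cache that many pages, empty space included."""
    return page_count * PAGE_SIZE_KB / 1024


for fill in (1.0, 0.9, 0.5):
    n = pages_needed(1000, fill)
    print(f"{fill:.0%} full: {n} pages, {cache_mb(n):.1f} MB to cache")
```

At 90% full this works out to 1,112 pages, a little over the rounded 1,100 in the text, and the cache footprint grows by the same ratio.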
Bad external fragmentation (having shuffled pages on disk) means our storage performance could be slower. If magnetic hard drives have to jump around to different areas of the drive, their performance suffers – a lot. Take this recent Toshiba enterprise drive review by StorageReview – it gets around 200MB/sec for large sequential reads, but under 2MB/sec for random reads. Ouch.
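To put those throughput numbers in perspective, here's a back-of-the-envelope calculation. The 1,000 MB index size is an arbitrary assumption, and since the review figure is "under 2MB/sec", the random-read time below is a best case.

```python
def seconds_to_read(size_mb, mb_per_sec):
    """Time to read `size_mb` megabytes at a steady throughput."""
    return size_mb / mb_per_sec


INDEX_SIZE_MB = 1000  # hypothetical index size

sequential = seconds_to_read(INDEX_SIZE_MB, 200)  # large sequential reads
random_io = seconds_to_read(INDEX_SIZE_MB, 2)     # random reads, best case

print(f"sequential: {sequential:.0f} s, random: {random_io:.0f} s")
print(f"random is {random_io / sequential:.0f}x slower")
```

That's 5 seconds versus more than 8 minutes for the same data on the same drive.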
Fixing Index Fragmentation Temporarily
If you can’t cache the database in memory, or you want to fix it on disk anyway, you can solve it by rebuilding or defragmenting the index. Most folks do this with maintenance plans, but those have a nasty problem. They rebuild (or defrag) every single index in the database, every time, whether it’s necessary or not. The maintenance plans ignore whether the table’s even had a single write since the last time it was maintained.
This is a problem because rebuilding and defragmenting indexes causes SQL Server to write to the transaction log. The more we write to the logs, the longer our log backups take, the more we have to push across the network wire for database mirroring or log shipping, and the longer restores take.
We might even be doing more damage, too. Some DBAs decide they want to fix fragmentation by setting a low fill factor, say 50%. Half of every page would then be empty – so inserts would be blazing fast. Reads, however, would be twice as slow: to scan the entire phone book, we’d have twice as many pages to read. Tweaking fill factor is a dangerous dance, much like the ones I did in high school. (True story: broke my nose slam dancing.) We might be forcing more empty space on each page every time we rebuild when we don’t need to.
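The fill factor tradeoff can be sketched in a few lines. The `fill_factor_tradeoff` helper and the 100-rows-per-page figure are assumptions for illustration; real row counts depend on row size.

```python
def fill_factor_tradeoff(rows_per_full_page, fill_factor_pct):
    """Return (free row slots left on each page after a rebuild,
    scan cost multiplier versus a 100% full index)."""
    rows_at_rebuild = rows_per_full_page * fill_factor_pct // 100
    free_slots = rows_per_full_page - rows_at_rebuild
    scan_multiplier = 100 / fill_factor_pct
    return free_slots, scan_multiplier


for pct in (100, 90, 50):
    free, cost = fill_factor_tradeoff(100, pct)
    print(f"fill factor {pct}%: {free} free slots per page, scans cost {cost:.2f}x")
```

Every free slot bought for future inserts is paid for on every scan, whether or not those inserts ever show up.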
Fix Index Fragmentation Permanently
Start by trying to cache your database – or at least the data that’s frequently accessed – in memory. External fragmentation (out-of-order pages on disk) doesn’t matter as much when we don’t have to hit the disks to begin with. The difference between physically fragmented disk throughput and physically unfragmented disk throughput is minuscule compared to the speed difference between memory and disk. Cache it and be done with it – 384GB of memory is just $5-$6k.
Next, dig into what your storage system is really doing with your data. If you’re using shared storage like EMC, NetApp, or Dell gear that shares drives between a bunch of different servers, then all of your drive access will be random anyway. Your hard drives are shared with other servers that are also making drive requests at the same time, so the drives will always be jumping all over the place to get data. Defragging your indexes is just meaningless busy work.
Find out what your queries are waiting on – and identify what problem you’re really trying to fix. During our SQL Server health checks, we rarely say, “Ah, yes, here’s the problem – your data is fragmented.” We DBAs are just used to defragmenting the bejeezus out of our databases because it’s one of the few problems we can fix easily, and it gives us simple numbers that show whether our indexes have become less fragmented. Find the real root cause, and then attack that.
And if you discover that the root problem really is fragmented indexes – and it just might be – use Ola Hallengren’s free database maintenance scripts. They check the fragmentation level of each index before they start – and then only defrag or rebuild the indexes that meet your thresholds. This keeps your log file activity as light as possible while achieving the performance improvement you’re after.
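The check-before-you-act idea looks roughly like this. This is not Ola Hallengren's code or his parameter names; the `choose_action` helper, the 5% and 30% thresholds, and the 1,000-page minimum are illustrative assumptions you'd tune for your own environment.

```python
def choose_action(fragmentation_pct, page_count,
                  defrag_at=5.0, rebuild_at=30.0, min_pages=1000):
    """Decide what to do with one index based on measured fragmentation.

    All thresholds here are hypothetical defaults for this sketch.
    """
    if page_count < min_pages or fragmentation_pct < defrag_at:
        return "skip"      # tiny or barely fragmented: leave it alone
    if fragmentation_pct < rebuild_at:
        return "defrag"    # moderate fragmentation: the lighter operation
    return "rebuild"       # heavy fragmentation: full rebuild


# Hypothetical measurements: (index name, fragmentation %, page count)
indexes = [
    ("IX_People_LastName", 2.0, 50000),
    ("IX_People_City", 12.0, 50000),
    ("IX_People_Phone", 45.0, 50000),
    ("IX_Lookup_Tiny", 80.0, 200),
]

for name, frag, pages in indexes:
    print(name, "->", choose_action(frag, pages))
```

Only two of the four indexes get touched, so only those two generate transaction log activity.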