Last week, I blogged about the basics of SQL Server index fragmentation: why it happens, how to fix it, and how often people are fixing it. I left you with a cliffhanger: it seemed that the frequency of defrag jobs didn’t appear to affect fragmentation levels:
- Databases with no index defragmentation were an average of 5% fragmented
- Monthly – 17% fragmented
- Weekly – 3% fragmented
- Daily – 6% fragmented
At first glance, that would seem to indicate that your database got worse off if you defragmented! But like all good novels (and most bad ones), the plot thickens.
Enter Data Mining with Excel and SQL Server
Data mining is a lot like diamond mining, only there’s no monopoly on the market, and the ladies don’t seem to appreciate a quality KPI. Otherwise, they’re identical: there’s a whole lot of money in it, but that money doesn’t usually go to the people who do the actual mining. It goes to the executives and salespeople who take advantage of the mined products to make better decisions.
The people doing the mining, on the other hand, are forced to spend their lives in tiny, dark caves (or “cubicles”) trying to extract beautiful gems (or “data”) while risking painful lung ailments (or “carpal tunnel”) due to toiling with terribly unsafe and outdated hardware (or “hardware”).
For today’s demo, I will be the miner, and you’ll be the executive who takes advantage of my work. (It’s okay, I’m used to it – I work for a vendor now.)
In my podcast Data Mining with Excel in Four Minutes, I explained how to set up Microsoft’s free data mining add-ins for Excel 2007. It’s an Excel plugin that hooks up to any SQL Server Analysis Services server on your network, either SQL Server 2005 or 2008, and makes data mining a point-and-click affair. It doesn’t require high-end horsepower – even a desktop or laptop works great for this. If you can’t be bothered to set up an SSAS instance, then check out my Data Mining in the Cloud writeup on how to get started without using a server at all.
Help SSAS Help You: Explain Your Numbers
While data mining is really easy to set up, you can get much better results if you “prequalify” your data and turn some of the numbers into basic categories.
If I was working with United States salary data, for example, my source data might have a column for Hourly Wage. I would add another column and call it Tipped Employees:
- Under $6.55 per hour – Tipped Employees = Yes. You can pay someone less than minimum wage if they get tips, and in that case, you really just can’t go by their hourly wage alone.
- $6.55 per hour and over – Tipped Employees = “Unknown.” In a perfect world, I’d have enough data to find out if these people get tips, but that’s not always the case.
By adding a new attribute to my data, something that’s not clear from the numbers alone, I might get better insight from my data mining efforts.
By the way, if you’re reading this and it’s after July 2009, the minimum wage has risen to $7.25 per hour. If you’re a VB developer, you should immediately ask for a pay increase to match the new standard – unless of course they’ve got a tip jar by your desk.
Explaining Our Index Fragmentation Numbers
In the case of our index fragmentation numbers, one of the source data fields is Page Count – the number of pages that an object has. Size matters with fragmentation: small objects with only a handful of pages may appear to have very high fragmentation numbers, but they can’t actually be defragmented. There’s only so much defragmentation you can do when a table only has three pages. I’ve actually been on support escalation calls where customers demand to know why a defrag job doesn’t reduce all types of fragmentation to absolute zero, even for tables with just one page.
Microsoft’s best practices on SQL Server 2000 index defragmentation notes that:
“Generally, you should not be concerned with fragmentation levels of indexes with less than 1,000 pages. In the tests, indexes containing more than 10,000 pages realized performance gains, with the biggest gains on indexes with significantly more pages (greater than 50,000 pages).”
With that in mind, I added a Page Count Group column and calculated it with a formula:
=IF(Table1[[#This Row],[page_count]]>50000,"Large",(IF(Table1[[#This Row],[page_count]]<10000,"Small","Medium")))
That adds a text label for Small, Medium or Large depending on the size of the table.
Suddenly, The Data Makes More Sense

Fragmentation Pivot Table
Even before doing data mining, if we just add a Pivot Table, we can suddenly make more sense out of the numbers.
For Large tables, we see an average 44% fragmentation when the database has no defragmentation jobs set up. Monthly defrag drops that to 14%, and daily drops it to just 2%! The Weekly data is a bit of an outlier here, but it’s still less than no defrag jobs at all, so we’ll have to dig deeper.
For Medium tables, we see the type of data distribution we would hope for: the more often we defrag, the lower our fragmentation gets.
For Small tables, the data is all over the place, but we know why: it has to do with the way smaller tables behave.
Adding this bit of human interpretation helped us get better results from our data – and we haven’t even started mining!
More Reading on SQL Server Fragmentation
If you liked this article, check out:

Brent,
Great post and series! Looking forward to part 3.
The small / medium / large categories helps put priority and focus where it is needed most – kind of like "don't waste your time" / "look a little closer" / "really pay attention!".
While I understand the stated index size thresholds of 10,000 pages and 50,000 pages (and that page count is the metric which was collected), it may also be helpful to offer a relative perspective by stating these thresholds for reference purposes as storage capacity metrics of about 80 MB (10,000 pages x 8 KB page size) and about 400 MB (50,000 pages x 8 KB page size). The "about" caveat depends on whether your preferred storage metric unit measures are decimal (MB = 1,000,000 bytes or 10^6 bytes) or binary (MB = 1,048,576 bytes or 2^20 bytes) – roughly a 5% difference for MB and 7.3% for GB.
Scott R.
Brent, does this scale down at all?
I have 3 productions servers. Only one database has indexes that are over 10000 pages, only 3 others have indexes over 1000 pages.
I have scheduled jobs that defrag indexes on the main databases (HR, Accounts, CRM etc) as I know they cause slowdowns if they are not kept neat and tidy.
Doesnt the frag %age have a vote in whether the index should be rebuilt/reorganised? Or does a 500 page index that is 90% fragged not cause the problems that a 5000 page index that is 2% fragged …
Jonathan
Jonathan – everything is “Your mileage may vary.” Depends on the random speed of the drives, the read vs write mix of the data, the frequency that the data is queried, etc. My approach is to start with Michelle Ufford’s excellent index defrag script, set it with very high thresholds (like only rebuild indexes when things are extremely fragmented) and then see how long it takes to run. Then I turn it down lower and lower over time so that I defrag as much as possible in as little time as possible. Whatever performance gains I can get from it, great, but I don’t go to the individual table to look at fragmentation vs size vs read/write mix and so on.