Tag Archive: index

Webcast Recording – DBA Darwin Awards: Index Edition

Right after the DBA says “Uh oh,” they call in the consultants. You wouldn’t believe some of the stuff I’ve seen, but here’s the worst of it: some of these things are probably happening in your environment right now. If you’re a production DBA of any experience level, do yourself a favor and look for these common mistakes before they call in the consultants.

In this 30-minute video, we’ll discuss heaps, fill factor, maintenance plans, over-indexing, and the Database Tuning Advisor’s stupidity so you can learn what you’re doing wrong before your boss has to hear it from the outsider:

The links we discuss in the video:

Enjoyed this video?  Check out our upcoming webcasts.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Indexing for Deletes

If you’re only indexing to make reads faster, you need to think again. Indexes can make every operation in the database faster, even deletes.

The Problem: Deletes are Very Slow

I was lazily researching new development techniques one day when I received an email from a client asking why deletes could be slow. I rattled off a few quick possibilities and promised that I’d look into it as soon as I was able to. Due to the vagaries of travel, it took me more time than I’d expected to dig into the problem, but I found something interesting that I should have come to mind right from the start.

This database contains some hierarchical data. My initial thought was that there was a cascading delete taking place in the hierarchy. After some initial checks into cascading deletes, fragmentation, statistics, and triggers, I ran an actual delete on a test system and found something very interesting – almost all of the time spent deleting the row was spent performing a clustered index scan on a different table.

What Happens During a Delete?

When you try to delete a row, a few things happen. SQL Server says “OK, let’s make sure that we can actually delete this row, what else depends on it?” SQL Server will check for dependent rows by examining all foreign keys. It will then check any related tables for data. If there is an index, SQL Server will use that index to check for related data. If there isn’t an index, though, SQL Server will have to scan the table for data.

Deletes and Table Scans

Don’t believe me? Try this out yourself.

Make a new database. Copy data in from the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables in AdventureWorks. I use the Import Data wizard to quickly copy data from one database to another.

ALTER TABLE Sales.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader PRIMARY KEY (SalesOrderID);

ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT PK_SalesOrderDetail PRIMARY KEY (SalesOrderDetailID);

ALTER TABLE Sales.SalesOrderDetail
ADD CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader
FOREIGN KEY (SalesOrderID) REFERENCES Sales.SalesOrderHeader(SalesOrderID)
ON DELETE CASCADE;

With these three statements in place, we’re able to create a situation where SQL Server has to perform a full table scan just to delete a single row. Make sure you’ve told SQL Server to include the actual execution plan and run this:

DELETE FROM Sales.SalesOrderHeader
WHERE SalesOrderID = 51721;

Once that query runs, the execution plan is going to look a bit like the execution plan below. If you add it up, 99% of the work comes from finding the rows to delete in the SalesOrderDetail table and then actually deleting them.

Who knew that deleting a row was so much work?

Making Deletes Faster

How would we go about making deletes like this faster? By adding an index, of course. Astute readers will have noticed the missing index information in that execution plan I took a screenshot of. In this case, the missing index looks something like this:

CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesOrderID
ON Sales.SalesOrderDetail(SalesOrderID);

Before adding the index, the query had a cost of 2.35678. After adding the index, the delete has a cost of 0.0373635. To put it another way: adding one index made the delete operation 63 times faster. When you have a busy environment, even tiny changes like this one can make it faster to find and delete records in the database.

What’s All of This Mean?

When you’re looking into database performance problems, remember that you aren’t always reading just to return data to the user, sometimes you need to find data in order to delete it. Even when we’re trying to get rid of data, it can be helpful to have an index to make deletes go faster.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

Sargability: Why %string% Is Slow

People love to search.

Google has us addicted to fast, easy search functions.  Users expect every application to have a built-in blazing-fast search functionality.  To pull that off, developers build search queries that let users enter a string, and we ask SQL Server to find matches.  For example, say our users need to find some nuts.  We take their string, put percent signs on either side, and then search for that field in the product names:

SELECT *
FROM AdventureWorks.Production.Product
WHERE [Name] LIKE '%nut%'

This works great on our machine – but when we scale it up to hundreds or thousands of simultaneous users, or if we ramp up the number of products, our response time sucks.  Let’s find out why by checking the execution plan.  In SQL Server Management Studio, click Query, Display Estimated Execution Plan, and we get:

Query Execution Plan

Query Execution Plan

There’s only one operation, which would sound like a good thing, but that one operation is a clustered index scan.  That means SQL Server has to read every row out of the Product table, check to see whether it’s got “nut” anywhere in the name, and then return our results.

But wait – there’s an index on Production.Product for the Name field, and SQL Server will use that index if we have a slightly different query:

Less Wildcards, Same Nuts

Less Wildcards, Same Nuts

If we take the leading % wildcard out, notice that this time it does two operations.  First, it does an index seek against our Name index, then it looks up the corresponding rows in the base table (Product).  Two operations might sound more expensive, but if we looked at the total cost for each query, the second one would be faster in situations where we’re only pulling a small number of search results back.  If this table was for a company called Nothin’ But Nuts, on the other hand, we would probably still need to scan the entire table, but that’s a discussion for another day.

So why doesn’t SQL Server use the index for the %nut% query?  Pretend for a moment that you held a phone book in your hand, and I asked you to find everyone whose last name contains the letters HAM.  You would have to scan every single page in the phone book, because the results would include things like:

  • Beckham
  • Chambers
  • Hamilton
  • Thames

If, on the other hand, I asked you to find everyone whose last name began with the letters HAM, you could turn straight to the H page in the phone book, scan through a few lines, and quickly get the information you need.

When I asked you for everyone beginning with the letters HAM, my query was sargable.  When I asked you for all last names containing HAM anywhere in the name, my query was not sargable – meaning, you couldn’t leverage the indexes to do an index seek.  (Yes, sargable is sort of a real word – it stems from the phrase Search Arguments.)

Since we can’t talk users out of using search queries, and since non-sargable queries don’t scale, we need to find a better way to search for text.  That’s where SQL Server’s Full Text Search comes in.  Unfortunately, your queries must change – normally when you add an index, your queries just magically become faster.  Full text indexes don’t work that way.  You have to tweak your queries to use operators like CONTAINS instead of LIKE.  If you’re dealing with an old boat anchor of a database and the developers are long gone, you might be out of luck.

%String% is just one thing that will cause SQL Server to do slower scans instead of high-performing index seeks.  To learn more about sargability, check out:

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Index Fragmentation Findings: Part 2, Size Matters

Last week, I blogged about the basics of SQL Server index fragmentation: why it happens, how to fix it, and how often people are fixing it.  I left you with a cliffhanger: it seemed that the frequency of defrag jobs didn’t appear to affect fragmentation levels:

  • Databases with no index defragmentation were an average of 5% fragmented
  • Monthly – 17% fragmented
  • Weekly – 3% fragmented
  • Daily – 6% fragmented

At first glance, that would seem to indicate that your database got worse off if you defragmented! But like all good novels (and most bad ones), the plot thickens.

Enter Data Mining with Excel and SQL Server

Data Mining with Open Source Tools

Your Grandfather In His Cubicle

Data mining is a lot like diamond mining, only there’s no monopoly on the market, and the ladies don’t seem to appreciate a quality KPI.  Otherwise, they’re identical: there’s a whole lot of money in it, but that money doesn’t usually go to the people who do the actual mining.  It goes to the executives and salespeople who take advantage of the mined products to make better decisions.

The people doing the mining, on the other hand, are forced to spend their lives in tiny, dark caves (or “cubicles”) trying to extract beautiful gems (or “data”) while risking painful lung ailments (or “carpal tunnel”) due to toiling with terribly unsafe and outdated hardware (or “hardware”).

For today’s demo, I will be the miner, and you’ll be the executive who takes advantage of my work. (It’s okay, I’m used to it – I work for a vendor now.)

In my podcast Data Mining with Excel in Four Minutes, I explained how to set up Microsoft’s free data mining add-ins for Excel 2007.  It’s an Excel plugin that hooks up to any SQL Server Analysis Services server on your network, either SQL Server 2005 or 2008, and makes data mining a point-and-click affair.  It doesn’t require high-end horsepower – even a desktop or laptop works great for this.  If you can’t be bothered to set up an SSAS instance, then check out my Data Mining in the Cloud writeup on how to get started without using a server at all.

Help SSAS Help You: Explain Your Numbers

While data mining is really easy to set up, you can get much better results if you “prequalify” your data and turn some of the numbers into basic categories.

If I was working with United States salary data, for example, my source data might have a column for Hourly Wage.  I would add another column and call it Tipped Employees:

  • Under $6.55 per hour – Tipped Employees  = Yes.  You can pay someone less than minimum wage if they get tips, and in that case, you really just can’t go by their hourly wage alone.
  • $6.55 per hour and over – Tipped Employees = “Unknown.”  In a perfect world, I’d have enough data to find out if these people get tips, but that’s not always the case.

By adding a new attribute to my data, something that’s not clear from the numbers alone, I might get better insight from my data mining efforts.

By the way, if you’re reading this and it’s after July 2009, the minimum wage has risen to $7.25 per hour.  If you’re a VB developer, you should immediately ask for a pay increase to match the new standard – unless of course they’ve got a tip jar by your desk.

Explaining Our Index Fragmentation Numbers

In the case of our index fragmentation numbers, one of the source data fields is Page Count – the number of pages that an object has.  Size matters with fragmentation: small objects with only a handful of pages may appear to have very high fragmentation numbers, but they can’t actually be defragmented.  There’s only so much defragmentation you can do when a table only has three pages.  I’ve actually been on support escalation calls where customers demand to know why a defrag job doesn’t reduce all types of fragmentation to absolute zero, even for tables with just one page.

Microsoft’s best practices on SQL Server 2000 index defragmentation notes that:

“Generally, you should not be concerned with fragmentation levels of indexes with less than 1,000 pages. In the tests, indexes containing more than 10,000 pages realized performance gains, with the biggest gains on indexes with significantly more pages (greater than 50,000 pages).”

With that in mind, I added a Page Count Group column and calculated it with a formula:

=IF(Table1[[#This Row],[page_count]]>50000,"Large",(IF(Table1[[#This Row],[page_count]]<10000,"Small","Medium")))

That adds a text label for Small, Medium or Large depending on the size of the table.

Suddenly, The Data Makes More Sense

Fragmentation Pivot Table

Fragmentation Pivot Table

Even before doing data mining, if we just add a Pivot Table, we can suddenly make more sense out of the numbers.

For Large tables, we see an average 44% fragmentation when the database has no defragmentation jobs set up.  Monthly defrag drops that to 14%, and daily drops it to just 2%!  The Weekly data is a bit of an outlier here, but it’s still less than no defrag jobs at all, so we’ll have to dig deeper.

For Medium tables, we see the type of data distribution we would hope for: the more often we defrag, the lower our fragmentation gets.

For Small tables, the data is all over the place, but we know why: it has to do with the way smaller tables behave.

Adding this bit of human interpretation helped us get better results from our data – and we haven’t even started mining!

More Reading on SQL Server Fragmentation

If you liked this article, check out:

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Index Fragmentation Findings: Part 1, The Basics

We asked the community for index fragmentation information for data mining purposes, and the results are in – and interesting. Over the next couple of weeks, I’ll talk about some of the results we found in the field.

What Is SQL Server Fragmentation?

SQL Server stores data in pages.  To make things easy, think of them as being literally pages of paper.  Let’s say you’re the database, and I’m going to give you data to store.  You’ve got a pile of blank paper and a pen, and your job is to keep track of the names of the customers I say out loud.

I want my imaginary table to be stored with a clustered key on Last Name, First Name.

I start talking, and I say things like this:

  • Denny Cherry
  • Steve Ballmer
  • Bill Gates
  • Donald Farmer
  • Jason Massie
  • Tom LaRock
  • Tim Ford

Notice that I’m not going in alphabetical order – I’m just yelling them out as people buy my product.  People don’t buy it in alphabetical order.  (Looks like Denny is an early adopter – he’s probably the kind of guy who runs Windows 7 on his desktop already.)

Fragmented Pages

Fragmented Pages

You’re faced with two challenges:

  • How much blank space do you leave on each page in order to accommodate new people who will have last names between, say, Farmer and Ford?
  • When you run out of space on a page, how do you quickly rewrite that page to break up the data and leave more room?

There’s a lot of database concepts here like fill factor and page splits, but the problem we’re focusing on here today is fragmentation: what happens when these pages of paper are scattered all over the place with varying amounts of free space on them.

It gets more complicated with wider tables, tables that store more data than just first and last name.  The wider they are, the more likely they are to have complicated indexes – so you’ve got lots of pieces of paper to manage, duplicate copies of your data.

After a couple of hours of me yelling out names, two things are going to happen. First, you’re going to quit, and second, I’m going to have to figure out how to clean up the mess to improve performance.

Does It Really Matter?

Are we just being anal when we want our pages nice, clean and neat?  Do we really need to keep our cubicle clean?  (I’ve seen some of your workplaces, and no, pizza crumbs don’t go away by themselves.)

It turns out fragmentation matters – a lot.  Microsoft put out a whitepaper on SQL Server 2000 Index Defragmentation Best Practices, and it included performance statistics on the impact of fragmentation.  Here’s a quote:

“The workload performance increase realized in the small-scale environment ranged from 60 percent at the low level of fragmentation to more than 460 percent at the highest level of fragmentation. The workload performance increased realized for the large-scale environment ranged from 13 percent at the low fragmentation level to 40 percent at the medium fragmentation level.”

Index fragmentation slowed their systems down between 13% to 460%.  Ouch.

How Do We Fix SQL Server Fragmentation?

The solution is to regularly defragment or rebuild our indexes. As with anything else in SQL Server, there’s three ways to do that:

  • Use the maintenance plan wizard in SQL Server Management Studio. Microsoft includes a really simple wizard to automate this process, but it has a killer weakness.  It rebuilds every index, every time, regardless of the fragmentation level.  On databases under 10gb with long maintenance windows every night or weekend, this makes sense.  The larger the database grows and the smaller the maintenance window shrinks, this is no longer an option.  This is compounded by the fact that SQL Server Standard Edition can’t rebuild indexes online – it holds locks on the table while it works, locking out your users.
  • Write your own T-SQL scripts to do more custom logic. Using T-SQL commands, you can check first to see which indexes are fragmented, and only rebuild the ones that match your thresholds for reindexing or rebuilding.  Michelle Ufford aka SQLFool wrote an excellent index defrag script for SQL Server 2005/2008, and it’s available on SQLServerPedia’s code library.  She’s also doing a free webcast about index defragmentation on March 12.
  • Buy a third party index maintenance program. When something is difficult or impossible with the native SQL Server tools, vendors like Quest step in and build tools to make it easier.  Quest Capacity Manager adds even more functionality with things like reporting and maintenance windows: you can pick maintenance window times for each server, and Capacity Manager will automatically defrag as many indexes as possible during that time window and bail out before your production users return.
Another Silent Killer

Another Silent Killer

All of these solutions have their pros and cons, and sadly, many DBAs just don’t take any action at all.  Last year, I wrote a tech brief for Quest called Best Practices in Index Maintenance – Fighting the Silent Performance Killer, and I called it that because so many DBAs just don’t know how much performance they’re leaving on the table.  The cheesiness didn’t stop with the title, either – here’s the opening paragraph:

“Right now in your data center, a vicious, unforgiving villain is slowly draining the performance from your database servers. It strikes without warning and doesn’t leave a trace in event logs or messages. This killer is fragmentation: data that belongs together logically is not stored together physically.

I’m revisiting the topic now because in these tight budget times, we’re not getting the money we need to improve server performance, and we have to figure out how to get more value out of the hardware we already have.

So How Many DBAs Defragment Their Databases?

Before we start, I’d start by saying these results were only gathered from volunteers willing to send in their data. My hunch, based off the DBAs I spoke with, is that results are probably worse than this survey appears. When DBAs emailed me their results, and I asked if they were currently doing any index defrag maintenance, they said things like:

  • “Ummm…no. Just learning about it.”
  • “No, but I’ve been meaning to.”

So my guess, and this is just an unscientific guess, is that DBAs who aren’t doing index defragmentation were less likely to send in their index fragmentation statistics. Because of that, I won’t cover the percentages of who’s doing it and who’s not, because I don’t think it’s that accurate.

How Often Do They Defrag Indexes?

Answers were daily, weekly, monthly and not at all, which leads to the next question: how much of a difference does it make?  Let’s look at the average fragmentation percentages for all objects in the database:

  • Databases with no index defragmentation were an average of 5% fragmented
  • Monthly – 17% fragmented
  • Weekly – 3% fragmented
  • Daily – 6% fragmented

It’s like a looking at the results of a developer writing SQL after three Red Bulls – it doesn’t make any sense, and you just want to wad it up and throw it away.

But that’s where data mining comes in.  If we start analyzing the data and including more attributes about what we’re analyzing, it turns out it all makes perfect sense.

Continue to Part 2: My Study on SQL Server Index Fragmentation

Brent Ozar PLF Team

Brent Ozar PLF is a boutique consulting firm focused on understanding your environment and strategy. We partner with you to objectively identify pain points and develop solutions that align to your business goals. Your experience comes first; we share our knowledge and expertise to help you.

More Posts

Follow Me:
TwitterFacebookGoogle PlusYouTube