Let’s have a chat about your indexes in SQL Server. Indexes can be confusing. They are the easiest way to boost performance of queries, but there are so many options and terms around indexes that it can be hard to keep track of them. Let’s chat about the differences between clustered and nonclustered indexes, key columns and nonkey columns, and covering and non-covering indexes.
Clustered vs. Nonclustered Indexes
Clustered index: a SQL Server index that sorts and stores data rows in a table, based on key values.
The basic syntax to create a clustered index is
1 |
CREATE CLUSTERED INDEX Index_Name ON Schema.TableName(Column); |
Nonclustered index: a SQL Server index which contains a key value and a pointer to the data in the heap or clustered index.
The basic syntax for a nonclustered index is
1 |
CREATE INDEX Index_Name ON Schema.TableName(Column); |
The difference between clustered and nonclustered SQL Server indexes is that a clustered index controls the physical order of the data pages. The data pages of a clustered index will always include all the columns in the table, even if you only create the index on one column. The column(s) you specify as key columns affect how the pages are stored in the B-tree index structure. A nonclustered index does not affect the ordering and storing of the data.
A B-tree structure has at least two levels: the root and the leaves. If there are enough records, intermediate levels may be added as well. Clustered index leaf-level pages contain the data in the table. Nonclustered index leaf-level pages contain the key value and a pointer to the data row in the clustered index or heap.

fine art, folks.
There are a few limits to indexes.
- There can be only one clustered index per table.
- SQL Server supports up to 999 nonclustered indexes per table.
- An index – clustered or nonclustered – can be a maximum of 16 columns and 900 bytes.
These are limits, not goals. Every index you create will take up space in your database. The index will also need to be modified when inserts, updates, and deletes are performed. This will lead to CPU and disk overhead, so craft indexes carefully and test them thoroughly.
Primary Key as a Clustered Index
Primary key: a constraint to enforce uniqueness in a table. The primary key columns cannot hold NULL values.
In SQL Server, when you create a primary key on a table, if a clustered index is not defined and a nonclustered index is not specified, a unique clustered index is created to enforce the constraint. However, there is no guarantee that this is the best choice for a clustered index for that table. Make sure you are carefully considering this in your indexing strategy.
Key vs. Nonkey Columns
Key columns: the columns specified to create a clustered or nonclustered index.
Nonkey columns: columns added to the INCLUDE clause of a nonclustered index.
The basic syntax to create a nonclustered index with nonkey columns is:
1 |
CREATE INDEX Index_Name ON Schema.TableName(Column) INCLUDE (ColumnA, ColumnB); |
A column cannot be both a key and a non-key. It is either a key column or a non-key, included column.
The difference lies in where the data about the column is stored in the B-tree. Clustered and nonclustered key columns are stored at every level of the index – the columns appear on the leaf and all intermediate levels. A nonkey column will only be stored at the leaf level, however.
There are benefits to using non-key columns.
- Columns can be accessed with an index scan.
- Data types not allowed in key columns are allowed in nonkey columns. All data types but text, ntext, and image are allowed.
- Included columns do not count against the 900 byte index key limit enforced by SQL Server.
Covering Indexes
Covering index: all columns returned in a query are in the index, so no additional reads are required to get the data.
A covering index will reduce the IO operations, and improve performance of queries.
Let’s create a query and compare two indexes. I’m creating these on the Product.Products table in the AdventureWorks2012 database.
The query we want to use is
1 2 3 |
SELECT ProductID, Name, ProductNumber, Color FROM dbo.Products WHERE Color = 'Black'; |
The first index is nonclustered, with two key columns:
1 |
CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name, ProductNumber); |
The second is also nonclustered, with two key columns and three nonkey columns:
1 2 |
CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber) INCLUDE (Color, Class, Style); |
In this case, the first index would not be a covering index for that query. The second index would be a covering index for that specific query.
Indexes Can Be Magic, Except Whey They Aren’t
Indexes are the easiest thing to add to your database to boost performance. However, too much of a good thing can be bad. When designing a database, or troubleshooting poor performance, consider all your index options and carefully test them.
55 Comments. Leave new
Thank you for the article short and to the point.
There is a lot more about indexes but you cover what really matters from the point of view of a developer.
If you allow me I would add some information about when not to indexing:
Small tables (a table scan is faster an index are a waste of resources)
A lot of Changes Update, Delete and insert on the table
FQR
Frank
I have read about SQL Server physical Architecture. But I am bit confused with allocation units.
can you please tell me what exactly an allocation unit is and for what it is used for?
Thank you.
That’s not covered in this blog post. I’m going to direct you to “Table and Index Organization”.
Thank you so much.i will check this link.
i have already read the page which you have mentioned in the previous post. Anyways thank you for posting.
Good article on indexes but I have one point of contention. On a clustered table, a non-clustered index has the key value and a copy of the clustering key; it is a lookup value and not a pointer. For heaps it does have the key value and the RID pointer. Seems like I’m being petty, I know but when designing indexes this little tidbit is really important to understand as clustering keys get heavily replicated throughout non-clustered indexes. Can you tell that this is a hot-button of mine?
Thanks for that clarification Rick!
Very good and easy to follow explanation and very nice diagram.
Hi Jes,
If you want to be technical, please note:
B-Tree structure is not used for indexes in SQL Server, B + Tree structure is used instead.
Regards,
Pavel
Pavel – yes, you are correct. Thanks for the clarification!
Very well explained!
Last two SQL Create requests look similar to me
CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name, ProductNumber);
CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)
Are we not going to have two identical indexes with different names?
Ignore it.
INCLUDE (Color, Class, Style);
was missing.
Hi,
I am bit confused with the statement “A column cannot be both a key and a non-key. It is either a key column or a non-key, included column.” Can you please elaborate it a bit with some example to make me clear my understanding.
Thanks,
Som
Sure! Let’s look at the sample later in the blog post.
CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)
INCLUDE (Color, Class, Style);
In this case, Name and ProductNumber are key columns. Color, Class, and Style are non-key columns – they are “included”, but not part of the leaf level.
We couldn’t make this index happen:
CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)
INCLUDE (Name, Color);
In that example, Name is in the both the key and non-key column lists, and SQL Server would throw an error.
Does that help?
Thanks, now it makes me understand clearly 🙂 Appreciate your quick response
Hi Brent,
Please confirm do include columns increase the size of index as they reside at leaf pages, if not then why?
Regards,
Neeraj
Include columns reside at the leaf level, but they do not count towards the number of key columns allowed, or the size of the nonclustered index. More information can be found at http://msdn.microsoft.com/en-us/library/ms190806.aspx.
Hi Jess,
Thanks for answering. However, as per the link it seems that column specified in include column increase the size of the index. Below is the statement
The total size of all nonkey columns is limited only by the size of the columns specified in the INCLUDE clause;
Also, i checked the sp_spaceused with both index with and without included columns and found that include column increase the size of index.
Request you to help how SQl server compute the size of include column with example.
Thanks
Adding non-key (included) columns does not count towards the maximum size of the non-clustered index, which is limited to 900 bytes. Yes, it will increase the size of the index itself. You can run these queries to see the size of the index increase.
USE AdventureWorks2012;
GO
CREATE INDEX IX_Products_ProductNumber ON dbo.Products(ProductNumber);
CREATE INDEX IX_Products_ProductNumber_Includes ON dbo.Products(ProductNumber)
INCLUDE (Name, SafetyStockLevel, ReorderPoint);
SELECT database_id, DDIPS.object_id, object_name(DDIPS.object_id) as table_name, IDX.name as index_name, DDIPS.index_id, index_type_desc, alloc_unit_type_desc, page_count, page_count * 8 as index_size_KB
FROM sys.dm_db_index_physical_stats(DB_ID(‘AdventureWorks2012’), OBJECT_ID(‘dbo.Products’), NULL, NULL, ‘LIMITED’) DDIPS
INNER JOIN sys.objects OBJ ON OBJ.object_id = DDIPS.object_id
INNER JOIN sys.indexes IDX ON IDX.object_id = OBJ.object_id
AND DDIPS.index_id = IDX.index_id;
The page count for IX_Products_ProductNumber_Includes is indeed larger. There are two distinct sizes to an index – the size of the index key, and the size of the index itself.
Thanks, it’s clear now.
Hi Jes,
you are correct if you say that INCLUDE will not affect the size of the index. But that’s only because INCLUDED cols will not be part of the B-Tree entries.
That is a very important issue to know. If you have really large relations a wrong definition of an index can increase the IO significantly.
Just as an example. I have a relation called companies and a CostCenter is stored for each company. Most of the queries run again the CostCenter only but Name and TaxNo should be selected, too:
SELECT Id, Name, TaxNo, CostCenter FROM dbo.tbl_Companies WHERE …
To prevent a lookup in the CLUSTERED KEY i will create an index which covers the predicate and the attributes which should be displayed. (Id is the key attribute of the clustered index).
— Now I create a covering index:
CREATE INDEX ix_tbl_Companies_Covered ON dbo.tbl_Companies (CostCenter, Name, TaxNo);
And I create a “similiar” index but include the Name and TaxNo (because I’m not searching for it!)
CREATE INDEX ix_tbl_Companies_Covered ON dbo.tbl_Companies (CostCenter) INCLUDE ( Name, TaxNo);
If I check the number of pages in the B-Tree it will be much more pages with covered index but with INCLUDED index.
The number of pages in the leaf will in both scenarios the same!
— view is available since 2012!
SELECT index_id,
page_type,
COUNT(next_page_page_id) AS NumPages
FROM sys.dm_db_database_page_allocations
(db_id(), OBJECT_ID(‘dbo.tbl_Companies’), IndexNumber, NULL, ‘DETAILED’)
GROUP BY
index_id,
page_type
BTW: A really good explained article concering indexes – Thumbs up!!!
Yep, absolutely correct observation. Thanks for the example!
Sorry,
i’m a bit confused. Please could you clarify these two points:
1 – What’s the difference between B+tree and B-tree; and how is related to clustered and non-clustered indexes;
2 – “INCLUDED cols will not be part of the B-Tree entries.” So i expect into the example of Uwe Ricken that, the first index (i.e.: the one without the INCLUDE) would require more pages. But seems to be the opposite.
Thank you v much
alessaless
B+ and B- is far beyond the scope of this blog post. You can read about those more at http://en.wikipedia.org/wiki/B%2B_tree and http://en.wikipedia.org/wiki/B-tree.
I’m really very new to the whole “art of indexing” as I call it. I’m trying to understand the whole concept. Your diagram & explanation are beginning to sink in. Thanks.
My only question is regarding your very last line of the article, Jes. “…consider all your index options and carefully test them.”
What “index options” do you first look for when you are looking at a table.
And also, “…carefully test them.”. When you say that, how do you test or how do you go abouts in determining if an index was helpful or not?
Thanks and I look forward to hearing from you!
John, we have a ton of resources on our index page – https://www.brentozar.com/sql/index-all-about-sql-server-indexes/.
Index options are clustered, nonclustered, using key columns, and even things like indexed views and filtered indexes.
There are two DMVs that can help you determine if indexes are being used – sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats. A general methodology: Run index stats for a table. Apply an index. Run some queries against it that would represent your typical load. Check the stats again to see if the new indexes are being used. Make sure adding or removing indexes won’t make other queries or parts of your applications slow.
I hope this helps! It’s a big topic, and there is a lot to learn. Take a step at a time, and you’ll be performance tuning with indexes in no time at all!
Hi Jes,
My understanding of indexes was that we should index on attributes used in ‘where’ clause & ‘joins’ and if you want to have covering indexes, use the attributes in the ‘select’ clause in the include column.
Is my understanding correct?
In this scenario, won’t the below index be covering instead of IX_Products_Name_ProductNumber_ColorClassStyle:
CREATE INDEX IX_Products_Color ON dbo.Products(Color)
INCLUDE (ProductID, Name, ProductNumber, Class, Style);
Smiles,
Shankar
Shankar, you’re correct – you want to index on columns in joins and the where clause.
There are several ways you could write this covering index – depending on existing indexes, recommended missing indexes, and existing queries.
Hi Jes!
Great write up, it has clarified some things for me. The one thing that’s a bit confusing to me is the terminology ” Clustered Index = sorts and stores data rows in a table” or I have also heard, that the clustered index is the table. So the way I understand this is if I have a table and it has a clustered index, it’s a clustered table and the data is stored in sorted matter. If it does not have cluster it’s a heap so it’s not sorted. With this being said, moving on to Column Store indexes, they can only be non-clustered which means the data is not sorted so while we no longer have to maintain additional non-clustered indexes if you have a column store index, you should still have a clustered index so the table will store the data sorted? Hope this makes sense! Thank you!
You are correct on the difference between a heap and a clustered index table. Unless there is a proven performance benefit in favor of a heap, I would recommend having a clustered index on every table.
Columnstore indexes are simply another type of nonclustered index. They store the data on a column basis, rather than a row basis like traditional nonclustered indexes. One thing to note is that adding a columnstore index to a table will make it read-only, so adding or changing data after a columnstore index is added becomes much harder.
Thanks Jes!
We began using columnstore indexes for our data warehouse where our data is read only with the exception of loading in which case we are dropping and recreating the columnstore index. We just noticed in the execution plans that it’s not using the clustered index, it’s only using the columnstore index so that is what brought up the question. I agree that keeping the cluster is a good call since it physically stores the sorted data even though we might not see it being used. Next question is why the execution plan now displays that 0 partitions are being scanned with a column store index. When we only have the clustered index, we do see partition scans. I believe it has something to do with the segmentation elimination, but I’m still researching and experimenting at this point. This may be out of scope for this specific topic, but if you have any pointers I would greatly appreciate it and we can take it offline if necessary. Thanks again!!
Fine article!
Do you a similar article about MySQL indexes. I tried to search on your site but couldn’t find anything.
Sunil, I don’t. The best I can do is point you to the MySQL documentation – http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html.
Ok. Thanks.
Hi Jes, if the SELECT statement contains most table columns and the execution plan is showing a KEY LOOKUP due to the WHERE clause using a non-clustered index column, is it wise to INCLUDE all these columns in the non-clustered index? It seems that I’m duplicating most of the data from the CLUSTERED INDEX to avoid the lookup? I tried researching online what the guidelines are and couldn’t really find anything…
Thanks!
That’s going to vary in every situation, based on the data types and sizes of the columns. Does the size of the index outweigh the benefit of having it? How much faster is the query with this nonclustered index? What is the cost? Is the I/O reduced? All of these things have to be taken into account.
Is there a way to tell if Index with covering columns effectively being utilized or is it doing table lookups?
Say, I have a tblA with contract_no (pk), …. dealer_no…
Have index idx_dealer_with_include_cols on dealer_no with few included columns…
I want to know how effective is my index? else should I trim it to just keep 1 column ie. dealer_no
any DMV query to figure this out?
Trying the understand sp_BlitzIndex output for the below:
dbo.ContractTBL.idx_refund_date (88)
[1 KEY] Refund_Date {datetime 8} [1 KEY] Contract_Number {varchar 18} 90
Reads: 2 (2 seek) Writes:48,698
1,752,982 singleton lookups; 1 scans/seeks; 0 deletes; 0 updates;
9,219,273xx rows; 3.0GB Lock escalation attempts: 15,510; Actual Escalations: 278.
I see it say 2 reads but again says 1,752,982 singleton lookups – what does this mean?
Essentially trying to understand ‘Usage Stats’ and ‘Op Stats’ ; Any link with explanation?
Thanks in advance..
Our http://BrentOzar.com/BlitzIndex page does give more information about these columns: https://www.brentozar.com/blitzindex/sp_blitzindex-workaholic-indexes/. Have you looked at your top resource-intensive queries in the execution plan cache to see if this index is used in one of them?
Maybe it’s just getting to late in the day but I don’t really see an answer to this question on either of those links, has the information been removed since then? Trying to understand the difference between the Usage Stats and the Op Stats.
Have you looked at what they return?
SELECT
*
FROM
[sys].[dm_db_index_usage_stats] [ddius]
CROSS APPLY [sys].[dm_db_index_operational_stats]
([ddius].[database_id],
[ddius].[object_id],
[ddius].[index_id],
NULL) [ddios];
Usage stats is the number of times the index has shown up in an execution plan.
Operational stats is the number of times the index was actually touched.
So say you’ve got a key lookup, and it was hit 100 times – usage stats will show 1 hit, operational stats will show 100.
Excellent explaination Jes !
It’s really that the KEY of an index can be 900 bytes OR 16 columns – whichever comes first. But, even that’s not always true. If a table has an XML column then the PK can have a maximum of 15 columns AND the PK *must* be the CL key.
Why are Op Stats all 0’s and yet Usage Stats show 29 seeks?
Usage Stats Op Stats
Reads: 29 (29 seek) Writes:14 0 singleton lookups; 0 scans/seeks; 0 deletes; 0 updates;
David – when you’ve got questions unrelated to what was covered in a blog post, head on over to http://dba.stackexchange.com. Be as specific as you can in the question, like give examples.
Good article on indexes.
Excellent explanation Jes Schultz Borland .
Thanks,
Dasteswarudu D.
Hello there,
this is a bit old post. However I was reading it now and would have question:
Are you sure, that that last example, for covering index, is good?
You are using query:
SELECT ProductID, Name, ProductNumber, Color
FROM dbo.Products
WHERE Color = ‘Black’;
And as covering index:
CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)
INCLUDE (Color, Class, Style);
In that index, you are missing ProductID column. From that, what I understand what covering index is, this isn’t covering inde. To go further, SQL would be doing index scan on this, correct? To make it correct covering index, shouldn’t it be adjusted to this:
CREATE INDEX IX_Products_Name_ProductNumber_ProductIDColorClassStyle ON dbo.Products(Name, ProductNumber)
INCLUDE (ProductID, Color, Class, Style);
And to make it covering index only for this specific query, like this:
CREATE INDEX IX_Products_Name_ProductNumber_ColorProductID ON dbo.Products(Name, ProductNumber)
INCLUDE (Color, ProductID);
Thanks,
Jan
Or, if you are counting with secret column from clustered index, then OK, it is covering that query :). But wasn’t that obvious directly …
Jan
We love you Brent. Thank you for clear and simple and easily digestible explanations of all things SQL Server. This post is terrific!
Awww, you’re welcome!
recently I observed a design issue(?) where I found NC index has been designed to INCLUDE Clustered Key. I am curious to know how does it make sense designing NC index this way?
Sure, click Training at the top of the site and watch my totally free How to Think Like the Engine to understand how that works.
Brent, when a clustered index is used on a plan but Key Lookup occurs, is it advisable to recreate the clustered index with the columns to the index (create with include) ?
Alexandre – for help with specific plans, your best bet is to start here: https://www.brentozar.com/archive/2009/03/getting-help-with-a-slow-query/