SQL Server Index Terms

Indexing
55 Comments

Let’s have a chat about your indexes in SQL Server. Indexes can be confusing. They are the easiest way to boost performance of queries, but there are so many options and terms around indexes that it can be hard to keep track of them. Let’s chat about the differences between clustered and nonclustered indexes, key columns and nonkey columns, and covering and non-covering indexes.

Clustered vs. Nonclustered Indexes

Clustered index: a SQL Server index that sorts and stores data rows in a table, based on key values.

The basic syntax to create a clustered index is

Nonclustered index: a SQL Server index which contains a key value and a pointer to the data in the heap or clustered index.

The basic syntax for a nonclustered index is

The difference between clustered and nonclustered SQL Server indexes is that a clustered index controls the physical order of the data pages.  The data pages of a clustered index will always include all the columns in the table, even if you only create the index on one column. The column(s) you specify as key columns affect how the pages are stored in the B-tree index structure. A nonclustered index does not affect the ordering and storing of the data.

A B-tree structure has at least two levels: the root and the leaves. If there are enough records, intermediate levels may be added as well. Clustered index leaf-level pages contain the data in the table. Nonclustered index leaf-level pages contain the key value and a pointer to the data row in the clustered index or heap.

hand-drawn representation of a b-tree.
fine art, folks.

There are a few limits to indexes.

  • There can be only one clustered index per table.
  • SQL Server supports up to 999 nonclustered indexes per table.
  • An index – clustered or nonclustered – can be a maximum of 16 columns and 900 bytes.

These are limits, not goals. Every index you create will take up space in your database. The index will also need to be modified when inserts, updates, and deletes are performed. This will lead to CPU and disk overhead, so craft indexes carefully and test them thoroughly.

Primary Key as a Clustered Index

Primary key: a constraint to enforce uniqueness in a table. The primary key columns cannot hold NULL values.

In SQL Server, when you create a primary key on a table, if a clustered index is not defined and a nonclustered index is not specified, a unique clustered index is created to enforce the constraint. However, there is no guarantee that this is the best choice for a clustered index for that table. Make sure you are carefully considering this in your indexing strategy.

Key vs. Nonkey Columns

Key columns: the columns specified to create a clustered or nonclustered index.

Nonkey columns: columns added to the INCLUDE clause of a nonclustered index.

The basic syntax to create a nonclustered index with nonkey columns is:

A column cannot be both a key and a non-key. It is either a key column or a non-key, included column.

The difference lies in where the data about the column is stored in the B-tree. Clustered and nonclustered key columns are stored at every level of the index – the columns appear on the leaf and all intermediate levels. A nonkey column will only be stored at the leaf level, however.

There are benefits to using non-key columns.

  • Columns can be accessed with an index scan.
  • Data types not allowed in key columns are allowed in nonkey columns. All data types but text, ntext, and image are allowed.
  • Included columns do not count against the 900 byte index key limit enforced by SQL Server.

Covering Indexes

Covering index: all columns returned in a query are in the index, so no additional reads are required to get the data.

A covering index will reduce the IO operations, and improve performance of queries.

Let’s create a query and compare two indexes. I’m creating these on the Product.Products table in the AdventureWorks2012 database.

The query we want to use is

The first index is nonclustered, with two key columns:

The second is also nonclustered, with two key columns and three nonkey columns:

In this case, the first index would not be a covering index for that query. The second index would be a covering index for that specific query.

Indexes Can Be Magic, Except Whey They Aren’t

Indexes are the easiest thing to add to your database to boost performance. However, too much of a good thing can be bad. When designing a database, or troubleshooting poor performance, consider all your index options and carefully test them.

Previous Post
AlwaysOn Availability Groups in AWS Revisited
Next Post
Your First Good Job Isn’t In the Newspaper

55 Comments. Leave new

  • Thank you for the article short and to the point.
    There is a lot more about indexes but you cover what really matters from the point of view of a developer.
    If you allow me I would add some information about when not to indexing:
    Small tables (a table scan is faster an index are a waste of resources)
    A lot of Changes Update, Delete and insert on the table
    FQR

    Reply
  • Rick Roberts
    July 13, 2012 4:43 pm

    Good article on indexes but I have one point of contention. On a clustered table, a non-clustered index has the key value and a copy of the clustering key; it is a lookup value and not a pointer. For heaps it does have the key value and the RID pointer. Seems like I’m being petty, I know but when designing indexes this little tidbit is really important to understand as clustering keys get heavily replicated throughout non-clustered indexes. Can you tell that this is a hot-button of mine?

    Reply
  • Very good and easy to follow explanation and very nice diagram.

    Reply
  • Pavel Nefyodov
    July 17, 2012 5:58 am

    Hi Jes,

    If you want to be technical, please note:
    B-Tree structure is not used for indexes in SQL Server, B + Tree structure is used instead.

    Regards,
    Pavel

    Reply
  • Very well explained!

    Reply
  • Pavel Nefyodov
    July 18, 2012 10:19 am

    Last two SQL Create requests look similar to me

    CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name, ProductNumber);

    CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)

    Are we not going to have two identical indexes with different names?

    Reply
  • Hi,

    I am bit confused with the statement “A column cannot be both a key and a non-key. It is either a key column or a non-key, included column.” Can you please elaborate it a bit with some example to make me clear my understanding.

    Thanks,
    Som

    Reply
    • Jes Schultz Borland
      September 27, 2012 2:58 pm

      Sure! Let’s look at the sample later in the blog post.

      CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)
      INCLUDE (Color, Class, Style);

      In this case, Name and ProductNumber are key columns. Color, Class, and Style are non-key columns – they are “included”, but not part of the leaf level.

      We couldn’t make this index happen:
      CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)
      INCLUDE (Name, Color);
      In that example, Name is in the both the key and non-key column lists, and SQL Server would throw an error.

      Does that help?

      Reply
  • Hi Brent,

    Please confirm do include columns increase the size of index as they reside at leaf pages, if not then why?

    Regards,
    Neeraj

    Reply
    • Jes Schultz Borland
      November 6, 2012 3:18 pm

      Include columns reside at the leaf level, but they do not count towards the number of key columns allowed, or the size of the nonclustered index. More information can be found at http://msdn.microsoft.com/en-us/library/ms190806.aspx.

      Reply
      • Hi Jess,

        Thanks for answering. However, as per the link it seems that column specified in include column increase the size of the index. Below is the statement

        The total size of all nonkey columns is limited only by the size of the columns specified in the INCLUDE clause;

        Also, i checked the sp_spaceused with both index with and without included columns and found that include column increase the size of index.

        Request you to help how SQl server compute the size of include column with example.

        Thanks

        Reply
        • Jes Schultz Borland
          November 13, 2012 10:54 am

          Adding non-key (included) columns does not count towards the maximum size of the non-clustered index, which is limited to 900 bytes. Yes, it will increase the size of the index itself. You can run these queries to see the size of the index increase.

          USE AdventureWorks2012;
          GO

          CREATE INDEX IX_Products_ProductNumber ON dbo.Products(ProductNumber);

          CREATE INDEX IX_Products_ProductNumber_Includes ON dbo.Products(ProductNumber)
          INCLUDE (Name, SafetyStockLevel, ReorderPoint);

          SELECT database_id, DDIPS.object_id, object_name(DDIPS.object_id) as table_name, IDX.name as index_name, DDIPS.index_id, index_type_desc, alloc_unit_type_desc, page_count, page_count * 8 as index_size_KB
          FROM sys.dm_db_index_physical_stats(DB_ID(‘AdventureWorks2012’), OBJECT_ID(‘dbo.Products’), NULL, NULL, ‘LIMITED’) DDIPS
          INNER JOIN sys.objects OBJ ON OBJ.object_id = DDIPS.object_id
          INNER JOIN sys.indexes IDX ON IDX.object_id = OBJ.object_id
          AND DDIPS.index_id = IDX.index_id;

          The page count for IX_Products_ProductNumber_Includes is indeed larger. There are two distinct sizes to an index – the size of the index key, and the size of the index itself.

          Reply
          • Thanks, it’s clear now.

          • Hi Jes,

            you are correct if you say that INCLUDE will not affect the size of the index. But that’s only because INCLUDED cols will not be part of the B-Tree entries.

            That is a very important issue to know. If you have really large relations a wrong definition of an index can increase the IO significantly.

            Just as an example. I have a relation called companies and a CostCenter is stored for each company. Most of the queries run again the CostCenter only but Name and TaxNo should be selected, too:

            SELECT Id, Name, TaxNo, CostCenter FROM dbo.tbl_Companies WHERE …

            To prevent a lookup in the CLUSTERED KEY i will create an index which covers the predicate and the attributes which should be displayed. (Id is the key attribute of the clustered index).

            — Now I create a covering index:
            CREATE INDEX ix_tbl_Companies_Covered ON dbo.tbl_Companies (CostCenter, Name, TaxNo);

            And I create a “similiar” index but include the Name and TaxNo (because I’m not searching for it!)

            CREATE INDEX ix_tbl_Companies_Covered ON dbo.tbl_Companies (CostCenter) INCLUDE ( Name, TaxNo);

            If I check the number of pages in the B-Tree it will be much more pages with covered index but with INCLUDED index.
            The number of pages in the leaf will in both scenarios the same!

            — view is available since 2012!
            SELECT index_id,
            page_type,
            COUNT(next_page_page_id) AS NumPages
            FROM sys.dm_db_database_page_allocations
            (db_id(), OBJECT_ID(‘dbo.tbl_Companies’), IndexNumber, NULL, ‘DETAILED’)
            GROUP BY
            index_id,
            page_type

            BTW: A really good explained article concering indexes – Thumbs up!!!

          • Jes Schultz Borland
            December 7, 2012 1:58 pm

            Yep, absolutely correct observation. Thanks for the example!

          • Sorry,
            i’m a bit confused. Please could you clarify these two points:

            1 – What’s the difference between B+tree and B-tree; and how is related to clustered and non-clustered indexes;

            2 – “INCLUDED cols will not be part of the B-Tree entries.” So i expect into the example of Uwe Ricken that, the first index (i.e.: the one without the INCLUDE) would require more pages. But seems to be the opposite.

            Thank you v much
            alessaless

          • Jes Schultz Borland
            November 10, 2014 9:13 am

            B+ and B- is far beyond the scope of this blog post. You can read about those more at http://en.wikipedia.org/wiki/B%2B_tree and http://en.wikipedia.org/wiki/B-tree.

  • John Waclawski
    February 4, 2013 2:36 pm

    I’m really very new to the whole “art of indexing” as I call it. I’m trying to understand the whole concept. Your diagram & explanation are beginning to sink in. Thanks.

    My only question is regarding your very last line of the article, Jes. “…consider all your index options and carefully test them.”

    What “index options” do you first look for when you are looking at a table.

    And also, “…carefully test them.”. When you say that, how do you test or how do you go abouts in determining if an index was helpful or not?

    Thanks and I look forward to hearing from you!

    Reply
    • Jes Schultz Borland
      February 4, 2013 2:56 pm

      John, we have a ton of resources on our index page – https://www.brentozar.com/sql/index-all-about-sql-server-indexes/.

      Index options are clustered, nonclustered, using key columns, and even things like indexed views and filtered indexes.

      There are two DMVs that can help you determine if indexes are being used – sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats. A general methodology: Run index stats for a table. Apply an index. Run some queries against it that would represent your typical load. Check the stats again to see if the new indexes are being used. Make sure adding or removing indexes won’t make other queries or parts of your applications slow.

      I hope this helps! It’s a big topic, and there is a lot to learn. Take a step at a time, and you’ll be performance tuning with indexes in no time at all!

      Reply
  • Shankar Krishnamoorthy
    March 1, 2013 8:55 am

    Hi Jes,
    My understanding of indexes was that we should index on attributes used in ‘where’ clause & ‘joins’ and if you want to have covering indexes, use the attributes in the ‘select’ clause in the include column.

    Is my understanding correct?

    In this scenario, won’t the below index be covering instead of IX_Products_Name_ProductNumber_ColorClassStyle:

    CREATE INDEX IX_Products_Color ON dbo.Products(Color)
    INCLUDE (ProductID, Name, ProductNumber, Class, Style);

    Smiles,
    Shankar

    Reply
    • Jes Schultz Borland
      March 1, 2013 10:51 am

      Shankar, you’re correct – you want to index on columns in joins and the where clause.

      There are several ways you could write this covering index – depending on existing indexes, recommended missing indexes, and existing queries.

      Reply
  • Hi Jes!

    Great write up, it has clarified some things for me. The one thing that’s a bit confusing to me is the terminology ” Clustered Index = sorts and stores data rows in a table” or I have also heard, that the clustered index is the table. So the way I understand this is if I have a table and it has a clustered index, it’s a clustered table and the data is stored in sorted matter. If it does not have cluster it’s a heap so it’s not sorted. With this being said, moving on to Column Store indexes, they can only be non-clustered which means the data is not sorted so while we no longer have to maintain additional non-clustered indexes if you have a column store index, you should still have a clustered index so the table will store the data sorted? Hope this makes sense! Thank you!

    Reply
    • Jes Schultz Borland
      March 18, 2013 8:10 am

      You are correct on the difference between a heap and a clustered index table. Unless there is a proven performance benefit in favor of a heap, I would recommend having a clustered index on every table.

      Columnstore indexes are simply another type of nonclustered index. They store the data on a column basis, rather than a row basis like traditional nonclustered indexes. One thing to note is that adding a columnstore index to a table will make it read-only, so adding or changing data after a columnstore index is added becomes much harder.

      Reply
      • Thanks Jes!

        We began using columnstore indexes for our data warehouse where our data is read only with the exception of loading in which case we are dropping and recreating the columnstore index. We just noticed in the execution plans that it’s not using the clustered index, it’s only using the columnstore index so that is what brought up the question. I agree that keeping the cluster is a good call since it physically stores the sorted data even though we might not see it being used. Next question is why the execution plan now displays that 0 partitions are being scanned with a column store index. When we only have the clustered index, we do see partition scans. I believe it has something to do with the segmentation elimination, but I’m still researching and experimenting at this point. This may be out of scope for this specific topic, but if you have any pointers I would greatly appreciate it and we can take it offline if necessary. Thanks again!!

        Reply
  • Sunil Shevante
    June 27, 2013 1:41 am

    Fine article!

    Do you a similar article about MySQL indexes. I tried to search on your site but couldn’t find anything.

    Reply
  • Hi Jes, if the SELECT statement contains most table columns and the execution plan is showing a KEY LOOKUP due to the WHERE clause using a non-clustered index column, is it wise to INCLUDE all these columns in the non-clustered index? It seems that I’m duplicating most of the data from the CLUSTERED INDEX to avoid the lookup? I tried researching online what the guidelines are and couldn’t really find anything…

    Thanks!

    Reply
    • Jes Schultz Borland
      October 16, 2013 7:34 am

      That’s going to vary in every situation, based on the data types and sizes of the columns. Does the size of the index outweigh the benefit of having it? How much faster is the query with this nonclustered index? What is the cost? Is the I/O reduced? All of these things have to be taken into account.

      Reply
  • Is there a way to tell if Index with covering columns effectively being utilized or is it doing table lookups?

    Say, I have a tblA with contract_no (pk), …. dealer_no…
    Have index idx_dealer_with_include_cols on dealer_no with few included columns…

    I want to know how effective is my index? else should I trim it to just keep 1 column ie. dealer_no

    any DMV query to figure this out?

    Reply
  • Trying the understand sp_BlitzIndex output for the below:

    dbo.ContractTBL.idx_refund_date (88)
    [1 KEY] Refund_Date {datetime 8} [1 KEY] Contract_Number {varchar 18} 90
    Reads: 2 (2 seek) Writes:48,698
    1,752,982 singleton lookups; 1 scans/seeks; 0 deletes; 0 updates;
    9,219,273xx rows; 3.0GB Lock escalation attempts: 15,510; Actual Escalations: 278.

    I see it say 2 reads but again says 1,752,982 singleton lookups – what does this mean?
    Essentially trying to understand ‘Usage Stats’ and ‘Op Stats’ ; Any link with explanation?

    Thanks in advance..

    Reply
    • Jes Schultz Borland
      September 20, 2014 3:02 pm

      Our https://www.brentozar.com/BlitzIndex page does give more information about these columns: https://www.brentozar.com/blitzindex/sp_blitzindex-workaholic-indexes/. Have you looked at your top resource-intensive queries in the execution plan cache to see if this index is used in one of them?

      Reply
      • Maybe it’s just getting to late in the day but I don’t really see an answer to this question on either of those links, has the information been removed since then? Trying to understand the difference between the Usage Stats and the Op Stats.

        Reply
        • Erik Darling
          April 22, 2016 4:25 pm

          Have you looked at what they return?

          SELECT
          *
          FROM
          [sys].[dm_db_index_usage_stats] [ddius]
          CROSS APPLY [sys].[dm_db_index_operational_stats]
          ([ddius].[database_id],
          [ddius].[object_id],
          [ddius].[index_id],
          NULL) [ddios];

          Reply
        • Usage stats is the number of times the index has shown up in an execution plan.

          Operational stats is the number of times the index was actually touched.

          So say you’ve got a key lookup, and it was hit 100 times – usage stats will show 1 hit, operational stats will show 100.

          Reply
  • Excellent explaination Jes !

    It’s really that the KEY of an index can be 900 bytes OR 16 columns – whichever comes first. But, even that’s not always true. If a table has an XML column then the PK can have a maximum of 15 columns AND the PK *must* be the CL key.

    Reply
  • David Goodman
    May 24, 2016 8:22 pm

    Why are Op Stats all 0’s and yet Usage Stats show 29 seeks?

    Usage Stats Op Stats
    Reads: 29 (29 seek) Writes:14 0 singleton lookups; 0 scans/seeks; 0 deletes; 0 updates;

    Reply
  • Dasteswarudu D
    June 29, 2016 8:22 am

    Good article on indexes.
    Excellent explanation Jes Schultz Borland .

    Thanks,
    Dasteswarudu D.

    Reply
  • Hello there,

    this is a bit old post. However I was reading it now and would have question:
    Are you sure, that that last example, for covering index, is good?
    You are using query:
    SELECT ProductID, Name, ProductNumber, Color
    FROM dbo.Products
    WHERE Color = ‘Black’;
    And as covering index:
    CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber)
    INCLUDE (Color, Class, Style);

    In that index, you are missing ProductID column. From that, what I understand what covering index is, this isn’t covering inde. To go further, SQL would be doing index scan on this, correct? To make it correct covering index, shouldn’t it be adjusted to this:
    CREATE INDEX IX_Products_Name_ProductNumber_ProductIDColorClassStyle ON dbo.Products(Name, ProductNumber)
    INCLUDE (ProductID, Color, Class, Style);

    And to make it covering index only for this specific query, like this:
    CREATE INDEX IX_Products_Name_ProductNumber_ColorProductID ON dbo.Products(Name, ProductNumber)
    INCLUDE (Color, ProductID);

    Thanks,
    Jan

    Reply
    • Or, if you are counting with secret column from clustered index, then OK, it is covering that query :). But wasn’t that obvious directly …

      Jan

      Reply
  • Ned Bakelman
    April 14, 2019 7:52 am

    We love you Brent. Thank you for clear and simple and easily digestible explanations of all things SQL Server. This post is terrific!

    Reply
  • recently I observed a design issue(?) where I found NC index has been designed to INCLUDE Clustered Key. I am curious to know how does it make sense designing NC index this way?

    Reply
  • Alexandre Dantas
    July 25, 2019 12:10 pm

    Brent, when a clustered index is used on a plan but Key Lookup occurs, is it advisable to recreate the clustered index with the columns to the index (create with include) ?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.