Blog

Do you use heaps in SQL Server? How can heaps impact your query performance and database maintenance? In this 30 minute webcast I will show you how heaps behave differently than than tables with clustered indexes in SQL Server. You’ll learn how to identify heaps, how to measure their size, and how to plan a change to convert your heaps to clustered indexes.

This talk is aimed toward DBAs and developers who have worked with SQL Server >= 1 year. I cover LOTS of dynamic management views and tools to view table structures– it’s OK if you haven’t used them all before, you’ll still be able to get the concepts.

Want the scripts used in the talk? Scroll on down– scripts are included in this post below the video.

--Author: Brent Ozar PLF, LLC
--
--Purpose: Demo some of the peculiarities of heaps in SQL Server.
--This script is for use on TEST servers (not production). It creates
--and drops a database and uses undocumented stored procedures.
--
--References: We run this script in our video demo on heaps. Check out the
--video for intepretation of the script results and more.
--
--Usage notes: This demo uses both undocumented and documented procedures
--in SQL Server.
--
--The undocumented procedures are NOT meant to be run against
--production-- they are widely used, but be careful with all undocumented procedures
--and use at your own risk. ("Undocumented" means Microsoft doesn't officially
--support them.)
--
--This demo also uses sys.dm_db_database_page_allocations, which is a new undocumented
--DMV in  SQL Server 2012. If you're testing these scripts against an earlier version of
--SQL Server, you can use DBCC IND (also undocumented) to see the same information.
--We include an example of DBCC IND with the syntax in the demo.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

--Recreate the HeapsOfHeaps database for testing.
--This drops the database-- use with care.  (Hopefully you
--don't have production databases with that name.)

WHILE @@TRANCOUNT > 0
    BEGIN
        ROLLBACK
    END

IF DB_ID('HeapsOfHeaps') IS NOT NULL
    BEGIN
        USE master;
        ALTER DATABASE HeapsOfHeaps SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        DROP DATABASE HeapsOfHeaps;
    END
GO

--Create and use the database.
--This creates it in the default location with default sizes-- change if you wish.

CREATE DATABASE HeapsOfHeaps;
GO

USE HeapsOfHeaps;
SET NOCOUNT ON;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

--Let's create a heap!
CREATE TABLE dbo.DataPile
    (
      datapileid BIGINT IDENTITY  NOT NULL,
      col1 VARCHAR(1024) NOT NULL
    );
GO

--Insert 1000 rows.

DECLARE @i INT = 1;
BEGIN TRAN
WHILE @i <= 1000
    BEGIN
        INSERT  dbo.dataPile (col1)
                SELECT  REPLICATE('A',200) ;
        SELECT  @i = @i + 1;
    END
COMMIT;
GO

--Here's a query that will show if you have heaps.
--You want to look for Index_Id = 0.

SELECT  sc.name AS [Schema Name] ,
        so.name AS [Table_Name]
FROM    sys.indexes si
        JOIN sys.objects so ON si.object_id = so.object_id
        JOIN sys.schemas sc ON so.schema_id = sc.schema_id
WHERE   so.is_ms_shipped = 0  /* Not shipped by Microsoft */
        AND si.index_id = 0 /* Index Id 0 = A Heap */
        AND so.type = 'U'; /* User table */
GO

--You can also look at the table and TRY to find indexes
exec sp_helpindex 'DataPile';
GO

--Let's look at the structure of our heap with DBCC IND.
--This is a special system command (undocumented!).
--PageType 10= an IAM page (Index Allocation Map).

--Parameters: DatabaseName, TableName, IndexID

DBCC IND('HeapsOfHeaps', 'DataPile',0)
GO

--In SQL 2012 we can also query a new DMV to see allocations!
--This DMV is also undocumented.
--Parameters: DatabaseId, ObjectId, IndexId, PartitionId, mode

select *
from sys.dm_db_database_page_allocations(db_id(),object_id('DataPile'),0,null,'DETAILED');
GO

--We can also get information about the table's fragmentation.
--This is a documented procedure, but I'm running it in detailed mode so it's going to look
--at all the pages in the active table. Be very careful if you run this in detailed mode
--against a production database.

SELECT alloc_unit_type_desc,
	index_depth, page_count, avg_page_space_used_in_percent,
	record_count, forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('DataPile'), 0, NULL,'detailed');
GO

--sys.dm_db_partition_stats looks at the metadata.

select used_page_count, in_row_used_page_count, reserved_page_count, row_count
from sys.dm_db_partition_stats
where object_name(object_id) = 'DataPile';

--Let's count how many reads we have to do to scan the table.
--Run the select, then go to the Messages tab.
--Look at the number of "logical reads" we did.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM dbo.dataPile;
GO

--Let's make half of the rows have larger values in them.

UPDATE dbo.DataPile
SET col1=REPLICATE('B',1000)
WHERE dataPileid % 2 = 0;
GO

--How many reads does it take to scan the table now?
--Run the select, then go to the Messages tab.
--Look at the number of "logical reads" we did.

SELECT *
FROM dbo.dataPile;
GO

--Why did we do so many more logical reads????
--Let's find some information by looking at the table's fragmentation now.
--Note: look at the record_count column-- we didn't insert any records!
--It's counting wrong by the number of forwarded records.
--(That's documented in Books Online, by the way.)

SELECT alloc_unit_type_desc,
	index_depth, page_count, avg_page_space_used_in_percent,
	record_count, forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('DataPile'), 0, NULL,'detailed');
GO

--We can see something special happened reading, as well.
--Check out forwarded_fetch_count.

select leaf_insert_count, leaf_update_count, forwarded_fetch_count
from sys.dm_db_index_operational_stats(db_id(),object_id('DataPile'),0,null);
GO

--What about deletes?

--Let's delete all of the rows from the table, except for five.
DELETE FROM dbo.datapile
where datapileid > 5;
GO

--OK, we only have FIVE rows in this table now. We started with 1000.
--How many reads does it take to read now?
--Run the select, go to the messages tab, and look at logical reads.

SELECT *
FROM dbo.dataPile;
GO

--What changed?
--We got rid of MOST of the forwarded record pointers.
--But we still have 80+ pages allocated! For only five rows!

SELECT alloc_unit_type_desc,
	index_depth, page_count, avg_page_space_used_in_percent,
	record_count, forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('DataPile'), 0, NULL,'detailed');
GO

--sys.dm_db_partition_stats agrees that we still have LOTS of pages.
--It gets the row count right, by the way.

select used_page_count, in_row_used_page_count, reserved_page_count, row_count
from sys.dm_db_partition_stats
where object_name(object_id) = 'DataPile';
GO

--Let's create a nonclustered index on our heap.

CREATE UNIQUE NONCLUSTERED INDEX ix_datapile_nc_datapileid ON dbo.DataPile (datapileid);
GO

--Let's look at what pages are being used for the nonclustered index.
--We're using the undocumented new DMV again.
--We have an INDEX page type instead of a DATA page type. */
select *
from sys.dm_db_database_page_allocations(db_id(),object_id('DataPile'),3,null,'DETAILED');
GO

--Now, let's take the index page ID:
--This is the allocated_page_page_id for the row where page_type_desc=INDEX_PAGE
--We'll plug it into DBCC PAGE.
--This is another undocumented procedure that lets us look at page data.
--Check out the values in the HEAP RID Column and record them.

DBCC TRACEON (3604);
DBCC PAGE (HeapsOfHeaps, 1, 284,3)
GO

--Now let's rebuild the heap.
--We can do that-- we can rebuild a heap! (On SQL Server 2008 and above.)
--Prior to SQL Server 2008, the most popular way to de-fragment a heap is to
--add and remove a clustered index--- but it has the same impact we'll see here.

ALTER TABLE dbo.DataPile REBUILD;
GO

--Let's check out our nonclustered index...

select allocated_page_page_id, page_type_desc
from sys.dm_db_database_page_allocations(db_id(),object_id('DataPile'),3,null,'DETAILED');
GO

--Hey, wait a second, the PAGE IDs are different for the Non-Clustered index.
--And we didn't rebuild the non-clustered index--- we just asked for a rebuild of the heap!
--But yet our non-clustered index shows us that it is on totally new pages now.
--What's up with that????

DBCC TRACEON (3604);
DBCC PAGE (HeapsOfHeaps, 1, 370,3)
GO

-- Compare those HEAP RIDS to what you recorded before.
--We explain why this is and what this means in the video.

--Wow, rebuilding a big heap with nonclustered indexes could
--cause LOTS and LOTS of unexpected IO.
--Really, the times when we want to have a heap in SQL Server are the exception rather than the rule.
--We like to default to clustered indexes in SQL Server---
--unless we have performance tests showing that heaps are better for that use case!
↑ Back to top
  1. Hi team,

    Every day i have spend around 20 mins time here, i learn alote of new things in this school only

    this session is good but you are creating some scripts right at time of demo only we have to see that script its easly understand.
    I hope did u increase the size 100% but i m not able to look it. could you please correct here only.

    Thank you for your session

    • Hi– I am glad you enjoy the session!

      I apologize that you are having a hard time reading the scripts on the screen. When I did the webcast itself, I should have resized the screen to make it larger— viewers of the webcast could dynamically resize it, but now with the recording it is trickier. I’ll check if I can fix the recording by editing and zooming.

      For the time being, the scripts I used are all in the post below the video, in case that helps follow along.

      Thanks,
      Kendra

  2. Hello,

    thank You very much for such great video, from which I could learn a lot. Demo is a great way to learn and try things as long as You speak, especially for us who’s mother tongue is not English. Just by speaking, without demo, it would be hard to follow all things.
    So, please, continue with such a great way of doing things and helping community grab the SQL Server aspects better.

  3. Great video but voic is hard to understand.

  4. Great video! We are actually experiencing a performance issue with one of our third-party aps and the majority of their tables are heaps (hence the reason i was interested in this video!) and this is one area we have brought up to them. Now i have a better understanding of the downside of heaps and good to know they should only be used as an exception rather than the rule, but more importantly WHY.

    If the vendow cannot explain why there are heaps rather than clustered indexes, what can be done? (short of shaking our head..)

  5. Very nice video Kendra. Thanks for sharing!

  6. Great video, thanks.
    My question is: should I create a clustered index even if I don’t have any column worthwhile?
    meaning, is it better to create an identity column just for the clustered index, rather then leave it as a heap?

    • Hi Eran,

      A good rule of thumb is that you should create a clustered index unless you can prove that a heap will perform better. (Typically the “heaps perform better” scenario is one where you’re going to write once and then always scan the data afterward or do something predictable– so things like staging tables for data loading, data warehouse tables behind analysis cubes that are built in specific ways, etc.)

      For most general and OLTP purposes, it is definitely worth defaulting to having a clustered index. You want the clustered index to ideally be:
      * Unique
      * As narrow as possible
      * Static (columns that aren’t updated)
      * Not prone to fragmentation

      Sometimes this means using a surrogate key (like an int identity) is best. But this isn’t always the case. If you have two columns which can make a good key when combined (unique, narrow data types, hopefully ever-increasing), and which are frequently read, then that can out-perform a surrogate key in some situations.

      So to wrap things up: Usually it *is* better to cluster the table rather than go with a heap. But do look hard at the query and data patterns and test to make sure that a surrogate identity is the best bet– sometimes there are other options which are better for your read queries.

  7. Hi Kendra,
    What would be the downside/benefit of having a clustered index on a table that is NON-UNIQUE (say, there is no combination of rows that provides a unique row)? Would it still be beneficial to add a clustered index on the most unique column rather than keep it as a heap?

    Thanks in advance.

    • When you make an index non-unique, SQL Server has to make it unique for you behind the scenes.

      For a clustered index this means that for every row, SQL Server will have to check if it is unique or not. If it’s not unique it will have to populate a “uniquifier” to make things unique. This takes up space on the row (approx 8 bytes overhead on the row when populated).

      A uniquifier on the clustered index will also sneak into all your nonclustered indexes. Nonclustered indexes use the clustering key (plus uniquifier if present) to link each row with the clsutering key, plus if non-clustered indexes aren’t unique then they promote the clustering key into their own key to guarantee uniqueness.

      Clear as mud, right?

      Short story is that non-unique clustering keys are less efficient than unique clustering keys.

      But that being said, yes, I would generally choose a non-unique clustering key for a table over being a heap for most purposes. The “lesser weevil” and all that. There’s a lot of things I would look at to make sure it’s a “safe” choice.

      Sorry for the wordy response– I have a 75 minute talk on selecting clustering keys and the pros and cons of different choices. There’s a lot to consider, unfortunately!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>