No but really, how big should my log file be?


Most of you are going to hate this

And TL;DR, there’s a script at the end of the post. But like The Monster At The End Of This Book, it’s worth it not to skip the middle.

There are about a billion but-what-ifs that could come into play. I can’t possibly answer all of those for you. But that’s not the point of this post, anyway! If you’re in a special circumstance, using some fancy features, or doing something utterly deranged to your database, this isn’t the post, or script, for you.

I mean really, unless the size of your log file is causing you some dramatic pain, leave it alone. You should probably go invent cold fusion if log file size is the worst issue in your database. Congratulations.

This is also a lousy place to ask me if you can shrink your log file. I have no idea how or why it got that size. There’s free space now because you’re using FULL recovery model and you took a log backup, or you’re in SIMPLE and your database hit a CHECKPOINT. No magic there. It may very well grow to that size again, so shrinking it could be a really dumb idea.

So what’s the point? Lots of people ask me this question: clients, Office Hours attendees, random passersby on the street who recognize me (even without my Robot). I usually give them the same answer and explanation, unless I have ample evidence that their fancy and/or deranged ways require a different estimate.

From the ivory tower

A good STARTING POINT for your log file is twice the size of the largest index in your database, or 25% of the database size. Whichever is larger.

Why?

If the largest object in your database is larger than 25% of your database, you are likely running some type of maintenance. Index rebuilds require the size of the object being rebuilt in log space. I usually rule of thumb twice that space, in case you’re doing anything else while you’re doing that maintenance, like ETL, reports, dragging data to and fro, purging data, whatever. If you’re only ever reorganizing the largest object, you may not need all that space. Are you sure you’re ONLY ever reorganizing that? I’ll wait.
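For example, say your database is 500GB and its largest index is 80GB: twice that index is 160GB, and 25% of the database is 125GB, so 160GB is your starting point. If the largest index were only 40GB, the 125GB number would win instead.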

But 25% seems so random!

Well, kinda. But you’re here for a starting point. If you’re not Super DBA and taking baselines and trending your database file sizes over time, random is better than nothing. It buys you some leeway, too.

  • If you miss a log backup (maintenance plans got you down?)
  • If you’re not taking frequent enough log backups (can I interest you in RPO/RTO insurance?)
  • If you run other long/large transactions (SSIS won’t save you)

You’ll have a fair amount of room to do your dirty work. Most sane and rational people consider this to be a positive thing.

But what if my log file still grows?

Well, then you found out you need a bigger log file. Or you need to take log backups more frequently. Perhaps those hourly log backups aren’t working out as you planned, hm?
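If you want to keep an eye on how full the log actually gets between those backups, the old standby works fine (one line, no assembly required):

DBCC SQLPERF (LOGSPACE);

That returns the log size and the percent of log space used for every database on the instance.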

And if your log file never grows, you’ll look really smart. And you’ll never have to wait for your log file to expand. They don’t benefit from Instant File Initialization the way data files do.

Show me the script already

It’s all right under here. Don’t forget to change the USE statement. All sizes are in GB. If your database is smaller than 1GB, you’re one of those lucky DBAs who can take vacations and stuff. Go do that. Life is short.

If your database is under 1GB, and your log file is over 1GB, start taking log backups. I’m pretty sure you’re not.
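(Pieced together below from the copies quoted back in the comments; [YourDatabase] is a placeholder, so point the USE at whatever you’re actually sizing.)

USE [YourDatabase]; --Placeholder: change this to your database

--No, I'm not writing this to loop through all of your databases
;WITH [log_size]
AS ( SELECT TOP 1
            SCHEMA_NAME([t].[schema_id]) AS [schema_name] ,
            [t].[name] AS [table_name] ,
            [i].[name] ,
            [p].[rows] AS [row_count] ,
            CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18, 2)) AS [index_total_space_gb] ,
            ( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
            ( SELECT ( SUM([mf].[size]) * 8 ) / 1024 / 1024
              FROM   [sys].[master_files] AS [mf]
              WHERE  [mf].[database_id] = DB_ID() ) AS [database_size_(gb)] ,
            ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024 / 1024 AS INT)
              FROM   [sys].[master_files] AS [mf]
              WHERE  [mf].[database_id] = DB_ID()
                     AND [mf].[type_desc] = 'LOG' ) AS [current_log_size_(gb)] ,
            ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024 / 1024 * .25 AS INT)
              FROM   [sys].[master_files] AS [mf]
              WHERE  [mf].[database_id] = DB_ID()
                     AND [mf].[type_desc] = 'ROWS' ) AS [25%_of_database_(gb)]
     FROM   [sys].[tables] [t]
     INNER JOIN [sys].[indexes] [i]
            ON [t].[object_id] = [i].[object_id]
     INNER JOIN [sys].[partitions] [p]
            ON [i].[object_id] = [p].[object_id]
               AND [i].[index_id] = [p].[index_id]
     INNER JOIN [sys].[allocation_units] [a]
            ON [p].[partition_id] = [a].[container_id]
     WHERE  [t].[is_ms_shipped] = 0
     GROUP BY SCHEMA_NAME([t].[schema_id]) ,
            [t].[name] ,
            [i].[name] ,
            [p].[rows]
     ORDER BY [index_total_space_gb] DESC)
SELECT * ,
       CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
            THEN [ls].[largest_index_times_two_(gb)]
            ELSE [ls].[25%_of_database_(gb)]
       END AS [maybe_this_is_a_good_log_size(gb)]
FROM [log_size] AS [ls]
OPTION ( RECOMPILE );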

 


26 Comments

  • So Erik, to summarize: you would want to presize your database log file when/if it becomes a problem, to prevent autogrowth from recurring and to keep the log file size constant, so a DBA never has to step in and possibly shrink it. Correct? This is assuming, as you mentioned, that log backups are being taken frequently enough.

    • Hey Rob,

      Generally, yes. But keep in mind that log file size will necessarily grow along with data and object sizes in your database. So unless your database is static, don’t freak out at every growth.

      Thanks!

  • James Youkhanis
    February 10, 2016 9:18 am

    Erik without baseline information, would you recommend when a new database is created that I use 25% for my initial log size?

    • This is a wide open question!

      How “new” is the database?
      Is it for a brand new app?
      Are there any projections for data size in the RFP or project plan?
      How much data is development generating?

      If you’re here, you’ll just have to pick a metric and trend it over time.

      Is it a new customer in a known app?
      Do they have data in another app that needs to be migrated?
      Do you have baselines on other databases used by the application?
      How much commensurate internal data are they currently generating?

      If you’re here, start with the metrics from a database belonging to a client of similar size.

  • Nice blog post. You cannot overemphasize “…you’ll never have to wait for your log file to expand. They don’t benefit from Instant File Initialization the way data files do.” I recently had a tempdb log file growth event happen on our production DB server, and it brought the entire site to a crawl; we’re running on SSDs, but it’s a 60K+ TPS system. Any thoughts on tempdb log sizing?

    There are some good queries available online which utilize the default trace to show when database log and data file growth events occur.
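    Something along these lines, for instance (assuming the default trace is still enabled; event classes 92 and 93 are the data and log file auto-grow events):

    DECLARE @TracePath NVARCHAR(260);
    SELECT @TracePath = [path] FROM sys.traces WHERE is_default = 1;

    SELECT [DatabaseName] ,
           [FileName] ,
           [StartTime] ,
           [Duration] / 1000 AS [duration_ms] ,
           CASE [EventClass] WHEN 92 THEN 'Data File Auto Grow'
                             WHEN 93 THEN 'Log File Auto Grow'
           END AS [event_name]
    FROM sys.fn_trace_gettable(@TracePath, DEFAULT)
    WHERE [EventClass] IN ( 92, 93 )
    ORDER BY [StartTime] DESC;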

  • “--No, I’m not writing this to loop through all of your databases”

    Kind of off topic, but this has been something I’ve had to do on more than one occasion, and I’ve seen a number of ways to do it. For the curious, this (so far) has been the easiest approach:

    DECLARE @SQL NVARCHAR(MAX)

    SELECT @SQL = STUFF((
    SELECT '
    USE [' + d.name + ']
    SELECT
    thing
    FROM
    place;'
    FROM sys.databases d
    WHERE thing = other_thing --if you want only user databases or online databases do it here
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '')

    EXEC sp_executeSQL @SQL;

  • Every time a DBA shrinks a transaction log, God kills a kitten. Please, think of the kittens.

  • What a blast from the past with The Monster At The End Of This Book! Oh, happy memories! Yes, I enjoyed the rest of your blog but that was my favorite part. 🙂

  • I LOVE “The Monster At The End Of This Book” so I read the whole article.

    -J Grover <– really Grover

  • Here you go:

    /************** Create table variable to store database names **************************/
    Declare @Databases table
    (id int identity (1,1),DatabaseName nvarchar(100))

    CREATE TABLE #Temp_LOG(
    [schema_name] [nvarchar](128) NULL,
    [table_name] [sysname] NOT NULL,
    [name] [sysname] NULL,
    [row_count] [bigint] NOT NULL,
    [index_total_space_gb] [decimal](18, 2) NULL,
    [largest_index_times_two_(gb)] [bigint] NULL,
    [database_size_(gb)] [int] NULL,
    [current_log_size_(gb)] [int] NULL,
    [25%_of_database_(gb)] [int] NULL,
    [maybe_this_is_a_good_log_size(gb)] [bigint] NULL
    )

    /************** Insert database names into table variable **************************/
    Insert into @Databases

    SELECT name [Database Name]
    FROM [model].[sys].[databases]
    where (database_id > 4 and Name Not In('Distribution'))
    order by name

    --select * from @Databases --Select all databases from table variable (For testing only)

    /***************************** Set up variables for loop **********************/

    Declare @row int
    Declare @rows int
    Declare @database_name Nvarchar(MAX)
    Declare @Execdatabase Nvarchar(MAX)

    /***************************** Set up loop execution *****************************/
    Select @rows = (Select MAX(id) from @Databases)
    Set @row = 1

    While @row [ls].[25%_of_database_(gb)]
    THEN [ls].[largest_index_times_two_(gb)]
    ELSE [ls].[25%_of_database_(gb)]
    END AS [maybe_this_is_a_good_log_size(gb)]
    FROM [log_size] AS [ls]
    OPTION ( RECOMPILE );'
    exec (@Execdatabase)
    Set @row = @row + 1
    End

    Select * from #Temp_LOG

    Drop Table #Temp_LOG

  • The paste did not work correctly last time I’ll try once more…

  • Post is too long sorry.

    • This is similar to what has already been posted to loop through all databases. In my case, I am skipping system databases.

      declare @databases table (id int, dbname nvarchar(128))
      declare @dbname nvarchar(128)
      declare @dbid int = 0
      declare @query nvarchar(4000)
      declare @sql nvarchar(3000)

      set @sql =
      ';WITH [log_size]
      AS ( SELECT TOP 1
      SCHEMA_NAME([t].[schema_id]) AS [schema_name] ,
      [t].[name] AS [table_name] ,
      [i].[name] ,
      [p].[rows] AS [row_count] ,
      CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18,
      2)) AS [index_total_space_gb] ,
      ( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
      ( SELECT ( SUM([mf].[size]) * 8 ) / 1024 / 1024
      FROM [sys].[master_files] AS [mf]
      WHERE [mf].[database_id] = DB_ID() ) AS [database_size_(gb)] ,
      ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
      / 1024 AS INT)
      FROM [sys].[master_files] AS [mf]
      WHERE [mf].[database_id] = DB_ID()
      AND [mf].[type_desc] = ''LOG'' ) AS [current_log_size_(gb)] ,
      ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
      / 1024 * .25 AS INT)
      FROM [sys].[master_files] AS [mf]
      WHERE [mf].[database_id] = DB_ID()
      AND [mf].[type_desc] = ''ROWS'' ) AS [25%_of_database_(gb)]
      FROM [sys].[tables] [t]
      INNER JOIN [sys].[indexes] [i]
      ON [t].[object_id] = [i].[object_id]
      INNER JOIN [sys].[partitions] [p]
      ON [i].[object_id] = [p].[object_id]
      AND [i].[index_id] = [p].[index_id]
      INNER JOIN [sys].[allocation_units] [a]
      ON [p].[partition_id] = [a].[container_id]
      WHERE [t].[is_ms_shipped] = 0
      GROUP BY SCHEMA_NAME([t].[schema_id]) ,
      [t].[name] ,
      [i].[name] ,
      [p].[rows]
      ORDER BY [index_total_space_gb] DESC)
      SELECT * ,
      CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
      THEN [ls].[largest_index_times_two_(gb)]
      ELSE [ls].[25%_of_database_(gb)]
      END AS [maybe_this_is_a_good_log_size(gb)]
      FROM [log_size] AS [ls]
      OPTION ( RECOMPILE );'

      insert into @databases
      select database_id, name from sys.databases where database_id>4

      declare @logSize table(
      [schema_name] [nvarchar](128) NULL,
      [table_name] [sysname] NOT NULL,
      [name] [sysname] NULL,
      [row_count] [bigint] NOT NULL,
      [index_total_space_gb] [decimal](18, 2) NULL,
      [largest_index_times_two_(gb)] [bigint] NULL,
      [database_size_(gb)] [int] NULL,
      [current_log_size_(gb)] [int] NULL,
      [25%_of_database_(gb)] [int] NULL,
      [maybe_this_is_a_good_log_size(gb)] [bigint] NULL
      )

      while (1=1)
      Begin
      select top 1 @dbid=id, @dbname=dbname from @databases
      where id > @dbid
      order by id
      if @@ROWCOUNT = 0 BREAK;
      set @query='use [' + @dbname + ']; ' + @sql
      insert into @logSize
      exec sp_executesql @query
      End

      select * from @logSize

  • I modified the previous query I posted to include database names because on one of my servers, I have multiple databases with the same index name.

    declare @databases table (id int, dbname nvarchar(128))
    declare @dbname nvarchar(128)
    declare @dbid int = 0
    declare @query nvarchar(4000)
    declare @sql nvarchar(3000)

    set @sql =
    ';WITH [log_size]
    AS ( SELECT TOP 1
    DB_NAME() [Database_Name],
    SCHEMA_NAME([t].[schema_id]) AS [schema_name] ,
    [t].[name] AS [table_name] ,
    [i].[name] ,
    [p].[rows] AS [row_count] ,
    CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18,
    2)) AS [index_total_space_gb] ,
    ( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
    ( SELECT ( SUM([mf].[size]) * 8 ) / 1024 / 1024
    FROM [sys].[master_files] AS [mf]
    WHERE [mf].[database_id] = DB_ID() ) AS [database_size_(gb)] ,
    ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
    / 1024 AS INT)
    FROM [sys].[master_files] AS [mf]
    WHERE [mf].[database_id] = DB_ID()
    AND [mf].[type_desc] = ''LOG'' ) AS [current_log_size_(gb)] ,
    ( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
    / 1024 * .25 AS INT)
    FROM [sys].[master_files] AS [mf]
    WHERE [mf].[database_id] = DB_ID()
    AND [mf].[type_desc] = ''ROWS'' ) AS [25%_of_database_(gb)]
    FROM [sys].[tables] [t]
    INNER JOIN [sys].[indexes] [i]
    ON [t].[object_id] = [i].[object_id]
    INNER JOIN [sys].[partitions] [p]
    ON [i].[object_id] = [p].[object_id]
    AND [i].[index_id] = [p].[index_id]
    INNER JOIN [sys].[allocation_units] [a]
    ON [p].[partition_id] = [a].[container_id]
    WHERE [t].[is_ms_shipped] = 0
    GROUP BY SCHEMA_NAME([t].[schema_id]) ,
    [t].[name] ,
    [i].[name] ,
    [p].[rows]
    ORDER BY [index_total_space_gb] DESC)
    SELECT * ,
    CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
    THEN [ls].[largest_index_times_two_(gb)]
    ELSE [ls].[25%_of_database_(gb)]
    END AS [maybe_this_is_a_good_log_size(gb)]
    FROM [log_size] AS [ls]
    OPTION ( RECOMPILE );'

    insert into @databases
    select database_id, name from sys.databases --where database_id>4

    declare @logSize table(
    [database_name] [sysname],
    [schema_name] [nvarchar](128) NULL,
    [table_name] [sysname] NOT NULL,
    [name] [sysname] NULL,
    [row_count] [bigint] NOT NULL,
    [index_total_space_gb] [decimal](18, 2) NULL,
    [largest_index_times_two_(gb)] [bigint] NULL,
    [database_size_(gb)] [int] NULL,
    [current_log_size_(gb)] [int] NULL,
    [25%_of_database_(gb)] [int] NULL,
    [maybe_this_is_a_good_log_size(gb)] [bigint] NULL
    )

    while (1=1)
    Begin
    select top 1 @dbid=id, @dbname=dbname from @databases
    where id > @dbid
    order by id
    if @@ROWCOUNT = 0 BREAK;
    set @query='use [' + @dbname + ']; ' + @sql
    insert into @logSize
    exec sp_executesql @query
    End

    select * from @logSize order by [database_size_(gb)] desc

  • Nice work on estimating sizes!
    In addition, I think it makes (close to) no sense to (frequently) rebuild the clustered index of a table. Therefore, the metric ‘[index_total_space_gb]’ should point to the biggest non-clustered index.
    That is, if I’m not getting it all completely wrong, of course 🙂

    • There’s no filter on clustered vs nonclustered, so it should just find the largest heap/clustered/nonclustered structure period.
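      (If you really did want to limit it to nonclustered indexes only, adding AND [i].[index_id] > 1 to the WHERE clause would do it. For log sizing, though, the biggest structure is the one that matters.)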

  • Henrik Staun Poulsen
    March 3, 2016 5:47 am

    Hi Erik,

    May I recommend that you cast mf.size as a bigint?
    That way it does not crash on bigger databases.
    It is all too easy for sum(mf.size) to exceed 15 TB.
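    For example, swapping SUM([mf].[size]) for SUM(CAST([mf].[size] AS BIGINT)) in each of the [sys].[master_files] subqueries keeps the math in bigint.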

    Best regards,
    Henrik

  • Marvel Mayfield
    July 12, 2018 7:50 am

    I’m not someone who writes these types of SQL queries- I excel at finding them on the internet instead! 😉 Might I ask where to do the casting of mf.size to BIGINT that Henrik suggested? I’m running into this issue & I’ve tried a few variations without success & would greatly appreciate feedback from you query writing gurus!

  • Would your recommendations change if your biggest index is page compressed?

    • Jess — no, the script in the post is there as a starting place for people to size things. Your log file should still be at least as big as your largest object, even if it’s compressed. It may end up needing to be larger, but likely not smaller.

      • Totally understand that this is a starting point; I’ve just been thinking about data compression and its effect on transaction log records, so I was really just curious about your thoughts.

        I *think* (and please correct me if I’m wrong) that data can’t be page compressed in the transaction log, so I was wondering whether the base log size should be at least as big as your largest non-compressed (or maybe row-compressed) index. For example, my largest index is 4GB with PAGE compression, but 23GB with no compression applied and 12GB if ROW compression is used. Therefore, if during an index rebuild the index ‘ends up in’ the t-log uncompressed, I would need a lot more room available.

        • It sounds like you have the perfect setup to test that assumption out in.

          You could even blog about it

          😉

