Most of you are going to hate this
And TL;DR, there’s a script at the end of the post. But like The Monster At The End Of This Book, it’s worth it not to skip the middle.
There are about a billion but-what-ifs that could come into play. I can’t possibly answer all of those for you. But that’s not the point of this post, anyway! If you’re in a special circumstance, using some fancy features, or doing something utterly deranged to your database, this isn’t the post, or script, for you.
I mean really, unless the size of your log file is causing you some dramatic pain, leave it alone. You should probably go invent cold fusion if log file size is the worst issue in your database. Congratulations.
This is also a lousy place to ask me if you can shrink your log file. I have no idea how or why it got that size. There’s free space now because you’re using FULL recovery model and you took a log backup, or you’re in SIMPLE and your database hit a CHECKPOINT. No magic there. It may very well grow to that size again, so shrinking it could be a really dumb idea.
So what’s the point? Lots of people ask me this question: clients, Office Hours attendees, random passersby on the street who recognize me (even without my Robot). I usually give them the same answer and explanation, unless I have ample evidence that their fancy and/or deranged ways require a different estimate.
From the ivory tower
A good STARTING POINT for your log file is twice the size of the largest index in your database, or 25% of the database size. Whichever is larger.
Why?
If the largest object in your database is larger than 25% of your database, you are likely running some type of maintenance on it. An index rebuild needs roughly the size of the object being rebuilt in log space. My rule of thumb is twice that space, in case you’re doing anything else while that maintenance runs: ETL, reports, dragging data to and fro, purging data, whatever. If you’re only ever reorganizing the largest object, you may not need all that space. Are you sure you’re ONLY ever reorganizing it? I’ll wait.
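To put made-up numbers on it: in a 200GB database whose largest index is 80GB, twice the largest index is 160GB and 25% of the database is 50GB, so you’d start the log at 160GB. If the largest index were only 20GB, the 25% figure (50GB) would win instead.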
But 25% seems so random!
Well, kinda. But you’re here for a starting point. If you’re not Super DBA, taking baselines and trending your database file sizes over time, random is better than nothing. It buys you some leeway, too:
- If you miss a log backup (maintenance plans got you down?)
- If you’re not taking frequent enough log backups (can I interest you in RPO/RTO insurance?)
- If you run other long/large transactions (SSIS won’t save you)
You’ll have a fair amount of room to do your dirty work. Most sane and rational people consider this to be a positive thing.
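Not sure how frequently those log backups are really happening? A quick look at msdb will tell you. This is just a sketch (LAG needs SQL Server 2012 or newer); run it from the database you care about:

SELECT TOP (20)
       bs.database_name,
       bs.backup_start_date,
       DATEDIFF(MINUTE,
                LAG(bs.backup_start_date) OVER (PARTITION BY bs.database_name ORDER BY bs.backup_start_date),
                bs.backup_start_date) AS minutes_since_previous_log_backup
FROM msdb.dbo.backupset AS bs
WHERE bs.type = 'L' --log backups only
AND bs.database_name = DB_NAME()
ORDER BY bs.backup_start_date DESC;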
But what if my log file still grows?
Well, then you found out you need a bigger log file. Or you need to take log backups more frequently. Perhaps those hourly log backups aren’t working out as you planned, hm?
And if your log file never grows, you’ll look really smart. And you’ll never have to wait for your log file to expand. They don’t benefit from Instant File Initialization the way data files do.
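If you do decide to right-size the log yourself, check where it is now and grow it ahead of time, during a quiet period. A sketch, with made-up database and file names:

--Current log size and growth settings (sizes in MB)
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS VARCHAR(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS VARCHAR(10)) + ' MB'
       END AS growth_setting
FROM sys.database_files
WHERE type_desc = 'LOG';

--Grow the log to your target size up front, with a fixed increment as a safety net
ALTER DATABASE [YourDatabase]
MODIFY FILE ( NAME = N'YourDatabase_log', SIZE = 50GB, FILEGROWTH = 1GB );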
Show me the script already
It’s all right under here. Don’t forget to change the USE statement. All sizes are in GB. If your database is smaller than 1GB, you’re one of those lucky DBAs who can take vacations and stuff. Go do that. Life is short.
If your database is under 1GB, and your log file is over 1GB, start taking log backups. I’m pretty sure you’re not.
USE [YourDatabase]; --change this to the database you care about
--No, I'm not writing this to loop through all of your databases

WITH log_size AS (
    SELECT TOP 1
            SCHEMA_NAME(t.schema_id) AS schema_name ,
            t.name AS table_name ,
            i.name ,
            p.rows AS row_count ,
            CAST(( SUM(CONVERT(BIGINT, a.total_pages)) * 8. ) / 1024. / 1024. AS DECIMAL(18,2)) AS index_total_space_gb ,
            ( SUM(CONVERT(BIGINT, a.total_pages)) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
            ( SELECT ( SUM(CONVERT(BIGINT, mf.size)) * 8 ) / 1024 / 1024
              FROM sys.master_files AS mf
              WHERE mf.database_id = DB_ID() ) AS [database_size_(gb)] ,
            ( SELECT CAST(( SUM(CONVERT(BIGINT, mf.size)) * 8 ) / 1024 / 1024 AS INT)
              FROM sys.master_files AS mf
              WHERE mf.database_id = DB_ID()
              AND mf.type_desc = 'LOG' ) AS [current_log_size_(gb)] ,
            ( SELECT CAST(( SUM(CONVERT(BIGINT, mf.size)) * 8 ) / 1024 / 1024 * .25 AS INT)
              FROM sys.master_files AS mf
              WHERE mf.database_id = DB_ID()
              AND mf.type_desc = 'ROWS' ) AS [25%_of_database_(gb)]
    FROM sys.tables t
    INNER JOIN sys.indexes i
        ON t.object_id = i.object_id
    INNER JOIN sys.partitions p
        ON i.object_id = p.object_id
        AND i.index_id = p.index_id
    INNER JOIN sys.allocation_units a
        ON p.partition_id = a.container_id
    WHERE t.is_ms_shipped = 0
    GROUP BY SCHEMA_NAME(t.schema_id) ,
             t.name ,
             i.name ,
             p.rows
    ORDER BY index_total_space_gb DESC
)
SELECT  ls.schema_name,
        ls.table_name,
        ls.name,
        ls.row_count,
        ls.index_total_space_gb,
        ls.[largest_index_times_two_(gb)],
        ls.[database_size_(gb)],
        ls.[current_log_size_(gb)],
        ls.[25%_of_database_(gb)] ,
        CASE WHEN ls.[largest_index_times_two_(gb)] > ls.[25%_of_database_(gb)]
             THEN ls.[largest_index_times_two_(gb)]
             ELSE ls.[25%_of_database_(gb)]
        END AS [maybe_this_is_a_good_log_size(gb)]
FROM log_size AS ls
OPTION ( RECOMPILE );
Comments
So Erik, to summarize: you would want to presize your database log file when/if it becomes a problem, to prevent autogrowth from recurring and to know that the log file size will remain constant, so a DBA doesn’t have to intervene and possibly shrink it. Correct? This is assuming, as you mentioned, that log backups are being taken frequently enough.
Hey Rob,
Generally, yes. But keep in mind that log file size will necessarily grow along with data and object sizes in your database. So unless your database is static, don’t freak out at every growth.
Thanks!
Erik, without baseline information, would you recommend using 25% for the initial log size when a new database is created?
This is a wide open question!
How “new” is the database?
Is it for a brand new app?
Are there any projections for data size in the RFP or project plan?
How much data is development generating?
If you’re here, you’ll just have to pick a metric and trend it over time (there’s a bare-bones sketch for that after these questions).
Is it a new customer in a known app?
Do they have data in another app that needs to be migrated?
Do you have baselines on other databases used by the application?
How much commensurate internal data are they currently generating?
If you’re here, start with the metrics from the database of an existing client of similar size.
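If trending is where you land, even something bare-bones is a fine starting point. A sketch (the table name and schedule are made up; point an Agent job at the INSERT once a day or so):

CREATE TABLE dbo.file_size_history
(
    collection_time DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    database_name   SYSNAME NOT NULL,
    file_name       SYSNAME NOT NULL,
    type_desc       NVARCHAR(60) NOT NULL,
    size_mb         BIGINT NOT NULL
);

INSERT dbo.file_size_history (database_name, file_name, type_desc, size_mb)
SELECT DB_NAME(mf.database_id),
       mf.name,
       mf.type_desc,
       ( CONVERT(BIGINT, mf.size) * 8 ) / 1024
FROM sys.master_files AS mf;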
Nice blog post. You cannot overemphasize “…you’ll never have to wait for your log file to expand. They don’t benefit from Instant File Initialization the way data files do.” I recently had a tempdb log file growth event happen on our production DB server and it brought the entire site to a crawl; we’re running on SSDs, but it’s a 60K+ TPS system. Any thoughts on tempdb log sizing?
There are some good queries available online which utilize the default trace to show when database log and data file growth events occur.
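Something along these lines, for example (a sketch; it assumes the default trace is still enabled, which it is unless someone turned it off):

DECLARE @trace_path NVARCHAR(260);

SELECT @trace_path = [path]
FROM sys.traces
WHERE is_default = 1;

SELECT t.DatabaseName,
       t.FileName,
       t.StartTime,
       ( t.IntegerData * 8 ) / 1024 AS growth_mb,
       te.name AS event_name
FROM sys.fn_trace_gettable(@trace_path, DEFAULT) AS t
JOIN sys.trace_events AS te
    ON t.EventClass = te.trace_event_id
WHERE te.name IN (N'Log File Auto Grow', N'Data File Auto Grow')
ORDER BY t.StartTime DESC;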
Brent Ozar himself did a post on tempdb sizing a month ago that might help you:
https://www.brentozar.com/archive/2016/01/cheat-sheet-how-to-configure-tempdb-for-microsoft-sql-server/
John’s right about Brent’s post on sizing. If you want to track log growth, I have one on using Extended Events to do so.
https://www.brentozar.com/archive/2015/12/tracking-tempdb-growth-using-extended-events/
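If you want something quick to start from, an event session along these lines will do it (a sketch; the session and file names are made up):

CREATE EVENT SESSION [track_file_growth] ON SERVER
ADD EVENT sqlserver.database_file_size_change
(
    ACTION ( sqlserver.database_name, sqlserver.sql_text )
)
ADD TARGET package0.event_file ( SET filename = N'track_file_growth' );

ALTER EVENT SESSION [track_file_growth] ON SERVER STATE = START;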
“--No, I’m not writing this to loop through all of your databases”
Kind of off topic, but this has been something I’ve had to do on more than one occasion, and I’ve seen a number of ways to do it. For the curious, this (so far) has been the easiest approach:
DECLARE @SQL NVARCHAR(MAX)
SELECT @SQL = STUFF((
SELECT '
USE [' + d.name + ']
SELECT
thing
FROM
place;'
FROM sys.databases d
WHERE thing = other_thing --if you want only user databases or online databases do it here
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '')
EXEC sp_executeSQL @SQL;
Every time a DBA shrinks a transaction log, God kills a kitten. Please, think of the kittens.
What a blast from the past with The Monster At The End Of This Book! Oh, happy memories! Yes, I enjoyed the rest of your blog but that was my favorite part. 🙂
I LOVE “The Monster At The End Of This Book” so I read the whole article.
-J Grover <-- really Grover
Here you go:
/************** Create table variable to store database names **************************/
Declare @Databases table
(id int identity (1,1), DatabaseName nvarchar(100))

CREATE TABLE #Temp_LOG(
[schema_name] [nvarchar](128) NULL,
[table_name] [sysname] NOT NULL,
[name] [sysname] NULL,
[row_count] [bigint] NOT NULL,
[index_total_space_gb] [decimal](18, 2) NULL,
[largest_index_times_two_(gb)] [bigint] NULL,
[database_size_(gb)] [int] NULL,
[current_log_size_(gb)] [int] NULL,
[25%_of_database_(gb)] [int] NULL,
[maybe_this_is_a_good_log_size(gb)] [bigint] NULL
)

/************** Insert database names into table variable **************************/
Insert into @Databases
SELECT name [Database Name]
FROM [model].[sys].[databases]
where (database_id > 4 and Name Not In('Distribution'))
order by name

--select * from @Databases --Select all databases from table variable (For testing only)

/***************************** Set up variables for loop **********************/
Declare @row int
Declare @rows int
Declare @database_name Nvarchar(MAX)
Declare @Execdatabase Nvarchar(MAX)

/***************************** Set up loop execution *****************************/
Select @rows = (Select MAX(id) from @Databases)
Set @row = 1

While @row <= @rows
Begin
Select @database_name = DatabaseName from @Databases where id = @row

Set @Execdatabase = 'USE [' + @database_name + '];
WITH [log_size] AS ( /* the same [log_size] CTE as the script in the post */ )
INSERT INTO #Temp_LOG
SELECT [ls].* ,
CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
THEN [ls].[largest_index_times_two_(gb)]
ELSE [ls].[25%_of_database_(gb)]
END AS [maybe_this_is_a_good_log_size(gb)]
FROM [log_size] AS [ls]
OPTION ( RECOMPILE );'

exec (@Execdatabase)
Set @row = @row + 1
End

Select * from #Temp_LOG
Drop Table #Temp_LOG
This is similar to what has already been posted to loop through all databases. In my case, I am skipping system databases.
declare @databases table (id int, dbname nvarchar(128))
declare @dbname nvarchar(128)
declare @dbid int = 0
declare @query nvarchar(4000)
declare @sql nvarchar(3000)
set @sql =
';WITH [log_size]
AS ( SELECT TOP 1
SCHEMA_NAME([t].[schema_id]) AS [schema_name] ,
[t].[name] AS [table_name] ,
[i].[name] ,
[p].[rows] AS [row_count] ,
CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18,
2)) AS [index_total_space_gb] ,
( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
( SELECT ( SUM([mf].[size]) * 8 ) / 1024 / 1024
FROM [sys].[master_files] AS [mf]
WHERE [mf].[database_id] = DB_ID() ) AS [database_size_(gb)] ,
( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
/ 1024 AS INT)
FROM [sys].[master_files] AS [mf]
WHERE [mf].[database_id] = DB_ID()
AND [mf].[type_desc] = ''LOG'' ) AS [current_log_size_(gb)] ,
( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
/ 1024 * .25 AS INT)
FROM [sys].[master_files] AS [mf]
WHERE [mf].[database_id] = DB_ID()
AND [mf].[type_desc] = ''ROWS'' ) AS [25%_of_database_(gb)]
FROM [sys].[tables] [t]
INNER JOIN [sys].[indexes] [i]
ON [t].[object_id] = [i].[object_id]
INNER JOIN [sys].[partitions] [p]
ON [i].[object_id] = [p].[object_id]
AND [i].[index_id] = [p].[index_id]
INNER JOIN [sys].[allocation_units] [a]
ON [p].[partition_id] = [a].[container_id]
WHERE [t].[is_ms_shipped] = 0
GROUP BY SCHEMA_NAME([t].[schema_id]) ,
[t].[name] ,
[i].[name] ,
[p].[rows]
ORDER BY [index_total_space_gb] DESC)
SELECT * ,
CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
THEN [ls].[largest_index_times_two_(gb)]
ELSE [ls].[25%_of_database_(gb)]
END AS [maybe_this_is_a_good_log_size(gb)]
FROM [log_size] AS [ls]
OPTION ( RECOMPILE );'
insert into @databases
select database_id, name from sys.databases where database_id>4
declare @logSize table(
[schema_name] [nvarchar](128) NULL,
[table_name] [sysname] NOT NULL,
[name] [sysname] NULL,
[row_count] [bigint] NOT NULL,
[index_total_space_gb] [decimal](18, 2) NULL,
[largest_index_times_two_(gb)] [bigint] NULL,
[database_size_(gb)] [int] NULL,
[current_log_size_(gb)] [int] NULL,
[25%_of_database_(gb)] [int] NULL,
[maybe_this_is_a_good_log_size(gb)] [bigint] NULL
)
while (1=1)
Begin
select top 1 @dbid=id, @dbname=dbname from @databases
where id > @dbid
order by id
if @@ROWCOUNT = 0 BREAK;
set @query='use [' + @dbname + ']; ' + @sql
insert into @logSize
exec sp_executesql @query
End
select * from @logSize
I modified the previous query I posted to include database names because on one of my servers, I have multiple databases with the same index name.
declare @databases table (id int, dbname nvarchar(128))
declare @dbname nvarchar(128)
declare @dbid int = 0
declare @query nvarchar(4000)
declare @sql nvarchar(3000)
set @sql =
';WITH [log_size]
AS ( SELECT TOP 1
DB_NAME() [Database_Name],
SCHEMA_NAME([t].[schema_id]) AS [schema_name] ,
[t].[name] AS [table_name] ,
[i].[name] ,
[p].[rows] AS [row_count] ,
CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18,
2)) AS [index_total_space_gb] ,
( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
( SELECT ( SUM([mf].[size]) * 8 ) / 1024 / 1024
FROM [sys].[master_files] AS [mf]
WHERE [mf].[database_id] = DB_ID() ) AS [database_size_(gb)] ,
( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
/ 1024 AS INT)
FROM [sys].[master_files] AS [mf]
WHERE [mf].[database_id] = DB_ID()
AND [mf].[type_desc] = ''LOG'' ) AS [current_log_size_(gb)] ,
( SELECT CAST(( SUM([mf].[size]) * 8 ) / 1024
/ 1024 * .25 AS INT)
FROM [sys].[master_files] AS [mf]
WHERE [mf].[database_id] = DB_ID()
AND [mf].[type_desc] = ''ROWS'' ) AS [25%_of_database_(gb)]
FROM [sys].[tables] [t]
INNER JOIN [sys].[indexes] [i]
ON [t].[object_id] = [i].[object_id]
INNER JOIN [sys].[partitions] [p]
ON [i].[object_id] = [p].[object_id]
AND [i].[index_id] = [p].[index_id]
INNER JOIN [sys].[allocation_units] [a]
ON [p].[partition_id] = [a].[container_id]
WHERE [t].[is_ms_shipped] = 0
GROUP BY SCHEMA_NAME([t].[schema_id]) ,
[t].[name] ,
[i].[name] ,
[p].[rows]
ORDER BY [index_total_space_gb] DESC)
SELECT * ,
CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
THEN [ls].[largest_index_times_two_(gb)]
ELSE [ls].[25%_of_database_(gb)]
END AS [maybe_this_is_a_good_log_size(gb)]
FROM [log_size] AS [ls]
OPTION ( RECOMPILE );'
insert into @databases
select database_id, name from sys.databases --where database_id>4
declare @logSize table(
[database_name] [sysname],
[schema_name] [nvarchar](128) NULL,
[table_name] [sysname] NOT NULL,
[name] [sysname] NULL,
[row_count] [bigint] NOT NULL,
[index_total_space_gb] [decimal](18, 2) NULL,
[largest_index_times_two_(gb)] [bigint] NULL,
[database_size_(gb)] [int] NULL,
[current_log_size_(gb)] [int] NULL,
[25%_of_database_(gb)] [int] NULL,
[maybe_this_is_a_good_log_size(gb)] [bigint] NULL
)
while (1=1)
Begin
select top 1 @dbid=id, @dbname=dbname from @databases
where id > @dbid
order by id
if @@ROWCOUNT = 0 BREAK;
set @query='use [' + @dbname + ']; ' + @sql
insert into @logSize
exec sp_executesql @query
End
select * from @logSize order by [database_size_(gb)] desc
Nice work on estimating sizes!
In addition, I think that it makes (close to) no sense to (frequently) rebuild the clustered index for a table. Therefore, the metric ‘[index_total_space_gb]’ should point to the biggest non-clustered index.
That is, if I’m not getting it all completely wrong, of course 🙂
There’s no filter on clustered vs nonclustered, so it should just find the largest heap/clustered/nonclustered structure period.
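If you really do only ever touch nonclustered indexes and want the estimate to reflect that, you could narrow the script down by adding AND i.index_id > 1 to its WHERE clause (index_id 0 is a heap, 1 is the clustered index, anything higher is nonclustered).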
Hi Erik,
May I recommend that you cast mf.size as a bigint?
That way it does not crash on bigger databases.
It is all too easy for sum(mf.size) to exceed 15 TB.
Best regards,
Henrik
I’m not someone who writes these types of SQL queries; I excel at finding them on the internet instead! 😉 Might I ask where to do the casting of mf.size to BIGINT that Henrik suggested? I’m running into this issue and I’ve tried a few variations without success, and would greatly appreciate feedback from you query-writing gurus!
Marvel — I’ve updated the code in the post. Let me know if that works for you.
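For anyone comparing against an older copy of the script, the change is converting to BIGINT before summing, like so:

--before: can overflow INT on very large databases
( SUM(mf.size) * 8 ) / 1024 / 1024
--after: what the script above does now
( SUM(CONVERT(BIGINT, mf.size)) * 8 ) / 1024 / 1024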
Eureka! Thanks so much, Erik!
Would your recommendations change if your biggest index is page compressed?
Jess — no, the script in the post is there as a starting place for people to size things. Your log file should still be at least as big as your largest object, even if it’s compressed. It may end up needing to be larger, but likely not smaller.
Totally understand that this is a starting point, I’ve just been thinking about data compression and the effect on transaction log records so I was really just curious on your thoughts
I *think* (and please correct if I’m wrong) that data can’t be page compressed in the transaction log, so I was wondering whether the base log size should be at least as big as your largest non-compressed (or row compressed maybe) index? For example my largest index is 4GB with PAGE compression but 23GB with no compression applied and 12GB if ROW compression is used. Therefore if during index rebuild the index ‘ends up in’ the t-log uncompressed I would need a lot more room available.
It sounds like you have the perfect setup to test that assumption out in.
You could even blog about it
😉
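For what it’s worth, one rough way to check (a sketch; the index and table names are made up, and you’d want to run it against a restored copy in FULL recovery):

--Log space used before the rebuild
DBCC SQLPERF(LOGSPACE);

ALTER INDEX [IX_YourCompressedIndex] ON [dbo].[YourBigTable] REBUILD;

--Log space used after; compare, then repeat with the index uncompressed
DBCC SQLPERF(LOGSPACE);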