In SQL ConstantCare®, we’re adding a notification for when your data size grows >10% in a single week. When this happens, you want to stop for a second to ask yourself a few questions:
Did someone just create a backup table? Folks often do a SELECT INTO to back up an important table right before they make a dangerous change. Unfortunately, they rarely remember to go back and delete that table. Run sp_BlitzIndex @Mode = 2, which gives you an inventory of objects in the database. Copy/paste that into Excel, and sort it by size descending (either number of rows, or space allocated), and just make sure you don’t have a bunch of backup tables using up space. If you’re not sure if they’re still used, try renaming them, like rename SalesBckup to SalesBckup_ToBeDeleted_20191231. That way, if someone comes running back in saying they need that table, you can rename it back and you’re good to go. Otherwise, delete it on the specified date.
Did someone just create a bunch of indexes? Someone might have gone a little overboard with the Database Tuning Advisor creating covering indexes for a whole bunch of queries.
Is Instant File Initialization properly configured? Will we be able to grow out data files instantly? This is a massive help for restores, too: when you’re under the gun to get back online quickly, you need the ability to instantly grow out the data files before a restore begins.
Do I need to reconfigure the database’s log file? Instant File Initialization only works on the data files, not the log files, so large file growths can be slow. Make sure you don’t have the log file configured with a large percentage growth, like 10%, which might take a long time on a 100GB log file. It’s also a good time to make sure you don’t have shrink jobs – if someone’s really growing the data by 10% per week, you’re going to need some log space while these events happen.
Am I backing up the log often enough? If we’re only doing log backups once per hour, the log file might be growing to a large size, and log backup times might be getting ugly. Consider backing up the log more frequently to nibble off the changes over time.
Do I have enough space for TempDB? My general rule of thumb is that you want TempDB to be able to handle 25% of the server’s total data size, or the size of your largest table, whichever is larger. Sooner or later, somebody’s gonna dump your largest table into TempDB as part of a reporting query, and you’re going to need to be able to handle that workload without taking the server down.
Do I have enough space in DR? It’s not enough to monitor space on your primary replicas: you also have to check your disaster recovery environments to make sure you’ll be able to fail over. Don’t just think about data & log space – think about TempDB size and backup capacity over there, too.
We ran into the orphaned backup table issue too and implemented a TBD schema that we use to take table backups. When the schema is created, we create a job that scans for objects belonging to that schema with a create date older than a certain time. Disk growth has declined dramatically.
Great post! My only disagreement is that the last two items (maybe even last 3) should be thought about not just when your databases grow quickly. Also, a while back Erik wrote a post about log space. Do you still agree that a good starting point is twice the size of your largest index or 25% of your database size?
Sure, sounds good!
I’ve never found any rule that works for log spacing for any db bigger than about 2-3 Gb, and then I usually use 1/4 of the DB size, though I typically dont size any smaller than 256 Mb unless I know the transactions are going to be extremely minimal. I have to look it up each time I need to explain why 256 Mb, but it has something to do with the 64k extents fitting neatly into the ldf, minimizing the number of virtual log files in 2014+ and the VLFs being reasonably sized for some multipurpose use.
I think its better to size the ldf arbitrarily at first and see what it grows to, then back it up, shrink it to nothing and expand it to whatever seems right + more for small woopsies and growth
“I think its better to size the ldf arbitrarily at first and see what it grows to, then back it up, shrink it to nothing and expand it to whatever seems right + more for small woopsies and growth”
This. A random % rule ain’t gonna fly in an enterprise environment. the answer is always “it depends”.
Monitor, measure, and adjust to reality.
Yup. The wasted space, or inadequate space can become a much bigger problem later when you use a percentage rule that is almost never going to apply.
When you have a data analyst do something stupid in Python that fills up a log drive, causing an emergency, and you look at a bunch of big ldf files that are 99% empty, you can’t know which of those LDFs can sacrifice the space unless you sized it thoughtfully in the first place, at which point you’d know exactly what you could sacrifice, if anything.
I my place of work we keep track of all that and also do one more thing for temporary backup tables when needed (for deployments or other needs). We created a solution called the zzBACK database, is a special database that only has temporary backup tables or other SQL objects they are named zzBACK_Tablename_YYYYMMDD. This database is never backup because is for temporary objects only. Each month a maintenance job runs in every server that has it and delete any object that is older than 30 days in the database (this parameter can be adjusted if necessary). So our rule is, if you need a temporary backup of any SQL object , including tables with data will be created on the zzBACK database that is clean up every month. A self cleaning solution.
I’d add that you should check with whoever manages your VM’s that you actually have space despite what the OS may report, thin provisioning should not be used on production servers, but sometimes it is >.<.
Did the latest release contain a bug? I’ve seen that. “Exsqueeze me mister app boss, do you know you’re creating 10gb of log rows every day, since that last deployment? Do you really need fifty copies of each row? You’ll fix it? Good. Now all I have to do is figure out how to clean up, delete 100gb of bogus and dup rows without totally bogging production.”