Your Database Just Hit One Terabyte: Now What?

You were minding your own business, and all of a sudden it happened.
You glanced at file sizes one day, and your eyes got big. The numbers got a little large while you weren’t looking. This is a great time to stop and think about a few changes to the way you’re managing this database.
List your largest objects by size. Run sp_BlitzIndex @Mode = 2, @GetAllDatabases = 1, @SortOrder = ‘size’ to list out your largest indexes, and look at the size column. Start asking basic questions about these objects. Do we actually need these? Are we deleting out old data we no longer need? I ask these questions first because it helps get the database size back down before we tackle the rest of this stuff.
Start dropping unused indexes. Run sp_BlitzIndex @Mode = 2, @GetAllDatabases = 1, @SortOrder = ‘reads’ and ask tough questions about anything with 0 reads. Document the sizes of the indexes you’re dropping and use this as your victory trophy: these are easy wins at this point, and they buy you some time.
List your toughest queries by reads. Run sp_BlitzCache @SortOrder = ‘reads’ and start asking questions about how often these queries run, whether they’re running during business hours (check the last execution column), and whether they should be offloaded to a read-only replica.
Start thinking about offloading CHECKDB. As databases grow, it’s really common for DBAs to run corruption checking less often, and less thoroughly. They’re nervous because running CHECKDB with the right options takes too long on the primary, and their answer has been to cut corners, like running with the PHYSICAL_ONLY switch. You’ve got a big database now, which means big corruption risks, and you need to be using the fully intensive options like EXTENDED_LOGICAL_CHECKS. Your users probably won’t tolerate the slowdowns on the primary, and that’s okay: move CHECKDB over to AG secondaries, or on restored copies of the database.
Start thinking about storage snapshot backups and restores. Long term, as your database continues to grow, backup/restore/checkdb times are only going to get worse – much worse. Refreshing development and QA databases is going to get worse, too. Storage snapshot backups let you back up databases of any size, within seconds, and present that fresh backup over to another server in seconds, too. It sounds like black magic, but it really does work well. You need to have this in place before you hit the 2-5TB marks, and it takes planning, so start that planning and budgeting now.
Time a test restore. Pretend you just lost the production server – like somebody deleted the wrong VM, or you have a serious Windows or SQL patching problem. If you had to restore right now, how much data would you lose, and how long would you be down for? Time these numbers.
Run a limited DR test. Imagine that the production site is hosed or hit by drones. Do you have a DR replica ready to go? How long will it take to bring it online, and how far behind will it be? If you’re using continuous synchronization tools like Availability Groups, Log Shipping, etc, then start monitoring those methods to see how far behind they are in seconds or minutes. Monitor that for a week to get a rough idea of whether your network infrastructure is currently keeping up.
Document your current RPO/RTO state. Now that you’ve reduced the database size and done some timed tests, take a deep breath and start architecting. If some bozo runs a delete without a where clause, how long is the business comfortable being down while I restore the database to another server and push that table’s contents over? How much data am I allowed to lose? Document the current state using the RPO/RTO worksheet in the First Responder Kit, hand that to your manager, and say, “FYI, here’s how much data we could lose and how long we could be down in the event of these kinds of outages. If the business isn’t okay with that, have them look at the cost numbers on the second page of the worksheet, and let’s talk about budgeting to improve those numbers.”
Start tracking your backup sizes. I love that SQL Server’s built-in system tables like msdb.dbo.backupset track the sizes of your uncompressed database and the compressed backups over time for you. Check out sp_BlitzBackups for a quick report on how your databases have been growing, and consider plotting database size in a report for management. As data grows on fixed hardware, query performance can go down, and you wanna see that coming before the users gather at the door with pitchforks. Those are really useful numbers to have at budget time, too.
Ask for training help. Your developers are probably trying to solve terabyte-sized problems with gigabyte-sized skills. Tell your manager that the developers probably need my all-hands Fundamentals bundle, where everybody in the team gets access to my Fundamentals of Query Tuning, Index Tuning, and Parameter Sniffing classes, and the managers get progress reports to know that the team are making their way through the videos. Raise their skill levels before they ship too much more technical debt.
Related

Hi! I’m Brent Ozar.
I make Microsoft SQL Server go faster. I love teaching, travel, cars, and laughing. I’m based out of Las Vegas. He/him. I teach SQL Server training classes, or if you haven’t got time for the pain, I’m available for consulting too.
