When you weren’t looking, your databases went and grew up. Now your backup window has grown so large that you’re about ready to open it and jump.
Time to make a choice.
The Native Way: Tuning SQL Server Backups
You can theoretically pull this off by using a combination of tactics:
Back up as infrequently as the business will allow. Run your full backups once a week (or if you want to go wild and crazy, once per month) and differential backups periodically. As Jes explains in her backup and recovery class, differentials back up the data pages that have changed since the last full backup. When disaster strikes, you only need to recover the most recent full backup, the most recent differential backup, and all of the log backups after the differential. This can shave a lot of time off your restores – but only if you minimize the number of changed pages in the database. This means…
Change the database as little as possible. We can’t change what the users do, but we can change what we DBAs do. Stop doing daily index defrag/rebuild jobs – you’re just changing pages in the database, which means instantly inflating the size of your differential backups. In a scenario like this, you can only do index maintenance when you’re sure it is the only way to solve a performance problem, and it absolutely has to be your last resort.
Tune the data file read speeds. You need to read the pages off disk as fast as possible to back them up. Use tools like CrystalDiskMark and SQLIO to measure how fast you’re going, and then tune your storage to go faster.
Compress the data as much as possible. It’s not just about minimizing the size of your backup file – it’s about minimizing the amount of data we have to write to disk. Bonus points for using index compression inside the database so that it’s compressed once, not recompressed every time we do a backup, although that doesn’t really help with off-row data.
Tune the backup target write speeds. If you’re using a small pool of SATA drives in RAID 5 as a backup target, it’s probably not going to be able to keep up with a giant volume of streaming writes, even if those writes are compressed versions of the database. Problems will get even worse if multiple servers are backing up to the same RAID 5 pool simultaneously, because the interleaved writes turn random, which is the worst-case scenario for RAID 5.
Tune the bottleneck between the reads and the writes. If you’re backing up over the network, use 10Gb Ethernet to avoid the pains of trying to push a lot of data through a tiny 1Gb straw.
Tune your backup software settings. If you’re using native backups, start by striping the backup across multiple files and trying the built-in options, and graph your results after each change. Third-party compression products usually offer all kinds of knobs to tweak – you’ll need to use that same level of graphing diligence.
Whew. I got tired just typing all that stuff. And if you’re lucky, at the end of it, your backups will complete in an hour or two, but the server might be darned near unusable while you’re beating the daylights out of it. Then the fun balancing act starts, trying to figure out the right point where the system is still usable but the backups complete quickly.
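The restore sequence from the differential tactic above (most recent full, most recent differential, then every log backup after it) can be sketched in a few lines of Python. This is a simplified illustration, not how SQL Server picks a restore chain: the real thing works off LSNs, while this sketch only compares finish times, and the `Backup` class and `restore_chain` function are made-up names for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str            # "full", "diff", or "log" (hypothetical labels)
    finished: datetime   # when the backup completed


def restore_chain(backups):
    """Return the backups to restore, in order, to reach the latest point in time."""
    ordered = sorted(backups, key=lambda b: b.finished)

    fulls = [b for b in ordered if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available")
    base = fulls[-1]

    # Only differentials taken after the most recent full can be applied to it,
    # and only the newest one is needed.
    diffs = [b for b in ordered if b.kind == "diff" and b.finished > base.finished]
    chain = [base] + diffs[-1:]

    # Replay every log backup that finished after the last full/diff in the chain.
    start = chain[-1].finished
    chain += [b for b in ordered if b.kind == "log" and b.finished > start]
    return chain
```

Feed it a week of backups and the chain stays short no matter how many differentials and log backups happened earlier in the week. How long each piece takes to restore is a separate question, and that is where the changed-page count comes in.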
Or Just Cheat with SAN Snapshots
In my Virtualization, SAN, and Hardware video class, I explain how SAN snapshots are able to take a full database backup of any size in just a couple of seconds.
There’s a catch, though: while a snapshot is technically a backup, I don’t really consider it a backup until it’s off the primary storage device. Your SAN storage, expensive as it was, is still vulnerable to failure, and you need to get that data out as quickly as possible. The good news is that you can move that data out without dragging it through the SQL Server’s storage connections, CPU, and network ports. You can simply (simply?) hook a virtual tape library, actual tape library, or another storage device to the same storage network, and copy directly between the two.
Your data read speeds may degrade during that process, but it’s up to you – if you want to architect your storage so that it’s fast enough to do these full backups without any noticeable performance impact on the end user, it’s possible by inserting enough quarters in the front.
You still have to pay attention, though, because your backup process will look like this:
- Daily full backups via SAN snapshots – all writes are quiesced for 1-10 seconds during this time
- Conventional log backups every X minutes – where X is dictated by the business
If you push a big index rebuild job through, you can still bloat the transaction log, and your log backups may take longer than X minutes to complete. This is where our RPO/RTO planning worksheet is so important – if your RPO is 1 minute, you simply may not be able to do index rebuild jobs.
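To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the log backup runs at a steady throughput and that you already know about how much log the rebuild generates; both inputs and both function names are hypothetical, so measure your own figures before trusting the answer.

```python
def log_backup_seconds(log_mb: float, throughput_mb_per_s: float) -> float:
    """Seconds needed to back up log_mb of transaction log at a steady rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return log_mb / throughput_mb_per_s


def fits_interval(log_mb: float, throughput_mb_per_s: float,
                  interval_minutes: float) -> bool:
    """True if the log backup finishes before the next one is due."""
    return log_backup_seconds(log_mb, throughput_mb_per_s) <= interval_minutes * 60


# Assumed example: an index rebuild writes 50 GB (51,200 MB) of log,
# and log backups run at 200 MB/s.
print(log_backup_seconds(51_200, 200))   # 256.0 seconds
print(fits_interval(51_200, 200, 1))     # False: blows a 1-minute interval
print(fits_interval(51_200, 200, 15))    # True: fits a 15-minute interval
```

With those assumed numbers, a single rebuild takes more than four minutes of log backup time, which is fine for a 15-minute schedule and hopeless for a 1-minute one.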
SAN snapshots have one other drawback: depending on your storage make/model, snapshots may not be included in your base licensing. You may have to spend a lot more (typically tens of thousands of dollars) to unlock the feature. Ask your SAN admin if snapshots are right for your wallet.
Building Terabyte Servers Means Starting with Backups First
When I’m building a SQL Server to hold multiple terabytes of databases, this backup question is the very first one we have to address – even before we talk about the speed of end user queries.
Otherwise, we could end up designing a server with all local solid state drives, which is very inexpensive and satisfies end user performance goals – but we can’t back the data up fast enough.
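A quick way to sanity-check a design before buying anything is to estimate the conventional backup window from the slowest link in the chain: data file reads, the network, or backup target writes. The sketch below is only a rough model. It assumes compression happens on the SQL Server before the data crosses the network, that every stage sustains its rated speed, and that 1Gb and 10Gb Ethernet deliver about 125 MB/s and 1,250 MB/s; the function name and all the sample numbers are made up for illustration.

```python
def backup_hours(db_gb: float, read_mb_s: float, network_mb_s: float,
                 write_mb_s: float, compression_ratio: float = 1.0) -> float:
    """Estimate hours for a full backup, limited by the slowest stage.

    compression_ratio is uncompressed size / compressed size (3.0 means 3:1).
    """
    if min(read_mb_s, network_mb_s, write_mb_s, compression_ratio) <= 0:
        raise ValueError("all rates and the compression ratio must be positive")
    # Reads see uncompressed data. The network and the backup target only carry
    # compressed data, so each of their MB/s covers `compression_ratio` MB of source.
    effective_mb_s = min(
        read_mb_s,
        network_mb_s * compression_ratio,
        write_mb_s * compression_ratio,
    )
    return db_gb * 1024 / effective_mb_s / 3600


# Assumed example: 5 TB of data, 1,000 MB/s reads, 400 MB/s target writes, 3:1 compression.
print(round(backup_hours(5120, 1000, 125, 400, 3.0), 2))    # 1Gb Ethernet: 3.88 hours
print(round(backup_hours(5120, 1000, 1250, 400, 3.0), 2))   # 10Gb Ethernet: 1.46 hours
```

In the assumed 1Gb case the network is the bottleneck; swap in 10Gb and the data file reads become the limit instead. That is exactly the kind of thing you want to know before the server is built.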