You manage a SQL Server instance with hundreds or thousands of mission-critical databases. You want to back them all up as quickly as possible, and one maintenance plan job isn’t going to cut it.
You also want to protect those databases by restoring them as quickly as possible somewhere else – maybe nearby, or maybe to a totally separate place like the cloud.
First, let’s scale out our backup jobs by:
- Installing Ola Hallengren’s free backup script
- Creating a table with a list of databases and their desired Recovery Point Objective (RPO, aka data loss) – done with sp_AllNightLog_Setup
- Setting up several Agent jobs to back up databases as necessary – also done with sp_AllNightLog_Setup
- Having each of those Agent jobs call sp_AllNightLog @Backup = 1, which loops through the table to find databases that need to be backed up, then calls Ola Hallengren’s DatabaseBackup stored procedure
- Keeping that database list up to date as new databases are added – done by a job calling sp_AllNightLog @PollForNewDatabases = 1
Then, we’ll scale out our restores with a similar technique. Let’s start with backups first.
Setting Up sp_AllNightLog

Prerequisites:
- SQL Server 2016 or newer – this may work on older versions, but no guarantees.
- Ola Hallengren’s Maintenance Scripts must be installed – you don’t have to use them for your full backups, index maintenance, or checkdb, but sp_AllNightLog relies on their backup, command-execution, and logging components. They’re free, open source, and well-tested by the community, so we’re piggybacking on those.
Once those are set, download the free First Responder Kit, which includes stuff like sp_AllNightLog, sp_Blitz, sp_BlitzIndex, and more database utility scripts.
Install sp_AllNightLog, sp_AllNightLog_Setup, and sp_DatabaseRestore by running their respective .sql scripts in the master database, or the database of your choice. These scripts install the stored procedures, but they don’t execute them – that comes next.
To create the infrastructure for both backing up and restoring, run:
```sql
EXEC dbo.sp_AllNightLog_Setup
    @RPOSeconds = 60,
    @RTOSeconds = 60,
    @BackupPath = '\\FILESERVER\FOLDER\',
    @RestorePath = '\\FILESERVER\FOLDER\PRIMARYSQLSERVERNAME',
    @Jobs = 4,
    @RunSetup = 1;
```
Parameters for sp_AllNightLog_Setup:
- @RPOSeconds – we’ll aim to run log backups at this frequency. (Depending on the number of jobs, log activity, server workload, etc., the actual interval may be higher than that – more on this later in the documentation.)
- @RTOSeconds – like @RPOSeconds, but for restores.
- @BackupPath – where your valuable goodies go. Public service announcement: local backups are useless if Windows won’t start.
- @RestorePath – where we’ll restore from. Note in the sample call that there’s a server name in here: if you’re using a single folder share to back up multiple SQL Servers, we need to know which one to back up.
- @Jobs – the number of Agent jobs we’ll create to run backups, between 1 and 20. We recommend at least 2 jobs to start, and if you’ve got over 20 databases, add 1 job for every 20 databases.
- @RunSetup – if 1, it creates the database msdbCentral, adds the tables backup_worker and backup_configuration, and adds Agent jobs to call sp_AllNightLog. These jobs are disabled by default.
- @UpdateSetup – if 1, updates msdbCentral.dbo.backup_configuration with the new @RPOSeconds and @BackupPath parameters, and enables/disables Agent jobs based on the @EnableBackupJobs and @EnableRestoreJobs parameters below.
- @EnableBackupJobs, @EnableRestoreJobs – if null (default), no actions are taken. If 0, disables the jobs. If 1, enables them. Can either be used with @RunSetup = 1 during initial config, or with @UpdateSetup = 1 during a failover.
- @Debug – if 1, shows messages in the SSMS Messages tab while it works.
- @Help – if 1, prints out messages nobody ever reads. Although you might. You’re reading this, after all.
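For example, during a planned failover you can combine @UpdateSetup with the enable parameters to flip a server from restoring to backing up. A sketch, assuming setup has already been run once on this server:

```sql
-- Failover sketch: turn a restoring secondary into the backup primary.
-- Assumes sp_AllNightLog_Setup already ran here with @RunSetup = 1.
EXEC dbo.sp_AllNightLog_Setup
    @UpdateSetup = 1,
    @EnableRestoreJobs = 0,  -- stop restoring on this server
    @EnableBackupJobs = 1;   -- start backing up on this server
```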
After it finishes, you can see the tables it created:
```sql
SELECT * FROM msdbCentral.dbo.backup_worker;
SELECT * FROM msdbCentral.dbo.backup_configuration;
SELECT * FROM msdb.dbo.restore_worker;
SELECT * FROM msdb.dbo.restore_configuration;
```
Note that the dbo.restore_configuration and dbo.restore_worker tables are stored in msdb, not msdbCentral – more on that in the restore section.
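To eyeball how far behind each database is, you can query the worker table directly. This is only a sketch – the column names (database_name, last_log_backup_finish_time) are assumptions about the backup_worker schema, so check the actual table on your server first:

```sql
-- Rough RPO check: which databases have gone longest without a log backup?
-- Column names are assumptions - verify against your backup_worker table.
SELECT TOP 10
       database_name,
       last_log_backup_finish_time,
       DATEDIFF(SECOND, last_log_backup_finish_time, GETUTCDATE()) AS seconds_behind
FROM msdbCentral.dbo.backup_worker
ORDER BY last_log_backup_finish_time;
```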
And in SSMS, go into Agent, Jobs to see your newly created jobs, all of which start with sp_AllNightLog. They’re all disabled by default – you should only enable them after you’ve read and understood this documentation.
Jobs for the Primary (Backing-Up) Server
The sp_AllNightLog_PollForNewDatabases Agent Job
When enabled, this job runs sp_AllNightLog @PollForNewDatabases = 1, which:
- Looks in sys.databases for databases that don’t exist in dbo.backup_worker, and adds them
- Looks for any rows in dbo.backup_worker that indicate a failed log backup, and clears them out so those databases can be picked up again
- Waits for one minute, then goes back to step 1
You only need one of these jobs running. sp_AllNightLog_Setup creates one of these, but it’s disabled by default.
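Conceptually, the polling step boils down to a set-difference query like this sketch (not the actual implementation, and the backup_worker column name is an assumption):

```sql
-- Conceptual sketch of @PollForNewDatabases: add any user database
-- that isn't already being tracked in backup_worker.
INSERT INTO msdbCentral.dbo.backup_worker (database_name)
SELECT d.name
FROM sys.databases AS d
WHERE d.database_id > 4  -- skip system databases
  AND NOT EXISTS (SELECT 1
                  FROM msdbCentral.dbo.backup_worker AS w
                  WHERE w.database_name = d.name);
```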
The sp_AllNightLog_Backup Agent Jobs
When enabled, these jobs run sp_AllNightLog @Backup = 1, which:
- Queries dbo.backup_worker for the top 1 database not backed up in [RPO] (ordered by the worst RPO first)
- If it didn’t find a database needing a backup, it waits for 3 seconds (not configurable as of mid-2017, but I’d love to have this check the backup_worker table and make a better decision about how long to wait)
- Updates dbo.backup_worker to indicate that we’re starting a backup
- Calls Ola Hallengren’s DatabaseBackup stored proc to run a backup
- Updates dbo.backup_worker to indicate that the backup finished (either successfully, or with an error number)
- Goes back to step 1
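The interesting part is step 1: because several jobs run this loop at the same time, the pickup query needs locking hints so two jobs don’t grab the same database. A conceptual sketch (the column names are assumptions, not the actual implementation):

```sql
-- Conceptual sketch of the worker pickup: grab the database with the
-- worst RPO, skipping rows other jobs have already locked.
SELECT TOP 1 database_name
FROM msdbCentral.dbo.backup_worker WITH (UPDLOCK, READPAST)
WHERE is_started = 0
ORDER BY last_log_backup_finish_time;  -- longest without a backup first
```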
You probably want at least two of these jobs because if one database suddenly generates a lot of transaction log activity, and the log backup takes a while, you want other backups running simultaneously.
The right number of sp_AllNightLog_Backup Agent jobs depends on:
- Your RPO – the more data you’re willing to lose, the longer you may be comfortable going between log backups
- Your number of databases – the more databases you have (especially with short RPOs), the more simultaneous backups you may need to take
- Your server’s performance, and the backup target’s speed – the faster your server, the more simultaneous jobs you may be comfortable running
As of v1, we don’t have good performance baseline numbers yet across lots of customer environments. As we see this thing in more places, we’ll be able to give better guidance. Down the road, we’d love to have sp_AllNightLog_Setup @UpdateSetup also give recommendations on the number of Agent jobs you should be running (since we know your backup history and performance throughput).
Jobs for the Secondary (Restoring) Server
The restore jobs are basically just like the backup jobs:
- sp_AllNightLog_PollDiskForNewDatabases Job (1) – looks at your restore path to see if a new database has shown up, and if so, adds it to msdb.dbo.restore_worker. You only need one of these.
- sp_AllNightLog_Restore Jobs (Multiple) – call sp_AllNightLog @Restore = 1. With @Restore = 1, instead of calling Ola Hallengren’s DatabaseBackup stored proc, we call the First Responder Kit’s sp_DatabaseRestore.
You’ll only have either backup jobs OR restore jobs enabled – not both.
The msdbCentral database is created by sp_AllNightLog_Setup, but on secondaries, you’ll probably want to drop the empty msdbCentral database and restore it from the primary. That way, whenever the primary fails over to the secondary, you’ll run RESTORE WITH RECOVERY on the msdbCentral database, and immediately your backup configuration will be ready to go. (You’ll still need to disable the restore jobs, and enable the backup jobs.)
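At failover time on the secondary, that looks roughly like this sketch (assuming msdbCentral has already been restored there WITH NORECOVERY from the primary’s backups):

```sql
-- Failover sketch on the secondary: bring the backup configuration online.
RESTORE DATABASE msdbCentral WITH RECOVERY;

-- Then flip the jobs: stop restoring here, start backing up here.
EXEC dbo.sp_AllNightLog_Setup
    @UpdateSetup = 1,
    @EnableRestoreJobs = 0,
    @EnableBackupJobs = 1;
```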
A few warnings and limitations:
- sp_AllNightLog @Restore = 1 can be blocked if someone else is using a database.
- sp_AllNightLog @Restore = 1 overwrites existing databases without warning.
- sp_AllNightLog @PollDiskForNewDatabases = 1 will add new database records for any folders it finds – even if there’s no valid full backups there.
- sp_AllNightLog_Setup will fail if the msdbCentral database exists and is in a restoring state.
- User databases only – system databases aren’t handled here.
Advanced: Cross-Region Disaster Recovery
Once you’ve mastered the above techniques, the next level is to introduce a disaster recovery SQL Server in another data center.
Here’s what makes this technique easier:
Connection strings should use a DNS CNAME. Your applications and users should point to a DNS CNAME (in this example, VirtualSQLServer1), which is a record you can edit at any time to point to a different SQL Server. When you need to fail over from SQLServerA to a different one, just edit the CNAME so that instead of pointing to SQLServerA, it points to SQLServerB (or DR or whatever). DNS management is outside the scope of this post, but keep your TTLs short to avoid long failovers.
Your backup file server should sync to the cloud. As SQL Server backups finish, use a sync tool or script to copy all backups up to Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage. For details on how to do this with Google, read the “Google Compute Engine – Disaster Recovery.pdf” included in our First Responder Kit.
Build a DR site ahead of time in the cloud. You could do this in the same cloud provider, or a different one. Set up the same file sync app so that the DR file server (in this example, FilesDR) is continuously pulling your backups down from cloud file storage.
Set up SQLServerDR just like you did SQLServerB. Configure sp_AllNightLog_Restore to pull the backups from \\FilesDR\Backups.
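Putting that together, the DR server’s setup call might look like this sketch – the server names and paths here are placeholders carried over from this example, not prescribed values:

```sql
-- Sketch for SQLServerDR: restore from the backups FilesDR pulls down.
-- Server names and paths are placeholders - adjust for your environment.
EXEC dbo.sp_AllNightLog_Setup
    @RPOSeconds = 60,
    @RTOSeconds = 60,
    @BackupPath = '\\FilesDR\Backups\SQLSERVERDR\',
    @RestorePath = '\\FilesDR\Backups\PRIMARYSQLSERVERNAME\',
    @Jobs = 4,
    @RunSetup = 1;
```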
Getting Started and Next Steps
- Read the sp_AllNightLog documentation
- Download the free First Responder Kit (and check out Tara’s white paper in there about disaster recovery in Google Cloud)
- For interactive help, visit the #FirstResponderKit channel in the SQL Server community Slack (just be patient, community members aren’t always around in the Slack room)
- To file a bug, open an issue on the GitHub repository
If you’d like us to help with a consulting implementation, check out our SQL Critical Care®. We work together to assess your SQL Server high availability and disaster recovery needs, your staff’s strengths and challenges, and your application’s requirements. Then we build a custom HA/DR plan for you, and help you get there.