As a SQL Server DBA, Postgres Backups Surprised Me.
I’ve worked with Microsoft SQL Server for so long that I just kinda took backups for granted. We run a backup command, and SQL Server:
- Logs activity to the transaction log while the backup is running
- Reads the exact contents of the database files, and writes them out to a backup file
- When it’s done, it also includes the transaction log changes, so it can combine the data files with the transactions that happened during the backup itself to get a single point-in-time version of the database files
All without stopping transactions or closing the data files. It’s pretty nifty, and it works really well. The good part is that it’s very efficient at backing up the entire database and restoring it to a single point in time.
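For reference, that whole dance is a single command. Here’s a minimal sketch via sqlcmd – the server name, database, and path are all hypothetical:

```sh
# Hypothetical server/database/path; -E uses Windows authentication.
sqlcmd -S myserver -E -Q "BACKUP DATABASE SalesDB \
  TO DISK = N'D:\Backups\SalesDB.bak' \
  WITH COMPRESSION, CHECKSUM, STATS = 10"
```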
The drawback is that it’s impossible to restore a single object from a backup by itself. Oh sure, we’ve complained about it for years, and it’s the #2 top-voted feature request, but it doesn’t seem to be happening anytime soon. We’ve learned to work around that by restoring the entire database somewhere else, and then extracting just the data we need – see the sketch below.
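The workaround looks roughly like this – every name, path, and column here is hypothetical:

```sh
# Restore the full backup side by side under a new name.
sqlcmd -S myserver -E -Q "RESTORE DATABASE SalesDB_Copy \
  FROM DISK = N'D:\Backups\SalesDB.bak' \
  WITH MOVE 'SalesDB' TO N'D:\Data\SalesDB_Copy.mdf', \
       MOVE 'SalesDB_log' TO N'D:\Data\SalesDB_Copy.ldf'"
# Then pull out just the rows you need.
sqlcmd -S myserver -E -Q "INSERT INTO SalesDB.dbo.Orders \
  SELECT * FROM SalesDB_Copy.dbo.Orders WHERE OrderDate >= '2024-01-01'"
```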
PostgreSQL Backups are Totally Different.
One of the gotchas of Postgres is that there are a million different ways to accomplish any task. You could stop the database service and get file-level backups, but of course that’s a bad idea for production databases. You could install external tools like Barman to automate backups for you, and in many cases that’s a great idea for production databases. However, we’re going to focus on the built-in way that most shops start with.
When you back up a database with pg_dump, it actually generates a text file with statements like CREATE TABLE and INSERT that reconstruct the data from scratch.
I’ll give you a moment to re-read that sentence.
At first, you’re going to be horrified, but give it a second and open your mind.
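A minimal sketch, with a hypothetical database name. One nuance: by default, the table data comes out as COPY blocks rather than literal INSERT statements; the --inserts flag gets you the latter.

```sh
# Plain format writes a readable SQL script you can open in any text editor.
pg_dump --format=plain --inserts --file=salesdb.sql salesdb
head -n 40 salesdb.sql   # CREATE TABLE statements, then the INSERTs
```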
Sure, there are drawbacks: if you work with databases over a few hundred megabytes in size, it probably horrifies you to think about a text file that large. No worries: you can tell pg_dump to compress the output as it goes by writing to its custom file format. Another drawback is that there’s no such thing as combining transaction log backups with this – if you want point-in-time recovery, you’re going to need a better solution than pg_dump (that’s what WAL archiving is for).
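That compression is one flag away – a sketch with hypothetical names:

```sh
# Custom format compresses as it goes, and it's what pg_restore works from.
pg_dump --format=custom --compress=9 --file=salesdb.dump salesdb
```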
However, pg_dump has some pretty intriguing advantages – for starters, table-level restores. The pg_restore documentation page has all kinds of switches for restoring just one table, restoring only the data (not the schema), restoring specific indexes, and more.
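Here’s what a couple of those switches look like in practice – the database, table, and index names are hypothetical:

```sh
# Restore just one table's rows into an existing database.
pg_restore --dbname=salesdb --table=orders --data-only salesdb.dump

# Restore only a specific index definition.
pg_restore --dbname=salesdb --index=orders_customer_idx salesdb.dump
```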
Speaking of indexes, get a load of this: because pg_dump is only backing up the data, indexes don’t bloat your backup files. You could have 50 indexes on a table, but that index data itself isn’t getting backed up – only the index definition, aka the CREATE INDEX statement! At restore time, pg_restore reads the backup file, runs the commands to load the data, and then, once it’s all there, runs the CREATE INDEX statements necessary to re-create your indexes. Your database can be partially online while the indexes are re-created. (Is this restore strategy going to be faster? Probably not, and I’m not going to test it, but it’s wild to know.)
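You can see this for yourself by listing a dump’s table of contents: the rows show up as TABLE DATA entries, while indexes appear only as definitions. The output below is illustrative, with hypothetical names:

```sh
pg_restore --list salesdb.dump
# 214;  1259 16391 TABLE public orders postgres
# 3501; 0    16391 TABLE DATA public orders postgres
# 3390; 1259 16399 INDEX public orders_customer_idx postgres
```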
But here’s the part that’s really going to blow your mind: since the dump file is just a list of commands, it’s technically possible to restore a Postgres database onto an older version of the server. Take a plain-text backup (or convert your existing one to plain-text format), then execute the commands, watching for errors caused by features the older version doesn’t have. Edit the backup file to remove those features, and then try again.
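Roughly, that workflow looks like this sketch (all names hypothetical):

```sh
# Turn a custom-format dump into a plain SQL script (-f with no -d writes a file).
pg_restore --file=salesdb.sql salesdb.dump
# Replay it on the older server and watch for errors...
psql --dbname=salesdb_old --file=salesdb.sql
# ...then edit salesdb.sql to strip whatever the old version rejects, and re-run.
```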
Which Approach Is Better?
Microsoft’s approach focuses on backing up (and restoring) the exact data file contents at extremely high speed, shoving the data out without concern for its contents. Properly tuned, it’s fast as hell. I’ve had clients who regularly backed up and restored 1-5TB databases in just 5-20 minutes. That’s useful when you’ve got very short SLAs but no shared storage.
Microsoft’s integration with the transaction log also means that the full backup is very extensible. You can integrate it with log backups, or use it to seed transaction log shipping, database mirroring, or Always On Availability Groups. There’s just one backup approach in SQL Server, but it has all kinds of flexibility that Microsoft has built up over the decades.
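For instance, log backups plus STOPAT get you point-in-time restores – a sketch where the names, paths, and timestamp are all hypothetical:

```sh
# Back up the log, then restore the chain to a specific moment.
sqlcmd -S myserver -E -Q "BACKUP LOG SalesDB TO DISK = N'D:\Backups\SalesDB_log.trn'"
sqlcmd -S myserver -E -Q "RESTORE DATABASE SalesDB_PITR \
  FROM DISK = N'D:\Backups\SalesDB.bak' WITH NORECOVERY, \
  MOVE 'SalesDB' TO N'D:\Data\SalesDB_PITR.mdf', \
  MOVE 'SalesDB_log' TO N'D:\Data\SalesDB_PITR.ldf'"
sqlcmd -S myserver -E -Q "RESTORE LOG SalesDB_PITR \
  FROM DISK = N'D:\Backups\SalesDB_log.trn' \
  WITH STOPAT = '2024-01-15 14:30:00', RECOVERY"
```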
On the other hand, there are a bunch of different ways to back up Postgres databases. If you choose the pg_dump approach, it also lends itself to all kinds of creative use cases right out of the box. The more I played with it, the more amused I was at its capabilities. For example, backing up data from AWS Aurora Postgres and restoring it to my local Postgres instance was a no-brainer. The fact that one was a platform-as-a-service database in the cloud, and the other was an on-premises database, simply didn’t matter – something Azure SQL DB just can’t pull off, even though Microsoft manages the whole code stack.
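That cross-environment move is little more than a host name change – a sketch with a hypothetical Aurora endpoint and credentials:

```sh
# Dump straight from the Aurora cluster endpoint (hypothetical)...
pg_dump --host=mycluster.cluster-abc123.us-east-1.rds.amazonaws.com \
        --username=app_user --format=custom --file=salesdb.dump salesdb
# ...then create and load the database on the local instance.
pg_restore --host=localhost --dbname=postgres --create salesdb.dump
```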
Microsoft has just one backup tool, and it works really well – as long as you don’t need to do something unusual, like restore a single table or downgrade versions. Postgres has lots of backup tools that have more flexibility and power overall – buuuut, it’s up to you to pick the right one and then configure it in a way that supports your RPO/RTO.