I’ve worked with Microsoft SQL Server for so long that I just kinda took backups for granted. We run a backup command, and SQL Server:
- Logs activity to the transaction log while the backup is running
- Reads the exact contents of the database files, and writes them out to a backup file
- When it’s done, it also includes the transaction log changes, so it can combine the data files with the transactions that happened during the backup itself to produce a single point-in-time version of the database files
All without stopping transactions or closing the data files. It’s pretty nifty, and it works really well. The good part is that it’s very efficient at backing up the entire database, and restoring the entire database to a single point in time.
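If it’s been a while since you’ve typed it out by hand, the whole thing boils down to a couple of T-SQL commands. Here’s a minimal sketch run through sqlcmd; the database name, server, and file paths are made up, and authentication flags are omitted:

```
# Full backup: reads the data file contents and tacks on enough transaction
# log to make the restore consistent to a single point in time.
# (Database name, server, and paths are hypothetical.)
sqlcmd -S localhost -Q "BACKUP DATABASE [SalesDB] TO DISK = N'D:\Backups\SalesDB_full.bak' WITH COMPRESSION, CHECKSUM, STATS = 10;"

# Transaction log backup, taken later, so you can restore to a point in time:
sqlcmd -S localhost -Q "BACKUP LOG [SalesDB] TO DISK = N'D:\Backups\SalesDB_log.trn' WITH COMPRESSION, CHECKSUM;"
```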
The drawback is that it’s impossible to restore a single object from backup, by itself. Oh sure, we’ve complained about it for years, and it’s the #2 top voted feature request, but it doesn’t seem to be happening anytime soon. We’ve learned to work around that by restoring the entire database somewhere else, and then extracting just the data we need.
PostgreSQL Backups Are Totally Different.
One of the gotchas of Postgres is that there are a million different ways to accomplish any task. You could stop the database service and get file-level backups, but of course that’s a bad idea for production databases. You could install tools like Barman to automate backups for you, and in many cases that’s a great idea for production databases. However, we’re going to focus on the built-in way that most shops start with.
When you back up a database with pg_dump, it actually generates a text file with statements like CREATE TABLE and INSERT that reconstruct the data from scratch.
I’ll give you a moment to re-read that sentence.
At first, you’re going to be horrified, but give it a second and open your mind.
Sure, there are drawbacks: if you work with databases over a few hundred megabytes in size, it probably horrifies you to think about a text file that large. No worries: you can tell pg_dump to compress (zip) the output into a custom file format as it goes. Another drawback is that there’s no such thing as combining transaction log backups with this – if you want to get point-in-time recovery, you’re going to need a better solution than pg_dump.
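If you want to see that for yourself, here’s a minimal sketch of both flavors; the database and file names are made up:

```
# Plain-text dump: a script of CREATE TABLE, data statements (COPY by
# default, or INSERT with --inserts), and CREATE INDEX statements.
pg_dump --format=plain --file=salesdb.sql salesdb

# Custom-format dump: compressed as it goes, and pg_restore can later
# pick individual objects out of it.
pg_dump --format=custom --compress=9 --file=salesdb.dump salesdb
```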
However, pg_dump has some pretty intriguing advantages – for starters, table-level restores. The pg_restore documentation page has all kinds of switches for just restoring one table, or only restoring the data (not the schema), or just restoring specific indexes, or more.
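A rough sketch of what those switches look like against a custom-format dump; the object names here are hypothetical:

```
# Restore just one table (definition and data) into an existing database:
pg_restore --dbname=salesdb --table=orders salesdb.dump

# Restore only the data, not the schema:
pg_restore --dbname=salesdb --data-only salesdb.dump

# Restore only a specific index definition:
pg_restore --dbname=salesdb --index=orders_customer_id_idx salesdb.dump
```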
Speaking of indexes, get a load of this: because pg_dump is only backing up the data, indexes don’t bloat your backup files. You could have 50 indexes on a table, but that index data itself isn’t getting backed up – only the index definition, aka the CREATE INDEX statement! At restore time, pg_restore reads the backup file, runs insert commands to load the data, and then when it’s all there, runs the CREATE INDEX statements necessary to re-create your indexes. Your database can be partially online while the indexes are re-created. (Is this restore strategy going to be faster? Probably not, and I’m not going to test it, but it’s wild to know.)
But here’s the part that’s really going to blow your mind: since the dump file is just a list of commands, it’s technically possible to restore a Postgres database back to an earlier version. Take a plain-text backup (or convert the existing one to plain-text format), and then execute the commands, looking for any errors caused by newer engine features. Edit the backup file to remove the unavailable features in your older version, and then try again.
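A rough sketch of that workflow, with made-up file and database names:

```
# Convert a custom-format dump into a plain-text SQL script
# (without --dbname, pg_restore just writes the script out):
pg_restore --file=salesdb.sql salesdb.dump

# Replay the script on the older server, stopping at the first error so you
# can spot statements the older version doesn't understand:
psql -v ON_ERROR_STOP=1 --dbname=salesdb_old --file=salesdb.sql
```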
Which Approach Is Better?
Microsoft’s approach focuses on backing up (and restoring) the exact data file contents at extremely high speed, shoving the data out without concern for its contents. Properly tuned, it’s fast as hell. I’ve had clients who regularly backed up and restored 1-5TB databases in just 5-20 minutes. That’s useful when you’ve got very short SLAs, but not shared storage.
Microsoft’s integration with the transaction log also means that the full backup is very extensible. You can integrate it with log backups, or use it to seed transaction log shipping, database mirroring, or Always On Availability Groups. There’s just one backup approach in SQL Server, but it has all kinds of flexibility that Microsoft has built up over the decades.
On the other hand, there are a bunch of different ways to back up Postgres databases. If you choose the pg_dump approach, it also lends itself to all kinds of creative use cases right out of the box. The more I played with it, the more amused I was at its capabilities. For example, backing up data from AWS Aurora Postgres and restoring it to my local Postgres instance was a no-brainer. The fact that one was a platform-as-a-service database in the cloud, and the other was an on-premises database, simply didn’t matter – something Azure SQL DB just can’t pull off, even though Microsoft manages the whole code stack.
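That cross-environment move is just a matter of pointing the same tools at different hosts. A sketch with a made-up Aurora endpoint, user, and database names:

```
# Dump from the managed Aurora PostgreSQL endpoint (hostname/user are hypothetical):
pg_dump --format=custom \
        --host=mycluster.cluster-abc123.us-east-1.rds.amazonaws.com \
        --username=app_user --dbname=salesdb --file=aurora_salesdb.dump

# Restore it into a local Postgres instance:
createdb salesdb_copy
pg_restore --host=localhost --dbname=salesdb_copy aurora_salesdb.dump
```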
Microsoft has just one backup tool, and it works really well – as long as you don’t need to do something unusual, like restore a single table or downgrade versions. Postgres has lots of backup tools that have more flexibility and power overall – buuuut, it’s up to you to pick the right one and then configure it in a way that supports your RPO/RTO.


39 Comments
pg_probackup anyone? No? Yeah, me neither.
Nice article, Brent. For full VLDBs, the restore will be a lot of pain, even partially online, while waiting for the indexes. By the way, do you intend to create Fundamentals and Mastering Postgres classes?
Thanks, glad you liked it! Yes, I’m building classes for Postgres, but they’ll be structured differently.
Wow!! I’m in.
Looking forward to their release.
Thanks!
Just the perfect timing, when so many of our SQL databases are moving to Postgres. Really excited and looking forward to it.
I could listen to Brent even if he was teaching – I don’t know – fricking how to build a sand castle. That’s how great of a speaker/teacher he is.
I don’t work with Postgres, but I’ll tune in for sure.
Aww, thanks!
The default option should always be the most resilient, reliable choice. By all means, have other options that allow you to do clever things, but the standard I-don’t-know-what-I’m-doing setup should be a safe one.
As such, I like the SQL Server tool. Do the backup. Do the restore. It just plain works (unless it runs out of disk space).
Sure, it would be nice to restore a single table but, mostly when I want to do this, I don’t want to lose the old version of the table – I want to restore the backed up version into a different table name so that I can update or merge any incorrect or missing records. As such, I’m perfectly happy to do a full restore to a test database and then copy the table to the original database and do any further manipulation manually.
I do like the idea of the (non-clustered) indexes being recreated, however – it sounds like that would optimize them and, while it might be slower, the disk space and backup/restore time saved might make this worthwhile.
Siggy – when you say, “Do the backup. Do the restore. It just plain works” – are you implying that pg_dump doesn’t work? I’m confused.
I have never used Postgres, so I have no idea whether pg_dump works as well as SQL Server, but having multiple standard ways to do something as critical as backups and restores seems risky to me.
From what you said in the main article, pg_dump does not recreate the indices automatically, and so this is something that the stressed-out DBA will need to remember to do before allowing the users back in. Sure, it SHOULD be part of their standard restore procedure but…
Besides, (almost) no-one properly tests backups and restores onto Production in order to assess any impact on performance so they might forget to include that.
Siggy – you’re missing key details from the post, like how pg_dump and pg_restore do indeed automatically recreate the indexes. I’d ask politely that you reread the post, more carefully this time, before commenting again. Hope that’s fair.
You need reaction buttons so we can laud you for the expert handling of inappropriate comments.
Hahaha, that’s true, actually. I think there’s a plugin that lets people upvote comments, lemme see – I should put that in.
Sounds like pg_dump is the equivalent of the SSMS scripting wizard and the Bulk Copy Program. Not that that’s either a good or bad thing.
Exactly! Or building a DACPAC.
I made the move to PostgreSQL consulting a few years ago (even though I still have customers on SQL Server), and when I learned about pg_dump I understood the real meaning of logical and physical backups.
pg_dumps are just logical backups into which you can put whatever you want from the instance, even roles (globals). This type of backup is perfect for migrating from PG 9 to PG 16, for example, because it isn’t dependent on the environment and is still the way to upgrade “safely”. With SQL Server, upgrades are in most cases straightforward and the backup/restore process works fine.
Regarding physical backups on PostgreSQL, I used Barman for a time but now prefer PGBackrest.
This boils down to logical vs physical backups. Why not have both?
I would like to hear about referential integrity of the restored database. Is it possible to end up with missing keys that foreign keys point to?
With pg_restore?
Dumps are flat files with SQL statements that are executed by the instance, so if you run the restore across multiple threads you might have some issues with foreign key dependencies, but then you would get a SQL statement error and the restore process would stop. There are also some tricky cases where table designs have circular foreign key constraints; in that case you could remove the constraints during the restore process and re-apply them afterwards…
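For what it’s worth, pg_restore’s custom format sidesteps a lot of that because it tracks the dependencies itself. A sketch with hypothetical file and database names:

```
# Parallel restore from a custom-format dump: pg_restore loads table data
# first, then applies indexes and foreign keys from the post-data section.
pg_restore --dbname=salesdb --jobs=4 salesdb.dump

# Or run the sections explicitly, which leaves room to drop and re-add
# tricky (e.g. circular) constraints by hand in between:
pg_restore --dbname=salesdb --section=pre-data salesdb.dump
pg_restore --dbname=salesdb --section=data --jobs=4 salesdb.dump
pg_restore --dbname=salesdb --section=post-data salesdb.dump
```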
Hi Brent – thanks so much for posting this article. I’ve been reading your blog for years. Thank you kind sir for your work!
I’m an Oracle DBA but also have Production experience with SQL Server, and now also on the learning PostgreSQL journey. Exciting times!
For Postgres Production workloads, I recommend checking out pgBackRest. It appears this is the “go-to” tool for Production workloads. The overall complexity of the solution has to be considered as well. When it all turns to custard at 3AM, I don’t want to be reading pages of documentation on how to perform a restore (especially if manual intervention is required). “Simplicity is the ultimate sophistication”.
I gotta say though – I can’t believe something as critical as backup & recovery for serious workloads is best left to third-party tools in the PostgreSQL world. This really surprised me as well. I’ve found similar niggles with the availability/replication story in Postgres too.
Say what you want about commercial database licensing (e.g. SQL Server & Oracle), but one thing that’s really great is the tooling (and getting the basics right) in my opinion.
Question – is pg_dump really a backup tool? It depends. I mean, if you can allow losing all transactions between the beginning of your dump and the time you need to import your dump, then pg_dump can be considered as a backup tool. I’d rather use the term “export” than “backup” so that people don’t have a false sense of security.
You also can’t do PITR with pg_dump. For a continuous-archiving solution (i.e. using the WAL) you need physical backups. Ideally a combination of the two should be performed to try and cover all bases/use cases. But I agree – it really depends on your RPO/RTO requirements and from there making decisions.
One difference I’ve noticed between Postgres and SQL Server backups is that in SQL Server, we create backups using T-SQL, so by supplying a command to the database engine itself. I can write a native T-SQL script to do backups/restores, etc, and the engine knows I’m doing a backup. In Postgres, though, the backup tools appear to be external executables to the database engine itself. pg_dump is just another tool creating a client connection to the DB engine, and the engine isn’t even necessarily aware that a backup operation is happening. I’ve always interpreted this difference in design as reflecting Postgres’ academic origins, versus SQL Server’s commercial genesis – backups are front-and-centre for enterprise users, but are (arguably) less important if the database engine is used as a tool for exploring database research.
I think it’s less about academics vs commercial, and more about SQL Server coming with things built in, and Postgres choosing to offer things as extensions.
In SQL Server, you buy a boxed product and “everything is included,” for better or for worse. Over the decades, SQL Server grew to a monster box that includes everything from Reporting Services to Service Broker to Database Mirroring to Java stored procedures to machine learning to SQL Server Notification Services to Big Data Clusters – I could go on and on. Microsoft pitches SQL Server as a single (expensive) product that has opinions on the right way to do things, and those opinions are built into the product.
That’s a good recipe for success for a commercial boxed product because each additional feature can be seen as additional value (although sometimes it’s actually a liability and a testing nightmare.)
In Postgres, the core product is a relational database engine: that’s it, nothing more. They work hard to keep it simple, and there are organizational barriers to adding more features and complexity to the engine itself. However, anyone can build extensions to do whatever they want, so there’s an (overly vibrant and overgrown) extension ecosystem with no roadmap or catalog.
That’s a good recipe for success for an open source product because the core engine itself doesn’t end up with garbage cruft that has to be tested and maintained over time.
They’re both interesting approaches – they’re just different.
“Microsoft’s approach focuses on backing up (and restoring) the exact data file contents at extremely high speed, shoving the data out without concern for its contents. Properly tuned, it’s fast as hell. I’ve had clients who regularly backed up and restored 1-5TB databases in just 5-20 minutes. That’s useful when you’ve got very short SLAs, but not shared storage.”
Word! And it’s dead easy. We moved to VEEAM and it’s blazing fast…once it gets going. It probably takes 5-20 minutes to trigger the backup and retrieve it from storage. But there’s also zero perf hit on the db – though you pay dearly for it.
When I saw for the first time that pg_dump generates a text file with statements like CREATE TABLE and INSERT, I was like… that is very primitive.
But you are right, it opens up a wide range of opportunities.
Hold on: what if I have encrypted data? Does that mean it will be in plain text, followed by the encrypt command?
Great question! Postgres doesn’t have column-level encryption yet. Here are the kinds of encryption it has: https://www.postgresql.org/docs/current/encryption-options.html
Well, if you use pg_crypto to encrypt a field, the data is decrypted by pg_dump, because pg_dump is basically running SELECT queries through the optimizer to build the CREATE and INSERT statements of the flat file.
You could choose to exclude the encrypted field during the pg_dump process and extract the encrypted data into a CSV file next to it using the COPY command… but that is heavy work, knowing that dumps aren’t considered real backups and most people use Barman or PGBackrest anyway to keep the data encrypted.
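If anyone wants to try that side-channel export, a rough sketch with hypothetical table, column, and file names (note that pg_dump can only exclude whole tables, not single columns):

```
# Dump everything except the sensitive table, then export its (still
# encrypted) column values separately:
pg_dump --format=custom --exclude-table=customer_secrets --file=salesdb.dump salesdb
psql --dbname=salesdb -c "\copy (SELECT id, secret_cipher FROM customer_secrets) TO 'customer_secrets.csv' WITH (FORMAT csv, HEADER)"
```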
EDB also offers a paid TDE feature, but I haven’t seen it hit the market properly…
Knowing what I just said about pg_crypto, “smart” people would just encrypt at the file system level; that might be enough for some shops. But I would just let the app handle field encryption… that might be more suitable with PostgreSQL if it’s a must for you…
For logical backups, pg_dump is the way to go… with its positive and negative sides (as described in the post).
There is also a physical backup tool, called pg_basebackup.
It takes a “snapshot” of all storage files (and Write-Ahead Log files):
– A physical backup is done on the whole DBMS instance, not individual databases, and of course it’s version-dependent (so no restore/load onto a later major Postgres version – for that, pg_dump/a logical backup is needed)
– A point-in-time recovery can be made on a physical backup…
– Note: The equivalent of SQL Server’s transaction log backups is to “archive” all WAL files (on another host), which is done automatically by the “archive_command” configuration setting, executed whenever a WAL file (16 MB) fills up.
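A minimal sketch of turning that on; the archive destination is made up, and archive_mode needs a server restart:

```
# Continuously copy each completed WAL segment to an archive location
# (/mnt/wal_archive is hypothetical):
psql -c "ALTER SYSTEM SET archive_mode = on;"
psql -c "ALTER SYSTEM SET archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f';"
```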
Fully agree with your comment, Göran. I’ve set up a PostgreSQL database (on Linux) and a test database which is cloned every night from production. For the warm backup I use pg_basebackup. Really, the equivalent of SQL Server Backup is pg_basebackup, not pg_dump. And the comparison from robinwilson16 below with Oracle Data Pump is also very apt: there you can also recover data, e.g. from a single table.
That being said, with PostgreSQL it was a lot of scripting, but that way I could make my backup, restore, and clone scripts exactly as I wanted them. Sure, it is more difficult than setting up SQL Server, and (nearly) a non-starter for those who hate scripting.
In MS SQL, ever heard of a BACPAC? It is a file that contains a backup like pg_dump. A BACPAC is a compressed file that contains all the DDL statements and data. It is not a simple text file, but in theory, you could accomplish a single table restore. Obviously, Microsoft will not support this.
Robert – the terms “in theory” and “Microsoft will not support this” kinda rule out this as a backup solution, though, right?
Plus, the only way it’d work is if you were taking BACPACs on a regular basis, which would be additional overhead on the server, all in the wild hopes that maybe someone will delete a table.
Doesn’t really make practical sense.
Completely agree! Unsupported land is a place I do not go unless my boss insists after I push back. I have done some pretty scary things in unsupported land that I will not embarrass myself by divulging the specifics of. Thankfully, I no longer work for that boss!
This sounds a lot like Oracle Data Pump. I remember the first time I was trying to right-click things to back it up, only to realise I had to go into CMD and type a load of stuff to tell it which schemas to back up. Data Pump had replaced some other mechanism which also still apparently worked. From memory, I’m not sure if you could restore a table, and I’m not sure of the format the backups were created in.
I wondered why it was made so much more difficult compared with the GUI in SQL Server.
I know I have struggled to create backups in MySQL/MariaDB via PHPMyAdmin when the databases are large, although it also works OK via CMD, and that one includes the full script, including CREATE TABLE and CREATE INDEX statements. I did manage to create a script to back up each database, GZip it, and delete backups older than X days, which worked quite well (although it took a while to work out), and that could be run as a cron job on a schedule.
I use pgbackrest from Crunchydata for on-prem PG, and I was able to configure something similar to MS SQL backups with this tool (even Diffs). It is flexible and also can be used to populate replicas etc.
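For anyone curious what that looks like day to day, a rough sketch with a hypothetical stanza name (it assumes pgbackrest.conf already points at your cluster and repository):

```
# One-time setup for a stanza named "main":
pgbackrest --stanza=main stanza-create

# Full backup, then a differential against the last full:
pgbackrest --stanza=main --type=full backup
pgbackrest --stanza=main --type=diff backup
```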
Wal-G from Yandex is also promising, but at the time of our evaluation it lacked documentation.
A dump is not a backup.
https://www.dbi-services.com/blog/what-is-a-database-backup-back-to-the-basics/
Use pgBackRest or Barman to do your backups. Starting with Postgres 17, you can do real incremental backups using pg_basebackup.
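If I’m remembering the Postgres 17 syntax right, the incremental flow looks roughly like this, with made-up directory names:

```
# Full base backup first:
pg_basebackup --pgdata=/backups/full --checkpoint=fast

# Later, an incremental backup containing only blocks changed since the
# full backup, identified by its backup_manifest:
pg_basebackup --pgdata=/backups/incr1 --incremental=/backups/full/backup_manifest

# At restore time, combine them into a usable data directory:
pg_combinebackup /backups/full /backups/incr1 -o /backups/restored
```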
Brent – Our customers routinely have terabyte-plus databases that already take a significant amount of time using SSEE. These databases have hundreds of GB of varchar(max) / varbinary(max). This seems absolutely unworkable. Our enterprise customers don’t have days to do full backups.
Worked on SQL Server, Oracle, MySQL, and Postgres. MySQL is pathetic, with lots of bugs.