An alternative backup strategy is to directly copy the files that
PostgreSQL uses to store the data in the database.
Section 16.2 explains where these files
are located, but you have probably found them already if you are
interested in this method. You can use whatever method you prefer
for ordinary file system backups, for example:
    tar -cf backup.tar /usr/local/pgsql/data
There are two restrictions, however, which make this method
impractical, or at least inferior to the pg_dump
method:
1. The database server must be shut down in order to
get a usable backup. Half-way measures such as disallowing all
connections will not work
(mainly because tar and similar tools do not take an
atomic snapshot of the state of the file system at a point in
time). Information about stopping the server can be found in
Section 16.5. Needless to say, you
also need to shut down the server before restoring the data.
2. If you have dug into the details of the file system layout of the
database, you may be tempted to try to back up or restore only certain
individual tables or databases from their respective files or
directories. This will not work because these files
contain only half the
truth. The other half is in the commit log files
pg_clog/*, which contain the commit status of
all transactions. A table file is only usable with this
information. Of course it is also impossible to restore only a
table and the associated pg_clog data
because that would render all other tables in the database
cluster useless. So file system backups only work for complete
restoration of an entire database cluster.
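The first restriction can be sketched as a small shell function that stops the server, archives the whole data directory, and starts the server again. This is only an illustration under stated assumptions: the function names `run` and `cold_backup` and the `DRY_RUN` variable are hypothetical, `pg_ctl` is assumed to be on the `PATH`, and the script is assumed to run as the user that owns the data directory.

```shell
#!/bin/sh
# Sketch of a cold (offline) backup of an entire database cluster.
# The helper names and DRY_RUN are illustrative, not part of PostgreSQL.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cold_backup DATA_DIRECTORY ARCHIVE
cold_backup() {
    pgdata=$1
    archive=$2

    # Stop the server and wait until shutdown has completed.
    run pg_ctl stop -D "$pgdata" -m fast -w || return 1

    # Archive the complete data directory, never a subset of it.
    run tar -cf "$archive" "$pgdata"
    status=$?

    # Restart the server whether or not tar succeeded.
    # (Adding -l LOGFILE would redirect the server's output to a file.)
    run pg_ctl start -D "$pgdata"
    return "$status"
}
```

With the paths from the example above, the call would be `cold_backup /usr/local/pgsql/data backup.tar`. Setting `DRY_RUN=1` first prints the commands instead of running them, which is a cheap way to check the sequence before touching a live server.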
An alternative file-system backup approach is to make a
"consistent snapshot" of the data directory, if the
file system supports that functionality (and you are willing to
trust that it is implemented correctly). The typical procedure is
to make a "frozen snapshot" of the volume containing the
database, then copy the whole data directory (not just parts, see
above) from the snapshot to a backup device, then release the frozen
snapshot. This will work even while the database server is running.
However, a backup created in this way saves
the database files in a state where the database server was not
properly shut down; therefore, when you start the database server
on the backed-up data, it will think the server had crashed
and replay the WAL. This is not a problem; just be aware of
it (and be sure to include the WAL files in your backup).
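The freeze, copy, release procedure might look like the following sketch. It assumes LVM as the snapshot mechanism, which is only one possibility and is not named in the text above. The volume path, mount point, snapshot name `pgsnap`, snapshot size, and the helper names `run` and `snapshot_backup` are all hypothetical. It also assumes that the data directory and the WAL files live on the same logical volume, and that the data directory is the root of that volume.

```shell
#!/bin/sh
# Sketch of a snapshot-based backup of a running cluster, assuming LVM.
# All names and sizes below are illustrative assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_backup ORIGIN_VOLUME MOUNTPOINT ARCHIVE
snapshot_backup() {
    origin=$1        # logical volume holding the data directory
    mountpoint=$2    # existing empty directory for mounting the snapshot
    archive=$3
    snap_name=pgsnap
    snap_dev="$(dirname "$origin")/$snap_name"

    # 1. Make the frozen snapshot of the volume.
    run lvcreate --snapshot --size 1G --name "$snap_name" "$origin" || return 1

    # 2. Copy the whole data directory from the snapshot.
    #    (Some file systems need extra mount options, e.g. nouuid on XFS.)
    run mount -o ro "$snap_dev" "$mountpoint" &&
        run tar -cf "$archive" -C "$mountpoint" .
    status=$?

    # 3. Release the frozen snapshot.
    run umount "$mountpoint"
    run lvremove -f "$snap_dev"
    return "$status"
}
```

A call such as `snapshot_backup /dev/vg0/pgdata /mnt/pgsnap backup.tar` would run as root. The snapshot size only needs to hold the blocks that change while the copy is in progress, so 1G is a placeholder rather than a recommendation.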
If your database is spread across multiple file systems, there may not
be any way to obtain exactly-simultaneous frozen snapshots of all
the volumes. For example, if your data files and WAL are on different
disks, or if tablespaces are on different file systems, it might
not be possible to use snapshot backup because the snapshots must be
simultaneous.
Read your file system documentation very carefully before trusting
the consistent-snapshot technique in such situations. The safest
approach is to shut down the database server for long enough to
establish all the frozen snapshots.
Another option is to use rsync to perform a file
system backup. This is done by first running rsync
while the database server is running, then shutting down the database
server just long enough to do a second rsync. The
second rsync will be much quicker than the first,
because it has relatively little data to transfer, and the end result
will be consistent because the server was down. This method
allows a file system backup to be performed with minimal downtime.
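The two-pass procedure can be sketched as follows. The helper names `run` and `two_pass_rsync`, the `DRY_RUN` variable, and the destination path are hypothetical. The `--checksum` option on the second pass is an addition not taken from the text above: by default rsync decides whether a file changed by comparing size and modification time, and comparing contents is a more cautious choice for the pass that has to be exact.

```shell
#!/bin/sh
# Sketch of a two-pass rsync backup with minimal downtime.
# The helper names and DRY_RUN are illustrative, not part of PostgreSQL.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# two_pass_rsync DATA_DIRECTORY DESTINATION
two_pass_rsync() {
    pgdata=$1
    dest=$2

    # Pass 1, server running: transfers the bulk of the data. The copy is
    # not consistent yet, and rsync may report files that changed or vanished.
    run rsync -a --delete "$pgdata/" "$dest/" ||
        echo "first pass reported errors; continuing to the offline pass" >&2

    # Stop the server and wait until shutdown has completed.
    run pg_ctl stop -D "$pgdata" -m fast -w || return 1

    # Pass 2, server stopped: transfers only what changed since pass 1.
    run rsync -a --delete --checksum "$pgdata/" "$dest/"
    status=$?

    # Restart the server whether or not the second pass succeeded.
    run pg_ctl start -D "$pgdata"
    return "$status"
}
```

For example, `two_pass_rsync /usr/local/pgsql/data /backup/pgdata` keeps the server down only for the duration of the second pass. Only the result of that second pass should be treated as a usable backup.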
Note that a file system backup will not necessarily be
smaller than an SQL dump. On the contrary, it will most likely be
larger. (pg_dump does not need to dump
the contents of indexes, for example, just the commands to recreate
them.)