One of the main tasks of any kind of administrator is to make sure that the data he's responsible for will be available if anything bad happens (asteroids, flood, locusts, hail). To do that you need a high availability solution for continuity, and a good backup plan to support your system in case of human error ("Hello Mr. admin, I just dropped the customer payment table"). Trust me, it happens... a lot.
I briefly covered high availability with the implementation of a hot standby in an earlier post, and now it's time to cover the backup options available in PostgreSQL.
There are 2 kinds of backups in PostgreSQL, physical and logical.
The good thing with logical backups is that they are simple to implement and maintain, and selective backup and restore (even into later PG versions) is easy. Usually the backup output consists of one file, which of course can be compressed. The major con of this method is that it lacks Point In Time Recovery (a database with the PITR feature can be restored or recovered to the state that it had at any time since PITR logging was started for that database). It also lacks incrementality (each backup is a new WHOLE backup of your database). This makes these kinds of backups not really usable in very large production installations. Before you decide to say "not good", let me tell you the cases where these kinds of backups would be better than incremental backups:
Test servers with not much data. Small installations where the output will be small and the backups can be taken on a daily (or even less) basis. Large installations that don't have many data changes over the day. Data warehouses and reporting servers.
Examples
Once again PGlab1 will be my lab rat. Don't worry, no animals or data were harmed.
export PGDATA=/opt/PGDATA
export backupfile="backup_$(date +%d%m%Y).tgz"
pg_ctl stop && tar cvfz ~/$backupfile $PGDATA && pg_ctl start
Basically I just stopped the database, took a tgz of my $PGDATA and started the DB again.
Simple and effective. The restore can be done on a different path than $PGDATA, just make sure you provide -D to pg_ctl or set PGDATA to the correct path before you start.
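One catch with the one-liner: if tar fails, pg_ctl start never runs and the database stays down. Here is a minimal sketch of the same idea that always tries to bring the server back. The function names cold_backup and cold_restore and the PG_CTL override are my own, made up for illustration. It also archives with relative paths, so the tarball can be unpacked into any directory.

```shell
# Sketch: cold (offline) backup and restore of a data directory.
# PG_CTL defaults to the real pg_ctl; the override is only an illustration aid.
PG_CTL="${PG_CTL:-pg_ctl}"

cold_backup() {
    local pgdata="$1" outfile="$2" rc
    # Stop the server; give up if it cannot be stopped.
    "$PG_CTL" stop -D "$pgdata" -m fast || return 1
    # Archive the contents of the data directory with relative paths.
    tar -czf "$outfile" -C "$pgdata" .
    rc=$?
    # Start the server again even if tar failed.
    "$PG_CTL" start -D "$pgdata"
    return "$rc"
}

cold_restore() {
    local tarball="$1" newpgdata="$2"
    # The data directory must be private to the postgres user.
    mkdir -p "$newpgdata" && chmod 700 "$newpgdata" || return 1
    tar -xzf "$tarball" -C "$newpgdata" || return 1
    # -D points pg_ctl at the restored path instead of $PGDATA.
    "$PG_CTL" start -D "$newpgdata"
}
```

Something like cold_backup /opt/PGDATA ~/backup.tgz followed by cold_restore ~/backup.tgz /opt/PGDATA_restored would exercise both halves.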
pg_dump and pg_dumpall
pg_dump and pg_dumpall export one or all databases to a (by default) human readable SQL format.
The output can be compressed (the custom format is compressed by default) and the tools support a lot of options, like data only, schema only etc.
I have a database for this post's sake called testbackup.
If I run pg_dump testbackup I will get on stdout, one by one, the SQL commands that I would need to remake the database from scratch, so just by redirecting it to a file you have a simple backup. I won't get into details about the format of this file. I will just say that at first you will see connection details like the encoding and the extensions that exist, then the table creation script, then the data (using a postgres command called COPY) and then the indexes and constraints.
NOTE that taking a pg_dump won't back up the users, and that's because users (roles) in postgres are global to the whole cluster and do not belong to any single database. To back up users you can use pg_dumpall -g (-g means globals).
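To make the options above a bit more concrete, here is a small sketch that dumps the globals plus a schema-only (-s) and a data-only (-a) file for one database. The function name dump_database, the file naming and the PG_DUMP / PG_DUMPALL overrides are assumptions of mine, not something the tools provide.

```shell
# Sketch: plain-SQL dump of one database plus the cluster-wide globals.
# PG_DUMP / PG_DUMPALL default to the real tools and can be overridden.
PG_DUMP="${PG_DUMP:-pg_dump}"
PG_DUMPALL="${PG_DUMPALL:-pg_dumpall}"

dump_database() {
    local db="$1" outdir="$2" stamp
    stamp="$(date +%d%m%Y)"
    # Roles and tablespaces live outside any single database.
    "$PG_DUMPALL" -g > "$outdir/globals_$stamp.sql" || return 1
    # Schema only (-s) and data only (-a), as two separate files.
    "$PG_DUMP" -s "$db" > "$outdir/${db}_schema_$stamp.sql" || return 1
    "$PG_DUMP" -a "$db" > "$outdir/${db}_data_$stamp.sql"
}
```

Restoring would then roughly be a matter of feeding the files back with psql -f in the same order: globals first, then schema, then data.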
Here's a script that I am using in order to take this kind of backup:
#<Variables>
export PGDATA=/opt/db/data
export PGPASSFILE=/opt/db/data/pgpass.conf
logfile=/var/lib/pgsql/backup/pgbackup.log
backupdir=/var/lib/pgsql/backup
pgdump=/usr/bin/pg_dump
psql=/usr/bin/psql
pgdumpall=/usr/bin/pg_dumpall
psqluser=postgres
retention=7
#</Variables>
today=$(date +%d%m%Y)
# Dump the globals (users), then remove copies older than $retention days
usersfilename="$today.bck.users"
$pgdumpall -g -U $psqluser > $backupdir/$usersfilename && find $backupdir/*bck.users* -ctime +$retention -exec rm {} \;
# Loop over every database except the templates and postgres
for db in $($psql -A -t -U $psqluser -c "select datname from pg_database where datname not in ('template0','template1','postgres')" postgres)
do
backupname="${db}_$today.bck"
logfilename="${db}_$today.bck.log"
# Custom format dump (-Fc); the verbose output goes to a per-database log
$pgdump -Fc -v -f $backupdir/$backupname -U $psqluser $db 2> $backupdir/$logfilename && find $backupdir/$db*bck* -ctime +$retention -exec rm {} \;
done
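The retention part of the script is the piece most likely to bite, so here it is isolated as a sketch. Note that I swapped -ctime for -mtime (last modification time) and used -maxdepth and -delete, which GNU and BSD find support but plain POSIX find does not. The function name prune_old_backups is hypothetical.

```shell
# Sketch: delete backup files older than a retention period.
# Uses -mtime (last modification) rather than the script's -ctime.
prune_old_backups() {
    local dir="$1" pattern="$2" days="$3"
    # -maxdepth 1: only files directly inside the backup directory.
    # -mtime +N: files modified more than N whole days ago.
    find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime +"$days" -print -delete
}
```

For the script above that would be something like prune_old_backups /var/lib/pgsql/backup '*.bck' 7. It prints every file it removes, which is handy to redirect into the backup log.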
Notice that I use the -Fc switch in pg_dump. That means custom format, and it can be used for selective restore using the pg_restore command. If I had one of these backups and I wanted to restore the table "customers" I would run (the backup file goes last, because -f in pg_restore is the output file, not the input):
pg_restore -Fc -t customers -U <username> -h <host name> -d <db name> <file name>
NOTE that there is a switch (-j) for parallelism.
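Before a selective restore it helps to see what is actually inside the dump. pg_restore -l prints the table of contents of a custom-format file. A small sketch wrapping both steps follows; the function names and the PG_RESTORE override are mine, and the target database is assumed to exist already.

```shell
# Sketch: inspect a custom-format dump and restore one table from it.
# PG_RESTORE defaults to the real pg_restore and can be overridden.
PG_RESTORE="${PG_RESTORE:-pg_restore}"

# Print the table of contents of the dump.
list_dump() {
    "$PG_RESTORE" -l "$1"
}

# Restore a single table into an existing database, with optional parallel jobs.
restore_table() {
    local dumpfile="$1" table="$2" db="$3" jobs="${4:-1}"
    "$PG_RESTORE" -Fc -j "$jobs" -t "$table" -d "$db" "$dumpfile"
}
```

With -t and a single table the parallelism won't buy much. It starts to matter when you restore a whole database with many tables and indexes.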
More about these tools on: pg_restore, pg_dump, pg_dumpall
Now that we are done with the database dump basics, let's move to live (or online) backup, PITR and timelines.
In order to get a backup that is incremental you will need a base backup and all the changes that transactions make to the database, so you need the transaction logs, or as we call them in postgres, WAL segments. I won't say much about how the transaction mechanism works in postgres. This is a backup and restore post, so I will leave the WAL mechanism for another post.
Standalone hot physical database backup
I will use the following directories and variables for the example's sake:
export BACKUPNAME="backup_$(date +%d%m%Y).tgz"
postgres@pglab1:/opt$ ls -l
total 8
drwxr-xr-x 2 postgres postgres 4096 Mar 27 12:32 BACKUP
drwx------ 15 postgres postgres 4096 Mar 27 11:40 PGDATA
mkdir /opt/BACKUP/archive
Set an archive_command in postgresql.conf and restart the server:
wal_level = archive
archive_mode = on
archive_command = 'test -f /opt/BACKUP/archiving/archiving_active && cp %p /opt/BACKUP/archive/%f'
mkdir /opt/BACKUP/archiving/
touch /opt/BACKUP/archiving/archiving_active
now run :
psql -c "select pg_start_backup('BACKUP')"
tar -cvz --exclude=$PGDATA/pg_xlog -f /opt/BACKUP/$BACKUPNAME $PGDATA
psql -c "select pg_stop_backup(), current_timestamp"
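The three commands above can be wrapped so that pg_stop_backup always runs, even when tar complains. This is only a sketch: hot_backup and the PSQL override are names I made up, and it archives with relative paths (-C) so the exclude pattern is simply ./pg_xlog.

```shell
# Sketch: standalone hot backup of a running cluster.
# PSQL defaults to the real psql and can be overridden.
PSQL="${PSQL:-psql}"

hot_backup() {
    local pgdata="$1" outfile="$2" rc
    # Put the cluster in backup mode.
    "$PSQL" -c "select pg_start_backup('BACKUP')" || return 1
    # Copy the data directory while the server runs, skipping pg_xlog.
    tar -czf "$outfile" -C "$pgdata" --exclude=./pg_xlog .
    rc=$?
    # Always leave backup mode, even if tar failed.
    "$PSQL" -c "select pg_stop_backup(), current_timestamp"
    return "$rc"
}
```

Keep in mind that GNU tar (1.16 and later) exits with 1 when a file changed while it was being read, which is expected on a live database, so a return code of 1 here is not necessarily a failed backup. A code of 2 is.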
Now, let's crash and restore:
rm -rf /opt/PGDATA/* (yoohoo !!!)
Untar the backup (.tgz) so that the files land in $PGDATA. The pg_xlog directory will be missing, so create it as the postgres user.
Then in $PGDATA create a file called recovery.conf and add:
restore_command = 'cp /opt/BACKUP/archive/%f %p'
Start the database and watch the logfile, it should show something like:
2013-03-27 13:22:58 EET::@:[3047]: LOG: archive recovery complete
2013-03-27 13:22:58 EET::@:[3045]: LOG: database system is ready to accept connections
2013-03-27 13:22:58 EET::@:[3061]: LOG: autovacuum launcher started
the recovery.conf will also be automatically renamed to recovery.done.
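If you script your restores, you don't want to stare at the logfile. A small sketch that checks for the two log messages shown above plus the renamed recovery.done follows. The function name recovery_finished is hypothetical and the log path depends on your own logging setup.

```shell
# Sketch: succeed only when the log reports a finished archive recovery
# and recovery.conf has been renamed to recovery.done.
recovery_finished() {
    local logfile="$1" pgdata="$2"
    grep -q 'archive recovery complete' "$logfile" &&
        grep -q 'database system is ready to accept connections' "$logfile" &&
        [ -f "$pgdata/recovery.done" ]
}
```

In a restore script this can sit in a loop with a sleep until it returns 0 or a timeout expires.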
Hot physical backup & Continuous Archiving
Now this is what you would want for a mission critical production installation with a lot of GBs or TBs of data and a lot of concurrent users hitting the DB 24/7.
For the example's sake I will delete my whole cluster and go through the steps one at a time. The backup will be taken locally, something that of course is not recommended. At the end I will perform a PITR and I will also talk about timelines.
edit postgresql.conf and enable archiving :
wal_level = archive
archive_mode = on
archive_command = 'cp %p /opt/BACKUP/archives/%f'
(NOTE that archive_command can be scp, a more advanced external script, or anything that would transfer the archived WALs to the desired location.)
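As an example of a "more advanced external script", here is a sketch of an archive helper that refuses to overwrite a segment that is already in the archive and copies under a temporary name first. The function name archive_wal, the script path in the comment and the default destination are assumptions for illustration.

```shell
# Sketch of an archive helper: copy one WAL segment, never overwrite.
# Hypothetical usage from postgresql.conf, with this function saved in a
# script that ends with: archive_wal "$@"
#   archive_command = '/opt/BACKUP/archive_wal.sh %p %f'
archive_wal() {
    local src="$1" name="$2" destdir="${3:-/opt/BACKUP/archives}"
    # A non-zero exit tells PostgreSQL the segment was NOT archived,
    # so it keeps the file and retries later.
    [ ! -e "$destdir/$name" ] || return 1
    # Copy under a temporary name, then rename, so a half-written file
    # never carries the final segment name.
    cp "$src" "$destdir/$name.tmp" && mv "$destdir/$name.tmp" "$destdir/$name"
}
```

The important design point is the exit code: archive_command must return zero only when the segment is safely stored, because after that PostgreSQL is free to recycle it.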
Make sure the /opt/BACKUP/archives directory exists, then restart the server.
psql -c "select pg_start_backup('my backup')"
You can now tar, rsync or whatever you want to another node. Something like
rsync -cva --inplace --exclude=*pg_xlog* ${PGDATA} $OTHERNODE:$BACKUPNAME/$PGDATA
would work.
For my example, I will just use tar like in the previous example:
tar -cvz --exclude=/opt/PGDATA/pg_xlog/ -f /opt/BACKUP/backup.tgz $PGDATA
psql -c "select pg_stop_backup(), current_timestamp"
At this moment I have a base backup and the mechanism that archives all WAL segments. Let's add some data and force some checkpoints.
Notice that the archives directory now has WALs:
postgres@pglab1:/opt/PGDATA/pg_xlog$ ls -l /opt/BACKUP/archives/
total 49156
-rw------- 1 postgres postgres 16777216 Mar 27 13:57 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Mar 27 14:02 000000010000000000000002
-rw------- 1 postgres postgres 293 Mar 27 14:02 000000010000000000000002.00000020.backup
-rw------- 1 postgres postgres 16777216 Mar 27 14:04 000000010000000000000003
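A WAL segment is archived once it is full (16MB by default) or when a segment switch is forced, for example by archive_timeout, which is disabled by default. Checkpoints are a separate thing: they happen on either a size or a time threshold, which with the default postgresql.conf values means every 3 WAL segments (checkpoint_segments) or every 5 minutes (checkpoint_timeout), whichever happens first. These parameters can be, and should be, changed for performance's sake depending on your workload, so monitor checkpoint frequency.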
Now let's say that something really bad happened, like a mistaken but committed update on the table backup. To make it easier for me, I created that table with a datetime column with default value now().
So we have:
datetime | count
----------------------------+-------
2013-03-27 14:05:05.999257 | 1000
2013-03-27 14:05:14.911462 | 1000
2013-03-27 14:05:19.419173 | 1000
2013-03-27 14:05:25.631254 | 1000
2013-03-27 14:06:39.97177 | 1000
2013-03-27 14:09:53.571976 | 1000
Let's also assume that we know that the update was recorded at 2013-03-27 14:05:25.631254 and we want the database back to a point before that time.
Edit a recovery.conf as we did before:
restore_command = 'cp /opt/BACKUP/archives/%f %p'
recovery_target_time = '2013-03-27 14:04:00'
Restart the DB and check the logfile. You'll see something like:
LOG: starting point-in-time recovery to 2013-03-27 14:04:00+02
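Since recovery.conf gets rewritten for every PITR attempt, a tiny generator saves typos. This is a sketch under my own naming (write_recovery_conf). It only knows the two parameters used in this post, and the target time is optional.

```shell
# Sketch: write a recovery.conf with a restore_command and an optional
# recovery_target_time into the given data directory.
write_recovery_conf() {
    local pgdata="$1" archive_dir="$2" target_time="${3:-}"
    {
        # %%f and %%p become the literal %f and %p that PostgreSQL expands.
        printf "restore_command = 'cp %s/%%f %%p'\n" "$archive_dir"
        # Without a target, recovery replays all the archived WAL it can find.
        if [ -n "$target_time" ]; then
            printf "recovery_target_time = '%s'\n" "$target_time"
        fi
    } > "$pgdata/recovery.conf"
}
```

For the example above the call would be write_recovery_conf /opt/PGDATA /opt/BACKUP/archives '2013-03-27 14:04:00'.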
Now let's recover from a crash.
Once again, rm -rf /opt/PGDATA/*
Untar the base backup, place a recovery.conf with or without the recovery_target_time
and start the database.
In my example I also did a PITR to 2013-03-27 14:01:00
and the table now has:
1 | 2013-03-27 13:56:49.163269
The PostgreSQL documentation describes timelines much better than I could, so here it is right from the documentation:
The ability to restore the database to a previous point in
time creates some complexities that are akin to science-fiction
stories about time travel and parallel universes. For example,
in the original history of the database, suppose you dropped a
critical table at 5:15PM on Tuesday evening, but didn't realize
your mistake until Wednesday noon. Unfazed, you get out your
backup, restore to the point-in-time 5:14PM Tuesday evening,
and are up and running. In this history of the database
universe, you never dropped the table. But suppose you later
realize this wasn't such a great idea, and would like to return
to sometime Wednesday morning in the original history. You
won't be able to if, while your database was up-and-running, it
overwrote some of the WAL segment files that led up to the time
you now wish you could get back to. Thus, to avoid this, you
need to distinguish the series of WAL records generated after
you've done a point-in-time recovery from those that were
generated in the original database history.
To deal with this problem, PostgreSQL has a notion of
timelines. Whenever an archive recovery
completes, a new timeline is created to identify the series of
WAL records generated after that recovery. The timeline ID
number is part of WAL segment file names so a new timeline does
not overwrite the WAL data generated by previous timelines. It
is in fact possible to archive many different timelines. While
that might seem like a useless feature, it's often a lifesaver.
Consider the situation where you aren't quite sure what
point-in-time to recover to, and so have to do several
point-in-time recoveries by trial and error until you find the
best place to branch off from the old history. Without
timelines this process would soon generate an unmanageable
mess. With timelines, you can recover to any prior state,
including states in timeline branches that you abandoned earlier.
Every time a new timeline is created, PostgreSQL creates a
"timeline history" file that shows which
timeline it branched off from and when. These history files are
necessary to allow the system to pick the right WAL segment
files when recovering from an archive that contains multiple
timelines. Therefore, they are archived into the WAL archive
area just like WAL segment files. The history files are just
small text files, so it's cheap and appropriate to keep them
around indefinitely (unlike the segment files which are large).
You can, if you like, add comments to a history file to record
your own notes about how and why this particular timeline was
created. Such comments will be especially valuable when you
have a thicket of different timelines as a result of
experimentation.
The default behavior of recovery is to recover along the
same timeline that was current when the base backup was taken.
If you wish to recover into some child timeline (that is, you
want to return to some state that was itself generated after a
recovery attempt), you need to specify the target timeline ID
in recovery.conf. You cannot recover
into timelines that branched off earlier than the base
backup.
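Since the timeline ID is part of the WAL segment file name (the first 8 of the 24 hex digits), you can read it straight off the archive listing. Here is a small sketch that extracts it; the function name wal_timeline is my own.

```shell
# Sketch: print the timeline ID (in decimal) encoded in a WAL segment file name.
wal_timeline() {
    local name="$1"
    # A segment name is exactly 24 uppercase hex digits; reject anything else
    # (history and .backup files do not match).
    printf '%s\n' "$name" | grep -Eq '^[0-9A-F]{24}$' || return 1
    # The first 8 hex digits are the timeline ID.
    printf '%d\n' "0x$(printf '%s' "$name" | cut -c1-8)"
}
```

All the segments in my listing earlier start with 00000001, so they belong to timeline 1. After a completed PITR, new segments would normally start with 00000002.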
I know that this probably needs a review, and the plan is to do it at some point.
Thanks for reading
Vasilis