Leandro Lucarella
Leandro Lucarella (2010), released under the BOLA license
Bacap is a very simple script (~100 SLOC of Bash) to do an incremental backup that saves space using rsync and hard links. It is not the first, and it probably will not be the last, so why should you use precisely this one? I have no idea. All I can tell you is:
I did it, so it has to be great!
It is very simple, meaning it is very easy to understand and customize.
You can back up multiple hosts.
Did I mention it is very simple? I guess that is the only selling point, so remember: It’s very simple =)
Doing something very complex in ~100 SLOC is not easy, unless you’re standing
on the shoulders of giants. I’m standing mainly on the shoulders of rsync, so
you should install it before using the script. You will also need a bunch of
basic POSIX commands, like date, dirname, readlink, basename, cat, awk, etc.;
and crontab if you don’t want to run the script manually each time you remember
to actually do a backup; but I’m sure you already have those. And of course
Bash, but again, I’m sure you have it too. If you want to back up remote hosts,
be sure ssh is installed too.
Once you have it all, just download the script from the git repo and copy it to wherever you like. Set the executable bit if appropriate:
chmod a+x bacap
If you don’t like the defaults (you probably won’t), you can add a configuration file. Configuration files are searched for in these places:
/etc/bacaprc
/etc/bacap/bacaprc
bacaprc in the same directory as the bacap script
Optional parameter passed as argument to the script
$CONFIG_PATH/$HOST/bacaprc
Order is important, since all files are read (if possible) and values in the
last configuration file read overwrite older values. The script takes an optional
parameter, which is another location to look for a configuration file. If the
configuration file passed as argument can’t be found, an error will be printed
(no error is issued if any of the other configuration files are missing).
Also, config options can be specified on a per-host basis by creating a bacaprc
file in $CONFIG_PATH/$HOST. As a side effect of this, the configuration file(s)
are read initially and again each time the script backs up a new host. So the
configuration file(s) are read at least twice, even if you back up only one
host.
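The layering described above can be pictured with a minimal sketch. This is not Bacap's actual code: the function name load_configs and the throwaway files are made up for illustration, and the only assumption is that each configuration file is sourced in order, so later assignments win and missing files are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of layered configuration; not Bacap's real code.

# Source every readable file in the order given. Values set by later
# files override values set by earlier ones; missing files are skipped.
load_configs() {
    local f
    for f in "$@"; do
        if [ -r "$f" ]; then
            . "$f"
        fi
    done
}

# Two throwaway files standing in for a global bacaprc and a per-host one.
tmp=$(mktemp -d)
printf 'VERBOSE=1\nBACKUP_PATH=/backup\n' > "$tmp/global"
printf 'BACKUP_PATH=/mnt/usb/backup\n' > "$tmp/host"

# "$tmp/missing" does not exist and is silently ignored.
load_configs "$tmp/global" "$tmp/missing" "$tmp/host"
echo "VERBOSE=$VERBOSE BACKUP_PATH=$BACKUP_PATH"
rm -rf "$tmp"
```

VERBOSE keeps the value from the first file, while BACKUP_PATH ends up with the value from the last file that set it.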
The configuration file is a Bash script too, and these are the default values:
# Default config values
# Be verbose
VERBOSE=1
# Be extra verbose
DEBUG=0
# Don't actually do anything, just print the commands
DRY_RUN=0
# Force synchronization, even when the target already exists
FORCE_SYNC=0
# Log file (if empty, print to stdout/err)
LOG_FILE=
# Where to find the configuration of the hosts to backup
CONFIG_PATH=/etc/bacap/hosts
# Name of the local host (so no ssh would be used with this host)
LOCALHOST=$HOSTNAME
# Where to put the backups
BACKUP_PATH=/backup
# Date format used for backed up directories (passed to the date command)
DATE_FMT="%Y-%m-%d"
# Ping remote hosts to check if they are up (set to 0 if your hosts don't
# reply to ICMP pings).
PING_CHECK=1
# rsync flags to use
RSYNC_FLAGS="-aAHSXx --numeric-ids"
# rsync flags to use when in verbose mode
RSYNC_VERBOSE_FLAGS="-v --stats"
# rsync remote shell to use
RSYNC_RSH="ssh"
Once you’ve created the configuration file(s), you should create the directory
$CONFIG_PATH (meaning, the value you used for that variable in the
configuration file):
mkdir -p $CONFIG_PATH
Then create a directory there for each host you want to back up. The directory
name should be the name of the host (as you would use to connect to it using
ssh). For now let’s say we will only back up localhost:
mkdir $CONFIG_PATH/$LOCALHOST
You should be able to guess what $LOCALHOST stands for by now =)
Now you should create at least one file there, paths, which should have one
line for each path to back up on that host. Let’s say we want to back up only
/etc and /home:
echo /etc > $CONFIG_PATH/$LOCALHOST/paths
echo /home >> $CONFIG_PATH/$LOCALHOST/paths
But sometimes there are things there that you don’t want to back up. In that
case you can also create a file named excludes and write the paths you want to
exclude there, one path per line (you can use wildcards and anything supported
by the --exclude-from rsync option). Let’s say we don’t want to back up rata’s
home:
echo /home/rata/ > $CONFIG_PATH/$LOCALHOST/excludes
Also, if there are files matching some pattern that you don’t want excluded,
you can create a file named includes with the patterns you want to include
(you can use anything supported by the --include-from rsync option).
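To make the role of the includes and excludes files more concrete, here is a rough sketch of how a host directory could be turned into rsync filter options. It is an illustration only: the function name filter_args is hypothetical, and the assumption is that the files are handed to rsync through --include-from and --exclude-from, with the includes first because rsync applies the first rule that matches.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map a host directory to rsync filter options.
# Not Bacap's real code.

# Print one rsync option per line for the host directory given as $1.
filter_args() {
    local host_dir=$1
    # rsync uses the first matching rule, so includes go before excludes.
    if [ -f "$host_dir/includes" ]; then
        echo "--include-from=$host_dir/includes"
    fi
    if [ -f "$host_dir/excludes" ]; then
        echo "--exclude-from=$host_dir/excludes"
    fi
}

# Throwaway host directory mirroring the example above.
demo=$(mktemp -d)
echo /etc > "$demo/paths"
echo /home >> "$demo/paths"
echo /home/rata/ > "$demo/excludes"

args=$(filter_args "$demo")
echo "$args"
rm -rf "$demo"
```

With only an excludes file present, a single --exclude-from option is printed.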
That’s pretty much it. If you want to add other hosts, just create the host directory and the needed host configuration files.
As we said in the configuration section, the only argument the script takes is an extra configuration file. All options are managed through configuration files.
The script creates a new directory in $BACKUP_PATH/$host/$date and copies
(hard-links) the configured paths for each $host. $date is the current date at
the time of starting the script, formatted according to $DATE_FMT. By default
this has day resolution, but you can add hours, minutes or even seconds if you
want to do more frequent backups. If the directory already exists for a host,
that host is skipped (see FORCE_SYNC in the defaults above to sync anyway).
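For instance, an hourly resolution could look like the fragment below. The format string is just an example value, not something Bacap ships with; since DATE_FMT is passed to the date command, you can preview the resulting directory name the same way.

```bash
# Hypothetical bacaprc fragment: one backup directory per hour
DATE_FMT="%Y-%m-%d_%H"

# Preview the directory name this format would produce right now
date +"$DATE_FMT"
```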
A symbolic link named $BACKUP_PATH/$host/current, pointing to the newly created
directory, is created at the end of the backup.
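The general technique can be sketched as follows. This is not Bacap's actual code: it assumes the hard-linking is done with rsync's --link-dest option pointed at the previous snapshot, which is a common way to get this layout, and the function name snapshot_cmds is made up. In the spirit of DRY_RUN, the sketch only prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one hard-linked snapshot; not Bacap's real code.

BACKUP_PATH=/backup
DATE_FMT="%Y-%m-%d"
RSYNC_FLAGS="-aAHSXx --numeric-ids"

# Print the commands that would snapshot local path $2 for host $1.
snapshot_cmds() {
    local host=$1 path=$2
    local stamp dst
    stamp=$(date +"$DATE_FMT")
    dst="$BACKUP_PATH/$host/$stamp"
    echo "mkdir -p $dst"
    # Files unchanged since the previous snapshot become hard links to it
    # instead of new copies (assumption: --link-dest is the mechanism).
    echo "rsync $RSYNC_FLAGS --link-dest=$BACKUP_PATH/$host/current/ $path $dst/"
    # Repoint "current" at the snapshot that was just made.
    echo "ln -sfn $stamp $BACKUP_PATH/$host/current"
}

snapshot_cmds lolaus /etc
```

Because unchanged files are hard links, every dated directory looks like a full backup while only changed files take additional space.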
Here are a few tips on how to configure Bacap for several common scenarios.
You probably want to automate your backup using cron. I will not include
a cron tutorial here, but if you are completely lost, you can add this line to
/etc/crontab to make a daily backup at 6:30:
30 6 * * * root /path/to/bacap
If you are a Debian user, you can also simply install the script in
/etc/cron.daily (or make a symlink or something similar) and you are set.
When doing a backup of a remote host, you probably want ssh to be able to
log in without providing a password. To do so, you can generate an ssh key using
ssh-keygen, copy the public key to the target’s /root/.ssh/authorized_keys
using ssh-copy-id root@host (or the user that runs the backup) and set the
Bacap configuration variable RSYNC_RSH to something like:
RSYNC_RSH="ssh -i /path/to/priv-key -o NumberOfPasswordPrompts=0"
The -o NumberOfPasswordPrompts=0 is not necessary, but you will appreciate
it if something is wrong with your key, since without it rsync will hang asking
for a password.
Also, you may consider using the StrictHostKeyChecking=no ssh option if you
back up hosts with dynamic IP addresses.
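Put together, such a setting might look like the fragment below. The key path is the same placeholder used earlier, and combining the two options in one line is only an example, not a recommended default.

```bash
# Hypothetical bacaprc fragment: key-based login, never prompt for a
# password, and do not refuse hosts whose key is unknown or has changed
RSYNC_RSH="ssh -i /path/to/priv-key -o NumberOfPasswordPrompts=0 -o StrictHostKeyChecking=no"
```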
When the bandwidth is not tight, you probably want to ensure ssh doesn’t use compression:
RSYNC_RSH="ssh -o Compression=no"
And if your network is trusted, you probably don’t need very strong encryption either:
RSYNC_RSH="ssh -o Compression=no -c arcfour"
If you want to see what has actually changed between two backups, you can run
rsync with your usual flags plus -nv --delete. For example, if you just use -a,
to see the differences between lolaus/2010-07-11 and lolaus/2010-07-12 you can
run:
rsync -nav --delete lolaus/2010-07-11/ lolaus/2010-07-12/
Do it yourself: this script was heavily inspired by an article by Kevin Korb (well, it was really inspired by the previous version of the article =).
Back In Time: Nice looking graphical alternative.
rsnapshot: A more mature and heavier alternative. I haven’t really used it though, so I can’t say much.
I’m sure there are plenty of others; if you have one and want it listed here, please feel free to drop me an e-mail.