Bacap - The extremely simple backup script

Author

Leandro Lucarella

Contact

luca@llucax.com.ar

Copyright

Leandro Lucarella (2010), released under the BOLA license

About

Bacap is a very simple script (~100 SLOC of Bash) to do incremental backups that save space using rsync and hard-links. It is not the first, and it probably will not be the last, so why should you use precisely this one? I have no idea. All I can tell you is:

  1. I did it, so it has to be great!

  2. It is very simple, meaning it is very easy to understand and customize.

  3. You can back up multiple hosts.

Did I mention it is very simple? I guess that is the only selling point, so remember: It’s very simple =)

Installation

Doing something very complex in ~100 SLOC is not easy, unless you’re standing on the shoulders of giants. I’m standing mainly on the shoulders of rsync, so you should install it before using the script. You will need a bunch of basic POSIX commands, like date, dirname, readlink, basename, cat, awk, etc.; and crontab if you don’t want to run the script manually each time you remember to actually do a backup; but I’m sure you already have those. And of course, Bash, but again, I’m sure you have it too. If you want to back up remote hosts, be sure ssh is installed too.

Once you have it all, just download the script from the git repo and copy it to wherever you like. Set the executable bit if appropriate:

chmod a+x bacap

Configuration

If you don’t like the defaults (you probably won’t), you can add a configuration file. Configuration files are searched for in these places, in this order:

  1. /etc/bacaprc

  2. /etc/bacap/bacaprc

  3. bacaprc in the same directory as the bacap script

  4. Optional parameter passed as argument to the script

  5. $CONFIG_PATH/$HOST/bacaprc

Order is important, since all files are read (if possible) and values in the last configuration file read overwrite earlier values. The script takes an optional parameter, which is another location to look for a configuration file. If the configuration file passed as argument can’t be found, an error will be printed (no error is issued if any of the other configuration files are missing). Also, config options can be specified on a per-host basis by creating a bacaprc file in $CONFIG_PATH/$HOST. As a side effect of this, the configuration file(s) are read initially and again each time the script backs up a new host. So the configuration file(s) are read at least twice, even if you back up only one host.
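
The lookup order above could be sketched roughly as follows. This is only an illustration of the described behaviour, not the script’s actual code; the function name load_configs and its three arguments are made up for the example.

```bash
# Hypothetical sketch of the configuration lookup order (not Bacap's own code).
# Arguments: directory of the bacap script, optional extra config file,
# optional host name (empty on the initial pass).
load_configs() {
    local script_dir=$1 extra=$2 host=$3
    local f

    # Places 1 to 3: read silently if present, later files override earlier ones
    for f in /etc/bacaprc /etc/bacap/bacaprc "$script_dir/bacaprc"; do
        if [ -r "$f" ]; then
            . "$f"
        fi
    done

    # Place 4: the file given as argument must exist, otherwise it is an error
    if [ -n "$extra" ]; then
        if [ -r "$extra" ]; then
            . "$extra"
        else
            echo "error: cannot read configuration file $extra" >&2
            return 1
        fi
    fi

    # Place 5: per-host overrides, read last so they win
    if [ -n "$host" ] && [ -r "$CONFIG_PATH/$host/bacaprc" ]; then
        . "$CONFIG_PATH/$host/bacaprc"
    fi
    return 0
}
```

Calling load_configs once without a host and then once per host reproduces the "read at least twice" effect mentioned above.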

The configuration file is a Bash script too, and these are the default values:


# Default config values

# Be verbose
VERBOSE=1

# Be extra verbose
DEBUG=0

# Don't actually do anything, just print the commands
DRY_RUN=0

# Force synchronization, even when the target already exists
FORCE_SYNC=0

# Log file (if empty, print to stdout/err)
LOG_FILE=

# Where to find the configuration of the hosts to back up
CONFIG_PATH=/etc/bacap/hosts

# Name of the local host (so no ssh would be used with this host)
LOCALHOST=$HOSTNAME

# Where to put the backups
BACKUP_PATH=/backup

# Date format used for backed up directories (passed to the date command)
DATE_FMT="%Y-%m-%d"

# Ping remote hosts to check if they are up (set to 0 if your hosts don't
# reply to ICMP pings).
PING_CHECK=1

# rsync flags to use
RSYNC_FLAGS="-aAHSXx --numeric-ids"

# rsync flags to use when in verbose mode
RSYNC_VERBOSE_FLAGS="-v --stats"

# rsync remote shell to use
RSYNC_RSH="ssh"
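
For instance, a small /etc/bacaprc that overrides a few of the defaults could look like this. The values (/srv/backup, the log path, the minute-resolution date format) are just examples chosen for illustration; any variable you leave out keeps its default.

```bash
# Example bacaprc (illustrative values only)

# Keep the backups on a dedicated disk
BACKUP_PATH=/srv/backup

# Minute resolution, so more than one backup per day is possible
DATE_FMT="%Y-%m-%d_%H-%M"

# Write messages to a log file instead of stdout/err
LOG_FILE=/var/log/bacap.log

# The hosts here are assumed not to answer ICMP pings
PING_CHECK=0
```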

Once you’ve created the configuration file(s), you should create the directory $CONFIG_PATH (meaning, the value you used for that variable in the configuration file):

mkdir -p $CONFIG_PATH

Then create a directory there for each host you want to back up. The directory name should be the name of the host (as you would use to connect to it using ssh). For now let’s say we will only back up localhost:

mkdir $CONFIG_PATH/$LOCALHOST

You should be able to guess what $LOCALHOST stands for by now =)

Now, you should create at least one file there, named paths, which should have one line for each path to back up on that host. Let’s say we want to back up only /etc and /home:

echo /etc > $CONFIG_PATH/$LOCALHOST/paths
echo /home >> $CONFIG_PATH/$LOCALHOST/paths

But sometimes there are things there that you don’t want to back up. In that case you can also create a file named excludes and list the paths you want to exclude, one per line (you can use wildcards and anything else supported by the --exclude-from rsync option). Let’s say we don’t want to back up rata’s home:

echo /home/rata/ > $CONFIG_PATH/$LOCALHOST/excludes

Also, if some files matching an exclude pattern should be backed up anyway, you can create a file named includes with the patterns you want to include (you can use anything supported by the --include-from rsync option).

That’s pretty much it. If you want to add other hosts, just create the host directory and the needed host configuration files.
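
The per-host setup described above is easy to wrap in a small helper. The sketch below is only a convenience for illustration: add_host is a made-up name, not something Bacap provides, and web1.example.com is a placeholder host.

```bash
# Hypothetical helper: create the configuration directory for one host and
# write its paths file, one path per line.
# Usage: add_host CONFIG_PATH HOST PATH...
add_host() {
    local config_path=$1 host=$2
    shift 2
    mkdir -p "$config_path/$host" || return
    printf '%s\n' "$@" > "$config_path/$host/paths"
}

# Example (the host name is a placeholder):
#   add_host "$CONFIG_PATH" web1.example.com /etc /var/www
```

The excludes and includes files can then be added by hand in the same directory, as shown for localhost.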

Usage

As we said in the configuration section, the only argument the script takes is an extra configuration file. All options are managed through configuration files.

The script creates a new directory in $BACKUP_PATH/$host/$date and copies (hard-links) the configured paths for each $host. $date is the current date at the time of starting the script, formatted according to $DATE_FMT. By default this has day resolution, but you can add hours, minutes or even seconds if you want to do more frequent backups. If the directory already exists for a host, the script skips that host (unless FORCE_SYNC is enabled).

A symbolic link is created at the end of the backup, with the name $BACKUP_PATH/$host/current, and pointing to the newly created directory.
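
To make the layout more concrete, here is a minimal sketch of one snapshot step. It is not taken from the Bacap source: the function names run and snapshot_host are invented, and the use of rsync’s --link-dest option is an assumption about how the hard-linking could be done (it is one common way to get this effect). Commands are only printed while DRY_RUN is 1.

```bash
# Hypothetical sketch of one snapshot step (not Bacap's actual code).
BACKUP_PATH=${BACKUP_PATH:-/backup}
DATE_FMT=${DATE_FMT:-%Y-%m-%d}
DRY_RUN=${DRY_RUN:-1}    # print commands by default in this sketch

# Print the command in dry-run mode, execute it otherwise
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: snapshot_host HOST SOURCE   (SOURCE as rsync expects it)
snapshot_host() {
    local host=$1 src=$2
    local base=$BACKUP_PATH/$host
    local target
    target=$base/$(date +"$DATE_FMT")

    # A directory for this date already exists: skip the host
    if [ -e "$target" ]; then
        echo "skipping $host: $target already exists"
        return 0
    fi

    run mkdir -p "$target"
    # Assumption: files unchanged since the previous snapshot are
    # hard-linked against it instead of being copied again
    run rsync -a --link-dest="$base/current" "$src" "$target/"
    # Point "current" at the snapshot just created
    run ln -sfn "$target" "$base/current"
}
```

With the defaults, snapshot_host lolaus lolaus:/etc would print the three commands for today’s snapshot of that host without touching anything.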

Tips

Here are a few tips on how to configure Bacap for several common scenarios.

Automating backups using cron

You probably want to automate your backup using cron. I will not include a cron tutorial here, but if you are completely lost, you can add this line to /etc/crontab to make a daily backup at 6:30:

30 6 * * * root /path/to/bacap

If you are a Debian user, you can also simply install the script in /etc/cron.daily (or make a symlink or something similar) and you are set.

Providing an ssh key

When doing a backup of a remote host, you probably want ssh to be able to log in without providing a password. To do so, you can generate an ssh key using ssh-keygen, copy the public key to the target’s /root/.ssh/authorized_keys using ssh-copy-id root@host (or whichever user runs the backup) and set the Bacap configuration variable RSYNC_RSH to something like:

RSYNC_RSH="ssh -i /path/to/priv-key -o NumberOfPasswordPrompts=0"

The -o NumberOfPasswordPrompts=0 is not necessary, but you will appreciate it if something is wrong with your key: without it, rsync will hang asking for a password.

Also, you may consider using the StrictHostKeyChecking=no ssh option if you back up hosts with dynamic IP addresses.
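
The key setup described in this section could be scripted along these lines. The function names and the key path in the example are hypothetical, and the choice of an ed25519 key without a passphrase is an assumption made so that cron can use the key unattended.

```bash
# Hypothetical helpers for the key setup (not part of Bacap).

# Generate a passphrase-less key and install its public half on the target.
# Usage: setup_backup_key /path/to/priv-key root@host
setup_backup_key() {
    local key=$1 target=$2
    # -N '' means no passphrase; ssh-copy-id will ask for the password once
    ssh-keygen -t ed25519 -N '' -f "$key" &&
    ssh-copy-id -i "$key.pub" "$target"
}

# Print a matching value for RSYNC_RSH (the key path must not contain spaces)
rsync_rsh_for_key() {
    printf 'ssh -i %s -o NumberOfPasswordPrompts=0\n' "$1"
}
```

For example, after setup_backup_key /path/to/priv-key root@host, the output of rsync_rsh_for_key /path/to/priv-key is the value to put in RSYNC_RSH.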

Backing up local network nodes (or nodes with a fast connection)

When the bandwidth is not tight, you probably want to ensure ssh doesn’t use compression:

RSYNC_RSH="ssh -o Compression=no"

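And if your network is trusted, you probably don’t need very strong encryption either (note that recent OpenSSH releases have dropped the arcfour cipher, so this only works with older versions):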

RSYNC_RSH="ssh -o Compression=no -c arcfour"

Listing differences between 2 snapshots

If you want to see what has actually changed between two backups, you can run rsync with your usual flags plus -nv --delete. For example, if you just use -a, to see the differences between lolaus/2010-07-11 and lolaus/2010-07-12 you can run:

rsync -nav --delete lolaus/2010-07-11/ lolaus/2010-07-12/

Similar alternatives

  • Do it yourself: this script was heavily inspired by an article by Kevin Korb (well, it was really inspired by the previous version of the article =).

  • Back In Time: Nice looking graphical alternative.

  • rsnapshot: A more mature and heavier alternative. I haven’t really used it though, so I can’t say much.

I’m sure there are plenty of others. If you have one and want it to be listed here, please feel free to drop me an e-mail.