Virtually everybody agrees that backups are a good idea, but few people actually do them. Backup software is often time-consuming to set up or simply overkill for a single-user system. In this article I’ll show how easy it is to build your own backup solution on Linux using rsync and an external disk drive.
The backup scheme proposed here is based on Mike Rubel’s excellent article “Easy Automated Snapshot-Style Backups with Linux and Rsync”. In my scheme we leave out anything sophisticated and concentrate on the single-user scenario. The result is probably one of the simplest and least expensive backup systems you can get away with.
These are the features in a nutshell:
- No network, server or special software (except rsync) required
- Suitable for single-user workstations
- After setup usable without root access
- Multiple backups are possible, with each being a complete snapshot
- Restoring a backup is done using standard Unix utilities (cp!)
- Very space-efficient due to use of hard links between backups
- Backup media are cheap (any external USB drive will do)
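Rsync snapshots use Unix hard links to save space: a file that is identical in two snapshots is stored on disk only once and simply appears under both names. This works under the assumption that only a small percentage of files change between backups, which should be true in many cases.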
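To make the hard link idea concrete, here is a small sketch that you can run in a scratch directory. It is not part of the backup procedure itself. The file name is made up, and `stat -c` assumes GNU coreutils, as found on most Linux systems.

```shell
#!/bin/sh
# Demonstrate that a hard link is a second name for the same data.
demo=$(mktemp -d)
mkdir "$demo/backup.1" "$demo/backup.0"
echo "unchanged content" > "$demo/backup.1/notes.txt"

# Give the same file a second name in the newer snapshot.
ln "$demo/backup.1/notes.txt" "$demo/backup.0/notes.txt"

# Both names refer to one inode, so the data is stored only once.
link_count=$(stat -c %h "$demo/backup.0/notes.txt")
echo "link count: $link_count"

# Deleting the older snapshot leaves the newer one intact.
rm -r "$demo/backup.1"
cat "$demo/backup.0/notes.txt"
```

The data disappears from the disk only when the last name pointing to it is removed, which is why every snapshot can be treated as a complete copy.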
The backup medium shouldn’t be mounted all the time because that would increase the risk of accidental overwrites and other hazards. An external drive disconnected from both power and USB most of the time is a pretty good solution for single-user workstations. The downside is that backups have to be triggered manually, which is a bit inconvenient and requires discipline.
If you need networked multi-user backups, try dirvish, which shares many advantages with the solution presented here. It is a proven rsync-based system and can be run from a central backup server via cron.
First of all, you need an external disk drive. I suggest one that is at least two to three times as big as the data you want to back up. Then create a file system that supports hard links (ext2 and ext3 do, among others). When using ext2/3, it’s a good idea to set the usual 5% space reservation for user root to zero; otherwise you’d waste a lot of space. You might also want to adjust the file system check interval.
Make sure your user can mount the disk and write to it. Create a dedicated backup directory on your disk, especially if you use the disk for other archiving purposes, too. Its presence also makes it easy to check whether the disk is mounted.
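The preparation steps could look roughly like the sketch below. Treat everything specific in it as an assumption: `/dev/sdX1` is a placeholder for your backup partition, `tune2fs` is the tool I would reach for on ext2/3, the chosen values are only examples, and `/media/backupdisk/backup` and `backup_dir_ready` are hypothetical names.

```shell
#!/bin/sh
# One-time preparation, to be run as root. /dev/sdX1 is a placeholder for
# your backup partition, so these lines are left commented out on purpose.
#   tune2fs -m 0 /dev/sdX1        # reserve 0% of the blocks for root
#   tune2fs -c 0 -i 0 /dev/sdX1   # turn off mount-count and time-based checks

# Hypothetical helper: succeed only if the dedicated backup directory exists
# and is writable. If the mount point is empty while the disk is unplugged,
# this also tells us whether the disk is mounted.
backup_dir_ready() {
    [ -d "$1" ] && [ -w "$1" ]
}

# BACKUP_DIR is an assumed location; adjust it to your own mount point.
BACKUP_DIR=${BACKUP_DIR:-/media/backupdisk/backup}
if backup_dir_ready "$BACKUP_DIR"; then
    echo "backup disk is mounted"
else
    echo "backup disk is not mounted" >&2
fi
```

Checking for the directory rather than parsing the mount table keeps the test independent of device names, which tend to change with USB drives.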
In your backup directory, each snapshot gets its own subdirectory. The snapshots share common, unmodified files to save space. After a few backups, the directory looks like this:
    $ ls /path/to/backup/dir/
    backup.0  backup.1  backup.2  backup.3
The latest backup is always backup.0; the oldest is the one with the highest number (backup.3 here). When a new backup is made, the oldest directory is deleted and the numbers of the other ones are moved up by one. That means backup.1 becomes backup.2, and so on.
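The rotation can be sketched in a few lines of shell. This is a minimal illustration, not a hardened script: `rotate_snapshots` is a hypothetical helper, and the number of snapshots to keep is passed in as a parameter that I made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: rotate the snapshots in directory $1, keeping at
# most $2 of them (backup.0 up to backup.$2-1).
rotate_snapshots() {
    dir=$1
    keep=$2
    oldest=$((keep - 1))

    # Delete the oldest snapshot to make room.
    rm -rf "$dir/backup.$oldest"

    # Shift the remaining snapshots up, starting with the highest number
    # so that no directory is overwritten.
    i=$((oldest - 1))
    while [ "$i" -ge 0 ]; do
        if [ -d "$dir/backup.$i" ]; then
            mv "$dir/backup.$i" "$dir/backup.$((i + 1))"
        fi
        i=$((i - 1))
    done
}
```

Calling `rotate_snapshots /path/to/backup/dir 4` on the listing above removes backup.3 and leaves backup.1 to backup.3 in place, so the name backup.0 is free for the next run.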
When a new backup is made, rsync does not copy a file that hasn’t changed since last time; instead it creates a hard link to the existing copy in backup.1. The rsync command line responsible for this behavior looks like this:
    $ cd /path/to/backup/dir/
    $ rsync --archive --link-dest=../backup.1 SOURCE_DIR/ backup.0
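Note that the trailing slash with SOURCE_DIR is significant: with it, rsync copies the contents of SOURCE_DIR into backup.0; without it, rsync would create a SOURCE_DIR subdirectory inside backup.0.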
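Wrapped into a function, the rsync call might look like the following sketch. `make_snapshot` is a hypothetical helper of mine. It assumes that the snapshots have already been rotated, so that the name backup.0 is free, and it skips `--link-dest` on the very first run, when there is nothing to link against.

```shell
#!/bin/sh
# Hypothetical helper: snapshot source directory $1 into $2/backup.0,
# hard-linking unchanged files against $2/backup.1 if that snapshot exists.
make_snapshot() {
    src=$1
    dir=$2

    if [ -d "$dir/backup.1" ]; then
        # A relative --link-dest is resolved against the destination
        # directory, so ../backup.1 names the sibling of backup.0.
        rsync --archive --link-dest=../backup.1 "$src/" "$dir/backup.0"
    else
        # First run: no previous snapshot to link against yet.
        rsync --archive "$src/" "$dir/backup.0"
    fi
}
```

A call such as `make_snapshot "$HOME" /path/to/backup/dir` backs up the home directory. The function appends the trailing slash itself, so the contents of the source land directly in backup.0.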
In case you don’t want to back up certain files (browser caches or the ~/.gvfs mountpoint come to mind), you can add exclusion patterns to rsync’s command line. Using the --filter switch (or the more limited --exclude-from), those patterns may be specified in a separate file, too.
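As an illustration, here is a sketch that uses `--exclude-from` with a small pattern file. The helper names and the two patterns are only examples I picked; your own list will depend on what lives in your home directory.

```shell
#!/bin/sh
# Hypothetical helper: write an example exclude file to the path given as $1.
# A leading slash anchors a pattern at the top of the source directory.
write_excludes() {
    cat > "$1" <<'EOF'
/.gvfs
/.cache/
EOF
}

# Hypothetical helper: the same rsync call as before plus --exclude-from.
# $1 = source directory, $2 = backup directory, $3 = exclude file.
make_filtered_snapshot() {
    rsync --archive --exclude-from="$3" --link-dest=../backup.1 \
        "$1/" "$2/backup.0"
}
```

Excluded paths are simply absent from the snapshot. If you prefer `--filter`, rsync can read rules from a file with a merge rule such as `--filter='merge FILE'`; that syntax also allows include rules, which `--exclude-from` does not.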