Backup on Linux: rsnapshot vs. rdiff-backup (vs. Time Machine)
Apple's release of Leopard and its included backup utility, Time Machine, has generated a lot of talk about backups recently. I will admit Time Machine is pretty cool, and I believe it is a bit more than a glorified GUI on top of an existing *nix tool, as some have claimed. However, the core functionality is very similar to the command-line tool rsnapshot, which is itself based on an rsync script by Mike Rubel. Time Machine added a couple of features and a GUI to make it easy to use. Since I prefer the command line over GUIs most of the time anyway, rsnapshot seemed perfect for me.
To be thorough, I researched a number of other backup utilities for Linux. Dirvish and flyback were out because I prefer the command line, and they didn't seem to offer anything more than rsnapshot. Scripting rsync myself wouldn't get me anything more than rsnapshot either, plus it would be more work. In the end, I eliminated all but rsnapshot and another command-line tool called rdiff-backup. Rdiff-backup has some advantages over rsnapshot (and Time Machine) because it stores compressed deltas of each version of a file instead of a complete copy of the file each time it changes. This is not a big deal for small files, but for large files that change often, it makes a significant difference. However, the big disadvantage of rdiff-backup, for me, was the inability to perform different levels of backup, such as hourly, daily, weekly, and monthly. Depending on the needs of the user, this could negate the space-saving advantage by requiring a large number of snapshots to be kept.
I ended up choosing rsnapshot over rdiff-backup for this last reason. It seems rdiff-backup is closer to a version control tool and rsnapshot closer to a traditional backup solution. It would be great to create a hybrid of the two tools to gain the advantages of each. I started to dig into the source of rdiff-backup (Python source makes me happy), but I didn't want to get too sidetracked with another project. For now, I am using Mercurial to version control my /etc and /home/.* config files, and rsnapshot as a broader, general-purpose backup tool.
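To give an idea of what this looks like in practice, here is a minimal sketch of an rsnapshot setup with multiple backup levels. The paths, retention counts, and schedule below are placeholders for illustration, not my actual values; note that rsnapshot.conf requires tabs, not spaces, between fields.

```
# /etc/rsnapshot.conf (excerpt) -- fields MUST be separated by tabs
snapshot_root	/mnt/backup/snapshots/

# how many snapshots to keep at each level
retain	hourly	6
retain	daily	7
retain	weekly	4
retain	monthly	6

# what to back up
backup	/etc/	localhost/
backup	/home/	localhost/
```

The rotation itself is driven entirely by cron, e.g. in /etc/cron.d/rsnapshot:

```
0 */4 * * *	root	/usr/bin/rsnapshot hourly
30 3 * * *	root	/usr/bin/rsnapshot daily
0 3 * * 1	root	/usr/bin/rsnapshot weekly
30 2 1 * *	root	/usr/bin/rsnapshot monthly
```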
Here is my comparison of rsnapshot and rdiff-backup:
| Feature | Comparison |
| --- | --- |
| Written in | rsnapshot is written in Perl; rdiff-backup is written in Python and C. |
| Size | rdiff-backup stores previous versions as compressed deltas against the current version, similar to a version control system. rsnapshot uses actual files and hard links to save space. For small files, storage size is similar. For large files that change often, such as logfiles and databases, rdiff-backup requires significantly less space for a given number of versions. |
| Speed | rdiff-backup is slower than rsnapshot. |
| Metadata | rdiff-backup stores file metadata, such as ownership, permissions, and dates, separately. |
| Transparency | With rsnapshot, all versions of the backup are accessible as plain files. With rdiff-backup, only the current backup is accessible as plain files; previous versions are stored as rdiff deltas. |
| Backup levels | rsnapshot supports multiple levels of backup, such as monthly, weekly, and daily. rdiff-backup can only delete snapshots older than a given date; it cannot delete snapshots between two dates. |
| Community | Based on the number of responses to my post on the mailing lists (rsnapshot: 6, rdiff-backup: 0), rsnapshot has a more active community. |
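To make the transparency and backup-level rows concrete, here is roughly what restoring a single file looks like with each tool (the paths are examples, not from my actual setup). With rsnapshot, old versions are plain files, so a restore is just a copy; with rdiff-backup, you go through the tool:

```
# rsnapshot: every snapshot is a browsable directory tree
cp /mnt/backup/snapshots/daily.3/localhost/home/user/file.txt ~/file.txt

# rdiff-backup: ask the repository for the file as it was 3 days ago
rdiff-backup -r 3D /mnt/backup/home/user/file.txt ~/file.txt
```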
Reference: Chapter 7 of *Backup & Recovery* (O'Reilly, 2007)
Comments
Hi, nice post. I'd be interested to read more about how you've set up version control on your home dir (could make a nice blog post? :)).
I haven't figured out a manageable way yet (using bzr) - there are always lots of new files appearing that have to go to your ignore-list, or something like that...
Thanks! YC
yungchin,
thanks. i am still refining the process of how to use the version control. i plan to post about it sometime soon. please let me know what solution you go with as well.
I have been using rdiff-backup for years to make backups over ssh, and I'm happy with it.
We used rsnapshot on our server to make a backup over ssh, but it took a lot of CPU. We also had a crazy misconfiguration: we wanted to stop the hourly backup because of the CPU load, but when you suspend the hourly backup, it will not make a daily one either... it just stopped making any backups at all!
We only noticed it by accident half a year later, but luckily we did not lose any data!
manuel, Thanks for the notes on your experiences.
Note on rdiff-backup, for transparency, there is a FUSE implementation called archfs that allows you to mount all snapshots in a read-only filesystem. It's a little rough around the edges, but I've been impressed now that I've got it to work.
It occurs to me that it should be possible to run rdiff-backup against the fuse-exposed snapshots, in order, skipping the ones you don't want. Haven't tried it, but it would be an interesting experiment...
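A very rough sketch of that experiment might look like the following. I haven't verified the archfs invocation or its mount layout, so treat every line here as an assumption to check against the archfs docs:

```
# HYPOTHETICAL: assumes archfs exposes one directory per snapshot
archfs /mnt/backup/rdiff-repo /mnt/snapshots
for snap in /mnt/snapshots/*; do
    # re-run rdiff-backup against each exposed snapshot in order,
    # simply skipping the snapshots you want to drop
    rdiff-backup "$snap" /mnt/backup/pruned-repo
done
```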
Chris, thanks a lot for the information. Please feel free to leave a link to your notes if you've posted any.
We've been using rdiff-backup for a few months. Our requirements are for a cross-platform, cheap solution, so anything that required hard-link support couldn't be considered. Personally, I'd wipe all the MS-Windows server machines, but our customers would be mad. We're mostly *NIX for production systems, with a few Windows dev VMs.
In that time, we've recovered from "oops" moments twice. Once was during an email system upgrade that couldn't be completed and the other was while screwing with an Alfresco system permissions model that failed in the end. Backups rock, but folks here already know that.
Has anyone solved the remote Windows backup issue in a good free way that can actually be restored without loss of user and group permissions?
rdiff-backup is mostly good, but there are a few problems. Large file differencing doesn't work in our experience. If it doesn't crash, you'll get a completely new copy. We tried backing up complete Xen image files this way. We got around this issue by mounting the IMG files on the host and pushing the rdiff-backup to another system on the same LAN. Complete system backups are 2-3 minutes now.
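In outline, the workaround looks something like this (the image path and host are examples, not our real ones):

```
# mount the raw disk image read-only on the host...
mount -o loop,ro /var/lib/xen/images/vm1.img /mnt/vm1
# ...so rdiff-backup diffs the files inside it, not the whole image
rdiff-backup /mnt/vm1 backuphost::/srv/backups/vm1
umount /mnt/vm1
```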
That works for Xen, but not VirtualBox with a Vista-64 host. Any ideas?
johnP, Thank you for adding your experience with rdiff-backup. I'm sorry I can't be of any help to you. Maybe another visitor will have ideas.
I've been using rdiff-backup for my home system since March 2008 and am overall happy with it. I've had to restore single files a few times, and though that procedure is not _very_ straightforward, it's been easy enough to figure out how to do it the few times I've needed it. The FUSE file system seems interesting.
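For anyone else figuring it out, the single-file restore looks something like this (the paths are examples, not my real ones):

```
# list the increments available in the repository
rdiff-backup --list-increments /mnt/backup/home
# restore one file as it existed 10 days ago
rdiff-backup -r 10D /mnt/backup/home/some/file.txt /tmp/file.txt
```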
My biggest gripe with rdiff-backup is that backing up large files like VirtualBox disk images takes a very long time. Since there is no progress indication, and I do backups manually (when I see a need for it), this can be a bit frustrating. But I haven't experienced any problems like johnP describes, and the resulting increment files appear to be reasonably small.
Jakob, thanks for adding your experience also. Good to hear rdiff-backup makes you happy.
I have been using rdiff-backup for a long time and it seems to be working well.
I have been using rdiff-backup for a couple of years now, with just a few restore operations, and so far I have been happy with it.
Recently I did a small trial on changing the metadata of media files, such as comments on JPEGs or ID3 tags on MP3 or OGG files. Since rdiff-backup stores only diffs, even for binary files, I expected the diffs to be tiny. However, the diffs I got so far are roughly half the size of the original file when I changed only a few characters.
The background for this is the question of whether media data should get an rsync or an rdiff-backup backup. Currently I'm doing rsync on my pictures (jpg) and audio files (ogg). My plan was to change this to rdiff-backup, basically to be safe against deletion of the files. This would be a perfect solution if rdiff-backup found the real differences in the file, e.g. the changed metadata. Otherwise it will clutter my backup system with a lot of unnecessary data. So for the time being I will keep rsync for media data and rdiff-backup for all the other files.
hi, thanks for the write-up. i think i'll be going with rsnapshot for my gentoo server backup strategy. is it possible to get an rss feed for the comments of this post?
Leho Kraav: thanks, a comments feed is a good idea. i will try to add it this weekend.
UPDATE: I've created the comments feed. See above for the feed link.
is there a better way to test the feed than giving kudos to the author!? i think not!
To Leho or anyone else who subscribed to the Atom feed: Sorry about the spam comment. I forgot to filter the spam from the feed. It should be fixed now.
I'd just finished setting up rsnapshot and then read about some glowing recommendations for rdiff-backup.
Thank you for your clear breakdown... much time saved :) on what would have been a wild-goose chase, considering I just need a traditional backup setup.
http://old.nabble.com/Differences-between-rsnapshot-and-rdiff-backup-td15282022.html
Did you write this too? This guy references the O'Reilly "Backup & Recovery" book.
Richard: yes, that was my message to the mailing list, before I posted this on my blog. Yeah I forgot to include that I used that book as a reference. I updated the post above to include the book reference and links to the discussions on the two mailing lists.
I have been using rdiff-backup for over 3 years. It works well for etc, var, and home, and I have restored home files because of user error a number of times; it works well, but don't expect an average user to do it.
I came up with, and use, a hybrid solution with rsnapshot for these PITA cases. I use rdiff-backup to take daily snapshots on the same host to a different disk, which are then rsync'd to a server (I know rdiff-backup can do this directly, but trial and sweat led me to this choice).
I then use rsnapshot for hourly and daily snapshots and let the users have at it. When 'they' do something and lose their 'most needed' file, they just look in the rsnapshot 'copies' and pull out the one they need. If they have to go back further than a week (which almost never happens), then I can dig it out of the rdiff-backup 'copies' and dump it on their host.
Yes, it takes a little extra space to have the extra 'copy', but for ease of use, this saves (my) time and effort. And just for fun, I still make quarterly tarballs in case doom sets in =).
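In cron terms, the schedule above looks roughly like this (the times and paths are placeholders, not my actual ones):

```
# daily rdiff-backup to a second disk, then mirror the repo off-host
30 1 * * *	root	rdiff-backup /home /mnt/disk2/rdiff/home
45 1 * * *	root	rsync -a /mnt/disk2/rdiff/ backupserver:/srv/rdiff/

# hourly and daily rsnapshot for the users' self-service restores
0 * * * *	root	/usr/bin/rsnapshot hourly
30 3 * * *	root	/usr/bin/rsnapshot daily
```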
I would love a hybrid of these two excellent programs, but can Python and Perl really be friends... What's really needed is a file manager plugin to display the snapshots in a right-click context menu for a given file or directory.
Thanks for the excellent writeup. While habitually following backup solutions, I find it pretty confusing to sort out the pros and cons of the many options.
I've pretty well settled on rdiff-backup, as I run five OSes on the same box and chronological backups are not very meaningful for me.
I would note that newbies to command-line backup probably want to prefix their command with 'time' to obtain elapsed-time data. Also, you really need to figure out the right parameters for backup AND restore (test it!), then incorporate them into a script; otherwise, there is a great chance you will forget the parameters! One other recommendation: do a bare-metal backup with something like Clonezilla. You do not need to change it unless you modify your partitions, but it's great peace of mind having redundancy to recover from a disaster!
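A minimal wrapper along those lines, assuming rdiff-backup and made-up paths; the point is just that the tested parameters live in a script instead of your memory:

```
#!/bin/sh
# backup.sh -- keep the known-good parameters in one place
set -e
time rdiff-backup --print-statistics /home /mnt/backup/home

# the matching (tested!) restore looks like:
#   time rdiff-backup -r 3D /mnt/backup/home/some/file ./some/file
```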
Howard: An additional argument for CLI vs. GUI: I have had good luck using Ctrl-C to kill CLI backups, and not such good luck with GUIs (Lucky Backup being one of the exceptions).
Hey there,
I'm working on a disk-to-disk backup utility, called hdb (hard drive backup):
http://www.subspacefield.org/security/hdb/
I link to a variety of related tools, including duplicity, which may interest you.
I'm a security weenie, and you can tell from my choice of technologies (SHA-512 hashes and encrypted backup media).
If you're interested at all in high-speed disk-to-disk backups, then you might want to check out the tool and mlist. It's in alpha right now - so you can easily make suggestions that will affect its fate - but progressing rapidly.
Written 100% in pure ruby... no modules, nothing to install, just plop the script in ~/bin and you're done.
Thanks for the explanation. I've heard several recommendations for both of these and never was clear on their differences.
Has anyone looked at bigsync (C, http://code.google.com/p/bigsync/), BigSync (Perl, http://stefan.hoefer.ch/projects/bigsync/5-bigsync-an-overview), or xdelta (http://freshmeat.net/projects/xdelta/) for efficient large-file backups?
I'm still looking for a cross platform, script-able way to deal with virtual machine image files.
We're still using rdiff-backup here. Complete VM backups take only about 2-3 minutes per day, but changes in our VM technology are forcing a revisit. I'm doing performance testing of some alternate solutions now and will post the results to my blog when they are complete.
Nice comparison. After reading your blog, I think I'm now much more inclined to use rdiff-backup as my preferred backup software.
For folks interested in a GUI, Back-In-Time seems to be rsnapshot with a GUI. Deployed this on Mom's Linux machine and it is working very nicely with hourly, daily, weekly and monthly snapshots. Nice if you are limited to *NIX file systems. The automatic "smart" snapshot management of Back-In-Time is nice.
Duplicati is a GUI for Duplicity. I found Duplicati to be extremely slow for both Full and incremental backups. That's an understatement, actually. 8.5 hours for a 100GB backup seems excessive to me.
The only GUI I've seen for rdiff-backup is a web server-based solution. Ew.
And in limited tests, rdiff-backup from Win64 to Win64 (both TrueCrypt-encrypted disk partitions) with 18GB virtual machines does only store the differences involved. TrueCrypt isn't really important to the tests, since both partitions were mounted to drive letters during the test.
Anyway, I wrote up my experiences with Duplicati vs Rdiff-backup in a blog entry.
I liked the hard-link-based implementation of rsnapshot for independent backups that would be simpler to recover. However, I ended up rejecting it because it is based on the seriously flawed include/exclude rules of rsync (sensitivity to trailing '/'s, etc., IIRC), which are confusing, easy to get wrong, and often difficult to get right even when you think you understand them. For compatibility reasons, this design flaw in rsync will never be fixed.
In contrast, the author of rdiff-backup, using librsync rather than rsync itself, was free to implement a different, more straightforward approach to specifying files to be included or excluded, and did so.
This was for me a deciding factor, and I think it deserves a place in your list.
The only minor gripe I have with rdiff-backup is that it's rather slow, but that's not a major issue if you can leave it running, e.g., at lunchtime.
As an update to my last comment, I just noticed that another Mike made the same point about ease of use at http://www.mail-archive.com/[email protected]/msg02917.html
"Unlike rsync, rdiff-backup was written originally for backups. It has sensible defaults (so no need for the -av -delete -e > ssh options) and fewer quirks (for instance, no distinction between
<destination>
,<destination>/
, and<destination>/.
)."
A subsequent commenter pointed out the same approach I ended up using to cope with the fact that intermediate rdiff-backup increments cannot be deleted (i.e., you have to keep all the incremental backups back to the oldest one you need to keep): use separate repositories for short- and long-term backups, as sketched below.
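A sketch of that two-repository approach, with made-up paths and retention periods:

```
# short-term repo: run daily, prune anything older than two weeks
rdiff-backup /home /mnt/backup/short/home
rdiff-backup --remove-older-than 2W /mnt/backup/short/home

# long-term repo: run weekly, prune anything older than a year
rdiff-backup /home /mnt/backup/long/home
rdiff-backup --remove-older-than 1Y /mnt/backup/long/home
```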
I would still prefer an rdiff-backup implemented using hard links, but would not use rsnapshot and its crufty, embrangled rsync interface.
Great summary -- thank you. Had I found your writeup earlier, I would have made my choice faster. I came to the same conclusion for the same reasons and am liking rsnapshot. rdiff-backup seemed better in most ways, but the inability to preserve my oldest backups (when space becomes tight) was a deal breaker.
hey, I'm evaluating buying a new Linux fileserver for home. I have been using dirvish for years and it works fine for me. Your observations about dirvish ring a bell: it's a useful, well-done, and easygoing tool, but a bit awkward to set up. The new hardware options I'm evaluating (Synology DS212j or Netgear ReadyNAS Duo) seem to play slightly better with rsnapshot.
conclusion: your post showed up on the first page of Google for "dirvish rsnapshot", and your summary was excellent thinking I can build upon! thanks!
__I will go for rsnapshot__.
(and for home server hardware: __Synology__. There are cookbooks for getting rsnapshot running on it, but as it's just a Perl script piggybacking on rsync, it should run on anything Linux. I was looking for a NAS and backup box for less than $200, and it has a 15-watt power drain with a fairly decent Linux on top, so Synology floats my boat.)
and "storebackup"? anyone use it?
I use rsnapshot too and I love it. I do like the fact that rdiff-backup makes deltas of files, and I am thinking that in the near future I might implement both (since I like the daily, monthly, etc. levels).
My idea is:
use rdiff-backup to pull in the files, then run rsnapshot on the output directory of the rdiff-backup.
Deploying these backups back out in the case of disaster recovery might be a small pain... but not too hard for a small shell script wrapper to handle.
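A sketch of that idea with made-up paths; rsnapshot's config would point a backup line at the rdiff-backup destination directory:

```
# pull the files (plus their delta history) from the remote host
rdiff-backup user@remotehost::/home /mnt/backup/rdiff/home

# then rotate hard-linked snapshots of that directory, given
#   backup	/mnt/backup/rdiff/	localhost/
# in rsnapshot.conf
rsnapshot daily
```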
Thanks for the cool review.
I ran rdiff-backup for years, but the requirement that both the server and the client have to run compatible versions of rdiff-backup proved to be just too frustrating.
I have moved to rsnapshot because it has no such requirement. I have also come to prefer the very transparent hard-link approach that rsnapshot uses.
Long read, but worth it. I started out with rdiff-backup, but as I'm planning on setting up a solution that can backup my entire home dir (~500 GB and growing), speed IS a factor for me.
Also, the fact that I'll be able to delete individual snapshots is a big plus for me. (It seems I'll have to exclude virtual machines, though, as I imagine they would make the backups grow very large - but that's not a deal breaker for me, as I usually use VMs only for Windows testing.)
Setup: several Linux computers (Ubuntu 10.04 LTS) running in a home office / family home, with Windows running only in VirtualBox VMs. A notebook with a 1 TB hard disk is the primary work machine, often out of the office at customers' sites, hence the need for offline collaboration. The server is now 7 years old but still sufficient, with 1TB data disks and 2TB hot-swap backup disks. Drawback: data consumption EXPLODES whenever directories are renamed.
I have been using Mercurial, Unison, rdiff-backup, and tar for 5+ years now (since I abandoned Windows), and I am VERY pleased with all of these solutions.
Before rdiff-backup, I had ALL my personal data (source code, scans, letters, projects, spreadsheets, pictures, photos, measurement data files...) in yearly folders (e.g. "2007-thomas", "2008-thomas", ...). Each of those is a Mercurial "project" (so "2007-thomas/.hg/..." etc.), synchronized on demand to a server (Mercurial on Apache). This is also used to collaborate on a yearly folder across several machines: e.g., when I do work (edit documents, add scanned correspondence) for my wife Andrea, I get the folder "2013-andrea" onto my machine with hg pull and hg update, edit the documents, and when I'm done I push it back to the server; then my wife pulls it back to her computer and does a merge. This also works nicely when offline from the server, e.g. when she works at her office while I'm working with her folders at the same time.
Because the yearly folders got too big for Mercurial to handle (4 GB++), I switched to rdiff-backup for yearly folders with mass data (e.g. 2013-thomas-massdata, containing photos, family videos, large measurement data files, etc.). These are backed up on demand using rdiff-backup.
Non-personal data (software installation packages, documentation) is stored in the file system only, on the same server, with excerpts synchronized to the mobile computer using Unison only (e.g. documentation I need to take to work with me).
Everything above is backed up on the server to backup hard disks (on a rotating schedule) inserted into a hot-swap drive bay, using just UNISON. There is no need for historical incremental backups, since all version history is already maintained in the Mercurial repositories, rdiff-backup folders, and VirtualBox snapshots. An excerpt from the Unison preferences, where /srv/data is all the data to back up, /srv/bak2 is the mounted backup drive destination, and changed or deleted files are kept in /srv/bak2/data.unisonbak:

```
root = /srv/data
root = /srv/bak2/data
force = /srv/data
backuplocation = central
backupdir = /srv/bak2/data.unisonbak
```
The backup of the Linux system root partition is done with tar and rdiff-backup. All the above personal/non-personal data is NOT in the home directory; it is only soft-linked to, as it resides on another partition. The system partition is 32GB, with about 6+ GB of space used by the Linux system.

1. On the notebook, boot to a live Linux (Ubuntu 10.04) on a USB stick or CD.
2. Back up the complete Linux root file system (mounted -o ro in /mnt/src) to an UNCOMPRESSED tar on another local partition on the same notebook (mounted in /mnt/dst):

```
tar --create --verbose --preserve --numeric-owner --one-file-system \
    --show-omitted-dirs \
    --file "$DSTDIRPATH/pb06-032ubu_rootfs_offlinebackup.tar" /mnt/src \
    | tee -a "$LOGFILEPATH"
```

3. On the server, rdiff-backup the uncompressed tar into a sysbak folder for this specific notebook:

```
#BACKUPSRCDIR='/mnt/sda8_rootfs-offlinebackup/pb06-032ubu_rootfs_offlinebackup'
BACKUPSRCDIR='[email protected]::/mnt/sda5-sysbak/pb08-006ubu_rootfs_offlinebackup'
BACKUPDSTDIR='/srv/data2/sysbak/pb08-006ubu/pb08-006ubu_rootfs_offlinebackup_rdiffbackup'
nice rdiff-backup -v5 --print-statistics --preserve-numerical-ids \
    --exclude-other-filesystems ${BACKUPSRCDIR} ${BACKUPDSTDIR}
```

Example of today's incremental backup compression: the previous (2-year-old!) tar was 8.1GB, with the rdiff-backup directory /srv/data2/sysbak/pb06-032ubu at 8.6GB; the new tar is 11.9GB, and the rdiff-backup directory is now 12.0GB.
EXCELLENT compression ratio!!!
Has anyone checked out rbackup? Its aim is to combine the simple automation features of rsnapshot with the disk benefits of rdiff-backup. I know this is an old thread, but these issues are still relevant and not definitively resolved. I set up rsnapshot as an alternative to Time Machine for my lab's Mac OS X file server a few weeks ago and have been pretty happy. Subsequent research has pointed me to rbackup as being even better, but I haven't found many reviews of it yet.
My lab is also considering using a cloud file service like Dropbox or Google Drive for Education, but I don't trust them to keep the only copies of our data. I'm considering using them as the primary network drive for non-sensitive data, and having a backup fileserver pull from those services and use rbackup to keep snapshots of it.
I saw a post on Hacker News recently that mentioned some other alternatives: https://news.ycombinator.co...
Great link, thanks!
For the record, I'm now enthralled by BorgBackup and will pay no more attention to rbackup. I found a conversation on reddit with enough reviews of other backup tools to convince me.
https://www.reddit.com/r/li...
Thanks for the Reddit link! BorgBackup sounds like the way to go. I will have to check it out also!