Using git for Backup is Asking for Pain

December 08, 2009 at 01:10 PM | Version control

git isn't a backup system.

Neither are Mercurial, Bazaar, Subversion or even (even) CVS.

Version control systems, with the possible exception of SourceSafe, are great at keeping track of code. Why is that? Because they were designed to keep track of code.

Unfortunately, though, the features of a good VCS are entirely different from, and often exactly the opposite of, the features that make a good backup system.

Take, for example, file ownership. A good VCS will, very rightly, ignore file ownership: when I check out someone else's code, I should be the owner of those files - not whatever uid originally created them. A good backup system, on the other hand, will do everything in its power to preserve file ownership: when I restore from my backups, I want /etc/shadow to be owned by root and /home/wolever/ to be owned by wolever.

And ownership is just one example - permissions†, creation and modification times, empty directories‡, hardlinks, xattrs, resource forks, … the list of details that a backup system must keep track of goes on and on.
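To make that list a little more concrete, here is a rough Python sketch of the per-file details a backup would need to record. The function name file_metadata and the choice of fields are my own illustration, not taken from any particular backup tool, and the list is far from exhaustive (no xattrs, no resource forks).

```python
import os
import stat


def file_metadata(path):
    """Collect a few of the details a backup must preserve (illustrative, not exhaustive)."""
    st = os.lstat(path)  # lstat describes a symlink itself rather than its target
    return {
        "uid": st.st_uid,                  # owning user
        "gid": st.st_gid,                  # owning group
        "mode": stat.S_IMODE(st.st_mode),  # all permission bits, including suid/sgid/sticky
        "mtime_ns": st.st_mtime_ns,        # modification time in nanoseconds
        "nlink": st.st_nlink,              # more than 1 means the file has other hardlinks
    }
```

A VCS checkout would be free to throw most of these away; a backup that loses any of them is not restoring the same file.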

In fact, there are so many things a backup system can get wrong that there is a project called Backup Bouncer, designed specifically to verify that backup scripts correctly copy all the various bits of metadata tracked by the filesystem.
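The kind of check described above can be sketched in a few lines of Python. To be clear, this is not how Backup Bouncer itself is implemented; the function name metadata_mismatches and the handful of fields compared are assumptions made purely for illustration.

```python
import os
import stat

# An illustrative subset of fields; a real checker would also cover xattrs,
# resource forks, hardlink structure and more.
FIELDS = ("st_uid", "st_gid", "st_mtime_ns")


def metadata_mismatches(original, restored):
    """Return the names of metadata fields that differ between two paths."""
    a, b = os.lstat(original), os.lstat(restored)
    diffs = [name for name in FIELDS if getattr(a, name) != getattr(b, name)]
    # Compare all permission bits (including suid/sgid/sticky), not just 'x'.
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        diffs.append("mode")
    return diffs
```

An empty list means the restored file matches the original on the fields checked, which says nothing about the fields that were not checked.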

So, please: if you value your bytes, use a real backup system, not git.

†: Most VCSs only track the 'x' bit - for backup purposes, all bits, including suid bits, must be tracked.
‡: fun fact - Mercurial and git don't track empty directories, but Bazaar does.