bup is a highly efficient file backup system based on the git packfile format, capable of fast incremental backups of virtual machine images.
It uses a rolling checksum algorithm (similar to rsync) to split large files into chunks. The most useful result of this is that you can back up huge virtual machine (VM) disk images, databases, and XML files incrementally, even though they’re typically all in one huge file, without using tons of disk space for multiple versions.
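The chunking idea can be sketched in a few lines of Python. This is a minimal illustration, not bup's actual code: the rsync-style two-part sum, the 64-byte WINDOW, the 11-bit MASK, and the name split_chunks are all assumptions chosen for the example.

```python
WINDOW = 64            # bytes covered by the rolling window (assumed value)
MASK = (1 << 11) - 1   # split when the low 11 bits are all ones (assumed value)

def split_chunks(data: bytes):
    """Split data at content-defined boundaries using a rolling checksum."""
    chunks = []
    start = 0
    s1 = s2 = 0
    window = bytearray(WINDOW)  # circular buffer of the last WINDOW bytes
    pos = 0
    for i, new in enumerate(data):
        old = window[pos]
        window[pos] = new
        pos = (pos + 1) % WINDOW
        # Roll the sums forward: add the incoming byte, drop the outgoing one.
        s1 = (s1 + new - old) & 0xFFFF
        s2 = (s2 + s1 - WINDOW * old) & 0xFFFF
        # The decision depends only on the bytes currently in the window.
        if (s2 & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because a boundary depends only on the last WINDOW bytes, inserting a byte near the start of a file changes the first chunk or so and leaves the later chunks identical, so a new version of a large file only adds the chunks that actually changed.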
It uses the packfile format from git (the open source version control system), so you can access the stored data even if you don’t like bup’s user interface.
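As a small illustration of reading the data without bup, the sketch below parses the fixed 12-byte header that begins every git packfile: the magic bytes "PACK", a version number, and an object count, all big-endian. The function name read_pack_header is hypothetical, and decoding the objects that follow the header is left out.

```python
import struct

def read_pack_header(raw: bytes):
    """Parse a git packfile header and return (version, object_count)."""
    if len(raw) < 12:
        raise ValueError("too short to be a packfile header")
    # 4-byte magic, 4-byte version, 4-byte object count, network byte order.
    magic, version, count = struct.unpack(">4sII", raw[:12])
    if magic != b"PACK":
        raise ValueError("not a git packfile")
    return version, count
```

In practice the simpler route is to point ordinary git tooling at the repository; the sketch only shows that the on-disk format is the documented git one.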
Unlike git, it writes packfiles directly (instead of having a separate garbage collection / repacking stage) so it’s fast even with gratuitously huge amounts of data. bup’s improved index formats also allow you to track far more filenames than git (millions) and keep track of far more objects (hundreds or thousands of gigabytes).
bup is overly optimistic about mmap. Right now bup just assumes that it can mmap as large a block as it likes, and that mmap will never fail.
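One way to be less optimistic is to treat mmap as an optimization and fall back to ordinary reads when it fails. The sketch below is not taken from bup; the helper name read_region and its fallback policy are assumptions made for illustration.

```python
import mmap

def read_region(path, offset, length):
    """Read length bytes at offset, preferring mmap but tolerating its failure."""
    with open(path, "rb") as f:
        try:
            # mmap offsets must be a multiple of the allocation granularity.
            base = offset - (offset % mmap.ALLOCATIONGRANULARITY)
            delta = offset - base
            with mmap.mmap(f.fileno(), delta + length,
                           access=mmap.ACCESS_READ, offset=base) as m:
                return m[delta:delta + length]
        except (OSError, ValueError, OverflowError):
            # Mapping failed (out of address space, range past end of file, ...):
            # fall back to a plain seek and read.
            f.seek(offset)
            return f.read(length)
```

Mapping only the region that is needed, rather than the whole file, also keeps each mapping small, which matters most on 32-bit systems where address space is limited.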
Because of the way the packfile system works, backups become “entangled” in weird ways and it’s not actually possible to delete one pack (corresponding approximately to one backup) without risking screwing up other backups.