The Dissertation Behind rsync
Andrew Tridgell (PDF, via @CompSciFact):
This thesis presents efficient algorithms for internal and external parallel sorting and remote data update. The sorting algorithms approach the problem by concentrating first on highly efficient but incorrect algorithms followed by a cleanup phase that completes the sort. The remote data update algorithm, rsync, operates by exchanging block signature information followed by a simple hash search algorithm for block matching at arbitrary byte boundaries. The last chapter of the thesis examines a number of related algorithms for text compression, differencing and incremental backup.
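To make the "block signature exchange followed by a hash search at arbitrary byte boundaries" idea concrete, here is a minimal sketch in Python. It is not Tridgell's code or the rsync wire protocol. The function names (`weak_sum`, `roll`, `signatures`, `delta`, `patch`), the tiny block size, and the use of MD5 as the strong hash are all assumptions made for illustration. The weak checksum is an Adler-style pair of running sums that can be updated in constant time as the window slides one byte.

```python
import hashlib

M = 1 << 16  # modulus for the weak checksum (assumed, Adler-style)

def weak_sum(block):
    """Weak checksum of a block: (plain sum, position-weighted sum)."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b

def signatures(old, size):
    """Receiver side: weak checksum -> [(strong hash, block index)]."""
    table = {}
    for k in range(len(old) // size):  # full blocks only, for simplicity
        block = old[k * size:(k + 1) * size]
        strong = hashlib.md5(block).digest()  # stand-in strong hash
        table.setdefault(weak_sum(block), []).append((strong, k))
    return table

def delta(new, table, size):
    """Sender side: scan `new` at every byte offset for known blocks."""
    ops, literal = [], bytearray()
    i, state = 0, None
    while i + size <= len(new):
        if state is None:
            state = weak_sum(new[i:i + size])
        match = None
        if state in table:  # cheap check first, strong hash only on a hit
            strong = hashlib.md5(new[i:i + size]).digest()
            match = next((k for s, k in table[state] if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            i += size
            state = None  # recompute at the new offset
        else:
            literal.append(new[i])
            if i + size < len(new):
                state = roll(*state, new[i], new[i + size], size)
            i += 1
    literal += new[i:]  # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def patch(old, ops, size):
    """Receiver side: rebuild the new file from old blocks and literals."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * size:(arg + 1) * size] if kind == "copy" else arg
    return bytes(out)

old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the lazy dog!!"
ops = delta(new, signatures(old, 4), 4)
rebuilt = patch(old, ops, 4)
```

In this sketch only the `data` operations carry file bytes across the link; the `copy` operations are just block indices. Because the window advances one byte at a time, the block `fox ` is still found even though the edit shifted it to a different offset in the new file.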