A Tale of Three Git Filter Branches
Greg Hurrell (via tweet):
I used git-filter-branch to rewrite the history of the repo containing this website’s files, processing 4,980 commits and transforming 3,702 wikitext files to Markdown along the way. I wrote three separate versions: the first would have taken as long as 42 days to complete, the second perhaps 3 to 4 days, and the third and final version completed in about an hour. […]
That last one sure sounds the most elegant, doesn’t it? But it also obliges us to accept a reality about Git’s object database: it’s made to be blazingly fast for certain common operations (git status, git commit, etc.) but not others. For example, answering that question of “detecting when an item first entered the repository” could require you to traverse from the current HEAD all the way back to the root commit of the repository, which could mean examining a thousands-long commit chain. And note, even if you know how Git works and seek to minimize the number of git processes that you need to fork and the number of commits that you actually need to examine (e.g. by limiting git log with a pathspec), Git’s internals will still need to traverse that thousands-long chain in the worst case.
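
To make the cost of that traversal concrete, here is a minimal sketch in Python using a toy model of a linear history. It is not Git's implementation and does not call Git; the names Commit, build_chain, and first_commit_containing, and the path wiki/Example.md, are hypothetical and exist only for illustration. Each toy commit holds the set of paths in its snapshot, and the search counts how many commits it has to visit.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True, eq=False)
class Commit:
    """Toy commit: an identifier, one parent, and the paths in its snapshot."""
    ident: str
    parent: Optional["Commit"] = field(repr=False)
    paths: FrozenSet[str]


def build_chain(length: int, path: str, added_at: int) -> Optional[Commit]:
    """Build a linear history of `length` commits; `path` exists from index `added_at` on."""
    commit: Optional[Commit] = None
    for i in range(length):
        paths = frozenset({path}) if i >= added_at else frozenset()
        commit = Commit(f"c{i}", commit, paths)
    return commit  # the newest commit, i.e. the toy HEAD


def first_commit_containing(
    head: Optional[Commit], path: str
) -> Tuple[Optional[Commit], int]:
    """Return the oldest commit whose snapshot contains `path`, plus commits examined.

    The walk cannot stop at the first gap, because a path may have been
    deleted and re-added, so it continues to the root commit.
    """
    examined = 0
    oldest: Optional[Commit] = None
    commit = head
    while commit is not None:
        examined += 1
        if path in commit.paths:
            oldest = commit
        commit = commit.parent
    return oldest, examined


if __name__ == "__main__":
    head = build_chain(5000, "wiki/Example.md", added_at=3)
    oldest, examined = first_commit_containing(head, "wiki/Example.md")
    print(oldest.ident, examined)
```

In this model the work grows with the length of the chain no matter how early the path appeared, which is the worst case described above. Repeating such a walk once per file, for every commit being rewritten, is the kind of cost that multiplies quickly.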