Monday, May 15, 2017

A Tale of Three Git Filter Branches

Greg Hurrell (via tweet):

I used git-filter-branch to rewrite the history of the repo containing this website’s files, processing 4,980 commits and transforming 3,702 wikitext files to Markdown along the way. I wrote three separate versions: the first would have taken as long as 42 days to complete, the second perhaps 3 to 4 days, and the third and final version completed in about an hour.

[…]

That last one sure sounds the most elegant, doesn’t it? But it also obliges us to accept a reality about Git’s object database: it’s made to be blazingly fast for certain common operations (git status, git commit, etc.) but not others. For example, answering that question of “detecting when an item first entered the repository” could require you to traverse back from the current HEAD all the way back to the root commit of the repository, which could mean examining a thousands-long commit chain. And note, even if you know how Git works and seek to minimize the number of git processes that you need to fork and the number of commits that you actually need to examine (e.g. by limiting git log with a pathspec), Git’s internals will still need to traverse that thousands-long chain in the worst case.
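
To make that worst case concrete, here is a minimal sketch of the traversal as a toy model. It is not Git's implementation and not Hurrell's script: the `Commit` class, `build_chain`, `find_introduction`, and the path `wiki/Example.md` are all hypothetical names made up for illustration, and history is assumed to be a single linear chain with no merges or renames.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy stand-in for a commit: a set of paths plus one parent (hypothetical)."""
    name: str
    paths: frozenset
    # Keep the parent out of repr/eq so long chains never recurse deeply.
    parent: Optional["Commit"] = field(default=None, repr=False, compare=False)


def build_chain(length: int, path: str, added_at: int) -> Commit:
    """Build a linear history c0..c(length-1) where `path` exists from `added_at` on."""
    commit = None
    for i in range(length):
        paths = frozenset({path}) if i >= added_at else frozenset()
        commit = Commit(f"c{i}", paths, commit)
    return commit  # the newest commit, playing the role of HEAD


def find_introduction(head: Commit, path: str) -> Tuple[Optional[Commit], int]:
    """Walk back from `head` until `path` disappears or the root is reached.

    Returns the oldest commit in that unbroken run that still contains `path`,
    together with the number of commits examined along the way.
    """
    examined = 0
    introduced = None
    commit = head
    while commit is not None:
        examined += 1
        if path not in commit.paths:
            break  # the previously visited commit introduced the path
        introduced = commit
        commit = commit.parent
    return introduced, examined


if __name__ == "__main__":
    # Worst case: the file has existed since the root commit.
    head = build_chain(5000, "wiki/Example.md", added_at=0)
    commit, examined = find_introduction(head, "wiki/Example.md")
    print(commit.name, examined)
```

When the path dates back to the root commit, the walk has to visit every commit in the chain; when it was added recently, the walk stops after a handful. Running something like `git log --diff-filter=A -- <path>` saves you from forking one process per commit, but the walk itself still happens inside Git.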
