Sunday, July 15, 2007

Subversion to Git

I never liked CVS—the limitations and design flaws become obvious almost as soon as you start using it—but after years of use it was a familiar model that worked well enough in practice. Then came Subversion. It worked like CVS but fixed the glaring problems, providing atomic commits, the ability to move files and folders, checksumming, revision numbers that represent whole trees, and easier tagging and branching (albeit by shoe-horning them into the tree). Switching was an easy decision. I remained happy with it, especially with the more reliable FSFS backend and the new working copy format. I was aware that Subversion had some problems, but none were too bothersome, and as a solo developer I knew I didn’t need a distributed version control system.

Then I heard about Git when I happened to watch a video of its creator, Linus Torvalds. It had not occurred to me that CVS and Subversion were fundamentally broken. Torvalds is undeniably a smart guy, but he’s also known for his bluster. I’d heard Git mentioned a few times before, usually in the context of it being difficult to use, something only a kernel developer could love. So I was skeptical but interested enough to try it out.

What I found is that Torvalds’s bragging is justified. Learning about Git after using other version control systems is somewhat like learning a new programming language that’s radically different from what you’ve known before. Even if you might not need the more unusual features most of the time, you feel as though your eyes have been opened, your mind expanded.

Beyond just the distributed model, Git’s implementation is beautiful. It stores the repository in such a logical way, using far fewer files and much less disk space than Subversion. Each file’s content, no matter its location in the tree or history, is stored only once. The repository is stored in a single .git folder at the top level of your working directory. In addition to its repository, Subversion needs a .svn folder inside each folder of the working directory. The working directory holds two extra files (one for the data and one for the properties) for each file, plus some more files and folders for each directory. Git needs none of these. It’s more efficient, and you don’t need to make sure all your tools have special handling for the invisible .svn folders.
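To make the "stored only once" point concrete, here is a minimal sketch of how Git names a file's content. It assumes the default SHA-1 object format and ignores zlib compression and pack files. The paths and the in-memory `objects` dictionary are made up for illustration and are not how Git lays things out on disk.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob: SHA-1 over a small header plus the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working tree: two paths share identical content.
tree = {
    "src/LICENSE": b"hello\n",
    "docs/LICENSE": b"hello\n",
    "README": b"something else\n",
}

# Toy object store keyed by content ID, so identical content lands in one slot.
objects = {}
for path, content in tree.items():
    objects.setdefault(git_blob_id(content), content)

print(len(tree), "paths,", len(objects), "stored objects")
```

Because the ID depends only on the content, the path and the revision never enter into it, which is why a file that is copied, moved, or left unchanged across commits costs no additional blob.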

Git’s implementation is compact. It doesn’t rely on a ton of other libraries, and the tar.bz2 file is only 1.1 MB. Compiling Git took only 15 seconds on my Mac, compared to 104 seconds for Subversion.

The user interface is currently a bit rougher than Subversion’s, but it’s not too bad, and some things are actually nicer. Unfortunately, the man pages and manual don’t build out of the box on Mac OS X, so I recommend reading them online, along with the tutorial.

In order to migrate from Subversion I needed to install LWP:

sudo perl -MCPAN -e shell
install Bundle::LWP

and a different build of Subversion that included the Perl bindings. With /usr/local/lib/svn-perl in my PERL5LIB I was then able to use the git-svnimport command, but I found that this didn’t work very well, perhaps because one of my Subversion repositories didn’t use the traditional branches/tags/trunk structure. I had much better luck with:

git-svn init -T <svn-repos> <svn-repos>
git-svn fetch

So far I’ve moved two Subversion repositories to Git. Before I move my main development repository I want to write some shell aliases to make it a bit smoother to use, and some scripts to integrate it with BBEdit.

39 Comments

I recently blogged about moving to Git too, though not in as much detail as you. I'm also checking out Mercurial at the moment.

Thanks to Mark Grimes of stateful.net for opening my eyes to this :D

I just recently left Subversion as well. I've moved to Mercurial, which is fantastic. Have you taken a look at Mercurial? How is Git different than it? Either way, distributed scm rocks.

Mercurial was my second choice, but it lacks lightweight branches and does no content hashing.

I recommend the MacPorts installation of Git:
port install git-core +svn +doc [+darwin_9 if 10.5]

Otherwise, all my projects are housed under Git (some are sizable, like MacPorts), and I've been using it every day for a solid month... finally feeling like I'm out of the romance phase, but it's such a vast improvement over SVK, which I've been using for the last few years, that I feel like my scm is working for me instead of against me -- and most of all, I'm not afraid when my branches diverge from trunk anymore.

Mercurial also looks good, but I like Git's storage system and branching better. It also seems to be faster, and I think it will have more support going forward.

Michael,

Git 1.5.3 will introduce a command called git-stash which I find very appealing for all those moments at the office I'm forced to fight a fire on another branch when I'm not ready to check my current work in as a functionally-atomic commit.

Another nice feature is the delta compression. Seeing is believing, considering that a Git repo and working copy are a single entity, unlike with svn.

Does this work from within Xcode?

> Mercurial lacks lightweight branches

I don't really see what you mean by that, Mercurial has both "full branches" (repository clones) and "named branches" (branches within a repository)

> does no content hashing.

Of course it does

Nice summary. Now if you could do a comparison with darcs... :)

Wincent Colaiuta is moving to Git, too, and he has a nice post on Torvalds vs. Bram Cohen (of BitTorrent and Codeville).

It's strange to me that hg and git seem to have so much more mindshare than bzr... While speed has historically been bzr's weak spot, it's arguably more featureful and has better cross-platform support than either (git being linux-centric and hg's reliance on gnu diff complicating bsd support). And as far as speed goes, bzr handles the average project (e.g. not the linux kernel or mozilla trees) just fine. There is a reason canonical/ubuntu uses it, and if you're curious as to why an organization of their scope would choose bzr over git or hg, I suggest giving bzr a test drive. You may be pleasantly surprised.

I've been happy with darcs and running ``darcs record'' often as I progress, and ``darcs push'' to save the changes offsite as well.

After the summer I'm thinking of importing my darcs repositories to the new version, when it's released. It supposedly will fix some bugs.

I'm open to moving to git though as needed, some time down the road.

Thanks for the post. Just to pile onto the inevitable "what about...": where you say "other version control systems" you really seem to mean "centralized version control systems". Given that a lot of your comments about the superiority of Git over Subversion would apply to almost any other DVCS, it would be interesting to know which of those you looked at and why you found them less desirable.

Given the popularity of Subversion, I consider interoperability with it a must-have feature of any distributed source control system.

I wrote a tutorial on how to use Git and Bazaar with Subversion:
http://info.wsisiz.edu.pl/~blizinsk/git-bzr.html

Two days ago I thought I'd check out Darcs. I began compiling it on my FreeBSD system (it required compilation of a bunch of dependencies as well, including the heavy, heavy GHC compiler). Went away to watch Once Upon a Time in America (it lasts 4 hours). When it finished I went back to the computer only to find that GHC was still compiling itself!

Fuck this, I thought, and went ahead with Git. It compiled in fewer seconds than I was prepared for. After looking around for ways to move from SVN to Git I was surprised to find it actually came with the necessary tools (git-svnimport), which worked great (after compiling subversion-perl). I recommend using git-svnimport with the -m flag (if you have branches) and the -A flag (check the man page or online man page). So far I'm liking it a lot. Still a lot more documentation to read through, but not even close to the entire book you need to read to learn SVN if you don't already know it.

I wasn't impressed with Subversion, but then it *was* designed to be a replacement for CVS. After using SVN for a couple of years, I gave Git a try and though it takes some time to get your head around if you've never used distributed VCSes before, it is spectacular. I've never had a complete feeling of ease with CVS or SVN -- I really didn't have much trust in them. I've had no problems with Git, it's sensible, fast, and I trust it to not screw up my sources. I just wish it integrated with Xcode.

I am using VSS. It is great.

Masklinn,

Cool! I can honestly say my knowledge of Mercurial comes from reading comparisons and not trying each one out individually. I guess Mercurial has mostly caught up to git's features. My sources of information for the Git/Mercurial shoot-out ventured a few places:

http://tytso.livejournal.com/29467.html
http://utsl.gen.nz/talks/git-svn/intro.html#hg-rulz
http://jaredrobinson.com/blog/?m=200701

I didn't spend the time to contradict the research I read on Mercurial as much as Git since I was more concerned with speed and storage model over portability (I don't have to run Windows) and Git did not introduce any showstoppers to make me venture to my second choice in detail.

But considering the sources date from only a few months ago at most, I'd say Mercurial needs to do a better job of showing off its features, because every comparison I read in mid-2007 seemed to suggest otherwise. Of course, maybe the Mercurial documentation addresses all this. Sadly, I was looking for a shorter path to picking an SCM than having to read each version control system's docs from A to Z.

@pp64 -- sounds like it may be moot for you now, but the Linux binary of Darcs works fine on FreeBSD.

Sean Kelley

Mercurial is preferable for me because it works on multiple platforms. It is also far more user friendly than GIT - especially for developers coming from Subversion. It also has Eclipse plugins.

Sean

um, surely this article should've detailed more than how it stores its metadata and how fast it compiles?

darcs is just too slow. that's why i am switching to git now.

This is one of the most pathetic articles I've ever seen. All you do is point out how fast git compiles and how it stores less metadata. These things are almost completely irrelevant in comparing versioning systems. Fail. I have no idea why this was on digg, it's a shame.

lofi and inboulder: This is not an article. It's a blog post that mentions some of the things I noticed when switching to Git, most of which I'd not seen emphasized elsewhere. There are plenty of good Web pages that compare various version control systems and that explain the benefits of distributed version control, plus the Git manual and Torvalds talk at Google that I linked to. My goal was to get interested people to look at Git, not to rehash or synthesize what's already out there.

How it stores the data is probably the most important part of a version control system. How fast it compiles is relevant as a rough metric of the simplicity, especially since compiling Subversion is a well-publicized benchmark among Mac developers.

Where I work we are stuck using Subversion because of one reason: TortoiseSVN. There are several developers who use Windows, and nothing beats Tortoise's clicky interface.

What's up with the character encoding in the git manual? I can't read any of the examples! All the spaces appear weird.

At least the SVN book is readable.

João Marcus

As Diego said, TortoiseSVN is the major reason why I choose SVN at work. The problem is, git/hg/bzr/etc developers underestimate the importance of a nice GUI.

Subversion user and FlickrExport developer Fraser Speirs looks at Git: “I use externals a fair bit in my SVN projects, and I’m not yet certain how one could replicate them in Git.”

I think he missed mentioning all the best parts of Git (e.g. content tracking, cryptographic history, delta compression), yet he did bring up the one major thing that Git lacks currently (svn:externals). He also undervalued decentralization which is common... I'm certainly interested to see his uncertainty and indifference diminish with a more thorough investigation of the toolkit -- but the great part about us being decentralized especially in community projects is how well systems like Git work behind a Subversion server. I remember when I had to fight to the bitter end for Subversion instead of CVS... now you just feel sorry for the poor souls merging trees from the Subversion client, even with svnmerge.py as it's not without limitations either.

Bill Bumgarner on Git and Subversion: “Git will win because it is about 8 bazillion times easier to use because it doesn’t scatter administrivia crap throughout your work area. This is just so fundamentally the right way to do stuff.”

More from Wincent: “Subversion is awesome, in its own way, but it ain’t going anywhere. Git, on the other hand, definitely is going somewhere, and in fact has already gotten so far ahead of Subversion that I can’t imagine Subversion ever catching up.” That’s pretty much why I switched sooner rather than later. And, of course, it was nice to free up some gigabytes on my notebook drive and reduce by a zillion the number of files that SuperDuper has to deal with.

Here’s a response from a Codeville user.

Hey Git-Guys :-),

I think it is not really important how much disk space a repository needs or how beautiful an implementation is compared to svn, because git works in a totally different way, and it depends on your development model whether it works for you. There are features in svn you'll never need in git (like access control) and there are features in git you'll never need in svn (e.g. local branching).

So if you want a single repository, you'll get an easy way to show people what the current development version of the software is, and you'll get a single point of failure.

If you want to be independent of other developers and maybe of a server, git may be your favourite, but it can be a bit more stressful to integrate patches from a lot of different people.

In my opinion, centralized version control systems work fine in a typical small or mid-size development team.

If you have a hierarchical organization and you have to give your patches to a middleman, distributed version control systems are a good way to do it.

Greetings from a stupid and ugly svn (and git) user :-)

Patrick: I'm a solo developer, so I don't really need Git's distributed features, but the reliability and efficiency of its storage model are very important to me. With a large repository (particularly one with many files) it makes a huge difference in speed. The space efficiency matters when I'm using a relatively small laptop drive.

I also worked with cvs/svn and never knew they were broken until I heard Linus's presentation at Google.

One thing that I was not expecting was how easy it is to create and use a Git repository.

I think that every programmer that uses the centralized paradigm should really give it a try.

Yes, VSS is amazing! It even integrates with Visual Studio.

Ok ok, being serious. I use git as well. It has really given me a tool with a wide application range. Need more visual tool support though.

Git definitely got many things right, but most SCM systems (and Git is the rule, not the exception) miss the most important part. While it is vital for the Linux devs to get the command-line tools up and running, you will only get wide acceptance if you get the tool integration right. That is one reason why people still use VSS despite its being utter trash, that is the reason why so many people stuck with CVS despite its flaws, and TortoiseSVN and the early Eclipse integration are the reasons why SVN has taken over from CVS. If the tool integration had not been there relatively early, SVN would still be just one of the possible CVS successors instead of having taken over from CVS.
Git is an impressive piece of technology, but having a few Tcl/Tk clients which ease some work is not enough. You need VStudio.Net integration, Eclipse integration is even more vital, and NetBeans integration, given its strong Rails support, is a must-have.
The only IDE which has some integration currently is IntelliJ....

I have been happy with subversion for about a year now and have recently discovered git. Sounds great, but to reinforce Werner's point, we heavily use the integration tools for Visual Studio and Eclipse. Without these tools git is unfortunately out of the question.

I think the people asking for Visual Studio and Eclipse integration are missing the point. Developers who work on real projects don't use IDEs. They use vim or maybe emacs. They also don't run Windows, so Windows support is irrelevant. And although I have used Eclipse and liked it, it only makes sense for Java development, which nobody could pay me enough to ever do again.

@Patrick:

actually, access control sometimes is needed, and it can be done with some scripts around git:

a) limiting repo access to certain users, even when remote git runs as a single user:
put a wrapper into .ssh/authorized_keys which checks whether up- or download is allowed

b) updating individual refs within a repo can be limited by the hooks/update script.
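A rough sketch of the two checks in (a) and (b), written as plain Python functions. Everything named here is hypothetical: the user names, the `READERS`, `WRITERS`, and `ALLOWED_REFS` tables, and the idea that the wrapper already knows which user owns the SSH key. It also assumes the client asks for `git-upload-pack` (fetch) or `git-receive-pack` (push) followed by a single repository path, and that a real setup would read that request from `SSH_ORIGINAL_COMMAND`.

```python
import fnmatch
import shlex

# Hypothetical policy tables, for illustration only.
READERS = {"alice", "bob", "carol"}   # may fetch or clone
WRITERS = {"alice", "bob"}            # may push
ALLOWED_REFS = {                      # refs each writer may update
    "alice": ["refs/heads/*", "refs/tags/*"],
    "bob": ["refs/heads/bob/*"],
}


def ssh_command_allowed(user, original_command):
    """Check (a): should the authorized_keys wrapper run this request?"""
    if not original_command:
        return False
    try:
        parts = shlex.split(original_command)
    except ValueError:  # unbalanced quotes and similar
        return False
    if len(parts) != 2:  # expect exactly: <program> <repository path>
        return False
    program = parts[0]
    if program == "git-upload-pack":    # download
        return user in READERS
    if program == "git-receive-pack":   # upload
        return user in WRITERS
    return False                        # anything else is refused


def ref_update_allowed(user, refname):
    """Check (b): may this user update this ref? Meant to be called from hooks/update."""
    patterns = ALLOWED_REFS.get(user, [])
    return any(fnmatch.fnmatchcase(refname, p) for p in patterns)
```

Git runs hooks/update once per ref with the ref name, the old object name, and the new object name as arguments, and a non-zero exit status rejects that update. A hook script built on `ref_update_allowed` would exit 1 when it returns False. How the hook learns the user is left open; passing it in an environment variable set by the wrapper is one option, but that is an assumption rather than something Git provides.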

Feel free to mail me for more info :)

One thing I didn't catch yet is limiting visibility to certain remote refs (eg. some user shall only see certain branches).
Any idea ?

cu
