Monday, February 6, 2017

GVFS (Git Virtual File System)

Saeed Noursalehi (via Peter Steinberger):

Here at Microsoft we have teams of all shapes and sizes, and many of them are already using Git or are moving that way. For the most part, the Git client and Team Services Git repos work great for them. However, we also have a handful of teams with repos of unusual size! For example, the Windows codebase has over 3.5 million files and is over 270 GB in size. The Git client was never designed to work with repos with that many files or that much content. You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.

Even so, we are fans of Git, and we were not deterred. That’s why we’ve been working hard on a solution that allows the Git client to scale to repos of any size. Today, we’re introducing GVFS (Git Virtual File System), which virtualizes the file system beneath your repo and makes it appear as though all the files in your repo are present, but in reality only downloads a file the first time it is opened. GVFS also actively manages how much of the repo Git has to consider in operations like checkout and status, since any file that has not been hydrated can be safely ignored. And because we do this all at the file system level, your IDEs and build tools don’t need to change at all!
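To make the "only downloads a file the first time it is opened" idea concrete, here is a toy sketch in Python. It is not how GVFS is implemented (the real thing works at the file system level); the `LazyRepo` class, its method names, and the `fetch` callback are all hypothetical, invented purely for illustration.

```python
class LazyRepo:
    """Toy model of on-demand hydration; NOT GVFS's actual design."""

    def __init__(self, manifest, fetch):
        self._manifest = set(manifest)  # every path in the repo (metadata only)
        self._fetch = fetch             # hypothetical callback: path -> bytes
        self._hydrated = {}             # path -> content already downloaded

    def listdir(self):
        # Every file appears to be present, whether hydrated or not.
        return sorted(self._manifest)

    def open(self, path):
        if path not in self._manifest:
            raise FileNotFoundError(path)
        if path not in self._hydrated:
            # First access: download the content and remember it.
            self._hydrated[path] = self._fetch(path)
        return self._hydrated[path]

    def status_candidates(self):
        # Files that were never hydrated cannot have local changes,
        # so a status-like scan only needs to look at these paths.
        return sorted(self._hydrated)
```

In this model, the cost of a status-like operation scales with the number of files actually touched rather than with the total size of the repo, which is the property the quoted passage describes.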

Note that the initial comments refer to “NFS+” when they mean “HFS+.” I doubt the choice of file system is the problem here. Some of the comments are interesting, though.

Previously: Facebook Makes Mercurial Faster Than Git.

Update (2017-02-09): Brian Harry (via Hacker News):

GVFS (and the related Git optimizations) really solves 4 distinct problems[…]

[…]

Looking at the server from the client, it’s just Git. All TFS and Team Services hosted repos are just Git repos. Same protocols. Every Git client that I know of in the world works against them. You can choose to use the GVFS client or not. It’s your choice. It’s just Git. If you are happy with your repo performance, don’t use GVFS. If your repo is big and feeling slow, GVFS can save you.

Looking at the GVFS client, it’s also “just Git” with a few exceptions. It preserves all of the semantics of Git – The version graph is a Git version graph. The branching model is the Git branching model. All the normal Git commands work. For all intents and purposes you can’t tell it’s not Git. There are three exceptions.

Comments

"That sounds like a great idea!", said Clearcase in 1997.

