Archive for July 16, 2015

Thursday, July 16, 2015

Unpacking Git Packfiles

Aditya Mukerjee (via Hacker News):

I’ll count packfiles as the third strategy that Git uses to reduce disk space usage, even though packfiles were really created to reduce network usage (and increase network performance). It’s helpful to keep this in mind because the design of Git’s packfiles was informed by the goal of making network usage easy. Reducing the disk space needed is a pleasant side effect.

[…]

The packfile starts with 12 bytes of meta-information and ends with a 20-byte checksum, all of which we can use to verify our results. The first four bytes spell “PACK” and the next four bytes contain the version number – in our case, [0, 0, 0, 2]. The next four bytes tell us the number of objects contained in the pack. Therefore, a single packfile cannot contain more than 2^32 objects, although a single repository may contain multiple packfiles. The final 20 bytes of the file are a SHA-1 checksum of all the previous data in the file.
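The header and trailer layout described above can be checked in a few lines. This is a minimal Python sketch, not code from the quoted article: the function name `read_pack_header` is hypothetical, and the sample pack is an empty one built in memory rather than a real repository file.

```python
import hashlib
import struct


def read_pack_header(data: bytes):
    """Return (version, object_count) after checking the magic and trailer."""
    if len(data) < 32:  # 12-byte header + 20-byte SHA-1 trailer
        raise ValueError("too short to be a packfile")
    # Four ASCII bytes, then two big-endian 32-bit integers.
    magic, version, count = struct.unpack(">4sII", data[:12])
    if magic != b"PACK":
        raise ValueError("missing PACK signature")
    # The final 20 bytes are a SHA-1 of everything that precedes them.
    if hashlib.sha1(data[:-20]).digest() != data[-20:]:
        raise ValueError("checksum mismatch")
    return version, count


# Build a tiny, empty version-2 pack in memory to exercise the function.
body = b"PACK" + struct.pack(">II", 2, 0)
pack = body + hashlib.sha1(body).digest()
print(read_pack_header(pack))  # (2, 0)
```

Because the object count is a 32-bit field, it is the header itself that imposes the per-pack object limit mentioned in the quote.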

The heart of the packfile is a series of data chunks, with some metainformation preceding each one. This is where things get interesting! The metainformation is formatted slightly differently depending on whether the data chunk that comes after it is deltified or not. In both cases, they begin by telling us the size of the object that the packfile contains. This size is encoded as a variable-length integer with a special format.
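The excerpt does not spell out the “special format”, so the sketch below follows Git’s own pack format documentation as I understand it, not the quoted text: the first byte carries a continuation bit, a three-bit object type, and the low four bits of the size, and each following byte contributes seven more size bits, least significant first. The function name `read_object_header` is hypothetical.

```python
def read_object_header(data: bytes, offset: int = 0):
    """Return (type_code, size, next_offset) for the entry starting at offset."""
    byte = data[offset]
    offset += 1
    type_code = (byte >> 4) & 0x07  # bits 6-4: object type
    size = byte & 0x0F              # bits 3-0: lowest four bits of the size
    shift = 4
    while byte & 0x80:              # high bit set: another size byte follows
        byte = data[offset]
        offset += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return type_code, size, offset


# 0x95 = 1 001 0101: continue, type 1, low size bits 5; 0x0A adds 10 << 4.
print(read_object_header(bytes([0x95, 0x0A])))  # (1, 165, 2)
```

The size decoded here is that of the object once inflated, which is why a parser that skips the index still has to decompress each entry to find where the next one starts.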

[…]

While it’s possible to work around the aforementioned buffering issues and parse a packfile without ever reading the IDX file, the index makes it a lot easier. Like the packfile, a version 2 index file starts with a header, though the index file header is only eight bytes instead of 12. […] After the header, we encounter what Git calls a fanout table.
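For a rough idea of how the eight-byte header and the fanout table fit together, here is a Python sketch. The details the excerpt elides are taken from Git’s pack format documentation, not from the quote: a four-byte magic of `\377tOc`, a four-byte version, then 256 big-endian 32-bit counts where entry N is the number of objects whose ID begins with a byte less than or equal to N. The names `read_idx_fanout` and `fanout_range` are hypothetical, and the sample index is synthetic.

```python
import struct

IDX_MAGIC = b"\xfftOc"  # version 2 index signature


def read_idx_fanout(data: bytes):
    """Validate the 8-byte header and return the 256 fanout counts."""
    magic, version = struct.unpack(">4sI", data[:8])
    if magic != IDX_MAGIC or version != 2:
        raise ValueError("not a version 2 index")
    return struct.unpack(">256I", data[8:8 + 256 * 4])


def fanout_range(fanout, first_byte: int):
    """Return the [lo, hi) slice of sorted IDs that start with first_byte."""
    lo = fanout[first_byte - 1] if first_byte > 0 else 0
    return lo, fanout[first_byte]


# Synthetic table: one ID starting with 0x00 and two starting with 0x02.
counts = [1, 1] + [3] * 254
idx = IDX_MAGIC + struct.pack(">I", 2) + struct.pack(">256I", *counts)
fanout = read_idx_fanout(idx)
print(fanout_range(fanout, 0x02))  # (1, 3)
print(fanout[255])                 # 3
```

The last fanout entry doubles as the total object count, and the range returned for a given first byte narrows the binary search over the sorted object IDs that follow the table.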

Aditya Mukerjee:

I discovered this while working on a clean-room implementation of Git in pure Go. While there are a lot of references to packfiles online, surprisingly, the actual format of packfiles was rather underdocumented. Most resources just mention that they exist, and describe how to use git verify-pack to inspect a packfile, without explaining how to parse packfiles and apply deltas.

I decided to write this up to save others the trouble of having to reverse-engineer it from scratch!

Obergefell v. Hodges: the Database Engineering Perspective

Sam Hughes (epic 2008 post):

Altering your database schema to accommodate gay marriage can be easy or difficult depending on how smart you were when you originally set up your system to accommodate heterosexuality only. Let’s begin.

[…]

No matter how advanced and flexible your table structure, it will always be possible to create data which cannot fit into it. At that time, you will need to change your database. And the longer it’s been since you did, the less pleasant that’s going to be.

The lesson is not “prepare for every possible eventuality”. The lesson is to become comfortable and confident in modifying your schemata without losing data, and rolling back botched changes. Do this regularly, so that it becomes second nature. The lesson is to get used to change.

And what is true of our databases is also true of our world views. The future is vast and humans are creative. Things are going to happen which nobody could predict.

Sam Hughes (2015 update):

To investigate the specific ramifications of today’s ruling, however, here’s the schema we’re probably starting with:

[…]

Already the constraints on a schema like this are quite complicated. husband_id and wife_id are both foreign keys for column people.id. Check constraints ensure that the value of marriages.husband_id always points to a people row with gender set to “male” and the value of marriages.wife_id always points to a row with gender set to “female”. (Exactly how the gender column should be structured is outside the scope of this essay, but the values “male” and “female”, at least, should be available. Structuring the name column is even further out of scope, because yikes.) divorce_date is nullable. Probably there ought to be another check constraint which ensures that divorce_date doesn’t come before marriage_date.

It might be required to incorporate some sort of check for duplicate combinations of husband_id and wife_id… but then again, this could make it impossible for a couple to e.g. marry in 1994, separate in 2009 and then remarry in 2015.

[…]

But the more interesting thing is that you just incidentally let in a whole bunch of edge cases. Up until now, it wasn’t possible for an individual to marry themself. Now it is, and you need a new check constraint to ensure that partner_1_id and partner_2_id are different. Regardless of concerns about duplicate rows/couples remarrying, you also now have to contend with swapped partners: Alice marries Eve, and also Eve marries Alice, resulting in two rows recording the same marriage. This can typically be prevented by ensuring that partner_2_id is greater than partner_1_id, which would incidentally also prevent self-marriage as described above. Note that this could in turn invalidate previously-existing heterosexual marriages where the husband_id was lower than the wife_id. This constraint would have to be applied for future inserts only, or the disordered rows would need to be swapped.
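The ordering constraint is easy to try out with Python’s built-in sqlite3 module. The schema below is a guess at the shape the essay describes, not Hughes’s actual DDL: only the column names partner_1_id, partner_2_id, marriage_date and divorce_date come from the text, and the `marry` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE marriages (
    id INTEGER PRIMARY KEY,
    partner_1_id INTEGER NOT NULL REFERENCES people(id),
    partner_2_id INTEGER NOT NULL REFERENCES people(id),
    marriage_date TEXT NOT NULL,   -- ISO 8601 text, so string order is date order
    divorce_date TEXT,             -- nullable
    CHECK (partner_1_id < partner_2_id),  -- blocks swapped pairs and self-marriage
    CHECK (divorce_date IS NULL OR divorce_date >= marriage_date)
);
""")


def marry(conn, a, b, date):
    """Insert a marriage with the partner IDs put into canonical order."""
    lo, hi = sorted((a, b))
    conn.execute(
        "INSERT INTO marriages (partner_1_id, partner_2_id, marriage_date) "
        "VALUES (?, ?, ?)",
        (lo, hi, date),
    )


conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Eve")])
marry(conn, 2, 1, "2015-06-26")
print(conn.execute(
    "SELECT partner_1_id, partner_2_id FROM marriages").fetchall())  # [(1, 2)]
```

With the check in place, inserting the pair as (2, 1) directly, or marrying person 1 to themself, raises sqlite3.IntegrityError. Migrating existing rows where the old husband_id and wife_id are out of order would still need the swap the essay mentions.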

Feeder 3.0

Steve Harris:

It’s now possible to share Feeder’s library with cloud services such as iCloud Drive and Dropbox by placing the library folder in the appropriate location, thanks to a new library format — there is a new Move Library command in the Feeder menu to do that for you. Feeder automatically updates as soon as it detects a change has been made.

[…]

With this release, I have decided to stop offering Feeder for sale on the Mac App Store, so I can offer the same upgrade deal to all customers, along with the fastest updates and best service I can possibly provide.

See also: BBEdit Leaving the Mac App Store.

Update (2015-08-14): Steve Harris (comments):

Even though not all my sales go through the App Stores, Apple’s 30% cut far exceeds what I pay the UK government in Income Tax and National Insurance each year, and for that I get things like healthcare, pension, education, transport, emergency services, defence, etc. To think of it another way, if I add up all the money they’ve taken since the store’s launch in 2011, it could pay my rent for almost 7 years.

FastSpring, who process my direct sales, take 10%. They don’t promote or review the apps, host downloads and so on, but they do handle things like regional sales taxes and allow the developer to know who their customers are, process refunds, etc. Developers in business before the Mac App Store know firsthand that you can do it cheaper, with more control and flexibility AND provide better service to your customers by selling your apps yourself.

Steve Harris:

Following my post earlier in the week on how Apple’s 30% cut of all Mac App Store sales is threatening the very existence of this business, I have decided to take action and introduce some transparency to the pricing. If Apple wants a 30% markup to everything they sell through the Mac App Store, that should be obvious.

Yes, Apple does allow the Mac App Store price to be higher than the direct price.

Cmd-Number Shortcuts for Safari 9

Daniel Jalkut:

In Safari 8 and earlier, keyboard shortcuts combining the Command key and a number, e.g. Cmd-1, Cmd-2, Cmd-3, would open the corresponding bookmark bar item. So if you arranged your most-frequently-visited sites in the first few bookmark bar slots, you could easily jump to those pages by muscle memory thanks to these shortcuts.

In Safari 9, these shortcuts now switch to any open tabs you have in a Safari window. This will come as a surprise to folks who have gotten used to using Cmd-1 to quickly jump to, e.g., Google News or Yahoo Stocks.

The implicit shortcuts for bookmark bar items are still available, but you have to add the option key into the mix. So where you used to press Cmd-1, you must now press Cmd-Opt-1.