Saturday, October 26, 2013

Exploring the New iWork File Formats

I think the new file format is a regression, though. I would love to know the justification for these obfuscated data files, and what advantages they bring over the previous XML-based format. I’d love to be able to tell you what advantages they bring, but they’re unreadable. This isn’t yet a problem for end users, aside from the lack of backwards compatibility, but it might be in the future.

No more documented XML format or included PDF version, which was much better for previewing than Quick Look. Note that Apple is not eating its own dog food here. The file format does not use property lists, NSKeyedArchiver, or Core Data.

Update (2013-10-27): Drew McCormack:

They have split that potentially large XML document into many small binary files. Each file can now be loaded in isolation, and this is much better for iOS. Effectively, they have built a partial-loading document format. Closer inspection shows that each slide is a separate file, so they can just load what is on the current slide, and leave the rest on disk.
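To make the partial-loading idea concrete, here is a minimal Python sketch. It is not Apple's implementation: the `LazyDocument` class, the `build_demo_document` helper, and the `slide-N.bin` file names are all hypothetical, made up for illustration. It only shows the general pattern of one file per slide, read from disk on first use.

```python
import tempfile
from pathlib import Path


class LazyDocument:
    """Hypothetical document wrapper: reads a per-slide file only when asked."""

    def __init__(self, root):
        self.root = Path(root)
        self._cache = {}  # component name -> bytes already read from disk

    def component_names(self):
        # Listing names touches directory metadata only, not file contents.
        return sorted(p.name for p in self.root.iterdir() if p.is_file())

    def load(self, name):
        # Read a component once, then serve it from the in-memory cache.
        if name not in self._cache:
            self._cache[name] = (self.root / name).read_bytes()
        return self._cache[name]

    def loaded(self):
        # Names of the components that have actually been read so far.
        return sorted(self._cache)


def build_demo_document(slide_count=3):
    """Writes a throwaway directory with one made-up file per slide."""
    root = Path(tempfile.mkdtemp())
    for i in range(1, slide_count + 1):
        (root / f"slide-{i}.bin").write_bytes(f"slide {i} payload".encode())
    return root
```

Opening a document this way costs a directory listing; showing slide 2 reads only `slide-2.bin`, and the other slides stay on disk until something asks for them.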

This makes a lot of sense except that Core Data is already a reasonably compact, partial-loading document format. It has efficient syncing support as a built-in feature, and the underlying SQLite format is robust and open. On paper, Core Data is what an app should use in this case. Yet the iWork team apparently had so little confidence in Core Data (or perhaps the iCloud portion) that they invented a whole new binary file format.

Update (2013-10-29): Drew McCormack:

Apple is apparently using Google’s Protocol Buffers for iWork’s file format.

Update (2013-11-08): Sean Patrick O’Brien has an in-depth look at the new file format:

Components are serialized into .iwa (iWork Archive) files, a custom format consisting of a Protobuf stream wrapped in a Snappy stream.
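For a feel of what the Protobuf layer looks like once the compression is peeled away, here is a rough Python sketch of the generic Protocol Buffers wire encoding. It assumes the bytes have already been decompressed, does not attempt the Snappy step, and knows nothing about iWork's actual message schema. The function names `read_varint` and `read_fields` are my own, and only two wire types (varint and length-delimited) are handled.

```python
def read_varint(data, pos=0):
    """Decodes one base-128 varint; returns (value, next_position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        # The low 7 bits carry data, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the varint.
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(data):
    """Yields (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(data):
        # Each field starts with a varint key: (field_number << 3) | wire_type.
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(data, pos)
            value = bytes(data[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, wire_type, value
```

For example, `list(read_fields(bytes.fromhex("089601")))` gives `[(1, 0, 150)]`: field 1, wire type 0, value 150. Without the schema you get field numbers and raw values but no names, which is a large part of why the files feel opaque.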

Core Data iWork Keynote Mac Mac App Numbers.app Pages.app

6 Comments RSS · Twitter

Michael Tsai - Blog - Numbers ’13 Performance

October 27, 2013 7:24 PM

[...] improvements. After all, iOS devices have less RAM and slower processors. In theory, iWork’s new binary file format should also be smaller and faster than [...]

has

October 28, 2013 10:47 AM

"They have split that potentially large XML document into many small binary files."

Apple could also have split it into many small XML files, reducing read times while keeping it in a reasonably self-descriptive format that third-parties can read and write with relative ease. Or they could provide public developer documentation describing the binary format. Say what you like about MS, but they provide extensive public documentation for their Office file formats.

User lock-in strikes me as a likely explanation. Developer obtuseness, stupidity and/or sheer lack of resources to produce anything better might be another, but I seriously doubt anyone working on high-profile tent-pole products like iOS or iWork makes these kinds of decisions without the top brass's say-so. Look at the way Apple's business is moving, and it's clear they don't see software as a significant revenue stream in itself. Rather, the money is in selling the hardware to run those apps along with the services to organize/store/control the user's resulting data. Google's business model is much the same: give away the product, winning huge market share and killing the competition in the process, and monetize the crap out of collecting and controlling the user's information and data.

Giving away iWork 13 effectively kills any chance of other vendors competing against Apple in the consumer end of the productivity app market. Sure it hands the semi-pro and pro end to MS Office, but Apple won't hesitate to dump small numbers of high-cost, low-profit customers in order to win large numbers of low-cost, high-profit ones in return**. Make it incredibly easy for those users to put all their data into iWork and iCloud and rather less easy to extract it out again, and all they have to do then is sit on their asses all day collecting revenue as users travel along their proprietary iToll roads.

(**I remember reading somewhere that Steve Jobs had considered killing Apple's pro market entirely due to it providing such poor ROI and growth potential compared to the consumer market. In the end he decided to keep it around, but only because it continued to provide some useful value to Apple as a shiny high-end status symbol, not because they give a crap about pro users themselves.)

has

October 28, 2013 11:06 AM

As to why they're not using Core Data, CD was designed to manage and serialize an object graph stored and accessed by a single client on the local system only. Their attempts to make CD operate across a highly distributed, multi-client network were laughable at best: you can't take an existing single-user architecture like CD and simply "scale it up"; network programming requires a whole different philosophy and approach right from the ground up.

I suppose if you're in a charitable mood, you might consider that iWork 13's binary format may just be a quick-n-dirty stop-gap measure for local storage while the iWork developers are busy figuring out the real challenges of making distributed replication, synchronization and storage work properly. In which case, they may eventually get around to replacing the closed unstable binary format with an open stable format that third-parties can safely work with as well. (Much as they keep newly added frameworks private until they've had a few years to knock out all the kinks, after which they can safely document the final stable APIs and make them public.)

But hey, this is the internet, so I think I prefer the "Apple Wants to Control Your Data" version much more. :)

Michael Tsai

October 28, 2013 11:11 AM

@has Indeed, one of the interesting things about the new iWork format is that it does not seem to really be designed for syncing, either (i.e. a list of transactions). I wonder what they’re using for the back-end of iWork for iCloud to enable simultaneous editing.

Matt

October 28, 2013 11:40 AM

You might be looking at next year's Core Data upgrade.

Nicholas Riley

October 28, 2013 1:37 PM

Just take a look at your favorite Web debugging tool while doing simultaneous editing — it's just a bunch of HTTP requests which encode the changes you’re making, and corresponding persistent pull requests (what used to be called Comet) — nothing terribly special.
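The pattern described above can be sketched in a few lines of Python. This is an in-memory stand-in with no HTTP involved, and it makes no claim about how iWork for iCloud is actually built: the `ChangeFeed` class and its `post` and `pull` methods are hypothetical names. Clients post changes, and a pull waits (up to a timeout) for anything newer than the sequence number the client last saw, which is the essence of long polling.

```python
import threading


class ChangeFeed:
    """Hypothetical in-memory change log with a long-poll style pull."""

    def __init__(self):
        self._changes = []  # ordered list of (author, change) tuples
        self._cond = threading.Condition()

    def post(self, author, change):
        # Append a change and wake any waiting pulls.
        with self._cond:
            self._changes.append((author, change))
            self._cond.notify_all()
            return len(self._changes)  # sequence number of this change

    def pull(self, since, timeout=0.0):
        # Wait until something newer than `since` exists or the timeout expires,
        # then return the new changes plus the cursor for the next pull.
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > since, timeout=timeout)
            return list(self._changes[since:]), len(self._changes)
```

A client would call `pull(cursor, timeout=30)` in a loop, applying whatever comes back and keeping the returned cursor for the next request. A real service would also have to resolve conflicting edits, which this sketch ignores.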
