There are two kinds of tricky bugs. The first is the masquerade party: something is going wrong, it’s clear what the problem is, but that can’t happen.
One of the great contributions of Martin Fowler’s Refactoring was that it gave a name to one of the best approaches to addressing an intractable bug. If you can’t find the problem, and localization strategies can’t quite pin it down, you can just refactor the hell out of the neighborhood. This may or may not fix the bug, but at least it improves the code. Before refactoring, bug hunts left tracks all over the program in the form of hooks and temporaries and diagnostic writes. Nowadays, bug hunts leave mown lawns and cleaner design.
I also found Debug It! to be a good basic level book. I’m not aware of any good advanced books on this topic. This seems to be the case for most areas of our field.
“Many of these issues take hours to resolve and some can permanently corrupt your account,” one top developer told me. “AppleCare has been unable to assist our customers who run into these issues.”
Many veteran developers have learned their lesson and given up on iCloud’s Core Data syncing entirely. “Ultimately, when we looked at iCloud + Core Data for [our app], it was a total no-go as nothing would have worked,” said one best-selling iPhone and Mac developer. “Some issues with iCloud Core Data are theoretically unsolvable (stemming from the fact that you’ve put an object model on top of a distributed data store) and others are just plain bugs in the implementation,” he said.
WWDC 2013 is just around the corner, and while many of iCloud’s syncing issues have been fixed, dozens of bugs remain unsquashed. So can these issues ever be solved? “[Apple’s] approach to the problem was very novel and interesting, and perhaps they will ship a version of it that works – but it functions very differently than typical sync solutions in that there is not a central server hub that maintains the ‘truth in the cloud,’” Pierce told me. “Because of this there is a lot of fragility to the implementation, and I’m not sure it will ever scale well to larger data sets,” he said.
The article insinuates that Apple doesn’t care enough to fix the problem. I don’t think that’s the case. Rather, I don’t think they know how to fix the problem. They aimed high, but ultimately bit off more than they can chew.
Opaque errors are just the beginning—developers are also frustrated with how iCloud handles a user’s data if the user chooses to turn off document and data syncing. Doing this, it turns out, completely removes a user’s locally stored iCloud data. And signing out of iCloud results in the system moving iCloud data outside of an application’s sandbox container, making it impossible for the app to use the data any longer. The assumption here is clear: you’re either using iCloud exclusively for data storage or you don’t want to use that data at all.
So when I say that there are two iClouds, I mean that there are two iClouds. One of them is used heavily inside Cupertino for its own services and the other is offered as a developer API and used only selectively for Apple’s own apps. I’m not here to say whether that’s right or wrong or fair or not or whatever, those are just the facts.
And, once you dig in, iCloud for developers is far less a completely holistic solution and much more of a loose bundle of networking protocols and systems that are unified in name only. It involves so many departments and teams inside Apple that it makes for a very fragile system.
In general, when iCloud data doesn’t synchronize correctly (and this happens, in practice, often), neither the developer nor the user has any idea why.
In some circumstances (and we haven’t been able to figure out which, yet), iCloud actually changes the object class of an item when synchronizing it.
In some cases (again, not all the time), iCloud may do one of the following:
Owner relationships in an item’s data will point to the wrong owner;
Owner items get lost in synchronization and never appear on computers other than the one on which they were created. (This leads to the item never appearing in the UI on any other machine.) When this happens, bogus relationships get created between blob items and an arbitrary unrelated owner.
When the blob data doesn’t show up, we have no choice but to wait — the application can’t display what isn’t there, and there are no mechanics by which Core Data can make a request for immediate delivery of the data from iCloud.
Core Data’s iCloud gets most of the grief when it comes to the public perception of the troubles, but document sync isn’t all oats and honey. I have had a nearly complete version of Elements ready to ship with iCloud support for several months. Why do I say nearly complete? There are bugs related to iCloud’s document syncing architecture that make me extremely uncomfortable shipping the software.
There’s also a good thread on Hacker News:
No one really fully knows WTF is going on with iCloud - documentation is basically nonexistent, support unavailable, implementation broken in trivial ways, and error messages inscrutable. […]
IMO, from having studied this for many, many hours over many moons, is that what Apple is trying to do is fundamentally hard, if not entirely unfeasible. They’re trying to replace a smart server with one that’s dumb as a doorstop. More specifically, they’re trying to emulate a CRUD web service with a file sync engine.
Conflict resolution is left up to individual clients, since the server doesn’t do any “thinking”. Likewise, there is no canonical, authoritative state of the store, since the server doesn’t “think”, only the clients do. Apple was hoping that by shoving a bunch of diffs of your database onto the server, that clients can reliably reconstruct a sane database by playing them back - except that multiple clients are updating the diffs simultaneously and there is no server-side conflict resolution.
Oh, and if stuff fails, there are no regular snapshot states to fall back to, because iCloud is a file store, not a database engine. Your whole store is now corrupt. Enjoy.
Oh, and there’s no way to nuke the database and start over - there are metadata files that are undocumented or that we don’t even have sandbox access to, that interfere with completely destroying the database.
Update (2013-03-31): Another Hacker News thread:
It is trivially easy for a combination of devices (or really, just two) to generate a combination of diffs that do not playback to a consistent database state.
In a normal world where CRUD operations are handled by a server with some understanding of the underlying data model, the server resolves conflicts as the authoritative data source, and thus consistency is maintained.
In the world of CDIS, there is no authoritative server, nor is one device ever designated as the canonical state (this would be unrealistic, since devices get retired and lost, nor is there anything special about a device that makes it canonical). Instead, conflict resolution is left to each device, the implementation of which is broken. Which is to say, your CDIS-enabled database will work fine until the moment two devices make conflicting changes. At which point it is irrecoverably corrupted.
The CDIS client’s reaction to this is to revert your local copy of the database to its last known good state and cease communicating with iCloud entirely. This error occurs silently without either notification via UI (to the user) or API (to the developer). There is no way to query this state, and as of today the only visibility into this error is via console logging.
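To make the ordering problem concrete, here is a minimal sketch of two devices replaying the same set of diffs in different orders. It is a toy model in Python, not Core Data or iCloud code: the diff format, the `apply_diff` and `replay` helpers, and the record keys are all hypothetical, and the real transaction logs are certainly more involved.

```python
# Toy model only: this is NOT Core Data or iCloud code. Each "diff" is a
# hypothetical (operation, key, value) tuple applied to a plain dict.

def apply_diff(store, diff):
    """Apply one diff to a dict-based store, in place."""
    op, key, value = diff
    if op == "set":
        store[key] = value
    elif op == "delete":
        store.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(diffs):
    """Rebuild a store by replaying diffs in the order given."""
    store = {}
    for diff in diffs:
        apply_diff(store, diff)
    return store


# Both devices start from the same shared history...
shared = [("set", "note-1", "draft")]
# ...then each makes a conflicting change while offline.
from_phone = ("set", "note-1", "edited on phone")
from_mac = ("delete", "note-1", None)

# With no server to impose an order, each device replays its own change
# first and the other device's change as it arrives.
phone_view = replay(shared + [from_phone, from_mac])
mac_view = replay(shared + [from_mac, from_phone])

print(phone_view)  # {}
print(mac_view)    # {'note-1': 'edited on phone'}
```

Both devices have seen exactly the same diffs, yet they end up with different stores. An authoritative server would avoid this by picking one order (or one winner) for everybody, which is the piece the comments above say is missing.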
Update (2013-04-03): Tom Harrington:
One day you call -addPersistentStoreWithType:configuration:URL:options:error: and it fails, with an incomprehensible error about not being able to upload (or sometimes download) a file named baseline.zip. Or else, some other equally incomprehensible error relating to undocumented internal classes like PFUbiquitySwitchboardCacheWrapper. You’ve never heard of baseline.zip, and you weren’t attempting to upload or download anything. Your app doesn’t have access to iCloud at this point, but what’s worse, you also don’t have access to your local data store, and you have no recovery path for either problem. Worst of all, you did not actually do anything wrong, and there’s nothing you can do to fix it. Try again, and better luck next time.
Update (2013-04-11): Tom Harrington:
Today I’m starting off with some things that, while not actually bugs, may catch a developer off guard. In this post I’m sticking to how iCloud is designed to work, and not getting into the questions of how and when it doesn’t work.
Update (2013-04-16): Tom Harrington:
When iCloud has new incoming data, it imports the new changes to your data store first and tells you about them afterward. You find out about this when NSPersistentStoreDidImportUbiquitousContentChangesNotification gets posted. There is no corresponding will import notification or anything like a should import delegate call that might allow you to veto changes. And since Core Data doesn’t care if you create a duplicate, duplicates are created and you’re left to clean up the mess.
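Since the import can't be vetoed, the cleanup has to happen after the fact. Here is a rough sketch of what such a pass might look like, again as a toy Python model rather than Core Data code. The `uid` and `modified` fields and the `remove_duplicates` helper are assumptions for illustration; Core Data does not give you either field, so an app would have to define its own notion of identity and its own rule for which copy wins.

```python
# Toy model only: records are plain dicts, and "uid" and "modified" are
# hypothetical field names, not anything Core Data provides.

def remove_duplicates(records):
    """Keep one record per uid, preferring the most recently modified."""
    newest = {}
    for record in records:
        current = newest.get(record["uid"])
        if current is None or record["modified"] > current["modified"]:
            newest[record["uid"]] = record
    # Sort by uid so the result has a stable order for display.
    return sorted(newest.values(), key=lambda r: r["uid"])


# The same logical item was created on two devices before either synced.
after_import = [
    {"uid": "inbox", "title": "Inbox", "modified": 10},
    {"uid": "inbox", "title": "Inbox", "modified": 12},
    {"uid": "work", "title": "Work", "modified": 11},
]

cleaned = remove_duplicates(after_import)
print([(r["uid"], r["modified"]) for r in cleaned])
# [('inbox', 12), ('work', 11)]
```

The important property is that the rule is deterministic: every device has to pick the same survivor on its own, because there is no server to pick one for it.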
Update (2013-05-07): Tom Harrington:
The upshot? I’ve seen addPersistentStoreWithType:etcetera: block for 30 minutes. And keep in mind, this is when iCloud is working normally. This is not an error condition, this is “working as designed”.
Even if Apple works out syncing — somehow — that’s just not enough. That just gets us to where we should have been in 2008. The future belongs to apps with more sophisticated services.
Update (2013-05-15): Tom Harrington:
Neither of them turns out to be very interesting. Trailers has two entities and the spelling dictionary just one. Neither has any relationships. But this lack of interesting detail is, really, the interesting part. In the only two examples I’ve been able to find of Apple using iCloud with Core Data, the data models are almost trivial. Less complex, in fact, than you’d expect in a decent introduction to using Core Data.
Update (2013-05-23): Drew McCormack:
With WWDC just around the corner, we again start to ponder if this will be the year Apple gets iCloud right. But the more I dig into iCloud/Core Data sync, the more I have come to realize that even if it worked as designed, it still may be quite flawed as a solution. They may have gotten it wrong from the outset, and some design failures are probably not easily addressed. What follows is a list of what I think is fundamentally wrong with iCloud/Core Data’s design, leaving aside any of the practical failures that we have witnessed in the past.
Update (2016-06-17): Here are some more posts that I’ve written related to iCloud Core Data: