Tuesday, June 4, 2019

Syncing Core Data With CloudKit and NSPersistentCloudKitContainer

WWDC Session 202:

CloudKit offers powerful, cloud-syncing technology while Core Data provides extensive data modeling and persistence APIs. Learn about combining these complementary technologies to easily build cloud-backed applications. See how new Core Data APIs make it easy to manage the flow of data through your application, as well as in and out of CloudKit. Join us to learn more about combining these frameworks to provide a great experience across all your customers’ devices.

This is great to see, although for all the specific use cases I have in mind it would likely be more appropriate to use CloudKit directly. NSPersistentCloudKitContainer seems too automatic/opaque. (I say this before the session has taken place, though.)

Hunter Hillegas:

It’s a whole new managed approach where Core Data owns the CloudKit container and handles it. No clue how well it works.

Scott Perry:

Yup, Core Data CloudKit implements to-many relationships using CRDTs!
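For readers who have not met CRDTs, here is a minimal sketch of one common kind, an observed-remove set, which is the sort of structure that could back a to-many relationship. This is a generic textbook construction, not a description of how Core Data actually does it; the type name `ORSet` and the string tags are assumptions made up for illustration.

```swift
// A generic observed-remove set (OR-Set). Hypothetical illustration only;
// nothing here is taken from Apple's implementation.
struct ORSet<Element: Hashable> {
    // Each insertion is recorded under a unique tag (e.g. a UUID string).
    private(set) var adds: [Element: Set<String>] = [:]
    // Tags whose insertions have been observed and then removed.
    private(set) var removedTags: Set<String> = []

    mutating func insert(_ element: Element, tag: String) {
        adds[element, default: []].insert(tag)
    }

    // Removing only cancels the insertions this replica has already seen.
    mutating func remove(_ element: Element) {
        removedTags.formUnion(adds[element] ?? [])
    }

    // An element is present if at least one of its insertions is not cancelled.
    var elements: Set<Element> {
        Set(adds.filter { !$0.value.isSubset(of: removedTags) }.keys)
    }

    // Merging is a union of both sides, so it is commutative and idempotent.
    func merged(with other: ORSet) -> ORSet {
        var result = self
        for (element, tags) in other.adds {
            result.adds[element, default: []].formUnion(tags)
        }
        result.removedTags.formUnion(other.removedTags)
        return result
    }
}
```

Because merging is just a union, two devices that exchange state in either order end up with the same membership, and an insertion made concurrently with a removal survives.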

Drew McCormack:

Looking at the Core Data + CloudKit sample code, I have a feeling it might be a case of “fool me twice”. The fact that there is no global identifier for objects means you end up with a lot of messy deduplication code. Even a real basic app like the sample will scare many away.

Malcolm Hall:

You can get the global identifier using recordForManagedObjectID: it is stored in a meta data table, see attached screenshot:

Drew McCormack:

Yeah, I figured they had an id internally. My point is the user doesn’t seem to be able to pick one. Eg. merging tags is trivial if you can choose your own global id.

Malcolm Hall:

Oh so in the sample the deduplicate method is finding all tags with the same name, selecting one, and then setting all posts that are using any of the duplicates to the single tag. That’s pretty nasty.
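To make the shape of that deduplication concrete, here is a rough sketch using plain structs rather than managed objects. The `Tag` and `Post` types, the string identifiers, and the "smallest id wins" rule are all assumptions for illustration; the sample's actual code may choose the survivor differently.

```swift
// Hypothetical in-memory stand-ins for the sample's tag and post entities.
struct Tag {
    let id: String      // per-device identifier, e.g. a UUID string
    let name: String
}

struct Post {
    let title: String
    var tagIDs: Set<String>
}

/// Collapses tags that share a name into a single survivor and repoints
/// every post that used a duplicate at that survivor.
func deduplicate(tags: [Tag], posts: [Post]) -> (tags: [Tag], posts: [Post]) {
    var replacement: [String: String] = [:]   // duplicate id -> survivor id
    var survivors: [Tag] = []

    for group in Dictionary(grouping: tags, by: { $0.name }).values {
        // Every device must pick the same survivor, so use a stable rule:
        // here, the smallest identifier.
        guard let winner = group.min(by: { $0.id < $1.id }) else { continue }
        survivors.append(winner)
        for duplicate in group where duplicate.id != winner.id {
            replacement[duplicate.id] = winner.id
        }
    }

    let repointed = posts.map { post -> Post in
        var post = post
        post.tagIDs = Set(post.tagIDs.map { replacement[$0] ?? $0 })
        return post
    }
    return (survivors.sorted { $0.id < $1.id }, repointed)
}
```

With a user-chosen global identifier (say, the tag's name), none of this would be needed, which is McCormack's point.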

Malcolm Hall:

So CloudKit Core Data Sync doesn’t merge in remote changes before syncing up the modified local record, and sends whole record not just changed fields, causing it to overwrite whatever changes another device made that was not yet received, seems like a dealbreaker.
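The failure mode described here is the classic lost update. A small sketch, modeling a record as a dictionary of fields, shows the difference between uploading the whole record and uploading only the fields that changed. The `Record` alias and both function names are hypothetical; this illustrates the general problem and is not a trace of what Core Data sends.

```swift
// A record modeled as field name -> value (hypothetical shape).
typealias Record = [String: String]

/// Whole-record upload: the server copy is simply replaced by the local copy.
func pushWholeRecord(local: Record, server: Record) -> Record {
    return local
}

/// Field-level upload: only fields that differ from the last fetched version
/// (`base`) are written over the server copy. Deleted fields are ignored here.
func pushChangedFields(base: Record, local: Record, server: Record) -> Record {
    var merged = server
    for (key, value) in local where base[key] != value {
        merged[key] = value
    }
    return merged
}

let base: Record = ["title": "Draft", "body": "Hello"]
// Device A changed the title and has already uploaded.
let server: Record = ["title": "Final", "body": "Hello"]
// Device B changed the body before fetching A's change.
let local: Record = ["title": "Draft", "body": "Hello, world"]

let overwritten = pushWholeRecord(local: local, server: server)
let merged = pushChangedFields(base: base, local: local, server: server)
```

In `overwritten` the title silently reverts to "Draft", while `merged` keeps both devices' edits.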

Update (2019-06-04): Malcolm Hall:

Sadly Core Data CloudKit isn’t using CKReference for related records, just using a string field, thus losing integrity. I was really hoping they would make public the CKReferenceActionValidate that Notes uses for e.g. the one-to-many folder notes relation.

Update (2019-06-06): Tom Harrington:

I went to the Core Data with CloudKit lab and, to their credit, the team did not run away and hide from me.

Update (2019-06-11): Bob Cottrell:

There was a throwaway line at the end of the Core Data and CloudKit session that talked about using Lamport timestamps to, you know, actually reflect the distributive nature of syncing. This was probably one of the most mind-blowing thoughts to come out of all the videos so far.

This is interesting in light of Malcolm Hall’s statement above that it does not seem to be properly merging the fields. He’s also said that he looked for timestamps in the syncing database but couldn’t find them. So it’s not clear to me what Core Data actually does.
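For reference, a Lamport timestamp is just a counter with two rules, sketched below. The `LamportClock` type and the device names are my own illustration; whether and how Core Data uses anything like this is exactly what remains unclear.

```swift
// A minimal Lamport logical clock (hypothetical type, for illustration).
struct LamportClock {
    private(set) var time: UInt64 = 0

    /// Rule 1: advance the counter before each local event, such as a save.
    @discardableResult
    mutating func tick() -> UInt64 {
        time += 1
        return time
    }

    /// Rule 2: on receiving a change stamped `remote`, jump past it.
    @discardableResult
    mutating func receive(_ remote: UInt64) -> UInt64 {
        time = max(time, remote) + 1
        return time
    }
}

var phone = LamportClock()
var mac = LamportClock()

let firstEdit = phone.tick()             // 1
let secondEdit = phone.tick()            // 2
let received = mac.receive(secondEdit)   // 3
let macEdit = mac.tick()                 // 4
```

Any change made after seeing another change gets a larger timestamp, without relying on wall-clock time. Concurrent edits can still tie, so a real system would typically break ties with something like a device identifier.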

Update (2019-06-17): Drew McCormack (tweet):

The general approach Apple are taking seems sound enough to me. They are using their new generational storage (with history) to track changes, and update the cloud from that. Effectively having “versions” in your store is very powerful, because you can bring in new sync changes while the UI of the app continues on oblivious. […] This is preferable to the nightmare we used to have with concurrent changes, where one context would be forced to merge in change notifications, and failing to do so would lead to exceptions.

[…]

The nature of Apple’s sync, where you effectively have distributed stores, means you can’t globally validate data like you can with a single central store. Apple have chosen to work around this by disallowing validation on relationships. It’s something to keep in mind. For example, if you have a one-to-one relationship that was previously non-optional, once you add sync, you will have to make it optional, and concurrent changes could lead to an orphaned object where the relationship is nil. This is not really avoidable in a decentralized syncing system, though how it is handled can vary. In Ensembles, the same problem can arise, and a delegate method is called to allow the app code to correct the issue (eg delete an object); Apple have opted to just disallow validation of relationships, which means you will probably need to add your own checks to “correct” the data.
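A check of the kind McCormack suggests might look roughly like the sketch below, again with plain structs instead of managed objects. The `Note` type, its optional `folderID`, and the "adopt into a fallback folder" policy are assumptions; deleting the orphans instead would be an equally valid choice.

```swift
// Hypothetical model: a note whose folder relationship had to become optional.
struct Note {
    let title: String
    var folderID: String?
}

/// Separates notes that still have a folder from those orphaned by sync.
func splitOrphans(_ notes: [Note]) -> (valid: [Note], orphaned: [Note]) {
    var valid: [Note] = []
    var orphaned: [Note] = []
    for note in notes {
        if note.folderID == nil {
            orphaned.append(note)
        } else {
            valid.append(note)
        }
    }
    return (valid, orphaned)
}

/// One possible correction: reattach orphans to a fallback folder.
func adopt(_ orphans: [Note], into fallbackFolderID: String) -> [Note] {
    return orphans.map { orphan in
        var note = orphan
        note.folderID = fallbackFolderID
        return note
    }
}
```

The app would run something like this after remote changes are imported, since the store itself no longer enforces the relationship.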

Andy Bargh:

It also seems to be a bit of a partial solution as well as there doesn’t seem to be any support for the Public or Shared databases in there either which seems like a bit of an omission.

Comments

Would really be useful if it worked with the Public database as this is where we store information that is important to be up-to-date and available to all users. Data that now has to be stored locally by other means.

Womich: it’s not just useful, it’s a huge gap. Why on earth did Apple knowingly exclude the CloudKit public database from the Core Data integration, again? People all over the internet have been looking for this feature for over 5 years. Come on. Let’s please get this one done. I imagine with the new NSPersistentCloudKitContainer it’s simply a matter of exposing which database to use.
