Friday, February 7, 2025

SpamSieve 3.1.1

SpamSieve 3.1.1 improves the filtering accuracy of my Mac e-mail spam filter, amongst other enhancements and fixes.

The update was held up because the Developer ID Notary Service was down for most of the business day yesterday.

Some interesting issues were:

I don’t know whether something changed with a Sequoia update, but several customers reported crashes due to Core Data throwing fault-handling exceptions when SpamSieve tried to do something with the Contacts database. Since Objective-C exceptions can’t be caught in Swift, I wrapped all the Contacts API calls with Objective-C @try blocks.
SpamSieve uses Core Data external binary data storage to keep message data and other large blobs out of the SQLite databases, and this also suddenly became the cause of crashes for some customers. When there’s an error saving the database, SpamSieve logs it, and as recently discussed, this calls description on the related managed objects. Normally, this would be good because it would give an idea of what the app was doing at the time and perhaps provide a way to recover any unsaved changes, since the descriptions of the property values get logged as well.

However, blobs that are backed by files are handled using the _PFExternalReferenceData subclass of NSData, and it turns out that when it’s unable to load the data it just throws an exception. As above, this can’t be caught in Swift, so it crashes the whole app. I would prefer that invalid objects describe themselves as such rather than crashing, but I guess Apple didn’t think of this edge case. Objective-C wrapper to the rescue again.
Swift did help in another area, though. I was able to make some custom collections to optimize handling of large selections in table views. SpamSieve already fetched NSManagedObjectIDs so that tables with millions of rows only have to bring into memory the small number of objects that are actually being displayed at any one time. I previously discussed making a custom collection that uses Core Data’s built-in batching. However, SpamSieve does something different here because that approach is too slow when it might block the user interface. Instead, I have another collection type that fetches the IDs on a background queue, so that the table can reload asynchronously, and then realizes the objects on the main queue, so that they can be used in the table. This much was already in SpamSieve 3.0.
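
As a rough illustration of the ID-backed part of that idea (not SpamSieve’s actual code), here is a minimal sketch of a collection that holds only IDs and realizes a row when it is read. `RowID`, `IDBackedRows`, `RealizeCounter`, and the `realize` closure are hypothetical stand-ins for NSManagedObjectID and for asking a managed object context for an object, and the background/main queue hopping is left out entirely.

```swift
// Hypothetical stand-in for NSManagedObjectID.
struct RowID: Hashable {
    let value: Int
}

// A collection that stores only IDs. A row object is produced on demand,
// so a table that reads twenty rows realizes twenty objects.
struct IDBackedRows<Row>: RandomAccessCollection {
    let ids: [RowID]               // cheap to hold, even for millions of rows
    let realize: (RowID) -> Row    // stand-in for looking the object up in a context

    var startIndex: Int { ids.startIndex }
    var endIndex: Int { ids.endIndex }

    subscript(position: Int) -> Row {
        realize(ids[position])
    }
}

// Usage: count how many rows actually get realized.
final class RealizeCounter { var count = 0 }
let realizeCounter = RealizeCounter()

let tableRows = IDBackedRows<String>(
    ids: (0..<1_000_000).map { RowID(value: $0) },
    realize: { id in
        realizeCounter.count += 1
        return "Message \(id.value)"
    }
)

// Only the rows that are read (the "visible" ones) go through `realize`.
let visibleRows = Array(tableRows[0..<20])
```

Because the index type is `Int`, the standard library supplies the rest of the RandomAccessCollection requirements, so slicing and counting never touch the row objects themselves.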

What’s new in 3.1.1 is that it uses a similar mechanism to handle the selected objects using IDs instead of objects. Previously, as soon as you made a selection, those objects would be fetched and used for menu validation and for restoring the selection if the database changed in the background. Now, this is all done using IDs, and if full objects are needed for validation or to do something with the selection, it only brings them into memory in batches. The most common case is saving/comparing/restoring the selection, and this can be done entirely using IDs. It also transparently skips objects that were selected but that got deleted between then and when it was time to actually process that object. So it’s conceptually a collection of optional objects, but there’s a lazy filter to make them appear non-optional.
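
A minimal sketch of that shape, under stated assumptions: `MessageID`, `Message`, `MessageStore`, and `Selection` are all hypothetical names, with the store standing in for a managed object context that can report that an object no longer exists. The selection itself holds only IDs, and objects are looked up lazily or in batches, with deleted ones skipped.

```swift
// Hypothetical stand-in for NSManagedObjectID.
struct MessageID: Hashable { let value: Int }

final class Message {
    let id: MessageID
    let subject: String
    init(id: MessageID, subject: String) {
        self.id = id
        self.subject = subject
    }
}

// Hypothetical stand-in for a managed object context.
final class MessageStore {
    private var messages: [MessageID: Message] = [:]
    func insert(_ message: Message) { messages[message.id] = message }
    func delete(_ id: MessageID) { messages[id] = nil }
    func existingMessage(with id: MessageID) -> Message? { messages[id] }
}

// Saving, comparing, and restoring a selection only needs the IDs.
struct Selection: Equatable {
    var ids: [MessageID]

    // Conceptually a sequence of optional objects; the lazy compactMap
    // hides the entries whose objects have been deleted.
    func objects(in store: MessageStore) -> AnySequence<Message> {
        AnySequence(ids.lazy.compactMap { store.existingMessage(with: $0) })
    }

    // Realizes the selection one batch at a time instead of all at once.
    func forEachBatch(of size: Int, in store: MessageStore, _ body: ([Message]) -> Void) {
        precondition(size > 0, "batch size must be positive")
        var start = 0
        while start < ids.count {
            let end = min(start + size, ids.count)
            body(ids[start..<end].compactMap { store.existingMessage(with: $0) })
            start = end
        }
    }
}

// Usage: one selected message is deleted before the selection is processed.
let messageStore = MessageStore()
for i in 0..<5 {
    messageStore.insert(Message(id: MessageID(value: i), subject: "Subject \(i)"))
}
let selection = Selection(ids: [1, 2, 3].map { MessageID(value: $0) })
let savedSelection = selection              // comparing and restoring uses IDs only
messageStore.delete(MessageID(value: 2))
let selectedSubjects = selection.objects(in: messageStore).map { $0.subject }
```

Callers of `objects(in:)` see plain `Message` values and never have to deal with the optionals.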

It’s all hidden behind the Collection protocol so that tables that are backed by Core Data get the optimization, but the same code also works for other tables. However, though the result works well, I don’t give Swift full points because the implementation was unsatisfying. I couldn’t figure out how to express exactly what I wanted to the type system without adding a lot of boilerplate that, in my view, would be more likely to cause bugs in the future than would the invalid code that it was trying to protect me from.

The issue is that I want the collection to be generic over T (the type of the row object). Sometimes it will store a plain ArraySlice<T>, and other times it will store a lazy collection that fetches Ts. The latter is only legal if T is an NSManagedObject, but there doesn’t seem to be a way to tell Swift that the outer collection only uses the Core Data backing collection in that case. Thus, it will refuse to compile because it (correctly) can’t prove that T is always an NSManagedObject. It seemed like there were several potential solutions:
- The collection could have different subclasses, with one of them requiring T to be a managed object. This seemed like it would create a mess with all the operations that take two collections, which might then be of different types. Currently, the backing is an enum, so the compiler will check that all the combinations are handled everywhere.
- The backings within the collection could be hidden behind protocols. I didn’t quite figure out whether this would actually solve the problem because it seemed to be unworkable for other reasons: it seems to require either an existential property (not supported by all the macOS versions I’m deploying to) or giving the collection another generic parameter that would then spread throughout the app.
- I went with the much simpler solution of bypassing the type system by making the backing collection only officially store plain NSManagedObjects and then casting them to T on the way out.
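
To make the last option concrete, here is a minimal sketch of an enum-backed collection that is generic over T but whose fetching case only officially knows about a base class. It is an illustration under assumptions, not the real implementation: `ManagedBase` is a hypothetical stand-in for NSManagedObject so that the sketch needs no Core Data, and `FetchingBacking`, `Rows`, and `MessageRow` are made-up names.

```swift
// Hypothetical stand-in for NSManagedObject.
class ManagedBase {
    let name: String
    init(name: String) { self.name = name }
}
final class MessageRow: ManagedBase {}

// Stand-in for the lazy collection that fetches objects. It is typed in
// terms of the base class only, so it needs no constraint on T.
struct FetchingBacking {
    let fetch: () -> [ManagedBase]
}

struct Rows<T>: Sequence {
    // Because the backing is an enum, every switch over it must handle
    // all of the cases.
    enum Backing {
        case slice(ArraySlice<T>)
        case fetching(FetchingBacking)
    }
    let backing: Backing

    func makeIterator() -> AnyIterator<T> {
        switch backing {
        case .slice(let slice):
            return AnyIterator(slice.makeIterator())
        case .fetching(let fetching):
            var base = fetching.fetch().makeIterator()
            // The type system is bypassed here: each object is cast to T
            // on the way out, which traps if T is not actually its type.
            return AnyIterator { base.next().map { $0 as! T } }
        }
    }
}

// Usage: the same generic type works with and without the managed backing.
let plainRows = Rows<String>(backing: .slice(ArraySlice(["a", "b", "c"])))
let managedRows = Rows<MessageRow>(backing: .fetching(FetchingBacking(fetch: {
    [MessageRow(name: "first"), MessageRow(name: "second")]
})))

let plainValues = Array(plainRows)
let managedNames = managedRows.map { $0.name }
```

The trade-off is visible in the forced cast: nothing stops code from building a `.fetching` backing for a T that is not a managed type, and that mistake would only show up at runtime.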
