Wednesday, November 20, 2024

SpamSieve 3.1

SpamSieve 3.1 improves the accuracy of my Mac e-mail spam filter, amongst many other enhancements.

Some interesting issues were:

NSHelpManager seems to be broken under Sequoia in that sometimes you have to click a help link multiple times for it to open the right page—until then it just opens the main help book page (FB15763353). To work around this, SpamSieve now opens the help page in the user’s browser, which some may prefer, anyway, because it has better window resizing and management than the system help viewer (which is now called Tips).
As with EagleFiler, our Download Fixer tool can be used to make the SpamSieve app launchable if macOS’s Gatekeeper erroneously thinks it’s damaged. I’ve also enabled library validation, in the hope that this will bypass the bug entirely by reducing the number of checks that Gatekeeper does, and therefore the number that could potentially go wrong.
In rare cases, when an AppleScript asks Apple Mail to move a message, the move will fail and Mail will show an alert that blocks the entire app. (I think that scripting errors should just be reported back to the script.) It’s unclear exactly why this happens, but I’ve collected some workarounds that people have found helpful.
Reports of hangs caused by Swift Regex continue to trickle in, but it has been hard to track them down and find reproducible test cases. SpamSieve now has a debug mode (enable/disable) for detecting problematic patterns and their triggering text.
Some users have old audio components installed, which somehow break AppleScript (and, thus, SpamSieve) with error badComponentInstance, so we now have a way to report these so that they can be removed if necessary.
It seems like opening a loopback port was triggering Sequoia’s local network privacy alert. That doesn’t make sense to me, but the port is no longer needed since Sonoma, so hopefully stopping that will help.
Sometimes Core Data will report an error with NSCocoaErrorDomain, but the underlying database error will be reported via a key of NSSQLiteErrorDomain whose value is the error number, rather than using an underlying error object.
It’s now possible to iterate the entire corpus using AppleScript, even if it contains millions of words. Accessing the list of words directly unfortunately still fails beyond a certain size. AppleScript seems to break down internally when given a large array, and, because it’s a list property rather than an element, there’s no way to make it lazy. AppleScript will convert the whole thing to an AEDesc right away. However, 3.1 does improve this somewhat by doing a more efficient fetch and by reducing the amount of Swift bridging overhead.

SpamSieve is able to support arbitrary sizes when accessing token info objects. The key is to have the script request them by index, and this way SpamSieve can look up exactly what the script wants and return only that. If you don’t write the script that way, even though SpamSieve can fetch individual tokens (and has long done so when AppleScript requests them by name), AppleScript will eagerly request everything all at once and hit the aforementioned limit with too many objects. Even without the limit, this would be undesirable because of the RAM use. There does not seem to be a way to return only a list of unique IDs to AppleScript. Indexes work, if care is taken to make them efficient and stable between calls, but of course they can’t be truly stable if the data is being modified during iteration.
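
To illustrate the Core Data item above, here is a minimal sketch (not SpamSieve’s actual code) of how a Swift helper might dig out the SQLite result code. It checks the error’s own userInfo for the key first and then walks any underlying errors. The function name sqliteCode(from:) and the sample error values are hypothetical. The key is written as a string literal, on the assumption that it matches Core Data’s NSSQLiteErrorDomain constant, so that the snippet needs only Foundation.

```swift
import Foundation

// Assumption: this literal matches the value of Core Data's
// NSSQLiteErrorDomain constant, which an app linking Core Data
// would use directly instead.
let sqliteErrorKey = "NSSQLiteErrorDomain"

/// Hypothetical helper: returns the SQLite result code attached to an error,
/// looking first in its own userInfo and then in any underlying errors.
func sqliteCode(from error: NSError) -> Int? {
    var current: NSError? = error
    while let e = current {
        if let value = e.userInfo[sqliteErrorKey] {
            // The value is a number stored directly under the key,
            // not a nested NSError.
            if let code = value as? Int { return code }
            if let number = value as? NSNumber { return number.intValue }
        }
        // Otherwise keep following the conventional underlying-error chain.
        current = e.userInfo[NSUnderlyingErrorKey] as? NSError
    }
    return nil
}

// Usage with made-up sample values.
let sample = NSError(domain: NSCocoaErrorDomain,
                     code: 134030,
                     userInfo: [sqliteErrorKey: NSNumber(value: 13)])
if let code = sqliteCode(from: sample) {
    print("SQLite result code:", code)
}
```

Checking both places means the same helper works whether the framework reports the database error the conventional way or via the key.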

Update (2024-12-09): Unfortunately, it turns out that library validation does not work around whatever macOS code signing bug is causing downloads to be incorrectly reported as damaged, so customers encountering this still need to use the Download Fixer tool. That’s easy enough to do, but some will probably give up before doing that because there’s no automated way to help them find it.

Apple Help, Apple Mail, Core Data, Gatekeeper, Mac, macOS 15 Sequoia, Programming, SpamSieve

7 Comments

geriatricguy

November 20, 2024 4:04 PM

What you're actually saying is that Mail is broken in a few places, and that is why it isn't functioning on any OS in the Apple family the way it's supposed to.

Sebby

November 23, 2024 1:34 PM

It seems like opening a loopback port was triggering Sequoia’s local network privacy alert. That doesn’t make sense to me, but the port is no longer needed since Sonoma, so hopefully stopping that will help.

Me neither, but losing this is a shame. What IPC mechanisms are used now? Are you going to sanction another method for external processes to control SpamSieve except for scripting? Perhaps provide a CLI utility for assessing/training?

Michael Tsai

November 23, 2024 1:54 PM

@Sebby It was for communication between the Mail plug-in and the app, not documented for third-party use. I suppose it could be brought back as an option; I just don’t want to prompt users for access that’s not needed. I’m planning to make an SDK that’s built on the scripting interface, and probably a sample project for that will be a CLI utility.

Sebby

November 23, 2024 2:31 PM

@Michael OK. Look forward to it. It's not the end of the world as long as the drone setup still works, albeit with a bit of a delay. But having a proper interface for servers would be awesome.

Michael Tsai

November 23, 2024 2:57 PM

@Sebby This doesn’t affect the drone setup. Were you using the HTTP interface? Presumably not on a server since the port wasn’t exposed to the network. The AppleScript interface does work over the network (Remote Application Scripting). What are you trying to do?

Sebby

November 23, 2024 4:39 PM

@Michael I was planning to integrate the socket interface into the delivery pipeline on the server machine itself, i.e. an SMTP proxy would receive the message for final delivery, do the lookup, add in the header that the SIEVE script would use to move the message automatically to spam or leave in Inbox, then feed back to the MTA for delivery to the mailbox. I could also have the server train messages in folders by feeding them into SpamSieve, without needing a drone setup at all. Thankfully I simply never got around to doing any of this. But I would certainly like an official way to do it.

Michael Tsai

November 23, 2024 4:51 PM

@Sebby OK, seems like this would be easy enough with a command-line tool.
