Archive for April 13, 2026

Monday, April 13, 2026

SpamSieve 3.3

SpamSieve 3.3 is an update of my Mac e-mail spam filter that includes lots of changes to improve the filtering accuracy:

Much of this is automatic: SpamSieve is better at analyzing the message structure, HTML, and URLs within messages.
The other part is helping customers help themselves. If SpamSieve is continually letting spam messages through, the most common reason is that it’s been trained that messages like them are good. This can happen either due to accidental training or due to not training (e.g. just deleting) previous spam messages that it missed. It’s long been possible to fix uncorrected mistakes, but there can be so many messages to go through that the task becomes overwhelming. Now, if you’re wondering why a certain spam message wasn’t caught, the expanded search features make it possible to find the specific previous messages that made SpamSieve think that one was good. For example, if the log shows that SpamSieve thought “v1agra” was a key indicator of the message not being spam, you can find the previous messages containing that word and make sure that they are trained as spam, not good.
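The kind of search described here can be pictured as a filter over the trained corpus: find messages trained as good that contain the suspicious word, so they can be retrained as spam. This is a minimal sketch, not SpamSieve’s implementation; the TrainedMessage type, the Label enum, and the naive tokenizer (lowercased runs of letters and digits) are all hypothetical.

```swift
// Hypothetical model of a trained corpus entry; SpamSieve's real storage differs.
enum Label { case good, spam }

struct TrainedMessage {
    let id: Int
    let label: Label
    let body: String
}

// Naive tokenizer (an assumption): lowercased words made of letters and digits.
func tokens(in text: String) -> Set<String> {
    let words = text.lowercased().split(whereSeparator: { !$0.isLetter && !$0.isNumber })
    return Set(words.map(String.init))
}

// Messages trained as good that contain `word`: candidates for retraining as spam.
func goodMessages(containing word: String, in corpus: [TrainedMessage]) -> [TrainedMessage] {
    let needle = word.lowercased()
    return corpus.filter { $0.label == .good && tokens(in: $0.body).contains(needle) }
}

let corpus = [
    TrainedMessage(id: 1, label: .good, body: "Cheap v1agra, order now!"),
    TrainedMessage(id: 2, label: .spam, body: "v1agra discount"),
    TrainedMessage(id: 3, label: .good, body: "Lunch tomorrow?"),
]
let suspects = goodMessages(containing: "v1agra", in: corpus)
print(suspects.map(\.id))  // [1]
```

Message 2 is skipped because it is already trained as spam; only the mistrained message 1 is returned.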
This sort of search can take a while because it has to read and process all the old messages in the corpus or log. The message parsing was already pretty optimized, due to work for EagleFiler and the bulk importer added in SpamSieve 3.0. Processing messages to extract their words was less optimized because it hadn’t been a bottleneck: the mail client doesn’t send very many new messages to be filtered at once, and the filtering happens in the background anyway. But now there is potentially a huge amount of data to process, and the user is waiting for the results.
- The first step was to make the spam engine fully threadsafe so that each core could run a separate instance of it.
- I initially planned to use multiple threads for reading individual messages from disk, but that turned out not to be necessary. SSDs are really fast.
- Even with LZFSE, which is supposed to be highly optimized for decompression speed, decompressing still took much longer than reading from disk and was sometimes on par with SpamSieve’s own message processing. So it did make sense to do this in a separate thread from the I/O. Thankfully, I was not using transformable Core Data attributes, so it was easy to separate the fetching from the decompression.
- There are a bunch of regex objects that are used very frequently. These had been stored in an ivar dictionary, but that no longer made sense because I don’t want to recompile them for each message. My initial approach was to just put them in a threadsafe cache, which is also the approach that Swift Regex takes. But it turns out that with so many threads running at once there is significant overhead just from locking to read the cache. It works much better to use static variables, though that’s a lot more verbose.
- Likewise, NSString uses a shared object to convert between different encodings, and there was significant locking overhead around that. As there’s no API to access the converter object directly, I ended up implementing my own lock-free solution for the specific encodings that SpamSieve cares about.
- Swift string operations were another source of slowness. SpamSieve was calling the generalized contains() with a single ASCII character. That can be made much faster by using utf8.contains(). There are other cases where using unicodeScalars.contains() makes sense.
- The HTML processor is still written in Objective-C, and it turned out that Swift bridging overhead was taking more time than the actual HTML processing. This was fixable through a combination of (a) adding specialized Objective-C methods with known return types instead of returning id and casting from Swift, and (b) bridging with as NSDictionary to avoid eagerly converting whole dictionaries when often only one key needs to be read.
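The first step in the list, a separate engine instance per core, can be sketched with Dispatch’s concurrentPerform. This is a minimal sketch under stated assumptions: TokenCounter and countTokens(in:workers:) are hypothetical names, the tokenizer is a naive letters-and-digits split, and the separate I/O and decompression stages are left out. Each worker owns its engine, so processing needs no locks; the per-worker results are merged at the end.

```swift
import Dispatch
import Foundation

// Hypothetical stand-in for a per-thread spam engine: it owns all of its mutable
// state, so separate instances can run on different cores without locking.
final class TokenCounter {
    private(set) var counts: [String: Int] = [:]

    func process(_ message: String) {
        for word in message.lowercased().split(whereSeparator: { !$0.isLetter && !$0.isNumber }) {
            counts[String(word), default: 0] += 1
        }
    }
}

func countTokens(in messages: [String],
                 workers: Int = ProcessInfo.processInfo.activeProcessorCount) -> [String: Int] {
    let workerCount = max(1, min(workers, messages.count))
    // One result slot per worker; each iteration writes only its own slot.
    var partials = [[String: Int]](repeating: [:], count: workerCount)
    partials.withUnsafeMutableBufferPointer { buffer in
        let slots = buffer
        DispatchQueue.concurrentPerform(iterations: workerCount) { worker in
            let engine = TokenCounter()  // separate engine instance per worker
            var i = worker
            while i < messages.count {   // strided slice: worker, worker + workerCount, ...
                engine.process(messages[i])
                i += workerCount
            }
            slots[worker] = engine.counts
        }
    }
    // Merge the per-worker results on the calling thread.
    return partials.reduce(into: [String: Int]()) { total, partial in
        total.merge(partial, uniquingKeysWith: +)
    }
}

let counts = countTokens(in: ["Buy v1agra", "v1agra now", "hello"], workers: 2)
print(counts["v1agra"] ?? 0)  // 2
```

Because no worker touches another’s engine, the result does not depend on how the threads are scheduled.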
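The static-variable approach for frequently used regexes might look like the following. This is a minimal sketch, assuming NSRegularExpression as the regex type; the Patterns namespace, the URL pattern, and urls(in:) are hypothetical. Swift initializes a static stored property lazily and exactly once, even when several threads reach it at the same time, so each pattern is compiled a single time and later reads by worker threads take no lock.

```swift
import Foundation

// Hot regexes compiled once as static constants (hypothetical names and pattern).
enum Patterns {
    // Crude match for http(s) URLs, for illustration only.
    static let url = try! NSRegularExpression(pattern: #"https?://[^\s"'<>]+"#)
}

func urls(in text: String) -> [String] {
    let range = NSRange(text.startIndex..., in: text)
    return Patterns.url.matches(in: text, options: [], range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

print(urls(in: "see https://example.com/a and http://x.org now"))
// ["https://example.com/a", "http://x.org"]
```

The verbosity mentioned in the list shows up as one declaration per pattern, instead of a single dictionary lookup.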
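The string fast paths from the list can be illustrated as follows. This is a minimal sketch; containsASCII and containsZeroWidthSpace are hypothetical helpers, and U+200B is just an illustrative choice of non-ASCII scalar. Note that the byte-level check is not always interchangeable with the Character-based one: a Character is a grapheme cluster, so "\r\n" counts as one Character, and an ASCII letter followed by a combining mark forms a different Character.

```swift
// Fast path for a single ASCII byte: scan the UTF-8 view instead of comparing
// grapheme clusters (Characters), which requires Unicode segmentation.
func containsASCII(_ s: String, _ byte: UInt8) -> Bool {
    precondition(byte < 0x80, "only valid for ASCII bytes")
    // In valid UTF-8, bytes below 0x80 never occur inside a multi-byte sequence,
    // so a byte match always corresponds to that ASCII scalar.
    return s.utf8.contains(byte)
}

// For a non-ASCII scalar, unicodeScalars.contains also skips grapheme segmentation.
func containsZeroWidthSpace(_ s: String) -> Bool {
    s.unicodeScalars.contains("\u{200B}" as Unicode.Scalar)
}

let address = "user@example.com"
print(address.contains("@" as Character))          // true (Character-based)
print(containsASCII(address, UInt8(ascii: "@")))   // true (byte-based)

// Where they differ: "\r\n" is one Character, so the Character search misses "\n",
// while the UTF-8 view still finds the 0x0A byte.
print("a\r\nb".contains("\n" as Character))       // false
print(containsASCII("a\r\nb", UInt8(ascii: "\n"))) // true
```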
I fixed a regression that started when SpamSieve 3.2 switched to using NSDockTile to draw numeric badges on the Dock icon instead of drawing them itself. This was necessary because doing it manually doesn’t work with Liquid Glass. When SpamSieve did the drawing itself, it used a cached image of the Dock icon. The system API apparently relies on the image file on disk, even just to update the badge, so the app would sometimes crash if a software update replaced that file.
Some customers were seeing a new issue on macOS Tahoe, seemingly caused by App Nap. The app would be running in the background, with no windows visible, and get woken up by an Apple event—so far so good—but then macOS would stop giving it processor time while it was still generating the response to the Apple event. I’m not sure what’s going on here since most customers are not affected.
I continue to have problems with fake GitHub repos, but GitHub is once again helping to take them down.


Artemis II Desktop Pictures

Nick Heer:

NASA has put a few hundred photos on Flickr with some awesome views — and I must emphasize how the word “awesome” undersells these images. I am using this one as the wallpaper on my iMac right now, and it feels like a pretty good use of a big, high-resolution display.

Update (2026-05-04): Anil Dash:

The Flickr team at SmugMug did something special with their responsibility about these public works, due to their cultural significance to the world. They made the Flickr Commons, and brought in a team with expertise in digital archiving and community.

[…]

It’s in this context that NASA has long been sharing its imagery on Flickr, for all of its missions — not just Artemis II. There’s even a special section for NASA on The Commons. And since everything is provided in incredibly high-resolution and has every single detail about the photo and how it was taken, it’s possible to combine the information about the photo with other data and create amazing resources like this beautiful timeline of the entire mission.


Artemis II’s Fault-Tolerant Computer

Logan Kugler (via Hacker News):

To ensure those wrong answers never reach the spacecraft’s thrusters, NASA moved beyond the triple redundancy of traditional systems. Orion utilizes two Vehicle Management Computers, each containing two Flight Control Modules, for a total of four FCMs. But the redundancy goes even deeper: each FCM consists of a self-checking pair of processors.

Effectively, eight CPUs run the flight software in parallel. The engineering philosophy hinges on a “fail-silent” design. The self-checking pairs ensure that if a CPU performs an erroneous calculation due to a radiation event, the error is detected immediately and the system responds.

“A faulty computer will fail silent, rather than transmit the ‘wrong answer,’” Uitenbroek explained. This approach simplifies the complex task of the triplex “voting” mechanism that compares results. Instead of comparing three answers to find a majority, the system uses a priority-ordered source selection algorithm among healthy channels that haven’t failed-silent. It picks the output from the first available FCM in the priority list; if that module has gone silent due to a fault, it moves to the second, third, or fourth.

[…]

Orion carries a completely independent Backup Flight Software (BFS) system. This is a prime example of dissimilar redundancy. It is implemented on different hardware, runs a different operating system, and utilizes independently developed, simplified flight software.
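The fail-silent design and priority-ordered source selection described in the excerpt can be modeled in a few lines. This is a toy sketch, not Orion’s flight software: SelfCheckingPair and selectOutput(from:input:) are hypothetical names, each pair’s two processors are stand-in closures whose results are compared directly, and the Backup Flight Software is not modeled.

```swift
// Toy model of a fail-silent Flight Control Module built from a self-checking pair.
// Hypothetical names; real FCMs compare their processors in hardware, not like this.
struct SelfCheckingPair {
    let cpuA: (Double) -> Double
    let cpuB: (Double) -> Double

    // Both processors compute the same step; any disagreement makes the module
    // go silent (nil) instead of emitting a possibly wrong answer.
    func compute(_ input: Double) -> Double? {
        let a = cpuA(input)
        let b = cpuB(input)
        return a == b ? a : nil
    }
}

// Priority-ordered source selection: take the output of the first module in
// the list that has not gone silent; no majority vote is needed.
func selectOutput(from modules: [SelfCheckingPair], input: Double) -> Double? {
    for module in modules {
        if let output = module.compute(input) {
            return output
        }
    }
    return nil  // every module is silent
}

let healthy = SelfCheckingPair(cpuA: { $0 * 2 }, cpuB: { $0 * 2 })
// Simulated upset: one processor of this pair produces a corrupted result.
let upset = SelfCheckingPair(cpuA: { $0 * 2 }, cpuB: { $0 * 2 + 1 })

print(selectOutput(from: [upset, healthy, healthy, healthy], input: 3) as Any)  // Optional(6.0)
```

Because a faulty module produces no answer at all, the selector never has to decide which of several conflicting answers is right; it only has to skip the silent ones.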

Jim Hillhouse:

There are two main flight computers that use two radiation hardened IBM PowerPC 750FX single-core processors, a CPU introduced in 2002 and used in Apple computers such as the iBook G3 until 2005.

Update (2026-05-04): See also: Hacker News.
