Thursday, April 17, 2025

ARC Optimization vs. -fstack-protector

Samuel Giddins:

After months of painstaking work, we’ve got our apps building, and most of our tests building, and almost all of them passing.

Except for some tests that use test fixtures. And assert that those test fixtures get deallocated. And they passed in Xcode. And failed when run via Bazel.

[…]

However, due to the way the stack protector works (by adding instructions at the start and end of a function), those instructions can interfere with a call to objc_autoreleaseReturnValue being able to see its matching call to objc_retainAutoreleasedReturnValue, and then [object autorelease] will actually have to do an autorelease. Which means that the object will go into the autoreleasepool. And it won’t be deallocated until that pool drains. And XCTestCase’s -setUp and -tearDown methods happen inside the same autoreleasepool.

[…]

What made this bug so fun (and infuriating) to investigate was that it sat at the intersection of a bunch of different moving pieces. Our code was technically wrong (relying on performance optimizations in the runtime isn’t especially safe). Bazel did something incredibly unexpected (passing -fstack-protector when I didn’t ask it to). The Objective-C runtime has a performance optimization that does more than optimize (this is valid code under ARC, and yet its behavior is different from what ARC’s semantics say it should be). And, finally, clang allows me to pass a compiler option that changes observable behavior, without documenting that it can do more than catch a small set of bugs.
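The handoff Giddins describes can be sketched with a toy model. This is plain Python, not the real Objective-C runtime, and all the names (`Obj`, `AutoreleasePool`, `caller_retain_visible`) are invented for illustration: `objc_autoreleaseReturnValue` checks whether the caller’s matching retain is recognizable, and only then skips the pool. Extra instructions between the two calls (such as the stack protector’s epilogue) break that recognition, so the object really does go into the pool.

```python
# Toy model of ARC's return-value optimization; all names are invented
# and none of this is the real Objective-C runtime.

class Obj:
    def __init__(self):
        self.refcount = 1
        self.deallocated = False

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.deallocated = True


class AutoreleasePool:
    def __init__(self):
        self.pending = []

    def drain(self):
        # Deallocation is deferred until the pool drains -- which is why
        # a test fixture can outlive a -tearDown deallocation assertion.
        for obj in self.pending:
            obj.release()
        self.pending.clear()


def autorelease_return_value(obj, pool, caller_retain_visible):
    """Models objc_autoreleaseReturnValue: if the paired call in the
    caller is recognizable (no interfering instructions in between),
    ownership is handed off directly and the pool is skipped."""
    if caller_retain_visible:
        return obj  # fast path: no autorelease actually happens
    pool.pending.append(obj)  # slow path: a real autorelease
    return obj


pool = AutoreleasePool()

# Fast path: the object dies as soon as the last reference goes away.
fast = autorelease_return_value(Obj(), pool, caller_retain_visible=True)
fast.release()
assert fast.deallocated

# Slow path: the object survives past its return...
slow = autorelease_return_value(Obj(), pool, caller_retain_visible=False)
assert not slow.deallocated
pool.drain()
assert slow.deallocated  # ...and only dies when the pool drains
```

In the toy, as in the bug report, nothing about the *eventual* lifetime changes; only the timing does, which is exactly the kind of difference a “was this deallocated yet?” test assertion can observe.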

Previously:

Performance of the Python 3.14 Tail-Call Interpreter

Nelson Elhage (via Hacker News):

Unfortunately, as I will document in this post, these impressive performance gains turned out to be primarily due to inadvertently working around a regression in LLVM 19. When benchmarked against a better baseline (such as GCC, clang-18, or LLVM 19 with certain tuning flags), the performance gain drops to 1-5% or so depending on the exact setup.

[…]

Historically, the optimization of replicating the bytecode dispatch into each opcode has been cited to speed up interpreters anywhere from 20% to 100%. However, on modern processors with improved branch predictors, more recent work finds a much smaller speedup, on the order of 2-4%.

[…]

Still, nix was clearly enormously helpful here, and on net it definitely made this kind of multi-version exploration and debugging much saner than any other approach I can imagine.
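The dispatch-replication optimization Elhage mentions can be sketched with a toy three-opcode machine. This is Python for illustration only (CPython does this in C, via computed gotos or, in 3.14, tail calls); the opcode set and function names here are invented. The structural idea is the same: instead of every handler returning to one shared dispatch point, each handler contains its own copy of the dispatch, giving the branch predictor one branch per opcode to learn rather than one shared, hard-to-predict branch.

```python
# Toy bytecode interpreter contrasting a single central dispatch site
# with dispatch replicated into each opcode handler. Illustrative only;
# the opcodes and names are invented.

PUSH, ADD, HALT = 0, 1, 2


def run_central(code):
    """One dispatch site: every handler falls back to the same loop."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == HALT:
            return stack.pop()


def run_replicated(code):
    """Dispatch copied into every handler: each handler fetches the
    next opcode and jumps to its handler itself, so there is a
    separate dispatch site per opcode."""
    stack = []

    def op_push(pc):
        stack.append(code[pc])
        pc += 1
        return handlers[code[pc]](pc + 1)  # this handler's own dispatch

    def op_add(pc):
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
        return handlers[code[pc]](pc + 1)  # another copy of dispatch

    def op_halt(pc):
        return stack.pop()

    handlers = {PUSH: op_push, ADD: op_add, HALT: op_halt}
    return handlers[code[0]](1)


PROGRAM = [PUSH, 2, PUSH, 3, ADD, HALT]
assert run_central(PROGRAM) == run_replicated(PROGRAM) == 5
```

Both interpreters compute the same result; the payoff of the replicated form is purely microarchitectural, which is why (as the post argues) its measured benefit shrinks as branch predictors improve.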

Amazon Web Services Dark Patterns

Jeff Johnson:

As far as I can tell, the confusingly named Aurora PostgreSQL is not actually PostgreSQL but rather an Amazon-specific database designed with one overriding goal: to be infinitely more expensive than PostgreSQL, which is free. In any case, the AWS Free Tier details give the impression to unsuspecting new users that PostgreSQL is free, without making an explicit distinction between true PostgreSQL and Amazon’s faux PostgreSQL.

The worst part is that when you enable Aurora PostgreSQL on the free tier, which I apparently did without knowing exactly what Aurora meant, Amazon does not warn you that you’re about to be charged an obscene amount of money. And I didn’t even use the database, as far as I know.

[…]

The AWS documentation is, in a word, massive, just as AWS itself is massive. I would also suggest that there’s something very wrong with an internet service if you have to wade through the fine print of the service’s massive documentation just to discover that you can suddenly and silently incur a cost of hundreds of dollars per month for a feature on a specific tier advertised by the service as free.

This post is from last year, but I came across it while going through some old drafts, and it still sounds like something to be aware of.

Previously: