Friday, November 13, 2020

Apple Server Outage Makes Mac Apps Hang on Launch

Jeff Johnson:

WTF somehow my TCC seems fucked up on Mojave suddenly, for no apparent reason, no software updates. But only when my internet is connected?

Apps are hanging on launch! Reboot didn’t help.

Jonathan Deutsch:

I’m hitting the exact same thing on 10.15.7 starting ~30 min ago… lots of random hangs only when connected to wifi.

Skylar Lewis:

All of my non-Apple apps became really slow to open as well.

Panic:

😅 Looks like, when apps are launched, Gatekeeper is unable to check their validity over the internet, due to overwhelmed Apple servers.

Jeff Johnson:

I figured out the problem using Little Snitch.

It’s trustd connecting to ocsp.apple.com

Denying that connection fixes it, because OCSP is a soft failure.

(Disconnect internet also fixes.)

Make sure you deny it for both system and user. I ended up having to make 2 rules.

Patrick Wardle:

On Big Sur, trustd is in Apple’s “ContentFilterExclusionList”….meaning firewalls can’t block it! 😭

Welcome to the future? 😱

Jeff Johnson:

If you don’t have @littlesnitch then try /etc/hosts to fix Mac app launching

ocsp.apple.com port 80 is the problem

Nathan H. Leung shows how to do this with vi.
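
For anyone who would rather script the edit, here is a minimal Python sketch of the same idea. It only transforms the text of a hosts file and does not write to /etc/hosts itself. The function names and the choice of 0.0.0.0 as the sinkhole address are my own assumptions, not something taken from the tweets above. The removal helper is there because, as noted further down, leaving the block in place has side effects.

```python
# Sketch only: works on hosts-file text, never touches /etc/hosts directly.
# SINKHOLE and the function names are illustrative assumptions.
BLOCK_HOST = "ocsp.apple.com"
SINKHOLE = "0.0.0.0"


def _fields(line):
    """Split a hosts line into whitespace-separated fields, ignoring comments."""
    return line.split("#", 1)[0].split()


def add_block(hosts_text, host=BLOCK_HOST):
    """Return hosts_text with a sinkhole entry for host, added at most once."""
    for line in hosts_text.splitlines():
        fields = _fields(line)
        if len(fields) >= 2 and host in fields[1:]:
            return hosts_text  # host is already mapped; leave the file alone
    separator = "" if hosts_text == "" or hosts_text.endswith("\n") else "\n"
    return f"{hosts_text}{separator}{SINKHOLE} {host}\n"


def remove_block(hosts_text, host=BLOCK_HOST):
    """Return hosts_text without the exact sinkhole line that add_block writes."""
    kept = [
        line
        for line in hosts_text.splitlines(keepends=True)
        if _fields(line) != [SINKHOLE, host]
    ]
    return "".join(kept)
```

Writing the result back requires root, and the entry should be treated as a temporary workaround rather than a permanent setting.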

Jeff Johnson:

Don’t confuse Developer ID certificate status (/usr/libexec/trustd to ocsp.apple.com) with notarization (/usr/libexec/syspolicyd to api.apple-cloudkit.com).

Notarization check only occurs on first launch. Online Certificate Status Protocol can occur on any launch.

nut_bunnies:

I thought it was just Catalina being Catalina. I woke my computer from sleep and it couldn’t detect the fucking keyboard or trackpad.

Adam Engst:

It’s quite troubling that an Apple server being down could cause this. My iMac is sludge right now.

Guilherme Rambo, on the System Status page:

🔥 This is fine 🔥

Josh Centers:

It’s very simple: a screwed up server on the other end of the country shouldn’t render your computer unusable.

Łukasz Langa:

I am currently unable to work because macOS sends hashes of every opened executable to some server of theirs and when trustd and syspolicyd are unable to do so, the entire operating system grinds to a halt.

I’m typing this from my phone since the Mac is effectively frozen.

Nilay Patel:

I had three different Macs go sideways today because of a server issue I had no idea was happening. Many thoughts about how much we actually own our computers :(

Jeff Johnson:

Good news, Mac users! Our long international nightmare is over.

People are saying that ocsp.apple.com is back online, and that seems to be true.

Yan Zhu:

don’t block ocsp.apple.com forever because apple uses it to check for revoked notarizations

Jeffrey Paul (via David Heinemeier Hansson, Reddit):

It’s here. It happened. Did you notice?

I’m speaking, of course, of the world that Richard Stallman predicted in 1997. The one Cory Doctorow also warned us about.

On modern versions of macOS, you simply can’t power on your computer, launch a text editor or eBook reader, and write or read, without a log of your activity being transmitted and stored.

See also: Hacker News, 9to5Mac (Hacker News), ArsTechnica, MacRumors, The Verge, Philipp Defner, Nick Heer.

Update (2020-11-16): Jeff Benson (via Nick Heer).

This brings with it several privacy concerns. First, because your computer has to send your IP to communicate with Apple, it means Apple can see your IP address and the application you’re trying to use. Second, OCSP uses unencrypted HTTP communications so “any entity with visibility to your macOS-based computer could also observe and/or log these facts.”

Jeff Johnson (tweet, Hacker News):

When you launch a Mac app, macOS may check with Apple’s Developer ID OCSP to see whether the app developer’s code signing certificate is revoked. […] Unfortunately, if there’s an internet connection problem involving the Developer ID OCSP, that can also prevent Mac apps from launching.

[…]

This actually wasn’t the only Developer ID disaster recently. A few weeks ago I wrote another blog post after Apple temporarily revoked HP’s Developer ID cert, which caused a widespread failure of HP printer software.

[…]

The reason I mention the cache period is that it appears Apple has greatly increased it, from 5 minutes to half a day, likely in order to mitigate the problems caused by Thursday’s outage.

[…]

The notarization status is cached permanently and has no expiration, unlike OCSP. Thus, notarization only affects your ability to install new apps, it doesn’t affect your ability to launch already installed apps.
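
To make the cache-period point concrete, here is a small Python sketch of a response cache with a time-to-live. It is not how trustd or syspolicyd store anything. The class, its method names, and the use of `None` to mean "never expires" are assumptions for illustration. Only the two periods (5 minutes and half a day) and the permanent notarization cache come from the post quoted above.

```python
import time

FIVE_MINUTES = 5 * 60          # the earlier cache period, in seconds
HALF_A_DAY = 12 * 60 * 60      # the increased cache period, in seconds


class ResponseCache:
    """Illustrative cache: entries expire after ttl_seconds, or never if None."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (status, time stored)

    def put(self, key, status):
        self._entries[key] = (status, self.clock())

    def get(self, key):
        """Return the cached status, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        status, stored_at = entry
        expired = (
            self.ttl_seconds is not None
            and self.clock() - stored_at >= self.ttl_seconds
        )
        if expired:
            del self._entries[key]
            return None
        return status


def max_lookups_per_day(ttl_seconds):
    """Upper bound on network checks per certificate per day for a given TTL."""
    return (24 * 60 * 60) // ttl_seconds
```

Under these assumptions, a 5-minute period allows up to 288 online checks per certificate per day, while half a day allows 2, which is why a longer period reduces exposure to a slow responder.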

Dave Wood:

I would really like to see a response from @apple on this. They need to acknowledge the problem & what they’re doing to ensure it doesn’t happen again. Bonus points if they explain how they’re not tracking everything we do.

Jeff Johnson:

One bad side effect of blocking ocsp.apple.com is that it can break the Mac App Store[…] because they’re running more than one service on that domain!

Howard Oakley:

We did have an alternative in macOS, which used to maintain a local database of revoked certificates, or so we suspect, until over a year ago. At the height of its use, that database was updated every couple of weeks. So if Apple revoked a certificate being used to sign malicious software, it could take another two weeks or more before that revocation had trickled down to all active Macs. One of the advantages of the newer OCSP approach is that your Mac can block software within minutes of Apple revoking its certificate, something we saw only too well with the recent accidental revocation of some old HP printer software.

[…]

There are fallbacks. If your Mac doesn’t have an internet connection at all, or the route to Apple’s OCSP service is blocked, your apps still open, with their certificates unchecked. It’s when that service isn’t inaccessible, but has failed, that the biggest problems arise. This is a well-known engineering problem, fail-safe design.

As Apple so devastatingly demonstrated last Thursday to millions of Mac users around the world, its design of the trustd signing certificate check doesn’t fail safe in those circumstances.

John Gruber:

Just an embarrassing bug for Apple on a high-profile launch day.

John Gruber:

Apple should publish information about this system in the excellent — but alas, not comprehensive — Apple Platform Security report[…]

Jacopo Jannone:

The problem is that Apple’s responder didn’t go down; it was reachable but became extremely slow, and this prevented the soft failure from triggering and giving up the check.
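
The failure mode described here can be sketched in a few lines of Python. This is not Apple's implementation. `check_with_deadline`, `Status`, and `slow_responder` are hypothetical names, and the responder is simulated with a sleep. The idea is that a soft-fail policy needs an overall wall-clock deadline, so that a server that is reachable but slow is treated the same as one that is unreachable.

```python
import threading
import time
from enum import Enum


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # no usable answer; handled as a soft failure


def check_with_deadline(query, deadline_seconds):
    """Run query(), but stop waiting after deadline_seconds of wall-clock time."""
    answer = []

    def worker():
        try:
            answer.append(query())
        except Exception:
            # Unreachable server, refused connection, etc.: no answer.
            answer.append(Status.UNKNOWN)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_seconds)
    if thread.is_alive():
        # Reachable but too slow: give up instead of blocking the launch.
        return Status.UNKNOWN
    return answer[0]


def may_launch(status):
    """Soft-fail policy: only an explicit revocation blocks the app."""
    return status is not Status.REVOKED


def slow_responder():
    """Simulated responder that accepts the request but answers late."""
    time.sleep(0.5)
    return Status.GOOD
```

With a deadline of 0.05 seconds, `check_with_deadline(slow_responder, 0.05)` returns `Status.UNKNOWN` well before the simulated responder answers, and `may_launch` lets the app through. An explicit `Status.REVOKED` still blocks it.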

[…]

To make things worse, it is common for OCSP to use HTTP - I’m talking about good old plaintext HTTP on port 80, none of that HTTPS rubbish. There is usually a good reason for this, that becomes especially clear when the OCSP service is used for web browsers: preventing loops. If you used HTTPS for checking a certificate with OCSP then you would need to also check the certificate for the HTTPS connection using OCSP. That would imply opening another HTTPS connection and so on.

There’s got to be a way to do better than this for Gatekeeper given that Apple controls both ends of the connection.

It is clear that the trustd service on macOS doesn’t send out a hash of the apps you launch. […] macOS does actually send out some opaque information about the developer certificate of those apps, and that’s quite an important difference on a privacy perspective.

For privacy purposes, I think it’s a distinction without much difference. Rather than your Mac broadcasting that you launched a particular version of the Signal app, it broadcasts that you launched an app from Signal Messenger, LLC.

David Heinemeier Hansson:

I don’t see how this makes anything better? Sending a global unique hash of the developer certificate in the clear still allows both Apple to keep a log and anyone the power to snoop. This is fundamentally busted. Apple should send ban lists to the user.
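
As a rough illustration of the "ban list" idea, here is a Python sketch of a local revocation list that is consulted without any network request at launch time. Everything in it is hypothetical: the class, the versioning scheme, and the use of a SHA-256 fingerprint of the certificate bytes are assumptions, and it does not reflect the format of any database Apple ships.

```python
import hashlib


def fingerprint(cert_bytes):
    """Hex SHA-256 of the certificate bytes (an assumed identifier scheme)."""
    return hashlib.sha256(cert_bytes).hexdigest()


class LocalBanList:
    """Illustrative local list of revoked certificate fingerprints."""

    def __init__(self):
        self.version = 0
        self._revoked = set()

    def apply_update(self, version, revoked_fingerprints):
        """Replace the list if version is newer; return whether it was applied."""
        if version <= self.version:
            return False  # stale or repeated update; keep the current list
        self._revoked = set(revoked_fingerprints)
        self.version = version
        return True

    def is_revoked(self, cert_bytes):
        """Purely local lookup: nothing about the launch leaves the machine."""
        return fingerprint(cert_bytes) in self._revoked
```

The trade-off is the one Howard Oakley describes above: a revocation only takes effect once the next list update arrives, so the lookup is private and cannot hang, but it is only as fresh as the last download.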

Apple (Hacker News):

Gatekeeper performs online checks to verify if an app contains known malware and whether the developer’s signing certificate is revoked. We have never combined data from these checks with information about Apple users or their devices. We do not use data from these checks to learn what individual users are launching or running on their devices.

Notarization checks if the app contains known malware using an encrypted connection that is resilient to server failures.

These security checks have never included the user’s Apple ID or the identity of their device. To further protect privacy, we have stopped logging IP addresses associated with Developer ID certificate checks, and we will ensure that any collected IP addresses are removed from logs.

In addition, over the the next year we will introduce several changes to our security checks:

  • A new encrypted protocol for Developer ID certificate revocation checks
  • Strong protections against server failure
  • A new preference for users to opt out of these security protections

Nick Heer:

The prior version is available on the Internet Archive.

So they were logging the IPs. And they don’t deny using aggregate information about what users are launching, e.g. to get competitive data. In typical Apple fashion, the only acknowledgement that there was a problem is via a quote given to a third-party site (also: MacRumors, Hacker News):

What caused the OCSP server problem? Apple says it was due to a server-side misconfiguration that specifically interfered with macOS being able to cache OCSP responses for Developer ID. This configuration error, along with an unrelated content delivery network (CDN) misconfiguration, is what caused the slow performance for apps to launch.

The people who discovered and publicized the issue don’t get to break this news.

David Heinemeier Hansson:

This is a very welcome admission by Apple that the current system is deeply flawed, and the changes promised are solid improvements. But why does shit like this always have to be let out the back door with an obscure update to an Apple help site article?

It’s not clear whether the new preference will be for OCSP’s successor, notarization, or both.

Paul Haddad:

I know lots will make fun of “over the the next year” being fast, but I’m impressed that in just a few days Apple acknowledged a problem and promised a fix. That’s fast for them, it’s not an incident report, but it’s progress?

Howard Oakley:

What I attempt in this article is a coherent account of how macOS checks executable code before it’s loaded and run, in macOS 10.15 and 11.0.

Phil Vachon (via Hacker News):

Mayhem ensued, and after the issues were cleaned up, many questions remained about the implications of this failure. But first, let’s take a look at the mechanisms involved in authenticating an application package, at the most fundamental level.

[…]

Perhaps more transparency would help ease peoples’ concerns around misuse of their data. Having an auditable third-party run the OCSP responders for app certificate checks would assuage peoples’ concerns that Apple is misusing this data.

Update (2020-11-25): Adam Engst:

It’s hard to overstate the effect this problem had on the Mac world. Although Josh and I were able to get our iMacs working properly again reasonably quickly, the rest of our afternoon disappeared into trying to figure out what was happening. In the MacAdmins Slack, IT admins and consultants were doing the same, not just because of their personal Macs but also because they were being deluged with calls, email messages, and trouble tickets from their users and clients. Developers received bug reports demanding fixes, and the problem disrupted many online presentations, meetings, and conferences taking place during that time. A Hacker News thread about the problem garnered over 1150 comments, including some from Mac users who, like Josh, wasted significant time with troubleshooting, worried that their Macs had suffered a hardware failure.

Apple may not have actually taken every Mac in the world offline, but this network failure wasted several hours of time for what must have been millions of Mac users. (I suspect that people who weren’t attempting to launch apps during this time might not have noticed.) Nothing will give us that time back, but an acknowledgment and apology would be welcome.

This debacle also threw a spotlight on what seems like a weak point in macOS. It’s clear that Apple designed trustd to fail silently and gracefully when a Mac is offline, but why is there such a long timeout in the event of a network failure? Are there other components of macOS that make similar checks in everyday usage that could hurt the user experience in error conditions?

Update (2020-11-27): Howard Oakley (Hacker News):

Until 2018-19, it appears that macOS stored information about certificate revocations locally, in the ‘Gatekeeper’ database at /private/var/db/gkopaque.bundle, which at one time Apple updated every couple of weeks. But those Macs which have kept pace with the latest release of macOS stopped accessing that database in September 2019, with the release of macOS 10.15 Catalina. Apple hasn’t released an update to it since 26 August 2019, and anyone with a fresh installation of Big Sur will have a truly ancient version installed. As I pointed out here, that ‘Gatekeeper’ database is now disused.

Instead, Catalina and Big Sur now check all executable code on loading, and, when that code is signed with a developer certificate, perform an online check with Apple’s OCSP service, which has suddenly become so controversial.

Since the introduction of Gatekeeper in 2012, Apple has apparently revoked many compromised developer certificates. We see the tip of the iceberg of malicious software which is signed, detected by Apple, and quickly has its certificate revoked.

[…]

So Apple only seems to have been performing such extensive checks over the last 16, and no more than 23, months, although they have been applied to quarantined apps for around six years.

Update (2021-03-15): CryptoHack:

Overall, the incident this week was a good time to reflect on the trust model that has been promoted by organisations like Apple and Microsoft. Malware has grown in sophistication and most people aren’t in a position to judge whether it’s safe to run particular binaries. Code signing seems like a neat way to leverage cryptography to determine whether or not to trust applications, and to at least associate apps with known developers. And revocation is a necessary part of maintaining that trust.

However, by adding several mundane failure modes to the verification process, OCSP spoils any cryptographic elegance the code signing and verifying process has. While OCSP is also widely used for TLS certificates on the internet, the large number of PKI certificate authorities and relaxed attitude of browsers means that failures are less catastrophic. Moreover, people are accustomed to seeing websites become unavailable from time to time, but they don’t expect the same from apps on their own devices. macOS users were alarmed at how their apps could become collateral damage for an infrastructure issue at Apple. Yet this was an inevitable outcome arising from the fact that certificate verification depends on external infrastructure, and no infrastructure is 100% reliable.

Scott Helme also has concerns about the power that Certificate Authorities gain when certification revocation actually works effectively. Even if you aren’t bothered about the potential for censorship, there will be occasional mistakes and these must be weighed against the security benefits.

Update (2021-07-28): Howard Oakley:

In November 2020, Apple’s use of online OCSP checks came under fire, driving it to take immediate steps to protect privacy, and to state that certificate revocation checks will change in the following year to feature:

  • “a new encrypted protocol”;
  • “strong protections against” [OCSP] “server failure”;
  • “a new preference for users to opt out of these security protections”, which presumably means both hash lookup and certificate revocation checks.

As far as I’m aware, none of those three changes has yet been implemented, although there are only four months left before that year elapses.

sneak:

I’m the reason they “came under fire”, and they had been transmitting the app launches unencrypted back to Apple for two years already at that point. It of course continues today. It looks like it will continue in 12.x all next year, too.

Update (2021-08-13): Howard Oakley:

With less than three months to go to the end of that year, I can’t discover any further announcements from Apple that anything has changed, and by the end of November last year the trail runs cold. Apple revised that support article on 30 March 2021, but doesn’t appear to have altered anything of substance concerning its OCSP checks.

Of Apple’s four promises, removal of IP addresses from the OCSP servers should have happened immediately, and there appear to have been no further server outages, making it plausible that the service is now more robust.

Apple has made no announcement regarding the more difficult problem of introducing an encrypted protocol to protect revocation checks.

[…]

I think it’s time for Apple to provide an update on its progress in implementing the changes which it so publicly announced on 16 November 2020.

Update (2021-11-12): Howard Oakley:

In the normal run of macOS updates, we wouldn’t expect Monterey 12.1 for a month yet. Although its current beta-release apparently brings SharePlay, slightly delayed from the initial release, there’s no mention of a key feature which Apple promised us almost a year ago: the option to disable signing certificate checks with Apple’s OCSP servers. While this may not be at the top of everyone’s priorities, for many Mac users around the world it’s essential protection from prying state security services, and not a promise that Apple can renege on.

[…]

It’s the fourth promise which should be most obvious. I can see no change in Monterey 12.0.1 which provides a means for users to opt out of OCSP revocation checks. Perhaps Apple intends to introduce this in 12.1, but there’s no mention of it in the release notes. It’s also an important issue for those still using Big Sur. Given that Apple’s promise isn’t confined to any future release of macOS, the new user preference should surely also be implemented retrospectively in Big Sur as well, perhaps in its forthcoming 11.6.2 update.

Given the tens of thousands of engineers employed by Apple, and the apparent simplicity of this task, has Apple forgotten the promises it spells out so clearly in that support article, or has it no intention of doing what it still says it will? Perhaps you’d like to ask Apple whether it’s ever going to honour those promises.

13 Comments

It's worth pointing out that Microsoft also released a feature update this month. It took all of two minutes from start to finish. All my apps continued to launch just fine.

The whole thing is such a disappointment.

I doubt it, but the amount of online bad buzz might have Apple change that...

That explains why there was no "Mac" at the end of the commercial.

I don't really want to use OpenBSD as a desktop operating system, but...

It used to be Apple Computer, but now it's Apple's computer.

This is yet another example of Apple thinking everything is fine as long as it works ok for their employees in Apple Park who are all using super fast devices, on 10 gigabit connections, connected to only the newest and most expensive gear, with 100% server uptime, etc. They are terrible about engineering their software to account for flaky network connections, old hardware, and any kind of failure where the user might want to know WTF is actually happening. Millions of users sat down at their Mac to do some work today only to find out that it wasn't working right, they had zero idea why, and it wasn't their fault. That is inexcusable.

Another thing that irks me: By yesterday evening, the record of the Apple Pay, Apple Card, and some other outages was removed from Apple's status page. The day wasn't even over and Apple had already removed the log of what happened.

I recreated this behavior in Nov 2018, first with an Apple Support rep, then made a video to prove it, then actually went to the local Apple Store so a Genius could see it and document it. They seem to have no intention of changing their course.
Primarily focusing on audio and video, my workstations need to be on a “network” for sharing media etc, but there is no “internet” available unless I specifically permit. Once Mojave hit, this type of behavior started, and failures were actually easy to spot in the console log.
I need to omit the router address from the TCP stack settings, as well as avoid entering a DNS server address. These have their own impact, but at least apps launch.

https://support.apple.com/en-us/HT202491

This looks like an unsigned apology letter. It should be rejected by Gatekeeper.

[…] Thursday, users on Twitter and other social media platforms began complaining that their Mac computers were becoming unresponsive, hanging and unable to launch or install many […]

"We did have an alternative in macOS, which used to maintain a local database of revoked certificates, or so we suspect, until over a year ago. At the height of its use, that database was updated every couple of weeks. So if Apple revoked a certificate being used to sign malicious software, it could take another two weeks or more before that revocation had trickled down to all active Macs. One of the advantages of the newer OCSP approach"

I'm not sure where that's coming from, but I believe it may be a confusion between the Developer ID CA, which doesn't use a CRL, and the Apple WWDR CA, which does use a CRL. I tried to clarify this in my blog post: https://lapcatsoftware.com/articles/revocation.html
