AWS Outage
Amazon (Reddit, Hacker News, 2, 3):
We are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region.
I like how, unlike Apple’s status page, you can see a history of outages and updates.
A major Amazon Web Services (AWS) outage took down multiple online services for several hours this morning, including Amazon, Alexa, Snapchat, Fortnite, ChatGPT, Epic Games Store, Epic Online Services, and more. Some of the impacted platforms, including Fortnite, Epic Games Store, and Perplexity had announced that they are fully recovered and back online earlier this morning, while others are still having issues.
The AWS dashboard first reported issues affecting the US-EAST-1 Region at 3:11AM ET, and eventually said that “The underlying DNS issue has been fully mitigated.”
I noticed this through problems with Amazon SES, which seemed to continue long after Amazon reported it as fixed. Also, the status page said the outage was confined to Northern Virginia, but I saw reports that other regions were affected, too.
This is the real problem. Even if you don’t run anything in AWS directly, something you integrate with will. And when us-east-1 is down, it doesn’t matter if those services are in other availability zones. AWS’s own internal services rely heavily on us-east-1, and most third-party services live in us-east-1.
It really is a single point of failure for the majority of the Internet.
Normally, my site and store will fail over to Mailgun, but this ran into two problems:
SES was not failing right away, so it wouldn’t try Mailgun until after some sort of timeout.
Mailgun failed with “Connection unexpectedly closed” errors. It’s unclear to me whether this is because part of their SMTP service relies on other AWS services that were also down.
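The two problems above come down to how quickly the primary provider is declared dead and what happens when the fallback also errors out. Here is a minimal Python sketch of that kind of failover, not the code my site actually runs. The names `smtp_sender` and `send_with_failover` are hypothetical, and the 5-second timeout is an assumed value chosen for illustration. The idea is to give each SMTP connection a short timeout so that a slow (rather than failing) primary doesn't block the fallback for long, and to collect every provider's error so that a fallback failure like "Connection unexpectedly closed" is reported instead of swallowed.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable, List, Sequence, Tuple

# A sender is any callable that delivers a message or raises on failure.
Sender = Callable[[EmailMessage], None]


def smtp_sender(host: str, port: int, user: str, password: str,
                timeout: float = 5.0) -> Sender:
    """Build a sender for one SMTP provider (hypothetical helper).

    The timeout (an assumed value) applies to the connection and to each
    blocking socket operation, so a hung provider fails in seconds.
    """
    def send(msg: EmailMessage) -> None:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.starttls()
            smtp.login(user, password)
            smtp.send_message(msg)
    return send


def send_with_failover(msg: EmailMessage,
                       senders: Sequence[Tuple[str, Sender]]) -> str:
    """Try each provider in order; return the name of the one that worked."""
    errors: List[str] = []
    for name, send in senders:
        try:
            send(msg)
            return name
        except OSError as exc:
            # Covers socket timeouts, refused connections, and
            # smtplib.SMTPException (a subclass of OSError), which includes
            # SMTPServerDisconnected.
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In use, you would pass something like `[("ses", smtp_sender(...)), ("mailgun", smtp_sender(...))]`, with each provider's own SMTP host, port, and credentials filled in. A short timeout only helps with the first problem; if the fallback depends on the same infrastructure that is down, the function still raises, but at least the error says which provider failed and how.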
See also: Dave Mark, Brian Webster, John Gruber, Ryan Jones, Christina Warren.
4 Comments
> It really is a single point of failure for the majority of the Internet.
It was a single point of failure big time. I had a small Amazon order scheduled for delivery yesterday. I happened to be awake at 3am EDT and suddenly realized they missed the "6-8 pm" promised delivery. No email, nothing. After making sure it wasn't delivered, I thought I'd cancel the order. I was logged in and one page said delivery would be today (10/20) but the status page for the order said 10/21. So I stupidly logged out, figuring I could log back in. Keep in mind, this is Amazon....
At first I couldn't get past the User ID page. I checked DownDetector and saw the spike. At least it was entertaining reading the comments. It was about an hour later that I could see things start coming back - at least I got through the password page - but couldn't get through the captcha, because after the first test it would reset no matter what. I finally did around 4:30. When I called Amazon to cancel the order (maybe an hour later) I was on hold for about 10 minutes. After the cancellation I wished the person on the other end a good day, explaining my experience, probably giving him a heads-up, as he said I was his first call....
Here's the last sentence from ABC News:
> Shares of Amazon ticked up 1.3% in midday trading, despite the outage.
Seems that EU governments have taken notice. Not like devs located in the EU haven't known or joked about the SPOF/dependency for ages.
Still a relatively minor incident when one considers what could happen had the region suffered a complete outage.
Yup, Amazon SES was definitely held up. Fortunately not critical and the retries were done automatically from my local MTA, but I only use it at all to get around "Policy" blocking. I shouldn't have needed to, and after I get a proxy up and running in Linode for all my SMTP traffic, I won't need Amazon anymore. But it just goes to show how much we depend on this shit, even when we're going out of our way to avoid it!