{"id":49703,"date":"2025-10-20T16:52:23","date_gmt":"2025-10-20T20:52:23","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=49703"},"modified":"2025-10-28T09:56:46","modified_gmt":"2025-10-28T13:56:46","slug":"aws-outage","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2025\/10\/20\/aws-outage\/","title":{"rendered":"AWS Outage"},"content":{"rendered":"<p><a href=\"https:\/\/health.aws.amazon.com\/health\/status?ts=20251020\">Amazon<\/a> (<a href=\"https:\/\/old.reddit.com\/r\/aws\/comments\/1obd3lx\/dynamodb_down_useast1\/\">Reddit<\/a>, <a href=\"https:\/\/news.ycombinator.com\/item?id=45640838\">Hacker News<\/a>, <a href=\"https:\/\/news.ycombinator.com\/item?id=45646649\">2<\/a>, <a href=\"https:\/\/news.ycombinator.com\/item?id=45642951\">3<\/a>):<\/p>\n<blockquote cite=\"https:\/\/health.aws.amazon.com\/health\/status?ts=20251020\">\n<p>We are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region.<\/p>\n<\/blockquote>\n\n<p>I like how, <a href=\"https:\/\/developer.apple.com\/system-status\/\">unlike Apple&rsquo;s status page<\/a>, you can see a history of outages and updates.<\/p>\n\n<p><a href=\"https:\/\/www.theverge.com\/news\/802486\/aws-outage-alexa-fortnite-snapchat-offline\">Jess Weatherbed<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.theverge.com\/news\/802486\/aws-outage-alexa-fortnite-snapchat-offline\"><p>A major Amazon Web Services (AWS) outage took down multiple online services for several hours this morning, including Amazon, Alexa, Snapchat, <em>Fortnite<\/em>, <a href=\"https:\/\/status.openai.com\/incidents\/01K80CBJD5Z64DF82KGT3K3QE0\">ChatGPT<\/a>, <a href=\"https:\/\/x.com\/EOSStatus\/status\/1980194104115150858\">Epic Games Store, Epic Online Services<\/a>, and more. 
Some of the impacted platforms, including <em><a href=\"https:\/\/x.com\/FortniteStatus\/status\/1980249794502082948\">Fortnite<\/a><\/em>, <a href=\"https:\/\/x.com\/EOSStatus\/status\/1980226546607726916\">Epic Games Store<\/a>, and <a href=\"https:\/\/x.com\/AravSrinivas\/status\/1980239929189036222\">Perplexity<\/a> had announced that they are fully recovered and back online earlier this morning, while others are still having issues.<\/p><p>The AWS dashboard first reported issues affecting the US-EAST-1 Region at 3:11AM ET, and eventually said that &ldquo;The underlying DNS issue has been fully mitigated.&rdquo;<\/p><\/blockquote>\n\n<p>I noticed this through problems with Amazon SES, which seemed to continue <a href=\"https:\/\/mastodon.social\/@lapcatsoftware\/115406571187976968\">long<\/a> <a href=\"https:\/\/www.dailymail.co.uk\/sciencetech\/article-15208111\/Snapchat-Roblox-Duolingo-Fortnite-outage.html\">after<\/a> Amazon reported it as fixed. Also, the status page said the outage was confined to Northern Virginia, but I saw reports that other regions were affected, too.<\/p>\n\n<p><a href=\"https:\/\/news.ycombinator.com\/item?id=45645912\">caymanjim<\/a>:<\/p>\n<blockquote cite=\"https:\/\/news.ycombinator.com\/item?id=45645912\"><p>This is the real problem. Even if you don&rsquo;t run anything in AWS directly, something you integrate with will. And when us-east-1 is down, it doesn&rsquo;t matter if those services are in other availability zones. AWS&rsquo;s own internal services rely heavily on us-east-1, and most third-party services live in us-east-1.<\/p><p>It really is a single point of failure for the majority of the Internet.<\/p><\/blockquote>\n\n<p>Normally, my site and store will fail over to Mailgun, but this ran into two problems:<\/p>\n\n<ul>\n<li><p>SES was not failing right away, so it wouldn&rsquo;t try Mailgun until after some sort of timeout.<\/p><\/li>\n<li><p>Mailgun failed with &ldquo;Connection unexpectedly closed&rdquo; errors. 
It&rsquo;s <a href=\"https:\/\/status.mailgun.com\">unclear<\/a> to me whether this is because part of their SMTP service relies on other AWS services that were also down.<\/p><\/li>\n<\/ul>\n\n<p>See also: <a href=\"https:\/\/bsky.app\/profile\/davemark.com\/post\/3m3n3yv3ryk2y\">Dave Mark<\/a>, <a href=\"https:\/\/mastodon.social\/@bwebster\/115408068307301571\">Brian Webster<\/a>, <a href=\"https:\/\/daringfireball.net\/linked\/2025\/10\/20\/major-aws-outage\">John Gruber<\/a>, <a href=\"https:\/\/x.com\/rjonesy\/status\/1980127572601258140\">Ryan Jones<\/a>, <a href=\"https:\/\/x.com\/film_girl\/status\/1980195651070915008\">Christina Warren<\/a>.<\/p>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/05\/02\/google-cloud-services-outages\/\">Google Cloud Services Outages<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2017\/03\/06\/amazon-s3-outage\/\">Amazon S3 Outage<\/a><\/li>\n<\/ul>\n\n<p id=\"aws-outage-update-2025-10-21\">Update (<a href=\"#aws-outage-update-2025-10-21\">2025-10-21<\/a>): The cause of my Mailgun problem was, apparently, that they disable your account if you haven&rsquo;t logged in in a while. After logging into the Web interface, SMTP support was automatically reactivated.<\/p>\n\n<p><a href=\"https:\/\/www.theregister.com\/2025\/10\/20\/aws_outage_amazon_brain_drain_corey_quinn\/\">Corey Quinn<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=45649178\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/www.theregister.com\/2025\/10\/20\/aws_outage_amazon_brain_drain_corey_quinn\/\"><p>And so, a quiet suspicion starts to circulate: where have the senior AWS engineers who&rsquo;ve been to this dance before gone? 
And the answer increasingly is that they&rsquo;ve left the building &mdash; taking decades of hard-won institutional knowledge about how AWS&rsquo;s systems work at scale right along with them.<\/p><p>[&#8230;]<\/p><p>Once you reach a certain point of scale, there are no simple problems left. What&rsquo;s more concerning to me is the way it seems AWS has been flailing all day trying to run this one to ground. Suddenly, I&rsquo;m reminded of something I had tried very hard to forget.<\/p><p>[&#8230;]<\/p><p>You can hire a bunch of very smart people who will explain how DNS works at a deep technical level (or you can hire me, who will incorrect you by explaining that it&rsquo;s a database), but the one thing you can&rsquo;t hire for is the person who remembers that when DNS starts getting wonky, check that seemingly unrelated system in the corner, because it has historically played a contributing role to some outages of yesteryear.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/x.com\/alpennec\/status\/1980531015660834834\">Axel Le Pennec<\/a>:<\/p>\n<blockquote cite=\"https:\/\/x.com\/alpennec\/status\/1980531015660834834\"><p>Should we have a fallback to plain StoreKit in case RevenueCat, Superwall or Adapty are down? 
&#x1F914; <\/p><p>I guess apps that are only using StoreKit weren&rsquo;t affected by the AWS outage.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.dexerto.com\/entertainment\/aws-crash-causes-2000-smart-beds-to-overheat-and-get-stuck-upright-3272251\/\">Calum Patterson<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.dexerto.com\/entertainment\/aws-crash-causes-2000-smart-beds-to-overheat-and-get-stuck-upright-3272251\/\">\n<p>A major Amazon Web Services (AWS) outage on October 20 had the unexpected side effect of causing chaos in bedrooms across the US, as owners of Eight Sleep&rsquo;s $2,000+ &lsquo;Pod&rsquo; mattress covers found their smart beds had no offline mode and were stuck at high temperatures and odd positions in the night.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/writing.exchange\/@davepolaschek\/115408256669846725\">Dave Polaschek<\/a>:<\/p>\n<blockquote cite=\"https:\/\/writing.exchange\/@davepolaschek\/115408256669846725\">\n<p>The outage today reminded me of July 28, 1995, when <a href=\"https:\/\/www.skypoint.com\/members\/gimonca\/burnin.html\">almost all of Minnesota fell off the Internet<\/a>.<\/p>\n<\/blockquote>\n\n<p id=\"aws-outage-update-2025-10-22\">Update (<a href=\"#aws-outage-update-2025-10-22\">2025-10-22<\/a>): See also: <a href=\"https:\/\/arstechnica.com\/tech-policy\/2025\/10\/amazons-dns-problem-knocked-out-half-the-web-likely-costing-billions\/\">Ashley Belanger<\/a>, <a href=\"https:\/\/stratechery.com\/2025\/resiliency-and-scale\/\">Ben Thompson<\/a>, <a href=\"https:\/\/www.thebignewsletter.com\/p\/corporate-sludge-what-the-aws-outage\">Matt Stoller<\/a>.<\/p>\n\n<p id=\"aws-outage-update-2025-10-23\">Update (<a href=\"#aws-outage-update-2025-10-23\">2025-10-23<\/a>): <a href=\"https:\/\/newsletter.pragmaticengineer.com\/p\/what-caused-the-large-aws-outage\">Gergely Orosz<\/a>:<\/p>\n<blockquote cite=\"https:\/\/newsletter.pragmaticengineer.com\/p\/what-caused-the-large-aws-outage\">\n<p>Today, we look into what caused 
this outage.<\/p>\n<\/blockquote>\n\n<p id=\"aws-outage-update-2025-10-28\">Update (<a href=\"#aws-outage-update-2025-10-28\">2025-10-28<\/a>): <a href=\"https:\/\/www.theregister.com\/2025\/10\/27\/signal_ceo_meredith_whittaker_aws_dependency\/\">Thomas Claburn<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.theregister.com\/2025\/10\/27\/signal_ceo_meredith_whittaker_aws_dependency\/\"><p>Signal president Meredith Whittaker called attention to this massive dependency in <a href=\"https:\/\/mastodon.world\/@Mer__edith\/115445701583902092\">a thread<\/a> on the Mastodon social network, explaining how the concentration of power among cloud hyperscalers limits the options of services like Signal in terms of resiliency and network control.<\/p><p>Whittaker said that the concentration of power among cloud hyperscalers (AWS, Google, and Microsoft) is less widely understood than she expected, which bodes poorly for efforts to craft realistic strategies to change this dynamic.<\/p><p>She explained, &ldquo;The question isn&rsquo;t &lsquo;why does Signal use AWS?&rsquo; It&rsquo;s to look at the infrastructural requirements of any global, real-time, mass comms platform and ask how it is that we got to a place where there&rsquo;s no realistic alternative to AWS and the other hyperscalers.&rdquo;<\/p><\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Amazon (Reddit, Hacker News, 2, 3): We are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region. I like how, unlike Apple&rsquo;s status page, you can see a history of outages and updates. 
Jess Weatherbed: A major Amazon Web Services (AWS) outage took down multiple online services for several hours [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2025-10-20T20:52:25Z","apple_news_api_id":"f783e18e-bdb6-4446-9ee9-660f65fbd398","apple_news_api_modified_at":"2025-10-28T13:56:49Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAw==","apple_news_api_share_url":"https:\/\/apple.news\/A94Phjr22REae6WYPZfvTmA","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[602,672,728,2468,2190,96,50],"class_list":["post-49703","post","type-post","status-publish","format-standard","hentry","category-technology","tag-amazon-ses","tag-amazon-web-services","tag-domain-name-system-dns","tag-mailgun","tag-outage","tag-web","tag-webapi"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/49703","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=49703"}],"version-history":[{"count":5,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/49703\/revisions"}],"predecessor-version":[{"id":49805,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v
2\/posts\/49703\/revisions\/49805"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=49703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=49703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=49703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}