{"id":39253,"date":"2023-05-02T13:59:55","date_gmt":"2023-05-02T17:59:55","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=39253"},"modified":"2023-09-04T14:54:35","modified_gmt":"2023-09-04T18:54:35","slug":"google-cloud-services-outages","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2023\/05\/02\/google-cloud-services-outages\/","title":{"rendered":"Google Cloud Services Outages"},"content":{"rendered":"<p><a href=\"https:\/\/www.theregister.com\/2023\/04\/26\/google_cloud_outage\/\">Thomas Claburn<\/a> (<a href=\"https:\/\/news.ycombinator.com\/item?id=35732384\">Hacker<\/a> <a href=\"https:\/\/news.ycombinator.com\/item?id=35711349\">News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/www.theregister.com\/2023\/04\/26\/google_cloud_outage\/\"><p>Google Cloud stopped operating in Paris early on Wednesday morning local time due to &ldquo;water intrusion,&rdquo; said the off-prem biz, which a day earlier reported profitability for the first time.<\/p><p>[&#8230;]<\/p><p>&ldquo;Water intrusion in europe-west9-a led to an emergency shutdown of some hardware in that zone,&rdquo; the company&rsquo;s <a href=\"https:\/\/status.cloud.google.com\/incidents\/dS9ps52MUnxQfyDGPfkY\">status page<\/a> explains. &ldquo;There is no current ETA for recovery of operations in europe-west9-a, but it is expected to be an extended outage. Customers are advised to fail over to other zones in europe-west9 if they are impacted.&rdquo;<\/p><p>A short while later, the incident description changed to &ldquo;a multi-cluster failure and has led to an emergency shutdown of multiple zones.&rdquo;<\/p><p>[&#8230;]<\/p><p>Though more brief, the load balancing problems were far broader, affecting not just the europe-west9 zone but multiple zones in Asia, Australia, Europe, North America, and South America.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/GergelyOrosz\/status\/1651256082424012806\">Gergely Orosz<\/a> (via <a href=\"https:\/\/twitter.com\/drewthaler\/status\/1651329107488305153\">Drew Thaler<\/a>):<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/GergelyOrosz\/status\/1651256082424012806\">\n<p>I have questions. How does water intrusion into <em>one<\/em> data center take a whole zone (which should be multiple, physically separate and redundant DCs) offline?<\/p>\n<p>The point of availability zones is to avoid issues in one DC taking down the whole zone.<\/p>\n<p>Oh, I just see: an issue in one DC took down a whole region! So all AZs within that region are down.<\/p>\n<p>Wow, this is very bad: the point of AZs is exactly for this to not happen.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/joshuaseattle\/status\/1651256583563923457\">Joshua Burgin<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/joshuaseattle\/status\/1651256583563923457\"><p>Both Google and Microsoft don&rsquo;t guarantee that all zones are physically separate buildings or separated by at least &lt;x&gt; km\/miles. Many of their &ldquo;zones&rdquo; in smaller regions are just separate buildings by the same DC facility<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/dylantack\/status\/1651336707990769665\">Dylan Tack<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/dylantack\/status\/1651336707990769665\"><p>&ldquo;[AWS] AZs are <a href=\"https:\/\/aws.amazon.com\/about-aws\/global-infrastructure\/regions_az\/#US_West\">physically separated<\/a> by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other.&rdquo;<\/p><\/blockquote>\n\n<p id=\"google-cloud-services-outages-update-2023-09-04\">Update (2023-09-04): <a href=\"https:\/\/www.itnews.com.au\/news\/microsoft-had-three-staff-at-australian-data-centre-campus-when-azure-went-out-599849\">Ry Crozier<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=37378080\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/www.itnews.com.au\/news\/microsoft-had-three-staff-at-australian-data-centre-campus-when-azure-went-out-599849\">\n<p>Microsoft had &ldquo;insufficient&rdquo; staff levels at its data centre campus last week when a power sag knocked its chiller plant for two data halls offline, cooking portions of its storage hardware.<\/p>\n<p>[&#8230;]<\/p>\n<p>&ldquo;We have temporarily increased the team size from three to seven, until the underlying issues are better understood and appropriate mitigations can be put in place.&rdquo;<\/p>\n<\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Thomas Claburn (Hacker News): Google Cloud stopped operating in Paris early on Wednesday morning local time due to &ldquo;water intrusion,&rdquo; said the off-prem biz, which a day earlier reported profitability for the first time.[&#8230;]&ldquo;Water intrusion in europe-west9-a led to an emergency shutdown of some hardware in that zone,&rdquo; the company&rsquo;s status page explains. &ldquo;There is [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2023-05-02T17:59:58Z","apple_news_api_id":"bdbdef2a-3131-40a6-bb15-35b1ab39567c","apple_news_api_modified_at":"2023-09-04T18:54:38Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAA==","apple_news_api_share_url":"https:\/\/apple.news\/Avb3vKjExQKa7FTWxqzlWfA","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[672,1354,856,2190,96,50],"class_list":["post-39253","post","type-post","status-publish","format-standard","hentry","category-technology","tag-amazon-web-services","tag-google-cloud-platform","tag-microsoft-azure","tag-outage","tag-web","tag-webapi"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/39253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=39253"}],"version-history":[{"count":2,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/39253\/revisions"}],"predecessor-version":[{"id":40531,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/39253\/revisions\/40531"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=39253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=39253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=39253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}