Monday, July 8, 2019

Cloudflare Outage Caused by Regular Expression

John Graham-Cumming:

Unfortunately, one of these rules contained a regular expression that caused CPU to spike to 100% on our machines worldwide. This 100% CPU spike caused the 502 errors that our customers saw. At its worst traffic dropped by 82%.

We were seeing an unprecedented CPU exhaustion event, which was novel for us as we had not experienced global CPU exhaustion before.

Update (2019-07-15): John Graham-Cumming (Hacker News):

Although the regular expression itself is of interest to many people (and is discussed more below), the real story of how the Cloudflare service went down for 27 minutes is much more complex than “a regular expression went bad”. We’ve taken the time to write out the series of events that led to the outage and kept us from responding quickly. And, if you want to know more about regular expression backtracking and what to do about it, then you’ll find it in an appendix at the end of this post.
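To make "regular expression backtracking" concrete, here is a minimal sketch of a toy backtracking matcher that counts how many match attempts it makes. It is not Cloudflare's engine or the rule that caused the outage. The function name `count_steps` and the token-list pattern format are assumptions made for illustration, and the pattern `.*.*=` is only a simplified stand-in for the kind of expression that backtracks badly.

```python
def count_steps(pattern, text):
    """Match `pattern` against all of `text` by naive backtracking.

    `pattern` is a list of tokens: ".*" (any run of characters, greedy)
    or a single literal character. Returns (matched, steps), where
    `steps` counts calls to the recursive matcher.
    """
    steps = 0

    def match(pi, ti):
        nonlocal steps
        steps += 1
        if pi == len(pattern):
            return ti == len(text)
        tok = pattern[pi]
        if tok == ".*":
            # Greedy: try the longest run first, then back off one
            # character at a time until the rest of the pattern matches.
            for end in range(len(text), ti - 1, -1):
                if match(pi + 1, end):
                    return True
            return False
        # Literal token: must equal the current character.
        return ti < len(text) and text[ti] == tok and match(pi + 1, ti + 1)

    return match(0, 0), steps


if __name__ == "__main__":
    for n in (10, 20, 40):
        text = "x" * n  # no "=" anywhere, so every attempt fails
        _, one = count_steps([".*", "="], text)
        _, two = count_steps([".*", ".*", "="], text)
        print(f"n={n:3d}  one wildcard: {one:5d} steps  two wildcards: {two:5d} steps")
```

With one `.*`, the number of attempts grows linearly with the input length. With two adjacent `.*` tokens, the matcher retries every way of splitting the input between them, so the count grows roughly quadratically. Engines that avoid backtracking altogether do not have this failure mode, at the cost of not supporting some features such as backreferences.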

I wonder what the total wasted electrical power was during this event.

Probably 1/99999999999999999999th of the wasted power on decoding bitcoins. :P

