Thursday, January 17, 2019

How Facebook Keeps Messenger from Crashing on New Year’s Eve

Amy Nordrum (via Hacker News):

In addition to shifting loads, the Messenger team has developed other levers that it can pull “if things get really bad,” says Ahdout. Every new message sent to a server goes into a queue as part of a service called Iris. There, messages are assigned a timeout—a period of time after which, that message will drop out of the queue to make room for new messages. During a high-volume event, this allows the team to quickly discard certain types of messages, such as read receipts, to focus its resources on delivering ones that users have composed.
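The timeout mechanism described above can be sketched roughly as follows. This is a minimal illustration, not Iris itself: the article does not describe Iris's internals, so the class name `TimeoutQueue`, the message kinds, and the timeout values are all hypothetical. The sketch also assumes the timeout is looked up when a message is dequeued, so that shortening it takes effect for messages already waiting.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical per-kind timeouts in seconds; the article gives no real values.
DEFAULT_TIMEOUTS = {"user_message": 60.0, "read_receipt": 5.0}


@dataclass
class Message:
    kind: str
    payload: str
    enqueued_at: float


class TimeoutQueue:
    """FIFO queue that silently drops messages older than their kind's timeout."""

    def __init__(self, timeouts):
        self.timeouts = dict(timeouts)
        self._items = deque()

    def put(self, kind, payload, now):
        self._items.append(Message(kind, payload, now))

    def get(self, now):
        """Return the oldest unexpired message, or None if nothing is left."""
        while self._items:
            msg = self._items.popleft()
            if now - msg.enqueued_at <= self.timeouts[msg.kind]:
                return msg
            # Expired: discard it and keep looking.
        return None


queue = TimeoutQueue(DEFAULT_TIMEOUTS)
queue.put("read_receipt", "seen by bob", now=0.0)
queue.put("user_message", "happy new year", now=0.0)

# Under load, an operator could shrink the timeout for low-priority kinds.
queue.timeouts["read_receipt"] = 0.0
first = queue.get(now=1.0)
```

Here the read receipt is discarded and the composed message is delivered, which is the trade-off the passage describes.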

[…]

Georgiou says the group can also sacrifice the accuracy of the green dot displayed in the Messenger app that indicates a friend is currently online. Slowing the frequency at which the dot is updated can relieve network congestion. Or, the team could instruct the system to temporarily delay certain functions—such as deleting information about old messages—for a few hours to free up CPUs that would ordinarily perform that task, in order to process more messages in the moment.
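One way to picture "slowing the frequency at which the dot is updated" is a per-user throttle. The sketch below is only an assumption about how such a lever might look; `PresenceThrottle` and `min_interval` are hypothetical names, and the article does not say how Messenger implements this.

```python
class PresenceThrottle:
    """Allow at most one presence update per user every `min_interval` seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_sent = {}  # user_id -> time of the last update that was sent

    def should_send(self, user_id, now):
        last = self._last_sent.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; skip this update to save bandwidth
        self._last_sent[user_id] = now
        return True


throttle = PresenceThrottle(min_interval=10.0)
decisions = [throttle.should_send("alice", t) for t in (0.0, 5.0, 10.0)]
```

Raising `min_interval` during a traffic spike makes the dot less accurate but sends fewer updates.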

[…]

“You can bundle some of those together into a single large request before you send it downstream. Doing that, you reduce the computational load on downstream systems.”

Batches are formed based on a principle called affinity, which can be derived from a variety of characteristics. For example, two messages may have higher affinity if they are traveling to the same recipient, or require similar resources from the back end. As traffic increases, the Messenger team can have the system batch more aggressively. Doing so will increase latency (a message’s roundtrip delay) by a few milliseconds, but makes it more likely that all messages will get through.
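As a rough illustration of affinity-based batching, the sketch below groups messages by recipient, which is one of the affinity signals the article mentions. The function name `batch_by_recipient` and the `max_batch_size` knob are hypothetical; the real system presumably weighs other characteristics as well.

```python
from collections import defaultdict


def batch_by_recipient(messages, max_batch_size):
    """Group (recipient, payload) pairs into per-recipient batches.

    A larger max_batch_size means fewer, bigger downstream requests.
    """
    groups = defaultdict(list)
    for recipient, payload in messages:
        groups[recipient].append(payload)

    batches = []
    for recipient, payloads in groups.items():
        # Split each recipient's messages into chunks of at most max_batch_size.
        for start in range(0, len(payloads), max_batch_size):
            batches.append((recipient, payloads[start:start + max_batch_size]))
    return batches


pending = [("alice", "a1"), ("bob", "b1"), ("alice", "a2"), ("alice", "a3")]
light = batch_by_recipient(pending, max_batch_size=1)  # 4 downstream requests
heavy = batch_by_recipient(pending, max_batch_size=3)  # 2 downstream requests
```

Batching more aggressively cuts the number of downstream requests; in a real system the cost is the extra wait while a batch fills, which is the added latency the article refers to.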

1 Comment


Graceful degradation is a great way to handle high-traffic events.
