Twitter’s Recommendation Algorithm
Twitter (Hacker News, Slashdot):
At Twitter 2.0, we believe that we have a responsibility, as the town square of the internet, to make our platform transparent. So today we are taking the first step in a new era of transparency and opening much of our source code to the global community.
[…]
We also took additional steps to ensure that user safety and privacy would be protected, including our decision not to release training data or model weights associated with the Twitter algorithm at this point.
Twitter (tweet, Hacker News):
Twitter aims to deliver you the best of what’s happening in the world right now. This requires a recommendation algorithm to distill the roughly 500 million Tweets posted daily down to a handful of top Tweets that ultimately show up on your device’s For You timeline. This blog is an introduction to how the algorithm selects Tweets for your timeline.
[…]
The foundation of Twitter’s recommendations is a set of core models and features that extract latent information from Tweet, user, and engagement data. These models aim to answer important questions about the Twitter network, such as, “What is the probability you will interact with another user in the future?” or, “What are the communities on Twitter and what are trending Tweets within them?” Answering these questions accurately enables Twitter to deliver more relevant recommendations.
The recommendation pipeline is made up of three main stages that consume these features:
- Fetch the best Tweets from different recommendation sources in a process called candidate sourcing.
- Rank each Tweet using a machine learning model.
- Apply heuristics and filters, such as filtering out Tweets from users you’ve blocked, NSFW content, and Tweets you’ve already seen.
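As a rough illustration of the three stages listed above, here is a minimal Python sketch. It is not Twitter's code (which is mostly Scala); the `Tweet` type, the `recommend` function, and the toy sources and scores are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Tweet:
    # Hypothetical, heavily simplified stand-in for a real Tweet record.
    tweet_id: int
    author: str
    nsfw: bool = False


def recommend(
    sources: Iterable[Callable[[], Iterable[Tweet]]],
    score: Callable[[Tweet], float],
    blocked: Set[str],
    seen: Set[int],
    limit: int = 3,
) -> List[Tweet]:
    # Stage 1: candidate sourcing. Merge all sources, de-duplicating by id.
    candidates = {}
    for source in sources:
        for tweet in source():
            candidates.setdefault(tweet.tweet_id, tweet)

    # Stage 2: ranking. `score` stands in for the machine learning model.
    ranked = sorted(candidates.values(), key=score, reverse=True)

    # Stage 3: heuristics and filters (blocked authors, NSFW, already seen).
    visible = [
        t for t in ranked
        if t.author not in blocked and not t.nsfw and t.tweet_id not in seen
    ]
    return visible[:limit]


def in_network() -> List[Tweet]:
    return [Tweet(1, "alice"), Tweet(2, "bob")]


def out_of_network() -> List[Tweet]:
    return [Tweet(2, "bob"), Tweet(3, "carol", nsfw=True),
            Tweet(4, "dave"), Tweet(5, "erin")]


# Made-up scores; a real ranker would predict engagement probabilities.
toy_scores = {1: 0.2, 2: 0.9, 3: 0.8, 4: 0.5, 5: 0.4}

timeline = recommend(
    [in_network, out_of_network],
    score=lambda t: toy_scores[t.tweet_id],
    blocked={"alice"},
    seen={4},
)
print([t.tweet_id for t in timeline])  # -> [2, 5]
```

Filtering after ranking mirrors the order in which the stages are listed; a real system could reasonably filter earlier to avoid scoring Tweets that will be dropped anyway.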
The (mostly Scala) code is here.
The new stuff in the Twitter algorithm is wild.
“author_is_elon”, “author_is_republican”, “author_is_democrat”, etc., are explicit terms that are special cased.
Likes, then retweets, then replies
Here are the ranking parameters:
- Each like gets a 30x boost
- Each retweet a 20x
- Each reply only 1x
It’s much more impactful to earn likes and retweets than replies.
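To make the quoted weights concrete, here is a small Python sketch. It assumes the boosts combine as a simple weighted sum, which the quote does not actually state; the constant and function names are hypothetical.

```python
# Weights as quoted above. Combining them linearly is an assumption.
LIKE_WEIGHT = 30.0
RETWEET_WEIGHT = 20.0
REPLY_WEIGHT = 1.0


def engagement_score(likes: int, retweets: int, replies: int) -> float:
    """Weighted sum of engagement counts (illustrative only)."""
    return (
        LIKE_WEIGHT * likes
        + RETWEET_WEIGHT * retweets
        + REPLY_WEIGHT * replies
    )


# Ten likes outweigh ten replies by a factor of 30 under this model.
print(engagement_score(likes=10, retweets=0, replies=0))   # -> 300.0
print(engagement_score(likes=0, retweets=0, replies=10))   # -> 10.0
```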
Images & videos help
Both images and videos lead to a nice 2x boost.
Links hurt, unless you have enough engagement
Generally external links get you marked as spam.
There were some very funny things in there that have now been deleted, but as of this writing, this one remains, as does one requesting a printed version of the repository.
Mysk:
What is this WTF thing that Twitter's algorithm refers to?
Mysk:
Things no longer restricted on Twitter
Update (2023-04-04): Mysk:
Oh, the number of unfollows might result in “shadow-banning” a Twitter account. The Twitter algorithm shadow-bans an account based on these 5 negative signals:
1- Blocks
2- Mutes
3- Abuse reports
4- Spam reports
5- Unfollows
Update (2023-04-24): Arvind Narayanan (via Hacker News):
It turns out to be a standard engagement prediction algorithm of the kind most major platforms use; I explained how these algorithms work in a recent essay. The source code release makes for an interesting case study of social media transparency. Let’s talk about what the code does and doesn’t reveal, from the perspective of trying to understand information propagation and algorithmic amplification on social media.
4 Comments
Elon tweeted about having the "author_is_elon" code removed. Seems like he was unaware of it all. See here:
https://twitter.com/elonmusk/status/1641908130274525187
Also this:
https://twitter.com/elonmusk/status/1628122949185159168
https://twitter.com/elonmusk/status/1521524585090277379
https://twitter.com/elonmusk/status/1454809318356750337
https://twitter.com/elonmusk/status/1525739780323016704
"What is this WTF thing that Twitter's algorithm refers to?" depending on context, probably "Who to Follow"
"Seems like he was unaware of it all"
He told his engineers to make sure his tweets got promoted by the algorithm. Everybody knew even before this release that his tweets got special treatment. I'm genuinely confused by Elon now claiming he didn't know about this. Maybe he didn't know specifically how the code would look, but he definitely must have known that his own tweets got special treatment.
I'm fairly certain that in Baby Boss's mind it went something like this.
1 My tweets are the best
2 Therefore my tweets should be trending all the time
3 They aren't
4 The algo is bad at finding the best tweets
So the job, in his mind, was to correct the algorithm, which was obviously broken.
To the overworked staff that is left in his The Office larp there was only ever one solution though.