
Twitter Should Open Up the Algorithm

71 points | dshipper | 4 years ago | every.to

136 comments

[+] gfodor|4 years ago|reply
Recommender systems like Twitter are as much about data as they are about code. Without the dataset and the derived statistics and models that are used for ranking and recall, the code is not going to help much with transparency.
[+] sokoloff|4 years ago|reply
Exactly. “Here’s the code for our matrix multiplication, dot product, and sorting; you’re welcome to check it for bugs.”
[+] sobkas|4 years ago|reply
> Recommender systems like Twitter are as much about data as they are about code. Without the dataset and the derived statistics and models that are used for ranking and recall, the code is not going to help much with transparency.

And the other way around as well: the data depends on the algorithms and tools that created it. You can't have an insightful understanding of the data without knowing how and why it was created.

[+] cycrutchfield|4 years ago|reply
Right. It’s a bit worrisome that people (even people in this very thread) know nothing about how recommendation and ranking work but think they know everything. “Release the code” is nonsensical.
[+] nopenopenopeno|4 years ago|reply
This is a ridiculous claim. Even if you can't run the code yourself, it would still give insight into the inputs in relative terms, and show which methods are not being applied.
[+] soheil|4 years ago|reply
I think that's what everyone is talking about. They're just using the term algorithm colloquially.
[+] djanogo|4 years ago|reply
Oh, you mean the matrix came up with a rule to ban people talking about the lab leak? But the matrix can't figure out simple crypto bots and needs humans to report and manually ban them?
[+] photochemsyn|4 years ago|reply
I think this is a great idea, and it shouldn't be limited to Twitter. YouTube's recommendation algorithms should be handled similarly; indeed, any public-facing system needs to be honest about the kind of spin it's putting on its recommendation algorithms. Even better, allow users to spin the dials when it comes to their own particular searches/results.

> “One of the things I believe Twitter should do is open source the algorithm and make any changes to people’s tweets — you know, if they’re emphasized or de-emphasized — that action should be made apparent so anyone can see that action’s been taken. So there’s no sort of behind-the-scenes manipulation either algorithmically or manually.” Musk also added later, “The code should be on GitHub so people can look through it.” (CNBC interview)

[+] yosito|4 years ago|reply
> allow the users to spin the dials

If I could pick something to regulate about recommendation algorithms, probably a big part of it would be mandating consumer controls over the algorithm. Complete transparency might be hard when sometimes even the programmers of these systems don't have the ability to see why they recommend what they do, but there should definitely also be mandatory oversight by consumer protection agencies that have full ability to audit the code and report on it.

[+] jimmaswell|4 years ago|reply
Sounds like a bad idea. Bad actors would immediately exploit it and figure out how to most effectively flood everyone with spam, same as if Google's exact algorithms were public. Not everything has to be open.
[+] LightHugger|4 years ago|reply
On occasion, people make the same argument about all open source software, that making it public makes it insecure.

Fortunately this has not turned out to be the case in reality. If there are flaws in the system they will probably be easier to find and fix with more eyes on it.

[+] matheweis|4 years ago|reply
Not if, as the article suggested, there were multiple algorithms to choose from.

Especially not if the top choices were fundamentally different from each other.

It may go so far as to make it significantly more difficult for bad actors to operate.

[+] memish|4 years ago|reply
Is that what happened with Linux and Mastodon?
[+] soheil|4 years ago|reply
Yes it'd be a cat and mouse game and that's how you build an efficient and fair algorithm. Not too unlike how the market decides what the price should be for a certain instrument. Lots of buys, lots of sells and where they meet is called the price. All in the open for anyone to see.

Let's stop the Communistic mentality around keeping things hidden. Please read about Soviet Russia, about Siberia, and how Russians had no idea about the miseries inflicted on them for decades.

[+] dkobia|4 years ago|reply
The most contentious moderation happens via subjective human intervention rather than algorithms. Fully machine-driven moderation is a black hole into which the R&D labs of the biggest social media companies have been pouring copious amounts of time and money for years now.
[+] ankit219|4 years ago|reply
But the article is not really talking about content moderation. It's about surfacing the right content for the right people, and regaining the trust the author believes Twitter has lost, without really justifying how.
[+] soheil|4 years ago|reply
> The most contentious moderation happens via subjective human intervention rather than an algorithms.

1. Source?

2. Thought experiment: if said subjective human intervention was recorded and codified into an algorithm it wouldn't be as contentious?

[+] jimkleiber|4 years ago|reply
Or just give us more options for viewing a feed than "recommended" or "latest."

Heck, even create an algorithm store where people could create and purchase different ways to view their feed.

It doesn't seem that difficult to add these multiple sort and filter options, but maybe it's more complex than I imagine.

[+] alm1|4 years ago|reply
I feel like authors are thinking about the "algorithm" almost as if it's just a bunch of if-else statements in a tree.

A read-only audit is an overly naive way to think about understanding complex algorithms. In an audit they would find an ensemble of large DNN transformer models with thousands of layers and thousands of features, often with their own transformation trees. There are entire CS departments dedicated to researching tools to understand complex nonlinear models. You definitely can't do it with a read-only code audit. You can _barely_ do it with full access to the model, full access to the input data, and the ability to retrain and rerun the model on that data.

[+] CrimsonCape|4 years ago|reply
You are suggesting the people who read HN and might click over to the "open Twitter" codebase won't understand it?

Engineers don't exist in a vacuum, the same engineers who write the algos are here on HN. Show HN the Twitter source code and a lot of engineers will understand it.

What is more likely is that the engineers are currently bound by NDAs and can't say what influences the algo.

[+] lazzlazzlazz|4 years ago|reply
Comments saying that this is naive seem to be completely missing the point in their desire to cynically defend a miserable state of affairs: clients, and not Twitter's servers, should ultimately determine what users see. Like the web browser or an email client.

The fact that exactly what users see on their devices is computed on servers is a result of the architectural constraints of the time and, more importantly, of the ad model of monetizing social networks.

If social networks were not monetized in this way, there would be far more power allocated to clients and APIs.

This isn't to say user devices can do everything: they can't. But they can easily be given significant power to filter, reorder, and request different content, and with more advanced engineering, allow users to parameterize feeds.

The reason we can't have this is that allowing this degree of user choice undermines the ads model.

[+] shalmanese|4 years ago|reply
> clients, and not Twitter's servers, should ultimately determine what users see. Like the web browser or an email client.

That would involve giving clients access to information that clients probably shouldn't have. eg: If a part of the weighting for recommendation is that people you follow who regularly DM other people you follow should be weighted higher, doing it client side would allow you to see other people's private DM information.

[+] 310260|4 years ago|reply
The author's suggestion of a marketplace for algorithms sounds really cool. Whether it's technically feasible is another question entirely, but that level of customization would be great. Some of the functionality is already there with Lists (which let regular users create feeds that others can subscribe to).

I do worry about the political implications of it, though. People choosing to subscribe only to their own world view would create an impenetrable echo chamber. At least now there is some crossover. If people chose algorithms that avoided the other side of a debate, it would make things worse. Furthermore, you could have outside influences steering people toward certain algorithms to lock a populace into their beliefs.

[+] alvaroir|4 years ago|reply
If the problem is that the "recommendations" are biased, wouldn't it be simpler to:

1) Remove the recommendation part. Go back to a simpler version of Twitter.

2) Still give the option for a recommendation system but, somehow, open the code, maybe some version of the data, and publish documentation (like papers or something) detailing the training process of the current running version.

than building a recommender system marketplace? Also, in what sense would this be different from allowing third-party Twitter clients plus opening a richer API?

The ultimate idea of using ML is to "automatically" build the recommender system the user likes most (measured with some particular metric like online time or retention) and to automatically adapt it as their preferences change. The problem, to me, is more about which metrics are chosen to be optimized.

However, I believe that in the end, and in order to be profitable, user retention and time on the platform will still be pursued. It doesn't seem like an easy fix to me.

Regarding the "free speech" part, I'm not an expert, but I'd say (and after having watched the TED interview) that countries' legislation will considerably constrain this.

I love the idea of a true free platform tho

[+] eric4smith|4 years ago|reply
Amazingly Naive.

Just the data and code needed to make the “algo” scale to what it is means there is no good way to “open it up”.

But let’s say, and why not, that it actually was opened.

I can see it now, everyone and their mother would be recommending changes to it. People would want it at the extremes, or tweaked to just not recommend their pet peeves.

And if they did not get their way, they would go off in a huff and threaten to join another service.

Years ago I ran a big enough service that had millions and millions of users every month. A member of the military came on and demanded we allow him to do something that was against TOS.

He complained that this was what he fought in Iraq for. So he could write anything he wanted anywhere.

America is certainly not free and it’s time we stop giving our content to services like Twitter and Facebook if we believe that.

[+] extheat|4 years ago|reply
> I can see it now, everyone and their mother would be recommending changes to it. People would want it at the extremes, or tweaked to just not recommend their pet peeves.

I think the solution to this is to let people switch between different recommendation algorithms: a stream of chronological tweets, most liked among recently posted, one powered by ML, etc. I don't see any reason the implementations behind this, or even Twitter more broadly, could not be open source. There are many other open source projects, like Signal, that manage this just fine. And for a while even Reddit was open source. It's definitely doable.
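A minimal sketch of what switchable ranking could look like (all names here are hypothetical; a real system would rank server-side over a much larger candidate set):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

@dataclass
class Tweet:
    text: str
    likes: int
    posted_at: datetime

# Each "algorithm" is just an ordering key; the user picks one by name.
RANKERS: dict[str, Callable[[Tweet], float]] = {
    "chronological": lambda t: t.posted_at.timestamp(),
    "most_liked": lambda t: float(t.likes),
}

def build_feed(tweets: list[Tweet], ranker: str) -> list[Tweet]:
    # Same tweets, different presentation order, chosen by the user.
    return sorted(tweets, key=RANKERS[ranker], reverse=True)
```

An ML-powered ranker would slot into the same table as just another key function, which is what makes the "let users switch" idea structurally simple even if the individual rankers are not.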

[+] bko|4 years ago|reply
I think they're looking for a statement like:

if tweet.author in bad_users: score *= 0.8

Do they have the levers in place to manipulate results or is it a clear objective scoring system that shows what you see on your feed and search results? I believe they have something like this. To what extent its being manipulated is a separate question.

[+] admax88qqq|4 years ago|reply
The interesting question this poses, and which has come up in a few comment threads here, is:

Is "security through obscurity" necessary for automated content moderation?

Would an open algorithm be trivially gamed by spammers, since they could then test offline exactly how their posts will be ranked/promoted?

My gut says yes but I'm not an expert in this area. Curious if anyone has a theory or idea on if an open moderation algorithm could work.

SpamAssassin exists and is open source, with moderate success. But is that just because its use is not widespread enough for spammers to bother testing their spam mails against it?

If every email account in the world were covered by SpamAssassin, what would spam look like, and how much would make it through?

[+] bko|4 years ago|reply
The twitter algo shouldn't be complicated. For the most part twitter shows me posts and likes from people I follow, occasionally sprinkling in a topic I follow, like viral tweets.

I want to read stuff from people I follow in more or less chronological order. Sure, if something has 1k likes and was posted a few hours ago and I hadn't seen it, show me that. There are simple formulas for time-based rank out there.

But I regularly see twitter suppressing posts from people I follow. I won't see someone tweet for a few weeks and I check their account and I see they've been tweeting this whole time. It's wrong and annoying to me as a user.
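One such time-decay formula is a sketch in the spirit of HN-style gravity ranking (the constants here are illustrative, not anyone's production values):

```python
def time_decayed_score(likes: int, age_hours: float, gravity: float = 1.5) -> float:
    # Older posts need ever more likes to outrank fresh ones;
    # `gravity` controls how quickly they fade from the feed.
    return likes / (age_hours + 2) ** gravity
```

Under this shape, a 1k-like tweet from a few hours ago still comfortably outranks a just-posted tweet with a handful of likes, which is roughly the behavior the comment above asks for.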

[+] jmugan|4 years ago|reply
I wish they'd just let me control what I see. Sure, throw an occasional ad in there to make money. It would be hard to blame them for the nastiness on Twitter if they didn't control it.
[+] andrew_|4 years ago|reply
Make moderation guidelines and moderation audit trails public and open.
[+] rco8786|4 years ago|reply
Not sure there is anything to gain here. Moderation is inherently subjective, especially at the scale of Twitter. The same people would just continue arguing about what should and should not be moderated and how.
[+] ItsMonkk|4 years ago|reply
Both filtration (moderation) and sortation (the algorithm) should be handled exactly as Adblock handles it. People can create any list they want, others can choose whether or not to subscribe to those lists, and anyone can at any time see what their feed would look like without them.
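A sketch of that subscription model (hypothetical names): each list is just a predicate, and a user's feed is whatever survives every list they've subscribed to.

```python
from typing import Callable

Post = dict
FilterList = Callable[[Post], bool]  # returns True if the list flags the post

def apply_subscriptions(posts: list[Post], lists: list[FilterList]) -> list[Post]:
    # A post is hidden if any subscribed list flags it;
    # unsubscribe from the list and the post reappears.
    return [p for p in posts if not any(flag(p) for flag in lists)]

# Example community-maintained list a user could opt into:
crypto_spam = lambda p: "giveaway" in p["text"].lower()
```

As with Adblock, the moderation policy lives in the list, not the platform, so "what was hidden and why" is inspectable by construction.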
[+] mdb31|4 years ago|reply
Yes, like the lobste.rs moderation log. It's cute because, like, nobody cares.
[+] dontreact|4 years ago|reply
Lots of good points here about how opening up the algorithm itself won't illuminate much and is technically extremely difficult without potentially devaluing the company since you would also have to open the data system behind the algorithm.

However, open sourcing any and all manual interventions over the algorithm + the guidelines used for evaluation and/or labeling (if any is done), would help to build a little bit of trust.

Not that much though, but it would be a start.

[+] mritun|4 years ago|reply
Let’s start with HackerNews.

What would you say about opening up every poster who’s been blocked, and exactly how and for what reason (or keywords) they’ve been blocked?

How about opening up what keywords trigger mail to go to spam vs inbox for email providers? It’s going to be very valuable for someone to know how spam filtering works if their delivery rate doubles!

[+] soheil|4 years ago|reply
So by the same logic McDonald's should publish its secret sauce recipe. There is business value in keeping it hidden. It'd have a chance of happening if Twitter was taken private by someone with the intention of experimentation and improving the world in the long run.
[+] rufus_foreman|4 years ago|reply
McDonald's is required in some places (Canada, for example) to publish the ingredients for its food. For example, Big Mac Sauce:

Soybean oil, sweet relish (cucumber, glucose-fructose, sugar, vinegar, salt, xanthan gum, calcium chloride, natural flavour), water, vinegar, egg yolk, onion powder, spices, salt, propylene glycol alginate, colour, sugar, garlic powder, hydrolyzed (corn, soy, wheat) proteins. CONTAINS: Soy, Wheat, Egg, Mustard.

-- https://www.mcdonalds.com/ca/downloads/IngredientslistCA_EN....

And there are strict rules about how that ingredients list must be constructed:

"Ingredients must be declared by their common name in descending order of their proportion by weight of a prepackaged product. The order must be the order or percentage of the ingredients before they are combined to form the prepackaged product. In other words, based on what was added to the mixing bowl"

and on and on for many thousands of words, https://inspection.canada.ca/food-label-requirements/labelli...

No, that's not the full recipe, but it is information a company might not want to disclose, but is required to disclose because it affects the health and safety of consumers.

[+] CharlesW|4 years ago|reply
"Open up the algorithm" is such a charmingly naive take on community moderation at scale that I can almost forgive the author for literally parroting Musk as the source of a lightweight take synthesized mostly from other people's thoughts in order to attract attention to his "Every" newsletter platform.
[+] bko|4 years ago|reply
I think that's a way to call for transparency. For instance, if they say they don't shadow ban people, then it should be evident in the code. And it would force them to have a clear process for things like search without any "nudging" or social manipulation.
[+] dmarcos|4 years ago|reply
What, in your opinion, makes the request naive? Too complex to open source, or for people to analyze and criticize? I could see some people becoming Twitter algorithm experts, just as we have Linux kernel developers.
[+] soheil|4 years ago|reply
How do you know? Where did you come up with evidence to confirm that statement? There has never been a platform as large as Twitter that opened up its algorithm.
[+] dmitriid|4 years ago|reply
Twitter et al should/must let people opt-in to recommendation algorithms (opt-out being the default), and must have a "reset all recommendations" button present and visible at all times.

I don't care about the actual algorithm.