In Reed Hastings's book "No Rules Rules", he discusses how this contest was really a way to recruit top-quality engineering talent, which is a key assumption in how Netflix ran its culture: highly paid small teams with lots of freedom, trusted to know what's best from the ground up rather than top-down.
They didn't really need an algorithm in the first place, they just brainstormed what would sound the coolest to developers.
A little secret is that the content rights holders negotiated draconian streaming contracts that are outright prohibitive to Netflix. Netflix is trying to do everything possible to hide the fact that its streaming library is small and shrinking. In the old heyday of Netflix's mail-in red envelopes with DVDs, you were hard pressed to find a movie they did not have. These days your search returns mostly crap.
All the top talent they've acquired and money spent still can't save them from their complete garbage lineup of content they push out, all while limiting account access and raising prices.
Wait, so they knew _in advance_ that they weren't going to use the result? Before they even started the competition? Or is this retconning once they decided not to use it? I feel like I can't really trust Reed's word on it, since he has a very strong incentive to paint himself as a mastermind.
Everything that Netflix says or does externally with regard to tech appears to be for this purpose, including open-sourcing some things. It is all catnip for the HN crowd to come work for them.
So the entire premise by which they were “recruiting” was dishonest? No one sees any problems with lying to people? Dishonesty has apparently become the norm, not the exception. Honesty and trust are bedrock principles of society. It’s no wonder our society is in the state that it’s in today.
You really cannot trust anything anyone says, especially large companies. They basically always lie and manipulate and are at a constant advantage to those of us who find dishonesty morally wrong.
Ironically, Netflix's recommendation algorithm is now notoriously bad. In my experience, it seems to heavily push whatever "original" they just dumped $100 million into as well as what's most popular on the platform at the time. But it makes sense, since people increasingly rely on social media for deciding what to watch. A dollar Netflix spends improving their recommendation algorithm simply won't stack up to the dollar TikTok/Instagram/etc. spends improving theirs, because social media apps have so much more data to work with. It's probably more economical from their POV to let broad social media trends dictate what people watch.
I suspect their algorithm has success metrics other than "did the user enjoy the movie." For example, if the movie is a Netflix Original or exclusive, then it is more valuable to Netflix because viewers may discuss or recommend it, which leads to more signups. Similarly, they may prefer newer shows that are more likely to be discussed and raise hype over older movies that, even if the user loves them, are less likely to be advertised to others.
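As a toy sketch of that hypothesis, here is a ranking score that trades predicted enjoyment against business value. All names and weights are invented purely for illustration; nothing here reflects Netflix's actual system.

```python
def rank_score(predicted_enjoyment: float, is_original: bool,
               months_since_release: int) -> float:
    """Illustrative ranking score (all weights invented).

    Shows how a ranker could combine a predicted rating with
    business signals like exclusivity and recency.
    """
    score = predicted_enjoyment                # 0-5 star estimate
    if is_original:
        score *= 1.4                           # exclusives drive signups
    score += 1.0 / (1 + months_since_release)  # newer titles generate buzz
    return score

# Under these made-up weights, a mediocre new original can outrank
# a beloved old licensed film.
print(rank_score(3.0, True, 0) > rank_score(4.5, False, 24))  # True
```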
In my experience from the outside, it's difficult to separate their rec algorithm quality from the quality of their inventory. You're making the assumption they have a ton of great stuff that they're just not showing you, but they may not.
I wonder if the real goal is the secondary effect. If you watch a show on Netflix and enjoy it, that’s good for customer satisfaction. If you watch their exclusive original content, not only are you satisfied but now you are telling all your friends about something only Netflix can provide.
As someone who works in search relevance, having just a great algorithm isn't worth much.
You need the team and know-how with the maturity to maintain such an algorithm. Not just the ML skills, but all the bazillion ops, data quality, and other things that go around it.
I've worked with a lot of teams that have one smart person building stuff off to the side, in R or a Notebook, and then nobody knows how to productionize it. They try to throw the algorithm over the fence. Even if the team somehow succeeds in getting it into an A/B test, it eventually falls by the wayside, unless they can build the team and workflow around that person / algorithm / methodology.
> I've worked with a lot of teams that have one smart person building stuff off to the side, in R or a Notebook, and then nobody knows how to productionize it. They try to throw the algorithm over the fence. Even if the team somehow succeeds in getting it into an A/B test, it eventually falls by the wayside, unless they can build a team and workflow around that person / algorithm / methodology.
In my view, this marks a cultural failure - and certainly not on the part of the 'one smart person.'
Using a recommendation algorithm doesn't make sense for Netflix's new business model where they produce their own content. It should be easy: they greenlight or buy the rights to certain content for different demographics, to maximize the share of their user base that is satisfied enough to stay subscribed. If you're an 18-35 male, they've surely got people working daily to make sure there's a content pipeline coming just for you. They shouldn't need AI to tell me that; they just need to identify when I'm in the demographic of one of their upcoming shows and tell me about it. Maybe with a list. Still, they seem to be doing a bad job at it, as I never really know what new shows Netflix is making for me. Sometimes I find out about them years later.
For example, compare this to Disney+. I vaguely know about every Marvel and Star Wars thing 2 years in advance, and usually vaguely know what order they're coming out, and I don't even know how I know this, somehow I just know. Okay, that's easy because it's basically 2 IPs. But ultimately Netflix is doing something similar behind the scenes, why haven't they succeeded at making me aware of what content I'm supposed to be hyped for?
To be honest, I would be happy if their 20+ person UX team would create an experience where I can easily find what I was watching, instead of shoving stuff down my throat that I have no interest in.
Then again, as long as people pay for that experience, it will continue to be as unbearable as it is.
This feels like a disingenuous argument. “Young to middle-aged man likes Marvel and Star Wars” is such an obvious pick that it just about invites parody. As someone with at this point zero interest in Star Wars / Marvel content I a) wouldn’t retain the information that you did and b) wouldn’t find it to be that impressive that D+ was shoving it into my face.
To not understand the usefulness of recommendation algorithms almost feels intentionally contrarian.
The Netflix Prize competition (2006, completed in 2009) was a Kaggle competition before Kaggle competitions (2010), with the same business-side incentives.
Given the meteoric rise of ML/AI in the past few years, I'm surprised that Kaggle doesn't come up more often. It was all the rage 2013-2018... then the most I heard about it was that it allowed free access to TPUs.
I feel like Kaggle is a copycat wasteland for most projects now. You don't quite get top talent; rather, you get a bunch of people xgboosting their way up your leaderboard.
There have been quite a few interesting Kaggle competitions in recent years, as well as other interesting ML/data science competitions on other platforms.
Platforms like Kaggle, DrivenData, Zindi, AIcrowd, CodaLab and others are running dozens to hundreds of competitions a year in total, including ones linked to top academic conferences. One interesting recent one is on LLM efficiency, seeing to what extent people can fine-tune an LLM with just one GPU and 24 hours: https://llm-efficiency-challenge.github.io/
Or the Makridakis series of challenges, running since the 80s, which are a great testbed for time-series models (the 6th one finished just last year): https://mofc.unic.ac.cy/the-m6-competition/
Interesting that the answer is basically "because of akrasia". I wish more of these recommendation algorithms were based more off of who we'd like to be rather than how we behave.
Anyone who has experience with ML wouldn’t be surprised by that. Oftentimes ML competitions are about combining dozens of models together to juice the extra 0.01%—something that isn’t viable in a production environment as the quote in the article confirms.
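A minimal sketch of that blending step, on synthetic data with three fake base models (stand-ins for the SVD/kNN/RBM zoo the prize teams combined). The linear least-squares blend shown here is one common approach, not necessarily what any particular team used:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical holdout set: true ratings plus predictions from three
# base models, each noisy in its own way.
true = rng.uniform(1, 5, size=1000)
preds = np.stack([
    true + rng.normal(0, 0.9, size=true.size),
    true + rng.normal(0, 1.1, size=true.size),
    true + rng.normal(0, 1.0, size=true.size),
])

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Fit blend weights by least squares on the holdout, as the top teams
# did with hundreds of models instead of three.
weights, *_ = np.linalg.lstsq(preds.T, true, rcond=None)
blend = weights @ preds

# The blend beats every individual model, but only by a modest margin --
# which is exactly the kind of last-0.01% gain that rarely justifies
# production complexity.
print(rmse(preds[0], true), rmse(blend, true))
```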
AFAIK, modern ML challenges try to combat that by moving from answer-only submissions to code submissions and putting constraints on compute.
Hang on a sec, it looks like BellKor's Pragmatic Chaos and The Ensemble had the exact same Best Test Score and % Improvement. Did BellKor win because they submitted their solution 20 minutes earlier than The Ensemble? Or is that Best Submit Time something else?
Yes, from what I remember, the team that came second lost out due to something like six decimal places and the time difference in submission. It was pretty close!
Dr. David Belanger (RIP) was part of advising this team (BellKor) and knew all about this, but he never bragged about it until one day I stumbled upon it and asked him about his involvement.
Also, they definitely used code from the first two competitions in the company.
"The first year of the competition, in 2006 and 2007, the technical advancements that were made by us and the other big teams I think were really significant in the field of recommender systems," says Volinsky. He thinks the idea that Netflix didn't use the results is a misconception. "We gave them our code. They definitely did implement and use those breakthroughs that we made in the first year."
For those who are interested, I can speak about it to the best of my ability. https://ieeexplore.ieee.org/author/38180399800 (that has some things about him.) Essentially, he would say that they shelved it over time as it was not needed, but they definitely used it initially. Belanger tragically died November 18, 2022.
Streaming gives them much better information than a user's voluntary stars. Did you actually watch it? All of it? How soon? All at once?
User ratings are fraught. The things a user actually does are more likely to be sincere. And it doesn't put any burden on the user. I have a feeling that even the simple thumb up/down means less than whether you actually finished watching the movie.
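To illustrate, here is a sketch of turning those behavioral signals into a single implicit-feedback weight. The field names and weights are entirely made up; real systems learn such weightings from data rather than hand-coding them:

```python
from dataclasses import dataclass

@dataclass
class ViewingSession:
    """Hypothetical log record; fields are illustrative, not Netflix's."""
    watched_fraction: float  # 0.0-1.0 of runtime actually played
    days_after_release: int  # how soon after release they started it
    binged: bool             # finished it in one sitting

def implicit_score(s: ViewingSession) -> float:
    """Blend behavioral signals into one confidence weight.

    Completion is the core signal; promptness and binging add a
    little more. All weights here are invented for illustration.
    """
    score = s.watched_fraction
    if s.watched_fraction > 0.9:
        score += 0.5                              # actually finished it
    score += 0.3 / (1 + s.days_after_release / 7) # watched soon = stronger intent
    if s.binged:
        score += 0.4
    return score

abandoned = ViewingSession(watched_fraction=0.2, days_after_release=90, binged=False)
devoured = ViewingSession(watched_fraction=1.0, days_after_release=1, binged=True)
print(implicit_score(abandoned) < implicit_score(devoured))  # True
```

The point is that none of these signals cost the user any effort, unlike a star rating.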
Too bad they didn't use any of that information they have on me to recommend things I actually wanted to watch. They just kept pushing the same shit I never once displayed any interest in watching, over and over and over again.
Sounds like everybody is winning other than me. Though given I cancelled and never looked back, maybe I am winning after all.
Most companies that size (in evolving industries at least) should have 10-100 of those moonshot/R&D/marketing projects on the go. And I'm preaching to the choir here: surely most readers of thenextweb would be thinking the same.
Does anybody here know a lot about recommendation algorithms? The one I'm interested in seems obvious to me, but it's never implemented, so maybe it doesn't work, or it serves outliers rather than the mainstream, so why bother. The data is there, but I don't get to mine it. (heheh, "mine" it)
I don't want to know what's popular; I want to know what other people who share my past tastes are watching currently. No matter the size of my niche, why can't I find my niche and be shown my cohort?
I just don't feel it's an impossible task to find, for example, the set of people who enjoyed Deadpool 1 and Guardians of the Galaxy 1 but think the entire rest of the superhero cavalcade is utter garbage. OK, I'll allow Kick-Ass also, and a grudging nod to Iron Man 1. (As an illustration, I just searched to look up the name of Iron Man; I could only remember Tony Stark. Top search results were "a fictional character", "played by R Downey Jr", "Marvel cinematic universe"... You see, none of those is Iron Man 1. I don't care about the horde of people who care about a supposed cinematic universe. We (in my cohort) liked the first one, and what we liked about it wasn't continued in the 2nd.)
How about preferences that show I like a few early films with, say, George Clooney, but after that, no. Julia Roberts, same. I remember when Meryl Streep was a new actress, but now she's a red flag. I'm sure my tastes aren't shared by everybody, but I'm also sure, since I can articulate reasons, that there must be others who in general share them: people who are willing to watch what's new and trendy in some way, but then not beat the dead horse. Oh, Spielberg: a few winners, but overall "nope nope nope"; the way he aims at sentiment is a zero for me.
I originally had this idea back in the '80s, when Consumer Reports had their monthly mail-in bingo card for which current movies you liked, and they would tabulate what was popular among their readers. That type of "best of" list entirely misses covariance, which I think is where I live.
That's called collaborative filtering (https://en.wikipedia.org/wiki/Collaborative_filtering), and it is perhaps the most battle-hardened and effective approach in recommender systems. Even now, novel deep learning approaches implement the concept, but simple naive approaches are still as effective or more so. The first paper published on it specifically in the field was in the 90s, but the seeds of it go back to the 70s. It would have made a good thesis in the 80s :)
Part of why it's so effective is for the reasons you outline: it can find items that you'll probably like, even ones not similar to items you already like, based on people who have similar tastes to you.
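For flavor, here is a minimal user-based collaborative-filtering sketch on a toy ratings matrix (all data invented): predict an unseen rating as a similarity-weighted average over neighbors who rated that item.

```python
import numpy as np

# Toy ratings matrix: rows = users, cols = movies; 0 = unrated.
ratings = np.array([
    [5, 4, 0, 1, 0],   # target user
    [5, 5, 1, 1, 2],   # kindred spirit
    [1, 2, 5, 5, 4],   # opposite tastes
    [4, 5, 2, 0, 1],   # another near-neighbor
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity over the movies both users rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(ratings, user, movie):
    """Predict a rating as a similarity-weighted average of neighbors."""
    sims, vals = [], []
    for other in range(len(ratings)):
        if other != user and ratings[other, movie] > 0:
            sims.append(cosine_sim(ratings[user], ratings[other]))
            vals.append(ratings[other, movie])
    if not sims:
        return 0.0
    sims, vals = np.array(sims), np.array(vals)
    return float(sims @ vals / sims.sum())

# Predict the target user's rating for the unseen movie (column 2).
print(round(predict(ratings, user=0, movie=2), 2))  # → 2.21
```

Because the dissimilar user loved that movie and the similar users didn't, the prediction comes out low, which is exactly the "find my cohort" behavior described above.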
I remember seeing the leaderboard rankings before the prize was awarded. If I recall correctly, the top two teams were very close to the coveted goal. They ended up merging into one team and split the prize once they achieved the goal.
Engineers called it a "not worth my valuable time" kind of engineering effort. An early version of the 10x engineer (who refuses to do lots of meaningless work).
This prize/competition has become a meme in itself and inspired hordes and hordes of people to dabble with recommendation systems.
The advance of recsys is in big part thanks to Netflix, because the prize attracted a lot of people to the field; all companies started developing in-house RecSys, just like today you hear about in-house LLM/ChatGPT wrappers, etc.
I was just an outsider, but I guess at least I learned about passive levitation.
Husband.
Wife.
Husband and Wife.
Husband and 4th grader.
Wife and Pre-K.
Husband, Wife, 4th grader, and Pre-K.
4th grader.
Pre-K.
4th grader and Pre-K.
Each of those has a totally different viewing pattern and preferences.
https://www.thrillist.com/entertainment/nation/the-netflix-p...
The title to me sounds analogous to 'I didn't use that piece of code I wrote on the weekend (but I probably learned something from it)'.
If it were $10m, maybe it would be relevant? But the prize money is hardly material to their bottom line.
https://www.macrotrends.net/stocks/charts/NFLX/netflix/reven... ...
The same general methods are also relevant to other matching problems, like search.
Disclosure: Work at Google, but not on anything related to this course.
> But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine.
There are about 21 LatAm countries.