There are, I believe, three main reasons why the recommendations are so poor on YouTube:
a) YouTube doesn't know anything about the content itself; it can only use metadata.
b) The algorithm itself is biased towards creators who post often and keep users hooked the longest, which is almost always vloggers (ask any animator what they think of YouTube).
c) Many recommendation systems today create lots of buckets, and once you watch something from one bucket (showing your intent), the algorithm will focus on that bucket only. (You can see this working extremely poorly on Amazon, which tries to sell you a fridge right after you've bought a fridge.)
It's very hard to build a great recommendation system (look at Spotify's Discover Weekly), but because this is the 101 material of any machine learning course, it's the primary thing that companies refuse to outsource (I built a company around it and failed badly).
I've found that the YouTube recommendations do a good job of picking a "next" video to watch, but an exceptionally poor job of constructing the front page.
If I watch "Some Video (part 1)", the recommendations reliably pick "Some Video (part 2)" next, with the other parts as the other related videos and similar content further down. If I watch a random video from a particular channel, the recommendations show more videos from that channel. If I watch a video of a particular game or a reaction to a given episode of a show, the recommendations show more videos of that game or more reactions to that same episode. If I listen to music by a particular artist, the recommendations show more music by that artist.
On the other hand, the front page consistently shows me 1) old videos I've already seen, 2) collections of highly viewed content that I have no interest in, even when I've already hit the "not interested" X on it, and 3) popular videos by channels I already subscribe to (I don't want to know what's popular, I want to know what's new).
YouTube does have automatic transcription for videos. It's not too hard to link this to a topic hierarchy (maybe they already do). It seems like a hard problem at their scale, though, since unlike Spotify, the list of genres isn't knowable in advance.
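For what it's worth, the linking step can start out dead simple: score transcript text against a hand-built topic taxonomy by keyword overlap. A toy sketch, with an invented taxonomy and transcript (real systems would use far larger taxonomies and learned classifiers):

```python
# Toy topic taxonomy: topic -> indicative keywords. Everything
# here is made up for illustration.
taxonomy = {
    "cooking":      {"recipe", "oven", "simmer", "dough"},
    "home repair":  {"drywall", "stud", "caulk", "leak"},
    "music theory": {"chord", "scale", "cadence", "tempo"},
}

def tag_transcript(transcript, taxonomy):
    """Score each topic by how many of its keywords appear
    in the (auto-generated) transcript; return the best hit."""
    words = set(transcript.lower().split())
    scores = {topic: len(words & kws) for topic, kws in taxonomy.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

transcript = "today we find the stud patch the drywall and caulk the seam"
print(tag_transcript(transcript, taxonomy))
```

Keyword overlap obviously breaks down on synonyms and niche vocabulary, which is where the entity-recognition approach mentioned below comes in.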
I've been building a search engine for lectures as a research project. For a small list of videos, I find that browsing a topic taxonomy is really nice compared to recommenders that try to guess your intent.
There are commercial systems for automatically tagging text (e.g. Watson), but their hierarchies don't go into niche areas; the Watson taxonomy tagger, for example, does 1,000 tags.
For more niche topics, I've explored Watson's entity recognition system, e.g. to recognize the names of diseases. The advantage is that it picks up terms it hasn't seen before; the problem is that you can only identify entities that someone has trained a system to recognize.
The UI challenges are interesting as well. If Spotify identified 100 genres that interested me, they could pick any arbitrary subset of playlists and I'd be pretty happy. But if I used YouTube to find home repair videos and they then showed me videos about repairing parts of my house that aren't broken, it'd get pretty irritating.
d) The recommendation algorithm is one of the primary ways that YouTube users find videos to watch. No matter how bad its recommendations are, a lot of users will still act on them, simply because the recommended videos are so visible. This becomes a self-fulfilling prophecy: videos that are frequently recommended are viewed many times because they are seen so often; the high view count on those videos makes them high-priority candidates for recommendation, and so on.
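That feedback loop is easy to demonstrate in a toy simulation (the numbers here are arbitrary): start with equally good videos, recommend in proportion to current view count, and watch early random luck get amplified.

```python
import random

random.seed(0)

# Ten videos of identical quality; each starts with 100 views.
views = [100] * 10

# Each step, the recommender promotes one video with probability
# proportional to its current view count, and the recommendation
# converts into another view. Quality never enters the equation.
for _ in range(10_000):
    winner = random.choices(range(len(views)), weights=views)[0]
    views[winner] += 1

# Early random luck gets amplified: the gap between the most-
# and least-viewed videos grows even though all are equally good.
print(max(views), min(views))
```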
a) They really don't need to know anything about the content; they can infer most of it from who watches it and for how long, aka collaborative filtering.
b) It shouldn't bias towards creators who "post often", especially if the videos are bad. "Hooked the longest" would rank highly for you if you get hooked the way the already-hooked users do; makes sense, right?
c) Those are poor systems, and actually I think Amazon has one of the much better rec systems.
YouTube has one of the worst. Just today, their 4th-ranked video for me was the two-hour 9/11/2001 broadcast. What? And they never boost new videos from a channel I subscribe to and have watched every video of over the last three months; I literally have to check the channel's recent-video list daily to see if I missed something.
I think YouTube's weak recommender system is more a result of them having a hammer (deep learning) and seeing every problem as a nail.
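The collaborative filtering mentioned in a) can be sketched as item-item similarity over watch data; the video IDs and watch fractions below are made up for illustration:

```python
from collections import defaultdict
from math import sqrt

# Toy implicit-feedback data: user -> {video: fraction watched}.
# All IDs and numbers are invented for illustration.
watches = {
    "u1": {"cat_video": 0.9, "dog_video": 0.8},
    "u2": {"cat_video": 1.0, "dog_video": 0.7, "vlog_42": 0.1},
    "u3": {"vlog_42": 0.9, "vlog_43": 0.8},
}

def item_vectors(watches):
    """Invert user->video scores into video->user vectors."""
    vecs = defaultdict(dict)
    for user, vids in watches.items():
        for vid, score in vids.items():
            vecs[vid][user] = score
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similar_videos(video, watches):
    """Rank other videos by how similar their audiences are."""
    vecs = item_vectors(watches)
    target = vecs[video]
    sims = [(other, cosine(target, vec))
            for other, vec in vecs.items() if other != video]
    return sorted(sims, key=lambda x: -x[1])

# "dog_video" comes out most similar to "cat_video" because the
# same people watch both; no content metadata is needed.
print(similar_videos("cat_video", watches))
```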
d) Revenue considerations, which could be related to b). That's something that could degrade the recommendations.
On a tangential note: ads. They sell those ads as "targeted", but when you play your yoga video... BAM! A Coca-Cola ad. So why build those "targeting" algorithms at all?
I was just talking with my brother today about how terrible YouTube's recommendations are. I realize this idea is naive, but I think it would work better than the current machine learning system:
- Gather up all the channels that are followed by channels that I follow and/or have liked videos on.
- Recommend me videos from those channels.
I'm pretty sure that, in my case, this would give much better results than what I currently get shown at any given time.
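The two steps above are just one hop on the channel-subscription graph; a minimal sketch, with a made-up `subscriptions` map standing in for data YouTube already has:

```python
# Hypothetical data: each channel -> channels it subscribes to,
# plus the latest uploads per channel. All names are invented.
subscriptions = {
    "me":              ["maker_channel", "history_channel"],
    "maker_channel":   ["cnc_channel", "welding_channel"],
    "history_channel": ["maps_channel", "cnc_channel"],
}
latest_uploads = {
    "cnc_channel":     ["cnc_vid_9"],
    "welding_channel": ["weld_vid_3"],
    "maps_channel":    ["maps_vid_7"],
}

def recommend(user, subscriptions, latest_uploads):
    """Recommend recent videos from channels one hop away:
    channels followed by the channels the user follows."""
    mine = set(subscriptions.get(user, []))
    one_hop = set()
    for channel in mine:
        one_hop.update(subscriptions.get(channel, []))
    one_hop -= mine  # skip channels the user already follows
    videos = []
    for channel in sorted(one_hop):
        videos.extend(latest_uploads.get(channel, []))
    return videos

print(recommend("me", subscriptions, latest_uploads))
```

This is essentially "friends of friends" applied to channels; ranking the one-hop candidates (e.g. by how many of my channels follow them) would be the obvious next refinement.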
If you are a parent, remember that YouTube results and suggestions can sometimes be rather "suggestive". Every society needs some baseline moral code, and an algorithm doesn't understand that.
Imagine if your child asked an adult neighbor about the movie "Beaches" and they responded with the same answers YouTube does. Go ahead, search "beaches". Or "beach", or "vine".
I just tried this, and my first four results were for the Bette Midler film.
The rest are about actual beaches (most dangerous beaches, weird things found on beaches, top 5 beaches in Brazil, etc.).
What is striking, and I've noticed this before on YouTube, is that the thumbnails all feature nearly nude women. You'd perhaps expect this to happen randomly for beach-related videos, but I've noticed that if there's even a fleeting bit of nudity in a film trailer or similar, it seems to end up in the thumbnail.
Does a human scan through and choose that moment, trying to maximise clicks? Or does an algorithm try random frames and then keep the most clickbaity ones?
What are the methods in ML and predictive modelling that are used to counter "bucketing"? As much as it is necessary for product creators to group us by behavior, I think it is also necessary to have counter-techniques that eventually split those groups and create new ones, no?
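For what it's worth, one standard family of counter-techniques is deliberate exploration: with some small probability, recommend outside the user's inferred buckets instead of exploiting them. A minimal epsilon-greedy sketch (bucket names and affinity scores invented):

```python
import random

random.seed(1)

def pick_bucket(user_buckets, all_buckets, epsilon=0.1):
    """With probability epsilon, explore a bucket the user has
    shown no intent for; otherwise exploit the strongest bucket."""
    if random.random() < epsilon:
        unexplored = [b for b in all_buckets if b not in user_buckets]
        if unexplored:
            return random.choice(unexplored)
    # Exploit: the bucket with the highest affinity score.
    return max(user_buckets, key=user_buckets.get)

all_buckets = ["fridges", "yoga", "woodworking", "history"]
user_buckets = {"fridges": 0.9, "yoga": 0.2}  # inferred intent

picks = [pick_bucket(user_buckets, all_buckets) for _ in range(1000)]
# Most picks exploit "fridges", but roughly 10% land outside the
# user's known buckets, which is what lets new groups form.
print(picks.count("fridges"))
```

Contextual bandits and diversity penalties in the ranker are fancier versions of the same idea: never let the exploit signal fully starve the explore signal.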
This is super interesting. How would I go about working with these guys, considering that they're in Palo Alto and I'm in London? I understand there are a bunch of hoops you have to jump through in terms of visas, but I've never really looked into it.
These guys may be based in California but Google has a London office, and if you want to work on deep learning then Google DeepMind is the obvious place to go and they're based in London as well.
Why do you say that? What they recommend is rarely what I end up watching, but it's usually in the right ballpark. So much of what motivates your daily YouTube viewing is external, so there's only so much you can do. They are aware of this:
> Historical user behavior on YouTube is inherently difficult to predict due to sparsity and a variety of unobservable external factors.
garysieling | 9 years ago:
https://www.findlectures.com
posterboy | 9 years ago:
Maybe the system is trying to convince you to buy a different, perhaps more expensive fridge.
jaypaulynice | 9 years ago:
https://www.linkedin.com/pulse/beating-youtubes-ads-machine-...