jamesblonde | 24 days ago
TikTok's recommender is partly built on European technology (Apache Flink for real-time feature computation), along with Kafka and distributed model training infrastructure. The Monolith paper is misleading in suggesting that 'online training' is the key. It is not. The key is that your clicks are made available as features for predictions in less than 1 second. You need a per-event stream processing architecture for this (like Flink; Feldera would be my modern choice as an incremental streaming engine).
* https://www.youtube.com/watch?v=skZ1HcF7AsM
* Monolith paper - https://arxiv.org/pdf/2209.07663
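To make the "clicks become features in under a second" point concrete, here is a minimal sketch in plain Python. It does not use the Flink or Feldera APIs, and the `ClickEvent` shape, the class name, and the 60-second window are all assumptions for illustration. The only idea it shows is that state is updated on every event, so the very next feature lookup already sees the click.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical event shape; the field names are assumptions.
    user_id: str
    category: str
    ts: float  # event time in seconds


class PerEventFeatures:
    """Sliding-window click counts per (user, category), updated on every event."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        # (user_id, category) -> timestamps of recent clicks, oldest first
        self._events = defaultdict(deque)

    def on_event(self, ev: ClickEvent) -> None:
        # State changes as each event arrives; there is no batch boundary to wait for.
        self._events[(ev.user_id, ev.category)].append(ev.ts)

    def clicks_in_window(self, user_id: str, category: str, now: float) -> int:
        q = self._events[(user_id, category)]
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        return len(q)
```

A real deployment would keep this state in a keyed, fault-tolerant store inside the stream processor rather than in a Python dict; the sketch only illustrates the per-event update pattern.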
eddd-ddde|23 days ago
I don't think any single other platform has as good a search feature as TikTok does.
tridentboy|23 days ago
nik_0_0|20 days ago
dmix|24 days ago
kgeist|23 days ago
randysalami|24 days ago
pandemic_region|24 days ago
beAbU|23 days ago
BoxOfRain|23 days ago
rjh29|23 days ago
It never reliably gives me videos that are similar but not exactly the same, i.e. things I might be interested in.
vjerancrnjak|24 days ago
If by features you mean tracking state per user, that stuff can be tracked without Flink insanely fast with Redis as well.
If you're saying they don't have to load data to update the state, I don't see how these states are massive enough to require in-memory updates, and if they are, you could just do in-memory updates without Flink.
Similarly, any consumer will have to deal with batches of users and pipelining.
Flink is just a bottleneck.
If they actually use Flink for this, it's not the moat.
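As a rough illustration of the "batches of users and pipelining" point, here is a minimal sketch of a consumer that folds a batch of events into per-user state without a stream engine. The `apply_batch` function and the `(user_id, category)` event shape are hypothetical. The state lives in a plain in-process dict as a stand-in; with Redis, the per-user deltas would presumably be sent as pipelined `HINCRBY` commands instead.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (user_id, category)
Event = Tuple[str, str]


def apply_batch(state: Dict[str, Counter], batch: Iterable[Event]) -> None:
    """Fold a batch of click events into per-user category counts.

    Events are grouped per user first, so each user's state is touched
    once per batch instead of once per event.
    """
    grouped: Dict[str, Counter] = defaultdict(Counter)
    for user_id, category in batch:
        grouped[user_id][category] += 1
    for user_id, delta in grouped.items():
        # Counter.update adds counts rather than replacing them.
        state[user_id].update(delta)
```

Usage, assuming `state = defaultdict(Counter)`: call `apply_batch(state, events)` for each batch pulled from the log, then read `state[user_id]` when serving features.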
btown|23 days ago
My hunch is we start to learn a lot more about the core internals as Oracle tries to market to B2B customers, as Oracle is wont to do!
lsuresh|23 days ago
For anyone else, if you want to try out Feldera and IVM for feature-engineering (it gives you perfect offline-online parity), you can start here: https://docs.feldera.com/use_cases/fraud_detection/
bobek|23 days ago
[1] https://recombee.com
miohtama|24 days ago
wongarsu|23 days ago
AlienRobot|24 days ago
notyourwork|23 days ago
3abiton|23 days ago
not_ai|23 days ago
permo-w|23 days ago
owenversteeg|22 days ago
Speed completely changes the game in a few ways. The first is identifying interests. Imagine every possible interest in a tree structure. Let's say you're into kumiko. There are so many levels of the tree to traverse to find kumiko; perhaps Skilled crafts -> Woodworking -> Japanese -> Construction without use of fasteners -> Panels and decorative elements -> Kumiko. The more iterations you can get through, the better you can match people's interests. If someone has 10 interests and each one requires many questions to determine, it can take forever to find exact interests with a system that only narrows down your interests every X videos vs. after each video.
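A back-of-the-envelope sketch of the tree argument above, under a deliberately crude assumption: the recommender can descend at most one level of the interest tree per model update. The function name and the figure of 50 videos per update are made up for illustration; only the kumiko path comes from the comment.

```python
from typing import Sequence

KUMIKO_PATH = [
    "Skilled crafts",
    "Woodworking",
    "Japanese",
    "Construction without use of fasteners",
    "Panels and decorative elements",
    "Kumiko",
]


def videos_to_reach(path: Sequence[str], videos_per_update: int) -> int:
    """Videos watched before the system has narrowed down to the last node.

    Assumes one tree level is resolved per update (a simplifying assumption).
    """
    return len(path) * videos_per_update


per_video = videos_to_reach(KUMIKO_PATH, videos_per_update=1)
per_fifty = videos_to_reach(KUMIKO_PATH, videos_per_update=50)
print(per_video, per_fifty)
```

Under this toy model, updating after every video reaches kumiko in 6 videos, while updating every 50 videos takes 300, and the gap multiplies again across 10 separate interests.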
The second is matching current moods. Let's say you just broke up with your girlfriend, or your pet fish died, or you're on vacation in Spain. A rapidly-updating system can capture those trends and get right to the heart of them in time for them to matter. A slow system might only get through a few iterations and capture a vague interest in Spain; a fast-updating one can get through countless iterations of guessing. Spain? What city? Tourist or moving there? What type of tourist? Foodie? What type of food? How fancy? Bam, you're watching the perfect video about an upscale seafood restaurant in Barcelona.
The third is type and flavor of content. Even inside of a small niche you will find many flavors of content. Super-short or long form, fast paced or slow, funny or serious, intellectual, irreverent, political leanings, background music, et cetera. Maybe you like slow long-form woodworking content but like fast-paced travel guides. Maybe you hate background music except when it's in skateboarding videos. To determine this requires an incredible amount of "questioning" of the user.
Now, of course, an algorithm that updates once daily can also make inferences about your interests and preferences. It can certainly learn, with enough time, what you are into and how you like to consume it. But the key thing is that these inferences only enable _predetermined_ changes. Imagine you are a human showing someone TikToks. Imagine that you can ask them any questions about their preferences right as they watch a video. You may not ask a question after every video, but you will ask countless questions over the hours of scrolling that day, and you will get good data. Now imagine a new restriction: you must decide your questions once a day in advance. You will manage far fewer questions; and to follow up on them you must wait yet another day.
Now, why do I partly agree? Well, I don't think speed is everything; I think TikTok has another sort of je ne sais quoi to it. I think it has a unique culture and community. It has a better UI and better features than Instagram. It has a young and cool reputation, far from the Millennial taint of Instagram or Facebook. And I suspect that they are good at identifying _who_ you are and acting on that information. But in my eyes, the speed could very well be the most important part of the puzzle.
ryanjshaw|24 days ago
jamesblonde|23 days ago
SpaceManNabs|23 days ago
cactusplant7374|23 days ago
permo-w|23 days ago
NedF|23 days ago
[deleted]
computerthings|23 days ago
[deleted]
Jamesbeam|24 days ago
[deleted]