(no title)
filipeisho | 11 months ago
I think the demo could be more exciting, the voice of the person talking sounds like he's bored haha
dang|11 months ago
"What works well for HN is raw and direct, with zero production values. Skip any introductions and jump straight into showing your product doing what it does best. Voiceover is good, but no marketing slickness—no fancy logos or background music!"
I guess there's zero production values and zero production values...
lmeierhoefer|11 months ago
IMO, the most promising approach to this is something along the lines of MA-RLHF (https://arxiv.org/abs/2410.02743) but adapted to the real world, i.e., splitting up the reward model to grade individual actions inside the trajectory to reduce the “attention distance” between the reward and the decision.
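A rough toy sketch of what "grading individual actions" could look like, to make the contrast concrete. This is not the MA-RLHF method from the paper, just an illustration of sparse (end-of-trajectory) versus per-action rewards. The action names, `toy_grader`, and the discount factor are all made up for the example.

```python
from typing import Callable, List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return-to-go at each step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def terminal_only(actions: List[str], outcome_reward: float) -> List[float]:
    """Single reward at the end of the trajectory; every earlier step gets 0."""
    return [0.0] * (len(actions) - 1) + [outcome_reward]


def per_action(actions: List[str], grade: Callable[[str], float]) -> List[float]:
    """One reward per action, so each decision gets its own signal."""
    return [grade(a) for a in actions]


def toy_grader(action: str) -> float:
    # Hypothetical stand-in for a learned per-action reward model.
    return -1.0 if action == "click_wrong_button" else 1.0


# Hypothetical three-step trajectory with one bad step in the middle.
actions = ["open_page", "click_wrong_button", "submit_form"]

sparse = discounted_returns(terminal_only(actions, 1.0), gamma=0.5)
dense = discounted_returns(per_action(actions, toy_grader), gamma=0.5)

print(sparse)  # [0.25, 0.5, 1.0]
print(dense)   # [0.75, -0.5, 1.0]
```

With only the terminal reward, the bad middle step still gets a positive return because the trajectory happened to end well. With per-action grading, its return goes negative, which is the shorter "distance" between reward and decision.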