Thinking through some potentially interesting sources of videos where two people are talking but we don't know what was said, I think this is a decent starting point: https://www.youtube.com/watch?v=KLcfpU2cubo
Experienced lip readers are lucky to get half of what is said. That is better than nothing, but not reliable enough to depend on, so it is better to use something else if possible.
Doesn't even need to be user guided. Use videos that have audio. You could have one AI that generates a transcript using the audio/video and another that watches the video on mute and tries to read the lips. Feedback would then be provided by the AI that had access to the audio.
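A minimal sketch of how that feedback could be scored, assuming both models just emit plain-text transcripts. It uses word error rate (WER), a standard speech-recognition metric, which is my choice here and not something the comment specifies. The function names `normalize`, `word_error_rate`, and `feedback_score` are hypothetical.

```python
def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into words."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def feedback_score(audio_transcript: str, lip_transcript: str) -> float:
    """Assumed reward in [0, 1]: 1.0 means the lip reader matched the audio model."""
    return max(0.0, 1.0 - word_error_rate(audio_transcript, lip_transcript))


if __name__ == "__main__":
    # The audio-based transcript is treated as ground truth for the muted model.
    print(feedback_score("I love you", "island view"))  # 0.0
```

WER can exceed 1.0 when the lip reader rambles far past the reference length, so the score is clamped at zero.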
luma|1 year ago
Sadly, it doesn't work too well in this situation:
> That they didnt go through but i would tell you theyre just a chill look at here lets do it chills with all of our great men and they look at every chance they go oh do you want to the black man well thats my gosh thats my gosh thats my gosh thats my gosh thats my gosh thats my gosh thats
willwade|1 year ago
[deleted]
mtVessel|1 year ago
Uploaded video dialog:
Bowman: You know, of course, though, he's right about the 9000 series having a perfect operational record. They do.
Poole: Unfortunately that sounds a little like famous last words.
Bowman: Yeah, still it was his idea to carry out the failure mode analysis, wasn't it?
Poole: mmm
Bowman: Should certainly indicate... (away from camera): his integrity and self-confidence
Bowman: If he were wrong, it'd be the surest way of proving it.
Poole: It would be if he knew he was wrong.
Results:
"Of course there is recommended getting necessary to have a perfect operational rank i know youre going to be the first to do that youre going to get the best youre going to get the best youre going to get the best youre going to get the best youre going to get the best of yours if you want to rock better sure its well perfect."
coolandsmartrr|1 year ago
pogue|1 year ago
I'd be interested to know how accurate it is, and from what angles it can read lips (front-facing, side, etc.).
Sounds promising if it works well. Imagine all the historical videos without sound where you could finally try to find out what was being said.
bluGill|1 year ago
The classical example is that 'I love you' and 'island view' have the same lip movements.
echelon|1 year ago
User-submitted videos (with audio for STT), user-crafted bounding boxes (we might not need these soon), and user-guided RLHF.
The submitted videos are likely diverse, challenging (otherwise the human might just do it), and representative of solving actual customer problems.
indoordin0saur|1 year ago
shrubble|1 year ago
tchock23|1 year ago
ydnaclementine|1 year ago
[deleted]