w-m|8 months ago
In the spirit of getting more out of every OpenAI minute, don't send it any silence.
E.g.
ffmpeg -i video-audio.m4a \
-af "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:\
stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB,\
apad=pad_dur=0.02" \
-c:a aac -b:a 128k output_minpause.m4a -y
will cut the talk down from 39m31s to 31m34s by replacing any silence (using a -50dB threshold) longer than 20ms with a 20ms pause. And in keeping with the spirit of your post, I only measured that the file got shorter; I didn't look at all at the transcription quality of the shorter version.
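For anyone curious what that filter chain is doing conceptually, here is a rough pure-Python sketch of the same idea: every silent run longer than 20ms gets shortened to 20ms. It is not how ffmpeg's silenceremove works internally (ffmpeg has its own detection logic; this just checks each sample's absolute amplitude against -50 dBFS), and collapse_silence is a made-up helper name, so treat it as an illustration only.

```python
def collapse_silence(samples, sample_rate, threshold_db=-50.0, max_pause_s=0.02):
    """Shorten every silent run longer than max_pause_s down to max_pause_s.

    samples are floats in [-1.0, 1.0]. A sample counts as silent when its
    absolute amplitude is below threshold_db relative to full scale (dBFS).
    This is a simple per-sample check, not ffmpeg's detection logic.
    """
    threshold = 10 ** (threshold_db / 20)  # -50 dBFS is roughly 0.0032
    max_pause = int(round(max_pause_s * sample_rate))
    out = []
    silent_run = 0
    for s in samples:
        if abs(s) < threshold:
            silent_run += 1
            if silent_run <= max_pause:
                out.append(s)  # keep only the first max_pause silent samples
        else:
            silent_run = 0
            out.append(s)
    return out


if __name__ == "__main__":
    sr = 16_000
    # 1 s of "speech", 2 s of digital silence, 1 s of "speech"
    clip = [0.5] * sr + [0.0] * (2 * sr) + [0.5] * sr
    out = collapse_silence(clip, sr)
    print(len(clip) / sr, "->", len(out) / sr)  # 4.0 -> 2.02
```

For the numbers above, going from 39m31s (2371 s) to 31m34s (1894 s) is roughly a 20% cut in audio sent, assuming you're billed purely per minute of input.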
jwrallie|8 months ago
One half interesting / half depressing observation I made is that at my workplace any meeting recording I tried to transcribe in this way had its length reduced to almost 2/3 when cutting off the silence. Makes you think about the efficiency (or lack of it) of holding long(ish) meetings.
dogprez|8 months ago
d1sxeyes|8 months ago
sudhirj|8 months ago
swyx|8 months ago
guys how hard is it to toss both versions into like diffchecker or something haha you're just comparing text
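Along those lines, Python's standard difflib will do the comparison locally. A minimal sketch, assuming you have the two transcripts as plain strings; transcript_diff is just an illustrative name, and a real comparison would probably want to normalize case and punctuation first:

```python
import difflib


def transcript_diff(original, shortened):
    """Word-level similarity ratio plus the list of changed spans."""
    a = original.split()
    b = shortened.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), changes


if __name__ == "__main__":
    full = "the quick brown fox jumps over the lazy dog"
    trimmed = "the quick brown fox jumped over the lazy dog"
    ratio, changes = transcript_diff(full, trimmed)
    print(round(ratio, 3))  # 0.889
    print(changes)          # [('replace', 'jumps', 'jumped')]
```

Comparing word lists rather than raw characters keeps the diff readable when the two transcripts only disagree on a handful of words.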
TimorousBestie|8 months ago
nickjj|8 months ago
Unfortunately, a byproduct of listening to everything at 2x is that a number of folks have told me they have to watch my videos at 0.75x, yet when I play back my own videos they feel painfully slow unless they're at 2x.
For reference I've always found John Carmack's pacing perfect / natural and watchable at 2x too.
A recent video of mine is https://www.youtube.com/watch?v=pL-qft1ykek. It was posted on HN by someone else the other day, so I'm not trying to do any self-promotion here; it's just an example of a recent video I put up, and I'm genuinely curious whether anyone finds it too fast or if it's normal. It's a regular unscripted video: I have a rough idea of what I want to cover, then I turn on the mic, start recording, and let it play out organically. If I had to guess, I'd say the last ~250-300 videos were recorded this way.
noahjk|8 months ago
quietbritishjim|8 months ago
But it feels (very subjectively) faster to me than usual because you don't really seem to take any pauses. It's like the whole video is a single run-on sentence that I keep buffering, but I never get a chance to process it and flush the buffer.
makeitdouble|8 months ago
Now I think speed adjustment comes less from the natural speaking pace of the person than from the subject matter.
I'm thinking of a channel like Accented Cinema (https://youtu.be/hfruMPONaYg), which has a slowish talking pace, but because there's always something going on visually, it doesn't actually feel slow to my ear.
I felt the same about videos explaining concepts I'm unfamiliar with, so I see it as a matter of how fast the brain can process the information rather than the talking speed per se.
fuzztester|8 months ago
https://en.m.wikipedia.org/wiki/James_Goodnight
I have watched one or two of his videos, and he spoke slowly compared to the average person. I liked that. It sounded good.
userbinator|8 months ago
Watching your video at 1x still feels too slow, and it's just right for me at 2x speed (that's approximately how fast I normally talk if others don't tell me to slow down), although my usual YouTube watching speed is closer to 2.5-3x. That is to say, you're still faster than a lot of others.
I think it just takes practice --- I started at around 1.25x for videos, and slowly moved up from there. As you have noticed, once you've consumed enough sped-up content, your own speaking speed will also naturally increase.
retsibsi|8 months ago
viraptor|8 months ago
We get used to higher speeds when we consume a lot of content that way. Have you heard the screen readers used by experienced blind people? I can't even make out the words, but months of training would probably fix that.
Der_Einzige|8 months ago
https://en.wikipedia.org/wiki/Spreading_(debate)
SavioMak|8 months ago
fortran77|8 months ago
behnamoh|8 months ago
I wonder if there's a way to automatically detect how "fast" a person talks in an audio file. I know it's subjective and people's pace varies within a single recording, but it'd be cool to know when OP's trick breaks down (they mention 4x ruined the output; maybe for Karpathy that would happen at 2x).
janalsncm|8 months ago
Transcribe it locally using whisper and output tokens/sec?
echelon|8 months ago
Stupid heuristic: take a segment of video, transcribe the text, and count the number of words per second of utterance duration. If you need speaker diarization, handle each speaker's utterance durations independently. You can slice it further, e.g. by syllable count.
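A minimal sketch of that heuristic, assuming you already have diarized segments with start/end times in seconds. The Segment type here is hypothetical; you'd map whatever your transcription/diarization tool outputs into it:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical diarized transcript segment (times in seconds)."""
    speaker: str
    start: float
    end: float
    text: str


def words_per_minute(segments):
    """Words per minute of actual speaking time, computed per speaker."""
    words = defaultdict(int)
    seconds = defaultdict(float)
    for seg in segments:
        duration = seg.end - seg.start
        if duration <= 0:
            continue  # skip zero-length or malformed segments
        words[seg.speaker] += len(seg.text.split())
        seconds[seg.speaker] += duration
    return {spk: 60.0 * words[spk] / seconds[spk] for spk in seconds}


if __name__ == "__main__":
    segs = [
        Segment("A", 0.0, 4.0, "one two three four five six seven eight nine ten"),
        Segment("B", 4.0, 10.0, "a b c d e f g h i j k l"),
    ]
    print(words_per_minute(segs))  # {'A': 150.0, 'B': 120.0}
```

Only time inside segments counts, so gaps between utterances don't drag the rate down; swapping the word count for a syllable count would be a drop-in change.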
btown|8 months ago
varispeed|8 months ago
mrstone|8 months ago
Hilbert transform and FFT to get phoneme rate would work.
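A rough stdlib-only sketch of that idea, with the assumptions up front: the Hilbert transform is done via FFT to get the amplitude envelope, then a second FFT picks the strongest envelope modulation in a 1-16 Hz band. On real speech that peak probably tracks something closer to syllable rate than phoneme rate, and you'd likely want band-pass filtering and windowing first. The function names are made up, and the input length must be a power of two for this toy radix-2 FFT:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def hilbert_envelope(signal):
    """Magnitude of the analytic signal (Hilbert transform in the frequency domain)."""
    n = len(signal)
    spectrum = fft(signal)
    # Keep DC and Nyquist, double positive frequencies, zero negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    weights[n // 2] = 1.0
    for k in range(1, n // 2):
        weights[k] = 2.0
    analytic = ifft([s * w for s, w in zip(spectrum, weights)])
    return [abs(a) for a in analytic]


def dominant_modulation_hz(signal, sample_rate, lo_hz=1.0, hi_hz=16.0):
    """Frequency (Hz) of the strongest envelope modulation between lo_hz and hi_hz."""
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    env = hilbert_envelope(signal)
    mean = sum(env) / n
    env_spectrum = fft([e - mean for e in env])  # remove DC so it can't win
    candidates = [k for k in range(1, n // 2) if lo_hz <= k * sample_rate / n <= hi_hz]
    best = max(candidates, key=lambda k: abs(env_spectrum[k]))
    return best * sample_rate / n


if __name__ == "__main__":
    fs = 1024  # samples per second
    n = 4096   # 4 seconds
    # Synthetic "speech": a 200 Hz tone whose loudness pulses 4 times per second.
    sig = [
        (1 + 0.8 * math.sin(2 * math.pi * 4 * t / fs)) * math.sin(2 * math.pi * 200 * t / fs)
        for t in range(n)
    ]
    print(dominant_modulation_hz(sig, fs))  # 4.0
```

The frequency resolution is sample_rate / n (0.25 Hz here), so longer windows give finer rate estimates at the cost of averaging over more speech.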
dTal|8 months ago
WalterSear|8 months ago
georgemandis|8 months ago
w-m|8 months ago
Good god. You couldn't make that any more convoluted and hard-to-grasp if you wanted to. You gotta love ffmpeg!
I now think this might be a good solution:
QuantumGood|8 months ago
zamadatix|8 months ago
https://www.theverge.com/news/603581/youtube-premium-experim...
ars|8 months ago
I listen to a lot of videos at 3x or even 4x.
zahlman|8 months ago
david_allison|8 months ago
brunoborges|8 months ago
pragmatic|8 months ago
vayup|8 months ago
In either case, I bet OpenAI is doing the same optimization under the hood and keeping the savings for themselves.
CSMastermind|8 months ago
Is it common for people to watch YouTube sped up?
I've heard of people doing this for podcasts and audiobooks and never really understood it there either. It just feels like skimming a real book instead of actually reading it.
keithxm23|8 months ago
Additionally, the brain adjusts to a faster talking speed very quickly. If I'm watching an average-paced person talk and speed them up to 2x, the first couple of minutes might be difficult and require more focused listening. But the brain soon starts treating it as the new normal, and it no longer feels sped up. So much so that if I go back to 1x, the speaker seems way too slow.
Eezee|8 months ago
Same with a video. A lot of people speak considerably slower than you could process the information they are conveying, so you speed it up. You still get the same content and are not skipping parts as you would when skimming a book.
83|8 months ago
That's the goal for me lately. I primarily use YouTube for technical help (where are the screws to adjust this carburetor? how do I remove this brake hub? etc.). There used to be short 1-2 minute videos on this kind of stuff, but nowadays I have to suffer through a 10-15 minute video with multiple ad breaks.
So now I always watch YouTube at 2x speed while rapidly jumping the slider forward to find the relevant portions.
Feathercrown|8 months ago
cbsmith|8 months ago
niutech|8 months ago