top | item 46485181

yashasolutions | 1 month ago

Nice! I would love some insight into how you identify the meaningful clips (how do you explain to the LLM what "meaningful" means for a given piece of content?). I have to build a similar tool internally, and that's the question I'm trying to find a good answer to right now.

Regarding your UI, it's nice. I would suggest adding a basic volume control to the player. Also, adding a search bar with autocomplete or suggested queries would make the interface more engaging for new users and more practical for returning users.

Then, as a next level, you could try to make a TikTok for audio, with a scrollable vertical view, animated audio waves (listening to audio while seeing something nice is a good way to hook people in), and generated subtitles. Seeing the text of what you're listening to increases focus.

conradbez | 1 month ago

Thanks for checking it out!

A couple of tips on the audio front:

1. Gemini has native audio understanding, so I would recommend uploading your audio there and playing with the prompt until its output matches what you are after

2. for audio over 1 hour, I found that chunking it into 45-minute segments made it easier for Gemini to give back reliable timestamps

3. you do need to check the LLM outputs for valid timestamps - it can go off the rails
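
Tips 2 and 3 can be sketched roughly like this. It is only a minimal illustration, not the actual pipeline: the function names, the `MM:SS` / `HH:MM:SS` timestamp format, and the clip dicts with segment-relative `start` and `end` keys are all assumptions, and the Gemini call itself is left out.

```python
from typing import Optional

SEGMENT_SECONDS = 45 * 60  # 45-minute segments, as in tip 2


def segment_bounds(total_seconds, segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def parse_timestamp(text: str) -> Optional[float]:
    """Parse 'MM:SS' or 'HH:MM:SS' into seconds; return None if malformed."""
    parts = text.strip().split(":")
    if len(parts) not in (2, 3):
        return None
    try:
        numbers = [int(p) for p in parts]
    except ValueError:
        return None
    # Reject negative fields and minute/second fields of 60 or more.
    if any(n < 0 for n in numbers) or any(n >= 60 for n in numbers[1:]):
        return None
    seconds = 0
    for n in numbers:
        seconds = seconds * 60 + n
    return float(seconds)


def valid_clips(clips, segment_start, segment_end):
    """Keep clips whose timestamps parse and fall inside the segment.

    `clips` is a list of dicts with segment-relative 'start' and 'end'
    strings (a hypothetical output format requested in the prompt).
    Returns absolute (start, end) positions in seconds.
    """
    length = segment_end - segment_start
    kept = []
    for clip in clips:
        start = parse_timestamp(str(clip.get("start", "")))
        end = parse_timestamp(str(clip.get("end", "")))
        if start is None or end is None:
            continue  # unparseable timestamp
        if not (0 <= start < end <= length):
            continue  # reversed, empty, or beyond the end of the segment
        kept.append((segment_start + start, segment_start + end))
    return kept
```

Each segment from `segment_bounds` would be sent to the model separately, and whatever comes back goes through `valid_clips` before anything is cut, so a timestamp past the end of the segment is dropped rather than trusted.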

I'll add search (using the existing vector embeddings from the recommendation system) and audio waves to the feature list - great idea!
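
For anyone curious, search over existing embeddings can be as simple as ranking clips by cosine similarity to the query vector. A rough sketch, assuming each clip already has an embedding stored as a list of floats and the query is embedded with the same model (not shown); `search` and `clip_vectors` are made-up names for illustration:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, clip_vectors, top_k=3):
    """Return the ids of the top_k clips most similar to the query vector.

    `clip_vectors` maps a clip id to its embedding (hypothetical storage layout).
    """
    scored = [
        (cosine_similarity(query_vector, vector), clip_id)
        for clip_id, vector in clip_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [clip_id for _, clip_id in scored[:top_k]]
```

A brute-force scan like this is fine for a few thousand clips; beyond that a vector index would probably be the better choice.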