conradbez | 1 month ago
A couple of tips on the audio front:
1. Gemini has native audio understanding, so I'd recommend uploading the audio there and playing with the prompt until its output matches what you're after.
2. For audio longer than 1 hour, I found that chunking it into 45-minute segments made it easier for Gemini to give back reliable timestamps.
3. You do need to check the LLM's output for valid timestamps - it can go off the rails.
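To make tips 2 and 3 concrete, here's a rough sketch of the segment planning and the timestamp sanity check. It's plain Python with no Gemini calls. The function names, the "MM:SS" / "H:MM:SS" timestamp format, and the idea that timestamps come back relative to the start of each segment are all my assumptions for illustration, so adjust them to whatever your prompt actually asks for.

```python
import re

SEGMENT_SECONDS = 45 * 60  # 45-minute segments, as in tip 2

def plan_segments(total_seconds, segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    return [
        (start, min(start + segment_seconds, total_seconds))
        for start in range(0, total_seconds, segment_seconds)
    ]

# Assumed format: "MM:SS" or "H:MM:SS" (hypothetical, depends on your prompt).
_TIMESTAMP = re.compile(r"^(?:(\d+):)?([0-5]?\d):([0-5]\d)$")

def parse_timestamp(text):
    """Parse a timestamp string into seconds; return None if it is malformed."""
    match = _TIMESTAMP.match(text.strip())
    if not match:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)

def valid_timestamps(stamps, segment_start, segment_end):
    """Keep timestamps that parse, fit inside the segment, and never go backwards.

    Input timestamps are assumed to be relative to the segment start;
    the returned values are absolute offsets into the full recording.
    """
    length = segment_end - segment_start
    kept, last = [], -1
    for raw in stamps:
        secs = parse_timestamp(raw)
        if secs is None or secs > length or secs < last:
            continue  # malformed, past the end of the segment, or out of order
        kept.append(segment_start + secs)
        last = secs
    return kept
```

For a 2-hour file, `plan_segments(7200)` gives three segments, with the last one shorter than 45 minutes. Anything `valid_timestamps` drops is worth logging so you can re-prompt for that segment rather than silently losing it.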
I'll add search (using the existing vector embeddings from the recommendation system) and audio waves to the feature list - great idea!