top | item 44407223

(no title)

bjar2 | 8 months ago

The ingestion engine is indeed a cron job that runs once a day to pick up the latest podcast episodes. Yes, it scrapes the web for episodes and then populates the database. And yup, I transcribe the audio to text, then run the text through embedding models to get the embeddings. The secret sauce is using language models to find promising snippets within each episode by running a sliding window over the transcript. So I actually make different types of embeddings: some for highlights and some for whole episodes. I also use the metadata in podcast episodes to enhance recommendations, mainly by deriving the strength of the source that made the content.
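To make the sliding-window idea concrete, here is a minimal sketch, not the actual implementation. Everything in it is an assumption for illustration: the names `sliding_windows`, `top_snippets`, and `keyword_score` are hypothetical, the window is measured in words, and `keyword_score` is a toy stand-in for whatever a language model would return when asked how promising a snippet is.

```python
from typing import Callable, Iterator, List, Set, Tuple


def sliding_windows(words: List[str], size: int, stride: int) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, window) pairs covering the whole word list."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not words:
        return
    if len(words) <= size:
        # Transcript shorter than one window: emit it as a single snippet.
        yield 0, words
        return
    last_start = len(words) - size
    for start in range(0, last_start + 1, stride):
        yield start, words[start:start + size]
    # If the stride skipped the tail, add one final window that ends
    # exactly at the last word so nothing is left unscored.
    if last_start % stride != 0:
        yield last_start, words[last_start:]


def top_snippets(
    transcript: str,
    score: Callable[[str], float],
    size: int = 50,
    stride: int = 25,
    k: int = 3,
) -> List[Tuple[float, int, str]]:
    """Score every window and return the k best as (score, start, text)."""
    words = transcript.split()
    scored = []
    for start, window in sliding_windows(words, size, stride):
        text = " ".join(window)
        scored.append((score(text), start, text))
    # Highest score first; ties keep transcript order (sort is stable).
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def keyword_score(keywords: Set[str]) -> Callable[[str], float]:
    """Toy scorer (assumption): fraction of words that are keywords.

    A real pipeline would call a language model here instead.
    """
    def _score(text: str) -> float:
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        return sum(token in keywords for token in tokens) / len(tokens)
    return _score


if __name__ == "__main__":
    demo = "intro chat key insight surprising outro chat"
    scorer = keyword_score({"key", "insight", "surprising"})
    for snippet in top_snippets(demo, scorer, size=3, stride=1, k=2):
        print(snippet)
```

Overlapping windows (stride smaller than size) mean a good passage that straddles a window boundary still shows up whole in at least one window. The top-scoring snippets could then be embedded separately from the full episode, matching the highlight and episode embeddings described above.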

You are spot on, I use Celery for tasks, many different kinds of tasks actually. It is a super handy tool to have, and it truly enhances what I am able to do on Heroku. My devops life becomes much more comfy.

r1290|8 months ago

[deleted]