item 36317465

SmooL | 2 years ago

I've thought about doing this as well, but I haven't tried it yet. Are there any resources, blogs, or other information on strategies for how best to chunk and embed arbitrary text?


busseio | 2 years ago

I’ve been experimenting with sliding window chunking using SRT files. SRT (SubRip) is a common plain-text subtitle format for video: each chunk gets a sequence number, from 1 to _n_, along with start and end timestamps for when that chunk should appear on screen. Conventionally each chunk holds up to two lines of text, but you can make chunks with other line counts and sizes.

Much of my work with this has been with SRT files that are transcriptions exported from Otter.ai. GPT-3.5 and GPT-4 natively understand the SRT format, including what the sequence numbers and timestamps mean, so you can refer to them in a prompt or ask the model to confirm them.
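For anyone wanting to try this, here is a minimal sketch of sliding window chunking over an SRT file, assuming you group a fixed number of consecutive subtitle chunks (`window`) and advance by a smaller step (`stride`) so neighbouring chunks overlap. The function names `parse_srt` and `sliding_window_chunks` and the `window`/`stride` parameters are hypothetical, not from any particular tool or from the comment above. It only handles the chunking; it does not call Otter.ai, an embedding model, or GPT.

```python
import re
from dataclasses import dataclass

# SRT timestamp line, e.g. "00:01:02,500 --> 00:01:05,000"
TIMESTAMP_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
)


@dataclass
class Cue:
    index: int  # SRT sequence number (1..n)
    start: str  # start timestamp as written in the file
    end: str    # end timestamp as written in the file
    text: str   # subtitle text, lines joined with spaces


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT content into cues; blocks are separated by blank lines."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if len(lines) < 2 or not lines[0].isdigit():
            continue  # skip malformed blocks
        match = TIMESTAMP_RE.search(lines[1])
        if not match:
            continue  # second line must be the timestamp range
        cues.append(
            Cue(int(lines[0]), match.group(1), match.group(2), " ".join(lines[2:]))
        )
    return cues


def sliding_window_chunks(cues: list[Cue], window: int = 4, stride: int = 2) -> list[dict]:
    """Group consecutive cues into overlapping chunks.

    Each chunk keeps its first/last sequence numbers and timestamps, so a
    prompt (or a retrieval hit) can point back to the original position.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    chunks = []
    for i in range(0, len(cues), stride):
        group = cues[i:i + window]
        chunks.append({
            "seq_range": (group[0].index, group[-1].index),
            "start": group[0].start,
            "end": group[-1].end,
            "text": " ".join(c.text for c in group),
        })
        if i + window >= len(cues):
            break  # this window already reaches the last cue
    return chunks
```

With `window=2, stride=1`, a three-cue file yields two chunks covering sequence numbers 1-2 and 2-3. Keeping the sequence range and timestamps on each chunk is what lets you refer back to them in a prompt, as described above; the right window and stride sizes are something to tune for your own transcripts.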