(no title)
blackkettle | 1 year ago
I'm doing something pretty similar right now for internal meetings, and I use a process like: transcribe the meeting with utterance timestamps, extract keyframes from the video along with their timestamps, request a segmented summary from the LLM along with rough timestamps for the transitions, then add keyframe analysis (mainly for slides).
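Combining the timestamped utterances with the keyframe notes before prompting the LLM can be sketched like this (a minimal sketch with hypothetical data shapes, not the commenter's actual pipeline; assumes each utterance and keyframe carries a start time in seconds):

```python
from heapq import merge

def interleave(utterances, keyframes):
    """Merge timestamped utterances (t, speaker, text) and keyframe
    notes (t, note) into one chronological transcript string that can
    be handed to the LLM for segmented summarization."""
    # Tag keyframe events so slides are distinguishable in the prompt.
    events = merge(
        ((t, f"[{t:>7.1f}s] {speaker}: {text}") for t, speaker, text in utterances),
        ((t, f"[{t:>7.1f}s] (slide) {note}") for t, note in keyframes),
        key=lambda e: e[0],
    )
    return "\n".join(line for _, line in events)
```

`heapq.merge` keeps the whole thing lazy and ordered, which matters less for a 2-hour meeting than for streaming, but costs nothing here.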
GPT-4o, Claude 3.5 Sonnet, Llama 3.1 405B Instruct, and Llama 3.1 70B Instruct all do a pretty stunning job of this, IMO. Each department still reviews and edits the final result before sending it out, but so far I'm quite impressed with what we get from the default output, even for 1-2hr conversations.
I'd argue the key feature for us is still providing a simple, intuitive UI for non-technical users to manage the final result: edit, polish, and send it out.
gklezd | 1 year ago
1. A more algorithmic approach lets us bake certain constraints into the model. For example, you can add a regularizer to incentivize TreeSeg to split more eagerly when there are large pauses. You can also strictly enforce minimum and maximum segment sizes.
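A toy version of both ideas might look like this (my own illustrative sketch, not TreeSeg's actual objective: the split score is cosine distance between the two halves' mean embeddings, the pause term and `pause_weight` are the regularizer, and the minimum size is enforced by restricting the candidate set outright):

```python
import math

def best_split(embs, pauses, min_size=2, pause_weight=0.5):
    """Pick a split index for a run of utterance embeddings.

    embs:   list of embedding vectors, one per utterance
    pauses: pauses[i] = silence (seconds) between utterance i and i+1
    Candidates that would violate the minimum segment size are never
    scored, so that constraint is strict rather than penalized.
    """
    def mean(vs):
        return [sum(col) / len(vs) for col in zip(*vs)]

    def cos_dist(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    best, best_score = None, -math.inf
    for i in range(min_size, len(embs) - min_size + 1):
        # Topic-shift term plus a bonus for a long pause at the boundary.
        score = cos_dist(mean(embs[:i]), mean(embs[i:])) + pause_weight * pauses[i - 1]
        if score > best_score:
            best, best_score = i, score
    return best
```

With `pause_weight=0`, the split lands wherever the embeddings diverge most; a long pause can pull it elsewhere, which is exactly the knob an end-to-end LLM call doesn't give you.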
2. If you are interested in reproducing a segmentation with slight variations, you might not have good results with an LLM. Our experience has been that there is significant stochasticity in the answers we get from an LLM. Even if you try to obtain a more deterministic answer (e.g. by setting temperature to zero), you will need an exact copy of the model to get the same result in the future. Depending on what LLM you are using, this might not be possible (e.g. OpenAI adjusts models frequently). With TreeSeg you only need your block-utterance embeddings, which you probably have already stored (presumably in a vector db).
3. TreeSeg outputs a binary tree of segments, their sub-segments, and so on. This structure is important to us for many reasons, some of which are subjects of future posts. One such reason is access to a continuum between local (i.e. chapter-level) and global (i.e. full-session) context. Obtaining such a hierarchy via an LLM might not be that straightforward.
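The local-to-global continuum falls out of the tree structure directly. A minimal sketch (my own illustration of the data structure, not TreeSeg's code): each node covers a span of utterance indices, and slicing the tree at increasing depths yields progressively more local views of the session.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    """A node in the segmentation tree: a half-open span of utterance
    indices [start, end), with optional left/right sub-segments."""
    start: int
    end: int
    left: Optional["Segment"] = None
    right: Optional["Segment"] = None

def level(node, depth):
    """Spans at a given depth: depth 0 is the full session, depth 1
    the top-level chapters, and so on. A leaf shallower than the
    requested depth stands in for its missing children."""
    if depth == 0 or node.left is None:
        return [(node.start, node.end)]
    return level(node.left, depth - 1) + level(node.right, depth - 1)
```

Picking a depth is picking a point on that continuum: shallow for global context, deep for chapter-level context.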
4. There is something attractive about not relying on an LLM for everything!
Hope this is useful to you!