top | item 38117233


ggerganov | 2 years ago

Yes, I was planning to do this back then, but other stuff came up. There are many different ways in which this simple example can be improved:

- better detection of when speech ends (currently a basic adaptive threshold)

- use a small LLM to give a quick, generic response while the big LLM computes

- TTS streaming in chunks or sentences
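The last item above, streaming TTS sentence by sentence, can be sketched as a generator that buffers LLM tokens and emits each sentence as soon as it completes. This is a minimal illustrative sketch, not code from the example: the sentence-boundary regex is an assumption and a real system would also handle abbreviations, ellipses, etc.

```python
import re

# Sentence boundary: ., !, or ? followed by whitespace or end of buffer.
SENTENCE_END = re.compile(r'([.!?])(\s|$)')

def sentence_chunks(token_stream):
    """Yield complete sentences as soon as they appear in the token stream,
    so TTS can start speaking before the LLM finishes generating."""
    buffer = ""
    for token in token_stream:
        buffer += token
        while (m := SENTENCE_END.search(buffer)):
            end = m.end(1)                 # cut right after the punctuation
            yield buffer[:end].strip()
            buffer = buffer[end:]
    if buffer.strip():                     # flush any trailing fragment
        yield buffer.strip()

tokens = ["Hel", "lo there", ". How ", "are you?", " Fine."]
print(list(sentence_chunks(tokens)))
# → ['Hello there.', 'How are you?', 'Fine.']
```

Each yielded sentence can be handed straight to the TTS engine while the model keeps generating the rest of the reply.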

One of the better OSS versions of such a chatbot, I think, is https://github.com/yacineMTB/talk. Though many other similar projects probably exist by now.


generalizations | 2 years ago

I keep wondering if a small LLM can also be used to help detect when the speaker has finished speaking their thought, not just when they've paused speaking.
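One way to sketch this idea: ask a small, fast LLM whether the partial transcript reads like a finished thought, rather than relying on pause length alone. Everything below is hypothetical; `small_llm` is a stand-in for any fast local model (the stub here is a trivial heuristic so the sketch is self-contained and runnable).

```python
PROMPT = (
    "Does the following utterance sound like a complete thought? "
    "Answer YES or NO.\n\nUtterance: {utterance}\nAnswer:"
)

def small_llm(prompt):
    # Stub standing in for a real small-model call (e.g. via a local
    # llama.cpp binding). Here: an utterance ending in a conjunction or
    # comma is treated as unfinished.
    utterance = prompt.split("Utterance:")[1].split("\nAnswer:")[0].strip()
    return "NO" if utterance.endswith(("and", "but", "so", ",")) else "YES"

def thought_finished(utterance):
    """True if the (stubbed) model judges the utterance complete."""
    return small_llm(PROMPT.format(utterance=utterance)).startswith("YES")
```

The pipeline would call `thought_finished` on each interim transcript during a pause, and only hand the text to the big LLM once it returns True.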

drunkenmagician | 2 years ago

Maybe a voice activity detector (VAD) would be a lighter-weight (less resource-hungry) option.
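A minimal energy-based VAD can be sketched in a few lines (an assumption for illustration, not any particular library's implementation): a frame counts as silence when its RMS energy falls below a threshold, and the utterance is considered finished after a "hangover" run of trailing silent frames.

```python
import math

def rms(frame):
    """Root-mean-square energy of one audio frame (list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def speech_ended(frames, threshold=0.02, hangover=10):
    """True if the stream ends with `hangover` consecutive silent frames."""
    silent = 0
    for frame in frames:
        silent = silent + 1 if rms(frame) < threshold else 0
    return silent >= hangover
```

Real VADs (e.g. the WebRTC one) additionally model spectral features, and the threshold here would need to adapt to ambient noise, but the resource cost is tiny compared to running a model.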

rjtavares | 2 years ago

> use small LLM for quick response with something generic while big LLM computes

Can't wait for poorly implemented chat apps to always start a response with "That's a great question!"

Joeri | 2 years ago

“Uhm, I mean, like, you know” would indeed be a little more human.

avarun | 2 years ago

Just like poorly implemented human brains tend to do :P