top | item 45655692

I don't think we've had the transformer moment for audio training yet, but yes, in theory audio-first models will be much more capable.

trollbridge|4 months ago

Particularly interesting would be transformations between tokenised audio and tokenised text.

I recall someone once telling me that up to 90% of communication can be non-verbal, so an LLM that sticks to just text is only getting about 10% of the data.