No, the whole point of the transformer architecture is that it can do things like this without any extra training; an LLM can copy your writing patterns, etc.
It did the same thing ChatGPT does when it picks up your writing style and exact words/sentences after a few messages. Literally: the audio is encoded as tokens and fed to the LLM, so there is no distinction between text and audio from the model's point of view.
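A toy sketch of what "no distinction between text and audio" means (everything here is hypothetical: the tokenizer, codec, and vocabulary sizes are stand-ins, not any real model's values). Audio is quantized into discrete codes drawn from the same vocabulary space as text tokens, so the transformer just sees one integer sequence:

```python
TEXT_VOCAB = 50_000   # hypothetical text vocabulary size
AUDIO_CODES = 1_024   # hypothetical audio codebook size

def text_to_tokens(s):
    # stand-in for a real BPE tokenizer
    return [ord(c) % TEXT_VOCAB for c in s]

def audio_to_tokens(samples):
    # stand-in for a neural audio codec that maps frames to codebook ids,
    # offset past the text vocabulary so the two id ranges never collide
    return [TEXT_VOCAB + (int(x) % AUDIO_CODES) for x in samples]

# One flat sequence of ints; imitating your voice is then ordinary
# next-token prediction over this sequence (in-context learning).
prompt = text_to_tokens("Say this back: ") + audio_to_tokens([3, 141, 59, 265])
print(prompt)
```

From the model's side there is nothing to "train on": the audio tokens sit in context exactly like pasted text does.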
This was inference, not training, just like how you can paste a few paragraphs of text into ChatGPT and ask it to write another paragraph in a similar writing style.
Jensson|1 year ago
throwaway48540|1 year ago
simonw|1 year ago