I think if we replaced "AI" with "taking averages over subsets of historical examples", there'd be no mystery about when "AI" will be good or bad at anything.
Would we expect a discrete melodic structure to be expressible as averages of prior music? No.
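A toy sketch of that point (the melodies and pitches below are my own made-up example, not anything from a real model): averaging two perfectly good melodies note-wise lands between the keys, on pitches that belong to neither.

```python
# Hypothetical illustration: note-wise averaging of two simple melodies.
# Pitches are MIDI note numbers.
melody_a = [60, 64, 67, 72]   # C4 E4 G4 C5 -- a C major arpeggio
melody_b = [67, 71, 74, 79]   # G4 B4 D5 G5 -- a G major arpeggio

average = [(a + b) / 2 for a, b in zip(melody_a, melody_b)]
print(average)  # [63.5, 67.5, 70.5, 75.5] -- quarter-tones, in neither key
```

Discrete structure lives on a lattice; averages fall off it.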
Pretty sure the first continuation is a famous piece with a few notes messed up; I can't remember the name. Honestly, it only sounds marginally better than the old Markov-chain continuations.
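For anyone who hasn't seen them, the "old Markov-chain continuations" work roughly like this (a minimal sketch with a made-up training melody, not any particular system): count pitch-to-pitch transitions, then sample a continuation one note at a time.

```python
import random
from collections import defaultdict

# Toy first-order Markov chain over pitches. "Training" melody is invented.
training = [60, 62, 64, 62, 60, 62, 64, 65, 64, 62, 60]

transitions = defaultdict(list)
for prev, nxt in zip(training, training[1:]):
    transitions[prev].append(nxt)  # each observed next-note, with multiplicity

random.seed(0)
note = 60
continuation = []
for _ in range(8):
    note = random.choice(transitions[note])  # sample proportionally to counts
    continuation.append(note)
print(continuation)
```

Each step is locally plausible, but nothing in the state remembers where the melody has been or where it should go, which is why these continuations wander.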
Indeed, there is a lot of denial or ignorance in this thread (ignorance in the technical sense). AudioLM has already produced impressive results, and it's a tiny fraction of what is already possible, because performance simply improves with scale. One could probably solve music generation today with a ~$1B budget for most purposes, like film or game music, or personalized soundtracks. This is not science fiction.
It doesn't surprise me that an AI model for language can't grok maths or music. I can't see how a language model can map to maths. Hell, I don't even know how to describe music in words. It's possible to articulate some maths in words, but that often involves using words with unexpected definitions.
MIDI is extraordinarily expressive and has likely been used to sequence the large majority of music produced in the last three decades. A lot of the instruments you hear are synthesizers or samplers driven directly from MIDI. There is a lot more to what MIDI can do, and is used for, than most people's conception from "canyon.mid" or old website background music. If an AI can do MIDI just fine, it's an extremely small leap to doing audio just fine.
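To make the expressiveness point concrete, here is what a single MIDI note event actually carries (a minimal sketch with invented values; real messages per the MIDI 1.0 channel-voice format): not just which note, but which channel and how hard it was struck.

```python
# Channel-voice status bytes from the MIDI 1.0 spec.
NOTE_ON, NOTE_OFF = 0x90, 0x80

def note_on(channel: int, pitch: int, velocity: int) -> bytes:
    """Build the 3 raw bytes of a MIDI note-on message."""
    return bytes([NOTE_ON | channel, pitch, velocity])

# Middle C on channel 0, struck fairly hard (velocity 100 of 127):
msg = note_on(0, 60, 100)
print(msg.hex())  # '90' status+channel, '3c' pitch, '64' velocity
```

Velocity alone gives 127 dynamic levels per note, and that's before pitch bend, aftertouch, and continuous controllers, which is how sequenced MIDI ends up sounding like a performance rather than a ringtone.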
That’s what a musician does. They make short loops and loop them.
This reads like someone who knows sheet music and theory but does not listen to music. It’s repetition of short phrases over and over.
I’m not really sure what people expect of general AI trained on human-generated outputs. It can’t make up anything “net new”; it can only compose based upon what we feed it.
I like to think AI is just showing us how simple minded we really are and how our habit of sharing vain fairy tales about history makes us believe we’re masters of the universe.
Those models are not trained on short loops. They are trained on whole songs just like image generation models are trained on whole images. And yet they struggle to repeat sections, modulate to a different key, create bridges, intros and outros. After a few seconds of hallucinating a melodic line they simply abandon the idea and migrate to another one. There is no global structure whatsoever.
vladf|3 years ago
https://google-research.github.io/seanet/audiolm/examples/
But yes, there is not yet on-demand, push-button rendering, from a text prompt, of bitstreams encoding composed, performed, and mastered music.