item 9769680

Algorithmic Music Generation With Recurrent Neural Networks [video]

67 points | rndn | 10 years ago | m.youtube.com

31 comments

[+] crucialfelix|10 years ago|reply
The funny thing is that the only really good ones are the first few, where they claim it's just random noise. The later ones just sound like a crappy radio.

With images the technique works because we like looking at the dense artifacts. Millions of dog heads essentially copy and pasted onto any appendage that looks like it should be a head. It looks like drugs and overwhelms in the same way.

If you just take white noise and throw it through a tuned filter bank (which is in essence and in final effect what they are doing here) then you just get crappy audio.
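A minimal sketch of that idea (not the video's actual pipeline): pushing white noise through a small bank of narrow resonators gives coloured, vaguely pitched noise, nothing more. The sample rate, centre frequencies, and pole radius below are arbitrary illustrations.

```python
import numpy as np

def resonator(x, f0, sr, r=0.99):
    """Band-pass x around f0 Hz with a two-pole resonator (pole radius r)."""
    a1 = 2 * r * np.cos(2 * np.pi * f0 / sr)
    a2 = -r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

sr = 8000                                  # sample rate (Hz)
rng = np.random.default_rng(0)
noise = rng.standard_normal(sr)            # one second of white noise
# A "tuned filter bank": a few resonators summed. The output is
# coloured noise with energy piled up around the chosen frequencies.
out = sum(resonator(noise, f0, sr) for f0 in (220.0, 440.0, 880.0))
```

The output sounds tonal only because the filters impose their resonances; no musical structure has been added.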

The more standard and successful use of NNs in composition is to use them on pitch series and compositional forms: feed the model all of Beethoven, then have it generate similar compositions. That's been going on for decades. You can do it with that kind of data.
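That older, symbolic approach can be sketched with something as crude as a first-order Markov chain over pitches (a deliberate stand-in for the RNN variants; the two-melody corpus and MIDI note numbers below are made up):

```python
import random
from collections import defaultdict

# Toy corpus of pitch sequences (MIDI note numbers); stands in for, say,
# a corpus of Beethoven melodies.
corpus = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 64, 67, 64, 60, 62, 64, 62, 60],
]

# First-order Markov model: count which pitch follows which.
transitions = defaultdict(list)
for melody in corpus:
    for a, b in zip(melody, melody[1:]):
        transitions[a].append(b)

def generate(start=60, length=16, seed=0):
    """Walk the transition table to emit a new pitch sequence."""
    random.seed(seed)
    out = [start]
    for _ in range(length - 1):
        nxt = transitions.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return out
```

Every generated note is drawn from patterns seen in the corpus, which is exactly why such systems produce "similar compositions" rather than new ideas.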

But the thing about pop and electronic music is that the easily machine-observable elements are not very interesting. Listen to the 4/4 kick and snare pattern in the video. It's boring as hell. (Other tracks can be just a kick and snare and they are amazing; we celebrate them as classics and play them for 20 years. Machines will never understand why.)

What's great and essential are things like the spatial relationships between elements in the mix: how does the surge of the compressed synth/guitar cause the beat to tumble outward and stir you up? After a series of peaks in a synth melody, the next time it pulls back, creating a space that pulls at your heartstrings. You create a negative space that the listener goes into. You play with listeners' expectations based on the songs, conventions and tropes they already know and respond to.

[+] sweezyjeezy|10 years ago|reply
The image stuff works because we have found a way to model a good prior for it : convolution layers are basically enforcing some positional invariance and locality constraints on what our model believes the world looks like. Without this very strict prior, image recognition with neural networks just wouldn't really work.
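The positional-invariance part of that prior can be seen directly: because a convolution slides one shared kernel across the input, shifting the input just shifts the output. A toy 1-D illustration:

```python
import numpy as np

kernel = np.array([1.0, -1.0])   # one shared "edge detector" weight pair
x = np.zeros(10)
x[3] = 1.0                       # impulse at position 3
x_shifted = np.roll(x, 2)        # same impulse, moved to position 5

y = np.convolve(x, kernel, mode="full")
y_shifted = np.convolve(x_shifted, kernel, mode="full")
# Shifting the input by 2 shifts the feature map by 2: translation
# equivariance, which (combined with pooling) yields the approximate
# positional invariance that makes convnets work on images.
```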

We haven't found a way to enforce a good prior for temporal data like sound yet.

[+] melloclello|10 years ago|reply
> Machines will never understand why

Humans don't really understand why either. If that didn't stop us, why should it stop a machine?

[+] rndn|10 years ago|reply
> Its boring as hell. Machines will never understand why

That statement seems overly strong. Perhaps we are far from having a machine that can figure out the high-level aspects of a song on its own, but I don't think it's implausible that with some more guidance (for example, by also learning loop arrangements and filters instead of only the waveforms) these neural networks could create quite interesting music today (especially in the EDM/IDM genres). This development might be scary because it could replace human creativity to a large extent, but you can't stop it by claiming that it's impossible or that it will always be of poor quality. People said the same when synthesized music arrived, that it lacked human aspects and so on, and now it has high cultural significance, even though it uses things like auto-tune and consists of super-clean loops.

[+] aflinik|10 years ago|reply
I think I like your musical taste.
[+] mpdehaan2|10 years ago|reply
I'm not 100% up to speed on my AI, but this sounds about like what you'd get with random variations on a signal, where the neural net is the "which sounds like X" filter and picks which variant survives. That would be using both some form of genetic algorithm (details TBD?) and the neural net as the checker. But is it?

If they aren't doing it that way, I'd be interested in hearing how it's evolving the signal in that given direction - and also how that filter works (what libraries does it use?).

Sounds like it hit some sort of local maximum, so this system won't ever produce the original song, just something a percentage of the way toward it.
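The guessed mechanism can be made concrete as a simple hill climber: mutate the signal at random and let a scoring function, standing in for the hypothetical "sounds like X" neural-net filter, decide which variant survives. None of this is from the video; it only sketches the guess, and the score here is just negative squared error.

```python
import random

def similarity(candidate, target):
    # Stand-in for the "which sounds like X" neural-net filter: here just
    # negative squared error against a target signal.
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def evolve(target, steps=200, seed=0):
    """Hill-climb from silence toward the target, keeping any mutation
    that scores better -- a (1+1) evolutionary strategy."""
    random.seed(seed)
    current = [0.0] * len(target)
    for _ in range(steps):
        mutant = [v + random.gauss(0, 0.1) for v in current]
        if similarity(mutant, target) > similarity(current, target):
            current = mutant          # the better-scoring variant survives
    return current
```

A climber like this is exactly the kind of process that stalls at a local maximum: it only ever moves a percentage of the way toward the target.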

I'm a bit more interested in algorithmic composition, but this could be interesting if trying to blend genres. For a long time I've wanted to build a program that could produce essentially an infinite song morphing between genres with lots of tunable parameters.

[+] msamwald|10 years ago|reply
It would be interesting to know how novel those sequences are (obviously, the outcome would be far less impressive if what we hear is basically a looped, noisy sample of a song that already exists).
[+] m-i-l|10 years ago|reply
Not much information in the video on how this was achieved, but a quick search for "gruv algorithmic music generation" returns the following: https://sites.google.com/site/anayebihomepage/cs224dfinalpro... . Extract:

> We compare the performance of two different types of recurrent neural networks (RNNs) for the task of algorithmic music generation, with audio waveforms as input (as opposed to the standard MIDI). In particular, we focus on RNNs that have a sophisticated gating mechanism, namely, the Long Short-Term Memory (LSTM) network and the recently introduced Gated Recurrent Unit (GRU). Our results indicate that the generated outputs of the LSTM network were significantly more musically plausible than those of the GRU.
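The "sophisticated gating mechanism" that extract mentions is compact enough to write out. Here is a single GRU step in plain NumPy, following the standard Cho et al. (2014) formulation rather than anything specific to the linked project; the weight shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One Gated Recurrent Unit step: the gates decide how much of the
    old hidden state h to keep versus overwrite with the candidate."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate hidden state
    return (1 - z) * h + z * h_cand
```

The LSTM adds a separate cell state and an output gate on top of this; the comparison in the linked report is between these two gating schemes applied to raw waveform chunks.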

[+] leaveyou|10 years ago|reply
Another promising field is RNN applied to TED talks: youtube.com/watch?v=-OodHtJ1saY
[+] acd|10 years ago|reply
I think it would sound better if we taught the neural network to play notes and music theory.
[+] anentropic|10 years ago|reply
This is not music. Music is not simply organised sound; music is a cultural practice.
[+] yyyyes|10 years ago|reply
This is not a cultural practice?