
Generating music with expressive timing and dynamics

126 points | iansimon | 8 years ago | magenta.tensorflow.org

22 comments


contingo|8 years ago

It's refreshing to hear generated piano music that isn't either strictly metrical or entirely freeform, but with patches where you do get a somewhat natural sense of rubato and sensitive dynamic shaping. It's sort of convincingly improvisatory. The constantly shifting harmonic idiom is disorienting in a not very pleasant way – the worst kind of Chopin + Ligeti mashup – especially when you raise the temperature. It would be interesting to use period/style-specific training sets.

To my ears the 5:00 clip does have a larger structure, there are clearly extended passages of building up to and ebbing away from large climaxes, where you get a real sense of sustained intensification, but of course if you follow the detail everything is built up from lots of fleeting and unrelated ideas.

iheartmemcache|8 years ago

> "It's sort of convincingly improvisatory."

I'm not sure if it was the dynamics specifically, but it was clear to me that A was human. Within 30 seconds I was so sure, I hit pause and loaded the answer to see if I was right (I was, and I'm likely the worst pianist on these forums and only a casual fan of music that falls into the 'classical' genre.)

Here's[0] a fabulous physics paper that analyses the 16th-note timing of a studio drummer widely considered one of the best in his field. IIRC, the paper mentions he couldn't record with a click because it'd throw him off. That said, the quality of the recording didn't suffer (his second take of the track was more than good enough for the rest of the musicians to record against), so his own 'internal metronome' was clearly up to the job. The interesting thing wasn't that his playing was incompatible with a click track, but that the timing skew that evolved over each phrase was well fit by a mathematical model. The study compared his recording against a corpus of user submissions of the same track, and all of those drummers followed a similar set of timing dynamics. So presumably all humans (or at least all western drummers who elected to submit their recordings) share that same skew intrinsically.
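The flavor of microtiming analysis the paper does can be sketched in a few lines. Everything below is toy data, not the paper's actual dataset or model: deviations of simulated 16th-note hits from a metronomic grid, with a lag-1 autocorrelation as a crude stand-in for the correlation structure the paper fits.

```python
import numpy as np

# Hypothetical data: 16th-note hit times (seconds) at 120 BPM,
# where the exact grid spacing would be 0.125 s.
rng = np.random.default_rng(0)
grid = np.arange(64) * 0.125
hits = grid + rng.normal(0.0, 0.010, size=grid.size)  # ~10 ms of jitter

# Deviation of each hit from the metronomic grid.
dev = hits - grid

# Lag-1 autocorrelation of the deviations: values near zero suggest
# white-noise jitter, while persistently positive values would suggest
# the kind of correlated drift the paper reports in human drummers.
d = dev - dev.mean()
lag1 = float(np.dot(d[:-1], d[1:]) / np.dot(d, d))
print(round(lag1, 3))
```

With real recordings you'd extract the hit times via onset detection first; the interesting part is that human deviations turn out to be far from white noise.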

It's interesting whether it's a byproduct of culture (like an accent) or a feature intrinsic to humans. In fact, that itself would be an interesting study -- compare the timing patterns of a traditionally schooled western jazz drummer vs a tribal African drummer vs an Indian tabla drummer. The end of the paper suggests additional avenues to explore, but who knows, maybe drumming will soon be 'solved'?

I'm totally with you on seeing how it would do training against a specific set of recordings from a specific region and/or era. The results would be terribly interesting! Or training just against some particular virtuoso, like Gould on Bach or Horowitz on Chopin.

As I understand it, there are basically just a handful of songwriters out there (Shane McAnally is a prime example) who write songs for the major country-pop artists. If you have a listen to this[1], you can really hear how similar each song is. (This isn't exclusive to country music - the 90s pop I grew up loving is pretty much the same, as demonstrated by Rob Paravonian[2].) There's probably a lot of money in automated songwriting for Katy Perry & her entourage. Startup idea for any of you kids.

IIRC, there's already a startup that uses pinterest, tumblr, and more obscure sites like lookbook to analyze and generate trends for clothing and interior design, which design houses can pay a semi-nominal fee to access. H&M is great at pumping out high-street fashion copies within a season, but imagine being able to actually beat Tom Ford to market.

There are also interesting sociological implications. The culture of chess changed with Deep Blue. When I first read about AlphaGo I was floored. (I mean really. I had previously thought Go would be intractable within my lifetime because of the huge configuration space.) As these 'good enough' models emerge, they'll have wide implications for human culture as a whole.

I wonder how it will affect the value of artists (in any genre). An ex of mine who hated basketball (this was during the Kobe/Paul Pierce days) still managed to recognize the genius when I showed her some Michael Jordan clips. Certainly an artist in his craft. I'm not a fan of Lady Gaga[3], but when I saw this performance I could immediately see a significant amount of talent. Walter Murch is an absolutely amazing film editor; will he be reduced to a Final Cut Pro plugin? If I manage to get my hands on the all-22 recordings of every American football team (for every NFL game, there's an overhead camera that records the whole field so coaches can analyze their opponents), can I out-tactic Bill Belichick?

==

[0] journals.plos.org/plosone/article?id=10.1371/journal.pone.0127902 (Seriously, it's a fantastic paper.)
[1] https://www.youtube.com/watch?v=FY8SwIvxj8o
[2] https://www.youtube.com/watch?v=JdxkVQy7QLM
[3] https://www.youtube.com/watch?v=oP8SrlbpJ5A

henearkr|8 years ago

It seems that this model has no notion of "cadence" (the punctuation of musical grammar, supplied by harmony and tonality). Expressivity has to correlate with that harmonic grammar, or it doesn't make sense. Unfortunately the samples in the article don't sound very good to me, and I'm fairly sure that's why.

kastnerkyle|8 years ago

This is stunning! Great stuff.

Since the input and prediction form a single sequence, did you experiment with beam search/stochastic beam search decoding (maybe with additional diversity criteria)?

I found that even simple models (Markov chains) got a big diversity boost from a stochastic beam search - it might avoid the low-temperature repetition problems that can happen in a standard beam search. However, my music models are much, much (much) worse than this, so my relative improvement might be related to that.
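The idea can be sketched over a toy model. Everything below is hypothetical (a fixed three-token "model" standing in for a real Markov chain or RNN), and this simple variant samples the surviving beams in proportion to their probability instead of keeping the top-k deterministically; for brevity it samples with replacement, unlike fancier without-replacement schemes.

```python
import math
import random

random.seed(0)

# Toy next-token model: a fixed distribution over three note tokens,
# regardless of the prefix. A real model would condition on the prefix.
def step_logprobs(prefix):
    probs = {"C": 0.5, "E": 0.3, "G": 0.2}
    return {tok: math.log(p) for tok, p in probs.items()}

def stochastic_beam_search(beam_width=3, steps=4):
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # Stochastic step: sample k survivors weighted by probability,
        # rather than deterministically keeping the k highest scores.
        # This is what keeps diversity up compared to plain beam search.
        weights = [math.exp(s) for _, s in candidates]
        beams = random.choices(candidates, weights=weights, k=beam_width)
    return beams

for seq, score in stochastic_beam_search():
    print("".join(seq), round(score, 2))
```

A deterministic beam search would collapse here to k copies of the highest-probability continuation; the sampled version spreads out over plausible alternatives.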

Similarly, I'm finding really nice results on text (RNN-VAE) with scheduled sampling; it might be worth experimenting with.
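For anyone unfamiliar with scheduled sampling: during training, each input token is the ground truth with some probability and the model's own previous prediction otherwise, with the latter probability annealed upward over training. A minimal sketch of the input-mixing step (toy function names, not any particular library's API):

```python
import random

random.seed(0)

def scheduled_sampling_inputs(ground_truth, model_predictions, p_model):
    """Build the decoder input sequence for one training example.

    At each timestep, feed the model's own previous prediction with
    probability p_model, otherwise the ground-truth token (teacher
    forcing). p_model is typically annealed from 0 toward 1 as
    training progresses.
    """
    inputs = [ground_truth[0]]  # the first input is always ground truth
    for t in range(1, len(ground_truth)):
        if random.random() < p_model:
            inputs.append(model_predictions[t - 1])
        else:
            inputs.append(ground_truth[t])
    return inputs

# p_model = 0 is pure teacher forcing; p_model = 1 feeds only the
# model's own outputs, matching the conditions seen at generation time.
print(scheduled_sampling_inputs(list("ABCDE"), list("abcde"), 0.5))
```

The point is to close the train/generate mismatch: at sampling time the model only ever sees its own outputs, so exposing it to them during training tends to make long generations more stable.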

I am amazed at how good this next-step sampled output is. The above ideas might just hurt the result, I am having a hard time imagining how it could be better.

What soundfont/midi rendering package is used for this? The piano sound is really rich.

Looking forward to hearing what creative things users will do with this model.

iansimon|8 years ago

Hey Kyle, we didn't try anything more advanced than next-step sampling. You probably have a better sense than I do how much improvement such techniques are likely to yield. My unfounded suspicion is that we're close to the limit of generation quality from this dataset, and so I'm most interested in trying to gather 10-100x more skilled performances, one way or another.

There's also no consensus on whether the high- or low-temperature samples sound better. I've heard both opinions from several people.
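For readers unfamiliar with the temperature knob being debated here: it rescales the model's logits before the softmax, so low temperature sharpens the sampling distribution (safer, more repetitive output) and high temperature flattens it (more surprises). A minimal sketch:

```python
import math

def sample_probs(logits, temperature):
    # Dividing logits by the temperature before the softmax sharpens
    # the distribution when T < 1 and flattens it when T > 1.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
print([round(p, 3) for p in sample_probs(logits, 0.5)])  # sharper
print([round(p, 3) for p in sample_probs(logits, 1.5)])  # flatter
```

At T → 0 this approaches greedy argmax decoding; at very high T it approaches uniform sampling, which is why neither extreme tends to sound musical.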

Sageev did the final rendering, not sure what he used but I'm pretty sure it was nothing too fancy.

divenorth|8 years ago

I think the choice of piano sound really sells the quality of the result. Musically it's not that great, since it still sounds like random noodling, but it's much better than any other implementation I've heard.

DomreiRoam|8 years ago

Could it mean that you could generate music for games that would follow the action and help build up tension?

pasta|8 years ago

This is already done in a lot of games, but those use precomposed parts that are dynamically morphed into each other as the action changes.

the_cat_kittles|8 years ago

that first example is jaw-dropping. it's just like what good musicians do when they're noodling. damn. well done! probably the best results i've ever heard for this type of effort.

3131s|8 years ago

I'm curious how many tries it took to get that. I've tried chopping up samples from piano music using onset detection and then recombining them programmatically. The results were actually more interesting to me musically, but also not as reminiscent of a traditional classical/romantic piano piece.

So this is probably the best RNN-generated music I've heard too, but overall I'm still not extremely impressed.