Thanks for sharing - I've been wondering how the newer generation of AI models would do on chorales.
I don't think this would have gotten great marks in my collegiate music theory IV class (IIRC it covered special cases of voice leading and rarer harmonic progressions like the augmented 6th and Neapolitan chords), but honestly neither did I. In my high school classes this probably would've gotten passing grades, at least on short exercises.
Is there an easy way to get it to "true up" the intonation on the longer chords? IMO part of the magic of this style of music is that a cappella performances aren't constrained to equal temperament, and get really nice resonances on anything they hold.
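In principle you could even do this as a post-processing step: leave moving lines in equal temperament but snap sustained sonorities to pure ratios above the root. A minimal sketch of what I mean (the function names and MIDI-note input are my own assumptions, nothing from the project):

```python
# Hypothetical sketch: retune a held major triad from equal temperament
# to just intonation. Assumes MIDI-note input; names are illustrative only.

A4 = 440.0

def equal_tempered_hz(midi_note):
    """12-TET frequency for a MIDI note number (A4 = 69 = 440 Hz)."""
    return A4 * 2 ** ((midi_note - 69) / 12)

# Pure frequency ratios above the chord root for a major triad (4:5:6).
JUST_RATIOS = {0: 1.0, 4: 5 / 4, 7: 3 / 2}  # root, major third, perfect fifth

def just_intoned_hz(root_midi, chord_midi_notes):
    """Retune chord tones to pure ratios above the equal-tempered root."""
    root_hz = equal_tempered_hz(root_midi)
    return [root_hz * JUST_RATIOS[(n - root_midi) % 12] for n in chord_midi_notes]

# C major triad (C4, E4, G4): the just major third comes out about 14 cents
# flatter than its equal-tempered counterpart, which is where the nice
# "locked" resonance on held chords comes from.
held_chord = just_intoned_hz(60, [60, 64, 67])
```
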
It seems it gets the best results when you give it a melody.
Pretty impressive nonetheless.
I wrote a terrible script [1] a few years ago to do similar chorale harmonizations. I guess this is the year when most of my old projects get an AI version that completely outclasses them ;-(
I'm sure your script runs a lot faster than my model :D A well-tuned heuristic script can probably harmonize as well as any black-box deep learning model. I was mostly just curious how diffusion models would handle symbolic music data. This model does reasonably well on short time scales but has no idea about long-term context.
Listening to any span of a few seconds, it sounds quite nice! But beyond the length of a phrase or so, it just does not make any sense.
It reminds me a bit of that project to "create Beethoven's tenth" using an AI that I heard about a year ago. It was amazing in many ways, but the music didn't go anywhere and wasn't saying anything. I know that description is nebulous: the sort of feeling you might imagine or trick yourself into having, or a perception you invent out of defensiveness, or claim to have just to seem cultured.
And perhaps in a "blind" comparison, cantable diffuguesion wouldn't stand out as much. But with all that said, it definitely sounds not quite human after a moment.
I wonder how we can teach machines the larger-scale structures of (common practice) music. At the scale of an entire movement, structures can be merely formulaic and the music still turns out alright. At the level of phrases and themes, though, it's harder to articulate, and requires good taste. But it's the sort of "intuitive" thing that I'd expect AI to be good at, so I'm always surprised that it seems to be the thing AIs are worst at.
(My background: although I've not studied Bach's chorales in any depth, I've studied a few years of composition and used to be a church organist.)
>Four-part chorales are presented to the network as 4-channel images. As in Stable Diffusion, a U-Net is trained to predict the noise residual.
>After training the generative model we add 12 channels to the inputs, with the middle four channels representing a mask, and the last four channels are masked chorales. We mask the four channels individually, as opposed to Stable Diffusion Inpainting that use a one-channel mask.
How were they encoded, specifically? Anyway, it's fairly easy to break; for example, try "c'4 c'#4 d'4 d'#4 e'4 f'4 f'#4" as the melody.
There was a typo in the readme, thanks for pointing this out! I add 8 channels (4 mask + 4 masked chorales). The chorales are transformed into 4-dimensional arrays, each channel representing a part of the piece. I've added some example plots to the readme to illustrate.
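To make that concrete, here's a rough sketch (array shapes and variable names are my assumptions, not the actual implementation) of how the 12-channel input could be assembled from a 4-channel chorale "image":

```python
# Rough sketch, not the project's actual code: assemble the 12-channel
# inpainting input from a 4-channel chorale (one channel per voice).
import numpy as np

n_voices, height, width = 4, 64, 64  # S, A, T, B as separate channels

chorale = np.random.rand(n_voices, height, width)              # clean data
noisy = chorale + 0.1 * np.random.randn(n_voices, height, width)  # diffusion noise

# Per-voice mask: 1 where a voice is given, 0 where the model must infill.
# Masking voices individually is what lets you fix only the melody and
# have the model generate the other three parts.
mask = np.zeros_like(chorale)
mask[0] = 1.0  # keep the soprano

masked_chorale = chorale * mask

# 4 noisy + 4 mask + 4 masked chorale channels = 12 input channels total
model_input = np.concatenate([noisy, mask, masked_chorale], axis=0)
```
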
My not very substantive response is that I like the project name. Maybe lots of text generators can come up with wordplay involving Subject 1 and Subject 2, but I think you've already got one running.
[1] https://github.com/bibanez/harmonizer