
Open source implementation of Google's MusicLM in PyTorch

118 points | bevenky | 3 years ago | github.com

22 comments


albertzeyer|3 years ago

This guy (Phil Wang, https://github.com/lucidrains) seems to have made a hobby of implementing every model and paper he finds interesting. See his GitHub page: he has 228 repos, and most of them are implementations of machine learning papers. Some of those repos are quite popular.

jamessb|3 years ago

The project README thanks "Stability.ai for the generous sponsorship to work and open source cutting edge artificial intelligence research", so it's not necessarily just a hobby (though it's possible they just provide compute resources).

Phil's homepage [1] links to a form [2] where you can suggest a paper for him to implement.

[1]: https://lucidrains.github.io/

[2]: https://forms.gle/Dtrxc6CceHEcqS6X6

asciii|3 years ago

He's open to consulting work, so the repos are a nice gallery of what's possible, as well as good learning material.

https://lucidrains.github.io/

He's also the creator of ThisPersonDoesNotExist.com

PartiallyTyped|3 years ago

The implementation is quite clean too, and provided you've read the papers, it's easy to understand.

alexmolas|3 years ago

I don't understand how this got so many upvotes. It takes only one minute to read the code and realize that the model is not yet completely implemented. Sometimes I have the feeling that people upvote posts without even reading them...

Of course, it's good work, and given lucidrains' track record it will probably be fully implemented in the coming days or weeks. But I wonder how many people even opened the link before upvoting it.

hall0ween|3 years ago

This question is a tangent to your work. Having never used music LMs, and being only cursorily aware of them: how do you keep up with the SOTA in your field?

kavalg|3 years ago

Google's MusicLM sounds plausible, but quite dull and even sometimes irritating to my musician's ear.

jtode|3 years ago

As another musician, I'll point out that there was a point when the only thing AI could produce visually was a lot of dog faces.

Lucasoato|3 years ago

Does anyone know if these models can also output MIDI instead of plain audio?

albertzeyer|3 years ago

This model is designed to output raw audio.

However, there are many models which do output MIDI. That's actually much simpler, and was already being done a few years ago.

I thought OpenAI did this. But then, I might misremember, because their Jukebox actually also seems to produce raw audio (https://openai.com/blog/jukebox/).

Edit: Ah, it was even earlier, OpenAI MuseNet, this: https://openai.com/blog/musenet/

However, MIDI generation is simple enough that you'll even find it in tutorials: https://www.tensorflow.org/tutorials/audio/music_generation
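To see why symbolic generation is so much easier than raw audio, here's a toy sketch (mine, not from the repo or the tutorial): treat a melody as a short sequence of discrete (pitch, duration) events and sample new sequences from a bigram transition table. Real MIDI models use RNNs or transformers over similar event tokens, but the data representation is the point here.

```python
import random

# Hypothetical toy example: symbolic music is just a short sequence of
# discrete events, so even a bigram model can "compose" something.
# Each event is (MIDI pitch, duration in sixteenth notes).
melody = [(60, 2), (62, 2), (64, 4), (62, 2), (60, 2), (64, 4), (67, 8)]

# Count bigram transitions between successive events.
transitions = {}
for a, b in zip(melody, melody[1:]):
    transitions.setdefault(a, []).append(b)

def sample(start, length, seed=0):
    """Autoregressively sample a new event sequence from the bigram table."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1])
        if not choices:  # dead end: fall back to the start event
            choices = [start]
        out.append(rng.choice(choices))
    return out

generated = sample(melody[0], 8)
print(generated)
```

Compare that with raw audio, where the model has to predict tens of thousands of samples per second, and it's clear why symbolic generation showed up years earlier.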

kolinko|3 years ago

Not out of the box, afaik. They produce spectrograms that get converted into wav/mp3.
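For intuition on that spectrogram round trip, here's a minimal numpy sketch (function names and window choices are my own): STFT a waveform into frames, then invert by overlap-add. Note that generative models typically predict only magnitudes, so in practice they also need phase estimation (e.g. Griffin-Lim) or a neural vocoder before you get a .wav out; MusicLM itself decodes via SoundStream rather than a plain inverse STFT.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Windowed short-time Fourier transform: waveform -> complex frames."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by overlap-add, normalizing by the summed window energy."""
    win = np.hanning(n_fft)
    out = np.zeros((len(spec) - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(spec):
        out[i * hop:i * hop + n_fft] += np.fft.irfft(frame) * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

t = np.linspace(0, 1, 16000, endpoint=False)
x = np.sin(2 * np.pi * 440 * t)  # one second of A440
x_rec = istft(stft(x))
# Interior samples should match closely (edges lack full window overlap).
err = np.max(np.abs(x[512:-512] - x_rec[512:-512]))
```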

wokwokwok|3 years ago

No. They can’t.

You could train a model that could, but these models can’t.

Paper: https://google-research.github.io/seanet/musiclm/examples/

Quote: “By relying on pretrained and frozen MuLan, we need audio-only data for training the other components of MusicLM. We train SoundStream and w2v-BERT on the Free Music Archive (FMA) dataset (Defferrard et al., 2017), whereas the tokenizers and the autoregressive models for the semantic and acoustic modeling stages are trained on a dataset containing five million audio clips, amounting to 280k hours of music at 24 kHz.”

TL;DR: you can only get out of these models what you put in, and these ones are trained on raw audio.

If you want MIDI output, you need to train a model on MIDI data.

henearkr|3 years ago

Nice work!

Won't training the model be quite costly, though?

bevenky|3 years ago

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch.

https://github.com/lucidrains/musiclm-pytorch/blob/main/musi...
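For anyone unfamiliar with "attention networks": the core operation is scaled dot-product attention. Here's a minimal numpy sketch of it (illustrative only; the repo builds full transformer stacks on top of this in PyTorch), including the causal mask that autoregressive models like this one use.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal=True):
    """q, k, v: (seq_len, dim). Returns a (seq_len, dim) mixture of v rows."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:  # autoregressive models mask out future positions
        scores = np.where(np.tri(len(q), dtype=bool), scores, -1e9)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))   # 6 token embeddings of dim 8
out = attention(x, x, x)          # self-attention: q = k = v = x
```

With the causal mask, position 0 can only attend to itself, so its output equals its own value vector; later positions mix in earlier ones.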

swyx|3 years ago

pardon my ignorance - what exactly is involved in reimplementing these models?

i assume there's only a superficial description of the architecture, and no weights to load in, so you'll have to train everything from scratch? do we even have their dataset?