This guy (Phil Wang, https://github.com/lucidrains) seems to make a hobby of implementing every model and paper he finds interesting. See his GitHub page: he has 228 repos, and most of them are implementations of some machine-learning paper. Some of those repos are quite popular.
The project README thanks "Stability.ai for the generous sponsorship to work and open source cutting edge artificial intelligence research", so it's not necessarily just a hobby (though it's possible they just provide compute resources).
I don't understand how this got so many upvotes. It takes only one minute to read the code and realize that the model is not yet completely implemented. Sometimes I have the feeling that people upvote posts without even reading them...
Of course, it's good work, and knowing lucidrains' track record it will probably be fully implemented in the coming days/weeks. But I wonder how many people at least opened the link before upvoting it.
This question is a tangent to your work. Having never used music LMs, and being only cursorily aware of them: how do you keep up with the state of the art in your field?
pardon my ignorance - what exactly is involved in reimplementing these models?
i assume there's only a superficial description of the architecture, and no weights to load in, so you'll have to train everything from scratch? do we even have their dataset?
albertzeyer|3 years ago
jamessb|3 years ago
Phil's homepage [1] links to a form [2] where you can suggest a paper for him to implement.
[1]: https://lucidrains.github.io/
[2]: https://forms.gle/Dtrxc6CceHEcqS6X6
asciii|3 years ago
https://lucidrains.github.io/
He is also the creator of ThisPersonDoesNotExist.com.
PartiallyTyped|3 years ago
alexmolas|3 years ago
hall0ween|3 years ago
kavalg|3 years ago
jtode|3 years ago
Lucasoato|3 years ago
albertzeyer|3 years ago
However, there are many models that do output MIDI. That's actually much simpler, and it was already done a few years ago.
I thought OpenAI did this. But then, I might misremember, because their Jukebox actually also seems to produce raw audio (https://openai.com/blog/jukebox/).
Edit: Ah, it was even earlier, OpenAI MuseNet, this: https://openai.com/blog/musenet/
However, MIDI generation is easy enough that you can even find it in tutorials: https://www.tensorflow.org/tutorials/audio/music_generation
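To make the contrast concrete, here is a deliberately tiny sketch (my own toy example, not taken from the tutorial above): treat MIDI pitch numbers as tokens and sample the next note from bigram counts learned from a couple of melodies. This is the whole idea of autoregressive note generation, minus the neural network.

```python
import random
from collections import Counter, defaultdict

def train_bigram(melodies):
    # count pitch -> next-pitch transitions across all training melodies
    counts = defaultdict(Counter)
    for mel in melodies:
        for a, b in zip(mel, mel[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, start, length, rng):
    # sample the next note from the learned transition counts
    seq = [start]
    for _ in range(length - 1):
        nxt = counts.get(seq[-1])
        if not nxt:           # pitch never seen mid-melody: stop
            break
        pitches, weights = zip(*nxt.items())
        seq.append(rng.choices(pitches, weights=weights)[0])
    return seq

# two toy melodies as MIDI note numbers (60 = middle C)
melodies = [[60, 62, 64, 65, 67], [60, 64, 67, 72]]
counts = train_bigram(melodies)
print(generate(counts, 60, 8, random.Random(0)))
```

A real system would model timing and velocity too (e.g. pitch/step/duration, as the linked tutorial does), but the output space is symbolic notes either way.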
kolinko|3 years ago
wokwokwok|3 years ago
You could train a model that could, but these models can’t.
Paper: https://google-research.github.io/seanet/musiclm/examples/
Quote: “By relying on pretrained and frozen MuLan, we need audio-only data for training the other components of MusicLM. We train SoundStream and w2v-BERT on the Free Music Archive (FMA) dataset (Defferrard et al., 2017), whereas the tokenizers and the autoregressive models for the semantic and acoustic modeling stages are trained on a dataset containing five million audio clips, amounting to 280k hours of music at 24 kHz.”
TL;DR: you can only get out of these models what you put into them, and these ones are trained on raw audio.
If you want MIDI output, you need to train a model on MIDI data.
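A minimal sketch of that point (my own toy, with none of the machinery of the actual SoundStream codec): a model trained on raw audio predicts discrete *audio* tokens, and decoding those tokens can only ever give you back a waveform — there is no MIDI anywhere in its output space.

```python
import math

def tokenize(wave, n_levels=16):
    # crude stand-in for a neural codec: scalar-quantize samples in
    # [-1, 1] to integer tokens 0..n_levels-1
    return [min(n_levels - 1, int((s + 1) / 2 * n_levels)) for s in wave]

def detokenize(tokens, n_levels=16):
    # decoding audio tokens reconstructs a waveform, not note events
    return [(t + 0.5) / n_levels * 2 - 1 for t in tokens]

# eight samples of a 440 Hz sine at 24 kHz (the paper's sample rate)
wave = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(8)]
tokens = tokenize(wave)
recon = detokenize(tokens)
print(tokens)  # discrete audio tokens — this is all the model can emit
```

An autoregressive model over these tokens learns "which audio token comes next", so its vocabulary is fixed by the training data: feed it raw audio and raw audio is what comes out.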
alephxyz|3 years ago
bevenky|3 years ago
henearkr|3 years ago
Won't the model training be a lot of cost to bear, though?
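For a rough sense of scale, here is some back-of-envelope arithmetic (my own, using only the numbers quoted from the paper: 280k hours of music at 24 kHz):

```python
hours = 280_000
sample_rate = 24_000  # Hz, per the paper

samples = hours * 3600 * sample_rate        # total raw audio samples
print(f"{samples:.3e} samples")

# at 16-bit (2 bytes) mono, raw uncompressed storage alone:
print(f"{samples * 2 / 1e12:.1f} TB")
```

That is ~2.4e13 samples and ~48 TB of uncompressed audio before any compute costs, so yes — reproducing the training run is far beyond a hobbyist budget, even with the code available.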
bevenky|3 years ago
https://github.com/lucidrains/musiclm-pytorch/blob/main/musi...
swyx|3 years ago