gabimtme|1 year ago
Hi HN! Author here. I wanted to show off the latest side project I've been cooking up for the past few months.
This is a terminal application that runs the latest AI models for music generation locally, using the device's CPU or GPU, without heavy dependencies like Python or machine-learning frameworks. It works seamlessly on Linux, macOS, and Windows, with a binary size of just ~30 MB for the non-GPU builds.
The app works like this:
- It accepts a natural language prompt from the user
- Generates a music sample conditioned on the prompt
- Encodes the generated sample into .wav format and plays it on the device
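To make the last step concrete, here is a rough illustration of the encode-to-.wav part (this is not the app's actual code, which isn't written in Python; it's a minimal stdlib sketch using the `wave` module, with a sine tone standing in for model output):

```python
import math
import struct
import wave

SAMPLE_RATE = 32000  # MusicGen models output 32 kHz audio

def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Encode floats in [-1.0, 1.0] as a 16-bit mono .wav file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 16-bit PCM
        f.setframerate(sample_rate)
        # Clamp and scale each float sample to a signed 16-bit integer.
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)

# Stand-in for generated audio: one second of a 440 Hz sine tone.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
        for t in range(SAMPLE_RATE)]
write_wav("sample.wav", tone)
```

In the real app the sample list would of course come from the model's decoder instead of a sine generator.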
Additionally, it ships with a UI for interacting with the AI models in a chat-like web application, storing chat history and generated music on the device.
The vision for the project is that it can eventually generate infinite music streams in real time: for example, an endless stream of always-new LoFi songs to listen to while coding. It's not quite there yet, though...
Hope you like it!

qwerty456127|1 year ago
I'm very glad this is finally available. I immediately asked it to do what I've wanted for almost a decade: generate 30 minutes of Bach's Cello Suite No. 1 in G, Prelude[1]. (I want to listen to it for hours, but I don't want a loop, nor do I want the whole original, only the prelude; subtle variations are welcome.) The result was 9 seconds of ear horror. Do I need better hardware to get a listenable result? (I ran it on a Ryzen laptop.)
[1] https://www.youtube.com/watch?v=mGQLXRTl3Z0

gabimtme|1 year ago
You can read more about the limitations here: https://huggingface.co/facebook/musicgen-small

gsuuon|1 year ago
This is cool! The Docker image made this easy to try out. What's the reason for the 30s limit? Would it be possible to generate bars and stitch them together?

gabimtme|1 year ago
There's a model version that can generate music conditioned not only on natural-language prompts but also on other pieces of music, so it's possible to generate 10-second chunks where each chunk is conditioned on the previous one.
The challenge with that model is that it's hard to export to ONNX format so that it can be run outside of a Python machine-learning framework.
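To sketch what the stitching half of that could look like (hypothetical, not the app's code): generate chunks with a small overlap, then linearly crossfade each chunk's head into the previous chunk's tail so the seams aren't audible.

```python
def crossfade_stitch(chunks, overlap):
    """Concatenate audio chunks (lists of float samples), linearly
    crossfading each chunk's first `overlap` samples with the
    previous chunk's last `overlap` samples."""
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        tail = out[-overlap:]   # samples fading out
        head = chunk[:overlap]  # samples fading in
        mixed = [
            tail[i] * (1 - i / overlap) + head[i] * (i / overlap)
            for i in range(overlap)
        ]
        out = out[:-overlap] + mixed + chunk[overlap:]
    return out

# Two 5-sample "chunks" with a 2-sample overlap -> 8 samples total.
stitched = crossfade_stitch([[1.0] * 5, [0.5] * 5], overlap=2)
```

A linear crossfade is the simplest option; an equal-power (cosine) fade usually sounds smoother, but the bookkeeping is the same.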