Show HN: Sonauto – A more controllable AI music creator
454 points | zaptrem | 1 year ago | sonauto.ai
My cofounder and I trained an AI music generation model and after a month of testing we're launching 1.0 today. Ours is interesting because it's a latent diffusion model instead of a language model, which makes it more controllable: https://sonauto.ai/
Others do music generation by training a Vector Quantized Variational Autoencoder like Descript Audio Codec (https://github.com/descriptinc/descript-audio-codec) to turn music into tokens, then training an LLM on those tokens. Instead, we ripped the tokenization part off and replaced it with a normal variational autoencoder bottleneck (along with some other important changes to enable insane compression ratios). This gave us a nice, normally distributed latent space on which to train a diffusion transformer (like Sora). Our diffusion model is also particularly interesting because it is the first audio diffusion model to generate coherent lyrics!
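For readers who want the distinction in code, here's a toy numpy sketch of the two routes (illustrative shapes and names, not Sonauto's actual model): the VQ route snaps each latent frame to its nearest codebook vector to produce discrete tokens for a language model, while the plain-VAE route keeps continuous latents that a diffusion model learns to denoise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encoder output: a sequence of 8 latent frames, 4 dims each.
z = rng.normal(size=(8, 4))

# --- VQ-VAE route (token-based systems) ---
# Snap each frame to its nearest entry in a learned codebook,
# producing discrete token ids an LLM can model.
codebook = rng.normal(size=(16, 4))              # 16 learned code vectors
dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = dists.argmin(axis=1)                    # shape (8,), ints in [0, 16)

# --- Plain VAE route (what a latent diffusion model wants) ---
# Keep the continuous latent; diffusion training only needs to
# predict the noise that was mixed in at a random noise level.
t = 0.5                                          # noise level in [0, 1]
noise = rng.normal(size=z.shape)
z_noisy = np.sqrt(1 - t) * z + np.sqrt(t) * noise
# A denoiser network would be trained so that model(z_noisy, t) ≈ noise.

print(tokens.shape, z_noisy.shape)
```

The key property the post relies on: the second route never discretizes, so the latent space stays continuous (and roughly Gaussian), which is what makes diffusion-style editing and conditioning tractable.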
We like diffusion models for music generation because they have some interesting properties that make controlling them easier (so you can make your own music instead of just taking what the machine gives you). For example, we have a rhythm control mode where you can upload your own percussion line or set a BPM. Very soon you'll also be able to generate proper variations of an uploaded or previously generated song (e.g., you could even sing into Voice Memos for a minute and upload that!). @Musicians of HN, try uploading your songs and using Rhythm Control/let us know what you think! Our goal is to enable more of you, not replace you.
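One plausible way to picture the BPM side of Rhythm Control (a guess at the mechanism, not necessarily how Sonauto implements it): render the tempo as a frame-level click signal that the model can condition on alongside the noisy latent.

```python
import numpy as np

def click_track(bpm: float, seconds: float, sr: int = 100) -> np.ndarray:
    """Binary conditioning signal with a 1 at every beat.

    sr is the frame rate of the conditioning signal, not the audio rate.
    """
    n_frames = int(seconds * sr)
    track = np.zeros(n_frames)
    frames_per_beat = sr * 60.0 / bpm
    beat = 0.0
    while beat < n_frames:
        track[int(beat)] = 1.0
        beat += frames_per_beat
    return track

cond = click_track(bpm=120, seconds=4)  # 4 s at 120 BPM -> 8 beats
print(int(cond.sum()))
```

An uploaded percussion line could be reduced to the same kind of signal via onset detection, which is what would let one conditioning pathway serve both the "set a BPM" and "upload your own drums" modes.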
For example, we turned this drum line (https://sonauto.ai/songs/uoTKycBghUBv7wA2YfNz) into this full song (https://sonauto.ai/songs/KSK7WM1PJuz1euhq6lS7; skip to 1:05 if impatient), or this other song I like better (https://sonauto.ai/songs/qkn3KYv0ICT9kjWTmins; we accidentally compressed it with AAC instead of Opus, though, which hurt quality).
We also like diffusion models because while they're expensive to train, they're cheap to serve. We built our own efficient inference infrastructure instead of using the expensive inference-as-a-service startups that are all the rage. That's why we're making generations on our site free and unlimited for as long as possible.
We'd love to answer your questions. Let us know what you think of our first model! https://sonauto.ai/
[+] [-] adrianh|1 year ago|reply
Speaking as a musician who plays real instruments (as opposed to electronic production): how does this help me? And how does this enable more of me?
I am asking with an open mind, with no cynicism intended.
[+] [-] zaptrem|1 year ago|reply
We want you to be able to upload recordings of your real instruments and do all sorts of cool things with them (e.g., transform them, generate vocals for your guitar riff, use the melody as a jazz song, or just get some inspiration for what to add next).
IMO AI alone will never be able to touch hearts the way real people do, but people using AI will be able to do so like never before.
[+] [-] LZ_Khan|1 year ago|reply
In that case it's a tool useful only to expert musicians.
[+] [-] whoomp12341|1 year ago|reply
It's a good muse, but I wouldn't trust what it makes out of the gate.
[+] [-] cush|1 year ago|reply
There's always going to be a balance between high-level tools like this with no dials and low-level tools with finer control, and while this touts itself as "more controllable", it's clearly not there yet. But the same way Adobe integrated outpainting and generative fill into Photoshop, it's only a matter of time before products like this are built into Ableton and VSTs, where a creator can highlight a bar or two and ask the AI to make the snippet more ethereal, create a bridge between the verse and the sax solo, or help with an outro.
That said, similar to generating basic copy for a marketing site, these tools will be great for generating cheap background music but not much else. Any musician, marketing agency, or film-maker worth their salt is going to need very specifically branded music, and they're likely willing to pay for a real licence to something audiences will recognize, using generative AI tools to remix that content to their specific needs.
[+] [-] _DeadFred_|1 year ago|reply
Look at current music production and compare it to past. Older music seems so much simpler. It was so much easier to come up with that 20% 'novel' when pop/recorded music was new. Ironically I think AI freeing people to focus on that 20% is going to add a lot of creativity to music, not reduce it.
I say this as someone who hates the concept of AI music. I'm actually really excited to see what it enables/creates (but I don't want to use it, even though I really could use it for vocals that I currently pay others to do for me).
I'll be here making my bad knockoffs of bad synth-pop bands, having fun and taking weeks to do 5% of what kids these days will start off with as their entry point, my 20% creativity ignored because my music sounds 'off' when I can't get the 80% familiar down.
People thought synthesizers were the end of music, yet Switched on Bach begot Jean Michel Jarre begot Kate Bush and on and on.
[+] [-] zaptrem|1 year ago|reply
Also, our model specifically excels at songs from the era before overproduction. Try asking for a Johnny Cash or Ella Fitzgerald-style country or swing/jazz song!
Here's an example: https://sonauto.ai/songs/taJX3GrKZW7C5qOhjopr
[+] [-] fennecbutt|1 year ago|reply
Why diffuse an entire track? We should be building these models to create music the same way humans do: diffuse individual samples, have the model build the song from those samples in a proper sequencer, diffuse vocals separately, etc.
The problem with Suno etc., as others have mentioned, is that you can't iterate or adjust anything. "Make the drums a little punchier and faster-paced right after the chorus" is a really tough request to process if you've diffused the whole track rather than built it up.
Same thing with LLM story writing: the writing needs a good foundation. Better to generate information about the world and its history first, then generate a story that takes that into account, versus a simple "write me a story about x".
[+] [-] garyrob|1 year ago|reply
I play guitar, but I'm not much of a guitarist or singer. I really like songwriting, not trying to be polished as a performer. So I intermittently look into the AI world to see whether it has tools I could use to generate a higher-quality song demo than I could do on my own.
I've been looking for something that could take a chord progression and style instructions and create a decent backing track for a singer to sing over.
But your saying "Very soon you'll also be able to generate proper variations of an uploaded or previously generated song (e.g., you could even sing into Voice Memos for a minute and upload that!)" is very intriguing. I mean, I can sing and play, it just isn't very professional. But if I could then have an AI take what I did and just... make it better... that would be kind of awesome.
In fact, I believe you could have a very big market among songwriters if you could do that. What I would love to see is this:
My guitar parts are typically not just strummed but involve picking, sometimes fairly intricate. I'm just not that good at it. It would be fantastic to have an AI that could take what I played and fix it so that it's closer to perfect.
And then to have a tool where I could say, "OK, now add a bass part," and "OK, now add drums" would be awesome.
[+] [-] maroonblazer|1 year ago|reply
https://www.pgmusic.com/
[+] [-] mschulkind|1 year ago|reply
https://youtu.be/PCYTqDSUbvU
[+] [-] dwallin|1 year ago|reply
I think it's better to think of finding the right song as a search through the space of all possible songs. The current approach is just "pick a random point in a general area." Once we find something roughly right, we need a way to iteratively tweak the aspects that aren't quite right, shrinking the search space and taking smaller and smaller steps in defined directions.
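The search analogy can be made concrete with a toy hill-climb over a latent vector (purely illustrative; no product works exactly like this): start from a random point, then accept small perturbations only when they move closer to the listener's target, shrinking the step size as you go.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins: 'target' is the song the listener actually wants,
# 'current' is the first random generation the model handed back.
target = rng.normal(size=16)
current = rng.normal(size=16)

def score(z: np.ndarray) -> float:
    """Higher is better: negative distance to the target."""
    return -float(np.linalg.norm(z - target))

start_score = score(current)
step = 0.5
for _ in range(200):
    candidate = current + rng.normal(scale=step, size=16)
    if score(candidate) > score(current):  # accept only improvements
        current = candidate
    step *= 0.99  # smaller and smaller steps in defined directions

print(start_score, score(current))
```

The missing piece in today's tools is the `score` function: right now the listener's ear is the only judge, and the products offer no way to feed that judgment back as a small directed step.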
[+] [-] Recursing|1 year ago|reply
I was recently really impressed by the state of AI-generated music, after listening to the April Fools LessWrong album https://www.lesswrong.com/posts/YMo5PuXnZDwRjhHhE/lesswrong-... . They claim it took them ~100 hours to generate 15 songs.
Can't wait for the day I can instantly generate a song based on a random blog post or group chat history, this seems like a step in that direction
[+] [-] echelon|1 year ago|reply
Focus on product. Give actual music producers something they'll find useful. These fad, meme products will compete on cutting-edge model capability for the 99% of casual users while ignoring actual music producers.
I'd like a product with more control, and it doesn't appear Suno or Udio are interested in this.
[+] [-] CuriouslyC|1 year ago|reply
Some things it made sounded OK, but the average generation quality wasn't fantastic. It did a folk guitar melody and a vocoded thrash-metal voice that I thought sounded pretty legit, but mostly the vocals had an ear-grating quality and everything had a bit of a low-bitrate vibe.
To be honest, though, I don't think you need to try to outcompete Suno. I think you want to get into DAWs and VSTs and become the tool the best producers in the world use. Spit out stems, and train your model on less-processed sounds, because things like matching reverb/delay and pre-squashed dynamics are a pain in the ass to work around.
Suno is trying to battle a large, established industry that is actually very creator-friendly and accessible. If you instead serve and enable that industry, I think that's the winning play.
[+] [-] zaptrem|1 year ago|reply
Thanks for the feedback re: DAWs, though! That would be really cool. Maybe we can tag tracks based on the effects applied to them to allow this to be more controllable.
[+] [-] lta|1 year ago|reply
Any plans to release the model(s) under an open license?
[+] [-] digging|1 year ago|reply
Well, maybe I'll try out the next AI music creator posted on HN.
[+] [-] rexreed|1 year ago|reply
I was thrilled by two of the versions it produced. I wish I could extend them, like one of the comments here suggested:
* ElectroKlezmerReggaeFunk 1: https://sonauto.ai/songs/s22rQEPnYsXy1yf7sjU0
* ElectroKlezmerReggaeFunk 3: https://sonauto.ai/songs/1iNTrA2CekPwp7XT9mmM
But wow, the Udio version:
* https://www.udio.com/songs/j4zpRYgG2GEDbWpLPYbuJb
[+] [-] pachico|1 year ago|reply
I entered the prompt "Noir detective music from the 60s. Low tempo, trumpet and walking bass" and got back a one-note song that had nothing to do with the prompt, apart from some lyrics that were a bit ridiculous.
This is just feedback; I'm eagerly waiting for something like this to surprise me, but I know it's really hard!
Happy to share the song/project/account, if you tell me how to :)