muxator|1 year ago
Musically, however, I can't help but notice that these models are still very far from being able to generate anything interesting: harmony, tempo, musical structure, dynamics, everything is muddled and shapeless. I guess there is still a lot of work to do, and I am not sure that purely generative models can reach higher levels. Maybe a mixed rule-based and generative approach would work?
Progress in this field is really fast, though, so I honestly don't know.
BriggyDwiggs42|1 year ago
muxator|1 year ago
notjulianjaynes|1 year ago
The GitHub page for Bark [1] links to a page about Chirp, which returns a 404 for me [2]. My guess is that the model behind suno.ai's song generator isn't much different from their text-to-speech model.
I also have a hunch that it was more a coincidence than intentional that the Bark model could produce music, and that this capability was spun off into this product.
Unfortunately, there still seem to be issues with Bark when generating long (e.g., book-length) spoken audio. That's too bad: as someone who has worked jobs that require a lot of driving, it would be awesome to have any text read to me in a natural-sounding voice.
[1] https://github.com/suno-ai/bark
[2] https://www.suno.ai/examples/chirp-v1
Almondsetat|1 year ago
gqcwwjtg|1 year ago
Elton John improvising on an oven manual is the high bar in my opinion. https://m.youtube.com/watch?v=8GuI4UUZrmw
muxator|1 year ago
Music is a language, even if it has no semantics. It has conventions, dialects, a syntax, a grammar. A musician uses multiple dimensions to convey what he wants or feels: just as an actor has to control his voice, posture, and interplay with other actors all at the same time, a good musician is aware of the structure of the piece he is composing or performing, the relations between its various subparts, and how the musical discourse progresses in time, in addition to agogics, dynamics, and sound color.
All of these aspects are continually compared against the conventions of the genre: mixed, evolved, strictly followed, or blatantly negated.
This is something a professional musician normally takes decades to master (musical geniuses aside).
A listener needs less time to learn to appreciate those nuances (but not that little: let's say years). Once you develop a taste, it becomes very easy to place a piece on the spectrum that runs from low-quality tunes to musical artistry.
I see nothing musically interesting in this (wonderful) PoC of speech synthesis.
Just to be clear: I didn't find anything particularly stunning even in Google's Bach Doodle from a few years ago: https://doodles.google/doodle/celebrating-johann-sebastian-b...
anileated|1 year ago
kevinmhickey|1 year ago
I wonder if these models would do better if the text were more poetic or punctuated differently.
wildzzz|1 year ago
ickyforce|1 year ago