(no title)
zebproj | 1 year ago
Some details:
The source code for this project can be found on github [0].
I am using an AudioWorklet node with custom DSP using Rust/WebAssembly. Graphics are just done with the Canvas API. The voice leading is done algorithmically using a state machine with some heuristics.
The underlying DSP algorithm is a physical model of the human voice, similar to the model you'd find in Pink Trombone [1], but with some added improvements. The DSP code for that is a small crate [2] I've been working on just for singing synthesizers based on previous work I've done.
0: https://github.com/paulBatchelor/trio
atech4826|1 year ago
Apologies for the late comment, but I had a query I wanted to share.
Would it be possible for you to create a tool that allows users to mimic human emotional sounds directly in the browser? I’m thinking of sounds like realistic coughs, sighs, gasps, and other vocal expressions like shouting or crying etc . It would be amazing if the tool could optionally incorporate TTS, but even without it, the functionality would be very valuable for content creators or people who need custom sound effects.
The idea is to let users customize these sounds by adjusting parameters such as intensity, pitch, and duration. It could also include variations for emotional contexts, like a sad sigh, a relieved sigh, a startled gasp, or a soft cough. An intuitive interface with sliders and buttons to tweak and preview sounds in real-time would make it super user-friendly, with options to save or export the generated audio much like the project pinktrombone
I’m quite new to this field and only have basic experience with HTML, CSS, and JavaScript. However, I am very much interested in this area and I was wondering if this is something that could be achieved using tools like CursorAI or similar AI-based solutions? Or better yet, is it possible for you to create something like this for people like me who aren’t very tech-savvy?
Thank you so much
zebproj|1 year ago
What a beautiful idea. Sadly, I do not think I currently have the skills required to build such a tool.
The underlying algorithms and vocal models I'm using here are just good enough to get some singing vowels working. You'd need a far more complex model to simulate the turbulent airflow required for a cough.
If you suspend disbelief and allow for more abstract sounds, I believe you can craft sounds that have similar emotional impact. A few years ago, I made some non-verbal goblin sounds [0] from very simple synthesizer components and some well-placed control curves. Even though they don't sound realistic, character definitely comes through.
0: https://pbat.ch/gestlings/goblins