WebGPU actually generates the speech entirely in the browser. The Web Speech API is great too, but it's less practical when the model is complicated to set up and integrate with the host's speech API.
The implementation of the Web Speech API usually involves the specific browser vendor calling out to their own proprietary, cloud-based TTS APIs. I say "usually" because, for a time, Microsoft used their local Windows Speech API in Edge, but I believe they've stopped that and have largely deprecated Windows Speech in favor of Azure Speech, even at the OS level.
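If you want to check this on your own machine, each `SpeechSynthesisVoice` exposes a `localService` flag saying whether the voice is synthesized on-device or by a remote service. Here's a rough sketch in TypeScript; `VoiceInfo` and `partitionVoices` are names I made up for illustration, and the results will vary by browser and OS.

```typescript
// Minimal shape of the fields read here; the browser's SpeechSynthesisVoice
// objects carry these same properties. `VoiceInfo` is a hypothetical name.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean;
}

// Hypothetical helper: split voice names into on-device and remote groups.
function partitionVoices(
  voices: readonly VoiceInfo[]
): { local: string[]; remote: string[] } {
  const local: string[] = [];
  const remote: string[] = [];
  for (const v of voices) {
    (v.localService ? local : remote).push(v.name);
  }
  return { local, remote };
}

// Browser-only usage, guarded so the snippet also loads outside a browser.
// getVoices() can return an empty list until the "voiceschanged" event fires.
const synth = (globalThis as any).speechSynthesis;
if (synth) {
  const report = () => console.log(partitionVoices(synth.getVoices()));
  report();
  synth.addEventListener("voiceschanged", report);
}
```

Run it in the console of a few browsers and compare which voices land in `remote`.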
scarface_74|1 year ago
moron4hire|1 year ago