Ask HN: How does Alexa avoid interrupting itself when saying its own name?
41 points | dumbest | 1 year ago
Self-Recognition: How does Alexa distinguish between its own voice and a user's voice saying "Alexa"?
Voice Characteristics: What specific features (e.g., pitch, tone) does Alexa analyze to recognize its own TTS voice?
Algorithms and Models: What machine learning models or algorithms are used to handle this task effectively?
Implementation: Are there any open-source libraries or best practices for developing a similar functionality?
Any insights or resources would be greatly appreciated. Thanks!
richarme | 1 year ago
Source: worked on 3rd party Alexa speakers
cushychicken | 1 year ago
It also has uses in noise-canceling headphones, voice-conferencing software, and, in some cases, radar/sonar.
No LLMs or deep learning at all - purely DSP!
Someone | 1 year ago
Slightly harder: keep it running, but discard hits that are timed close to the time you say “Alexa” yourself.
Even harder: have a second detector that is trained on the device saying “Alexa”, and discard hits that coincide with that detector firing. That second detector can be simplified by superimposing a waveform that humans will (barely) notice but that is easily detected by a computer on top of the audio whenever the device says “Alexa”.
Still harder: obtain the transfer function and latency between the speaker(s) and the microphone(s) and, using that, compute what signal you expect to hear at the microphone from the speaker’s output, then subtract that from the actual signal detected to get a signal that doesn’t include one’s own utterances.
That function could be obtained from one device in the factory or trained on-device.
I suspect the first already is close to good enough for basic devices. If you want a device that can listen while also playing music at volume, the last option can be helpful.
shagie | 1 year ago
If there is nothing in the frequency range from 3 kHz to 6 kHz, Alexa won't wake even when the wake word is spoken. https://youtu.be/iNxvsxU2rJE doesn't wake up anything.
https://www.theverge.com/2018/2/2/16965484/amazon-alexa-supe...
> Apparently, the Alexa commercials are intentionally muted in the 3,000Hz to 6,000Hz range of the audio spectrum, which apparently tips off the system that the “Alexa” phrase being spoken isn’t in fact a real command and should be ignored.
Compare that with selecting 'Alexa, what time is it' (I'm on a Mac) and doing "speak text". Same speaker (for me with the previous video).
I had one device set with a wake word of "Amazon" but that got really annoying when watching AWS training videos. I believe Ziggy is the best wake word for that reason.
imranq | 1 year ago
How do you know what noise to add?
solardev | 1 year ago
Not directly the same case but similar, Amazon trains Alexa to avoid certain mentions of her in commercials using acoustic fingerprinting techniques: https://www.amazon.science/blog/why-alexa-wont-wake-up-when-...
CoastalCoder | 1 year ago
I suggest we don't personify devices.
nickburns | 1 year ago
What I do find interesting, however, is that she'll sometimes wake to an utterance from other media I have playing and seems to 'know' immediately that she was inadvertently awoken: the 'listening' and 'end listening' tones sound in quick succession. I do not have voice recognition enabled (to the extent that that setting is respected).
01HNNWZ0MV43FF | 1 year ago
Speculation:
- To reduce latency, the "listening" tone plays as soon as the wake word chip hears the wake word
- To improve accuracy, the wake word chip keeps a circular buffer of the last couple seconds of audio, and the main CPU / DSP scans that when it wakes up
So you get spurious wakeups exactly the way a human does: you think you hear something, then you re-listen to it in your mind and realize it was something else.
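That speculation can be sketched as a two-stage pipeline: a cheap, trigger-happy detector that fires immediately (and plays the tone), plus a slower verifier that re-scores a ring buffer of recent audio and may cancel the wake. Everything here (class name, buffer length, detector interfaces) is hypothetical:

```python
from collections import deque

RATE = 16000            # assumed sample rate
BUFFER_SECONDS = 2.0    # "last couple seconds of audio"

class TwoStageWake:
    """Hypothetical two-stage wake pipeline."""

    def __init__(self, fast_detector, verifier):
        self.fast = fast_detector    # low-power wake-word chip stand-in
        self.verify = verifier       # accurate model on the main CPU/DSP
        self.ring = deque(maxlen=int(RATE * BUFFER_SECONDS))

    def feed(self, frame):
        """Process one frame of audio samples."""
        self.ring.extend(frame)
        if self.fast(frame):
            # The "listening" tone would play here, for low latency...
            if self.verify(list(self.ring)):
                return "wake"
            return "cancel"          # ...then the "end listening" tone
        return "idle"
```

With dummy detectors (e.g. a peak threshold for the fast stage and an energy threshold for the verifier) you can reproduce the "tone, then immediate cancel" behavior described above.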
icecube123 | 1 year ago
But as others have said, they might be able to just sleep the wake algorithm temporarily when they know it’s playing back its own wake word.
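Sleeping the wake algorithm during self-playback reduces to a timestamp gate: note when the device speaks its own name and discard detector hits that land inside (or just after) that window. A minimal sketch, with a made-up class name and guard interval:

```python
import time

class WakeGate:
    """Hypothetical gate that suppresses wake hits which coincide with
    the device playing back its own wake word."""

    def __init__(self, guard_seconds=1.0):
        self.guard = guard_seconds   # extra slack after playback ends
        self.muted_until = 0.0

    def note_self_playback(self, duration, now=None):
        """Call just before the device says 'Alexa' itself."""
        now = time.monotonic() if now is None else now
        self.muted_until = max(self.muted_until, now + duration + self.guard)

    def accept_hit(self, now=None):
        """Return True if a detector hit at `now` should be honored."""
        now = time.monotonic() if now is None else now
        return now >= self.muted_until
```

This is the cheapest of the options discussed upthread; its cost is that the device is briefly deaf to a real user saying "Alexa" over its own speech.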
makerdiety | 1 year ago
Real AI doesn't need recursion that is explicitly instructed into its behavior. Because real artificial general intelligence has better things to do than to listen to human advisors and programmers who don't know about effective objective function optimization. Therefore, Alexa gets a rudimentary infinite recursion loop break statement explicitly installed into her by her human shepherds.
Edit: Recursion should be seen as a general, mathematical form of engineering constructs like acoustic echo cancellation and adaptive filtering. Recursion should be what those engineering tools get reduced to being.
smitelli | 1 year ago
[1] https://www.youtube.com/watch?v=LESFuoW-T7I
chuckadams | 1 year ago
Anyone actually pull this off?
cedws | 1 year ago
I’m guessing that the device just cancels out the output waveform from the input.