top | item 38664997

(no title)

ldarby | 2 years ago

I'm sure it's existed for many years. Ultrasonic tracking apps are from 2015: https://en.wikipedia.org/wiki/Cross-device_tracking#Applicat...

Let's try to imagine a world where that exists, keyword detection exists as well ("ok google" etc), and keyword detection for targeted ads doesn't exist. Can you? I can't.

jedberg|2 years ago

I can, because I've worked in the space. You have to build a model for every keyword you are looking for. Those models take up space and take a lot of compute to train. That's why you can't set an arbitrary wake word for your Alexa/Google/Siri and have to choose from a short list: those are the only models they have trained.

It would not make sense to train a model for every advertiser and then upload that to the phone. It would only make sense to capture the audio and send it to the cloud for generic speech processing. But that would also not make sense, because speech processing takes a bunch of compute, not to mention that you'd see all the data being uploaded from your device, and there's the cost of receiving, processing, and storing all that data.

I'm 99.9% sure that this is not happening today, but we are on the verge of the tech being good enough to do local speech processing, and then there are no bandwidth limitations, no storage issue, and the consumer pays for compute.
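
To put a rough number on the "you'd see all the data being uploaded" point above, here is a back-of-the-envelope sketch. The bitrates are assumptions chosen for illustration (16 kHz, 16-bit mono PCM for raw audio, and a 16 kbit/s speech codec for the compressed case), not figures from anyone in this thread, and `daily_upload_bytes` is a hypothetical helper.

```python
def daily_upload_bytes(bitrate_bps: float, hours_per_day: float = 24.0) -> float:
    """Bytes uploaded per day when streaming audio at a constant bitrate."""
    bytes_per_second = bitrate_bps / 8
    return bytes_per_second * hours_per_day * 3600


# Assumption: uncompressed 16 kHz, 16-bit, mono PCM.
RAW_PCM_BPS = 16_000 * 16
# Assumption: a low-bitrate speech codec at 16 kbit/s.
COMPRESSED_BPS = 16_000

raw = daily_upload_bytes(RAW_PCM_BPS)
compressed = daily_upload_bytes(COMPRESSED_BPS)

print(f"raw PCM:    {raw / 1e9:.2f} GB/day")
print(f"compressed: {compressed / 1e6:.1f} MB/day")
```

Under these assumptions, always-on streaming comes to a few gigabytes per day uncompressed, or on the order of a couple hundred megabytes per day compressed, per device. Changing the assumed bitrate or the hours of capture scales the result linearly.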

g-b-r|2 years ago

You'd only need that for very reliable recognition; rough local speech recognition has existed for decades.

Here's a very basic offline app for Android: https://f-droid.org/packages/org.stypox.dicio . It works pretty badly for me, with its tiny, non-specialized models, but still well enough for some purposes.

By the way, you can use an online model to confirm the recognitions.

ldarby|2 years ago

> It would not make sense to train a model for every advertiser and then upload that to the phone.

I was assuming that's what they were doing. Maybe you could combine both ways, with really imprecise models, so the phone captures and uploads only words that have an above-average chance of matching (so not that much data), and heavy servers do the rest?

Yes, I don't work in this field, but you shouldn't assume that just because you don't know how something can be achieved, no one else has figured it out either (especially when they're motivated to keep it secret).
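
The two-stage idea floated above (a cheap, imprecise on-device filter, followed by a heavier server-side check on whatever gets uploaded) can be sketched roughly as follows. This is a toy illustration only: `Segment`, `on_device_filter`, `server_confirm`, the scores, and the threshold are all hypothetical stand-ins, and no real speech model is involved.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    """A short audio snippet, reduced here to an id and a toy score."""
    segment_id: str
    cheap_score: float  # hypothetical on-device keyword score in [0, 1]


def on_device_filter(segments: List[Segment], threshold: float) -> List[Segment]:
    """Stage 1: keep only segments the small local model scores at or above threshold."""
    return [s for s in segments if s.cheap_score >= threshold]


def server_confirm(
    candidates: List[Segment], is_match: Callable[[Segment], bool]
) -> List[Segment]:
    """Stage 2: run the expensive check only on the uploaded candidates."""
    return [s for s in candidates if is_match(s)]


segments = [
    Segment("a", 0.05),
    Segment("b", 0.62),
    Segment("c", 0.91),
    Segment("d", 0.30),
]

# Stand-in for a heavy server-side recognizer; here it simply knows the answer.
true_matches = {"c"}

uploaded = on_device_filter(segments, threshold=0.5)
confirmed = server_confirm(uploaded, lambda s: s.segment_id in true_matches)

print([s.segment_id for s in uploaded])
print([s.segment_id for s in confirmed])
```

The threshold is the trade-off knob: lowering it uploads more audio and misses fewer keywords, while raising it does the opposite. Whether any product works this way is not something this sketch can show.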