I think the responses to this can be broken down into a 2x2 matrix: level of concern vs. understanding of technology.
1) Don't understand ML; not concerned - "I have nothing to hide."
2) Don't understand ML; concerned - "I bought this device and now people are spying on me!"
3) Understand ML; not concerned - "Of course, Google needs to label its training data."
4) Understand ML; concerned - "How can we train models/collect data in an ethical way?"
To me, category 3 is the most dangerous. Tech workers have a responsibility not just to understand the technologies that they work with, but also to educate themselves on the societal implications of those technologies. And as others have pointed out, this extends beyond home speakers to any voice-enabled device in general.
In conversations about this with engineers the response I've gotten is essentially: "Just trust that we [Google/Amazon/etc.] handle the data correctly." This is worrying.
I'm in a 5th category. 5) Understand ML; concerned - won't allow any of these things in my house, period, because they will always use them behind the scenes for things that they won't state. I don't care how well trained they are, or how "ethical." Ethical... according to whom, and at what point in the future? Ethics change. The data they have on you won't. Look at all of the politicians and other people getting in trouble for things they said 15 years ago, which were generally more acceptable at the time but we've "progressed" since then. Who will be making decisions about you in the future based on last year's data? Just don't give it to them.
This classification is very useful for discussing this issue.
The difference between 3 and 4, noble as it is, can be caused by feasibility concerns that push people into 3, not just by ignorance of the privacy impact. Human labelling of training datasets is a big part of supervised learning. Methods that dispense with it would be valuable for purely economic reasons beyond privacy - the cost of human labelling of data samples. Yet we don't have them!
Techniques like federated learning or differential privacy can train models on opaque (encrypted or unavailable) data. This is nice, but they assume too much: that the data has already been validated and analyzed. In real-life modelling problems, one starts with an exploratory data analysis, the first step of which is looking at data samples. Opaque encrypted datasets also stop ML engineers from doing error analysis (looking at your errors to better target model/dataset improvements), which is an even bigger issue, IMO, as error analysis is crucial when iterating on a model.
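To make the federated-learning point concrete, here is a minimal sketch of federated averaging on a toy least-squares task (function names and the setup are illustrative, not any production system): each client computes an update on its private data, and the server only ever sees averaged weights, never raw samples.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step on a client's private least-squares data."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights, clients, lr=0.1):
    """The server averages client updates; raw (X, y) never leaves a client."""
    return np.mean([local_update(weights, X, y, lr) for X, y in clients], axis=0)

# Toy setup: five clients, each holding private samples of the same linear task.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = [(X, X @ true_w) for X in (rng.normal(size=(50, 2)) for _ in range(5))]

w = np.zeros(2)
for _ in range(200):
    w = federated_round(w, clients)
# w converges toward true_w even though the server never saw any client's data
```

Note this sketch sidesteps exactly what the comment complains about: it works because the toy data needs no exploratory analysis, labelling, or error inspection.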
Even for a model already in production, one has to do maintenance work like checking for concept drift, which I can't see how to do on an opaque dataset.
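For readers wondering what a drift check even looks like in practice, here is a generic sketch of the Population Stability Index, a common way to compare a feature's (or model score's) training-time distribution against production. The thresholds are the conventional rule of thumb; the implementation is illustrative, not any particular team's pipeline.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two samples of a scalar feature.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(1 for e in edges if e <= x)] += 1  # bin index of x
        return [(c + eps) / (len(sample) + bins * eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [i / 1000 for i in range(1000)]         # uniform scores at training time
drifted = [(i / 1000) ** 2 for i in range(1000)]   # production scores skew low

assert psi(baseline, baseline) < 0.01   # no drift against itself
assert psi(baseline, drifted) > 0.25    # major drift detected
```

The catch, per the comment above: computing this requires reading the production scores or features, which is exactly what an opaque dataset forbids.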
>To me, category 3 is the most dangerous. Tech workers have a responsibility not just to understand the technologies that they work with, but also to educate themselves on the societal implications of those technologies.
Do you think it's possible to be educated on the societal implications of these technologies and still not be concerned? It seems like you've written your own viewpoint in as the only "logical" one here.
>To me, category 3 is the most dangerous. Tech workers have a responsibility not just to understand the technologies that they work with, but also to educate themselves on the societal implications of those technologies. And as others have pointed out, this extends beyond home speakers to any voice-enabled device in general.
Yes, I'm frequently amazed by how many coworkers I have who are still completely plugged into Google, Facebook, and Amazon services/spyware, fill their homes with internet-enabled "smart devices," have Alexa/Google Assistant, etc., and yet act like I'm paranoid when I try to discuss security concerns, or just flat out don't care.
As much as I hate to say it, I think there needs to be a massive breach or abuse of power from one of these organizations/services that has severe real-world consequences for those who utilize/support them. Until then nothing will change.
> Tech workers have a responsibility not just to understand the technologies that they work with, but also to educate themselves on the societal implications of those technologies.
I think this goes well beyond tech workers. I think it's time for society to legally recognize the balance between the value of ML systems and the privacy concerns of the customers of those systems.
Doctors and lawyers obviously should understand the value of privacy, but we, as a society, have also created legal rights and duties for them. Conversations with lawyers and doctors are legally privileged; at the same time, there are specific consequences for medical companies or lawyers who do not protect that information.
Companies like Google, Apple, Amazon, etc. certainly have the resources, intelligence, and sophistication to comply with a similar regulatory regime. IMO it should be possible to construct a law that allows companies to collect, store, and tag customer data for purposes of training ML systems, but sets serious duties, with consequences, on them to do it right.
Right now, what is to keep employees at these companies from abusing these systems to stalk, to surveil, to harass, or even just to feed their own curiosity? These data systems are core trade secrets for these companies, which means they are opaque to any kind of oversight from outside the company.
The free market can't create the necessary balance because customers need information to make decisions--information that they don't have. The result will be an increasingly chaotic "hero/shithead rollercoaster" as customers make snap judgments based on scanty or wrong information about what these companies are actually doing.
This is a classic case for regulation, which prevents a "race to the bottom" of sketchy practices for short term gain, while also protecting the ability of people and companies to use this technology to create value.
Doing this right will help data-leveraging companies in the long run, just like attorney-client privilege and HIPAA have helped lawyers and doctors build trust (and therefore value) in their customer relationships.
One thing that I think gets lost on engineers (and humans generally) is scale.
Googazon doing {thing} might be "meh" for 10 people. But the implications look very different when it's doing {thing} for 10%+ of a country's population.
At 10 people, I may find out Ted likes to eat Italian. At 10%, I may find out an Italian chain has a sudden health issue and short their stock.
Which is in essence their original playbook: do things that only work at a scale that only we can play at.
Anyone remember the 3D-printed gun stuff from a few years back? I think this isn't very different. You can take the raw pieces, explain how each is simple and good, and draw simple ethical conclusions from them, but when you add it all up, the bigger picture doesn't feel quite the same. 3D printers are good; sharing 3D printing plans is good; it's good to help your neighbor; with no regulations we're experiencing tremendous growth in the 3D printing space, people are inventing new stuff, starting new businesses, etc. All good stuff. But letting any jackass off the street print a working gun, when we have how many mass shootings a year? People don't feel the same way. All the pieces are totally okay until you've got a more questionable global intention, and how can you regulate intention?
Google using the data to train models is just a tool, a baby step. They aren't doing it to sell the models or as an end in itself; they're doing it so that they can generate, from your voice data, data that they might consider theirs and not yours, and then feed that into other systems which generate tremendous profits for them in ways you don't even know. They have intended uses already. Is it remotely fair to talk about ethical training in this context without some idea of the intended use and distribution of the metadata?
5) Understand ML; concerned - "Why do other people in the ML industry think it's OK to use and store people's data without informed consent?" (Only those in group 3 have anything like informed consent; groups 1 and 2 don't.)
The Mycroft project (open source: https://mycroft.ai/) has a better approach to this:
"Mycroft uses opt-in privacy. This means we will only record what you say to Mycroft with your explicit permission. Don’t want us to record your voice? No problem! If you’d like us to help Mycroft become more accurate, you can opt in to have your voice anonymously recorded."
Let people participate in R&D if they want to, but don't force it.
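In code, the opt-in difference is just where the default sits. A hypothetical sketch (my own names throughout, not Mycroft's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class PrivacySettings:
    # Opt-in: contributing recordings is OFF unless the user explicitly enables it.
    contribute_recordings: bool = False

def maybe_store_for_training(audio_clip: bytes, settings: PrivacySettings, store) -> bool:
    """Persist a clip for human labelling only if the user opted in.
    Returns True if the clip was stored."""
    if not settings.contribute_recordings:
        return False  # default path: process the command, keep nothing
    store(audio_clip)
    return True

# With the defaults, nothing is ever kept:
kept = []
assert maybe_store_for_training(b"...", PrivacySettings(), kept.append) is False
assert kept == []
```

The whole debate in this thread is, in effect, about whether that boolean defaults to False (opt-in) or True (opt-out buried in a settings page).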
I'm perhaps in a subcategory of (4), "Understand ML; concerned".
Knowing what I know about how people I have worked with have come close to or have actually mishandled data despite the best of intentions, I do not trust any of these teams without an explicit accountability mechanism that is observable by an outside entity. I'm not looking to punish slip-ups, because mistakes happen, but I am looking for external enforcement to keep people honest.
It's not that I think the engineers using this data are mustache-twirling villains; it's that I think mishandling is inevitable due to inattention (yes, even you make mistakes!), and we have to design our data pipelines against that.
There’s a different category of people who may or may not understand ML, but who are cognizant that any data created will be viewed at least by the company that collects it.
I fall into that category, as I have neither the time, nor trust in any evaluation method, to determine whether a company is using my data ethically. If I create data and store it somewhere that’s not mine, I only do so in situations where I’m comfortable with the owner doing anything they want with it.
I understand ML and know that Google has to use this data for training, at the very least. I’ve also worked in IT long enough to know that even in super tightly controlled environments, data gets misused by administrators.
> In conversations about this with engineers the response I've gotten is essentially: "Just trust that we [Google/Amazon/etc.] handle the data correctly."
No one is afraid of power when it's in their own hands. A common failure mode is that people assume a given power that's in their hands today will always be.
4, because not being explicit about the practice is misleading at best, because outsourcing the difficult task of keeping the analysis private shows how unimportant it's considered, and because big tech companies have a tendency to decrease privacy over time. Using clients who paid for the product as a dataset generator is also wrong.
But 3 at the same time, because it's important to evaluate the performance of the product in the field, not just in the lab. There have been so many catastrophic failures of ML models (e.g. classifying black people as gorillas) that a tight feedback loop is important.
It has to be done right, but evaluating a product that was primarily developed for (or at least by) English speakers and transferred to other domains seems like the right thing to do.
All in all, I don't and wouldn't use one of those assistants, because 4 outweighs 3; but it's not binary.
>Tech workers have a responsibility not just to understand the technologies that they work with
Ok, I agree with you completely, 100%. However, based on my limited worldview, tech workers barely understand the tech they work with at all [0]. Asking for the ethical implications to be mulled over is unlikely to happen, considering the near-weekly HN threads on "interviewing sucks, here's how to fix it, lol". We can't even figure out how to hire someone, let alone how to impedance-match with them on deep issues like the ethical implications of ML/AI.
[0] https://stackoverflow.com/
Get real, obvious, informed consent by asking whether you would like your voice prompts to be improved on / heard by real live humans, as an opt-in. I bet 1/500 of the population would opt in to it.
And the first one to do it should be Apple itself.
Assuming categories 1 and 3 are sufficiently large (and I assume that is the case), this is easily resolved by allowing users to choose whether to donate their data for training or not.
If the training already only happens on a 1/500 sample, skewing the sample towards "people who don't care about their privacy" will probably not significantly impact the quality of the data.
I'm surprised this wasn't already the case, but hopefully the article will help the people responsible make better decisions in the trade-off between minimizing onboarding friction and respecting users' privacy in the future.
Asserting your point of view as "educated" and "correct" while labeling people who don't share it as dangerous doesn't sound like a great way to start a discussion.
I'm between 3 and 4: I just want proof that they remove PII from the audio files. If it's a bunch of audio files with unique IDs and metadata like time of day, count me as a member of group 3.
Even if I trust them to do what they say they're doing with the data I may not trust every party who comes to possess that data. And I may not trust their possession/use of it in all future contexts - as their privacy policy slowly drifts into the unknown year after year.
If they're collecting it in a way that can be requested by governments (for instance) or could be leaked by hackers that's another layer of valid "concern" not related to my understanding of the ML aspect of this.
The meta-issue in the United States is that once your data is accessible to a third party, you have no sovereignty over it, and abuse by private actors is "agreed to" by click-wrap and access by government actors is a simple subpoena.
The law needs to catch up. Sharing should require specific, informed consent, and legislation needs to establish a scope where data stored as a "tenant" on a third-party server is given Fourth Amendment protection.
I agree, but I think this issue is incredibly mishandled by reporting. The title in the linked article being a great example.
There is absolutely no proof of number 2 in your list, but that is by far the most widely held belief.
It's infuriating, because we can't have a useful societal dialog about the issue if the largest chunk of concerned people are, essentially, conspiracy theorists.
The one thing about these stories that keep coming out about the home assistants... they kind of create the impression that this is an issue specific to home speakers, and that you can avoid it by simply not buying them.
That's misleading.
Any voice command you use to operate any internet-connected tech gadget, from phones to smart TVs, is potentially stored and flagged for human review.
You really have to avoid using voice commands at all, on all of your devices. Even that is probably insufficient. You probably have to go further and actively disable voice command features on all of your devices, assuming they actually support such a setting. Otherwise there's still the possibility of an accidental recording taking a journey through the clouds, to a stranger's ears.
So Google’s response is (paraphrased as fairly as I can while removing the sugar-coating):
’Yes, we hire people to listen in on and transcribe some conversations from the private homes of our customers (so as to improve our speech recognition engines); but the recordings aren’t linked to personally identifiable information.’
Even assuming they have only the purest intentions here, I still don’t understand how they can possibly guarantee that these recorded conversations are not linked to personally identifiable information!
For example, what’s to stop me from saying “Hey Google, I am <full legal name / ID> and my most embarrassing and private secret is <...>”?
One might argue that they could detect this in the recognized text and omit those samples, but presumably the whole purpose of hiring people to create transcripts is because the existing speech-to-text engine isn’t perfect, and they need more training data.
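The circularity can be made concrete: a text-level PII filter runs on the recognizer's own transcript, so it fails exactly when recognition fails. The toy redactor below (patterns purely illustrative; real PII detection is far harder) shows the failure mode:

```python
import re

# Purely illustrative patterns; real PII detection is far harder than this.
PII_PATTERNS = [
    re.compile(r"\bmy name is\s+\w+(\s+\w+)?", re.IGNORECASE),
    re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b"),  # SSN-shaped number
]

def redact(transcript: str) -> str:
    """Scrub recognizable PII patterns from a speech-to-text transcript."""
    for pattern in PII_PATTERNS:
        transcript = pattern.sub("[REDACTED]", transcript)
    return transcript

assert redact("my name is Jane Doe") == "[REDACTED]"
# The filter only sees what the recognizer *heard*. If speech-to-text garbles
# "my name is" into "my aim is", the same PII sails straight through:
assert redact("my aim is Jane Doe") == "my aim is Jane Doe"
```

And of course the audio itself, which is what the human reviewers actually hear, is untouched by any transcript-level scrubbing.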
"The man, who wants to remain anonymous, works for an international company hired by Google. "
So not a Google employee at all, but a probably low-paid contractor who is in possession of thousands of audio files. Your privacy matters, except when the bottom line is involved.
What is doubly concerning here is that the contractor was in a position to demonstrate how the system worked to the reporters. That would seem to indicate they have access to that data in a non-secured environment.
I'm not familiar with EU law around these things, but I would imagine there is some kind of whistleblower mechanism available, and a right for authorities to audit/inspect such activities?
The person is probably a temp/vendor from a consulting company (think accenture or cognizant), who should've signed the same NDA agreements as anyone working on that stuff.
Does it matter how much they're paid? They're probably paid the right amount relative to the work they are doing.
Also how is having access to small samples of audio a privacy issue? Are they also receiving enough information to attach an identity to the audio clips? How long are the clips? Are they randomly assigned to humans? Do those humans get to listen to multiple clips from the same Home device and can they tell that's the case?
Home, Siri, Alexa, M - they all do. I have friends who work in this field transcribing the audio and measuring its accuracy. Sometimes it's multiple layers of contractors: an employee hands the task to a contractor, another contractor verifies the speech-to-text, and they're all managed by a contractor.
I grew up as a kid in a country ruled by the Securitate [1], one of the few institutions that rivaled the East German Stasi when it came to spying on its own citizens, and as such I'm very, very perplexed as to why anyone would bring a listening device into his/her own house of his/her own volition. And those people even pay for the privilege of having their home lives actively monitored and listened to almost all the time. It's crazy.
I would imagine that people who didn't grow up in a country ruled by something like the Securitate don't have the experiences that would make them fear being listened in on. Not saying that they are wrong (they may turn out to be right), just that we are all products of our experiences.
Do you have a smartphone? Why would you bring that listening device (the smartphone) into your own house out of your own volition? Please explain, because I am very perplexed.
Not only that, but people still buy computers even though the Nazis used IBM machines to help them perpetrate the Holocaust. Surely you shouldn't use computers if the NAZIS used computers!
I own 4 Home Minis, 1 Home, and 2 Home Hubs. I honestly don't care, so long as my data is used to improve the functionality and stability of my investment. It is quite another thing if they are selling my conversations to third-party vendors.
I mean. Of course they are. Do you expect to be able to do any meaningful level of training on data that hasn't been properly labeled? At some point, a human has to go in and correct the software when the software gets it wrong. If you want services that do what Google Home does, you have to have this.
Even with that, I'm sure that the engineers are flagging voice requests that happen more than once, or where someone has to manually change or correct what the software thought the request was.
This is only creepy if you don't understand how the software works.
The biggest issue IMHO is how the average consumer has been deceived into the belief that current AI is pure AI, when in reality a lot of humans are looking at your pictures, listening to your recordings, crawling through your inbox and analyzing your browsing/purchasing/streaming history, right now: https://imgs.xkcd.com/comics/trained_a_neural_net.png
I think a lot of people here are under the assumption that voice commands, on any device, have the potential to be human reviewed. I am not sure whether or not the general public has that same assumption.
That being said, my biggest concern is the fact that many of these devices don't have a hardware microphone kill switch. I feel better when I know I can control when a device is listening in. I've read reports that some Alexa devices have them, but I don't own any, so I am unable to verify that.
I want all of my devices with microphones to have a hardware-based kill switch for the mic: my phone, laptop, tablet, everything.
Assuming $0.3/audio clip and base wage of $10/hr, that equates to 33.3 audio clips/hr = 266.4 audio clips/day that are being monitored by any one 'language expert'.
However, Google does not specify how long a 'conversation' is. How many sentences make up a conversation? When is the cutoff point?
Google also says '1 in 500' conversations are monitored. That means for any one 'language expert', there are approx. 133,200 conversations/day that have a chance of being monitored.
So basically, you have a 0.2% chance that your conversation is being picked up by any particular 'language expert' per day.
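For what it's worth, the back-of-the-envelope math above checks out. Spelled out (the piece rate, wage, and 8-hour day are the comment's assumptions, not published figures; only the 1-in-500 rate comes from Google):

```python
pay_per_clip = 0.30     # assumed piece rate, $ per clip
base_wage = 10.0        # assumed wage, $ per hour
hours_per_day = 8       # assumed working day
sample_rate = 1 / 500   # Google's stated fraction of reviewed conversations

clips_per_hour = base_wage / pay_per_clip            # ~33.3
clips_per_day = clips_per_hour * hours_per_day       # ~266.7 (the comment's 266.4 rounds 33.33 down first)
conversations_covered = clips_per_day / sample_rate  # ~133,333 conversations feed one reviewer's daily quota

# Regardless of reviewer head-count, any single conversation's chance of
# ever being sampled for human review stays at 1/500:
assert sample_rate == 0.002
```

So the 0.2% figure is per conversation overall, not per reviewer; adding reviewers scales the pool covered, not your individual odds.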
The number of people in this thread who believe that this is ok because, 1) it's obviously the only way Google could train their voice system and thus 2) people clearly knew what they were getting into, is horrifying.
It's no coincidence that companies like Amazon market their Echos as "stocking stuffers" for the holiday season. I've wondered how Google Home and these "smart home" devices were always able to be priced as low as they are. Goes to show that paying for the product doesn't exempt you from still being part of the product.
A bit tangential, but I tried sharing this link with a few friends on Facebook Messenger, and noticed it's blocked because it "violates Community Standards" [1]. Even shortened bit.ly links are blocked.
Anyone know why that would be the case? I'm trying to not assume malice (eg. maybe it got misflagged?) but it certainly feels like censoring and is yet another push for me to drop Messenger too.
agent( tech, management ) # assuming management has power over tech worker
understanding-of-ML( yes, no )
concerned-about-ethics-and-privacy( yes, no )
The combinations below are the worst in terms of ethics.
{ agent[tech], understanding-of-ML[yes], concerned-about-ethics-and-privacy[no] }
{ agent[management], understanding-of-ML[no], concerned-about-ethics-and-privacy[no] }
{ agent[management], understanding-of-ML[yes], concerned-about-ethics-and-privacy[no] }
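As a sketch, the taxonomy above can be enumerated mechanically. This is a hypothetical illustration: the rule encoded below flags any unconcerned agent except a tech worker who doesn't understand ML, which reproduces the combinations listed:

```python
from itertools import product

# Attributes from the taxonomy above (the names are the comment's own).
agents = ["tech", "management"]
understands_ml = [True, False]
concerned = [True, False]

def is_worst(agent: str, understands: bool, is_concerned: bool) -> bool:
    """Flag the ethically worst combinations: anyone unconcerned about
    ethics/privacy who either understands ML or holds management power."""
    if is_concerned:
        return False
    return (agent == "tech" and understands) or agent == "management"

worst = [(a, u, c)
         for a, u, c in product(agents, understands_ml, concerned)
         if is_worst(a, u, c)]
for combo in worst:
    print(combo)
```

Of the eight combinations in the full matrix, exactly three are flagged, matching the list above.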
[+] [-] cblades|6 years ago|reply
There is absolutely no proof of number 2 in your list, but that is by far the widest-held belief.
It's infuriating, because we can't have a useful societal dialog about the issue if the largest chunk of concerned people are, essentially, conspiracy theorists.
[+] [-] SmirkingRevenge|6 years ago|reply
That's misleading.
Any voice command you use to operate any internet-connected tech gadget, from phones to smart TVs, is potentially stored and flagged for human review.
You really have to avoid using voice commands at all, on all of your devices. Even that is probably insufficient. You probably have to go even further and actively disable voice command features on all of your devices, assuming they actually support such a setting. Otherwise there's still the possibility of an accidental recording taking a journey through the clouds, to a stranger's ears.
[+] [-] electrograv|6 years ago|reply
"Yes, we hire people to listen in on and transcribe some conversations from the private homes of our customers (so as to improve our speech recognition engines); but the recordings aren't linked to personally identifiable information."
Even assuming they have only the purest intentions here, I still don’t understand how they can possibly guarantee that these recorded conversations are not linked to personally identifiable information!
For example, what’s to stop me from saying “Hey Google, I am <full legal name / ID> and my most embarrassing and private secret is <...>”?
One might argue that they could detect this in the recognized text and omit those samples, but presumably the whole purpose of hiring people to create transcripts is because the existing speech-to-text engine isn’t perfect, and they need more training data.
[+] [-] TheAdamist|6 years ago|reply
So not a Google employee at all, but a probably low-paid contractor who is in possession of thousands of audio files. Your privacy matters, except when the bottom line is involved.
[+] [-] numbsafari|6 years ago|reply
I'm not familiar with EU law around these things, but I would imagine there is some kind of whistleblower mechanism available, and a right for authorities to audit/inspect such activities?
[+] [-] d1zzy|6 years ago|reply
Also how is having access to small samples of audio a privacy issue? Are they also receiving enough information to attach an identity to the audio clips? How long are the clips? Are they randomly assigned to humans? Do those humans get to listen to multiple clips from the same Home device and can they tell that's the case?
[+] [-] inerte|6 years ago|reply
Search for languages like Portuguese, Swedish, Chinese, etc. on LinkedIn and you'll find the job posts https://www.linkedin.com/jobs/search/?keywords=portuguese&lo...
[+] [-] paganel|6 years ago|reply
[1] https://en.wikipedia.org/wiki/Securitate
[+] [-] drstewart|6 years ago|reply
[1] https://en.wikipedia.org/wiki/IBM_and_the_Holocaust
[+] [-] duxup|6 years ago|reply
I bugged my house... NOW MY HOUSE IS BUGGED!
Not to dismiss the value of the news here, it is important for folks to know, but the overall situation is both concerning, and amusing.
[+] [-] RosanaAnaDana|6 years ago|reply
Even with that, I'm sure that the engineers are flagging voice requests that happen more than once, or where someone has to manually change or correct what the software thought the request was.
This is only creepy if you don't understand how the software works.
[+] [-] rev12|6 years ago|reply
That being said, my biggest concern is the fact that many of these devices don't have a hardware microphone kill switch. I feel better when I know I can control when a device is listening. I've read reports that some Alexa devices have one, but I don't own any, so I am unable to verify that.
I want all of my devices with microphones to have a hardware-based kill switch for the mic: my phone, laptop, tablet, everything.
[+] [-] groovybits|6 years ago|reply
However, Google does not specify how long a 'conversation' is. How many sentences make up a conversation? When is the cutoff point?
Google also says '1 in 500' conversations are monitored. That means for any one 'language expert', there are approx. 133,200 conversations/day that have a chance of being monitored.
So, basically, any given conversation has a 0.2% (1 in 500) chance of being picked up by a 'language expert' on any particular day.
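The arithmetic above can be checked with a quick sketch. The 1-in-500 rate is from the article; the per-day conversation count below is a hypothetical number, not a figure from the source:

```python
# Chance that a single conversation is sampled, from the stated 1-in-500 rate.
p_single = 1 / 500
print(f"{p_single:.1%}")  # prints 0.2%

# Chance that at least one of a household's conversations is sampled in a
# day, assuming n independent conversations (n is hypothetical).
n = 20
p_any = 1 - (1 - p_single) ** n
print(f"{p_any:.1%}")  # roughly 3.9% for 20 conversations/day
```

The per-conversation odds look tiny, but over many conversations and many days the chance that at least one recording reaches a reviewer grows quickly.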
[+] [-] jankeymeulen|6 years ago|reply
The submitted link is citing this one...
[+] [-] rosszurowski|6 years ago|reply
Anyone know why that would be the case? I'm trying not to assume malice (e.g. maybe it got misflagged?) but it certainly feels like censoring, and is yet another push for me to drop Messenger too.
[1]: https://i.imgur.com/9n1Hyqb.png
[+] [-] bisRepetita|6 years ago|reply
I want to know what instructions both humans and computers are given if they hear evidence of illegal actions, such as violence, illicit trade, etc.
If you are an employee and hear a rape scene or a blackmail dialog, do you have a duty to report, or to remain silent?
I also want to know how much access law enforcement has to this data, and whether they can re-identify the info, with or without a warrant.