Like many here I'm a huge fan of Neal Stephenson. A lot of people around weren't big fans of Anathem. I actually really liked it.
One of the ideas that came up in that book was the Reticulum (Internet) was populated by "botnet ecologies" that subtly manipulated facts, streams and the like such that filtering this out became another industry (of course).
I've seen the idea that this lies in our future raised here and it seems to get mocked. I think the idea has a lot of merit.
This makes sense. For me, the main problem with Google is trying to retrieve a treasure out of garbage. While the Internet has a lot of good information, much (most) of it is incorrect -- sometimes on purpose, as you suggest. I would be much more interested in a learning system that is able to retrieve information from authoritative sources such as books, for example.
>> "Behind the scenes, Google doesn't only have public data," says Suchanek. It can also pull in information from Gmail, Google+ and Youtube. "You and I are stored in the Knowledge Vault in the same way as Elvis Presley," Suchanek says.
I really hope Google does not use Gmail data for projects other than ads. They really need to ask users to opt in to this kind of data sharing. I'm OK with Gmail being read for ads, but almost anything else is unethical, especially some experimental knowledge base.
> I really hope Google does not use Gmail data for projects other than ads.
It's already used by the Google Now cards on Android, and it's a fantastic feature. If I book a flight, I automatically get a card that reminds me to leave for the airport at the correct time (taking traffic into account), without any interaction on my part.
Why should Google care what you are OK with after they already have all your data? If you don't want them to be able to engage in activities like this, then don't give them your data in the first place.
The actual corpus that is worth using is the book corpus. While Google can't provide public access to all of the books it has scanned there is no restriction on them using the data in the books to feed this project. Given the amount of information they have scanned from libraries and elsewhere that is a much better source.
How does this compare with NELL[0] from CMU? I'm assuming it's something like NELL, but scaled up 1000x, because Google is not limited in how often it can search its own index, whereas NELL is limited to 10K queries/day?
Hi, I’m Kevin Murphy, one of the researchers at Google who worked on this project. Just to be clear, KV did NOT involve any private data sources -- it just analyzed public text on the web. (And yes, we do try to estimate reliability of the facts before incorporating them into KV.)
Also, KV is not a launched product, and is not replacing Knowledge Graph.
It might even be possible to use a knowledge base as detailed and broad as Google's to start making accurate predictions about the future based on analysis and forward projection of the past.
Hello Hari Seldon, psychohistory and mathematical sociology!
I see a lot of downvoting here of posts that express very reasonable concerns about privacy if Google is actually using private emails for this AI.
That Google is engaging in this behavior is indeed speculation, as far as I know. However, Google employees/allies have to realize that attempts to suppress debate on this issue can only backfire on them. Indeed, the fact that they don't have an explicit policy on this (correct me if I'm wrong) is one of the reasons researchers are speculating.
It may well be that most people would agree with and/or permit Google to use their data in this way, but people should be given the opportunity to debate it in a reasonable fashion, else it looks like it was forced down their throats. And that's no good for anyone.
>> "Behind the scenes, Google doesn't only have public data," says Suchanek. It can also pull in information from Gmail, Google+ and Youtube. "You and I are stored in the Knowledge Vault in the same way as Elvis Presley," Suchanek says.
Ugh... that's a bit much... because now any employee at google could potentially get access to random facts about me gleaned from my personal and business emails? Good luck keeping different levels of confidential information segregated correctly. That's awesome.
Isn't it nice that millions of people made web pages that Google decided to scrape to harvest the work of others and run ads next to it for themselves?
Now try scraping Google and see what they do to you.
Those millions of people want google to scrape and harvest, in the hope that they will rank higher etc etc.
If an unknown person tries to scrape, he/she will promptly get banned by those very same people (Google wouldn't like someone scraping their stuff either).
"Knowledge Vault has pulled in 1.6 billion facts to date", does this fact also include the fact that I am adding more facts right now? What fact metric is this fact?
Knowing that the people who collected a lot of that data, people we trusted, have since left Google, I wonder what other non-public data is being used, how it is being used, and whether it is used only for good purposes or for nefarious ones.
[+] [-] bra-ket|11 years ago|reply
This knowledge graph is probably the largest Bayesian network out there
[+] [-] Chronic29|11 years ago|reply
[deleted]
[+] [-] sixQuarks|11 years ago|reply
spammers will be populating the web with "facts" that suit themselves.
[+] [-] cletus|11 years ago|reply
One of the ideas that came up in that book was the Reticulum (Internet) was populated by "botnet ecologies" that subtly manipulated facts, streams and the like such that filtering this out became another industry (of course).
I've seen the idea that this lies in our future raised here and it seems to get mocked. I think the idea has a lot of merit.
[+] [-] coliveira|11 years ago|reply
[+] [-] wernercd|11 years ago|reply
http://spring.newsvine.com/_news/2006/08/01/307864-stephen-c...
[+] [-] dm2|11 years ago|reply
I really hope Google does not use Gmail data for projects other than ads. They really need to ask users to opt in to this kind of data sharing. I'm OK with Gmail being read for ads, but almost anything else is unethical, especially some experimental knowledge base.
[+] [-] yid|11 years ago|reply
It's already used by the Google Now cards on Android, and it's a fantastic feature. If I book a flight, I automatically get a card that reminds me to leave for the airport at the correct time (taking traffic into account), without any interaction on my part.
[+] [-] magicalist|11 years ago|reply
Luckily the guy who said that is from Télécom ParisTech, i.e. he was completely speculating.
Public posts from google+ and youtube are fine, though.
[+] [-] dredmorbius|11 years ago|reply
One of the best discussions bar none of this issue I've seen.
[+] [-] jacquesm|11 years ago|reply
[+] [-] ChuckMcM|11 years ago|reply
[+] [-] sudont|11 years ago|reply
The funny thing is Doctorow made references to "just metadata" years before it became a public issue. However, this goes beyond metadata and will eventually contain facts about people, not just tangential stuff.
"This isn't P.I.I."—Personally Identifying Information, the toxic smog of the information age—"It's just metadata. So it's only slightly evil."
[+] [-] discardorama|11 years ago|reply
[0] http://rtw.ml.cmu.edu/rtw/
[+] [-] murphyk|11 years ago|reply
Unfortunately, I cannot do a more detailed Q&A here, but if you want more details, please read the original paper here: http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf. Note that an earlier version of the work was presented at a CIKM workshop in Oct 2013 (see http://www.akbc.ws/2013/ and http://cikm2013.org/industry.php#kevin). We have also published tons of great related research at http://research.google.com/pubs/papers.html
[+] [-] dctoedt|11 years ago|reply
[1] http://en.wikipedia.org/wiki/Cyc
[+] [-] turbolent|11 years ago|reply
[+] [-] batbomb|11 years ago|reply
http://deepdive.stanford.edu/
[+] [-] turbolent|11 years ago|reply
[+] [-] panarky|11 years ago|reply
Hello Hari Seldon, psychohistory and mathematical sociology!
http://en.wikipedia.org/wiki/Foundation_series
http://en.wikipedia.org/wiki/Mathematical_sociology
[+] [-] walterbell|11 years ago|reply
There are bots [1] making Wikipedia contributions; Google could also make automated contributions to Wikipedia/Wikidata.
[1] http://wikipedia-edits.herokuapp.com/
[+] [-] rryan|11 years ago|reply
I don't believe this is what the article is talking about (knowledge vault) though. This is just the human and lightly machine curated graph (knowledge graph).
[+] [-] jnbiche|11 years ago|reply
That Google is engaging in this behavior is indeed speculation, as far as I know. However, Google employees/allies have to realize that attempts to suppress debate on this issue can only backfire on them. Indeed, the fact that they don't have an explicit policy on this (correct me if I'm wrong) is one of the reasons researchers are speculating.
It may well be that most people would agree with and/or permit Google to use their data in this way, but people should be given the opportunity to debate it in a reasonable fashion, else it looks like it was forced down their throats. And that's no good for anyone.
[+] [-] dave_sullivan|11 years ago|reply
Ugh... that's a bit much... because now any employee at google could potentially get access to random facts about me gleaned from my personal and business emails? Good luck keeping different levels of confidential information segregated correctly. That's awesome.
[+] [-] api|11 years ago|reply
[+] [-] john61|11 years ago|reply
[+] [-] adventured|11 years ago|reply
Most of the ideas produced by Socrates / Plato / Aristotle were in fact wrong. They are not a good primer on epistemology, concepts, percepts, metaphysics or anything else. They're a good primer on the history of philosophy.
They inspired incredible progress on thinking and understanding, but they were wrong more often than they were right, and are a poor reference to understanding what knowledge is.
[+] [-] ck2|11 years ago|reply
Now try scraping Google and see what they do to you.
[+] [-] dm2|11 years ago|reply
If you provide value to Google, they will make an API to make accessing that data easier.
By scraping do you mean scraping their search results? They offer this, which is nice: https://developers.google.com/custom-search/
Many large sites don't allow scraping because of unnecessary server load (denial of service sometimes) so they'll offer an API where you can download content in a controlled (and monitorable) manner.
[+] [-] vijayr|11 years ago|reply
If an unknown person tries to scrape, he/she will promptly get banned by those very same people (Google wouldn't like someone scraping their stuff either).
Different players, different rules, I guess.
[+] [-] plicense|11 years ago|reply
[+] [-] hanula|11 years ago|reply
[+] [-] illumen|11 years ago|reply