Definitely not solved. I wouldn't even say search is a well defined problem.
In any case, PageRank is a method for estimating the quality of a page from the number of inbound links, not a solution to all of search.
It also relies on link structure, which was a property of the web at the time, not something universal to the search problem, e.g. it's not a statistic that exists if you want to search books.
I think question answering (given a question and a document that answers it, provide a concise answer) is where a lot of interesting work is being done, both in academia and at Google, with the snippets of web pages it provides.
In particular, PageRank can't be used for corpora whose documents do not explicitly link to each other somehow, and outside of hypermedia such links exist nowhere.
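To make the "score from inbound links" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production ranking; the function name `pagerank`, the damping value of 0.85, the iteration count, and the four-page `toy_web` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "a", "b" and "d" all link to "c"; "c" links to "d".
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"]}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy graph the most linked-to page ends up with the highest score, and the two pages nobody links to tie for last. The sketch also shows the limitation raised above: with no `links` data at all (e.g. a corpus of books), every page gets the same score and the statistic tells you nothing.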
My impression is that "search" isn't so much a solved problem as one where the question is changing.
The next, very much unsolved, problem is being able to "understand" natural language queries and "understand" source materials such that a user can ask for something and get it.
"Understand" is in quotes because it means something rather specific.
Ha, not only is search not a solved problem, I would posit that search is getting WORSE.
Computer knowledge is a particularly good example of how search is degrading with time.
Try to figure out how to do X on the Beaglebone Black (I presume the Raspberry Pi has a similar problem, but it's not something I'm that familiar with).
The problem is that the Linux implementation for the Beaglebone went from a weird distribution (Angstrom) to mainline Debian with Linux kernel 3.8 -> 4.4 -> 4.14 in a VERY short time, so the number of links to new stuff stayed flat.
Consequently, the old Angstrom stuff almost always fills the initial search positions for quite a ways even though it's completely useless.
This is occurring in other things, as well. Stack Overflow, for example, has no way to mark an answer as "This was correct 5 years ago but is now wrong."
Effectively, the web is becoming sclerotic and search engines are following it.
I REALLY miss old AltaVista's feature where it would give you a graphical representation of the clusters in your search so you could drill down into a less popular grouping. The fact that nobody has recreated this makes me wonder ...
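The staleness problem described above (old Angstrom pages outranking current Debian ones because they have accumulated more links) can be illustrated with a toy comparison. This is only a sketch: the `Page` class, the link counts, the ages, and the two-year half-life are all made-up assumptions, and exponential decay is just one simple way to discount old pages, not something any particular search engine is known to use.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    inbound_links: int  # hypothetical link count
    age_years: float    # hypothetical age of the page


def link_score(page: Page) -> float:
    """Rank purely by accumulated inbound links."""
    return float(page.inbound_links)


def decayed_score(page: Page, half_life_years: float = 2.0) -> float:
    """Halve the link score for every `half_life_years` of age."""
    return page.inbound_links * 0.5 ** (page.age_years / half_life_years)


# Made-up numbers: an old, heavily linked guide vs. a newer, lightly linked one.
pages = [
    Page("Angstrom setup guide", inbound_links=400, age_years=6.0),
    Page("Debian kernel 4.14 guide", inbound_links=90, age_years=0.5),
]

by_links = sorted(pages, key=link_score, reverse=True)
by_decay = sorted(pages, key=decayed_score, reverse=True)
```

With these assumed numbers, ranking by raw link count puts the Angstrom guide first, while the age-discounted score puts the Debian guide first. The catch is that a blanket age penalty would also bury old answers that are still correct, which is why "correct 5 years ago but now wrong" needs a signal that link counts alone don't provide.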
> I am curious to know what major innovations in search engines happened since the page rank algorithm, or were there only incremental improvements?
A ton has happened! Since PageRank, there have been a ton of advances in NLP that have changed the way queries are processed prior to information retrieval. For example, Google's RankBrain seems to do a lot of the heavy lifting around word similarity.
I certainly wouldn't, since I still encounter things that I know are on the internet but Google can't find. It's possible that the next advance won't be in actually indexing the web but in figuring out what the user wants, as opposed to what they requested.
> what the user wants rather than what they requested.
Amusing anecdote regarding this issue.
- I teach an introductory online chemistry class.
- If the students are determined enough, they can/do cheat on their quizzes.
- In one of my quizzes, I give the students a formula for a pretend material and ask them to compute its molar mass.
- If you perform the calculation, the molar mass works out to something like 108 g/mol.
- If you try to Google the answer, Google is smart enough to know that my compound is unstable.
- Instead, Google provides the molar mass for a _related_ material (86 g/mol).
- Each semester, I find a handful of students who dutifully tell me the answer is 86 g/mol.
Google does this to some extent. Recently I've found that Google uses my past queries to make cogent suggestions for my next search (essentially to predict what I want next based on prior info). E.g., if I've just searched for "sully" and a short time later type "t" into Google, the first suggestion is Tom Hanks. I've only noticed this in the last few months.
Google's monetization strategy blinds the results. My guess is that the next advance would be either a way to search beyond Google or a way to force it to give results that aren’t manipulated somehow.
Eridrus|8 years ago
aisofteng|8 years ago
colechristensen|8 years ago
fh973|8 years ago
bsder|8 years ago
MaxBarraclough|8 years ago
Not counting comments? What more could you ask for?
vinn124|8 years ago
saagarjha|8 years ago
busyant|8 years ago
osrec|8 years ago
ianai|8 years ago
sehugg|8 years ago