They completely missed, with 1800+ citations, the winner of the “Theory of Cryptography Conference (TCC) 2016 Test of Time award”: “Calibrating Noise to Sensitivity in Private Data Analysis” by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Oh, it also just won the 2017 Gödel Prize; it really ought to be at the top of both the “Theoretical Computer Science” and “Computer Security and Cryptography” lists.
Worse still, with ~3000 citations, Dwork’s “Differential Privacy” (ICALP (2) 2006: 1-12), should rank even higher in the Theoretical Computer Science list. But Google Scholar has completely lost track of that foundational paper; it’s got it all confused with a completely different paper, Dwork’s 2008 “Differential Privacy: A Survey of Results”. Note that this also means that anybody searching for the general topic “differential privacy” on Google Scholar will not get to see the most-cited paper about it! https://www.microsoft.com/en-us/research/wp-content/uploads/...
Disclaimer: Dwork and I have been seen together for 24 years.
From the article: "This release of classic papers consists of articles that were published in 2006..". Your second one could be there (I haven't looked for it), but you mention some problems with how Scholar tracks that paper; maybe that's why it's missing.
This has left me scratching my head: why just 2006? Taking a single year of publications and labeling them "Classic Papers" is pretty misleading, since the term suggests a wide gamut of publications over a much longer period. It should just be called "Top papers from 2006". Unless this expands to cover at least a decade, it shouldn't be labeled as such.
This almost sounds like collecting my most liked pics from 2006 on Facebook and creating an album "Best moments of my life".
I was expecting "classic" to mean papers like The Part-Time Parliament, A Mathematical Theory of Communication, The UNIX Time-Sharing System, etc. Certainly was in for a surprise...
They certainly do have data prior to 2006, based on Google Scholar results. It seems like an odd choice, but it's explicitly stated that these articles were chosen because they're roughly 10 years old.
I do find some of their choices a bit odd, though. Surely they can come up with better examples? The BigTable paper (OSDI '06) out of Google itself has far more citations (~4x, per Google Scholar citation counts) than the highest-ranked DB paper, and I'd say it's much higher impact than any of them, being one of the early papers of the NoSQL movement. I'd understand if the algorithm in play were more nuanced, but the introductory page explicitly states that these are the most-cited papers of 2006, which doesn't seem to be the case.
Obligatory disclaimer: despite my current employment status, these views don't represent Google's.
This should be as simple as running a query against, e.g., Scholar's data: select an area/field and sort by most cited, while ignoring citations that occur within x years of publication. One could also expand the citation relation transitively (like PageRank, but without cycles).
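A rough sketch of what I mean, on made-up toy data (the paper IDs, years, and the lag cutoff are all assumptions for illustration, not real Scholar data):

```python
from collections import defaultdict

# Toy data, purely illustrative: publication years and (citing, cited, year) edges.
pub_year = {"A": 2006, "B": 2006, "C": 2010, "D": 2015}
cites = [("C", "A", 2010), ("D", "A", 2015), ("D", "B", 2015), ("C", "B", 2007)]

def lagged_counts(pub_year, cites, lag=5):
    """Most-cited ranking, ignoring citations made within `lag` years of publication."""
    counts = defaultdict(int)
    for citing, cited, year in cites:
        if year - pub_year[cited] >= lag:
            counts[cited] += 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

def transitive_score(papers, cites):
    """Expand the citation relation transitively: each citer contributes
    1 plus its own score (assumes the citation graph is acyclic)."""
    cited_by = defaultdict(set)
    for citing, cited, _ in cites:
        cited_by[cited].add(citing)
    memo = {}
    def score(p):
        if p not in memo:
            memo[p] = sum(1 + score(c) for c in cited_by[p])
        return memo[p]
    return {p: score(p) for p in papers}
```

The lag filter rewards staying power rather than a burst of early cites, which is roughly what a "classic" ranking should measure.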
As one might guess, there is a lot wrong with this list even within their stated goals. My examples are drawn from mathematics, since that's what I know. They appear to use the journal to classify category, which doesn't work very well since many of the best results are published in general journals. Additionally, since citation counts vary so widely between sub-fields, there is a strong pull towards selecting misclassified work from higher-citation fields. For example, the paper "High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension" is listed in geometry but belongs elsewhere, and there are no probability papers in the category "Probability and Statistics with Applications". Also, the "Pure & Applied" category is meaningless; that list seems to be the most-cited papers from five arbitrary journals. I guess it's a reminder that these problems are hard to automate, and that your work doesn't have to be perfect to share.
Cognitive Science suffers from the same problem of misclassifications from higher-citation fields (neuroscience).
Agreed that projects don't have to be perfect, but they do have to have some functionality to ship... Given the problems, I don't see how I could use this to construct a course reading list or to improve my understanding of my academic field.
Also, were you able to find any papers in number theory? That's a huge gap, as it's one of mathematics' primary subfields. Analysis seems to be represented, as does topology (via "geometry").
Out of curiosity, does anyone have any examples of scientific books (or papers) that are the exact opposite: influential or famous at the time, but completely and utterly destroyed by the test of time? Ones that seem silly to us now in how thoroughly wrong every single one of their conclusions turned out to be.
I'm thinking about research versions of Lord Kelvin's famous edict, "Heavier-than-air flying machines are impossible", or the patent person (examiner? head of patent office?) who in the nineteenth century said that everything that can be invented has been invented.
Not a field, but a person who everyone thought was Nobel prize bound and it turned out to be all BS. You may think that it's just one person, but the amount of research dollars that got allocated to try and prove or disprove all of this work would be staggering. https://en.wikipedia.org/wiki/Schön_scandal
Sure, fields of research go obsolete all the time. E.g., much of the computer vision work from 2006 is basically dead now. If you go further back, a lot of early AI research was exciting at the time but is entirely forgotten now.
Methodology is not described and the resulting collections are of notably poor quality. Given Google's privileged position in knowledge production I wish they would be far more careful in cases like this.
For everyone disappointed to see papers only from 2006, here is a consolation prize. Creating a Computer Science Canon: a Course of “Classic” Readings in Computer Science: http://l3d.cs.colorado.edu/~ctg/pubs/sigcsecanon.pdf (CS only, date range = [1806:2006])
This is also very interesting: the AAAI Classic Paper Award.
The AAAI Classic Paper award honors the author(s) of paper(s) deemed most influential, chosen from a specific conference year. Each year, the time period considered will advance by one year.
Papers will be judged on the basis of impact, for example:
Started a new research (sub)area
Led to important applications
Answered a long-standing question/issue or clarified what had been murky
Made a major advance that figures in the history of the subarea
Has been picked up as important and used by other areas within (or outside of) AI
Has been very heavily cited
In the Middle Eastern and Islamic Studies section, five of the ten cited papers are about Turkey. Another is about representation of Islam in the Australian media.
This... doesn't seem like a very representative selection of 'timeless' papers.
The security examples were weak. Far more influential were the Ware or Anderson reports, MULTICS security evaluation, anything describing Orange Book-style systematic assurance of whole systems, at least one on capability-security or by Butler Lampson (did access control too), something on monitoring/logging, something on static analysis, CompCert or Coq, and so on.
These were things that had a major impact on the problems they focused on, and that many other papers doing something similar built on or constantly referenced. I'm skeptical of citations in general, since those who chase them usually produce a high number of quotable papers in whatever fad is popular instead of hard, deep, critical work. Those I listed are the latter, with who knows what citation counts. The collection is probably still nice for finding neat ideas or just learning in general.
The point of the exercise is to find papers that are widely considered valuable, especially to other researchers. To do this, they're using citation counts.
There's obviously a number of problems with citations, including self-cites, negative citations ("Alice & Bob '06 shook the community when they found things, but our better, larger study finds no evidence of any effect"), and such. But it makes sense for a company built upon citation rank indexing to rely on such methods =)
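As a toy illustration of one of those problems, self-cites could be discounted if you know each paper's author set (the papers and names below are invented for illustration, not anyone's real method):

```python
# Toy sketch: discount self-citations, assuming we know each paper's author set.
authors = {
    "P1": {"alice"},
    "P2": {"alice", "bob"},
    "P3": {"carol"},
}
cites = [("P2", "P1"), ("P3", "P1")]  # (citing paper, cited paper)

def counts_without_self_cites(authors, cites):
    """Citation counts, skipping any edge where citer and cited share an author."""
    counts = {p: 0 for p in authors}
    for citing, cited in cites:
        if authors[citing] & authors[cited]:
            continue  # shared author: treat as a self-citation, don't count it
        counts[cited] += 1
    return counts
```

Even this crude filter changes rankings noticeably in small fields, which is part of why raw counts are a rough proxy for "widely considered valuable".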
Do they not have data before 2006?
As they said in the post, they're measuring cites 10 years after. It's 2017. I imagine 2006 is their "inaugural year."
I naively thought this was a simple thing and that someone already had a "collection of best articles". It's looking more like "this is a hard problem".
For more papers, there is a nice list here, not limited to 2006: http://jeffhuang.com/best_paper_awards.html
There are a bunch more places to get papers listed here too: https://github.com/papers-we-love/papers-we-love#other-good-...
https://www.google.com/amp/s/selfcitation.wordpress.com/2011...
https://aaai.org/Awards/classic.php
https://en.wikipedia.org/wiki/The_Source#The_Source.27s_Five...