Ask HN: How do you get notified about newest research papers in your field?

[+] karpathy|9 years ago|reply

I wrote http://www.arxiv-sanity.com/ (code is open source on github: https://github.com/karpathy/arxiv-sanity-preserver) as a side project intended to mitigate the problem of finding the newest relevant work in an area (among many other related problems such as finding similar papers, or seeing what others are reading). It sees a steady few hundred users every day and has a few thousand accounts. It's designed around modular views of lists of arxiv papers, each view supporting a use case. I'm always eager to hear feedback on how people use the site, what could be improved, or what other use cases could be added.

[+] visarga|9 years ago|reply

Andrej, thank you very much for making this site. I use it every day.

A problem: I think one of the most necessary things that are missing from arXiv.org is comments. People just come, read, and then take their discussions somewhere else, fragmented all around the net. Arxiv-Sanity already filters just the ML articles and does personalized feeds, maybe it could also be a place of discussion. I know it potentially leads to other complications (like moderation), but I really think readers would benefit from reviews, questions and answers.

The current ML related discussion sites (blogs, /r/machinelearning, G+, Twitter, StackExchange and YC) are often mixed with lots of noise. I'd like to read what researchers think.

Another suggestion: add links to code repositories, where they are available. Maybe some of your trusted users could be empowered with the right to add such links, if it's too much work for a single person. If interesting discussions are reported on other pages on the internet, they could also be added to the article, to make them easier to find.

[+] dspoka|9 years ago|reply

For me, getting alerted when new papers cite papers that are relevant to my current research topic would be ideal. Google Scholar has alerts on authors and search queries, but for me they don't have enough recall.

It's much easier to tell when a paper is relevant for me if it happens to cite 3 of the commonly used datasets for my particular task.

btw I use arxiv-sanity, it's pretty great, thanks a lot!
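
For what it's worth, the "cites 3 of the commonly used datasets" heuristic is easy to sketch once you have reference lists in hand. This is only an illustration: the citation keys, the threshold of 3, and the `new_papers` data are all made up, and getting the reference lists out of a real service is the hard part that this skips.

```python
# Hypothetical citation keys for the dataset papers that matter for one task.
KEY_REFERENCES = {"dataset_a", "dataset_b", "dataset_c", "dataset_d"}


def overlap(cited_keys, key_refs=KEY_REFERENCES):
    """Return the key references that a paper's bibliography contains."""
    return key_refs & set(cited_keys)


def is_relevant(cited_keys, threshold=3):
    """Flag a paper when it cites at least `threshold` key references."""
    return len(overlap(cited_keys)) >= threshold


# Hypothetical new papers mapped to the keys found in their reference lists.
new_papers = {
    "Paper X": ["dataset_a", "dataset_b", "dataset_c", "other_1"],
    "Paper Y": ["dataset_a", "other_2"],
    "Paper Z": ["dataset_b", "dataset_c", "dataset_d", "dataset_a"],
}

alerts = sorted(title for title, refs in new_papers.items() if is_relevant(refs))
print(alerts)
```

Lowering `threshold` trades precision for recall, which is the knob the search-query alerts don't really give you.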

[+] pfd1986|9 years ago|reply

I also use homemade code to keep up with new papers.

I feed in a .bib file with papers I like and use a Naive Bayes classifier to find papers I might like in news feeds (science, nature, PNAS, etc).

It works pretty well. As a bonus, you can post highly ranked papers to Slack, or use papers sent to you by other people to repopulate the .bib file.

Always welcoming suggestions: https://github.com/pfdamasceno/shakespeare
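
For anyone curious what the Naive Bayes idea looks like in miniature, here is a rough sketch. It is not the code from the repo above: the `NaiveBayes` class, the "like"/"skip" labels, and the example titles are all invented for illustration, and it trains on titles only, with add-one smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text, label):
        """Log prior of the label plus summed log likelihoods of the tokens."""
        total_docs = sum(self.doc_counts.values())
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / total_docs)
        for tok in tokenize(text):
            score += math.log((counts[tok] + 1) / denom)
        return score

    def like_margin(self, text):
        """Positive when 'like' is the more probable label."""
        return self.log_score(text, "like") - self.log_score(text, "skip")


nb = NaiveBayes()
# Hypothetical titles taken from a .bib file of papers I liked.
for title in ["Self-assembly of colloidal crystals",
              "Entropy-driven assembly of hard polyhedra"]:
    nb.train(title, "like")
# Hypothetical titles I scrolled past.
for title in ["Quarterly earnings of retail banks",
              "Tax policy and housing markets"]:
    nb.train(title, "skip")

feed = ["Housing markets and banks", "Assembly of colloidal polyhedra"]
ranked = sorted(feed, key=nb.like_margin, reverse=True)
print(ranked)
```

Repopulating the .bib file then just means calling `train(title, "like")` on whatever people send you.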

[+] bchjam|9 years ago|reply

not exactly the same thing, but http://www.gitxiv.com/ is pretty cool for pairing papers with source

[+] the_duke|9 years ago|reply

Wouldn't you miss a lot of important publications when just checking arxiv?

[+] harry_puttar|9 years ago|reply

Thanks. This is an interesting approach to getting what's needed.

[+] ztianjin|9 years ago|reply

it is very helpful

[+] Al-Khwarizmi|9 years ago|reply

(1) I manually check the proceedings of the important conferences in my subfield when they come out.

(2) I check my field's arXiv every other day or so.

(3) Google Scholar alerts me of papers that it thinks will interest me, based on my own papers, and it's very useful. Most of what it shows me is in fact interesting for me, and it sometimes catches papers from obscure venues that I wouldn't see otherwise. The problem is that you need to have papers published for this to work, and also, it's only good for stuff close to your own work, not that much for expanding horizons - (1), (2) and Google Scholar search are better for that.

[+] moyix|9 years ago|reply

Yep, this is what I do, except that security papers don't make it to arXiv so I also keep an eye on twitter (I have followed a bunch of academic security people) and a couple subreddits (/r/ReverseEngineering, /r/REMath, and /r/systems). It's not ideal, but it works out okay.

None of them are a substitute for a proper related work search when I'm writing up a paper though, this is just to keep current on what the trends and interests of the community are.

[+] copperx|9 years ago|reply

What does arXiv provide that your conference proceedings / association doesn't?

For example, I usually log in to the ACM site and go to my SIGs and see what's new there. I've never thought about visiting arXiv.

[+] Havoc|9 years ago|reply

>The problem is that you need to have papers published for this to work

The one place where one could actually use a "Follow" button for other people...there isn't one. Classic.

[+] ChuckMcM|9 years ago|reply

This is a great list, also there are sometimes mailing lists dedicated to a particular topic or field. It helps to have more eyes on the net.

[+] mbjorling|9 years ago|reply

I like to follow The morning paper by Adrian Colyer. He writes a summary of an influential CS paper each day and sends it out on his e-mail list.

https://blog.acolyer.org/

[+] sampo|9 years ago|reply

Write one influential paper. Then all the later papers in the same sub-subfield probably cite your paper. Go to Google Scholar and check the latest citations to your paper.

Ok, it doesn't need to be your paper. Just find a paper that was so influential that others working on the same problem probably will cite it, and monitor the new citations.

[+] chrisamiller|9 years ago|reply

I came in to say exactly this. Google Scholar alerts are incredibly useful.

[+] dredmorbius|9 years ago|reply

So, that's close to how I operate (basically bibliography-surfing), though with one handicap: what do you use to track citations?

Particularly something that's generally open.

Best tool I've got ready access to is Google Scholar. There are citation indices I can get access to by going on-site to a specific facility, but that's pretty limiting when the rest of my work can be done in my office (and the bulk of my materials are there).

(And yes, I'm aware that having to go to where the indices are is how it Used to Be Done, and in fact, I Did That. Technology has moved on.)

[+] devin|9 years ago|reply

Huh, that seems so obvious in retrospect. This is basically how I've grown into jazz. I find someone I like, find out who they played with, who those folks played with, and so on.

[+] jeffspies|9 years ago|reply

Just FYI, you should know about SHARE. It's an effort to create a free, open dataset of research activity across the research lifecycle. You can read more at

http://share-research.org

So, if you want to see a reddit for research, better news feeds, etc., it is the SHARE dataset that can provide that data. SHARE won't build all those things--we want to facilitate others in doing so. You can contribute at

https://github.com/CenterForOpenScience/share

The tooling is all free open source, and we're just finishing up work on v2. You can see an example search page at http://osf.io/share, currently using v1. Some more info on the problem and our approach....

What is SHARE doing?

SHARE is harvesting, (legally) scraping, and accepting data to aggregate into a free, open dataset. This is metadata about activity across the research lifecycle: publications and citations, funding information, data, materials, etc. We are using both automatic and manual, crowd-sourced curation interfaces to clean and enhance what is usually highly variable and inconsistent data. This dataset will facilitate metascience (science of science) and innovation in technology that currently can't take place because the data does not exist. To help foster the use of this data, SHARE is creating example interfaces (e.g., search, curation, dashboards) to demonstrate how this data can be used.

Why is SHARE doing it?

The metadata that SHARE is interested in is typically locked behind paywalls, licensing fees, restrictive terms of service and licenses, or a lack of APIs. This is the metadata that powers sites like Google Scholar, Web of Science, and Scopus--literature search and discovery tools that are critical to the research process but that are incredibly closed (and often incredibly expensive to access). This means that innovation is exclusive to major publishers or groups like Google but is otherwise stifled for everyone else. We don't see theses, dissertations, or startups proposing novel algorithms or interfaces for search and discovery because the barrier of entry in acquiring the data is too high.

[+] austinjp|9 years ago|reply

Hi. This looks really interesting. Unfortunately the results page after a search freezes the stock browser on my LG G3.

I've also read the front page, the about page, and your post several times, and I'm not exactly clear what you provide. I thought I'd do some searches to see if the product made sense. A search for a field I'm interested in, arthritis, yielded zero results. Okay, so... no medical research? A search for "reddit" yielded results, and mentions of "providers". I'm not clear what providers are... is reddit a provider, or the research papers, or the publishers, or the researchers...?

I'll read more later when I'm not on mobile, maybe it will be clearer.

I'm starting a project related to analysing published research, so this is a field I'm very interested in. I hope SHARE can help in some way, and I'll definitely be keeping tabs on your work. Thanks for posting.

[+] the_duke|9 years ago|reply

Are there any plans to provide an API or any kind of database dump to allow building other services based on the aggregated data?

[+] gravypod|9 years ago|reply

I know this question is probably a little off topic for this post but I'm very eager to get some kind of answer.

What should I be reading? I'm a computer science student, and I want to go into a "Software Engineering" line of work. Are there any places to read up on related topics? I have yet to find something that covers my direct field of choice. Is there no one in academia writing about software?

I also like NLP and other interesting parts. Basically all practical software and their applications are things that interest me.

[+] azhenley|9 years ago|reply

ICSE [1] and FSE [2] are the top software engineering research conferences. Skimming the titles/abstracts of their papers each year doesn't take long.

Also, they generally have industry or "in practice" tracks that have postmortems from the big software companies in case you want something more applied.

[1] http://2016.icse.cs.txstate.edu/

[2] http://www.cs.ucdavis.edu/fse2016/

[+] jessriedel|9 years ago|reply

I'll suggest a minority position: If you feel the need to keep up at the bleeding edge of your field, your work is probably replaceable, i.e., if you didn't do it then someone else would do it a year later.

Instead, read more review papers and seminal papers in your field.

[+] hood_syntax|9 years ago|reply

There are a lot of papers on sentiment analysis if I recall correctly. I would look into literature on parsing and statistical analysis, a lot of big data stuff is related to that and there are a lot of books on big data. Very popular field to hire people for as well, a lot of big companies want people to massage their data into giving them useful avenues for money-making.

[+] kirang1989|9 years ago|reply

You might want to check out the Papers We Love repo at https://github.com/papers-we-love/papers-we-love. That's my go-to resource.

[+] dredmorbius|9 years ago|reply

Tossing out a contrarian view: I'm finding there's a tremendous amount of good information and publishing that's old. Keeping up with the cutting-edge can be interesting, but you have to do a lot of the filtering yourself.

Finding out how to identify the relevant older work in your field, finding it, reading it, and seeing for yourself how it's aged, been correctly -- or quite often incorrectly -- presented and interpreted, and what stray gems are hidden within it can be highly interesting.

I've been focusing on economics as well as several other related fields. Classic story is that Pareto optimisation lay buried for most of three decades before being rediscovered in the 1920s (I think I've got dates and timespans roughly right). The irony of economics itself having an inefficient and lossy information propagation system, and a notoriously poor grip on its own history, is not minor.

The Internet Archive, Sci-Hub, and various archives across the Web (some quite highly ideological in their foundation, though the content included is often quite good) are among my most utilised tools.

Libraries as well -- ILL can deliver virtually anything to you in a few days, weeks at the outside. It's quite possible to scan 500+ page books in an hour for transfer to a tablet -- either I'm getting stronger or technology's improving, as I can carry 1,500 books with one hand.

[+] stenl|9 years ago|reply

I made a simple service for myself (http://paperfeed.io) which is a feed of all the new papers in journals I care about. I can "star" papers for reading later. Works extremely well for my habits.

You're welcome to try it (not sure if the signup workflow still works; let me know). I'll be happy to hear your feedback.

Edit: you can upvote papers, and they'll float to the top just like on HN.
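
To make the "float to the top" part concrete: a common way to get that behaviour is a time-decayed score, where votes are divided by a power of the item's age. The sketch below is generic, not a description of how paperfeed.io or HN actually rank; the `gravity` exponent, the `+ 2` offset, and the sample papers are assumptions.

```python
def rank_score(votes, age_hours, gravity=1.8):
    """Votes discounted by age, so older items need more votes to stay on top."""
    return votes / (age_hours + 2) ** gravity


# Hypothetical (title, votes, age in hours) tuples.
papers = [
    ("Old but popular", 40, 72.0),
    ("Fresh with a few votes", 5, 1.0),
    ("Fresh, no votes", 0, 0.5),
]

ranked = [
    title
    for title, votes, age in sorted(
        papers, key=lambda p: rank_score(p[1], p[2]), reverse=True
    )
]
print(ranked)
```

A larger `gravity` makes the feed churn faster, which probably matters more for papers than for news since people need longer to read them.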

[+] syntaxing|9 years ago|reply

This might be off topic, but would you mind sharing how you wrote the website and whether you have any tutorial that you can recommend? I want to design something extremely similar for a different application, but I do not have much knowledge of web development (I am more experienced in programming for numerical and data analysis). I figure this might be a good project to get my feet wet. Thanks!

[+] semaphoreP|9 years ago|reply

I actually just manually check arxiv every morning for the new submissions in my field. It's like getting in the habit of browsing reddit except with a lot less cute animal pictures (maybe because I'm not in biology).

[+] semi-extrinsic|9 years ago|reply

ArXiv has email search alerts. I subscribe to a few topics, they are well formatted plain text digests.

I also have a few ScienceDirect search alerts set up, that come in once every few weeks typically with 1-5 papers.

And Google Scholar, if you use it and you are logged in with an account, learns from your search history and suggests new papers for you to read. It's relatively good.

[+] iandanforth|9 years ago|reply

In case someone here hasn't seen it: http://www.arxiv-sanity.com/ (Machine learning topic specific)

[+] jlarocco|9 years ago|reply

I don't. If I'm working on something and need (or want) the latest cutting edge algorithms then I search for papers in that area as I need it. Otherwise, there's simply too much stuff going on to try reading through everything, or even a filtered down subset. Only a very small portion of it will be remotely relevant to my work or my interests.

If there's a fundamental new result in basic CS or something like that, I figure I'll hear about it on HN or another news site.

I can imagine it's different for people actively working on new research, though.

[+] housel|9 years ago|reply

For programming language research, 1) the RSS feed of http://lambda-the-ultimate.org/ (Lambda the Ultimate), and 2) my old-school paper subscription to ACM SIGPLAN, which includes printed proceedings for most of the relevant ACM conferences (POPL, PLDI, OOPSLA etc.)

[+] eatbitseveryday|9 years ago|reply

I manually check conference proceedings when released:

- OSDI
- SOSP
- FAST
- EuroSys
- APSys
- NSDI
- SIGCOMM
- ATC
- ISMM
- PLDI
- VLDB

These days, accepted papers in specialized conferences are actually on mixed topics, e.g. you'll see security and file systems in SOSP.

[+] yodsanklai|9 years ago|reply

In addition to the important conference proceedings, it's common for researchers to work in a very narrow subfield where everybody knows everybody. They keep seeing each other at various events where they discuss their ongoing work.

[+] tachim|9 years ago|reply

Surprising that feed.ly hasn't been mentioned. It's like gmail for feeds, and it has all the arxiv categories prepopulated. My workflow is as follows: (i) check feedly every day, see ~20-30 new articles, (ii) skim all the abstracts in 5-10 minutes, (iii) mark 0-2 to read later in the day, (iv) mark rest as read, and repeat.

[+] dredmorbius|9 years ago|reply

I.e., it's an RSS / Atom reader.

Yes, this is precisely the sort of application RSS is excellent for.
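
If you'd rather script it than use a reader, filtering an Atom feed by keyword takes only the standard library. A minimal sketch: the embedded `SAMPLE_FEED`, its example.org links, and the `matching_entries` helper are made up for illustration, and a real feed would be fetched over HTTP rather than hard-coded.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content standing in for a downloaded feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>Learning to rank research papers</title>
    <link href="http://example.org/paper-1"/>
    <summary>A recommendation model for preprint feeds.</summary>
  </entry>
  <entry>
    <title>A survey of sourdough starters</title>
    <link href="http://example.org/paper-2"/>
    <summary>Bread.</summary>
  </entry>
</feed>"""


def matching_entries(feed_xml, keywords):
    """Return (title, link) pairs whose title or summary mentions a keyword."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    hits = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        text = (title + " " + summary).lower()
        if any(k in text for k in wanted):
            hits.append((title, link.get("href") if link is not None else ""))
    return hits


hits = matching_entries(SAMPLE_FEED, ["recommendation"])
print(hits)
```

Plain substring matching is crude, but it is enough to cut a daily digest down to the handful of abstracts worth skimming.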

[+] inputcoffee|9 years ago|reply

Just knocked this out after reading this question (using an open source tool developed as a Show HN project called https://www.hellobox.co ):

http://www.ivoryturret.com/

I hope it catches on.

Others have tried and they don't get enough traffic to get it to take off but since low levels of hosting are free, I could just keep it out there for a long time.

[+] otaviogood|9 years ago|reply

http://www.arxiv-sanity.com That helps sort through arxiv papers and get recommendations.

[+] adamnemecek|9 years ago|reply

There should be something like reddit for academic papers. With upvotes and what not. But I guess it takes people longer to read a paper than to read reddit content.

[+] Analemma_|9 years ago|reply

It's a neat idea, but I would want identity verification - only upvotes from people well-versed in the field should "count", precisely so it doesn't become Reddit. Which means you would have a chicken-and-egg problem when the service got started and few experts were on it yet.

[+] trurl42|9 years ago|reply

https://scirate.com/

Is something like that for papers on the arXiv.

[+] wodenokoto|9 years ago|reply

I think this is what academia.edu is trying to be. It seems to be a mix of reddit and linkedin for academics.

[+] sitkack|9 years ago|reply

pretty sure this is reddit, just make a subreddit for the topic and then start feeding it.

[+] azuajef|9 years ago|reply

In the bio/health/bio-info areas: a key option is to create alerts with http://www.ncbi.nlm.nih.gov/pubmed

[+] roadnottaken|9 years ago|reply

Yes, and Google Scholar alerts are also useful and pick up slightly different things. Good to have both.

[+] sybilckw|9 years ago|reply

I've been using http://www.sparrho.com throughout my PhD (in Biochemistry) and I was so impressed with its recommendation engine that I joined their team last year. We've been making a lot of changes to the Sparrho platform lately, including adding a pinboard feature to help lab groups and journal clubs coordinate their reading and keep their comments in a single place. Our database is updated hourly with papers from 45,000+ sources from all scientific and engineering fields, including arXiv. Most of our users set up Sparrho email alerts to replace journal eTOCs/newsletters, RSS feeds and Google Scholar alerts. I'd love to hear what you think! Free sign up here: http://www.sparrho.com

[+] outerspace|9 years ago|reply

Take a look at academia.edu. It's basically a social network for academia. Researchers can post their papers and follow other people's work.

[+] vram22|9 years ago|reply

Yes. I have an account there. Saw either in their newsletter or on their site recently, that they say some X0 million people (researchers) are using it.

135 comments