In memory of Aaron Swartz: a collection of PDFs from PDFtribute

houshuang|13 years ago

An important thing to remember is that many journals already permit self-archiving of publications (i.e., uploading a preprint to a personal server or an institutional repository). In fact, about 70% of large publishers automatically allow some form of self-archiving, and with the others, many authors have successfully included a copyright addendum with the copyright-transfer document to retain some rights (http://scholars.sciencecommons.org/). There is an FAQ on self-archiving at http://www.eprints.org/openaccess/self-faq/.

At my university, we keep running workshops, and there are student staff in the library who will help upload articles to the repository if you just e-mail them. Still, most academics won't take the five minutes to do this, even when they have the right.

This doesn't mean that the academic publishing system shouldn't change; it absolutely should. There's also a lot of value in "liberating" academic publications that would otherwise not be free. But I hope people become more aware of what is already possible, and legal!

streptomycin|13 years ago

Agreed, a lot of people don't know their current rights. One major reason is that this information is typically buried in unintuitive legalese deep in the publisher's website. To work around that problem, here is a very useful database that lets you easily check what a journal or publisher allows you to do with your publications: http://www.sherpa.ac.uk/romeo/

Most journals in my field at least allow you to put postprints (the final version of the paper, but not formatted by the journal's typesetters) online, although there are a few stragglers that don't let you do anything.

houshuang|13 years ago

As long as these PDFs are exposed publicly (and linked to, which a tweet with or without #pdftribute will take care of), they will mostly be indexed by Google Scholar, which does a decent job of extracting metadata using heuristics etc.

Of course, it would be much better if people started embedding machine-readable metadata in PDFs (totally possible; see for example http://code.google.com/p/pdfmeat/), and if there were an agreed-upon bibliographic microformat that could be embedded in websites listing articles.

We also eventually need an open alternative to Google Scholar. GS is great, and I use it every day (and love that you can export BibTeX, for example), but it has no API (and will never have one, because of deals with publishers), actively resists automated access, is a black box in terms of how data is gathered, etc. Think of an "Open Scholar" as being to Google Scholar what OSM is to GMaps. OSM might not look as pretty or be as consistent in the beginning, but it enables a whole range of applications that GMaps doesn't. (And at least GMaps has a fairly good API, even if it charges for overuse; GS has nothing.)

(These are just some thoughts I've had while experimenting with an open scholar workflow, trying to share as much of the "byproduct" of my research as possible, including rich notes and summaries, and my own bibliography with links to OA publications where they exist: http://reganmian.net/wiki/researchr:start).

Another thing I've found while working on my project, where I try to expose OA links to as many publications as possible and regularly rescan to see whether they are still available (and still OA), is how quickly documents disappear. Hosting on private pages is convenient, but fragile. Ideally, people would upload papers to university repositories, subject repositories like arXiv.org, etc.
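A periodic rescan like the one described could look roughly like this in Python. This is a minimal sketch under stated assumptions, not the actual researchr tooling: `check_link` and `rescan` are hypothetical names, "still available" is approximated as an HTTP status below 400 on a HEAD request, and it cannot tell whether a document is still OA (a paywall page may also answer 200).

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List


def check_link(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request to url, or 0 if it could not be reached.

    urllib follows redirects, so the status is that of the final location.
    Some servers reject HEAD; a more robust checker might retry with GET.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 or 410 once the PDF has been taken down
    except (ValueError, OSError, http.client.HTTPException):
        return 0  # malformed URL, DNS failure, refused connection, timeout, ...


def rescan(urls: Iterable[str],
           fetch_status: Callable[[str], int] = check_link) -> Dict[str, List[str]]:
    """Sort previously collected OA links into "alive" and "gone".

    fetch_status is a parameter so the logic can be exercised without network access.
    """
    report: Dict[str, List[str]] = {"alive": [], "gone": []}
    for url in urls:
        status = fetch_status(url)
        # Any 2xx/3xx answer counts as still available; errors and failures count as gone.
        report["alive" if 200 <= status < 400 else "gone"].append(url)
    return report
```

Running `rescan` on a schedule and comparing the "gone" lists between runs would give a rough measure of how fast self-hosted copies vanish.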

Vivtek|13 years ago

Thanks for contributing to this thread - I've been looking for something like this for years!

smogzer|13 years ago

Cool effort.

But ... it's a score to JSTOR. It's unorganized.

But ... science is full of noise and crappy publications these days anyway: lots of ways to do the same thing, unproven, existing only because everybody has to publish to stay relevant.

Now: how do we really improve science? My suggestion: a big Python framework for each field of study, one that has implementations of the real algorithms and models for comparison, benchmarking, and even real-life use.

See, for example, ROS (Robot Operating System) in the robotics field. ROS is a kind of basic glue framework where universities and individuals can publish their code. It's decentralized, and it has simulators, so scientists don't need to own the physical robots and can compare (diff) results and algorithms very quickly.

The simulator could have an embedded browser + wiki + Quora-style Q&A that explains X.

evolution: physical paper -> PDF -> simulator.
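To make the shared-framework idea a little more concrete, here is a toy Python sketch of its core: a registry where contributors publish implementations of the same task, and a harness that runs each one on shared inputs, checks it against a reference, and times it. Everything here is hypothetical (the `register`/`benchmark` names and the design), sorting stands in for a real field-specific problem, and it is not how ROS itself is built.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# A "solver" takes a problem instance and returns a result; sorting is a stand-in task.
Solver = Callable[[Sequence[int]], List[int]]

# Shared registry: algorithm name -> implementation (hypothetical design).
REGISTRY: Dict[str, Solver] = {}


def register(name: str) -> Callable[[Solver], Solver]:
    """Decorator a contributor would use to publish an implementation under a name."""
    def wrap(fn: Solver) -> Solver:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("insertion_sort")
def insertion_sort(xs: Sequence[int]) -> List[int]:
    out = list(xs)
    for i in range(1, len(out)):
        key, j = out[i], i - 1
        # Shift larger elements one slot right to make room for key.
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


@register("builtin_sorted")
def builtin_sorted(xs: Sequence[int]) -> List[int]:
    return sorted(xs)


def benchmark(inputs: List[Sequence[int]],
              reference: Solver) -> List[Tuple[str, bool, float]]:
    """Run every registered implementation on the same inputs.

    Returns (name, matches_reference, seconds) for each implementation.
    """
    expected = [reference(x) for x in inputs]  # computed once, outside the timing
    results = []
    for name, fn in REGISTRY.items():
        start = time.perf_counter()
        outputs = [fn(x) for x in inputs]
        elapsed = time.perf_counter() - start
        results.append((name, outputs == expected, elapsed))
    return results
```

For example, `benchmark([[3, 1, 2], [5, 5, 1]], reference=sorted)` returns one row per registered implementation. A real framework would replace sorting with the field's actual models and add shared datasets and simulators.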

jcitme|13 years ago

It's not meant to be a competitor to JSTOR, as much as this is a statement in honor of someone.

A framework like that would be awesome, but it serves a different purpose from this collection of personal PDF posts/uploads that individuals on Twitter contributed.

edwardio|13 years ago

> But ... it's a score to JSTOR. It's unorganized.

I'd imagine competing with an organization that's been around for 18 years will take more than 6-7 hours of coding. :) It's an MVP, and one I'm quite embarrassed about. But I'll keep coding. Someone suggested an open Google Scholar approach; that's one direction this project could head in.

dutchbrit|13 years ago

Cool stuff, have you seen this yet?

http://pdftribute.net/

jychang|13 years ago

I think this website is different from PDFtribute.net because it actually collects and stores the PDFs rather than just having links to the Twitter posts.

From the 'About' section of the website, you can see it uses PDFtribute.net to help scrape links.

devopstom|13 years ago

We're working with the creator of edward.io on some kind of integration between the two sites, and also on a search/index/analysis tool for pdftribute.net.

jychang|13 years ago

Looking through all the uploaded files, there are a lot more non-English documents than I expected; I randomly clicked on two. It's amazing to see support from around the world.

zopticity|13 years ago

It is a great loss that such an entrepreneur has died because of legal problems. I myself have been in a similar situation. I feel that Aaron was a martyr for open access to academic papers. Unfortunately, he will not see his impact on this modern, technology-dependent world.

R.I.P. Aaron Swartz!

houshuang|13 years ago

Nice short article: 10 things you can do to really support Open Access: http://phylogenomics.blogspot.de/2013/01/10-things-you-can-d...

wreckimnaked|13 years ago

Nice idea!

Also, some metadata aggregation (title, author, tags, date published) capabilities wouldn't hurt anyone.
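As a rough illustration of the kind of per-PDF record meant here, a minimal Python sketch might store those four fields and render them as a BibTeX entry (the format Google Scholar exports, mentioned upthread). The `PaperRecord` class, the `to_bibtex` and `citation_key` functions, and the key scheme are all hypothetical; a real tool would also need to escape LaTeX special characters.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PaperRecord:
    """Hypothetical per-PDF metadata record: title, authors, tags, publication date."""
    title: str
    authors: List[str]
    published: date
    tags: List[str] = field(default_factory=list)
    url: str = ""


def citation_key(record: PaperRecord) -> str:
    # Hypothetical key scheme: first author's last name + year, e.g. "doe2013".
    last_name = record.authors[0].split()[-1].lower() if record.authors else "anon"
    return f"{last_name}{record.published.year}"


def to_bibtex(record: PaperRecord) -> str:
    """Render the record as a BibTeX @misc entry (no escaping of special characters)."""
    fields = {
        "title": record.title,
        "author": " and ".join(record.authors),  # BibTeX separates authors with "and"
        "year": str(record.published.year),
        "keywords": ", ".join(record.tags),
        "url": record.url,
    }
    # Skip empty fields so missing metadata doesn't produce blank entries.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
    return f"@misc{{{citation_key(record)},\n{body}\n}}"
```

Aggregating records like this across all uploaded PDFs would make the collection searchable by author, tag, or year, and exportable to reference managers.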

fgrt2|13 years ago

In memory of Swartz, 1 million ebooks for free download

http://ebookoid.com

18 comments